Kubernetes does a great job of orchestrating your containerized applications and deploying them across the worker nodes in a cluster. If a node has enough compute capacity to run a specific pod, Kubernetes will schedule that pod on that worker node. But what if none of the worker nodes in the cluster have enough available capacity to accept new application pods? At that point, Kubernetes cannot deploy your application pods, and they will be stuck in a Pending state. Beyond this scenario, Kubernetes also has no built-in capability to monitor and manage storage utilization in your cluster. These are two significant problems when it comes to running applications on Kubernetes, and this blog covers how Portworx and AWS can help users architect a solution that remediates both concerns.
In this blog, we will look at how Portworx Autopilot and AWS Karpenter work together on top of AWS EKS clusters: Portworx Autopilot automatically expands persistent volumes or adds storage capacity to the cluster, while AWS Karpenter adds CPU and memory resources by dynamically provisioning more worker nodes in the EKS cluster.
We can begin by developing a better understanding of automated storage capacity management with Portworx Autopilot. Autopilot is a rule-based engine that responds to changes from a monitoring source. Autopilot lets you specify monitoring conditions as well as the actions it should take when those conditions occur, which means you can set simple IFTTT-style rules against your EKS cluster and have Autopilot automatically perform an action for you whenever a condition is met. Portworx Autopilot supports the following three use cases:
- Automatically resizing PVCs when they are running out of capacity
- Scaling Portworx storage pools to accommodate increasing usage
- Rebalancing volumes across Portworx storage pools when they become unbalanced
To get started with Portworx Autopilot, first you will have to deploy Portworx on your Amazon EKS cluster and configure Prometheus and Grafana for monitoring. Once you have that up and running, use the following steps to configure Autopilot and create an Autopilot rule that will monitor the capacity utilization of a persistent volume and scale it up accordingly:
- Use the following YAML file to deploy Portworx Autopilot on your Amazon EKS cluster. Verify that the Prometheus endpoint set in the autopilot-config ConfigMap matches the Prometheus service endpoint in your cluster.
```yaml
# SOURCE: https://install.portworx.com/?comp=autopilot
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: autopilot-config
  namespace: kube-system
data:
  config.yaml: |-
    providers:
      - name: default
        type: prometheus
        params: url=http://px-prometheus:9090
    min_poll_interval: 2
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: autopilot-account
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  labels:
    tier: control-plane
  name: autopilot
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: autopilot
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  replicas: 1
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: autopilot
        tier: control-plane
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "name"
                    operator: In
                    values:
                      - autopilot
              topologyKey: "kubernetes.io/hostname"
      hostPID: false
      containers:
        - command:
            - /autopilot
            - -f
            - ./etc/config/config.yaml
            - -log-level
            - debug
          imagePullPolicy: Always
          image: portworx/autopilot:1.3.1
          resources:
            requests:
              cpu: '0.1'
          securityContext:
            privileged: false
          name: autopilot
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
      serviceAccountName: autopilot-account
      volumes:
        - name: config-volume
          configMap:
            name: autopilot-config
            items:
              - key: config.yaml
                path: config.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: autopilot
  namespace: kube-system
  labels:
    name: autopilot-service
spec:
  ports:
    - name: autopilot
      protocol: TCP
      port: 9628
  selector:
    name: autopilot
    tier: control-plane
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: autopilot-role
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: autopilot-role-binding
subjects:
  - kind: ServiceAccount
    name: autopilot-account
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: autopilot-role
  apiGroup: rbac.authorization.k8s.io
```
- Once you apply this configuration, you can start creating AutopilotRules for individual applications or namespaces. An AutopilotRule has four main sections:
- Selector: Matches labels on the objects that the rule should monitor.
- Namespace Selector: Matches labels on the Kubernetes namespaces the rule should monitor. This is optional, and the default is all namespaces.
- Conditions: These are the metrics for the objects to monitor.
- Actions: These are what Autopilot will perform once the metric conditions are met.
Here is an example of an AutopilotRule that checks for a persistent volume with a label of app: postgres deployed in a namespace with a label of type: db. If the used capacity exceeds 50%, it doubles the size of the persistent volume until it hits a maximum size of 400Gi.
```yaml
apiVersion: autopilot.libopenstorage.org/v1alpha1
kind: AutopilotRule
metadata:
  name: volume-resize
spec:
  selector:
    matchLabels:
      app: postgres
  namespaceSelector:
    matchLabels:
      type: db
  conditions:
    expressions:
      - key: "100 * (px_volume_usage_bytes / px_volume_capacity_bytes)"
        operator: Gt
        values:
          - "50"
  actions:
    - name: openstorage.io.action.volume/resize
      params:
        scalepercentage: "100"
        maxsize: "400Gi"
```
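To make the resize semantics concrete, here is a small Python sketch that models how this rule behaves. This is a hypothetical illustration, not Autopilot code; Autopilot itself evaluates these conditions against Prometheus metrics. Each time used capacity exceeds the 50% threshold, the volume grows by scalepercentage (100%, so it doubles) until maxsize caps it at 400Gi:

```python
# Hypothetical model of the volume-resize rule above, for illustration only.

def next_volume_size(capacity_gib, used_gib,
                     threshold_pct=50, scale_pct=100, max_gib=400):
    """Return the volume size after one Autopilot evaluation pass."""
    usage_pct = 100 * used_gib / capacity_gib
    if usage_pct <= threshold_pct or capacity_gib >= max_gib:
        return capacity_gib  # condition not met, or already at maxsize
    grown = capacity_gib * (1 + scale_pct / 100)
    return min(grown, max_gib)  # never grow past maxsize

# A 100Gi volume at 60% usage doubles to 200Gi ...
assert next_volume_size(100, 60) == 200
# ... while 40% usage leaves it untouched ...
assert next_volume_size(100, 40) == 100
# ... and a 300Gi volume at 60% is capped at 400Gi, not 600Gi.
assert next_volume_size(300, 180) == 400
```

Note that because the resize only fires above the threshold, a volume can briefly sit above 50% utilization between Prometheus polling intervals before the expansion lands.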
Once you apply this AutopilotRule specification, Portworx will monitor the capacity utilization using Prometheus metrics for that persistent volume and automatically perform actions as needed.
- In addition to expanding individual persistent volumes, Portworx Autopilot also allows you to put AutopilotRules in place that direct Portworx to automatically expand the underlying storage pool. This is useful when your applications are storage intensive and you don't want to add more EKS worker nodes to your cluster. Below is a sample AutopilotRule that monitors your storage pool utilization: if the available capacity falls below 50% and the total capacity is still less than 2TB, it automatically creates and attaches EBS volumes to your EKS worker nodes to expand the storage pool's capacity by 50%:
```yaml
apiVersion: autopilot.libopenstorage.org/v1alpha1
kind: AutopilotRule
metadata:
  name: pool-expand
spec:
  enforcement: required
  ##### conditions are the symptoms to evaluate. All conditions are AND'ed
  conditions:
    expressions:
      # pool available capacity less than 50%
      - key: "100 * (px_pool_stats_available_bytes / px_pool_stats_total_bytes)"
        operator: Lt
        values:
          - "50"
      # pool total capacity should not exceed 2TB
      - key: "px_pool_stats_total_bytes/(1024*1024*1024)"
        operator: Lt
        values:
          - "2000"
  ##### action to perform when condition is true
  actions:
    - name: "openstorage.io.action.storagepool/expand"
      params:
        # resize pool by scalepercentage of current size
        scalepercentage: "50"
        # when scaling, add disks to the pool
        scaletype: "add-disk"
```
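Since all conditions in a rule are AND'ed, the pool only expands when both expressions hold at the same time. The following Python sketch (a hypothetical helper, not part of Autopilot, which evaluates the real px_pool_stats_* metrics via Prometheus) illustrates that gating logic:

```python
# Hypothetical model of the pool-expand rule's AND'ed conditions.

GIB = 1024 ** 3  # bytes per GiB, matching the rule's unit conversion

def should_expand_pool(available_bytes, total_bytes, max_total_gib=2000):
    """Both expressions must hold: low free space AND pool under the cap."""
    available_pct = 100 * available_bytes / total_bytes
    total_gib = total_bytes / GIB
    return available_pct < 50 and total_gib < max_total_gib

# A 400Gi pool with only 100Gi free (25% available) triggers expansion ...
assert should_expand_pool(100 * GIB, 400 * GIB) is True
# ... but 75% available does not ...
assert should_expand_pool(300 * GIB, 400 * GIB) is False
# ... and a pool already at the ~2TB cap is never expanded further.
assert should_expand_pool(500 * GIB, 2048 * GIB) is False
```

The second condition acts as a cost guardrail: without it, a steadily filling pool would keep accreting EBS volumes indefinitely.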
Now that you know how to automate storage capacity management, we can look at how we can leverage AWS Karpenter to add more nodes to our EKS cluster when we need more compute capacity for our application pods.
- Use the eksctl instructions on Karpenter's documentation site to deploy an EKS cluster. For our testing, we used the following eksctl configuration to deploy an EKS cluster with two node groups. The first node group, "storage-nodes," includes three nodes that Portworx will use to provide storage for your stateful applications. The second node group, "kar-bshah-ng," will be used by AWS Karpenter to dynamically add more nodes to the EKS cluster, increasing the compute capacity available to your applications.
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: kar-bshah
  region: us-west-2
  version: "1.21"
  tags:
    karpenter.sh/discovery: kar-bshah
managedNodeGroups:
  - name: storage-nodes
    instanceType: m5.xlarge
    minSize: 3
    maxSize: 3
    desiredCapacity: 3
    volumeSize: 100
    amiFamily: AmazonLinux2
    labels: {role: worker, "portworx.io/node-type": "storage"}
    tags:
      nodegroup-role: worker
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
        - arn:aws:iam::<<aws-account-id>>:policy/<<px-role>>
      withAddonPolicies:
        imageBuilder: true
        autoScaler: true
        ebs: true
        fsx: true
        efs: true
        albIngress: true
        cloudWatch: true
  - name: kar-bshah-ng
    instanceType: m5.large
    amiFamily: AmazonLinux2
    desiredCapacity: 1
    minSize: 1
    maxSize: 10
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
        - arn:aws:iam::<<aws-account-id>>:policy/<<px-role>>
      withAddonPolicies:
        imageBuilder: true
        autoScaler: true
        ebs: true
        fsx: true
        efs: true
        albIngress: true
        cloudWatch: true
```
- Once you have your EKS cluster deployed, use the steps on the documentation site to configure AWS Karpenter and have it add more nodes to your EKS cluster when you need more CPU and memory resources for your application.
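That configuration step centers on a Karpenter Provisioner object. As a rough sketch of what one might look like for this cluster, here is a minimal Provisioner using the karpenter.sh/v1alpha5 API generation that was current for EKS 1.21. The subnet and security group selectors reuse the karpenter.sh/discovery: kar-bshah tag from the eksctl config above; the capacity type, CPU limit, and TTL values are assumptions you should adapt to your environment:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    # restrict Karpenter to on-demand capacity (assumption; spot also works)
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: 100          # cap total provisioned CPU (example value)
  provider:
    subnetSelector:
      karpenter.sh/discovery: kar-bshah
    securityGroupSelector:
      karpenter.sh/discovery: kar-bshah
  ttlSecondsAfterEmpty: 30  # scale empty nodes back down after 30s
```

With a Provisioner like this in place, any pod that the scheduler leaves Pending for lack of capacity prompts Karpenter to launch a right-sized node, complementing the storage-side automation Autopilot provides.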
If you want to see Portworx Autopilot and AWS Karpenter in action, watch the following video, where we demonstrate how you can scale your compute and storage capacity as and when you need it.