The Volume Node Affinity Conflict error occurs in Kubernetes when a
Persistent Volume (PV) has node affinity rules that restrict it to certain
nodes, based on labels such as topology or availability zone. A PV configured
this way, for example one that requires labels corresponding to a specific
availability zone or other topological constraint, also restricts the nodes on
which pods using it can run. When a pod that claims the PV is scheduled on a
node that does not satisfy these affinity requirements, Kubernetes cannot
attach the volume to the pod, resulting in the node affinity error. This
situation often occurs in cloud environments where storage volumes are tied to
specific availability zones (AZs).
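For reference, this is roughly what such a restriction looks like on a provisioned PV. The names and IDs below are hypothetical, and on AWS the EBS CSI driver stamps this section automatically when it provisions the volume (depending on the driver version the label key may be topology.ebs.csi.aws.com/zone rather than topology.kubernetes.io/zone):
example-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv                        # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0   # hypothetical EBS volume ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - eu-north-1a             # only nodes in this zone can use the volume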
Situations In Which This Error Commonly Appears
This error occurs mostly in the following situations:
-
Volume Attachment and Multi-AZ Deployments: When a PV has node
affinity for one AZ but the pod is scheduled on a node in another AZ,
Kubernetes blocks the volume attachment to prevent a topology mismatch. This
is particularly important in multi-availability zone deployments, such as in
cloud environments, where volumes are zone-specific and must align with the
node's location to attach and function properly.
-
Dynamic Storage Provisioners: Some storage classes enforce topology
constraints such as node-specific storage pools or hardware tiers, which can
create conflicts if pods are scheduled on incompatible nodes. In on-prem
clusters, a dynamic provisioner like Ceph RBD or Portworx might create volumes
tied to specific storage nodes or racks, and volume attachment fails if a pod
is scheduled on a node that cannot reach that storage. In cloud environments,
storage classes using CSI drivers may enforce constraints based on the
underlying hardware, such as SSD or HDD tiers, preventing volume attachment if
the pod lands on an incompatible node.
-
Hybrid Workloads Across Regions: Storage attachment issues typically
do not apply in multi-cluster setups, but in rare cases where a single
Kubernetes cluster is stretched across multiple regions, region-specific
volumes (e.g., AWS EBS, Azure Managed Disks) may fail to attach if the pod is
scheduled in a different region.
Reproducing the Error
To reproduce the error in Kubernetes, we created a scenario where a PV is
restricted to a specific Availability Zone, but the pod that uses it is
scheduled on a node in a different Availability Zone. This arises due to
topology awareness in Kubernetes, where node affinity rules prevent the volume
from being attached to incompatible nodes. Here are the steps we followed, with
detailed configurations:
Step 1: Create an EKS Cluster
Run this command to create an EKS cluster with worker nodes spread across
multiple AZs, which is required to trigger the conflict.
eksctl create cluster --name eks-node-affinity --region eu-north-1 \
  --nodegroup-name worker-nodes \
  --nodes 2 \
  --nodes-min 2 \
  --nodes-max 3 \
  --node-type t3.medium \
  --managed
Verify the nodes and their zones using the following command:
input kubectl get nodes -o wide --label-columns=topology.kubernetes.io/zone
output
NAME                                            STATUS   ROLES    AGE   VERSION               INTERNAL-IP      EXTERNAL-IP     OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME     ZONE
ip-192-xxx-xx-xxx.eu-north-1.compute.internal   Ready    <none>   17m   v1.30.8-eks-aeac579   192.xxx.xx.xxx   13.xx.xxx.xxx   Amazon Linux 2   5.xx.xxx-xxx.xxx.amzn2.x86_64   containerd://1.7.25   eu-north-1a
ip-192-xxx-xx-xx.eu-north-1.compute.internal    Ready    <none>   17m   v1.30.8-eks-aeac579   192.xxx.xx.xx    13.xx.xxx.xxx   Amazon Linux 2   5.xx.xxx-xxx.xxx.amzn2.x86_64   containerd://1.7.25   eu-north-1b
Step 2: Install the AWS EBS CSI Driver
The following command installs the EBS CSI driver, which enables Kubernetes to provision and attach EBS volumes dynamically.
kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.39"
Verify the deployment as shown below:
input kubectl get pods -n kube-system | grep ebs
output
ebs-csi-controller-778b9db5f4-ghg8x   6/6   Running   0   4m45s
ebs-csi-controller-778b9db5f4-gtl9g   6/6   Running   0   4m45s
ebs-csi-node-95fhf                    3/3   Running   0   4m40s
ebs-csi-node-gc7rh                    3/3   Running   0   4m40s
All the pods are up and running properly.
Step 3: Attach IAM Policy to Allow EBS Volume Creation
EKS worker nodes do not have permission to create EBS volumes by default. The AmazonEBSCSIDriverPolicy must be attached to the node role so the driver can create and attach EBS volumes. Find the Node IAM Role as shown below:
input aws eks describe-nodegroup --cluster-name eks-node-affinity --nodegroup-name worker-nodes --query "nodegroup.nodeRole"
output "arn:aws:iam::9xxxxxxxxxx3:role/eksctl-eks-node-affinity-nodegroup-NodeInstanceRole-XXXXXX"
Then attach the IAM policy, substituting your own Node IAM Role name, with the help of the command below:
aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --role-name eksctl-eks-node-affinity-nodegroup-NodeInstanceRole-XXXXXX
Once done, verify that the policy is attached:
aws iam list-attached-role-policies --role-name eksctl-eks-node-affinity-nodegroup-NodeInstanceRole-XXXXXX
Step 4: Create a StorageClass in eu-north-1a
Now, define a StorageClass that pins all volumes to eu-north-1a. This ensures that every EBS volume is provisioned in eu-north-1a, regardless of where the pod is scheduled. Here, volumeBindingMode is set to Immediate, which forces the EBS volume to be created in eu-north-1a before any pod is scheduled. If we used WaitForFirstConsumer instead, Kubernetes would wait for a pod to be scheduled first and place the volume in that pod's AZ, which would prevent the error we are trying to reproduce.
storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  fsType: ext4
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - eu-north-1a
Apply the configuration on the cluster.
kubectl apply -f storageclass.yaml
storageclass.storage.k8s.io/ebs-sc created
Step 5: Create a PVC That Requests EBS Storage
Create a PVC that binds to the EBS Storage.
pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 10Gi
Run the following command to apply it.
input kubectl apply -f pvc.yaml
output persistentvolumeclaim/ebs-pvc created
Check the PVC status using the following command:
input kubectl get pvc ebs-pvc
output
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
ebs-pvc   Bound    pvc-959acf48-8557-4404-a5ac-75589cf70338   10Gi       RWO            ebs-sc         <unset>                 93s
This shows whether the PVC is in a Pending or Bound state; here it is already Bound because the Immediate binding mode provisions the volume right away.
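Optionally, you can already see the zone restriction the CSI driver stamped on the provisioned PV. The command below is one way to do this (depending on the driver version, the label key shown may be topology.ebs.csi.aws.com/zone instead of topology.kubernetes.io/zone):
kubectl get pv $(kubectl get pvc ebs-pvc -o jsonpath='{.spec.volumeName}') -o jsonpath='{.spec.nodeAffinity}'
The affinity printed here should reference eu-north-1a, which is what will conflict with the pod in the next step.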
Step 6: Create a Pod in a Different AZ (eu-north-1b)
To trigger the Volume Node Affinity Conflict, we run the pod in a different zone
eu-north-1b, in contrast to the volume which is in eu-north-1a.
pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  volumes:
    - name: ebs-storage
      persistentVolumeClaim:
        claimName: ebs-pvc
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - mountPath: "/data"
          name: ebs-storage
  nodeSelector:
    topology.kubernetes.io/zone: eu-north-1b
To apply this, run the below command.
input kubectl apply -f pod.yaml
output pod/test-pod created
Check the status of the pod:
input kubectl get pod test-pod
output
NAME       READY   STATUS    RESTARTS   AGE
test-pod   0/1     Pending   0          14s
The status stays pending. Let us describe the pod for more details.
input kubectl describe pod test-pod
output
Name:             test-pod
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           <none>
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Containers:
  app:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      3600
    Environment:  <none>
    Mounts:
      /data from ebs-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9m9w5 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  ebs-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ebs-pvc
    ReadOnly:   false
  kube-api-access-9m9w5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              topology.kubernetes.io/zone=eu-north-1b
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  23s   default-scheduler  0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had volume node affinity conflict. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
In the events section, you can see the warning with the reason
FailedScheduling and the message
0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector,
1 node(s) had volume node affinity conflict. preemption: 0/2 nodes are
available: 2 Preemption is not helpful for scheduling.
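If you prefer to watch for this class of failure across the namespace instead of describing individual pods, one option (an illustrative command, not part of the original walkthrough) is to filter events by reason:
kubectl get events --field-selector reason=FailedScheduling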
What Causes This Error
-
Mismatched Availability Zones: The most common cause is that the
persistent volumes (PVs) bound to the pod's persistent volume claims (PVCs)
exist in zones different from where the pod is trying to run.
-
Improper or Inconsistent Labeling: Nodes in the cluster might not
have consistent labels or the expected topology labels, leading to
scheduling mismatches.
-
Scheduler and Node Affinity Mismatch: If a pod's scheduling
constraints (e.g., nodeSelector or affinity rules) do not align with the
PV's node affinity, the scheduler may assign the pod to an incompatible
node. This misalignment prevents the volume from being mounted, leading to
scheduling failures.
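A quick way to check which of these applies (assuming the topology label used in this walkthrough) is to compare the PV's affinity with the zone labels on your nodes:
kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}'
kubectl get nodes -L topology.kubernetes.io/zone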
Potential Impact
-
Pod Scheduling Failures: Pods may remain in a Pending state
indefinitely, unable to start due to the conflict, and applications relying
on these pods may experience service disruptions.
-
Resource Utilization Conflicts: Resources may remain allocated
without being utilized, leading to inefficiencies and increased costs.
-
Operational Complexity: Troubleshooting these conflicts can add
complexity to cluster management, and resolving the error can take time,
especially in multi-zone or multi-region environments.
How To Troubleshoot And Resolve
When facing a volume node affinity conflict error in Kubernetes, follow these
general troubleshooting steps to identify and resolve the issues.
1. Inspect the Pod for Errors and Scheduling Details
Use the kubectl describe command to check the pod's status and view events for
any errors:
kubectl describe pod <pod-name>
Check which node the pod is scheduled on with the help of the following command:
kubectl get pod <pod-name> -o wide
Verify if the node's labels (e.g., topology.kubernetes.io/zone) align with the
volume's node affinity rules.
2. Verify Node Labels
Check the labels of the node to confirm the availability zone or other
attributes:
kubectl describe node <node-name>
Ensure the node's labels match the zone requirements for the Persistent Volume (PV).
3. Inspect the Persistent Volume (PV) Node Affinity
Check the node affinity rules defined on the PV to confirm they are restricted
to the correct zone:
kubectl describe pv <pv-name>
Look for the node affinity section in the PV definition and verify the
specified zone.
4. Identify and Resolve Conflicts
Case 1: Pod Scheduled in the Wrong Zone
If the pod is scheduled in a different zone than the PV, update the pod's
scheduling rules to match the PV's zone. Add a nodeSelector or affinity rule in
the pod definition (pod.yaml):
spec:
  nodeSelector:
    topology.kubernetes.io/zone: <desired-zone>
Case 2: PV Node Affinity Incompatibility
If necessary, adjust the PV's node affinity rules to be compatible with both
zones by updating the PV configuration to include all required zones, as shown
in the sketch below.
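A minimal sketch of what that affinity section could look like, assuming the zones used in this walkthrough. Note that widening the affinity only helps when the underlying storage is actually reachable from every listed zone; zonal volumes such as EBS cannot simply be relabeled this way:
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - eu-north-1a
                - eu-north-1b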
5. Verify Binding
After making changes, check if the pod successfully binds to the PV:
kubectl get pods
kubectl describe pod <pod-name>
6. Portworx Volume Placement Strategies
Additionally, Portworx allows you to define VolumePlacementStrategies, which
specify rules for placing volumes and replicas near each other and can help
avoid volume node affinity conflicts. For instance, if your application relies
on multiple volumes that are distributed over multiple nodes, it may suffer
latency issues. You can avoid this by creating a VolumePlacementStrategy that
places all the volumes on the same node.
apiVersion: portworx.io/v1beta2
kind: VolumePlacementStrategy
metadata:
  name: webserver-volume-affinity
spec:
  volumeAffinity:
    - matchExpressions:
        - key: app
          operator: In
          values:
            - webserver
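To take effect, the strategy is typically referenced from a Portworx StorageClass. The sketch below reflects our reading of the Portworx documentation; the provisioner and the placement_strategy parameter name are assumptions that may vary by Portworx version, so check the docs for your release:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: portworx-webserver-sc                       # hypothetical name
provisioner: pxd.portworx.com                       # assumed CSI provisioner; older setups use kubernetes.io/portworx-volume
parameters:
  repl: "2"
  placement_strategy: "webserver-volume-affinity"   # assumed parameter name for referencing the strategy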
How To Avoid The Error
-
Consistent Node Labeling: Ensure all nodes in the cluster have
consistent and accurate labels for topology (e.g., zones, regions).
-
Regular Validation: Periodically audit the cluster's node labels,
storage classes, and PV definitions to ensure alignment.
-
Topology-Aware Storage Configuration: Use storage classes and
provisioners that respect topology constraints and align with pod scheduling
requirements.
-
Monitor Cloud Provider Settings: Regularly monitor the cloud
provider's settings and ensure that nodes' labels are correctly set up for
accurate Kubernetes scheduling.
-
Deferred Volume Binding: Configure the storage class with
volumeBindingMode: WaitForFirstConsumer to delay volume binding until a pod
is scheduled. This ensures that PVs are provisioned in the correct topology,
preventing scheduling conflicts (see the sketch after this list).
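As a sketch of the last point, this is the StorageClass from Step 4 with the binding mode switched to WaitForFirstConsumer (and the zone pinning removed), so the volume is provisioned only after the scheduler has picked a node, in that node's zone:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc-deferred                    # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  fsType: ext4
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer    # bind and provision only after a pod using the PVC is scheduled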
Conclusion
The "volume node affinity conflict" error highlights the importance of aligning
your pod scheduling, PV configuration, and PVC binding rules in Kubernetes. This
issue commonly arises in scenarios with topology-aware scheduling, particularly
in multi-AZ or hybrid environments. Ensuring proper node labeling, configuring
pod affinity/anti-affinity rules, and using solutions like Portworx to manage
volume placement dynamically with topology awareness can help you avoid the
error in the future.