
The Volume Node Affinity Conflict error occurs in Kubernetes when a
Persistent Volume (PV) carries node affinity rules that restrict it to certain
nodes, based on labels such as topology or availability zones, and a pod that
uses the volume is scheduled on a node that does not satisfy those rules.
Because Kubernetes cannot attach the volume to a node outside the PV's
affinity constraints, the pod fails to schedule with a node affinity error.
This situation often occurs in cloud environments where storage volumes are
tied to specific availability zones (AZs).
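
For context, a PV pinned to a single AZ carries a nodeAffinity block like the
minimal sketch below. The name and volume ID are illustrative placeholders,
not values from this walkthrough:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv                          # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0     # placeholder EBS volume ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - eu-north-1a               # only nodes in this AZ may mount it

A pod that references this PV can only run on nodes labeled
topology.kubernetes.io/zone=eu-north-1a; scheduling it anywhere else produces
the conflict.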

Situations In Which This Error Commonly Appears

This error occurs mostly in the following situations: 

  • Volume Attachment and Multi-AZ Deployments: When setting up a PV with
    node affinity for one AZ but scheduling a pod on a node in another AZ,
    Kubernetes blocks the volume attachment to prevent a topology mismatch. This
    is particularly important in multi-availability-zone deployments, such as in
    cloud environments, where volumes are zone-specific and must align with the
    node's location to ensure proper attachment and functionality.
  • Dynamic Storage Provisioners: Some storage classes enforce topology
    constraints such as node-specific storage pools or hardware tiers, which can
    create conflicts if pods are scheduled on incompatible nodes. A dynamic
    provisioner like Ceph RBD or Portworx might create volumes tied to specific
    storage nodes or racks in on-prem clusters. The volume attachment fails if a
    pod is scheduled on a node that cannot access that storage node (a sketch of
    such a topology-constrained StorageClass follows this list).

In cloud environments, storage classes using CSI drivers may similarly enforce
constraints based on underlying hardware, such as SSD and HDD tiers, preventing
volume attachment if the pod lands on an incompatible node.

  • Hybrid Workloads Across Regions: Storage attachment issues typically
    do not apply in multi-cluster setups, but in rare cases where a single
    Kubernetes cluster is stretched across multiple regions, region-specific
    volumes (such as AWS EBS or Azure Managed Disks) may fail to attach if the
    pod is scheduled in a different region.
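
The on-prem provisioner case can be sketched with a StorageClass that restricts
provisioning through allowedTopologies. The driver name and rack key below are
hypothetical; the keys a driver honors depend on the topology labels it
actually reports:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rack-local-sc                  # hypothetical name
provisioner: example.csi.vendor.io     # placeholder CSI driver
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: example.com/rack          # hypothetical topology key
        values:
          - rack-1

Pods using volumes from this class can only land on nodes labeled
example.com/rack=rack-1; any other placement surfaces the same affinity
conflict.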

Reproducing the Error

To reproduce the error in Kubernetes, we created a scenario where a PV is
restricted to a specific Availability Zone, but the pod that uses it is
scheduled on a node in a different Availability Zone. This arises due to
topology awareness in Kubernetes, where node affinity rules prevent the volume
from being attached to incompatible nodes.  Here are the steps we followed, with
detailed configurations: 

Step 1: Create an EKS Cluster

Run this command to create an EKS cluster with worker nodes spread across
multiple AZs; having nodes in different AZs is required to trigger the conflict.

eksctl create cluster --name eks-node-affinity --region eu-north-1 \
  --nodegroup-name worker-nodes \
  --nodes 2 \
  --nodes-min 2 \
  --nodes-max 3 \
  --node-type t3.medium \
  --managed

Verify the nodes and their zones using the following command:

input
kubectl get nodes -o wide --label-columns=topology.kubernetes.io/zone
output
NAME                                            STATUS   ROLES    AGE   VERSION               INTERNAL-IP      EXTERNAL-IP     OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME     ZONE
ip-192-xxx-xx-xxx.eu-north-1.compute.internal   Ready    <none>   17m   v1.30.8-eks-aeac579   192.xxx.xx.xxx   13.xx.xxx.xxx   Amazon Linux 2   5.xx.xxx-xxx.xxx.amzn2.x86_64   containerd://1.7.25   eu-north-1a
ip-192-xxx-xx-xx.eu-north-1.compute.internal    Ready    <none>   17m   v1.30.8-eks-aeac579   192.xxx.xx.xx    13.xx.xxx.xxx   Amazon Linux 2   5.xx.xxx-xxx.xxx.amzn2.x86_64   containerd://1.7.25   eu-north-1b

Step 2: Install the AWS EBS CSI Driver

The following command installs the EBS CSI driver, which enables Kubernetes to provision and attach EBS volumes dynamically.

kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.39"

Verify the deployment as shown below:

input
kubectl get pods -n kube-system | grep ebs
output
ebs-csi-controller-778b9db5f4-ghg8x   6/6     Running   0          4m45s
ebs-csi-controller-778b9db5f4-gtl9g   6/6     Running   0          4m45s
ebs-csi-node-95fhf                    3/3     Running   0          4m40s
ebs-csi-node-gc7rh                    3/3     Running   0          4m40s

All the pods are up and running properly.

Step 3: Attach IAM Policy to Allow EBS Volume Creation

EKS worker nodes do not have permission to create EBS volumes by default. The AmazonEBSCSIDriverPolicy must be attached to the node role so the driver can create and attach EBS volumes. Find the node IAM role as shown below:

input
aws eks describe-nodegroup --cluster-name eks-node-affinity --nodegroup-name worker-nodes --query "nodegroup.nodeRole"
output
"arn:aws:iam::9xxxxxxxxxx3:role/eksctl-eks-node-affinity-nodegroup-NodeInstanceRole-XXXXXX"

Then attach the IAM policy using the command below, substituting your node IAM role name:

aws iam attach-role-policy \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
    --role-name eksctl-eks-node-affinity-nodegroup-NodeInstanceRole-XXXXXX

Once done, verify that the policy is attached:

aws iam list-attached-role-policies --role-name eksctl-eks-node-affinity-nodegroup-NodeInstanceRole-XXXXXX

Step 4: Create a StorageClass in eu-north-1a

Now, define a StorageClass that pins every volume to eu-north-1a, regardless of where the pod is scheduled. The volumeBindingMode is set to Immediate, which forces the EBS volume to be created in eu-north-1a before any pod is scheduled. If we used WaitForFirstConsumer instead, Kubernetes would wait for a pod to be scheduled first, place the volume in that pod's AZ, and thereby avoid the error.

storageclass.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  fsType: ext4
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - eu-north-1a

Apply the configuration to the cluster.

kubectl apply -f storageclass.yaml
storageclass.storage.k8s.io/ebs-sc created

Step 5: Create a PVC That Requests EBS Storage

Create a PVC that requests storage from the ebs-sc StorageClass.

pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 10Gi

Run the following command to apply it.

input
kubectl apply -f pvc.yaml
output
persistentvolumeclaim/ebs-pvc created

Check the PVC status using the following command:

input
kubectl get pvc ebs-pvc
output
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
ebs-pvc   Bound    pvc-959acf48-8557-4404-a5ac-75589cf70338   10Gi       RWO            ebs-sc         <unset>                 93s

Because the binding mode is Immediate, the PVC is already Bound and the backing EBS volume has been created in eu-north-1a. You can confirm the zone pinning as shown below.
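
To confirm where the volume is pinned, inspect the node affinity of the PV that
was created for the claim. A quick way, assuming kubectl and a POSIX shell, is
to resolve the PV name from the PVC:

kubectl describe pv $(kubectl get pvc ebs-pvc -o jsonpath='{.spec.volumeName}')

The Node Affinity section of the output should show a required term restricting
the volume to eu-north-1a.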

Step 6: Create a Pod in a Different AZ (eu-north-1b)

To trigger the Volume Node Affinity Conflict, we run the pod in a different
zone, eu-north-1b, while the volume lives in eu-north-1a.

pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  volumes:
    - name: ebs-storage
      persistentVolumeClaim:
        claimName: ebs-pvc
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - mountPath: "/data"
          name: ebs-storage
  nodeSelector:
    topology.kubernetes.io/zone: eu-north-1b

To apply this, run the command below.

input
kubectl apply -f pod.yaml
output
pod/test-pod created

Check the status of the pod:

input
kubectl get pod test-pod
output
NAME       READY   STATUS    RESTARTS   AGE
test-pod   0/1     Pending   0          14s

The status stays Pending. Let us describe the pod for more details.

input
kubectl describe pod test-pod
output

Name:             test-pod
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           <none>
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Containers:
  app:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      3600
    Environment:  <none>
    Mounts:
      /data from ebs-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9m9w5 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  ebs-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ebs-pvc
    ReadOnly:   false
  kube-api-access-9m9w5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              topology.kubernetes.io/zone=eu-north-1b
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  23s   default-scheduler  0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had volume node affinity conflict. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.

In the events section, you can see the warning with the reason
FailedScheduling and the message
0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector,
1 node(s) had volume node affinity conflict. In other words, the eu-north-1a
node fails the pod's nodeSelector, while the eu-north-1b node matches the
selector but cannot attach the volume pinned to eu-north-1a. This is exactly
the conflict we set out to reproduce.

What Causes This Error

  • Mismatched Availability Zones: The most common cause is that the
    persistent volume bound to the pod's PVC exists in a zone different from
    where the pod is trying to run.
  • Improper or Inconsistent Labeling: Nodes in the cluster might not
    have consistent labels or the expected topology labels, leading to
    scheduling mismatches (see the label-audit command after this list).
  • Scheduler and Node Affinity Mismatch: If a pod’s scheduling
    constraints (e.g., nodeSelector or affinity rules) do not align with the
    PV’s node affinity, the scheduler may assign the pod to an incompatible
    node. This misalignment prevents the volume from being mounted, leading to
    scheduling failures.
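
To spot labeling gaps quickly, you can list every node's topology labels in one
view (-L is the shorthand for --label-columns):

kubectl get nodes -L topology.kubernetes.io/zone,topology.kubernetes.io/region

Any node missing a zone value, or carrying an unexpected one, is a likely
source of these mismatches.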

Potential Impact

  • Pod Scheduling Failures: Pods may remain in a Pending state
    indefinitely, unable to start due to the conflict, and applications relying
    on these pods may experience service disruptions.
  • Resource Utilization Conflicts: Resources may remain allocated
    without being utilized, leading to inefficiencies and increased costs.
  • Operational Complexity: Troubleshooting these conflicts can add
    complexity to cluster management, and resolving the error can take time,
    especially in multi-zone or multi-region environments.

How To Troubleshoot And Resolve

When facing a volume node affinity conflict error in Kubernetes, follow these
general troubleshooting steps to identify and resolve the issues.  

1. Inspect the Pod for Errors and Scheduling Details

Use the kubectl describe command to check the pod's status and view events for
any errors:

kubectl describe pod <pod-name>

Check which node the pod is scheduled on with the following command:

kubectl get pod <pod-name> -o wide

Verify that the node's labels (e.g., topology.kubernetes.io/zone) align with
the volume's node affinity rules.

2. Verify Node Labels

Check the labels of the node to confirm the availability zone or other
attributes:

kubectl describe node <node-name>

Ensure the node's labels match the zone requirements for the Persistent
Volume (PV).

3. Inspect the Persistent Volume (PV) Node Affinity

Check the node affinity rules defined on the PV to confirm they are restricted
to the correct zone:

kubectl describe pv <pv-name>

Look for the node affinity section in the PV definition and verify the
specified zone.
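
If you prefer to see only the affinity constraints, a jsonpath query prints
them directly; this is a convenience sketch using standard kubectl output
options:

kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}'

For the volume created in this walkthrough, the terms require
topology.kubernetes.io/zone to be eu-north-1a.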

4. Identify and Resolve Conflicts

Case 1: Pod Scheduled in the Wrong Zone

If the pod is scheduled in a different zone than the PV, update the pod's
scheduling rules to match the PV's zone. Add a nodeSelector or affinity rule in
the pod definition (pod.yaml):

spec:
  nodeSelector:
    topology.kubernetes.io/zone: <desired-zone>

Case 2: PV Node Affinity Incompatibility

If necessary, adjust the PV's node affinity rules to be compatible with the
zones you need, and update the PV configuration to include all required zones;
a sketch follows below.
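
As a sketch, a PV's nodeAffinity block could list multiple zones. Note that
this only helps for storage that is genuinely reachable from every listed zone;
a zonal volume such as an EBS disk physically lives in one AZ, so widening the
affinity does not move the data:

nodeAffinity:
  required:
    nodeSelectorTerms:
      - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
              - eu-north-1a
              - eu-north-1b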

5. Verify Binding

After making changes, check if the pod successfully binds to the PV:

kubectl get pods
kubectl describe pod <pod-name>

6. Portworx Volume Placement Strategies

Additionally, Portworx lets you define VolumePlacementStrategies, rules for
placing volumes and replicas near each other, which can help avoid volume node
affinity conflicts. For instance, if your application relies on multiple
volumes that are distributed over multiple nodes, it can suffer latency issues.
You can avoid this by creating a VolumePlacementStrategy that places all the
volumes on the same node.

apiVersion: portworx.io/v1beta2
kind: VolumePlacementStrategy
metadata:
  name: webserver-volume-affinity
spec:
  volumeAffinity:
    - matchExpressions:
        - key: app
          operator: In
          values:
            - webserver
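
To put the strategy into effect, a Portworx StorageClass can reference it
through the placement_strategy parameter. The sketch below assumes the Portworx
CSI provisioner and a replication factor of 2, both illustrative choices:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: webserver-sc                 # hypothetical name
provisioner: pxd.portworx.com
parameters:
  repl: "2"
  placement_strategy: "webserver-volume-affinity"

Volumes provisioned from this class, and labeled app: webserver, are then
co-located according to the strategy.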

How To Avoid The Error

  • Consistent Node Labeling:  Ensure all nodes in the cluster have
    consistent and accurate labels for topology (e.g., zones, regions). 
  • Regular Validation: Periodically audit the cluster’s node labels,
    storage classes, and PV definitions to ensure alignment. 
  • Topology-Aware Storage Configuration: Use the storage classes and
    provisioners that respect topology constraints and align with pod scheduling
    requirements.
  • Monitor Cloud Provider Settings: Regularly monitor the cloud
    provider's settings and ensure that nodes' labels are correctly set up for
    accurate Kubernetes scheduling.
  • Deferred Volume Binding: Configure the storage class with
    volumeBindingMode: WaitForFirstConsumer to delay volume binding until a pod
    is scheduled. This ensures that PVs are provisioned in the correct topology,
    preventing scheduling conflicts; a sketch follows this list.
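
As a minimal sketch, here is the StorageClass from Step 4 rewritten with
deferred binding; with this change, the volume in our reproduction would have
been provisioned in eu-north-1b alongside the pod instead of conflicting:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc-deferred                     # illustrative name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  fsType: ext4
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer     # provision only after a pod is scheduled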

Conclusion

The "volume node affinity conflict" error highlights the importance of aligning
your pod scheduling, PV configuration, and PVC binding rules in Kubernetes. This
issue commonly arises in scenarios with topology-aware scheduling, particularly
in multi-AZ or hybrid environments. Ensuring proper node labeling, configuring
pod affinity/anti-affinity rules, and using solutions like Portworx to manage
volume placement dynamically with
topology awareness
can help in avoiding the error in the future.