
Stateful applications require persistent storage that survives pod restarts or rescheduling. This is where Kubernetes’ Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) help. In a PVC, you can define the storage requirements, such as capacity, access modes, and storage class, that the pod needs.

However, this process isn’t always straightforward. Misconfigurations or unmet requirements between a PVC and the available PVs can lead to binding issues, leaving pods waiting for storage that is never allocated. You may see an error like “Pod has unbound immediate persistent volume claims”, which means that a pod is trying to use a PVC that has not yet been successfully bound to a PV. This happens when no available PV matches the PVC’s specifications, such as capacity, access modes, or storage class, so the claim remains unbound.
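For reference, a minimal PVC that expresses these requirements might look like the sketch below; the name, size, and StorageClass are illustrative, not from the walkthrough that follows.

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc            # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce            # how the volume may be mounted
  resources:
    requests:
      storage: 5Gi             # requested capacity
  storageClassName: standard   # must reference an existing StorageClass
```

The claim stays Pending until a PV, whether pre-provisioned or dynamically created, satisfies all three of these requirements.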

What Causes This Error

When a Pod is scheduled but its associated PVC cannot be matched to any suitable, available PV in the cluster, this error is logged in the events. The error most commonly occurs in the following situations:

  • Node Constraints: The cluster may lack sufficient nodes to accommodate Pods requiring specific storage configurations, leading to scheduling failures.
  • Capacity mismatch: If the PVC requests more storage than available PVs can provide, the PVC remains unbound.
  • Access mode incompatibility: If the PVC requests a specific access mode that no PV supports (for example, requesting ReadWriteMany when only ReadWriteOnce PVs are available), the binding will fail.
  • Storage Class Issues: The PVC specifies a storage class that doesn’t match any available storage class, or no default storage class is available on the cluster.
  • Dynamic Provisioning Failures: Dynamic provisioning fails due to misconfigured provisioners, insufficient storage, or other cluster issues.
  • Static Volume Binding Issues: When you statically provision PVs, a PVC that does not match any available PV can trigger this error, for example because of predefined node affinity on the PV or because that PV is already bound to another PVC (see the sketch after this list).
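To illustrate the last point, here is a sketch of a statically provisioned local PV pinned to one node; the name, path, and hostname are hypothetical. A PVC can only bind to this PV if the pod can be scheduled onto the matching node.

```
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-example          # hypothetical name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1         # hypothetical path on the node
  nodeAffinity:                   # pins the volume to specific nodes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1   # hypothetical node name
```

If worker-node-1 is full or unschedulable, pods whose PVCs depend on this PV remain Pending with the unbound-claim error.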

Reproducing the Error

To reproduce this error, we set up a Kubernetes cluster with 3 worker nodes and tried setting up Cassandra, a stateful application:

  1. Create a headless service on the cluster using the following command:
    ```
    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: cassandra
      name: cassandra
    spec:
      clusterIP: None
      ports:
        - port: 9042
      selector:
        app: cassandra
    EOF
    ```
    
  2. Create a StorageClass as shown below:
    ```
    kubectl apply -f - <<EOF
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: testing-sc
    provisioner: pd.storage.io #Change it based on your storage provider
    parameters:
      repl: "3"
      priority_io: "high"
      group: "cassandra_vg"
      fg: "true"
    EOF
    ```
    
  3. Create a StatefulSet using the following command:
    ```
    kubectl apply -f - <<EOF
    apiVersion: "apps/v1"
    kind: StatefulSet
    metadata:
      name: cassandra
    spec:
      selector:
        matchLabels:
          app: cassandra
      serviceName: cassandra
      replicas: 3
      template:
        metadata:
          labels:
            app: cassandra
        spec:
          containers:
          - name: cassandra
            image: gcr.io/google-samples/cassandra:v12
            imagePullPolicy: Always
            ports:
            - containerPort: 7000
              name: intra-node
            - containerPort: 7001
              name: tls-intra-node
            - containerPort: 7199
              name: jmx
            - containerPort: 9042
              name: cql
            resources:
              limits:
                cpu: "500m"
                memory: 1Gi
              requests:
                cpu: "500m"
                memory: 1Gi
            securityContext:
              capabilities:
                add:
                  - IPC_LOCK
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]
            env:
              - name: MAX_HEAP_SIZE
                value: 512M
              - name: HEAP_NEWSIZE
                value: 100M
              - name: CASSANDRA_SEEDS
                value: "cassandra-0.cassandra.default.svc.cluster.local"
              - name: CASSANDRA_CLUSTER_NAME
                value: "K8Demo"
              - name: CASSANDRA_DC
                value: "DC1-K8Demo"
              - name: CASSANDRA_RACK
                value: "Rack1-K8Demo"
              - name: CASSANDRA_AUTO_BOOTSTRAP
                value: "false"
              - name: POD_IP
                valueFrom:
                  fieldRef:
                    fieldPath: status.podIP
              - name: POD_NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
            readinessProbe:
              exec:
                command:
                - /bin/bash
                - -c
                - /ready-probe.sh
              initialDelaySeconds: 40
              timeoutSeconds: 40
            volumeMounts:
            - name: cassandra-data
              mountPath: /var/lib/cassandra
      volumeClaimTemplates:
      - metadata:
          name: cassandra-data
          annotations:
            volume.beta.kubernetes.io/storage-class: testing-sc
        spec:
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 1Gi
    EOF
    ```
    

In the above setup, we have defined the StorageClass with a replication factor of 3 and the StatefulSet with 3 replicas. This StatefulSet configuration spins up 3 pods, each with its own PVC requesting 1Gi of storage.
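Each of those PVCs follows the naming pattern <claim-template-name>-<statefulset-name>-<ordinal>, for example cassandra-data-cassandra-0. You can watch them being created with:

```
kubectl get pvc -w
```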

Now, if the cluster does not have enough resources to provide each PVC with a suitable persistent volume of the requested configuration, you will start to see the error. Let us check whether all 3 pods have spun up. In a terminal, execute the following command to check the status of the Cassandra pods:

```
kubectl get pods -l app=cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          3m21s
cassandra-1   0/1     Pending   0          2m12s
```

The `cassandra-0` pod has spun up correctly, but we are stuck at the provisioning of the second pod, `cassandra-1`, which has been in Pending status for a while. This means that either the available resources are insufficient or no suitable PV exists for the `cassandra-1` pod. It also tells us the manifests themselves are valid; had there been a misconfiguration, no pods would be running at all.

Let us describe the pod `cassandra-1` and check for events:

``` 
kubectl describe pod cassandra-1
Name:             cassandra-1
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=cassandra
                  apps.kubernetes.io/pod-index=1
                  controller-revision-hash=cassandra-54f79f768d
                  statefulset.kubernetes.io/pod-name=cassandra-1
Annotations:      <none>
Status:           Pending
…
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  93s                default-scheduler  0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
```

We can see that the pod failed to start because its PVC is in an unbound state due to the unavailability of a suitable PV. Now let us look at the potential impacts of this error.

Potential impacts of the error

The “pod has unbound immediate PersistentVolumeClaims” error can have several negative impacts on a Kubernetes environment. Here are the key impacts:

  • Application Downtime: Pods that rely on persistent storage cannot start until their PVCs are successfully bound to a PV. This error can cause service interruptions, especially if the application is critical and needs immediate access to its data.
  • Deployment Delays: When deploying applications, particularly StatefulSets or Operators (like database Operators), the error can cause a delay in pod deployment because the Kubernetes scheduler cannot bind PVCs to PVs, leading to pod startup failures.
  • Wasted resources: Resources allocated to pods stuck in a pending state (like CPU and memory) remain unutilized while waiting for the PVC to be bound, preventing effective resource utilization for other pods or services.
  • Increased Operational Complexity: Troubleshooting the error requires investigating various issues (e.g., missing PVs or misconfigured StorageClasses), increasing operational complexity and workload for administrators.
  • Negative Impact on Autoscaling: The horizontal pod autoscaling is impeded, as new replicas cannot be scheduled if their PVCs remain unbound, hindering scaling efficiency.
  • CI/CD pipeline failures: In development or testing environments, unbound PVCs can cause failures in CI/CD pipeline executions, delaying application deployments and impacting delivery timelines.

How to Troubleshoot and Resolve This Error

When you see this error, you can follow a step-by-step elimination process to find which of the reasons mentioned above has caused it.

  1. First, check the PVC status, since the error says unbound PVC.
    ```
    kubectl get pvc
    NAME                         STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
    cassandra-data-cassandra-0   Bound     pvc-b7abd57f-9be3-4a89-bb30-53b56faa115d   1Gi        RWO            testing-sc    <unset>                 4m31s
    cassandra-data-cassandra-1   Pending                                                                        testing-sc    <unset>                 3m21s
    ```

  2. Here, the status of cassandra-data-cassandra-0 is Bound, while cassandra-data-cassandra-1 is Pending.
  3. Describe the PVC in Pending state and check its Events. Here, we can see that provisioning failed because not enough nodes are available to provision the volume:
    ```
    kubectl describe pvc cassandra-data-cassandra-1
      Warning  ProvisioningFailed    10m (x2 over 13m)     pxd.portworx.com_xxxx-xxxx-xxxx  failed to provision volume with StorageClass "testing-sc": rpc error: code = Internal desc = Failed to create volume: could not find enough nodes to provision volume: 2 out of 2 pools could not be selected because they did not satisfy the following requirement: force-exclude nodes/pools with uuids: 4fccxxxx-xxxx-xxxxxx, 7252xxxx-xxxx-xxxx.
      Normal   ExternalProvisioning  4m11s (x41 over 14m)  persistentvolume-controller                                                     Waiting for a volume to be created either by the external provisioner 'pxd.portworx.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
    ```
    
  4. The next step is to check the status of the nodes. List the nodes on the cluster:
    ```
    kubectl get nodes
    NAME                                  STATUS   ROLES           AGE    VERSION
    ip-xx-xx-xx-xxx.node-1.com   Ready    <none>          137m   v1.31.0
    ip-xx-xx-xx-xxx.node-2.com   Ready    <none>          137m   v1.31.0
    ip-xx-xx-xx-xxx.node-3.com   Ready    control-plane   138m   v1.31.0
    ip-xx-xx-xx-xxx.node-4.com  Ready    <none>          137m   v1.31.0
    ```
    
  5. Since the nodes are in a Ready state, verify their utilization status and check for allocated resources:
    ```
    kubectl describe nodes
    Name:               ip-xx-xx-xx-xxx.node-1.com
    …
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
    …
    Name:               ip-xx-xx-xx-xxx.node-2.com
    …
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
    …
    ```
    
  6. All the worker nodes are in an overcommitted state, which means you need to add more worker nodes to complete the Cassandra setup.
  7. If you are not using a storage orchestrator, adding a node to a storage pool can be a lengthy, complex, and manual process that may not be worth the developer’s time. Make sure you add labels, constraints, or affinity/anti-affinity rules to ensure data locality. These rules are hard to manage when running applications at scale across a large number of servers and data centers, which increases the room for error. Once you have storage available, delete the pod that is stuck in the Pending state as shown below:
    ```
    kubectl delete pod cassandra-1
    ```
    
  8. Check the status of the pods again. Since the StatefulSet is supposed to have 3 replicas, it will spin up all 3 pods once enough resources are available:
    ```
    kubectl get pods -l app=cassandra
    
    NAME          READY   STATUS    RESTARTS   AGE
    cassandra-0   1/1     Running   0          58m47s
    cassandra-1   1/1     Running   0          7m29s
    cassandra-2   1/1     Running   0          6m28s
    ```
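As a general shortcut when diagnosing this error, you can list every claim that is not yet bound across all namespaces; a simple filter such as the following works:

```
kubectl get pvc --all-namespaces | grep -v Bound
```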
    

While the above process solved the problem, it did little for developer efficiency or for abstracting complexity, and it added time to troubleshooting. A better approach would be to use a Kubernetes storage solution like Portworx, which works well with on-premises storage or any cloud-managed storage.

How Portworx simplifies Kubernetes storage and data management

You can install Portworx on your cluster and create a StorageClass as shown below.

```
kubectl apply -f - <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: portworx-sc
provisioner: pxd.portworx.com
parameters:
  repl: "1"
  priority_io: "high"
  group: "cassandra_vg"
  fg: "true"
EOF
```

Update the StatefulSet configuration to use this StorageClass.

```
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
      annotations:
        volume.beta.kubernetes.io/storage-class: portworx-sc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
```
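Note that volumeClaimTemplates on an existing StatefulSet cannot be updated in place. One way to apply this change, sketched below, is to delete the StatefulSet object while orphaning its running pods and then re-apply the updated manifest (the file name is hypothetical); existing PVCs keep their original StorageClass, and only newly created claims use the new one.

```
# Delete the StatefulSet object but leave its pods running
kubectl delete statefulset cassandra --cascade=orphan

# Re-apply the updated StatefulSet manifest
kubectl apply -f cassandra-statefulset.yaml   # hypothetical file name
```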

Using Portworx helps prevent this error by:

  • Aggregating storage across nodes into pools and dynamically allocating capacity as needed.
  • Resolving binding issues caused by storage exhaustion on individual nodes.
  • Allowing access to the volume from the storage pool in the cluster and eliminating the need to manually handle volume placement.
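If you want to confirm how much aggregate capacity the cluster-wide pool has before scheduling stateful workloads, you can run pxctl from one of the Portworx pods. The pod name and namespace below are placeholders; the namespace depends on how Portworx was installed.

```
kubectl exec px-cluster-xxxx -n kube-system -- /opt/pwx/bin/pxctl status
```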

How to avoid this error in the future

Here are a few tips to avoid this error in the future:

  • Node Availability and Affinity Settings: Ensure that enough nodes are available in the cluster and add more if required. The node affinity settings for the PVs should be configured correctly to match the nodes where pods are scheduled to avoid conflicts leading to unbound claims.
  • Optimizing Pod Priority, Preemption, and Disruption Budgets: When using pod priority and preemption, set the priority levels of your pods correctly so that lower-priority pods can be preempted when resources are needed. Any PodDisruptionBudgets (PDBs) should allow the necessary evictions so critical workloads can access required resources without constraints. When working with database StatefulSets, a PDB is useful for keeping the minimum number of pods needed for data availability; otherwise, evictions could cause downtime for the database as well.
  • Set Resource Requests and Limits: Define PVCs with an explicit storageClassName and realistic storage requests, and configure cluster autoscaling to add nodes when limits are reached. Additionally, avoid overcommitting resources in the cluster.
  • Enable Dynamic Provisioning: Use a StorageClass that supports dynamic provisioning of volumes. Portworx easily handles growing storage demands by dynamically creating volumes on demand.
  • Monitor and Optimize PVC Binding: Continuously monitor node utilization and storage pool capacity with tools like Prometheus and Grafana. Additionally, set up alerts for storage thresholds and configure the PVC binding mode (Immediate or WaitForFirstConsumer) to align with workload needs and ensure efficient resource usage (see the sketch after this list).
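The binding mode is set on the StorageClass. With WaitForFirstConsumer, binding and provisioning are delayed until a pod that uses the PVC is scheduled, so the volume is created where the pod actually lands. The class name below is hypothetical, and the provisioner should match your storage provider.

```
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: wffc-sc                   # hypothetical name
provisioner: pxd.portworx.com     # change it based on your storage provider
volumeBindingMode: WaitForFirstConsumer
EOF
```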

Troubleshooting the “Pod has unbound immediate persistent volume claims” error can be a time-consuming and error-prone process. Analyzing your resource requirements and following preventive measures can significantly minimize the chances of encountering this error.

Once your volumes are bound and the errors are gone, and you are looking for more capabilities for your stateful applications, Portworx offers features like replication, encryption, and performance tuning for better handling of stateful workloads in Kubernetes.

Get started with Portworx today
