
While Kubernetes is great for stateless applications, its pods are ephemeral: they are recreated or terminated in response to resource constraints, node failures, or scaling operations, and a restarted pod may not retain its data or state. Kubernetes uses StatefulSets to manage stateful applications; in this article, we’ll explore the core concepts of StatefulSets and dive into related use cases and best practices.

Introduction to StatefulSets in Kubernetes

Applications like databases, message brokers, and distributed systems require persistent storage and ordered deployment and scaling. To achieve this, they must maintain state across pod restarts and keep a unique identity. Deployments in Kubernetes provide automatic scaling capabilities but offer no control over pod identities, which stateful applications need.

Our article, Learning to Love Stateful Apps in Kubernetes, helps address deployment challenges, like gracefully managing creation or termination of pods to ensure the application’s state is preserved.

What is a StatefulSet?

StatefulSet is a Kubernetes API object that ensures each pod has a persistent identity. It assigns a unique identifier, such as a stable hostname and dedicated volume, to each pod, and this identity persists even if the pod is recreated.

Importance of StatefulSets in Kubernetes

StatefulSets are crucial for stateful applications because:

  • Most replicated databases have a primary node. StatefulSets ensure that the pod serving as the primary has a unique, stable identity for consistent communication.
  • Applications are regularly maintained and updated. StatefulSets support rolling updates that replace pods one at a time, waiting for each updated pod to become ready before moving on, which keeps the application stable.
  • StatefulSets ensure that a pod’s lifecycle does not cause disruptions or impact data consistency by associating each pod with a PersistentVolume (PV). This helps databases manage highly available replicas and allows applications like message queues to maintain their state.

Core Concepts of StatefulSets

Persistent Storage

StatefulSets ensure persistent storage by associating each pod with a PersistentVolume (PV) provisioned dynamically based on the StorageClass defined in the PersistentVolumeClaim (PVC) request. This association is maintained even if a pod is rescheduled, ensuring data remains intact and accessible. Unlike stateless deployments where storage can be transient, StatefulSets ensure that data critical to your application is consistently available.

Ordered Deployment and Scaling

Many stateful applications require a precise deployment of pods to maintain consistency and avoid race conditions. For example, a primary database instance must be up and running before replicas can sync. The ordered deployment of pods in a StatefulSet helps ensure they start in the correct order.

The same is true for scaling – a stateful application might require, for example, that a leader is elected before a new pod is scheduled. StatefulSets follow an ordered approach, ensuring that any new pod added to the set is fully operational and integrated into the application stack before the next one is created.

Stable Network Identities

Stable network identities are derived from the pod name and its ordinal index. The StatefulSet controller creates the corresponding DNS records (through a headless Service) and binds each one to a specific pod. By assigning stable network identities, StatefulSets ensure that applications always connect to the correct pod, even in a dynamic environment.

Suppose you’re deploying a database cluster in which each pod has specific roles such as primary, replica, or secondary node. To ensure each pod can be consistently addressed by its peers, StatefulSet can be linked with a headless Service. Each pod in the StatefulSet receives a stable network identity based on its ordinal index, such as db-0 or db-1. If a pod is restarted or rescheduled, its DNS record remains unchanged, allowing other nodes or services to address it consistently.
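For illustration, each pod’s stable DNS name follows the pattern `<pod-name>.<service-name>.<namespace>.svc.cluster.local`. Assuming a headless Service named `db-service` in the `default` namespace, the two pods above would be reachable at:

```
db-0.db-service.default.svc.cluster.local
db-1.db-service.default.svc.cluster.local
```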

Ordered and Graceful Termination

When a pod in a StatefulSet is terminated, Kubernetes ensures that termination occurs in reverse ordinal order, starting with the highest ordinal pod (from N-1 down to 0). This process prevents disruption by gracefully shutting down each pod and ensuring that any dependent services or connections are properly closed before moving to the next pod in the sequence.
StatefulSets do not automatically delete the associated PVs when a pod is terminated. This provides stability to stateful applications, maintains data integrity, and ensures each component is properly decommissioned in a controlled and predictable manner.
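If you do want Kubernetes to clean up PVCs automatically, recent Kubernetes versions add an optional `persistentVolumeClaimRetentionPolicy` field to the StatefulSet spec. The sketch below assumes your cluster version supports this field, and the values shown are illustrative:

```
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  # serviceName, replicas, selector, template, volumeClaimTemplates ...
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # remove PVCs when the StatefulSet itself is deleted
    whenScaled: Retain    # keep PVCs for pods removed by a scale-down
```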

Volume Claim Templates

Volume claim templates are essential for managing storage in StatefulSets, and act as blueprints defining each pod’s StorageClass, access modes, and capacity. When a StatefulSet is deployed, Kubernetes automatically creates PVCs based on these templates, ensuring each pod gets dedicated persistent storage.
Volume claim templates give each pod its own PVC created from the template, and a StatefulSet can define multiple templates if pods need more than one volume. Kubernetes also ensures that PVCs are correctly bound to PVs, maintaining data integrity. As your application scales, Kubernetes provisions the necessary storage automatically, making it easier to manage and scale stateful applications.


```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: px-csi-db-encrypted-k8s
```


Example PVC template
In this configuration, the PVC allows a pod to request and use persistent storage. This PVC template:

  • Sets the PVC to be mounted as read-write by a single node at a time
  • Defines the amount of persistent storage (10Gi) that the PVC requests
  • Uses `px-csi-db-encrypted-k8s` StorageClass to provision storage

For more information, watch Real-World Guidance for Stateful Kubernetes.

Setting Up a StatefulSet

Let’s set up a StatefulSet by deploying a replicated, persistent database on a Kubernetes cluster. We’ll configure MongoDB with three replicas, where all nodes are available for reading, and one node handles writing from an external application.

Prerequisites

  • Create a Kubernetes cluster on the local machine or use a cluster from a managed provider.
  • Ensure that `StorageClass` is applied to your cluster.
    
    ```
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: px-csi-db-encrypted-k8s
    provisioner: pxd.portworx.com
    parameters:
      repl: "3"
      secure: "true"
      io_profile: auto
      io_priority: "high" 
      cow_ondemand: "false"
      csi.storage.k8s.io/provisioner-secret-name: px-secret
      csi.storage.k8s.io/provisioner-secret-namespace: kube-system
      csi.storage.k8s.io/node-publish-secret-name: px-secret
      csi.storage.k8s.io/node-publish-secret-namespace: kube-system
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
    allowVolumeExpansion: true
    ```
    
    

    This YAML configuration will:

    • Create a `StorageClass` with the name `px-csi-db-encrypted-k8s`.
    • Use Portworx as a storage provisioner that dynamically provides persistent volumes.
    • Set storage volumes to have 3 replicas for high availability and to be encrypted.
    • Set `reclaimPolicy` to `Delete` so that when the associated PVCs are deleted, the PVs created with the StorageClass are also deleted.
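
    You can apply the StorageClass and confirm it exists before creating the StatefulSet (the file name below is an assumption):

    ```
    kubectl apply -f px-csi-db-encrypted-k8s.yaml
    kubectl get storageclass px-csi-db-encrypted-k8s
    ```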

    Creating a StatefulSet

    Let us configure a StatefulSet using the StorageClass we created earlier:

    
    ```
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mongodb
      labels:
        app: mongodb
    spec:
      serviceName: "mongodb-service"
      replicas: 3
      selector:
        matchLabels:
          app: mongodb 
      template:
        metadata:
          labels:
            app: mongodb 
        spec:
          containers:
          - name: mongodb 
            image: mongo:4.4 
            ports:
            - containerPort: 27017 
            volumeMounts:
            - name: mongo-persistent-storage # Name of the volume to mount
              mountPath: /data/db # Path to mount the volume inside the container
            env:
            - name: MONGO_INITDB_ROOT_USERNAME
              value: root 
            - name: MONGO_INITDB_ROOT_PASSWORD
              value: test
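              # Note: plain-text credentials are for demonstration only; use a Kubernetes Secret in production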
      volumeClaimTemplates:
      - metadata:
          name: mongo-persistent-storage # Name of the volume claim template
        spec:
          accessModes: [ "ReadWriteOnce" ] # Access mode for the volume
          resources:
            requests:
              storage: 10Gi # Storage size requested
          storageClassName: px-csi-db-encrypted-k8s # StorageClass to use for the volume
    ```
    
    

    In this configuration, we have:

    • Defined the number of replicas for the database and image to use for MongoDB
    • Configured the credentials of the database
    • Set the name of the volume claim template and the StorageClass to use for the volumes.
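
    Save the manifest to a file and apply it to the cluster (the file name below is just an example):

    ```
    kubectl apply -f mongodb-statefulset.yaml
    ```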

    Once the StatefulSet is successfully applied on the cluster, it will:

    • Create a StatefulSet with 3 pods.
    • Assign pod names in an ordered manner: `<statefulset-name>-<ordinal>` (for example, `mongodb-0`, `mongodb-1`, `mongodb-2`).
    • Allocate a unique PVC, and therefore dedicated persistent storage, to each pod.
    • Define a unique DNS name for each pod: `<pod-name>.mongodb-service.<namespace>.svc.cluster.local`.

      Accessing Pods in a StatefulSet

      You can access the pods in a StatefulSet using the following commands.

      1. Get the list of the pods.

      ```
      kubectl get pods -l app=mongodb
      ```

      2. Interact with the pod using this command:

      ```
      kubectl exec -it <pod-name> -- /bin/sh
      ```


      Note: Replace the shell with `/bin/bash` if `sh` is unavailable.

      Tip: To make the database accessible from a stateful application or browser, you will need to create a headless service plus separate services for read and write operations. Apply the manifest on a Kubernetes cluster and check whether the `mongodb` StatefulSet is created.

      
      ```
      $ kubectl get statefulsets
      NAME      READY   AGE
      mongodb   3/3     2m
      ```
      
      

      Once the StatefulSet is created, check whether the pods are ready.

      
      ```
      $ kubectl get pods -A
      NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE
      default       mongodb-0                          1/1     Running   0             2m
      default       mongodb-1                          1/1     Running   0             2m
      default       mongodb-2                          1/1     Running   0             2m
      ```
      
      

      Once the command executes successfully, you can directly interact with the container in the pod using the command line shell. While this is just a basic setup for StatefulSet in Kubernetes, you can also protect your cluster from ransomware attacks with PX-Backup.

      StatefulSet Use Cases

      In the previous section, we saw a database use case for StatefulSets with MongoDB. Similarly, many applications rely on stable DNS names for internal communication or stateful workloads, and changes in pod identity caused by (re)scheduling could create conflicts or disrupt communication. Here are some StatefulSet use cases.

      Applications That Require Stable Identities

      Applications get a stable identity when the StatefulSet assigns a specific DNS name to each pod. Some applications that need stable identities for service discovery or leader election are:

      • Databases: For inter-node communication and data replication, databases like MongoDB and Cassandra require a stable network identity.
      • Message Queues: Apache Kafka brokers maintain logs and require a stable identity for replication management and partition leadership.
      • In-memory Storage: In a Redis cluster, each node must know the others to manage key distribution and maintain data integrity, which requires stable identities.

      Applications Needing Persistent Storage

      To maintain data consistency, recover from failures, and scale with dedicated storage for each instance, applications need persistent storage. Here are some examples:

      • Databases: Applications that store user details in MySQL, PostgreSQL, or MariaDB need persistent storage for managing login credentials. Any data loss could not only lead to a bad user experience but also compromise security.
      • Logging Systems: Elasticsearch needs persistent storage for storing its indices. Similarly, monitoring tools like Prometheus or Grafana need data availability for long-term metrics analysis.
      • Multi-tenant WordPress: Large deployments like Multi-Tenant WordPress on GKE require a highly available database that scales vertically. They also need fast “shared” or “multi-writer” persistent volumes for file uploads, as the WordPress PHP container is scaled horizontally.

      Managing StatefulSets

      Due to their stateful nature, managing StatefulSets requires careful planning and consideration. Let’s examine an efficient way of managing them.

      Scaling a StatefulSet

      StatefulSets can be scaled up or down using any one of the following methods:

      • Using StatefulSet YAML configuration: In the existing StatefulSet configuration, update the value of `spec.replicas` to the desired number and apply it to the cluster.

        In the example below, we have updated the number of replicas to 6.

        
        ```
        apiVersion: apps/v1
        kind: StatefulSet
        metadata:
          name: mongodb
          labels:
            app: mongodb
        spec:
          serviceName: "mongodb-service"
          replicas: 6
        ```
        

      • Using kubectl scale: kubectl provides a `scale` command to directly update the number of replicas from the command line. You must provide the name of the StatefulSet and the desired number of replicas.

        
        ```
        kubectl scale statefulset mongodb --replicas=2
        ```
        

      Kubernetes gracefully terminates the excess pods. The associated PVCs are retained by default when you scale down; if you later delete the PVCs, the underlying PVs are retained or deleted based on the StorageClass `reclaimPolicy`.

      Updating a StatefulSet

      To minimize downtime and maintain application stability during updates, here are some best practices:

      • Use rolling updates: Kubernetes supports rolling updates for StatefulSets, updating one pod at a time to maintain high availability. If an updated pod ends up in a pending state, the issue can be debugged before the rollout continues.
      • Version control manifests: Apply version control to your infrastructure code to track changes and allow rollback in case of issues.
      • Monitor pod health: After applying updates to a pod, use observability solutions to confirm the pod is healthy before rolling the update out to the remaining pods (see the commands after this list).
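
      For example, you can watch an update roll out and review its revision history with kubectl (using the StatefulSet from this article):

      ```
      kubectl rollout status statefulset/mongodb
      kubectl rollout history statefulset/mongodb
      ```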

    Deleting a StatefulSet

    For graceful termination of pods without leaving orphaned resources, it’s recommended to first scale the StatefulSet down to 0 replicas. If you want to delete the StatefulSet and its PVCs entirely, use the following commands:

    
    ```
    kubectl delete statefulset <statefulset-name>
    kubectl delete pvc <pvc-name>
    ```
    
    

    You can also use the manifest to delete the StatefulSet:

    
    ```
    kubectl delete -f <statefulset-manifest.yaml>
    ```
    
    

    Deleting the StatefulSet this way does not remove its PVCs, so you will need to delete them manually.

    Handling Failures and Recovery

    Here are some strategies and considerations for smooth recovery and failure management:

    • Backup and restore: Enable regular backup of data associated with the StatefulSet, and ensure you have a standard operating procedure defined to restore from backup.
    • Pod Disruption Budgets (PDBs): Set PDBs for voluntary disruptions to ensure a minimum number of pods always remains operational (see the example after this list).
    • Monitoring and alerts: Set up monitoring and observability tools like Grafana and Prometheus, and enable alerts to continuously check the health and status of the StatefulSet pods.
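
    For example, a minimal PodDisruptionBudget for the MongoDB StatefulSet used earlier might keep at least two of the three replicas available during voluntary disruptions (adjust `minAvailable` to your replication requirements):

    ```
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: mongodb-pdb
    spec:
      minAvailable: 2        # keep at least two MongoDB pods running during voluntary disruptions
      selector:
        matchLabels:
          app: mongodb       # matches the StatefulSet's pod labels
    ```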

    Portworx Enterprise offers flexible, persistent Kubernetes storage. It helps maximize application performance, optimize infrastructure costs, and automate storage management.

    Advanced Features of StatefulSets

    Pod Management Policies

    StatefulSets provide options for pod ordering while preserving pod uniqueness. Two policies can be set via the `spec.podManagementPolicy` field.

    • OrderedReady: The default policy. Pods are created sequentially, and all of a pod’s predecessors must be up and running before it is launched; when a pod is terminated, all of its successors must be shut down first.
    • Parallel: Ensures that all pods start or terminate in parallel instead of waiting for each pod to be up and running or completely terminated. This applies only to scaling operations, not updates (see the excerpt after this list).
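
    For illustration, here is a minimal excerpt showing where the policy is set (remaining StatefulSet fields omitted):

    ```
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mongodb
    spec:
      podManagementPolicy: Parallel   # the default is OrderedReady
      # serviceName, replicas, selector, template, volumeClaimTemplates ...
    ```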

    Rolling Updates in StatefulSets

    The `spec.updateStrategy` field lets you set the strategy type to `OnDelete` or `RollingUpdate`. The `OnDelete` option is preferred when you want to delete each pod manually; the controller then creates an updated pod in its place. `RollingUpdate` is the default option and automates pod deletion: the controller deletes one pod at a time, creates an updated replacement, and then moves on to the next.

    To partition the `RollingUpdate` strategy, set the value of `spec.updateStrategy.rollingUpdate.partition`. All pods with an ordinal greater than or equal to the partition value are updated, while pods with a lower ordinal are not; even if one of those pods is deleted and recreated, it comes back at the previous version.
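
    A minimal excerpt of how this looks in the StatefulSet spec (the partition value here is illustrative):

    ```
    spec:
      updateStrategy:
        type: RollingUpdate
        rollingUpdate:
          partition: 2   # only pods with ordinal >= 2 (e.g., mongodb-2) are updated
    ```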

    Headless Services for StatefulSets

    Stateful applications require direct communication between pods in a StatefulSet, so a headless service is recommended instead of a load balancer. A headless service is used to address each instance individually in a StatefulSet. Since each pod gets a sticky identity in a StatefulSet, a headless service ensures direct communication to maintain state and consistency.

    Here is an example of a headless service for the MongoDB use case:

    
    ```
    apiVersion: v1
    kind: Service
    metadata:
      name: mongodb-service
      labels:
        app: mongodb
    spec:
      ports:
      - port: 27017
        name: mongodb
      clusterIP: None
      selector:
        app: mongodb
    ```
    

    Here, `spec.clusterIP` is set to `None` to create a headless service. The `mongodb-service` Service will be used to manage the network domain for the StatefulSet.
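
    To sanity-check the resulting per-pod DNS records, you can run a short-lived pod with DNS tools; this is an illustrative example using a public busybox image and assuming the `default` namespace:

    ```
    kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
      nslookup mongodb-0.mongodb-service.default.svc.cluster.local
    ```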

    Best Practices For StatefulSets

    When building and deploying stateful applications, managing persistent storage and ensuring stable network identities is critical. Let's look at some more best practices:

    Storage Considerations

    Ensuring persistent storage is correctly configured and managed in a StatefulSet is critical for data consistency and reliability. Below are some tips to manage storage efficiently.

    • Each pod in a StatefulSet must have its own PVC configured to ensure data isolation and persistence across pod restarts.
    • Choose a storage class that supports dynamic provisioning and meets your application’s performance requirements; if it allows volume expansion, you can also grow volumes later (see the example after this list).
    • Devise and implement a robust backup and restore strategy for your PVs.
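
    Because the StorageClass used in this article sets `allowVolumeExpansion: true`, you can grow an existing pod’s volume by patching its PVC. PVC names follow the `<claim-template-name>-<statefulset-name>-<ordinal>` pattern, and the size below is illustrative:

    ```
    kubectl patch pvc mongo-persistent-storage-mongodb-0 \
      -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
    ```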

    Network Configuration

    Configuring the network correctly ensures efficient communication and service discovery within the cluster. Here are some tips:

    • Use stable network names for each pod in a StatefulSet to ensure consistent network identity.
    • Use headless service to enable direct access to pods. This allows clients to interact directly with a pod.
    • Leverage Kubernetes Network Policies to control traffic flow between pods, which also enhances security (see the sketch after this list).
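
    As an illustrative sketch, the following NetworkPolicy allows only pods labeled `app: backend` (an assumed label for the client application) to reach the MongoDB pods on port 27017:

    ```
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: mongodb-allow-backend
    spec:
      podSelector:
        matchLabels:
          app: mongodb            # applies to the MongoDB pods
      policyTypes:
        - Ingress
      ingress:
        - from:
            - podSelector:
                matchLabels:
                  app: backend    # assumed label of the client application
          ports:
            - protocol: TCP
              port: 27017
    ```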

    Monitoring and Logging

    Observability is critical to the health and performance of stateful applications. Effective logging and monitoring help detect issues early and improve application reliability. Here are some observability best practices:

    • Collect metrics related to resource usage (CPU, RAM), storage, and networking to identify potential bottlenecks. Leverage tools like Prometheus and Grafana to collect and visualize metrics.
    • Implement centralized logging to collect logs from all the pods within a StatefulSet. This ensures that all the logs are easily searchable and helps in troubleshooting.
    • Set up alerts on critical metrics and log patterns to address potential issues proactively. Depending on the criticality, configure alerting tools to send alerts over Slack, emails, or even phone.

    Common Issues And Troubleshooting Tips

    Let’s look at some common problems and associated troubleshooting tips.

    Pods Stuck In Pending State

    When pods can’t find a suitable node to schedule on, they go into a Pending state. To fix this, look at resource usage such as CPU, RAM, and storage on the nodes, and adjust the nodes or the pod’s resource requests accordingly.
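
    A couple of commands usually reveal why a pod is stuck in Pending (`kubectl top` requires the metrics-server add-on):

    ```
    kubectl describe pod mongodb-0   # the Events section explains why the pod cannot be scheduled
    kubectl top nodes                # node CPU and memory usage (requires metrics-server)
    ```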

    Network Identity Conflicts

    This issue arises when different pods attempt to use the same network identity or hostname. To fix this, verify the headless service configuration and network policies to ensure that each pod in a StatefulSet has a unique network identity.

    Persistent Volume Issues

    When Persistent Volumes (PVs) don’t bind to Persistent Volume Claims (PVCs), this leads to data persistence issues. Verify the PVCs' storage class and PV configuration and ensure the PVs adhere to the requested storage capacity and access modes.
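
    For example, inspecting the claim’s events and the available StorageClasses often shows why binding fails:

    ```
    kubectl get pvc                                           # shows Pending vs. Bound claims
    kubectl describe pvc mongo-persistent-storage-mongodb-0   # events explain binding or provisioning errors
    kubectl get storageclass                                  # confirm the requested StorageClass exists
    ```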

    Pod Restart Loops

    Pods may enter a restart loop due to misconfigurations, resource limits, or application errors. Check pod logs and Kubernetes events to identify the root cause and adjust resources and application configuration.
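
    For instance, the logs of the previously crashed container and recent events usually point to the root cause:

    ```
    kubectl logs mongodb-0 --previous       # logs from the last terminated container instance
    kubectl get events --sort-by=.metadata.creationTimestamp
    ```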

    Node Affinity and Anti-Affinity Conflicts

    Pods may fail to schedule if there are conflicting node affinity or anti-affinity rules. Verify and adjust these rules to ensure they are not overly restrictive, allowing pods to schedule on nodes with sufficient resources.

    Alternatives to StatefulSets

    While StatefulSets are ideal for managing stateful applications, there are scenarios where they are unnecessary; Deployments, DaemonSets, and ReplicaSets may suffice for simpler use cases. Let’s compare these options.

    Difference Between StatefulSets and Deployments

    StatefulSets are used for applications requiring stable identities and persistent storage, while Deployments are better suited to stateless applications.

    | Feature | StatefulSets | Deployments |
    | --- | --- | --- |
    | Pod Identity | Stable, unique identities for each pod | Pods are interchangeable, with no stable network identities |
    | Storage Persistence | Uses PVCs to maintain storage across pod rescheduling | Supports shared storage, but no persistent association between pod and storage |
    | Pod Order | Maintains strict order during scaling, updates, and rescheduling | No order is maintained |
    | Scaling Behavior | Pods are scaled in a defined order | Pods are scaled without any specific order |
    | Rolling Updates | Updates occur sequentially to ensure that at least one pod is always available, maintaining state | Updates are applied simultaneously or in batches, with no specific order |
    | Pod Communication | Uses a headless service for direct pod-to-pod communication | Typically uses a standard service for load-balanced traffic without direct pod-to-pod communication |
    | Use Case | Stateful applications like databases and messaging queues that require stable storage and network identities | Stateless applications like web servers, where pods can be replaced without affecting overall functionality |

    Difference Between StatefulSets and DaemonSets

    StatefulSets and DaemonSets each serve a distinct purpose; let’s help you choose the right one for your needs.

    | Feature | StatefulSets | DaemonSets |
    | --- | --- | --- |
    | Pod Identity | Stable and unique for each pod | Identical pods across all nodes |
    | Node Distribution | Not node-specific; pods are scheduled anywhere | Ensures one pod per node |
    | Pod Order | Maintains order during scaling and updates | No fixed order; pods start as nodes become available |
    | Scaling | Maintains order during scaling, ensuring one pod is always available | Pods are added or removed as nodes join or leave the cluster, with no specific scaling order |
    | Networking | Uses stable network identities for pod communication, often with a headless service | Typically used with a standard service, focusing on uniform distribution rather than communication |
    | Use Case | Stateful applications (example: databases) | Background services (example: logging, monitoring) |

    Difference Between StatefulSets and ReplicaSets

    StatefulSets and ReplicaSets manage pods in a Kubernetes cluster, but suit different needs.

    | Feature | StatefulSets | ReplicaSets |
    | --- | --- | --- |
    | Pod Identity | Stable and unique for each pod | Interchangeable, with no stable identity |
    | Storage | Persistent, with dedicated PVCs | Typically does not manage persistent storage |
    | Scaling/Updates | Pods are scaled and updated sequentially, maintaining state | Pods are scaled and updated simultaneously, focusing on maintaining the desired replica count |
    | Networking | Uses stable network identities and often requires direct pod-to-pod communication | Typically used with a standard service, with no need for stable network identities |
    | Use Case | Stateful applications (example: databases) | Stateless applications (example: web servers) |

    Conclusion

    Kubernetes provides robust container orchestration, but managing stateful applications requires careful attention to the challenges of pod ephemerality. StatefulSets offer a reliable way to handle these complexities, allowing developers to manage stateful workloads while preserving application state and data integrity in dynamic environments. Understanding these concepts is essential for building resilient, stateful applications in Kubernetes.