Learning To Love Stateful Apps In Kubernetes

Developers are always looking to be more agile in their software deployments. Containerization provides a powerful way to develop modular, highly scalable microservices rapidly. Developers now overwhelmingly turn to Kubernetes when they need to deploy containers.

However, stateful architectures are inherently more complicated to build than stateless ones, and the temporary nature of containers further magnifies the challenges of building resilient stateful apps.

In this article, we’ll explore how to use Kubernetes’s built-in tools to build and manage stateful applications. We’ll address in detail the role stateful apps play in the Kubernetes landscape and the challenges you’ll face when developing these apps. We’ll also examine some components and services you’ll need to run a stateful database using MySQL.

Kubernetes and Statefulness

An app is considered stateful when it stores information that can be read later. So, stateful apps must store data from client sessions, usually on the server-side, and reliably retrieve it for subsequent interaction. Databases present a classic example of a stateful system.

Stateless apps don’t require server-side processes to store data, so Kubernetes is free to reschedule their containers and scale resources without impacting the user’s experience. Instead, stateless apps store state information on the client side, then pass this data to the server when it’s needed. A web server and a Google search app are archetypal stateless systems.

Developers building RESTful APIs for their stateless apps have embraced Kubernetes. Google developed it as an open-source platform for orchestrating, deploying, and managing containers, and it has since become the dominant container orchestration and microservice platform.

However, Kubernetes presents a challenging environment for database development, as containers aren’t generally well-suited for stateful applications. A Pod’s ephemeral storage doesn’t outlive the Pod, so you should use persistent volumes to retain data.

Key Concerns

Keep several core concerns in mind while making your architectural decisions.

To deliver resilient stateful apps, including those built on databases like Redis, make sure persistent storage is available within the architecture. Unlike stateless applications, which are portable and adaptive, databases must maintain a steady connection and precise coordination with the services relying on their information.

You also need to ensure service discovery while maintaining the cloud-native architecture’s scalability. Additionally, weigh how your choice of storage providers may impact your data mobility.

Fortunately, you can overcome the challenges associated with statefulness and databases in Kubernetes by using the platform’s built-in tools. First, let’s examine the storage challenge.

Persistent Storage Challenges

It’s relatively easy to deploy stateless applications. Every time you start a stateless app, it presents the same data and functionality, and sessions don’t depend on information from previous transactions. So, they don’t require persistent storage. Additionally, you can quickly redeploy app components when they fail by simply starting a new instance.

In contrast, stateful applications need persistent storage that can survive the lifecycle of a Pod. Because Kubernetes was originally built for stateless applications, it was challenging to manage and scale stateful deployments until fairly recently. Kubernetes now offers native tools for dynamically provisioning volumes for stateful applications.

Another consideration you’ll need to factor in is how to keep configuration files and metadata available whenever containers restart. Losing your app’s Cassandra or Postgres data every time a container restarts isn’t a viable way to run mission-critical applications.

Also, consider legacy architecture design. For many years, companies have relied on a monolithic approach to data management. They store and manage databases in monolithic architectures from a central point. This model has struggled to keep up with the highly granular nature of developing containerized environments.

However, as more enterprises have adopted a cloud-native approach to building systems of all types, Kubernetes contributors have expanded its scope by adding technology to accommodate stateful applications. This technology allows you to combine the flexibility and portability of containerization with the benefits of persistent storage in stateful contexts.

So, how do you strike a balance between persistent storage and the ephemeral nature of containerization?

The Native Approach to Building Stateful Apps in Kubernetes

Over the years, the Kubernetes community noted these challenges and addressed them by adding new storage and workload objects. Kubernetes now has features like PersistentVolumes, Operators, StatefulSet controllers, and DaemonSet controllers to address running persistent applications with high availability, self-healing capabilities, and snapshots. These solutions let you run NoSQL databases such as Cassandra and relational databases such as MySQL and PostgreSQL with built-in Kubernetes objects.

Let’s examine these features in more detail and explore how they fit together to handle statefulness easily.

PersistentVolumes and PersistentVolumeClaims

A Kubernetes PersistentVolume (PV) is a storage unit that a StorageClass has provisioned dynamically, or an administrator has provisioned manually. PVs ensure the data services your clusters require persist through the container lifecycle. These PVs are storage resources, just as nodes are compute resources.

To access a PV, you create a PersistentVolumeClaim (PVC), which is a request for storage that a Pod then references as a volume. Kubernetes binds the claim to a PV that satisfies it. PVCs can request specific attributes, like a storage class, an access mode, or a capacity.
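As a rough illustration of what a claim carries, the Python sketch below builds a PVC manifest as a dictionary and serializes it to JSON, a format kubectl accepts alongside YAML. The claim name `mysql-data`, the StorageClass name `standard`, and the 10Gi size are placeholder assumptions, not values from this article.

```python
import json


def build_pvc(name, storage_class, size, access_mode="ReadWriteOnce"):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Which StorageClass should provision (or match) the volume.
            "storageClassName": storage_class,
            # ReadWriteOnce: mountable read-write by a single node.
            "accessModes": [access_mode],
            # Minimum capacity the bound PV must offer.
            "resources": {"requests": {"storage": size}},
        },
    }


# Placeholder values, chosen only for illustration.
pvc = build_pvc("mysql-data", "standard", "10Gi")
print(json.dumps(pvc, indent=2))
```

A Pod would then list this claim under its volumes by name, so the Pod spec never needs to know which underlying PV satisfied the request.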

StatefulSet Controllers

A StatefulSet is a workload API object. Its controller scales and manages Pods by assigning each one a unique identifier that persists across rescheduling. Rather than treating Pods as interchangeable, a StatefulSet gives each Pod a stable identity, so clients and peers can address a specific Pod.

Each Pod’s identity consists of an ordinal index, a stable network identity, and stable storage in the form of any PVCs assigned to it. As is the case in a Deployment, the Pods that a StatefulSet manages are replicas based on an identical container spec.

To set the Pod network identities in a StatefulSet, create a headless Service, which you configure in the manifest file by setting its cluster IP to None. You don’t need load balancing, and you don’t want to allocate a cluster IP. Instead, Kubernetes creates Endpoint records for the Service, and the cluster DNS binds each resulting DNS record to a specific Pod.
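The sketch below shows, under stated assumptions, what a headless Service manifest looks like and which per-Pod DNS names result from it. The names `mysql` and `default`, the port 3306, and the `cluster.local` domain (the usual default) are illustrative choices rather than values from this article.

```python
def build_headless_service(name, app_label, port):
    """Return a headless Service manifest (no cluster IP, no load balancing)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # The literal string "None" is what makes the Service headless.
            "clusterIP": "None",
            "selector": {"app": app_label},
            "ports": [{"name": "db", "port": port}],
        },
    }


def pod_dns_names(statefulset, service, namespace, replicas,
                  cluster_domain="cluster.local"):
    """List the stable DNS name of each Pod: <set>-<ordinal>.<service>.<ns>.svc.<domain>."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


service = build_headless_service("mysql", "mysql", 3306)
names = pod_dns_names("mysql", "mysql", "default", 3)
for dns_name in names:
    print(dns_name)
```

Because each name embeds the ordinal, a client can keep pointing at the same Pod name even after that Pod is rescheduled to another node.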

StatefulSets also deploy Pods sequentially by default. They assign each Pod an ordinal value and only create the next Pod once the previously numbered Pod is Running and Ready.

When you scale down a StatefulSet, Kubernetes gracefully terminates Pods in reverse ordinal order. Deleting the StatefulSet itself gives no ordering guarantee, so scale it to zero first if you need an orderly shutdown. By default, neither operation deletes the volumes associated with the Pods. Although this approach potentially introduces manual overhead when cleaning up, the tradeoff is increased data safety, which is a high priority in stateful apps.
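To make the ordering concrete, here is a small Python model of it. This is a simplified sketch, not the controller’s implementation: it assumes the default ordered Pod management and default volume retention, and the names `mysql` and `data` are hypothetical.

```python
def scale_up_order(name, current, target):
    """Pods are created one at a time in ascending ordinal order."""
    return [f"{name}-{i}" for i in range(current, target)]


def scale_down_order(name, current, target):
    """Pods are terminated one at a time in descending ordinal order."""
    return [f"{name}-{i}" for i in range(current - 1, target - 1, -1)]


def retained_claims(claim_template, name, highest_replicas):
    """PVCs are named <template>-<set>-<ordinal> and remain after scale-down."""
    return [f"{claim_template}-{name}-{i}" for i in range(highest_replicas)]


created = scale_up_order("mysql", 0, 3)      # grow from 0 to 3 replicas
removed = scale_down_order("mysql", 3, 1)    # shrink from 3 to 1 replica
claims = retained_claims("data", "mysql", 3)  # claims left behind afterward

print("created:", created)
print("removed:", removed)
print("claims still present:", claims)
```

In this model, scaling back up to three replicas would recreate `mysql-1` and `mysql-2`, which would find their earlier claims waiting for them.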

When combined with each Pod’s unique identifier, this approach makes it straightforward to match PVs to failed and replaced Pods: a replacement Pod keeps the same name and reattaches to the same PVCs. It also ensures that applications remain connected, across nodes, to the storage where their databases reside.

DaemonSet Controllers

A DaemonSet is a workload object that runs a copy of a Pod on the nodes of your Kubernetes cluster. Its controller’s primary role is to ensure that the Pod runs on all nodes, or on a selected subset of nodes, within a cluster.

DaemonSets, a Kubernetes implementation of daemons, are helpful for running tasks that don’t require your intervention. Suitable tasks include log collection, analysis, performance monitoring, and other background processes within nodes.

A DaemonSet ensures each eligible node runs a copy of a Pod. As you add nodes to your cluster, the DaemonSet adds Pods to them, and it cleans up those Pods whenever you remove the nodes.
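The behavior described above amounts to a reconciliation loop that compares the nodes in the cluster with the nodes that already run the daemon Pod. The Python sketch below is a deliberately simplified model of that idea; it ignores node selectors, taints, and tolerations, and the node names are made up.

```python
def reconcile(nodes, nodes_with_pod):
    """Return (nodes needing a new Pod, nodes whose Pod should be cleaned up)."""
    desired = set(nodes)
    actual = set(nodes_with_pod)
    to_create = sorted(desired - actual)  # new nodes without the daemon Pod
    to_delete = sorted(actual - desired)  # Pods left over from removed nodes
    return to_create, to_delete


# node-c just joined the cluster; node-x was removed from it.
to_create, to_delete = reconcile(
    nodes=["node-a", "node-b", "node-c"],
    nodes_with_pod=["node-a", "node-b", "node-x"],
)
print("create Pods on:", to_create)
print("clean up Pods from:", to_delete)
```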

DaemonSets are suitable for running cluster storage daemons on every node, such as the node-level components that back PersistentVolumes and PersistentVolumeClaims.

Some cluster-level operations require you to run a single Pod instance on each node. For example, when a proxy needs to attach NodePort Services to ports within a cluster node, a DaemonSet will deploy Pod resources to each node to facilitate cluster services at a granular level. It’s also a useful tool for running node-level agents that mount volumes and support databases on a set of nodes within stateful Services.

DaemonSet controllers use Pod templates to determine what each Pod should contain. The template specifies the Pod’s labels, which must match the DaemonSet’s selector, along with the containers that run in the Pod and the volumes those containers require.

Common use cases for DaemonSets include deploying the kube-proxy service to run background operations on each node, or deploying other daemons such as collectd, a monitoring daemon for collecting system and application metrics, and Fluent Bit, a log collection and forwarding daemon. Since DaemonSets target background services, you can even deploy a node monitoring daemon that extracts node information from a database.
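As a minimal sketch of such a deployment, the Python below assembles a DaemonSet manifest for a log collector and prints it as JSON. The image name `fluent/fluent-bit`, the `/var/log` host path, and the function name are assumptions made for illustration; a real Fluent Bit deployment would also need configuration and permissions not shown here.

```python
import json


def build_log_daemonset(name, image, host_log_path="/var/log"):
    """Return a DaemonSet manifest that mounts the node's log directory read-only."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # The selector must match the labels in the Pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "volumeMounts": [{
                            "name": "varlog",
                            "mountPath": host_log_path,
                            "readOnly": True,
                        }],
                    }],
                    # hostPath exposes a directory from the node itself.
                    "volumes": [{
                        "name": "varlog",
                        "hostPath": {"path": host_log_path},
                    }],
                },
            },
        },
    }


daemonset = build_log_daemonset("fluent-bit", "fluent/fluent-bit")
print(json.dumps(daemonset, indent=2))
```

Note that the manifest has no replica count: the number of Pods follows the number of eligible nodes.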

Managed Data Services for Stateful Applications

Teams that prefer to offload database management can use alternative data services to handle their stateful workloads. Services like Portworx can accommodate multiple deployments globally with several key benefits:

High volume scalability, enabling you to create many persistent volumes in minutes.
Meeting strict Service Level Agreements (SLAs) with high availability (HA).
Seamless migration and backup for entire applications between clusters in the cloud or across data centers.
Optional synchronous disaster recovery, allowing for a zero recovery point objective (RPO) and recovery time objectives (RTOs) under one minute.

Running Stateful Services in Kubernetes

A stateful environment is ideal for storing MySQL data to maintain consistency and availability across multiple Pods. Examples of tools you need to deploy a stateful app include:

Dynamic volume provisioning with a StorageClass. This approach creates volumes on-demand, so you don’t need to manually create storage volumes from your provider or the cloud. You can also provision volumes statically by using the Kubernetes API to create multiple persistent volumes.
A StatefulSet. You can create a StatefulSet controller from a YAML manifest, which typically also defines a headless Service to enable Pods to communicate with each other and identify their addresses.
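The two pieces listed above can be sketched together as follows. This is a hedged example, not the sample file the article refers to: the provisioner string is a placeholder you would replace with your storage provider’s, the Secret `mysql-secret` is assumed to exist already, and the names, replica count, and sizes are illustrative. The manifests are emitted as JSON, which kubectl accepts in place of YAML.

```python
import json

APP = "mysql"    # used as the StatefulSet name, Service name, and Pod label
CLAIM = "data"   # name shared by the volume mount and the claim template

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "mysql-storage"},
    # Placeholder: substitute the provisioner for your storage provider.
    "provisioner": "example.com/placeholder-provisioner",
}

headless_service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": APP},
    "spec": {
        "clusterIP": "None",  # headless: per-Pod DNS, no load balancing
        "selector": {"app": APP},
        "ports": [{"name": "mysql", "port": 3306}],
    },
}

statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": APP},
    "spec": {
        "serviceName": APP,  # ties Pod DNS names to the headless Service
        "replicas": 3,
        "selector": {"matchLabels": {"app": APP}},
        "template": {
            "metadata": {"labels": {"app": APP}},
            "spec": {"containers": [{
                "name": APP,
                "image": "mysql:8.0",
                "ports": [{"containerPort": 3306, "name": "mysql"}],
                # Root password read from an assumed, pre-existing Secret.
                "env": [{
                    "name": "MYSQL_ROOT_PASSWORD",
                    "valueFrom": {"secretKeyRef": {
                        "name": "mysql-secret", "key": "password"}},
                }],
                "volumeMounts": [{"name": CLAIM, "mountPath": "/var/lib/mysql"}],
            }]},
        },
        # One PVC per Pod is created from this template on demand.
        "volumeClaimTemplates": [{
            "metadata": {"name": CLAIM},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "storageClassName": "mysql-storage",
                "resources": {"requests": {"storage": "10Gi"}},
            },
        }],
    },
}

manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [storage_class, headless_service, statefulset],
}
print(json.dumps(manifests, indent=2))
```

The sketch gives each replica its own volume but does not configure MySQL replication between them; that would need additional setup beyond what is shown.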

Running your databases in Kubernetes has benefits, such as providing a mechanism to run a database per application rather than one large monolithic one, and providing primitives for provisioning custom resources for applications that need persistence. Additionally, Kubernetes’ ability to reschedule Pods when node failure occurs makes it an excellent fit for running mission-critical applications and services that require fast failover. This capability helps achieve high availability and reduces database recovery time.

Multi-cloud Kubernetes is useful for dev, test, backups, and disaster recovery. By spreading your databases across different cloud providers, you gain development flexibility and also reduce the likelihood of one cloud provider becoming a single point of failure.

However, some databases don’t handle failovers well and may cause standby nodes to restart slowly. For high-traffic services, like streaming sites or health care applications, it’s better to include a service like Portworx in your architecture to provide reliable high availability in production.

You can also set up multi-site active-active configurations where you synchronize clusters and distribute workloads in real-time. This ensures mission-critical operations like user validation are always available when needed.

Conclusion

Kubernetes can initially look like a challenging environment for building stateful apps, but it provides native solutions to handle these demands.

StatefulSets help create and maintain persistent applications across a containerized environment by identifying and connecting specific PersistentVolumes to the uniquely identifiable Pods they match. DaemonSets make sure that a copy of a Pod is running on every node, including new nodes added to the cluster.

Despite the native stateful application tools now available in Kubernetes, some workloads require additional tools beyond what Kubernetes offers. Learn more about how Portworx Data Services can help you create a better production-grade data service for Kubernetes.

Ryan Wallner

Portworx | Technical Marketing Manager
