This article is an introductory overview of Kubernetes persistent volumes and aims to answer the following key questions:
To understand exactly what a Persistent Volume (PV) is, it helps to know how Kubernetes manages storage resources. Kubernetes has a matching primitive for each of the traditional storage operational activities (provisioning, configuring, and attaching). These are mapped in the table below:
| Storage activity | Kubernetes storage primitive |
| --- | --- |
| Provisioning | Persistent Volume |
| Configuring | Storage Class |
| Attaching | Persistent Volume Claim |
Kubernetes persistent volumes are administrator-provisioned volumes. They are created with a particular filesystem, size, and identifying characteristics such as volume IDs and names.
A Kubernetes persistent volume has the following attributes
For pods to start using these volumes, the volumes need to be claimed (via a persistent volume claim) and the claim needs to be referenced in the pod spec. A Persistent Volume Claim describes the amount and characteristics of the storage required by the pod, finds a matching persistent volume, and claims it. Storage Classes describe default volume information (filesystem, size, block size, etc.). The image below describes these processes:
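As a rough illustration of the claim-and-reference flow described above, the following sketch shows a claim and a pod that mounts it. All names here (`pvc-demo`, `pod-demo`, the `nginx` image, and the mount path) are placeholders chosen for this example and are not part of the original walkthrough. The claim assumes a pre-provisioned 10Gi `ReadWriteOnce` volume with no storage class.

```yaml
# Hypothetical names throughout: pvc-demo, pod-demo and the nginx image
# are placeholders for illustration only.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  # An empty class asks for a pre-provisioned PV that has no storage class,
  # instead of triggering dynamic provisioning from a default class.
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        # Mount the claimed storage inside the container.
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    # The pod refers to the claim by name, never to the PV directly.
    - name: data
      persistentVolumeClaim:
        claimName: pvc-demo
```

The pod only knows the claim name. Which persistent volume ends up behind the claim is decided when Kubernetes binds the claim, which is what keeps the pod spec portable across storage back ends.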
There are currently two types of storage abstraction available in Kubernetes: Volumes and Persistent Volumes. A Kubernetes volume exists only while the containing pod exists. Once the pod is deleted, the associated volume is also deleted. As a result, Kubernetes volumes are useful for storing temporary data that does not need to exist outside of the pod's lifecycle. Kubernetes persistent volumes remain available outside of the pod lifecycle: the volume remains even after the pod is deleted, the data is retained, and the volume is available for another pod to claim if required.
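For contrast, here is a minimal sketch of a pod-scoped volume, using `emptyDir` as one common example of a volume tied to the pod lifecycle. The pod name, image, and command are hypothetical and only serve the illustration.

```yaml
# Hypothetical example: scratch-demo and the busybox command are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
spec:
  containers:
    - name: app
      image: busybox
      # Write a file into the scratch volume, then idle.
      command: ["sh", "-c", "echo hello > /scratch/out.txt && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    # emptyDir is created when the pod is scheduled to a node and is
    # removed, along with its contents, when the pod is deleted.
    - name: scratch
      emptyDir: {}
```

Deleting this pod discards `/scratch/out.txt`. A pod that mounts a persistent volume claim instead would leave the data in place for the next pod that uses the claim.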
In the earlier days of containerization, some fairly strict rules and best practices were put in place. First and foremost was that containers should be stateless. As Kubernetes has matured and container-native storage solutions (such as Portworx) have been created, there is no real reason to draw the line at stateless applications. You may want to gain the benefits of having your application in a container (fast startup, high availability, self-healing, etc.) but also store, retain, and back up data created or used by that application.
The most common use case for persistent volumes in Kubernetes is databases. A database needs access to its data at all times, and by leveraging PVs, we can start using databases like MySQL, Cassandra, CockroachDB, and even MS SQL for our applications. By ensuring the consistent state of our data, we can start putting complex workloads into containers, and not just stateless, 12-factor style web applications.
By utilizing persistent volumes, we can simplify the deployment of distributed, stateful applications (such as Cassandra DB). When deploying Cassandra, we ensure that the following happens:
These high-level steps are repeated for each pod in the application set and happen in series. This way, we can be confident that the initial pod deploys and has the required storage, and that each additional replica has identical storage attachments and mounts (required for any clustered application). We can easily scale this stateful set of pods and join many more replicas to the distributed application. If any of the pods fail, they can be replaced and have their storage reattached.
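One way to express this pattern is a StatefulSet with a volume claim template, sketched below. This is an assumption-laden illustration rather than a production Cassandra deployment: the image tag, the storage class name `px-storage-class`, the replica count, and the volume size are all hypothetical, and the Cassandra-specific settings needed for the nodes to form a cluster (seed configuration, for example) are left out so that only the storage wiring is shown.

```yaml
# Headless service that gives each StatefulSet pod a stable network identity.
apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  clusterIP: None
  selector:
    app: cassandra
  ports:
    - name: cql
      port: 9042
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 3                     # hypothetical replica count
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: cassandra:4.1    # hypothetical image tag
          ports:
            - name: cql
              containerPort: 9042
          volumeMounts:
            # Each replica mounts its own claim at the data directory.
            - name: cassandra-data
              mountPath: /var/lib/cassandra
  # One claim is created per replica from this template
  # (cassandra-data-cassandra-0, cassandra-data-cassandra-1, ...).
  volumeClaimTemplates:
    - metadata:
        name: cassandra-data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: px-storage-class   # hypothetical storage class
        resources:
          requests:
            storage: 10Gi
```

With the default pod management policy, the StatefulSet controller creates pods one at a time, in order, which matches the in-series behavior described above. A replacement pod keeps the same ordinal name and is bound to the same claim as the pod it replaces, so its storage is reattached.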
Now that we understand what persistent volumes are, how they differ from regular volumes, and why they are used, we can move on to actually using PVs. The following steps create a Kubernetes persistent volume with a set size. This assumes you have a functional Kubernetes environment up and running, with Portworx installed and operational.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <pv_name>
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  portworxVolume:
    volumeID: "<volume_id>"
kubectl create -f pv-demo.yaml
persistentvolume "pv0001" created
kubectl get pv
To confirm this, we can check the storage usage (the method will differ based on the underlying storage used). For Portworx, we can run pxctl status on one of the storage nodes to get the information below:
You will see that 10GiB has been provisioned, spread across the 5 storage nodes in the example Kubernetes environment.
For more information on Kubernetes persistent volume claims, please see this tutorial.