Tutorial – Kubernetes Persistent Volumes

This article is an introductory overview of Kubernetes persistent volumes and should answer the following key questions:

  • What is a Kubernetes persistent volume?
  • How do persistent volumes differ from Kubernetes volumes?
  • Why would you use persistent volumes?
  • How do I get started using persistent volumes?

What is a Kubernetes persistent volume?

To understand exactly what a Persistent Volume (PV) is, it is useful to know how Kubernetes manages storage resources. Kubernetes has a matching primitive for each of the traditional storage operational activities (provisioning, configuring, attaching). These are mapped in the table below:

Storage Activity    Kubernetes storage primitive
Provisioning        Persistent Volume
Configuring         Storage Class
Attaching           Persistent Volume Claim

 

A Kubernetes persistent volume has the following attributes:

  • It is provisioned either dynamically or by an administrator
  • It is created with a particular filesystem
  • It has a particular size
  • It has identifying characteristics such as a volume ID and a name


In order for pods to start using these volumes, the volumes need to be claimed (via a persistent volume claim) and the claim referenced in the spec for a pod. A Persistent Volume Claim describes the amount and characteristics of the storage required by the pod, finds a matching persistent volume and claims it. Storage Classes describe default volume information (filesystem, size, block size, etc.). The image below describes these processes.
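As a minimal sketch of how these three primitives fit together, the manifests below define a storage class, a claim that requests storage from it, and a pod that references the claim in its spec. The names (px-demo-sc, demo-claim, demo-pod), the nginx image, the mount path and the parameter values are hypothetical. The provisioner shown is the in-tree Portworx one, which is an assumption about the environment; a cluster running the Portworx CSI driver would use a different provisioner value.

```yaml
# Storage class: default characteristics for volumes provisioned from it.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-demo-sc                           # hypothetical name
provisioner: kubernetes.io/portworx-volume   # assumption: in-tree Portworx provisioner
parameters:
  repl: "2"                                  # illustrative replication factor
  fs: "ext4"                                 # filesystem for new volumes
---
# Claim: the amount and characteristics of storage the pod needs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim                           # hypothetical name
spec:
  storageClassName: px-demo-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
# Pod: references the claim by name and mounts the claimed storage.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod                             # hypothetical name
spec:
  containers:
    - name: app
      image: nginx                           # any image works; nginx is only a placeholder
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-claim
```

Note that the pod never names a persistent volume directly. It only names the claim, and Kubernetes binds the claim to a suitable volume.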

 

Kubernetes Volumes vs Persistent Volumes

There are currently two types of storage abstractions available with Kubernetes: Volumes and Persistent Volumes. A Kubernetes volume exists only while the containing pod exists. Once the pod is deleted, the associated volume is also deleted. As a result, Kubernetes volumes are useful for storing temporary data that does not need to exist outside of the pod's lifecycle. Kubernetes persistent volumes remain available outside of the pod lifecycle: the volume remains even after the pod is deleted, it is available for another pod to claim if required, and its data is retained.

So how does the usage of Kubernetes volumes differ from Kubernetes persistent volumes? The answer is quite simple: Kubernetes persistent volumes are used in situations where the data needs to be retained regardless of the pod lifecycle. Kubernetes volumes are used for storing temporary data.
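To make the contrast concrete, here is a small sketch of a pod that uses a regular volume of type emptyDir, one of the volume types whose data is removed together with the pod. The pod name, the busybox image, the command and the mount path are hypothetical and chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-pod            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox           # placeholder image
      # Write a file into the volume, then keep the container alive.
      command: ["sh", "-c", "echo hello > /scratch/hello.txt && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}             # temporary storage tied to the pod's lifecycle
```

Deleting this pod also discards /scratch/hello.txt. Replacing the emptyDir entry with a persistentVolumeClaim entry that names a claim is what lets the data outlive the pod.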

Why do we use persistent volumes?

In the earlier days of containerization there were some pretty strict rules and best practices put in place. First and foremost was that containers should be stateless. As Kubernetes has matured and container-native storage solutions (such as Portworx) have been created, there is no real reason to draw the line at stateless applications. You may want to gain the benefits of having your application in a container (fast startup, high availability, self-healing, etc.) but also store, retain and back up data created or used by that application.

 

Persistent Volumes: A use case

The most common use case for persistent volumes in Kubernetes is databases. A database needs to have access to its data at all times, and by leveraging PVs, we can start using databases like MySQL, Cassandra, CockroachDB and even MS SQL for our applications. By ensuring the consistent state of our data, we can start putting complex workloads into containers, and not just stateless, 12-factor style web applications.

 

By utilizing persistent volumes, we can simplify the deployment of distributed, stateful applications (such as Cassandra). When deploying Cassandra, we ensure that the following happens:

  • Each pod is created (with appropriate config and environment variables required)
  • A persistent volume is attached to the pod (via a persistent volume claim)
  • The claimed storage is mounted inside the pod as required.

 

These high-level steps are repeated for each pod in the application set, and happen in series. This way, we can be confident that the initial pod deploys and has the required storage, and that each additional replica has identical storage attachments and mounts (required for any clustered application). We can easily scale this stateful set of pods and join many more replicas to the distributed application. If any of the pods fail, they can be replaced and have their storage reattached.
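The flow above corresponds closely to a Kubernetes StatefulSet with a volume claim template. The following is a rough sketch rather than a production-ready Cassandra deployment: the image tag, the storage class name px-cassandra-sc, the seed address and the sizes are all assumptions, and the storage class must already exist in the cluster.

```yaml
# Headless service that gives each pod a stable DNS name (cassandra-0.cassandra, ...).
apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  clusterIP: None
  selector:
    app: cassandra
  ports:
    - name: cql
      port: 9042
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 3                          # pods are created one after another by default
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: cassandra:3.11        # assumed image tag
          ports:
            - name: cql
              containerPort: 9042
          env:
            - name: CASSANDRA_SEEDS    # config passed through the environment
              value: cassandra-0.cassandra
          volumeMounts:
            - name: cassandra-data     # mounts the claimed storage inside the pod
              mountPath: /var/lib/cassandra
  # One claim is generated per pod, so every replica gets its own volume.
  volumeClaimTemplates:
    - metadata:
        name: cassandra-data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: px-cassandra-sc   # hypothetical; must exist in your cluster
        resources:
          requests:
            storage: 10Gi
```

Because each generated claim is named after its pod (for example cassandra-data-cassandra-0), a replacement pod with the same name picks up the same claim, which is how the storage gets reattached after a failure.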

How to use Kubernetes Persistent Volumes

Now that we have an understanding of what persistent volumes are, how they differ from regular volumes and why they are used, we can move on to actually using PVs. The following set of steps will create a Kubernetes persistent volume with a set size. This assumes you have a functional Kubernetes environment up and running, with Portworx installed and operational.

Creating a persistent volume

  1. Create a file named ‘pv-demo.yaml’ in your editor of choice.
  2. Edit this file, and paste the below spec in.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: <pv_name>
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  portworxVolume:
    volumeID: "<volume_id>"

  3. Replace <pv_name> with the name of the PersistentVolume, for example pv0001.
  4. Change the storage capacity to your desired amount. The above example will create a 10 GiB volume.
  5. Replace <volume_id> with the desired ID of the volume. For example, this could be “pv0001”.
  6. Save this file.
  7. Create the persistent volume with the below command:

kubectl create -f pv-demo.yaml

  8. You should receive the below message:

persistentvolume "pv0001" created

  9. To see this persistent volume, run the below command:

kubectl get pv

  10. You should see something similar to the below (depending on the name of the persistent volume, and the size you chose)

  11. The volume has now been created and is ready for a Persistent Volume Claim to use.
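As a sketch of the next step, a claim along the following lines could bind to the volume created above. This assumes the persistent volume was named pv0001 and kept its 10Gi capacity; the claim name pvc0001 is hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc0001               # hypothetical claim name
spec:
  storageClassName: ""        # empty string: bind to a pre-created PV, skip dynamic provisioning
  volumeName: pv0001          # assumes the PV name used in the steps above
  accessModes:
    - ReadWriteOnce           # must be an access mode the PV offers
  resources:
    requests:
      storage: 10Gi           # must not exceed the PV capacity
```

If this were saved as pvc-demo.yaml (again a hypothetical file name), it could be created with kubectl create -f pvc-demo.yaml, after which kubectl get pv should report the volume as Bound.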

 

To confirm this, we can check the storage usage (this will differ based on the underlying storage used). For Portworx, we can run pxctl status on one of the storage nodes to get the below information:


You will see that 10 GiB has been provisioned, spread across the five storage nodes in the example Kubernetes environment.

 

For more information on Kubernetes persistent volume claims, please see this tutorial.