This article is designed as an initial overview of Kubernetes persistent volumes and should answer the following key questions:

  • What is a Kubernetes persistent volume?
  • How do they differ from Kubernetes volumes?
  • Why would you use persistent volumes?
  • How do I get started using persistent volumes?

What is a Kubernetes persistent volume?

To understand exactly what a Persistent Volume (PV) is, it is useful to know how Kubernetes manages storage resources. Kubernetes has a matching primitive for each of the traditional storage operations (provisioning, configuring, and attaching). These are mapped in the table below:

Storage activity    Kubernetes storage primitive
Provisioning        Persistent Volume
Configuring         Storage Class
Attaching           Persistent Volume Claim

Kubernetes persistent volumes are storage volumes provisioned in the cluster, either manually by an administrator or dynamically through a Storage Class. They are created with a particular filesystem, size, and identifying characteristics such as volume IDs and names.

A Kubernetes persistent volume has the following attributes:

  • It is provisioned either dynamically or by an administrator
  • It is created with a particular filesystem
  • It has a particular size
  • It has identifying characteristics such as volume IDs and a name

For pods to use these volumes, the volumes must be claimed through a persistent volume claim (PVC), and the claim must be referenced in the pod spec. A PVC describes the amount and characteristics of the storage the pod requires; Kubernetes then finds a matching persistent volume and binds the claim to it. Storage Classes describe how volumes are provisioned, including the provisioner to use and parameters such as filesystem and block size. The image below illustrates these processes:

[Image: how Persistent Volumes, Persistent Volume Claims, and Storage Classes work together]
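As a minimal sketch, a claim describing "10 GiB of storage, mountable read-write by a single node" might look like the following. The claim name `demo-pvc` and the StorageClass name `portworx-sc` are hypothetical placeholders; substitute a class that exists in your cluster.

```yaml
# Hypothetical PersistentVolumeClaim: asks for 10 GiB of storage that a
# single node can mount read-write. Kubernetes binds it to a matching PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc                # hypothetical claim name
spec:
  storageClassName: portworx-sc # hypothetical StorageClass name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi             # amount of storage the pod needs
```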

Kubernetes Volumes vs Persistent Volumes

There are currently two types of storage abstractions available in Kubernetes: volumes and persistent volumes. An ephemeral Kubernetes volume, such as an emptyDir volume, exists only while its containing pod exists: once the pod is deleted, the volume and its data are deleted too. As a result, Kubernetes volumes are useful for storing temporary data that does not need to exist outside of the pod's lifecycle. Kubernetes persistent volumes, by contrast, exist independently of the pod lifecycle, so the volume and its data remain after the pod is deleted and can be used by another pod if required.

So how does the usage of Kubernetes volumes differ from that of Kubernetes persistent volumes? The answer is simple: Kubernetes persistent volumes are used when data needs to be retained regardless of the pod lifecycle, while Kubernetes volumes are used for storing temporary data.
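For contrast, here is a minimal sketch of a pod that uses an ephemeral `emptyDir` volume, whose contents are deleted along with the pod. The pod name, image, and command are illustrative assumptions.

```yaml
# Hypothetical Pod with an ephemeral emptyDir volume: the volume is created
# when the Pod is assigned to a node and deleted when the Pod is removed.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo            # hypothetical Pod name
spec:
  containers:
    - name: app
      image: busybox:1.36
      # Writes a temporary file, then idles so the Pod keeps running.
      command: ["sh", "-c", "echo temp > /scratch/data.txt && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}              # ephemeral: lives only as long as the Pod
```

Replacing the `emptyDir: {}` entry with a `persistentVolumeClaim` volume that names an existing claim would make the data outlive the pod.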

Why do we use persistent volumes?

In the earlier days of containerization, some fairly strict rules and best practices were put in place. First and foremost was that containers should be stateless. As Kubernetes has matured and container-native storage solutions (such as Portworx) have been created, there is no real reason to draw the line at stateless applications. You may want the benefits of running your application in a container (fast startup, high availability, self-healing, etc.) while also storing, retaining, and backing up the data created or used by that application.

Persistent Volumes: A use case

The most common use case for persistent volumes in Kubernetes is databases. A database needs access to its data at all times, and by leveraging PVs we can run databases like MySQL, Cassandra, CockroachDB, and even MS SQL for our applications. By keeping our data in a consistent state, we can start putting complex workloads into containers, not just stateless, 12-factor style web applications.

By utilizing persistent volumes, we can simplify the deployment of distributed, stateful applications (such as Cassandra DB). When deploying Cassandra, we ensure that the following happens:

  • Each pod is created (with appropriate config and environment variables required)
  • A persistent volume is attached to the pod (via a persistent volume claim)
  • The claimed storage is mounted inside the pod as required.

These high-level steps are repeated for each pod in the application set, and they happen in series. This way, we can be confident that the initial pod deploys and has the required storage, and that each additional replica has identical storage attachments and mounts (required for any clustered application). We can easily scale this stateful set of pods and join many more replicas to the distributed application. If any of the pods fail, they can be replaced and have their storage reattached.
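The StatefulSet pattern described above can be sketched as follows. This is a minimal, hypothetical manifest: the image tag, the StorageClass name `portworx-sc`, and the headless Service it relies on are assumptions, and the Cassandra-specific configuration and environment variables a real deployment needs are omitted. The key piece is `volumeClaimTemplates`, which makes Kubernetes create one PVC per replica (named `data-cassandra-0`, `data-cassandra-1`, and so on) and reattach the same claim when a pod is replaced.

```yaml
# Hypothetical StatefulSet sketch: one PVC per replica via volumeClaimTemplates.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra             # assumes a headless Service named "cassandra"
  replicas: 3
  podManagementPolicy: OrderedReady  # the default: pods are created one at a time
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: cassandra:4.1       # image tag is an assumption
          volumeMounts:
            - name: data             # matches the claim template name below
              mountPath: /var/lib/cassandra
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        storageClassName: portworx-sc  # hypothetical StorageClass
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

With the default `OrderedReady` pod management policy, replicas are created one at a time, which matches the in-series behavior described above.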

How to use Kubernetes Persistent Volumes

Now that we understand what persistent volumes are, how they differ from regular volumes, and why they are used, we can move on to actually using PVs. The steps in the "Creating a persistent volume" section below create a Kubernetes persistent volume of a set size. They assume you have a functional Kubernetes environment up and running, with Portworx installed and operational.

Persistent Volume Best Practices

Guidelines for creating PVs

Using Kubernetes persistent volumes well starts with how you create them. Here are two tips to keep in mind.

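First, always specify a StorageClass when defining your PVC. If you omit it, the claim falls back to the cluster's default StorageClass (if one is configured); if there is no default, it can only bind to PVs that have no StorageClass and may remain Pending. In addition, always give your StorageClass definitions meaningful names.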

The other tip is to always reference PVCs, never PVs, when configuring containers. You don't want to tie containers to specific volumes, and decoupling them is exactly what PVCs are for.

Use dynamic provisioning whenever possible

From an operational standpoint, it's preferable to create PVs dynamically. Pre-provisioning static PVs for every claim requires manual work and adds overhead, which makes the practice hard to scale. In addition, administrators can cap how much storage dynamic claims consume for each StorageClass by using resource quotas.
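As a minimal sketch of dynamic provisioning, the StorageClass below uses a descriptive name and, as an assumption about your setup, Portworx's in-tree provisioner `kubernetes.io/portworx-volume` with a replication factor set through the `repl` parameter. Clusters using the Portworx CSI driver would name a different provisioner. A PVC that sets `storageClassName: portworx-db-replicated` gets a volume created on demand instead of needing a pre-created PV.

```yaml
# Hypothetical StorageClass for dynamic provisioning with Portworx.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: portworx-db-replicated              # descriptive: says what the class is for
provisioner: kubernetes.io/portworx-volume  # in-tree Portworx provisioner (assumption)
parameters:
  repl: "2"                                 # Portworx replication factor
```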

Plan in advance the amount of storage your container will need

The storage sizes and capacities available in a cluster often vary, for example to account for different node sizes. To optimize usage, know how much space your container will need and request only that amount.

Use resource quotas to limit storage usage

Resource quotas are limits that you set per namespace to control the amount of CPU, memory, and storage that workloads can consume. For storage, a quota can cap the total capacity requested and the number of claims in a namespace, either overall or per StorageClass (for example, per backup or service tier).
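A hypothetical ResourceQuota capping storage in one namespace might look like the following. The namespace `team-a`, the limits, and the StorageClass name `portworx-db-replicated` are illustrative assumptions. The `<class>.storageclass.storage.k8s.io/requests.storage` key limits storage for a single StorageClass.

```yaml
# Hypothetical ResourceQuota limiting persistent storage in one namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-a                  # hypothetical namespace
spec:
  hard:
    requests.storage: 100Gi          # total storage requested by all PVCs
    persistentvolumeclaims: "10"     # maximum number of PVCs
    # Per-StorageClass cap for a hypothetical class:
    portworx-db-replicated.storageclass.storage.k8s.io/requests.storage: 50Gi
```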

Include Quality of Service (QoS) definitions if possible

Some storage platforms accept an additional quality of service (QoS) parameter for volume requests, typically set through StorageClass parameters. It tells the storage layer about the nature of the workload so that it can provision the most suitable persistent storage for that scenario.
For example, if a container requires high read/write throughput, a QoS setting can help direct its volume to SSD-backed storage.
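On Portworx, one way to express this is the `priority_io` StorageClass parameter, as in the sketch below. The class name is hypothetical, the in-tree provisioner name is an assumption about your setup, and other storage platforms use different parameter names, if they support QoS at all.

```yaml
# Hypothetical StorageClass expressing a QoS preference on Portworx.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: portworx-high-io                    # hypothetical class name
provisioner: kubernetes.io/portworx-volume  # in-tree provisioner (assumption)
parameters:
  priority_io: "high"   # request Portworx's high-performance storage pool
  repl: "2"             # replication factor
```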

Lifecycle of Kubernetes PV and PVC

In Kubernetes, PVC requests for a PV resource go through the following process:

Provisioning

Provisioning is the first step in the lifecycle of a claim and a persistent volume. Kubernetes does this in one of two ways:
The first is static provisioning: cluster administrators manually create PVs in advance that represent real storage. These PVs exist in the Kubernetes API and are available for consumption. Because PVs are cluster-scoped rather than namespaced, a static PV can be claimed by a PVC from any namespace in the cluster.
The second method is dynamic provisioning, where the PV storage is created on demand based on a preconfigured StorageClass definition. This happens when the PVC doesn't match any available static PV, provided the claim requests a StorageClass that the administrator has configured.

Binding

Once a PV that matches the PVC request is found, the two are bound together. Here's how that happens.
First, note that when a user creates a PVC, the claim specifies the desired storage size and access modes. A control loop in the Kubernetes control plane continually watches for new PVCs.
What it does next depends on whether the PV was provisioned statically or dynamically.
For static provisioning, the control loop looks for an existing PV that matches the claim's requirements and binds them; if no match exists, the claim remains unbound. If a PV was dynamically provisioned for the PVC, the control loop simply binds the two.
The PVC and PV are bound through the PV's claimRef field (the PVC in turn records the PV's name in its volumeName field), creating a bidirectional one-to-one mapping between the two objects. A bound PV and PVC are therefore exclusive to each other.
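As a hedged sketch of how `claimRef` is used, an administrator can also reserve a PV in advance for one specific claim by setting `claimRef` directly. The names `reserved-pv` and `demo-pvc` and the `default` namespace are hypothetical, and `<volume_id>` is a placeholder.

```yaml
# Hypothetical PV pre-bound to one claim: only the PVC named demo-pvc in the
# default namespace can bind to this volume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: reserved-pv             # hypothetical PV name
spec:
  storageClassName: ""          # static PV with no StorageClass
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  claimRef:                     # reserves this PV for a single claim
    name: demo-pvc
    namespace: default
  portworxVolume:
    volumeID: "<volume_id>"     # placeholder for an existing Portworx volume
```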

Using

Once the PV and PVC are bound, pods can use the storage by referencing the PVC as a volume in their spec. Kubernetes then finds the PV bound to that claim and mounts it into the pod, where it can be used like any other volume.
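A minimal sketch of a pod that mounts storage through a claim follows. The pod name, image, and command are illustrative, and `demo-pvc` is a hypothetical placeholder for an existing, bound PVC in the same namespace as the pod.

```yaml
# Hypothetical Pod that mounts persistent storage through a PVC,
# never by naming a PV directly.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo                # hypothetical Pod name
spec:
  containers:
    - name: app
      image: busybox:1.36
      # Writes a file to the persistent volume, then idles.
      command: ["sh", "-c", "echo persisted > /data/data.txt && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-pvc     # hypothetical claim in the Pod's namespace
```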

Reclaiming

When a user no longer needs the storage, the PVC can be deleted, which releases the bound PV. What happens to the released PV then depends on the reclaim policy (persistentVolumeReclaimPolicy) set in its PersistentVolume definition. There are two common policies.
The first is Retain. Here, the PV is released from the PVC but still holds the data written by the previous pod, so it cannot be claimed by another PVC. An administrator must reclaim it manually: delete the PV object, clean up the data on the underlying storage as needed, and create a new PV if the storage should be reused.
The second policy is Delete. This removes the PV object from Kubernetes together with the associated storage asset on external platforms such as Azure Disk, GCE PD, or AWS EBS.
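For dynamically provisioned volumes, the reclaim policy comes from the StorageClass's `reclaimPolicy` field, which defaults to `Delete`. A minimal, hypothetical class that keeps volumes after their claims are deleted might look like this; the class name is illustrative and the provisioner name is an assumption about your setup.

```yaml
# Hypothetical StorageClass whose dynamically provisioned PVs use Retain
# instead of the default Delete reclaim policy.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: portworx-retain                     # hypothetical class name
provisioner: kubernetes.io/portworx-volume  # in-tree provisioner (assumption)
reclaimPolicy: Retain                       # keep the volume after PVC deletion
```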

Creating a persistent volume

1. Create a file named ‘pv-demo.yaml’ in your editor of choice
2. Edit this file and paste in the spec below.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: <pv_name>
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  portworxVolume:
    volumeID: "<volume_id>"

3. Replace <pv_name> with the name of the PersistentVolume, for example pv0001.
4. Change the storage capacity to your desired amount. The above example will create a 10 GiB volume.
5. Replace <volume_id> with the desired ID of the volume. For example, this could be “pv0001”.
6. Save this file
7. Create the persistent volume with the following command:

kubectl create -f pv-demo.yaml

8. You should see a message similar to the following (the exact format depends on your kubectl version):

persistentvolume "pv0001" created

9. To see this persistent volume, run the following command:

kubectl get pv

10. You should see something similar to the following, depending on the name and size you chose for the persistent volume:

[Image: sample output of kubectl get pv]

11. The volume has now been created and is ready for a Persistent Volume Claim to use it.

To confirm this, we can check the storage usage (how you do this depends on the underlying storage). For Portworx, we can run pxctl status on one of the storage nodes to see output like the following:

[Image: sample output of pxctl status]

You will see that 10 GiB has been provisioned, spread across the five storage nodes in the example Kubernetes environment.
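Once the volume exists, a claim can bind to it. A minimal sketch, assuming the example PV name pv0001 and a hypothetical claim name `pv0001-claim`: setting `storageClassName: ""` stops a default StorageClass from being applied (which would trigger dynamic provisioning instead), and `volumeName` asks for this specific PV.

```yaml
# Hypothetical PVC that binds to the statically provisioned PV pv0001.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv0001-claim        # hypothetical claim name
spec:
  storageClassName: ""      # match only PVs without a StorageClass
  volumeName: pv0001        # request this specific PV
  accessModes:
    - ReadWriteOnce         # must be supported by the PV
  resources:
    requests:
      storage: 10Gi         # must not exceed the PV's capacity
```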

For more information on Kubernetes persistent volume claims, please see this tutorial