October 18, 2018
Kubernetes Tutorial: How to Create Local Snapshots of MongoDB Persistent Volume Claims on GKE
This tutorial is a walk-through of how to create local snapshots of MongoDB persistent volume claims on Google Kubernetes Engine (GKE) by Certified Kubernetes Administrator (CKA) and Application Developer (CKAD) Janakiram MSV.
Janakiram MSV: Hi, in this demo, I’m going to walk you through how to create a local snapshot of a running MongoDB cluster on Google Kubernetes Engine. So let’s explore the environment. We have one pod that’s currently running MongoDB as a part of the deployment. This is already populated with some sample data. So let’s take a look at the sample data available within the pod. So now we are invoking the Mongo shell. And we will run a couple of commands to verify that the sample data is indeed available. So, we have this pod populated with a collection which has some records. Perfect. Now we’re going to take a snapshot of this data, kill the running pod, and then restore the data into a brand new pod. So how do we take a snapshot?
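The exploration steps above might look like the following session. The `app=mongo` label and the `demo` database name are assumptions for illustration; only the `ships` collection is named later in the walkthrough.

```shell
# List the running pods in the current namespace
kubectl get pods

# Capture the MongoDB pod name (label selector is an assumption)
POD=$(kubectl get pods -l app=mongo -o jsonpath='{.items[0].metadata.name}')

# Open the Mongo shell inside the pod
kubectl exec -it $POD -- mongo

# Inside the Mongo shell: verify the sample data (database name assumed)
> use demo
> db.ships.find()
```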
Well, it’s pretty straightforward. Kubernetes already has the concept of a volume snapshot, and using that along with Portworx, we’re going to take a consistent snapshot of the MongoDB volume while the pod is running. So this is the volume snapshot manifest. And if you notice, the name of this is px-mongo-snapshot and we are associating it with px-mongo-pvc, which is the name of the actual PVC backing the current pod. So when we define this snapshot, we are asking Portworx and, behind the scenes, the storage orchestration engine for Kubernetes called STORK, to take a snapshot of this volume called px-mongo-pvc. So let’s create the snapshot. This is going to result in a new volume snapshot, which we can verify with a couple of commands: kubectl get volumesnapshot followed by kubectl get volumesnapshotdatas.
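A VolumeSnapshot manifest along these lines matches the names given in the walkthrough; this is a sketch using the external-storage snapshot API that STORK implements:

```yaml
apiVersion: volumesnapshot.external-storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: px-mongo-snapshot
spec:
  # The existing PVC backing the running MongoDB pod
  persistentVolumeClaimName: px-mongo-pvc
```

Create it with `kubectl create -f px-mongo-snapshot.yaml` (the filename is illustrative).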
So these two commands confirm that the snapshot is readily available. The most recent one you see here is the outcome of the previous step, the creation of the volume snapshot. Alright, so now with the volume snapshot in place, we’re going to simulate data corruption by deleting the MongoDB collection. So before that, we’ll get the name of the pod and then we will access the Mongo shell. And we’re going to do something pretty crazy. We’re going to drop the collection called ships. And after this we cannot really access the data; it no longer exists. Now, thanks to the snapshot, we are going to create a new pod, but before that, we need to create a PVC from the snapshot. As I mentioned, the magic behind the snapshot is orchestrated by a custom scheduler that Portworx has created called STORK, which is the STorage ORchestration for Kubernetes.
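The verification and the simulated corruption described above might look like this session (the `app=mongo` label is an assumption):

```shell
# Confirm the snapshot objects exist
kubectl get volumesnapshot
kubectl get volumesnapshotdatas

# Grab the pod name and open the Mongo shell
POD=$(kubectl get pods -l app=mongo -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it $POD -- mongo

# Inside the Mongo shell: drop the collection to simulate corruption
> db.ships.drop()
> db.ships.find()
```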
Now using that, we are going to create a PVC. So, this is a PVC definition, except that we are annotating it to be created from the existing snapshot we made earlier. So px-mongo-snapshot is going to be used by this PVC to create a 2GB PVC. This also gives us a chance to expand the size of a PVC, which may otherwise be a bit difficult. So now we’re going to create a new PVC which is restored from the snapshot. So let’s go ahead and create that. So now the PVC is created. When we actually do get pvc we’ll notice there are two PVCs. One is the original PVC from which we have taken the snapshot, and the other is the one we have restored. And this is based on stork-snapshot-sc, the storage class responsible for orchestrating volume snapshots and co-locating the pods on the same node where the data is restored. Perfect.
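A PVC restored from the snapshot could be defined roughly like this; the PVC name is illustrative, while the annotation key, storage class, snapshot name, and 2GB size come from the walkthrough:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Name of the restored clone PVC (illustrative)
  name: px-mongo-snap-clone
  annotations:
    # Tells STORK to hydrate this PVC from the named snapshot
    snapshot.alpha.kubernetes.io/snapshot: px-mongo-snapshot
spec:
  accessModes:
    - ReadWriteOnce
  # Storage class that orchestrates restores from snapshots
  storageClassName: stork-snapshot-sc
  resources:
    requests:
      storage: 2Gi
```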
Now it’s time for us to create a new instance of MongoDB which will have the data intact. So what does that look like? Well, it is not very different from the standard definition of a deployment, except that we are now pointing it to a mountPath which is backed by the PVC created in the previous step. And because this PVC is restored from an existing snapshot, the data is pre-populated, which means the moment the pod comes up, it automatically has the data that was captured in the snapshot. So the beauty of this design is that the pod doesn’t even know that it is actually a replica or a clone backed by a snapshot. It doesn’t have any of that knowledge; it only knows it needs to create the mountPath from an existing PVC. So time for us to go ahead and create the pod.
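The restored deployment could be sketched like this. The deployment name, labels, and the `px-mongo-snap-clone` claim name are assumptions; the only requirement from the walkthrough is that the claimName matches the PVC restored from the snapshot:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo-snap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo-snap
  template:
    metadata:
      labels:
        app: mongo-snap
    spec:
      containers:
      - name: mongo
        image: mongo
        volumeMounts:
        # MongoDB's default data directory, backed by the restored PVC
        - name: mongo-data
          mountPath: /data/db
      volumes:
      - name: mongo-data
        persistentVolumeClaim:
          # Must match the PVC restored from the snapshot
          claimName: px-mongo-snap-clone
```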
So this step will result in the creation of a new pod. Let’s check with kubectl get pods and we’ll see there are two. One is the original one where we corrupted the data deliberately, and the other one has been restored from the snapshot. Now, we’ll grab the name of this pod. So, that is the new pod where STORK has created the PVC and associated it with the actual pod. So now we are going to access the shell. We are within the MongoDB shell of the pod that was created from the snapshot. Now it’s time for us to access the data. There we go. Everything seems to be intact. So we are going to access all the records of this collection. Perfect. So this demonstrates how you can essentially create a local snapshot and restore it to create a brand new pod that has its data intact. I hope you found this video useful. Thanks for watching.
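The final verification described above might look like the following session; the `app=mongo-snap` label and `demo` database name are assumptions carried over from the earlier sketches:

```shell
# Two pods now: the original (corrupted) one and the restored clone
kubectl get pods

# Grab the restored pod's name and open the Mongo shell
POD=$(kubectl get pods -l app=mongo-snap -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it $POD -- mongo

# Inside the Mongo shell: the dropped collection is back, data intact
> use demo
> db.ships.find()
```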