Kubernetes Tutorial: How to Create Cloud Snapshots of MySQL Persistent Volume Claims on GKE

This tutorial is a walk-through of how to create cloud snapshots of MySQL persistent volume claims on Google Kubernetes Engine (GKE) by Certified Kubernetes Administrator (CKA) and Application Developer (CKAD) Janakiram MSV.

TRANSCRIPT:

Janakiram MSV: Hi. In this demo, I want to walk you through creating and restoring a cloud snapshot. A cloud snapshot is a lot like a local snapshot except for the fact that this snapshot is stored in an object storage service that is compatible with the Amazon S3 API. In GKE, when we create a cloud snapshot, we’re going to store the data in Google Cloud Storage. And because Portworx has to access an external resource, we need to establish a security context for Portworx to talk to the object storage service. So, the workflow for creating and restoring a snapshot remains the same, but with a cloud snapshot, we need to perform additional steps to create the infrastructure for Portworx to talk to the object storage.

So, in this demo, we’ll first configure the credentials required for Portworx to talk to the object storage, and then we’ll proceed with the remaining steps of creating and restoring a snapshot. So, let’s get started. Now before we go any further, we need to make sure that we create a service account in GCP that has enough access to the object storage service. So for this, I suggest you access APIs & Services, and go to the credentials part of it. Within credentials, click on Create credentials and choose a service account key.

Create a new service account, give it a name, something like px-snap, and make sure that you are selecting appropriate roles for this service account. And because we want this service account to talk to object storage, it’s a good idea to make it a storage admin. This will ensure that Portworx is able to perform object life cycle management on the objects created within Google Cloud Storage.

So we choose storage admin, and then click on Create. So when we click on Create, it’s going to download a JSON document to our development machine, which is going to act as a private key for Portworx to access the service. To save time, I’ve already gone ahead and created the JSON private key, and we will copy that over to the node and perform the remaining steps. So, this is already downloaded to my machine. Now, let’s perform some basic steps to make sure we are creating the credentials. First thing, we are going to grab the node where MySQL is running. This can be any node, but it is just easy for us to use kubectl get pods to grab the name of the node. And then we will copy the JSON file that we downloaded when we created the service account to the node where we are going to perform the authentication operation.
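As a rough sketch of the two preparation steps just described, the Python helpers below build the command lines for finding the node that hosts the MySQL pod and for copying the JSON key to it. The transcript does not show the exact commands, so the label selector `app=mysql`, the key file name `px-snap-key.json`, the zone, and the use of `gcloud compute scp` for the copy are all assumptions for illustration.

```python
import shlex


def node_lookup_cmd(label_selector):
    """kubectl command that prints the node name of the first matching pod."""
    return [
        "kubectl", "get", "pods",
        "-l", label_selector,
        "-o", "jsonpath={.items[0].spec.nodeName}",
    ]


def copy_key_cmd(key_file, node, zone):
    """gcloud command that copies the key file to the node's home directory.

    Using gcloud compute scp is an assumption; any copy mechanism works.
    """
    return ["gcloud", "compute", "scp", key_file, f"{node}:~/", "--zone", zone]


if __name__ == "__main__":
    # Hypothetical values: adjust the selector, node, zone and file name.
    print(shlex.join(node_lookup_cmd("app=mysql")))
    print(shlex.join(copy_key_cmd("px-snap-key.json", "NODE_NAME", "us-central1-a")))
```

Each helper returns an argument list, so it can be printed for review or handed to `subprocess.run` once the placeholder values have been replaced.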

So, we will copy this to the node. Now that this is done, we can SSH into the node to verify that the JSON private key is indeed copied. Alright, so now we see that the JSON file has been successfully copied over to the node. With this, we can go ahead and create a credential. So let’s access the list of credentials. Obviously, this is empty because we haven’t created any credentials for Portworx. We can do that by accessing the pxctl binary, available in any node running Portworx storage.

Now we are going to create a credential by passing the provider. In our case, this is Google. Then the project ID, which is specific to your GCP project. And finally, the JSON private key that we downloaded in the previous step. So, let’s create credentials that will be used to create the snapshot. Now the credentials are created successfully.
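The credential step can be sketched as the pxctl invocations below, built in Python so the arguments are easy to inspect. The flag names (`--provider`, `--google-project-id`, `--google-json-key-file`) are written as I recall them from the Portworx documentation and should be confirmed with `pxctl credentials create --help` on your version; newer releases may also expect a credential name as a trailing argument. The binary path, project ID, and key path are assumptions.

```python
import shlex

# Usual location of the binary on a Portworx node (assumption).
PXCTL = "/opt/pwx/bin/pxctl"


def list_credentials_cmd():
    """Command that lists the credentials Portworx currently knows about."""
    return [PXCTL, "credentials", "list"]


def create_google_credentials_cmd(project_id, json_key_file):
    """Command that registers a Google Cloud Storage credential with Portworx."""
    return [
        PXCTL, "credentials", "create",
        "--provider", "google",
        "--google-project-id", project_id,
        "--google-json-key-file", json_key_file,
    ]


if __name__ == "__main__":
    # Hypothetical project ID and key path; both are run on the node itself.
    print(shlex.join(list_credentials_cmd()))
    print(shlex.join(create_google_credentials_cmd("my-gcp-project", "/tmp/px-snap-key.json")))
```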

I want to remind you that before you perform this step, you need to ensure that two things are in place. One: while creating your spec for applying the Portworx YAML file, make sure you’re using STORK. Two: also make sure that the Secrets you are creating within Portworx are stored within Kubernetes. For further information, refer to the Portworx documentation on how to enable STORK, and how to use Kubernetes for the Secrets of Portworx.

So, with that in place, we are now ready to create a cloud snap. So let’s get out of the node and then perform the remaining steps. So the very first thing that I want to do here is to simulate data corruption. We do that by dropping an existing database, but before that we need to take a snapshot. So, let’s access the pod where MySQL is currently running. We will invoke MySQL Shell. And then let’s ensure that we are able to access the sample data. So I’ll query from this table called officers, and we see that the data is intact. Perfect. Now, let’s get out of this and create a snapshot.

So before that, let me show you the available PVCs. So currently we only have one PVC, which is bound to our production MySQL pod. So now it’s time for us to create a snapshot. So, why did I show you the PVC? Well, we need to create the snapshot from the same PVC. So, here is a definition of the cloud snapshot. If you are familiar with local snapshots in Portworx, this is not very different, except that there is a specific annotation we need to pass to STORK at runtime to make sure that it is treated as a cloud snapshot.

While everything remains the same, we go ahead and introduce this new annotation, portworx/snapshot-type, set to cloud. And this is pointing to the PVC that’s currently bound to the pod running in production. So that’s the reason why I showed you the PVC here. So this PVC is going to act as a source for the snapshot we’re going to create. Alright. Now, let’s go ahead and create this snapshot. So, once we run this, we can verify through kubectl get volumesnapshot, and we can also get the snapshot data, which will confirm that the snapshot is intact and it is created. So, this is now in place. We can verify this through another mechanism. So when we go to the GCS, the Google Cloud Storage object browser, and do a refresh, we’ll now see a new bucket, and this bucket is created by Portworx by using the credentials that we created in the previous step, and this bucket is acting as a container for storing the cloud snapshot. So, this confirms that the snapshot has been successfully created.
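A minimal sketch of the snapshot definition described here, expressed as a Python dict and printed as JSON (which kubectl accepts in place of YAML). The snapshot name px-mysql-snapshot comes from the demo; the source PVC name `px-mysql-pvc`, the namespace, and the `apiVersion` string (as used by the external-storage snapshot CRD that STORK registered at the time) are assumptions to check against your cluster.

```python
import json


def cloud_snapshot_manifest(snapshot_name, source_pvc, namespace="default"):
    """Build a VolumeSnapshot manifest that STORK treats as a cloud snapshot."""
    return {
        # Assumed API group/version of the external-storage snapshot CRD.
        "apiVersion": "volumesnapshot.external-storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {
            "name": snapshot_name,
            "namespace": namespace,
            # The annotation that turns a local snapshot into a cloud snapshot.
            "annotations": {"portworx/snapshot-type": "cloud"},
        },
        # The PVC bound to the production pod acts as the snapshot source.
        "spec": {"persistentVolumeClaimName": source_pvc},
    }


if __name__ == "__main__":
    # "px-mysql-pvc" is a hypothetical name for the production PVC.
    print(json.dumps(cloud_snapshot_manifest("px-mysql-snapshot", "px-mysql-pvc"), indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, after which the snapshot should show up in the kubectl listing mentioned above.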

Now we are ready to simulate data corruption. So, let’s go back to the production pod, access the MySQL shell and simulate data corruption by dropping this database called classicmodels. We can safely do it because we have just taken a snapshot and I’m very confident that we’ll be able to restore it. Now, when we actually create a new PVC, we are going to point it to the snapshot that we created in the previous step. But before that, let’s take a look at the storage class. So, the most important player in this entire workflow is the storage class called stork-snapshot-sc. This is not created by us explicitly. This is created when we set up Portworx on our GKE cluster. And this storage class is responsible for the magic behind cloud snap.

It coordinates with STORK, which is the Storage Orchestrator Runtime for Kubernetes from Portworx, to ensure that it is talking to the external object storage service and it is able to perform the storage ops behind the scenes to ensure that the PVC is indeed created from one of the snapshots, and it also manages the entire life cycle of the snapshot. So, I want to call out this storage class specifically because it is responsible for all the behind-the-scenes operations performed while we are using the familiar kubectl to manage it.

Alright. So now it’s time for us to create a new PVC, and the PVC is not very different. In fact, the PVC doesn’t even know that it is actually created from the cloud snapshot. Again, the most important thing here is the annotation where we are actually pointing it to px-mysql-snapshot, which is the name of the snapshot that we created in the previous step. So, from an existing PVC we create a snapshot and from that snapshot we are going ahead and creating a new volume. A new volume claim. So, this volume claim is based on the storage class name, stork-snapshot-sc, which I just touched upon, and it also gives us a chance to adjust the size of the volume. The original claim was actually of 1GB, but because we are restoring it from a snapshot, we have the opportunity to expand it further, to make it 2GB.
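The restore claim described here might look roughly like the following, again built as a Python dict and emitted as JSON for kubectl. The claim name px-mysql-snap-clone, the snapshot name, the storage class, and the 2GB size come from the demo; the annotation key `snapshot.alpha.kubernetes.io/snapshot`, the access mode, and the namespace are assumptions based on how the external-storage snapshot provisioner is normally used.

```python
import json


def pvc_from_snapshot_manifest(pvc_name, snapshot_name, size="2Gi", namespace="default"):
    """Build a PVC that is provisioned from an existing snapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": pvc_name,
            "namespace": namespace,
            # Assumed annotation key that names the snapshot to restore from.
            "annotations": {"snapshot.alpha.kubernetes.io/snapshot": snapshot_name},
        },
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            # Storage class set up with Portworx; it hands the restore to STORK.
            "storageClassName": "stork-snapshot-sc",
            # Larger than the original 1GB claim.
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(pvc_from_snapshot_manifest("px-mysql-snap-clone", "px-mysql-snapshot"), indent=2))
```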

Alright. So now, let’s go ahead and create the PVC using the storage class that we looked at in the previous step. So this is going to now result in a new PVC. So we’ll actually have two of those: one which is bound to our original pod running in production, where we simulated data corruption, and the most recent one that was created from the snapshot, the cloud snapshot.

So, now we only have one pod, where the data is already corrupted because we dropped the production database. So now it’s time for us to create a new pod that is pointing to the volume restored from the snapshot. So let’s take a look at the definition of this pod. Again, the pod doesn’t have a clue where it is actually being restored from. Everything is exactly the same, except that we are now pointing to the volume claim that is restored from this snapshot. So, px-mysql-snap-clone is what we created in the previous step. So the pod simply points to the PVC and the PVC has been already restored from the cloud snapshot, which means that data is already there and now we are simply pointing the pod to the PVC that’s readily available for us to access.
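To illustrate the point that the pod only references the restored claim, here is a hedged sketch of such a pod definition as a Python dict. Only the claim name px-mysql-snap-clone comes from the demo; the pod name, image tag, root password, and volume name are hypothetical, and the demo may well use a Deployment rather than a bare Pod.

```python
import json


def mysql_pod_manifest(pod_name, claim_name, root_password="password"):
    """Build a MySQL pod whose data directory is backed by the given PVC."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "labels": {"app": "mysql-snap"}},
        "spec": {
            "containers": [{
                "name": "mysql",
                "image": "mysql:5.6",  # hypothetical image tag
                "env": [{"name": "MYSQL_ROOT_PASSWORD", "value": root_password}],
                "ports": [{"containerPort": 3306}],
                # MySQL's default data directory lives on the restored volume.
                "volumeMounts": [{"name": "mysql-data", "mountPath": "/var/lib/mysql"}],
            }],
            "volumes": [{
                "name": "mysql-data",
                # The pod only knows the claim, not where its data came from.
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(mysql_pod_manifest("mysql-snap", "px-mysql-snap-clone"), indent=2))
```

Nothing in the pod spec mentions the snapshot; swapping `claimName` is the only difference from a pod that uses the original claim.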

Alright. So now the moment of truth where we actually go ahead and create the new pod backed by the recovered snapshot. So let’s wait for this pod to get created. So, this is going to take a few seconds. I’m going to put this in watch mode. Alright, now it’s in running mode. So let’s go ahead and access the MySQL shell within this pod.

Okay, now that we are inside the MySQL shell, let’s check if we have the classicmodels database available to us. There it is, still available, and now finally we’re going to access the sample data. Perfect. So, the data, even though it has been corrupted in the original production pod, thanks to the snapshot, is now available to us in another pod. And now we can continue to access the data through various applications because we have successfully restored from an existing snapshot.
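The same check can be run non-interactively. The sketch below builds a `kubectl exec` command that asks MySQL in the restored pod to list its databases and read the sample table. The pod name, user, and password are hypothetical, and it assumes the officers table queried earlier lives in the classicmodels database.

```python
import shlex


def mysql_exec_cmd(pod, sql, user="root", password="password"):
    """kubectl command that runs a SQL string with the mysql client in a pod."""
    return [
        "kubectl", "exec", pod, "--",
        "mysql", f"-u{user}", f"-p{password}", "-e", sql,
    ]


# Assumes the sample table is classicmodels.officers.
VERIFY_SQL = "SHOW DATABASES; SELECT * FROM classicmodels.officers;"

if __name__ == "__main__":
    # "mysql-snap" is a hypothetical name for the restored pod.
    print(shlex.join(mysql_exec_cmd("mysql-snap", VERIFY_SQL)))
```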

So this gives us a lot of confidence in managing production database workloads because we can schedule point-in-time snapshots periodically to make sure that we can restore in case of a disaster or a disruption. This is a very powerful concept based on STORK and the cloud snapshot capability of Portworx. I hope you found this video useful. Thanks for watching.