Kubernetes Tutorial: How to Failover MongoDB on Google Kubernetes Engine (GKE)

This tutorial is a walk-through of how to fail over MongoDB on Google Kubernetes Engine (GKE), presented by Certified Kubernetes Administrator (CKA) and Application Developer (CKAD) Janakiram MSV.

TRANSCRIPT:

Janakiram MSV: Hi, in this demo I’m going to walk you through how to fail over a MongoDB instance running on Google Kubernetes Engine. We have a MongoDB pod already running, which is backed by Portworx. We’re going to populate some data and then simulate a failure to see how we can do a graceful failover. So, let’s first get access to the running MongoDB pod. We have exactly one pod running as part of the MongoDB replica set, which belongs to a Kubernetes deployment. Don’t be confused by the replica set terminology: we are not referring to a MongoDB replica set, but to the Kubernetes ReplicaSet instantiated by a deployment.
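
The lookup of the running pod could be sketched in Python roughly as follows. This is a minimal illustration, not the commands used in the demo: it assumes `kubectl` is already configured for the cluster and that the MongoDB pods carry the label `app=mongo`, which the transcript does not state. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess

# Assumption: the deployment labels its pods app=mongo (not stated in the demo).
POD_SELECTOR = "app=mongo"


def list_pods_command(selector: str = POD_SELECTOR) -> list[str]:
    """Build the kubectl call that lists matching pods as JSON."""
    return ["kubectl", "get", "pods", "-l", selector, "-o", "json"]


def pod_names(pod_list_json: str) -> list[str]:
    """Extract pod names from the JSON printed by `kubectl get pods -o json`."""
    items = json.loads(pod_list_json)["items"]
    return [item["metadata"]["name"] for item in items]


def find_mongo_pod(selector: str = POD_SELECTOR) -> str:
    """Return the name of the single MongoDB pod (needs cluster access)."""
    result = subprocess.run(
        list_pods_command(selector), capture_output=True, text=True, check=True
    )
    names = pod_names(result.stdout)
    if len(names) != 1:
        # The demo runs exactly one replica, so anything else is unexpected.
        raise RuntimeError(f"expected exactly one pod, found {names}")
    return names[0]
```

Calling `find_mongo_pod()` against a live cluster would return the pod name used in the later steps; the parsing helper can be exercised offline with saved `kubectl` output.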

Alright, now we have the pod in place. This is the name of the pod that’s running MongoDB, and we can use it to access the MongoDB shell. Once we are within the MongoDB shell, I’m going to run a set of commands which will populate a collection with some sample data. We just created a collection called ‘ships’. Let’s verify the availability of the data. This is one of the records of the collection. We can also look at everything that we populated so far in a better format. And, of course, we can also run some other queries. For example, this returns all the names. And finally, this is one more query, just to ensure that the data is written to MongoDB in a consistent form. Alright, it looks like the data is in place, so let’s come out of this shell and continue with the failover use case.
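
The transcript does not show the shell commands themselves, so the following Python sketch only approximates them. It builds `kubectl exec` invocations that pass JavaScript to the shell with `--eval`. The sample documents are hypothetical, and the sketch assumes the container image ships the legacy `mongo` shell (newer images provide `mongosh` instead) and that the default database is acceptable.

```python
import json

# Hypothetical sample records; the demo does not list the exact documents.
SAMPLE_SHIPS = [
    {"name": "Ship A", "class": "Explorer", "crew": 750},
    {"name": "Ship B", "class": "Scout", "crew": 40},
]

# Queries mirroring the checks described in the demo.
FIND_ONE = "printjson(db.ships.findOne())"  # a single record
FIND_ALL = "db.ships.find().forEach(printjson)"  # every record, formatted
FIND_NAMES = "db.ships.find({}, {name: 1, _id: 0}).forEach(printjson)"  # names only


def insert_script(docs):
    """JavaScript that inserts the documents into the 'ships' collection."""
    # JSON is valid JavaScript literal syntax, so json.dumps is sufficient here.
    return f"db.ships.insertMany({json.dumps(docs)})"


def mongo_eval_command(pod, script):
    """Build the kubectl call that runs one script inside the MongoDB pod."""
    # Assumption: the shell binary inside the container is named `mongo`.
    return ["kubectl", "exec", pod, "--", "mongo", "--quiet", "--eval", script]
```

Each returned list can be handed to `subprocess.run`, for example `subprocess.run(mongo_eval_command(pod, insert_script(SAMPLE_SHIPS)), check=True)`, where `pod` is the pod name found earlier.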

Now we’re going to simulate a failure. We have a MongoDB pod that is populated with some sample data. Our goal is to simulate a crash where the node becomes unavailable and the pod becomes unavailable, but we are still able to access the data, which was already written through the Portworx storage engine. So, let’s get the name of the node on which our pod is running, because we want to simulate a crash of this node. It looks like this is the node on which our pod is scheduled. Now we’re going to cordon off this node. Cordoning ensures that no new pods are scheduled on it. Notice that the node now has a status of SchedulingDisabled, which means no new pods will be scheduled or placed on this node. Why are we doing it? Because I want to force Kubernetes to schedule the newly created MongoDB pod on any node except this one. I want to demonstrate how the data is replicated, and how we can create a new instance of MongoDB that talks to the same mount point and the same data created in the previous step. Perfect.
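
As a rough Python sketch of this step: the node name comes from the pod's `spec.nodeName` field, `kubectl cordon` marks the node unschedulable, and the node's `spec.unschedulable` field is what `kubectl get nodes` reports as SchedulingDisabled. The function names are hypothetical, and the parsing helpers work on saved `kubectl ... -o json` output so they can be tried without a cluster.

```python
import json


def get_pod_command(pod):
    """kubectl call that prints one pod as JSON."""
    return ["kubectl", "get", "pod", pod, "-o", "json"]


def node_of_pod(pod_json):
    """Name of the node the pod is scheduled on."""
    return json.loads(pod_json)["spec"]["nodeName"]


def cordon_command(node):
    """kubectl call that marks the node unschedulable."""
    return ["kubectl", "cordon", node]


def get_node_command(node):
    """kubectl call that prints one node as JSON."""
    return ["kubectl", "get", "node", node, "-o", "json"]


def is_cordoned(node_json):
    """True when the node spec carries unschedulable=true."""
    spec = json.loads(node_json).get("spec", {})
    # The field is absent on schedulable nodes, so default to False.
    return bool(spec.get("unschedulable", False))
```

Cordoning only blocks new scheduling; pods already on the node keep running, which is why the demo deletes the pod as a separate step.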

Now we’ll do the next step in our simulation, which is deleting the pod. The pod we are deleting is part of a deployment where the replica count is set to 1. Because the deployment has replicas equal to 1, we’ll immediately see Kubernetes creating another instance of the MongoDB pod. One is getting terminated, and the other has been running for the last six seconds. That means Kubernetes has already created a new pod that’s running MongoDB. But the question is, will it really retain the same data? Well, before we do anything, let’s uncordon the node, because the new pod is already scheduled on some other node. So now we have uncordoned the previous node.
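
A hedged Python sketch of this step follows. It builds the delete and uncordon commands and picks out the replacement pod from `kubectl get pods -o json` output. A pod that is being terminated has `metadata.deletionTimestamp` set, so the sketch skips those; the helper names and the sample node names in the usage note are hypothetical.

```python
import json


def delete_pod_command(pod):
    """kubectl call that deletes the pod; the deployment recreates it."""
    return ["kubectl", "delete", "pod", pod]


def uncordon_command(node):
    """kubectl call that makes the node schedulable again."""
    return ["kubectl", "uncordon", node]


def replacement_pods(pod_list_json, cordoned_node):
    """Names of running, non-terminating pods placed off the cordoned node."""
    names = []
    for item in json.loads(pod_list_json)["items"]:
        # Terminating pods keep their phase but gain a deletionTimestamp.
        terminating = "deletionTimestamp" in item["metadata"]
        running = item.get("status", {}).get("phase") == "Running"
        elsewhere = item.get("spec", {}).get("nodeName") != cordoned_node
        if running and not terminating and elsewhere:
            names.append(item["metadata"]["name"])
    return names
```

With one old pod terminating on `node-1` and one new pod running on `node-2`, `replacement_pods(output, "node-1")` would return only the new pod's name, after which uncordoning `node-1` is safe.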

Now let’s get the name of the newly created pod and use it to access the shell. We are dropped right inside the MongoDB shell, and it’s time for us to run this query on our ‘ships’ collection, and there we go. The data is intact, which means that even after deleting the pod and forcing Kubernetes to schedule it on a completely different node, we are able to access the data. This shows how you can run just one instance of a workload and, thanks to the high availability of the storage fabric, still have your data always available.
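
The demo checks the data by eye. One way to automate that check, sketched here in Python under stated assumptions, is to capture the ship names before and after the failover and compare them. The query string assumes a shell invoked with `--quiet --eval` that prints a single JSON array; the function names are hypothetical.

```python
import json

# JavaScript for `--eval`: print the ship names as one JSON array.
NAMES_QUERY = (
    "print(JSON.stringify(db.ships.find({}, {name: 1, _id: 0}).toArray()))"
)


def names_from_output(shell_output):
    """Set of ship names parsed from the JSON array printed by NAMES_QUERY."""
    return {doc["name"] for doc in json.loads(shell_output)}


def data_survived(before_output, after_output):
    """True when the same names are present before and after the failover."""
    # Sets are compared, so the order in which documents come back is ignored.
    return names_from_output(before_output) == names_from_output(after_output)
```

Comparing names only is a coarse check; comparing full documents would be stricter but needs care with the `_id` field's non-JSON types.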

This is a very powerful concept: you need not configure the workload in a high availability mode, yet the data becomes highly available. Thanks for watching.