Kubernetes Tutorial: How to Deploy MongoDB on Google Kubernetes Engine (GKE)

This tutorial is a walk-through of how to deploy MongoDB on Google Kubernetes Engine (GKE) by Certified Kubernetes Administrator (CKA) and Application Developer (CKAD) Janakiram MSV.


Janakiram MSV: Hi. In this demo, I’m going to walk you through the steps involved in configuring a highly available instance of MongoDB on Google Kubernetes Engine. Let’s start by exploring the environment. We have a three-node cluster configured on Google Kubernetes Engine. On top of that, we also have Portworx installed and configured. Since Portworx is installed as a DaemonSet, we see exactly three instances running, one pod on each node. With the GKE cluster and Portworx in place, it’s time for us to explore the configuration required to deploy MongoDB. It all starts with the creation of a storage class.
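The environment check described above can be sketched with kubectl; the `name=portworx` label selector is an assumption based on a typical Portworx DaemonSet install, so adjust it to match your cluster:

```shell
# List the three GKE worker nodes
kubectl get nodes

# Portworx runs as a DaemonSet in kube-system: expect one pod per node
# (the "name=portworx" label is an assumption; adjust to your install)
kubectl get pods -n kube-system -l name=portworx -o wide
```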

Let’s take a look at the storage class definition. A storage class in Kubernetes acts like a storage driver: it is essentially an interface between the underlying storage engine and the workload. Out of the box, Kubernetes ships with multiple storage classes, which you are probably already familiar with. The Portworx storage class, apart from the default settings, supports additional parameters. For example, to ensure high availability of our workload, we can specify the replication factor. In this case, I am setting the replication factor to 3, which means the data is going to be replicated across three nodes. Similarly, we can define the throughput through the IO priority, and even the file system. In this case, we have configured the file system as XFS.
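A storage class matching that description might look like the sketch below. The class name `px-ha-sc` comes from the demo; the parameter names (`repl`, `io_priority`, `fs`) follow the documented `kubernetes.io/portworx-volume` provisioner:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: px-ha-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "3"           # replicate data across three nodes
  io_priority: "high" # request high-throughput storage pools
  fs: "xfs"           # format the volume with XFS
```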

Let’s go in and create the storage class. The storage class, as I mentioned, is going to interface between Portworx and MongoDB. Let’s verify its creation. px-ha-sc is the one that we just created, and this is going to be the interface between the rest of the workload and the Portworx storage cluster. With the storage class in place, it’s time for us to create what is called a persistent volume claim, or PVC.
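Creating and verifying the class amounts to the following; the manifest filename `px-mongo-sc.yaml` is an assumption, not named in the demo:

```shell
kubectl create -f px-mongo-sc.yaml
kubectl get storageclass px-ha-sc
```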

Let’s take a look at the definition of the PVC. This PVC is based on the storage class that we created in the previous step. The connection between the PVC and Portworx is the annotation that references the storage class, px-ha-sc, which we just created. We are defining this PVC with a storage request of 1GB. The PVC is going to be exposed to the pods that we create later.
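A PVC along those lines could look like the sketch below. The claim name `px-mongo-pvc` is hypothetical, and while the demo references the class through an annotation, on current Kubernetes versions `spec.storageClassName` is the preferred way to express the same link:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: px-mongo-pvc            # hypothetical name
  annotations:
    volume.beta.kubernetes.io/storage-class: px-ha-sc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```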

So let’s now create the PVC. This PVC is going to be exposed to the workload. Let’s make sure it is available and ready to be used. Once the status turns to Bound, we are ready to consume it in our workload. In the final step, we are going to define the actual MongoDB pod, which will talk to the storage cluster backed by Portworx.
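The create-and-verify step, keeping the assumed filename and claim name from before:

```shell
kubectl create -f px-mongo-pvc.yaml

# Wait until the STATUS column shows "Bound"
kubectl get pvc px-mongo-pvc
```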

Let’s take a look at the application definition. This is a standard Kubernetes deployment, not very different from the other deployment manifests that you have seen. We are defining the replicas as one. Understand that this is a stateless pod: it is not a StatefulSet or any special workload type. It is exactly like any other deployment that we would define, but what makes the difference is the association of this pod with the PVC that we created in the previous step. That PVC is mounted at /data/db, which is the default location where MongoDB stores its data. Through this association, we ensure that the data written by MongoDB is always stored on the Portworx cluster, which is replicated across multiple nodes.
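A deployment along those lines, with hypothetical names (`mongo` for the deployment and labels, `px-mongo-pvc` for the claim) standing in for anything the transcript does not spell out:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-data
              mountPath: /data/db   # default MongoDB data directory
      volumes:
        - name: mongo-data
          persistentVolumeClaim:
            claimName: px-mongo-pvc # hypothetical claim name
```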

And this is how we ensure the data is available across multiple nodes: even though the pod itself is stateless, the underlying storage fabric is completely distributed and highly available. This ensures that subsequent instances of MongoDB, when they are created, are instantly associated with exactly the same mount point and the underlying data that was created by this deployment.

Let’s go ahead and create the pod as a part of the deployment. Now we’re going to go ahead and create the app. Let’s verify this by checking the number of pods: we should have exactly one, representing the MongoDB pod that is part of the deployment. And remember, we defined just one replica in the deployment. We can also verify the volume associated with this PVC. Portworx typically creates a volume per PVC, so we are going to grab its name and use it to verify the creation of the volume and inspect additional information. Before we can access that, we need the name of one of the pods running as part of the Portworx DaemonSet, so we will populate PX_POD with one of the pod names running in kube-system as part of the DaemonSet. This gives us access to a binary called pxctl, and pxctl is going to help us inspect the volume.
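The steps above can be sketched as follows; the `app=mongo` and `name=portworx` labels and the `px-mongo-pvc` claim name carry over the assumptions made earlier:

```shell
# Confirm the single MongoDB pod is running
kubectl get pods -l app=mongo

# Capture the Portworx volume name backing the PVC
VOL=$(kubectl get pvc px-mongo-pvc -o jsonpath='{.spec.volumeName}')

# Grab one Portworx pod from the DaemonSet in kube-system
PX_POD=$(kubectl get pods -n kube-system -l name=portworx \
  -o jsonpath='{.items[0].metadata.name}')
```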

So now, we have the pod name and the volume name associated with the PVC, and we can explore the volume by running pxctl inside the Portworx pod that is part of the DaemonSet. Let’s run this, and we should see a lot of interesting information. Here is the volume ID; it is 1GB in size with the XFS file system, and because the replication factor is set to 3, the data is instantly replicated across three nodes. The replication status is up, and the volume is currently attached to the MongoDB pod that is part of the deployment.
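The inspection command, assuming the VOL and PX_POD shell variables were captured earlier and that pxctl lives at its usual install path inside the Portworx pod:

```shell
kubectl exec -n kube-system "$PX_POD" -- \
  /opt/pwx/bin/pxctl volume inspect "$VOL"
```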

So that’s how we associate a workload with the underlying Portworx storage engine. I hope you found this video useful. Thanks for watching.

Janakiram MSV

Contributor | Certified Kubernetes Administrator (CKA) and Developer (CKAD)

