Choosing a Kubernetes Operator for Cassandra

Kubernetes operators are software extensions that broaden Kubernetes’ functionality. They automate the management of stateful applications that require domain-specific knowledge Kubernetes doesn’t have built in. Through automation, operators simplify deployments and eliminate manual tasks, making processes scalable and repeatable.

Some of these operators work with Apache Cassandra, an open-source NoSQL database. Cassandra’s masterless architecture is designed for high availability and scalability: any node can serve reads and writes, so the cluster keeps working while failed nodes are replaced. Since Cassandra runs on distributed infrastructure that handles large volumes of data, Kubernetes operators help you configure and replicate Cassandra clusters, supporting fault tolerance and high data availability for containerized applications.

Although Kubernetes has become the go-to container orchestration platform, managing stateful databases on it remains difficult. Deploying production-ready Cassandra clusters on plain Kubernetes takes a lot of scripting, which can be challenging. Kubernetes operators address this by providing a simple interface for automating Cassandra in Kubernetes clusters.

In this article, we’ll review the following Kubernetes operators:

Instaclustr Cassandra-operator
CassKop
Cass-operator by DataStax
Cassandra Operator by Sky UK

We’ll outline each operator’s architecture, features, ease of use, performance, security, and documentation, helping you choose which operator may be best for your needs.

Instaclustr

Instaclustr’s open-source Kubernetes Cassandra-operator provides consistent, reproducible environments that you can replicate quickly across multiple production clusters. With it, you can focus on your core development deliverables instead of operational concerns.

The Instaclustr operator has two architectural components: a Cassandra controller and a custom resource definition (CRD). The CRD extends the Kubernetes API so that developers can define custom resources. The Cassandra controller watches these resources and reacts when their definitions change.

Developers can use the CRD to create objects that give Kubernetes all the configuration it needs to deploy Cassandra.

Instaclustr also offers a managed Cassandra service for companies running Kubernetes applications. Its Cassandra-operator supports scaling up and down, custom configurations, monitoring with autodiscovery, and backup. Additional capabilities and features of Instaclustr’s Cassandra operator include:

StatefulSet workload API for setting up scalable Cassandra clusters
Built-in repair coordination
Prometheus monitoring API for auto-discovery and metrics
Kerberos authentication and Lightweight Directory Access Protocol (LDAP)
Terraform-based provisioning infrastructure

How to Set Up Instaclustr Cassandra-Operator

Below, we’ll demonstrate how to set up Instaclustr’s Cassandra-operator. To follow along, first, install these dependencies in your build environment:

Maven
JDK 8
Docker
Git
Go 1.13
Operator-SDK 0.16.0

Also, ensure you have:

IDE of your choice
Minikube or any local Kubernetes environment for running the Cassandra operator locally
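
Before building, it can help to confirm that the tools listed above are on your PATH. The snippet below is a minimal sketch and not part of Instaclustr’s tooling: missing_tools is a hypothetical helper, and it only checks that a command exists, not that it matches the versions listed (JDK 8, Go 1.13, Operator-SDK 0.16.0).

```shell
# Hypothetical helper: print each named command that is not found on PATH.
# It checks presence only; it does not compare versions.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}
```

Assuming the usual binary names for the dependencies above, you might call it as missing_tools mvn java docker git go operator-sdk minikube. Empty output means every command was found.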

Start by configuring authentication for the Cassandra data center (CDC) resource that the operator manages, using the settings below:

cassandraAuth:
  authenticator: AllowAllAuthenticator
  authorizer: AllowAllAuthorizer
  roleManager: CassandraRoleManager

Next up, begin the installation process.

Start by confirming your Kubernetes cluster is running and configured with persistent storage. Cassandra clusters expand their capacity linearly as you add new nodes, so balance Kubernetes nodes across Cassandra’s rack configurations.
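
One way to read “balance” here is an even split of Cassandra nodes over racks, so that no rack holds more than one node beyond any other. The sketch below is illustrative arithmetic only: spread_nodes is a hypothetical helper that does not talk to Kubernetes or the operator.

```shell
# Hypothetical helper: split <nodes> Cassandra nodes over <racks> racks as
# evenly as possible and print the per-rack counts on one line.
spread_nodes() {
  nodes="$1"
  racks="$2"
  base=$((nodes / racks))     # every rack gets at least this many nodes
  extra=$((nodes % racks))    # the first <extra> racks get one more
  i=0
  out=""
  while [ "$i" -lt "$racks" ]; do
    count="$base"
    if [ "$i" -lt "$extra" ]; then
      count=$((base + 1))
    fi
    out="${out}${out:+ }${count}"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}
```

For example, spread_nodes 7 3 prints 3 2 2, which tells you that seven nodes over three racks cannot be perfectly balanced, while six or nine nodes can.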

We’ll deploy the operator next. First, deploy the CRD used by the Kubernetes operator with the following command:

kubectl apply -f deploy/crds.yaml

Then, deploy the Kubernetes operator:

kubectl apply -f deploy/bundle.yaml

To confirm that the operator is up and running, run this command:

kubectl get pods | grep cassandra-operator

When you create a cluster from scratch, Cassandra sets up the system_auth keyspace with a replication factor of one (RF 1) and the SimpleStrategy replication strategy.

With RF 1, each piece of authentication data is stored on a single node. If you scale the cluster down and remove a node that holds your authentication data, you can no longer log in.

So, it’s essential to update the replication factor to the number of nodes within a particular cluster. Also, always change the replication strategy to NetworkTopologyStrategy.

For example, the command below assumes a three-node data center named test-dc and sets the system_auth replication factor to three to match. Remember to update it whenever the number of nodes changes.

cqlsh>  ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'test-dc': 3};
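
If your node counts change often, you may prefer to generate this statement rather than edit it by hand. The following is a minimal sketch under stated assumptions: system_auth_cql is a hypothetical helper, and each argument pairs a data center name with its node count in the form name:count.

```shell
# Hypothetical helper: build the ALTER KEYSPACE statement for one or more
# data centers. Each argument has the form <dc-name>:<node-count>.
system_auth_cql() {
  opts="'class': 'NetworkTopologyStrategy'"
  for pair in "$@"; do
    dc="${pair%%:*}"        # text before the first colon
    count="${pair##*:}"     # text after the last colon
    opts="${opts}, '${dc}': ${count}"
  done
  printf 'ALTER KEYSPACE system_auth WITH replication = {%s};\n' "$opts"
}
```

Calling system_auth_cql test-dc:3 prints the same statement as above. You could then pass the result to cqlsh through its -e option, for example cqlsh -e "$(system_auth_cql test-dc:3)".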

To explore more configurations, view Instaclustr’s documentation.

As you can see, installing and running Instaclustr’s operator is straightforward. However, the operator is still a work in progress that needs further stabilization. Instaclustr is pushing breaking changes to the repository’s master branch and has yet to implement some planned features.

CassKop

CassKop can manage multiple Cassandra clusters in a single Kubernetes namespace, reducing operational toil. You assign one CassKop instance to each namespace to manage the Cassandra clusters there. This arrangement isolates namespaces from one another, adding a layer of security that protects your deployments from malicious activity or accidental access.

MultiCassKop, CassKop’s multi-site management tool, lets users spread a Cassandra cluster across different regions, each running an independent Kubernetes cluster. It creates the Cassandra custom resources in each Kubernetes cluster so that the Cassandra nodes CassKop deploys can all be part of the same Cassandra ring.

Other CassKop operator features include:

Cassandra operations automation: rack-aware cluster deployment, node removal or addition, and configuring and updating Cassandra and JVM versions
Rolling updates for Cassandra clusters and rolling restarts for Cassandra racks, which minimize disruptions and keep your activities running
Scaling Cassandra clusters up and down, with cleanup and decommission
Prometheus-enabled monitoring
Live backups, restores, and repairs
Support for multiple Kubernetes clusters through MultiCassKop
User-defined sidecars for configurations that the containers managed by the custom resource don’t provide

How to Set Up CassKop’s Operator

To get started with CassKop’s operator, ensure you have these requirements:

Helm version v3
kubectl (the Kubernetes command-line tool), preferably version v1.13.3+
Kubernetes cluster version v1.13.3 or higher
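
Checking a minimum version by eye is easy to get wrong, because plain string comparison ranks v1.9.0 above v1.13.3. The snippet below is a small sketch of a version check: version_at_least is a hypothetical helper, and it relies on sort -V, a GNU coreutils extension that may not be available on every system.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# A leading "v" is ignored. Relies on `sort -V` (version sort), a GNU
# coreutils extension.
version_at_least() {
  have="${1#v}"
  want="${2#v}"
  # After a version sort, the required version comes first only when the
  # installed version is equal to it or newer.
  lowest="$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)"
  [ "$lowest" = "$want" ]
}
```

For example, version_at_least v1.17.0 v1.13.3 succeeds, while version_at_least v1.9.0 v1.13.3 fails. How you obtain the installed version string is left to you.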

Ideally, you should also have fast local storage. Fast remote storage may work too, but CassKop’s maintainers have not yet tested it.

Start by creating a Kubernetes namespace for hosting Cassandra clusters and operators with this command:

kubectl create namespace cassandra

Next, install and deploy CassKop’s operator with a Helm 3 chart. To do this, add this repo:

helm repo add orange-incubator https://orange-kubernetes-charts-incubator.storage.googleapis.com/

Then, complete this process by installing the Helm 3 chart using this command:

helm install casskop orange-incubator/cassandra-operator

You can deploy and monitor your Cassandra cluster inside Kubernetes using the newly-installed CassKop operator. See CassKop’s documentation for further details on deploying, deleting, updating, and monitoring Cassandra clusters.

CassKop offers bare-metal features and considerable benefits such as Cassandra operations automation, out-of-the-box monitoring, and multi-site management for Cassandra and Kubernetes clusters. Performance-wise, the operator’s rack awareness spreads Cassandra’s nodes across different Kubernetes racks and helps improve Cassandra’s availability.

Cass Operator by DataStax

The DataStax Kubernetes Operator, also called Cass Operator (or cass-operator), automates Cassandra and DataStax Enterprise (DSE) deployment and management in Kubernetes. Like other operators, cass-operator provides self-orchestration capabilities, server configuration through CRDs, and rack-aware data centers.

DataStax’s Cass Operator has some stand-out features worth noting:

In a Cassandra or DSE environment, all nodes are equal and any of them can serve read and write operations, so there is no single point of failure. Data replicates automatically across failure domains and availability zones to protect against container loss and application downtime.
Its K8ssandra project provides users with a production-ready environment for operating Cassandra on Kubernetes. K8ssandra packages Apache Cassandra together with tools for cloud-native operations: cass-operator, a data gateway, anti-entropy repair, monitoring, metrics, and Kubernetes Ingress solutions.
The cass-operator automates encryption by creating key stores and trust stores for internode and client-to-node encryption. Users don’t have to update authentication data manually to scale the Cassandra clusters horizontally.

A minor downside to cass-operator is that it doesn’t automate standard repairs on inconsistent data. However, on DataStax Enterprise, NodeSync takes the place of the conventional repair process by performing continuous, self-orchestrating data repairs.

Another limitation is that the operator doesn’t automate scheduling and backups, a feature available in CassKop.

How to Set Up DataStax’s Cass Operator

There are two ways to set up DataStax’s cass-operator:

Through K8ssandra Helm charts (if you need a full-featured cluster)
Using Kustomize (if you only need cass-operator)

Kustomize enables you to install cass-operator cluster-wide or within a namespace. The default installation creates a cass-operator namespace. Run it with kubectl:

kubectl apply -k github.com/k8ssandra/cass-operator/config/deployments/cluster

You need these prerequisites to provision K8ssandra through Helm charts:

Helm v3+
Kubectl
Kubernetes v1.17+
A Kubernetes (K8s) cluster (to run one locally, you can use minikube, kind, k3d, or OpenShift CodeReady Containers)

Start by configuring the Helm chart repository using this command:

helm repo add k8ssandra https://helm.k8ssandra.io/stable

Then add a Traefik Ingress repository if you plan to access K8ssandra tools from outside your Kubernetes cluster:

helm repo add traefik https://helm.traefik.io/traefik

Next, update the Helm repo listing:

helm repo update

Then, install a K8ssandra cluster by copying the YAML below to a k8ssandra.yaml file:

cassandra:
  version: "3.11.10"
  cassandraLibDirVolume:
    storageClass: local-path
    size: 5Gi
  allowMultipleNodesPerWorker: true
  heap:
    size: 1G
    newGenSize: 1G
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      cpu: 1000m
      memory: 2Gi
  datacenters:
  - name: dc1
    size: 1
    racks:
    - name: default
kube-prometheus-stack:
  grafana:
    adminUser: admin
    adminPassword: admin123
stargate:
  enabled: true
  replicas: 1
  heapMB: 256
  cpuReqMillicores: 200
  cpuLimMillicores: 1000

Then install K8ssandra using this command:

helm install -f k8ssandra.yaml k8ssandra k8ssandra/k8ssandra

You should get output like this:

    NAME: k8ssandra
    LAST DEPLOYED: Fri Jan 18 11:00:00 2022
    NAMESPACE: default
    STATUS: deployed
    REVISION: 1

Verify and check the status of the K8ssandra you just deployed using this command:

kubectl get pods

This is the expected output:

    NAME READY STATUS RESTARTS AGE
    k8ssandra-cass-operator-766849b497-klgwf 1/1 Running 0 7m33s
    k8ssandra-dc1-default-sts-0 2/2 Running 0 7m5s
    k8ssandra-dc1-stargate-5c46975f66-pxl84 1/1 Running 0 7m32s
    k8ssandra-grafana-679b4bbd74-wj769 2/2 Running 0 7m32s
    k8ssandra-kube-prometheus-operator-85695ffb-ft8f8 1/1 Running 0 7m32s
    k8ssandra-reaper-655fc7dfc6-n9svw 1/1 Running 0 4m52s
    k8ssandra-reaper-operator-79fd5b4655-748rv 1/1 Running 0 7m33s
    k8ssandra-reaper-schema-dxvmm 0/1 Completed 0 5m3s
    prometheus-k8ssandra-kube-prometheus-prometheus-0 2/2 Running 1 7m27s
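
Scanning a listing like the one above by eye gets tedious as the number of pods grows. The following is a minimal sketch of an automated check: unready_pods is a hypothetical helper that assumes the default five-column kubectl get pods layout shown above and treats Completed pods, such as the reaper schema job, as healthy.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# name of every pod that is not fully ready and Running. Pods in the
# Completed state are treated as healthy. Exits non-zero if any pod is listed.
unready_pods() {
  awk '
    NF == 0 || $1 == "NAME" { next }      # skip blank lines and the header
    $3 == "Completed" { next }            # finished jobs are expected
    {
      split($2, ready, "/")               # READY column looks like "2/2"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1
        bad = 1
      }
    }
    END { exit (bad + 0) }
  '
}
```

You would typically run it as kubectl get pods | unready_pods. Empty output and a zero exit status mean every pod is either ready or completed.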

The steps above demonstrate cass-operator’s simplicity and ease of use in a local Kubernetes cluster. To deploy production-ready K8ssandra in a managed environment, read K8ssandra’s documentation.

Cassandra Operator by Sky UK

Sky UK’s Cassandra Operator is currently in alpha status, so you can use it in a development environment, but not in production. It’s a simple operator for deploying, managing, and monitoring Cassandra clusters inside Kubernetes. Its core features include:

Rack awareness
Comprehensive end-to-end testing tools
Scheduled backups with retention policies
Prometheus-enabled metrics

Sky UK’s Cassandra Operator contains multiple sub-modules that manage Cassandra operations in Kubernetes. They include:

cassandra-operator: Kubernetes operator for managing cluster lifecycle in Kubernetes
cassandra-sidecar: sidecar container for exposing the status of Cassandra nodes
cassandra-snapshots: module responsible for capturing and deleting snapshots
cassandra-bootstrapper: component that configures Cassandra nodes before deployment
fake-cassandra-docker: accelerates end-to-end testing in the Cassandra operator and snapshots
test-kubernetes-cluster: facilitates testing in the Cassandra operator and snapshots

These modules automate Cassandra operations in Kubernetes that you would otherwise handle by hand, such as generating StatefulSet resources or adding new users to Cassandra.

How to Set Up Sky UK’s Cassandra Operator

To deploy the operator, start by creating the Cassandra CRD in the cluster where the operator will run. A cluster administrator can create the CRD from this file: cassandra-operator-crd.yml.

Next, use the Kubernetes-resources template files to deploy the operator and provision resources needed in the Cassandra cluster. The files you’ll need are:

cassandra-operator-rbac.yml
cassandra-node-rbac.yml
cassandra-operator-deployment.yml

Finally, use these substitutions in the templates to deploy the operator:

$TARGET_NAMESPACE: the namespace location for deploying the operator
$INGRESS_HOST: the operator’s ingress host
$OPERATOR_IMAGE: Docker Image for the Cassandra operator
$OPERATOR_ARGS: arguments for passing to the operator
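
The substitution step can be scripted. The snippet below is a sketch that assumes the placeholders appear in the templates exactly as written above: render_template is a hypothetical helper that fills them in from shell variables of the same names using sed.

```shell
# Hypothetical helper: read a template on stdin and replace the four
# placeholders with the values of the same-named shell variables.
# Values must not contain "|", "&", or backslashes, which sed treats
# specially in this expression.
render_template() {
  sed \
    -e 's|\$TARGET_NAMESPACE|'"$TARGET_NAMESPACE"'|g' \
    -e 's|\$INGRESS_HOST|'"$INGRESS_HOST"'|g' \
    -e 's|\$OPERATOR_IMAGE|'"$OPERATOR_IMAGE"'|g' \
    -e 's|\$OPERATOR_ARGS|'"$OPERATOR_ARGS"'|g'
}
```

After setting the four variables to values for your environment, you could render and apply a template in one step, for example render_template < cassandra-operator-deployment.yml | kubectl apply -f -. Rendering to a file first lets you review the result before applying it.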

For more guidance on cluster customization, metrics, and troubleshooting, check out Sky UK’s documentation.

Enabling High Availability Storage in Cassandra

The Cassandra operators we’ve reviewed above each provide an architecture for deploying highly available Cassandra clusters in Kubernetes. Each has its own advantages, disadvantages, and unique features, so which one you choose will depend on your needs.

To reinforce Cassandra’s fault tolerance, you’ll also need storage replication and health-monitoring tools that provide granular-level provisioning, security, backup, and fast failover, among other features.

Learn more about how to augment your Cassandra clusters with Portworx Data Services single-click deployment and fully automated management.

Ryan Wallner

Portworx | Technical Marketing Manager
