What is Kafka?

Apache Kafka is an event-streaming platform that runs as a cluster of nodes called “brokers” and was initially developed as a messaging queue. Today, Kafka can process and store massive amounts of data while allowing applications to publish and consume messages, which are stored as records within what is called a topic. Kafka is typically used to broker data efficiently between systems or to let applications react to streams of data in real time. In addition to being a popular message queue for distributed systems, it is commonly used to stream data in IoT use cases.
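To make the publish and consume idea concrete, here is a minimal in-memory sketch of a topic as an append-only log with per-group read offsets. It is a toy model, not the Kafka client API: the `ToyTopic` class, its method names, and the `sensor-readings` example data are all assumptions made for illustration, and real topics are partitioned and replicated across brokers.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for a topic: an append-only list of records."""

    def __init__(self, name):
        self.name = name
        self.records = []                 # the log; this toy never deletes records
        self.offsets = defaultdict(int)   # next offset to read, per consumer group

    def publish(self, key, value):
        """Append a record and return its offset (its position in the log)."""
        self.records.append((key, value))
        return len(self.records) - 1

    def consume(self, group, max_records=10):
        """Return unread records for a group and advance that group's offset."""
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


topic = ToyTopic("sensor-readings")
topic.publish("device-1", "21.5C")
topic.publish("device-2", "19.0C")

print(topic.consume("dashboard"))  # both records
print(topic.consume("dashboard"))  # empty list: nothing new for this group
print(topic.consume("alerting"))   # a different group reads from the start
```

The point of the sketch is that consuming does not remove records: each consumer group tracks its own position, so several applications can read the same stream independently.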

What is Kubernetes?

Kubernetes is an open source platform that runs as a cluster of worker nodes and control plane (master) nodes, allowing teams to build, manage, scale, and automate containerized workloads such as Kafka. Kubernetes can manage many applications at massive scale, including stateful applications such as databases or streaming platforms. Kubernetes was originally conceived at Google, which had used similar technology to run production workloads for over a decade.

Running Kafka on Kubernetes

There are a variety of reasons a Kafka / k8s architecture is appealing. First, if your organization has standardized on Kubernetes as its application platform, that alone is a good reason to look at running Kafka there too. Running Kafka on Kubernetes lets organizations simplify operations such as upgrades, scaling, restarts, and monitoring, which are largely built into the Kubernetes platform.

Why Run Apache Kafka on Kubernetes?

Apache Kafka often runs on Kubernetes, a system that automates deploying, scaling, and managing containers across clusters of hosts. This makes Kafka a good fit for cloud-native application development.

Switching your entire infrastructure to a new platform just to accommodate Kafka takes time and resources. If your organization already runs Kubernetes, deploying Kafka there offers an easier path to adoption.

For one, your IT teams will already be familiar with the environment, so any issues they encounter will be easier to fix. There is also the problem of getting management approval to migrate to a different platform: the added cost is hard to justify when you could simply run Kafka on Kubernetes with an operator.

Using Kafka in Kubernetes also offers benefits outside of easier adoption. One variable that many IT teams overlook is the number of Kafka clusters they’ll need to create. In larger enterprises, this can quickly grow into a multi-cluster scenario.

Kubernetes is well suited to building and managing multiple clusters. With Kafka on Kubernetes, DevOps processes can be much smoother, thanks to Kubernetes tooling for provisioning, monitoring, and maintaining Kafka clusters.

Ultimately, whether to run Kafka on Kubernetes or on a different platform will depend on your situation.

Some additional items to consider when running Kafka on Kubernetes:

Low Latency Network and Storage

A Kafka deployment demands a low-latency network and low-latency storage. Ideally, there is little contention for data on the wire, and storage access offers high throughput with little interference from other workloads. Dedicating fast media such as SSDs to brokers, and paying attention to data locality so that a broker's data is local to the node where its pod runs, will improve the overall performance of the system.

High Availability for Kafka Brokers

Kafka runs as a cluster of brokers, and these brokers can be deployed across a Kubernetes cluster and scheduled onto different worker nodes in separate fault domains. Kubernetes automatically recovers pods when nodes or containers fail, so it can do this for your brokers too. One thing to consider with high availability is what happens to the data that the failed broker was storing. Does the data follow the pod? Does the data get rebuilt over the network? Kafka can rebuild a broker's data after a node failure, but the rebuild reduces the I/O available to the application while it runs. Consider a data replication strategy that lets you use multiple brokers for higher throughput while still enabling fast failover in the case of a node failure.
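As a rough sketch of how brokers can be made to land on different workers and fault domains, the snippet below builds a pod-spec fragment using the Kubernetes `topologySpreadConstraints` and `podAntiAffinity` fields. The helper name `broker_spread_rules` and the `app: kafka` label are assumptions for illustration; an operator or Helm chart may generate equivalent rules for you, and the zone label only helps if your nodes carry it.

```python
import json


def broker_spread_rules(app_label, topology_key="topology.kubernetes.io/zone"):
    """Build a pod-spec fragment that spreads matching pods across fault domains.

    Hypothetical helper: the label key "app" is an assumption about how the
    broker pods are labeled.
    """
    selector = {"matchLabels": {"app": app_label}}
    return {
        # Keep the number of broker pods per zone within one of each other.
        "topologySpreadConstraints": [
            {
                "maxSkew": 1,
                "topologyKey": topology_key,
                "whenUnsatisfiable": "DoNotSchedule",
                "labelSelector": selector,
            }
        ],
        # Never place two broker pods on the same node.
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": selector,
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        },
    }


print(json.dumps(broker_spread_rules("kafka"), indent=2))
```

The printed fragment would sit under the pod template `spec` of whatever workload runs the brokers, typically a StatefulSet. It controls where pods are scheduled only; what happens to a broker's data after a failure still depends on the storage layer.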

Data Protection

Kafka provides replication of topics as well as data mirroring between clusters. Replication is a way to achieve fault tolerance if a broker fails, while mirroring is typically used to make data available in another datacenter. Items to consider include how long it takes for replicas to be rebuilt once a broker is back online, what disaster recovery strategy is in place in case of a cluster or zone failure, and what levels of RTO and RPO are needed between sites. Consider your throughput requirements and use only the number of brokers necessary to achieve that performance, while leveraging technologies that replicate data to other hosts in the cluster so that you can maximize availability, performance, and protection.
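The fault-tolerance trade-off in topic replication can be worked through with simple arithmetic. The sketch below assumes producers use `acks=all`, that the topic sets `min.insync.replicas`, and that unclean leader election is disabled; the function name and result keys are hypothetical. Under those assumptions, a partition keeps accepting writes while at least `min.insync.replicas` replicas are in sync, and every acknowledged record exists on at least that many replicas.

```python
def tolerated_failures(replication_factor, min_insync_replicas):
    """Estimate broker failures a partition tolerates (assumes acks=all)."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min_insync_replicas must be between 1 and replication_factor")
    return {
        # Writes continue while enough replicas remain in sync.
        "failures_with_writes_available": replication_factor - min_insync_replicas,
        # Acknowledged records live on at least min_insync_replicas brokers.
        "failures_acked_data_survives": min_insync_replicas - 1,
    }


print(tolerated_failures(3, 2))
# {'failures_with_writes_available': 1, 'failures_acked_data_survives': 1}
print(tolerated_failures(3, 1))
# {'failures_with_writes_available': 2, 'failures_acked_data_survives': 0}
```

Raising `min.insync.replicas` strengthens the durability guarantee for acknowledged data but reduces how many broker failures writes can ride through, which is one reason the number of brokers and replicas should follow from your availability and throughput requirements.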

Data Security

Kafka provides built-in security features, including authentication, access controls for operations, and SSL/TLS encryption of traffic between brokers. However, you should also consider whether the data on disk in the underlying filesystems is protected, and which users have access to manipulate the backing stores where that data lives. Ideally, organizations should protect data at the application level and also secure the data layer beneath it.
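For orientation, the following sketch renders a fragment of broker properties that turns on SSL for the broker listener and for inter-broker traffic. The property keys are standard Kafka broker settings, but the helper name, file paths, port, and password placeholder are assumptions; a real deployment would inject secrets from a secret store, and an operator may manage these settings for you.

```python
def broker_tls_properties(keystore, truststore, password="<set-from-secret>"):
    """Render an illustrative broker properties fragment enabling SSL.

    Hypothetical helper: paths, port 9093, and the password placeholder
    are example values, not defaults.
    """
    settings = {
        "listeners": "SSL://:9093",
        "security.inter.broker.protocol": "SSL",
        "ssl.keystore.location": keystore,
        "ssl.keystore.password": password,
        "ssl.key.password": password,
        "ssl.truststore.location": truststore,
        "ssl.truststore.password": password,
        # Require clients (including other brokers) to present a certificate.
        "ssl.client.auth": "required",
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())


print(broker_tls_properties("/etc/kafka/secrets/broker.keystore.jks",
                            "/etc/kafka/secrets/broker.truststore.jks"))
```

These settings cover data in transit only. Protecting data at rest on the volumes behind each broker is a separate decision at the storage layer.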


Kafka on Kubernetes

Step-by-step guides to run HA Kafka on the most popular Kubernetes platforms

How To: Run Kafka on Kubernetes with Portworx Data Services

How To: Deploying Kafka on Kubernetes using Portworx Data Services

Architect’s Corner: How to Combine Kafka and Cassandra on Kubernetes

Technical Insights: Choosing the Right Kubernetes Operator for Apache Kafka

Technical Insights: Persistence in Event Driven Architectures

How To: How to Run HA Kafka with IBM Cloud Private

How To: How to Run HA Kafka with Rancher Kubernetes Engine

How To: How to Run HA Kafka Cluster on IBM Cloud Kubernetes Service

How To: How to Run HA Kafka on Red Hat OpenShift

How To: Kafka Kubernetes in production: How to Run HA Kafka on Amazon EKS (Elastic Container Service for Kubernetes)

How To: How to Run HA Kafka on Azure Kubernetes Service

How To: Kafka Kubernetes tutorial: How to Run HA Kafka on Google Kubernetes Engine

Architect’s Corner: How Aurea went beyond the limits of Amazon EBS to run 200 Kubernetes stateful pods per host

How To: Running an HA Kafka cluster on Amazon Elastic Container Service (ECS)

Architect’s Corner: Jeffrey Zampieron, CTO at Beco Inc.