Apache Kafka is an event streaming platform that runs as a cluster of nodes called “brokers.” It was initially developed as a messaging queue, but today Kafka can process and store massive amounts of information while seamlessly allowing applications to publish and consume messages, which are stored as records within what is called a topic. Typically, Kafka is used to efficiently broker data between systems or to allow applications to react to streams of data in real time. In addition to being a popular message queue for distributed systems, it is commonly used to stream data in IoT use cases.
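The publish-and-consume model behind a topic can be sketched as a toy, in-memory model in plain Python (this is illustrative only — a real broker persists records to disk and serves many partitions; the class and method names here are invented, not part of any Kafka client API):

```python
# Toy model of a single Kafka topic partition: an append-only log.
# Producers append records; each consumer reads from its own offset,
# and consuming does not remove records from the log.

class TopicPartition:
    def __init__(self):
        self._log = []  # records in arrival order

    def produce(self, key, value):
        """Append a record and return its offset in the log."""
        self._log.append((key, value))
        return len(self._log) - 1

    def consume(self, offset, max_records=10):
        """Read up to max_records starting at a consumer-tracked offset."""
        return self._log[offset:offset + max_records]

p = TopicPartition()
p.produce("sensor-1", "21.5C")
p.produce("sensor-2", "19.8C")

# Two independent consumers can read the same records at their own pace,
# because reading never deletes from the log.
print(p.consume(0))  # both records
print(p.consume(1))  # only the second record
```

This retained, replayable log is what distinguishes Kafka from a traditional queue, where a consumed message is gone for everyone else.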
Kubernetes is an open-source platform that runs a cluster of worker and master nodes, allowing teams to deploy, manage, scale, and automate containerized workloads such as Kafka. Kubernetes can manage many applications at massive scale, including stateful applications such as databases and streaming platforms. Kubernetes stands on the shoulders of giants such as Google, which initially conceived the software after running production workloads on similar technology for over a decade.
There are a variety of reasons why architecting Kafka on Kubernetes is appealing. First, if your organization is standardizing on Kubernetes as an application platform, that alone is a great reason to look at running Kafka there too. Running Kafka on Kubernetes also lets organizations simplify operations such as upgrades, scaling, restarts, and monitoring, which are more or less built into the Kubernetes platform.
Kafka demands low-latency networking and storage, which means ideal conditions have low contention for data on the wire as well as high-throughput, low-noise access to storage. Dedicating fast media such as SSDs to brokers, and thinking about data locality so that the data a broker accesses is local to the node where its pod is running, will increase the overall performance of the system.
Kafka runs as a cluster of brokers, and these brokers can be deployed across a Kubernetes cluster so that they land on different workers in separate fault domains. Kubernetes automatically recovers pods when nodes or containers fail, so it can do the same for your brokers. One thing to consider for high availability is what happens to the data that a failed broker was storing: does the data follow the pod, or does it get rebuilt over the network? Kafka can rebuild a broker after a node failure, but such rebuilds come at the cost of lower I/O to the application while they run. Consider a data replication strategy that lets you leverage multiple brokers for higher throughput but also enables faster failover in the case of a node failure.
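The failover side of that trade-off can be sketched in plain Python (a toy model, not the actual Kafka controller logic — in a real cluster the controller elects leaders using in-sync replica metadata, and all names below are illustrative):

```python
# Toy leader-failover model: each partition has an ordered replica list
# spread across brokers, with the first entry as its preferred leader.
# When a broker fails, leadership moves to a surviving replica instead
# of the data being rebuilt over the network.

def elect_leaders(assignments, failed_broker):
    """Promote, for each partition, the first replica in its list that is
    not the failed broker (the current leader, if it is still alive)."""
    new_leaders = {}
    for partition, replicas in assignments.items():
        survivors = [b for b in replicas if b != failed_broker]
        new_leaders[partition] = survivors[0] if survivors else None
    return new_leaders

# Hypothetical partition -> replica assignments across three brokers.
assignments = {
    "orders-0": ["broker-1", "broker-2", "broker-3"],
    "orders-1": ["broker-2", "broker-3", "broker-1"],
}

print(elect_leaders(assignments, "broker-2"))
# orders-0 keeps broker-1 as leader; orders-1 fails over to broker-3.
```

This is why spreading replicas across fault domains matters: a partition whose replicas all share one failed node has no survivor to promote.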
Kafka provides replication of topics as well as data mirroring between clusters. Replication should be considered a way to achieve fault tolerance should a broker fail, while mirroring is typically used to make data available in another datacenter. Items to consider include how long it takes for replicas to be rebuilt once a broker is back online, what disaster recovery strategy is in place in case of cluster or zone failure, and what level of RTO and RPO is needed between sites. Consider your throughput requirements and use only the number of brokers necessary to achieve that performance, while leveraging technologies that replicate data to other hosts in the cluster so you can maximize availability, performance, and protection.
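The fault-tolerance arithmetic behind replication is worth making explicit. Assuming producers use `acks=all`, a topic with `replication.factor` copies and `min.insync.replicas` required for a write to succeed stays writable through the difference between the two. A minimal sketch (the two property names are real Kafka configuration keys; the helper function itself is illustrative):

```python
# Back-of-the-envelope write-availability check for a Kafka topic,
# assuming producers use acks=all. Writes remain available through
# (replication.factor - min.insync.replicas) simultaneous broker failures.

def write_failure_tolerance(replication_factor, min_insync_replicas):
    """Return how many brokers can fail before writes start failing."""
    if min_insync_replicas > replication_factor:
        raise ValueError("min.insync.replicas cannot exceed replication.factor")
    return replication_factor - min_insync_replicas

# A common production baseline: 3 replicas, 2 required in sync.
print(write_failure_tolerance(3, 2))  # 1 broker can fail without blocking writes
```

Note the trade-off this exposes: raising `min.insync.replicas` strengthens durability guarantees but shrinks the number of broker failures writes can ride through.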
Kafka provides built-in security features, including authentication, access controls for operations, and encryption using SSL between brokers. However, you should also consider whether the data in the filesystems on disk is protected, and which users have access to manipulate the backing stores where that data lives. Ideally, organizations should protect data at the application level while also securing the data layer for added security.
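As a concrete sketch of the application-level piece, enabling SSL on a Kafka client comes down to a handful of properties. The property keys below are standard Kafka configuration names; the service hostname, file paths, and password are placeholders invented for this example:

```python
# Illustrative TLS settings for a Kafka client. The keys are real Kafka
# configuration property names; every value is a placeholder -- point the
# store locations at secrets mounted into the pod in your environment.

ssl_client_config = {
    "bootstrap.servers": "kafka-0.kafka-headless:9093",        # hypothetical service name
    "security.protocol": "SSL",                                # encrypt client <-> broker traffic
    "ssl.truststore.location": "/etc/kafka/secrets/truststore.jks",
    "ssl.truststore.password": "changeit",                     # placeholder
    "ssl.keystore.location": "/etc/kafka/secrets/keystore.jks",
    "ssl.keystore.password": "changeit",                       # placeholder
}

# Sanity check: TLS is enabled and both stores are configured.
assert ssl_client_config["security.protocol"] == "SSL"
assert "ssl.truststore.location" in ssl_client_config
```

Remember that these settings only protect data in transit; the on-disk log segments under the broker's data directories still need filesystem permissions, volume encryption, or both.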