Running stateful applications such as Elasticsearch as a microservice is not a trivial task because of how container orchestrators like Kubernetes treat the lifecycle of containers and pods. In essence, these resources are ephemeral entities with a short life span, depending on cluster state and application load…
Elasticsearch is a highly scalable, enterprise-grade search engine that can hold massive amounts of data and serve other applications for data discovery. One of its most common use cases is loading and visualizing logs for analysis, which is where the term ELK comes in. ELK stands for Elasticsearch, Logstash, and Kibana. Logstash loads data from a source, and Kibana is a tool integrated with Elasticsearch for visualizing data.
Kubernetes is an open-source platform that runs a cluster of master and worker nodes, allowing teams to deploy, manage, scale, and automate containerized workloads such as Elasticsearch. Kubernetes can manage many applications at massive scale, including stateful applications such as databases and search platforms. Kubernetes stands on the shoulders of giants: Google initially conceived the software after using similar technology to run production workloads for over a decade.
If you are architecting an Elasticsearch stack such as ELK to be highly scalable, performant, and secure, and you want to follow today’s DevOps patterns, then Kubernetes is a great place to start. Kubernetes simplifies operations such as upgrades, scaling, restarts, and monitoring, which are more or less built into the platform. Although Kubernetes offers a lot out of the box, it is not a silver bullet; there is still plenty to consider in order to run Elasticsearch on it.
Elasticsearch can be memory-heavy as it sorts and aggregates data, so make sure that your Kubernetes worker nodes have enough memory to run Kubernetes, data management tools, and Elasticsearch itself. Labeling worker nodes for memory-intensive workloads can be a good way to place a StatefulSet that demands a certain amount of memory.
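As a minimal sketch of the memory guidance above, the following Python builds a StatefulSet manifest as a plain dictionary and prints it as JSON, a format `kubectl apply -f -` accepts. The node label `workload-type=memory-intensive`, the image tag, the resource names, and the 8 GiB figure are illustrative assumptions, not values from this article. Giving the JVM heap half of the container memory follows Elastic's general sizing guidance and should be tuned for your own workload.

```python
import json


def elasticsearch_statefulset(name="elasticsearch", replicas=3, memory_gib=8):
    """Build a StatefulSet manifest pinned to memory-labeled worker nodes.

    All names, labels, and sizes here are hypothetical placeholders.
    """
    # Assumption: give the JVM heap half of the container memory, leaving
    # the rest for the filesystem cache and off-heap structures.
    heap_gib = memory_gib // 2
    labels = {"app": name}
    container = {
        "name": name,
        # Example image tag only; pin whatever version you actually run.
        "image": "docker.elastic.co/elasticsearch/elasticsearch:8.13.0",
        "env": [
            {"name": "ES_JAVA_OPTS", "value": f"-Xms{heap_gib}g -Xmx{heap_gib}g"}
        ],
        "resources": {
            # Equal request and limit so the scheduler reserves the full amount.
            "requests": {"memory": f"{memory_gib}Gi"},
            "limits": {"memory": f"{memory_gib}Gi"},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Assumption: nodes were labeled beforehand, e.g.
                    # kubectl label node <node> workload-type=memory-intensive
                    "nodeSelector": {"workload-type": "memory-intensive"},
                    "containers": [container],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(elasticsearch_statefulset(), indent=2))
```

A production manifest would also need volume claims, probes, and discovery settings; this sketch covers only node placement and memory sizing.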
Elasticsearch can be write-heavy, which means you need a storage and data management layer that is flexible enough to expand to meet these requirements but that is also closely tied to your Kubernetes cluster for ease of use. Enforcing I/O profiles optimized for Elasticsearch and using storage pool types backed by SSDs can help with performance. Additionally, using a container storage and data management layer that enforces data locality (i.e. pod and data volume on the same host) for your Pod’s persistent volume claims, even in the event of failover, will ensure the best possible configuration.
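The storage requirements above could be expressed roughly as follows. This is a sketch under stated assumptions: the provisioner name `example.com/ssd-provisioner` and the `type: ssd` parameter are hypothetical, because parameter keys (including any I/O profile setting) depend entirely on the storage driver you use. `allowVolumeExpansion` and `volumeBindingMode` are standard Kubernetes StorageClass fields.

```python
import json


def ssd_storage_class(name="elasticsearch-ssd",
                      provisioner="example.com/ssd-provisioner"):
    """Build a StorageClass manifest for Elasticsearch data volumes.

    The provisioner and the "type" parameter are hypothetical placeholders.
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        # Driver-specific and hypothetical: request an SSD-backed pool.
        "parameters": {"type": "ssd"},
        # Let existing claims grow as write volume increases.
        "allowVolumeExpansion": True,
        # Delay binding until a pod is scheduled, so the volume can be
        # provisioned with that pod's node topology in mind.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def data_volume_claim_template(storage_class, size_gib=100):
    """Build one volumeClaimTemplates entry for a StatefulSet."""
    return {
        "metadata": {"name": "data"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }


if __name__ == "__main__":
    sc = ssd_storage_class()
    print(json.dumps(sc, indent=2))
    print(json.dumps(data_volume_claim_template(sc["metadata"]["name"]), indent=2))
```

Note that `WaitForFirstConsumer` only influences where a volume is first provisioned. Keeping pod and data together after a failover is the job of the storage layer itself.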
Elasticsearch provides built-in security features, including application-level RBAC, encryption of data in flight, and auditing. However, you should also consider whether your data on disk is protected and which users have access to manipulate the backing stores. For added security, organizations should protect their data at the application level and also secure the data layer with encryption and access controls.
Elasticsearch receives data and keeps it in indices. These indices can have data retention policies set through a Curator configuration. However, that is just half the battle. Pick a container storage and data management solution that provides proper data replication, backups (including off-site backups), and disaster recovery so that you can recover from node or site failure and gain maximum availability and protection.
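To make the retention idea concrete, here is a small sketch of the kind of age-based selection a retention policy performs. It is not Curator's implementation. It assumes daily indices named with a `logstash-YYYY.MM.dd` suffix and a 30-day window, both of which are illustrative choices; the function name `expired_indices` is hypothetical.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, retention_days=30, prefix="logstash-"):
    """Return date-stamped indices that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not managed by this policy
        try:
            stamp = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a plain date; leave the index alone
        if stamp < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = [
        "logstash-2024.02.29",
        "logstash-2024.03.01",
        "logstash-2024.03.30",
        "kibana-2024.01.01",
    ]
    print(expired_indices(names, today=date(2024, 3, 31)))
```

The function only selects candidates. Deleting them, and confirming that a backup exists first, would be separate steps.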