Elasticsearch on Kubernetes: Step-by-step guide to run ELK on the most popular k8s platforms

What is Elasticsearch?

Elasticsearch is a highly scalable, enterprise-grade search engine that can hold massive amounts of data and can be used by other applications for data discovery. One of its most common use cases is loading and visualizing logs for analysis. This is where the term ELK comes in. ELK stands for Elasticsearch, Logstash, and Kibana. Logstash is a tool that loads data from a source, and Kibana is a tool integrated with Elasticsearch to visualize that data.

Benefits of Elasticsearch

One of the biggest advantages of Elasticsearch is its speed. The main reason for this speed is that Elasticsearch is built on top of Apache Lucene, which provides powerful searching and indexing features.

Elasticsearch can perform text searches at close to real-time speeds. This allows you to get search results with minimal latency – sometimes as fast as 10 ms. In contrast, a comparable full-text search in a standard SQL database can take far longer, in some cases more than 10 seconds.

This setup makes Elasticsearch suitable for search requests in time-critical scenarios like infrastructure monitoring and cybersecurity applications.

Another strength of Elasticsearch is its distributed architecture. Each index is split into partitions called shards, which are replicated and spread across the nodes of a cluster. Not only does this make the data redundant and reliable, but it also speeds up searches, because a query can run against several shards in parallel. In addition, the distributed nature of Elasticsearch allows it to handle huge volumes of data – up to several petabytes.
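As a rough illustration of how shards and replicas relate, the sketch below builds the settings body used when creating an index and counts how many shard copies the cluster ends up storing. `number_of_shards` and `number_of_replicas` are standard index settings; the helper names and the counts are arbitrary examples, not recommendations.

```python
import json


def index_settings(primary_shards: int, replicas_per_primary: int) -> dict:
    """Build the settings body for creating an index with the given layout."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    }


def total_shard_copies(primary_shards: int, replicas_per_primary: int) -> int:
    """Every primary shard is stored once, plus once per replica."""
    return primary_shards * (1 + replicas_per_primary)


# Example values only: 3 primary shards, each with 1 replica.
body = index_settings(3, 1)
print(json.dumps(body, indent=2))
print(total_shard_copies(3, 1))  # 3 primaries + 3 replicas = 6
```

With one replica per primary, the cluster can lose a node and still hold a full copy of every shard, provided primaries and replicas sit on different nodes.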

Because Elasticsearch is distributed, it’s also easy to scale. It can be expanded horizontally by adding any number of servers, nodes, or clusters. This gives it the ability to store and process thousands of gigabytes of information without suffering in performance.

Elasticsearch also uses the JavaScript Object Notation (JSON) format, which is easy for humans to read and for machines to parse and interpret. For search-heavy workloads, a document store that uses JSON can often achieve higher search performance than a regular relational database. JSON is also supported by a wide array of programming languages, which makes it easy for developers to integrate Elasticsearch into a software project.
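To make the JSON point concrete, here is a minimal sketch of a document and a search request body. The log event and its field names are made up for illustration; the `match` query shape follows the Elasticsearch Query DSL.

```python
import json

# A made-up log event, stored as a JSON document.
document = {
    "@timestamp": "2024-01-15T10:32:00Z",
    "service": "checkout",
    "level": "error",
    "message": "payment gateway timeout",
}

# A full-text query expressed in the JSON-based Query DSL.
query = {"query": {"match": {"message": "timeout"}}}

# Both serialize to plain JSON that any HTTP client can send.
document_json = json.dumps(document)
query_json = json.dumps(query)
print(document_json)
print(query_json)
```

Because both the data and the query are ordinary JSON, any language with a JSON library and an HTTP client can index and search documents.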

Lastly, Elasticsearch is a schema-free type of storage that doesn’t need data definitions up front. You also don’t need to specify the data type of each field explicitly, because Elasticsearch will either assign a default type or detect it automatically. Overall, this makes managing the data much easier.
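The automatic type detection described above can be approximated in a few lines. This is a simplified imitation of the default dynamic mapping rules, not Elasticsearch code: it leaves out date detection on strings and the handling of arrays, and `guess_field_type` is a hypothetical helper name.

```python
def guess_field_type(value):
    """Approximate the default field type Elasticsearch picks for a JSON value."""
    if isinstance(value, bool):  # checked before int because bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "text"  # by default a "keyword" sub-field is added as well
    return None  # null values do not create a field


# Made-up document used only to exercise the helper.
doc = {"status": 200, "latency": 0.42, "ok": True, "message": "done", "user": {"id": 7}}
mapping = {field: guess_field_type(value) for field, value in doc.items()}
print(mapping)
```

In practice, explicit mappings are still worth defining for fields where the detected type would be wrong or wasteful.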

What is Kubernetes?

Kubernetes is an open-source platform that runs a cluster of worker nodes and master (control plane) nodes, allowing teams to deploy, manage, scale, and automate containerized workloads such as Elasticsearch. Kubernetes can manage many applications at massive scale, including stateful applications such as databases or search platforms. Kubernetes stands on the shoulders of giants: Google initially conceived the software after using similar technology to run production workloads for over a decade.

Benefits of Kubernetes

The main advantage of Kubernetes is that it speeds up development. The containerized approach means applications can be developed as true microservices that communicate via API calls. Thus, the development team can code, test, and deploy individual components independently and in parallel with the rest of the project.

Kubernetes also streamlines development by enabling teams to push container apps easily across the pipeline. Moving from development to testing, for example, doesn’t require installing the software or setting up the environment. Containers are self-sufficient, so they can simply be deployed straight to the testing server.

The result is that dev teams can shorten their timeframes and even reduce their costs.

Container orchestration solutions like Kubernetes are particularly cost-efficient. Resources can be automatically controlled and allocated to the applications that need them most. Most container tasks are also automated in Kubernetes, allowing IT staff to focus on higher-level tasks.

Kubernetes also gives organizations strong scalability through autoscaling tools such as the Vertical Pod Autoscaler (VPA) and Horizontal Pod Autoscaler (HPA). Workloads can grow on demand during peak periods without teams having to set up and invest in the infrastructure up front. Kubernetes simply allots new resources to compensate.
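The core of the HPA decision is a simple proportion: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. The sketch below follows that documented formula but omits details such as the tolerance band and stabilization windows; the function name and default bounds are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count in proportion to how far the metric is from target."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    # Clamp to the configured bounds, as an autoscaler would.
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(4, 90, 60))  # 6
# The same 4 pods at 20% CPU: scale in.
print(desired_replicas(4, 20, 60))  # 2
```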

Conversely, Kubernetes can re-assign resources to other applications when demand dies down. This can help make your operation incredibly efficient by minimizing wasted resources.

Related to this is Kubernetes’ flexibility in any environment. Thanks to its platform independence, Kubernetes can run on any infrastructure, whether it be cloud, on-premises, or hybrid.

That means enterprises are free to migrate their operations as they see fit. In line with this, Kubernetes supports various migration approaches such as lift and shift and re-platforming. This allows developers to take any application – even a traditional monolith – and redeploy it in containers with minimal code changes.

Indeed, Kubernetes can help your operations move much more rapidly – a necessary characteristic to thrive in today’s competitive landscape.

Why Deploy Elasticsearch on Kubernetes?

The Elasticsearch / Kubernetes combination is a good idea in theory and practice due to the way Elasticsearch is built.

Remember that Elasticsearch has a distributed architecture. Essentially, it runs as a cluster composed of multiple nodes. It’s this setup that allows Elasticsearch to perform search queries quite rapidly.

Nodes in Elasticsearch often have different roles. A master node, for instance, is responsible for controlling the entire Elasticsearch cluster. Some nodes focus on storing and indexing data, while others primarily coordinate and load-balance requests.

As it turns out, the containerized approach of Kubernetes is the perfect fit for Elasticsearch’s distributed nature.

For one, the Elasticsearch / Kubernetes solution is a much better approach than using virtual machines (VMs) to run Elasticsearch instances. The latter is much more complicated to deploy and much less efficient.

An Elasticsearch / Kubernetes operator makes configuring, deploying and monitoring Elasticsearch instances trivial. Using a few commands, you can quickly install it on Kubernetes. You can also opt for other solutions like Helm Charts to further automate the process.
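As a sketch of what an operator-managed deployment looks like, the snippet below builds a cluster definition with separate node sets per role. It assumes the operator is Elastic Cloud on Kubernetes (ECK) and uses its `Elasticsearch` custom resource; the cluster name, node set names, counts, and version are placeholders chosen for illustration.

```python
import json


def elasticsearch_manifest(name, version, node_sets):
    """Build an Elasticsearch custom resource for the ECK operator."""
    return {
        "apiVersion": "elasticsearch.k8s.elastic.co/v1",
        "kind": "Elasticsearch",
        "metadata": {"name": name},
        "spec": {
            "version": version,
            "nodeSets": [
                {"name": set_name, "count": count, "config": {"node.roles": roles}}
                for set_name, count, roles in node_sets
            ],
        },
    }


manifest = elasticsearch_manifest(
    name="logging",    # hypothetical cluster name
    version="8.13.0",  # example version; use the release you actually run
    node_sets=[
        ("masters", 3, ["master"]),       # dedicated master-eligible nodes
        ("data", 2, ["data", "ingest"]),  # nodes that store and index data
    ],
)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output can be saved to a file and applied with `kubectl apply -f` once the operator is installed.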

You can also exploit the many advantages of containers, such as automatic scaling and migration between hybrid and cloud environments. Elasticsearch / k8s can also give added resiliency, especially during node restarts.

The key to making a stateful application (like Elasticsearch) run on a platform originally geared toward stateless workloads (like Kubernetes) is to use Persistent Volumes (PVs). A Persistent Volume keeps data independent of the lifecycle of any single pod, which bridges the gap between the two and allows Elasticsearch to run stateful workloads on clusters and nodes in Kubernetes.
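A pod requests persistent storage through a PersistentVolumeClaim. The following minimal sketch builds one; the claim name, size, and storage class name are hypothetical and would come from your own cluster setup.

```python
import json


def data_volume_claim(name, size, storage_class):
    """Build a PersistentVolumeClaim for a single Elasticsearch data volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # One node mounts the volume read-write, the usual mode for a data node.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names and size, for illustration only.
claim = data_volume_claim("elasticsearch-data-0", "100Gi", "fast-ssd")
print(json.dumps(claim, indent=2))
```

In a StatefulSet, the same `spec` would normally go into a volume claim template so that each replica gets its own volume, which follows the pod when it is rescheduled.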

The bottom line is that the Elasticsearch / Kubernetes pairing makes the search engine even more powerful, fast, and efficient.

Running Elasticsearch on Kubernetes

If you are architecting an Elasticsearch stack such as ELK to be highly scalable, performant, and secure, and you want to follow today’s DevOps patterns, then Kubernetes is a great place to start. Kubernetes allows organizations to simplify operations such as upgrades, scaling, restarts, and monitoring, which are more or less built into the platform. Although Kubernetes offers a lot out of the box, it is not a silver bullet, and there are several things you will need to consider in order to run Elasticsearch on it.

Proper Worker Configuration

Elasticsearch can be memory-heavy as it sorts and aggregates data, so make sure that your Kubernetes worker nodes have enough memory to run Kubernetes, data management tools, and Elasticsearch itself. Labeling certain worker nodes for memory-intensive workloads can be a good way to place a StatefulSet that demands a certain amount of memory.
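The sketch below shows one way to combine node labels with memory requests. It applies the commonly cited guideline of giving the JVM heap no more than half of the available memory and keeping it under roughly 32 GB; the 31 GB cap, the node label, and the helper names are assumptions for illustration. `ES_JAVA_OPTS` is the environment variable Elasticsearch reads for JVM options.

```python
def heap_size_mb(container_memory_mb, heap_cap_mb=31 * 1024):
    """Give the JVM heap half of the container memory, up to a cap."""
    return min(container_memory_mb // 2, heap_cap_mb)


def pod_spec_fragment(container_memory_mb, node_label):
    """Build the scheduling and memory parts of a pod spec (not a complete spec)."""
    heap = heap_size_mb(container_memory_mb)
    return {
        # Only schedule onto worker nodes carrying this label.
        "nodeSelector": node_label,
        "containers": [{
            "name": "elasticsearch",
            "resources": {
                # Equal request and limit so the memory is actually reserved.
                "requests": {"memory": f"{container_memory_mb}Mi"},
                "limits": {"memory": f"{container_memory_mb}Mi"},
            },
            "env": [{"name": "ES_JAVA_OPTS", "value": f"-Xms{heap}m -Xmx{heap}m"}],
        }],
    }


# Hypothetical label; apply it to nodes with `kubectl label nodes`.
fragment = pod_spec_fragment(8192, {"workload-type": "memory-intensive"})
print(fragment["containers"][0]["env"][0]["value"])  # -Xms4096m -Xmx4096m
```

The half of the memory not given to the heap is left for the operating system's filesystem cache, which Lucene relies on heavily.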

Reliable Container Storage and Data Management

Elasticsearch can be write-heavy, which means you need a storage and data management layer that is flexible enough to expand to meet these requirements but that is also closely tied to your Kubernetes cluster for ease of use. Enforcing I/O profiles optimized for Elasticsearch and using storage pool types backed by SSDs can help with performance. Additionally, a container storage and data management layer that enforces data locality (i.e., pod and data volume on the same host) for your pod’s persistent volume claims, even in the event of failover, will give you the best possible configuration.

Data Security

Elasticsearch provides built-in security features, which include application-level RBAC, encryption of data in flight, and auditing. However, you should also consider whether your data on disk is protected and which users have access to manipulate these backing stores. For added security, organizations should protect their data at the application level as well as secure the data layer with encryption and access controls.

Data Protection & Disaster Recovery

Elasticsearch receives data and keeps it in indices. These indices can also have data retention policies applied, for example through a Curator configuration. However, that is just half the battle. Pick a container storage and data management solution that provides proper data replication, backups, disaster recovery, and off-site backups so that you can recover from node or site failure and gain maximum availability and protection.
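The idea behind an age-based retention policy can be sketched in plain Python. This is not Curator's API; it is a hypothetical helper that assumes daily indices named with a `logs-YYYY.MM.DD` pattern and returns the ones that have aged out.

```python
from datetime import date, datetime, timedelta


def indices_past_retention(index_names, today, retention_days, prefix="logs-"):
    """Return the daily indices whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not managed by this policy
        try:
            index_day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if index_day < cutoff:
            expired.append(name)
    return expired


# Made-up index names for illustration.
names = ["logs-2024.01.01", "logs-2024.01.20", "metrics-2024.01.01", "logs-current"]
print(indices_past_retention(names, today=date(2024, 1, 31), retention_days=14))
# ['logs-2024.01.01']
```

Retention only controls how long data lives inside the cluster; it does not replace backups or replication at the storage layer.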

Best Practices for Running Elasticsearch on Kubernetes

Once you’ve decided to give the Elasticsearch / Kubernetes combination a try, here are some tips and best practices to follow.

First, make sure you’re using the latest version of Elasticsearch and the Elastic operator for Kubernetes. This ensures you have the latest updates and fixes that can protect you from potential security vulnerabilities and give you access to the latest features.

Before deploying the Elasticsearch operator, make sure your security is established first. Elasticsearch is a frequent target for attackers, who can easily find and probe an exposed port, even if you have authentication in place. Because of this, never expose Elasticsearch directly to the Internet. If outside access is required, route it through the Kubernetes environment rather than opening Elasticsearch ports directly.

Speaking of the Kubernetes environment, keeping it as secure as possible is vital. Use Kubernetes’ built-in security controls, such as limiting pod access, and scan your container images. Also, ensure your root filesystem is well-protected – make it read-only, if possible.
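Two of these controls can be sketched as Kubernetes objects: a NetworkPolicy that limits which pods may reach Elasticsearch, and a container security context with a read-only root filesystem. The pod labels (`app: elasticsearch`, `app: kibana`) and the policy name are assumptions; ports 9200 and 9300 are the default HTTP and transport ports.

```python
import json

ES_PODS = {"matchLabels": {"app": "elasticsearch"}}  # hypothetical label
KIBANA_PODS = {"matchLabels": {"app": "kibana"}}     # hypothetical label

network_policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "elasticsearch-ingress"},
    "spec": {
        "podSelector": ES_PODS,
        "policyTypes": ["Ingress"],
        "ingress": [
            # Kibana may call the HTTP API.
            {"from": [{"podSelector": KIBANA_PODS}],
             "ports": [{"protocol": "TCP", "port": 9200}]},
            # Elasticsearch nodes may talk to each other on the transport port.
            {"from": [{"podSelector": ES_PODS}],
             "ports": [{"protocol": "TCP", "port": 9300}]},
        ],
    },
}

container_security_context = {
    "readOnlyRootFilesystem": True,
    "allowPrivilegeEscalation": False,
    "runAsNonRoot": True,
}

print(json.dumps(network_policy, indent=2))
print(json.dumps(container_security_context, indent=2))
```

A read-only root filesystem only works if the directories Elasticsearch writes to, such as data, logs, and temporary files, are mounted as writable volumes. The NetworkPolicy takes effect only when the cluster's network plugin enforces policies.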

Once that’s set up, you must secure communication between Elasticsearch nodes. You can do this by enabling SSL/TLS, which encrypts internode data transfer and improves the security and privacy of the cluster.
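For reference, internode TLS is controlled by the `xpack.security.transport.ssl.*` settings in `elasticsearch.yml`. The sketch below renders a minimal set of them; the certificate paths are hypothetical, and `to_yaml_lines` is a small helper written for this example rather than part of any library.

```python
transport_tls_settings = {
    "xpack.security.enabled": True,
    "xpack.security.transport.ssl.enabled": True,
    "xpack.security.transport.ssl.verification_mode": "certificate",
    "xpack.security.transport.ssl.keystore.path": "certs/node.p12",    # hypothetical path
    "xpack.security.transport.ssl.truststore.path": "certs/node.p12",  # hypothetical path
}


def to_yaml_lines(settings):
    """Render flat key/value settings as elasticsearch.yml lines."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML booleans are lowercase
        lines.append(f"{key}: {value}")
    return lines


print("\n".join(to_yaml_lines(transport_tls_settings)))
```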

As an added layer of security, you should also enable role-based access control (RBAC) with Kibana. This allows you to manage who can access Elasticsearch data, down to individual fields.
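As an illustration of field-level access, the sketch below builds a role definition in the shape accepted by the Elasticsearch security role API. The role name, index pattern, and field list are made up, and field-level security may depend on your license level.

```python
import json

role_name = "logs_reader"  # hypothetical role name

role_body = {
    "indices": [{
        "names": ["logs-*"],     # hypothetical index pattern
        "privileges": ["read"],  # read-only access
        # Only these fields are visible to users holding the role.
        "field_security": {"grant": ["@timestamp", "level", "message"]},
    }]
}

request_line = f"PUT /_security/role/{role_name}"
print(request_line)
print(json.dumps(role_body, indent=2))
```

The same role can be created through the Kibana role management screens; the JSON form is convenient when roles are kept in version control.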

You should also enable Elasticsearch’s audit logging. This is crucial for monitoring your clusters and recording what happens during an attack. The latter is useful as it gives you forensic evidence to know how a breach occurred so that you can improve your security.

Now would also be a good time to consider your storage approach with Elasticsearch. An easy way is with a storage platform that handles provisioning in the background. A good option is the data management solutions offered by Portworx Enterprise.