Run Hadoop, Spark, and ElasticSearch in Containers

In a world of big data, business intelligence depends on data processing applications. Tools like ElasticSearch, Riak, Cassandra, and Hadoop allow large amounts of data to be processed quickly. But to support them, DevOps teams must manage a variety of big data workloads on the same infrastructure, and each of those workloads needs to scale compute independently of the underlying storage. That means your data should be separate from, but accessible to, your compute cluster.

Portworx provides a data layer for your scale-out big data jobs, whether they need high-performance, IOPS-optimized storage or less expensive commodity storage for batch jobs.

Problems with data processing today:

Storage infrastructure doesn’t map to scale-out compute clusters
Slow volume provisioning makes it hard to quickly scale compute
It is difficult to support IOPS-intensive workloads and batch jobs on the same infrastructure

Portworx Works with Major Data Processing Platforms

Containerized Storage that Maps to Your Data Processing Workloads

Portworx storage is designed to work alongside scale-out compute clusters like those powering big data workloads. By turning commodity servers into a hyper-converged scale-out storage cluster, Portworx lets you scale your storage as you scale your compute cluster.

Volumes Are Ready as Soon as Your Container Starts

One of the main benefits of containers is how quickly they launch. But if you have to wait 30-45 seconds to mount a volume to a container each time it starts, bursting to 1,000 nodes for quick data processing is anything but quick. Portworx enables on-demand data volumes for your containers as soon as the containers start.
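To put the mount delay in perspective, here is a rough back-of-the-envelope calculation. It assumes one container per node and one volume mount per container start, neither of which is stated above, and the function name is purely illustrative.

```python
def aggregate_wait_hours(containers: int, mount_seconds: float) -> float:
    """Total time all containers spend waiting on volume mounts, in hours."""
    return containers * mount_seconds / 3600


# Assumption: a burst to 1,000 nodes runs one container per node,
# and each container waits once for its volume to mount.
low = aggregate_wait_hours(1000, 30)
high = aggregate_wait_hours(1000, 45)

print(f"{low:.1f} to {high:.1f} container-hours spent waiting")
```

Under those assumptions, a single burst costs roughly 8 to 12.5 container-hours of idle compute, even though each individual container waits less than a minute.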

Class-of-Service Lets You Pick the Right Storage for Each Data Processing Job

Not all big data workloads are created equal. Your ElasticSearch-based business intelligence tooling might need IOPS-optimized storage, while your Hadoop batch jobs might be fine running on slightly slower but much cheaper HDDs. With Portworx, you can match each containerized workload to the storage infrastructure optimized for the task at hand. Portworx automatically fingerprints all storage resources in your cluster and presents this storage to your containerized applications based on the Class-of-Service (COS) requested by the container. Set the COS to “High” for ElasticSearch and “Low” for Hadoop, and your jobs automatically run on the hardware best suited to them, so you can tier your storage appropriately.
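The idea of matching a requested class of service to fingerprinted storage can be sketched in a few lines of Python. This is a conceptual illustration only, not Portworx code or its API: the pool names, the IOPS thresholds, the "medium" tier, and the functions `classify` and `pools_for` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical IOPS cutoffs per class; the real criteria are not given here.
COS_MIN_IOPS = {"high": 10_000, "medium": 1_000, "low": 0}


@dataclass(frozen=True)
class StoragePool:
    name: str
    measured_iops: int  # stand-in for the result of "fingerprinting" a device


def classify(pool: StoragePool) -> str:
    """Return the highest class whose IOPS threshold the pool meets."""
    for cos in ("high", "medium", "low"):
        if pool.measured_iops >= COS_MIN_IOPS[cos]:
            return cos
    raise ValueError(f"invalid IOPS measurement for {pool.name}")


def pools_for(cos: str, pools: list[StoragePool]) -> list[str]:
    """Names of the pools that fall into the requested class."""
    return [p.name for p in pools if classify(p) == cos]


# Made-up cluster inventory and per-workload COS requests.
pools = [
    StoragePool("nvme-0", 80_000),
    StoragePool("ssd-0", 5_000),
    StoragePool("hdd-0", 150),
]
workload_cos = {"elasticsearch": "high", "hadoop-batch": "low"}

placement = {w: pools_for(c, pools) for w, c in workload_cos.items()}
print(placement)
```

In this toy inventory, the ElasticSearch workload lands on the NVMe pool and the Hadoop batch job on the HDD pool, which is the tiering behavior described above.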

Containerize Your Data Processing Workloads

Get storage that supports your scale-out data processing jobs.
