Umair Mufti is the product manager for Portworx Data Services at Portworx by Pure Storage. You can read about the Portworx Data Services announcement here.
0.7.6. That was the latest release of Docker when I joined DreamWorks Animation in 2014. The company had hired me as a Cassandra DBA, and my first assignment was to figure out how to run Cassandra inside a container. Eight years and three LinkedIn updates later, I’m still working on that same assignment.
That’s not to say I was unable to containerize Cassandra. In fact, by the time I left the company, there were over 1,500 database containers running in dozens of production and non-production environments. We had moved well beyond Cassandra to include other NoSQL solutions as well as relational databases, document stores, search indexes, streaming, message queues, caches, and more. But the more we understood the problem, the more we realized how foolish our ambition had been.
The first oversight was failing to realize how different databases are from the microservices Docker was really created for. The issue is not that databases are stateful applications. In fact, in the days before Kubernetes, providing storage to containers was not much of a challenge. The real challenge with Cassandra (and most databases, for that matter) is that it is a distributed application. All the tooling that exists for managing containers, from the Docker image specification itself to container orchestrators like Kubernetes, conceives of a container as encapsulating an entire application rather than one component of a distributed application. In other words, distributed applications have never been a natural fit for containers.
Despite this, we found a way to make it work. More accurately, we found many ways to make it work. This problem, like many others we’d soon discover, was something we needed to solve repeatedly. Running a containerized data service in 2021 is a very different thing than it was in 2014. Without doubt, it will be something else again in 2025. This was the other fact we failed to appreciate when we set out on this journey: the goalposts are constantly shifting. What worked with LXC would need to be updated to work with libcontainer. What worked with Docker would need to be updated to work with Kubernetes. For each purported advancement—like the introduction of StatefulSets and, later, Operators—we would need to re-engineer solutions. What started as a seemingly simple task soon became a full-time job. And what was a full-time job for one suddenly required a full team to support.
By this time, I realized that our goal was much broader than I had originally understood. The goal was not just about reducing toil for DBAs or developers. The goal was to make their lives easier. Achieving it would require not just a team but the backing of a company whose mission is to build a better world with data. Pure Storage is that company, and Portworx Data Services is the realization of that goal.
PDS allows companies of any size to run an enterprise-grade database-as-a-service on any Kubernetes cluster, on-prem or in-cloud. Built on top of Portworx Enterprise, the gold standard in Kubernetes storage, PDS handles backup and restore, HA, DR, autoscaling, encryption, and data migration for the industry’s broadest catalog of data services. Put simply, PDS makes data services easy.
To learn more, visit the Portworx Data Services product page.