What is PostgreSQL?
PostgreSQL, often shortened to Postgres, is a relational database that uses SQL to perform queries. It provides features that let users persist data safely and scale data workloads. Postgres has proven itself in many production software stacks over its lifetime and offers flexible, reliable features for a wide range of applications.
What is Kubernetes?
Kubernetes is an open-source platform that runs a cluster of master and worker nodes, allowing teams to deploy, manage, scale, and automate containerized workloads such as PostgreSQL. Kubernetes can manage many applications at massive scale, including stateful applications such as databases or streaming platforms. Kubernetes stands on the shoulders of giants: it was initially conceived at Google, which had used similar technology to run production workloads for over a decade.
Running Postgres on Kubernetes
Kubernetes is a great place to run many types of workloads that require automation and scale. Because of their particular requirements (security, reliability, performance), stateful services benefit especially from this automation, which lets teams get to market faster without sacrificing reliability. Because they house data, stateful workloads like Postgres running on Kubernetes must meet the business requirements outlined below in order to satisfy the compliance, security, availability, and performance demands of mission-critical applications.
Container Native Storage
Kubernetes automates application deployments by scheduling and rescheduling pods dynamically based on available resources. Because Postgres is a stateful service, you need a data layer that can dynamically provision storage for Postgres pods and keep that data available when pods are rescheduled. Using block devices from a cloud provider as container volumes often leads to a low density of stateful containers per host, slow database provisioning, and slow failover, because these systems were not designed to provide storage directly to containers. Traditional on-premises storage systems that were built for VMs have similar problems. Choose a container-native storage layer that can match the scale, density, and dynamism of container environments.
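As a rough illustration of dynamic provisioning, the sketch below builds a PersistentVolumeClaim manifest as a plain Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The claim name `postgres-data`, the storage class `fast-replicated`, and the size are hypothetical placeholders; the storage class you use depends on the storage layer installed in your cluster.

```python
import json


def postgres_pvc(name: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest for a Postgres data volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # The storage class decides which provisioner creates the volume.
            "storageClassName": storage_class,
            # A single Postgres pod mounts the volume read-write.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # All three argument values are hypothetical, for illustration only.
    manifest = postgres_pvc("postgres-data", "fast-replicated", "10Gi")
    print(json.dumps(manifest, indent=2))
```

When a claim like this is created, the provisioner behind the named storage class is expected to create a matching volume on demand, so nobody has to pre-allocate a disk for each Postgres pod.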
Performance & SLA Considerations
As a stateful service, Postgres running on Kubernetes is composed of a container image running in a pod and a data volume stored on disk. This volume must persist through a server failure; otherwise, the Postgres pod will be relaunched on another host without its data. High-performance persistent storage is particularly important for Postgres because of components such as the write-ahead log (WAL). The WAL is a major piece of the database software that ensures data integrity by logging each change and flushing that log record to disk before the change itself is written to the database's data files. This allows Postgres to re-apply logged changes during crash recovery. These logs are stored within the persistent location configured when Postgres starts, so any data management layer should support high performance and data locality for them; otherwise the application will be sluggish whenever it writes to disk.
After a server failure, Postgres can rebuild a data set by replaying logs, but this process can be very slow. For production systems with strict SLAs, it is preferable to keep an up-to-date copy of the volume elsewhere in the cluster so that Kubernetes can simply reschedule the pod to a host where a local copy of the data is already available.
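The write-ahead logging idea described above can be illustrated with a toy key-value store. This is a minimal sketch of the general technique, not how Postgres implements its WAL: the class name `TinyStore` and the JSON-lines log format are invented for illustration, and the sketch ignores checkpoints, torn writes, and concurrency.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store that logs each change before applying it."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.data = {}
        self.replay()

    def replay(self) -> None:
        """Rebuild in-memory state by re-applying every logged change."""
        self.data = {}
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as wal:
            for line in wal:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append the change to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only then apply the change to the "database" itself.
        self.data[key] = value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = TinyStore(path)
        store.put("a", "1")
        store.put("a", "2")
        # Simulate a restart: the new instance recovers state from the log.
        recovered = TinyStore(path)
        print(recovered.data)
```

Every `put` pays for an `fsync` before it returns, which is why the latency of the volume holding the log matters so much. Replay time also grows with the length of the log, which is the reason a ready replica of the volume can recover faster than a full replay.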
Cloud Migrations and Disaster Recovery
PostgreSQL is often the system of record for important data sets that must be managed in a way that ensures data integrity and availability. Pick a cloud-native storage system that allows you to migrate Postgres pods and data between environments using the Kubernetes control plane. For both scheduled and unscheduled maintenance, you should be able to copy your Postgres data and Kubernetes objects, such as controllers and PVCs, between environments without time-consuming WAL replays or ETL processes. Additionally, you should ensure that in the event of a complete data center or regional outage, you can run your Postgres database in another environment with a recovery point objective (RPO) of zero, meaning no data loss.
Postgres provides built-in security features such as encryption and TLS. However, you should also consider how data volumes are protected when running inside a multi-tenant Kubernetes environment, and which users are able to manipulate these objects and their backing stores. For added security, organizations should protect their data at the application level and also secure the data management layer with encryption and role-based access controls.
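One way to limit who can manipulate volume objects is Kubernetes role-based access control. The sketch below builds a namespaced Role that grants read-only access to PersistentVolumeClaims, again as a Python dictionary that can be printed as JSON for `kubectl apply -f`. The namespace `databases` and the role name `pvc-reader` are hypothetical, and a real policy would also need a RoleBinding that attaches the role to specific users or service accounts.

```python
import json


def pvc_read_only_role(namespace: str, name: str = "pvc-reader") -> dict:
    """Build a Role that can view, but not change or delete, PVCs."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": name},
        "rules": [
            {
                # The empty string denotes the core API group, where PVCs live.
                "apiGroups": [""],
                "resources": ["persistentvolumeclaims"],
                # Read-only verbs: no create, update, patch, or delete.
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


if __name__ == "__main__":
    # The namespace name is a hypothetical placeholder.
    print(json.dumps(pvc_read_only_role("databases"), indent=2))
```

Controls like this cover only the Kubernetes objects. Access to the backing store itself still has to be restricted in the data management layer.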