A Kubernetes operator is a custom controller that encodes operational knowledge of an application into the cluster itself. It essentially turns your operational runbooks into YAML manifests. Instead of relying on humans to execute procedures, you declare the desired state of a complex application and the operator continuously reconciles the cluster toward that state. The pattern can manage any workload with a complicated lifecycle, but it delivers the most value with stateful systems like databases, where failure handling, replication, and backups must happen correctly every time.

Database operators are therefore one of the clearest demonstrations of the pattern in practice. Running a replicated database reliably requires operational decisions that normally live in runbooks: how to detect a failed primary, which replica is safest to promote, and how to ensure backups continue during topology changes.

In this article, we’ll illustrate these concepts using CloudNativePG, a cloud-native Postgres operator that has become a reference implementation of the pattern. Postgres itself isn’t the point; what matters is that the operator handles the hard parts: automated failover, continuous backup to object storage, and fencing of failed instances, with minimal configuration and high predictability. If you’re evaluating how database operators work in Kubernetes, it’s a good example to study.

How Does the Operator Pattern Work?

A Kubernetes operator is a custom controller paired with a Custom Resource Definition (CRD). The CRD extends the Kubernetes API with a new resource type (Cluster in the CloudNativePG case), and the controller runs a reconciliation loop that watches instances of that resource and continuously drives actual system state toward the declared state.

This is the same pattern the built-in Kubernetes controllers use. A Deployment controller watches Deployment resources and ensures the right number of Pod replicas exist. A database operator does the same thing, but the reconciliation logic encodes operational knowledge specific to the database engine: how to initialize a replica from a backup, how to determine which replica has the most recent WAL position during failover, how to fence a former primary that might still believe it’s active.
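To make this concrete, here is a minimal sketch of a Cluster resource. The name demo-db and the sizes are illustrative placeholders: once the CRD is installed, Kubernetes accepts this manifest like any built-in type, and the operator’s reconciliation loop creates and maintains the pods, replication, and services it implies.

```yaml
# Minimal illustrative Cluster resource. "demo-db" and the sizes are
# placeholders. The CRD makes "Cluster" a first-class API type; the
# operator reconciles actual cluster state toward this declared spec.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-db
spec:
  instances: 2        # one primary plus one streaming replica
  storage:
    size: 10Gi
```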

CloudNativePG manages the Postgres lifecycle with three properties that define how operators differ from standard infrastructure automation:

  • Failover is declarative, not scripted. When a primary instance becomes unreachable, the operator evaluates all available replicas, selects the one with the least replication lag by Log Sequence Number, promotes it, and reconfigures the remaining replicas to follow the new primary. There is no runbook to execute; the operator performs the entire procedure automatically.
  • Backup is built in. Continuous WAL archiving and base backups to S3-compatible object storage are part of the CRD spec, not a separate job or agent. The backup configuration lives in the same manifest as the replication configuration.
  • Instance fencing is automatic. If a former primary comes back after a failover, the operator does not let it rejoin as a primary. CloudNativePG first attempts to recover it using pg_rewind if the PVC is still available, which is the faster path. If that isn’t possible, the instance is re-initialized as a replica from backup. Either way, it cannot re-enter the cluster as a primary. This prevents split-brain at the database layer, without manual intervention.
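Because backup is part of the API, recurring base backups are also declared as resources rather than as cron jobs on a host. A sketch, assuming a cluster named inventory-db; note that CNPG’s schedule field uses a six-field cron expression with seconds first.

```yaml
# Illustrative nightly backup at 02:00 for a cluster named "inventory-db".
# The schedule is a six-field cron expression (seconds come first).
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: inventory-db-nightly
spec:
  schedule: "0 0 2 * * *"
  cluster:
    name: inventory-db
```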

What the Manifest Looks Like

Here is an illustrative CloudNativePG cluster definition using the inline backup method. The annotations explain the configuration choices that matter for availability. This manifest uses barmanObjectStore because it keeps backup configuration inline and self-contained, which makes the overall structure easier to follow. For new deployments on CNPG v1.26+, use the Barman Cloud Plugin via the CNPG-I plugin architecture, which moves backup configuration into a separate ObjectStore resource.

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: inventory-db
spec:
  instances: 3

  # Hard anti-affinity: pods stay Pending rather than co-locating on the
  # same node when capacity is constrained. The default (omitting this
  # block) is a soft preference — two replicas can land on the same node
  # with no warning. Use "required" for any production HA topology.
  affinity:
    enablePodAntiAffinity: true
    podAntiAffinityType: required

  storage:
    size: 50Gi
    # If your backend is topology-scoped or zonal, use a StorageClass with
    # volumeBindingMode: WaitForFirstConsumer (see the storage section below).
    storageClass: fast-replicated

  postgresql:
    parameters:
      shared_buffers: "256MB"
      max_connections: "200"

  # Note: barmanObjectStore is the current inline configuration method.
  # From CloudNativePG v1.26, native Barman Cloud integration is deprecated
  # in favor of the CNPG-I plugin architecture. For new deployments, evaluate
  # the Barman Cloud Plugin. The inline method remains functional for
  # deployments created before CNPG v1.26.

  backup:
    barmanObjectStore:
      destinationPath: "s3://backups/inventory-db"
      s3Credentials:
        accessKeyId:
          name: backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
    retentionPolicy: "7d"

  # Set requests equal to limits for Guaranteed QoS. If requests are lower
  # than limits (Burstable QoS), the pod is eligible for OOM eviction under
  # memory pressure. Match both cpu and memory to get deterministic,
  # reservation-like behavior — a single mismatched resource drops you to
  # Burstable.

  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1"

The entire database lifecycle, from provisioning and replication to backup and resource allocation, lives in one object that Kubernetes tracks, versions, and reconciles.
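For comparison, under the CNPG-I plugin architecture the same backup destination moves out of the Cluster spec into its own resource. The following is a sketch, assuming the Barman Cloud Plugin is installed; treat the apiVersion and field layout as assumptions to verify against the plugin documentation for your version.

```yaml
# Sketch of the plugin-based equivalent: the ObjectStore resource holds the
# configuration that previously lived inline under backup.barmanObjectStore.
# The Cluster then references it through its spec.plugins list.
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: inventory-db-store
spec:
  configuration:
    destinationPath: "s3://backups/inventory-db"
    s3Credentials:
      accessKeyId:
        name: backup-creds
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: backup-creds
        key: SECRET_ACCESS_KEY
```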

For VMware Experts: What Changes When You Leave vSphere Behind?

The operator pattern is designed to manage both infrastructure and application state as a single concern. For teams coming from VMware vSphere®, that means several familiar responsibilities shift in ways that aren’t always obvious.

The Management Plane

In vSphere, vCenter® is the convergence point for compute, storage, and availability decisions. It is centralized, stateful, and on by default. When you migrate to Kubernetes, you lose that single pane of glass, but you don’t lose the capability it provided.

vSphere HA Admission Control prevents you from overcommitting a cluster to the point where a database can’t restart after a node failure, as it actively reserves failover headroom and blocks new placements that would consume it. Kubernetes doesn’t replicate this as a single default mechanism, but gives you the building blocks to approximate it. Resource Requests give the scheduler a guaranteed placement floor. PriorityClasses determine eviction order under memory pressure. Pod Preemption allows high-priority database pods to displace lower-priority workloads and claim capacity after a node failure.

These primitives govern scheduling and eviction ordering, which gets you most of the way there. What they don’t do by themselves is proactively hold capacity in reserve the way Admission Control does. Replicating that behavior requires deliberate overprovisioning strategy on top of PriorityClasses and Requests.

The difference is operational, not architectural. Admission Control is on by default in vSphere. In Kubernetes, you define PriorityClasses explicitly and assign them to your database workloads. The coordination layer isn’t missing, but it’s also not pre-configured.
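As a sketch of that explicit configuration (the class name and value below are arbitrary placeholders): define a PriorityClass, then reference it from the database pods so the scheduler preempts lower-priority workloads on their behalf and evicts them last.

```yaml
# Hypothetical PriorityClass for database workloads. Name and value are
# placeholders; pods with higher values preempt lower-priority pods during
# scheduling and are evicted later under node pressure.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: database-critical
value: 1000000
globalDefault: false
description: "Reserved for stateful database pods."
```

CloudNativePG exposes a priorityClassName field on the Cluster spec, so the class can be applied to every instance pod from the same manifest shown earlier.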

Application Awareness

In vSphere, the database process inside a VM is largely transparent to the infrastructure. vSphere HA has supported application-level heartbeats via VMware Tools since version 6.5, but this requires explicit SDK-level integration that most Postgres deployments don’t implement. In practice, most teams run the standard setup: vSphere protects the VM; the DBA protects the database.

This means that vSphere and whatever database management tooling you use must agree on the state of the world. When they disagree, you get split-brain scenarios, stale replicas promoted to primary, or backup jobs that silently stop running because a VM was migrated to a host where the backup agent wasn’t configured.

A database operator collapses this separation. CloudNativePG has native awareness of Postgres internals: replication lag by LSN, WAL archive status, and which replica is safest to promote. It doesn’t monitor the VM that Postgres runs in, but rather Postgres itself.

Storage

In vSphere, vSAN handles replication below the VM. The database doesn’t know or care. In Kubernetes, the CSI driver and StorageClass determine whether your PersistentVolumeClaims are actually replicated. But the operator can’t fix bad storage topology. If you configure three Postgres replicas on a non-replicated backend, you have three copies of Postgres and a single point of failure at the storage layer.

StorageClass configuration also introduces a failure mode that has no vSphere equivalent: volumeBindingMode: Immediate provisions the PersistentVolume as soon as the PVC is created, before the scheduler knows which node will host the pod. In a multi-zone cluster, this can place your volume in Zone A while the pod lands in Zone B. Many storage backends are zone-scoped, so the pod simply fails to attach. The fix is volumeBindingMode: WaitForFirstConsumer, which delays provisioning until the scheduler has selected a node. This ensures storage and compute land in the same failure domain.
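A sketch of a StorageClass with delayed binding; the provisioner shown is a placeholder, so substitute the CSI driver your platform actually uses.

```yaml
# Placeholder provisioner; the key line is volumeBindingMode, which delays
# volume provisioning until the pod has been scheduled to a node, so the
# volume is created in that node's zone (failure domain).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-replicated
provisioner: example.csi.vendor.com   # substitute your CSI driver
volumeBindingMode: WaitForFirstConsumer
```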

Placement and Anti-Affinity

vSphere DRS anti-affinity rules keep database VMs on separate hosts. In Kubernetes, the equivalent is pod anti-affinity, and CloudNativePG supports two modes.

The default is a soft preference: if your cluster has fewer nodes than database instances, the scheduler will co-locate replicas without warning. Set podAntiAffinityType: required and the behavior changes: the pod stays in Pending state until a unique node is available. That’s the equivalent of a DRS “Must” rule. Use it for any production HA topology.

Operator Availability

The operator itself runs as a Deployment in the cluster. If the operator is down, your databases keep running, but no reconciliation happens. A failover during an operator outage won’t be handled.

How to address this? CloudNativePG supports leader election across multiple operator replicas. Increase the replica count on the cnpg-controller-manager Deployment and you get an active/standby management plane. If the active operator pod fails, a follower takes over leadership via a Kubernetes Lease object and reconciliation resumes. That is a single replica-count change in a Deployment spec; compare vCenter Server HA, which requires a dedicated passive node, a witness, and dedicated management networks.
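A sketch of the change, assuming the default deployment name and namespace from the CNPG installation manifests (cnpg-controller-manager in cnpg-system); only the replica count differs from the stock Deployment.

```yaml
# Only spec.replicas changes; leader election via a coordination.k8s.io
# Lease ensures exactly one replica reconciles at a time, while the other
# waits as a hot standby.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cnpg-controller-manager
  namespace: cnpg-system
spec:
  replicas: 2   # one leader, one standby
```

In practice this can be applied as a patch (for example with kubectl scale) rather than by editing the full Deployment.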

Key Differences at a Glance

| Capability | vSphere | Kubernetes / CloudNativePG |
| --- | --- | --- |
| Management plane | Centralized (vCenter); on by default | Distributed reconciliation loop; requires configuration |
| Admission control | Slot-based or percentage reservation; default-on | Resource Requests + PriorityClasses + Preemption; must be configured explicitly |
| Application awareness | VM-level; App Monitoring via Tools requires SDK integration | Native: LSN, WAL status, per-instance Postgres health |
| Storage replication | Below the VM (vSAN); transparent to the workload | At the StorageClass layer; must be explicitly configured |
| Storage topology | vSAN handles zone placement | WaitForFirstConsumer required to co-locate compute and storage |
| Placement rules | DRS anti-affinity (“Must” or “Should”); default-on | podAntiAffinityType: required or preferred; soft by default |
| Management HA | Active/Passive + Witness; complex network config | Leader election via Lease object; replica count change only |
| Failover logic | VM restart on surviving host; no database-layer awareness | LSN-based replica promotion; database-consistent by design |

Wrapping Up: The Policy Is Still the Runbook

The operator pattern doesn’t just automate database administration; it encodes operational decisions into the same API that manages everything else in the cluster.

The capabilities that VMware engineers rely on still exist: resource guarantees, placement rules, admission-equivalent capacity management, management-plane HA. What changes is that Kubernetes requires you to configure them explicitly rather than inheriting them from a centralized management plane that applies them by default.

The operator pattern blurs the boundary between infrastructure and application that vSphere draws cleanly. But the operational discipline that made vSphere reliable doesn’t disappear. It’s just encoded in YAML now.

About Portworx

Portworx products enable you to run any cloud-native data service, in any cloud, using any Kubernetes platform, with built-in high availability, data protection, data security, and hybrid-cloud mobility.  Learn more at portworx.com