Platform engineering is changing the way organizations deploy software and manage the underlying infrastructure. Some of our old ways of working are being replaced, while others are simply absorbed into the new paradigm. This blog will focus on the cloud operating model in a platform engineering organization.
Public clouds are a common infrastructure platform for DevOps teams seeking scalable and flexible hosting for their applications. They offer unified APIs across a broad list of services and immediate access to seemingly unlimited infrastructure resources. The cloud operating model is an approach to using these resources that gets the most out of your applications while limiting your cloud spend. It encourages automation, self-service access to needed resources, and elasticity: scaling up when performance is needed and scaling down to lower costs when resources are not. In a world flipped upside down by elastic public cloud resources, the cloud operating model lent some clarity to how organizations should best make use of them.
The cloud operating model is great: it helps eliminate service tickets and the time spent waiting for infrastructure to be provisioned by a separate team. However, this new model brought challenges with it. Developers who were used to writing application code were now sometimes tasked with provisioning their own infrastructure to improve deployment velocity. It seemed like nirvana at first, as developers could get code deployed on their own schedule, but it introduced issues for the operations and compliance teams. Operations teams were often still responsible for ensuring that these applications were highly available, protected from data corruption, ready for a sitewide disaster, and performing optimally. Security and governance teams had to ensure that security patches were deployed, network access rules were in place, secrets were encrypted, standards were met, and costs were minimized.
Many teams responded by “shifting left,” moving all of these responsibilities onto developers to give them more control over the deployed applications and infrastructure resources. After all, it’s more difficult to change already-deployed applications than it is to deploy them with the proper constraints to begin with. But all of this shifting left put a high cognitive load on the development teams building those applications. Taking on infrastructure security, disaster recovery, availability, and data protection on top of their existing workloads was a lot to deal with. The teams that did this effectively reorganized into cross-functional teams or hired those legendary Full Stack Engineers who could seemingly do anything. Many companies were left facing a skills gap simply because of how many different things their teams needed to know. It also meant companies tried to standardize on a single cloud provider to limit the number of cloud services, APIs, and environment specifics they were managing, which led to a cloud lock-in the business would rather avoid.
Platform engineering can remove some of the challenges the cloud operating model created while staying true to the goals of elasticity, self-service, and agility that developers were striving for. Platform engineering teams take the complexities of operations away from developers by orchestrating them and providing a catalog of curated capabilities. The operations, security, and governance items that once had to be shifted left to the development teams can now be codified into the internal developer platform (IDP). The cloud operating model tenets are still in place, but the responsibilities have shifted to a dedicated team, allowing developers to do what they do best: develop code. Let’s look at the cloud operating model tenets through a platform engineering lens:
Self-Service – Perhaps the most important tenet of the cloud operating model was giving developers on-demand access to deploy their own resources. Self-service is still a prominent capability in the platform engineering paradigm, but instead of developers accessing the cloud API directly, the internal developer platform provides the self-service access.
Automation – Automation is critical, not only for application deployments but also for setting up the infrastructure platforms needed to run the apps, and it ensures that resources are deployed the same way every time. Automation keeps us from reverting to the old ticketing workflows of long (or not so long) ago.
Guardrails – Governance teams need to make sure that applications are deployed in ways that limit downtime, ensure security, and reduce costs. These guardrails can now be built into the IDP automation routines at deployment time. This accomplishes the goal of shifting governance left, earlier in the overall process, but doesn’t burden developers with the cognitive load of having to understand all of these guardrails.
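As a concrete sketch of what a codified guardrail can look like, an IDP might stamp a resource quota into every namespace it provisions, capping spend before a single workload is deployed. The namespace name and limits below are illustrative, not a prescribed standard:

```yaml
# Hypothetical guardrail an IDP could apply to every namespace it creates.
# Developers never see this; the platform enforces it at provisioning time.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-default-quota
  namespace: team-orders          # illustrative namespace name
spec:
  hard:
    requests.cpu: "8"             # cap total CPU requested in the namespace
    requests.memory: 16Gi         # cap total memory requested
    persistentvolumeclaims: "10"  # limit the number of volumes
```

Because the quota is plain Kubernetes, the same guardrail works identically on every cluster the platform targets.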
Vendor Lock-in – The challenge of cloud lock-in can be effectively addressed by the collaborative efforts of platform engineering teams. Developers can continue to interact with a unified API for their deployment needs, while platform engineering experts have the flexibility to choose from multiple cloud platforms as deployment destinations. Engineers proficient in a specific public cloud can integrate its features into the IDP, ensuring seamless accessibility for development teams. Over time, a variety of public cloud services can be integrated into the developer platform service catalog, enabling developers to utilize multiple cloud services effortlessly through a unified API.
Similar to how developers struggled with high cognitive load when deploying their apps, platform teams are still burdened by the sea of cloud resources they need to front-end with their IDPs. Every capability offered through the internal developer platform might have to be codified for each cloud the company allows developers to use, including on-premises resources, which creates a lot of work and technical debt across multiple environments. Kubernetes provides a consistent application platform that works the same way across clouds, reducing the number of custom routines that have to be coded and maintained. It also provides a key set of capabilities, such as service discovery, DNS, secrets management, elasticity, ingress, and firewalling, that can simply be leveraged once Kubernetes has been added as a deployment target. This reduces the time platform engineers spend writing custom solutions in infrastructure as code (IaC) tools like Terraform, CloudFormation, or Pulumi.
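To illustrate the portability point, here is a minimal Deployment manifest. The application name and image are placeholders, but the same definition can be applied unchanged to a managed cloud cluster or an on-premises one, so the platform team maintains one artifact instead of per-cloud IaC modules:

```yaml
# Illustrative only: this same manifest deploys unchanged to EKS, AKS, GKE,
# or an on-premises cluster that conforms to the Kubernetes API.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api                # hypothetical application name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: orders-api
          image: registry.example.com/orders-api:1.4.2  # placeholder image
          ports:
            - containerPort: 8080
```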
Platform teams also have to offer storage capabilities as part of their internal developer platforms. Applications deployed through the IDP need encryption, authorization, capacity management, backups, and site-to-site replication to be production ready. Each cloud has different capabilities for these storage tasks, and they may depend further on which cloud services you are using. For example, virtual machine backups and container backups might need different backup solutions even when both workloads run in the same cloud.
By using Kubernetes, platform teams get a common set of capabilities they can depend on no matter what cloud it runs on. Portworx provides the same storage capabilities, with full feature parity, regardless of what kind of Kubernetes cluster you are working with: managed clusters like AKS, EKS, or GKE running Portworx are no different to operate than OpenShift or Tanzu clusters running Portworx in your own data center. Portworx will always provide the ability to resize persistent volumes automatically, control application I/O for noisy neighbors, make data highly available across availability zones, replicate data offsite, take scheduled or on-demand backups, and reduce costs for capacity and IOPS. These capabilities fit neatly into the cloud operating model:
Self-Service – Portworx Enterprise grants self-service capabilities through a native Kubernetes construct called a StorageClass, which gives Kubernetes users a way to request persistent volumes from the storage system. Because Portworx advertises its capabilities through this native construct, it fits into any Kubernetes environment.
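A sketch of what this looks like in practice: a platform team publishes a Portworx-backed StorageClass, and developers consume it by name in an ordinary PersistentVolumeClaim. The parameter names (`repl`, `io_profile`) follow Portworx conventions, but verify the provisioner and supported parameters against your installed Portworx version:

```yaml
# StorageClass published by the platform team (names are illustrative).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-replicated-db
provisioner: pxd.portworx.com     # Portworx CSI provisioner (verify for your install)
parameters:
  repl: "3"                       # keep three replicas of each volume
  io_profile: "db_remote"         # tune I/O for database workloads
allowVolumeExpansion: true
---
# A developer's claim: self-service storage with the guardrails baked in.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-db-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: px-replicated-db
  resources:
    requests:
      storage: 50Gi
```

The developer never learns which cloud disks, replication settings, or encryption options sit behind the class name; that knowledge stays with the platform team.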
Automation – Portworx can ensure not only that you provision storage with the necessary capabilities, but also that those resources are managed through day 2 operations. Portworx handles day 2 tasks automatically so that operations teams aren’t constantly tending already-deployed resources. The Portworx Cloud Drives capability can automatically provision and grow your storage pool capacity, and the Autopilot feature can expand individual persistent volumes as they need additional space. Backup policies can ensure scheduled backups are automatically applied to new Kubernetes resources in individual namespaces or an entire cluster. These capabilities reduce the manual effort required from operations teams.
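For example, Autopilot rules are declared as Kubernetes resources. The sketch below is modeled on the examples in the Portworx Autopilot documentation and grows any monitored volume that crosses 80% utilization; the API version, metric expression, and thresholds should be checked against your installed version:

```yaml
# Sketch of an Autopilot rule: resize volumes before they fill up.
apiVersion: autopilot.libopenstorage.org/v1alpha1
kind: AutopilotRule
metadata:
  name: volume-resize
spec:
  conditions:
    expressions:
      # Trigger when a volume is more than 80% full.
      - key: "100 * (px_volume_usage_bytes / px_volume_capacity_bytes)"
        operator: Gt
        values: ["80"]
  actions:
    # Grow the volume by 50% when the condition is met.
    - name: openstorage.io/action.volume/resize
      params:
        scalepercentage: "50"
```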
Guardrails – Platform engineers can rest assured that new deployments are protected by corporate standards while still allowing flexibility in how applications are installed. When new applications are deployed, Portworx automation routines can ensure they are protected with scheduled backups, automatically replicated to a disaster recovery site, and encrypted at rest and in transit.
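As an illustration, these protections can be declared with Stork scheduling resources. The sketch below pairs a nightly schedule policy with a migration schedule that replicates a namespace to a DR cluster; the cluster pair, namespace, and times are hypothetical, and the exact fields should be confirmed against the Stork documentation for your release:

```yaml
# Hypothetical DR guardrail stamped out by the IDP for a new application.
apiVersion: stork.libopenstorage.org/v1alpha1
kind: SchedulePolicy
metadata:
  name: nightly
policy:
  daily:
    time: "10:30PM"    # illustrative backup window
    retain: 5          # keep the last five copies
---
apiVersion: stork.libopenstorage.org/v1alpha1
kind: MigrationSchedule
metadata:
  name: orders-dr
  namespace: orders                  # illustrative namespace
spec:
  schedulePolicyName: nightly
  template:
    spec:
      clusterPair: dr-cluster-pair   # pre-created pairing with the DR site
      includeResources: true         # migrate the Kubernetes objects too
      startApplications: false       # leave apps stopped at the DR site
      namespaces:
        - orders
```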
Vendor Lock-in – Portworx is not a traditional storage array that pins your data to a piece of hardware or a specific public cloud solution. Portworx is a software-defined storage solution designed specifically for containers, and it can help prevent cloud lock-in. Being software defined means Portworx can be installed on-premises, at the edge, and in the public cloud using whatever block storage is available underneath. You get a consistent experience no matter where the solution is deployed, and you can replicate your data to any other Kubernetes cluster, even one using different backing storage devices.
Portworx can reduce the cognitive load on platform engineers – just like IDPs reduce cognitive load on developers – so that they can provide a familiar storage experience no matter where your Kubernetes cluster is running and no matter what block devices are available within that environment.
Platform engineering plays a critical role in helping organizations maximize the benefits of the cloud operating model without placing so much of the cognitive load on development teams. This comes from automating the operational routines that make applications production ready and placing them behind self-service portals or APIs. The cognitive load on platform engineers, and the technical debt they have to manage, can be reduced by pairing an application platform like Kubernetes with a data platform like Portworx Enterprise.
For more information, check out our recent webinar on platform engineering with Portworx or the other posts in this series.