Platform Engineering for Cloud-Native Organizations

Motivation & Introduction

In recent years, enterprises have migrated large portions of their workloads to the cloud, whether private, public or hybrid. However, many companies still fail to realize the full benefits of cloud computing. According to the State of DevOps report published by Puppet and DORA (DevOps Research & Assessment), which presents a DevOps maturity model, the overwhelming majority of companies struggle to reach the highest level of DevOps maturity. This leads to a set of problems to be solved. First, lead times are long. Second, processes are often carried out manually, usually triggered by tickets created by developers. Third, the tools and their integration are usually complex, which overwhelms developers. Last, developers are slowed down by waiting times, as many companies lack an internal self-service developer platform.

In mature DevOps organizations, the structure typically includes stream-aligned application teams and platform teams. Stream-aligned teams focus on developing and deploying code, while platform teams support them in several ways:

  • Platform services, such as CI/CD and infrastructure provisioning, built on industry-wide best practices such as Infrastructure-as-Code (IaC).
  • Evangelizing and mentoring DevOps practices to promote cultural values such as communication, transparency and knowledge sharing.
  • Rotating human resources: platform teams can lend staff to product teams that lack the specific skills needed to accomplish their work.

This post discusses how to form a platform team that efficiently supports enterprises with cloud-native technologies. It explores how this team integrates with product teams and outlines its responsibilities, emphasizing technical requirements such as Infrastructure-as-Code and GitOps. The remainder of the post reviews related work, gives an overview of the technical background, details the requirements and processes of an efficient platform team, and concludes with considerations for future work.

Related Work

Platform Teams and Platform Engineering emerged as a relatively new trend, with initial publications dating back to 2020. Leite et al. (2020) define platform teams as structures for continuous delivery and discuss their role in organizations and interactions with product teams. Srivastava (2023) emphasizes the importance of Platform and Site Reliability Engineering for both startups and larger organizations, enabling efficient delivery of high-quality, secure, compliant, robust, and reliable products. Seremet et al. (2022) explain the symbiotic relationship between Platform Engineering and Site Reliability Engineering, highlighting shared principles and differences. Puppet’s State of DevOps Report (2023) underscores the significance of platform engineering in achieving DevOps success at scale, its rising prominence, and the benefits it brings to organizations when executed effectively.

Background

Platform Teams in cloud-native environments leverage software engineering principles to accelerate software delivery. Infrastructure-as-Code (IaC) is a key practice: the cloud environment is described declaratively in files kept in a source control system, so that resources can be provisioned and managed efficiently and consistently while keeping costs under control. GitOps extends IaC to all deployments: a Git repository serves as the single source of truth for declarative deployment information, and tooling continuously monitors the running environment and reconciles its actual state with the desired state stored in Git.
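
The reconciliation idea behind GitOps can be sketched as a small diff-and-apply loop. This is a minimal, tool-agnostic illustration under stated assumptions, not how any specific GitOps tool is implemented: the desired state (which in practice would be read from Git) and the actual state (which would be queried from the cluster or cloud API) are both modeled as plain dictionaries keyed by hypothetical resource names, and `reconcile` and `apply_actions` are illustrative helpers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A single change needed to move the actual state toward the desired state."""
    kind: str                 # "create", "update" or "delete"
    name: str                 # hypothetical resource name, e.g. "deployment/web"
    spec: dict | None = None  # desired specification (None for deletions)


def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> list[Action]:
    """Diff the declarative desired state (from Git) against the observed state."""
    actions: list[Action] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(Action("create", name, desired[name]))
        elif actual[name] != desired[name]:
            # Drift: the resource was changed outside of Git.
            actions.append(Action("update", name, desired[name]))
    for name in sorted(actual):
        if name not in desired:
            # In this sketch, resources not declared in Git are always pruned.
            actions.append(Action("delete", name))
    return actions


def apply_actions(actions: list[Action], actual: dict[str, dict]) -> dict[str, dict]:
    """Return a new actual state with the actions applied (stand-in for real API calls)."""
    state = dict(actual)
    for action in actions:
        if action.kind == "delete":
            state.pop(action.name, None)
        else:
            state[action.name] = action.spec
    return state


if __name__ == "__main__":
    desired = {"deployment/web": {"replicas": 3}, "service/web": {"port": 80}}
    actual = {"deployment/web": {"replicas": 1}, "configmap/old": {"debug": True}}
    plan = reconcile(desired, actual)
    for action in plan:
        print(action.kind, action.name)
    # After one pass the states match, so a second diff yields no actions.
    print(reconcile(desired, apply_actions(plan, actual)))
```

Running this diff on every commit, or periodically, is what keeps the environment from drifting away from the state recorded in Git. The sketch always deletes undeclared resources; real GitOps tools typically treat such pruning as a configurable policy.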

Site Reliability Engineering (SRE) automates IT infrastructure tasks and prioritizes the reliability of scalable software systems despite frequent updates. SRE emphasizes observability: Service Level Indicators (SLIs) measure reliability, and Service Level Objectives (SLOs) define the targets, such as availability goals, that these indicators must meet. It also promotes a cultural shift in which every failure is viewed as a system reliability issue.
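
To make the distinction between SLIs and SLOs concrete, the following sketch assumes an availability SLI defined as the fraction of successful requests in a measurement window and a hypothetical SLO target of 99.9%. The remaining error budget (the share of allowed failures not yet consumed) is a common SRE companion metric that is included here only as an illustration; the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    sli: float                     # measured availability, between 0 and 1
    slo: float                     # target availability, between 0 and 1
    met: bool                      # whether the SLI satisfies the SLO
    error_budget_remaining: float  # share of allowed failures left; negative if exhausted


def availability_sli(successful: int, total: int) -> float:
    """Availability SLI: share of requests that succeeded in the window."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successful <= total:
        raise ValueError("successful must be between 0 and total")
    return successful / total


def evaluate_slo(successful: int, total: int, slo: float = 0.999) -> SloReport:
    """Compare the measured SLI with the SLO target and compute the error budget."""
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    sli = availability_sli(successful, total)
    allowed_failures = (1 - slo) * total
    actual_failures = total - successful
    remaining = 1 - actual_failures / allowed_failures
    return SloReport(sli=sli, slo=slo, met=sli >= slo, error_budget_remaining=remaining)


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 of which failed.
    print(evaluate_slo(successful=999_600, total=1_000_000))
```

With these hypothetical numbers, 400 failures are measured against an allowance of 1,000, so the SLO is met and roughly 60% of the error budget remains; a value below zero would mean the budget is exhausted.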

Platform Team Engineering for Cloud-Native Environments

In organizations working with cloud and Kubernetes environments, DevOps teams often face challenges in infrastructure provisioning and security. Platform teams alleviate these issues by abstracting infrastructure concerns away from application teams. Since many companies lack the size or expertise to form their own platform team, platform engineering can also be offered as a service.

Standardization is crucial for such services, as it makes supporting many application teams more efficient. Tooling is required that lets teams consume Infrastructure-as-Code (IaC) modules and provides self-service capabilities. To facilitate GitOps and security best practices, a well-defined Git repository structure is proposed that allows infrastructure and application deployments to work together seamlessly.
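
One way platform tooling could enforce such a repository structure is a check that runs in the pipeline on every change. The layout assumed below (top-level `infrastructure/` and `applications/` folders, each containing one folder per environment `dev`, `staging` and `prod`) is a hypothetical convention chosen for this sketch, not necessarily the structure proposed here, and `check_layout` is an illustrative helper.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical repository convention assumed for this sketch.
REQUIRED_ROOTS = ("infrastructure", "applications")
ENVIRONMENTS = ("dev", "staging", "prod")


def check_layout(repo: Path, environments: tuple[str, ...] = ENVIRONMENTS) -> list[str]:
    """Return layout violations for `repo`; an empty list means it conforms."""
    problems: list[str] = []
    # Every environment needs both an infrastructure and an applications folder.
    for root in REQUIRED_ROOTS:
        for env in environments:
            if not (repo / root / env).is_dir():
                problems.append(f"missing directory: {root}/{env}")
    # Undeclared environment folders are flagged so both trees stay in sync.
    for root in REQUIRED_ROOTS:
        base = repo / root
        if not base.is_dir():
            continue
        for child in sorted(base.iterdir()):
            if child.is_dir() and child.name not in environments:
                problems.append(f"unknown environment: {root}/{child.name}")
    return problems
```

A CI job could call `check_layout` on the checked-out repository and fail the merge request when the returned list is non-empty. The intent of pairing folders per environment is that IaC pipelines and GitOps tooling can each watch their own subtree of the same repository.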

Figure 1 illustrates the responsibilities and interaction of the Platform Engineering team, which acts as an abstraction layer over the underlying infrastructure (typically a public or private cloud). It offers various services that help product teams work more efficiently and manage infrastructure effectively.

  • Most importantly, the Platform Team provides a DevOps tool with pipeline support for running workflows.
  • The Platform Team provides standardized modules that follow programming and security best practices and comply with the organization's policies.
  • Application Teams need a way to interact with the Platform Team. In most cases, Git is sufficient to support IaC pipelines and GitOps workflows. For advanced use cases, a self-service catalog or an internal developer platform can be used.
  • Besides infrastructure deployment support, a CI/CD platform should also be provided. At a minimum, it should support building container images and delivering applications.
  • To support SRE, observability should be integrated into the platform tooling, and every provisioned resource should include observability support.
  • Platform tooling should support product teams across the complete DevOps lifecycle, from building and testing to operating.
  • If Application Teams need support, the Platform Team might provide consultancy and human resources to the respective teams.
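
The interplay of standardized modules, Git-based self-service and built-in observability described in the list above can be sketched as follows. Everything concrete here is an assumption for illustration: the module name `postgres-database`, its version, inputs and allowed regions, the `owner`/`monitoring` labels, and the `validate`/`render` helpers are hypothetical, and the catalog is a plain dictionary rather than a real module registry.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Module:
    """A standardized IaC module published by the platform team (hypothetical)."""
    name: str
    version: str
    required_inputs: frozenset[str]
    allowed_regions: frozenset[str]


@dataclass
class Request:
    """A self-service request from an application team, e.g. a file committed to Git."""
    team: str
    module: str
    inputs: dict[str, str] = field(default_factory=dict)


# Hypothetical catalog owned by the platform team.
CATALOG: dict[str, Module] = {
    "postgres-database": Module(
        name="postgres-database",
        version="1.4.0",
        required_inputs=frozenset({"name", "region", "size"}),
        allowed_regions=frozenset({"eu-central", "eu-west"}),
    ),
}


def validate(request: Request, catalog: dict[str, Module] = CATALOG) -> list[str]:
    """Check a request against the catalog; an empty list means it is compliant."""
    module = catalog.get(request.module)
    if module is None:
        return [f"unknown module: {request.module}"]
    missing = sorted(module.required_inputs - set(request.inputs))
    errors = [f"missing input: {key}" for key in missing]
    region = request.inputs.get("region")
    if region is not None and region not in module.allowed_regions:
        errors.append(f"region not allowed: {region}")
    return errors


def render(request: Request, catalog: dict[str, Module] = CATALOG) -> dict:
    """Produce the final module invocation with platform-enforced labels.

    Ownership and monitoring labels are always injected, so every
    provisioned resource is observable regardless of what the team requested.
    """
    errors = validate(request, catalog)
    if errors:
        raise ValueError("; ".join(errors))
    module = catalog[request.module]
    return {
        "source": f"{module.name}@{module.version}",
        "inputs": dict(request.inputs),
        "labels": {"owner": request.team, "monitoring": "enabled"},
    }


if __name__ == "__main__":
    ok = Request("checkout", "postgres-database",
                 {"name": "orders", "region": "eu-west", "size": "small"})
    print(render(ok))
    print(validate(Request("checkout", "postgres-database", {"region": "us-east"})))
```

Because `render` injects the labels itself, application teams cannot opt out of observability, which mirrors the requirement that all provisioned resources include it, while the catalog keeps every team on the same vetted module versions.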

Figure 1 depicts the relationship between the Platform Tooling and the Applications that are built on top of it.

The platform engineering team serves as a central point of contact for various aspects of the infrastructure and platform. Because the team is responsible for developing, deploying and managing the cloud platform, other teams within the organization can turn to it as their main point of contact.