Internal Developer Platforms – Part 11: Backstage Entities

In the last blog post, we showed how to register software entities in the Software Catalog and found that there are different kinds of entities.

As a recap, the Backstage Software Catalog is a centralized system designed to manage and track ownership and metadata for all software within an ecosystem, including services, websites, libraries, and data pipelines. This catalog uses metadata YAML files, stored with the code, which are collected and displayed in Backstage, facilitating easy management and visualization.

Backstage and the Backstage Software Catalog make it easy for one team to manage 10 services, and make it possible for your company to manage thousands of them.

In detail, the Software Catalog supports two primary use-cases:

1. Management and Maintenance: It provides teams with a consistent view of all their software assets, regardless of type—services, libraries, websites, or machine learning models. This enables teams to efficiently manage and maintain their software.

2. Discovery and Ownership: The catalog ensures all software within a company is easily discoverable and clearly associated with its respective owners, eliminating issues related to "orphan" software that may otherwise be overlooked or lost within the broader ecosystem.

Entity Overview

Overall, Backstage and its Software Catalog simplify the management of numerous services, making it feasible for a single team to oversee many services and for a company to handle thousands.

Now it’s time to address these entities, which include Components, Templates, APIs, Resources, and Systems among others. Each entity type has its specific descriptors:

  • Component: This type refers to a software component, usually closely tied to its source code, and is meant to be viewed as a deployable unit by developers. It typically comes with its own deployable artifact.
  • Template: Entities registered as Templates in Backstage have descriptor files that contain metadata, parameters, and the procedural steps required when executing a template.
  • API: This type covers interfaces that generally provide external endpoints, facilitating communication with other software systems.
  • Resource: Describes types that act as infrastructure resources, which usually provide the foundational technical elements of a system, such as databases or server clusters.
  • System: Unlike the singular definition of components, entities marked as Systems represent a collection of resources and components, meaning a system may encompass multiple other entities. The key advantage of this model is that it conceals the internal resources and private APIs from consumers, allowing system owners to modify components and resources as needed.
  • Domain: Systems serve as the fundamental method for encapsulating related entities. For enhanced organizational clarity and coherence, however, it is often beneficial to group multiple systems that share common characteristics into a bounded context. These characteristics can include shared terminology, domain models, metrics, key performance indicators (KPIs), business purposes, or documentation.
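
To make the Template bullet above more concrete, here is a minimal sketch of what such a descriptor might look like. The template name, owner, parameter, and the `./skeleton` path are hypothetical; `fetch:template` and `debug:log` are scaffolder actions that ship with Backstage, but which actions are available depends on your installation.

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: example-service-template        # hypothetical name
  title: Example Service
  description: Creates a minimal service skeleton
spec:
  owner: group:default/platform-team    # hypothetical owner
  type: service
  # Parameters are rendered as a form in the Backstage frontend.
  parameters:
    - title: Basic information
      required:
        - name
      properties:
        name:
          title: Name
          type: string
          description: Unique name of the new service
  # Steps are executed in order during scaffolding.
  steps:
    - id: fetch
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton                 # hypothetical directory next to the template
        values:
          name: ${{ parameters.name }}
    - id: log
      name: Log result
      action: debug:log
      input:
        message: Created ${{ parameters.name }}
```

The split mirrors the description above: `metadata` identifies the template, `parameters` collects user input, and `steps` lists what is executed with that input.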

Interestingly, there are also organizational entities:

  • User: A user describes a person, such as an employee, a contractor, or similar.
  • Group: Describes an organizational entity, such as a team, a business unit, or a loose collection of people in an interest group.
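
As a rough sketch, a team and one of its members could be described like this. The names `team-exampleapp` and `jcooper` are hypothetical and only serve as an illustration.

```yaml
apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: team-exampleapp          # hypothetical team
  description: Team responsible for ExampleApp
spec:
  type: team                     # e.g. team, business-unit, product-area
  children: []                   # no sub-groups in this sketch
---
apiVersion: backstage.io/v1alpha1
kind: User
metadata:
  name: jcooper                  # hypothetical user id
spec:
  profile:
    displayName: Joseph Cooper
  memberOf: [team-exampleapp]    # links the user to the group above
```

In larger organizations these entities are usually not written by hand but imported from an identity system.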

Entity Details

Entities in Backstage are written in YAML and essentially consist of a metadata section and a spec section.

The metadata section consists of the following fields:

  • metadata.name: A required field that specifies the name of the entity.
  • metadata.namespace: Used for defining the namespace of the entity and for classifying entities.
  • metadata.annotations: Primarily for listing references to external systems, such as links to a GitHub repository.
  • metadata.links: Specifies which links are displayed in the "Overview" tab of the entity's page in Backstage.
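
The following sketch shows how these metadata fields might appear together in a descriptor. All names, the repository slug, and the URL are placeholders; the two annotation keys are well-known Backstage annotations, used here only as examples.

```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: example-website                   # required; hypothetical name
  namespace: default                      # "default" is assumed when omitted
  annotations:
    backstage.io/techdocs-ref: dir:.      # documentation lives next to this file
    github.com/project-slug: example-org/example-website   # hypothetical repository
  links:
    - url: https://example.com/dashboard  # placeholder URL
      title: Dashboard
      icon: dashboard
spec:
  type: website
  lifecycle: production
  owner: team-example                     # hypothetical owner
```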

By contrast, the structure and content of the spec field depend on the entity type selected in the "kind" key. The kind determines how an entry is categorized in the Software Catalog and which relationships to other entities are possible.

For the entity type "Component," the spec fields include spec.type, spec.lifecycle, spec.owner, and spec.system, along with several relationship fields. These fields define essential attributes of the entity, such as the lifecycle stage (e.g., experimental, production, deprecated) and the entity's owner, which is typically a person, team, or organizational unit responsible for the entity's maintenance and development.
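
To illustrate how the spec changes with the kind, here is a sketch of an API entity. Besides type, lifecycle, and owner, it carries a `definition` field that a Component does not have. All names and the embedded OpenAPI stub are made up for illustration.

```yaml
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: example-api              # hypothetical
spec:
  type: openapi                  # format of the definition below
  lifecycle: production
  owner: team-example            # hypothetical owner
  system: example-system         # hypothetical system
  definition: |
    openapi: 3.0.0
    info:
      title: Example API
      version: 1.0.0
    paths: {}
```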

Describing every manifest here would go beyond the scope of this blog post, so we refer to the documentation, where all the manifests are described in detail:

System Model Overview

Together, these elements form a complete system model, which is shown in the following architecture diagram:

This diagram provides a comprehensive view of Spotify's Backstage entities and their relationships.

Key Entities:

Template: Defines parameters used in the frontend and steps executed in the scaffolding process.
Location: References other places for catalog data.
Domain (Orange box): Represents domain models, metrics, KPIs, and business purposes.
System (Yellow box): A collection of entities working together to perform a function.
API (Green box): Represents different APIs, including OpenAPI, gRPC, Avro, etc.
Resource (Light Green box): Contains resources such as SQL databases, S3 buckets.
Component (Light Blue box): Backend services, data pipelines, and similar components.
Group (Blue box): Groups related by type (team, business-unit, product-area).
User (Blue box): Represents users belonging to groups.

Relationships:

System: Part of a Domain.
API: Part of a System; provided and consumed by Components.
Resource: Part of a System; types include database, S3 bucket, cluster.
Component: Part of a System; depends on Resources and other Components; provides and consumes APIs; types include service, website, library.
Group: Has members and sub-groups; can own entities.
User: Member of a Group; can own entities.
The diagram uses different colored boxes for distinct entity types and directional arrows to represent relationships.
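
Expressed as descriptors, a small slice of this model could look like the sketch below. The names (`shop`, `orders`, `orders-db`, `orders-service`, `team-shop`) are hypothetical; the point is only to show which spec field creates which arrow in the diagram.

```yaml
apiVersion: backstage.io/v1alpha1
kind: Domain
metadata:
  name: shop
spec:
  owner: team-shop
---
apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: orders
spec:
  owner: team-shop
  domain: shop                     # System is part of the Domain
---
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: orders-db
spec:
  type: database
  owner: team-shop
  system: orders                   # Resource is part of the System
---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: orders-service
spec:
  type: service
  lifecycle: production
  owner: team-shop
  system: orders                   # Component is part of the System
  dependsOn:
    - resource:default/orders-db   # Component depends on the Resource
```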


Relations between entities always involve two parties, each assuming a specific role within the relationship. These relationships are directional, featuring a source and a target. In a YAML file, the source entity defines the type of relationship as the key name, while the target entity is specified as the value assigned to this key. For example, in the YAML file of a Component type entity, relationships like `dependsOn`, `providesApis`, `consumesApis`, and `subcomponentOf` (shown as `partOf` in diagrams) can be defined as keys, followed by an entity reference according to the previously described pattern.

Each role in a relationship has a corresponding opposite role, which need not be defined in the YAML file but is used in queries or visualizations of relationships. For instance, if Component A lists an API B under `providesApis`, Component A receives the relation `providesApi`, and the referenced API B assumes the opposite role `apiProvidedBy`.
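
As a sketch of how this looks in practice, the hypothetical component below declares only the source side of each relationship. The comments name the relation pair that Backstage derives from each key; the entity names are invented for this example.

```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: orders-worker                 # hypothetical
spec:
  type: service
  lifecycle: experimental
  owner: team-shop                    # ownedBy / ownerOf
  subcomponentOf: orders-service      # partOf / hasPart
  providesApis:
    - orders-api                      # providesApi / apiProvidedBy
  dependsOn:
    - resource:default/orders-db      # dependsOn / dependencyOf
```

None of the referenced entities needs to mention `orders-worker` in its own file; the opposite roles are added by the catalog.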

Let’s recap our example from before:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: exampleappfrontend
  description: Simple website
  links:
    - url:
      title: ExampleApp
      icon: web
  annotations: sclabs/exampleApp/frontend/ dir:.
spec:
  type: website
  lifecycle: production
  owner: Joseph Cooper
  system: exampleapp
  consumesApis: ['component:exampleappservice']

In this application, the descriptor file of `exampleappfrontend` contains the key-value pair `consumesApis: ['component:exampleappservice']`, which references the `exampleappservice` component, a backend that provides an API.

Entity Lifecycle

In the Backstage Software Catalog, the process for registering entities follows a standardized technical flow, regardless of how the entities are initially registered. This flow can be visualized in a comprehensive diagram, often referred to as "The Life of an Entity."

This diagram represents the data flow of Spotify's Backstage catalog ingestion, processing, and stitching pipeline.

Pipeline Stages:

External Sources (Red box): The origin of entity data.
Entity Providers (Green box): Components that ingest data from external sources.
Unprocessed Entities (Yellow box): Entities directly fetched from providers.
Edges (Yellow box): Relationships extracted between unprocessed entities.
Processors (Green box): Modules that transform unprocessed entities.
Processed Entities (Yellow box): Entities after processing.
Relations (Yellow box): Extracted relationships among processed entities.
Errors (Yellow box): Issues detected during processing.
Stitcher (Green box): Combines processed entities and relationships into the final set.
Final Entities (Yellow box): Fully processed entities ready for use.
Search (Yellow box): Indexes entities for quick searching.
Catalog API (Red box): Serves the final entities via an API.
Directional arrows represent the flow of entities through the different stages and components.

  • Entity Ingestion and Provider: The process begins with the "Entity Provider," which collects raw entity data from specified sources and translates it into unprocessed entity objects. To avoid duplicates, the database tracks which provider has ingested which entity. This stage also includes an initial validation to ensure critical fields like 'kind' and 'metadata.name' are present.
  • Entity Processing: The next step involves processing these unprocessed entities. This includes validation and further processing through "Policies" and "Processors." Policies are sets of rules for validation, while Processors apply these rules to validate the entities. This step may involve emitting relationships, error messages, or the entity itself from the raw data.
  • Entity Stitching: The final step is "Stitching," where processed entities, along with any error messages and relationships, are retrieved from the database and combined into the final entity that will be used in the Software Catalog. This process considers relationships that might be defined in other entities and handles any error messages, displaying them within the catalog as necessary.
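
To give an idea of what stitching produces, the sketch below shows roughly how a final entity might look once relations and processing status have been attached. The Catalog API returns JSON; it is rendered as YAML here for readability. The `uid`, `etag`, and error message are placeholders, and the exact set of fields can differ between Backstage versions.

```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: exampleappfrontend
  namespace: default
  uid: 00000000-0000-0000-0000-000000000000   # placeholder; assigned by the catalog
  etag: placeholder-etag                      # placeholder; changes with each update
spec:
  type: website
  lifecycle: production
  system: exampleapp
# Added during stitching, including relations that originate in other entities
relations:
  - type: consumesApi
    targetRef: component:default/exampleappservice
  - type: partOf
    targetRef: system:default/exampleapp
# Present only if processing reported problems
status:
  items:
    - type: backstage.io/catalog-processing
      level: error
      message: Example processing error       # placeholder message
```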

Throughout these steps, developers have the flexibility to implement custom Providers and Processors to fetch entities from unique sources or at specific intervals, though some system constraints like processing intervals are predefined. The recommended method for automating entity ingestion in the catalog is through Custom Entity Providers. Once all sub-steps are completed, the final entities are stored in the internal database and presented in the Software Catalog, ready for use.
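
Writing a custom provider requires code, but the idea of scheduled ingestion can also be seen in the configuration of a ready-made provider. The following app-config fragment is a sketch that assumes the GitHub entity provider module is installed in the backend; the provider id `exampleOrg` and the organization name are hypothetical.

```yaml
catalog:
  providers:
    github:
      exampleOrg:                            # hypothetical provider id
        organization: 'example-org'          # hypothetical GitHub organization
        catalogPath: '/catalog-info.yaml'    # descriptor file to look for in each repository
        schedule:
          frequency: { minutes: 30 }         # how often the provider runs
          timeout: { minutes: 3 }            # abort a run that takes longer
```

A custom provider follows the same pattern: it runs on a schedule, reads from its source, and hands unprocessed entities to the catalog.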


Dr. Guido Söldner


Guido Söldner is Managing Director and Principal Consultant at Söldner Consult. His areas of expertise include cloud infrastructure, automation and DevOps, Kubernetes, machine learning, and enterprise programming with Spring.