Google Cloud Landing Zone Series – Part 7: Network Design

In the last blog post, we showed how to establish connectivity between the on-premises network and the Google Cloud Landing Zone. Now it is time to talk about some network concepts. Networking is at the core of a Landing Zone, so there is plenty to discuss. We will split the topic into two blog posts: in this one, we introduce the most important network concepts, and in the second, we present various architectural designs for different scenarios. We assume that you have a basic understanding of cloud networking.

In detail, this blog post will introduce the following networking components:

  • Private Google Access
  • Private Google Access for on-premises hosts
  • Private Service Access
  • Private Service Connect

Private Google Access

Private Google Access (PGA) allows instances in a Virtual Private Cloud (VPC) network to connect to Google APIs and services through internal IP addresses rather than using external IP addresses. This capability ensures secure and private communication between your Google Cloud resources and Google APIs and services without the need for public IP addresses or NAT (Network Address Translation) gateways.

Why should we use it?

1. Security and Privacy: By using internal IP addresses, traffic remains within Google’s network, enhancing security and privacy. For example, applications running on Google Cloud can securely access Google services like Cloud Storage, BigQuery, or Pub/Sub.

2. No Public IP Required: Instances without public IP addresses can still access Google APIs and services.

3. Cost-Effective: Reduces the costs associated with managing and securing public IP addresses, and reduces reliance on NAT gateways.
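
Private Google Access is a per-subnet setting. As a small illustration (not an official tool), the following Python sketch audits subnet metadata for the `privateIpGoogleAccess` flag used by the Compute API's subnetwork resource; the subnet data itself is made up for the example:

```python
# Illustrative audit: flag subnets that do not have Private Google Access
# enabled. The dicts mimic the Compute API's subnetwork resource, where
# the relevant field is called `privateIpGoogleAccess`; the subnet data
# below is invented for this example.
def subnets_without_pga(subnets):
    """Return the names of subnets where Private Google Access is off."""
    return [s["name"] for s in subnets
            if not s.get("privateIpGoogleAccess", False)]

subnets = [
    {"name": "subnet-a", "region": "us-west1", "privateIpGoogleAccess": True},
    {"name": "subnet-b", "region": "us-east1", "privateIpGoogleAccess": False},
]

print(subnets_without_pga(subnets))  # ['subnet-b']
```

In a real environment you would feed this function with subnet resources retrieved from the Compute API instead of hard-coded dicts.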

The following picture shows an implementation of Private Google Access:

The image shows a network diagram depicting the Google Cloud Platform (GCP) architecture for a "Landing Zone."

At the top, the Internet is connected to Google APIs and Services via public IP addresses. Below that is the main project with a VPC network that includes an Internet Gateway and VPC Routing.

The VPC network consists of two regions: us-west1 and us-east1.

In the us-west1 region, there are two virtual machines (VMs) in subnet-a, where Private Google Access is enabled. VM A1 has the IP address 10.240.0.2, and VM A2 has the IP address 10.240.0.3 with a public IP.
In the us-east1 region, there are two virtual machines (VMs) in subnet-b, where Private Google Access is disabled. VM B1 has the IP address 192.168.1.2, and VM B2 has the IP address 192.168.1.3 with a public IP.
The diagram uses colored lines to indicate traffic paths: green for traffic to Google APIs and Services, and yellow for traffic to the Internet.

Private Google Access for on-premises hosts

Private Google Access for on-premises hosts extends the capability of Private Google Access to on-premises environments. This feature allows on-premises hosts to access Google APIs and services privately, over internal IP addresses, without exposing the traffic to the public internet.

Why should we use it?

1. Secure and Private Access: On-premises hosts can securely access Google Cloud services via internal IP addresses.

2. No Public IPs Required: Similar to PGA for VPC networks, it eliminates the need for public IP addresses for on-premises hosts.

3. Hybrid Cloud Integration: Facilitates seamless integration between on-premises data centers and Google Cloud services.

How does it work?

In order to configure Private Google Access for on-premises hosts, a few steps are required:

1. Establish a Secure Connection: Use Cloud Interconnect or VPN to connect your on-premises network to your Google Cloud VPC network.

2. Configure DNS: Ensure that DNS queries for Google APIs resolve to private IP addresses.

3. Enable Private Google Access: Make sure Private Google Access is enabled on the relevant VPC subnets in Google Cloud.

4. Update Routing: Configure routing to direct traffic from on-premises hosts to Google Cloud services via the secure connection.
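
Step 2 can be sanity-checked programmatically: a correctly configured on-premises DNS server should resolve *.googleapis.com to an address in the restricted range 199.36.153.4/30. Here is a minimal check using Python's standard `ipaddress` module (the sample addresses are just examples):

```python
import ipaddress

# The restricted.googleapis.com VIP range.
RESTRICTED_RANGE = ipaddress.ip_network("199.36.153.4/30")

def is_restricted_vip(ip):
    """Check that a resolved address belongs to the restricted range."""
    return ipaddress.ip_address(ip) in RESTRICTED_RANGE

# A correctly configured on-premises DNS server should return one of the
# four addresses 199.36.153.4-199.36.153.7 for *.googleapis.com.
print(is_restricted_vip("199.36.153.5"))   # True
print(is_restricted_vip("142.250.80.10"))  # False (an ordinary public IP)
```

You could combine this with `socket.getaddrinfo("storage.googleapis.com", 443)` on an on-premises host to verify the DNS configuration end to end.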

The following picture shows the implementation:

The image depicts a network architecture diagram for a Google Cloud Platform (GCP) landing zone with an on-premises network.

At the top, the on-premises network includes subnets and resources connected to an on-premises VPN Gateway with an external IP (BGP IP: 169.254.1.2). A VPN tunnel carries encrypted traffic to the Internet.

Below, within the GCP project "my-project," there is a VPC network with an Internet Gateway connected to VPC Routing and a Routing Table. In the us-east1 region, a Cloud VPN Gateway with a regional external IP is connected to a Cloud Router (169.254.1.1). This setup communicates with the on-premises VPN Gateway via the VPN tunnel.

There is also a restricted range for Google APIs and Services (199.36.153.4/30) connected within the VPC network. The Cloud Router advertises this range, with the next hop being the Cloud Router (169.254.1.1). DNS CNAME maps *.googleapis.com to restricted.googleapis.com for secure access to Google services. The diagram uses colored lines to indicate different traffic paths: green for internal routing, red for encrypted VPN traffic, and connections to the Internet.

Traffic from on-premises hosts to Google APIs travels through the tunnel to the VPC network. After traffic reaches the VPC network, it is sent through a route that uses the default internet gateway as its next hop. This next hop allows traffic to leave the VPC network and be delivered to restricted.googleapis.com (199.36.153.4/30).

Private Service Access

Private Service Access allows you to connect your Virtual Private Cloud (VPC) networks to Google-managed services such as Cloud SQL, AI Platform, and other Google APIs in a secure and private manner. The connection is made over internal IP addresses, ensuring that traffic does not traverse the public internet.

Why should we use it?

1. Private Connectivity: Establishes private connectivity between your VPC network and Google-managed services, avoiding public internet.

2. Enhanced Security: Keeps data traffic secure within the Google Cloud network.

3. Simplified Network Management: Reduces the complexity of managing firewall rules and NAT gateways for service access.

How does it work?

Private Service Access involves setting up private connections from your VPC to Google-managed services using VPC peering.

VPC Peering allows networks to communicate internally using private IP addresses without the need for public IPs or additional firewall rules.

The following picture shows the implementation:

The image depicts a network architecture diagram for a Google Cloud Platform (GCP) landing zone with a customer project and a service producer project.

On the left, the customer project includes a Customer VPC network in the us-central1 region with a virtual machine (VM1) having the IP address 10.1.0.2 in subnet 10.1.0.0/24. There is also an allocated range of 10.240.0.0/16 for private connections.

On the right, the service producer project for the customer includes a Service Producer VPC network. In the us-central1 region, it contains a database instance (DB1) with the IP address 10.240.0.2 in a subnet for Cloud SQL (10.240.0.0/24). In the europe-west1 region, there is another resource with the IP address 10.240.10.2 in a subnet for another service (10.240.10.0/24).

The two projects are connected via VPC Network Peering, allowing private services access traffic between the customer project and the service producer project. The green lines indicate the paths for private services access traffic.

In the diagram, the customer VPC network allocated the 10.240.0.0/16 address range for Google services and established a private connection that uses the allocated range. Each Google service creates a subnet from the allocated block to provision new resources in a given region, such as Cloud SQL instances.
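
The carving of regional subnets out of the allocated block can be illustrated with Python's standard `ipaddress` module; the addresses match the diagram, the rest is a sketch:

```python
import ipaddress

# The range the customer VPC allocated for Google-managed services.
allocated = ipaddress.ip_network("10.240.0.0/16")

# Each service producer carves regional /24 subnets out of the block.
regional_subnets = list(allocated.subnets(new_prefix=24))

print(len(regional_subnets))  # 256 possible /24 subnets in the /16
print(regional_subnets[0])    # 10.240.0.0/24 (e.g. Cloud SQL, us-central1)
print(regional_subnets[10])   # 10.240.10.0/24 (another service, europe-west1)
```

This also shows why the allocated range should be generously sized: every Google-managed service and region consumes another block from it.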

Private Service Connect

Private Service Connect allows you to securely and privately access Google services, third-party services, and your own services through private IP addresses. It ensures that the traffic between your Virtual Private Cloud (VPC) network and these services does not traverse the public internet, thereby enhancing security and performance.

Why should we use it?

1. Private Connectivity: Establishes private connections using internal IP addresses, avoiding public internet exposure.

2. Enhanced Security: Protects data by keeping it within Google’s network, reducing the risk of external threats.

3. Simplified Network Configuration: Streamlines the process of connecting to Google services, third-party services, and your own services.

4. Service Access Control: Allows granular access control and policy management for services.

5. Load Balancing: Supports integration with Google Cloud’s load balancing services to distribute traffic efficiently.

How does it work?

Private Service Connect creates endpoints in your VPC network that serve as entry points to the service you want to access. These endpoints use internal IP addresses, ensuring that the communication remains within the private network.
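
As a rough illustration, a Private Service Connect endpoint address is simply an internal IP in the consumer VPC. The sketch below approximates that check with Python's `ipaddress` module; the real validation is of course performed by Google Cloud when the endpoint is created:

```python
import ipaddress

def is_valid_psc_endpoint_ip(ip):
    """A PSC endpoint address must be an internal (private) IP in the
    consumer VPC, so traffic never leaves the private network.
    This is a simplified illustration, not Google's validation logic."""
    return ipaddress.ip_address(ip).is_private

print(is_valid_psc_endpoint_ip("10.10.0.5"))   # True
print(is_valid_psc_endpoint_ip("34.120.0.1"))  # False (public IP)
```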

The following picture shows this in more detail:

The image depicts a network architecture diagram for a Google Cloud Platform (GCP) landing zone utilizing Private Service Connect.

On the left, the Consumer VPC includes various clients accessing different types of Private Service Connect endpoints:

  • Endpoint
  • Backend
  • Interface

These connect through the central Private Service Connect, represented by a secure lock symbol.

On the right, the Producer VPC offers published services, categorized into:

  • Google services
  • Third-party services
  • Intra-org services

Above, managed services like Google APIs are also accessible via Private Service Connect. The diagram illustrates the secure, private connection paths between consumer clients and various managed and published services within GCP.

Google Cloud Landing Zone Series – Part 6: Connectivity

One of the most important things to consider when creating a Landing Zone is how connectivity should be implemented. Various options are possible, and as cloud technologies, networking included, keep evolving, a Landing Zone created in the past might need to be modernized to take advantage of new technologies and services. The same applies to the future: a Landing Zone can only be built with the technologies available today, and when something new comes to market, you might consider changing or modernizing parts of your Landing Zone, which of course includes connectivity and networking as well.

Most companies tend to implement a hybrid cloud model where some workloads remain on-premises. In that case, connectivity between the cloud and on-premises must be established.

Connectivity options

So, let’s briefly introduce the different options:

First, there is Google Cloud Interconnect, which provides a high-speed, highly available connection directly to Google’s network. There are two main types:

  • Dedicated Interconnect: This provides physical connections between your on-premises network and Google’s network. It is suitable for high-volume, business-critical workloads that require high throughput and low latency. Google offers circuits of 10 Gbps or 100 Gbps.
  • Partner Interconnect: If you want to start smaller but still want an Interconnect, Partner Interconnect might be the right solution. It allows you to connect to Google through a supported service provider. This is a more flexible and cost-effective option if you don’t need the full scale of a dedicated connection.

On the other side, there is Cloud VPN: If you’re looking for a less expensive option than Interconnect and can tolerate the generally higher latency of internet-based connections, Google Cloud VPN is a good choice. It securely connects your on-premises network to your VPC (virtual private cloud) in GCP over the public internet using IPsec VPN tunnels.

If you are just starting your cloud journey, you might begin with a VPN and move to an Interconnect later.

What about MACsec?

MACsec (Media Access Control Security) is a security technology that provides secure communication for Ethernet traffic. It is designed to protect data as it travels on the point-to-point Ethernet links between supported devices or between a supported device and a host. In the context of Google Cloud and hybrid cloud setups, MACsec can be used with Dedicated Interconnect and Partner Interconnect.

Like a VPN, MACsec encrypts traffic; it is therefore recommended to use it in combination with an Interconnect, since an Interconnect by itself does not encrypt traffic.

The following figure shows an architectural diagram for MACsec with a Dedicated Interconnect:

The diagram illustrates the network connectivity between Google Cloud and an on-premises network through a colocation facility.
Diagram showing Google Cloud Landing Zone Connectivity. On the left, in a Google Cloud network (labeled as my-network), a Compute Engine instance (IP: 10.128.0.2) and a Cloud Router (Link-local address: 169.254.10.1) are connected. The Cloud Router is linked via the Google peering edge within a colocation facility (Zone 1). A dedicated interconnect labeled my-interconnect with MACsec encryption connects to the On-premises router (Link-local address: 169.254.10.2) in the on-premises network (Subnet: 192.168.0.0/24). A User device (IP: 192.168.0.11) is connected to the on-premises router. The diagram shows seamless connectivity between the Google Cloud network and the on-premises network via a secure interconnect through the colocation facility.

In the picture, a VLAN attachment for Cloud Interconnect will be configured at the Cloud Router. Behind the scenes, Cloud Router uses Border Gateway Protocol (BGP) to exchange routes between your Virtual Private Cloud (VPC) network and your on-premises network.
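
The link-local addressing in the diagram can be checked with a few lines of Python; the /29 below is a made-up example of the small subnet a VLAN attachment typically uses for the BGP session:

```python
import ipaddress

# In the diagram, Cloud Router and the on-premises router peer over
# link-local addresses from 169.254.0.0/16; the /29 is an invented
# example subnet for the BGP session.
session = ipaddress.ip_network("169.254.10.0/29")

cloud_router = ipaddress.ip_address("169.254.10.1")
on_prem_router = ipaddress.ip_address("169.254.10.2")

# Both BGP peers must sit in the same link-local subnet.
print(cloud_router in session and on_prem_router in session)  # True
print(cloud_router.is_link_local)                             # True
```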

Since recently, MACsec can also be used for Partner Interconnect. The following picture depicts the architecture:

Diagram showing Google Cloud Landing Zone Connectivity using a service provider network. On the left, in the Google Cloud network (labeled as vpc1 (VPC network)), a Compute Engine instance (IP: 10.128.0.2) and a Cloud Router (ASN: 16550, Link-local address: 169.254.10.1) are connected. The Cloud Router is linked via the Google peering edge within a colocation facility (Zone 1). The Google peering edge connects securely to a Service provider peering edge using MACsec encryption via an interconnect labeled my-interconnect (my-project1). The Service provider peering edge leads to another Service provider peering edge through a service provider network. Finally, the connection reaches the On-premises router (Link-local address: 169.254.10.2) in the on-premises network (Subnet: 192.168.0.0/24). A User device (IP: 192.168.0.11) is connected to the on-premises router. The diagram demonstrates how the service provider network facilitates secure connectivity between Google Cloud and the on-premises network through the colocation facility.

Connectivity and Beyond

Having described how to establish connectivity between on-premises and Google Cloud with a Cloud Router, we now discuss how to come up with a design for the workloads. As always, the design depends on your requirements, but basically two “flavors” are quite popular:

  • If you work with (Partner/Dedicated) Interconnect and use only a few Shared VPCs – for example for different stages like Test, Int or Prod – a feasible option is to create dedicated MACsec connections, with a Cloud Router and a VLAN attachment in every Shared VPC. In that case, the different Shared VPCs are isolated from each other. If you need connectivity between the VPCs, you can still set up a VPN between them or use Private Service Connect to publish services to other VPCs. However, keep in mind that the number of VLAN attachments is limited (often between 10 and 15), so you should rely on Shared VPCs.
  • Another way would be to set up a Transit VPC with a MACsec connection and use VPNs to connect it to the other VPCs or Shared VPCs. This approach scales better, as you can have many more VPN connections than VLAN attachments.

While we have been discussing MACsec, basically the same considerations apply when using VPN between on-premises and the Google Cloud.

In addition, while it is also possible to create peerings between VPCs, consider the limitations: there is a hard limit on the number of peerings, and there is no transitive routing across three or more VPCs.

Another possibility would be to use a third-party appliance for the connectivity. If you prefer such a solution, that might be possible, but you should check whether the appliance integrates with the Google Cloud Router – otherwise BGP routes cannot be exchanged.

What about Google Network Connectivity Center?

It is quite important to know that there is also a service called Network Connectivity Center. It is designed to act as a single place to manage global connectivity, providing elastic connectivity across Google Cloud, multicloud, and hybrid networks and giving deep visibility into Google Cloud and tight integration with third-party solutions.

For those of you who have experience with Microsoft Azure Virtual WAN or AWS Transit Gateway, it is interesting to learn that Google Network Connectivity Center is designed to work in a broadly similar way. However, at the time of writing, not all features of Network Connectivity Center are available yet, so we do not recommend it right now; it is probably best to wait, most likely until 2025.

Google Cloud Landing Zone Series – Part 5: Organizational Policies

As described, a Landing Zone serves as the foundation and enables customers to effectively deploy workloads and operate their cloud environment at scale. But while enabling is important, it is also crucial to define standards and guardrails for what the different teams can and cannot do. At this point, organizational policies come into play, and that’s reason enough to discuss them in our Google Cloud Landing Zone series.

What are Organizational Policies?

Let’s give some kind of formal description:

Basically, Organizational Policies in Google Cloud Platform (GCP) are a set of constraints that apply to resources across your entire organization. These policies help govern resource usage and enforce security and compliance practices across all projects and resources within a GCP organization. Organizational Policies ensure that the actions of individual resources align with the broader business rules and regulations that a company wants to enforce.

How do Organizational Policies work?

Basically, Organizational Policies are easy to understand. Let’s discuss the most important aspects:

  • Constraints: Policies are enforced through constraints, which define the specific rules or limitations for resource management within the organization. For example, a constraint can limit which Google Cloud services can be activated or restrict the locations (regions and zones) where resources can be deployed.
  • Policy types: There are two policy types. Boolean constraints are simple enable/disable toggles for certain features or behaviors, for example disabling serial port access for VM instances. List constraints, on the other hand, manage lists of values that either allow or deny specific behaviors, for example restricting which Google Cloud APIs can be enabled in a project.
  • Hierarchy and Scope: Organizational Policies are implemented within a hierarchical structure in GCP. This hierarchy starts from the organization level, extends to folders, and then to projects. Policies set at a higher level (like the organization) apply to all items within it unless explicitly overridden at a lower level (like a project).
  • Customizability: Each constraint can be customized to meet specific organizational needs. This means policies can be tailored to allow exceptions, enforce stricter controls, or completely block certain actions.
  • Enforcement and Compliance: Organizational policies are automatically enforced by the platform, ensuring compliance and reducing the risk of human error. This automated enforcement helps maintain security standards and compliance with internal policies and regulatory requirements.
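
To make the two policy types concrete, here is an illustrative Python model of a list constraint, loosely following the shape of the resource-locations policy. The evaluation logic, the value-group name, and the region set are simplifications written for this example, not Google's implementation:

```python
# Illustrative model of a list constraint. The constraint name follows
# constraints/gcp.resourceLocations; the value-group syntax and the
# region set below are approximations for this example only.
policy = {
    "constraint": "constraints/gcp.resourceLocations",
    "listPolicy": {"allowedValues": ["in:eu-locations"]},
}

EU_REGIONS = {"europe-west1", "europe-west3", "europe-north1"}

def location_allowed(region, policy):
    """Allow a region only if the policy's allowed values cover it."""
    allowed = policy["listPolicy"]["allowedValues"]
    if "in:eu-locations" in allowed:
        return region in EU_REGIONS
    return region in allowed

print(location_allowed("europe-west3", policy))  # True
print(location_allowed("us-central1", policy))   # False
```

A Boolean constraint, by contrast, would simply be a single `enforced: true/false` flag on the policy.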

The following picture shows how Organizational Polices are embedded within the GCP organization hierarchy:

Flowchart depicting the policy management structure in a Google Cloud Landing Zone. The chart shows an Organization Policy Administrator defining an Org Policy, which is set on a Resource Hierarchy Node. This policy is inherited by default to Descendant Resource Hierarchy Nodes, which enforce constraints outlined in the policy. Constraints are defined and referenced by GCP Services, indicating how policies are evaluated and enforced across the cloud resource hierarchy.

Why do I need Organizational Policies?

It is easy to understand why guardrails should be set in a cloud environment, but let’s write down the reasons:

  • Security and Compliance: Organizational policies help ensure that your cloud environment complies with both internal security policies and external regulatory requirements. For example, you can enforce policies that restrict the deployment of resources to specific regions to comply with data residency laws.
  • Risk Management: Policies reduce the risk of data breaches and other security incidents by limiting how resources are configured and who can access them. For example, disabling public IP addresses on virtual machines can prevent accidental exposure of services to the internet.
  • Consistency and Standardization: Applying uniform policies across an entire organization helps maintain consistency in how resources are managed and configured. This standardization is crucial for large organizations where different teams might deploy and manage their resources differently.
  • Operational Visibility: With organizational policies, administrators have a clearer view of the entire organization’s configurations.
  • Minimize Human Error: By enforcing certain configurations and restrictions at the organizational level, you minimize the risk of human error. This can be particularly valuable in preventing misconfigurations that might otherwise lead to security vulnerabilities or operational issues.

What are examples of Organizational Policies?

At the time this blog post was written, there were 121 different Organizational Policy constraints in GCP, and this number is still increasing. The list of constraints can be found in the Google Cloud documentation:

https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints

While the list is too long to discuss every policy in detail, we will nevertheless give a few examples:

  1. Resource Location Restriction: This policy restricts the geographical location where resources can be created. Organizations can enforce data residency requirements by ensuring that data and resources are stored in specific regions or countries, complying with local laws and regulations. For example, you could restrict the locations for the European Union.
  2. Restricting VM IP Forwarding: This policy prevents virtual machines from forwarding packets, which can be a critical security measure to avoid misuse of the network.
  3. Disable Serial Port Access: By disabling serial port access for VM instances, organizations can enhance the security of their virtual machines by preventing potential external access through these ports.
  4. Service Usage Restrictions: Organizations can control which Google Cloud services are available for use. For example, you might want to restrict the use of certain services that are not compliant with your security standards or are deemed unnecessary for your business operations.
  5. Restrictions on External IP Addresses: This policy can be used to prevent resources such as virtual machines from being assigned external IP addresses, reducing exposure to external threats and helping to enforce a more secure network perimeter.
  6. Enforce uniform bucket-level access: For Google Cloud Storage, enabling the “Enforce uniform bucket-level access” setting ensures that access controls are uniformly managed through IAM roles, rather than through both IAM and Access Control Lists (ACLs), simplifying management and improving security.
  7. Enforcing Disk Encryption: You can enforce the encryption of compute disks, ensuring that all data is encrypted at rest and reducing the risk of data theft or exposure.
  8. Enforcing Minimum TLS Version: This policy ensures that services communicate using a minimum version of TLS, enhancing the security of data in transit by protecting against vulnerabilities in older versions of the protocol.
  9. Disabling Service Account Key Creation: By preventing the creation of new service account keys, organizations can encourage more secure and manageable authentication methods, such as using the IAM roles or the Workload Identity feature.

These examples represent just a few of the many organizational policies available in GCP that can be applied to secure and manage cloud resources effectively, ensuring they align with organizational objectives and compliance requirements.

Are Organizational Policies related to regulatory frameworks like the Digital Operational Resilience Act (DORA) or the revised Directive on Security of Network and Information Systems (NIS2)?

Yes, organizational policies help you implement those regulations. For example, in Chapter II (ICT risk management), Article 5 (Governance and organisation) of DORA, the following is written:

Financial entities shall have in place an internal governance and control framework that ensures an effective and prudent management of ICT risk, in accordance with Article 6(4), in order to achieve a high level of digital operational resilience.

The management body of the financial entity shall define, approve, oversee, and be responsible for the implementation of all arrangements related to the ICT risk management framework referred to in Article 6(1).

Here are some examples, which are also available as Organizational Policies:

IAM:
– Appropriate Service Accounts Access Key Rotation

Storage:
– Object Storage – Blocked Public Access (Organization-wide)

Networking:
– Disabled Endpoint Public Access in Existing Clusters

We at Soeldner Consult can support you not only in building a safe Landing Zone, but also help you with setting Organizational Policies the right way.

Google Cloud Landing Zone Series – Part 4: Naming Conventions

Naming Conventions

In the last blog post, we talked a lot about resource hierarchies. Resource hierarchies help to group projects into a folder structure and help with issues like governance, automation, access control, billing and cost management and other things.

Advantages of naming conventions

Another important building block of a scalable Landing Zone are naming conventions. Naming conventions bring several advantages to your environment; let’s briefly name some of them:

Clarity and Readability: Good naming conventions help in clearly identifying resources, their purpose, and their relationships. This enhances readability and understanding for anyone who interacts with the cloud environment, from developers to system administrators.

Consistency: Consistent naming makes it easier to manage resources across your different teams and projects. It reduces confusion and helps in setting standard practices for operations within the cloud environment.

Automation and Tooling Compatibility: Automated tools and scripts often rely on naming patterns to select and manage resources. Consistent naming conventions ensure that these tools can function correctly and efficiently, whether they are used for monitoring, provisioning, or management.

Security: Proper naming can aid in implementing security policies. For instance, names can indicate the sensitivity level of data stored in a resource, or whether a resource is in a production or development environment, helping in applying appropriate security controls.

Cost Management: Naming conventions can also aid in tracking and managing costs. By identifying resources clearly, organizations can monitor usage and costs more effectively, making it easier to optimize resource allocation and reduce wastage.

Examples of naming conventions

There are clearly a lot of advantages to naming conventions, so let’s continue and provide some examples.

For projects you might want to embed some information within the project name. Common components might be:

  • The stage of the project, e.g. Test, QA or Prod
  • If you have a CMDB in place, you might reuse some kind of service or project numbers and embed them in the project name.
  • The purpose of the project, for example a network project or a project for storing audit information.

In Google Cloud, it is also important to remember that project IDs cannot be changed, but project names can. So if a project number or similar attribute changes over time, changing the project name is possible, but changing the project ID is not. That’s why it might be better to use a surrogate for the project ID and a descriptive name for the project name.

Another thing to consider is that resource IDs cannot be re-used – at least not immediately. For example, if you delete a project, it first stays in the trash for about a month before it is eventually deleted. During this time, the ID cannot be re-used. Luckily, you can easily deal with this by appending a random suffix to your resource names.

Another important point is to adhere strictly to the naming standard – even in edge cases. For example, if you separate the components of your project name by means of “-”, you can run into problems if your descriptive names also use “-”. Here is a small example:

Good: p-1234557-landingzone-ab123

Bad: p-1234566-landing-zone-ab123

The latter example might break your automation processes later, because the purpose of the project can no longer be parsed unambiguously.
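
This parsing problem can be demonstrated with a short, hedged sketch of a validator for the convention used above (stage-number-purpose-suffix, where segments themselves must not contain the “-” separator):

```python
import re

# Hedged sketch of a validator for the convention used in the example:
# <stage>-<number>-<purpose>-<suffix>, where no segment may itself
# contain the "-" separator.
PROJECT_ID = re.compile(
    r"^(?P<stage>[a-z])-(?P<number>\d+)-(?P<purpose>[a-z0-9]+)-(?P<suffix>[a-z0-9]+)$"
)

def parse_project_id(project_id):
    """Return the components of a project id, or None if it breaks the convention."""
    match = PROJECT_ID.fullmatch(project_id)
    return match.groupdict() if match else None

print(parse_project_id("p-1234557-landingzone-ab123"))
# {'stage': 'p', 'number': '1234557', 'purpose': 'landingzone', 'suffix': 'ab123'}

print(parse_project_id("p-1234566-landing-zone-ab123"))  # None: ambiguous segments
```

The “bad” example from above fails to parse because the extra hyphen makes it impossible to tell where the purpose ends and the suffix begins.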

Also note that while naming conventions are important, you do not need them for everything. Cloud providers – and Google is no exception here – have hundreds of services and components, and if you came up with a naming convention for each of them, you would be busy with nothing but setting up and enforcing naming conventions.

Labels

Besides naming conventions, cloud providers also allow the use of labels to store metadata. Here are some examples of labels you might encounter or use in a cloud setting:

Environment

env=production
env=staging
env=development
env=test

Project or Application

project=finance-app
project=customer-portal
app=inventory-management
app=hr-system

Owner or Team

team=backend
owner=joseph.cooper
team=frontend
owner=devops-team

Cost Center or Budget

cost-center=12345
budget=2024-Q1
cost-center=67890
budget=annual-2023
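
Labels have their own syntax rules. As a hedged sketch, the validator below approximates the documented GCP rules (keys start with a lowercase letter, keys and values use only lowercase letters, digits, hyphens and underscores, up to 63 characters); the label data is made up for the example:

```python
import re

# Approximation of GCP's documented label rules: keys must start with a
# lowercase letter (max 63 chars); keys and values may only contain
# lowercase letters, digits, hyphens and underscores.
KEY_RE = re.compile(r"^[a-z][a-z0-9_-]{0,62}$")
VALUE_RE = re.compile(r"^[a-z0-9_-]{0,63}$")

def validate_labels(labels):
    """Return the keys whose key or value breaks the label rules."""
    return [k for k, v in labels.items()
            if not (KEY_RE.match(k) and VALUE_RE.match(v))]

labels = {"env": "production", "cost-center": "12345", "Owner": "Joseph"}
print(validate_labels(labels))  # ['Owner'] -- uppercase is not allowed
```

Running such a check in your provisioning pipeline catches invalid labels before the API rejects them.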

Automation Support

To demonstrate how you can use automation for working with projects, let’s take a look at a Python snippet. In Google Cloud, you use the Resource Manager API to work with projects. Luckily, as for all Google APIs, there is a client library you can import:

# Note: this uses the legacy google-cloud-resource-manager client;
# newer releases expose the API via google.cloud.resourcemanager_v3 instead.
from google.cloud import resource_manager


def list_projects():
    """Return the IDs of all projects visible to the caller as a string."""
    client = resource_manager.Client()
    projects = client.list_projects()  # lazily pages through all projects

    project_ids = [project.project_id for project in projects]

    # For debugging or a direct HTTP response, convert the list to a string
    return str(project_ids)

As you can see, automation in Google Cloud is really easy. For many use cases, the best way to deploy such scripts is to use a Cloud Function and trigger it manually, on a schedule, or based on some event.

Google Cloud Landing Zone Series – Part 3: Resource hierarchy

After outlining the importance of a Landing Zone for a successful cloud journey in the first part and discussing Cloud Identity and its federation with Active Directory and Entra ID in the second, it is time to move to the next part of our Landing Zone series – resource hierarchies.

The Google Cloud is an excellent choice for your cloud journey – and the resource hierarchy is certainly one of the (many) reasons for it.

What is a resource hierarchy?

Let’s begin with what a resource hierarchy actually is.

In Google Cloud, a resource hierarchy is a structured framework that organizes all the resources (like VM instances, storage buckets, databases) you use on Google Cloud Platform (GCP). This structure is crucial for managing and administering your resources, especially as it pertains to organizing, managing access and permissions, and tracking costs. The hierarchy provides a clear, logical structure for grouping and segregating resources, facilitating governance, and cost management across an organization.

The hierarchy levels, from broadest to most specific, are:

1. Organization: The top-level container that represents your company. It is the root node in the Google Cloud resource hierarchy and provides centralized visibility and control over all GCP resources. Organizations are associated with a Google Workspace or Cloud Identity account.

2. Folders: Folders can contain projects or other folders, allowing you to group projects that share common attributes, such as the same team, application, environment (development, test, production), or other categorizations relevant to your organization. Folders help you manage access control and policies at a more granular level than the organization.

3. Projects: Projects are the fundamental grouping of resources and services in GCP. Every resource belongs to a project. Projects serve as a basis for enabling and using GCP services like compute engines, storage buckets, and database services, tracking and managing Google Cloud costs, managing permissions, and enabling billing. They can be used to represent logical divisions like different environments (prod, dev, test) or different parts of your organization.

4. Resources: At the bottom of the hierarchy, resources are the individual GCP services and components you use, such as Compute Engine instances, Cloud Storage buckets, or BigQuery datasets. Resources inherit policies and permissions from the project they are part of and can have additional policies applied directly to them.

This hierarchy allows you to apply permissions and policies at the level that makes the most sense. For example, you can set broad policies at the organization level (applicable to all resources within the organization), more specific policies at the folder or project level, and highly specific policies at the resource level. The structure also simplifies billing and resource management by allowing you to group and manage resources based on your organization’s operational needs and structure.

The following picture depicts a resource hierarchy in Google Cloud:

An organizational chart representing the Google Cloud resource hierarchy for a company. At the top level, there is the 'Company' which is the root of the Google Cloud Organization. Below this are 'Folders' for different divisions such as 'Department X', 'Department Y', and a 'Shared infrastructure' folder. 'Department X' further branches out into 'Team A' and includes a 'Product 1' folder, while 'Department Y' contains 'Team B' and a 'Product 2' folder. The next level down shows 'Projects' under the company's organization, specifically a 'Development project', a 'Test project', and a 'Production project'. Lastly, at the bottom of the hierarchy, there are 'Resources' for each project, which include 'Compute Engine Instances', 'App Engine Services', and 'Cloud Storage Buckets'. This structure delineates the allocation and organization of resources in a clear and modular fashion.

What are design considerations for a resource hierarchy?

Designing a resource hierarchy in Google Cloud requires careful planning and consideration to ensure it effectively meets your organization’s needs for governance, cost management, security, and compliance. The design should be scalable, flexible, and capable of adapting to future changes in your organization or technology strategy. Here are some key considerations:

1. Organization Structure

Align the hierarchy with your organization’s structure to facilitate management and operational efficiency.

Consider how different departments, teams, or business units will use Google Cloud resources and how you can best structure projects and folders to reflect these use cases.

2. Resource Management and Governance

Plan for how resources will be managed, including who will have administrative control at various levels of the hierarchy.

Determine how to implement policies for resource usage, access control, and cost management effectively across the hierarchy.

3. Access Control and Security

Use the principle of least privilege to manage permissions; grant users only the access they need to perform their roles.

Structure your hierarchy to simplify the management of IAM policies and ensure secure access to resources.

Consider using folders to delegate administrative responsibilities and segregate environments (e.g., development, staging, production) for enhanced security.

4. Billing and Cost Management

Organize projects in a way that aligns with your billing and budgetary requirements.

Utilize folders to group projects by cost center or department to simplify billing management.

Leverage billing accounts and subaccounts effectively to manage and track costs.

5. Compliance and Regulatory Requirements

Ensure your resource hierarchy supports compliance with relevant laws and regulations.

Structure your hierarchy to isolate resources subject to specific regulatory requirements, facilitating easier compliance audits and controls.

6. Scalability and Flexibility

Design for future growth; consider how new teams, projects, or services can be added to the hierarchy without major reorganizations.

Ensure the hierarchy allows for flexibility in resource management, scaling, and reorganization as needed.

7. Environment Segregation

Clearly segregate resources for different environments (development, testing, production) to prevent unintended access or changes to critical systems.

Use projects or folders to isolate environments, applying policies and permissions accordingly to manage access and resource deployment.

8. Naming Conventions

Establish clear naming conventions for projects, folders, and resources to improve clarity and manageability.

Use meaningful names that reflect the resource’s purpose, environment, and associated team or department.

9. Policy Inheritance

Understand how policies are inherited in the hierarchy and design your structure to leverage this for efficient policy management.

Plan how to apply organization-wide policies (e.g., security policies) and how to override these policies at lower levels when necessary.
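The inheritance mechanism can be illustrated with a small, purely illustrative Python sketch (this is not how IAM is implemented, just a model of the merge semantics): the effective policy of a node is the union of its own role bindings and those of all its ancestors.

```python
# Illustrative model of IAM-style policy inheritance down the resource
# hierarchy. All resource names and bindings below are made up.

class Node:
    def __init__(self, name, parent=None, bindings=None):
        self.name = name
        self.parent = parent
        self.bindings = bindings or {}   # role -> set of members

    def effective_policy(self):
        """Merge bindings from the organization down to this node."""
        merged = dict(self.parent.effective_policy()) if self.parent else {}
        for role, members in self.bindings.items():
            merged[role] = merged.get(role, set()) | members
        return merged

org = Node("organizations/example",
           bindings={"roles/viewer": {"group:all-staff"}})
folder = Node("folders/prod", parent=org,
              bindings={"roles/editor": {"group:prod-team"}})
project = Node("projects/p-1234557-landingzone-ab123", parent=folder)

# The project inherits viewer from the org and editor from the prod folder
print(project.effective_policy())
```

The key takeaway: a binding granted high up in the hierarchy cannot be taken away further down, which is why broad grants at the organization level should be used sparingly.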

Are there any best practices and patterns?

So, as you can see, designing a resource hierarchy can be challenging at first sight. Fortunately, there are some widely used patterns, several of which we already touched on above. As they are important, let’s recap some of them:

  • Isolate environments into separate projects or folders. As a pattern, you can use folders to group projects by environment, then apply environment-specific policies at the folder level.
  • Centralize logging and monitoring to a dedicated project. As a pattern, create a “shared services” project for centralized logging, monitoring, and auditing. Use aggregated log sinks to collect logs from all projects.
  • Structure projects in a way that aligns with organizational billing and budgetary needs. The pattern would be to assign a billing account to projects or folders based on budgetary ownership.
  • Design projects around logical groupings of resources that share the same lifecycle, security requirements, and team. In this case, the pattern is to create projects based on applications, microservices, or teams rather than a single monolithic project for all resources.

Google Cloud Landing Zone Series – Part 2: Identity and Access Management

Introduction

After having described what a Landing Zone is and having discussed its benefits, we want to explore what components are part of a Landing Zone and where design is needed.

In detail, we want to cover the following points:

  • Identity and Access Management
  • Resource hierarchy
  • Group creation and naming conventions
  • Organizational policies
  • Connectivity
  • Network Design
  • DNS Design
  • Availability and DR Strategy

Let’s begin with Identity and Access Management.

Identity and Access Management

Every Google Cloud journey starts with setting up the basic organization – and hence setting up Google Cloud Identity. Google Cloud Identity is basically free, but also has an enterprise tier with additional features. It offers a range of benefits for organizations seeking to manage user identities and access to applications and services securely:

  1. Centralized Identity Management: It allows for centralized management of users and groups, making it easier to control access to resources across an organization. 
  2. Secure Access to Applications: Google Cloud Identity provides secure, single sign-on (SSO) access to both Google services and third-party applications. 
  3. Multi-factor Authentication (MFA): It supports multi-factor authentication, which adds an extra layer of security beyond just passwords. 
  4. Integration with Existing Identity Systems: Google Cloud Identity can be integrated with existing identity management systems, such as Microsoft Active Directory or Entra ID. This allows organizations to extend their current identity and access management (IAM) policies to Google Cloud resources without needing to manage separate IAM systems. As many companies already use Office 365 and its user management, this is a crucial factor.
  5. Access and Identity Governance: It offers tools for identity governance, allowing organizations to enforce policies regarding who can access what resources under what conditions. This includes setting up conditional access policies based on user attributes, device status, location, and more. Note that some of these features come with the free Cloud Identity edition, while others are only available in the paid edition.
  6. Enhanced Compliance and Reporting: It helps organizations comply with regulatory requirements by providing detailed access and activity logs. These logs can be used for auditing purposes to ensure that access policies are being followed and to investigate security incidents.

In essence, while the features of Google Cloud Identity’s paid edition are certainly worth paying for, companies that already have Microsoft Entra ID in place – for example, because they use Microsoft Office 365 – usually want to integrate Google Cloud Identity with Microsoft Entra ID. Fortunately, this can be done, but the integration takes a little effort.

Entra ID and Google Identity Federation

Setting up such a federation involves two main processes:

  1. Provisioning users and groups: Users and groups from Microsoft Entra ID are periodically synchronized to Cloud Identity or Google Workspace. This ensures that new users in Microsoft Entra ID are also available in Google Cloud for access management, including before their first login. It also ensures that deletions in Microsoft Entra ID are propagated to Google Cloud. This provisioning is unidirectional; changes in Microsoft Entra ID are mirrored in Google Cloud but not the other way around, and passwords are not included in the synchronization.
  2. Single sign-on (SSO): For authentication, Google Cloud uses the Security Assertion Markup Language (SAML) protocol to delegate authentication to Microsoft Entra ID. Depending on the configuration, Microsoft Entra ID can authenticate directly, use pass-through authentication or password hash synchronization, or delegate to an on-premises AD FS server. This setup avoids the need for password synchronization and ensures enforcement of any configured security policies or multi-factor authentication (MFA) mechanisms in Microsoft Entra ID or AD FS.

The following picture depicts this integration:

A schematic illustrating identity management and integration between an on-premises environment, Azure AD, and Google Cloud. The on-premises environment features 'example.com AD Forest' and 'example.com AD Domain.' This domain is connected to Azure AD, which shows 'example.com Azure AD tenant.' The Azure AD tenant connects to Google Cloud, providing user provisioning and enabling single sign-on to 'example.com Cloud Identity.' This identity service is then linked to 'example.com Google Cloud Organization' and other Google services. There is also a link from Azure AD to third-party corporate SaaS applications, indicating an integrated identity ecosystem.

One important thing to consider is the use of DNS – which plays an important role both for Entra ID and Cloud Identity. In detail, customers have to consider how to share DNS names between Entra ID and Cloud Identity. 

Basically, Cloud Identity uses email addresses to identify users, which guarantees that Google can send notifications to those users. This email address is stable and needs to be mapped to a user attribute in Entra ID. This can be the UPN or the email address itself.

When implementing the federation between these two systems, the following steps have to be done:

  1. Users for the automatic account provisioning must be set up and configured accordingly.
  2. Within Entra ID, there is an Enterprise Application for Google (Google Cloud/G Suite Connector by Microsoft), which must be configured. This involves at least the configuration of user provisioning and optionally the group provisioning. After the configuration, users and groups are synchronized with Cloud Identity.
  3. Single Sign-on must be set up. Once again, there is an Enterprise Application: Google Cloud/G Suite Connector by Microsoft.
  4. Last but not least, SSO needs to be configured in Cloud Identity.

Once everything is configured, users will be able to log in to the Google Cloud with their Entra ID credentials.

Active Directory Federation with Google Identity

In other scenarios, customers might not have Entra ID, but use Active Directory and want to use those credentials for Cloud Identity. The basic steps remain the same:

  • Users and groups should be provisioned to Cloud Identity
  • SSO should be configured.

To set up federation for user identity management, two main tools can be used:

1. Google Cloud Directory Sync: This is a free tool from Google that facilitates the synchronization process between your existing identity management system and Google Cloud. It operates over Secure Sockets Layer (SSL) for security and is typically deployed within your current computing infrastructure.

2. Active Directory Federation Services (AD FS): Offered by Microsoft as a component of Windows Server, AD FS allows the use of Active Directory to achieve federated authentication. 

The configuration depends on the customer’s Active Directory. Many companies will have a single forest and a single domain in AD, but larger enterprises might have multiple forests. The following graphic depicts a simple scenario:

A flow diagram depicting the synchronization of an existing computing environment with Google's services. On the left, 'Example.com Active Directory forest' and 'Forest Root Domain' represent the existing computing environment. A connection is shown moving to the right towards Google's 'Cloud Identity,' where an 'Example.com Cloud Identity account' is set up. Further to the right, this account is associated with 'Example.com GCP organization,' which manages various 'Projects' within the Google Cloud Platform, indicating a streamlined identity and access management across the systems.

When an Active Directory forest comprises only one domain, it’s possible to link the entire forest to a singular Cloud Identity or Google Workspace account. This setup forms a unified Google Cloud organization through which all Google Cloud resources can be managed. In such a single-domain scenario, both domain controllers and global catalog servers grant access to the entirety of objects managed within Active Directory. Typically, managing this setup involves running a single instance of Google Cloud Directory Sync to synchronize user accounts and groups to Google Cloud, alongside maintaining a single AD FS instance or fleet for single sign-on functionality.

As this blog series focuses on Landing Zones, we will not go into more detail here and will continue with resource hierarchies in the next blog post.

Google Cloud Landing Zone Series – Part 1: Introduction

Welcome to our new blog post series about Landing Zones in Google Cloud. In this and the following blog posts, we will explain what a landing zone is, why you need one, describe the components of a landing zone, and explain how to set one up.

What is a Landing Zone?

A landing zone, as outlined by Google’s best practices, is a foundational element in constructing an organization’s Google Cloud Platform (GCP) infrastructure. It utilizes an Infrastructure-as-Code (IaC) approach to set up a GCP organization and manage the deployment of resources for various tenants. A tenant, in this context, refers to an independent entity—typically a team responsible for one or more applications—that consumes platform resources.

The rationale behind implementing a landing zone is to streamline and standardize the setup of an organization’s cloud environment. By following established best practices, a landing zone helps prevent the duplication of efforts among tenants, ensures the use of shared components, and enforces adherence to agreed-upon policies. All environment setups are done through approved IaC methods.

Why do we need a Landing Zone?

A landing zone for the cloud is essential for several key reasons, especially for organizations looking to deploy and manage their cloud environments effectively and securely. Here are the primary benefits of deploying a Landing Zone:

1. Standardization: A landing zone provides a standardized approach to setting up and configuring cloud environments. This ensures that all deployments follow the same best practices, configurations, and security standards, leading to consistency across the organization’s cloud infrastructure. This also helps in reducing unnecessary complexity: Solutions are designed following a predefined methodology.

2. Security and Compliance: By establishing a set of security baselines and policies from the outset, a landing zone helps ensure that all cloud resources comply with the organization’s security requirements and regulatory standards. This preemptive approach to security greatly reduces the risk of vulnerabilities and breaches. A common framework for security, access control, and patch management thus strengthens the overall security posture.

3. Efficiency and Scalability: With a landing zone, organizations can automate the provisioning of cloud resources, making it easier to scale up or down based on demand. This automation not only speeds up the deployment process but also reduces the likelihood of human error, contributing to a more reliable and efficient cloud environment.

4. Cost Management: Landing zones can help organizations avoid unnecessary costs by ensuring that resources are efficiently allocated and used. Through governance and standardized tagging, it becomes easier to track and manage cloud spending across different departments or projects.

5. Simplified Governance: A landing zone provides a framework for governance, allowing organizations to enforce policies, monitor compliance, and manage access control effectively. This simplifies the governance of cloud resources and helps maintain order as the cloud environment grows. For example, it helps in avoiding unmanaged project sprawl, which is achieved by deploying projects within a standard structure, using consistent naming conventions, and taking a uniform approach to labeling resources.

6. Faster Time to Market: By streamlining the setup process and enabling automation, landing zones reduce the time it takes to deploy new applications or services. This faster deployment capability can provide a competitive advantage by allowing organizations to bring solutions to market more quickly.

7. Resource Isolation: Landing zones can be designed to isolate resources between different environments (e.g., development, testing, production) or between different projects or tenants. This isolation enhances security and operational efficiency by preventing unintended interactions between resources.

8. Improving reliability: The use of automation, immutable infrastructure, and standardized monitoring, logging, and alerting mechanisms enhances system reliability.

9. Delegating resource management: Tenants are empowered to create and manage their resources within the landing zone framework, ensuring flexibility within a controlled environment.

In summary, landing zones are foundational to building a secure, efficient, and scalable cloud environment. They enable organizations to deploy cloud resources in a controlled, automated, and consistent manner, paving the way for innovation and growth while minimizing risks and costs.

What do you get?

After talking about the benefits of a Landing Zone, let’s talk about what you get with a Landing Zone. 

1. Standardization and Efficiency: Landing Zones provide a repeatable, consistent approach for deploying cloud services using a standardized set of tools and Infrastructure-as-Code (IaC). This methodology prevents unnecessary duplication of effort and limits the proliferation of disparate products by employing curated and endorsed design blueprints as IaC.

2. IaC Capabilities:

A Landing Zone should be built by means of IaC. To provide a repeatable set of components, the following elements can be offered:

Tenant Factory: Enables the creation of a top-level folder for a tenant along with an associated service account. In Google Cloud terms, this amounts to configuring the folder hierarchy of the Google organization.

Project Factory: Allows tenants to create their own projects using their service accounts, ensuring that resources are deployed exclusively via IaC and service accounts – except in sandbox projects, where experimentation is allowed. With such a Project Factory, workloads can later easily be onboarded to Google Cloud.

CI/CD Toolchain: Facilitates automation and consistent deployment practices. We recommend using GitLab and GitLab CI, as it already comes with support for Terraform.

3. Enforcement of Infrastructure Automation: To maintain consistency and agility, the use of infrastructure automation is enforced, preventing configuration drift and aligning with principles of automation and immutable infrastructure. This ensures that outcomes are predictable and that manual console-based configurations, which undermine consistency, are avoided. This involves not only using Terraform, but also having the “right” DevOps workflows, so that automation is done right.

4. Organizational Hierarchy and Policies:

Supports the creation of multiple isolated tenants within a platform, each with the autonomy to manage their own resources within defined boundaries.

Enforces a set of organization-wide policies aligned with best practices for security, such as preventing the creation of default networks, external IP addresses on compute instances, and mandating the use of OS Login for SSH access.

5. Predefined Network Topology:

Options for network topology include a shared VPC model or a hub-and-spoke pattern, promoting efficient resource allocation and connectivity among tenants while maintaining security through centralized control mechanisms like ingress and egress patterns. 
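To make the Project Factory idea from point 2 a bit more concrete, here is a purely illustrative sketch of how such a factory might mint project IDs that follow a naming convention like the one discussed earlier in this series. All names and rules here are assumptions for illustration, not part of any Google API:

```python
import random
import string

def make_project_id(env, cost_center, name):
    """Generate a project ID of the form <env>-<cost center>-<name>-<suffix>."""
    assert env in {"p", "d", "t"}, "environment must be p, d, or t"
    assert cost_center.isdigit(), "cost center must be numeric"
    assert "-" not in name, "descriptive name must not contain hyphens"
    # Random 5-character suffix to keep project IDs globally unique
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=5))
    return f"{env}-{cost_center}-{name}-{suffix}"

print(make_project_id("p", "1234557", "landingzone"))
```

In a real landing zone, the factory would additionally create the project via IaC (for example, Terraform’s google_project resource) and attach it to the correct folder and billing account.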

Like other hyperscalers, Google already provides architectural guidelines and sample architectures for Landing Zones. For example, the following shows a design from Google.

A detailed diagram representing the architecture of Google Cloud Landing Zones. On the left, 'On-premises' applications are connected via 'Cloud Interconnect' to Google Cloud's 'Shared VPC prod'. There's also a connection from 'Other cloud providers' through a 'VPN gateway' to a 'Cloud VPN'. 'Identity' is managed by 'Cloud Identity' and 'IAM' for identity and access management. In the center, Google Cloud's infrastructure is outlined, with services like 'Cloud Router', 'Cloud NAT', and 'Cloud Resource Manager'. There are also organizational tools such as the 'Organization Policy Service'. The diagram depicts two separate application service projects labeled 'Workload 1' and 'Workload 2', each containing 'Compute Engine' instances, and a 'Workload cluster' with 'GKE' for containerized applications. 'Cloud Storage' is also part of the setup. On the right, the data analytics project is connected, featuring services like 'Cloud Dataflow' and 'BigQuery'. The entire setup is monitored and secured by 'VPC Flow Logs', 'Cloud Audit Logs', and 'Firewall Rules Logging'. The private zone managed by 'Cloud DNS' is also noted. The network is segmented into production and development environments, with firewall rules ensuring secure access controls.

We at Soeldner Consult have a strategic partnership with CloudGems for building Landing Zones in a very short time. CloudGems have their own design for a Landing Zone, which is very flexible and can be used in regulated environments as well as in traditional industries.