Google Cloud Landing Zone Series – Part 11: Landing Zone Logging

A Landing Zone serves as the foundation for the whole cloud journey: it allows you to deploy and operate workloads in an efficient, scalable and secure manner. For security, it is important to know which security-relevant events happen within the cloud, to collect those events centrally, and to be able to analyze them. Google Cloud provides built-in services that help with these tasks.

What services does Google offer for Observability?

Logging in Google Cloud involves the collection, storage, analysis, and management of log data generated by applications and services running on Google Cloud. This is crucial for debugging, monitoring performance, and ensuring the security of applications.

In detail, Google Cloud offers several services that facilitate effective logging:

1. Cloud Logging: This is the primary service used for logging in Google Cloud. It allows you to store, search, analyze, monitor, and alert on log data and events from Google Cloud and Amazon Web Services. Cloud Logging provides centralized log management for services such as Google Compute Engine, Google Kubernetes Engine, and other Google Cloud services.

2. Cloud Monitoring: While primarily a monitoring service, it works closely with Cloud Logging to provide visibility into the performance, uptime, and overall health of cloud-powered applications. It helps detect and diagnose infrastructure problems using the data provided by logs.

3. Cloud Trace: This service is used for performance analysis. It collects latency data from applications and uses it to troubleshoot performance bottlenecks. Although it’s more about tracing than logging, the data can complement logs by providing insights into the timing of various operations.

4. Cloud Debugger: Integrates with Cloud Logging to help you inspect the state of an application at any code point without stopping or slowing down the execution. This is particularly useful for live applications and can use logs to provide context for the state of the application at specific times. Note that Google deprecated Cloud Debugger in 2022 and shut the service down in 2023, so it is listed here for completeness only.

5. Cloud Audit Logs: Keeps track of user activities and API calls, providing logs that are useful for security and compliance monitoring. Cloud Audit Logs help you to answer “who did what, where, and when?” within your Google Cloud resources.

6. Error Reporting: Automatically counts, analyzes, and aggregates the crashes in your running cloud services. Error Reporting works with Cloud Logging to provide a real-time view of errors.

In this blog post, we will cover Cloud Logging and Cloud Audit Logs. Later we will also discuss Cloud Monitoring and how to create dashboards for a Landing Zone.

What kind of logs can we collect in Google Cloud?

There is a wide variety of logs, each providing valuable insights into different aspects of your cloud environment. These logs are typically categorized into several types, depending on the source and nature of the data they contain:

1. Audit Logs:

   – Admin Activity Logs: These logs record operations that modify the configuration or metadata of resources. Examples include changes to security rules, creation and deletion of resources, and modifications to service settings.

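   – Data Access Logs: These logs record API calls that read the configuration or metadata of resources, as well as calls that create, modify, or read user-provided data. Such logs are crucial for compliance and security auditing, but because of their potential volume they are disabled by default for most services (BigQuery is the exception) and must be enabled explicitly.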

   – System Event Logs: These logs track actions taken by the Google Cloud system (rather than by users) that modify the configuration of resources.

   – Policy Denied Logs: These logs are generated when a Google Cloud service denies access to a user or service account because of a security policy violation (like violations against Organizational Policies).

2. Platform Logs:

   – These logs are generated by the Google Cloud infrastructure itself, including Google Compute Engine, Google Kubernetes Engine, and other services. They help in troubleshooting and monitoring the performance of these services.

3. Application Logs:

   – Custom logs generated by user applications running on Google Cloud services. These logs are defined by the application developers and can provide application-specific insights, such as user activities or operational metrics.

4. Network Logs:

   – VPC Flow Logs: Record information about network traffic in and out of your Google Cloud Virtual Private Cloud (VPC) networks, helping with network monitoring, forensics, real-time security analysis, and cost allocation.

   – Firewall Rules Logs: Log details about the traffic affected by firewall rules, providing insights into security and traffic patterns.

5. Access Logs:

   – These include logs generated by services that process requests on behalf of users, such as Cloud Load Balancing, Cloud Storage, and Identity-Aware Proxy. These logs are useful for understanding request patterns and troubleshooting issues related to access or latency.

6. Resource Logs:

   – Logs from specific Google Cloud resources, providing details about operations performed on or by these resources. This can include logs from Cloud SQL, Cloud Functions, and other managed services.

These logs can be used for a variety of purposes including debugging application errors, monitoring system health, auditing user activities, and optimizing resource usage. In Google Cloud, you can use Cloud Logging to manage these logs effectively, applying filters, setting up exports, and integrating with other analytics and monitoring tools to derive deeper insights from your log data.
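
The four audit log types listed above each have their own log ID, which is what filters and sinks refer to. The following is a minimal sketch of how the full log names are put together; the helper `audit_log_name` and the dictionary keys are hypothetical names chosen for illustration, while the log IDs themselves are the ones Cloud Audit Logs uses.

```python
from urllib.parse import quote

# Log IDs of the four Cloud Audit Logs types. The dictionary keys are
# illustrative labels, not official identifiers.
AUDIT_LOG_IDS = {
    "admin_activity": "cloudaudit.googleapis.com/activity",
    "data_access": "cloudaudit.googleapis.com/data_access",
    "system_event": "cloudaudit.googleapis.com/system_event",
    "policy_denied": "cloudaudit.googleapis.com/policy",
}


def audit_log_name(parent: str, kind: str) -> str:
    """Build a full log name for a parent such as 'projects/my-project'.

    Inside a log name the log ID is URL-encoded, so '/' becomes '%2F'.
    """
    log_id = quote(AUDIT_LOG_IDS[kind], safe="")
    return f"{parent}/logs/{log_id}"


print(audit_log_name("projects/my-project", "admin_activity"))
# projects/my-project/logs/cloudaudit.googleapis.com%2Factivity
```

The parent can equally be a folder or an organization, for example `organizations/123456789012` (a placeholder ID).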

What kind of logs are relevant from the Landing Zone perspective?

When building a landing zone, not all logs are relevant. We should clearly differentiate between the application logs of the individual teams and the logs that are relevant for security, governance, or networking. From the Landing Zone perspective, the most important logs are:

  • Audit Logs
  • Network Logs

What Google Cloud components do I need for centralized logging and what does an architecture look like?

To configure centralized logging, we first have to understand the Log Router. In Google Cloud, the Log Router is a crucial component of Cloud Logging that lets you manage and control how your logs are processed and used. It serves as the interception point for all logs generated by your cloud resources before they are sent to destinations such as Cloud Storage, BigQuery, Pub/Sub, or retained within Cloud Logging itself.

How Log Router Works

The Log Router works by evaluating each log entry against user-defined rules and then routing the log entries to appropriate destinations based on these rules:

1. Log Entry Creation: Log entries are generated by Google Cloud services, VM instances, containers, and user applications that are integrated with Cloud Logging.

2. Log Ingestion: These log entries are ingested by Cloud Logging where they are temporarily stored.

3. Log Filtering: The Log Router examines each log entry against filtering rules specified in the logging sinks. These filters determine what kind of logs should be captured or excluded and can be customized to match specific log attributes like severity, resource type, or other log metadata.

4. Routing: Based on the filters, the Log Router then routes the logs to configured sinks. If a log entry matches the filter of a sink, it is sent to the sink’s specified destination. Logs can be sent to:

   – Cloud Storage: For cost-effective long-term storage, especially useful for audit and compliance needs.

   – BigQuery: For detailed analysis and querying of log data.

   – Pub/Sub: For streaming logs to other applications, real-time monitoring, or integration with third-party tools and services.

   – Another Cloud Logging bucket: For retaining logs within Cloud Logging under different storage parameters.

5. Storage and Management: Once routed, the logs are managed based on the settings of the destination. For example, logs in BigQuery can be queried using SQL, while logs in Cloud Storage are retained as per the lifecycle rules configured for the storage bucket. The following picture shows the architecture:

Figure: Architecture of Cloud Logging. Log data enters through the Cloud Logging API and passes through the Log Router, which can also generate log-based metrics. The router directs logs to three kinds of sinks: the Required log sink, the Default log sink, and user-defined log sinks. These sinks write to the corresponding log buckets: the Required log bucket (400-day retention, not configurable), the Default log bucket (30-day retention, configurable), and user-defined log buckets (30-day retention, configurable). From there, logs can be used within the Google Cloud ecosystem (Cloud Storage, BigQuery, Pub/Sub, Cloud Monitoring) or exported to third-party platforms.
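
The filter-and-route behavior described in the steps above can be modeled in a few lines. This is a toy simulation, not the Cloud Logging implementation: the classes `Sink` and `LogRouter`, the sink names, and the destination labels are all hypothetical, and plain Python functions stand in for real sink filters. It only illustrates that every sink is evaluated independently, so one entry can reach several destinations or none.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

LogEntry = Dict[str, str]


@dataclass
class Sink:
    name: str
    destination: str
    matches: Callable[[LogEntry], bool]  # stand-in for the sink's inclusion filter


@dataclass
class LogRouter:
    sinks: List[Sink]
    delivered: Dict[str, List[LogEntry]] = field(default_factory=dict)

    def route(self, entry: LogEntry) -> List[str]:
        """Copy the entry to every sink whose filter matches; return the sink names."""
        hits = []
        for sink in self.sinks:
            if sink.matches(entry):
                self.delivered.setdefault(sink.destination, []).append(entry)
                hits.append(sink.name)
        return hits


router = LogRouter(sinks=[
    Sink("audit-to-bucket", "log-bucket",
         lambda e: "cloudaudit.googleapis.com" in e["logName"]),
    Sink("flows-to-bigquery", "bigquery",
         lambda e: e["logName"].endswith("vpc_flows")),
])

audit = {"logName": "projects/demo/logs/cloudaudit.googleapis.com%2Factivity"}
flow = {"logName": "projects/demo/logs/compute.googleapis.com%2Fvpc_flows"}
app = {"logName": "projects/demo/logs/my-app"}

print(router.route(audit))  # ['audit-to-bucket']
print(router.route(flow))   # ['flows-to-bigquery']
print(router.route(app))    # []
```

In this simplified model an entry that matches no sink is simply dropped; in Cloud Logging the Default sink would normally still pick it up unless it has been changed or disabled.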

Where should I configure the log sink?

For the Landing Zone, it is usually recommended to configure the log sink at the organization level, as an aggregated sink that also includes the logs of all child folders and projects. This has the following advantages:

·  Centralized Management: Setting up log sinks at the organization level allows you to manage logging centrally for all projects and resources within the organization. This centralization simplifies administration and ensures consistent logging practices across multiple teams and projects.

·  Compliance and Auditing: For organizations subject to regulatory requirements, centralizing logs at the organization level helps in maintaining comprehensive audit trails. It ensures that logs are collected uniformly, making it easier to demonstrate compliance with data governance and privacy regulations.

·  Cost Efficiency: By aggregating logs at the organization level, you can optimize costs by eliminating redundant storage and processing. You can also establish more efficient data retention policies and take advantage of economies of scale in log storage and analysis.

·  Improved Security Monitoring: Centralized logging enhances security monitoring and incident response capabilities. It enables security teams to have an overarching view of the security posture across all projects and resources, facilitating quicker detection of suspicious activities and anomalies.

·  Streamlined Analysis and Insights: With all logs collected in a central repository, it’s easier to perform analysis and generate insights that are applicable across the entire organization. This can aid in strategic decision-making and improve operational efficiencies.
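
To make the organization-level setup more concrete, here is a minimal sketch of the sink definition as it would be sent to the Cloud Logging API. The field names (`name`, `destination`, `filter`, `includeChildren`) are those of the LogSink resource; the helper functions, the project `lz-logging`, the bucket `lz-audit`, and the sink name are hypothetical placeholders. The sketch only builds and prints the request body and does not call any API.

```python
import json


def bucket_destination(project_id: str, location: str, bucket_id: str) -> str:
    """Destination string for a Cloud Logging bucket in a central logging project."""
    return (
        f"logging.googleapis.com/projects/{project_id}"
        f"/locations/{location}/buckets/{bucket_id}"
    )


def org_sink_body(name: str, destination: str, log_filter: str) -> dict:
    """LogSink body for an aggregated sink.

    includeChildren=True is what makes a sink on the organization also
    route the logs of all folders and projects below it.
    """
    return {
        "name": name,
        "destination": destination,
        "filter": log_filter,
        "includeChildren": True,
    }


# All values below are placeholders for illustration.
body = org_sink_body(
    name="lz-org-audit-sink",
    destination=bucket_destination("lz-logging", "global", "lz-audit"),
    log_filter='logName:"logs/cloudaudit.googleapis.com%2Factivity"',
)
print(json.dumps(body, indent=2))
```

Keep in mind that a sink writes with its own writer identity (a service account), which still has to be granted write access on the destination before logs arrive there.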

What do log sink filters look like?

As already mentioned, we don’t need all the logs, so here are example filters for Admin Activity audit logs, VPC Flow Logs, and firewall logs:

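Admin Activity audit logs:

logName:"logs/cloudaudit.googleapis.com%2Factivity"

VPC Flow Logs:

resource.type="gce_subnetwork"
logName:"logs/compute.googleapis.com%2Fvpc_flows"

Firewall Rules Logging (these entries are also reported against the gce_subnetwork resource type):

resource.type="gce_subnetwork"
logName:"logs/compute.googleapis.com%2Ffirewall"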
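
If a single sink should carry all three kinds of logs, the individual filters can be joined with OR. The following sketch assumes that one combined sink is wanted (separate sinks per log type work just as well) and uses only the `logName` clauses for brevity; the helper `any_of` is a hypothetical name.

```python
from typing import Iterable

ADMIN_ACTIVITY = 'logName:"logs/cloudaudit.googleapis.com%2Factivity"'
VPC_FLOWS = 'logName:"logs/compute.googleapis.com%2Fvpc_flows"'
FIREWALL = 'logName:"logs/compute.googleapis.com%2Ffirewall"'


def any_of(filters: Iterable[str]) -> str:
    """Join filters with OR, wrapping each in parentheses so that
    conditions inside one filter stay grouped together."""
    return " OR ".join(f"({f})" for f in filters)


combined = any_of([ADMIN_ACTIVITY, VPC_FLOWS, FIREWALL])
print(combined)
```

The parentheses matter as soon as one of the parts contains more than one condition, for example an additional `resource.type` clause.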

Log filters can be configured with the GUI, the CLI, or automation. Here are the steps necessary to configure them with the GUI:

·  Open the Cloud Console: Go to the Google Cloud Console and navigate to the Logging section.

·  Select “Log Router”: From the Logging menu, access the Log Router to manage and create new log sinks.

·  Choose Organization Scope: When creating a new log sink, set the scope to the organization level by selecting your organization from the dropdown menu.

·  Configure Log Sink Filters: Enter the inclusion filter for each type of log you want to route at the organization level, for example the filters shown above.
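
For the CLI route, the same sink can be created with `gcloud logging sinks create`. The sketch below only assembles the command line so it can be reviewed or handed to a pipeline; it does not execute anything. The function `sink_create_command`, the sink name, the organization ID, and the destination are placeholders, while the `--organization`, `--include-children`, and `--log-filter` flags are those of the gcloud command.

```python
import shlex
from typing import List


def sink_create_command(sink_name: str, destination: str,
                        org_id: str, log_filter: str) -> List[str]:
    """Argument list for creating an aggregated sink on an organization."""
    return [
        "gcloud", "logging", "sinks", "create", sink_name, destination,
        f"--organization={org_id}",
        "--include-children",  # also route logs of child folders and projects
        f"--log-filter={log_filter}",
    ]


# Placeholder values for illustration only.
cmd = sink_create_command(
    sink_name="lz-org-audit-sink",
    destination="logging.googleapis.com/projects/lz-logging/locations/global/buckets/lz-audit",
    org_id="123456789012",
    log_filter='logName:"logs/cloudaudit.googleapis.com%2Factivity"',
)

# shlex.join quotes the arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it, assuming gcloud is installed and authenticated with sufficient permissions on the organization.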

Author

Dr. Guido Söldner

Managing Director

Guido Söldner is Managing Director and Principal Consultant at Söldner Consult. His areas of focus include cloud infrastructure, automation and DevOps, Kubernetes, machine learning, and enterprise programming with Spring.