In the last blog post, we briefly introduced the different components and discussed the key points, including an architecture for centralized logging and the collection of audit logs.
In this blog post, we will shift our focus to monitoring, especially the monitoring of a landing zone.
Why Monitoring
But why is monitoring important? Let’s recap the most important points:
- Security: Continuous monitoring helps detect potential security threats and vulnerabilities in real time, enabling timely responses to mitigate risks.
- Performance: Monitoring helps in tracking the performance of cloud resources, ensuring optimal utilization and identifying underperforming or overutilized resources.
- Cost Management: Monitoring usage patterns helps in tracking and managing cloud costs, identifying areas of wastage, and optimizing expenditure.
- Operational Efficiency: Automated monitoring tools can trigger alerts for any anomalies, allowing for proactive management of issues.
- Scalability: Monitoring usage and load trends shows when resources approach their limits, so capacity can be scaled before demand outgrows it.
In conclusion, effective monitoring in a cloud landing zone is essential for maintaining the security, performance, and cost-efficiency of cloud operations. It enables proactive management, ensures compliance, and provides valuable insights for continuous improvement.
Out-of-the-box monitoring and dashboards
If you are starting to build your own dashboards, it is good to know that Google Cloud provides several out-of-the-box monitoring dashboards through its Cloud Monitoring (formerly Stackdriver) service. These dashboards offer preconfigured insights and visualizations for various Google Cloud resources. For example, if you go to Monitoring > Dashboards, you will find preconfigured dashboards as well as a dashboard library. Here are some examples of those dashboards:
- VM Instances Dashboard
- Provides insights into the performance and health of virtual machine instances, including CPU utilization, memory usage, disk I/O, and network traffic.
- Google Kubernetes Engine (GKE) Dashboard
- Offers comprehensive monitoring for GKE clusters, including cluster health, node performance, pod metrics, and resource utilization.
- Google Cloud Functions Dashboard
- Monitors the performance and execution of serverless functions, including invocation count, execution times, and error rates.
- Google Cloud Pub/Sub Dashboard
- Tracks the performance and usage of Pub/Sub topics and subscriptions, including message delivery, processing latency, and backlog size.
- Google Cloud BigQuery Dashboard
- Provides insights into BigQuery performance, including query execution times, slot usage, and job statistics.
- Google Cloud SQL Dashboard
- Offers monitoring for Cloud SQL instances, covering metrics like CPU utilization, memory usage, disk space, and connection counts.
- Google Cloud Storage Dashboard
- Tracks the usage and performance of Cloud Storage buckets, including storage capacity, request counts, and latency metrics.
- Google App Engine Dashboard
- Provides insights into the performance of applications running on App Engine, including request counts, response times, and error rates.
- Google Cloud Load Balancing Dashboard
- Monitors the performance of load balancers, including request counts, response times, and backend instance health.
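If none of the preconfigured dashboards fits, a custom one can be described as JSON. The following is a minimal sketch, not a definitive implementation: it builds a one-chart dashboard definition in the shape used by the Cloud Monitoring Dashboards API. The function name, the dashboard title, and the choice of the VM CPU utilization metric are assumptions made for illustration.

```python
import json


def cpu_dashboard(display_name: str) -> dict:
    """Build a one-widget dashboard definition (Dashboards API JSON shape).

    The widget title and metric below are illustrative assumptions.
    """
    return {
        "displayName": display_name,
        "gridLayout": {
            "widgets": [
                {
                    "title": "VM CPU utilization",
                    "xyChart": {
                        "dataSets": [
                            {
                                "timeSeriesQuery": {
                                    "timeSeriesFilter": {
                                        # Select the CPU utilization metric of VM instances.
                                        "filter": (
                                            'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                                            'AND resource.type="gce_instance"'
                                        ),
                                        # Average the samples into one point per minute.
                                        "aggregation": {
                                            "alignmentPeriod": "60s",
                                            "perSeriesAligner": "ALIGN_MEAN",
                                        },
                                    }
                                }
                            }
                        ]
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Print the definition so it can be saved to a file and reviewed.
    print(json.dumps(cpu_dashboard("Landing zone: VM overview"), indent=2))
```

The printed JSON could then be saved to a file and created in a project, for example with `gcloud monitoring dashboards create --config-from-file=dashboard.json`. Please check the current Cloud Monitoring documentation before relying on the exact field names.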
What components should be monitored in a Landing Zone?
For a landing zone, you should monitor all the key components. The following list gives some examples:
- Monitoring VPNs (Virtual Private Networks) and Interconnect is crucial in a cloud landing zone to ensure secure, reliable, and efficient connectivity between on-premises networks and the cloud environment. In detail, you should monitor the connection status, performance metrics, basic traffic patterns, resource utilization, error and event logs, and redundancy.
- If you have components like Serverless VPC Access connectors, you should monitor them as well. As with VPNs, you should check performance metrics and resource utilization.
- Any network appliance should be monitored as it is crucial for the landing zone.
- In the last blog post, we discussed logging and auditing. If you export those logs, it is useful to have some basic monitoring of the export itself, so that failed or stalled exports are noticed.
- Violations of organization policies can also be monitored.
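To make the first item in the list above more concrete, here is a minimal sketch of an alerting policy for the connection status of a Cloud VPN tunnel. It only builds the policy as a Python dictionary in the shape of the Cloud Monitoring AlertPolicy resource; it does not call any API. The function name, the display names, the five-minute duration, and the notification channel name in the usage example are assumptions for illustration.

```python
import json


def vpn_tunnel_down_policy(notification_channels: list[str]) -> dict:
    """Build an alert policy that fires when a VPN tunnel is not established.

    Thresholds and names are illustrative assumptions, not recommendations.
    """
    return {
        "displayName": "VPN tunnel down",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "Tunnel not established for 5 minutes",
                "conditionThreshold": {
                    # The tunnel_established metric is greater than 0 while the tunnel is up.
                    "filter": (
                        'metric.type="vpn.googleapis.com/tunnel_established" '
                        'AND resource.type="vpn_gateway"'
                    ),
                    "comparison": "COMPARISON_LT",
                    "thresholdValue": 1,
                    # The condition must hold for this long before an alert opens.
                    "duration": "300s",
                    "aggregations": [
                        # Take the per-minute maximum, so a minute only counts
                        # as "down" if the tunnel was never up during it.
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MAX"}
                    ],
                },
            }
        ],
        "notificationChannels": notification_channels,
    }


if __name__ == "__main__":
    # Hypothetical notification channel resource name.
    channels = ["projects/my-project/notificationChannels/1234567890"]
    print(json.dumps(vpn_tunnel_down_policy(channels), indent=2))
```

The same pattern could be reused for the other components in the list by swapping the metric filter and threshold; which metrics and thresholds make sense depends on your environment.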
What tools can be used?
Cloud Monitoring is already quite powerful. Nevertheless, you can also use other tools:
- Cloud Monitoring offers an importer that allows you to import Grafana dashboard files in JSON format into Cloud Monitoring. With the importer, you can convert and upload Grafana dashboards in a single operation, or you can perform the conversion and upload steps separately, for example if you want to edit the converted dashboards before uploading them.
- You can also run your own Grafana instance based on Google Cloud Managed Service for Prometheus. Managed Service for Prometheus is a fully managed monitoring solution provided by Google Cloud that is compatible with the open-source Prometheus monitoring system. It allows users to leverage Prometheus' powerful monitoring and alerting capabilities while benefiting from Google Cloud’s scalability, security, and integration with other Google Cloud services.
- You can also use business intelligence tools like Looker.
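For the Grafana option above, a self-hosted Grafana instance (or any other Prometheus client) queries the managed Prometheus service through a Prometheus-compatible HTTP API. The sketch below only builds the query URL and makes no network call. The endpoint layout reflects my understanding of the Cloud Monitoring documentation and should be treated as an assumption to verify; the function name and the project ID are hypothetical.

```python
from urllib.parse import urlencode


def prometheus_query_url(project_id: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus-compatible API.

    Assumption: the API is served under the Cloud Monitoring endpoint
    shown below; verify this against the current documentation.
    """
    base = (
        "https://monitoring.googleapis.com/v1/projects/"
        f"{project_id}/location/global/prometheus/api/v1/query"
    )
    # urlencode escapes the PromQL expression for use in a query string.
    return f"{base}?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # "my-project" is a hypothetical project ID; "up" is a standard PromQL query.
    print(prometheus_query_url("my-project", "up"))
```

A real request additionally needs an OAuth 2.0 access token in the `Authorization` header, which is omitted here to keep the sketch self-contained.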