ContainerHighCpuUtilization #
Container CPU utilization is above 80%
Alert Rule
alert: ContainerHighCpuUtilization
annotations:
description: |-
Container CPU utilization is above 80%
VALUE = {{ $value }}
LABELS = {{ $labels }}
runbook: https://srerun.github.io/prometheus-alerts/runbooks/google-cadvisor/containerhighcpuutilization/
summary: Container High CPU utilization (instance {{ $labels.instance }})
expr: (sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, container)
/ sum(container_spec_cpu_quota{container!=""}/container_spec_cpu_period{container!=""})
by (pod, container) * 100) > 80
for: 2m
labels:
severity: warning
Here is a runbook for the Prometheus alert rule ContainerHighCpuUtilization
:
Meaning #
This alert is triggered when a container’s CPU utilization exceeds 80% over a 5-minute period. This can indicate that the container is experiencing high CPU usage, which may lead to performance issues, slower response times, or even crashes.
Impact #
High CPU utilization can have several consequences, including:
- Slower response times: High CPU usage can cause the container to slow down, leading to delayed responses to requests.
- Increased latency: High CPU utilization can increase the time it takes for the container to process requests, leading to increased latency.
- Resource starvation: High CPU utilization can starve other containers or processes of CPU resources, leading to performance issues or failures.
- Increased risk of crashes: Prolonged high CPU utilization can cause the container to crash or become unresponsive.
Diagnosis #
To diagnose the root cause of high CPU utilization, follow these steps:
- Identify the affected container: Check the
pod
andcontainer
labels in the alert to determine which container is experiencing high CPU utilization. - Check container logs: Review the container logs to see if there are any error messages or indications of high CPU usage.
- Check container resource usage: Use tools like
kubectl top
ordocker stats
to check the container’s CPU usage, memory usage, and other resource utilization. - Analyze container configuration: Review the container configuration to ensure that it is properly configured and optimized for performance.
- Check system resource usage: Verify that the underlying system has sufficient resources (e.g., CPU, memory) to support the container’s workload.
Mitigation #
To mitigate high CPU utilization, follow these steps:
- Optimize container configuration: Review and optimize the container configuration to ensure it is properly configured for performance.
- Increase container resources: If necessary, increase the container’s CPU or memory resources to ensure it has sufficient resources to handle the workload.
- Implement rate limiting: Implement rate limiting or throttling to prevent excessive requests that can cause high CPU utilization.
- Scale container instances: If the container is experiencing high CPU utilization due to high traffic, consider scaling the number of container instances to distribute the workload.
- Investigate underlying system issues: If high CPU utilization persists, investigate underlying system issues, such as hardware failures or system configuration problems.
Remember to take a proactive approach to monitoring and addressing high CPU utilization to prevent performance issues and ensure the reliability of your system.