ContainerLowCpuUtilization #
Container CPU utilization is under 20% for 1 week. Consider reducing the allocated CPU.
Alert Rule
alert: ContainerLowCpuUtilization
annotations:
  description: |-
    Container CPU utilization is under 20% for 1 week. Consider reducing the allocated CPU.
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/google-cadvisor/containerlowcpuutilization/
  summary: Container Low CPU utilization (instance {{ $labels.instance }})
expr: (sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, container)
  / sum(container_spec_cpu_quota{container!=""}/container_spec_cpu_period{container!=""})
  by (pod, container) * 100) < 20
for: 7d
labels:
  severity: info
Here is the runbook for the Prometheus alert rule ContainerLowCpuUtilization:
Meaning #
The ContainerLowCpuUtilization alert fires when a container’s CPU usage, measured as a 5-minute rate and divided by its CPU limit (CFS quota divided by CFS period), stays below 20% for 7 consecutive days. The alert has severity info: it does not signal a failure, but it suggests the container has more CPU allocated than it uses.
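To make the threshold concrete, the following sketch reproduces the arithmetic of the alert expression in Python for a single container. The function name `cpu_utilization_percent` and the sample values are illustrative assumptions; the real evaluation happens inside Prometheus, whose `rate()` also handles counter resets and extrapolation, which this sketch ignores.

```python
def cpu_utilization_percent(usage_start_s, usage_end_s, window_s, quota_us, period_us):
    """Approximate the alert expression for one container.

    usage_start_s, usage_end_s: container_cpu_usage_seconds_total at the
        start and end of the window (cumulative CPU seconds)
    window_s:  window length in seconds (the rule uses 5m = 300 s)
    quota_us:  container_spec_cpu_quota (CPU time allowed per period)
    period_us: container_spec_cpu_period
    """
    cores_used = (usage_end_s - usage_start_s) / window_s  # simplified rate()
    cores_limit = quota_us / period_us                     # CPU limit in cores
    return cores_used / cores_limit * 100


# Hypothetical container: limit of 2 cores (200000 / 100000),
# 90 CPU-seconds consumed over a 5-minute window.
pct = cpu_utilization_percent(1000.0, 1090.0, 300, 200_000, 100_000)
print(f"{pct:.1f}%")  # 15.0%, which is below the 20% threshold
```

In this example the container averages 0.3 cores against a 2-core limit, so the expression would evaluate to 15 and the alert would fire if that condition held for the full 7 days.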
Impact #
Low CPU utilization can lead to wasted resources and inefficient use of cluster capacity. If left unchecked, it can result in:
- Underutilized nodes and clusters
- Inefficient resource allocation
- Higher costs due to unused resources
- Potential performance issues due to overprovisioning
Diagnosis #
To diagnose the issue, follow these steps:
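- Identify the affected container and pod using the pod and container labels on the alert. The expression aggregates by (pod, container), so other labels such as instance are not carried through.
- Check the container’s CPU usage over the past 7 days using a tool like Prometheus or Grafana.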
- Verify that the container is not experiencing any performance issues or errors that could be contributing to the low CPU utilization.
- Review the container’s resource allocation and configuration to ensure it is properly sized for its workload.
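The 7-day usage check can be sketched against the Prometheus HTTP API's `/api/v1/query_range` endpoint. The helper names `build_range_query` and `summarize_cores`, the pod and container names, and the sample response are hypothetical; the sketch only builds the request parameters and parses a response, and leaves the HTTP call to whatever client you already use.

```python
import statistics


def build_range_query(pod, container, end_ts, days=7, step="1h"):
    """Build parameters for Prometheus' /api/v1/query_range endpoint."""
    query = (
        f'sum(rate(container_cpu_usage_seconds_total'
        f'{{pod="{pod}",container="{container}"}}[5m]))'
    )
    return {
        "query": query,
        "start": end_ts - days * 86400,  # look back `days` days
        "end": end_ts,
        "step": step,
    }


def summarize_cores(response):
    """Return (mean, peak) CPU cores from a query_range JSON response."""
    series = response["data"]["result"]
    if not series:
        raise ValueError("query returned no series")
    # Each sample is [unix_timestamp, "value-as-string"].
    values = [float(v) for _, v in series[0]["values"]]
    return statistics.mean(values), max(values)


# Hypothetical response, shaped like Prometheus' matrix result format.
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {}, "values": [[0, "0.10"], [3600, "0.30"], [7200, "0.20"]]}
        ],
    },
}

params = build_range_query("web-0", "app", end_ts=1_700_000_000)
mean_cores, peak_cores = summarize_cores(sample)
print(params["query"])
print(round(mean_cores, 2), peak_cores)
```

The peak is usually the more useful number for the next step: a container that idles most of the week but bursts briefly may be correctly sized even though its average utilization is low.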
Mitigation #
To mitigate the issue, consider the following steps:
- Reduce the allocated CPU resources for the container to match its actual usage.
- Right-size the container’s resource allocation based on its workload and performance requirements.
- Consider consolidating underutilized containers onto fewer nodes to optimize resource usage.
- Implement automated resource scaling and optimization tools to detect and adjust resource allocation in real-time.
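As a rough illustration of right-sizing, the sketch below derives a candidate CPU limit from an observed peak plus headroom. The function name `suggest_cpu_limit_millicores` and every default (30% headroom, 50m floor, rounding up to 50m steps) are assumptions chosen for the example, not recommendations; pick values that match your workload's burst behavior.

```python
import math


def suggest_cpu_limit_millicores(peak_cores, headroom=0.3, min_millicores=50, round_to=50):
    """Suggest a CPU limit (in millicores) from observed peak usage.

    All defaults are illustrative assumptions, not recommendations.
    """
    if peak_cores < 0:
        raise ValueError("peak_cores must be non-negative")
    raw = peak_cores * (1 + headroom) * 1000        # cores -> millicores, plus headroom
    rounded = math.ceil(raw / round_to) * round_to  # round up to a tidy step
    return max(rounded, min_millicores)             # never go below the floor


# Hypothetical container: current limit of 2000m, observed 7-day peak of 0.3 cores.
print(suggest_cpu_limit_millicores(0.3))  # 400
```

Here a container limited to 2000m with a 0.3-core peak would get a suggested limit of 400m. Sizing from the peak rather than the average keeps room for bursts, at the cost of a less aggressive reduction.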
Note: This runbook provides general guidance and may need to be tailored to your specific use case and infrastructure.