ContainerHighThrottleRate #
Container is being throttled
Alert Rule
alert: ContainerHighThrottleRate
annotations:
description: |-
Container is being throttled
VALUE = {{ $value }}
LABELS = {{ $labels }}
runbook: https://srerun.github.io/prometheus-alerts/runbooks/google-cadvisor/containerhighthrottlerate/
summary: Container high throttle rate (instance {{ $labels.instance }})
expr: sum(increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m]))
by (container, pod, namespace) / sum(increase(container_cpu_cfs_periods_total[5m]))
by (container, pod, namespace) > ( 25 / 100 )
for: 5m
labels:
severity: warning
Meaning #
The ContainerHighThrottleRate alert is triggered when the CPU throttle rate of a container exceeds 25% over a 5-minute period. This means that the container is being throttled, resulting in reduced performance and potential impact on the overall system.
Impact #
The impact of a high throttle rate can be significant, leading to:
- Reduced container performance, resulting in slower response times and decreased throughput
- Increased latency and errors, potentially affecting user experience and business operations
- Resource waste, as the container is not utilizing available CPU resources efficiently
- Potential cascade effects on dependent services and applications
Diagnosis #
To diagnose the root cause of the high throttle rate, follow these steps:
- Identify the affected container, pod, and namespace using the alert labels
- Check the container’s CPU usage and throttling period metrics using tools like
top
ordocker stats
- Investigate potential causes, such as:
- Insufficient CPU resources allocated to the container
- Resource-intensive tasks or workloads running in the container
- Poor resource utilization or inefficient application design
- Review system logs and container logs to identify any relevant errors or warnings
- Consult with the application team and developers to understand the expected behavior and resource requirements of the container
Mitigation #
To mitigate the high throttle rate, follow these steps:
- Investigate and address any underlying resource issues, such as:
- Increasing CPU resources allocated to the container
- Optimizing resource utilization through container tuning or rightsizing
- Implementing load balancing or horizontal scaling to distribute workload
- Optimize application performance and resource efficiency, such as:
- Implementing caching or content delivery networks to reduce load
- Optimizing database queries and transactions
- Implementing efficient algorithms and data structures
- Consider implementing throttling limits or rate limiting to prevent excessive resource usage
- Monitor and analyze container performance and resource utilization to ensure the issue is resolved and to prevent future occurrences
- Develop and implement a long-term plan to ensure sustainable resource allocation and efficient container performance.