HostCpuIsUnderutilized #
CPU utilization has stayed below 20% for 1 week. Consider reducing the number of CPUs.
Alert Rule
alert: HostCpuIsUnderutilized
annotations:
  description: |-
    CPU load is < 20% for 1 week. Consider reducing the number of CPUs.
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/node-exporter/hostcpuisunderutilized/
  summary: Host CPU is underutilized (instance {{ $labels.instance }})
expr: (100 - (rate(node_cpu_seconds_total{mode="idle"}[30m]) * 100) < 20) * on(instance)
  group_left (nodename) node_uname_info{nodename=~".+"}
for: 1w
labels:
  severity: info
The following is a sample runbook for the HostCpuIsUnderutilized alert.
Meaning #
The HostCpuIsUnderutilized alert fires when a host’s CPU utilization, measured as 100% minus the idle-time percentage over a 30-minute rate window, stays below 20% for one week. This suggests that the host has more CPU capacity than its current workload needs, so it may be a candidate for downsizing.
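To make the threshold concrete, the rule’s expression can be read in pieces. The first query below is the busy-percentage part of the rule as written; because `node_cpu_seconds_total` carries a `cpu` label, it appears to yield one series per core rather than one per host. The second query is a host-level variant that is an assumption for illustration, not part of the rule. Run each query separately.

```promql
# Busy percentage as used by the alert rule: rate() of the idle counter is the
# fraction of each second spent idle (0..1), so 100 - idle% is the busy share.
# Evaluated per (instance, cpu) series.
100 - (rate(node_cpu_seconds_total{mode="idle"}[30m]) * 100)

# Assumed host-level variant (not the rule above): average the idle fraction
# across all cores of an instance, then convert to a busy percentage.
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[30m])))
```

Comparing the two helps tell a host that is idle overall from one where only some cores are idle.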
Impact #
An underutilized host does not degrade service by itself, which is why the alert has severity info. The impact is wasted capacity:
- CPUs that are provisioned but mostly idle, resulting in unnecessary costs
- Capacity that stays tied to this host instead of being available to other workloads
Diagnosis #
To determine whether the host is genuinely over-provisioned, follow these steps:
- Check the node’s CPU utilization graph in Prometheus over at least the past week to see whether usage is flat or has short peaks.
- Review the node’s workload and application logs to confirm that the expected services are running and receiving traffic; a stopped or misrouted service also shows up as low CPU usage.
- Verify that the node’s CPU allocation matches the current workload, including any periodic jobs that need extra capacity.
- Check whether the workload is limited by another resource such as memory, disk, or network, in which case low CPU usage alone does not mean the host is oversized.
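The checks above can be sketched as ad hoc PromQL queries. These are illustrative rather than part of the alert rule; they assume Prometheus retains at least 7 days of data, and you would typically add an `instance="..."` matcher for the host that fired. Run each query separately.

```promql
# Average host CPU utilization (%) over the last 7 days.
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[7d])))

# Peak 5-minute host CPU utilization (%) seen in the last 7 days (subquery).
max_over_time(
  (100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))))[7d:5m]
)

# Non-idle CPU time by mode (user, system, iowait, ...), as a percentage of
# total CPU time on the host.
100 * avg by (instance, mode) (rate(node_cpu_seconds_total{mode!="idle"}[30m]))
```

A low average with a high peak points to a bursty workload that may still need its CPUs, while a low average and a low peak are a stronger sign of over-provisioning.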
Mitigation #
To mitigate the issue of underutilized CPU, follow these steps:
- Review the node’s configuration and resource allocation to identify opportunities for optimization.
- Right-size the node: if possible, reduce the number of CPUs allocated to it so that capacity matches current workload demand, leaving headroom for known peaks.
- Alternatively, consolidate additional workloads onto the node so that its existing CPUs are put to use.
- After any change, keep monitoring the node’s CPU utilization to confirm that the alert clears and that the node is not now overloaded.
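As a rough aid for right-sizing, the queries below compare the CPUs a host currently has with the number of cores it kept busy at its peak. This is a sketch under stated assumptions: the 0.6 target utilization is a hypothetical headroom figure, every non-idle mode (including iowait and steal) is counted as busy, and 7 days of data must be retained. Run each query separately.

```promql
# Logical CPUs currently visible on each host.
count by (instance) (node_cpu_seconds_total{mode="idle"})

# Peak number of busy cores over the last 7 days at 5-minute resolution.
max_over_time(
  (sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])))[7d:5m]
)

# Suggested core count if the peak should sit at about 60% utilization.
# 0.6 is an assumed target, not a value taken from the alert rule.
ceil(
  max_over_time(
    (sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])))[7d:5m]
  ) / 0.6
)
```

Treat the result as a starting point only. A week of history may miss monthly or seasonal peaks, so check it against what you know about the workload before removing CPUs.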