NetdataHighCpuUsage
Netdata high CPU usage (> 80%)
Alert Rule
```yaml
alert: NetdataHighCpuUsage
annotations:
  description: |-
    Netdata high CPU usage (> 80%)
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/netdata-internal/netdatahighcpuusage/
  summary: Netdata high cpu usage (instance {{ $labels.instance }})
expr: rate(netdata_cpu_cpu_percentage_average{dimension="idle"}[1m]) > 80
for: 5m
labels:
  severity: warning
```
This runbook covers the Prometheus alert rule NetdataHighCpuUsage.
Meaning
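The NetdataHighCpuUsage alert is meant to fire when CPU usage reported by Netdata exceeds 80%. The expression reads the idle dimension of netdata_cpu_cpu_percentage_average over a 1-minute window. The for: 5m clause requires the condition to hold continuously for 5 minutes before the alert fires. Sustained high CPU utilization can slow down Netdata and impair its ability to collect and process metrics.
As written, however, the expression has two problems. It compares the idle percentage, not the busy percentage, against 80. It also applies rate(), which Prometheus intends for counters, to a percentage gauge. Its value is therefore a per-second rate of change of idle time rather than a usage percentage, and it is unlikely to track the 80% usage threshold described above.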
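If the goal is to alert when CPU usage, rather than idle time, stays above 80%, a rule along the following lines may match the description more closely. It is a sketch, not the upstream rule. It assumes that netdata_cpu_cpu_percentage_average is a percentage gauge with one series per CPU and that the idle dimension is exported. The 1-minute smoothing window is an arbitrary choice.

```yaml
# Sketch of an alternative rule (assumption: per-CPU percentage gauge with an "idle" dimension).
alert: NetdataHighCpuUsage
annotations:
  description: |-
    Netdata high CPU usage (> 80%)
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  summary: Netdata high cpu usage (instance {{ $labels.instance }})
# Busy % = 100 - idle %, smoothed over 1m with avg_over_time and averaged across all CPUs per instance.
expr: 100 - avg by (instance) (avg_over_time(netdata_cpu_cpu_percentage_average{dimension="idle"}[1m])) > 80
for: 5m
labels:
  severity: warning
```

Because avg by (instance) keeps only the instance label, {{ $labels.instance }} in the summary still resolves. The other labels of the original per-CPU series are dropped.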
Impact
High CPU usage on the Netdata instance can lead to:
- Delays or loss of metric data
- Increased latency in data processing and visualization
- Potential crashes or instability of the Netdata instance
- Impaired ability to monitor and troubleshoot system performance
Diagnosis
To diagnose the root cause of high CPU usage on the Netdata instance:
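- Check the Netdata logs for errors or warnings. On systemd hosts, for example, use journalctl -u netdata. On installations that log to files, look under /var/log/netdata/.
- Identify which processes are consuming CPU, for example with top or the Applications section of the Netdata dashboard. Verify that the host has enough CPU and memory for both its workloads and the Netdata agent.
- Investigate recent changes to the system configuration, plugins, or metric collection settings that coincide with the rise in CPU usage.
- Use Netdata's built-in dashboards and charts to see which CPU states (user, system, iowait, softirq, and so on) account for the load, and which plugins or metrics are consuming excessive CPU.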
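To see which CPU state dominates on an affected instance, you can break out the non-idle dimensions per instance in Prometheus. The recording rule below is a hypothetical sketch; the group and rule names are illustrative. It assumes the same metric and dimension labels that the alert uses. A large user or system share points at process workloads. A large iowait share suggests the CPUs are waiting on storage.

```yaml
# Hypothetical recording rule: average share of each non-idle CPU state, per instance.
groups:
  - name: netdata-cpu-diagnosis
    rules:
      - record: instance_dimension:netdata_cpu_cpu_percentage:avg
        # Average across CPUs, keeping one series per (instance, CPU state).
        expr: avg by (instance, dimension) (netdata_cpu_cpu_percentage_average{dimension!="idle"})
```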
Mitigation
To mitigate high CPU usage on the Netdata instance:
- Add capacity: Ensure that the host has enough CPU and memory for its workloads and for the Netdata agent.
- Adjust plugin settings: Reduce CPU usage by collecting metrics less often (a higher update_every) or by disabling plugins and collectors that are not needed.
- Implement caching: Where dashboards or external systems query Netdata frequently, consider caching query results or querying less often to reduce the load on the Netdata instance.
- Upgrade Netdata: Ensure that the Netdata instance is running the latest version, as newer versions may include performance optimizations.
- Consider load balancing: If the Netdata instance is handling a high volume of metrics, consider load balancing the traffic across multiple instances to reduce the CPU load on individual instances.
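For collectors run by Netdata's go.d.plugin, the update_every setting in the module's YAML configuration controls collection frequency. These files are typically under /etc/netdata/go.d/ and can be edited with the edit-config script in the Netdata configuration directory. The fragment below is a sketch. The module and job names are placeholders, and the module-specific options each collector needs are omitted. Raising update_every above the common default of 1 second makes the collector run less often, at the cost of lower metric resolution.

```yaml
# /etc/netdata/go.d/<module>.conf  (<module> is a placeholder for the collector's name)

# Module-wide default: collect every 5 seconds instead of every second.
update_every: 5

jobs:
  - name: example_job   # hypothetical job name
    # A job-level update_every overrides the module-wide value for this job only.
    update_every: 10
    # Module-specific options (endpoint URL, credentials, and so on) go here.
```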