PrometheusRuleEvaluationSlow #
Prometheus rule evaluation took longer than the scheduled interval. This indicates slow storage backend access or an overly complex query.
Alert Rule
alert: PrometheusRuleEvaluationSlow
annotations:
  description: |-
    Prometheus rule evaluation took more time than the scheduled interval. It indicates a slower storage backend access or too complex query.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-self-monitoring-internal/prometheusruleevaluationslow/
  summary: Prometheus rule evaluation slow (instance {{ $labels.instance }})
expr: prometheus_rule_group_last_duration_seconds > prometheus_rule_group_interval_seconds
for: 5m
labels:
  severity: warning
Meaning #
The PrometheusRuleEvaluationSlow alert fires when a rule group's most recent evaluation takes longer than the group's scheduled interval and the condition persists for 5 minutes. This can indicate slow storage backend access or overly complex queries.
Impact #
If left unchecked, slow rule evaluation can lead to:
- Delays in alerting and notification delivery
- Increased load on the Prometheus server
- Gaps in recording rule output and missed alert evaluations, because a group that overruns its interval skips the iterations it could not start on time
- Inability to detect issues in a timely manner
Diagnosis #
To diagnose the issue, follow these steps:
- Check the Prometheus server logs for any errors or slowdowns related to rule evaluation.
- Investigate the storage backend performance, such as disk I/O or database query times.
- Review the complexity of the rules being evaluated, looking for any unnecessary or inefficient queries.
- Check the prometheus_rule_group_last_duration_seconds and prometheus_rule_group_interval_seconds metrics to understand the scale of the issue.
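The metric comparison in the last step can be captured as recording rules, so the overrun is visible per rule group. This is a minimal sketch, not part of the original alert set: the group name and record names are hypothetical, and it assumes the Prometheus version in use exposes prometheus_rule_group_iterations_missed_total.

```yaml
groups:
  - name: rule-evaluation-diagnostics   # hypothetical group name
    rules:
      # Share of the interval consumed by the last evaluation, per rule group.
      # A value above 1 means the group overran its interval.
      - record: rule_group:prometheus_rule_group_duration:interval_ratio   # hypothetical record name
        expr: |
          prometheus_rule_group_last_duration_seconds
            /
          prometheus_rule_group_interval_seconds
      # Evaluations skipped per second because the previous run had not finished.
      - record: rule_group:prometheus_rule_group_iterations_missed:rate5m  # hypothetical record name
        expr: rate(prometheus_rule_group_iterations_missed_total[5m])
```

Querying the first series with topk(5, ...) should point to the groups worth simplifying first, since both source metrics carry a rule_group label.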
Mitigation #
To mitigate the issue, follow these steps:
- Optimize the storage backend configuration for better performance.
- Simplify or optimize complex rules to reduce evaluation time.
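- Increase the rule evaluation interval (evaluation_interval in the global configuration, or interval on an individual rule group) to give the server more time to evaluate rules. The prometheus_rule_group_interval_seconds metric only reports this setting and cannot be changed directly.
- Consider scaling up the Prometheus server or distributing the load across multiple servers.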
- Implement a more efficient storage solution, such as solid-state drives (SSD) or a distributed database.
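Two of the mitigations above, lengthening the interval and reducing the work done per group, are expressed in the rule file itself. The sketch below assumes a hypothetical heavy group that has been split in two; the group names, record names, and expressions are placeholders. It relies on Prometheus evaluating the rules inside a group sequentially while separate groups run concurrently, so splitting a large group may shorten each group's evaluation time.

```yaml
groups:
  - name: heavy-aggregations-a          # hypothetical group name
    interval: 2m                        # overrides the global evaluation_interval for this group
    rules:
      - record: job:http_requests:rate5m              # placeholder rule
        expr: sum by (job) (rate(http_requests_total[5m]))
  - name: heavy-aggregations-b          # hypothetical group name
    interval: 2m
    rules:
      - record: job:http_request_errors:rate5m        # placeholder rule
        expr: sum by (job) (rate(http_requests_total{code=~"5.."}[5m]))
```

A longer interval also means any alerting rules in the group are checked less often, so this trades detection latency for evaluation headroom.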