PrometheusTargetMissing #
A Prometheus target has disappeared. An exporter might have crashed.
Alert Rule
```yaml
alert: PrometheusTargetMissing
annotations:
  description: |-
    A Prometheus target has disappeared. An exporter might be crashed.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-self-monitoring-internal/prometheustargetmissing/
  summary: Prometheus target missing (instance {{ $labels.instance }})
expr: up == 0
for: 0m
labels:
  severity: critical
```
Here is a runbook for the PrometheusTargetMissing alert:
Meaning #
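The PrometheusTargetMissing alert fires when the `up` metric for a scrape target (an instance that exposes metrics to Prometheus) is 0. This means Prometheus's most recent scrape of that target failed, for example because the connection was refused, the scrape timed out, or the endpoint returned an error status. Because the rule uses `for: 0m`, a single failed scrape can be enough to fire it. The cause can be a problem with the exporter (the component that exposes the metrics), the network path, or the underlying system. A target that is removed from service discovery entirely does not produce `up == 0`: its `up` series simply stops, so this alert does not cover that case.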
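To see which targets the rule is currently matching, you can run its expression, `up == 0`, as an instant query against the Prometheus HTTP API (`/api/v1/query`). The sketch below uses only the Python standard library. The server address `http://localhost:9090` and the helper names `query_up_zero` and `failing_instances` are illustrative assumptions, not part of the rule.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Prometheus server address; change it for your environment.
PROMETHEUS_URL = "http://localhost:9090"


def query_up_zero(base_url: str = PROMETHEUS_URL) -> dict:
    """Run the alert's expression as an instant query via /api/v1/query."""
    params = urllib.parse.urlencode({"query": "up == 0"})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{params}", timeout=10) as resp:
        return json.load(resp)


def failing_instances(response: dict) -> list[tuple[str, str]]:
    """Extract sorted (job, instance) pairs from an instant-query response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    # Each vector element's labels are under "metric"; value is [timestamp, "0"].
    return sorted(
        (r["metric"].get("job", ""), r["metric"].get("instance", ""))
        for r in response["data"]["result"]
    )


# Usage (requires a reachable Prometheus server):
# for job, instance in failing_instances(query_up_zero()):
#     print(job, instance, sep="\t")
```

Every pair returned is a target whose last scrape failed. That is the same set of series the alert fires on.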
Impact #
This alert is labeled severity: critical because, while it is firing, Prometheus is not collecting metrics from the affected target. This can lead to:
- Loss of visibility into the system’s performance and health
- Delayed detection of potential issues
- Inability to trigger alerts and notifications based on metrics from the missing target
Diagnosis #
To diagnose the issue, follow these steps:
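- Check the exporter's logs for errors or signs of a crash
- Verify that the exporter process is running and configured correctly
- Check the network connectivity between the Prometheus server and the target, including DNS resolution and firewall rules
- Investigate any recent changes to the system, the exporter, or the Prometheus scrape configuration
- Review the target's scrape history: the Targets page (/targets) in the Prometheus web UI shows the last scrape error for each target (for example, "connection refused" or "context deadline exceeded"), and graphing up and scrape_duration_seconds for the instance shows whether failures are constant or intermittent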
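The Targets page is also available programmatically through the `/api/v1/targets` endpoint of the Prometheus HTTP API. Each active target there reports a `health` value (`up`, `down`, or `unknown`), its `scrapeUrl`, and its `lastError`. The following is a minimal standard-library sketch that lists only the failing targets. The server address and the helper names `fetch_active_targets` and `down_targets` are assumptions for illustration.

```python
import json
import urllib.request

# Assumption: the Prometheus server address; change it for your environment.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_active_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """Return the decoded JSON from /api/v1/targets, limited to active targets."""
    url = f"{base_url}/api/v1/targets?state=active"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def down_targets(response: dict) -> list[dict]:
    """List targets whose last scrape failed, with the error Prometheus recorded."""
    summary = []
    for target in response["data"]["activeTargets"]:
        # "unknown" means not scraped yet; only "down" corresponds to a failed last scrape.
        if target.get("health") == "down":
            labels = target.get("labels", {})
            summary.append({
                "job": labels.get("job", ""),
                "instance": labels.get("instance", ""),
                "scrape_url": target.get("scrapeUrl", ""),
                "last_error": target.get("lastError", ""),
            })
    return summary


# Usage (requires a reachable Prometheus server):
# for t in down_targets(fetch_active_targets()):
#     print(t["job"], t["instance"], t["scrape_url"], t["last_error"], sep="\t")
```

The `last_error` text usually narrows the diagnosis quickly:

- A refused connection points at a stopped exporter or a firewall.
- A deadline error points at a slow exporter or a network path problem.
- An HTTP status error points at a wrong metrics path or an authentication issue.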
Mitigation #
To mitigate the issue, follow these steps:
- Start or restart the exporter service if it has stopped or is unresponsive
- Verify that the exporter is configured correctly and that the scrape configuration points to the correct host, port, and metrics path
- Check for any firewall or network connectivity issues between Prometheus and the target
- If the issue persists, consider redeploying the exporter or restarting the underlying system
- Investigate and address the underlying cause of the issue, such as resource constraints or configuration errors
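After restarting or reconfiguring the exporter, it can help to confirm from the Prometheus host itself that the scrape URL answers and returns samples. You can use, for example, the `scrapeUrl` reported by the Targets page. Running the check on the Prometheus host exercises the same network path, port, and metrics path that Prometheus uses.

The sketch below uses the Python standard library, and both function names are hypothetical. It has two limitations:

- It does not handle authentication or TLS client certificates.
- Its "at least one non-comment line" check is a rough heuristic, not a parser for the Prometheus exposition format.

```python
import urllib.error
import urllib.request


def looks_like_exposition(body: str) -> bool:
    """Heuristic: True if the body has at least one non-empty, non-comment line."""
    for line in body.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return True
    return False


def probe_metrics_endpoint(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Fetch a target's metrics URL and report whether it served any samples."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The endpoint is reachable but answered with a non-2xx status (e.g. wrong path).
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError, timeouts, and refused connections are all OSError subclasses.
        return False, f"connection problem: {exc}"
    if not looks_like_exposition(body):
        return False, "endpoint responded but returned no samples"
    return True, "ok"
```

For example, once the exporter is serving again, `probe_metrics_endpoint("http://10.0.0.2:9100/metrics")` should return `(True, "ok")`; the address is only a placeholder. A failing probe points back at the exporter, firewall, or scrape configuration steps in the list above.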