BlackboxProbeHttpFailure
HTTP status code is not 200-399
Alert Rule
alert: BlackboxProbeHttpFailure
annotations:
  description: |-
    HTTP status code is not 200-399
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/blackbox-exporter/blackboxprobehttpfailure/
  summary: Blackbox probe HTTP failure (instance {{ $labels.instance }})
expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
for: 0m
labels:
  severity: critical
Meaning
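The BlackboxProbeHttpFailure alert is triggered when a blackbox probe (a synthetic HTTP request sent by the Prometheus blackbox exporter) does not receive a successful HTTP response (status 200-399) from a target instance. The exporter reports probe_http_status_code as 0 when no response arrives at all (for example, on a DNS failure, a refused connection, or a timeout), so the alert also fires when the target is completely unreachable. In either case the target is not responding as expected, which can affect the overall health and availability of the system.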
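What counts as a probe is defined by a module in the exporter's blackbox.yml. The following is a minimal sketch, assuming a module named http_2xx (the name used in the exporter's example configuration); the 5s timeout is an illustrative value, not taken from this runbook:

```yaml
modules:
  http_2xx:                  # assumed module name, referenced by the scrape job
    prober: http             # send an HTTP request to the target URL
    timeout: 5s              # illustrative; a probe that times out before any response reports status code 0
    http:
      method: GET
      # Status codes the exporter counts as success for probe_success.
      # An empty list means "any 2xx". This alert does not use probe_success;
      # it compares probe_http_status_code against 200-399 directly.
      valid_status_codes: []
```

Because the alert checks the raw status code rather than probe_success, changing valid_status_codes in the module does not change when this particular alert fires.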
Impact
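The impact of this alert can be significant, as it may indicate:
- Loss of visibility into system performance and health, if the probed URL is a health or status endpoint that other tooling depends on
- Unavailability of critical services or applications served at the probed URL
- Potential data loss or corruption, for example if write requests to the affected service fail
- Degraded user experience, such as failed page loads or API errors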
Diagnosis
To diagnose the root cause of this alert, follow these steps:
- Check the blackbox exporter logs for errors related to the failing probe. Appending debug=true to the exporter's /probe request returns the detailed log for a single probe run.
- Verify that the target instance is reachable and responding to HTTP requests.
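- Check the returned status code (shown as VALUE in the alert description; 0 means no response was received) and, by requesting the URL directly, inspect the response body for errors or clues.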
- Review system and application logs for any signs of errors or issues.
- Check for any network connectivity issues or firewall rules that may be blocking the probe.
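To see exactly which URL and module a firing alert refers to, look at the Prometheus scrape job that drives the exporter. A sketch of the standard relabeling pattern from the blackbox exporter documentation follows, assuming a hypothetical exporter address blackbox-exporter:9115, a hypothetical target https://example.com/healthz, and the http_2xx module. With this pattern, the instance label in the alert summary is the probed URL, not the exporter's address:

```yaml
scrape_configs:
  - job_name: blackbox-http
    metrics_path: /probe                  # the exporter's probe endpoint
    params:
      module: [http_2xx]                  # module defined in blackbox.yml
    static_configs:
      - targets:
          - https://example.com/healthz   # hypothetical URL to probe
    relabel_configs:
      # Pass the target URL to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label shown in the alert.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the exporter rather than to the target.
      - target_label: __address__
        replacement: blackbox-exporter:9115   # hypothetical exporter address
```

With these assumed values, requesting http://blackbox-exporter:9115/probe?target=https://example.com/healthz&module=http_2xx&debug=true reruns the same probe by hand and shows its log.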
Mitigation
To mitigate this alert, follow these steps:
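- If the blackbox exporter itself is unhealthy (for example, every probe it runs is failing), restart it. Each probe runs independently, so a restart will not fix a failure caused by the target.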
- Verify that the target instance is running and responding to HTTP requests.
- Check whether a recent update, patch, or deployment to the blackbox exporter or the target instance coincides with the start of the failures.
- Investigate and resolve any underlying issues causing the probe failure.
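- Consider increasing the probe module's timeout, or setting a non-zero for duration on the alert rule (it is 0m, so a single failing evaluation fires the alert), to make alerting more resilient to temporary failures.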
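One way to tolerate brief blips is to require the condition to hold for some time before firing. A sketch of the same rule in the standard Prometheus rule-file layout, assuming a hypothetical group name and an illustrative 2m duration:

```yaml
groups:
  - name: blackbox-exporter              # hypothetical group name
    rules:
      - alert: BlackboxProbeHttpFailure
        expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
        for: 2m                          # alert stays pending until the expression has matched for 2 minutes
        labels:
          severity: critical
        annotations:
          summary: Blackbox probe HTTP failure (instance {{ $labels.instance }})
          description: |-
            HTTP status code is not 200-399
              VALUE = {{ $value }}
              LABELS = {{ $labels }}
```

The trade-off is slower notification for real outages, so pick a duration that fits the service's tolerance for downtime.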
Remember to update the runbook with specific details and procedures relevant to your environment and systems.