BlackboxProbeFailed #

Probe failed

Alert Rule

alert: BlackboxProbeFailed
annotations:
  description: |-
    Probe failed
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/blackbox-exporter/blackboxprobefailed/
  summary: Blackbox probe failed (instance {{ $labels.instance }})
expr: probe_success == 0
for: 0m
labels:
  severity: critical

Here is a runbook for the Prometheus alert rule “BlackboxProbeFailed”:

Meaning #

The BlackboxProbeFailed alert is triggered when a blackbox probe fails to return a successful response. This probe is used to monitor the availability and responsiveness of a specific endpoint or service. When this alert is triggered, it indicates that the probe was unable to connect to the endpoint or receive a valid response, which may indicate a problem with the service or endpoint being monitored.

Impact #

The impact of this alert varies depending on the specific use case and service being monitored. However, in general, a failed blackbox probe can indicate:

Loss of service availability or responsiveness
Potential outage or downtime for users
Increased latency or errors for dependent systems
Possible security concerns if the probe is monitoring a critical security endpoint

Diagnosis #

To diagnose the root cause of the BlackboxProbeFailed alert, follow these steps:

Check the blackbox exporter logs for errors or issues related to the probe
Verify that the endpoint or service being monitored is reachable and responding correctly
Check for any network connectivity issues or firewall rules that may be blocking the probe
Verify that the probe configuration is correct and up-to-date
Check for any recent changes or updates to the service or endpoint being monitored that may have caused the probe to fail

Mitigation #

To mitigate the effects of the BlackboxProbeFailed alert, follow these steps:

Restart the blackbox exporter service to retry the probe
Investigate and resolve any network connectivity issues or firewall rules that may be blocking the probe
Update the probe configuration to ensure it is correct and up-to-date
Verify that the service or endpoint being monitored is available and responsive
Notify relevant teams or stakeholders of the issue and work to resolve it as quickly as possible to minimize impact to users.

Additionally, consider implementing measures to prevent similar issues in the future, such as:

Implementing redundancy or failover mechanisms for critical services or endpoints
Improving monitoring and logging for the blackbox exporter and dependent systems
Conducting regular maintenance and testing of the probe and endpoint configurations