PrometheusNotConnectedToAlertmanager

Prometheus cannot connect to Alertmanager.

Alert Rule
alert: PrometheusNotConnectedToAlertmanager
annotations:
  description: |-
    Prometheus cannot connect the alertmanager
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-self-monitoring-internal/prometheusnotconnectedtoalertmanager/
  summary: Prometheus not connected to alertmanager (instance {{ $labels.instance
    }})
expr: prometheus_notifications_alertmanagers_discovered < 1
for: 0m
labels:
  severity: critical

Meaning #

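The PrometheusNotConnectedToAlertmanager alert fires when prometheus_notifications_alertmanagers_discovered drops below 1, that is, when Prometheus has no active Alertmanager instance to send alerts to. Because the rule uses for: 0m, it fires as soon as a single evaluation sees this condition. It is rated critical because alerts raised by Prometheus cannot reach Alertmanager while the condition persists, so important notifications may be lost.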

Impact #

While this alert is firing, Prometheus still evaluates its alerting rules but cannot deliver the resulting alerts to Alertmanager, which may lead to:

  • Missed notifications for critical events
  • Delayed response to incidents
  • Incomplete incident response due to lack of notifications
  • Potential service disruptions or outages

Diagnosis #

To diagnose this issue, follow these steps:

  1. Check the Prometheus logs for errors related to connecting to the Alertmanager
  2. Verify that the Alertmanager is running and reachable from the Prometheus instance
  3. Check the alerting section of the Prometheus configuration (the alertmanagers entries and any service discovery or relabeling they use) to ensure it points at the correct Alertmanager targets
  4. Verify that the network connectivity between Prometheus and Alertmanager is not blocked by firewalls or other network issues
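
As a rough aid for steps 2 and 3, the sketch below asks Prometheus which Alertmanagers it currently knows about, using the /api/v1/alertmanagers endpoint of the Prometheus HTTP API. The server address passed to fetch_alertmanagers and both function names are assumptions for illustration, not part of the original runbook.

```python
import json
import urllib.request


def summarize_alertmanagers(payload):
    """Split an /api/v1/alertmanagers response into (active, dropped) URL lists."""
    data = payload.get("data", {})
    active = [am["url"] for am in data.get("activeAlertmanagers", [])]
    dropped = [am["url"] for am in data.get("droppedAlertmanagers", [])]
    return active, dropped


def fetch_alertmanagers(prometheus_url, timeout=5):
    """Fetch the Alertmanager discovery state from a Prometheus server.

    prometheus_url is an assumed base address such as http://localhost:9090.
    """
    url = prometheus_url.rstrip("/") + "/api/v1/alertmanagers"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling summarize_alertmanagers(fetch_alertmanagers("http://localhost:9090")) against a reachable server returns the two lists. An empty active list matches the condition this alert fires on. Entries in the dropped list suggest that relabeling in the Prometheus alerting configuration is discarding the targets.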

Mitigation #

To mitigate this issue, follow these steps:

  1. Restart the Prometheus instance to attempt to re-establish the connection to the Alertmanager
  2. Verify that the Alertmanager is correctly configured and running
  3. Check the network connectivity between Prometheus and Alertmanager and resolve any issues found
  4. If the issue persists, investigate further to determine the root cause of the connection issue and take corrective action
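
After applying a fix, one way to confirm recovery is to read back the metric used in the alert expression. The sketch below queries prometheus_notifications_alertmanagers_discovered through the /api/v1/query endpoint and reports whether every instance sees at least one Alertmanager. The server address and the function names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

METRIC = "prometheus_notifications_alertmanagers_discovered"


def discovered_counts(payload):
    """Map each instance label to its discovered-Alertmanager count."""
    counts = {}
    for sample in payload.get("data", {}).get("result", []):
        instance = sample["metric"].get("instance", "unknown")
        # Instant-vector values arrive as [timestamp, "value-as-string"].
        counts[instance] = float(sample["value"][1])
    return counts


def alert_cleared(payload):
    """True only if data exists and no instance reports fewer than 1."""
    counts = discovered_counts(payload)
    return bool(counts) and all(value >= 1 for value in counts.values())


def fetch_discovered(prometheus_url, timeout=5):
    """Run an instant query for METRIC against an assumed Prometheus address."""
    query = urllib.parse.urlencode({"query": METRIC})
    url = prometheus_url.rstrip("/") + "/api/v1/query?" + query
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

alert_cleared treats an empty result as not cleared, on the assumption that missing data should not be read as a recovery.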
