CortexNotConnectedToAlertmanager #
Cortex not connected to Alertmanager (instance {{ $labels.instance }})
Alert Rule
alert: CortexNotConnectedToAlertmanager
annotations:
  description: |-
    Cortex not connected to Alertmanager (instance {{ $labels.instance }})
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/cortex-internal/cortexnotconnectedtoalertmanager/
  summary: Cortex not connected to Alertmanager (instance {{ $labels.instance }})
expr: cortex_prometheus_notifications_alertmanagers_discovered < 1
for: 0m
labels:
  severity: critical
Meaning #
The CortexNotConnectedToAlertmanager alert fires when the metric cortex_prometheus_notifications_alertmanagers_discovered falls below 1 on a Cortex instance, meaning that instance has not discovered any Alertmanager to send alerts to. Because the rule uses for: 0m, it fires on the first evaluation in which the condition holds. Cortex is a Prometheus-compatible metric store, and Alertmanager is the component of the Prometheus ecosystem that deduplicates, groups, and sends notifications to users.
Impact #
While this alert is active, alerts generated by Cortex are not delivered to Alertmanager, so no notifications reach users. Critical issues may go unnoticed, which can lead to service disruptions or data loss.
Diagnosis #
To diagnose this issue, follow these steps:
- Check the Cortex logs for any errors related to connecting to Alertmanager.
- Verify that the Alertmanager URL and credentials are correctly configured in Cortex.
- Check the network connectivity between Cortex and Alertmanager.
- Verify that Alertmanager is running and accepting connections.
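The last two checks can be scripted. The following is a minimal sketch, assuming Alertmanager exposes its standard /-/healthy and /-/ready management endpoints at the base URL configured in Cortex. The function names probe_url and is_reachable are illustrative, not part of any Cortex or Alertmanager tooling.

```python
import http.client
import urllib.error
import urllib.request


def probe_url(base_url, path="/-/healthy"):
    """Join a base URL and a probe path without doubling or dropping slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_reachable(base_url, path="/-/healthy", timeout=5.0):
    """Return True only if the endpoint answers with HTTP 200.

    Connection failures, timeouts, HTTP error codes, and malformed URLs
    all count as "not reachable" and return False.
    """
    try:
        with urllib.request.urlopen(probe_url(base_url, path), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False
```

Run it from a host or pod that shares the network path Cortex uses, for example is_reachable("http://alertmanager.example.internal:9093") with that hypothetical address replaced by the one in the Cortex configuration. A False result from the Cortex side combined with True from elsewhere points at network connectivity rather than at Alertmanager itself.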
Mitigation #
To mitigate this issue, follow these steps:
- Restart the Cortex service to re-establish the connection to Alertmanager.
- Correct the Alertmanager URL or credentials in the Cortex configuration if the diagnosis showed they are wrong.
- Check the network connectivity between Cortex and Alertmanager and resolve any issues.
- If the issue persists, check the Alertmanager logs for any errors and resolve them accordingly.
- Consider raising the Cortex log level to debug to gather more information about the issue.
Also review the alert's labels and value, in particular the instance label, to identify which Cortex instance is not connected to Alertmanager.
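When several Cortex instances report the metric, the affected ones can be picked out of a query result programmatically. The sketch below assumes the metric has been fetched as an instant query from a Prometheus-compatible /api/v1/query endpoint and that the response uses the standard instant-vector JSON layout. The instance names in SAMPLE are made up for illustration.

```python
import json


def disconnected_instances(response_body):
    """Return the sorted instance labels of series whose value is below 1."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error", "unknown error"))
    instances = []
    for series in payload["data"]["result"]:
        # Instant vectors carry a [timestamp, "value"] pair; the value is a string.
        _timestamp, value = series["value"]
        if float(value) < 1:
            instances.append(series["metric"].get("instance", "<no instance label>"))
    return sorted(instances)


# Hypothetical response for cortex_prometheus_notifications_alertmanagers_discovered.
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "cortex-ruler-0:9009"}, "value": [1700000000, "2"]},
            {"metric": {"instance": "cortex-ruler-1:9009"}, "value": [1700000000, "0"]},
        ],
    },
})

print(disconnected_instances(SAMPLE))
```

Filtering client-side on the raw metric, rather than querying the alert expression itself, also shows the healthy instances in the response, which helps confirm whether the problem is isolated to one instance or shared by all of them.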