CortexIngesterUnhealthy

Cortex has an unhealthy ingester

Alert Rule

alert: CortexIngesterUnhealthy
annotations:
  description: |-
    Cortex has an unhealthy ingester
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/cortex-internal/cortexingesterunhealthy/
  summary: Cortex ingester unhealthy (instance {{ $labels.instance }})
expr: cortex_ring_members{state="Unhealthy", name="ingester"} > 0
for: 0m
labels:
  severity: critical

Here is a sample runbook for the CortexIngesterUnhealthy alert:

Meaning

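The CortexIngesterUnhealthy alert fires when the Cortex ring reports one or more ingesters in the Unhealthy state, that is, when cortex_ring_members with state="Unhealthy" and name="ingester" is greater than 0. Because the rule sets for: 0m, it fires on the first evaluation that sees a nonzero value. The alert is rated critical because unhealthy ingesters can lead to data loss and to errors in the system.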

Impact

The impact of an unhealthy ingester can be severe, leading to:

Data loss or corruption
Errors in query results
Increased latency and timeouts
Reduced system reliability and availability

Diagnosis

To diagnose the issue, follow these steps:

Check the Cortex ring membership to identify the unhealthy ingester(s)
Review the ingester logs to determine the cause of the unhealthy state
Check the system metrics (e.g. CPU, memory, disk usage) to identify any resource issues
Verify that the ingester is configured correctly and is running the expected version
Check for any network connectivity issues between the ingester and other Cortex components
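
The first diagnosis step can be partly scripted. The following is a minimal sketch, assuming the metric is scraped by a Prometheus server that exposes the standard HTTP query API; the function names and the base URL passed to fetch are illustrative and not part of the original runbook. It reuses the alert's selector and keeps only the series whose value is above zero. Note that the labels on each series most likely identify the component reporting the ring state, which is not necessarily the unhealthy ingester itself, so treat the output as a pointer for where to look next.

```python
import json
import urllib.parse
import urllib.request

# Same selector as the alert rule, without the "> 0" comparison, so the
# filtering happens locally and zero-valued series are visible if needed.
QUERY = 'cortex_ring_members{state="Unhealthy", name="ingester"}'


def unhealthy_series(response: dict) -> list[tuple[dict, float]]:
    """Return (labels, value) pairs with value > 0 from an instant-query response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    found = []
    for sample in response["data"]["result"]:
        # Each instant-vector sample carries "value": [timestamp, "string value"].
        value = float(sample["value"][1])
        if value > 0:
            found.append((sample["metric"], value))
    return found


def fetch(base_url: str, query: str = QUERY) -> dict:
    """Run an instant query against the Prometheus HTTP API (/api/v1/query)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, unhealthy_series(fetch("http://localhost:9090")) would return the affected series for a Prometheus server at that (hypothetical) address. Parsing is kept separate from fetching so the filter can be checked against a saved response without network access.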

Mitigation

To mitigate the issue, follow these steps:

Restart the unhealthy ingester(s) and check whether they return to a healthy state
If the issue persists, investigate and resolve any underlying causes (e.g. resource issues, configuration errors)
If necessary, replace the unhealthy ingester with a new instance
Verify that the Cortex cluster is functioning correctly and data is being ingested properly
Monitor the system closely to ensure the issue does not recur
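
The last two steps amount to confirming that the unhealthy count returns to zero and stays there. The following is a small sketch of that check, assuming you have already collected a chronological series of values for the alert's metric (for example from periodic queries); the function name recovered and the default of five clean samples are illustrative choices, not values from the original runbook.

```python
def recovered(samples: list[float], required_clean: int = 5) -> bool:
    """Return True if the most recent `required_clean` samples are all zero.

    `samples` is a chronological list of unhealthy-ingester counts, oldest
    first. Requiring several consecutive zero readings avoids declaring
    recovery on a single good sample from a flapping ingester.
    """
    if required_clean <= 0:
        raise ValueError("required_clean must be positive")
    if len(samples) < required_clean:
        return False  # not enough history yet to decide
    return all(value == 0 for value in samples[-required_clean:])
```

With required_clean=3, the series [2, 1, 0, 0, 0] counts as recovered, while [0, 0, 1, 0, 0] does not, because the nonzero reading still falls inside the last three samples.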

Note: This is just a sample runbook, and you should tailor it to your specific environment and requirements.