RabbitmqNodeDown

Less than 3 nodes running in RabbitMQ cluster

Alert Rule

alert: RabbitmqNodeDown
annotations:
  description: |-
    Less than 3 nodes running in RabbitMQ cluster
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/rabbitmq-exporter/rabbitmqnodedown/
  summary: RabbitMQ node down (instance {{ $labels.instance }})
expr: sum(rabbitmq_build_info) < 3
for: 0m
labels:
  severity: critical

Here is a runbook for the RabbitmqNodeDown alert rule:

Meaning

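The RabbitmqNodeDown alert fires when fewer than 3 nodes in the RabbitMQ cluster are visible to Prometheus. The rule sums the rabbitmq_build_info metric across the cluster, which in effect counts the nodes that are currently being scraped. A result below 3 means that one or more nodes are down or can no longer be scraped, which can lead to reduced capacity, increased latency, and potential data loss. Because for is set to 0m, the alert fires on the first evaluation that sees the condition, with no grace period.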
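To see how the threshold behaves, the rule's condition can be reproduced by hand from an instant query of sum(rabbitmq_build_info) against the Prometheus HTTP API (GET /api/v1/query). The sketch below assumes the standard instant-query response shape, and assumes each node exposes a single rabbitmq_build_info series with value 1 (the usual convention for *_build_info metrics, and what the < 3 threshold relies on). The helper names nodes_visible and alert_condition and the sample payloads are hypothetical. One consequence worth knowing: if no node is scraped at all, sum() returns an empty result, the comparison yields nothing, and this rule stays silent.

```python
import json
from typing import Optional


def nodes_visible(response: dict) -> Optional[float]:
    """Return the value of sum(rabbitmq_build_info) from a Prometheus
    instant-query response, or None when the result vector is empty."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    result = response["data"]["result"]
    if not result:
        # No series at all: every node (or the exporter) is unreachable.
        return None
    # An instant-vector sample's "value" is [unix_timestamp, "string_value"].
    return float(result[0]["value"][1])


def alert_condition(response: dict, threshold: int = 3) -> bool:
    """Mirror `sum(rabbitmq_build_info) < 3`: an empty result never fires."""
    count = nodes_visible(response)
    return count is not None and count < threshold


# Hypothetical payloads in the shape returned by GET /api/v1/query.
two_nodes = json.loads(
    '{"status": "success", "data": {"resultType": "vector",'
    ' "result": [{"metric": {}, "value": [1700000000.0, "2"]}]}}'
)
no_data = {"status": "success", "data": {"resultType": "vector", "result": []}}

print(alert_condition(two_nodes))  # True: 2 < 3
print(alert_condition(no_data))    # False: empty vector, rule stays silent
```

If the total-outage case matters, it has to be covered by a separate check, since this expression cannot fire when there is no data.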

Impact

This alert has critical severity because a missing node can cause:

Reduced message processing capacity, leading to increased latency and potential message loss
Reduced redundancy for replicated queues, raising the risk of data loss or inconsistencies between nodes if another node fails
Potential business impact due to reduced system availability and reliability

Diagnosis

To diagnose the issue, follow these steps:

Check the RabbitMQ cluster status using the RabbitMQ Management UI or the rabbitmqctl command-line tool (for example, rabbitmqctl cluster_status)
Verify the number of nodes running in the cluster and identify which nodes are down
Check the system logs for errors or warnings related to the down nodes
Verify that the RabbitMQ exporter is correctly configured and running, since a failed scrape lowers the node count even when the broker itself is healthy
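Identifying which nodes are down can also be scripted against the management HTTP API, whose GET /api/nodes endpoint returns one object per cluster member with a name and a running flag. The sketch below is a minimal illustration, assuming the management plugin is enabled on a surviving node (default port 15672) and that the account used may read /api/nodes (for example, one with the monitoring tag). The helper names fetch_nodes and split_nodes, and the node names in the sample payload, are hypothetical.

```python
import base64
import json
import urllib.request


def fetch_nodes(base_url: str, user: str, password: str) -> list:
    """GET /api/nodes from the RabbitMQ management API using basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/nodes",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def split_nodes(nodes: list) -> tuple:
    """Return (running, down) node names based on each node's "running" flag."""
    running = sorted(n["name"] for n in nodes if n.get("running"))
    down = sorted(n["name"] for n in nodes if not n.get("running"))
    return running, down


# Sample payload trimmed to the two fields used here; node names are hypothetical.
sample = [
    {"name": "rabbit@node1", "running": True},
    {"name": "rabbit@node2", "running": False},
    {"name": "rabbit@node3", "running": True},
]
running, down = split_nodes(sample)
print("running:", running)
print("down:", down)
```

In a live check, split_nodes(fetch_nodes("http://surviving-node:15672", user, password)) would replace the sample; query a node that is still up, because a stopped node cannot serve the API.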

Mitigation

To mitigate the issue, follow these steps:

Restart the down nodes, or replace them if they have failed
Verify that the nodes are properly configured and connected to the cluster
Check for any software or hardware issues that may be preventing the nodes from running
Consider adding more nodes to the cluster to increase capacity and redundancy
Confirm that the RabbitMQ exporter is running and that the alert resolves once every node reports metrics again
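After restarting or rejoining nodes, it is worth confirming that every cluster member is running again before closing the incident. The sketch below assumes rabbitmqctl is on PATH for the current user (it usually needs root or the rabbitmq user) and that, on recent RabbitMQ releases, the JSON printed by rabbitmqctl cluster_status --formatter json lists members under disk_nodes and ram_nodes and live members under running_nodes; check those key names against your version. The helpers missing_members and recovered, the expected size of 3 (taken from the alert threshold), and the sample payload are hypothetical.

```python
import json
import subprocess


def cluster_status() -> dict:
    """Run `rabbitmqctl cluster_status --formatter json` and parse its output."""
    out = subprocess.run(
        ["rabbitmqctl", "cluster_status", "--formatter", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def missing_members(status: dict) -> list:
    """Return cluster members (disk or RAM) that are not in running_nodes."""
    members = set(status.get("disk_nodes", [])) | set(status.get("ram_nodes", []))
    running = set(status.get("running_nodes", []))
    return sorted(members - running)


def recovered(status: dict, expected: int = 3) -> bool:
    """True when at least `expected` nodes run and no known member is missing."""
    running = status.get("running_nodes", [])
    return len(running) >= expected and not missing_members(status)


# Hypothetical status captured while one node is still rejoining.
sample = {
    "disk_nodes": ["rabbit@node1", "rabbit@node2", "rabbit@node3"],
    "ram_nodes": [],
    "running_nodes": ["rabbit@node1", "rabbit@node3"],
}
print(missing_members(sample))  # ['rabbit@node2']
print(recovered(sample))        # False
```

On a real host, recovered(cluster_status()) can be polled after each restart; it should turn True at roughly the point the alert condition clears.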