RabbitmqTooManyUnackMessages #
Too many unacknowledged messages
Alert Rule
alert: RabbitmqTooManyUnackMessages
annotations:
  description: |-
    Too many unacknowledged messages
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/rabbitmq-exporter/rabbitmqtoomanyunackmessages/
  summary: RabbitMQ too many unack messages (queue {{ $labels.queue }})
expr: sum(rabbitmq_queue_messages_unacked) BY (queue) > 1000
for: 1m
labels:
  severity: warning
Meaning #
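This alert fires when the number of unacknowledged messages, summed over all series that share the same `queue` label, stays above 1000 for one minute. Unacknowledged messages have been delivered to a consumer but not yet acknowledged, so a sustained high count means consumers are slow, stuck, or holding too many messages in flight (for example, because of a large or unlimited prefetch). Because the expression aggregates by `queue`, the firing alert carries only the `queue` label.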
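As an illustration of how the rule's `sum(...) BY (queue)` aggregation behaves, the following Python sketch reproduces it over a few hand-written samples. The `vhost` label and the sample values are assumptions made for the example; the real label set depends on the exporter in use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# One sample: (label set, unacked message count). Labels are illustrative.
Sample = Tuple[Mapping[str, str], float]


def sum_by_queue(samples: Iterable[Sample]) -> Dict[str, float]:
    """Mimic `sum(...) by (queue)`: add up every series sharing a queue label."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        # Series without a `queue` label fall into the empty-string group.
        totals[labels.get("queue", "")] += value
    return dict(totals)


def firing_queues(samples: Iterable[Sample], threshold: float = 1000.0) -> Dict[str, float]:
    """Return the queues whose summed unacked count is strictly above the threshold."""
    return {q: total for q, total in sum_by_queue(samples).items() if total > threshold}


# Hypothetical samples: two vhosts each hold a queue named "orders".
example_samples = [
    ({"queue": "orders", "vhost": "/a"}, 600.0),
    ({"queue": "orders", "vhost": "/b"}, 600.0),
    ({"queue": "emails", "vhost": "/a"}, 1000.0),
]
```

With these samples, `firing_queues(example_samples)` reports only `orders`: neither `orders` queue exceeds 1000 on its own, but their sum does, while `emails` sits exactly at the threshold and does not satisfy `> 1000`. If same-named queues exist in several vhosts or clusters, check each one individually during diagnosis.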
Impact #
A sustained backlog of unacknowledged messages can lead to:
- Delayed, duplicated, or lost messages: RabbitMQ requeues unacknowledged messages when the consumer's channel or connection closes, so a stuck consumer delays them and a consumer crash causes redelivery. Messages that are not persistent, or that sit in a non-durable queue, are also lost if the RabbitMQ node restarts or crashes.
- Queue buildup: While consumers hold unacknowledged messages, new messages accumulate behind them, increasing memory usage on the node and potentially triggering memory alarms that block publishers.
- Performance issues: A large backlog can slow RabbitMQ down, leading to delays in message processing and timeouts in the applications that depend on it.
Diagnosis #
To diagnose the issue, follow these steps:
- Use the `queue` label on the alert and the RabbitMQ queue metrics to identify the specific queue(s) with a high number of unacknowledged messages.
- Investigate the application or service that consumes from the affected queue(s): check how many consumers are attached, what prefetch they use, and whether they are slow, blocked, or failing to acknowledge messages.
- Check the RabbitMQ node logs for errors related to message processing or acknowledgment, for example channels closed because a delivery acknowledgement timed out.
- Verify that the RabbitMQ node itself is healthy: it is running, correctly configured, and has no active memory or disk alarms.
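The first two steps above can be sketched in Python against the RabbitMQ management HTTP API, assuming the management plugin is enabled. The `GET /api/queues` endpoint and its `messages_unacknowledged` and `consumers` fields come from that API; the function names, the base URL, and the credentials passed in are placeholders for illustration.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def fetch_queues(base_url: str, user: str, password: str) -> List[Dict[str, Any]]:
    """Fetch all queues from the management API (requires the management plugin)."""
    req = urllib.request.Request(f"{base_url.rstrip('/')}/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def rank_unacked(
    queues: List[Dict[str, Any]], threshold: int = 1000
) -> List[Tuple[str, str, int, int]]:
    """Return (vhost, queue, unacked, consumers) for queues above the threshold,
    sorted with the largest unacked count first."""
    rows = []
    for q in queues:
        unacked = q.get("messages_unacknowledged", 0)
        if unacked > threshold:
            rows.append((q.get("vhost", ""), q.get("name", ""), unacked, q.get("consumers", 0)))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

A call such as `rank_unacked(fetch_queues("http://localhost:15672", "monitoring", "secret"))` (hypothetical host and account) lists the offending queues per vhost. A queue that shows a large unacknowledged count with only a few consumers is a hint that those consumers are stuck or running with a very high prefetch.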
Mitigation #
To mitigate the issue, follow these steps:
- Investigate and resolve any issues with the application or service that consumes from the affected queue(s).
- Consider increasing the number of workers or consumers to process the backlog, and review the consumer prefetch so that no single consumer holds more messages than it can process promptly.
- Monitor the RabbitMQ queue metrics to confirm that the number of unacknowledged messages is decreasing.
- Consider configuring a dead-letter queue so that messages the consumer cannot process are rejected and routed there instead of remaining unacknowledged.
- As a last resort, restart the stuck consumers (closing their channels requeues their unacknowledged messages) and, only if the broker itself is unhealthy, the RabbitMQ node.
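To check that the backlog is actually shrinking after a mitigation step, a rough estimate can be computed from a few samples of the unacknowledged count. The sketch below is a simple first-to-last linear estimate, not a forecast; the function names and the sample format (Unix timestamp, value) are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple

# (unix timestamp in seconds, unacknowledged message count)
Sample = Tuple[float, float]


def drain_rate(samples: Sequence[Sample]) -> float:
    """Average change in unacked messages per second between the first and
    last sample. A negative value means the backlog is draining."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


def seconds_until_below(samples: Sequence[Sample], threshold: float = 1000.0) -> Optional[float]:
    """Estimate seconds until the count reaches the alert threshold.

    Returns 0.0 if the latest sample is already at or below the threshold,
    and None if the backlog is flat or growing.
    """
    current = samples[-1][1]
    if current <= threshold:
        return 0.0
    rate = drain_rate(samples)
    if rate >= 0:
        return None
    return (current - threshold) / -rate
```

For example, samples of 4000, 3400, and 2800 taken one minute apart give a rate of -10 messages per second, so the count would reach 1000 in roughly 180 seconds if that pace holds. A `None` result means the mitigation has not taken effect yet.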
Note: This runbook is just a starting point, and the specific steps and procedures may vary depending on your environment and setup.