RabbitmqDeadLetterQueueFillingUp #
Dead letter queue is filling up (> 10 msgs)
Alert Rule
alert: RabbitmqDeadLetterQueueFillingUp
annotations:
description: |-
Dead letter queue is filling up (> 10 msgs)
VALUE = {{ $value }}
LABELS = {{ $labels }}
runbook: https://srerun.github.io/prometheus-alerts/runbooks/kbudde-rabbitmq-exporter/rabbitmqdeadletterqueuefillingup/
summary: RabbitMQ dead letter queue filling up (instance {{ $labels.instance }})
expr: rabbitmq_queue_messages{queue="my-dead-letter-queue"} > 10
for: 1m
labels:
severity: warning
Here is a runbook for the RabbitMQ Dead Letter Queue Filling Up alert:
Meaning #
The RabbitMQ Dead Letter Queue Filling Up alert is triggered when the number of messages in the dead letter queue exceeds 10. This queue is a special queue in RabbitMQ where messages are sent when they cannot be delivered to their intended queue. A growing dead letter queue can indicate issues with message processing, network connectivity, or configuration problems.
Impact #
If left unchecked, a growing dead letter queue can lead to:
- Increased memory usage and potential crashes of RabbitMQ nodes
- Delayed or lost messages
- Increased latency for message processing
- Difficulty in troubleshooting and debugging issues due to the accumulating messages in the dead letter queue
Diagnosis #
To diagnose the issue, follow these steps:
- Check the RabbitMQ logs for errors or warnings related to message processing or network connectivity.
- Verify that the dead letter queue is not being drained or consumed by a consumer.
- Check the RabbitMQ node metrics for any signs of memory pressure or high CPU usage.
- Investigate recent changes to the RabbitMQ configuration, message producers, or consumers.
- Use the RabbitMQ management UI or CLI tools to inspect the dead letter queue and its contents.
Mitigation #
To mitigate the issue, follow these steps:
- Identify and fix the root cause of messages being sent to the dead letter queue (e.g., fix message serialization issues, network connectivity problems, or configuration errors).
- Configure a dead letter queue consumer or drain to process and remove messages from the queue.
- Monitor the RabbitMQ node metrics and adjust resource allocation as needed to prevent memory pressure or high CPU usage.
- Consider implementing retries or circuit breakers in message producers to prevent overloading the dead letter queue.
- Verify that the dead letter queue is being properly monitored and alerted on to prevent future occurrences.