RabbitmqMemoryHigh #
A node uses more than 90% of its allocated RAM.
Alert Rule
alert: RabbitmqMemoryHigh
annotations:
  description: |-
    A node uses more than 90% of allocated RAM
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/rabbitmq-exporter/rabbitmqmemoryhigh/
  summary: RabbitMQ memory high (instance {{ $labels.instance }})
expr: rabbitmq_process_resident_memory_bytes / rabbitmq_resident_memory_limit_bytes * 100 > 90
for: 2m
labels:
  severity: warning
Meaning #
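The RabbitmqMemoryHigh alert fires when a RabbitMQ node's resident memory (rabbitmq_process_resident_memory_bytes) stays above 90% of its memory limit (rabbitmq_resident_memory_limit_bytes) for at least 2 minutes. A node this close to its limit is at risk of performance problems, slow message processing, and even a crash. Once the limit itself is reached, RabbitMQ raises a memory alarm and blocks publishing connections until memory usage drops.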
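As a worked illustration of the alert expression, the sketch below mirrors its arithmetic in Python. The function names and the sample byte counts are hypothetical, chosen only for this example; the real evaluation happens inside Prometheus.

```python
def memory_usage_percent(resident_bytes: float, limit_bytes: float) -> float:
    """Mirror the alert expression: resident / limit * 100."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return resident_bytes / limit_bytes * 100


def breaches_threshold(resident_bytes: float, limit_bytes: float,
                       threshold: float = 90.0) -> bool:
    """True when usage is strictly above the threshold, like `> 90` in the rule."""
    return memory_usage_percent(resident_bytes, limit_bytes) > threshold


GIB = 1024 ** 3

# Hypothetical samples, not taken from a real node.
resident = 3.7 * GIB   # rabbitmq_process_resident_memory_bytes
limit = 4 * GIB        # rabbitmq_resident_memory_limit_bytes

print(f"usage: {memory_usage_percent(resident, limit):.1f}%")
print("above threshold:", breaches_threshold(resident, limit))
```

Note that the rule also carries `for: 2m`, so a single sample above 90% is not enough: the condition has to hold continuously for two minutes before the alert fires.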
Impact #
- Performance degradation: High memory usage can slow down message processing, leading to increased latency and decreased throughput.
- Node instability: Prolonged high memory usage can cause the RabbitMQ node to crash, resulting in message loss and downtime.
- Cascading failures: If multiple nodes experience high memory usage, it can lead to a cascading failure of the entire RabbitMQ cluster.
Diagnosis #
To diagnose the issue, follow these steps:
- Check the RabbitMQ node’s memory usage: Verify the current memory usage using the Prometheus metric rabbitmq_process_resident_memory_bytes.
- Identify the root cause: Investigate the cause of high memory usage, such as:
  - Increased message volume
  - Large message sizes
  - Inefficient message processing
  - Resource-intensive plugins or extensions
- Review RabbitMQ configuration: Check that the configuration suits the current workload, in particular the memory high watermark (vm_memory_high_watermark) and any queue length limits.
- Monitor node performance: Observe the node’s performance metrics, such as CPU usage, to determine if there are any underlying performance issues.
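The first diagnosis step can be sketched as a small script that asks Prometheus for the same ratio the alert evaluates, for every node rather than only those above 90%. This is a minimal sketch using the Prometheus HTTP API instant-query endpoint (`/api/v1/query`); the base URL and the helper names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Same ratio as the alert, without the `> 90` filter so every node is listed.
QUERY = (
    "rabbitmq_process_resident_memory_bytes"
    " / rabbitmq_resident_memory_limit_bytes * 100"
)


def build_query_url(base_url: str, query: str = QUERY) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": query})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_usage(payload: dict) -> list[tuple[str, float]]:
    """Return (instance, percent) pairs, highest usage first."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    rows = []
    for sample in payload["data"]["result"]:
        instance = sample["metric"].get("instance", "<unknown>")
        _timestamp, value = sample["value"]  # the sample value arrives as a string
        rows.append((instance, float(value)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fetch_usage(base_url: str) -> list[tuple[str, float]]:
    """Run the query against a live Prometheus server and parse the result."""
    with urllib.request.urlopen(build_query_url(base_url), timeout=10) as resp:
        return parse_usage(json.load(resp))


# Usage, assuming Prometheus is reachable at this (hypothetical) address:
#   for instance, pct in fetch_usage("http://localhost:9090"):
#       print(f"{instance}: {pct:.1f}%")
```

Sorting by usage puts the node closest to its limit first, which helps tell a single misbehaving node apart from cluster-wide memory pressure.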
Mitigation #
To mitigate the issue, follow these steps:
- Reduce message volume: Implement measures to reduce the message volume, such as:
  - Load balancing
  - Message filtering
  - Queue length reduction
- Optimize message processing: Improve message processing efficiency by:
  - Optimizing plugin configurations
  - Implementing message batching
  - Reducing message sizes
- Increase allocated RAM: Consider increasing the allocated RAM for the RabbitMQ node to ensure it has sufficient resources.
- Restart the RabbitMQ node: If the issue persists, restart the RabbitMQ node to release memory it is holding. Be aware that transient messages and non-durable queues on that node are lost on restart.
- Implement permanent fixes: Address the root cause through RabbitMQ configuration tuning, plugin updates, or architecture changes so that the issue does not recur.
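For the queue length reduction step, it helps to know which queues hold the most memory. The sketch below ranks queues from the tab-separated output of `rabbitmqctl list_queues name messages memory`. The parser and its function names are hypothetical helpers, and the sample listing is made up; the parser skips any line that is not a three-column data row, so banner and header lines are tolerated.

```python
def parse_queue_listing(text: str) -> list[tuple[str, int, int]]:
    """Parse `name<TAB>messages<TAB>memory` rows, skipping anything else."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # banner or blank line
        name, messages, memory = fields
        if not (messages.isdigit() and memory.isdigit()):
            continue  # header row such as "name  messages  memory"
        rows.append((name, int(messages), int(memory)))
    return rows


def top_by_memory(rows: list[tuple[str, int, int]], n: int = 5):
    """Return the n queues using the most memory, largest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


# Made-up sample output for illustration only.
SAMPLE = (
    "Listing queues for vhost / ...\n"
    "name\tmessages\tmemory\n"
    "orders\t120000\t524288000\n"
    "emails\t15\t143000\n"
    "audit-log\t980000\t2147483648\n"
)

for name, messages, memory in top_by_memory(parse_queue_listing(SAMPLE), n=2):
    print(f"{name}: {messages} messages, {memory / 1024 ** 2:.0f} MiB")
```

In practice the text would come from running the command on the affected node and passing its standard output to `parse_queue_listing`. Queues with a large backlog and high memory are the first candidates for adding consumers, purging stale messages, or applying a length limit.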