RabbitmqTooManyConnections #
The total number of connections on a node is too high.
Alert Rule
alert: RabbitmqTooManyConnections
annotations:
  description: |-
    The total connections of a node is too high
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/rabbitmq-exporter/rabbitmqtoomanyconnections/
  summary: RabbitMQ too many connections (instance {{ $labels.instance }})
expr: rabbitmq_connections > 1000
for: 2m
labels:
  severity: warning
Meaning #
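This alert fires when the rabbitmq_connections metric for a RabbitMQ node stays above 1000 for 2 minutes. Each open connection consumes memory and a file descriptor on the broker, so a sustained high count can indicate a performance issue or resource overload on the node.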
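The firing condition can be modelled roughly as follows. This is a simplified sketch of how a threshold combined with a "for" duration behaves, not Prometheus's actual implementation; the function name first_firing_time and its parameters are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def first_firing_time(
    samples: Iterable[Tuple[float, float]],
    threshold: float = 1000,
    hold_seconds: float = 120,
) -> Optional[float]:
    """Return the timestamp at which the alert would first fire, or None.

    samples: (timestamp_seconds, connection_count) pairs in time order,
    one pair per rule evaluation (hypothetical input format).
    """
    pending_since: Optional[float] = None
    for ts, value in samples:
        if value > threshold:
            # Condition holds: start (or continue) the pending period.
            if pending_since is None:
                pending_since = ts
            if ts - pending_since >= hold_seconds:
                return ts
        else:
            # Any evaluation at or below the threshold resets the timer.
            pending_since = None
    return None
```

Under this model a brief spike above 1000 does not fire the alert; the count has to remain above the threshold at every evaluation across the full 2 minutes.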
Impact #
A high number of connections to a RabbitMQ node can lead to:
- Decreased performance and throughput
- Increased memory usage and resource consumption
- Potential node crashes or instability
- Impacted message processing and delivery times
Diagnosis #
To diagnose the root cause of this issue, follow these steps:
- Check the RabbitMQ node’s connection statistics to identify the source of the high connection count.
- Review the node’s configuration to ensure it is properly sized and configured for the current workload.
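- Verify that the applications publishing to or consuming from the RabbitMQ node are not leaking connections, for example by opening a new connection per message or by failing to close connections on error.
- Check for network connectivity issues, firewalls, or load balancers that silently drop idle connections, which can leave stale connections open on the broker until a heartbeat or TCP keepalive timeout detects them.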
- Review RabbitMQ logs for any error messages related to connection handling.
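The first diagnosis step can be sketched with the management HTTP API, which lists open connections at GET /api/connections. This is a minimal sketch that assumes the management plugin is enabled; the helper names count_by_client and fetch_connections, and the URL and credentials shown in the usage comment, are assumptions for illustration.

```python
import base64
import json
import urllib.request
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_by_client(connections: Iterable[Dict]) -> List[Tuple[Tuple[str, str], int]]:
    """Group connection records by (user, peer_host), busiest client first."""
    counts = Counter(
        (conn.get("user", "unknown"), conn.get("peer_host", "unknown"))
        for conn in connections
    )
    return counts.most_common()


def fetch_connections(base_url: str, username: str, password: str) -> List[Dict]:
    """Fetch all connection records from the management API."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/api/connections",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Example usage (assumed host, port, and credentials; adjust for your setup):
# conns = fetch_connections("http://localhost:15672", "guest", "guest")
# for (user, host), n in count_by_client(conns)[:10]:
#     print(f"{n:6d}  user={user}  peer_host={host}")
```

A single user or peer host holding most of the connections usually points at one misbehaving application rather than at the broker's sizing.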
Mitigation #
To mitigate this issue, follow these steps:
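- Reduce the number of connections to the RabbitMQ node by:
  - Adjusting application configuration so that each client opens fewer connections, for example by reusing one long-lived connection with multiple channels instead of opening a connection per message
  - Implementing connection pooling or queuing mechanisms in client applications so that work shares a bounded set of connections
- Increase the node's resources (e.g., RAM, CPU) if the connection count is legitimate for the workload.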
- Optimize RabbitMQ node configuration for better performance and resource utilization.
- Monitor RabbitMQ node statistics and adjust as necessary to prevent future connection count issues.
- Consider implementing RabbitMQ clustering or load balancing to distribute connections across multiple nodes.
- Verify that firewall rules are configured to allow connections to be closed properly.
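The connection reuse mentioned in the mitigation steps can be sketched as a small wrapper that creates one connection lazily and returns the same object to every caller. This is an illustrative pattern, not part of any RabbitMQ client library; the class name SharedConnection and the connect callable are hypothetical, and whether a connection object may safely be shared across threads depends on the client library in use.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SharedConnection(Generic[T]):
    """Lazily create one connection and hand the same object to every caller."""

    def __init__(self, connect: Callable[[], T]) -> None:
        # connect is a caller-supplied factory that opens a broker connection.
        self._connect = connect
        self._lock = threading.Lock()
        self._conn: Optional[T] = None

    def get(self) -> T:
        """Return the existing connection, opening it on first use."""
        with self._lock:
            if self._conn is None:
                self._conn = self._connect()
            return self._conn

    def reset(self) -> None:
        """Forget the current connection so the next get() reconnects."""
        with self._lock:
            self._conn = None
```

Callers would typically open a channel on the shared connection for each unit of work and call reset() only after the connection has failed.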