NatsJetstreamConsumersExceeded #
JetStream has more than 100 active consumers
Alert Rule
alert: NatsJetstreamConsumersExceeded
annotations:
  description: |-
    JetStream has more than 100 active consumers
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/nats-exporter/natsjetstreamconsumersexceeded/
  summary: Nats JetStream consumers exceeded (instance {{ $labels.instance }})
expr: sum(gnatsd_varz_jetstream_stats_accounts) > 100
for: 5m
labels:
  severity: warning
Here is a runbook for the Prometheus alert rule NatsJetstreamConsumersExceeded:
Meaning #
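The NatsJetstreamConsumersExceeded alert fires when sum(gnatsd_varz_jetstream_stats_accounts) stays above 100 for 5 minutes, which the rule reports as more than 100 active JetStream consumers. A count this high suggests heavy or poorly managed consumption of JetStream, which may lead to performance issues, increased latency, and potential errors. Note that the metric name refers to JetStream account statistics, so confirm that it reflects consumer counts in your exporter version before acting on the number.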
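To make the trigger condition concrete, here is a minimal Python sketch of how the rule's expression and its `for: 5m` clause combine. It is a simplified model, not Prometheus code: the function names, the `(timestamp, per-instance values)` history format, and the instance names in the usage note are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

THRESHOLD = 100      # from the rule: sum(...) > 100
HOLD_SECONDS = 300   # from the rule: for: 5m

# One evaluation: (unix timestamp in seconds, metric value per instance).
Evaluation = Tuple[float, Dict[str, float]]


def condition_met(per_instance: Dict[str, float], threshold: float = THRESHOLD) -> bool:
    """Mirror `sum(metric) > threshold`: add every instance's value, then compare."""
    return sum(per_instance.values()) > threshold


def would_fire(history: List[Evaluation], hold_seconds: float = HOLD_SECONDS) -> bool:
    """Return True if the condition has held without a break for at least hold_seconds.

    `history` must be ordered by timestamp. Any evaluation where the condition
    is false resets the pending timer, as a `for` clause does.
    """
    pending_since = None
    for timestamp, sample in history:
        if condition_met(sample):
            if pending_since is None:
                pending_since = timestamp
        else:
            pending_since = None
    if pending_since is None:
        return False
    return history[-1][0] - pending_since >= hold_seconds
```

For example, two hypothetical instances reporting 60 and 50 add up to 110, so the condition is met even though neither instance exceeds 100 on its own. The alert only fires once that has been true for a full five minutes.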
Impact #
The impact of this alert is moderate to high. If left unattended, high consumption rates can cause:
- Performance degradation
- Increased latency
- Errors and failures in message processing
- Increased resource utilization, leading to potential resource exhaustion
Diagnosis #
To diagnose the issue, follow these steps:
- Check the JetStream account statistics using the gnatsd_varz_jetstream_stats_accounts metric to identify the specific account(s) with excessive consumption.
- Investigate the applications or services consuming messages from JetStream to determine the root cause of the increased consumption.
- Review the JetStream configuration to ensure that it is correctly sized for the expected workload.
- Analyze the system logs to identify any errors or issues related to message processing.
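The first diagnosis step can be sketched as a small script that runs an instant query for the metric through the Prometheus HTTP API and ranks the returned series by value. This is a sketch under assumptions: the base URL you pass in, the choice of the `instance` label for grouping, and the helper names are illustrative, and the labels actually present depend on your exporter and scrape configuration.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Tuple

METRIC = "gnatsd_varz_jetstream_stats_accounts"


def fetch_instant_vector(base_url: str, query: str = METRIC) -> Dict[str, Any]:
    """Run an instant query against the Prometheus HTTP API and return the decoded JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def rank_series(payload: Dict[str, Any], label: str = "instance") -> List[Tuple[str, float]]:
    """Return (label value, sample value) pairs sorted from largest to smallest."""
    rows = []
    for series in payload["data"]["result"]:
        name = series["metric"].get(label, "<unknown>")
        # Each instant-vector sample is [unix_timestamp, "value as string"].
        rows.append((name, float(series["value"][1])))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A call such as `rank_series(fetch_instant_vector("http://localhost:9090"))`, where the address is a placeholder for your Prometheus server, lists the series contributing most to the sum first.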
Mitigation #
To mitigate the issue, follow these steps:
- Identify and address the root cause of the increased consumption, such as:
  - Optimizing application or service behavior to reduce message consumption rates.
  - Implementing message batching or buffering to reduce the load on JetStream.
  - Increasing the capacity of the JetStream cluster to handle the increased load.
- Implement rate limiting or quotas to prevent excessive consumption from individual applications or services.
- Monitor the system closely to ensure that the mitigation steps are effective and the consumption rates return to normal.
- Consider implementing automated scaling or alerting to prevent similar issues in the future.
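As one possible shape for the rate limiting step above, the following is a generic client-side token bucket in Python. It is a sketch, not part of NATS or any NATS client library: the `TokenBucket` class, its parameters, and the injectable clock are all assumptions for illustration, and an application would call `allow()` before each fetch or publish it wants to throttle.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` operations in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested deterministically
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if enough are available, else False."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller that receives `False` can wait, drop the operation, or back off. Which is appropriate depends on the application, so the sketch leaves that decision to the caller.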