NatsHighNumberOfSubscriptions #
NATS server has more than 1000 active subscriptions
Alert Rule
alert: NatsHighNumberOfSubscriptions
annotations:
  description: |-
    NATS server has more than 1000 active subscriptions
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/nats-exporter/natshighnumberofsubscriptions/
  summary: Nats high number of subscriptions (instance {{ $labels.instance }})
expr: gnatsd_connz_subscriptions > 1000
for: 5m
labels:
  severity: warning
Meaning #
The “NatsHighNumberOfSubscriptions” alert fires when the number of active subscriptions reported for a NATS server (the gnatsd_connz_subscriptions metric) stays above 1000 for at least 5 minutes. A count this high suggests the server is carrying a heavy load, which can lead to performance issues and increased latency.
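To confirm which instances currently satisfy the alert expression, the same query can be run against the Prometheus HTTP API. The following is a minimal sketch, not part of the original runbook: the Prometheus address (`http://localhost:9090`) and the helper names `firing_instances`, `query_prometheus`, and `main` are assumptions to adapt to your environment.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable at this address; change it for your setup.
PROMETHEUS_URL = "http://localhost:9090"
# Same expression as the alert rule.
ALERT_EXPR = "gnatsd_connz_subscriptions > 1000"


def firing_instances(response: dict) -> list[tuple[str, float]]:
    """Extract (instance, value) pairs from an instant-query response,
    sorted with the highest subscription count first."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    pairs = []
    for sample in response["data"]["result"]:
        instance = sample["metric"].get("instance", "<unknown>")
        _timestamp, value = sample["value"]  # the sample value arrives as a string
        pairs.append((instance, float(value)))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def query_prometheus(expr: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query through /api/v1/query and return the parsed JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def main() -> None:
    """Print every instance that currently matches the alert expression."""
    for instance, value in firing_instances(query_prometheus(ALERT_EXPR)):
        print(f"{instance}: {value:.0f} subscriptions")
```

Calling `main()` prints one line per instance above the threshold. An empty result means no instance currently matches the expression, although the alert may still be resolving.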
Impact #
A high number of subscriptions on a NATS server can have several impacts on the system:
- Increased latency: With a large number of subscriptions, the NATS server may take longer to process messages, leading to increased latency and slower response times.
- Decreased performance: The server may become overwhelmed, which can result in dropped messages or slow consumers being disconnected.
- Resource utilization: A high number of subscriptions can lead to increased memory and CPU usage, potentially causing resource shortages.
Diagnosis #
To diagnose the issue, follow these steps:
- Check the NATS server metrics: Verify that the gnatsd_connz_subscriptions metric is indeed above 1000.
- Identify the source of the subscriptions: Investigate which applications or services are creating the subscriptions and why.
- Review NATS server configuration: Check the NATS server configuration to ensure it is properly tuned for the current load.
- Monitor system resources: Verify that the NATS server has sufficient resources (CPU, memory, etc.) to handle the load.
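One way to identify the source of the subscriptions is to ask the NATS server's HTTP monitoring endpoint for its connection list and rank connections by subscription count. The sketch below assumes the monitoring port is enabled on its usual default (8222) at `localhost`. The helper names `top_subscribers`, `fetch_connz`, and `main` are illustrative, not part of any NATS tooling.

```python
import json
import urllib.request

# Assumption: NATS HTTP monitoring is enabled and reachable at this address.
NATS_MONITOR_URL = "http://localhost:8222"


def top_subscribers(connz: dict, limit: int = 10) -> list[dict]:
    """Return the `limit` connections with the most subscriptions,
    given a parsed /connz response."""
    connections = connz.get("connections") or []
    ranked = sorted(
        connections, key=lambda conn: conn.get("subscriptions", 0), reverse=True
    )
    return [
        {
            "cid": conn.get("cid"),
            "name": conn.get("name", ""),  # client name, if the client set one
            "ip": conn.get("ip", ""),
            "subscriptions": conn.get("subscriptions", 0),
        }
        for conn in ranked[:limit]
    ]


def fetch_connz(base_url: str = NATS_MONITOR_URL) -> dict:
    """Fetch connection information, sorted by subscription count server-side."""
    with urllib.request.urlopen(f"{base_url}/connz?sort=subs", timeout=10) as resp:
        return json.load(resp)


def main() -> None:
    """Print the connections holding the most subscriptions."""
    for conn in top_subscribers(fetch_connz()):
        label = conn["name"] or "<unnamed>"
        print(f"cid={conn['cid']} {label} ({conn['ip']}): {conn['subscriptions']}")
```

Calling `main()` lists the heaviest connections first. Client names only appear when applications set a connection name, so unnamed connections may have to be traced by IP address instead.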
Mitigation #
To mitigate the issue, follow these steps:
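- Identify and optimize subscription usage: Work with the application teams to reduce the number of subscriptions, for example by removing subscriptions that are no longer needed or by replacing many narrow subscriptions with a wildcard subscription.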
- Increase NATS server resources: Consider increasing the resources (CPU, memory, etc.) available to the NATS server to handle the load.
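- Implement subscription limits: Consider implementing subscription limits or quotas (for example, the NATS server's max_subscriptions setting, which caps the number of subscriptions per connection) to prevent a single application or service from consuming too many resources.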
- Monitor and adjust: Continuously monitor the NATS server metrics and adjust the configuration as needed to ensure the system remains stable and performant.
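When reviewing subscription usage with application teams, it can help to look for repeated subscriptions and for groups of sibling subjects on a single connection. The sketch below works on a plain list of subject strings, such as the `subscriptions_list` field the monitoring endpoint returns per connection for `/connz?subs=1`. The function names and the `min_siblings` threshold are hypothetical. Whether a wildcard is appropriate depends on the application, since a wildcard also delivers messages for sibling subjects that were not subscribed before.

```python
from collections import Counter


def duplicate_subjects(subjects: list[str]) -> dict[str, int]:
    """Return subjects that appear more than once, with their counts.
    Repeats on one connection can point to a subscription leak."""
    counts = Counter(subjects)
    return {subject: n for subject, n in counts.items() if n > 1}


def wildcard_candidates(subjects: list[str], min_siblings: int = 3) -> dict[str, int]:
    """Group distinct subjects by their parent prefix and report prefixes with
    at least `min_siblings` children as `prefix.*` consolidation candidates."""
    parents = Counter()
    for subject in set(subjects):
        tokens = subject.split(".")
        # Skip single-token subjects and subjects that already end in a wildcard.
        if len(tokens) < 2 or tokens[-1] in ("*", ">"):
            continue
        parents[".".join(tokens[:-1])] += 1
    return {f"{parent}.*": n for parent, n in parents.items() if n >= min_siblings}
```

Both functions are pure, so they can be run against saved monitoring output without touching the live server.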