NatsHighConnectionCount

High number of NATS connections ({{ $value }}) for {{ $labels.instance }}

Alert Rule

alert: NatsHighConnectionCount
annotations:
  description: |-
    High number of NATS connections ({{ $value }}) for {{ $labels.instance }}
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/nats-exporter/natshighconnectioncount/
  summary: Nats high connection count (instance {{ $labels.instance }})
expr: gnatsd_varz_connections > 100
for: 3m
labels:
  severity: warning


Meaning

The NatsHighConnectionCount alert fires when gnatsd_varz_connections, the number of connections reported by a NATS instance, stays above 100 for 3 minutes. Its severity is warning: the instance is still serving traffic, but the connection count may start to affect performance.
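
The interaction between the threshold and the `for: 3m` clause can be sketched in a few lines of Python. This is a simplified illustration rather than Prometheus code: the function name `first_firing_time` and the `(timestamp, value)` sample format are assumptions made for the example, with one sample per rule evaluation.

```python
from typing import Iterable, Optional, Tuple


def first_firing_time(
    samples: Iterable[Tuple[float, float]],
    threshold: float = 100.0,
    hold_seconds: float = 180.0,
) -> Optional[float]:
    """Return the evaluation timestamp at which the alert would fire, or None.

    samples: (unix_timestamp, connection_count) pairs, one per rule evaluation,
    in chronological order. This format is an assumption for the sketch.
    """
    pending_since: Optional[float] = None
    for ts, value in samples:
        if value > threshold:
            # The condition holds: start (or continue) the pending period.
            if pending_since is None:
                pending_since = ts
            # Fire once the condition has held for the whole hold duration.
            if ts - pending_since >= hold_seconds:
                return ts
        else:
            # Any evaluation at or below the threshold resets the pending period.
            pending_since = None
    return None
```

A short spike above 100 connections that clears within three minutes therefore never fires the alert; only a sustained excess does.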

Impact

A high number of connections to a NATS instance can lead to:

Increased memory usage, potentially causing performance degradation or even crashes.
Slowdowns in message processing, leading to delayed or lost messages.
Increased latency and decreased overall system responsiveness.

Diagnosis

To diagnose the root cause of the high connection count, perform the following steps:

Check the NATS instance’s configuration to ensure it is properly tuned for the expected load.
Investigate the applications or services connected to the NATS instance to identify any issues or misconfigurations.
Review the NATS instance’s logs to identify any errors or warnings related to connection handling.
Use the gnatsd_varz_connections metric to track the connection count over time and look for patterns, such as steady growth (a possible connection leak) or spikes that coincide with deployments.
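
To see which applications hold the connections, one option is to read the NATS monitoring endpoint `/connz` and count connections per client. The sketch below assumes that HTTP monitoring is enabled and reachable at `http://localhost:8222` (the conventional monitoring port; adjust it for your deployment). The helper names `fetch_connz` and `connections_by_client` are made up for this example.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_connz(base_url: str = "http://localhost:8222", limit: int = 1024) -> dict:
    """Fetch one page of connection details from the NATS monitoring endpoint.

    base_url is an assumption: monitoring must be enabled on the server.
    """
    with urlopen(f"{base_url}/connz?limit={limit}", timeout=5) as resp:
        return json.load(resp)


def connections_by_client(connz: dict) -> list:
    """Count connections per (client name, client IP), most frequent first."""
    counts = Counter()
    for conn in connz.get("connections", []):
        # Clients that did not set a connection name are grouped together.
        name = conn.get("name") or "(unnamed)"
        counts[(name, conn.get("ip", "?"))] += 1
    return counts.most_common()
```

Calling `connections_by_client(fetch_connz())` on a host that can reach the monitoring port gives a ranked list of clients. A single client name or IP with an unexpectedly large count is a good place to start looking for a connection leak.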

Mitigation

To mitigate the high connection count, perform the following steps:

Identify and disconnect any unnecessary or idle connections to the NATS instance.
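If the connection count reflects legitimate load, tune the NATS instance’s configuration for it, for example by reviewing the max_connections limit.
Have each application reuse a long-lived connection (or a small connection pool) instead of opening a new connection per request or per worker.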
Consider increasing the resources allocated to the NATS instance, such as increasing the instance size or adding more nodes to the cluster.
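
The first step above, finding idle connections, can be sketched against the same `/connz` monitoring data. This example assumes that each connection entry carries an `idle` field formatted as a compact duration such as `1h2m3s` (units `y`, `d`, `h`, `m`, `s`); verify this against the output of your server version. Entries without a parseable value are skipped. The names `parse_idle` and `idle_connections` are hypothetical.

```python
import re

# Seconds per unit for durations such as "1h2m3s" (assumed format).
_UNIT_SECONDS = {"y": 365 * 86400, "d": 86400, "h": 3600, "m": 60, "s": 1}
_PART = re.compile(r"(\d+)([ydhms])")


def parse_idle(text: str) -> int:
    """Convert a duration such as '1h2m3s' to seconds.

    Raises ValueError if the text does not match the assumed format.
    """
    parts = _PART.findall(text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def idle_connections(connz: dict, min_idle_seconds: int = 3600) -> list:
    """Return (cid, name, idle_seconds) for connections idle at least
    min_idle_seconds, longest idle first."""
    result = []
    for conn in connz.get("connections", []):
        try:
            seconds = parse_idle(conn.get("idle") or "")
        except ValueError:
            continue  # Skip entries whose idle value is missing or unparseable.
        if seconds >= min_idle_seconds:
            result.append((conn.get("cid"), conn.get("name") or "(unnamed)", seconds))
    return sorted(result, key=lambda row: row[2], reverse=True)
```

The result is only a list of candidates. Whether a connection is safe to close depends on the owning application, so confirm with its owners before disconnecting anything.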

After each change, monitor the NATS instance’s connection count and confirm that it drops back below the alert threshold. If it does not, adjust the mitigation steps.