NatsHighPendingBytes
High number of NATS pending bytes
Alert Rule
```yaml
alert: NatsHighPendingBytes
annotations:
  description: |-
    High number of NATS pending bytes ({{ $value }}) for {{ $labels.instance }}
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/nats-exporter/natshighpendingbytes/
  summary: Nats high pending bytes (instance {{ $labels.instance }})
expr: gnatsd_connz_pending_bytes > 100000
for: 3m
labels:
  severity: warning
```
```yaml
alert: NatsHighPendingBytes
annotations:
  description: |-
    NATS server has more than 100,000 pending bytes
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/nats-exporter/natshighpendingbytes/
  summary: Nats high pending bytes (instance {{ $labels.instance }})
expr: gnatsd_connz_pending_bytes > 100000
for: 5m
labels:
  severity: warning
```
Meaning
This alert fires when gnatsd_connz_pending_bytes stays above 100,000 bytes (roughly 100 KB) for 3 minutes (for: 3m). Pending bytes are data the NATS server has queued for delivery to clients but has not yet written to their connections, so a sustained high value points to clients that are not keeping up or to a bottleneck in the delivery path.
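To make the timing concrete, here is a simplified Python model of how the for clause gates the alert: the expression must stay true across consecutive evaluations, and any dip below the threshold resets the timer. This is a sketch, not Prometheus's implementation; it assumes one (timestamp, value) sample per rule evaluation and ignores staleness handling and evaluation jitter. The function name `firing_time` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

THRESHOLD_BYTES = 100_000  # same threshold as the rule's expr
FOR_SECONDS = 3 * 60       # same hold time as the rule's `for: 3m`


def firing_time(samples: Sequence[Tuple[float, float]],
                threshold: float = THRESHOLD_BYTES,
                hold: float = FOR_SECONDS) -> Optional[float]:
    """Return the first evaluation timestamp at which the alert would fire, or None.

    `samples` are (unix_seconds, pending_bytes) pairs in time order, one per
    rule evaluation (simplified model of Prometheus's pending -> firing logic).
    """
    breach_start: Optional[float] = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts      # condition first true: alert is "pending"
            if ts - breach_start >= hold:
                return ts              # held for the full duration: "firing"
        else:
            breach_start = None        # any dip below the threshold resets the timer
    return None
```

With evaluations every 60 seconds and the value constantly above 100,000, the alert fires at the evaluation 180 seconds after the first breach; a single sample below the threshold restarts the 3-minute wait.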
Impact
- High pending bytes delay message delivery, affecting downstream systems and applications that rely on NATS for real-time data.
- Sustained high pending bytes increase the server's memory usage. If a client's pending data reaches the server's max_pending limit, or writes to it exceed the write deadline, the server treats it as a slow consumer and closes the connection, and core NATS messages still buffered for that client are lost.
- A high value may indicate a slow or unresponsive client, insufficient server resources, or network congestion.
Diagnosis
Check the Alert Details:
- Review the alert annotations to identify the affected instance and the current pending-bytes value.
- Example:
  - Instance: {{ $labels.instance }}
  - Pending Bytes: {{ $value }}
Inspect Client Connections:
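- Query the NATS server's HTTP monitoring endpoint (/connz, commonly exposed on port 8222 when monitoring is enabled) or another monitoring tool to view per-connection statistics, including pending_bytes.
- Identify the clients with the highest pending bytes and check whether they are slow or unresponsive.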
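The connection check can be scripted. The sketch below assumes the server was started with HTTP monitoring enabled (for example -m 8222) and is reachable at the hypothetical address in `MONITOR_URL`; it requests /connz sorted by pending data and keeps connections above the alert threshold. `fetch_connz` and `top_pending` are illustrative helper names, not part of any NATS library.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Hypothetical monitoring address; replace with your server's host and monitor port.
MONITOR_URL = "http://localhost:8222"


def fetch_connz(base_url: str = MONITOR_URL, limit: int = 10) -> Dict[str, Any]:
    """Fetch /connz, asking the server to sort connections by pending data."""
    url = f"{base_url}/connz?sort=pending&limit={limit}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def top_pending(connz: Dict[str, Any], min_bytes: int = 100_000) -> List[Dict[str, Any]]:
    """Return connections whose pending_bytes exceed min_bytes, largest first."""
    conns = connz.get("connections") or []
    hot = [c for c in conns if c.get("pending_bytes", 0) > min_bytes]
    # Re-sort locally so the result does not depend on the server's sort order.
    return sorted(hot, key=lambda c: c.get("pending_bytes", 0), reverse=True)
```

Usage: `for c in top_pending(fetch_connz()): print(c.get("cid"), c.get("name"), c.get("ip"), c.get("pending_bytes"))`. The cid, name, and ip fields help map a hot connection back to the client application whose logs you check next.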
Review Resource Utilization:
- Check the NATS server's CPU, memory, and network usage to confirm it has enough headroom; the /varz monitoring endpoint reports cpu, mem, and a running slow_consumers count.
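A minimal sketch for pulling the relevant server-wide figures from /varz, again assuming monitoring is enabled at the hypothetical `MONITOR_URL`. A slow_consumers count that keeps rising alongside this alert suggests the server is already disconnecting clients that fall behind. `fetch_varz` and `resource_summary` are illustrative names.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical monitoring address; replace with your server's host and monitor port.
MONITOR_URL = "http://localhost:8222"


def fetch_varz(base_url: str = MONITOR_URL) -> Dict[str, Any]:
    """Fetch the server-wide /varz statistics as a dict."""
    with urllib.request.urlopen(f"{base_url}/varz", timeout=5) as resp:
        return json.load(resp)


def resource_summary(varz: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the /varz fields most relevant to a pending-bytes investigation."""
    return {
        "cpu_percent": varz.get("cpu"),
        "mem_mib": round(varz.get("mem", 0) / (1024 * 1024), 1),  # mem is reported in bytes
        "connections": varz.get("connections"),
        "slow_consumers": varz.get("slow_consumers"),
        "out_bytes": varz.get("out_bytes"),
    }
```

Usage: `print(resource_summary(fetch_varz()))`, repeated a few times to see whether slow_consumers and memory are trending upward.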
Check Application Logs:
- Look for errors or warnings in the logs of applications connected to NATS.
- Identify any issues with message processing or acknowledgment.
Network Analysis:
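- Measure latency, packet loss, and throughput between the NATS server and its clients to locate bottlenecks; the rtt field in /connz output shows the server's measured round-trip time for each connection.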
Mitigation
Address Client Issues:
- Restart or fix any unresponsive or slow clients causing the high pending bytes.
- Optimize client applications to process messages more efficiently.
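One common way to make a consumer process messages faster is to keep the message handler short and move slow work onto a pool of concurrent workers. The asyncio sketch below illustrates the pattern with a simulated workload rather than a real NATS client; `handle`, the worker count, and the queue size are hypothetical and should be tuned to your workload.

```python
import asyncio
from typing import List


async def handle(msg: bytes, results: List[bytes]) -> None:
    """Hypothetical per-message work (e.g. a database write), simulated with a sleep."""
    await asyncio.sleep(0.01)
    results.append(msg)


async def run(messages: List[bytes], workers: int = 4) -> List[bytes]:
    # Bounded queue: the receiving side waits when workers fall behind,
    # instead of buffering without limit in client memory.
    queue: "asyncio.Queue[bytes]" = asyncio.Queue(maxsize=1000)
    results: List[bytes] = []

    async def worker() -> None:
        while True:
            msg = await queue.get()
            try:
                await handle(msg, results)
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    # In a real subscriber, the message callback would only enqueue the payload here.
    for m in messages:
        await queue.put(m)
    await queue.join()  # wait until every queued message has been processed
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


if __name__ == "__main__":
    out = asyncio.run(run([f"m{i}".encode() for i in range(20)]))
    print(len(out))
```

Concurrent workers do not preserve message order, so this pattern only fits subjects where ordering does not matter.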
Scale NATS Infrastructure:
- Add more NATS server instances or increase resources (CPU, memory) for existing servers.
- Consider running NATS as a cluster so client connections and traffic are spread across several servers.
Optimize Message Flow:
- Reduce the volume of messages published to NATS if unnecessary or redundant data is being sent.
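- Adjust NATS server and client settings to better handle the message load, such as the server's max_pending and write_deadline options and client-side pending limits. Raising these limits absorbs short bursts but does not fix a consumer that is persistently slower than its publishers.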
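Whether to cut publish volume or raise buffer limits comes down to simple rate arithmetic: a client's pending data grows at the publish rate minus the rate the client drains it. The sketch below assumes constant rates and uses hypothetical numbers; plug in the limit from your own configuration (nats-server's default max_pending is commonly 64 MB, but verify it for your version).

```python
from typing import Optional


def seconds_until(current_bytes: float, limit_bytes: float,
                  inflow_bps: float, outflow_bps: float) -> Optional[float]:
    """Seconds until a client's pending data reaches limit_bytes,
    or None if the backlog is not growing (constant-rate model)."""
    growth = inflow_bps - outflow_bps
    if growth <= 0:
        return None
    return max(0.0, (limit_bytes - current_bytes) / growth)


def seconds_to_drain(current_bytes: float, inflow_bps: float,
                     outflow_bps: float) -> Optional[float]:
    """Seconds to empty the backlog, or None if the client cannot keep up."""
    net = outflow_bps - inflow_bps
    if net <= 0:
        return None
    return current_bytes / net
```

For example, a client holding 100,000 pending bytes that receives 30,000 B/s and drains 40,000 B/s clears its backlog in 10 seconds; if the inflow were 50,000 B/s instead, it would reach a 1,100,000-byte limit in 100 seconds. Raising the limit only stretches that second figure, while reducing inflow changes which case you are in.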
Network Improvements:
- Investigate and resolve any network-related issues, such as high latency or packet loss, between the NATS server and clients.
Clear Message Backlog:
- If it is safe to do so, purge undelivered messages on non-critical subjects or JetStream streams (for example with nats stream purge) to reduce the backlog.
Preventative Measures
- Implement proactive monitoring for NATS metrics such as pending bytes, message rates, and resource utilization.
- Establish alert thresholds and response plans tailored to your system’s load patterns.
- Regularly test and validate client and server configurations for optimal performance.
- Ensure adequate resource allocation for NATS servers and clients based on anticipated workloads.
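One way to tailor the 100,000-byte threshold to your own load pattern is to base it on historical peaks. The sketch below computes a nearest-rank percentile of past gnatsd_connz_pending_bytes samples and adds headroom; the 99th percentile and the 1.5x factor are hypothetical choices, not recommendations from this runbook.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def suggest_threshold(history: Sequence[float], pct: float = 99.0,
                      headroom: float = 1.5) -> float:
    """Hypothetical heuristic: alert somewhat above the normal peak of pending bytes."""
    return percentile(history, pct) * headroom
```

Feeding it a representative window (for example several days of samples exported from Prometheus) keeps the alert above routine bursts while still catching sustained growth.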