PulsarSubscriptionHighNumberOfBacklogEntries #
The number of subscription backlog entries is over 5k
Alert Rule
alert: PulsarSubscriptionHighNumberOfBacklogEntries
annotations:
  description: |-
    The number of subscription backlog entries is over 5k
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/pulsar-internal/pulsarsubscriptionhighnumberofbacklogentries/
  summary: Pulsar subscription high number of backlog entries (instance {{ $labels.instance }})
expr: sum(pulsar_subscription_back_log) by (subscription) > 5000
for: 1h
labels:
  severity: warning
Meaning #
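The PulsarSubscriptionHighNumberOfBacklogEntries alert fires when the total backlog of a subscription, summed over every pulsar_subscription_back_log series that carries the same subscription label, stays above 5000 entries for one hour. It indicates that the consumers on that subscription are not keeping up with the producers, which causes processing latency and, where message TTL or backlog quotas apply, can end in message loss. Because the expression aggregates by subscription, subscription is the only label left on the alert: {{ $labels.instance }} in the summary renders empty, and subscriptions that share a name across topics or namespaces are counted together.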
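The aggregation in the alert expression can be illustrated with a small Python sketch. It only mirrors the `sum(...) by (subscription) > 5000` part of the rule (not the `for: 1h` condition), and the topic names, subscription names, and values are hypothetical sample data, not output from a real cluster.

```python
from collections import defaultdict

THRESHOLD = 5000  # taken from the alert expression


def backlog_by_subscription(samples):
    """Sum backlog per subscription name, like `sum(...) by (subscription)`."""
    totals = defaultdict(int)
    for labels, value in samples:
        totals[labels["subscription"]] += value
    return dict(totals)


def over_threshold(samples, threshold=THRESHOLD):
    """Return the subscriptions whose summed backlog exceeds the threshold."""
    totals = backlog_by_subscription(samples)
    return {sub: total for sub, total in totals.items() if total > threshold}


# Hypothetical samples: (labels, backlog entries) per topic/subscription series.
samples = [
    ({"topic": "persistent://public/default/orders", "subscription": "billing"}, 3000),
    ({"topic": "persistent://public/default/refunds", "subscription": "billing"}, 2500),
    ({"topic": "persistent://public/default/orders", "subscription": "audit"}, 4000),
]

print(over_threshold(samples))  # {'billing': 5500}
```

Neither "billing" series exceeds 5000 on its own, but their sum does, so a subscription name reused across topics can trip the alert even when each topic looks healthy.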
Impact #
A persistently large backlog can:
- Delay message processing, so downstream systems act on stale data
- Increase storage usage on the bookies, because unacknowledged messages are retained until they are consumed, expired, or evicted
- Result in message loss if message TTL or a backlog quota with an eviction policy discards unconsumed messages, or in duplicate processing if stalled consumers restart and unacknowledged messages are redelivered
- Reduce the reliability and availability of the Pulsar cluster for other workloads that share the same brokers and bookies
Diagnosis #
To diagnose the cause of this alert, follow these steps:
- Identify the affected subscription from the alert's subscription label, then find which topics contribute most of its backlog.
- Check the subscription configuration: the subscription type, the number of connected consumers, and whether that setup can handle the expected message volume.
- Investigate the consumers: are they connected, are they acknowledging messages, and is their processing rate lower than the producers' publish rate? Also check whether a producer has started publishing far more than usual.
- Verify that the Pulsar brokers and bookies have sufficient resources (e.g., CPU, memory, and disk space) to serve the backlog.
- Check the Pulsar metrics for trends: when the backlog started growing, and whether it is still growing, flat, or slowly draining.
- Review the broker logs and the consumer application logs for errors or exceptions that coincide with the start of the backlog.
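The consumer checks above can be sketched in Python against the JSON printed by `pulsar-admin topics stats <topic>`. This is a minimal sketch, not a definitive tool: the sample document is hypothetical and heavily trimmed, the field names (`msgRateIn`, `subscriptions`, `msgBacklog`, `msgRateOut`, `unackedMessages`, `consumers`) should be verified against the output of your Pulsar version, and the `likely_cause` heuristics are illustrative assumptions.

```python
import json

# Hypothetical, heavily trimmed output of `pulsar-admin topics stats <topic>`.
SAMPLE_STATS = """
{
  "msgRateIn": 120.0,
  "subscriptions": {
    "audit": {"msgBacklog": 800, "msgRateOut": 118.0, "unackedMessages": 10,
              "consumers": [{"consumerName": "audit-1"}]},
    "billing": {"msgBacklog": 9200, "msgRateOut": 0.0, "unackedMessages": 0,
                "consumers": []}
  }
}
"""


def rank_subscriptions(stats):
    """Return one row per subscription, sorted by backlog, largest first."""
    rows = []
    for name, sub in stats.get("subscriptions", {}).items():
        rows.append({
            "subscription": name,
            "backlog": sub.get("msgBacklog", 0),
            "consumers": len(sub.get("consumers", [])),
            "rate_out": sub.get("msgRateOut", 0.0),
            "unacked": sub.get("unackedMessages", 0),
        })
    return sorted(rows, key=lambda r: r["backlog"], reverse=True)


def likely_cause(row, rate_in):
    """Illustrative heuristic only; confirm against logs and metrics."""
    if row["consumers"] == 0:
        return "no consumers connected"
    if row["rate_out"] == 0 and row["unacked"] > 0:
        return "dispatch stalled with unacknowledged messages outstanding"
    if row["rate_out"] < rate_in:
        return "consumers slower than producers"
    return "backlog should be draining"


stats = json.loads(SAMPLE_STATS)
for row in rank_subscriptions(stats):
    print(row["subscription"], row["backlog"], likely_cause(row, stats["msgRateIn"]))
```

With the sample data, "billing" is listed first with no consumers connected, which would point at the consumer application rather than the brokers.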
Mitigation #
To mitigate this alert, follow these steps:
- If the brokers or bookies are the bottleneck, increase their resources (e.g., CPU, memory, and disk space) or add nodes so the cluster can serve the backlog.
- Adjust the subscription and consumer configuration to raise throughput, for example by adding consumers where the subscription type allows parallel consumption.
- Reduce the volume of messages being processed where possible, for example through message consolidation or deduplication on the producer side.
- Investigate and resolve any issues with message producers or consumers that contribute to the backlog, such as stalled consumers, processing errors that prevent acknowledgement, or a producer publishing far more than usual.
- Monitor the subscription metrics and logs to confirm that the backlog is decreasing and the system is returning to a stable state.
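To judge whether a mitigation is working, it helps to estimate how long the backlog will take to drain. The sketch below assumes constant publish and dispatch rates, which real workloads rarely have, so treat the result as a rough guide; the function names and the sample numbers are hypothetical.

```python
def estimate_drain_seconds(backlog, rate_in, rate_out):
    """Seconds until the backlog reaches zero at constant rates.

    Returns None when consumers are not faster than producers,
    because the backlog then never drains.
    """
    if backlog <= 0:
        return 0.0
    net_rate = rate_out - rate_in  # messages per second removed from the backlog
    if net_rate <= 0:
        return None
    return backlog / net_rate


def is_draining(backlog_samples):
    """True if each successive backlog sample is lower than the previous one."""
    pairs = zip(backlog_samples, backlog_samples[1:])
    return len(backlog_samples) >= 2 and all(later < earlier for earlier, later in pairs)


# Hypothetical numbers: 12000 entries, 100 msg/s published, 150 msg/s consumed.
print(estimate_drain_seconds(12000, rate_in=100, rate_out=150))  # 240.0
print(is_draining([12000, 11000, 9500]))  # True
```

If `estimate_drain_seconds` returns None, adding consumer capacity or reducing the publish rate is needed before the backlog can shrink at all.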