PulsarSubscriptionVeryHighNumberOfBacklogEntries #
The number of subscription backlog entries is over 100k
Alert Rule
alert: PulsarSubscriptionVeryHighNumberOfBacklogEntries
annotations:
  description: |-
    The number of subscription backlog entries is over 100k
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/pulsar-internal/pulsarsubscriptionveryhighnumberofbacklogentries/
  summary: Pulsar subscription very high number of backlog entries (instance {{ $labels.instance }})
expr: sum(pulsar_subscription_back_log) by (subscription) > 100000
for: 1h
labels:
  severity: critical
Meaning #
This alert fires when the summed backlog (pulsar_subscription_back_log) for a subscription name stays above 100,000 entries for one hour. A backlog this large usually means the subscription's consumers are not keeping up with the incoming message rate, so unacknowledged messages are accumulating.
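As a rough illustration of what the rule's expression computes, the sketch below mimics `sum(pulsar_subscription_back_log) by (subscription) > 100000` over a few in-memory samples. The sample data and the function name `subscriptions_over_threshold` are hypothetical. Prometheus additionally requires the condition to hold for the whole `for: 1h` window, which this sketch does not model.

```python
from collections import defaultdict

THRESHOLD = 100_000  # mirrors the `> 100000` in the alert expression


def subscriptions_over_threshold(samples, threshold=THRESHOLD):
    """Sum backlog per `subscription` label and keep totals above threshold.

    `samples` is an iterable of (labels, value) pairs, one per time series.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Series without the label are grouped under an empty name.
        totals[labels.get("subscription", "")] += value
    return {sub: total for sub, total in totals.items() if total > threshold}


# Hypothetical samples: one subscription spread over two partitions.
samples = [
    ({"subscription": "orders-sub", "topic": "orders-partition-0"}, 70_000),
    ({"subscription": "orders-sub", "topic": "orders-partition-1"}, 45_000),
    ({"subscription": "audit-sub", "topic": "audit"}, 1_200),
]
print(subscriptions_over_threshold(samples))
```

Because the grouping uses only the subscription label, series from different topics or namespaces that share a subscription name are added together, so a firing alert may reflect several smaller backlogs rather than one large one.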
Impact #
A high backlog of unprocessed messages can cause:
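- Delays in processing messages, leading to stale data downstream and, if messages expire before they are consumed (for example through a message TTL or a backlog quota eviction policy), potential data loss
- Increased disk usage on the BookKeeper nodes that store the backlog and additional load on the Pulsar brokers, potentially leading to performance issues or even crashes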
- Decreased overall system performance and reliability
Diagnosis #
To diagnose the issue, follow these steps:
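- Check the broker logs and the consumer application logs, along with the topic statistics (for example from pulsar-admin topics stats), to identify the root cause of the backlog buildup.
- Verify that the subscription is properly configured and that its consumers are connected, receiving messages, and acknowledging them.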
- Check the message rate and size to identify if there are any unusual patterns or spikes.
- Verify that the Pulsar brokers have sufficient resources (e.g., memory, CPU) to handle the message load.
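Part of the checks above can be scripted. The sketch below assumes you have the JSON printed by `pulsar-admin topics stats` for the affected topic and summarizes each subscription's backlog, consumer count, and dispatch rate next to the topic's publish rate. The field names (`msgRateIn`, `subscriptions`, `msgBacklog`, `msgRateOut`, `consumers`, `type`) follow the topic stats output as commonly documented, so verify them against your Pulsar version. The helper name and the sample document are hypothetical.

```python
import json


def summarize_subscriptions(stats):
    """Return one summary row per subscription, largest backlog first."""
    publish_rate = stats.get("msgRateIn", 0.0)
    rows = []
    for name, sub in stats.get("subscriptions", {}).items():
        dispatch_rate = sub.get("msgRateOut", 0.0)
        rows.append({
            "subscription": name,
            "type": sub.get("type", "unknown"),
            "backlog": sub.get("msgBacklog", 0),
            "consumers": len(sub.get("consumers", [])),
            "dispatch_rate": dispatch_rate,
            # True when messages arrive faster than they are dispatched.
            "falling_behind": dispatch_rate < publish_rate,
        })
    return sorted(rows, key=lambda r: r["backlog"], reverse=True)


# Hypothetical, heavily trimmed stats document for illustration only.
sample_stats = json.loads("""
{
  "msgRateIn": 500.0,
  "subscriptions": {
    "orders-sub": {"type": "Shared", "msgBacklog": 150000, "msgRateOut": 120.0,
                   "consumers": [{"consumerName": "c1"}]},
    "audit-sub": {"type": "Exclusive", "msgBacklog": 10, "msgRateOut": 500.0,
                  "consumers": [{"consumerName": "a1"}]}
  }
}
""")

for row in summarize_subscriptions(sample_stats):
    print(row)
```

A subscription with zero consumers, or one whose dispatch rate sits well below the publish rate, is the first place to look.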
Mitigation #
To mitigate the issue, follow these steps:
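- Increase the number of consumers for the subscription to help process the backlog. This adds throughput only where the subscription type allows several consumers to receive messages at once (for example Shared or Key_Shared); an Exclusive or Failover subscription has a single active consumer per topic or partition.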
- Check for any message processing bottlenecks and optimize the processing pipeline as needed.
- Consider increasing the resources (e.g., memory, CPU) available to the Pulsar brokers to handle the message load.
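- Configure backlog quotas or a message TTL to bound backlog growth in the future. Retention policies alone do not help here, because they govern acknowledged messages rather than the unacknowledged backlog.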
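When deciding how many consumers to add, a back-of-the-envelope estimate can help. The sketch below assumes steady publish and consume rates and that throughput scales linearly with the number of consumers, which real workloads may not satisfy. The function names and all the numbers are hypothetical.

```python
import math


def estimate_drain_seconds(backlog, consume_rate, publish_rate):
    """Seconds until the backlog reaches zero, or math.inf if it never drains."""
    net_rate = consume_rate - publish_rate
    if net_rate <= 0:
        return math.inf  # consumers are not outpacing producers
    return backlog / net_rate


def consumers_needed(backlog, publish_rate, per_consumer_rate, target_seconds):
    """Smallest consumer count that drains the backlog within target_seconds."""
    # Consumers must absorb new traffic plus the backlog spread over the target.
    required_rate = publish_rate + backlog / target_seconds
    return math.ceil(required_rate / per_consumer_rate)


# 150,000 entries queued, producers publishing 500 msg/s.
print(estimate_drain_seconds(150_000, 120.0, 500.0))  # never drains
print(estimate_drain_seconds(150_000, 800.0, 500.0))  # drains in 500 seconds
print(consumers_needed(150_000, 500.0, 120.0, 1800))  # consumers for a 30 min target
```

If the estimate comes back as infinite, adding brokers or tuning retention will not clear the backlog; the consume rate itself has to rise above the publish rate.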
Note: This is just a sample runbook, and you may need to customize it to fit your specific use case and environment.