PulsarTopicLargeBacklogStorageSize #
The topic backlog storage size is over 5 GB
Alert Rule
alert: PulsarTopicLargeBacklogStorageSize
annotations:
  description: |-
    The topic backlog storage size is over 5 GB
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/pulsar-internal/pulsartopiclargebacklogstoragesize/
  summary: Pulsar topic large backlog storage size (instance {{ $labels.instance }})
expr: sum(pulsar_storage_size > 5*1024*1024*1024) by (topic)
for: 1h
labels:
  severity: warning
Here is a sample runbook for the Prometheus alert rule PulsarTopicLargeBacklogStorageSize:
Meaning #
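This alert fires when the pulsar_storage_size metric for a Pulsar topic stays above 5 GB (5*1024*1024*1024 bytes) for one hour. This usually indicates that consumers are not keeping up and unprocessed messages are accumulating on the topic, which can lead to performance issues, increased latency, and potentially data loss if unconsumed messages are discarded before they are processed.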
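The threshold logic in the alert expression can be sketched in Python. This is only an illustration of how the comparison and the aggregation by topic interact, not how Prometheus is implemented; the function name and the sample layout (a list of label/value pairs) are assumptions made for this example.

```python
# 5 GB as written in the alert expression: 5*1024*1024*1024 bytes.
THRESHOLD_BYTES = 5 * 1024 * 1024 * 1024


def topics_over_threshold(samples, threshold=THRESHOLD_BYTES):
    """Mimic sum(pulsar_storage_size > threshold) by (topic).

    samples is an iterable of (labels, value) pairs, where labels is a
    dict such as {"topic": "...", "instance": "..."} (hypothetical layout).
    The comparison filters individual series first; only the series that
    pass the filter are summed per topic.
    """
    totals = {}
    for labels, value in samples:
        if value > threshold:
            topic = labels.get("topic", "")
            totals[topic] = totals.get(topic, 0) + value
    return totals
```

Because the filter is applied before the sum, a topic whose storage is spread over several series that are each below 5 GB does not appear in the result, even if the series add up to more than 5 GB. The aggregation also keeps only the topic label, so other labels such as instance are not present on the resulting series.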
Impact #
The impact of this alert is moderate to high, as a large backlog of unprocessed messages can:
- Cause performance issues and increased latency for producers and consumers
- Lead to data loss if a backlog quota or message expiry policy discards unconsumed messages before they are processed
- Potentially cause other downstream systems to fail or become unstable
Diagnosis #
To diagnose the root cause of this alert, follow these steps:
- Identify the affected Pulsar topic using the topic label in the alert.
- Check the Pulsar topic’s configuration to ensure that it is properly sized for the expected message volume.
- Investigate the producer and consumer configurations to ensure that they are functioning correctly.
- Check the Pulsar cluster’s overall performance and resource utilization to ensure that it is not experiencing any bottlenecks.
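As a rough aid for the consumer investigation above, the following Python sketch ranks the subscriptions of a topic by backlog. It assumes the input is the JSON document returned by the topic stats call (for example from pulsar-admin topics stats), already parsed into a dict, with a subscriptions map whose entries carry msgBacklog and consumers fields; verify these field names against your Pulsar version. The function name is hypothetical.

```python
def rank_subscriptions_by_backlog(stats):
    """Return (subscription, msg_backlog, consumer_count) tuples, largest backlog first.

    stats is a parsed topic stats document. The field names
    "subscriptions", "msgBacklog" and "consumers" are assumptions
    about its layout; missing fields are treated as empty or zero.
    """
    rows = []
    for name, sub in stats.get("subscriptions", {}).items():
        backlog = sub.get("msgBacklog", 0)
        consumer_count = len(sub.get("consumers", []))
        rows.append((name, backlog, consumer_count))
    # The subscription with the largest backlog is the first place to look.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Hand-written sample data, not real cluster output.
    sample = {
        "subscriptions": {
            "billing": {"msgBacklog": 120, "consumers": [{}]},
            "audit": {"msgBacklog": 950000, "consumers": []},
        }
    }
    for name, backlog, consumers in rank_subscriptions_by_backlog(sample):
        print(name, backlog, consumers)
```

A subscription with a large backlog and no connected consumers often points to a stopped or abandoned consumer, while a large backlog with active consumers suggests that they are too slow for the incoming rate.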
Mitigation #
To mitigate this alert, follow these steps:
- Increase the storage capacity available to the affected Pulsar topic, for example by adding bookie disk space or raising the backlog quota, to accommodate the backlog of messages.
- Identify and address any issues with the producer or consumer configurations that may be contributing to the backlog.
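- Consider configuring a message time-to-live (TTL) or a backlog quota so that older unacknowledged messages are removed automatically. In Pulsar, retention policies apply only to messages that have already been acknowledged, so on their own they do not shrink a backlog.
- Monitor the Pulsar topic’s storage size and adjust these policies as needed to prevent future backlogs.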
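When deciding between fixing consumers and expiring messages, a back-of-the-envelope drain estimate can help. The sketch below assumes you already know the backlog size in bytes and the topic's incoming and outgoing throughput in bytes per second (for example from the topic stats or from your dashboards); the function and parameter names are hypothetical.

```python
def estimate_drain_seconds(backlog_bytes, in_bytes_per_s, out_bytes_per_s):
    """Estimate how long the backlog takes to drain at current rates.

    Returns None when consumers are not faster than producers, because
    in that case the backlog does not shrink at all.
    """
    net_drain_rate = out_bytes_per_s - in_bytes_per_s
    if net_drain_rate <= 0:
        return None
    return backlog_bytes / net_drain_rate


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    # Illustrative numbers only: 6 GiB backlog, 10 MiB/s in, 30 MiB/s out.
    print(estimate_drain_seconds(6 * GIB, 10 * MIB, 30 * MIB))
```

If the estimate is None or far longer than is acceptable, scaling out or fixing the consumers is unlikely to be enough on its own, and expiring or skipping old messages may need to be considered.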