PulsarLargeMessagePayload #
Observing large message payload (> 1MB)
Alert Rule
alert: PulsarLargeMessagePayload
annotations:
  description: |-
    Observing large message payload (> 1MB)
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/pulsar-internal/pulsarlargemessagepayload/
  summary: Pulsar large message payload (instance {{ $labels.instance }})
expr: sum(pulsar_entry_size_overflow > 0) by (topic)
for: 1h
labels:
  severity: warning
Meaning #
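This alert fires when, for a given topic, the sum of the pulsar_entry_size_overflow series with a value above 0 remains present for 1 hour (for: 1h). The overflow bucket covers entries larger than 1MB, so the alert indicates that large message payloads are being sent to that topic, which can cause performance issues and increased storage usage. Because the expression aggregates by topic, the resulting alert carries only the topic label, and {{ $labels.instance }} in the summary renders empty.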
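To make the expression concrete, the Python sketch below models what sum(pulsar_entry_size_overflow > 0) by (topic) returns at a single evaluation instant: series at or below 0 are filtered out, and the remaining values are summed per topic. It is only a model of the result, not of how Prometheus evaluates queries, and the topic names and label values are made up for illustration.

```python
from collections import defaultdict


def overflow_by_topic(samples):
    """Model `sum(metric > 0) by (topic)` for one evaluation instant.

    `samples` is a list of (labels, value) pairs, one pair per time series.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        if value > 0:  # the `> 0` filter drops series with no oversized entries
            # `sum ... by (topic)` keeps only the topic label and adds the values
            totals[labels["topic"]] += value
    return dict(totals)


# Hypothetical samples: two series share a topic label, one topic is clean.
samples = [
    ({"topic": "persistent://public/default/orders", "instance": "broker-0"}, 3.0),
    ({"topic": "persistent://public/default/orders", "instance": "broker-1"}, 2.0),
    ({"topic": "persistent://public/default/audit", "instance": "broker-0"}, 0.0),
]

print(overflow_by_topic(samples))
```

Only the orders topic appears in the result, and the instance label is gone from the output, which mirrors what the alert template receives.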
Impact #
- Performance degradation: Large message payloads can slow down Pulsar’s processing and lead to increased latency.
- Storage usage increase: Large messages can consume a significant amount of storage space, leading to increased storage costs and potentially causing storage capacity issues.
- Potential data loss: Messages that exceed the broker’s configured maximum message size are rejected at publish time, so a producer that does not handle the resulting error can silently drop data.
Diagnosis #
To diagnose the issue, follow these steps:
- Identify the topic: Check the topic label in the alert to identify which topic is experiencing the large message payloads.
- Check message sizes: Use Pulsar’s built-in metrics or a tool like pulsar-admin to check the message sizes for the affected topic.
- Investigate message content: Analyze the content of the large messages to determine if they are legitimate or if there is an issue with the message producer.
- Check producer configuration: Verify that the message producer is configured correctly and that there are no issues with the producer’s message formatting or serialization.
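The producer check can be narrowed down using the per-publisher section of the topic stats. The sketch below assumes the JSON printed by pulsar-admin topics stats for the affected topic has been captured as a string, and that it contains a publishers list whose entries have producerName, averageMsgSize, and msgRateIn fields. Field names can differ between Pulsar versions, so treat them as assumptions. The sample document, the helper names, and the 1 MiB constant are illustrative only.

```python
import json

ONE_MIB = 1024 * 1024  # assumed byte value for the "> 1MB" threshold


def rank_publishers(stats_json):
    """Return (producerName, averageMsgSize) pairs, largest average first."""
    stats = json.loads(stats_json)
    publishers = stats.get("publishers", [])
    return sorted(
        (
            (p.get("producerName", "<unknown>"), float(p.get("averageMsgSize", 0.0)))
            for p in publishers
        ),
        key=lambda item: item[1],
        reverse=True,
    )


def suspects(stats_json, threshold=ONE_MIB):
    """Names of publishers whose average message size exceeds the threshold."""
    return [name for name, size in rank_publishers(stats_json) if size > threshold]


# Illustrative stats document, not real pulsar-admin output.
sample = json.dumps({
    "averageMsgSize": 700000.0,
    "publishers": [
        {"producerName": "order-service", "averageMsgSize": 4200.0, "msgRateIn": 120.0},
        {"producerName": "billing-exporter", "averageMsgSize": 2300000.0, "msgRateIn": 0.5},
    ],
})

for name, size in rank_publishers(sample):
    print(f"{name}: {size / ONE_MIB:.2f} MiB average")
print("suspects:", suspects(sample))
```

Because averageMsgSize is an average, a producer that only occasionally sends an oversized message can stay below the threshold here; the ranking is a starting point for the investigation, not proof.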
Mitigation #
To mitigate the issue, follow these steps:
- Adjust producer configuration: If the large messages are due to incorrect producer configuration, adjust the configuration to limit message sizes or implement message compression.
- Optimize message serialization: If the large messages are due to inefficient serialization, optimize the serialization format to reduce message sizes.
- Increase storage capacity: If the large messages are legitimate, consider increasing Pulsar’s storage capacity to accommodate the larger message sizes.
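- Implement message deduplication: If the same large messages are being published repeatedly, enable message deduplication to cut the redundant volume. Deduplication does not shrink individual messages, so it reduces storage and throughput pressure but will not on its own bring an oversized payload under 1MB.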
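To gauge whether a more compact serialization or compression would bring a payload back under the threshold, a producer-side check along the following lines can help. This is a standard-library sketch: zlib stands in for whichever codec the Pulsar client is configured with, and the record shape, function names, and 1 MiB limit are assumptions made for illustration.

```python
import json
import zlib

ONE_MIB = 1024 * 1024  # assumed byte value for the "> 1MB" threshold


def size_report(record):
    """Serialized size in bytes for three encodings of the same record."""
    pretty = json.dumps(record, indent=2).encode("utf-8")
    compact = json.dumps(record, separators=(",", ":")).encode("utf-8")
    compressed = zlib.compress(compact)  # stand-in for the client's codec
    return {
        "pretty": len(pretty),
        "compact": len(compact),
        "compressed": len(compressed),
    }


def fits(size, limit=ONE_MIB):
    """True when a payload of `size` bytes stays at or under the limit."""
    return size <= limit


# Hypothetical payload: many small, repetitive events bundled into one message.
record = {
    "events": [
        {"id": i, "status": "processed", "source": "checkout"}
        for i in range(25000)
    ]
}

report = size_report(record)
for name, size in report.items():
    print(f"{name:>10}: {size:>9} bytes, fits under 1 MiB: {fits(size)}")
```

For repetitive payloads like this one, compression typically helps far more than dropping whitespace. This assumes the broker measures the entry as it arrives, that is, after client-side compression; if a payload is still too large, splitting it into several messages is the remaining producer-side option.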