LokiRequestPanic #
The {{ $labels.job }} is experiencing {{ printf “%.2f” $value }}% increase of panics
Alert Rule
alert: LokiRequestPanic
annotations:
description: |-
The {{ $labels.job }} is experiencing {{ printf "%.2f" $value }}% increase of panics
VALUE = {{ $value }}
LABELS = {{ $labels }}
runbook: https://srerun.github.io/prometheus-alerts/runbooks/loki-internal/lokirequestpanic/
summary: Loki request panic (instance {{ $labels.instance }})
expr: sum(increase(loki_panic_total[10m])) by (namespace, job) > 0
for: 5m
labels:
severity: critical
Here is a runbook for the LokiRequestPanic alert rule:
Meaning #
The LokiRequestPanic alert is triggered when there is a significant increase in panics within Loki requests within a 10-minute window, indicating a critical issue with the Loki service.
Impact #
The impact of this alert is high, as it can result in:
- Loki requests failing, leading to incomplete or missing log data
- Increased latency and errors in the logging pipeline
- Potential cascading failures in dependent systems
Diagnosis #
To diagnose the root cause of the LokiRequestPanic alert, follow these steps:
- Check the Loki logs for errors related to the panic
- Verify that the Loki service is running and configured correctly
- Investigate recent changes to the Loki configuration or deployment
- Review system metrics for resource utilization and potential bottlenecks
- Check for any signs of resource exhaustion or overload
Mitigation #
To mitigate the LokiRequestPanic alert, follow these steps:
- Roll back recent changes to the Loki configuration or deployment
- Increase resources (e.g. CPU, memory) allocated to the Loki service
- Implement rate limiting or queuing to handle high traffic volumes
- Restart the Loki service to clear out any stuck requests
- Consider implementing retry logic or circuit breakers to handle failed requests
Note: For more detailed steps and context-specific instructions, refer to the runbook URL provided in the alert annotation: https://github.com/srerun/prometheus-alerts/blob/main/content/runbooks/loki-internal/LokiRequestPanic.md