PrometheusTsdbWalTruncationsFailed #
Prometheus encountered {{ $value }} TSDB WAL truncation failures
Alert Rule
alert: PrometheusTsdbWalTruncationsFailed
annotations:
  description: |-
    Prometheus encountered {{ $value }} TSDB WAL truncation failures
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-self-monitoring-internal/prometheustsdbwaltruncationsfailed/
  summary: Prometheus TSDB WAL truncations failed (instance {{ $labels.instance }})
expr: increase(prometheus_tsdb_wal_truncations_failed_total[1m]) > 0
for: 0m
labels:
  severity: critical
Meaning #
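The PrometheusTsdbWalTruncationsFailed alert fires when Prometheus fails to truncate its Write-Ahead Log (WAL), that is, when the counter prometheus_tsdb_wal_truncations_failed_total has increased within the last minute. The WAL is a critical component of Prometheus’s storage system: it records incoming samples before they are compacted into blocks, and truncation removes WAL segments that are no longer needed. When truncation fails, old segments stay on disk, so the WAL keeps growing, restarts can take longer because more WAL has to be replayed, and the volume can eventually fill up, which can lead to data loss or inconsistencies.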
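To make the rule condition concrete, the following is a simplified sketch of how an increase over a counter can be computed for one evaluation window. It is an illustration only: the function names are hypothetical, and the real PromQL increase() also extrapolates to the window boundaries, which this sketch leaves out.

```python
from typing import Sequence


def simple_increase(samples: Sequence[float]) -> float:
    """Total growth of a counter across samples, treating any drop as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A counter only goes up; a lower value means the process restarted
        # and the counter began again from zero.
        total += cur if cur < prev else cur - prev
    return total


def alert_fires(samples: Sequence[float]) -> bool:
    """Simplified mirror of the rule condition `increase(...) > 0` for one window."""
    return simple_increase(samples) > 0


print(alert_fires([4, 4, 4, 4]))  # no new failures in the window
print(alert_fires([4, 4, 5, 5]))  # one new failure in the window
```

Because the rule uses for: 0m, a single failed truncation inside the window is enough to raise the alert; the absolute value of the counter does not matter, only whether it grew.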
Impact #
If this alert is triggered, it may indicate that Prometheus is experiencing issues with its storage system, which can lead to:
- Data loss or inconsistencies
- Inaccurate or incomplete metrics
- Performance degradation
- Increased risk of data corruption
Diagnosis #
To diagnose the issue, follow these steps:
- Check the Prometheus logs for errors related to WAL truncation.
- Verify that the disk space is sufficient and not running out.
- Check the disk I/O performance and verify that it’s within acceptable limits.
- Verify that the WAL directory is not corrupted and is writable.
- Check the Prometheus configuration to ensure that it’s correctly configured for WAL truncation.
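The disk space and WAL directory checks above can be sketched as a small script. This is a minimal sketch, not an official tool: the function name inspect_wal and the path /var/lib/prometheus are assumptions, and it relies on the usual layout in which the WAL lives in a wal/ subdirectory of the data directory, with segments stored as files with all-digit names.

```python
import os
import shutil
from pathlib import Path


def inspect_wal(data_dir: str) -> dict:
    """Collect basic facts about the WAL directory under a TSDB data directory."""
    wal_dir = Path(data_dir) / "wal"
    usage = shutil.disk_usage(data_dir)
    entries = list(wal_dir.iterdir()) if wal_dir.is_dir() else []
    # WAL segments are plain files with all-digit names; checkpoints are directories.
    segments = [p for p in entries if p.is_file() and p.name.isdigit()]
    return {
        "wal_exists": wal_dir.is_dir(),
        "wal_writable": wal_dir.is_dir() and os.access(wal_dir, os.W_OK | os.X_OK),
        "segment_count": len(segments),
        "wal_bytes": sum(p.stat().st_size for p in segments),
        "disk_free_fraction": usage.free / usage.total,
    }


if __name__ == "__main__":
    # Hypothetical path; use the value of --storage.tsdb.path for your instance.
    data_dir = "/var/lib/prometheus"
    if os.path.isdir(data_dir):
        for key, value in inspect_wal(data_dir).items():
            print(f"{key}: {value}")
```

Run it as the same user that runs Prometheus, since the writability check reflects the permissions of the calling user. A segment count or WAL size that keeps growing between runs is consistent with truncation not succeeding.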
Mitigation #
To mitigate the issue, follow these steps:
- Check the disk space and ensure that it’s sufficient.
- Restart the Prometheus instance to see if it resolves the issue.
- Verify that the WAL directory is not corrupted and is writable.
- Check the disk I/O performance and optimize it if necessary.
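- If the issue persists and the volume is short on space, consider lowering the retention settings (storage.tsdb.retention.time or storage.tsdb.retention.size; storage.tsdb.retention is the older, deprecated name) to free disk space. Retention applies to persisted blocks and does not change how often the WAL is truncated.
- Make sure WAL compression (storage.tsdb.wal-compression, a boolean flag) is enabled to reduce the WAL size.
- If none of the above steps resolve the issue, consider seeking assistance from a Prometheus administrator or a qualified engineer.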
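After applying a mitigation, one way to confirm that truncation failures have stopped is to query the counter through the Prometheus HTTP API. The sketch below assumes a server reachable at a base URL such as http://localhost:9090 and a 10-minute window; both are assumptions, as are the helper names. Only the /api/v1/query endpoint and its JSON response shape come from Prometheus itself.

```python
import json
import urllib.parse
import urllib.request

# Window length is an assumption; pick one that covers the time since the fix.
QUERY = "increase(prometheus_tsdb_wal_truncations_failed_total[10m])"


def build_query_url(base_url: str, query: str = QUERY) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": query})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def failing_instances(response: dict) -> dict:
    """Map instance label -> increase for series whose value is above zero."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response}")
    failing = {}
    for series in response["data"]["result"]:
        _timestamp, raw_value = series["value"]  # value is [unix_time, "string"]
        if float(raw_value) > 0:
            failing[series["metric"].get("instance", "unknown")] = float(raw_value)
    return failing


def check(base_url: str) -> dict:
    """Run the query against a live server (network access required)."""
    with urllib.request.urlopen(build_query_url(base_url), timeout=10) as resp:
        return failing_instances(json.load(resp))
```

Calling check("http://localhost:9090") returns an empty dictionary when no instance reported a failed truncation in the window; any entries name the instances that still need attention.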