PrometheusTsdbWalTruncationsFailed
Prometheus encountered {{ $value }} TSDB WAL truncation failures
Alert Rule
alert: PrometheusTsdbWalTruncationsFailed
annotations:
  description: |-
    Prometheus encountered {{ $value }} TSDB WAL truncation failures
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-self-monitoring-internal/prometheustsdbwaltruncationsfailed/
  summary: Prometheus TSDB WAL truncations failed (instance {{ $labels.instance }})
expr: increase(prometheus_tsdb_wal_truncations_failed_total[1m]) > 0
for: 0m
labels:
  severity: critical
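The rule fires as soon as the failure counter rises at all within a one-minute window. Because of "for: 0m", there is no pending period. The Python sketch below approximates that check on raw counter samples. It is a simplification that assumes samples arrive as (timestamp, value) pairs. Unlike PromQL's increase(), it does not extrapolate to the window edges. Like increase(), it treats a drop in the counter as a reset.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: List[Sample], window_end: float, window: float = 60.0) -> float:
    """Sum the counter increase over the window (window_end - window, window_end].

    Simplified: unlike PromQL's increase(), this does not extrapolate to the window edges.
    A drop in value is treated as a counter reset, so the post-reset value counts as new increase.
    """
    in_window = [v for t, v in sorted(samples) if window_end - window < t <= window_end]
    total = 0.0
    for prev, cur in zip(in_window, in_window[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def should_fire(samples: List[Sample], now: float) -> bool:
    # Approximates increase(prometheus_tsdb_wal_truncations_failed_total[1m]) > 0 with for: 0m.
    return counter_increase(samples, now) > 0


if __name__ == "__main__":
    healthy = [(0, 3), (15, 3), (30, 3), (45, 3), (60, 3)]
    failing = [(0, 3), (15, 3), (30, 4), (45, 4), (60, 4)]
    print(should_fire(healthy, 60), should_fire(failing, 60))  # False True
```

A single failed truncation is enough to fire the alert. This is why it is labelled critical even though Prometheus retries truncation on its own.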
Here is a runbook for the PrometheusTsdbWalTruncationsFailed alert rule:
Meaning
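The PrometheusTsdbWalTruncationsFailed alert fires when the prometheus_tsdb_wal_truncations_failed_total counter increases within a one-minute window. That increase means Prometheus failed to truncate its Write-Ahead Log (WAL). The WAL records incoming series and samples before they are compacted from the in-memory head block into persistent blocks. After each head compaction, Prometheus writes a checkpoint and deletes the WAL segments it no longer needs. When that truncation fails, old segments remain on disk. Prometheus tries again at the next truncation. However, repeated failures usually point to an underlying disk or filesystem problem, and that problem can also lead to data loss or inconsistencies.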
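To make the mechanism concrete, here is a toy Python model of a single truncation pass. It is a simplified reading of the TSDB design, not Prometheus's actual implementation. Segments up to a chosen index are folded into a checkpoint, and then those segments are deleted. The class ToyWal and its delete_ok switch are hypothetical. The switch simulates a failed deletion, which is the situation this alert reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyWal:
    """Hypothetical, simplified model of a WAL directory; not Prometheus's real code."""

    segments: List[int]          # numbered WAL segments currently on disk
    checkpoint: int = -1         # last segment index folded into a checkpoint (-1 = none yet)
    failed_truncations: int = 0  # stands in for prometheus_tsdb_wal_truncations_failed_total

    def truncate(self, last: int, delete_ok: bool = True) -> None:
        # Step 1: fold segments <= last into a checkpoint (assumed to succeed in this toy).
        self.checkpoint = max(self.checkpoint, last)
        # Step 2: delete the superseded segments; delete_ok=False simulates an I/O error.
        if not delete_ok:
            self.failed_truncations += 1
            return  # old segments stay on disk until a later truncation succeeds
        self.segments = [s for s in self.segments if s > last]


if __name__ == "__main__":
    wal = ToyWal(segments=list(range(6)))
    wal.truncate(last=3, delete_ok=False)
    print(wal.segments, wal.failed_truncations)  # [0, 1, 2, 3, 4, 5] 1
    wal.truncate(last=3)
    print(wal.segments, wal.failed_truncations)  # [4, 5] 1
```

In this model, a failure leaves superseded segments in place, and the next successful pass removes them. That is why the counter, not the WAL contents, is the main signal to watch.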
Impact
If this alert fires, Prometheus's storage layer may be unhealthy. This can lead to:
- Growth of the WAL on disk, because segments that should have been deleted are kept
- Data loss or inconsistencies, if the underlying disk or filesystem problem also affects writes
- Inaccurate or incomplete metrics
- Performance degradation
- Increased risk of data corruption
 
Diagnosis
To diagnose the issue, follow these steps:
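- Check the Prometheus logs for error-level messages about the WAL or segment truncation. Note the underlying error, for example a permission error or a full disk.
- Verify that the volume holding the TSDB data directory (set by --storage.tsdb.path) has enough free space.
- Check disk I/O latency and error rates on that volume, and check the kernel log for I/O or filesystem errors.
- Verify that the WAL directory (the wal subdirectory of the data directory) exists, is not corrupted, and is writable by the user Prometheus runs as.
- Review the storage flags Prometheus was started with, such as --storage.tsdb.path and the retention flags. WAL truncation itself is triggered automatically by head compaction rather than by a dedicated option.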
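The filesystem and log checks above can be scripted. The Python sketch below makes several assumptions. It assumes the default layout, in which the WAL lives in a wal subdirectory of the data directory and segments are files with numeric names. It assumes logs are in logfmt format. It should also be run as the same user as Prometheus, since os.access checks the current user. The 10% free-space threshold and the function names are hypothetical choices, not Prometheus settings.

```python
import os
import re
import shutil
import sys
from typing import Dict, Iterable, List


def wal_health(data_dir: str, min_free_ratio: float = 0.1) -> Dict[str, object]:
    """Report free space and WAL directory state under a Prometheus data directory.

    Assumes the WAL is <data_dir>/wal; min_free_ratio is an arbitrary example threshold.
    """
    wal_dir = os.path.join(data_dir, "wal")
    usage = shutil.disk_usage(data_dir)
    entries = os.listdir(wal_dir) if os.path.isdir(wal_dir) else []
    # Segment files are assumed to have purely numeric (zero-padded) names.
    segments = sorted(e for e in entries if e.isdigit())
    checkpoints = sorted(e for e in entries if e.startswith("checkpoint."))
    segment_bytes = sum(os.path.getsize(os.path.join(wal_dir, s)) for s in segments)
    free_ratio = usage.free / usage.total
    return {
        "wal_exists": os.path.isdir(wal_dir),
        "wal_writable": os.access(wal_dir, os.W_OK),  # checked for the *current* user
        "free_ratio": free_ratio,
        "low_disk": free_ratio < min_free_ratio,
        "segment_count": len(segments),
        "oldest_segment": segments[0] if segments else None,
        "checkpoints": checkpoints,
        "segment_bytes": segment_bytes,
    }


# Error-level lines mentioning the WAL or truncation; logfmt output assumed.
WAL_ERROR = re.compile(r"level=error.*(wal|truncat)", re.IGNORECASE)


def wal_error_lines(lines: Iterable[str]) -> List[str]:
    return [line.rstrip("\n") for line in lines if WAL_ERROR.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the directory configured with --storage.tsdb.path.
    for key, value in wal_health(sys.argv[1]).items():
        print(f"{key}: {value}")
```

A high segment_count, or an oldest_segment that never advances between runs, is consistent with truncations failing repeatedly. To filter logs, wal_error_lines can be fed the lines of an exported log file.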
 
Mitigation
To mitigate the issue, follow these steps:
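- Check the disk space on the volume holding the data directory, and free up space or expand the volume if it is low.
- Restart the Prometheus instance to see if the failures stop.
- Verify that the WAL directory is not corrupted and is writable by the Prometheus user.
- Check the disk I/O performance and optimize it if necessary.
- If disk space is the cause and the issue persists, reduce how much data Prometheus keeps. Lower --storage.tsdb.retention.time, or set a cap with --storage.tsdb.retention.size. Retention does not change how often the WAL is truncated, because truncation follows head compaction.
- Make sure WAL compression is enabled with the --storage.tsdb.wal-compression flag to reduce the WAL's size on disk. This is a boolean flag and is enabled by default in recent Prometheus releases.
- If none of the above steps resolve the issue, seek assistance from a Prometheus administrator or a qualified engineer.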
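After applying a fix, you can confirm that failures have stopped by querying the counter through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The Python sketch below does this. The base URL and the 3h window are assumptions. Truncation only runs after head compaction, so the window should be long enough to include at least one attempt. Comparing against prometheus_tsdb_wal_truncations_total over the same window helps confirm that an attempt actually happened.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

QUERY = "increase(prometheus_tsdb_wal_truncations_failed_total[3h])"  # window is an assumption


def build_query_url(base_url: str, query: str = QUERY) -> str:
    # Instant query endpoint of the Prometheus HTTP API.
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": query})


def failures_by_instance(response: Dict) -> Dict[str, float]:
    # Instant-vector results look like {"metric": {...}, "value": [<ts>, "<value as string>"]}.
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    return {
        r["metric"].get("instance", "<none>"): float(r["value"][1])
        for r in response["data"]["result"]
    }


def fetch(base_url: str, query: str = QUERY) -> Dict[str, float]:
    # Performs the HTTP request; requires network access to the Prometheus server.
    with urllib.request.urlopen(build_query_url(base_url, query), timeout=10) as resp:
        return failures_by_instance(json.load(resp))
```

For example, calling fetch("http://localhost:9090") against a hypothetical local server should return 0.0 for every instance once truncation has recovered.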