ThanosCompactHasNotRun
Thanos Compact {{$labels.job}} has not uploaded anything for 24 hours.
Alert Rule
alert: ThanosCompactHasNotRun
annotations:
  description: |-
    Thanos Compact {{$labels.job}} has not uploaded anything for 24 hours.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/thanos-compactor/thanoscompacthasnotrun/
  summary: Thanos Compact Has Not Run (instance {{ $labels.instance }})
expr: (time() - max by (job) (max_over_time(thanos_objstore_bucket_last_successful_upload_time{job=~".*thanos-compact.*"}[24h])))
  / 60 / 60 > 24
for: 0m
labels:
  severity: warning
Meaning
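The “ThanosCompactHasNotRun” alert fires when the most recent successful upload to the object store by a Thanos Compact job is more than 24 hours old. The compactor uploads a new block whenever it finishes compacting or downsampling, so a gap this long usually means that the compaction process is stalled, halted, or failing. Blocks already in the object store are not removed by this condition, but compaction, downsampling, and retention are no longer being applied to them.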
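The arithmetic behind the alert expression can be sketched in Python as follows. This is only an illustration of the rule's logic, not code from Thanos or Prometheus: the function names `hours_since_last_upload` and `should_fire` are hypothetical, and the timestamps are assumed to be Unix seconds, which is the unit the expression implies by subtracting the metric from time().

```python
import time

THRESHOLD_HOURS = 24  # matches the "> 24" in the alert expression


def hours_since_last_upload(last_upload_ts, now=None):
    """Mirror (time() - last_upload) / 60 / 60 from the rule."""
    if now is None:
        now = time.time()
    return (now - last_upload_ts) / 60 / 60


def should_fire(last_upload_ts, now=None):
    """True when the last successful upload is more than 24 hours old."""
    return hours_since_last_upload(last_upload_ts, now) > THRESHOLD_HOURS


if __name__ == "__main__":
    now = 1_700_000_000  # fixed example timestamp, not real data
    print(should_fire(now - 25 * 3600, now))  # upload 25 h ago -> True
    print(should_fire(now - 2 * 3600, now))   # upload 2 h ago -> False
```

Because the rule uses "for: 0m", the alert fires as soon as this comparison becomes true; there is no additional waiting period.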
Impact
If this alert is not addressed, it can lead to:
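- Blocks in the object store staying uncompacted and without downsampled copies
- Increased storage usage due to uncompacted data and, if retention is configured on the compactor, old data not being deleted
- Performance degradation of queries that read from the object store, because more and smaller blocks have to be scanned
- In prolonged cases, long-range queries becoming slow enough to time out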
Diagnosis
To diagnose the issue, follow these steps:
- Check the Thanos Compact instance logs for errors or exceptions related to the upload process.
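- Verify that the Thanos Compact instance is running and has not crashed, terminated abnormally, or halted (the thanos_compact_halted metric is set to 1 when the compactor has halted after an error).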
- Check the object store configuration and credentials to ensure they are correct and up-to-date.
- Verify that the network connectivity between the Thanos Compact instance and the object store is stable and functional.
- Check the system resources (CPU, memory, and disk space, including the compactor's local data directory) to ensure they are not overutilized or exhausted.
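The first diagnosis step, scanning the logs for errors, can be sketched as a small Python filter. It assumes that the compactor writes logfmt-style lines containing a level=... field, which is an assumption about the logging setup and needs adjusting if JSON logging is enabled. The helper name `find_problem_lines` and the sample lines are made up for illustration and are not real Thanos output.

```python
import re

# Matches a logfmt key such as level=error or level=warn (assumed log format).
LEVEL_RE = re.compile(r"\blevel=(\w+)")


def find_problem_lines(lines, levels=("error", "warn")):
    """Return the log lines whose level field is one of `levels`."""
    hits = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and match.group(1).lower() in levels:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Made-up sample lines, not real Thanos output.
    sample = [
        'level=info msg="start sync of metas"',
        'level=error msg="upload failed" err="access denied"',
        'level=warn msg="retrying"',
    ]
    for line in find_problem_lines(sample):
        print(line)
```

The function accepts any iterable of strings, so passing `sys.stdin` instead of the sample list would allow real log output to be piped into it.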
Mitigation
To mitigate the issue, follow these steps:
- Restart the Thanos Compact instance if it has crashed or halted, then confirm in the logs that compaction resumes.
- Correct the object store configuration and credentials if they are wrong or expired.
- Restore network connectivity between the Thanos Compact instance and the object store if it is unstable or blocked.
- Allocate additional resources (CPU, memory, disk space) if the compactor is running out of them.
- Trigger a compaction run manually (for example, by running the compactor once without the --wait flag) and confirm that the resulting blocks are uploaded to the object store.
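After mitigation, one way to confirm recovery is to check that the last-successful-upload timestamp has moved forward. The sketch below assumes two JSON responses from the Prometheus HTTP API instant-query endpoint (/api/v1/query) for max by (job) (thanos_objstore_bucket_last_successful_upload_time{job=~".*thanos-compact.*"}), one captured before and one after the fix. The function names and the sample payloads are hypothetical.

```python
import json


def upload_times_by_job(response_text):
    """Map job label -> last upload timestamp from an instant-query response."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("query did not succeed")
    times = {}
    for series in body["data"]["result"]:
        job = series["metric"].get("job", "")
        # An instant-vector sample is [evaluation_time, "value-as-string"].
        times[job] = float(series["value"][1])
    return times


def jobs_that_recovered(before_text, after_text):
    """Return jobs whose upload timestamp increased between two snapshots."""
    before = upload_times_by_job(before_text)
    after = upload_times_by_job(after_text)
    return sorted(job for job, ts in after.items() if ts > before.get(job, 0.0))


if __name__ == "__main__":
    # Made-up payloads in the instant-query response shape.
    before = ('{"status":"success","data":{"resultType":"vector","result":'
              '[{"metric":{"job":"thanos-compact"},"value":[1700000000,"1699900000"]}]}}')
    after = ('{"status":"success","data":{"resultType":"vector","result":'
             '[{"metric":{"job":"thanos-compact"},"value":[1700003600,"1700003000"]}]}}')
    print(jobs_that_recovered(before, after))
```

A job that does not appear in the returned list has not uploaded anything new between the two snapshots and may need further investigation.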
Note: If the issue persists after following these steps, it may be necessary to escalate the issue to a senior engineer or expert in Thanos Compact for further assistance.