ThanosReceiveHighForwardRequestFailures #
Thanos Receive {{$labels.job}} is failing to forward {{$value | humanize}}% of requests.
Alert Rule
alert: ThanosReceiveHighForwardRequestFailures
annotations:
  description: |-
    Thanos Receive {{$labels.job}} is failing to forward {{$value | humanize}}% of requests.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/thanos-receiver/thanosreceivehighforwardrequestfailures/
  summary: Thanos Receive High Forward Request Failures (instance {{ $labels.instance }})
expr: (sum by (job) (rate(thanos_receive_forward_requests_total{result="error", job=~".*thanos-receive.*"}[5m])) / sum by (job) (rate(thanos_receive_forward_requests_total{job=~".*thanos-receive.*"}[5m]))) * 100 > 20
for: 5m
labels:
  severity: info
Here is a runbook for the Prometheus alert rule ThanosReceiveHighForwardRequestFailures:
Meaning #
The ThanosReceiveHighForwardRequestFailures alert fires when more than 20% of the forward requests made by a Thanos Receive job fail, measured as a 5-minute rate, and the condition persists for 5 minutes. Thanos Receive forwards incoming remote-write requests to other receivers in its hashring, so a high failure rate means some written data may not reach the receivers responsible for it, which can lead to data loss or inconsistencies.
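The arithmetic behind the rule's expression can be sketched in a few lines of Python. This is only an illustration of the ratio and threshold, not part of Thanos or Prometheus; the function names and the sample rates are hypothetical.

```python
import math

THRESHOLD_PERCENT = 20.0  # from the rule: "* 100 > 20"


def forward_failure_percent(error_rate: float, total_rate: float) -> float:
    """Mirror the rule's ratio: error rate / total rate * 100.

    Returns NaN when there is no traffic, matching PromQL's 0/0 result.
    """
    if total_rate == 0:
        return math.nan
    return error_rate / total_rate * 100


def condition_holds(error_rate: float, total_rate: float) -> bool:
    """True when the failure percentage is above the rule's threshold."""
    # Any comparison with NaN is False, so an idle receiver never matches.
    return forward_failure_percent(error_rate, total_rate) > THRESHOLD_PERCENT


if __name__ == "__main__":
    # Hypothetical per-second rates for one job over the 5m window.
    print(condition_holds(error_rate=3.0, total_rate=10.0))  # 30% of requests fail
    print(condition_holds(error_rate=1.0, total_rate=10.0))  # 10% of requests fail
```

The alert itself only fires once this condition has held for the full `for: 5m` period, which the sketch does not model.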
Impact #
The impact of this alert is moderate to high, as failed forward requests can result in:
- Data loss or inconsistencies
- Delayed or incomplete data availability
- Increased latency or errors in dependent systems
Diagnosis #
To diagnose the issue, follow these steps:
- Check the Thanos Receive logs for errors or exceptions related to forwarding requests.
- Verify the Thanos Receive configuration, in particular the hashring configuration that lists the endpoints requests are forwarded to.
- Check the network connectivity and latency between Thanos Receive and the systems it forwards to, such as the other receivers in the hashring.
- Investigate any recent changes or updates to the Thanos Receive configuration or dependent systems.
- Review the Prometheus metrics to identify any patterns or trends in the failed forward requests.
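Because the alert aggregates by job, it does not show which replica is failing. One way to narrow this down is to run the same ratio grouped by `instance` through the Prometheus HTTP API. The sketch below assumes a Prometheus-compatible endpoint at `http://localhost:9090` and that the series carry an `instance` label; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus (or a compatible query endpoint) is reachable here.
PROMETHEUS_URL = "http://localhost:9090"

# Same job selector as the alert rule.
SELECTOR = 'job=~".*thanos-receive.*"'


def per_instance_failure_query(window: str = "5m") -> str:
    """Same ratio as the alert expression, grouped by instance instead of job."""
    metric = "thanos_receive_forward_requests_total"
    errors = f'sum by (instance) (rate({metric}{{result="error", {SELECTOR}}}[{window}]))'
    total = f"sum by (instance) (rate({metric}{{{SELECTOR}}}[{window}]))"
    return f"{errors} / {total} * 100"


def parse_vector(response: dict) -> dict:
    """Map instance -> failure percentage from an instant-query response."""
    results = response["data"]["result"]
    return {
        sample["metric"].get("instance", "<none>"): float(sample["value"][1])
        for sample in results
    }


def fetch_failure_percent_by_instance(base_url: str = PROMETHEUS_URL) -> dict:
    """Run the query through the Prometheus HTTP API (/api/v1/query)."""
    params = urllib.parse.urlencode({"query": per_instance_failure_query()})
    url = f"{base_url}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Calling `fetch_failure_percent_by_instance()` returns a dictionary keyed by instance; a single instance with a much higher percentage than the rest points at that replica or its network path rather than at the hashring as a whole.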
Mitigation #
To mitigate the issue, follow these steps:
- Correct any misconfiguration found during diagnosis, such as stale or unreachable endpoints in the forwarding configuration.
- Restart the Thanos Receive service so that it runs with the corrected configuration.
- Resolve any network connectivity or latency problems between Thanos Receive and the systems it forwards to.
- Implement temporary workarounds, such as increasing the retry count or timeout, to reduce the impact of failed forward requests.
- Investigate and address any underlying issues or bugs in Thanos Receive or dependent systems.
Additional resources: