ThanosStoreSeriesGateLatencyHigh #
Thanos Store {{$labels.job}} has a 99th percentile latency of {{$value}} seconds for store series gate requests.
Alert Rule
alert: ThanosStoreSeriesGateLatencyHigh
annotations:
  description: |-
    Thanos Store {{$labels.job}} has a 99th percentile latency of {{$value}} seconds for store series gate requests.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/thanos-store/thanosstoreseriesgatelatencyhigh/
  summary: Thanos Store Series Gate Latency High (instance {{ $labels.instance }})
expr: (histogram_quantile(0.99, sum by (job, le) (rate(thanos_bucket_store_series_gate_duration_seconds_bucket{job=~".*thanos-store.*"}[5m])))
  > 2 and sum by (job) (rate(thanos_bucket_store_series_gate_duration_seconds_count{job=~".*thanos-store.*"}[5m]))
  > 0)
for: 10m
labels:
  severity: warning
Meaning #
The ThanosStoreSeriesGateLatencyHigh alert fires when the 99th percentile of the store series gate duration in Thanos Store stays above 2 seconds for 10 minutes while the gate is receiving requests (the request rate over the last 5 minutes is greater than 0). The series gate limits how many Series requests a Thanos Store instance handles concurrently, and this metric measures how long requests wait at the gate before being admitted. A high value therefore indicates that Series requests are queuing for a free slot, which delays every query that depends on them.
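To see which side of the alert condition is driving it, the two halves of the expression can be evaluated separately in the Prometheus or Thanos Query UI, one query at a time. This is a sketch that reuses only the metric names and job matcher from the alert rule; adjust the matcher if your job labels differ.

```promql
# Left-hand side: p99 time spent waiting at the series gate, per job.
# The alert requires this to be above 2 (seconds).
histogram_quantile(0.99, sum by (job, le) (rate(thanos_bucket_store_series_gate_duration_seconds_bucket{job=~".*thanos-store.*"}[5m])))

# Right-hand side: rate of requests observed by the gate, per job.
# The alert requires this to be above 0, so idle stores do not fire.
sum by (job) (rate(thanos_bucket_store_series_gate_duration_seconds_count{job=~".*thanos-store.*"}[5m]))
```

Graphing the first query over a few hours shows whether the latency rose gradually (growing load) or jumped at a specific time (a deployment or a new expensive query).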
Impact #
High latency in store series gate requests can cause:
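- Increased response times for queries that read from the Thanos Store (the Thanos Store serves reads only; it does not accept writes)
- Reduced throughput and performance of dependent systems, such as Thanos Query and the dashboards that query through it
- Query timeouts, retries, or partial results (where partial responses are enabled) for queries that depend on the Thanos Store
- Increased load on the Thanos Store and its dependencies as retried requests queue behind the gate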
Diagnosis #
To diagnose the root cause of the high latency, follow these steps:
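- Check the Thanos Store logs for errors or warnings around the time the alert started, for example object storage failures or request timeouts
- Verify that the Thanos Store has sufficient resources (e.g., CPU, memory, disk space) and that its series concurrency limit (the --store.grpc.series-max-concurrency flag) is appropriate for the query load
- Investigate any recent changes or updates to the Thanos Store, its dependencies, or the query workload (e.g., new dashboards or rules that query long time ranges)
- Check the network connectivity and latency between the Thanos Store and its clients, and between the Thanos Store and object storage
- Analyze the performance metrics of the Thanos Store and its dependencies to identify bottlenecks, in particular whether the gate is saturated on all instances or only on some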
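The last step above can be sketched with the queries below. The first reuses the alert's histogram and adds the instance label, assuming that label is present on the scraped series. The second assumes your Thanos version exports the gate gauges thanos_bucket_store_series_gate_queries_in_flight and thanos_bucket_store_series_gate_queries_max; confirm both names on the store's /metrics endpoint before relying on them.

```promql
# p99 gate wait per instance: shows whether one store or all stores are slow.
histogram_quantile(0.99, sum by (job, instance, le) (rate(thanos_bucket_store_series_gate_duration_seconds_bucket{job=~".*thanos-store.*"}[5m])))

# Gate saturation per instance (assumed gauge names, see above).
# Values close to 1 mean every concurrency slot is in use and new
# Series requests have to wait.
max by (job, instance) (thanos_bucket_store_series_gate_queries_in_flight{job=~".*thanos-store.*"})
  /
max by (job, instance) (thanos_bucket_store_series_gate_queries_max{job=~".*thanos-store.*"})
```

If only one instance is saturated, look at how blocks are distributed across stores; if all of them are, the overall query load or the concurrency limit is the more likely cause.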
Mitigation #
To mitigate the high latency, follow these steps:
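- Increase the resources (e.g., CPU, memory, disk space) allocated to the Thanos Store
- Optimize the configuration of the Thanos Store: if instances have CPU and memory headroom, raise the series concurrency limit (--store.grpc.series-max-concurrency); raising it without headroom only moves the bottleneck
- Enable or enlarge caching (for example the index cache and bucket caching) so that each Series request completes faster and frees its gate slot sooner
- Reduce the load on each Thanos Store by scaling out or by sharding blocks across several instances
- Note that the gate already queues Series requests above the concurrency limit, so adding another queue in front of it will not reduce latency; instead, limit or reschedule expensive queries (e.g., long time ranges or high-cardinality selectors) that hold gate slots for a long time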
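After applying a mitigation, it helps to confirm that the gate wait actually dropped instead of waiting for the alert to resolve. A minimal sketch, using the _sum and _count series that accompany the histogram in the alert rule:

```promql
# Mean time spent waiting at the series gate, per job.
# Compare the value before and after the change; it should fall
# well below the 2 second alert threshold.
sum by (job) (rate(thanos_bucket_store_series_gate_duration_seconds_sum{job=~".*thanos-store.*"}[5m]))
  /
sum by (job) (rate(thanos_bucket_store_series_gate_duration_seconds_count{job=~".*thanos-store.*"}[5m]))
```

The mean reacts faster than the p99 to a change, but it can hide a slow tail, so the alert's own p99 expression remains the final check.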