ElasticsearchHighIndexingRate
The indexing rate on the Elasticsearch cluster is higher than the threshold.
Alert Rule
```yaml
alert: ElasticsearchHighIndexingRate
annotations:
  description: |-
    The indexing rate on Elasticsearch cluster is higher than the threshold.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-community-elasticsearch-exporter/elasticsearchhighindexingrate/
  summary: Elasticsearch High Indexing Rate (instance {{ $labels.instance }})
expr: sum(rate(elasticsearch_indices_indexing_index_total[1m])) > 10000
for: 5m
labels:
  severity: warning
```
Meaning
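The ElasticsearchHighIndexingRate alert fires when the combined indexing rate exceeds 10,000 indexing operations per second and stays above that level for 5 minutes. The threshold is per second, not per minute: `rate(...[1m])` returns a per-second average computed over a 1-minute window. `sum()` then adds the rates of every node series the exporter reports, so the rule evaluates one aggregate value for all scraped nodes (and for all clusters, if one Prometheus scrapes several). Because `sum()` drops every label, the summary's `{{ $labels.instance }}` has no value to show. A sustained rate this high means the cluster is under heavy indexing load, which can lead to performance issues, increased latency, and potential data loss.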
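To make the units concrete, the Python sketch below approximates what the rule computes from raw counter samples: a per-second increase over each node's 1-minute window, summed across nodes, then compared with the threshold. It is a simplification: Prometheus's real `rate()` also extrapolates to the window boundaries, so its value will differ slightly. The node names and sample values are made up for illustration.

```python
from typing import Dict, List, Tuple

# A sample is (unix_timestamp_seconds, counter_value).
Sample = Tuple[float, float]

THRESHOLD_PER_SECOND = 10_000  # the "> 10000" in the alert expression


def per_second_rate(samples: List[Sample]) -> float:
    """Approximate PromQL rate(): counter increase divided by elapsed seconds.

    A drop in the counter is treated as a reset (e.g. a node restart), so the
    post-reset value counts as fresh increase, as Prometheus does. Unlike
    Prometheus, this does not extrapolate to the window boundaries.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def cluster_rate(per_node: Dict[str, List[Sample]]) -> float:
    """Equivalent of sum(rate(...)): add up the per-node rates."""
    return sum(per_second_rate(s) for s in per_node.values())


if __name__ == "__main__":
    # Hypothetical 1-minute windows scraped every 15 s from two nodes.
    window = {
        "node-a": [(0, 1_000_000), (15, 1_090_000), (30, 1_180_000),
                   (45, 1_270_000), (60, 1_360_000)],
        "node-b": [(0, 500_000), (15, 590_000), (30, 680_000),
                   (45, 770_000), (60, 860_000)],
    }
    rate = cluster_rate(window)
    print(f"{rate:.0f} ops/s ({rate * 60:.0f} ops/min)")  # 12000 ops/s (720000 ops/min)
    print("above threshold:", rate > THRESHOLD_PER_SECOND)  # above threshold: True
```

With these numbers each node indexes 6,000 operations per second, below the threshold on its own, but the sum is 12,000 per second (720,000 per minute), so the expression is true; if it stays true for 5 minutes, the alert fires.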
Impact
- Performance degradation: High indexing rates can cause Elasticsearch nodes to slow down, leading to increased latency and decreased query performance.
- Resource utilization: High indexing rates can consume excessive CPU, memory, and disk resources, potentially causing node instability or even crashes.
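- Data inconsistencies: When indexing outpaces the cluster, the write thread pool queue fills, Elasticsearch rejects further requests with HTTP 429, and bulk requests can time out. Documents in rejected or timed-out requests are lost unless the client retries them, and a partially failed bulk request leaves only some of its documents indexed.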
Diagnosis
To diagnose the cause of the high indexing rate, follow these steps:
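- Check the cluster’s current indexing rate with `rate(elasticsearch_indices_indexing_index_total[5m])`. The metric is a per-node counter, so look at its rate rather than its raw value, and compare nodes to see whether the load is spread evenly.
- Identify the specific indices or types of data that are causing the high indexing rate. The node-level metric has no per-index breakdown; the index stats API (`GET /_stats/indexing`) reports `index_total` for each index.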
- Investigate recent changes to the indexing pipeline, such as new data sources or changed indexing patterns.
- Review Elasticsearch node logs for any errors or warnings related to indexing.
- Check the cluster’s resource utilization, such as CPU, memory, and disk usage.
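The node-level metric cannot say which indices receive the writes. One way to find them, sketched below in Python, is to take two snapshots of the index stats API (`GET /_stats/indexing`) a few seconds apart and rank indices by how fast `primaries.indexing.index_total` grows. The cluster URL, the lack of authentication, and the 10-second interval are assumptions; adjust them for your environment.

```python
import json
import time
import urllib.request
from typing import Dict, List, Tuple

# Assumption: an unauthenticated cluster at this address; add TLS/auth as needed.
ES_URL = "http://localhost:9200"


def fetch_index_totals(base_url: str = ES_URL) -> Dict[str, int]:
    """Return {index: primaries.indexing.index_total} from GET /_stats/indexing."""
    with urllib.request.urlopen(f"{base_url}/_stats/indexing", timeout=10) as resp:
        stats = json.load(resp)
    return {
        name: data["primaries"]["indexing"]["index_total"]
        for name, data in stats["indices"].items()
    }


def top_indexing_indices(
    before: Dict[str, int], after: Dict[str, int], seconds: float, n: int = 10
) -> List[Tuple[str, float]]:
    """Rank indices by indexing operations per second between two snapshots."""
    rates = []
    for name, total in after.items():
        # An index created between snapshots is treated as starting from zero.
        delta = total - before.get(name, 0)
        # Skip idle indices and negative deltas (e.g. an index recreated in between).
        if delta > 0:
            rates.append((name, delta / seconds))
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates[:n]


def main(interval: float = 10.0) -> None:
    """Take two snapshots `interval` seconds apart and print the busiest indices."""
    first = fetch_index_totals()
    time.sleep(interval)
    second = fetch_index_totals()
    for name, rate in top_indexing_indices(first, second, interval):
        print(f"{name:40s} {rate:10.1f} ops/s")
```

Call `main()` from a host that can reach the cluster. Using `primaries` rather than `total` counts each client write once; `total` also includes the writes applied to replica copies.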
Mitigation
To mitigate the impact of high indexing rates, follow these steps:
- Temporary mitigation: Reduce the indexing rate by pausing or slowing down the data ingestion pipeline.
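- Index optimization: Increase the dynamic per-index setting `index.refresh_interval` (default `1s`) so segments are refreshed less often, and, if heap allows, the node-level `indices.memory.index_buffer_size` (default 10% of the heap; a static setting in `elasticsearch.yml` that takes effect on restart). These changes make each indexing operation cheaper for the cluster; they do not lower the rate this alert measures.
- Add more resources: Scale up the Elasticsearch cluster by adding more nodes or increasing the resources allocated to existing nodes.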
- Data pipeline optimization: Optimize the data pipeline to reduce the volume of data being indexed or to distribute the indexing load more evenly.
- Long-term solution: Implement a more sustainable indexing strategy, such as using a message queue or batch processing to reduce the real-time indexing load.
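As a sketch of the index-optimization step, the Python below builds an update index settings request (`PUT /<index>/_settings`) that raises `index.refresh_interval` on one index, and can restore the default afterwards by sending `null`. The cluster URL, the absence of authentication, the index name `my-busy-index`, and the `30s` value are hypothetical; pick an interval your search freshness requirements can tolerate.

```python
import json
import urllib.request
from typing import Optional

# Assumption: an unauthenticated cluster reachable at this address.
ES_URL = "http://localhost:9200"


def refresh_interval_request(
    index: str, interval: Optional[str], base_url: str = ES_URL
) -> urllib.request.Request:
    """Build PUT /<index>/_settings; interval=None sends null, restoring the default."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def apply(request: urllib.request.Request) -> dict:
    """Send the request and return Elasticsearch's JSON reply."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


# Usage against a reachable cluster (hypothetical index name):
#   apply(refresh_interval_request("my-busy-index", "30s"))  # refresh every 30 s
#   apply(refresh_interval_request("my-busy-index", None))   # restore the default
```

Some bulk loads temporarily set the interval to `-1`, which disables periodic refresh entirely; if you do that, restore the setting once the load has passed.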