ElasticsearchHighIndexingRate

The indexing rate on the Elasticsearch cluster is higher than the threshold.

Alert Rule

```yaml
alert: ElasticsearchHighIndexingRate
annotations:
  description: |-
    The indexing rate on Elasticsearch cluster is higher than the threshold.
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-community-elasticsearch-exporter/elasticsearchhighindexingrate/
  summary: Elasticsearch High Indexing Rate (instance {{ $labels.instance }})
expr: sum(rate(elasticsearch_indices_indexing_index_total[1m])) > 10000
for: 5m
labels:
  severity: warning
```

Meaning

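The ElasticsearchHighIndexingRate alert fires when the indexing rate, summed across all series of elasticsearch_indices_indexing_index_total (the exporter reports one per node), stays above 10,000 for 5 minutes. Because rate() returns a per-second average, the threshold is 10,000 index operations per second (600,000 per minute), not per minute. Sustained load at this level can degrade performance and increase latency, and if the cluster starts rejecting writes that clients do not retry, documents can be lost. Note that sum() without a by clause drops every label, so the alert carries no instance label and the instance shown in the summary is empty; look at the per-node series of the metric to see where the load lands.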
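PromQL's rate() returns a per-second average over its range window, so the > 10000 comparison in the rule means 10,000 index operations per second. The sketch below is a simplified Python model of that calculation, assuming hypothetical (timestamp, value) samples of the counter: it corrects for counter resets the way Prometheus does but, unlike Prometheus, does not extrapolate to the edges of the window, so real rate() results can differ slightly.

```python
from typing import List, Tuple

THRESHOLD_PER_SECOND = 10_000  # the value compared against in the rule's expr


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a counter given (timestamp_seconds, value) samples.

    Simplified model of PromQL rate(): handles counter resets but does not
    extrapolate to the boundaries of the range window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. node restart), so count from zero.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical scrapes of one node's counter, 15 s apart over one minute.
samples = [(0, 1_000_000), (15, 1_180_000), (30, 1_360_000), (45, 1_540_000), (60, 1_720_000)]
per_second = simple_rate(samples)
print(per_second)                         # 12000.0
print(per_second > THRESHOLD_PER_SECOND)  # True
print(per_second * 60)                    # 720000.0 operations per minute
```

Read per minute, 720,000 operations would look far above a "10,000 per minute" limit; read per second, 12,000 is only modestly above the actual threshold.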

Impact

Performance degradation: High indexing rates can cause Elasticsearch nodes to slow down, leading to increased latency and decreased query performance.
Resource utilization: High indexing rates can consume excessive CPU, memory, and disk resources, potentially causing node instability or even crashes.
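Rejected writes and data loss: When a node's write thread pool queue is full, Elasticsearch rejects further indexing requests with HTTP 429 (es_rejected_execution_exception), and overloaded nodes can make client requests time out. Documents in rejected or timed-out requests are lost unless the client retries them, leaving the index out of sync with its source.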
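A Bulk API request can return HTTP 200 even when individual documents in it were rejected: the top-level "errors" flag is true and each failed item carries its own status and error. Clients that check only the HTTP status silently drop those documents. The sketch below, assuming the standard Bulk API response shape and using a hypothetical sample response, shows one way a client could pick out the failed items and decide which are worth retrying.

```python
from typing import Any, Dict, List


def failed_bulk_items(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the failed items of a parsed Bulk API response.

    "position" is the item's index in the response, which follows the order
    of the actions in the request, so it can be used to resend the documents.
    """
    if not response.get("errors"):
        return []  # the top-level flag is false when every item succeeded
    failures = []
    for position, item in enumerate(response["items"]):
        # Each item has a single key naming its action: index, create, update or delete.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append({
                "position": position,
                "action": action,
                "status": result.get("status"),
                "type": result["error"].get("type"),
            })
    return failures


# Hypothetical response: one document indexed, one rejected, one with a mapping error.
sample = {
    "took": 12,
    "errors": True,
    "items": [
        {"index": {"_index": "logs", "_id": "1", "status": 201, "result": "created"}},
        {"index": {"_index": "logs", "_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception"}}},
        {"index": {"_index": "logs", "_id": "3", "status": 400,
                   "error": {"type": "mapper_parsing_exception"}}},
    ],
}

failures = failed_bulk_items(sample)
# Only 429s are worth retrying after backing off; a 400 will fail again unchanged.
retry_positions = [f["position"] for f in failures if f["status"] == 429]
print(retry_positions)  # [1]
```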

Diagnosis

To diagnose the cause of the high indexing rate, follow these steps:

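Check the cluster's current indexing rate with rate(elasticsearch_indices_indexing_index_total[5m]); leaving out the sum() shows one series per node, which tells you whether the load is spread evenly or concentrated on a few nodes.
Identify the indices receiving the most writes, for example by taking two snapshots of GET _stats/indexing a known interval apart and comparing each index's index_total counter.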
Investigate recent changes to the indexing pipeline, such as new data sources or changed indexing patterns.
Review Elasticsearch node logs for any errors or warnings related to indexing.
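Check the cluster's resource utilization (CPU, JVM heap, disk I/O and free disk space) and the write thread pool's queue and rejection counts, for example with GET _cat/thread_pool/write?v&h=node_name,active,queue,rejected.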
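The per-index and thread-pool checks above can be scripted. The following is a minimal sketch, assuming an unauthenticated cluster at a hypothetical ES_URL and using only the standard library; it compares two GET /_stats/indexing snapshots using the "primaries" section (so each document counts once rather than once per replica) and filters the JSON form of the _cat/thread_pool/write output. The function names are hypothetical helpers, not part of any Elasticsearch client.

```python
import json
import time
import urllib.request
from typing import Any, Dict, List, Tuple

ES_URL = "http://localhost:9200"  # assumption: adjust host, port, TLS and auth for your cluster


def get_json(path: str, base_url: str = ES_URL) -> Any:
    """Send a GET request to the cluster and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=10) as resp:
        return json.load(resp)


def top_indices_by_rate(before: Dict[str, Any], after: Dict[str, Any],
                        interval_s: float, n: int = 5) -> List[Tuple[str, float]]:
    """Rank indices by primary indexing operations per second between two
    GET /_stats/indexing snapshots taken interval_s seconds apart."""
    old_indices = before.get("indices", {})
    rates = []
    for name, stats in after.get("indices", {}).items():
        if name not in old_indices:
            continue  # index created between the snapshots; no baseline
        delta = (stats["primaries"]["indexing"]["index_total"]
                 - old_indices[name]["primaries"]["indexing"]["index_total"])
        # Stats are held in memory per shard copy and can drop when shards
        # relocate or nodes restart; treat a negative delta as zero.
        rates.append((name, max(delta, 0) / interval_s))
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates[:n]


def nodes_with_rejections(rows: List[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Filter rows from GET /_cat/thread_pool/write?format=json&h=node_name,queue,rejected.

    _cat APIs return every value as a string, and "rejected" is cumulative
    since the node started, so compare it over time rather than reading it once.
    """
    return [(row["node_name"], int(row["rejected"])) for row in rows if int(row["rejected"]) > 0]


def report(interval_s: float = 30.0) -> None:
    """Print the busiest indices and any nodes whose write pool has rejected work."""
    first = get_json("/_stats/indexing")
    time.sleep(interval_s)
    second = get_json("/_stats/indexing")
    for name, rate in top_indices_by_rate(first, second, interval_s):
        print(f"{name}: {rate:.0f} primary index ops/s")
    rows = get_json("/_cat/thread_pool/write?format=json&h=node_name,queue,rejected")
    for node, rejected in nodes_with_rejections(rows):
        print(f"{node}: {rejected} rejected write tasks")
```

Calling report() while the alert is firing gives a quick view of which indices are driving the rate and whether any node is already rejecting writes.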

Mitigation

To mitigate the impact of high indexing rates, follow these steps:

Temporary mitigation: Reduce the indexing rate by pausing or slowing down the data ingestion pipeline.
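Index optimization: Make indexing cheaper by increasing the per-index index.refresh_interval setting (default 1s, dynamic), or temporarily setting it to -1 to disable refreshes during a bulk load, and by raising the node-level indices.memory.index_buffer_size (default 10% of the heap; a static setting configured in elasticsearch.yml).
Add more resources: Scale out by adding data nodes, or scale up by giving existing nodes more CPU, memory, or faster disks. Adding nodes only spreads the load if the busy indices have enough primary and replica shards to place on them.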
Data pipeline optimization: Optimize the data pipeline to reduce the volume of data being indexed or to distribute the indexing load more evenly.
Long-term solution: Adopt a more sustainable ingestion strategy, such as buffering incoming data in a message queue and indexing it in batches with the Bulk API, so that load spikes are absorbed before they reach the cluster.
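Two of the mitigations above can be sketched in Python, assuming an unauthenticated cluster at a hypothetical ES_URL. set_refresh_interval changes the dynamic index.refresh_interval setting through the update index settings API (PUT /<index>/_settings); bulk_payloads groups documents into newline-delimited Bulk API bodies so a pipeline sends fewer, larger requests; backoff_delays is a hypothetical helper producing an exponential retry schedule for items rejected with HTTP 429. indices.memory.index_buffer_size cannot be changed this way because it is a static node setting.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, Iterator, List

ES_URL = "http://localhost:9200"  # assumption: adjust host, port, TLS and auth for your cluster


def set_refresh_interval(index: str, interval: str, base_url: str = ES_URL) -> Dict[str, Any]:
    """PUT /<index>/_settings with a new refresh interval, e.g. "30s" or "-1" to disable."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode()
    req = urllib.request.Request(f"{base_url}/{index}/_settings", data=body, method="PUT",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def bulk_payloads(index: str, docs: Iterable[Dict[str, Any]],
                  batch_size: int = 1000) -> Iterator[bytes]:
    """Yield newline-delimited Bulk API bodies holding at most batch_size documents each."""
    lines: List[str] = []
    count = 0
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
        count += 1
        if count == batch_size:
            # The Bulk API requires the body to end with a newline.
            yield ("\n".join(lines) + "\n").encode()
            lines, count = [], 0
    if lines:
        yield ("\n".join(lines) + "\n").encode()


def backoff_delays(retries: int = 5, base_s: float = 1.0, cap_s: float = 60.0) -> List[float]:
    """Exponential back-off schedule (seconds) for retrying rejected (HTTP 429) items."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(retries)]
```

If refreshes are disabled with "-1" for a bulk load, set the interval back (for example to "1s") once the load finishes. The batch size of 1000 is an arbitrary starting point to benchmark against your own documents, not a recommended value.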