ElasticsearchHighQueryLatency

The query latency on the Elasticsearch cluster is higher than the threshold.

Alert Rule

alert: ElasticsearchHighQueryLatency
annotations:
  description: |-
    The query latency on Elasticsearch cluster is higher than the threshold.
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-community-elasticsearch-exporter/elasticsearchhighquerylatency/
  summary: Elasticsearch High Query Latency (instance {{ $labels.instance }})
expr: elasticsearch_indices_search_fetch_time_seconds / elasticsearch_indices_search_fetch_total
  > 1
for: 5m
labels:
  severity: warning


Meaning

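The ElasticsearchHighQueryLatency alert fires when the average time per search fetch operation on an Elasticsearch cluster stays above 1 second for 5 minutes. The expression divides the total search fetch time (in seconds) by the total number of search fetch operations. Both metrics are cumulative counters, so the ratio is an average over the lifetime of the counters rather than over a recent window, and it measures the fetch phase of a search rather than the query phase.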
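To make the arithmetic behind the alert expression concrete, the sketch below computes the same ratio from two hypothetical counter samples and compares it with the average over only the interval between them. The `Sample` class and the numbers are illustrative assumptions, not values from a real cluster.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One scrape of the two counters used by the alert (hypothetical values)."""
    fetch_time_seconds: float  # cumulative seconds spent in the fetch phase
    fetch_total: float         # cumulative number of fetch operations


def lifetime_average(sample: Sample) -> float:
    """Ratio used by the alert expression: total time / total operations."""
    if sample.fetch_total == 0:
        return 0.0
    return sample.fetch_time_seconds / sample.fetch_total


def windowed_average(earlier: Sample, later: Sample) -> float:
    """Average fetch time for only the operations between two scrapes."""
    operations = later.fetch_total - earlier.fetch_total
    if operations <= 0:
        return 0.0
    return (later.fetch_time_seconds - earlier.fetch_time_seconds) / operations


# Assumed samples: a long history of fast fetches, then 300 slow ones.
earlier = Sample(fetch_time_seconds=1000.0, fetch_total=100000.0)
later = Sample(fetch_time_seconds=1600.0, fetch_total=100300.0)

print(f"lifetime average: {lifetime_average(later):.4f} s")
print(f"windowed average: {windowed_average(earlier, later):.4f} s")
```

With these assumed numbers the recent fetches average 2 seconds each, while the lifetime ratio stays well under the 1 second threshold. When the alert does fire, it therefore tends to reflect a sustained problem; a windowed ratio, which PromQL typically expresses with `rate()` over both counters, reacts faster to recent changes.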

Impact

High query latency on Elasticsearch degrades the performance and responsiveness of dependent applications and services. This can lead to:

Slow query response times, affecting user experience
Increased load on the Elasticsearch cluster, potentially leading to additional latency or even node failures
Potential impact on business-critical operations, such as search, logging, and analytics

Diagnosis

To diagnose the root cause of high query latency, follow these steps:

Check the Elasticsearch cluster health: Verify that the cluster status is green and all nodes are present (for example, with GET _cluster/health and GET _cat/nodes).
Investigate query patterns: Analyze the search slow log to identify unusual or heavy queries that may be contributing to the latency.
Check indexing and shard configuration: Verify that shard count and shard size suit the workload (GET _cat/shards lists each shard with its size).
Review cluster resource utilization: Monitor CPU, JVM heap, and disk usage on each node (for example, with GET _nodes/stats) to identify resource bottlenecks.
Check for ongoing maintenance or upgrades: Verify that no rolling restart, upgrade, or shard relocation is in progress that may be affecting query performance.
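The first checks above can be partly automated. The sketch below assumes the JSON responses of `GET _cluster/health` and `GET _nodes/stats/indices/search` have already been fetched and parsed into dictionaries; it flags health problems and lists nodes whose average fetch time exceeds a threshold. The helper names, node names, and sample payloads are hypothetical and heavily trimmed.

```python
from typing import Any


def health_problems(health: dict[str, Any]) -> list[str]:
    """Return human-readable issues found in a _cluster/health response."""
    problems = []
    if health.get("status") != "green":
        problems.append(f"cluster status is {health.get('status')}")
    for key in ("unassigned_shards", "initializing_shards", "relocating_shards"):
        if health.get(key, 0) > 0:
            problems.append(f"{key} = {health[key]}")
    return problems


def slow_fetch_nodes(nodes_stats: dict[str, Any],
                     threshold_seconds: float = 1.0) -> dict[str, float]:
    """Map node name to average fetch seconds for nodes above the threshold."""
    slow = {}
    for node in nodes_stats.get("nodes", {}).values():
        search = node["indices"]["search"]
        total = search["fetch_total"]
        if total == 0:
            continue  # no fetches yet, nothing to average
        average = search["fetch_time_in_millis"] / 1000.0 / total
        if average > threshold_seconds:
            slow[node["name"]] = average
    return slow


# Trimmed, made-up payloads for illustration only.
sample_health = {
    "status": "yellow",
    "number_of_nodes": 3,
    "unassigned_shards": 2,
    "initializing_shards": 0,
    "relocating_shards": 0,
}
sample_nodes_stats = {
    "nodes": {
        "a1": {"name": "es-data-0", "indices": {"search": {
            "fetch_total": 200, "fetch_time_in_millis": 500000}}},
        "b2": {"name": "es-data-1", "indices": {"search": {
            "fetch_total": 1000, "fetch_time_in_millis": 20000}}},
    }
}

print(health_problems(sample_health))
print(slow_fetch_nodes(sample_nodes_stats))
```

Comparing the per-node averages helps distinguish a single struggling node (hardware, hot shard) from a cluster-wide slowdown (query pattern, overall capacity).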

Mitigation

To mitigate high query latency, follow these steps:

Optimize indexing and shard configuration: Review and adjust indexing and shard configurations to improve query performance.
Adjust query patterns: Optimize query patterns to reduce the load on the Elasticsearch cluster.
Add more resources: Consider adding more resources (e.g., nodes, CPU, memory) to the Elasticsearch cluster to improve query performance.
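Use query caching: Elasticsearch caches filter-context results in the node query cache and, by default, caches responses for size-0 searches in the shard request cache. Structure queries so they can benefit from these caches and reduce the load on the cluster.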
Consider upgrading Elasticsearch: If the cluster is running an older version of Elasticsearch, consider upgrading to a newer version that may have performance improvements.
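As one illustration of adjusting query patterns, the sketch below builds a search request body that keeps exact-match conditions in filter context (no scoring, eligible for caching) and limits what the fetch phase has to load by bounding `size` and restricting `_source`. The function name and the field names (`message`, `service`, `@timestamp`) are hypothetical; whether this helps depends on the actual workload.

```python
from typing import Any


def build_search_body(text: str,
                      text_field: str,
                      exact_filters: dict[str, Any],
                      source_fields: list[str],
                      size: int = 10) -> dict[str, Any]:
    """Build a search body that is cheap to score and cheap to fetch."""
    return {
        # Fewer hits per page means fewer documents loaded in the fetch phase.
        "size": size,
        # Return only the fields the caller needs instead of the full document.
        "_source": source_fields,
        "query": {
            "bool": {
                # Scored full-text clause.
                "must": [{"match": {text_field: text}}],
                # Exact-match clauses in filter context: not scored, cacheable.
                "filter": [{"term": {field: value}}
                           for field, value in exact_filters.items()],
            }
        },
    }


body = build_search_body(
    text="timeout error",
    text_field="message",
    exact_filters={"service": "checkout"},
    source_fields=["message", "@timestamp"],
    size=20,
)
print(body)
```

The resulting dictionary can be sent as the JSON body of a `_search` request with any HTTP or Elasticsearch client.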

Remember to investigate and address the root cause of the high query latency to prevent similar issues in the future.