ElasticsearchClusterRed #
Elastic Cluster Red status
Alert Rule
alert: ElasticsearchClusterRed
annotations:
  description: |-
    Elastic Cluster Red status
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-community-elasticsearch-exporter/elasticsearchclusterred/
  summary: Elasticsearch Cluster Red (instance {{ $labels.instance }})
expr: elasticsearch_cluster_health_status{color="red"} == 1
for: 0m
labels:
  severity: critical
Meaning #
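The ElasticsearchClusterRed alert fires when the Elasticsearch exporter reports the cluster health status as “red”. A red status means that at least one primary shard is unassigned, so part of the data is currently unavailable. Common causes include node failures, full disks, and shard allocation problems. Because the rule uses `for: 0m`, the alert fires as soon as a single scrape reports the red status.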
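To make the alert expression concrete, here is a minimal Python sketch that reads exporter output and reports which color series currently has the value 1. It assumes the exporter emits one `elasticsearch_cluster_health_status` series per color, as the rule's `color="red"` selector implies. The `SAMPLE` text, the `cluster="demo"` label, and the `current_color` helper are illustrative and not part of the exporter.

```python
import re

# Matches lines such as: elasticsearch_cluster_health_status{color="red"} 1
_LINE = re.compile(
    r'^elasticsearch_cluster_health_status\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
_COLOR = re.compile(r'color="(?P<color>[^"]+)"')

# Illustrative scrape output (assumption: label names other than "color" may differ).
SAMPLE = """\
elasticsearch_cluster_health_status{cluster="demo",color="green"} 0
elasticsearch_cluster_health_status{cluster="demo",color="red"} 1
elasticsearch_cluster_health_status{cluster="demo",color="yellow"} 0
"""


def current_color(metrics_text):
    """Return the color whose series has value 1, or None if none does."""
    for line in metrics_text.splitlines():
        match = _LINE.match(line.strip())
        if not match:
            continue  # comment lines and unrelated metrics
        color = _COLOR.search(match.group("labels"))
        if color and float(match.group("value")) == 1.0:
            return color.group("color")
    return None
```

Calling `current_color(SAMPLE)` on the illustrative text returns the color that would make the alert expression match.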
Impact #
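While the cluster is red, the affected indices are degraded. Possible consequences include:
- Data on unassigned primary shards is unavailable, and may be lost if no intact copy can be recovered
- Search and indexing requests against affected indices are delayed, return partial results, or fail
- Increased latency and error rates for clients of the cluster
- Potential data corruption if the underlying cause is a disk or node fault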
Diagnosis #
To diagnose the issue, follow these steps:
- Check the Elasticsearch cluster health API (`GET _cluster/health`) to confirm the red status.
- Review the Elasticsearch logs for errors or warnings related to cluster health.
- Verify that all nodes in the cluster are online and reachable (`GET _cat/nodes`).
- Check for any ongoing indexing or search operations that may be causing the issue.
- Investigate pending tasks, such as shard relocations or cluster updates (`GET _cluster/pending_tasks`), and ask the cluster why shards are unassigned (`GET _cluster/allocation/explain`).
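The API checks above can be scripted. The following is a minimal sketch using only the Python standard library. It assumes an unauthenticated cluster reachable at `http://localhost:9200`, which you would replace with your own endpoint and credentials. The helper names (`get_json`, `summarize_health`, `unassigned_primaries`, `diagnose`) are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def get_json(path, base_url=ES_URL, timeout=10):
    """GET a JSON document from the Elasticsearch REST API."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def summarize_health(health):
    """Reduce a _cluster/health response to the fields useful for triage."""
    keys = (
        "status",
        "number_of_nodes",
        "unassigned_shards",
        "initializing_shards",
        "relocating_shards",
        "number_of_pending_tasks",
    )
    return {key: health.get(key) for key in keys}


def unassigned_primaries(shards):
    """Filter a _cat/shards JSON listing down to unassigned primary shards."""
    return [
        s for s in shards
        if s.get("prirep") == "p" and s.get("state") == "UNASSIGNED"
    ]


def diagnose(base_url=ES_URL):
    """Print health, unassigned primaries, and one allocation explanation."""
    print(summarize_health(get_json("/_cluster/health", base_url)))
    shards = get_json(
        "/_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason",
        base_url,
    )
    for shard in unassigned_primaries(shards):
        print(shard.get("index"), shard.get("shard"), shard.get("unassigned.reason"))
    # Without a request body, the API explains an arbitrary unassigned shard.
    # It answers with an HTTP error when no shard is unassigned.
    print(json.dumps(get_json("/_cluster/allocation/explain", base_url), indent=2))
```

Running `diagnose()` against a red cluster should list the primary shards behind the red status and the allocation decision for one of them, which usually points at the root cause.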
Mitigation #
To mitigate the issue, follow these steps:
- Identify and resolve any node failures or connectivity issues.
- Check for and address any data inconsistencies or corruption.
- Query the Elasticsearch cluster health API with `level=indices` to see which indices are red and to confirm that each fix takes effect.
- Consider restarting the Elasticsearch cluster or individual nodes if necessary.
- Implement any necessary configuration changes or patches to prevent similar issues in the future.
- Monitor the Elasticsearch cluster health status closely to ensure the issue is resolved and does not recur.
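Once the underlying fault is fixed, shards whose allocation failed repeatedly may need an explicit retry before the cluster recovers. The sketch below builds a `POST _cluster/reroute?retry_failed=true` request and polls the health status until it is no longer red. It is a hedged illustration, assuming an unauthenticated endpoint. The helper names and the default of 30 attempts at 10-second intervals are arbitrary choices, not recommendations from the Elasticsearch documentation.

```python
import json
import time
import urllib.request


def build_retry_request(base_url):
    """Build a POST request asking Elasticsearch to retry failed allocations."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true",
        method="POST",
    )


def fetch_status(base_url):
    """Return the current cluster health status string (green/yellow/red)."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["status"]


def wait_until_not_red(get_status, attempts=30, delay=10.0, sleep=time.sleep):
    """Poll get_status() up to `attempts` times; return the last status seen."""
    status = get_status()
    for _ in range(attempts - 1):
        if status != "red":
            break
        sleep(delay)
        status = get_status()
    return status


def retry_and_wait(base_url):
    """Trigger an allocation retry, then wait for the cluster to leave red."""
    with urllib.request.urlopen(build_retry_request(base_url), timeout=30):
        pass  # the response body (cluster state summary) is not needed here
    return wait_until_not_red(lambda: fetch_status(base_url))
```

For example, `retry_and_wait("http://localhost:9200")` returns the last observed status. If it is still "red" after the polling window, the cause has not been resolved and the diagnosis steps should be repeated.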
Remember to refer to the Elasticsearch documentation and your organization’s specific guidelines for additional troubleshooting and mitigation steps.