ElasticsearchDiskOutOfSpace
The disk usage is over 90%
Alert Rule
```yaml
alert: ElasticsearchDiskOutOfSpace
annotations:
  description: |-
    The disk usage is over 90%
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-community-elasticsearch-exporter/elasticsearchdiskoutofspace/
  summary: Elasticsearch disk out of space (instance {{ $labels.instance }})
expr: elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 < 10
for: 0m
labels:
  severity: critical
```
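The rule divides the bytes available to Elasticsearch by the size of the data filesystem and multiplies by 100. It fires when the result is below 10. The minimal Python sketch below reproduces that arithmetic. The function names and byte counts are hypothetical and are not real exporter output.

```python
GIB = 1024 ** 3


def available_percent(available_bytes: int, size_bytes: int) -> float:
    """Same ratio as the PromQL expression: available / size * 100."""
    if size_bytes <= 0:
        raise ValueError("filesystem size must be positive")
    return available_bytes / size_bytes * 100


def alert_fires(available_bytes: int, size_bytes: int, threshold: float = 10.0) -> bool:
    """True when the available share is below the threshold (the rule's `< 10`)."""
    return available_percent(available_bytes, size_bytes) < threshold


# Hypothetical 500 GiB data disk with 40 GiB available: 8% free, so the alert fires.
print(available_percent(40 * GIB, 500 * GIB))  # 8.0
print(alert_fires(40 * GIB, 500 * GIB))        # True
```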
Here is a runbook for the ElasticsearchDiskOutOfSpace alert:
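Meaning
The ElasticsearchDiskOutOfSpace alert fires when the space available to Elasticsearch on a node's data filesystem falls below 10% of that filesystem's total size. In other words, it fires when disk usage exceeds 90%. Because the rule uses `for: 0m`, it fires on the first evaluation where the condition holds.

At this level the node has passed Elasticsearch's default high disk watermark (90%), so the cluster tries to move shards off it. If usage keeps climbing to the default flood-stage watermark (95%), Elasticsearch blocks writes to every index that has a shard on the node. A node that runs completely out of disk can become unstable, slow, or crash.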
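Elasticsearch's disk-based shard allocation uses three watermarks, measured as a percentage of disk used. Their defaults are:

- 85% (low): Elasticsearch generally stops allocating further shards to the node.
- 90% (high): it relocates shards away from the node.
- 95% (flood stage): it places a read-only-allow-delete block on indices with a shard on the node.

The sketch below maps a used-disk percentage onto those stages. It hard-codes the default values, which is an assumption. Adjust them if your cluster overrides `cluster.routing.allocation.disk.watermark.low`, `.high` or `.flood_stage`, or uses absolute byte values instead of percentages.

```python
# Default Elasticsearch disk watermarks, as percentages of disk used.
# Assumption: the cluster keeps the defaults for
# cluster.routing.allocation.disk.watermark.{low,high,flood_stage}.
DEFAULT_WATERMARKS = (
    (95.0, "flood_stage"),
    (90.0, "high"),
    (85.0, "low"),
)


def watermark_stage(used_percent: float) -> str:
    """Return the highest default watermark that the used percentage is above."""
    for limit, name in DEFAULT_WATERMARKS:
        if used_percent > limit:
            return name
    return "ok"


# The alert fires below 10% available, i.e. above 90% used.
print(watermark_stage(92.0))  # high
print(watermark_stage(96.5))  # flood_stage
print(watermark_stage(50.0))  # ok
```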
Impact
If left unaddressed, a disk running out of space can have severe consequences, including:
- Rejected writes: once the flood-stage watermark is reached, indexing into the affected indices fails until space is freed
- Elasticsearch cluster instability or downtime, including shards that cannot be allocated or relocated
- Increased latency or timeouts for searches and indexing operations
- Data loss, corruption, or inconsistency if writes cannot be completed
- Increased risk of node crashes or failures
Diagnosis
To diagnose the issue, follow these steps:
- Check the disk usage of the affected Elasticsearch node with `df -h` (or a similar tool). Look at the filesystem that holds the Elasticsearch data path.
- Confirm that usage is above 90% by checking the `elasticsearch_filesystem_data_available_bytes` and `elasticsearch_filesystem_data_size_bytes` metrics for the instance named in the alert.
- Check the Elasticsearch logs for errors or warnings related to disk space, such as warnings that a disk watermark has been exceeded.
- Review indexing patterns and index growth to find unusual activity that may be filling the disk, such as a sudden rise in ingest volume or a new high-volume index.
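The Elasticsearch nodes stats API (`GET /_nodes/stats/fs`) reports comparable figures directly from the cluster. Its `fs.data` entries give `total_in_bytes` and `available_in_bytes` for each data path. The standard-library sketch below ranks data paths by available percentage. It makes the following assumptions:

- The cluster URL is `http://localhost:9200`, with no authentication or TLS.
- The `sample_stats` dictionary is a hypothetical excerpt that stands in for a live response.

```python
import json
import urllib.request

# Assumption: an unauthenticated cluster at this URL; real clusters may need auth/TLS.
ES_URL = "http://localhost:9200"


def fetch_fs_stats(base_url: str = ES_URL) -> dict:
    """Fetch filesystem statistics for every node via GET /_nodes/stats/fs."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/fs", timeout=10) as resp:
        return json.load(resp)


def data_path_usage(stats: dict) -> list:
    """Return (node name, data path, percent available) tuples, fullest path first."""
    rows = []
    for node in stats.get("nodes", {}).values():
        for data in node.get("fs", {}).get("data", []):
            total = data.get("total_in_bytes", 0)
            available = data.get("available_in_bytes", 0)
            # Same ratio as the alert expression: available / size * 100.
            pct = available / total * 100 if total else 0.0
            rows.append((node.get("name", "?"), data.get("path", "?"), pct))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical response excerpt; real responses contain many more fields.
sample_stats = {
    "nodes": {
        "node-a-id": {"name": "es-data-1", "fs": {"data": [
            {"path": "/var/lib/elasticsearch", "total_in_bytes": 1000, "available_in_bytes": 80}]}},
        "node-b-id": {"name": "es-data-2", "fs": {"data": [
            {"path": "/var/lib/elasticsearch", "total_in_bytes": 1000, "available_in_bytes": 400}]}},
    }
}

for name, path, pct in data_path_usage(sample_stats):
    status = "below 10%" if pct < 10 else "ok"
    print(f"{name} {path}: {pct:.1f}% available ({status})")
```

Against a live cluster, call `data_path_usage(fetch_fs_stats())` instead of using the sample. The paths at the top of the list are the ones closest to triggering this alert.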
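Mitigation
To mitigate the issue, follow these steps:
- Free up disk space: Immediately free up space by deleting unnecessary files, old logs, or indices that are no longer needed. On Elasticsearch 7.4 and later, a flood-stage write block is released automatically once usage drops below the high watermark. On older versions, reset `index.blocks.read_only_allow_delete` on the affected indices by hand.
- Add more disk space: Expand the disk on the affected node, or add data nodes so the data is spread across more capacity.
- Reduce stored data: Shorten retention, for example with index lifecycle management policies that delete old indices. Lower replica counts where your redundancy requirements allow.
- Monitor disk usage: Track disk usage trends regularly so growth is caught before it reaches the alert threshold.
- Consider cluster rebalancing: If the issue persists, rebalance the Elasticsearch cluster so shards are distributed more evenly across nodes.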
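Freeing space is easier when you know which indices are largest. On Elasticsearch versions before 7.4, the flood-stage block must also be removed by hand after space is recovered. The standard-library sketch below covers both steps, under these assumptions:

- The URL `http://localhost:9200` points to an unauthenticated cluster.
- The index rows are hypothetical.
- The function names are illustrative only.

It ranks rows from `GET /_cat/indices?format=json&bytes=b` by size. It also builds a `PUT /<index>/_settings` request that sets `index.blocks.read_only_allow_delete` to `null`. Nothing is deleted or sent automatically.

```python
import json
import urllib.request

# Assumption: an unauthenticated cluster at this URL; real clusters may need auth/TLS.
ES_URL = "http://localhost:9200"


def largest_indices(cat_rows: list, top: int = 5) -> list:
    """Rank rows from GET /_cat/indices?format=json&bytes=b by store size, largest first."""
    # With bytes=b, store.size is a string of bytes; closed indices may report null.
    sized = [(row["index"], int(row.get("store.size") or 0)) for row in cat_rows]
    return sorted(sized, key=lambda item: item[1], reverse=True)[:top]


def clear_read_only_block_request(index: str = "_all", base_url: str = ES_URL) -> urllib.request.Request:
    """Build, but do not send, a PUT /<index>/_settings request that removes the block."""
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical _cat/indices rows; review the ranking before deleting anything.
cat_rows = [
    {"index": "logs-2024.01.01", "store.size": "5000"},
    {"index": "logs-2024.01.02", "store.size": "12000"},
    {"index": "archive-closed", "store.size": None},
]
print(largest_indices(cat_rows, top=2))
# [('logs-2024.01.02', 12000), ('logs-2024.01.01', 5000)]

req = clear_read_only_block_request("logs-2024.01.02")
print(req.get_method(), req.full_url)
# PUT http://localhost:9200/logs-2024.01.02/_settings
```

Send the request with `urllib.request.urlopen(req)` only after disk usage is back below the watermarks. On 7.4 and later this step is normally unnecessary, because Elasticsearch releases the block itself.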
Remember to follow your organization’s specific procedures and guidelines for resolving disk space issues in your Elasticsearch cluster.