ElasticsearchDiskOutOfSpace #

The disk usage is over 90%

Alert Rule

alert: ElasticsearchDiskOutOfSpace
annotations:
  description: |-
    The disk usage is over 90%
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/prometheus-community-elasticsearch-exporter/elasticsearchdiskoutofspace/
  summary: Elasticsearch disk out of space (instance {{ $labels.instance }})
expr: elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes
  * 100 < 10
for: 0m
labels:
  severity: critical


Meaning #

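The ElasticsearchDiskOutOfSpace alert fires when the available space on an Elasticsearch data filesystem falls below 10% of its total size, that is, when disk usage exceeds 90%. Because the rule uses for: 0m, it fires on the first evaluation that crosses the threshold. At this level the node is at Elasticsearch's default high disk watermark (90% used), where it starts relocating shards away from the node. At the default flood-stage watermark (95% used), every index with a shard on the node is marked read-only, so writes begin to fail.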
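As a worked example of the arithmetic behind the rule, the sketch below mirrors the alert expression in Python. The function names and the sample byte values are illustrative assumptions, not part of the exporter or of Prometheus.

```python
def available_percent(available_bytes: float, size_bytes: float) -> float:
    """Return available space as a percentage of total filesystem size."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return available_bytes / size_bytes * 100


def alert_fires(available_bytes: float, size_bytes: float,
                threshold_percent: float = 10.0) -> bool:
    """Mirror the rule: available / size * 100 < 10."""
    return available_percent(available_bytes, size_bytes) < threshold_percent


GIB = 1024 ** 3

# Hypothetical node: 500 GiB data volume with 40 GiB still available.
print(alert_fires(40 * GIB, 500 * GIB))
# Hypothetical node: same volume with 200 GiB still available.
print(alert_fires(200 * GIB, 500 * GIB))
```

The comparison is strict, so a node sitting at exactly 10% available does not trigger the alert.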

Impact #

If left unaddressed, a disk running out of space can have severe consequences, including:

Elasticsearch cluster instability, node crashes, or downtime
Data loss, corruption, or inconsistency due to incomplete writes
Increased latency or timeouts for searches and indexing operations

Diagnosis #

To diagnose the issue, follow these steps:

Check the disk usage of the affected Elasticsearch node by running df -h against the filesystem that holds the Elasticsearch data path.
Verify that usage is indeed above 90% by dividing elasticsearch_filesystem_data_available_bytes by elasticsearch_filesystem_data_size_bytes for the instance named in the alert; a ratio below 0.10 confirms it.
Check the Elasticsearch logs for any errors or warnings related to disk space issues.
Review the indexing and search patterns to identify any unusual activity that may be contributing to the disk space issue.
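The first two diagnosis steps can also be cross-checked against Elasticsearch itself. The sketch below is one possible approach, assuming the cluster is reachable over plain HTTP without authentication: it reads the node filesystem statistics from GET /_nodes/stats/fs and lists each data path with its available percentage, lowest first. The helper names and the base URL are hypothetical; adapt them to your environment.

```python
import json
import urllib.request


def data_path_usage(node_stats: dict) -> list:
    """Flatten a /_nodes/stats/fs response into one row per data path."""
    rows = []
    for node in node_stats.get("nodes", {}).values():
        for path in node.get("fs", {}).get("data", []):
            total = path.get("total_in_bytes", 0)
            available = path.get("available_in_bytes", 0)
            if total <= 0:
                continue  # skip paths that report no size
            rows.append({
                "node": node.get("name", "unknown"),
                "path": path.get("path", ""),
                "available_percent": available / total * 100,
            })
    # Lowest available percentage first, so the fullest disks lead the list.
    return sorted(rows, key=lambda row: row["available_percent"])


def fetch_node_stats(base_url: str) -> dict:
    """Fetch filesystem stats; base_url is an assumption, e.g. http://localhost:9200."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/fs", timeout=10) as resp:
        return json.load(resp)


def below_threshold(rows: list, threshold_percent: float = 10.0) -> list:
    """Keep only the data paths that would satisfy the alert condition."""
    return [row for row in rows if row["available_percent"] < threshold_percent]
```

A typical call would be below_threshold(data_path_usage(fetch_node_stats("http://localhost:9200"))). Clusters with TLS or authentication enabled need a correspondingly configured request.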

Mitigation #

To mitigate the issue, follow these steps:

Free up disk space: Immediately free up disk space by deleting unnecessary files, stale logs, or indices that are no longer needed (for example, expired time-based indices).
Add more disk space: Expand the data volume on the affected node, or add data nodes so that shards can be spread across more disks.
Reduce index footprint: Lower the replica count on indices that can tolerate it, force-merge indices that are no longer written to, and apply a retention policy so that old data is removed automatically.
Monitor disk usage: Track disk usage per node and alert at a lower threshold as well, so that there is time to act before usage reaches 90%.
Consider cluster rebalancing: If only some nodes are full, check for uneven shard distribution and rebalance the cluster so that data is spread more evenly across nodes.
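When deciding which indices to delete, it helps to see which ones occupy the most space. The following sketch ranks the rows returned by GET /_cat/indices?format=json&bytes=b by on-disk size. It assumes that request format (so that store.size arrives as a byte count) and skips dot-prefixed indices on the assumption that they are system indices; the function name is hypothetical.

```python
def largest_indices(cat_rows: list, limit: int = 5, skip_system: bool = True) -> list:
    """Return (index_name, size_in_bytes) pairs, largest first."""
    sized = []
    for row in cat_rows:
        name = row.get("index", "")
        if skip_system and name.startswith("."):
            continue  # assumption: dot-prefixed indices are system indices
        size = row.get("store.size")
        if size is None:
            continue  # closed indices report no store size
        sized.append((name, int(size)))
    sized.sort(key=lambda item: item[1], reverse=True)
    return sized[:limit]


# Hypothetical rows in the shape returned with format=json&bytes=b.
sample_rows = [
    {"index": "logs-2024.01", "store.size": "52428800"},
    {"index": "logs-2024.02", "store.size": "104857600"},
    {"index": ".kibana", "store.size": "2048"},
    {"index": "closed-index", "store.size": None},
]

for name, size in largest_indices(sample_rows):
    print(f"{name}: {size / 1024 ** 2:.1f} MiB")
```

The output is only a list of candidates. Confirm that an index is truly expendable before removing it with DELETE /<index-name>, since deletion cannot be undone without a snapshot.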

Remember to follow your organization’s specific procedures and guidelines for resolving disk space issues in your Elasticsearch cluster.