HostOutOfDiskSpace #
Disk is almost full (< 10% left)
Alert Rule
alert: HostOutOfDiskSpace
annotations:
description: |-
Disk is almost full (< 10% left)
VALUE = {{ $value }}
LABELS = {{ $labels }}
runbook: https://srerun.github.io/prometheus-alerts/runbooks/node-exporter/hostoutofdiskspace/
summary: Host out of disk space (instance {{ $labels.instance }})
expr: ((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON
(instance, device, mountpoint) node_filesystem_readonly == 0) * on(instance) group_left
(nodename) node_uname_info{nodename=~".+"}
for: 2m
labels:
severity: warning
Meaning #
The HostOutOfDiskSpace alert is triggered when a host’s available disk space falls below 10% and the file system is not read-only. This alert indicates that the host is at risk of running out of disk space, which can cause service disruptions, data loss, and system crashes.
Impact #
If left unaddressed, a host running out of disk space can lead to:
- Service disruptions and downtime
- Data loss or corruption
- System crashes or freezes
- Increased latency and decreased performance
- Inability to write logs, causing debugging and troubleshooting difficulties
Diagnosis #
To diagnose the issue, follow these steps:
- Check the disk usage of the affected host using tools like
df
ordu
. - Identify the file system that is running low on disk space.
- Investigate the cause of the disk space usage, such as:
- Large log files or directories.
- Unused or unnecessary files and directories.
- Disk-hungry applications or processes.
- Check the Prometheus metrics, such as
node_filesystem_avail_bytes
andnode_filesystem_size_bytes
, to confirm the disk space usage.
Mitigation #
To mitigate the issue, follow these steps:
- Free up disk space by deleting unnecessary files and directories.
- compress or rotate log files to reduce their size.
- Identify and terminate disk-hungry applications or processes.
- Consider increasing the disk space or migrating data to a larger storage solution.
- Implement disk space monitoring and alerting to catch low disk space issues before they become critical.
- Schedule regular disk cleanups and maintenance tasks to prevent disk space issues from recurring.
Additional resources:
- Refer to the official Node Exporter documentation for more information on disk space metrics.
- Consult the Prometheus Alerting documentation for configuration and customization options.
- Review the runbook for this alert on GitHub for more detailed instructions and troubleshooting steps.