HostUnusualDiskWriteRate #
Disk is probably writing too much data (> 50 MB/s)
Alert Rule
alert: HostUnusualDiskWriteRate
annotations:
  description: |-
    Disk is probably writing too much data (> 50 MB/s)
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/node-exporter/hostunusualdiskwriterate/
  summary: Host unusual disk write rate (instance {{ $labels.instance }})
expr: (sum by (instance) (rate(node_disk_written_bytes_total[2m])) / 1024 / 1024 >
  50) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
for: 2m
labels:
  severity: warning
Here is the runbook for the HostUnusualDiskWriteRate alert rule:
Meaning #
The HostUnusualDiskWriteRate alert is triggered when a host’s total disk write rate, summed across all of its disks and averaged over a 2-minute window, stays above 50 MB/s for 2 minutes. It indicates that the host is writing data to disk at an unusually high rate, which may be a sign of abnormal system behavior or a potential issue.
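As a rough illustration of the arithmetic behind the expression, the sketch below sums the counter increase of every disk on one host and converts it to a per-second rate. It is a simplification, not how Prometheus evaluates `rate()` (which also handles counter resets and extrapolation). The function names and sample values are hypothetical. Because the expression divides by 1024 twice, the threshold is effectively 50 MiB/s, or about 52.4 MB/s.

```python
MIB = 1024 * 1024
THRESHOLD_MIB_PER_S = 50  # threshold taken from the alert expression


def write_rate_mib_per_s(samples_by_device, window_seconds):
    """Sum per-device counter increases and convert the total to MiB/s.

    samples_by_device maps a device name to (first, last) readings of
    node_disk_written_bytes_total taken window_seconds apart.
    """
    total_bytes = sum(last - first for first, last in samples_by_device.values())
    return total_bytes / window_seconds / MIB


def exceeds_threshold(samples_by_device, window_seconds=120):
    """Mirror the `> 50` comparison for a single host (simplified)."""
    return write_rate_mib_per_s(samples_by_device, window_seconds) > THRESHOLD_MIB_PER_S


# Hypothetical host: two disks, neither above 50 MiB/s on its own.
samples = {
    "sda": (0, 4 * 1024**3),  # 4 GiB written in the window
    "sdb": (0, 3 * 1024**3),  # 3 GiB written in the window
}
print(round(write_rate_mib_per_s(samples, 120), 2))
print(exceeds_threshold(samples))
```

Because of `sum by (instance)`, several moderately busy disks can trip the alert together even when no single disk is above the threshold.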
Impact #
If left unaddressed, an unusual disk write rate can lead to:
- Disk I/O bottlenecks, causing slow system performance and potentially affecting application responsiveness
- Increased wear and tear on disk hardware, potentially reducing its lifespan
- Potential data loss or corruption if the underlying issue is not addressed
Diagnosis #
To diagnose the root cause of the unusual disk write rate, follow these steps:
- Investigate the host: Check the host’s system logs, disk usage, and running processes to identify unusual activity or I/O-heavy processes.
- Verify disk usage: Use tools like df or du to check disk usage and identify filesystems or directories that are growing quickly, which often points to the files being written.
- Check for disk errors: Run disk diagnostic tools like smartctl or fsck to identify disk errors or corruption. Run fsck only on unmounted filesystems.
- Review system configuration: Check system configuration, such as sysctl settings and fstab mount options, to ensure it is correct and not contributing to the high write rate.
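To see which block devices account for the writes, one option on Linux is to sample `/proc/diskstats` twice and compare the sectors-written counter. The following is a minimal sketch under that assumption; the helper names are hypothetical, and it relies on the kernel reporting sectors in 512-byte units in that file.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_sectors_written(diskstats_text):
    """Return {device: sectors_written} parsed from /proc/diskstats content."""
    written = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or unexpected lines
        # fields: major, minor, name, then counters; index 9 is sectors written
        written[fields[2]] = int(fields[9])
    return written


def write_rates_mib_per_s(before, after, interval_seconds):
    """Compute per-device write rates in MiB/s from two parsed samples."""
    rates = {}
    for device, sectors in after.items():
        if device in before:
            delta_bytes = (sectors - before[device]) * SECTOR_BYTES
            rates[device] = delta_bytes / interval_seconds / (1024 * 1024)
    return rates


def sample_host(interval_seconds=5):
    """Read /proc/diskstats twice, interval_seconds apart (Linux only)."""
    with open("/proc/diskstats") as f:
        before = parse_sectors_written(f.read())
    time.sleep(interval_seconds)
    with open("/proc/diskstats") as f:
        after = parse_sectors_written(f.read())
    return write_rates_mib_per_s(before, after, interval_seconds)
```

Calling `sample_host()` on the affected host returns one rate per device. Whole disks and their partitions both appear in the file, so the values should be compared per device rather than added together.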
Mitigation #
To mitigate the effects of an unusual disk write rate, follow these steps:
- Identify and terminate resource-intensive processes: Use tools like top or htop to find resource-intensive processes, or iotop to see per-process disk I/O, and terminate any that are contributing to the high write rate.
- Reduce disk usage: Free up disk space by deleting unnecessary files or relocating data to a different storage location.
- Adjust system configuration: Adjust system configuration files to optimize disk performance and reduce the write rate.
- Monitor disk performance: Closely monitor disk performance and adjust mitigation strategies as needed to prevent further issues.
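Before terminating anything, it can help to confirm which processes are actually writing. The sketch below is one possible approach on Linux, assuming the `write_bytes` counter in `/proc/<pid>/io` is readable (reading other users' processes usually requires root). The function names are hypothetical, and the counters are cumulative since each process started, so two samples are needed to estimate a current rate.

```python
import os


def parse_write_bytes(io_text):
    """Extract the write_bytes counter from /proc/<pid>/io content."""
    for line in io_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "write_bytes":
            return int(value)
    return None


def top_writers(proc_root="/proc", limit=5):
    """Return up to `limit` (pid, write_bytes) pairs, largest counters first."""
    totals = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directory names are processes
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                written = parse_write_bytes(f.read())
        except OSError:
            continue  # process exited or permission was denied
        if written is not None:
            totals.append((int(entry), written))
    return sorted(totals, key=lambda item: item[1], reverse=True)[:limit]
```

Running `top_writers()` as root lists the candidate PIDs, which can then be inspected with the tools mentioned above before deciding whether to stop them.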
Remember to investigate and address the root cause of the unusual disk write rate to prevent similar issues from occurring in the future.