HostUnusualDiskReadLatency #
Disk latency is growing (read operations > 100ms)
Alert Rule
alert: HostUnusualDiskReadLatency
annotations:
  description: |-
    Disk latency is growing (read operations > 100ms)
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/node-exporter/hostunusualdiskreadlatency/
  summary: Host unusual disk read latency (instance {{ $labels.instance }})
expr: (rate(node_disk_read_time_seconds_total[1m]) / rate(node_disk_reads_completed_total[1m])
  > 0.1 and rate(node_disk_reads_completed_total[1m]) > 0) * on(instance) group_left
  (nodename) node_uname_info{nodename=~".+"}
for: 2m
labels:
  severity: warning
Here is a sample runbook for the HostUnusualDiskReadLatency alert:
Meaning #
The HostUnusualDiskReadLatency alert fires when the average disk read latency on a host (read time divided by completed reads, computed over a 1-minute window) stays above 100ms for 2 minutes. This indicates that read operations are taking longer than usual to complete, which can degrade system performance and responsiveness.
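As a rough illustration of the arithmetic behind the alert expression, the sketch below computes the same ratio from two counter samples. The function name `avg_read_latency` and the sample values are hypothetical, and the sketch ignores the counter-reset handling and extrapolation that Prometheus' `rate()` performs. Because both rates use the same window, the window length cancels out of the division.

```python
from typing import Optional

THRESHOLD_SECONDS = 0.1  # the 100ms threshold from the alert expression


def avg_read_latency(read_time_start: float, read_time_end: float,
                     reads_start: float, reads_end: float) -> Optional[float]:
    """Average seconds per completed read between two counter samples.

    Returns None when no reads completed, mirroring the
    `rate(node_disk_reads_completed_total[1m]) > 0` guard in the rule.
    """
    reads = reads_end - reads_start
    if reads <= 0:
        return None
    return (read_time_end - read_time_start) / reads


# Hypothetical samples taken one minute apart:
# 45 s of cumulative read time spent on 300 completed reads.
latency = avg_read_latency(1200.0, 1245.0, 50_000, 50_300)
print(latency)  # 0.15
print(latency is not None and latency > THRESHOLD_SECONDS)  # True
```

With these made-up numbers the average read takes 150ms, so the condition would hold; the alert itself only fires once the condition has held for the 2 minutes set by `for`.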
Impact #
High disk read latency can cause:
- Slow application performance
- Increased load times
- Decreased system responsiveness
- Potential data loss or corruption, if the latency is a symptom of a failing disk
Diagnosis #
To diagnose the issue, follow these steps:
- Check the system logs for any errors or warnings related to disk I/O operations.
- Verify that the disk is not experiencing any hardware issues, such as SMART errors, a degraded RAID array, or other hardware failures.
- Check the disk queue length and disk utilization to identify if there are any bottlenecks.
- Investigate if there are any resource-intensive applications or processes running on the system that may be contributing to the high disk read latency.
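The queue length and utilization checks above can be approximated on a Linux host from two snapshots of `/proc/diskstats`, which is where node_exporter's disk counters come from. The following is a minimal sketch, not a replacement for tools such as `iostat`; the names `DiskSample`, `parse_diskstats`, and `summarize`, and the embedded sample lines, are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class DiskSample(NamedTuple):
    reads_completed: int  # column index 3: reads completed
    read_time_ms: int     # column index 6: milliseconds spent reading
    in_flight: int        # column index 11: I/Os currently in progress
    io_time_ms: int       # column index 12: milliseconds spent doing I/O


def parse_diskstats(text: str) -> Dict[str, DiskSample]:
    """Parse the contents of /proc/diskstats into per-device samples."""
    samples = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or unexpectedly short lines
        samples[parts[2]] = DiskSample(
            int(parts[3]), int(parts[6]), int(parts[11]), int(parts[12]))
    return samples


def summarize(before: Dict[str, DiskSample], after: Dict[str, DiskSample],
              interval_seconds: float) -> Dict[str, dict]:
    """Compare two snapshots taken interval_seconds apart."""
    report = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue  # device appeared between snapshots
        reads = new.reads_completed - old.reads_completed
        latency_ms: Optional[float] = None
        if reads > 0:
            latency_ms = (new.read_time_ms - old.read_time_ms) / reads
        # Fraction of the interval during which the device was busy.
        utilization = (new.io_time_ms - old.io_time_ms) / (interval_seconds * 1000.0)
        report[name] = {
            "read_latency_ms": latency_ms,
            "utilization": utilization,
            "in_flight": new.in_flight,
        }
    return report


# Hypothetical snapshots of one device, taken 5 seconds apart.
BEFORE = "   8       0 sda 50000 120 4000000 1200000 30000 500 2400000 90000 0 800000 1290000\n"
AFTER = "   8       0 sda 50300 125 4024000 1245000 30100 510 2408000 90400 3 804500 1335400\n"

print(summarize(parse_diskstats(BEFORE), parse_diskstats(AFTER), 5.0))
# {'sda': {'read_latency_ms': 150.0, 'utilization': 0.9, 'in_flight': 3}}
```

On a real host, the two strings would instead be read from `/proc/diskstats` a few seconds apart. A device that shows high read latency together with utilization near 1.0 is likely saturated, whereas high latency at low utilization points more toward a hardware or backend problem.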
Mitigation #
To mitigate the issue, follow these steps:
- Check and clean up any unnecessary files or data on the disk to free up space.
- Consider upgrading to faster disk hardware, such as SSDs, if possible.
- Optimize disk configuration and settings, such as adjusting the I/O scheduler, tuning read-ahead, or enabling write caching.
- Identify and optimize resource-intensive applications or processes that may be contributing to the high disk read latency.
- Consider implementing disk I/O monitoring and alerting to catch potential issues before they become critical.
Additionally, review system configuration and tuning options, such as adjusting disk timeouts or queue lengths, to optimize disk performance.
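Before changing the disk scheduling algorithm, it helps to know which one is active. On Linux the file `/sys/block/<device>/queue/scheduler` lists the available schedulers with the active one in square brackets. The sketch below parses that format; the function name `parse_scheduler` and the sample string are hypothetical, and the set of schedulers offered depends on the kernel.

```python
from typing import List, Optional, Tuple


def parse_scheduler(contents: str) -> Tuple[Optional[str], List[str]]:
    """Return (active scheduler, all available schedulers).

    The active scheduler is the entry wrapped in square brackets.
    """
    active = None
    available = []
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


# Hypothetical contents of /sys/block/sda/queue/scheduler
print(parse_scheduler("[mq-deadline] kyber bfq none"))
# ('mq-deadline', ['mq-deadline', 'kyber', 'bfq', 'none'])
```

Writing one of the listed names to the same file as root switches the scheduler for that device until the next reboot. Any such change is best tested under a representative workload before it is made permanent.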