ProxmoxDiskUnhealthy

The disk {{ $labels.devpath }} in node {{ $labels.node }} is reporting unhealthy in SMART tests.

Alert Rule
alert: ProxmoxDiskUnhealthy
annotations:
  description: The disk {{ $labels.devpath }} in node {{ $labels.node }} is reporting
    unhealthy in SMART tests
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/proxmox-exporter/proxmoxdiskunhealthy/
  summary: Proxmox disk {{ $labels.devpath }} is unhealthy
expr: 'proxmox_node_disk_smart_status{devpath!="/dev/sdc"} == 0'
for: 1m
labels:
  severity: critical

Meaning #

The ProxmoxDiskUnhealthy alert fires when a disk in a Proxmox node has reported an unhealthy SMART (Self-Monitoring, Analysis and Reporting Technology) status for at least one minute, that is, when proxmox_node_disk_smart_status is 0 for that disk. The rule excludes /dev/sdc. An unhealthy SMART status suggests that the disk is failing or close to failing, which can lead to data loss or system crashes if left unattended.
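To see which disks currently match the rule's expression, the same query can be sent to the Prometheus HTTP API. The following is a minimal sketch rather than part of the alert itself: the function names are hypothetical, and the server address passed to query_unhealthy is an assumption to be replaced with your own.

```python
import json
import urllib.parse
import urllib.request

# Expression copied from the alert rule.
EXPR = 'proxmox_node_disk_smart_status{devpath!="/dev/sdc"} == 0'


def parse_unhealthy(payload: dict) -> list[tuple[str, str]]:
    """Return sorted (node, devpath) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    pairs = set()
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        pairs.add((labels.get("node", "?"), labels.get("devpath", "?")))
    return sorted(pairs)


def query_unhealthy(base_url: str) -> list[tuple[str, str]]:
    """Run EXPR against the instant-query endpoint of the given server."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": EXPR})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_unhealthy(json.load(resp))
```

For example, query_unhealthy("http://localhost:9090") would return one (node, devpath) pair per disk that currently satisfies the expression, assuming a Prometheus server is reachable at that address.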

Impact #

  • Data loss or corruption
  • System crashes or instability
  • Downtime for the affected node
  • Potential impact on hosted virtual machines or containers

Diagnosis #

To diagnose the issue, follow these steps:

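  1. Check the node’s Disks panel in the Proxmox web interface, or use the pvesh command-line tool (for example, pvesh get /nodes/<node>/disks/list), to verify the disk’s status and identify the specific error.
  2. Review the disk’s SMART attributes (for example, with smartctl -a <device>) to determine the cause of the unhealthy status, such as reallocated or pending sectors.
  3. Check the system logs (for example, with journalctl -k or dmesg) for I/O errors or warnings related to the disk.
  4. Verify that the disk is properly seated and that its cables are firmly connected to the node.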
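The SMART review in the steps above can be partly automated. The sketch below assumes the report was produced by smartctl's JSON output mode (smartctl --json -a <device>, available in smartmontools 7.0 and later) and that the disk is an ATA/SATA device; NVMe disks report health in a different structure and are not covered. The function name and the choice of watched attributes are illustrative assumptions, not a definitive failure test.

```python
import json

# Attribute IDs that commonly accompany a failing ATA disk (illustrative selection).
WATCHED_IDS = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def summarize_smart(report: dict) -> dict:
    """Extract the overall health flag and any nonzero watched attributes."""
    passed = report.get("smart_status", {}).get("passed")
    table = report.get("ata_smart_attributes", {}).get("table", [])
    suspicious = {}
    for attr in table:
        attr_id = attr.get("id")
        raw = attr.get("raw", {}).get("value", 0)
        if attr_id in WATCHED_IDS and raw > 0:
            # Prefer the name smartctl reports; fall back to the table above.
            suspicious[attr.get("name", WATCHED_IDS[attr_id])] = raw
    return {"passed": passed, "suspicious": suspicious}


def summarize_file(path: str) -> dict:
    """Load a saved smartctl JSON report from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_smart(json.load(handle))
```

A typical use would be to save the report with smartctl --json -a /dev/sdb > smart.json and then call summarize_file("smart.json"). A result with passed set to False, or with any nonzero entry under suspicious, points to media problems rather than a cabling or seating issue.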

Mitigation #

To mitigate the issue, follow these steps:

  1. Replace the unhealthy disk with a functioning one, if possible.
  2. Run an extended SMART self-test and a filesystem or storage-pool check, and repair any errors found.
  3. Update the disk’s firmware to the latest version.
  4. Consider migrating critical data to a redundant storage system or backup.
  5. Implement additional monitoring and alerting for disk health to detect potential issues earlier.

Note: The appropriate mitigation steps depend on the specific error and the node’s configuration. Consult the Proxmox documentation and the disk manufacturer’s guidelines for further guidance.