ProxmoxGuestDown
Guest {{ $labels.name }} of type {{ $labels.type }} on node {{ $labels.node }} is down
Alert Rule
```yaml
alert: ProxmoxGuestDown
annotations:
  description: Guest {{ $labels.name }} of type {{ $labels.type }} on node {{ $labels.node }} is down
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/proxmox-exporter/proxmoxguestdown/
  summary: Proxmox guest {{ $labels.name }} is down
expr: |
  proxmox_guest_up{tags!~"standby"} == 0
for: 1m
labels:
  severity: critical
```
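To make the rule's semantics concrete, here is a minimal Python sketch of what it checks for a single guest: the sample value must be 0, the `tags` label must not match `standby`, and the condition must hold continuously for the `for: 1m` duration. PromQL regex matchers are fully anchored, so the sketch uses `re.fullmatch`: only a `tags` value of exactly `standby` is excluded, and a guest carrying additional tags still alerts. The sample format and the `is_excluded`/`should_fire` helpers are hypothetical, and the sketch ignores evaluation intervals, staleness, and missing series, which real Prometheus handles.

```python
import re
from datetime import timedelta

# Mirrors "for: 1m" in the rule.
FOR_DURATION = timedelta(minutes=1)
# Mirrors the matcher tags!~"standby".
EXCLUDED_TAGS = re.compile(r"standby")


def is_excluded(labels: dict[str, str]) -> bool:
    """PromQL regex matchers are fully anchored, so use fullmatch rather than search."""
    # A missing label is treated as the empty string, as in PromQL.
    return EXCLUDED_TAGS.fullmatch(labels.get("tags", "")) is not None


def should_fire(labels: dict[str, str], samples: list[tuple[float, float]]) -> bool:
    """Return True if the latest run of 0-valued samples spans at least FOR_DURATION.

    samples: hypothetical (unix_timestamp, value) pairs for one guest, oldest first.
    """
    if is_excluded(labels) or not samples:
        return False
    down_since = None
    for ts, value in samples:
        if value == 0:
            if down_since is None:
                down_since = ts  # start of the current "down" run
        else:
            down_since = None  # any non-zero sample resets the pending state
    if down_since is None:
        return False
    return samples[-1][0] - down_since >= FOR_DURATION.total_seconds()


if __name__ == "__main__":
    samples = [(0, 1), (15, 0), (30, 0), (60, 0), (75, 0)]  # down from t=15 s
    print(should_fire({"name": "web01", "tags": ""}, samples))              # True
    print(should_fire({"name": "web01", "tags": "standby"}, samples))       # False: excluded
    print(should_fire({"name": "web01", "tags": "prod;standby"}, samples))  # True: not an exact match
```

The last call illustrates a common surprise: if you want any guest whose tags include `standby` to be excluded, the matcher would need a pattern such as `.*standby.*` rather than the bare word.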
Meaning
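This runbook covers the ProxmoxGuestDown alert, which fires when `proxmox_guest_up` has reported a Proxmox guest as down (value 0) for at least one minute. The `type` label identifies the kind of guest (in Proxmox, a QEMU virtual machine or an LXC container); guests whose `tags` label matches `standby` are excluded. The alert has critical severity: a guest that is down takes with it every service and application that relies on it.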
Impact
- Unavailability of services or applications running on the affected guest machine
- Potential data loss or corruption if the guest machine was not properly shut down
- Impact on business operations and productivity if the guest machine is critical to daily activities
- Increased risk of security breaches if the guest machine is not properly secured
Diagnosis
To diagnose the issue, follow these steps:
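- Confirm that the guest is actually down, in the Proxmox web interface or on the node's command line: `qm status <vmid>` for a virtual machine, `pct status <vmid>` for a container
- Review the guest's configuration (`qm config <vmid>` or `pct config <vmid>`) for incorrect or recent changes
- Review the guest's task history in the web interface and the node's system log (for example, with `journalctl`) for errors around the time the guest stopped
- Check the node hosting the guest for hardware or resource problems, such as failing disks, full storage, or the host's out-of-memory killer terminating the guest process
- Verify that the Proxmox node itself is healthy, including its cluster membership (`pvecm status`) if it belongs to a cluster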
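The first diagnosis step can start from Prometheus itself: running the alert's expression as an instant query lists every guest the rule currently considers down. Below is a minimal Python sketch using only the standard library and the Prometheus HTTP API endpoint `/api/v1/query`. The server address is a placeholder, `fetch_down_guests` and `summarize` are hypothetical helpers, and the mapping from the `type` label to the `qm`/`pct` status commands assumes the label carries Proxmox's own type names (`qemu`, `lxc`). The VM ID is not among the alert's labels, so the suggested commands keep a `<vmid>` placeholder.

```python
import json
import urllib.parse
import urllib.request

# Placeholder address; point this at your own Prometheus server.
PROMETHEUS_URL = "http://prometheus.example:9090"
# Same expression as the alert rule.
QUERY = 'proxmox_guest_up{tags!~"standby"} == 0'

# Assumption: the exporter's type label uses Proxmox's names "qemu" and "lxc".
STATUS_COMMAND = {"qemu": "qm status <vmid>", "lxc": "pct status <vmid>"}


def fetch_down_guests(base_url: str = PROMETHEUS_URL) -> dict:
    """Run the alert expression as an instant query via GET /api/v1/query."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": QUERY})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(response: dict) -> list[str]:
    """Turn an instant-query response into one sorted line per down guest."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error', 'unknown error')}")
    lines = []
    for series in response["data"]["result"]:
        labels = series["metric"]
        guest_type = labels.get("type", "unknown")
        hint = STATUS_COMMAND.get(guest_type, "check the web interface")
        lines.append(
            f"{labels.get('name', '?')} ({guest_type}) on node {labels.get('node', '?')}: "
            f"run `{hint}` on that node"
        )
    return sorted(lines)
```

Call `summarize(fetch_down_guests())` from a host that can reach Prometheus. Keeping the HTTP call separate from the parsing lets `summarize` be checked against a saved query response.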
Mitigation
To mitigate the issue, follow these steps:
- Start the guest from the Proxmox web interface or the command line (`qm start <vmid>` for a virtual machine, `pct start <vmid>` for a container)
- If the guest does not start, migrate it to another node (`qm migrate <vmid> <target-node>` or `pct migrate <vmid> <target-node>`) or reboot the node it runs on
- If the issue persists, restore the guest from a backup or rebuild it from scratch
- Verify that all services and applications running on the guest are working correctly
- Take preventative measures to avoid similar issues in the future, such as:
  - Regularly backing up guest machines
  - Implementing high availability and redundancy for critical guest machines
  - Monitoring guest machine performance and logs for signs of issues
  - Performing regular maintenance and updates on the Proxmox node and guest machines
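The start and migrate steps above map onto Proxmox's `qm` (virtual machines) and `pct` (containers) tools. Here is a minimal Python sketch that turns them into an ordered plan of which command to run on which node. It assumes the `type` label values are `qemu` and `lxc`, and that you have looked up the VM ID in the web interface, since the alert labels do not carry it. `mitigation_plan` is a hypothetical helper that only returns commands; it executes nothing.

```python
import shlex
from typing import Optional

# Assumption: the exporter's type label uses Proxmox's names "qemu" and "lxc".
CLI_TOOL = {"qemu": "qm", "lxc": "pct"}


def mitigation_plan(guest_type: str, vmid: int, node: str,
                    target_node: Optional[str] = None) -> list[tuple[str, str]]:
    """Return (node, command) pairs to run in order; a dry run that executes nothing."""
    tool = CLI_TOOL.get(guest_type)
    if tool is None:
        raise ValueError(f"unsupported guest type: {guest_type!r}")
    # Step 1: try to start the guest on the node that currently hosts it.
    plan = [(node, f"{tool} start {vmid}")]
    if target_node is not None:
        # Step 2: migrate the stopped guest; the command runs on the source node ...
        plan.append((node, f"{tool} migrate {vmid} {shlex.quote(target_node)}"))
        # ... then start it on the target node.
        plan.append((target_node, f"{tool} start {vmid}"))
    return plan


if __name__ == "__main__":
    for host, cmd in mitigation_plan("qemu", 101, "pve1", target_node="pve2"):
        print(f"[{host}] {cmd}")
```

Running the example prints `[pve1] qm start 101`, `[pve1] qm migrate 101 pve2`, and `[pve2] qm start 101`. Returning commands instead of running them keeps an operator in the loop, which suits a critical alert where a wrong migration target can make things worse.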