ProxmoxGuestTargetLost #
Guest {{ $labels.name }} of type {{ $labels.type }} on node {{ $labels.node }} may be down
Alert Rule
alert: ProxmoxGuestTargetLost
annotations:
  description: Guest {{ $labels.name }} of type {{ $labels.type }} on node {{ $labels.node }} may be down
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/proxmox-exporter/proxmoxguesttargetlost/
  summary: Proxmox guest up metric absent for {{ $labels.name }}
expr: |
  absent_over_time(proxmox_guest_up[1h])
for: 1m
labels:
  severity: critical
Here is a sample runbook for the Prometheus alert rule ProxmoxGuestTargetLost:
Meaning #
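The ProxmoxGuestTargetLost alert fires when Prometheus has recorded no proxmox_guest_up samples for 1 hour and that condition has persisted for 1 minute. A missing metric means the guest's state can no longer be observed: the guest may be down, or the exporter that reports on it may have stopped responding. The alert has critical severity and requires immediate attention to prevent downtime and data loss.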
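The "absent for 1 hour" condition can be illustrated with a small Python sketch. This is not how Prometheus implements absent_over_time; it is a simplified model that assumes sample timestamps in seconds and treats the window as open on the left and closed on the right. The function name and parameters are hypothetical.

```python
from typing import Iterable


def absent_over_window(sample_times: Iterable[float], now: float,
                       window_s: float = 3600.0) -> bool:
    """Return True when no sample timestamp falls inside (now - window_s, now].

    Simplified model of the rule's absent_over_time(...[1h]) check:
    the alert condition is met only if the whole window is empty.
    """
    return not any(now - window_s < t <= now for t in sample_times)
```

A single sample anywhere inside the window is enough to keep the condition false, which is why the alert only becomes pending after a full hour without data.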
Impact #
The impact is high because the guest may be unavailable, which can result in:
- Downtime and loss of productivity for users relying on the virtual machine
- Potential data loss or corruption if the virtual machine was not shut down cleanly
- Increased workload for IT teams to troubleshoot and resolve the issue
Diagnosis #
To diagnose the issue, follow these steps:
- Check the Proxmox web interface to verify the status of the guest virtual machine.
- Review the Proxmox logs to identify any errors or issues related to the guest virtual machine.
- Check the node’s system logs to identify any underlying hardware or software issues.
- Verify that the Proxmox node is reachable and responding to requests.
- Check the network connectivity between the Proxmox node and the guest virtual machine.
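Alongside the manual checks above, it can help to ask Prometheus directly what it last saw. The sketch below builds an instant-query URL for the Prometheus HTTP API (/api/v1/query) and parses a response body to list guests whose proxmox_guest_up value is 0. It assumes that 1 means running and 0 means not running, and the base URL in the usage example is a placeholder; the helper names are hypothetical. Fetching the URL is left to whatever HTTP client you already use.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def guests_reporting_down(response_body: str) -> List[Dict[str, str]]:
    """Return the label sets of series whose latest value is 0.

    Expects the JSON body of an instant query that returned a vector.
    Assumption: proxmox_guest_up is 1 for a running guest and 0 otherwise.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # value arrives as a string
        if float(value) == 0.0:
            down.append(series["metric"])
    return down


# Example: URL for querying the metric on a placeholder Prometheus host.
url = build_query_url("http://prometheus.example:9090", "proxmox_guest_up")
```

An empty result for proxmox_guest_up is itself informative: it confirms that the metric is missing altogether, which points at the exporter or scrape target rather than at a single guest.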
Mitigation #
To mitigate the issue, follow these steps:
- Attempt to restart the guest virtual machine from the Proxmox web interface.
- If the guest virtual machine cannot be restarted, try to migrate it to a different node.
- If migration is not possible, restore the guest virtual machine from a backup (if available).
- Investigate and resolve any underlying issues causing the guest virtual machine to become unavailable.
- Implement proactive monitoring and alerting to prevent similar issues in the future.
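For operators who prefer the command line to the web interface, the first mitigation steps can be sketched as a helper that assembles Proxmox CLI commands in escalation order. It assumes the alert's type label is either "qemu" (managed with qm) or "lxc" (managed with pct), and that you have looked up the guest's numeric ID yourself, since the alert labels shown here carry only the name, type, and node. The function is hypothetical and only builds the commands; it does not run them. Restoring from backup is left out because the archive location is site-specific.

```python
from typing import List


def mitigation_commands(guest_type: str, vmid: int,
                        target_node: str) -> List[List[str]]:
    """Build status, start, and migrate commands for a Proxmox guest.

    Assumption: guest_type is "qemu" (uses `qm`) or "lxc" (uses `pct`).
    The commands are returned in the order they would be tried.
    """
    tool = {"qemu": "qm", "lxc": "pct"}.get(guest_type)
    if tool is None:
        raise ValueError(f"unsupported guest type: {guest_type!r}")
    guest_id = str(vmid)
    return [
        [tool, "status", guest_id],                 # confirm the current state
        [tool, "start", guest_id],                  # attempt a restart
        [tool, "migrate", guest_id, target_node],   # fall back to another node
    ]
```

Returning argument lists rather than shell strings keeps the output safe to pass to subprocess.run on the Proxmox node once each step has been reviewed.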
Remember to update the runbook with specific steps and procedures relevant to your organization’s infrastructure and policies.