ProxmoxCertificateExpiring #
The certificate with subject {{ $labels.subject }} on that node is expiring in {{ $value }} days
Alert Rule
alert: ProxmoxCertificateExpiring
annotations:
  description: The certificate with subject {{ $labels.subject }} on that node is
    expiring in {{ $value }} days
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/proxmox-exporter/proxmoxcertificateexpiring/
  summary: Proxmox certificate on node {{ $labels.node }} is expiring in a week
expr: "proxmox_node_days_until_cert_expiration < 7 \n"
for: 5m
labels:
  severity: critical
Meaning #
This alert fires when proxmox_node_days_until_cert_expiration stays below 7 for at least 5 minutes, meaning a certificate on a Proxmox node expires in less than 7 days. The threshold is configurable via the threshold_ProxmoxCertificateExpiring value.
The alert is critical and requires immediate attention to avoid service disruption.
Impact #
If the certificate is not renewed, it will expire and cause disruptions to the Proxmox node and its associated services. This can lead to:
- Loss of connectivity to the node
- Inability to manage the node or its resources
- Potential security risks due to an invalid or expired certificate
Diagnosis #
To diagnose the issue, follow these steps:
- Check the Proxmox node’s certificate expiration date using the Proxmox web interface or the pvenode command-line tool (for example, pvenode cert info).
- Verify that the certificate subject matches the one reported in the alert.
- Check the node’s system logs for any certificate-related errors or warnings.
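The expiry check above can also be done by hand from a certificate's notAfter date. The following is a minimal Python sketch, not the exporter's own logic: it assumes the date comes from something like openssl x509 -noout -enddate run against the node's certificate file (on a default install this is usually /etc/pve/local/pve-ssl.pem, which is an assumption here), and the helper names days_until_expiry and is_expiring_soon are hypothetical. The exporter may round the value differently.

```python
import ssl
import time
from typing import Optional

THRESHOLD_DAYS = 7  # mirrors the "< 7" comparison in the alert expression


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days left before an OpenSSL-style notAfter timestamp."""
    if now is None:
        now = time.time()
    # Accept the raw "notAfter=..." line printed by openssl as well.
    timestamp = not_after.strip()
    if timestamp.startswith("notAfter="):
        timestamp = timestamp[len("notAfter="):]
    # cert_time_to_seconds expects a string such as "Jan  5 09:34:43 2018 GMT".
    expires_at = ssl.cert_time_to_seconds(timestamp)
    return (expires_at - now) / 86400


def is_expiring_soon(not_after: str, now: Optional[float] = None) -> bool:
    """True when the certificate would satisfy the alert condition."""
    return days_until_expiry(not_after, now) < THRESHOLD_DAYS


# Usage (paste the real notAfter line from the node):
#   is_expiring_soon("notAfter=Jan  5 09:34:43 2030 GMT")
```

Passing now explicitly makes the check reproducible, which helps when comparing the result with the value reported in the alert.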
Mitigation #
To mitigate the issue, follow these steps:
- Renew the certificate on the affected Proxmox node using the Proxmox web interface or the pvenode command-line tool (for ACME-issued certificates, pvenode acme cert renew).
- Verify that the new certificate is valid and has a sufficient lifetime.
- Update any relevant configurations or dependencies that rely on the certificate.
- Confirm that the alert resolves in Prometheus (it clears on its own once the expression is no longer true) and that the proxmox_node_days_until_cert_expiration metric returns a value at or above the threshold.
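The final verification step can be scripted against the Prometheus HTTP API. This is a rough sketch under stated assumptions: PROMETHEUS_URL is a hypothetical address, the function names are illustrative, and the node and subject labels are taken from the alert annotations rather than confirmed against the exporter.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # hypothetical address, adjust as needed
METRIC = "proxmox_node_days_until_cert_expiration"
THRESHOLD_DAYS = 7  # mirrors the "< 7" comparison in the alert expression


def series_below_threshold(payload: dict, threshold: float = THRESHOLD_DAYS) -> list:
    """Return (labels, days) pairs from an instant-query response still below threshold."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    failing = []
    for series in payload["data"]["result"]:
        # Each instant-vector sample is [unix_timestamp, "value-as-string"].
        days = float(series["value"][1])
        if days < threshold:
            failing.append((series["metric"], days))
    return failing


def fetch_metric(base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query for the metric via /api/v1/query."""
    query = urllib.parse.urlencode({"query": METRIC})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{query}", timeout=10) as resp:
        return json.load(resp)


# Usage (requires a reachable Prometheus server):
#   for labels, days in series_below_threshold(fetch_metric()):
#       print(f"still expiring: {labels} ({days:.0f} days left)")
```

An empty result from series_below_threshold means no series currently satisfies the alert condition, so the alert should resolve after the next rule evaluation.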
The runbook annotation in the alert rule should point to this document so that responders can reach it directly from the alert.