ProxmoxCertificateExpiring

The certificate with subject {{ $labels.subject }} on that node is expiring in {{ $value }} days

Alert Rule
alert: ProxmoxCertificateExpiring
annotations:
  description: The certificate with subject {{ $labels.subject }} on that node is
    expiring in {{ $value }} days
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/proxmox-exporter/proxmoxcertificateexpiring/
  summary: Proxmox certificate on node {{ $labels.node }} is expiring in a week
expr: proxmox_node_days_until_cert_expiration < 7
for: 5m
labels:
  severity: critical

Meaning #

This alert fires when the certificate on a Proxmox node has fewer than 7 days left before it expires, that is, when proxmox_node_days_until_cert_expiration stays below 7 for 5 minutes. The threshold is configurable via the threshold_ProxmoxCertificateExpiring value. The alert has severity critical: renew the certificate before it expires to avoid service disruption.
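
To see which certificates currently match the alert expression, the same comparison can be sent to the Prometheus HTTP API. This is a minimal sketch: the PROM_URL value and the expiring_certs helper name are assumptions for illustration, not part of the alert rule.

```shell
#!/usr/bin/env bash
# PROM_URL is an assumption: point it at your own Prometheus server.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# expiring_certs (hypothetical helper): ask the Prometheus HTTP API for every
# series currently below the given threshold in days (default 7).
expiring_certs() {
  local threshold="${1:-7}"
  curl -sG "${PROM_URL}/api/v1/query" \
    --data-urlencode "query=proxmox_node_days_until_cert_expiration < ${threshold}"
}

# Example: list certificates with fewer than 14 days left.
# expiring_certs 14
```

The JSON response lists each matching series with its labels, so the node and subject values can be compared with those in the alert.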

Impact #

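If the certificate is not renewed before it expires, TLS connections to the Proxmox node and to the services that depend on it will start to fail. This can lead to:

  • Loss of connectivity to the node for clients that validate certificates, such as browsers, API clients, and automation
  • Inability to manage the node or its resources through the web interface or API
  • Security risks if users or tools work around the problem by ignoring certificate warnings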

Diagnosis #

To diagnose the issue, follow these steps:

  1. Check the certificate’s expiration date in the Proxmox web interface (node → System → Certificates) or on the node with the pvenode command-line tool (pvenode cert info).
  2. Verify that the certificate subject matches the one reported in the alert.
  3. Check the node’s system logs for any certificate-related errors or warnings.
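
The checks above can be sketched as a few shell helpers run on the affected node. The helper names days_left and cert_days_left are hypothetical, and the certificate paths assume a default Proxmox VE installation; adjust them if your setup differs.

```shell
#!/usr/bin/env bash
# days_left (hypothetical helper): whole days from now until an openssl-style
# notAfter date such as "Jun  1 12:00:00 2030 GMT". Relies on GNU date.
days_left() {
  local end_s now_s
  end_s=$(date -u -d "$1" +%s) || return 1
  now_s=$(date -u +%s)
  echo $(( (end_s - now_s) / 86400 ))
}

# cert_days_left (hypothetical helper): days until the PEM certificate in
# file $1 expires, based on the notAfter field printed by openssl.
cert_days_left() {
  local not_after
  not_after=$(openssl x509 -noout -enddate -in "$1" | cut -d= -f2)
  [ -n "$not_after" ] || return 1
  days_left "$not_after"
}

# On the affected node (paths assume a default Proxmox VE install):
# pvenode cert info
# openssl x509 -noout -subject -enddate -in /etc/pve/local/pve-ssl.pem
# cert_days_left /etc/pve/local/pveproxy-ssl.pem   # present only with a custom or ACME certificate
# journalctl -u pveproxy --since "1 day ago"
```

The subject printed by openssl should match the subject label in the alert, and the number from cert_days_left should roughly match the alert's value.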

Mitigation #

To mitigate the issue, follow these steps:

  1. Renew the certificate on the affected Proxmox node using the Proxmox web interface or the pvenode command-line tool (for ACME-managed certificates, pvenode acme cert renew).
  2. Verify that the new certificate is valid and has a sufficient lifetime.
  3. Update any configurations or dependencies that rely on the certificate, such as clients or reverse proxies that pin the old certificate or its fingerprint.
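  4. Confirm that the proxmox_node_days_until_cert_expiration metric reports a value of 7 or more (the threshold) after the next scrape. Prometheus alerts cannot be cleared by hand; the alert resolves on its own once the expression no longer matches.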
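
The renewal step could look roughly like the sketch below. Which command applies depends on how the certificate was issued, so treat the two modes as assumptions about your setup; renew_node_cert is a hypothetical helper name, and the explicit pveproxy restart is a cautious extra that may be unnecessary on some versions.

```shell
#!/usr/bin/env bash
# renew_node_cert (hypothetical helper): renew the node certificate, then
# restart pveproxy so the web interface and API serve the new one.
#   acme         renew an ACME-issued certificate (for example Let's Encrypt)
#   self-signed  regenerate the default certificate signed by the cluster CA
renew_node_cert() {
  case "${1:-}" in
    acme)        pvenode acme cert renew ;;
    self-signed) pvecm updatecerts --force ;;
    *) echo "usage: renew_node_cert acme|self-signed" >&2; return 2 ;;
  esac && systemctl restart pveproxy
}

# Afterwards, confirm the new expiry date:
# pvenode cert info
```

A manually uploaded certificate is not covered by either mode; it has to be replaced with a newly issued one instead.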
