OxidizedBackupVeryOldCritical
Device {{ $labels.name }} has not been backed up in over 7 days!
Alert Rule
alert: OxidizedBackupVeryOldCritical
annotations:
  description: |
    Device {{ $labels.name }} has not been backed up in over 7 days!
    Last backup: {{ $value | humanizeDuration }} ago
    Immediate attention required.
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/oxidized/oxidizedbackupveryoldcritical/
  summary: 'CRITICAL: No recent backup for {{ $labels.name }}'
expr: |
  time() - oxidized_device_last_backup_end > 86400 * 7
  and on(name) oxidized_device_status == 2
for: 60m
labels:
  severity: critical
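The threshold arithmetic in the rule can be checked with a small sketch. It mirrors the `expr` condition for a single device sample: the backup age must exceed 86400 * 7 = 604800 seconds and the status metric must equal 2. The function name `condition_holds` and its parameters are hypothetical, chosen only for illustration. The sketch does not model `for: 60m`, which additionally requires the condition to hold continuously for 60 minutes before the alert fires.

```python
import time

STALE_AFTER_SECONDS = 86400 * 7  # 604800 s, the threshold used in expr
REQUIRED_STATUS = 2              # value compared against oxidized_device_status


def condition_holds(last_backup_end, status, now=None):
    """Mirror the rule's expr for one device (hypothetical helper).

    last_backup_end: Unix timestamp from oxidized_device_last_backup_end
    status:          current value of oxidized_device_status
    now:             Unix timestamp to evaluate at; defaults to the current time
    """
    if now is None:
        now = time.time()
    age = now - last_backup_end  # this is what $value holds in the annotation
    return age > STALE_AFTER_SECONDS and status == REQUIRED_STATUS
```

For example, a device whose last backup ended 8 days ago with a status of 2 satisfies the condition, while one backed up 6 days ago does not.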
Meaning
The OxidizedBackupVeryOldCritical alert fires when a device's most recent backup finished more than 7 days (604,800 seconds) ago, the device's oxidized_device_status metric equals 2, and both conditions have held for 60 minutes. Oxidized is a tool for backing up network device configurations. The alert indicates that backups for a specific device have failed or been skipped for an extended period, so a device failure or misconfiguration could not be recovered from a recent configuration.
Impact
The impact is significant: the affected network device has no configuration backup from the past week. If the device fails or is misconfigured, it may have to be restored from a stale configuration or rebuilt by hand, which extends downtime and risks losing any changes made since the last successful backup. This can have serious consequences for business operations, network reliability, and security.
Diagnosis
To diagnose the issue, follow these steps:
- Check Oxidized logs: Review the Oxidized logs for the affected device to find the cause of the backup failure. Common causes include connectivity problems, authentication failures, and device configuration errors.
- Verify device connectivity: Ensure that the device is reachable from the Oxidized host and accepts the connections Oxidized makes to it.
- Confirm device configuration: Verify that nothing on the device has changed in a way that prevents backups, for example its credentials or access restrictions.
- Check Oxidized configuration: Review the Oxidized configuration to ensure that the device is still listed and correctly set up for backup.
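Before working through the steps above, it can help to list every device that currently exceeds the threshold. The following is a minimal sketch that assumes a Prometheus server reachable over its standard HTTP API (`/api/v1/query`). The base URL passed to `fetch_stale_devices` is deployment-specific, and the helper names are hypothetical. The query reuses the first half of the alert expression.

```python
import json
import urllib.parse
import urllib.request

# First half of the alert expression; the result value is the backup age in seconds.
QUERY = "time() - oxidized_device_last_backup_end > 86400 * 7"


def build_query_url(base_url, query=QUERY):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": query})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def stale_devices(payload):
    """Return (name, age_in_days) pairs, oldest first, from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    rows = []
    for sample in payload["data"]["result"]:
        name = sample["metric"].get("name", "<unknown>")
        age_seconds = float(sample["value"][1])  # value is [timestamp, "string"]
        rows.append((name, age_seconds / 86400))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fetch_stale_devices(base_url):
    """Query a live Prometheus server (base_url is an assumption of this sketch)."""
    with urllib.request.urlopen(build_query_url(base_url), timeout=10) as resp:
        return stale_devices(json.load(resp))
```

Sorting by age puts the longest-neglected devices first, which is usually the order in which to check logs and connectivity.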
Mitigation
To mitigate the issue, follow these steps:
- Manually trigger a backup: Use Oxidized to trigger an immediate backup of the device so that its current configuration is captured.
- Investigate and resolve the root cause: Identify the root cause of the backup failure and take corrective action to prevent future failures.
- Schedule regular backups: Confirm that the device is backed up on a regular schedule so that its backup does not go stale again.
- Implement monitoring and alerting: Add monitoring and alerting that detects backup failures earlier and notifies teams well before the 7-day threshold is reached.
- Review and update runbooks: Review and update runbooks to ensure that they are current and effective in resolving backup-related issues.
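The manual trigger in the first step can be sketched as follows, assuming the oxidized-web REST interface is enabled and that its `/node/next/<node>` endpoint, which moves a node to the head of the job queue, is available in your version. The base URL (for example `http://127.0.0.1:8888`) and the helper names are assumptions for illustration; check your own Oxidized configuration for the actual listen address.

```python
import urllib.parse
import urllib.request


def next_url(base_url, node_name):
    """Build the URL that asks oxidized-web to fetch a node next (hypothetical helper)."""
    # Percent-encode characters in the node name that are not URL-safe.
    return base_url.rstrip("/") + "/node/next/" + urllib.parse.quote(node_name)


def trigger_backup(base_url, node_name, timeout=10):
    """Send the request to a live oxidized-web instance and return the HTTP status code."""
    with urllib.request.urlopen(next_url(base_url, node_name), timeout=timeout) as resp:
        return resp.status
```

A successful HTTP response only means the request was accepted. Confirm in the Oxidized logs, or by watching `oxidized_device_last_backup_end` advance, that the backup itself completed.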