CephMonitorClockSkew
Ceph monitor clock skew detected. Please check NTP and hardware clock settings.
Alert Rule
```yaml
alert: CephMonitorClockSkew
annotations:
  description: |-
    Ceph monitor clock skew detected. Please check ntp and hardware clock settings
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/ceph-internal/cephmonitorclockskew/
  summary: Ceph monitor clock skew (instance {{ $labels.instance }})
expr: abs(ceph_monitor_clock_skew_seconds) > 0.2
for: 2m
labels:
  severity: warning
```
Meaning
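The CephMonitorClockSkew alert fires when the absolute value of ceph_monitor_clock_skew_seconds, the clock skew reported for a Ceph monitor, stays above 0.2 seconds for at least 2 minutes. Negative and positive skew are treated alike. Ceph monitors rely on time-bounded leases to maintain quorum and agree on cluster state, so monitor clocks that drift apart can cause issues with data consistency and cluster stability.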
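To make the 0.2 second threshold and the 2-minute for clause concrete, the Python sketch below replays a series of skew samples through a simplified model of Prometheus rule evaluation. It is an illustration under assumptions, not Prometheus code: the Sample type and the alert_state function are hypothetical, samples are assumed to arrive at regular evaluation times, and details such as stale series are ignored.

```python
from dataclasses import dataclass

# Values taken from the alert rule above.
THRESHOLD_SECONDS = 0.2  # expr: abs(ceph_monitor_clock_skew_seconds) > 0.2
FOR_SECONDS = 120        # for: 2m


@dataclass
class Sample:
    timestamp: float  # evaluation time, in seconds
    skew: float       # observed ceph_monitor_clock_skew_seconds value


def alert_state(samples: list[Sample]) -> str:
    """Replay samples in time order and return the final alert state.

    Simplified model: the condition must be true at every evaluation for at
    least FOR_SECONDS before the alert moves from "pending" to "firing"; one
    evaluation where it is false resets the alert to "inactive".
    """
    state = "inactive"
    pending_since = None  # timestamp at which the condition became true
    for s in samples:
        if abs(s.skew) > THRESHOLD_SECONDS:
            if pending_since is None:
                pending_since = s.timestamp
            held = s.timestamp - pending_since
            state = "firing" if held >= FOR_SECONDS else "pending"
        else:
            pending_since = None
            state = "inactive"
    return state


if __name__ == "__main__":
    # Skew of -0.25 s sampled every 30 s: abs() makes negative skew count too.
    steady = [Sample(t, -0.25) for t in range(0, 121, 30)]
    print(alert_state(steady[:4]))  # held 90 s -> pending
    print(alert_state(steady))      # held 120 s -> firing
```

With a skew of -0.25 s sampled every 30 seconds, the model reports pending after 90 seconds and firing at 120 seconds; a single sample back at or below 0.2 s resets it to inactive.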
Impact
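If left unaddressed, clock skew can lead to:
- Inconsistent views of cluster state, because monitors rely on time-bounded leases when agreeing on cluster maps
- Cluster instability, such as repeated monitor elections or loss of quorum
- Performance degradation
- In severe cases, potential data loss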
Diagnosis
To diagnose the issue, follow these steps:
- Check the NTP (Network Time Protocol) configuration and make sure it is correctly set up and synchronized across all nodes in the cluster.
- Verify the hardware clock settings on each node to ensure they are accurate and consistent.
- Review the Ceph monitor logs for any errors or warnings related to clock skew.
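- Run ceph health detail to see whether Ceph itself reports a MON_CLOCK_SKEW warning and which monitors it names. Ceph raises this warning when skew exceeds mon_clock_drift_allowed (0.05 seconds by default), so it can appear before this alert fires.
- Use the ceph time-sync-status command to check the current clock skew of each monitor.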
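The skew check can be scripted. The Python sketch below parses JSON output from ceph time-sync-status and lists monitors whose absolute skew exceeds a threshold. It assumes the layout shown in the Ceph monitor troubleshooting documentation (a time_skew_status object keyed by monitor name, each entry with a numeric skew field); verify this against your Ceph release. The function name skewed_monitors is hypothetical, and its 0.2 second default mirrors this alert rather than Ceph's own mon_clock_drift_allowed.

```python
import json


def skewed_monitors(status_json: str, threshold: float = 0.2) -> dict[str, float]:
    """Return {monitor_name: skew_seconds} for monitors with |skew| > threshold.

    Assumes the `ceph time-sync-status` JSON layout: a "time_skew_status"
    object keyed by monitor name, each value holding a numeric "skew".
    """
    status = json.loads(status_json)
    result: dict[str, float] = {}
    for mon, info in status.get("time_skew_status", {}).items():
        skew = float(info.get("skew", 0.0))
        if abs(skew) > threshold:
            result[mon] = skew
    return result


if __name__ == "__main__":
    # Hand-written example input in the assumed layout, not real cluster output.
    example = json.dumps({
        "time_skew_status": {
            "a": {"skew": 0.0, "latency": 0.0, "health": "HEALTH_OK"},
            "b": {"skew": -0.31, "latency": 0.001, "health": "HEALTH_WARN"},
        },
        "timechecks": {"epoch": 6, "round": 20, "round_status": "finished"},
    })
    for mon, skew in sorted(skewed_monitors(example).items()):
        print(f"mon.{mon}: skew {skew:+.3f} s")
```

To use it against a live cluster, save the command's output and pass its contents to skewed_monitors; an empty result means no monitor is above the threshold.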
Mitigation
To mitigate the issue, follow these steps:
- Adjust the NTP configuration so that all nodes in the cluster, and the monitor nodes in particular, synchronize against the same reliable time sources.
- Correct any hardware clock settings that are found to be inaccurate or inconsistent.
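- Once the clocks agree, check whether the skew clears on its own; Ceph repeats its time checks periodically, so the warning can take a few minutes to disappear. If it persists, restart the affected Ceph monitors one at a time, waiting for quorum to recover before restarting the next.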
- Monitor the cluster for any further issues related to clock skew and adjust the NTP configuration and hardware clock settings as needed.
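Per-node offsets from NTP can look harmless while the nodes still disagree with each other: one node 0.15 s fast and another 0.1 s slow are 0.25 s apart, which is above this alert's threshold. The Python sketch below assumes chrony is the time service and parses the "System time" line of chronyc tracking output (the line format is assumed from typical chrony output; ntpd setups need a different parser). The helpers system_offset and worst_pairwise_skew are hypothetical.

```python
import re

# Matches lines such as "System time     : 0.000001234 seconds slow of NTP time".
# Assumed chrony `chronyc tracking` format; ntpd needs a different parser.
_SYSTEM_TIME = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time",
    re.MULTILINE,
)


def system_offset(tracking_output: str) -> float:
    """Return the node's offset from NTP time in seconds (positive = ahead)."""
    match = _SYSTEM_TIME.search(tracking_output)
    if match is None:
        raise ValueError("no 'System time' line in chronyc tracking output")
    seconds = float(match.group(1))
    return seconds if match.group(2) == "fast" else -seconds


def worst_pairwise_skew(offsets: dict[str, float]) -> float:
    """Largest clock difference between any two nodes: max - min offset."""
    if not offsets:
        return 0.0
    return max(offsets.values()) - min(offsets.values())


if __name__ == "__main__":
    # Hand-written tracking lines for two hypothetical monitor hosts.
    outputs = {
        "mon-a": "System time     : 0.150000000 seconds fast of NTP time\n",
        "mon-b": "System time     : 0.100000000 seconds slow of NTP time\n",
    }
    offsets = {host: system_offset(text) for host, text in outputs.items()}
    print(f"worst pairwise skew: {worst_pairwise_skew(offsets):.3f} s")
```

Comparing the spread between nodes, rather than each node's offset in isolation, is what matters here, since monitor skew is a disagreement between monitor clocks.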
Additional resources: