CephOsdReweighted #
Ceph Object Storage Daemon takes too much time to resize.
Alert Rule
alert: CephOsdReweighted
annotations:
description: |-
Ceph Object Storage Daemon takes too much time to resize.
VALUE = {{ $value }}
LABELS = {{ $labels }}
runbook: https://srerun.github.io/prometheus-alerts/runbooks/ceph-internal/cephosdreweighted/
summary: Ceph OSD reweighted (instance {{ $labels.instance }})
expr: ceph_osd_weight < 1
for: 2m
labels:
severity: warning
Here is the runbook for the CephOsdReweighted alert rule:
Meaning #
The CephOsdReweighted alert rule triggers when a Ceph Object Storage Daemon (OSD) has its weight reduced to less than 1. This means that the OSD is taking too much time to resize, which can lead to performance issues and impact the overall health of the Ceph cluster.
Impact #
The impact of this alert is that the Ceph cluster may experience reduced performance, slower data replication, and increased latency. This can lead to:
- Slower data access times for clients
- Increased risk of data loss or corruption
- Decreased overall cluster reliability
Diagnosis #
To diagnose the root cause of this alert, follow these steps:
- Check the Ceph cluster status using
ceph -s
orceph status
- Verify the OSD’s weight using
ceph osd tree
orceph osd dump
- Check the OSD’s logs for any error messages related to resizing or slow performance
- Verify that the OSD is not overloaded or experiencing high latency
- Check if there are any ongoing maintenance tasks or upgrades that may be causing the issue
Mitigation #
To mitigate the issue, follow these steps:
- Check if the OSD can be restarted or remapped to improve performance
- Verify that the OSD’s disk is not full or experiencing high latency
- Check if there are any configuration issues or misconfigurations that may be causing the slow performance
- Consider increasing the OSD’s weight or adjusting the crush map to improve performance
- If the issue persists, consider consulting the Ceph documentation or seeking assistance from a Ceph expert or support team.
Remember to always follow proper procedures and testing before making any changes to the Ceph cluster.