EtcdNoLeader #
The etcd cluster has no leader.
Alert Rule
alert: EtcdNoLeader
annotations:
  description: |-
    Etcd cluster have no leader
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/etcd-internal/etcdnoleader/
  summary: Etcd no Leader (instance {{ $labels.instance }})
expr: etcd_server_has_leader == 0
for: 0m
labels:
  severity: critical
Meaning #
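The EtcdNoLeader alert fires when an etcd member reports that it knows of no leader, that is, when its etcd_server_has_leader metric is 0. etcd is a distributed key-value store that uses the Raft consensus algorithm: the members elect one leader, and every write goes through that leader, which replicates it to the followers and commits it once a majority of members has acknowledged it. Because the metric is reported per member, the alert can fire for a single isolated member or for every member at once. A member without a leader cannot commit writes or serve linearizable reads; if no member has a leader, the whole cluster stops accepting writes until an election succeeds.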
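To make the alert condition concrete, the sketch below applies the same test as the rule's expr, etcd_server_has_leader == 0, to raw text scraped from each member's /metrics endpoint. The function name, instance names, and sample text are illustrative assumptions; Prometheus itself evaluates the rule, and this only mirrors the comparison.

```python
def members_without_leader(scrapes):
    """Return the instances whose etcd_server_has_leader sample equals 0.

    `scrapes` maps an instance name to the text of its /metrics page
    (Prometheus text exposition format). Names here are illustrative.
    """
    affected = []
    for instance, text in scrapes.items():
        for line in text.splitlines():
            parts = line.split()
            # Skip blank lines and "# HELP" / "# TYPE" comment lines.
            if len(parts) < 2 or parts[0].startswith("#"):
                continue
            # The metric has no labels of its own, so the sample line is
            # simply "etcd_server_has_leader <value>".
            if parts[0] == "etcd_server_has_leader" and float(parts[1]) == 0:
                affected.append(instance)
    return sorted(affected)


# Hypothetical scrape results for a three-member cluster.
SAMPLE_SCRAPES = {
    "etcd-0:2379": "# TYPE etcd_server_has_leader gauge\netcd_server_has_leader 1\n",
    "etcd-1:2379": "# TYPE etcd_server_has_leader gauge\netcd_server_has_leader 0\n",
    "etcd-2:2379": "# TYPE etcd_server_has_leader gauge\netcd_server_has_leader 1\n",
}

if __name__ == "__main__":
    print(members_without_leader(SAMPLE_SCRAPES))
```

In this sample only one member reports 0, which would point at that member being isolated rather than at a cluster-wide loss of leadership.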
Impact #
The impact of this alert is critical. While no leader is available:
- Writes and linearizable reads fail: Raft does not commit changes without a leader, so these requests time out or are rejected. This protects the consistency of the stored data, but the cluster stops accepting changes.
- Cluster instability: if the leader is lost repeatedly, every new election interrupts request processing, and requests fail while it is in progress.
- Service disruptions: clients that rely on etcd, for example the Kubernetes API server, see errors or timeouts until a leader is elected.
Diagnosis #
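To diagnose the issue, follow these steps:
- Check the etcd cluster status using the etcdctl command-line tool, listing every member explicitly in --endpoints: etcdctl endpoint status -w table shows which member, if any, each endpoint considers the leader, and etcdctl endpoint health shows which endpoints respond.
- Verify that all etcd nodes are running and healthy, and count them: a leader can only be elected while a majority of the members (2 of 3, 3 of 5) is up and able to reach each other.
- Check the etcd logs for messages about leader election, a lost leader, or heartbeats that were not sent on time.
- Verify that the network connectivity between the etcd nodes is working on the peer port (2380 by default), including firewall rules and the validity of peer TLS certificates.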
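The status check above can be automated. The following is a minimal sketch that summarizes the output of etcdctl endpoint status -w json, assuming the JSON shape printed by etcdctl v3.4 and v3.5 (a list of objects with "Endpoint" and "Status" keys, where "Status" carries a numeric "leader" member ID). The function name and the sample data are hypothetical, and field names may differ in other versions.

```python
import json


def summarize_endpoint_status(raw_json):
    """Group endpoints by the leader ID each one reports.

    Returns (leaders, leaderless): `leaders` maps a leader member ID to the
    endpoints that report it, `leaderless` lists endpoints that report none.
    """
    leaders = {}
    leaderless = []
    for entry in json.loads(raw_json):
        endpoint = entry.get("Endpoint", "<unknown>")
        # Assumption: a missing or zero "leader" field means the member
        # currently knows of no leader.
        leader_id = entry.get("Status", {}).get("leader", 0)
        if leader_id:
            leaders.setdefault(leader_id, []).append(endpoint)
        else:
            leaderless.append(endpoint)
    return leaders, leaderless


# Hypothetical output for a three-member cluster with one isolated member.
SAMPLE_STATUS = json.dumps([
    {"Endpoint": "10.0.0.1:2379", "Status": {"leader": 1001, "raftTerm": 7}},
    {"Endpoint": "10.0.0.2:2379", "Status": {"leader": 1001, "raftTerm": 7}},
    {"Endpoint": "10.0.0.3:2379", "Status": {"raftTerm": 9}},
])

if __name__ == "__main__":
    leaders, leaderless = summarize_endpoint_status(SAMPLE_STATUS)
    print("leaders:", leaders)
    print("no leader:", leaderless)
```

If every endpoint lands in the leaderless list, the cluster as a whole has lost its leader; if the endpoints disagree about the leader ID, a network partition is a likely cause. Endpoints that are unreachable do not appear in the etcdctl output at all, so compare the result against the expected member list.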
Mitigation #
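To mitigate the issue, follow these steps:
- Check the etcd configuration of each member: the peer URLs and the --initial-cluster member list must be consistent across members, and --heartbeat-interval and --election-timeout should be identical on every member and suited to the network latency between them.
- Verify that all etcd nodes are running with the correct configuration and that there are no network connectivity issues. Once a majority of members is up and can communicate, Raft elects a new leader automatically.
- Do not expect to elect a leader by hand. etcdctl move-leader only transfers leadership away from an existing leader, so it does not help while the cluster has none; the fix is to restore a majority of healthy members.
- If the issue still persists, restart the affected etcd members one at a time. If a majority of members is permanently lost, recover the cluster from a snapshot backup or with --force-new-cluster as described in the etcd disaster recovery documentation, or seek assistance from a qualified administrator.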
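Whether a leader can come back without intervention depends on quorum: Raft needs a majority of the configured members to elect one. The sketch below shows that arithmetic; the function names are illustrative and not part of any etcd tooling.

```python
def quorum_size(cluster_size):
    """Smallest number of members that forms a majority."""
    return cluster_size // 2 + 1


def can_elect_leader(cluster_size, reachable_members):
    """True if enough members can reach each other to win an election."""
    return reachable_members >= quorum_size(cluster_size)


if __name__ == "__main__":
    for size in (1, 3, 4, 5):
        tolerated = size - quorum_size(size)
        print(f"{size} members: quorum {quorum_size(size)}, "
              f"tolerates {tolerated} failure(s)")
```

A three-member cluster with only one member left cannot elect a leader, so restarting that single member does not help; a second member has to be brought back, or the cluster has to be recovered from a backup.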
Remember to refer to the etcd documentation and the official runbook for more detailed information on troubleshooting and mitigating this issue.