KubernetesStatefulsetGenerationMismatch #
StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has failed but has not been rolled back.
Alert Rule
alert: KubernetesStatefulsetGenerationMismatch
annotations:
  description: |-
    StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has failed but has not been rolled back.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/kubestate-exporter/kubernetesstatefulsetgenerationmismatch/
  summary: Kubernetes StatefulSet generation mismatch ({{ $labels.namespace }}/{{ $labels.statefulset }})
expr: kube_statefulset_status_observed_generation != kube_statefulset_metadata_generation
for: 10m
labels:
  severity: critical
Meaning #
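The KubernetesStatefulsetGenerationMismatch alert fires when the observed generation of a StatefulSet (status.observedGeneration) has differed from its expected generation (metadata.generation) for 10 minutes. The expected generation is incremented each time the StatefulSet spec changes, and the observed generation records the most recent generation the StatefulSet controller has processed. A persistent mismatch therefore means the latest spec change has not been picked up, which can happen when a StatefulSet update fails or stalls.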
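As a rough illustration of what the alert expression compares, both values can be read straight from the StatefulSet object. The sketch below assumes kubectl is already configured for the affected cluster; the function name sts_generation_check is hypothetical and not part of any tool.

```shell
# Hypothetical helper: prints both generation values and returns
# non-zero when they differ.
# Usage: sts_generation_check <statefulset_name> <namespace>
sts_generation_check() {
  local name="$1" namespace="$2"
  local expected observed

  # metadata.generation is incremented whenever the spec changes.
  expected="$(kubectl get statefulset "$name" -n "$namespace" \
    -o jsonpath='{.metadata.generation}')"

  # status.observedGeneration is the last generation the controller processed.
  observed="$(kubectl get statefulset "$name" -n "$namespace" \
    -o jsonpath='{.status.observedGeneration}')"

  echo "expected=${expected} observed=${observed}"
  [ "$expected" = "$observed" ]
}
```

Calling sts_generation_check <statefulset_name> <namespace> prints the two numbers, and the exit status can be used in a script to branch on a mismatch.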
Impact #
If left unresolved, a StatefulSet generation mismatch can lead to:
- Unexpected behavior or errors in the application
- Inconsistent data or state across replicas
- Difficulty in rolling back to a previous version
- Potential data loss or corruption
Diagnosis #
To diagnose the issue, follow these steps:
- Check the StatefulSet’s status using
kubectl describe statefulset <statefulset_name> -n <namespace>
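- Compare the observed generation (status.observedGeneration) with the expected generation (metadata.generation); these are the two values the alert expression compares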
- Check the StatefulSet’s update history using
kubectl rollout history statefulset <statefulset_name> -n <namespace>
- Inspect the Pods’ status and logs for any errors or issues
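The steps above could be run in one pass with a small wrapper such as the following. This is only a sketch: the function name sts_diagnose is hypothetical, kubectl is assumed to be configured for the cluster, and the log step only looks at the first replica, relying on the StatefulSet Pod naming scheme of <statefulset_name>-<ordinal>.

```shell
# Hypothetical helper that runs the diagnostic steps in order.
# Usage: sts_diagnose <statefulset_name> <namespace>
sts_diagnose() {
  local name="$1" namespace="$2"

  echo "== StatefulSet status and events =="
  kubectl describe statefulset "$name" -n "$namespace"

  echo "== Generation and revision fields =="
  kubectl get statefulset "$name" -n "$namespace" \
    -o custom-columns=GENERATION:.metadata.generation,OBSERVED:.status.observedGeneration,CURRENT_REVISION:.status.currentRevision,UPDATE_REVISION:.status.updateRevision

  echo "== Update history =="
  kubectl rollout history statefulset "$name" -n "$namespace"

  echo "== Pods in the namespace =="
  kubectl get pods -n "$namespace" -o wide

  echo "== Recent logs from the first replica =="
  kubectl logs "${name}-0" -n "$namespace" --tail=50
}
```

If CURRENT_REVISION and UPDATE_REVISION differ, a rolling update is still in progress or has stalled, which helps narrow down where to look next.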
Mitigation #
To mitigate the issue, follow these steps:
- Identify the root cause of the generation mismatch (for example, a failed update or network issues)
- Roll back the StatefulSet to a previous version using
kubectl rollout undo statefulset <statefulset_name> -n <namespace>
- Verify the StatefulSet’s status and generation values after the rollback
- Investigate and address any underlying issues that caused the generation mismatch
- Consider implementing additional monitoring and logging to detect similar issues in the future
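The rollback and verification steps might be combined roughly as follows. Treat this as a sketch under assumptions: the function name sts_rollback_and_verify and the default 5m timeout are hypothetical choices, and kubectl is assumed to be configured for the affected cluster.

```shell
# Hypothetical helper: roll back, wait for the rollout to settle,
# then re-check the generation values.
# Usage: sts_rollback_and_verify <statefulset_name> <namespace> [timeout]
sts_rollback_and_verify() {
  local name="$1" namespace="$2" timeout="${3:-5m}"
  local expected observed

  # Revert to the previous revision; stop if the rollback is rejected.
  kubectl rollout undo statefulset "$name" -n "$namespace" || return 1

  # Block until the rollout completes or the timeout expires.
  kubectl rollout status statefulset "$name" -n "$namespace" \
    --timeout="$timeout" || return 1

  expected="$(kubectl get statefulset "$name" -n "$namespace" \
    -o jsonpath='{.metadata.generation}')"
  observed="$(kubectl get statefulset "$name" -n "$namespace" \
    -o jsonpath='{.status.observedGeneration}')"

  echo "generation=${expected} observedGeneration=${observed}"
  # Succeed only when the controller has caught up with the spec.
  [ "$expected" = "$observed" ]
}
```

A rollback is itself a spec change, so the expected generation increases again; the check passes once the controller has processed that new generation.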