KubernetesDaemonsetRolloutStuck #
Some Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled or not ready
Alert Rule
alert: KubernetesDaemonsetRolloutStuck
annotations:
  description: |-
    Some Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled or not ready
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/kubestate-exporter/kubernetesdaemonsetrolloutstuck/
  summary: Kubernetes DaemonSet rollout stuck ({{ $labels.namespace }}/{{ $labels.daemonset }})
expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100 or kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0
for: 10m
labels:
  severity: warning
Meaning #
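The “KubernetesDaemonsetRolloutStuck” alert fires when, for 10 minutes, a DaemonSet either has fewer ready Pods than desired (kube_daemonset_status_number_ready divided by kube_daemonset_status_desired_number_scheduled is below 100%) or has fewer Pods scheduled than desired (kube_daemonset_status_desired_number_scheduled exceeds kube_daemonset_status_current_number_scheduled). Either condition means that some Pods of the DaemonSet are not scheduled or not ready, which prevents a rollout from completing. The expression does not check whether an update is actually in progress, so the alert can also fire when Pods become unhealthy outside of a rollout.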
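To make the alert condition concrete, the following is a minimal Python sketch of the arithmetic in the `expr` field. The function name `rollout_stuck` and its parameters are illustrative and not part of Prometheus or kube-state-metrics, and the sketch ignores the `for: 10m` clause, which additionally requires the condition to hold for 10 minutes before the alert fires.

```python
import math


def rollout_stuck(ready: int, desired: int, current: int) -> bool:
    """Evaluate the alert expression for one DaemonSet (hypothetical helper).

    ready   ~ kube_daemonset_status_number_ready
    desired ~ kube_daemonset_status_desired_number_scheduled
    current ~ kube_daemonset_status_current_number_scheduled
    """
    # First branch: ready / desired * 100 < 100.
    # With desired == 0, PromQL division yields NaN or Inf, neither of which
    # is below 100; NaN is used here so the comparison is likewise false.
    ready_pct = ready / desired * 100 if desired else math.nan
    not_all_ready = ready_pct < 100

    # Second branch: desired - current > 0.
    not_all_scheduled = desired - current > 0

    return not_all_ready or not_all_scheduled


# Made-up sample values: one of five Pods is not ready.
print(rollout_stuck(ready=4, desired=5, current=5))
```

A DaemonSet with 5 desired, 5 scheduled, and 5 ready Pods does not match either branch, while a single missing or unready Pod is enough to match.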
Impact #
A stuck DaemonSet rollout leaves some nodes without a ready Pod of the DaemonSet, which can affect the overall health and performance of the Kubernetes cluster. It can lead to:
- Inconsistent service availability
- Increased latency and errors
- Resource waste and inefficiency
- Difficulty in rolling out new features or fixes
Diagnosis #
To diagnose the issue, follow these steps:
- Check the DaemonSet status: Run
kubectl describe daemonset <daemonset_name> -n <namespace>
to view the current status of the DaemonSet, including its Pod counts and recent events.
- Identify the stuck Pods: Run
kubectl get pods -l <daemonset_label> -n <namespace> -o wide
to list the Pods associated with the DaemonSet, together with the node each one runs on, and identify which ones are not scheduled or not ready.
- Check Pod logs: Run
kubectl logs <pod_name> -n <namespace>
to view the logs of the stuck Pods and identify any error messages or issues. Pods that were never scheduled or never started have no logs; for those, run
kubectl describe pod <pod_name> -n <namespace>
and check the events.
- Verify node availability: Run
kubectl get nodes
to check if there are any issues with the nodes that are preventing Pods from being scheduled.
- Check for resource constraints: Verify that there are sufficient resources (e.g. CPU, memory) available on the nodes to schedule the Pods.
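The step of identifying stuck Pods can be sketched in Python by inspecting the JSON that `kubectl get pods -l <daemonset_label> -n <namespace> -o json` returns. This is an illustration only: the function name `unhealthy_pods` and the sample data are made up, and the sketch relies solely on the standard `PodScheduled` and `Ready` Pod conditions.

```python
from typing import Dict, List, Tuple


def unhealthy_pods(pod_list: Dict) -> List[Tuple[str, str]]:
    """Return (pod name, reason) for Pods that are not scheduled or not ready.

    `pod_list` is the parsed JSON of a Kubernetes PodList, for example the
    output of `kubectl get pods ... -o json` (hypothetical helper).
    """
    results = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        # Map condition type -> status string ("True", "False", "Unknown").
        conditions = {
            c["type"]: c["status"]
            for c in pod.get("status", {}).get("conditions", [])
        }
        if conditions.get("PodScheduled") != "True":
            results.append((name, "not scheduled"))
        elif conditions.get("Ready") != "True":
            results.append((name, "not ready"))
    return results


# Made-up sample resembling a PodList with three DaemonSet Pods.
sample_pod_list = {
    "items": [
        {"metadata": {"name": "agent-a"},
         "status": {"conditions": [
             {"type": "PodScheduled", "status": "True"},
             {"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "agent-b"},
         "status": {"conditions": [
             {"type": "PodScheduled", "status": "True"},
             {"type": "Ready", "status": "False"}]}},
        {"metadata": {"name": "agent-c"},
         "status": {"conditions": [
             {"type": "PodScheduled", "status": "False"}]}},
    ]
}

for pod_name, reason in unhealthy_pods(sample_pod_list):
    print(pod_name, reason)
```

Against a real cluster, the same function could be fed `json.load(sys.stdin)` with the kubectl output piped in. The reason then tells you where to look next: logs for Pods that are not ready, events for Pods that are not scheduled.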
Mitigation #
To mitigate the issue, follow these steps:
- Investigate and resolve any underlying issues: Address any issues identified during diagnosis, such as node availability or resource constraints.
- Force rollout: Run
kubectl rollout restart daemonset <daemonset_name> -n <namespace>
to restart the DaemonSet's Pods and retry the rollout.
- Check and adjust DaemonSet configuration: Verify that the DaemonSet configuration is correct and adjust as needed.
- Monitor DaemonSet status: Continuously monitor the DaemonSet status using
kubectl describe daemonset <daemonset_name> -n <namespace>
to ensure the rollout is progressing as expected.
- Consider rolling back: If the issue persists, consider rolling back the DaemonSet to its previous revision by running
kubectl rollout undo daemonset <daemonset_name> -n <namespace>
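The monitoring step above can also be automated. The sketch below decides from the JSON of `kubectl get daemonset <daemonset_name> -n <namespace> -o json` whether a rollout has finished. The function name `rollout_complete` is hypothetical, and the logic is an approximation based on the DaemonSet status fields `observedGeneration`, `desiredNumberScheduled`, `updatedNumberScheduled`, and `numberAvailable`, not a copy of any kubectl implementation.

```python
from typing import Dict


def rollout_complete(daemonset: Dict) -> bool:
    """Approximate check that a DaemonSet rollout has finished (hypothetical helper).

    `daemonset` is the parsed JSON of a single DaemonSet object.
    """
    status = daemonset.get("status", {})
    generation = daemonset.get("metadata", {}).get("generation", 0)

    # The controller has not yet processed the latest spec change.
    if status.get("observedGeneration", 0) < generation:
        return False

    desired = status.get("desiredNumberScheduled", 0)
    # These fields are omitted from the API response when zero, hence the defaults.
    updated = status.get("updatedNumberScheduled", 0)
    available = status.get("numberAvailable", 0)

    # Every desired Pod must run the current template and be available.
    return updated >= desired and available >= desired


# Made-up sample: two of three Pods are updated and available.
sample_daemonset = {
    "metadata": {"generation": 4},
    "status": {
        "observedGeneration": 4,
        "desiredNumberScheduled": 3,
        "updatedNumberScheduled": 2,
        "numberAvailable": 2,
    },
}
print(rollout_complete(sample_daemonset))
```

Polling this check at an interval, and stopping once it returns true or a deadline passes, gives a simple way to confirm that a mitigation actually unblocked the rollout.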