KubernetesNodeNetworkUnavailable #
Node {{ $labels.node }} has NetworkUnavailable condition
Alert Rule
alert: KubernetesNodeNetworkUnavailable
annotations:
  description: |-
    Node {{ $labels.node }} has NetworkUnavailable condition
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/kubestate-exporter/kubernetesnodenetworkunavailable/
  summary: Kubernetes Node network unavailable (instance {{ $labels.instance }})
expr: kube_node_status_condition{condition="NetworkUnavailable",status="true"} == 1
for: 2m
labels:
  severity: critical
Meaning #
The KubernetesNodeNetworkUnavailable alert fires when a Kubernetes node has reported the NetworkUnavailable condition with status true for at least 2 minutes, which means the node’s network is not correctly configured. Pods running on the node can become unreachable, which can disrupt the services they back.
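To confirm what the alert is reporting, the condition can be read straight from the node object. The following is a minimal sketch, assuming `kubectl` is configured for the affected cluster; the function name `node_network_unavailable` and the node name `worker-1` are placeholders introduced here, not part of the original runbook.

```shell
# Print the status of the NetworkUnavailable condition for one node.
# Output is "True", "False", or "Unknown"; it is empty if the node does
# not report this condition at all.
# $1 = node name
node_network_unavailable() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].status}'
}

# Example (worker-1 is a placeholder node name):
#   node_network_unavailable worker-1
```

A result of `True` matches the `status="true"` series that the alert expression selects.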
Impact #
- Pods running on the affected node may become unreachable, leading to service disruptions and potential data loss.
- Applications relying on the node’s network connectivity may experience errors or failures.
- Cluster performance and reliability may be impacted if the node is critical to the cluster’s operations.
Diagnosis #
To diagnose the issue, follow these steps:
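- Check the node’s status using `kubectl get node <node_name>`.
- Review the node’s conditions, addresses, and recent events using `kubectl describe node <node_name>`.
- Review the node’s system logs for network-related errors. `kubectl logs` reads container logs from pods and does not accept a node name, so check the kubelet log on the node itself (for example `journalctl -u kubelet` on systemd-based hosts) and the logs of the network plugin’s pod on that node using `kubectl logs -n <namespace> <pod_name>`.
- Check the node’s network interface configuration using `ip addr show` or `ifconfig`.
- On physical hosts, verify that the node’s network cable is connected and functioning properly.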
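The checks above can be collected into two small helpers, one run from a workstation and one run on the node. This is a sketch rather than a definitive procedure: the function names are hypothetical, the pod listing is an extra step added here to help locate the network plugin’s pod, and the on-node helper assumes a systemd host whose kubelet unit is named `kubelet`.

```shell
# Cluster-side checks, run from a machine with kubectl access.
# $1 = node name
diagnose_node_network() {
  echo "== node status =="
  kubectl get node "$1" -o wide
  echo "== conditions and recent events =="
  kubectl describe node "$1"
  echo "== pods scheduled on the node =="
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$1"
}

# On-node checks, run in a shell on the affected node.
# Assumes systemd and a unit named "kubelet".
diagnose_on_node() {
  echo "== interfaces and addresses =="
  ip addr show
  echo "== kubelet log, last 15 minutes =="
  journalctl -u kubelet --since "15 minutes ago" --no-pager
}
```

Calling `diagnose_node_network <node_name>` prints all three views in one pass, which makes it easier to attach the output to an incident ticket.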
Mitigation #
To mitigate the issue, follow these steps:
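- Restart the node’s network interface using `ip link set <interface_name> down && ip link set <interface_name> up`. Prefer a console or out-of-band session for this, because bringing down the interface that carries your SSH connection can cut you off from the node.
- Check if the node’s network configuration is correct and update it if necessary.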
- Verify that the node’s network cable is connected and functioning properly.
- If the issue persists, consider rebooting the node or replacing the network hardware if necessary.
- Once the node’s network is available again, verify that pods are running correctly and services are accessible.
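The final verification step can be sketched as a helper that waits for the condition to clear and then lists the pods on the node. The function name and the 120-second default timeout are assumptions made for illustration; adjust them to your environment.

```shell
# Wait until the node reports NetworkUnavailable=False, then list the pods
# scheduled on it so their status can be reviewed.
# $1 = node name, $2 = timeout (optional, defaults to 120s)
verify_node_network_recovered() {
  kubectl wait --for=condition=NetworkUnavailable=false "node/$1" \
    --timeout="${2:-120s}" || return 1
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$1"
}

# Example (worker-1 is a placeholder node name):
#   verify_node_network_recovered worker-1 5m
```

If the wait times out, the function returns a non-zero status without listing pods, which signals that the node still needs attention.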
Remember to refer to the Kubernetes documentation and your organization’s specific guidelines for troubleshooting and resolving node network issues.