KubernetesNodeMemoryPressure #
Node {{ $labels.node }} has MemoryPressure condition
Alert Rule
```yaml
alert: KubernetesNodeMemoryPressure
annotations:
  description: |-
    Node {{ $labels.node }} has MemoryPressure condition
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/kubestate-exporter/kubernetesnodememorypressure/
  summary: Kubernetes memory pressure (node {{ $labels.node }})
expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
for: 2m
labels:
  severity: critical
```
Meaning #
The KubernetesNodeMemoryPressure alert fires when a Kubernetes node reports the MemoryPressure condition, which the kubelet sets when available memory on the node falls below its eviction threshold. Left unaddressed, this can lead to degraded performance, pod evictions, and even node failure.
Impact #
This alert is critical because memory pressure on a Kubernetes node can:
- Cause the kubelet to evict pods or the kernel to OOM-kill containers, leading to restarts and failures
- Impact the performance of applications running on the node
- Lead to node failures, causing downtime and service disruptions
- Affect the overall stability and reliability of the Kubernetes cluster
Diagnosis #
To diagnose the issue, perform the following steps:
- Check the node’s resource utilization using `kubectl top node` or a monitoring tool like Prometheus (example commands follow this list).
- Verify that the node is not experiencing other issues, such as disk pressure or CPU throttling.
- Review the node’s configuration and resource allocation to ensure that it is suitable for the workloads running on it.
- Investigate if there are any memory-intensive workloads or pods running on the node that may be contributing to the memory pressure.
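A sketch of commands for working through these steps; `worker-1` is a placeholder node name, and the `kubectl top` commands assume metrics-server is installed in the cluster:

```sh
# Confirm the MemoryPressure condition reported by the kubelet
kubectl get node worker-1 -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")]}{"\n"}'

# Node details: conditions, allocatable memory, per-pod requests/limits, recent events
kubectl describe node worker-1

# Current CPU/memory usage of the node (requires metrics-server)
kubectl top node worker-1

# Memory usage per pod across the cluster, highest consumers first
kubectl top pods --all-namespaces --sort-by=memory

# Pods currently scheduled on the affected node
kubectl get pods --all-namespaces --field-selector spec.nodeName=worker-1 -o wide
```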
Mitigation #
To mitigate the issue, perform the following steps:
- Identify and delete unnecessary or idle pods that are consuming excessive memory (example commands follow this list).
- Scale memory-intensive workloads out across more nodes, or reschedule them elsewhere, to reduce the load on the affected node.
- Add more nodes to the cluster or upgrade existing nodes with more memory.
- Set memory requests and limits in pod specifications to prevent over-allocation and make usage predictable for the scheduler (a minimal example manifest follows this list).
- Consider implementing a cluster autoscaler to dynamically adjust the node count based on resource utilization.
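As a sketch of the first two mitigation steps, assuming a surplus pod `example-batch-job` and a deployment `example-app` in namespace `example-ns` (all hypothetical names):

```sh
# Free memory by removing an unneeded or idle pod
kubectl delete pod example-batch-job -n example-ns

# Reduce the replica count of a memory-heavy deployment
kubectl scale deployment example-app -n example-ns --replicas=2
```

For the requests-and-limits step, a minimal container spec fragment; the names, image, and the 256Mi/512Mi values are placeholders to be tuned per workload:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                        # hypothetical pod name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0  # placeholder image
      resources:
        requests:
          memory: "256Mi"   # amount the scheduler reserves on the node
        limits:
          memory: "512Mi"   # container is OOM-killed if it exceeds this
```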
For further guidance, refer to the runbook at https://github.com/srerun/prometheus-alerts/blob/main/content/runbooks/kubestate-exporter/KubernetesNodeMemoryPressure.md