HostKernelVersionDeviations #
Different kernel versions are running
Alert Rule
alert: HostKernelVersionDeviations
annotations:
  description: |-
    Different kernel versions are running
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/node-exporter/hostkernelversiondeviations/
  summary: Host kernel version deviations (instance {{ $labels.instance }})
expr: (count(sum(label_replace(node_uname_info, "kernel", "$1", "release", "([0-9]+.[0-9]+.[0-9]+).*"))
  by (kernel)) > 1) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
for: 6h
labels:
  severity: warning
Here is the runbook for the HostKernelVersionDeviations alert rule:
Meaning #
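This alert fires when the monitored hosts have reported more than one distinct kernel version for 6 hours. Versions are compared on the leading major.minor.patch portion of the release label of node_uname_info, so build suffixes are ignored. Mixed versions can indicate hosts that are missing security patches, or can cause compatibility issues between hosts.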
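To make the comparison concrete, here is a minimal Python sketch of how the expression normalizes release strings before counting them. It reuses the regex from the rule; the sample release strings and the function names `kernel_label` and `distinct_kernels` are hypothetical and only serve the illustration.

```python
import re

# Same pattern as the alert expression. Prometheus anchors label_replace
# regexes at both ends, which re.fullmatch reproduces. The dots are unescaped
# in the rule, so they match any character, not only a literal ".".
KERNEL_RE = re.compile(r"([0-9]+.[0-9]+.[0-9]+).*")


def kernel_label(release: str) -> str:
    """Return the value label_replace would write to the "kernel" label."""
    match = KERNEL_RE.fullmatch(release)
    # When the regex does not match, label_replace leaves the series
    # unchanged, so the "kernel" label stays empty.
    return match.group(1) if match else ""


def distinct_kernels(releases: list[str]) -> set[str]:
    """Distinct normalized versions; the alert condition is len(...) > 1."""
    return {kernel_label(r) for r in releases}


# Hypothetical release strings, for illustration only.
sample = ["5.15.0-91-generic", "5.15.0-94-generic", "6.1.0-18-amd64"]
print(sorted(distinct_kernels(sample)))
```

In this sample the first two hosts differ only in their build suffix, so they count as one version; the third host is what would push the count above 1.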
Impact #
The impact of this issue can be:
- Security vulnerabilities: Hosts left on an older kernel may be missing fixes for known security vulnerabilities.
- Inconsistent behavior: Different kernel versions can lead to inconsistent behavior between hosts, making troubleshooting and maintenance more challenging.
- Performance issues: Hosts on different kernel versions can perform differently under the same workload, which can affect overall cluster performance.
Diagnosis #
To diagnose the issue, follow these steps:
- Check the alert description for the instance and kernel versions affected.
- Verify the kernel versions running on each host using node_uname_info metrics.
- Identify the hosts with inconsistent kernel versions.
- Check the kernel version installed on each host using the uname -r command.
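The diagnosis steps can be sketched as a small Python script that groups hosts by the release label of node_uname_info. This is a sketch under assumptions, not a definitive tool: the server address `http://localhost:9090` is a placeholder, and the helper names `fetch_uname_info`, `hosts_by_release`, and `outliers` are hypothetical. It relies only on the standard response shape of the Prometheus HTTP API instant query endpoint (`/api/v1/query`).

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# Assumed Prometheus address; replace with your own server.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_uname_info(base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query for node_uname_info via /api/v1/query."""
    query = urllib.parse.urlencode({"query": "node_uname_info"})
    url = f"{base_url}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def hosts_by_release(response: dict) -> dict[str, list[str]]:
    """Map each kernel release to the sorted hosts reporting it."""
    groups = defaultdict(list)
    for series in response["data"]["result"]:
        labels = series["metric"]
        # Fall back to the instance label if nodename is missing.
        host = labels.get("nodename") or labels.get("instance", "unknown")
        groups[labels.get("release", "unknown")].append(host)
    return {release: sorted(hosts) for release, hosts in groups.items()}


def outliers(groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every release group except the most common one."""
    if not groups:
        return {}
    majority = max(groups, key=lambda release: len(groups[release]))
    return {r: h for r, h in groups.items() if r != majority}
```

Against a live server, `outliers(hosts_by_release(fetch_uname_info()))` would list the hosts that differ from the most common release. Note that this groups on the full release string, whereas the alert compares only the major.minor.patch prefix.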
Mitigation #
To mitigate the issue, follow these steps:
- Identify the recommended kernel version for the cluster.
- Update the kernel version on each host to the recommended version.
- Verify that all hosts are running the same kernel version using node_uname_info metrics.
- Implement a process to ensure kernel version consistency across the cluster, such as automating kernel updates or implementing a kernel version management policy.
Additionally, keep this alert enabled so that future kernel version deviations are detected.
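The verification step can be sketched as a check of each host's reported release against the chosen target. The inventory below is hypothetical, as is the function name `hosts_needing_update`; in practice the mapping would come from the release label of node_uname_info. Keep in mind that the metric reflects the running kernel, so a host that has installed the new kernel but has not yet rebooted into it will usually still report the old release.

```python
def hosts_needing_update(releases: dict[str, str], target: str) -> list[str]:
    """Return hosts whose reported release differs from the target, sorted."""
    return sorted(host for host, release in releases.items() if release != target)


# Hypothetical inventory: nodename -> release label from node_uname_info.
reported = {
    "node-a": "6.1.0-18-amd64",
    "node-b": "6.1.0-18-amd64",
    "node-c": "5.15.0-91-generic",
}
pending = hosts_needing_update(reported, target="6.1.0-18-amd64")
print(pending or "all hosts on target kernel")
```

An empty result means every host reports the target release, at which point the alert should resolve on its own.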