HostNetworkReceiveErrors #
Host {{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%.0f" $value }} receive errors in the last two minutes.
Alert Rule
alert: HostNetworkReceiveErrors
annotations:
  description: |-
    Host {{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%.0f" $value }} receive errors in the last two minutes.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/node-exporter/hostnetworkreceiveerrors/
  summary: Host Network Receive Errors (instance {{ $labels.instance }})
expr: (rate(node_network_receive_errs_total[2m]) / rate(node_network_receive_packets_total[2m])
  > 0.01) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
for: 2m
labels:
  severity: warning
Meaning #
The HostNetworkReceiveErrors alert is triggered when receive errors on a host’s network interface exceed 1% of the packets received, measured as a ratio of rates over a 2-minute window, and the condition holds for a further 2 minutes (for: 2m). This indicates a potential issue with the host’s network connectivity or configuration.
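The threshold arithmetic can be illustrated with a small Python sketch. It is a simplification, not how Prometheus evaluates the rule: rate() also handles counter resets and extrapolates across the window, which is ignored here. The function name and the sample counter values are hypothetical.

```python
def receive_error_ratio(errs_start, errs_end, pkts_start, pkts_end):
    """Return receive errors as a fraction of received packets between two samples.

    Assumes both counters only increased between the samples (no counter reset).
    Because both deltas cover the same time window, dividing the deltas gives
    the same result as dividing the two per-second rates.
    """
    err_delta = errs_end - errs_start
    pkt_delta = pkts_end - pkts_start
    if pkt_delta <= 0:
        # Simplification: with no received packets, report no signal.
        return 0.0
    return err_delta / pkt_delta


THRESHOLD = 0.01  # the 1% ratio used in the alert expression

# Hypothetical samples taken two minutes apart on one interface.
ratio = receive_error_ratio(errs_start=100, errs_end=400,
                            pkts_start=50_000, pkts_end=70_000)
print(f"ratio={ratio:.4f} alert_condition={ratio > THRESHOLD}")
```

With these made-up numbers, 300 errors against 20,000 packets gives a ratio of 0.015, which is above the 0.01 threshold.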
Impact #
- Receive errors can cause packet loss, leading to decreased network performance and potential application failures.
- High receive error rates can indicate issues with the network interface, cable, or switch, which can impact the entire cluster or application.
- Unaddressed receive errors can lead to increased latency, reduced throughput, and decreased overall system reliability.
Diagnosis #
To diagnose the issue, follow these steps:
- Identify the affected host and interface using the instance and device labels.
- Check the system logs for any error messages related to the network interface or driver.
- Verify the network cable and connection to ensure they are secure and functioning properly.
- Use tools like ethtool (for example, ethtool -S <interface>) or ip link (for example, ip -s link show <interface>) to check the interface configuration and statistics.
- Check for any firmware or driver updates for the network interface.
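As a complement to the steps above, the per-interface receive counters can also be read directly on the host. The following is a minimal sketch, assuming a Linux host that exposes counters under /sys/class/net/<interface>/statistics/. The function names and the "eth0" default are hypothetical, and which counters are populated depends on the driver.

```python
import sys
from pathlib import Path


def read_rx_counters(interface, sysfs_root="/sys/class/net"):
    """Read receive-side counters for one interface from Linux sysfs.

    Counters that are missing or unreadable are skipped, not reported as zero.
    """
    stats_dir = Path(sysfs_root) / interface / "statistics"
    if not stats_dir.is_dir():
        return {}
    counters = {}
    for path in sorted(stats_dir.glob("rx_*")):
        try:
            counters[path.name] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue
    return counters


def nonzero_error_counters(counters):
    """Keep only the counters whose name ends in 'errors' and whose value is above zero."""
    return {name: value for name, value in counters.items()
            if name.endswith("errors") and value > 0}


if __name__ == "__main__":
    # "eth0" is only a placeholder; pass the device label from the alert instead.
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    for name, value in nonzero_error_counters(read_rx_counters(iface)).items():
        print(f"{name}: {value}")
```

A breakdown by counter name (for example rx_crc_errors versus rx_fifo_errors) can help narrow the search to cabling or to host-side buffering, though how to interpret each counter depends on the hardware and driver.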
Mitigation #
To mitigate the issue, follow these steps:
- Restart the network interface using ip link set <interface> down && ip link set <interface> up.
- Check and update the network interface firmware or driver to the latest version.
- Inspect and clean the network cable and connection to ensure they are secure and functioning properly.
- Consider replacing the network cable or interface if it’s faulty.
- If the issue persists, consider escalating to a network administrator or infrastructure team for further assistance.
Consult the node exporter documentation and official runbooks for more detailed troubleshooting and mitigation steps.