HostNetworkTransmitErrors #
Host {{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%.0f" $value }} transmit errors in the last two minutes.
Alert Rule
alert: HostNetworkTransmitErrors
annotations:
  description: |-
    Host {{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%.0f" $value }} transmit errors in the last two minutes.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/node-exporter/hostnetworktransmiterrors/
  summary: Host Network Transmit Errors (instance {{ $labels.instance }})
expr: (rate(node_network_transmit_errs_total[2m]) / rate(node_network_transmit_packets_total[2m])
  > 0.01) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
for: 2m
labels:
  severity: warning
Meaning #
This alert fires when, on any network interface of a host, the rate of transmit errors exceeds 1% of the rate of transmitted packets, measured over a 2-minute window, and the condition persists for 2 minutes. It indicates a problem with the host’s network interface or with the network itself that is causing a significant share of packet transmissions to fail.
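To make the threshold concrete, the following minimal sketch approximates the ratio in the alert expression from two samples of each counter taken 120 seconds apart. The function names are hypothetical and the arithmetic is a simplification: Prometheus's `rate()` also handles counter resets and extrapolation, which this sketch ignores.

```python
import math

# Threshold taken from the alert expression (> 0.01, i.e. 1%).
THRESHOLD = 0.01


def transmit_error_ratio(errs_start, errs_end, pkts_start, pkts_end,
                         window_seconds=120.0):
    """Approximate rate(errs[2m]) / rate(packets[2m]) from two counter samples.

    Hypothetical helper: assumes the counters did not reset inside the window.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    err_rate = (errs_end - errs_start) / window_seconds
    pkt_rate = (pkts_end - pkts_start) / window_seconds
    if pkt_rate == 0:
        # Mirror floating-point division in PromQL: x/0 is +Inf, 0/0 is NaN.
        return math.inf if err_rate > 0 else math.nan
    return err_rate / pkt_rate


def breaches_threshold(ratio):
    """True when the ratio exceeds 1%; NaN never compares greater."""
    return ratio > THRESHOLD


if __name__ == "__main__":
    # 30 errors against 2000 packets in the window: a ratio of about 0.015.
    ratio = transmit_error_ratio(1000, 1030, 50000, 52000)
    print(f"ratio={ratio:.4f} breaches={breaches_threshold(ratio)}")
```

In this example, 30 errors out of 2000 transmitted packets gives a ratio of roughly 1.5%, which is above the 1% threshold. The alert itself would additionally require the condition to hold for the `for: 2m` duration.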
Impact #
The impact of this issue can be significant, as it can cause:
- Packet loss and retransmission, leading to increased latency and decreased network performance
- Increased CPU usage on the host as it attempts to retransmit packets
- Potential application errors or crashes due to network connectivity issues
Diagnosis #
To diagnose the issue, follow these steps:
- Check the host’s network interface configuration and status using tools such as `ip addr show` or `ethtool`.
- Verify that the host’s network cables are securely connected and that there are no issues with the physical network infrastructure.
- Check the system logs for any errors or warnings related to the network interface or device driver.
- Use tools such as `tcpdump` or Wireshark to capture and analyze network traffic to identify any patterns or issues.
- Check the Node Exporter metrics to see if there are any other network-related issues, such as high levels of packet drops or errors.
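As a complement to the steps above, the sketch below reads the kernel's per-interface transmit counters directly on a Linux host, which can help confirm which device is accumulating errors. It assumes the standard sysfs layout (`/sys/class/net/<device>/statistics/tx_errors` and `tx_packets`); the function name is hypothetical, and the assumption that these counters track the Node Exporter metrics should be verified on your system.

```python
from pathlib import Path


def read_tx_counters(sysfs_root="/sys/class/net"):
    """Return {interface: (tx_errors, tx_packets)} read from sysfs.

    Hypothetical helper: entries without readable counters are skipped.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        # Not a Linux host, or sysfs is not mounted at the expected path.
        return {}
    counters = {}
    for iface_dir in sorted(root.iterdir()):
        stats = iface_dir / "statistics"
        try:
            errors = int((stats / "tx_errors").read_text().strip())
            packets = int((stats / "tx_packets").read_text().strip())
        except (OSError, ValueError):
            # Skip non-interface entries and unreadable or malformed files.
            continue
        counters[iface_dir.name] = (errors, packets)
    return counters


if __name__ == "__main__":
    for name, (errors, packets) in read_tx_counters().items():
        print(f"{name}: tx_errors={errors} tx_packets={packets}")
```

Running it twice a couple of minutes apart and comparing the values gives a rough view of whether errors are still increasing on a given interface.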
Mitigation #
To mitigate the issue, follow these steps:
- Restart the network interface or the entire host to reset the network configuration and clear any temporary issues.
- Check and update the network interface drivers to the latest version.
- Review and adjust the network interface configuration to reduce errors, for example by checking that the speed and duplex settings match those of the connected switch port.
- Implement network Quality of Service (QoS) policies to prioritize critical network traffic and reduce congestion.
- Consider replacing or repairing the network interface or physical network infrastructure if hardware issues are suspected.
Remember to investigate and address the root cause of the issue to prevent it from recurring.