HostNetworkInterfaceSaturated
The network interface “{{ $labels.device }}” on “{{ $labels.instance }}” is getting overloaded.
Alert Rule
alert: HostNetworkInterfaceSaturated
annotations:
  description: |-
    The network interface "{{ $labels.device }}" on "{{ $labels.instance }}" is getting overloaded.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/node-exporter/hostnetworkinterfacesaturated/
  summary: Host Network Interface Saturated (instance {{ $labels.instance }})
expr: ((rate(node_network_receive_bytes_total{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"}[1m])
  + rate(node_network_transmit_bytes_total{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"}[1m]))
  / node_network_speed_bytes{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"} > 0.8 < 10000)
  * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
for: 1m
labels:
  severity: warning
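To make the expression's filtering concrete, the following Python sketch evaluates the same condition for a single interface, given per-second receive and transmit rates that have already been computed. It is an illustration under stated assumptions, not a reimplementation of Prometheus: the function name `is_saturated` is hypothetical, the `node_uname_info` join (which only attaches the `nodename` label) is skipped, and `rate()` semantics are not modelled. The `< 10000` bound appears intended to drop nonsensical ratios, such as the +Inf that PromQL produces when the reported speed is 0; the sketch treats division by zero the same way.

```python
import re

# Same exclusion regex as the rule's device!~ matcher. PromQL regex matchers
# are fully anchored, so re.fullmatch mirrors their behaviour.
EXCLUDED_DEVICES = re.compile(r"^tap.*|^vnet.*|^veth.*|^tun.*")


def is_saturated(device: str, rx_bytes_per_s: float, tx_bytes_per_s: float,
                 speed_bytes: float) -> bool:
    """Hypothetical helper: True if this interface would match the rule's expr."""
    if EXCLUDED_DEVICES.fullmatch(device):
        return False  # virtual interfaces are filtered out by the label matcher
    try:
        # Receive and transmit rates are summed, then divided by link speed.
        ratio = (rx_bytes_per_s + tx_bytes_per_s) / speed_bytes
    except ZeroDivisionError:
        # PromQL would yield +Inf or NaN here; both fail the "< 10000" filter.
        return False
    # Chained comparison, equivalent to "> 0.8 < 10000" in the rule.
    return 0.8 < ratio < 10000
```

For example, an `eth0` receiving 60 MB/s and sending 50 MB/s on a link reporting 125,000,000 bytes/s gives a ratio of 0.88, which matches; the same traffic on a `veth…` device is ignored.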
Here is the runbook for the HostNetworkInterfaceSaturated alert:
Meaning
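This alert fires when a network interface on a host is running close to its link capacity, which can indicate network congestion and performance issues. The rule sums the per-second receive and transmit byte rates (computed with `rate()` over a 1-minute window), divides the total by the interface's reported speed (`node_network_speed_bytes`), and fires when that ratio stays above 0.8 (80%) for at least 1 minute. Virtual interfaces whose names start with `tap`, `vnet`, `veth`, or `tun` are excluded.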
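The arithmetic behind the alert can be sketched in Python. This sketch assumes that `node_network_speed_bytes` is the Linux link speed in Mbit/s (as reported in `/sys/class/net/<device>/speed`) converted to bytes per second, and it uses a plain delta over the window instead of Prometheus's `rate()`, which additionally extrapolates to the window edges and handles counter resets. The function names are hypothetical.

```python
def link_speed_bytes(speed_mbit: float) -> float:
    """Convert a link speed in Mbit/s (Linux sysfs units) to bytes per second."""
    return speed_mbit * 1_000_000 / 8


def utilization(rx_start: int, rx_end: int, tx_start: int, tx_end: int,
                seconds: float, speed_bytes: float) -> float:
    """Combined rx+tx throughput as a fraction of link speed.

    Simplification: a plain delta over the window; unlike Prometheus rate(),
    there is no extrapolation and no handling of counter resets.
    """
    rx_rate = (rx_end - rx_start) / seconds  # bytes received per second
    tx_rate = (tx_end - tx_start) / seconds  # bytes transmitted per second
    return (rx_rate + tx_rate) / speed_bytes


# Example: 1 Gbit/s link, byte counters sampled 60 s apart.
speed = link_speed_bytes(1000)  # 125,000,000 bytes/s
ratio = utilization(0, 3_600_000_000, 0, 2_700_000_000, 60, speed)
print(f"utilization = {ratio:.2f}, alert condition met: {ratio > 0.8}")
# prints: utilization = 0.84, alert condition met: True
```

On a 1 Gbit/s link, the 0.8 threshold therefore corresponds to a combined 100,000,000 bytes/s of receive plus transmit traffic. Because both directions are summed, a fully loaded full-duplex link can produce a ratio close to 2.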
Impact
- Network congestion and packet loss
- Slow application performance and response times
- Increased latency and timeouts
- Potential impact on critical business applications and services
Diagnosis
To diagnose the issue, follow these steps:
- Check network interface utilization: Measure current per-interface throughput with tools such as `sar -n DEV 1`, `iftop`, or `nload`. General-purpose monitors like `top`, `htop`, and `mpstat` report CPU and process activity rather than interface throughput.
- Investigate network traffic patterns: Use tools like `tcpdump` or Wireshark to capture and analyze traffic and identify which hosts, ports, or applications account for the high utilization.
- Check system resource utilization: Verify CPU, memory, and disk utilization to ensure they are not contributing to, or suffering from, the high network load.
- Review application logs: Check application logs for any errors or warnings related to network connectivity or performance issues.
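The Linux kernel also exposes per-interface byte counters in `/proc/net/dev`. The Python sketch below parses two snapshots of that file and ranks interfaces by combined receive and transmit throughput, skipping the same virtual devices the alert rule excludes. The function names are hypothetical, and counter wraparound is ignored.

```python
import re
from typing import Dict, List, Tuple

# Same device exclusions as the alert rule (fully anchored, like PromQL).
EXCLUDED = re.compile(r"^tap.*|^vnet.*|^veth.*|^tun.*")

Counters = Dict[str, Tuple[int, int]]


def parse_proc_net_dev(text: str) -> Counters:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters: Counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)  # large values may follow ":" with no space
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def snapshot(path: str = "/proc/net/dev") -> Counters:
    """Read and parse the current counters (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())


def top_talkers(before: Counters, after: Counters,
                seconds: float) -> List[Tuple[str, float]]:
    """Rank non-excluded interfaces by combined rx+tx bytes/s, busiest first."""
    rates = []
    for name, (rx1, tx1) in after.items():
        if name not in before or EXCLUDED.fullmatch(name):
            continue
        rx0, tx0 = before[name]
        rates.append((name, ((rx1 - rx0) + (tx1 - tx0)) / seconds))
    return sorted(rates, key=lambda item: item[1], reverse=True)
```

To use it, call `snapshot()` twice a few seconds apart and pass both results, together with the elapsed seconds, to `top_talkers()`. Dividing the top rate by the link speed gives the same kind of ratio the alert evaluates.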
Mitigation
To mitigate the issue, follow these steps:
- Investigate and resolve underlying causes: Identify and address the root cause of high network utilization, such as:
  - High traffic volumes from a specific application or service
  - Network configuration issues
  - Hardware or firmware issues with the network interface
- Implement traffic shaping or rate limiting: Consider implementing traffic shaping or rate limiting to prevent overutilization of the network interface.
- Upgrade network infrastructure: If necessary, consider upgrading network infrastructure to increase bandwidth and reduce congestion.
- Monitor and adjust: Continuously monitor network utilization and adjust mitigation strategies as needed to ensure optimal network performance.
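On Linux, interface-level traffic shaping is usually configured with `tc`, for example the `tbf` (token bucket filter) queueing discipline. The same token-bucket idea can also be applied inside an application that is saturating the link. The following Python class is a minimal, hypothetical sketch of that idea, not a replacement for `tc`: tokens refill at `rate_bytes` per second up to `burst_bytes`, and a caller sends a chunk only when `allow()` returns True.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: permits `rate_bytes` per second on average,
    with bursts of up to `burst_bytes`."""

    def __init__(self, rate_bytes: float, burst_bytes: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_bytes
        self.burst = burst_bytes
        self.clock = clock
        self.tokens = burst_bytes  # start with a full bucket
        self.last = clock()

    def allow(self, size: int) -> bool:
        """Consume `size` tokens and return True if enough are available."""
        now = self.clock()
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # caller should delay or drop this send
```

The clock is injectable so the limiter can be tested deterministically; by default it uses `time.monotonic`, which is unaffected by wall-clock adjustments. Choosing `rate_bytes` comfortably below the link speed leaves headroom for other traffic on the interface.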