ClickhouseHighNetworkTraffic
Network traffic is unusually high and may affect cluster performance.
Alert Rule
alert: ClickhouseHighNetworkTraffic
annotations:
  description: |-
    Network traffic is unusually high, may affect cluster performance.
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/clickhouse-internal/clickhousehighnetworktraffic/
  summary: ClickHouse High Network Traffic (instance {{ $labels.instance }})
expr: ClickHouseMetrics_NetworkSend > 250 or ClickHouseMetrics_NetworkReceive > 250
for: 5m
labels:
  severity: warning
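Meaning
The ClickhouseHighNetworkTraffic alert fires when ClickHouseMetrics_NetworkSend or ClickHouseMetrics_NetworkReceive stays above 250 for 5 minutes. These metrics are exported from ClickHouse's system.metrics table, where NetworkSend and NetworkReceive count the threads currently sending data to or receiving data from the network. They are gauges of concurrent network activity, not byte rates, so a sustained value above 250 means an unusually large number of simultaneous transfers on the instance. This indicates a potential bottleneck that may degrade overall cluster performance.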
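The interaction between the threshold in `expr` and the `for: 5m` clause can be sketched in a few lines of Python. This is only an illustration of the rule's logic, not Prometheus code: the function names are hypothetical, and the 60-second evaluation interval is an assumption (it is the Prometheus default, but your `evaluation_interval` may differ).

```python
THRESHOLD = 250          # taken from the alert expression
FOR_SECONDS = 5 * 60     # taken from `for: 5m`


def breaches(send, receive, threshold=THRESHOLD):
    """True when either metric is strictly above the threshold."""
    return send > threshold or receive > threshold


def first_firing_index(samples, interval_seconds=60, for_seconds=FOR_SECONDS):
    """Return the index of the evaluation at which the alert would fire.

    `samples` is a list of (network_send, network_receive) pairs, one per
    rule evaluation, oldest first. The alert is pending from the first
    breaching evaluation and fires once the breach has lasted `for_seconds`
    without interruption. Returns None if it never fires.
    """
    pending_since = None
    for i, (send, receive) in enumerate(samples):
        if breaches(send, receive):
            if pending_since is None:
                pending_since = i
            if (i - pending_since) * interval_seconds >= for_seconds:
                return i
        else:
            # Any non-breaching evaluation resets the pending state.
            pending_since = None
    return None
```

Under these assumptions, a short spike that clears within five minutes never fires the alert; only a sustained breach on either metric does.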
Impact
If left unaddressed, high network traffic can lead to:
- Slow query performance
- Increased latency
- Decreased cluster responsiveness
- Potential out-of-memory errors
- Impaired data ingestion and processing capabilities
Diagnosis
To diagnose the root cause of the high network traffic, perform the following steps:
- Check ClickHouse logs: Review the ClickHouse logs to identify any errors, warnings, or unusual patterns that may be contributing to the high network traffic.
- Analyze query patterns: Use the ClickHouse query log (the system.query_log table) to identify long-running queries and queries that read or return large amounts of data, since these are the most likely sources of the traffic.
- Investigate data ingestion: Check the data ingestion pipeline to ensure that it is functioning correctly and not overwhelming the ClickHouse instance with high volumes of data.
- Verify network configuration: Confirm that the network configuration is correctly set up and not causing any bottlenecks or connectivity issues.
- Check for resource shortages: Verify that the ClickHouse instance has sufficient resources (CPU, memory, disk space) to handle the current workload.
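Several of the checks above can be run as SQL against ClickHouse's system tables. The following is a minimal sketch that sends two diagnostic queries over the ClickHouse HTTP interface using only the Python standard library. The endpoint `http://localhost:8123/`, the passwordless default user, the 15-minute lookback window, and the helper names `build_url` and `run_query` are all assumptions for illustration; adjust them to your deployment.

```python
import urllib.parse
import urllib.request

# Queries against ClickHouse system tables. The 15-minute window and the
# LIMIT are arbitrary illustrative choices.
DIAGNOSTIC_QUERIES = {
    # Current values of the gauges behind the alert, plus connection counts.
    "network_threads": (
        "SELECT metric, value FROM system.metrics "
        "WHERE metric IN ('NetworkSend', 'NetworkReceive', "
        "'TCPConnection', 'HTTPConnection')"
    ),
    # Recently finished queries that returned the most data.
    "heaviest_queries": (
        "SELECT query_id, user, query_duration_ms, read_bytes, result_bytes, "
        "substring(query, 1, 120) AS query_head "
        "FROM system.query_log "
        "WHERE type = 'QueryFinish' AND event_time > now() - INTERVAL 15 MINUTE "
        "ORDER BY result_bytes DESC LIMIT 10"
    ),
}


def build_url(query, base_url="http://localhost:8123/"):
    """Build a GET URL for the HTTP interface (base_url is an assumed endpoint)."""
    params = {"query": query + " FORMAT TabSeparatedWithNames"}
    return base_url + "?" + urllib.parse.urlencode(params)


def run_query(query, base_url="http://localhost:8123/"):
    """Run a read-only query and return the tab-separated result as text."""
    with urllib.request.urlopen(build_url(query, base_url), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

For example, `print(run_query(DIAGNOSTIC_QUERIES["heaviest_queries"]))` would list the ten finished queries with the largest results in the last 15 minutes, provided query logging is enabled on the server.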
Mitigation
To mitigate the high network traffic, perform the following steps:
- Optimize queries: Rewrite resource-intensive queries so they read and return less data, for example by selecting only the columns that are needed and filtering as early as possible.
- Implement query throttling: Throttle queries to prevent excessive resource usage and reduce network traffic.
- Adjust data ingestion rates: Adjust the data ingestion rate to prevent overwhelming the ClickHouse instance with high volumes of data.
- Upgrade instance resources: Consider upgrading the ClickHouse instance resources (CPU, memory, disk space) to better handle the current workload.
- Implement traffic shaping: Consider implementing traffic shaping or quality of service (QoS) policies to prioritize critical traffic and reduce the impact of high network traffic.
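As one way to adjust ingestion rates on the client side, the sketch below groups rows into larger batches and caps how many batches are sent per second. It is a generic illustration, not part of any ClickHouse client library: `send_batch` is a hypothetical callback that performs the actual insert, and the default batch size and rate are placeholder values to tune for your workload.

```python
import time


def batches(rows, batch_size):
    """Yield lists of at most `batch_size` rows from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def send_throttled(rows, send_batch, batch_size=10_000,
                   max_batches_per_second=2.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Send rows in batches, never faster than `max_batches_per_second`.

    `send_batch` is a caller-supplied function (hypothetical here) that
    inserts one batch. `sleep` and `clock` are injectable for testing.
    Returns the number of batches sent.
    """
    min_interval = 1.0 / max_batches_per_second
    last_sent = None
    sent = 0
    for batch in batches(rows, batch_size):
        if last_sent is not None:
            wait = min_interval - (clock() - last_sent)
            if wait > 0:
                sleep(wait)
        send_batch(batch)
        last_sent = clock()
        sent += 1
    return sent
```

Fewer, larger inserts generally mean fewer concurrent connections for the same data volume, which is the quantity this alert is sensitive to. Server-side limits, where available, remain the more robust control because they do not rely on every client cooperating.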