HaproxyServerHealthcheckFailure #
Some server healthcheck are failing on {{ $labels.server }}
Alert Rule
alert: HaproxyServerHealthcheckFailure
annotations:
description: |-
Some server healthcheck are failing on {{ $labels.server }}
VALUE = {{ $value }}
LABELS = {{ $labels }}
runbook: https://srerun.github.io/prometheus-alerts/runbooks/embedded-exporter-v2/haproxyserverhealthcheckfailure/
summary: HAProxy server healthcheck failure (instance {{ $labels.instance }})
expr: increase(haproxy_server_check_failures_total[1m]) > 0
for: 1m
labels:
severity: warning
Here is the runbook for the HaproxyServerHealthcheckFailure alert:
Meaning #
The HaproxyServerHealthcheckFailure alert is triggered when the HAProxy server healthcheck fails, indicating that one or more servers are not responding to healthcheck requests. This alert is raised when the rate of healthcheck failures increases over a 1-minute period, indicating a potential issue with the server or the HAProxy configuration.
Impact #
The impact of this alert can be significant, as it may indicate that one or more servers are not functioning correctly, which can lead to:
- Reduced system availability
- Increased error rates
- Degraded performance
- Potential data loss or corruption
Diagnosis #
To diagnose the issue, follow these steps:
- Check the HAProxy logs for errors or warnings related to the healthcheck failures.
- Verify that the servers are configured correctly and are responding to healthcheck requests.
- Check the server metrics to identify any potential issues, such as high CPU usage or memory consumption.
- Review the HAProxy configuration to ensure that it is correct and up-to-date.
- Perform a manual healthcheck to validate that the servers are responding correctly.
Mitigation #
To mitigate this issue, follow these steps:
- Investigate and resolve any issues with the servers, such as restarting the server or applying configuration changes.
- Update the HAProxy configuration to ensure it is correct and up-to-date.
- Perform a rolling restart of the HAProxy servers to ensure that all servers are updated and functioning correctly.
- Verify that the healthcheck failures have decreased or resolved after taking the above steps.
- Consider implementing additional monitoring and alerting to detect potential issues with the servers or HAProxy configuration.