JenkinsBuildsHealthScore
Healthcheck failure for {{$labels.instance}} in realm {{$labels.realm}}/{{$labels.env}} ({{$labels.region}})
Alert Rule
alert: JenkinsBuildsHealthScore
annotations:
  description: |-
    Healthcheck failure for `{{$labels.instance}}` in realm {{$labels.realm}}/{{$labels.env}} ({{$labels.region}})
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/metric-plugin/jenkinsbuildshealthscore/
  summary: Jenkins builds health score (instance {{ $labels.instance }})
expr: default_jenkins_builds_health_score < 1
for: 0m
labels:
  severity: critical
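Meaning
The JenkinsBuildsHealthScore alert fires when default_jenkins_builds_health_score drops below 1. The rule is evaluated per time series, so each alert identifies the affected target through its instance, realm, env, and region labels. Because the rule sets for: 0m, the alert fires on the first evaluation that sees the condition, with no grace period. A value below 1 indicates that the builds behind that series are unhealthy, which threatens the reliability and efficiency of the development pipeline. If the metric is exported by the Jenkins Prometheus metrics plugin, the score is likely the Jenkins job health percentage (0 to 100), in which case a value below 1 effectively means a score of 0.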
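To see which series currently satisfy the alert condition, the expression can be run as an instant query against the Prometheus HTTP API. The following is a minimal sketch rather than a definitive tool: the server address in the usage comment is a placeholder, and the label names are taken from the alert annotations above.

```python
import json
import urllib.parse
import urllib.request

# Same expression as the alert rule.
ALERT_EXPR = "default_jenkins_builds_health_score < 1"


def firing_series(payload):
    """Extract (labels, value) pairs from a Prometheus instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample looks like {"metric": {labels}, "value": [timestamp, "value"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


def query_prometheus(base_url, expr=ALERT_EXPR, timeout=10):
    """Run an instant query via GET /api/v1/query and return the decoded JSON."""
    query = urllib.parse.urlencode({"query": expr})
    url = f"{base_url.rstrip('/')}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (placeholder address, adjust to your Prometheus server):
#   for labels, value in firing_series(query_prometheus("http://localhost:9090")):
#       print(labels.get("instance"), labels.get("realm"), labels.get("env"), value)
```

An empty result means no series is below the threshold at query time, which can happen if the score has already recovered since the alert fired.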
Impact
The alert is rated critical because unhealthy builds can lead to:
- Delays in code deployment and release
- Increased risk of errors and bugs in production
- Decreased developer productivity
- Inefficiencies in the development pipeline
If not addressed promptly, this issue can cause significant disruptions to the development team and the overall business.
Diagnosis
To diagnose the root cause of this issue, follow these steps:
- Check the Jenkins dashboard for any failed or stuck builds
- Review the Jenkins build logs for errors or warnings
- Verify that all Jenkins instances are running and healthy
- Check network connectivity between Jenkins instances, including between each controller and its build agents
- Investigate any recent changes to the Jenkins configuration or plugins
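The first diagnosis steps can be partly automated by asking Jenkins which jobs report a poor health score. The sketch below uses the Jenkins remote API (`/api/json` with a `tree` filter) and the `healthReport` field of each job. The base URL, user, and API token are placeholders, and treating the lowest report score as the job's health is an assumption about how the exported metric is derived.

```python
import base64
import json
import urllib.parse
import urllib.request

# Restrict the response to the fields this sketch needs.
JOBS_TREE = "jobs[name,url,color,healthReport[description,score]]"


def unhealthy_jobs(payload, threshold=1):
    """Return (name, worst score, descriptions) for jobs scoring below threshold."""
    found = []
    for job in payload.get("jobs", []):
        reports = job.get("healthReport") or []
        if not reports:
            continue  # no health report yet, for example a job that was never built
        worst = min(report["score"] for report in reports)
        if worst < threshold:
            descriptions = [report.get("description", "") for report in reports]
            found.append((job["name"], worst, descriptions))
    # Worst jobs first, then alphabetical for stable output.
    return sorted(found, key=lambda item: (item[1], item[0]))


def fetch_jobs(base_url, user, api_token, timeout=10):
    """Fetch top-level jobs from Jenkins using HTTP basic auth with an API token."""
    query = urllib.parse.urlencode({"tree": JOBS_TREE})
    request = urllib.request.Request(f"{base_url.rstrip('/')}/api/json?{query}")
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    request.add_header("Authorization", f"Basic {credentials}")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage (placeholder values):
#   payload = fetch_jobs("https://jenkins.example.com", "sre-bot", "<api token>")
#   for name, score, notes in unhealthy_jobs(payload):
#       print(name, score, "; ".join(notes))
```

This only covers top-level jobs. Jobs nested in folders would need a deeper `tree` expression or one request per folder.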
Mitigation
To mitigate this issue, follow these steps:
- Abort any stuck builds, and re-run failed builds once their cause has been fixed
- Restart the affected Jenkins instance(s)
- After the restart, confirm that all Jenkins instances are running and that the health score is back at 1 or above
- Review and update the Jenkins configuration and plugins as needed
- Add monitoring and logging so that similar issues are caught earlier in the future
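To find stuck builds worth aborting, one option is to list builds that have been running longer than expected. The sketch below works on the response of the Jenkins remote API for `jobs[name,lastBuild[number,url,building,timestamp]]`. The two-hour limit is an arbitrary assumption, and only the most recent build of each job is inspected, so concurrent older builds are missed. It only reports candidates. Aborting a build requires an authenticated POST to the returned stop URL.

```python
import time

# Fields requested from <jenkins>/api/json?tree=...
BUILDS_TREE = "jobs[name,lastBuild[number,url,building,timestamp]]"


def long_running_builds(payload, max_age_minutes=120, now_ms=None):
    """Return (job, build number, age in minutes, stop URL) for overdue running builds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    overdue = []
    for job in payload.get("jobs", []):
        build = job.get("lastBuild")
        if not build or not build.get("building"):
            continue  # never built, or the last build already finished
        # Jenkins reports the build start time in milliseconds since the epoch.
        age_minutes = (now_ms - build["timestamp"]) / 60_000
        if age_minutes > max_age_minutes:
            stop_url = build["url"].rstrip("/") + "/stop"
            overdue.append((job["name"], build["number"], round(age_minutes), stop_url))
    return overdue
```

Review the list before aborting anything, since some jobs legitimately run for hours and a fixed limit will flag them as well.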
For more detailed instructions and troubleshooting steps, refer to the JenkinsBuildsHealthScore runbook.