KubernetesCronjobSuspended #
CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is suspended
Alert Rule
alert: KubernetesCronjobSuspended
annotations:
  description: |-
    CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is suspended
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/kubestate-exporter/kubernetescronjobsuspended/
  summary: Kubernetes CronJob suspended ({{ $labels.namespace }}/{{ $labels.cronjob }})
expr: kube_cronjob_spec_suspend != 0
for: 0m
labels:
  severity: warning
Meaning #
The KubernetesCronjobSuspended alert fires when kube_cronjob_spec_suspend is non-zero, which means the CronJob’s spec.suspend field is set to true. Because the rule uses for: 0m, it fires as soon as the metric is observed in that state. A suspended CronJob creates no new Jobs on its schedule; Jobs it has already started are not affected. The alert has severity warning: nothing has failed, but scheduled work is skipped until the CronJob is resumed.
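To see which CronJobs currently match the alert condition without going through Prometheus, something like the following sketch may help. It assumes kubectl is configured for the affected cluster; the function name list_suspended_cronjobs is hypothetical and chosen only for illustration.

```shell
# Hypothetical helper: print every suspended CronJob as namespace/name.
# Assumes kubectl points at the affected cluster.
list_suspended_cronjobs() {
  kubectl get cronjobs --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,SUSPEND:.spec.suspend' |
    awk '$3 == "true" { print $1 "/" $2 }'  # keep only rows where spec.suspend is true
}

# Usage:
#   list_suspended_cronjobs
```

Each line of output should correspond to one namespace/cronjob label pair carried by the alert.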
Impact #
The impact depends on what the CronJob does. Possible consequences include:
- Missed scheduled tasks, leading to data inconsistency or staleness
- Delays in processing critical workflows or jobs
- Increased latency or errors in dependent systems
- Potential security risks if the suspended CronJob is responsible for security-related tasks
Diagnosis #
To diagnose the issue, follow these steps:
- Check the CronJob’s configuration and status using kubectl:
kubectl describe cronjob <cronjob-name> -n <namespace>
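- Review the CronJob’s current spec and status. Kubernetes keeps no suspension history, so check spec.suspend for the current state and status.lastScheduleTime for when a Job was last scheduled:
kubectl get cronjob <cronjob-name> -n <namespace> -o yaml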
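- Check the logs of the Jobs the CronJob created. A CronJob has no logs of its own, so list the Jobs in the namespace and read the logs of the most recent one:
kubectl get jobs -n <namespace>
kubectl logs job/<job-name> -n <namespace>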
- Check the namespace’s events for relevant errors or warnings:
kubectl get events -n <namespace>
- Check whether a recent change to the CronJob (for example a manifest update, a deployment, or a manual patch) set spec.suspend to true.
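The log check above can be sketched as a pair of shell helpers. This is a minimal sketch, not a definitive procedure: it assumes the Jobs carry an owner reference to the CronJob as their first ownerReferences entry, and the function names jobs_for_cronjob and latest_job_logs are hypothetical.

```shell
# Hypothetical helper: print the Jobs in namespace $1 owned by CronJob $2,
# oldest first. Assumes the CronJob is the first entry in ownerReferences.
jobs_for_cronjob() {
  ns="$1"; cronjob="$2"
  kubectl get jobs -n "$ns" --no-headers --sort-by=.metadata.creationTimestamp \
    -o custom-columns='NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].name' |
    awk -v owner="$cronjob" '$2 == owner { print $1 }'
}

# Hypothetical helper: show logs of the most recent Job the CronJob created.
latest_job_logs() {
  ns="$1"; cronjob="$2"
  latest="$(jobs_for_cronjob "$ns" "$cronjob" | tail -n 1)"
  if [ -z "$latest" ]; then
    echo "no Jobs found for CronJob $ns/$cronjob" >&2
    return 1
  fi
  kubectl logs "job/$latest" -n "$ns"
}

# Usage:
#   latest_job_logs <namespace> <cronjob-name>
```

If no Jobs are found, the CronJob may have been suspended for longer than its Job history is retained, which is itself a useful signal.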
Mitigation #
To mitigate the issue, follow these steps:
- Identify the reason for the suspension and address it accordingly:
  - If the suspension is intentional (for example during maintenance), leave the CronJob suspended and confirm when it is due to be resumed.
  - If the suspension is unintentional, find and fix whatever set spec.suspend to true before resuming the CronJob.
- Unsuspend the CronJob using kubectl:
kubectl patch cronjob <cronjob-name> -n <namespace> -p='{"spec":{"suspend": false}}'
- Verify that the CronJob is no longer suspended (spec.suspend should be false):
kubectl get cronjob <cronjob-name> -n <namespace> -o yaml
- Monitor the CronJob’s status and the logs of its next Jobs to ensure it continues to run as expected.
- Confirm that the KubernetesCronjobSuspended alert resolves. The rule itself needs no change: the alert clears once kube_cronjob_spec_suspend returns to 0.
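The unsuspend and verify steps above can be combined into one helper. This is a sketch under assumptions: the function name unsuspend_cronjob is hypothetical, and if the CronJob is managed by a deployment tool that reapplies manifests, the patch may be reverted unless the source manifest is changed as well.

```shell
# Hypothetical helper: set spec.suspend to false on CronJob $2 in namespace $1,
# then read the field back to confirm the change took effect.
unsuspend_cronjob() {
  ns="$1"; name="$2"
  kubectl patch cronjob "$name" -n "$ns" --type=merge \
    -p '{"spec":{"suspend":false}}' || return 1
  state="$(kubectl get cronjob "$name" -n "$ns" -o jsonpath='{.spec.suspend}')"
  if [ "$state" = "false" ]; then
    echo "CronJob $ns/$name is active"
  else
    echo "CronJob $ns/$name still reports suspend=$state" >&2
    return 1
  fi
}

# Usage:
#   unsuspend_cronjob <namespace> <cronjob-name>
```

Reading the field back after the patch guards against the case where the patch is accepted but something else immediately sets the CronJob to suspended again.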