VaultTooManyPendingTokens #
Too many pending tokens {{ $labels.instance }}: {{ $value | printf "%.2f"}}%
Alert Rule
alert: VaultTooManyPendingTokens
annotations:
  description: |-
    Too many pending tokens {{ $labels.instance }}: {{ $value | printf "%.2f"}}%
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/hashicorp-vault-internal/vaulttoomanypendingtokens/
  summary: Vault too many pending tokens (instance {{ $labels.instance }})
expr: avg(vault_token_create_count - vault_token_store_count) > 0
for: 5m
labels:
  severity: warning
Meaning #
The “VaultTooManyPendingTokens” alert fires when the expression avg(vault_token_create_count - vault_token_store_count) stays above 0 for 5 minutes, that is, when Vault has on average recorded more token creations than token stores at every rule evaluation in that window. The rule treats this gap as pending tokens. A persistent gap can lead to performance issues and potential security risks.
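To make the rule's arithmetic concrete, the following is a minimal Python sketch of how the expression and the `for: 5m` clause combine. The instance names (`vault-0`, `vault-1`) and sample values are hypothetical, and the functions only imitate Prometheus behaviour under the assumption that both metrics share the same label sets. One detail worth checking in your own setup: `avg` without a `by` clause aggregates all series into one and drops their labels, so `{{ $labels.instance }}` in the annotations may render empty.

```python
def pending_token_gap(create_counts, store_counts):
    """Imitate avg(vault_token_create_count - vault_token_store_count).

    Each argument maps a series label (a hypothetical instance name) to the
    current metric value. Only series present on both sides are subtracted,
    similar to PromQL one-to-one vector matching.
    """
    shared = create_counts.keys() & store_counts.keys()
    if not shared:
        return None  # PromQL would return an empty result here
    diffs = [create_counts[k] - store_counts[k] for k in shared]
    return sum(diffs) / len(diffs)


def should_fire(gaps, threshold=0.0):
    """Imitate 'for: 5m': every evaluation in the window must exceed 0."""
    return bool(gaps) and all(g is not None and g > threshold for g in gaps)


# Hypothetical sample values for two Vault instances.
create = {"vault-0": 120, "vault-1": 80}
store = {"vault-0": 110, "vault-1": 80}

gap = pending_token_gap(create, store)  # (10 + 0) / 2
window = [gap, gap, gap, gap, gap]      # e.g. one evaluation per minute
fires = should_fire(window)
```

A single evaluation at or below 0 inside the window resets the pending state, so short spikes do not fire the alert.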
Impact #
If left unaddressed, this issue can cause:
- Performance degradation in Vault and dependent systems
- Increased latency for token requests
- Potential security risks due to unauthorized access or token leakage
- In extreme cases, Vault may become unavailable or crash due to excessive memory usage
Diagnosis #
To diagnose the issue, follow these steps:
- Check the Vault logs for any errors or warnings related to token creation or storage.
- Verify that the vault_token_create_count and vault_token_store_count metrics are correct and up-to-date.
- Investigate any recent changes to Vault configurations, plugins, or dependent systems that may be causing the issue.
- Check the Vault instance’s resource utilization (CPU, memory, disk space) to ensure it has sufficient capacity.
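The metric check in the steps above can be scripted against the Prometheus HTTP API. The sketch below builds an instant-query URL for `/api/v1/query` and parses the standard response shape. The server address `http://localhost:9090` and the sample payload are assumptions for illustration; replace them with values from your environment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumption: adjust to your server

# The alert expression without the "> 0" filter, so the raw gap is visible.
GAP_QUERY = "avg(vault_token_create_count - vault_token_store_count)"


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Return (labels, value) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


def run_query(base_url, promql, timeout=10):
    """Execute the query over HTTP (requires a reachable Prometheus server)."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))


# Offline demonstration with a hypothetical response payload.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {}, "value": [1700000000, "3.5"]}],
    },
}
samples = parse_instant_vector(sample_payload)
```

Calling `run_query(PROMETHEUS_URL, GAP_QUERY)` against a live server returns the current gap. An empty list suggests that one of the two metrics is missing or stale, which is itself worth investigating.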
Mitigation #
To mitigate the issue, follow these steps:
- Reduce the load on Vault by identifying and addressing any excessive token requests or misconfigured clients.
- Verify that Vault is properly configured for token storage and rotation.
- Consider increasing the resources (CPU, memory, disk space) allocated to the Vault instance.
- Implement additional monitoring and logging to detect and prevent similar issues in the future.
- If necessary, consult the Vault documentation and HashiCorp support resources for further guidance.
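After making changes to Vault configurations or dependent systems, re-check the vault_token_create_count and vault_token_store_count metrics to confirm that the gap between them has returned to 0 and the alert has resolved. Vault emits these metrics itself, so they cannot be updated by hand.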
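For the first mitigation step, one way to find clients that request an excessive number of tokens is to count token-creation requests per source address in the Vault audit log. This sketch assumes a file audit device writing JSON lines, and assumes the field names `type`, `request.path`, and `request.remote_address`; verify them against your Vault version. The sample entries are hypothetical. Tokens issued through auth method logins use other paths and are not counted here.

```python
import json
from collections import Counter


def count_token_creations(lines, path_prefix="auth/token/create"):
    """Count token-creation requests per remote address in audit log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if entry.get("type") != "request":
            continue  # ignore response entries to avoid double counting
        request = entry.get("request") or {}
        if request.get("path", "").startswith(path_prefix):
            counts[request.get("remote_address", "unknown")] += 1
    return counts


# Hypothetical audit log lines for illustration.
sample_lines = [
    '{"type": "request", "request": {"path": "auth/token/create", "remote_address": "10.0.0.5"}}',
    '{"type": "response", "request": {"path": "auth/token/create", "remote_address": "10.0.0.5"}}',
    '{"type": "request", "request": {"path": "auth/token/create", "remote_address": "10.0.0.5"}}',
    '{"type": "request", "request": {"path": "secret/data/app", "remote_address": "10.0.0.9"}}',
    "not json",
]
top_clients = count_token_creations(sample_lines).most_common(5)
```

In practice the lines would come from the audit file, for example `open(path)` on a path of your choosing, and the top entries point to the clients to review first.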