ProviderFailedBecauseNet_versionFailed #
Failed net_version for Provider {{$labels.provider}} in Graph node {{$labels.instance}}
Alert Rule
alert: ProviderFailedBecauseNet_versionFailed
annotations:
  description: |-
    Failed net_version for Provider `{{$labels.provider}}` in Graph node `{{$labels.instance}}`
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/graph-node-internal/providerfailedbecausenet_versionfailed/
  summary: Provider failed because net_version failed (instance {{ $labels.instance }})
expr: eth_rpc_status == 1
for: 0m
labels:
  severity: critical
Meaning #
This alert fires when the eth_rpc_status metric equals 1, which indicates that the Provider failed its net_version check. Because the rule sets for: 0m, the alert fires on the first evaluation that matches. It is rated critical because a failed Provider affects the functionality of the Graph node.
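To see which Provider and instance pairs currently match the rule, the alert expression can be run against the Prometheus HTTP API. The following is a minimal sketch, not a definitive tool: it assumes Prometheus is reachable at http://localhost:9090 (a placeholder) and that the series carry the provider and instance labels used in the alert annotations. The helper names failing_providers and fetch_failing are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder Prometheus address; replace with your own.
PROMETHEUS_URL = "http://localhost:9090"
ALERT_EXPR = "eth_rpc_status == 1"  # expression taken from the alert rule


def failing_providers(response: dict) -> list[tuple[str, str]]:
    """Extract sorted (provider, instance) pairs from an instant-query response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    pairs = []
    for series in response["data"]["result"]:
        labels = series["metric"]
        # Missing labels are shown as "?" rather than raising.
        pairs.append((labels.get("provider", "?"), labels.get("instance", "?")))
    return sorted(pairs)


def fetch_failing(base_url: str = PROMETHEUS_URL) -> list[tuple[str, str]]:
    """Run the alert expression via /api/v1/query and list the matching series."""
    query = urllib.parse.urlencode({"query": ALERT_EXPR})
    url = f"{base_url}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return failing_providers(json.load(resp))
```

Calling fetch_failing() returns an empty list when no series matches the expression, which is the expected state once the alert has resolved.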
Impact #
The affected Provider cannot function correctly, which can disrupt data retrieval and processing on the Graph node. Dependent services and applications may in turn see data inconsistencies and errors.
Diagnosis #
To diagnose the root cause of this issue, follow these steps:
- Check the Graph node logs for errors related to net_version checks.
- Verify that the Provider is configured correctly and that the net_version API is accessible.
- Check the eth_rpc_status metric to see if there are any trends or patterns that can indicate the cause of the failure.
- Investigate any recent changes to the Provider configuration or the Graph node environment that may have caused the issue.
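The accessibility check in the steps above can be done by calling net_version directly on the Provider's JSON-RPC endpoint. The following is a minimal sketch, assuming the Provider speaks standard Ethereum JSON-RPC 2.0 over HTTP. The endpoint URL in the comment is a placeholder, and the helper names (build_net_version_request, parse_net_version, check_net_version) are hypothetical.

```python
import json
import urllib.request


def build_net_version_request(request_id: int = 1) -> bytes:
    """Encode a JSON-RPC 2.0 request body for the net_version method."""
    payload = {"jsonrpc": "2.0", "method": "net_version", "params": [], "id": request_id}
    return json.dumps(payload).encode("utf-8")


def parse_net_version(body: bytes) -> str:
    """Return the network ID string, or raise if the provider reported an error."""
    reply = json.loads(body)
    if "error" in reply:
        raise RuntimeError(f"net_version failed: {reply['error']}")
    return str(reply["result"])


def check_net_version(rpc_url: str, timeout: float = 10.0) -> str:
    """POST net_version to the provider; network errors propagate to the caller."""
    request = urllib.request.Request(
        rpc_url,
        data=build_net_version_request(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_net_version(resp.read())


# Example call (assumption: placeholder URL, use the Provider URL from your
# Graph node configuration):
#   check_net_version("http://localhost:8545")
```

A connection error or timeout from check_net_version points to a network or availability problem, while a RuntimeError means the endpoint answered but rejected the call. Running the check from the Graph node host helps rule out firewall or DNS differences between machines.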
Mitigation #
To mitigate this issue, follow these steps:
- Check the Provider configuration and ensure that it is correct and up-to-date.
- Restart the Provider service to try to recover from the failed net_version check.
- If the issue persists, try to debug the net_version API to identify the root cause of the failure.
- If necessary, roll back any recent changes to the Provider configuration or Graph node environment to a known good state.
- If the issue is still not resolved, escalate to the development team for further investigation and resolution.