ProviderFailedBecauseNet_versionFailed #
Failed net_version for Provider {{$labels.provider}} in Graph node {{$labels.instance}}
Alert Rule
alert: ProviderFailedBecauseNet_versionFailed
annotations:
  description: |-
    Failed net_version for Provider `{{$labels.provider}}` in Graph node `{{$labels.instance}}`
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/graph-node-internal/providerfailedbecausenet_versionfailed/
  summary: Provider failed because net_version failed (instance {{ $labels.instance
    }})
expr: eth_rpc_status == 1
for: 0m
labels:
  severity: critical
Meaning #
This alert fires when the eth_rpc_status metric has a value of 1, which indicates that the Provider failed its net_version check. Because the rule sets for: 0m, the alert fires as soon as the condition is observed, with no waiting period. It is rated critical because a failed Provider affects the functionality of the Graph node.
Impact #
The Provider is unable to function correctly, which can disrupt data retrieval and processing on the Graph node. Dependent services and applications may in turn see data inconsistencies and errors.
Diagnosis #
To diagnose the root cause of this issue, follow these steps:
- Check the Graph node logs for errors related to net_version checks.
- Verify that the Provider is configured correctly and that the net_version API is accessible.
- Check the eth_rpc_status metric to see if there are any trends or patterns that can indicate the cause of the failure.
- Investigate any recent changes to the Provider configuration or the Graph node environment that may have caused the issue.
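The metric check in the steps above can be sketched with the Prometheus HTTP API. This is a minimal sketch, not part of the original runbook: the Prometheus address (http://localhost:9090), the six-hour window, and the helper names are assumptions to adapt to your environment. It counts how many samples of eth_rpc_status were equal to 1 for each instance and provider pair, which helps tell a brief blip from a sustained failure.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable at this address; adjust for your setup.
PROMETHEUS_URL = "http://localhost:9090"


def build_range_query_url(base_url, query, start, end, step="60s"):
    """Build a URL for the Prometheus /api/v1/query_range endpoint."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def failing_samples(response):
    """Count samples equal to 1 per (instance, provider) in a query_range response."""
    counts = {}
    for series in response["data"]["result"]:
        labels = series["metric"]
        key = (labels.get("instance", "?"), labels.get("provider", "?"))
        # Each entry in "values" is a [timestamp, "value-as-string"] pair.
        counts[key] = sum(1 for _, value in series["values"] if float(value) == 1.0)
    return counts


def fetch_history(hours=6):
    """Query the last `hours` of eth_rpc_status (performs a network call)."""
    end = time.time()
    start = end - hours * 3600
    url = build_range_query_url(PROMETHEUS_URL, "eth_rpc_status", start, end)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return failing_samples(json.load(resp))
```

Calling fetch_history() returns a dictionary keyed by (instance, provider). A count that keeps growing across repeated calls suggests the Provider is still failing rather than having recovered.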
Mitigation #
To mitigate this issue, follow these steps:
- Check the Provider configuration and ensure that it is correct and up-to-date.
- Restart the Provider service to try to recover from the failed net_version check.
- If the issue persists, try to debug the net_version API to identify the root cause of the failure.
- If necessary, roll back any recent changes to the Provider configuration or Graph node environment to a known good state.
- If the issue is still not resolved, escalate to the development team for further investigation and resolution.
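When debugging the net_version API as described in the steps above, it can help to call the method directly against the Provider's RPC endpoint, outside of the Graph node. The following is a minimal sketch using the standard Ethereum JSON-RPC request format. The function names are hypothetical, and the endpoint URL you pass in is whatever your Provider configuration points to.

```python
import json
import urllib.request


def build_net_version_request(request_id=1):
    """Return the JSON-RPC 2.0 payload for the net_version method."""
    return {"jsonrpc": "2.0", "method": "net_version", "params": [], "id": request_id}


def parse_net_version_response(body):
    """Return the network ID string, or raise ValueError if the call failed."""
    if "error" in body:
        raise ValueError(f"net_version returned an error: {body['error']}")
    if "result" not in body:
        raise ValueError("net_version response has no 'result' field")
    return body["result"]


def probe_provider(rpc_url, timeout=5):
    """Send net_version to rpc_url and return the network ID (network call)."""
    data = json.dumps(build_net_version_request()).encode("utf-8")
    request = urllib.request.Request(
        rpc_url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_net_version_response(json.load(resp))
```

For example, probe_provider("http://localhost:8545") (a placeholder URL) should return the network ID as a string, such as "1" for Ethereum mainnet. A timeout, connection error, or ValueError here points at the Provider endpoint itself rather than at the Graph node.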