CassandraClientRequestUnavailableWrite #
Some Cassandra client requests are unavailable to write - {{ $labels.cassandra_cluster }}
Alert Rule
alert: CassandraClientRequestUnavailableWrite
annotations:
description: |-
Some Cassandra client requests are unavailable to write - {{ $labels.cassandra_cluster }}
VALUE = {{ $value }}
LABELS = {{ $labels }}
runbook: https://srerun.github.io/prometheus-alerts/runbooks/instaclustr-cassandra-exporter/cassandraclientrequestunavailablewrite/
summary: Cassandra client request unavailable write (instance {{ $labels.instance
}})
expr: changes(cassandra_client_request_unavailable_exceptions_total{operation="write"}[1m])
> 0
for: 2m
labels:
severity: critical
Meaning #
The CassandraClientRequestUnavailableWrite alert is triggered when the number of Cassandra client request unavailable exceptions for write operations increases within a 1-minute window. This indicates that the Cassandra cluster is currently unable to handle write requests, which can lead to data loss, inconsistencies, or application errors.
Impact #
- Data may not be written to the Cassandra cluster, leading to data loss or inconsistencies.
- Applications that rely on Cassandra for writes may experience errors or failures, causing disruptions to business-critical processes.
- The inability to write data to Cassandra can have a cascading effect on dependent systems, leading to a broader impact on the overall system.
Diagnosis #
To diagnose the issue, follow these steps:
- Check the Cassandra cluster status and node health using the Cassandra nodetool or the Instaclustr Cassandra console.
- Investigate the Cassandra system logs for any errors or exceptions related to write operations.
- Verify that the Cassandra cluster is properly configured and that there are no network connectivity issues.
- Check the application logs for any errors or exceptions related to Cassandra write operations.
- Review the Prometheus metrics for any anomalies or trends that may indicate the root cause of the issue.
Mitigation #
To mitigate the issue, follow these steps:
- Check the Cassandra cluster status and node health, and restart any nodes that are not operational.
- Investigate and resolve any network connectivity issues between the Cassandra cluster and the application.
- Verify that the Cassandra cluster is properly configured, including checking the replication factor, consistency level, and write timeouts.
- Implement retries or fallback mechanisms in the application to handle temporary Cassandra unavailability.
- Consider increasing the Cassandra cluster capacity or adjusting the write workload to prevent similar issues in the future.
Note: For more detailed instructions and troubleshooting steps, refer to the runbook provided in the alert annotation.