CassandraTombstoneDump

Cassandra tombstone dump - {{ $labels.cassandra_cluster }}

Alert Rule

alert: CassandraTombstoneDump
annotations:
  description: |-
    Cassandra tombstone dump - {{ $labels.cassandra_cluster }}
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/instaclustr-cassandra-exporter/cassandratombstonedump/
  summary: Cassandra tombstone dump (instance {{ $labels.instance }})
expr: avg(cassandra_table_tombstones_scanned{quantile="0.99"}) by (instance,cassandra_cluster,keyspace)
  > 100
for: 2m
labels:
  severity: critical

Here is a sample runbook for the CassandraTombstoneDump alert:

Meaning

The CassandraTombstoneDump alert fires when the 99th-percentile number of tombstones scanned per read stays above 100 for 2 minutes. The value is averaged per instance, cluster, and keyspace, so it is typically an average over the tables in that keyspace. It indicates that reads are scanning large numbers of tombstones (markers for deleted data), which can cause performance problems and higher latency.
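
To make the rule's arithmetic concrete, the following is a minimal Python sketch of the aggregation in the alert expression: group the p99 samples by instance, cassandra_cluster, and keyspace, average each group, and compare the result to 100. The sample values and the table label are hypothetical, and the `for: 2m` hold period is not modeled.

```python
from collections import defaultdict
from statistics import mean

THRESHOLD = 100  # mirrors "> 100" in the alert expression


def firing_groups(samples, threshold=THRESHOLD):
    """Return {(instance, cluster, keyspace): average} for groups above threshold.

    Each sample is (labels, value), already filtered to quantile="0.99".
    """
    grouped = defaultdict(list)
    for labels, value in samples:
        key = (labels["instance"], labels["cassandra_cluster"], labels["keyspace"])
        grouped[key].append(value)
    averages = {key: mean(values) for key, values in grouped.items()}
    return {key: avg for key, avg in averages.items() if avg > threshold}


# Hypothetical samples: one p99 value per table (the "table" label is an assumption).
samples = [
    ({"instance": "node1", "cassandra_cluster": "prod", "keyspace": "orders", "table": "by_id"}, 20.0),
    ({"instance": "node1", "cassandra_cluster": "prod", "keyspace": "orders", "table": "by_user"}, 400.0),
    ({"instance": "node1", "cassandra_cluster": "prod", "keyspace": "audit", "table": "events"}, 150.0),
    ({"instance": "node1", "cassandra_cluster": "prod", "keyspace": "audit", "table": "logins"}, 20.0),
]

print(firing_groups(samples))
```

In this sample only the orders keyspace exceeds the threshold (average 210). The audit keyspace averages 85 even though one of its tables is at 150, which shows that averaging can mask a single table with heavy tombstone reads.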

Impact

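Scanning large numbers of tombstones can cause:

Increased heap usage and longer garbage collection pauses, because tombstones are held in memory while a read is processed
Slower query performance and increased read latency
Higher disk usage, because tombstones stay on disk until gc_grace_seconds has passed and compaction purges them
Failed reads and, in severe cases, node instability: a read that scans more than tombstone_failure_threshold tombstones is aborted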

Diagnosis

To diagnose the issue, follow these steps:

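Check the Cassandra node logs (system.log) for tombstone warnings. Cassandra logs a warning when a single read scans more than tombstone_warn_threshold tombstones and aborts the read when it exceeds tombstone_failure_threshold.
Verify that tombstone_warn_threshold and tombstone_failure_threshold in cassandra.yaml are set as intended (the defaults are 1000 and 100000).
Break the cassandra_table_tombstones_scanned metric down by table, if your exporter exposes a table label, to identify the specific tables and keyspaces scanning the most tombstones.
Run nodetool tablestats for the affected keyspace and check the average and maximum tombstones per slice reported for each table.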
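
As an illustration of the nodetool step, here is a small Python sketch that scans `nodetool tablestats` output for tables whose "Maximum tombstones per slice" exceeds a limit. The sample text is abbreviated and hypothetical, the function name and limit are assumptions, and the exact output layout can differ between Cassandra versions, so treat the patterns as a starting point.

```python
import re

# Patterns for the lines of interest; whitespace around ":" varies by version.
KEYSPACE_RE = re.compile(r"^Keyspace\s*:\s*(\S+)")
TABLE_RE = re.compile(r"^Table(?: \(index\))?:\s*(\S+)")
MAX_TOMBSTONES_RE = re.compile(
    r"^Maximum tombstones per slice \(last five minutes\):\s*([\d.]+)"
)


def tables_over_limit(tablestats_text, limit=100):
    """Return (keyspace, table, max_tombstones) tuples above limit, worst first."""
    keyspace = table = None
    hits = []
    for raw in tablestats_text.splitlines():
        line = raw.strip()
        if m := KEYSPACE_RE.match(line):
            keyspace, table = m.group(1), None
        elif m := TABLE_RE.match(line):
            table = m.group(1)
        elif (m := MAX_TOMBSTONES_RE.match(line)) and table is not None:
            value = float(m.group(1))
            if value > limit:
                hits.append((keyspace, table, value))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Abbreviated, hypothetical excerpt of tablestats output.
SAMPLE = """
Keyspace : orders
    Read Count: 1200
        Table: by_id
        Average tombstones per slice (last five minutes): 3.0
        Maximum tombstones per slice (last five minutes): 12
        Table: by_user
        Average tombstones per slice (last five minutes): 310.5
        Maximum tombstones per slice (last five minutes): 4768
"""

for keyspace, table, value in tables_over_limit(SAMPLE):
    print(f"{keyspace}.{table}: max tombstones per slice = {value:.0f}")
```

In practice the text would come from running nodetool tablestats on the affected node and piping the output into the script.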

Mitigation

To mitigate the issue, follow these steps:

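If the nodes are under memory pressure, increase the Cassandra node's heap size as a short-term measure.
Adjust the configuration of the affected tables to optimize tombstone removal, for example gc_grace_seconds or the compaction strategy's tombstone options. Do not set gc_grace_seconds lower than your repair interval, or deleted data can reappear.
Run the nodetool compact command on the affected keyspace and table to compact the data files. Compaction only removes tombstones that are older than gc_grace_seconds.
Consider adding more Cassandra nodes to the cluster to distribute the load.
Keep monitoring the tombstone metrics and adjust the configuration as needed to prevent the problem from recurring.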
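
One reason a compaction may not reduce tombstone counts is timing: a tombstone only becomes eligible for removal once the table's gc_grace_seconds (864000 seconds, or 10 days, by default) has elapsed since the delete. The sketch below works through that arithmetic in Python. The function names are hypothetical, and the check is a simplification: age is a necessary condition, not a guarantee that a given compaction will drop the tombstone.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GC_GRACE_SECONDS = 864000  # Cassandra's default gc_grace_seconds (10 days)


def purgeable_at(deleted_at, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Earliest time at which compaction may drop a tombstone written at deleted_at."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


def is_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True once the grace period has elapsed (necessary, not sufficient)."""
    return now >= purgeable_at(deleted_at, gc_grace_seconds)


# Hypothetical timestamps for illustration.
deleted = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 5, tzinfo=timezone.utc)

print(purgeable_at(deleted))                             # 2024-01-11 00:00:00+00:00
print(is_purgeable(deleted, now))                        # False: only 4 days old
print(is_purgeable(deleted, now, gc_grace_seconds=3600)) # True with a 1-hour grace period
```

With the default setting, deletes from the last 10 days survive nodetool compact, so a compaction run shortly after a bulk delete is unlikely to clear the alert on its own.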

Note: This is just a sample runbook, and the specific diagnosis and mitigation steps may vary depending on your Cassandra environment and configuration.