IstioLatency99Percentile #

The slowest 1% of Istio requests take longer than 1000 ms.

Alert Rule #
alert: IstioLatency99Percentile
annotations:
  description: |-
    Istio 1% slowest requests are longer than 1000ms.
      VALUE = {{ $value }}
      LABELS = {{ $labels }}    
  runbook: https://srerun.github.io/prometheus-alerts/runbooks/istio-internal/istiolatency99percentile/
  summary: Istio latency 99 percentile (instance {{ $labels.instance }})
expr: histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket[1m]))
  by (destination_canonical_service, destination_workload_namespace, source_canonical_service,
  source_workload_namespace, le)) > 1000
for: 1m
labels:
  severity: warning

Meaning #

The IstioLatency99Percentile alert is triggered when the 99th percentile of Istio request duration exceeds 1000 ms for a given source/destination service pair (the expression groups by canonical service and workload namespace on both sides). It means that at least 1% of requests between that pair are taking longer than one second, which can impact the performance and responsiveness of the application.
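
To see which service pairs are breaching the threshold, a query like the following sketch can be run ad hoc in Prometheus. It mirrors the alert expression but uses a 5-minute window for a steadier view, and assumes the standard Istio telemetry metric istio_request_duration_milliseconds_bucket is being scraped.

# 99th percentile request duration (ms) per source/destination service pair,
# computed over a 5-minute window.
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket[5m]))
  by (destination_canonical_service, destination_workload_namespace,
      source_canonical_service, source_workload_namespace, le)
)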

Impact #

  • Slow request processing can lead to a poor user experience, decreased system throughput, and increased error rates.
  • High latency can also cause cascading failures, as dependent services may time out or become unavailable.
  • Prolonged latency issues can result in revenue loss, customer dissatisfaction, and damage to the organization’s reputation.

Diagnosis #

To diagnose the root cause of high latency, follow these steps:

  1. Check request traffic patterns: Analyze the per-pair request rate and error rate to identify any sudden changes or spikes (see the query sketches after this list).
  2. Investigate service dependencies: Verify that the services the affected workload calls are operating within their expected latency bounds (see the query sketches after this list).
  3. Review Istio configuration: Ensure that Istio is properly configured and that service mesh metrics are being collected correctly.
  4. Examine pod and container logs: Review logs for errors, warnings, or other indicators of issues that may be contributing to high latency.
  5. Check for resource constraints: Verify that pods have sufficient CPU and memory to handle incoming requests (see the resource saturation queries after this list).
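
For step 1, request traffic and error rates can be broken down per service pair with queries like the sketch below, which assumes the standard Istio metric istio_requests_total and its response_code label:

# Request rate per source/destination service pair.
sum(rate(istio_requests_total[5m]))
  by (destination_canonical_service, destination_workload_namespace,
      source_canonical_service, source_workload_namespace)

# 5xx error rate per destination service.
sum(rate(istio_requests_total{response_code=~"5.."}[5m]))
  by (destination_canonical_service, destination_workload_namespace)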
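
For step 2, grouping latency by destination only helps distinguish a slow downstream service (high p99 for every caller) from a single slow caller. A sketch, with <namespace> as a placeholder for the affected workload namespace:

# 99th percentile latency (ms) of each destination service in the namespace,
# aggregated across all callers.
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket{destination_workload_namespace="<namespace>"}[5m]))
  by (destination_canonical_service, le)
)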
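
For step 5, resource saturation often shows up as CPU throttling or memory pressure before pods fail outright. The sketch below assumes cAdvisor (kubelet) and kube-state-metrics are scraped by Prometheus; <namespace> is a placeholder:

# Fraction of CPU periods in which containers were throttled.
sum(rate(container_cpu_cfs_throttled_periods_total{namespace="<namespace>"}[5m])) by (pod)
  / sum(rate(container_cpu_cfs_periods_total{namespace="<namespace>"}[5m])) by (pod)

# Memory working set as a fraction of the container memory limit.
sum(container_memory_working_set_bytes{namespace="<namespace>", container!=""}) by (pod)
  / sum(kube_pod_container_resource_limits{namespace="<namespace>", resource="memory"}) by (pod)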

Mitigation #

To mitigate high latency, take the following steps:

  1. Scale out services: Temporarily increase the number of replicas for the affected service to absorb the request volume (see the autoscaler sketch after this list).
  2. Optimize service configuration: Review and tune timeouts, retries, and circuit breakers (see the Istio traffic-policy sketch after this list).
  3. Improve resource allocation: Ensure that pods have CPU and memory requests and limits sized for the observed load.
  4. Apply caching or content compression: Implement caching or content compression to reduce the load on services and improve response times.
  5. Investigate the root cause: Identify and address the underlying cause of the high latency, which may require fixing application code, database queries, or other system components.
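
For step 1, replicas can be increased manually or with a HorizontalPodAutoscaler. The manifest below is a minimal sketch that assumes the affected workload is a Deployment named <service> and that a CPU metrics source (e.g. metrics-server) is available; the values are illustrative:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: <service>
  namespace: <namespace>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <service>
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas when average CPU exceeds 70% of requests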
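
For step 2, timeouts, retries, and circuit breaking are configured in Istio through a VirtualService and a DestinationRule. The sketch below uses <service> and <namespace> as placeholders and illustrative values only; tune them to the service's latency budget:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: <service>
  namespace: <namespace>
spec:
  hosts:
  - <service>
  http:
  - route:
    - destination:
        host: <service>
    timeout: 2s                  # fail fast instead of letting slow requests queue up
    retries:
      attempts: 2
      perTryTimeout: 1s
      retryOn: 5xx,connect-failure
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: <service>
  namespace: <namespace>
spec:
  host: <service>
  trafficPolicy:
    connectionPool:
      http:
        http2MaxRequests: 100           # cap concurrent requests to the backend
        maxRequestsPerConnection: 10
    outlierDetection:                   # simple circuit breaker: eject failing endpoints
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s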