Fix gRPC DEADLINE_EXCEEDED Errors Under Load

gRPC clients throw DEADLINE_EXCEEDED when traffic rises. Propagate deadlines, set per-RPC timeouts by SLO, add the built-in retry policy with throttling, and trip a circuit breaker so one slow backend stops cascading.

Published: May 24, 2026 Updated: Jun 18, 2026 Author: AI Productivity Guide Team

Latency was fine at noon. By 14:00 your gRPC service is on fire: clients return DEADLINE_EXCEEDED (status code 4) everywhere, the upstream is also getting DEADLINE_EXCEEDED from its dependencies, and tail latency has eaten the dashboards. The pattern is almost always the same: a slow downstream, no deadline propagation, and per-RPC timeouts so tight they all trip at once.

Fastest fix: raise the client deadline above your current server p99 to buy breathing room, then fix the real cause. Propagate the caller’s ctx to every downstream so nobody works on a request the user abandoned, replace hand-rolled retries with the built-in retryPolicy plus retryThrottling, and put a circuit breaker on the slowest dependency so a single bad backend fails fast instead of cascading.

First, read the status code correctly

DEADLINE_EXCEEDED is reported on the client side when the call did not finish within the deadline the client set. On the server side, once that same deadline passes, gRPC cancels the in-flight handler and surfaces CANCELLED (status code 1), not DEADLINE_EXCEEDED. So:

Seeing DEADLINE_EXCEEDED on the caller and CANCELLED on the callee for the same trace = deadline propagation is working; the downstream is just too slow.
Seeing DEADLINE_EXCEEDED on the caller but the callee runs to completion with OK = deadline is not propagating. That is cause #2 below.

Deadlines are absolute points in time, but gRPC does not ship a wall-clock timestamp (the two machines’ clocks disagree). It serializes the remaining budget into the grpc-timeout request header as a relative duration with elapsed time already deducted, and the server reconstructs the deadline on receipt. Confirm the header is present with grpcurl -vv or by inspecting metadata.
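To make the relative-duration idea concrete, here is a minimal sketch of how a remaining budget could be rendered in the grpc-timeout wire format (up to 8 ASCII digits plus a unit of H, M, S, m, u or n). The helper names encodeTimeout and remainingHeader are hypothetical and exist only for illustration; gRPC libraries do this internally, so you would not normally write it yourself.

```go
package main

import (
	"context"
	"strconv"
	"time"
)

// encodeTimeout renders a remaining budget as a grpc-timeout value: at most
// 8 ASCII digits followed by a unit. It picks the finest unit that fits and
// rounds up so the budget is never understated.
// Hypothetical helper for illustration only.
func encodeTimeout(d time.Duration) string {
	if d <= 0 {
		return "0n"
	}
	const maxValue = 99999999 // largest value that fits in 8 digits
	units := []struct {
		suffix string
		size   time.Duration
	}{
		{"n", time.Nanosecond},
		{"u", time.Microsecond},
		{"m", time.Millisecond},
		{"S", time.Second},
		{"M", time.Minute},
		{"H", time.Hour},
	}
	for _, u := range units {
		v := int64(d / u.size)
		if d%u.size != 0 {
			v++ // round up to the next whole unit
		}
		if v <= maxValue {
			return strconv.FormatInt(v, 10) + u.suffix
		}
	}
	return strconv.Itoa(maxValue) + "H"
}

// remainingHeader returns the value a client would send for ctx at time now,
// or "" when ctx has no deadline (no header is sent in that case).
func remainingHeader(ctx context.Context, now time.Time) string {
	deadline, ok := ctx.Deadline()
	if !ok {
		return ""
	}
	return encodeTimeout(deadline.Sub(now))
}
```

Because only the remaining duration crosses the wire, each hop sees a slightly smaller number than the one before it, which is why clock skew between machines does not matter.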

Common causes

Ordered by hit rate.

#	Cause	Tell-tale signal
1	Server is genuinely slower than the client deadline	Server p99 > client deadline in the matching window
2	Deadlines not propagated through the chain	Callee finishes OK long after caller timed out; inflight goroutines climb
3	No retry policy	Error rate spikes on minor blips, recovers slowly
4	Naive retry with no throttle / breaker	Backend RPS rises during a slowdown instead of dropping (retry storm)
5	Head-of-line blocking on one HTTP/2 connection	p99 across all endpoints degrades together when only one is slow

1. Server is genuinely slower than the client timeout

Service is at p99 = 1.5 s under load; client deadline is 1 s. At least the slowest 1% of requests miss the deadline, and usually far more, because every request slower than 1 s fails. Adding more clients makes it worse because nobody backs off.

How to spot it: server p99 > client deadline in the matching time window.
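If you can export raw server latencies for the affected window, a rough check is to count how many of them exceed the client deadline. The sketch below assumes latencies are available as milliseconds in a slice; fractionOverDeadline is a hypothetical helper, not part of any gRPC tooling.

```go
package main

// fractionOverDeadline returns the share of observed server latencies (in
// milliseconds) that exceed the client deadline, i.e. the share of calls
// that would surface as DEADLINE_EXCEEDED on the client.
func fractionOverDeadline(latenciesMs []float64, deadlineMs float64) float64 {
	if len(latenciesMs) == 0 {
		return 0
	}
	over := 0
	for _, l := range latenciesMs {
		if l > deadlineMs {
			over++
		}
	}
	return float64(over) / float64(len(latenciesMs))
}
```

For example, with samples of 200, 400, 900, 1200 and 1500 ms against a 1000 ms deadline, two of the five calls would time out.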

2. Deadlines are not propagated through the chain

Client sets a 5 s deadline calling service A. Service A calls service B with a fresh context.Background() (no deadline, defaults to infinity). A times out, but B keeps working on a request nobody is waiting for, wasting capacity.

How to spot it: B never sees CANCELLED from upstream cancellation; requests keep finishing long after the user gave up. Inflight goroutines / threads climb.
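The difference is easy to reproduce with the standard library alone. In this sketch, slowDownstream stands in for service B and propagationDemo plays service A; both names and the 20 ms / 100 ms figures are assumptions chosen for illustration.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// slowDownstream stands in for service B: it needs `work` to finish, but it
// stops early if the context it was given is cancelled or times out.
func slowDownstream(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil // ran to completion
	case <-ctx.Done():
		return ctx.Err() // stopped because the caller gave up
	}
}

// propagationDemo calls the downstream twice under a 20 ms caller deadline:
// once with the caller's ctx and once with a detached context.Background().
func propagationDemo() (propagatedStopped, detachedStopped bool) {
	callerCtx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()

	// Propagated: the downstream sees the deadline and stops at about 20 ms.
	err := slowDownstream(callerCtx, 100*time.Millisecond)
	propagatedStopped = errors.Is(err, context.DeadlineExceeded)

	// Detached: the downstream has no deadline and burns the full 100 ms,
	// even though the caller's budget has already expired.
	err = slowDownstream(context.Background(), 100*time.Millisecond)
	detachedStopped = err != nil
	return propagatedStopped, detachedStopped
}
```

The detached call is the wasted-capacity case described above: the work finishes with no error, but nobody is left to read the result.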

3. No retry policy

A transient hiccup turns into a user-visible failure because the client gives up after the first failed attempt.

How to spot it: error rate jumps sharply during minor blips and recovers slowly.

4. Naive retry without throttling or a circuit breaker

Client retries every failure. A slow backend gets 3x the load because every call retries twice. Retry storm.

How to spot it: backend RPS spikes during a slowdown instead of dropping.
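The 3x figure falls out of simple arithmetic. Assuming each attempt fails independently with the same probability (a simplification; real failures are correlated), the average number of attempts per logical call is 1 + p + p^2 + ... up to the attempt limit. expectedAttempts is a hypothetical helper that computes it.

```go
package main

// expectedAttempts returns the average number of attempts sent per logical
// call when each attempt fails independently with probability failRate and
// the client stops after maxAttempts (the first try included).
func expectedAttempts(failRate float64, maxAttempts int) float64 {
	total := 0.0
	reach := 1.0 // probability that attempt k is made at all
	for k := 0; k < maxAttempts; k++ {
		total += reach
		reach *= failRate // the next attempt happens only if this one fails
	}
	return total
}
```

With a fully degraded backend (failRate 1.0) and three attempts per call, the result is 3, which is the tripled load described above; at a 50% failure rate it is still 1.75 times the baseline.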

5. Head-of-line blocking on a single connection

HTTP/2 streams on one connection share flow control. One slow stream slows everyone on that connection, and a default gRPC channel often uses a single TCP connection per subchannel.

How to spot it: p99 across endpoints all degrade together even when only one endpoint is slow.

Shortest path to fix

Step 1: Propagate deadlines through every hop

Go server-side handler: ctx carries the deadline from the caller; pass that same ctx to downstreams. Never swap in context.Background() or context.TODO().

func (s *server) GetOrder(ctx context.Context, req *pb.GetOrderReq) (*pb.Order, error) {
    // ctx already carries the caller's deadline. Do NOT replace it with
    // context.Background() — that detaches the downstream from cancellation.
    user, err := s.userClient.GetUser(ctx, &pb.GetUserReq{Id: req.UserId})
    if err != nil { return nil, err }
    // ...
}

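If you genuinely need a sub-budget for a downstream (e.g. reserve 200 ms for cleanup), derive it from the inbound ctx rather than starting from scratch:

// The child deadline is the earlier of the caller's deadline and now+800ms:
// WithTimeout can shorten the inherited budget but never extend it.
dctx, cancel := context.WithTimeout(ctx, 800*time.Millisecond)
defer cancel()
user, err := s.userClient.GetUser(dctx, &pb.GetUserReq{Id: req.UserId})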
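A fixed 800 ms only reserves 200 ms when the inbound budget happens to be 1 s. One alternative, sketched here under that assumption, is to subtract the reserve from whatever deadline the caller actually sent. withReserve and reserveGapMillis are hypothetical names introduced for this example.

```go
package main

import (
	"context"
	"time"
)

// withReserve derives a child context that expires `reserve` before the
// parent's deadline, leaving that much time for cleanup after the downstream
// call returns. If the parent has no deadline, it applies `fallback` instead.
func withReserve(parent context.Context, reserve, fallback time.Duration) (context.Context, context.CancelFunc) {
	deadline, ok := parent.Deadline()
	if !ok {
		return context.WithTimeout(parent, fallback)
	}
	return context.WithDeadline(parent, deadline.Add(-reserve))
}

// reserveGapMillis shows the effect for a 1 s inbound budget and a 200 ms
// reserve: it returns how much earlier the child expires than the parent.
func reserveGapMillis() int64 {
	parent, cancelParent := context.WithTimeout(context.Background(), time.Second)
	defer cancelParent()

	child, cancelChild := withReserve(parent, 200*time.Millisecond, time.Second)
	defer cancelChild()

	parentDeadline, _ := parent.Deadline()
	childDeadline, _ := child.Deadline()
	return parentDeadline.Sub(childDeadline).Milliseconds()
}
```

In a handler you would call withReserve(ctx, 200*time.Millisecond, time.Second) and pass the returned context to the downstream client, keeping the deferred cancel.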

Node client: set the deadline as an absolute Date. @grpc/grpc-js converts it to the relative grpc-timeout header for you, so you do not subtract elapsed time by hand. The mistake to avoid is creating a brand-new full-length deadline at every hop instead of carrying the remaining budget forward.

// `client` is assumed to be a generated @grpc/grpc-js service client created elsewhere.
const deadline = new Date(Date.now() + 2000);   // 2 s budget
client.getOrder({ id: '...' }, { deadline }, (err, res) => { /* ... */ });

Step 2: Pick per-RPC timeouts by SLO, not by guess

Call type	Typical deadline
Sync user-facing read	200-500 ms
Sync user-facing write	1-2 s
Background batch	30-60 s
Streaming	no deadline on the call; budget each iteration

Set a default in the service config and override per call. The official gRPC guidance is to start from an educated guess (network latency + server processing) and validate it with load testing, not pull a round number out of the air.

Step 3: Use the built-in retry policy (with throttling)

gRPC supports a service-config-driven retry policy (defined in gRFC A6). Use it instead of hand-rolled retries: it applies exponential backoff with automatic jitter of +/- 20%, so you do not have to add jitter yourself. With the initialBackoff of 0.05s used in the config below, the first wait is a uniform random value in [40ms, 60ms], which is exactly the spread you want to avoid a synchronized retry herd.

{
  "methodConfig": [{
    "name": [{ "service": "shop.OrderService" }],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.05s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2.0,
      "retryableStatusCodes": ["UNAVAILABLE", "RESOURCE_EXHAUSTED"]
    },
    "timeout": "2s"
  }],
  "retryThrottling": {
    "maxTokens": 10,
    "tokenRatio": 0.1
  }
}
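To see what those backoff fields produce, the following sketch computes the wait before the n-th retry the way this article describes it: exponential growth from initialBackoff, capped at maxBackoff, then scaled by a random factor between 0.8 and 1.2. The function names are hypothetical, and the numbers are taken from the config above; the real computation happens inside the gRPC client.

```go
package main

import (
	"math"
	"math/rand"
	"time"
)

// backoffMillis returns the wait in milliseconds before the n-th retry
// (n starts at 1). u is a uniform sample in [0, 1) that supplies the jitter,
// passed in explicitly so the arithmetic is easy to follow and to test.
func backoffMillis(n int, initialMs, maxMs, multiplier, u float64) float64 {
	base := math.Min(initialMs*math.Pow(multiplier, float64(n-1)), maxMs)
	return base * (0.8 + 0.4*u) // scale by a factor in [0.8, 1.2)
}

// nextDelay applies the values from the config above: initialBackoff 0.05s,
// maxBackoff 1s, backoffMultiplier 2.0.
func nextDelay(n int) time.Duration {
	ms := backoffMillis(n, 50, 1000, 2.0, rand.Float64())
	return time.Duration(ms * float64(time.Millisecond))
}
```

With these settings the un-jittered waits are 50 ms, 100 ms and 200 ms for the three retries that maxAttempts: 4 allows, so the 1 s cap never comes into play.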

Three things people get wrong here:

maxAttempts is capped at 5 by default. As of June 2026 the gRPC retry spec (gRFC A6) defines a client-side maximum of 5: any value above 5 is silently clamped to 5 (it is not even a validation error), so maxAttempts: 10 behaves like 5. The cap is raisable through a channel argument if you truly need it, but that is almost never the right move. Count the first try in the number you set (4 means 1 try + 3 retries).
retryThrottling is your built-in anti-retry-storm. It maintains a token bucket that starts at maxTokens: each failed RPC spends one token, and each success adds tokenRatio back. Per gRFC A6, retries stop once the count falls to half of maxTokens or below (not only when the bucket is empty) and resume when successes lift it back above that threshold. This is the channel-level defense against cause #4; add it whenever you enable retries.
DEADLINE_EXCEEDED is configurable as a retryable code, but be careful. The gRPC docs do allow it in retryableStatusCodes, yet a retry only fires if the overall deadline still has budget left. If the per-attempt and overall budgets are equal, the retry dies on the same deadline. Only add it for idempotent methods and set the per-attempt timeout below the overall call deadline.
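The throttling rule is small enough to sketch in full. The type below follows the token-bucket behavior that gRFC A6 describes (start full, subtract 1 per failure, add tokenRatio per success, allow retries only while more than half the tokens remain). retryThrottle and its methods are hypothetical names; real clients keep this state internally and you only supply maxTokens and tokenRatio.

```go
package main

import "sync"

// retryThrottle is an illustrative model of the retryThrottling token bucket.
type retryThrottle struct {
	mu         sync.Mutex
	tokens     float64
	maxTokens  float64
	tokenRatio float64
}

// newRetryThrottle starts with a full bucket.
func newRetryThrottle(maxTokens, tokenRatio float64) *retryThrottle {
	return &retryThrottle{tokens: maxTokens, maxTokens: maxTokens, tokenRatio: tokenRatio}
}

// onFailure records a failed RPC: one token is removed, never below zero.
func (r *retryThrottle) onFailure() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.tokens--
	if r.tokens < 0 {
		r.tokens = 0
	}
}

// onSuccess records a successful RPC: tokenRatio is added, never above max.
func (r *retryThrottle) onSuccess() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.tokens += r.tokenRatio
	if r.tokens > r.maxTokens {
		r.tokens = r.maxTokens
	}
}

// retryAllowed reports whether retries may be sent: only while the token
// count is strictly above half of maxTokens.
func (r *retryThrottle) retryAllowed() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.tokens > r.maxTokens/2
}
```

With maxTokens: 10 and tokenRatio: 0.1, five failures in a row are enough to pause retries, and it takes successes, not time, to bring them back.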

Wire it in (Go):

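// cfg holds the service config JSON shown above.
const cfg = `{ "methodConfig": [...], "retryThrottling": {...} }`
conn, err := grpc.NewClient("dns:///orders:50051",
    grpc.WithDefaultServiceConfig(cfg),
    grpc.WithTransportCredentials(insecure.NewCredentials()), // plaintext, for illustration
)
if err != nil {
    log.Fatalf("create client: %v", err)
}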

grpc.Dial was deprecated in grpc-go v1.63 in favor of grpc.NewClient (which is lazy and does not block on connect); migrate if you are still calling Dial. One gotcha during that migration: grpc.NewClient defaults to the dns resolver, whereas grpc.Dial defaulted to passthrough. A bare host:port target is therefore resolved through dns under NewClient; write dns:///host:port to make that explicit, or passthrough:///host:port to keep the old behavior. Otherwise name resolution and load balancing may not behave the way they did under Dial.

Step 4: Add a circuit breaker on the slow path

Retry throttling slows a storm; a circuit breaker stops one entirely. Wrap the outbound call so that after N failures in a window the breaker opens for M seconds and short-circuits immediately. Use github.com/sony/gobreaker/v2 (the v2 module is generics-based and needs Go 1.21+; the old non-versioned import using interface{} is legacy):

import "github.com/sony/gobreaker/v2"

cb := gobreaker.NewCircuitBreaker[*pb.Order](gobreaker.Settings{
    Name:        "orders",
    MaxRequests: 1,
    Interval:    30 * time.Second,
    Timeout:     10 * time.Second,
    ReadyToTrip: func(c gobreaker.Counts) bool {
        return c.Requests >= 20 && float64(c.TotalFailures)/float64(c.Requests) > 0.5
    },
})

res, err := cb.Execute(func() (*pb.Order, error) {
    return client.GetOrder(ctx, req)
})

When the breaker is open it returns gobreaker.ErrOpenState instantly; map that to a fast UNAVAILABLE for your own callers instead of letting them wait out the full deadline. That fast failure is what stops the cascade. If you want retry, breaker, timeout, and hedging composed in one place, failsafe-go bundles all of them.
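If the state machine behind a breaker is unfamiliar, this stripped-down version shows the fail-fast behavior using only the standard library. It is a toy under stated assumptions (consecutive-failure threshold, fixed cooldown, no half-open probe limit) and is not a substitute for gobreaker; miniBreaker, errBreakerOpen and backendCallsDuringOutage are hypothetical names. Here errBreakerOpen plays the role that gobreaker.ErrOpenState plays above.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// errBreakerOpen is returned without calling the backend while the breaker is open.
var errBreakerOpen = errors.New("circuit breaker is open")

// miniBreaker opens after `threshold` consecutive failures and stays open
// for `cooldown`, after which calls are let through again.
type miniBreaker struct {
	mu        sync.Mutex
	threshold int
	cooldown  time.Duration
	failures  int
	openUntil time.Time
}

// call runs fn unless the breaker is open, and updates the failure count.
func (b *miniBreaker) call(fn func() error) error {
	b.mu.Lock()
	if time.Now().Before(b.openUntil) {
		b.mu.Unlock()
		return errBreakerOpen // fail fast: the backend is not touched
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		b.failures = 0
		return nil
	}
	b.failures++
	if b.failures >= b.threshold {
		b.openUntil = time.Now().Add(b.cooldown) // trip the breaker
		b.failures = 0
	}
	return err
}

// backendCallsDuringOutage sends 5 calls to an always-failing backend through
// a breaker with threshold 2 and reports how many actually reached it.
func backendCallsDuringOutage() int {
	b := &miniBreaker{threshold: 2, cooldown: time.Minute}
	reached := 0
	for i := 0; i < 5; i++ {
		_ = b.call(func() error {
			reached++
			return errors.New("backend timeout")
		})
	}
	return reached
}
```

Only the first two calls reach the failing backend; the remaining three return immediately, which is the load drop you want to see during a slowdown.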

Step 5: Trace the slow span

Turn the symptom into a single span to optimize.

# Reproduce the call directly against the service
grpcurl -d '{"id":"abc"}' -plaintext orders:50051 shop.OrderService/GetOrder

In your tracing UI (Jaeger, Tempo, Honeycomb), filter for status_code = DEADLINE_EXCEEDED (or CANCELLED on the callee) and open the longest child span. That is the path to optimize. Common offenders: a synchronous DB query stuck behind a lock, a blocking call to a third-party API with no timeout of its own, or an N+1 RPC pattern fanning out one request into dozens.

Step 6: Spread load across more connections

conn, err := grpc.NewClient("dns:///orders.example.com:50051",
    grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
    grpc.WithTransportCredentials(insecure.NewCredentials()),
)
if err != nil {
    log.Fatalf("create client: %v", err)
}

round_robin over multiple subchannels (one per resolved address) avoids head-of-line blocking on a single HTTP/2 connection. For this to help you need a DNS name that resolves to multiple backends (or an xDS-based setup that supplies several endpoints); a single static host:port still gives you one connection, and so does the default pick_first policy, which sticks to the first address that connects.

How to confirm it’s fixed

Pull the DEADLINE_EXCEEDED rate by method (not just the global error rate) and confirm it returns to baseline during the same load level that broke it.
Re-run the load test that triggered it; server p99 should now sit comfortably below the client deadline.
Check that the callee logs CANCELLED (not OK) for requests the caller abandoned — that proves propagation works.
During an induced slowdown, backend RPS should now fall (breaker open) instead of spiking.

Prevention

Deadlines propagate through every hop; no context.Background() in handlers.
Per-RPC timeouts derived from SLO and load testing, not guesses.
Built-in retryPolicy with bounded attempts (maxAttempts <= 5) plus retryThrottling; never retry DEADLINE_EXCEEDED without a smaller per-attempt timeout and an idempotent method.
Circuit breaker on every external dependency.
Tracing on by default; alert on the per-method DEADLINE_EXCEEDED rate, not just the global error rate.

FAQ

Why does my server show CANCELLED but my client shows DEADLINE_EXCEEDED? That is the correct, healthy pairing. The client hit its deadline (code 4); gRPC then cancelled the in-flight server handler (code 1) because nobody is waiting for the result. It means deadline propagation is working. The fix is to make the downstream faster or raise the budget, not to chase the cancellation.

Should I put DEADLINE_EXCEEDED in retryableStatusCodes? Only for idempotent methods, and only if the per-attempt timeout is smaller than the overall call deadline. A retry fires only while the overall deadline still has time left, so if both budgets are equal the retry dies instantly on the same deadline. For most setups, retry UNAVAILABLE and RESOURCE_EXHAUSTED and leave DEADLINE_EXCEEDED out.

My maxAttempts: 10 does not seem to retry that many times. It cannot. As of June 2026 the gRPC retry spec sets a default client-side maximum of 5, so anything above 5 is clamped to 5. The cap is adjustable via a channel argument, but if you find yourself reaching for that you almost certainly need to fix the downstream, not pile on more retries.

Retries made things worse, not better. Why? You likely have no throttle. Without retryThrottling, every failed call multiplies load on an already-slow backend, the classic retry storm. Add the token-bucket retryThrottling block, and pair it with a circuit breaker that fails fast once the failure ratio crosses your threshold.

Increasing the deadline did not help. What now? A bigger deadline only helps when the work genuinely finishes a little late. If p99 is climbing without bound, you are capacity- or contention-bound: trace the longest span (Step 5), and look for a lock, an untimed third-party call, or an N+1 RPC fan-out. More time just lets more requests pile up.

External references: gRPC Deadlines guide, gRPC Retry guide, gRPC Status Codes, gRFC A6: client retries.
