Fix External API 429 Rate Limit Errors

Stripe, Twilio, and SendGrid all return 429 once traffic ramps. Read the rate-limit headers, add exponential back-off with jitter, cache idempotent GETs, dedupe concurrent calls, and batch writes — in that order.

Published: May 17, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

You call Stripe / Shopify / Twilio / SendGrid from your server. Dev is fine. Traffic ramps up and you start getting:

HTTP 429 Too Many Requests
{
  "error": {
    "type": "rate_limit_error",
    "message": "Too many requests"
  }
}

Fastest fix: wrap every outbound call in a retry helper that reads the provider’s retry header, sleeps with exponential back-off plus random jitter, and caps at about 60s. That alone clears most 429 storms. Then cache idempotent GETs and dedupe concurrent identical calls so the same request never hits upstream twice. The problem is almost never “too many calls” in absolute terms — it is that your client layer retries instantly, doesn’t cache idempotent reads, and doesn’t dedupe concurrent calls, so a small spike multiplies into a flood.

Mental model: third-party rate limits are scoped to some combination of (account, endpoint, time window), and many providers also enforce a separate concurrency cap (how many requests are in flight at once). The fix isn’t “go slower” globally — it’s reforming the request pattern so you stop sending avoidable and bursty calls.

Which 429 are you getting? (read the headers first)

Before changing code, look at the actual response headers — they tell you which limit you hit and how long to wait. As of June 2026:

Provider	Status on limit	Header that tells you to wait	Documented limits
Stripe	`429 Too Many Requests`	`Stripe-Rate-Limited-Reason` (values: `global-rate`, `endpoint-rate`, `global-concurrency`, `endpoint-concurrency`, `resource-specific`)	100 req/s live mode, 25 req/s sandbox
SendGrid (Twilio)	`429 Too Many Requests`	`X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset` (epoch reset)	600 req/min on most v3 endpoints; mail send much higher
Twilio REST	`429`, error code `20429`	`Twilio-Concurrent-Requests` (in-flight count)	Concurrency-based; not a fixed RPS
Generic / RFC	`429 Too Many Requests`	`Retry-After` (seconds, e.g. `Retry-After: 30`, or an HTTP-date)	Defined in RFC 6585; `Retry-After` in RFC 9110 §10.2.3

Two things to notice. First, Stripe does not send a standard X-RateLimit-* or Retry-After header — it sends Stripe-Rate-Limited-Reason instead, so a generic “read Retry-After” helper gets nothing from Stripe and must fall back to back-off. Second, Twilio’s 429 is usually a concurrency cap, not a per-second rate — fewer parallel requests fixes it even if your total volume is low.
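As a sketch of reading those headers in one place, the function below sorts a 429 into rate versus concurrency and extracts any wait hint. The names classify429, HeaderBag, and LimitInfo are illustrative and not from any SDK. Two assumptions are taken from the table above: a Twilio-Concurrent-Requests header on a 429 is treated as a concurrency signal, and X-RateLimit-Reset is treated as epoch seconds.

```typescript
// Minimal header reader so the sketch works with fetch's Headers or a test stub.
interface HeaderBag {
  get(name: string): string | null;
}

type LimitKind = 'rate' | 'concurrency' | 'unknown';

interface LimitInfo {
  kind: LimitKind;
  waitMs: number | null; // provider's wait hint in ms, or null when it gave none
}

function classify429(headers: HeaderBag, nowMs: number = Date.now()): LimitInfo {
  // Stripe: the reason value ends in "-rate" or "-concurrency".
  const reason = headers.get('stripe-rate-limited-reason') ?? '';
  // Twilio (assumption): an in-flight counter on a 429 points at a concurrency cap.
  const twilioInFlight = headers.get('twilio-concurrent-requests');

  let kind: LimitKind = 'unknown';
  if (reason.endsWith('-concurrency') || twilioInFlight !== null) kind = 'concurrency';
  else if (reason.endsWith('-rate')) kind = 'rate';

  let waitMs: number | null = null;
  const retryAfter = headers.get('retry-after');
  const reset = headers.get('x-ratelimit-reset');
  if (retryAfter !== null && retryAfter.trim() !== '') {
    // Retry-After is either delta-seconds or an HTTP-date.
    const seconds = Number(retryAfter);
    const dateMs = Date.parse(retryAfter);
    if (Number.isFinite(seconds)) waitMs = Math.max(seconds * 1000, 0);
    else if (!Number.isNaN(dateMs)) waitMs = Math.max(dateMs - nowMs, 0);
  } else if (reset !== null && reset.trim() !== '' && Number.isFinite(Number(reset))) {
    // Assumed to be epoch seconds.
    waitMs = Math.max(Number(reset) * 1000 - nowMs, 0);
  }

  // Heuristic: a wait hint with no other signal is treated as a plain rate limit.
  if (waitMs !== null && kind === 'unknown') kind = 'rate';
  return { kind, waitMs };
}
```

A caller can branch on the result: for `concurrency`, lower the number of parallel requests; for `rate` with a `waitMs`, sleep at least that long; for `unknown`, fall back to exponential back-off.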

Common causes

Ordered from most to least common.

1. Tight retry loop, no back-off

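try { fetch() } catch { setTimeout(fetch, 100) }: 100ms later the window has not reset, so the retry gets another 429, and an unbounded retry loop keeps the limiter saturated. (fetch also resolves rather than throws on a 429, so the check belongs on res.status, not in catch.)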

How to spot it: 10+ 429s in seconds in the logs, all from the same code path.

2. Ignoring the wait header

The provider’s 429 tells you how long to wait (Retry-After, X-RateLimit-Reset, or simply “back off” for Stripe). A hardcoded sleep ignores it and almost certainly hits the wall again.

How to spot it: code has a fixed await sleep(1000) with no header read.

3. Multiple workers calling the same endpoint

10 workers each run GET /products/123 for the same product. Each worker thinks its own volume is fine; combined, they bust the limit.

How to spot it: logs show the same URL hit several times within the same second.

4. Not caching idempotent GETs

GET /products hits upstream on every call, even though the catalog rarely changes — every user request triggers a fresh fetch.

How to spot it: frequent duplicate GETs in logs for slow-changing data.

5. Burst traffic (cron, batch) blows the limit

Promise.all([100 fetches]) instantly burns through “60 / min” or trips the concurrency cap, yielding immediate 429s.

How to spot it: code has wide fan-out (Promise.all, parallel map) over a large array.

6. Shared API key across services

Backend, CI, and cron all run on one API key. Each service looks fine in isolation, but their combined traffic exceeds the quota.

How to spot it: the provider’s dashboard shows multiple source IPs or services on one key.

Shortest path to fix

Step 1: Exponential back-off + respect the wait header

async function fetchWithRetry(url: string, opts: RequestInit = {}, maxRetries = 5) {
  const baseDelay = 1000;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url, opts);
    if (res.status !== 429) return res;
    if (attempt === maxRetries - 1) break; // out of attempts: don't sleep before throwing

    // RFC 9110 Retry-After is either delta-seconds or an HTTP-date.
    // Stripe omits it; treat a missing or unparseable value as 0.
    const header = res.headers.get('retry-after') ?? '';
    const seconds = Number(header); // '' -> 0, HTTP-date -> NaN
    const retryAfterMs = Number.isFinite(seconds)
      ? seconds * 1000
      : Math.max((Date.parse(header) || 0) - Date.now(), 0);
    const backoff = baseDelay * 2 ** attempt + Math.random() * 250; // exp + jitter
    const wait = Math.min(Math.max(retryAfterMs, backoff), 60_000);

    console.warn(`429 on ${url}, sleeping ${Math.round(wait)}ms (attempt ${attempt + 1})`);
    await new Promise((r) => setTimeout(r, wait));
  }
  throw new Error(`Max retries exceeded for ${url}`);
}

Read Retry-After first; if it is missing (Stripe), fall back to exponential back-off with jitter, capped at 60s. The jitter (Math.random() * 250) is what prevents a thundering herd where every retrying client wakes up at the same instant — Stripe explicitly recommends adding randomness to the back-off.
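The helper above adds up to 250ms of jitter on top of a fixed exponential delay, which leaves retries fairly tightly clustered once the delay reaches several seconds. One common alternative, shown here as a sketch and not as what Stripe or any provider prescribes, is "full jitter": pick a random delay anywhere between 0 and the exponential ceiling. The function name and its parameters are illustrative, and the random source is injectable only so the schedule can be checked deterministically.

```typescript
// Delay for a retry attempt: exponential ceiling, then a random point below it.
function backoffWithFullJitter(
  attempt: number, // 0-based retry attempt
  baseMs = 1000,
  capMs = 60_000,
  random: () => number = Math.random, // injectable for tests
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Example schedule with a fixed "random" value of 0.5:
// attempt 0 -> 500ms, attempt 3 -> 4000ms, attempt 10 -> 30000ms (capped at 60s before jitter).
```

If the provider sent a wait hint, take the larger of the hint and this delay, as the helper above does with Retry-After.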

Note: the official Stripe and OpenAI Node SDKs already retry 429s with back-off internally, so if you call those through their SDK you mostly get this for free — this helper is for raw fetch calls to APIs that don’t.

Step 2: Cache idempotent GETs

// Short-window in-memory cache. Recent lru-cache releases export LRUCache by name (no default export).
import { LRUCache } from 'lru-cache';
// Recent typings require a non-nullish value type, hence `object` rather than `unknown`.
const cache = new LRUCache<string, object>({ max: 1000, ttl: 60_000 }); // 60s

async function getProduct(id: string) {
  const key = `product:${id}`;
  const hit = cache.get(key);
  if (hit !== undefined) return hit;

  const res = await fetchWithRetry(`/api/products/${id}`);
  if (!res.ok) throw new Error(`Upstream ${res.status} for ${key}`); // never cache an error body
  const data = await res.json();
  cache.set(key, data);
  return data;
}

(Older lru-cache releases used a default import. Current releases export the class by name only, so import LRU from 'lru-cache' fails to compile in TypeScript.)

For longer windows or multi-instance deployments, use Redis (Upstash’s free tier is plenty) so all workers share one cache:

import { Redis } from '@upstash/redis';
const redis = Redis.fromEnv();

async function getProduct(id: string) {
  const cached = await redis.get(`product:${id}`);
  if (cached !== null) return cached; // null means the key is absent or expired
  const res = await fetchWithRetry(`/api/products/${id}`);
  if (!res.ok) throw new Error(`Upstream ${res.status} for product:${id}`); // never cache an error body
  const data = await res.json();
  await redis.set(`product:${id}`, JSON.stringify(data), { ex: 3600 }); // 1h
  return data;
}

Step 3: Request coalescing (dedupe concurrent identical calls)

const inFlight = new Map<string, Promise<unknown>>();

async function getProductDedup(id: string) {
  const key = `product:${id}`;
  const existing = inFlight.get(key);
  if (existing) return existing;

  const promise = fetchWithRetry(`/api/products/${id}`)
    .then((r) => r.json())
    .finally(() => inFlight.delete(key));

  inFlight.set(key, promise);
  return promise;
}

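Ten callers in the same process invoking getProductDedup('123') at the same instant produce only one upstream call. The Map lives in one process's memory, so separate worker processes still need the shared cache from Step 2 to avoid duplicate calls. Within a process, this is one of the most effective defenses against Twilio-style concurrency limits.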
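Steps 2 and 3 are usually combined: check the cache, and on a miss join any call already in flight before starting a new one. The sketch below shows that combination with a plain Map and an expiry timestamp standing in for lru-cache so it stays self-contained. The names getOnce, cacheStore, pending, and fakeUpstream are illustrative.

```typescript
interface Entry {
  value: unknown;
  expiresAt: number; // epoch ms
}

const cacheStore = new Map<string, Entry>();
const pending = new Map<string, Promise<unknown>>();

// Return a cached value, join an in-flight load, or start exactly one new load.
function getOnce<T>(key: string, ttlMs: number, load: () => Promise<T>): Promise<T> {
  const entry = cacheStore.get(key);
  if (entry && entry.expiresAt > Date.now()) return Promise.resolve(entry.value as T);

  const existing = pending.get(key) as Promise<T> | undefined;
  if (existing) return existing;

  const promise = load()
    .then((value) => {
      cacheStore.set(key, { value, expiresAt: Date.now() + ttlMs });
      return value;
    })
    .finally(() => pending.delete(key)); // failed loads are not cached and can be retried
  pending.set(key, promise);
  return promise;
}

// Stand-in for the real upstream call, counting how often it runs.
let upstreamCalls = 0;
async function fakeUpstream(id: string): Promise<{ id: string }> {
  upstreamCalls++;
  await new Promise((r) => setTimeout(r, 10)); // simulated latency
  return { id };
}
```

Ten concurrent `getOnce('product:123', 60_000, () => fakeUpstream('123'))` calls share one load, and later calls within the TTL are served from the Map without touching upstream.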

Step 4: Throttle fan-out

import pLimit from 'p-limit';
const limit = pLimit(5); // at most 5 requests in flight

const results = await Promise.all(
  items.map((item) => limit(() => fetchItem(item))),
);

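Size the limiter from the docs. Documented RPM / 60 gives a rough RPS ceiling, but p-limit caps requests in flight, not requests per second: throughput is roughly concurrency divided by average latency, so 5 in flight at 200ms each is about 25 req/s. For a 60 RPM endpoint a safe in-flight count is usually 1-3, and fast responses may still need a delay between calls. For Twilio, a concurrency cap is a hard requirement: match the limiter to your account's concurrency budget rather than its message throughput.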
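The sizing arithmetic can be written down as a small helper. This is a rough sketch using Little's law (requests in flight is about requests per second times seconds per request), not a provider-endorsed formula. The function name, the headroom parameter, and its 0.7 default (mirroring a 30% buffer) are assumptions for illustration.

```typescript
interface LimiterSize {
  targetRps: number;     // documented limit minus headroom
  concurrency: number;   // value to pass to a concurrency limiter such as p-limit
  minIntervalMs: number; // minimum spacing between request starts to stay under targetRps
}

function sizeLimiter(rpm: number, avgLatencyMs: number, headroom = 0.7): LimiterSize {
  if (rpm <= 0 || avgLatencyMs <= 0 || headroom <= 0) {
    throw new RangeError('rpm, avgLatencyMs, and headroom must be positive');
  }
  const targetRps = (rpm / 60) * headroom;
  // Little's law, floored, but never below one request in flight.
  const concurrency = Math.max(1, Math.floor(targetRps * (avgLatencyMs / 1000)));
  const minIntervalMs = Math.ceil(1000 / targetRps);
  return { targetRps, concurrency, minIntervalMs };
}

// 60 RPM endpoint, 300ms average latency, default headroom:
// concurrency 1, and request starts spaced at least 1429ms apart.
const small = sizeLimiter(60, 300);
```

When the computed concurrency is 1 and responses are fast, the concurrency cap alone will not hold the rate down, which is where `minIntervalMs` matters.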

Step 5: Batch endpoints

Many APIs have a batch or bulk variant that does in one request what you were doing in N:

Inefficient: N × GET /users/{id}
Efficient:   1 × POST /users:batchGet { ids: [...] }

Read the docs, find the batch endpoint (Stripe lets you expand related objects in one call; SendGrid’s mail send takes many recipients per request), and cut your request count.
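Batch endpoints usually cap how many items one request may carry, so the client side of batching is mostly chunking. The sketch below assumes a hypothetical batch transport (fetchBatch) and a hypothetical limit of 100 ids per call; check the provider's docs for the real endpoint and limit.

```typescript
// Split a list into groups of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Fetch many users with as few requests as the (hypothetical) batch limit allows.
async function batchGetUsers(
  ids: readonly string[],
  fetchBatch: (ids: string[]) => Promise<unknown[]>, // hypothetical: one POST /users:batchGet
  maxPerCall = 100, // hypothetical per-request limit
): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const group of chunk(ids, maxPerCall)) {
    // Sequential on purpose: batching should not reintroduce a burst.
    results.push(...(await fetchBatch(group)));
  }
  return results;
}
```

With 250 ids and a limit of 100, this issues 3 requests instead of 250.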

Step 6: Split keys per service

Backend service A → KEY_A
Backend service B → KEY_B
CI / cron          → KEY_C

Each key carries its own quota, so a runaway cron job can’t starve live traffic. This also makes the dashboard readable when you are diagnosing which service is the noisy one.

Step 7: Upgrade plan / request a quota increase

After back-off, caching, dedupe, batching, and key-splitting, if you are still grazing the cap, request a higher limit from the provider. Stripe asks you to contact support for a sustained increase; SendGrid and Twilio raise concurrency/throughput limits as you scale or upgrade plan.

How to confirm it’s fixed

Watch the 429 rate. Log every 429 with the URL and the relevant header (Retry-After / Stripe-Rate-Limited-Reason / Twilio-Concurrent-Requests). After the fix it should drop to near zero under the same traffic.
Re-run the burst. Replay the cron job or load test that triggered it. With p-limit and dedupe in place, the upstream call count should be far lower than the number of items processed.
Check the cache hit rate. For cached GETs, confirm most requests now resolve from cache (log a cache HIT/MISS line) instead of hitting upstream.
Alert on regressions. Set an alert if the 429 rate goes above ~0.5% of requests — that means you are grazing the cap again and should optimize before users notice.
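The checks above need only two counters per upstream. A minimal in-process sketch follows; the class and method names are illustrative, the 0.005 default mirrors the ~0.5% threshold, and a real deployment would more likely export these numbers to its metrics system than keep them in memory.

```typescript
class UpstreamStats {
  private requests = 0;
  private rateLimited = 0;
  private cacheLookups = 0;
  private cacheHits = 0;

  // Call once per upstream response.
  recordResponse(status: number): void {
    this.requests++;
    if (status === 429) this.rateLimited++;
  }

  // Call once per cache lookup, with true for HIT and false for MISS.
  recordCacheLookup(hit: boolean): void {
    this.cacheLookups++;
    if (hit) this.cacheHits++;
  }

  // Fraction of upstream responses that were 429s.
  rate429(): number {
    return this.requests === 0 ? 0 : this.rateLimited / this.requests;
  }

  // Fraction of cache lookups that were hits.
  cacheHitRate(): number {
    return this.cacheLookups === 0 ? 0 : this.cacheHits / this.cacheLookups;
  }

  // True when the 429 rate is above the alert threshold.
  shouldAlert(threshold = 0.005): boolean {
    return this.rate429() > threshold;
  }
}
```

Keeping one instance per upstream (Stripe, Twilio, SendGrid) makes it clear which provider is being throttled.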

FAQ

Is it safe to retry a 429? Yes for idempotent reads (GET) and for providers that say so — Twilio explicitly states 429’d requests aren’t processed and are safe to retry after backing off. For writes (POST a charge, send an SMS), use the provider’s idempotency key (Stripe’s Idempotency-Key header, etc.) so a retry can’t create a duplicate.
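For writes, the key point is that every retry of one logical operation carries the same key. The sketch below assumes a hypothetical transport function (send) in place of a real HTTP client, and the function and type names are illustrative. Generate the key once per operation, for example a UUID, and store it with the operation if retries can happen across process restarts.

```typescript
interface SendResult {
  status: number;
}

// Hypothetical transport: a thin wrapper over fetch or an HTTP client.
type Send = (headers: Record<string, string>, body: string) => Promise<SendResult>;

async function postWithIdempotencyKey(
  send: Send,
  body: string,
  idempotencyKey: string, // created once by the caller, reused on every attempt
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<SendResult> {
  const headers = { 'Idempotency-Key': idempotencyKey };
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await send(headers, body); // same key each time, so a repeat cannot double-charge
    if (res.status !== 429) return res;
    if (attempt < maxAttempts - 1) {
      const wait = baseDelayMs * 2 ** attempt + Math.random() * 250; // exp + jitter
      await new Promise((r) => setTimeout(r, wait));
    }
  }
  throw new Error('Max retries exceeded');
}
```

Generating a fresh key inside the retry loop would defeat the purpose, because the provider would see each attempt as a new operation.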

Stripe returns 429 but there’s no Retry-After header — what do I wait on? Stripe doesn’t send Retry-After. It sends Stripe-Rate-Limited-Reason to explain why (rate vs concurrency, global vs endpoint). Fall back to exponential back-off with jitter, and if the reason is *-concurrency, reduce parallelism rather than just slowing the rate.

Increasing my retries made it worse. Why? More retries without jitter create a thundering herd: every client retries on the same schedule and re-bursts the limit in sync. Add randomness to the back-off and cap total attempts (5 is plenty); past that, queue the work instead of hammering.

My total volume is well under the limit but I still get 429s. You are probably hitting a concurrency cap, not a rate cap (common on Twilio). Ten parallel requests can trip it even at low total volume. Add request coalescing (Step 3) and a p-limit concurrency cap (Step 4).

In-memory cache or Redis? In-memory (lru-cache) is fine for a single process or where slightly stale per-instance data is acceptable. Once you run multiple workers or want a shared, longer TTL, move to Redis so every instance reads one cache and you stop multiplying upstream calls per instance.

Prevention

When integrating any API, read the rate-limit docs first, estimate peak RPS, and leave a 30% buffer.
Wrap every outbound call in a retry helper; ban naked fetch to third parties.
Cache idempotent GETs by default, with TTL matched to freshness (prices 60s, catalog 1h, config 1 day).
Multi-worker deployments must dedupe and use a shared cache (Redis), not in-memory.
Use p-limit for fan-out; never run a naked Promise.all over 100+ items.
One API key per service for fault isolation and split quota.
Monitor the 429 rate as an SLO: assign each upstream an RPM allowance and alert when it is exhausted.

Tags: #Backend #Debug #Troubleshooting