Hit External API Rate Limit

Stripe, Twilio, and SendGrid all return 429 once traffic ramps up. Fix it in this order: honor Retry-After, add exponential back-off, cache GETs, and batch writes.

You call Stripe / Shopify / Twilio / SendGrid from your server. Dev is fine. Traffic ramps up and you start getting:

HTTP 429 Too Many Requests
{
  "error": "rate_limit_exceeded",
  "retry_after": 5
}

Your code retries instantly and keeps getting rejected; on limiters that count rejected requests, every retry pushes recovery further out. Three workers call the same endpoint, and together they exceed the limit. The problem isn’t “too many calls.” It’s that your client layer doesn’t handle 429 correctly, doesn’t cache idempotent GETs, and doesn’t dedupe concurrent calls.

Mental model: third-party rate limits are usually scoped to (account, endpoint, time window). The fix isn’t “go slower”; it’s reshaping your request pattern.
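To make the model concrete, here is a minimal sketch of a fixed-window counter keyed by (account, endpoint, window), the kind of bookkeeping a provider might do on its side. This is an illustration under assumptions: real providers may use sliding windows or token buckets instead, and `FixedWindowLimiter` and `allow` are hypothetical names.

```typescript
// Hypothetical fixed-window limiter: counts requests per (account, endpoint, window).
// Old windows are never evicted in this sketch, so memory grows over time.
class FixedWindowLimiter {
  private counts = new Map<string, number>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request fits in the current window, false if it would get a 429.
  allow(account: string, endpoint: string, nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const key = `${account}|${endpoint}|${windowStart}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= this.limit) return false;
    this.counts.set(key, used + 1);
    return true;
  }
}
```

Three workers using the same account key draw from the same counter, which is why their combined traffic trips the limit even when each one looks fine on its own.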

Common causes

Ordered by how often they’re the culprit, most common first.

1. Tight retry loop, no back-off

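try { fetch() } catch { setTimeout(fetch, 100) }: 100 ms later it’s still a 429, and an unbounded retry loop only keeps the window saturated. Note that fetch resolves normally on a 429, so a catch block never even sees it; you have to check res.status.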

How to spot it: 10+ 429s within a few seconds in the logs.

2. Ignoring Retry-After

The provider’s 429 response tells you how long to wait, usually in a Retry-After header (some also put it in the body, like retry_after above). A hardcoded sleep ignores it and will likely hit the limit again.

How to spot it: the code has a hardcoded await sleep(1000) and never reads Retry-After.

3. Multiple workers calling same endpoint

10 workers each call GET /products/123 for the same product. Each worker’s own volume looks fine; combined, they exceed the limit.

How to spot it: logs show the same URL hit several times within a few seconds.

4. Not caching idempotent GETs

GET /products goes upstream on every call even though the catalog rarely changes, so every user request triggers a fetch.

How to spot it: Frequent duplicate GETs in logs.

5. Burst traffic (cron, batch jobs) blows through the RPM limit

Promise.all over 100 fetches fires them all at once. Against a limit of 60 requests/min, that burst exhausts the budget instantly and you get immediate 429s.

How to spot it: Code has wide fan-out (Promise.all, parallel map).

6. Shared API key across services

Backend, CI, and cron jobs all share one API key. Each thinks its own volume is fine, but the sum is over the limit.

How to spot it: API console shows multiple source IPs on one key.

Shortest path to fix

Step 1: Exponential back-off + respect Retry-After

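async function fetchWithRetry(url, opts = {}, maxRetries = 5) {
  const baseDelay = 1000; // first back-off step in ms
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, opts);
    if (res.status !== 429) return res; // fetch resolves on 429, so check status explicitly
    if (attempt >= maxRetries) throw new Error('Max retries exceeded'); // no pointless final sleep

    // Retry-After is either delay-seconds ("5") or an HTTP-date; parseInt alone breaks on dates
    const header = res.headers.get('retry-after');
    let retryAfterMs = 0;
    if (header) {
      const secs = Number(header);
      retryAfterMs = Number.isFinite(secs)
        ? secs * 1000
        : Math.max(0, Date.parse(header) - Date.now()) || 0; // unparseable date → 0
    }
    const backoff = baseDelay * 2 ** attempt + Math.random() * 250; // exponential + jitter
    const sleep = Math.min(Math.max(retryAfterMs, backoff), 60_000); // honor Retry-After, cap at 60s

    console.log(`429, sleeping ${Math.round(sleep)}ms (attempt ${attempt + 1})`);
    await new Promise(r => setTimeout(r, sleep));
  }
}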

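Wait at least as long as Retry-After asks (and never less than the exponential back-off), add jitter so workers don’t retry in lockstep, and cap each wait at 60 s.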
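The delay math can be pulled out into a pure function so it can be unit-tested without a live API. A sketch under assumptions: `parseRetryAfterMs` and `retryDelayMs` are hypothetical helper names, the clock and random source are injected for deterministic tests, and the 1 s base, 250 ms jitter, and 60 s cap mirror the values above.

```typescript
// Parses Retry-After as delay-seconds ("5") or an HTTP-date; returns 0 if absent or unparseable.
function parseRetryAfterMs(header: string | null, nowMs: number): number {
  if (!header) return 0;
  const secs = Number(header);
  if (Number.isFinite(secs)) return Math.max(0, secs * 1000);
  const dateMs = Date.parse(header);
  return Number.isNaN(dateMs) ? 0 : Math.max(0, dateMs - nowMs);
}

// Delay before retry `attempt` (0-based): at least Retry-After, at least the back-off, at most the cap.
function retryDelayMs(
  header: string | null,
  attempt: number,
  nowMs: number,
  random: () => number = Math.random, // injectable so tests are deterministic
  baseMs = 1000,
  capMs = 60_000,
): number {
  const backoff = baseMs * 2 ** attempt + random() * 250; // exponential + up to 250 ms jitter
  return Math.min(Math.max(parseRetryAfterMs(header, nowMs), backoff), capMs);
}
```

With jitter disabled, attempts 0, 1, 2, 3 wait 1 s, 2 s, 4 s, 8 s unless Retry-After asks for longer.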

Step 2: Cache idempotent GETs

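// Short-window in-memory LRU (lru-cache v10+ exports the LRUCache class by name)
import { LRUCache } from 'lru-cache';
const cache = new LRUCache({ max: 1000, ttl: 60_000 }); // up to 1000 entries, 60s TTL

async function getProduct(id) {
  const key = `product:${id}`;
  const hit = cache.get(key); // undefined if missing or expired
  if (hit !== undefined) return hit;

  const res = await fetchWithRetry(`/api/products/${id}`);
  if (!res.ok) throw new Error(`GET ${key} failed: ${res.status}`); // don't cache error bodies
  const data = await res.json();
  cache.set(key, data);
  return data;
}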
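If you’d rather not pull in a dependency for the short window, the core of a TTL cache is small. A sketch with a hypothetical `TtlCache` class and an injectable clock; unlike lru-cache it has no size bound, so it only suits a small, known key space.

```typescript
// Minimal TTL cache sketch (no LRU eviction): entries expire `ttlMs` after being set.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily drop expired entries on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```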

For longer TTLs, or a cache shared across workers and processes, use Redis (Upstash’s free tier is often enough for this):

import { Redis } from '@upstash/redis';
const redis = Redis.fromEnv(); // reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN

async function getProduct(id) {
  const key = `product:${id}`;
  const cached = await redis.get(key); // null on miss; the client JSON-decodes hits
  if (cached !== null) return cached;
  const data = await (await fetchWithRetry(...)).json();
  await redis.set(key, data, { ex: 3600 }); // 1h TTL; the client JSON-encodes the object
  return data;
}

Step 3: Request coalescing

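// Concurrent callers for the same key share one in-flight promise (per process)
const inFlight = new Map<string, Promise<unknown>>();

async function getProductDedup(id: string | number) {
  const key = `product:${id}`;
  const pending = inFlight.get(key);
  if (pending) return pending; // someone is already fetching this key

  const promise = fetchWithRetry(`/api/products/${id}`)
    .then(r => r.json())
    .finally(() => inFlight.delete(key)); // clear on success or failure

  inFlight.set(key, promise);
  return promise;
}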

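10 concurrent callers of getProductDedup(123) in the same process → only one upstream call. Separate processes each have their own map, so deduping across them needs a shared cache such as Redis.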
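The same pattern generalizes to any key. A sketch with a hypothetical `coalesce` helper; like the map above, it only dedupes within one process, and once the shared promise settles the next call goes upstream again.

```typescript
// Hypothetical generic helper: concurrent callers with the same key share one promise.
const pendingCalls = new Map<string, Promise<unknown>>();

function coalesce<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = pendingCalls.get(key);
  if (existing) return existing as Promise<T>; // join the call already in flight
  const p = fn().finally(() => pendingCalls.delete(key)); // allow a fresh call once settled
  pendingCalls.set(key, p);
  return p;
}
```

Usage: `coalesce(`product:${id}`, () => fetchWithRetry(url).then(r => r.json()))`.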

Step 4: Throttle fan-out

import pLimit from 'p-limit';
const limit = pLimit(5);  // at most 5 in flight

const results = await Promise.all(
  items.map(item => limit(() => fetchItem(item)))
);

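Convert the documented limit to a per-second budget: RPM / 60, so 60 RPM = 1 req/s. Concurrency is not the same as rate: requests per second ≈ requests in flight ÷ average latency, so 5 in flight at 200 ms each is about 25 req/s. Against a 1 req/s budget, even a concurrency of 1 is too fast when responses come back in 200 ms, so pair the cap with a minimum spacing between request starts.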
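Since a concurrency cap alone doesn’t bound requests per second, one common complement is a token bucket that gates when each request may start. This is a swapped-in technique, not something p-limit provides; `TokenBucket` is a hypothetical class, and the clock is passed in for testability.

```typescript
// Token bucket sketch: refills `ratePerSec` tokens per second, holding at most `burst`.
// Call tryTake(now) before each upstream request; false means "wait, you'd exceed the budget".
class TokenBucket {
  private tokens: number;
  private lastMs: number;
  private ratePerSec: number;
  private burst: number;

  constructor(ratePerSec: number, burst: number, nowMs: number) {
    this.ratePerSec = ratePerSec;
    this.burst = burst;
    this.tokens = burst; // start full
    this.lastMs = nowMs;
  }

  tryTake(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.ratePerSec);
    this.lastMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Milliseconds until the next token is available, as of the last tryTake call.
  msUntilNext(): number {
    return this.tokens >= 1 ? 0 : ((1 - this.tokens) / this.ratePerSec) * 1000;
  }
}
```

For 60 RPM with roughly 30% headroom you might construct `new TokenBucket(0.7, 1, Date.now())` and, whenever `tryTake` returns false, sleep for `msUntilNext()` before trying again.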

Step 5: Batch endpoints

Many APIs have a batch variant:

Inefficient: N × GET /users/{id}
Efficient:   1 × POST /users:batchGet {ids: [...]}

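Check the provider’s docs for a batch or bulk endpoint. One batched call typically replaces N individual calls against your quota, though some providers weight batch requests by item count.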
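To use a batch endpoint without rewriting every call site, individual lookups can be collected briefly and flushed as one batch, the approach popularized by DataLoader. A minimal sketch under stated assumptions: `MicroBatcher` is hypothetical, the batch function must return one value per key in the same order as the keys, and `maxBatch` should match the provider’s documented batch size.

```typescript
// Collects get(key) calls made synchronously before the flush runs, then sends them in batches.
type Waiter<K, V> = { key: K; resolve: (v: V) => void; reject: (e: unknown) => void };

class MicroBatcher<K, V> {
  private queue: Waiter<K, V>[] = [];
  private scheduled = false;
  private batchFn: (keys: K[]) => Promise<V[]>;
  private maxBatch: number;

  // Assumption: batchFn returns values in the same order as `keys`.
  constructor(batchFn: (keys: K[]) => Promise<V[]>, maxBatch = 100) {
    this.batchFn = batchFn;
    this.maxBatch = maxBatch;
  }

  get(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush in a microtask, after the current synchronous code has queued its calls.
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private flush(): void {
    this.scheduled = false;
    const waiting = this.queue;
    this.queue = [];
    for (let i = 0; i < waiting.length; i += this.maxBatch) {
      const chunk = waiting.slice(i, i + this.maxBatch);
      this.batchFn(chunk.map(w => w.key)).then(
        values => chunk.forEach((w, j) => w.resolve(values[j])),
        err => chunk.forEach(w => w.reject(err)),
      );
    }
  }
}
```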

Step 6: Split keys

Backend service A → KEY_A
Backend service B → KEY_B
CI / cron          → KEY_C

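Each service’s usage becomes visible and isolated. Whether each key also gets its own quota depends on the provider: some limit per key, others per account, so check before counting on it.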

Step 7: Upgrade plan / request quota

If you’re still near the limit after all of the above, request a quota increase or upgrade your plan. Many providers, including Stripe and SendGrid, support this.

Prevention

  • First thing when integrating any API: read its rate-limit docs, estimate peak RPS, and leave a 30% buffer
  • Wrap every fetch in a retry helper; ban bare fetch calls
  • Cache idempotent GETs by default, with a TTL matched to how fresh the data must be (prices 60 s, catalog 1 h, config 1 day)
  • With multiple workers or processes: dedupe and use a shared cache (Redis); an in-memory cache only covers one process
  • Bound fan-out with p-limit; never run an unbounded Promise.all over 100+ items
  • One API key per service for isolation (and a separate quota, where the provider limits per key)
  • Monitor the 429 rate: above 0.5% means you’re already grazing the cap, so optimize now
  • Treat the “API rate-limit budget” as an SLO: assign each upstream an RPM allowance and alert when it runs out
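The last two bullets can be backed by a small in-process monitor. A sketch with a hypothetical `RateLimitMonitor` class: it keeps a sliding window of recent responses and flags when the share of 429s crosses 0.5%. In production you would more likely compute this from your metrics system.

```typescript
// Sliding-window 429 monitor: tracks the share of recent responses that were 429s.
class RateLimitMonitor {
  private events: { atMs: number; limited: boolean }[] = [];
  private windowMs: number;
  private threshold: number;

  constructor(windowMs = 5 * 60_000, threshold = 0.005) { // 5 min window, 0.5% threshold
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  record(status: number, nowMs: number): void {
    this.events.push({ atMs: nowMs, limited: status === 429 });
    // Drop events that have fallen out of the window (assumes calls arrive in time order).
    while (this.events.length > 0 && this.events[0].atMs <= nowMs - this.windowMs) {
      this.events.shift();
    }
  }

  rate(): number {
    if (this.events.length === 0) return 0;
    return this.events.filter(e => e.limited).length / this.events.length;
  }

  shouldAlert(): boolean {
    return this.rate() > this.threshold;
  }
}
```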

Tags: #Backend #Debug #Troubleshooting