You call Stripe / Shopify / Twilio / SendGrid from your server. Dev is fine. Traffic ramps up and you start getting:
HTTP 429 Too Many Requests
{
  "error": "rate_limit_exceeded",
  "retry_after": 5
}
Your code retries instantly, which keeps the window saturated (some providers also extend the penalty). Three workers call the same endpoint and together they blow the quota. The problem isn't "too many calls"; it's that your client layer doesn't handle 429 correctly, doesn't cache idempotent GETs, and doesn't dedupe concurrent calls.
Mental model: third-party rate limits are usually scoped to (account, endpoint, time window). The fix isn’t “go slower” — it’s request pattern reform.
Common causes
Ordered by how often each one turns out to be the culprit, most frequent first.
1. Tight retry loop, no back-off
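try { fetch() } catch { setTimeout(fetch, 100) }: 100 ms later it's still 429, and retrying forever just keeps the window saturated. (Note that fetch doesn't throw on a 429; you have to check res.status yourself.)
How to spot it: ten or more 429s within a few seconds in the logs.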
2. Ignoring Retry-After
The provider's 429 response says "wait X seconds." A hardcoded sleep ignores that and will often retry too early and get limited again.
How to spot it: Code has hardcoded await sleep(1000).
3. Multiple workers calling same endpoint
10 workers, each GET /products/123 for the same product. Each worker thinks their volume is fine; combined it busts.
How to spot it: Logs show the same URL hit several times in quick succession.
4. Not caching idempotent GETs
GET /products hits upstream every call, even though catalog rarely changes — every user request triggers a fetch.
How to spot it: Frequent duplicate GETs in logs.
5. Burst traffic (cron, batch) blows RPM
Promise.all over 100 fetches burns through a "60 / min" quota at once and triggers an immediate 429.
How to spot it: Code has wide fan-out (Promise.all, parallel map).
6. Shared API key across services
Backend, CI, cron all on one API key — each thinks its quota is fine, sum is over.
How to spot it: API console shows multiple source IPs on one key.
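The "how to spot it" checks for causes 1, 3 and 4 can be roughed out against whatever request log you already have. A minimal sketch, assuming each log entry is an object with ts (epoch ms), method, url and status; those field names, the function names and the default thresholds are all hypothetical and should be tuned to your own logs.

```javascript
// Assumed log entry shape (hypothetical):
//   { ts: epochMillis, method: 'GET', url: '/products/123', status: 429 }

// True if `minHits` or more sorted timestamps fall inside any `windowMs` span.
function hasBurst(times, windowMs, minHits) {
  for (let i = 0; i + minHits - 1 < times.length; i++) {
    if (times[i + minHits - 1] - times[i] <= windowMs) return true;
  }
  return false;
}

// Cause 1: many 429s packed into a few seconds suggests a tight retry loop.
function hasRetryStorm(entries, windowMs = 5000, minHits = 10) {
  const times = entries
    .filter(e => e.status === 429)
    .map(e => e.ts)
    .sort((a, b) => a - b);
  return hasBurst(times, windowMs, minHits);
}

// Causes 3 and 4: the same GET URL requested repeatedly in a short span.
function duplicateGets(entries, windowMs = 1000, minHits = 3) {
  const byUrl = new Map(); // url -> list of timestamps
  for (const e of entries) {
    if (e.method !== 'GET') continue;
    if (!byUrl.has(e.url)) byUrl.set(e.url, []);
    byUrl.get(e.url).push(e.ts);
  }
  const hot = [];
  for (const [url, times] of byUrl) {
    times.sort((a, b) => a - b);
    if (hasBurst(times, windowMs, minHits)) hot.push(url);
  }
  return hot;
}
```

Anything duplicateGets returns is a candidate for the caching and coalescing steps in the fix section.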
Shortest path to fix
Step 1: Exponential back-off + respect Retry-After
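async function fetchWithRetry(url, opts = {}, maxRetries = 5) {
  const baseDelay = 1000; // first back-off step, in ms
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url, opts);
    if (res.status !== 429) return res;
    if (attempt === maxRetries - 1) break; // out of attempts, no point sleeping
    // Retry-After may be missing or an HTTP date; parseInt then yields NaN, so fall back to 0.
    const retryAfter = parseInt(res.headers.get('retry-after') ?? '', 10);
    const retryAfterMs = Number.isFinite(retryAfter) ? retryAfter * 1000 : 0;
    // Exponential back-off plus jitter so parallel workers don't retry in lockstep.
    const backoff = baseDelay * 2 ** attempt + Math.random() * 250;
    // Wait for whichever is longer, capped at 60s.
    const sleep = Math.min(Math.max(retryAfterMs, backoff), 60_000);
    console.log(`429, sleeping ${Math.round(sleep)}ms (attempt ${attempt + 1})`);
    await new Promise(r => setTimeout(r, sleep));
  }
  throw new Error('Max retries exceeded');
}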
Read Retry-After first, compare it with the exponential back-off, wait for whichever is longer, and cap the wait at 60s.
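The helper above reads Retry-After with parseInt, which only covers the delay-in-seconds form. The HTTP spec also allows an HTTP-date in that header. A small sketch of a parser that accepts both forms; retryAfterMs is a hypothetical helper name, and returning 0 for a missing or unparseable header is an assumption chosen so the caller falls back to its own back-off.

```javascript
// Convert a Retry-After header value into a wait in milliseconds.
// Returns 0 when the header is missing or unparseable.
// `now` is injectable so the date form can be tested deterministically.
function retryAfterMs(headerValue, now = Date.now()) {
  if (!headerValue) return 0;
  const trimmed = headerValue.trim();
  // Form 1: delay in whole seconds, e.g. "5".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const when = Date.parse(trimmed);
  if (Number.isNaN(when)) return 0;
  return Math.max(0, when - now); // a date in the past means "retry now"
}
```

Inside the retry loop this would replace the parseInt line, for example `Math.max(retryAfterMs(res.headers.get('retry-after')), backoff)`.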
Step 2: Cache idempotent GETs
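// Short-window in-memory LRU
// lru-cache v10+ uses a named export; v7-v9 used `import LRU from 'lru-cache'`.
import { LRUCache } from 'lru-cache';
const cache = new LRUCache({ max: 1000, ttl: 60_000 }); // 60s
async function getProduct(id) {
  const key = `product:${id}`;
  const hit = cache.get(key); // undefined on a miss or after the TTL expires
  if (hit !== undefined) return hit;
  const res = await fetchWithRetry(`/api/products/${id}`);
  if (!res.ok) throw new Error(`Upstream returned ${res.status}`); // don't cache error bodies
  const data = await res.json();
  cache.set(key, data);
  return data;
}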
Or longer windows with Redis (Upstash free tier is plenty):
import { Redis } from '@upstash/redis';
const redis = Redis.fromEnv();
async function getProduct(id) {
  const cached = await redis.get(`product:${id}`);
  if (cached) return cached;
  const data = await (await fetchWithRetry(...)).json();
  await redis.setex(`product:${id}`, 3600, JSON.stringify(data)); // 1h
  return data;
}
Step 3: Request coalescing
// key -> Promise of the request currently in flight
const inFlight = new Map();
async function getProductDedup(id) {
  const key = `product:${id}`;
  // A request for this key is already running: share its promise.
  if (inFlight.has(key)) return inFlight.get(key);
  const promise = fetchWithRetry(`/api/products/${id}`)
    .then(r => r.json())
    .finally(() => inFlight.delete(key)); // clear the slot once settled
  inFlight.set(key, promise);
  return promise;
}
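10 concurrent callers in the same process calling getProductDedup(123) → only one upstream call. Separate worker processes each hold their own map, so deduping across processes needs a shared cache such as Redis.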
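To see the coalescing effect without touching a real API, the same pattern can be exercised against a stub. A minimal sketch: coalesce, fakeFetchProduct and getProductOnce are hypothetical names, and the stub with its 20 ms delay stands in for the upstream call.

```javascript
// Generic form of the in-flight map: wraps any async loader so that
// concurrent calls with the same key share one promise.
function coalesce(loader) {
  const inFlight = new Map(); // key -> pending Promise
  return function load(key) {
    if (inFlight.has(key)) return inFlight.get(key);
    const promise = Promise.resolve()
      .then(() => loader(key))
      .finally(() => inFlight.delete(key)); // allow a fresh fetch once settled
    inFlight.set(key, promise);
    return promise;
  };
}

// Stub upstream (hypothetical): counts how often it is really called.
let upstreamCalls = 0;
async function fakeFetchProduct(id) {
  upstreamCalls++;
  await new Promise(r => setTimeout(r, 20)); // simulated network latency
  return { id, name: `product ${id}` };
}

const getProductOnce = coalesce(fakeFetchProduct);

// Ten concurrent callers ask for the same product.
async function demo() {
  const results = await Promise.all(
    Array.from({ length: 10 }, () => getProductOnce(123))
  );
  return { calls: upstreamCalls, results };
}
```

Coalescing only covers requests that overlap in time. Once the promise settles the next call goes upstream again, which is why it pairs with the cache from Step 2 rather than replacing it.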
Step 4: Throttle fan-out
import pLimit from 'p-limit';
const limit = pLimit(5); // at most 5 in flight
const results = await Promise.all(
  items.map(item => limit(() => fetchItem(item)))
);
Derive the cap from the documented limit: RPM / 60 gives requests per second, so 60 RPM = 1 RPS, where a concurrency of 1-3 is typically safe.
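A concurrency cap bounds how many requests are in flight, not how many start per minute, so fast responses can still exhaust an RPM quota. One way to enforce the rate itself is to space out call starts. A minimal sketch, not a drop-in library: createSpacedLimiter is a hypothetical name, and the injectable now and sleep options exist only so the schedule can be checked without real waiting.

```javascript
// Spaces out call *starts* so that no more than `rpm` begin per minute.
function createSpacedLimiter(rpm, options = {}) {
  const now = options.now || Date.now;
  const sleep = options.sleep || (ms => new Promise(r => setTimeout(r, ms)));
  const intervalMs = 60_000 / rpm; // e.g. 60 RPM -> one start every 1000 ms
  let nextSlot = 0;                // earliest time the next call may start

  return async function schedule(task) {
    const t = now();
    const startAt = Math.max(t, nextSlot);
    nextSlot = startAt + intervalMs; // reserve the slot before awaiting
    if (startAt > t) await sleep(startAt - t);
    return task();
  };
}

// Usage (fetchItem and items as in the p-limit example):
//   const schedule = createSpacedLimiter(60);
//   const results = await Promise.all(
//     items.map(item => schedule(() => fetchItem(item)))
//   );
```

Like the in-flight map, this limiter lives in one process. With several workers, each one needs its own share of the RPM budget.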
Step 5: Batch endpoints
Many APIs have a batch variant:
Inefficient: N × GET /users/{id}
Efficient: 1 × POST /users:batchGet {ids: [...]}
Read the docs, find the batch/bulk endpoint, save RPS.
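Batch endpoints usually cap how many ids one request may carry, so the id list has to be chunked. A sketch under stated assumptions: postBatch is a hypothetical caller-supplied function that performs the provider's batch call and resolves to an array of records, and the default batchSize of 100 is a placeholder for whatever the provider documents.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Fetch many users with one request per chunk instead of one per id.
// `postBatch(ids)` is hypothetical: it should wrap the provider's batch
// endpoint and resolve to an array of user records.
async function fetchUsersBatched(ids, postBatch, batchSize = 100) {
  const unique = [...new Set(ids)]; // don't spend quota on duplicate ids
  const users = [];
  for (const group of chunk(unique, batchSize)) {
    // Sequential on purpose: one batch request in flight at a time.
    users.push(...(await postBatch(group)));
  }
  return users;
}
```

With 250 unique ids and a batch size of 100, this issues 3 upstream requests instead of 250.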
Step 6: Split keys
Backend service A → KEY_A
Backend service B → KEY_B
CI / cron → KEY_C
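Each key gets its own quota when the provider scopes limits per key. If limits are per account, splitting keys still shows you which service is burning the budget and lets you revoke one key without touching the others.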
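One way to keep the split honest is to resolve the key by service name and fail loudly when it is missing, instead of quietly falling back to a shared key. A small sketch; the service names and the UPSTREAM_KEY_* environment variable names are hypothetical.

```javascript
// Hypothetical mapping: one upstream API key per calling service.
const KEY_ENV_BY_SERVICE = {
  'service-a': 'UPSTREAM_KEY_A',
  'service-b': 'UPSTREAM_KEY_B',
  'ci-cron': 'UPSTREAM_KEY_C',
};

// Resolve the key for a service. Throws rather than falling back to a
// shared key, because a silent fallback would merge the quotas again.
function apiKeyFor(service, env = process.env) {
  const varName = KEY_ENV_BY_SERVICE[service];
  if (typeof varName !== 'string') {
    throw new Error(`Unknown service: ${service}`);
  }
  const key = env[varName];
  if (!key) {
    throw new Error(`Missing env var ${varName} for ${service}`);
  }
  return key;
}
```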
Step 7: Upgrade plan / request quota
If quota is still tight after all of the above, request a quota increase or upgrade your plan with the provider. Stripe, SendGrid, etc. all support this.
Prevention
- First thing when integrating any API: read rate-limit docs, estimate peak RPS, leave 30% buffer
- Wrap every fetch in a retry helper; ban naked fetch
- Cache idempotent GETs by default with TTL matched to freshness (prices 60s, catalog 1h, config 1day)
- Multi-worker: must dedupe + use shared cache (Redis), not in-memory
- Use p-limit for fan-out; never naked Promise.all over 100+ items
- One API key per service for isolation and split quota
- Monitor 429 rate: > 0.5% means you’re already grazing the cap — optimize
- Treat "API rate-limit budget" as an SLO: assign each upstream an RPM allowance, alert on exhaustion
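The buffer and monitoring items above come down to a little arithmetic. A sketch, assuming a plain in-process running counter; usableRpm and createRateLimitMonitor are hypothetical names, and a real setup would more likely feed a metrics system and look at a time window rather than a lifetime total.

```javascript
// Usable budget after leaving headroom.
// Example: a 600 RPM limit with a 30% buffer leaves 420 RPM to plan against.
function usableRpm(limitRpm, bufferPct = 30) {
  return Math.floor((limitRpm * (100 - bufferPct)) / 100);
}

// Running counter of upstream responses. Flags when the share of 429s
// goes above `thresholdPct` (0.5% by default, as in the list above).
function createRateLimitMonitor(thresholdPct = 0.5) {
  let total = 0;
  let limited = 0;
  return {
    record(status) {
      total++;
      if (status === 429) limited++;
    },
    // Percentage of recorded responses that were 429s.
    ratePct() {
      return total === 0 ? 0 : (limited * 100) / total;
    },
    shouldAlert() {
      return this.ratePct() > thresholdPct;
    },
  };
}
```

Calling record(res.status) from the retry helper is enough to feed it, since every upstream response passes through there.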
Tags: #Backend #Debug #Troubleshooting