You’re running a Claude API script (bulk-translating 200 articles, or an agent processing customer support tickets) and halfway through it starts looping on 429 rate_limit_error. Your script’s retry-on-error logic just keeps hammering and digging the hole deeper. Or you’re in Claude Code, the agent hits the wall, can’t stop, and keeps making things worse.
Claude’s rate limits have two axes: RPM (requests per minute) and TPM (tokens per minute). Both can trigger 429. The real problem isn’t the limit itself — it’s that your client isn’t honoring retry-after. The more impatient you are, the longer you stay locked out.
Common causes
Ordered by hit rate, highest first.
1. Tight retry loop that ignores Retry-After
Anthropic’s 429 response includes a retry-after header (the number of seconds to wait). If your script retries instantly, every retry sent before that time is rejected too, and the extra traffic only keeps you pinned at the limit.
How to spot it: Your retry logic uses a fixed sleep(1) instead of reading retry-after? That’s the problem.
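A minimal sketch of what "reading retry-after" could look like, assuming the response headers are available as a plain mapping with lowercase keys. The helper name `wait_seconds` and the fallback policy (exponential back-off with jitter, capped at 60 seconds) are illustrative assumptions, not something Anthropic prescribes.

```python
import random

def wait_seconds(headers, attempt, cap=60.0):
    """Return how long to sleep before retry number `attempt` (0-based)."""
    raw = headers.get("retry-after")
    if raw is not None:
        try:
            # The server's value wins when it is present and numeric.
            return max(0.0, float(raw))
        except ValueError:
            pass  # not a number (e.g. an HTTP date); use the fallback below
    # Fallback (assumption): exponential back-off with full jitter, capped.
    return random.uniform(0, min(cap, 2 ** attempt))
```

If a log line shows the same wait on every attempt regardless of what the server sent, the header is probably being ignored.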
2. Fanout too wide — instant RPM blowout
Promise.all over 50 prompts at once burns through your tier’s RPM in a single second. The per-minute average might be fine, but the instantaneous peak trips the limit.
How to spot it: Are you using Promise.all / threadpool fan-out? How wide?
3. Long context exhausts TPM
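Five requests in flight, each carrying 100K input tokens, add up to 500K tokens. If they all land within the same minute, that is 500K TPM, over Sonnet Tier 1’s cap. Output tokens also count.
How to spot it: Multiply tokens per request by requests per minute, then compare the result with the Anthropic rate limits for your tier.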
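The arithmetic can be sketched in a few lines. The function names are hypothetical, the 20% buffer is just a cautious default, and `tpm_limit` is whatever your tier's published limit is (no real tier value is assumed here). Input and output tokens are simply summed, which is a simplification.

```python
def estimated_tpm(input_tokens, output_tokens, requests_per_minute):
    """Tokens consumed per minute if every request has this size."""
    return (input_tokens + output_tokens) * requests_per_minute

def fits_limit(tpm, tpm_limit, buffer=0.2):
    """True if `tpm` stays under the limit with a safety buffer left over."""
    return tpm <= tpm_limit * (1 - buffer)

# Example: 100K-token prompts, 1K-token replies, 5 requests per minute.
print(estimated_tpm(100_000, 1_000, 5))  # 505000
```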
4. Agent doesn’t know when to stop
Claude Code or a custom agent sees a 429 from a tool call, may treat it as “temporarily unavailable”, and just keeps retrying. Without explicit failure handling, agents will spin forever.
How to spot it: Agent log has > 5 occurrences of 429.
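For a custom agent, one way to add explicit failure handling is a small circuit breaker that counts consecutive 429s and tells the loop to stop. This is a sketch under assumptions: the class name `RateLimitBreaker` and the threshold of 5 are illustrative, and it assumes your agent loop can see the HTTP status of each call.

```python
class RateLimitBreaker:
    """Signals an agent loop to stop after too many consecutive 429s."""

    def __init__(self, max_consecutive=5):
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def record(self, status_code):
        """Record a response status; return True if the agent should stop."""
        if status_code == 429:
            self.consecutive += 1
        else:
            self.consecutive = 0  # any other response resets the streak
        return self.consecutive >= self.max_consecutive
```

In the agent loop, call `record()` after every request and break out (or surface an error to the user) as soon as it returns True.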
5. Pro / Team daily cap
Subscription plans have a message cap that resets on a 5-hour rolling window (the UI calls it a daily limit). Hit it and every request is rejected until the window resets.
How to spot it: UI shows “You’ve reached your daily limit, resets in X hours.”
6. Multiple clients sharing one API key
Backend, cron, and CI all use the same key; each thinks its share is fine; combined they bust the limit.
How to spot it: API console shows multiple source IPs hitting one key.
Shortest path to fix
Step 1: Kill the loop, back off 5-10 minutes
ps aux | grep my-script
kill -9 <PID>
Or Ctrl+C the Claude Code agent. Wait 5-10 minutes for the rate-limit window to fully reset before debugging.
Step 2: Proper exponential back-off
import time
import anthropic
from anthropic import RateLimitError

# max_retries=0 turns off the SDK's built-in retries so they don't stack with this loop
client = anthropic.Anthropic(max_retries=0)

def call_with_backoff(prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
        except RateLimitError as e:
            # Prefer the server's retry-after; fall back to exponential back-off
            header = e.response.headers.get("retry-after")
            try:
                wait = float(header)
            except (TypeError, ValueError):
                wait = 2 ** attempt
            print(f"Rate limited, sleeping {wait}s")
            time.sleep(wait)
    raise RuntimeError("Max retries exceeded")
Key rule: read retry-after. Don’t hardcode sleep.
Step 3: Cap concurrency
import asyncio

sem = asyncio.Semaphore(3)  # at most 3 in flight

async def safe_call(prompt):
    async with sem:
        # call_with_backoff is synchronous, so run it in a worker thread
        return await asyncio.to_thread(call_with_backoff, prompt)
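Rule of thumb: Tier 1 → ≤ 5 concurrent, Tier 2 → ≤ 10. To check that against your own limit, RPM/60 is the request rate per second you can sustain; multiply it by your average request latency in seconds to get the number of requests you can keep in flight.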
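One way to turn a per-minute limit into a semaphore size is Little's law: requests in flight = request rate × average latency. The sketch below assumes you have measured your own average latency. The function name and the numbers in the example are illustrative, not actual tier limits.

```python
import math

def max_in_flight(rpm, avg_latency_s):
    """Concurrency that keeps throughput at or under `rpm` (Little's law)."""
    # rpm / 60 is requests per second; times latency gives requests in flight.
    return max(1, math.floor(rpm * avg_latency_s / 60))

print(max_in_flight(50, 6))    # 5
print(max_in_flight(1000, 3))  # 50
```

Slower requests allow more concurrency at the same RPM, so a fixed semaphore size that works for long prompts can overshoot once you switch to short ones.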
Step 4: Batch requests
Combine 10 small translations into one prompt asking for 10 results:
Translate the following 10 English passages to Chinese,
return as a JSON array:
[
"text 1",
"text 2",
...
]
Output tokens still count toward TPM, but RPM drops 10x.
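A sketch of the plumbing around that prompt: grouping inputs, building the combined prompt, and checking that the reply contains one result per input. All three helper names are hypothetical, and the parsing assumes the model returns a bare JSON array, which is not guaranteed in practice.

```python
import json

def chunk(items, size=10):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_prompt(passages):
    """Build one translation prompt covering every passage in the group."""
    return (
        f"Translate the following {len(passages)} English passages to Chinese, "
        "return as a JSON array:\n"
        + json.dumps(passages, ensure_ascii=False, indent=2)
    )

def parse_reply(reply_text, expected):
    """Parse the model's JSON array and check that nothing was dropped."""
    results = json.loads(reply_text)
    if not isinstance(results, list) or len(results) != expected:
        raise ValueError(f"expected {expected} results, got {results!r}")
    return results
```

The length check matters: with batched prompts, a silently dropped or merged item is the most common failure, and it is cheaper to retry one group than to discover the gap later.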
Step 5: Lighter model or Batch API
Tasks that don’t need Opus → Sonnet / Haiku (more generous TPM). Or use the Batch API — async, doesn’t compete with live traffic, 50% cheaper.
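For the Batch API route, most of the work is shaping the request list. The sketch below only builds that list, so it runs without network access. The `item-N` ID scheme and the function name are assumptions for illustration, and the submit call is shown as a comment because it needs an API key.

```python
def build_batch_requests(prompts, model, max_tokens=1024):
    """Turn a list of prompts into Batch API request entries."""
    return [
        {
            # Hypothetical ID scheme; used later to match results to inputs.
            "custom_id": f"item-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]

batch_requests = build_batch_requests(["Translate: hello"], model="claude-sonnet-4-6")
# Submitting is then a single SDK call (requires network access and an API key):
# batch = anthropic.Anthropic().messages.batches.create(requests=batch_requests)
```

Results arrive asynchronously, so this fits jobs where nobody is waiting on the answer in real time.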
Step 6: Upgrade tier or split keys
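Production workloads → request a tier upgrade. Short term: give each service its own API key so you can see which one is eating the quota. Note that Anthropic applies rate limits per organization rather than per key, so separate keys alone don’t add capacity; putting services in separate workspaces lets you cap each one’s share.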
Step 7: Daily cap → just wait
Pro / Team daily cap → UI shows reset time, wait. Or temporarily switch to API + pay-as-you-go.
Prevention
- Anthropic SDK has built-in retry (2 retries with exponential back-off by default); tune with max_retries=N instead of rolling your own
- Before launching any script, estimate peak TPM/RPM and leave 20% buffer under your tier
- Don’t fire batch jobs on the hour (00:00, 05:00…) — add jitter to avoid global peaks
- Backend, CI, cron each get their own API key for fault isolation
- Track 429 ratio: > 1% means upgrade tier or optimize request patterns
- For large batch jobs, use the Batch API — saves money and quota
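Two of the items above (jitter and the 429 ratio) are easy to sketch. The names `jittered_start_delay` and `RateLimitStats` and the 5-minute jitter window are assumptions for illustration; in a real service the counter would feed whatever metrics system you already use.

```python
import random

def jittered_start_delay(max_jitter_s=300):
    """Random delay in seconds so scheduled jobs don't all start on the hour."""
    return random.uniform(0, max_jitter_s)

class RateLimitStats:
    """Counts responses and reports the share that were 429s."""

    def __init__(self):
        self.total = 0
        self.limited = 0

    def record(self, status_code):
        self.total += 1
        if status_code == 429:
            self.limited += 1

    def ratio(self):
        """Fraction of recorded responses that were 429 (0.0 if none recorded)."""
        return self.limited / self.total if self.total else 0.0
```

A cron job would sleep for `jittered_start_delay()` before its first request, and an alert would fire once `ratio()` stays above 0.01.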