Claude "Rate Limited" Loops — Why and How to Stop

Claude keeps retrying and looping on rate-limit errors. The fix is back-off, batching, and conversation hygiene.

You’re running a Claude API script (bulk-translating 200 articles, or an agent processing customer support tickets) and halfway through it starts looping on 429 rate_limit_error. Your script’s retry-on-error logic keeps hammering the API and digging the hole deeper. Or you’re in Claude Code: the agent hits the wall, can’t stop, and keeps making things worse.

Claude’s rate limits have two axes: RPM (requests per minute) and TPM (tokens per minute, which the API tracks separately for input and output). Either one can trigger a 429. The real problem isn’t the limit itself; it’s that your client isn’t honoring retry-after. The more impatient you are, the longer you stay locked out.

Common causes

Ordered by how often each one turns out to be the culprit, most frequent first.

1. Tight retry loop that ignores Retry-After

Anthropic’s 429 response includes a retry-after header giving the number of seconds to wait. If your script retries sooner than that, the retry is rejected as well, and a tight loop never gives the limit a chance to recover.

How to spot it: If your retry logic uses a fixed sleep(1) instead of reading retry-after, that’s the problem.

2. Fanout too wide — instant RPM blowout

Firing Promise.all over 50 prompts at once can burn through your tier’s RPM in a single second. Your per-minute average might be fine, but the instantaneous burst still trips the limit.

How to spot it: Are you using Promise.all / threadpool fan-out? How wide?

3. Long context exhausts TPM

Five requests in flight, each carrying 100K input tokens, put 500K tokens into the same minute, which is over Sonnet Tier 1’s cap. Output tokens also count.

How to spot it: Multiply tokens per request by requests per minute (QPS × 60) and compare the result with the Anthropic rate limits for your tier.
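The arithmetic behind that check can be sketched in a few lines. This is only an illustration: estimate_tpm and fits_limit are hypothetical helper names, the limit below is a placeholder rather than a real tier value, and input and output tokens are lumped together for simplicity.

```python
def estimate_tpm(input_tokens, output_tokens, requests_per_minute):
    """Tokens per minute if every request has the given input and output size."""
    return (input_tokens + output_tokens) * requests_per_minute


def fits_limit(estimated, limit, buffer=0.2):
    """True when the estimate stays under the limit minus a safety buffer."""
    return estimated <= limit * (1 - buffer)


# Example: 100K-token prompts, 1K-token answers, 5 requests per minute.
tpm = estimate_tpm(100_000, 1_000, 5)
print(tpm)  # 505000

# HYPOTHETICAL_TPM_LIMIT is a placeholder; look up the real number for your tier.
HYPOTHETICAL_TPM_LIMIT = 400_000
print(fits_limit(tpm, HYPOTHETICAL_TPM_LIMIT))  # False
```

The default 20% buffer mirrors the headroom suggested under Prevention; adjust it to taste.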

4. Agent doesn’t know when to stop

Claude Code or a custom agent sees a 429 from a tool call, may treat it as “temporarily unavailable”, and just keeps retrying. Without explicit failure handling, an agent can spin indefinitely.

How to spot it: The agent log shows more than five 429 responses.
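One possible form of explicit failure handling is a small circuit breaker that counts consecutive 429s and tells the agent loop to give up. The sketch below is an assumption about how such a guard could look; RateLimitBreaker and run_agent are hypothetical names, and the agent is simulated with a plain list of status codes.

```python
class RateLimitBreaker:
    """Stops an agent loop after too many consecutive rate-limit errors."""

    def __init__(self, max_consecutive=5):
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def record(self, status_code):
        """Record one response status; return True while the agent may continue."""
        if status_code == 429:
            self.consecutive += 1
        else:
            self.consecutive = 0  # any non-429 response resets the streak
        return self.consecutive < self.max_consecutive


def run_agent(statuses, breaker):
    """Consume a sequence of status codes; return how many steps ran before stopping."""
    steps = 0
    for status in statuses:
        steps += 1
        if not breaker.record(status):
            break  # surface the failure instead of retrying forever
    return steps
```

In a real agent, the break would be replaced by raising an error or handing control back to the user.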

5. Pro / Team daily cap

Subscription plans cap messages over a rolling 5-hour window, which in practice feels like a daily cap. Hit it and every request is rejected until the window resets.

How to spot it: UI shows “You’ve reached your daily limit, resets in X hours.”

6. Multiple clients sharing one API key

Backend, cron, and CI all use the same key; each thinks its share is fine; combined they bust the limit.

How to spot it: API console shows multiple source IPs hitting one key.

Shortest path to fix

Step 1: Kill the loop, back off 5-10 minutes

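ps aux | grep '[m]y-script'   # find the PID; the brackets stop grep from matching itself
kill <PID>                    # sends SIGTERM; use kill -9 <PID> only if the process ignores it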

Or Ctrl+C the Claude Code agent. Wait 5-10 minutes for the rate-limit window to fully reset before debugging.

Step 2: Proper exponential back-off

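import time

import anthropic
from anthropic import RateLimitError

# max_retries=0 turns off the SDK's own retries so they don't stack with this loop
client = anthropic.Anthropic(max_retries=0)

def call_with_backoff(prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
        except RateLimitError as e:
            # Prefer the server's retry-after (seconds); fall back to 2^attempt
            retry_after = e.response.headers.get("retry-after")
            try:
                wait = float(retry_after)
            except (TypeError, ValueError):
                wait = float(2 ** attempt)
            print(f"Rate limited, sleeping {wait:.0f}s")
            time.sleep(wait)
    raise RuntimeError("Max retries exceeded")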

Key rule: read retry-after. Don’t hardcode sleep.
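When several workers hit the limit at once, they also tend to wake up at once. A common refinement, sketched here as an assumption rather than anything the SDK requires, is to add a little random jitter on top of the server's hint and to cap only the fallback delay. backoff_delay is a hypothetical helper name.

```python
import random


def backoff_delay(retry_after, attempt, cap=60.0, rng=random.random):
    """Return seconds to sleep before retry number `attempt` (0-based)."""
    try:
        # The server's hint wins when present and numeric; it is never shortened.
        base = max(float(retry_after), 0.0)
    except (TypeError, ValueError):
        # Missing or non-numeric header: fall back to 1, 2, 4, 8, ... up to `cap`.
        base = min(float(2 ** attempt), cap)
    # Add up to 25% random jitter so parallel workers do not retry in lockstep.
    return base + base * 0.25 * rng()
```

The jitter is only ever added, never subtracted, so the delay is never shorter than what the server asked for.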

Step 3: Cap concurrency

import asyncio

sem = asyncio.Semaphore(3)  # at most 3 in flight

async def safe_call(prompt):
    async with sem:
        # call_with_backoff is blocking, so run it in a worker thread
        return await asyncio.to_thread(call_with_backoff, prompt)

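Rule of thumb: Tier 1 → ≤ 5 concurrent, Tier 2 → ≤ 10. To derive your own number, multiply requests per second (RPM/60) by your average request latency in seconds; RPM/60 on its own is a request rate, not a concurrency limit.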
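One way to turn an RPM limit into a concurrency cap is Little's law: requests in flight equal the request rate times the time each request takes. The sketch below assumes you know your average latency; max_concurrency is a hypothetical helper, and the 50 RPM and 6-second figures are illustrative inputs only.

```python
import math


def max_concurrency(rpm, avg_latency_s):
    """Concurrency that keeps the request rate at or below `rpm`.

    Little's law: in-flight requests = requests per second x seconds per request.
    """
    return max(1, math.floor(rpm * avg_latency_s / 60))


print(max_concurrency(50, 6))  # 5
print(max_concurrency(50, 1))  # 1
```

Slower requests allow more of them in flight for the same RPM, which is why a fixed concurrency number only makes sense alongside a latency estimate.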

Step 4: Batch requests

Combine 10 small translations into one prompt asking for 10 results:

Translate the following 10 English passages to Chinese,
return as a JSON array:
[
  "text 1",
  "text 2",
  ...
]

Output tokens still count toward TPM, but RPM drops 10x.
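The batching step could be wired up roughly as follows. This is a sketch with hypothetical helper names (chunk, build_batch_prompt, parse_batch_reply); it assumes the model replies with nothing but a JSON array, which a real script should verify before trusting.

```python
import json


def chunk(items, size=10):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batch_prompt(texts):
    """Build one prompt that asks for all translations as a JSON array."""
    return (
        f"Translate the following {len(texts)} English passages to Chinese,\n"
        "return as a JSON array:\n"
        + json.dumps(texts, ensure_ascii=False, indent=2)
    )


def parse_batch_reply(reply, expected):
    """Parse the model's reply; reject it if the item count does not match."""
    results = json.loads(reply)
    if not isinstance(results, list) or len(results) != expected:
        raise ValueError(f"expected {expected} results, got {results!r}")
    return results
```

Checking the item count matters because a batched reply that silently drops one passage would otherwise shift every later translation onto the wrong source text.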

Step 5: Lighter model or Batch API

Tasks that don’t need Opus → Sonnet / Haiku (more generous TPM). Or use the Batch API — async, doesn’t compete with live traffic, 50% cheaper.

Step 6: Upgrade tier or split keys

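Production workloads → request a tier upgrade. Short term: give each service its own API key or workspace so you can see which one is burning the quota. Limits are enforced per organization, so separate keys alone don’t add capacity, but per-workspace limits can keep one service from starving the others.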

Step 7: Daily cap → just wait

Pro / Team daily cap → UI shows reset time, wait. Or temporarily switch to API + pay-as-you-go.

Prevention

  • Anthropic SDK has built-in retry (2 retries with exponential back-off by default); tune with max_retries=N instead of rolling your own
  • Before launching any script, estimate peak TPM/RPM and leave 20% buffer under your tier
  • Don’t fire batch jobs on the hour (00:00, 05:00…) — add jitter to avoid global peaks
  • Backend, CI, cron each get their own API key for fault isolation
  • Track 429 ratio: > 1% means upgrade tier or optimize request patterns
  • For large batch jobs, use the Batch API — saves money and quota
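Two of the items above, start-time jitter and the 429 ratio, are easy to sketch. The names below (jittered_delay, RateLimitStats) are hypothetical, and the 300-second jitter window is an arbitrary assumption; the 1% threshold is the one from the list.

```python
import random


def jittered_delay(max_jitter_s=300, rng=random.uniform):
    """Seconds to wait after the scheduled time before starting a batch job."""
    return rng(0, max_jitter_s)


class RateLimitStats:
    """Counts responses and reports the share that were 429s."""

    def __init__(self):
        self.total = 0
        self.limited = 0

    def record(self, status_code):
        self.total += 1
        if status_code == 429:
            self.limited += 1

    def ratio(self):
        return self.limited / self.total if self.total else 0.0

    def needs_attention(self, threshold=0.01):
        """True when more than `threshold` of all responses were 429s."""
        return self.ratio() > threshold
```

A job scheduled for 05:00 would sleep for jittered_delay() seconds first, and the stats object would be fed every response status as the job runs.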

Tags: #Claude #Debug #Troubleshooting