You’re on Claude Pro or Team. You generate “just a few PDFs” or ask Claude to write one full report and suddenly hit a usage limit, even though you spent the morning chatting freely. Claude isn’t penalizing file output; usage is metered in tokens, and generating a file typically costs 5-20x more than a chat reply.
The key insight: usage is counted in tokens, not “messages.” Every reply you see, every Artifact body, every regeneration, and every prior turn replayed into context counts as tokens. File generation pushes thousands of words or lines through output tokens in one shot, so your quota drops just as quickly.
Common causes
Ordered by hit rate, highest first.
1. The whole file body counts as output tokens
A 5,000-word English report ≈ 6,500 output tokens. 1,000 lines of TypeScript ≈ 8,000-12,000 tokens. One “write me the full report” request is the token equivalent of 50-100 chat turns.
How to spot it: Settings → Usage on the web client — the spike from a file generation is unmistakable.
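The estimate above can be turned into a quick calculator. This is a minimal sketch, assuming about 1.3 tokens per English word (the ratio implied by 5,000 words ≈ 6,500 tokens) and a hypothetical 100 tokens for a typical chat reply; real tokenizer counts vary with the text.

```python
# Assumption: ~1.3 tokens per English word, taken from the
# "5,000 words ≈ 6,500 tokens" figure. Real counts vary by text.
TOKENS_PER_WORD = 1.3

def estimate_output_tokens(word_count: int,
                           tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Rough output-token estimate for an English document."""
    return round(word_count * tokens_per_word)

def equivalent_chat_turns(tokens: int, tokens_per_reply: int = 100) -> float:
    """How many short chat replies the same tokens would buy.
    tokens_per_reply is a hypothetical average, not a measured value."""
    return tokens / tokens_per_reply

report_tokens = estimate_output_tokens(5_000)
print(report_tokens)                         # 6500
print(equivalent_chat_turns(report_tokens))  # 65.0
```

With a 100-token reply the report lands at 65 turns, inside the 50-100 range quoted above.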
2. Every Artifact re-render rebills
Artifacts are not stored snapshots. Every “change this one line” request can re-emit the entire artifact body, and the full body is counted again. Five small edits rebill the body five times.
How to spot it: Same artifact, repeated “fix this,” and usage climbs disproportionately.
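To see how quickly re-renders add up, here is a rough sketch comparing full reprints with small patch-style replies. The 8,000-token artifact and the 200-token patch size are illustrative assumptions, not measured values.

```python
def artifact_output_tokens(artifact_tokens: int, edits: int,
                           full_reprint: bool = True,
                           patch_tokens: int = 200) -> int:
    """Total output tokens for one initial render plus `edits` follow-ups.

    full_reprint=True models every edit re-emitting the whole body;
    otherwise each edit costs only a small patch (hypothetical size).
    """
    per_edit = artifact_tokens if full_reprint else patch_tokens
    return artifact_tokens + edits * per_edit

print(artifact_output_tokens(8_000, edits=5))                      # 48000
print(artifact_output_tokens(8_000, edits=5, full_reprint=False))  # 9000
```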
3. Long conversations replay the full history
Claude is stateless. Each new turn re-sends the entire conversation (including artifact bodies) as input. By turn 30, your input may be 50K tokens — every prompt rebills that 50K.
How to spot it: A short single-line question burns a surprising amount of usage. Almost certainly the history.
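The replay effect grows faster than it looks, because each turn pays for everything before it. The sketch below assumes every exchange (prompt plus reply) adds a hypothetical 1,700 tokens to the history, which puts turn 30 close to the 50K figure above.

```python
def replayed_history(exchange_tokens: list[int]) -> list[int]:
    """For each turn, return the prior history re-sent as input.

    exchange_tokens[i] is what turn i adds to the history
    (prompt + reply). The new prompt itself is not included.
    """
    history = 0
    replayed = []
    for added in exchange_tokens:
        replayed.append(history)  # everything before this turn
        history += added
    return replayed

turns = [1_700] * 30              # hypothetical average exchange size
per_turn_input = replayed_history(turns)
print(per_turn_input[-1])         # 49300 tokens replayed on turn 30
print(sum(per_turn_input))        # 739500 input tokens across the chat
```

The last turn replays about 49K tokens, but the whole conversation has consumed roughly 740K input tokens by then.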
4. Large file / long PDF references
Uploading a 100-page PDF tokenizes the whole thing as input. If the conversation continues, the PDF is reattached every turn. An 80K-token PDF across 10 turns = 800K input tokens.
How to spot it: You recently uploaded a large file, and since then the chat feels slow and your quota drops fast.
5. Extended Thinking multiplies output
Extended Thinking produces hidden reasoning that still counts as output tokens. On complex tasks, thinking mode can use 3-5x the output of a normal reply.
How to spot it: You have the “thinking” badge on, with a noticeable pre-answer pause.
6. Repeated retries of the same prompt
“Try again” is never free: every retry is a full input plus a full new output. Three retries cost three more complete generations on top of the first.
How to spot it: Recent turns show several consecutive regenerations.
Shortest path to fix
Step 1: Two-stage generation — outline first, expand second
Don’t open with “write a complete 50-page competitor analysis.” Instead:
prompt 1: Give me the outline for this competitor analysis,
section by section, with 1-2 sentences on each.
Confirm direction, then:
prompt 2: Now expand section 1 to ~800 words.
prompt 3: Section 2 ...
This saves tokens and catches a wrong direction early.
Step 2: Edit Artifacts by diff, not by full body
Don't say: "Rewrite this code with X changed" (full reemit)
Do say: "Between lines 42-48, insert this block: [code], everything else unchanged.
Only reply with the modified section, do not reprint the full file."
Or just: “Output a unified diff.”
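For a sense of how small a unified diff is next to a full reprint, here is a sketch using Python's standard `difflib`. The 200-line file and the file name `file.ts` are made up for illustration.

```python
import difflib

def unified_diff_text(old: str, new: str, name: str = "file.ts") -> str:
    """Return a unified diff between two versions of a file."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
        n=1,  # one line of context around each change
    ))

# Hypothetical 200-line file with a single changed line.
old = "".join(f"const v{i} = {i};\n" for i in range(200))
new = old.replace("const v42 = 42;\n", "const v42 = 4200;\n")

patch = unified_diff_text(old, new)
print(patch)
print(len(old), "chars in the full file vs", len(patch), "chars in the diff")
```

The patch is seven lines: two file headers, one hunk header, and the changed line with one line of context on each side.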
Step 3: Long tasks → new conversations
After each big file, start a new conversation for the next. Carry a one-paragraph summary of the prior conclusion — never paste full history.
Rule of thumb: when input > 30K tokens (visible in Settings → Usage), open a fresh window.
Step 4: Reference large PDFs, don’t reattach
Inefficient: dump 100-page PDF, then 20 turns of chat
Efficient: attach PDF → ask Claude to extract key facts → close that chat
→ open new chat, paste only the extracted facts
If you really need raw text, put the PDF in a Project’s Knowledge — retrieved on demand instead of attached every turn.
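The difference between the two approaches can be estimated roughly as follows. This sketch counts only the PDF-related input and ignores the rest of the chat history; the 2,000-token size of the extracted facts is an assumption.

```python
def attached_pdf_input(pdf_tokens: int, turns: int) -> int:
    """PDF stays in the conversation and is re-sent on every turn."""
    return pdf_tokens * turns

def extracted_facts_input(pdf_tokens: int, facts_tokens: int, turns: int) -> int:
    """PDF is read once to extract facts; a new chat then carries
    only the facts on each turn."""
    return pdf_tokens + facts_tokens * turns

print(attached_pdf_input(80_000, turns=10))            # 800000
print(extracted_facts_input(80_000, 2_000, turns=10))  # 100000
```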
Step 5: Toggle off Extended Thinking
Reserve thinking for code debugging, math proofs, and complex planning. Drafting emails or outlines doesn’t need it, and turning it off saves roughly 60-80% of output tokens.
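The savings figure follows from the output multiplier. As a sketch, if output with thinking is assumed to be `multiplier` times the output without it, the fraction saved by switching it off is 1 - 1/multiplier.

```python
def thinking_savings(multiplier: float) -> float:
    """Fraction of output tokens saved by turning thinking off,
    assuming output with thinking = multiplier x output without."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return 1 - 1 / multiplier

for m in (2.5, 3, 5):
    print(f"{m}x -> {thinking_savings(m):.0%} saved")
```

Under this assumption a 3-5x multiplier works out to about 67-80% saved, and the 60% end of the range corresponds to a 2.5x multiplier.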
Step 6: Think before retrying
Don’t just say “try again.” Say “the previous version had problem X — adjust only X, keep the rest.” Best: “rewrite only paragraph N, leave others as-is” to avoid full regeneration.
Prevention
- Mental budget per task: 5K-word file ≈ 6.5K output tokens — work out how many you can do per day
- Save large artifacts to disk, then reference instead of regenerating
- Projects are cheaper than long chats: Knowledge is cached, not recounted per turn
- Pro / Team quotas reset on a 5-hour rolling window; schedule file-heavy work at the window start and leave headroom
- Disable Extended Thinking when not needed
- Learn to express edits as diffs; never let Claude reprint the entire file repeatedly
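The mental budget from the first bullet can be worked out roughly like this. The 100,000-token window budget is purely hypothetical (actual plan limits are not expressed as a fixed published token number), and the 1.3 tokens-per-word ratio is the same assumption as the 5K-word estimate.

```python
def files_per_window(window_budget_tokens: int, words_per_file: int,
                     tokens_per_word: float = 1.3,
                     headroom: float = 0.2) -> int:
    """How many files of a given length fit in one usage window,
    keeping a fraction of the budget in reserve as headroom."""
    usable = window_budget_tokens * (1 - headroom)
    per_file = words_per_file * tokens_per_word
    return int(usable // per_file)

# Hypothetical 100K-token window, 5,000-word files, 20% headroom.
print(files_per_window(100_000, 5_000))  # 12
```

This counts output tokens only; replayed history and attachments reduce the number further.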
Related
- Claude usage limit
- Claude Code intro
- Claude beginner guide
- Claude prompt best practices
- Claude Projects