Cursor Fast / Slow Request Billing Confusion

Month starts with 500 fast requests, days later you're stuck on slow — how fast vs slow actually works and how to stretch your quota.

Pro gives you 500 fast requests at month start. Looks like plenty. Mid-month, Composer shows “You are on slow requests” and every reply takes 20-60 seconds to start. Open Settings → Usage: fast is already at 0; everything else is queueing in the slow pool. Cursor isn’t cheating — different models cost different multiples of a fast request, and one Composer “turn” can fire several backend calls.

To reconcile the bill with what it feels like, understand the three layers of Cursor’s pricing.

Common causes

1. Models burn different numbers of fast requests

Cursor publishes a “premium request multiplier” per model on the models page. Typical tiers:

  • claude-sonnet-4 / gpt-5: 1× per call
  • claude-opus-4 / o3: usually 2-4× per call
  • gemini-2.5-pro: 1-2× depending on context length
  • in-house small models (cursor-small): 0× (free)

If you default to opus at 2-4× per call, 500 fast requests cover only about 125-250 calls, not 500, and fewer Composer turns than that once a single turn fires several calls.
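
The arithmetic is easy to sketch. The multipliers below are just the illustrative tiers listed above, hard-coded as assumptions rather than read from Cursor, and calls_covered is a hypothetical helper name. cursor-small is left out because a 0× model never draws on the quota.

```python
# Illustrative multipliers copied from the tiers listed above.
# These are assumptions, not values fetched from Cursor.
# Each entry is (lowest, highest) multiplier per call.
MULTIPLIERS = {
    "claude-sonnet-4": (1, 1),
    "gpt-5": (1, 1),
    "claude-opus-4": (2, 4),
    "o3": (2, 4),
    "gemini-2.5-pro": (1, 2),
}


def calls_covered(quota: int, model: str) -> tuple[int, int]:
    """Return (worst_case, best_case) number of calls the quota covers."""
    low, high = MULTIPLIERS[model]
    return quota // high, quota // low


for name in MULTIPLIERS:
    worst, best = calls_covered(500, name)
    print(f"{name}: {worst}-{best} calls from 500 fast requests")
```

At the top of the opus range (4×), 500 fast requests cover 125 calls; at 2× they cover 250.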

How to judge: Settings → Usage → turn on “Group by model.” Compare each model’s draw against the published multiplier.

2. Composer makes multiple calls per turn

In agent mode the model can chain read_file → grep → write_file → run_terminal — each step is a backend call. “Fix this bug” can cost 3-8 fast requests, not 1.

How to judge: compare the Usage dashboard “requests” column to the number of prompts you sent today. A 3× gap means agent steps are doing the burning.
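
To put a number on that gap, divide the billed requests by the prompts you typed. A minimal sketch: both inputs are figures you read off the dashboard and count yourself, and the function name is made up for illustration.

```python
def amplification(requests_billed: float, prompts_sent: int) -> float:
    """Average fast requests billed per prompt you actually typed."""
    if prompts_sent <= 0:
        raise ValueError("prompts_sent must be positive")
    return requests_billed / prompts_sent


# Example with made-up numbers: 90 requests billed for 30 prompts.
ratio = amplification(90, 30)
print(f"{ratio:.1f} fast requests per prompt")
```

A ratio near 1 means plain chat; a ratio of 3 or more points at agent steps.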

3. Slow requests aren’t free — they’re queued + rate-limited

People assume that when fast runs out, Cursor falls back to a free-but-slow tier. In fact the slow pool is a shared queue with 30-90 s waits at peak, and some premium models (opus, o3) also have a hard monthly cap on slow requests. Past that cap, requests are refused.

How to judge: open View → Output → Cursor and look for “slow pool full, please upgrade” or a similar message.

4. Max mode / long-context mode adds a multiplier

Composer’s “Max mode” or 200k-context mode multiplies the per-call cost. One large refactor can burn 5-10 fast on a single prompt.

How to judge: look at the model name in Composer’s input bar — a “Max” badge means it’s on.
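
Agent steps, model multiplier, and Max mode multiply together. A rough estimator, assuming cost is simply steps × model multiplier × Max-mode factor. The factor is a parameter because no exact value is given here, and turn_cost is a hypothetical name, not a Cursor API.

```python
def turn_cost(agent_steps: int, model_multiplier: float,
              max_mode_factor: float = 1.0) -> float:
    """Estimate fast requests burned by one Composer turn.

    Assumes every agent step is one backend call billed at the model's
    multiplier, scaled by a Max-mode factor (1.0 means Max mode is off).
    """
    return agent_steps * model_multiplier * max_mode_factor


# Made-up scenarios for comparison.
print(turn_cost(5, 1.0))        # 5-step agent turn on a 1x model
print(turn_cost(4, 2.0, 1.5))   # 4 steps, 2x model, assumed 1.5x Max factor
```

Run it with your own step counts to see which layer dominates before changing settings.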

5. BYOK configured but Cursor proxy still in use

If you added an OpenAI / Anthropic API key in Settings → Models but didn’t tick “Use my API key for …”, calls still go through Cursor’s proxy and still burn fast requests.

How to judge: Settings → Models — each model shows either “Cursor” or “Your API Key” next to it.

6. Cross-device / account stat lag

After switching devices or networks, Usage figures can lag by a few minutes, making it look as if you haven’t used anything.

How to judge: refresh the Settings page or wait 5 minutes.

Before you start

  • Identify which entry point is burning: chat, Composer, or Cmd+K. Same billing, very different frequency.
  • Use https://cursor.com/settings on the web for Usage, not the in-IDE panel — numbers update faster.
  • Note your Cursor version and current default model (bottom-right dropdown). Multipliers differ per model.

Info to collect

  • Cursor version, current plan (Hobby / Pro / Business), whether you bought any fast-request add-on.
  • Screenshots of Settings → Usage grouped by model and by day.
  • Roughly how many prompts you sent today, which models, whether Max mode was on.
  • Whether you’re using BYOK; if yes, check that each model shows “Your API Key” in Settings → Models.

Shortest fix path

Ordered by impact on remaining fast.

Step 1: Understand your actual draw

Open https://cursor.com/settings → Usage, group by model, export CSV, and lay out the last 30 days. Usually 60-80% of quota goes to one or two high-multiplier models.
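
Once you have the CSV, a short script can rank models by share of quota. This is a sketch: the column names model and requests are assumptions about the export format, so adjust them to whatever headers your file actually has, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def share_by_model(csv_text: str, model_col: str = "model",
                   cost_col: str = "requests") -> list[tuple[str, float, float]]:
    """Return (model, total, share_of_total) rows, largest consumer first."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[model_col]] += float(row[cost_col])
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(m, t, t / grand_total if grand_total else 0.0) for m, t in ranked]


# Made-up sample standing in for a real export.
SAMPLE = """date,model,requests
2025-01-02,claude-opus-4,120
2025-01-02,claude-sonnet-4,40
2025-01-03,claude-opus-4,180
2025-01-03,gpt-5,60
"""

for model, total, share in share_by_model(SAMPLE):
    print(f"{model:<18} {total:>6.0f}  {share:.0%}")
```

For a real export, read the file and pass its contents in, overriding model_col and cost_col if the headers differ.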

Step 2: Demote daily work to 1× models

Set the default model to claude-sonnet-4 or gpt-5 (1× multiplier). Reserve opus / o3 for “1× couldn’t get it right” moments and switch manually.

Switch per-prompt: Composer input bar → model dropdown → pick a 1× model
Disable globally:  Settings → Models → disable the high-multiplier ones

Step 3: Turn off Max mode

Don’t leave Max mode on by default in Composer. Enable it only when you must hand the model an entire huge file; turn it off after.

Step 4: Use Cursor Tab + Cmd+K instead of Composer

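Cursor Tab autocomplete runs on cursor-small / in-house models at basically 0× cost. Cmd+K inline edit is a single call with no agent chain, so it stays cheap, and it is 0× if you point it at cursor-small. For routine completions and one-line tweaks such as rewording a comment, use Tab or Cmd+K and save Composer for hard tasks.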

Step 5: Wire up BYOK

Settings → Models → paste your Anthropic / OpenAI key, tick “Use my API key” on the relevant models. Fast quota stops draining; you pay your own provider bill. Note: agent features can be slightly behind on BYOK because some new tools require Cursor’s proxy.

Step 6: Enable usage-based pricing instead of upgrading your plan

https://cursor.com/settings → Plan → “Usage-based pricing.” Once enabled, anything past your fast quota is metered (~$0.04 per fast equivalent, scaled by multiplier). Almost always cheaper than jumping from Pro to Business.
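
To see what the overage would cost before enabling it, multiply the fast equivalents past your quota by the metered rate. A sketch using the approximate $0.04 figure above; the input is your multiplier-weighted usage for the month, and overage_cost is a hypothetical helper.

```python
def overage_cost(weighted_requests: float, included: int = 500,
                 price_per_request: float = 0.04) -> float:
    """Estimated metered charge in dollars for usage past the included quota.

    weighted_requests is usage already scaled by each model's multiplier.
    The default price is the approximate figure quoted above, not an
    official rate.
    """
    extra = max(0.0, weighted_requests - included)
    return round(extra * price_per_request, 2)


# Made-up month: 800 fast equivalents against a 500 quota.
print(f"${overage_cost(800):.2f}")
```

Compare the result with the price difference between plans before deciding.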

How to verify the fix

  • Wait 7 days, recheck Usage, and confirm that your daily fast burn, projected to month-end, now stays within your quota.
  • Sign into the same account on another device and confirm Usage matches, ruling out a stale frontend cache.
  • Have a teammate on the same plan run the same workflow with the same model setup — compare fast spent.
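
The first check is a linear projection: burn per day so far, times days in the billing cycle. A sketch, assuming a 30-day cycle and a 500-request quota; both are parameters, and the function names are hypothetical.

```python
def projected_total(used_so_far: float, days_elapsed: int,
                    days_in_cycle: int = 30) -> float:
    """Extrapolate current usage linearly to the end of the cycle."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return used_so_far / days_elapsed * days_in_cycle


def on_pace(used_so_far: float, days_elapsed: int,
            quota: int = 500, days_in_cycle: int = 30) -> bool:
    """True if the linear projection stays within the quota."""
    return projected_total(used_so_far, days_elapsed, days_in_cycle) <= quota


# Made-up check: 100 fast requests used after 7 days.
print(projected_total(100, 7), on_pace(100, 7))
```

A linear projection overstates the total if most of the burn came from one heavy day, so treat it as a pacing signal rather than a forecast.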

If it still fails

  • Reduce repro to one prompt with one model and watch how much fast a single Composer turn actually costs.
  • Roll back the latest Cursor upgrade in case it changed your default model; for multiplier changes, recheck the models page.
  • Search forum.cursor.com for the model’s published multiplier; bring your Usage screenshots.
  • Grab View → Output → Cursor logs and post to Bug Reports; the billing team monitors that channel.

Prevention

  • Schedule a weekly look at the Usage dashboard so you can pace through the month.
  • Team agreement: opus only for hard problems; sonnet / gpt-5 for the bulk.
  • Treat Composer Max mode as an explicit action, never default-on.
  • Use .cursorrules to enforce shorter replies, indirectly cutting multi-turn calls.
  • Big refactors: talk through the plan in chat first, only let agent execute once the plan is locked, avoiding agent flail.

Tags: #Cursor #Troubleshooting