Claude Computer Use Stuck Clicking the Same Button in a Loop

Q: I'm on the older tool version and can't zoom. What's my equivalent fix?

`enable_zoom` requires `computer_20251124`. On `computer_20250124`, lower your capture resolution or crop the screenshot to the region of interest before sending, and prompt Claude to read a specific element rather than the whole screen.

Claude Computer Use keeps clicking the same button or field without progressing — almost always a screenshot-timing, coordinate-scaling, or hidden-state problem. Here is the fastest fix plus the full diagnosis.

Published: May 24, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team 🌐 查看中文版本

You ask Claude Computer Use to fill a form, click submit, and read the confirmation. Claude clicks submit. Nothing visibly changes. It screenshots again, sees the form still open, and clicks submit again. And again. Twenty iterations later the transcript is a wall of “I will click the Submit button” lines. The model is not broken — it reacts to what it sees, and what it sees is a UI state that did not change between turns.

Fastest fix (works for most loops): add an explicit wait-screenshot-verify rule to your prompt so Claude stops blindly retrying. Anthropic’s own recommended wording is in Step 1. If you control the API loop, also cap iterations with a max_iterations budget (Step 5) so a stuck run cannot rack up hundreds of turns. Those two changes resolve the large majority of action loops.

The loop almost always traces back to one of six things: the click landed on a covering element, the button is gated behind async validation, the screenshot fired before the DOM settled, coordinates drifted after a resize or on a Retina display, the confirmation rendered offscreen, or the action genuinely succeeded but the success cue was too brief for Claude to catch.

Which tool version are you on?

The fix details depend on your tool version, and the lineup changed in late 2025. As of June 2026:

Tool version `type`	`anthropic-beta` header	Models	Notable additions
`computer_20251124`	`computer-use-2025-11-24`	Claude Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5	`zoom` action (`enable_zoom: true`)
`computer_20250124`	`computer-use-2025-01-24`	Claude Sonnet 4.5, Haiku 4.5, older (Opus 4.1, Sonnet 4 deprecated)	`scroll`, `wait`, `triple_click`, `hold_key`

If you are on computer_20251124 you have the zoom action, which directly helps with two of the loop causes below (subtle success cues and tiny labels). Computer use is still a beta feature on the Claude API and requires the beta header — leaving it off returns a tool-definition error, not a loop.

One scaling change matters for loops: as of June 2026 the Anthropic docs note that Claude Opus 4.7 supports screenshots up to 2576px on the long edge and treats coordinates as 1:1 with image pixels (no scale-factor conversion). The 1568px limit and the manual coordinate-scaling math below still apply to earlier models (Sonnet 4.6, Opus 4.6, Opus 4.5). In other words, the single biggest source of “every click misses” loops — silent downscaling — is largely gone if you run Opus 4.7 at or below 2576px.

Common causes

Ordered by likelihood for typical Computer Use loops.

Computer Use clicks at pixel coordinates. If a tooltip, modal backdrop, or a pointer-events: auto overlay sits on top of the target button at click time, the click hits the overlay and the button never fires. The next screenshot looks identical, so Claude tries again.

How to spot it: Open the page manually, hover near the button, and use the DevTools “Inspect” picker at those coordinates. If anything other than the button is the top element, that is the culprit.

2. Button is technically enabled but waiting on async validation

Many forms show the submit button as visually enabled but block the click handler until a debounced validation finishes. Claude’s click fires too early, gets silently swallowed, and the page stays put.

How to spot it: Manually click the button immediately after typing. If nothing happens for 500-1500ms and then it works, this is your case.

3. Screenshot taken before the DOM settled

The screenshot is just another tool call, so timing depends entirely on your agent loop. If the page does a full client-side route change with a spinner that takes 2s and your loop screenshots immediately, Claude captures the spinner state and concludes “still loading, click again” — even though a click during loading is now wrong.

How to spot it: The screenshot shows a spinner or skeleton where content should be, and Claude keeps clicking the now-stale call to action.

4. Coordinate drift after a resize, or wrong scaling on a Retina display

Two distinct flavors:

Resize drift: if the browser zoom level, window size, or layout changed between turns (auto-resize, sidebar collapse, virtual keyboard on mobile emulation), the coordinates Claude computed earlier no longer point at the same DOM element.
Retina / DPI scaling: on macOS Retina displays the screenshot is captured at a device pixel ratio of 2, so the image is twice the resolution of the logical coordinate space. Per Anthropic’s docs you must either downscale the screenshot by 2x before sending, or halve the coordinates Claude returns before clicking. Skip this and every click lands at roughly double the intended offset — a textbook never-changes loop. The same applies whenever the screenshot you send exceeds the API image limits (1568px on the long edge for most models) and gets silently downscaled.

How to spot it: Compare two consecutive screenshots — if the button moved between turns but Claude clicks the old position, it is resize drift. If clicks are consistently offset in one direction (e.g. always low-and-right), it is a scaling mismatch, not drift.

5. Confirmation dialog rendered offscreen or in a portal Claude does not see

Some UIs portal their modals to document.body at a position outside Claude’s current view (e.g. top of the document while the scroll is mid-page). Claude sees an unchanged form; the modal is waiting offscreen.

How to spot it: Scroll to the top of the page manually. If there is a modal there blocking the page, Claude needs to be told to scroll up first.

6. The page genuinely succeeded but the success cue is too subtle

A toast that fades in over 200ms then fades out 1.5s later. A green border that lasts 800ms. Tiny status text Claude cannot read at the downscaled resolution. By the time the next screenshot fires, the success signal is gone, so Claude thinks the action failed.

How to spot it: Watch the page manually after the action. If the success indicator is visible for less than 2 seconds, or the text is smaller than the surrounding body copy, Claude will likely miss it.

Before you start

Save the entire screenshot sequence from the loop. Per-turn screenshots are the only ground truth for what Claude saw.
Record the exact prompt that triggered the run. Wording like “keep trying until you succeed” makes loops worse.
Note whether the loop is deterministic (same site every time) or intermittent (sometimes works). Intermittent points at timing; deterministic points at structure or scaling.

Information to collect

Per-turn screenshots from the loop (at minimum 3 consecutive iterations).
The target site URL and the specific action Claude was asked to perform.
Tool version (computer_20251124 vs computer_20250124), the model, and whether you run the API directly, through the reference container, or a desktop wrapper.
The exact display_width_px / display_height_px you declared, the real screen resolution, and the device pixel ratio.
Any custom system prompt or tool-use instructions you supplied.

Step-by-step fix

Ordered by ROI. Cheapest checks first.

Step 1: Add the official wait-screenshot-verify instruction

The single highest-impact fix. Anthropic publishes a recommended prompt for exactly this in the Computer use tool docs (the “Optimize model performance with prompting” section) — use their wording or a close variant:

After each step, take a screenshot and carefully evaluate if you have
achieved the right outcome. Explicitly show your thinking: "I have
evaluated step X..." If not correct, try again. Only when you confirm a
step was executed correctly should you move on to the next one.

Add one rule on top for loop prevention specifically:

If a screenshot looks identical to the one before your last click, do
NOT click the same element again. Scroll, take a wider screenshot, or
look elsewhere on the page for an error message.

This breaks causes 1, 2, 3, and 6 most of the time, because Claude stops assuming an action worked and stops blindly retrying.

Step 2: Force a scroll to top before reading state

If you suspect the confirmation is rendered above the current scroll position (cause 5):

Before deciding whether the action succeeded, scroll the page to the
top, wait 1 second, then take a screenshot. Modals and toasts often
render at the top of the document body.

Step 3: Re-locate by label, and use keyboard for sticky widgets

Bias the prompt to identify targets by visible text first, which neutralizes resize drift (cause 4):

When clicking a button, identify it by its visible text label first,
not by remembered pixel coordinates. If the same coordinates fail
twice, the button has likely moved — re-locate it by label.

Anthropic also notes that dropdowns and scrollbars are hard for mouse-based control. For those, prompt Claude to use keyboard shortcuts (Tab to focus, arrow keys, Enter) instead of clicking. And per the docs, put your instruction text before the screenshot image in each user turn — describing the target before the image is processed measurably improves click accuracy.

Step 4: Fix display dimensions and Retina scaling

The most common silent cause of “every click misses.” Make display_width_px / display_height_px exactly match the pixel dimensions of the image you actually send, and stay within Anthropic’s recommended ranges:

General desktop tasks: 1024x768 or 1280x720
Web applications: 1280x800 or 1366x768
Avoid resolutions above 1920x1080

# Tool definition (computer_20251124)
tools = [{
    "type": "computer_20251124",
    "name": "computer",
    "display_width_px": 1280,
    "display_height_px": 800,
    "display_number": 1,
    # enable_zoom lets Claude read tiny labels / status text
    "enable_zoom": True,
}]

On a macOS Retina display, Claude captures at a device pixel ratio of 2, so the image is twice the resolution of the logical coordinate space. Per Anthropic’s docs you must either downscale the screenshot by 2x before sending, or halve the coordinates Claude returns before clicking. The same scale-down-then-scale-coordinates-back-up pattern applies whenever your source exceeds the per-model image limit (1568px on the long edge for Sonnet 4.6, Opus 4.6, and Opus 4.5; 2576px with 1:1 coordinates for Opus 4.7). The API actually enforces two constraints at once — 1568px on the long edge and roughly 1.15 megapixels total — so the docs compute the scale factor as the minimum of both, then divide Claude’s returned coordinates by that same scale before clicking:

import math

def get_scale_factor(width, height):
    """Pre-downscale screenshots so coordinates stay 1:1 with what Claude sees."""
    long_edge_scale = 1568 / max(width, height)
    total_pixels_scale = math.sqrt(1_150_000 / (width * height))
    return min(1.0, long_edge_scale, total_pixels_scale)

scale = get_scale_factor(screen_w, screen_h)
# resize the screenshot to (screen_w*scale, screen_h*scale) before sending,
# then map Claude's returned (x, y) back with: screen_x = x / scale

For Opus 4.7 at or below 2576px you can skip this entirely — coordinates are already 1:1. Setting enable_zoom: True (only on computer_20251124) lets Claude inspect a region at full resolution, which fixes both the “clicks land near but miss tiny targets” and “could not read the success text” variants. Per the docs, if Claude is not zooming when you expect, ask it about a specific element or region rather than the screen as a whole.

Step 5: Cap the agent loop with a max-iterations budget

This is the canonical loop-breaker from Anthropic’s own reference implementation: run the sampling loop until Claude stops requesting tools or a hard iteration limit is reached.

def sampling_loop(model, messages, max_iterations=10):
    for _ in range(max_iterations):
        response = client.beta.messages.create(
            model=model,
            max_tokens=4096,
            messages=messages,
            tools=TOOLS,
            betas=["computer-use-2025-11-24"],
        )
        messages.append({"role": "assistant", "content": response.content})
        tool_results = process_tool_calls(response)
        if not tool_results:        # no more tool use -> task complete
            return messages
        messages.append({"role": "user", "content": tool_results})
    return messages  # hit the limit -> surface to the user

For extra safety, add a same-screenshot detector so you abort before burning the whole budget on a frozen page:

import hashlib

last_hash, identical_count = None, 0

def on_screenshot(image_bytes):
    global last_hash, identical_count
    h = hashlib.sha256(image_bytes).hexdigest()
    if h == last_hash:
        identical_count += 1
        if identical_count >= 3:
            raise StopIteration("3 identical screenshots in a row — aborting loop")
    else:
        identical_count = 0
    last_hash = h

Three byte-identical screenshots in a row means the run is stuck. Abort and surface to the user instead of paying for more turns.

Step 6: For known sticky sites, pre-script the awkward step

Some sites (banking, legacy enterprise apps, heavy SPA dashboards) are reliably hostile to Computer Use. For those, fall back to a deterministic Playwright or Selenium script for the click-and-wait step, and only hand control to Claude for the parts that need reasoning.

How to confirm it’s fixed

Re-run the original task with the new prompt and tool config. Confirm Claude no longer clicks the same element three times in a row.
Check the screenshot sequence: between consecutive actions the screenshots should be visibly different (modal opened, page changed, toast appeared).
The total turn count for the task should drop noticeably versus the loopy run, and the run should now end with “task complete” rather than hitting max_iterations.
If you suspected a scaling bug, click a known target (e.g. a corner button) and confirm the cursor lands exactly on it, not offset.

Diagnose click-offset issues (Anthropic’s table)

If clicks miss instead of looping, Anthropic’s docs map the symptom to a fix:

Symptom	Likely cause	Try
Offset consistently in one direction	`display_*_px` does not match the image sent, or it was silently downscaled	Match display dims to the resized screenshot; pre-downscale within API limits
Lands in the right area but misses small targets	Detail lost downscaling a 4K+ source, or distorted aspect ratio	Set `enable_zoom: true`; capture at lower DPI or crop; preserve aspect ratio
Clicks the wrong element entirely	Ambiguous instruction, or similar elements nearby	Use positional prompts (“the blue Submit button, bottom-right”); smaller steps
Accuracy consistently poor	Screenshots above API limits, or resolution too low	Pre-downscale within limits; try `1280x720` as a baseline

Long-term prevention

Treat every Computer Use prompt as a state machine: “click X, verify Y appeared, then proceed.” Make the verify step explicit.
Bake a “do not click the same element twice without a state change” rule into your system prompt template.
Lock display_width_px / display_height_px and the device pixel ratio for the session; never let the host auto-resize.
For high-value sites, include example screenshots and tool calls of a successful run in the prompt — the docs recommend this for repeatable tasks.
Pipe screenshots to a file system so you can replay any failed run end-to-end without re-charging the API.

Common pitfalls

Telling Claude to “try harder” or “keep trying” — this actively encourages loops.
Trusting a button’s visible enabled state without checking for an async-validation gate.
Running without a max_iterations budget — a loop can rack up hundreds of turns and real dollars.
Ignoring Retina / downscaling math, so every click is offset and nothing ever changes.
Assuming a captcha-blocked or rate-limited page is “just slow.” Claude will keep clicking forever.
Maxing out the thinking effort on UI tasks. Anthropic’s June 2026 guidance: on Sonnet 4.6 and Opus 4.6 use medium and avoid max (it adds token cost without improving accuracy here); on those models low actually spends fewer output tokens than disabling thinking, because fewer mistakes mean fewer retries. Opus 4.7 is the exception — it defaults to high, with low reserved for high-throughput or cost-sensitive runs.

FAQ

Q: Claude clicks the right button but the page goes blank. What now?

Likely a CSP or network-block issue where the click triggered an outbound request the sandbox blocked. Reproduce the flow manually and check the Network tab. If you see a CORS or CSP error, your sandbox needs the destination domain allowlisted. See also Claude file connectors stuck in re-auth loop.

Q: Why does the loop happen on one site but not others?

Sites with heavy client-side routing, portal-rendered modals, or async validation are loop-prone. Static-HTML forms almost never loop. The fix is per-site prompting plus correct scaling, not a single global setting.

Q: Can I just raise max-iterations and hope it eventually succeeds?

No. If the input does not change between turns, Claude’s reasoning will not either, so it loops forever within whatever budget you give it. You must break the input pattern (scroll, wait, re-locate, fix scaling) or abort.

Q: Does switching models help?

Sometimes. Per Anthropic’s docs, Sonnet 4.6 is mechanically more precise at clicking than Opus 4.6 and more robust when screenshots are heavily downscaled; Opus 4.7 narrows that gap to roughly comparable precision and needs less downscaling because it accepts up to 2576px on the long edge with 1:1 coordinates. If your loop is a downscaling/offset problem, moving to Opus 4.7 can remove the scaling math entirely. But model choice does not fix a wrong-frame screenshot or a Retina DPR-2 bug — fix those first.

Q: I’m on the older tool version and can’t zoom. What’s my equivalent fix?

enable_zoom requires computer_20251124. On computer_20250124, lower your capture resolution or crop the screenshot to the region of interest before sending, and prompt Claude to read a specific element rather than the whole screen.

Tags: #Troubleshooting #Claude #computer-use #automation #agent