Claude Computer Use Workflow: A Practical 2026 Setup Guide

Pick the right tasks, run it in a sandbox, and verify every few steps. A reproducible Computer Use workflow with June 2026 models, pricing, and OSWorld numbers.

Published: May 23, 2026 · Updated: Jun 14, 2026 · Author: AI Productivity Guide Team

TL;DR

Computer Use lets Claude take screenshots, move the cursor, click, and type in a real desktop. As of June 2026, Claude scores 72.5% on OSWorld-Verified with Sonnet 4.6 and 78.0% with Opus 4.7. That is roughly at or above the 72.4% human-expert baseline, but those numbers come from clean benchmark environments. The real web is messier, with popups, slow loads, and drifted layouts. There, the reliable pattern is the one a careful operator already uses:

- Pick a repeatable, read-mostly task.
- Run it in a sandbox.
- Cap the run at 8-12 steps.
- Add an explicit verification step every 2-3 actions.

If you do not want to write code, use it through Claude Cowork in the desktop app (Pro at $20/month and up). If you do, use the API computer use tool (computer_20251124, beta header computer-use-2025-11-24).

What this covers

Computer Use is Anthropic’s tool that gives Claude four primitives: capture a screenshot, control the mouse, send keystrokes, and interact with any visible application. The demos look slick; the day-to-day is messier. This guide is the practical setup: which of the three access paths to pick, which tasks are worth automating, how to scope a run so it can fail safely, and the review pattern that catches the silent miss.

Three ways to run it in June 2026

Path	Best for	Plan / cost	Setup
Claude Cowork (desktop app)	Non-engineers, one-off and recurring desktop tasks	Pro $20/mo, Max $100/$200/mo (included)	None; runs in an isolated local VM, research preview
API computer use tool	Engineers building repeatable, scripted automations	Pay per token (input/output, per 1M tokens): Sonnet 4.6 $3/$15, Opus 4.7 $5/$25	Beta header `computer-use-2025-11-24`, your own loop
Reference Docker demo	Trying it safely on day one	API token cost only	`anthropics/anthropic-quickstarts` container

Cowork has been bundled with Claude Pro since January 16, 2026. It reached Windows parity on February 10, 2026. It runs code and browser actions inside an isolated virtual machine on your own computer, which makes it the lowest-risk way for a non-engineer to start. The API tool works with Opus 4.7, Opus 4.6, Opus 4.5, and Sonnet 4.6 using the computer-use-2025-11-24 header. The older Sonnet 4.5 and Haiku 4.5 still use computer-use-2025-01-24.
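If you take the API path, it is easy to get the model-to-version mapping wrong in code. Below is a minimal sketch that picks the tool definition and beta header per model. It uses the tool types and headers listed in this guide. The model ID strings and the helper name computer_tool_config are assumptions, so check them against your own model list.

```python
# Hypothetical helper: pick the computer tool definition and beta header per model.
# Tool types and headers follow this guide; the model ID strings are assumptions.

NEW_TOOL = ("computer_20251124", "computer-use-2025-11-24")
OLD_TOOL = ("computer_20250124", "computer-use-2025-01-24")

# Assumed model IDs; confirm the exact strings in your console.
MODEL_TOOL_VERSION = {
    "claude-opus-4-7": NEW_TOOL,
    "claude-opus-4-6": NEW_TOOL,
    "claude-opus-4-5": NEW_TOOL,
    "claude-sonnet-4-6": NEW_TOOL,
    "claude-sonnet-4-5": OLD_TOOL,
    "claude-haiku-4-5": OLD_TOOL,
}


def computer_tool_config(model: str, width: int = 1024, height: int = 768,
                         enable_zoom: bool = False) -> tuple[dict, str]:
    """Return (tool definition, beta header) for a model; raises on unknown models."""
    if model not in MODEL_TOOL_VERSION:
        raise ValueError(f"no computer use mapping for {model!r}")
    tool_type, beta = MODEL_TOOL_VERSION[model]
    tool = {
        "type": tool_type,
        "name": "computer",
        "display_width_px": width,   # 1024x768 (XGA) mirrors the reference default
        "display_height_px": height,
    }
    if enable_zoom:
        # This sketch only allows zoom on the newer tool version.
        if tool_type != NEW_TOOL[0]:
            raise ValueError(f"{model} uses {tool_type}; zoom is not enabled for it here")
        tool["enable_zoom"] = True
    return tool, beta
```

Next, pass `tools=[tool]` and `betas=[beta]` to `client.beta.messages.create(...)` in the Anthropic Python SDK, alongside your model and messages.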

Who this is for

Operators, analysts, and support staff with a recurring sequence of clicks they hate: pulling a weekly report from a dashboard, filing the same ticket form, harvesting numbers from a portal that has no API. Engineers usually have better tools (a real script, an API call), so the high-leverage audience is non-engineers using Cowork, plus engineers who specifically need to drive a UI that exposes no API.

When to reach for it

Reach for Computer Use when the task is browser-based, repeatable, and read-mostly. These are all good fits:

- Pulling a chart screenshot weekly.
- Copying a table from a legacy admin panel.
- Filling a known form with structured input.

These are not a fit yet:

- Anything involving a payment confirmation.
- Anything involving a destructive button.
- Anything that needs a real-time judgment call.

The model still misclicks. On OSWorld-Verified, the gap between 78% and 100% is largely the long tail of dynamic UIs you will hit in production.
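One way to make those criteria stick across a team is to write them down as a checklist. Here is a hypothetical sketch. The TaskProfile fields and the four-runs-a-month threshold (roughly weekly) are illustrative assumptions, not an official rubric.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of a candidate task; fields are illustrative."""
    browser_based: bool
    runs_per_month: int
    touches_payment: bool = False
    has_destructive_button: bool = False
    needs_live_judgment: bool = False


def triage(task: TaskProfile, min_runs_per_month: int = 4) -> list[str]:
    """Return the reasons a task is a poor fit; an empty list means 'try it'."""
    blockers = []
    if not task.browser_based:
        blockers.append("not browser-based")
    if task.runs_per_month < min_runs_per_month:
        blockers.append("not repeated often enough to pay off")
    if task.touches_payment:
        blockers.append("involves a payment confirmation")
    if task.has_destructive_button:
        blockers.append("involves a destructive button")
    if task.needs_live_judgment:
        blockers.append("needs a real-time judgment call")
    return blockers
```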

Before you start

Isolate it. Run in a dedicated VM or container with minimal privileges, never your main desktop. Anthropic’s own guidance is to sandbox the environment so a misclick or a prompt injection cannot touch sensitive data. Cowork does this for you with a local VM; for the API you build the sandbox.
Write the task as a numbered list a junior could follow without you in the room. If you cannot write it down, Claude cannot follow it.
Cap the run at 8-12 steps. Each action is a separate API round trip (screenshot, decide, act, screenshot again), so errors compound and cost climbs with length. Past ~12 steps, you cannot tell where it lost the plot.
Decide stop conditions up front: success looks like X, failure looks like Y, ambiguity means halt and ask.
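The checklist above can live in a small data structure that refuses to build a prompt without stop conditions. The sketch below uses a hypothetical RunPlan class. It treats each written step as roughly one action, which is only an approximation, since one written step can take several clicks.

```python
from dataclasses import dataclass


@dataclass
class RunPlan:
    """Hypothetical run plan: numbered steps plus explicit stop conditions."""
    task: str
    steps: list[str]
    success: str
    failure: str
    on_ambiguity: str = "halt and ask a human"
    max_steps: int = 12  # upper end of the 8-12 cap; one written step ~ one action (approximation)

    def __post_init__(self) -> None:
        if not self.steps:
            raise ValueError("write the task down as numbered steps first")
        if len(self.steps) > self.max_steps:
            raise ValueError(f"{len(self.steps)} steps exceeds the cap of {self.max_steps}; split the task")
        if not self.success or not self.failure:
            raise ValueError("define what success and failure look like before running")

    def to_prompt(self) -> str:
        """Render the plan as the numbered-list prompt, stop conditions last."""
        lines = [f"Task: {self.task}"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines.append(f"Success looks like: {self.success}")
        lines.append(f"Failure looks like: {self.failure}")
        lines.append(f"If anything is ambiguous: {self.on_ambiguity}.")
        return "\n".join(lines)
```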

Step by step (API or Cowork)

Spin up the sandbox. For Cowork this is automatic. For the API, run the reference Docker container or your own VM with an Xvfb virtual display. Log in to the target system yourself so you are not handing credentials to the model. If Claude must log in on its own, pass the credentials only inside <robot_credentials> XML tags, never in plain prose.
Set a sane resolution. The reference implementation defaults to 1024x768 (XGA). Opus 4.7 accepts up to 2576 px on the long edge, with coordinates that map 1:1 to image pixels, but smaller screenshots are faster and cheaper. If small text (file names, tab titles, line numbers) trips it up, enable the zoom action with enable_zoom: true instead of raising the resolution of every screenshot.
Paste the task as a numbered list with explicit targets: “Click the gear icon top-right”, not “go to settings”. Put the instruction text before any reference screenshot in the message; describing the target before the image is processed measurably improves click accuracy.
Add a verification step every 2-3 actions. Anthropic recommends prompting: “After each step, take a screenshot and carefully evaluate if you achieved the right outcome. Only when you confirm a step was correct should you move on.” Verification turns a silent miss into a halt-and-ask.
Watch the first run end to end. You are not optimizing yet; you are mapping where it stalls. Note timeouts, ambiguous popups, layout shifts.
Harden the brittle spots. The usual culprit is a wait-for-load issue, so add “wait for the table to render before clicking Export.” Re-run.
Save the prompt once it is stable across three runs. Add a one-line “what good looks like” so future-you remembers the success shape.
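You can also enforce the verification cadence and step cap in your own loop rather than leaving them to the prompt. This minimal sketch assumes you supply three callables:

- next_action: in practice, a wrapper around the API call that returns the model's next tool_use input, or None when the model reports it is done.
- execute: performs the action in the sandbox and returns a fresh screenshot.
- verify: your check, either a follow-up question to the model or a cheap deterministic test.

All names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunResult:
    status: str          # "done", "halted_verification", or "halted_step_cap"
    actions_taken: int
    log: list = field(default_factory=list)


def run_with_checkpoints(
    next_action: Callable[[list], Optional[dict]],
    execute: Callable[[dict], bytes],
    verify: Callable[[bytes, int], bool],
    verify_every: int = 3,
    max_actions: int = 12,
) -> RunResult:
    """Drive one run; stop on a failed checkpoint or when the step budget is spent."""
    log: list = []
    screenshot = b""
    while True:
        action = next_action(log)
        if action is None:
            # The model reports it is finished: check the final state before calling it done.
            ok = verify(screenshot, len(log))
            return RunResult("done" if ok else "halted_verification", len(log), log)
        if len(log) >= max_actions:
            return RunResult("halted_step_cap", len(log), log)
        screenshot = execute(action)
        log.append(action)
        # Checkpoint every `verify_every` actions; a failed check halts instead of continuing.
        if len(log) % verify_every == 0 and not verify(screenshot, len(log)):
            return RunResult("halted_verification", len(log), log)
```

The design choice is to fail closed. A failed checkpoint or an exhausted step budget ends the run and hands it back to you with the action log, rather than letting the model keep going.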

First-run exercise

Pick the dullest, lowest-stakes task you do at least once a week, such as exporting a CSV from a single dashboard.
Run it with no edits the first time. At first, expect it to take roughly 2-3x as long as doing it by hand; speed is not the point yet.
Save the screen recording and watch it back. Mark every place Claude hesitated; that is where to add a verification step.
Re-run with the patched prompt. Goal: zero hesitation, not zero seconds.
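If your harness records a timestamp per action, you can pre-mark likely hesitations before watching the recording. Here is a rough heuristic. It assumes a hypothetical list of (step label, seconds) pairs and an arbitrary threshold of 2x the median step time. It complements watching the recording but does not replace it.

```python
from statistics import median


def find_hesitations(step_times: list[tuple[str, float]], factor: float = 2.0) -> list[str]:
    """Return labels of steps that took more than `factor` times the median step duration."""
    if not step_times:
        return []
    typical = median(seconds for _, seconds in step_times)
    return [label for label, seconds in step_times if seconds > factor * typical]
```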

Quality check

Did every verification step pass? If every checkpoint is green but the final output is wrong, the checkpoints are in the wrong place.
Spot-check against a known-good run from last week. Computer Use can grab the wrong row when a dashboard re-sorts itself between runs.
Log the run ID and the action list for anything touching shared systems. You need an audit trail before letting it run unattended.
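The sketch below covers the audit trail and the week-over-week spot-check. It assumes you store one JSON line per run and that the extracted table has a stable key column. The function names and record fields are hypothetical. The key comparison catches only one failure mode: rows that appeared or vanished between runs. Checking that the values are plausible is still up to you.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def audit_record(actions: list, row_keys: list, run_id: Optional[str] = None) -> str:
    """One JSON line per run: ID, UTC timestamp, the full action list, and what was extracted."""
    return json.dumps({
        "run_id": run_id or str(uuid.uuid4()),
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "actions": actions,
        "row_keys": sorted(row_keys),
    })


def spot_check(current_keys: list, known_good_keys: list) -> tuple[list, list]:
    """Compare this run's row keys with a known-good run; returns (missing, unexpected)."""
    current, good = set(current_keys), set(known_good_keys)
    return sorted(good - current), sorted(current - good)
```

Append each returned line to a log file, for example a hypothetical computer-use-audit.jsonl. Compare each run against a known-good run before trusting it.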

Cost and accuracy, with real numbers

Each action is one API call that includes a fresh screenshot, and screenshots are images that consume input tokens; the computer use beta also adds roughly 466-499 tokens to the system prompt on every call. That is why token usage runs far higher than a normal chat and why capping at 8-12 steps matters for spend. To cut cost, prefer Sonnet 4.6 ($3 in / $15 out per 1M tokens) over Opus 4.7 ($5 / $25) for routine UI work: Sonnet 4.6’s 72.5% on OSWorld-Verified is essentially level with the prior Opus generation (Opus 4.6 at ~72.7%), and Opus 4.7’s edge (78.0%) mostly shows up on the hard, dynamic flows.

Model	OSWorld-Verified	API price (in/out, per 1M)	Use when
Claude Sonnet 4.6	72.5%	$3 / $15	Default for routine, structured UI tasks
Claude Opus 4.7	78.0%	$5 / $25	Hard, dynamic, multi-step flows
Human expert (baseline)	~72.4%	—	The bar both models now meet

Benchmark figures are as of June 2026. Treat them as a ceiling measured in clean conditions, not a prediction for your dashboard.
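Here is a back-of-the-envelope estimator that shows why the step cap is the main cost lever. It makes these assumptions:

- Image tokens follow the approximation from Anthropic's vision documentation (width x height / 750).
- Tool overhead is 499 tokens per call, the upper figure quoted above.
- Prompt and output sizes use made-up defaults.

It ignores the tool definition and the text history, so treat its output as an order-of-magnitude guide only.

```python
import math
from typing import Optional

# USD per 1M tokens (input, output), from the table above.
PRICES = {"sonnet-4.6": (3.0, 15.0), "opus-4.7": (5.0, 25.0)}


def image_tokens(width: int, height: int) -> int:
    # Approximation from Anthropic's vision docs: tokens ~= width * height / 750.
    return math.ceil(width * height / 750)


def estimate_run_cost(model: str, steps: int, width: int = 1024, height: int = 768,
                      tool_overhead: int = 499, prompt_tokens: int = 1_000,
                      output_tokens_per_step: int = 150,
                      keep_last_images: Optional[int] = None) -> float:
    """Rough USD cost of one run. prompt_tokens and output size are made-up defaults."""
    price_in, price_out = PRICES[model]
    img = image_tokens(width, height)
    total_in = total_out = 0
    for call in range(1, steps + 1):
        # Call N re-sends the history, so it carries N screenshots unless older ones are dropped.
        images = call if keep_last_images is None else min(call, keep_last_images)
        total_in += tool_overhead + prompt_tokens + images * img
        total_out += output_tokens_per_step
    return (total_in * price_in + total_out * price_out) / 1_000_000
```

Under these assumed defaults, a 12-step run comes out around $0.33 on Sonnet 4.6 and $0.54 on Opus 4.7. The shape matters more than the dollars. When every screenshot stays in context, going from 6 to 12 steps roughly triples the cost rather than doubling it. The keep_last_images parameter models dropping older screenshots from the history to flatten that curve.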

Security: the part people skip

Computer Use reads the screen and acts on it, so a malicious instruction hidden in a webpage or image can try to hijack the run (prompt injection). Anthropic ships two mitigations: the model is trained to resist injected instructions, and a classifier scans screenshots and steers Claude to ask for confirmation when it spots a likely injection. Neither is a substitute for isolation. Keep credentials and sensitive data out of the sandbox, keep a human approving anything irreversible, and review Anthropic’s computer use security guidance before pointing it at a logged-in app.
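Keeping a human on anything irreversible is easiest to enforce in the harness, not the prompt. This minimal sketch assumes your runbook marks irreversible steps and that each step carries the computer tool input to execute. gated_execute and ApprovalRequired are hypothetical names.

```python
from typing import Callable


class ApprovalRequired(Exception):
    """Raised when a human declines an irreversible step."""


def gated_execute(step: dict, execute: Callable[[dict], None],
                  approve: Callable[[dict], bool]) -> None:
    """Execute one runbook step; steps flagged irreversible need an explicit human yes first.

    `step` is a hypothetical runbook entry, e.g.
    {"description": "Click Send", "irreversible": True,
     "action": {"action": "left_click", "coordinate": [640, 410]}}
    """
    if step.get("irreversible", False) and not approve(step):
        raise ApprovalRequired(f"not approved: {step['description']}")
    execute(step["action"])
```

In a terminal session, approve could be as simple as asking for a typed "yes" with input(). Anything else counts as a refusal.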

How to reuse this workflow

Keep a computer-use-runbook.md per task: prompt, expected screenshots, stop conditions. Treat it like an SRE runbook, not a chat snippet.
Build prompts in pairs: a dry-run version that screenshots but never clicks destructive buttons, and a live version that does. Always test in dry-run after a UI change.
Run a small regression weekly when the target site updates often. UI changes silently break automation; weekly catches it before Monday.
Pair with Claude Skills so the team can fire the task by name from a normal chat.
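Generating both prompts from one runbook keeps them from drifting apart. The sketch below uses a hypothetical Step type with a destructive flag you set by hand. The dry-run wording only asks Claude to hold back, so it is not an enforcement mechanism on its own; a harness-side check is still worth having.

```python
from dataclasses import dataclass


@dataclass
class Step:
    text: str
    destructive: bool = False  # set by hand when writing the runbook


def build_prompt_pair(steps: list[Step]) -> tuple[str, str]:
    """Return (dry_run_prompt, live_prompt) generated from the same runbook steps."""
    dry, live = [], []
    for i, step in enumerate(steps, start=1):
        live.append(f"{i}. {step.text}")
        if step.destructive:
            # Dry run: find the control and screenshot it, but never press it.
            dry.append(f"{i}. [DRY RUN] Locate but do not perform: {step.text}. "
                       "Take a screenshot showing the target instead.")
        else:
            dry.append(f"{i}. {step.text}")
    return "\n".join(dry), "\n".join(live)
```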

Common mistakes

Pointing it at your main desktop. One misclick on a real Slack message is enough to regret it.
Skipping verification steps. The model will happily continue five steps past a silent failure.
Automating a task you barely do. Payoff requires repetition; one-offs are faster by hand.
Trusting it with anything irreversible (payments, deletes, sends). Keep a human approval on those.
Letting the prompt drift into “use your best judgment.” On Computer Use, that phrase is a license to misclick.
Maxing out resolution to read small text. Use the zoom action instead; full-screen high-res screenshots just cost more tokens.

FAQ

Is Computer Use safe to run on my work laptop? Not directly. Use Cowork (it runs in an isolated local VM) or, for the API, a dedicated VM or container with minimal privileges. Keep credentials out of the prompt; log in manually and hand over the session.

Which Claude model should I use? Sonnet 4.6 for routine, structured UIs (its 72.5% on OSWorld-Verified is roughly level with the prior Opus generation at a fraction of the price), and Opus 4.7 (78.0%) only for harder, dynamic, multi-step flows. Both use the computer-use-2025-11-24 beta header.

Do I need to write code? No. Claude Cowork in the desktop app gives you no-code Computer Use, included with Pro ($20/month) and Max ($100 and $200/month). The API tool is for engineers building scripted, repeatable automations.

How much does it cost? On the API you pay per token, and screenshots are images that consume tokens, so a run is far pricier than a chat: roughly 466-499 extra system tokens per call plus a screenshot each step. Sonnet 4.6 is $3 in / $15 out per 1M tokens; Opus 4.7 is $5 / $25. Capping runs at 8-12 steps is the main cost lever.

Can it handle two-factor login? No. It cannot read SMS codes, authenticator apps, or hardware keys. Log in manually, then hand the authenticated session to Computer Use for the remaining steps.

Tags: #Claude #computer-use #automation #Tutorial
