🇺🇸 USA · Anthropic Computer Use

Status: 🟩 COMPLETE 🟦 LIVING Last updated: 2026-06-26 Plain-English tagline: Anthropic’s API capability that lets Claude see your screen and operate your computer — take screenshots, click buttons, type, scroll. The foundation for AI desktop agents.


Front-matter facts

FieldValue
VendorAnthropic (San Francisco, USA)
Country / origin🇺🇸 USA
Recommended for Australian users?✅ Yes — via Claude API, accessible globally
Privacy summaryScreen captures sent to Anthropic API for processing; no-training-by-default on API. Use sandboxed environments for sensitive workflows.
Free tierAPI has free trial credit; production use is metered
Paid tiersOpenAI API-style billing — pay per token; vision tokens cost extra
First releasedOctober 2024 (initial public release)
Last reviewed2026-06-26
Official sitehttps://docs.claude.com/en/docs/build-with-claude/computer-use

What it is

Computer Use is an Anthropic API capability that gives Claude the ability to:

  • Take screenshots of a computer screen (mouse + keyboard via the OS)
  • Click at specific coordinates
  • Type text
  • Scroll
  • Press keyboard shortcuts

In other words: Claude can operate a computer like a human can. The model receives the screenshot, decides what to do (click here, type there), the response is executed in your environment, the new screenshot is sent back, the loop continues.

Important framing: Computer Use is an API capability, not a finished product. Anthropic provides:

  • The model (claude-sonnet-4-X and others have Computer Use vision tuning)
  • API access to a special set of tools (computer_20250108, screenshot, etc.)
  • Reference container images for running it safely

You build the actual product that uses Computer Use. Many third-party products and Anthropic’s own offerings sit on top:

  • Claude in Chrome (Anthropic’s browser extension) — uses Computer-Use-style operation in the browser
  • Claude Code Computer Use mode — Claude Code can drive your desktop via Computer Use
  • Third-party agents — anyone can build computer-operating agents on the API

Analogy: Computer Use is the underlying “see and click” muscle for AI agents. ChatGPT’s Operator and Google’s Project Mariner are competing implementations of the same broad capability — driven by their own models.


What you’d use it for

  • Automating GUI workflows — applications that have no API but do have UIs
  • Web browsing as an agent — opening pages, filling forms, downloading files
  • Cross-application workflows — copy data from app A, paste into app B
  • Repetitive desktop tasks — anywhere “click here, type there” repeats
  • Testing UI — visual regression testing, accessibility audits
  • Building custom agents for proprietary internal applications

What you’d NOT use it for:

  • Tasks better done via API (always prefer API over UI automation when one exists)
  • High-security workflows on your main machine (use sandbox)
  • Real-time critical work (latency of screenshot loops adds up)

How to use it

Via the Anthropic API

  1. Sign up at console.anthropic.com
  2. Use a Computer-Use-capable model (Claude Sonnet 4.x, Opus 4.x, etc.)
  3. Configure the computer_use tool in your API request
  4. Run your agent in a sandboxed environment (Docker container with virtual display, or a dedicated VM, or a remote machine)
  5. The loop: screenshot → model decides action → execute → screenshot → …

Via Claude Code’s Computer Use mode

  1. In Claude Code, enable Computer Use via configuration
  2. Claude Code can drive your local desktop within configured permissions
  3. Recommended: use Anthropic’s reference Docker container for safety

Via third-party products

  • Multiple agent products are built on Computer Use under the hood — check the product, not the underlying capability

Reference implementation

  • Anthropic publishes a Docker container at anthropic-quickstarts/computer-use-demo on GitHub with a safe sandboxed Linux environment

What it costs — what you actually get

Anthropic API pricing

  • Standard token rates apply (input/output tokens)
  • Vision tokens are real — every screenshot adds image-input tokens; the cost scales with image resolution and frequency
  • Typical Computer Use session can run US$0.50-5.00 in API costs depending on complexity and image resolution

Cost-control techniques

  • Lower screenshot resolution when full-res isn’t needed
  • Limit loop iterations — set max steps
  • Cache static content — Anthropic supports prompt caching
  • Use cheaper Computer-Use-capable models when the task allows (Haiku 4.5 can do many Computer Use tasks)

Subscription bundling

  • Computer Use is NOT included in claude.ai Pro / Max — those are consumer subscriptions for chat
  • It’s an API-only capability for now

How it compares to alternatives

CapabilityAnthropic Computer UseOpenAI Operator / AgentGoogle Project MarinerDevin
Underlying capabilityAPI + reference containerProduct (cloud)Product (browser)Product (managed)
Open / self-hostedYes (reference Docker image)NoNoNo
Cost modelAPI token-basedChatGPT Pro subscriptionGoogle AI Ultra subscriptionPer-Devin-instance
Primary surfaceAnywhere you run itchatgpt.comChrome browserdevin.ai
MaturityReleased Oct 2024Generally available 2025Preview 2024-25Generally available 2024
For building your own agentBest — full API + containerLimitedLimitedLimited

If you’re a developer building an AI agent product, Computer Use is the foundation you’d choose. If you’re a consumer wanting an out-of-the-box agent, OpenAI’s Operator (in ChatGPT Pro), Project Mariner (in Google AI Ultra), or Devin are more polished.


Privacy / data handling

Critical to understand: when Claude operates your computer, screenshots of whatever is on the screen are sent to Anthropic’s API. This includes:

  • Open browser tabs
  • Email contents visible
  • Other applications
  • Any sensitive data visible in any window

Best practice for safety:

  1. Use a sandboxed environment — Anthropic’s reference Docker container, a dedicated VM, a remote machine. NOT your main daily-driver laptop with personal data visible.
  2. Anthropic API does NOT train on Computer Use inputs by default — same no-training policy as the rest of the API
  3. Permissions — limit what the sandbox has access to (no email login, no banking, no production credentials)
  4. Inspect actions before execution for high-stakes operations — Anthropic’s reference implementation supports a “pause-and-confirm” mode

Where data lives: US data centres for standard API; AUS data residency via AWS Bedrock Sydney when running through Bedrock.


Recent changes

  • 2026: Computer Use generally available across more Claude models; quality improvements
  • 2025: Computer Use expanded; reference container and quickstart matured
  • October 2024: Initial public release of Computer Use capability

Gotchas

  • Run in a sandbox. This is the single most important safety practice. Don’t point Computer Use at your daily-driver desktop without careful permissions.
  • Vision tokens add up fast. Set a budget and monitor.
  • Latency is real — each screenshot-to-decision-to-execute loop takes seconds. Computer Use is for tasks where the alternative is “I’d do this manually,” not for real-time interaction.
  • Reliability varies by task — modern Computer Use is excellent for simple, well-defined GUI tasks; complex multi-app workflows can fail at unexpected steps.
  • Don’t combine with sensitive data — assume the model sees everything on screen; act accordingly.
  • Display resolution matters — Computer Use works best at standard resolutions (1280x800, 1366x768, 1920x1080). Very high-DPI or unusual aspect ratios can confuse coordinate translation.
  • OS-specific quirks — works on Linux (best supported), macOS, Windows; specifics differ.
  • Web-only workflows are often better with Claude in Chrome (which is purpose-built for browser) than with full desktop Computer Use.
  • Computer Use is API-only. No claude.ai-included consumer surface for it. Buy ChatGPT Pro for Operator or Google AI Ultra for Mariner if you want a consumer agent.

See also


Sources