Computer Use Guide

Implementation note: EchoFlow Code provides local Computer Use through a Python Bridge. macOS uses pyautogui + mss + pyobjc, and Windows uses pyautogui + mss + win32gui + psutil, wrapping screenshots, mouse, keyboard, and app management as auditable local MCP tools.


Overview

Computer Use allows AI models to directly control your computer — taking screenshots, moving the mouse, clicking buttons, typing text, and managing application windows.

24 MCP tools are available:

| Category | Tools |
|---|---|
| Screenshot | screenshot, zoom |
| Mouse | left_click, right_click, middle_click, double_click, triple_click, left_click_drag, mouse_move, left_mouse_down, left_mouse_up, cursor_position, scroll |
| Keyboard | type, key, hold_key |
| Apps | open_application, switch_display |
| Permissions | request_access, list_granted_applications |
| Clipboard | read_clipboard, write_clipboard |
| Other | wait, computer_batch |
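The computer_batch tool groups several actions into one call. Its schema is not documented in this guide, so the sketch below assumes each action is a dict whose "action" key names one of the tools above, alongside that tool's parameters (coordinate and text are borrowed from the examples in the architecture diagram), and that a batch runs in order and stops at the first failure. RecordingExecutor, SUPPORTED, and computer_batch are hypothetical names; the stand-in executor records calls instead of driving the real mouse and keyboard.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical action shape: {"action": "<tool name>", ...tool parameters}.
Action = Dict[str, Any]

# Illustrative subset of the tool names listed in the table above.
SUPPORTED = {"left_click", "type", "key", "scroll", "wait", "screenshot"}


@dataclass
class RecordingExecutor:
    """Stand-in executor: records actions instead of moving the real mouse or keyboard."""

    calls: List[Action] = field(default_factory=list)

    def run(self, action: Action) -> str:
        name = action.get("action")
        if name not in SUPPORTED:
            raise ValueError(f"unknown action: {name}")
        self.calls.append(action)
        return f"ok:{name}"


def computer_batch(actions: List[Action], executor: RecordingExecutor) -> List[str]:
    """Run actions in order, stopping at the first failure (assumed semantics)."""
    results: List[str] = []
    for step, action in enumerate(actions):
        try:
            results.append(executor.run(action))
        except ValueError as exc:
            results.append(f"error at step {step}: {exc}")
            break  # later actions never run against an unverified screen state
    return results
```

For a batch of left_click, type, an unknown "fly" action, and screenshot, this returns two ok: results followed by "error at step 2: unknown action: fly", and the trailing screenshot is never executed. Whether the real tool stops early in the same way is an assumption.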

Supported Platforms

| Platform | Architecture | Status | Notes |
|---|---|---|---|
| macOS | Apple Silicon (M1/M2/M3/M4) | ✅ Fully supported | Recommended |
| macOS | Intel x86_64 | ✅ Fully supported | |
| Windows | x64 | ✅ Fully supported | Uses win32gui + psutil + pyperclip + screeninfo instead of macOS APIs |
| Linux | Any | ⚠️ Theoretically possible | As on Windows, the macOS-specific pyobjc calls would need to be replaced, here with wmctrl + xdotool. Not yet adapted |
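Because each supported platform has its own helper script (runtime/mac_helper.py and runtime/win_helper.py, named in the architecture diagram under How It Works), choosing the bridge can be pictured as a lookup on sys.platform. The function name helper_for_platform and the use of NotImplementedError are illustrative assumptions, not the project's code.

```python
import sys
from pathlib import Path

# Helper scripts named in the architecture diagram; the lookup itself is illustrative.
HELPERS = {
    "darwin": Path("runtime/mac_helper.py"),  # macOS, Apple Silicon or Intel
    "win32": Path("runtime/win_helper.py"),  # Windows reports "win32" even on x64
}


def helper_for_platform(platform: str = sys.platform) -> Path:
    """Return the Python Bridge helper for a platform, or raise if none is adapted yet."""
    helper = HELPERS.get(platform)
    if helper is None:
        # Linux reports "linux" and would need a wmctrl + xdotool based helper.
        raise NotImplementedError(f"Computer Use is not adapted for {platform!r}")
    return helper
```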

Requirements

  • Bun >= 1.1.0
  • Python >= 3.8 (venv and dependencies are auto-installed on first use)
  • macOS: Accessibility and Screen Recording permissions
  • Windows: no additional OS permissions required

How It Works

Computer Use operates through a screenshot → analyze → act feedback loop:

┌────────────────────────────────────────────────────┐
│  AI Model (Claude / any Anthropic-protocol model)   │
│                                                     │
│  1. Receives user request: "open Music app"         │
│  2. Calls screenshot tool → receives screen image   │
│  3. Model analyzes pixels, identifies UI elements   │
│     → "search box is at (756, 342)"                 │
│  4. Calls left_click { coordinate: [756, 342] }     │
│  5. Calls type { text: "search query" }             │
│  6. Calls screenshot again → verify → next step...  │
└───────────────┬────────────────────────────────────┘
                │ MCP Tool Call
                ▼
┌────────────────────────────────────────────────────┐
│  TypeScript Tool Layer (vendor/computer-use-mcp)    │
│  - Security checks (app allowlist, TCC permissions) │
│  - Coordinate transformation                        │
│  - Tool dispatch → executor                         │
└───────────────┬────────────────────────────────────┘
                │ callPythonHelper()
                ▼
┌────────────────────────────────────────────────────┐
│  Python Bridge                                      │
│  macOS: runtime/mac_helper.py                       │
│  Windows: runtime/win_helper.py                     │
│  pyautogui.click(756, 342)   ← mouse control        │
│  mss.grab(monitor)           ← screenshot            │
│  NSWorkspace / win32gui      ← app management        │
└────────────────────────────────────────────────────┘

Key: Coordinate analysis is performed entirely by the model's vision capabilities — it "sees" the screenshot like a human sees a screen, identifying buttons, text fields, and other UI elements directly from pixels.
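On the Python side of the bridge, each request has to be decoded, routed to a handler, and answered. The sketch below assumes a hypothetical protocol of one JSON object per request ({"command": ..., "args": ...}) and one JSON reply with ok, result, and error fields. The handler names and reply fields are illustrative, and the handlers return placeholders where a real helper might call pyautogui.click(x, y) or pyautogui.position().

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Any]


def cursor_position(args: Dict[str, Any]) -> Dict[str, int]:
    # Placeholder: a real helper might return pyautogui.position() here.
    return {"x": 0, "y": 0}


def left_click(args: Dict[str, Any]) -> Dict[str, Any]:
    x, y = args["coordinate"]
    # Placeholder: a real helper might call pyautogui.click(x, y) here.
    return {"clicked": [x, y]}


HANDLERS: Dict[str, Handler] = {
    "cursor_position": cursor_position,
    "left_click": left_click,
}


def handle(raw: str) -> str:
    """Decode one JSON request, dispatch it, and encode a JSON reply (hypothetical protocol)."""
    try:
        request = json.loads(raw)
        handler = HANDLERS[request["command"]]
        result = handler(request.get("args", {}))
        return json.dumps({"ok": True, "result": result})
    except Exception as exc:  # report every failure to the caller as data
        return json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})
```

A helper script could wire this to its process with sys.stdout.write(handle(sys.stdin.read())) under an if __name__ == "__main__": guard. Returning errors as JSON instead of crashing lets the TypeScript layer pass a readable failure back to the model.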


Quick Start

1. Install dependencies

```bash
bun install
```

2. Ensure Python 3 is available

```bash
python3 --version  # >= 3.8 required
```

Python dependencies are installed automatically into .runtime/venv/ the first time Computer Use is invoked.

3. Grant macOS permissions

Accessibility:

```bash
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"
```

Add your terminal app (iTerm, Terminal, Ghostty, etc.) to the allow list.

Screen Recording:

```bash
open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"
```

Add your terminal app as well. You may need to restart your terminal after granting permission.

4. Start

```bash
./bin/claude-haha
```

5. Use

Just ask in natural language:

> Take a screenshot of my desktop
> Open Safari and search for something
> Type "hello" in the text editor

Disable Computer Use

If you only want the regular coding agent and do not want to expose the computer-use MCP tools, disable Computer Use with either of these commands:

```bash
claude-haha --no-computer-use
CLAUDE_COMPUTER_USE_ENABLED=0 claude-haha
```

To disable it globally, write the config file ~/.claude/cc-haha/computer-use-config.json:

```json
{
  "enabled": false
}
```

The desktop Settings > Computer Use switch writes the same config. Once disabled, new sessions will not inject the dynamic computer-use MCP server or add its desktop-control tools to allowedTools.
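The three switches described above (the --no-computer-use flag, CLAUDE_COMPUTER_USE_ENABLED, and the enabled field in computer-use-config.json) all feed one decision. This guide does not say which one wins when they disagree, so the sketch below assumes the flag beats the environment variable, which beats the config file, with Computer Use enabled by default. computer_use_enabled is a hypothetical name; the project keeps its real logic in gates.ts.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Sequence

# Documented location of the global config file.
CONFIG_PATH = Path.home() / ".claude" / "cc-haha" / "computer-use-config.json"


def computer_use_enabled(
    argv: Sequence[str],
    env: Mapping[str, str] = os.environ,
    config_path: Path = CONFIG_PATH,
) -> bool:
    """Resolve the switch. ASSUMED precedence: flag > env var > config file > default."""
    if "--no-computer-use" in argv:
        return False
    value = env.get("CLAUDE_COMPUTER_USE_ENABLED")
    if value is not None:
        return value != "0"  # documented default is 1; 0 disables
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return True  # no usable config file: fall back to enabled
    if isinstance(config, dict):
        return bool(config.get("enabled", True))
    return True
```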


Security

| Mechanism | Description |
|---|---|
| App allowlist | Each session requires explicit authorization for which apps Claude can interact with |
| Concurrency lock | Only one Claude session can use Computer Use at a time (file lock) |
| Clipboard guard | Original clipboard content is saved and restored when typing via clipboard |
| Sensitive action gates | System keyboard shortcuts require additional authorization |
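The concurrency lock can be pictured as an exclusive lock file. This minimal sketch uses the common O_CREAT | O_EXCL pattern, where creating the file fails atomically if it already exists. The lock path, the session_lock and ComputerUseBusy names, and the lack of stale-lock recovery are assumptions, not the project's actual implementation.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


class ComputerUseBusy(RuntimeError):
    """Raised when another session already holds the Computer Use lock."""


@contextmanager
def session_lock(lock_path: Path) -> Iterator[None]:
    """Illustrative lock: hold an exclusive lock file while the with-block runs."""
    try:
        # O_CREAT | O_EXCL makes creation fail atomically if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise ComputerUseBusy(f"Computer Use is already in use: {lock_path}") from None
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the owner to help debugging
        yield
    finally:
        lock_path.unlink(missing_ok=True)  # release the lock for the next session
```

A production lock would also need to recognize a lock left behind by a crashed session, for example by checking whether the recorded PID is still alive with psutil.pid_exists.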

Note: The Python Bridge currently does not provide a global Escape hotkey abort or automatic window hiding. Use Ctrl+C to abort instead.
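The clipboard guard in the security table follows a save, overwrite, paste, restore pattern. In this sketch the clipboard and paste operations are passed in as functions so it runs without a display; a real helper might pass pyperclip.paste, pyperclip.copy, and a pyautogui.hotkey call. clipboard_guard and type_via_clipboard are hypothetical names, and only plain-text clipboard content is handled.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

Reader = Callable[[], str]
Writer = Callable[[str], None]


@contextmanager
def clipboard_guard(read: Reader, write: Writer) -> Iterator[None]:
    """Save the clipboard text, let the body overwrite it, then restore the original."""
    saved = read()
    try:
        yield
    finally:
        write(saved)  # restore even if pasting raised


def type_via_clipboard(text: str, read: Reader, write: Writer, paste: Callable[[], None]) -> None:
    """Type text by placing it on the clipboard and triggering a paste."""
    with clipboard_guard(read, write):
        write(text)
        paste()
```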


Environment Variables

| Variable | Default | Description |
|---|---|---|
| CLAUDE_COMPUTER_USE_ENABLED | 1 | Set to 0 to disable Computer Use |
| CLAUDE_COMPUTER_USE_COORDINATE_MODE | pixels | Coordinate mode: pixels or normalized_0_100 |
| CLAUDE_COMPUTER_USE_CLIPBOARD_PASTE | 1 | Enable clipboard-based text input |
| CLAUDE_COMPUTER_USE_MOUSE_ANIMATION | 1 | Enable mouse animation |
| CLAUDE_COMPUTER_USE_DEBUG | 0 | Debug mode |
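CLAUDE_COMPUTER_USE_COORDINATE_MODE chooses how the model's coordinates are interpreted. The conversion below assumes normalized_0_100 means each axis runs from 0 to 100 across the screen, so a value is scaled by the screen width or height and then clamped on-screen. The formula and the to_pixels and coordinate_mode names are assumptions, not documented behavior.

```python
import os
from typing import Mapping, Tuple


def coordinate_mode(env: Mapping[str, str] = os.environ) -> str:
    """Read the mode, falling back to the documented default."""
    return env.get("CLAUDE_COMPUTER_USE_COORDINATE_MODE", "pixels")


def to_pixels(x: float, y: float, mode: str, width: int, height: int) -> Tuple[int, int]:
    """Map a model-supplied coordinate to an on-screen pixel (normalized formula assumed)."""
    if mode == "pixels":
        px, py = round(x), round(y)
    elif mode == "normalized_0_100":
        px, py = round(x / 100 * width), round(y / 100 * height)
    else:
        raise ValueError(f"unknown coordinate mode: {mode!r}")
    # Clamp so an edge value such as 100 still lands on the last visible pixel.
    return min(max(px, 0), width - 1), min(max(py, 0), height - 1)
```

On a 1920x1080 screen, to_pixels(50, 50, "normalized_0_100", 1920, 1080) returns (960, 540), while pixel mode passes (756, 342) through unchanged.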

Technical Architecture

Capability Enablement

EchoFlow Code controls Computer Use through local configuration and startup flags instead of remote feature flags. The related switches are centralized in gates.ts and config files so CLI, desktop, and tests share the same behavior.

| Layer | Current Strategy |
|---|---|
| Build config | Computer Use tools are injected when enabled |
| Local config | CLAUDE_COMPUTER_USE_ENABLED and computer-use-config.json control availability |
| Remote config | No remote feature flag dependency |
| Session safety | App allowlists, OS permissions, and sensitive-action checks remain active |

Python Bridge

On first invocation, the bridge automatically:

  1. Creates a Python virtual environment (.runtime/venv/)
  2. Installs pip
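  3. Installs dependencies (mss, Pillow, pyautogui, plus platform packages: pyobjc-* on macOS; win32gui, psutil, pyperclip, and screeninfo on Windows)
  4. Checks a SHA256 hash of requirements.txt, so dependencies are reinstalled only when that file changes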
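The bootstrap steps above can be sketched with the standard library: venv.create(..., with_pip=True) covers steps 1 and 2, and a SHA256 stamp decides whether step 3 has to run again. The stamp file name .requirements.sha256 and all function names are hypothetical; the real bridge may store its hash elsewhere.

```python
import hashlib
import os
import subprocess
import venv
from pathlib import Path


def requirements_digest(requirements: Path) -> str:
    """SHA256 of requirements.txt, used to detect changes."""
    return hashlib.sha256(requirements.read_bytes()).hexdigest()


def needs_install(requirements: Path, stamp: Path) -> bool:
    """True when no stamp exists or it was written for different requirements."""
    if not stamp.exists():
        return True
    return stamp.read_text(encoding="utf-8").strip() != requirements_digest(requirements)


def mark_installed(requirements: Path, stamp: Path) -> None:
    stamp.write_text(requirements_digest(requirements), encoding="utf-8")


def ensure_runtime(runtime: Path, requirements: Path) -> Path:
    """Create runtime/venv with pip once; reinstall only when requirements change."""
    venv_dir = runtime / "venv"
    if not venv_dir.exists():
        venv.create(venv_dir, with_pip=True)  # steps 1 and 2
    stamp = venv_dir / ".requirements.sha256"  # hypothetical stamp location
    if needs_install(requirements, stamp):  # step 4 gates step 3
        python = venv_dir / ("Scripts/python.exe" if os.name == "nt" else "bin/python")
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)], check=True
        )
        mark_installed(requirements, stamp)
    return venv_dir
```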

Approaches We Tried

Approach 1: Extract native .node modules from Claude Code binary ❌

Extracted computer-use-swift.node and computer-use-input.node from the installed Claude Code Mach-O binary. Synchronous methods worked, but asynchronous Swift methods (such as screenshot) hung because of an N-API async incompatibility between Bun versions.

Approach 2: Create empty stub packages ❌

Stub packages allowed compilation but provided no actual functionality.

Approach 3: Python Bridge ✅ (current)

Replaced all native module calls with Python subprocess calls via callPythonHelper(). This approach has zero binary dependencies, bootstraps itself on first use, and provides full functionality on any macOS machine.
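callPythonHelper() itself lives in the TypeScript layer; the Python function below only mirrors the idea of one subprocess per call. It assumes a protocol of a JSON request on stdin and a JSON reply with ok, result, and error fields on stdout. call_python_helper and that reply shape are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, Sequence


def call_python_helper(
    helper_cmd: Sequence[str], command: str, args: Dict[str, Any], timeout: float = 30.0
) -> Any:
    """Spawn one helper process per call and exchange JSON over stdin/stdout (assumed protocol)."""
    proc = subprocess.run(
        list(helper_cmd),
        input=json.dumps({"command": command, "args": args}),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # a crashed helper raises CalledProcessError
    )
    reply = json.loads(proc.stdout)
    if not reply.get("ok"):
        raise RuntimeError(reply.get("error", "helper failed"))
    return reply.get("result")
```

Starting a fresh interpreter for every call keeps the bridge free of native modules, and it is also the source of the ~100ms per-call overhead listed under Known Limitations.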


Known Limitations

| Limitation | Description |
|---|---|
| Linux not adapted | Linux needs wmctrl + xdotool style platform integration |
| No global Escape abort | The original implementation used CGEventTap; use Ctrl+C instead |
| No auto-hide windows | The original prepareDisplay relied on Swift |
| Slightly higher latency | ~100ms of Python process startup overhead per call |

References and Credits

| Project | License | Contribution |
|---|---|---|
| wimi321/macos-computer-use-skill | MIT | Python bridge architecture, mac_helper.py runtime, executor adaptation |
| domdomegg/computer-use-mcp | MIT | Independent Computer Use MCP server (nut.js based), used as reference |
| paoloanzn/free-code | - | Feature flag system analysis |
| oboard/claude-code-rev | - | Early compatibility research and stub package reference |

Underlying Libraries

| Library | Purpose |
|---|---|
| pyautogui | Mouse and keyboard control |
| mss | Screenshot capture |
| Pillow | Image processing and compression |
| pyobjc | macOS Cocoa/Quartz framework bindings |

Released under the MIT License.