Computer Use Guide

Implementation note: EchoFlow Code provides local Computer Use through a Python Bridge. macOS uses pyautogui + mss + pyobjc, and Windows uses pyautogui + mss + win32gui + psutil, wrapping screenshots, mouse, keyboard, and app management as auditable local MCP tools.

Overview
Supported Platforms
How It Works
Quick Start
Usage
Security
Environment Variables
Technical Architecture
Approaches We Tried
Known Limitations
References and Credits

Overview

Computer Use allows AI models to directly control your computer — taking screenshots, moving the mouse, clicking buttons, typing text, and managing application windows.

24 MCP tools are available:

Category	Tools
Screenshot	`screenshot`, `zoom`
Mouse	`left_click`, `right_click`, `middle_click`, `double_click`, `triple_click`, `left_click_drag`, `mouse_move`, `left_mouse_down`, `left_mouse_up`, `cursor_position`, `scroll`
Keyboard	`type`, `key`, `hold_key`
Apps	`open_application`, `switch_display`
Permissions	`request_access`, `list_granted_applications`
Clipboard	`read_clipboard`, `write_clipboard`
Other	`wait`, `computer_batch`

Supported Platforms

Platform	Architecture	Status	Notes
macOS	Apple Silicon (M1/M2/M3/M4)	✅ Fully supported	Recommended
macOS	Intel x86_64	✅ Fully supported
Windows	x64	✅ Fully supported	Uses `win32gui` + `psutil` + `pyperclip` + `screeninfo` instead of macOS APIs
Linux	Any	⚠️ Theoretically possible	Not yet adapted. `pyobjc` would need to be replaced with `wmctrl` + `xdotool`

Requirements

Bun >= 1.1.0
Python >= 3.8 (venv and dependencies are auto-installed on first use)
macOS permissions: Accessibility + Screen Recording
Windows: no extra OS permission setup

How It Works

Computer Use operates through a screenshot → analyze → act feedback loop:

┌────────────────────────────────────────────────────┐
│  AI Model (Claude / any Anthropic-protocol model)   │
│                                                     │
│  1. Receives user request: "open Music app"         │
│  2. Calls screenshot tool → receives screen image   │
│  3. Model analyzes pixels, identifies UI elements   │
│     → "search box is at (756, 342)"                 │
│  4. Calls left_click { coordinate: [756, 342] }     │
│  5. Calls type { text: "search query" }             │
│  6. Calls screenshot again → verify → next step...  │
└───────────────┬────────────────────────────────────┘
                │ MCP Tool Call
                ▼
┌────────────────────────────────────────────────────┐
│  TypeScript Tool Layer (vendor/computer-use-mcp)    │
│  - Security checks (app allowlist, TCC permissions) │
│  - Coordinate transformation                        │
│  - Tool dispatch → executor                         │
└───────────────┬────────────────────────────────────┘
                │ callPythonHelper()
                ▼
┌────────────────────────────────────────────────────┐
│  Python Bridge                                      │
│  macOS: runtime/mac_helper.py                       │
│  Windows: runtime/win_helper.py                     │
│  pyautogui.click(756, 342)   ← mouse control        │
│  mss.grab(monitor)           ← screenshot            │
│  NSWorkspace / win32gui      ← app management        │
└────────────────────────────────────────────────────┘

Key: Coordinate analysis is performed entirely by the model's vision capabilities — it "sees" the screenshot like a human sees a screen, identifying buttons, text fields, and other UI elements directly from pixels.
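
The loop above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `FakeScreen` and `fake_model` are hypothetical stand-ins for the MCP tools and the vision model, so the example runs without touching a real desktop.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FakeScreen:
    """Hypothetical in-memory 'desktop' standing in for the real MCP tools."""
    search_box: Tuple[int, int] = (756, 342)
    focused: bool = False
    typed: str = ""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real screenshot is an image; a dict keeps the sketch self-contained.
        self.log.append("screenshot")
        return {"search_box": self.search_box, "focused": self.focused, "typed": self.typed}

    def left_click(self, coordinate: Tuple[int, int]) -> None:
        self.log.append(f"left_click {coordinate}")
        self.focused = tuple(coordinate) == self.search_box

    def type_text(self, text: str) -> None:
        self.log.append(f"type {text!r}")
        if self.focused:
            self.typed += text


def fake_model(image: dict, goal: str) -> Optional[dict]:
    """Hypothetical stand-in for the vision step: pick the next action, or None when done."""
    if image["typed"] == goal:
        return None
    if not image["focused"]:
        return {"tool": "left_click", "coordinate": image["search_box"]}
    return {"tool": "type", "text": goal}


def run_loop(screen: FakeScreen, goal: str, max_steps: int = 10) -> bool:
    """Screenshot -> analyze -> act, repeated until the model reports completion."""
    for _ in range(max_steps):
        image = screen.screenshot()
        action = fake_model(image, goal)
        if action is None:
            return True
        if action["tool"] == "left_click":
            screen.left_click(action["coordinate"])
        elif action["tool"] == "type":
            screen.type_text(action["text"])
    return False  # step budget exhausted without reaching the goal
```

The `max_steps` cap is a design choice of this sketch: every action is followed by a fresh screenshot, so a bounded loop keeps a confused model from acting indefinitely.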

Quick Start

1. Install dependencies

```bash
bun install
```

2. Ensure Python 3 is available

```bash
python3 --version  # >= 3.8 required
```

Python dependencies are installed automatically into .runtime/venv/ the first time Computer Use is invoked.

3. Grant macOS permissions

Accessibility:

```bash
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"
```

Add your terminal app (iTerm, Terminal, Ghostty, etc.) to the allow list.

Screen Recording:

```bash
open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"
```

Add your terminal app as well. You may need to restart your terminal after granting permission.

4. Start

```bash
./bin/claude-haha
```

5. Use

Just ask in natural language:

> Take a screenshot of my desktop
> Open Safari and search for something
> Type "hello" in the text editor

Disable Computer Use

If you only want the regular Coding Agent and do not want to expose computer-use MCP tools, disable it with either command:

```bash
claude-haha --no-computer-use
CLAUDE_COMPUTER_USE_ENABLED=0 claude-haha
```

You can also disable it in the global config file at ~/.claude/cc-haha/computer-use-config.json:

```json
{
  "enabled": false
}
```

The desktop Settings > Computer Use switch writes the same config. Once disabled, new sessions will not inject the dynamic computer-use MCP server or add its desktop-control tools to allowedTools.
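
To make the three switches concrete, here is a minimal sketch of how they might be combined into one decision. The precedence used (CLI flag, then environment variable, then config file, then enabled by default) is an assumption of this example, as is the function name `computer_use_enabled`; the actual logic lives in gates.ts and may differ.

```python
import json
from pathlib import Path
from typing import Mapping, Sequence


def computer_use_enabled(argv: Sequence[str], env: Mapping[str, str], config_path: Path) -> bool:
    """Hypothetical resolver; the precedence order is an assumption, not documented behavior."""
    # 1. The CLI flag always disables.
    if "--no-computer-use" in argv:
        return False
    # 2. The environment variable, when set: "0" disables. Treating every
    #    other value as enabled is an assumption of this sketch.
    value = env.get("CLAUDE_COMPUTER_USE_ENABLED")
    if value is not None:
        return value != "0"
    # 3. The global config file, when present and well-formed.
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return True  # missing or unreadable config: fall back to the default
    if not isinstance(config, dict):
        return True
    return bool(config.get("enabled", True))
```

Passing `argv`, `env`, and `config_path` explicitly, rather than reading `sys.argv` and `os.environ` inside the function, keeps the resolver easy to test.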

Security

Mechanism	Description
App allowlist	Each session must explicitly authorize the apps Claude is allowed to interact with
Concurrency lock	Only one Claude session can use Computer Use at a time (file lock)
Clipboard guard	Original clipboard content is saved and restored when typing via clipboard
Sensitive action gates	System keyboard shortcuts require additional authorization

Note: The Python Bridge currently does not provide a global Escape hotkey abort or automatic window hiding. Use Ctrl+C to abort instead.
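
As an illustration of the concurrency lock, the sketch below uses an exclusive-create lock file so that a second session fails fast instead of competing for the mouse and keyboard. The lock path, the `session_lock` name, and the `LockHeldError` type are hypothetical; the project's actual locking code is not shown in this guide.

```python
import os
from contextlib import contextmanager


class LockHeldError(RuntimeError):
    """Raised when another session already holds the lock (hypothetical type)."""


@contextmanager
def session_lock(path: str):
    """Hold an exclusive lock file for the duration of the `with` block."""
    try:
        # O_EXCL makes creation atomic: it fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeldError(f"Computer Use is already in use (lock: {path})") from None
    try:
        # Record the owner's PID so a human can inspect who holds the lock.
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield
    finally:
        os.unlink(path)  # release the lock even if the session raised
```

This sketch does not recover stale locks left behind by a crashed process; a fuller version would check whether the recorded PID is still alive before refusing.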

Environment Variables

Variable	Default	Description
`CLAUDE_COMPUTER_USE_ENABLED`	`1`	Set to `0` to disable Computer Use
`CLAUDE_COMPUTER_USE_COORDINATE_MODE`	`pixels`	Coordinate mode: `pixels` or `normalized_0_100`
`CLAUDE_COMPUTER_USE_CLIPBOARD_PASTE`	`1`	Enable clipboard-based text input
`CLAUDE_COMPUTER_USE_MOUSE_ANIMATION`	`1`	Enable mouse animation
`CLAUDE_COMPUTER_USE_DEBUG`	`0`	Debug mode
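
The two coordinate modes can be illustrated with a small conversion helper. This is a sketch under stated assumptions: it assumes `normalized_0_100` means a percentage of the display's width and height, with 100 mapping to the last pixel, and `to_pixels` is a hypothetical name rather than a function from the tool layer.

```python
from typing import Tuple


def to_pixels(coordinate: Tuple[float, float], mode: str, screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Convert a model-supplied coordinate into screen pixels (assumed semantics)."""
    x, y = coordinate
    width, height = screen_size
    if mode == "pixels":
        px, py = round(x), round(y)
    elif mode == "normalized_0_100":
        # Assumption: 0 is the first pixel and 100 is the last pixel on each axis.
        px = round(x / 100 * (width - 1))
        py = round(y / 100 * (height - 1))
    else:
        raise ValueError(f"unknown coordinate mode: {mode!r}")
    # Clamp so an out-of-range value can never land off-screen.
    return (min(max(px, 0), width - 1), min(max(py, 0), height - 1))
```

A caller would typically read the mode once, for example with `os.environ.get("CLAUDE_COMPUTER_USE_COORDINATE_MODE", "pixels")`, and pass it to every conversion.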

Technical Architecture

Capability Enablement

EchoFlow Code controls Computer Use through local configuration and startup flags instead of remote feature flags. The related switches are centralized in gates.ts and config files so CLI, desktop, and tests share the same behavior.

Layer	Current Strategy
Build config	Computer Use tools are injected when enabled
Local config	`CLAUDE_COMPUTER_USE_ENABLED` and `computer-use-config.json` control availability
Remote config	No remote feature flag dependency
Session safety	App allowlists, OS permissions, and sensitive-action checks remain active

Python Bridge

On first invocation, the bridge automatically:

Creates a Python virtual environment (.runtime/venv/)
Installs pip
Installs dependencies (mss, Pillow, pyautogui, plus platform packages such as pyobjc-* on macOS)
Validates via SHA256 hash (only reinstalls when requirements.txt changes)
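
The hash check in the last step can be sketched as follows. The stamp-file approach and the function names are assumptions made for illustration; the bridge may store or compare the digest differently.

```python
import hashlib
from pathlib import Path


def requirements_digest(requirements: Path) -> str:
    """SHA256 hex digest of the requirements file's exact bytes."""
    return hashlib.sha256(requirements.read_bytes()).hexdigest()


def needs_reinstall(requirements: Path, stamp: Path) -> bool:
    """True when no digest was recorded or requirements.txt changed since the last install."""
    try:
        recorded = stamp.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return True  # first run: nothing installed yet
    return recorded != requirements_digest(requirements)


def record_install(requirements: Path, stamp: Path) -> None:
    """Call after a successful install so the next run can skip it."""
    stamp.write_text(requirements_digest(requirements), encoding="utf-8")
```

Writing the stamp only after the install succeeds means an interrupted install is retried on the next invocation instead of being mistaken for a complete one.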

Approaches We Tried

Approach 1: Extract native .node modules from Claude Code binary ❌

Extracted computer-use-swift.node and computer-use-input.node from the installed Claude Code Mach-O binary. Synchronous methods worked, but async Swift methods (screenshot) hung due to N-API async incompatibility between Bun versions.

Approach 2: Create empty stub packages ❌

Stub packages allowed compilation but provided no actual functionality.

Approach 3: Python Bridge ✅ (current)

Replaced all native module calls with Python subprocess calls via callPythonHelper(). Zero binary dependencies, auto-bootstrapping, full functionality on any macOS.
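
The subprocess-per-call pattern can be illustrated with a small Python analogue. The real callPythonHelper() lives in the TypeScript layer and its wire format is not documented here, so the JSON-over-stdin protocol, the `echo` command, and the inline helper script below are all hypothetical.

```python
import json
import subprocess
import sys

# Hypothetical helper script: reads one JSON request from stdin and
# writes one JSON reply to stdout, then exits.
HELPER_SOURCE = r'''
import json, sys
request = json.load(sys.stdin)
if request["command"] == "echo":
    reply = {"ok": True, "result": request["payload"]}
else:
    reply = {"ok": False, "error": "unknown command: " + request["command"]}
json.dump(reply, sys.stdout)
'''


def call_python_helper(command: str, payload: dict, timeout: float = 10.0) -> dict:
    """Spawn a fresh Python process for one request and return its result."""
    completed = subprocess.run(
        [sys.executable, "-c", HELPER_SOURCE],
        input=json.dumps({"command": command, "payload": payload}),
        capture_output=True,
        text=True,
        timeout=timeout,  # never wait forever on a hung helper
        check=True,       # a non-zero exit raises CalledProcessError
    )
    reply = json.loads(completed.stdout)
    if not reply["ok"]:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

Spawning a new interpreter for every request keeps the helper stateless and simple, at the cost of the per-call startup overhead listed under Known Limitations.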

Known Limitations

Limitation	Description
Linux not adapted	Linux needs `wmctrl` + `xdotool` style platform integration
No global Escape abort	Original used CGEventTap; use `Ctrl+C` instead
No auto-hide windows	Original's `prepareDisplay` relied on Swift
Slightly higher latency	~100ms Python process startup overhead per call

References and Credits

Project	License	Contribution
wimi321/macos-computer-use-skill	MIT	Python bridge architecture, `mac_helper.py` runtime, executor adaptation
domdomegg/computer-use-mcp	MIT	Independent Computer Use MCP server (nut.js based), used as reference
paoloanzn/free-code	-	Feature flag system analysis
oboard/claude-code-rev	-	Early compatibility research and stub package reference

Underlying Libraries

Library	Purpose
pyautogui	Mouse and keyboard control
mss	Screenshot capture
Pillow	Image processing and compression
pyobjc	macOS Cocoa/Quartz framework bindings
