Computer Use Guide
Implementation note: EchoFlow Code provides local Computer Use through a Python Bridge. macOS uses pyautogui + mss + pyobjc, and Windows uses pyautogui + mss + win32gui + psutil. The bridge wraps screenshots, mouse, keyboard, and app management as auditable local MCP tools.
Table of Contents
- Overview
- Supported Platforms
- How It Works
- Quick Start
- Usage
- Security
- Environment Variables
- Technical Architecture
- Approaches We Tried
- Known Limitations
- References and Credits
Overview
Computer Use allows AI models to directly control your computer — taking screenshots, moving the mouse, clicking buttons, typing text, and managing application windows.
24 MCP tools are available:
| Category | Tools |
|---|---|
| Screenshot | screenshot, zoom |
| Mouse | left_click, right_click, middle_click, double_click, triple_click, left_click_drag, mouse_move, left_mouse_down, left_mouse_up, cursor_position, scroll |
| Keyboard | type, key, hold_key |
| Apps | open_application, switch_display |
| Permissions | request_access, list_granted_applications |
| Clipboard | read_clipboard, write_clipboard |
| Other | wait, computer_batch |
Supported Platforms
| Platform | Architecture | Status | Notes |
|---|---|---|---|
| macOS | Apple Silicon (M1/M2/M3/M4) | ✅ Fully supported | Recommended |
| macOS | Intel x86_64 | ✅ Fully supported | |
| Windows | x64 | ✅ Fully supported | Uses win32gui + psutil + pyperclip + screeninfo instead of macOS APIs |
| Linux | Any | ⚠️ Theoretically possible | Not yet adapted; pyobjc would need to be replaced with wmctrl + xdotool |
Requirements
- Bun >= 1.1.0
- Python >= 3.8 (venv and dependencies are auto-installed on first use)
- macOS permissions: Accessibility + Screen Recording
- Windows: no extra OS permission setup
How It Works
Computer Use operates through a screenshot → analyze → act feedback loop:
┌────────────────────────────────────────────────────┐
│ AI Model (Claude / any Anthropic-protocol model) │
│ │
│ 1. Receives user request: "open Music app" │
│ 2. Calls screenshot tool → receives screen image │
│ 3. Model analyzes pixels, identifies UI elements │
│ → "search box is at (756, 342)" │
│ 4. Calls left_click { coordinate: [756, 342] } │
│ 5. Calls type { text: "search query" } │
│ 6. Calls screenshot again → verify → next step... │
└───────────────┬────────────────────────────────────┘
                │ MCP Tool Call
                ▼
┌────────────────────────────────────────────────────┐
│ TypeScript Tool Layer (vendor/computer-use-mcp) │
│ - Security checks (app allowlist, TCC permissions) │
│ - Coordinate transformation │
│ - Tool dispatch → executor │
└───────────────┬────────────────────────────────────┘
                │ callPythonHelper()
                ▼
┌────────────────────────────────────────────────────┐
│ Python Bridge │
│ macOS: runtime/mac_helper.py │
│ Windows: runtime/win_helper.py │
│ pyautogui.click(756, 342) ← mouse control │
│ mss.grab(monitor) ← screenshot │
│ NSWorkspace / win32gui ← app management │
└────────────────────────────────────────────────────┘
Key: Coordinate analysis is performed entirely by the model's vision capabilities: it "sees" the screenshot the way a person sees a screen, identifying buttons, text fields, and other UI elements directly from pixels.
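The feedback loop in the diagram can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `run_loop`, `Action`, and the three callables are hypothetical names, and the model call is reduced to a function that returns the next action or `None` when it considers the task done.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """One step chosen by the model; tool names mirror the MCP tools listed earlier."""
    tool: str   # e.g. "left_click" or "type"
    args: dict  # e.g. {"coordinate": [756, 342]}


def run_loop(
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[bytes], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> int:
    """Repeat screenshot -> analyze -> act until choose_action returns None.

    Returns the number of actions executed. max_steps is a hypothetical
    safety cap so a confused model cannot loop forever.
    """
    steps = 0
    while steps < max_steps:
        image = take_screenshot()      # capture the current screen
        action = choose_action(image)  # the model analyzes pixels and picks a tool call
        if action is None:             # nothing left to do
            break
        execute(action)                # dispatch the tool call to the executor
        steps += 1
    return steps
```

With stub callables (a fixed script of actions instead of a real model) the loop runs the click and the type step from the diagram, then stops when the script is exhausted.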
Quick Start
1. Install dependencies
    bun install
2. Ensure Python 3 is available
    python3 --version # >= 3.8 required
Python dependencies are automatically installed into .runtime/venv/ on first Computer Use invocation.
3. Grant macOS permissions
Accessibility:
    open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"
Add your terminal app (iTerm, Terminal, Ghostty, etc.) to the allow list.
Screen Recording:
    open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"
Add your terminal app here as well. You may need to restart your terminal after granting permission.
4. Start
    ./bin/claude-haha
5. Use
Just ask in natural language:
> Take a screenshot of my desktop
> Open Safari and search for something
> Type "hello" in the text editor
Disable Computer Use
If you only want the regular Coding Agent and do not want to expose computer-use MCP tools, disable it with either command:
    claude-haha --no-computer-use
    CLAUDE_COMPUTER_USE_ENABLED=0 claude-haha
You can also write the global config file at ~/.claude/cc-haha/computer-use-config.json:
    {
      "enabled": false
    }
The desktop Settings > Computer Use switch writes the same config. Once disabled, new sessions will not inject the dynamic computer-use MCP server or add its desktop-control tools to allowedTools.
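The three switches (CLI flag, environment variable, config file) can be combined as in the sketch below. The precedence is an assumption: the source does not say which switch wins, so this version treats any "off" switch as decisive and treats a missing or unreadable config file as "enabled". `computer_use_enabled` is a hypothetical helper name, not a function from gates.ts.

```python
import json
from pathlib import Path
from typing import Mapping


def computer_use_enabled(
    env: Mapping[str, str],
    config_path: Path,
    cli_disabled: bool = False,
) -> bool:
    """Return False if any of the three switches turns Computer Use off."""
    if cli_disabled:  # corresponds to --no-computer-use
        return False
    if env.get("CLAUDE_COMPUTER_USE_ENABLED", "1") == "0":
        return False
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return True  # assumption: no usable config file means the default (enabled)
    # Only an explicit {"enabled": false} disables the feature.
    return not (isinstance(config, dict) and config.get("enabled") is False)
```

Calling `computer_use_enabled(os.environ, Path.home() / ".claude/cc-haha/computer-use-config.json")` would evaluate the current process environment against the global config path named above.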
Security
| Mechanism | Description |
|---|---|
| App allowlist | Each session requires explicit authorization for which apps Claude can interact with |
| Concurrency lock | Only one Claude session can use Computer Use at a time (file lock) |
| Clipboard guard | Original clipboard content is saved and restored when typing via clipboard |
| Sensitive action gates | System keyboard shortcuts require additional authorization |
Note: The Python Bridge currently does not provide a global Escape hotkey abort or automatic window hiding. Use Ctrl+C to abort instead.
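To illustrate the concurrency lock from the table, here is one common way to build a single-holder file lock using exclusive file creation. It is a sketch under assumptions: the lock path, `session_lock`, and `LockHeldError` are hypothetical names, and the project's actual lock may work differently (for example, it may detect stale locks left by a crashed session, which this version does not).

```python
import os
from contextlib import contextmanager


class LockHeldError(RuntimeError):
    """Raised when another session already holds the lock."""


@contextmanager
def session_lock(lock_path: str):
    """Hold an exclusive lock file for the duration of the with-block."""
    try:
        # O_EXCL makes creation fail if the file already exists, so only
        # one process can succeed in creating it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeldError(f"Computer Use is already in use: {lock_path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # release the lock for the next session
```

A second `with session_lock(path):` entered while the first is still active raises `LockHeldError`, which matches the "only one session at a time" behavior described in the table.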
Environment Variables
| Variable | Default | Description |
|---|---|---|
| CLAUDE_COMPUTER_USE_ENABLED | 1 | Set to 0 to disable Computer Use |
| CLAUDE_COMPUTER_USE_COORDINATE_MODE | pixels | Coordinate mode: pixels or normalized_0_100 |
| CLAUDE_COMPUTER_USE_CLIPBOARD_PASTE | 1 | Enable clipboard-based text input |
| CLAUDE_COMPUTER_USE_MOUSE_ANIMATION | 1 | Enable mouse animation |
| CLAUDE_COMPUTER_USE_DEBUG | 0 | Debug mode |
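The two coordinate modes can be illustrated with a small conversion function. The meaning of normalized_0_100 is an assumption here: the sketch treats each value as a percentage of the screen width or height, which the source does not spell out. `to_pixels` is a hypothetical name.

```python
from typing import Sequence, Tuple


def to_pixels(
    coordinate: Sequence[float],
    mode: str,
    screen_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Convert a model-supplied coordinate into screen pixels."""
    x, y = coordinate
    width, height = screen_size
    if mode == "pixels":
        return int(round(x)), int(round(y))
    if mode == "normalized_0_100":
        if not (0 <= x <= 100 and 0 <= y <= 100):
            raise ValueError(f"normalized coordinate out of range: {coordinate!r}")
        # Assumption: values are percentages of the screen size. Clamp so
        # that 100 maps to the last valid pixel rather than one past it.
        px = min(int(x / 100 * width), width - 1)
        py = min(int(y / 100 * height), height - 1)
        return px, py
    raise ValueError(f"unknown coordinate mode: {mode!r}")
```

Under this assumption, `to_pixels((50, 50), "normalized_0_100", (1920, 1080))` yields the screen center `(960, 540)`, while pixels mode passes `(756, 342)` through unchanged.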
Technical Architecture
Capability Enablement
EchoFlow Code controls Computer Use through local configuration and startup flags instead of remote feature flags. The related switches are centralized in gates.ts and config files so CLI, desktop, and tests share the same behavior.
| Layer | Current Strategy |
|---|---|
| Build config | Computer Use tools are injected when enabled |
| Local config | CLAUDE_COMPUTER_USE_ENABLED and computer-use-config.json control availability |
| Remote config | No remote feature flag dependency |
| Session safety | App allowlists, OS permissions, and sensitive-action checks remain active |
Python Bridge
On first invocation, the bridge automatically:
- Creates a Python virtual environment (.runtime/venv/)
- Installs pip
- Installs dependencies (mss, Pillow, pyautogui, pyobjc-*)
- Validates via SHA256 hash (only reinstalls when requirements.txt changes)
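The hash check in the last step can be sketched as follows. This is an illustration of the general technique, not the bridge's code: the stamp file that stores the last installed digest and the function names are assumptions. In a real bootstrap, the venv creation (`python -m venv`) and the `pip install -r requirements.txt` call would run between `needs_install` and `record_install`.

```python
import hashlib
from pathlib import Path


def requirements_digest(requirements: Path) -> str:
    """SHA256 hex digest of the requirements file contents."""
    return hashlib.sha256(requirements.read_bytes()).hexdigest()


def needs_install(requirements: Path, stamp: Path) -> bool:
    """True when no stamp exists or requirements.txt changed since the last install."""
    if not stamp.exists():
        return True
    recorded = stamp.read_text(encoding="utf-8").strip()
    return recorded != requirements_digest(requirements)


def record_install(requirements: Path, stamp: Path) -> None:
    """Store the current digest after dependencies were installed successfully."""
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(requirements_digest(requirements), encoding="utf-8")
```

Writing the stamp only after a successful install means an interrupted bootstrap is retried on the next invocation instead of being silently skipped.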
Approaches We Tried
Approach 1: Extract native .node modules from Claude Code binary ❌
Extracted computer-use-swift.node and computer-use-input.node from the installed Claude Code Mach-O binary. Synchronous methods worked, but async Swift methods (screenshot) hung due to N-API async incompatibility between Bun versions.
Approach 2: Create empty stub packages ❌
Stub packages allowed compilation but provided no actual functionality.
Approach 3: Python Bridge ✅ (current)
Replaced all native module calls with Python subprocess calls via callPythonHelper(). Zero binary dependencies, auto-bootstrapping, full functionality on any macOS.
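The per-call subprocess pattern can be sketched as below. The real caller, callPythonHelper(), lives in the TypeScript layer and its wire format is not documented here, so this Python stand-in makes several assumptions: the request is JSON on stdin, the reply is JSON on stdout, and the only command is a hypothetical `echo`. `HELPER_SOURCE` stands in for a helper script such as mac_helper.py.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the helper script: reads one JSON request from
# stdin, dispatches on "command", and prints one JSON reply to stdout.
HELPER_SOURCE = r'''
import json, sys
request = json.load(sys.stdin)
handlers = {"echo": lambda payload: payload}
handler = handlers.get(request["command"])
if handler is None:
    print(json.dumps({"ok": False, "error": "unknown command"}))
    sys.exit(1)
print(json.dumps({"ok": True, "result": handler(request.get("payload"))}))
'''


def call_helper(command: str, payload: dict, timeout: float = 10.0) -> dict:
    """Spawn a fresh Python process for one command and parse its JSON reply."""
    completed = subprocess.run(
        [sys.executable, "-c", HELPER_SOURCE],
        input=json.dumps({"command": command, "payload": payload}),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return json.loads(completed.stdout)
```

Because every call starts a new interpreter, each one pays the process startup cost, which is consistent with the latency overhead listed under Known Limitations.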
Known Limitations
| Limitation | Description |
|---|---|
| Linux not adapted | Linux needs wmctrl + xdotool style platform integration |
| No global Escape abort | Original used CGEventTap; use Ctrl+C instead |
| No auto-hide windows | Original's prepareDisplay relied on Swift |
| Slightly higher latency | ~100ms Python process startup overhead per call |
References and Credits
| Project | License | Contribution |
|---|---|---|
| wimi321/macos-computer-use-skill | MIT | Python bridge architecture, mac_helper.py runtime, executor adaptation |
| domdomegg/computer-use-mcp | MIT | Independent Computer Use MCP server (nut.js based), used as reference |
| paoloanzn/free-code | - | Feature flag system analysis |
| oboard/claude-code-rev | - | Early compatibility research and stub package reference |
Underlying Libraries
| Library | Purpose |
|---|---|
| pyautogui | Mouse and keyboard control |
| mss | Screenshot capture |
| Pillow | Image processing and compression |
| pyobjc | macOS Cocoa/Quartz framework bindings |