dom0 — System Architecture
Overview
dom0 is a browser automation system that lets AI agents control Chrome from the command line. It drives Chrome's DevTools Protocol (CDP) through the chrome.debugger extension API, which avoids the automation flags and external debugging connection that bot detectors commonly look for.
┌─────────────────────────────────────────────────────────────────────────────┐
│ dom0 SYSTEM │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ AI AGENT │ │
│ │ bot0 │ Claude Code │ Cursor │ Any CLI-capable agent │ │
│ │ │ │
│ │ Uses: Shell commands (dom0 snapshot, dom0 click @d1, etc.) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Shell │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ dom0 CLI │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Commands │ │ Daemon │ │ Output │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • snapshot │ │ • Auto-start│ │ • Text │ │ │
│ │ │ • click │ │ • Keep-alive│ │ • JSON │ │ │
│ │ │ • type │ │ • WS Server │ │ • Token-eff │ │ │
│ │ │ • navigate │ │ • 3min idle │ │ │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ WebSocket (localhost:9222) │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Chrome Extension (MV3) │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Debugger │ │ Accessibility│ │ Actions │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • Attach │ │ • Get tree │ │ • Click │ │ │
│ │ │ • CDP cmds │ │ • Parse │ │ • Type │ │ │
│ │ │ • Events │ │ • Build refs│ │ • Scroll │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ chrome.debugger API │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Chrome Browser │ │
│ │ │ │
│ │ Active Tab → CDP Commands → DOM/Input Events │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
1. Package Structure
@bot0/dom0 (Core)
Shared types and protocol definitions. No runtime dependencies.
// Key types
interface ElementRef {
  alias: string;          // "d1", "d2"
  backendNodeId: number;  // Chrome's internal node ID
  selector: string;       // CSS selector (fallback)
  role?: string;          // ARIA role
  name?: string;          // Accessible name
  isInteractable: boolean;
}

interface PageSnapshot {
  url: string;
  title: string;
  refs: Record<string, ElementRef>;
  tree: string;           // Formatted accessibility tree
}

// WebSocket protocol
type Dom0Request =
  | { type: 'snapshot' }
  | { type: 'click'; ref: string }
  | { type: 'type'; ref: string; text: string }
  | { type: 'navigate'; url: string }
  // ... 20+ command types

interface RequestMessage {
  id: string;             // Unique request ID
  request: Dom0Request;
}

interface ResponseMessage {
  id: string;
  success: boolean;
  data?: unknown;
  error?: string;
  duration?: number;
}
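To illustrate how the request envelope might be built, the sketch below maps CLI-style arguments to a typed RequestMessage. The `toRequest` helper and the counter-based ID scheme are assumptions made for illustration and are not part of the package; only the four request shapes listed above are covered.

```typescript
// Subset of the protocol types shown above (the real union has 20+ members).
type Dom0Request =
  | { type: 'snapshot' }
  | { type: 'click'; ref: string }
  | { type: 'type'; ref: string; text: string }
  | { type: 'navigate'; url: string };

interface RequestMessage {
  id: string;
  request: Dom0Request;
}

let nextId = 0;

// Hypothetical helper: map argv-style input to a typed request envelope.
function toRequest(args: string[]): RequestMessage {
  const [command, ...rest] = args;
  let request: Dom0Request;
  switch (command) {
    case 'snapshot':
      request = { type: 'snapshot' };
      break;
    case 'click':
      if (rest.length < 1) throw new Error('click requires a ref, e.g. @d1');
      request = { type: 'click', ref: rest[0] };
      break;
    case 'type':
      if (rest.length < 2) throw new Error('type requires a ref and text');
      request = { type: 'type', ref: rest[0], text: rest.slice(1).join(' ') };
      break;
    case 'navigate':
      if (rest.length < 1) throw new Error('navigate requires a URL');
      request = { type: 'navigate', url: rest[0] };
      break;
    default:
      throw new Error(`Unknown command: ${command}`);
  }
  nextId += 1;
  // Assumed ID scheme: a simple per-process counter.
  return { id: `req-${nextId}`, request };
}
```

Because Dom0Request is a discriminated union, the compiler rejects a request object whose fields do not match its `type`, so malformed commands fail before anything is sent.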
@bot0/dom0-cli (CLI)
Command-line interface with background daemon.
packages/dom0-cli/
├── bin/dom0.js # Entry point
├── src/
│ ├── index.ts # Commander setup
│ ├── daemon.ts # Background daemon
│ ├── client.ts # WebSocket client
│ ├── output.ts # Formatting
│ └── commands/
│ ├── snapshot.ts
│ ├── click.ts
│ ├── type.ts
│ ├── navigate.ts
│ └── ... (17 more)
├── SKILL.md # AI agent instructions
└── package.json
@bot0/dom0-extension (Chrome Extension)
Manifest V3 service worker with chrome.debugger integration.
packages/dom0-extension/
├── manifest.json # MV3 manifest
├── src/
│ ├── background/
│ │ ├── index.ts # Service worker entry
│ │ ├── debugger.ts # CDP wrapper
│ │ └── commands.ts # Command handlers
│ ├── lib/
│ │ ├── accessibility.ts # Tree traversal
│ │ └── refs.ts # Ref assignment
│ └── types/
│ └── index.ts
└── package.json
2. Communication Flow
Daemon Architecture
The CLI uses a background daemon pattern to maintain connection state:
┌─────────────────────────────────────────────────────────────────────────────┐
│ DAEMON ARCHITECTURE │
│ │
│ First command: dom0 snapshot │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 1. CLI checks if daemon running (port 9222) │
│ 2. If not → spawn detached daemon process │
│ 3. Wait for daemon to start │
│ 4. Connect to daemon via WebSocket │
│ 5. Send request, receive response │
│ 6. Disconnect (daemon stays running) │
│ │
│ Subsequent commands: dom0 click @d1 │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 1. CLI checks if daemon running → YES │
│ 2. Connect to daemon via WebSocket │
│ 3. Send request, receive response │
│ 4. Disconnect (daemon stays running) │
│ │
│ Daemon lifecycle: │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ • Starts on first command │
│ • Maintains extension connection │
│ • 3-minute inactivity timeout │
│ • Shuts down automatically when idle │
│ • Can be stopped manually: dom0 stop │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
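The idle shutdown rule in the lifecycle above can be sketched as a small tracker. This is a minimal illustration, assuming the daemon records a timestamp for each request; the `IdleTracker` class and its method names are hypothetical, and the clock is passed in so the logic can be exercised without real timers.

```typescript
const IDLE_TIMEOUT_MS = 3 * 60 * 1000; // 3-minute inactivity window

// Hypothetical helper: tracks the most recent activity using an injected clock.
class IdleTracker {
  private lastActivity: number;
  private readonly timeoutMs: number;

  constructor(now: number, timeoutMs: number = IDLE_TIMEOUT_MS) {
    this.lastActivity = now;
    this.timeoutMs = timeoutMs;
  }

  // Call whenever a CLI request arrives.
  touch(now: number): void {
    this.lastActivity = now;
  }

  // True once the daemon has been idle for the whole timeout window.
  shouldShutDown(now: number): boolean {
    return now - this.lastActivity >= this.timeoutMs;
  }
}
```

A daemon could call `touch(Date.now())` on every incoming request and check `shouldShutDown(Date.now())` on a periodic timer, exiting when it returns true.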
Message Flow
┌─────────────────────────────────────────────────────────────────────────────┐
│ MESSAGE FLOW │
│ │
│ CLI Client Daemon Extension │
│ ────────── ────── ───────── │
│ │
│ 1. Connect ─────────────►│ │
│ │ │
│ 2. Send request ─────────►│ │
│ { id: "abc", │ │
│ request: { │ │
│ type: "click", │ 3. Forward ──────────────────────────►│ │
│ ref: "@d1" │ to extension │ │
│ } │ │ │
│ } │ │ │
│ │ ▼ │
│ │ 4. Execute CDP command │
│ │ chrome.debugger │
│ │ │ │
│ │ 5. Return result │
│ │◄──────────────────────────────────────│ │
│ 6. Receive response ◄────│ │
│ { id: "abc", │ │
│ success: true, │ │
│ duration: 42 │ │
│ } │ │
│ │ │
│ 7. Disconnect │ │
│ │ │
└─────────────────────────────────────────────────────────────────────────────┘
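The flow above depends on the daemon matching each response to the client that sent the request. One way to do that, sketched here under the assumption that the request ID is the only correlation key, is a map from ID to client socket. The `RequestRouter` class and the `Sendable` interface are hypothetical names; any object with a `send` method stands in for a real WebSocket.

```typescript
interface Sendable {
  send(data: string): void;
}

interface RequestMessage {
  id: string;
  request: { type: string; [key: string]: unknown };
}

interface ResponseMessage {
  id: string;
  success: boolean;
  data?: unknown;
  error?: string;
  duration?: number;
}

// Hypothetical daemon-side router: remembers which CLI socket sent each
// request ID so the extension's reply reaches the right client.
class RequestRouter {
  private pending = new Map<string, Sendable>();
  private extension: Sendable | null = null;

  setExtension(socket: Sendable | null): void {
    this.extension = socket;
  }

  // Steps 2-3: accept a request from a CLI client and forward it.
  handleClientMessage(client: Sendable, message: RequestMessage): void {
    if (!this.extension) {
      const failure: ResponseMessage = {
        id: message.id,
        success: false,
        error: 'Extension not connected. Refresh the extension.',
      };
      client.send(JSON.stringify(failure));
      return;
    }
    this.pending.set(message.id, client);
    this.extension.send(JSON.stringify(message));
  }

  // Steps 5-6: route the extension's response back by request ID.
  handleExtensionMessage(response: ResponseMessage): boolean {
    const client = this.pending.get(response.id);
    if (!client) return false; // Unknown or already-answered ID
    this.pending.delete(response.id);
    client.send(JSON.stringify(response));
    return true;
  }
}
```

Deleting the entry once a response is delivered keeps the map from growing, since each CLI invocation disconnects after a single exchange.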
Extension Registration
When the extension connects, it registers with the daemon:
// Extension → Daemon
{ type: 'extension:register', version: '0.1.0' }

// Daemon logs
[dom0] Extension registered (v0.1.0)
This allows the daemon to distinguish between:
- CLI client connections (send requests, expect responses)
- Extension connection (receives requests, sends responses)
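A daemon could make this distinction by inspecting the first message on each socket. The sketch below assumes that only the extension ever sends `extension:register` and that everything else is treated as a CLI client; `classifyConnection` is a hypothetical helper, not the daemon's actual code.

```typescript
type ConnectionRole = 'extension' | 'client';

// Hypothetical classifier: the first message on a socket decides its role.
// Only a well-formed 'extension:register' message marks the extension.
function classifyConnection(firstMessage: string): ConnectionRole {
  let parsed: unknown;
  try {
    parsed = JSON.parse(firstMessage);
  } catch (e) {
    return 'client'; // Malformed input is handled as an (invalid) client request
  }
  if (
    typeof parsed === 'object' &&
    parsed !== null &&
    (parsed as { type?: unknown }).type === 'extension:register'
  ) {
    return 'extension';
  }
  return 'client';
}
```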
3. Chrome Debugger Integration
CDP (Chrome DevTools Protocol)
The extension uses chrome.debugger to send CDP commands:
class DebuggerClient {
  // Tabs that already have the debugger attached
  private attached = new Map<number, boolean>();

  async attach(tabId: number): Promise<void> {
    if (this.attached.get(tabId)) return;
    await chrome.debugger.attach({ tabId }, '1.3');
    this.attached.set(tabId, true);
  }

  async sendCommand<T>(
    tabId: number,
    method: string,
    params?: object
  ): Promise<T> {
    await this.attach(tabId);
    // Await first, then cast the resolved value rather than the promise
    const result = await chrome.debugger.sendCommand({ tabId }, method, params);
    return result as T;
  }
}
Key CDP Commands Used
| CDP Method | Purpose |
|---|---|
| Accessibility.getFullAXTree | Get accessibility tree |
| DOM.getBoxModel | Get element position |
| DOM.focus | Focus element |
| Input.dispatchMouseEvent | Click, hover |
| Input.dispatchKeyEvent | Type, press keys |
| Page.navigate | Go to URL |
| Page.captureScreenshot | Take screenshot |
| Runtime.evaluate | Execute JavaScript |
Click Implementation
async click(tabId: number, backendNodeId: number): Promise<void> {
  // 1. Get element's box model
  const { model } = await this.sendCommand<BoxModelResult>(
    tabId,
    'DOM.getBoxModel',
    { backendNodeId }
  );

  // 2. Calculate center point of the content quad
  //    (content = [x1, y1, x2, y2, x3, y3, x4, y4])
  const x = (model.content[0] + model.content[2]) / 2;
  const y = (model.content[1] + model.content[5]) / 2;

  // 3. Dispatch real mouse events
  await this.sendCommand(tabId, 'Input.dispatchMouseEvent', {
    type: 'mousePressed', x, y, button: 'left', clickCount: 1
  });
  // clickCount is repeated on release so the pair registers as one click
  await this.sendCommand(tabId, 'Input.dispatchMouseEvent', {
    type: 'mouseReleased', x, y, button: 'left', clickCount: 1
  });
}
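The center-point step can also be written as a standalone helper. The sketch below is an alternative to the two-corner calculation, not the project's code: it averages all four corners of the quad, which gives the same result for axis-aligned boxes and a reasonable point for rotated or skewed ones. `quadCenter` is a hypothetical name.

```typescript
// A CDP quad is eight numbers: x1, y1, x2, y2, x3, y3, x4, y4 (clockwise).
type Quad = number[];

// Average all four corners so rotated or skewed boxes still get a sensible center.
function quadCenter(quad: Quad): { x: number; y: number } {
  if (quad.length !== 8) {
    throw new Error(`Expected 8 quad values, got ${quad.length}`);
  }
  let x = 0;
  let y = 0;
  for (let i = 0; i < 8; i += 2) {
    x += quad[i];
    y += quad[i + 1];
  }
  return { x: x / 4, y: y / 4 };
}

// A 100x40 box with its top-left corner at (10, 20):
const center = quadCenter([10, 20, 110, 20, 110, 60, 10, 60]);
// center is { x: 60, y: 40 }
```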
4. Accessibility Tree & Refs
Why Accessibility Tree?
The accessibility tree provides:
- Semantic structure — Elements have roles (button, textbox, link)
- Accessible names — Human-readable labels
- Interactability — Focus, click, type capabilities
- Stability — Less brittle than CSS selectors
- Stealth — Same data screen readers use
Tree Parsing
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'checkbox', 'radio',
  'combobox', 'listbox', 'menuitem', 'searchbox',
  'slider', 'switch', 'tab', 'option'
]);

function isInteractable(node: AXNode): boolean {
  if (!node.role?.value) return false;
  if (INTERACTIVE_ROLES.has(node.role.value)) return true;

  // Also treat focusable nodes as interactable
  const props = node.properties || [];
  return props.some(p =>
    p.name === 'focusable' && p.value?.value === true
  );
}

async function buildRefMap(
  client: DebuggerClient, // `debugger` is a reserved word, so it cannot be the parameter name
  tabId: number
): Promise<RefState> {
  const { nodes } = await client.sendCommand<AXTreeResult>(
    tabId,
    'Accessibility.getFullAXTree'
  );

  const refs = new Map<string, ElementRef>();
  let counter = 0;

  for (const node of nodes) {
    if (isInteractable(node) && node.backendDOMNodeId) {
      const alias = `d${++counter}`;
      refs.set(alias, {
        alias,
        backendNodeId: node.backendDOMNodeId,
        selector: '', // Built on-demand if needed
        role: node.role?.value,
        name: node.name?.value,
        isInteractable: true
      });
    }
  }
  return { refs, counter };
}
Ref Resolution
When a command references @d1, the alias is looked up in the ref map built by the most recent snapshot:
function resolveRef(ref: string, state: RefState): ElementRef {
  // Remove @ prefix if present
  const alias = ref.startsWith('@') ? ref.slice(1) : ref;
  const element = state.refs.get(alias);
  if (!element) {
    throw new Error(`Unknown ref: ${ref}`);
  }
  return element;
}
5. Output Formatting
Token-Efficient Design
Output is designed to minimize AI context usage:
# Snapshot output (compact)
URL: https://example.com/login
Title: Login - Example
@d1 button "Sign In"
@d2 textbox "Email"
@d3 textbox "Password"
@d4 link "Forgot password?"
@d5 checkbox "Remember me"
# Command output (minimal)
Clicked @d1 (button "Sign In")
# Error output (actionable)
Error: Element @d99 not found. Run 'dom0 snapshot' to refresh refs.
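A formatter along the following lines could produce the compact snapshot text shown above. It is a sketch under stated assumptions: `formatSnapshot` is a hypothetical function, refs are ordered by the numeric part of their alias, and a missing role falls back to the placeholder `unknown`.

```typescript
interface ElementRef {
  alias: string;
  role?: string;
  name?: string;
}

interface PageSnapshot {
  url: string;
  title: string;
  refs: Record<string, ElementRef>;
}

// Hypothetical formatter: one short line per ref, ordered by alias number.
function formatSnapshot(snapshot: PageSnapshot): string {
  const lines = [`URL: ${snapshot.url}`, `Title: ${snapshot.title}`];
  const refs = Object.values(snapshot.refs).sort(
    (a, b) => Number(a.alias.slice(1)) - Number(b.alias.slice(1))
  );
  for (const ref of refs) {
    const role = ref.role ?? 'unknown'; // Assumed placeholder for missing roles
    const name = ref.name ? ` "${ref.name}"` : '';
    lines.push(`@${ref.alias} ${role}${name}`);
  }
  return lines.join('\n');
}
```

Sorting numerically rather than lexically keeps `@d10` after `@d2`, so the listing follows the order in which refs were assigned.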
JSON Mode
For programmatic access:
$ dom0 snapshot --json
{
  "success": true,
  "data": {
    "url": "https://example.com/login",
    "title": "Login - Example",
    "refs": {
      "d1": {
        "alias": "d1",
        "role": "button",
        "name": "Sign In",
        "backendNodeId": 42
      }
    }
  },
  "duration": 156
}
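A script consuming the JSON output might look up a ref by role and accessible name before issuing the next command. The sketch below assumes the output shape shown above; `findRef` and the `SnapshotOutput` type are hypothetical and only model the fields that appear in the example.

```typescript
interface JsonRef {
  alias: string;
  role?: string;
  name?: string;
  backendNodeId: number;
}

interface SnapshotOutput {
  success: boolean;
  data?: { url: string; title: string; refs: Record<string, JsonRef> };
  error?: string;
  duration?: number;
}

// Hypothetical helper: return the @alias of the first ref matching role and name.
function findRef(json: string, role: string, name: string): string | null {
  const output = JSON.parse(json) as SnapshotOutput;
  if (!output.success || !output.data) {
    throw new Error(output.error ?? 'Snapshot failed');
  }
  for (const ref of Object.values(output.data.refs)) {
    if (ref.role === role && ref.name === name) return `@${ref.alias}`;
  }
  return null;
}
```

The returned alias can be passed straight to a follow-up command such as `dom0 click @d1`.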
6. Stealth Features
Why Stealth Matters
Bot detection systems check for:
- navigator.webdriver property
- Puppeteer/Playwright fingerprints
- Headless browser indicators
- Automation framework signatures
How dom0 Avoids Detection
| Detection Method | Playwright/Puppeteer | dom0 |
|---|---|---|
| navigator.webdriver | Modified (detected) | Untouched |
| Chrome flags | --enable-automation | None |
| CDP connection | External (detected) | Native extension |
| Input events | Script-dispatched events are untrusted (detected) | Trusted events via the CDP Input domain |
| Browser profile | New/temp (detected) | User's actual profile |
Real Input Events
dom0 dispatches actual input events through CDP:
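// Trusted mouse event, dispatched through the browser's input pipeline.
// debuggerClient is a DebuggerClient instance (`debugger` is a reserved word).
await debuggerClient.sendCommand(tabId, 'Input.dispatchMouseEvent', {
  type: 'mousePressed',
  x: 150,
  y: 200,
  button: 'left',
  clickCount: 1,
  timestamp: Date.now() / 1000 // Real timestamp, in seconds since epoch
});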
7. Error Handling
Ref Errors
// Element not found
{ success: false, error: "Unknown ref: @d99" }

// Element no longer valid (page changed)
{ success: false, error: "Element @d3 is stale. Re-run snapshot." }

// Element not interactable
{ success: false, error: "Element @d5 is not clickable" }
Connection Errors
// Extension not connected
{ success: false, error: "Extension not connected. Refresh the extension." }

// Daemon not running
Error: Connection refused. Run 'dom0 status' to check daemon.

// Tab not found
{ success: false, error: "No active tab. Open a browser tab first." }
Recovery Suggestions
Errors include actionable recovery steps:
Error: Extension not connected.
To fix:
1. Open Chrome
2. Go to chrome://extensions
3. Find "dom0" and click the refresh icon
4. Run 'dom0 ping' to verify connection
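One way to attach recovery steps is a lookup from error text to a suggested next command. This is an illustrative sketch only: the patterns, the hint wording, and the `withRecoveryHint` function are assumptions, built from the error strings and commands that appear in this section.

```typescript
// Hypothetical lookup: match an error message to the next thing to try.
const RECOVERY_HINTS: Array<{ pattern: RegExp; hint: string }> = [
  { pattern: /^Unknown ref|is stale/, hint: "Run 'dom0 snapshot' to refresh refs." },
  {
    pattern: /Extension not connected/,
    hint: "Reload the extension at chrome://extensions, then run 'dom0 ping'.",
  },
  { pattern: /Connection refused/, hint: "Run 'dom0 status' to check the daemon." },
  { pattern: /No active tab/, hint: 'Open a browser tab first.' },
];

// Append the first matching hint; unknown errors pass through unchanged.
function withRecoveryHint(error: string): string {
  const match = RECOVERY_HINTS.find(({ pattern }) => pattern.test(error));
  return match ? `Error: ${error}\nTo fix: ${match.hint}` : `Error: ${error}`;
}
```

Keeping the hints in one table means the agent always receives a concrete next command rather than having to infer one from the raw error.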
8. Security Considerations
Permissions Required
The extension requires:
{
  "permissions": [
    "debugger",   // CDP access
    "tabs",       // Tab info
    "activeTab",  // Current tab access
    "scripting"   // Script injection (for selectors)
  ],
  "host_permissions": ["<all_urls>"]
}
Localhost Only
The daemon only listens on localhost:9222:
const wss = new WebSocketServer({
  host: 'localhost', // Not exposed to network
  port: DEFAULT_PORT
});
No Credential Storage
dom0 never:
- Stores passwords or cookies
- Persists session data
- Logs sensitive information
- Transmits data externally
9. Limitations
| Limitation | Reason | Workaround |
|---|---|---|
| Chrome only | Uses chrome.debugger API | None (Chrome-specific) |
| Single tab | One active debugging session | Use tab-switch command |
| No iframes | Snapshot reads only the top-level frame's accessibility tree | Manual navigation |
| Extension required | MV3 service worker | Must install extension |
| Local only | Daemon on localhost | Run on same machine |
10. Future Enhancements
Planned Features
- Multi-tab support
- iframe navigation
- Network interception
- Cookie management
- File upload handling
- Drag and drop
- Shadow DOM support
- Recording/playback
Architecture Improvements
- Hot-reload extension connection
- Persistent ref mapping
- Command queuing
- Retry with backoff
- Health monitoring