system-architecture.md

dom0 — System Architecture

Overview

dom0 is a browser automation system that lets AI agents control Chrome from the command line. It sends Chrome DevTools Protocol (CDP) commands through the chrome.debugger extension API rather than an external automation framework, which is intended to make the automation harder for bot-detection systems to spot.

┌─────────────────────────────────────────────────────────────────────────────┐
│                              dom0 SYSTEM                                     │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                           AI AGENT                                   │   │
│   │   bot0 │ Claude Code │ Cursor │ Any CLI-capable agent               │   │
│   │                                                                      │   │
│   │   Uses: Shell commands (dom0 snapshot, dom0 click @d1, etc.)        │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│                                    │ Shell                                  │
│                                    ▼                                        │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                         dom0 CLI                                     │   │
│   │                                                                      │   │
│   │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                │   │
│   │   │  Commands   │  │   Daemon    │  │   Output    │                │   │
│   │   │             │  │             │  │             │                │   │
│   │   │ • snapshot  │  │ • Auto-start│  │ • Text      │                │   │
│   │   │ • click     │  │ • Keep-alive│  │ • JSON      │                │   │
│   │   │ • type      │  │ • WS Server │  │ • Token-eff │                │   │
│   │   │ • navigate  │  │ • 3min idle │  │             │                │   │
│   │   └─────────────┘  └─────────────┘  └─────────────┘                │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│                                    │ WebSocket (localhost:9222)             │
│                                    ▼                                        │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      Chrome Extension (MV3)                          │   │
│   │                                                                      │   │
│   │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                │   │
│   │   │  Debugger   │  │ Accessibility│ │   Actions   │                │   │
│   │   │             │  │             │  │             │                │   │
│   │   │ • Attach    │  │ • Get tree  │  │ • Click     │                │   │
│   │   │ • CDP cmds  │  │ • Parse     │  │ • Type      │                │   │
│   │   │ • Events    │  │ • Build refs│  │ • Scroll    │                │   │
│   │   └─────────────┘  └─────────────┘  └─────────────┘                │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│                                    │ chrome.debugger API                    │
│                                    ▼                                        │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                         Chrome Browser                               │   │
│   │                                                                      │   │
│   │   Active Tab → CDP Commands → DOM/Input Events                      │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

1. Package Structure

@bot0/dom0 (Core)

Shared types and protocol definitions. No runtime dependencies.

typescript
// Key types
interface ElementRef {
  alias: string;          // "d1", "d2"
  backendNodeId: number;  // Chrome's internal node ID
  selector: string;       // CSS selector (fallback)
  role?: string;          // ARIA role
  name?: string;          // Accessible name
  isInteractable: boolean;
}

interface PageSnapshot {
  url: string;
  title: string;
  refs: Record<string, ElementRef>;
  tree: string;           // Formatted accessibility tree
}

// WebSocket protocol
type Dom0Request =
  | { type: 'snapshot' }
  | { type: 'click'; ref: string }
  | { type: 'type'; ref: string; text: string }
  | { type: 'navigate'; url: string }
  // ... 20+ command types

interface RequestMessage {
  id: string;             // Unique request ID
  request: Dom0Request;
}

interface ResponseMessage {
  id: string;
  success: boolean;
  data?: unknown;
  error?: string;
  duration?: number;
}
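
The protocol types above could be used roughly as follows. This is a minimal sketch, not the package's actual code: `makeRequest`, `isResponseMessage`, and `describeRequest` are hypothetical helper names, the `req-N` ID scheme is an assumption, and only four of the 20+ command types are mirrored.

```typescript
// Subset of the protocol types, repeated so the sketch is self-contained.
type Dom0Request =
  | { type: 'snapshot' }
  | { type: 'click'; ref: string }
  | { type: 'type'; ref: string; text: string }
  | { type: 'navigate'; url: string };

interface RequestMessage {
  id: string;
  request: Dom0Request;
}

interface ResponseMessage {
  id: string;
  success: boolean;
  data?: unknown;
  error?: string;
  duration?: number;
}

let nextRequestNumber = 0;

// Hypothetical helper: wraps a command in an envelope with a fresh ID.
function makeRequest(request: Dom0Request): RequestMessage {
  nextRequestNumber += 1;
  return { id: `req-${nextRequestNumber}`, request };
}

// Hypothetical helper: narrows parsed JSON to a ResponseMessage by
// checking the two required fields.
function isResponseMessage(value: unknown): value is ResponseMessage {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate.id === 'string' && typeof candidate.success === 'boolean';
}

// Hypothetical helper: the discriminated union lets the compiler check
// that every command type is handled.
function describeRequest(request: Dom0Request): string {
  switch (request.type) {
    case 'snapshot':
      return 'snapshot';
    case 'click':
      return `click ${request.ref}`;
    case 'type':
      return `type ${JSON.stringify(request.text)} into ${request.ref}`;
    case 'navigate':
      return `navigate to ${request.url}`;
  }
}
```

Because `Dom0Request` is a discriminated union on `type`, adding a new command without handling it in `describeRequest` becomes a compile-time error under strict settings.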

@bot0/dom0-cli (CLI)

Command-line interface with background daemon.

packages/dom0-cli/
├── bin/dom0.js           # Entry point
├── src/
│   ├── index.ts          # Commander setup
│   ├── daemon.ts         # Background daemon
│   ├── client.ts         # WebSocket client
│   ├── output.ts         # Formatting
│   └── commands/
│       ├── snapshot.ts
│       ├── click.ts
│       ├── type.ts
│       ├── navigate.ts
│       └── ... (17 more)
├── SKILL.md              # AI agent instructions
└── package.json

@bot0/dom0-extension (Chrome Extension)

Manifest V3 service worker with chrome.debugger integration.

packages/dom0-extension/
├── manifest.json         # MV3 manifest
├── src/
│   ├── background/
│   │   ├── index.ts      # Service worker entry
│   │   ├── debugger.ts   # CDP wrapper
│   │   └── commands.ts   # Command handlers
│   ├── lib/
│   │   ├── accessibility.ts  # Tree traversal
│   │   └── refs.ts           # Ref assignment
│   └── types/
│       └── index.ts
└── package.json

2. Communication Flow

Daemon Architecture

The CLI uses a background daemon pattern to maintain connection state:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         DAEMON ARCHITECTURE                                   │
│                                                                             │
│   First command: dom0 snapshot                                              │
│   ─────────────────────────────────────────────────────────────────────    │
│                                                                             │
│   1. CLI checks if daemon running (port 9222)                              │
│   2. If not → spawn detached daemon process                                │
│   3. Wait for daemon to start                                              │
│   4. Connect to daemon via WebSocket                                        │
│   5. Send request, receive response                                        │
│   6. Disconnect (daemon stays running)                                      │
│                                                                             │
│   Subsequent commands: dom0 click @d1                                       │
│   ─────────────────────────────────────────────────────────────────────    │
│                                                                             │
│   1. CLI checks if daemon running → YES                                    │
│   2. Connect to daemon via WebSocket                                        │
│   3. Send request, receive response                                        │
│   4. Disconnect (daemon stays running)                                      │
│                                                                             │
│   Daemon lifecycle:                                                          │
│   ─────────────────────────────────────────────────────────────────────    │
│                                                                             │
│   • Starts on first command                                                 │
│   • Maintains extension connection                                          │
│   • 3-minute inactivity timeout                                            │
│   • Shuts down automatically when idle                                      │
│   • Can be stopped manually: dom0 stop                                      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
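
The 3-minute inactivity timeout in the lifecycle above can be sketched as a small tracker that records the last request time. This is an illustration under assumptions, not the daemon's actual code: `IdleTracker` and its methods are hypothetical names, and the clock is injected only so the logic can be exercised without waiting.

```typescript
const IDLE_TIMEOUT_MS = 3 * 60 * 1000; // 3-minute inactivity timeout

// Hypothetical class: tracks time since the last request.
class IdleTracker {
  private readonly timeoutMs: number;
  private readonly now: () => number;
  private lastActivity: number;

  constructor(timeoutMs: number = IDLE_TIMEOUT_MS, now: () => number = () => Date.now()) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastActivity = now();
  }

  // Call whenever a CLI request arrives.
  touch(): void {
    this.lastActivity = this.now();
  }

  // True once the daemon has been idle for the full timeout.
  isExpired(): boolean {
    return this.now() - this.lastActivity >= this.timeoutMs;
  }

  // Milliseconds left before the daemon would shut down.
  remainingMs(): number {
    return Math.max(0, this.timeoutMs - (this.now() - this.lastActivity));
  }
}
```

A daemon built this way would call `touch()` on every request and check `isExpired()` on a periodic timer, exiting when it returns true.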

Message Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                            MESSAGE FLOW                                       │
│                                                                             │
│   CLI Client              Daemon                    Extension               │
│   ──────────              ──────                    ─────────               │
│                                                                             │
│   1. Connect ─────────────►│                                                │
│                            │                                                │
│   2. Send request ─────────►│                                                │
│      { id: "abc",          │                                                │
│        request: {          │                                                │
│          type: "click",    │ 3. Forward ──────────────────────────►│        │
│          ref: "@d1"        │    to extension                       │        │
│        }                   │                                       │        │
│      }                     │                                       │        │
│                            │                                       ▼        │
│                            │                        4. Execute CDP command  │
│                            │                           chrome.debugger      │
│                            │                                       │        │
│                            │                        5. Return result        │
│                            │◄──────────────────────────────────────│        │
│   6. Receive response ◄────│                                                │
│      { id: "abc",          │                                                │
│        success: true,      │                                                │
│        duration: 42        │                                                │
│      }                     │                                                │
│                            │                                                │
│   7. Disconnect            │                                                │
│                            │                                                │
└─────────────────────────────────────────────────────────────────────────────┘
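
Steps 2 through 6 above amount to correlating responses with requests by `id`. The sketch below shows one way a daemon might do that; `RequestRouter` and its method names are hypothetical, and transport details (WebSocket framing, timeouts) are deliberately left out.

```typescript
interface RequestMessage {
  id: string;
  request: { type: string; [key: string]: unknown };
}

interface ResponseMessage {
  id: string;
  success: boolean;
  data?: unknown;
  error?: string;
  duration?: number;
}

type Send<T> = (message: T) => void;

// Hypothetical class: forwards CLI requests to the extension and routes
// each response back to the client that is waiting on that ID.
class RequestRouter {
  private pending = new Map<string, Send<ResponseMessage>>();
  private extension: Send<RequestMessage> | null = null;

  // Called when the extension connects (or with null when it disconnects).
  setExtension(send: Send<RequestMessage> | null): void {
    this.extension = send;
  }

  // Steps 2-3: remember who asked, then forward to the extension.
  handleClientRequest(message: RequestMessage, reply: Send<ResponseMessage>): void {
    if (!this.extension) {
      reply({
        id: message.id,
        success: false,
        error: 'Extension not connected. Refresh the extension.',
      });
      return;
    }
    this.pending.set(message.id, reply);
    this.extension(message);
  }

  // Steps 5-6: deliver the result to the waiting client.
  // Returns false when no client is waiting on this ID.
  handleExtensionResponse(message: ResponseMessage): boolean {
    const reply = this.pending.get(message.id);
    if (!reply) return false;
    this.pending.delete(message.id);
    reply(message);
    return true;
  }
}
```

Keying the pending map by request ID keeps replies matched to the right client even if several CLI processes are connected at once.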

Extension Registration

When the extension connects, it registers with the daemon:

typescript
// Extension → Daemon
{ type: 'extension:register', version: '0.1.0' }

// Daemon logs
[dom0] Extension registered (v0.1.0)

This allows the daemon to distinguish between:

  • CLI client connections (send requests, expect responses)
  • Extension connection (receives requests, sends responses)
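
One way to make that distinction is to inspect the first message on each connection. The sketch below assumes messages are JSON text and that anything other than a valid registration message comes from a CLI client; `parseRegisterMessage` and `classifyConnection` are hypothetical names.

```typescript
interface RegisterMessage {
  type: 'extension:register';
  version: string;
}

// Returns the registration message, or null if the payload is anything else.
function parseRegisterMessage(raw: string): RegisterMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // Not JSON at all
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const message = parsed as Record<string, unknown>;
  if (message.type !== 'extension:register') return null;
  if (typeof message.version !== 'string') return null;
  return { type: 'extension:register', version: message.version };
}

type ConnectionRole = 'extension' | 'client';

// Assumption: a connection is treated as a CLI client unless its first
// message is a valid registration.
function classifyConnection(firstMessage: string): ConnectionRole {
  return parseRegisterMessage(firstMessage) ? 'extension' : 'client';
}
```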

3. Chrome Debugger Integration

CDP (Chrome DevTools Protocol)

The extension uses chrome.debugger to send CDP commands:

typescript
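class DebuggerClient {
  // Tabs that currently have a debugger session attached
  private attached = new Map<number, boolean>();

  constructor() {
    // Forget a tab when its session ends (tab closed, user cancelled, etc.)
    chrome.debugger.onDetach.addListener((source) => {
      if (source.tabId !== undefined) this.attached.delete(source.tabId);
    });
  }

  async attach(tabId: number): Promise<void> {
    if (this.attached.get(tabId)) return;
    await chrome.debugger.attach({ tabId }, '1.3'); // '1.3' is the CDP version
    this.attached.set(tabId, true);
  }

  async sendCommand<T>(
    tabId: number,
    method: string,
    params?: object
  ): Promise<T> {
    await this.attach(tabId);
    // Await first, then cast the resolved value (not the promise) to T
    const result = await chrome.debugger.sendCommand({ tabId }, method, params);
    return result as T;
  }
}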

Key CDP Commands Used

| CDP Method | Purpose |
| --- | --- |
| Accessibility.getFullAXTree | Get accessibility tree |
| DOM.getBoxModel | Get element position |
| DOM.focus | Focus element |
| Input.dispatchMouseEvent | Click, hover |
| Input.dispatchKeyEvent | Type, press keys |
| Page.navigate | Go to URL |
| Page.captureScreenshot | Take screenshot |
| Runtime.evaluate | Execute JavaScript |

Click Implementation

typescript
async click(tabId: number, backendNodeId: number): Promise<void> {
  // 1. Get the element's box model
  const { model } = await this.sendCommand<BoxModelResult>(
    tabId,
    'DOM.getBoxModel',
    { backendNodeId }
  );

  // 2. Calculate the center point. model.content is a quad
  //    [x1, y1, x2, y2, x3, y3, x4, y4]; this assumes an axis-aligned box.
  const x = (model.content[0] + model.content[2]) / 2;
  const y = (model.content[1] + model.content[5]) / 2;

  // 3. Dispatch real mouse events
  await this.sendCommand(tabId, 'Input.dispatchMouseEvent', {
    type: 'mousePressed', x, y, button: 'left', clickCount: 1
  });
  // clickCount is repeated on release so the pair registers as a single click
  await this.sendCommand(tabId, 'Input.dispatchMouseEvent', {
    type: 'mouseReleased', x, y, button: 'left', clickCount: 1
  });
}
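
The center-point calculation in step 2 reads two corners of the content quad, which is enough for an axis-aligned box. A slightly more general sketch averages all four corners, so it also gives a sensible point for rotated or skewed elements. `quadCenter` is a hypothetical helper, not part of dom0.

```typescript
// A CDP quad: [x1, y1, x2, y2, x3, y3, x4, y4], corners listed clockwise.
type Quad = readonly number[];

// Returns the centroid of the quad's four corners.
function quadCenter(quad: Quad): { x: number; y: number } {
  if (quad.length !== 8) {
    throw new Error(`Expected 8 numbers in quad, got ${quad.length}`);
  }
  let sumX = 0;
  let sumY = 0;
  quad.forEach((value, index) => {
    // Even indices are x coordinates, odd indices are y coordinates.
    if (index % 2 === 0) sumX += value;
    else sumY += value;
  });
  return { x: sumX / 4, y: sumY / 4 };
}

// Axis-aligned box from (100, 50) to (200, 90): center is (150, 70).
const boxCenter = quadCenter([100, 50, 200, 50, 200, 90, 100, 90]);

// Diamond (a square rotated 45 degrees): center is (10, 10), whereas the
// two-corner formula would give x = 15.
const diamondCenter = quadCenter([10, 0, 20, 10, 10, 20, 0, 10]);
```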

4. Accessibility Tree & Refs

Why Accessibility Tree?

The accessibility tree provides:

  1. Semantic structure — Elements have roles (button, textbox, link)
  2. Accessible names — Human-readable labels
  3. Interactability — Focus, click, type capabilities
  4. Stability — Less brittle than CSS selectors
  5. Stealth — Same data screen readers use

Tree Parsing

typescript
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'checkbox', 'radio',
  'combobox', 'listbox', 'menuitem', 'searchbox',
  'slider', 'switch', 'tab', 'option'
]);

function isInteractable(node: AXNode): boolean {
  if (!node.role?.value) return false;
  if (INTERACTIVE_ROLES.has(node.role.value)) return true;

  // Otherwise, treat any focusable node as interactable
  const props = node.properties || [];
  return props.some(p =>
    p.name === 'focusable' && p.value?.value === true
  );
}

// Note: the parameter is named `client` because `debugger` is a reserved
// word in JavaScript and cannot be used as an identifier.
async function buildRefMap(
  client: DebuggerClient,
  tabId: number
): Promise<RefState> {
  const { nodes } = await client.sendCommand<AXTreeResult>(
    tabId,
    'Accessibility.getFullAXTree'
  );

  const refs = new Map<string, ElementRef>();
  let counter = 0;

  for (const node of nodes) {
    if (isInteractable(node) && node.backendDOMNodeId) {
      const alias = `d${++counter}`;
      refs.set(alias, {
        alias,
        backendNodeId: node.backendDOMNodeId,
        selector: '', // Built on-demand if needed
        role: node.role?.value,
        name: node.name?.value,
        isInteractable: true
      });
    }
  }

  return { refs, counter };
}

Ref Resolution

When a command references @d1:

typescript
function resolveRef(ref: string, state: RefState): ElementRef {
  // Remove @ prefix if present
  const alias = ref.startsWith('@') ? ref.slice(1) : ref;

  const element = state.refs.get(alias);
  if (!element) {
    throw new Error(`Unknown ref: ${ref}`);
  }

  return element;
}

5. Output Formatting

Token-Efficient Design

Output is designed to minimize AI context usage:

# Snapshot output (compact)
URL: https://example.com/login
Title: Login - Example

@d1  button "Sign In"
@d2  textbox "Email"
@d3  textbox "Password"
@d4  link "Forgot password?"
@d5  checkbox "Remember me"

# Command output (minimal)
Clicked @d1 (button "Sign In")

# Error output (actionable)
Error: Element @d99 not found. Run 'dom0 snapshot' to refresh refs.
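
The compact snapshot layout above can be produced with a few lines of string formatting. This is a sketch rather than the contents of `output.ts`: `formatRef` and `formatSnapshot` are hypothetical names, and the `unknown` fallback for a missing role is an assumption.

```typescript
interface RefSummary {
  alias: string;
  role?: string;
  name?: string;
}

interface SnapshotSummary {
  url: string;
  title: string;
  refs: RefSummary[];
}

// One line per element: alias, role, then the quoted accessible name.
function formatRef(ref: RefSummary): string {
  const role = ref.role ?? 'unknown'; // Assumed fallback, not from the source
  const name = ref.name ? ` ${JSON.stringify(ref.name)}` : '';
  return `@${ref.alias}  ${role}${name}`;
}

// Header lines, a blank separator, then one line per ref.
function formatSnapshot(snapshot: SnapshotSummary): string {
  return [
    `URL: ${snapshot.url}`,
    `Title: ${snapshot.title}`,
    '',
    ...snapshot.refs.map(formatRef),
  ].join('\n');
}
```

Using `JSON.stringify` for the name keeps quotes and newlines inside accessible names escaped, so each element stays on one line.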

JSON Mode

For programmatic access:

bash
$ dom0 snapshot --json

json
{
  "success": true,
  "data": {
    "url": "https://example.com/login",
    "title": "Login - Example",
    "refs": {
      "d1": {
        "alias": "d1",
        "role": "button",
        "name": "Sign In",
        "backendNodeId": 42
      }
    }
  },
  "duration": 156
}
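
A script consuming the JSON output might filter refs by role before choosing what to click. The sketch below assumes the output shape shown above; `findRefsByRole` is a hypothetical helper and the error message for a failed snapshot is illustrative.

```typescript
interface JsonRef {
  alias: string;
  role?: string;
  name?: string;
  backendNodeId: number;
}

interface SnapshotJson {
  success: boolean;
  data?: {
    url: string;
    title: string;
    refs: Record<string, JsonRef>;
  };
  error?: string;
  duration?: number;
}

// Parses `dom0 snapshot --json` output and returns the @-prefixed refs
// whose role matches.
function findRefsByRole(output: string, role: string): string[] {
  const parsed = JSON.parse(output) as SnapshotJson;
  if (!parsed.success || !parsed.data) {
    throw new Error(parsed.error ?? 'Snapshot failed');
  }
  return Object.values(parsed.data.refs)
    .filter((ref) => ref.role === role)
    .map((ref) => `@${ref.alias}`);
}
```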

6. Stealth Features

Why Stealth Matters

Bot detection systems check for:

  • navigator.webdriver property
  • Puppeteer/Playwright fingerprints
  • Headless browser indicators
  • Automation framework signatures

How dom0 Avoids Detection

| Detection Method | Playwright/Puppeteer | dom0 |
| --- | --- | --- |
| navigator.webdriver | Modified (detected) | Untouched |
| Chrome flags | --enable-automation | None |
| CDP connection | External (detected) | Native extension |
| Input events | Synthetic (detected) | Real dispatch |
| Browser profile | New/temp (detected) | User's actual profile |

Real Input Events

dom0 dispatches actual input events through CDP:

typescript
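// Dispatched through Chrome's input pipeline, so the page receives a
// trusted event. `client` is a DebuggerClient instance (`debugger` is a
// reserved word in JavaScript and cannot be used as a variable name).
await client.sendCommand(tabId, 'Input.dispatchMouseEvent', {
  type: 'mousePressed',
  x: 150,
  y: 200,
  button: 'left',
  clickCount: 1,
  timestamp: Date.now() / 1000 // Real timestamp, in seconds since epoch
});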

7. Error Handling

Ref Errors

typescript
// Element not found
{ success: false, error: "Unknown ref: @d99" }

// Element no longer valid (page changed)
{ success: false, error: "Element @d3 is stale. Re-run snapshot." }

// Element not interactable
{ success: false, error: "Element @d5 is not clickable" }

Connection Errors

typescript
// Extension not connected
{ success: false, error: "Extension not connected. Refresh the extension." }

// Daemon not running
Error: Connection refused. Run 'dom0 status' to check daemon.

// Tab not found
{ success: false, error: "No active tab. Open a browser tab first." }

Recovery Suggestions

Errors include actionable recovery steps:

Error: Extension not connected.

To fix:
1. Open Chrome
2. Go to chrome://extensions
3. Find "dom0" and click the refresh icon
4. Run 'dom0 ping' to verify connection
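
Output like this could come from a lookup table of known errors and their fixes. The following is only a sketch: `RECOVERY_STEPS` and `formatError` are hypothetical names, and the single entry reuses the steps from the example above.

```typescript
// Hypothetical table mapping an error message to its recovery steps.
const RECOVERY_STEPS = new Map<string, string[]>([
  [
    'Extension not connected.',
    [
      'Open Chrome',
      'Go to chrome://extensions',
      'Find "dom0" and click the refresh icon',
      "Run 'dom0 ping' to verify connection",
    ],
  ],
]);

// Formats an error, appending numbered recovery steps when any are known.
function formatError(message: string): string {
  const header = `Error: ${message}`;
  const steps = RECOVERY_STEPS.get(message);
  if (!steps) return header;
  const numbered = steps.map((step, index) => `${index + 1}. ${step}`);
  return [header, '', 'To fix:', ...numbered].join('\n');
}
```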

8. Security Considerations

Permissions Required

The extension requires:

json
{
  "permissions": [
    "debugger",    // CDP access
    "tabs",        // Tab info
    "activeTab",   // Current tab access
    "scripting"    // Script injection (for selectors)
  ],
  "host_permissions": ["<all_urls>"]
}

Localhost Only

The daemon only listens on localhost:9222:

typescript
const wss = new WebSocketServer({
  host: 'localhost', // Not exposed to network
  port: DEFAULT_PORT
});

No Credential Storage

dom0 never:

  • Stores passwords or cookies
  • Persists session data
  • Logs sensitive information
  • Transmits data externally

9. Limitations

| Limitation | Reason | Workaround |
| --- | --- | --- |
| Chrome only | Uses chrome.debugger API | None (Chrome-specific) |
| Single tab | One active debugging session | Use tab-switch command |
| No iframes | Accessibility tree is flat | Manual navigation |
| Extension required | MV3 service worker | Must install extension |
| Local only | Daemon on localhost | Run on same machine |

10. Future Enhancements

Planned Features

  • Multi-tab support
  • iframe navigation
  • Network interception
  • Cookie management
  • File upload handling
  • Drag and drop
  • Shadow DOM support
  • Recording/playback

Architecture Improvements

  • Hot-reload extension connection
  • Persistent ref mapping
  • Command queuing
  • Retry with backoff
  • Health monitoring