dom0

dom0 is a browser automation toolkit for AI agents. It uses Chrome's DevTools Protocol (CDP) via chrome.debugger to provide stealth browser control that avoids detection by anti-bot systems.

Why dom0?

Traditional browser automation tools like Playwright and Puppeteer are easily detected by anti-bot systems. dom0 takes a different approach:

  • Stealth by default — Uses native Chrome debugging, not injected scripts
  • Accessibility-first — Works with the accessibility tree, not fragile CSS selectors
  • Ref system — Elements get stable aliases (@d1, @d2) that persist across interactions
  • Token-efficient — Outputs designed to minimize AI context usage
  • Daemon architecture — Background process stays alive between commands

Quick Example

```bash
# Get page snapshot with element refs
$ dom0 snapshot
URL: https://example.com/login
Title: Login
@d1 button "Sign In"
@d2 textbox "Email"
@d3 textbox "Password"
@d4 link "Forgot password?"

# Interact using refs
$ dom0 type @d2 "user@example.com"
$ dom0 type @d3 "secret123"
$ dom0 click @d1
```
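
An agent that consumes this output usually needs to turn the ref lines back into structured data. The following is a minimal sketch of such a parser, assuming the line layout shown in the example (`@dN role "name"`). The `SnapshotRef` type and the `parseSnapshot` and `findRef` helpers are hypothetical names for illustration, not part of dom0's API, and real snapshot output may contain lines this sketch ignores.

```typescript
// Hypothetical shape for one parsed snapshot line; not part of dom0's API.
interface SnapshotRef {
  ref: string; // e.g. "@d1"
  role: string; // e.g. "button"
  name: string; // accessible name, e.g. "Sign In"
}

// Matches lines such as: @d1 button "Sign In"
// Assumes the layout from the example above.
const REF_LINE = /^(@d\d+)\s+(\S+)\s+"(.*)"$/;

// Collect every ref line; anything else (URL, Title, blanks) is skipped.
function parseSnapshot(output: string): SnapshotRef[] {
  const refs: SnapshotRef[] = [];
  for (const rawLine of output.split("\n")) {
    const match = REF_LINE.exec(rawLine.trim());
    if (match) {
      refs.push({ ref: match[1], role: match[2], name: match[3] });
    }
  }
  return refs;
}

// Look up a ref by role and accessible name, e.g. ("textbox", "Email").
function findRef(
  refs: SnapshotRef[],
  role: string,
  name: string
): string | undefined {
  return refs.find((r) => r.role === role && r.name === name)?.ref;
}
```

With the snapshot above, `findRef(parseSnapshot(output), "textbox", "Email")` would resolve to the ref that the `dom0 type` command expects.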

Architecture

dom0 consists of three packages:

| Package | Purpose |
| --- | --- |
| @bot0/dom0 | Core types and WebSocket protocol |
| @bot0/dom0-cli | CLI tool with background daemon |
| @bot0/dom0-extension | Chrome extension (MV3) |

┌─────────────────┐     WebSocket     ┌─────────────────────┐
│   dom0 CLI      │◄──────────────────│  Chrome Extension   │
│                 │                   │                     │
│  • Commands     │                   │  • chrome.debugger  │
│  • Daemon       │                   │  • Accessibility    │
│  • Output       │                   │  • CDP commands     │
└─────────────────┘                   └─────────────────────┘
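
To make the CLI-to-extension link more concrete, here is a rough sketch of what a request/response pair over the WebSocket could look like. The actual message format lives in @bot0/dom0 and is not documented on this page, so every name below (`Dom0Request`, `Dom0Response`, `encodeRequest`, `decodeResponse`, and the field names) is an assumption made for illustration only.

```typescript
// Hypothetical message shapes; the real protocol in @bot0/dom0 may differ.
type Dom0Request = { id: number; command: string; args: string[] };
type Dom0Response =
  | { id: number; ok: true; result: string }
  | { id: number; ok: false; error: string };

// Serialize a CLI command into a JSON text frame.
function encodeRequest(id: number, command: string, args: string[]): string {
  const request: Dom0Request = { id, command, args };
  return JSON.stringify(request);
}

// Parse and validate a frame received from the extension.
function decodeResponse(raw: string): Dom0Response {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("response must be a JSON object");
  }
  const fields = data as Record<string, unknown>;
  const id = fields["id"];
  const ok = fields["ok"];
  if (typeof id !== "number" || typeof ok !== "boolean") {
    throw new Error("response needs a numeric id and a boolean ok");
  }
  if (ok) {
    const result = fields["result"];
    if (typeof result !== "string") {
      throw new Error("successful response needs a string result");
    }
    return { id, ok: true, result };
  }
  const error = fields["error"];
  if (typeof error !== "string") {
    throw new Error("failed response needs a string error");
  }
  return { id, ok: false, error };
}
```

Carrying an `id` on both sides is one common way to match responses to requests when several commands share a single connection.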

Key Concepts

Element Refs

Every interactable element gets a short alias (@d1, @d2, etc.) based on the accessibility tree. These refs:

  • Are assigned fresh on each snapshot
  • Target elements by backendNodeId (Chrome's internal ID)
  • Fall back to CSS selectors if needed
  • Include role and accessible name for context
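
The properties above can be pictured as a small lookup table that is rebuilt on every snapshot. The sketch below is one way such a table might be built, not dom0's actual implementation: `AxNode` is a simplified, hypothetical stand-in for an accessibility-tree node, and `assignRefs`, `RefEntry`, and `RefTarget` are illustrative names.

```typescript
// Simplified, hypothetical view of an interactable accessibility node.
interface AxNode {
  backendNodeId?: number; // Chrome's internal node ID, when available
  role: string;
  name: string;
  selector?: string; // CSS selector used only as a fallback
}

type RefTarget =
  | { kind: "backendNodeId"; backendNodeId: number }
  | { kind: "selector"; selector: string };

interface RefEntry {
  ref: string;
  role: string;
  name: string;
  target: RefTarget;
}

// Build a fresh ref table for one snapshot. Numbering restarts at @d1 on
// every call, which mirrors "assigned fresh on each snapshot".
function assignRefs(nodes: AxNode[]): Map<string, RefEntry> {
  const table = new Map<string, RefEntry>();
  let counter = 0;
  for (const node of nodes) {
    let target: RefTarget;
    if (node.backendNodeId !== undefined) {
      target = { kind: "backendNodeId", backendNodeId: node.backendNodeId };
    } else if (node.selector !== undefined) {
      target = { kind: "selector", selector: node.selector };
    } else {
      continue; // nothing to target, so the node gets no ref
    }
    counter += 1;
    const ref = `@d${counter}`;
    table.set(ref, { ref, role: node.role, name: node.name, target });
  }
  return table;
}
```

Because each call starts from an empty table, a ref such as `@d2` is only meaningful relative to the snapshot that produced it.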

Background Daemon

The CLI runs a background daemon that:

  • Auto-starts on first command
  • Stays alive between commands (3-minute timeout)
  • Maintains the WebSocket connection to the extension
  • Enables sequential command execution in the same terminal
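
The 3-minute idle timeout can be sketched as a tracker that records the time of the last command and reports when the daemon has been idle long enough to exit. This is an illustrative sketch only: `IdleTracker` and its methods are hypothetical names, and the clock is injected so the logic can be exercised without real timers. How the dom0 daemon actually schedules its shutdown is not described here.

```typescript
// 3 minutes, matching the timeout stated above.
const IDLE_TIMEOUT_MS = 3 * 60 * 1000;

// Hypothetical helper: tracks idle time against an injectable clock.
class IdleTracker {
  private readonly now: () => number;
  private readonly timeoutMs: number;
  private lastActivityMs: number;

  constructor(now: () => number, timeoutMs: number = IDLE_TIMEOUT_MS) {
    this.now = now;
    this.timeoutMs = timeoutMs;
    this.lastActivityMs = now();
  }

  // Call whenever a command arrives; this resets the idle window.
  touch(): void {
    this.lastActivityMs = this.now();
  }

  // True once no command has arrived for the full timeout.
  shouldShutDown(): boolean {
    return this.now() - this.lastActivityMs >= this.timeoutMs;
  }
}
```

A daemon built this way would call `touch()` for each incoming command and poll `shouldShutDown()` periodically, for example with `new IdleTracker(() => Date.now())`.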

Stealth Approach

Unlike Playwright and Puppeteer, which inject JavaScript and modify browser properties, dom0:

  • Uses chrome.debugger API (native CDP)
  • Operates through accessibility tree (like a screen reader)
  • Dispatches real input events (mouse, keyboard)
  • Doesn't modify navigator or window properties
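
As an illustration of dispatching real input events over CDP, the sketch below builds the command sequence for a left click at an element's center. `Input.dispatchMouseEvent` and the 8-number content quad returned by `DOM.getBoxModel` are standard CDP features; the helper names (`quadCenter`, `clickCommands`, `CdpCommand`) are hypothetical, and the exact sequence dom0 sends is an assumption.

```typescript
// One CDP call, ready to hand to a transport such as
// chrome.debugger.sendCommand(target, method, params) in an extension.
interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

// A CDP quad is [x1, y1, x2, y2, x3, y3, x4, y4]; return its center point.
function quadCenter(quad: number[]): { x: number; y: number } {
  if (quad.length !== 8) {
    throw new Error("a quad must contain exactly 8 numbers");
  }
  let sumX = 0;
  let sumY = 0;
  for (let i = 0; i < 8; i += 2) {
    sumX += quad[i];
    sumY += quad[i + 1];
  }
  return { x: sumX / 4, y: sumY / 4 };
}

// Move, press, release: the mouse events for a single left click.
function clickCommands(x: number, y: number): CdpCommand[] {
  const press = { x, y, button: "left", clickCount: 1 };
  return [
    { method: "Input.dispatchMouseEvent", params: { type: "mouseMoved", x, y } },
    { method: "Input.dispatchMouseEvent", params: { type: "mousePressed", ...press } },
    { method: "Input.dispatchMouseEvent", params: { type: "mouseReleased", ...press } },
  ];
}
```

Nothing in this flow runs script inside the page: the coordinates come from the browser's own layout data and the events are issued through the debugger channel.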

Installation

```bash
# Build all dom0 packages
pnpm dom0:build

# Load extension in Chrome
# 1. Go to chrome://extensions
# 2. Enable "Developer mode"
# 3. Click "Load unpacked"
# 4. Select: packages/dom0-extension/dist

# Link CLI globally (optional)
cd packages/dom0-cli && pnpm link --global

# Test connection
dom0 ping
```

Usage with AI Agents

dom0 is designed to be used as a skill by AI agents like bot0:

```markdown
## Browser Automation (dom0)

Use the dom0 CLI for web interactions.

### Workflow
1. `dom0 snapshot` — See page elements with refs
2. Interact using refs (@d1, @d2, etc.)
3. Re-snapshot after navigation

### Key Commands
- `dom0 navigate <url>` — Go to URL
- `dom0 snapshot` — Get page state
- `dom0 click @d1` — Click element
- `dom0 type @d2 "text"` — Type into input
- `dom0 screenshot` — Capture page
```

See the Agent Integration guide for complete SKILL.md instructions.