dom0 — Agent Integration
Guide for using dom0 as an AI agent skill for browser automation.
## Overview
dom0 is designed to be used by AI agents like bot0, Claude Code, or any agent that can execute shell commands. This guide covers:
- SKILL.md format for agent instructions
- Workflow patterns
- Best practices
- Common pitfalls
## SKILL.md Template
Include this in your agent's skill definitions:
---
name: dom0
description: Browser automation via CLI. Use for web interaction, form filling, data extraction, and web scraping.
---

# dom0 Browser Automation

Control Chrome browser through the dom0 CLI.

## Core Workflow

1. **Navigate**: Go to a URL
2. **Snapshot**: See page elements with refs (@d1, @d2, etc.)
3. **Interact**: Click, type, scroll using refs
4. **Re-snapshot**: Get updated state after changes

## Key Commands

| Command | Description | Example |
|---------|-------------|---------|
| `dom0 navigate <url>` | Go to URL | `dom0 navigate https://google.com` |
| `dom0 snapshot` | Get page state | `dom0 snapshot` |
| `dom0 click <ref>` | Click element | `dom0 click @d1` |
| `dom0 type <ref> <text>` | Type into input | `dom0 type @d2 "hello"` |
| `dom0 fill <ref> <text>` | Clear and type | `dom0 fill @d3 "new text"` |
| `dom0 select <ref> <value>` | Select dropdown | `dom0 select @d4 "option1"` |
| `dom0 scroll <direction>` | Scroll page | `dom0 scroll down` |
| `dom0 wait` | Wait for condition | `dom0 wait --text "Success"` |
| `dom0 screenshot` | Capture page | `dom0 screenshot -o page.png` |
| `dom0 press <key>` | Press keyboard key | `dom0 press Enter` |
| `dom0 get-text <ref>` | Get element text | `dom0 get-text @d5` |

## Understanding Refs

Refs are temporary aliases assigned to interactable elements:

```
@d1  button "Sign In"           ← Click target
@d2  textbox "Email"            ← Type target
@d3  textbox "Password"         ← Type target
@d4  link "Forgot password?"    ← Click target
```
**Important:**
- Refs are assigned fresh on each snapshot
- Refs change after navigation or page updates
- Always re-snapshot after major interactions
## Example Workflows
### Login Flow
```bash
dom0 navigate https://example.com/login
dom0 snapshot
dom0 type @d2 "user@example.com"
dom0 type @d3 "password123"
dom0 click @d1
dom0 wait --text "Dashboard"
dom0 snapshot
```

### Search

```bash
dom0 navigate https://google.com
dom0 snapshot
dom0 type @d1 "search query"
dom0 press Enter
dom0 wait --selector "#search"
dom0 snapshot
```

### Form Filling

```bash
dom0 snapshot
dom0 fill @d2 "John Doe"
dom0 fill @d3 "john@example.com"
dom0 select @d4 "United States"
dom0 click @d5  # Submit
```
## Rules
- Always snapshot first — Never guess refs
- Re-snapshot after navigation — Refs change
- Use fill for inputs — Clears existing text
- Wait after actions — Pages may need time to update
- Check for errors — Handle failures gracefully
## Error Handling

If a command fails:

- Run `dom0 snapshot` to see current state
- Check if the element still exists
- Look for error messages or modals
- Retry with correct refs
---
## Integration Patterns
### Pattern 1: Direct Shell Access
For agents with shell tool access:
```typescript
// Agent executes shell commands directly.
// `shell` stands for whatever command-execution tool the agent exposes.
await shell('dom0 navigate https://example.com');
await shell('dom0 snapshot');
const output = await shell('dom0 click @d1');
```
### Pattern 2: MCP Tool

Wrap dom0 as an MCP tool:

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Run dom0 with an argument array (no shell), so URLs and refs
// cannot be interpreted as shell syntax.
async function dom0(...args: string[]): Promise<string> {
  const { stdout } = await execFileAsync('dom0', args);
  return stdout;
}

const dom0Tools = [
  {
    name: 'dom0_navigate',
    description: 'Navigate browser to URL',
    input_schema: {
      type: 'object',
      properties: {
        url: { type: 'string', description: 'URL to navigate to' }
      },
      required: ['url']
    },
    handler: async ({ url }: { url: string }) => dom0('navigate', url)
  },
  {
    name: 'dom0_snapshot',
    description: 'Get current page state with element refs',
    input_schema: {
      type: 'object',
      properties: {}
    },
    handler: async () => dom0('snapshot')
  },
  {
    name: 'dom0_click',
    description: 'Click an element by ref',
    input_schema: {
      type: 'object',
      properties: {
        ref: { type: 'string', description: 'Element ref (e.g., @d1)' }
      },
      required: ['ref']
    },
    handler: async ({ ref }: { ref: string }) => dom0('click', ref)
  }
  // ... more tools
];
```
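Because tool inputs come from the model, a wrapper may want to validate refs before they reach the command line. The sketch below assumes refs always look like `@d` followed by digits, as in this guide's examples; `isValidRef` and `clickArgs` are hypothetical helper names, not part of dom0.

```typescript
// Assumption: refs look like "@d1", "@d42" (inferred from this guide's examples).
const REF_PATTERN = /^@d\d+$/;

function isValidRef(ref: string): boolean {
  return REF_PATTERN.test(ref);
}

// Build the argument list for `dom0 click <ref>`, rejecting anything
// that is not a plain ref so stray shell syntax never reaches the CLI.
function clickArgs(ref: string): string[] {
  if (!isValidRef(ref)) {
    throw new Error(`Invalid ref: ${JSON.stringify(ref)}`);
  }
  return ["click", ref];
}
```

A handler could call `clickArgs(ref)` and pass the result to its process-spawning helper instead of interpolating the raw string.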
### Pattern 3: Composite Tool

A single tool that accepts all dom0 commands:

```typescript
// `exec` is assumed to be a promisified child_process.exec.
// The command string reaches a shell unescaped, so validate it first.
const browserTool = {
  name: 'browser',
  description: 'Browser automation. Commands: snapshot, click, type, navigate, etc.',
  input_schema: {
    type: 'object',
    properties: {
      command: {
        type: 'string',
        description: 'dom0 command (e.g., "snapshot", "click @d1", "type @d2 hello")'
      }
    },
    required: ['command']
  },
  handler: async ({ command }: { command: string }) => {
    return exec(`dom0 ${command}`);
  }
};
```
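A composite tool receives a free-form string, so one option is to split it into arguments and check the subcommand before running anything. This is a minimal sketch: `tokenize` and `parseCommand` are hypothetical helpers, the allowlist is copied from the Key Commands table, and the quoting rules (double quotes only, no escapes) are a simplification rather than dom0's own parsing.

```typescript
// Subcommands taken from the Key Commands table in this guide; extend as needed.
const ALLOWED_SUBCOMMANDS = new Set([
  "navigate", "snapshot", "click", "type", "fill", "select",
  "scroll", "wait", "screenshot", "press", "get-text",
]);

// Split a command string into arguments, treating a double-quoted
// segment as one argument. Backslash escapes are not handled.
function tokenize(command: string): string[] {
  const args: string[] = [];
  const pattern = /"([^"]*)"|(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(command)) !== null) {
    args.push(match[1] ?? match[2] ?? "");
  }
  return args;
}

// Tokenize and reject anything whose first word is not a known subcommand.
function parseCommand(command: string): string[] {
  const args = tokenize(command);
  const sub = args[0];
  if (sub === undefined || !ALLOWED_SUBCOMMANDS.has(sub)) {
    throw new Error(`Unsupported dom0 command: ${JSON.stringify(command)}`);
  }
  return args;
}
```

The resulting array can be handed to an argument-array API such as Node's `child_process.execFile`, which avoids shell interpretation of the input.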
## Best Practices

### 1. Always Snapshot First

Never assume refs exist. Always start with a snapshot:

```bash
# Good
dom0 snapshot
dom0 click @d1

# Bad - might fail
dom0 click @d1  # What is @d1?
```
### 2. Re-snapshot After Navigation

Refs change when the page changes:

```bash
dom0 click @d1  # Submit button
dom0 wait --text "Success"
dom0 snapshot   # Get new refs for new page
```
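One way to enforce the first two practices in an agent wrapper is to track whether a snapshot has been taken since the last navigation. The `RefTracker` class below is a hypothetical helper, not part of dom0; it only reacts to explicit `navigate` commands, so clicks that trigger a page load still need a manual re-snapshot.

```typescript
class RefTracker {
  // True once a snapshot has been taken on the current page.
  private hasSnapshot = false;

  // Call with the subcommand that was just executed, e.g. "snapshot" or "navigate".
  record(subcommand: string): void {
    if (subcommand === "snapshot") {
      this.hasSnapshot = true;
    } else if (subcommand === "navigate") {
      this.hasSnapshot = false;
    }
  }

  // Whether refs from the most recent snapshot may be used.
  canUseRefs(): boolean {
    return this.hasSnapshot;
  }
}
```

A wrapper could check `canUseRefs()` before any command that takes a ref and run `dom0 snapshot` first when it returns `false`.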
### 3. Use Wait Commands

Pages need time to load:

```bash
# Good
dom0 click @d1
dom0 wait --text "Loading complete"
dom0 snapshot

# Bad - might get stale page
dom0 click @d1
dom0 snapshot
```
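For wrappers that build `dom0 wait` invocations programmatically, a small argument builder keeps the flags in one place. The sketch uses only the flags that appear in this guide (`--text`, `--selector`, `--timeout`); that they can be combined, and that the timeout is in milliseconds, are assumptions. `WaitOptions` and `waitArgs` are hypothetical names.

```typescript
interface WaitOptions {
  text?: string;      // maps to --text
  selector?: string;  // maps to --selector
  timeoutMs?: number; // maps to --timeout (milliseconds assumed)
}

// Build the argument list for a `dom0 wait` call from the given options.
function waitArgs(options: WaitOptions): string[] {
  const args = ["wait"];
  if (options.text !== undefined) args.push("--text", options.text);
  if (options.selector !== undefined) args.push("--selector", options.selector);
  if (options.timeoutMs !== undefined) args.push("--timeout", String(options.timeoutMs));
  return args;
}
```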
### 4. Handle Errors Gracefully

Check command output:

```bash
result=$(dom0 click @d1)
if [[ $result == *"Error"* ]]; then
  dom0 snapshot  # Get current state
  # Retry or adapt
fi
```
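The same check-and-recover loop can be sketched in TypeScript. The runner is injected so the logic stays independent of how dom0 is actually spawned; `Runner`, `looksLikeError`, and `clickWithRecovery` are hypothetical names, and detecting failure by the substring "Error" simply mirrors the shell check above rather than a documented dom0 contract.

```typescript
// Runs dom0 with the given arguments and returns its output.
// In practice this might wrap a process-spawning call; here it is injected.
type Runner = (args: string[]) => string;

// Assumption: failed commands include "Error" somewhere in their output.
function looksLikeError(output: string): boolean {
  return output.includes("Error");
}

// Click a ref; on failure take a fresh snapshot and let `pickRef`
// choose a replacement ref from it (or return null to give up).
function clickWithRecovery(
  run: Runner,
  ref: string,
  pickRef: (snapshot: string) => string | null,
): string {
  const first = run(["click", ref]);
  if (!looksLikeError(first)) return first;

  const snapshot = run(["snapshot"]);
  const retryRef = pickRef(snapshot);
  if (retryRef === null) return first;
  return run(["click", retryRef]);
}
```

For an agent, `pickRef` is typically where the model reads the new snapshot and decides which element it meant.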
### 5. Use Fill for Inputs

`fill` clears existing text; `type` appends:

```bash
# Clear and type new value
dom0 fill @d2 "new@example.com"

# Append to existing value
dom0 type @d2 ".au"  # new@example.com.au
```
## Common Workflows

### Web Search

```bash
#!/bin/bash
# Search Google and get results

dom0 navigate https://google.com
dom0 snapshot

# Find search box and type
dom0 type @d1 "$1"
dom0 press Enter

# Wait for results
dom0 wait --selector "#search"
dom0 snapshot
```
### Login Automation

```bash
#!/bin/bash
# Login to a website

URL=$1
EMAIL=$2
PASSWORD=$3

dom0 navigate "$URL"
dom0 snapshot

# Find and fill credentials
dom0 fill @d2 "$EMAIL"
dom0 fill @d3 "$PASSWORD"
dom0 click @d1  # Sign in button

# Verify success
dom0 wait --text "Welcome"
dom0 snapshot
```
### Data Extraction

```bash
#!/bin/bash
# Extract data from a page

dom0 navigate "$1"
dom0 snapshot --json > page.json

# Get specific element text
dom0 get-text @d5 > title.txt
dom0 get-attr @d6 href > link.txt

# Screenshot for reference
dom0 screenshot -o page.png
```
### Form Submission

```bash
#!/bin/bash
# Fill and submit a form

dom0 navigate "$FORM_URL"
dom0 snapshot

# Fill fields
dom0 fill @d2 "John Doe"
dom0 fill @d3 "john@example.com"
dom0 fill @d4 "555-1234"
dom0 select @d5 "United States"
dom0 click @d6  # Checkbox "I agree"
dom0 click @d7  # Submit

# Verify
dom0 wait --text "Thank you"
dom0 screenshot -o confirmation.png
```
## Troubleshooting

### "Extension not connected"

The Chrome extension isn't communicating with the daemon.

Fix:

- Open Chrome
- Go to `chrome://extensions`
- Find dom0 and click refresh
- Run `dom0 ping` to verify

### "Unknown ref: @d99"
The ref doesn't exist on the current page.

Fix:

- Run `dom0 snapshot` to see available refs
- Use a ref that exists

### "Element is stale"
The page changed since the last snapshot.

Fix:

- Run `dom0 snapshot` to refresh refs
- Find the element again
### Refs Changed Unexpectedly

The page content changed (AJAX, animations, etc.).

Fix:

- Add `dom0 wait` after interactions
- Re-snapshot before continuing
### Command Timeout

The page took too long to respond.

Fix:

- Increase timeout: `dom0 wait --timeout 60000`
- Check if page is actually loading
- Consider if the site is blocking automation
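The troubleshooting cases above can be folded into a small lookup that an agent wrapper consults after a failed command. This is only a sketch: `Recovery` and `recoveryFor` are hypothetical names, and it assumes dom0's error output contains the phrases quoted in the headings above (the wording of timeout messages in particular is a guess).

```typescript
type Recovery =
  | "reload-extension"
  | "resnapshot"
  | "increase-timeout"
  | "unknown";

// Map error output to the fix suggested in the troubleshooting list.
// Assumption: the quoted phrases appear verbatim in dom0's output.
function recoveryFor(output: string): Recovery {
  if (output.includes("Extension not connected")) return "reload-extension";
  if (output.includes("Unknown ref")) return "resnapshot";
  if (output.includes("Element is stale")) return "resnapshot";
  if (/timeout|timed out/i.test(output)) return "increase-timeout";
  return "unknown";
}
```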
## Agent Prompting Tips

### Be Specific About State
Tell the agent when to snapshot:
After each navigation or form submission, run `dom0 snapshot` to see the updated page state before continuing.
### Explain Ref Lifecycle
Make refs clear:
Refs (@d1, @d2) are temporary identifiers assigned to page elements. They change after navigation. Always snapshot to get current refs.
### Suggest Error Recovery
Help agents handle failures:
If a command fails with "Unknown ref", run `dom0 snapshot` to see current elements and find the correct ref.
### Limit Token Usage
Use JSON for parsing, text for display:
Use `dom0 snapshot` for readable output. Use `dom0 snapshot --json` when you need to parse the result.
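When a wrapper has already parsed the JSON output, it can hand the model a compact text rendering instead of the raw JSON. The sketch below assumes each ref entry carries `alias`, `role`, and `name` fields, as in the sample under JSON Output Mode; `RefInfo` and `compactRefs` are hypothetical names.

```typescript
interface RefInfo {
  alias: string;
  role: string;
  name: string;
}

// Render refs as one short line each, e.g. `@d1 button "Click me"`.
// If `roles` is given, only refs with one of those roles are kept.
function compactRefs(refs: Record<string, RefInfo>, roles?: string[]): string {
  const lines: string[] = [];
  for (const key of Object.keys(refs)) {
    const info = refs[key];
    if (info === undefined) continue;
    if (roles !== undefined && roles.indexOf(info.role) === -1) continue;
    lines.push(`@${info.alias} ${info.role} ${JSON.stringify(info.name)}`);
  }
  return lines.join("\n");
}
```

Filtering by role (for example, only `button` and `textbox`) is one way to shrink the prompt further on large pages.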
## JSON Output Mode

For programmatic access, use `--json`:

```bash
dom0 snapshot --json
```

```json
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "title": "Example",
    "refs": {
      "d1": {
        "alias": "d1",
        "role": "button",
        "name": "Click me",
        "backendNodeId": 42
      }
    }
  },
  "duration": 156
}
```
This is useful for:
- Parsing specific elements
- Checking success/failure
- Performance monitoring
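As a sketch of the first two uses, the function below parses the JSON text and returns the refs with a given role. The interfaces are inferred from the single sample above, so optional or additional fields are not modelled, and the assumption that failures set `success` to `false` is not confirmed by this guide. `SnapshotResponse` and `refsByRole` are hypothetical names.

```typescript
interface SnapshotRef {
  alias: string;
  role: string;
  name: string;
  backendNodeId: number;
}

interface SnapshotResponse {
  success: boolean;
  data: { url: string; title: string; refs: Record<string, SnapshotRef> };
  duration: number;
}

// Parse `dom0 snapshot --json` output and return the refs (with "@" prefix)
// whose role matches, e.g. all buttons on the page.
function refsByRole(jsonText: string, role: string): string[] {
  const response = JSON.parse(jsonText) as SnapshotResponse;
  if (!response.success) {
    throw new Error("dom0 snapshot reported failure");
  }
  const matches: string[] = [];
  for (const key of Object.keys(response.data.refs)) {
    const info = response.data.refs[key];
    if (info !== undefined && info.role === role) {
      matches.push(`@${info.alias}`);
    }
  }
  return matches;
}
```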
## Security Considerations

### Credential Handling

Never hardcode credentials in scripts:

```bash
# Bad
dom0 type @d2 "mypassword123"

# Good - use environment variables
dom0 type @d2 "$PASSWORD"
```
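In a TypeScript wrapper the same idea might look like the sketch below, which fails early when the variable is missing instead of typing an empty string. `requireEnv` and `passwordArgs` are hypothetical helpers, and the environment is passed in as a plain object (in Node this would typically be `process.env`).

```typescript
type Env = Record<string, string | undefined>;

// Return the value of an environment variable, or throw if it is unset or empty.
function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Environment variable ${name} is not set`);
  }
  return value;
}

// Build the arguments for typing the password into the given ref.
function passwordArgs(ref: string, env: Env): string[] {
  return ["type", ref, requireEnv("PASSWORD", env)];
}
```

Note that a value passed as a command-line argument can still be visible in the local process list while the command runs, so this protects the script source rather than the running process.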
### Sensitive Data
dom0 does not log or store:
- Page content
- Typed text
- Cookies or sessions
However, be careful with:
- Screenshots containing sensitive data
- Snapshot output in logs
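A wrapper that logs the commands it runs can leak typed text even when dom0 itself does not store it. The sketch below masks the text argument of `type` and `fill` before logging; `redactForLog` is a hypothetical helper, and treating everything after the ref as sensitive is a deliberately blunt assumption.

```typescript
const TEXT_COMMANDS = new Set(["type", "fill"]);

// Return a copy of the arguments that is safe to write to a log:
// for `type` and `fill`, everything after the ref is replaced.
function redactForLog(args: string[]): string[] {
  const [sub] = args;
  if (sub === undefined || !TEXT_COMMANDS.has(sub)) {
    return args.slice();
  }
  return args.map((arg, index) => (index >= 2 ? "[REDACTED]" : arg));
}
```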
### Rate Limiting

Some sites detect rapid automation:

```bash
# Add delays between actions
dom0 click @d1
sleep 1
dom0 snapshot
```
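A fixed one-second pause is easy to fingerprint, so a wrapper might add some randomness to the delay. The helper below is a hypothetical sketch, not a dom0 feature; the random source is injectable so the result is predictable in tests.

```typescript
// Compute a delay in milliseconds: `baseMs` plus up to `jitterMs` extra.
// `random` should return a number in [0, 1), like Math.random.
function jitteredDelayMs(
  baseMs: number,
  jitterMs: number,
  random: () => number = Math.random,
): number {
  return Math.round(baseMs + jitterMs * random());
}
```

The returned value can be passed to whatever sleep mechanism the agent already uses between dom0 commands.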