
dom0 — Agent Integration

Guide for using dom0 as an AI agent skill for browser automation.

## Overview

dom0 is designed for AI agents such as bot0 and Claude Code, and for any other agent that can execute shell commands. This guide covers:

  • SKILL.md format for agent instructions
  • Workflow patterns
  • Best practices
  • Common pitfalls

## SKILL.md Template

Include this in your agent's skill definitions:

````markdown
---
name: dom0
description: Browser automation via CLI. Use for web interaction, form filling, data extraction, and web scraping.
---

# dom0 Browser Automation

Control Chrome browser through the dom0 CLI.

## Core Workflow

1. **Navigate**: Go to a URL
2. **Snapshot**: See page elements with refs (@d1, @d2, etc.)
3. **Interact**: Click, type, scroll using refs
4. **Re-snapshot**: Get updated state after changes

## Key Commands

| Command | Description | Example |
|---------|-------------|---------|
| `dom0 navigate <url>` | Go to URL | `dom0 navigate https://google.com` |
| `dom0 snapshot` | Get page state | `dom0 snapshot` |
| `dom0 click <ref>` | Click element | `dom0 click @d1` |
| `dom0 type <ref> <text>` | Type into input | `dom0 type @d2 "hello"` |
| `dom0 fill <ref> <text>` | Clear and type | `dom0 fill @d3 "new text"` |
| `dom0 select <ref> <value>` | Select dropdown | `dom0 select @d4 "option1"` |
| `dom0 scroll <direction>` | Scroll page | `dom0 scroll down` |
| `dom0 wait` | Wait for condition | `dom0 wait --text "Success"` |
| `dom0 screenshot` | Capture page | `dom0 screenshot -o page.png` |
| `dom0 press <key>` | Press keyboard key | `dom0 press Enter` |
| `dom0 get-text <ref>` | Get element text | `dom0 get-text @d5` |

## Understanding Refs

Refs are temporary aliases assigned to interactable elements:

```
@d1 button "Sign In"          ← Click target
@d2 textbox "Email"           ← Type target
@d3 textbox "Password"        ← Type target
@d4 link "Forgot password?"   ← Click target
```

**Important:**
- Refs are assigned fresh on each snapshot
- Refs change after navigation or page updates
- Always re-snapshot after major interactions

## Example Workflows

### Login Flow
```bash
dom0 navigate https://example.com/login
dom0 snapshot
dom0 type @d2 "user@example.com"
dom0 type @d3 "password123"
dom0 click @d1
dom0 wait --text "Dashboard"
dom0 snapshot
```

### Google Search
```bash
dom0 navigate https://google.com
dom0 snapshot
dom0 type @d1 "search query"
dom0 press Enter
dom0 wait --selector "#search"
dom0 snapshot
```

### Form Filling
```bash
dom0 snapshot
dom0 fill @d2 "John Doe"
dom0 fill @d3 "john@example.com"
dom0 select @d4 "United States"
dom0 click @d5  # Submit
```

## Rules

1. **Always snapshot first**: Never guess refs
2. **Re-snapshot after navigation**: Refs change
3. **Use fill for inputs**: Clears existing text
4. **Wait after actions**: Pages may need time to update
5. **Check for errors**: Handle failures gracefully

## Error Handling

If a command fails:

1. Run `dom0 snapshot` to see current state
2. Check if the element still exists
3. Look for error messages or modals
4. Retry with correct refs
````

---

## Integration Patterns

### Pattern 1: Direct Shell Access

For agents with shell tool access:

```typescript
// Agent executes shell commands directly
await shell('dom0 navigate https://example.com');
await shell('dom0 snapshot');
const output = await shell('dom0 click @d1');
```

### Pattern 2: MCP Tool

Wrap dom0 as an MCP tool:

```typescript
const dom0Tools = [
  {
    name: 'dom0_navigate',
    description: 'Navigate browser to URL',
    input_schema: {
      type: 'object',
      properties: {
        url: { type: 'string', description: 'URL to navigate to' }
      },
      required: ['url']
    },
    handler: async ({ url }) => {
      return exec(`dom0 navigate "${url}"`);
    }
  },
  {
    name: 'dom0_snapshot',
    description: 'Get current page state with element refs',
    input_schema: { type: 'object', properties: {} },
    handler: async () => {
      return exec('dom0 snapshot');
    }
  },
  {
    name: 'dom0_click',
    description: 'Click an element by ref',
    input_schema: {
      type: 'object',
      properties: {
        ref: { type: 'string', description: 'Element ref (e.g., @d1)' }
      },
      required: ['ref']
    },
    handler: async ({ ref }) => {
      return exec(`dom0 click ${ref}`);
    }
  }
  // ... more tools
];
```
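Because these handlers interpolate model-supplied values into a shell string, a crafted `url` or `ref` could smuggle in extra shell syntax. The following is a minimal sketch of one way to reduce that risk, not part of dom0 itself. It assumes refs always look like `@d` followed by digits (as in this guide's examples) and that the resulting argument array is run without a shell, for example through Node's `execFile`. The helper names `isValidRef`, `navigateArgs`, and `clickArgs` are hypothetical, and the http(s)-only URL check is this sketch's own restriction.

```typescript
// Hypothetical helpers: build argument arrays instead of shell strings.
// Assumption: refs have the form "@d<number>", as in this guide's examples.
const REF_PATTERN = /^@d\d+$/;

function isValidRef(ref: string): boolean {
  return REF_PATTERN.test(ref);
}

// Assumption of this sketch: only http(s) URLs are accepted.
function navigateArgs(url: string): string[] {
  if (!/^https?:\/\//i.test(url)) {
    throw new Error(`Refusing non-http(s) URL: ${url}`);
  }
  return ["navigate", url];
}

function clickArgs(ref: string): string[] {
  if (!isValidRef(ref)) {
    throw new Error(`Invalid ref: ${ref}`);
  }
  return ["click", ref];
}
```

A handler could then pass `clickArgs(ref)` to something like `execFile("dom0", args)`, so that no shell ever parses the values.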

### Pattern 3: Composite Tool

Single tool that accepts all dom0 commands:

```typescript
{
  name: 'browser',
  description: 'Browser automation. Commands: snapshot, click, type, navigate, etc.',
  input_schema: {
    type: 'object',
    properties: {
      command: {
        type: 'string',
        description: 'dom0 command (e.g., "snapshot", "click @d1", "type @d2 hello")'
      }
    },
    required: ['command']
  },
  handler: async ({ command }) => {
    return exec(`dom0 ${command}`);
  }
}
```
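A composite tool passes a free-form string to the shell, so it may be worth splitting and checking that string first. The sketch below is one possible approach under stated assumptions: the allow-list contains only the subcommands from the Key Commands table, and the tokenizer is deliberately simple (double quotes only, no escape sequences). `tokenize` and `toDom0Args` are hypothetical names.

```typescript
// Subcommands taken from the Key Commands table in this guide.
const ALLOWED_COMMANDS = new Set<string>([
  "navigate", "snapshot", "click", "type", "fill", "select",
  "scroll", "wait", "screenshot", "press", "get-text",
]);

// Split a command string into arguments, honoring double quotes.
// Simplification: escape sequences and single quotes are not supported.
function tokenize(command: string): string[] {
  const tokens: string[] = [];
  const pattern = /"([^"]*)"|(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(command)) !== null) {
    tokens.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return tokens;
}

// Convert a composite command into an argument array, rejecting
// anything whose first word is not an allowed subcommand.
function toDom0Args(command: string): string[] {
  const args = tokenize(command);
  const name = args[0];
  if (name === undefined || !ALLOWED_COMMANDS.has(name)) {
    throw new Error(`Unsupported dom0 command: "${command}"`);
  }
  return args;
}
```

When the result is executed without a shell, characters such as `;` stay ordinary argument text instead of starting a second command.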

## Best Practices

### 1. Always Snapshot First

Never assume refs exist. Always start with a snapshot:

```bash
# Good
dom0 snapshot
dom0 click @d1

# Bad - might fail
dom0 click @d1  # What is @d1?
```

### 2. Re-snapshot After Navigation

Refs change when the page changes:

```bash
dom0 click @d1  # Submit button
dom0 wait --text "Success"
dom0 snapshot  # Get new refs for new page
```

### 3. Use Wait Commands

Pages need time to load:

```bash
# Good
dom0 click @d1
dom0 wait --text "Loading complete"
dom0 snapshot

# Bad - might get stale page
dom0 click @d1
dom0 snapshot
```
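For agents that drive dom0 from code, the click, wait, snapshot sequence can be wrapped in a single helper so the wait is never skipped. This is a minimal sketch under assumptions: `Run` is a hypothetical function type standing in for whatever executes dom0 and returns its output (in Node it could wrap `child_process.execFileSync("dom0", args, { encoding: "utf8" })`), and `clickAndRefresh` is an illustrative name.

```typescript
// Hypothetical runner type: executes dom0 with the given arguments
// and returns its textual output.
type Run = (args: string[]) => string;

// Click a ref, wait for confirmation text, then take a fresh snapshot.
// The returned snapshot replaces any refs the caller held before.
function clickAndRefresh(run: Run, ref: string, expectedText: string): string {
  run(["click", ref]);
  run(["wait", "--text", expectedText]);
  return run(["snapshot"]);
}
```

Injecting the runner keeps the helper testable without a browser: a fake runner can record the calls and return canned output.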

### 4. Handle Errors Gracefully

Check command output:

```bash
result=$(dom0 click @d1)
if [[ $result == *"Error"* ]]; then
  dom0 snapshot  # Get current state
  # Retry or adapt
fi
```
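The same check-and-retry idea can be sketched in TypeScript. This example assumes, like the shell snippet above, that a failure shows up as the word "Error" in the output; whether dom0 also sets a non-zero exit code is not covered here. `Run`, `clickWithRecovery`, and the `resolveRef` callback (which picks a ref out of fresh snapshot output, or returns `null` to give up) are hypothetical.

```typescript
// Hypothetical runner type: executes dom0 and returns its output.
type Run = (args: string[]) => string;

// Assumption (from the shell example): failures contain the word "Error".
function failed(output: string): boolean {
  return output.includes("Error");
}

// Try a click; on failure, re-snapshot and let the caller choose a new ref.
function clickWithRecovery(
  run: Run,
  ref: string,
  resolveRef: (snapshot: string) => string | null,
  maxAttempts: number = 3
): string {
  let current = ref;
  let lastOutput = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    lastOutput = run(["click", current]);
    if (!failed(lastOutput)) return lastOutput;
    if (attempt === maxAttempts) break; // no point snapshotting again
    const next = resolveRef(run(["snapshot"]));
    if (next === null) break; // caller could not find the element
    current = next;
  }
  throw new Error(`dom0 click did not succeed: ${lastOutput}`);
}
```

Bounding the attempts keeps an agent from looping forever on a page where the element no longer exists.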

### 5. Use Fill for Inputs

`fill` clears existing text; `type` appends:

```bash
# Clear and type new value
dom0 fill @d2 "new@example.com"

# Append to existing value
dom0 type @d2 ".au"  # new@example.com.au
```

## Common Workflows

### Google Search

```bash
#!/bin/bash
# Search Google and get results

dom0 navigate https://google.com
dom0 snapshot

# Find search box and type
dom0 type @d1 "$1"
dom0 press Enter

# Wait for results
dom0 wait --selector "#search"
dom0 snapshot
```

### Login Automation

```bash
#!/bin/bash
# Login to a website

URL=$1
EMAIL=$2
PASSWORD=$3

dom0 navigate "$URL"
dom0 snapshot

# Find and fill credentials
dom0 fill @d2 "$EMAIL"
dom0 fill @d3 "$PASSWORD"
dom0 click @d1  # Sign in button

# Verify success
dom0 wait --text "Welcome"
dom0 snapshot
```

### Data Extraction

```bash
#!/bin/bash
# Extract data from a page

dom0 navigate "$1"
dom0 snapshot --json > page.json

# Get specific element text
dom0 get-text @d5 > title.txt
dom0 get-attr @d6 href > link.txt

# Screenshot for reference
dom0 screenshot -o page.png
```

### Form Submission

```bash
#!/bin/bash
# Fill and submit a form

dom0 navigate "$FORM_URL"
dom0 snapshot

# Fill fields
dom0 fill @d2 "John Doe"
dom0 fill @d3 "john@example.com"
dom0 fill @d4 "555-1234"
dom0 select @d5 "United States"
dom0 click @d6  # Checkbox "I agree"
dom0 click @d7  # Submit

# Verify
dom0 wait --text "Thank you"
dom0 screenshot -o confirmation.png
```

## Troubleshooting

### "Extension not connected"

The Chrome extension isn't communicating with the daemon.

Fix:

  1. Open Chrome
  2. Go to chrome://extensions
  3. Find dom0 and click refresh
  4. Run `dom0 ping` to verify

### "Unknown ref: @d99"

The ref doesn't exist on the current page.

Fix:

  1. Run dom0 snapshot to see available refs
  2. Use a ref that exists

### "Element is stale"

The page changed since the last snapshot.

Fix:

  1. Run dom0 snapshot to refresh refs
  2. Find the element again

### Refs Changed Unexpectedly

The page content changed (AJAX, animations, etc.).

Fix:

  1. Add dom0 wait after interactions
  2. Re-snapshot before continuing

### Command Timeout

The page took too long to respond.

Fix:

  1. Increase timeout: dom0 wait --timeout 60000
  2. Check if page is actually loading
  3. Consider if the site is blocking automation
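An agent wrapper can map these failure messages to a recovery step before deciding what to do next. The sketch below is illustrative only: the message fragments are taken from the headings in this section, the exact text dom0 prints may differ, and the timeout wording in particular is a guess. `Recovery` and `classifyError` are hypothetical names.

```typescript
type Recovery = "reconnect-extension" | "resnapshot" | "wait-longer" | "unknown";

// Map dom0 output to a recovery step.
// Assumption: messages contain the phrases used as headings in this guide;
// the timeout pattern is a guess at the wording.
function classifyError(output: string): Recovery {
  if (output.includes("Extension not connected")) return "reconnect-extension";
  if (output.includes("Unknown ref") || output.includes("Element is stale")) {
    return "resnapshot";
  }
  if (/timeout|timed out/i.test(output)) return "wait-longer";
  return "unknown";
}
```

Returning `"unknown"` for anything unrecognized lets the caller fall back to a plain `dom0 snapshot` and inspect the page, rather than applying the wrong fix.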

## Agent Prompting Tips

### Be Specific About State

Tell the agent when to snapshot:

```markdown
After each navigation or form submission, run `dom0 snapshot` to see the updated page state before continuing.
```

### Explain Ref Lifecycle

Make the ref lifecycle explicit:

```markdown
Refs (@d1, @d2) are temporary identifiers assigned to page elements. They change after navigation. Always snapshot to get current refs.
```

### Suggest Error Recovery

Help agents handle failures:

```markdown
If a command fails with "Unknown ref", run `dom0 snapshot` to see current elements and find the correct ref.
```

### Limit Token Usage

Use JSON for parsing, text for display:

```markdown
Use `dom0 snapshot` for readable output. Use `dom0 snapshot --json` when you need to parse the result.
```

## JSON Output Mode

For programmatic access, use `--json`:

```bash
dom0 snapshot --json
```

```json
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "title": "Example",
    "refs": {
      "d1": {
        "alias": "d1",
        "role": "button",
        "name": "Click me",
        "backendNodeId": 42
      }
    }
  },
  "duration": 156
}
```

This is useful for:

  • Parsing specific elements
  • Checking success/failure
  • Performance monitoring
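As an illustration of parsing specific elements, the sketch below looks up a ref by role and accessible name instead of hardcoding `@d1`. The types are inferred from the sample response above and may be incomplete. The shape of a failed response is not shown in this guide, so the code only assumes that `success` is false in that case. `findRef` is a hypothetical helper.

```typescript
// Shapes inferred from the sample response; other fields may exist.
interface RefInfo {
  alias: string;
  role: string;
  name: string;
  backendNodeId: number;
}

interface SnapshotResponse {
  success: boolean;
  data?: { url: string; title: string; refs: Record<string, RefInfo> };
  duration?: number;
}

// Return the "@dN" ref of the first element matching role and name,
// or null if the snapshot failed or nothing matches.
function findRef(json: string, role: string, name: string): string | null {
  const response = JSON.parse(json) as SnapshotResponse;
  if (!response.success || response.data === undefined) return null;
  const refs = response.data.refs;
  for (const key of Object.keys(refs)) {
    const info = refs[key];
    if (info.role === role && info.name === name) return `@${info.alias}`;
  }
  return null;
}
```

Looking elements up by role and name on every fresh snapshot avoids depending on ref numbers, which change between snapshots.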

## Security Considerations

### Credential Handling

Never hardcode credentials in scripts:

```bash
# Bad
dom0 type @d2 "mypassword123"

# Good - use environment variables
dom0 type @d2 "$PASSWORD"
```
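In a TypeScript wrapper the same rule might look like the sketch below, which fails fast when the variable is missing instead of typing an empty string into the form. The environment is passed in as a parameter so the helper can be tested (in Node the caller would pass `process.env`). `requireEnv` and `passwordArgs` are hypothetical names, and `PASSWORD` is the variable name from the shell example above.

```typescript
type Env = Record<string, string | undefined>;

// Read a required variable, refusing to continue if it is unset or empty.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Build the arguments for filling a password field without
// embedding the secret in source code.
function passwordArgs(env: Env, ref: string): string[] {
  return ["fill", ref, requireEnv(env, "PASSWORD")];
}
```

The secret is still passed to dom0 as a command-line argument, which on many systems other local processes can see, so this keeps the password out of source control but does not hide it from the machine.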

### Sensitive Data

dom0 does not log or store:

  • Page content
  • Typed text
  • Cookies or sessions

However, be careful with:

  • Screenshots containing sensitive data
  • Snapshot output in logs
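The same caution applies to the commands an agent wrapper writes to its own logs, since `type` and `fill` carry user-supplied text. Here is a minimal sketch of redacting that text before logging. It assumes the argument layout `type <ref> <text>` and `fill <ref> <text>` from the Key Commands table. `redactForLog` is a hypothetical helper.

```typescript
// Commands whose trailing arguments are user-supplied text.
const TEXT_COMMANDS = new Set<string>(["type", "fill"]);

// Return a copy of the arguments that is safe to write to a log.
// The input array is never modified.
function redactForLog(args: string[]): string[] {
  if (args.length >= 3 && TEXT_COMMANDS.has(args[0])) {
    return [args[0], args[1], "[REDACTED]"];
  }
  return args.slice();
}
```

Redacting every `type` and `fill` value is deliberately coarse: it hides harmless text too, but it avoids having to guess which fields are sensitive.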

### Rate Limiting

Some sites detect rapid automation:

```bash
# Add delays between actions
dom0 click @d1
sleep 1
dom0 snapshot
```
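In a wrapper, the pause can be applied in one place instead of scattering `sleep` calls through each workflow. This is a sketch under assumptions: `Run` is a hypothetical runner type, and `sleep` is injected by the caller, so how the pause is actually implemented is left open. A `delayMs` of 1000 would match the `sleep 1` above.

```typescript
// Hypothetical runner type: executes dom0 and returns its output.
type Run = (args: string[]) => string;

// Wrap a runner so that every call after the first is preceded by a pause.
function withDelay(run: Run, sleep: (ms: number) => void, delayMs: number): Run {
  let first = true;
  return (args: string[]): string => {
    if (!first) sleep(delayMs);
    first = false;
    return run(args);
  };
}
```

Because the wrapper returns another `Run`, it can be combined with other helpers that accept a runner.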