# dom0 — Agent Integration

Guide for using dom0 as an AI agent skill for browser automation.

## Overview

dom0 is designed to be used by AI agents like bot0, Claude Code, or any agent that can execute shell commands. This guide covers:

- SKILL.md format for agent instructions
- Workflow patterns
- Best practices
- Common pitfalls

## SKILL.md Template

Include this in your agent's skill definitions:

````markdown
---
name: dom0
description: Browser automation via CLI. Use for web interaction, form filling, data extraction, and web scraping.
---

# dom0 Browser Automation

Control Chrome browser through the dom0 CLI.

## Core Workflow

1. **Navigate** — Go to a URL
2. **Snapshot** — See page elements with refs (@d1, @d2, etc.)
3. **Interact** — Click, type, scroll using refs
4. **Re-snapshot** — Get updated state after changes

## Key Commands

| Command | Description | Example |
|---------|-------------|---------|
| `dom0 navigate <url>` | Go to URL | `dom0 navigate https://google.com` |
| `dom0 snapshot` | Get page state | `dom0 snapshot` |
| `dom0 click <ref>` | Click element | `dom0 click @d1` |
| `dom0 type <ref> <text>` | Type into input | `dom0 type @d2 "hello"` |
| `dom0 fill <ref> <text>` | Clear and type | `dom0 fill @d3 "new text"` |
| `dom0 select <ref> <value>` | Select dropdown | `dom0 select @d4 "option1"` |
| `dom0 scroll <direction>` | Scroll page | `dom0 scroll down` |
| `dom0 wait` | Wait for condition | `dom0 wait --text "Success"` |
| `dom0 screenshot` | Capture page | `dom0 screenshot -o page.png` |
| `dom0 press <key>` | Press keyboard key | `dom0 press Enter` |
| `dom0 get-text <ref>` | Get element text | `dom0 get-text @d5` |

## Understanding Refs

Refs are temporary aliases assigned to interactable elements:

```
@d1 button "Sign In"           ← Click target
@d2 textbox "Email"            ← Type target
@d3 textbox "Password"         ← Type target
@d4 link "Forgot password?"    ← Click target
```

**Important:**
- Refs are assigned fresh on each snapshot
- Refs change after navigation or page updates
- Always re-snapshot after major interactions

## Example Workflows

### Login Flow
```bash
dom0 navigate https://example.com/login
dom0 snapshot
dom0 type @d2 "user@example.com"
dom0 type @d3 "password123"
dom0 click @d1
dom0 wait --text "Dashboard"
dom0 snapshot
```

### Search
```bash
dom0 navigate https://google.com
dom0 snapshot
dom0 type @d1 "search query"
dom0 press Enter
dom0 wait --selector "#search"
dom0 snapshot
```

### Form Filling
```bash
dom0 snapshot
dom0 fill @d2 "John Doe"
dom0 fill @d3 "john@example.com"
dom0 select @d4 "United States"
dom0 click @d5  # Submit
```

## Rules

1. **Always snapshot first** — Never guess refs
2. **Re-snapshot after navigation** — Refs change
3. **Use fill for inputs** — Clears existing text
4. **Wait after actions** — Pages may need time to update
5. **Check for errors** — Handle failures gracefully

## Error Handling

If a command fails:

1. Run `dom0 snapshot` to see current state
2. Check if the element still exists
3. Look for error messages or modals
4. Retry with correct refs
````

---

## Integration Patterns

### Pattern 1: Direct Shell Access

For agents with shell tool access:

```typescript
// Agent executes shell commands directly.
// `shell` stands in for whatever shell tool the agent runtime exposes.
await shell('dom0 navigate https://example.com');
await shell('dom0 snapshot');
const output = await shell('dom0 click @d1');
```
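Once the `dom0 snapshot` output is back in the agent process, refs have to be read out of it before any click or type call. The sketch below assumes the text snapshot prints one element per line in the `@d1 button "Sign In"` shape used in the SKILL.md template; the real output may carry more detail. `SnapshotRef`, `parseSnapshot`, and `findRefByName` are illustrative names, not part of dom0.

```typescript
// Assumed line shape (taken from the ref listing in the template):
//   @d1 button "Sign In"
interface SnapshotRef {
  ref: string;  // e.g. "@d1"
  role: string; // e.g. "button"
  name: string; // label shown in quotes, e.g. "Sign In"
}

// Collect every line that matches the assumed shape; other lines are ignored.
function parseSnapshot(text: string): SnapshotRef[] {
  const linePattern = /^\s*(@d\d+)\s+(\S+)\s+"(.*)"\s*$/;
  const refs: SnapshotRef[] = [];
  for (const line of text.split('\n')) {
    const match = linePattern.exec(line);
    if (match) {
      refs.push({ ref: match[1], role: match[2], name: match[3] });
    }
  }
  return refs;
}

// Look a ref up by role and label instead of hardcoding @d1, @d2, ...
function findRefByName(refs: SnapshotRef[], role: string, name: string): string | undefined {
  for (const entry of refs) {
    if (entry.role === role && entry.name === name) {
      return entry.ref;
    }
  }
  return undefined;
}
```

Looking refs up by role and label keeps a script working when the numbering shifts between snapshots.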

### Pattern 2: MCP Tool

Wrap dom0 as an MCP tool:

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Run dom0 with an argument array so agent-supplied values never pass through a shell.
async function dom0(...args: string[]): Promise<string> {
  const { stdout } = await execFileAsync('dom0', args);
  return stdout;
}

const dom0Tools = [
  {
    name: 'dom0_navigate',
    description: 'Navigate browser to URL',
    input_schema: {
      type: 'object',
      properties: {
        url: { type: 'string', description: 'URL to navigate to' }
      },
      required: ['url']
    },
    handler: async ({ url }: { url: string }) => dom0('navigate', url)
  },
  {
    name: 'dom0_snapshot',
    description: 'Get current page state with element refs',
    input_schema: { type: 'object', properties: {} },
    handler: async () => dom0('snapshot')
  },
  {
    name: 'dom0_click',
    description: 'Click an element by ref',
    input_schema: {
      type: 'object',
      properties: {
        ref: { type: 'string', description: 'Element ref (e.g., @d1)' }
      },
      required: ['ref']
    },
    handler: async ({ ref }: { ref: string }) => dom0('click', ref)
  }
  // ... more tools
];
```
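Tool handlers receive whatever string the model produces, so it can help to reject malformed refs before they reach the CLI. The following is a minimal sketch, assuming refs always look like `@d` followed by digits, as in every example in this guide. `isValidRef` and `requireRef` are hypothetical helpers, not dom0 APIs.

```typescript
// Assumption: refs match @d<number>, as in @d1 or @d42.
const REF_PATTERN = /^@d\d+$/;

function isValidRef(value: string): boolean {
  return REF_PATTERN.test(value);
}

// Return the ref unchanged when it is well formed, otherwise fail with a hint.
function requireRef(value: string): string {
  if (!isValidRef(value)) {
    throw new Error(
      'Invalid ref "' + value + '": expected something like @d1. ' +
      'Run `dom0 snapshot` to list current refs.'
    );
  }
  return value;
}
```

Calling `requireRef(ref)` at the top of the `dom0_click` handler would turn a malformed argument into an immediate, readable error that the agent can act on.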

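### Pattern 3: Composite Tool

Single tool that accepts all dom0 commands:

```typescript
import { exec } from 'node:child_process';
import { promisify } from 'node:util';

const execAsync = promisify(exec);

const browserTool = {
  name: 'browser',
  description: 'Browser automation. Commands: snapshot, click, type, navigate, etc.',
  input_schema: {
    type: 'object',
    properties: {
      command: {
        type: 'string',
        description: 'dom0 command (e.g., "snapshot", "click @d1", "type @d2 hello")'
      }
    },
    required: ['command']
  },
  handler: async ({ command }: { command: string }) => {
    // Caution: `command` is interpolated into a shell string, so it must be
    // validated or come from a trusted source before this call.
    const { stdout } = await execAsync(`dom0 ${command}`);
    return stdout;
  }
};
```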
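Because the composite tool takes one free-form string, splitting it into an argument list and checking the subcommand first is one way to keep stray shell syntax out. This is a sketch under stated assumptions: the allowlist is copied from the Key Commands table (extend it for commands such as `get-attr`), quoting follows simple single/double-quote rules with no backslash escapes, and `tokenize` and `toArgv` are illustrative names.

```typescript
// Subcommands listed in the Key Commands table of this guide.
const ALLOWED_COMMANDS = [
  'navigate', 'snapshot', 'click', 'type', 'fill', 'select',
  'scroll', 'wait', 'screenshot', 'press', 'get-text'
];

// Split on spaces and tabs, keeping quoted text together as one argument.
function tokenize(command: string): string[] {
  const tokens: string[] = [];
  let current = '';
  let quote = '';      // the active quote character, or '' when outside quotes
  let inToken = false; // true once the current token has started
  for (let i = 0; i < command.length; i++) {
    const ch = command.charAt(i);
    if (quote !== '') {
      if (ch === quote) { quote = ''; } else { current += ch; }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      inToken = true;
    } else if (ch === ' ' || ch === '\t') {
      if (inToken) { tokens.push(current); current = ''; inToken = false; }
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (quote !== '') throw new Error('Unterminated quote in command');
  if (inToken) tokens.push(current);
  return tokens;
}

// Tokenize and reject anything whose first word is not a known subcommand.
function toArgv(command: string): string[] {
  const argv = tokenize(command);
  if (argv.length === 0 || ALLOWED_COMMANDS.indexOf(argv[0]) === -1) {
    throw new Error('Unsupported dom0 command: ' + (argv[0] || '(empty)'));
  }
  return argv;
}
```

The resulting array can be handed to an API that takes an argument vector, such as Node's `child_process.execFile`, so nothing in the string is interpreted by a shell.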

## Best Practices

### 1. Always Snapshot First

Never assume refs exist. Always start with a snapshot:

```bash
# Good
dom0 snapshot
dom0 click @d1

# Bad - might fail
dom0 click @d1  # What is @d1?
```

### 2. Re-snapshot After Navigation

Refs change when the page changes:

```bash
dom0 click @d1  # Submit button
dom0 wait --text "Success"
dom0 snapshot   # Get new refs for new page
```
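An agent wrapper can enforce the first two practices mechanically by tracking whether a snapshot has been taken since the page last changed. The sketch below is one possible shape, not dom0 behavior. `RefGuard` is a hypothetical class, the list of ref-taking subcommands comes from the Key Commands table, and only `navigate` invalidates refs by default because that is the one case this guide states outright. Pass a longer list if clicks in your flow also change the page.

```typescript
// Subcommands whose first argument is a ref, per the Key Commands table.
const REF_COMMANDS = ['click', 'type', 'fill', 'select', 'get-text'];

class RefGuard {
  private fresh = false;          // true when refs from the last snapshot are still usable
  private invalidating: string[]; // subcommands assumed to invalidate refs

  constructor(invalidating: string[] = ['navigate']) {
    this.invalidating = invalidating;
  }

  // A ref-taking command may run only while a fresh snapshot is held.
  canRun(subcommand: string): boolean {
    const needsRef = REF_COMMANDS.indexOf(subcommand) !== -1;
    return !needsRef || this.fresh;
  }

  // Call after a command has actually been executed.
  record(subcommand: string): void {
    if (subcommand === 'snapshot') {
      this.fresh = true;
    } else if (this.invalidating.indexOf(subcommand) !== -1) {
      this.fresh = false;
    }
  }
}
```

When `canRun` returns false, the wrapper can run `dom0 snapshot` itself or return a message telling the agent to do so.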

### 3. Use Wait Commands

Pages need time to load:

```bash
# Good
dom0 click @d1
dom0 wait --text "Loading complete"
dom0 snapshot

# Bad - might get stale page
dom0 click @d1
dom0 snapshot
```

### 4. Handle Errors Gracefully

Check command output:

```bash
# Capture stderr as well, in case dom0 reports errors there
result=$(dom0 click @d1 2>&1)
if [[ $result == *"Error"* ]]; then
  dom0 snapshot  # Get current state
  # Retry or adapt
fi
```
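The same check can be made more specific inside an agent wrapper by mapping known messages to a recovery step. This sketch assumes the messages quoted in the Troubleshooting section ("Extension not connected", "Unknown ref", "Element is stale") appear verbatim in command output. The `Recovery` labels and `classifyOutput` are made up for illustration.

```typescript
type Recovery = 'reconnect-extension' | 'resnapshot' | 'inspect' | 'none';

// Map dom0 output to the next step an agent should take.
function classifyOutput(output: string): Recovery {
  if (output.indexOf('Extension not connected') !== -1) {
    return 'reconnect-extension'; // refresh the extension, then run `dom0 ping`
  }
  if (output.indexOf('Unknown ref') !== -1 || output.indexOf('Element is stale') !== -1) {
    return 'resnapshot';          // refs are out of date; run `dom0 snapshot`
  }
  if (output.indexOf('Error') !== -1) {
    return 'inspect';             // unrecognized failure; snapshot and look for modals
  }
  return 'none';
}
```

Returning a label rather than retrying inside the function leaves the retry policy to the caller.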

### 5. Use Fill for Inputs

`fill` clears existing text; `type` appends:

```bash
# Clear and type new value
dom0 fill @d2 "new@example.com"

# Append to existing value
dom0 type @d2 ".au"  # new@example.com.au
```

## Common Workflows

### Web Search

```bash
#!/bin/bash
# Search Google and get results

dom0 navigate https://google.com
dom0 snapshot

# Find search box and type
dom0 type @d1 "$1"
dom0 press Enter

# Wait for results
dom0 wait --selector "#search"
dom0 snapshot
```

### Login Automation

```bash
#!/bin/bash
# Log in to a website

URL=$1
EMAIL=$2
PASSWORD=$3

dom0 navigate "$URL"
dom0 snapshot

# Find and fill credentials
dom0 fill @d2 "$EMAIL"
dom0 fill @d3 "$PASSWORD"
dom0 click @d1  # Sign in button

# Verify success
dom0 wait --text "Welcome"
dom0 snapshot
```

### Data Extraction

```bash
#!/bin/bash
# Extract data from a page

dom0 navigate "$1"
dom0 snapshot --json > page.json

# Get specific element text
dom0 get-text @d5 > title.txt
dom0 get-attr @d6 href > link.txt

# Screenshot for reference
dom0 screenshot -o page.png
```

### Form Submission

```bash
#!/bin/bash
# Fill and submit a form

dom0 navigate "$FORM_URL"
dom0 snapshot

# Fill fields
dom0 fill @d2 "John Doe"
dom0 fill @d3 "john@example.com"
dom0 fill @d4 "555-1234"
dom0 select @d5 "United States"
dom0 click @d6  # Checkbox "I agree"
dom0 click @d7  # Submit

# Verify
dom0 wait --text "Thank you"
dom0 screenshot -o confirmation.png
```

## Troubleshooting

### "Extension not connected"

The Chrome extension isn't communicating with the daemon.

Fix:

1. Open Chrome
2. Go to `chrome://extensions`
3. Find dom0 and click refresh
4. Run `dom0 ping` to verify

### "Unknown ref: @d99"

The ref doesn't exist on the current page.

Fix:

1. Run `dom0 snapshot` to see available refs
2. Use a ref that exists

### "Element is stale"

The page changed since the last snapshot.

Fix:

1. Run `dom0 snapshot` to refresh refs
2. Find the element again

### Refs Changed Unexpectedly

The page content changed (AJAX, animations, etc.)

Fix:

1. Add `dom0 wait` after interactions
2. Re-snapshot before continuing

### Command Timeout

The page took too long to respond.

Fix:

1. Increase the timeout: `dom0 wait --timeout 60000`
2. Check whether the page is actually loading
3. Consider whether the site is blocking automation

## Agent Prompting Tips

### Be Specific About State

Tell the agent when to snapshot:

```markdown
After each navigation or form submission, run `dom0 snapshot`
to see the updated page state before continuing.
```

### Explain Ref Lifecycle

Spell out how long refs stay valid:

```markdown
Refs (@d1, @d2) are temporary identifiers assigned to page elements.
They change after navigation. Always snapshot to get current refs.
```

### Suggest Error Recovery

Help agents handle failures:

```markdown
If a command fails with "Unknown ref", run `dom0 snapshot` to see
current elements and find the correct ref.
```

### Limit Token Usage

Use JSON for parsing, text for display:

```markdown
Use `dom0 snapshot` for readable output.
Use `dom0 snapshot --json` when you need to parse the result.
```

## JSON Output Mode

For programmatic access, use `--json`:

```bash
dom0 snapshot --json
```

```json
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "title": "Example",
    "refs": {
      "d1": {
        "alias": "d1",
        "role": "button",
        "name": "Click me",
        "backendNodeId": 42
      }
    }
  },
  "duration": 156
}
```

This is useful for:

- Parsing specific elements
- Checking success/failure
- Performance monitoring
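As an illustration of the first two uses, the sketch below types the sample response and looks up a ref by role and name. The field names come from the sample above; everything else is an assumption. In particular, that a failed call sets `success` to `false`, and that the CLI ref is the alias with an `@` prefix. `parseSnapshotJson` and `findRef` are hypothetical helpers.

```typescript
interface RefInfo {
  alias: string;
  role: string;
  name: string;
  backendNodeId: number;
}

interface SnapshotResult {
  success: boolean;
  data: {
    url: string;
    title: string;
    refs: { [alias: string]: RefInfo };
  };
  duration: number; // as reported in the sample output
}

// Parse the raw --json output and fail early when the call did not succeed.
function parseSnapshotJson(raw: string): SnapshotResult {
  const result = JSON.parse(raw) as SnapshotResult;
  if (!result.success) {
    throw new Error('dom0 snapshot reported failure');
  }
  return result;
}

// Return the CLI-style ref (alias with "@" prefix) for a role/name pair.
function findRef(result: SnapshotResult, role: string, name: string): string | undefined {
  const refs = result.data.refs;
  for (const alias of Object.keys(refs)) {
    const info = refs[alias];
    if (info.role === role && info.name === name) {
      return '@' + alias;
    }
  }
  return undefined;
}
```

With the sample response, `findRef(result, 'button', 'Click me')` yields `@d1`, which can then be passed to `dom0 click`.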

## Security Considerations

### Credential Handling

Never hardcode credentials in scripts:

```bash
# Bad
dom0 type @d2 "mypassword123"

# Good - use environment variables
dom0 type @d2 "$PASSWORD"
```
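The same rule applies when dom0 is driven from a TypeScript wrapper rather than a shell script. This is a minimal sketch that reads the secret from an environment map and fails loudly when it is missing. `requireEnv` and `secretInputArgs` are illustrative names, and the variable name `PASSWORD` simply mirrors the shell example above.

```typescript
type Env = { [name: string]: string | undefined };

// Fetch a required variable, refusing to continue with an empty value.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error('Missing required environment variable: ' + name);
  }
  return value;
}

// Build the arguments for `dom0 fill <ref> <text>` without the secret
// ever appearing in source code.
function secretInputArgs(env: Env, variable: string, ref: string): string[] {
  return ['fill', ref, requireEnv(env, variable)];
}
```

In Node, the call would look like `secretInputArgs(process.env, 'PASSWORD', '@d3')`, with the result passed to whatever runs dom0.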

### Sensitive Data

dom0 does not log or store:

- Page content
- Typed text
- Cookies or sessions

However, be careful with:

- Screenshots containing sensitive data
- Snapshot output in logs

### Rate Limiting

Some sites detect rapid automation:

```bash
# Add delays between actions
dom0 click @d1
sleep 1
dom0 snapshot
```
