Browser Automation

Automate browser interactions using the `browse` CLI with Claude.

Setup check

Before running any browser commands, verify the CLI is available:

which browse || npm install -g @browserbasehq/browse-cli

Environment Selection (Local vs Remote)

The CLI automatically selects between local and remote browser environments based on available configuration.

Local mode (default)

- Uses local Chrome; no API keys needed
- Best for: development, simple pages, trusted sites with no bot protection

Remote mode (Browserbase)

- Activated when `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` are set
- Provides: anti-bot stealth, automatic CAPTCHA solving, residential proxies, session persistence
- Use remote mode when: the target site has bot detection, CAPTCHAs, IP rate limiting, Cloudflare protection, or requires geo-specific access
- Get credentials at https://browserbase.com/settings

When to choose which

- Simple browsing (docs, wikis, public APIs): local mode is fine
- Protected sites (login walls, CAPTCHAs, anti-scraping): use remote mode
- If local mode fails with bot detection or access denied: switch to remote mode

Commands

All commands work identically in both modes. The daemon auto-starts on first command.

Navigation

browse open <url>

Go to URL (aliases: goto)

browse reload

Reload current page

browse back

Go back in history

browse forward

Go forward in history

Page state (prefer snapshot over screenshot)

browse snapshot

Get accessibility tree with element refs (fast, structured)

browse screenshot [path]

Take visual screenshot (slow, uses vision tokens)

browse get url

Get current URL

browse get title

Get page title

browse get text <selector>

Get text content (use "body" for all text)

browse get html <selector>

Get HTML content of element

browse get value <selector>

Get form field value

Use `browse snapshot` as your default for understanding page state: it returns the accessibility tree with element refs you can use to interact. Only use `browse screenshot` when you need visual context (layout, images, debugging).

Interaction

browse click <ref>

Click element by ref from snapshot (e.g., @0-5)

browse type <text>

Type text into focused element

browse fill <selector> <value>

Fill input and press Enter

browse select <selector> <values...>

Select dropdown option(s)

browse press <key>

Press key (Enter, Tab, Escape, Cmd+A, etc.)

browse drag <fromX> <fromY> <toX> <toY>

Drag from one point to another

browse scroll <x> <y> <deltaX> <deltaY>

Scroll at coordinates

browse highlight <selector>

Highlight element on page

browse is visible <selector>

Check if element is visible

browse is checked <selector>

Check if element is checked

browse wait <type> [arg]

Wait for: load, selector, timeout
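Since clicks take refs from the snapshot, it can help to pull just the refs out of snapshot text. The helper below is a minimal sketch, not part of the CLI: `extract_refs` is a hypothetical name, and it assumes every ref has the shape `@<digits>-<digits>`, which is inferred only from the `@0-5` example above.

```shell
# Print each unique element ref (tokens shaped like @0-5) read from stdin.
# Assumption: refs always look like "@<digits>-<digits>"; adjust the pattern
# if real snapshot output uses a different shape.
extract_refs() {
  grep -oE '@[0-9]+-[0-9]+' | LC_ALL=C sort -u
}
```

If `browse snapshot` writes the tree to standard output (an assumption worth checking), usage would be `browse snapshot | extract_refs`.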

Session management

browse stop

Stop the browser daemon

browse status

Check daemon status (includes env)

browse env

Show current environment (local or remote)

browse env local

Switch to local Chrome

browse env remote

Switch to Browserbase (requires API keys)

browse pages

List all open tabs

browse tab_switch <index>

Switch to tab by index

browse tab_close [index]

Close tab

Typical workflow

1. `browse open <url>`: navigate to the page
2. `browse snapshot`: read the accessibility tree to understand page structure and get element refs
3. `browse click` / `browse type` / `browse fill`: interact using refs from snapshot
4. `browse snapshot`: confirm the action worked
5. Repeat 3-4 as needed
6. `browse stop`: close the browser when done

Quick Example

browse open https://example.com

browse snapshot

see page structure + element refs

browse click @0-5

click element with ref 0-5

browse get title

browse stop
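The workflow above ends with `browse stop`, which is easy to skip when an earlier step fails. The sketch below wraps one open-and-snapshot cycle so the stop always runs. `run_session` is a hypothetical helper name, and the sketch assumes `browse` exits non-zero on failure, which the source does not state.

```shell
# Run one open -> snapshot cycle and always stop the daemon afterwards.
# Uses only commands listed in this document.
# Assumption: browse returns a non-zero exit code when a step fails.
run_session() {
  url="$1"
  rc=0
  if browse open "$url"; then
    browse snapshot || rc=$?
  else
    rc=$?
  fi
  browse stop   # clean up even when a step failed
  return "$rc"
}
```

Calling `run_session https://example.com` would then leave no browser session behind, whatever the outcome of the earlier steps.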

Mode Comparison

| Feature | Local | Browserbase |
| --- | --- | --- |
| Speed | Faster | Slightly slower |
| Setup | Chrome required | API key required |
| Stealth mode | No | Yes (custom Chromium, anti-bot fingerprinting) |
| CAPTCHA solving | No | Yes (automatic reCAPTCHA/hCaptcha) |
| Residential proxies | No | Yes (201 countries, geo-targeting) |
| Session persistence | No | Yes (cookies/auth persist across sessions) |
| Best for | Development/simple pages | Protected sites, bot detection, production scraping |
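To predict which mode a fresh session would start in, the documented rule (remote when both Browserbase variables are set, local otherwise) can be checked without calling the CLI. This is a sketch under assumptions: `pick_mode` is a hypothetical helper, and it treats an empty variable as unset, which may differ from what the CLI does.

```shell
# Echo "remote" when both Browserbase variables are non-empty, else "local".
# Mirrors the documented selection rule; it does not invoke browse.
# Assumption: an empty value counts as "not set".
pick_mode() {
  if [ -n "${BROWSERBASE_API_KEY:-}" ] && [ -n "${BROWSERBASE_PROJECT_ID:-}" ]; then
    echo remote
  else
    echo local
  fi
}
```

Comparing `pick_mode` against the output of `browse env` is one way to spot a sticky manual switch.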

Best Practices

- Always `browse open` first before interacting
- Use `browse snapshot` to check page state: it's fast and gives you element refs
- Only screenshot when visual context is needed (layout checks, images, debugging)
- Use refs from snapshot to click/interact, e.g., `browse click @0-5`
- `browse stop` when done to clean up the browser session

Troubleshooting

- "No active page": Run `browse stop`, then check `browse status`. If it still says running, kill the zombie daemon with `pkill -f "browse.*daemon"`, then retry `browse open`
- Chrome not found: Install Chrome or use `browse env remote`
- Action fails: Run `browse snapshot` to see available elements and their refs
- Browserbase fails: Verify API key and project ID are set

Switching to Remote Mode

Switch to remote when you detect: CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile), bot detection pages ("Checking your browser..."), HTTP 403/429, empty pages on sites that should have content, or the user asks for it.

Don't switch for simple sites (docs, wikis, public APIs, localhost).

browse env remote

switch to Browserbase

browse env local

switch back to local Chrome

The switch is sticky until you run `browse stop` or switch again.

If API keys aren't set:

openclaw browserbase setup

interactive — prompts for API key + project ID
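The switching signals listed in this section can be folded into a single check. The sketch below is a crude heuristic, not CLI behavior: `needs_remote` and its two arguments are hypothetical, and the caller has to supply the HTTP status and page text.

```shell
# Return 0 (switch to remote) for HTTP 403/429, an empty page, or a known
# challenge phrase; return 1 otherwise.
# $1 = HTTP status (may be empty if unknown), $2 = page text.
# Assumption: plain substring matching is good enough to spot a challenge page.
needs_remote() {
  http_status="$1"
  page_text="$2"
  case "$http_status" in
    403|429) return 0 ;;
  esac
  if [ -z "$page_text" ]; then
    return 0
  fi
  case "$page_text" in
    *"Checking your browser"*|*reCAPTCHA*|*hCaptcha*|*Turnstile*) return 0 ;;
  esac
  return 1
}
```

Assuming `browse get text body` prints the page text to standard output, one possible use is `needs_remote "" "$(browse get text body)" && browse env remote`. Substring matching will also fire on pages that merely mention a CAPTCHA vendor, so treat the result as a hint.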

For detailed examples, see EXAMPLES.md. For API reference, see REFERENCE.md.
