AutoBrowse — Self-Improving Browser Skill Build reliable browser automation skills through iterative experimentation. An inner agent browses the site ( evaluate.ts ). You — the outer agent — read what happened and improve the instructions ( strategy.md ). Repeat until it passes consistently. Entry Points Invocation is flexible — both explicit flags and free-form natural language work: /autobrowse --task google-flights /autobrowse --task google-flights --iterations 10 --env remote /autobrowse --tasks google-flights,amazon-add-to-cart /autobrowse --all
Also fine — parse freely:
/autobrowse https://flights.google.com/
/autobrowse book a flight on delta.com
/autobrowse fix the existing google-flights skill
When the user drops a URL or free-form instruction instead of
--task
cp ${CLAUDE_SKILL_DIR} /references/example-task.md ./autobrowse/tasks/ < task
/task.md
Then edit task.md to describe the URL, inputs, steps, and expected JSON output
The skill source at
${CLAUDE_SKILL_DIR}
stays read-only — only
./autobrowse/
in CWD gets written to during training. Graduation (final step) writes a single file to
~/.claude/skills/
--workspace ./autobrowse
or for bot-protected sites:
- node
- ${CLAUDE_SKILL_DIR}
- /scripts/evaluate.mjs
- --task
- <
- task-name
- >
- --workspace
- ./autobrowse
- --env
- remote
- This runs the browser session and writes a full trace to
- ./autobrowse/traces/
/latest/ - .
- Read the trace
- cat
- ./autobrowse/traces/
- <
- task-name
- >
- /latest/summary.md
- The summary has duration, cost, turns, the decision log, and the final JSON output.
- If the agent failed or got stuck, look deeper:
- Read
- ./autobrowse/traces/
/latest/trace.json - — search for the failure turn
- Read screenshots around the failure point with the Read tool
- Form one hypothesis
- Find the exact turn where things went wrong. What single heuristic would have prevented it?
- Examples:
- "After clicking the dropdown, wait 1s — options animate in before they're clickable"
- "Navigate directly to
- /pay-invoice/
- — skip the landing page entirely"
- "Use
- browse fill #field_3 value
- not
- browse type
- — this field clears on focus"
- "The page shows a spinner at turn 8 — add
- browse wait timeout 2000
- before snapshot"
- Update strategy.md
- Edit
- ./autobrowse/tasks/
/strategy.md - . Keep everything that worked. Fix the specific failure. Add a concrete heuristic.
- Good strategies have:
- Fast path
-
- direct URL or shortcuts to skip exploration
- Step-by-step workflow
-
- exact sequence with timing notes
- Site-specific knowledge
-
- selector IDs, form field names, success indicators
- Failure recovery
- what to do when X goes wrong
Judge the result
Read the new summary. Did it pass? Make clear progress?
Pass or progress
→ keep, next iteration
No progress or regression
→ revert strategy.md to the previous version and try a different hypothesis
After all iterations — publish if ready
If the task passed on 2+ of the last 3 iterations
or has reached the max iteration limit
, install it as a Claude Code skill.
Do not just copy strategy.md
— the skill must be self-contained and useful to someone who has never seen this codebase. If graduating at max iterations without a clean pass, note the known failure point but still document everything learned.
Install by writing to
~/.claude/skills/
/SKILL.md : mkdir -p ~/.claude/skills/ < task-name Use this structure for the SKILL.md:
name : <task - name
description : <1 - 2 sentences describing what this skill does and when to use it. Include trigger keywords.
Purpose <1-2 sentences: what this automates and why it exists.>
When to Use < When should someone reach for this skill.
Browse CLI Reference
The inner agent uses the
browse
CLI. Key commands for this task:
-
browse stop
— kill existing session (always run before switching to remote)
-
browse env remote
— start a fresh Browserbase cloud session
-
browse newpage <url>
— open URL in a new tab (required in remote mode —
browse open
fails with "no page available")
-
browse open <url>
— navigate existing tab (local mode only)
-
browse wait load
— wait for page to finish loading
-
browse wait timeout <ms>
— wait a fixed amount of time for spinners or animations
-
browse wait selector "<selector>"
— wait for an element to become visible
-
browse get title
— verify you're on the right page
-
browse get text body
— extract all visible text (preferred for content extraction)
-
browse snapshot
— get accessibility tree; each node has a ref in
[X-Y]
format (e.g.
[0-5]
,
[2-147]
)
-
browse click [X-Y]
— click element by ref from the latest snapshot (include the brackets)
**
Never use
--session <name>
flags in SKILL.md.
**
Named sessions are a parallel-run workaround — they contaminate skills with infrastructure concerns. Skills must work in isolation with the default session.
Workflow
Step 1 — Start session < exact browse commands in order
Step 2 — Navigate < exact URL and verification steps
Step 3 — Extract < exact extraction commands
Step 4 — Output < what JSON to emit, referencing the schema below
Site-Specific Gotchas < Bullet list of every hard-won heuristic from the iterations. This is the core value of the skill.
Failure Recovery < What to do when navigation fails, session is contaminated, or extraction returns garbage
Expected Output ```json < paste the exact expected output schema from task.md
After writing the SKILL.md, confirm it's installed: ```bash ls ~/.claude/skills/
/SKILL.md The skill is now available as / in Claude Code. Final report (multi-task mode) After all sub-agents complete, print a markdown table: Task Iterations Final Status Graduated Cost google-flights 5 ✅ pass yes $0.42 amazon-add-to-cart 5 ❌ fail no $1.20 Then write a persistent session report to ./autobrowse/reports/ so there's a durable record of the run inside the workspace: mkdir -p ./autobrowse/reports Write the file ./autobrowse/reports/YYYY-MM-DD-HH-MM- .md with:
AutoBrowse Session Report ** Date: ** < ISO date
** Tasks: ** < comma-separated list
** Environment: ** remote|local ** Total cost: ** $X.XX
Results | Task | Iterations | Pass Rate | Final Status | Graduated | Cost | |
|
|
|
|
|
| | ... | ... | X/5 | ✅/❌ | yes/no | $X.XX |
Per-Task Learnings
** Key insight 1: ** < what the agent learned
- ** Key insight 2: ** < another heuristic
- ** Failure mode fixed: ** < what was failing and how it was resolved
Iteration Log
|
|
|
|
| | 1 | 79 | $18.75 | ❌ fail | baseline | | 2 | 9 | $0.26 | ✅ pass | session contamination fix | | ... | ... | ... | ... | ... | Rules Only edit strategy.md — never touch task.md (unless creating it from the template) or evaluate.mjs Stay in the workspace — all training writes go to ./autobrowse/ , never to ~/.claude/skills/autobrowse/ . The skill source is read-only. One hypothesis per iteration — test one change at a time Build on wins — keep what worked, add to it Trust the trace — the inner agent shows exactly what it saw and did Graduate to ~/.claude/skills/ — the only file you write there is the final graduated SKILL.md