Skill: Chrome Automation (agent-browser)
Automate browser tasks in the user's real Chrome session via the agent-browser CLI.
Prerequisite
agent-browser must be installed and Chrome must have remote debugging enabled. See references/agent-browser-setup.md if unsure.
Core Principle: Reuse the User's Existing Chrome
This skill operates on a single Chrome process: the user's real browser. There is no session management, no separate profiles, no launching a fresh Playwright browser.
Always Start by Listing Tabs
Before opening any new page, always list existing tabs first:
agent-browser --auto-connect tab list
This returns all open tabs with their index numbers, titles, and URLs. Check if the page you need is already open:
If the target page is already open → switch to that tab directly instead of opening a new one. The user likely has it open because they are already logged in and the page is in the right state.
agent-browser --auto-connect tab <index>
If the target page is NOT open → open it in the current tab or a new tab.
agent-browser --auto-connect open <url>
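The list-then-reuse decision above can be sketched as a small shell helper. This is only a sketch, not part of agent-browser: the function names find_tab_index and open_or_reuse are hypothetical, and it assumes that tab list prints one tab per line with the tab's index as the first integer on that line. Check the real output format before relying on it.

```shell
# Hypothetical helper: print the index of the first listed tab whose line
# contains the given URL fragment.
# ASSUMPTION: one tab per line, and the first integer on the line is the index.
find_tab_index() {
  awk -v needle="$1" '
    index($0, needle) && match($0, /[0-9]+/) {
      print substr($0, RSTART, RLENGTH)
      exit
    }'
}

# Hypothetical helper: switch to an existing tab for the URL if one is
# listed, otherwise open the URL.
open_or_reuse() {
  local idx
  idx=$(agent-browser --auto-connect tab list | find_tab_index "$1")
  if [ -n "$idx" ]; then
    agent-browser --auto-connect tab "$idx"
  else
    agent-browser --auto-connect open "$1"
  fi
}
```

Calling open_or_reuse https://example.com would then reuse a matching tab when one exists and fall back to open otherwise.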
Why This Matters
- The user's Chrome has their cookies, login sessions, and browser state
- Opening a new page when one is already available wastes time and may lose login state
- Many marketing platforms (social media dashboards, ad managers, CMS tools) require login; reusing an existing logged-in tab avoids re-authentication
Connection
Always use --auto-connect to connect to the user's running Chrome instance:
agent-browser --auto-connect <command>
This auto-discovers Chrome with remote debugging enabled. If connection fails, guide the user through enabling remote debugging (see references/agent-browser-setup.md).
Common Workflows
1. Navigate and Interact
List tabs to find existing pages
agent-browser --auto-connect tab list
Switch to an existing tab (if found)
agent-browser --auto-connect tab <index>
Or open a new page
agent-browser --auto-connect open https://example.com
agent-browser --auto-connect wait --load networkidle
Take a snapshot to see interactive elements
agent-browser --auto-connect snapshot -i
Click, fill, etc.
agent-browser --auto-connect click @e3
agent-browser --auto-connect fill @e5 "some text"
2. Extract Data from a Page
Get all text content
agent-browser --auto-connect get text body
Take a screenshot for visual inspection
agent-browser --auto-connect screenshot
Execute JavaScript for structured data
agent-browser --auto-connect eval "JSON.stringify(document.querySelectorAll('table tr').length)"
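For richer structured data than a row count, the same eval command can return each table row's cell text as JSON. A minimal sketch: the function name extract_table_rows is hypothetical, the --stdin form is the one this document uses in its iframe workaround, and it assumes eval prints the value of the final expression, as the one-line example above does.

```shell
# Hypothetical helper: return every table row on the page as a JSON array
# of cell texts, e.g. [["Name","Clicks"],["Campaign A","120"]].
# ASSUMPTION: eval --stdin prints the value of the last expression.
extract_table_rows() {
  agent-browser --auto-connect eval --stdin <<'EVALEOF'
JSON.stringify(
  Array.from(document.querySelectorAll('table tr'), (tr) =>
    Array.from(tr.cells, (cell) => cell.innerText.trim())
  )
)
EVALEOF
}
```

Serializing inside the page keeps the result a single string, which is easier to pass on than a live DOM object.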
3. Replay a Chrome DevTools Recording
The user may provide a recording exported from Chrome DevTools Recorder (JSON, Puppeteer JS, or @puppeteer/replay JS format). See Replaying Recordings below.
Step-by-Step Interaction Guide
Taking Snapshots
Use snapshot -i to see all interactive elements with refs (@e1, @e2, ...):
agent-browser --auto-connect snapshot -i
The output lists each interactive element with its role, text, and ref. Use these refs for subsequent actions.
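Looking up a ref by its visible text can be scripted. The sketch below is illustrative only: ref_for_label is a hypothetical name, and it assumes one element per output line with the ref printed either as @e3 or as ref=e3. Confirm the format your agent-browser version prints before using it.

```shell
# Hypothetical helper: read snapshot output on stdin and print the ref
# (normalized to the @eN form) of the first line containing the given text.
# ASSUMPTION: one element per line; ref appears as "@eN" or "ref=eN".
ref_for_label() {
  awk -v label="$1" '
    index($0, label) && match($0, /(@|ref=)e[0-9]+/) {
      ref = substr($0, RSTART, RLENGTH)
      sub(/^(@|ref=)/, "", ref)
      print "@" ref
      exit
    }'
}

# Possible usage:
#   ref=$(agent-browser --auto-connect snapshot -i | ref_for_label "Send")
#   [ -n "$ref" ] && agent-browser --auto-connect click "$ref"
```

An empty result means the text was not found in the snapshot, which is a cue to re-snapshot, scroll, or fall back to a screenshot.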
Step Type Mapping
Action
Command
Navigate
agent-browser --auto-connect open
check interactive state
agent-browser --auto-connect screenshot
visual verification
Replaying Recordings
Accepted Formats
- JSON (recommended): structured, can be read progressively:
Count steps
jq '.steps | length' recording.json
Read first 5 steps
jq '.steps[0:5]' recording.json
- @puppeteer/replay JS (import)
- Puppeteer JS (require('puppeteer'), page.goto, Locator.race)
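The markers listed above are enough to guess which format a recording file is in. A rough sketch, assuming those markers appear literally in the file; recording_format is a hypothetical name and the returned labels are made up for illustration.

```shell
# Hypothetical helper: classify a recording file by the markers named in
# the Accepted Formats list. Returns one of:
#   puppeteer-replay-js, puppeteer-js, json, unknown
recording_format() {
  if grep -q '@puppeteer/replay' "$1"; then
    echo "puppeteer-replay-js"
  elif grep -q 'require(.puppeteer.)' "$1"; then
    # "." matches either quote style around the module name
    echo "puppeteer-js"
  elif grep -q '"steps"' "$1"; then
    echo "json"
  else
    echo "unknown"
  fi
}
```

For example, recording_format recording.json can decide whether the jq commands above apply or whether the file has to be read as JavaScript.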
How to Replay
1. Parse the recording: understand the full intent before acting. Summarize what the recording does.
2. List tabs first: check if the target page is already open.
3. Navigate: execute navigate steps, reusing existing tabs when possible.
4. For each interaction step:
   - Take a snapshot (snapshot -i) to see current interactive elements
   - Match the recording's aria/... selectors against the snapshot
   - Fall back to text/..., then CSS class hints, then screenshot
   - Do not rely on ember IDs, numeric IDs, or exact XPaths: these change every page load
5. Verify after each step: snapshot or screenshot to confirm
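The selector priority in step 4 can be expressed as a small filter. This is a sketch under assumptions: pick_selector is a hypothetical name, candidates arrive one per line, and the patterns used to spot unstable selectors (ember IDs, IDs containing digits, xpath/ prefixes) are rough heuristics rather than a complete rule.

```shell
# Hypothetical helper: read a step's candidate selectors on stdin (one per
# line) and print the preferred one: aria/ first, then text/, then a CSS
# selector that does not look auto-generated. Prints nothing when only
# unstable selectors remain, which means falling back to a screenshot.
pick_selector() {
  awk '
    /^aria\// { if (aria == "") aria = $0; next }
    /^text\// { if (text == "") text = $0; next }
    # Heuristic skip list: XPaths, ember IDs, IDs that contain digits
    /^xpath\// || /ember[0-9]+/ || /#[A-Za-z_-]*[0-9]+/ { next }
    { if (css == "") css = $0 }
    END {
      if (aria != "") print aria
      else if (text != "") print text
      else if (css != "") print css
    }'
}

# Possible usage, ASSUMING each step stores selectors as an array of
# single-element arrays (check your recording first):
#   jq -r '.steps[3].selectors[][0]' recording.json | pick_selector
```

The chosen selector is only a hint for matching against the snapshot; the click itself should still go through a ref from snapshot -i.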
Iframe-Heavy Sites
snapshot -i operates on the main frame only and cannot penetrate iframes. Sites like LinkedIn, Gmail, and embedded editors render content inside iframes.
Detecting Iframe Issues
- snapshot -i returns unexpectedly short or empty results
- Recording references elements not appearing in snapshot output
- get text body content doesn't match what a screenshot shows
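The first sign in the list above (a suspiciously short snapshot) can be checked mechanically. A minimal sketch: the function names and the default threshold of 3 are hypothetical, and it assumes each interactive element occupies one output line containing a ref written as @eN or ref=eN.

```shell
# Hypothetical helper: count snapshot lines that carry a ref.
count_ref_lines() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
  grep -cE '(@|ref=)e[0-9]+' || true
}

# Hypothetical helper: succeed (exit 0) when the snapshot on stdin has fewer
# ref lines than expected. $1 = minimum expected (made-up default: 3).
snapshot_looks_sparse() {
  local n
  n=$(count_ref_lines)
  [ "$n" -lt "${1:-3}" ]
}

# Possible usage:
#   agent-browser --auto-connect snapshot -i | snapshot_looks_sparse 5 \
#     && echo "Few interactive elements found; content may be inside an iframe"
```

A sparse snapshot is only a hint; comparing against a screenshot remains the more reliable check.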
Workarounds
1. Use eval to access iframe content:
agent-browser --auto-connect eval --stdin <<'EVALEOF'
const frame = document.querySelector('iframe[data-testid="interop-iframe"]');
// contentDocument is null for cross-origin iframes, so guard each lookup
const doc = frame?.contentDocument;
const btn = doc?.querySelector('button[aria-label="Send"]');
if (btn) btn.click();
EVALEOF
Note: Only works for same-origin iframes.
2. Use keyboard for blind input: if the iframe element has focus, keyboard inserttext "..." sends text regardless of frame boundaries.
3. Use get text body to read full page content including iframes.
4. Use screenshot for visual verification when snapshot is unreliable.
When to Ask the User
If workarounds fail after 2 attempts on the same step, pause and explain:
- The page uses iframes that cannot be accessed via snapshot
- Which element you need and what you expected
Then ask the user to perform that step manually, and continue from there.
Handling Unexpected Situations
Handle Automatically (do not stop):
- Popups or banners → dismiss them (find text "Dismiss" click or find text "Close" click)
- Cookie consent dialogs → accept or dismiss
- Tooltip overlays → close them first
- Element not in snapshot → try find text "..." click, or scroll to reveal with scroll down 300
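Dismissing a popup with the two labels mentioned above can be wrapped in one helper. A sketch only: dismiss_overlays is a hypothetical name, and it assumes find text ... click exits non-zero when nothing was clicked. A failure can also mean several elements matched, since find text fails on multiple matches.

```shell
# Hypothetical helper: try the common dismiss labels in order and stop at
# the first one that clicks successfully.
# ASSUMPTION: the CLI exits non-zero when the click did not happen.
dismiss_overlays() {
  local label
  for label in "Dismiss" "Close"; do
    if agent-browser --auto-connect find text "$label" click >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}
```

When dismiss_overlays returns non-zero, taking a fresh snapshot -i and clicking the specific ref is the next thing to try.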
Pause and Ask the User:
- Login / authentication is required
- A CAPTCHA appears
- Page structure is completely different from expected
- A destructive action is about to happen (deleting data, sending real content): confirm first
- Stuck for more than 2 attempts on the same step
- All iframe workarounds have failed
When pausing, explain clearly: what step you are on, what you expected, and what you see.
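The two-attempt rule can be made explicit with a small wrapper. This is an illustrative sketch: try_step is a hypothetical name, and the pause message is a placeholder for the fuller explanation described above.

```shell
# Hypothetical helper: run a command up to 2 times; if both attempts fail,
# print a pause notice on stderr and return non-zero so the caller stops.
# $1 = short step description, remaining arguments = the command to run.
try_step() {
  local desc attempt
  desc=$1
  shift
  attempt=1
  while [ "$attempt" -le 2 ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  printf 'PAUSE: step "%s" failed after 2 attempts; explain and ask the user.\n' "$desc" >&2
  return 1
}

# Possible usage:
#   try_step "click Send" agent-browser --auto-connect click @e3 || exit 1
```

Keeping the limit in one place avoids silent retry loops on a step that needs the user's help.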
Key Commands Reference
| Command | Description |
| --- | --- |
| tab list | List all open tabs with index, title, and URL |
| tab <index> | Switch to an existing tab by index |
| tab new | Open a new empty tab |
| tab close | Close the current tab |
| open <url> | Navigate to URL |
| snapshot -i | List interactive elements with refs |
| click @eN | Click element by ref |
| fill @eN "text" | Clear and fill standard input/textarea |
| type @eN "text" | Type without clearing |
| keyboard inserttext "text" | Insert text (best for contenteditable) |
| press | Press keyboard key |
| scroll down/up | Scroll page in pixels |
| wait @eN | Wait for element to appear |
| wait --load networkidle | Wait for network to settle |
| wait | Wait for a duration |
| screenshot [path] | Take screenshot |
| screenshot --annotate | Screenshot with numbered labels |
| eval | Execute JavaScript in page |
| get text body | Get all text content |
| get url | Get current URL |
| set viewport | Set viewport size |
| find text "..." click | Semantic find and click |
| close | Close browser session |
Known Limitations
- Iframe blindness: snapshot -i cannot see inside iframes. See Iframe-Heavy Sites.
- find text strict mode: Fails when multiple elements match. Use snapshot -i to locate the specific ref instead.
- fill vs contenteditable: fill only works on standard input and textarea elements.