gemini-research-browser-use

安装量: 56
排名: #13137

安装

npx skills add https://github.com/grasseed/google-search-browser-use --skill gemini-research-browser-use

Perform research or queries using Google Gemini via Chrome DevTools Protocol (CDP). This method reuses the user's existing Chrome login session to interact with the Gemini web interface (https://gemini.google.com/).

Prerequisites

  • Python + websockets Verify:
python3 --version
python3 -m pip show websockets

Install if missing:

python3 -m pip install websockets
  • Google Chrome Verify:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --version
  • CDP Port Availability Verify Chrome is listening (after launch in Step 2):
curl -s http://localhost:9222/json | python3 -m json.tool
  • Non-default user data directory (required by Chrome) Chrome CDP requires a non-default profile path. Use a cloned profile so you keep login state.
rm -rf /tmp/chrome-gemini-profile
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/

Method Comparison

| Chrome Remote Debugging (CDP) | Uses existing login, full automation, reliable | Requires Chrome restart with debugging flag | ✅ Yes

| browser-use --browser real | Simple CLI | Opens new session without login | ❌ No

| browser_subagent | Visual feedback | Rate limited, may fail | ❌ No

This is the most reliable method that uses your system Chrome with existing Google login.

Prerequisites

  • Python 3 with websockets library

  • Google Chrome installed at /Applications/Google Chrome.app/

  • User logged into Google in Chrome

Step 1: Install websockets (if needed)

pip3 install websockets
# Or in virtual environment:
python3 -m venv .venv && ./.venv/bin/pip install websockets

Step 2: Launch Chrome with Remote Debugging (Non-default profile)

Important: Close any existing Chrome windows first, or use a different debugging port.

"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="/tmp/chrome-gemini-profile" \
  "https://gemini.google.com/" &

Parameters explained:

  • --remote-debugging-port=9222: Enables CDP on port 9222

  • --user-data-dir: Points to your existing Chrome profile (with login session)

  • The URL opens Gemini directly

Step 3: Verify Connection (CDP)

curl -s http://localhost:9222/json | python3 -m json.tool

Look for the Gemini page entry:

{
  "title": "Google Gemini",
  "url": "https://gemini.google.com/app",
  "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/XXXXXXXX"
}

Note: If URL shows /app instead of just /, it means you're logged in.

Step 4: Send Query to Gemini

Save this as gemini_query.py or run inline:

import asyncio
import websockets
import json
import subprocess
import sys

async def query_gemini(query_text, wait_seconds=30):
    # Get the Gemini page WebSocket URL
    result = subprocess.run(
        ["curl", "-s", "http://localhost:9222/json"],
        capture_output=True, text=True
    )
    pages = json.loads(result.stdout)

    # Find Gemini page
    gemini_page = None
    for page in pages:
        if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
            gemini_page = page
            break

    if not gemini_page:
        print("Error: Gemini page not found. Make sure Chrome is open with Gemini.")
        return None

    ws_url = gemini_page["webSocketDebuggerUrl"]
    print(f"Connecting to: {ws_url}")

    async with websockets.connect(ws_url) as ws:
        # Step 1: Input the query
        input_js = f'''
        const editor = document.querySelector('div[contenteditable="true"]');
        if(editor) {{
            editor.focus();
            document.execCommand('insertText', false, `{query_text}`);
            editor.dispatchEvent(new Event('input', {{bubbles: true}}));
            'success';
        }} else {{
            'editor not found';
        }}
        '''

        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": input_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"Input result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")

        # Step 2: Click send button
        await asyncio.sleep(1)
        click_js = '''
        const btn = document.querySelector('button[aria-label="傳送訊息"]');
        if(btn) { btn.click(); 'clicked'; } else { 'button not found'; }
        '''

        await ws.send(json.dumps({
            "id": 2,
            "method": "Runtime.evaluate",
            "params": {"expression": click_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"Click result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")

        # Step 3: Wait for response
        print(f"Waiting {wait_seconds} seconds for Gemini to respond...")
        await asyncio.sleep(wait_seconds)

        # Step 4: Extract the response
        extract_js = '''
        const markdownEls = document.querySelectorAll('.markdown');
        if(markdownEls.length > 0) {
            markdownEls[markdownEls.length - 1].innerText;
        } else {
            'No response found';
        }
        '''

        await ws.send(json.dumps({
            "id": 3,
            "method": "Runtime.evaluate",
            "params": {"expression": extract_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        content = result.get('result', {}).get('result', {}).get('value', 'No content')

        return content

# Main execution
if __name__ == "__main__":
    query = sys.argv[1] if len(sys.argv) > 1 else "範例問題:請用繁體中文回答什麼是區塊鏈?"
    result = asyncio.run(query_gemini(query, wait_seconds=30))
    print("\n" + "="*50)
    print("GEMINI RESPONSE:")
    print("="*50)
    print(result)

Step 5: Run the Query

python3 gemini_query.py "範例問題:你的查詢問題"

Or inline for simple queries:

python3 << 'EOF'
import asyncio
import websockets
import json

async def send_to_gemini():
    # Get WebSocket URL
    import subprocess
    result = subprocess.run(["curl", "-s", "http://localhost:9222/json"], capture_output=True, text=True)
    pages = json.loads(result.stdout)
    ws_url = next(p["webSocketDebuggerUrl"] for p in pages if "gemini.google.com" in p.get("url", ""))

    async with websockets.connect(ws_url) as ws:
        # Input query
        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": '''
                const editor = document.querySelector('div[contenteditable="true"]');
                editor.focus();
                document.execCommand('insertText', false, '範例問題:請分析比特幣未來的價格走勢');
                editor.dispatchEvent(new Event('input', {bubbles: true}));
            '''}
        }))
        await ws.recv()

        # Click send
        await asyncio.sleep(1)
        await ws.send(json.dumps({
            "id": 2,
            "method": "Runtime.evaluate",
            "params": {"expression": '''document.querySelector('button[aria-label="傳送訊息"]').click()'''}
        }))
        await ws.recv()

        # Wait and extract
        await asyncio.sleep(30)
        await ws.send(json.dumps({
            "id": 3,
            "method": "Runtime.evaluate",
            "params": {"expression": '''
                document.querySelectorAll('.markdown')[document.querySelectorAll('.markdown').length - 1].innerText
            '''}
        }))
        response = await ws.recv()
        print(json.loads(response)['result']['result']['value'])

asyncio.run(send_to_gemini())
EOF

Alternative Method: browser-use CLI

This method is simpler but does not use your existing Chrome login. You'll need to log in manually each time.

Prerequisites

# Create virtual environment
python3 -m venv .venv

# Install browser-use
./.venv/bin/pip install browser-use

Workflow

1) Open Gemini

./.venv/bin/browser-use --browser real open "https://gemini.google.com/"

2) Get Page State

./.venv/bin/browser-use --browser real state

Look for:

  • The input textbox: contenteditable=true role=textbox

  • The send button: aria-label=傳送訊息

3) Input Text via JavaScript eval

./.venv/bin/browser-use --browser real eval "const editor = document.querySelector('div[contenteditable=\"true\"]'); editor.focus(); document.execCommand('insertText', false, 'YOUR QUERY HERE'); editor.dispatchEvent(new Event('input', {bubbles: true}));"

4) Click Send Button

# Get current state to find button index
./.venv/bin/browser-use --browser real state

# Click the send button (replace INDEX with actual number)
./.venv/bin/browser-use --browser real click INDEX

5) Close Session

./.venv/bin/browser-use close

Troubleshooting

Chrome Remote Debugging Issues

| curl: (7) Failed to connect | Chrome not running with debugging | Restart Chrome with --remote-debugging-port=9222

| WebSocket connection refused | Page ID changed | Re-fetch /json to get new WebSocket URL

| "editor not found" | Page not fully loaded | Wait a few seconds before running script

| "button not found" | Send button not visible | Check if text was actually input first

| Login page instead of app | Wrong user-data-dir path | Verify path: "$HOME/Library/Application Support/Google/Chrome"

| DevTools remote debugging requires a non-default data directory | Chrome disallows default profile for CDP | Launch with a cloned profile: /tmp/chrome-gemini-profile

| curl shows connection refused even though Chrome is running | CDP not listening due to profile path | Ensure --user-data-dir is not default and the port is free

| No Gemini page found via CDP | Gemini not loaded or not logged in | Open https://gemini.google.com/ in the launched Chrome and wait for /app

browser-use Issues

| Not logged in | browser-use creates isolated session | Use Chrome Remote Debugging method instead

| Unknown key: "請" error | CLI doesn't support Unicode | Use eval with JavaScript execCommand

| Click doesn't work | Element index changed | Re-run state before each click

Best Practices

  • Always use Chrome Remote Debugging for queries requiring authentication

  • Wait 30+ seconds for complex queries (Gemini's "Deep Think" mode takes longer)

  • Check for .markdown elements to verify response is complete

  • Use inline Python for one-off queries; use the full script for automation

  • Close Chrome debugging session when done to avoid port conflicts

  • Keep profile cloned in /tmp/chrome-gemini-profile to avoid CDP blocking the default profile

Complete Example: Crypto Price Analysis

完整工作流程

# Step 1: 準備 Chrome 設定檔副本 (避免 CDP 預設目錄限制)
rm -rf /tmp/chrome-gemini-profile
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/

# Step 2: 啟動 Chrome 遠端除錯模式
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="/tmp/chrome-gemini-profile" \
  "https://gemini.google.com/" > /dev/null 2>&1 &

# Step 3: 等待頁面載入並驗證連接
sleep 8
curl -s http://localhost:9222/json | python3 -c "import sys, json; pages = json.load(sys.stdin); gemini = [p for p in pages if p.get('type') == 'page' and 'gemini.google.com' in p.get('url', '')]; print(f\"找到 Gemini 頁面: {gemini[0]['url'] if gemini else '未找到'}\")"

方法 1: 完整查詢腳本 (query_gemini.py)

將以下內容儲存為 query_gemini.py:

import asyncio
import websockets
import json
import subprocess
import sys

async def query_gemini(query_text, wait_seconds=60):
    # Get the Gemini page WebSocket URL
    result = subprocess.run(
        ["curl", "-s", "http://localhost:9222/json"],
        capture_output=True, text=True
    )
    pages = json.loads(result.stdout)

    # Find Gemini page
    gemini_page = None
    for page in pages:
        if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
            gemini_page = page
            break

    if not gemini_page:
        print("錯誤:找不到 Gemini 頁面。請確保 Chrome 已開啟 Gemini。")
        return None

    ws_url = gemini_page["webSocketDebuggerUrl"]
    print(f"正在連接到: {ws_url}")

    async with websockets.connect(ws_url) as ws:
        # Step 1: Input the query
        input_js = f'''
        const editor = document.querySelector('div[contenteditable="true"]');
        if(editor) {{
            editor.focus();
            document.execCommand('insertText', false, `{query_text}`);
            editor.dispatchEvent(new Event('input', {{bubbles: true}}));
            'success';
        }} else {{
            'editor not found';
        }}
        '''

        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": input_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"輸入結果: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")

        # Step 2: Click send button
        await asyncio.sleep(1)
        click_js = '''
        const btn = document.querySelector('button[aria-label="傳送訊息"]');
        if(btn) { btn.click(); 'clicked'; } else { 'button not found'; }
        '''

        await ws.send(json.dumps({
            "id": 2,
            "method": "Runtime.evaluate",
            "params": {"expression": click_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"點擊結果: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")

        # Step 3: Wait for response
        print(f"等待 {wait_seconds} 秒讓 Gemini 回應...")
        await asyncio.sleep(wait_seconds)

        # Step 4: Extract the response - try to get complete content
        extract_js = '''
        const markdownEls = document.querySelectorAll('.markdown');
        if(markdownEls.length > 0) {
            const lastMarkdown = markdownEls[markdownEls.length - 1];
            // Get all text content including nested elements
            lastMarkdown.innerText || lastMarkdown.textContent || 'Empty response';
        } else {
            'No response found';
        }
        '''

        await ws.send(json.dumps({
            "id": 3,
            "method": "Runtime.evaluate",
            "params": {"expression": extract_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        content = result.get('result', {}).get('result', {}).get('value', 'No content')

        return content

# Main execution
if __name__ == "__main__":
    query = """範例問題:請詳細分析 BTC、ETH 的價格預測走勢。
需包含相關專業指標,並用繁體中文回答。"""

    result = asyncio.run(query_gemini(query, wait_seconds=60))
    print("\n" + "="*50)
    print("GEMINI 回應:")
    print("="*50)
    print(result)

執行方式:

python3 query_gemini.py

方法 2: 獲取已存在的回應 (get_gemini_response.py)

如果 Gemini 頁面已經有回應,可以使用此腳本直接提取:

import asyncio
import websockets
import json
import subprocess

async def get_all_gemini_content():
    # Get the Gemini page WebSocket URL
    result = subprocess.run(
        ["curl", "-s", "http://localhost:9222/json"],
        capture_output=True, text=True
    )
    pages = json.loads(result.stdout)

    # Find Gemini page
    gemini_page = None
    for page in pages:
        if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
            gemini_page = page
            break

    if not gemini_page:
        print("錯誤:找不到 Gemini 頁面。")
        return None

    ws_url = gemini_page["webSocketDebuggerUrl"]
    print(f"正在連接到: {ws_url}\n")

    async with websockets.connect(ws_url) as ws:
        # Extract all markdown content from the page
        extract_js = '''
        (function() {
            const markdownEls = document.querySelectorAll('.markdown');
            console.log('Found markdown elements:', markdownEls.length);

            if(markdownEls.length === 0) {
                return 'No markdown elements found';
            }

            // Get the last two markdown elements (user query and AI response)
            const responses = [];
            const startIdx = Math.max(0, markdownEls.length - 2);

            for(let i = startIdx; i < markdownEls.length; i++) {
                const text = markdownEls[i].innerText || markdownEls[i].textContent || '';
                if(text.trim()) {
                    responses.push(`[回應 ${i+1}]:\\n${text}`);
                }
            }

            return responses.join('\\n\\n' + '='.repeat(80) + '\\n\\n');
        })()
        '''

        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": extract_js, "returnByValue": True}
        }))
        response = await ws.recv()
        result = json.loads(response)
        content = result.get('result', {}).get('result', {}).get('value', 'No content')

        return content

# Main execution
if __name__ == "__main__":
    result = asyncio.run(get_all_gemini_content())
    print("="*80)
    print("GEMINI 對話內容:")
    print("="*80)
    print(result)

執行方式:

python3 get_gemini_response.py

實際使用範例

# 完整流程
rm -rf /tmp/chrome-gemini-profile && \
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/ && \
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="/tmp/chrome-gemini-profile" \
  "https://gemini.google.com/" > /dev/null 2>&1 &

# 等待並執行查詢
sleep 8 && python3 query_gemini.py

清理資源

完成查詢後,建議清理臨時文件和資源:

# 1. 關閉 Chrome 除錯會話
pkill -9 "Google Chrome"

# 2. 清理臨時設定檔 (可選,釋放磁碟空間)
rm -rf /tmp/chrome-gemini-profile

# 3. 清理測試過程中生成的臨時腳本和輸出文件
rm -f query_gemini.py get_gemini_response.py get_all_gemini_content.py
rm -f gemini_response.txt gemini_full_response.txt

最佳實踐:

  • 每次使用後關閉 Chrome - 避免佔用 9222 端口

  • 定期清理臨時設定檔 - /tmp/chrome-gemini-profile 可能佔用數百 MB

  • 保持工作目錄整潔 - 刪除測試腳本,將常用腳本整合到專案中

  • 使用完整腳本 - 將上述 query_gemini.py 儲存為專案文件,而非每次重新建立

注意事項

  • 等待時間調整 - 複雜查詢(如深度分析)建議 wait_seconds=60 或更長

  • 回應截斷問題 - 如果回應很長,可能需要多次提取或使用 get_all_gemini_content.py 方法

  • 登入狀態 - 確保 Chrome 設定檔中已登入 Google 帳號

  • 網路穩定性 - CDP 連接需要穩定的網路環境

  • 並發限制 - 避免同時開啟多個 Chrome 除錯會話在同一端口

返回排行榜