Gemini Computer Use Quick start

Source the env file and set your API key:

cp env.example env.sh $EDITOR env.sh source env.sh

Create a virtual environment and install dependencies:

python -m venv .venv source .venv/bin/activate pip install google-genai playwright playwright install chromium

Run the agent script with a prompt:

python scripts/computer_use_agent.py \ --prompt "Find the latest blog post title on example.com" \ --start-url "https://example.com" \ --turn-limit 6

Browser selection Default: Playwright's bundled Chromium (no env vars required). Choose a channel (Chrome/Edge) with COMPUTER_USE_BROWSER_CHANNEL. Use a custom Chromium-based executable (e.g., Brave) with COMPUTER_USE_BROWSER_EXECUTABLE.

If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.

Core workflow (agent loop) Capture a screenshot and send the user goal + screenshot to the model. Parse function_call actions in the response. Execute each action in Playwright. If a safety_decision is require_confirmation, prompt the user before executing. Send function_response objects containing the latest URL + screenshot. Repeat until the model returns only text (no actions) or you hit the turn limit. Operational guidance Run in a sandboxed browser profile or container. Use --exclude to block risky actions you do not want the model to take. Keep the viewport at 1440x900 unless you have a reason to change it. Resources Script: scripts/computer_use_agent.py Reference notes: references/google-computer-use.md Env template: env.example

gemini-computer-use

安装