Google's official preview project for the Gemini 2.5 Computer Use model: an AI agent that controls a browser to carry out tasks described in natural language.
Google Computer Use Preview Project Introduction
Project Overview
Google Computer Use Preview is an open-source project released by Google that demonstrates the Gemini 2.5 Computer Use model. Developers describe a task in natural language, and the agent controls a browser to carry it out step by step.
Project Address: https://github.com/google/computer-use-preview
Open Source License: Apache 2.0
Core Features
1. Natural Language Control
Users can describe tasks using simple natural language, and the AI agent will automatically parse and execute corresponding browser operations, such as:
- Clicking buttons
- Filling forms
- Scrolling pages
- Entering text
- Performing searches
2. Multi-environment Support
The project supports two operating environments:
- Playwright: Local browser control, executing tasks locally using the Chrome browser.
- Browserbase: Cloud browser service, supporting remote browser control.
3. Based on Gemini 2.5 Model
This project uses Google's gemini-2.5-computer-use-preview-10-2025 model, which is optimized for UI interaction and features:
- Powerful visual understanding capabilities
- Precise UI element recognition
- Low-latency response
- Excellent reasoning capabilities
4. API Flexibility
Supports two API access methods:
- Gemini Developer API: Suitable for rapid development and testing.
- Vertex AI: Suitable for enterprise-grade application deployment.
Technical Architecture
Core Components
Browser Control Layer
- Playwright: Local browser automation framework
- Browserbase: Cloud browser infrastructure
AI Model Layer
- Gemini 2.5 Computer Use model
- Visual understanding and reasoning capabilities
- UI action generation
Agent Loop
- Receives user queries
- Captures screenshots
- Generates and executes actions
- Tracks historical operations
Working Principle
1. The user provides a task description in natural language.
2. The system captures a screenshot of the current browser state.
3. The Gemini model analyzes the screenshot and task requirements.
4. The model generates specific UI operation instructions (click, type, scroll, etc.).
5. The operation is executed, and the new screen state is obtained.
6. Steps 2-5 are repeated until the task is complete.
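The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `Action`, `BrowserEnv`, `run_agent`, `FakeEnv`, and `scripted_model` are hypothetical names, and the Gemini call is replaced by a plain function so the control flow can be run without an API key or a browser.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Action:
    """A single UI operation proposed by the model (hypothetical shape)."""
    name: str                                  # e.g. "click", "type", "scroll", "done"
    args: dict = field(default_factory=dict)


class BrowserEnv(Protocol):
    """Hypothetical interface a Playwright or Browserbase backend could satisfy."""
    def screenshot(self) -> bytes: ...
    def execute(self, action: Action) -> None: ...


def run_agent(
    query: str,
    env: BrowserEnv,
    propose_action: Callable[[str, bytes, list], Action],
    max_steps: int = 20,
) -> list:
    """Repeat screenshot -> propose -> execute until the model signals "done"."""
    history: list = []
    for _ in range(max_steps):
        shot = env.screenshot()                        # step 2: capture current state
        action = propose_action(query, shot, history)  # steps 3-4: model picks an action
        if action.name == "done":
            break
        env.execute(action)                            # step 5: act in the browser
        history.append(action)                         # keep a record of past operations
    return history


class FakeEnv:
    """In-memory stand-in for a browser, used only to exercise the loop."""
    def __init__(self) -> None:
        self.executed: list = []

    def screenshot(self) -> bytes:
        return b"fake-png-%d" % len(self.executed)

    def execute(self, action: Action) -> None:
        self.executed.append(action)


def scripted_model(query: str, shot: bytes, history: list) -> Action:
    """Stand-in for the model call: click, type, then finish."""
    script = [
        Action("click", {"x": 10, "y": 20}),
        Action("type", {"text": "Hello World"}),
    ]
    return script[len(history)] if len(history) < len(script) else Action("done")
```

The `max_steps` cap is a design choice of this sketch: it keeps a confused model from looping indefinitely, which matters when each step costs a model call.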
Quick Start
Environment Requirements
- Python 3.x
- Chrome browser
- Gemini API key (or Vertex AI access)
Installation Steps
Clone the project
git clone https://github.com/google/computer-use-preview.git
cd computer-use-preview
Create a virtual environment and install dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Install Playwright and browser
# Install system dependencies required by Chrome
playwright install-deps chrome
# Install Chrome browser
playwright install chrome
Configure API Key
Using Gemini Developer API
export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
Or, to persist the key across sessions, append it to the virtual environment's activate script and reactivate:
echo 'export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"' >> .venv/bin/activate
deactivate
source .venv/bin/activate
Using Vertex AI
export USE_VERTEXAI=true
export VERTEXAI_PROJECT="YOUR_PROJECT_ID"
export VERTEXAI_LOCATION="YOUR_LOCATION"
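As a rough illustration of how these variables could select a backend, the sketch below maps them to client settings. The function name `client_settings` is hypothetical and this is not the project's own logic. The returned keys (`api_key`, `vertexai`, `project`, `location`) are chosen to mirror the keyword arguments accepted by `genai.Client` in the google-genai SDK, which is an assumption about how one might consume them; accepting `1` and `yes` as truthy values is likewise an assumption.

```python
import os
from typing import Mapping


def client_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Return hypothetical client settings derived from the variables above."""
    # Assumption: treat a few common spellings as "enabled".
    use_vertex = environ.get("USE_VERTEXAI", "").strip().lower() in {"1", "true", "yes"}
    if use_vertex:
        return {
            "vertexai": True,
            "project": environ["VERTEXAI_PROJECT"],    # raises KeyError if unset
            "location": environ["VERTEXAI_LOCATION"],  # raises KeyError if unset
        }
    return {"api_key": environ["GEMINI_API_KEY"]}      # raises KeyError if unset
```

Passing the mapping as a parameter instead of reading `os.environ` directly keeps the function easy to test with a plain dictionary.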
Usage Examples
1. Basic Usage (Playwright local environment)
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright"
2. Specify Initial URL
python main.py \
--query="Go to Google and type 'Hello World' into the search bar" \
--env="playwright" \
--initial_url="https://www.google.com/search?q=latest+AI+news"
3. Use Browserbase Cloud Environment
First, set Browserbase environment variables:
export BROWSERBASE_API_KEY="YOUR_BROWSERBASE_API_KEY"
export BROWSERBASE_PROJECT_ID="YOUR_BROWSERBASE_PROJECT_ID"
Then run:
python main.py \
--query="Go to Google and type 'Hello World' into the search bar" \
--env="browserbase"
Command Line Argument Description
Main Parameters
| Parameter | Description | Required | Default Value | Supported Environments |
|---|---|---|---|---|
| --query | Natural language task description | Yes | N/A | All |
| --env | Operating environment (playwright/browserbase) | No | N/A | All |
| --initial_url | Initial URL to load when the browser starts | No | https://www.google.com | playwright |
| --highlight_mouse | Highlight mouse position in screenshots (for debugging) | No | false | playwright |
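For readers who want to see how the parameters listed above fit together, here is a minimal `argparse` sketch that reproduces them. It is an assumption that the command line is built this way; `build_parser` is a hypothetical name, and the defaults simply follow the table (no default for `--env`).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented parameters (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI")
    parser.add_argument("--query", required=True,
                        help="Natural language task description")
    parser.add_argument("--env", choices=["playwright", "browserbase"],
                        help="Operating environment")
    parser.add_argument("--initial_url", default="https://www.google.com",
                        help="Initial URL to load when the browser starts")
    parser.add_argument("--highlight_mouse", action="store_true",
                        help="Highlight mouse position in screenshots (debugging)")
    return parser
```

With this parser, `build_parser().parse_args(["--query", "hi", "--env", "playwright"])` yields a namespace whose `initial_url` falls back to the default and whose `highlight_mouse` is `False`.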
Environment Variables
| Variable Name | Description | Required |
|---|---|---|
| GEMINI_API_KEY | Gemini API key | Yes (when using Gemini API) |
| BROWSERBASE_API_KEY | Browserbase API key | Yes (when using browserbase environment) |
| BROWSERBASE_PROJECT_ID | Browserbase project ID | Yes (when using browserbase environment) |
| USE_VERTEXAI | Enable Vertex AI | No |
| VERTEXAI_PROJECT | Vertex AI project ID | Yes (when using Vertex AI) |
| VERTEXAI_LOCATION | Vertex AI location | Yes (when using Vertex AI) |
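Because which variables are required depends on the chosen environment and API, a small pre-flight check can catch a missing value before a browser is launched. The sketch below encodes the conditions from the table above; `missing_variables` is a hypothetical helper, not part of the project, and comparing `USE_VERTEXAI` against the string `true` is an assumption based on the export example shown earlier.

```python
from typing import Mapping


def missing_variables(env_name: str, environ: Mapping[str, str]) -> list:
    """List required variables that are unset or empty for the chosen setup."""
    required = []
    if environ.get("USE_VERTEXAI", "").lower() == "true":
        required += ["VERTEXAI_PROJECT", "VERTEXAI_LOCATION"]
    else:
        required.append("GEMINI_API_KEY")
    if env_name == "browserbase":
        required += ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]
    return [name for name in required if not environ.get(name)]
```

A caller might run `missing_variables("browserbase", os.environ)` at startup and exit with a readable message if the returned list is not empty.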
Application Scenarios
1. Automated Testing
- UI regression testing
- End-to-end testing
- Cross-browser testing
2. Data Scraping
- Automated form filling
- Web data extraction
- Scheduled task execution
3. Workflow Automation
- Repetitive task automation
- Multi-step business processes
- Batch operation processing
4. Personal Assistant
- Automate daily web operations
- Information collection and organization
- Intelligent web navigation
Performance
According to evaluations published by Google and Browserbase, the Gemini 2.5 Computer Use model compares favorably with other models on several benchmarks:
- Online-Mind2Web: Leading accuracy in web control tasks
- WebVoyager: Excellent performance in complex web navigation tasks
- Low Latency: Faster response compared to competing models
- High Accuracy: Outperforms other mainstream models in browser and mobile control benchmarks
Notes
Security
- This model is a preview version and may contain bugs and security vulnerabilities.
- Model-suggested actions may be inappropriate or unsafe.
- Adversarial inputs may lead to malicious operations.
- It is recommended to conduct thorough testing before using in a production environment.
Usage Restrictions
- Sensitive or high-risk actions should go through an explicit human confirmation mechanism.
- Usage must comply with Google's Generative AI Prohibited Use Policy.
- This product is subject to Pre-GA terms.
Best Practices
- Always test in a controlled environment.
- Monitor the agent's operational behavior.
- Add human review for critical operations.
- Regularly update to the latest version.
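One way to add human review for critical operations is to wrap execution in a confirmation gate. The following is a minimal sketch under stated assumptions: `guarded_execute`, `CRITICAL_ACTIONS`, and the action names in that set are hypothetical and are not taken from the project or the model's action vocabulary.

```python
from typing import Callable

# Hypothetical action names treated as critical; adjust per application.
CRITICAL_ACTIONS = {"submit_form", "purchase", "delete"}


def guarded_execute(
    action_name: str,
    execute: Callable[[], None],
    ask_human: Callable[[str], bool],
) -> bool:
    """Run `execute` only if the action is non-critical or a human approves it."""
    if action_name in CRITICAL_ACTIONS and not ask_human(f"Allow '{action_name}'?"):
        return False  # declined: nothing is executed
    execute()
    return True
```

In an interactive session, `ask_human` could wrap `input()` and return `True` only for an explicit "y". Injecting it as a callable also makes it possible to route approvals through a review queue instead of a terminal prompt.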
Related Resources
- Official Documentation: Vertex AI Computer Use Documentation
- Google AI Studio: Rapid testing and prototyping
- Browserbase Demo: Experience Computer Use features online
- Developer Forum: Provide feedback and get support
Technical Advantages
- Visual Understanding: Powerful visual recognition capabilities based on Gemini 2.5 Pro.
- Native UI Interaction: Operates directly on graphical interfaces without requiring structured APIs.
- Post-login Operations: Supports complex tasks requiring authentication.
- Form Handling: Intelligently fills and submits complex forms.
- Interactive Element Operations: Handles interactive components like dropdown menus and filters.
Project Significance
Google Computer Use Preview represents a significant advancement in AI agent technology. By enabling AI models to interact directly with graphical interfaces, much like humans do, rather than relying on structured APIs, this technology opens up new possibilities for building general-purpose agents. It allows developers to:
- Automate complex tasks that previously required human intervention.
- Rapidly build intelligent browser automation applications.
- Reduce development costs for UI testing and workflow automation.
- Explore new forms of human-computer interaction.
Future Outlook
As model capabilities continue to improve, computer use technology will evolve in the following areas:
- Higher accuracy and reliability
- More complex multi-step task execution
- Better security and controllability
- Deeper integration with other AI capabilities
- Broader coverage of application scenarios