google/computer-use-preview

Google's official preview project for the Gemini 2.5 Computer Use model: an AI agent that controls a browser to carry out tasks described in natural language.

Apache-2.0 | Python | computer-use-preview | google | 1.6k | Last Updated: October 10, 2025

Google Computer Use Preview Project Introduction

Project Overview

Google Computer Use Preview is an open-source project released by Google that showcases the Gemini 2.5 Computer Use model. Developers describe a task in natural language, and an agent drives a browser to carry it out.

Project Address: https://github.com/google/computer-use-preview

Open Source License: Apache 2.0

Core Features

1. Natural Language Control

Users can describe tasks using simple natural language, and the AI agent will automatically parse and execute corresponding browser operations, such as:

Clicking buttons
Filling forms
Scrolling pages
Entering text
Performing searches

2. Multi-environment Support

The project supports two operating environments:

Playwright: Local browser control, executing tasks locally using the Chrome browser.
Browserbase: Cloud browser service, supporting remote browser control.

3. Based on Gemini 2.5 Model

This project uses Google's latest gemini-2.5-computer-use-preview-10-2025 model, which is specifically optimized for UI interaction and features:

Powerful visual understanding capabilities
Precise UI element recognition
Low-latency response
Excellent reasoning capabilities

4. API Flexibility

Supports two API access methods:

Gemini Developer API: Suitable for rapid development and testing.
Vertex AI: Suitable for enterprise-grade application deployment.

Technical Architecture

Core Components

Browser Control Layer
- Playwright: Local browser automation framework
- Browserbase: Cloud browser infrastructure
AI Model Layer
- Gemini 2.5 Computer Use model
- Visual understanding and reasoning capabilities
- UI action generation
Agent Loop
- Receives user queries
- Captures screenshots
- Generates and executes actions
- Tracks historical operations

Working Principle

1. The user provides a task description in natural language.
2. The system captures a screenshot of the current browser state.
3. The Gemini model analyzes the screenshot and task requirements.
4. The model generates specific UI operation instructions (click, type, scroll, etc.).
5. The operation is executed, and the new screen state is obtained.
6. Steps 2-5 are repeated until the task is complete.
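
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Action` shape, the `FakeBrowser` class, and the `scripted_model` function are hypothetical stand-ins, so the example runs offline without a browser or an API key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """One UI operation proposed by the model (hypothetical shape)."""
    kind: str                                   # e.g. "click", "type", "scroll"
    args: Dict[str, str] = field(default_factory=dict)


class FakeBrowser:
    """Stand-in for a real browser backend; it only records requests."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real backend would return image bytes of the current page.
        return f"screen-after-{len(self.log)}-actions".encode()

    def execute(self, action: Action) -> None:
        self.log.append(f"{action.kind}:{action.args}")


# The model is represented as a plain callable:
# (task, screenshot, history) -> next Action, or None when the task is done.
Model = Callable[[str, bytes, List[Action]], Optional[Action]]


def run_agent(task: str, browser: FakeBrowser, model: Model,
              max_steps: int = 20) -> List[Action]:
    """Repeat screenshot -> model -> execute until done or max_steps is hit."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = browser.screenshot()             # step 2: capture current state
        action = model(task, shot, history)     # steps 3-4: analyze and propose
        if action is None:                      # model signals completion
            break
        browser.execute(action)                 # step 5: perform the operation
        history.append(action)                  # track historical operations
    return history


def scripted_model(task: str, shot: bytes,
                   history: List[Action]) -> Optional[Action]:
    """Toy model that replays a fixed plan instead of calling Gemini."""
    plan = [
        Action("click", {"target": "search box"}),
        Action("type", {"text": "Hello World"}),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


if __name__ == "__main__":
    steps = run_agent("type Hello World into the search bar",
                      FakeBrowser(), scripted_model)
    print([a.kind for a in steps])
```

The `max_steps` cap is an assumption added here so that a model that never signals completion cannot loop forever.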

Quick Start

Environment Requirements

Python 3.x
Chrome browser
Gemini API key (or Vertex AI access)

Installation Steps

Clone the project

git clone https://github.com/google/computer-use-preview.git
cd computer-use-preview

Create a virtual environment and install dependencies

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Install Playwright and browser

# Install system dependencies required by Chrome
playwright install-deps chrome

# Install Chrome browser
playwright install chrome

Configure API Key

Using Gemini Developer API

export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

Or append it to the virtual environment's activate script so the key is set every time the environment is activated:

echo 'export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"' >> .venv/bin/activate
deactivate
source .venv/bin/activate

Using Vertex AI

export USE_VERTEXAI=true
export VERTEXAI_PROJECT="YOUR_PROJECT_ID"
export VERTEXAI_LOCATION="YOUR_LOCATION"

Usage Examples

1. Basic Usage (Playwright local environment)

python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright"

2. Specify Initial URL

python main.py \
  --query="Go to Google and type 'Hello World' into the search bar" \
  --env="playwright" \
  --initial_url="https://www.google.com/search?q=latest+AI+news"

3. Use Browserbase Cloud Environment

First, set Browserbase environment variables:

export BROWSERBASE_API_KEY="YOUR_BROWSERBASE_API_KEY"
export BROWSERBASE_PROJECT_ID="YOUR_BROWSERBASE_PROJECT_ID"

Then run:

python main.py \
  --query="Go to Google and type 'Hello World' into the search bar" \
  --env="browserbase"

Command Line Argument Description

Main Parameters

Parameter	Description	Required	Default Value	Supported Environments
`--query`	Natural language task description	Yes	N/A	All
`--env`	Operating environment (playwright/browserbase)	No	N/A	All
`--initial_url`	Initial URL to load when the browser starts	No	https://www.google.com	playwright
`--highlight_mouse`	Highlight mouse position in screenshots (for debugging)	No	false	playwright
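
For readers who want to see how such a command line could be declared, here is a minimal `argparse` sketch that mirrors the table above. It is an illustration written from the table, not the project's actual `main.py`. Two choices here are assumptions: `--highlight_mouse` is modeled as a boolean switch, and `--env` defaults to `None` to match the "N/A" default listed above.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the four arguments listed in the table (illustrative only)."""
    parser = argparse.ArgumentParser(description="Browser agent CLI sketch")
    parser.add_argument("--query", required=True,
                        help="Natural language task description")
    parser.add_argument("--env", choices=["playwright", "browserbase"],
                        default=None, help="Operating environment")
    parser.add_argument("--initial_url", default="https://www.google.com",
                        help="Initial URL to load when the browser starts")
    parser.add_argument("--highlight_mouse", action="store_true",
                        help="Highlight mouse position in screenshots")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse(["--query", "Go to Google", "--env", "playwright"])
    print(args.query, args.env, args.initial_url, args.highlight_mouse)
```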

Environment Variables

Variable Name	Description	Required
`GEMINI_API_KEY`	Gemini API key	Yes (when using Gemini API)
`BROWSERBASE_API_KEY`	Browserbase API key	Yes (when using browserbase environment)
`BROWSERBASE_PROJECT_ID`	Browserbase project ID	Yes (when using browserbase environment)
`USE_VERTEXAI`	Enable Vertex AI	No
`VERTEXAI_PROJECT`	Vertex AI project ID	Yes (when using Vertex AI)
`VERTEXAI_LOCATION`	Vertex AI location	Yes (when using Vertex AI)
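
The conditional requirements in this table can be checked before launching the agent. The helper below is a hypothetical sketch, not part of the project. It assumes that only the value `true` (case-insensitive) enables Vertex AI, as in the export example earlier, and that the Gemini API key is needed whenever Vertex AI is not enabled.

```python
import os
from typing import List, Mapping, Optional


def missing_variables(env_name: str,
                      environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty.

    env_name is the value passed to --env ("playwright" or "browserbase").
    """
    environ = os.environ if environ is None else environ

    # Assumption: only "true" switches the backend to Vertex AI.
    use_vertex = environ.get("USE_VERTEXAI", "").strip().lower() == "true"
    if use_vertex:
        required = ["VERTEXAI_PROJECT", "VERTEXAI_LOCATION"]
    else:
        required = ["GEMINI_API_KEY"]

    # Browserbase needs its own credentials on top of the model credentials.
    if env_name == "browserbase":
        required += ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]

    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables("playwright")
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Environment looks complete.")
```

Passing a plain dictionary as `environ` makes the check easy to try out without touching the real process environment.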

Application Scenarios

1. Automated Testing

UI regression testing
End-to-end testing
Cross-browser testing

2. Data Scraping

Automated form filling
Web data extraction
Scheduled task execution

3. Workflow Automation

Repetitive task automation
Multi-step business processes
Batch operation processing

4. Personal Assistant

Automate daily web operations
Information collection and organization
Intelligent web navigation

Performance

According to evaluation data from Google and Browserbase, the Gemini 2.5 Computer Use model performs strongly on several benchmarks:

OnlineMind2Web: Leading accuracy in web control tasks
WebVoyager: Excellent performance in complex web navigation tasks
Low Latency: Faster response compared to competing models
High Accuracy: Outperforms other mainstream models in browser and mobile control benchmarks

Notes

Security

This model is a preview version and may contain bugs and security vulnerabilities.
Model-suggested actions may be inappropriate or unsafe.
Adversarial inputs may lead to malicious operations.
Test thoroughly before using it in a production environment.

Usage Restrictions

Use requires explicit human confirmation mechanisms.
Use must comply with Google's Generative AI prohibited use policy.
This product is subject to Pre-GA terms.

Best Practices

Always test in a controlled environment.
Monitor the agent's operational behavior.
Add human review for critical operations.
Regularly update to the latest version.
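
One way to add human review for critical operations is a small approval gate that runs before each action. The sketch below is an assumption-laden illustration. The action names in `CRITICAL_KINDS` and the `approve` function are hypothetical and are not taken from the project or the model's action set.

```python
from typing import Callable, Set

# Hypothetical action kinds that should never run without a human "yes".
CRITICAL_KINDS: Set[str] = {"submit_form", "purchase", "delete"}


def approve(kind: str, details: str,
            ask: Callable[[str], str] = input) -> bool:
    """Return True if the action may run.

    Non-critical actions pass automatically; critical ones require an
    explicit "y" or "yes" from the reviewer. Anything else is a refusal.
    """
    if kind not in CRITICAL_KINDS:
        return True
    answer = ask(f"Agent wants to {kind} ({details}). Allow? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


if __name__ == "__main__":
    # Simulated reviewer that declines, so the script runs without prompting.
    allowed = approve("purchase", "cart total shown on screen",
                      ask=lambda prompt: "n")
    print("allowed" if allowed else "blocked")
```

Defaulting to refusal on empty or unexpected input keeps the gate conservative, and the `ask` parameter lets a test or a UI supply the answer instead of the terminal.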

Related Resources

Official Documentation: Vertex AI Computer Use Documentation
Google AI Studio: Rapid testing and prototyping
Browserbase Demo: Experience Computer Use features online
Developer Forum: Provide feedback and get support

Technical Advantages

Visual Understanding: Powerful visual recognition capabilities based on Gemini 2.5 Pro.
Native UI Interaction: Operates directly on graphical interfaces without requiring structured APIs.
Post-login Operations: Supports complex tasks requiring authentication.
Form Handling: Intelligently fills and submits complex forms.
Interactive Element Operations: Handles interactive components like dropdown menus and filters.

Project Significance

Google Computer Use Preview represents a significant advancement in AI agent technology. By enabling AI models to interact directly with graphical interfaces, much like humans do, rather than relying on structured APIs, this technology opens up new possibilities for building general-purpose agents. It allows developers to:

Automate complex tasks that previously required human intervention.
Rapidly build intelligent browser automation applications.
Reduce development costs for UI testing and workflow automation.
Explore new forms of human-computer interaction.

Future Outlook

As model capabilities continue to improve, computer use technology will evolve in the following areas:

Higher accuracy and reliability
More complex multi-step task execution
Better security and controllability
Deeper integration with other AI capabilities
Broader coverage of application scenarios