Google's official preview project for the Gemini 2.5 Computer Use model: an AI agent that controls a browser to carry out tasks described in natural language.

License: Apache-2.0 · Language: Python · Repository: google/computer-use-preview · Stars: 1.6k · Last Updated: October 10, 2025

Google Computer Use Preview Project Introduction

Project Overview

Google Computer Use Preview is an open-source project released by Google that demonstrates the Gemini 2.5 Computer Use model. It lets developers drive a browser with natural language instructions, so the model acts as a browser automation agent.

Project Address: https://github.com/google/computer-use-preview

Open Source License: Apache 2.0

Core Features

1. Natural Language Control

Users can describe tasks using simple natural language, and the AI agent will automatically parse and execute corresponding browser operations, such as:

  • Clicking buttons
  • Filling forms
  • Scrolling pages
  • Entering text
  • Performing searches

2. Multi-environment Support

The project supports two operating environments:

  • Playwright: Local browser control, executing tasks locally using the Chrome browser.
  • Browserbase: Cloud browser service, supporting remote browser control.

3. Based on Gemini 2.5 Model

This project uses Google's latest gemini-2.5-computer-use-preview-10-2025 model, which is specifically optimized for UI interaction and features:

  • Powerful visual understanding capabilities
  • Precise UI element recognition
  • Low-latency response
  • Excellent reasoning capabilities

4. API Flexibility

Supports two API access methods:

  • Gemini Developer API: Suitable for rapid development and testing.
  • Vertex AI: Suitable for enterprise-grade application deployment.

Technical Architecture

Core Components

  1. Browser Control Layer

    • Playwright: Local browser automation framework
    • Browserbase: Cloud browser infrastructure
  2. AI Model Layer

    • Gemini 2.5 Computer Use model
    • Visual understanding and reasoning capabilities
    • UI action generation
  3. Agent Loop

    • Receives user queries
    • Captures screenshots
    • Generates and executes actions
    • Tracks historical operations

Working Principle

  1. The user provides a task description in natural language.
  2. The system captures a screenshot of the current browser state.
  3. The Gemini model analyzes the screenshot and task requirements.
  4. The model generates specific UI operation instructions (click, type, scroll, etc.).
  5. The operation is executed, and the new screen state is obtained.
  6. Steps 2-5 are repeated until the task is complete.
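
The loop described in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: `Action`, `run_agent`, and the `capture`, `decide`, and `execute` callables are hypothetical names that stand in for the browser layer and the model call, and the `max_steps` cap is an assumed safeguard.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """A single UI operation proposed by the model (hypothetical shape)."""
    name: str                                  # e.g. "click", "type", "scroll"
    args: dict = field(default_factory=dict)   # e.g. {"x": 10, "y": 20}


def run_agent(
    task: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, list], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> list:
    """Repeat screenshot -> decide -> execute until the model returns None."""
    history: list = []
    for _ in range(max_steps):                       # assumed cap to avoid endless loops
        screenshot = capture()                       # step 2: current browser state
        action = decide(task, screenshot, history)   # steps 3-4: model picks an action
        if action is None:                           # model signals the task is complete
            break
        execute(action)                              # step 5: apply it in the browser
        history.append(action)                       # keep a record of past operations
    return history


if __name__ == "__main__":
    # Stubbed run: a scripted "model" that clicks, types, then stops.
    scripted = iter([
        Action("click", {"x": 10, "y": 20}),
        Action("type", {"text": "Hello World"}),
        None,
    ])
    result = run_agent(
        "demo task",
        capture=lambda: b"",
        decide=lambda task, shot, hist: next(scripted),
        execute=lambda action: None,
    )
    print([a.name for a in result])
```

Passing the browser and model in as callables keeps the loop testable without a real browser or API key; a real implementation would replace the stubs with Playwright or Browserbase calls and a Gemini request.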

Quick Start

Environment Requirements

  • Python 3.x
  • Chrome browser
  • Gemini API key (or Vertex AI access)

Installation Steps

  1. Clone the project

    git clone https://github.com/google/computer-use-preview.git
    cd computer-use-preview
    
  2. Create a virtual environment and install dependencies

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    
  3. Install Playwright and browser

    # Install system dependencies required by Chrome
    playwright install-deps chrome
    
    # Install Chrome browser
    playwright install chrome
    

Configure API Key

Using Gemini Developer API

export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

Or, to persist the key across sessions, append it to the virtual environment's activate script and reactivate the environment:

echo 'export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"' >> .venv/bin/activate
deactivate
source .venv/bin/activate

Using Vertex AI

export USE_VERTEXAI=true
export VERTEXAI_PROJECT="YOUR_PROJECT_ID"
export VERTEXAI_LOCATION="YOUR_LOCATION"

Usage Examples

1. Basic Usage (Playwright local environment)

python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright"

2. Specify Initial URL

python main.py \
  --query="Go to Google and type 'Hello World' into the search bar" \
  --env="playwright" \
  --initial_url="https://www.google.com/search?q=latest+AI+news"

3. Use Browserbase Cloud Environment

First, set Browserbase environment variables:

export BROWSERBASE_API_KEY="YOUR_BROWSERBASE_API_KEY"
export BROWSERBASE_PROJECT_ID="YOUR_BROWSERBASE_PROJECT_ID"

Then run:

python main.py \
  --query="Go to Google and type 'Hello World' into the search bar" \
  --env="browserbase"

Command Line Argument Description

Main Parameters

| Parameter | Description | Required | Default Value | Supported Environments |
|---|---|---|---|---|
| --query | Natural language task description | Yes | N/A | All |
| --env | Operating environment (playwright/browserbase) | No | N/A | All |
| --initial_url | Initial URL to load when the browser starts | No | https://www.google.com | playwright |
| --highlight_mouse | Highlight mouse position in screenshots (for debugging) | No | false | playwright |
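
For readers who want to see how the parameters listed above map onto a command-line parser, here is a sketch using Python's standard `argparse` module. It mirrors the names and defaults given in this section only; `build_parser` is a hypothetical helper, and the project's real `main.py` may declare its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the parameters described in this section."""
    parser = argparse.ArgumentParser(
        description="Run a browser agent from a natural language query."
    )
    parser.add_argument(
        "--query", required=True,
        help="Natural language task description.",
    )
    parser.add_argument(
        "--env", choices=("playwright", "browserbase"), default=None,
        help="Operating environment (no default is documented here).",
    )
    parser.add_argument(
        "--initial_url", default="https://www.google.com",
        help="Initial URL to load when the browser starts (playwright only).",
    )
    parser.add_argument(
        "--highlight_mouse", action="store_true",
        help="Highlight the mouse position in screenshots (playwright only).",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--query", "Go to Google and type 'Hello World' into the search bar",
         "--env", "playwright"]
    )
    print(args.env, args.initial_url, args.highlight_mouse)
```

Modelling `--highlight_mouse` as a `store_true` flag is an assumption that matches its documented default of false.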

Environment Variables

| Variable Name | Description | Required |
|---|---|---|
| GEMINI_API_KEY | Gemini API key | Yes (when using Gemini API) |
| BROWSERBASE_API_KEY | Browserbase API key | Yes (when using browserbase environment) |
| BROWSERBASE_PROJECT_ID | Browserbase project ID | Yes (when using browserbase environment) |
| USE_VERTEXAI | Enable Vertex AI | No |
| VERTEXAI_PROJECT | Vertex AI project ID | Yes (when using Vertex AI) |
| VERTEXAI_LOCATION | Vertex AI location | Yes (when using Vertex AI) |
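
The conditional requirements above can be checked before launching a run. The following sketch is one possible pre-flight check, not something shipped with the project: `missing_variables` is a hypothetical helper, and treating only the string "true" (case-insensitive) as enabling Vertex AI is an assumption about how `USE_VERTEXAI` is parsed.

```python
import os
from typing import Mapping


def missing_variables(browser_env: str,
                      environ: Mapping[str, str] = os.environ) -> list:
    """Return the required variables that are unset for the chosen setup."""
    # Assumption: Vertex AI is enabled only when USE_VERTEXAI equals "true".
    use_vertex = environ.get("USE_VERTEXAI", "").strip().lower() == "true"

    if use_vertex:
        required = ["VERTEXAI_PROJECT", "VERTEXAI_LOCATION"]
    else:
        required = ["GEMINI_API_KEY"]

    # The Browserbase variables only matter for the cloud browser environment.
    if browser_env == "browserbase":
        required += ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]

    # Empty strings count as missing.
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    problems = missing_variables("playwright")
    if problems:
        print("Missing environment variables:", ", ".join(problems))
    else:
        print("Environment looks complete.")
```

Accepting the environment as a parameter makes the check easy to exercise with a plain dictionary instead of the real process environment.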

Application Scenarios

1. Automated Testing

  • UI regression testing
  • End-to-end testing
  • Cross-browser testing

2. Data Scraping

  • Automated form filling
  • Web data extraction
  • Scheduled task execution

3. Workflow Automation

  • Repetitive task automation
  • Multi-step business processes
  • Batch operation processing

4. Personal Assistant

  • Automate daily web operations
  • Information collection and organization
  • Intelligent web navigation

Performance

According to evaluation data published by Google and Browserbase, the Gemini 2.5 Computer Use model scores well on several benchmarks:

  • Online-Mind2Web: Leading accuracy in web control tasks
  • WebVoyager: Strong results on complex web navigation tasks
  • Low Latency: Faster responses than the competing models evaluated
  • High Accuracy: Outperforms other mainstream models on browser and mobile control benchmarks

Notes

Security

  • This model is a preview version and may contain bugs and security vulnerabilities.
  • Model-suggested actions may be inappropriate or unsafe.
  • Adversarial inputs may lead to malicious operations.
  • It is recommended to conduct thorough testing before using in a production environment.

Usage Restrictions

  • Deployments should include an explicit human confirmation mechanism.
  • Use must comply with Google's Generative AI Prohibited Use Policy.
  • This product is subject to Pre-GA terms.

Best Practices

  • Always test in a controlled environment.
  • Monitor the agent's operational behavior.
  • Add human review for critical operations.
  • Regularly update to the latest version.
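
As an illustration of the human review practice, the sketch below gates sensitive actions behind a yes/no prompt. Everything in it is hypothetical: the `SENSITIVE_ACTIONS` names, the `approved` helper, and the prompt wording are assumptions for demonstration and do not come from the project.

```python
from typing import Callable

# Hypothetical action names that should never run without a human saying yes.
SENSITIVE_ACTIONS = frozenset({"submit_form", "purchase", "delete"})


def approved(action_name: str, description: str,
             ask: Callable[[str], str] = input) -> bool:
    """Return True if the action may run; prompt a human for sensitive ones."""
    if action_name not in SENSITIVE_ACTIONS:
        return True                      # routine actions pass through
    answer = ask(f"Agent wants to {description}. Allow? [y/N] ")
    # Anything other than an explicit yes is treated as a refusal.
    return answer.strip().lower() in ("y", "yes")


if __name__ == "__main__":
    if approved("purchase", "complete the checkout"):
        print("Action approved.")
    else:
        print("Action blocked.")
```

Defaulting to refusal means an empty or ambiguous reply blocks the action, which is the safer failure mode for an agent that may be steered by adversarial page content.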

Related Resources

  • Official Documentation: Vertex AI Computer Use Documentation
  • Google AI Studio: Rapid testing and prototyping
  • Browserbase Demo: Experience Computer Use features online
  • Developer Forum: Provide feedback and get support

Technical Advantages

  1. Visual Understanding: Powerful visual recognition capabilities based on Gemini 2.5 Pro.
  2. Native UI Interaction: Operates directly on graphical interfaces without requiring structured APIs.
  3. Post-login Operations: Supports complex tasks requiring authentication.
  4. Form Handling: Intelligently fills and submits complex forms.
  5. Interactive Element Operations: Handles interactive components like dropdown menus and filters.

Project Significance

Google Computer Use Preview represents a significant advancement in AI agent technology. By enabling AI models to interact directly with graphical interfaces, much like humans do, rather than relying on structured APIs, this technology opens up new possibilities for building general-purpose agents. It allows developers to:

  • Automate complex tasks that previously required human intervention.
  • Rapidly build intelligent browser automation applications.
  • Reduce development costs for UI testing and workflow automation.
  • Explore new forms of human-computer interaction.

Future Outlook

As model capabilities continue to improve, computer use technology is expected to advance in the following areas:

  • Higher accuracy and reliability
  • More complex multi-step task execution
  • Better security and controllability
  • Deeper integration with other AI capabilities
  • Broader coverage of application scenarios
