AI-powered browser automation framework that combines natural language and code to control browsers

MIT | TypeScript | 13.9k | browserbase/stagehand | Last Updated: 2025-07-14

Stagehand - AI-Powered Browser Automation Framework

Project Overview

Stagehand is a production-ready AI browser automation framework developed by Browserbase. It addresses the pain points of existing browser automation tools: they either require writing low-level code (like Selenium, Playwright, Puppeteer) or use high-level agents that are unpredictable in production environments.

Core Features

1. Flexible Control Methods

  • Code and Natural Language Combination: Developers can choose when to use code and when to use natural language.
  • AI Navigation: Use AI for navigation on unfamiliar pages.
  • Precise Control: Use code (Playwright) when you know exactly what you want to do.

2. Preview and Caching Features

  • Action Preview: Preview AI actions before execution.
  • Caching Mechanism: Easily cache repeatable operations to save time and token consumption.
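
One way to picture how preview and caching fit together: resolve an instruction to a concrete action once, inspect it, then replay the stored action on later runs without another model call. The sketch below is a minimal in-memory illustration, not Stagehand's own cache. `ResolvedAction`, `ActionCache`, and `resolveWithCache` are hypothetical names, and the `resolve` callback stands in for whatever AI-backed step produces the action.

```typescript
// Hypothetical shape of an action that an AI step has resolved from an instruction.
interface ResolvedAction {
  selector: string; // element the action targets
  method: string;   // e.g. "click" or "fill"
  args: string[];   // arguments passed to the method
}

// Minimal in-memory cache keyed by page URL plus normalized instruction text.
class ActionCache {
  private store = new Map<string, ResolvedAction>();

  private key(url: string, instruction: string): string {
    return `${url}::${instruction.trim().toLowerCase()}`;
  }

  get(url: string, instruction: string): ResolvedAction | undefined {
    return this.store.get(this.key(url, instruction));
  }

  set(url: string, instruction: string, action: ResolvedAction): void {
    this.store.set(this.key(url, instruction), action);
  }
}

// Return the cached action if present; otherwise call the (expensive)
// resolver once and store its result for later runs.
async function resolveWithCache(
  cache: ActionCache,
  url: string,
  instruction: string,
  resolve: (instruction: string) => Promise<ResolvedAction>,
): Promise<ResolvedAction> {
  const hit = cache.get(url, instruction);
  if (hit) return hit;
  const action = await resolve(instruction);
  cache.set(url, instruction, action);
  return action;
}
```

The returned action can be logged or reviewed before it is executed, which corresponds to the preview step. A real cache would also need invalidation for when the page layout changes.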

3. One-Line Integration of Computer Use Models

  • SOTA Model Support: Integrate the latest computer use models from OpenAI and Anthropic with a single line of code.
  • stagehand.agent: A method that brings these models, or Browserbase's Open Operator, into a Stagehand script with one line of code.

Quick Start

Installation

npx create-browser-app

Local Development

git clone https://github.com/browserbase/stagehand.git
cd stagehand
npm install
npx playwright install
npm run build
npm run example # Run the example script in ./examples/example.ts

Environment Configuration

cp .env.example .env
nano .env # Edit the .env file to add API keys
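
A script can fail fast when keys are missing instead of erroring partway through a run. The sketch below is illustrative only: the variable names `OPENAI_API_KEY`, `BROWSERBASE_API_KEY`, and `BROWSERBASE_PROJECT_ID` are assumed examples and should be checked against the repository's `.env.example`, and `missingEnvVars` is a hypothetical helper.

```typescript
// Hypothetical helper: list the required variables that are unset or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Assumed example names; confirm the actual list in .env.example.
const REQUIRED_VARS = ["OPENAI_API_KEY", "BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"];

// In a Node script, pass process.env here. A plain object is used so the
// snippet runs without any environment set up.
const missing = missingEnvVars({ OPENAI_API_KEY: "sk-example" }, REQUIRED_VARS);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```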

Usage Example

Basic Usage

// Setup assumed by the snippets below (env: "LOCAL" runs a local browser)
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({ env: "LOCAL" });
await stagehand.init();

// Use Playwright functions to interact with the page object
const page = stagehand.page;
await page.goto("https://github.com/browserbase");

// Use act() to perform a single action
await page.act("click on the stagehand repo");

// Use a computer use agent for larger operations
const agent = stagehand.agent({
  provider: "openai",
  model: "computer-use-preview",
});
await agent.execute("Get to the latest PR");

// Use extract() to read data from the page
const { author, title } = await page.extract({
  instruction: "extract the author and title of the PR",
  schema: z.object({
    author: z.string().describe("The username of the PR author"),
    title: z.string().describe("The title of the PR"),
  }),
});

// Release the browser when finished
await stagehand.close();

Core Methods

1. The act() Method

  • Performs a single browser action.
  • Supports natural language instructions.
  • Suitable for actions like clicking, typing, and navigation.

2. The extract() Method

  • Extracts structured data from the page.
  • Integrates Zod schema validation.
  • Supports complex data extraction tasks.

3. The observe() Method

  • Observes page state and changes.
  • Used for conditional logic and state monitoring.
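
The conditional-logic use case can be sketched as follows: ask which actions are available for an instruction, and act only if something matches. The `ObservedAction` and `ObservablePage` types are simplified, hypothetical stand-ins declared so the snippet is self-contained. They model the idea of observe() returning candidate actions and are not Stagehand's exact type definitions.

```typescript
// Simplified, hypothetical stand-ins for the page object and observe() results.
interface ObservedAction {
  selector: string;
  description: string;
}

interface ObservablePage {
  observe(instruction: string): Promise<ObservedAction[]>;
  act(action: ObservedAction): Promise<void>;
}

// Perform the first matching action if one exists, and report whether
// anything was done so the caller can branch on the result.
async function actIfPresent(
  page: ObservablePage,
  instruction: string,
): Promise<boolean> {
  const [first] = await page.observe(instruction);
  if (!first) return false; // nothing matched: skip this step
  await page.act(first);
  return true;
}
```

For example, `await actIfPresent(page, "dismiss the cookie banner")` would dismiss a banner when one is found and do nothing otherwise, provided `page` satisfies the interface above.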

4. The agent() Method (New in V2)

  • Integrates advanced computer use models.
  • Supports multi-step workflows.
  • Suitable for complex interaction scenarios.

Version 2.0 New Features

Stagehand V2 introduces several significant improvements:

Performance Improvements

  • Faster act and extract: Both methods run noticeably faster than in earlier versions.
  • a11y-tree based optimizations: act and extract now work from the page's accessibility tree, which is the source of the speedup.
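
To illustrate why an accessibility-tree representation can be cheaper to process than raw HTML, the sketch below flattens a small tree into indented `role: name` lines and drops unnamed wrapper nodes. It shows the general idea only: the `A11yNode` type and `serializeA11yTree` function are hypothetical and do not reflect Stagehand's internal implementation.

```typescript
// Hypothetical, simplified accessibility-tree node.
interface A11yNode {
  role: string;
  name?: string;
  children?: A11yNode[];
}

// Flatten the tree into indented "role: name" lines.
// Unnamed "generic" nodes are skipped and their children keep the wrapper's depth.
function serializeA11yTree(
  node: A11yNode,
  depth = 0,
  out: string[] = [],
): string[] {
  const isWrapper = node.role === "generic" && !node.name;
  if (!isWrapper) {
    const label = node.name ? `${node.role}: ${node.name}` : node.role;
    out.push("  ".repeat(depth) + label);
  }
  const childDepth = isWrapper ? depth : depth + 1;
  for (const child of node.children ?? []) {
    serializeA11yTree(child, childDepth, out);
  }
  return out;
}

const tree: A11yNode = {
  role: "main",
  children: [
    { role: "generic", children: [{ role: "link", name: "stagehand" }] },
    { role: "button", name: "Star" },
  ],
};

console.log(serializeA11yTree(tree).join("\n"));
```

The wrapper node disappears from the output, leaving three short lines (`main`, `link: stagehand`, `button: Star`) that describe what a user can interact with.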

Enhanced Logging

  • Better visibility into the automation process.
  • Improved logging and debugging capabilities.

Comprehensive Documentation

  • Completely redesigned documentation site.
  • Better examples, guides, and best practices.

Improved Error Handling

  • More robust error handling.
  • Better error messages and debugging support.

Technical Architecture

Dependencies

  • Playwright: The backbone for web automation.
  • Zod: For data structure validation.
  • TypeScript: The primary development language.

Multi-language Support

In addition to the TypeScript/JavaScript version, the project also offers:

Integration with Browserbase

Browserbase is a cloud browser provider. Running Stagehand on Browserbase adds capabilities such as session replay, prompt observability, and CAPTCHA solving.

Summary

Stagehand combines the precision of code-level control with the flexibility of natural-language instructions. It targets production use for both simple web operations and complex data extraction tasks. Version 2.0 adds faster act and extract methods, the agent() method, and improved logging, documentation, and error handling.
