Introduction - Browser Use

What is Browser Use?

Browser Use is a powerful Python library that enables AI agents to autonomously interact with web browsers. By combining Large Language Models (LLMs) with browser automation, Browser Use allows you to describe tasks in natural language and have them executed automatically.

Example Task

“Go to HackerNews, find the top 5 Show HN posts, and extract their titles, URLs, and comment counts”

Instead of writing complex Selenium or Playwright scripts, you simply tell Browser Use what you want to accomplish, and it figures out how to do it.

Key Features

Natural Language Control

Describe tasks in plain English - no need to write explicit automation scripts

Multiple LLM Support

Works with ChatBrowserUse, OpenAI, Google Gemini, Anthropic Claude, Ollama, and more

Smart Interactions

Handles forms, navigation, data extraction, file uploads, and complex workflows automatically

Custom Tools

Extend agent capabilities with custom Python functions for APIs, 2FA, file operations, and more

Production Ready

Deploy to production with Browser Use Cloud sandboxes - handles browsers, authentication, and scaling

Visual Understanding

Uses vision models to understand page layouts and identify interactive elements

How It Works

You Provide a Task

Describe what you want to accomplish in natural language:

task = "Find the top 3 posts on HackerNews Show HN"

Agent Analyzes the Page

The agent loads the webpage, analyzes the DOM structure, and identifies interactive elements, using vision models to interpret the page layout

LLM Decides Actions

The language model determines what actions to take: clicking, typing, scrolling, extracting data, etc.

Actions Are Executed

Browser Use executes the actions through CDP (Chrome DevTools Protocol) and evaluates the results

Process Repeats

The agent repeats this loop until the task is complete or it encounters an error
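
The loop described in the steps above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Browser Use's implementation: `FakePage`, `scripted_llm`, and `run_loop` are hypothetical names, the "LLM" is a hard-coded function, and no real browser is involved.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Stand-in for a browser page; a real agent would inspect the live DOM."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def observe(self) -> dict:
        # A real observation would include the DOM and interactive elements.
        return {"url": self.url, "steps_taken": len(self.log)}

    def execute(self, action: dict) -> str:
        # Record the action, apply its effect, and report the outcome.
        self.log.append(action)
        if action["type"] == "navigate":
            self.url = action["url"]
        return f"ok: {action['type']}"


def scripted_llm(task: str, observation: dict) -> dict:
    """Hypothetical decision function; a real agent would call an LLM here."""
    if observation["url"] == "about:blank":
        return {"type": "navigate", "url": "https://news.ycombinator.com/show"}
    if observation["steps_taken"] < 2:
        return {"type": "extract", "what": "top 3 post titles"}
    return {"type": "done"}


def run_loop(task: str, page: FakePage, max_steps: int = 10) -> list:
    """Observe, decide, act, and repeat until done or out of steps."""
    results = []
    for _ in range(max_steps):
        observation = page.observe()
        action = scripted_llm(task, observation)
        if action["type"] == "done":
            break
        results.append(page.execute(action))
    return results


page = FakePage()
outcome = run_loop("Find the top 3 posts on HackerNews Show HN", page)
print(outcome)
```

The `max_steps` cap is a design choice in this sketch: it guarantees the loop ends even if the decision function never reports that the task is done.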

Simple Example

Here’s a complete working example:

from browser_use import Agent, ChatBrowserUse
import asyncio
from dotenv import load_dotenv

load_dotenv()

async def main():
    agent = Agent(
        task="Search Google for 'Python automation' and tell me the top 3 results",
        llm=ChatBrowserUse(),
    )
    history = await agent.run()
    print(history.final_result())

if __name__ == "__main__":
    asyncio.run(main())

The agent automatically handles browser setup, navigation, searching, and extracting results - all from a single task description!
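
Because a run is an ordinary coroutine, it can be wrapped with standard `asyncio` utilities. The sketch below shows one possible way to bound the run time with `asyncio.wait_for`. To keep it runnable without a browser or API key, `fake_agent_run` is a hypothetical stand-in for `await agent.run()`, and the timeout values are arbitrary.

```python
import asyncio


async def fake_agent_run(delay: float) -> str:
    """Stand-in for `await agent.run()`; sleeps instead of driving a browser."""
    await asyncio.sleep(delay)
    return "finished"


async def run_with_timeout(delay: float, timeout: float) -> str:
    """Return the run's result, or a marker string if the time limit is hit."""
    try:
        return await asyncio.wait_for(fake_agent_run(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"


fast = asyncio.run(run_with_timeout(delay=0.01, timeout=1.0))
slow = asyncio.run(run_with_timeout(delay=1.0, timeout=0.01))
print(fast, slow)
```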

Common Use Cases

Data Extraction & Web Scraping

Extract structured data from websites without writing CSS selectors or XPath expressions:

task = """
Go to https://quotes.toscrape.com/ and extract the first 5 quotes
with their authors and tags in a structured format
"""
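
If the task asks for JSON, the returned text can be checked before it is used downstream. The following is a minimal sketch, assuming the agent's final result is a JSON array of objects with `text`, `author`, and `tags` keys. The `Quote` class, the `parse_quotes` helper, and the sample payload are all illustrative and are not part of Browser Use.

```python
import json
from dataclasses import dataclass


@dataclass
class Quote:
    text: str
    author: str
    tags: list


def parse_quotes(raw: str) -> list:
    """Parse a JSON array of quote objects and reject malformed entries."""
    quotes = []
    for item in json.loads(raw):
        missing = {"text", "author", "tags"} - item.keys()
        if missing:
            raise ValueError(f"quote is missing fields: {sorted(missing)}")
        quotes.append(Quote(item["text"], item["author"], list(item["tags"])))
    return quotes


# Illustrative payload only; not real output from an agent run.
sample = '[{"text": "Example quote", "author": "Example Author", "tags": ["demo"]}]'
quotes = parse_quotes(sample)
print(quotes[0].author)
```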

Form Automation

Fill out and submit forms automatically:

task = """
Go to the contact form and fill it with:
- Name: John Doe
- Email: john@example.com
- Message: This is an automated test
Then submit the form
"""
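
When the same form is filled with different data, the task text can be generated rather than written by hand. This sketch assumes a hypothetical helper, `build_form_task`, and a placeholder URL. It only builds the string and does not call the library.

```python
def build_form_task(form_url: str, fields: dict) -> str:
    """Build a natural-language form-filling task from label/value pairs."""
    lines = [f"Go to {form_url} and fill the form with:"]
    lines += [f"- {label}: {value}" for label, value in fields.items()]
    lines.append("Then submit the form")
    return "\n".join(lines)


task = build_form_task(
    "https://example.com/contact",  # placeholder URL
    {
        "Name": "John Doe",
        "Email": "john@example.com",
        "Message": "This is an automated test",
    },
)
print(task)
```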

Authenticated Workflows

Handle logins and authenticated sessions:

task = """
1. Log in to the dashboard
2. Navigate to the reports section
3. Download the latest monthly report
"""
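
For logins, it is usually safer to keep real credentials out of the task text. The sketch below shows one possible pattern: the task mentions only placeholder names, and the secret values are read from environment variables into a separate mapping. The helper `load_credentials`, the placeholder names, and the environment variable names are assumptions for illustration. How such a mapping is handed to the agent is not shown here and should be checked against the library's documentation.

```python
import os


def load_credentials(names: dict, env=None) -> dict:
    """Map placeholder names to secret values read from the environment."""
    env = os.environ if env is None else env
    secrets = {}
    for placeholder, env_var in names.items():
        value = env.get(env_var)
        if not value:
            raise KeyError(f"environment variable {env_var} is not set")
        secrets[placeholder] = value
    return secrets


# The task mentions only placeholders, never the secret values themselves.
task = """
1. Log in to the dashboard with username x_username and password x_password
2. Navigate to the reports section
3. Download the latest monthly report
"""

# DASHBOARD_USER and DASHBOARD_PASSWORD are hypothetical variable names.
# A demo mapping is passed here; omit `env` to read from os.environ instead.
demo_env = {"DASHBOARD_USER": "demo-user", "DASHBOARD_PASSWORD": "demo-pass"}
secrets = load_credentials(
    {"x_username": "DASHBOARD_USER", "x_password": "DASHBOARD_PASSWORD"},
    env=demo_env,
)
print(sorted(secrets))
```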

Research & Monitoring

Gather information from multiple sources:

task = """
Find the latest news about browser automation tools,
compare their GitHub stars, and summarize the top 3
"""

Why Browser Use?

Developer Friendly

Simple Python API, extensive documentation, and rich examples

Flexible Architecture

Customize browser settings, add custom tools, and control every aspect of automation

Open Source

MIT licensed with an active community on GitHub and Discord

Production Scale

Built-in cloud deployment with Browser Use Cloud for enterprise workloads

Architecture Overview

Browser Use consists of three main components:

Agent - The orchestrator that manages task execution and decision-making
Browser - The automation layer that controls Chromium via CDP
Tools - Built-in and custom actions the agent can perform

from datetime import datetime

from browser_use import Agent, Browser, ChatBrowserUse, Tools

# Configure browser
browser = Browser(
    headless=False,
    window_size={'width': 1280, 'height': 720}
)

# Add custom tools
tools = Tools()

@tools.action('Get current timestamp')
def get_timestamp() -> str:
    # Return the current local time as an ISO 8601 string
    return datetime.now().isoformat()

# Create agent with custom configuration
agent = Agent(
    task="Your task here",
    llm=ChatBrowserUse(),
    browser=browser,
    tools=tools
)
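
To make the role of the Tools component more concrete, here is a toy registry that mimics the decorator pattern used above. It is a sketch under stated assumptions, not the real `Tools` class: `ToolRegistry`, `describe`, and `call` are hypothetical names. It only illustrates how a description and a function could be stored together so that an orchestrator can list the available actions and invoke one by name.

```python
from datetime import datetime
from typing import Callable


class ToolRegistry:
    """Toy registry; not the Browser Use Tools class."""

    def __init__(self) -> None:
        self._actions: dict = {}

    def action(self, description: str) -> Callable:
        """Return a decorator that registers a function with a description."""
        def decorator(func: Callable) -> Callable:
            self._actions[func.__name__] = (description, func)
            return func
        return decorator

    def describe(self) -> list:
        """List registered actions in a form that could be shown to an LLM."""
        return [f"{name}: {desc}" for name, (desc, _) in self._actions.items()]

    def call(self, name: str, **kwargs):
        """Invoke a registered action by name."""
        _, func = self._actions[name]
        return func(**kwargs)


registry = ToolRegistry()


@registry.action("Get current timestamp")
def get_timestamp() -> str:
    return datetime.now().isoformat()


print(registry.describe())
stamp = registry.call("get_timestamp")
```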

Next Steps

Quick Start

Get up and running in 5 minutes

Installation Guide

Detailed installation and setup instructions

Examples

Browse 100+ real-world examples

Join Discord

Get help from 20k+ developers

Community & Support

Browser Use has a thriving community:

20,000+ developers in our Discord
1,000+ examples and use cases
Active development - we ship updates daily
Enterprise support available at support@browser-use.com

Ready to automate? Continue to the Quick Start guide to build your first agent in under 5 minutes.
