Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/browser-use/browser-use/llms.txt

Use this file to discover all available pages before exploring further.

Overview

The Tools system is the bridge between the LLM and browser actions. It provides a registry of actions the agent can perform, handles parameter validation, and manages action execution. Each tool is a function that the LLM can call to interact with the web.

Architecture

Registry

Centralized catalog of available actions

Action Models

Pydantic models for type-safe parameters

Parameter Injection

Automatic dependency injection from context

Result Handling

Structured responses with ActionResult

The Tools Class

The Tools class (line 345 in tools/service.py) manages the action registry:
class Tools(Generic[Context]):
    def __init__(
        self,
        exclude_actions: list[str] | None = None,
        output_model: type[T] | None = None,
        display_files_in_done_text: bool = True,
    ):
        self.registry = Registry[Context](exclude_actions or [])
        self._output_model = output_model
        self._coordinate_clicking_enabled = False

Creating a Tools Instance

from browser_use import Tools

# Default tools (all actions)
tools = Tools()

# Exclude specific actions
tools = Tools(exclude_actions=['search', 'wait'])

# With structured output model
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    rating: float

tools = Tools(output_model=ProductInfo)

Default Browser Actions

Interaction Actions

Click an element by index or coordinate
# By index (element from selector_map)
{
    "click": {
        "index": 42
    }
}

# By coordinate (advanced models only)
{
    "click": {
        "coordinate_x": 500,
        "coordinate_y": 300
    }
}
Implementation (line 565 for index, line 521 for coordinate):
  • Looks up element from selector_map
  • Highlights element visually
  • Detects if click opens new tab
  • Handles special cases (dropdowns, file inputs)
  • Returns: "Clicked {element_description}"
Coordinate clicking is auto-enabled for Claude Sonnet 4, Claude Opus 4, Gemini 3 Pro, and Browser Use models.

Content Extraction

Extract structured data from page using LLM
{
    "extract": {
        "query": "Extract product names and prices",
        "extract_links": true,
        "start_from_char": 0,  # For pagination
        "output_schema": {  # Optional structured output
            "type": "object",
            "properties": {
                "products": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "number"}
                        }
                    }
                }
            }
        }
    }
}
Implementation (line 947):
  1. Extracts clean markdown from page (removes ads/noise)
  2. Chunks content if over 100k chars
  3. Calls page_extraction_llm with query
  4. Returns structured or free-text result
  5. Saves to file if result is large
Content Processing:
  • Original HTML → Initial Markdown → Filtered Markdown
  • Structure-aware chunking (preserves tables, lists)
  • Overlap context for continuation chunks
  • Stats included in response
Returns: <url>...</url><query>...</query><result>...</result>

Tab Management

Switch to another tab
{
    "switch": {
        "tab_id": "a1b2"  # Last 4 chars of target_id
    }
}
Returns: "Switched to tab #{tab_id}"

File Operations

Upload file to input[type=file]
{
    "upload_file": {
        "index": 10,  # File input element index
        "path": "/path/to/file.pdf"
    }
}
Implementation (line 721):
  • Validates file exists and has content
  • Finds file input near selected element
  • Falls back to closest file input to scroll position
  • Returns: "Successfully uploaded file"
Files must be in available_file_paths parameter when creating the agent.

Form Controls

Completion

Complete the task
{
    "done": {
        "success": true,
        "text": "Successfully extracted 5 products",
        "attachments": ["products.csv"]  # Optional file paths
    }
}
Signals task completion with final output.

Creating Custom Tools

Basic Custom Tool

from browser_use import Tools, ActionResult

tools = Tools()

@tools.action('Ask human for help with a question')
async def ask_human(question: str) -> ActionResult:
    """Human-in-the-loop tool."""
    answer = input(f'\n{question}\n> ')
    return ActionResult(
        extracted_content=f'Human answered: {answer}'
    )

Tool with Browser Access

Critical: The parameter must be named exactly browser_session with type BrowserSession. Parameter injection works by name matching.
from browser_use import BrowserSession, ActionResult

@tools.action('Get current page title deterministically')
async def get_title(browser_session: BrowserSession) -> ActionResult:
    """Access browser directly for deterministic actions."""
    page = await browser_session.get_current_page()
    title = await page.evaluate('document.title')
    return ActionResult(extracted_content=f'Title: {title}')

Tool with Multiple Injections

Available injectable parameters (line 77 in tools/service.py):
@tools.action('Advanced tool with all context')
async def advanced_tool(
    # Your parameters
    query: str,
    max_results: int = 10,
    
    # Injected by agent (name-based matching)
    browser_session: BrowserSession,
    file_system: FileSystem,
    page_extraction_llm: BaseChatModel,
    available_file_paths: list[str],
    has_sensitive_data: bool,
    sensitive_data: dict[str, str | dict[str, str]] | None,
    extraction_schema: dict | None,
) -> ActionResult:
    """Tool with full agent context access."""
    
    # Use browser
    state = await browser_session.get_browser_state_summary()
    
    # Use file system
    file_system.write_file('output.txt', 'data')
    
    # Use extraction LLM
    response = await page_extraction_llm.ainvoke([...])
    
    return ActionResult(extracted_content='Result')

Domain-Restricted Tools

@tools.action(
    'Login to example.com with credentials',
    allowed_domains=['*.example.com', 'auth.example.com']
)
async def login(username: str, password: str, browser_session: BrowserSession):
    """Only callable on example.com domains."""
    page = await browser_session.get_current_page()
    # ... login logic
    return ActionResult(extracted_content='Logged in successfully')

ActionResult Response

The ActionResult class structures tool responses:
from browser_use.agent.views import ActionResult

# Simple success
ActionResult(extracted_content="Found 5 products")

# With error
ActionResult(error="Failed to load page: timeout")

# Task completion
ActionResult(
    extracted_content="Task completed",
    is_done=True,
    success=True,
    attachments=["report.pdf", "data.csv"]
)

# With metadata
ActionResult(
    extracted_content="Clicked button",
    metadata={'click_x': 500, 'click_y': 300}
)

# With memory
ActionResult(
    extracted_content="Long result text...",
    long_term_memory="Short summary for agent memory",
    include_extracted_content_only_once=True
)
ActionResult Fields:
  • extracted_content: Main result text (shown to agent)
  • error: Error message if action failed
  • is_done: Mark task as complete
  • success: Whether task succeeded (for done action)
  • attachments: List of file paths
  • metadata: Additional structured data
  • long_term_memory: Summary for agent’s memory
  • include_extracted_content_only_once: Show full content once, use memory after

Parameter Injection System

The tools system automatically injects context based on parameter names:
# Parameter name -> Injected value
browser_session: BrowserSession  → Current browser session
file_system: FileSystem          → Agent's file system
page_extraction_llm: BaseChatModel → Extraction LLM
available_file_paths: list[str]  → Available files
has_sensitive_data: bool         → Whether sensitive data exists
sensitive_data: dict             → Sensitive data mapping
extraction_schema: dict          → Structured output schema
The injection happens at execution time, not registration. You don’t need to pass these values when registering tools.

Tools Registry

The Registry class manages action registration:
from browser_use.tools.registry.service import Registry

registry = Registry()

# Register action
@registry.action(
    description='My custom action',
    param_model=MyParamModel,
    terminates_sequence=False,  # Whether action ends multi-action sequence
)
async def my_action(params: MyParamModel) -> ActionResult:
    return ActionResult(extracted_content='Done')

# Get prompt description (for LLM)
prompt = registry.get_prompt_description(url='https://example.com')

# Create dynamic action model
ActionModel = registry.create_action_model(
    include_actions=['click', 'input', 'done'],
    exclude_actions=['search']
)

Excluding Default Actions

Remove actions you don’t need:
tools = Tools(exclude_actions=[
    'search',        # Don't allow web searches
    'wait',          # Don't allow waiting
    'screenshot',    # Don't allow screenshot requests
    'upload_file',   # Don't allow file uploads
])
Common exclusions:
  • screenshot: When use_vision != 'auto'
  • search: For domain-restricted tasks
  • upload_file: For read-only tasks

Coordinate Clicking

Enable coordinate-based clicking for advanced models:
tools = Tools()
tools.set_coordinate_clicking(True)

# Now agent can click by pixel coordinates
# Automatically enabled for: Claude Sonnet 4, Claude Opus 4, Gemini 3 Pro, Browser Use models
When enabled, click action accepts coordinates:
{
    "click": {
        "coordinate_x": 500,
        "coordinate_y": 300
    }
}

Structured Output Integration

Define expected output format:
from pydantic import BaseModel

class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str

class SearchResults(BaseModel):
    results: list[SearchResult]
    total_count: int

tools = Tools(output_model=SearchResults)

# Agent now has a 'structured_output' action
# LLM will call it when task is complete:
# {
#     "structured_output": {
#         "results": [...],
#         "total_count": 10
#     }
# }

Real-World Examples

Human-in-the-Loop

from browser_use import Tools, Agent, ChatBrowserUse, ActionResult

tools = Tools()

@tools.action('Ask human for 2FA code')
async def get_2fa_code() -> ActionResult:
    """Get 2FA code from user."""
    code = input('Enter 2FA code: ')
    return ActionResult(extracted_content=f'2FA code: {code}')

agent = Agent(
    task="""
    1. Go to example.com/login
    2. Enter username and password
    3. Use get_2fa_code action to get 2FA code
    4. Complete login
    """,
    llm=ChatBrowserUse(),
    tools=tools,
)

API Integration

import httpx

@tools.action('Fetch product data from external API')
async def fetch_product_data(product_id: str) -> ActionResult:
    """Call external API for product information."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f'https://api.example.com/products/{product_id}'
        )
        data = response.json()
        
    return ActionResult(
        extracted_content=f"Product: {data['name']}, Price: ${data['price']}"
    )

Database Access

import asyncpg

@tools.action('Save scraped data to database')
async def save_to_db(
    product_name: str,
    price: float,
    rating: float
) -> ActionResult:
    """Store extracted data in PostgreSQL."""
    conn = await asyncpg.connect('postgresql://localhost/mydb')
    
    await conn.execute(
        'INSERT INTO products (name, price, rating) VALUES ($1, $2, $3)',
        product_name, price, rating
    )
    
    await conn.close()
    return ActionResult(extracted_content='Saved to database')

Deterministic Automation

from browser_use import BrowserSession, ActionResult

@tools.action('Fill form deterministically with Playwright-like API')
async def fill_complex_form(
    browser_session: BrowserSession
) -> ActionResult:
    """Use Actor API for precise form filling."""
    page = await browser_session.get_current_page()
    
    # Deterministic element selection and interaction
    await page.click('button[data-test="login"]')
    await page.wait_for_selector('input[name="email"]')
    await page.fill('input[name="email"]', 'user@example.com')
    await page.fill('input[name="password"]', 'secret123')
    await page.click('button[type="submit"]')
    
    # Wait for navigation
    await page.wait_for_url('**/dashboard')
    
    return ActionResult(extracted_content='Form filled and submitted')

Performance Considerations

Use search_page and find_elements for fast, LLM-free lookups:
# Instead of extract, use search_page for simple queries
{
    "search_page": {
        "pattern": "\\$\\d+\\.\\d{2}",  # Find prices
        "regex": true
    }
}

Troubleshooting

Check:
  • Tool description is clear and specific
  • Parameter types are correct
  • Tool isn’t excluded in Tools(exclude_actions=[…])
  • Domain restrictions don’t block current page
Verify:
  • Parameter name matches exactly (e.g., browser_session)
  • Type hint is correct (e.g., BrowserSession)
  • Parameter is available in current context
Ensure:
  • Return type is ActionResult or str
  • Don’t raise exceptions, return ActionResult(error=’…’)
  • Use proper field names (extracted_content, not content)

Next Steps

Available Tools

Complete list of default actions

Add Custom Tools

Detailed guide to creating tools

Tool Response

Advanced ActionResult patterns

Actor API

Playwright-like browser control