Overview
The Tools system is the bridge between the LLM and browser actions. It provides a registry of actions the agent can perform, handles parameter validation, and manages action execution. Each tool is a function that the LLM can call to interact with the web.
Architecture
Registry: Centralized catalog of available actions
Action Models: Pydantic models for type-safe parameters
Parameter Injection: Automatic dependency injection from context
Result Handling: Structured responses with ActionResult
The Tools class (line 345 in tools/service.py) manages the action registry:
class Tools(Generic[Context]):
    def __init__(
        self,
        exclude_actions: list[str] | None = None,
        output_model: type[T] | None = None,
        display_files_in_done_text: bool = True,
    ):
        self.registry = Registry[Context](exclude_actions or [])
        self._output_model = output_model
        self._coordinate_clicking_enabled = False
from browser_use import Tools
# Default tools (all actions)
tools = Tools()
# Exclude specific actions
tools = Tools(exclude_actions=['search', 'wait'])
# With structured output model
from pydantic import BaseModel
class ProductInfo(BaseModel):
    name: str
    price: float
    rating: float
tools = Tools(output_model=ProductInfo)
Default Browser Actions
Navigation Actions
Search the web using a search engine
# LLM can call:
{
  "search": {
    "query": "browser automation python",
    "engine": "duckduckgo"  # or "google", "bing"
  }
}
Implementation (line 362 in tools/service.py):
Encodes query for URL safety
Constructs search URL for specified engine
Navigates to search results
Returns: "Searched {engine} for '{query}'"
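The encode-then-construct steps above can be sketched with the standard library. This is an illustration only, not the code in tools/service.py: the helper name build_search_url and the URL templates in SEARCH_URLS are assumptions, and the library's own templates may differ.

```python
from urllib.parse import quote_plus

# Assumed URL templates for illustration; the library may use different ones.
SEARCH_URLS = {
    "duckduckgo": "https://duckduckgo.com/?q={query}",
    "google": "https://www.google.com/search?q={query}",
    "bing": "https://www.bing.com/search?q={query}",
}


def build_search_url(query: str, engine: str = "duckduckgo") -> str:
    """Encode the query for URL safety and build the results URL."""
    if engine not in SEARCH_URLS:
        raise ValueError(f"Unsupported search engine: {engine}")
    # quote_plus escapes reserved characters and turns spaces into '+'
    return SEARCH_URLS[engine].format(query=quote_plus(query))


url = build_search_url("browser automation python", "duckduckgo")
```

Rejecting unknown engines up front keeps a malformed LLM call from navigating to an unintended URL.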
Navigate to a specific URL
# LLM can call:
{
  "navigate": {
    "url": "https://example.com",
    "new_tab": false
  }
}
Implementation (line 406):
Validates URL format
Dispatches NavigateToUrlEvent
Handles network errors gracefully
Returns: "Navigated to {url}" or error
Go back in browser history
# LLM can call:
{
  "go_back": {}
}
Implementation (line 454):
Simple history navigation
Returns: "Navigated back"
Interaction Actions
click
input
scroll
send_keys
Click an element by index or coordinate
# By index (element from selector_map)
{
  "click": {
    "index": 42
  }
}
# By coordinate (advanced models only)
{
  "click": {
    "coordinate_x": 500,
    "coordinate_y": 300
  }
}
Implementation (line 565 for index, line 521 for coordinate):
Looks up element from selector_map
Highlights element visually
Detects if click opens new tab
Handles special cases (dropdowns, file inputs)
Returns: "Clicked {element_description}"
Coordinate clicking is auto-enabled for Claude Sonnet 4, Claude Opus 4, Gemini 3 Pro, and Browser Use models.
Type text into an input field
{
  "input": {
    "index": 15,
    "text": "Hello world",
    "clear": true  # Clear existing text first
  }
}
Implementation (line 635):
Validates element is an input field
Clears field if clear=true
Types text character by character
Detects autocomplete fields
Handles sensitive data masking
Returns: "Typed '{text}'" or "Typed <sensitive>"
Autocomplete fields automatically wait 400ms for dropdown suggestions to appear.
Scroll page or element
{
  "scroll": {
    "down": true,  # or false for up
    "pages": 1.5,  # scroll amount
    "index": null  # null for page, or element index
  }
}
Implementation (line 1237):
Auto-detects viewport height
Scrolls by full pages (viewport height)
Supports fractional pages
Can scroll specific elements
Returns: "Scrolled {direction} {pages} pages"
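The relationship between pages and viewport height can be shown as plain arithmetic. The function below is a hypothetical helper, not part of the library; it assumes the scroll distance is simply the viewport height multiplied by the page count, with the sign carrying the direction.

```python
def scroll_delta_px(viewport_height: int, pages: float = 1.0, down: bool = True) -> int:
    """Convert a (possibly fractional) page count into a signed pixel offset."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    pixels = int(viewport_height * pages)
    # Positive values scroll down, negative values scroll up
    return pixels if down else -pixels


delta = scroll_delta_px(viewport_height=800, pages=1.5, down=True)
```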
Send keyboard keys
{
  "send_keys": {
    "keys": "Tab Tab Enter"  # Space-separated keys
  }
}
Supported keys: Enter, Tab, Escape, ArrowUp, ArrowDown, ArrowLeft, ArrowRight, Backspace, Delete
Returns: "Sent keys: {keys}"
Tab Management
Switch to another tab
{
  "switch": {
    "tab_id": "a1b2"  # Last 4 chars of target_id
  }
}
Returns: "Switched to tab #{tab_id}"
Close a browser tab
{
  "close": {
    "tab_id": "a1b2"
  }
}
Returns: "Closed tab #{tab_id}"
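Because tab_id is only the last 4 characters of a target_id, resolving it means a suffix lookup over the open tabs. The sketch below shows one way that lookup could work; find_tab and the sample target IDs are hypothetical, and how the library treats ambiguous suffixes is not stated in this page.

```python
def find_tab(target_ids: list[str], tab_id: str) -> str:
    """Return the full target_id whose last 4 characters equal tab_id."""
    matches = [target for target in target_ids if target[-4:] == tab_id]
    if not matches:
        raise LookupError(f"No open tab ends with {tab_id!r}")
    if len(matches) > 1:
        # Assumption: treat a duplicate suffix as an error rather than guessing
        raise LookupError(f"Tab id {tab_id!r} is ambiguous")
    return matches[0]


open_tabs = ["9f3c77a1b2", "55d0e9c4d5"]  # made-up target IDs
resolved = find_tab(open_tabs, "a1b2")
```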
File Operations
upload_file
write_file
read_file
Upload file to input[type=file]
{
  "upload_file": {
    "index": 10,  # File input element index
    "path": "/path/to/file.pdf"
  }
}
Implementation (line 721):
Validates file exists and has content
Finds file input near selected element
Falls back to closest file input to scroll position
Returns: "Successfully uploaded file"
Files must be in available_file_paths parameter when creating the agent.
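The pre-upload checks described above (allow-list membership, existence, non-empty content) can be sketched as a small validator. This is a hedged illustration, not the library's code: check_upload and its error messages are hypothetical, and the real action may perform these checks in a different order.

```python
from pathlib import Path
from typing import Optional


def check_upload(path: str, available_file_paths: list[str]) -> Optional[str]:
    """Return an error message, or None when the file may be uploaded."""
    if path not in available_file_paths:
        return f"File {path} is not in available_file_paths"
    file = Path(path)
    if not file.is_file():
        return f"File {path} does not exist"
    if file.stat().st_size == 0:
        return f"File {path} is empty"
    return None
```

Returning a message instead of raising lets the caller pass the problem back to the agent as an error result.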
Write content to file
{
  "write_file": {
    "path": "output.txt",
    "content": "Hello world"
  }
}
Uses the agent's FileSystem service to create files in the workspace.
Read file contents
{
  "read_file": {
    "path": "data.txt"
  }
}
Reads from the agent's FileSystem workspace.
dropdown_options
select_dropdown
Get dropdown options
{
  "dropdown_options": {
    "index": 8  # Select element index
  }
}
Returns all options with values and text.
Select dropdown option
{
  "select_dropdown": {
    "index": 8,
    "value": "option-value"  # or text
  }
}
Selects option by value or visible text.
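Selecting "by value or visible text" implies a two-step match. The sketch below tries the value attribute first and falls back to the visible text; that ordering, the match_option name, and the dictionary shape of each option are assumptions for illustration.

```python
from typing import Optional


def match_option(options: list[dict[str, str]], wanted: str) -> Optional[dict[str, str]]:
    """Find an option by its value first, then by its visible text."""
    for option in options:
        if option["value"] == wanted:
            return option
    for option in options:
        if option["text"].strip() == wanted.strip():
            return option
    return None


options = [
    {"value": "us", "text": "United States"},
    {"value": "de", "text": "Germany"},
]
by_value = match_option(options, "de")
by_text = match_option(options, "United States")
```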
Completion
Complete the task
{
  "done": {
    "success": true,
    "text": "Successfully extracted 5 products",
    "attachments": ["products.csv"]  # Optional file paths
  }
}
Signals task completion with final output.
Request screenshot in next browser state
Only available when use_vision='auto'. Requests a screenshot for visual confirmation.
from browser_use import Tools, ActionResult
tools = Tools()
@tools.action('Ask human for help with a question')
async def ask_human(question: str) -> ActionResult:
    """Human-in-the-loop tool."""
    answer = input(f'\n{question}\n> ')
    return ActionResult(
        extracted_content=f'Human answered: {answer}'
    )
Critical: The parameter must be named exactly browser_session with type BrowserSession. Parameter injection works by name matching.
from browser_use import BrowserSession, ActionResult
@tools.action('Get current page title deterministically')
async def get_title(browser_session: BrowserSession) -> ActionResult:
    """Access browser directly for deterministic actions."""
    page = await browser_session.get_current_page()
    title = await page.evaluate('document.title')
    return ActionResult(extracted_content=f'Title: {title}')
Available injectable parameters (line 77 in tools/service.py):
@tools.action('Advanced tool with all context')
async def advanced_tool(
    # Your parameters
    query: str,
    # Injected by agent (name-based matching)
    browser_session: BrowserSession,
    file_system: FileSystem,
    page_extraction_llm: BaseChatModel,
    available_file_paths: list[str],
    has_sensitive_data: bool,
    sensitive_data: dict[str, str | dict[str, str]] | None,
    extraction_schema: dict | None,
    # Parameters with defaults must follow those without (Python syntax rule)
    max_results: int = 10,
) -> ActionResult:
    """Tool with full agent context access."""
    # Use browser
    state = await browser_session.get_browser_state_summary()
    # Use file system
    file_system.write_file('output.txt', 'data')
    # Use extraction LLM
    response = await page_extraction_llm.ainvoke([...])
    return ActionResult(extracted_content='Result')
Domain-Restricted Tools
@tools.action(
    'Login to example.com with credentials',
    allowed_domains=['*.example.com', 'auth.example.com']
)
async def login(username: str, password: str, browser_session: BrowserSession):
    """Only callable on example.com domains."""
    page = await browser_session.get_current_page()
    # ... login logic
    return ActionResult(extracted_content='Logged in successfully')
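To make the effect of allowed_domains patterns concrete, here is a minimal sketch of glob-style host matching using the standard library. It is not the library's matcher: is_allowed is a hypothetical helper, and the real matching rules (for example, whether '*.example.com' also covers the bare 'example.com') may differ.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Check the URL's hostname against glob-style domain patterns."""
    host = urlparse(url).hostname or ""
    return any(fnmatch(host, pattern) for pattern in allowed_domains)


patterns = ["*.example.com", "auth.example.com"]
on_subdomain = is_allowed("https://auth.example.com/login", patterns)
on_other_site = is_allowed("https://evil.com/?next=.example.com", patterns)
```

Matching against the parsed hostname rather than the raw URL string avoids being fooled by a domain name that only appears in the path or query.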
ActionResult Response
The ActionResult class structures tool responses:
from browser_use.agent.views import ActionResult
# Simple success
ActionResult(extracted_content="Found 5 products")
# With error
ActionResult(error="Failed to load page: timeout")
# Task completion
ActionResult(
    extracted_content="Task completed",
    is_done=True,
    success=True,
    attachments=["report.pdf", "data.csv"]
)
# With metadata
ActionResult(
    extracted_content="Clicked button",
    metadata={'click_x': 500, 'click_y': 300}
)
# With memory
ActionResult(
    extracted_content="Long result text...",
    long_term_memory="Short summary for agent memory",
    include_extracted_content_only_once=True
)
ActionResult Fields:
extracted_content: Main result text (shown to agent)
error: Error message if action failed
is_done: Mark task as complete
success: Whether task succeeded (for done action)
attachments: List of file paths
metadata: Additional structured data
long_term_memory: Summary for agent’s memory
include_extracted_content_only_once: Show full content once, use memory after
Parameter Injection System
The tools system automatically injects context based on parameter names:
# Parameter name -> Injected value
browser_session: BrowserSession → Current browser session
file_system: FileSystem → Agent's file system
page_extraction_llm: BaseChatModel → Extraction LLM
available_file_paths: list[str] → Available files
has_sensitive_data: bool → Whether sensitive data exists
sensitive_data: dict → Sensitive data mapping
extraction_schema: dict → Structured output schema
The injection happens at execution time, not registration. You don’t need to pass these values when registering tools.
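Name-based injection at execution time can be illustrated with inspect.signature. The sketch below is a simplified, synchronous stand-in, not the library's implementation: call_with_injection is a hypothetical helper, and plain strings stand in for the real BrowserSession and FileSystem objects.

```python
import inspect
from typing import Any, Callable


def call_with_injection(
    func: Callable[..., Any],
    llm_params: dict[str, Any],
    context: dict[str, Any],
) -> Any:
    """Call func with LLM-supplied params plus any context values it names."""
    kwargs = dict(llm_params)
    for name in inspect.signature(func).parameters:
        if name in context:
            # Matching is purely by parameter name; context values take priority
            kwargs[name] = context[name]
    return func(**kwargs)


def get_title(prefix: str, browser_session: str) -> str:
    return f"{prefix}: {browser_session}"


context = {"browser_session": "session-1", "file_system": "workspace"}
result = call_with_injection(get_title, {"prefix": "Title"}, context)
```

Only browser_session is passed to get_title here; file_system is ignored because the function does not declare a parameter with that name.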
The Registry class manages action registration:
from browser_use.tools.registry.service import Registry
registry = Registry()
# Register action
@registry.action(
    description='My custom action',
    param_model=MyParamModel,
    terminates_sequence=False,  # Whether action ends multi-action sequence
)
async def my_action(params: MyParamModel) -> ActionResult:
    return ActionResult(extracted_content='Done')
# Get prompt description (for LLM)
prompt = registry.get_prompt_description(url='https://example.com')
# Create dynamic action model
ActionModel = registry.create_action_model(
    include_actions=['click', 'input', 'done'],
    exclude_actions=['search']
)
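For intuition about what a decorator-based registry with exclusions does, here is a toy version built from scratch. MiniRegistry is a hypothetical class written for this illustration; it shares only the general idea with the library's Registry and omits parameter models, domain filtering, and async execution.

```python
from typing import Callable, Optional


class MiniRegistry:
    """Toy action registry for illustration; not browser-use's Registry."""

    def __init__(self, exclude_actions: Optional[list[str]] = None) -> None:
        self.exclude_actions = set(exclude_actions or [])
        self.actions: dict[str, tuple[str, Callable]] = {}

    def action(self, description: str) -> Callable[[Callable], Callable]:
        def decorator(func: Callable) -> Callable:
            # Excluded actions are never registered, so they stay hidden from the LLM
            if func.__name__ not in self.exclude_actions:
                self.actions[func.__name__] = (description, func)
            return func
        return decorator

    def get_prompt_description(self) -> str:
        return "\n".join(f"{name}: {desc}" for name, (desc, _) in self.actions.items())


registry = MiniRegistry(exclude_actions=["wait"])


@registry.action("Search the web")
def search(query: str) -> str:
    return f"Searched for '{query}'"


@registry.action("Wait for a few seconds")
def wait(seconds: int) -> str:
    return f"Waited {seconds}s"
```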
Excluding Default Actions
Remove actions you don’t need:
tools = Tools(exclude_actions=[
    'search',       # Don't allow web searches
    'wait',         # Don't allow waiting
    'screenshot',   # Don't allow screenshot requests
    'upload_file',  # Don't allow file uploads
])
Common exclusions:
screenshot: When use_vision != 'auto'
search: For domain-restricted tasks
upload_file: For read-only tasks
Coordinate Clicking
Enable coordinate-based clicking for advanced models:
tools = Tools()
tools.set_coordinate_clicking(True)
# Now agent can click by pixel coordinates
# Automatically enabled for: Claude Sonnet 4, Claude Opus 4, Gemini 3 Pro, Browser Use models
When enabled, the click action accepts coordinates:
{
  "click": {
    "coordinate_x": 500,
    "coordinate_y": 300
  }
}
Structured Output Integration
Define expected output format:
from pydantic import BaseModel
class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str
class SearchResults(BaseModel):
    results: list[SearchResult]
    total_count: int
tools = Tools(output_model=SearchResults)
# Agent now has a 'structured_output' action
# LLM will call it when task is complete:
# {
#   "structured_output": {
#     "results": [...],
#     "total_count": 10
#   }
# }
Real-World Examples
Human-in-the-Loop
from browser_use import Tools, Agent, ChatBrowserUse, ActionResult
tools = Tools()
@tools.action('Ask human for 2FA code')
async def get_2fa_code() -> ActionResult:
    """Get 2FA code from user."""
    code = input('Enter 2FA code: ')
    return ActionResult(extracted_content=f'2FA code: {code}')
agent = Agent(
    task="""
    1. Go to example.com/login
    2. Enter username and password
    3. Use get_2fa_code action to get 2FA code
    4. Complete login
    """,
    llm=ChatBrowserUse(),
    tools=tools,
)
API Integration
import httpx
@tools.action('Fetch product data from external API')
async def fetch_product_data(product_id: str) -> ActionResult:
    """Call external API for product information."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f'https://api.example.com/products/{product_id}'
        )
        data = response.json()
    return ActionResult(
        extracted_content=f"Product: {data['name']}, Price: ${data['price']}"
    )
Database Access
import asyncpg
@tools.action('Save scraped data to database')
async def save_to_db(
    product_name: str,
    price: float,
    rating: float
) -> ActionResult:
    """Store extracted data in PostgreSQL."""
    conn = await asyncpg.connect('postgresql://localhost/mydb')
    try:
        await conn.execute(
            'INSERT INTO products (name, price, rating) VALUES ($1, $2, $3)',
            product_name, price, rating
        )
    finally:
        # Close the connection even if the insert fails
        await conn.close()
    return ActionResult(extracted_content='Saved to database')
Deterministic Automation
from browser_use import BrowserSession, ActionResult
@tools.action('Fill form deterministically with Playwright-like API')
async def fill_complex_form(
    browser_session: BrowserSession
) -> ActionResult:
    """Use Actor API for precise form filling."""
    page = await browser_session.get_current_page()
    # Deterministic element selection and interaction
    await page.click('button[data-test="login"]')
    await page.wait_for_selector('input[name="email"]')
    await page.fill('input[name="email"]', 'user@example.com')
    await page.fill('input[name="password"]', 'secret123')
    await page.click('button[type="submit"]')
    # Wait for navigation
    await page.wait_for_url('**/dashboard')
    return ActionResult(extracted_content='Form filled and submitted')
Troubleshooting
Parameter injection fails
Verify:
Parameter name matches exactly (e.g., browser_session)
Type hint is correct (e.g., BrowserSession)
Parameter is available in current context
Ensure:
Return type is ActionResult or str
Don't raise exceptions; return ActionResult(error='…') instead
Use proper field names (extracted_content, not content)
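The "return an error instead of raising" advice can be shown with a small self-contained sketch. The Result dataclass below is a hypothetical stand-in that mirrors only the extracted_content and error fields of ActionResult, and parse_price is an invented example tool body.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Hypothetical stand-in for ActionResult with only two of its fields."""
    extracted_content: Optional[str] = None
    error: Optional[str] = None


def parse_price(raw: str) -> Result:
    """Convert a price string, reporting failures through the error field."""
    try:
        price = float(raw.replace("$", "").replace(",", ""))
    except ValueError as exc:
        # Report the failure as data so the agent can read it and react
        return Result(error=f"Could not parse price {raw!r}: {exc}")
    return Result(extracted_content=f"Price: {price:.2f}")


ok = parse_price("$1,299.50")
bad = parse_price("n/a")
```

In a real tool the same try/except shape would wrap the browser or network call and return ActionResult(error=...) from the except branch.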
Next Steps
Available Tools Complete list of default actions
Add Custom Tools Detailed guide to creating tools
Tool Response Advanced ActionResult patterns
Actor API Playwright-like browser control