The Agent class is the main entry point for Browser Use. It orchestrates the LLM, browser interactions, and tools to complete tasks autonomously.

Constructor

from browser_use import Agent, ChatBrowserUse, Browser

agent = Agent(
    task="Find the latest news about AI",
    llm=ChatBrowserUse(),
    browser=Browser(headless=False),
)

Parameters

task
str
required
The task description for the agent to complete.
llm
BaseChatModel
Language model instance. Optional: defaults to ChatBrowserUse() if not provided. See Supported Models.
browser
Browser | None
Browser instance to use. If not provided, a new browser will be created with default settings.
browser_session
BrowserSession | None
Deprecated: Use browser parameter instead. Alias for backward compatibility.
tools
Tools[Context] | None
Registry of tools (actions) the agent can use. If not provided, default tools are loaded. See Tools.
controller
Tools[Context] | None
Deprecated: Use tools parameter instead. Alias for backward compatibility.

LLM Configuration

page_extraction_llm
BaseChatModel | None
Separate LLM for page content extraction. Use a smaller/faster model for efficiency. Defaults to main llm.
judge_llm
BaseChatModel | None
LLM for judging agent trace quality. Defaults to main llm.
fallback_llm
BaseChatModel | None
Fallback LLM to use if the primary LLM fails.

Vision & Screenshots

use_vision
bool | Literal['auto']
default:"True"
Vision mode:
  • True: Always include screenshots in LLM context
  • False: Never include screenshots; the screenshot tool is also excluded
  • 'auto': Include screenshot tool but only use vision when requested
vision_detail_level
Literal['auto', 'low', 'high']
default:"'auto'"
Screenshot detail level for vision models.
llm_screenshot_size
tuple[int, int] | None
Target size (width, height) to resize screenshots before sending to LLM. Coordinates from LLM are automatically scaled back to original viewport size.
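
The coordinate scaling described for llm_screenshot_size is plain proportional arithmetic. The sketch below only illustrates that arithmetic; scale_to_viewport is a hypothetical helper written for this example, not a function from the library, and the library's own rounding may differ.

```python
def scale_to_viewport(point, llm_size, viewport_size):
    """Map an (x, y) point on the resized screenshot back to viewport pixels."""
    x, y = point
    llm_w, llm_h = llm_size
    view_w, view_h = viewport_size
    # Scale each axis by the ratio between the viewport and the resized image.
    return (round(x * view_w / llm_w), round(y * view_h / llm_h))


# A point at the center of a 1400x850 screenshot maps to the center
# of a 1920x1080 viewport.
print(scale_to_viewport((700, 425), (1400, 850), (1920, 1080)))  # (960, 540)
```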

Skills Integration

skills
list[str | Literal['*']] | None
List of skill IDs to enable, or ['*'] for all skills. Skills are pre-built actions from the cloud.
skill_ids
list[str | Literal['*']] | None
Deprecated: Use skills parameter instead. Alias for backward compatibility.
skill_service
Any | None
Pre-configured skill service instance for advanced use cases.

Actions & Behavior

initial_actions
list[dict[str, dict[str, Any]]] | None
List of actions to execute before starting the main task (without LLM). Format: [{'action_name': {'param': value}}]
max_actions_per_step
int
default:"5"
Maximum number of actions the agent can output in a single step (e.g., several inputs when filling a form).
max_failures
int
default:"5"
Maximum consecutive failures before stopping.
final_response_after_failure
bool
default:"True"
If True, agent attempts one final recovery call after reaching max_failures.
use_thinking
bool
default:"True"
Enable explicit reasoning steps in agent output.
flash_mode
bool
default:"False"
Fast mode that skips evaluation, planning, and thinking. Overrides use_thinking and enable_planning when enabled.
directly_open_url
bool
default:"True"
If True, automatically navigate to URLs detected in the task.
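
As a rough illustration of the initial_actions format, the sketch below builds a list in the documented [{'action_name': {'param': value}}] shape and checks it before it is handed to the agent. The action names ("navigate", "wait") and their parameters are assumptions made for this example; check the Tools page for the actions actually registered. check_initial_actions is a hypothetical helper, and it assumes each entry holds exactly one action.

```python
from typing import Any


def check_initial_actions(actions: list[dict[str, dict[str, Any]]]) -> list[str]:
    """Validate the documented shape and return the action names in order."""
    names = []
    for i, entry in enumerate(actions):
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"entry {i} must be a dict with exactly one action name")
        name, params = next(iter(entry.items()))
        if not isinstance(params, dict):
            raise ValueError(f"parameters for {name!r} must be a dict")
        names.append(name)
    return names


initial_actions = [
    {"navigate": {"url": "https://example.com"}},  # assumed action name and parameter
    {"wait": {"seconds": 2}},                      # assumed action name and parameter
]
print(check_initial_actions(initial_actions))  # ['navigate', 'wait']
```

The validated list would then be passed as Agent(..., initial_actions=initial_actions).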

Planning

enable_planning
bool
default:"True"
Enable agent planning with step-by-step todo items.
planning_replan_on_stall
int
default:"3"
Number of consecutive failures before suggesting plan revision. Set to 0 to disable.
planning_exploration_limit
int
default:"5"
Number of steps without a plan before nudging agent to create one. Set to 0 to disable.

Loop Detection

loop_detection_enabled
bool
default:"True"
Enable detection of repetitive action patterns.
loop_detection_window
int
default:"20"
Rolling window size for tracking action similarity.
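
To make the idea of a rolling window concrete, here is a minimal sketch of how repeated actions could be tracked. It is not the library's implementation: RepeatTracker, the threshold parameter, and the use of exact equality as a stand-in for "similarity" are all assumptions made for illustration.

```python
import json
from collections import Counter, deque


class RepeatTracker:
    """Flag an action once it appears `threshold` times within the last `window` actions."""

    def __init__(self, window: int = 20, threshold: int = 3):
        self.recent = deque(maxlen=window)  # oldest entries fall out automatically
        self.threshold = threshold

    def record(self, action_name: str, params: dict) -> bool:
        # Serialize the parameters so identical actions compare equal.
        signature = (action_name, json.dumps(params, sort_keys=True))
        self.recent.append(signature)
        return Counter(self.recent)[signature] >= self.threshold


tracker = RepeatTracker(window=20, threshold=3)
flags = [tracker.record("click", {"index": 7}) for _ in range(3)]
print(flags)  # [False, False, True]
```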

System Messages

override_system_message
str | None
Completely replace the default system prompt.
extend_system_message
str | None
Add additional instructions to the default system prompt.

File & Data Management

save_conversation_path
str | Path | None
Directory path to save conversation history.
save_conversation_path_encoding
str
default:"'utf-8'"
Encoding for saved conversations.
available_file_paths
list[str] | None
List of file paths the agent can access for upload actions.
file_system_path
str | None
Path for agent’s file system operations.
display_files_in_done_text
bool
default:"True"
Show file information in completion messages.
sensitive_data
dict[str, str | dict[str, str]] | None
Dictionary of sensitive data to handle securely. Format: {key: value} or {domain: {key: value}}.
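
The two sensitive_data formats can be easier to compare side by side. The sketch below shows one dictionary in each shape and a hypothetical helper, secret_names, that lists the key names without ever printing the values. The key names, values, and domain are placeholders invented for this example.

```python
def secret_names(sensitive_data):
    """Return the sorted key names from either documented format."""
    names = set()
    for key, value in sensitive_data.items():
        if isinstance(value, dict):
            names.update(value)  # {domain: {key: value}} format
        else:
            names.add(key)       # {key: value} format
    return sorted(names)


# Flat format: {key: value}
flat = {"x_user": "alice", "x_pass": "placeholder-password"}

# Domain-scoped format: {domain: {key: value}}
scoped = {"https://example.com": {"x_user": "alice", "x_pass": "placeholder-password"}}

print(secret_names(flat))    # ['x_pass', 'x_user']
print(secret_names(scoped))  # ['x_pass', 'x_user']
```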

Output Format

output_model_schema
type[AgentStructuredOutput] | None
Pydantic model class for structured output validation. See Custom Output.
extraction_schema
dict | None
JSON schema for data extraction. Auto-detected from output_model_schema if not provided.
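
For extraction_schema, a plain JSON Schema dictionary is the natural fit for the documented dict type. The sketch below is one possible schema; the field names (headline, url) are invented for illustration and are not prescribed by the library.

```python
import json

# A small JSON Schema describing one extracted news item.
extraction_schema = {
    "type": "object",
    "properties": {
        "headline": {"type": "string"},
        "url": {"type": "string"},
    },
    "required": ["headline", "url"],
}

# The schema is ordinary JSON-serializable data.
print(json.dumps(extraction_schema, indent=2))
```

It would then be passed as Agent(..., extraction_schema=extraction_schema).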

Visual Output

generate_gif
bool | str
default:"False"
Generate a GIF of the agent's actions. Set to True, or pass a file path string to control where the GIF is saved.
include_attributes
list[str] | None
List of HTML attributes to include in DOM analysis.

Performance & Limits

max_history_items
int | None
Maximum number of recent steps to keep in LLM memory. None keeps all steps.
llm_timeout
int
default:"90"
Timeout in seconds for LLM calls. Auto-detected based on model.
step_timeout
int
default:"180"
Timeout in seconds for each agent step.
message_compaction
MessageCompactionSettings | bool | None
default:"True"
Compact old messages to reduce prompt size. Set to False to disable or provide MessageCompactionSettings for custom configuration.
max_clickable_elements_length
int
default:"40000"
Maximum characters for clickable elements in prompt.

Judge & Validation

use_judge
bool
default:"True"
Enable post-execution judge to validate task completion.
ground_truth
str | None
Ground truth answer for judge validation.

Cloud Callbacks

register_new_step_callback
Callable | None
Callback function called after each step. Signature: (BrowserStateSummary, AgentOutput, int) -> None | Awaitable[None]
register_done_callback
Callable | None
Callback function called when agent completes. Signature: (AgentHistoryList) -> None | Awaitable[None]
register_should_stop_callback
Callable[[], Awaitable[bool]] | None
Callback to check if agent should stop. Returns True to stop.
register_external_agent_status_raise_error_callback
Callable[[], Awaitable[bool]] | None
Callback to check external agent status. Raises InterruptedError if the callback returns True.
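
The callback signatures above can be satisfied with ordinary functions and coroutines. The sketch below groups three of them in a hypothetical RunMonitor class that stops the agent after a step budget. The class, its attribute names, and the budget logic are assumptions for illustration; the argument types are annotated as Any so the example does not depend on library imports.

```python
import asyncio
from typing import Any


class RunMonitor:
    """Collects step numbers and asks the agent to stop after a step budget."""

    def __init__(self, step_budget: int):
        self.step_budget = step_budget
        self.steps_seen: list[int] = []
        self.finished = False

    # Matches (BrowserStateSummary, AgentOutput, int) -> None
    def on_step(self, browser_state: Any, agent_output: Any, step: int) -> None:
        self.steps_seen.append(step)

    # Matches Callable[[], Awaitable[bool]]; returning True requests a stop.
    async def should_stop(self) -> bool:
        return len(self.steps_seen) >= self.step_budget

    # Matches (AgentHistoryList) -> None
    def on_done(self, history: Any) -> None:
        self.finished = True


monitor = RunMonitor(step_budget=25)
# The bound methods would be passed to the constructor, for example:
# Agent(task=..., register_new_step_callback=monitor.on_step,
#       register_should_stop_callback=monitor.should_stop,
#       register_done_callback=monitor.on_done)
```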

Advanced Options

calculate_cost
bool
default:"False"
Calculate and track API token costs.
include_tool_call_examples
bool
default:"False"
Include tool usage examples in system prompt.
include_recent_events
bool
default:"False"
Include recent browser events in context.
sample_images
list[ContentPartTextParam | ContentPartImageParam] | None
Sample images to include in prompts for vision models.
demo_mode
bool | None
Enable demo mode with browser overlay UI.
task_id
str | None
Custom task ID. Auto-generated if not provided.
injected_agent_state
AgentState | None
Pre-existing agent state for resuming sessions.
source
str | None
Source identifier for telemetry.

Methods

run()

Execute the agent to complete the task.
history = await agent.run(max_steps=100)
max_steps
int
default:"100"
Maximum number of steps the agent can take.
return
AgentHistoryList
Complete execution history with results, screenshots, and metadata.

step()

Execute a single step of the task.
await agent.step()
step_info
AgentStepInfo | None
Optional step information including step number and max steps.

add_new_task()

Add a follow-up task to the agent.
agent.add_new_task("Now search for Python tutorials")
new_task
str
required
The new task description.
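
A follow-up task is typically combined with a second call to run(). The sketch below chains the two using only the calls documented on this page; run_with_follow_up is a hypothetical wrapper, and whether the browser stays open between the two runs depends on the browser configuration, which this sketch does not cover.

```python
import asyncio


async def run_with_follow_up(first_task: str, follow_up: str):
    # Imported inside the function so this module still loads where
    # browser_use is not installed.
    from browser_use import Agent, ChatBrowserUse

    agent = Agent(task=first_task, llm=ChatBrowserUse())
    await agent.run(max_steps=50)            # complete the first task

    agent.add_new_task(follow_up)            # queue the follow-up on the same agent
    history = await agent.run(max_steps=50)  # run again to work on the new task
    return history.final_result()


# Example invocation (requires browser_use and a working model configuration):
# asyncio.run(run_with_follow_up("Find the latest news about AI",
#                                "Now search for Python tutorials"))
```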

stop()

Stop the agent execution gracefully.
await agent.stop()

kill()

Force-stop the agent and clean up resources.
await agent.kill()

Properties

state

state
AgentState
Current agent state including step counter, failures, and internal state.

history

history
AgentHistoryList
Complete history of agent actions and results.

browser_session

browser_session
BrowserSession
The browser session instance being used.

tools

tools
Tools[Context]
The tools registry containing all available actions.

settings

settings
AgentSettings
Agent configuration settings.

Example Usage

import asyncio
from browser_use import Agent, Browser, ChatBrowserUse

async def main():
    # Create agent with custom configuration
    agent = Agent(
        task="Research AI news and save to file",
        llm=ChatBrowserUse(),
        browser=Browser(headless=False),
        max_actions_per_step=10,
        use_vision=True,
    )
    
    # Run the agent
    history = await agent.run(max_steps=50)
    
    # Check results
    if history.is_done():
        print(f"Task completed: {history.final_result()}")
        print(f"Success: {history.is_successful()}")
    
    # Access history
    print(f"URLs visited: {history.urls()}")
    print(f"Actions taken: {history.action_names()}")

if __name__ == "__main__":
    asyncio.run(main())
