Agent

The Agent class is the main entry point for Browser Use. It orchestrates the LLM, browser interactions, and tools to complete tasks autonomously.

Constructor

from browser_use import Agent, ChatBrowserUse, Browser

agent = Agent(
    task="Find the latest news about AI",
    llm=ChatBrowserUse(),
    browser=Browser(headless=False),
)

Parameters

task

str

required

The task description for the agent to complete.

llm

BaseChatModel | None

Language model instance. Defaults to ChatBrowserUse() if not provided. See Supported Models.

browser

Browser | None

Browser instance to use. If not provided, a new browser will be created with default settings.

browser_session

BrowserSession | None

Deprecated: Use browser parameter instead. Alias for backward compatibility.

tools

Tools[Context] | None

Registry of tools (actions) the agent can use. If not provided, default tools are loaded. See Tools.

controller

Tools[Context] | None

Deprecated: Use tools parameter instead. Alias for backward compatibility.

LLM Configuration

page_extraction_llm

BaseChatModel | None

Separate LLM for page content extraction. A smaller, faster model here can reduce cost and latency. Defaults to the main llm.

judge_llm

BaseChatModel | None

LLM for judging agent trace quality. Defaults to main llm.

fallback_llm

BaseChatModel | None

Fallback LLM to use if the primary LLM fails.

Vision & Screenshots

use_vision

bool | Literal['auto']

default:"True"

Vision mode:

True: Always include screenshots in the LLM context.
False: Never include screenshots, and exclude the screenshot tool.
'auto': Include the screenshot tool, but only use vision when requested.

vision_detail_level

Literal['auto', 'low', 'high']

default:"'auto'"

Screenshot detail level for vision models.

llm_screenshot_size

tuple[int, int] | None

Target size (width, height) to resize screenshots before sending to LLM. Coordinates from LLM are automatically scaled back to original viewport size.
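The scaling back to viewport coordinates can be pictured as a proportional mapping per axis. The sketch below is illustrative only: `scale_to_viewport` is a hypothetical helper, not part of Browser Use, and the library's actual rounding may differ.

```python
def scale_to_viewport(
    point: tuple[int, int],
    screenshot_size: tuple[int, int],
    viewport_size: tuple[int, int],
) -> tuple[int, int]:
    """Map a point on the resized screenshot back to viewport pixels."""
    x, y = point
    shot_w, shot_h = screenshot_size
    view_w, view_h = viewport_size
    # Scale each axis independently by the viewport/screenshot ratio.
    return round(x * view_w / shot_w), round(y * view_h / shot_h)


# The centre of a 1400x850 screenshot maps to the centre of a 1920x1080 viewport.
print(scale_to_viewport((700, 425), (1400, 850), (1920, 1080)))  # (960, 540)
```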

Skills Integration

skills

list[str | Literal['*']] | None

List of skill IDs to enable, or ['*'] for all skills. Skills are pre-built actions from the cloud.

skill_ids

list[str | Literal['*']] | None

Deprecated: Use skills parameter instead. Alias for backward compatibility.

skill_service

Any | None

Pre-configured skill service instance for advanced use cases.

Actions & Behavior

initial_actions

list[dict[str, dict[str, Any]]] | None

List of actions to execute before starting the main task (without LLM). Format: [{'action_name': {'param': value}}]
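A minimal sketch of the documented format, assuming one action per list entry. The action names (`navigate`, `scroll`) and their parameters are placeholders chosen for illustration; check the Tools registry for the names available in your installation. `check_initial_actions` is a hypothetical helper, not a library function.

```python
from typing import Any

# Each entry is a single-key dict: {action_name: {param: value}}.
# Action names and parameters below are placeholders, not confirmed names.
initial_actions: list[dict[str, dict[str, Any]]] = [
    {"navigate": {"url": "https://example.com"}},
    {"scroll": {"down": True}},
]


def check_initial_actions(actions: list[dict[str, dict[str, Any]]]) -> None:
    """Raise ValueError if an entry does not match the documented shape."""
    for index, entry in enumerate(actions):
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"entry {index} must be a dict with exactly one action name")
        (name, params), = entry.items()
        if not isinstance(name, str) or not isinstance(params, dict):
            raise ValueError(f"entry {index} must map an action name to a params dict")


check_initial_actions(initial_actions)
```

The validated list would then be passed as `Agent(..., initial_actions=initial_actions)`.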

max_actions_per_step

int

default:"5"

Maximum actions the agent can output per step (e.g., for form filling).

max_failures

int

default:"5"

Maximum consecutive failures before stopping.

final_response_after_failure

bool

default:"True"

If True, agent attempts one final recovery call after reaching max_failures.
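"Consecutive" means the count resets whenever a step succeeds. The sketch below illustrates that counting rule only; `hits_failure_limit` is a hypothetical helper, and whether the library stops at exactly `max_failures` or one past it is an assumption here.

```python
def hits_failure_limit(outcomes: list[bool], max_failures: int = 5) -> bool:
    """Return True once `max_failures` failures occur in a row.

    `outcomes` holds one bool per step: True for success, False for failure.
    """
    consecutive = 0
    for ok in outcomes:
        # A success resets the streak; a failure extends it.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= max_failures:
            return True
    return False
```

Under this rule, four failures, one success, then four more failures never reach a limit of 5.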

use_thinking

bool

default:"True"

Enable explicit reasoning steps in agent output.

flash_mode

bool

default:"False"

Fast mode that skips evaluation, planning, and thinking. Overrides use_thinking and enable_planning when enabled.
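The override can be read as follows: when flash_mode is on, the values passed for use_thinking and enable_planning no longer matter. This is a sketch of that precedence rule, not the library's code; `resolve_reasoning_flags` is a hypothetical name.

```python
def resolve_reasoning_flags(
    flash_mode: bool = False,
    use_thinking: bool = True,
    enable_planning: bool = True,
) -> dict[str, bool]:
    """Return the flags that take effect after applying the flash_mode override."""
    if flash_mode:
        # flash_mode skips thinking and planning regardless of the other flags.
        return {"use_thinking": False, "enable_planning": False}
    return {"use_thinking": use_thinking, "enable_planning": enable_planning}
```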

directly_open_url

bool

default:"True"

If True, automatically navigate to URLs detected in the task.

Planning

enable_planning

bool

default:"True"

Enable agent planning with step-by-step todo items.

planning_replan_on_stall

int

default:"3"

Number of consecutive failures before suggesting plan revision. Set to 0 to disable.

planning_exploration_limit

int

default:"5"

Number of steps without a plan before nudging agent to create one. Set to 0 to disable.

Loop Detection

loop_detection_enabled

bool

default:"True"

Enable detection of repetitive action patterns.

loop_detection_window

int

default:"20"

Rolling window size for tracking action similarity.
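To show what a rolling window means here, the sketch below keeps the last `window` actions and flags an action that appears too often within them. It is not the library's algorithm: it uses exact string matching rather than similarity, and both `RepeatDetector` and its `threshold` parameter are hypothetical.

```python
from collections import deque


class RepeatDetector:
    """Flag an action that recurs too often within a rolling window."""

    def __init__(self, window: int = 20, threshold: int = 5) -> None:
        # deque(maxlen=...) drops the oldest entry automatically when full.
        self._recent: deque[str] = deque(maxlen=window)
        self._threshold = threshold

    def record(self, action: str) -> bool:
        """Store the action and report whether it now repeats too often."""
        self._recent.append(action)
        return self._recent.count(action) >= self._threshold
```

A larger window catches slow, spread-out repetition at the cost of reacting to actions that are legitimately repeated over a long task.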

System Messages

override_system_message

str | None

Completely replace the default system prompt.

extend_system_message

str | None

Add additional instructions to the default system prompt.

File & Data Management

save_conversation_path

str | Path | None

Directory path to save conversation history.

save_conversation_path_encoding

str

default:"'utf-8'"

Encoding for saved conversations.

available_file_paths

list[str] | None

List of file paths the agent can access for upload actions.

file_system_path

str | None

Path for agent’s file system operations.

display_files_in_done_text

bool

default:"True"

Show file information in completion messages.

sensitive_data

dict[str, str | dict[str, str]] | None

Dictionary of sensitive data to handle securely. Format: {key: value} or {domain: {key: value}}.
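The two accepted shapes look like this. The key names, the domain string, and the values are placeholders, and `is_domain_scoped` and `placeholder_keys` are hypothetical helpers written for illustration, not library functions.

```python
from typing import Union

SensitiveData = dict[str, Union[str, dict[str, str]]]

# Flat form: {key: value}
flat: SensitiveData = {"x_user": "alice", "x_pass": "example-password"}

# Domain-scoped form: {domain: {key: value}}
scoped: SensitiveData = {
    "https://example.com": {"x_user": "alice", "x_pass": "example-password"},
}


def is_domain_scoped(sensitive_data: SensitiveData) -> bool:
    """True when every top-level value is itself a {key: value} dict."""
    return all(isinstance(value, dict) for value in sensitive_data.values())


def placeholder_keys(sensitive_data: SensitiveData) -> set[str]:
    """Collect the key names from either form, never the secret values."""
    keys: set[str] = set()
    for name, value in sensitive_data.items():
        if isinstance(value, dict):
            keys.update(value)  # domain-scoped entry: take the inner keys
        else:
            keys.add(name)      # flat entry: the top-level key itself
    return keys
```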

Output Format

output_model_schema

type[AgentStructuredOutput] | None

Pydantic model class for structured output validation. See Custom Output.

extraction_schema

dict | None

JSON schema for data extraction. Auto-detected from output_model_schema if not provided.

Visual Output

generate_gif

bool | str

default:"False"

Generate a GIF of the agent's actions. Pass True to enable it, or a file path string to choose where the GIF is saved.

include_attributes

list[str] | None

List of HTML attributes to include in DOM analysis.

Performance & Limits

max_history_items

int | None

Maximum number of recent steps to keep in LLM memory. None keeps all steps.
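The effect of the limit can be sketched as a simple tail slice. This is a simplification under stated assumptions: `recent_steps` is a hypothetical helper, and the library may retain additional context (for example, the initial task) beyond the most recent steps.

```python
from typing import Optional


def recent_steps(steps: list[str], max_history_items: Optional[int]) -> list[str]:
    """Return the steps that would stay in LLM memory under the limit."""
    if max_history_items is None:
        return list(steps)  # None keeps every step
    if max_history_items <= 0:
        return []           # guard: steps[-0:] would return the whole list
    return steps[-max_history_items:]
```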

llm_timeout

int

default:"90"

Timeout in seconds for LLM calls. When not set explicitly, the value is auto-detected based on the model.

step_timeout

int

default:"180"

Timeout in seconds for each agent step.
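A per-step timeout behaves like wrapping each step in `asyncio.wait_for`. The sketch below shows that pattern with stand-in coroutines; `run_step_with_timeout`, `fast_step`, and `slow_step` are hypothetical names, and this is not a claim about how the library implements its timeout internally.

```python
import asyncio
from typing import Awaitable, Callable


async def run_step_with_timeout(
    step: Callable[[], Awaitable[None]], timeout: float
) -> str:
    """Run one step, cancelling it if it exceeds `timeout` seconds."""
    try:
        await asyncio.wait_for(step(), timeout=timeout)
        return "completed"
    except asyncio.TimeoutError:
        return "timed out"


async def fast_step() -> None:
    await asyncio.sleep(0)    # finishes immediately


async def slow_step() -> None:
    await asyncio.sleep(0.2)  # stands in for a step that hangs
```

A step_timeout lower than llm_timeout would cut off a step before its LLM call could time out on its own, so the step limit is normally the larger of the two.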

message_compaction

MessageCompactionSettings | bool | None

default:"True"

Compact old messages to reduce prompt size. Set to False to disable or provide MessageCompactionSettings for custom configuration.

max_clickable_elements_length

int

default:"40000"

Maximum characters for clickable elements in prompt.

Judge & Validation

use_judge

bool

default:"True"

Enable post-execution judge to validate task completion.

ground_truth

str | None

Ground truth answer for judge validation.

Cloud Callbacks

register_new_step_callback

Callable | None

Callback function called after each step. Signature: (BrowserStateSummary, AgentOutput, int) -> None | Awaitable[None]

register_done_callback

Callable | None

Callback function called when agent completes. Signature: (AgentHistoryList) -> None | Awaitable[None]

register_should_stop_callback

Callable[[], Awaitable[bool]] | None

Callback to check if agent should stop. Returns True to stop.

register_external_agent_status_raise_error_callback

Callable[[], Awaitable[bool]] | None

Callback to check external agent status. Raises InterruptedError if it returns True.
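The callbacks above might be written as follows. This is a sketch that follows the documented signatures; the state and output arguments are typed as `Any` to avoid assuming import paths, and `on_new_step`, `make_deadline_check`, and `step_log` are hypothetical names.

```python
import time
from typing import Any, Awaitable, Callable

step_log: list[int] = []


async def on_new_step(browser_state: Any, agent_output: Any, step_number: int) -> None:
    """Matches (BrowserStateSummary, AgentOutput, int) -> Awaitable[None]."""
    step_log.append(step_number)


def make_deadline_check(seconds: float) -> Callable[[], Awaitable[bool]]:
    """Build a should-stop callback that returns True once the deadline passes."""
    deadline = time.monotonic() + seconds

    async def should_stop() -> bool:
        return time.monotonic() >= deadline

    return should_stop


# Passed to the constructor by the documented parameter names:
# Agent(
#     ...,
#     register_new_step_callback=on_new_step,
#     register_should_stop_callback=make_deadline_check(300),
# )
```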

Advanced Options

calculate_cost

bool

default:"False"

Calculate and track API token costs.

include_tool_call_examples

bool

default:"False"

Include tool usage examples in system prompt.

include_recent_events

bool

default:"False"

Include recent browser events in context.

sample_images

list[ContentPartTextParam | ContentPartImageParam] | None

Sample images to include in prompts for vision models.

demo_mode

bool | None

Enable demo mode with browser overlay UI.

task_id

str | None

Custom task ID. Auto-generated if not provided.

injected_agent_state

AgentState | None

Pre-existing agent state for resuming sessions.

source

str | None

Source identifier for telemetry.

Methods

run()

Execute the agent to complete the task.

history = await agent.run(max_steps=100)

max_steps

int

default:"100"

Maximum number of steps the agent can take.

return

AgentHistoryList

Complete execution history with results, screenshots, and metadata.

step()

Execute a single step of the task.

await agent.step()

step_info

AgentStepInfo | None

Optional step information including step number and max steps.

add_new_task()

Add a follow-up task to the agent.

agent.add_new_task("Now search for Python tutorials")

new_task

str

required

The new task description.

stop()

Stop the agent execution gracefully.

await agent.stop()

kill()

Force-stop the agent and clean up resources.

await agent.kill()

Properties

state

AgentState

Current agent state including step counter, failures, and internal state.

history

AgentHistoryList

Complete history of agent actions and results.

browser_session

BrowserSession

The browser session instance being used.

tools

Tools[Context]

The tools registry containing all available actions.

settings

AgentSettings

Agent configuration settings.

Example Usage

import asyncio
from browser_use import Agent, Browser, ChatBrowserUse

async def main():
    # Create agent with custom configuration
    agent = Agent(
        task="Research AI news and save to file",
        llm=ChatBrowserUse(),
        browser=Browser(headless=False),
        max_actions_per_step=10,
        use_vision=True,
    )
    
    # Run the agent
    history = await agent.run(max_steps=50)
    
    # Check results
    if history.is_done():
        print(f"Task completed: {history.final_result()}")
        print(f"Success: {history.is_successful()}")
    
    # Access history
    print(f"URLs visited: {history.urls()}")
    print(f"Actions taken: {history.action_names()}")

if __name__ == "__main__":
    asyncio.run(main())

See Also

Core Classes

LLM Providers

Actions

Configuration