
Overview

ChatOllama provides integration with locally running Ollama models, enabling completely private and offline browser automation without sending data to external APIs.

Basic Usage

from browser_use import Agent, ChatOllama
import asyncio

async def main():
    llm = ChatOllama(model='llama3.2')
    agent = Agent(
        task="Find the number 1 post on Show HN",
        llm=llm,
    )
    await agent.run()

if __name__ == "__main__":
    asyncio.run(main())

Prerequisites

  1. Install Ollama: Download from ollama.com
  2. Pull a model: ollama pull llama3.2
  3. Start Ollama: It runs automatically after installation

# Pull recommended models
ollama pull llama3.2
ollama pull llama3.2:70b
ollama pull qwen2.5-coder:32b

# Verify Ollama is running
curl http://localhost:11434

Configuration

Required Parameters

model (str, required)
Ollama model name. Popular options:
  • llama3.2: Fast and capable
  • llama3.2:70b: More powerful
  • qwen2.5-coder:32b: Great for web tasks
  • mistral: Alternative option
  • codellama: Coding focused
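
Any tag installed locally can be passed straight through as the model name. A minimal sketch (it assumes qwen2.5-coder:32b has already been pulled with ollama pull):

from browser_use import ChatOllama

# The model string is simply the local Ollama tag, as shown by `ollama list`
llm = ChatOllama(model='qwen2.5-coder:32b')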

Client Parameters

host (str, default: None)
Ollama server URL. Defaults to http://localhost:11434.

timeout (float, default: None)
Request timeout in seconds.

client_params (dict, default: None)
Additional parameters passed to the Ollama client (see the sketch after this section).

ollama_options (Options, default: None)
Ollama-specific options for model behavior. Common options:
  • temperature: Sampling temperature
  • num_predict: Max tokens to generate
  • top_k: Top-K sampling
  • top_p: Top-P sampling
  • repeat_penalty: Repetition penalty
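
The Advanced Usage sections below show host, timeout, and ollama_options in context; client_params is not demonstrated elsewhere, so here is a minimal sketch. It assumes the dictionary is forwarded to the underlying Ollama client and that a headers entry is accepted there; the header itself is a placeholder, not a documented example:

from browser_use import ChatOllama

llm = ChatOllama(
    model='llama3.2',
    # Assumption: client_params is passed through to the Ollama client,
    # and 'headers' is accepted there (e.g. for a reverse proxy token).
    client_params={'headers': {'Authorization': 'Bearer <token>'}},
)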

Advanced Usage

Custom Ollama Host

from browser_use import Agent, ChatOllama

# Connect to remote Ollama instance
llm = ChatOllama(
    model='llama3.2',
    host='http://192.168.1.100:11434',
)

agent = Agent(task="Your task", llm=llm)

With Ollama Options

from browser_use import Agent, ChatOllama
from ollama import Options

llm = ChatOllama(
    model='llama3.2',
    ollama_options=Options(
        temperature=0.7,
        num_predict=2048,
        top_k=40,
        top_p=0.9,
        repeat_penalty=1.1,
    ),
)

agent = Agent(task="Your task", llm=llm)

Structured Output

from browser_use import Agent, ChatOllama
from pydantic import BaseModel

class SearchResult(BaseModel):
    title: str
    description: str
    url: str

llm = ChatOllama(model='llama3.2')

agent = Agent(
    task="Extract search result",
    llm=llm,
    output_model_schema=SearchResult,
)

result = await agent.run()
print(result.structured_output)  # SearchResult instance

Custom Timeout for Large Models

from browser_use import Agent, ChatOllama

llm = ChatOllama(
    model='llama3.2:70b',
    timeout=300.0,  # 5 minutes for large model
)

agent = Agent(task="Complex task", llm=llm)

Using Dictionary Options

from browser_use import Agent, ChatOllama

llm = ChatOllama(
    model='qwen2.5-coder:32b',
    ollama_options={
        'temperature': 0.2,
        'num_predict': 4096,
        'top_p': 0.95,
    },
)

agent = Agent(task="Your task", llm=llm)

Setup Guide

macOS

# Install Ollama
brew install ollama

# Or download the macOS app from ollama.com

# Start service
ollama serve

# Pull model
ollama pull llama3.2

Linux

# Install
curl -fsSL https://ollama.com/install.sh | sh

# Start service (usually auto-starts)
sudo systemctl start ollama

# Pull model
ollama pull llama3.2

Windows

  1. Download installer from ollama.com
  2. Run installer
  3. Open terminal and run: ollama pull llama3.2

Docker

# Run Ollama in Docker
docker run -d \
  --name ollama \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# Pull model
docker exec ollama ollama pull llama3.2
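
With port 11434 published as above, ChatOllama can reach the containerized server through the default host. A small sketch (adjust host if the container runs on a different machine):

from browser_use import ChatOllama

# Port 11434 is published to localhost by the docker run command above
llm = ChatOllama(model='llama3.2', host='http://localhost:11434')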

Error Handling

from browser_use import Agent, ChatOllama
from browser_use.llm.exceptions import ModelProviderError
import httpx

try:
    llm = ChatOllama(model='llama3.2')
    agent = Agent(task="Your task", llm=llm)
    result = await agent.run()
except ModelProviderError as e:
    print(f"Ollama error: {e.message}")
    print("Make sure Ollama is running: ollama serve")
except httpx.ConnectError:
    print("Cannot connect to Ollama. Is it running?")
    print("Start with: ollama serve")

Properties

provider

Returns the provider name: "ollama"

llm = ChatOllama(model='llama3.2')
print(llm.provider)  # "ollama"

name

Returns the model name.

llm = ChatOllama(model='llama3.2')
print(llm.name)  # "llama3.2"

Methods

get_client()

Returns an OllamaAsyncClient instance.

llm = ChatOllama(model='llama3.2')
client = llm.get_client()
# Use client directly for advanced operations
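
For example, the client can be used to query the Ollama server directly. This sketch assumes the standard ollama AsyncClient API, where list() reports the locally installed models:

import asyncio
from browser_use import ChatOllama

async def show_models():
    llm = ChatOllama(model='llama3.2')
    client = llm.get_client()
    # list() is part of the ollama AsyncClient API, not Browser Use
    models = await client.list()
    print(models)  # installed models as reported by the server

asyncio.run(show_models())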

ainvoke()

Asynchronously invoke the model with messages.

from browser_use.llm.messages import SystemMessage, UserMessage

llm = ChatOllama(model='llama3.2')

messages = [
    SystemMessage(content="You are a helpful assistant"),
    UserMessage(content="What is Browser Use?")
]

response = await llm.ainvoke(messages)
print(response.completion)  # String response

Parameters

  • messages (list[BaseMessage]): List of messages
  • output_format (type[T] | None): Optional Pydantic model for structured output

Returns

ChatInvokeCompletion[T] | ChatInvokeCompletion[str] with:
  • completion: Response content (string or structured output)
  • usage: Currently None for Ollama (not tracked)

Ollama does not currently provide token usage information in responses.
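
Structured output also works at this level. A minimal sketch, assuming output_format is passed as a keyword argument and accepts the same Pydantic models used with output_model_schema on the Agent:

from browser_use import ChatOllama
from browser_use.llm.messages import UserMessage
from pydantic import BaseModel

class CityInfo(BaseModel):
    name: str
    country: str

llm = ChatOllama(model='llama3.2')

# completion is parsed into a CityInfo instance when output_format is set
response = await llm.ainvoke(
    [UserMessage(content="Describe Paris as JSON")],
    output_format=CityInfo,
)
print(response.completion)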

Model Recommendations

For Speed

  • llama3.2 (3B): Fast, good quality
  • qwen2.5-coder (7B): Great for web tasks
  • mistral (7B): Balanced performance

For Quality

  • llama3.2:70b: Best quality, slower
  • qwen2.5-coder:32b: Excellent for browser automation
  • mixtral:8x7b: High quality mixture of experts

For Resource-Constrained

  • llama3.2:3b: Very fast on CPU
  • phi3: Microsoft’s efficient model
  • tinyllama: Minimal resource usage

# Check model sizes
ollama list

# Remove unused models
ollama rm model-name

Performance Tips

  1. GPU Acceleration: Ollama automatically uses GPU if available
  2. Model Size: Smaller models are faster but less capable
  3. num_predict: Limit output tokens for faster responses
  4. Preload Models: Models load faster after first use

# Optimize for speed
llm = ChatOllama(
    model='llama3.2',
    ollama_options={
        'num_predict': 512,  # Limit output length
        'num_ctx': 2048,     # Smaller context window
    },
)

Troubleshooting

Ollama Not Running

# Check if Ollama is running
curl http://localhost:11434

# Start Ollama
ollama serve

# Or on Linux with systemd
sudo systemctl start ollama
sudo systemctl status ollama

Model Not Found

# List installed models
ollama list

# Pull missing model
ollama pull llama3.2

Connection Refused

# Verify correct host
llm = ChatOllama(
    model='llama3.2',
    host='http://localhost:11434',  # Default
)

Slow Performance

# Use smaller model
ollama pull llama3.2:3b

# Check GPU usage
nvidia-smi  # For NVIDIA GPUs

# Reduce context size via ollama_options (num_ctx), as shown in Performance Tips

Benefits of Ollama

  1. Privacy: All data stays on your machine
  2. No API Costs: Free to use
  3. Offline Capable: Works without internet
  4. Fast: Low latency on local hardware
  5. Customizable: Full control over models and parameters

Limitations

  1. No Usage Tracking: Token counts not available
  2. Hardware Dependent: Performance varies by hardware
  3. Model Quality: May not match GPT-4 or Claude for complex tasks
  4. Setup Required: Need to install and manage Ollama