🌐 Web-Use

Web-Use is an autonomous browsing agent that navigates websites, interacts with dynamic content, runs searches, downloads files, and adapts to pages that change, all with minimal manual input. An LLM decides what to do at each step, and Web-Use carries out those actions in the browser through the Chrome DevTools Protocol (CDP), turning multi-step web tasks into automated workflows.

✨ Key Features

🤖 Autonomous Web Navigation — Navigate websites, fill forms, and interact with dynamic content without manual intervention
🛠️ Multi-LLM Support — Works with Anthropic Claude, Google Gemini, OpenAI, Groq, Ollama, Cerebras, Mistral, and more
📸 Vision Capability — Understands visual content on pages for better decision-making
🔗 Web Model Context Protocol (WebMCP) — Discovers and uses custom tools exposed by websites, enabling context-aware interactions
⚡ Efficient Element Interaction — Indexed DOM elements for fast, accurate clicking and typing
📥 File Operations — Download files and upload content to forms
🔄 State Awareness — Maintains understanding of page state to avoid loops and recover from errors
⏱️ Intelligent Waiting — Handles loading states, animations, and user interactions (CAPTCHA, OTP)
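The indexed-element idea can be illustrated with a small sketch: each interactive element on a page gets a number, so the LLM can answer "click element 2" instead of writing a CSS selector. This version parses static HTML with Python's standard `html.parser`. The tag set and the names `InteractiveIndexer` and `index_elements` are assumptions for illustration, not Web-Use's implementation, which presumably reads the live DOM through CDP.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not Web-Use's rule set)
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveIndexer(HTMLParser):
    """Hypothetical indexer: assigns a sequential index to each interactive element."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            # The index is simply the element's position among interactive elements
            self.elements.append(
                {"index": len(self.elements), "tag": tag, "attrs": dict(attrs)}
            )


def index_elements(html: str) -> list[dict]:
    parser = InteractiveIndexer()
    parser.feed(html)
    parser.close()
    return parser.elements


page = '<p>Hi</p><a href="/docs">Docs</a><input name="q"><button>Go</button>'
for el in index_elements(page):
    print(el["index"], el["tag"], el["attrs"])
# 0 a {'href': '/docs'}
# 1 input {'name': 'q'}
# 2 button {}
```

Non-interactive tags such as `<p>` are skipped, so the indices stay short and stable for the elements the agent can actually act on.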

🌐 Web Model Context Protocol (WebMCP)

Web-Use supports WebMCP, a protocol that allows websites to expose custom tools and capabilities directly to the agent. When visiting a website with WebMCP support:

Auto-Discovery — The agent automatically detects available tools
Dynamic Registration — Tools are added to the agent's toolkit on-the-fly
Full Integration — WebMCP tools appear in the browser state with complete schema information
Seamless Execution — Tools are called like built-in tools with proper parameter validation

Example

If the agent visits a documentation site that exposes a `search_docs` tool through WebMCP, the tool appears in the browser state like this:

**WebMCP Tools Available:**
**search_docs** — Search documentation
  - `query` (string) [✓ required]
  - `limit` (integer) [○ optional]

The agent will automatically use this tool when relevant to the task.
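WebMCP tools are called with parameter validation. Below is a minimal sketch of that check for the `search_docs` example, assuming the site describes each tool's inputs as a JSON-Schema-style object (`properties` plus `required`). `SEARCH_DOCS_SCHEMA` and `validate_args` are hypothetical names; only a few primitive types are handled, and a real implementation would more likely use a full JSON Schema validator such as the `jsonschema` package.

```python
# Hypothetical schema mirroring the search_docs example (JSON Schema style)
SEARCH_DOCS_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer"},
    },
    "required": ["query"],
}

# JSON Schema primitive type names mapped to Python types (subset handled here)
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(schema: dict, args: dict) -> list[str]:
    """Return validation errors for a tool call; an empty list means the call is valid."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    for name, value in args.items():
        if name not in properties:
            errors.append(f"unknown parameter: {name}")
            continue
        expected_name = properties[name].get("type")
        expected = JSON_TYPES.get(expected_name)
        # bool is a subclass of int in Python, so reject it unless a boolean is expected
        wrong_bool = isinstance(value, bool) and expected_name != "boolean"
        if expected is not None and (not isinstance(value, expected) or wrong_bool):
            errors.append(f"{name}: expected {expected_name}")
    return errors


print(validate_args(SEARCH_DOCS_SCHEMA, {"query": "install", "limit": 5}))
# []
print(validate_args(SEARCH_DOCS_SCHEMA, {"limit": "5"}))
# ['missing required parameter: query', 'limit: expected integer']
```

Returning a list of errors instead of raising is one plausible design: the messages can be fed back to the LLM so it can correct the call on its next step.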

Enable WebMCP support:

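```python
from src.agent import Agent

# `config` and `llm` are created as shown in Basic Setup below
agent = Agent(
    config=config,
    llm=llm,
    use_web_mcp=True,  # Enable WebMCP discovery
    max_steps=100
)
```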

🛠️ Installation Guide

Prerequisites

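Python 3.11 or higher
uv (the Python package manager used to install dependencies and run the project)
Google Chrome or Microsoft Edge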

Installation Steps

Clone the repository:

```bash
git clone https://github.com/CursorTouch/Web-Use.git
cd Web-Use
```

Install dependencies:

```bash
uv sync
```

Launch Chrome with remote debugging enabled (the executable name varies by platform, for example `google-chrome` on Linux):

```bash
chrome --remote-debugging-port=9222
```
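Before starting the agent, it can help to confirm that the DevTools endpoint is reachable. Chrome serves browser metadata as JSON at `/json/version` on the debugging port; the sketch below queries it with the standard library. The helper name `cdp_version` is hypothetical and is not part of Web-Use.

```python
import http.client
import json
import urllib.request


def cdp_version(port: int = 9222, host: str = "127.0.0.1", timeout: float = 2.0):
    """Hypothetical helper: return Chrome's /json/version data, or None if unreachable."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError, refused connections, and timeouts;
        # HTTPException covers malformed HTTP replies; ValueError covers invalid JSON
        return None


info = cdp_version()
if info is None:
    print("No DevTools endpoint on port 9222; was Chrome started with --remote-debugging-port=9222?")
else:
    print(info.get("Browser"), info.get("webSocketDebuggerUrl"))
```

If the check prints the browser version and a `ws://` URL, the debugging port is open and a CDP client can connect to it.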

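Create a `.env` file in the project root containing the API key for the LLM provider you plan to use, for example Google Gemini:

```
GOOGLE_API_KEY="<API_KEY_HERE>"
```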
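A missing key often surfaces only when the first LLM call fails. A small fail-fast check, run right after `load_dotenv()`, catches it earlier. `require_env` is a hypothetical helper, not part of Web-Use; pass the variable names your chosen provider actually reads.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Hypothetical helper: raise if any named environment variable is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

For example, `require_env("GOOGLE_API_KEY")` raises `RuntimeError` if the key is absent or empty, and otherwise returns it in a dict.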

Basic Setup (for example, saved as `main.py`, which the next step runs):

```python
from src.agent.browser.config import BrowserConfig
from src.providers.ollama import ChatOllama
from src.agent import Agent
from dotenv import load_dotenv

load_dotenv()

# Initialize LLM
llm = ChatOllama(model='qwen3.5:397b-cloud', temperature=0.5)

# Configure browser
config = BrowserConfig(
    browser='chrome',
    headless=False,
    use_system_profile=True
)

# Create agent with WebMCP support
agent = Agent(
    config=config,
    llm=llm,
    use_vision=False,
    use_web_mcp=True,  # Enable WebMCP for website tools
    max_steps=100
)

# Run agent
user_query = input('Enter your query: ')
agent.print_response(user_query)
```

Execute:

```bash
uv run main.py
```

⚙️ Configuration Options

Agent Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `config` | BrowserConfig | Required | Browser configuration (headless, profile, etc.) |
| `llm` | BaseChatLLM | Required | Language model to use for reasoning |
| `use_vision` | bool | False | Enable screenshot-based visual understanding |
| `use_web_mcp` | bool | False | NEW: Enable Web Model Context Protocol to discover website tools |
| `max_steps` | int | 25 | Maximum number of actions before timeout |
| `max_consecutive_failures` | int | 3 | Retry limit for failed tool calls |
| `include_human_in_loop` | bool | False | Allow pausing for human input |
| `keep_alive` | bool | False | Keep browser open after task completion |
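`max_steps` and `max_consecutive_failures` both bound how long the agent keeps trying. The sketch below shows one plausible way the two limits could interact in a step loop, assuming a step callback that returns a final answer, returns `None` to continue, or raises when its tool call fails. `run_agent_loop` and this callback contract are hypothetical, not Web-Use's internals.

```python
from typing import Callable, Optional


def run_agent_loop(
    step: Callable[[int], Optional[str]],
    max_steps: int = 25,
    max_consecutive_failures: int = 3,
) -> str:
    """Hypothetical step loop honoring both limits.

    `step(i)` returns a final answer (str) when the task is done, None to continue,
    and raises an exception when its tool call fails.
    """
    failures = 0
    for i in range(max_steps):
        try:
            answer = step(i)
        except Exception:
            failures += 1
            if failures >= max_consecutive_failures:
                return f"stopped: {failures} consecutive failures"
            continue  # retry on the next step
        failures = 0  # a successful step resets the failure streak
        if answer is not None:
            return answer
    return f"stopped: reached max_steps={max_steps}"


def flaky(i: int) -> Optional[str]:
    # Fails on even steps and finishes on step 5
    if i % 2 == 0:
        raise RuntimeError("element not found")
    return "done" if i == 5 else None


print(run_agent_loop(flaky))  # done
```

Because a success resets the counter, the failures interleaved with successes in `flaky` never reach the limit; only back-to-back failures stop the run early.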

Browser Configuration

```python
from src.agent.browser.config import BrowserConfig

config = BrowserConfig(
    browser='chrome',                    # 'chrome' or 'edge'
    headless=False,                      # True runs the browser without a visible window
    use_system_profile=True,             # Reuse your real browser profile (cookies, logins)
    user_data_dir='/path/to/profile',    # Custom profile directory
    cdp_port=9222,                       # Chrome DevTools Protocol port
    downloads_dir='/path/to/downloads',  # Where downloaded files are saved
    attach_to_existing=False,            # True connects to an already running browser
    update_cdp=False,                    # True regenerates the CDP protocol files
)
```
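The `headless`, `cdp_port`, and `user_data_dir` options correspond to real Chrome command-line switches (`--headless=new`, `--remote-debugging-port`, `--user-data-dir`). This sketch shows how such settings could be turned into a launch command. The `LaunchOptions` dataclass and `chrome_command` function are hypothetical, and whether Web-Use builds its command this way is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaunchOptions:
    """Hypothetical subset of the BrowserConfig fields shown above."""
    executable: str = "chrome"
    headless: bool = False
    cdp_port: int = 9222
    user_data_dir: Optional[str] = None


def chrome_command(opts: LaunchOptions) -> list[str]:
    """Build a Chrome command line that exposes the DevTools Protocol on opts.cdp_port."""
    cmd = [opts.executable, f"--remote-debugging-port={opts.cdp_port}"]
    if opts.user_data_dir:
        cmd.append(f"--user-data-dir={opts.user_data_dir}")
    if opts.headless:
        cmd.append("--headless=new")
    return cmd


print(chrome_command(LaunchOptions(headless=True, user_data_dir="/tmp/webuse-profile")))
# ['chrome', '--remote-debugging-port=9222', '--user-data-dir=/tmp/webuse-profile', '--headless=new']
```

The resulting list can be passed to `subprocess.Popen` to start the browser; on Windows or macOS the executable usually needs to be a full path.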

🎥 Demos

Prompt: I want to know the price details of the RTX 4060 laptop gpu from various sellers from amazon.in

Amazon.mov

Prompt: Make a twitter post about AI on X

Twitter.mov

Prompt: Can you play the trailer of GTA 6 on youtube

Youtube.mov

Prompt: Can you go to my github account and visit the Windows MCP

Github.mov

🪪 License

This project is licensed under the MIT License; see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING for setup instructions and development guidelines.

Made with ❤️ by Jeomon George and Muhammad Yaseen

| Version | Changes | Urgency | Date |
|---|---|---|---|
| v0.3 | What's new in v0.3. New features: PDF support in `scrape_tool` (extract content from PDF pages directly; specify individual pages with `pages=[1,5,10]`); OAuth 2.0 + PKCE authentication (built-in OAuth flow for sites that require it); WebMCP integration (agents can discover and call custom tools exposed by websites via the WebMCP protocol); loop detection (`LoopGuard` detects page cycles and repeated failed retries, with prompt rules to break out automatically); **`k | High | 4/24/2026 |
| v0.2 | Feature: improved the grounding to handle more corner cases. Fixes: fixed a bug that caused the agent to get stuck on PDF pages or blank pages; removed redundant parts of the agent implementation. | Low | 7/7/2025 |
| v0.1 | Key features and updates: dual agent modes (supports both non-vision and vision-based operation, to support both LLMs and VLMs); scrollable vs. interactive elements (a clear separation improves DOM recognition and interaction); scrolling logic (enables scrolling through distinct webpage sections, including nested containers); HTML → Markdown (upgraded to `markdownify` in the `Scrape Tool` for better content conversion); tab management: Tracks the nu | Low | 6/17/2025 |
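The v0.3 notes describe `LoopGuard` as detecting page cycles. A minimal illustration of cycle detection, assuming pages are identified by URL: it flags a loop when the most recent run of 2 to `max_cycle` pages exactly repeats the run just before it. `PageCycleDetector` is hypothetical and is not Web-Use's `LoopGuard`.

```python
from collections import deque


class PageCycleDetector:
    """Hypothetical loop detector: flags a repeated cycle of 2+ pages (e.g. A, B, A, B)."""

    def __init__(self, window: int = 8, max_cycle: int = 3) -> None:
        # Keep only the most recent `window` page URLs
        self.history: deque = deque(maxlen=window)
        self.max_cycle = max_cycle

    def visit(self, url: str) -> bool:
        """Record a page visit; return True if the history now ends in a repeated cycle."""
        self.history.append(url)
        h = list(self.history)
        for length in range(2, self.max_cycle + 1):
            # The last `length` pages equal the `length` pages immediately before them
            if len(h) >= 2 * length and h[-length:] == h[-2 * length:-length]:
                return True
        return False


detector = PageCycleDetector()
print([detector.visit(u) for u in ["/home", "/search", "/home", "/search"]])
# [False, False, False, True]
```

Single-page repeats are deliberately ignored, because an agent often performs several actions (typing, scrolling) on one URL; a more robust guard would likely fingerprint page content as well as the URL.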
