
Web-Use

Web-Use is a CDP-powered browser agent.


README

🌐 Web-Use

Web-Use is an autonomous browsing agent that navigates websites, interacts with dynamic content, performs searches, downloads files, and adapts to changing pages with minimal input from you. Built on large language models (LLMs) and the Chrome DevTools Protocol (CDP), it turns multi-step web tasks into automated workflows.

  • 🤖 Autonomous Web Navigation — Navigate websites, fill forms, and interact with dynamic content without manual intervention
  • 🛠️ Multi-LLM Support — Works with Anthropic Claude, Google Gemini, OpenAI, Groq, Ollama, Cerebras, Mistral, and more
  • 📸 Vision Capability — Understands visual content on pages for better decision-making
  • 🔗 Web Model Context Protocol (WebMCP) — Discovers and uses custom tools exposed by websites, enabling context-aware interactions
  • ⚡ Efficient Element Interaction — Indexed DOM elements for fast, accurate clicking and typing
  • 📥 File Operations — Download files and upload content to forms
  • 🔄 State Awareness — Maintains understanding of page state to avoid loops and recover from errors
  • ⏱️ Intelligent Waiting — Handles loading states, animations, and user interactions (CAPTCHA, OTP)
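The "indexed DOM elements" idea can be illustrated with a small, self-contained sketch. Interactive elements are numbered in document order, so a model can say "click [1]" instead of describing a selector. Web-Use works on the live DOM over CDP. This stand-in instead parses static HTML with Python's standard-library `html.parser`. The tag set and the names `IndexedElement`, `index_page` and `describe` are hypothetical and are not Web-Use's API.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not Web-Use's actual rules).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass
class IndexedElement:
    index: int             # position in document order, used as the action target
    tag: str
    attrs: dict[str, str]
    text: str = ""         # visible text collected between the start and end tags


class InteractiveIndexer(HTMLParser):
    """Collects interactive elements and numbers them in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[IndexedElement] = []
        self._open: IndexedElement | None = None  # element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            el = IndexedElement(len(self.elements), tag, {k: v or "" for k, v in attrs})
            self.elements.append(el)
            if tag != "input":  # <input> is a void element with no text content
                self._open = el

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open.tag:
            self._open = None

    def handle_data(self, data):
        chunk = data.strip()
        if self._open is not None and chunk:
            self._open.text = f"{self._open.text} {chunk}".strip()


def index_page(html: str) -> list[IndexedElement]:
    parser = InteractiveIndexer()
    parser.feed(html)
    parser.close()
    return parser.elements


def describe(el: IndexedElement) -> str:
    # Prefer visible text, then common labelling attributes.
    label = el.text or el.attrs.get("aria-label") or el.attrs.get("placeholder") or el.attrs.get("name", "")
    return f"[{el.index}] <{el.tag}> {label}".rstrip()


elements = index_page(
    '<form><input name="q" placeholder="Search"><button type="submit">Go</button></form>'
    '<a href="/docs">Docs</a>'
)
for el in elements:
    print(describe(el))
# [0] <input> Search
# [1] <button> Go
# [2] <a> Docs
```

Once elements are indexed, an action such as "click element 1" reduces to a list lookup (`elements[1]`). The page does not need to be re-queried with a selector.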

🌐 Web Model Context Protocol (WebMCP)

Web-Use supports WebMCP, a protocol that allows websites to expose custom tools and capabilities directly to the agent. When visiting a website with WebMCP support:

  • Auto-Discovery — The agent automatically detects available tools
  • Dynamic Registration — Tools are added to the agent's toolkit on-the-fly
  • Full Integration — WebMCP tools appear in the browser state with complete schema information
  • Seamless Execution — Tools are called like built-in tools with proper parameter validation
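The discovery-and-registration flow above can be sketched as a toolkit that accepts tools at runtime. It validates arguments against a JSON-Schema-style description before calling a tool. This is a minimal illustration built on assumptions. `Tool`, `Toolkit` and the schema subset (`type`, `properties`, `required`) are hypothetical. The sketch does not reflect how Web-Use or the WebMCP specification actually represent tools.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Subset of JSON Schema type names mapped to Python types (an assumption for this sketch).
JSON_TYPES: dict[str, Any] = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


@dataclass
class Tool:
    name: str
    description: str
    schema: dict[str, Any]       # {"properties": {...}, "required": [...]}
    handler: Callable[..., Any]  # function that performs the call


@dataclass
class Toolkit:
    tools: dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        # Tools discovered on a page are added at runtime, next to any built-ins.
        self.tools[tool.name] = tool

    def call(self, name: str, **kwargs: Any) -> Any:
        tool = self.tools[name]
        props = tool.schema.get("properties", {})
        missing = [p for p in tool.schema.get("required", []) if p not in kwargs]
        if missing:
            raise ValueError(f"{name}: missing required parameter(s) {missing}")
        for key, value in kwargs.items():
            if key not in props:
                raise ValueError(f"{name}: unknown parameter {key!r}")
            ptype = props[key].get("type")
            expected = JSON_TYPES.get(ptype)
            if expected is None:
                continue  # untyped or unrecognised type: accept any value
            # bool is a subclass of int in Python, so reject it unless "boolean" is expected.
            wrong_bool = isinstance(value, bool) and ptype != "boolean"
            if wrong_bool or not isinstance(value, expected):
                raise TypeError(f"{name}: {key!r} should be of type {ptype}")
        return tool.handler(**kwargs)


kit = Toolkit()
kit.register(Tool(
    name="search_docs",
    description="Search documentation",
    schema={
        "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
        "required": ["query"],
    },
    handler=lambda query, limit=5: f"searching {query!r}, limit={limit}",
))
print(kit.call("search_docs", query="cdp"))  # searching 'cdp', limit=5
```

Calling `search_docs` without `query` raises `ValueError`. Passing `limit="2"` raises `TypeError`. Both errors occur before the handler ever runs.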

Example

If you visit a documentation site that supports WebMCP with a search_docs tool:

**WebMCP Tools Available:**
**search_docs** — Search documentation
  - `query` (string) [✓ required]
  - `limit` (integer) [○ optional]

The agent will automatically use this tool when relevant to the task.

Enable WebMCP support:

from src.agent import Agent

# config and llm are created as shown in Basic Setup
agent = Agent(
    config=config,
    llm=llm,
    use_web_mcp=True,  # Enable WebMCP discovery
    max_steps=100
)

🛠️ Installation Guide

Prerequisites

  • Python 3.11 or higher
  • uv (Python package and project manager)

Installation Steps

Clone the repository:

git clone https://github.com/CursorTouch/Web-Use.git
cd Web-Use

Install dependencies:

uv sync

Launch Chrome with remote debugging enabled (the executable name depends on your platform, e.g. `google-chrome` on Linux):

chrome --remote-debugging-port=9222

Set up the .env file with the API key for your LLM provider (for example, GOOGLE_API_KEY for Google Gemini):

GOOGLE_API_KEY="<API_KEY_HERE>"
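A missing or placeholder key often shows up later as a confusing provider error. A small fail-fast check can therefore run right after `load_dotenv()`. `require_env` is a hypothetical helper and is not part of Web-Use. It only reads `os.environ`, which is where `load_dotenv()` places the values from `.env`.

```python
import os


def require_env(name: str, placeholder: str = "<API_KEY_HERE>") -> str:
    """Return an environment variable's value, failing early if it is unset or still the placeholder."""
    value = os.environ.get(name, "").strip()
    if not value or value == placeholder:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value
```

In the setup script, you would call `require_env("GOOGLE_API_KEY")` right after `load_dotenv()`, before creating the LLM client.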

Basic Setup:

from src.agent.browser.config import BrowserConfig
from src.providers.ollama import ChatOllama
from src.agent import Agent
from dotenv import load_dotenv

load_dotenv()

# Initialize LLM
llm = ChatOllama(model='qwen3.5:397b-cloud', temperature=0.5)

# Configure browser
config = BrowserConfig(
    browser='chrome',
    headless=False,
    use_system_profile=True
)

# Create agent with WebMCP support
agent = Agent(
    config=config,
    llm=llm,
    use_vision=False,
    use_web_mcp=True,  # Enable WebMCP for website tools
    max_steps=100
)

# Run agent
user_query = input('Enter your query: ')
agent.print_response(user_query)

Execute:

uv run main.py

⚙️ Configuration Options

Agent Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `config` | `BrowserConfig` | Required | Browser configuration (headless mode, profile, etc.) |
| `llm` | `BaseChatLLM` | Required | Language model used for reasoning |
| `use_vision` | `bool` | `False` | Enable screenshot-based visual understanding |
| `use_web_mcp` | `bool` | `False` | NEW: Enable the Web Model Context Protocol (WebMCP) to discover website tools |
| `max_steps` | `int` | `25` | Maximum number of actions before the agent stops |
| `max_consecutive_failures` | `int` | `3` | Retry limit for consecutive failed tool calls |
| `include_human_in_loop` | `bool` | `False` | Allow the agent to pause for human input |
| `keep_alive` | `bool` | `False` | Keep the browser open after the task completes |
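The following hypothetical control loop shows how `max_steps` and `max_consecutive_failures` interact, using the table's defaults (25 and 3). It assumes that the failure counter resets after any successful step. Whether Web-Use counts failures exactly this way is an assumption. `StepResult` and `run_loop` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    ok: bool             # did this step's tool call succeed?
    done: bool = False   # did the agent signal that the task is finished?


def run_loop(
    step: Callable[[int], StepResult],
    max_steps: int = 25,
    max_consecutive_failures: int = 3,
) -> str:
    failures = 0
    for i in range(max_steps):
        result = step(i)
        if result.done:
            return f"done after {i + 1} steps"
        # Any success resets the counter; only failures in a row count toward the limit.
        failures = 0 if result.ok else failures + 1
        if failures >= max_consecutive_failures:
            return f"stopped: {failures} consecutive failures"
    return f"stopped: reached max_steps={max_steps}"
```

Under these rules, a run that alternates between failing and succeeding never reaches the failure limit and ends only at `max_steps`. Three failures in a row stop the run immediately.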

Browser Configuration

from src.agent.browser.config import BrowserConfig

config = BrowserConfig(
    browser='chrome',                    # 'chrome' or 'edge'
    headless=False,                      # True runs the browser without a visible window
    use_system_profile=True,             # Use your real browser profile (keeps existing logins)
    user_data_dir='/path/to/profile',    # Custom profile directory
    cdp_port=9222,                       # Chrome DevTools Protocol port
    downloads_dir='/path/to/downloads',  # Where downloaded files are saved
    attach_to_existing=False,            # True connects to an already-running browser
    update_cdp=False,                    # True regenerates the CDP protocol files
)

🎥 Demos

Prompt: I want to know the price details of the RTX 4060 laptop gpu from various sellers from amazon.in

Amazon.mov

Prompt: Make a twitter post about AI on X

Twitter.mov

Prompt: Can you play the trailer of GTA 6 on youtube

Youtube.mov

Prompt: Can you go to my github account and visit the Windows MCP

Github.mov

🪪 License

This project is licensed under the MIT License. See the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING for setup instructions and development guidelines.

Made with ❤️ by Jeomon George, Muhammad Yaseen



Release History

| Version | Changes | Urgency | Date |
|---|---|---|---|
| v0.2 | Feature: improved grounding to handle more corner cases. Fix: fixed a bug that caused the agent to get stuck on PDF or blank pages; removed redundant parts of the agent implementation. | Low | 7/7/2025 |
| v0.1 | Key features and updates: **Dual agent modes**: supports both **non-vision** and **vision-based** operation (to support both LLMs and VLMs). **Scrollable vs. interactive elements**: a clear separation improves DOM recognition and interaction. **Scrolling logic**: enables scrolling through distinct webpage sections, including nested containers. **HTML → Markdown**: upgraded to `markdownify` in the `Scrape Tool` for better content conversion. **Tab management**: tracks the nu… | Low | 6/17/2025 |


Similar Packages

  • SurfSense: An open-source, privacy-focused alternative to NotebookLM for teams, with no data limits. Join our Discord: https://discord.gg/ejRNvftDp9 (v0.0.19)
  • MaiBot: MaiSaka, an LLM-based intelligent agent, is a digital lifeform devoted to understanding you and interacting in the style of a real human. She does not pursue perfection, nor does she seek efficiency; (1.0.0-pre.4)
  • daily_stock_analysis: LLM-driven intelligent analyzer for A-share, H-share, and US stocks: multi-source market data, real-time news, an LLM decision dashboard, and multi-channel push notifications; runs on a schedule at zero cost, completely free. LLM-powered stock analysis system for A/H/US markets. (v3.13.0)
  • hermes-gate: 🏛️ Hermes Gate, a terminal TUI for managing remote Hermes Agent sessions with auto-reconnect, detach support, and zero config (0.0.0)
  • scraping-browser: 🔍 Automate dynamic web scraping with Scraping Browser, a fully hosted solution using Puppeteer, Selenium, and Playwright for seamless data collection. (main@2026-04-21)