# cortex-scout

> A unified web extraction and stateful automation engine for AI. Replaces heavy testing frameworks with token-optimized browser control, deep research, and HITL.

- **URL**: https://www.freshcrate.ai/projects/cortex-scout
- **Author**: cortex-works
- **Category**: MCP Servers
- **Latest version**: `v3.3.7` (2026-04-10)
- **License**: MIT
- **Source**: https://github.com/cortex-works/cortex-scout
- **Language**: Rust
- **GitHub**: 63 stars, 6 forks
- **Registry**: github
- **Tags**: `ai-agents`, `anti-bot-bypass`, `automated-testing`, `dev-tools`, `knowledge-graph`, `llm-tools`, `mcp`, `model-context-protocol`, `rust`

## Description

A unified web extraction and stateful automation engine for AI agents, exposed as an MCP server. It aims to replace heavy testing frameworks with token-optimized browser control, deep research, and human-in-the-loop (HITL) fetching.

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `v3.3.7` | 2026-04-10 | High | **Changed**<br>- Updated MCP setup guidance so hard timeout env vars are treated as required guardrails in copy-pasteable MCP config examples, not optional tuning.<br>**Fixed**<br>- Fixed MCP tool sessions getting stuck on pathological fetches by enforcing hard per-tool timeouts in both HTTP and stdio dispatch, plus bounded timeouts for expensive scrape stages and browser launch/probe paths.<br>- Fixed `web_fetch` on the LLVM `CodeGenerator.html` path by removing UTF-8-unsafe string slicing in the cleaner,… |
| `v3.3.6` | 2026-04-09 | High | **Changed**<br>- Updated MCP tool descriptions and agent guidance to clarify that proxy use is optional by default, balanced fetches stay on the non-proxy/native path unless blocking signals appear, and all tool responses now expose timing information.<br>- `web_fetch` balanced-mode strategy now prefers the fast native HTTP path on normal server-rendered pages such as GitHub, only escalating into CDP earlier for proxy mode, high/aggressive mode, or known JS-heavy/problematic hosts.<br>**Added**<br>- Added… |
| `v3.3.5` | 2026-04-09 | Medium | **Changed**<br>- Updated agent guidance to prefer direct MCP-tool validation on realistic public URLs after rebuilds when verifying runtime behavior for release decisions.<br>**Fixed**<br>- Fixed `extract_fields` natural-language schema parsing so prompts like `fields: page_title, page_type, main_topics, summary` and `Return a JSON response with fields ...` now resolve to the requested strict output fields instead of drifting into generic auto-extraction keys.<br>- Fixed extraction grounding so metadata-ba… |
| `v3.3.4` | 2026-04-09 | Medium | **Added**<br>- Added cross-process host guard coordination for search engines and scrape hosts so multiple Cortex Scout processes on the same machine/IP space requests out instead of self-triggering rate limits.<br>- Added shared cross-process search cache + singleflight locking so concurrent repos/agents can reuse live search results instead of duplicating the same upstream traffic.<br>- Added regression tests for `deep_research` history URL extraction and timeout helper behavior.<br>**Changed**<br>- Lowered… |
| `v3.3.3` | 2026-04-08 | Medium | **Added**<br>- Added `publish/ci/smoke_deep_research.py` to sweep `deep_research` over MCP with coverage for every public parameter, clamp behavior, and invalid input handling.<br>- Added `effective_config` to `deep_research` responses so MCP clients and smoke tests can inspect the clamped execution parameters that were actually used.<br>**Fixed**<br>- Fixed browser launches in root/restricted environments by applying explicit no-sandbox handling across automation sessions, visible auth sessions, and the r… |
| `v3.3.2` | 2026-03-30 | Medium | **Added**<br>- Expanded `scout_browser_automate` with broader Playwright-style parity: `navigate_back`, `hover`, `wait_for`, `resize`, `tabs`, `file_upload`, `fill_form`, `handle_dialog`, `pdf_save`, coordinate mouse actions, route inspection/removal, network state toggling, cookie/localStorage/sessionStorage CRUD, and verification helpers (`generate_locator`, `verify_*`).<br>- Added richer `mock_api` controls with persistent route registry, method matching, custom response headers, delay simulation,… |
| `v3.3.0` | 2026-03-30 | Medium | **Added**<br>- Unified the public MCP tool surface around grouped calls: `web_search(include_content=true)`, `web_fetch(mode="single"\|"batch"\|"crawl")`, and `hitl_web_fetch(auth_mode="challenge"\|"auth")`.<br>- Expanded browser automation to behave more like a compact Playwright replacement, including nested flows, console capture, storage state helpers, and stronger auto-wait assertions.<br>**Changed**<br>- Refreshed usage docs, setup guides, and smoke coverage so the repository points agents at the curren… |
| `v3.2.0` | 2026-03-17 | Low | **Added**<br>**🎭 The "Playwright Killer" (Stateful Browser Automation)**<br>CortexScout includes a built-in, stateful CDP automation engine designed specifically for AI Agents, completely replacing heavy frameworks like Playwright or Cypress for E2E testing workflows.<br>- **The Silent Omni-Tool (`scout_browser_automate`)**: Instead of calling dozens of tools, agents pass an array of `steps` (navigate, click, type, scroll, press_key, snapshot, screenshot). The entire sequence executes in a single LL… |
| `v3.1.3` | 2026-03-14 | Low | **Fixed**<br>- **Auth-wall false positives on public pages with login modals.** Pages like Discourse forum threads were incorrectly blocked with `NEED_HITL` because the password input in the header login modal triggered auth detection. Rewrote `detect_auth_wall_html` with a high/low-confidence selector split:<br>&nbsp;&nbsp;- **High-confidence selectors** (e.g. `#login_field`, `.auth-form`, `#loginForm`) fire unconditionally.<br>&nbsp;&nbsp;- **Low-confidence selectors** (e.g. `[type='password']`, generic `/login… |
| `v3.1.2` | 2026-03-05 | Low | **Fixed**<br>- **CDP concurrent launches — `SingletonLock` race condition (closes #7).** When multiple MCP tools triggered headless browser fetches simultaneously, all launched into the same default Chrome user-data dir, causing every instance after the first to fail with `"SingletonLock"`. Each CDP launch now gets an isolated `--user-data-dir` under a unique `/tmp/cortex-scout-cdp-XXXXXXXX` directory (cleaned up automatically after each request). Concurrent headless scraping now wor… |
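
The `v3.3.7` notes mention removing UTF-8-unsafe string slicing from the cleaner. The actual fix is not shown in this listing, so the following is only a minimal sketch of the general technique in Rust: the function name `truncate_on_char_boundary` is hypothetical and is not taken from the cortex-scout source. It illustrates why byte-index slicing of a `&str` can panic and how backing up to a character boundary avoids that.

```rust
/// Hypothetical helper (not from cortex-scout): return the longest prefix of
/// `s` that is at most `max_bytes` bytes long and ends on a UTF-8 character
/// boundary. A plain `&s[..max_bytes]` panics if the index falls inside a
/// multi-byte character.
fn truncate_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For example, in `"naïve"` the `ï` occupies bytes 2 and 3, so `truncate_on_char_boundary("naïve", 3)` backs up one byte and returns `"na"`, whereas `&"naïve"[..3]` would panic at runtime.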

## Citation

- HTML: https://www.freshcrate.ai/projects/cortex-scout
- Markdown: https://www.freshcrate.ai/projects/cortex-scout.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/cortex-scout/deps

_Generated by freshcrate.ai, which indexes GitHub releases for AI-agent ecosystem packages._