| v2.48.13 | ## What's New **`spider authenticate` command**: Store your [Spider Cloud](https://spider.cloud) API key locally for remote crawls. ### Usage ```sh # Authenticate (stores key in ~/.spider/credentials) spider authenticate sk-your-key spider auth # alias, interactive prompt # Crawl via Spider Cloud (key auto-loaded) spider crawl -u https://example.com -o # Choose cloud mode spider crawl -u https://example.com --spider-cloud-mode smart -o spider crawl -u https://example.com --spider-cloud-mo | Medium | 3/31/2026 |
| v2.48.4 | New `spider_mcp` crate: MCP server for Spider. ```bash cargo install spider_mcp ``` Setup (Claude Code `~/.claude/settings.json` or Claude Desktop config): ```json { "mcpServers": { "spider": { "command": "spider-mcp" } } } ``` Usage examples: ``` Scrape a page: "Fetch https://example.com as markdown" Crawl a site: "Crawl https://example.com up to 5 pages" Extract links: "Get all links from https://example.com" Transform HTML: "Convert this HTML to markdown: <h1>Hel | Medium | 3/25/2026 |
| v2.48.2 | Race alternative browser engines alongside your primary crawl. Best HTML wins. ```rust use spider::configuration::{BackendEndpoint, BackendEngine, ParallelBackendsConfig}; let mut website = Website::new("https://example.com"); website.configuration.parallel_backends = Some(ParallelBackendsConfig { backends: vec![BackendEndpoint { engine: BackendEngine::LightPanda, endpoint: Some("ws://127.0.0.1:9222".to_string()), binary_path: None, protocol: None, }], | Medium | 3/25/2026 |
| v2.47.75 | ## What's New - **PageData & Crawler trait abstractions** for extensible crawl pipelines - **Proxy support for LLM HTTP requests** (#378) - **Chrome remote_addr** via CDP `Network.responseReceived` - **Remote cache for Chrome responses**: dump & fallback support ## Performance - SIMD-accelerated byte scanning (memchr), unrolled FNV hash - Trie: `Box<str>` keys + manual byte-walk + memchr dot scan - Bloom filter bitmask addressing + inline early-exit - Zero-alloc DNS cache hits via `Arc<[Sock | Low | 3/20/2026 |
| v2.47.51 | - NUMA thread pinning for multi-socket servers (`numa` feature) - zerocopy wire parsing for HTTP status lines, cache headers, DNS records (`zero_copy` feature) | Low | 3/19/2026 |
| v2.47.50 | Zero-copy page passing (bytes::Bytes), mmap+hugepages bloom filter for URL dedup (`bloom` feature). | Low | 3/19/2026 |
| v2.47.24 | io_uring TCP connect + lightweight background runtime - io_uring TCP connect: Socket + Connect opcodes for kernel-async TCP connects via the existing uring worker - Lightweight background runtime: Drops from multi-thread to current-thread tokio executor when io_uring is active - Public API: uring_fs::tcp_connect(addr), uring_fs::is_uring_enabled() - CI fixes: clippy unnecessary_cast, io_other_error, cargo fmt **Full Changelog**: https://github.com/spider-rs/spider/compare/ | Low | 3/15/2026 |
| v2.45.28 | ### Agent Hardening - Cap LLM-controlled durations (Wait, ClickHold, SetViewport, OpenPage) - Add `js_escape()` for safe JS string interpolation in action handlers - Wrap `Navigate` and screenshot calls with timeouts - Use `PageWaitStrategy::Load` for `WaitForNavigation` instead of fixed sleep - Replace `eval_with_timeout` for Fill/Type/Clear actions with error propagation - Improve semaphore and logging diagnostics on error paths | Low | 3/2/2026 |
| v2.45.24 | ## What's New ### Performance - **Cache-first fast path**: skip browser/HTTP entirely when cache has data (~5-50ms vs 1-3s) - **Deferred Chrome**: process multi-page crawls from cache before launching a browser - **Work-stealing (hedged requests)**: parallel retry for slow crawl requests - **io_uring**: StreamingWriter for high-throughput file I/O on Linux ### Agent - **Per-round model pool routing**: route cheap rounds to fast models, complex rounds to capable ones - **Comprehensive rout | Low | 2/21/2026 |
| v2.45.20 | ## What's New ### Relevance Gate for Remote Multimodal Crawling Added a `relevance_gate` config that instructs the LLM to return a `"relevant": true|false` field in its JSON response. When a page is deemed irrelevant, its wildcard budget credit is refunded so the crawler discovers more relevant content. **New config fields:** - `relevance_gate: bool` - enables the feature - `relevance_prompt: Option<String>` - optional custom relevance criteria **How it works:** 1. When enabled, t | Low | 2/5/2026 |
| v2.44.13 | ## What's New - **Spider Cloud integration** (`spider_cloud` feature): optional proxy rotation, anti-bot bypass, and intelligent fallback via [spider.cloud](https://spider.cloud) - Modes: Proxy, Api, Unblocker, Fallback, Smart - Smart mode auto-detects Cloudflare challenges, CAPTCHAs, and bot protection then retries via `/unblocker` - **S3 skills loading** (`skills_s3` feature): load agent skills from S3-compatible storage (AWS, MinIO, R2) - CLI: `--spider-cloud-key` and `--spider-cloud-m | Low | 2/5/2026 |
| v2.43.20 | ## Spider v2.43.20 ### Changes - **fix(spider)**: Fix doctest and update chromey for adblock compatibility - **fix(search)**: Use reqwest::Client directly for cache feature compatibility - **chore(spider)**: Update spider_agent dependency to 0.4 ### spider_agent Integration The `agent` feature now uses spider_agent v0.4.0, which includes: - Smart caching with size-aware LRU eviction - High-performance chain execution with parallel step support - Batch processing for multiple items - Prefetch | Low | 2/3/2026 |
| spider_agent-v0.4.0 | ## Spider Agent v0.4.0 ### Performance Optimizations This release adds several performance optimizations for automation workflows: #### Smart Caching - **SmartCache**: Size-aware LRU cache with automatic cleanup - Bounded memory usage with configurable limits - TTL-based expiration - Automatic cleanup on memory pressure - Statistics tracking (hits, misses, evictions) #### High-Performance Execution - **ChainExecutor**: Parallel step execution for automation chains - Analyzes depend | Low | 2/3/2026 |
| v2.43.18 | ## Features ### Web Search Integration Add web search capabilities to Spider's RemoteMultimodalEngine with support for multiple search providers. #### Supported Providers - **Serper** (`search_serper`) - Google SERP API - **Brave** (`search_brave`) - Privacy-focused search - **Bing** (`search_bing`) - Microsoft Bing Web Search - **Tavily** (`search_tavily`) - AI-optimized search #### New Methods - `search()` - Search the web and return structured results - `search_and_extract()` - Search + fe | Low | 2/2/2026 |
| v2.43.13 | ## 🤖 Advanced Agentic Automation Features This release adds comprehensive agentic automation capabilities to spider, making it a powerful tool for autonomous web interactions. ### Phase 1: Simplified Agentic APIs - `act(page, instruction)` - Execute single actions with natural language - `observe(page)` - Analyze page state and get structured observations - `extract_page(page, prompt, schema)` - Extract structured data from pages - `AutomationMemory` - In-memory state management for multi-rou | Low | 2/2/2026 |
| v2.43.3 | ## Bug Fix - **fix(automation)**: Improve `best_effort_parse_json_object` parsing to handle LLM responses with reasoning text before JSON code blocks - Find ```json blocks anywhere in response (not just at boundaries) - Support JSON arrays in addition to objects - Better fallback parsing for various LLM response formats **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.43.2...v2.43.3 | Low | 2/2/2026 |
| v2.43.2 | ## New Feature: Extraction Schema Support Add JSON Schema support for structured extraction in `RemoteMultimodalEngine`. ### `ExtractionSchema` Struct ```rust pub struct ExtractionSchema { pub name: String, // Schema name (e.g., "products") pub description: Option<String>, // What to extract pub schema: String, // JSON Schema definition pub strict: bool, // Enforce strict adherence } ``` ### Example Usage ```rust use spider::features::automation: | Low | 2/2/2026 |
| v2.43.1 | ## Bug Fix - **fix(page)**: Add missing `remote_multimodal_usage` and `extra_remote_multimodal_data` fields to the decentralized `Page` struct for feature parity with the standard `Page` struct. **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.43.0...v2.43.1 | Low | 2/1/2026 |
| v2.43.0 | ## What's New ### Token Usage Tracking for RemoteMultimodalEngine The remote multimodal automation engine now tracks and returns token usage conforming to the OpenAI API format: - `AutomationUsage` struct with `prompt_tokens`, `completion_tokens`, `total_tokens` - Usage is accumulated across all inference rounds - Stored on `Page.remote_multimodal_usage` ### Extraction Support New extraction capabilities for RemoteMultimodalEngine, similar to the OpenAI integration: - `extra_ai_data` - Ena | Low | 2/1/2026 |
| v2.42.0 | ## WebDriver Support Full W3C WebDriver protocol support via `thirtyfour` crate for Selenium Grid, remote browsers, and cross-browser testing. ```rust use spider::website::Website; use spider::features::webdriver_common::{WebDriverConfig, WebDriverBrowser}; let mut website = Website::new("https://example.com") .with_webdriver( WebDriverConfig::new() .with_server_url("http://localhost:4444") .with_browser(WebDriverBrowser::Chrome) .with_headless( | Low | 2/1/2026 |
| v2.41.1 | # v2.41.0 - WebDriver Support This release adds WebDriver support via the `thirtyfour` crate, enabling browser automation using the W3C WebDriver protocol. Connect to remote Selenium Grid, chromedriver, geckodriver, and more. | Low | 2/1/2026 |
| v2.40.2 | ## What's Changed Solve web challenges, perform actions, and more with remote multimodal iterative automation. - **Remote Multimodal Engine** for Chrome automation using vision + LLM - Iterative automation loop: capture → infer plan → execute → re-capture → repeat - Unified `RemoteMultimodalConfigs` to configure: - API endpoint - Model selection - Prompts - Retry behavior - Capture strategies - Strict JSON automation plans: `{ "label": "...", "done": true|false, "ste | Low | 1/23/2026 |
| v2.39.14 | ## What's Changed This release brings built-in Chrome Gemini Nano support and remote vision support. * Add `with_on_should_crawl_callback_closure` by @WilliamVenner in https://github.com/spider-rs/spider/pull/346 * feat(solver): add built in gemini nano support ## New Contributors * @WilliamVenner made their first contribution in https://github.com/spider-rs/spider/pull/346 **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.38.122...v2.39.14 | Low | 1/16/2026 |
| v2.38.122 | ## What's Changed * fix(chrome): add automatic chrome executable detection by @yebei199 in https://github.com/spider-rs/spider/pull/343 * feat(gemini): add Gemini AI support for dynamic browser scripting by @swistaczek in https://github.com/spider-rs/spider/pull/344 * chore(smart): add mismatch cypher retry ## New Contributors * @yebei199 made their first contribution in https://github.com/spider-rs/spider/pull/343 * @swistaczek made their first contribution in https://github.com/spider- | Low | 1/2/2026 |
| v2.38.109 | # What's Changed Fix smart mode lifecycles loading. **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.38.68...v2.38.109 | Low | 12/26/2025 |
| v2.38.70 | # What's Changed Fix smart mode re-rendering document content. **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.38.44...v2.38.70 | Low | 12/7/2025 |
| v2.38.46 | ## What's Changed * fix "real_browser" disabled by @rumpl in https://github.com/spider-rs/spider/pull/336 * fix builder methods wait for * fix headless http -> https upgrade cf * fix smart mode re-render tracking and content forwarding ## New Contributors * @rumpl made their first contribution in https://github.com/spider-rs/spider/pull/336 **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.37.180...v2.38.46 | Low | 12/5/2025 |
| v2.37.180 | ## What's Changed * spider_cli: fix duplicated argument -r by @zazolabs in https://github.com/spider-rs/spider/pull/324 * chore(chrome): fix compile [#328] * spider_cli: fix download files url empty parse * feat(spider): add `with_max_bytes_allowed` to track global browser context bytes for session * chore(cli): add proxy_url [#330] ## New Contributors * @zazolabs made their first contribution in https://github.com/spider-rs/spider/pull/324 **Full Changelog**: https://github.com/sp | Low | 11/1/2025 |
| v2.37.159 | # What's Changed Add builder methods to bind `local_address` and network. * fix duration tracking [#304] * fix network interface platform checking **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.37.119...v2.37.159 | Low | 7/6/2025 |
| v2.37.122 | ## What's Changed Major spoof emulations for chrome moved to `spider_fingerprint`. * chore(fingerprint): add navigator.hardwareConcurrency spoof * chore(examples): fix anti_bot with_user_agent * chore(fingerprint): fix device_pixel_ratio mac defaults * chore(fingerprint): fix hide chrome * chore(fingerprint): prep fingerprint canvas noise * chore(fingerprint): add profiles start * chore(fingerprint): add env section * chore(fingerprint): fix userAgentData getHighEntropyValues * ch | Low | 5/17/2025 |
| v2.37.18 | # What's Changed The page streaming rewriter now extracts built-in metadata by default. You can access it with `page.metadata` or `page.get_metadata()`. Some of the metadata properties are currently unused placeholders. * feat(page): add metadata extracting * chore(chrome): fix concurrent context creation * chore(chrome): bump cdp revision 1457408 ```rust /// Page-level metadata extracted from HTML. pub struct Metadata { /// The meta title pub title: Option<com | Low | 5/7/2025 |
| v2.36.123 | # What's Changed Major fix for HTTP and smart mode requests that added the Host header, which prevented proper redirects. Fix OpenAI automation usage. * chore(website): fix client host header * chore(chrome,sitemap): fix sitemap handling xml * feat(antibot): add antibot detection * chore(chrome): fix viewport browser handling pages * chore(chrome): fix fingerprint execution script * chore(sitemap): add auto sitemap adding whitelisting **Full Changelog**: https://github.com/spider-rs/spider/com | Low | 4/18/2025 |
| v2.36.67 | # What's Changed Fix XML parsing of initial links. * chore(real_browser,chrome): add missing chrome headers * chore(chrome): add real browser loading * chore(chrome): fix request_timeout default * chore(chrome): fix timeout subtracting **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.35.18...v2.36.67 | Low | 3/29/2025 |
| v2.35.18 | # What's Changed * [rquest](https://github.com/0x676e67/rquest) client support via the `rquest` feature flag. * New `website.with_emulation` for `rquest` emulation. * Bug fixes and improvements to Chrome request timeout handling. ```rust /// Set the request emulation. This method does nothing if the `rquest` flag is not enabled. pub fn with_emulation(&mut self, emulation: Option<rquest_util::Emulation>) -> &mut Self { self.configuration.with_emulation(emulati | Low | 3/26/2025 |
| v2.34.5 | # What's Changed Get a map of the requests and responses sent in headless mode. Responses: bytes transferred. Requests: mono time. Example mapping: ```json { "response_map": { "https://spider.cloud/_astro/page.V2R8AmkL.js": 0.0, "https://spider.cloud/_astro/FaqSection.93yW76zV.js": 0.0, "https://spider.cloud/_astro/AuthDropdownMarketing.BtXgMRKz.js": 0.0, "https://spider.cloud/fonts/berkeley-mono/WEB/BerkeleyMono-Italic.woff2": 0. | Low | 3/19/2025 |
| v2.33.11 | # What's Changed Add the `Website::with_crawl_timeout` builder method to set a maximum timeout for the crawl. This is useful when robots.txt can change the expected crawl durations. Example: ```rust use std::time::Duration; use spider::tokio; use spider::website::Website; use tokio::io::AsyncWriteExt; #[tokio::main] async fn main() { let mut website: Website = Website::new("https://spider.cloud").with_crawl_timeout(Some(Duration::from_millis(10))).build().unwrap(); let mut rx2 | Low | 3/14/2025 |
| v2.33.1 | # What's Changed Remove the `jemalloc` flag. This should be done at the top level of main. Add asset support for Chrome media requests. - feat(chrome): add asset handling pages [#275] **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.32.6...v2.33.1 | Low | 3/9/2025 |
| v2.32.9 | # What's Changed Two new methods for thread-safe crawling and bootstrapping setup: `website.crawl_chrome_send` and `website.crawl_raw_send`. **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.32.6...v2.32.9 | Low | 3/8/2025 |
| v2.32.6 | # What's Changed Chrome performance improved when deserializing the page, and the unused Bytes wrapper was removed. Add more items to the Chrome block list. **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.31.8...v2.32.6 | Low | 3/7/2025 |
| v2.31.8 | # What's Changed 1. chore(chrome): fix wait_for events sequence 1. chore(chrome): add navigation network cancel **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.31.4...v2.31.8 | Low | 3/5/2025 |
| v2.31.4 | # What's Changed Chrome now manages document reloading to prevent infinite page reloading through scripting. The `firewall` feature flag now also enables network-level firewall protection on Chrome, for improved blocking of ads, trackers, and malicious websites. * chore(chrome): add infinite loop document reload protection * chore(chrome): add to block list * chore(chrome): add firewall feature flag. * perf(conf): remove box indirection proxies, whitelist, and blacklist * chore(chrome): | Low | 2/23/2025 |
| v2.30.3 | # What's Changed Use the `firewall` feature flag to protect against malicious websites. Smart mode now lazy-loads Chrome rendering. * feat(firewall): add start of spider_firewall * chore(smart): fix missing bytes transferred * feature(smart): add lazy load chrome * perf(bytes): remove BytesMut **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.27.66...v2.30.3 | Low | 2/16/2025 |
| v2.27.66 | ## What's Changed * chore(cli): trigger help page on missing arguments by @pwnwriter in https://github.com/spider-rs/spider/pull/265 * chore(chrome): add connection retry ws * chore(smart): add initial http fallback * chore(website): add direct proxy control * chore(website): fix scrape hang [#268] ## New Contributors * @pwnwriter made their first contribution in https://github.com/spider-rs/spider/pull/265 **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.27.50...v2 | Low | 2/13/2025 |
| v2.27.50 | # What's Changed Web page normalizing to prevent duplicate content, crawl traps, and other pages from being crawled repeatedly. We can now crawl websites that target ports outside 80 and 443. 1. feat(page): add relative directory url handling 1. chore(website): fix relative page merging links 1. chore(serde): fix cron compile configuration 1. chore(chrome): update tokio-tungstenite@0.26 1. chore(page): add port validation links 1. chore(website): fix signature compile non disk featu | Low | 1/25/2025 |
| v2.26.27 | # What's Changed 1. add auto find sitemap url on 404 or network error. 2. fix chrome_cache_hybrid compile. 3. add `cache_chrome_hybrid_mem` flag to use memory instead of disk. 4. fix q draining across website methods 5. fix crawl depth handling 6. fix worker init background connect 7. add proper status code from errors **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.26.1...v2.26.27 | Low | 1/18/2025 |
| v2.26.1 | # What's Changed This release brings performance improvements by skipping URL parsing per page. You can now also pass a second parameter to the page link methods to collect the links with a new domain target. Targeting the correct root domain for parsing the links is now handled across features. If you used `page::Page::take_url` directly, you may need to call `page::Page::set_url_parsed_direct_empty()` first or the `page::Page::get_url_parsed()` method. 1. perf(cli): add page links dire | Low | 1/11/2025 |
| v2.24.15 | # What's Changed Add a callback to perform validation using [spider::page::Page](https://docs.rs/spider/latest/spider/page/struct.Page.html). You can now use the `basic` feature flag to easily disable io_uring on Linux and still get the default features with `"default-features = false"`. 1. feat(website): add on_should_crawl_callback [#241] 1. feat(page): add blocked_crawl [#242] 1. chore(disk): fix cfg aho_corasick 1. chore(fs): remove tendril crate 1. chore(page): fix crawling initia | Low | 1/4/2025 |
| v2.23.7 | # What's Changed Linux now uses [io_uring](https://github.com/tokio-rs/io-uring) for the DNS connect phase. If you do not have a recent version of Linux installed, disable the `io_uring` feature flag. * feat(io_uring): add io_uring for connect_phase linux * chore(fs): fix feature flag compile fs **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.22.19...v2.23.7 | Low | 12/31/2024 |
| v2.22.19 | # What's Changed This release brings in SQLite for improved memory handling with the feature flags `disk_native_tls`, `disk`, and `disk_aws`. SQLite is used in a hybrid manner with memory in order to maintain performance. With disk handling and our string interning, the number of crawled URLs can enter the billions, or be effectively unlimited with EFS attached. ## Other Changes * chore(website,page): fix concurrent initial scoped access to `lazy_static!` * chore(chrome): add more network | Low | 12/24/2024 |
| v2.21.33 | # What's Changed Fix HTTP crawling past the first page. Fix safe handling of absolute URLs. **Full Changelog**: https://github.com/spider-rs/spider/compare/v2.21.27...v2.21.33 | Low | 12/18/2024 |
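
The v2.43.3 entry describes finding fenced `json` blocks anywhere in an LLM response (not just at its boundaries) and accepting arrays as well as objects. The following is a minimal, std-only sketch of that idea, not spider's actual `best_effort_parse_json_object` implementation: the function names `extract_json_block` and `best_effort_json_span` are hypothetical, and the fallback rule (outermost bracket span) is an assumption made for illustration.

```rust
/// Return the contents of the first fenced `json` block found anywhere in
/// `response`, or `None` if there is no complete block.
/// (Hypothetical helper; not part of the spider API.)
pub fn extract_json_block(response: &str) -> Option<&str> {
    // Build the fence at runtime so this example can sit inside a Markdown fence.
    let fence = "`".repeat(3);
    let open = format!("{fence}json");

    // Locate the opening marker anywhere in the text, not only at the start.
    let start = response.find(&open)? + open.len();
    let rest = &response[start..];

    // The block ends at the next fence; an unterminated block yields None.
    let end = rest.find(&fence)?;
    Some(rest[..end].trim())
}

/// Assumed fallback: with no fenced block, take the outermost span that
/// looks like a JSON object or array. The result is a candidate only and
/// still has to be validated by a real JSON parser.
pub fn best_effort_json_span(response: &str) -> Option<&str> {
    if let Some(block) = extract_json_block(response) {
        return Some(block);
    }
    // Pick whichever opener appears first so arrays are supported too.
    let start = response.find(|c: char| c == '{' || c == '[')?;
    let close = if response[start..].starts_with('{') { '}' } else { ']' };
    let end = response.rfind(close)?;
    (end > start).then(|| &response[start..=end])
}
```

Passing a response that has reasoning text before a fenced block returns only the block's body, and a bare `Result: [1, 2, 3]` falls through to the bracket scan. Braces inside the reasoning text can fool the fallback, which is why the returned span should be treated as a candidate for parsing rather than as valid JSON.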