# spider > Web crawler and scraper for Rust - **URL**: https://www.freshcrate.ai/projects/spider - **Author**: spider-rs - **Category**: AI Agents - **Latest version**: `v2.48.13` (2026-03-31) - **License**: MIT - **Source**: https://github.com/spider-rs/spider - **Homepage**: https://spider.cloud - **Language**: Rust - **GitHub**: 2,437 stars, 196 forks - **Registry**: github - **Tags**: `ai-agent`, `automation`, `crawler`, `headless-chrome`, `rust`, `scraping`, `spider`, `web-crawler` ## Description Web crawler and scraper for Rust ## Recent releases | Version | Date | Urgency | Changes | | --- | --- | --- | --- | | `v2.48.13` | 2026-03-31 | Medium | ## What's New **`spider authenticate` command** — Store your [Spider Cloud](https://spider.cloud) API key locally for remote crawls. ### Usage ```sh # Authenticate (stores key in ~/.spider/credentials) spider authenticate sk-your-key spider auth # alias, interactive prompt # Crawl via Spider Cloud (key auto-loaded) spider crawl -u https://example.com -o # Choose cloud mode spider crawl -u https://example.com --spider-cloud-mode smart -o spider crawl -u https://example.com --spider-cloud-mo | | `v2.48.4` | 2026-03-25 | Medium | New `spider_mcp` crate — MCP server for Spider. ```bash cargo install spider_mcp ``` Setup (Claude Code `~/.claude/settings.json` or Claude Desktop config): ```json { "mcpServers": { "spider": { "command": "spider-mcp" } } } ``` Usage examples: ``` Scrape a page: "Fetch https://example.com as markdown" Crawl a site: "Crawl https://example.com up to 5 pages" Extract links: "Get all links from https://example.com" Transform HTML: "Convert this HTML to markdown:

Hel | | `v2.48.2` | 2026-03-25 | Medium | Race alternative browser engines alongside your primary crawl. Best HTML wins. ```rust use spider::configuration::{BackendEndpoint, BackendEngine, ParallelBackendsConfig}; let mut website = Website::new("https://example.com"); website.configuration.parallel_backends = Some(ParallelBackendsConfig { backends: vec![BackendEndpoint { engine: BackendEngine::LightPanda, endpoint: Some("ws://127.0.0.1:9222".to_string()), binary_path: None, protocol: None, }], | | `v2.47.75` | 2026-03-20 | Low | ## What's New - PageData & Crawler trait abstractions for extensible crawl pipelines - Proxy support for LLM HTTP requests (#378) - Chrome remote_addr via CDP `Network.responseReceived` - Remote cache for Chrome responses — dump & fallback support ## Performance - SIMD-accelerated byte scanning (memchr), unrolled FNV hash - Trie: `Box` keys + manual byte-walk + memchr dot scan - Bloom filter bitmask addressing + inline early-exit - Zero-alloc DNS cache hits via `Arc<[Sock | | `v2.47.51` | 2026-03-19 | Low | - NUMA thread pinning for multi-socket servers (`numa` feature) - zerocopy wire parsing for HTTP status lines, cache headers, DNS records (`zero_copy` feature) | | `v2.47.50` | 2026-03-19 | Low | Zero-copy page passing (bytes::Bytes), mmap+hugepages bloom filter for URL dedup (`bloom` feature). | | `v2.47.24` | 2026-03-15 | Low | io_uring TCP connect + lightweight background runtime - io_uring TCP connect: Socket + Connect opcodes for kernel-async TCP connects via the existing uring worker - Lightweight background runtime: Drops from multi-thread to current-thread tokio executor when io_uring is active - Public API: uring_fs::tcp_connect(addr), uring_fs::is_uring_enabled() - CI fixes: clippy unnecessary_cast, io_other_error, cargo fmt Full Changelog: https://github.com/spider-rs/spider/compare/ | | `v2.45.28` | 2026-03-02 | Low | ### Agent Hardening - Cap LLM-controlled durations (Wait, ClickHold, SetViewport, OpenPage) - Add `js_escape()` for safe JS string interpolation in action handlers - Wrap `Navigate` and screenshot calls with timeouts - Use `PageWaitStrategy::Load` for `WaitForNavigation` instead of fixed sleep - Replace `eval_with_timeout` for Fill/Type/Clear actions with error propagation - Improve semaphore and logging diagnostics on error paths | | `v2.45.24` | 2026-02-21 | Low | ## What's New ### Performance - Cache-first fast path — skip browser/HTTP entirely when cache has data (~5-50ms vs 1-3s) - Deferred Chrome — process multi-page crawls from cache before launching a browser - Work-stealing (hedged requests) — parallel retry for slow crawl requests - io_uring — StreamingWriter for high-throughput file I/O on Linux ### Agent - Per-round model pool routing — route cheap rounds to fast models, complex rounds to capable ones - Comprehensive rout | | `v2.45.20` | 2026-02-05 | Low | ## What's New ### Relevance Gate for Remote Multimodal Crawling Added a `relevance_gate` config that instructs the LLM to return a `"relevant": true\|false` field in its JSON response. When a page is deemed irrelevant, its wildcard budget credit is refunded so the crawler discovers more relevant content. New config fields: - `relevance_gate: bool` — enables the feature - `relevance_prompt: Option` — optional custom relevance criteria How it works:** 1. When enabled, t | ## Citation - HTML: https://www.freshcrate.ai/projects/spider - Markdown: https://www.freshcrate.ai/projects/spider.md - Dependencies JSON: https://www.freshcrate.ai/api/projects/spider/deps _Generated by freshcrate.ai. Indexes github releases for AI-agent ecosystem packages._