# spider

> Web crawler and scraper for Rust

- **URL**: https://www.freshcrate.ai/projects/spider
- **Author**: spider-rs
- **Category**: AI Agents
- **Latest version**: `v2.48.13` (2026-03-31)
- **License**: MIT
- **Source**: https://github.com/spider-rs/spider
- **Homepage**: https://spider.cloud
- **Language**: Rust
- **GitHub**: 2,437 stars, 196 forks
- **Registry**: github
- **Tags**: `ai-agent`, `automation`, `crawler`, `headless-chrome`, `rust`, `scraping`, `spider`, `web-crawler`

## Description

Web crawler and scraper for Rust

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `v2.48.13` | 2026-03-31 | Medium | ## What's New  **`spider authenticate` command** — Store your [Spider Cloud](https://spider.cloud) API key locally for remote crawls.  ### Usage  ```sh # Authenticate (stores key in ~/.spider/credentials) spider authenticate sk-your-key spider auth  # alias, interactive prompt  # Crawl via Spider Cloud (key auto-loaded) spider crawl -u https://example.com -o  # Choose cloud mode spider crawl -u https://example.com --spider-cloud-mode smart -o spider crawl -u https://example.com --spider-cloud-mo |
| `v2.48.4` | 2026-03-25 | Medium | New `spider_mcp` crate — MCP server for Spider.  ```bash cargo install spider_mcp ```  Setup (Claude Code `~/.claude/settings.json` or Claude Desktop config):  ```json { "mcpServers": { "spider": { "command": "spider-mcp" } } } ```  Usage examples:  ``` Scrape a page:       "Fetch https://example.com as markdown" Crawl a site:        "Crawl https://example.com up to 5 pages" Extract links:       "Get all links from https://example.com" Transform HTML:      "Convert this HTML to markdown: <h1>Hel |
| `v2.48.2` | 2026-03-25 | Medium | Race alternative browser engines alongside your primary crawl. Best HTML wins.  ```rust use spider::configuration::{BackendEndpoint, BackendEngine, ParallelBackendsConfig};  let mut website = Website::new("https://example.com"); website.configuration.parallel_backends = Some(ParallelBackendsConfig {     backends: vec![BackendEndpoint {         engine: BackendEngine::LightPanda,         endpoint: Some("ws://127.0.0.1:9222".to_string()),         binary_path: None,         protocol: None,     }], |
| `v2.47.75` | 2026-03-20 | Low | ## What's New  - **PageData & Crawler trait abstractions** for extensible crawl pipelines - **Proxy support for LLM HTTP requests** (#378) - **Chrome remote_addr** via CDP `Network.responseReceived` - **Remote cache for Chrome responses** — dump & fallback support  ## Performance  - SIMD-accelerated byte scanning (memchr), unrolled FNV hash - Trie: `Box<str>` keys + manual byte-walk + memchr dot scan - Bloom filter bitmask addressing + inline early-exit - Zero-alloc DNS cache hits via `Arc<[Sock |
| `v2.47.51` | 2026-03-19 | Low | - NUMA thread pinning for multi-socket servers (`numa` feature) - zerocopy wire parsing for HTTP status lines, cache headers, DNS records (`zero_copy` feature) |
| `v2.47.50` | 2026-03-19 | Low | Zero-copy page passing (bytes::Bytes), mmap+hugepages bloom filter for URL dedup (`bloom` feature). |
| `v2.47.24` | 2026-03-15 | Low | io_uring TCP connect + lightweight background runtime    - io_uring TCP connect: Socket + Connect opcodes for kernel-async TCP connects via the existing uring   worker   - Lightweight background runtime: Drops from multi-thread to current-thread tokio executor when   io_uring is active   - Public API: uring_fs::tcp_connect(addr), uring_fs::is_uring_enabled()   - CI fixes: clippy unnecessary_cast, io_other_error, cargo fmt  **Full Changelog**: https://github.com/spider-rs/spider/compare/ |
| `v2.45.28` | 2026-03-02 | Low | ### Agent Hardening  - Cap LLM-controlled durations (Wait, ClickHold, SetViewport, OpenPage) - Add `js_escape()` for safe JS string interpolation in action handlers - Wrap `Navigate` and screenshot calls with timeouts - Use `PageWaitStrategy::Load` for `WaitForNavigation` instead of fixed sleep - Replace `eval_with_timeout` for Fill/Type/Clear actions with error propagation - Improve semaphore and logging diagnostics on error paths |
| `v2.45.24` | 2026-02-21 | Low | ## What's New  ### Performance - **Cache-first fast path** — skip browser/HTTP entirely when cache has data (~5-50ms vs 1-3s) - **Deferred Chrome** — process multi-page crawls from cache before launching a browser - **Work-stealing (hedged requests)** — parallel retry for slow crawl requests - **io_uring** — StreamingWriter for high-throughput file I/O on Linux  ### Agent - **Per-round model pool routing** — route cheap rounds to fast models, complex rounds to capable ones - **Comprehensive rout |
| `v2.45.20` | 2026-02-05 | Low | ## What's New  ### Relevance Gate for Remote Multimodal Crawling  Added a `relevance_gate` config that instructs the LLM to return a `"relevant": true\|false` field in its JSON response. When a page is deemed irrelevant, its wildcard budget credit is refunded so the crawler discovers more relevant content.  **New config fields:** - `relevance_gate: bool` — enables the feature - `relevance_prompt: Option<String>` — optional custom relevance criteria  **How it works:** 1. When enabled, t |

## Citation

- HTML: https://www.freshcrate.ai/projects/spider
- Markdown: https://www.freshcrate.ai/projects/spider.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/spider/deps

_Generated by freshcrate.ai. Indexes github releases for AI-agent ecosystem packages._
