# kreuzberg

> A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 91+ formats.

- **URL**: https://www.freshcrate.ai/projects/kreuzberg
- **Author**: kreuzberg-dev
- **Category**: MCP Servers
- **Latest version**: `v4.9.9` (2026-06-05)
- **License**: NOASSERTION
- **Source**: https://github.com/kreuzberg-dev/kreuzberg
- **Homepage**: https://kreuzberg.dev/
- **Language**: Rust
- **GitHub**: 7,618 stars, 380 forks
- **Registry**: github
- **Tags**: `bun`, `csharp`, `document-intelligence`, `elixir`, `ffi`, `golang`, `java`, `metadata-extraction`, `rag`, `rust`

## Description

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 91+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or usable via CLI, REST API, or MCP server.

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `v4.9.9` | 2026-06-05 | High | **Full Changelog**: https://github.com/kreuzberg-dev/kreuzberg/commits/v4.9.9 |
| `v4.9.8` | 2026-05-17 | High | LTS patch release. Four targeted bug fixes plus dependency pinning so the branch builds against current crates.io releases. **Fixed:** **#934**: RTF hex byte escapes now honor `\ansicpgNNNN`, so CP1251 Cyrillic byte runs decode as readable text instead of Windows-1252 mojibake. **#937**: `ExtractionConfig(cancel_token=…)` raised `TypeError: unexpected keyword argument 'cancel_token'` from Python despite the type stub declaring the kwarg. The `#[pyo3(signature = …)]` on `ExtractionConfig::_`… |
| `v4.9.7` | 2026-05-08 | High | **Full Changelog**: https://github.com/kreuzberg-dev/kreuzberg/compare/v4.9.6...v4.9.7 |
| `v4.9.6` | 2026-05-08 | High | **Full Changelog**: https://github.com/kreuzberg-dev/kreuzberg/compare/v4.9.5...v4.9.6 |
| `v4.9.5` | 2026-04-23 | High | **Fixed:** **#790**: Fix GPU acceleration: kreuzberg now bundles CPU-only ONNX Runtime by default (zero-config). When a GPU execution provider (`cuda`, `tensorrt`, `coreml`) is explicitly requested via `AccelerationConfig` but unavailable, kreuzberg returns an error with setup instructions instead of silently falling back to CPU. `Auto` mode gracefully falls back to CPU with an info log. For GPU support, set `ORT_DYLIB_PATH` to a GPU-enabled ONNX Runtime. **#791**: Fix DOCX OCR extraction… |
| `v4.9.3` | 2026-04-22 | High | See [CHANGELOG.md](https://github.com/kreuzberg-dev/kreuzberg/blob/main/CHANGELOG.md#493---2026-04-22) for full details. |
| `v4.9.2` | 2026-04-19 | High | **Fixed:** Fix cancellation token not checked in WASM (non-tokio) path for Excel, DOC, PPT, Pages, Keynote, and Numbers extractors (cancellation was silently ignored in WASM builds); propagate `Cancelled` error code (9) to all bindings: Go, C FFI, Python, TypeScript, Java, C#, and C API docs now include the new code; fix PHP e2e embed tests calling instance methods statically: use procedural `\Kreuzberg\embed()` functions; fix TypeScript e2e embed tests using wrong field names (`type`/`nam`… |
| `v4.9.1` | 2026-04-19 | High | **Fixed:** **#754**: Preserve `_internal_bindings.pyi` type stub during wheel artifact cleanup, so published wheels now include inline type information for the core binding module; add missing `Default` impl for `PyCancellationToken` to satisfy clippy `new_without_default` lint; improve download resilience for `eng.traineddata` in build script: increase retries from 3 to 5, add fallback URL via `raw.githubusercontent.com`, and increase timeout to 300s; increase Task installer retry resilience… |
| `v4.9.0` | 2026-04-18 | High | **What's Changed:** Fix duplicated heading in markdown chunker with `prepend_heading_context` by @tobocop2 in https://github.com/kreuzberg-dev/kreuzberg/pull/701; chore(deps): bump pnpm/action-setup from 5 to 6 by @dependabot[bot] in https://github.com/kreuzberg-dev/kreuzberg/pull/698; chore(deps): bump actions/upload-pages-artifact from 4 to 5 by @dependabot[bot] in https://github.com/kreuzberg-dev/kreuzberg/pull/711; fix: remove duplicate `output_format` key and fix numeric types in OCR metadat… |
| `v4.8.6` | 2026-04-17 | High | **What's Changed:** Fix duplicated heading in markdown chunker with `prepend_heading_context` by @tobocop2 in https://github.com/kreuzberg-dev/kreuzberg/pull/701; chore(deps): bump pnpm/action-setup from 5 to 6 by @dependabot[bot] in https://github.com/kreuzberg-dev/kreuzberg/pull/698; chore(deps): bump actions/upload-pages-artifact from 4 to 5 by @dependabot[bot] in https://github.com/kreuzberg-dev/kreuzberg/pull/711; fix: remove duplicate `output_format` key and fix numeric types in OCR metadat… |
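
The `v4.9.8` note on **#934** describes RTF `\'hh` hex byte escapes being decoded with the code page declared by `\ansicpgNNNN` rather than always as Windows-1252. The sketch below illustrates that general idea in plain Python. It is not kreuzberg's implementation: `decode_hex_escapes` and its parameters are hypothetical names chosen for illustration, and it assumes a single-byte code page that Python ships a `cpNNNN` codec for.

```python
import re

# Matches the \ansicpgNNNN control word that declares the ANSI code page.
_ANSICPG = re.compile(r"\\ansicpg(\d+)")
# Matches a run of one or more consecutive \'hh hex byte escapes.
_HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")
# Extracts the two hex digits from each escape inside a run.
_HEX_BYTE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_escapes(rtf: str, default_codepage: int = 1252) -> str:
    r"""Replace \'hh escapes with text decoded via the declared code page.

    Hypothetical helper for illustration only. Falls back to
    `default_codepage` when no \ansicpg control word is present.
    """
    match = _ANSICPG.search(rtf)
    codepage = int(match.group(1)) if match else default_codepage
    # Assumption: Python has a codec named cpNNNN for this code page;
    # an unknown code page would raise LookupError on decode.
    encoding = f"cp{codepage}"

    def replace_run(run: re.Match) -> str:
        # Collect the whole run into bytes first, then decode once.
        raw = bytes(int(h, 16) for h in _HEX_BYTE.findall(run.group(0)))
        return raw.decode(encoding, errors="replace")

    return _HEX_RUN.sub(replace_run, rtf)


# The same six bytes, with and without a CP1251 declaration.
cyrillic = decode_hex_escapes(r"{\rtf1\ansi\ansicpg1251 \'cf\'f0\'e8\'e2\'e5\'f2}")
mojibake = decode_hex_escapes(r"{\rtf1\ansi \'cf\'f0\'e8\'e2\'e5\'f2}")

print(cyrillic)  # {\rtf1\ansi\ansicpg1251 Привет}
print(mojibake)  # {\rtf1\ansi Ïðèâåò}
```

The second call shows the Windows-1252 mojibake the release note mentions: identical bytes, but decoded with the default code page because no `\ansicpg` control word is present. A real RTF reader also has to handle `\uN` Unicode escapes, per-font character sets, and multi-byte code pages, none of which this sketch attempts.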

## Citation

- HTML: https://www.freshcrate.ai/projects/kreuzberg
- Markdown: https://www.freshcrate.ai/projects/kreuzberg.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/kreuzberg/deps

_Generated by freshcrate.ai. Indexes GitHub releases for AI-agent ecosystem packages._
