# MinerU

> Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

- **URL**: https://www.freshcrate.ai/projects/MinerU
- **Author**: opendatalab
- **Category**: RAG & Memory
- **Latest version**: `mineru-3.2.2-released` (2026-06-02)
- **License**: AGPL-3.0
- **Source**: https://github.com/opendatalab/MinerU
- **Homepage**: https://opendatalab.github.io/MinerU/
- **Language**: Python
- **GitHub**: 60,769 stars, 5,085 forks
- **Registry**: github
- **Tags**: `ai4science`, `document-analysis`, `extract-data`, `layout-analysis`, `ocr`, `parser`, `pdf`, `pdf-converter`, `python`

## Description

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `mineru-3.2.2-released` | 2026-06-02 | High | ## What's Changed * #5033 fix: Enhance PDF processing and improve concurrency management by @myhloli in https://github.com/opendatalab/MinerU/pull/5062  * #5061 fix: add functionality to skip broken PDF pages during rewrite process by @myhloli in https://github.com/opendatalab/MinerU/pull/5064     **Full Changelog**: https://github.com/opendatalab/MinerU/compare/mineru-3.2.1-released...mineru-3.2.2-released |
| `mineru-3.2.0-released` | 2026-05-26 | High | ## What's Changed  MinerU 3.2.0 is now available. This release focuses on UI improvements, dependency optimization, VLM model updates, and general stability fixes.  - Improved the Gradio interface for a smoother upload, preview, and result-viewing experience. - Optimized project dependency management, trimming unnecessary dependencies to reduce the cost of installing and maintaining the runtime environment. - Updated the VLM model to version 2605, improving the parsing capability and stability of the vision-language model. - Fixed several known issues, improving overall stability and compatibility. |
| `mineru-3.1.15-released` | 2026-05-19 | High | ## What's Changed  * Improved Gradio preview and upload experience, including Office source-file preview links, clipboard file upload, clearer processing status, better i18n rendering, and extracted Gradio CSS/JS/header resources. * Fixed Gradio Markdown/HTML image previews to use served file URLs instead of embedded base64, improving preview compatibility without changing exported artifacts. * Improved Office parsing robustness, including DOCX table alignment, safer XML tag-name handling, e |
| `mineru-3.1.14-released` | 2026-05-15 | High | ## What's Changed  - Accuracy improvements:    - Optimized the `pdf_classify` classification pipeline.    - Tuned the contrast threshold boundary for span OCR.   **Full Changelog**: https://github.com/opendatalab/MinerU/compare/mineru-3.1.13-released...mineru-3.1.14-released |
| `mineru-3.1.11-released` | 2026-05-09 | High | ## What's Changed * perf: optimize table parsing performance in pipeline mode   **Full Changelog**: https://github.com/opendatalab/MinerU/compare/mineru-3.1.10-released...mineru-3.1.11-released |
| `mineru-3.1.7-released` | 2026-05-06 | High | ## What's Changed * feat: add Windows CUDA acceleration troubleshooting section to documentation  * feat: add MINERU_TASK_RESULT_TIMEOUT_SECONDS for configurable task process timeout * feat: add MINERU_TASK_RESULT_DOWNLOAD_TIMEOUT_SECONDS for configurable task result download timeout * feat: add Ascend NPU support for router multi-card deployment * feat: add sheet title as text in markdown output for xlsx multi-sheets  #4897   ## New Contributors * @crescenth made their first contributi |
| `mineru-3.1.6-released` | 2026-04-28 | High | ## What's Changed  - fix some office docs bugs   **Full Changelog**: https://github.com/opendatalab/MinerU/compare/mineru-3.1.5-released...mineru-3.1.6-released |
| `mineru-3.1.5-released` | 2026-04-27 | High | ## What's Changed * feat: implement asynchronous model retrieval and enhance timeout handling in API client by @myhloli in https://github.com/opendatalab/MinerU/pull/4857 * fix: specify maximum version for mlx dependency in pyproject.toml by @myhloli in https://github.com/opendatalab/MinerU/pull/4860   **Full Changelog**: https://github.com/opendatalab/MinerU/compare/mineru-3.1.4-released...mineru-3.1.5-released |
| `mineru-3.1.2-released` | 2026-04-22 | High | ## What's Changed fix: prevent abnormal server termination caused by excessively long PDF rendering time in router mode.   **Full Changelog**: https://github.com/opendatalab/MinerU/compare/mineru-3.1.1-released...mineru-3.1.2-released |
| `mineru-3.1.1-released` | 2026-04-20 | High | ## What's Changed * fix: Mitigate potential inference hangs on Ascend NPU platforms. by @myhloli in https://github.com/opendatalab/MinerU/pull/4821   **Full Changelog**: https://github.com/opendatalab/MinerU/compare/mineru-3.1.0-released...mineru-3.1.1-released |
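
The release tags in the table follow the pattern `mineru-<major>.<minor>.<patch>-released`. The following is a minimal sketch, assuming that pattern holds for every tag, of parsing tags into comparable tuples so a script can pick the newest release. `parse_release_tag` is a hypothetical helper, not part of MinerU.

```python
import re

# Pattern inferred from the tags listed in the table above (an assumption,
# not a documented MinerU convention).
TAG_RE = re.compile(r"^mineru-(\d+)\.(\d+)\.(\d+)-released$")


def parse_release_tag(tag: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) for a tag like 'mineru-3.2.2-released'."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


tags = [
    "mineru-3.1.7-released",
    "mineru-3.2.2-released",
    "mineru-3.1.15-released",
]

# Comparing numeric tuples orders 3.1.15 after 3.1.7; a plain string sort
# would put 3.1.15 first because "1" sorts before "7".
latest = max(tags, key=parse_release_tag)
print(latest)
```

Comparing tuples rather than raw strings matters here because the patch number in the 3.1.x series runs into two digits.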

## Dependency audit

- **Score**: 55/100
- **Total deps**: 56
- **Resolved**: 36
- **Unresolved**: 20
- **License conflicts**: 0
- **Warnings**: 19
- **Scanned**: 2026-05-04
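
The counts above are internally consistent: 36 resolved plus 20 unresolved gives the 56 total. The sketch below runs that check and computes the resolved share. How freshcrate.ai derives the 55/100 score is not stated here, so the code does not attempt to reproduce it.

```python
# Figures copied from the audit list above.
total_deps = 56
resolved = 36
unresolved = 20

# Sanity check: every dependency is counted as either resolved or unresolved.
assert resolved + unresolved == total_deps

# Fraction of dependencies the scan could resolve (36 / 56).
resolved_share = resolved / total_deps
print(f"resolved share: {resolved_share:.1%}")
```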

## Citation

- HTML: https://www.freshcrate.ai/projects/MinerU
- Markdown: https://www.freshcrate.ai/projects/MinerU.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/MinerU/deps
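
The three citation links share a base URL and differ only in their path. The following is a minimal sketch of building them for a project slug and fetching the dependencies JSON with the Python standard library. The helper names are hypothetical, the assumption that other project slugs follow the same URL layout is untested, and the shape of the JSON response is not documented here, so the fetch helper returns whatever `json.load` produces without interpreting it.

```python
import json
from urllib.request import urlopen

BASE = "https://www.freshcrate.ai"


def citation_urls(slug: str) -> dict[str, str]:
    """Build the HTML, Markdown, and dependencies-JSON URLs for a project slug."""
    return {
        "html": f"{BASE}/projects/{slug}",
        "markdown": f"{BASE}/projects/{slug}.md",
        "deps_json": f"{BASE}/api/projects/{slug}/deps",
    }


def fetch_deps(slug: str, timeout: float = 10.0):
    """Download and decode the dependencies JSON (response schema is not assumed)."""
    with urlopen(citation_urls(slug)["deps_json"], timeout=timeout) as resp:
        return json.load(resp)


urls = citation_urls("MinerU")
print(urls["deps_json"])
```

Calling `fetch_deps("MinerU")` performs a network request, so it is left out of the module-level code.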

_Generated by freshcrate.ai, which indexes GitHub releases for AI-agent ecosystem packages._
