Description
<p align="center">
  <a href="https://github.com/pymupdf/pymupdf4llm">
    <img loading="lazy" alt="PyMuPDF logo" src="https://pymupdf.readthedocs.io/en/latest/_static/sidebar-logo-light.svg" width="150px" height="auto" />
  </a>
</p>

# PyMuPDF4LLM

**PyMuPDF4LLM** is a lightweight extension for **PyMuPDF** that turns documents into clean, structured data with minimal setup. It includes layout analysis *without* any GPU requirement.

**PyMuPDF4LLM** makes it easy to extract document content in the format you need for **LLM** & **RAG** environments. It supports structured data extraction to **Markdown**, **JSON** and **TXT**, as well as [LlamaIndex](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/index.html#with-llamaindex) and [LangChain](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/index.html#with-langchain) integration.

## Features

- Parsing of [multiple document formats](https://pymupdf.readthedocs.io/en/latest/about.html#feature-matrix).
- Export of structured data as Markdown, JSON and plain text.
- Support for multi-column pages.
- Support for image and vector graphics extraction.
- Layout analysis for better semantic understanding of document structure.
- Support for page chunking output.
- Integration with popular AI frameworks.

## Installation

```bash
$ pip install -U pymupdf4llm
```

> This command will automatically install or upgrade [PyMuPDF](https://github.com/pymupdf/PyMuPDF) as required.

## Execution

### Markdown

```python
import pathlib

import pymupdf4llm

md_text = pymupdf4llm.to_markdown("input.pdf")

# Now work with the output data, e.g. store it as a UTF-8 encoded file.
pathlib.Path("output.md").write_text(md_text, encoding="utf-8")
```

### JSON

```python
import pathlib

import pymupdf4llm

json_text = pymupdf4llm.to_json("input.pdf")

# Now work with the output data, e.g. store it as a UTF-8 encoded file.
pathlib.Path("output.json").write_text(json_text, encoding="utf-8")
```

### Plain Text

```python
import pathlib

import pymupdf4llm

plain_text = pymupdf4llm.to_text("input.pdf")

# Now work with the output data, e.g. store it as a UTF-8 encoded file.
pathlib.Path("output.txt").write_text(plain_text, encoding="utf-8")
```

## Documentation

Check out the [PyMuPDF4LLM documentation](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) for details on installation, features, sample code and the [full API](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/api.html).

## Examples

Find our [examples on GitHub](https://github.com/pymupdf/pymupdf4llm/tree/main/examples).

## Integrations

For your AI application development, check out our [integrations](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/index.html#integrations) with popular frameworks.

## Support

You can get support for PyMuPDF4LLM via a number of options:

- [GitHub Issue Board](https://github.com/pymupdf/pymupdf4llm/issues)
- [Discord](https://discord.gg/7pH3gqcRtg)
- [MuPDF Forum](https://forum.mupdf.com)
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 1.27.2.2 | Imported from PyPI (1.27.2.2) | Low | 4/21/2026 |
| v0.3.4 | For change details please consult [this](https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md) file. | Low | 2/14/2026 |
| v0.3.3 | For change details please consult [this](https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md) file. | Low | 2/13/2026 |
| 0.2.9 | For a description of changes see [this](https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md) file. | Low | 1/9/2026 |
| 0.2.8 | For a summary of changes see file https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md | Low | 1/4/2026 |
| 0.2.7 | For details of the changes see [this](https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md) file. | Low | 12/7/2025 |
| v0.2.6 | Release v0.2.6 | Low | 12/3/2025 |
| 0.2.5 | See detailed descriptions of the changes in this file: https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md | Low | 11/30/2025 |
| 0.2.4 | For a list of changes see document https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md. | Low | 11/25/2025 |
| 0.2.3 | For details on changes see file https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md | Low | 11/24/2025 |
| 0.2.2 | Some hotfixes. | Low | 11/17/2025 |
| 0.2.1 | For changes see file https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md | Low | 11/17/2025 |
| 0.2.0 | Improved reading sequence evaluation for the layout execution path. No longer runs `import pymupdf.layout`, so native mode works even when PyMuPDF-Layout is installed. | Low | 11/10/2025 |
| 0.1.9 | Release 0.1.9 | Low | 11/7/2025 |
| 0.1.8 | Fix some markdown and text output bugs. | Low | 11/6/2025 |
| v0.1.7 | This version represents a major change, as it supports the new PyMuPDF-Layout package. | Low | 11/5/2025 |
| v0.0.27 | For changes see file https://github.com/pymupdf/RAG/blob/main/CHANGES.md | Low | 7/19/2025 |
| v0.0.26 | For changes see file https://github.com/pymupdf/RAG/blob/main/CHANGES.md | Low | 7/2/2025 |
| v0.0.25 | For changes see file https://github.com/pymupdf/RAG/blob/main/CHANGES.md | Low | 6/13/2025 |
| v0.0.24 | For a list of changes see [this](https://github.com/pymupdf/RAG/blob/main/CHANGES.md) file. | Low | 5/10/2025 |
| v0.0.23 | For a list of changes see [this](https://github.com/pymupdf/RAG/blob/main/CHANGES.md) file. | Low | 5/9/2025 |
| v0.0.22 | See [this](https://github.com/pymupdf/RAG/blob/main/CHANGES.md) file. | Low | 4/28/2025 |
| v0.0.21 | For changes in this release see [this](https://github.com/pymupdf/RAG/blob/main/CHANGES.md) file. | Low | 4/8/2025 |
| v0.0.20 | For a description of changes, please [see](https://github.com/pymupdf/RAG/blob/main/CHANGES.md) this file in the repository. | Low | 4/4/2025 |
| v0.0.19 | For fixes and changes see this [file](https://github.com/pymupdf/RAG/blob/main/CHANGES.md). | Low | 3/31/2025 |
| v0.0.17 | Fixes #147, #81, #78 | Low | 9/21/2024 |
| v0.0.16 | Hotfixes #140 | Low | 9/16/2024 |
| v0.0.15 | Fixes issues #138, #135, #134, #132, #128. | Low | 9/16/2024 |
| v0.0.14 | Release v0.0.14 | Low | 9/4/2024 |
| v0.0.13 | Release v0.0.13 | Low | 8/31/2024 |
| v0.0.12 | Fix: make list of bullets a tuple. | Low | 8/23/2024 |
| v0.0.11 | Release v0.0.11 | Low | 8/22/2024 |
| v0.0.10 | Release v0.0.10 | Low | 7/21/2024 |
| v0.0.9 | See changes.rst for a description of the changes. | Low | 7/11/2024 |
| v0.0.8 | Release v0.0.8 | Low | 7/5/2024 |
