freshcrate
pymupdf4llm

PyMuPDF Utilities for LLM/RAG

Description

<p align="center">
  <a href="https://github.com/pymupdf/pymupdf4llm">
    <img loading="lazy" alt="PyMuPDF logo" src="https://pymupdf.readthedocs.io/en/latest/_static/sidebar-logo-light.svg" width="150px" height='auto' />
  </a>
</p>

# PyMuPDF4LLM

[![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/)
[![License AGPL](https://img.shields.io/badge/license-AGPL-green)](https://artifex.com/licensing/gnu-agpl-v3)
[![PyPI Downloads](https://static.pepy.tech/badge/pymupdf4llm/month)](https://pepy.tech/projects/pymupdf4llm)
[![Discord](https://img.shields.io/discord/1460622234811895872?color=6A7EC2&logo=discord&logoColor=ffffff)](https://discord.gg/7pH3gqcRtg)

**PyMuPDF4LLM** is a lightweight extension for **PyMuPDF** that turns documents into clean, structured data with minimal setup. It includes layout analysis *without* any GPU requirement.

**PyMuPDF4LLM** makes it easy to extract document content in the format you need for **LLM** & **RAG** environments. It supports structured data extraction to **Markdown**, **JSON** and **TXT**, as well as [LlamaIndex](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/index.html#with-llamaindex) and [LangChain](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/index.html#with-langchain) integration.

## Features

- Parsing of [multiple document formats](https://pymupdf.readthedocs.io/en/latest/about.html#feature-matrix).
- Export of structured data as Markdown, JSON and plain text.
- Support for multi-column pages.
- Support for image and vector graphics extraction.
- Layout analysis for better semantic understanding of document structure.
- Support for page chunking output.
- Integration with popular AI frameworks.

## Installation

```bash
$ pip install -U pymupdf4llm
```

> This command will automatically install or upgrade [PyMuPDF](https://github.com/pymupdf/PyMuPDF) as required.
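To confirm that the install step pulled in both packages, a small check along the following lines may help. This is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the distribution names `pymupdf4llm` and `PyMuPDF` are assumed to match what `pip` reports on your system.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report what is installed; names are assumptions based on the pip command above.
for name in ("pymupdf4llm", "PyMuPDF"):
    print(name, installed_version(name) or "not installed")
```

If either line prints "not installed", rerunning the `pip install -U pymupdf4llm` command in the active environment is the first thing to try.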
## Execution

### Markdown

```python
import pathlib

import pymupdf4llm

md_text = pymupdf4llm.to_markdown("input.pdf")

# now work with the output data, e.g. store as a UTF8-encoded file
pathlib.Path("output.md").write_text(md_text, encoding="utf-8")
```

### JSON

```python
import pathlib

import pymupdf4llm

json_text = pymupdf4llm.to_json("input.pdf")

# now work with the output data, e.g. store as a UTF8-encoded file
pathlib.Path("output.json").write_text(json_text, encoding="utf-8")
```

### Plain Text

```python
import pathlib

import pymupdf4llm

plain_text = pymupdf4llm.to_text("input.pdf")

# now work with the output data, e.g. store as a UTF8-encoded file
pathlib.Path("output.txt").write_text(plain_text, encoding="utf-8")
```

## Documentation

Check out the [PyMuPDF4LLM documentation](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) for details on installation, features, sample code and the [full API](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/api.html).

## Examples

Find our [examples on GitHub](https://github.com/pymupdf/pymupdf4llm/tree/main/examples).

## Integrations

For your AI application development, check out our [integrations](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/index.html#integrations) with popular frameworks.

## Support

You can get support for PyMuPDF4LLM via a number of options:

- [GitHub Issue Board](https://github.com/pymupdf/pymupdf4llm/issues)
- [Discord](https://discord.gg/7pH3gqcRtg)
- [MuPDF Forum](https://forum.mupdf.com)
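The page chunking output listed under Features can feed a RAG index one page at a time. The sketch below assumes that `pymupdf4llm.to_markdown` accepts `page_chunks=True` and then returns a list of dictionaries, each carrying a `"text"` key. Check the API documentation for the exact chunk layout in your installed version. The helper names `chunks_to_records` and `load_page_chunks` are hypothetical, and the sample data is made up for illustration.

```python
from typing import Any, Dict, List


def chunks_to_records(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep non-empty page chunks as small records ready for indexing."""
    records = []
    for position, chunk in enumerate(chunks, start=1):
        # Assumption: each chunk stores its Markdown under the "text" key.
        text = chunk.get("text", "").strip()
        if text:
            records.append({"position": position, "text": text})
    return records


def load_page_chunks(path: str) -> List[Dict[str, Any]]:
    """Extract per-page chunks from a document (requires pymupdf4llm)."""
    import pymupdf4llm  # imported lazily so chunks_to_records works without it

    # Assumption: page_chunks=True yields one dictionary per page.
    return pymupdf4llm.to_markdown(path, page_chunks=True)


# Stand-in data shaped like the assumed chunk layout; the second page is blank.
sample = [
    {"text": "# Title\n\nFirst page."},
    {"text": "   "},
    {"text": "Second page."},
]
records = chunks_to_records(sample)
print(len(records))
```

With a real file, `chunks_to_records(load_page_chunks("input.pdf"))` would produce the same kind of records. Blank pages are skipped, so `position` values may have gaps.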

Release History

| Version | Changes | Urgency | Date |
| --- | --- | --- | --- |
| 1.27.2.2 | Imported from PyPI (1.27.2.2) | Low | 4/21/2026 |
| v0.3.4 | For change details please consult [this](https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md) file. | Low | 2/14/2026 |
| v0.3.3 | For change details please consult [this](https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md) file. | Low | 2/13/2026 |
| 0.2.9 | For a description of changes see [this](https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md) file. | Low | 1/9/2026 |
| 0.2.8 | For a summary of changes see file https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md | Low | 1/4/2026 |
| 0.2.7 | For details of the changes see [this](https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md) file. | Low | 12/7/2025 |
| v0.2.6 | Release v0.2.6 | Low | 12/3/2025 |
| 0.2.5 | See detailed descriptions of the changes in this file: https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md | Low | 11/30/2025 |
| 0.2.4 | For a list of changes see document https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md. | Low | 11/25/2025 |
| 0.2.3 | For details on changes see file https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md | Low | 11/24/2025 |
| 0.2.2 | Some hotfixes. | Low | 11/17/2025 |
| 0.2.1 | For changes see file https://github.com/pymupdf/pymupdf4llm/blob/main/CHANGES.md | Low | 11/17/2025 |
| 0.2.0 | Improved reading sequence evaluation for the layout execution path. No longer runs `import pymupdf.layout`, so native mode still works when PyMuPDF-Layout is installed. | Low | 11/10/2025 |
| 0.1.9 | Release 0.1.9 | Low | 11/7/2025 |
| 0.1.8 | Fix some markdown and text output bugs. | Low | 11/6/2025 |
| v0.1.7 | This version represents a major change as it supports the new PyMuPDF-Layout package. | Low | 11/5/2025 |
| v0.0.27 | For changes see file https://github.com/pymupdf/RAG/blob/main/CHANGES.md | Low | 7/19/2025 |
| v0.0.26 | For changes see file https://github.com/pymupdf/RAG/blob/main/CHANGES.md | Low | 7/2/2025 |
| v0.0.25 | For changes see file https://github.com/pymupdf/RAG/blob/main/CHANGES.md | Low | 6/13/2025 |
| v0.0.24 | For a list of changes see [this](https://github.com/pymupdf/RAG/blob/main/CHANGES.md) file. | Low | 5/10/2025 |
| v0.0.23 | For a list of changes see [this](https://github.com/pymupdf/RAG/blob/main/CHANGES.md) file. | Low | 5/9/2025 |
| v0.0.22 | See [this](https://github.com/pymupdf/RAG/blob/main/CHANGES.md) file. | Low | 4/28/2025 |
| v0.0.21 | For changes in this release see [this](https://github.com/pymupdf/RAG/blob/main/CHANGES.md) file. | Low | 4/8/2025 |
| v0.0.20 | For a description of changes, please see [this file](https://github.com/pymupdf/RAG/blob/main/CHANGES.md) in the repository. | Low | 4/4/2025 |
| v0.0.19 | For fixes and changes see this [file](https://github.com/pymupdf/RAG/blob/main/CHANGES.md). | Low | 3/31/2025 |
| v0.0.17 | Fixes #147, #81, #78 | Low | 9/21/2024 |
| v0.0.16 | Hotfixes #140 | Low | 9/16/2024 |
| v0.0.15 | Fixes issues #138, #135, #134, #132, #128. | Low | 9/16/2024 |
| v0.0.14 | Release v0.0.14 | Low | 9/4/2024 |
| v0.0.13 | Release v0.0.13 | Low | 8/31/2024 |
| v0.0.12 | Fix: make list of bullets a tuple. | Low | 8/23/2024 |
| v0.0.11 | Release v0.0.11 | Low | 8/22/2024 |
| v0.0.10 | Release v0.0.10 | Low | 7/21/2024 |
| v0.0.9 | See changes.rst for a description of the changes. | Low | 7/11/2024 |
| v0.0.8 | Release v0.0.8 | Low | 7/5/2024 |

Similar Packages

| Package | Description | Version |
| --- | --- | --- |
| azure-search-documents | Microsoft Azure Cognitive Search Client Library for Python | azure-template_0.1.0b6187637 |
| apache-tvm-ffi | tvm ffi | 0.1.10 |
| luqum | A Lucene query parser generating ElasticSearch queries and more ! | 1.0.0 |
| torchao | Package for applying ao techniques to GPU models | 0.17.0 |
| banks | A prompt programming language | 2.4.1 |