pymupdf-layout
PyMuPDF Layout turns PDFs into structured data 10× faster than vision-based tools using AI trained on PDF internals, not images. CPU-only. No GPU required.
Description
# PyMuPDF Layout

**PyMuPDF Layout** is a fast, lightweight layout analysis Python package integrated with PyMuPDF that produces clean, structured data from PDFs. Unlike vision-based models, it does not need a GPU.

While other tools train machine learning models on rendered page images, PyMuPDF Layout trains Graph Neural Networks directly on PDF internals. This delivers accurate results at 10× the speed using CPU-only resources.

[License: PolyForm Noncommercial 1.0.0](https://polyformproject.org/licenses/noncommercial/1.0.0/) | [PyPI](https://pypi.org/project/pymupdf-layout/) | [Discord](https://discord.gg/ppTFv8uJ46)

## Features

- Structured data extraction from your documents in Markdown, JSON or TXT format
- Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
- Detect and isolate header and footer patterns on each page

## Usage

**PyMuPDF Layout** works alongside PyMuPDF4LLM's `to_markdown` method. Once PyMuPDF Layout is activated (by importing `pymupdf.layout`), just call `to_markdown` and PyMuPDF Layout works behind the scenes to analyse documents and deliver improved results. You can also get the data in JSON or TXT format with `to_json` or `to_text`.

### Extract structured data

```python
import pymupdf.layout  # activates PyMuPDF Layout; keep this import before pymupdf4llm
import pymupdf4llm

source = "your.pdf"
doc = pymupdf.open(source)

md = pymupdf4llm.to_markdown(doc)   # Markdown output
json_text = pymupdf4llm.to_json(doc)  # JSON output (named to avoid shadowing the json module)
txt = pymupdf4llm.to_text(doc)      # plain-text output
```

## Try It!

Try **PyMuPDF Layout** on [our PyMuPDF website](https://pymupdf.io).

## Documentation

See the [PyMuPDF Layout documentation page](https://pymupdf.readthedocs.io/en/latest/pymupdf-layout/index.html) for more.
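The three extraction calls from the Usage section can be wrapped so that one PDF produces a Markdown, a JSON and a text file side by side. The following is a minimal sketch, not part of the package: `write_outputs` and `extract_all` are hypothetical helper names, and it assumes each of `to_markdown`, `to_json` and `to_text` returns a string.

```python
from __future__ import annotations

from pathlib import Path


def write_outputs(outputs: dict[str, str], out_dir: str | Path, stem: str) -> list[Path]:
    """Write each extracted string to <out_dir>/<stem>.<extension>."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for extension, text in outputs.items():
        path = target / f"{stem}.{extension}"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


def extract_all(source: str) -> dict[str, str]:
    """Run the three extractors from the Usage section on one PDF.

    Hypothetical helper: assumes each call returns a string.
    """
    # Imported inside the function so write_outputs stays usable
    # even where the PyMuPDF packages are not installed.
    import pymupdf.layout  # same import order as the Usage example
    import pymupdf4llm

    doc = pymupdf.open(source)
    return {
        "md": pymupdf4llm.to_markdown(doc),
        "json": pymupdf4llm.to_json(doc),
        "txt": pymupdf4llm.to_text(doc),
    }
```

With both packages installed, `write_outputs(extract_all("your.pdf"), "out", "your")` would write `out/your.md`, `out/your.json` and `out/your.txt`. Keeping the file writing separate from the extraction makes the writer easy to check without a PDF at hand.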
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 1.27.2.2 | Imported from PyPI (1.27.2.2) | Low | 4/21/2026 |
