# pymupdf-layout

> PyMuPDF Layout turns PDFs into structured data 10× faster than vision-based tools using AI trained on PDF internals, not images. CPU-only. No GPU required.

- **URL**: https://www.freshcrate.ai/projects/pymupdf-layout
- **Author**: Artifex
- **Category**: Developer Tools
- **Latest version**: `1.27.2.2` (2026-04-21)
- **License**: non-standard
- **Homepage**: https://pypi.org/project/pymupdf-layout/
- **Language**: Python
- **Registry**: pypi (`pymupdf-layout`)
- **Tags**: `pypi`

## Description

# PyMuPDF Layout

**PyMuPDF Layout** is a fast, lightweight layout analysis package for Python. It integrates with PyMuPDF to produce clean, structured data from PDFs. It is accurate and, unlike vision-based models, does not need a GPU.

While other tools train machine learning models on rendered page images, PyMuPDF Layout trains Graph Neural Networks directly on PDF internals. This yields accurate results at 10× the speed of vision-based tools, using CPU resources only.

[![License PolyForm Noncommercial](https://img.shields.io/badge/license-Polyform_Noncommercial-purple)](https://polyformproject.org/licenses/noncommercial/1.0.0/)
[![Python version](https://img.shields.io/badge/python-3.11+-blue)](https://pypi.org/project/pymupdf-layout/) [![Docs](https://img.shields.io/badge/docs-RTD-green)](https://pymupdf.readthedocs.io/en/latest/pymupdf-layout/index.html)
[![Discord](https://img.shields.io/discord/770681584617652264?color=6A7EC2&logo=discord&logoColor=ffffff)](https://discord.gg/ppTFv8uJ46)

## Features

- 📚 Structured data extraction from your documents in Markdown, JSON or TXT format
- 🧐 Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
- 🔍 Detect and isolate header and footer patterns on each page
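
As an illustration of the header and footer feature, here is a minimal sketch that asks for Markdown without the detected page headers and footers. It assumes that `pymupdf4llm.to_markdown` accepts boolean `header` and `footer` keyword arguments when PyMuPDF Layout is active; verify this against the documentation for your installed version. The names `BODY_ONLY` and `body_markdown` are hypothetical helpers, not part of either package.

```python
# Keyword arguments ASSUMED to be accepted by pymupdf4llm.to_markdown when
# PyMuPDF Layout is active; check the docs for your installed version.
BODY_ONLY = {"header": False, "footer": False}


def body_markdown(pdf_path: str) -> str:
    """Return Markdown for a PDF with detected page headers and footers left out."""
    # Imported lazily so this module can be loaded without the packages installed.
    import pymupdf.layout  # noqa: F401  (imported before pymupdf4llm to activate layout analysis)
    import pymupdf4llm

    doc = pymupdf.open(pdf_path)
    try:
        return pymupdf4llm.to_markdown(doc, **BODY_ONLY)
    finally:
        doc.close()
```

Calling `body_markdown("your.pdf")` would then return the page body text only, provided the assumed keyword arguments are supported.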


## Usage

**PyMuPDF Layout** works alongside PyMuPDF4LLM's `to_markdown` method. Once PyMuPDF Layout is activated, just call `to_markdown` and PyMuPDF Layout will work behind the scenes to analyse documents and deliver improved results.

To get the data as JSON or plain text instead, use `to_json` or `to_text`.

### Extract structured data

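``` python
import pymupdf.layout  # activates PyMuPDF Layout; imported before pymupdf4llm
import pymupdf4llm

source = "your.pdf"
doc = pymupdf.open(source)

# Extract the same document in each supported output format.
md = pymupdf4llm.to_markdown(doc)
json_text = pymupdf4llm.to_json(doc)  # named json_text to avoid shadowing the stdlib json module
txt = pymupdf4llm.to_text(doc)
```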
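
The snippet above keeps the results in memory. As a rough sketch of a next step, the following writes all three formats to disk. It assumes that `to_markdown`, `to_json` and `to_text` each return a string; `output_paths` and `export_all` are hypothetical helper names introduced here for illustration only.

```python
from pathlib import Path


def output_paths(pdf_path: str, out_dir: str) -> dict[str, Path]:
    """Map each output format to a file in out_dir, named after the PDF."""
    stem = Path(pdf_path).stem
    base = Path(out_dir)
    return {ext: base / f"{stem}.{ext}" for ext in ("md", "json", "txt")}


def export_all(pdf_path: str, out_dir: str = "out") -> dict[str, Path]:
    """Write Markdown, JSON and plain-text extractions of one PDF to disk."""
    # Imported lazily so the path helper above works without the packages installed.
    import pymupdf.layout  # noqa: F401  (imported before pymupdf4llm to activate layout analysis)
    import pymupdf4llm

    paths = output_paths(pdf_path, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    doc = pymupdf.open(pdf_path)
    try:
        # ASSUMPTION: each call returns the extracted content as a str.
        results = {
            "md": pymupdf4llm.to_markdown(doc),
            "json": pymupdf4llm.to_json(doc),
            "txt": pymupdf4llm.to_text(doc),
        }
    finally:
        doc.close()

    for ext, text in results.items():
        paths[ext].write_text(text, encoding="utf-8")
    return paths
```

For example, `export_all("your.pdf")` would create `out/your.md`, `out/your.json` and `out/your.txt`. Keeping the path logic in its own function makes the naming scheme easy to change or test without opening a PDF.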

## Try It!

Try **PyMuPDF Layout** on [our PyMuPDF website](https://pymupdf.io).

## Documentation

See the [PyMuPDF Layout documentation page](https://pymupdf.readthedocs.io/en/latest/pymupdf-layout/index.html) for more.

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `1.27.2.2` | 2026-04-21 | Low | Imported from PyPI (1.27.2.2) |

## Citation

- HTML: https://www.freshcrate.ai/projects/pymupdf-layout
- Markdown: https://www.freshcrate.ai/projects/pymupdf-layout.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/pymupdf-layout/deps

_Generated by freshcrate.ai. Indexes pypi releases for AI-agent ecosystem packages._
