pymupdf-layout

PyMuPDF Layout turns PDFs into structured data 10× faster than vision-based tools using AI trained on PDF internals, not images. CPU-only. No GPU required.

pypi


Description

# PyMuPDF Layout

**PyMuPDF Layout** is a fast, lightweight layout analysis Python package integrated with PyMuPDF for clean, structured data output from PDFs. It is fast and accurate, and unlike vision-based models it does not need GPUs.

While other tools train machine learning models on rendered page images, PyMuPDF Layout trains Graph Neural Networks directly on PDF internals. This gives us accuracy at 10× the speed, using CPU-only resources.

[![License PolyForm Noncommercial](https://img.shields.io/badge/license-Polyform_Noncommercial-purple)](https://polyformproject.org/licenses/noncommercial/1.0.0/)
[![Python version](https://img.shields.io/badge/python-3.11+-blue)](https://pypi.org/project/pymupdf-layout/)
[![Docs](https://img.shields.io/badge/docs-RTD-green)](https://polyformproject.org/licenses/noncommercial/1.0.0/)
[![Discord](https://img.shields.io/discord/770681584617652264?color=6A7EC2&logo=discord&logoColor=ffffff)](https://discord.gg/ppTFv8uJ46)

## Features

- 📚 Structured data extraction from your documents in Markdown, JSON or TXT format
- 🧐 Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
- 🔍 Detect and isolate header and footer patterns on each page

## Usage

**PyMuPDF Layout** works alongside PyMuPDF4LLM's `to_markdown` method. Once PyMuPDF Layout is activated, just use `to_markdown` and PyMuPDF Layout will work behind the scenes to analyse documents and deliver improved results. You can also get the data in `JSON` or `TXT` format with `to_json` or `to_text`.

### Extract Structured Data

```python
# Importing pymupdf.layout activates PyMuPDF Layout
import pymupdf.layout
import pymupdf4llm

source = "your.pdf"
doc = pymupdf.open(source)

# Each call analyses the same document and returns a different output format
md = pymupdf4llm.to_markdown(doc)
json_output = pymupdf4llm.to_json(doc)
txt = pymupdf4llm.to_text(doc)
```

## Try It!

Try **PyMuPDF Layout** on [our PyMuPDF website](https://pymupdf.io).

## Documentation

See the [PyMuPDF Layout documentation page](https://pymupdf.readthedocs.io/en/latest/pymupdf-layout/index.html) for more.
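
As a rough sketch of how the three calls from the Usage section might be wrapped in a small script, the block below collects the Markdown, JSON and text renderings and writes them next to each other on disk. The function names `extract_all` and `save_outputs` are hypothetical helpers, not part of PyMuPDF Layout or PyMuPDF4LLM, and the sketch assumes each `to_*` call returns a plain string when invoked with default arguments.

```python
from pathlib import Path


def extract_all(pdf_path):
    """Return Markdown, JSON and plain-text renderings of one PDF."""
    # Imported lazily so save_outputs() below can be used without PyMuPDF.
    # Same import order as the Usage example: pymupdf.layout first.
    import pymupdf.layout
    import pymupdf4llm

    doc = pymupdf.open(pdf_path)
    try:
        # Assumption: each call returns a str with default arguments.
        return {
            "md": pymupdf4llm.to_markdown(doc),
            "json": pymupdf4llm.to_json(doc),
            "txt": pymupdf4llm.to_text(doc),
        }
    finally:
        doc.close()


def save_outputs(outputs, out_dir, stem):
    """Write each rendering to <out_dir>/<stem>.<ext> (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for ext, text in outputs.items():
        target = out_dir / f"{stem}.{ext}"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Calling `save_outputs(extract_all("your.pdf"), "out", "your")` would then be expected to produce `out/your.md`, `out/your.json` and `out/your.txt`. Keeping the extraction and the file writing in separate functions lets the writing step be reused or tested without a PDF at hand.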

Release History

Version	Changes	Urgency	Date
1.27.2.2	Imported from PyPI (1.27.2.2)	Low	4/21/2026

More from Artifex

pymupdf4llm: PyMuPDF Utilities for LLM/RAG
