docling
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.
Description
<p align="center">
  <a href="https://github.com/docling-project/docling">
    <img loading="lazy" alt="Docling" src="https://github.com/docling-project/docling/raw/main/docs/assets/docling_processing.png" width="100%"/>
  </a>
</p>

# Docling

<p align="center">
  <a href="https://trendshift.io/repositories/12132" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12132" alt="DS4SD%2Fdocling | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>

Docling simplifies document processing, parsing diverse formats (including advanced PDF understanding) and providing seamless integrations with the gen AI ecosystem.

## Features

* Parsing of [multiple document formats][supported_formats] incl. PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, WebVTT, images (PNG, TIFF, JPEG, ...), LaTeX, plain text, and more
* Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
* Unified, expressive [DoclingDocument][docling_document] representation format
* Various [export formats][supported_formats] and options, including Markdown, HTML, WebVTT, [DocTags](https://arxiv.org/abs/2503.11576) and lossless JSON
* Support of several application-specific XML schemas incl. [USPTO](https://www.uspto.gov/patents) patents, [JATS](https://jats.nlm.nih.gov/) articles, and [XBRL](https://www.xbrl.org/) financial reports
* Local execution capabilities for sensitive data and air-gapped environments
* Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
* Extensive OCR support for scanned PDFs and images
* Support of several Visual Language Models ([GraniteDocling](https://huggingface.co/ibm-granite/granite-docling-258M))
* Audio support with Automatic Speech Recognition (ASR) models
* Connect to any agent using the [MCP server](https://docling-project.github.io/docling/usage/mcp/)
* Simple and convenient CLI

### What's new

* Structured [information extraction][extraction] \[beta\]
* New layout model (**Heron**) by default, for faster PDF parsing
* [MCP server](https://docling-project.github.io/docling/usage/mcp/) for agentic applications
* Parsing of XBRL (eXtensible Business Reporting Language) documents for financial reports
* Parsing of WebVTT (Web Video Text Tracks) files and export to WebVTT format
* Parsing of LaTeX files
* Parsing of plain-text files (`.txt`, `.text`) and Markdown supersets (`.qmd`, `.Rmd`)
* Chart understanding (Barchart, Piechart, LinePlot): converting them into tables or code, or adding detailed descriptions

### Coming soon

* Metadata extraction, including title, authors, references & language
* Complex chemistry understanding (Molecular structures)

## Installation

To use Docling, install `docling` with your package manager, e.g. pip:

```bash
pip install docling
```

> **Note:** Python 3.9 support was dropped in docling version 2.70.0. Please use Python 3.10 or higher.
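
Once the package is installed, a first conversion might look like the sketch below. It assumes the `DocumentConverter` class from `docling.document_converter` and its `convert(...)` / `export_to_markdown()` methods. The helper names `python_is_supported` and `convert_to_markdown` are hypothetical, introduced here only to illustrate the Python 3.10+ requirement noted above.

```python
import sys

# Per the installation note, docling 2.70.0 dropped Python 3.9 support.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) meets the minimum for docling >= 2.70.0."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def convert_to_markdown(source: str) -> str:
    """Convert a local file path or URL to Markdown (hypothetical helper)."""
    if not python_is_supported():
        raise RuntimeError("docling >= 2.70.0 requires Python 3.10 or higher")
    # Imported lazily so the version check above runs before docling is loaded.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    # result.document is the unified DoclingDocument representation.
    return result.document.export_to_markdown()
```

For example, `print(convert_to_markdown("report.pdf"))` would print the Markdown export of a local file named `report.pdf` (a placeholder path). The first run is likely to download model weights, so expect it to take longer than later runs.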
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 2.90.0 | Imported from PyPI (2.90.0) | Low | 4/21/2026 |
| v2.90.0 | ### Feature * Implement GraniteVisionTableStructureModel for VLM-based table extraction ([#3323](https://github.com/docling-project/docling/issues/3323)) ([`1569e42`](https://github.com/docling-project/docling/commit/1569e42f8484f7abda8b8fb615e9c67d47e83855)) ### Fix * **latex:** Fully unwrap deeply nested formatting macros ([#3249](https://github.com/docling-project/docling/issues/3249)) ([`101233e`](https://github.com/docling-project/docling/commit/101233ebe211ece703605a16a23225da836e3c46)) | High | 4/17/2026 |
| v2.89.0 | ### Feature * Explicit TikZ environment handling in LaTeX backend ([#3187](https://github.com/docling-project/docling/issues/3187)) ([`a15c16e`](https://github.com/docling-project/docling/commit/a15c16e19fc9531e68916d15a1976ba76414c545)) ### Fix * **ocr:** Align RapidOCR english assets with 3.8 mobile models ([#3291](https://github.com/docling-project/docling/issues/3291)) ([`251c8b2`](https://github.com/docling-project/docling/commit/251c8b217a72453205242993e03ca8004cb2877e)) * **docx:** Iso | High | 4/16/2026 |
| v2.88.0 | ### Feature * **service:** Establish client SDK for docling serve ([#3264](https://github.com/docling-project/docling/issues/3264)) ([`42157a3`](https://github.com/docling-project/docling/commit/42157a3e100ae306f74938310018be3909cabf8c)) ### Fix * **ocr:** Support rapidocr 3.8 mobile model naming ([#3277](https://github.com/docling-project/docling/issues/3277)) ([`6b257ec`](https://github.com/docling-project/docling/commit/6b257ece330db9c39b8834b2b5a87b9c1eecb1fa)) ### Documentation * Add a | Medium | 4/13/2026 |
| v2.87.0 | ### Feature * **vlm:** Add Nanonets OCR2 onboarding ([#3274](https://github.com/docling-project/docling/issues/3274)) ([`9970d1e`](https://github.com/docling-project/docling/commit/9970d1ef94c5e826080834d0f8858cfd8f9e7edb)) ### Fix * Transformers v5 compatibility for AUTOMODEL_CAUSALLM VLMs ([#3276](https://github.com/docling-project/docling/issues/3276)) ([`d431224`](https://github.com/docling-project/docling/commit/d43122447f9b5b9dcad1f88819b8cb2a59f62b33)) * **vlm:** Add explicit MLX suppo | Medium | 4/13/2026 |
| v2.86.0 | ### Feature * Support for GraniteVision v4 ([#3217](https://github.com/docling-project/docling/issues/3217)) ([`fd83420`](https://github.com/docling-project/docling/commit/fd834204fadcb15190f3f2c289841143773b5f9d)) * Add signature/stamp html block to DC document ([#3251](https://github.com/docling-project/docling/issues/3251)) ([`9b4b67b`](https://github.com/docling-project/docling/commit/9b4b67b23e77d6d9063ee141196707412bde1673)) * **vlm:** Add PARTIAL_SUCCESS status for VLM pipeline pages ([# | Medium | 4/10/2026 |
| v2.85.0 | ### Feature * Add support for Falcon-OCR ([#3237](https://github.com/docling-project/docling/issues/3237)) ([`d0e19be`](https://github.com/docling-project/docling/commit/d0e19be14ff3dbe8d44b5bf8bfe4cf53b58249f6)) * Add support for LightOnOCR-2-1B ([#3213](https://github.com/docling-project/docling/issues/3213)) ([`f2affd7`](https://github.com/docling-project/docling/commit/f2affd76149aa7c1ed84df1e84ef537f3905559b)) ### Fix * **latex:** Expand custom macro parameters ([#3223](https://github.co | Medium | 4/7/2026 |
| v2.84.0 | ### Feature * Glm ocr ([#3146](https://github.com/docling-project/docling/issues/3146)) ([`a9265d8`](https://github.com/docling-project/docling/commit/a9265d854a195993d2e63bfc8c4bb2f76be7f9d9)) * Switch to the latest version of DocumentFigureClassifier model v2.5 ([#3171](https://github.com/docling-project/docling/issues/3171)) ([`d046390`](https://github.com/docling-project/docling/commit/d046390bf4bff2c538cb33eebb03dce56d122d37)) * Remove the deprecation of extraction ([#3220](https://github. | Medium | 4/1/2026 |
| v2.83.0 | ### Feature * Upgrade to transformers v5 ([#3200](https://github.com/docling-project/docling/issues/3200)) ([`d2c6357`](https://github.com/docling-project/docling/commit/d2c6357982d79629440919188d73bda18bc678c8)) * OCR model for remote KServe v2 API ([#3189](https://github.com/docling-project/docling/issues/3189)) ([`8522b00`](https://github.com/docling-project/docling/commit/8522b00146a2217760ad1944934926ed0e9f5d39)) ### Fix * **pdf:** Propagate hyperlinks to DoclingDocument text items ([#31 | Medium | 3/31/2026 |
| v2.82.0 | ### Feature * Implementation of HTML backend with headless browser ([#2969](https://github.com/docling-project/docling/issues/2969)) ([`1c74a9b`](https://github.com/docling-project/docling/commit/1c74a9b9c7c2019b85abef8f0f94381a83b721df)) ### Fix * **omml:** Correct LaTeX output for fractions, math operators, and functions ([#3122](https://github.com/docling-project/docling/issues/3122)) ([`e36125b`](https://github.com/docling-project/docling/commit/e36125ba2ddfbe584fc752e6dc7ca0f0f8f58d87)) | Medium | 3/25/2026 |
| v2.81.0 | ### Feature * Route plain-text and Quarto/R Markdown files to the Markdown backend ([#3161](https://github.com/docling-project/docling/issues/3161)) ([`96d7c7e`](https://github.com/docling-project/docling/commit/96d7c7ec79992d8dddedfafaaedb7f9bf6e14f40)) ### Fix * **docx:** Missing list items after numbered header (#2665) ([#2678](https://github.com/docling-project/docling/issues/2678)) ([`2f7c09e`](https://github.com/docling-project/docling/commit/2f7c09e0d8f07a5fa0aaf4f33bdfb1f71d3f3063)) * | Low | 3/20/2026 |
| v2.80.0 | ### Feature * Add the VllmCudaGraphMode ([#3125](https://github.com/docling-project/docling/issues/3125)) ([`f950679`](https://github.com/docling-project/docling/commit/f950679f60ab6b1a9b057e7131fc8c8334e6e62e)) | Low | 3/14/2026 |
| v2.79.0 | ### Feature * Add fact metadata and linkbase relationships for XBRL ([#3084](https://github.com/docling-project/docling/issues/3084)) ([`7952efe`](https://github.com/docling-project/docling/commit/7952efee2fcbae2a9c516d75acd8995c004fc949)) ### Fix * Use OCR cells with TableFormer v2 ([#3107](https://github.com/docling-project/docling/issues/3107)) ([`93f6fee`](https://github.com/docling-project/docling/commit/93f6feeabcef81b1f71a189458b0166af9db176c)) * Add self-consistency check in the table | Low | 3/12/2026 |
| v2.78.0 | ### Feature * Add support for TableFormer v2 ([#3013](https://github.com/docling-project/docling/issues/3013)) ([`4ccd1d4`](https://github.com/docling-project/docling/commit/4ccd1d465deb8d521c09e2da61b537a9236d6560)) * Add gRPC transport for KServe v2 API engine ([#3074](https://github.com/docling-project/docling/issues/3074)) ([`3d90778`](https://github.com/docling-project/docling/commit/3d90778e3e5762b16758e1c121f42890e32f0560)) ### Fix * **html:** Fix broken document tree and quadratic com | Low | 3/10/2026 |
| v2.77.0 | ### Feature * Track vlm_inference time for mlx_model pipeline ([#3060](https://github.com/docling-project/docling/issues/3060)) ([`38c4bb2`](https://github.com/docling-project/docling/commit/38c4bb26e8e3a7797d1caec3f690a7c8d5d9a735)) * Add configurable graph_optimization_level for ONNX Runtime engines ([#3071](https://github.com/docling-project/docling/issues/3071)) ([`cfc6636`](https://github.com/docling-project/docling/commit/cfc6636a2a0e6b149dd51714d20e9b93f3f6463b)) ### Fix * **docx:** Pr | Low | 3/6/2026 |
| v2.76.0 | ### Feature * Export to WebVTT format ([#3036](https://github.com/docling-project/docling/issues/3036)) ([`d276e60`](https://github.com/docling-project/docling/commit/d276e6056106b6aa04fee65def96d3e10557d632)) ### Fix * **xlsx:** Handle OneCellAnchor images in Excel backend ([#3045](https://github.com/docling-project/docling/issues/3045)) ([`859c302`](https://github.com/docling-project/docling/commit/859c302310289c5bab45a6e160e7cc3b9c538343)) * Normalize Unicode ligatures in PDF text extracti | Low | 3/2/2026 |
| v2.75.0 | ### Feature * Create a backend parser for XBRL instance reports ([#3017](https://github.com/docling-project/docling/issues/3017)) ([`334ba6e`](https://github.com/docling-project/docling/commit/334ba6e51fa7feb5f2ae15fce4612c7b3fad67d6)) * Unified model-family inference engines (including image-classification) and KServe v2 API support ([#2979](https://github.com/docling-project/docling/issues/2979)) ([`0353293`](https://github.com/docling-project/docling/commit/03532938b52fb1513e2ea3afffc6da6a7d | Low | 2/24/2026 |
| v2.74.0 | ### Feature * Introduce docling-parse v5 and deprecate old docling-parse backends ([#2872](https://github.com/docling-project/docling/issues/2872)) ([`bf417e6`](https://github.com/docling-project/docling/commit/bf417e6d264ebaf93bda7f53534e2cc50ccb2284)) ### Fix * Security vulnerabilities with XML External Entity and related attacks ([#3009](https://github.com/docling-project/docling/issues/3009)) ([`576bada`](https://github.com/docling-project/docling/commit/576bada7b7d542ea308778a053bc3c4d49 | Low | 2/17/2026 |
| v2.73.1 | ### Fix * **asciidoc:** Handle commas in image alt text ([#2983](https://github.com/docling-project/docling/issues/2983)) ([`86b6912`](https://github.com/docling-project/docling/commit/86b691204d2e4c2a54c99d80063e2dd5b5428168)) * Use timezone-aware datetime ([#2947](https://github.com/docling-project/docling/issues/2947)) ([`e2870f9`](https://github.com/docling-project/docling/commit/e2870f94ed78caeb6db9d735b5a73fa80e5e2104)) * Add failed pages to DoclingDocument for page break consistency ([#2 | Low | 2/13/2026 |
| v2.73.0 | ### Feature * Inference engines abstraction for object detection model family with HF Transformers and ONNX runtime ([#2959](https://github.com/docling-project/docling/issues/2959)) ([`14e474c`](https://github.com/docling-project/docling/commit/14e474c95555f04e5c4ac55351ad802d372858fc)) * Added support for parsing LaTeX (.tex) documents ([#2890](https://github.com/docling-project/docling/issues/2890)) ([`e6ccb8b`](https://github.com/docling-project/docling/commit/e6ccb8b2c1d99fa6e2660d7c4bb866a | Low | 2/11/2026 |
| v2.72.0 | ### Feature * Add chart extraction models ([#2848](https://github.com/docling-project/docling/issues/2848)) ([`fe45c71`](https://github.com/docling-project/docling/commit/fe45c71fe7ad137088e3719dc99e337860120d33)) ### Fix * **backend:** Improve Excel table bounds detection and flatten merged cells ([#2778](https://github.com/docling-project/docling/issues/2778)) ([`3110c43`](https://github.com/docling-project/docling/commit/3110c439da48fe215379492a29a310e64e9d67e7)) * **pptx:** Handle picture | Low | 2/3/2026 |
| v2.71.0 | ### Feature * Webvtt and source tracker ([#2787](https://github.com/docling-project/docling/issues/2787)) ([`0602a7c`](https://github.com/docling-project/docling/commit/0602a7cdab17b0e42057e1ef502048e95bd589f4)) * Add support for Word document comments extraction ([#2834](https://github.com/docling-project/docling/issues/2834)) ([`b6ca094`](https://github.com/docling-project/docling/commit/b6ca09451963c606b5d280b74e559278717bb911)) ### Fix * Allow newer typer versions ([#2930](https://github. | Low | 1/30/2026 |
| v2.70.0 | ### Feature * Drop support for Python 3.9 ([#2905](https://github.com/docling-project/docling/issues/2905)) ([`7f38658`](https://github.com/docling-project/docling/commit/7f386587ed9a28a839a928f3815d5ce1f3e05f8b)) ### Fix * **md:** Handle pipe symbols that are not table markers ([#2904](https://github.com/docling-project/docling/issues/2904)) ([`86eaef5`](https://github.com/docling-project/docling/commit/86eaef5b4544d638099657d38f18966ddd3e73f2)) * Remove direct vllm dependency ([#2910](https | Low | 1/23/2026 |
| v2.69.1 | ### Fix * Off-by-one error for page indexing in vlm_pipeline ([#2902](https://github.com/docling-project/docling/issues/2902)) ([`08f49e2`](https://github.com/docling-project/docling/commit/08f49e2abc74bfbc6be3433f64698c2b4ac7ddce)) | Low | 1/21/2026 |
| v2.69.0 | ### Feature * New picture classifier v2.0 ([#2889](https://github.com/docling-project/docling/issues/2889)) ([`43badc3`](https://github.com/docling-project/docling/commit/43badc3838ccfc98fd28d9d66ffe0811585f90fd)) * Add classification filters for picture description ([#2836](https://github.com/docling-project/docling/issues/2836)) ([`ac16a26`](https://github.com/docling-project/docling/commit/ac16a26a047ccf5edd88775197ca43d146d00528)) ### Fix * Torch compatibility for xpu ([#2894](https://git | Low | 1/20/2026 |
| v2.68.0 | ### Feature * Support for DeepSeek-OCR in VLM pipeline ([#2798](https://github.com/docling-project/docling/issues/2798)) ([`19af03f`](https://github.com/docling-project/docling/commit/19af03f539b40d88eedd132644ed085b572664d7)) ### Fix * **logging:** Include page numbers in preprocess error messages ([#2858](https://github.com/docling-project/docling/issues/2858)) ([`89bea24`](https://github.com/docling-project/docling/commit/89bea245392b840a0c25c5fc35c931477a34d881)) * **docx:** Handle groupe | Low | 1/13/2026 |
| v2.67.0 | ### Feature * Enrichment annotations in the new meta format ([#2859](https://github.com/docling-project/docling/issues/2859)) ([`aab3ff5`](https://github.com/docling-project/docling/commit/aab3ff5d82fc54864657c0c2ff8e0aa21461f23f)) * Add XPU device support for Intel GPUs ([#2809](https://github.com/docling-project/docling/issues/2809)) ([`2b83fdd`](https://github.com/docling-project/docling/commit/2b83fdd0deeec0f1ad016cc78ea42d3144a86cad)) * Add option to report timings details ([#2772](https:/ | Low | 1/9/2026 |
| v2.66.0 | ### Feature * Add preset for using granite-docling via vllm and other apis ([#2792](https://github.com/docling-project/docling/issues/2792)) ([`241d19e`](https://github.com/docling-project/docling/commit/241d19ed6f1b6d4327df250497ff8d8dd2686b5d)) ### Fix * **docx:** Handle tables with merged cells causing IndexError ([#2813](https://github.com/docling-project/docling/issues/2813)) ([`faff935`](https://github.com/docling-project/docling/commit/faff935b0e9f7a6f450b3bbc0329a05ac1b00ff2)) * **mar | Low | 12/24/2025 |
| v2.65.0 | ### Feature * Add YAML output format to CLI ([#2768](https://github.com/docling-project/docling/issues/2768)) ([`da7678a`](https://github.com/docling-project/docling/commit/da7678a754b62df5cf0a9a1efe98c288bda20bd7)) ### Fix * **rapidocr:** Use correct parameter name for rec_keys_path ([#2762](https://github.com/docling-project/docling/issues/2762)) ([`1d78418`](https://github.com/docling-project/docling/commit/1d78418cefb5b90691481fa92c35e8b4909b6de5)) * **docx:** Handle missing value in para | Low | 12/15/2025 |
| v2.64.1 | ### Fix * Clear word/char cells when force_full_page_ocr is used ([#2738](https://github.com/docling-project/docling/issues/2738)) ([`1df0560`](https://github.com/docling-project/docling/commit/1df0560ec2cafcd95f2240e6188385e1ec117110)) * Add missing font download in the rapidocr artifacts ([#2735](https://github.com/docling-project/docling/issues/2735)) ([`edbabfc`](https://github.com/docling-project/docling/commit/edbabfcac2fd53345b1a0677e81f206285d58bae)) * Ensure proper image_scale for gene | Low | 12/9/2025 |
| v2.64.0 | ### Feature * **experimental:** Add experimental TableCropsLayoutModel ([#2669](https://github.com/docling-project/docling/issues/2669)) ([`1344362`](https://github.com/docling-project/docling/commit/134436245a1ebdadbfd8ba3c870f0f3c866f39a7)) * Factory and plugin-capability for Layout and Table models ([#2637](https://github.com/docling-project/docling/issues/2637)) ([`ad97e52`](https://github.com/docling-project/docling/commit/ad97e5285126388847ba9a219ac73f006c759f09)) ### Fix * InputFormat. | Low | 12/2/2025 |
| v2.63.0 | ### Feature * Add save and load for conversion result ([#2648](https://github.com/docling-project/docling/issues/2648)) ([`b559813`](https://github.com/docling-project/docling/commit/b559813b9becf7950bc539c1334e55ef17bed2ad)) ### Fix * Respect document_timeout in new threaded StandardPdfPipeline ([#2653](https://github.com/docling-project/docling/issues/2653)) ([`2087c6b`](https://github.com/docling-project/docling/commit/2087c6bf9f65f279dd2ff0631768996aecd640fe)) * In DocumentConverter.conve | Low | 11/20/2025 |
| v2.62.0 | ### Feature * Add the Image backend ([#2627](https://github.com/docling-project/docling/issues/2627)) ([`3495b73`](https://github.com/docling-project/docling/commit/3495b73de875c2438108b4362dbac770b6d322ca)) * **experimental:** Layout + VLM model with layout prompt ([#2244](https://github.com/docling-project/docling/issues/2244)) ([`4852d8b`](https://github.com/docling-project/docling/commit/4852d8b4f2938434f1d6250984fa18ec5428055f)) ### Fix * Correct the model-repo name ([#2624](https://gith | Low | 11/17/2025 |
| v2.61.2 | ### Fix * Default to EasyOCR in Python 3.14 ([#2605](https://github.com/docling-project/docling/issues/2605)) ([`5c27567`](https://github.com/docling-project/docling/commit/5c27567c4160b6ec43857855c8d5cd3a58c031c5)) | Low | 11/10/2025 |
| v2.61.1 | ### Fix * **docx:** Slow table parsing ([#2553](https://github.com/docling-project/docling/issues/2553)) ([`ef623ff`](https://github.com/docling-project/docling/commit/ef623ffceefe40aa237e163b564310ed81296bcf)) * **html:** Slow table parsing ([#2582](https://github.com/docling-project/docling/issues/2582)) ([`0ba8d5d`](https://github.com/docling-project/docling/commit/0ba8d5d9e325390626268744f289458e91689b4b)) ### Documentation * Make navigation menus collapse and expand ([#2573](https://gith | Low | 11/6/2025 |
| v2.61.0 | ### Feature * **vlm:** Track generated tokens and stop reasons for VLM models ([#2543](https://github.com/docling-project/docling/issues/2543)) ([`6a04e27`](https://github.com/docling-project/docling/commit/6a04e273528691eb22a5708f1270d4c5fa8f5b7c)) ### Fix * Temporarily pin NuExtract to working revision ([#2588](https://github.com/docling-project/docling/issues/2588)) ([`fa92574`](https://github.com/docling-project/docling/commit/fa925741b6dc00c7bd2806c62cb75cb539649c9f)) * **ocr:** Use PSM | Low | 11/6/2025 |
| v2.60.1 | ### Fix * Extract response from api_image_request in picture description ([#2571](https://github.com/docling-project/docling/issues/2571)) ([`8360aa5`](https://github.com/docling-project/docling/commit/8360aa54492bc5b5e07fcd07b0b85284910f1a14)) | Low | 11/4/2025 |
| v2.60.0 | ### Feature * Use threading in the standard pipeline and move old behavior to legacy ([#2452](https://github.com/docling-project/docling/issues/2452)) ([`268d027`](https://github.com/docling-project/docling/commit/268d027c8f2abae7339b4c7d33642c3135c56e7a)) ### Fix * **pdf:** Threadsafe for pypdfium2 backend ([#2527](https://github.com/docling-project/docling/issues/2527)) ([`a51275d`](https://github.com/docling-project/docling/commit/a51275d08037a30ebaa07e33b0c4e82623791259)) ### Documentati | Low | 10/31/2025 |
| v2.59.0 | ### Feature * **vlm:** Add num_tokens as attribute for VlmPrediction ([#2489](https://github.com/docling-project/docling/issues/2489)) ([`b6c892b`](https://github.com/docling-project/docling/commit/b6c892b505bf29a12ce7e8d9b4e88e1253440ebc)) * Support for Python 3.14 ([#2530](https://github.com/docling-project/docling/issues/2530)) ([`cdffb47`](https://github.com/docling-project/docling/commit/cdffb47b9a12da23489e345ea633786914776f7d)) ### Fix * Xlsx cell parsing, now returning values instead | Low | 10/30/2025 |
| v2.58.0 | ### Feature * **pdf:** Support for password-protected PDF documents ([#2499](https://github.com/docling-project/docling/issues/2499)) ([`bbe82a6`](https://github.com/docling-project/docling/commit/bbe82a68d08e5dc33191524bb636f06112edff87)) * **backend:** Add generic options support and HTML image handling modes ([#2011](https://github.com/docling-project/docling/issues/2011)) ([`a30e6a7`](https://github.com/docling-project/docling/commit/a30e6a76148079cc48fb179e4b9ca36371026b6f)) * **ASR:** MLX | Low | 10/22/2025 |
| v2.57.0 | ### Feature * **docx:** Process drawingml objects in docx ([#2453](https://github.com/docling-project/docling/issues/2453)) ([`1682993`](https://github.com/docling-project/docling/commit/16829939cf1f8d89974c51c1d7c5cdc2fe8045da)) ### Fix * Use proper page concatenation in VLM pipeline MD/HTML conversion ([#2458](https://github.com/docling-project/docling/issues/2458)) ([`cd7f7ba`](https://github.com/docling-project/docling/commit/cd7f7ba145c401fb6567ef1c7337c840100cded1)) ### Documentation | Low | 10/15/2025 |
| v2.56.1 | ### Fix * Avoid downloading easyocr models by default ([#2454](https://github.com/docling-project/docling/issues/2454)) ([`688a7df`](https://github.com/docling-project/docling/commit/688a7dfd38ba3e3aea64f7fe027815e910818785)) | Low | 10/13/2025 |
| v2.56.0 | ### Feature * AutoOCR model selecting the best OCR model available and deprecating the usage of EasyOCR ([#2391](https://github.com/docling-project/docling/issues/2391)) ([`f7244a4`](https://github.com/docling-project/docling/commit/f7244a433378327576e3554d41d80928ee38e2a7)) * Add Tesseract PSM options support ([#2411](https://github.com/docling-project/docling/issues/2411)) ([`f11f8c0`](https://github.com/docling-project/docling/commit/f11f8c0a8188f99179acd7e47a48b908b1ea64d0)) ### Fix * **a | Low | 10/13/2025 |
| v2.55.1 | ### Fix * **markdown:** Setext heading support ([#2359](https://github.com/docling-project/docling/issues/2359)) ([`ee73ffa`](https://github.com/docling-project/docling/commit/ee73ffae15b2bb60c42e333fca4684bd57eeff31)) * **docs:** Fixed the color scheme ([#2371](https://github.com/docling-project/docling/issues/2371)) ([`246de77`](https://github.com/docling-project/docling/commit/246de77d8ce53fbeb7a93a6412c461df82269685)) * Empty table handling ([#2365](https://github.com/docling-project/doclin | Low | 10/3/2025 |
| v2.55.0 | ### Feature * Repetition-based StoppingCriteria for GraniteDocling ([#2323](https://github.com/docling-project/docling/issues/2323)) ([`1e9dc43`](https://github.com/docling-project/docling/commit/1e9dc43b722aeffa4574ae2a87bae1eb180c1201)) * Rich tables support for HTML backend ([#2324](https://github.com/docling-project/docling/issues/2324)) ([`c803abe`](https://github.com/docling-project/docling/commit/c803abed9ae98489184791a70bf49cac0c83ab89)) ### Fix * Pin wider range of typer ([#2309](htt | Low | 9/30/2025 |
| v2.54.0 | ### Feature * Rich tables for MSWord backend ([#2291](https://github.com/docling-project/docling/issues/2291)) ([`e2482a2`](https://github.com/docling-project/docling/commit/e2482a2ada52b2b8a41c4402b27e125adbe4385f)) * Add a backend parser for WebVTT files ([#2288](https://github.com/docling-project/docling/issues/2288)) ([`46efaae`](https://github.com/docling-project/docling/commit/46efaaefee17a6b83e02a050f9f3c8a51afbbd53)) ### Fix * Correct y-axis scaling in draw_table_cells ([#2287](https: | Low | 9/22/2025 |
| v2.53.0 | ### Feature * Add granite-docling model ([#2272](https://github.com/docling-project/docling/issues/2272)) ([`17afb66`](https://github.com/docling-project/docling/commit/17afb664d005168b5a6f12a2df4432076a9329bb)) * **RapidOcr:** Support generic extra arguments for RapidOcr ([#2266](https://github.com/docling-project/docling/issues/2266)) ([`0e95171`](https://github.com/docling-project/docling/commit/0e95171dd64733ba52f2f0906642be24f6237977)) ### Fix * Handle empty result from RapidOCR to avoid | Low | 9/17/2025 |
| v2.52.0 | ### Feature * Enrichment steps on all convert pipelines (incl docx, html, etc) ([#2251](https://github.com/docling-project/docling/issues/2251)) ([`2c91234`](https://github.com/docling-project/docling/commit/2c9123419f541feda8cc98c53aeb37288fabcaee)) ### Fix * Add missing features in ThreadedStandardPdfPipeline ([#2252](https://github.com/docling-project/docling/issues/2252)) ([`0700af2`](https://github.com/docling-project/docling/commit/0700af212cce8d90dbe0477dcb06d69370649e97)) * Address de | Low | 9/11/2025 |
| v2.51.0 | ### Feature * Updating default parameters to get better performance with docling-parse ([#2208](https://github.com/docling-project/docling/issues/2208)) ([`b49d1ad`](https://github.com/docling-project/docling/commit/b49d1ad4f1af6eeadc3f8d0e35123dc52c6e228e)) * Updated the backend for new docling-parse ([#2187](https://github.com/docling-project/docling/issues/2187)) ([`b3d7542`](https://github.com/docling-project/docling/commit/b3d754206172d08d6d01f29f132dcb66383f955b)) ### Documentation * Ad | Low | 9/5/2025 |
| v2.50.0 | ### Feature * Heron layout model as new default ([#1971](https://github.com/docling-project/docling/issues/1971)) ([`e38aa0f`](https://github.com/docling-project/docling/commit/e38aa0f7f2e8a7881c0f97131bf776556778f9a2)) ### Fix * **html:** Access to variable not yet declared ([#2171](https://github.com/docling-project/docling/issues/2171)) ([`293e81b`](https://github.com/docling-project/docling/commit/293e81bf9d341edd1d35f3b66faf726b82ad4885)) | Low | 9/3/2025 |
| v2.49.0 | ### Feature * [Beta] Extraction with schema ([#2138](https://github.com/docling-project/docling/issues/2138)) ([`9f4bc5b`](https://github.com/docling-project/docling/commit/9f4bc5b2f19d700208b0b233c88fbe960758bdbd)) * **msexcel:** Set ContentLayer.INVISIBLE for invisible sheet ([#1876](https://github.com/docling-project/docling/issues/1876)) ([`a283ccf`](https://github.com/docling-project/docling/commit/a283ccff25a25ebbe6e9b2decfaaad6f300597db)) ### Fix * **pypdfium2:** Fix OCR bounding box m | Low | 9/1/2025 |
