freshcrate

markitdown

Utility tool for converting various files to Markdown

Description

# MarkItDown

> [!IMPORTANT]
> MarkItDown is a Python package and command-line utility for converting various files to Markdown (e.g., for indexing, text analysis, etc).
>
> For more information, and full documentation, see the project [README.md](https://github.com/microsoft/markitdown) on GitHub.

## Installation

From PyPI:

```bash
pip install markitdown[all]
```

From source:

```bash
git clone git@github.com:microsoft/markitdown.git
cd markitdown
pip install -e packages/markitdown[all]
```

## Usage

### Command-Line

```bash
markitdown path-to-file.pdf > document.md
```

### Python API

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("test.xlsx")
print(result.text_content)
```

### More Information

For more information, and full documentation, see the project [README.md](https://github.com/microsoft/markitdown) on GitHub.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
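The Python API shown under Usage converts one file at a time. As a rough sketch of how it might be applied to a whole folder, the helper below walks a directory and writes one Markdown file per matching input. `convert_tree`, `default_converter`, and the suffix list are hypothetical names chosen for this illustration and are not part of MarkItDown; only `MarkItDown()`, `convert(...)`, and `text_content` come from the README above. The converter is passed in as a plain callable so the traversal logic can be tried without the package installed.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def default_converter() -> Callable[[Path], str]:
    """Return a converter backed by MarkItDown (imported lazily).

    Assumes the package is installed, e.g. via `pip install markitdown[all]`.
    """
    from markitdown import MarkItDown

    md = MarkItDown()
    return lambda path: md.convert(str(path)).text_content


def convert_tree(
    src_dir,
    out_dir,
    to_markdown: Callable[[Path], str],
    suffixes: Iterable[str] = (".pdf", ".docx", ".xlsx", ".pptx"),
) -> List[Path]:
    """Convert every file under src_dir whose suffix matches, mirroring the tree.

    Output names keep the original file name and append ".md", so that
    "a.pdf" and "a.docx" in the same folder do not overwrite each other.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    wanted = {s.lower() for s in suffixes}
    written: List[Path] = []
    for path in sorted(src_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        rel = path.relative_to(src_dir)
        target = out_dir / rel.parent / (rel.name + ".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(to_markdown(path), encoding="utf-8")
        written.append(target)
    return written
```

With the package installed, a call such as `convert_tree("docs", "docs_md", default_converter())` would be one way to use it; the directory names here are placeholders.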

Release History

Each entry lists: version (urgency, date), followed by the changes. Entries ending in "…" are cut off in the source listing.

0.1.5 (urgency: Low, date: 4/21/2026)
Imported from PyPI (0.1.5)

v0.1.5 (urgency: Low, date: 2/20/2026)
## What's Changed
* Update PDF table extraction to support aligned Markdown by @lesyk in https://github.com/microsoft/markitdown/pull/1499
* Fix: PDF parsing doesn't support partially numbered lists by @lesyk in https://github.com/microsoft/markitdown/pull/1525
* Extend table support for wide tables by @lesyk in https://github.com/microsoft/markitdown/pull/1552
* Add text/markdown to Accept header by @afourney in https://github.com/microsoft/markitdown/pull/1554
* Remove onnxruntime<=1.20.1…

v0.1.5b1 (urgency: Low, date: 1/8/2026)
## What's Changed
* Update PDF table extraction to support aligned Markdown by @lesyk in https://github.com/microsoft/markitdown/pull/1499
* Fix: PDF parsing doesn't support partially numbered lists by @lesyk in https://github.com/microsoft/markitdown/pull/1525
## New Contributors
* @lesyk made their first contribution in https://github.com/microsoft/markitdown/pull/1499

**Full Changelog**: https://github.com/microsoft/markitdown/compare/v0.1.4...v0.1.5b1

v0.1.4 (urgency: Low, date: 12/1/2025)
Maintenance release: Bumps mammoth to 1.11.0 to address [cve-2025-11849](https://avd.aquasec.com/nvd/2025/cve-2025-11849/)
And pdfminer.six to 20251107 to address [GHSA-wf5f-4jwr-ppcp](https://github.com/pdfminer/pdfminer.six/security/advisories/GHSA-wf5f-4jwr-ppcp)

v0.1.3 (urgency: Low, date: 8/26/2025)
## What's Changed
* Pin `onnxruntime` on Windows by @t-kalinowski in https://github.com/microsoft/markitdown/pull/1274
* Have the MarkItDown MCP server read MARKITDOWN_ENABLE_PLUGINS from ENV by @afourney in https://github.com/microsoft/markitdown/pull/1273
* Resolved an issue with linked images in docx [mammoth] by @afourney in https://github.com/microsoft/markitdown/pull/1405
* Ensure safe ExifTool usage: require >= 12.24 by @t3tra-dev in https://github.com/microsoft/markitdown/pull/1399

v0.1.2 (urgency: Low, date: 5/28/2025)
## What's Changed
- feat: render math equations in .docx documents by @sathinduga in https://github.com/microsoft/markitdown/pull/1160
- Make it easier to use AzureKeyCredentials with Azure Doc Intelligence by @afourney in https://github.com/microsoft/markitdown/pull/1151
- Add CSV to Markdown table conversion - fixes https://github.com/microsoft/markitdown/issues/1144 by @erinshek in https://github.com/microsoft/markitdown/pull/1176
- Fix typo in README.md by @lentil32 in https://github.com…

v0.1.2a1 (urgency: Low, date: 5/21/2025)
## What's Changed
* feat: render math equations in .docx documents by @sathinduga in https://github.com/microsoft/markitdown/pull/1160
* Make it easier to use AzureKeyCredentials with Azure Doc Intelligence by @afourney in https://github.com/microsoft/markitdown/pull/1151
* Add CSV to Markdown table conversion - fixes #1144 by @erinshek in https://github.com/microsoft/markitdown/pull/1176
* chore: fix typo in README.md by @lentil32 in https://github.com/microsoft/markitdown/pull/1175
* Upda…

v0.1.1 (urgency: Low, date: 3/25/2025)
## What's Changed
`convert_url` renamed to `convert_uri`, and now handles data and file URIs by @afourney in https://github.com/microsoft/markitdown/pull/1153

**NOTE**: `convert_url` remains an alias to `convert_uri`, for backward compatibility. Both now accept file URIs and data URIs: e.g.,

```python
markitdown = MarkItDown()
result = markitdown.convert_uri("file:///path/to/file.txt")
print(result.markdown)
```

And,

```python
markitdown = MarkItDown()
result = markitd…
```

v0.1.0 (urgency: Low, date: 3/22/2025)
## Overview
Version 0.1.0 (previously 0.1.0a6) is a large release, bringing many improvements over the previous 0.0.2 version. High-level changes include:
* Organized dependencies into feature groups: install only the converters you need, or get everything with `pip install markitdown[all]`
* A new plugin-based architecture, allowing 3rd-party developers to add functionality to MarkItDown (see the [sample plugin](https://github.com/microsoft/markitdown/tree/main/packages/markitdown-samp…

v0.1.0a6 (urgency: Low, date: 3/21/2025)
## What's Changed
* Add support for preserving base64 encoded images by @BetterAndBetterII in https://github.com/microsoft/markitdown/pull/1140
* Bump version and resolve a console encoding error. by @afourney in https://github.com/microsoft/markitdown/pull/1149
## New Contributors
* @BetterAndBetterII made their first contribution in https://github.com/microsoft/markitdown/pull/1140

**Full Changelog**: https://github.com/microsoft/markitdown/compare/v0.1.0a5...v0.1.0a6

v0.1.0a5 (urgency: Low, date: 3/20/2025)
## What's Changed
* Consider anything with a charset as plain text-convertible. by @afourney in https://github.com/microsoft/markitdown/pull/1142
* Adjust warning filters and update dependencies by @afourney in https://github.com/microsoft/markitdown/pull/1143

**Full Changelog**: https://github.com/microsoft/markitdown/compare/v0.1.0a4...v0.1.0a5

v0.1.0a4 (urgency: Low, date: 3/17/2025)
## Features
* Basic EPub support from @0xRaduan, in collaboration with @afourney
* Switch from puremagic to magika. by @afourney in https://github.com/microsoft/markitdown/pull/1108
* Added CLI options for extension, mime-types, and charset. by @afourney in https://github.com/microsoft/markitdown/pull/1115
* Sort pptx shapes to be parsed in top-to-bottom, left-to-right order by @richardye101 in https://github.com/microsoft/markitdown/pull/1104
## Bug fixes and enhancements
* fix(README)…

v0.0.2 (urgency: Low, date: 3/8/2025)
## What's Changed
* Avoids resetting warning filters (addresses #1068) by @afourney in https://github.com/microsoft/markitdown/pull/1101
* Removes deprecated features from 0.0.1aX (pre-release alphas) by @afourney in https://github.com/microsoft/markitdown/pull/1105

**Full Changelog**: https://github.com/microsoft/markitdown/compare/v0.0.1...v0.0.2

v0.1.0a1 (urgency: Low, date: 3/6/2025)
## What's Changed
This MarkItDown _alpha_ introduces numerous bug-fixes, and the following major changes:
* Dependencies are now organized into optional feature-groups (further details below). Use pip install `markitdown[all]` to have backward-compatible behavior.
* The DocumentConverter class interface has changed to read from file-like streams rather than file paths. *No temporary files are created anymore*. If you are the maintainer of a DocumentConverter, you likely need to update your …

v0.0.1 (urgency: Low, date: 3/6/2025)
Promoting v0.0.1a5 to a full release. For more details see the prior [Release Notes](https://github.com/microsoft/markitdown/releases/tag/v0.0.1a5).

v0.0.1a5 (urgency: Low, date: 2/28/2025)
## What's Changed
* Fixed compatibility with [markdownify v1.0.0](https://github.com/matthewwithanm/python-markdownify/releases/tag/1.0.0)
## New Contributors
* @lh0x00 made their first contribution in https://github.com/microsoft/markitdown/pull/1072

**Full Changelog**: https://github.com/microsoft/markitdown/compare/v0.0.1a4...v0.0.1a5

v0.0.1a4 (urgency: Low, date: 2/11/2025)
## Some of What's Changed
* feat: Add RSSConverter by @Soulter in https://github.com/microsoft/markitdown/pull/97
* feat: Add IpynbConverter by @AumGupta in https://github.com/microsoft/markitdown/pull/71
* feat(devcontainer): Add DevContainer Configuration for Easier Contribution Setup by @l-lumin in https://github.com/microsoft/markitdown/pull/64
* feat: add support for conversion via Document Intelligence by @KennyZhang1 in https://github.com/microsoft/markitdown/pull/303
* feat: add ve…

v0.0.1a3 (urgency: Low, date: 12/17/2024)
## New Features and Formats
* Add zip handling by @Josh-XT in https://github.com/microsoft/markitdown/pull/22
* Add PPTX chart support by @nyosegawa in https://github.com/microsoft/markitdown/pull/33
## Breaking Changes
Renamed `mlm_client` and `mlm_model` arguments to `llm_client` and `llm_model`, and added appropriate deprecation warnings. See:
* Fix LLM terminology in code by @CharlesCNorton in https://github.com/microsoft/markitdown/pull/73
* Fix LLM terms by @CharlesCNorton in…

v0.0.1a2 (urgency: Low, date: 12/17/2024)
## Initial Release of markitdown
The MarkItDown library is a utility tool for converting various files to Markdown (e.g., for indexing, text analysis, etc.)

It presently supports:
* PDF (.pdf)
* PowerPoint (.pptx)
* Word (.docx)
* Excel (.xlsx)
* Images (EXIF metadata, and OCR)
* Audio (EXIF metadata, and speech transcription)
* HTML (special handling of Wikipedia, etc.)
* Various other text-based formats (csv, json, xml, etc.)

The API is simple:

```python
from markitdown …
```
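The v0.1.1 notes above say that `convert_uri` accepts file URIs and data URIs. The sketch below shows one way such URIs might be built with the standard library before handing them to MarkItDown. `file_uri` and `data_uri` are hypothetical helper names introduced for this illustration, and base64 is only one of the encodings a data URI can carry; whether a given payload converts cleanly depends on the installed MarkItDown version.

```python
import base64
from pathlib import Path


def file_uri(path) -> str:
    """Turn a local path into an absolute file:// URI."""
    return Path(path).resolve().as_uri()


def data_uri(payload: bytes, mime_type: str = "text/plain") -> str:
    """Wrap raw bytes in a base64-encoded data: URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Possible usage, assuming markitdown >= 0.1.1 is installed:
#
#   from markitdown import MarkItDown
#   result = MarkItDown().convert_uri(data_uri(b"Hello, world"))
#   print(result.markdown)
```

Building the URI separately keeps the conversion call identical for local files and in-memory content.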


Similar Packages

azure-core: Microsoft Azure Core Library for Python (azure-template_0.1.0b6187637)
azure-mgmt-core: Microsoft Azure Management Core Library for Python (azure-template_0.1.0b6187637)
azure-monitor-opentelemetry-exporter: Microsoft Azure Monitor Opentelemetry Exporter Client Library for Python (azure-template_0.1.0b6187637)
azure-servicebus: Microsoft Azure Service Bus Client Library for Python (azure-template_0.1.0b6187637)
azure-monitor-opentelemetry: Microsoft Azure Monitor Opentelemetry Distro Client Library for Python (azure-template_0.1.0b6187637)