markdownify

Convert HTML to markdown.

Description

|build| |version| |license| |downloads|

.. |build| image:: https://img.shields.io/github/actions/workflow/status/matthewwithanm/python-markdownify/python-app.yml?branch=develop
    :alt: GitHub Workflow Status
    :target: https://github.com/matthewwithanm/python-markdownify/actions/workflows/python-app.yml?query=workflow%3A%22Python+application%22

.. |version| image:: https://img.shields.io/pypi/v/markdownify
    :alt: Pypi version
    :target: https://pypi.org/project/markdownify/

.. |license| image:: https://img.shields.io/pypi/l/markdownify
    :alt: License
    :target: https://github.com/matthewwithanm/python-markdownify/blob/develop/LICENSE

.. |downloads| image:: https://pepy.tech/badge/markdownify
    :alt: Pypi Downloads
    :target: https://pepy.tech/project/markdownify

Installation
============

``pip install markdownify``

Usage
=====

Convert some HTML to Markdown:

.. code:: python

    from markdownify import markdownify as md
    md('<b>Yay</b> <a href="http://github.com">GitHub</a>')  # > '**Yay** [GitHub](http://github.com)'

Specify tags to exclude:

.. code:: python

    from markdownify import markdownify as md
    md('<b>Yay</b> <a href="http://github.com">GitHub</a>', strip=['a'])  # > '**Yay** GitHub'

\...or specify the tags you want to include:

.. code:: python

    from markdownify import markdownify as md
    md('<b>Yay</b> <a href="http://github.com">GitHub</a>', convert=['b'])  # > '**Yay** GitHub'

Options
=======

Markdownify supports the following options:

strip
  A list of tags to strip. This option can't be used with the
  ``convert`` option.

convert
  A list of tags to convert. This option can't be used with the
  ``strip`` option.

autolinks
  A boolean indicating whether the "automatic link" style should be used when
  an ``a`` tag's contents match its href. Defaults to ``True``.

default_title
  A boolean to enable setting the title of a link to its href, if no title is
  given. Defaults to ``False``.

heading_style
  Defines how headings should be converted. Accepted values are ``ATX``,
  ``ATX_CLOSED``, ``SETEXT``, and ``UNDERLINED`` (which is an alias for
  ``SETEXT``). Defaults to ``UNDERLINED``.

bullets
  An iterable (string, list, or tuple) of bullet styles to be used. If the
  iterable only contains one item, it will be used regardless of how deeply
  lists are nested. Otherwise, the bullet will alternate based on nesting
  level. Defaults to ``'*+-'``.

strong_em_symbol
  In markdown, both ``*`` and ``_`` are used to encode **strong** or
  *emphasized* text. Either of these symbols can be chosen by the options
  ``ASTERISK`` (default) or ``UNDERSCORE`` respectively.

sub_symbol, sup_symbol
  Define the chars that surround ``<sub>`` and ``<sup>`` text. Defaults to an
  empty string, because this is non-standard behavior. Could be something like
  ``~`` and ``^`` to result in ``~sub~`` and ``^sup^``. If the value starts
  with ``<`` and ends with ``>``, it is treated as an HTML tag and a ``/`` is
  inserted after the ``<`` in the string used after the text; this allows
  specifying ``<sub>`` to use raw HTML in the output for subscripts, for
  example.

newline_style
  Defines the style of marking linebreaks (``<br>``) in markdown. The default
  value ``SPACES`` of this option will adopt the usual two spaces and a newline,
  while ``BACKSLASH`` will convert a linebreak to ``\\n`` (a backslash and a
  newline). While the latter convention is non-standard, it is commonly
  preferred and supported by a lot of interpreters.

code_language
  Defines the language that should be assumed for all ``<pre>`` sections.
  Useful if all code on a page is in the same programming language and
  should be annotated with `````python`` or similar.
  Defaults to ``''`` (empty string) and can be any string.

code_language_callback
  When the HTML code contains ``pre`` tags that in some way provide the code
  language, for example as class, this callback can be used to extract the
  language from the tag and prefix it to the converted ``pre`` tag.
  The callback takes a single argument, a BeautifulSoup object, and returns
  a string containing the code language, or ``None``.
  An example that uses the class name as the code language could be::

    def callback(el):
        return el['class'][0] if el.has_attr('class') else None

  Defaults to ``None``.

escape_asterisks
  If set to ``False``, do not escape ``*`` to ``\*`` in text.
  Defaults to ``True``.

escape_underscores
  If set to ``False``, do not escape ``_`` to ``\_`` in text.
  Defaults to ``True``.

escape_misc
  If set to ``True``, escape miscellaneous punctuation characters
  that sometimes have Markdown significance in text.
  Defaults to ``False``.

keep_inline_images_in
  Images are converted to their alt-text when the images are located inside
  headlines or table cells. If some inline images should be converted to
  markdown images instead, this option can be set to a list of parent tags
  that should be allowe
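The ``code_language_callback`` example above returns the first class name of the element. Many pages instead mark code with a prefixed class such as ``language-python``, so a slightly more defensive callback may be useful. The following is only a sketch under that assumption: ``language_from_class`` is an illustrative name, and ``FakeTag`` is a hypothetical stand-in that mimics just the two BeautifulSoup tag operations the callback uses (``has_attr`` and item access), so the snippet runs without any third-party package.

```python
def language_from_class(el):
    """Return a code language for an element, or None if it has no class.

    Prefers a class such as 'language-python' or 'lang-python' (an assumed
    naming convention) and otherwise falls back to the first class name,
    as in the README example.
    """
    if not el.has_attr('class'):
        return None
    classes = el['class']
    for name in classes:
        for prefix in ('language-', 'lang-'):
            if name.startswith(prefix):
                # Strip the prefix and keep only the language name.
                return name[len(prefix):]
    return classes[0] if classes else None


class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup tag, for demonstration only."""

    def __init__(self, **attrs):
        self.attrs = attrs

    def has_attr(self, key):
        return key in self.attrs

    def __getitem__(self, key):
        return self.attrs[key]


# Try the callback on stand-in elements with and without a class attribute.
print(language_from_class(FakeTag(**{'class': ['highlight', 'language-python']})))
print(language_from_class(FakeTag()))
```

With markdownify installed, such a function would be passed through the option described above, for example ``md(html, code_language_callback=language_from_class)``.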

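Two of the rules described under Options are easier to follow when written out: how the closing string for ``sub_symbol`` / ``sup_symbol`` is derived, and how ``bullets`` vary with nesting depth. The sketch below restates those rules in plain Python. It is not markdownify's own code; the function names are illustrative, and cycling through the bullets with a modulo is an assumption about what "alternate based on nesting level" means.

```python
def closing_symbol(symbol):
    """Return the string placed after sub/sup text for a given symbol.

    A value that looks like an HTML tag ('<sub>') gets a '/' inserted
    after the '<'; any other value is reused unchanged.
    """
    if symbol.startswith('<') and symbol.endswith('>'):
        return '</' + symbol[1:]
    return symbol


def wrap(text, symbol):
    """Surround text with the opening symbol and its derived closing string."""
    return symbol + text + closing_symbol(symbol)


def bullet_for_depth(bullets, depth):
    """Pick a bullet for a nesting depth (0 = top level).

    Assumption: bullets are cycled, so a single-item iterable always
    yields the same bullet regardless of depth.
    """
    return bullets[depth % len(bullets)]


print(wrap('2', '~'))        # plain symbol on both sides
print(wrap('2', '<sub>'))    # HTML tag with a generated closing tag
print([bullet_for_depth('*+-', d) for d in range(4)])
```

An empty symbol, which is the documented default, leaves the text unwrapped under this rule.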
Release History

Version | Changes | Urgency | Date
1.2.2 | Imported from PyPI (1.2.2) | Low | 4/21/2026
1.2.0 | What's Changed: Add beautiful_soup_parser option by @vincentkelleher in https://github.com/matthewwithanm/python-markdownify/pull/206; make convert_hn() public instead of internal by @chrispy-snps in https://github.com/matthewwithanm/python-markdownify/pull/213; Add conversion support for `<q>` tags by @colinrobinsonuib in https://github.com/matthewwithanm/python-markdownify/pull/217; Ensure that explicitly provided heading conversion functions are used (#212) by @chrispy-snps in http | Low | 8/9/2025
1.1.0 | What's Changed: Support `video` tag with `poster` attribute by @itmammoth in https://github.com/matthewwithanm/python-markdownify/pull/189; Add missing newlines for definition lists by @chrispy-snps in https://github.com/matthewwithanm/python-markdownify/pull/200; In inline contexts, resolve `<br/>` to a space instead of an empty string by @chrispy-snps in https://github.com/matthewwithanm/python-markdownify/pull/202; Generalize `colspan` handling to handle missing header rows by @sbr | Low | 3/5/2025
1.0.0 | Breaking Changes: If you are using custom tag conversion functions (`convert_*()`), note that the function interface has changed. See #191 for details. What's Changed: Do not construct Markdown links in code spans and code blocks by @chrispy-snps in https://github.com/matthewwithanm/python-markdownify/pull/165; Insert a blank line between table caption, table content by @chrispy-snps in https://github.com/matthewwithanm/python-markdownify/pull/167; Allow a `wrap_width` value of | Low | 2/24/2025
0.14.1 | Fixes technical errors regarding the heading tag: https://github.com/matthewwithanm/python-markdownify/issues/142, https://github.com/matthewwithanm/python-markdownify/issues/143. Full Changelog: https://github.com/matthewwithanm/python-markdownify/compare/0.14.0...0.14.1 | Low | 11/24/2024
0.14.0 | What's Changed: More carefully separate inline text from block content by @jsm28 in https://github.com/matthewwithanm/python-markdownify/pull/120; More selective escaping of `-#.)` (alternative approach) by @jsm28 in https://github.com/matthewwithanm/python-markdownify/pull/149; More thorough cleanup of input whitespace by @jsm28 in https://github.com/matthewwithanm/python-markdownify/pull/151; Fix logic for indentation inside list items by @jsm28 in https://github.com/matthewwithanm/ | Low | 11/24/2024
0.13.1 | What's Changed: Migrated the metadata into PEP 621-compliant pyproject.toml by @KOLANICH in https://github.com/matthewwithanm/python-markdownify/pull/138. Full Changelog: https://github.com/matthewwithanm/python-markdownify/compare/0.13.0...0.13.1 | Low | 7/14/2024
0.13.0 | What's Changed: Avoid inline styles inside `<code>` / `<pre>` conversion by @jsm28 in https://github.com/matthewwithanm/python-markdownify/pull/117; Escape all characters with Markdown significance by @jsm28 in https://github.com/matthewwithanm/python-markdownify/pull/118; Update MANIFEST.in to exclude tests during packaging by @samypr100 in https://github.com/matthewwithanm/python-markdownify/pull/125; Special-case use of HTML tags for converting `<sub>` / `<sup>` by @jsm28 in https: | Low | 7/14/2024
0.12.1 | Release 0.12.1 | Low | 3/26/2024
0.12.0 | Huge thanks to all the contributors! What's Changed: improve text normalization/escaping for preformatted/code contexts by @chrispy-snps in https://github.com/matthewwithanm/python-markdownify/pull/104; ignore script and style content (such as css and javascript) by @tlk in https://github.com/matthewwithanm/python-markdownify/pull/112; Add no css example to readme by @GeeCastro in https://github.com/matthewwithanm/python-markdownify/pull/111; Fix newline start in header tags by @5y | Low | 3/26/2024
0.11.6 | Release 0.11.6 | Low | 9/2/2022
0.11.5 | Release 0.11.5 | Low | 8/31/2022
0.11.4 | Release 0.11.4 | Low | 8/28/2022
0.11.3 | Thanks to all contributors! :) | Low | 8/28/2022
0.11.2 | Release 0.11.2 | Low | 4/24/2022
0.11.1 | Thanks to @mvkorpel | Low | 4/14/2022
0.11.0 | Thanks to all contributors! :partying_face: | Low | 4/13/2022
0.10.3 | Release 0.10.3 | Low | 1/23/2022
0.10.2 | Release 0.10.2 | Low | 1/18/2022
0.10.1 | Thanks to milahu! | Low | 12/11/2021
0.10.0 | Thanks to @Inzaniak! | Low | 11/17/2021
0.9.4 | Release 0.9.4 | Low | 9/4/2021
0.9.3 | Release 0.9.3 | Low | 8/25/2021
0.9.2 | Release 0.9.2 | Low | 7/11/2021
0.9.1 | Release 0.9.1 | Low | 7/11/2021
0.9.0 | Release 0.9.0 | Low | 5/30/2021
0.8.1 | And testing CData | Low | 5/30/2021
0.8.0 | Now supporting code, samp, kbd, pre, del, s | Low | 5/21/2021
0.7.4 | Allowing for HTML Tags inside table cells and the omission of a table header row | Low | 5/18/2021
0.7.3 | Release 0.7.3 | Low | 5/16/2021
0.7.2 | Thanks to all collaborators! | Low | 5/2/2021
0.7.1 | Release 0.7.1 | Low | 5/2/2021
0.7.0 | Release 0.7.0 | Low | 4/22/2021
0.6.6 | Release 0.6.6 | Low | 4/22/2021
0.6.5 | This just adds the 3.x tags in PyPi and sets some dependencies straight. Only one test needed an update. | Low | 2/21/2021
0.6.4 | This adds newlines after blockquotes to allow for following paragraphs. Thanks to all contributors! | Low | 2/21/2021
0.6.3 | Release 0.6.3 | Low | 1/12/2021
0.6.1 | Release 0.6.1 | Low | 1/4/2021
0.6.0 | Release 0.6.0 | Low | 12/13/2020
0.5.3 | Release 0.5.3 | Low | 9/1/2020
0.5.2 | Release 0.5.2 | Low | 8/18/2020
0.5.1 | Release 0.5.1 | Low | 8/11/2020
