
Description

<p align="center"> <br> <img src="https://huggingface.co/landing/assets/tokenizers/tokenizers-logo.png" width="600"/> <br> <p> <p align="center"> <a href="https://badge.fury.io/py/tokenizers"> <img alt="Build" src="https://badge.fury.io/py/tokenizers.svg"> </a> <a href="https://github.com/huggingface/tokenizers/blob/master/LICENSE"> <img alt="GitHub" src="https://img.shields.io/github/license/huggingface/tokenizers.svg?color=blue"> </a> </p> <br> # Tokenizers Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Bindings over the [Rust](https://github.com/huggingface/tokenizers/tree/master/tokenizers) implementation. If you are interested in the High-level design, you can go check it there. Otherwise, let's dive in! ## Main features: - Train new vocabularies and tokenize using 4 pre-made tokenizers (Bert WordPiece and the 3 most common BPE versions). - Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU. - Easy to use, but also extremely versatile. - Designed for research and production. - Normalization comes with alignments tracking. It's always possible to get the part of the original sentence that corresponds to a given token. - Does all the pre-processing: Truncate, Pad, add the special tokens your model needs. ### Installation #### With pip: ```bash pip install tokenizers ``` #### From sources: To use this method, you need to have the Rust installed: ```bash # Install with: curl https://sh.rustup.rs -sSf | sh -s -- -y export PATH="$HOME/.cargo/bin:$PATH" ``` Once Rust is installed, you can compile doing the following ```bash git clone https://github.com/huggingface/tokenizers cd tokenizers/bindings/python # Create a virtual env (you can use yours as well) python -m venv .env source .env/bin/activate # Install `tokenizers` in the current virtual env pip install -e . 
``` ### Load a pretrained tokenizer from the Hub ```python from tokenizers import Tokenizer tokenizer = Tokenizer.from_pretrained("bert-base-cased") ``` ### Using the provided Tokenizers We provide some pre-build tokenizers to cover the most common cases. You can easily load one of these using some `vocab.json` and `merges.txt` files: ```python from tokenizers import CharBPETokenizer # Initialize a tokenizer vocab = "./path/to/vocab.json" merges = "./path/to/merges.txt" tokenizer = CharBPETokenizer(vocab, merges) # And then encode: encoded = tokenizer.encode("I can feel the magic, can you?") print(encoded.ids) print(encoded.tokens) ``` And you can train them just as simply: ```python from tokenizers import CharBPETokenizer # Initialize a tokenizer tokenizer = CharBPETokenizer() # Then train it! tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ]) # Now, let's use it: encoded = tokenizer.encode("I can feel the magic, can you?") # And finally save it somewhere tokenizer.save("./path/to/directory/my-bpe.tokenizer.json") ``` #### Provided Tokenizers - `CharBPETokenizer`: The original BPE - `ByteLevelBPETokenizer`: The byte level version of the BPE - `SentencePieceBPETokenizer`: A BPE implementation compatible with the one used by SentencePiece - `BertWordPieceTokenizer`: The famous Bert tokenizer, using WordPiece All of these can be used and trained as explained above! ### Build your own Whenever these provided tokenizers don't give you enough freedom, you can build your own tokenizer, by putting all the different parts you need together. You can check how we implemented the [provided tokenizers](https://github.com/huggingface/tokenizers/tree/master/bindings/python/py_src/tokenizers/implementations) and adapt them easily to your own needs. 
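The file paths in the training example above are placeholders. If your corpus is already in memory, a tokenizer can also be trained from any iterator of strings, and the alignment tracking mentioned in the features list is exposed through `Encoding.offsets`. A minimal sketch (the toy corpus and hyperparameters here are illustrative, not canonical):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a tiny BPE tokenizer from an in-memory iterator (illustrative corpus)
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(special_tokens=["[UNK]"], vocab_size=200)
tokenizer.train_from_iterator(["I can feel the magic, can you?"], trainer=trainer)

text = "I can feel the magic"
encoded = tokenizer.encode(text)

# Each token carries (start, end) character offsets into the original text,
# so you can always recover the exact span a token came from
for token, (start, end) in zip(encoded.tokens, encoded.offsets):
    print(token, "->", repr(text[start:end]))
```

Because no normalizer is configured here, each token string matches its source span exactly; with normalization enabled, the offsets still point back into the original, un-normalized text.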
#### Building a byte-level BPE

Here is an example showing how to build your own byte-level BPE by putting all the different pieces together, and then saving it to a single file:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then train
trainer = trainers.BpeTrainer(
    vocab_size=20000,
    min_frequency=2,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet()
)
tokenizer.train([
    "./path/to/dataset/1.txt",
    "./path/to/dataset/2.txt",
    "./path/to/dataset/3.txt"
], trainer=trainer)

# And save it
tokenizer.save("byte-level-bpe.tokenizer.json", pretty=True)
```

Now, when you want to use this tokenizer, it is as simple as:

```python
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("byte-level-bpe.tokenizer.json")

encoded = tokenizer.encode("I can feel the magic, can you?")
```

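The pre-processing mentioned in the features list (truncation, padding, special tokens) is configured directly on the `Tokenizer`. A hedged sketch using a toy tokenizer trained on an in-memory corpus (the corpus, `[PAD]` token, and `max_length` are illustrative):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Toy tokenizer for illustration; any trained Tokenizer works the same way
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(["I can feel the magic, can you?"], trainer=trainer)

# Truncate every encoding to at most 4 tokens, and pad shorter ones
tokenizer.enable_truncation(max_length=4)
tokenizer.enable_padding(pad_id=tokenizer.token_to_id("[PAD]"), pad_token="[PAD]")

batch = tokenizer.encode_batch(["I can feel the magic, can you?", "magic"])
for enc in batch:
    # Padded positions get attention_mask == 0
    print(enc.tokens, enc.attention_mask)

# decode() maps ids back to text, skipping special tokens by default
print(tokenizer.decode(batch[0].ids))
```

With no `length` argument, `enable_padding` pads each batch to its longest sequence, so both encodings above come out at 4 tokens.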
Release History

**0.22.2** — 4/21/2026 — urgency: Low
Imported from PyPI (0.22.2)

**v0.22.2** — 12/2/2025 — urgency: Low
What's Changed — mostly doing the release for these PRs:
* Update deserialize of added tokens by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1891
* update stub for typing by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1896
* bump PyO3 to 0.26 by @davidhewitt in https://github.com/huggingface/tokenizers/pull/1901
<img width="2400" height="1200" alt="image" src="https://github.com/user-attachments/assets/0b974453-1fc6-4393-84ea-da99269e2b34" />

**v0.22.1** — 9/19/2025 — urgency: Low
Main changes:
- Bump huggingface_hub upper version (#1866) from @Wauplin
- chore(trainer): add and improve trainer signature (#1838) from @shenxiangzhuang
- Some doc updates: c91d76ae558ca2dc1aa725959e65dc21bf1fed7e, 7b0217894c1e2baed7354ab41503841b47af7cf9, 57eb8d7d9564621221784f7949b9efdeb7a49ac1

**v0.22.0** — 8/29/2025 — urgency: Low
What's Changed:
* Bump on-headers and compression in /tokenizers/examples/unstable_wasm/www by @dependabot[bot] in https://github.com/huggingface/tokenizers/pull/1827
* Implement `from_bytes` and `read_bytes` Methods in WordPiece Tokenizer for WebAssembly Compatibility by @sondalex in https://github.com/huggingface/tokenizers/pull/1758
* fix: use AHashMap to fix compile error by @b00f in https://github.com/huggingface/tokenizers/pull/1840
* New stream by @ArthurZucker in https://github.com…

**v0.21.4** — 7/28/2025 — urgency: Low
No change; the 0.21.3 release failed, this is just a re-release. See https://github.com/huggingface/tokenizers/releases/tag/v0.21.3
**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.21.3...v0.21.4

**v0.21.3** — 7/4/2025 — urgency: Low
What's Changed:
* Clippy fixes. by @Narsil in https://github.com/huggingface/tokenizers/pull/1818
* Fixed an introduced backward breaking change in our Rust APIs.
**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.21.2...v0.21.3

**v0.21.2** — 6/24/2025 — urgency: Low
This release is focused on performance optimization, enabling broader Python no-GIL support, and fixing some onig issues!
* Update the release builds following 0.21.1. by @Narsil in https://github.com/huggingface/tokenizers/pull/1746
* replace lazy_static with stabilized std::sync::LazyLock in 1.80 by @sftse in https://github.com/huggingface/tokenizers/pull/1739
* Fix no-onig no-wasm builds by @414owen in https://github.com/huggingface/tokenizers/pull/1772

**v0.21.1** — 3/13/2025 — urgency: Low
What's Changed:
* Update dev version and pyproject.toml by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1693
* Add feature flag hint to README.md, fixes #1633 by @sftse in https://github.com/huggingface/tokenizers/pull/1709
* Upgrade to PyO3 0.23 by @Narsil in https://github.com/huggingface/tokenizers/pull/1708
* Fixing the README. by @Narsil in https://github.com/huggingface/tokenizers/pull/1714
* Fix typo in Split docstrings by @Dylan-Harden3 in https://github.com/hug…

**v0.21.1rc0** — 3/12/2025 — urgency: Low
Release candidate; changelog identical to the v0.21.1 entry above.

**v0.21.0** — 11/15/2024 — urgency: Low
Release ~~v0.20.4~~ v0.21.0:
* More cache options. by @Narsil in https://github.com/huggingface/tokenizers/pull/1675
* Disable caching for long strings. by @Narsil in https://github.com/huggingface/tokenizers/pull/1676
* Testing ABI3 wheels to reduce number of wheels by @Narsil in https://github.com/huggingface/tokenizers/pull/1674
* Adding an API for decode streaming. by @Narsil in https://github.com/huggingface/tokenizers/pull/1677
* Decode stream python by @Narsil in https://github.com/…

**v0.20.3** — 11/5/2024 — urgency: Low
There was a breaking change in `0.20.3` for tuple inputs of `encode_batch`!
* fix pylist by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1673
* [MINOR:TYPO] Fix docstrings by @cakiki in https://github.com/huggingface/tokenizers/pull/1653
New Contributors: @cakiki made their first contribution in https://github.com/huggingface/tokenizers/pull/1653
**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.20.2...v0.20.3

**v0.20.2** — 11/4/2024 — urgency: Low
Thanks a MILE to @diliop, we now have support for Python 3.13! 🥳
* Bump cookie and express in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1648
* Fix off-by-one error in tokenizer::normalizer::Range::len by @rlanday in https://github.com/huggingface/tokenizers/pull/1638
* Arg name correction: auth_token -> token by @rravenel in https://github.com/huggingface/tokenizers/pull/1621
* Unsound ca…

**v0.20.1** — 10/10/2024 — urgency: Low
The most awaited `offset` issue with `Llama` is fixed 🥳
* Update README.md by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1608
* fix benchmark file link by @152334H in https://github.com/huggingface/tokenizers/pull/1610
* Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows by @dependabot in https://github.com/huggingface/tokenizers/pull/1626
* [`ignore_merges`] Fix offsets by @ArthurZucker in https://github.com/huggingface/tokenizer…

**v0.20.0** — 8/8/2024 — urgency: Low
This release is focused on **performance** and **user experience**. First off, we did a bit of benchmarking and found some room for improvement. With a few minor changes (mostly #1587), here is what we get on `Llama3` running on a g6 instance on AWS (https://github.com/huggingface/tokenizers/blob/main/bindings/python/benches/test_tiktoken.py):
![image](https://github.com/user-attachments/assets/e6838866-ec76-44ce-a7b6-532e56971234)
…

**v0.19.1** — 4/17/2024 — urgency: Low
* add serialization for `ignore_merges` by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1504
**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.19.0...v0.19.1

**v0.19.0** — 4/17/2024 — urgency: Low
* chore: Remove CLI - this was originally intended for local development by @bryantbiggs in https://github.com/huggingface/tokenizers/pull/1442
* [`remove black`] And use ruff by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1436
* Bump ip from 2.0.0 to 2.0.1 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1456
* Added ability to inspect a 'Sequence' decoder and the `AddedVocabulary`. by @eaplatanios in https://github.com…

**v0.19.0rc0** — 4/16/2024 — urgency: Low
Bumping 3 versions because of this: https://github.com/huggingface/transformers/blob/60dea593edd0b94ee15dc3917900b26e3acfbbee/setup.py#L177
* chore: Remove CLI - this was originally intended for local development by @bryantbiggs in https://github.com/huggingface/tokenizers/pull/1442
* [`remove black`] And use ruff by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1436
* Bump ip from 2.0.0 to 2.0.1 in /bindings/node by @dependabot in https://github.com/hug…

**v0.15.2** — 2/12/2024 — urgency: Low
Big shoutout to @rlrs for [the fast replace normalizers](https://github.com/huggingface/tokenizers/pull/1413) PR. This boosts the performance of the tokenizers:
![image](https://github.com/huggingface/tokenizers/assets/48595927/d8ee81b1-6d92-43d4-b74c-8775727763e3)
* chore: Update dependencies to latest supported versions by @bryantbiggs in https://github.com/huggingface/tokenizers/pull/1441
* Convert word counts to u64 by @stephenroller in https://github.com/huggingfac…

**v0.15.1** — 1/22/2024 — urgency: Low
* udpate to version = "0.15.1-dev0" by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1390
* Derive `Clone` on `Tokenizer`, add `Encoding.into_tokens()` method by @epwalsh in https://github.com/huggingface/tokenizers/pull/1381
* Stale bot. by @Narsil in https://github.com/huggingface/tokenizers/pull/1404
* Fix doc links in readme by @Pierrci in https://github.com/huggingface/tokenizers/pull/1367
* Faster HF dataset iteration in docs by @mariosasko in https…

**v0.15.1.rc0** — 1/18/2024 — urgency: Low
* pyo3: update to 0.19 by @mikelui in https://github.com/huggingface/tokenizers/pull/1322
* Add `expect()` for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
* Re-using scritpts from safetensors. by @Narsil in https://github.com/huggingface/tokenizers/pull/1328
* Reduce number of different revisions by 1 by @Narsil in https://github.com/huggingface/tokenizers/pull/1329
* Python 38 arm by @Narsil in https://github.com/huggingface…

**v0.15.0** — 11/14/2023 — urgency: Low
* fix a clerical error in the comment by @tiandiweizun in https://github.com/huggingface/tokenizers/pull/1356
* fix: remove useless token by @rtrompier in https://github.com/huggingface/tokenizers/pull/1371
* Bump @babel/traverse from 7.22.11 to 7.23.2 in /bindings/node by @dependabot in https://github.com/huggingface/tokenizers/pull/1370
* Allow hf_hub 0.18 by @mariosasko in https://github.com/huggingface/tokenizers/pull/1383
* Allow `huggingface_hub<1.0` by @Wauplin in …

**v0.14.1** — 10/6/2023 — urgency: Low
* Fix conda release by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1211
* Fix node release by @ArthurZucker in https://github.com/huggingface/tokenizers/pull/1212
* Printing warning to stderr. by @Narsil in https://github.com/huggingface/tokenizers/pull/1222
* Fixing padding_left sequence_ids. by @Narsil in https://github.com/huggingface/tokenizers/pull/1233
* Use LTO for release and benchmark builds by @csko in https://github.com/huggingface/tokenizers…

**v0.14.1rc1** — 10/5/2023 — urgency: Low
Changelog identical to the v0.15.1.rc0 entry above.

**v0.14.0** — 9/7/2023 — urgency: Low
⚠️ Reworks the release pipeline. Other breaking changes ⚠️:
- #1335: AddedToken is reworked; `is_special_token` renamed to `special` for consistency
- feature http is now `OFF` by default, and depends on hf-hub instead of cached_path (updated cache directory, better sync implementation)
- Removed SSL link on the python package, calling huggingface_hub directly instead.
- New dependency: huggingface_hub (while we deprecate Tokenizer.from_pretrained(...) to Tokenizer.from_file(hugginngface_h…

**v0.14.0.rc1** — 9/7/2023 — urgency: Low
Reworks the release pipeline. Other breaking changes are mostly related to https://github.com/huggingface/tokenizers/pull/1335, where AddedToken is reworked.
* pyo3: update to 0.19 by @mikelui in https://github.com/huggingface/tokenizers/pull/1322
* Add `expect()` for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
* Re-using scritpts from safetensors. by @Narsil in https://github.com/huggingface/tokenizers/pull/1328
* Reduce nu…

**v0.13.4.rc3** — 8/23/2023 — urgency: Low
Mostly checking the new release scripts actually work.
* pyo3: update to 0.19 by @mikelui in https://github.com/huggingface/tokenizers/pull/1322
* Add `expect()` for disabling truncation by @boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
* Re-using scritpts from safetensors. by @Narsil in https://github.com/huggingface/tokenizers/pull/1328
New Contributors: @mikelui made their first contribution in https://github.com/huggingface/tokenizers/pull…

**v0.13.4.rc2** — 8/14/2023 — urgency: Low
* Fix stride condition. by @Narsil in https://github.com/huggingface/tokenizers/pull/1321
**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc1...v0.13.4.rc2

**v0.13.4.rc1** — 8/14/2023 — urgency: Low
* Update all GH Actions with dependency on actions/checkout by @mfuntowicz in https://github.com/huggingface/tokenizers/pull/1256
* Parallelize unigram trainer by @mishig25 in https://github.com/huggingface/tokenizers/pull/976
* Update unigram/trainer.rs by @chris-ha458 in https://github.com/huggingface/tokenizers/pull/1257
* Fixing broken link. by @Narsil in https://github.com/huggingface/tokenizers/pull/1268
* fix documentation regarding regex by @chris-ha458 in https:/…

**v0.13.4-rc2** — 5/17/2023 — urgency: Low
Release v0.13.4-rc2

**v0.13.4-rc1** — 5/15/2023 — urgency: Low
Release v0.13.4-rc1

**node-v0.13.3** — 4/5/2023 — urgency: Low
* Update pr docs actions by @mishig25 in https://github.com/huggingface/tokenizers/pull/1101
* Adding rust audit. by @Narsil in https://github.com/huggingface/tokenizers/pull/1099
* Revert "Update pr docs actions" by @mishig25 in https://github.com/huggingface/tokenizers/pull/1107
* Bump loader-utils from 1.4.0 to 1.4.2 in /tokenizers/examples/unstable_wasm/www by @dependabot in https://github.com/huggingface/tokenizers/pull/1108
* Include license file in Rust crate by @an…

**v0.13.3** — 4/5/2023 — urgency: Low
Changelog identical to the node-v0.13.3 entry above.

**python-v0.13.3** — 4/5/2023 — urgency: Low
Changelog identical to the node-v0.13.3 entry above.

**python-v0.13.3rc1** — 4/4/2023 — urgency: Low
Changelog identical to the node-v0.13.3 entry above.

**node-v0.13.2** — 11/7/2022 — urgency: Low
Python 3.11 support (Python only modification)

**v0.13.2** — 11/7/2022 — urgency: Low
Python 3.11 support (Python only modification)

**python-v0.13.2** — 11/7/2022 — urgency: Low
[0.13.2] - [#1096] Python 3.11 support

**node-v0.13.1** — 10/6/2022 — urgency: Low
[0.13.1] - [#1072] Fixing Roberta type ids.

**v0.13.1** — 10/6/2022 — urgency: Low
[0.13.1] - [#1072] Fixing Roberta type ids.

**python-v0.13.1** — 10/6/2022 — urgency: Low
[0.13.1] - [#1072] Fixing Roberta type ids.

**python-v0.13.0** — 9/21/2022 — urgency: Low
- [#956] PyO3 version upgrade
- [#1055] M1 automated builds
- [#1008] `Decoder` is now a composable trait, but without being backward incompatible
- [#1047, #1051, #1052] `Processor` is now a composable trait, but without being backward incompatible
Both trait changes warrant a "major" number since, despite best efforts to not break backward compatibility, the code is different enough that we cannot be exactly sure.

**node-v0.13.0** — 9/19/2022 — urgency: Low
- [#1008] `Decoder` is now a composable trait, but without being backward incompatible
- [#1047, #1051, #1052] `Processor` is now a composable trait, but without being backward incompatible

**v0.13.0** — 9/19/2022 — urgency: Low
- [#1009] `unstable_wasm` feature to support building on Wasm (it's unstable!)
- [#1008] `Decoder` is now a composable trait, but without being backward incompatible
- [#1047, #1051, #1052] `Processor` is now a composable trait, but without being backward incompatible
Both trait changes warrant a "major" number since, despite best efforts to not break backward compatibility, the code is different enough that we cannot be exactly sure.

**python-v0.12.1** — 4/13/2022 — urgency: Low
- [#938] **Reverted breaking change**. https://github.com/huggingface/transformers/issues/16520

**node-v0.12.0** — 3/31/2022 — urgency: Low
The breaking change was causing more issues upstream in `transformers` than anticipated: https://github.com/huggingface/transformers/pull/16537#issuecomment-1085682657 The decision was to roll back that breaking change and figure out a different way to do this modification later. Bump minor version because of a breaking change. Using `0.12` to match other bindings.
- [#938] **Breaking change**. Decoder trait is modified to be composable. This is only breaking if you…

**python-v0.12.0** — 3/31/2022 — urgency: Low
The breaking change was causing more issues upstream in `transformers` than anticipated: https://github.com/huggingface/transformers/pull/16537#issuecomment-1085682657 The decision was to roll back that breaking change and figure out a different way to do this modification later. Bump minor version because of a breaking change.
- [#938] **Breaking change**. Decoder trait is modified to be composable. This is only breaking if you are using decoders on their own. tokeniz…

**v0.12.0** — 3/31/2022 — urgency: Low
Bump minor version because of a breaking change. The breaking change was causing more issues upstream in `transformers` than anticipated: https://github.com/huggingface/transformers/pull/16537#issuecomment-1085682657 The decision was to roll back that breaking change and figure out a different way to do this modification later.
- [#938] **Breaking change**. Decoder trait is modified to be composable. This is only breaking if you are using decoders on their own. tokeniz…

**v0.11.2** — 2/28/2022 — urgency: Low
- [#919] Fixing single_word AddedToken. (regression from 0.11.2)
- [#916] Deserializing faster `added_tokens` by loading them in batch.

**node-v0.8.3** — 2/28/2022 — urgency: Low
Release node-v0.8.3

**python-v0.11.6** — 2/28/2022 — urgency: Low
- [#919] Fixing single_word AddedToken. (regression from 0.11.2)
- [#916] Deserializing faster `added_tokens` by loading them in batch.

**python-v0.11.5** — 2/16/2022 — urgency: Low
[#895] Add wheel support for Python 3.10


Similar Packages

- **outlines-core** (0.2.14) — Structured Text Generation in Rust
- **pytorch-lightning** (2.6.1) — PyTorch Lightning is the lightweight PyTorch wrapper for ML researchers. Scale your models. Write less boilerplate.
- **nvidia-cuda-cupti-cu12** (12.9.79) — CUDA profiling tools runtime libs.
- **apache-tvm-ffi** (0.1.10) — tvm ffi
- **magika** (1.0.2) — A tool to determine the content type of a file with deep learning