Description
# BinaryOrNot

Python library and CLI tool to check if a file is binary or text. Zero dependencies.

```python
from binaryornot.check import is_binary

is_binary("image.png")    # True
is_binary("README.md")    # False
is_binary("data.sqlite")  # True
is_binary("report.csv")   # False
```

```sh
$ binaryornot image.png
True
```

## Install

```sh
pip install binaryornot
```

## Why not just check for null bytes?

That's the first thing everyone tries. It works until it doesn't:

- A UTF-16 text file is full of null bytes. Your tool thinks it's binary and corrupts it.
- A Big5 or GB2312 text file has high-ASCII bytes everywhere. Looks binary by byte ratios alone.
- A font file (.woff, .eot) is clearly binary but might not have null bytes in the first chunk.

BinaryOrNot reads the first 128 bytes and runs them through a trained decision tree that considers byte ratios, Shannon entropy, encoding validity, BOM detection, and more. It handles all the edge cases above correctly, with zero dependencies.

Tested against [37 text encodings and 49 binary formats](https://binaryornot.github.io/binaryornot/usage/), verified by parametrized tests driven from coverage CSVs.

## API

One function:

```python
from binaryornot.check import is_binary

is_binary(filename)  # returns True or False
```

There's also `is_binary_string()` if you already have bytes:

```python
from binaryornot.helpers import is_binary_string

is_binary_string(b"\x00\x01\x02")  # True
is_binary_string(b"hello world")   # False
```

[Full documentation](https://binaryornot.github.io/binaryornot/) covers the detection algorithm in detail.

## Credits

Created by [Audrey Roy Greenfeld](https://audrey.feldroy.com).
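To make the point of "Why not just check for null bytes?" concrete, here is a minimal standalone sketch of the naive check failing in both directions, together with two of the signals the README names (BOM detection and Shannon entropy). The helpers `has_null_byte`, `starts_with_utf16_bom`, and `shannon_entropy` are hypothetical names written for this illustration; they are not part of BinaryOrNot and do not reproduce its trained decision tree.

```python
import codecs
import math
from collections import Counter


def has_null_byte(chunk: bytes) -> bool:
    """Naive heuristic: call a chunk binary if it contains any NUL byte."""
    return b"\x00" in chunk


def starts_with_utf16_bom(chunk: bytes) -> bool:
    """Check for a UTF-16 byte order mark (either endianness)."""
    return chunk.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Plain text encoded as UTF-16: a BOM followed by two bytes per character.
utf16_text = "hello world".encode("utf-16")
ascii_text = b"hello world"
# Made-up stand-in for a binary chunk with no NUL bytes (not a real font header).
no_null_binary = bytes(range(1, 129))

print(has_null_byte(utf16_text))      # text, but flagged as binary
print(has_null_byte(no_null_binary))  # binary-looking, but passes as text
print(starts_with_utf16_bom(utf16_text))
print(shannon_entropy(ascii_text), shannon_entropy(no_null_binary))
```

The naive check misfires on the UTF-16 text and misses the NUL-free chunk, while the BOM and entropy signals separate the two cases. No single signal is reliable on its own, which is presumably why the library combines many of them.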
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 0.6.0 | Imported from PyPI (0.6.0) | Low | 4/21/2026 |
| v0.6.0 | BinaryOrNot identifies binary files three ways: by extension, by file signature, and by content analysis. Pass it any file path and it tells you binary or text, accurately, across PNGs, PDFs, executables, archives, fonts, CJK-encoded text, and hundreds of other formats. Upgrade: `uv pip install --upgrade binaryornot`. What's new: **131 file types recognized by name.** `is_binary()` checks the filename extension against a curated list of binary types (images, audio, video, archives, executables | Low | 3/8/2026 |
| v0.5.0 | This is the biggest release in BinaryOrNot's history. I rebuilt the detection engine from the ground up. The original used byte ratio heuristics with chardet as a second opinion for ambiguous files. I replaced all of that with a trained decision tree operating on 23 features, covering 49 binary formats and 37 text encodings, with zero external dependencies. It's backed by 211 tests and a training pipeline you can re-run yourself. If you've ever had BinaryOrNot misidentify a UTF-16 file, choke on | Low | 3/7/2026 |
| 0.4.0 | - Enhanced detection for some binary streams and UTF texts. (#10, #11) Thanks @pombredanne. - Set up Appveyor for continuous testing on Windows. Thanks @pydanny. - Update link to Perl source implementation. (#9) Thanks @asmeurer @pombredanne @audreyr. - Handle UnicodeDecodeError in check. (#12) Thanks @DRMacIver. - Add very simple Hypothesis based tests. (#13) Thanks @DRMacIver. - Use setup to determine requirements and remove redundant requirements.txt. (#14) Thanks @hackebrot. - Add documentati | Low | 8/22/2015 |
