Description
# Langcodes: a library for language codes

**langcodes** knows what languages are. It knows the standardized codes that refer to them, such as `en` for English, `es` for Spanish and `hi` for Hindi.

These are [IETF language tags][]. You may know them by their old name, ISO 639 language codes. IETF has done some important things for backward compatibility and supporting language variations that you won't find in the ISO standard.

[IETF language tags]: https://www.w3.org/International/articles/language-tags/

It may sound to you like langcodes solves a pretty boring problem. At one level, that's right. Sometimes you have a boring problem, and it's great when a library solves it for you.

But there's an interesting problem hiding in here. How do you work with language codes? How do you know when two different codes represent the same thing? How should your code represent relationships between codes, like the following?

* `eng` is equivalent to `en`.
* `fra` and `fre` are both equivalent to `fr`.
* `en-GB` might be written as `en-gb` or `en_GB`. Or as `en-UK`, which is erroneous, but should be treated as the same.
* `en-CA` is not exactly equivalent to `en-US`, but it's really, really close.
* `en-Latn-US` is equivalent to `en-US`, because written English must be written in the Latin alphabet to be understood.
* The difference between `ar` and `arb` is the difference between "Arabic" and "Modern Standard Arabic", a difference that may not be relevant to you.
* You'll find Mandarin Chinese tagged as `cmn` on Wiktionary, but many other resources would call the same language `zh`.
* Chinese is written in different scripts in different territories. Some software distinguishes the script. Other software distinguishes the territory. The result is that `zh-CN` and `zh-Hans` are used interchangeably, as are `zh-TW` and `zh-Hant`, even though occasionally you'll need something different such as `zh-HK` or `zh-Latn-pinyin`.
* The Indonesian (`id`) and Malaysian (`ms` or `zsm`) languages are mutually intelligible.
* `jp` is not a language code. (The language code for Japanese is `ja`, but people confuse it with the country code for Japan.)

One way to know is to read IETF standards and Unicode technical reports. Another way is to use a library that implements those standards and guidelines for you, which langcodes does.

When you're working with these short language codes, you may want to see the name that the language is called _in_ a language: `fr` is called "French" in English. That language doesn't have to be English: `fr` is called "français" in French. A supplement to langcodes, [`language_data`][language-data], provides this information.

[language-data]: https://github.com/rspeer/language_data

langcodes is maintained by Elia Robyn Lake a.k.a. Robyn Speer, and is released as free software under the MIT license.

## Standards implemented

Although this is not the only reason to use it, langcodes will make you more acronym-compliant.

langcodes implements [BCP 47](http://tools.ietf.org/html/bcp47), the IETF Best Current Practices on Tags for Identifying Languages. BCP 47 is also known as RFC 5646. It subsumes ISO 639 and is backward compatible with it, and it also implements recommendations from the [Unicode CLDR](http://cldr.unicode.org).

langcodes can also refer to a database of language properties and names, built from Unicode CLDR and the IANA subtag registry, if you install `language_data`.

In summary, langcodes takes language codes and does the Right Thing with them, and if you want to know exactly what the Right Thing is, there are some documents you can go read.

# Documentation

## Standardizing language tags

This function standardizes tags, as strings, in several ways.
It replaces overlong tags with their shortest version, and also formats them according to the conventions of BCP 47:

    >>> from langcodes import *
    >>> standardize_tag('eng_US')
    'en-US'

It removes script subtags that are redundant with the language:

    >>> standardize_tag('en-Latn')
    'en'

It replaces deprecated values with their correct versions, if possible:

    >>> standardize_tag('en-uk')
    'en-GB'

Sometimes this involves complex substitutions, such as replacing Serbo-Croatian (`sh`) with Serbian in Latin script (`sr-Latn`), or the entire tag `sgn-US` with `ase` (American Sign Language).

    >>> standardize_tag('sh-QU')
    'sr-Latn-EU'

    >>> standardize_tag('sgn-US')
    'ase'

If *macro* is True, it uses macrolanguage codes as a replacement for the most common standardized language within that macrolanguage.

    >>> standardize_tag('arb-Arab', macro=True)
    'ar'

Even when *macro* is False, it shortens tags that contain both the macrolanguage and the language:

    >>> standardize_tag('zh-cmn-hans-cn')
    'zh-Hans-CN'

If the tag can't be parsed according to BCP 47, this will raise a LanguageTagError (a subclass of ValueError):

    >>> standardize_tag('spa-latn-mx')
    'es-MX'
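To make the "formats them according to the conventions of BCP 47" step concrete, here is a minimal standard-library sketch of just the separator and casing conventions. The helper name `format_tag_case` is hypothetical and is not part of langcodes; it is not how langcodes is implemented, and it does none of the other work described above (shortening `eng` to `en`, dropping redundant scripts, replacing deprecated values, or validating the tag).

```python
import re


def format_tag_case(tag: str) -> str:
    """Apply BCP 47 separator and casing conventions to a language tag.

    Hypothetical illustration only, not a langcodes function.
    It does not validate subtags or substitute any values.
    """
    # Accept both '-' and '_' as separators, as in 'en_GB'.
    subtags = re.split(r"[-_]", tag.strip())
    formatted = []
    after_singleton = False  # True once an extension/private-use marker is seen

    for position, subtag in enumerate(subtags):
        cased = position > 0 and not after_singleton and subtag.isalpha()
        if cased and len(subtag) == 4:
            # Four-letter subtags (scripts) are written in title case.
            formatted.append(subtag.title())
        elif cased and len(subtag) == 2:
            # Two-letter subtags (regions) are written in upper case.
            formatted.append(subtag.upper())
        else:
            # Everything else, including the language itself, is lower case.
            formatted.append(subtag.lower())
        if len(subtag) == 1:
            after_singleton = True

    return "-".join(formatted)


print(format_tag_case("en_gb"))
print(format_tag_case("ZH-hans-cn"))
```

Because this sketch only changes case and separators, `format_tag_case('eng_US')` gives `'eng-US'`, whereas `standardize_tag('eng_US')` in the example above also shortens the language code and returns `'en-US'`.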
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 3.5.1 | Imported from PyPI (3.5.1) | Low | 4/21/2026 |
| v3.5.1 | ## What's Changed * style: fix typos by @kianmeng in https://github.com/georgkrause/langcodes/pull/28 * fix: Do not install language-data by default by @georgkrause in https://github.com/georgkrause/langcodes/pull/30 * fix: Add warning that best_match is deprecated by @georgkrause in https://github.com/georgkrause/langcodes/pull/31 * chore(deps): update dependency python to 3.14 by @renovate[bot] in https://github.com/georgkrause/langcodes/pull/41 * chore(deps): update actions/upload-artifa | Low | 12/2/2025 |
| v3.5.0 | ## What's Changed * This release adds support for python 3.13 and drops support for python 3.8 * feat: Added `ignore_script` and tested it. by @mtrd3v in https://github.com/georgkrause/langcodes/pull/17 * Run tests against Python 3.13 by @georgkrause in https://github.com/georgkrause/langcodes/pull/24 * ci: Allow Action comments on fork PRs by @georgkrause in https://github.com/georgkrause/langcodes/pull/25 * chore: Drop support for python 3.8, bump python 3.12 to stable by @georgkrause in | Low | 11/19/2024 |
| v3.4.1 | ## What's Changed * fix: Rework Language.__hash__ by @moi15moi in https://github.com/georgkrause/langcodes/pull/20 * chore(deps): update actions/setup-python action to v5 by @renovate in https://github.com/georgkrause/langcodes/pull/12 * chore: Increase coverage by actually executing all tests ## New Contributors * @moi15moi made their first contribution in https://github.com/georgkrause/langcodes/pull/20 **Full Changelog**: https://github.com/georgkrause/langcodes/compare/v3.4.0...v3. | Low | 9/25/2024 |
| v3.4.0 | ## What's Changed * Compatibility with python > 3.10 * Configure Renovate by @renovate in https://github.com/georgkrause/langcodes/pull/7 * ci: Run tests in pipeline by @georgkrause in https://github.com/georgkrause/langcodes/pull/6 * chore(deps): update actions/setup-python action to v5 by @renovate in https://github.com/georgkrause/langcodes/pull/9 * Update language data 1.2.0 by @georgkrause in https://github.com/georgkrause/langcodes/pull/10 * feat: Automate pypi deployment by @georgkr | Low | 4/24/2024 |
| v3.4.0-dev2 | ## What's Changed * Configure Renovate by @renovate in https://github.com/georgkrause/langcodes/pull/7 * ci: Run tests in pipeline by @georgkrause in https://github.com/georgkrause/langcodes/pull/6 * chore(deps): update actions/setup-python action to v5 by @renovate in https://github.com/georgkrause/langcodes/pull/9 * Update language data 120 by @georgkrause in https://github.com/georgkrause/langcodes/pull/10 ## New Contributors * @renovate made their first contribution in https://github | Low | 4/24/2024 |
| v3.4.0-dev1 | ## What's Changed * Configure Renovate by @renovate in https://github.com/georgkrause/langcodes/pull/7 * ci: Run tests in pipeline by @georgkrause in https://github.com/georgkrause/langcodes/pull/6 * chore(deps): update actions/setup-python action to v5 by @renovate in https://github.com/georgkrause/langcodes/pull/9 * Update language data 1.2.0 by @georgkrause in https://github.com/georgkrause/langcodes/pull/10 ## New Contributors * @georgkrause made their first contribution in https://g | Low | 4/24/2024 |
