Description
Computing dot-products, similarity measures, and distances between low- and high-dimensional vectors is ubiquitous in Machine Learning, Scientific Computing, Geospatial Analysis, and Information Retrieval. These algorithms generally have linear complexity in time, constant or linear complexity in space, and are data-parallel. In other words, they are easy to parallelize and vectorize, and are often available in packages like BLAS (level 1) and LAPACK, as well as the higher-level `numpy` and `scipy` Python libraries. Ironically, even with decades of evolution in compilers and numerical computing, [most libraries can be 3-200x slower than hardware potential][benchmarks], even on the most popular hardware, like 64-bit x86 and Arm CPUs. Moreover, most lack mixed-precision support, which is crucial for modern AI! The rare few that offer minimal mixed-precision support run on only one platform and are locked to vendors like Intel and Nvidia.

SimSIMD provides an alternative. 1️⃣ SimSIMD functions are practically as fast as `memcpy`. 2️⃣ Unlike BLAS, most kernels are designed for mixed-precision and bit-level operations. 3️⃣ SimSIMD often [ships more binaries than NumPy][compatibility] and has more backends than most BLAS implementations, and more high-level interfaces than most libraries.
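
As a point of reference, the "linear in time, constant in space" shape of these kernels can be written as single-pass scalar loops. The following is a minimal pure-Python sketch, not SimSIMD's implementation: the function names and the zero-vector convention in `cosine_distance` are assumptions made for illustration.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product: one multiply-add per dimension, one accumulator."""
    _check(a, b)
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def sqeuclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean (L2) distance."""
    _check(a, b)
    total = 0.0
    for x, y in zip(a, b):
        d = x - y
        total += d * d
    return total


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, accumulating all three sums in a single pass."""
    _check(a, b)
    ab = aa = bb = 0.0
    for x, y in zip(a, b):
        ab += x * y
        aa += x * x
        bb += y * y
    # Assumed convention: two zero vectors are identical, one zero vector is maximally distant.
    if aa == 0.0 and bb == 0.0:
        return 0.0
    if aa == 0.0 or bb == 0.0:
        return 1.0
    return 1.0 - ab / math.sqrt(aa * bb)
```

Each loop iteration is independent of the others apart from the accumulators, which is what makes these functions straightforward to vectorize.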
[benchmarks]: https://ashvardanian.com/posts/simsimd-faster-scipy
[compatibility]: https://pypi.org/project/simsimd/#files

<div>
<a href="https://pepy.tech/project/simsimd"> <img alt="PyPI" src="https://static.pepy.tech/personalized-badge/simsimd?period=total&units=abbreviation&left_color=black&right_color=blue&left_text=SimSIMD%20Python%20installs" /> </a>
<a href="https://www.npmjs.com/package/simsimd"> <img alt="npm" src="https://img.shields.io/npm/dy/simsimd?label=JavaScript%20NPM%20installs" /> </a>
<a href="https://crates.io/crates/simsimd"> <img alt="rust" src="https://img.shields.io/crates/d/simsimd?label=Rust%20Crate%20installs" /> </a>
<img alt="GitHub code size in bytes" src="https://img.shields.io/github/languages/code-size/ashvardanian/simsimd">
<a href="https://github.com/ashvardanian/SimSIMD/actions/workflows/release.yml"> <img alt="GitHub Actions Ubuntu" src="https://img.shields.io/github/actions/workflow/status/ashvardanian/SimSIMD/release.yml?branch=main&label=Ubuntu&logo=github&color=blue"> </a>
<a href="https://github.com/ashvardanian/SimSIMD/actions/workflows/release.yml"> <img alt="GitHub Actions Windows" src="https://img.shields.io/github/actions/workflow/status/ashvardanian/SimSIMD/release.yml?branch=main&label=Windows&logo=windows&color=blue"> </a>
<a href="https://github.com/ashvardanian/SimSIMD/actions/workflows/release.yml"> <img alt="GitHub Actions macOS" src="https://img.shields.io/github/actions/workflow/status/ashvardanian/SimSIMD/release.yml?branch=main&label=macOS&logo=apple&color=blue"> </a>
<a href="https://github.com/ashvardanian/SimSIMD/actions/workflows/release.yml"> <img alt="GitHub Actions CentOS Linux" src="https://img.shields.io/github/actions/workflow/status/ashvardanian/SimSIMD/release.yml?branch=main&label=CentOS&logo=centos&color=blue"> </a>
</div>

## Features

__SimSIMD__ (Arabic: "سيمسيم دي") is a mixed-precision math library of __over 350 SIMD-optimized kernels__ extensively used in AI, Search, and DBMS workloads.
Named after the iconic ["Open Sesame"](https://en.wikipedia.org/wiki/Open_sesame) command that opened doors to treasure in _Ali Baba and the Forty Thieves_, SimSIMD can help you 10x the cost-efficiency of your computational pipelines.
Implemented distance functions include:

- Euclidean (L2) and Cosine (Angular) spatial distances for Vector Search. _[docs][docs-spatial]_
- Dot-Products for real & complex vectors for DSP & Quantum computing. _[docs][docs-dot]_
- Hamming (~ Manhattan) and Jaccard (~ Tanimoto) bit-level distances. _[docs][docs-binary]_
- Set Intersections for Sparse Vectors and Text Analysis. _[docs][docs-sparse]_
- Mahalanobis distance and Quadratic forms for Scientific Computing. _[docs][docs-curved]_
- Kullback-Leibler and Jensen–Shannon divergences for probability distributions. _[docs][docs-probability]_
- Fused-Multiply-Add (FMA) and Weighted Sums to replace BLAS level 1 functions. _[docs][docs-fma]_
- For Levenshtein, Needleman–Wunsch, and Smith-Waterman, check [StringZilla][stringzilla].
- Haversine and Vincenty's formulae for Geospatial Analysis.

[docs-spatial]: #cosine-similarity-reciprocal-square-root-and-newton-raphson-iteration
[docs-curved]: #curved-spaces-mahalanobis-distance-and-bilinear-quadratic-forms
[docs-sparse]: #set-intersection-galloping-and-binary-search
[docs-binary]: https://github.com/ashvardanian/SimSIMD/pull/138
[docs-dot]: #complex-dot-products-conjugate-dot-products-and-complex-numbers
[docs-probability]: #logarithms-in-kullback-leibler--jensenshannon-divergences
[docs-fma]: #mixed-p
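
To make two of the less familiar families in the list above concrete, here is a minimal pure-Python sketch of the bit-level distances and the probability divergences. It is a reference formulation only, not SimSIMD's code: the function names are hypothetical, bit-vectors are assumed to be packed eight bits per byte, and the divergences use natural logarithms.

```python
import math
from typing import Sequence


def hamming_bits(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equally long packed bit-vectors."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def jaccard_bits(a: bytes, b: bytes) -> float:
    """Jaccard distance: 1 - |intersection| / |union| over the set bits."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same length")
    intersection = sum(bin(x & y).count("1") for x, y in zip(a, b))
    union = sum(bin(x | y).count("1") for x, y in zip(a, b))
    # Assumed convention: two empty sets are at distance zero.
    return 1.0 - intersection / union if union else 0.0


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetrized KL against the midpoint distribution."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

For example, `hamming_bits(b"\x0f", b"\xf0")` counts all eight bits as different, and the Jensen-Shannon divergence of two disjoint distributions such as `[1, 0]` and `[0, 1]` reaches its maximum of `ln 2`.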
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 6.5.16 | Imported from PyPI (6.5.16) | Low | 4/21/2026 |
| v7.6.0 | ## CUDA & C++ 20 Compatibility NVCC 13 caps its language-standard flag at C++20, and our multi-argument subscript overloads from C++23 P2128 made `tensor.hpp` unparseable by cudafe++. We added call-operator primaries that mirror every multi-argument subscript overload in the tensor view, span, and owning container types, and kept the bracket sugar behind an `__cpp_multidimensional_subscript` feature test so older toolchains pick the portable spelling automatically. Downstream CUDA callers now | High | 4/20/2026 |
| v7.5.0 | - Built-in OpenMP bundling for JS & Python - Intel Granite Rapids F16 → F32 GEMMs - Faster bit-vector population counts for Arm NEON - SME compatibility with non-Apple Clang on Apple machines - Hardening against MSan SVE false-positives, thanks to @alexey-milovidov - Hardening against GCC 13 Arm NEON code-gen bugs, thanks to @swasik - `_into` & `_parallel` GEMM Rust APIs: reusing memory & [ForkUnion](https://github.com/ashvardanian/ForkUnion) pools - De-vec | Medium | 4/14/2026 |
| v7.4.5 | - Improve: Vectorize F32 SME MaxSim finalizer (0daacf3b) - Improve: Remove centering from RMSD kernels (1a83ab4f) - Fix: Emulated vs native test durations (4266451d) | Medium | 4/6/2026 |
| v7.4.4 | - Fix: ARMv7 Rust cross-compilation with CC for versioned GCC (a5e67e60) - Make: `check_source_runs`-probing like `march=native` on MSVC (7a152f3b) - Fix: Drop `_MM_FROUND_NO_EXC` from `_mm256_cvtps_ph` calls (8649b0c0) - Fix: Guard against old MSVC preprocessor (25d33048) - Make: Enforce newer preprocessor in MSVC (be966af2) - Make: Cleaner CIBW artifact names & env forwarding (a6cf6424) - Make: Forward cross-compilation flags for macOS wheels (6ed3b8c2) - Make: Split ppc64le, s390x, i68 | Medium | 4/6/2026 |
| v7.4.3 | Release: v7.4.3 [skip ci] ### Patch - Fix: Require AArch64 for NEON kernels (2ba1b343) - Docs: Table order & formatting (8673a56f) - Make: Avoid `--all-features` in Rust cross-compilation CI (8be8bffe) - Improve: Arm32 compatibility (64041725) - Make: `cancel-in-progress` CI to shift compute resources (dfc8fa02) - Improve: Harden Swift SDK for 6.1+ toolkit (965cd524) - Make: Strip `.unsafeFlags` & list platforms for SPM consumption (b061b78d) - Make: Expose `CNumKongDispatch` target to Swift us | Medium | 4/5/2026 |
| v7.4.2 | Release: v7.4.2 [skip ci] ### Patch - Docs: Shrink tables in the main README (6d2ea345) - Make: Inline Power Shell cross-compilation logic in CI (974c30ca) - Make: Define `_ARM64_` for Arm JS builds in MSVC (f3030420) - Make: Skip same-named artifacts on CI reruns (7c098e51) | Medium | 4/5/2026 |
| v7.4.1 | Release: v7.4.1 [skip ci] ### Patch - Make: Set `repository.url` for NPM (385480d2) - Make: Pull MSVC ARM64 Cross-Compiler (e20c93ef) - Fix: Swap `f16x8` for `u16x8` in `cast_neon` (154ec5db) | Medium | 4/5/2026 |
| v7.4.0 | - Faster tensor contractions - Faster GEMM "packers" with SIMD - New SVE+SDOT kernels for `i8` - MSVC build stability on Arm ### Minor - Add: WASM elementwise ops & spatial mini-float kernels (81b8c449) - Add: WASM type-casting kernels (e09df318) - Add: SVE+SDOT ops for 8-bit integers (913fc6b0) ### Patch - Fix: Misplaced NEON loads/stores in Sierra (05e30455) - Fix: Avoid unconditional `np` symbols (9dffb681) - Make: Resolve probe locations for NPM consumers (c602f45f) - Doc | Medium | 4/4/2026 |
| v7.3.0 | This release hardens Arm kernels across NEON, SVE, and SME. The most widespread fix replaces `_x` (don't-care) predicated intrinsics with `_m` (merge-with-zero) variants: inactive lanes left undefined by `_x` could carry stale data into reductions, producing wrong results for non-power-of-two dimensions on real SVE hardware. Partial-tail padding in `BMOPA` is fixed for sub-32-bit types, and strided reductions in NEON are hardened against off-by-one errors in non-contiguous layouts. > Thanks to the | Medium | 4/2/2026 |
| v7.2.4 | Release: v7.2.4 [skip ci] ### Patch - Make: 2h timeout budget for JS & Py builds (2e8f081e) | Medium | 3/28/2026 |
| v7.2.3 | Release: v7.2.3 [skip ci] ### Patch - Fix: Harden implicit narrowing casts (319fae28) - Fix: Negating unsigned integers in MSVC (9be61e3d) - Make: Retry flaky CI jobs (b622d630) - Make: Remove conflicting NEON probes (c0f35733) | Medium | 3/28/2026 |
| v7.2.2 | Release: v7.2.2 [skip ci] ### Patch - Make: Trusted publishing for NPM (95782713) - Improve: VNNI spatial kernels for E2M3, E3M2, & E4M3 (02d53256) - Fix: `NK_TARGET_NEON` auto-detect in MSVC (4ad21241) | Medium | 3/28/2026 |
| v7.2.1 | Release: v7.2.1 [skip ci] ### Patch - Improve: Listing compile-time capabilities (0e9f04a8) - Improve: Flush Float16 sums in `spatial/` Float6 kernels (52606b0e) - Make: Slimmer NPM packages per platform (0a18afcb) - Improve: Lower E4M3 Genoa to Icelake with 40% gains (8ade366e) | Medium | 3/28/2026 |
| v7.2.0 | Nvidia just unveiled Arm-based Olympus cores and Vera CPUs with native support for 8-bit floating-point numbers (FP8). Intel's Xeon 7 Diamond Rapids and Nova Lake CPUs with FP8 may arrive even sooner through the new AVX 10.2 extensions. FP8 arithmetic is at the heart of modern LLM inference, but most of the world's CPUs don't have it yet. NumKong v7.2 bridges that gap (native FP8 on the new chips, efficient emulation on everything else) so more global infrastructure is ready for AI workloads s | Medium | 3/28/2026 |
| v7.1.1 | - Improve: Smaller `TensorError` state (c5475be2) - Improve: Apply `StorageElement` to every operation class (98064815) - Improve: Drop redundant NEON MinMax in FHM & BFDOT files (96c869f8) - Improve: Simpler `i4` dot-product in NEON (bf61c2c4) - Docs: Apple M5 instruction timings & x86 refresh (835ae52a) - Fix: Fill only upper triangle in other SME kernels (2a93c309) - Fix: Filling only upper triangle in `u1_smebi32` kernels (68f5963b) - Fix: Harden SME streaming behaviour (8fe8cc9f) - I | Medium | 3/22/2026 |
| v7.1.0 | - Zero-copy Tensor exchange in Python, Rust, & C++ - `std::format` & `core::fmt::Display` for Rust & C++ - Tensors & multi-dimensional iterators for sub-byte types - Documenting Python reductions along an `axis=` - Faster `dtype=` hints resolution in CPython binding - Upgraded CI for Clang cross-compiled binaries - Compiling SME feature checks with old Assembler - Simplify WASM backend usage in browsers ### Minor - Add: `nk::cast` & reduction helpers for C++ (80ff0b03) - Add: Print | Low | 3/21/2026 |
| v7.0.0 | What started as a straightforward optimization request from the @albumentations-team (improving element-wise operations between equi-dimensional arrays) snowballed into the largest piece of open-source work I've done in years. __200K+ lines of SIMD across 2'000+ kernels__: - targeting every major vector ISA, grouped by platform and shape: __x86 AVX2 vectors__ on Haswell, Alder Lake, Sierra Forest · __x86 AVX-512 vectors__ on Skylake, Ice Lake, Genoa, Sapphire Rapids, Turin · __Intel's fixe | Low | 3/17/2026 |
| v6.5.16 | Release: v6.5.16 [skip ci] ### Patch - Fix: Surround `#pragma clang` with checks for Clang (#192) (f871d803) - Improve: Reduce native half-precision usage (486d8b5a) - Fix: Unpoison SIMD dispatch results for MemorySanitizer (#304) (2513ee7f) - Fix: Enlarge dummy buffer for SVE predicated loads (#307) (fe9327c5) | Low | 3/7/2026 |
| v6.5.15 | Release: v6.5.15 [skip ci] ### Patch - Fix: Initialize `dummy_input` to fix MSan false positive (#302) (c2ad842d) | Low | 3/4/2026 |
| v6.5.14 | Release: v6.5.14 [skip ci] ### Patch - Fix: Wrong predicate width in BF16 SVE L2 kernel (#301) (87ae846) - Improve: FreeBSD comp-time target selection (#300) (cb11f8b) | Low | 3/3/2026 |
| v6.5.13 | Release: v6.5.13 [skip ci] ### Patch - Fix: Replace `avx2vnni` with `avxvnni` for Sierra Forest (#296) (a8bb232) - Make: Remove `NPM_TOKEN` for OIDC publishing (13cd5bc) - Make: Sign rebase with GitHub Actions bot (e7b89b5) - Fix: Revert to `atol=1` for test integer outputs vs SciPy (b75bdbd) | Low | 2/16/2026 |
| v6.5.12 | Release: v6.5.12 [skip ci] ### Patch - Make: Same upload/download CI versions (ae9e567) | Low | 12/21/2025 |
| v6.5.11 | Release: v6.5.11 [skip ci] ### Patch - Improve: Round integer distances (c487b55) - Fix: Absolute tolerance bound for integers (73a9ff7) - Make: Skip flaky Arm failures (6be67bb) - Fix: NEON guard for u8 dot dispatch (2c5876d) | Low | 12/20/2025 |
| v6.5.10 | Release: v6.5.10 [skip ci] ### Patch - Make: Re-attempt forwarding `NPM_TOKEN` (714c615) - Fix: Misusing `pytest.warns` (2135a58) - Docs: Wording & spelling inconsistencies (1cc8f71) - Fix: `f32` to `bf16` down-casting on BIG-endian (a801b58) | Low | 12/18/2025 |
| v6.5.9 | Release: v6.5.9 [skip ci] ### Patch - Make: Deno `--no-check` for CI (9976b88) - Fix: CMake relative paths for Termux compatibility (#288) (07976ad) - Make: NPM w/out `NODE_AUTH_TOKEN` (63cd55b) - Fix: Length check in `jaccard_b8_ice` and `hamming_b8_ice` (#286) (a7cc7e1) - Make: Python 3.14 builds (#271) (e4d62e7) - Fix: Avoid `sqrt(0)` in `probability.h` (108a8b5) - Fix: `u64size` in Rust to match the C ABI (31195e9) - Make: Stack-realign for `i386` builds (5a386f5) - Make: 32-bit cross-compi | Low | 12/17/2025 |
| v6.5.8 | Release: v6.5.8 [skip ci] ### Patch - Make: Avoid half-precision NEON on Windows (6541157) - Make: Retire `macos-13` runners (7d6358e) | Low | 12/17/2025 |
| v6.5.7 | Release: v6.5.7 [skip ci] ### Patch - Make: Bump CI versions (fdad95c) - Make: NPM Trusted Publishing (581623a) - Make: Conservative Sierra Forest flags (1b2f16c) | Low | 12/17/2025 |
| v6.5.6 | Release: v6.5.6 [skip ci] ### Patch - Make: Explicit cross-compilation overrides (b97ad62) | Low | 12/17/2025 |
| v6.5.5 | - Improve: Faster sparse dot product (e5dad6c) - Improve: Turin kernels & cleaner loops in `sparse.h` (d6e17b1) - Fix: `dot_bf16_neon` step (8f3ef10) - Fix: Jensen-Shannon masked accumulation (b9d7834) - Improve: Runtime-defined dimensions (d9ca85d) - Improve: Broader Rust tests (cce9374) - Improve: Test `bf16` dot product (658901d) - Improve: Naming baseline kernels & benchmarks (a897aa9) - Improve: Log accuracy of `i8` & `f32` kernels (c6db82d) - Improve: Slice overlap check steps (ea | Low | 11/13/2025 |
| v6.5.4 | Release: v6.5.4 [skip ci] ### Patch - Fix: `intersect_u16` test in Rust (682556e) - Fix: Check macro presence on Windows (56a01ef) - Fix: Resetting capability in PyTest (7243bf6) - Fix: JS division by zero with +eps (02fa2a5) - Fix: `ComplexProducts` number of dimensions in Rust (ad429dd) - Improve: Detect NEON+DP via WinAPI (e6cfcad) - Docs: Enumerating x86 platforms (97fa158) - Fix: Probe `mrs` to avoid `SIGILL` on older Arm (b139cc9) | Low | 10/30/2025 |
| v6.5.3 | Release: v6.5.3 [skip ci] ### Patch - Make: Co-package PyTests (#278) (4491f09) | Low | 9/6/2025 |
| v6.5.2 | ### Patch - Make: Rust 1.64 compatibility (889bf25) - Docs: Inconsistencies & typos (301d59c) - Make: Avoid `Cargo.lock` for the library (5b9c207) - Make: MSVC-friendlier Rust builds (e940014) - Improve: Naming Rust tests (e1fab5c) | Low | 9/5/2025 |
| v6.5.1 | Release: v6.5.1 [skip ci] ### Patch - Make: Avoid `--lib-sdir .` on Linux (39623cc) - Docs: Probability Distributions in Rust (af7c145) | Low | 8/17/2025 |
| v6.5.0 | SimSIMD has historically been one of the largest collections of mixed-precision kernels, but `f32` to/from `f16` and `bf16` conversion operators have never been exposed to bindings. This release is the first step in that direction. I look forward to everyone's suggestions on how to further improve the Rust API. Thanks --- Here's an example: ```rs use simsimd::{SpatialSimilarity, f16, bf16}; // Process embeddings at different precisions for speed vs accuracy trade-offs let embed | Low | 7/7/2025 |
| v6.4.10 | Other minor tweaks: - [x] `bf16` L2 calculation in Rust - [x] flushing denormals in Rust - [x] `nonnull` build warnings in GCC & Clang - [x] upgrading JS dependencies ### Patch - Fix: Require NumPy for GIL tests (529b0dd) - Improve: Free threading examples & checks (83e522a) - Make: Enable free-threading `CIBW` builds (0093c3f) - Docs: Setting up `uv` env (8dc7012) - Improve: GIL-free batch-processing in Py (eb234d5) - Make: Drop Python 3.7 for 3.13t (fc62de4) - Improve: Flush | Low | 7/6/2025 |
| v6.4.9 | Release: v6.4.9 [skip ci] ### Patch - Fix: add dot i8 (Rust) (eaeb3b7) | Low | 6/8/2025 |
| v6.4.8 | - Fix: GCC can't handle `v8.0-a` decimal (4116f8a) - Fix: `f16`, `i8`, `bf16` compile-time dispatch (29c0f46) - Docs: Globally unset `DEVELOPER_DIR` (22bb40b) - Fix: Check for NEON for R-profile CPUs (a6bbf9e) - Make: Lower `armv8.2` to `armv8.0` requirement (31fbdcd) - Improve: Set `nonnull` attributes (fc61d19) - Docs: Unset `DEVELOPER_DIR` on macOS (69c6614) - Make: Bump Google Benchmark (cca25a0) - Docs: Refresh C example (73e6ccb) | Low | 6/6/2025 |
| v6.4.7 | Release: v6.4.7 [skip ci] ### Patch - Make: Differentiate `cibw` uploads (9116b2a) | Low | 6/1/2025 |
| v6.4.6 | Release: v6.4.6 [skip ci] ### Patch - Fix: Deno testing CLI commands (be8acfb) - Make: Bump vulnerable JS deps (506b816) - Make: Checking env. variables on macOS (6cb256f) - Make: Try compiling wheels with different flags (77870cf) - Make: Enable Deno to run pre-builds (71b3412) - Make: Set `f16c` flag for `_cvtss_sh` (1215418) - Fix: Pedantic `_Float16` cast warnings (3230095) - Make: Overwrite JS bundles (c252f84) - Fix: Missing `avx512dq` flags (ace4f7e) | Low | 6/1/2025 |
| v6.4.5 | Release: v6.4.5 [skip ci] ### Patch - Fix: Aliasing of half-precision types (abb2d88) | Low | 5/30/2025 |
| v6.4.4 | Release: v6.4.4 [skip ci] ### Patch - Make: Return Rust build errors (#264) (7e3b493) | Low | 5/13/2025 |
| v6.4.3 | Release: v6.4.3 [skip ci] ### Patch - Fix: Use correct type in sparse dot-product macro (354a6b8) | Low | 4/24/2025 |
| v6.4.2 | Release: v6.4.2 [skip ci] ### Patch - Fix: `i4` cosine on Ice Lake (#262) (ffdbbf8) | Low | 4/23/2025 |
| v6.4.1 | Release: v6.4.1 [skip ci] ### Patch - Docs: Dual-licensing with 3-clause BSD (7520fcf) | Low | 3/31/2025 |
| v6.4.0 | Release: v6.4.0 [skip ci] ### Minor - Add: Expose L2 distance in Swift (#255) (b106afc) | Low | 2/26/2025 |
| v6.3.4 | Release: v6.3.4 [skip ci] ### Patch - Fix: Turin kernels for `spdot` (#252) (5044fef) | Low | 2/19/2025 |
| v6.3.3 | Release: v6.3.3 [skip ci] ### Patch - Improve: Sparse intersection dependency chain (#251) (b8ee93f) | Low | 2/14/2025 |
| v6.3.2 | Release: v6.3.2 [skip ci] ### Patch - Make: Upgrade deprecated CI tools (d9bc3d2) | Low | 2/5/2025 |
| v6.3.1 | Release: v6.3.1 [skip ci] ### Patch - Make: Update `release.yml` for Arm (d8c6f40) - Make: Use official Docker repo (48d39e3) - Make: Remove conflicting `containerd` on Arm (41e02f9) - Make: Install Docker on Aarch64 (636a22d) - Make: Avoid `extras` repo in `yum` on Aarch64 (aa5aced) - Fix: Wrong variable used in l2sq_bf16_sve (c26008b) - Make: Resolve Windows build conflicts (8e50840) | Low | 2/5/2025 |
| v6.3.0 | Release: v6.3.0 [skip ci] ### Minor - Add: `simsimd_flush_denormals` (63af257) ### Patch - Make: Faster `cibuildwheel` releases (e54e939) - Make: Fix CI instance label (60048b4) - Make: Use newer Python for `cibuildwheel` (e922019) - Make: Skip 32-bit Windows Python images (372480a) - Make: Use newer image for Arm CI (a63f55f) - Make: Patch `pyproject.toml` (b3e35a9) - Make: Skip PyPy builds (1fe7faa) - Make: `test-command` Windows compatibility (dea5b71) - Make: Skip `armv7l` PyPi builds (92 | Low | 1/24/2025 |
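
Several entries above (v6.5.0 and v6.5.10 in particular) concern `f32` to `bf16` conversion. As background, a `bf16` value is the upper 16 bits of an IEEE 754 `f32`, so the conversion can be sketched with integer bit manipulation. The code below is an illustrative pure-Python version with hypothetical function names; it assumes round-to-nearest-even, which may differ from what the library's kernels do, and it fixes the byte order explicitly so the result does not depend on host endianness.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bf16 pattern for x, rounding to nearest, ties to even."""
    # Reinterpret the value as a 32-bit pattern; "<" pins the byte order on both sides.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: keep the sign and exponent, and force a mantissa bit so the result stays NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, so exact ties round toward an even result.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(h: int) -> float:
    """Widen a bf16 bit pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]
```

Widening is exact, while narrowing keeps only 8 significant bits: `bf16_bits_to_f32(f32_to_bf16_bits(3.14159))` yields `3.140625`.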
