freshcrate

simsimd

Portable mixed-precision BLAS-like vector math library for x86 and ARM

Description

![SimSIMD banner](https://github.com/ashvardanian/ashvardanian/blob/master/repositories/SimSIMD.jpg?raw=true)

Computing dot-products, similarity measures, and distances between low- and high-dimensional vectors is ubiquitous in Machine Learning, Scientific Computing, Geospatial Analysis, and Information Retrieval. These algorithms generally have linear complexity in time, constant or linear complexity in space, and are data-parallel. In other words, they are easily parallelizable and vectorizable, and are often available in packages like BLAS (level 1) and LAPACK, as well as the higher-level `numpy` and `scipy` Python libraries. Ironically, even with decades of evolution in compilers and numerical computing, [most libraries can be 3-200x slower than hardware potential][benchmarks] even on the most popular hardware, like 64-bit x86 and Arm CPUs. Moreover, most lack mixed-precision support, which is crucial for modern AI! The rare few that support minimal mixed precision run on only one platform and are vendor-locked by companies like Intel and Nvidia. SimSIMD provides an alternative.

1️⃣ SimSIMD functions are practically as fast as `memcpy`.
2️⃣ Unlike BLAS, most kernels are designed for mixed-precision and bit-level operations.
3️⃣ SimSIMD often [ships more binaries than NumPy][compatibility] and has more backends than most BLAS implementations, and more high-level interfaces than most libraries.
[benchmarks]: https://ashvardanian.com/posts/simsimd-faster-scipy
[compatibility]: https://pypi.org/project/simsimd/#files

## Features

__SimSIMD__ (Arabic: "سيمسيم دي") is a mixed-precision math library of __over 350 SIMD-optimized kernels__ extensively used in AI, Search, and DBMS workloads. Named after the iconic ["Open Sesame"](https://en.wikipedia.org/wiki/Open_sesame) command that opened doors to treasure in _Ali Baba and the Forty Thieves_, SimSIMD can help you 10x the cost-efficiency of your computational pipelines.

Implemented distance functions include:

- Euclidean (L2) and Cosine (Angular) spatial distances for Vector Search. _[docs][docs-spatial]_
- Dot-Products for real & complex vectors for DSP & Quantum computing. _[docs][docs-dot]_
- Hamming (~ Manhattan) and Jaccard (~ Tanimoto) bit-level distances. _[docs][docs-binary]_
- Set Intersections for Sparse Vectors and Text Analysis. _[docs][docs-sparse]_
- Mahalanobis distance and Quadratic forms for Scientific Computing. _[docs][docs-curved]_
- Kullback-Leibler and Jensen–Shannon divergences for probability distributions. _[docs][docs-probability]_
- Fused-Multiply-Add (FMA) and Weighted Sums to replace BLAS level 1 functions. _[docs][docs-fma]_
- For Levenshtein, Needleman–Wunsch, and Smith-Waterman, check [StringZilla][stringzilla].
- 🔜 Haversine and Vincenty's formulae for Geospatial Analysis.

[docs-spatial]: #cosine-similarity-reciprocal-square-root-and-newton-raphson-iteration
[docs-curved]: #curved-spaces-mahalanobis-distance-and-bilinear-quadratic-forms
[docs-sparse]: #set-intersection-galloping-and-binary-search
[docs-binary]: https://github.com/ashvardanian/SimSIMD/pull/138
[docs-dot]: #complex-dot-products-conjugate-dot-products-and-complex-numbers
[docs-probability]: #logarithms-in-kullback-leibler--jensenshannon-divergences
[docs-fma]: #mixed-p
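To make the measures listed above concrete, here is a minimal pure-Python sketch of their textbook definitions. It is not SimSIMD's implementation or API: the function names are hypothetical, the zero-vector and empty-set conventions are assumptions chosen for illustration, and the code makes no attempt at SIMD or mixed precision. Reference code like this is mainly useful as a baseline to check an optimized kernel against.

```python
import math


def dot(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))


def sqeuclidean(a, b):
    """Squared Euclidean (L2) distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cos(angle). Zero-vector handling is an assumed convention."""
    ab, aa, bb = dot(a, b), dot(a, a), dot(b, b)
    if aa == 0 and bb == 0:
        return 0.0
    if aa == 0 or bb == 0:
        return 1.0
    return 1.0 - ab / math.sqrt(aa * bb)


def hamming_bits(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length bit strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def jaccard_bits(a: bytes, b: bytes) -> float:
    """1 - |intersection| / |union| over set bits (0.0 if both are empty)."""
    inter = sum(bin(x & y).count("1") for x, y in zip(a, b))
    union = sum(bin(x | y).count("1") for x, y in zip(a, b))
    return 1.0 - inter / union if union else 0.0


def kl_divergence(p, q):
    """Kullback-Leibler divergence; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (not its square root), in nats."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Every function is a single pass over the inputs, which matches the linear-time, data-parallel shape described in the introduction.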

Release History

| Version | Changes | Urgency | Date |
| --- | --- | --- | --- |
| 6.5.16 | Imported from PyPI (6.5.16) | Low | 4/21/2026 |
| v7.6.0 | CUDA & C++ 20 Compatibility: NVCC 13 caps its language-standard flag at C++20, and our multi-argument subscript overloads from C++23 P2128 made `tensor.hpp` unparseable by cudafe++. We added call-operator primaries that mirror every multi-argument subscript overload in the tensor view, span, and owning container types, and kept the bracket sugar behind an `__cpp_multidimensional_subscript` feature test so older toolchains pick the portable spelling automatically. Downstream CUDA callers now… | High | 4/20/2026 |
| v7.5.0 | Built-in OpenMP bundling for JS & Python; Intel Granite Rapids 🪨 F16 → F32 GEMMs 💎; Faster bit-vector population counts for Arm NEON 🦾; SME compatibility with non-Apple Clang on Apple machines; Hardening against MSan SVE false-positives, thanks to @alexey-milovidov 🦺; Hardening against GCC 13 Arm NEON code-gen bugs, thanks to @swasik; `_into` & `_parallel` GEMM Rust APIs: reusing memory & [ForkUnion](https://github.com/ashvardanian/ForkUnion) pools 🆕; De-vec… | Medium | 4/14/2026 |
| v7.4.5 | Improve: Vectorize F32 SME MaxSim finalizer (0daacf3b); Improve: Remove centering from RMSD kernels (1a83ab4f); Fix: Emulated vs native test durations (4266451d) | Medium | 4/6/2026 |
| v7.4.4 | Fix: ARMv7 Rust cross-compilation with CC for versioned GCC (a5e67e60); Make: `check_source_runs`-probing like `march=native` on MSVC (7a152f3b); Fix: Drop `_MM_FROUND_NO_EXC` from `_mm256_cvtps_ph` calls (8649b0c0); Fix: Guard against old MSVC preprocessor (25d33048); Make: Enforce newer preprocessor in MSVC (be966af2); Make: Cleaner CIBW artifact names & env forwarding (a6cf6424); Make: Forward cross-compilation flags for macOS wheels (6ed3b8c2); Make: Split ppc64le, s390x, i68… | Medium | 4/6/2026 |
| v7.4.3 | Release: v7.4.3 [skip ci]. Patch: Fix: Require AArch64 for NEON kernels (2ba1b343); Docs: Table order & formatting (8673a56f); Make: Avoid `--all-features` in Rust cross-compilation CI (8be8bffe); Improve: Arm32 compatibility (64041725); Make: `cancel-in-progress` CI to shift compute resources (dfc8fa02); Improve: Harden Swift SDK for 6.1+ toolkit (965cd524); Make: Strip `.unsafeFlags` & list platforms for SPM consumption (b061b78d); Make: Expose `CNumKongDispatch` target to Swift us… | Medium | 4/5/2026 |
| v7.4.2 | Release: v7.4.2 [skip ci]. Patch: Docs: Shrink tables in the main README (6d2ea345); Make: Inline Power Shell cross-compilation logic in CI (974c30ca); Make: Define `_ARM64_` for Arm JS builds in MSVC (f3030420); Make: Skip same-named artifacts on CI reruns (7c098e51) | Medium | 4/5/2026 |
| v7.4.1 | Release: v7.4.1 [skip ci]. Patch: Make: Set `repository.url` for NPM (385480d2); Make: Pull MSVC ARM64 Cross-Compiler (e20c93ef); Fix: Swap `f16x8` for `u16x8` in `cast_neon` (154ec5db) | Medium | 4/5/2026 |
| v7.4.0 | Faster tensor contractions; Faster GEMM "packers" with SIMD; New SVE+SDOT kernels for `i8`; MSVC build stability on Arm. Minor: Add: WASM elementwise ops & spatial mini-float kernels (81b8c449); Add: WASM type-casting kernels (e09df318); Add: SVE+SDOT ops for 8-bit integers (913fc6b0). Patch: Fix: Misplaced NEON loads/stores in Sierra (05e30455); Fix: Avoid unconditional `np` symbols (9dffb681); Make: Resolve probe locations for NPM consumers (c602f45f); Doc… | Medium | 4/4/2026 |
| v7.3.0 | This release hardens Arm kernels across NEON, SVE, and SME. The most widespread fix replaces `_x` (don't-care) predicated intrinsics with `_m` (merge-with-zero) variants: inactive lanes left undefined by `_x` could carry stale data into reductions, producing wrong results for non-power-of-two dimensions on real SVE hardware. Partial-tail padding in `BMOPA` is fixed for sub-32-bit types, and strided reductions in NEON are hardened against off-by-one in non-contiguous layouts. Thanks to the … | Medium | 4/2/2026 |
| v7.2.4 | Release: v7.2.4 [skip ci]. Patch: Make: 2h timeout budget for JS & Py builds (2e8f081e) | Medium | 3/28/2026 |
| v7.2.3 | Release: v7.2.3 [skip ci]. Patch: Fix: Harden implicit narrowing casts (319fae28); Fix: Negating unsigned integers in MSVC (9be61e3d); Make: Retry flaky CI jobs (b622d630); Make: Remove conflicting NEON probes (c0f35733) | Medium | 3/28/2026 |
| v7.2.2 | Release: v7.2.2 [skip ci]. Patch: Make: Trusted publishing for NPM (95782713); Improve: VNNI spatial kernels for E2M3, E3M2, & E4M3 (02d53256); Fix: `NK_TARGET_NEON` auto-detect in MSVC (4ad21241) | Medium | 3/28/2026 |
| v7.2.1 | Release: v7.2.1 [skip ci]. Patch: Improve: Listing compile-time capabilities (0e9f04a8); Improve: Flush Float16 sums in `spatial/` Float6 kernels (52606b0e); Make: Slimmer NPM packages per platform (0a18afcb); Improve: Lower E4M3 Genoa to Icelake with 40% gains (8ade366e) | Medium | 3/28/2026 |
| v7.2.0 | Nvidia just unveiled Arm-based Olympus cores and Vera CPUs with native support for 8-bit floating-point numbers (FP8). Intel's Xeon 7 Diamond Rapids and Nova Lake CPUs with FP8 may arrive even sooner through the new AVX 10.2 extensions. FP8 arithmetic is at the heart of modern LLM inference, but most of the world's CPUs don't have it yet. NumKong v7.2 bridges that gap, with native FP8 on the new chips and efficient emulation on everything else, so more global infrastructure is ready for AI workloads s… | Medium | 3/28/2026 |
| v7.1.1 | Improve: Smaller `TensorError` state (c5475be2); Improve: Apply `StorageElement` to every operation class (98064815); Improve: Drop redundant NEON MinMax in FHM & BFDOT files (96c869f8); Improve: Simpler `i4` dot-product in NEON (bf61c2c4); Docs: Apple M5 instruction timings & x86 refresh (835ae52a); Fix: Fill only upper triangle in other SME kernels (2a93c309); Fix: Filling only upper triangle in `u1_smebi32` kernels (68f5963b); Fix: Harden SME streaming behaviour (8fe8cc9f); I… | Medium | 3/22/2026 |
| v7.1.0 | Zero-copy Tensor exchange in Python, Rust, & C++; `std::format` & `core::fmt::Display` for Rust & C++; Tensors & multi-dimensional iterators for sub-byte types; Documenting Python reductions along an `axis=`; Faster `dtype=` hints resolution in CPython binding; Upgraded CI for Clang cross-compiled binaries; Compiling SME feature checks with old Assembler; Simplify WASM backend usage in browsers. Minor: Add: `nk::cast` & reduction helpers for C++ (80ff0b03); Add: Print… | Low | 3/21/2026 |
| v7.0.0 | What started as a straightforward optimization request from the @albumentations-team (improving element-wise operations between equi-dimensional arrays) snowballed into the largest piece of open-source work I've done in years. __200K+ lines of SIMD across 2'000+ kernels__: targeting every major vector ISA, grouped by platform and shape: __x86 AVX2 vectors__ on Haswell, Alder Lake, Sierra Forest · __x86 AVX-512 vectors__ on Skylake, Ice Lake, Genoa, Sapphire Rapids, Turin · __Intel's fixe… | Low | 3/17/2026 |
| v6.5.16 | Release: v6.5.16 [skip ci]. Patch: Fix: Surround `#pragma clang` with checks for Clang (#192) (f871d803); Improve: Reduce native half-precision usage (486d8b5a); Fix: Unpoison SIMD dispatch results for MemorySanitizer (#304) (2513ee7f); Fix: Enlarge dummy buffer for SVE predicated loads (#307) (fe9327c5) | Low | 3/7/2026 |
| v6.5.15 | Release: v6.5.15 [skip ci]. Patch: Fix: Initialize `dummy_input` to fix MSan false positive (#302) (c2ad842d) | Low | 3/4/2026 |
| v6.5.14 | Release: v6.5.14 [skip ci]. Patch: Fix: Wrong predicate width in BF16 SVE L2 kernel (#301) (87ae846); Improve: FreeBSD comp-time target selection (#300) (cb11f8b) | Low | 3/3/2026 |
| v6.5.13 | Release: v6.5.13 [skip ci]. Patch: Fix: Replace `avx2vnni` with `avxvnni` for Sierra Forest (#296) (a8bb232); Make: Remove `NPM_TOKEN` for OIDC publishing (13cd5bc); Make: Sign rebase with GitHub Actions bot (e7b89b5); Fix: Revert to `atol=1` for test integer outputs vs SciPy (b75bdbd) | Low | 2/16/2026 |
| v6.5.12 | Release: v6.5.12 [skip ci]. Patch: Make: Same upload/download CI versions (ae9e567) | Low | 12/21/2025 |
| v6.5.11 | Release: v6.5.11 [skip ci]. Patch: Improve: Round integer distances (c487b55); Fix: Absolute tolerance bound for integers (73a9ff7); Make: Skip flaky Arm failures (6be67bb); Fix: NEON guard for u8 dot dispatch (2c5876d) | Low | 12/20/2025 |
| v6.5.10 | Release: v6.5.10 [skip ci]. Patch: Make: Re-attempt forwarding `NPM_TOKEN` (714c615); Fix: Misusing `pytest.warns` (2135a58); Docs: Wording & spelling inconsistencies (1cc8f71); Fix: `f32` to `bf16` down-casting on BIG-endian (a801b58) | Low | 12/18/2025 |
| v6.5.9 | Release: v6.5.9 [skip ci]. Patch: Make: Deno `--no-check` for CI (9976b88); Fix: CMake relative paths for Termux compatibility (#288) (07976ad); Make: NPM w/out `NODE_AUTH_TOKEN` (63cd55b); Fix: Length check in `jaccard_b8_ice` and `hamming_b8_ice` (#286) (a7cc7e1); Make: Python 3.14 builds (#271) (e4d62e7); Fix: Avoid `sqrt(0)` in `probability.h` (108a8b5); Fix: `u64size` in Rust to match the C ABI (31195e9); Make: Stack-realign for `i386` builds (5a386f5); Make: 32-bit cross-compi… | Low | 12/17/2025 |
| v6.5.8 | Release: v6.5.8 [skip ci]. Patch: Make: Avoid half-precision NEON on Windows (6541157); Make: Retire `macos-13` runners (7d6358e) | Low | 12/17/2025 |
| v6.5.7 | Release: v6.5.7 [skip ci]. Patch: Make: Bump CI versions (fdad95c); Make: NPM Trusted Publishing (581623a); Make: Conservative Sierra Forest flags (1b2f16c) | Low | 12/17/2025 |
| v6.5.6 | Release: v6.5.6 [skip ci]. Patch: Make: Explicit cross-compilation overrides (b97ad62) | Low | 12/17/2025 |
| v6.5.5 | Improve: Faster sparse dot product (e5dad6c); Improve: Turin kernels & cleaner loops in `sparse.h` (d6e17b1); Fix: `dot_bf16_neon` step (8f3ef10); Fix: Jensen-Shannon masked accumulation (b9d7834); Improve: Runtime-defined dimensions (d9ca85d); Improve: Broader Rust tests (cce9374); Improve: Test `bf16` dot product (658901d); Improve: Naming baseline kernels & benchmarks (a897aa9); Improve: Log accuracy of `i8` & `f32` kernels (c6db82d); Improve: Slice overlap check steps (ea… | Low | 11/13/2025 |
| v6.5.4 | Release: v6.5.4 [skip ci]. Patch: Fix: `intersect_u16` test in Rust (682556e); Fix: Check macro presence on Windows (56a01ef); Fix: Resetting capability in PyTest (7243bf6); Fix: JS division by zero with +eps (02fa2a5); Fix: `ComplexProducts` number of dimensions in Rust (ad429dd); Improve: Detect NEON+DP via WinAPI (e6cfcad); Docs: Enumerating x86 platforms (97fa158); Fix: Probe `mrs` to avoid `SIGILL` on older Arm (b139cc9) | Low | 10/30/2025 |
| v6.5.3 | Release: v6.5.3 [skip ci]. Patch: Make: Co-package PyTests (#278) (4491f09) | Low | 9/6/2025 |
| v6.5.2 | Patch: Make: Rust 1.64 compatibility (889bf25); Docs: Inconsistencies & typos (301d59c); Make: Avoid `Cargo.lock` for the library (5b9c207); Make: MSVC-friendlier Rust builds (e940014); Improve: Naming Rust tests (e1fab5c) | Low | 9/5/2025 |
| v6.5.1 | Release: v6.5.1 [skip ci]. Patch: Make: Avoid `--lib-sdir .` on Linux (39623cc); Docs: Probability Distributions in Rust (af7c145) | Low | 8/17/2025 |
| v6.5.0 | SimSIMD has historically been one of the largest collections of mixed-precision kernels, but `f32` to/from `f16` and `bf16` conversion operators have never been exposed to bindings. This release is the first step in that direction. I look forward to everyone's suggestions on how to further improve the Rust API. Thanks 🤗 Here's an example: `use simsimd::{SpatialSimilarity, f16, bf16}; // Process embeddings at different precisions for speed vs accuracy trade-offs let embed…` | Low | 7/7/2025 |
| v6.4.10 | Other minor tweaks: `bf16` L2 calculation in Rust; flushing denormals in Rust; `nonnull` build warnings in GCC & Clang; upgrading JS dependencies. Patch: Fix: Require NumPy for GIL tests (529b0dd); Improve: Free threading examples & checks (83e522a); Make: Enable free-threading `CIBW` builds (0093c3f); Docs: Setting up `uv` env (8dc7012); Improve: GIL-free batch-processing in Py (eb234d5); Make: Drop Python 3.7 for 3.13t (fc62de4); Improve: Flush… | Low | 7/6/2025 |
| v6.4.9 | Release: v6.4.9 [skip ci]. Patch: Fix: add dot i8 (Rust) (eaeb3b7) | Low | 6/8/2025 |
| v6.4.8 | Fix: GCC can't handle `v8.0-a` decimal (4116f8a); Fix: `f16`, `i8`, `bf16` compile-time dispatch (29c0f46); Docs: Globally unset `DEVELOPER_DIR` (22bb40b); Fix: Check for NEON for R-profile CPUs (a6bbf9e); Make: Lower `armv8.2` to `armv8.0` requirement (31fbdcd); Improve: Set `nonnull` attributes (fc61d19); Docs: Unset `DEVELOPER_DIR` on macOS (69c6614); Make: Bump Google Benchmark (cca25a0); Docs: Refresh C example (73e6ccb) | Low | 6/6/2025 |
| v6.4.7 | Release: v6.4.7 [skip ci]. Patch: Make: Differentiate `cibw` uploads (9116b2a) | Low | 6/1/2025 |
| v6.4.6 | Release: v6.4.6 [skip ci]. Patch: Fix: Deno testing CLI commands (be8acfb); Make: Bump vulnerable JS deps (506b816); Make: Checking env. variables on macOS (6cb256f); Make: Try compiling wheels with different flags (77870cf); Make: Enable Deno to run pre-builds (71b3412); Make: Set `f16c` flag for `_cvtss_sh` (1215418); Fix: Pedantic `_Float16` cast warnings (3230095); Make: Overwrite JS bundles (c252f84); Fix: Missing `avx512dq` flags (ace4f7e) | Low | 6/1/2025 |
| v6.4.5 | Release: v6.4.5 [skip ci]. Patch: Fix: Aliasing of half-precision types (abb2d88) | Low | 5/30/2025 |
| v6.4.4 | Release: v6.4.4 [skip ci]. Patch: Make: Return Rust build errors (#264) (7e3b493) | Low | 5/13/2025 |
| v6.4.3 | Release: v6.4.3 [skip ci]. Patch: Fix: Use correct type in sparse dot-product macro (354a6b8) | Low | 4/24/2025 |
| v6.4.2 | Release: v6.4.2 [skip ci]. Patch: Fix: `i4` cosine on Ice Lake (#262) (ffdbbf8) | Low | 4/23/2025 |
| v6.4.1 | Release: v6.4.1 [skip ci]. Patch: Docs: Dual-licensing with 3-clause BSD (7520fcf) | Low | 3/31/2025 |
| v6.4.0 | Release: v6.4.0 [skip ci]. Minor: Add: Expose L2 distance in Swift (#255) (b106afc) | Low | 2/26/2025 |
| v6.3.4 | Release: v6.3.4 [skip ci]. Patch: Fix: Turin kernels for `spdot` (#252) (5044fef) | Low | 2/19/2025 |
| v6.3.3 | Release: v6.3.3 [skip ci]. Patch: Improve: Sparse intersection dependency chain (#251) (b8ee93f) | Low | 2/14/2025 |
| v6.3.2 | Release: v6.3.2 [skip ci]. Patch: Make: Upgrade deprecated CI tools (d9bc3d2) | Low | 2/5/2025 |
| v6.3.1 | Release: v6.3.1 [skip ci]. Patch: Make: Update `release.yml` for Arm (d8c6f40); Make: Use official Docker repo (48d39e3); Make: Remove conflicting `containerd` on Arm (41e02f9); Make: Install Docker on Aarch64 (636a22d); Make: Avoid `extras` repo in `yum` on Aarch64 (aa5aced); Fix: Wrong variable used in l2sq_bf16_sve (c26008b); Make: Resolve Windows build conflicts (8e50840) | Low | 2/5/2025 |
| v6.3.0 | Release: v6.3.0 [skip ci]. Minor: Add: `simsimd_flush_denormals` (63af257). Patch: Make: Faster `cibuildwheel` releases (e54e939); Make: Fix CI instance label (60048b4); Make: Use newer Python for `cibuildwheel` (e922019); Make: Skip 32-bit Windows Python images (372480a); Make: Use newer image for Arm CI (a63f55f); Make: Patch `pyproject.toml` (b3e35a9); Make: Skip PyPy builds (1fe7faa); Make: `test-command` Windows compatibility (dea5b71); Make: Skip `armv7l` PyPi builds (92… | Low | 1/24/2025 |
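
Several entries above concern `f32` to/from `bf16` conversion (the v6.5.0 notes and the v6.5.10 down-casting fix). As a rough illustration of what such a conversion involves, the sketch below converts through the raw bit pattern using only Python's `struct` module. It is not SimSIMD's code: the function names are hypothetical, and round-to-nearest-even is an assumed rounding mode chosen because it is the common default.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (round-to-nearest-even).

    Assumes x fits in float32; struct.pack raises OverflowError otherwise.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep the top half and force a mantissa bit so the result stays NaN.
        return (bits >> 16) | 0x0040
    # bfloat16 is the upper 16 bits of float32; add a bias so that the
    # discarded lower 16 bits round to nearest, with ties going to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(h: int) -> float:
    """Widen a bfloat16 bit pattern to float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]
```

Widening is exact, while narrowing keeps only 7 explicit mantissa bits, so a round trip through `bf16` can change a value in roughly its third significant decimal digit.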

Similar Packages

| Package | Description | Version |
| --- | --- | --- |
| azure-storage-blob | Microsoft Azure Blob Storage Client Library for Python | azure-template_0.1.0b6187637 |
| azure-storage-file-share | Microsoft Azure Azure File Share Storage Client Library for Python | azure-template_0.1.0b6187637 |
| mirakuru | Process executor (not only) for tests. | 3.0.2 |
| opentelemetry-instrumentation-qdrant | OpenTelemetry Qdrant instrumentation | 0.60.0 |
| django-modelcluster | Django extension to allow working with 'clusters' of models as a single unit, independently of the database | 6.4.1 |