simsimd

Portable mixed-precision BLAS-like vector math library for x86 and ARM

Why this rank: Strong adoption, Recent release, Healthy release cadence

Description

![SimSIMD banner](https://github.com/ashvardanian/ashvardanian/blob/master/repositories/SimSIMD.jpg?raw=true)

Computing dot-products, similarity measures, and distances between low- and high-dimensional vectors is ubiquitous in Machine Learning, Scientific Computing, Geospatial Analysis, and Information Retrieval. These algorithms generally have linear time complexity and constant or linear space complexity, and they are data-parallel. In other words, they are easy to parallelize and vectorize, and they are often available in packages like BLAS (level 1) and LAPACK, as well as in the higher-level `numpy` and `scipy` Python libraries. Ironically, even after decades of evolution in compilers and numerical computing, [most libraries can be 3-200x slower than hardware potential][benchmarks], even on the most popular hardware, like 64-bit x86 and Arm CPUs. Moreover, most lack mixed-precision support, which is crucial for modern AI! The rare few that support minimal mixed precision run on only one platform and are vendor-locked by companies like Intel and Nvidia. SimSIMD provides an alternative:

1️⃣ SimSIMD functions are practically as fast as `memcpy`.

2️⃣ Unlike BLAS, most kernels are designed for mixed-precision and bit-level operations.

3️⃣ SimSIMD often [ships more binaries than NumPy][compatibility], has more backends than most BLAS implementations, and offers more high-level interfaces than most libraries.
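To make the mixed-precision idea concrete, here is a minimal scalar sketch in plain Python. It is not SimSIMD's API and uses no SIMD: vectors are stored as IEEE 754 half-precision (`f16`) bytes via the standard `struct` module's `e` format, then widened to double precision before accumulating the dot product, squared Euclidean distance, and cosine distance. The helper names (`pack_f16`, `dot_f16`, and so on) are hypothetical, and the zero-vector convention in `cosine_distance_f16` is an assumption.

```python
import math
import struct
from typing import List, Sequence


def pack_f16(values: Sequence[float]) -> bytes:
    """Store values as little-endian IEEE 754 binary16, like an `f16` embedding."""
    return struct.pack(f"<{len(values)}e", *values)


def unpack_f16(buffer: bytes) -> List[float]:
    """Widen half-precision values back to Python floats (binary64)."""
    return list(struct.unpack(f"<{len(buffer) // 2}e", buffer))


def _widened_pairs(a: bytes, b: bytes):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return zip(unpack_f16(a), unpack_f16(b))


def dot_f16(a: bytes, b: bytes) -> float:
    """Dot product: f16 storage, float64 accumulation."""
    total = 0.0
    for x, y in _widened_pairs(a, b):
        total += x * y
    return total


def sqeuclidean_f16(a: bytes, b: bytes) -> float:
    """Squared Euclidean (L2) distance, accumulated in float64."""
    total = 0.0
    for x, y in _widened_pairs(a, b):
        diff = x - y
        total += diff * diff
    return total


def cosine_distance_f16(a: bytes, b: bytes) -> float:
    """Cosine distance 1 - <a,b> / (|a| |b|), accumulated in float64."""
    ab = aa = bb = 0.0
    for x, y in _widened_pairs(a, b):
        ab += x * y
        aa += x * x
        bb += y * y
    if aa == 0.0 and bb == 0.0:
        return 0.0  # assumed convention: two zero vectors are identical
    if aa == 0.0 or bb == 0.0:
        return 1.0  # assumed convention: a zero vector is orthogonal to everything
    return 1.0 - ab / math.sqrt(aa * bb)
```

Storing `[1.0, 2.0, 3.0]` this way takes 6 bytes instead of 24 for `float64`. The trade-off is that values like `0.1` are rounded to the nearest half-precision number, which is why the sums are accumulated in a wider type.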
[benchmarks]: https://ashvardanian.com/posts/simsimd-faster-scipy
[compatibility]: https://pypi.org/project/simsimd/#files

## Features

__SimSIMD__ (Arabic: "سيمسيم دي") is a mixed-precision math library of __over 350 SIMD-optimized kernels__ extensively used in AI, Search, and DBMS workloads.
Named after the iconic ["Open Sesame"](https://en.wikipedia.org/wiki/Open_sesame) command that opened doors to treasure in _Ali Baba and the Forty Thieves_, SimSIMD can help you 10x the cost-efficiency of your computational pipelines.

Implemented distance functions include:

- Euclidean (L2) and Cosine (Angular) spatial distances for Vector Search. _[docs][docs-spatial]_
- Dot-Products for real & complex vectors for DSP & Quantum computing. _[docs][docs-dot]_
- Hamming (~ Manhattan) and Jaccard (~ Tanimoto) bit-level distances. _[docs][docs-binary]_
- Set Intersections for Sparse Vectors and Text Analysis. _[docs][docs-sparse]_
- Mahalanobis distance and Quadratic forms for Scientific Computing. _[docs][docs-curved]_
- Kullback-Leibler and Jensen–Shannon divergences for probability distributions. _[docs][docs-probability]_
- Fused-Multiply-Add (FMA) and Weighted Sums to replace BLAS level 1 functions. _[docs][docs-fma]_
- For Levenshtein, Needleman–Wunsch, and Smith-Waterman, check [StringZilla][stringzilla].
- 🔜 Haversine and Vincenty's formulae for Geospatial Analysis.

[docs-spatial]: #cosine-similarity-reciprocal-square-root-and-newton-raphson-iteration
[docs-curved]: #curved-spaces-mahalanobis-distance-and-bilinear-quadratic-forms
[docs-sparse]: #set-intersection-galloping-and-binary-search
[docs-binary]: https://github.com/ashvardanian/SimSIMD/pull/138
[docs-dot]: #complex-dot-products-conjugate-dot-products-and-complex-numbers
[docs-probability]: #logarithms-in-kullback-leibler--jensenshannon-divergences
[docs-fma]: #mixed-p
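For the bit-level metrics in the list above, the following plain-Python sketch treats each `bytes` object as a packed bit vector. It is not SimSIMD's implementation, which relies on SIMD population counts. Hamming distance counts the bits that differ, `popcount(a XOR b)`, and Jaccard distance is `1 - popcount(a AND b) / popcount(a OR b)`. The function names are hypothetical, and returning `0.0` for two all-zero vectors is an assumed convention.

```python
def popcount(byte: int) -> int:
    """Count set bits in one byte; works on Python versions without int.bit_count()."""
    return bin(byte).count("1")


def _check_lengths(a: bytes, b: bytes) -> None:
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same number of bytes")


def hamming_bits(a: bytes, b: bytes) -> int:
    """Number of differing bits between two packed bit vectors."""
    _check_lengths(a, b)
    return sum(popcount(x ^ y) for x, y in zip(a, b))


def jaccard_bits(a: bytes, b: bytes) -> float:
    """Jaccard distance: 1 minus intersection-over-union of the set bits."""
    _check_lengths(a, b)
    intersection = sum(popcount(x & y) for x, y in zip(a, b))
    union = sum(popcount(x | y) for x, y in zip(a, b))
    if union == 0:
        return 0.0  # assumed convention: two empty sets are identical
    return 1.0 - intersection / union


if __name__ == "__main__":
    a = bytes([0b10110000])
    b = bytes([0b10011000])
    print(hamming_bits(a, b))  # 2: XOR is 0b00101000
    print(jaccard_bits(a, b))  # 0.5: AND has 2 bits, OR has 4 bits
```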

Release History

Version	Changes	Urgency	Date
v7.7.0	Release: v7.7.0 [skip ci] ### Minor - Add: Rust trait reorganisation, bit reductions, macro purge (31217375) - Add: Tensor `fill_zeros`, `fill`, `copy`, popcount-style `BitwiseReductions` (01274600) ### Patch - Fix: `const` friendly & rank-aware tensor ops (ef37cf9b) - Improve: Harden tensor shapes against `-flto` (3eca0d2c) - Improve: Accept any integral in sub_byte_ref::operator=, clamp out-of-range (f3004f63) - Improve: Collapse per-lane finalize args to one pointer-to-vec shape (a35ddcde)	High	5/23/2026
6.5.16	Imported from PyPI (6.5.16)	Low	4/21/2026
v7.6.0	## CUDA & C++ 20 Compatibility NVCC 13 caps its language-standard flag at C++20, and our multi-argument subscript overloads from C++23 P2128 made `tensor.hpp` unparseable by cudafe++. We added call-operator primaries that mirror every multi-argument subscript overload in the tensor view, span, and owning container types, and kept the bracket sugar behind an `__cpp_multidimensional_subscript` feature test so older toolchains pick the portable spelling automatically. Downstream CUDA callers now	High	4/20/2026
v7.5.0	- Built-in OpenMP bundling for JS & Python 🐍 - Intel Granite Rapids 🪨 F16 → F32 GEMMs 💎 - Faster bit-vector population counts for Arm NEON 🦾 - SME compatibility with non-Apple Clang on Apple machines 🍏 - Hardening against MSan SVE false-positives, thanks to @alexey-milovidov 🦺 - Hardening against GCC 13 Arm NEON code-gen bugs, thanks to @swasik 🐂 - `_into` & `_parallel` GEMM Rust APIs: reusing memory & [ForkUnion](https://github.com/ashvardanian/ForkUnion) pools 🆕 - De-vec	Medium	4/14/2026
v7.4.5	- Improve: Vectorize F32 SME MaxSim finalizer (0daacf3b) - Improve: Remove centering from RMSD kernels (1a83ab4f) - Fix: Emulated vs native test durations (4266451d)	Medium	4/6/2026
v7.4.4	- Fix: ARMv7 Rust cross-compilation with CC for versioned GCC (a5e67e60) - Make: `check_source_runs`-probing like `march=native` on MSVC (7a152f3b) - Fix: Drop `_MM_FROUND_NO_EXC` from `_mm256_cvtps_ph` calls (8649b0c0) - Fix: Guard against old MSVC preprocessor (25d33048) - Make: Enforce newer preprocessor in MSVC (be966af2) - Make: Cleaner CIBW artifact names & env forwarding (a6cf6424) - Make: Forward cross-compilation flags for macOS wheels (6ed3b8c2) - Make: Split ppc64le, s390x, i68	Medium	4/6/2026
v7.4.3	Release: v7.4.3 [skip ci] ### Patch - Fix: Require AArch64 for NEON kernels (2ba1b343) - Docs: Table order & formatting (8673a56f) - Make: Avoid `--all-features` in Rust cross-compilation CI (8be8bffe) - Improve: Arm32 compatibility (64041725) - Make: `cancel-in-progress` CI to shift compute resources (dfc8fa02) - Improve: Harden Swift SDK for 6.1+ toolkit (965cd524) - Make: Strip `.unsafeFlags` & list platforms for SPM consumption (b061b78d) - Make: Expose `CNumKongDispatch` target to Swift us	Medium	4/5/2026
v7.4.2	Release: v7.4.2 [skip ci] ### Patch - Docs: Shrink tables in the main README (6d2ea345) - Make: Inline Power Shell cross-compilation logic in CI (974c30ca) - Make: Define `_ARM64_` for Arm JS builds in MSVC (f3030420) - Make: Skip same-named artifacts on CI reruns (7c098e51)	Medium	4/5/2026
v7.4.1	Release: v7.4.1 [skip ci] ### Patch - Make: Set `repository.url` for NPM (385480d2) - Make: Pull MSVC ARM64 Cross-Compiler (e20c93ef) - Fix: Swap `f16x8` for `u16x8` in `cast_neon` (154ec5db)	Medium	4/5/2026
v7.4.0	- Faster tensor contractions - Faster GEMM "packers" with SIMD - New SVE+SDOT kernels for `i8` - MSVC build stability on Arm ### Minor - Add: WASM elementwise ops & spatial mini-float kernels (81b8c449) - Add: WASM type-casting kernels (e09df318) - Add: SVE+SDOT ops for 8-bit integers (913fc6b0) ### Patch - Fix: Misplaced NEON loads/stores in Sierra (05e30455) - Fix: Avoid unconditional `np` symbols (9dffb681) - Make: Resolve probe locations for NPM consumers (c602f45f) - Doc	Medium	4/4/2026
v7.3.0	This release hardens Arm kernels across NEON, SVE, and SME. The most widespread fix replaces `_x` (don't-care) predicated intrinsics with `_m` (merge-with-zero) variants — inactive lanes left undefined by `_x` could carry stale data into reductions, producing wrong results for non-power-of-two dimensions on real SVE hardware. Partial-tail padding in `BMOPA` is fixed for sub-32-bit types, and strided reductions in NEON are hardened against off-by-one in non-contiguous layouts. > Thanks to the	Medium	4/2/2026
v7.2.4	Release: v7.2.4 [skip ci] ### Patch - Make: 2h timeout budget for JS & Py builds (2e8f081e)	Medium	3/28/2026
v7.2.3	Release: v7.2.3 [skip ci] ### Patch - Fix: Harden implicit narrowing casts (319fae28) - Fix: Negating unsigned integers in MSVC (9be61e3d) - Make: Retry flaky CI jobs (b622d630) - Make: Remove conflicting NEON probes (c0f35733)	Medium	3/28/2026
v7.2.2	Release: v7.2.2 [skip ci] ### Patch - Make: Trusted publishing for NPM (95782713) - Improve: VNNI spatial kernels for E2M3, E3M2, & E4M3 (02d53256) - Fix: `NK_TARGET_NEON` auto-detect in MSVC (4ad21241)	Medium	3/28/2026
v7.2.1	Release: v7.2.1 [skip ci] ### Patch - Improve: Listing compile-time capabilities (0e9f04a8) - Improve: Flush Float16 sums in `spatial/` Float6 kernels (52606b0e) - Make: Slimmer NPM packages per platform (0a18afcb) - Improve: Lower E4M3 Genoa to Icelake with 40% gains (8ade366e)	Medium	3/28/2026
v7.2.0	Nvidia just unveiled Arm-based Olympus cores and Vera CPUs with native support for 8-bit floating-point numbers (FP8). Intel's Xeon 7 Diamond Rapids and Nova Lake CPUs with FP8 may arrive even sooner through the new AVX 10.2 extensions. FP8 arithmetic is at the heart of modern LLM inference, but most of the world's CPUs don't have it yet. NumKong v7.2 bridges that gap — native FP8 on the new chips, efficient emulation on everything else — so more global infrastructure is ready for AI workloads s	Medium	3/28/2026
v7.1.1	- Improve: Smaller `TensorError` state (c5475be2) - Improve: Apply `StorageElement` to every operation class (98064815) - Improve: Drop redundant NEON MinMax in FHM & BFDOT files (96c869f8) - Improve: Simpler `i4` dot-product in NEON (bf61c2c4) - Docs: Apple M5 instruction timings & x86 refresh (835ae52a) - Fix: Fill only upper triangle in other SME kernels (2a93c309) - Fix: Filling only upper triangle in `u1_smebi32` kernels (68f5963b) - Fix: Harden SME streaming behaviour (8fe8cc9f) - I	Medium	3/22/2026
v7.1.0	- Zero-copy Tensor exchange in Python, Rust, & C++ - `std::format` & `core::fmt::Display` for Rust & C++ - Tensors & multi-dimensional iterators for sub-byte types - Documenting Python reductions along an `axis=` - Faster `dtype=` hints resolution in CPython binding - Upgraded CI for Clang cross-compiled binaries - Compiling SME feature checks with old Assembler - Simplify WASM backend usage in browsers ### Minor - Add: `nk::cast` & reduction helpers for C++ (80ff0b03) - Add: Print	Low	3/21/2026
v7.0.0	What started as a straightforward optimization request from the @albumentations-team — improving element-wise operations between equi-dimensional arrays — snowballed into the largest piece of open-source work I've done in years. __200K+ lines of SIMD across 2'000+ kernels__: - targeting every major vector ISA, grouped by platform and shape — __x86 AVX2 vectors__ on Haswell, Alder Lake, Sierra Forest · __x86 AVX-512 vectors__ on Skylake, Ice Lake, Genoa, Sapphire Rapids, Turin · __Intel's fixe	Low	3/17/2026
v6.5.16	Release: v6.5.16 [skip ci] ### Patch - Fix: Surround `#pragma clang` with checks for Clang (#192) (f871d803) - Improve: Reduce native half-precision usage (486d8b5a) - Fix: Unpoison SIMD dispatch results for MemorySanitizer (#304) (2513ee7f) - Fix: Enlarge dummy buffer for SVE predicated loads (#307) (fe9327c5)	Low	3/7/2026
v6.5.15	Release: v6.5.15 [skip ci] ### Patch - Fix: Initialize `dummy_input` to fix MSan false positive (#302) (c2ad842d)	Low	3/4/2026
v6.5.14	Release: v6.5.14 [skip ci] ### Patch - Fix: Wrong predicate width in BF16 SVE L2 kernel (#301) (87ae846) - Improve: FreeBSD comp-time target selection (#300) (cb11f8b)	Low	3/3/2026
v6.5.13	Release: v6.5.13 [skip ci] ### Patch - Fix: Replace `avx2vnni` with `avxvnni` for Sierra Forest (#296) (a8bb232) - Make: Remove `NPM_TOKEN` for OIDC publishing (13cd5bc) - Make: Sign rebase with GitHub Actions bot (e7b89b5) - Fix: Revert to `atol=1` for test integer outputs vs SciPy (b75bdbd)	Low	2/16/2026
v6.5.12	Release: v6.5.12 [skip ci] ### Patch - Make: Same upload/download CI versions (ae9e567)	Low	12/21/2025
v6.5.11	Release: v6.5.11 [skip ci] ### Patch - Improve: Round integer distances (c487b55) - Fix: Absolute tolerance bound for integers (73a9ff7) - Make: Skip flaky Arm failures (6be67bb) - Fix: NEON guard for u8 dot dispatch (2c5876d)	Low	12/20/2025
v6.5.10	Release: v6.5.10 [skip ci] ### Patch - Make: Re-attempt forwarding `NPM_TOKEN` (714c615) - Fix: Misusing `pytest.warns` (2135a58) - Docs: Wording & spelling inconsistencies (1cc8f71) - Fix: `f32` to `bf16` down-casting on BIG-endian (a801b58)	Low	12/18/2025
v6.5.9	Release: v6.5.9 [skip ci] ### Patch - Make: Deno `--no-check` for CI (9976b88) - Fix: CMake relative paths for Termux compatibility (#288) (07976ad) - Make: NPM w/out `NODE_AUTH_TOKEN` (63cd55b) - Fix: Length check in `jaccard_b8_ice` and `hamming_b8_ice` (#286) (a7cc7e1) - Make: Python 3.14 builds (#271) (e4d62e7) - Fix: Avoid `sqrt(0)` in `probability.h` (108a8b5) - Fix: `u64size` in Rust to match the C ABI (31195e9) - Make: Stack-realign for `i386` builds (5a386f5) - Make: 32-bit cross-compi	Low	12/17/2025
v6.5.8	Release: v6.5.8 [skip ci] ### Patch - Make: Avoid half-precision NEON on Windows (6541157) - Make: Retire `macos-13` runners (7d6358e)	Low	12/17/2025
v6.5.7	Release: v6.5.7 [skip ci] ### Patch - Make: Bump CI versions (fdad95c) - Make: NPM Trusted Publishing (581623a) - Make: Conservative Sierra Forest flags (1b2f16c)	Low	12/17/2025
v6.5.6	Release: v6.5.6 [skip ci] ### Patch - Make: Explicit cross-compilation overrides (b97ad62)	Low	12/17/2025
v6.5.5	- Improve: Faster sparse dot product (e5dad6c) - Improve: Turin kernels & cleaner loops in `sparse.h` (d6e17b1) - Fix: `dot_bf16_neon` step (8f3ef10) - Fix: Jensen-Shannon masked accumulation (b9d7834) - Improve: Runtime-defined dimensions (d9ca85d) - Improve: Broader Rust tests (cce9374) - Improve: Test `bf16` dot product (658901d) - Improve: Naming baseline kernels & benchmarks (a897aa9) - Improve: Log accuracy of `i8` & `f32` kernels (c6db82d) - Improve: Slice overlap check steps (ea	Low	11/13/2025
v6.5.4	Release: v6.5.4 [skip ci] ### Patch - Fix: `intersect_u16` test in Rust (682556e) - Fix: Check macro presence on Windows (56a01ef) - Fix: Resetting capability in PyTest (7243bf6) - Fix: JS division by zero with +eps (02fa2a5) - Fix: `ComplexProducts` number of dimensions in Rust (ad429dd) - Improve: Detect NEON+DP via WinAPI (e6cfcad) - Docs: Enumerating x86 platforms (97fa158) - Fix: Probe `mrs` to avoid `SIGILL` on older Arm (b139cc9)	Low	10/30/2025
v6.5.3	Release: v6.5.3 [skip ci] ### Patch - Make: Co-package PyTests (#278) (4491f09)	Low	9/6/2025
v6.5.2	### Patch - Make: Rust 1.64 compatibility (889bf25) - Docs: Inconsistencies & typos (301d59c) - Make: Avoid `Cargo.lock` for the library (5b9c207) - Make: MSVC-friendlier Rust builds (e940014) - Improve: Naming Rust tests (e1fab5c)	Low	9/5/2025
v6.5.1	Release: v6.5.1 [skip ci] ### Patch - Make: Avoid `--lib-sdir .` on Linux (39623cc) - Docs: Probability Distributions in Rust (af7c145)	Low	8/17/2025
v6.5.0	SimSIMD has historically been one of the largest collections of mixed-precision kernels, but `f32` to/from `f16` and `bf16` conversion operators have never been exposed to bindings. This release is the first step in that direction. I look forward to everyone's suggestions on how to further improve the Rust API. Thanks 🤗 --- Here's an example: ```rs use simsimd::{SpatialSimilarity, f16, bf16}; // Process embeddings at different precisions for speed vs accuracy trade-offs let embed	Low	7/7/2025
v6.4.10	Other minor tweaks: - [x] `bf16` L2 calculation in Rust - [x] flushing denormals in Rust - [x] `nonnull` build warnings in GCC & Clang - [x] upgrading JS dependencies ### Patch - Fix: Require NumPy for GIL tests (529b0dd) - Improve: Free threading examples & checks (83e522a) - Make: Enable free-threading `CIBW` builds (0093c3f) - Docs: Setting up `uv` env (8dc7012) - Improve: GIL-free batch-processing in Py (eb234d5) - Make: Drop Python 3.7 for 3.13t (fc62de4) - Improve: Flush	Low	7/6/2025
v6.4.9	Release: v6.4.9 [skip ci] ### Patch - Fix: add dot i8 (Rust) (eaeb3b7)	Low	6/8/2025
v6.4.8	- Fix: GCC can't handle `v8.0-a` decimal (4116f8a) - Fix: `f16`, `i8`, `bf16` compile-time dispatch (29c0f46) - Docs: Globally unset `DEVELOPER_DIR` (22bb40b) - Fix: Check for NEON for R-profile CPUs (a6bbf9e) - Make: Lower `armv8.2` to `armv8.0` requirement (31fbdcd) - Improve: Set `nonnull` attributes (fc61d19) - Docs: Unset `DEVELOPER_DIR` on macOS (69c6614) - Make: Bump Google Benchmark (cca25a0) - Docs: Refresh C example (73e6ccb)	Low	6/6/2025
v6.4.7	Release: v6.4.7 [skip ci] ### Patch - Make: Differentiate `cibw` uploads (9116b2a)	Low	6/1/2025
v6.4.6	Release: v6.4.6 [skip ci] ### Patch - Fix: Deno testing CLI commands (be8acfb) - Make: Bump vulnerable JS deps (506b816) - Make: Checking env. variables on macOS (6cb256f) - Make: Try compiling wheels with different flags (77870cf) - Make: Enable Deno to run pre-builds (71b3412) - Make: Set `f16c` flag for `_cvtss_sh` (1215418) - Fix: Pedantic `_Float16` cast warnings (3230095) - Make: Overwrite JS bundles (c252f84) - Fix: Missing `avx512dq` flags (ace4f7e)	Low	6/1/2025
v6.4.5	Release: v6.4.5 [skip ci] ### Patch - Fix: Aliasing of half-precision types (abb2d88)	Low	5/30/2025
v6.4.4	Release: v6.4.4 [skip ci] ### Patch - Make: Return Rust build errors (#264) (7e3b493)	Low	5/13/2025
v6.4.3	Release: v6.4.3 [skip ci] ### Patch - Fix: Use correct type in sparse dot-product macro (354a6b8)	Low	4/24/2025
v6.4.2	Release: v6.4.2 [skip ci] ### Patch - Fix: `i4` cosine on Ice Lake (#262) (ffdbbf8)	Low	4/23/2025
v6.4.1	Release: v6.4.1 [skip ci] ### Patch - Docs: Dual-licensing with 3-clause BSD (7520fcf)	Low	3/31/2025
v6.4.0	Release: v6.4.0 [skip ci] ### Minor - Add: Expose L2 distance in Swift (#255) (b106afc)	Low	2/26/2025
v6.3.4	Release: v6.3.4 [skip ci] ### Patch - Fix: Turin kernels for `spdot` (#252) (5044fef)	Low	2/19/2025
v6.3.3	Release: v6.3.3 [skip ci] ### Patch - Improve: Sparse intersection dependency chain (#251) (b8ee93f)	Low	2/14/2025
v6.3.2	Release: v6.3.2 [skip ci] ### Patch - Make: Upgrade deprecated CI tools (d9bc3d2)	Low	2/5/2025
v6.3.1	Release: v6.3.1 [skip ci] ### Patch - Make: Update `release.yml` for Arm (d8c6f40) - Make: Use official Docker repo (48d39e3) - Make: Remove conflicting `containerd` on Arm (41e02f9) - Make: Install Docker on Aarch64 (636a22d) - Make: Avoid `extras` repo in `yum` on Aarch64 (aa5aced) - Fix: Wrong variable used in l2sq_bf16_sve (c26008b) - Make: Resolve Windows build conflicts (8e50840)	Low	2/5/2025
v6.3.0	Release: v6.3.0 [skip ci] ### Minor - Add: `simsimd_flush_denormals` (63af257) ### Patch - Make: Faster `cibuildwheel` releases (e54e939) - Make: Fix CI instance label (60048b4) - Make: Use newer Python for `cibuildwheel` (e922019) - Make: Skip 32-bit Windows Python images (372480a) - Make: Use newer image for Arm CI (a63f55f) - Make: Patch `pyproject.toml` (b3e35a9) - Make: Skip PyPy builds (1fe7faa) - Make: `test-command` Windows compatibility (dea5b71) - Make: Skip `armv7l` PyPI builds (92	Low	1/24/2025
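
The v6.5.0 notes mention exposing `f32` to/from `bf16` conversion. As a rough illustration of what that conversion involves, here is a standalone Rust sketch that does not use the SimSIMD crate: `bf16` keeps the upper 16 bits of an `f32` (sign, 8-bit exponent, 7 mantissa bits), so narrowing is a rounded right shift and widening is a plain left shift. The function names `f32_to_bf16_bits` and `bf16_bits_to_f32` are hypothetical, and round-to-nearest-even is an assumed rounding mode, not necessarily the one SimSIMD uses.

```rust
/// Hypothetical helper: narrow an `f32` to raw `bf16` bits using
/// round-to-nearest-even (an assumed rounding mode for this sketch).
fn f32_to_bf16_bits(value: f32) -> u16 {
    let bits = value.to_bits();
    if value.is_nan() {
        // Keep the sign and force a mantissa bit so the result stays NaN
        // instead of collapsing into infinity after truncation.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest bit that survives the shift, so exact
    // ties round toward an even mantissa.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(rounding_bias) >> 16) as u16
}

/// Hypothetical helper: widen raw `bf16` bits back to `f32`.
/// This is exact: the 16 bits simply become the upper half of the `f32`.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

fn main() {
    let x = std::f32::consts::PI;
    let narrowed = f32_to_bf16_bits(x);
    let widened = bf16_bits_to_f32(narrowed);
    println!("{x} -> 0x{narrowed:04X} -> {widened}");
}
```

Widening is lossless, while narrowing drops 16 mantissa bits, which is why round-trips such as `PI -> bf16 -> f32` land on the nearest representable `bf16` value rather than the original.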

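Several releases above touch sparse kernels (`intersect_u16`, sparse intersection, `spdot`). The scalar Rust sketch below shows the basic idea such kernels accelerate: two sparse vectors stored as sorted, deduplicated `u16` indices with one weight each are walked with two cursors, and only indices present in both vectors contribute. The name `sparse_dot`, the `f32` weight type, and the slice layout are assumptions for illustration; SimSIMD's own kernels are SIMD-optimized and their signatures may differ.

```rust
use std::cmp::Ordering;

/// Hypothetical scalar baseline: dot product of two sparse vectors, each
/// given as sorted, deduplicated `u16` indices plus one `f32` weight per index.
fn sparse_dot(a_idx: &[u16], a_w: &[f32], b_idx: &[u16], b_w: &[f32]) -> f32 {
    assert_eq!(a_idx.len(), a_w.len(), "one weight per index in `a`");
    assert_eq!(b_idx.len(), b_w.len(), "one weight per index in `b`");
    let (mut i, mut j) = (0, 0);
    let mut sum = 0.0f32;
    // Advance whichever cursor points at the smaller index; only equal
    // indices (the set intersection) add to the result.
    while i < a_idx.len() && j < b_idx.len() {
        match a_idx[i].cmp(&b_idx[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                sum += a_w[i] * b_w[j];
                i += 1;
                j += 1;
            }
        }
    }
    sum
}

fn main() {
    let (a_idx, a_w) = ([1u16, 3, 5, 7], [1.0f32, 2.0, 3.0, 4.0]);
    let (b_idx, b_w) = ([3u16, 4, 5, 8], [10.0f32, 20.0, 30.0, 40.0]);
    println!("{}", sparse_dot(&a_idx, &a_w, &b_idx, &b_w));
}
```

Setting every weight to `1.0` turns the result into the size of the index intersection, which is what a set-intersection kernel counts.
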

Similar Packages

alibabacloud-adb20211201: Alibaba Cloud adb (20211201) SDK Library for Python (master@2026-06-06)

ydb: YDB Python SDK (3.29.1)

typer: Typer, build great CLIs. Easy to code. Based on Python type hints. (0.26.7)

django-timezone-field: A Django app providing DB, form, and REST framework fields for zoneinfo and pytz timezone objects. (main@2026-06-03)

azure-storage-blob: Microsoft Azure Blob Storage Client Library for Python (azure-mgmt-computelimit_1.1.0)

More in Databases

milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

WeKnora: LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.

ai-real-estate-assistant: Advanced AI Real Estate Assistant using RAG, LLMs, and Python. Features market analysis, property valuation, and intelligent search.

alibabacloud-adb20211201: Alibaba Cloud adb (20211201) SDK Library for Python