# simsimd

> Portable mixed-precision BLAS-like vector math library for x86 and ARM

- **URL**: https://www.freshcrate.ai/projects/simsimd
- **Author**: Ash Vardanian
- **Category**: Databases
- **Latest version**: `v7.7.0` (2026-05-23)
- **License**: Apache-2.0
- **Source**: https://github.com/ashvardanian/simsimd
- **Language**: C
- **GitHub**: 1,801 stars, 116 forks
- **Registry**: pypi (`simsimd`)
- **Tags**: `pypi`

## Description

![SimSIMD banner](https://github.com/ashvardanian/ashvardanian/blob/master/repositories/SimSIMD.jpg?raw=true)

Computing dot-products, similarity measures, and distances between low- and high-dimensional vectors is ubiquitous in Machine Learning, Scientific Computing, Geospatial Analysis, and Information Retrieval.
These algorithms generally have linear complexity in time, constant or linear complexity in space, and are data-parallel.
In other words, they are easily parallelizable and vectorizable, and are often available in packages like BLAS (level 1) and LAPACK, as well as in the higher-level `numpy` and `scipy` Python libraries.
Ironically, even with decades of evolution in compilers and numerical computing, [most libraries can be 3-200x slower than hardware potential][benchmarks] even on the most popular hardware, like 64-bit x86 and Arm CPUs.
Moreover, most lack mixed-precision support, which is crucial for modern AI!
The rare few that support minimal mixed precision run only on one platform and are vendor-locked by companies like Intel and Nvidia.
SimSIMD provides an alternative.

1️⃣ SimSIMD functions are practically as fast as `memcpy`.
2️⃣ Unlike BLAS, most kernels are designed for mixed-precision and bit-level operations.
3️⃣ SimSIMD often [ships more binaries than NumPy][compatibility] and has more backends than most BLAS implementations, and more high-level interfaces than most libraries.

[benchmarks]: https://ashvardanian.com/posts/simsimd-faster-scipy
[compatibility]: https://pypi.org/project/simsimd/#files

## Features

__SimSIMD__ (Arabic: "سيمسيم دي") is a mixed-precision math library of __over 350 SIMD-optimized kernels__ extensively used in AI, Search, and DBMS workloads.
Named after the iconic ["Open Sesame"](https://en.wikipedia.org/wiki/Open_sesame) command that opened doors to treasure in _Ali Baba and the Forty Thieves_, SimSIMD can help you 10x the cost-efficiency of your computational pipelines.
Implemented distance functions include:

- Euclidean (L2) and Cosine (Angular) spatial distances for Vector Search. _[docs][docs-spatial]_
- Dot-Products for real & complex vectors for DSP & Quantum computing. _[docs][docs-dot]_
- Hamming (~ Manhattan) and Jaccard (~ Tanimoto) bit-level distances. _[docs][docs-binary]_
- Set Intersections for Sparse Vectors and Text Analysis. _[docs][docs-sparse]_
- Mahalanobis distance and Quadratic forms for Scientific Computing. _[docs][docs-curved]_
- Kullback-Leibler and Jensen–Shannon divergences for probability distributions. _[docs][docs-probability]_
- Fused-Multiply-Add (FMA) and Weighted Sums to replace BLAS level 1 functions. _[docs][docs-fma]_
- For Levenshtein, Needleman–Wunsch, and Smith-Waterman, check [StringZilla][stringzilla].
- 🔜 Haversine and Vincenty's formulae for Geospatial Analysis.

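
For orientation, several of the measures listed above can be written out as textbook definitions in plain Python. This is only a reference sketch and not SimSIMD's API or its SIMD kernels: the function names (`dot`, `sqeuclidean`, `cosine_distance`, `hamming_bits`, `jaccard_bits`, `kl_divergence`, `js_divergence`) are hypothetical, and the handling of zero vectors and empty bit sets is an assumption made for the example.

```python
import math


def dot(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))


def sqeuclidean(a, b):
    """Squared Euclidean (L2) distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity of a and b."""
    ab, aa, bb = dot(a, b), dot(a, a), dot(b, b)
    if aa == 0 or bb == 0:
        # Assumed convention: two zero vectors are identical,
        # a zero vector is maximally distant from a non-zero one.
        return 0.0 if aa == bb else 1.0
    return 1.0 - ab / math.sqrt(aa * bb)


def hamming_bits(a: bytes, b: bytes) -> int:
    """Number of differing bits between two bit-packed vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def jaccard_bits(a: bytes, b: bytes) -> float:
    """One minus |intersection| / |union| over the set bits."""
    inter = sum(bin(x & y).count("1") for x, y in zip(a, b))
    union = sum(bin(x | y).count("1") for x, y in zip(a, b))
    # Assumed convention: two empty sets have distance zero.
    return 1.0 - inter / union if union else 0.0


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), natural logarithm.

    Terms with p_i == 0 contribute nothing; q_i must be positive
    wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrised KL against the midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Each function makes a single pass over its inputs, which matches the linear time complexity and data-parallel structure described earlier. An optimized library computes the same quantities with vectorized loads and, for low-precision inputs, wider accumulators.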

[docs-spatial]: #cosine-similarity-reciprocal-square-root-and-newton-raphson-iteration
[docs-curved]: #curved-spaces-mahalanobis-distance-and-bilinear-quadratic-forms
[docs-sparse]: #set-intersection-galloping-and-binary-search
[docs-binary]: https://github.com/ashvardanian/SimSIMD/pull/138
[docs-dot]: #complex-dot-products-conjugate-dot-products-and-complex-numbers
[docs-probability]: #logarithms-in-kullback-leibler--jensenshannon-divergences
[docs-fma]: #mixed-p

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `v7.7.0` | 2026-05-23 | High | Release: v7.7.0 [skip ci]. **Minor**: Add: Rust trait reorganisation, bit reductions, macro purge (31217375); Add: Tensor `fill_zeros`, `fill`, `copy`, popcount-style `BitwiseReductions` (01274600). **Patch**: Fix: `const` friendly & rank-aware tensor ops (ef37cf9b); Improve: Harden tensor shapes against `-flto` (3eca0d2c); Improve: Accept any integral in `sub_byte_ref::operator=`, clamp out-of-range (f3004f63); Improve: Collapse per-lane finalize args to one pointer-to-vec shape (a35ddcde) |
| `6.5.16` | 2026-04-21 | Low | Imported from PyPI (6.5.16) |
| `v7.6.0` | 2026-04-20 | High | **CUDA & C++ 20 Compatibility**: NVCC 13 caps its language-standard flag at C++20, and our multi-argument subscript overloads from C++23 P2128 made `tensor.hpp` unparseable by cudafe++. We added call-operator primaries that mirror every multi-argument subscript overload in the tensor view, span, and owning container types, and kept the bracket sugar behind an `__cpp_multidimensional_subscript` feature test so older toolchains pick the portable spelling automatically. Downstream CUDA callers now |
| `v7.6.0` | 2026-04-20 | Medium | **CUDA & C++ 20 Compatibility**: NVCC 13 caps its language-standard flag at C++20, and our multi-argument subscript overloads from C++23 P2128 made `tensor.hpp` unparseable by cudafe++. We added call-operator primaries that mirror every multi-argument subscript overload in the tensor view, span, and owning container types, and kept the bracket sugar behind an `__cpp_multidimensional_subscript` feature test so older toolchains pick the portable spelling automatically. Downstream CUDA callers now |

## Citation

- HTML: https://www.freshcrate.ai/projects/simsimd
- Markdown: https://www.freshcrate.ai/projects/simsimd.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/simsimd/deps

_Generated by freshcrate.ai. Indexes pypi releases for AI-agent ecosystem packages._