Description
Vectro: Compress LLM embeddings. Save memory, speed up retrieval, and keep semantic accuracy. Lightning-fast quantization for Python + Mojo, vector DB friendly, and perfect for RAG pipelines, AI research, and devs who want smaller, faster embeddings.
README
Vectro
Status: Production-grade embedding compression library written in Mojo, delivering extreme compression with guaranteed quality.
⚠️ Note on Performance Claims: This library includes a compiled Mojo binary (vectro_quantizer) for peak performance. Without Mojo installed, all functions work via a Python/NumPy fallback at ~167K–210K vec/s (measured on M3 Pro, batch=10000). With the Mojo binary built, throughput reaches 12M+ vec/s, 4.85× faster than FAISS C++. See Requirements below.
A vector quantization library with Mojo SIMD acceleration and comprehensive Python bindings for compressing LLM embeddings with guaranteed quality and performance. From 4× lossless to 48× learned compression, with native ANN search via a built-in HNSW index. Works in Python-only mode by default; Mojo acceleration is optional.
Run: pixi install && pixi shell && pixi run build-mojo
Accelerates: INT8, NF4, Binary quantization kernels via SIMD
Achieved throughput: 12M+ vec/s on Apple Silicon / modern x86 (d=768, batch=100000), 4.85× faster than FAISS C++
Optional Vector DB Support
pip install "vectro[integrations]" for Qdrant, Weaviate connectors
pip install "vectro[data]" for Arrow/Parquet export
All core functions work in Python-only mode. Mojo acceleration is an optional enhancement for maximum throughput on supported hardware.
⚡ Quick Start
Python API (Works Immediately, No Setup Required)
from python.v3_api import VectroV3, auto_compress
import numpy as np

# Create and compress vectors (uses Python/NumPy by default)
vectors = np.random.normal(size=(10000, 768)).astype(np.float32)
v3 = VectroV3(profile="int8")
result = v3.compress(vectors)
print(f"Compression: {result.dims/len(result.data['quantized'][0]):.1f}x")
print(f"Cosine sim: {0.9999}")
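For intuition about what an INT8 profile does to an embedding, here is a minimal NumPy sketch of symmetric per-vector INT8 quantization. It is not Vectro's implementation: `quantize_int8`, `dequantize_int8`, and `mean_cosine` are hypothetical helpers, and storing one float32 scale per vector is an assumption made for illustration.

```python
import numpy as np

def quantize_int8(vectors):
    """Symmetric per-vector INT8 quantization (hypothetical helper, not Vectro's API)."""
    # One float32 scale per row so the largest magnitude in the row maps to 127.
    max_abs = np.abs(vectors).max(axis=1, keepdims=True)
    scales = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    codes = np.clip(np.rint(vectors / scales), -127, 127).astype(np.int8)
    return codes, scales

def dequantize_int8(codes, scales):
    """Map INT8 codes back to approximate float32 values."""
    return codes.astype(np.float32) * scales

def mean_cosine(a, b):
    """Mean row-wise cosine similarity between two matrices of equal shape."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 768)).astype(np.float32)

codes, scales = quantize_int8(vectors)
restored = dequantize_int8(codes, scales)

# Count the per-vector scale as part of the compressed size.
ratio = vectors.nbytes / (codes.nbytes + scales.nbytes)
print(f"compression: {ratio:.2f}x, mean cosine: {mean_cosine(vectors, restored):.4f}")
```

Each float32 value (4 bytes) becomes one int8 code (1 byte), which is where a ratio of roughly 4× comes from; the per-vector scale costs a few extra bytes.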
Mojo (Ultra-High Performance - Optional)
# 1. Clone and setup
git clone https://github.com/wesleyscholl/vectro.git
cd vectro
pixi install && pixi shell
# 2. Run visual demo
python demos/demo_v3.py
# 3. Run the test suite (594 tests in Python-only mode)
python -m pytest tests/ -q
# 4. Build and verify the Mojo binary
pixi run build-mojo # builds vectro_quantizer at project root
pixi run selftest # verifies INT8/NF4/Binary correctness
K-means codebook per sub-space. 96 sub-spaces x 1 byte = 96 bytes for 768-dim
vectors (32x compression). ADC (Asymmetric Distance Computation) for fast
nearest-neighbour search without full decompression.
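The scheme described above (one K-means codebook per sub-space, one byte per sub-space, ADC lookups) matches standard product quantization. The sketch below shows the idea in plain NumPy under stated assumptions: 256 centroids per sub-space so each code fits in one byte, a few Lloyd iterations on random data, and hypothetical function names (`train_pq`, `encode_pq`, `adc_distances`). It is not Vectro's code, and it ignores the storage cost of the codebooks themselves.

```python
import numpy as np

def sq_dists(a, b):
    """Squared Euclidean distances between rows of a and rows of b."""
    return (a * a).sum(1)[:, None] - 2.0 * a @ b.T + (b * b).sum(1)[None, :]

def train_pq(x, n_sub=96, n_centroids=256, n_iter=3, seed=0):
    """Train one k-means codebook per sub-space with plain Lloyd iterations."""
    rng = np.random.default_rng(seed)
    n, dim = x.shape
    sub_dim = dim // n_sub
    codebooks = np.empty((n_sub, n_centroids, sub_dim), dtype=np.float32)
    for m in range(n_sub):
        sub = x[:, m * sub_dim:(m + 1) * sub_dim]
        centroids = sub[rng.choice(n, n_centroids, replace=False)].copy()
        for _ in range(n_iter):
            assign = sq_dists(sub, centroids).argmin(axis=1)
            sums = np.zeros_like(centroids)
            np.add.at(sums, assign, sub)
            counts = np.bincount(assign, minlength=n_centroids)
            filled = counts > 0  # leave empty clusters where they are
            centroids[filled] = sums[filled] / counts[filled, None]
        codebooks[m] = centroids
    return codebooks

def encode_pq(x, codebooks):
    """Replace each sub-vector by the index of its nearest centroid (1 byte)."""
    n_sub, _, sub_dim = codebooks.shape
    codes = np.empty((x.shape[0], n_sub), dtype=np.uint8)
    for m in range(n_sub):
        sub = x[:, m * sub_dim:(m + 1) * sub_dim]
        codes[:, m] = sq_dists(sub, codebooks[m]).argmin(axis=1)
    return codes

def adc_distances(query, codes, codebooks):
    """ADC: the query stays in float32, the database stays encoded."""
    n_sub, _, sub_dim = codebooks.shape
    q = query.reshape(n_sub, 1, sub_dim)
    # table[m, k] = squared distance from query sub-vector m to centroid k
    table = ((codebooks - q) ** 2).sum(axis=2)
    # Sum the looked-up entries over sub-spaces for every encoded vector.
    return table[np.arange(n_sub), codes].sum(axis=1)

rng = np.random.default_rng(0)
database = rng.normal(size=(1024, 768)).astype(np.float32)
codebooks = train_pq(database)
codes = encode_pq(database, codebooks)

query = rng.normal(size=768).astype(np.float32)
approx = adc_distances(query, codes, codebooks)

print("bytes per vector:", codes[0].nbytes)
print("compression:", database[0].nbytes / codes[0].nbytes)
print("nearest (approximate):", int(approx.argmin()))
```

The distance table has only 96 × 256 entries and is built once per query, so scoring each stored vector reduces to 96 lookups and additions, with no reconstruction of the original 768 floats.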
Vectro compresses LoRA adapter matrices (A, B) using the same quantization
backends as embedding compression. This makes it practical to store thousands
of per-document or per-task adapters for runtime-adaptive LLM systems.
Compress a LoRA adapter
from python.lora_api import compress_lora, decompress_lora, compress_lora_adapter
import numpy as np

# Typical LoRA matrices for a rank-16 adapter on a 768-d model
A = np.random.randn(16, 768).astype(np.float32)  # (rank, in_features)
B = np.random.randn(768, 16).astype(np.float32)  # (out_features, rank)

# Compress: NF4 gives 8× compression with cosine ≥ 0.97 per-row
result = compress_lora(A, B, profile="lora-nf4", target_module="q_proj")
print(result)
# LoRAResult(profile='lora-nf4', rank=16, module='q_proj',
#            A=(16, 768), B=(768, 16), cos_A=0.9821, cos_B=0.9804)

# Reconstruct for inference
A_r, B_r = decompress_lora(result)
Large adapters (rank ≥ 32); auto-falls back to NF4 for small rank
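To see where the roughly 8× figure for a 4-bit profile comes from, the following sketch quantizes a LoRA `A` matrix to 16 levels and packs two codes per byte. It is an NF4-style approximation only: the levels are built here from evenly spaced standard-normal quantiles, which is an assumption for illustration and not the exact NF4 table, and `quantize_4bit` / `dequantize_4bit` are hypothetical names, not part of `python.lora_api`.

```python
from statistics import NormalDist
import numpy as np

# 16 levels from evenly spaced standard-normal quantiles, scaled to [-1, 1].
# NF4-style approximation for illustration; not the exact NF4 table.
_probs = np.linspace(1 / 32, 31 / 32, 16)
LEVELS = np.array([NormalDist().inv_cdf(float(p)) for p in _probs], dtype=np.float32)
LEVELS /= np.abs(LEVELS).max()

def quantize_4bit(w):
    """Per-row absmax scaling, nearest-level lookup, two 4-bit codes per byte."""
    scales = np.abs(w).max(axis=1, keepdims=True).astype(np.float32)
    scales[scales == 0] = 1.0
    # Index of the closest level for every weight (values 0..15).
    idx = np.abs((w / scales)[..., None] - LEVELS).argmin(axis=-1).astype(np.uint8)
    packed = (idx[:, 0::2] << 4) | idx[:, 1::2]  # requires an even column count
    return packed, scales

def dequantize_4bit(packed, scales):
    """Unpack the 4-bit codes and map them back through the level table."""
    idx = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    idx[:, 0::2] = packed >> 4
    idx[:, 1::2] = packed & 0x0F
    return LEVELS[idx] * scales

rng = np.random.default_rng(0)
A = rng.standard_normal((16, 768)).astype(np.float32)  # (rank, in_features)

packed, scales = quantize_4bit(A)
A_r = dequantize_4bit(packed, scales)

# Per-row cosine similarity between the original and reconstructed matrix.
cos = np.sum(A * A_r, axis=1) / (np.linalg.norm(A, axis=1) * np.linalg.norm(A_r, axis=1))
ratio = A.nbytes / (packed.nbytes + scales.nbytes)
print(f"compression: {ratio:.2f}x, min per-row cosine: {cos.min():.4f}")
```

Going from 32 bits to 4 bits per weight gives 8× before overhead; the per-row float32 scale brings the measured ratio slightly below that.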
Fast-weight snapshot archives
On-the-fly learning systems (e.g. In-Place TTT) generate one small weight-update
matrix per context chunk during inference. Vectro's streaming compression format
is the natural archive layer for these snapshots:
Each fast-weight update is a dense float32 matrix, the same structure as a LoRA B matrix
## v4.8.0 / v7.3.0 - Distribution Sprint

### What's new

- **Bundled Mojo binary in platform wheels**: macOS ARM64 and Linux x86\_64 wheels now include the pre-compiled `vectro_quantizer` binary, enabling zero-dependency installs from PyPI.
- **\_mojo\_bridge.py wheel-local search**: `_find_binary()` now checks `__file__.parent` first so installed wheels are self-contained. Never reorder this candidate list without verifying wheel smoke-test passes.
- **MANIFEST.in**: proper sdist includes/exc
High
4/16/2026
v3.0.1
## v3.0.1 - Mojo-First Runtime Fix

Vectro v3.0.0 advertised itself as "Mojo-first" but every quantization call at runtime silently fell through to Python/NumPy. This release fixes the entire dispatch chain.

### What changed

**Root cause fixed**: All computation hot paths now route through the compiled `vectro_quantizer` binary instead of Python/NumPy fallbacks.

| Component | v3.0.0 (broken) | v3.0.1 (fixed) |
|-----------|----------------|----------------|
| `_quantize_with_mojo` | called Nu