# fast-plaid

> High-Performance Engine for Multi-Vector Search

- **URL**: https://www.freshcrate.ai/projects/fast-plaid
- **Author**: lightonai
- **Category**: Databases
- **Latest version**: `1.4.7` (2026-05-28)
- **License**: MIT
- **Source**: https://github.com/lightonai/fast-plaid
- **Language**: Python
- **GitHub**: 245 stars, 21 forks
- **Registry**: github
- **Tags**: `colbert`, `colpali`, `information-retrieval`, `python`, `rust`, `vector-database`

## Description

High-Performance Engine for Multi-Vector Search
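
This page does not spell out what "multi-vector search" scores. A minimal sketch, assuming the ColBERT-style late-interaction (MaxSim) scoring suggested by the `colbert` tag: each document and query is a set of token vectors, and a document's score is the sum, over query vectors, of the best dot product against any document vector. The function names and toy data below are hypothetical and are not fast-plaid's API.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vectors, doc_vectors):
    """Sum over query vectors of the best-matching document vector (MaxSim)."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# Toy data: two query token vectors and two documents with two token vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.5, 0.5]],
    "doc_b": [[0.0, 1.0], [0.0, 0.5]],
}

# Rank documents by descending MaxSim score.
ranking = sorted(docs, key=lambda name: maxsim_score(query, docs[name]), reverse=True)
print(ranking)
```

This brute-force loop scores every document exhaustively. An engine like the one described here exists to avoid that cost on large collections.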

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `1.4.7` | 2026-05-28 | High | **What's new:** **`freeze()` / `unfreeze()`**: drop and restore the per-shard `{i}.codes.npy` / `{i}.residuals.npy` files for read-only indexes, roughly halving on-disk size. Search is unaffected; `update()` / `delete()` raise on a frozen index until `unfreeze()` is called. The unfreeze rebuild is byte-identical to the original Rust-written shards. See PR [#45](https://github.com/lightonai/fast-plaid/pull/45) for details. |
| `1.4.5` | 2026-03-23 | Medium | Fast-Plaid 1.4.5  ## Bug Fix  Fixed a panic during index creation on large datasets. PyTorch's native `quantile()` uses a sort-based implementation that fails with "input tensor is too large" when the residual tensor exceeds an internal size limit. This caused a `PanicException` in the `create` path for large document collections.  All remaining `Tensor::quantile()` calls in the index creation code (bucket cutoffs and bucket weights computation) have been replaced with the `kthvalue`-based appro |
| `1.4.2` | 2026-01-13 | Low | Fast-Plaid 1.4.2  The previous version added incremental update support for Fast-Plaid indexes but introduced a bug when deleting documents from the index: the buffer used to compute good centroids was not updated. The bug affected indexes that relied on the `delete` method. |
| `1.4.1` | 2026-01-05 | Low | # Fast-Plaid 1.4.1 Release Notes  ## Overview  Fast-Plaid 1.4.1 introduces **incremental index updates** with dynamic centroid expansion and a new **low memory mode** that significantly reduces GPU VRAM usage. This release focuses on making Fast-Plaid more efficient for production workloads with evolving document collections.  ## Key Features  ### Incremental Updates with Dynamic Centroid Expansion  The `.update()` method now supports intelligent centroid management:  - **Buffered Up |
| `1.3.1` | 2025-12-17 | Low | Small release that reduces the memory usage of Fast-Plaid index creation. Getting better one step at a time 😊 |
| `1.3.0` | 2025-12-04 | Low | ## v1.3.0: Memory Optimizations & Architecture Improvements  This release introduces significant reductions in memory usage and improves index management.  ### 🚀 Performance & Memory * **Memory-Mapped Loading:** Implemented a new loading system with incremental updates and zero-copy validation to prevent loading entire indices into RAM with `update` method. * **Optimized Tensors:** Shifted to smaller integer types (`Uint8`, `Int32`) where appropriate and replaced `torch.quantile` with a c |
| `1.2.5` | 2025-10-29 | Low | FastPlaid 1.2.5: Leaner & Faster  We're excited to release FastPlaid 1.2.5! This version focuses on significant optimizations for indexing, giving you faster search speeds and much more efficient GPU VRAM management.  ✨ Highlights  - Drastically Reduced GPU VRAM Usage: We've refactored the indexing process to process document embeddings in batches. This massively reduces GPU VRAM consumption during index creation, all without compromising on speed. No impact on overall CPU RAM usage or ind |
| `1.2.4` | 2025-09-23 | Low | Version 1.2.4 of fast-plaid now supports Python 3.13 and uploads dedicated wheels to PyPI. 🚀 |
| `1.2.3` | 2025-09-22 | Low | Version 1.2.3 of Fast-Plaid makes the index more mutable by adding deletion of specific embeddings. It also includes a built-in SQLite filtering pipeline. |
| `1.2.1` | 2025-09-10 | Low | This release allows feeding Fast-Plaid un-padded queries. It also normalizes decompressed embeddings to further improve results, and fixes an issue on small datasets where fast-kmeans would be initialized with more clusters than training data points. This version will be integrated into PyLate as the search backend. |

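The 1.4.5 note above says the sort-based `quantile()` calls used for bucket cutoffs were replaced with a `kthvalue`-based alternative (the note is truncated in this listing). As a rough illustration of the general idea only, the sketch below approximates a quantile by selecting the k-th smallest value with the nearest-rank rule, in plain Python. `kth_value_quantile` and the sample data are hypothetical; fast-plaid's actual Rust implementation may differ.

```python
import heapq
import math


def kth_value_quantile(values, q):
    """Approximate the q-quantile by the k-th smallest value (nearest rank).

    Hypothetical helper for illustration; not part of fast-plaid's API.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    n = len(values)
    # Nearest-rank rule: 1-based rank k = ceil(q * n), clamped to [1, n].
    k = min(max(math.ceil(q * n), 1), n)
    # nsmallest returns the k smallest items in ascending order,
    # so the last one is the k-th smallest value.
    return heapq.nsmallest(k, values)[-1]


# Toy residual values; three cutoffs split them into four buckets.
residuals = [0.9, -0.4, 0.1, 0.7, -0.8, 0.3, -0.2, 0.5]
cutoffs = [kth_value_quantile(residuals, q) for q in (0.25, 0.5, 0.75)]
print(cutoffs)
```

Unlike an interpolating quantile, the nearest-rank rule always returns a value that is present in the input.
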
## Citation

- HTML: https://www.freshcrate.ai/projects/fast-plaid
- Markdown: https://www.freshcrate.ai/projects/fast-plaid.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/fast-plaid/deps

_Generated by freshcrate.ai. Indexes GitHub releases for AI-agent ecosystem packages._