Seismic is a fast and lightweight search engine for learned sparse embeddings, written in Rust with Python bindings. It indexes sparse vector collections and retrieves results in microseconds with near-exact accuracy.
- Python >= 3.8
- Rust toolchain (only needed if installing from source for hardware-specific optimizations)
The easiest way to use Seismic is via its Python API, which can be installed in two different ways:
- via pip (the simplest option):
pip install pyseismic-lsr
- via Rust compilation, which allows deeper hardware optimizations (requires a working Rust toolchain, installable via rustup):
RUSTFLAGS="-C target-cpu=native" pip install --no-binary :all: pyseismic-lsr
Check docs/PythonUsage.md for more details.
Given a collection stored as a JSONL file, you can quickly index it by running:
from seismic import SeismicIndex
json_input_file = "" # Your data collection
index = SeismicIndex.build(json_input_file)
print("Number of documents:", index.len)
print("Avg number of non-zero components:", index.nnz / index.len)
print("Dimensionality of the vectors:", index.dim)
index.print_space_usage_byte()
You can then use Seismic to run your queries quickly:
import numpy as np
MAX_TOKEN_LEN = 30
string_type = f'U{MAX_TOKEN_LEN}'
query = {"a": 3.5, "certain": 3.5, "query": 0.4}
query_id = "0"
query_components = np.array(list(query.keys()), dtype=string_type)
query_values = np.array(list(query.values()), dtype=np.float32)
results = index.search(
query_id=query_id,
query_components=query_components,
query_values=query_values,
k=10,
query_cut=3,
heap_factor=0.8,
)
Each document in the jsonl file should be a JSON object with an id (integer), an optional content (string), and a vector (dictionary mapping tokens to scores, e.g., {"dog": 2.45}). See docs/RunExperiments.md for full format details.
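As a minimal sketch of the input format described above, the following snippet writes a toy collection to a JSONL file using only the standard library. The documents, tokens, scores, file name, and the helpers `write_jsonl` and `read_jsonl` are all hypothetical and exist only for illustration; docs/RunExperiments.md remains the reference for the exact format.

```python
import json
import os
import tempfile

# Hypothetical toy collection; tokens and scores are made up for illustration.
documents = [
    {"id": 0, "content": "A dog runs in the park.", "vector": {"dog": 2.45, "park": 1.10, "run": 0.87}},
    {"id": 1, "content": "Cats sleep all day.", "vector": {"cat": 2.31, "sleep": 1.62}},
    {"id": 2, "vector": {"bird": 1.95, "fly": 1.20}},  # "content" is optional
]


def write_jsonl(docs, path):
    """Write one JSON object per line, as the JSONL format requires."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")


def read_jsonl(path):
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


json_input_file = os.path.join(tempfile.mkdtemp(), "toy_collection.jsonl")
write_jsonl(documents, json_input_file)
loaded = read_jsonl(json_input_file)
print(len(loaded), "documents written to", json_input_file)
```

The resulting path can then be passed as `json_input_file` to `SeismicIndex.build`, as in the indexing snippet above.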
- Multiple index variants: Standard (SeismicIndex), compressed (SeismicIndexDotVByte), and large vocabulary (SeismicIndexLV) for collections with >65K unique tokens
- RAG-ready: Build the index with load_content=True and retrieve document texts alongside scores (see examples/RAG.ipynb)
- Python & Rust APIs: Use from Python via pyseismic-lsr or integrate directly in Rust via cargo add seismic (see docs/RustUsage.md)
- Parallel batch search: Multi-threaded query processing via batch_search
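To decide between the standard and the large-vocabulary variant, it can help to count the distinct tokens in a collection first. The sketch below does this with the standard library only. It assumes that ">65K" means more than 2**16 = 65,536 tokens, which is an interpretation and not something stated in this README; `count_unique_tokens` and `suggest_index_class` are hypothetical helpers, not part of the Seismic API.

```python
import json

# Assumption: ">65K unique tokens" is read here as more than 2**16 tokens.
LARGE_VOCAB_THRESHOLD = 2 ** 16


def count_unique_tokens(lines):
    """Count distinct tokens across the "vector" field of JSONL lines."""
    vocab = set()
    for line in lines:
        line = line.strip()
        if line:
            vocab.update(json.loads(line)["vector"].keys())
    return len(vocab)


def suggest_index_class(num_tokens, threshold=LARGE_VOCAB_THRESHOLD):
    """Return the name of the index variant suggested by the vocabulary size."""
    return "SeismicIndexLV" if num_tokens > threshold else "SeismicIndex"


# Hypothetical two-document sample; in practice, iterate over the open file.
sample_lines = [
    '{"id": 0, "vector": {"dog": 2.45, "park": 1.1}}',
    '{"id": 1, "vector": {"dog": 0.9, "cat": 2.3}}',
]
num_tokens = count_unique_tokens(sample_lines)
print(num_tokens, suggest_index_class(num_tokens))
```

For a real collection, pass the open file object instead of `sample_lines`, since the function only needs an iterable of lines.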
Interactive Jupyter notebooks are available in the examples/ folder:
- HandsOnSeismic.ipynb - Quick 2-minute overview of building and querying an index
- SeismicGuide.ipynb - Comprehensive guide covering all features: indexing, k-NN graphs, search, evaluation
- RAG.ipynb - Plug Seismic into a RAG pipeline with document content retrieval
- DotVByteIndex.ipynb - Memory-efficient compressed index variant
- LargeVocabulary.ipynb - Handling collections with large vocabularies (>65K tokens)
Comparison with Dynamic Superblock Pruning (DSP) using the splade-v3 encoding of the MS MARCO dataset.
| Index | MRR@10 | AQT (μs) | Memory (GB) |
|---|---|---|---|
| DSP | 40.28 | 745 | 24.0 |
| Seismic | 40.27 | 185 | 7.9 |
Experiments were performed in single-threaded mode on an Intel Core Ultra 7 265K CPU equipped with 124 GB of RAM.
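The relative differences implied by the table can be worked out directly. The snippet below is a small arithmetic sketch that copies the reported numbers and derives the query-time speedup, the memory ratio, and the MRR@10 gap. The dictionary layout and variable names are illustrative only.

```python
# Values copied from the comparison table above.
results = {
    "DSP": {"mrr_at_10": 40.28, "aqt": 745, "memory_gb": 24.0},
    "Seismic": {"mrr_at_10": 40.27, "aqt": 185, "memory_gb": 7.9},
}

dsp, seismic = results["DSP"], results["Seismic"]

# How many times faster Seismic answers a query, on average.
speedup = dsp["aqt"] / seismic["aqt"]
# How many times less memory the Seismic index needs.
memory_ratio = dsp["memory_gb"] / seismic["memory_gb"]
# Absolute difference in MRR@10 between the two indexes.
mrr_gap = dsp["mrr_at_10"] - seismic["mrr_at_10"]

print(f"Query-time speedup: {speedup:.2f}x")   # 4.03x
print(f"Memory reduction: {memory_ratio:.2f}x")  # 3.04x
print(f"MRR@10 gap: {mrr_gap:.2f}")            # 0.01
```

In this setup, Seismic is roughly 4x faster and uses roughly 3x less memory, with an MRR@10 difference of 0.01.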
Seismic is an approximate algorithm designed for high-performance retrieval over learned sparse representations. We provide pre-optimized configurations for several common datasets, e.g., MS MARCO. Check the detailed documentation in docs/BestResults.md and the available optimized configurations in experiments/best_configs.
Check out our docs folder for detailed guides:
- PythonUsage.md - How to use the Seismic Python API.
- RustUsage.md - How to use Seismic directly in Rust.
- Guidelines.md - Step-by-step guide to build your Seismic index with hyperparameter tuning tips.
- BestResults.md - A detailed guide on how to replicate results with optimized configurations.
- RunExperiments.md - How to run custom experiments, download datasets, and data format details.
- TomlInstructions.md - TOML configuration reference.
Bibliography:
- Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, and Rossano Venturini. "Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations." Proc. ACM SIGIR. 2024.
- Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, and Rossano Venturini. "Pairing Clustered Inverted Indexes with κ-NN Graphs for Fast Approximate Retrieval over Learned Sparse Representations." Proc. ACM CIKM. 2024.
- Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini, and Leonardo Venuta. "Investigating the Scalability of Approximate Sparse Retrieval Algorithms to Massive Datasets." Proc. ECIR. 2025.
- Sebastian Bruch, Martino Fontana, Franco Maria Nardini, Cosimo Rulli, and Rossano Venturini. "Forward Index Compression for Learned Sparse Retrieval." Proc. ECIR. 2026 (to appear).
SIGIR 2024
@inproceedings{bruch2024seismic,
author = {Bruch, Sebastian and Nardini, Franco Maria and Rulli, Cosimo and Venturini, Rossano},
title = {Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations},
booktitle = {Proceedings of the 47th International {ACM} {SIGIR} {C}onference on Research and Development in Information Retrieval ({SIGIR})},
pages = {152--162},
publisher = {{ACM}},
year = {2024},
url = {https://doi.org/10.1145/3626772.3657769},
doi = {10.1145/3626772.3657769}
}
CIKM 2024
@inproceedings{bruch2024pairing,
author = {Bruch, Sebastian and Nardini, Franco Maria and Rulli, Cosimo and Venturini, Rossano},
title = {Pairing Clustered Inverted Indexes with $\kappa$-NN Graphs for Fast Approximate Retrieval over Learned Sparse Representations},
booktitle = {Proceedings of the 33rd International {ACM} {C}onference on {I}nformation and {K}nowledge {M}anagement ({CIKM})},
pages = {3642--3646},
publisher = {{ACM}},
year = {2024},
url = {https://doi.org/10.1145/3627673.3679977},
doi = {10.1145/3627673.3679977}
}
ECIR 2025
@inproceedings{bruch2025investigating,
author = {Bruch, Sebastian and Nardini, Franco Maria and Rulli, Cosimo and Venturini, Rossano and Venuta, Leonardo},
title = {Investigating the Scalability of Approximate Sparse Retrieval Algorithms to Massive Datasets},
booktitle = {Advances in Information Retrieval},
pages = {437--445},
publisher = {Springer Nature Switzerland},
year = {2025},
url = {https://doi.org/10.1007/978-3-031-88714-7_43},
doi = {10.1007/978-3-031-88714-7_43}
}
ECIR 2026 (Accepted, to appear)
@article{bruch2026forward,
title={Forward Index Compression for Learned Sparse Retrieval},
author={Bruch, Sebastian and Fontana, Martino and Nardini, Franco Maria and Rulli, Cosimo and Venturini, Rossano},
journal={European Conference on Information Retrieval 2026 (to appear)},
year={2026}
}
Journal of the ACM (Under Review)
@article{bruch2025efficient,
title={Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets},
author={Bruch, Sebastian and Nardini, Franco Maria and Rulli, Cosimo and Venturini, Rossano},
journal={arXiv preprint arXiv:2509.24815},
year={2025}
}
The source code in this repository is subject to the following citation license:
By downloading and using this software, you agree to cite the papers listed in the Bibliography section above in any kind of material you produce where it was used to conduct a search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation license.

