fastRAG

Efficient Retrieval Augmentation and Generation Framework

THIS PROJECT IS ARCHIVED

Intel will not provide or guarantee development of or support for this project, including but not limited to, maintenance, bug fixes, new releases or updates.
Patches to this project are no longer accepted by Intel.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the community, please create your own fork of the project.

Build and explore efficient retrieval-augmented generative models and applications

fastRAG is a research framework for efficient, optimized retrieval-augmented generative pipelines that combines state-of-the-art LLMs with information retrieval. It is designed to give researchers and developers a comprehensive toolset for advancing retrieval-augmented generation.

Comments, suggestions, issues and pull requests are welcome! ❤️

Important

Now compatible with Haystack v2+. Please report any possible issues you find.

📣 Updates

2024-05: fastRAG V3 is Haystack 2.0 compatible 🔥
2023-12: Gaudi2 and ONNX runtime support; Optimized Embedding models; Multi-modality and Chat demos; REPLUG text generation.
2023-06: ColBERT index modification: adding/removing documents; see IndexUpdater.
2023-05: RAG with LLM and dynamic prompt synthesis example.
2023-04: Qdrant DocumentStore support.

Key Features

Optimized RAG: Build RAG pipelines from state-of-the-art components designed for compute efficiency.
Optimized for Intel Hardware: Leverage Intel extensions for PyTorch (IPEX), 🤗 Optimum Intel and 🤗 Optimum-Habana to run as efficiently as possible on Intel® Xeon® Processors and Intel® Gaudi® AI accelerators.
Customizable: fastRAG is built using Haystack and HuggingFace. All of fastRAG's components are 100% Haystack compatible.

🚀 Components

For a brief overview of the components unique to fastRAG, refer to the Components Overview page.

Component	Description

LLM Backends:
Intel Gaudi Accelerators	Running LLMs on Gaudi 2
ONNX Runtime	Running LLMs with optimized ONNX-runtime
OpenVINO	Running quantized LLMs using OpenVINO
Llama-CPP	Running RAG Pipelines with LLMs on a Llama CPP backend

Optimized Components:
Embedders	Optimized int8 bi-encoders
Rankers	Optimized/sparse cross-encoders

RAG-efficient Components:
ColBERT	Token-based late interaction
Fusion-in-Decoder (FiD)	Generative multi-document encoder-decoder
REPLUG	Improved multi-document decoder
PLAID	Incredibly efficient indexing engine
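
To make "token-based late interaction" more concrete, the following is a minimal pure-Python sketch of ColBERT-style MaxSim scoring: each query token embedding is matched against its most similar document token embedding, and those maxima are summed. It illustrates the general technique only and is not fastRAG's or ColBERT's actual code. The function names `maxsim_score` and `rank_documents` and the toy 2-dimensional embeddings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    if not query_tokens or not doc_tokens:
        return 0.0
    q = [_normalize(v) for v in query_tokens]
    d = [_normalize(v) for v in doc_tokens]
    total = 0.0
    for q_vec in q:
        # For this query token, keep only the highest similarity across all doc tokens.
        total += max(sum(a * b for a, b in zip(q_vec, d_vec)) for d_vec in d)
    return total


def rank_documents(
    query_tokens: Sequence[Vector], docs: Sequence[Sequence[Vector]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted from best to worst."""
    scored = [(i, maxsim_score(query_tokens, doc)) for i, doc in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy embeddings (hypothetical): two query tokens, two candidate documents.
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    doc_b = [[1.0, 0.0]]
    for index, score in rank_documents(query, [doc_a, doc_b]):
        print(index, round(score, 3))
```

Because document token embeddings do not depend on the query, they can be computed and indexed ahead of time; only the cheap max-and-sum step runs at query time.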

📍 Installation

Preliminary requirements:

Python 3.8 or higher.
PyTorch 2.0 or higher.
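
The two requirements above can be checked before installing with a short preflight script. This is a sketch using only the standard library. The helper names (`parse_major_minor`, `installed_version`, `check_requirements`) are hypothetical, and it assumes PyTorch is installed under the distribution name `torch`.

```python
import re
import sys
from importlib import metadata  # available in the standard library from Python 3.8
from typing import Dict, Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.1.0+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements() -> Dict[str, bool]:
    """Check the preliminary requirements: Python >= 3.8 and PyTorch >= 2.0."""
    # Assumption: PyTorch is installed under the distribution name "torch".
    torch_version = installed_version("torch")
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "torch>=2.0": torch_version is not None
        and parse_major_minor(torch_version) >= (2, 0),
    }


if __name__ == "__main__":
    for requirement, satisfied in check_requirements().items():
        print(f"{requirement}: {'ok' if satisfied else 'missing or too old'}")
```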

To set up the software, install it from pip, or clone the project to get the latest updates. Run the following, preferably in a newly created virtual environment:

pip install fastrag

Extra Packages

There are additional dependencies that you can install based on your specific usage of fastRAG:

# Additional engines/components
pip install "fastrag[intel]"             # Intel optimized backend [Optimum-intel, IPEX]
pip install "fastrag[openvino]"          # Intel optimized backend using OpenVINO
pip install "fastrag[elastic]"           # Support for ElasticSearch store
pip install "fastrag[qdrant]"            # Support for Qdrant store
pip install "fastrag[colbert]"           # Support for ColBERT+PLAID; requires Faiss
pip install "fastrag[faiss-cpu]"         # CPU-based Faiss library
pip install "fastrag[faiss-gpu]"         # GPU-based Faiss library
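
After installing extras, it can be useful to confirm which optional backends are importable in the current environment. The sketch below does this with the standard library only. The mapping from extras to top-level module names is an assumption based on the usual import names of those libraries, not something taken from fastRAG's packaging metadata, and the helper names are hypothetical.

```python
from importlib.util import find_spec
from typing import Dict, Mapping

# ASSUMPTION: each extra is represented here by the usual top-level import name
# of its main library. fastRAG's packaging metadata may pull in other packages.
EXTRA_IMPORTS: Dict[str, str] = {
    "intel": "intel_extension_for_pytorch",
    "openvino": "openvino",
    "elastic": "elasticsearch",
    "qdrant": "qdrant_client",
    "colbert": "colbert",
    "faiss-cpu / faiss-gpu": "faiss",
}


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def available_extras(mapping: Mapping[str, str] = EXTRA_IMPORTS) -> Dict[str, bool]:
    """Map each extra name to whether its representative module is importable."""
    return {extra: is_importable(module) for extra, module in mapping.items()}


if __name__ == "__main__":
    for extra, present in available_extras().items():
        print(f"{extra}: {'available' if present else 'not installed'}")
```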

To work with the latest version of fastRAG, clone the project and run the following command from the project root:

pip install .

Development tools

pip install ".[dev]"

License

The code is licensed under the Apache 2.0 License.

Disclaimer

This is not an official Intel product.

Release History

Version	Changes	Urgency	Date
v3.1.2	What's Changed: Updated IPEX embedder to work with new Haystack version (2.7) by @gadmarkovits in https://github.com/IntelLabs/fastRAG/pull/74; New Contributors: @gadmarkovits made their first contribution in https://github.com/IntelLabs/fastRAG/pull/74; Full Changelog: https://github.com/IntelLabs/fastRAG/compare/v3.1.1...v3.1.2	Low	11/25/2024
v3.1.1	What's Changed: Relax dependencies, add Streaming Callback by @dnoliver in https://github.com/IntelLabs/fastRAG/pull/71; OpenVINO Serialization fix by @danielfleischer in https://github.com/IntelLabs/fastRAG/pull/73; New Contributors: @dnoliver made their first contribution in https://github.com/IntelLabs/fastRAG/pull/71; Full Changelog: https://github.com/IntelLabs/fastRAG/compare/v3.1.0...v3.1.1	Low	11/24/2024
v3.1.0	What's Changed: Update llava.py by @mosheber in https://github.com/IntelLabs/fastRAG/pull/54; Remove indexing function by @mosheber in https://github.com/IntelLabs/fastRAG/pull/55; IPEX benchmarking fix by @peteriz in https://github.com/IntelLabs/fastRAG/pull/58; Removing Handlers with Phi3.5 Suppport by @mosheber in https://github.com/IntelLabs/fastRAG/pull/59; replaced list[str] with List[str] by @mosheber in https://github.com/IntelLabs/fastRAG/pull/67; Adding files for multi m	Low	11/7/2024
v3.0.2	What's Changed: Fix IPEX embedders performance by @peteriz in https://github.com/IntelLabs/fastRAG/pull/52; Fix support python versions by @peteriz in https://github.com/IntelLabs/fastRAG/pull/53; Full Changelog: https://github.com/IntelLabs/fastRAG/compare/v3.0.1...v3.0.2	Low	7/9/2024
v3.0.1	What's Changed: Gaudi Generator by @mosheber in https://github.com/IntelLabs/fastRAG/pull/50; Adding pypi packaging support by @peteriz in https://github.com/IntelLabs/fastRAG/pull/51; Full Changelog: https://github.com/IntelLabs/fastRAG/compare/v3.0...v3.0.1	Low	7/2/2024
v3.0	Compatibility with Haystack v2: ⚡ All our classes are now compatible with 🤖 Haystack v2, including the example notebooks and yaml pipeline configurations. 💻 We based our demos on the Chainlit (https://github.com/Chainlit/chainlit) UI library; examples include RAG chat with multi-modality! 🖼️ ❤️ Feel free to report any issue, bug or question!	Low	5/22/2024
v2.0	fastRAG 2.0: Let's do RAG Efficiently 🔥 fastRAG 2.0 includes new highly-anticipated efficiency-oriented components, an updated chat-like demo experience with multi-modality and improvements to existing components. The library now utilizes efficient Intel optimizations using Intel extensions for PyTorch (IPEX) (https://github.com/intel/intel-extension-for-pytorch), 🤗 Optimum Intel (https://github.com/huggingface/optimum-intel) and 🤗 Optimum-Habana (https://github.com/hugging	Low	12/24/2023
v1.3.0	What's Changed: ColBERT Upstream Updates by @danielfleischer in https://github.com/IntelLabs/fastRAG/pull/19; Full Changelog: https://github.com/IntelLabs/fastRAG/compare/v1.2.1...v1.3.0	Low	6/20/2023
v1.2.1	What's Changed: Update plaid_colbert_pipeline.ipynb by @mosheber in https://github.com/IntelLabs/fastRAG/pull/17; Update colbert.py by @mosheber in https://github.com/IntelLabs/fastRAG/pull/18; Full Changelog: https://github.com/IntelLabs/fastRAG/compare/v1.2.0...v1.2.1	Low	6/13/2023
v1.2.0	Release v1.2.0	Low	5/21/2023
