
AutoRAG

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation



README

AutoRAG

RAG AutoML tool for automatically finding an optimal RAG pipeline for your data.



There are many RAG pipelines and modules out there, but you don't know which pipeline works best for "your own data" and "your own use case." Building and evaluating every RAG module is time-consuming and hard. But without doing it, you will never know which RAG pipeline is best for your own use case.

AutoRAG is a tool for finding the optimal RAG pipeline for "your data." You can automatically evaluate various RAG modules with your own evaluation data and find the best RAG pipeline for your own use case.

AutoRAG provides a simple way to evaluate many combinations of RAG modules. Try it now and find the best RAG pipeline for your own use case.

Explore our 📖 Documentation!


YouTube Tutorial


You can also watch it on YouTube.

Use AutoRAG in HuggingFace Space 🚀

Colab Tutorial


Quick Install

We recommend using Python version 3.10 or higher for AutoRAG.

pip install AutoRAG

If you want to use local models, install the GPU version.

pip install "AutoRAG[gpu]"

For parsing, install the version with the parse extra.

pip install "AutoRAG[gpu,parse]"

Data Creation


RAG optimization requires two types of data: a QA dataset and a corpus dataset.

  1. QA dataset file (qa.parquet)
  2. Corpus dataset file (corpus.parquet)
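To make the relationship between the two files concrete, the sketch below builds one corpus row and one QA row as plain Python dicts and checks that every `retrieval_gt` id points at an existing corpus `doc_id`. The column names (`qid`, `query`, `retrieval_gt`, `generation_gt`, `doc_id`, `contents`, `metadata`) and the list-of-lists shape of `retrieval_gt` follow AutoRAG's data-format docs as we understand them; treat them as assumptions and confirm against the docs before writing real parquet files. The helper `find_missing_gt_ids` is hypothetical.

```python
from typing import Any

# Hypothetical example rows; column names assumed from AutoRAG's data-format docs.
corpus_rows: list[dict[str, Any]] = [
    {"doc_id": "doc-1", "contents": "AutoRAG searches for the best RAG pipeline.", "metadata": {}},
    {"doc_id": "doc-2", "contents": "BM25 is a lexical retrieval method.", "metadata": {}},
]

qa_rows: list[dict[str, Any]] = [
    {
        "qid": "q-1",
        "query": "What does AutoRAG search for?",
        # retrieval_gt is assumed to be a list of lists of doc_ids.
        "retrieval_gt": [["doc-1"]],
        "generation_gt": ["The best RAG pipeline for your data."],
    },
]


def find_missing_gt_ids(qa: list[dict[str, Any]], corpus: list[dict[str, Any]]) -> list[str]:
    """Return every retrieval_gt id that does not exist in the corpus."""
    known_ids = {row["doc_id"] for row in corpus}
    missing = []
    for row in qa:
        for group in row["retrieval_gt"]:
            for doc_id in group:
                if doc_id not in known_ids:
                    missing.append(doc_id)
    return missing


print(find_missing_gt_ids(qa_rows, corpus_rows))  # []
```

A non-empty result would mean the QA file references chunks that the corpus file no longer contains, which is exactly what happens when the corpus is re-chunked without remapping.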

The QA dataset is essential for accurate and reliable evaluation and optimization.

The corpus dataset is critical to RAG performance, because RAG retrieves documents from the corpus and generates answers from them.
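As an illustration of how a retrieval step uses the corpus, here is a self-contained Okapi BM25 ranker, the idea behind the `bm25` lexical retrieval module used in the optimization config later on. This is not AutoRAG's implementation: the regex tokenizer and the `k1`/`b` defaults are our own assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simple lowercase word tokenizer; a stand-in for a real BM25 tokenizer.
    return re.findall(r"\w+", text.lower())


def bm25_top_k(query: str, docs: dict[str, str], k: int = 3,
               k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Rank corpus documents against a query with Okapi BM25 and return the top k."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


corpus = {
    "doc-1": "AutoRAG finds the best RAG pipeline for your data.",
    "doc-2": "BM25 ranks documents by term overlap with the query.",
    "doc-3": "Parquet is a columnar file format.",
}
print(bm25_top_k("How does BM25 rank documents?", corpus, k=1))
```

If the answer-bearing passage is missing from the corpus, no retrieval method can rank it first, which is why corpus quality bounds the whole pipeline.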

📌 Supported Data Creation Modules


Quick Start

1. Parsing

Set YAML File

modules:
  - module_type: langchain_parse
    parse_method: pdfminer

You can also use multiple parse modules at once. In that case, however, you will need to run a separate downstream process for each parsed result.

Start Parsing

You can parse your raw documents with just a few lines of code.

from autorag.parser import Parser

parser = Parser(data_path_glob="your/data/path/*")
parser.start_parsing("your/path/to/parse_config.yaml")

2. Chunking

Set YAML File

modules:
  - module_type: llama_index_chunk
    chunk_method: Token
    chunk_size: 1024
    chunk_overlap: 24
    add_file_name: en
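For intuition about `chunk_size: 1024` and `chunk_overlap: 24`, the sketch below splits a token list into windows of 1024 tokens that share 24 tokens with their neighbor, so each new window starts 1000 tokens after the previous one. It is a simplified stand-in: it uses pre-split string tokens rather than LlamaIndex's tokenizer, and it ignores the `add_file_name` prefix.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 1024, chunk_overlap: int = 24) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, so no tail-only duplicate is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = [f"t{i}" for i in range(2500)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [1024, 1024, 500]
```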

You can also use multiple chunk modules at once. In that case, create the QA dataset from one corpus and then map the remaining corpora onto that QA data. Because a different chunk method produces different chunks, the retrieval_gt values will differ too, so they must be remapped to the QA dataset.
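The remapping idea can be sketched with character spans: a ground-truth chunk from one chunking maps to every chunk of the other chunking that comes from the same file and overlaps its span. The `Chunk` type and `remap_retrieval_gt` helper are hypothetical and only illustrate the reasoning; they are not AutoRAG's own remapping API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    path: str   # source file the chunk came from
    start: int  # character offset where the chunk begins in that file
    end: int    # character offset where the chunk ends (exclusive)


def remap_retrieval_gt(gt_ids: list[str], old: list[Chunk], new: list[Chunk]) -> list[str]:
    """Map ground-truth chunk ids from one chunking to the overlapping chunks of another."""
    old_by_id = {c.doc_id: c for c in old}
    remapped = []
    for gt_id in gt_ids:
        gt = old_by_id[gt_id]
        for c in new:
            # Two chunks overlap when they come from the same file and their spans intersect.
            if c.path == gt.path and c.start < gt.end and gt.start < c.end:
                if c.doc_id not in remapped:
                    remapped.append(c.doc_id)
    return remapped


old_chunks = [Chunk("a0", "doc.pdf", 0, 1000), Chunk("a1", "doc.pdf", 1000, 2000)]
new_chunks = [Chunk("b0", "doc.pdf", 0, 600), Chunk("b1", "doc.pdf", 600, 1200),
              Chunk("b2", "doc.pdf", 1200, 2000)]
print(remap_retrieval_gt(["a0"], old_chunks, new_chunks))  # ['b0', 'b1']
```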

Start Chunking

You can chunk your parsed results with just a few lines of code.

from autorag.chunker import Chunker

chunker = Chunker.from_parquet(parsed_data_path="your/parsed/data/path")
chunker.start_chunking("your/path/to/chunk_config.yaml")

3. QA Creation

You can create a QA dataset with just a few lines of code.

import pandas as pd
from llama_index.llms.openai import OpenAI

from autorag.data.qa.filter.dontknow import dontknow_filter_rule_based
from autorag.data.qa.generation_gt.llama_index_gen_gt import (
	make_basic_gen_gt,
	make_concise_gen_gt,
)
from autorag.data.qa.schema import Raw, Corpus
from autorag.data.qa.query.llama_gen_query import factoid_query_gen
from autorag.data.qa.sample import random_single_hop

llm = OpenAI()
raw_df = pd.read_parquet("your/path/to/parsed.parquet")
raw_instance = Raw(raw_df)

corpus_df = pd.read_parquet("your/path/to/corpus.parquet")
corpus_instance = Corpus(corpus_df, raw_instance)

initial_qa = (
	corpus_instance.sample(random_single_hop, n=3)
	.map(
		lambda df: df.reset_index(drop=True),
	)
	.make_retrieval_gt_contents()
	.batch_apply(
		factoid_query_gen,  # query generation
		llm=llm,
	)
	.batch_apply(
		make_basic_gen_gt,  # answer generation (basic)
		llm=llm,
	)
	.batch_apply(
		make_concise_gen_gt,  # answer generation (concise)
		llm=llm,
	)
	.filter(
		dontknow_filter_rule_based,  # filter don't know
		lang="en",
	)
)

initial_qa.to_parquet('./qa.parquet', './corpus.parquet')
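The `.filter(dontknow_filter_rule_based, lang="en")` step drops generated QA pairs whose answers are effectively "I don't know." The sketch below shows what a rule-based filter of that kind does on plain dict rows; the regex list and the helper names are our own guesses for illustration, not the library's rules.

```python
import re

# Hypothetical phrase list; AutoRAG's dontknow_filter_rule_based uses its own rules.
DONT_KNOW_PATTERNS = [
    r"\bi don'?t know\b",
    r"\bi do not know\b",
    r"\bcannot be determined\b",
    r"\bnot (?:mentioned|provided) in the (?:passage|context)\b",
]


def is_dont_know(answer: str) -> bool:
    """Return True if the answer looks like a refusal or 'don't know' response."""
    text = answer.lower()
    return any(re.search(pattern, text) for pattern in DONT_KNOW_PATTERNS)


def filter_dont_know(rows: list[dict]) -> list[dict]:
    """Keep only QA rows whose generation ground truths are all real answers."""
    return [row for row in rows if not any(is_dont_know(a) for a in row["generation_gt"])]


rows = [
    {"query": "What does AutoRAG optimize?", "generation_gt": ["RAG pipelines."]},
    {"query": "Who wrote the passage?", "generation_gt": ["I don't know."]},
]
print([row["query"] for row in filter_dont_know(rows)])  # ['What does AutoRAG optimize?']
```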

RAG Optimization


How does AutoRAG optimize a RAG pipeline?

Here is the AutoRAG RAG structure, showing only the nodes.


Here is the image showing all the nodes and modules.

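As we understand the AutoRAG paper, optimization proceeds node by node: every candidate module in a node is evaluated with that node's metrics, the best one is kept, and its output becomes the input to the next node. The toy sketch below illustrates only that greedy strategy with plain functions; the module and metric functions are made up, and this is not AutoRAG's evaluator.

```python
from statistics import mean
from typing import Callable

# A "module" here is any function that maps a list of inputs to a list of outputs.
Module = Callable[[list[str]], list[str]]
Metric = Callable[[list[str]], list[float]]


def select_best_module(modules: dict[str, Module], inputs: list[str], score: Metric) -> tuple[str, list[str]]:
    """Run every candidate module of one node and keep the one with the best mean score."""
    results = {name: module(inputs) for name, module in modules.items()}
    best_name = max(results, key=lambda name: mean(score(results[name])))
    return best_name, results[best_name]


def optimize_pipeline(nodes: list[tuple[dict[str, Module], Metric]], queries: list[str]) -> list[str]:
    """Greedy node-by-node search: the winner of each node feeds the next node."""
    chosen, current = [], queries
    for modules, score in nodes:
        name, current = select_best_module(modules, current, score)
        chosen.append(name)
    return chosen


nodes = [
    ({"keep": lambda xs: xs, "shout": lambda xs: [x.upper() for x in xs]},
     lambda outs: [float(o.isupper()) for o in outs]),
    ({"short": lambda xs: [x[:3] for x in xs], "full": lambda xs: xs},
     lambda outs: [float(len(o)) for o in outs]),
]
print(optimize_pipeline(nodes, ["what is rag?"]))  # ['shout', 'full']
```

Greedy selection keeps the search linear in the number of modules per node instead of exponential in the number of node combinations, at the cost of possibly missing a better global combination.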

📌 Supported RAG Optimization Nodes & Modules

Metrics

The metrics used by each node in AutoRAG are shown below.


Here is the detailed information about the metrics that AutoRAG supports.
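For reference, here is a minimal sketch of the retrieval metrics named in the config below (recall, precision, F1, MRR, nDCG) for a single query with binary relevance. It simplifies by treating the ground truth as a flat set of ids; AutoRAG's own definitions, which work on list-of-lists `retrieval_gt`, may differ in detail.

```python
import math


def retrieval_metrics(retrieved: list[str], relevant: set[str]) -> dict[str, float]:
    """Compute simple retrieval metrics for one query with binary relevance."""
    hits = [doc_id in relevant for doc_id in retrieved]
    n_hits = sum(hits)
    recall = n_hits / len(relevant) if relevant else 0.0
    precision = n_hits / len(retrieved) if retrieved else 0.0
    f1 = 2 * precision * recall / (precision + recall) if n_hits else 0.0
    # MRR: reciprocal rank of the first relevant document (0 if none is found).
    mrr = next((1 / (rank + 1) for rank, hit in enumerate(hits) if hit), 0.0)
    # nDCG with binary gains: DCG of the ranking divided by DCG of an ideal ranking.
    dcg = sum(1 / math.log2(rank + 2) for rank, hit in enumerate(hits) if hit)
    ideal_hits = min(len(relevant), len(retrieved))
    idcg = sum(1 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "mrr": mrr, "ndcg": ndcg}


print(retrieval_metrics(["doc-3", "doc-1", "doc-7"], {"doc-1", "doc-2"}))
```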

Quick Start

1. Set YAML File

First, set up the config YAML file for your RAG optimization.

We highly recommend starting from one of the pre-made config YAML files.

Here is an example config YAML file that uses three retrieval nodes (lexical, semantic, and hybrid), a prompt_maker node, and a generator node.

node_lines:
  - node_line_name: retrieve_node_line
    nodes:
      - node_type: lexical_retrieval
        strategy:
          metrics: [ retrieval_f1, retrieval_recall, retrieval_ndcg, retrieval_mrr ]
        top_k: 3
        modules:
          - module_type: bm25
      - node_type: semantic_retrieval
        strategy:
          metrics: [ retrieval_f1, retrieval_recall, retrieval_ndcg, retrieval_mrr ]
        top_k: 3
        modules:
          - module_type: vectordb
            vectordb: default
      - node_type: hybrid_retrieval
        strategy:
          metrics: [ retrieval_f1, retrieval_recall, retrieval_ndcg, retrieval_mrr ]
        top_k: 3
        modules:
          - module_type: hybrid_rrf
            weight_range: (4,80)
  - node_line_name: post_retrieve_node_line
    nodes:
      - node_type: prompt_maker  # Set Prompt Maker Node
        strategy:
          metrics: # Set Generation Metrics
            - metric_name: meteor
            - metric_name: rouge
            - metric_name: sem_score
              embedding_model: openai
        modules:
          - module_type: fstring
            prompt: "Read the passages and answer the given question. \n Question: {query} \n Passage: {retrieved_contents} \n Answer : "
      - node_type: generator  # Set Generator Node
        strategy:
          metrics: # Set Generation Metrics
            - metric_name: meteor
            - metric_name: rouge
            - metric_name: sem_score
              embedding_model: openai
        modules:
          - module_type: openai_llm
            llm: gpt-4o-mini
            batch: 16
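The `hybrid_rrf` module fuses the lexical and semantic rankings with Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the input rankings. Our reading of the hybrid_rrf docs is that `weight_range: (4,80)` is the range searched for the RRF constant k; treat that as an assumption and check the module documentation. The function below is a plain RRF sketch, not AutoRAG's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists: each doc scores sum(1 / (k + rank)) with 1-based ranks."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


lexical = ["doc-2", "doc-1", "doc-5"]   # e.g. a bm25 ranking
semantic = ["doc-1", "doc-4", "doc-2"]  # e.g. a vectordb ranking
print(reciprocal_rank_fusion([lexical, semantic], k=4))
```

A small k lets top-ranked documents dominate the fused score; a large k flattens the differences between ranks.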

2. Run AutoRAG

You can evaluate your RAG pipeline with just a few lines of code.

from autorag.evaluator import Evaluator

evaluator = Evaluator(qa_data_path='your/path/to/qa.parquet', corpus_data_path='your/path/to/corpus.parquet')
evaluator.start_trial('your/path/to/config.yaml')

Or you can use the command-line interface:

autorag evaluate --config your/path/to/default_config.yaml --qa_data_path your/path/to/qa.parquet --corpus_data_path your/path/to/corpus.parquet

Once it is done, you will see several files and folders created in your current directory. In the trial folder, which is named with a number (like 0), you can check the summary.csv file, which summarizes the evaluation results and the best RAG pipeline for your data.
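If you want to inspect the result programmatically, a few lines of the standard `csv` module are enough. The column names below (`node_line_name`, `node_type`, `best_module_name`) are assumptions about summary.csv's header, and the rows are a made-up sample; check your own trial's file before relying on them.

```python
import csv
import io

# Hypothetical contents of a trial's summary.csv; check your own file's header.
SAMPLE = """node_line_name,node_type,best_module_name
retrieve_node_line,lexical_retrieval,bm25
retrieve_node_line,semantic_retrieval,vectordb
post_retrieve_node_line,generator,openai_llm
"""


def best_modules(csv_text: str) -> dict[str, str]:
    """Map each node type to the module selected for it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["node_type"]: row["best_module_name"] for row in reader}


print(best_modules(SAMPLE))
```

For a real trial, pass the text of `0/summary.csv` (for example via `pathlib.Path("0/summary.csv").read_text()`) instead of `SAMPLE`.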

For more details, you can check out what the folder structure looks like here.

3. Run Dashboard

You can run a dashboard to easily see the results.

autorag dashboard --trial_dir /your/path/to/trial_dir


4. Deploy your optimal RAG pipeline

4-1. Run as a Code

You can use the optimal RAG pipeline right away from the trial folder. The trial folder is the same directory you pass to the dashboard (like 0, 1, 2, ...).

from autorag.deploy import Runner

runner = Runner.from_trial_folder('/your/path/to/trial_dir')
runner.run('your question')

4-2. Run as an API server

You can run this pipeline as an API server.

Check out the API endpoints here.

import nest_asyncio
from autorag.deploy import ApiRunner

nest_asyncio.apply()

runner = ApiRunner.from_trial_folder('/your/path/to/trial_dir')
runner.run_api_server()

Or, from the command line:

autorag run_api --trial_dir your/path/to/trial_dir --host 0.0.0.0 --port 8000

The CLI command uses the extracted config YAML file. To learn more about it, check it out here.
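Once the server is up, any HTTP client can call it. The sketch below only builds the request with the standard library; the `/v1/run` path and the `{"query": ...}` body are assumptions about the API server, so check the API endpoint documentation before using them. The server binds to `0.0.0.0`, so a client on the same machine connects through `localhost`.

```python
import json
import urllib.request


def build_run_request(query: str, host: str = "localhost", port: int = 8000) -> urllib.request.Request:
    """Build a POST request for the pipeline's run endpoint (path assumed to be /v1/run)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/v1/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("What is AutoRAG?")
# With the server from `autorag run_api` running, send it with:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
print(request.full_url, request.get_method())
```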

4-3. Run as a Web Interface

You can run this pipeline as a web interface.

Check out the web interface here.

autorag run_web --trial_path your/path/to/trial_path


☎️ FAQ

💻 Hardware Specs

⭐ Running AutoRAG

🍯 Tips/Tricks

☎️ Troubleshooting

Thanks for the shoutout

Company

llama index

Individual


✨ Contributors ✨

Thanks go to these wonderful people:

Contribution

We are developing AutoRAG as an open-source project.

This project welcomes contributions and suggestions, so feel free to contribute.

Also, check out our detailed documentation here.

Citation

@misc{kim2024autoragautomatedframeworkoptimization,
      title={AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline},
      author={Dongkyu Kim and Byoungwook Kim and Donggeon Han and Matouš Eibich},
      year={2024},
      eprint={2410.20878},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.20878},
}

Release History

Version | Changes | Urgency | Date
v0.3.22 | What's Changed * [Feature Request] New NVIDA reranker module by @hypoxisaurea in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1199 * feat: add MiniMax LLM as a first-class generator module by @octo-patch in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1204 * fix: use restricted unpickler for BM25 corpus files to prevent arbitrary code execution (CWE-502) by @sebastiondev in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1205 * Upgrade LangChain to v1 and harden optional integrat | High | 4/3/2026
v0.3.21 | What's Changed * remove dynamic version and replace build backend to hatchling. add ty… by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1186 **Full Changelog**: https://github.com/Marker-Inc-Korea/AutoRAG/compare/v0.3.20...v0.3.21 | Low | 11/14/2025
v0.3.20 | What's Changed * Add the vllm embedding model by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1160 * Delete GUI-related codes for compact code base by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1176 * Refactor docs and pypi publish github actions by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1178 **Full Changelog**: https://github.com/Marker-Inc-Korea/AutoRAG/compare/v0.3.19...v0.3.20 | Low | 11/14/2025
v0.3.19 | What's Changed * Enable the chat prompt use using the openai_llm, llama_index_llm, and vllm_api by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1150 * Modify README.md with new badge and image by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1151 * fix the generator modules by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1153 * Fix few errors and dump version 0.3.19 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1155 | Low | 10/13/2025
v0.3.18 | What's Changed * Add chat_fstring module and support chat function in vllm. (+reasoning) by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1141 * Fix vLLM model_executor deletion error for newer vLLM versions by @Copilot in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1144 * Add comprehensive GitHub Copilot instructions for AutoRAG development by @Copilot in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1146 * Cohere version upgrade to v.3.18.0 by @vkehfdl1 in https: | Low | 9/20/2025
v0.3.17 | What's Changed * Huggingface model automatically don't use async mode while ingesting by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1126 * Add uv run sphinx-build in sphinx.yml by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1128 * change openai version to the latest and replace openai.resources.beta… by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1131 * Bump tj-actions/changed-files from 44 to 46 in /.github/workflows by @dependabot[b | Low | 8/31/2025
v0.3.16 | What's Changed * Fix broken link in README.md by @zenoengine in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1098 * Correct Japanese prompts to be more natural. by @sappho192 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1103 * prevent unicode decoder error. by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1108 * Add support for openai_like embedding model in semantic splitter llama_index module by @parssky in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1109 | Low | 6/22/2025
v0.3.14 | What's Changed * Make AutoRAG to Monorepo by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/960 * Change install method to yarn install by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1090 * Run the GUI next.js application using docker compose by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1092 * Add gpt-4.5-preview model in openai_llm.py. by @minsing-jin in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1095 * dump version 0.3.14 by @ | Low | 3/3/2025
v0.3.13 | What's Changed * 🚑 fix: Update container image tags for API services to use the latest… by @hongsw in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1048 * remove embedding_model from kwargs for passage filter module by @rjwharry in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1043 * Add score and other metadatas at /v1/retrieve endpoint by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1055 * update sphinx github actions by @vkehfdl1 in https://github.com/Marker-Inc | Low | 1/25/2025
v0.3.12 | What's Changed * At cli, the default api remote setting is False now by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1028 * [HotFix] AutoRAG api error fix by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1032 * docs: update Milvus configuration examples by @e7217 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1030 * Add instruction about removal of file name related file name by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/1038 * cha | Low | 12/9/2024
v0.3.11 | What's Changed * docs[fix]: modify contents on upstage parser by @e7217 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/967 * Resolve Pydantic 2.10.0 conflict issue with latest LlamaIndex by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/973 * Add Qdrant vectorDB by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/976 * Replace to local embeddings at the gpu sample config YAML files by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/988 * Add | Low | 11/29/2024
v0.3.10 | What's Changed * Add Integration part at docs by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/939 * Update README.md by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/941 * Add Weaviate VectorDB by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/949 * add documentation for evaluate your custom rag by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/953 * Add /v1/retrieve endpoint at API server by @vkehfdl1 in https://github.com/Mark | Low | 11/20/2024
v0.3.9 | What's Changed * Edit documentation about data schema and descriptions by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/905 * autorag —version by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/913 * [Hotfix] fix hf space url at README.md by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/917 * ✨ feat: improve sample size handling in Validator class by @hongsw in https://github.com/Marker-Inc-Korea/AutoRAG/pull/912 * Fix error that missing init | Low | 11/11/2024
v0.3.8 | What's Changed * Feature/docker deploy push by @hongsw in https://github.com/Marker-Inc-Korea/AutoRAG/pull/887 * Edit stream API endpoint and add instructions deploying kotaemon to fly.io by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/891 * Delete trial path logic at parse & chunk + add detail docs & tutorial at docs by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/894 * Feature/#892 by @rjwharry in https://github.com/Marker-Inc-Korea/AutoRAG/pull/895 * | Low | 10/30/2024
v0.3.7 | What's Changed * fix the error and release 0.3.5-rc1 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/842 * Add Huggingface Space at README.md by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/847 * Add new Sample YAML file by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/848 * Fix README.md by @Jake-Song in https://github.com/Marker-Inc-Korea/AutoRAG/pull/850 * Add AWS Bedrock llm and upgrade VERSION 0.3.6 by @bwook00 in https://github.com/Ma | Low | 10/24/2024
v0.3.5 | What's Changed * Run validation at the start_trial by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/826 * AutoRAG api version & api docker container + gpu version docker container by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/823 * Add FlashRank Reranker module by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/818 * set the fixed port number of the panel dashboard by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/827 * chan | Low | 10/13/2024
v0.3.4 | What's Changed * Add OpenVINO Reranker module by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/808 * Properly truncate to 8000 tokens when we use OpenAI Embeddings by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/812 * Refactor API server with streaming and passage return by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/810 * ✨ feat: Added Docker push workflow, Dockerfile updates, and build script by @hongsw in https://github.com/Marker-Inc- | Low | 10/9/2024
v0.3.3 | What's Changed * [Parse Bug] Fix only parse the first page of the whole pdf files by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/783 * [Parse Bug] Add non-table exists page to use clova.py by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/784 * Prevent error that httpx uses different event loop at method chaining on the QA by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/785 * add deepeval metrics by @Eastsidegunn in https://github.com/Marke | Low | 10/5/2024
v0.3.2 | What's Changed * [Hotfix] Fix parse path at support.py by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/778 **Full Changelog**: https://github.com/Marker-Inc-Korea/AutoRAG/compare/v0.3.1...v0.3.2 | Low | 10/3/2024
v0.3.1 | What's Changed * Add toctree by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/745 * Fix minor errors at the documentations by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/747 * add effective_order at bleu as True by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/748 * add passage dependency filter at data creation by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/751 * Add Passage Dependency at README.md by @bwook00 in https:/ | Low | 10/2/2024
v0.3.0 | What's Changed * Refactoring to v3.0 for efficient deployment by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/727 * resolve vllm error by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/735 * Change data creation package names to v0.3 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/740 * Add more yaml file by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/743 * Update README for v 0.3.0 by @bwook00 in https://github.com/Marker | Low | 9/25/2024
v0.2.18 | What's Changed * change add_file_name language notation by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/717 * Ingest bm25_tokenizer and embedidng only in the strategy of other modules by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/716 * OpenAI o1 model compatibility by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/719 * Compatible with Langchain version 0.3.0 by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/724 * Release/v | Low | 9/19/2024
v0.2.17 | What's Changed * Add update corpus feature for chunking optimization by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/706 * Add func annotation about parse module by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/708 * Add baseline beta docs by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/710 * Finish new data creation documentation by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/711 * Finish Chunk and Parse documentation by | Low | 9/16/2024
v0.2.16 | What's Changed * Replace FastAPI with Flask by @rjwharry in https://github.com/Marker-Inc-Korea/AutoRAG/pull/657 * Mock all OpenAI Embeddings at the test code for outside contributors by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/659 * Add basic dataset schema for new 'beta' version of data creation by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/663 * Add AutoParse baseline and module 'langchain_parse' and 'clova' by @bwook00 in https://github.com/Mark | Low | 9/13/2024
v0.2.15 | What's Changed * Update ragas.md to fix typo by @cd80 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/628 * Add optional parameter 'exist_gen_gt' at make_qa_with_existing_queries function. by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/620 * Refactor contributing guide to real informative one by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/625 * Add ruff Linter and reformat all codes by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/622 | Low | 8/28/2024
v0.2.14 | What's Changed * Refactor code for removing few warnings by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/605 * Resolve asyncio error at FastAPI server execution by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/611 * update cohere version to the latest (which is okay) by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/612 * Update issue templates by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/617 * Add New feature: existing_ | Low | 8/19/2024
v0.2.13 | What's Changed * Add few description for better understanding of AutoRAG by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/596 * Add ko_okt and ko_kkma bm25 tokenizer by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/597 * Add autorag validate for validating system setup easily by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/599 * update modules pictures at README.md and delete tutorial step 2 by @bwook00 in https://github.com/Marker-Inc-Kore | Low | 8/6/2024
v0.2.12 | What's Changed * refactor bleu type hint for compatibility below python 3.10 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/584 * empty cache when bert score calculation finished by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/590 * Downgrade cohere version by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/593 * dump version 0.2.12 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/595 **Full Changelog**: https://github. | Low | 8/3/2024
v0.2.11 | What's Changed * add support for gpt-4o-mini by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/581 * Impute relevance score when there is no score at the hybrid fusion. by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/580 * dump version v0.2.11 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/582 **Full Changelog**: https://github.com/Marker-Inc-Korea/AutoRAG/compare/v0.2.10...v0.2.11 | Low | 7/23/2024
v0.2.10 | What's Changed * add example yaml code at G-eval docs to use specific G-eval metrics by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/572 * Refactoring hybrid retrieval with more advanced optimizations by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/574 * dump version v0.2.10 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/575 **Full Changelog**: https://github.com/Marker-Inc-Korea/AutoRAG/compare/V0.2.9...v0.2.10 | Low | 7/15/2024
V0.2.9 | What's Changed * Feature/#550 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/551 * make sacrebleu parameter available in upper stream by @Eastsidegunn in https://github.com/Marker-Inc-Korea/AutoRAG/pull/549 * update docs 'batch' parameter by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/552 * delete sacrebleu extra dependency for Korean and Japanese by @Eastsidegunn in https://github.com/Marker-Inc-Korea/AutoRAG/pull/554 * Add window_replacement module a | Low | 7/9/2024
v0.2.8 | What's Changed * add emoji at README by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/534 * Fix typo at query decompose and edit documentation of it by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/537 * delete whitespace and empty string at expanded queries by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/541 * Resolve error when using strategy at query expansion by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/543 * Add dec | Low | 6/30/2024
v0.2.7 | What's Changed * change the embedding model type list to the latest version by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/523 * Delete python 3.8 tag and add 3.11, 3.12 tag at pyproject.toml by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/525 * Fix vllm to mockllm at add more llm models docs by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/524 * Openai token limit feature at passage augmenter and passage filter. by @vkehfdl1 in https://git | Low | 6/25/2024
v0.2.6 | What's Changed * dump version v0.2.6 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/509 * window read_parquet error by @Eastsidegunn in https://github.com/Marker-Inc-Korea/AutoRAG/pull/513 * Add meta tag at each docs by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/514 * change package name to 'evaluation' by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/516 New Contributors * @gnekt made their first contribution in https://github. | Low | 6/23/2024
v0.2.5 | What's Changed * add reset_index at cast_qa_dataset and cast_corpus_dataset by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/475 * Adjust embedding batch size at YAML file by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/481 * Directly watch each module result at dashboard by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/480 * release 0.2.4 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/482 * Resolve error that tokenized to | Low | 6/12/2024
v0.2.3 | What's Changed * add funding.yml by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/471 * HotFix at vllm.py by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/473 **Full Changelog**: https://github.com/Marker-Inc-Korea/AutoRAG/compare/v0.2.2...v0.2.3 | Low | 6/3/2024
v0.2.2 | What's Changed * reset index at split dataframe by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/469 **Full Changelog**: https://github.com/Marker-Inc-Korea/AutoRAG/compare/v0.2.1...v0.2.2 | Low | 5/30/2024
v0.2.1 | What's Changed * Implement longllmlingua compressor module by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/459 * Use generator module at hyde and decompose instead of LLMPredictorType from LlamaIndex by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/464 * Add normalize mean strategy by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/465 * implement multi_query_expansion module by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/461 * | Low | 5/29/2024
v0.2.0 | What's Changed * Add MockLLM to resolve restart evaluate bug by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/453 * Add 'rank' strategy as selecting best module strategy by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/451 * Delete ragas context precision & Enable dict input of retrieval metrics by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/455 * Can't use passage compressor when there is no retrieval gt values in QA datset by @vkehfdl1 in | Low | 5/21/2024
v0.1.12 | What's Changed * [Hotfix] delete assert top_k at passage_augmenter/base by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/435 * [Hotfix] add **kwargs at pass_passage_augmenter by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/437 * Add gpt-4o support at openai_llm by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/441 * [Hotfix] Respond when the input 'scores_list' is `np.ndarray` at threshold_cutoff_pure by @bwook00 in https://github.com/Marker-I | Low | 5/17/2024
v0.1.11 | What's Changed * Fix error at UPR and Colbert Reranker on GPU by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/424 * [HotFix] empty cache and return only float value at vllm.py by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/426 * Refactor prompt maker run.py for preventing long processing time and OOM by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/427 * Add Ragas context precision metric by @vkehfdl1 in https://github.com/Marker-Inc-Kore | Low | 5/15/2024
v0.1.10 | What's Changed * Add various option for tokenize BM25 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/409 * Mock cohere and JinaAI test codes by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/420 * Add Metrics documentation by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/419 * dump version 0.1.10 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/423 **Full Changelog**: https://github.com/Marker-Inc-Korea/AutoRAG/compare/v | Low | 5/10/2024
v0.1.9 | What's Changed * Add retrieval_ndcg_metric by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/386 * Add mrr retrieval metric by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/390 * Add MAP (Mean Average Precision) retrieval metric by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/401 * Make new openai_llm generator module by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/404 * Parallel Processing of Ko Reranker by @bwook00 in https: | Low | 5/1/2024
v0.1.8 | What's Changed * add library setuptools-scm and build by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/387 * [Hotfix] Add pass augmenter at support by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/389 * Show logging at module level by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/395 * Resolve minor error at colbert and long_context_reorder by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/397 * Resolve summary.csv error at re | Low | 4/29/2024
v0.1.7 | What's Changed * Parallel processing at UPR reranker by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/334 * dump version 0.1.7 by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/385 **Full Changelog**: https://github.com/Marker-Inc-Korea/AutoRAG/compare/v0.1.6...v0.1.7 | Low | 4/28/2024
v0.1.6 | What's Changed * test for large embeddings and version control for latest chromaDB by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/375 * Fix retrieval and retrieval token metrics to use unaswerable by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/371 * add pypi deploy CD by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/374 * Parallel Processing of Colbert Reranker by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/378 * Mino | Low | 4/27/2024
v0.1.5 | What's Changed * Create 'Passage Augmenter' node and its first module 'prev_next_augmenter' by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/347 * [Hotfix] resolve Tart reranker error by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/363 * [Hotfix] resolve colbert, flag embedding reranker batch error by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/366 * Add pass passage augmenter module by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRA | Low | 4/24/2024
v0.1.4 | What's Changed * Add Recency filter module by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/312 * Resolve error when put larger than corpus_df at 'make_single_content_qa' content_size by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/324 * Parallel processing at sentence transformer reranker by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/325 * Add refine module by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/323 * fix error | Low | 4/22/2024
v0.1.3 | What's Changed * Add Hybrid RSF(relative score fusion) retrieval by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/287 * Add Passage Filter node and its first module 'similarity_threshold_cutoff' by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/290 * Add Hybrid DBSF(Distribution-based Score Fusion) retrieval by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/289 * Move README supporting nodes & modules to Notion Page by @bwook00 in https://gith | Low | 4/14/2024
v0.1.2 | What's Changed * Add Jina Reranker by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/269 * Add Setence Transformer Reranker by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/274 * Add Colbert Reranker by @vkehfdl1 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/271 * Add Flag Embedding Reranker by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/277 * Add long context reorder by @bwook00 in https://github.com/Marker-Inc-Korea/AutoRAG/pull/281 | Low | 4/10/2024


Similar Packages

arthur-engine: Make AI work for Everyone - Monitoring and governing for your AI/ML (2.1.601)
opik: Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. (2.0.56)
arag: A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces. State-of-the-art RAG framework with keyword, semantic, and chunk read tools for multi-hop QA. (v0.1.0)
aitoolman: Controllable and Transparent LLM Application Framework (main@2026-06-06)
opentulpa: Self-hosted personal AI agent that lives in your DMs. Describe any workflow: triage Gmail, pull a Giphy feed, build a Slack bot, monitor markets. It writes the code, runs it, schedules it, and saves i (main@2026-06-05)

More in Frameworks

ctranslate2: Fast inference engine for Transformer models
schemathesis: Property-based testing framework for Open API and GraphQL based apps
spec_driven_develop: Spec-Driven Develop is a platform-agnostic AI agent skill that automates the pre-development workflow for large-scale complex tasks. It is not a framework, not a runtime, not a package manager; it is
Drasil: Generate all the things (focusing on research software)