Agentic RAG-R1: Enhance Agentic RAG Reasoning Capacity via Reinforcement Learning
Agentic RAG-R1 is an open-source initiative to build an Agentic Retrieval-Augmented Generation (RAG) system by endowing a base language model with autonomous search and reasoning skills through reinforcement learning (currently using the GRPO algorithm).
Agentic RAG combines two powerful concepts:
- Retrieval-Augmented Generation (RAG): Combines generative power with on-the-fly retrieval from external knowledge bases, ensuring factual and up-to-date answers.
- Agentic AI: Gives the model the ability to decide when to retrieve, what to retrieve, and how to weave the retrieved evidence into its reasoning.
Our architecture is inspired by TC-RAG and features an agent memory stack that orchestrates the full deliberation loop, supporting the following actions:
- Plan (❌)
- Reasoning (✅)
- Backtrack (✅)
- Summary (✅)
- Tool Observation: wiki/document/knowledge-graph search, etc. (✅)
- Conclusion (✅)
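As a rough illustration of the deliberation loop described above, the sketch below models the agent memory as a stack of typed steps. The `Action` enum mirrors the action list; `MemoryStack`, `Step`, and the way backtracking and summarizing modify the stack are assumptions made for illustration, not the project's actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    PLAN = "plan"
    REASONING = "reasoning"
    BACKTRACK = "backtrack"
    SUMMARY = "summary"
    TOOL_OBSERVATION = "tool_observation"
    CONCLUSION = "conclusion"


@dataclass
class Step:
    action: Action
    content: str


@dataclass
class MemoryStack:
    """Hypothetical agent memory: a stack of deliberation steps."""
    steps: list[Step] = field(default_factory=list)

    def push(self, action: Action, content: str) -> None:
        if action is Action.BACKTRACK:
            # Assumption: backtracking discards the most recent step.
            if self.steps:
                self.steps.pop()
            return
        if action is Action.SUMMARY:
            # Assumption: a summary replaces everything so far with one step.
            self.steps = [Step(action, content)]
            return
        self.steps.append(Step(action, content))

    @property
    def finished(self) -> bool:
        # The loop ends once a conclusion sits on top of the stack.
        return bool(self.steps) and self.steps[-1].action is Action.CONCLUSION


memory = MemoryStack()
memory.push(Action.REASONING, "Need the drug's mechanism of action.")
memory.push(Action.TOOL_OBSERVATION, "wiki: unrelated article")
memory.push(Action.BACKTRACK, "")  # drop the unhelpful observation
memory.push(Action.TOOL_OBSERVATION, "wiki: relevant article")
memory.push(Action.CONCLUSION, "Final answer")
print([s.action.value for s in memory.steps], memory.finished)
```

In this toy run the unhelpful observation is removed by the backtrack, so the stack ends with one reasoning step, one observation, and the conclusion.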
Motivated by DeepSeek-R1, we apply GRPO (Group Relative Policy Optimization) to reinforce the agent's choice of reasoning steps and retrieval actions, with the aim of improving both search depth and answer quality.
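The core idea of GRPO is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. Below is a minimal sketch of that group-relative advantage, assuming one scalar reward per rollout. The function name and the `eps` stabilizer are illustrative, the choice of population standard deviation is an assumption (implementations differ), and the clipped policy ratio and KL penalty of the full GRPO objective are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each rollout's reward by the mean and std of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one question, with made-up rewards.
advantages = group_relative_advantages([2.0, 0.5, 1.0, 0.5])
print([round(a, 3) for a in advantages])
```

Rollouts that beat their group's average get a positive advantage and are reinforced; those below average are discouraged. When every rollout in a group receives the same reward, all advantages are zero and that group contributes no learning signal.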
We use conda to manage the environment. Follow these steps to set up:
conda create -n AgenticRAG python=3.11 -y
conda activate AgenticRAG
pip install -r requirements.txt

We provide our search tool repository, ArtSearch, as the search engine; it supports retrieving information from Wikipedia. Follow the instructions in that repository to deploy a local instance of the search system.
.
├── ArtSearch             # Search tool integration
├── checkpoints           # Model checkpoints
├── examples              # Example use cases
├── experiments
│   ├── evaluation        # Evaluation scripts and results
│   └── training          # Training configurations
├── README.md
├── requirements.txt
├── script
│   ├── evaluation        # Evaluation scripts
│   ├── run_server.sh     # Server deployment script
│   └── training          # Training scripts
├── service
│   ├── chat_client.py    # Client for interacting with the model
│   └── chat_server.py    # Server for hosting the model
└── src
    ├── config            # Configuration files
    ├── data              # Data processing utilities
    ├── evaluation        # Evaluation metrics and tools
    ├── models            # Model definitions
    ├── train.py          # Main training script
    └── utils             # Utility functions
Follow the steps below to get up and running with Agentic RAG-R1.
Before you start, rename the file ".env_format" to ".env" and fill in the required environment variables.
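If it helps to see what the `.env` step amounts to, the sketch below parses simple `KEY=VALUE` lines and exports them into the process environment using only the standard library. The key name `EXAMPLE_API_KEY` is a placeholder; the real variables are the ones listed in `.env_format`, and the project may load them differently (for example through a dotenv library).

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    loaded = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded


# Demo with a throwaway file; EXAMPLE_API_KEY is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text('# comment\nEXAMPLE_API_KEY="abc123"\n', encoding="utf-8")
    env_values = load_env_file(str(env_path))
print(env_values)
```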
- ZeRO-2 Mode
./script/training/train_zero2.sh
- ZeRO-3 Mode
./script/training/train_zero3.sh
- Example Mode
Coming soon.
- Server Mode
Launch the chat server:
./script/run_server.sh
- LoRA Tuning Support: Fine-tune efficiently with Low-Rank Adaptation
- Model Quantization Support: Quantize models to NF4 and ..
- Custom Agent Tools: Integrate your own tools and personal RAG datasets
- Distributed Training: Supports DeepSpeed ZeRO Stage 2 and ZeRO Stage 3
- Efficient Resource Usage: Supports models of up to 32B parameters using only 2 A100 GPUs
- Tool Calling Reward: Enhanced reward model that includes:
  - Accuracy reward
  - Format reward
  - RAG accuracy reward using the RAGAS framework

  The total reward is calculated as:

  $$r_{total} = r_{accuracy} + r_{format} + r_{rag}$$
- TC-RAG Integration: Use TC-RAG as the rollout generator
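The total reward formula in the list above can be sketched as a plain sum of three components. Everything below is an assumption for illustration: the `<answer>` tag format, the exact-match accuracy check, and passing the RAG score in as a precomputed number (the project derives it with the RAGAS framework, whose API is not reproduced here).

```python
import re

# Hypothetical output format: the final answer is wrapped in <answer> tags.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(response: str) -> float:
    """1.0 if the response contains exactly one <answer>...</answer> block."""
    return 1.0 if len(ANSWER_PATTERN.findall(response)) == 1 else 0.0


def accuracy_reward(response: str, gold: str) -> float:
    """1.0 if the extracted answer matches the gold answer (case-insensitive)."""
    match = ANSWER_PATTERN.search(response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == gold.strip().lower() else 0.0


def total_reward(response: str, gold: str, rag_score: float) -> float:
    """r_total = r_accuracy + r_format + r_rag (unweighted sum)."""
    return accuracy_reward(response, gold) + format_reward(response) + rag_score


reward = total_reward("Reasoning... <answer>B</answer>", gold="b", rag_score=0.5)
print(reward)
```

Because the components are added without weights, a well-formatted but wrong answer still earns the format reward, which is one plausible reason format accuracy can improve faster than answer accuracy during training.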
We have made our training logs publicly available at: SwanLab Training Log
Our Qwen 2.5-7B-Instruct model was evaluated on the MedQA test set using Qwen-2.5-72B as the judge:
| Configuration | Format Accuracy | Answer Accuracy |
|---|---|---|
| Before fine-tuning | 39% | 84% |
| Before fine-tuning + search | 56% | 79% |
| After fine-tuning (200 steps) + search | 92% | 87% |
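To make the table easier to read, the snippet below computes the percentage-point change of each configuration relative to the un-tuned baseline, using only the numbers reported above.

```python
# (format accuracy %, answer accuracy %) copied from the table above.
results = {
    "Before fine-tuning": (39, 84),
    "Before fine-tuning + search": (56, 79),
    "After fine-tuning (200 steps) + search": (92, 87),
}

base_fmt, base_ans = results["Before fine-tuning"]
deltas = {
    name: (fmt - base_fmt, ans - base_ans)
    for name, (fmt, ans) in results.items()
}
for name, (d_fmt, d_ans) in deltas.items():
    print(f"{name}: format {d_fmt:+d} pts, answer {d_ans:+d} pts")
```

Adding search without fine-tuning raises format accuracy by 17 points but lowers answer accuracy by 5. After 200 training steps with search, both metrics exceed the baseline, by 53 and 3 points respectively.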
- Add more tools
- [Additional planned features]
The concept of Agentic-RAG-R1 is inspired by DeepSeek-R1 and TC-RAG. We sincerely appreciate these teams' contributions to open-source research and development. This work was developed concurrently with Search-R1 and ReSearch.
Supervisors: Junfeng Zhao, Xu Chu, Yasha Wang
Affiliation: Key Laboratory of High Confidence Software Technologies (Peking University), School of Computer Science, Peking University, China
If you use this work in your research, please cite:
@misc{Agentic_RAG_R1,
title = {Agentic RAG-R1: Enhance Agentic RAG Reasoning Capacity via Reinforcement Learning},
author = {Xinke Jiang and Jiaran Gao and Rihong Qiu and Zhixin Zhang and Wentao Zhang and Yue Fang and Hongxin Ding},
year = {2025},
howpublished= {\url{https://github.com/jiangxinke/Agentic-RAG-R1}},
note = {GitHub repository},
}

This project is licensed under the Apache License. See the LICENSE file for details.







