Agentic-RAG-R1

Agentic RAG R1 Framework via Reinforcement Learning

README

🤖 Agentic RAG-R1: Enhance Agentic RAG Reasoning Capacity via Reinforcement Learning 🚀

Introduction 🌟

Agentic RAG-R1 is an open-source initiative to build an Agentic Retrieval-Augmented Generation (RAG) system by endowing a base language model with autonomous search and reasoning skills through reinforcement learning (currently using the GRPO algorithm).

Chinese Language Version:

Chinese version results

English Language Version:

English version results

What is Agentic RAG? 💡

Agentic RAG combines two powerful concepts:

  • Retrieval-Augmented Generation (RAG): Combines generative power with on-the-fly retrieval from external knowledge bases, which helps keep answers factual and up to date.
  • Agentic AI: Gives the model the ability to decide when to retrieve, what to retrieve, and how to weave the retrieved evidence into its reasoning.
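The two ideas above can be sketched as a small control loop. This is a minimal illustration, not the project's implementation: `agentic_rag`, `AgentState`, the toy knowledge base, and the three callbacks are all hypothetical names. In the real system the "decide" role is played by the trained language model and retrieval by a search tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentState:
    question: str
    evidence: List[str] = field(default_factory=list)


def agentic_rag(
    question: str,
    decide: Callable[[AgentState], Optional[str]],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[AgentState], str],
    max_steps: int = 3,
) -> str:
    """The policy decides when to search and with which query, then answers."""
    state = AgentState(question)
    for _ in range(max_steps):
        query = decide(state)  # None means "enough evidence, answer now"
        if query is None:
            break
        state.evidence.extend(retrieve(query))
    return generate(state)


# Toy stand-ins (hypothetical) for the model and the search tool.
KB = {"capital of france": "Paris is the capital of France."}


def toy_decide(state: AgentState) -> Optional[str]:
    # Search only while no evidence has been collected yet.
    return None if state.evidence else state.question.lower().rstrip("?")


def toy_retrieve(query: str) -> List[str]:
    return [text for key, text in KB.items() if key in query]


def toy_generate(state: AgentState) -> str:
    return state.evidence[0] if state.evidence else "I don't know."


answer = agentic_rag(
    "What is the capital of France?", toy_decide, toy_retrieve, toy_generate
)
unknown = agentic_rag("Who wrote Hamlet?", toy_decide, toy_retrieve, toy_generate)
```

The `max_steps` bound is a design choice of this sketch: it keeps the loop finite even when retrieval keeps returning nothing.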

Agentic RAG concept

Architecture 🏗️

Our architecture is inspired by TC-RAG and features an agent memory stack that orchestrates the full deliberation loop, supporting the following actions:

  1. Plan (❌)
  2. Reasoning (✅)
  3. Backtrack (✅)
  4. Summary (✅)
  5. Tool Observation – wiki/document/knowledge-graph search, etc. (✅)
  6. Conclusion (✅)
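One way to picture a memory stack supporting the actions listed above is the sketch below. It is an assumption-laden illustration: the class `MemoryStack`, its method names, and the exact semantics of backtracking and summarizing are hypothetical and may differ from what TC-RAG or this repository actually does.

```python
from dataclasses import dataclass
from typing import List, Optional

# Step types that can be pushed; backtracking is modeled as a pop operation.
STEP_TYPES = {"reasoning", "summary", "tool_observation", "conclusion"}


@dataclass
class Step:
    action: str
    content: str


class MemoryStack:
    """Hypothetical agent memory: a stack of deliberation steps."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def push(self, action: str, content: str) -> None:
        if action not in STEP_TYPES:
            raise ValueError(f"unknown action: {action}")
        self.steps.append(Step(action, content))

    def backtrack(self) -> Optional[Step]:
        """Pop the most recent step, discarding a path that did not work out."""
        return self.steps.pop() if self.steps else None

    def summarize(self, n: int, summary: str) -> None:
        """Replace the top n steps with a single summary step."""
        n = min(n, len(self.steps))
        if n:
            del self.steps[-n:]
        self.push("summary", summary)

    @property
    def finished(self) -> bool:
        return bool(self.steps) and self.steps[-1].action == "conclusion"


stack = MemoryStack()
stack.push("reasoning", "Need the drug's mechanism of action.")
stack.push("tool_observation", "wiki: unrelated page")
stack.backtrack()  # drop the unhelpful observation
stack.push("tool_observation", "wiki: relevant page")
stack.summarize(2, "Mechanism confirmed from the wiki result.")
stack.push("conclusion", "Answer: B")
```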

Architecture diagram

Training Strategy 🧠

Motivated by DeepSeek-R1, we apply GRPO (Group Relative Policy Optimization) to reinforce the agent's choice of reasoning steps and retrieval actions, improving both search depth and answer quality.
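The core idea of GRPO is that several rollouts are sampled for the same prompt and each rollout's reward is compared against the others in its group, so no separate value network is needed. The sketch below shows only that normalization step. The function name and the use of the population standard deviation are assumptions of this example; implementations differ on such details, and the full objective (clipped policy ratio, KL penalty) is omitted.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one question, scored by a reward function.
advantages = group_relative_advantages([0.0, 1.0, 1.0, 2.0])

# If every rollout earns the same reward, no rollout is preferred.
flat = group_relative_advantages([1.0, 1.0])
```

Rollouts that score above the group mean receive positive advantages and are reinforced; those below the mean are discouraged.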

Training strategy diagram

Rollout Generation 🔄

Rollout generation diagram

Installation 🛠️

We use conda to manage the environment. Follow these steps to set up:

conda create -n AgenticRAG python=3.11 -y
conda activate AgenticRAG 
pip install -r requirements.txt

Tools Environment (Optional) 🧰

We provide our search tool repository ArtSearch as the search engine, which supports retrieval of information from Wikipedia. You can follow the instructions in that repository to deploy a local instance of the search system.

Folder Structure 📁

.
├── ArtSearch                 # Search tool integration
├── checkpoints               # Model checkpoints
├── examples                  # Example use cases
├── experiments
│   ├── evaluation            # Evaluation scripts and results
│   └── training              # Training configurations
├── README.md
├── requirements.txt
├── script
│   ├── evaluation            # Evaluation scripts
│   ├── run_server.sh         # Server deployment script
│   └── training              # Training scripts
├── service
│   ├── chat_client.py        # Client for interacting with the model
│   └── chat_server.py        # Server for hosting the model
├── src
│   ├── config                # Configuration files
│   ├── data                  # Data processing utilities
│   ├── evaluation            # Evaluation metrics and tools
│   ├── models                # Model definitions
│   ├── train.py              # Main training script
│   └── utils                 # Utility functions

Quick Start ⚡

Follow the steps below to get up and running with Agentic RAG-R1.

Before you start, rename the file ".env_format" to ".env" and fill in the required environment variables.
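A quick pre-flight check can catch variables that were left empty after the rename. The sketch below uses only the standard library. The key names `EXAMPLE_API_KEY` and `EXAMPLE_SEARCH_URL` are placeholders invented for this example; the real keys are whatever ".env_format" lists.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Demo against a throwaway file; the key names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        'EXAMPLE_API_KEY="abc123"\n# search endpoint\nEXAMPLE_SEARCH_URL=\n',
        encoding="utf-8",
    )
    parsed = parse_env_file(env_path)
    missing = missing_keys(parsed, ["EXAMPLE_API_KEY", "EXAMPLE_SEARCH_URL"])
```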

Training

  • ZeRO-2 Mode

./script/training/train_zero2.sh

  • ZeRO-3 Mode

./script/training/train_zero3.sh
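The two modes correspond to DeepSpeed ZeRO stages: stage 2 partitions optimizer states and gradients across GPUs, while stage 3 additionally partitions the model parameters. The sketch below builds an illustrative DeepSpeed-style config for each stage. The keys are standard DeepSpeed config fields, but the values and the helper `zero_config` are assumptions of this example; the configs actually used by the training scripts may differ.

```python
import json


def zero_config(stage: int) -> dict:
    """Build an illustrative DeepSpeed config for ZeRO stage 2 or 3."""
    if stage not in (2, 3):
        raise ValueError("this sketch only covers ZeRO stages 2 and 3")
    zero = {"stage": stage, "overlap_comm": True}
    if stage == 3:
        # Stage 3 shards parameters, so gather them when saving a checkpoint.
        zero["stage3_gather_16bit_weights_on_model_save"] = True
    return {
        "bf16": {"enabled": True},
        "zero_optimization": zero,
        "train_micro_batch_size_per_gpu": 1,  # illustrative value
        "gradient_accumulation_steps": 8,     # illustrative value
    }


zero2 = zero_config(2)
zero3 = zero_config(3)
zero3_json = json.dumps(zero3, indent=2)
```

Stage 3 generally lowers per-GPU memory at the cost of extra communication, which is why larger models tend to need it.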

Inference

  • Example Mode

Coming soon.

  • Server Mode

Launch the chat server:

./script/run_server.sh

Features ✨

  • LoRA Tuning Support 🔧: Fine-tune efficiently with Low-Rank Adaptation

  • Model Quantization Support 💻: Supports quantizing models to nf4 and ..

  • Custom Agent Tools 🛠️: Integrate your own tools and personal RAG datasets

  • Distributed Training 🌐: Support for DeepSpeed ZeRO Stage 2 and ZeRO Stage 3

  • Efficient Resource Usage 💻: Support for models up to 32B parameters using only 2 A100 GPUs

  • Tool Calling Reward 🎯: Enhanced reward model that includes:

    • Accuracy reward
    • Format reward
    • RAG accuracy reward using the RAGAS framework

    The total reward is calculated as:

    $$r_{total} = r_{accuracy} + r_{format} + r_{rag}$$

  • TCRAG Integration 🔗: Use TCRAG as the rollout generator
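The total reward in the Tool Calling Reward item is an unweighted sum of three terms. The sketch below shows one way such a sum could be assembled. The `<think>`/`<answer>` tag format, the exact-match accuracy check, and every function name are assumptions made for illustration. The RAG term is passed in as a precomputed score rather than computed with RAGAS.

```python
import re
from dataclasses import dataclass


@dataclass
class RewardBreakdown:
    accuracy: float
    format: float
    rag: float

    @property
    def total(self) -> float:
        # r_total = r_accuracy + r_format + r_rag
        return self.accuracy + self.format + self.rag


def format_reward(completion: str) -> float:
    """1.0 if the output follows the (hypothetical) think/answer tag layout."""
    pattern = r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*"
    return 1.0 if re.fullmatch(pattern, completion, flags=re.DOTALL) else 0.0


def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the tagged answer exactly matches the reference answer."""
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return 1.0 if match and match.group(1).strip() == gold.strip() else 0.0


def score(completion: str, gold: str, rag_score: float) -> RewardBreakdown:
    # rag_score stands in for a retrieval-quality score computed elsewhere.
    return RewardBreakdown(
        accuracy=accuracy_reward(completion, gold),
        format=format_reward(completion),
        rag=rag_score,
    )


good = score("<think>Rule out A and C.</think><answer>B</answer>", "B", 0.5)
bad = score("B", "B", 0.0)  # correct letter, but no tags to parse
```

Keeping the three components separate in `RewardBreakdown` makes it easy to log each term during training and see which one drives the total.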

Results 📊

Experiment Log on Qwen 2.5-7B-Instruct

Experiment log

We have made our training logs publicly available at: SwanLab Training Log

Results on MedQA Test Set 🏥

Our Qwen 2.5-7B-Instruct model was evaluated on the MedQA test set using Qwen-2.5-72B as the judge:

| Configuration | Format Accuracy | Answer Accuracy |
|---|---|---|
| Before fine-tuning | 39% | 84% |
| Before fine-tuning + search | 56% | 79% |
| After fine-tuning (200 steps) + search | 92% | 87% |

Roadmap 🗺️

  • Add more tools
  • [Additional planned features]

Acknowledgements 🙏

The concept of Agentic-RAG-R1 is inspired by DeepSeek-R1 and TC-RAG. We sincerely appreciate the efforts of these teams for their contributions to open-source research and development. This work was developed concurrently with Search-R1 and ReSearch.

Contributors

Supervisors: Junfeng Zhao, Xu Chu, Yasha Wang

Affiliation: Key Laboratory of High Confidence Software Technologies (Peking University), School of Computer Science, Peking University, China

Citation

If you use this work in your research, please cite:

@misc{Agentic_RAG_R1,
  title       = {Agentic RAG-R1: Enhance Agentic RAG Reasoning Capacity via Reinforcement Learning},
  author      = {Xinke Jiang and Jiaran Gao and Rihong Qiu and Zhixin Zhang and Wentao Zhang and Yue Fang and Hongxin Ding},
  year        = {2025},
  howpublished= {\url{https://github.com/jiangxinke/Agentic-RAG-R1}},
  note        = {GitHub repository},
}

License 📄

This project is licensed under the Apache License. See the LICENSE file for details.

Release History

| Version | Changes | Urgency | Date |
|---|---|---|---|
| 0.0.0 | No release found, using repo HEAD | Low | 2/16/2026 |
| dev@2026-02-16 | Latest activity on dev branch | Low | 2/16/2026 |
