Description
Implementation of the SIGIR 2026 paper Learning to Retrieve from Agent Trajectories.
README
[SIGIR 2026] Learning to Retrieve from Agent Trajectories
Retrieval is no longer optimized only for human searchers. As large language model agents increasingly issue queries, inspect snippets, browse documents, and reason over retrieved evidence, the target of retrieval training has shifted from human interaction to agent interaction. LRAT studies this paradigm shift and learns retrievers directly from multi-step agent trajectories.
Training data for agent-native search should match how search agents actually search, browse, and consume evidence.
LRAT studies how to train retrievers from the intermediate behaviors of strong search agents rather than from their final answers alone. The repository focuses on a practical pipeline for:
collecting long-horizon search trajectories from agentic systems,
converting trajectories into retrieval supervision,
training retrievers on the resulting samples, and
evaluating both retrieval quality and end-to-end task success.
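Trajectory collection produces one structured record per query. The sketch below shows one plausible shape for such a record, plus a JSON round trip. The class and field names (Step, Trajectory, action, argument, doc_ids, final_answer) are hypothetical and do not reflect the schema that the repository's agent clients actually write; the files in trajectory/ show the real format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Step:
    # Hypothetical step record: "search" issues a query, "browse" opens a document.
    action: str                      # "search" or "browse"
    argument: str                    # search query text or document id
    doc_ids: List[str] = field(default_factory=list)  # ids returned by a search step
    reasoning: str = ""              # agent's thought preceding this action


@dataclass
class Trajectory:
    query: str                       # original user question
    steps: List[Step] = field(default_factory=list)
    final_answer: Optional[str] = None

    def to_json(self) -> str:
        # asdict recursively converts nested dataclasses, including the list of Step.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, text: str) -> "Trajectory":
        data = json.loads(text)
        steps = [Step(**s) for s in data.get("steps", [])]
        return cls(query=data["query"], steps=steps, final_answer=data.get("final_answer"))


if __name__ == "__main__":
    traj = Trajectory(
        query="Who founded the lab that released model X?",
        steps=[
            Step("search", "model X release lab", doc_ids=["d1", "d2", "d3"], reasoning="Find the lab."),
            Step("browse", "d2", reasoning="d2 looks like the announcement."),
        ],
        final_answer="Jane Doe",
    )
    restored = Trajectory.from_json(traj.to_json())
    print(restored == traj)  # True
```

Keeping search and browse as separate step types preserves the distinction between what the agent merely saw in a result list and what it chose to open, which is the signal a trajectory-based retriever can learn from.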
Trajectory-first retrieval learning: build retriever supervision from agent search and browse traces instead of relying only on static relevance labels.
Agent-friendly data collection: run local or API-based research agents and save each query as structured trajectory JSON.
Training data construction with an LLM judge: turn trajectories into (query, pos, neg, ...) training pairs with reasoning-aware annotations.
Benchmark-oriented evaluation: evaluate outputs on BrowseComp-Plus and InfoSeek-Eval with a local vLLM judge.
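For the end-to-end side of evaluation, the step after the judge runs is turning its free-text verdicts into an accuracy number. A minimal sketch follows, assuming the judge is prompted to emit a line such as `correct: yes` or `correct: no`; that output format, and the names parse_verdict and accuracy, are assumptions for illustration, and the vLLM generation call itself is omitted. The actual prompts and parsing in scripts_evaluation/ may differ.

```python
import re
from typing import Iterable, Optional

# Hypothetical verdict format: the judge writes a line like "correct: yes".
VERDICT_RE = re.compile(r"^\s*correct:\s*(yes|no)\b", re.IGNORECASE | re.MULTILINE)


def parse_verdict(judge_output: str) -> Optional[bool]:
    """Return True/False for a yes/no verdict, or None if no verdict line is found."""
    matches = VERDICT_RE.findall(judge_output)
    if not matches:
        return None
    # Use the last verdict line in case the judge wrote more than one.
    return matches[-1].lower() == "yes"


def accuracy(judge_outputs: Iterable[str]) -> float:
    """Fraction of items judged correct; unparseable verdicts count as wrong."""
    verdicts = [parse_verdict(o) for o in judge_outputs]
    if not verdicts:
        return 0.0
    return sum(v is True for v in verdicts) / len(verdicts)


if __name__ == "__main__":
    outputs = [
        "extracted_final_answer: Paris\ncorrect: yes",
        "reasoning: mismatch\ncorrect: no",
        "the judge forgot the verdict line",
    ]
    print(accuracy(outputs))  # 0.3333333333333333
```

Counting unparseable outputs as wrong keeps the metric conservative; reporting how many verdicts returned None is a cheap way to spot a judge prompt that the model is not following.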
Core utilities for index construction and trajectory-to-training-data conversion
search_agent/: Agent clients for Tongyi DeepResearch, WebExplorer, AgentCPM, OpenAI-compatible APIs, and related prompts/utilities
searcher/: Search backends and local retrieval interfaces
docs/: Step-by-step documentation for indexing, trajectory construction, training data construction, and evaluation
datasets/: Benchmark files used in evaluation
topics-qrels/: Query and qrel files for retrieval experiments
trajectory/: Example trajectory artifacts
FlagEmbedding/: Local copy of FlagEmbedding used for retriever training
tevatron/: Local copy of Tevatron utilities used in dense retrieval workflows
scripts_evaluation/: Evaluation scripts for end-to-end judging
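As a rough illustration of how index construction and the qrel files in topics-qrels/ fit together, the sketch below builds a brute-force inner-product index over precomputed embeddings and scores a run with Recall@k. It is not the repository's code: FlatIPIndex, load_qrels, and recall_at_k are hypothetical names, it assumes four-column TREC-style qrels and embeddings already produced by some encoder, and a large corpus would typically use an approximate-nearest-neighbor library rather than a full matrix product.

```python
from collections import defaultdict
from typing import Dict, List, Set

import numpy as np


def load_qrels(lines: List[str]) -> Dict[str, Set[str]]:
    """Parse TREC-style qrels lines '<qid> <iter> <docid> <relevance>' into relevant-doc sets."""
    qrels: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _, docid, rel = parts
        if int(rel) > 0:
            qrels[qid].add(docid)
    return dict(qrels)


class FlatIPIndex:
    """Brute-force inner-product index over L2-normalized embeddings (cosine similarity)."""

    def __init__(self, doc_ids: List[str], doc_emb: np.ndarray):
        self.doc_ids = doc_ids
        self.doc_emb = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)

    def search(self, query_emb: np.ndarray, k: int) -> List[List[str]]:
        q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
        scores = q @ self.doc_emb.T               # shape: (num_queries, num_docs)
        top = np.argsort(-scores, axis=1)[:, :k]  # indices of the k highest scores
        return [[self.doc_ids[j] for j in row] for row in top]


def recall_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int) -> float:
    """Mean fraction of relevant docs found in the top-k, over queries that have judgments."""
    per_query = []
    for qid, relevant in qrels.items():
        retrieved = set(run.get(qid, [])[:k])
        per_query.append(len(retrieved & relevant) / len(relevant))
    return sum(per_query) / len(per_query) if per_query else 0.0


if __name__ == "__main__":
    index = FlatIPIndex(["d1", "d2", "d3"], np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]))
    hits = index.search(np.array([[1.0, 0.1]]), k=2)
    qrels = load_qrels(["q1 0 d3 1", "q1 0 d2 0"])
    print(hits[0], recall_at_k({"q1": hits[0]}, qrels, k=2))  # ['d1', 'd3'] 1.0
```

Only documents with a positive relevance grade are kept, so a query whose judgments are all zero never enters the average and cannot cause a division by zero.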
Vendored Dependencies
FlagEmbedding/ is a vendored, locally modified copy of the upstream FlagEmbedding project. It carries this repository's own modifications layered on top of the upstream code and earlier external changes.
tevatron/ is a vendored upstream dependency used to support dense retrieval utilities and encoding workflows.
If you do not want to build training data from scratch, you can directly use the released LRAT-Train dataset. If you prefer to control filtering or supervision design yourself, you can also start from saved agent trajectories and rerun pair extraction with src/data_builder.py.
You can plug the JSONL generated by src/data_builder.py into your existing training setup without changing the repository's structure.
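If you want to experiment with your own pair-extraction rules before modifying src/data_builder.py, a minimal sketch of the idea is shown below. It emits records in the {"query": ..., "pos": [...], "neg": [...]} JSONL layout that FlagEmbedding's embedder fine-tuning reads. The trajectory dict layout (steps, action, argument, doc_ids), the function names, the optional judge callable, and the browsed-versus-unbrowsed heuristic are all assumptions for illustration; they are not necessarily what src/data_builder.py or the LRAT paper does.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, Optional


def trajectory_to_samples(
    trajectory: dict,
    corpus: Dict[str, str],
    judge: Optional[Callable[[str, str], bool]] = None,
    max_neg: int = 4,
) -> Iterator[dict]:
    """Yield FlagEmbedding-style samples {"query", "pos", "neg"} from one trajectory.

    Illustrative heuristic (an assumption): for each agent search step, returned
    documents that the agent browsed anywhere in the trajectory become positives,
    and returned documents it never browsed become negatives. An optional
    judge(query, doc_text) can veto positives it considers irrelevant.
    """
    steps = trajectory.get("steps", [])
    browsed = {s["argument"] for s in steps if s.get("action") == "browse"}
    for step in steps:
        if step.get("action") != "search":
            continue
        query = step["argument"]  # the agent-issued query, not the user question
        returned = [d for d in step.get("doc_ids", []) if d in corpus]
        pos = [corpus[d] for d in returned if d in browsed]
        if judge is not None:
            pos = [text for text in pos if judge(query, text)]
        neg = [corpus[d] for d in returned if d not in browsed][:max_neg]
        if pos and neg:  # skip steps that cannot form a contrastive sample
            yield {"query": query, "pos": pos, "neg": neg}


def write_jsonl(samples: Iterable[dict], path: str) -> None:
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    corpus = {"d1": "text one", "d2": "text two", "d3": "text three"}
    traj = {"steps": [
        {"action": "search", "argument": "agent query", "doc_ids": ["d1", "d2", "d3"]},
        {"action": "browse", "argument": "d2"},
    ]}
    print(list(trajectory_to_samples(traj, corpus)))
    # [{'query': 'agent query', 'pos': ['text two'], 'neg': ['text one', 'text three']}]
```

Using the agent's own search query rather than the original question as the training query is one way to match the "how agents actually search" framing above; swapping in the user question is a one-line change if you want to compare the two.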
This repository is released under the Apache License 2.0. See LICENSE.
Vendored components keep their own upstream licenses, especially:
FlagEmbedding/ under its upstream MIT license
tevatron/ under Apache License 2.0
Citation
If you find this repository useful, please cite our SIGIR 2026 paper below. The latest public version is available on arXiv.
@inproceedings{zhou2026lrat,
title={Learning to Retrieve from Agent Trajectories},
author={Zhou, Yuqi and Dai, Sunhao and Qu, Changle and Pang, Liang and Xu, Jun and Wen, Ji-Rong},
booktitle={Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
year={2026}
}
Release History
Version | Changes | Urgency | Date
main@2026-06-04 | Latest activity on main branch | High | 6/4/2026
0.0.0 | No release found; using repo HEAD | High | 4/8/2026
main@2026-04-08 | Latest activity on main branch | High | 4/8/2026
Similar Packages
OpenOutreach (main@2026-06-05): LinkedIn Automation Tool. Describe your product. Define your target market. The AI finds the leads for you.
adk-python (v2.2.0): An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
daily_stock_analysis (v3.20.0): LLM-driven intelligent analyzer for A-share, H-share, and US stocks: multi-source market data + real-time news + LLM decision dashboard + multi-channel push notifications, scheduled runs at zero cost, completely free. LLM-powered stock analysis system for A/H/US markets.
agentscope-java (v2.0.0-RC1): AgentScope Java: Agent-Oriented Programming for Building LLM Applications.
MaiBot (1.0.0-rc.4): MaiSaka, an LLM-based intelligent agent, is a digital lifeform devoted to understanding you and interacting in the style of a real human. She does not pursue perfection, nor does she seek efficiency;