OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
Abstract
OpenResearcher presents a reproducible pipeline for training deep research agents using offline search environments and synthesized trajectories, achieving improved accuracy on benchmark tasks.
Training deep research agents requires long-horizon trajectories that interleave search, evidence aggregation, and multi-step reasoning. However, existing data collection pipelines typically rely on proprietary web APIs, making large-scale trajectory synthesis costly, unstable, and difficult to reproduce. We present OpenResearcher, a reproducible pipeline that decouples one-time corpus bootstrapping from multi-turn trajectory synthesis and executes the search-and-browse loop entirely offline using three explicit browser primitives (search, open, and find) over a 15M-document corpus. Using GPT-OSS-120B as the teacher model, we synthesize over 97K trajectories, including a substantial long-horizon tail with 100+ tool calls. Supervised fine-tuning of a 30B-A3B backbone on these trajectories achieves 54.8% accuracy on BrowseComp-Plus, a +34.0 point improvement over the base model, while remaining competitive on BrowseComp, GAIA, and xbench-DeepSearch. Because the environment is offline and fully instrumented, it also enables controlled analysis; our study reveals practical insights into deep research pipeline design, including data filtering strategies, agent configuration choices, and how retrieval success relates to final answer accuracy. We release the pipeline, synthesized trajectories, model checkpoints, and the offline search environment at https://github.com/TIGER-AI-Lab/OpenResearcher.
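To make the search-and-browse loop concrete, here is a minimal sketch of an offline environment exposing the three browser primitives named in the abstract: search, open, and find. The class, method signatures, and the tiny in-memory corpus are illustrative assumptions standing in for the released 15M-document environment, not the actual implementation.

```python
class OfflineBrowser:
    """Toy offline search environment with search/open/find primitives.

    Names and ranking logic are illustrative only; the released
    environment indexes a 15M-document FineWeb corpus.
    """

    def __init__(self, corpus):
        # corpus: {doc_id: document text}
        self.corpus = corpus
        self.current = None  # id of the currently opened document

    def search(self, query, k=3):
        """Rank documents by naive term overlap and return top-k ids."""
        terms = set(query.lower().split())
        scored = [
            (sum(t in doc.lower() for t in terms), doc_id)
            for doc_id, doc in self.corpus.items()
        ]
        scored.sort(reverse=True)
        return [doc_id for score, doc_id in scored[:k] if score > 0]

    def open(self, doc_id):
        """Load a document into the browsing context and return its text."""
        self.current = doc_id
        return self.corpus[doc_id]

    def find(self, pattern):
        """Locate a string inside the opened document, like Ctrl+F."""
        text = self.corpus[self.current]
        idx = text.lower().find(pattern.lower())
        if idx == -1:
            return None
        return text[max(0, idx - 40): idx + len(pattern) + 40]


# One search -> open -> find step of the agent loop.
browser = OfflineBrowser({
    "d1": "FineWeb is a large pretraining corpus of web documents.",
    "d2": "Deep research agents interleave search and reasoning.",
})
hits = browser.search("deep research agents")
page = browser.open(hits[0])
snippet = browser.find("reasoning")
```

Because every call is a deterministic lookup against a fixed corpus, each tool call in a trajectory can be logged and replayed exactly, which is what makes the controlled analyses in the paper possible.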
Community
Introducing OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis.
Already adopted by NVIDIA's Nemotron family of models; excited for what's next.
We explore how to synthesize long-horizon research trajectories for deep-research agents: fully offline, scalable, and low-cost, without relying on live web APIs.
Two Key Ideas
Offline Corpus
- 15M FineWeb documents
- 10K high-quality bootstrapping gold passages
Explicit Browsing Primitives
- search / open / find: enables multi-scale evidence discovery
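The three primitives above are what the policy model sees as its tool interface. As a hedged sketch, they might be declared in an OpenAI-style function-calling schema like the one below; the field names and descriptions are assumptions for illustration, not the released configuration.

```python
# Hypothetical tool definitions for the three browsing primitives.
# Schema shape follows the common function-calling convention; the
# actual OpenResearcher tool specs may differ.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Keyword search over the offline document corpus.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "open",
            "description": "Open a document returned by search and read its text.",
            "parameters": {
                "type": "object",
                "properties": {"doc_id": {"type": "string"}},
                "required": ["doc_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "find",
            "description": "Locate a string inside the currently opened document.",
            "parameters": {
                "type": "object",
                "properties": {"pattern": {"type": "string"}},
                "required": ["pattern"],
            },
        },
    },
]

tool_names = [t["function"]["name"] for t in TOOLS]
```

Keeping the primitives this explicit is what lets a single trajectory mix coarse discovery (search), full-page reading (open), and fine-grained evidence localization (find).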
Results
- Achieves 54.8% on BrowseComp-Plus, ranking #1 among open-source models on the leaderboard.
- Outperforms much larger models: GPT-4.1, Claude-Opus-4, Gemini-2.5-Pro, DeepSeek-R1, and Tongyi-DeepResearch.
- Strong performance across BrowseComp, GAIA, and xbench-DeepSearch.
Insights
Beyond benchmark accuracy, we provide one of the first systematic analyses of deep-research trajectory synthesis, including:
- RQ1: Is answer correctness alone sufficient as supervision?
- RQ2 & RQ3: How do corpus coverage and turn budget affect synthesis?
- RQ4: Why does search-only fail for deep research?
- RQ5: Why do failures occur even after retrieving relevant evidence?
The answers can be found in Section 4.5 of the paper (Ablation Study and Discussion).
Resources
- GitHub: https://github.com/TIGER-AI-Lab/OpenResearcher
- Models & Data: https://huggingface.co/collections/TIGER-Lab/openresearcher
- Demo: https://huggingface.co/spaces/OpenResearcher/OpenResearcher