arxiv:2603.25158

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Published on Mar 26 · Submitted by Jingwei Ni on Mar 30

Abstract

Trace2Skill enables scalable skill generation for LLM agents by analyzing diverse execution traces in parallel and consolidating them into transferable, declarative skills without parameter updates or external modules.

AI-generated summary

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or sequentially overfits to non-generalizable trajectory-local lessons. To overcome this, we introduce Trace2Skill, a framework that mirrors how human experts author skills: by holistically analyzing broad execution experience before distilling it into a single, comprehensive guide. Instead of reacting sequentially to individual trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to analyze a diverse pool of executions. It extracts trajectory-specific lessons and hierarchically consolidates them into a unified, conflict-free skill directory via inductive reasoning. Trace2Skill supports both deepening existing human-written skills and creating new ones from scratch. Experiments in challenging domains, such as spreadsheet, VisionQA and math reasoning, show that Trace2Skill significantly improves upon strong baselines, including Anthropic's official xlsx skills. Crucially, this trajectory-grounded evolution does not merely memorize task instances or model-specific quirks: evolved skills transfer across LLM scales and generalize to OOD settings. For example, skills evolved by Qwen3.5-35B on its own trajectories improved a Qwen3.5-122B agent by up to 57.65 absolute percentage points on WikiTableQuestions. Ultimately, our results demonstrate that complex agent experience can be packaged into highly transferable, declarative skills -- requiring no parameter updates, no external retrieval modules, and utilizing open-source models as small as 35B parameters.

Community

Nice one! Thx...

very interesting and insightful paper

Paper author · Paper submitter

🔥 Trace2Skill – Distilling Trajectory-Local Lessons into Transferable Agent Skills (arXiv:2603.25158)

The Problem:
Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex reasoning tasks, but manual authoring creates a severe scalability bottleneck. On the flip side, automated skill generation often produces fragile or fragmented results—either by relying on shallow parametric knowledge or by sequentially overfitting to local, non-generalizable trajectories.

The Solution: Trace2Skill
This paper introduces Trace2Skill, a framework that mimics how human experts author skills. Rather than sequentially reacting to individual agent trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to holistically analyze a diverse pool of execution experiences. It extracts trajectory-specific lessons and uses inductive reasoning to hierarchically consolidate them into a single, unified, conflict-free skill directory.
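To make the analyze-in-parallel, then merge-hierarchically idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names (`analyze_trajectory`, `merge_pair`, `consolidate`, `build_skill`), the trajectory format, and the `fan_in` parameter are all hypothetical, and the "sub-agent" is a plain function standing in for what would be an LLM call. The merge step only removes duplicate advice, which is a much weaker notion of "conflict-free" than the inductive consolidation the paper describes.

```python
from concurrent.futures import ThreadPoolExecutor

# A partial skill maps a topic to a list of declarative advice strings.
Skill = dict[str, list[str]]


def analyze_trajectory(trajectory: dict) -> Skill:
    """Hypothetical sub-agent: turn one trajectory into trajectory-local lessons.

    A real system would prompt an LLM here; this stub just reads the
    (topic, advice) pairs already attached to the trajectory.
    """
    skill: Skill = {}
    for topic, advice in trajectory["notes"]:
        skill.setdefault(topic, []).append(advice)
    return skill


def merge_pair(a: Skill, b: Skill) -> Skill:
    """Merge two partial skills, keeping each piece of advice once per topic."""
    merged = {topic: list(items) for topic, items in a.items()}
    for topic, items in b.items():
        bucket = merged.setdefault(topic, [])
        for item in items:
            if item not in bucket:
                bucket.append(item)
    return merged


def consolidate(partials: list[Skill], fan_in: int = 2) -> Skill:
    """Merge partial skills level by level, fan_in at a time, until one remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    while len(level) > 1:
        next_level = []
        for start in range(0, len(level), fan_in):
            group = level[start:start + fan_in]
            merged = group[0]
            for other in group[1:]:
                merged = merge_pair(merged, other)
            next_level.append(merged)
        level = next_level
    return level[0] if level else {}


def build_skill(trajectories: list[dict], workers: int = 4) -> Skill:
    """Analyze all trajectories in parallel, then consolidate the lessons."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(analyze_trajectory, trajectories))
    return consolidate(partials)


if __name__ == "__main__":
    # Made-up trajectories, for illustration only.
    demo = [
        {"notes": [("formulas", "Recalculate after editing cells.")]},
        {"notes": [("formulas", "Recalculate after editing cells."),
                   ("formatting", "Preserve existing number formats.")]},
        {"notes": [("formatting", "Preserve existing number formats.")]},
    ]
    print(build_skill(demo))
```

The point of the tree-shaped merge is that no single step has to look at every lesson at once, and the analysis stage does not depend on the order in which trajectories arrive, in contrast to a sequential update loop that revises the skill after each trajectory.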

🌟 Key Highlights:

  • Human-Like Skill Authoring: Builds broad prior knowledge through extensive, parallel trajectory analysis before drafting or deepening comprehensive skills.
  • Massive Performance Jumps: Significantly improves upon strong baselines across challenging domains (Spreadsheets, VisionQA, Math Reasoning), even beating Anthropic's official xlsx skills.
  • Cross-Model Transferability: Evolved skills generalize incredibly well across LLM scales and out-of-distribution settings! For example, declarative skills evolved purely by a 35B parameter model (Qwen3.5-35B) improved a massive 122B agent by up to 57.65 absolute percentage points on WikiTableQuestions.
  • Plug-and-Play: Achieves these results with no parameter updates and no external episodic retrieval modules needed at inference time.

🚧 Work in Progress & Future Work:
We note that this paper is currently a work in progress. Error-driven skill updates provide a reliable and safe learning signal, whereas success-derived patches are much more volatile: they can yield the highest performance gains, but they drop below the baselines unless they are filtered properly during the hierarchical merge. Future work will therefore focus on designing a more selective "success analyst" to filter and stabilize success-derived patches during skill distillation.

oh great, thanks for it!

is there any github repo? or any implementation guide?

Awesome work, really impressive

Get this paper in your agent:

hf papers read 2603.25158
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
