PyVision-RL: Forging Open Agentic Vision Models via RL Paper • 2602.20739 • Published 2 days ago • 26
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published 1 day ago • 76
SkillOrchestra: Learning to Route Agents via Skill Transfer Paper • 2602.19672 • Published 3 days ago • 47
CADEvolve: Creating Realistic CAD via Program Evolution Paper • 2602.16317 • Published 8 days ago • 26
AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines Paper • 2602.14296 • Published 11 days ago • 47
Discovering Multiagent Learning Algorithms with Large Language Models Paper • 2602.16928 • Published 8 days ago • 14
HLE-Verified: A Systematic Verification and Structured Revision of Humanity's Last Exam Paper • 2602.13964 • Published 11 days ago • 2
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 13 days ago • 52
ResearchGym: Evaluating Language Model Agents on Real-World AI Research Paper • 2602.15112 • Published 10 days ago • 20
BrowseComp-V^3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents Paper • 2602.12876 • Published 13 days ago • 8
REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents Paper • 2602.14234 • Published 11 days ago • 26
Multimodal Fact-Level Attribution for Verifiable Reasoning Paper • 2602.11509 • Published 14 days ago • 4
P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling Paper • 2602.12116 • Published 14 days ago • 4