papers
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper
• 2506.06395
• Published
• 133
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper
• 2506.05176
• Published
• 79
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper
• 2505.24726
• Published
• 277
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Paper
• 2506.03143
• Published
• 53
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper
• 2506.01939
• Published
• 188
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Paper
• 2506.01713
• Published
• 48
Large Language Models for Data Synthesis
Paper
• 2505.14752
• Published
• 49
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Paper
• 2505.23762
• Published
• 45
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper
• 2505.22617
• Published
• 131
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper
• 2505.22453
• Published
• 46
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
Paper
• 2505.21496
• Published
• 38
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper
• 2505.17667
• Published
• 88
Synthetic Data RL: Task Definition Is All You Need
Paper
• 2505.17063
• Published
• 11
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
Paper
• 2505.16410
• Published
• 58
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
Paper
• 2505.16421
• Published
• 19
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Paper
• 2505.15277
• Published
• 105
Efficient Agent Training for Computer Use
Paper
• 2505.13909
• Published
• 44
MMSearch-R1: Incentivizing LMMs to Search
Paper
• 2506.20670
• Published
• 64
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published
• 261
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper
• 2507.15846
• Published
• 133
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper
• 2507.09477
• Published
• 88
VeriGUI: Verifiable Long-Chain GUI Dataset
Paper
• 2508.04026
• Published
• 162
Phi-Ground Tech Report: Advancing Perception in GUI Grounding
Paper
• 2507.23779
• Published
• 45
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Paper
• 2508.04700
• Published
• 52
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
Paper
• 2507.21046
• Published
• 84
Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
Paper
• 2508.01858
• Published
• 20
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
Paper
• 2508.07407
• Published
• 98
OpenCUA: Open Foundations for Computer-Use Agents
Paper
• 2508.09123
• Published
• 32
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Paper
• 2509.02544
• Published
• 125
Why Language Models Hallucinate
Paper
• 2509.04664
• Published
• 196
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper
• 2509.08721
• Published
• 662
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published
• 509
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published
• 233