LawThinker: A Deep Research Legal Agent in Dynamic Environments Paper • 2602.12056 • Published about 18 hours ago • 27
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning Paper • 2602.12099 • Published about 18 hours ago • 27
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published about 17 hours ago • 34
AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions Paper • 2602.06008 • Published 8 days ago • 4
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published 2 days ago • 167
G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design Paper • 2602.08253 • Published 4 days ago • 25
Vision Transformer Finetuning Benefits from Non-Smooth Components Paper • 2602.06883 • Published 7 days ago • 4
ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution Paper • 2602.03075 • Published 10 days ago • 6
OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale Paper • 2602.05711 • Published 8 days ago • 9
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention Paper • 2602.05847 • Published 8 days ago • 12
QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals Paper • 2602.02581 • Published 13 days ago • 9
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning Paper • 2602.06960 • Published 7 days ago • 10
Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers Paper • 2602.06079 • Published 9 days ago • 18
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published 7 days ago • 30
Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making Paper • 2602.06570 • Published 7 days ago • 59
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare Paper • 2602.06717 • Published 7 days ago • 68