Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published 8 days ago • 114
GEMS: Agent-Native Multimodal Generation with Memory and Skills Paper • 2603.28088 • Published 15 days ago • 85
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 20 days ago • 130 • 9
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 20 days ago • 130
XSkill: Continual Learning from Experience and Skills in Multimodal Agents Paper • 2603.12056 • Published Mar 12 • 33
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios Paper • 2602.23166 • Published Feb 26 • 45
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads Paper • 2602.09443 • Published Feb 10 • 59
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations Paper • 2602.05885 • Published Feb 5 • 28
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning Paper • 2602.04634 • Published Feb 4 • 99
LatentMem: Customizing Latent Memory for Multi-Agent Systems Paper • 2602.03036 • Published Feb 3 • 14
LatentMem: Customizing Latent Memory for Multi-Agent Systems Paper • 2602.03036 • Published Feb 3 • 14
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published Jan 29 • 102
AgentDoG Collection A Diagnostic Guardrail Framework for AI Agent Safety and Security • 10 items • Updated 5 days ago • 107