MIND: Benchmarking Memory Consistency and Action Control in World Models Paper • 2602.08025 • Published 5 days ago • 9 • 2
CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion Paper • 2602.10999 • Published 1 day ago • 10 • 1
ASA: Training-Free Representation Engineering for Tool-Calling Agents Paper • 2602.04935 • Published 9 days ago • 39 • 2
Benchmarking Large Language Models for Knowledge Graph Validation Paper • 2602.10748 • Published 1 day ago • 5 • 2
StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors Paper • 2602.08934 • Published 3 days ago • 2
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development Paper • 2602.10975 • Published 1 day ago • 17 • 2
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models Paper • 2602.10224 • Published 2 days ago • 15 • 2
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published 2 days ago • 158 • 5
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions Paper • 2602.08711 • Published 4 days ago • 24 • 2
VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval Paper • 2602.08099 • Published 4 days ago • 9 • 2
AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions Paper • 2602.06008 • Published 7 days ago • 4 • 2
GENIUS: Generative Fluid Intelligence Evaluation Suite Paper • 2602.11144 • Published 1 day ago • 44 • 2
Spend Search Where It Pays: Value-Guided Structured Sampling and Optimization for Generative Recommendation Paper • 2602.10699 • Published 2 days ago • 1 • 2
LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation Paper • 2602.10367 • Published 2 days ago • 11 • 2
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models Paper • 2602.09713 • Published 3 days ago • 8 • 3
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning Paper • 2602.10560 • Published 2 days ago • 24 • 1
TIC-VLA: A Think-in-Control Vision-Language-Action Model for Robot Navigation in Dynamic Environments Paper • 2602.02459 • Published 10 days ago • 2 • 2
UMEM: Unified Memory Extraction and Management Framework for Generalizable Memory Paper • 2602.10652 • Published 2 days ago • 2 • 2
GoodVibe: Security-by-Vibe for LLM-Based Code Generation Paper • 2602.10778 • Published 1 day ago • 2 • 3
ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression Paper • 2602.11008 • Published 1 day ago • 15 • 3