Exploration and Exploitation Errors Are Measurable for Language Model Agents Paper • 2604.13151 • Published 13 days ago • 24
VideoLLaMA2 Collection Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 13 items • Updated Sep 2, 2025 • 20
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models Paper • 2512.02231 • Published Dec 1, 2025 • 9
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 513
UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios Paper • 2505.21954 • Published May 28, 2025 • 1