MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence Paper • 2508.13992 • Published Aug 19, 2025 • 7
FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings Paper • 2604.18109 • Published 8 days ago
Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs Paper • 2506.08633 • Published Jun 10, 2025
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning Paper • 2510.12000 • Published Oct 13, 2025 • 1
Do Audio-Visual Large Language Models Really See and Hear? Paper • 2604.02605 • Published 25 days ago • 7
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music Paper • 2604.10905 • Published 15 days ago • 28
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music Paper • 2604.10905 • Published 15 days ago • 28
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music Paper • 2604.10905 • Published 15 days ago • 28
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos Paper • 2603.14145 • Published Mar 14 • 14
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos Paper • 2603.14145 • Published Mar 14 • 14
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos Paper • 2603.14145 • Published Mar 14 • 14
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos Paper • 2603.14145 • Published Mar 14 • 14
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos Paper • 2603.14145 • Published Mar 14 • 14
Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge Paper • 2505.07365 • Published May 12, 2025
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence Paper • 2508.13992 • Published Aug 19, 2025 • 7
Introducing SSBD+ Dataset with a Convolutional Pipeline for detecting Self-Stimulatory Behaviours in Children using raw videos Paper • 2311.15072 • Published Nov 25, 2023
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence Paper • 2508.13992 • Published Aug 19, 2025 • 7
Music Flamingo: Scaling Music Understanding in Audio Language Models Paper • 2511.10289 • Published Nov 13, 2025 • 19
Music Flamingo: Scaling Music Understanding in Audio Language Models Paper • 2511.10289 • Published Nov 13, 2025 • 19
Audio Flamingo Sound-CoT Technical Report: Improving Chain-of-Thought Reasoning in Sound Understanding Paper • 2508.11818 • Published Aug 15, 2025