Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously Paper • 2603.12262 • Published 5 days ago • 26
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions Paper • 2602.08711 • Published Feb 9 • 28
HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing Paper • 2602.03560 • Published Feb 3 • 47
OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models Paper • 2602.04804 • Published Feb 4 • 48
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers Paper • 2601.17367 • Published Jan 24 • 34