LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model Paper • 2603.01068 • Published 6 days ago • 19
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16, 2025 • 68
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published 3 days ago • 79
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published 22 days ago • 20
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published 26 days ago • 50
CoPE-VideoLM: Codec Primitives For Efficient Video Language Models Paper • 2602.13191 • Published 21 days ago • 30
lmms-lab-encoder/wd_temporal_grounding_frames_max_64_max_448x448_pixels_with_fps Updated 20 days ago • 153
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning Paper • 2602.12099 • Published 22 days ago • 57
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder Paper • 2510.18795 • Published Oct 21, 2025 • 11