Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation Paper • 2603.12247 • Published 1 day ago • 21
Video-Based Reward Modeling for Computer-Use Agents Paper • 2603.10178 • Published 3 days ago • 28
HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising Paper • 2603.08703 • Published 4 days ago • 29
RISE-Video: Can Video Generators Decode Implicit World Rules? Paper • 2602.05986 • Published Feb 5 • 26
Architecture Decoupling Is Not All You Need For Unified Multimodal Model Paper • 2511.22663 • Published Nov 27, 2025 • 29
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark Paper • 2510.13759 • Published Oct 15, 2025 • 11
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published Sep 23, 2025 • 23
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 84
Cut2Next: Generating Next Shot via In-Context Tuning Paper • 2508.08244 • Published Aug 11, 2025 • 13
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation Paper • 2508.03320 • Published Aug 5, 2025 • 63
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation Paper • 2508.03694 • Published Aug 5, 2025 • 52
AnyI2V: Animating Any Conditional Image with Motion Control Paper • 2507.02857 • Published Jul 3, 2025 • 13
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Paper • 2507.10548 • Published Jul 14, 2025 • 37
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published Jul 14, 2025 • 51
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30, 2025 • 90