Strips as Tokens: Artist Mesh Generation with Native UV Segmentation Paper • 2604.09132 • Published 5 days ago • 43
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 5 days ago • 41
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published 9 days ago • 106
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models Paper • 2604.04707 • Published 9 days ago • 200
DVD: Deterministic Video Depth Estimation with Generative Priors Paper • 2603.12250 • Published Mar 12 • 26
Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization Paper • 2601.12993 • Published Jan 19 • 77
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices Paper • 2601.08303 • Published Jan 13 • 19
Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface Paper • 2512.19402 • Published Dec 22, 2025 • 8
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding Paper • 2512.17532 • Published Dec 19, 2025 • 68
StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors Paper • 2512.16915 • Published Dec 18, 2025 • 38
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling Paper • 2512.15702 • Published Dec 17, 2025 • 16
UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving Paper • 2512.09864 • Published Dec 10, 2025 • 12
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 132
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Paper • 2510.19808 • Published Oct 22, 2025 • 30
GigaBrain-0: A World Model-Powered Vision-Language-Action Model Paper • 2510.19430 • Published Oct 22, 2025 • 53
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper • 2510.12276 • Published Oct 14, 2025 • 149