Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 30 days ago • 102
Solaris: Building a Multiplayer Video World Model in Minecraft Paper • 2602.22208 • Published Feb 25 • 28
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published Jan 22 • 55
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published Jan 22 • 55
Cambrian-S-Data Collection Data used during Cambrian-S's 4-stage training • 4 items • Updated Feb 27 • 5
Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts Paper • 2511.04655 • Published Nov 6, 2025 • 10
VSI-SUPER Collection VSI-SUPER benchmark proposed in Cambrian-S • 2 items • Updated about 1 month ago • 3
Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts Paper • 2511.04655 • Published Nov 6, 2025 • 10
Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts Paper • 2511.04655 • Published Nov 6, 2025 • 10 • 2
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 181
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 182
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 170
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 189