UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation Paper • 2603.23500 • Published 6 days ago • 35
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens Paper • 2603.19232 • Published 11 days ago • 33
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 27 days ago • 102
Mode Seeking meets Mean Seeking for Fast Long Video Generation Paper • 2602.24289 • Published Feb 27 • 41
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation Paper • 2602.19163 • Published Feb 22 • 14
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published Feb 13 • 44
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss Paper • 2602.02493 • Published Feb 2 • 46
One-step Latent-free Image Generation with Pixel Mean Flows Paper • 2601.22158 • Published Jan 29 • 18
Revisiting Diffusion Model Predictions Through Dimensionality Paper • 2601.21419 • Published Jan 29 • 4
Towards Pixel-Level VLM Perception via Simple Points Prediction Paper • 2601.19228 • Published Jan 27 • 18
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation Paper • 2601.15369 • Published Jan 21 • 21
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published Jan 22 • 55
Bidirectional Normalizing Flow: From Data to Noise and Back Paper • 2512.10953 • Published Dec 11, 2025 • 7