Papers
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Paper
• 2412.09013
• Published • 13
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
• Published • 68
∇NABLA: Neighborhood Adaptive Block-Level Attention
Paper
• 2507.13546
• Published • 126
Yume: An Interactive World Generation Model
Paper
• 2507.17744
• Published • 92
Latent Denoising Makes Good Visual Tokenizers
Paper
• 2507.15856
• Published • 12
Dynamic Reflections: Probing Video Representations with Text Alignment
Paper
• 2511.02767
• Published • 4
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
Paper
• 2511.10555
• Published • 63
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
Paper
• 2511.11793
• Published • 195
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Paper
• 2511.12609
• Published • 106
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
Paper
• 2511.09611
• Published • 71
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
Paper
• 2511.13704
• Published • 44
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
Paper
• 2511.12710
• Published • 39
Back to Basics: Let Denoising Generative Models Denoise
Paper
• 2511.13720
• Published • 70
Draft and Refine with Visual Experts
Paper
• 2511.11005
• Published • 3
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Paper
• 2511.14993
• Published • 233
Medal S: Spatio-Textual Prompt Model for Medical Segmentation
Paper
• 2511.13001
• Published • 3
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
Paper
• 2511.16671
• Published • 16
NaTex: Seamless Texture Generation as Latent Color Diffusion
Paper
• 2511.16317
• Published • 16
Scaling Spatial Intelligence with Multimodal Foundation Models
Paper
• 2511.13719
• Published • 48
Plan-X: Instruct Video Generation via Semantic Planning
Paper
• 2511.17986
• Published • 18
SAM 3D: 3Dfy Anything in Images
Paper
• 2511.16624
• Published • 114
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Paper
• 2511.19365
• Published • 66
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios
Paper
• 2511.18050
• Published • 38
In-Video Instructions: Visual Signals as Generative Control
Paper
• 2511.19401
• Published • 32
Paper
• 2511.11238
• Published • 39
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models
Paper
• 2511.10629
• Published • 129
SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control
Paper
• 2511.09715
• Published • 11
Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions
Paper
• 2511.06876
• Published • 28
EVTAR: End-to-End Try on with Additional Unpaired Visual Reference
Paper
• 2511.00956
• Published • 5
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
Paper
• 2511.01678
• Published • 38
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper
• 2504.13837
• Published • 141