GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering Paper • 2603.15616 • Published 5 days ago • 5
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 5 days ago • 138
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation Paper • 2603.11647 • Published 9 days ago • 31
ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation Paper • 2603.11421 • Published 10 days ago • 34
ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA Paper • 2603.10256 • Published 11 days ago • 19
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 23 days ago • 198
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published Feb 6 • 36
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published Feb 13 • 44
Unified Latents (UL): How to train your latents Paper • 2602.17270 • Published about 1 month ago • 58
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers Paper • 2602.16968 • Published about 1 month ago • 12
jina-embeddings-v5-text: Task-Targeted Embedding Distillation Paper • 2602.15547 • Published Feb 17 • 26
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published Feb 12 • 20
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published Feb 13 • 56