OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published 5 days ago • 68
TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification Paper • 2604.14531 • Published 1 day ago • 5
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 3 days ago • 58
MOSS-Audio Collection An open-source audio understanding model supporting speech recognition, environmental sound analysis, music understanding, time-aware QA, and complex • 5 items • Updated 1 day ago • 35
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 3 days ago • 129
Geometric Context Transformer for Streaming 3D Reconstruction Paper • 2604.14141 • Published 3 days ago • 2
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published 5 days ago • 133
ERNIE-Image Collection A series of image generation models, including text2img and img2img. • 2 items • Updated 3 days ago • 18
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 9 days ago • 255
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 10 days ago • 182
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning Paper • 2505.22019 • Published May 28, 2025 • 12
TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders Paper • 2604.07340 • Published 10 days ago • 16