fulandiege's Collections
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper
• 2512.13687
• Published • 106
MMGR: Multi-Modal Generative Reasoning
Paper
• 2512.14691
• Published • 121
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper
• 2512.23447
• Published • 98
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper
• 2512.23576
• Published • 66
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published • 318
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper
• 2512.24618
• Published • 152
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation
Paper
• 2512.24551
• Published • 21
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
Paper
• 2512.24873
• Published • 108
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Paper
• 2512.23959
• Published • 112
Agent Learning via Early Experience
Paper
• 2510.08558
• Published • 275
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Paper
• 2601.02151
• Published • 113
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
Paper
• 2601.05432
• Published • 169
MMFormalizer: Multimodal Autoformalization in the Wild
Paper
• 2601.03017
• Published • 106
Qwen3-VL Technical Report
Paper
• 2511.21631
• Published • 161
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Paper
• 2512.08765
• Published • 134
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published • 511
Advancing Open-source World Models
Paper
• 2601.20540
• Published • 132
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
Paper
• 2602.05400
• Published • 349
ASA: Training-Free Representation Engineering for Tool-Calling Agents
Paper
• 2602.04935
• Published • 41
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
Paper
• 2602.10604
• Published • 193
Kimi K2.5: Visual Agentic Intelligence
Paper
• 2602.02276
• Published • 259
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
Paper
• 2602.12617
• Published • 20
MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
Paper
• 2602.12705
• Published • 65
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
Paper
• 2602.10809
• Published • 57
SLA2: Sparse-Linear Attention with Learnable Routing and QAT
Paper
• 2602.12675
• Published • 57
Unified Latents (UL): How to train your latents
Paper
• 2602.17270
• Published • 57
Code2World: A GUI World Model via Renderable Code Generation
Paper
• 2602.09856
• Published • 201
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
Paper
• 2602.16855
• Published • 50
World Craft: Agentic Framework to Create Visualizable Worlds via Text
Paper
• 2601.09150
• Published • 19
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper
• 2602.10693
• Published • 220
Does Your Reasoning Model Implicitly Know When to Stop Thinking?
Paper
• 2602.08354
• Published • 262
A Very Big Video Reasoning Suite
Paper
• 2602.20159
• Published • 515
On Data Engineering for Scaling LLM Terminal Capabilities
Paper
• 2602.21193
• Published • 98
Query-focused and Memory-aware Reranker for Long Context Processing
Paper
• 2602.12192
• Published • 57
HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
Paper
• 2602.18283
• Published • 56
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
Paper
• 2602.21534
• Published • 23
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Paper
• 2602.24286
• Published • 90
dLLM: Simple Diffusion Language Modeling
Paper
• 2602.22661
• Published • 132
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing
Paper
• 2603.03143
• Published • 139
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
Paper
• 2601.18491
• Published • 125
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
Paper
• 2510.23473
• Published • 86
Experiential Reinforcement Learning
Paper
• 2602.13949
• Published • 71