AURA: Always-On Understanding and Real-Time Assistance via Video Streams Paper • 2604.04184 • Published 7 days ago • 43
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning Paper • 2510.14958 • Published Oct 16, 2025 • 23
May I ask why dp_world_size is related to tp_size? Based on my understanding, it should just be dp_world_size = dp_shard_size * dp_replicate_size. Or am I missing something here?
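To make the question concrete, here is a minimal sketch of the arithmetic involved. The function names and the two-axis layout (data parallel and tensor parallel only) are assumptions for illustration, not taken from the code being discussed. It shows one possible way tp_size could enter the picture: if dp_world_size is derived from the total number of ranks rather than from the two data-parallel factors, the tensor-parallel degree has to be divided out.

```python
def dp_world_size_from_factors(dp_replicate_size: int, dp_shard_size: int) -> int:
    """Data-parallel world size as the product of its two factors
    (the formula stated in the question)."""
    return dp_replicate_size * dp_shard_size


def dp_world_size_from_total(world_size: int, tp_size: int) -> int:
    """Hypothetical alternative: derive the data-parallel size from the
    total rank count, assuming the only other parallel axis is tensor
    parallelism, so that world_size = dp_world_size * tp_size."""
    if world_size % tp_size != 0:
        raise ValueError("world_size must be divisible by tp_size")
    return world_size // tp_size


# Example layout (assumed values): 2 replicas x 4 shards x 2-way TP = 16 ranks.
dp_replicate_size, dp_shard_size, tp_size = 2, 4, 2
world_size = dp_replicate_size * dp_shard_size * tp_size

from_factors = dp_world_size_from_factors(dp_replicate_size, dp_shard_size)
from_total = dp_world_size_from_total(world_size, tp_size)
print(from_factors, from_total)
```

Under these assumptions both routes give the same number, so a dependence on tp_size in the code would not necessarily contradict the formula in the question. Whether that is what the referenced code actually does is not confirmed here.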
The Ultra-Scale Playbook 🌌 Space • Running • 3.77k • The ultimate guide to training LLMs on large GPU clusters