- Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention (Oct 7, 2024)
- Article: You could have designed state of the art positional encoding (Nov 25, 2024)
- Space: The Smol Training Playbook 📚 — The secrets to building world-class LLMs
- Space: The Ultra-Scale Playbook 🌌 — The ultimate guide to training LLMs on large GPU clusters
- Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset (updated May 8, 2025)