Scaling Pedagogical Pre-training: From Optimal Mixing to 10 Billion Tokens • 15 days ago