Mixture of Experts (MoEs) in Transformers +5 ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 159
Tokenization in Transformers v5: Simpler, Clearer, and More Modular +4 itazap, ariG23498, ArthurZ, sergiopaniego, merve, pcuenq • Dec 18, 2025 • 124
Transformers v5: Simple model definitions powering the AI ecosystem +2 lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025 • 310
Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 379
Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers +5 ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 187
The Transformers Library: standardizing model definitions +2 lysandre, ArthurZ, pcuenq, julien-c • May 15, 2025 • 121
Fixing Gradient Accumulation +4 lysandre, ArthurZ, muellerzr, ydshieh, BenjaminB, pcuenq • Oct 16, 2024 • 66
Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 +4 RQlee, ArthurZ, achikundu, lwtr, rganti, mayank-mishra • Aug 21, 2024 • 41
Fine-Tuning Gemma Models in Hugging Face +2 svaibhav, alanwaketan, ybelkada, ArthurZ • Feb 23, 2024 • 46
Code Llama: Llama 2 learns to code +6 philschmid, osanseviero, pcuenq, lewtun, lvwerra, loubnabnl, ArthurZ, joaogante • Aug 25, 2023 • 10