mistralai/Voxtral-Mini-4B-Realtime-2602 • Automatic Speech Recognition • 4B
Article • KV Caching Explained: Optimizing Transformer Inference Efficiency • Jan 30, 2025
FineWeb: decanting the web for the finest text data at scale 🍷 • Generate a curated web-text dataset for LLM training
The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLMs on large GPU clusters
The Smol Training Playbook 📚 • The secrets to building world-class LLMs