Llasa Goes RL: Training LLaSA with GRPO for Improved Prosody and Expressiveness (article, Nov 5, 2025)
Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand (article, Dec 4, 2025)
KV Caching Explained: Optimizing Transformer Inference Efficiency (article, Jan 30, 2025)
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges (paper 2512.11362, published Dec 12, 2025)
The Eiffel Tower Llama (Space): Explore the Eiffel Tower Llama experiment with open-source models
Unlocking On-Policy Distillation for Any Model Family (Space): Visualize on-policy distillation for any model family