Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published 3 days ago • 23
Assisted Generation: a new direction toward low-latency text generation • May 11, 2023 • 78
A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes • Aug 17, 2022 • 128
OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve • May 20, 2025 • 64
Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective • Jan 27 • 69
KV Caching Explained: Optimizing Transformer Inference Efficiency • Jan 30, 2025 • 276
An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct • Jun 11, 2024 • 68
SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data +7 • Jun 3, 2025 • 342
TimeScope: How Long Can Your Video Large Multimodal Model Go? +2 • Jul 23, 2025 • 48
⚡ nano-vLLM: Lightweight, Low-Latency LLM Inference from Scratch • Jun 28, 2025 • 38
Introducing Cosmos Predict-2: A Foundation For Your Own World Model • Jun 17, 2025 • 9