Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights Paper • 2603.12228 • Published 8 days ago • 11
Efficient Memory Management for Large Language Model Serving with PagedAttention Paper • 2309.06180 • Published Sep 12, 2023 • 46
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs Paper • 2410.16144 • Published Oct 21, 2024 • 5