Article: A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons. Published Feb 4, 2025.
Collection: Cerebras REAP. Sparse MoE models compressed using the REAP (Router-weighted Expert Activation Pruning) method. 30 items, updated Feb 25.
Collection: Sparse Foundational Llama 2 Models. Sparse pre-trained and fine-tuned Llama models made by Neural Magic and Cerebras. 27 items, updated Apr 16, 2025.
Paper: QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models. arXiv:2310.16795, published Oct 25, 2023.