JoyAI-LLM Flash
JoyAI-LLM Flash-Base is a state-of-the-art mixture-of-experts (MoE) language model with 3 billion activated parameters and 48 billion total parameters. Trained with the Muon optimizer, JoyAI-LLM Flash-Base achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. The JoyAI-LLM Flash series aims to accelerate high-throughput, latency-sensitive applications where cost per query must remain minimal.
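As a minimal sketch of how the base model might be loaded for text completion with Hugging Face `transformers`, assuming a standard causal-LM checkpoint; the repository id below is a placeholder, and `trust_remote_code` is only an assumption in case the MoE/MLA implementation ships as custom modeling code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id; substitute the actual Hub path for JoyAI-LLM Flash-Base.
model_id = "JoyAI/JoyAI-LLM-Flash-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # load weights in the checkpoint's native precision
    device_map="auto",       # spread the 48B total parameters across available devices
    trust_remote_code=True,  # assumed: custom MoE/MLA modeling code may be required
)

inputs = tokenizer("The Muon optimizer is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```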
| Architecture | Mixture-of-Experts (MoE) |
|---|---|
| Total Parameters | 48B |
| Activated Parameters | 3B |
| Number of Layers (Dense layer included) | 40 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 2048 |
| MoE Hidden Dimension (per Expert) | 768 |
| Number of Attention Heads | 32 |
| Number of Experts | 256 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 129K |
| Context Length | 128K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
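To make the routing configuration above concrete, here is a simplified PyTorch sketch of a top-k MoE layer with SwiGLU experts and one shared expert, using the dimensions from the table. This is an illustration of the general technique, not JoyAI-LLM Flash-Base's actual implementation (which also uses MLA attention, a dense first layer, and efficient batched expert dispatch); the class names and the naive per-token loop are invented for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """One expert: SwiGLU feed-forward from the 2048-d hidden state to a 768-d expert dim and back."""
    def __init__(self, d_model: int = 2048, d_expert: int = 768):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_expert, bias=False)
        self.up_proj = nn.Linear(d_model, d_expert, bias=False)
        self.down_proj = nn.Linear(d_expert, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

class TopKMoELayer(nn.Module):
    """Routes each token to its top-8 of 256 experts and always applies the 1 shared expert."""
    def __init__(self, d_model: int = 2048, d_expert: int = 768,
                 n_experts: int = 256, top_k: int = 8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(SwiGLUExpert(d_model, d_expert) for _ in range(n_experts))
        self.shared_expert = SwiGLUExpert(d_model, d_expert)

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)           # routing probabilities per token
        weights, indices = probs.topk(self.top_k, dim=-1)   # keep only the top-k experts
        routed = []
        for token, w, idx in zip(x, weights, indices):      # naive per-token loop for clarity
            routed.append(sum(wi * self.experts[int(i)](token) for wi, i in zip(w, idx)))
        # The shared expert processes every token; routed experts add weighted contributions.
        return self.shared_expert(x) + torch.stack(routed)

# Toy usage with scaled-down sizes (the full 256-expert config allocates over a billion parameters):
layer = TopKMoELayer(d_model=64, d_expert=32, n_experts=16, top_k=4)
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```

Because only 8 of the 256 experts (plus the shared expert) run per token, the compute per token tracks the 3B activated parameters rather than the 48B total.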
| Benchmark | JoyAI-LLM Flash-Base | Qwen3-30B-A3B-base |
|---|---|---|
| MMLU | 84.70 | 82.12 |
| MMLU-Pro | 73.14 | 61.76 |
| CMMLU | 83.09 | 83.60 |
| HumanEval | 85.37 | 87.80 |
| LiveCodeBench | 39.91 | 37.34 |
| GSM8K | 88.78 | 90.37 |
| MATH | 78.16 | 59.60 |
| MATH 500 | 77.00 | 58.00 |
Both the code repository and the model weights are released under the Modified MIT License.