mmBERT: a modern multilingual encoder Collection mmBERT is trained on 3T tokens from over 1,800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance. • 16 items • Updated Sep 9, 2025 • 53
Falcon-H1 Collection Falcon-H1 Family of Hybrid-Head Language Models (Transformer-SSM), including 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B (pretrained & instruction-tuned). • 33 items • Updated Mar 2 • 59
Granite 4.0 Language Models Collection Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 11 items • Updated 19 days ago • 218
Falcon Edge series Collection A series of powerful, universal, and fine-tunable small language models. • 7 items • Updated Nov 6, 2025 • 25
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 191
Qwen3 Collection Qwen's new Qwen3 models, in Unsloth Dynamic 2.0, GGUF, 4-bit, and 16-bit Safetensors formats. Includes 128K context-length variants. • 70 items • Updated 5 days ago • 269
BitNet Collection 🔥 BitNet family of large language models (1-bit LLMs). • 7 items • Updated May 1, 2025 • 63
Granite Experiments Collection Experimental projects under consideration for the Granite family. • 26 items • Updated 5 days ago • 16
Granite 3.3 Language Models Collection Language models with improved reasoning and instruction-following capabilities. • 4 items • Updated 20 days ago • 45
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization Paper • 2502.02631 • Published Feb 4, 2025 • 4
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16, 2025 • 170