mmBERT: a modern multilingual encoder Collection mmBERT is trained on 3T tokens from over 1800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance โข 16 items โข Updated Sep 9, 2025 โข 53