mmBERT is trained on 3T tokens from over 1,800 languages and shows state-of-the-art benchmark scores and exceptional low-resource performance.
Papers
Genomic Next-Token Predictors are In-Context Learners
Controlled Generation for Private Synthetic Text
Models (53)
jhu-clsp/mmBERT-small
Fill-Mask • Updated • 16.4k • 66
jhu-clsp/mmBERT-base
Fill-Mask • Updated • 318k • 199
jhu-clsp/mmBERT-checkpoints
Updated • 4
jhu-clsp/ettin-decoder-1b
Fill-Mask • Updated • 320 • 5
jhu-clsp/ettin-decoder-32m
Text Generation • Updated • 579
jhu-clsp/ettin-encoder-1b
Feature Extraction • Updated • 864 • 21
jhu-clsp/ettin-encoder-68m
Fill-Mask • Updated • 9.7k • 4
jhu-clsp/ettin-dec-from-enc-32m
Text Generation • Updated • 4
jhu-clsp/ettin-encoder-150m
Fill-Mask • Updated • 16.9k • 10
jhu-clsp/ettin-decoder-400m
Text Generation • Updated • 563 • 4
Datasets (38)
jhu-clsp/robust04-instructions
Viewer • Updated • 136k • 226 • 2
jhu-clsp/core17-instructions
Viewer • Updated • 49.4k • 307 • 2
jhu-clsp/news21-instructions
Viewer • Updated • 71.5k • 226 • 1
jhu-clsp/megawika-2
Updated • 72 • 4
jhu-clsp/mmBERT-decay-data
Updated • 5.61k • 6
jhu-clsp/mmBERT-midtraining-data
Updated • 571 • 1
jhu-clsp/ettin-pretraining-data
Updated • 3.46k • 9
jhu-clsp/ettin-decay-data
Updated • 970 • 1
jhu-clsp/astro-llms-benchmark-dataset
Viewer • Updated • 40 • 57 • 1
jhu-clsp/astro-llms-full-query-data
Viewer • Updated • 368 • 26 • 1