Hugging Face
Catherine Arnett
catherinearnett
108 followers · 37 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
updated a dataset 13 days ago: catherinearnett/bilingual-tokenizer-training-data
published a dataset 13 days ago: catherinearnett/bilingual-tokenizer-training-data
liked a dataset 23 days ago: commoncrawl/CommonLID
catherinearnett's datasets (4)
catherinearnett/bilingual-tokenizer-training-data • Viewer • Updated 13 days ago • 30.7M • 283
catherinearnett/montok • Updated Sep 19, 2025 • 3.12k • 3
catherinearnett/morphscore • Viewer • Updated Jul 10, 2025 • 5.09M • 286 • 4
catherinearnett/monolingual-tokenizer-data • Viewer • Updated May 15, 2025 • 139M • 150 • 1