Data PRO
Optitransfer
AI & ML interests
None yet
Organizations
Detecting synthetic / template content in web corpora -- what signals work at scale?
#56 opened 22 days ago
by
Optitransfer
LLM training
1
#26 opened 6 months ago
by
IvanTheTerriblest
Quality scoring non-English web data -- approaches and challenges
#73 opened 22 days ago
by
Optitransfer
too much duplication in the dataset
3
#19 opened 9 months ago
by
ShuaiAnwo
Inquiry regarding intra-document quality filtering
1
#55 opened about 2 months ago
by
AshleyLL
Swiss Web Premium -- curated multilingual corpus from .ch domains
#1 opened 22 days ago
by
Optitransfer