HuggingFaceFW/fineweb-edu
3.5B rows • 221k downloads • 969 likes
mlfoundations/dclm-baseline-1.0
119k downloads • 255 likes
4.48B rows • 79.5k downloads • 762 likes
Note: only multimodal data =(
48.3M rows • 9.45k downloads • 353 likes
5.45B rows • 8.68k downloads • 514 likes
Note: does not contain the text directly =(
HuggingFaceTB/issues-kaggle-notebooks
16.1M rows • 223 downloads • 14 likes
Note: only 500k rows
7.89M rows • 15.9k downloads • 184 likes
Note: 1.6M rows with web-0.5-to-1.0
Locutusque/UltraTextbooks
5.52M rows • 1.68k downloads • 198 likes
tokyotech-llm/swallow-math-v2
17.4M rows • 4.91k downloads • 27 likes
tokyotech-llm/swallow-code-v2
147M rows • 174k downloads • 33 likes
HuggingFaceFW/finepdfs-edu
49.5M rows • 4.19k downloads • 80 likes
HuggingFaceTB/smollm-corpus
237M rows • 30.3k downloads • 439 likes