Human-Centered AI Lab
HCAI-Lab/dolma3-6t-sample-5000-docs/worker_0013
11.1 GB, 56,043 files, updated about 1 month ago
Name | Size | Uploaded | Xet hash
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000383.jsonl.zst | 112 kB | about 1 month ago | 3f62e259
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000384.jsonl.zst | 247 kB | about 1 month ago | eb230c6b
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000385.jsonl.zst | 119 kB | about 1 month ago | c35013b4
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000386.jsonl.zst | 312 kB | about 1 month ago | 46b58328
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000387.jsonl.zst | 194 kB | about 1 month ago | c2cf6c99
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000388.jsonl.zst | 187 kB | about 1 month ago | a5ab2fe2
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000389.jsonl.zst | 174 kB | about 1 month ago | 02015445
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000390.jsonl.zst | 243 kB | about 1 month ago | a1f57c6c
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000391.jsonl.zst | 156 kB | about 1 month ago | 37fdd059
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000392.jsonl.zst | 215 kB | about 1 month ago | f6f71c3c
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000393.jsonl.zst | 193 kB | about 1 month ago | ad1c1df0
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000394.jsonl.zst | 156 kB | about 1 month ago | 80785528
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000395.jsonl.zst | 274 kB | about 1 month ago | 169b5579
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000396.jsonl.zst | 182 kB | about 1 month ago | 1a1e1a53
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000397.jsonl.zst | 174 kB | about 1 month ago | 7ce8776c
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000398.jsonl.zst | 145 kB | about 1 month ago | 488b2614
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000399.jsonl.zst | 155 kB | about 1 month ago | e52cae54
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000400.jsonl.zst | 206 kB | about 1 month ago | e391c539
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000401.jsonl.zst | 207 kB | about 1 month ago | d83f00d6
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000402.jsonl.zst | 188 kB | about 1 month ago | 77a48606
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000403.jsonl.zst | 278 kB | about 1 month ago | fb7f81a7
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000404.jsonl.zst | 153 kB | about 1 month ago | 3d6e57f5
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000405.jsonl.zst | 167 kB | about 1 month ago | ca7f98c6
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000406.jsonl.zst | 275 kB | about 1 month ago | e52eec1f
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000407.jsonl.zst | 248 kB | about 1 month ago | bba5250d
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000408.jsonl.zst | 165 kB | about 1 month ago | 2833e943
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000409.jsonl.zst | 138 kB | about 1 month ago | 9a9e71b5
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000410.jsonl.zst | 121 kB | about 1 month ago | 4ade49fb
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000411.jsonl.zst | 157 kB | about 1 month ago | 94ac67de
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000412.jsonl.zst | 178 kB | about 1 month ago | 6f95da7e
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000413.jsonl.zst | 342 kB | about 1 month ago | 1190e2e7
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000414.jsonl.zst | 188 kB | about 1 month ago | ddf0e66c
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000415.jsonl.zst | 129 kB | about 1 month ago | e142fe09
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000416.jsonl.zst | 252 kB | about 1 month ago | 2da845e3
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000417.jsonl.zst | 139 kB | about 1 month ago | 79717d01
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000418.jsonl.zst | 213 kB | about 1 month ago | 6567f5d7
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000419.jsonl.zst | 176 kB | about 1 month ago | 2fd5eeaf
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000420.jsonl.zst | 162 kB | about 1 month ago | b9ded923
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000421.jsonl.zst | 159 kB | about 1 month ago | d0ac39ad
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000422.jsonl.zst | 205 kB | about 1 month ago | d2f4fe5f
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000423.jsonl.zst | 219 kB | about 1 month ago | ec1c6632
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000424.jsonl.zst | 141 kB | about 1 month ago | 4cd2e301
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000425.jsonl.zst | 220 kB | about 1 month ago | 98482744
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000426.jsonl.zst | 243 kB | about 1 month ago | ca98ef5b
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000427.jsonl.zst | 156 kB | about 1 month ago | e7f2970c
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000428.jsonl.zst | 213 kB | about 1 month ago | 6218d8bc
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000429.jsonl.zst | 123 kB | about 1 month ago | 9b2106bb
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000430.jsonl.zst | 119 kB | about 1 month ago | 3f4df0c8
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000431.jsonl.zst | 190 kB | about 1 month ago | 3eab43b5
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000432.jsonl.zst | 189 kB | about 1 month ago | d0fe25da
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000433.jsonl.zst | 200 kB | about 1 month ago | 47f548de
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000434.jsonl.zst | 217 kB | about 1 month ago | 1438123c
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000435.jsonl.zst | 245 kB | about 1 month ago | a8202f15
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000436.jsonl.zst | 233 kB | about 1 month ago | d09f35a3
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000437.jsonl.zst | 240 kB | about 1 month ago | a8ece1c5
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000438.jsonl.zst | 217 kB | about 1 month ago | ba8d2aa3
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000439.jsonl.zst | 121 kB | about 1 month ago | 14fb973e
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000440.jsonl.zst | 276 kB | about 1 month ago | 9f25cf6a
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000441.jsonl.zst | 206 kB | about 1 month ago | 793d9e7c
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000442.jsonl.zst | 139 kB | about 1 month ago | 6abd5987
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000443.jsonl.zst | 163 kB | about 1 month ago | f1ff3a2a
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000444.jsonl.zst | 193 kB | about 1 month ago | cddd2e9b
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000445.jsonl.zst | 219 kB | about 1 month ago | 756a6c3a
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000446.jsonl.zst | 159 kB | about 1 month ago | 366bdbbc
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000447.jsonl.zst | 160 kB | about 1 month ago | fac90522
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000448.jsonl.zst | 139 kB | about 1 month ago | dc026220
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000449.jsonl.zst | 142 kB | about 1 month ago | d042a0de
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000450.jsonl.zst | 179 kB | about 1 month ago | 91391c39
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000451.jsonl.zst | 161 kB | about 1 month ago | bb479311
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000452.jsonl.zst | 244 kB | about 1 month ago | c58da7c6
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000453.jsonl.zst | 217 kB | about 1 month ago | 708e597a
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000454.jsonl.zst | 159 kB | about 1 month ago | a2152ab5
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000455.jsonl.zst | 161 kB | about 1 month ago | d715811e
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000456.jsonl.zst | 184 kB | about 1 month ago | c1deebfa
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000457.jsonl.zst | 112 kB | about 1 month ago | c616940c
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000458.jsonl.zst | 162 kB | about 1 month ago | 5641f1cf
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000459.jsonl.zst | 126 kB | about 1 month ago | 45655016
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000460.jsonl.zst | 280 kB | about 1 month ago | 2a02ed74
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000461.jsonl.zst | 198 kB | about 1 month ago | 4ec38a37
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000462.jsonl.zst | 247 kB | about 1 month ago | a79162be
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000463.jsonl.zst | 172 kB | about 1 month ago | c7a31d68
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000464.jsonl.zst | 116 kB | about 1 month ago | 1a6b8dad
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000465.jsonl.zst | 144 kB | about 1 month ago | 2506f3e7
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000466.jsonl.zst | 173 kB | about 1 month ago | cd40451a
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000467.jsonl.zst | 201 kB | about 1 month ago | 3bea2325
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000468.jsonl.zst | 222 kB | about 1 month ago | 42068025
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000469.jsonl.zst | 189 kB | about 1 month ago | 7a5eea4f
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000470.jsonl.zst | 136 kB | about 1 month ago | 58adad8a
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000471.jsonl.zst | 183 kB | about 1 month ago | 50e79f1f
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000472.jsonl.zst | 160 kB | about 1 month ago | 7d19ef4c
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000473.jsonl.zst | 198 kB | about 1 month ago | c3182d10
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000474.jsonl.zst | 346 kB | about 1 month ago | 9280eb39
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000475.jsonl.zst | 176 kB | about 1 month ago | afc8ce59
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000476.jsonl.zst | 190 kB | about 1 month ago | 993bf3ff
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000477.jsonl.zst | 151 kB | about 1 month ago | 1fc5c748
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000478.jsonl.zst | 231 kB | about 1 month ago | 23be34db
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000479.jsonl.zst | 169 kB | about 1 month ago | 9713f717
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000480.jsonl.zst | 185 kB | about 1 month ago | 9096d44d
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000481.jsonl.zst | 196 kB | about 1 month ago | 62fdf31d
soc127__phase1_pool_shared__common_crawl__part_001__data__common_crawl-entertainment-0018__shard_00000482.jsonl.zst | 200 kB | about 1 month ago | 7824b062
Total size: 11.1 GB
Files: 56,043
Last updated: Mar 24
Pre-warmed CDN: US, EU