HCAI-Lab/dolma3-6t-sample-5000-docs / sample_contract.json (483 Bytes)
{
  "WORKING_SAMPLE_TOKEN_FLOOR_PER_BIN": 0,
  "WORKING_SAMPLE_DOCS_PER_BIN": 5000,
  "WORKING_SAMPLE_GLOBAL_TOKEN_BUDGET": null,
  "WORKING_SAMPLE_MIN_TOKEN_COUNT": 512,
  "WORKING_SAMPLE_MAX_TOKEN_COUNT": null,
  "WORKING_SAMPLE_REALIZED_TOKEN_TOTAL": 6923659307,
  "WORKING_SAMPLE_REALIZED_DOC_COUNT": 2844282,
  "WORKING_SAMPLE_UNDERFILLED_BIN_COUNT": 15,
  "WORKING_SAMPLE_COVERED_BIN_COUNT": 576,
  "WORKING_SAMPLE_TOTAL_BIN_COUNT": 576,
  "WORKING_SAMPLE_SAMPLING_SEED": 42
}
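This page does not document the field semantics. Reading only the key names, the contract appears to target 5,000 documents in each of 576 bins, to exclude documents shorter than 512 tokens, and to sample with seed 42. The following is a minimal Python sketch, assuming those readings, that loads the contract and derives a few consistency figures. The nominal target is 576 × 5,000 = 2,880,000 documents. Against the realized 2,844,282 documents, that leaves a shortfall of 35,718; if every bin that is not underfilled holds exactly 5,000 documents, the whole shortfall would sit in the 15 underfilled bins. The mean realized length works out to roughly 2,434 tokens per document. The function name `summarize_contract` and the output keys are hypothetical.

```python
import json

# Contents of sample_contract.json as shown above (JSON null -> Python None).
CONTRACT_JSON = """{
  "WORKING_SAMPLE_TOKEN_FLOOR_PER_BIN": 0,
  "WORKING_SAMPLE_DOCS_PER_BIN": 5000,
  "WORKING_SAMPLE_GLOBAL_TOKEN_BUDGET": null,
  "WORKING_SAMPLE_MIN_TOKEN_COUNT": 512,
  "WORKING_SAMPLE_MAX_TOKEN_COUNT": null,
  "WORKING_SAMPLE_REALIZED_TOKEN_TOTAL": 6923659307,
  "WORKING_SAMPLE_REALIZED_DOC_COUNT": 2844282,
  "WORKING_SAMPLE_UNDERFILLED_BIN_COUNT": 15,
  "WORKING_SAMPLE_COVERED_BIN_COUNT": 576,
  "WORKING_SAMPLE_TOTAL_BIN_COUNT": 576,
  "WORKING_SAMPLE_SAMPLING_SEED": 42
}"""


def summarize_contract(contract: dict) -> dict:
    """Hypothetical helper: derive sanity-check figures from the contract.

    Field meanings are inferred from the key names, not from any documentation.
    """
    bins = contract["WORKING_SAMPLE_TOTAL_BIN_COUNT"]
    docs_per_bin = contract["WORKING_SAMPLE_DOCS_PER_BIN"]
    realized_docs = contract["WORKING_SAMPLE_REALIZED_DOC_COUNT"]
    realized_tokens = contract["WORKING_SAMPLE_REALIZED_TOKEN_TOTAL"]

    target_docs = bins * docs_per_bin  # nominal documents if every bin were full
    mean_tokens = realized_tokens / realized_docs

    return {
        "target_docs": target_docs,
        "doc_shortfall": target_docs - realized_docs,
        "mean_tokens_per_doc": mean_tokens,
        # A mean at or above the minimum is necessary (not sufficient) for the floor to hold.
        "mean_meets_min_length": mean_tokens >= contract["WORKING_SAMPLE_MIN_TOKEN_COUNT"],
        "all_bins_covered": contract["WORKING_SAMPLE_COVERED_BIN_COUNT"] == bins,
    }


contract = json.loads(CONTRACT_JSON)
summary = summarize_contract(contract)
print(summary)
```

Embedding the JSON keeps the sketch self-contained. In practice you would read the downloaded file with `json.load` instead.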

Xet hash: 8142a6de21cbb4665f26b653ab9375b7957d582e69551e070e5c3ef443bb448d
