Need this for your organization?
Need dedicated governance and shared quotas? A Hugging Face Enterprise plan includes storage access at scale.
AI-native object storage designed for scale, speed, and team workflows.
Store models, datasets, and artifacts with simple per-TB pricing. Xet deduplication. Included CDN. No git overhead.
Per-TB pricing with built-in CDN and deduplication speedups.
No Git constraints: commit-free sync and fast object updates.
Designed for ML workflows: datasets, checkpoints, model artifacts.
Xet uses content-defined chunking to break files into byte-level chunks and deduplicates across your entire bucket. When you retrain a model and only 5% of weights change, only that 5% is re-uploaded.
Raw + processed dataset: stored once, billed once*
4x less data per upload, verified with real-world workloads
*Requires Enterprise or Enterprise Plus plan
Traditional S3 Upload
8 / 8 chunks uploaded
XET Deduplicated Upload
1 / 8 chunks uploaded
Gray = already stored · Purple = only the changed chunk
Simple per-TB pricing that scales with usage. Egress and CDN are included at no extra cost.
Pour raw data from every source into a single bucket: crawls, annotations, synthetic outputs, partner datasets. No git overhead, no commit queues, no file-count limits. When training begins, your data is already there, streamed to GPUs via the included CDN.
Immediate availability on upload, no queued commits
Batch API processes thousands of files in a single call
Raw + processed datasets with dedup = no double billing*
*Requires Enterprise or Enterprise Plus plan
crawl-2026-jan/
48 TB · 2.1M files
annotations-v3/
12 TB · 890K files
synthetic-pairs/
6 TB · 340K files
Xet dedup: 66 TB stored → billed for 41 TB*
Every bucket includes a CDN. Warm localized cache close to where you compute for ultra fast streaming and downloads. Egress is included up to a generous 8:1 ratio of your total storage.
Pre-warm cache in any cloud region you need
Our CDN is deployed inside GCP and AWS networks
Egress included up to 8:1 your storage
Coding agents run in ephemeral environments, but their outputs shouldn't vanish. Checkpoints, benchmark
results, generated datasets: one hf sync command in your agent's bash tool is all it takes.
Pre-warmed CDN and no git overhead for fast reads and writes
Persist artifacts across ephemeral CI runs and terminal sessions
Install the official HF CLI skill and your agent knows every command
Start with buckets, sync your AI data, and unlock object storage built for ML workflows.
Need dedicated governance and shared quotas? A Hugging Face Enterprise plan includes storage access at scale.