Max Schwartzapfel (dystrio) · PRO
5 followers · 3 following
https://dystrio.com
AI & ML interests
Runtime-agnostic model compilation for efficient inference.
Recent Activity
reacted to their post with 🚀 · about 20 hours ago
**Sculpt: A compression ladder for Qwen 3.5 9B**

These checkpoints were designed to improve deployment efficiency while preserving full dense-transformer compatibility. Across the ladder we observe:

* ~2–10% checkpoint size reduction
* up to 25–27% faster prefill at higher compression tiers
* decode throughput roughly unchanged

Because Sculpt operates before quantization, these structural reductions compound with GPTQ, AWQ, GGUF, and INT8/INT4 pipelines, shifting deployment memory thresholds without changing infrastructure assumptions.

The release includes four checkpoints:

- **Default (kf = 0.95)** — fidelity-preserving baseline replacement
- **Production (kf = 0.90)** — balanced serving tier
- **Throughput (kf = 0.88)** — latency-optimized
- **Experimental (kf = 0.82)** — aggressive compression boundary

All checkpoints load directly with Transformers, vLLM, TGI, and GGUF workflows. No custom runtime or kernels required.

The goal of Sculpt is simple: explore how far structured FFN compression plus teacher-guided distillation can improve efficiency while remaining a drop-in dense replacement.

Models:
- [Qwen3.5-9B-Sculpt-Default](https://huggingface.co/dystrio/Qwen3.5-9B-Sculpt-Default)
- [Qwen3.5-9B-Sculpt-Production](https://huggingface.co/dystrio/Qwen3.5-9B-Sculpt-Production)
- [Qwen3.5-9B-Sculpt-Throughput](https://huggingface.co/dystrio/Qwen3.5-9B-Sculpt-Throughput)
- [Qwen3.5-9B-Sculpt-Experimental](https://huggingface.co/dystrio/Qwen3.5-9B-Sculpt-Experimental)

Curious which tradeoff tier people would choose in practice for serving workloads. We want to Sculpt models that work for you. If you give us your workload, we can make the sculpted models meet your SLOs.
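Since the checkpoints are stated to load directly with standard Transformers, loading one tier might look like the sketch below. The `sculpt_repo_id` helper is illustrative (not from the release); only the repo ids and tier names come from the post, and the dtype/device arguments are the usual Transformers defaults, not release-specific settings.

```python
# Hedged sketch: picking and loading a Sculpt tier with Hugging Face Transformers.
# The sculpt_repo_id helper is a hypothetical convenience, not part of the release.

def sculpt_repo_id(tier: str) -> str:
    """Return the Hub repo id for one of the four released Sculpt tiers."""
    assert tier in {"Default", "Production", "Throughput", "Experimental"}
    return f"dystrio/Qwen3.5-9B-Sculpt-{tier}"

repo = sculpt_repo_id("Production")  # balanced serving tier (kf = 0.90)

# Standard Transformers loading path -- per the post, no custom runtime or
# kernels are required (uncomment to actually download and load):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(repo)
# model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")
```

Because Sculpt is a drop-in dense replacement, the same repo id should also work as the model argument for vLLM or TGI serving commands.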
posted an update · 4 days ago
published a model · 4 days ago
dystrio/Qwen3.5-9B-Sculpt-Throughput
Organizations: none yet
dystrio's activity
liked a model · 10 days ago
dystrio/Qwen2.5-7B-Instruct-sculpt-throughput
Text Generation · 5B · Updated 11 days ago · 372 · 1