Looking for latency benchmarks/results for the model
Are there any published benchmarks or has anyone run their own tests they could share?
Hey, sorry for the delay. You can see the benchmarks in our GitHub repo: https://github.com/distil-labs/Distil-PII.
Overall we have found that the finetuned models conform to the JSON schema, stop hallucinating extra entities, handle obfuscated inputs and numbers (while keeping the last-4 digits), and preserve non-PII operational tokens. The performance lifts are large across model sizes: the 1B and 3B students end up on par (within one standard deviation) with the untrained 685B DeepSeek baseline under the LLM-as-a-judge metric (see the sketch after the table below). SmolLM2 is surprisingly resistant to training, but we are releasing it anyway for completeness.
| Model | # parameters | LLM-as-a-judge score |
|---|---|---|
| Deepseek 3.1 (untrained) | 685B | 0.84 +/- 0.03 |
| Llama-3.2-3B-Instruct | 3B | 0.82 +/- 0.03 |
| Llama-3.2-1B-Instruct | 1B | 0.81 +/- 0.02 |
| gemma-3-270m-it | 270M | 0.73 +/- 0.07 |
| SmolLM2-135M-Instruct | 135M | 0.25 +/- 0.05 |
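If you want to sanity-check the "conforms to the JSON schema" claim on your own outputs, here is a minimal sketch. The field names (`entities`, `type`, `value`) and entity types are hypothetical placeholders, not the actual schema; the real schema and evaluation code are in the Distil-PII repo.

```python
# Minimal sketch of a schema-conformance check on a model's JSON output.
# The schema below ("entities" / "type" / "value") is hypothetical; use the
# schema defined in the Distil-PII repo for real evaluation.
import json

ALLOWED_TYPES = {"NAME", "EMAIL", "PHONE", "CARD_LAST4"}

def is_conformant(raw_output: str) -> bool:
    """Return True if the output parses as JSON and matches the expected shape."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    entities = parsed.get("entities")
    if not isinstance(entities, list):
        return False
    for ent in entities:
        if not isinstance(ent, dict):
            return False
        if ent.get("type") not in ALLOWED_TYPES:
            return False  # hallucinated or unknown entity type
        if not isinstance(ent.get("value"), str):
            return False
    return True

# A card number reduced to its last four digits passes; non-JSON output fails.
print(is_conformant('{"entities": [{"type": "CARD_LAST4", "value": "1234"}]}'))  # True
print(is_conformant('not json at all'))                                          # False
```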
Hey, we missed the latency question in the title. Latency depends on your hardware; we release the model checkpoints, so you can easily benchmark them yourself. In general the small models will be much faster than the 685B model, but the exact numbers depend on your setup. A rough timing sketch is below.
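Here is a minimal timing sketch using Hugging Face transformers. The model ID, prompt, and generation settings are illustrative placeholders, not values from our repo; swap in the checkpoint you actually downloaded.

```python
# Minimal latency sketch: times generation for one of the released checkpoints.
# MODEL_ID, the prompt, and max_new_tokens are placeholders, not repo defaults.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"  # swap in the finetuned checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to(device)
model.eval()

prompt = "Redact the PII in: John Smith's card 4111 1111 1111 1234 was charged."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Warm-up run so the first measurement is not dominated by kernel/graph setup.
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=64)

# Time a handful of runs and report the mean per-request latency.
runs, total = 5, 0.0
for _ in range(runs):
    start = time.perf_counter()
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64)
    total += time.perf_counter() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"mean latency: {total / runs:.3f}s for ~{new_tokens} new tokens")
```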