Looking for latency benchmarks/results for the model

#1
by HaneetArya - opened

Are there any published benchmarks or has anyone run their own tests they could share?

distil labs org

Hey, sorry for the delay. You can see the benchmarks in our GitHub repo: https://github.com/distil-labs/Distil-PII .

Overall we have found that the finetuned models conform to the JSON schema, stop hallucinating extra entities, handle obfuscated inputs and numbers (while keeping the last four digits), and preserve non-PII operational tokens. Performance lifts are large across sizes, with the 1B and 3B students on par (within one standard deviation) with a 680B+ LLM-as-a-judge baseline. SmolLM2 is surprisingly resistant to training, but we are still releasing it for the sake of completeness.

| Model name | # parameters | LLM-as-a-judge metric |
|---|---|---|
| DeepSeek 3.1 (untrained) | 685B | 0.84 +/- 0.03 |
| Llama-3.2-3B-Instruct | 3B | 0.82 +/- 0.03 |
| Llama-3.2-1B-Instruct | 1B | 0.81 +/- 0.02 |
| gemma-3-270m-it | 270M | 0.73 +/- 0.07 |
| SmolLM2-135M-Instruct | 135M | 0.25 +/- 0.05 |
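If you want to verify the schema-conformance claim on your own outputs, a minimal check can be written in plain Python. The key set below is hypothetical, just to illustrate the idea; the actual output format is defined in the Distil-PII repo:

```python
import json

# Hypothetical entity keys -- the real schema used by Distil-PII may
# differ; see https://github.com/distil-labs/Distil-PII for the format.
EXPECTED_KEYS = {"entity", "type", "value"}

def conforms(raw: str) -> bool:
    """Return True if a model response is valid JSON (a list of entity
    dicts) and every entity carries exactly the expected keys -- i.e.
    no hallucinated extra fields and no missing ones."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, list):
        return False
    return all(isinstance(e, dict) and set(e) == EXPECTED_KEYS for e in data)

good = '[{"entity": "John Doe", "type": "NAME", "value": "John Doe"}]'
bad = '[{"entity": "John Doe", "extra": "hallucinated field"}]'
print(conforms(good))  # True
print(conforms(bad))   # False
```

Running a check like this over a held-out set gives a quick, judge-free sanity signal before computing the full LLM-as-a-judge metric.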
distil labs org

Hey, we missed the question about latency in the title. Latency depends on your hardware. We provide the pre-trained model checkpoints, so you can easily measure it yourself on your own setup. In general the small models will be much faster than the 685B model, but the details depend on your serving stack and hardware.
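A simple harness for measuring latency yourself could look like the sketch below. It times any callable that maps a prompt to a completion; the `dummy` function is a stand-in so the sketch runs anywhere, and in practice you would swap in a call to your deployed checkpoint (e.g. a `transformers` pipeline or an API client):

```python
import statistics
import time

def measure_latency(generate, prompt: str, warmup: int = 2, runs: int = 10):
    """Time repeated calls to `generate` and report wall-clock latency.

    `generate` is any callable taking a prompt string; warmup calls are
    discarded to exclude one-off costs such as model loading or JIT.
    """
    for _ in range(warmup):
        generate(prompt)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        times.append(time.perf_counter() - start)
    times.sort()
    return {
        "median_s": statistics.median(times),
        # nearest-rank p95 over the sorted samples
        "p95_s": times[int(0.95 * (len(times) - 1))],
    }

# Stand-in for a real model call so the sketch is self-contained:
dummy = lambda p: p.upper()
print(measure_latency(dummy, "Redact: call me at 555-0100"))
```

Running the same harness against the 1B and 3B checkpoints on your target hardware is the most reliable way to get numbers you can act on.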
