arxiv:2604.01193

Embarrassingly Simple Self-Distillation Improves Code Generation

Published on Apr 1

· Submitted by

Authors:

Abstract

Simple self-distillation improves code generation in large language models by fine-tuning on model-generated samples, effectively addressing precision-exploration trade-offs in decoding.

AI-generated summary

Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems, and it generalizes across Qwen and Llama models at 4B, 8B, and 30B scale, including both instruct and thinking variants. To understand why such a simple method can work, we trace these gains to a precision-exploration conflict in LLM decoding and show that SSD reshapes token distributions in a context-dependent way, suppressing distractor tails where precision matters while preserving useful diversity where exploration matters. Taken together, SSD offers a complementary post-training direction for improving LLM code generation.

View arXiv page View PDF GitHub 8 Add to collection

Community

urroxyz

about 7 hours ago

•

edited about 7 hours ago

This is awesome!

I wonder if some filtering could improve training metrics even beyond what you've found, such as using a cheap proxy model to disambiguate outputs as "helpful" and "not so helpful".

Correctness is not the same thing as usefulness. That is clear. So using an equivalent fewer-parameter model would be especially relevant.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2604.01193

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2604.01193 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2604.01193 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2604.01193 in a Space README.md to link it from this page.