OpenSonnet-Lite
A compact yet capable reasoning model. Built for everyday use, even on limited hardware.
Introduction
OpenSonnet-Lite is a lightweight language model fine-tuned from Qwen/Qwen3-4B-Thinking-2507, designed to deliver strong Chain-of-Thought (CoT) reasoning without demanding high-end resources. On reasoning tasks it approaches the performance of Claude Sonnet 4.6, a frontier commercial model, while remaining fully open weights and accessible.
One key improvement over the base model is the restoration of multi-turn reasoning. The original Qwen3-4B-Thinking-2507 loses its reasoning capability across multi-turn conversations due to chat template limitations (see Qwen's best practices). OpenSonnet-Lite addresses this directly through a corrected chat template, enabling consistent, coherent reasoning across long dialogues.
With the right prompt engineering, the model also handles complex tasks with consistently high-quality output across several domains.
If you need a quick demo, you can try this model for free. It runs on dual T4 GPUs using Kaggle Notebooks.
Model Overview
| Property | Value |
|---|---|
| Architecture | Causal Language Model |
| Total Parameters | 4.0B |
| Non-Embedding Parameters | 3.6B |
| Number of Layers | 36 |
| Attention Heads (GQA) | 32 for Q, 8 for KV |
| Native Context Length | 262,144 tokens |
Training
Infrastructure
| Resource | Details |
|---|---|
| GPU | NVIDIA B200 (180 GB VRAM) |
| Training Duration | 9 hours |
| Estimated Cost | $56.25 (Serverless) |
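At these figures, the implied serverless rate is $56.25 / 9 h = $6.25 per GPU-hour.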
Hyperparameters
The model was trained with supervised fine-tuning (SFT) using parameter-efficient methods to keep compute costs low while preserving quality. Key training parameters include:
| Parameter | Value |
|---|---|
| Maximum Sequence Length | 262,144 |
| Per Device Training Batch Size | 64 |
| Number of Training Epochs | 3 |
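As a rough illustration, the hyperparameters above map onto a TRL + PEFT training run as sketched below. The LoRA settings, dataset choice, and precision flags are assumptions for the sketch, not the published recipe; only the three values from the table are taken from this card.

```python
# Hypothetical parameter-efficient SFT setup with Hugging Face TRL + PEFT.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# One of the curated datasets listed in the next section (schema assumed
# to be chat-style and compatible with SFTTrainer's conversational format).
dataset = load_dataset("TeichAI/claude-4.5-opus-high-reasoning-250x", split="train")

peft_config = LoraConfig(          # adapter settings are illustrative assumptions
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="opensonnet-lite-sft",
    per_device_train_batch_size=64,  # from the table above
    num_train_epochs=3,              # from the table above
    max_seq_length=262144,           # from the table above (renamed to max_length in newer TRL)
    bf16=True,                       # assumption; the B200 supports bf16 natively
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B-Thinking-2507",  # the base model
    train_dataset=dataset,
    args=args,
    peft_config=peft_config,
)
trainer.train()
```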
Datasets
A total of 143,335 raw samples were collected across 11 curated datasets. After filtering out empty rows, duplicate CoT tags, and malformed examples, 140,765 samples (~140K) were used for the final training run. Filtering is fully automated by a dedicated script to prevent human error; a minimal sketch of this kind of filter follows the table.
| # | Dataset | Raw Samples | After Filtering |
|---|---|---|---|
| 1 | Roman1111111/claude-sonnet-4.6-100000X-filtered | 108,978 | 106,552 |
| 2 | TeichAI/lordx64-claude-opus-4.7-max-cleaned | 4,807 | 4,807 |
| 3 | Crownelius/Opus-4.6-Reasoning-3300x | 2,160 | 2,053 |
| 4 | TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | 250 |
| 5 | TeichAI/claude-haiku-4.5-high-reasoning-1700x | 1,688 | 1,688 |
| 6 | TeichAI/claude-sonnet-4.5-high-reasoning-250x | 247 | 247 |
| 7 | TeichAI/deepseek-v3.2-speciale-openr1-math-3k | 3,317 | 3,317 |
| 8 | TeichAI/deepseek-v3.2-speciale-1000x | 991 | 975 |
| 9 | Roman1111111/gemini-3-pro-10000x-hard-high-reasoning | 10,031 | 10,010 |
| 10 | Roman1111111/gemini-3.1-pro-hard-high-reasoning | 3,150 | 3,150 |
| 11 | Jackrong/DeepSeek-V4-Distill-8000x | 7,716 | 7,716 |
| | Total | 143,335 | 140,765 |
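The exact filtering script is not published with this card. As a rough illustration only, the checks described above might look like the following; the text field name and the <think>...</think> tag convention are assumptions:

```python
# Minimal sketch of an automated dataset filter (assumptions: a "text" field
# holding the full sample, and <think>...</think> as the CoT tag pair).
def is_valid(sample: dict) -> bool:
    text = (sample.get("text") or "").strip()
    if not text:
        return False  # drop empty rows
    if text.count("<think>") != text.count("</think>"):
        return False  # drop malformed examples with unbalanced CoT tags
    if text.count("<think>") > 1:
        return False  # drop examples containing duplicate CoT tags
    return True

raw_samples = [
    {"text": "<think>reason step by step</think> The answer is 4."},
    {"text": ""},                                    # empty: dropped
    {"text": "<think>a</think><think>b</think> x"},  # duplicate tags: dropped
]
filtered = [s for s in raw_samples if is_valid(s)]   # keeps only the first sample
```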
Inference Parameters
Update as of 2026-05-06: These are the stable inference parameters.
For best results, the following sampling configuration is recommended:
| Parameter | Recommended Value | Description |
|---|---|---|
| temperature | 1.0 | Controls randomness in generation |
| top_p | 0.95 | Nucleus sampling threshold |
| top_k | 20 | Top-k sampling parameter |
| min_p | 0 | Minimum probability threshold |
| repetition_penalty | 1.0 | Penalizes repeated tokens |
| presence_penalty | 1.0 | Encourages introducing new topics |
Max Tokens
Recommended max_new_tokens budgets by task size:
| Small Tasks | Medium Tasks | Large Tasks | Complex Tasks |
|---|---|---|---|
| 4096/8192 | 16384 | 32768/81920 | 131072 |
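For reference, here is a minimal sketch wiring the recommended values into vLLM; the serving stack is an assumption, and any OpenAI-compatible server exposes the same knobs:

```python
# Apply the recommended sampling configuration with vLLM (assumed stack).
from vllm import LLM, SamplingParams

sampling = SamplingParams(
    temperature=1.0,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
    repetition_penalty=1.0,
    presence_penalty=1.0,
    max_tokens=16384,  # "Medium Tasks" budget from the table above
)

llm = LLM(model="hadadxyz/OpenSonnet-Lite")
outputs = llm.generate(["Hello, who are you?"], sampling)
print(outputs[0].outputs[0].text)
```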
Instruction
The recommended system prompt:
You are OpenSonnet, a large language model trained by the Open Source community. You are based on the Qwen3 architecture.
You must think concisely, clearly, quickly, and in a direct manner.
Quickstart
```python
# pip install "transformers>=4.51.0"
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "hadadxyz/OpenSonnet-Lite"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Prepare the model input
instruction = (
    "You are OpenSonnet, a large language model trained by the Open Source "
    "community. You are based on the Qwen3 architecture.\n\n"
    "You must think concisely, clearly, quickly, and in a direct manner."
)
prompt = "Hello, who are you?"
messages = [
    {"role": "system", "content": instruction},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Parse the thinking content; 151668 is the token id of </think>
try:
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0  # no </think> found; treat everything as final content

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
print("thinking content:", thinking_content)
print("content:", content)
```
Bias, Risks, and Hallucinations
As with any language model, users should be aware of the following limitations before deploying OpenSonnet-Lite in production or sensitive contexts.
Bias: This model was fine-tuned on datasets distilled from several large commercial models. Any systemic biases present in those source models, including cultural, linguistic, or ideological tendencies, may be partially inherited. The model has not undergone dedicated bias auditing or alignment evaluation beyond standard SFT.
Hallucinations: OpenSonnet-Lite can and will generate plausible-sounding but factually incorrect information, particularly on niche topics, recent events, or highly specific technical domains. Extended Chain-of-Thought reasoning reduces this risk but does not eliminate it. Outputs should be verified against authoritative sources when accuracy is critical.
Risks: This is an open weights model with no built-in content filter or safety layer. It may produce outputs that are inappropriate, misleading, or harmful in certain contexts. Users and developers are solely responsible for implementing appropriate safeguards, usage policies, and monitoring when deploying this model in any application.
Use of this model implies acceptance of these limitations. It is intended as a research and general-purpose tool, not as a replacement for human judgment in high-stakes decisions.
Citation
If you use this model in your research or applications, please cite both this model and the base model:
```bibtex
@misc{opensonnet-lite,
  author = {hadadxyz},
  title = {OpenSonnet-Lite},
  year = {2026},
  url = {https://huggingface.co/hadadxyz/OpenSonnet-Lite}
}

@misc{qwen3-4b-thinking-2507,
  author = {Qwen Team},
  title = {Qwen3-4B-Thinking-2507},
  year = {2025},
  url = {https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507}
}
```
Acknowledgments
This model was made possible through the combination of multiple high-quality datasets from the community. We acknowledge and thank all dataset creators and the Qwen team for providing the excellent base model.