Introduction

A compact yet capable reasoning model. Built for everyday use, even on limited hardware.

OpenSonnet-Lite

OpenSonnet-Lite is a lightweight language model fine-tuned from Qwen/Qwen3-4B-Thinking-2507, designed to deliver strong Chain-of-Thought (CoT) reasoning without demanding high-end resources. On reasoning tasks it approaches the performance of Claude Sonnet 4.6, a frontier commercial model, while remaining fully open-weights and accessible.

One key improvement over the base model is the restoration of multi-turn reasoning. The original Qwen3-4B-Thinking-2507 loses its reasoning capability across multi-turn conversations due to chat template limitations (see Qwen's best practices). OpenSonnet-Lite addresses this directly through a corrected chat template, enabling consistent, coherent reasoning across long dialogues.
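With a corrected template, multi-turn use follows the usual Qwen-style convention: only the final answer of each assistant turn is carried into the history, never the thinking content. The helper below is a minimal sketch of that bookkeeping; it assumes the raw reply contains a complete `<think>...</think>` block and is independent of any particular chat template.

```python
import re

def strip_thinking(reply: str) -> str:
    """Remove <think>...</think> blocks so only the final answer
    is carried into the conversation history."""
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()

# Build a multi-turn history: append only the cleaned assistant reply,
# then add the next user turn. (The reply string here is illustrative.)
messages = [{"role": "user", "content": "Hello, who are you?"}]
reply = "<think>The user greets me; introduce myself.</think>I am OpenSonnet."
messages.append({"role": "assistant", "content": strip_thinking(reply)})
messages.append({"role": "user", "content": "What can you do?"})
```

Passing `messages` back through `tokenizer.apply_chat_template` then produces a history free of stale reasoning traces, letting the model generate fresh thinking for each new turn.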

With appropriate prompt engineering, the model also handles complex tasks reliably across several domains.

If you need a quick demo, you can try this model for free: it runs on dual T4 GPUs using Kaggle Notebooks.

Model Overview

| Property | Value |
|---|---|
| Architecture | Causal Language Model |
| Total Parameters | 4.0B |
| Non-Embedding Parameters | 3.6B |
| Number of Layers | 36 |
| Attention Heads (GQA) | 32 for Q, 8 for KV |
| Native Context Length | 262,144 tokens |

Training

Infrastructure

| Resource | Details |
|---|---|
| GPU | NVIDIA B200 (180 GB VRAM) |
| Training Duration | 9 hours |
| Estimated Cost | $56.25 (serverless) |

Hyperparameters

The model was trained with supervised fine-tuning (SFT) using parameter-efficient methods, balancing performance with computational efficiency. Key training parameters include:

| Parameter | Value |
|---|---|
| Maximum Sequence Length | 262,144 |
| Per Device Training Batch Size | 64 |
| Number of Training Epochs | 3 |
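As a rough sketch, the published hyperparameters map onto a standard PEFT setup as follows. The card does not publish the adapter settings, so the LoRA rank, alpha, and target modules below are illustrative placeholders, not the values actually used.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Hypothetical PEFT setup: rank, alpha, and target modules are
# illustrative placeholders, not the published configuration.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Published hyperparameters from the table above.
training_args = TrainingArguments(
    output_dir="opensonnet-lite-sft",
    per_device_train_batch_size=64,
    num_train_epochs=3,
    bf16=True,  # assumption: matches the released BF16 weights
)
# The maximum sequence length (262,144) is applied at the
# tokenization/packing stage by the SFT framework in use.
```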

Datasets

A total of 143,335 raw samples were collected across 11 curated datasets. After filtering to remove empty rows, duplicate CoT tags, and malformed examples, 140,765 samples (~140K) were used for the final training run. All filtering is fully automated using a dedicated script to prevent human error.
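The filtering criteria described above can be sketched as a small function. Field names and the exact malformed-tag check are assumptions for illustration; the actual script is not published.

```python
def clean_samples(samples):
    """Drop empty rows and samples with duplicated or unbalanced CoT tags.

    A sample is kept only if its 'text' field is non-empty and contains
    at most one matched <think>...</think> pair. The field name and tag
    checks are illustrative assumptions, not the published script.
    """
    cleaned = []
    for sample in samples:
        text = (sample.get("text") or "").strip()
        if not text:
            continue  # empty row
        opens, closes = text.count("<think>"), text.count("</think>")
        if opens != closes or opens > 1:
            continue  # duplicated or unbalanced CoT tags
        cleaned.append(sample)
    return cleaned
```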

Inference Parameters

Update as of 2026-05-06: These are the stable inference parameters.

For best results, the following sampling configuration is recommended:

| Parameter | Recommended Value | Description |
|---|---|---|
| temperature | 1.0 | Controls randomness in generation |
| top_p | 0.95 | Nucleus sampling threshold |
| top_k | 20 | Top-k sampling cutoff |
| min_p | 0 | Minimum probability threshold |
| repetition_penalty | 1.0 | Penalizes repeated tokens |
| presence_penalty | 1.0 | Encourages introducing new topics |

Max Tokens

| Small Tasks | Medium Tasks | Large Tasks | Complex Tasks |
|---|---|---|---|
| 4096/8192 | 16384 | 32768/81920 | 131072 |
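The recommended sampling configuration and token budgets can be collected into one helper, suitable for passing to an OpenAI-compatible or vLLM endpoint (note that `presence_penalty` is not a Hugging Face `generate()` argument). The task-tier mapping and the choice within each range (e.g. 8192 rather than 4096 for small tasks) are illustrative assumptions.

```python
# Recommended sampling parameters for OpenSonnet-Lite (from the tables above).
RECOMMENDED_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repetition_penalty": 1.0,
    "presence_penalty": 1.0,
}

# Illustrative assumption: one concrete max_tokens value per task tier.
MAX_TOKENS_BY_TASK = {
    "small": 8192,
    "medium": 16384,
    "large": 32768,
    "complex": 131072,
}

def sampling_params(task_size: str = "medium") -> dict:
    """Return sampling kwargs for an OpenAI-compatible or vLLM endpoint."""
    params = dict(RECOMMENDED_SAMPLING)
    params["max_tokens"] = MAX_TOKENS_BY_TASK[task_size]
    return params
```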

Instruction

You are OpenSonnet, a large language model trained by the Open Source community. You are based on the Qwen3 architecture.

You must think concisely, clearly, quickly, and in a direct manner.

Quickstart

# pip install "transformers>=4.51.0"
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "hadadxyz/OpenSonnet-Lite"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
instruction = "You are OpenSonnet, a large language model trained by the Open Source community. You are based on the Qwen3 architecture.\n\nYou must think concisely, clearly, quickly, and in a direct manner."

prompt = "Hello, who are you?"

messages = [
    {"role": "system", "content": instruction},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)

output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

Bias, Risks, and Hallucinations

As with any language model, users should be aware of the following limitations before deploying OpenSonnet-Lite in production or sensitive contexts.

  • Bias: This model was fine-tuned on datasets distilled from several large commercial models. Any systemic biases present in those source models, including cultural, linguistic, or ideological tendencies, may be partially inherited. The model has not undergone dedicated bias auditing or alignment evaluation beyond standard SFT.

  • Hallucinations: OpenSonnet-Lite can and will generate plausible-sounding but factually incorrect information, particularly on niche topics, recent events, or highly specific technical domains. Extended Chain-of-Thought reasoning reduces this risk but does not eliminate it. Outputs should be verified against authoritative sources when accuracy is critical.

  • Risks: This is an open weights model with no built-in content filter or safety layer. It may produce outputs that are inappropriate, misleading, or harmful in certain contexts. Users and developers are solely responsible for implementing appropriate safeguards, usage policies, and monitoring when deploying this model in any application.

Use of this model implies acceptance of these limitations. It is intended as a research and general-purpose tool, not as a replacement for human judgment in high-stakes decisions.

Citation

If you use this model in your research or applications, please cite both this model and the base model:

@misc{opensonnet-lite,
  author = {hadadxyz},
  title  = {OpenSonnet-Lite},
  year   = {2026},
  url    = {https://huggingface.co/hadadxyz/OpenSonnet-Lite}
}

Acknowledgments

This model was made possible through the combination of multiple high-quality datasets from the community. We acknowledge and thank all dataset creators and the Qwen team for providing the excellent base model.
