# OmniCoder-2-9B
The second generation of OmniCoder. Faster reasoning, dramatically less repetition, completely rebuilt training pipeline.
## What's New in v2
OmniCoder-2-9B is a ground-up rebuild of OmniCoder-9B, not a continuation. The entire training pipeline was redesigned to fix the core issues users reported with v1:
- No more repetition loops. v1 trained on ALL tokens (system prompts, tool outputs, templates), which taught the model to reproduce repetitive boilerplate. v2 trains only on assistant tokens. The model never learns to parrot templates, so it stays coherent through long multi-turn conversations.
- Faster, more focused reasoning. v1's `<think>` blocks were often bloated and circular. v2 produces tighter reasoning chains that get to the point faster, because it only learned from the actual reasoning, not the scaffolding around it.
- Much more stable in agentic loops. v1 would sometimes get stuck in repetitive tool-call cycles. v2 handles multi-step agentic tasks cleanly: it knows when to stop, when to switch tools, and when to give a final answer.
- Rebuilt training pipeline. Switched from all-token cosine-schedule training to assistant-only constant-LR training (based on Schulman's "LoRA Without Regret"). The model converges faster and doesn't suffer from premature LR decay that killed v1's learning.
TL;DR: OmniCoder-2-9B fixes the repetition and instability issues from v1. Same base model, same architecture, completely different training approach.
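The assistant-only loss described above amounts to label masking: every token outside an assistant turn gets the ignore index (-100 in PyTorch-style cross-entropy), so only assistant tokens contribute gradient. A minimal sketch with hypothetical token ids and per-token role tags:

```python
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def mask_non_assistant(token_ids, roles):
    """Return training labels: assistant tokens kept, everything else masked."""
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]

# Toy conversation: system prompt, user turn, assistant turn (hypothetical ids).
token_ids = [101, 102, 103, 201, 202, 301, 302, 303]
roles = ["system", "system", "system", "user", "user",
         "assistant", "assistant", "assistant"]

labels = mask_non_assistant(token_ids, roles)
print(labels)  # only the assistant tokens survive as loss targets
```

With this masking, system prompts, user messages, and tool outputs still appear in the context the model conditions on, but the model is never trained to reproduce them.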
## Benchmarks
| Benchmark | OmniCoder-2-9B | OmniCoder-9B | Qwen3.5-9B | GPT-OSS-120B | GLM 4.7 | Claude Haiku 4.5 |
|---|---|---|---|---|---|---|
| AIME 2025 (pass@5) | 90 | 90 | 91.6 | | | |
| GPQA Diamond (pass@1) | 83 | 83.8 | 81.7 | 71.5 | 73 | |
| GPQA Diamond (pass@3) | 86 | 86.4 | | | | |
| Terminal-Bench 2.0 | 25.8 | 23.6 | 14.6 | 33.4 | 27 | |
### Key Results
- GPQA Diamond pass@1: 83% (164/198). On par with OmniCoder-9B (83.8%) and Qwen3.5-9B base (81.7%). Pass@3 improves to 86%.
- Terminal-Bench 2.0: 25.8% (23/89). +2.2 points over OmniCoder-9B (23.6%), +76% over Qwen3.5-9B base (14.6%).
- AIME 2025 pass@5: 90% (27/30). Parity with OmniCoder-9B.
## Quickstart

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tesslate/OmniCoder-2-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to find the longest common subsequence of two strings."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
### vLLM

```bash
vllm serve Tesslate/OmniCoder-2-9B --tensor-parallel-size 1 --max-model-len 65536
```
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
    model="Tesslate/OmniCoder-2-9B",
    messages=[{"role": "user", "content": "Explain the difference between a mutex and a semaphore."}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```
### llama.cpp (GGUF)

```bash
llama-cli --hf-repo Tesslate/OmniCoder-2-9B-GGUF --hf-file omnicoder-2-9b-q4_k_m.gguf -p "Your prompt" -c 8192
```
## Training Details

### v1 vs v2 Pipeline

| | OmniCoder-9B (v1) | OmniCoder-2-9B (v2) |
|---|---|---|
| Loss computed on | All tokens (system, user, tool, assistant) | Assistant tokens only |
| LR schedule | Cosine (decayed to zero too early) | Constant with warmup |
| EOS training | Global | Per-turn (train_on_eos: turn) |
| Thinking blocks | Stripped by stock template | Preserved on all turns (custom Jinja2) |
| Repetition | Frequent loops in generation | Near-zero |
| Convergence | LR killed learning at step 60 | Clean convergence at step 80-100 |
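In Axolotl-style YAML, the v2 settings from the table look roughly like this (a sketch built from the fields named above; exact key names and accepted values may differ across Axolotl versions):

```yaml
# Sketch of the v2 training settings, not the exact config file
roles_to_train: [assistant]      # loss on assistant tokens only
train_on_eos: turn               # learn EOS per assistant turn, not globally
lr_scheduler: constant_with_warmup
learning_rate: 2.0e-4
warmup_steps: 10
sequence_len: 65536
sample_packing: true
```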
### Training Config

| Setting | Value |
|---|---|
| Base Model | Qwen3.5-9B |
| Method | LoRA SFT (r=64, alpha=32, all layers incl. MLP) |
| Dataset | 425K agentic trajectories from 5 sources |
| Loss | Assistant tokens only (roles_to_train: [assistant]) |
| Sequence Length | 65,536 tokens (sample packing) |
| LR Schedule | Constant with warmup (2e-4, 10 warmup steps) |
| Hardware | 4x NVIDIA H200 (DDP) |
| Framework | Axolotl |
| Precision | bf16 |
| Optimizer | AdamW (weight_decay=0.001) |
| Effective Batch | 32 |
| Steps | 350 (~10% of one epoch) |
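The numbers in the table cross-check: with 65,536-token packed sequences and an effective batch of 32, each step covers about 2.1M tokens, so 350 steps see roughly 0.73B tokens, which at ~10% of one epoch implies a packed dataset on the order of 7B tokens. A quick sanity check:

```python
seq_len = 65_536      # packed sequence length from the config table
effective_batch = 32  # effective batch size
steps = 350

tokens_per_step = seq_len * effective_batch
total_tokens = tokens_per_step * steps
print(f"{tokens_per_step:,} tokens/step, {total_tokens / 1e9:.2f}B tokens total")
```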
### Training Data Sources
| Source | Samples | Description |
|---|---|---|
| NVIDIA Nemotron-Terminal-Corpus | 226K | Terminal agent trajectories |
| CoderForge-Preview (reward >= 0.5) | 155K | SWE-bench style coding trajectories |
| Nemotron Skill-Based | 24K | Skill-based coding tasks |
| Scale-SWE | 20K | Real GitHub issue patches (synthesized trajectories) |
| Opus Reasoning | 2.3K | Chain-of-thought reasoning |
### Why Constant LR? (Schulman, "LoRA Without Regret")
v1 used a cosine LR schedule over 80 steps. By step 60 the learning rate had decayed to a small fraction of its peak, effectively halting learning before the model converged. Loss appeared to plateau at 0.45 but was actually still dropping.
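For intuition, a standard cosine decay over 80 steps has already given up most of its learning rate well before the end (a sketch of the textbook schedule, ignoring warmup):

```python
import math

def cosine_lr(step, total_steps, peak_lr):
    """Standard cosine decay from peak_lr to 0 over total_steps (no warmup)."""
    return 0.5 * peak_lr * (1 + math.cos(math.pi * step / total_steps))

peak = 2e-4
for step in (0, 40, 60, 75):
    print(step, f"{cosine_lr(step, 80, peak) / peak:.1%} of peak")
# By step 60 of 80, the rate is down to roughly 15% of peak.
```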
v2 follows Schulman et al.'s findings:
- Constant LR. No decay, no premature convergence death.
- LoRA LR ~10x FullFT. Our 2e-4 is correct for 9B-class models.
- LoRA on ALL layers including MLP. Attention-only LoRA significantly underperforms.
- Batch size 32. LoRA is less tolerant of large batches; 32 is the sweet spot.
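The bullets above can be made concrete. A LoRA adapter adds a low-rank update to each frozen weight, so the forward pass becomes y = Wx + (alpha/r) * B(Ax). A pure-Python sketch with toy dimensions (the release uses r=64, alpha=32 on all layers; the matrices here are hypothetical):

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, r, alpha):
    """y = W x + (alpha / r) * B (A x): frozen base weight plus low-rank update."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy dimensions: d=2, rank r=1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity here)
A = [[1.0, 1.0]]               # r x d down-projection (trainable)
B = [[0.5], [0.5]]             # d x r up-projection (trainable)
print(lora_forward([2.0, 4.0], W, A, B, r=1, alpha=1))  # [5.0, 7.0]
```

Applying adapters like this to every projection, MLP layers included, is what the "all layers" bullet refers to; only A and B are trained, which is why LoRA tolerates the higher learning rate.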
## Architecture
OmniCoder-2 inherits Qwen3.5-9B's hybrid architecture:
- Gated Delta Networks. Linear attention layers interleaved with standard attention for efficient long-range dependencies.
- VLM Backbone. Built on `Qwen3_5ForConditionalGeneration`.
- 262K Native Context. Full 262,144 token context window.
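As rough intuition for the delta-rule family of linear attention (a simplified sketch, not the exact Gated DeltaNet layer in Qwen3.5, and omitting its decay gate): the layer carries a matrix state S updated so that the value associated with key k is replaced rather than accumulated, which lets a fixed-size state serve long-range recall:

```python
def delta_step(S, k, v, beta=1.0):
    """Delta-rule update: S <- S - beta * (S k - v) k^T (k assumed unit-norm)."""
    Sk = [sum(S[i][j] * k[j] for j in range(len(k))) for i in range(len(v))]
    return [[S[i][j] - beta * (Sk[i] - v[i]) * k[j] for j in range(len(k))]
            for i in range(len(v))]

def read(S, q):
    """Output for query q is simply S q."""
    return [sum(S[i][j] * q[j] for j in range(len(q))) for i in range(len(S))]

d = 3
S = [[0.0] * d for _ in range(d)]
k = [1.0, 0.0, 0.0]      # unit-norm key
v = [2.0, 5.0, -1.0]     # value to associate with k
S = delta_step(S, k, v)
print(read(S, k))        # querying with the same key recovers v
```

Writing a new value under the same key overwrites the old association instead of adding to it, which is the "delta" part; the production layer adds learned gating and interleaves these layers with standard attention, per the bullet above.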
## Recommended Sampling Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Presence Penalty | 0.0 |
For agentic / tool-calling tasks, consider lower temperature (0.2-0.4) for more deterministic behavior.
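These parameters compose in a fixed order during sampling: scale logits by 1/temperature, keep the top_k most likely tokens, then keep the smallest prefix whose cumulative probability reaches top_p, and sample from the renormalized remainder. A minimal sketch of the filtering step (illustrative, not the vLLM/Transformers internals):

```python
import math

def filter_logits(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return (token_index, probability) pairs surviving top-k then top-p."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Top-k: keep the k most likely tokens.
    ranked = sorted(enumerate(probs), key=lambda ip: ip[1], reverse=True)[:top_k]
    # Top-p: keep the smallest prefix reaching cumulative probability top_p.
    kept, cum = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cum += p
        if cum >= top_p:
            break
    z = sum(p for _, p in kept)  # renormalize the survivors
    return [(idx, p / z) for idx, p in kept]

print(filter_logits([4.0, 3.0, 0.1, 0.05], top_k=3, top_p=0.9))
```

Lowering temperature sharpens the distribution before filtering, which is why 0.2-0.4 makes agentic tool calls more deterministic.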
## Limitations
- Performance on non-English tasks has not been extensively evaluated
- Tool-calling format is flexible but works best with the scaffolding patterns seen in training
## Acknowledgments
Special thanks to the Axolotl team and the discussion in axolotl#3453 for helping get Qwen3.5 packing support working.
## Citation

```bibtex
@misc{omnicoder2_2025,
  title={OmniCoder-2-9B: A Frontier Open Coding Agent},
  author={Tesslate},
  year={2025},
  url={https://huggingface.co/Tesslate/OmniCoder-2-9B}
}
```
Built by Tesslate