# OmniCoder-2-9B
The second generation of OmniCoder. Faster reasoning, dramatically less repetition, completely rebuilt training pipeline.
## What's New in v2
OmniCoder-2-9B is a ground-up rebuild of OmniCoder-9B, not a continuation. The entire training pipeline was redesigned to fix the core issues users reported with v1:
- No more repetition loops. v1 trained on ALL tokens (system prompts, tool outputs, templates), which taught the model to reproduce repetitive boilerplate. v2 trains only on assistant tokens. The model never learns to parrot templates, so it stays coherent through long multi-turn conversations.
- Faster, more focused reasoning. v1's `<think>` blocks were often bloated and circular. v2 produces tighter reasoning chains that get to the point faster, because it only learned from the actual reasoning, not the scaffolding around it.
- Much more stable in agentic loops. v1 would sometimes get stuck in repetitive tool-call cycles. v2 handles multi-step agentic tasks cleanly: it knows when to stop, when to switch tools, and when to give a final answer.
- Rebuilt training pipeline. Switched from all-token cosine-schedule training to assistant-only constant-LR training (based on Schulman's "LoRA Without Regret"). The model converges faster and doesn't suffer from premature LR decay that killed v1's learning.
TL;DR: OmniCoder-2-9B fixes the repetition and instability issues from v1. Same base model, same architecture, completely different training approach.
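The assistant-only loss described above amounts to label masking: every token outside an assistant turn gets the ignore index (-100 in PyTorch-style cross-entropy), so only assistant tokens contribute gradient. A minimal sketch with hypothetical token ids and per-token role tags:

```python
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def mask_non_assistant(token_ids, roles):
    """Return training labels: assistant tokens kept, everything else masked."""
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]

# Toy conversation: system prompt, user turn, assistant turn (hypothetical ids).
token_ids = [101, 102, 103, 201, 202, 301, 302, 303]
roles = ["system", "system", "system", "user", "user",
         "assistant", "assistant", "assistant"]

labels = mask_non_assistant(token_ids, roles)
print(labels)  # only the assistant tokens survive as loss targets
```

With this masking, system prompts, user messages, and tool outputs still appear in the context the model conditions on, but the model is never trained to reproduce them.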
## Benchmarks
| Benchmark | OmniCoder-2-9B | OmniCoder-9B | Qwen3.5-9B | GPT-OSS-120B | GLM 4.7 | Claude Haiku 4.5 |
|---|---|---|---|---|---|---|
| AIME 2025 (pass@5) | 90 | 90 | 91.6 | | | |
| GPQA Diamond (pass@1) | 83 | 83.8 | 81.7 | 71.5 | 73 | |
| GPQA Diamond (pass@3) | 86 | 86.4 | | | | |
| Terminal-Bench 2.0 | 25.8 | 23.6 | 14.6 | 33.4 | 27 | |
### Key Results
- GPQA Diamond pass@1: 83% (164/198). On par with OmniCoder-9B (83.8%) and Qwen3.5-9B base (81.7%). Pass@3 improves to 86%.
- Terminal-Bench 2.0: 25.8% (23/89). +2.2 points over OmniCoder-9B (23.6%), +76% over Qwen3.5-9B base (14.6%).
- AIME 2025 pass@5: 90% (27/30). Parity with OmniCoder-9B.
## Quickstart

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tesslate/OmniCoder-2-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to find the longest common subsequence of two strings."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
### vLLM

```bash
vllm serve Tesslate/OmniCoder-2-9B --tensor-parallel-size 1 --max-model-len 65536
```
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
    model="Tesslate/OmniCoder-2-9B",
    messages=[{"role": "user", "content": "Explain the difference between a mutex and a semaphore."}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```
### llama.cpp (GGUF)

```bash
llama-cli --hf-repo Tesslate/OmniCoder-2-9B-GGUF --hf-file omnicoder-2-9b-q4_k_m.gguf -p "Your prompt" -c 8192
```
## Training Details

### v1 vs v2 Pipeline

| | OmniCoder-9B (v1) | OmniCoder-2-9B (v2) |
|---|---|---|
| Loss computed on | All tokens (system, user, tool, assistant) | Assistant tokens only |
| LR schedule | Cosine (decayed to zero too early) | Constant with warmup |
| EOS training | Global | Per-turn (train_on_eos: turn) |
| Thinking blocks | Stripped by stock template | Preserved on all turns (custom Jinja2) |
| Repetition | Frequent loops in generation | Near-zero |
| Convergence | LR killed learning at step 60 | Clean convergence at step 80-100 |
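In Axolotl-style YAML, the v2 settings from the table look roughly like this (a sketch built from the fields named above; exact key names and accepted values may differ across Axolotl versions):

```yaml
# Sketch of the v2 training settings, not the exact config file
roles_to_train: [assistant]      # loss on assistant tokens only
train_on_eos: turn               # learn EOS per assistant turn, not globally
lr_scheduler: constant_with_warmup
learning_rate: 2.0e-4
warmup_steps: 10
sequence_len: 65536
sample_packing: true
```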
### Training Config

| Setting | Value |
|---|---|
| Base Model | Qwen3.5-9B |
| Method | LoRA SFT (r=64, alpha=32, all layers incl. MLP) |
| Dataset | 425K agentic trajectories from 5 sources |
| Loss | Assistant tokens only (roles_to_train: [assistant]) |
| Sequence Length | 65,536 tokens (sample packing) |
| LR Schedule | Constant with warmup (2e-4, 10 warmup steps) |
| Hardware | 4x NVIDIA H200 (DDP) |
| Framework | Axolotl |
| Precision | bf16 |
| Optimizer | AdamW (weight_decay=0.001) |
| Effective Batch | 32 |
| Steps | 350 (~10% of one epoch) |
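The numbers in the table cross-check: with 65,536-token packed sequences and an effective batch of 32, each step covers about 2.1M tokens, so 350 steps see roughly 0.73B tokens, which at ~10% of one epoch implies a packed dataset on the order of 7B tokens. A quick sanity check:

```python
seq_len = 65_536      # packed sequence length from the config table
effective_batch = 32  # effective batch size
steps = 350

tokens_per_step = seq_len * effective_batch
total_tokens = tokens_per_step * steps
print(f"{tokens_per_step:,} tokens/step, {total_tokens / 1e9:.2f}B tokens total")
```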
### Training Data Sources
| Source | Samples | Description |
|---|---|---|
| NVIDIA Nemotron-Terminal-Corpus | 226K | Terminal agent trajectories |
| CoderForge-Preview (reward >= 0.5) | 155K | SWE-bench style coding trajectories |
| Nemotron Skill-Based | 24K | Skill-based coding tasks |
| Scale-SWE | 20K | Real GitHub issue patches (synthesized trajectories) |
| Opus Reasoning | 2.3K | Chain-of-thought reasoning |
### Why Constant LR? (Schulman, "LoRA Without Regret")
v1 used a cosine LR schedule over 80 steps. By step 60 the learning rate had decayed to a small fraction of its peak, effectively halting learning before the model converged. Loss appeared to plateau at 0.45 but was actually still dropping.
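For intuition, a standard cosine decay over 80 steps has already given up most of its learning rate well before the end (a sketch of the textbook schedule, ignoring warmup):

```python
import math

def cosine_lr(step, total_steps, peak_lr):
    """Standard cosine decay from peak_lr to 0 over total_steps (no warmup)."""
    return 0.5 * peak_lr * (1 + math.cos(math.pi * step / total_steps))

peak = 2e-4
for step in (0, 40, 60, 75):
    print(step, f"{cosine_lr(step, 80, peak) / peak:.1%} of peak")
# By step 60 of 80, the rate is down to roughly 15% of peak.
```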
v2 follows Schulman et al.'s findings:
- Constant LR. No decay, no premature convergence death.
- LoRA LR ~10x FullFT. Our 2e-4 is correct for 9B-class models.
- LoRA on ALL layers including MLP. Attention-only LoRA significantly underperforms.
- Batch size 32. LoRA is less tolerant of large batches; 32 is the sweet spot.
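The bullets above can be made concrete. A LoRA adapter adds a low-rank update to each frozen weight, so the forward pass becomes y = Wx + (alpha/r) * B(Ax). A pure-Python sketch with toy dimensions (the release uses r=64, alpha=32 on all layers; the matrices here are hypothetical):

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, r, alpha):
    """y = W x + (alpha / r) * B (A x): frozen base weight plus low-rank update."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy dimensions: d=2, rank r=1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity here)
A = [[1.0, 1.0]]               # r x d down-projection (trainable)
B = [[0.5], [0.5]]             # d x r up-projection (trainable)
print(lora_forward([2.0, 4.0], W, A, B, r=1, alpha=1))  # [5.0, 7.0]
```

Applying adapters like this to every projection, MLP layers included, is what the "all layers" bullet refers to; only A and B are trained, which is why LoRA tolerates the higher learning rate.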
## Architecture
OmniCoder-2 inherits Qwen3.5-9B's hybrid architecture:
- Gated Delta Networks. Linear attention layers interleaved with standard attention for efficient long-range dependencies.
- VLM Backbone. Built on `Qwen3_5ForConditionalGeneration`.
- 262K Native Context. Full 262,144 token context window.
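As rough intuition for the delta-rule family of linear attention (a simplified sketch, not the exact Gated DeltaNet layer in Qwen3.5, and omitting its decay gate): the layer carries a matrix state S updated so that the value associated with key k is replaced rather than accumulated, which lets a fixed-size state serve long-range recall:

```python
def delta_step(S, k, v, beta=1.0):
    """Delta-rule update: S <- S - beta * (S k - v) k^T (k assumed unit-norm)."""
    Sk = [sum(S[i][j] * k[j] for j in range(len(k))) for i in range(len(v))]
    return [[S[i][j] - beta * (Sk[i] - v[i]) * k[j] for j in range(len(k))]
            for i in range(len(v))]

def read(S, q):
    """Output for query q is simply S q."""
    return [sum(S[i][j] * q[j] for j in range(len(q))) for i in range(len(S))]

d = 3
S = [[0.0] * d for _ in range(d)]
k = [1.0, 0.0, 0.0]      # unit-norm key
v = [2.0, 5.0, -1.0]     # value to associate with k
S = delta_step(S, k, v)
print(read(S, k))        # querying with the same key recovers v
```

Writing a new value under the same key overwrites the old association instead of adding to it, which is the "delta" part; the production layer adds learned gating and interleaves these layers with standard attention, per the bullet above.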
## Recommended Sampling Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Presence Penalty | 0.0 |
For agentic / tool-calling tasks, consider lower temperature (0.2-0.4) for more deterministic behavior.
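These parameters compose in a fixed order during sampling: scale logits by 1/temperature, keep the top_k most likely tokens, then keep the smallest prefix whose cumulative probability reaches top_p, and sample from the renormalized remainder. A minimal sketch of the filtering step (illustrative, not the vLLM/Transformers internals):

```python
import math

def filter_logits(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return (token_index, probability) pairs surviving top-k then top-p."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Top-k: keep the k most likely tokens.
    ranked = sorted(enumerate(probs), key=lambda ip: ip[1], reverse=True)[:top_k]
    # Top-p: keep the smallest prefix reaching cumulative probability top_p.
    kept, cum = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cum += p
        if cum >= top_p:
            break
    z = sum(p for _, p in kept)  # renormalize the survivors
    return [(idx, p / z) for idx, p in kept]

print(filter_logits([4.0, 3.0, 0.1, 0.05], top_k=3, top_p=0.9))
```

Lowering temperature sharpens the distribution before filtering, which is why 0.2-0.4 makes agentic tool calls more deterministic.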
## Limitations
- Performance on non-English tasks has not been extensively evaluated
- Tool-calling format is flexible but works best with the scaffolding patterns seen in training
## Acknowledgments
Special thanks to the Axolotl team and the discussion in axolotl#3453 for helping get Qwen3.5 packing support working.
## Citation

```bibtex
@misc{omnicoder2_2025,
  title={OmniCoder-2-9B: A Frontier Open Coding Agent},
  author={Tesslate},
  year={2025},
  url={https://huggingface.co/Tesslate/OmniCoder-2-9B}
}
```
Built by Tesslate