OmniCoder 2

OmniCoder-2-9B

The second generation of OmniCoder. Faster reasoning, dramatically less repetition, completely rebuilt training pipeline.



What's New in v2

OmniCoder-2-9B is a ground-up rebuild of OmniCoder-9B, not a continuation. The entire training pipeline was redesigned to fix the core issues users reported with v1:

  • No more repetition loops. v1 trained on ALL tokens (system prompts, tool outputs, templates), which taught the model to reproduce repetitive boilerplate. v2 trains only on assistant tokens. The model never learns to parrot templates, so it stays coherent through long multi-turn conversations.
  • Faster, more focused reasoning. v1's <think> blocks were often bloated and circular. v2 produces tighter reasoning chains that get to the point faster because it only learned from the actual reasoning, not the scaffolding around it.
  • Much more stable in agentic loops. v1 would sometimes get stuck in repetitive tool-call cycles. v2 handles multi-step agentic tasks cleanly. It knows when to stop, when to switch tools, and when to give a final answer.
  • Rebuilt training pipeline. Switched from all-token cosine-schedule training to assistant-only constant-LR training (based on Schulman's "LoRA Without Regret"). The model converges faster and doesn't suffer from premature LR decay that killed v1's learning.

TL;DR: OmniCoder-2-9B fixes the repetition and instability issues from v1. Same base model, same architecture, completely different training approach.
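The assistant-only loss described above comes down to label masking: every non-assistant token gets the ignore index, so it contributes nothing to the cross-entropy. A minimal sketch, illustrative only (`mask_non_assistant` and the parallel `roles` list are hypothetical, not Axolotl's actual implementation):

```python
# Illustrative sketch of assistant-only loss masking (not the actual
# Axolotl implementation). Labels for non-assistant tokens are set to
# -100, the index that cross-entropy loss ignores.
IGNORE_INDEX = -100

def mask_non_assistant(token_ids, roles):
    """token_ids: list of ints; roles: parallel list of role strings.

    Returns labels in which only assistant tokens contribute to the loss.
    """
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]

tokens = [101, 5, 6, 7, 8, 9, 102]
roles = ["system", "system", "user", "assistant", "assistant", "assistant", "assistant"]
labels = mask_non_assistant(tokens, roles)
# Only the assistant span keeps its token ids; everything else is -100.
```

With this masking, gradients flow only through assistant completions: system prompts, templates, and tool outputs remain context but are never prediction targets.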


Benchmarks

| Benchmark | OmniCoder-2-9B | OmniCoder-9B | Qwen3.5-9B | GPT-OSS-120B | GLM 4.7 | Claude Haiku 4.5 |
|---|---|---|---|---|---|---|
| AIME 2025 (pass@5) | 90 | 90 | 91.6 | – | – | – |
| GPQA Diamond (pass@1) | 83 | 83.8 | 81.7 | 71.5 | 73 | – |
| GPQA Diamond (pass@3) | 86 | 86.4 | – | – | – | – |
| Terminal-Bench 2.0 | 25.8 | 23.6 | 14.6 | 33.4 | 27 | – |

Key Results

  • GPQA Diamond pass@1: 83% (164/198). On par with OmniCoder-9B (83.8%) and Qwen3.5-9B base (81.7%). Pass@3 improves to 86%.
  • Terminal-Bench 2.0: 25.8% (23/89). +2.2 points over OmniCoder-9B (23.6%), +76% over Qwen3.5-9B base (14.6%).
  • AIME 2025 pass@5: 90% (27/30). Parity with OmniCoder-9B.

Quickstart

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tesslate/OmniCoder-2-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to find the longest common subsequence of two strings."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required; otherwise temperature/top_p/top_k are ignored
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=2048, temperature=0.6, top_p=0.95, top_k=20)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))

vLLM

Start the server:

vllm serve Tesslate/OmniCoder-2-9B --tensor-parallel-size 1 --max-model-len 65536

Then query it with the OpenAI-compatible client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
    model="Tesslate/OmniCoder-2-9B",
    messages=[{"role": "user", "content": "Explain the difference between a mutex and a semaphore."}],
    temperature=0.6,
)
print(response.choices[0].message.content)

llama.cpp (GGUF)

llama-cli --hf-repo Tesslate/OmniCoder-2-9B-GGUF --hf-file omnicoder-2-9b-q4_k_m.gguf -p "Your prompt" -c 8192

Training Details

v1 vs v2 Pipeline

| | OmniCoder-9B (v1) | OmniCoder-2-9B (v2) |
|---|---|---|
| Loss computed on | All tokens (system, user, tool, assistant) | Assistant tokens only |
| LR schedule | Cosine (decayed to zero too early) | Constant with warmup |
| EOS training | Global | Per-turn (train_on_eos: turn) |
| Thinking blocks | Stripped by stock template | Preserved on all turns (custom Jinja2) |
| Repetition | Frequent loops in generation | Near-zero |
| Convergence | LR killed learning at step 60 | Clean convergence at step 80-100 |

Training Config

| Setting | Value |
|---|---|
| Base Model | Qwen3.5-9B |
| Method | LoRA SFT (r=64, alpha=32, all layers incl. MLP) |
| Dataset | 425K agentic trajectories from 5 sources |
| Loss | Assistant tokens only (roles_to_train: [assistant]) |
| Sequence Length | 65,536 tokens (sample packing) |
| LR Schedule | Constant with warmup (2e-4, 10 warmup steps) |
| Hardware | 4x NVIDIA H200 (DDP) |
| Framework | Axolotl |
| Precision | bf16 |
| Optimizer | AdamW (weight_decay=0.001) |
| Effective Batch | 32 |
| Steps | 350 (~10% of one epoch) |
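For reference, the LoRA settings above map onto a PEFT config roughly like this. This is a sketch, not the actual Axolotl config; the Qwen-style projection module names are an assumption:

```python
# Hypothetical PEFT equivalent of the training config: r=64, alpha=32,
# adapters on all layers including the MLP (attention-only LoRA
# underperforms, per "LoRA Without Regret"). Module names follow the
# usual Qwen projection naming and are an assumption.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```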

Training Data Sources

| Source | Samples | Description |
|---|---|---|
| NVIDIA Nemotron-Terminal-Corpus | 226K | Terminal agent trajectories |
| CoderForge-Preview (reward >= 0.5) | 155K | SWE-bench style coding trajectories |
| Nemotron Skill-Based | 24K | Skill-based coding tasks |
| Scale-SWE | 20K | Real GitHub issue patches (synthesized trajectories) |
| Opus Reasoning | 2.3K | Chain-of-thought reasoning |

Why Constant LR? (Schulman "LoRA Without Regret")

v1 used a cosine LR schedule over 80 steps. The learning rate had decayed to near zero by step 60, killing learning before the model converged. Loss appeared to plateau at 0.45 but was actually still dropping.

v2 follows Schulman et al.'s findings:

  • Constant LR. No decay, no premature convergence death.
  • LoRA LR ~10x FullFT. Our 2e-4 is correct for 9B-class models.
  • LoRA on ALL layers including MLP. Attention-only LoRA significantly underperforms.
  • Batch size 32. LoRA is less tolerant of large batches; 32 is the sweet spot.
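The schedule difference is easy to see side by side. A sketch of both curves (the function names and v1's 80-step total are illustrative, using the 2e-4 peak and 10 warmup steps from the config):

```python
# Sketch of the two LR schedules. lr_at models v2's constant-with-warmup
# schedule; cosine_lr_at models v1's cosine decay for comparison.
import math

PEAK_LR = 2e-4      # LoRA LR, ~10x a typical full-FT rate
WARMUP_STEPS = 10

def lr_at(step):
    """v2: linear warmup to the peak, then constant forever."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    return PEAK_LR  # no decay, so learning never dies prematurely

def cosine_lr_at(step, total_steps=80):
    """v1: cosine decay over 80 steps, near zero well before the end."""
    step = min(step, total_steps)
    return PEAK_LR * 0.5 * (1 + math.cos(math.pi * step / total_steps))

# By step 60 the cosine schedule is down to ~15% of peak, while the
# constant schedule is still at full strength.
```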

Architecture

OmniCoder-2 inherits Qwen3.5-9B's hybrid architecture:

  • Gated Delta Networks. Linear attention layers interleaved with standard attention for efficient long-range dependencies.
  • VLM Backbone. Built on Qwen3_5ForConditionalGeneration.
  • 262K Native Context. Full 262,144 token context window.

Recommended Sampling Parameters

| Parameter | Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Presence Penalty | 0.0 |

For agentic / tool-calling tasks, consider lower temperature (0.2-0.4) for more deterministic behavior.
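These recommendations can be bundled into a small helper. The function name and the 0.3 agentic default are illustrative; the keys match the keyword arguments accepted by transformers' generate and vLLM's SamplingParams:

```python
# Hypothetical helper bundling the recommended sampling settings above.
def sampling_params(task="chat"):
    """Return recommended sampling kwargs for a given task type.

    Agentic / tool-calling runs get a lower temperature (0.2-0.4 range)
    for more deterministic behavior.
    """
    params = {"temperature": 0.6, "top_p": 0.95, "top_k": 20}
    if task == "agentic":
        params["temperature"] = 0.3  # illustrative midpoint of 0.2-0.4
    return params
```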


Limitations

  • Performance on non-English tasks has not been extensively evaluated
  • Tool-calling format is flexible but works best with the scaffolding patterns seen in training

Acknowledgments

Special thanks to the Axolotl team and the discussion in axolotl#3453 for helping get Qwen3.5 packing support working.


Citation

@misc{omnicoder2_2025,
  title={OmniCoder-2-9B: A Frontier Open Coding Agent},
  author={Tesslate},
  year={2025},
  url={https://huggingface.co/Tesslate/OmniCoder-2-9B}
}

Built by Tesslate
