# open-llama-3b-opus-reasoning-sft-2k-4bit

A merged, 4-bit quantized version of openlm-research/open_llama_3b_v2, fine-tuned on reasoning data that uses `<think>` tags.

## How it was made

1. Base: openlm-research/open_llama_3b_v2
2. QLoRA fine-tuning: a LoRA adapter trained on top of the quantized base
3. The LoRA adapter merged into the base weights at 16-bit precision
4. The merged model re-quantized to 4-bit NF4 (sketched below)
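
A minimal sketch of steps 3 and 4 using `peft` and `bitsandbytes`; the adapter ID is the one linked below, while the local paths and dtypes are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Step 3: load the 16-bit base and fold the LoRA adapter into its weights.
base = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b_v2", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(
    base, "ping98k/open-llama-3b-opus-reasoning-sft-lora-2k"
).merge_and_unload()
merged.save_pretrained("merged-16bit")  # illustrative local path

# Step 4: reload the merged weights with 4-bit NF4 quantization and save.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
quantized = AutoModelForCausalLM.from_pretrained(
    "merged-16bit", quantization_config=bnb
)
quantized.save_pretrained("open-llama-3b-opus-reasoning-sft-2k-4bit")
```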

## Training Data

- Crownelius/Opus-4.6-Reasoning-3300x (2,160 samples)
- Roman1111111/claude-opus-4.6-10000x (9,633 samples)
- 11,669 samples remained after filtering to ≤ 2024 tokens (see the sketch below)
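
A minimal sketch of the length filter, assuming each record exposes its full training text in a `text` column (the real column names may differ):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b_v2")

def short_enough(example):
    # Assumed: the full prompt + response lives in a "text" column.
    return len(tokenizer(example["text"]).input_ids) <= 2024

kept = 0
for repo in [
    "Crownelius/Opus-4.6-Reasoning-3300x",
    "Roman1111111/claude-opus-4.6-10000x",
]:
    ds = load_dataset(repo, split="train")
    kept += len(ds.filter(short_enough))
print(kept)  # 11,669 for this card's data mix
```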

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading this pre-quantized checkpoint requires bitsandbytes and accelerate.
model = AutoModelForCausalLM.from_pretrained(
    "ping98k/open-llama-3b-opus-reasoning-sft-2k-4bit", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("ping98k/open-llama-3b-opus-reasoning-sft-2k-4bit")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 10 * 5?"},
]
# enable_thinking=True asks the chat template to open a <think> reasoning block.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Keep special tokens so the <think> block stays visible in the decoded text.
print(tokenizer.decode(output[0], skip_special_tokens=False))
```
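
The decoded text contains the model's reasoning followed by its answer. A hypothetical post-processing step to separate the two, assuming the reply wraps its reasoning in `<think> ... </think>`:

```python
# Split the reasoning from the final answer (assumes a closing </think> tag).
text = tokenizer.decode(output[0], skip_special_tokens=False)
if "</think>" in text:
    reasoning, answer = text.split("</think>", 1)
    print(answer.strip())
else:
    print(text)
```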

## LoRA adapter

See ping98k/open-llama-3b-opus-reasoning-sft-lora-2k
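
If you prefer to work from the 16-bit base, the standalone adapter can be applied directly with `peft` instead of using this merged 4-bit checkpoint (a sketch; generation works as in the usage example above):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Apply the standalone LoRA adapter to the original base model.
base = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b_v2", device_map="auto"
)
model = PeftModel.from_pretrained(
    base, "ping98k/open-llama-3b-opus-reasoning-sft-lora-2k"
)
```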

