# lapa-instruct-bidi-grpo
Bidirectional English-Ukrainian translation LoRA adapter for lapa-v0.1.2-instruct (Gemma-3 12B).
Created by linearly combining the best direction-specific adapters:
- en->uk: Condition E (LLM judge + calibrated guardrails, step 300)
- uk->en: Condition A (chrF + BLEU, step 300)
Combined via `PeftModel.add_weighted_adapter(combination_type='linear', weights=[1.0, 1.0])`.
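The intent of a linear combination with unit weights is that the merged adapter's weight update is the sum of the two specialists' updates. A minimal numpy sketch of that arithmetic (toy matrices only — peft operates on the low-rank LoRA factors internally, so its `linear` rule is an approximation of this sum, not a literal implementation of it):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 16, 4, 8  # toy dimensions; the real adapter uses r=128, alpha=256

# Hypothetical LoRA factors for the two specialist adapters.
# Each adapter's effective weight update is scale * B @ A, with scale = alpha / r.
scale = alpha / r
A_en_uk, B_en_uk = rng.normal(size=(r, d)), rng.normal(size=(d, r))
A_uk_en, B_uk_en = rng.normal(size=(r, d)), rng.normal(size=(d, r))

delta_en_uk = scale * B_en_uk @ A_en_uk
delta_uk_en = scale * B_uk_en @ A_uk_en

# combination_type='linear' with weights=[1.0, 1.0] targets this sum:
delta_bidi = 1.0 * delta_en_uk + 1.0 * delta_uk_en
assert delta_bidi.shape == (d, d)
```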
## Results (FLoRes+ devtest)
| Direction | BLEU | chrF | Mistral Judge | Aya Judge |
|---|---|---|---|---|
| en->uk | 33.88 | 61.54 | 83.87 | 91.94 |
| uk->en | 43.28 | 67.97 | 88.59 | 94.32 |
## Results (WMT24)
| Direction | BLEU | chrF | Mistral Judge | Aya Judge |
|---|---|---|---|---|
| en->uk | 31.37 | 57.15 | 80.31 | 88.88 |
| uk->en | 35.69 | 60.71 | 84.62 | 91.75 |
## Comparison to Baseline and Specialists
| Benchmark | Baseline | en->uk specialist (E) | uk->en specialist (A) | Bidi |
|---|---|---|---|---|
| FLoRes en->uk BLEU | 33.44 | 34.02 | 33.80 | 33.88 |
| FLoRes uk->en BLEU | 42.02 | 42.30 | 43.15 | 43.28 |
| WMT24 en->uk BLEU | 31.12 | 31.24 | 30.93 | 31.37 |
| WMT24 uk->en BLEU | 34.60 | 34.78 | 35.75 | 35.69 |
The bidi adapter improves over the baseline on all four benchmarks and retains most of each specialist's gains in its own direction.
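For reference, chrF — used both in the Condition A reward (chrF + BLEU) and as an evaluation metric above — averages character n-gram precision and recall and combines them into a recall-weighted F-score. A simplified stdlib-only sketch (illustrative, not sacrebleu's implementation; the reported scores come from sacrebleu):

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Character n-gram counts; whitespace is removed, as chrF does by default."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average n-gram precision/recall, recall-weighted F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # sentence too short for this n-gram order
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# Identical strings score 100; disjoint strings score 0.
print(round(chrf("Сьогодні гарна погода.", "Сьогодні гарна погода."), 2))  # 100.0
```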
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "lapa-llm/lapa-v0.1.2-instruct",
    device_map="auto",
    dtype="bfloat16",
)
model = PeftModel.from_pretrained(
    base_model,
    "iamthewalrus67/lapa-instruct-bidi-grpo",
    adapter_name="bidi",
    subfolder="bidi",
)
tokenizer = AutoTokenizer.from_pretrained("lapa-llm/lapa-v0.1.2-instruct")

messages = [{"role": "user", "content": "You are a professional translator. You give only the translated text and nothing else. Translate the following text into Ukrainian:\nThe weather is nice today."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# top_k=1 makes sampling effectively greedy, so temperature has little effect here.
output = model.generate(input_ids, max_new_tokens=256, temperature=0.1, top_k=1, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```
## Training Details
- Base model: lapa-v0.1.2-instruct (Gemma-3 12B, Ukrainian-specialized)
- Method: GRPO with direction-specific reward configurations
- LoRA: r=128, alpha=256, target modules q_proj/k_proj/v_proj/o_proj
- Data: WikiMatrix en-uk (132K pairs)
- Training: 300 steps, DeepSpeed ZeRO-2, 4x RTX 6000 Ada
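The hyperparameters above map onto a peft `LoraConfig` roughly as follows — a sketch only, assuming the standard peft API; the dropout value is not stated in this card and is a placeholder:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,                 # LoRA rank
    lora_alpha=256,        # scaling: alpha / r = 2.0
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,      # assumption: not specified in the card
    task_type="CAUSAL_LM",
)
```

See the reward-driven-translation repository for the exact configuration.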
See the reward-driven-translation repository for full reproduction code.
## Model tree for iamthewalrus67/lapa-instruct-bidi-grpo

Base model lineage: google/gemma-3-12b-pt → lapa-llm/lapa-12b-pt → lapa-llm/lapa-v0.1.2-instruct