STIP Policy Label Validator (LLM-as-Judge)
Model Summary
This model is a fine-tuned large language model used as a label validator for Science, Technology and Innovation (STI) policy monitoring in an OECD STIP Compass–style setting.
Given:
- a policy label (code, title, definition), and
- a set of evidence sentences extracted from a policy initiative web page,
the model answers strictly with "Yes" or "No" to indicate whether the evidence supports that label.
Core characteristics:
- Task: Binary validation ("Yes"/"No") of STI policy labels
- Domain: STI policy monitoring (Policy Instruments, Target Groups, Themes)
- Input: A prompt with label metadata + evidence sentences
- Output: "Yes" or "No" as the first generated token(s) after the answer cue
- Fine-tuning: Parameter-efficient (LoRA/QLoRA) on (label, evidence-cluster) pairs
⚠️ This model is designed as a decision-support component in a human-in-the-loop pipeline, not as a fully automated policy encoder.
Intended Use
Primary Use
The model is intended to act as a validator in a hybrid semantic–generative workflow for STI policy monitoring:
- A multilingual sentence encoder retrieves evidence clusters linking policy labels to web-scraped initiative text (a minimal sketch of this step follows below).
- This model then takes a single (label, evidence-cluster) pair and decides whether the label is truly instantiated in the initiative.
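A minimal sketch of the retrieval step, assuming a sentence-transformers encoder; the encoder name and the top-k cut-off are illustrative assumptions, not the exact components behind this model:

```python
# Hypothetical retrieval step: pick the sentences most similar to a label.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")  # assumed encoder

label_text = "Label code - label title - label definition..."
page_sentences = ["Sent 1...", "Sent 2...", "Sent 3..."]  # scraped initiative text

label_emb = encoder.encode(label_text, convert_to_tensor=True)
sent_embs = encoder.encode(page_sentences, convert_to_tensor=True)

# Rank sentences by cosine similarity and keep the top ones as the evidence cluster.
scores = util.cos_sim(label_emb, sent_embs)[0]
top = scores.topk(k=min(2, len(page_sentences)))
evidence_sentences = [page_sentences[i] for i in top.indices]
# `evidence_sentences` then feeds the validator prompt shown under "How to Use".
```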
Typical use cases:
- Semi-automatic coding of policy initiatives into a structured taxonomy (e.g., STIP Compass PI/TG/TH labels).
- Automatic checking of candidate labels proposed by a clustering or retrieval model.
- Supporting policy analysts with ranked, evidence-backed suggestions rather than manual coding from scratch.
Who is this for?
- Policy analysts and STI monitoring teams
- Researchers working on policy text mining
- Data engineers building label-assistance tools for institutional policy data
Data and Training
Training Data (High-Level)
The model was fine-tuned on a dataset of institutional policy web pages and corresponding multi-label annotations derived from a structured STI taxonomy (e.g., Policy Instruments, Target Groups, and Themes).
- Text consists of:
- Policy initiative descriptions
- Objectives
- Associated initiative web pages
- Labels are:
- Multi-label, document-level policy codes (PI/TG/TH)
- Defined by a code, a title, and a natural language definition
Each example is labeled:
- "Yes" if the label is in the gold annotation for that initiative
- "No" otherwise
Training Procedure (Summary)
The model is fine-tuned using parameter-efficient fine-tuning (LoRA or QLoRA):
- Training objective: next-token prediction constrained to "Yes"/"No"
- Batch size: 2
- Max sequence length: 2048
- Learning rate: 2e-4
- Epochs: around 6 (with early stopping on dev F1)
- Label-frequency thresholds were applied to avoid overfitting to very rare labels
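For orientation, a QLoRA setup consistent with these hyperparameters might look as follows; only the batch size, learning rate, epochs, and sequence length come from this card, while the base checkpoint, 4-bit settings, and LoRA rank/alpha/target modules are assumptions:

```python
# Hypothetical QLoRA configuration sketch (transformers + peft + bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "meta-llama/Llama-3.1-70B-Instruct"  # assumed base checkpoint

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,  # assumed LoRA hyperparameters
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
)

args = TrainingArguments(
    per_device_train_batch_size=2,  # from this card
    learning_rate=2e-4,             # from this card
    num_train_epochs=6,             # early stopping on dev F1 in practice
    output_dir="qlora-label-validator",
)
# Examples are tokenized to at most 2048 tokens; the loss is standard
# next-token prediction, with supervision on the "Yes"/"No" answer tokens.
```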
How to Use
Below is a minimal example using `transformers` and `peft`.
You only need to change `model_id` to your model repo name.
### 1. Install dependencies
```bash
pip install transformers peft accelerate sentencepiece
```
### 2. Load the model and tokenizer
```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Change this to your model repo name
model_id = "vtt-qsts-ai/Llama-3.1-70B-Inst-qlora-orig_label-validator"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)
model.eval()
```
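If GPU memory is tight (the base model has 70B parameters), the adapter can also be loaded on top of a 4-bit quantized base. This variant is a sketch; the exact quantization flags are assumptions:

```python
# Optional: load the base model in 4-bit to reduce memory (assumed settings).
# Requires: pip install bitsandbytes
import torch
from transformers import BitsAndBytesConfig

model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
```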
### 3. Build the prompt
```python
label_code = "Label..."
label_title = "title..."
label_def = "Label Def..."
evidence_sentences = [
    "Sent 1...",
    "Sent 2...",
]

bullets = "\n".join(f"- {s}" for s in evidence_sentences)

prompt = f"""You are an expert in Science, Technology and Innovation (STI) policy.
Decide whether the evidence sentences describe the following policy label.
Label: {label_code} - {label_title} - {label_def}
Evidence sentences:
{bullets}
Answer strictly with 'Yes' or 'No'. Return 'Yes' only if the evidence clearly
matches the definition of the label, return 'No' otherwise.
Answer:"""
```
### 4. Get a Yes/No decision
```python
import torch

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2,
        do_sample=False,  # greedy decoding, so no temperature is needed
    )

# Decode only the tokens generated after the prompt.
generated = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
).strip()

print("Model answer:", generated)  # Expected: "Yes" or "No"
```
Citation
Coming soon...
Base model: meta-llama/Llama-3.1-70B