STIP Policy Label Validator (LLM-as-Judge)
Model Summary
This model is a fine-tuned large language model used as a label validator for Science, Technology and Innovation (STI) policy monitoring in an OECD STIP Compass–style setting.
Given:
- a policy label (code, title, definition), and
- a set of evidence sentences extracted from a policy initiative web page,
the model answers strictly with "Yes" or "No" to indicate whether the evidence supports that label.
Core characteristics:
- Task: Binary validation ("Yes"/"No") of STI policy labels
- Domain: STI policy monitoring (Policy Instruments, Target Groups, Themes)
- Input: A prompt with label metadata + evidence sentences
- Output: "Yes" or "No" as the first generated token(s) after the answer cue
- Fine-tuning: Parameter-efficient (LoRA/QLoRA) on (label, evidence-cluster) pairs
⚠️ This model is designed as a decision-support component in a human-in-the-loop pipeline, not as a fully automated policy encoder.
Intended Use
Primary Use
The model is intended to act as a validator in a hybrid semantic–generative workflow for STI policy monitoring:
- A multilingual sentence encoder retrieves evidence clusters linking policy labels to web-scraped initiative text (a minimal sketch of this step follows below).
- This model then takes a single (label, evidence-cluster) pair and decides whether the label is truly instantiated in the initiative.
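A minimal sketch of the retrieval step, assuming a sentence-transformers encoder; the encoder name and the top-k cut-off are illustrative assumptions, not the exact components behind this model:

```python
# Hypothetical retrieval step: pick the sentences most similar to a label.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")  # assumed encoder

label_text = "Label code - label title - label definition..."
page_sentences = ["Sent 1...", "Sent 2...", "Sent 3..."]  # scraped initiative text

label_emb = encoder.encode(label_text, convert_to_tensor=True)
sent_embs = encoder.encode(page_sentences, convert_to_tensor=True)

# Rank sentences by cosine similarity and keep the top ones as the evidence cluster.
scores = util.cos_sim(label_emb, sent_embs)[0]
top = scores.topk(k=min(2, len(page_sentences)))
evidence_sentences = [page_sentences[i] for i in top.indices]
# `evidence_sentences` then feeds the validator prompt shown under "How to Use".
```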
Typical use cases:
- Semi-automatic coding of policy initiatives into a structured taxonomy (e.g., STIP Compass PI/TG/TH labels).
- Automatic checking of candidate labels proposed by a clustering or retrieval model.
- Supporting policy analysts with ranked, evidence-backed suggestions rather than manual coding from scratch.
Who is this for?
- Policy analysts and STI monitoring teams
- Researchers working on policy text mining
- Data engineers building label-assistance tools for institutional policy data
Data and Training
Training Data (High-Level)
The model was fine-tuned on a dataset of institutional policy web pages and corresponding multi-label annotations derived from a structured STI taxonomy (e.g., Policy Instruments, Target Groups, and Themes).
- Text consists of:
- Policy initiative descriptions
- Objectives
- Associated initiative web pages
- Labels are:
- Multi-label, document-level policy codes (PI/TG/TH)
- Defined by a code, a title, and a natural language definition
Each example is labeled:
- "Yes" if the label is in the gold annotation for that initiative
- "No" otherwise
Training Procedure (Summary)
The model is fine-tuned using parameter-efficient fine-tuning (LoRA or QLoRA):
- Training objective: next-token prediction constrained to "Yes"/"No"
- Batch size: 2
- Max sequence length: 2048
- Learning rate: 2e-4
- Epochs: around 6 (with early stopping on dev F1)
- Label-frequency thresholds were applied to avoid overfitting to very rare labels
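For orientation, a QLoRA setup consistent with these hyperparameters might look as follows; only the batch size, learning rate, epochs, and sequence length come from this card, while the base checkpoint, 4-bit settings, and LoRA rank/alpha/target modules are assumptions:

```python
# Hypothetical QLoRA configuration sketch (transformers + peft + bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "meta-llama/Llama-3.1-70B-Instruct"  # assumed base checkpoint

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,  # assumed LoRA hyperparameters
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
)

args = TrainingArguments(
    per_device_train_batch_size=2,  # from this card
    learning_rate=2e-4,             # from this card
    num_train_epochs=6,             # early stopping on dev F1 in practice
    output_dir="qlora-label-validator",
)
# Examples are tokenized to at most 2048 tokens; the loss is standard
# next-token prediction, with supervision on the "Yes"/"No" answer tokens.
```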
How to Use
Below is a minimal example using `transformers` and `peft`.
You only need to change `model_id` to your model repo name.
### 1. Install dependencies
```bash
pip install transformers peft accelerate sentencepiece
```
### 2. Load the model and tokenizer
```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Change this to your model repo name
model_id = "vtt-qsts-ai/Llama-3.1-70B-Inst-qlora-orig_label-validator"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)
model.eval()
```
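If GPU memory is tight (the base model has 70B parameters), the adapter can also be loaded on top of a 4-bit quantized base. This variant is a sketch; the exact quantization flags are assumptions:

```python
# Optional: load the base model in 4-bit to reduce memory (assumed settings).
# Requires: pip install bitsandbytes
import torch
from transformers import BitsAndBytesConfig

model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
```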
### 3. Build the prompt
```python
label_code = "Label..."
label_title = "title..."
label_def = "Label Def..."
evidence_sentences = [
    "Sent 1...",
    "Sent 2...",
]

bullets = "\n".join(f"- {s}" for s in evidence_sentences)

prompt = f"""You are an expert in Science, Technology and Innovation (STI) policy.
Decide whether the evidence sentences describe the following policy label.
Label: {label_code} - {label_title} - {label_def}
Evidence sentences:
{bullets}
Answer strictly with 'Yes' or 'No'. Return 'Yes' only if the evidence clearly
matches the definition of the label, return 'No' otherwise.
Answer:"""
```
### 4. Get a Yes/No decision
```python
import torch

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2,
        do_sample=False,  # greedy decoding, so no temperature is needed
    )

# Decode only the tokens generated after the prompt.
generated = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
).strip()

print("Model answer:", generated)  # Expected: "Yes" or "No"
```
Citation
Coming soon...
Base model: meta-llama/Llama-3.1-70B