RegTech-4B-Instruct

Fine-tuned for RAG-powered banking compliance — not general knowledge.

A specialized Qwen3-4B-Instruct model fine-tuned to excel within a Retrieval-Augmented Generation (RAG) pipeline for Italian banking regulatory compliance.

This model doesn't try to memorize regulations — it's trained to work with retrieved context: follow instructions precisely, produce structured outputs, call compliance tools, resist hallucinations, and maintain a professional tone when grounded on regulatory documents.


What This Model Does

This fine-tuning optimizes the model's behavior within a RAG system, not its factual knowledge. Specifically:

Task                      Description
RAG Q&A                   Answer regulatory questions grounded on retrieved documents
Tool Calling              KYC verification, risk scoring, PEP checks, SOS (suspicious transaction) reporting
Query Expansion           Rewrite user queries with regulatory terminology for better retrieval
Intent Detection          Classify if a message needs document search or is conversational
Document Reranking        Score candidate documents by relevance
Structured JSON           Topic extraction, metadata, impact analysis in JSON format
Impact Analysis           Cross-reference external regulations against internal bank procedures
Hallucination Resistance  Refuse to fabricate regulations, articles, or sanctions not in context

Evaluation

Methodology

We evaluate all fine-tuned models using a dynamic adversarial benchmark designed to prevent overfitting to static test sets:

  • Test generation: An independent LLM generates novel, realistic test scenarios across 13 compliance-specific categories for each evaluation run. Tests are never reused.
  • Blind comparison: Both the base and fine-tuned model respond to identical prompts. Responses are anonymized and randomly swapped before judging to eliminate position bias.
  • Expert judging: A frontier-class LLM acts as a domain-expert judge, scoring each response on 7 criteria (accuracy, context adherence, hallucination resistance, format, tone, instruction following, completeness) on a 1–5 scale.
  • Statistical robustness: Each evaluation consists of multiple independent loops with fresh test sets, ensuring results are consistent and not artifacts of a single test batch.

This approach produces a rigorous, reproducible assessment that closely mirrors real-world compliance assistant performance.
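
As an illustration of the blind-comparison step, here is a minimal Python sketch; generate_answer and judge_pair are hypothetical helpers standing in for the unpublished evaluation harness:

import random

def blind_compare(prompt, base_model, tuned_model):
    # Collect one response per model, then shuffle so the judge cannot
    # infer which model produced which answer (position-bias control).
    answers = [
        ("base", generate_answer(base_model, prompt)),    # hypothetical helper
        ("tuned", generate_answer(tuned_model, prompt)),  # hypothetical helper
    ]
    random.shuffle(answers)
    # The judge sees only the anonymous slots "A" and "B".
    verdict = judge_pair(prompt, response_a=answers[0][1], response_b=answers[1][1])
    return {"A": answers[0][0], "B": answers[1][0], "tie": "tie"}[verdict]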

Results — RegTech-4B-Instruct

Evaluated across 73 blind adversarial tests over 3 independent loops.

Head-to-Head vs Base Model

                  Base    Tuned
Win Rate (adj.)   45.2%   54.8%
Wins              26      33
Ties (shared)     14
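
"Adj." counts each of the 14 ties as half a win for both sides: tuned = (33 + 14/2) / 73 ≈ 54.8%, base = (26 + 7) / 73 ≈ 45.2%, consistent with the per-loop percentages below.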

Quality Scores (1–5 scale)

Criterion                 Base   Tuned   Delta   Verdict
Hallucination Resistance  3.53   3.89    +0.36   Improved
Tone & Professionalism    3.90   4.27    +0.37   Improved
Output Format             3.41   3.75    +0.34   Improved
Instruction Following     3.14   3.44    +0.30   Improved
Accuracy                  3.34   3.59    +0.25   Improved
Context Adherence         3.66   3.89    +0.23   Improved
Completeness              3.45   3.23    -0.22   Trade-off
Overall                   3.49   3.72    +0.23   Improved

Key Safety Improvements

The fine-tuned model demonstrates measurably safer behavior in high-stakes regulatory scenarios:

  • Hallucination traps: The tuned model correctly refuses fabricated regulations in all tested scenarios, whereas the base model invents plausible-sounding but entirely fictional legal articles and sanctions.
  • Credential protection: When exposed to prompt-injection attacks containing embedded credentials, the tuned model refuses disclosure; the base model has been observed leaking credentials verbatim.
  • Professional tone: Eliminates emoji usage and filler phrases ("Certo!" / "Sure!", "Ottima domanda!" / "Great question!") that are inappropriate in regulatory communications.

Known Limitations

  • Completeness trade-off (-0.22): The model tends toward concise, precise answers; for tasks requiring exhaustive analysis, responses may be shorter than ideal.
  • Query Expansion: Performance on query-rewriting tasks is below that of the base model. This is a known gap being addressed through dataset improvements.
  • Inference speed: ~40% faster than the base model (4.3 s vs 7.0 s average), primarily a side effect of the more concise outputs noted above.

Consistency Across Loops

Loop   Base Wins   Tuned Wins   Ties   Tuned % (adj.)
1      7           13           5      62.0%
2      11          10           2      47.8%
3      8           10           7      54.0%

The tuned model wins outright in Loops 1 and 3 and narrowly trails the base model (10 wins to 11) in Loop 2.


Usage Examples

RAG Q&A — Answering from Retrieved Context

messages = [
    {
        "role": "system",
        # System prompt (Italian): "You are a banking compliance assistant.
        # Answer ONLY based on the provided context." The retrieved context
        # reproduces the Art. 92 CRR minimum capital requirements.
        "content": """Sei un assistente per la compliance bancaria.
Rispondi SOLO basandoti sul contesto fornito.

<contesto_recuperato>
Art. 92 CRR - Gli enti soddisfano in qualsiasi momento i seguenti
requisiti: a) CET1 del 4,5%; b) Tier 1 del 6%; c) capitale totale dell'8%.
</contesto_recuperato>"""
    },
    {
        "role": "user",
        # "What are the minimum capital requirements under the CRR?"
        "content": "Quali sono i requisiti minimi di capitale secondo il CRR?"
    }
]
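
The expected behavior here (illustrative, not a captured transcript) is an answer limited to the three ratios present in the retrieved context (CET1 4.5%, Tier 1 6%, total capital 8%), with no additional requirements invented.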

Tool Calling — Compliance Workflows

messages = [
    {
        "role": "system",
        # System prompt (Italian): "You are an operational compliance
        # assistant." Tool parameter schemas are elided ({...}) for brevity.
        # The retrieved procedure AML-003 mandates enhanced due diligence
        # (EDD) for PEPs, high-risk countries, and risk scores above 60.
        "content": """Sei un assistente operativo per la compliance.

<tools>
{"name": "calcola_scoring_rischio", "parameters": {...}}
{"name": "controlla_liste_pep", "parameters": {...}}
{"name": "verifica_kyc", "parameters": {...}}
</tools>

<contesto_recuperato>
Procedura AML-003: L'adeguata verifica rafforzata (EDD) deve essere
applicata per PEP, paesi ad alto rischio e profili con scoring > 60.
</contesto_recuperato>"""
    },
    {
        "role": "user",
        # "I need to open an account for a company based in Dubai. The legal
        # representative is Mr. Al-Rashid."
        "content": "Devo aprire un conto per una società con sede a Dubai. Il legale rappresentante è il sig. Al-Rashid."
    }
]
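
With Qwen3's chat template, a tool invocation is typically emitted as a <tool_call> block containing the function name and JSON arguments. The shape below is illustrative, with the argument payload elided just as in the tool definitions above:

<tool_call>
{"name": "controlla_liste_pep", "arguments": {...}}
</tool_call>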

Query Expansion — Improving RAG Retrieval

messages = [
    {
        "role": "system",
        # (Italian) "Rewrite the user's query to improve document retrieval.
        # Add technical terms and regulatory references. Respond ONLY with
        # the JSON."
        "content": "Riscrivi la query dell'utente per migliorare il recupero documentale. Aggiungi termini tecnici e riferimenti normativi. Rispondi SOLO con il JSON."
    },
    {
        "role": "user",
        # (Italian) "ORIGINAL QUERY: [obligations to report suspicious transactions]"
        "content": "## QUERY ORIGINALE: [obblighi segnalazione operazioni sospette]"
    }
]
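
A plausible response shape is shown below; the field name and the added terms are illustrative only, since the exact output schema follows the (unpublished) training data:

{"expanded_query": "obblighi di segnalazione di operazioni sospette (SOS) alla UIF ai sensi del D.Lgs. 231/2007, adeguata verifica antiriciclaggio"}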

Document Reranking

messages = [
    {
        "role": "system",
        # (Italian) "Rate the relevance of each candidate against the query.
        # Score 0-100. Respond ONLY with the JSON."
        "content": "Valuta la rilevanza di ciascun candidato rispetto alla query. Score 0-100. Rispondi SOLO con il JSON."
    },
    {
        "role": "user",
        "content": '{"query": "requisiti CET1", "candidates": [{"id": "doc_001", "title": "Art. 92 CRR"}, {"id": "doc_002", "title": "DORA Art. 5"}]}'
    }
]
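
An illustrative response (the schema and scores here are assumptions, not documented output); doc_001 should score high because Art. 92 CRR defines the CET1 requirement, while the DORA article is off-topic:

{"results": [{"id": "doc_001", "score": 95}, {"id": "doc_002", "score": 10}]}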

Training Metrics

Metric           Value
Final Eval Loss  1.368
Token Accuracy   70.5%
Train/Eval Gap   0.033

A train/eval loss gap of only 0.033 indicates stable training with no sign of overfitting: the model learned the domain-specific behavior without apparent degradation of its general capabilities.

Design Principles

The LoRA configuration follows a minimal-intervention philosophy, validated through progressive experimentation across 6+ configurations (an illustrative configuration sketch follows the list):

  • Low rank, all modules: Modifying all transformer layers with minimal rank produces better results than high rank on a subset of layers — consistent with findings from the original LoRA paper.
  • Single epoch: One pass through the data is sufficient for behavioral adaptation. Multiple epochs cause catastrophic forgetting on small models.
  • Conservative scaling: Alpha = 2× rank with low learning rate ensures stable gradients with adequate signal amplification.
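
A sketch of what such a setup looks like with Hugging Face PEFT; the exact rank, dropout, and learning rate are not published, so the values below are illustrative placeholders consistent with the principles above:

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                          # illustrative: the card states only "low rank"
    lora_alpha=16,                # alpha = 2x rank (conservative scaling)
    target_modules="all-linear",  # adapt all linear projections, not a layer subset
    lora_dropout=0.05,            # illustrative regularization value
    task_type="CAUSAL_LM",
)
# Training then runs for a single epoch at a low learning rate.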

Dataset Coverage

The training data covers the full lifecycle of a RAG-based compliance assistant:

Category                Purpose
Query Expansion         Enrich queries with regulatory terms for better retrieval
Intent Classification   Route queries to RAG vs conversational responses
Document Reranking      Score retrieved documents by relevance
Topic Extraction        Extract main topics from regulatory text pages
Document Summarization  Summarize multi-page regulatory documents
Relevance Filtering     Filter regulatory text relevant to banks
Metadata Extraction     Find application dates, issuing authorities
Impact Analysis         Cross-reference regulations vs internal procedures
RAG Q&A + Tool Calling  Multi-turn compliance conversations with tools

Regulatory sources covered: CRR/CRR3, DORA (EU 2022/2554), D.Lgs. 231/2007 (AML), D.Lgs. 385/1993 (TUB), Circolare 285, PSD2, MiFID II/MiFIR, D.P.R. 180/1950, and related Banca d'Italia provisions.


Deployment

With vLLM

vllm serve ./models/RegTech-4B-Instruct --dtype bfloat16
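
The server exposes an OpenAI-compatible API (port 8000 by default). A minimal client call, assuming the served model name matches the local path:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="./models/RegTech-4B-Instruct",
    messages=messages,  # e.g., the RAG Q&A messages from Usage Examples above
)
print(response.choices[0].message.content)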

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "YOUR_REPO_ID", torch_dtype="bfloat16", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("YOUR_REPO_ID")

# `messages` is any of the chat examples above (e.g., the RAG Q&A example).
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Important Notes

  • RAG-optimized — Trained to work with retrieved context, not to memorize regulations. Always provide relevant documents in the system prompt.
  • Domain-specific — Optimized for Italian banking compliance. General capabilities may differ from the base model.
  • Not legal advice — A tool to assist compliance professionals, not a substitute for regulatory expertise.
  • Part of a model family — This 4B model is the lightweight variant. Larger models (7B, 14B, 32B) in the RegTech family offer progressively better completeness and accuracy for more demanding use cases.

Built for banking RAG by 2Sophia
Fine-tuned with LoRA • Adversarial evaluation by frontier LLM judges • Powered by Qwen3
