# IntentGuard: Financial Services
Production-ready vertical intent classifier for LLM chatbot guardrails. Classifies user messages as allow, deny, or abstain to keep financial services chatbots on-topic and secure.
## IntentGuard Model Family
IntentGuard provides specialized intent classifiers for high-stakes verticals where chatbot misuse carries regulatory, legal, or safety risk:
| Model | Vertical | Accuracy | Off-Topic Pass Rate | Link |
|---|---|---|---|---|
| intentguard-finance | Financial Services | 99.6% | 0.00% | This model |
| intentguard-healthcare | Healthcare & Clinical | 98.9% | 0.98% | perfecXion/intentguard-healthcare |
| intentguard-legal | Legal & Compliance | 97.9% | 0.50% | perfecXion/intentguard-legal |
## Overview

### The Problem
Enterprise chatbots in regulated industries face a critical challenge: users inevitably ask off-topic questions (sports, entertainment, relationship advice) that the underlying LLM will happily answer, exposing the organization to compliance risk, brand damage, and potential liability.
Traditional keyword filters miss nuanced off-topic queries, while LLM-based guardrails are too slow and expensive for real-time inference.
### The Solution
IntentGuard uses a tiny, purpose-trained DeBERTa-v3-xsmall model (22M parameters, 2.5MB quantized) to classify user intent in <30ms on CPU. The three-way classification (allow/deny/abstain) enables precise control:
- Allow → On-topic for the vertical, pass to the LLM
- Deny → Clearly off-topic, block with a polite redirect
- Abstain → Ambiguous, escalate to a secondary classifier or human review
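The three-way decision maps naturally onto a small dispatcher in the application layer. The sketch below is illustrative, not part of the IntentGuard API: `classify`, `llm_respond`, and `escalate` are placeholder callables you would supply.

```python
def route(message, classify, llm_respond, escalate):
    """Dispatch a user message based on the guardrail's three-way label."""
    label = classify(message)  # expected: "allow" | "deny" | "abstain"
    if label == "allow":
        return llm_respond(message)  # on-topic: pass through to the LLM
    if label == "deny":
        # off-topic: block with a polite redirect instead of an LLM answer
        return "I can only help with financial questions."
    return escalate(message)  # ambiguous: secondary classifier or human review

# Tiny stand-ins to exercise the dispatcher
reply = route(
    "Who won the Super Bowl?",
    classify=lambda m: "deny",
    llm_respond=lambda m: f"LLM: {m}",
    escalate=lambda m: f"escalated: {m}",
)
```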
## Performance
| Metric | Value |
|---|---|
| Overall Accuracy | 99.6% |
| Legitimate Block Rate | 0.00% (no false positives) |
| Off-Topic Pass Rate | 0.00% (no false negatives) |
| p99 Latency (CPU) | <30ms |
| Model Size (ONNX INT8) | 2.5MB |
| Base Parameters | 22M (DeBERTa-v3-xsmall) |
| Expected Calibration Error | <0.03 |
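The Expected Calibration Error row measures how closely the model's reported confidence tracks its actual accuracy. As a reference for how such a number is typically computed, here is a standard binned-ECE sketch (this is a generic illustration, not perfecXion's evaluation code):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean |accuracy - mean confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()    # empirical accuracy in this bin
            conf = confidences[mask].mean()  # average reported confidence
            ece += mask.mean() * abs(acc - conf)  # weight by bin population
    return ece

# Toy example: two confidence levels, accuracy slightly off confidence
ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```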
## Classification Decision Framework
```
User Message → Tokenize → DeBERTa Inference → Softmax
                              │
              ┌───────────────┼───────────────┐
              │               │               │
            ALLOW           DENY           ABSTAIN
         (on-topic)     (off-topic)     (uncertain)
              │               │               │
        Pass to LLM   Block + Redirect    Escalate
```
## Model Details
| Property | Value |
|---|---|
| Architecture | DeBERTa-v3-xsmall (fine-tuned for 3-way classification) |
| Format | ONNX (INT8 quantized) |
| Version | 1.0 |
| Vertical | Finance (Financial Services) |
| Training | Supervised fine-tuning on curated intent datasets |
| Quantization | INT8 via ONNX Runtime |
| GPU Required | No (CPU-only inference) |
| Publisher | perfecXion.ai |
### Core Topics (Allow)
Banking, lending, credit, payments, investing, insurance, tax, personal finance, retirement, mortgages, financial planning, budgeting
### Hard Exclusions (Deny)
Sports, entertainment, cooking, gaming, celebrity gossip, fashion, travel/leisure, fiction writing, relationship advice
## Usage

### Python (ONNX Runtime)
```python
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("perfecXion/intentguard-finance")
session = ort.InferenceSession("model.onnx")

# Classify a user message
text = "What are the current mortgage rates for a 30-year fixed loan?"
inputs = tokenizer(text, return_tensors="np", max_length=128, truncation=True, padding="max_length")
logits = session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"]
})[0]

labels = ["allow", "deny", "abstain"]
prediction = labels[np.argmax(logits)]
confidence = float(np.max(np.exp(logits) / np.sum(np.exp(logits))))
print(f"Intent: {prediction} (confidence: {confidence:.3f})")
# Output: Intent: allow (confidence: 0.998)
```
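A common hardening step on top of the raw argmax (not part of the example above) is a numerically stable softmax plus a confidence floor, so that low-confidence allow/deny calls are demoted to abstain. The threshold value here is illustrative, not a recommendation from the model card:

```python
import numpy as np

LABELS = ["allow", "deny", "abstain"]

def decide(logits, min_confidence=0.90):
    """Map raw logits to a label, demoting low-confidence calls to abstain."""
    z = np.asarray(logits, dtype=float).ravel()
    z = z - z.max()                      # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax over the 3 classes
    label = LABELS[int(np.argmax(probs))]
    confidence = float(probs.max())
    if confidence < min_confidence:
        label = "abstain"                # too uncertain to act on
    return label, confidence

label, conf = decide([5.0, -2.0, -3.0])
```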
### Docker
```bash
# Pull and run the container
docker pull ghcr.io/perfecxion/intentguard:finance-1.0
docker run -p 8080:8080 ghcr.io/perfecxion/intentguard:finance-1.0

# Classify a message
curl -X POST http://localhost:8080/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What are the current mortgage rates?"}]}'
# Response: {"intent": "allow", "confidence": 0.998}
```
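The same endpoint can be called from Python with only the standard library. The request path and payload shape mirror the curl example above; the response fields (`intent`, `confidence`) are assumed from the sample response, so treat this as a sketch rather than an official client:

```python
import json
import urllib.request

def classify_remote(text, url="http://localhost:8080/v1/classify"):
    """POST a single user message to the container's /v1/classify endpoint."""
    payload = {"messages": [{"role": "user", "content": text}]}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Expected shape (from the sample above): {"intent": ..., "confidence": ...}
        return json.loads(resp.read())

# Request body identical to the curl example
payload = {"messages": [{"role": "user", "content": "What are the current mortgage rates?"}]}
```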
### pip
```bash
pip install intentguard
```

```python
from intentguard import IntentGuard

guard = IntentGuard.load("finance")
result = guard.classify("What are the current mortgage rates?")
print(result)  # Intent(label='allow', confidence=0.998)
```
## Example Classifications
| User Message | Predicted | Confidence | Correct? |
|---|---|---|---|
| "What are mortgage rates for a 30-year fixed?" | allow | 0.998 | ✓ |
| "How do I open a Roth IRA?" | allow | 0.997 | ✓ |
| "Who won the Super Bowl?" | deny | 0.999 | ✓ |
| "Tell me a joke" | deny | 0.996 | ✓ |
| "Is my health insurance FSA-eligible?" | allow | 0.942 | ✓ (financial context) |
| "What's the weather today?" | deny | 0.998 | ✓ |
## Citation
```bibtex
@misc{thornton2025intentguard,
  title={IntentGuard: A Production-Grade Vertical Intent Classifier for LLM Guardrails},
  author={Thornton, Scott},
  year={2025},
  publisher={perfecXion.ai},
  url={https://perfecxion.ai/articles/intentguard-vertical-intent-classifier-llm-guardrails.html},
  note={Model: https://huggingface.co/perfecXion/intentguard-finance}
}
```
## Quality Metrics
| Metric | Result |
|---|---|
| Accuracy (Finance vertical) | 99.6% |
| Legitimate Block Rate | 0.00% |
| Off-Topic Pass Rate | 0.00% |
| Expected Calibration Error | <0.03 |
| ONNX INT8 Quantization | Validated |
| CPU Inference (p99) | <30ms |
| Docker Container | Available |
## License
Apache 2.0
## Links
- Research Article: IntentGuard: A Production-Grade Vertical Intent Classifier for LLM Guardrails
- Publisher: perfecXion.ai
- Healthcare Model: perfecXion/intentguard-healthcare
- Legal Model: perfecXion/intentguard-legal
- Docker Image: ghcr.io/perfecxion/intentguard:finance-1.0