Scott Thornton PRO

scthornton

AI & ML interests

AI/ML Security

Recent Activity


upvoted an article 12 days ago


IntentGuard: Building a Production-Grade Vertical Intent Classifier for LLM Safety


reacted to perfecXion's post with 👍 17 days ago


# IntentGuard: Open-Source Vertical Intent Classifiers for LLM Guardrails

Three models published to the Hub:

- [perfecXion/intentguard-finance](https://huggingface.co/perfecXion/intentguard-finance)
- [perfecXion/intentguard-healthcare](https://huggingface.co/perfecXion/intentguard-healthcare)
- [perfecXion/intentguard-legal](https://huggingface.co/perfecXion/intentguard-legal)

Each model is DeBERTa-v3-xsmall fine-tuned for three-way classification: **allow**, **deny**, or **abstain**. The models are exported to ONNX with INT8 quantization, weigh under 80 MB, and run at p99 latency under 30 ms on CPU. Decisions use margin-based thresholds rather than argmax, so uncertain queries are routed to clarification instead of forcing a guess.
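To make "margin-based thresholds (not argmax)" concrete, here is a minimal sketch. It assumes the model emits one logit per class in the order allow/deny/abstain and uses a single hypothetical margin of 0.2; the post does not spell out the exact rule or the tuned values, so treat both as placeholders.

```python
import math
from typing import Sequence

# Assumed label order; the published models may order their classes differently.
LABELS = ("allow", "deny", "abstain")


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over raw classifier logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits: Sequence[float], margin: float = 0.2) -> str:
    """Return the top label only if it beats the runner-up by at least `margin`.

    `margin` is a hypothetical value for illustration, not IntentGuard's tuned threshold.
    """
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    if probs[top] - probs[runner_up] < margin:
        # Too close to call: route to clarification instead of guessing.
        return "abstain"
    return LABELS[top]
```

With these assumptions, `decide([2.0, 0.1, -1.0])` returns `"allow"` because the top class wins by a wide margin, while `decide([1.0, 0.9, -2.0])` returns `"abstain"` because allow and deny are nearly tied. Plain argmax would have picked "allow" in both cases.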

**Eval results (adversarial test sets, ~470-480 examples per vertical):**

| Vertical | Accuracy | Legit-Block Rate | Off-Topic-Pass Rate |
|----------|----------|------------------|---------------------|
| Finance | 99.6% | 0.00% | 0.00% |
| Healthcare | 98.9% | 0.00% | 0.98% |
| Legal | 97.9% | 0.00% | 0.50% |
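The post does not define the two error-rate columns. One plausible reading, sketched here as an assumption, is that Legit-Block Rate is the share of legitimate (gold "allow") queries the classifier denied, and Off-Topic-Pass Rate is the share of off-topic (gold "deny") queries it allowed; "abstain" predictions count toward neither, since they route to clarification.

```python
from typing import Sequence


def rate(gold: Sequence[str], pred: Sequence[str], gold_label: str, bad_pred: str) -> float:
    """Fraction of examples whose gold label is `gold_label` that were predicted as `bad_pred`."""
    pairs = [(g, p) for g, p in zip(gold, pred) if g == gold_label]
    if not pairs:
        return 0.0
    return sum(p == bad_pred for _, p in pairs) / len(pairs)


def evaluate(gold: Sequence[str], pred: Sequence[str]) -> dict[str, float]:
    """Compute accuracy plus the two error rates under the assumed definitions."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return {
        "accuracy": accuracy,
        # Assumed definition: legitimate (gold "allow") queries that were denied.
        "legit_block_rate": rate(gold, pred, "allow", "deny"),
        # Assumed definition: off-topic (gold "deny") queries that were allowed.
        "off_topic_pass_rate": rate(gold, pred, "deny", "allow"),
    }
```

Keeping the two error rates separate from accuracy matters for a guardrail: a model can score well overall while still blocking a noticeable share of legitimate users.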

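```bash
# Start the finance classifier; this runs in the foreground, so send requests from a second shell
docker run -p 8080:8080 ghcr.io/perfecxion/intentguard:finance-latest

# Classify a single user message
curl -X POST http://localhost:8080/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What are current mortgage rates?"}]}'
```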
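For calling the container from Python instead of curl, here is a standard-library sketch that sends the same request. The endpoint and payload shape come from the curl example; the response schema is not documented in the post, so the decoded JSON is returned as-is rather than parsed into fields.

```python
import json
import urllib.request
from typing import Any

# Endpoint taken from the curl example; adjust host/port if you map the container differently.
CLASSIFY_URL = "http://localhost:8080/v1/classify"


def build_request(user_text: str, url: str = CLASSIFY_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example for a single user message."""
    payload = {"messages": [{"role": "user", "content": user_text}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(user_text: str, url: str = CLASSIFY_URL, timeout: float = 5.0) -> Any:
    """Send the request and return the decoded JSON body (needs the container running)."""
    with urllib.request.urlopen(build_request(user_text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container up, `classify("What are current mortgage rates?")` should return whatever JSON the service produces for that message.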

Apache 2.0. Full pipeline + Docker configs on [GitHub](https://github.com/perfecxion-ai/intentguard).

Feedback welcome on domain coverage, adversarial robustness, and multilingual demand.

reacted to their post with 👀🚀 23 days ago


# SecureCode Dataset Family Update: 2,185 Security Examples, Framework-Specific Patterns, Clean Parquet Loading

Hey y'all,

Quick update on the SecureCode dataset family. We've restructured things and fixed several issues:

**What changed:**

- The datasets are now properly split into three repos: [unified](https://huggingface.co/datasets/scthornton/securecode) (2,185), [web](https://huggingface.co/datasets/scthornton/securecode-web) (1,378), [AI/ML](https://huggingface.co/datasets/scthornton/securecode-aiml) (750)
- All repos now use Parquet format, so `load_dataset()` just works with no deprecated loading scripts
- SecureCode Web now includes 219 framework-specific examples (Express, Django, Spring Boot, Flask, Rails, Laravel, ASP.NET Core, FastAPI, NestJS)
- Data cards have been corrected and split sizes fixed
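Because the repos are plain Parquet, loading one needs only the Hugging Face `datasets` library. The sketch below uses the repo IDs listed above; the row-count check assumes the figures in parentheses are totals across all splits, which the post does not state explicitly.

```python
# Repo IDs come from the post; counts are the per-repo totals it reports
# (assumed here to be summed across all splits).
SECURECODE_REPOS = {
    "unified": ("scthornton/securecode", 2185),
    "web": ("scthornton/securecode-web", 1378),
    "aiml": ("scthornton/securecode-aiml", 750),
}


def load_securecode(variant: str = "unified"):
    """Load one SecureCode repo and report if its size differs from the post's figure."""
    repo_id, expected = SECURECODE_REPOS[variant]  # KeyError for unknown variants
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(repo_id)  # a DatasetDict keyed by split name
    total = sum(split.num_rows for split in ds.values())
    if total != expected:
        print(f"note: {repo_id} has {total} rows across splits; the post reports {expected}")
    return ds
```

Calling `load_securecode("web")` downloads the web subset; the split names it contains are whatever the data card defines.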

**Why it matters:**

With AI-generated code accounting for 60%+ of some codebases (Checkmarx 2025), security training data is more important than ever. Every example in SecureCode is grounded in a real CVE and presented as a 4-turn conversation that mirrors actual developer-AI workflows.

If you're working on code generation models, I'd love to hear how you're approaching the security angle. Are there vulnerability categories or frameworks you'd like to see covered?

Paper: [arxiv.org/abs/2512.18542](https://arxiv.org/abs/2512.18542)

posted the SecureCode update above 26 days ago