---
title: SmolLM2 Customs ADI
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: true
short_description: DEMO – Build your own free LLM service
---
# SmolLM2 Customs – Build Your Own LLM Service

A showcase: how to build a free, private, OpenAI-compatible LLM service on HuggingFace Spaces and plug it into any hub or application – no GPU, no money, no drama.

This project is under active development – always use the latest release from Codey Lab (more stable builds land there first). This repo (DEV-STATUS) is where the chaos happens. 🎬 A ⭐ on the repos would be cool 😊
## What is this?

A minimal but production-ready LLM service built on:
- SmolLM2-360M-Instruct – 269MB, Apache 2.0, runs on 2 CPUs for free
- FastAPI – OpenAI-compatible `/v1/chat/completions` endpoint
- ADI (Anti-Dump Index) – filters low-quality requests before they hit the model
- HF Dataset – logs every request for later analysis and finetuning
The point is not the model – the point is the pattern. Fork it, swap SmolLM2 for any model you want, and you have your own private LLM API running for free.
## How it works

```
Request
  ↓
ADI Score (is this request worth answering?)
  ↓
REJECT        → returns improvement suggestions, logs to dataset
MEDIUM/HIGH   → SmolLM2 answers, logs to dataset
SmolLM2 fails → returns 503 → hub fallback chain kicks in
```
## Endpoints

- `GET /` – status
- `GET /v1/health` – health check
- `POST /v1/chat/completions` – OpenAI-compatible inference
## Plug into any Hub (one config block)

Works out of the box with Multi-LLM-API-Gateway (see the hub screenshot for this SmolLM2 Space):

```toml
[LLM_PROVIDER.smollm]
active = "true"
base_url = "https://YOUR-USERNAME-smollm2-customs.hf.space/v1"
env_key = "SMOLLM_API_KEY"
default_model = "smollm2-360m"
models = "smollm2-360m, YOUR-USERNAME/your-finetuned-model"
fallback_to = "gemini"
[LLM_PROVIDER.smollm_END]
```
Any OpenAI-compatible client works the same way.
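As a sketch of what "OpenAI-compatible" means in practice, here is a minimal stdlib-only Python example that builds a chat-completion request against the Space. The URL and key are placeholders, and actually sending the request requires a live Space, so the final call is shown commented out:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the Space."""
    payload = {"model": "smollm2-360m", "messages": messages}
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # must match SMOLLM_API_KEY on the Space
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = build_chat_request(
    "https://YOUR-USERNAME-smollm2-customs.hf.space/v1",
    "your-secret-key",
    [{"role": "user", "content": "Hello!"}],
)
# with urllib.request.urlopen(req) as resp:  # needs a live Space
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The official `openai` client works identically: point its `base_url` at the Space's `/v1` and pass the same key.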
## Secrets (HF Space Settings)

| Secret | Required | Description |
|---|---|---|
| `SMOLLM_API_KEY` | recommended | Locks the endpoint – set the same value in your hub |
| `HF_TOKEN` or `TEST_TOKEN` | optional | HF auth for dataset + model repo access |
| `MODEL_REPO` | optional | Base model override (default: `HuggingFaceTB/SmolLM2-360M-Instruct`) |
| `DATASET_REPO` | optional | Your private HF dataset for logging |
| `PRIVATE_MODEL_REPO` | optional | Your private model repo for finetuned weights |
Auth modes:

- `SMOLLM_API_KEY` not set → open access (demo/showcase mode)
- `SMOLLM_API_KEY` set → protected (production mode)
- Space private → double protection (HF gate + your key)
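The first two modes boil down to one check. A hypothetical helper illustrating the idea (not the actual code in `main.py`; the third mode, Space privacy, is enforced by the Hub before any request reaches the app):

```python
from typing import Optional

def check_access(configured_key: Optional[str], provided_key: Optional[str]) -> bool:
    """Open access when no key is configured; otherwise require an exact match."""
    if configured_key is None:  # SMOLLM_API_KEY not set -> demo/showcase mode
        return True
    return provided_key == configured_key  # set -> production mode
```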
## ADI Routing

| Decision | Action |
|---|---|
| `HIGH_PRIORITY` | SmolLM2 handles it |
| `MEDIUM_PRIORITY` | SmolLM2 handles it |
| `REJECT` | Returns suggestions, logs to dataset |
| SmolLM2 fails | 503 → hub fallback chain |
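The routing table can be sketched as a single function. Names and return shape are illustrative, not the actual `adi.py`/`main.py` API:

```python
def route(decision: str, answer_fn) -> dict:
    """Illustrative sketch of the ADI routing table above."""
    if decision == "REJECT":
        # Rejected requests get improvement suggestions (and are logged).
        return {"status": 200, "body": "improvement suggestions"}
    try:
        # HIGH_PRIORITY / MEDIUM_PRIORITY: SmolLM2 answers (and is logged).
        return {"status": 200, "body": answer_fn()}
    except Exception:
        # Inference failure -> 503, letting the hub's fallback chain take over.
        return {"status": 503, "body": "fallback"}
```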
## Training Utilities

Every request is logged to your private HF dataset. Use it to improve over time:

```shell
python train.py --mode export     # export dataset → JSONL
python train.py --mode validate   # validate ADI weights against labeled data
python train.py --mode finetune   # finetune SmolLM2 on your data (coming soon)
```

Once you have enough data → finetune → push to your private model repo → the Space loads it automatically on the next restart.
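The export step is conceptually just "one logged request per JSONL line". A minimal sketch, assuming a prompt/completion record shape (the actual dataset schema may differ):

```python
import json

def to_jsonl(records: list) -> str:
    """Serialize logged requests as JSONL, one record per line,
    ready for finetuning tools that expect prompt/completion pairs."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

sample = [
    {"prompt": "What is ADI?", "completion": "A request-quality score.", "adi": "HIGH_PRIORITY"},
    {"prompt": "asdf", "completion": "", "adi": "REJECT"},
]
print(to_jsonl(sample))
```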
## Stack

| Component | What it does |
|---|---|
| `main.py` | FastAPI, auth, routing |
| `smollm.py` | Inference engine, lazy loading |
| `model.py` | HF token resolution, dataset + model repo access |
| `adi.py` | Request quality scoring |
| `train.py` | Dataset export, ADI validation, finetuning |
## Part of

- Multi-LLM-API-Gateway – the hub this was built for
- Anti-Dump-Index – the ADI algorithm idea
## License

Dual-licensed:

- Apache License 2.0
- Ethical Security Operations License v1.1 (ESOL) – mandatory, non-severable

By using this software you agree to all ethical constraints defined in ESOL v1.1.