---
title: SmolLM2 Customs ADI
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: true
short_description: DEMO — Build your own free LLM service
---

# SmolLM2 Customs — Build Your Own LLM Service

A showcase: how to build a free, private, OpenAI-compatible LLM service on Hugging Face Spaces and plug it into any hub or application — no GPU, no money, no drama.

This project is under active development — always use the latest release from Codey Lab (more stable builds land there first). This repo (DEV-STATUS) is where the chaos happens. 🔬 A ⭐ on the repos would be cool 😙


## What is this?

A minimal but production-ready LLM service built on:

- SmolLM2-360M-Instruct — 269MB, Apache 2.0, runs on 2 CPUs for free
- FastAPI — OpenAI-compatible `/v1/chat/completions` endpoint
- ADI (Anti-Dump Index) — filters low-quality requests before they hit the model
- HF Dataset — logs every request for later analysis and finetuning

The point is not the model — the point is the pattern. Fork it, swap SmolLM2 for any model you want, and you have your own private LLM API running for free.


## How it works

```
Request
    ↓
ADI Score (is this request worth answering?)
    ↓
REJECT        → returns improvement suggestions, logs to dataset
MEDIUM/HIGH   → SmolLM2 answers, logs to dataset
SmolLM2 fails → returns 503 → hub fallback chain kicks in
```
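The gate above boils down to one routing function. A minimal sketch — the scoring heuristic and thresholds here are made up for illustration; the real `adi.py` computes a weighted Anti-Dump Index from several signals:

```python
def route_request(prompt: str) -> str:
    """Toy ADI gate: score a request, then decide who handles it.

    The score below is a placeholder signal (longer prompts score
    higher); the actual project uses its own weighted scoring.
    """
    score = min(len(prompt.split()) / 20, 1.0)
    if score < 0.2:
        return "REJECT"           # return improvement suggestions, log to dataset
    if score < 0.6:
        return "MEDIUM_PRIORITY"  # SmolLM2 answers, log to dataset
    return "HIGH_PRIORITY"        # SmolLM2 answers, log to dataset
```

Whatever the scoring details, the key design point is that the decision happens before any model weights are touched, so rejected requests cost almost nothing.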

## Endpoints

```
GET  /                       → status
GET  /v1/health              → health check
POST /v1/chat/completions    → OpenAI-compatible inference
```
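A request to the chat endpoint follows the standard OpenAI schema. The sketch below only builds the payload and headers — the URL and key are placeholders, and actually sending it requires a deployed Space:

```python
import json

# Placeholders -- substitute your Space URL and your SMOLLM_API_KEY value.
BASE_URL = "https://YOUR-USERNAME-smollm2-customs.hf.space/v1"
API_KEY = "your-secret-key"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "model": "smollm2-360m",
    "messages": [{"role": "user", "content": "Summarize ADI in one sentence."}],
}

# To actually call the Space:
#   import requests
#   r = requests.post(f"{BASE_URL}/chat/completions", headers=headers,
#                     data=json.dumps(payload))
#   print(r.json()["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```

Because the schema is the standard one, the same payload works against OpenAI, a local server, or this Space — only `BASE_URL` and the key change.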

## Plug into any Hub (one config block)

Works out of the box with Multi-LLM-API-Gateway:

```toml
[LLM_PROVIDER.smollm]
active        = "true"
base_url      = "https://YOUR-USERNAME-smollm2-customs.hf.space/v1"
env_key       = "SMOLLM_API_KEY"
default_model = "smollm2-360m"
models        = "smollm2-360m, YOUR-USERNAME/your-finetuned-model"
fallback_to   = "gemini"
[LLM_PROVIDER.smollm_END]
```

Any OpenAI-compatible client works the same way.


## Secrets (HF Space Settings)

| Secret | Required | Description |
|--------|----------|-------------|
| `SMOLLM_API_KEY` | recommended | Locks the endpoint — set the same value in your hub |
| `HF_TOKEN` or `TEST_TOKEN` | optional | HF auth for dataset + model repo access |
| `MODEL_REPO` | optional | Base model override (default: `HuggingFaceTB/SmolLM2-360M-Instruct`) |
| `DATASET_REPO` | optional | Your private HF dataset for logging |
| `PRIVATE_MODEL_REPO` | optional | Your private model repo for finetuned weights |

Auth modes:

```
SMOLLM_API_KEY not set  → open access (demo/showcase mode)
SMOLLM_API_KEY set      → protected (production mode)
Space private           → double protection (HF gate + your key)
```

## ADI Routing

| Decision | Action |
|----------|--------|
| HIGH_PRIORITY | SmolLM2 handles it |
| MEDIUM_PRIORITY | SmolLM2 handles it |
| REJECT | Returns suggestions, logs to dataset |
| SmolLM2 fails | 503 → hub fallback chain |
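The 503 case is what the hub's fallback chain reacts to on the client side. A minimal sketch of such a chain — provider names and the `call` signature are made up for illustration:

```python
def call_with_fallback(prompt, providers):
    """Try each (name, call) provider in order; a provider signals
    failure by raising. Returns the first successful answer."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except RuntimeError as err:  # stand-in for an HTTP 503 from the Space
            last_error = err
    raise RuntimeError(f"all providers failed: {last_error}")
```

This mirrors the `fallback_to = "gemini"` line in the hub config above: when SmolLM2 answers 503, the next provider in the chain gets the same prompt.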

## Training Utilities

Every request is logged to your private HF dataset. Use it to improve over time:

```shell
python train.py --mode export    # export dataset → JSONL
python train.py --mode validate  # validate ADI weights against labeled data
python train.py --mode finetune  # finetune SmolLM2 on your data (coming soon)
```

Once you have enough data → finetune → push to your private model repo → the Space loads it automatically on the next restart.
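The export step amounts to flattening logged records into JSONL, one JSON object per line. A minimal sketch — the record fields shown are assumptions, so inspect your dataset for the real schema:

```python
import json

def export_jsonl(records, path):
    """Write one JSON object per line -- the layout finetuning scripts expect."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Hypothetical logged records (field names are illustrative):
logs = [
    {"prompt": "What is ADI?", "decision": "HIGH_PRIORITY", "response": "..."},
    {"prompt": "asdf", "decision": "REJECT", "response": None},
]
export_jsonl(logs, "train_export.jsonl")
```

Keeping the REJECT rows in the export is deliberate: they are exactly the labeled negatives you need when validating ADI weights.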


## Stack

| Component | What it does |
|-----------|--------------|
| `main.py` | FastAPI, auth, routing |
| `smollm.py` | Inference engine, lazy loading |
| `model.py` | HF token resolution, dataset + model repo access |
| `adi.py` | Request quality scoring |
| `train.py` | Dataset export, ADI validation, finetuning |
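The lazy loading in `smollm.py` keeps the model out of memory until the first request hits the endpoint, which is what lets the Space boot fast on 2 CPUs. The pattern, sketched generically — the loader here is a stub, where the real code would load SmolLM2 via `transformers`:

```python
_model = None  # module-level cache, populated on first use

def get_model(loader):
    """Load once on the first call, then reuse the cached instance."""
    global _model
    if _model is None:
        _model = loader()  # expensive step: real code loads SmolLM2 here
    return _model
```

The trade-off is a slow first request (cold start) in exchange for instant startup and zero memory cost when the Space is idle.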

## Part of

## License

Dual-licensed:

By using this software you agree to all ethical constraints defined in ESOL v1.1.