---
title: Focus Guard Final v2
emoji: 🎯
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: "Focus detection - MediaPipe, MLP/XGB, L2CS, FastAPI"
---

# FocusGuard

Webcam-based focus detection: MediaPipe face mesh → 17 features (EAR, gaze, head pose, PERCLOS, etc.) → MLP or XGBoost for focused/unfocused. React + FastAPI app with WebSocket video.

Repository: KCL GAP project (internal); adjust the link if you publish a public mirror.

## Project layout

```
├── data/                 collected_<name>/*.npz
├── data_preparation/     loaders, split, scale
├── notebooks/            MLP/XGB training + LOPO
├── models/               face_mesh, head_pose, eye_scorer, train scripts
│   ├── gaze_calibration.py   9-point polynomial gaze calibration
│   ├── gaze_eye_fusion.py    Fuses calibrated gaze with eye openness
│   └── L2CS-Net/             In-tree L2CS-Net repo with Gaze360 weights
├── checkpoints/          mlp_best.pt, xgboost_*_best.json, scalers
├── evaluation/           logs, plots, justify_thresholds
├── ui/                   pipeline.py, live_demo.py
├── src/                  React frontend
│   ├── components/
│   │   ├── FocusPageLocal.jsx      Main focus page (camera, controls, model selector)
│   │   └── CalibrationOverlay.jsx  Fullscreen calibration UI
│   └── utils/
│       └── VideoManagerLocal.js    WebSocket client, frame capture, canvas rendering
├── static/               built frontend (after npm run build)
├── main.py, app.py       FastAPI backend
├── requirements.txt
└── package.json
```

## Config

Hyperparameters and app settings live in `config/default.yaml` (learning rates, batch size, thresholds, L2CS weights, etc.). To override them, set the `FOCUSGUARD_CONFIG` environment variable to the path of another YAML file.
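
The override is, at a sketch level, just an environment lookup before loading the YAML. The helper below is illustrative only (the project's actual loader and key names may differ):

```python
import os

import yaml  # PyYAML


def load_config(default_path: str = "config/default.yaml") -> dict:
    """Load app settings, honouring the FOCUSGUARD_CONFIG override."""
    path = os.environ.get("FOCUSGUARD_CONFIG", default_path)
    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh)


config = load_config()
# Keys referenced elsewhere in this README, e.g. config["data"]["split_ratios"]
# and config["mlp"]["seed"].
```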

## Setup

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

To rebuild the frontend after changes:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```

## Run

Web app: use the venv and run uvicorn via Python so it picks up your dependencies (otherwise you get a `ModuleNotFoundError` for `aiosqlite`):

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Then open http://localhost:7860.

Frontend dev server (optional, for React development):

```bash
npm run dev
```

OpenCV demo:

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb
```

Train:

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

## ClearML experiment tracking

All training and evaluation config (from config/default.yaml) is exposed as ClearML task parameters. Enable logging with USE_CLEARML=1; optionally run on a remote GPU agent instead of locally:

```bash
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m evaluation.justify_thresholds --clearml
```

The script enqueues the task and exits; a clearml-agent listening on the named queue (e.g. gpu) runs the same command with the same parameters. Start an agent with:

```bash
clearml-agent daemon --queue gpu
```

Logged to ClearML: parameters (full flattened config), scalars (loss, accuracy, F1, ROC-AUC, per-class precision/recall/F1, dataset sizes and class counts), artifacts (best checkpoint, training log JSON), and plots (confusion matrix, ROC curves in evaluation).
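
As a rough illustration of the pattern above (not the project's exact code; the project name and helper below are placeholders), the ClearML hooks boil down to:

```python
import os

from clearml import Task


def maybe_init_clearml(config: dict):
    """Create a ClearML task when USE_CLEARML=1; enqueue remotely if CLEARML_QUEUE is set."""
    if os.environ.get("USE_CLEARML") != "1":
        return None
    task = Task.init(project_name="FocusGuard", task_name="mlp_train")
    task.connect(config)  # flattened config becomes the task parameters
    queue = os.environ.get("CLEARML_QUEUE")
    if queue:
        # Hands the task to a clearml-agent listening on `queue` and exits locally.
        task.execute_remotely(queue_name=queue)
    return task


# During training/evaluation the task then receives scalars and artifacts, e.g.:
#   task.get_logger().report_scalar("val", "f1", value=f1, iteration=epoch)
#   task.upload_artifact("best_checkpoint", artifact_object="checkpoints/mlp_best.pt")
```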

## Data

9 participants, 144,793 samples, 10 features, binary labels. Collect with `python -m models.collect_features --name <name>`. Data lives in `data/collected_<name>/`.

Train/val/test split: all pooled training and evaluation use the same split for reproducibility. The test set is held out before any preprocessing; the StandardScaler is fit on the training set only, then applied to val and test. Split ratios and the random seed come from `config/default.yaml` (`data.split_ratios`, `mlp.seed`) via `data_preparation.prepare_dataset.get_default_split_config()`. MLP training, XGBoost training, the `eval_accuracy` scripts, and the benchmarks all use this single source, so reported test accuracy is always on the same held-out set.
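
The leakage-safe recipe described above looks roughly like this with scikit-learn (a sketch; the real code lives in `data_preparation.prepare_dataset`, and the 70/15/15 ratios and seed shown are only stand-ins for the config values):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(X, y, ratios=(0.70, 0.15, 0.15), seed=42):
    """Hold out the test set before preprocessing, fit the scaler on train only."""
    X_trval, X_test, y_trval, y_test = train_test_split(
        X, y, test_size=ratios[2], random_state=seed, stratify=y)
    val_frac = ratios[1] / (ratios[0] + ratios[1])
    X_train, X_val, y_train, y_val = train_test_split(
        X_trval, y_trval, test_size=val_frac, random_state=seed, stratify=y_trval)

    scaler = StandardScaler().fit(X_train)  # fit on training data only
    return (scaler.transform(X_train), y_train,
            scaler.transform(X_val), y_val,
            scaler.transform(X_test), y_test,
            scaler)
```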

## Models

| Model | What it uses | Best for |
| --- | --- | --- |
| Geometric | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
| XGBoost | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed |
| MLP | Neural network on same features (64→32) | Higher accuracy |
| Hybrid | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
| L2CS | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts |

### Model numbers (15% test split)

| Model | Accuracy | F1 | ROC-AUC |
| --- | --- | --- | --- |
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64→32) | 92.92% | 0.929 | 0.971 |

### Model numbers (LOPO, 9 participants)

| Model | LOPO AUC | Best threshold (Youden's J) | F1 @ best threshold | F1 @ 0.50 |
| --- | --- | --- | --- | --- |
| MLP | 0.8624 | 0.228 | 0.8578 | 0.8149 |
| XGBoost | 0.8695 | 0.280 | 0.8549 | 0.8324 |

From the latest `python -m evaluation.justify_thresholds` run:

- Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.8195)
- Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.8409)
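
The "best threshold (Youden's J)" values in the LOPO table come from the standard ROC construction: pick the probability cut-off that maximises TPR minus FPR over the pooled LOPO predictions. A minimal sketch (illustrative, not the `justify_thresholds` implementation):

```python
import numpy as np
from sklearn.metrics import f1_score, roc_curve


def youden_threshold(y_true, y_prob):
    """Return the threshold maximising Youden's J = TPR - FPR."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    return thresholds[np.argmax(tpr - fpr)]


# With predictions pooled over the 9 LOPO folds:
#   t = youden_threshold(y_true, y_prob)
#   f1_at_best = f1_score(y_true, y_prob >= t)
#   f1_at_050  = f1_score(y_true, y_prob >= 0.5)
```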

## Grouped vs pooled benchmark

Latest quick benchmark (`python -m evaluation.grouped_split_benchmark --quick`) shows the expected gap between pooled random split and person-held-out LOPO:

| Protocol | Accuracy | F1 (weighted) | ROC-AUC |
| --- | --- | --- | --- |
| Pooled random split | 0.9510 | 0.9507 | 0.9869 |
| Grouped LOPO (9 folds) | 0.8303 | 0.8304 | 0.8801 |

This is why LOPO is the primary generalisation metric for reporting.
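
The gap exists because a pooled random split lets the same participants appear in both train and test, while LOPO always tests on an unseen person. A grouped evaluation in the spirit of the benchmark might look like this (sketch only, per-fold scaling omitted; the actual script is `evaluation.grouped_split_benchmark`):

```python
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut
from xgboost import XGBClassifier


def lopo_weighted_f1(X, y, participant_ids):
    """Leave-one-participant-out: each fold holds out one person entirely."""
    scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=participant_ids):
        model = XGBClassifier(n_estimators=600, max_depth=8)
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict(X[test_idx])
        scores.append(f1_score(y[test_idx], preds, average="weighted"))
    return sum(scores) / len(scores)
```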

## Feature ablation snapshot

Latest quick feature-selection run (`python -m evaluation.feature_importance --quick --skip-lofo`):

| Subset | Mean LOPO F1 |
| --- | --- |
| all_10 | 0.8286 |
| eye_state | 0.8071 |
| head_pose | 0.7480 |
| gaze | 0.7260 |

Top-5 XGBoost gain features: `s_face`, `ear_right`, `head_deviation`, `ear_avg`, `perclos`. For full leave-one-feature-out ablation, run `python -m evaluation.feature_importance` (slower).
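
The gain ranking above is the booster's own split statistic. Reproducing that kind of ranking from a trained model is roughly (illustrative; `evaluation.feature_importance` also runs the LOPO subset comparison):

```python
def top_gain_features(model, feature_names, k=5):
    """Rank features by total split gain in a trained XGBoost model."""
    booster = model.get_booster()
    # Keys are 'f0', 'f1', ... unless feature names were attached at fit time.
    gains = booster.get_score(importance_type="gain")
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    return [(feature_names[int(name[1:])], gain) for name, gain in ranked[:k]]
```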

## L2CS Gaze Tracking

L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.

### Standalone mode

Select L2CS as the model; it handles everything.

### Boost mode

Select any other model, then click the GAZE toggle. L2CS runs alongside the base model:

- Base model handles head pose and eye openness (35% weight)
- L2CS handles gaze direction (65% weight)
- If L2CS detects gaze is clearly off-screen, it vetoes the base model regardless of score (see the sketch below)
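
A rough sketch of that fusion logic (the 35/65 weights and the off-screen veto are as described above; the function name, score ranges, and veto cut-off are placeholders, not the project's exact API):

```python
def boosted_focus_score(base_score: float, gaze_on_screen: float,
                        veto_threshold: float = 0.2) -> float:
    """Combine the base model with L2CS gaze in boost mode.

    base_score     -- base model output in [0, 1] (head pose + eye openness)
    gaze_on_screen -- L2CS-derived confidence that gaze is on-screen, in [0, 1]
    """
    if gaze_on_screen < veto_threshold:
        # Gaze is clearly off-screen: veto the base model regardless of its score.
        return 0.0
    return 0.35 * base_score + 0.65 * gaze_on_screen
```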

### Calibration

After enabling L2CS or Gaze Boost, click Calibrate while a session is running:

  1. A fullscreen overlay shows 9 target dots (3×3 grid)
  2. Look at each dot as the progress ring fills
  3. The first dot (centre) sets your baseline gaze offset
  4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates
  5. A cyan tracking dot appears on the video showing where you're looking
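
Conceptually, the polynomial fit in step 4 is a small regression from gaze angles to screen coordinates, along these lines (a sketch with scikit-learn, not `models/gaze_calibration.py` itself; the degree is an assumption):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def fit_gaze_calibration(gaze_angles, screen_points, degree=2):
    """Map (yaw, pitch) gaze angles to (x, y) screen coordinates.

    gaze_angles   -- (9, 2) mean gaze angles recorded at each target dot
    screen_points -- (9, 2) known positions of the 3x3 calibration grid
    """
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(np.asarray(gaze_angles), np.asarray(screen_points))
    return model  # model.predict([[yaw, pitch]]) -> predicted (x, y) on screen
```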

## Pipeline

  1. Face mesh (MediaPipe 478 pts)
  2. Head pose → yaw, pitch, roll, scores, gaze offset
  3. Eye scorer → EAR, gaze ratio, MAR
  4. Temporal → PERCLOS, blink rate, yawn
  5. 10-d vector → MLP or XGBoost → focused / unfocused

Stack: FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net.