---
title: FocusGuard
emoji: 👁️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Real-time webcam focus detection via MediaPipe + MLP/XGBoost
---

# FocusGuard

Real-time webcam-based visual attention estimation. MediaPipe Face Mesh extracts 17 features per frame (EAR, gaze ratios, head pose, PERCLOS), of which 10 are selected and routed through an MLP or XGBoost classifier for binary focused/unfocused prediction. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.

*Real-time focus detection with face mesh and XGBoost classification*


## Team

**Team name:** FocusGuards (5CCSAGAP Large Group Project)

**Members:** Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas


## Links

**Project access:** the deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).

**Data and checkpoints:** see the download options below.


## Trained models

Model checkpoints are not included in the submission archive. Download them before running inference.

### Option 1: Hugging Face Space

Pre-trained checkpoints are available in the Hugging Face Space files:

https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints

Download and place into `checkpoints/` (a scripted alternative follows the table):

| File | Description |
| --- | --- |
| `mlp_best.pt` | PyTorch MLP (10-64-32-2, ~2,850 params) |
| `xgboost_face_orientation_best.json` | XGBoost (600 trees, depth 8, lr 0.1489) |
| `scaler_mlp.joblib` | StandardScaler fit on training data |
| `hybrid_focus_config.json` | Hybrid pipeline fusion weights |
| `hybrid_combiner.joblib` | Hybrid combiner |
| `L2CSNet_gaze360.pkl` | L2CS-Net ResNet50 gaze weights (96 MB) |
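
If you prefer to script the download rather than click through the web UI, `huggingface_hub` can fetch files from a Space repo directly; the `local_dir` choice here is illustrative:

```python
from huggingface_hub import hf_hub_download

# Fetch one checkpoint from the Space repo; repeat per file as needed.
path = hf_hub_download(
    repo_id="FocusGuard/final_v2",
    repo_type="space",
    filename="checkpoints/mlp_best.pt",
    local_dir=".",   # file lands at ./checkpoints/mlp_best.pt
)
```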

### Option 2: ClearML

Models are registered as ClearML OutputModels under project "FocusGuards Large Group Project".

| Model | Task ID | Model ID |
| --- | --- | --- |
| MLP | 3899b5aa0c3348b28213a3194322cdf7 | 56f94b799f624bdc845fa50c4d0606fe |
| XGBoost | c0ceb8e7e8194a51a7a31078cc47775c | 6727b8de334f4ca0961c46b436f6fb7c |

**UI:** Open a task on the experiments page, go to Artifacts > Output Models, and download.

**Python:**

```python
from clearml import Model

mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy()   # downloads the .pt checkpoint

xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy()   # downloads the .json checkpoint
```

Copy the downloaded files into checkpoints/.

### Option 3: Google Drive (submission fallback)

If ClearML access is restricted, download checkpoints from: https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link

Place all files under checkpoints/.

### Option 4: Retrain from scratch

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

This regenerates `checkpoints/mlp_best.pt`, `checkpoints/xgboost_face_orientation_best.json`, and the scalers. Requires training data under `data/collected_*/`.


## Project layout

```text
config/
    default.yaml                hyperparameters, thresholds, app settings
    __init__.py                 config loader + ClearML flattener
    clearml_enrich.py           ClearML task enrichment + artifact upload
data_preparation/
    prepare_dataset.py          load/split/scale .npz files (pooled + LOPO)
    data_exploration.ipynb      EDA: distributions, class balance, correlations
models/
    face_mesh.py                MediaPipe 478-point face landmarks
    head_pose.py                yaw/pitch/roll via solvePnP, face-orientation score
    eye_scorer.py               EAR, MAR, gaze ratios, PERCLOS
    collect_features.py         real-time feature extraction + webcam labelling CLI
    gaze_calibration.py         9-point polynomial gaze calibration
    gaze_eye_fusion.py          fuses calibrated gaze with eye openness
    mlp/                        MLP training, eval, Optuna sweep
    xgboost/                    XGBoost training, eval, ClearML + Optuna sweeps
    L2CS-Net/                   vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/                    (excluded from archive; see download instructions above)
notebooks/
    mlp.ipynb                   MLP training + LOPO in Jupyter
    xgboost.ipynb               XGBoost training + LOPO in Jupyter
evaluation/
    justify_thresholds.py       LOPO threshold + weight grid search
    feature_importance.py       XGBoost gain + leave-one-feature-out ablation
    grouped_split_benchmark.py  pooled vs LOPO comparison
    plots/                      ROC curves, confusion matrices, weight searches
    logs/                       JSON training logs
tests/
    test_*.py                   unit + integration tests (pytest)
    .coveragerc                 coverage config
ui/
    pipeline.py                 all 5 pipeline classes + output smoothing
    live_demo.py                OpenCV webcam demo
src/                            React (Vite) frontend source
static/                         built frontend assets (after npm build)
main.py                         FastAPI application entry point
package.json                    frontend package manifest
requirements.txt
pytest.ini
```

## Setup

Recommended versions:

- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)

```bash
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Then download checkpoints (see above).

If you need to rebuild the frontend assets locally:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```

## Run

### Local OpenCV demo

```bash
python ui/live_demo.py          # default pipeline
python ui/live_demo.py --xgb    # XGBoost pipeline
```

Controls: `m` cycles the mesh overlay, `1`-`5` switch the pipeline mode, `q` quits.

### Web app (without Docker)

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Open http://localhost:7860

### Web app (Docker)

```bash
docker-compose up               # serves on port 7860
```

## Data collection

```bash
python -m models.collect_features --name <participant>
```

Records webcam sessions with real-time binary labelling (the spacebar toggles focused/unfocused). Saves per-frame feature vectors to `data/collected_<participant>/` as `.npz` files. Raw video is never stored.
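
To sanity-check a recorded session, the arrays can be inspected directly with NumPy. The filename and array keys below (`features`, `labels`) are illustrative assumptions, not verified against `collect_features.py`:

```python
import numpy as np

# Hypothetical session file and array keys -- adjust to what
# collect_features.py actually writes.
session = np.load("data/collected_alice/session_001.npz")
X, y = session["features"], session["labels"]
print(X.shape)       # (n_frames, n_features)
print(y.mean())      # fraction of frames labelled focused
```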

9 participants recorded 5-10 min sessions across varied environments (144,793 frames total, 61.5% focused / 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.

Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link

The raw participant dataset is excluded from this submission (coursework policy and privacy constraints). It can be shared with module staff on request: https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing


## Pipeline

```text
Webcam frame
  --> MediaPipe Face Mesh (478 landmarks)
    --> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
    --> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
    --> Gaze ratios: h_gaze, v_gaze, gaze_offset
    --> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
  --> 17 features --> select 10 --> clip to physiological bounds
  --> ML model (MLP / XGBoost) or geometric scorer
  --> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
  --> FOCUSED / UNFOCUSED
```
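
The smoothing stage is a plain exponential moving average with direction-dependent alphas. A minimal sketch, assuming `alpha_up` applies when the score rises (the repo's exact convention may differ):

```python
ALPHA_UP, ALPHA_DOWN = 0.55, 0.45   # values from the diagram above

def smooth(prev: float, new: float) -> float:
    # Asymmetric EMA: the score tracks upward changes slightly faster than
    # downward ones, which damps flicker from single noisy frames.
    alpha = ALPHA_UP if new > prev else ALPHA_DOWN
    return alpha * new + (1.0 - alpha) * prev
```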

Five runtime modes share the same feature extraction backbone:

| Mode | Description |
| --- | --- |
| Geometric | Deterministic scoring: `0.7 * s_face + 0.3 * s_eye`, cosine decay with `max_angle = 22°` (sketched below) |
| XGBoost | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| MLP | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| Hybrid | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| L2CS | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |
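
A sketch of the geometric scorer from the first row. The weighted sum is stated above; the exact cosine-decay form is an assumption:

```python
import math

MAX_ANGLE = 22.0  # degrees; beyond this the face score bottoms out

def face_score(head_deviation_deg: float) -> float:
    # Assumed decay shape: 1.0 at zero deviation, 0.0 at MAX_ANGLE.
    if head_deviation_deg >= MAX_ANGLE:
        return 0.0
    return math.cos(math.pi / 2 * head_deviation_deg / MAX_ANGLE)

def geometric_score(s_face: float, s_eye: float) -> float:
    # Deterministic fusion of head orientation and eye openness.
    return 0.7 * s_face + 0.3 * s_eye
```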

Any mode can be combined with L2CS Boost mode (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces a near-zero L2CS score via cosine decay, acting as a soft veto; see the sketch below.
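
In code terms, the boost fusion reduces to a weighted sum plus a threshold; this sketch only restates the numbers above:

```python
L2CS_WEIGHT, FUSED_THRESHOLD = 0.65, 0.52

def boosted_decision(base_score: float, l2cs_score: float) -> bool:
    # 35% base model + 65% L2CS; off-screen gaze drives l2cs_score toward 0
    # via cosine decay, pulling even a confident base model below threshold.
    fused = (1.0 - L2CS_WEIGHT) * base_score + L2CS_WEIGHT * l2cs_score
    return fused >= FUSED_THRESHOLD   # True -> FOCUSED
```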


## Training

Both scripts read all hyperparameters from `config/default.yaml`.

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

Outputs: `checkpoints/` (model + scaler) and `evaluation/logs/` (CSVs, JSON summaries).

### ClearML experiment tracking

```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```

Logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).
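
For orientation, the logging pattern looks roughly like this. A generic ClearML sketch, not the project's actual training code; the hyperparameter dict is illustrative:

```python
from clearml import Task

task = Task.init(project_name="FocusGuards Large Group Project",
                 task_name="mlp-train")
task.connect({"lr": 1e-3, "epochs": 50})   # appears under Hyperparameters in the UI
task.upload_artifact("config", artifact_object="config/default.yaml")
```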

Reference experiment IDs:

| Model | ClearML experiment ID |
| --- | --- |
| MLP (`models.mlp.train`) | 3899b5aa0c3348b28213a3194322cdf7 |
| XGBoost (`models.xgboost.train`) | c0ceb8e7e8194a51a7a31078cc47775c |

## Evaluation

```bash
python -m evaluation.justify_thresholds          # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark     # pooled vs LOPO comparison
python -m evaluation.feature_importance          # XGBoost gain + LOFO ablation
```

### Results (pooled random split, 15% test)

| Model | Accuracy | F1 | ROC-AUC |
| --- | --- | --- | --- |
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |

### Results (LOPO, 9 participants)

| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
| --- | --- | --- | --- |
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |

Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820). Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).

The ~12-percentage-point drop from the pooled split to LOPO reflects temporal data leakage in pooled evaluation and confirms leave-one-participant-out (LOPO) as the primary generalisation metric.
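
LOPO means each fold trains on eight participants and tests on the ninth, so no participant's frames appear on both sides of the split. scikit-learn's `LeaveOneGroupOut` expresses this directly (toy data below):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.random.rand(100, 10)               # toy feature matrix
y = np.random.randint(0, 2, size=100)     # toy binary labels
groups = np.repeat(np.arange(5), 20)      # participant ID per frame (5 here)

for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    held_out = groups[test_idx][0]        # participant excluded from training
    print(f"fold holds out participant {held_out}: {len(test_idx)} test frames")
```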

### Feature ablation

| Channel subset | Mean LOPO F1 |
| --- | --- |
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |

Top-5 XGBoost gain: `s_face` (10.27), `ear_right` (9.54), `head_deviation` (8.83), `ear_avg` (6.96), `perclos` (5.68).
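
The gain numbers come from XGBoost's built-in importances; retrieving them from the saved model is one call (feature names appear only if they were stored at training time, otherwise the keys are `f0`..`f9`):

```python
import xgboost as xgb

booster = xgb.Booster()
booster.load_model("checkpoints/xgboost_face_orientation_best.json")
gain = booster.get_score(importance_type="gain")   # {feature: average gain}
print(sorted(gain.items(), key=lambda kv: -kv[1])[:5])
```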


## L2CS Gaze Tracking

L2CS-Net predicts where your eyes are looking, not just where your head is pointed, catching the scenario where the head faces the screen but eyes wander.

**Standalone mode:** Select L2CS as the model.

**Boost mode:** Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.

**Calibration:** Click Calibrate during a session. A fullscreen overlay shows 9 target dots (3x3 grid). After all 9 points, a degree-2 polynomial maps gaze angles to screen coordinates, with IQR outlier filtering and centre-point bias correction (sketched below).
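
A minimal sketch of the degree-2 fit, one polynomial per screen axis; the exact term set and the IQR/bias-correction steps in `gaze_calibration.py` are assumptions not reproduced here:

```python
import numpy as np

def fit_axis(yaw: np.ndarray, pitch: np.ndarray, screen: np.ndarray) -> np.ndarray:
    # Least-squares fit of screen coordinate = poly2(yaw, pitch),
    # using all degree-2 terms in the two gaze angles.
    A = np.stack([np.ones_like(yaw), yaw, pitch,
                  yaw * pitch, yaw**2, pitch**2], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, screen, rcond=None)
    return coeffs

# One fit for x and one for y, over the 9 calibration samples.
```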

L2CS weight lookup order at runtime:

1. `checkpoints/L2CSNet_gaze360.pkl`
2. `models/L2CS-Net/models/L2CSNet_gaze360.pkl`
3. `models/L2CSNet_gaze360.pkl`

## Config

All hyperparameters and app settings are in `config/default.yaml`. Override with `FOCUSGUARD_CONFIG=/path/to/custom.yaml`; a sketch of the loading pattern is below.
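
The override mechanism amounts to an environment-variable indirection. A hedged sketch; the real loader lives in `config/__init__.py` and may merge defaults differently:

```python
import os
import yaml

path = os.environ.get("FOCUSGUARD_CONFIG", "config/default.yaml")
with open(path) as f:
    cfg = yaml.safe_load(f)   # plain dict of hyperparameters and app settings
```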


## Tests

Included checks:

- data prep helpers and real split consistency (`test_data_preparation.py`; the split test skips if `data/collected_*/*.npz` is absent)
- feature clipping (`test_models_clip_features.py`)
- pipeline integration (`test_pipeline_integration.py`)
- gaze calibration / fusion diagnostics (`test_gaze_pipeline.py`)
- FastAPI health, settings, sessions (`test_health_endpoint.py`, `test_api_settings.py`, `test_api_sessions.py`)

```bash
pytest
```

Coverage is enabled by default via `pytest.ini` (`--cov` with a terminal report). For an HTML report: `pytest --cov-report=html`.

**Stack:** Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.