---
title: FocusGuard
emoji: 👁️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Real-time webcam focus detection via MediaPipe + MLP/XGBoost
---
# FocusGuard
Real-time webcam-based visual attention estimation. MediaPipe Face Mesh extracts 17 features per frame (EAR, gaze ratios, head pose, PERCLOS), of which 10 are selected and routed through an MLP or XGBoost model for binary focused/unfocused classification. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.
## Team

- Team name: FocusGuards (5CCSAGAP Large Group Project)
- Members: Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas
## Links

### Project access
- Git repository: GAP_Large_project
- Deployed app (Hugging Face): FocusGuard/final_v2
- ClearML experiments: FocusGuards Large Group Project
### Data and checkpoints
- Checkpoints (Google Drive): Download folder
- Dataset (Google Drive): Dataset folder
- Data consent form (PDF): Consent document
The deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).
## Trained models
Model checkpoints are not included in the submission archive. Download them before running inference.
### Option 1: Hugging Face Space
Pre-trained checkpoints are available in the Hugging Face Space files:
https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints
Download the files and place them into `checkpoints/`:
| File | Description |
|---|---|
| mlp_best.pt | PyTorch MLP (10-64-32-2, ~2,850 params) |
| xgboost_face_orientation_best.json | XGBoost (600 trees, depth 8, lr 0.1489) |
| scaler_mlp.joblib | StandardScaler fit on training data |
| hybrid_focus_config.json | Hybrid pipeline fusion weights |
| hybrid_combiner.joblib | Hybrid combiner |
| L2CSNet_gaze360.pkl | L2CS-Net ResNet50 gaze weights (96 MB) |
### Option 2: ClearML
Models are registered as ClearML OutputModels under project "FocusGuards Large Group Project".
| Model | Task ID | Model ID |
|---|---|---|
| MLP | 3899b5aa0c3348b28213a3194322cdf7 | 56f94b799f624bdc845fa50c4d0606fe |
| XGBoost | c0ceb8e7e8194a51a7a31078cc47775c | 6727b8de334f4ca0961c46b436f6fb7c |
UI: Open a task on the experiments page, go to Artifacts > Output Models, and download.
Python:

```python
from clearml import Model

mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy()  # downloads the .pt checkpoint

xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy()  # downloads the .json checkpoint
```
Copy the downloaded files into checkpoints/.
### Option 3: Google Drive (submission fallback)
If ClearML access is restricted, download checkpoints from: https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link
Place all files under checkpoints/.
### Option 4: Retrain from scratch

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

This regenerates `checkpoints/mlp_best.pt`, `checkpoints/xgboost_face_orientation_best.json`, and the scalers. Requires training data under `data/collected_*/`.
## Project layout

```
config/
  default.yaml               hyperparameters, thresholds, app settings
  __init__.py                config loader + ClearML flattener
  clearml_enrich.py          ClearML task enrichment + artifact upload
data_preparation/
  prepare_dataset.py         load/split/scale .npz files (pooled + LOPO)
  data_exploration.ipynb     EDA: distributions, class balance, correlations
models/
  face_mesh.py               MediaPipe 478-point face landmarks
  head_pose.py               yaw/pitch/roll via solvePnP, face-orientation score
  eye_scorer.py              EAR, MAR, gaze ratios, PERCLOS
  collect_features.py        real-time feature extraction + webcam labelling CLI
  gaze_calibration.py        9-point polynomial gaze calibration
  gaze_eye_fusion.py         fuses calibrated gaze with eye openness
  mlp/                       MLP training, eval, Optuna sweep
  xgboost/                   XGBoost training, eval, ClearML + Optuna sweeps
  L2CS-Net/                  vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/                 (excluded from archive; see download instructions above)
notebooks/
  mlp.ipynb                  MLP training + LOPO in Jupyter
  xgboost.ipynb              XGBoost training + LOPO in Jupyter
evaluation/
  justify_thresholds.py      LOPO threshold + weight grid search
  feature_importance.py      XGBoost gain + leave-one-feature-out ablation
  grouped_split_benchmark.py pooled vs LOPO comparison
  plots/                     ROC curves, confusion matrices, weight searches
  logs/                      JSON training logs
tests/
  test_*.py                  unit + integration tests (pytest)
  .coveragerc                coverage config
ui/
  pipeline.py                all 5 pipeline classes + output smoothing
  live_demo.py               OpenCV webcam demo
src/                         React (Vite) frontend source
static/                      built frontend assets (after npm build)
main.py                      FastAPI application entry point
package.json                 frontend package manifest
requirements.txt
pytest.ini
```
## Setup

Recommended versions:

- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)

```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Then download checkpoints (see above).
If you need to rebuild frontend assets locally:
```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```
## Run

### Local OpenCV demo

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb  # XGBoost
```

Controls: `m` cycles the mesh overlay, `1`-`5` switch pipeline mode, `q` quits.
### Web app (without Docker)

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

### Web app (Docker)

```bash
docker-compose up  # serves on port 7860
```
## Data collection

```bash
python -m models.collect_features --name <participant>
```
Records webcam sessions with real-time binary labelling (spacebar toggles focused/unfocused). Saves per-frame feature vectors to data/collected_<participant>/ as .npz files. Raw video is never stored.
9 participants recorded 5-10 min sessions across varied environments (144,793 frames total, 61.5% focused / 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.
Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link

The raw participant dataset is excluded from this submission (coursework policy and privacy constraints). It can be shared with module staff on request: https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing
## Pipeline

```
Webcam frame
  --> MediaPipe Face Mesh (478 landmarks)
  --> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
  --> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
  --> Gaze ratios: h_gaze, v_gaze, gaze_offset
  --> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
  --> 17 features --> select 10 --> clip to physiological bounds
  --> ML model (MLP / XGBoost) or geometric scorer
  --> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
  --> FOCUSED / UNFOCUSED
```
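The asymmetric EMA stage can be sketched in a few lines (a minimal illustration with the alphas above; the actual smoothing lives in `ui/pipeline.py` and may differ in detail):

```python
def smooth(prev: float, raw: float, alpha_up: float = 0.55, alpha_down: float = 0.45) -> float:
    """Asymmetric exponential moving average: rising raw scores are
    blended with alpha_up, falling ones with alpha_down."""
    alpha = alpha_up if raw > prev else alpha_down
    return alpha * raw + (1 - alpha) * prev

# A short stream of raw per-frame scores, smoothed then thresholded.
score = 0.5
for raw in (0.9, 0.9, 0.1, 0.1):
    score = smooth(score, raw)
label = "FOCUSED" if score >= 0.5 else "UNFOCUSED"
```

Because `alpha_up > alpha_down`, recoveries toward "focused" are tracked slightly faster than drops, which damps flicker from single bad frames.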
Five runtime modes share the same feature extraction backbone:
| Mode | Description |
|---|---|
| Geometric | Deterministic scoring: 0.7 * s_face + 0.3 * s_eye, cosine-decay with max_angle=22 deg |
| XGBoost | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| MLP | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| Hybrid | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| L2CS | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |
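The Geometric row above can be made concrete with a small sketch. The exact cosine-decay shape is an assumption for illustration: here the score falls from 1 at zero deviation to 0 at `max_angle` (22 deg) and stays 0 beyond it.

```python
import math

def cosine_decay(angle_deg: float, max_angle_deg: float = 22.0) -> float:
    # Maps |angle| in [0, max_angle] onto [1, 0]; zero at or beyond max_angle.
    a = min(abs(angle_deg), max_angle_deg)
    return math.cos(math.pi / 2.0 * a / max_angle_deg)

def geometric_score(head_deviation_deg: float, s_eye: float) -> float:
    # Deterministic focus score: 0.7 * s_face + 0.3 * s_eye.
    s_face = cosine_decay(head_deviation_deg)
    return 0.7 * s_face + 0.3 * s_eye
```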
Any mode can be combined with L2CS Boost mode (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces near-zero L2CS score via cosine decay, acting as a soft veto.
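The Boost fusion and its soft veto can be sketched as follows (the weights and threshold come from the description above; the function shape is illustrative):

```python
def boosted_decision(base: float, l2cs: float,
                     w_l2cs: float = 0.65, threshold: float = 0.52):
    """Score-level fusion: 35% base model + 65% L2CS gaze score.
    Returns the fused score and the binary focused decision."""
    fused = (1.0 - w_l2cs) * base + w_l2cs * l2cs
    return fused, fused >= threshold

# Off-screen gaze drives l2cs toward 0, so even a confident base score
# (e.g. 1.0 -> fused 0.35) lands below threshold: the soft veto.
```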
## Training

Both scripts read all hyperparameters from `config/default.yaml`.

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

Outputs: `checkpoints/` (model + scaler) and `evaluation/logs/` (CSVs, JSON summaries).
### ClearML experiment tracking

```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```
Logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).
Reference experiment IDs:

| Model | ClearML experiment ID |
|---|---|
| MLP (`models.mlp.train`) | 3899b5aa0c3348b28213a3194322cdf7 |
| XGBoost (`models.xgboost.train`) | c0ceb8e7e8194a51a7a31078cc47775c |
## Evaluation

```bash
python -m evaluation.justify_thresholds       # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark  # pooled vs LOPO comparison
python -m evaluation.feature_importance       # XGBoost gain + LOFO ablation
```
### Results (pooled random split, 15% test)
| Model | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |
### Results (LOPO, 9 participants)
| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
|---|---|---|---|
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |
Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820). Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).
The ~12 pp drop from pooled to LOPO reflects temporal data leakage and confirms LOPO as the primary generalisation metric.
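LOPO simply means each participant's frames form one held-out test fold, which prevents frames from the same session landing in both train and test. A minimal split generator (illustrative; the project's splitting lives in `data_preparation/prepare_dataset.py`):

```python
def lopo_splits(participants):
    """Yield (held_out_id, train_idx, test_idx) with each participant
    serving as the held-out test fold exactly once."""
    for held_out in sorted(set(participants)):
        train = [i for i, p in enumerate(participants) if p != held_out]
        test = [i for i, p in enumerate(participants) if p == held_out]
        yield held_out, train, test
```

With 9 participants this produces 9 folds, matching the LOPO tables above.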
### Feature ablation
| Channel subset | Mean LOPO F1 |
|---|---|
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |
Top-5 XGBoost gain: s_face (10.27), ear_right (9.54), head_deviation (8.83), ear_avg (6.96), perclos (5.68).
## L2CS Gaze Tracking
L2CS-Net predicts where your eyes are looking, not just where your head is pointed, catching the scenario where the head faces the screen but eyes wander.
**Standalone mode:** Select L2CS as the model.

**Boost mode:** Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.

**Calibration:** Click Calibrate during a session. A fullscreen overlay shows 9 target dots (3x3 grid). After all 9 points, a degree-2 polynomial maps gaze angles to screen coordinates with IQR outlier filtering and centre-point bias correction.
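The degree-2 fit can be sketched as a least-squares problem over the 9 calibration samples (a minimal illustration; IQR filtering and bias correction are omitted, and function names are hypothetical):

```python
import numpy as np

def fit_gaze_map(angles, screen_xy):
    """Least-squares fit of a degree-2 polynomial from (yaw, pitch) gaze
    angles to screen coordinates, using the 9 calibration samples."""
    u, v = np.asarray(angles, dtype=float).T
    A = np.column_stack([np.ones_like(u), u, v, u * v, u**2, v**2])
    coef, *_ = np.linalg.lstsq(A, np.asarray(screen_xy, dtype=float), rcond=None)
    return coef  # shape (6, 2): one coefficient column per screen axis

def apply_gaze_map(coef, yaw, pitch):
    feats = np.array([1.0, yaw, pitch, yaw * pitch, yaw**2, pitch**2])
    return feats @ coef  # predicted (x, y) screen coordinates
```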
L2CS weight lookup order at runtime:

1. `checkpoints/L2CSNet_gaze360.pkl`
2. `models/L2CS-Net/models/L2CSNet_gaze360.pkl`
3. `models/L2CSNet_gaze360.pkl`
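The lookup is a first-match search over those candidate paths; a minimal sketch (the function name is illustrative):

```python
from pathlib import Path

L2CS_CANDIDATES = (
    Path("checkpoints/L2CSNet_gaze360.pkl"),
    Path("models/L2CS-Net/models/L2CSNet_gaze360.pkl"),
    Path("models/L2CSNet_gaze360.pkl"),
)

def resolve_l2cs_weights(candidates=L2CS_CANDIDATES):
    """Return the first candidate path that exists on disk, or None."""
    for path in candidates:
        if path.is_file():
            return path
    return None
```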
## Config

All hyperparameters and app settings are in `config/default.yaml`. Override with `FOCUSGUARD_CONFIG=/path/to/custom.yaml`.
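The override resolution might look like the sketch below (illustrative only; the real loader in `config/__init__.py` also parses the YAML):

```python
import os

def resolve_config_path(default: str = "config/default.yaml") -> str:
    """FOCUSGUARD_CONFIG, when set, points at a custom YAML file that
    replaces the default configuration path."""
    return os.environ.get("FOCUSGUARD_CONFIG", default)
```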
## Tests

Included checks:

- data prep helpers and real split consistency (`test_data_preparation.py`; the split test skips if `data/collected_*/*.npz` is absent)
- feature clipping (`test_models_clip_features.py`)
- pipeline integration (`test_pipeline_integration.py`)
- gaze calibration / fusion diagnostics (`test_gaze_pipeline.py`)
- FastAPI health, settings, sessions (`test_health_endpoint.py`, `test_api_settings.py`, `test_api_sessions.py`)
```bash
pytest
```

Coverage is enabled by default via `pytest.ini` (`--cov` / terminal report). For an HTML report: `pytest --cov-report=html`.
Stack: Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.
