---
title: FocusGuard
emoji: 👁️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Real-time webcam focus detection via MediaPipe + MLP/XGBoost
---
# FocusGuard
Real-time webcam-based visual attention estimation. MediaPipe Face Mesh extracts 17 features per frame (EAR, gaze ratios, head pose, PERCLOS), of which 10 are selected and routed through an MLP or XGBoost model for binary focused/unfocused classification. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.
## Team

- Team name: FocusGuards (5CCSAGAP Large Group Project)
- Members: Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas
## Links

### Project access
- Git repository: GAP_Large_project
- Deployed app (Hugging Face): FocusGuard/final_v2
- ClearML experiments: FocusGuards Large Group Project
### Data and checkpoints
- Checkpoints (Google Drive): Download folder
- Dataset (Google Drive): Dataset folder
- Data consent form (PDF): Consent document
The deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).
## Trained models
Model checkpoints are not included in the submission archive. Download them before running inference.
### Option 1: Hugging Face Space
Pre-trained checkpoints are available in the Hugging Face Space files:
https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints
Download the files and place them into `checkpoints/`:
| File | Description |
|---|---|
| mlp_best.pt | PyTorch MLP (10-64-32-2, ~2,850 params) |
| xgboost_face_orientation_best.json | XGBoost (600 trees, depth 8, lr 0.1489) |
| scaler_mlp.joblib | StandardScaler fit on training data |
| hybrid_focus_config.json | Hybrid pipeline fusion weights |
| hybrid_combiner.joblib | Hybrid combiner |
| L2CSNet_gaze360.pkl | L2CS-Net ResNet50 gaze weights (96 MB) |
### Option 2: ClearML
Models are registered as ClearML OutputModels under project "FocusGuards Large Group Project".
| Model | Task ID | Model ID |
|---|---|---|
| MLP | 3899b5aa0c3348b28213a3194322cdf7 | 56f94b799f624bdc845fa50c4d0606fe |
| XGBoost | c0ceb8e7e8194a51a7a31078cc47775c | 6727b8de334f4ca0961c46b436f6fb7c |
UI: Open a task on the experiments page, go to Artifacts > Output Models, and download.
Python:

```python
from clearml import Model

mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy()  # downloads the .pt checkpoint

xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy()  # downloads the .json checkpoint
```
Copy the downloaded files into checkpoints/.
### Option 3: Google Drive (submission fallback)
If ClearML access is restricted, download checkpoints from: https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link
Place all files under checkpoints/.
### Option 4: Retrain from scratch

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

This regenerates `checkpoints/mlp_best.pt`, `checkpoints/xgboost_face_orientation_best.json`, and the scalers. Requires training data under `data/collected_*/`.
## Project layout

```
config/
  default.yaml               hyperparameters, thresholds, app settings
  __init__.py                config loader + ClearML flattener
  clearml_enrich.py          ClearML task enrichment + artifact upload
data_preparation/
  prepare_dataset.py         load/split/scale .npz files (pooled + LOPO)
  data_exploration.ipynb     EDA: distributions, class balance, correlations
models/
  face_mesh.py               MediaPipe 478-point face landmarks
  head_pose.py               yaw/pitch/roll via solvePnP, face-orientation score
  eye_scorer.py              EAR, MAR, gaze ratios, PERCLOS
  collect_features.py        real-time feature extraction + webcam labelling CLI
  gaze_calibration.py        9-point polynomial gaze calibration
  gaze_eye_fusion.py         fuses calibrated gaze with eye openness
  mlp/                       MLP training, eval, Optuna sweep
  xgboost/                   XGBoost training, eval, ClearML + Optuna sweeps
  L2CS-Net/                  vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/                 (excluded from archive; see download instructions above)
notebooks/
  mlp.ipynb                  MLP training + LOPO in Jupyter
  xgboost.ipynb              XGBoost training + LOPO in Jupyter
evaluation/
  justify_thresholds.py      LOPO threshold + weight grid search
  feature_importance.py      XGBoost gain + leave-one-feature-out ablation
  grouped_split_benchmark.py pooled vs LOPO comparison
  plots/                     ROC curves, confusion matrices, weight searches
  logs/                      JSON training logs
tests/
  test_*.py                  unit + integration tests (pytest)
  .coveragerc                coverage config
ui/
  pipeline.py                all 5 pipeline classes + output smoothing
  live_demo.py               OpenCV webcam demo
src/                         React (Vite) frontend source
static/                      built frontend assets (after npm build)
main.py                      FastAPI application entry point
package.json                 frontend package manifest
requirements.txt
pytest.ini
```
## Setup

Recommended versions:

- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)

```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Then download checkpoints (see above).
If you need to rebuild frontend assets locally:
```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```
## Run

### Local OpenCV demo

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb  # XGBoost
```

Controls: `m` cycles the mesh overlay, `1`-`5` switch pipeline mode, `q` quits.
### Web app (without Docker)

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

### Web app (Docker)

```bash
docker-compose up  # serves on port 7860
```
## Data collection

```bash
python -m models.collect_features --name <participant>
```
Records webcam sessions with real-time binary labelling (spacebar toggles focused/unfocused). Saves per-frame feature vectors to data/collected_<participant>/ as .npz files. Raw video is never stored.
9 participants recorded 5-10 min sessions across varied environments (144,793 frames total, 61.5% focused / 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.
Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link

The raw participant dataset is excluded from this submission (coursework policy and privacy constraints). It can be shared with module staff on request: https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing
## Pipeline

```
Webcam frame
  --> MediaPipe Face Mesh (478 landmarks)
  --> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
  --> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
  --> Gaze ratios: h_gaze, v_gaze, gaze_offset
  --> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
  --> 17 features --> select 10 --> clip to physiological bounds
  --> ML model (MLP / XGBoost) or geometric scorer
  --> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
  --> FOCUSED / UNFOCUSED
```
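The asymmetric EMA stage can be sketched in a few lines (a minimal illustration with the alphas above; the actual smoothing lives in `ui/pipeline.py` and may differ in detail):

```python
def smooth(prev: float, raw: float, alpha_up: float = 0.55, alpha_down: float = 0.45) -> float:
    """Asymmetric exponential moving average: rising raw scores are
    blended with alpha_up, falling ones with alpha_down."""
    alpha = alpha_up if raw > prev else alpha_down
    return alpha * raw + (1 - alpha) * prev

# A short stream of raw per-frame scores, smoothed then thresholded.
score = 0.5
for raw in (0.9, 0.9, 0.1, 0.1):
    score = smooth(score, raw)
label = "FOCUSED" if score >= 0.5 else "UNFOCUSED"
```

Because `alpha_up > alpha_down`, recoveries toward "focused" are tracked slightly faster than drops, which damps flicker from single bad frames.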
Five runtime modes share the same feature extraction backbone:
| Mode | Description |
|---|---|
| Geometric | Deterministic scoring: 0.7 * s_face + 0.3 * s_eye, cosine-decay with max_angle=22 deg |
| XGBoost | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| MLP | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| Hybrid | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| L2CS | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |
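The Geometric row above can be made concrete with a small sketch. The exact cosine-decay shape is an assumption for illustration: here the score falls from 1 at zero deviation to 0 at `max_angle` (22 deg) and stays 0 beyond it.

```python
import math

def cosine_decay(angle_deg: float, max_angle_deg: float = 22.0) -> float:
    # Maps |angle| in [0, max_angle] onto [1, 0]; zero at or beyond max_angle.
    a = min(abs(angle_deg), max_angle_deg)
    return math.cos(math.pi / 2.0 * a / max_angle_deg)

def geometric_score(head_deviation_deg: float, s_eye: float) -> float:
    # Deterministic focus score: 0.7 * s_face + 0.3 * s_eye.
    s_face = cosine_decay(head_deviation_deg)
    return 0.7 * s_face + 0.3 * s_eye
```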
Any mode can be combined with L2CS Boost mode (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces near-zero L2CS score via cosine decay, acting as a soft veto.
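The Boost fusion and its soft veto can be sketched as follows (the weights and threshold come from the description above; the function shape is illustrative):

```python
def boosted_decision(base: float, l2cs: float,
                     w_l2cs: float = 0.65, threshold: float = 0.52):
    """Score-level fusion: 35% base model + 65% L2CS gaze score.
    Returns the fused score and the binary focused decision."""
    fused = (1.0 - w_l2cs) * base + w_l2cs * l2cs
    return fused, fused >= threshold

# Off-screen gaze drives l2cs toward 0, so even a confident base score
# (e.g. 1.0 -> fused 0.35) lands below threshold: the soft veto.
```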
## Training

Both scripts read all hyperparameters from `config/default.yaml`.

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

Outputs: `checkpoints/` (model + scaler) and `evaluation/logs/` (CSVs, JSON summaries).
### ClearML experiment tracking

```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```
Logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).
Reference experiment IDs:

| Model | ClearML experiment ID |
|---|---|
| MLP (`models.mlp.train`) | 3899b5aa0c3348b28213a3194322cdf7 |
| XGBoost (`models.xgboost.train`) | c0ceb8e7e8194a51a7a31078cc47775c |
## Evaluation

```bash
python -m evaluation.justify_thresholds       # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark  # pooled vs LOPO comparison
python -m evaluation.feature_importance       # XGBoost gain + LOFO ablation
```
### Results (pooled random split, 15% test)
| Model | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |
### Results (LOPO, 9 participants)
| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
|---|---|---|---|
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |
Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820). Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).
The ~12 pp drop from pooled to LOPO reflects temporal data leakage and confirms LOPO as the primary generalisation metric.
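LOPO simply means each participant's frames form one held-out test fold, which prevents frames from the same session landing in both train and test. A minimal split generator (illustrative; the project's splitting lives in `data_preparation/prepare_dataset.py`):

```python
def lopo_splits(participants):
    """Yield (held_out_id, train_idx, test_idx) with each participant
    serving as the held-out test fold exactly once."""
    for held_out in sorted(set(participants)):
        train = [i for i, p in enumerate(participants) if p != held_out]
        test = [i for i, p in enumerate(participants) if p == held_out]
        yield held_out, train, test
```

With 9 participants this produces 9 folds, matching the LOPO tables above.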
### Feature ablation
| Channel subset | Mean LOPO F1 |
|---|---|
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |
Top-5 XGBoost gain: s_face (10.27), ear_right (9.54), head_deviation (8.83), ear_avg (6.96), perclos (5.68).
## L2CS Gaze Tracking
L2CS-Net predicts where your eyes are looking, not just where your head is pointed, catching the scenario where the head faces the screen but eyes wander.
**Standalone mode:** Select L2CS as the model.

**Boost mode:** Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.

**Calibration:** Click Calibrate during a session. A fullscreen overlay shows 9 target dots (3x3 grid). After all 9 points, a degree-2 polynomial maps gaze angles to screen coordinates with IQR outlier filtering and centre-point bias correction.
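The degree-2 fit can be sketched as a least-squares problem over the 9 calibration samples (a minimal illustration; IQR filtering and bias correction are omitted, and function names are hypothetical):

```python
import numpy as np

def fit_gaze_map(angles, screen_xy):
    """Least-squares fit of a degree-2 polynomial from (yaw, pitch) gaze
    angles to screen coordinates, using the 9 calibration samples."""
    u, v = np.asarray(angles, dtype=float).T
    A = np.column_stack([np.ones_like(u), u, v, u * v, u**2, v**2])
    coef, *_ = np.linalg.lstsq(A, np.asarray(screen_xy, dtype=float), rcond=None)
    return coef  # shape (6, 2): one coefficient column per screen axis

def apply_gaze_map(coef, yaw, pitch):
    feats = np.array([1.0, yaw, pitch, yaw * pitch, yaw**2, pitch**2])
    return feats @ coef  # predicted (x, y) screen coordinates
```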
L2CS weight lookup order at runtime:

1. `checkpoints/L2CSNet_gaze360.pkl`
2. `models/L2CS-Net/models/L2CSNet_gaze360.pkl`
3. `models/L2CSNet_gaze360.pkl`
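The lookup is a first-match search over those candidate paths; a minimal sketch (the function name is illustrative):

```python
from pathlib import Path

L2CS_CANDIDATES = (
    Path("checkpoints/L2CSNet_gaze360.pkl"),
    Path("models/L2CS-Net/models/L2CSNet_gaze360.pkl"),
    Path("models/L2CSNet_gaze360.pkl"),
)

def resolve_l2cs_weights(candidates=L2CS_CANDIDATES):
    """Return the first candidate path that exists on disk, or None."""
    for path in candidates:
        if path.is_file():
            return path
    return None
```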
## Config

All hyperparameters and app settings are in `config/default.yaml`. Override with `FOCUSGUARD_CONFIG=/path/to/custom.yaml`.
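The override resolution might look like the sketch below (illustrative only; the real loader in `config/__init__.py` also parses the YAML):

```python
import os

def resolve_config_path(default: str = "config/default.yaml") -> str:
    """FOCUSGUARD_CONFIG, when set, points at a custom YAML file that
    replaces the default configuration path."""
    return os.environ.get("FOCUSGUARD_CONFIG", default)
```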
## Tests

Included checks:

- data prep helpers and real split consistency (`test_data_preparation.py`; the split test skips if `data/collected_*/*.npz` is absent)
- feature clipping (`test_models_clip_features.py`)
- pipeline integration (`test_pipeline_integration.py`)
- gaze calibration / fusion diagnostics (`test_gaze_pipeline.py`)
- FastAPI health, settings, sessions (`test_health_endpoint.py`, `test_api_settings.py`, `test_api_sessions.py`)
```bash
pytest
```

Coverage is enabled by default via `pytest.ini` (`--cov` / terminal report). For an HTML report: `pytest --cov-report=html`.
Stack: Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.
