---
title: FocusGuard
emoji: 👁️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Real-time webcam focus detection via MediaPipe + MLP/XGBoost
---

# FocusGuard

Real-time webcam-based visual attention estimation. MediaPipe Face Mesh extracts 17 features (EAR, gaze ratios, head pose, PERCLOS) per frame, selects 10, and routes them through MLP or XGBoost for binary focused/unfocused classification. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.

![Real-time focus detection with face mesh and XGBoost classification](assets/focusguard-demo.gif)

---

## Team

**Team name:** FocusGuards (5CCSAGAP Large Group Project)

**Members:** Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas

---

## Links

### Project access

- Git repository: [GAP_Large_project](https://github.kcl.ac.uk/k23172173/GAP_Large_project)
- Deployed app (Hugging Face): [FocusGuard/final_v2](https://huggingface.co/spaces/FocusGuard/final_v2)
- ClearML experiments: [FocusGuards Large Group Project](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments)

### Data and checkpoints

- Checkpoints (Google Drive): [Download folder](https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link)
- Dataset (Google Drive): [Dataset folder](https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing)
- Data consent form (PDF): [Consent document](https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link)

The deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).

---

## Trained models

Model checkpoints are **not included** in the submission archive. Download them before running inference.
### Option 1: Hugging Face Space

Pre-trained checkpoints are available in the Hugging Face Space files:

```
https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints
```

Download and place into `checkpoints/`:

| File | Description |
|------|-------------|
| `mlp_best.pt` | PyTorch MLP (10-64-32-2, ~2,850 params) |
| `xgboost_face_orientation_best.json` | XGBoost (600 trees, depth 8, lr 0.1489) |
| `scaler_mlp.joblib` | StandardScaler fit on training data |
| `hybrid_focus_config.json` | Hybrid pipeline fusion weights |
| `hybrid_combiner.joblib` | Hybrid combiner |
| `L2CSNet_gaze360.pkl` | L2CS-Net ResNet50 gaze weights (96 MB) |

### Option 2: ClearML

Models are registered as ClearML OutputModels under project "FocusGuards Large Group Project".

| Model | Task ID | Model ID |
|-------|---------|----------|
| MLP | `3899b5aa0c3348b28213a3194322cdf7` | `56f94b799f624bdc845fa50c4d0606fe` |
| XGBoost | `c0ceb8e7e8194a51a7a31078cc47775c` | `6727b8de334f4ca0961c46b436f6fb7c` |

**UI:** Open a task on the [experiments page](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments), go to Artifacts > Output Models, and download.

**Python:**

```python
from clearml import Model

mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy()  # downloads .pt

xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy()  # downloads .json
```

Copy the downloaded files into `checkpoints/`.

### Option 3: Google Drive (submission fallback)

If ClearML access is restricted, download checkpoints from:

https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link

Place all files under `checkpoints/`.

### Option 4: Retrain from scratch

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

This regenerates `checkpoints/mlp_best.pt`, `checkpoints/xgboost_face_orientation_best.json`, and scalers. Requires training data under `data/collected_*/`.
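For orientation, `hybrid_focus_config.json` (see the checkpoint table above) stores the hybrid pipeline's fusion weights, and the LOPO search reported under Evaluation found an MLP weight of 0.3. The following is a minimal sketch of score-level fusion under those numbers — the key name `w_mlp`, the fallback default, and the illustrative threshold are assumptions, not the repository's actual schema:

```python
import json
from pathlib import Path

def fuse_hybrid(mlp_score: float, geo_score: float, w_mlp: float = 0.3) -> float:
    """Score-level fusion: w_mlp * MLP probability + (1 - w_mlp) * geometric score."""
    return w_mlp * mlp_score + (1.0 - w_mlp) * geo_score

def load_w_mlp(config_path: str = "checkpoints/hybrid_focus_config.json") -> float:
    """Read the MLP weight from the fusion config.

    The key name "w_mlp" is an assumption; falls back to the LOPO-optimal 0.3.
    """
    p = Path(config_path)
    if p.is_file():
        return json.loads(p.read_text()).get("w_mlp", 0.3)
    return 0.3

score = fuse_hybrid(mlp_score=0.9, geo_score=0.6, w_mlp=load_w_mlp())
focused = score >= 0.5  # illustrative threshold; the deployed per-mode thresholds differ
```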
---

## Project layout

```
config/
  default.yaml              hyperparameters, thresholds, app settings
  __init__.py               config loader + ClearML flattener
  clearml_enrich.py         ClearML task enrichment + artifact upload
data_preparation/
  prepare_dataset.py        load/split/scale .npz files (pooled + LOPO)
  data_exploration.ipynb    EDA: distributions, class balance, correlations
models/
  face_mesh.py              MediaPipe 478-point face landmarks
  head_pose.py              yaw/pitch/roll via solvePnP, face-orientation score
  eye_scorer.py             EAR, MAR, gaze ratios, PERCLOS
  collect_features.py       real-time feature extraction + webcam labelling CLI
  gaze_calibration.py       9-point polynomial gaze calibration
  gaze_eye_fusion.py        fuses calibrated gaze with eye openness
  mlp/                      MLP training, eval, Optuna sweep
  xgboost/                  XGBoost training, eval, ClearML + Optuna sweeps
  L2CS-Net/                 vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/                (excluded from archive; see download instructions above)
notebooks/
  mlp.ipynb                 MLP training + LOPO in Jupyter
  xgboost.ipynb             XGBoost training + LOPO in Jupyter
evaluation/
  justify_thresholds.py     LOPO threshold + weight grid search
  feature_importance.py     XGBoost gain + leave-one-feature-out ablation
  grouped_split_benchmark.py  pooled vs LOPO comparison
  plots/                    ROC curves, confusion matrices, weight searches
  logs/                     JSON training logs
tests/
  test_*.py                 unit + integration tests (pytest)
  .coveragerc               coverage config
ui/
  pipeline.py               all 5 pipeline classes + output smoothing
  live_demo.py              OpenCV webcam demo
src/                        React (Vite) frontend source
static/                     built frontend assets (after npm build)
main.py                     FastAPI application entry point
package.json                frontend package manifest
requirements.txt
pytest.ini
```

---

## Setup

Recommended versions:

- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)

```bash
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Then download checkpoints (see above).
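A quick sanity check that the downloaded checkpoints are actually in place can save a failed first run. A minimal sketch (file names taken from the checkpoint table under Trained models; `L2CSNet_gaze360.pkl` is omitted here since it is only needed for L2CS modes, and this helper is not part of the repository):

```python
from pathlib import Path

# Core checkpoint files, per the Trained models table.
REQUIRED = [
    "mlp_best.pt",
    "xgboost_face_orientation_best.json",
    "scaler_mlp.joblib",
    "hybrid_focus_config.json",
    "hybrid_combiner.joblib",
]

def missing_checkpoints(ckpt_dir: str = "checkpoints", required=REQUIRED) -> list[str]:
    """Return the required checkpoint files not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in required if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All core checkpoints found.")
```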
If you need to rebuild frontend assets locally:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```

---

## Run

### Local OpenCV demo

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb   # XGBoost
```

Controls: `m` cycle mesh overlay, `1-5` switch pipeline mode, `q` quit.

### Web app (without Docker)

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Open http://localhost:7860

### Web app (Docker)

```bash
docker-compose up   # serves on port 7860
```

---

## Data collection

```bash
python -m models.collect_features --name
```

Records webcam sessions with real-time binary labelling (spacebar toggles focused/unfocused). Saves per-frame feature vectors to `data/collected_/` as `.npz` files. Raw video is never stored.

9 participants recorded 5-10 min sessions across varied environments (144,793 frames total, 61.5% focused / 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.

Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link

The raw participant dataset is excluded from this submission (coursework policy and privacy constraints).
The dataset can be shared with module staff on request: https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing

---

## Pipeline

```
Webcam frame
  --> MediaPipe Face Mesh (478 landmarks)
  --> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
  --> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
  --> Gaze ratios: h_gaze, v_gaze, gaze_offset
  --> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
  --> 17 features --> select 10 --> clip to physiological bounds
  --> ML model (MLP / XGBoost) or geometric scorer
  --> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
  --> FOCUSED / UNFOCUSED
```

Five runtime modes share the same feature extraction backbone:

| Mode | Description |
|------|-------------|
| **Geometric** | Deterministic scoring: 0.7 * s_face + 0.3 * s_eye, cosine decay with max_angle=22 deg |
| **XGBoost** | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| **MLP** | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| **Hybrid** | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| **L2CS** | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |

Any mode can be combined with L2CS Boost mode (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces a near-zero L2CS score via cosine decay, acting as a soft veto.

---

## Training

Both scripts read all hyperparameters from `config/default.yaml`.

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

Outputs: `checkpoints/` (model + scaler) and `evaluation/logs/` (CSVs, JSON summaries).

### ClearML experiment tracking

```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```

Logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).
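The `USE_CLEARML=1` gate and per-epoch scalar reporting can be sketched as follows. This is not the project's actual training loop: the helper names are made up, and `clearml` is imported lazily so the snippet also runs offline:

```python
import os

def init_tracking(use_clearml: bool):
    """Mirror the USE_CLEARML=1 gate: return a ClearML task, or None when tracking is off."""
    if not use_clearml:
        return None
    from clearml import Task  # lazy import: only required when tracking is enabled
    return Task.init(project_name="FocusGuards Large Group Project",
                     task_name="mlp-train (sketch)")

def report_epoch(task, epoch: int, train_loss: float, val_f1: float) -> dict:
    """Report per-epoch scalars to ClearML; return them so offline runs can log locally."""
    scalars = {"train_loss": train_loss, "val_f1": val_f1}
    if task is not None:
        logger = task.get_logger()
        for name, value in scalars.items():
            logger.report_scalar(title=name, series="epoch",
                                 value=value, iteration=epoch)
    return scalars

task = init_tracking(os.environ.get("USE_CLEARML") == "1")
print(report_epoch(task, epoch=0, train_loss=0.71, val_f1=0.52))
```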
Reference experiment IDs:

| Model | ClearML experiment ID |
|-------|------------------------|
| MLP (`models.mlp.train`) | `3899b5aa0c3348b28213a3194322cdf7` |
| XGBoost (`models.xgboost.train`) | `c0ceb8e7e8194a51a7a31078cc47775c` |

---

## Evaluation

```bash
python -m evaluation.justify_thresholds       # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark  # pooled vs LOPO comparison
python -m evaluation.feature_importance       # XGBoost gain + LOFO ablation
```

### Results (pooled random split, 15% test)

| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |

### Results (LOPO, 9 participants)

| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
|-------|----------|-----------------------------|----------------------|
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |

Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820). Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).

The ~12 pp drop from pooled to LOPO reflects temporal data leakage and confirms LOPO as the primary generalisation metric.

### Feature ablation

| Channel subset | Mean LOPO F1 |
|----------------|--------------|
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |

Top-5 XGBoost gain: `s_face` (10.27), `ear_right` (9.54), `head_deviation` (8.83), `ear_avg` (6.96), `perclos` (5.68).

---

## L2CS Gaze Tracking

L2CS-Net predicts where your eyes are looking, not just where your head is pointed, catching the scenario where the head faces the screen but the eyes wander.

**Standalone mode:** Select L2CS as the model.

**Boost mode:** Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.

**Calibration:** Click Calibrate during a session.
A fullscreen overlay shows 9 target dots (3x3 grid). After all 9 points, a degree-2 polynomial maps gaze angles to screen coordinates, with IQR outlier filtering and centre-point bias correction.

L2CS weight lookup order at runtime:

1. `checkpoints/L2CSNet_gaze360.pkl`
2. `models/L2CS-Net/models/L2CSNet_gaze360.pkl`
3. `models/L2CSNet_gaze360.pkl`

---

## Config

All hyperparameters and app settings are in `config/default.yaml`. Override with `FOCUSGUARD_CONFIG=/path/to/custom.yaml`.

---

## Tests

Included checks:

- data prep helpers and real split consistency (`test_data_preparation.py`; split test **skips** if `data/collected_*/*.npz` is absent)
- feature clipping (`test_models_clip_features.py`)
- pipeline integration (`test_pipeline_integration.py`)
- gaze calibration / fusion diagnostics (`test_gaze_pipeline.py`)
- FastAPI health, settings, sessions (`test_health_endpoint.py`, `test_api_settings.py`, `test_api_sessions.py`)

```bash
pytest
```

Coverage is enabled by default via `pytest.ini` (`--cov` / term report). For HTML coverage: `pytest --cov-report=html`.

**Stack:** Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.