
# ui

Real-time inference pipelines and demo interface. This package bridges the trained models with live webcam input, producing frame-by-frame focus predictions.

## Pipeline modes

FocusGuard supports five runtime modes, all sharing the same feature extraction backbone:

| Mode | Pipeline class | What it does |
| --- | --- | --- |
| Geometric | `FaceMeshPipeline` | Deterministic scoring from head pose and eye state. No ML model needed. Fastest option. |
| MLP | `MLPPipeline` | 10 features through the PyTorch MLP (10-64-32-2). Threshold: 0.23 (LOPO Youden's J). |
| XGBoost | `XGBoostPipeline` | 10 features through XGBoost (600 trees). Threshold: 0.28 (LOPO Youden's J). |
| Hybrid | `HybridPipeline` | 30% MLP + 70% geometric ensemble (`w_mlp=0.3`, `alpha=0.7`). LOPO F1: 0.841. |
| L2CS | `L2CSPipeline` | Deep gaze estimation via L2CS-Net (ResNet50). Standalone focus scoring from gaze direction. |
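The Hybrid row is a straight weighted ensemble of the MLP probability and the geometric score. A minimal sketch (the function name is hypothetical; only the `w_mlp=0.3` weighting comes from the table above):

```python
def hybrid_score(mlp_prob: float, geometric_score: float, w_mlp: float = 0.3) -> float:
    """HybridPipeline-style ensemble: 30% MLP probability, 70% geometric score.

    Both inputs are assumed to lie in [0, 1].
    """
    return w_mlp * mlp_prob + (1.0 - w_mlp) * geometric_score
```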

Any mode can be combined with L2CS Boost mode (toggled in the UI), which fuses the base score (35%) with the L2CS gaze score (65%) and applies a gaze-based veto for off-screen looks.

## Output smoothing

All pipelines use an asymmetric EMA (`_OutputSmoother`) to stabilise predictions:

| Parameter | Value | Effect |
| --- | --- | --- |
| `alpha_up` | 0.55 | Fast rise: recognises focus quickly |
| `alpha_down` | 0.45 | Slower fall: avoids flicker on brief glances |
| `grace_frames` | 10 (~0.33 s at 30 fps) | Holds score steady when face is briefly occluded |
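The asymmetric EMA can be sketched as below. This is a hypothetical reconstruction of the behaviour described by the table, not the actual `_OutputSmoother` from `pipeline.py`:

```python
class OutputSmoother:
    """Asymmetric EMA: rises fast (alpha_up), falls slowly (alpha_down),
    and holds the last score during brief face loss (grace_frames)."""

    def __init__(self, alpha_up=0.55, alpha_down=0.45, grace_frames=10):
        self.alpha_up = alpha_up
        self.alpha_down = alpha_down
        self.grace_frames = grace_frames
        self._value = None
        self._missing = 0

    def update(self, score):
        # score is None when no face was detected this frame
        if score is None:
            self._missing += 1
            if self._missing <= self.grace_frames:
                return self._value  # hold steady through brief occlusion
            return None  # occlusion too long: report no score
        self._missing = 0
        if self._value is None:
            self._value = score
        else:
            alpha = self.alpha_up if score > self._value else self.alpha_down
            self._value = alpha * score + (1.0 - alpha) * self._value
        return self._value
```

Because `alpha_down < alpha_up`, a single low-scoring frame (a brief glance away) moves the smoothed value less than a high-scoring frame moves it back.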

## Geometric scoring

`FaceMeshPipeline` computes:

- `s_face`: cosine-decay face orientation score from `solvePnP` head pose (`max_angle=22` deg, roll down-weighted 50%)
- `s_eye`: EAR-based eye openness score multiplied by the iris gaze score
- Combined score: `0.7 * s_face + 0.3 * s_eye` (weights from LOPO grid search)
- MAR yawn veto: MAR > 0.55 overrides the score to unfocused
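A sketch of how these pieces combine. The cosine-decay shape and the way the three head-pose axes are merged are assumptions for illustration; only `max_angle=22`, the 50% roll down-weighting, the `0.7/0.3` weights, and the `MAR > 0.55` veto come from the list above:

```python
import math

def cosine_decay(angle_deg, max_angle=22.0):
    """1.0 at 0 degrees, smoothly decaying to 0.0 at max_angle (half-cosine)."""
    a = min(abs(angle_deg), max_angle)
    return 0.5 * (1.0 + math.cos(math.pi * a / max_angle))

def geometric_score(yaw, pitch, roll, s_eye, mar):
    """Combine head pose, eye openness/gaze, and the yawn veto."""
    if mar > 0.55:  # yawn veto overrides everything
        return 0.0
    # Hypothetical axis merge: worst-axis decay, roll counted at half weight
    s_face = min(cosine_decay(yaw), cosine_decay(pitch), cosine_decay(0.5 * roll))
    return 0.7 * s_face + 0.3 * s_eye
```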

## L2CS Boost mode

When enabled alongside any base model:

  1. L2CS-Net predicts gaze yaw/pitch from the face crop.
  2. Calibrated gaze is mapped to screen coordinates (if calibration was done).
  3. Fusion: `0.35 * base_score + 0.65 * l2cs_score`, with a fused threshold of 0.52.
  4. Off-screen gaze produces a near-zero L2CS score via cosine decay, dragging the fused score below the threshold (soft veto).

This catches the key edge case where head faces the screen but eyes wander to a second monitor or phone.
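A sketch of the fusion and soft veto. The gaze-to-score mapping (cosine decay with a hypothetical `max_gaze=25` degrees) is an assumption; the 0.35/0.65 weights and the 0.52 threshold come from the steps above:

```python
import math

def fuse_scores(base_score, gaze_yaw_deg, gaze_pitch_deg,
                w_l2cs=0.65, threshold=0.52, max_gaze=25.0):
    """Fuse a base pipeline score with an L2CS gaze score.

    Off-screen gaze drives the gaze score toward zero, which drags the
    fused score below the threshold -- a soft veto rather than a hard one.
    """
    # Hypothetical gaze scoring: cosine decay on the larger gaze angle
    angle = min(max(abs(gaze_yaw_deg), abs(gaze_pitch_deg)), max_gaze)
    l2cs_score = 0.5 * (1.0 + math.cos(math.pi * angle / max_gaze))
    fused = (1.0 - w_l2cs) * base_score + w_l2cs * l2cs_score
    return fused, fused >= threshold
```

Note how a high base score cannot rescue an off-screen gaze: with `base_score=0.9` and the gaze score at zero, the fused score is only `0.315`, well under the threshold.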

## Files

| File | Purpose |
| --- | --- |
| `pipeline.py` | All pipeline classes, feature clipping, output smoothing, hybrid config, runtime feature engine |
| `live_demo.py` | OpenCV webcam demo with real-time overlay (bounding box, mesh, gaze lines, score bar) |

## Local demo

```shell
python ui/live_demo.py          # MLP (default)
python ui/live_demo.py --xgb    # XGBoost
```

Controls: `m` cycles the mesh overlay, `1`-`5` switch the pipeline mode, `q` quits.

## Web application

The full web app (React frontend + FastAPI backend) runs from `main.py` in the project root:

  - WebSocket (`/ws/video`): frame-slot architecture; only the most recent frame is processed, stale frames are dropped
  - WebRTC (`/api/webrtc/offer`): SDP exchange + ICE gathering for lower-latency streaming
  - Inference is offloaded to a `ThreadPoolExecutor` (4 workers, per-pipeline locks)
  - A SQLite database persists sessions and per-frame events via `EventBuffer` (flushes every 2 s)
  - Frontend pages: focus tracking with live overlays, session records, achievements/gamification, model customisation, 9-point gaze calibration, help documentation
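The frame-slot pattern behind `/ws/video` can be sketched with a single-slot mailbox. This is a minimal illustration of the idea (names are hypothetical, not taken from `main.py`):

```python
import asyncio

class FrameSlot:
    """Single-slot mailbox: put() overwrites any unprocessed frame, so the
    inference worker always sees only the most recent frame."""

    def __init__(self):
        self._frame = None
        self._event = asyncio.Event()
        self.dropped = 0

    def put(self, frame):
        if self._frame is not None:
            self.dropped += 1  # stale frame replaced before it was processed
        self._frame = frame
        self._event.set()

    async def get(self):
        await self._event.wait()
        frame, self._frame = self._frame, None
        self._event.clear()
        return frame
```

When inference is slower than the camera, the worker naturally skips ahead to the newest frame instead of building an ever-growing queue and falling behind real time.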

Deployment via Docker: `docker-compose up` (port 7860). Vite builds the frontend statically into FastAPI's static directory. L2CS-Net weights are pulled at runtime via `huggingface_hub`.