# ui

Real-time inference pipelines and demo interface. This package bridges the trained models with live webcam input, producing frame-by-frame focus predictions.
## Pipeline modes

FocusGuard supports five runtime modes, all sharing the same feature-extraction backbone:
| Mode | Pipeline class | What it does |
|---|---|---|
| Geometric | `FaceMeshPipeline` | Deterministic scoring from head pose and eye state. No ML model needed. Fastest option. |
| MLP | `MLPPipeline` | 10 features through the PyTorch MLP (10-64-32-2). Threshold: 0.23 (LOPO Youden's J). |
| XGBoost | `XGBoostPipeline` | 10 features through XGBoost (600 trees). Threshold: 0.28 (LOPO Youden's J). |
| Hybrid | `HybridPipeline` | 30% MLP + 70% geometric ensemble (w_mlp=0.3, alpha=0.7). LOPO F1: 0.841. |
| L2CS | `L2CSPipeline` | Deep gaze estimation via L2CS-Net (ResNet50). Standalone focus scoring from gaze direction. |
Any mode can be combined with L2CS Boost mode (toggled in the UI), which fuses the base score (35%) with the L2CS gaze score (65%) and applies a gaze-based veto for off-screen looks.
## Output smoothing

All pipelines use an asymmetric EMA (`_OutputSmoother`) to stabilise predictions:
| Parameter | Value | Effect |
|---|---|---|
| `alpha_up` | 0.55 | Fast rise: recognises focus quickly |
| `alpha_down` | 0.45 | Slower fall: avoids flicker on brief glances |
| `grace_frames` | 10 (~0.33 s at 30 fps) | Holds score steady when face is briefly occluded |
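The asymmetric EMA can be sketched as follows. Parameter values come from the table above; the class name, method names, and grace-period handling are illustrative, not the real `_OutputSmoother` API:

```python
class AsymmetricEMA:
    """Sketch of asymmetric EMA smoothing with a short occlusion grace period."""

    def __init__(self, alpha_up=0.55, alpha_down=0.45, grace_frames=10):
        self.alpha_up = alpha_up      # fast rise toward focus
        self.alpha_down = alpha_down  # slower fall, resists flicker
        self.grace_frames = grace_frames
        self.value = None             # current smoothed score
        self.missing = 0              # consecutive frames without a face

    def update(self, score):
        """score: raw per-frame focus score in [0, 1], or None if no face detected."""
        if score is None:
            # Hold the last smoothed value for up to grace_frames frames.
            self.missing += 1
            if self.missing <= self.grace_frames and self.value is not None:
                return self.value
            return 0.0
        self.missing = 0
        if self.value is None:
            self.value = score
        else:
            # Rising scores use the larger alpha, falling scores the smaller one.
            alpha = self.alpha_up if score > self.value else self.alpha_down
            self.value = alpha * score + (1 - alpha) * self.value
        return self.value
```

Because `alpha_down < alpha_up`, a single dropped frame or brief glance away pulls the score down more slowly than genuine focus pulls it up.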
## Geometric scoring

`FaceMeshPipeline` computes:

- `s_face`: cosine-decay face orientation score from solvePnP (max_angle=22°, roll down-weighted 50%)
- `s_eye`: EAR-based eye openness score multiplied by iris gaze score
- Combined score: `0.7 * s_face + 0.3 * s_eye` (weights from LOPO grid search)
- MAR yawn veto: MAR > 0.55 overrides to unfocused
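A minimal sketch of this scoring rule, using the weights and thresholds listed above. The exact cosine-decay shape and how yaw/pitch/roll are aggregated into a single angle are assumptions; only the 0.7/0.3 weights, the 22° limit, the 50% roll down-weighting, and the MAR 0.55 veto come from the source:

```python
import math

def cosine_decay(angle_deg, max_angle=22.0):
    """1.0 when facing the camera, smoothly decaying to 0 at max_angle (assumed shape)."""
    a = min(abs(angle_deg), max_angle)
    return 0.5 * (1 + math.cos(math.pi * a / max_angle))

def geometric_score(yaw, pitch, roll, s_eye, mar):
    """Combine head-pose and eye scores per the list above (aggregation is illustrative)."""
    # Roll is down-weighted to 50% of its measured magnitude.
    worst_angle = max(abs(yaw), abs(pitch), 0.5 * abs(roll))
    s_face = cosine_decay(worst_angle)
    if mar > 0.55:          # yawn veto: force unfocused regardless of pose
        return 0.0
    return 0.7 * s_face + 0.3 * s_eye
```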
## L2CS Boost mode
When enabled alongside any base model:
- L2CS-Net predicts gaze yaw/pitch from the face crop
- Calibrated gaze is mapped to screen coordinates (if calibration was done)
- Fusion: `0.35 * base_score + 0.65 * l2cs_score` with fused threshold 0.52
- Off-screen gaze produces a near-zero L2CS score via cosine decay, dragging the fused score below threshold (soft veto)
This catches the key edge case where head faces the screen but eyes wander to a second monitor or phone.
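The fusion step reduces to one weighted sum plus a threshold; a sketch, with the weights and threshold from the section above (the function name is hypothetical):

```python
def fuse_scores(base_score, l2cs_score, w_l2cs=0.65, threshold=0.52):
    """Fuse a base pipeline score with the L2CS gaze score (sketch).

    Off-screen gaze yields an l2cs_score near 0 (via cosine decay), so the
    fused score falls below the threshold: a soft veto rather than a hard
    override. Returns (fused_score, is_focused).
    """
    fused = (1 - w_l2cs) * base_score + w_l2cs * l2cs_score
    return fused, fused >= threshold
```

Even a confident base score of 0.9 cannot keep the fused score above 0.52 when the gaze score collapses, which is exactly the second-monitor case described above.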
## Files

| File | Purpose |
|---|---|
| `pipeline.py` | All pipeline classes, feature clipping, output smoothing, hybrid config, runtime feature engine |
| `live_demo.py` | OpenCV webcam demo with real-time overlay (bounding box, mesh, gaze lines, score bar) |
## Local demo

```bash
python ui/live_demo.py        # MLP (default)
python ui/live_demo.py --xgb  # XGBoost
```
Controls: `m` cycles the mesh overlay, `1`-`5` switch pipeline modes, `q` quits.
## Web application

The full web app (React frontend + FastAPI backend) runs from `main.py` in the project root:
- WebSocket (`/ws/video`): frame-slot architecture, only the most recent frame is processed, stale frames are dropped
- WebRTC (`/api/webrtc/offer`): SDP exchange + ICE gathering for lower-latency streaming
- Inference offloaded to `ThreadPoolExecutor` (4 workers, per-pipeline locks)
- SQLite database persists sessions and per-frame events via `EventBuffer` (flushes every 2 s)
- Frontend pages: focus tracking with live overlays, session records, achievements/gamification, model customisation, 9-point gaze calibration, help documentation
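The frame-slot idea behind `/ws/video` can be sketched in a few lines: a single-slot buffer where each incoming frame overwrites any unprocessed predecessor, so the inference worker always sees the latest frame. The class name, asyncio usage, and drop counter are assumptions for illustration:

```python
import asyncio

class FrameSlot:
    """Single-slot frame buffer (sketch): new frames overwrite stale ones."""

    def __init__(self):
        self._frame = None
        self._event = asyncio.Event()
        self.dropped = 0  # stale frames replaced before processing

    def put(self, frame):
        if self._frame is not None:
            self.dropped += 1  # previous frame never got processed
        self._frame = frame
        self._event.set()

    async def get(self):
        """Wait for a frame, take it, and empty the slot."""
        await self._event.wait()
        frame, self._frame = self._frame, None
        self._event.clear()
        return frame
```

This keeps latency bounded under load: however far the camera outpaces inference, the worker only ever processes the freshest frame.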
Deployment via Docker: `docker-compose up` (port 7860). Vite builds the frontend statically into FastAPI's static directory. L2CS-Net weights are pulled at runtime via `huggingface_hub`.