# ui

Real-time inference pipelines and demo interface. This package bridges the trained models with live webcam input, producing frame-by-frame focus predictions.
## Pipeline modes

FocusGuard supports five runtime modes, all sharing the same feature-extraction backbone:
| Mode | Pipeline class | What it does |
|---|---|---|
| Geometric | `FaceMeshPipeline` | Deterministic scoring from head pose and eye state. No ML model needed. Fastest option. |
| MLP | `MLPPipeline` | 10 features through the PyTorch MLP (10-64-32-2). Threshold: 0.23 (LOPO Youden's J). |
| XGBoost | `XGBoostPipeline` | 10 features through XGBoost (600 trees). Threshold: 0.28 (LOPO Youden's J). |
| Hybrid | `HybridPipeline` | 30% MLP + 70% geometric ensemble (w_mlp=0.3, alpha=0.7). LOPO F1: 0.841. |
| L2CS | `L2CSPipeline` | Deep gaze estimation via L2CS-Net (ResNet50). Standalone focus scoring from gaze direction. |
Any mode can be combined with L2CS Boost mode (toggled in the UI), which fuses the base score (35%) with the L2CS gaze score (65%) and applies a gaze-based veto for off-screen looks.
## Output smoothing

All pipelines use an asymmetric EMA (`_OutputSmoother`) to stabilise predictions:
| Parameter | Value | Effect |
|---|---|---|
| `alpha_up` | 0.55 | Fast rise: recognises focus quickly |
| `alpha_down` | 0.45 | Slower fall: avoids flicker on brief glances |
| `grace_frames` | 10 (~0.33 s at 30 fps) | Holds score steady when face is briefly occluded |
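The asymmetric EMA can be sketched as follows. Parameter values come from the table above; the class name, method names, and grace-period handling are illustrative, not the real `_OutputSmoother` API:

```python
class AsymmetricEMA:
    """Sketch of asymmetric EMA smoothing with a short occlusion grace period."""

    def __init__(self, alpha_up=0.55, alpha_down=0.45, grace_frames=10):
        self.alpha_up = alpha_up      # fast rise toward focus
        self.alpha_down = alpha_down  # slower fall, resists flicker
        self.grace_frames = grace_frames
        self.value = None             # current smoothed score
        self.missing = 0              # consecutive frames without a face

    def update(self, score):
        """score: raw per-frame focus score in [0, 1], or None if no face detected."""
        if score is None:
            # Hold the last smoothed value for up to grace_frames frames.
            self.missing += 1
            if self.missing <= self.grace_frames and self.value is not None:
                return self.value
            return 0.0
        self.missing = 0
        if self.value is None:
            self.value = score
        else:
            # Rising scores use the larger alpha, falling scores the smaller one.
            alpha = self.alpha_up if score > self.value else self.alpha_down
            self.value = alpha * score + (1 - alpha) * self.value
        return self.value
```

Because `alpha_down < alpha_up`, a single dropped frame or brief glance away pulls the score down more slowly than genuine focus pulls it up.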
## Geometric scoring

`FaceMeshPipeline` computes:

- `s_face`: cosine-decay face orientation score from solvePnP (max_angle=22°, roll down-weighted 50%)
- `s_eye`: EAR-based eye openness score multiplied by iris gaze score
- Combined score: `0.7 * s_face + 0.3 * s_eye` (weights from LOPO grid search)
- MAR yawn veto: MAR > 0.55 overrides to unfocused
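A minimal sketch of this scoring rule, using the weights and thresholds listed above. The exact cosine-decay shape and how yaw/pitch/roll are aggregated into a single angle are assumptions; only the 0.7/0.3 weights, the 22° limit, the 50% roll down-weighting, and the MAR 0.55 veto come from the source:

```python
import math

def cosine_decay(angle_deg, max_angle=22.0):
    """1.0 when facing the camera, smoothly decaying to 0 at max_angle (assumed shape)."""
    a = min(abs(angle_deg), max_angle)
    return 0.5 * (1 + math.cos(math.pi * a / max_angle))

def geometric_score(yaw, pitch, roll, s_eye, mar):
    """Combine head-pose and eye scores per the list above (aggregation is illustrative)."""
    # Roll is down-weighted to 50% of its measured magnitude.
    worst_angle = max(abs(yaw), abs(pitch), 0.5 * abs(roll))
    s_face = cosine_decay(worst_angle)
    if mar > 0.55:          # yawn veto: force unfocused regardless of pose
        return 0.0
    return 0.7 * s_face + 0.3 * s_eye
```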
## L2CS Boost mode
When enabled alongside any base model:
- L2CS-Net predicts gaze yaw/pitch from the face crop
- Calibrated gaze is mapped to screen coordinates (if calibration was done)
- Fusion: `0.35 * base_score + 0.65 * l2cs_score` with fused threshold 0.52
- Off-screen gaze produces a near-zero L2CS score via cosine decay, dragging the fused score below threshold (soft veto)
This catches the key edge case where head faces the screen but eyes wander to a second monitor or phone.
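The fusion step reduces to one weighted sum plus a threshold; a sketch, with the weights and threshold from the section above (the function name is hypothetical):

```python
def fuse_scores(base_score, l2cs_score, w_l2cs=0.65, threshold=0.52):
    """Fuse a base pipeline score with the L2CS gaze score (sketch).

    Off-screen gaze yields an l2cs_score near 0 (via cosine decay), so the
    fused score falls below the threshold: a soft veto rather than a hard
    override. Returns (fused_score, is_focused).
    """
    fused = (1 - w_l2cs) * base_score + w_l2cs * l2cs_score
    return fused, fused >= threshold
```

Even a confident base score of 0.9 cannot keep the fused score above 0.52 when the gaze score collapses, which is exactly the second-monitor case described above.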
## Files

| File | Purpose |
|---|---|
| `pipeline.py` | All pipeline classes, feature clipping, output smoothing, hybrid config, runtime feature engine |
| `live_demo.py` | OpenCV webcam demo with real-time overlay (bounding box, mesh, gaze lines, score bar) |
## Local demo

```bash
python ui/live_demo.py        # MLP (default)
python ui/live_demo.py --xgb  # XGBoost
```
Controls: `m` cycles the mesh overlay, `1`-`5` switch pipeline modes, `q` quits.
## Web application

The full web app (React frontend + FastAPI backend) runs from `main.py` in the project root:
- WebSocket (`/ws/video`): frame-slot architecture, only the most recent frame is processed, stale frames are dropped
- WebRTC (`/api/webrtc/offer`): SDP exchange + ICE gathering for lower-latency streaming
- Inference offloaded to `ThreadPoolExecutor` (4 workers, per-pipeline locks)
- SQLite database persists sessions and per-frame events via `EventBuffer` (flushes every 2 s)
- Frontend pages: focus tracking with live overlays, session records, achievements/gamification, model customisation, 9-point gaze calibration, help documentation
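The frame-slot idea behind `/ws/video` can be sketched in a few lines: a single-slot buffer where each incoming frame overwrites any unprocessed predecessor, so the inference worker always sees the latest frame. The class name, asyncio usage, and drop counter are assumptions for illustration:

```python
import asyncio

class FrameSlot:
    """Single-slot frame buffer (sketch): new frames overwrite stale ones."""

    def __init__(self):
        self._frame = None
        self._event = asyncio.Event()
        self.dropped = 0  # stale frames replaced before processing

    def put(self, frame):
        if self._frame is not None:
            self.dropped += 1  # previous frame never got processed
        self._frame = frame
        self._event.set()

    async def get(self):
        """Wait for a frame, take it, and empty the slot."""
        await self._event.wait()
        frame, self._frame = self._frame, None
        self._event.clear()
        return frame
```

This keeps latency bounded under load: however far the camera outpaces inference, the worker only ever processes the freshest frame.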
Deployment via Docker: `docker-compose up` (port 7860). Vite builds the frontend statically into FastAPI's static directory. L2CS-Net weights are pulled at runtime via `huggingface_hub`.