---
title: FocusGuard
sdk: docker
app_port: 7860
---

# FocusGuard

Webcam-based focus detection. A MediaPipe face mesh yields 17 features (EAR, gaze, head pose, PERCLOS, etc.), which an MLP or XGBoost classifier labels as focused or unfocused. The app is React + FastAPI, with video streamed over WebSocket.

## Project layout

```
├── data/                 collected_<name>/*.npz
├── data_preparation/     loaders, split, scale
├── notebooks/            MLP/XGB training + LOPO
├── models/               face_mesh, head_pose, eye_scorer, train scripts
│   ├── gaze_calibration.py   9-point polynomial gaze calibration
│   ├── gaze_eye_fusion.py    Fuses calibrated gaze with eye openness
│   └── L2CS-Net/             In-tree L2CS-Net repo with Gaze360 weights
├── checkpoints/          mlp_best.pt, xgboost_*_best.json, scalers
├── evaluation/           logs, plots, justify_thresholds
├── ui/                   pipeline.py, live_demo.py
├── src/                  React frontend
│   ├── components/
│   │   ├── FocusPageLocal.jsx      Main focus page (camera, controls, model selector)
│   │   └── CalibrationOverlay.jsx  Fullscreen calibration UI
│   └── utils/
│       └── VideoManagerLocal.js    WebSocket client, frame capture, canvas rendering
├── static/               built frontend (after npm run build)
├── main.py, app.py       FastAPI backend
├── requirements.txt
└── package.json
```

## Setup

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

To rebuild the frontend after changes:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```

## Run

**Web app:** Activate the venv and run uvicorn through Python so that it uses the venv's dependencies (otherwise you may get `ModuleNotFoundError: No module named 'aiosqlite'`):

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Then open http://localhost:7860.

**Frontend dev server (optional, for React development):**

```bash
npm run dev
```

**OpenCV demo:**

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb
```

**Train:**

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

## Data

9 participants, 144,793 samples, 10 features, binary labels. Collect with `python -m models.collect_features --name <name>`. Data lives in `data/collected_<name>/`.
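
A minimal sketch of how the collected files could be loaded per participant and split leave-one-participant-out (LOPO). The array keys `features` and `labels` and the helper names `load_participants` and `lopo_splits` are assumptions for illustration; check the actual `.npz` contents and the loaders in `data_preparation/` before relying on them.

```python
from pathlib import Path

import numpy as np


def load_participants(data_dir="data"):
    """Return {participant: (X, y)} from data/collected_<name>/*.npz.

    Assumes each .npz stores arrays under the keys "features" and "labels"
    (hypothetical names, not confirmed by this README).
    """
    data = {}
    for folder in sorted(Path(data_dir).glob("collected_*")):
        name = folder.name[len("collected_"):]
        xs, ys = [], []
        for npz_path in sorted(folder.glob("*.npz")):
            with np.load(npz_path) as npz:
                xs.append(npz["features"])
                ys.append(npz["labels"])
        if xs:
            data[name] = (np.concatenate(xs), np.concatenate(ys))
    return data


def lopo_splits(data):
    """Yield (held_out, (X_train, y_train), (X_test, y_test)).

    Each participant is held out once; needs at least two participants.
    """
    for held_out in data:
        train = [data[name] for name in data if name != held_out]
        X_train = np.concatenate([x for x, _ in train])
        y_train = np.concatenate([y for _, y in train])
        yield held_out, (X_train, y_train), data[held_out]
```

Holding out a whole participant, rather than a random slice of frames, keeps one person's frames from appearing on both sides of the split.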

## Models

| Model | What it uses | Best for |
|-------|-------------|----------|
| **Geometric** | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
| **XGBoost** | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed |
| **MLP** | Neural network on same features (64->32) | Higher accuracy |
| **Hybrid** | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
| **L2CS** | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts |

## Model numbers (15% test split)

| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|-----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64->32) | 92.92% | 0.929 | 0.971 |

## L2CS Gaze Tracking

L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.

### Standalone mode
Select **L2CS** as the model - it handles everything.

### Boost mode
Select any other model, then click the **GAZE** toggle. L2CS runs alongside the base model:
- Base model handles head pose and eye openness (35% weight)
- L2CS handles gaze direction (65% weight)
- If L2CS detects gaze is clearly off-screen, it **vetoes** the base model regardless of score
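
The weighting and veto described above can be sketched as a small function. This is an illustration under assumptions, not the code in `gaze_eye_fusion.py`: the name `fuse_scores`, the convention that both scores lie in [0, 1], and returning 0.0 on a veto are all hypothetical.

```python
def fuse_scores(base_score, gaze_score, gaze_off_screen,
                base_weight=0.35, gaze_weight=0.65):
    """Blend a base-model focus score with an L2CS gaze score.

    Both scores are assumed to lie in [0, 1], with 1 meaning "focused".
    """
    if gaze_off_screen:
        # Veto: a clearly off-screen gaze overrides the base model entirely.
        return 0.0
    return base_weight * base_score + gaze_weight * gaze_score


# The base model is confident, but the gaze score is low:
print(fuse_scores(0.9, 0.2, gaze_off_screen=False))
# The same base score with an off-screen gaze is vetoed:
print(fuse_scores(0.9, 0.2, gaze_off_screen=True))
```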

### Calibration
After enabling L2CS or Gaze Boost, click **Calibrate** while a session is running:
1. A fullscreen overlay shows 9 target dots (3x3 grid)
2. Look at each dot as the progress ring fills
3. The first dot (centre) sets your baseline gaze offset
4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates
5. A cyan tracking dot appears on the video showing where you're looking
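
One way to picture steps 3 and 4 is a least-squares fit of a second-order polynomial from baseline-corrected gaze angles to screen coordinates. The sketch below assumes the first sample is the centre dot and uses the terms 1, yaw, pitch, yaw·pitch, yaw², pitch². The function names and the choice of polynomial terms are hypothetical and may differ from `gaze_calibration.py`.

```python
import numpy as np


def poly_features(yaw, pitch):
    """Second-order polynomial terms of the (yaw, pitch) gaze angles."""
    yaw = np.asarray(yaw, dtype=float)
    pitch = np.asarray(pitch, dtype=float)
    return np.stack(
        [np.ones_like(yaw), yaw, pitch, yaw * pitch, yaw ** 2, pitch ** 2],
        axis=-1,
    )


def fit_calibration(angles, targets):
    """Fit the mapping from gaze angles to screen coordinates.

    angles:  (9, 2) yaw/pitch samples; row 0 is assumed to be the centre dot.
    targets: (9, 2) screen coordinates of the dots.
    """
    angles = np.asarray(angles, dtype=float)
    offset = angles[0]            # baseline gaze offset from the centre dot
    centred = angles - offset
    design = poly_features(centred[:, 0], centred[:, 1])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(targets, dtype=float), rcond=None)
    return offset, coeffs


def predict_screen(offset, coeffs, yaw, pitch):
    """Map one gaze sample to screen coordinates with the fitted model."""
    return poly_features(yaw - offset[0], pitch - offset[1]) @ coeffs
```

With nine points and six coefficients per screen axis the fit is overdetermined, which gives some tolerance to noise in individual samples.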

## Pipeline

1. Face mesh (MediaPipe 478 pts)
2. Head pose -> yaw, pitch, roll, scores, gaze offset
3. Eye scorer -> EAR, gaze ratio, MAR
4. Temporal -> PERCLOS, blink rate, yawn
5. 10-d vector -> MLP or XGBoost -> focused / unfocused
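
To make steps 3 and 4 concrete, here is a rough sketch of two of the features: the eye aspect ratio (EAR) computed from six eye landmarks with the standard formula, and a simplified PERCLOS computed as the fraction of recent frames in which the eye counts as closed. The landmark ordering, the 0.2 closed-eye threshold, the 90-frame window, and the class name `Perclos` are assumptions, not values taken from this repo.

```python
from collections import deque

import numpy as np


def eye_aspect_ratio(eye):
    """EAR from six 2-D landmarks p1..p6 (p1 and p4 are the eye corners).

    EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)
    """
    eye = np.asarray(eye, dtype=float)
    vertical_a = np.linalg.norm(eye[1] - eye[5])
    vertical_b = np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return (vertical_a + vertical_b) / (2.0 * horizontal)


class Perclos:
    """Fraction of the last `window` frames with EAR below a threshold."""

    def __init__(self, window=90, closed_threshold=0.2):
        self.closed = deque(maxlen=window)
        self.closed_threshold = closed_threshold

    def update(self, ear):
        # Record whether the eye counts as closed in this frame.
        self.closed.append(ear < self.closed_threshold)
        return sum(self.closed) / len(self.closed)
```

A low EAR in a single frame is usually just a blink; the rolling PERCLOS value separates sustained eye closure from blinks.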

**Stack:** FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net.