
Feature selection justification

The face_orientation model uses 10 of the 17 extracted features. This document summarises the empirical support for that selection.

1. Domain rationale

The 10 features were chosen to cover three channels:

  • Head pose: head_deviation, s_face, pitch
  • Eye state: ear_left, ear_right, ear_avg, perclos
  • Gaze: h_gaze, gaze_offset, s_eye

Excluded: v_gaze (noisy), mar (rare events), yaw/roll (redundant with head_deviation/s_face), blink_rate/closure_duration/yawn_duration (temporal overlap with perclos).
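The grouping above can be written down as a small lookup, which also makes it easy to check that the 10 selected and 7 excluded features account for all 17. This is a minimal sketch: the feature names come from the lists above, but the constant and function names (CHANNELS, EXCLUDED, SELECTED, check_partition) are illustrative and not taken from the pipeline.

```python
# Channel membership as listed above; constant names are illustrative only.
CHANNELS = {
    "head_pose": ["head_deviation", "s_face", "pitch"],
    "eye_state": ["ear_left", "ear_right", "ear_avg", "perclos"],
    "gaze": ["h_gaze", "gaze_offset", "s_eye"],
}

# Features listed as excluded in this section.
EXCLUDED = [
    "v_gaze", "mar", "yaw", "roll",
    "blink_rate", "closure_duration", "yawn_duration",
]

# Flatten the channels into the selected feature list.
SELECTED = [name for names in CHANNELS.values() for name in names]


def check_partition(selected, excluded, total=17):
    """Return True if the two lists are disjoint, duplicate-free and sum to `total`."""
    all_names = selected + excluded
    return len(set(all_names)) == len(all_names) == total
```

Calling check_partition(SELECTED, EXCLUDED) confirms that the two lists together cover 17 distinct feature names.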

2. XGBoost feature importance (gain)

Config used: {'n_estimators': 600, 'max_depth': 8, 'learning_rate': 0.1489, 'subsample': 0.9625, 'colsample_bytree': 0.9013, 'reg_alpha': 1.1407, 'reg_lambda': 2.4181, 'eval_metric': 'logloss'}. Quick mode: yes (200 trees, in place of the configured n_estimators of 600).
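One plausible reading of the quick-mode note is that the tree count is overridden while every other hyperparameter is kept. The sketch below shows that reading only; the helper name effective_params and the QUICK_TREES constant are assumptions, and the actual pipeline may apply quick mode differently.

```python
# Hyperparameters as reported above.
CONFIG = {
    "n_estimators": 600,
    "max_depth": 8,
    "learning_rate": 0.1489,
    "subsample": 0.9625,
    "colsample_bytree": 0.9013,
    "reg_alpha": 1.1407,
    "reg_lambda": 2.4181,
    "eval_metric": "logloss",
}

# Tree count reported for quick mode (assumed to replace n_estimators).
QUICK_TREES = 200


def effective_params(config, quick):
    """Return a copy of `config`, with n_estimators reduced when quick mode is on."""
    params = dict(config)  # copy so the reported config is left untouched
    if quick:
        params["n_estimators"] = QUICK_TREES
    return params
```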

From the trained XGBoost checkpoint (gain on the 10 features):

| Feature | Gain |
| --- | --- |
| head_deviation | 8.83 |
| s_face | 10.27 |
| s_eye | 2.18 |
| h_gaze | 4.99 |
| pitch | 4.64 |
| ear_left | 3.57 |
| ear_avg | 6.96 |
| ear_right | 9.54 |
| gaze_offset | 1.80 |
| perclos | 5.68 |

Top 5 by gain: s_face, ear_right, head_deviation, ear_avg, perclos.
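The top-5 ranking can be reproduced directly from the reported gain values. This is a small sketch using only the numbers in this section; the names GAIN and top_k are illustrative.

```python
# Gain values as reported for the 10 selected features.
GAIN = {
    "head_deviation": 8.83,
    "s_face": 10.27,
    "s_eye": 2.18,
    "h_gaze": 4.99,
    "pitch": 4.64,
    "ear_left": 3.57,
    "ear_avg": 6.96,
    "ear_right": 9.54,
    "gaze_offset": 1.80,
    "perclos": 5.68,
}


def top_k(scores, k):
    """Return the k feature names with the highest score, in descending order."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


print(top_k(GAIN, 5))
# ['s_face', 'ear_right', 'head_deviation', 'ear_avg', 'perclos']
```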

3. Leave-one-feature-out ablation (LOPO)

Baseline (all 10 features) mean LOPO F1: 0.8286.

The per-feature ablation was skipped in this run (--skip-lofo), so only the baseline is reported.
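For reference, a leave-one-feature-out loop has roughly the following shape: score the full feature set, then re-score with each feature removed and record the drop. This is a generic sketch, not the pipeline's implementation. The function name lofo_deltas is hypothetical, and score_fn stands in for whatever routine returns the mean LOPO F1 for a given feature list.

```python
from typing import Callable, Dict, List, Sequence


def lofo_deltas(
    features: Sequence[str],
    score_fn: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Return, per feature, baseline score minus the score without that feature.

    `score_fn` is a placeholder for an evaluation routine (for example one that
    trains and returns mean LOPO F1); it is an assumption of this sketch.
    """
    baseline = score_fn(list(features))
    deltas = {}
    for dropped in features:
        remaining = [name for name in features if name != dropped]
        # A positive delta means the score fell when `dropped` was removed.
        deltas[dropped] = baseline - score_fn(remaining)
    return deltas
```

A positive delta indicates that removing the feature hurt the score; a value near zero or negative would suggest the feature is a candidate for removal.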

4. Channel ablation (LOPO)

| Subset | Mean LOPO F1 |
| --- | --- |
| head_pose | 0.7480 |
| eye_state | 0.8071 |
| gaze | 0.7260 |
| all_10 | 0.8286 |
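The gap between each single channel and the full 10-feature set follows directly from the table. The sketch below uses only the reported scores; the names CHANNEL_F1 and gap_to_full are illustrative.

```python
# Mean LOPO F1 per subset, as reported in the table above.
CHANNEL_F1 = {
    "head_pose": 0.7480,
    "eye_state": 0.8071,
    "gaze": 0.7260,
    "all_10": 0.8286,
}


def gap_to_full(scores, full_key="all_10"):
    """Return full-set score minus each subset's score, rounded to 4 decimals."""
    full = scores[full_key]
    return {
        name: round(full - value, 4)
        for name, value in scores.items()
        if name != full_key
    }


print(gap_to_full(CHANNEL_F1))
# {'head_pose': 0.0806, 'eye_state': 0.0215, 'gaze': 0.1026}
```

On these numbers, eye_state alone comes closest to the full set, while gaze alone is furthest from it.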

5. Conclusion

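Selection is supported by (1) domain rationale (three attention channels), (2) XGBoost gain importance (every selected feature has non-zero gain), and (3) channel ablation (all_10 reaches a mean LOPO F1 of 0.8286, above the best single channel, eye_state, at 0.8071). The leave-one-feature-out ablation was skipped in this run; run without --skip-lofo for the full per-feature ablation.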