Abdelrahman Almatrooshi

evaluation/

Training logs, threshold and weight analysis, and metrics. Thresholds are justified with leave-one-participant-out (LOPO) cross-validation over 9 folds, Youden’s J, and a weight grid search; see justify_thresholds.py.

Contents: logs/ (JSON from training runs), plots/ (ROC, weight search, EAR/MAR), justify_thresholds.py, feature_importance.py, and the generated markdown reports.

Logs: MLP training writes face_orientation_training_log.json and XGBoost training writes xgboost_face_orientation_training_log.json. Both files go to evaluation/logs/.
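To check what a training run recorded, a minimal sketch like the one below can list the top-level keys of every JSON log in the folder. The log schema is not documented in this README, so the helper (summarize_logs is a hypothetical name, not part of the repo) makes no assumption about specific fields.

```python
import json
from pathlib import Path


def summarize_logs(log_dir):
    """Map each JSON log file name in log_dir to its sorted top-level keys."""
    summary = {}
    for path in sorted(Path(log_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # The schema is unknown here: report keys for objects, the type name otherwise.
        if isinstance(data, dict):
            summary[path.name] = sorted(data)
        else:
            summary[path.name] = type(data).__name__
    return summary
```

From the repository root, summarize_logs("evaluation/logs") should then return one entry per log file.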

Threshold report: Generate THRESHOLD_JUSTIFICATION.md and plots with:

python -m evaluation.justify_thresholds

This runs LOPO over 9 participants, Youden’s J, and the weight grid search, and takes roughly 10–15 minutes. Plots are written to plots/ and the report to THRESHOLD_JUSTIFICATION.md.
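For readers unfamiliar with the two ideas named here, the following is a small, dependency-free sketch, not the code in justify_thresholds.py. It assumes binary labels (1 = positive), a "score >= threshold" decision rule, and one participant ID per sample; the function names are hypothetical.

```python
def lopo_folds(participants):
    """Yield (held_out, train_idx, test_idx), one fold per distinct participant."""
    for held_out in sorted(set(participants)):
        train_idx = [i for i, p in enumerate(participants) if p != held_out]
        test_idx = [i for i, p in enumerate(participants) if p == held_out]
        yield held_out, train_idx, test_idx


def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A sample is predicted positive when its score is >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required to compute Youden's J")
    best_t, best_j = None, -1.0
    # Every observed score is a candidate threshold.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

One plausible way to combine them is to pick the threshold on each fold's training indices and evaluate it on the held-out participant, so the threshold is never tuned on the person it is tested on.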

Feature importance: Run python -m evaluation.feature_importance for the full analysis: XGBoost gain plus leave-one-feature-out evaluated with LOPO. This is slow.
Fast iteration mode: python -m evaluation.feature_importance --quick --skip-lofo computes channel ablation and gain only.
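The leave-one-feature-out idea can be sketched as follows. This is an illustration under stated assumptions, not the logic in feature_importance.py: evaluate is a hypothetical callback that takes a list of feature names and returns a score (in the real script this would presumably be a full LOPO run, which is why the step is slow).

```python
def leave_one_feature_out(features, evaluate):
    """Return (baseline, drops), where drops[f] = baseline - score without feature f."""
    baseline = evaluate(features)
    drops = {}
    for feature in features:
        reduced = [f for f in features if f != feature]
        # A large positive drop means the model relies on this feature.
        drops[feature] = baseline - evaluate(reduced)
    return baseline, drops


# Toy scorer for demonstration only: each feature contributes a fixed amount.
# The feature names and contributions are made up for this example.
TOY_CONTRIBUTION = {"ear": 5, "mar": 3, "yaw": 2}


def toy_evaluate(features):
    return sum(TOY_CONTRIBUTION[f] for f in features)
```

Calling leave_one_feature_out(["ear", "mar", "yaw"], toy_evaluate) ranks the toy features by how much the score falls when each one is removed.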

Grouped benchmark: Run python -m evaluation.grouped_split_benchmark for the full run, or python -m evaluation.grouped_split_benchmark --quick for faster, approximate numbers.

Who writes here: models.mlp.train, models.xgboost.train, evaluation.justify_thresholds, evaluation.feature_importance, and the notebooks.