# turbofan-surrogate
Neural surrogate for the ODSMR turbofan engine simulator. A small Flax MLP (20k to 1M parameters) approximates ODSMR's thermodynamic map and runs on the order of a million times faster per device than the original CPU simulator, with overall R² above 0.998 on every sensor.
The surrogate is intended as a fast inference backend for reinforcement learning, world model evaluation, and any pipeline that needs to call ODSMR many millions of times.
## Results at a glance
Trained and evaluated on a 1.2M-sample dataset drawn from the TurboSens operating distribution (4 wear regimes × 16 contexts, ODSMR-sampled ground truth, 200k held out for evaluation).
| Headline | Number |
|---|---|
| Variants shipped | 4 (tiny, small, default, large) |
| Smallest variant | 19.7 k params, 78 KB on disk |
| Largest variant | 1.06 M params, 4.1 MB on disk |
| Best overall R² | ≥ 0.9999 on every sensor (large) |
| Best overall nMAE (×σ) | 0.0039 mean, 0.0057 worst-sensor (large) |
| Recommended variant (default) | R² ≥ 0.9998 on every sensor, mean nMAE 0.0055 |
| Throughput at batch 16 384 (V100) | 6.1 M samples/sec (small), 3.0 M (large) |
| Vs ODSMR per CPU thread | ~1.6 M× faster per device |
| Vs per-context mean baseline | 14× to 64× lower nMAE per sensor |
| Worst held-out sample relative error | ~2.5 % of natural sensor scale |
| Convergence rate of training data | 99.94 % (ODSMR diverges < 0.1 %) |
The trained surrogate beats the per-context mean baseline (which already knows the operating point) by 14× to 64×, evidence that it learned state-driven physics rather than per-context marginals. Hard-case analysis shows the worst residuals concentrate on the post-action wear regime (3.4× over-represented in the worst 1 %), which is exactly the regime where the underlying problem is most ambiguous and where action history matters most — a feature, not a bug, for downstream world-model evaluation.
Full per-sensor / per-slice breakdown in `reports/default_evaluation.md`; side-by-side variant comparison in `reports/comparison.md`.
## What it does
Takes a 17-dimensional input:

- 10-dim health state (per-module efficiency and mass-flow degradation)
- altitude, Mach, thrust command, ambient temperature delta
- 3-dim phase one-hot (MTO take-off, MCL climb, CR cruise)

and returns 7 sensor readings:

`HPC_Tout` (K), `HP_Nmech` (rpm), `HPC_Tin` (K), `LPT_Tin` (K), `Fuel_flow` (kg/s), `HPC_Pout_st` (Pa), `LP_Nmech` (rpm)
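Conceptually, the 17-dim vector is just the concatenation of the three groups above. A minimal numpy sketch of that layout (the exact feature ordering here is an assumption; in practice `build_inputs` assembles the real layout for you):

```python
import numpy as np

# Hypothetical ordering: [10-dim state | alt, mach, cmd, dtamb | 3-dim phase one-hot]
state = np.full(10, -0.02, dtype="float32")                          # per-module degradation
scalars = np.array([35000.0, 0.78, 25000.0, 10.0], dtype="float32")  # alt, mach, cmd, dtamb
phase = np.eye(3, dtype="float32")[2]                                # one-hot MTO/MCL/CR -> CR

x = np.concatenate([state, scalars, phase])
print(x.shape)  # → (17,)
```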
## Quick start
```bash
pip install -e .
# for GPU inference (Linux + CUDA 12 + Ampere/Hopper/Volta):
pip install -e ".[gpu]"
```
```python
import numpy as np
from turbofan_surrogate import (
    load_surrogate, build_inputs, CONTEXT_PHASES, SENSOR_NAMES,
)

# Load the recommended model
surrogate = load_surrogate("weights/default.pkl")

# Build a batch of inputs
n = 1024
states = np.random.uniform(-0.05, 0.0, (n, 10)).astype("float32")
ci = np.random.randint(0, 16, n)  # 16 named contexts
dtamb = np.full(n, 10.0, dtype="float32")
alt = np.full(n, 35000.0, dtype="float32")
mach = np.full(n, 0.78, dtype="float32")
cmd = np.full(n, 25000.0, dtype="float32")
X = build_inputs(states, ci, dtamb, alt, mach, cmd)

# Predict (7 sensors per input row, in natural units)
sensors = surrogate.predict(X)
print(dict(zip(SENSOR_NAMES, sensors[0])))
```
A complete worked example with timing is in `examples/inference_demo.py`.
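If you want a quick throughput check outside the demo script, the measurement itself is just batched wall-clock timing. A sketch with a stand-in `predict` (a single float32 matmul, so the numbers it prints say nothing about the real surrogate):

```python
import time
import numpy as np

# Stand-in for surrogate.predict: one matmul mapping 17 features to 7 sensors.
# Swap in the real surrogate to measure actual throughput.
W = np.random.rand(17, 7).astype("float32")
def predict(X):
    return X @ W

X = np.random.rand(16384, 17).astype("float32")
predict(X)  # warm-up (this matters much more for JIT-compiled JAX models)

n_iter = 50
t0 = time.perf_counter()
for _ in range(n_iter):
    predict(X)
dt = time.perf_counter() - t0
print(f"{n_iter * len(X) / dt / 1e6:.1f} M samples/sec")
```

With a real JAX/Flax model, also block on the result (e.g. `jax.block_until_ready`) inside the timed loop; otherwise asynchronous dispatch makes the loop look faster than it is.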
## Available variants
Four architectures are shipped under `weights/`. All four were trained on the same 999k-row dataset for 100 epochs.
| Variant | Hidden × Depth | Params | Disk | nMAE (mean) | Throughput @ batch 16k (V100) |
|---|---|---|---|---|---|
| tiny | 128 × 2 | 19.7k | 78 KB | 0.0137 | 6.1M samples/sec |
| small | 256 × 3 | 138k | 540 KB | 0.0109 | 6.1M samples/sec |
| default | 384 × 4 | 453k | 1.8 MB | 0.0055 | 3.8M samples/sec |
| large | 512 × 5 | 1.06M | 4.1 MB | 0.0039 | 3.0M samples/sec |
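The parameter counts follow directly from the Hidden × Depth column, assuming a plain dense MLP with biases (17 inputs, `depth` hidden layers of width `hidden`, 7 outputs); this reading reproduces the table exactly:

```python
def mlp_params(hidden, depth, n_in=17, n_out=7):
    total = n_in * hidden + hidden                     # input layer (weights + biases)
    total += (depth - 1) * (hidden * hidden + hidden)  # hidden-to-hidden layers
    total += hidden * n_out + n_out                    # output layer
    return total

for name, h, d in [("tiny", 128, 2), ("small", 256, 3),
                   ("default", 384, 4), ("large", 512, 5)]:
    print(name, mlp_params(h, d))
# tiny 19719, small 137991, default 453127, large 1063431
```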
Recommendations:

- For most uses, ship `default`: R² ≥ 0.9998 on every sensor, mean nMAE 0.55 % of natural sensor std, 3.8M samples/sec on a V100.
- For maximum throughput, ship `small`: identical throughput to `tiny` but 17 % lower nMAE, and only 540 KB on disk.
- For best accuracy, ship `large`: about 30 % lower nMAE than `default` at a 21 % throughput cost.

`tiny` is dominated by `small` and is included for completeness only.
A full per-sensor breakdown lives in `reports/comparison.md`, and a full held-out evaluation report (per phase, per context, per wear mode, per quantile bin, hardest cases, sensitivity matrix, throughput) for the default model lives in `reports/default_evaluation.md`.
## Reference numbers
On the held-out 200k-row eval set, the `default` model achieves:
| Sensor | nMAE (×σ) | R² | Max abs error |
|---|---|---|---|
| HPC_Tout | 0.0053 | 0.9999 | 9.6 K |
| HP_Nmech | 0.0102 | 0.9998 | 263 rpm |
| HPC_Tin | 0.0044 | 1.0000 | 8.2 K |
| LPT_Tin | 0.0056 | 0.9999 | 22 K |
| Fuel_flow | 0.0028 | 1.0000 | 0.024 kg/s |
| HPC_Pout_st | 0.0028 | 1.0000 | 72 kPa |
| LP_Nmech | 0.0073 | 0.9999 | 129 rpm |
Compared against the per-context mean baseline (a strong baseline that already knows the operating point, just not the state), the trained surrogate achieves 14× to 64× lower nMAE per sensor; i.e. the model learned state-driven physics, not just the per-context mean.
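For readers unfamiliar with the metric: nMAE (×σ) is mean absolute error divided by the sensor's standard deviation, and the per-context mean baseline simply predicts each context's average reading. A toy illustration (the synthetic data and noise levels below are made up, not drawn from the real eval set):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_ctx = 10_000, 16

# Toy stand-in: sensor value depends strongly on context, plus a state-driven part.
ctx = rng.integers(0, n_ctx, n)
y_true = ctx * 10.0 + rng.normal(0.0, 1.0, n)
y_pred = y_true + rng.normal(0.0, 0.05, n)  # an accurate "surrogate"

sigma = y_true.std()
nmae = np.abs(y_pred - y_true).mean() / sigma

# Per-context mean baseline: predict the mean sensor value of each context.
ctx_means = np.array([y_true[ctx == c].mean() for c in range(n_ctx)])
baseline_nmae = np.abs(ctx_means[ctx] - y_true).mean() / sigma

print(f"surrogate nMAE {nmae:.4f}, baseline nMAE {baseline_nmae:.4f}, "
      f"ratio {baseline_nmae / nmae:.1f}x")
```

Any improvement over this baseline must come from modelling the within-context (state-driven) variation, which is the point the comparison in the report makes.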
## Operating envelope
The surrogate was trained on the TurboSens operating distribution, which is wider than ODSMR's documented "tested core". Inputs should fall within:
- ALT: 0 to 39000 ft
- MACH: 0 to 0.84
- COMMAND: 15000 to 125000 lbf (used only for CR; ignored in MTO/MCL)
- DTAMB: -15 to +30 K
- state: each entry within `STATE_BOUNDS` (Eff in [-0.05, 0], Wc in [-0.05, 0.03] or [-0.05, 0.05] depending on module)
Predictions outside this envelope are not characterised; outputs may extrapolate poorly. Inside the envelope, see `docs/MODEL_CARD.md` for slice-level error guarantees.
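Since out-of-envelope behaviour is uncharacterised, it is worth validating batches before calling `predict`. A hedged sketch (bounds copied from the list above; the state check is simplified to the widest per-entry range, whereas the real `STATE_BOUNDS` is tighter per module):

```python
import numpy as np

# Envelope bounds as documented in this README.
ENVELOPE = {
    "alt": (0.0, 39000.0),
    "mach": (0.0, 0.84),
    "cmd": (15000.0, 125000.0),
    "dtamb": (-15.0, 30.0),
}

def in_envelope(states, alt, mach, cmd, dtamb):
    """Boolean mask: True where a sample lies inside the documented envelope."""
    ok = np.ones(len(states), dtype=bool)
    for name, arr in (("alt", alt), ("mach", mach), ("cmd", cmd), ("dtamb", dtamb)):
        lo, hi = ENVELOPE[name]
        ok &= (arr >= lo) & (arr <= hi)
    # Widest state range across modules; tighten with the real STATE_BOUNDS.
    ok &= ((states >= -0.05) & (states <= 0.05)).all(axis=1)
    return ok

states = np.full((2, 10), -0.01)
alt = np.array([35000.0, 45000.0])  # second sample is above the ALT bound
mach = np.array([0.78, 0.78])
cmd = np.array([25000.0, 25000.0])
dtamb = np.array([10.0, 10.0])
print(in_envelope(states, alt, mach, cmd, dtamb))  # → [ True False]
```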
## Training data and methodology
The training set is 999k samples drawn from a four-mode wear mixture (pristine / mid-wear / heavy-wear / post-action restoration), crossed with uniform context sampling and per-slot ALT/MACH/COMMAND jitter, then run through ODSMR. The dataset is not redistributed here; see the companion `rl_simulator_safran` repository for the generation pipeline.
Training: 100 epochs, AdamW (lr 1e-3 → 1e-5 cosine, weight decay 1e-4), per-output normalised MSE in float32, batch size 8192. Wall time on a single V100 is 15 to 75 seconds depending on variant.
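The learning-rate schedule is an ordinary cosine decay between the two endpoints. A minimal sketch of the curve (stepping per epoch here; whether the repo steps per epoch or per batch is an assumption):

```python
import math

def cosine_lr(epoch, total=100, lr_max=1e-3, lr_min=1e-5):
    """Cosine decay from lr_max at epoch 0 down to lr_min at epoch `total`."""
    t = min(epoch, total) / total
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

print(cosine_lr(0))    # ≈ 1e-3
print(cosine_lr(50))   # ≈ 5.05e-4 (midpoint)
print(cosine_lr(100))  # ≈ 1e-5
```

In a JAX/Flax training loop this would typically be an `optax` schedule passed to the AdamW optimiser rather than a hand-rolled function.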
## Model card
See `docs/MODEL_CARD.md` for intended use, scope, known failure modes, and ethical considerations.
## License

MIT. See `LICENSE`.
## Citation
If you use this surrogate, please also cite the underlying ODSMR (OpenDeckSMR) simulator it approximates:
Psaropoulos, M., Gkoutzamanis, V., Kalfas, A. I., Giannakakis, P., Razakarivony, S., Thepaut, S., & Vu, D. Q. (2025). OpenDeckSMR (Version 0.1) [Computer software]. https://github.com/OpenDeckLab/OpenDeckSMR
BibTeX:
```bibtex
@software{psaropoulos2025opendecksmr,
  author  = {Psaropoulos, M. and Gkoutzamanis, V. and Kalfas, A. I. and
             Giannakakis, P. and Razakarivony, S. and Thepaut, S. and Vu, D. Q.},
  title   = {{OpenDeckSMR}},
  year    = {2025},
  version = {0.1},
  url     = {https://github.com/OpenDeckLab/OpenDeckSMR}
}
```