ML LEARN ERSATZ: TRAIN M1..M4 SURROGATES

Trains the four ERSATZ surrogate models (m1: YSS→USS, m2: YSS+USS→PRENE, m3: USS→FSS, m4: USS+FSS→PSTNE) over a material statistics dataset, dispatching each model via an internal ml_learn_auto call with optional parallelism. When the Stats Dataset is not provided, it is auto-computed from the supplied engineering stress-strain curves via curve_engstress_stats (one row per curve). Outputs trained model file paths keyed by model ID (m1–m4) plus all ml_learn_auto diagnostic outputs prefixed by model ID.
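
The four-stage chain described above can be written down as a plain feature/target map. This is a minimal sketch, not the worker's implementation: it uses the default column names from the Inputs table, while the `STAGES` dictionary and the `split_row` helper are hypothetical names introduced only for illustration.

```python
# Anchor groups, using the default column names from the Inputs table.
YSS = ["yield_strain", "yield_stress"]            # yield strain/stress
USS = ["eng_ultimate_strain", "ultimate_stress"]  # ultimate strain/stress
FSS = ["failure_strain", "failure_stress"]        # failure strain/stress

# Hypothetical description of the staged chain (m1..m4).
STAGES = {
    "m1": {"features": YSS, "targets": USS},                                  # YSS -> USS
    "m2": {"features": YSS + USS, "targets": ["energy_yield_to_ultimate"]},   # -> PRENE
    "m3": {"features": USS, "targets": FSS},                                  # USS -> FSS
    "m4": {"features": USS + FSS, "targets": ["energy_ultimate_to_failure"]}, # -> PSTNE
}


def split_row(row, stage):
    """Return (feature values, target values) of one stats row for a stage."""
    spec = STAGES[stage]
    x = [row[col] for col in spec["features"]]
    y = [row[col] for col in spec["targets"]]
    return x, y
```

Each stage's targets reappear as features of a later stage, which is what lets the predict side feed upstream predictions into downstream models.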

When to use

Tagged: batch, ersatz, materials, ml_learn_auto, multi-target, parallel, per_target_best, regression.

Inputs

Label	ID	Type	Default	Required	Description
Eng Stress-Strain Curves	eng_curves	vector	—	✓	Collection of engineering stress-strain curves (one per material, vector of curve objects); used as the sole data source when stats_dataset is empty — the worker calls curve_engstress_stats on each curve to build the stats table.
Stats Dataset	stats_dataset	dataset	—		Optional pre-built statistics dataset (one row per material) containing yield, ultimate, failure, energy, and shape columns; when supplied it bypasses curve-based auto-computation and is used directly for all four sub-model training runs.
Elastic Modulus	modulus	number	210.0		Elastic modulus (Young’s modulus) used when stats_dataset is not provided and the worker computes per-curve stats internally. Pass the same value you would pass to curves_engstress_stats so the resulting yield_strain / yield_stress columns match a parallel curves_engstress_stats run on the same input curves. Default 210.0 (steel-class metals); pass 0 to auto-fit the slope from each curve’s elastic region. Ignored when stats_dataset is wired.
Yield Offset	yield_offset	number	0.002		Yield-strain offset (e.g., 0.002 for the 0.2%-offset method on metals, 0.005 for polymers) used when stats_dataset is not provided. Same convention as curves_engstress_stats. Ignored when stats_dataset is wired.
Predict Anchor Deltas (instead of absolute values)	predict_deltas	select	yes		When ‘yes’ (default), the m1 / m3 sub-models are trained on anchor-to-anchor deltas (eng_ultimate_strain - yield_strain, ultimate_stress - yield_stress, failure_strain - eng_ultimate_strain, ultimate_stress - failure_stress). The corresponding ml_predict_ersatz reconstructs absolute values with max(0, delta), which guarantees that the predicted strains are monotonic. Set ‘no’ to keep the legacy absolute-target behavior; ml_predict_ersatz must then be wired with predict_deltas=‘no’ as well so the column names match the trained model.
Feature Normalization	normalize	select	standard		Whether to rescale features before training. Defaults to ‘standard’ so that models with mixed-scale anchor features (e.g. yield_strain ~0.005 vs ultimate_stress ~500) train correctly. Use ‘no’ only to reproduce the legacy behavior that predates feature normalization.
Cross-Validation Folds	cv_folds	number	5		Number of k-fold CV splits used whenever cross-validation runs: learning_quality=‘balanced’ or ‘best’ (see Learning Quality), or run_grid_search=‘yes’. Ignored when CV is disabled.
Run Hyperparameter Grid Search	run_grid_search	select	auto		Whether to run sklearn GridSearchCV over each regressor’s hyperparameter grid. ‘auto’ (default) follows learning_quality, so grid search runs only for the ‘best’ preset. Grid search can materially improve model accuracy but multiplies the compute per regressor by roughly cv_folds × grid_size.
Model Mode	model_mode	select	staged		Whether to train four chained sub-models (default — each stage’s prediction feeds the next, exposing energies to the predicted anchors) or a single unified multi-output regressor (one .pkl that takes yield_strain + yield_stress and emits all six targets — eng_ultimate_strain, ultimate_stress, energy_yield_to_ultimate, failure_strain, failure_stress, energy_ultimate_to_failure — at once). Unified is faster to train and predict and gives the regressor visibility into target correlations, at the cost of not being able to inject upstream-predicted anchors into downstream stages. ml_predict_ersatz must be wired with the same model_mode so the keyvalue keys (m1..m4 vs m_all) line up.
Material Name Column	name_col	select	name		Name of the dataset column carrying the per-material/curve identifier (default: ‘name’); when stats are auto-computed this column is populated from each curve’s getName() result.
Eng Yield Strain Column	yld_strain_col	select	yield_strain		Dataset column name for engineering yield strain (default: ‘yield_strain’); serves as a feature in models m1 and m2.
Eng Yield Stress Column	yld_stress_col	select	yield_stress		Dataset column name for engineering yield stress (default: ‘yield_stress’); serves as a feature in models m1 and m2.
Eng Ultimate Strain Column	ult_strain_col	select	eng_ultimate_strain		Dataset column name for engineering ultimate strain (default: ‘eng_ultimate_strain’); target in m1, feature in m2/m3/m4.
Eng Ultimate Stress Column	ult_stress_col	select	ultimate_stress		Dataset column name for engineering ultimate stress (default: ‘ultimate_stress’); target in m1, feature in m2/m3/m4.
Eng Failure Strain Column	fail_strain_col	select	failure_strain		Dataset column name for engineering failure strain (default: ‘failure_strain’); target in m3, feature in m4.
Eng Failure Stress Column	fail_stress_col	select	failure_stress		Dataset column name for engineering failure stress (default: ‘failure_stress’); target in m3, feature in m4.
Post-Necking Shape Column	post_shape_col	select	post_shape		Dataset column name for the post-necking shape parameter (default: ‘post_shape’); available for inclusion in extended feature sets if configured.
Post-Necking Power Column	post_power_col	select	post_power		Dataset column name for the post-necking power parameter (default: ‘post_power’); available for inclusion in extended feature sets if configured.
Damage at Failure Column	damage_max_col	select	damage_max		Dataset column name for the maximum damage-at-failure scalar (default: ‘damage_max’); available as a supplementary feature or target column.
Plateau Fraction Column	plateau_frac_col	select	plateau_fraction		Dataset column name for the plateau fraction (default: ‘plateau_fraction’); characterises the flat region of the stress-strain curve for inclusion as a feature.
Pre-Ultimate Shape Column	pre_ult_shape_col	select	pre_ult_shape		Dataset column name for the pre-ultimate shape exponent (default: ‘pre_ult_shape’); describes the curvature of the hardening region before the ultimate point.
Pre-Ultimate Power Column	pre_ult_power_col	select	pre_ult_power		Dataset column name for the pre-ultimate power exponent (default: ‘pre_ult_power’); complements pre_ult_shape for characterising the pre-necking hardening law.
Voce K Column	voce_k_col	select	voce_k		Dataset column name for the Voce saturation parameter K (default: ‘voce_k’); used when a Voce-type hardening law is fitted to the curve.
Pre-Necking Energy Column (PRENE)	energy_yu_col	select	energy_yield_to_ultimate		Dataset column name for the pre-necking (yield-to-ultimate) specific energy (default: ‘energy_yield_to_ultimate’); sole target of model m2 (PRENE).
Post-Necking Energy Column (PSTNE)	energy_uf_col	select	energy_ultimate_to_failure		Dataset column name for the post-necking (ultimate-to-failure) specific energy (default: ‘energy_ultimate_to_failure’); sole target of model m4 (PSTNE).
Ultimate Strain Plateau Column (optional)	ult_strain_plateau_col	select	ultimate_strain_plateau		Dataset column name for the absolute strain at the end of the post-ultimate plateau (default: ‘ultimate_strain_plateau’). Optional: when at least one row in stats_dataset carries a numeric value here, an extra sub-model ‘m_plat: YSS -> ultimate_strain_plateau’ is trained alongside m1..m4 / m_all. The trained .pkl is added to the ‘models’ keyvalue under key ‘m_plat’ and consumed downstream by ml_predict_ersatz to predict per-material plateau strain (which curve_ersatz_eng / curves_ersatz_eng then convert to a plateau_fraction internally). In workflows that do not measure plateau strain, leave the column missing: m_plat is silently omitted and existing m1..m4 behavior is unchanged.
Elastic Modulus Column (pass-through, no model trained)	elastic_modulus_col	select	modulus		Dataset column name for the elastic modulus (default: ‘modulus’ — the engstress_stats key; ‘slope’ is also accepted as an alias). NOT trained as a sub-model — modulus is essentially constant within a material family. The column name flows through to the in-worker verification call (raw_vs_predictions) so the reconstructed curves use the per-row modulus when available. ml_predict_ersatz exposes the same input plus a default_elastic_modulus fallback for prediction-time rows that don’t carry the stat.
Pre-Ult Mid Stress Column (pass-through)	pre_ult_mid_stress_col	select	eng_mid_yu_stress		Dataset column for the engineering stress at the strain midpoint (ε_yld + ε_ult)/2 — emitted by curve_engstress_stats as ‘eng_mid_yu_stress’. When present on a row, ersatz_eng back-fits the pre-ult shape parameter so the reconstructed curve passes through this midpoint exactly (overriding the energy-based fit). NOT trained as a sub-model; the column flows through verification → ml_predict_ersatz → curves_ersatz_eng.
Pre-Ult Mid Strain Column	pre_ult_mid_strain_col	select	eng_mid_yu_strain		Per-row column carrying engineering strain at the yield-to-ultimate midpoint. Currently a pass-through (predict-side consumer); ml_learn_ersatz reads it for downstream chains that fan out a separate m_mid surrogate when trained.
Post-Ult Mid Stress Column (pass-through)	post_ult_mid_stress_col	select	eng_mid_uf_stress		Dataset column for the engineering stress at the strain midpoint (ε_ult + ε_fail)/2 — emitted by curve_engstress_stats as ‘eng_mid_uf_stress’. When present on a row, ersatz_eng back-fits the post-ult shape parameter (or, for damage_based, the q exponent) so the reconstructed curve passes through this midpoint. NOT trained as a sub-model; flows through verification.
Post-Ult Mid Strain Column	post_ult_mid_strain_col	select	eng_mid_uf_strain		Per-row column carrying engineering strain at the ultimate-to-failure midpoint.
Learning Quality	learning_quality	select	fast		Training quality preset (controls cross-validation and grid search only — orthogonal to Regression Types). fast = no CV, no grid search (default). balanced = 5-fold CV, no grid search (honest score, default hyperparameters). best = 5-fold CV + grid search across each regressor’s hyperparameter grid (slowest, most honest score). Backward-compat: ‘basic’ is accepted as a legacy alias for ‘fast’.
Regression Types	regression_types	list	(complex)		Regression algorithms to evaluate per surrogate (m_all or m1..m4). Each id maps to a Lucy regression task. Pick at least one; selecting several runs all of them and reports a comparison table. Default [‘linear_regression’, ‘rfr_regression’] gives a fast linear baseline plus a non-linear ensemble. Independent of Learning Quality.
Parallel Model Training	parallel	select	no		Set to ‘yes’ to train the four sub-models concurrently via pcntl_fork (requires PHP CLI SAPI with ext-pcntl); falls back silently to sequential if forking is unavailable (default: ‘no’).
Per-Target Best Model	per_target_best	select	no		When ‘yes’, trains a separate best-fit model per target column within each stage and emits comma-joined .pkl paths; when ‘no’ (default) a single multi-output sklearn model covers all targets in one .pkl file.
Save Trained Models to Math Model Library	save_to_mathmodel	select	no		When ‘yes’, the worker calls the mathmodel_save worker after training each of m1..m4, passing that stage’s mfile.pkl. The resulting saved mathmodel ids are emitted as the ‘saved_mathmodels’ keyvalue output (entries m1..m4 with the saved ids), which can be wired into ml_predict_ersatz’s Saved Math Models input as an alternative to the temp-path ‘models’ keyvalue. For per_target_best=yes stages the comma-joined .pkl list is saved as a single per-target-best CSV mathmodel; ml_predict transparently handles either shape on the predict side.
Math Model Name Prefix	mathmodel_name_prefix	text	—		Used only when Save Trained Models = Yes. Each saved model is named ‘<prefix>_m1’, ‘<prefix>_m2’, etc. When empty, an auto-generated prefix ‘ersatz_<uniqid>’ is used so back-to-back runs do not collide on the unique-name constraint enforced by mathmodel_save.
Math Model Tags (comma-separated)	mathmodel_tags	text	ersatz		Tags applied to every saved mathmodel. Helpful for finding the four sub-models later under a single label.
Fit Shape Params from Curves	fit_shape_params	select	no		When ‘yes’, each training curve is independently fit to recover the shape parameters that curves_ersatz_eng consumes (pre_ult_power, voce_k, post_power, damage_max, plateau_fraction, pre_ult_shape, post_shape). Fitted values overwrite the corresponding columns in stats_dataset and become training targets for two new sub-models (m_shape_pre: YSS -> pre_ult_power+voce_k; m_shape_post: USS -> post_power+damage_max+plateau_fraction). At predict time ml_predict_ersatz then emits per-material shape values instead of the hard-coded curves_ersatz_eng fallbacks (post_power=2.0, damage_max=0.3, etc.) — closing the gap between the Stats-Dataset-reconstructed red curve and the predicted-dataset blue curve. Requires eng_curves on the input; with stats_dataset only, the flag is a no-op.
Material Metadata	metadata	dataset	—		Optional per-material metadata, one row per material, joined onto the stats by the name column. Columns named in meta_cols become model features. Numeric and string (categorical) columns are both supported; encoding is handled by the ML backend.
Metadata Feature Columns	meta_cols	text	—		Metadata columns (from the Material Metadata dataset) to use as model features in addition to the stress-strain anchors, e.g. density and family. Leave empty for the original behavior.
Apply Metadata To	meta_apply_to	select	entry		Which staged models receive the metadata features. ‘entry’ adds them to the yield-anchored entry models (m1 / m_all and the yield-keyed sub-models); ‘all’ adds them to every stage.
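
The delta targets listed under Predict Anchor Deltas, and the max(0, delta) reconstruction, can be sketched as follows. This is an illustration under stated assumptions, not the code of ml_learn_ersatz or ml_predict_ersatz: the function names and the `d_*` keys are hypothetical, and subtracting the clamped stress drop from ultimate_stress to recover failure_stress is an assumption inferred from the sign of the documented delta.

```python
def to_deltas(row):
    """Turn absolute anchors into the four anchor-to-anchor deltas."""
    return {
        "d_ult_strain": row["eng_ultimate_strain"] - row["yield_strain"],
        "d_ult_stress": row["ultimate_stress"] - row["yield_stress"],
        "d_fail_strain": row["failure_strain"] - row["eng_ultimate_strain"],
        "d_fail_stress": row["ultimate_stress"] - row["failure_stress"],
    }


def from_deltas(yield_strain, yield_stress, deltas):
    """Rebuild absolute anchors, clamping every delta with max(0, delta)."""
    def clamp(delta):
        return max(0.0, delta)

    ult_strain = yield_strain + clamp(deltas["d_ult_strain"])
    ult_stress = yield_stress + clamp(deltas["d_ult_stress"])
    fail_strain = ult_strain + clamp(deltas["d_fail_strain"])
    # Assumption: the stress delta is a drop, so it is subtracted here.
    fail_stress = ult_stress - clamp(deltas["d_fail_stress"])
    return {
        "eng_ultimate_strain": ult_strain,
        "ultimate_stress": ult_stress,
        "failure_strain": fail_strain,
        "failure_stress": fail_stress,
    }
```

Because each strain is built by adding a non-negative increment to the previous anchor, a regressor that predicts a negative delta can at worst collapse two anchors onto each other; it cannot reorder them.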

Outputs

Label	ID	Type	Description
Trained Models (mfile.pkl paths)	models	keyvalue	Key-value map of trained surrogate model file paths keyed by model ID (m1, m2, m3, m4 in staged mode, or m_all in unified mode), plus an optional ‘m_plat’ entry when ultimate_strain_plateau was present in stats_dataset. Values are absolute .pkl paths consumed by ml_predict_ersatz.
Stats Dataset	stats_dataset	dataset	The material statistics dataset actually used for training (one row per material with all feature/target columns); echoed back for inspection or caching, whether supplied by the user or auto-computed from eng_curves.
Parallel Mode Actually Used	parallel_actual	text	Text flag indicating the parallelism mode that was actually applied at runtime (‘parallel’ or ‘sequential’), reflecting any automatic fallback from the requested parallel mode.
Saved Math Model IDs (when Save Trained Models = Yes)	saved_mathmodels	keyvalue	Key-value map of platform MathModel record IDs (keyed by m1–m4) created when ‘Save Trained Models’ is enabled; empty when saving is disabled.
Raw vs Predictions	raw_vs_predictions	dataset	Verification dataset built by chain-predicting the just-trained m1..m4 models back over stats_dataset. For every anchor column the dataset carries the raw, predicted, and (raw − pred) values per row. Each row also includes a predicted_eng_stress_strain column with the engineering stress-strain curve reconstructed from the predicted anchors via curves_ersatz_eng — drop the row’s curve cell on a slide alongside the original input curve to spot any per-material divergence.
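
The per-row layout of the Raw vs Predictions dataset (raw, predicted, and raw − pred for every anchor column) can be sketched as below. This is a hedged illustration only: the `_raw` / `_pred` / `_diff` column suffixes, the `raw_vs_pred` function, and the `ANCHORS` list are assumptions made for this example, and the actual output column names of the worker may differ.

```python
# Default anchor column names taken from the Inputs table.
ANCHORS = [
    "eng_ultimate_strain",
    "ultimate_stress",
    "failure_strain",
    "failure_stress",
]


def raw_vs_pred(raw_rows, pred_rows, name_col="name", columns=ANCHORS):
    """Join raw and predicted rows by material name and add raw - pred."""
    pred_by_name = {row[name_col]: row for row in pred_rows}
    out = []
    for raw in raw_rows:
        pred = pred_by_name[raw[name_col]]
        merged = {name_col: raw[name_col]}
        for col in columns:
            merged[f"{col}_raw"] = raw[col]
            merged[f"{col}_pred"] = pred[col]
            merged[f"{col}_diff"] = raw[col] - pred[col]  # raw minus predicted
        out.append(merged)
    return out
```

Sorting the result by the absolute value of a `_diff` column is a quick way to find the materials where the chained surrogates diverge most from the training data.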

Disciplines

ai_ml.prognosis
ai_ml.supervised.regression
ai_ml.surrogate
engineering.material.calibration
engineering.material.characterization

Auto-generated from platform schema. Worker id: ml_learn_ersatz. Schema hash: e1d169fced6a. Hand-curated docs in workerexamples/ override this page when present.