.. _auto_ml_learn_ersatz:

*ML LEARN ERSATZ — TRAIN M1..M4 SURROGATES*
===========================================

Trains the four ERSATZ surrogate models on a material statistics dataset: m1 (YSS→USS), m2 (YSS+USS→PRENE), m3 (USS→FSS), and m4 (USS+FSS→PSTNE). Each model is trained through an internal ``ml_learn_auto`` call, optionally in parallel. When the Stats Dataset is not provided, it is computed automatically from the supplied engineering stress-strain curves via ``curve_engstress_stats`` (one row per curve). The worker outputs the trained model file paths keyed by model ID (m1 to m4), plus all ``ml_learn_auto`` diagnostic outputs prefixed by model ID.
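
The four-stage chain can be sketched as a plain mapping from model ID to feature and target columns. This is an illustrative sketch only, not the worker's own code: the column names are the defaults listed in the Inputs table, the reading of YSS, USS and FSS as the yield, ultimate and failure (strain, stress) pairs is inferred from the column descriptions, and ``STAGES`` and ``required_columns`` are hypothetical names.

.. code-block:: python

   # Column groups, using the default column names from the Inputs table.
   # The group abbreviations follow the worker description; their expansion
   # into (strain, stress) pairs is an assumption.
   YSS = ["yield_strain", "yield_stress"]
   USS = ["eng_ultimate_strain", "ultimate_stress"]
   FSS = ["failure_strain", "failure_stress"]
   PRENE = ["energy_yield_to_ultimate"]
   PSTNE = ["energy_ultimate_to_failure"]

   # Hypothetical layout: one entry per surrogate model.
   STAGES = {
       "m1": {"features": YSS, "targets": USS},
       "m2": {"features": YSS + USS, "targets": PRENE},
       "m3": {"features": USS, "targets": FSS},
       "m4": {"features": USS + FSS, "targets": PSTNE},
   }


   def required_columns(stages=STAGES):
       """Return every column the four stages touch, in first-use order."""
       seen = []
       for spec in stages.values():
           for col in spec["features"] + spec["targets"]:
               if col not in seen:
                   seen.append(col)
       return seen


   def missing_columns(row, stages=STAGES):
       """List the required columns that a stats row (a dict) does not carry."""
       return [col for col in required_columns(stages) if col not in row]

A helper like ``missing_columns`` could be used to check a pre-built stats dataset before it is passed in as ``stats_dataset``, assuming the default column names are kept.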

When to use
-----------

Tagged: ``batch``, ``ersatz``, ``materials``, ``ml_learn_auto``, ``multi-target``, ``parallel``, ``per_target_best``, ``regression``.

Inputs
------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20 20 20

   * - Label
     - ID
     - Type
     - Default
     - Required
     - Description
   * - Eng Stress-Strain Curves
     - eng_curves
     - vector
     - —
     - ✓
     - Collection of engineering stress-strain curves (one per material, vector of curve objects); used as the sole data source when stats_dataset is empty — the worker calls curve_engstress_stats on each curve to build the stats table.
   * - Stats Dataset
     - stats_dataset
     - dataset
     - —
     - 
     - Optional pre-built statistics dataset (one row per material) containing yield, ultimate, failure, energy, and shape columns; when supplied it bypasses curve-based auto-computation and is used directly for all four sub-model training runs.
   * - Material Name Column
     - name_col
     - text
     - name
     - 
     - Name of the dataset column carrying the per-material/curve identifier (default: 'name'); when stats are auto-computed this column is populated from each curve's getName() result.
   * - Eng Yield Strain Column
     - yld_strain_col
     - text
     - yield_strain
     - 
     - Dataset column name for engineering yield strain (default: 'yield_strain'); serves as a feature in models m1 and m2.
   * - Eng Yield Stress Column
     - yld_stress_col
     - text
     - yield_stress
     - 
     - Dataset column name for engineering yield stress (default: 'yield_stress'); serves as a feature in models m1 and m2.
   * - Eng Ultimate Strain Column
     - ult_strain_col
     - text
     - eng_ultimate_strain
     - 
     - Dataset column name for engineering ultimate strain (default: 'eng_ultimate_strain'); target in m1, feature in m2/m3/m4.
   * - Eng Ultimate Stress Column
     - ult_stress_col
     - text
     - ultimate_stress
     - 
     - Dataset column name for engineering ultimate stress (default: 'ultimate_stress'); target in m1, feature in m2/m3/m4.
   * - Eng Failure Strain Column
     - fail_strain_col
     - text
     - failure_strain
     - 
     - Dataset column name for engineering failure strain (default: 'failure_strain'); target in m3, feature in m4.
   * - Eng Failure Stress Column
     - fail_stress_col
     - text
     - failure_stress
     - 
     - Dataset column name for engineering failure stress (default: 'failure_stress'); target in m3, feature in m4.
   * - Post-Necking Shape Column
     - post_shape_col
     - text
     - post_shape
     - 
     - Dataset column name for the post-necking shape parameter (default: 'post_shape'); available for inclusion in extended feature sets if configured.
   * - Post-Necking Power Column
     - post_power_col
     - text
     - post_power
     - 
     - Dataset column name for the post-necking power parameter (default: 'post_power'); available for inclusion in extended feature sets if configured.
   * - Damage at Failure Column
     - damage_max_col
     - text
     - damage_max
     - 
     - Dataset column name for the maximum damage-at-failure scalar (default: 'damage_max'); available as a supplementary feature or target column.
   * - Plateau Fraction Column
     - plateau_frac_col
     - text
     - plateau_fraction
     - 
     - Dataset column name for the plateau fraction (default: 'plateau_fraction'); characterises the flat region of the stress-strain curve for inclusion as a feature.
   * - Pre-Ultimate Shape Column
     - pre_ult_shape_col
     - text
     - pre_ult_shape
     - 
     - Dataset column name for the pre-ultimate shape exponent (default: 'pre_ult_shape'); describes the curvature of the hardening region before the ultimate point.
   * - Pre-Ultimate Power Column
     - pre_ult_power_col
     - text
     - pre_ult_power
     - 
     - Dataset column name for the pre-ultimate power exponent (default: 'pre_ult_power'); complements pre_ult_shape for characterising the pre-necking hardening law.
   * - Voce K Column
     - voce_k_col
     - text
     - voce_k
     - 
     - Dataset column name for the Voce saturation parameter K (default: 'voce_k'); used when a Voce-type hardening law is fitted to the curve.
   * - Pre-Necking Energy Column (PRENE)
     - energy_yu_col
     - text
     - energy_yield_to_ultimate
     - 
     - Dataset column name for the pre-necking (yield-to-ultimate) specific energy (default: 'energy_yield_to_ultimate'); sole target of model m2 (PRENE).
   * - Post-Necking Energy Column (PSTNE)
     - energy_uf_col
     - text
     - energy_ultimate_to_failure
     - 
     - Dataset column name for the post-necking (ultimate-to-failure) specific energy (default: 'energy_ultimate_to_failure'); sole target of model m4 (PSTNE).
   * - Learning Quality
     - learning_quality
     - select
     - basic
     - 
     - Training quality preset: 'basic' uses a reduced regressor set with no cross-validation; 'best' enables the full regressor sweep with 5-fold cross-validation and prognosis mode (default: 'basic').
   * - Parallel Model Training
     - parallel
     - select
     - no
     - 
     - Set to 'yes' to train the four sub-models concurrently via pcntl_fork (requires PHP CLI SAPI with ext-pcntl). If forking is unavailable, training falls back silently to sequential; the mode actually used is reported in the parallel_actual output (default: 'no').
   * - Per-Target Best Model
     - per_target_best
     - select
     - no
     - 
     - When 'yes', trains a separate best-fit model per target column within each stage and emits comma-joined .pkl paths; when 'no' (default) a single multi-output sklearn model covers all targets in one .pkl file.
   * - Save Trained Models to Math Model Library
     - save_to_mathmodel
     - select
     - no
     - 
     - When 'yes', the worker calls the mathmodel_save worker with each stage's mfile.pkl once that stage (m1..m4) has been trained. The saved mathmodel IDs are emitted as the 'saved_mathmodels' keyvalue output (entries m1..m4), which can be wired into the Saved Math Models input of ml_predict_ersatz as an alternative to the temp-path 'models' keyvalue. For stages trained with per_target_best=yes, the comma-joined .pkl list is saved as a single per-target-best CSV mathmodel; ml_predict handles either shape transparently on the predict side.
   * - Math Model Name Prefix
     - mathmodel_name_prefix
     - text
     - —
     - 
     - Used only when Save Trained Models = Yes. Each saved model is named '<prefix>_m1', '<prefix>_m2', etc. When empty, an auto-generated prefix 'ersatz_<uniqid>' is used so back-to-back runs do not collide on the unique-name constraint enforced by mathmodel_save.
   * - Math Model Tags (comma-separated)
     - mathmodel_tags
     - text
     - ersatz
     - 
     - Tags applied to every saved mathmodel. Helpful for finding the four sub-models later under a single label.
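
The string conventions described in the table (saved model names of the form ``<prefix>_m1``, comma-separated tags, and comma-joined ``.pkl`` paths when ``per_target_best`` is 'yes') can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, ``uuid`` stands in for the platform's own unique-id generator, and whitespace trimming is assumed rather than documented.

.. code-block:: python

   import uuid

   MODEL_IDS = ("m1", "m2", "m3", "m4")


   def mathmodel_names(prefix=""):
       """Name each saved model '<prefix>_m1' .. '<prefix>_m4'.

       An empty prefix falls back to 'ersatz_<unique>'. The unique part is
       generated with uuid4 here, as a stand-in for the platform's generator.
       """
       if not prefix.strip():
           prefix = "ersatz_" + uuid.uuid4().hex[:13]
       return {mid: f"{prefix}_{mid}" for mid in MODEL_IDS}


   def parse_tags(raw="ersatz"):
       """Split a comma-separated tag string, dropping empty entries."""
       return [tag.strip() for tag in raw.split(",") if tag.strip()]


   def split_model_paths(value):
       """Split one stage's model entry into individual .pkl paths.

       A single multi-output model yields a one-element list; a
       per-target-best stage yields one path per target.
       """
       return [path.strip() for path in value.split(",") if path.strip()]

For example, ``mathmodel_names("steel")`` yields ``steel_m1`` through ``steel_m4``, while calling it with no prefix produces a fresh ``ersatz_...`` prefix on each call, so back-to-back runs get distinct names.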

Outputs
-------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20

   * - Label
     - ID
     - Type
     - Description
   * - Trained Models (mfile.pkl paths)
     - models
     - keyvalue
     - Key-value map of trained surrogate model file paths keyed by model ID (m1, m2, m3, m4); values are absolute .pkl paths consumed by ml_predict_ersatz.
   * - Stats Dataset
     - stats_dataset
     - dataset
     - The material statistics dataset actually used for training (one row per material with all feature/target columns); echoed back for inspection or caching, whether supplied by the user or auto-computed from eng_curves.
   * - Parallel Mode Actually Used
     - parallel_actual
     - text
     - Text flag indicating the parallelism mode that was actually applied at runtime ('parallel' or 'sequential'), reflecting any automatic fallback from the requested parallel mode.
   * - Saved Math Model IDs (when Save Trained Models = Yes)
     - saved_mathmodels
     - keyvalue
     - Key-value map of platform MathModel record IDs (keyed by m1–m4) created when 'Save Trained Models' is enabled; empty when saving is disabled.
   * - Raw vs Predictions
     - raw_vs_predictions
     - dataset
     - Verification dataset built by chain-predicting the just-trained m1..m4 models back over ``stats_dataset``. For every anchor column the dataset carries the raw, predicted, and (raw − pred) values per row. Each row also includes a ``predicted_eng_stress_strain`` column with the engineering stress-strain curve reconstructed from the predicted anchors via curves_ersatz_eng. Drop the row's curve cell on a slide alongside the original input curve to spot any per-material divergence.
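
The raw, predicted, and (raw − pred) layout of ``raw_vs_predictions`` can be illustrated with a small helper. This is a hedged sketch, not the worker's implementation: the ``_raw``, ``_pred`` and ``_diff`` column suffixes and the function name are assumptions, rows are modelled as plain dictionaries, and the ``predicted_eng_stress_strain`` curve column is left out.

.. code-block:: python

   def raw_vs_pred(rows, predictions, anchors, name_col="name"):
       """Pair raw and predicted anchor values row by row.

       rows and predictions are equally long lists of dicts; anchors lists
       the columns to compare. Column suffixes are illustrative only.
       """
       if len(rows) != len(predictions):
           raise ValueError("rows and predictions must have the same length")
       out = []
       for raw_row, pred_row in zip(rows, predictions):
           record = {name_col: raw_row.get(name_col)}
           for col in anchors:
               raw, pred = raw_row[col], pred_row[col]
               record[f"{col}_raw"] = raw
               record[f"{col}_pred"] = pred
               record[f"{col}_diff"] = raw - pred  # residual, raw minus predicted
           out.append(record)
       return out

Large residuals in a ``_diff`` column would point to materials where the surrogate chain reproduces the training data poorly.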

Disciplines
-----------

- ai_ml.prognosis
- ai_ml.supervised.regression
- ai_ml.surrogate
- engineering.material.calibration
- engineering.material.characterization

.. raw:: html

   <hr style="margin-top:2em">
   <p style="font-size:11px;color:#888">
   Auto-generated from <code>platform</code> schema. Worker id: <code>ml_learn_ersatz</code>. Schema hash: <code>0884c780ba0f</code>. Hand-curated docs in <code>workerexamples/</code> override this page when present.
   </p>