.. _auto_aws_sagemaker_automl_learn:

*AWS SAGEMAKER AUTOML LEARN*
============================

Trains machine-learning models with AWS SageMaker AutoML, one model per selected target feature, automatically exploring candidate algorithms for regression or classification problems. The worker accepts a labeled training dataset with explicit input and target feature selections, submits one SageMaker AutoML job per target, and returns the trained model references together with job metadata and S3 artifact locations. Use this worker when you want managed, cloud-based AutoML training without manual algorithm selection.
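
Because each target feature triggers its own SageMaker AutoML job, the submission step can be pictured as one ``CreateAutoMLJob`` request per target. The following is a minimal sketch, not the worker's actual implementation: the helper name ``build_automl_requests``, the S3 URIs, the IAM role ARN, and the ``<job_key>-<index>`` naming scheme are illustrative assumptions, and the training data is assumed to be already uploaded to S3 as CSV with a header row. The dictionary keys follow the parameters of boto3's ``create_auto_ml_job``.

.. code-block:: python

   def build_automl_requests(targets, problem_type, error_metric, train_s3_uri,
                             output_s3_uri, role_arn, job_key, max_candidates=1,
                             max_runtime_per_training_job=1800,
                             max_job_runtime=3600):
       """Build one create_auto_ml_job keyword-argument dict per target feature.

       Hypothetical helper: the naming scheme, URIs and role are assumptions.
       """
       requests = []
       for index, target in enumerate(targets):
           # Assumed naming scheme: the shared job key plus the target's index.
           job_name = f"{job_key}-{index}"
           if len(job_name) > 32:  # SageMaker caps AutoML job names at 32 characters
               raise ValueError(f"job name too long: {job_name!r}")
           requests.append({
               "AutoMLJobName": job_name,
               "InputDataConfig": [{
                   "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                                   "S3Uri": train_s3_uri}},
                   "TargetAttributeName": target,
               }],
               "OutputDataConfig": {"S3OutputPath": output_s3_uri},
               "ProblemType": problem_type,  # e.g. "Regression"
               "AutoMLJobObjective": {"MetricName": error_metric},  # e.g. "MSE"
               "AutoMLJobConfig": {"CompletionCriteria": {
                   "MaxCandidates": max_candidates,
                   "MaxRuntimePerTrainingJobInSeconds": max_runtime_per_training_job,
                   "MaxAutoMLJobRuntimeInSeconds": max_job_runtime,
               }},
               "RoleArn": role_arn,
           })
       return requests

With AWS credentials configured, each dict could then be submitted with ``boto3.client("sagemaker").create_auto_ml_job(**request)``. Selecting more targets simply yields more requests, which is why total runtime grows with the number of target features.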

When to use
-----------

Tagged: ``automl``, ``aws``, ``classification``, ``cloud``, ``machine_learning``, ``model_selection``, ``regression``, ``s3``.

Inputs
------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20 20 20

   * - Label
     - ID
     - Type
     - Default
     - Required
     - Description
   * - Training Dataset
     - training_dataset
     - dataset
     - —
     - ✓
     - Tabular dataset (d3VIEW dataset type) used to train the AutoML model. It must contain only the intended input and target feature columns and at least 500 rows, the minimum SageMaker AutoML accepts.
   * - Input Features
     - input_features
     - string
     - —
     - ✓
     - One or more column names from the training dataset to use as predictor features; must be unique and must not overlap with target_features.
   * - Target Features
     - target_features
     - string
     - —
     - ✓
     - One or more column names from the training dataset to predict; each target triggers a separate SageMaker AutoML job, so selecting multiple targets increases total runtime.
   * - Problem Type
     - problem_type
     - select
     - Regression
     - ✓
     - AutoML problem formulation: 'Regression' for continuous targets, 'BinaryClassification' for two-class targets, or 'MulticlassClassification' for targets with three or more classes; defaults to 'Regression'.
   * - Error Metric
     - error_metric
     - select
     - MSE
     - ✓
     - Optimization metric SageMaker uses to rank candidate models; options are MAE, MSE (default), R2, and RMSE. All four are regression metrics, so they are most meaningful when Problem Type is 'Regression'.
   * - Max Candidates
     - max_candidates
     - number
     - 1
     - 
     - Maximum number of algorithm/pipeline candidates that SageMaker AutoML will evaluate per target feature; integer in [1, 5], defaults to 1 for fastest results.
   * - Max Runtime Per Training Job In Seconds
     - max_runtime_per_training_job_in_seconds
     - number
     - 1800
     - 
     - Wall-clock time budget in seconds for each individual candidate training trial; integer in [600, 3600] with 60-second steps, defaults to 1800 s (30 min).
   * - Max AutoML Job Runtime In Seconds
     - max_auto_ml_job_runtime_in_seconds
     - number
     - 3600
     - 
     - Total wall-clock time budget in seconds for the entire AutoML job (across all candidates); integer in [1800, 7200] with 600-second steps, defaults to 3600 s (1 hr).
   * - Cleanup Candidates
     - cleanup_candidates
     - boolean
     - False
     - 
     - When true, intermediate SageMaker candidate artifacts are deleted after the job completes to reduce S3 storage costs; defaults to false.
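
The constraints listed in the Inputs table can be checked before any job is submitted. The sketch below is a hypothetical pre-flight check, not part of the worker's documented interface: ``validate_inputs`` and its argument names are assumptions, and it models the dataset as a list of column names plus a row count.

.. code-block:: python

   PROBLEM_TYPES = {"Regression", "BinaryClassification", "MulticlassClassification"}
   ERROR_METRICS = {"MAE", "MSE", "R2", "RMSE"}


   def validate_inputs(columns, n_rows, input_features, target_features,
                       problem_type="Regression", error_metric="MSE",
                       max_candidates=1, max_runtime_per_training_job=1800,
                       max_job_runtime=3600):
       """Return a list of problems; an empty list means every check passed."""
       errors = []
       if n_rows < 500:
           errors.append(f"dataset has {n_rows} rows; SageMaker AutoML needs at least 500")
       for name, features in (("input_features", input_features),
                              ("target_features", target_features)):
           if not features:
               errors.append(f"{name} must name at least one column")
           if len(set(features)) != len(features):
               errors.append(f"{name} contains duplicate column names")
           missing = [f for f in features if f not in columns]
           if missing:
               errors.append(f"{name} not found in dataset: {missing}")
       overlap = set(input_features) & set(target_features)
       if overlap:
           errors.append(f"columns used as both input and target: {sorted(overlap)}")
       # The dataset should hold only the selected input and target columns.
       extra = set(columns) - set(input_features) - set(target_features)
       if extra:
           errors.append(f"unselected columns present: {sorted(extra)}")
       if problem_type not in PROBLEM_TYPES:
           errors.append(f"unknown problem_type {problem_type!r}")
       if error_metric not in ERROR_METRICS:
           errors.append(f"unknown error_metric {error_metric!r}")
       if not 1 <= max_candidates <= 5:
           errors.append("max_candidates must be in [1, 5]")
       if not (600 <= max_runtime_per_training_job <= 3600
               and max_runtime_per_training_job % 60 == 0):
           errors.append("max_runtime_per_training_job_in_seconds must be in "
                         "[600, 3600] in 60-second steps")
       if not (1800 <= max_job_runtime <= 7200 and max_job_runtime % 600 == 0):
           errors.append("max_auto_ml_job_runtime_in_seconds must be in "
                         "[1800, 7200] in 600-second steps")
       return errors

Collecting every problem into a list, rather than raising on the first failure, lets a user fix all invalid selections in a single pass.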

Outputs
-------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20

   * - Label
     - ID
     - Type
     - Description
   * - Models
     - models
     - dataset
     - Dataset mapping each target feature column name to its corresponding SageMaker model name, used as input to downstream prediction or evaluation workers.
   * - Model Training Information
     - model_training_information
     - dataset
     - Dataset containing per-candidate training metadata returned by SageMaker (algorithm, hyperparameters, metric scores, status) for each trained model.
   * - Input Features
     - training_input_features
     - text
     - Comma-separated string of the input feature column names that were actually used during training; useful for auditing and passing to prediction workers.
   * - Target Features
     - training_target_features
     - text
     - Comma-separated string of the target feature column names that were trained against; mirrors the user selection after validation and deduplication.
   * - SageMaker Job Key
     - sagemaker_job_key
     - text
     - Unique SageMaker AutoML job identifier (e.g., 'd3v-automl-20251001120000') used to query job status or retrieve artifacts from AWS directly.
   * - S3 Bucket Name
     - s3_bucket_name
     - text
     - Name of the S3 bucket ('d3v-sm-automl-learn-bucket') where training data, model artifacts, and AutoML outputs are stored.
   * - S3 Output Key
     - s3_output_key
     - text
     - S3 object-key prefix under which all AutoML output artifacts (model tarballs, candidate pipelines) are stored for this specific job run.
   * - Logs
     - logs
     - dataset
     - Dataset of structured log entries (log_type, log_message, log_time) recording worker execution progress, warnings, and errors during the AutoML training run.
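
The text outputs are intended for downstream use. The sketch below shows one way to consume them, under stated assumptions: the helper names are hypothetical, the ``sagemaker_job_key`` value is assumed to be usable directly as the ``AutoMLJobName`` argument of boto3's ``describe_auto_ml_job`` (if one job is submitted per target, the key may instead be a shared prefix), and ``client`` is any object exposing that method, such as ``boto3.client("sagemaker")``.

.. code-block:: python

   import time


   def split_features(text):
       """Turn a comma-separated training_*_features output into a list of names."""
       return [name.strip() for name in text.split(",") if name.strip()]


   def artifact_uri(bucket, output_key):
       """Join s3_bucket_name and s3_output_key into an s3:// URI."""
       return f"s3://{bucket}/{output_key.lstrip('/')}"


   def wait_for_job(client, job_name, poll_seconds=60, timeout_seconds=7200):
       """Poll DescribeAutoMLJob until the job reaches a terminal status.

       'InProgress' and 'Stopping' keep the loop polling; 'Completed',
       'Failed' and 'Stopped' are returned to the caller.
       """
       deadline = time.monotonic() + timeout_seconds
       while True:
           response = client.describe_auto_ml_job(AutoMLJobName=job_name)
           status = response["AutoMLJobStatus"]
           if status in ("Completed", "Failed", "Stopped"):
               return status
           if time.monotonic() >= deadline:
               raise TimeoutError(f"{job_name} still {status} after {timeout_seconds} s")
           time.sleep(poll_seconds)

Taking ``client`` as a parameter, instead of constructing a boto3 client inside the function, keeps the polling loop testable without AWS credentials.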

Disciplines
-----------

- ai_ml.model_selection
- ai_ml.supervised.classification
- ai_ml.supervised.regression
- platform.integration
- platform.job_submission

.. raw:: html

   <hr style="margin-top:2em">
   <p style="font-size:11px;color:#888">
   Auto-generated from <code>platform</code> schema. Worker id: <code>aws_sagemaker_automl_learn</code>. Schema hash: <code>d8d61c6041f7</code>. Hand-curated docs in <code>workerexamples/</code> override this page when present.
   </p>