.. _auto_aws_sagemaker_automl_learn:

*AWS SAGEMAKER AUTOML LEARN*
============================

Trains one or more ML models via AWS SageMaker AutoML, automatically exploring candidate algorithms for regression or classification problems. The worker accepts a labeled training dataset with explicit input and target feature selections, submits one SageMaker AutoML job per target feature, and returns trained model references together with job metadata and S3 artifact locations. Use this worker when you want managed, cloud-based AutoML training without selecting algorithms manually.

When to use
-----------

Tagged: ``automl``, ``aws``, ``classification``, ``cloud``, ``machine_learning``, ``model_selection``, ``regression``, ``s3``.

Inputs
------

.. list-table::
   :header-rows: 1
   :widths: 15 20 10 10 10 35

   * - Label
     - ID
     - Type
     - Default
     - Required
     - Description
   * - Training Dataset
     - training_dataset
     - dataset
     - —
     - ✓
     - Tabular dataset (d3VIEW dataset type) used to train the AutoML model. It must contain only the intended input and target feature columns and at least 500 rows, the minimum required by SageMaker AutoML.
   * - Input Features
     - input_features
     - string
     - —
     - ✓
     - One or more column names from the training dataset to use as predictor features. Names must be unique and must not overlap with ``target_features``.
   * - Target Features
     - target_features
     - string
     - —
     - ✓
     - One or more column names from the training dataset to predict. Each target triggers a separate SageMaker AutoML job, so selecting multiple targets increases total runtime.
   * - Problem Type
     - problem_type
     - select
     - Regression
     - ✓
     - AutoML problem formulation: choose ``Regression`` for continuous targets, ``BinaryClassification`` for two-class targets, or ``MulticlassClassification`` for targets with three or more classes. Defaults to ``Regression``.
   * - Error Metric
     - error_metric
     - select
     - MSE
     - ✓
     - Optimization metric SageMaker uses to rank candidate models. Options are MAE, MSE (default), R2, and RMSE; select a metric appropriate for the chosen problem type.
   * - Max Candidates
     - max_candidates
     - number
     - 1
     -
     - Maximum number of algorithm/pipeline candidates that SageMaker AutoML evaluates per target feature. Integer in [1, 5]; defaults to 1 for the fastest results.
   * - Max Runtime Per Training Job In Seconds
     - max_runtime_per_training_job_in_seconds
     - number
     - 1800
     -
     - Wall-clock time budget, in seconds, for each individual candidate training job. Integer in [600, 3600] in 60-second steps; defaults to 1800 s (30 min).
   * - Max AutoML Job Runtime In Seconds
     - max_auto_ml_job_runtime_in_seconds
     - number
     - 3600
     -
     - Total wall-clock time budget, in seconds, for the entire AutoML job across all candidates. Integer in [1800, 7200] in 600-second steps; defaults to 3600 s (1 h).
   * - Cleanup Candidates
     - cleanup_candidates
     - boolean
     - False
     -
     - When true, intermediate SageMaker candidate artifacts are deleted after the job completes to reduce S3 storage costs. Defaults to false.

Outputs
-------

.. list-table::
   :header-rows: 1
   :widths: 20 25 10 45

   * - Label
     - ID
     - Type
     - Description
   * - Models
     - models
     - dataset
     - Dataset mapping each target feature column name to its corresponding SageMaker model name; used as input to downstream prediction or evaluation workers.
   * - Model Training Information
     - model_training_information
     - dataset
     - Dataset of per-candidate training metadata returned by SageMaker (algorithm, hyperparameters, metric scores, status) for each trained model.
   * - Input Features
     - training_input_features
     - text
     - Comma-separated string of the input feature column names actually used during training; useful for auditing and for passing to prediction workers.
   * - Target Features
     - training_target_features
     - text
     - Comma-separated string of the target feature column names that were trained against; mirrors the user selection after validation and deduplication.
   * - SageMaker Job Key
     - sagemaker_job_key
     - text
     - Unique SageMaker AutoML job identifier (e.g., ``d3v-automl-20251001120000``) used to query job status or retrieve artifacts from AWS directly.
   * - S3 Bucket Name
     - s3_bucket_name
     - text
     - Name of the S3 bucket (``d3v-sm-automl-learn-bucket``) where training data, model artifacts, and AutoML outputs are stored.
   * - S3 Output Key
     - s3_output_key
     - text
     - S3 object-key prefix under which all AutoML output artifacts (model tarballs, candidate pipelines) are stored for this specific job run.
   * - Logs
     - logs
     - dataset
     - Dataset of structured log entries (``log_type``, ``log_message``, ``log_time``) recording worker execution progress, warnings, and errors during the AutoML training run.

Disciplines
-----------

- ai_ml.model_selection
- ai_ml.supervised.classification
- ai_ml.supervised.regression
- platform.integration
- platform.job_submission

.. raw:: html

   <!-- Auto-generated from platform schema. Worker id: aws_sagemaker_automl_learn. Schema hash: d8d61c6041f7. Hand-curated docs in workerexamples/ override this page when present. -->
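
Example
-------

The input constraints documented above line up with the request fields of Amazon SageMaker's ``CreateAutoMLJob`` API: ``ProblemType``, ``AutoMLJobObjective.MetricName``, and the ``MaxCandidates``, ``MaxRuntimePerTrainingJobInSeconds``, and ``MaxAutoMLJobRuntimeInSeconds`` completion criteria. The following is a minimal sketch, not the worker's actual implementation, assuming the worker validates its inputs as documented and then issues one such request per target feature. The helper names (``check_range``, ``build_automl_requests``), the per-target S3 layout under a hypothetical ``s3_prefix``, the ``-0``/``-1`` job-name suffixes, and the ``role_arn`` value are illustrative assumptions. The sketch only builds request dictionaries, so it runs without AWS credentials.

.. code-block:: python

   from typing import Any

   # Constraints transcribed from the Inputs table on this page.
   PROBLEM_TYPES = {"Regression", "BinaryClassification", "MulticlassClassification"}
   ERROR_METRICS = {"MAE", "MSE", "R2", "RMSE"}
   MIN_ROWS = 500


   def check_range(name: str, value: int, low: int, high: int, step: int = 1) -> None:
       """Raise ValueError if value is outside [low, high] or off the step grid."""
       if not (low <= value <= high) or (value - low) % step != 0:
           raise ValueError(f"{name}={value}: expected {low}..{high} in steps of {step}")


   def build_automl_requests(
       n_rows: int,
       columns: list[str],
       input_features: list[str],
       target_features: list[str],
       job_key: str,
       s3_prefix: str,  # hypothetical, e.g. "s3://example-bucket/run-1"
       role_arn: str,  # hypothetical IAM role ARN with SageMaker/S3 access
       problem_type: str = "Regression",
       error_metric: str = "MSE",
       max_candidates: int = 1,
       max_runtime_per_training_job_in_seconds: int = 1800,
       max_auto_ml_job_runtime_in_seconds: int = 3600,
   ) -> list[dict[str, Any]]:
       """Validate worker inputs, then build one CreateAutoMLJob request per target."""
       if n_rows < MIN_ROWS:
           raise ValueError(f"SageMaker AutoML needs at least {MIN_ROWS} rows, got {n_rows}")
       if not input_features or len(set(input_features)) != len(input_features):
           raise ValueError("input_features must be non-empty and unique")
       targets = list(dict.fromkeys(target_features))  # deduplicate, keep order
       if not targets:
           raise ValueError("target_features must be non-empty")
       if set(input_features) & set(targets):
           raise ValueError("input_features and target_features must not overlap")
       if set(columns) != set(input_features) | set(targets):
           raise ValueError("dataset must contain exactly the input and target columns")
       if problem_type not in PROBLEM_TYPES:
           raise ValueError(f"unknown problem_type {problem_type!r}")
       if error_metric not in ERROR_METRICS:
           raise ValueError(f"unknown error_metric {error_metric!r}")
       check_range("max_candidates", max_candidates, 1, 5)
       check_range("max_runtime_per_training_job_in_seconds",
                   max_runtime_per_training_job_in_seconds, 600, 3600, 60)
       check_range("max_auto_ml_job_runtime_in_seconds",
                   max_auto_ml_job_runtime_in_seconds, 1800, 7200, 600)

       requests = []
       for i, target in enumerate(targets):
           # One AutoML job per target feature, as the Target Features row describes.
           requests.append({
               "AutoMLJobName": f"{job_key}-{i}",  # hypothetical naming scheme
               "InputDataConfig": [{
                   "DataSource": {"S3DataSource": {
                       "S3DataType": "S3Prefix",
                       "S3Uri": f"{s3_prefix}/{target}/train/",  # hypothetical layout
                   }},
                   "TargetAttributeName": target,
               }],
               "OutputDataConfig": {"S3OutputPath": f"{s3_prefix}/{target}/output/"},
               "ProblemType": problem_type,
               "AutoMLJobObjective": {"MetricName": error_metric},
               "AutoMLJobConfig": {"CompletionCriteria": {
                   "MaxCandidates": max_candidates,
                   "MaxRuntimePerTrainingJobInSeconds": max_runtime_per_training_job_in_seconds,
                   "MaxAutoMLJobRuntimeInSeconds": max_auto_ml_job_runtime_in_seconds,
               }},
               "RoleArn": role_arn,
           })
       return requests

Assuming AWS credentials and a suitable IAM role are configured, each dictionary could be submitted with ``boto3.client("sagemaker").create_auto_ml_job(**request)``. Whether this worker calls that API, its ``CreateAutoMLJobV2`` successor, or something else is not stated on this page. Validating locally surfaces input errors before any SageMaker job is created.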