AWS SAGEMAKER AUTOML LEARN

Trains one or more ML models via AWS SageMaker AutoML, automatically exploring candidate algorithms for regression or classification problems. Accepts a labeled training dataset with explicit input/target feature selections, then submits an AutoML job to SageMaker and returns trained model references alongside job metadata and S3 artifact locations. Use this worker when you want managed, cloud-based AutoML training without manual algorithm selection.
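As a rough illustration of the submission step, the sketch below shows how the inputs listed under Inputs might map onto the request accepted by boto3's SageMaker `create_auto_ml_job` call, with one request per target feature. The worker's actual implementation is not documented on this page, so the function name `build_automl_requests`, the per-target job-name suffix, and the S3 URI and role ARN arguments are all assumptions made for the example.

```python
from typing import Any, Dict, List


def build_automl_requests(
    job_key: str,
    train_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    target_features: List[str],
    problem_type: str = "Regression",
    error_metric: str = "MSE",
    max_candidates: int = 1,
    max_runtime_per_training_job_in_seconds: int = 1800,
    max_auto_ml_job_runtime_in_seconds: int = 3600,
) -> List[Dict[str, Any]]:
    """Build one CreateAutoMLJob request dict per target feature."""
    requests: List[Dict[str, Any]] = []
    for index, target in enumerate(target_features):
        requests.append(
            {
                # Hypothetical naming scheme: job key plus a per-target suffix.
                "AutoMLJobName": f"{job_key}-{index}",
                "InputDataConfig": [
                    {
                        "DataSource": {
                            "S3DataSource": {
                                "S3DataType": "S3Prefix",
                                "S3Uri": train_s3_uri,
                            }
                        },
                        "TargetAttributeName": target,
                    }
                ],
                "OutputDataConfig": {"S3OutputPath": output_s3_uri},
                "ProblemType": problem_type,
                "AutoMLJobObjective": {"MetricName": error_metric},
                "AutoMLJobConfig": {
                    "CompletionCriteria": {
                        "MaxCandidates": max_candidates,
                        "MaxRuntimePerTrainingJobInSeconds": max_runtime_per_training_job_in_seconds,
                        "MaxAutoMLJobRuntimeInSeconds": max_auto_ml_job_runtime_in_seconds,
                    }
                },
                "RoleArn": role_arn,
            }
        )
    return requests


if __name__ == "__main__":
    # Placeholder bucket, role, and column names; none come from the platform.
    for request in build_automl_requests(
        job_key="d3v-automl-20251001120000",
        train_s3_uri="s3://example-bucket/train/",
        output_s3_uri="s3://example-bucket/output/",
        role_arn="arn:aws:iam::123456789012:role/ExampleRole",
        target_features=["target_a", "target_b"],
    ):
        # With AWS credentials configured, each dict could be submitted via
        # boto3.client("sagemaker").create_auto_ml_job(**request).
        print(request["AutoMLJobName"], request["InputDataConfig"][0]["TargetAttributeName"])
```

Building the requests as plain dictionaries keeps the mapping inspectable without AWS credentials; only the final submission needs a live client.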

When to use

Tagged: automl, aws, classification, cloud, machine_learning, model_selection, regression, s3.

Inputs

Label | ID | Type | Default | Required | Description
Training Dataset | training_dataset | dataset | | | Tabular dataset (d3VIEW dataset type) used to train the AutoML model; must contain only the intended input and target feature columns. SageMaker AutoML requires at least 500 rows.
Input Features | input_features | string | | | One or more column names from the training dataset to use as predictor features; must be unique and must not overlap with target_features.
Target Features | target_features | string | | | One or more column names from the training dataset to predict; each target triggers a separate SageMaker AutoML job, so selecting multiple targets increases total runtime.
Problem Type | problem_type | select | Regression | | AutoML problem formulation: choose ‘Regression’ for continuous targets, ‘BinaryClassification’ for two-class targets, or ‘MulticlassClassification’ for three-or-more-class targets; defaults to ‘Regression’.
Error Metric | error_metric | select | MSE | | Optimization metric used by SageMaker to rank candidate models; options are MAE, MSE (default), R2, and RMSE. Select a metric appropriate for the chosen problem type.
Max Candidates | max_candidates | number | 1 | | Maximum number of algorithm/pipeline candidates that SageMaker AutoML will evaluate per target feature; integer in [1, 5], defaults to 1 for fastest results.
Max Runtime Per Training Job In Seconds | max_runtime_per_training_job_in_seconds | number | 1800 | | Wall-clock time budget in seconds for each individual candidate training trial; integer in [600, 3600] with 60-second steps, defaults to 1800 s (30 min).
Max AutoML Job Runtime In Seconds | max_auto_ml_job_runtime_in_seconds | number | 3600 | | Total wall-clock time budget in seconds for the entire AutoML job (across all candidates); integer in [1800, 7200] with 600-second steps, defaults to 3600 s (1 hr).
Cleanup Candidates | cleanup_candidates | boolean | False | | When true, intermediate SageMaker candidate artifacts are deleted after the job completes to reduce S3 storage costs; defaults to false.
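The constraints in the table above can be checked before a job is submitted. The following is a minimal sketch, not the worker's own validation logic: the function names `validate_inputs` and `_stepped_range_error` are hypothetical, and the assumption that stepped ranges are aligned to their lower bound is mine rather than something this page states.

```python
from typing import List, Sequence

PROBLEM_TYPES = {"Regression", "BinaryClassification", "MulticlassClassification"}
ERROR_METRICS = {"MAE", "MSE", "R2", "RMSE"}
MIN_ROWS = 500  # minimum row count stated for the training dataset


def _stepped_range_error(name: str, value: int, low: int, high: int, step: int) -> List[str]:
    """Return a one-item error list if value is not an integer on the stepped range."""
    if isinstance(value, bool) or not isinstance(value, int):
        return [f"{name} must be an integer"]
    if not low <= value <= high:
        return [f"{name} must be in [{low}, {high}]"]
    # Assumption: steps are counted from the lower bound of the range.
    if (value - low) % step != 0:
        return [f"{name} must change in steps of {step} starting at {low}"]
    return []


def validate_inputs(
    row_count: int,
    input_features: Sequence[str],
    target_features: Sequence[str],
    problem_type: str = "Regression",
    error_metric: str = "MSE",
    max_candidates: int = 1,
    max_runtime_per_training_job_in_seconds: int = 1800,
    max_auto_ml_job_runtime_in_seconds: int = 3600,
) -> List[str]:
    """Collect every violated constraint as a human-readable message."""
    errors: List[str] = []
    if row_count < MIN_ROWS:
        errors.append(f"training_dataset needs at least {MIN_ROWS} rows, got {row_count}")
    if not input_features:
        errors.append("input_features must name at least one column")
    if not target_features:
        errors.append("target_features must name at least one column")
    if len(set(input_features)) != len(input_features):
        errors.append("input_features must be unique")
    overlap = sorted(set(input_features) & set(target_features))
    if overlap:
        errors.append(f"input_features and target_features overlap: {overlap}")
    if problem_type not in PROBLEM_TYPES:
        errors.append(f"unknown problem_type: {problem_type!r}")
    if error_metric not in ERROR_METRICS:
        errors.append(f"unknown error_metric: {error_metric!r}")
    errors += _stepped_range_error("max_candidates", max_candidates, 1, 5, 1)
    errors += _stepped_range_error(
        "max_runtime_per_training_job_in_seconds",
        max_runtime_per_training_job_in_seconds, 600, 3600, 60,
    )
    errors += _stepped_range_error(
        "max_auto_ml_job_runtime_in_seconds",
        max_auto_ml_job_runtime_in_seconds, 1800, 7200, 600,
    )
    return errors


if __name__ == "__main__":
    # Column names here are placeholders for illustration only.
    for message in validate_inputs(
        row_count=320,
        input_features=["thickness", "velocity"],
        target_features=["velocity"],
        max_candidates=8,
    ):
        print(message)
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass, which matters when each corrected submission can cost up to an hour or more of job time.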

Outputs

Label | ID | Type | Description
Models | models | dataset | Dataset mapping each target feature column name to its corresponding SageMaker model name, used as input to downstream prediction or evaluation workers.
Model Training Information | model_training_information | dataset | Dataset containing per-candidate training metadata returned by SageMaker (algorithm, hyperparameters, metric scores, status) for each trained model.
Input Features | training_input_features | text | Comma-separated string of the input feature column names that were actually used during training; useful for auditing and passing to prediction workers.
Target Features | training_target_features | text | Comma-separated string of the target feature column names that were trained against; mirrors the user selection after validation and deduplication.
SageMaker Job Key | sagemaker_job_key | text | Unique SageMaker AutoML job identifier (e.g., ‘d3v-automl-20251001120000’) used to query job status or retrieve artifacts from AWS directly.
S3 Bucket Name | s3_bucket_name | text | Name of the S3 bucket (‘d3v-sm-automl-learn-bucket’) where training data, model artifacts, and AutoML outputs are stored.
S3 Output Key | s3_output_key | text | S3 object-key prefix under which all AutoML output artifacts (model tarballs, candidate pipelines) are stored for this specific job run.
Logs | logs | dataset | Dataset of structured log entries (log_type, log_message, log_time) recording worker execution progress, warnings, and errors during the AutoML training run.
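Downstream code typically needs to unpack a few of these outputs. The sketch below shows one way to split the comma-separated feature strings, join the bucket name and output key into an S3 URI, and filter log entries by type. The helper names are hypothetical, the log rows are assumed to arrive as dictionaries keyed by the three documented field names, and the log_type value "error" and the output key in the usage example are guesses, since this page does not list them.

```python
from typing import Dict, Iterable, List


def split_feature_names(value: str) -> List[str]:
    """Split a comma-separated feature string into trimmed column names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def s3_output_uri(bucket_name: str, output_key: str) -> str:
    """Join s3_bucket_name and s3_output_key into a single s3:// URI."""
    return f"s3://{bucket_name}/{output_key.lstrip('/')}"


def select_logs(logs: Iterable[Dict[str, str]], wanted_types: Iterable[str]) -> List[Dict[str, str]]:
    """Keep log entries whose log_type matches one of wanted_types, ignoring case."""
    wanted = {log_type.lower() for log_type in wanted_types}
    return [entry for entry in logs if str(entry.get("log_type", "")).lower() in wanted]


if __name__ == "__main__":
    # Illustrative values only; real values come from the worker outputs.
    print(split_feature_names("thickness, velocity,mass"))
    print(s3_output_uri("d3v-sm-automl-learn-bucket", "example/output/prefix"))
    sample_logs = [
        {"log_type": "info", "log_message": "job submitted", "log_time": "2025-10-01 12:00:00"},
        {"log_type": "error", "log_message": "job failed", "log_time": "2025-10-01 12:30:00"},
    ]
    print(select_logs(sample_logs, ["error"]))
```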

Disciplines

  • ai_ml.model_selection
  • ai_ml.supervised.classification
  • ai_ml.supervised.regression
  • platform.integration
  • platform.job_submission

Auto-generated from platform schema. Worker id: aws_sagemaker_automl_learn. Schema hash: d8d61c6041f7. Hand-curated docs in workerexamples/ override this page when present.