AWS SAGEMAKER AUTOML LEARN

Trains one or more ML models via AWS SageMaker AutoML, automatically exploring candidate algorithms for regression or classification problems. Accepts a labeled training dataset with explicit input/target feature selections, then submits an AutoML job to SageMaker and returns trained model references alongside job metadata and S3 artifact locations. Use this worker when you want managed, cloud-based AutoML training without manual algorithm selection.
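
The worker's internals are not documented on this page. As a rough illustration of how its settings correspond to SageMaker's `CreateAutoMLJob` API, the sketch below builds the keyword arguments for boto3's `create_auto_ml_job` call for a single target. The helper name `build_automl_request`, the example bucket paths, the job name, and the role ARN are all hypothetical; only the SageMaker request field names come from the AWS API.

```python
from typing import Any


def build_automl_request(
    job_name: str,
    train_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    target: str,
    problem_type: str = "Regression",
    metric: str = "MSE",
    max_candidates: int = 1,
    max_runtime_per_training_job: int = 1800,
    max_job_runtime: int = 3600,
) -> dict[str, Any]:
    """Build kwargs for boto3's SageMaker create_auto_ml_job (one target per job)."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": train_s3_uri}
                },
                "TargetAttributeName": target,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ProblemType": problem_type,
        "AutoMLJobObjective": {"MetricName": metric},
        "AutoMLJobConfig": {
            "CompletionCriteria": {
                "MaxCandidates": max_candidates,
                "MaxRuntimePerTrainingJobInSeconds": max_runtime_per_training_job,
                "MaxAutoMLJobRuntimeInSeconds": max_job_runtime,
            }
        },
        "RoleArn": role_arn,
    }


# Hypothetical values, for illustration only.
request = build_automl_request(
    job_name="example-automl-job",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    target="price",
)

# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("sagemaker").create_auto_ml_job(**request)
```

Building the request as a plain dictionary keeps the mapping inspectable without AWS access; the worker itself handles submission, so this is only useful for understanding or reproducing a job outside the platform.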

When to use

Tagged: automl, aws, classification, cloud, machine_learning, model_selection, regression, s3.

Inputs

Label	ID	Type	Default	Required	Description
Training Dataset	training_dataset	dataset	—	✓	Tabular dataset (d3VIEW dataset type) used to train the AutoML model; it must contain only the intended input and target feature columns and at least 500 rows, the minimum SageMaker AutoML requires.
Input Features	input_features	string	—	✓	One or more column names from the training dataset to use as predictor features; must be unique and must not overlap with target_features.
Target Features	target_features	string	—	✓	One or more column names from the training dataset to predict; each target triggers a separate SageMaker AutoML job, so selecting multiple targets increases total runtime.
Problem Type	problem_type	select	Regression	✓	AutoML problem formulation: choose ‘Regression’ for continuous targets, ‘BinaryClassification’ for two-class targets, or ‘MulticlassClassification’ for targets with three or more classes; defaults to ‘Regression’.
Error Metric	error_metric	select	MSE	✓	Optimization metric used by SageMaker to rank candidate models; options are MAE, MSE (default), R2, and RMSE. Select a metric appropriate for the chosen problem type.
Max Candidates	max_candidates	number	1		Maximum number of algorithm/pipeline candidates that SageMaker AutoML will evaluate per target feature; integer in [1, 5], defaults to 1 for fastest results.
Max Runtime Per Training Job In Seconds	max_runtime_per_training_job_in_seconds	number	1800		Wall-clock time budget in seconds for each individual candidate training trial; integer in [600, 3600] with 60-second steps, defaults to 1800 s (30 min).
Max AutoML Job Runtime In Seconds	max_auto_ml_job_runtime_in_seconds	number	3600		Total wall-clock time budget in seconds for the entire AutoML job (across all candidates); integer in [1800, 7200] with 600-second steps, defaults to 3600 s (1 hr).
Cleanup Candidates	cleanup_candidates	boolean	False		Boolean flag; when true, intermediate SageMaker candidate artifacts are deleted after the job completes to reduce S3 storage costs; defaults to false.
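
The constraints in the table above can be checked before a job is submitted. The following is a minimal sketch, not the worker's own validation: the function names `validate_inputs` and `worst_case_runtime_seconds` are hypothetical, and the runtime estimate assumes that the per-target AutoML jobs run one after another, which this page does not state.

```python
def _on_grid(value, low, high, step):
    """True if value is an integer in [low, high] on a step grid anchored at low."""
    return (
        isinstance(value, int)
        and low <= value <= high
        and (value - low) % step == 0
    )


def validate_inputs(
    n_rows,
    input_features,
    target_features,
    max_candidates=1,
    max_runtime_per_training_job_in_seconds=1800,
    max_auto_ml_job_runtime_in_seconds=3600,
):
    """Return a list of problems found; an empty list means the inputs look valid."""
    problems = []
    if n_rows < 500:
        problems.append("training dataset needs at least 500 rows")
    if not input_features or not target_features:
        problems.append("input_features and target_features are both required")
    if len(set(input_features)) != len(input_features):
        problems.append("input_features must be unique")
    overlap = sorted(set(input_features) & set(target_features))
    if overlap:
        problems.append(f"columns used as both input and target: {overlap}")
    if not _on_grid(max_candidates, 1, 5, 1):
        problems.append("max_candidates must be an integer in [1, 5]")
    if not _on_grid(max_runtime_per_training_job_in_seconds, 600, 3600, 60):
        problems.append("per-training-job runtime must be in [600, 3600], step 60")
    if not _on_grid(max_auto_ml_job_runtime_in_seconds, 1800, 7200, 600):
        problems.append("AutoML job runtime must be in [1800, 7200], step 600")
    return problems


def worst_case_runtime_seconds(target_features, max_auto_ml_job_runtime_in_seconds):
    """Upper bound on total runtime, ASSUMING one sequential job per distinct target."""
    return len(set(target_features)) * max_auto_ml_job_runtime_in_seconds


# Example: two predictors, one target, default budgets.
ok = validate_inputs(1000, ["thickness", "velocity"], ["peak_force"])
# Example: too few rows, duplicated input, overlapping target, too many candidates.
bad = validate_inputs(100, ["a", "a"], ["a"], max_candidates=9)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.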

Outputs

Label	ID	Type	Description
Models	models	dataset	Dataset mapping each target feature column name to its corresponding SageMaker model name, used as input to downstream prediction or evaluation workers.
Model Training Information	model_training_information	dataset	Dataset containing per-candidate training metadata returned by SageMaker (algorithm, hyperparameters, metric scores, status) for each trained model.
Input Features	training_input_features	text	Comma-separated string of the input feature column names that were actually used during training; useful for auditing and passing to prediction workers.
Target Features	training_target_features	text	Comma-separated string of the target feature column names that were trained against; mirrors the user selection after validation and deduplication.
SageMaker Job Key	sagemaker_job_key	text	Unique SageMaker AutoML job identifier (e.g., ‘d3v-automl-20251001120000’) used to query job status or retrieve artifacts from AWS directly.
S3 Bucket Name	s3_bucket_name	text	Name of the S3 bucket (‘d3v-sm-automl-learn-bucket’) where training data, model artifacts, and AutoML outputs are stored.
S3 Output Key	s3_output_key	text	S3 object-key prefix under which all AutoML output artifacts (model tarballs, candidate pipelines) are stored for this specific job run.
Logs	logs	dataset	Dataset of structured log entries (log_type, log_message, log_time) recording worker execution progress, warnings, and errors during the AutoML training run.
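
As a small illustration of consuming these outputs, the sketch below joins `s3_bucket_name` and `s3_output_key` into an S3 URI and filters the `logs` rows by `log_type`. It assumes the logs dataset has been loaded as a list of dictionaries; the helper names, the sample output key, the sample log rows, and the `log_type` values (`"info"`, `"error"`) are hypothetical. Only the field names and the bucket name come from the table above.

```python
def s3_output_uri(bucket_name, output_key):
    """Join a bucket name and an object-key prefix into an s3:// URI."""
    return f"s3://{bucket_name}/{output_key.lstrip('/')}"


def filter_logs(logs, log_type):
    """Return the log rows whose log_type matches, ignoring case."""
    wanted = log_type.lower()
    return [row for row in logs if str(row["log_type"]).lower() == wanted]


# Hypothetical sample values, for illustration only.
uri = s3_output_uri("d3v-sm-automl-learn-bucket", "automl-output/example-job/")

sample_logs = [
    {"log_type": "info", "log_message": "job submitted", "log_time": "2025-10-01 12:00:00"},
    {"log_type": "error", "log_message": "training failed", "log_time": "2025-10-01 12:30:00"},
]
errors = filter_logs(sample_logs, "error")
```

The resulting URI can be passed to any S3 client to list or download the job's artifacts, subject to the caller having access to the bucket.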

Disciplines

ai_ml.model_selection
ai_ml.supervised.classification
ai_ml.supervised.regression
platform.integration
platform.job_submission

Auto-generated from platform schema. Worker id: aws_sagemaker_automl_learn. Schema hash: 82e7307e4e86. Hand-curated docs in workerexamples/ override this page when present.