AWS SAGEMAKER AUTOML LEARN
Trains one or more ML models with AWS SageMaker AutoML, which automatically explores candidate algorithms for regression or classification problems. The worker accepts a labeled training dataset with explicit input and target feature selections, submits one SageMaker AutoML job per target feature, and returns trained model references together with job metadata and S3 artifact locations. Use this worker when you want managed, cloud-based AutoML training without selecting algorithms manually.
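The submission step can be sketched as follows. This is a minimal sketch, not the worker's implementation: it builds one request per target feature in the shape accepted by boto3's SageMaker `create_auto_ml_job` call, using the defaults from the Inputs table. The job-naming scheme, S3 paths, role ARN, and the assumption of one training file per target are hypothetical placeholders, and nothing is sent to AWS.

```python
from datetime import datetime, timezone


def build_automl_request(job_name, target, s3_input_uri, s3_output_uri,
                         role_arn, problem_type="Regression", metric="MSE",
                         max_candidates=1,
                         max_runtime_per_training_job=1800,
                         max_job_runtime=3600):
    """Return keyword arguments for SageMaker CreateAutoMLJob (one target)."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_input_uri,
            }},
            # Each AutoML job predicts exactly one column.
            "TargetAttributeName": target,
        }],
        "OutputDataConfig": {"S3OutputPath": s3_output_uri},
        "ProblemType": problem_type,
        "AutoMLJobObjective": {"MetricName": metric},
        "AutoMLJobConfig": {"CompletionCriteria": {
            "MaxCandidates": max_candidates,
            "MaxRuntimePerTrainingJobInSeconds": max_runtime_per_training_job,
            "MaxAutoMLJobRuntimeInSeconds": max_job_runtime,
        }},
        "RoleArn": role_arn,
    }


# Hypothetical values: target names, bucket paths, role ARN, and job names
# are placeholders, not the worker's real configuration.
targets = ["yield_strength", "elongation"]
stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
job_requests = [
    build_automl_request(
        job_name=f"d3v-automl-{stamp}-{i}",
        target=target,
        # Assumed: one training file per target, so the other targets
        # are not treated as predictors.
        s3_input_uri=f"s3://example-bucket/train/{target}/",
        s3_output_uri="s3://example-bucket/output/",
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    )
    for i, target in enumerate(targets)
]
```

With AWS credentials configured, each dict could be submitted with `boto3.client("sagemaker").create_auto_ml_job(**request)`. Building one request per target mirrors the one-job-per-target behavior described for Target Features.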
When to use
Tagged: automl, aws, classification, cloud, machine_learning, model_selection, regression, s3.
Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Training Dataset | training_dataset | dataset | — | ✓ | Tabular dataset (d3VIEW dataset type) used to train the AutoML model. It must contain only the intended input and target feature columns, and SageMaker AutoML requires at least 500 rows. |
| Input Features | input_features | string | — | ✓ | One or more column names from the training dataset to use as predictor features; must be unique and must not overlap with target_features. |
| Target Features | target_features | string | — | ✓ | One or more column names from the training dataset to predict; each target triggers a separate SageMaker AutoML job, so selecting multiple targets increases total runtime. |
| Problem Type | problem_type | select | Regression | ✓ | AutoML problem formulation: choose ‘Regression’ for continuous targets, ‘BinaryClassification’ for two-class targets, or ‘MulticlassClassification’ for targets with three or more classes. |
| Error Metric | error_metric | select | MSE | ✓ | Objective metric SageMaker uses to rank candidate models. Options are MAE, MSE (default), R2, and RMSE; all four are regression metrics, so confirm that the selected metric suits the chosen problem type. |
| Max Candidates | max_candidates | number | 1 | | Maximum number of algorithm/pipeline candidates that SageMaker AutoML will evaluate per target feature; integer in [1, 5], defaults to 1 for fastest results. |
| Max Runtime Per Training Job In Seconds | max_runtime_per_training_job_in_seconds | number | 1800 | | Wall-clock time budget in seconds for each individual candidate training trial; integer in [600, 3600] with 60-second steps, defaults to 1800 s (30 min). |
| Max AutoML Job Runtime In Seconds | max_auto_ml_job_runtime_in_seconds | number | 3600 | | Total wall-clock time budget in seconds for the entire AutoML job (across all candidates); integer in [1800, 7200] with 600-second steps, defaults to 3600 s (1 hr). |
| Cleanup Candidates | cleanup_candidates | boolean | False | | Boolean flag; when true, intermediate SageMaker candidate artifacts are deleted after the job completes to reduce S3 storage costs; defaults to false. |
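The constraints in the table above can be checked before a job is submitted. The sketch below is an assumption about how such validation might look, not the worker's own code: it takes the dataset's column names and row count instead of a d3VIEW dataset object, and the hypothetical `worst_case_runtime_seconds` helper assumes the per-target jobs run one after another.

```python
MIN_ROWS = 500  # SageMaker AutoML minimum stated under Training Dataset


def _check_range(name, value, lo, hi, step, errors):
    """Append an error unless value is in [lo, hi] and lands on a step boundary."""
    if not (lo <= value <= hi) or (value - lo) % step != 0:
        errors.append(f"{name}={value} must be in [{lo}, {hi}] in steps of {step}")


def validate_inputs(columns, n_rows, input_features, target_features,
                    max_candidates=1,
                    max_runtime_per_training_job_in_seconds=1800,
                    max_auto_ml_job_runtime_in_seconds=3600):
    """Hypothetical validator; returns error messages (empty list = all checks pass)."""
    errors = []
    if n_rows < MIN_ROWS:
        errors.append(f"training_dataset has {n_rows} rows; need at least {MIN_ROWS}")
    if len(set(input_features)) != len(input_features):
        errors.append("input_features contains duplicate names")
    overlap = set(input_features) & set(target_features)
    if overlap:
        errors.append(f"in both input_features and target_features: {sorted(overlap)}")
    selected = set(input_features) | set(target_features)
    missing = selected - set(columns)
    if missing:
        errors.append(f"not columns of training_dataset: {sorted(missing)}")
    # The dataset should hold only the selected input and target columns.
    extra = set(columns) - selected
    if extra:
        errors.append(f"unselected columns in training_dataset: {sorted(extra)}")
    _check_range("max_candidates", max_candidates, 1, 5, 1, errors)
    _check_range("max_runtime_per_training_job_in_seconds",
                 max_runtime_per_training_job_in_seconds, 600, 3600, 60, errors)
    _check_range("max_auto_ml_job_runtime_in_seconds",
                 max_auto_ml_job_runtime_in_seconds, 1800, 7200, 600, errors)
    return errors


def worst_case_runtime_seconds(n_targets, max_auto_ml_job_runtime_in_seconds=3600):
    """Upper bound on total runtime, assuming per-target jobs run sequentially."""
    return n_targets * max_auto_ml_job_runtime_in_seconds


columns = ["thickness", "temperature", "yield_strength"]
ok = validate_inputs(columns, 1200, ["thickness", "temperature"], ["yield_strength"])
bad = validate_inputs(columns, 300, ["thickness", "thickness"], ["yield_strength"],
                      max_candidates=7)
print(ok)        # []
print(len(bad))  # 4: too few rows, duplicate input, unselected column, max_candidates
print(worst_case_runtime_seconds(3))  # 10800, i.e. 3 hours
```

Under that sequential assumption, three targets at the default 3600 s job budget could take up to 3 hours in total, which is why adding targets increases runtime.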
Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| Models | models | dataset | Dataset mapping each target feature column name to its corresponding SageMaker model name, used as input to downstream prediction or evaluation workers. |
| Model Training Information | model_training_information | dataset | Dataset containing per-candidate training metadata returned by SageMaker (algorithm, hyperparameters, metric scores, status) for each trained model. |
| Input Features | training_input_features | text | Comma-separated string of the input feature column names that were actually used during training; useful for auditing and passing to prediction workers. |
| Target Features | training_target_features | text | Comma-separated string of the target feature column names that were trained against; mirrors the user selection after validation and deduplication. |
| SageMaker Job Key | sagemaker_job_key | text | Unique SageMaker AutoML job identifier (e.g., ‘d3v-automl-20251001120000’) used to query job status or retrieve artifacts from AWS directly. |
| S3 Bucket Name | s3_bucket_name | text | Name of the S3 bucket (‘d3v-sm-automl-learn-bucket’) where training data, model artifacts, and AutoML outputs are stored. |
| S3 Output Key | s3_output_key | text | S3 object-key prefix under which all AutoML output artifacts (model tarballs, candidate pipelines) are stored for this specific job run. |
| Logs | logs | dataset | Dataset of structured log entries (log_type, log_message, log_time) recording worker execution progress, warnings, and errors during the AutoML training run. |
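Downstream code typically needs the feature lists back as Python lists and the artifact location as a full S3 URI. A minimal sketch, assuming the text outputs are plain comma-separated strings and that no column name contains a comma; the helper names, example values, and S3 key layout are hypothetical.

```python
def parse_feature_list(text):
    """Split a comma-separated feature string into trimmed, non-empty names.

    Assumes no column name itself contains a comma.
    """
    return [name.strip() for name in text.split(",") if name.strip()]


def s3_uri(bucket, key_prefix):
    """Join an S3 bucket name and object-key prefix into an s3:// URI."""
    return f"s3://{bucket}/{key_prefix.lstrip('/')}"


# Hypothetical output values; the real key layout is not documented here.
training_input_features = "thickness, temperature"
s3_bucket_name = "d3v-sm-automl-learn-bucket"
s3_output_key = "automl/d3v-automl-20251001120000/output"

features = parse_feature_list(training_input_features)
print(features)  # ['thickness', 'temperature']
print(s3_uri(s3_bucket_name, s3_output_key))
# s3://d3v-sm-automl-learn-bucket/automl/d3v-automl-20251001120000/output
```

Assuming the SageMaker Job Key is the AutoML job name, it could also be passed to boto3's `describe_auto_ml_job(AutoMLJobName=...)` to check job status directly in AWS.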
Disciplines
- ai_ml.model_selection
- ai_ml.supervised.classification
- ai_ml.supervised.regression
- platform.integration
- platform.job_submission
Auto-generated from platform schema. Worker id: aws_sagemaker_automl_learn. Schema hash: d8d61c6041f7. Hand-curated docs in workerexamples/ override this page when present.