DATASET REMOVE OUTLIERS
Detects and removes outlier rows from a dataset using iterative statistical analysis on one or more specified columns. Use this worker to clean noisy tabular data before model training, statistical analysis, or reporting.
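This page does not say which statistic the worker applies on each pass. The following is a minimal Python sketch of iterative outlier removal. It assumes Tukey's fences: a value counts as an outlier when it lies below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. It also assumes a row is dropped when any checked column holds an outlying value. The names `iqr_bounds` and `remove_outliers` are hypothetical and are not part of the d3VIEW API.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def iqr_bounds(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return Tukey fences (Q1 - k*IQR, Q3 + k*IQR); the rule itself is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows: List[Row], columns: Sequence[str],
                    num_iterations: int = 1, k: float = 1.5) -> List[Row]:
    """Hypothetical sketch: drop rows outside the fences in ANY checked column.

    Each pass recomputes the fences on the rows that survived the previous pass,
    so extreme values that inflated the spread no longer mask milder outliers.
    Rows are returned unchanged, so the column schema is preserved.
    """
    if num_iterations < 1:
        raise ValueError("num_iterations must be >= 1")
    kept = list(rows)
    for _ in range(num_iterations):
        if len(kept) < 2:
            break  # statistics.quantiles needs at least two data points
        bounds = {c: iqr_bounds([r[c] for r in kept], k) for c in columns}
        survivors = [r for r in kept
                     if all(lo <= r[c] <= hi for c, (lo, hi) in bounds.items())]
        if len(survivors) == len(kept):
            break  # converged: this pass removed nothing
        kept = survivors
    return kept


# Usage: one extreme value (100) plus a milder one (30).
values = [10, 11, 12, 13, 14, 15, 30, 100]
rows = [{"id": i, "x": v} for i, v in enumerate(values)]
print([r["x"] for r in remove_outliers(rows, ["x"], num_iterations=1)])  # [10, 11, 12, 13, 14, 15, 30]
print([r["x"] for r in remove_outliers(rows, ["x"], num_iterations=2)])  # [10, 11, 12, 13, 14, 15]
```

With one pass, 100 widens the interquartile range enough that 30 survives. The second pass recomputes the fences on the remaining rows and removes 30 as well. This masking effect is one reason to run more than one pass.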
When to use
Classification: process.
Tagged: data_cleaning, iterative, outlier_removal, row_filter, statistics.
Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Dataset | dataset | dataset | — | | Tabular dataset to scan for outliers; every row is a candidate for removal if outlying values are detected in the checked columns. |
| Columns To Check | columnstocheck | scalar | — | | One or more column names from the input dataset (comma-separated or multi-select) whose values are evaluated for outliers. Leave unset to check all numeric columns. |
| Number Of Iterations | num_iterations | scalar | 1 | | Number of successive outlier-removal passes (integer ≥ 1). Increase it when extreme values occur in clusters that a single pass does not fully remove. |
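The input table leaves some behavior implicit. It does not specify how a comma-separated column list is parsed or what counts as a numeric column. The following Python sketch shows one plausible way to resolve `columnstocheck` and `num_iterations` before the removal runs. Several details are assumptions made for illustration and do not document the worker: the helper names `resolve_columns` and `parse_num_iterations`, the list-of-dicts row format, and the rule that a column is numeric only when every value is a real number (booleans excluded).

```python
from numbers import Real
from typing import Any, Dict, List, Optional, Sequence, Union

Row = Dict[str, Any]


def resolve_columns(rows: List[Row],
                    columnstocheck: Optional[Union[str, Sequence[str]]]) -> List[str]:
    """Hypothetical helper: turn the Columns To Check input into column names."""
    if not rows:
        return []
    schema = list(rows[0].keys())  # assumption: every row shares the first row's keys
    if isinstance(columnstocheck, str):
        # Comma-separated form, e.g. "x, y".
        names = [c.strip() for c in columnstocheck.split(",") if c.strip()]
    else:
        # Multi-select form (a list of names), or None when unset.
        names = list(columnstocheck or [])
    if not names:
        # Unset: fall back to columns whose values are all real numbers (bools excluded).
        return [c for c in schema
                if all(isinstance(r.get(c), Real) and not isinstance(r.get(c), bool)
                       for r in rows)]
    missing = [c for c in names if c not in schema]
    if missing:
        raise ValueError(f"unknown column(s): {', '.join(missing)}")
    return names


def parse_num_iterations(value: Union[int, float, str, None] = 1) -> int:
    """Hypothetical helper: validate Number Of Iterations as an integer >= 1 (default 1)."""
    if value is None or value == "":
        return 1
    if isinstance(value, float) and not value.is_integer():
        raise ValueError("num_iterations must be a whole number")
    n = int(value)  # accepts ints, integral floats, and strings such as "3"
    if n < 1:
        raise ValueError("num_iterations must be an integer >= 1")
    return n


rows = [{"name": "a", "x": 1.0, "y": 2}, {"name": "b", "x": 3.5, "y": 4}]
print(resolve_columns(rows, "x, y"))  # ['x', 'y']
print(resolve_columns(rows, None))    # ['x', 'y'] ("name" is not numeric)
print(parse_num_iterations("3"))      # 3
```

Failing fast on unknown column names and on non-positive iteration counts keeps a misconfigured workflow from silently returning the input unchanged.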
Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| dataset_remove_outliers_output_1 | dataset_remove_outliers_output_1 | dataset | Cleaned dataset with outlier rows removed, retaining the same column schema as the input dataset. |
Disciplines
- data.dataset.transform
- data.statistics
Runnable example
A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_remove_outliers
Auto-generated from transformation schema. Worker id: dataset_remove_outliers. Schema hash: 1b2913a53bd9. Hand-curated docs in workerexamples/ override this page when present.