DATASET REMOVE OUTLIERS

Detects and removes outlier rows from a dataset using iterative statistical analysis on one or more specified columns. Use this worker to clean noisy tabular data before model training, statistical analysis, or reporting.

When to use

Classification: process.

Tagged: data_cleaning, iterative, outlier_removal, row_filter, statistics.

Inputs

| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Dataset | dataset | dataset | | | Tabular dataset to scan for outliers. Every row is a candidate for removal if an outlying value is detected in one of the checked columns. |
| Columns To Check | columnstocheck | scalar | | | One or more column names (comma-separated or multi-select) from the input dataset whose values are evaluated for outliers. Leave unset to check all numeric columns. |
| Number Of Iterations | num_iterations | scalar | 1 | | Number of successive outlier-removal passes to perform (integer ≥ 1). Increase it when a single pass is not enough to handle clusters of extreme values. |

Outputs

| Label | ID | Type | Description |
|---|---|---|---|
| dataset_remove_outliers_output_1 | dataset_remove_outliers_output_1 | dataset | Cleaned dataset with outlier rows removed, retaining the same column schema as the input dataset. |
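
The schema does not say which statistical test the worker applies, so the following is only a minimal sketch of how iterative outlier removal can behave, not the worker's implementation. It assumes Tukey's IQR fences (values outside Q1 - 1.5·IQR or Q3 + 1.5·IQR count as outliers). The function name `remove_outliers`, the `k` parameter, and the list-of-dicts row format are hypothetical; only `columns` and `num_iterations` correspond to the inputs documented above.

```python
import statistics


def remove_outliers(rows, columns=None, num_iterations=1, k=1.5):
    """Return the rows that survive up to `num_iterations` removal passes.

    ASSUMPTION: Tukey's IQR fences stand in for the worker's actual test,
    which the schema does not describe.
    """
    kept = list(rows)
    if not kept:
        return kept
    if columns is None:
        # Mirror "leave unset to check all numeric columns" (bools excluded).
        columns = [
            name for name, value in kept[0].items()
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        ]
    for _ in range(num_iterations):
        if len(kept) < 4:
            break  # too few rows to estimate quartiles
        fences = {}
        for col in columns:
            # Quartiles are recomputed on the rows that survived earlier passes.
            q1, _median, q3 = statistics.quantiles([r[col] for r in kept], n=4)
            spread = k * (q3 - q1)
            fences[col] = (q1 - spread, q3 + spread)
        survivors = [
            r for r in kept
            if all(fences[c][0] <= r[c] <= fences[c][1] for c in columns)
        ]
        if len(survivors) == len(kept):
            break  # converged: another pass would remove nothing
        kept = survivors
    return kept


values = [10, 11, 12, 13, 14, 15, 16, 30, 100]
rows = [{"name": f"r{i}", "x": v} for i, v in enumerate(values)]

one_pass = remove_outliers(rows, columns=["x"], num_iterations=1)
two_passes = remove_outliers(rows, columns=["x"], num_iterations=2)

print([r["x"] for r in one_pass])    # 100 is dropped, 30 survives
print([r["x"] for r in two_passes])  # 30 is dropped on the second pass
```

In this made-up data the value 100 widens the first-pass fences enough that 30 is kept. Once 100 is gone, the recomputed fences tighten and the second pass removes 30. This is the kind of situation in which raising Number Of Iterations above 1 changes the result. In both cases the surviving rows keep every column of the input.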

Disciplines

  • data.dataset.transform
  • data.statistics

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_remove_outliers


Auto-generated from transformation schema. Worker id: dataset_remove_outliers. Schema hash: 1b2913a53bd9. Hand-curated docs in workerexamples/ override this page when present.