DATASET REMOVE UNIQUE COLUMNS

Removes columns whose ratio of unique values to total rows exceeds a specified threshold, which helps eliminate high-cardinality, identifier-like columns before modeling or analysis. Use this worker to drop columns that carry little statistical signal because nearly every row holds a distinct value.

When to use

Classification: process.

Tagged: column_filtering, data_cleaning, high_cardinality, preprocessing, remove_unique_columns, uniqueness_ratio.

Inputs

Label ID Type Default Required Description
Choose Dataset dataset_1 dataset   Input dataset (tabular) from which high-cardinality columns will be removed; accepts any dataset object available in the workflow.
Uniqueness Ratio uniqueness_ratio scalar 0.05   Fraction threshold (0–1) above which a column is considered too unique and is dropped; with the default of 0.05, a column is removed when its number of unique values exceeds 5% of the total row count.
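
The threshold rule described above can be sketched in plain Python. This is an illustrative approximation only, not the worker's actual implementation: the function names, the list-of-dicts representation of a dataset, and the treatment of empty columns are all assumptions made for the example. A column is dropped only when its ratio is strictly greater than the threshold, matching the "exceeds" wording in the input description.

```python
from typing import Any


def uniqueness_ratio(values: list[Any]) -> float:
    """Distinct values divided by row count (hypothetical helper).

    Assumes values are hashable; an empty column is treated as ratio 0.0.
    """
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def remove_unique_columns(
    rows: list[dict[str, Any]], threshold: float = 0.05
) -> list[dict[str, Any]]:
    """Drop every column whose uniqueness ratio exceeds the threshold.

    Row order and the values of the remaining columns are left unchanged.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())
    keep = [
        col
        for col in columns
        if uniqueness_ratio([row[col] for row in rows]) <= threshold
    ]
    return [{col: row[col] for col in keep} for row in rows]


# 100 rows: "id" has 100 distinct values (ratio 1.00),
# "batch" has 10 (ratio 0.10), "group" has 2 (ratio 0.02).
rows = [{"id": i, "group": i % 2, "batch": i // 10} for i in range(100)]
cleaned = remove_unique_columns(rows, threshold=0.05)
print(cleaned[0])  # {'group': 0}
```

With the default threshold of 0.05, only the group column survives in this example; raising the threshold to 0.10 would also keep batch, since its ratio would then no longer exceed the threshold.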

Outputs

Label ID Type Description
dataset_remove_unique_columns_output_1 dataset_remove_unique_columns_output_1 dataset Cleaned dataset from which every column whose uniqueness ratio exceeds the specified threshold has been removed; the original row order and the values of the remaining columns are preserved.

Disciplines

  • ai_ml.preprocessing
  • data.dataset.transform

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_remove_unique_columns


Auto-generated from transformation schema. Worker id: dataset_remove_unique_columns. Schema hash: c08c94da978a. Hand-curated docs in workerexamples/ override this page when present.