DATASET REMOVE UNIQUE COLUMNS

Removes every column whose uniqueness ratio (the number of unique values divided by the total row count) exceeds a specified threshold. This helps eliminate high-cardinality, identifier-like columns before modeling or analysis. Use this worker to drop columns that carry little statistical signal because nearly every row holds a different value.

When to use

Classification: process.

Tagged: column_filtering, data_cleaning, high_cardinality, preprocessing, remove_unique_columns, uniqueness_ratio.

Inputs

Label	ID	Type	Default	Required	Description
Choose Dataset	dataset_1	dataset	—		Input dataset (tabular) from which high-cardinality columns will be removed; accepts any dataset object available in the workflow.
Uniqueness Ratio	uniqueness_ratio	scalar	0.05		Threshold between 0 and 1. A column is dropped when its uniqueness ratio, the number of unique values divided by the total row count, exceeds this value. With the default of 0.05, columns whose unique values number more than 5% of the rows are removed.
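
The threshold rule described above can be sketched in plain Python. This is a minimal illustration, not the worker's actual implementation: the function name `remove_unique_columns`, the list-of-dicts dataset representation, and the sample columns are all hypothetical. It assumes a strict "greater than" comparison and treats missing values like any other value, neither of which this page confirms.

```python
from typing import Any

Row = dict[str, Any]


def remove_unique_columns(rows: list[Row], uniqueness_ratio: float = 0.05) -> list[Row]:
    """Drop columns whose (unique values / row count) exceeds the threshold.

    Assumption: the comparison is strict, so a column whose ratio equals
    the threshold is kept.
    """
    if not rows:
        return []
    n_rows = len(rows)
    columns = list(rows[0].keys())
    # Keep a column only when its uniqueness ratio does not exceed the threshold.
    keep = [
        col
        for col in columns
        if len({row[col] for row in rows}) / n_rows <= uniqueness_ratio
    ]
    # Rebuild each row in the original order with the surviving columns only.
    return [{col: row[col] for col in keep} for row in rows]


# Hypothetical sample: 10 rows with an ID-like column, a two-level
# categorical column, and a constant column.
rows = [
    {
        "part_id": f"P-{i:03d}",
        "material": "steel" if i < 5 else "aluminum",
        "thickness_mm": 1.5,
    }
    for i in range(10)
]

cleaned = remove_unique_columns(rows, uniqueness_ratio=0.25)
print(list(cleaned[0].keys()))  # ['material', 'thickness_mm']
```

Here `part_id` has a ratio of 10/10 = 1.0 and is dropped, while `material` (2/10 = 0.2) and `thickness_mm` (1/10 = 0.1) stay below 0.25 and are kept. Under the same assumptions, even a constant column has a ratio of 1/n, so with the default of 0.05 a dataset with fewer than 20 rows would lose every column; a higher threshold may be appropriate for small datasets.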

Outputs

Label	ID	Type	Description
dataset_remove_unique_columns_output_1	dataset_remove_unique_columns_output_1	dataset	Cleaned dataset from which every column whose uniqueness ratio exceeds the specified threshold has been removed; the original row order and the values of the remaining columns are preserved.

Disciplines

ai_ml.preprocessing
data.dataset.transform

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_remove_unique_columns

Auto-generated from transformation schema. Worker id: dataset_remove_unique_columns. Schema hash: c08c94da978a. Hand-curated docs in workerexamples/ override this page when present.