DATASET REMOVE UNIQUE COLUMNS
Removes every column whose ratio of unique values to total rows exceeds a specified threshold, which helps eliminate high-cardinality, identifier-like columns before modeling or analysis. Use this worker to clean a dataset by dropping columns that carry little statistical signal because almost every row holds a distinct value.
When to use
Classification: process.
Tagged: column_filtering, data_cleaning, high_cardinality, preprocessing, remove_unique_columns, uniqueness_ratio.
Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Choose Dataset | dataset_1 | dataset | — | | Input dataset (tabular) from which high-cardinality columns will be removed; accepts any dataset object available in the workflow. |
| Uniqueness Ratio | uniqueness_ratio | scalar | 0.05 | | Fraction threshold (0–1) above which a column is considered too unique and dropped. With the default of 0.05, a column is removed when its unique values number more than 5% of the total row count. |
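
The threshold comparison described for `uniqueness_ratio` can be illustrated with a small sketch. It assumes the ratio is the number of distinct values divided by the row count and that a column is dropped only when the ratio is strictly greater than the threshold; the worker's actual handling of missing values and ties is not documented here. The helper name `column_uniqueness_ratio` is hypothetical.

```python
def column_uniqueness_ratio(values):
    """Return distinct-value count divided by row count (0.0 for an empty column).

    Assumption: this mirrors the ratio the worker compares to its threshold.
    """
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


ids = list(range(100))        # 100 distinct values in 100 rows
flags = ["yes", "no"] * 50    # 2 distinct values in 100 rows

id_ratio = column_uniqueness_ratio(ids)      # 100 / 100
flag_ratio = column_uniqueness_ratio(flags)  # 2 / 100

threshold = 0.05  # the documented default
id_dropped = id_ratio > threshold      # identifier-like column exceeds the threshold
flag_dropped = flag_ratio > threshold  # low-cardinality column does not
print(id_dropped, flag_dropped)
```

Under these assumptions the identifier-like column is flagged for removal and the two-valued column is kept.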
Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| dataset_remove_unique_columns_output_1 | dataset_remove_unique_columns_output_1 | dataset | Cleaned dataset with all columns whose uniqueness ratio exceeds the specified threshold removed, preserving the original row order and remaining column values. |
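
As a rough illustration of the output described above, the following sketch filters a small in-memory table. It is not the worker's implementation: the dataset is modeled as a list of dictionaries, the function name `remove_unique_columns` and the sample column names are hypothetical, and the ratio is assumed to be distinct values divided by row count with a strict "greater than" comparison.

```python
def remove_unique_columns(rows, threshold=0.05):
    """Drop columns whose distinct/row-count ratio is strictly above threshold.

    rows: list of dicts sharing the same keys (a stand-in for a tabular dataset).
    Row order and the values of the remaining columns are left unchanged.
    """
    if not rows:
        return []
    n_rows = len(rows)
    columns = list(rows[0].keys())
    kept = [
        col for col in columns
        if len({row[col] for row in rows}) / n_rows <= threshold
    ]
    return [{col: row[col] for col in kept} for row in rows]


# Hypothetical sample data: one identifier-like column, one categorical column.
rows = [
    {"serial": f"SN-{i:03d}", "material": "steel" if i % 2 else "aluminum"}
    for i in range(100)
]

cleaned = remove_unique_columns(rows, threshold=0.05)
print(list(cleaned[0].keys()))
```

Here `serial` has 100 distinct values in 100 rows and is removed, while `material` has 2 distinct values in 100 rows and is kept.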
Disciplines
- ai_ml.preprocessing
- data.dataset.transform
Runnable example
A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_remove_unique_columns
Auto-generated from transformation schema. Worker id: dataset_remove_unique_columns. Schema hash: c08c94da978a. Hand-curated docs in workerexamples/ override this page when present.