DATASET ENCODER
Encodes categorical columns in a dataset using either label encoding (integer mapping) or one-hot / unique-column expansion. Use this worker to convert string or categorical features into numeric representations before feeding data into ML pipelines.
When to use
Classification: process.
Tagged: categorical, dataset_transform, encoder, label_encoder, one_hot, preprocessing.
Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Choose Dataset | dataset_1 | dataset | — | | Input dataset containing the columns to be encoded; leave empty only if the dataset will be injected dynamically by an upstream worker. |
| Encoder Type | encoder_type | scalar | label | | Encoding strategy to apply: `label` assigns a unique integer to each category in place, while `unique_columns` expands each category into its own binary column (one-hot style). |
| Columns To Encode | columns | scalar | — | | One or more column names from dataset_1 to encode, populated dynamically from the chosen dataset; leave blank to encode all detected categorical columns. |
| Specify Encoding | mapper_input | encoder | — | | Optional explicit value-to-encoding map per column (e.g. `{"color": {"red": 0, "blue": 1}}`); use this to enforce a fixed encoding scheme rather than deriving it automatically from the data. |
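
As a rough illustration of what `label` encoding and an explicit `mapper_input` map amount to, here is a minimal pure-Python sketch. It is not the worker's implementation: the `label_encode` helper, the list-of-dicts dataset, and the sorted ordering of derived codes are all assumptions made for the example.

```python
from typing import Dict, Hashable, List, Optional

Row = Dict[str, object]


def label_encode(
    rows: List[Row],
    column: str,
    mapping: Optional[Dict[Hashable, int]] = None,
) -> Dict[Hashable, int]:
    """Replace `column` in place with integer codes and return the mapping used.

    Hypothetical helper. When `mapping` is None, codes are derived from the
    data in sorted order (an assumption; the worker's ordering may differ).
    """
    if mapping is None:
        categories = sorted({row[column] for row in rows}, key=str)
        mapping = {category: code for code, category in enumerate(categories)}
    for row in rows:
        # A value absent from an explicit mapping raises KeyError here.
        row[column] = mapping[row[column]]
    return mapping


rows = [
    {"color": "red", "size": 1},
    {"color": "blue", "size": 2},
    {"color": "red", "size": 3},
]

# Derived mapping: codes are assigned automatically from the data.
derived = label_encode([dict(r) for r in rows], "color")

# Fixed mapping, mirroring the mapper_input example for the "color" column.
fixed_rows = [dict(r) for r in rows]
label_encode(fixed_rows, "color", mapping={"red": 0, "blue": 1})
```

A fixed mapping keeps the codes stable across datasets, which matters when the same encoding must be reused on data seen later.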
Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| dataset_encoder_output_1 | dataset_encoder_output_1 | dataset | Transformed dataset in which the selected columns are replaced by integer codes (label mode) or expanded into one binary column per category (unique_columns mode); all other columns are unchanged. |
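
To make the `unique_columns` output shape concrete, the sketch below expands one categorical column into binary columns. The `expand_unique_columns` helper, the `<column>_<category>` naming scheme, and the removal of the original column are assumptions for illustration; the worker's actual column names and ordering are not stated on this page.

```python
from typing import Dict, List

Row = Dict[str, object]


def expand_unique_columns(rows: List[Row], column: str) -> List[Row]:
    """Return new rows with `column` expanded into one 0/1 column per category.

    Hypothetical helper; the naming scheme "<column>_<category>" is assumed.
    """
    categories = sorted({str(row[column]) for row in rows})
    expanded = []
    for row in rows:
        # Copy every column except the one being expanded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(str(row[column]) == category)
        expanded.append(new_row)
    return expanded


rows = [{"color": "red", "size": 1}, {"color": "blue", "size": 2}]
encoded = expand_unique_columns(rows, "color")
```

In this sketch each output row carries exactly one `1` among the expanded columns, and the untouched `size` column passes through as is.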
Disciplines
- ai_ml.preprocessing
- data.dataset.transform
Runnable example
A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_encoder
Auto-generated from transformation schema. Worker id: dataset_encoder. Schema hash: 177909fb4c03. Hand-curated docs in workerexamples/ override this page when present.