DATASET ENCODER

Encodes categorical columns in a dataset using either label encoding (integer mapping) or one-hot / unique-column expansion. Use this worker to convert string or categorical features into numeric representations before feeding data into ML pipelines.
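The two strategies can be illustrated outside d3VIEW in plain Python. This is a minimal sketch of the general techniques, not the worker's implementation: the sample rows, the function names `label_encode` and `unique_columns_encode`, the sorted category order, and the `<column>_<category>` naming of expanded columns are all assumptions made for illustration.

```python
def label_encode(rows, column):
    """Replace `column` with integer codes, one per distinct category."""
    # Codes follow sorted category order (an assumption for this sketch).
    codes = {value: i for i, value in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows]


def unique_columns_encode(rows, column):
    """Replace `column` with one 0/1 column per distinct category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        # Keep every other column as it is.
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            # Column naming "<column>_<category>" is hypothetical.
            new_row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample data: one categorical column, one numeric column.
rows = [
    {"color": "red", "size": 1},
    {"color": "blue", "size": 2},
    {"color": "red", "size": 3},
]

labeled = label_encode(rows, "color")
expanded = unique_columns_encode(rows, "color")
```

In this sketch, label encoding keeps the column count unchanged, while unique-column expansion replaces one column with as many columns as there are distinct categories.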

When to use

Classification: process.

Tagged: categorical, dataset_transform, encoder, label_encoder, one_hot, preprocessing.

Inputs

| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Choose Dataset | dataset_1 | dataset | | | Input dataset containing the columns to be encoded; leave empty only if the dataset will be injected dynamically by an upstream worker. |
| Encoder Type | encoder_type | scalar | label | | Encoding strategy to apply: 'label' assigns a unique integer to each category in-place, while 'unique_columns' expands each category into its own binary column (one-hot style); defaults to 'label'. |
| Columns To Encode | columns | scalar | | | One or more column names from dataset_1 to encode; populated dynamically from the chosen dataset. Leave blank to encode all detected categorical columns. |
| Specify Encoding | mapper_input | encoder | | | Optional explicit value-to-encoding map per column (e.g. {"color": {"red": 0, "blue": 1}}); use this to enforce a fixed encoding scheme rather than deriving it automatically from the data. |
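How an explicit map and a blank column selection could interact is sketched below in plain Python. This is an illustration under stated assumptions, not the worker's behavior: the helper names `detect_categorical` and `encode_with_mapping` are hypothetical, "categorical" is approximated as "all values are strings", and raising `KeyError` for a value missing from an explicit map is a choice made for the sketch (the worker's handling of unmapped values is not described here).

```python
def detect_categorical(rows):
    """Return column names whose values are all strings (a simple heuristic)."""
    columns = rows[0].keys() if rows else []
    return [c for c in columns if all(isinstance(r[c], str) for r in rows)]


def encode_with_mapping(rows, mapping, columns=None):
    """Label-encode `columns`, preferring explicit codes from `mapping`."""
    # A blank selection falls back to every detected categorical column.
    columns = columns or detect_categorical(rows)
    result = [dict(r) for r in rows]
    for col in columns:
        # Use the explicit map when given; otherwise derive one from the data.
        codes = mapping.get(col)
        if codes is None:
            codes = {v: i for i, v in enumerate(sorted({r[col] for r in rows}))}
        for r in result:
            # Raises KeyError for a value missing from an explicit map.
            r[col] = codes[r[col]]
    return result


# Hypothetical sample data with two string columns and one numeric column.
rows = [
    {"color": "red", "shape": "square", "size": 1},
    {"color": "blue", "shape": "circle", "size": 2},
]
# Same form as the example given for mapper_input.
mapper_input = {"color": {"red": 0, "blue": 1}}

encoded = encode_with_mapping(rows, mapper_input)
```

Here `color` takes its codes from the explicit map, while `shape` has no entry and gets codes derived from the data. A fixed map is useful when the same codes must be reproduced across datasets, for example between training and inference data.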

Outputs

| Label | ID | Type | Description |
|---|---|---|---|
| dataset_encoder_output_1 | dataset_encoder_output_1 | dataset | Transformed dataset matching the input except for the selected columns, which are replaced with integer codes (label mode) or expanded into one binary column per category (unique_columns mode). |

Disciplines

  • ai_ml.preprocessing
  • data.dataset.transform

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_encoder


Auto-generated from transformation schema. Worker id: dataset_encoder. Schema hash: 177909fb4c03. Hand-curated docs in workerexamples/ override this page when present.