DATASET GROUP BY

Groups a dataset by one or more columns and annotates each row with a group label; the name of the label column and the prefix used for group identifiers are configurable. Use this worker to split or tag a flat dataset into labelled groups for downstream aggregation, filtering, or ML pipelines.

When to use

Classification: process.

Tagged: aggregation, dataset, group_by, label, split, transform.

Inputs

Label | ID | Type | Default | Required | Description
--- | --- | --- | --- | --- | ---
Choose Dataset | dataset | dataset | | | Input dataset to be grouped; accepts any tabular dataset available on the platform. Leave empty only if the dataset is injected by an upstream worker.
Group By | groupby | scalar | unclassified | | One or more column names (multi-select, drawn from the input dataset) whose unique value combinations define the groups; defaults to ‘unclassified’ when no column is selected.
Column Prefix | col_prefix | text | group_ | | String prefix prepended to each generated group-identifier column name (e.g. ‘group_’ produces ‘group_0’, ‘group_1’, …); defaults to ‘group_’.
Column Name | column_name | text | group | | Base name for the new column that stores the group label assigned to each row; defaults to ‘group’.

Outputs

Label | ID | Type | Description
--- | --- | --- | ---
dataset_group_by_output_1 | dataset_group_by_output_1 | dataset | Dataset with the same rows as the input, plus a group-label column (named from column_name and col_prefix) that identifies the group each row belongs to.
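
The following is a minimal sketch of the transformation described above, not the worker's actual implementation. It assumes one plausible reading of the parameters: the label column is named by column_name, each label is col_prefix followed by a group number, and groups are numbered in order of first appearance. The function name tag_groups, the sample rows, and the numbering order are all hypothetical; the real worker may name columns or order groups differently.

```python
def tag_groups(rows, groupby, col_prefix="group_", column_name="group"):
    """Hypothetical stand-in for the worker: same rows, one extra label column."""
    if not groupby:
        # Assumption: with no grouping column selected, every row
        # receives the documented default value 'unclassified'.
        return [{**row, column_name: "unclassified"} for row in rows]

    group_ids = {}  # maps each unique value combination to a group number
    tagged = []
    for row in rows:
        key = tuple(row[col] for col in groupby)
        # setdefault numbers groups 0, 1, 2, ... in order of first appearance
        gid = group_ids.setdefault(key, len(group_ids))
        tagged.append({**row, column_name: f"{col_prefix}{gid}"})
    return tagged


# Illustrative input: a flat dataset as a list of row dictionaries.
rows = [
    {"material": "steel", "thickness": 1.0, "force": 10.2},
    {"material": "alu", "thickness": 2.0, "force": 7.1},
    {"material": "steel", "thickness": 1.0, "force": 10.9},
    {"material": "alu", "thickness": 1.5, "force": 6.4},
]

tagged = tag_groups(rows, ["material", "thickness"])
labels = [row["group"] for row in tagged]
# labels == ["group_0", "group_1", "group_0", "group_2"]
```

The output keeps every input row and column, so downstream workers can aggregate or filter on the added label column without rejoining to the original dataset.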

Disciplines

  • data.dataset.transform

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_group_by


Auto-generated from transformation schema. Worker id: dataset_group_by. Schema hash: 5ad596e651c2. Hand-curated docs in workerexamples/ override this page when present.