DATASET GROUP BY

Groups a dataset by one or more columns and annotates each row with a label identifying its group; the generated label column is named using the configurable Column Name and Column Prefix inputs. Use this worker to split a flat dataset into labelled groups, or to tag its rows by group, ahead of downstream aggregation, filtering, or ML pipelines.
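Conceptually, each distinct combination of values in the Group By columns defines one group. The following plain-Python sketch illustrates that splitting step on rows held as a list of dicts. The row layout and the `split_by` helper are hypothetical illustrations, not the worker's internal implementation.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical row layout: one dict per row, keyed by column name.
Row = Dict[str, Any]


def split_by(rows: List[Row], groupby: List[str]) -> Dict[Tuple[Any, ...], List[Row]]:
    """Bucket rows by the tuple of their values in the groupby columns."""
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        # One key per distinct combination of values in the grouping columns.
        key = tuple(row[col] for col in groupby)
        groups[key].append(row)
    # Plain dict keeps insertion order, so buckets follow first occurrence.
    return dict(groups)
```

For example, splitting three rows whose `mat` values are steel, alu, steel by `["mat"]` yields a `("steel",)` bucket holding two rows and an `("alu",)` bucket holding one, in that order.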

When to use

Classification: process.

Tagged: aggregation, dataset, group_by, label, split, transform.

Inputs

Label	ID	Type	Default	Description
Choose Dataset	dataset	dataset	—	Input dataset to be grouped; accepts any tabular dataset available on the platform. Leave it empty only if an upstream worker injects the dataset.
Group By	groupby	scalar	unclassified	One or more column names (multi-select, drawn from the input dataset) whose unique value combinations define the groups; defaults to ‘unclassified’ when no column is selected.
Column Prefix	col_prefix	text	group_	String prefix prepended to each generated group-identifier column name (e.g. ‘group_’ produces ‘group_0’, ‘group_1’, …); defaults to ‘group_’.
Column Name	column_name	text	group	Base name for the new column that stores the group label assigned to each row; defaults to ‘group’.
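The inputs above leave some behaviour unstated, in particular exactly how Column Prefix and Column Name combine. The sketch below assumes one plausible reading: each row receives a label of the form `col_prefix` followed by a 0-based group index, numbered in order of first appearance and stored in a column called `column_name`, and every row is labelled `unclassified` when no Group By column is given. The `tag_groups` function and this numbering scheme are assumptions, not documented worker behaviour.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def tag_groups(
    rows: List[Row],
    groupby: Sequence[str] = (),
    col_prefix: str = "group_",
    column_name: str = "group",
) -> List[Row]:
    """Return copies of rows with a group label added under column_name.

    ASSUMPTION: labels are col_prefix + a 0-based index assigned in order of
    first appearance; with no groupby columns, every row is 'unclassified'.
    """
    index: Dict[tuple, int] = {}
    out: List[Row] = []
    for row in rows:
        if groupby:
            key = tuple(row[col] for col in groupby)
            # setdefault stores the next free index the first time a key is seen.
            label = f"{col_prefix}{index.setdefault(key, len(index))}"
        else:
            label = "unclassified"
        # Copy the row so the caller's input rows are left unmodified.
        out.append({**row, column_name: label})
    return out
```

With the default prefix and column name, tagging three rows whose `mat` values are steel, alu, steel by `["mat"]` yields labels `group_0`, `group_1`, `group_0`.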

Outputs

Label	ID	Type	Description
dataset_group_by_output_1	dataset_group_by_output_1	dataset	Transformed dataset containing the same rows as the input, augmented with a group-label column (named per column_name/col_prefix) that identifies the group each row belongs to.
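The output description implies a simple contract: every input row survives, and the only change is the added label column. A minimal checker for that contract might look like this. It assumes rows are dicts, that row order is preserved, and that the input does not already contain a column named `column_name`; the helper name is hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def satisfies_output_contract(
    inp: List[Row], out: List[Row], column_name: str = "group"
) -> bool:
    """True if out has the same rows as inp (same order) plus one label column."""
    if len(inp) != len(out):
        return False
    for before, after in zip(inp, out):
        if column_name not in after:
            return False
        # Drop the label column; every remaining field must be unchanged.
        rest = {k: v for k, v in after.items() if k != column_name}
        if rest != before:
            return False
    return True
```

A check like this can be run on a sample of the worker's output when wiring it into a larger workflow.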

Disciplines

data.dataset.transform

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_group_by

Auto-generated from transformation schema. Worker id: dataset_group_by. Schema hash: 5ad596e651c2. Hand-curated docs in workerexamples/ override this page when present.