DATASET GROUP BY
Groups a dataset by one or more columns, annotating each resulting row with a configurable group label and column prefix. Use this worker to split or tag a flat dataset into labelled groups for downstream aggregation, filtering, or ML pipelines.
When to use
Classification: process.
Tagged: aggregation, dataset, group_by, label, split, transform.
Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Choose Dataset | dataset | dataset | — | | Input dataset to be grouped; accepts any tabular dataset available on the platform. Leave empty only if the dataset is injected by an upstream worker. |
| Group By | groupby | scalar | unclassified | | One or more column names (multi-select, drawn from the input dataset) whose unique value combinations define the groups; defaults to ‘unclassified’ when no column is selected. |
| Column Prefix | col_prefix | text | group_ | | String prefix prepended to each generated group-identifier column name (e.g. ‘group_’ produces ‘group_0’, ‘group_1’, …); defaults to ‘group_’. |
| Column Name | column_name | text | group | | Base name for the new column that stores the group label assigned to each row; defaults to ‘group’. |
Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| dataset_group_by_output_1 | dataset_group_by_output_1 | dataset | Transformed dataset with the same rows as the input, augmented with a group-label column (named per column_name/col_prefix) that identifies which group each row belongs to. |
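
The relationship between the inputs and the output dataset can be illustrated with a minimal Python sketch. This is not the worker's implementation: `tag_groups` is a hypothetical helper, and the labelling scheme (prefix plus a running group number, stored in the column named by column_name) is one possible reading of the parameter descriptions. The sample rows are made up for illustration.

```python
def tag_groups(rows, groupby=(), col_prefix="group_", column_name="group"):
    """Return copies of rows, each with an added group-label entry.

    Hypothetical helper: the labelling scheme is an assumption, not the
    worker's documented implementation.
    """
    if not groupby:
        # Assumption: with no Group By column, every row gets 'unclassified'.
        return [{**row, column_name: "unclassified"} for row in rows]

    group_ids = {}  # key combination -> group number, in first-seen order
    tagged = []
    for row in rows:
        # The group key is the combination of values in the Group By columns.
        key = tuple(row[col] for col in groupby)
        idx = group_ids.setdefault(key, len(group_ids))
        # Assumption: label = col_prefix + group number, e.g. 'group_0'.
        tagged.append({**row, column_name: f"{col_prefix}{idx}"})
    return tagged


# Illustrative input rows (invented data).
rows = [
    {"material": "steel", "thickness": 1.0, "stress": 210},
    {"material": "alu", "thickness": 1.0, "stress": 95},
    {"material": "steel", "thickness": 2.0, "stress": 220},
    {"material": "alu", "thickness": 1.0, "stress": 97},
]

tagged = tag_groups(rows, groupby=["material", "thickness"])
for row in tagged:
    print(row["material"], row["thickness"], row["group"])
```

Under these assumptions the row count is unchanged and the two alu rows share one label, while each steel row falls into its own group because the thickness values differ. Calling `tag_groups(rows)` with no Group By columns labels every row ‘unclassified’.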
Disciplines
- data.dataset.transform
Runnable example
A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_group_by
Auto-generated from transformation schema. Worker id: dataset_group_by. Schema hash: 5ad596e651c2. Hand-curated docs in workerexamples/ override this page when present.