DATASET ADD COLUMN BY REGEXP

Adds a new column to a dataset by applying a regular expression with capture groups to an existing column. For every row, the substring matched by the capture group at the specified index is stored in the new column. Use this worker to parse structured text embedded in a column, for example to extract a numeric dimension from a filename or label string.

When to use

Classification: process.

Tagged: add_column, capture_group, dataset_transform, extract, regex, regexp, string_parsing.

Inputs

Label | ID | Type | Default | Required | Description
Choose Dataset | dataset | dataset | | | Input dataset (tabular) whose rows will be processed; must contain the source column specified in ‘column_name’.
New Column Name | new_column_name | scalar | | | Name of the new column appended to the dataset to hold the regex-extracted values.
Column For Regex | column_name | scalar | | | Name of the existing column whose cell values are searched with the regular expression.
Regular Expression | regexp | scalar | | | Regular expression (with at least one capture group) applied to each cell; e.g., ‘(\d+)_mm’ extracts the digit sequence immediately preceding ‘_mm’.
Choose Index | index | scalar | 1 | | 1-based index of the capture group whose match is written to the new column; defaults to 1 (first capture group).
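
The interaction between the Regular Expression and Choose Index inputs can be sketched in Python. This is only an illustration using Python's `re` module, not the worker's implementation: the helper name `extract_group`, the sample strings, the use of an unanchored search, and the `None` result for a non-matching cell are all assumptions, and the worker's regex engine may differ in syntax or matching rules.

```python
import re


def extract_group(cell, regexp, index=1):
    """Hypothetical helper: return capture group `index` (1-based) or None.

    Assumes an unanchored search over the cell text; the worker's actual
    matching mode and non-match behaviour are not documented here.
    """
    match = re.search(regexp, str(cell))
    if match is None:
        return None
    return match.group(index)


# Default index 1 selects the first capture group.
print(extract_group("plate_12_mm.key", r"(\d+)_mm"))  # 12

# With two capture groups, index=2 selects the second one.
print(extract_group("plate_12_mm_rev3", r"(\d+)_mm_rev(\d+)", index=2))  # 3
```

Note that the digit class is written `\d`; a bare `d` would match the literal letter instead.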

Outputs

Label | ID | Type | Description
dataset_add_column_by_regexp_output_1 | dataset_add_column_by_regexp_output_1 | dataset | A copy of the input dataset with the new column appended, containing the regex-extracted substring for each row.
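
As a rough picture of the output, the sketch below models a dataset as a list of Python dictionaries and returns a copy with the extracted column appended. The function name mirrors the worker id for readability only; the row representation, the sample data, and the empty string written for rows that do not match are assumptions rather than documented behaviour.

```python
import copy
import re


def add_column_by_regexp(rows, new_column_name, column_name, regexp, index=1):
    """Return a copy of `rows` with `new_column_name` filled from a capture group."""
    pattern = re.compile(regexp)
    result = copy.deepcopy(rows)  # leave the input dataset untouched
    for row in result:
        match = pattern.search(str(row[column_name]))
        # Assumption: rows without a match receive an empty string.
        row[new_column_name] = match.group(index) if match else ""
    return result


# Hypothetical input dataset with a single source column.
rows = [
    {"file": "plate_12_mm.key"},
    {"file": "plate_8_mm.key"},
    {"file": "baseline.key"},
]

out = add_column_by_regexp(rows, "thickness", "file", r"(\d+)_mm")
for row in out:
    print(row)
```

The extracted values are strings; converting them to numbers would be a separate step.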

Disciplines

  • data.dataset.transform

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_add_column_by_regexp


Auto-generated from transformation schema. Worker id: dataset_add_column_by_regexp. Schema hash: 94ca0fbc9be2. Hand-curated docs in workerexamples/ override this page when present.