# Tokenize Text
Splits a plain-text string into tokens using a configurable delimiter or regex pattern, then returns the token at a specified index. Use this worker when you need to extract a single delimited field from a text value inside a workflow pipeline.
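The split-and-extract behavior described above can be sketched as follows. This is a minimal illustration only; the function name, the delimiter map, and the default regex pattern are assumptions, not the worker's actual implementation:

```python
import re

# Hypothetical mapping of tokentype values to literal delimiters
# (assumed from the input descriptions below, not the worker's source).
DELIMITERS = {
    "underscore": "_",
    "comma": ",",
    "space": " ",
    "percent": "%",
}

def tokenize(text, tokentype="regex", index=None):
    """Split `text` and return the token at `index`, or all tokens if blank."""
    if tokentype == "regex":
        # Assumed default strategy: split on whitespace and common separators.
        tokens = [t for t in re.split(r"[\s,;_%]+", text) if t]
    else:
        tokens = text.split(DELIMITERS[tokentype])
    # No index: return the full token list; otherwise one zero-based token.
    return tokens if index is None else tokens[int(index)]
```

For example, `tokenize("order_2024_0457", "underscore", 1)` extracts the middle field `"2024"` from an underscore-delimited value.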
## When to use
Classification: process.
Tagged: delimiter, extract, parse, regex, string, text_split, tokenize.
## Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Text To Be Parsed | texttobeparsed | text | — | — | The raw text string to be tokenized; supply any plain-text value whose fields are separated by a known delimiter or regex pattern. |
| Tokentype | tokentype | string | regex | — | Delimiter or splitting strategy to apply; choose from underscore (`_`), comma (`,`), space (` `), or percent (`%`). Defaults to `regex`, which splits on whitespace and common separators. |
| Index | index | string | — | — | Zero-based integer index of the token to return after splitting; leave blank to return all tokens. |
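Note that Index is typed as a string but interpreted as a zero-based integer, with a blank value meaning "return all tokens." Pipelines feeding this worker may want to normalize the value first; a minimal sketch, where the helper name is hypothetical:

```python
def parse_index(index):
    """Coerce a string Index input to a zero-based int.
    Returns None when the field is blank or missing, which the worker
    treats as "return all tokens." Hypothetical helper, not part of
    the worker itself."""
    if index is None or not index.strip():
        return None
    return int(index.strip())
```

For example, `parse_index(" 2 ")` yields `2`, while `parse_index("")` yields `None`.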
## Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| text_tokenize_output_1 | text_tokenize_output_1 | text | The extracted token (plain text) at the requested index, or the full list of tokens when no index is specified. |
## Disciplines
- ai_ml.preprocessing
- data.dataset.transform
Auto-generated from transformation schema. Worker id: text_tokenize. Schema hash: 3ac86e9a3e58. Hand-curated docs in workerexamples/ override this page when present.