TOKENIZE TEXT

Splits a plain-text string into tokens using a configurable delimiter or regex pattern, then returns the token at a specified index. Use this worker when you need to extract a single delimited field from a text value inside a workflow pipeline.

When to use

Classification: process.

Tagged: delimiter, extract, parse, regex, string, text_split, tokenize.

Inputs

Text To Be Parsed (id: texttobeparsed, type: text)
  The raw text string to tokenize; supply any plain-text value whose fields are separated by a known delimiter or regex pattern.

Tokentype (id: tokentype, type: string, default: regex)
  Delimiter or splitting strategy to apply: underscore ('_'), comma (','), space (' '), or percent ('%'). Defaults to 'regex', which splits on whitespace and common separators.

Index (id: index, type: string)
  Zero-based integer index of the token to return after splitting; leave blank to return all tokens.

Outputs

text_tokenize_output_1 (id: text_tokenize_output_1, type: text)
  The extracted token (plain text) at the requested index, or the full list of tokens when no index is specified.
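The documented input/output contract can be sketched in Python. This is a hypothetical re-implementation based only on the fields above, not the worker's actual source; the function name, the delimiter mapping, and the exact default regex pattern are assumptions.

```python
import re

# Assumed mapping from the documented tokentype choices to delimiters.
DELIMITERS = {
    "underscore": "_",
    "comma": ",",
    "space": " ",
    "percent": "%",
}

def tokenize(text, tokentype="regex", index=None):
    """Split `text` and return the token at `index`, or all tokens if no index."""
    if tokentype == "regex":
        # Default strategy: split on whitespace and common separators
        # (the exact pattern the worker uses is an assumption).
        tokens = [t for t in re.split(r"[\s,;_%]+", text) if t]
    else:
        tokens = text.split(DELIMITERS[tokentype])
    if index is None or index == "":
        return tokens          # blank index: return the full token list
    return tokens[int(index)]  # zero-based index into the tokens
```

For example, tokenize("a_b_c", "underscore", "1") returns "b", and tokenize("foo bar baz") returns the full list ["foo", "bar", "baz"].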

Disciplines

  • ai_ml.preprocessing
  • data.dataset.transform

Auto-generated from transformation schema. Worker id: text_tokenize. Schema hash: 3ac86e9a3e58. Hand-curated docs in workerexamples/ override this page when present.