TOKENIZE TEXT

Splits a plain-text string into tokens using a configurable delimiter or regex pattern, then returns the token at a specified index. Use this worker when you need to extract a single delimited field from a text value inside a workflow pipeline.

When to use

Classification: process.

Tagged: delimiter, extract, parse, regex, string, text_split, tokenize.

Inputs

Text To Be Parsed (id: texttobeparsed, type: text)
  The raw text string to tokenize; supply any plain-text value whose fields are separated by a known delimiter or regex pattern.

Tokentype (id: tokentype, type: string, default: regex)
  Delimiter or splitting strategy to apply: underscore ('_'), comma (','), space (' '), or percent ('%'). Defaults to 'regex', which splits on whitespace and common separators.

Index (id: index, type: string)
  Zero-based integer index of the token to return after splitting; leave blank to return all tokens.

Outputs

text_tokenize_output_1 (id: text_tokenize_output_1, type: text)
  The extracted token (plain text) at the requested index, or the full list of tokens when no index is specified.
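The documented input/output contract can be sketched in Python. This is a hypothetical re-implementation based only on the fields above, not the worker's actual source; the function name, the delimiter mapping, and the exact default regex pattern are assumptions.

```python
import re

# Assumed mapping from the documented tokentype choices to delimiters.
DELIMITERS = {
    "underscore": "_",
    "comma": ",",
    "space": " ",
    "percent": "%",
}

def tokenize(text, tokentype="regex", index=None):
    """Split `text` and return the token at `index`, or all tokens if no index."""
    if tokentype == "regex":
        # Default strategy: split on whitespace and common separators
        # (the exact pattern the worker uses is an assumption).
        tokens = [t for t in re.split(r"[\s,;_%]+", text) if t]
    else:
        tokens = text.split(DELIMITERS[tokentype])
    if index is None or index == "":
        return tokens          # blank index: return the full token list
    return tokens[int(index)]  # zero-based index into the tokens
```

For example, tokenize("a_b_c", "underscore", "1") returns "b", and tokenize("foo bar baz") returns the full list ["foo", "bar", "baz"].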

Disciplines

  • ai_ml.preprocessing
  • data.dataset.transform

Auto-generated from transformation schema. Worker id: text_tokenize. Schema hash: 3ac86e9a3e58. Hand-curated docs in workerexamples/ override this page when present.