.. _auto_text_tokenize:

Tokenize Text
=============

Splits a plain-text string into tokens using a configurable delimiter or regex pattern, then returns the token at a specified index. Use this worker to extract a single delimited field from a text value inside a workflow pipeline.

When to use
-----------

Classification: **process**. Tagged: ``delimiter``, ``extract``, ``parse``, ``regex``, ``string``, ``text_split``, ``tokenize``.

Inputs
------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20 20 20

   * - Label
     - ID
     - Type
     - Default
     - Required
     - Description
   * - Text To Be Parsed
     - texttobeparsed
     - text
     - —
     -
     - The raw text string to be tokenized; supply any plain-text value whose fields are separated by a known delimiter or regex pattern.
   * - Tokentype
     - tokentype
     - string
     - regex
     -
     - Delimiter or splitting strategy to apply; choose from underscore (``'_'``), comma (``','``), space (``' '``), or percent (``'%'``). Defaults to ``regex``, which splits on whitespace and common separators.
   * - Index
     - index
     - string
     - —
     -
     - Zero-based integer index of the token to return after splitting; leave blank to return all tokens.

Outputs
-------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20

   * - Label
     - ID
     - Type
     - Description
   * - text_tokenize_output_1
     - text_tokenize_output_1
     - text
     - The extracted token (plain text) at the requested index, or the full list of tokens when no index is specified.

Disciplines
-----------

- ai_ml.preprocessing
- data.dataset.transform

.. raw:: html

   Auto-generated from transformation schema. Worker id: text_tokenize. Schema hash: 3ac86e9a3e58. Hand-curated docs in workerexamples/ override this page when present.
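The splitting behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the worker's actual implementation: the ``tokenize`` function, the ``DELIMITERS`` mapping, and the exact regex used for the default ``regex`` strategy are assumptions made for the example.

.. code-block:: python

   import re

   # Hypothetical sketch of the text_tokenize worker's logic; option names
   # and edge-case handling in the real worker may differ.
   DELIMITERS = {"underscore": "_", "comma": ",", "space": " ", "percent": "%"}

   def tokenize(text, tokentype="regex", index=None):
       if tokentype == "regex":
           # Assumed default strategy: split on whitespace and common separators.
           tokens = [t for t in re.split(r"[\s,_%]+", text) if t]
       else:
           tokens = text.split(DELIMITERS[tokentype])
       if index is None or index == "":
           return tokens           # no index supplied: return all tokens
       return tokens[int(index)]   # zero-based index into the token list

For example, ``tokenize("a_b_c", "underscore", "1")`` would return the second token, ``"b"``, while omitting the index returns the full token list.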