.. _auto_text_tokenize:

Tokenize Text
=============

Splits a plain-text string into tokens using a configurable delimiter or regex pattern, then returns the token at a specified index. Use this worker to extract a single delimited field from a text value inside a workflow pipeline.

When to use
-----------

Classification: **process**. Tagged: ``delimiter``, ``extract``, ``parse``, ``regex``, ``string``, ``text_split``, ``tokenize``.

Inputs
------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20 20 20

   * - Label
     - ID
     - Type
     - Default
     - Required
     - Description
   * - Text To Be Parsed
     - texttobeparsed
     - text
     - —
     -
     - The raw text string to be tokenized; supply any plain-text value whose fields are separated by a known delimiter or regex pattern.
   * - Tokentype
     - tokentype
     - string
     - regex
     -
     - Delimiter or splitting strategy to apply; choose from underscore (``'_'``), comma (``','``), space (``' '``), or percent (``'%'``). Defaults to ``regex``, which splits on whitespace and common separators.
   * - Index
     - index
     - string
     - —
     -
     - Zero-based integer index of the token to return after splitting; leave blank to return all tokens.

Outputs
-------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20

   * - Label
     - ID
     - Type
     - Description
   * - text_tokenize_output_1
     - text_tokenize_output_1
     - text
     - The extracted token (plain text) at the requested index, or the full list of tokens when no index is specified.

Disciplines
-----------

- ai_ml.preprocessing
- data.dataset.transform

.. raw:: html

   Auto-generated from transformation schema. Worker id: text_tokenize. Schema hash: 3ac86e9a3e58. Hand-curated docs in workerexamples/ override this page when present.
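The splitting behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the worker's actual implementation: the ``tokenize`` function, the ``DELIMITERS`` mapping, and the exact regex used for the default ``regex`` strategy are assumptions made for the example.

.. code-block:: python

   import re

   # Hypothetical sketch of the text_tokenize worker's logic; option names
   # and edge-case handling in the real worker may differ.
   DELIMITERS = {"underscore": "_", "comma": ",", "space": " ", "percent": "%"}

   def tokenize(text, tokentype="regex", index=None):
       if tokentype == "regex":
           # Assumed default strategy: split on whitespace and common separators.
           tokens = [t for t in re.split(r"[\s,_%]+", text) if t]
       else:
           tokens = text.split(DELIMITERS[tokentype])
       if index is None or index == "":
           return tokens           # no index supplied: return all tokens
       return tokens[int(index)]   # zero-based index into the token list

For example, ``tokenize("a_b_c", "underscore", "1")`` would return the second token, ``"b"``, while omitting the index returns the full token list.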