.. _auto_csv_collect_columns_to_dataset:

Parses Several CSV Files Based on Extension and Collects the Columns to Create a Dataset
===========================================================================================

Scans multiple CSV files (filtered by file extension) and merges selected columns across all matching files into a single dataset. Use this worker when you need to consolidate specific columns from a batch of CSV files into one unified tabular dataset for downstream analysis or modeling.

When to use
-----------

Classification: **process**.

Tagged: ``batch``, ``column-extraction``, ``csv``, ``dataset-builder``, ``ingest``, ``multi-file``, ``parse``.

Inputs
------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20 20 20

   * - Label
     - ID
     - Type
     - Default
     - Required
     - Description
   * - Files Extensions Separated By Comma
     - files_extensions_separatedby_comma
     - string
     - —
     - 
     - Comma-separated list of file extensions to match when scanning for input files (e.g. 'csv,txt'); leave blank to process all files regardless of extension.
   * - Line Delimiter
     - line_delimiter
     - string
     - ,
     - 
     - Character used to delimit fields within each CSV row (e.g. ``,`` for standard CSV, ``\t`` for TSV); defaults to comma if left empty.
   * - Header Names
     - header_names
     - string
     - —
     - 
     - Comma-separated list of custom column header names to assign to the extracted columns; leave blank to use the header row already present in each file.
   * - Column Ids
     - column_ids
     - string
     - —
     - 
     - Comma-separated list of column indices or names to extract from each CSV file; leave blank to collect all columns.
   * - Starting Row Id
     - starting_row_id
     - integer
     - —
     - 
     - Row index at which to begin reading data from each file. The schema does not state whether indexing is zero-based or one-based, so check the result against a sample file. Leave blank to start from the first data row.
   * - Ending Row Id
     - ending_row_id
     - integer
     - —
     - 
     - Row index at which to stop reading data from each file (inclusive); leave blank to read all rows through the end of each file.
   * - Replace Values
     - replace_values
     - string
     - —
     - 
     - Mapping of values to find and replace in the collected data, expressed as a delimited key-value string (e.g. 'N/A:0,null:0'); leave blank to perform no substitutions.
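
The string-valued inputs above can be easier to reason about with a small illustration. The following is a minimal sketch, not the worker's actual implementation: the helper names (``parse_list``, ``parse_replacements``, ``collect_columns``) and the in-memory ``files`` mapping are hypothetical, columns are selected by header name only, and it assumes that rows from matching files are stacked in file-name order. Row ranges and custom header names are left out for brevity.

.. code-block:: python

   import csv
   import io


   def parse_list(value):
       """Split a comma-separated parameter string into trimmed, non-empty items."""
       return [item.strip() for item in (value or "").split(",") if item.strip()]


   def parse_replacements(value):
       """Turn a string such as 'N/A:0,null:0' into {'N/A': '0', 'null': '0'}."""
       pairs = (item.split(":", 1) for item in parse_list(value))
       return {pair[0]: pair[1] for pair in pairs if len(pair) == 2}


   def collect_columns(files, extensions="", delimiter=",", column_ids="",
                       replace_values=""):
       """Collect selected columns from every matching file into a list of rows.

       ``files`` maps a file name to its text content; it is a hypothetical
       stand-in for files on disk so that the sketch stays self-contained.
       """
       wanted_ext = {ext.lower().lstrip(".") for ext in parse_list(extensions)}
       wanted_cols = parse_list(column_ids)
       replacements = parse_replacements(replace_values)
       rows = []
       for name in sorted(files):
           ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
           # A blank extension filter means every file is processed.
           if wanted_ext and ext not in wanted_ext:
               continue
           reader = csv.DictReader(io.StringIO(files[name]),
                                   delimiter=delimiter or ",")
           for record in reader:
               # A blank column list means all columns are collected.
               keys = wanted_cols or reader.fieldnames
               rows.append({key: replacements.get(record.get(key), record.get(key))
                            for key in keys})
       return rows


   files = {
       "a.csv": "id,temp,note\n1,20.5,ok\n2,N/A,ok\n",
       "b.csv": "id,temp,note\n3,null,bad\n",
       "readme.md": "not a table\n",
   }
   dataset = collect_columns(files, extensions="csv", column_ids="id,temp",
                             replace_values="N/A:0,null:0")

In this sketch ``readme.md`` is skipped by the extension filter, only the ``id`` and ``temp`` columns are kept, and the ``N/A`` and ``null`` cells are replaced with ``0``.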

Outputs
-------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20

   * - Label
     - ID
     - Type
     - Description
   * - csv_collect_columns_to_dataset_output_1
     - csv_collect_columns_to_dataset_output_1
     - dataset
     - Consolidated dataset containing the extracted and merged columns from all matched CSV files, ready for downstream transformation, analysis, or model ingestion.

Disciplines
-----------

- data.dataset.ingest
- data.dataset.transform
- data.io.csv

.. raw:: html

   <hr style="margin-top:2em">
   <p style="font-size:11px;color:#888">
   Auto-generated from <code>transformation</code> schema. Worker id: <code>csv_collect_columns_to_dataset</code>. Schema hash: <code>b7f6c14d7fac</code>. Hand-curated docs in <code>workerexamples/</code> override this page when present.
   </p>