.. _auto_csv_collect_columns_to_dataset:

*PARSES SEVERAL CSVS FILES BASED ON EXTENSION AND COLLECTS THE COLUMNS TO CREATE A DATASET*
===========================================================================================

Scans multiple CSV files (filtered by file extension) and merges selected columns across all matching files into a single dataset.

Use this worker when you need to consolidate specific columns from a batch of CSV files into one unified tabular dataset for downstream analysis or modeling.

When to use
-----------

Classification: **process**.

Tagged: ``batch``, ``column-extraction``, ``csv``, ``dataset-builder``, ``ingest``, ``multi-file``, ``parse``.

Inputs
------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20 20 20

   * - Label
     - ID
     - Type
     - Default
     - Required
     - Description
   * - Files Extensions Separated By Comma
     - files_extensions_separatedby_comma
     - string
     - —
     -
     - Comma-separated list of file extensions to match when scanning for input files (e.g. ``csv,txt``); leave blank to process all files regardless of extension.
   * - Line Delimiter
     - line_delimiter
     - string
     - ``,``
     -
     - Character used to delimit fields within each CSV row (e.g. ``,`` for standard CSV, ``\t`` for TSV); defaults to comma if left empty.
   * - Header Names
     - header_names
     - string
     - —
     -
     - Comma-separated list of custom column header names to assign to the extracted columns; leave blank to use the header row already present in each file.
   * - Column Ids
     - column_ids
     - string
     - —
     -
     - Comma-separated list of column indices or names to extract from each CSV file; leave blank to collect all columns.
   * - Starting Row Id
     - starting_row_id
     - integer
     - —
     -
     - Zero-based or one-based row index at which to begin reading data from each file; leave blank to start from the first data row.
   * - Ending Row Id
     - ending_row_id
     - integer
     - —
     -
     - Row index at which to stop reading data from each file (inclusive); leave blank to read all rows through the end of each file.
   * - Replace Values
     - replace_values
     - string
     - —
     -
     - Mapping of values to find and replace in the collected data, expressed as a delimited key-value string (e.g. ``N/A:0,null:0``); leave blank to perform no substitutions.

Outputs
-------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20

   * - Label
     - ID
     - Type
     - Description
   * - csv_collect_columns_to_dataset_output_1
     - csv_collect_columns_to_dataset_output_1
     - dataset
     - Consolidated dataset containing the extracted and merged columns from all matched CSV files, ready for downstream transformation, analysis, or model ingestion.

Disciplines
-----------

- data.dataset.ingest
- data.dataset.transform
- data.io.csv

.. raw:: html

   <p>Auto-generated from transformation schema. Worker id: csv_collect_columns_to_dataset. Schema hash: b7f6c14d7fac. Hand-curated docs in workerexamples/ override this page when present.</p>
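
The way the inputs listed above might combine can be sketched in plain Python. This is an illustrative approximation only, not the worker's actual implementation: the function name ``collect_columns`` and its keyword arguments are hypothetical, row indices are assumed to be zero-based and counted after the header row, and "merging" is assumed to mean stacking the rows of every matching file.

.. code-block:: python

   import csv
   from pathlib import Path


   def collect_columns(folder, extensions="", delimiter=",", column_ids="",
                       header_names="", start_row=None, end_row=None,
                       replace_values=""):
       """Hypothetical sketch: gather selected columns from matching files."""
       # Blank extension list means "accept every file".
       exts = {e.strip().lower().lstrip(".")
               for e in extensions.split(",") if e.strip()}
       wanted = [c.strip() for c in column_ids.split(",") if c.strip()]
       renamed = [h.strip() for h in header_names.split(",") if h.strip()]
       # 'find:replace' pairs, e.g. "N/A:0,null:0" (assumed format).
       replacements = dict(pair.split(":", 1)
                           for pair in replace_values.split(",") if ":" in pair)

       rows = []
       for path in sorted(Path(folder).iterdir()):
           if not path.is_file():
               continue
           if exts and path.suffix.lower().lstrip(".") not in exts:
               continue
           with path.open(newline="") as handle:
               reader = csv.reader(handle, delimiter=delimiter or ",")
               header = next(reader, None)
               if header is None:
                   continue  # empty file
               # Numeric ids are treated as positions, anything else as names.
               if wanted:
                   indices = [int(c) if c.isdigit() else header.index(c)
                              for c in wanted]
               else:
                   indices = list(range(len(header)))
               # Custom names apply only when they match the column count.
               if len(renamed) == len(indices):
                   names = renamed
               else:
                   names = [header[i] for i in indices]
               for row_number, row in enumerate(reader):
                   if start_row is not None and row_number < start_row:
                       continue
                   if end_row is not None and row_number > end_row:
                       break  # end row is inclusive
                   values = [replacements.get(row[i], row[i]) for i in indices]
                   rows.append(dict(zip(names, values)))
       return rows

Under these assumptions, a call such as ``collect_columns("data", extensions="csv", column_ids="name,2", replace_values="N/A:0,null:0")`` would return one dictionary per data row, keyed by the selected column names, with ``N/A`` and ``null`` cells replaced by ``0``.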