PARSES SEVERAL CSV FILES BASED ON EXTENSION AND COLLECTS THE COLUMNS TO CREATE A DATASET¶

Scans multiple CSV files (filtered by file extension) and merges selected columns across all matching files into a single dataset. Use this worker when you need to consolidate specific columns from a batch of CSV files into one unified tabular dataset for downstream analysis or modeling.
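
The behaviour described above can be approximated in a few lines of plain Python. This is a minimal sketch, not the worker's actual implementation: the function name `collect_columns`, its parameters, and the choice to fill missing columns with `None` are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def collect_columns(directory, extensions, column_names, delimiter=","):
    """Merge the named columns from every matching file into one list of rows.

    Hypothetical helper: the real worker's matching and merging rules may differ.
    """
    # Normalise "csv" or ".CSV" to ".csv"; an empty set means "accept all files".
    wanted = {"." + ext.strip().lstrip(".").lower() for ext in extensions if ext.strip()}
    rows = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        if wanted and path.suffix.lower() not in wanted:
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter=delimiter)
            for record in reader:
                # Keep only the requested columns; absent ones become None (assumption).
                rows.append({name: record.get(name) for name in column_names})
    return rows
```

Calling `collect_columns("data", ["csv"], ["id", "name"])` would read every `.csv` file in a hypothetical `data` directory and return one combined list of `{"id": ..., "name": ...}` rows, regardless of the column order in each file.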

When to use¶

Classification: process.

Tagged: batch, column-extraction, csv, dataset-builder, ingest, multi-file, parse.

Inputs¶

Label	ID	Type	Default	Description
Files Extensions Separated By Comma	files_extensions_separatedby_comma	string	—	Comma-separated list of file extensions to match when scanning for input files (e.g. ‘csv,txt’); leave blank to process all files regardless of extension.
Line Delimiter	line_delimiter	string	,	Character used to delimit fields within each CSV row (e.g. ‘,’ for standard CSV, ‘\t’ for TSV); defaults to comma if left empty.
Header Names	header_names	string	—	Comma-separated list of custom column header names to assign to the extracted columns; leave blank to use the header row already present in each file.
Column Ids	column_ids	string	—	Comma-separated list of column indices or names to extract from each CSV file; leave blank to collect all columns.
Starting Row Id	starting_row_id	integer	—	Zero-based or one-based row index at which to begin reading data from each file; leave blank to start from the first data row.
Ending Row Id	ending_row_id	integer	—	Row index at which to stop reading data from each file (inclusive); leave blank to read all rows through the end of each file.
Replace Values	replace_values	string	—	Mapping of values to find and replace in the collected data, expressed as a delimited key-value string (e.g. ‘N/A:0,null:0’); leave blank to perform no substitutions.
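
Several of these inputs are plain strings that have to be interpreted before use. The sketch below shows one plausible reading of them. Every helper name here is hypothetical, and the parsing rules are assumptions inferred from the examples in the table: comma-separated lists, `key:value` pairs for Replace Values, whole-cell replacement, and zero-based row indices with an inclusive end. The worker itself may behave differently.

```python
def parse_list(text):
    """Split a comma-separated parameter into trimmed, non-empty items."""
    return [item.strip() for item in (text or "").split(",") if item.strip()]


def parse_delimiter(text):
    """Return the field delimiter: blank means ',' and the two characters '\\t' mean a tab."""
    if not text:
        return ","
    return "\t" if text == "\\t" else text


def parse_replace_values(text):
    """Turn 'N/A:0,null:0' into {'N/A': '0', 'null': '0'} (assumed pair syntax)."""
    mapping = {}
    for pair in parse_list(text):
        # Split on the last colon so that keys may themselves contain colons.
        key, sep, value = pair.rpartition(":")
        if sep:
            mapping[key] = value
    return mapping


def apply_replacements(rows, mapping):
    """Replace whole-cell values that appear as keys in the mapping."""
    return [{k: mapping.get(v, v) for k, v in row.items()} for row in rows]


def slice_rows(rows, start=None, end=None):
    """Keep rows from start through end inclusive, assuming zero-based indices."""
    first = 0 if start is None else start
    last = len(rows) if end is None else end + 1
    return rows[first:last]
```

Under these assumptions, `parse_replace_values("N/A:0,null:0")` yields a two-entry mapping, and `slice_rows(rows, 1, 2)` keeps the second and third rows of each file.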

Outputs¶

Label	ID	Type	Description
csv_collect_columns_to_dataset_output_1	csv_collect_columns_to_dataset_output_1	dataset	Consolidated dataset containing the extracted and merged columns from all matched CSV files, ready for downstream transformation, analysis, or model ingestion.

Disciplines¶

data.dataset.ingest
data.dataset.transform
data.io.csv

Auto-generated from transformation schema. Worker id: csv_collect_columns_to_dataset. Schema hash: b7f6c14d7fac. Hand-curated docs in workerexamples/ override this page when present.