PARSES SEVERAL CSV FILES BASED ON EXTENSION AND COLLECTS THE COLUMNS TO CREATE A DATASET

Scans multiple CSV files (filtered by file extension) and merges selected columns across all matching files into a single dataset. Use this worker when you need to consolidate specific columns from a batch of CSV files into one unified tabular dataset for downstream analysis or modeling.
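The scan-and-merge behaviour described above can be sketched roughly as follows. This is an illustration only, not the worker's actual implementation: the function name collect_columns, its parameters, and the choice to concatenate rows from each file one after another are all assumptions made for the example.

```python
import csv
from pathlib import Path


def collect_columns(folder, extensions="csv", column_names=None, delimiter=","):
    """Hypothetical helper: gather selected columns from every matching file."""
    # Normalise "csv, txt" into {".csv", ".txt"}; an empty string matches every file.
    wanted = {
        "." + ext.strip().lower().lstrip(".")
        for ext in extensions.split(",")
        if ext.strip()
    }
    rows = []
    # Sorted iteration keeps the output order stable across runs.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if wanted and path.suffix.lower() not in wanted:
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter=delimiter)
            for record in reader:
                if column_names is None:
                    # No selection given: keep every column from the file.
                    rows.append(dict(record))
                else:
                    # Assumption: a column missing from a file is filled with None.
                    rows.append({name: record.get(name) for name in column_names})
    return rows
```

Calling collect_columns("data", "csv,txt", ["id", "score"]) would, under these assumptions, return one list of dictionaries holding only the id and score columns from every .csv and .txt file in the data folder.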

When to use

Classification: process.

Tagged: batch, column-extraction, csv, dataset-builder, ingest, multi-file, parse.

Inputs

Label | ID | Type | Default | Required | Description
Files Extensions Separated By Comma | files_extensions_separatedby_comma | string |  |  | Comma-separated list of file extensions to match when scanning for input files (e.g. ‘csv,txt’); leave blank to process all files regardless of extension.
Line Delimiter | line_delimiter | string | , |  | Character used to delimit fields within each CSV row (e.g. ‘,’ for standard CSV, ‘\t’ for TSV); defaults to comma if left empty.
Header Names | header_names | string |  |  | Comma-separated list of custom column header names to assign to the extracted columns; leave blank to use the header row already present in each file.
Column Ids | column_ids | string |  |  | Comma-separated list of column indices or names to extract from each CSV file; leave blank to collect all columns.
Starting Row Id | starting_row_id | integer |  |  | Zero-based or one-based row index at which to begin reading data from each file; leave blank to start from the first data row.
Ending Row Id | ending_row_id | integer |  |  | Row index at which to stop reading data from each file (inclusive); leave blank to read all rows through the end of each file.
Replace Values | replace_values | string |  |  | Mapping of values to find and replace in the collected data, expressed as a delimited key-value string (e.g. ‘N/A:0,null:0’); leave blank to perform no substitutions.
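As a rough illustration of how the row-range and replace-values inputs might be interpreted, here is a minimal sketch. The helper names are hypothetical, the ‘find:replace’ pair format is inferred from the example ‘N/A:0,null:0’, and row indices are assumed to be zero-based, which the schema does not confirm.

```python
def parse_replace_values(spec):
    """Turn 'N/A:0,null:0' into {'N/A': '0', 'null': '0'} (assumed format)."""
    mapping = {}
    for pair in spec.split(","):
        if not pair.strip():
            continue  # a blank spec means no substitutions
        old, sep, new = pair.partition(":")
        if not sep:
            raise ValueError(f"expected 'find:replace', got {pair!r}")
        mapping[old.strip()] = new.strip()
    return mapping


def apply_row_window(rows, start=None, end=None):
    """Keep rows from start through end inclusive; assumes zero-based indices."""
    first = 0 if start is None else start
    last = len(rows) if end is None else end + 1  # +1 because end is inclusive
    return rows[first:last]


def replace_cells(rows, mapping):
    """Swap any cell that exactly matches a key in mapping; leave others as-is."""
    return [[mapping.get(cell, cell) for cell in row] for row in rows]
```

In this sketch only exact cell matches are replaced, so a cell containing ‘N/A (estimated)’ would be left untouched; whether the worker also matches substrings is not stated in the schema.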

Outputs

Label | ID | Type | Description
csv_collect_columns_to_dataset_output_1 | csv_collect_columns_to_dataset_output_1 | dataset | Consolidated dataset containing the extracted and merged columns from all matched CSV files, ready for downstream transformation, analysis, or model ingestion.

Disciplines

  • data.dataset.ingest
  • data.dataset.transform
  • data.io.csv

Auto-generated from transformation schema. Worker id: csv_collect_columns_to_dataset. Schema hash: b7f6c14d7fac. Hand-curated docs in workerexamples/ override this page when present.