PARSES SEVERAL CSV FILES BASED ON EXTENSION AND COLLECTS THE ROWS TO CREATE A DATASET
Scans a collection of CSV (or similarly delimited) files, keeps those whose extension matches the filter, parses each file using the specified header and value rows, and concatenates all rows into a single unified dataset. Use this worker to batch-ingest multiple CSV files from a directory and merge their rows for downstream analysis.
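As a rough illustration of the extension filter, here is a minimal Python sketch that uses only the standard library. `parse_extensions` and `matching_files` are hypothetical helpers. The non-recursive scan, case-insensitive matching, tolerance of a leading dot, and sorted output order are assumptions, not documented behavior of this worker.

```python
from __future__ import annotations

from pathlib import Path


def parse_extensions(spec: str | None) -> set[str]:
    """Turn a spec such as 'csv, .TXT' into {'csv', 'txt'}; blank or None means no filter."""
    if not spec:
        return set()
    # Assumption: surrounding spaces and a leading dot are tolerated; matching is case-insensitive.
    exts = (part.strip().lstrip(".").lower() for part in spec.split(","))
    return {ext for ext in exts if ext}


def matching_files(directory: str | Path, spec: str | None) -> list[Path]:
    """List regular files directly inside `directory` whose extension is in `spec`, sorted by path."""
    wanted = parse_extensions(spec)
    # Assumption: only the top level of the directory is scanned (no recursion).
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    if wanted:
        files = [p for p in files if p.suffix.lstrip(".").lower() in wanted]
    return sorted(files)
```

For example, `matching_files("incoming", "csv,txt")` would return the `.csv` and `.txt` files in `incoming`, while a blank spec returns every file.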
When to use
Classification: process.
Tagged: batch, collect, concatenate, csv, dataset, extension-filter, ingest, multi-file.
Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Files Extensions Separated By Comma | files_extensions_separatedby_comma | string | — | — | Comma-separated list of file extensions to include when scanning for input files (e.g. `csv,txt`); leave blank to process all files regardless of extension. |
| Header Row | header_row | integer | — | — | Index of the row that contains the column header names (zero- or one-based, depending on platform convention); leave blank to use the first row as the header. |
| Value Row | value_row | string | — | — | Row index or range, given as a string, at which data values begin; leave blank to start reading immediately after the header row. |
| Suppress From Header | suppressfrom_header | string | — | — | Comma-separated list of column names or patterns to exclude from the parsed header, which drops those columns from every file before merging. |
| Limit Columns To | limit_columnsto | string | — | — | Comma-separated list of column names to retain in the output dataset; all other columns are discarded. Leave blank to keep all columns. |
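The following is a hypothetical sketch of how the four parsing options could interact for a single file, written with Python's standard `csv` module. It is not the worker's implementation. It assumes zero-based row indices, reads `value_row` as either a start index (`"2"`) or an inclusive range (`"2-10"`), and treats suppress entries as exact column names rather than patterns. `parse_csv_text` and its helpers are illustrative names.

```python
from __future__ import annotations

import csv
import io


def _names(spec: str | None) -> list[str]:
    """Split a comma-separated spec into trimmed, non-empty names."""
    return [s.strip() for s in (spec or "").split(",") if s.strip()]


def _value_range(value_row: str | None, header_row: int) -> tuple[int, int | None]:
    """Assumed format: 'N' starts at row N; 'N-M' reads rows N..M inclusive."""
    if not value_row or not value_row.strip():
        return header_row + 1, None  # blank: start right after the header row
    start, sep, end = value_row.strip().partition("-")
    return int(start), (int(end) if sep else None)


def parse_csv_text(
    text: str,
    header_row: int | None = None,
    value_row: str | None = None,
    suppress_from_header: str | None = None,
    limit_columns_to: str | None = None,
) -> list[dict[str, str]]:
    """Parse one file's text into row dicts (zero-based row indices assumed)."""
    rows = list(csv.reader(io.StringIO(text)))
    h = 0 if header_row is None else header_row  # blank: first row is the header
    header = [name.strip() for name in rows[h]]
    start, end = _value_range(value_row, h)
    data = rows[start:] if end is None else rows[start:end + 1]

    # Assumption: suppress entries are exact names, applied before the limit list.
    suppressed = set(_names(suppress_from_header))
    keep = set(_names(limit_columns_to))
    columns = [c for c in header if c not in suppressed and (not keep or c in keep)]

    result = []
    for row in data:
        record = dict(zip(header, row))  # short rows simply lack trailing columns
        result.append({c: record.get(c, "") for c in columns})
    return result
```

With a title line above the header, `parse_csv_text(text, header_row=1, suppress_from_header="secret")` keeps every data row but drops the `secret` column, while adding `value_row="3-4"` restricts the output to those two rows.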
Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| csv_collect_rows_to_dataset_output_1 | csv_collect_rows_to_dataset_output_1 | dataset | Unified tabular dataset produced by row-wise concatenation of all matching CSV files, with columns filtered and named according to the header, suppress, and limit-columns settings. |
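This page does not say how files with differing columns are combined. One plausible approach, shown here as an assumption rather than documented behavior, is to stack rows in file order, take the union of column names in first-seen order, and fill gaps with empty strings. `concat_rows` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Iterable


def concat_rows(
    per_file: Iterable[list[dict[str, str]]],
) -> tuple[list[str], list[dict[str, str]]]:
    """Stack row dicts from many files; columns are the union in first-seen order."""
    columns: list[str] = []
    seen: set[str] = set()
    stacked: list[dict[str, str]] = []
    for rows in per_file:
        for row in rows:
            for col in row:
                if col not in seen:
                    seen.add(col)
                    columns.append(col)
            stacked.append(row)
    # Assumption: missing cells become "" so every row carries every column.
    return columns, [{c: row.get(c, "") for c in columns} for row in stacked]
```

Taking the intersection of columns instead would be an equally reasonable design; the union keeps every value at the cost of sparse rows.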
Disciplines
- data.dataset.ingest
- data.dataset.transform
- data.io.csv
Auto-generated from transformation schema. Worker id: csv_collect_rows_to_dataset. Schema hash: 902905f9bc5f. Hand-curated docs in workerexamples/ override this page when present.