CREATES A RANDOM DATASET
Draws a random sample of rows from a source dataset, returning a new dataset of the requested size. Use this worker to reduce large datasets for faster downstream processing or to create randomized subsets for training/testing splits.
When to use
Classification: process.
Tagged: dataset, process, random_sampling, row_sample, sampling, transformations.
Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Source | source_dataset | dataset | — | | The input dataset (tabular) from which rows will be randomly drawn; leave unconnected only if the worker is being used in a context where data is injected programmatically. |
| Number Of Samples | num_samples | scalar | -1 | | Integer count of rows to randomly draw from the source dataset; use -1 (default) to sample all rows, effectively shuffling the dataset. |
| Reset Index | reset_index | string | no | | Whether to reset the row index of the sampled output to a continuous 0-based range (‘yes’) or retain the original source indices (‘no’, default); set to ‘yes’ when the downstream worker expects a clean integer index. |
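
The parameter semantics in the table can be sketched in plain Python. This is an illustration only, not the worker's implementation: the function name `random_sample`, the `(index, record)` row representation, and the `seed` argument are hypothetical, and sampling without replacement is an assumption that the schema does not state.

```python
import random


def random_sample(rows, num_samples=-1, reset_index="no", seed=None):
    """Hypothetical stand-in for the worker's sampling behaviour.

    rows is a list of (index, record) pairs. seed is an assumption added
    for reproducibility; the worker does not list it as an input.
    """
    # -1 means "take every row", which amounts to a shuffle.
    n = len(rows) if num_samples == -1 else int(num_samples)
    rng = random.Random(seed)
    # Assumed: sampling without replacement. random.sample raises
    # ValueError if n exceeds the number of available rows.
    sampled = rng.sample(rows, n)
    if reset_index == "yes":
        # Replace the original indices with a continuous 0-based range.
        sampled = [(i, record) for i, (_, record) in enumerate(sampled)]
    return sampled


# Ten rows whose original indices run from 100 to 109.
source = [(idx, {"value": idx * 2}) for idx in range(100, 110)]

shuffled = random_sample(source, seed=0)  # all rows, original indices kept
subset = random_sample(source, num_samples=3, reset_index="yes", seed=0)
```

With `reset_index="no"` the sampled rows keep their source indices, so they can still be traced back to the input; with `"yes"` that link is dropped in exchange for a clean integer index.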
Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| dataset_random_sampler_output_1 | dataset_random_sampler_output_1 | dataset | Randomly sampled subset of the source dataset as a tabular dataset, containing exactly num_samples rows (or all rows if num_samples is -1), with index reset or preserved according to the reset_index parameter. |
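
Because the output can retain the original source indices, a downstream step can recover the rows that were not drawn, for example to form a test set alongside a sampled training set. The sketch below assumes that the step happens outside this worker, which exposes only the sampled output, and that sampling is without replacement. The row layout and variable names are hypothetical.

```python
import random

# Source rows as (index, record) pairs; indices 100..109 are illustrative.
rows = [(idx, {"value": idx * 10}) for idx in range(100, 110)]

# Stand-in for the worker's output with num_samples=7 and reset_index='no'.
rng = random.Random(42)
train = rng.sample(rows, 7)

# The retained indices identify which source rows were drawn ...
train_ids = {idx for idx, _ in train}

# ... so the complement can serve as a held-out test set.
test = [(idx, rec) for idx, rec in rows if idx not in train_ids]
```

This only works when the indices are preserved; with `reset_index` set to ‘yes’ the sampled rows can no longer be matched back to the source by index.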
Disciplines
- ai_ml.preprocessing
- data.dataset.transform
Runnable example
A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_random_sampler
Auto-generated from transformation schema. Worker id: dataset_random_sampler. Schema hash: 2f5424603bbd. Hand-curated docs in workerexamples/ override this page when present.