CREATES A RANDOM DATASET

Draws a random sample of rows from a source dataset and returns them as a new dataset of the requested size. Use this worker to shrink large datasets for faster downstream processing, or to create randomized subsets such as the training portion of a train/test split.

When to use

Classification: process.

Tagged: dataset, process, random_sampling, row_sample, sampling, transformations.

Inputs

Label	ID	Type	Default	Description
Source	source_dataset	dataset	—	The input tabular dataset from which rows are randomly drawn. Leave it unconnected only when the worker runs in a context where data is injected programmatically.
Number Of Samples	num_samples	scalar	-1	Integer count of rows to randomly draw from the source dataset; use -1 (default) to sample all rows, effectively shuffling the dataset.
Reset Index	reset_index	string	no	Whether to renumber the sampled rows with a contiguous 0-based integer index ('yes') or keep their original source index labels ('no', default). Set to 'yes' when a downstream worker expects a clean integer index.
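The parameter semantics above can be illustrated with a minimal Python sketch. It is not the d3VIEW implementation. It assumes rows are drawn without replacement (consistent with -1 meaning a shuffle), models a tabular dataset as a list of (index label, row) pairs, and introduces a hypothetical random_sample function plus a sketch-only seed argument that is not one of the worker's documented inputs.

```python
import random
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical stand-in for a tabular dataset: (index_label, row) pairs.
Table = List[Tuple[int, Dict[str, Any]]]


def random_sample(source: Table, num_samples: int = -1,
                  reset_index: str = "no", seed: Optional[int] = None) -> Table:
    """Sketch of the documented num_samples / reset_index behaviour.

    Assumes sampling without replacement. The seed argument exists only in
    this sketch for reproducibility; it is not a documented worker input.
    """
    if reset_index not in ("yes", "no"):
        raise ValueError("reset_index must be 'yes' or 'no'")
    if num_samples == -1:
        # -1 draws every row once, i.e. a random permutation of the dataset.
        num_samples = len(source)
    if not 0 <= num_samples <= len(source):
        # How the real worker handles this case is not documented here.
        raise ValueError("num_samples must be -1 or between 0 and len(source)")
    rng = random.Random(seed)
    # Random.sample picks distinct rows and returns them in random order.
    picked = rng.sample(source, num_samples)
    if reset_index == "yes":
        # Relabel rows 0..n-1, discarding the original source labels.
        return [(i, row) for i, (_, row) in enumerate(picked)]
    # 'no': each row keeps its original source label.
    return picked


if __name__ == "__main__":
    data: Table = [(100 + i, {"x": i}) for i in range(10)]
    kept = random_sample(data, num_samples=3, seed=0)
    reset = random_sample(data, num_samples=3, reset_index="yes", seed=0)
    print([label for label, _ in kept])   # three distinct labels from 100..109
    print([label for label, _ in reset])  # [0, 1, 2]
```

With the same seed, both calls select the same rows; only the labels differ, which is the whole effect of reset_index in this sketch.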

Outputs

Label	ID	Type	Description
dataset_random_sampler_output_1	dataset_random_sampler_output_1	dataset	Randomly sampled subset of the source dataset as a tabular dataset, containing exactly num_samples rows (or all rows if num_samples is -1), with index reset or preserved according to the reset_index parameter.
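Because reset_index='no' preserves the original source labels in the output, the rows that were not sampled can be recovered afterwards, for example as the test portion of a train/test split. The sketch below is illustrative only: it models datasets as (index label, row) pairs, uses Python's random.Random.sample as a stand-in for the worker's output, and the holdout_rows helper is hypothetical, not part of d3VIEW.

```python
import random
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for a tabular dataset: (index_label, row) pairs.
Table = List[Tuple[int, Dict[str, Any]]]


def holdout_rows(source: Table, sampled: Table) -> Table:
    """Return the rows of source whose labels do not appear in sampled.

    Only meaningful when the sample kept its original labels
    (reset_index='no'); assumes source labels are unique.
    """
    taken = {label for label, _ in sampled}
    return [(label, row) for label, row in source if label not in taken]


# Illustrative data: 10 rows labelled 100..109.
source: Table = [(100 + i, {"x": i}) for i in range(10)]
# Stand-in for the worker output with num_samples=7 and reset_index='no'.
train = random.Random(42).sample(source, 7)
test = holdout_rows(source, train)
print(len(train), len(test))  # 7 3
```

Compute the complement before any reset: once the sample is relabelled 0..n-1, its labels no longer identify source rows.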

Disciplines

ai_ml.preprocessing
data.dataset.transform

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_random_sampler

Auto-generated from transformation schema. Worker id: dataset_random_sampler. Schema hash: 2f5424603bbd. Hand-curated docs in workerexamples/ override this page when present.