CREATES A RANDOM DATASET

Draws a random sample of rows from a source dataset, returning a new dataset of the requested size. Use this worker to reduce large datasets for faster downstream processing or to create randomized subsets for training/testing splits.

When to use

Classification: process.

Tagged: dataset, process, random_sampling, row_sample, sampling, transformations.

Inputs

Label | ID | Type | Default | Required | Description
Source | source_dataset | dataset |  |  | The tabular input dataset from which rows are randomly drawn; leave unconnected only when the data is injected programmatically.
Number Of Samples | num_samples | scalar | -1 |  | Integer count of rows to draw at random from the source dataset; use -1 (default) to sample all rows, which effectively shuffles the dataset.
Reset Index | reset_index | string | no |  | Whether to reset the row index of the sampled output to a continuous 0-based range (‘yes’) or retain the original source indices (‘no’, default); set to ‘yes’ when the downstream worker expects a clean integer index.
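
The interaction between num_samples and reset_index can be sketched in plain Python. This is an illustration only, not the worker's implementation: the function random_sample, the seed argument, and the (index, record) row representation are hypothetical, and sampling without replacement is an assumption that the schema does not confirm.

```python
import random

def random_sample(rows, num_samples=-1, reset_index="no", seed=None):
    """Hypothetical stand-in for the worker; rows is a list of (index, record) pairs."""
    rng = random.Random(seed)  # seed is an assumption; the worker lists no seed input
    # -1 means "draw every row", which amounts to a shuffle.
    n = len(rows) if num_samples == -1 else int(num_samples)
    # Without replacement (assumption); random.sample raises ValueError if n > len(rows).
    picked = rng.sample(rows, n)
    if reset_index == "yes":
        # Replace the original indices with a continuous 0-based range.
        picked = [(i, record) for i, (_, record) in enumerate(picked)]
    return picked

# Source rows carry indices 100..109 so that the index handling is visible.
source = [(i, {"value": i * 10}) for i in range(100, 110)]

shuffled = random_sample(source, seed=0)  # all rows, original indices kept
subset = random_sample(source, num_samples=3, reset_index="yes", seed=0)
```

With the defaults, shuffled holds all ten rows in random order under their original indices; subset holds three rows re-indexed 0, 1, 2. How the real worker behaves when num_samples exceeds the row count is not stated in the schema, so the sketch simply lets the error surface.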

Outputs

Label | ID | Type | Description
dataset_random_sampler_output_1 | dataset_random_sampler_output_1 | dataset | Randomly sampled subset of the source dataset as a tabular dataset, containing exactly num_samples rows (or all rows if num_samples is -1), with index reset or preserved according to the reset_index parameter.
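
For a training/testing split, keeping the original indices (reset_index set to ‘no’) makes it possible to recover the rows that were not sampled. The sketch below shows that idea in plain Python under stated assumptions: the index-to-record dictionary, the fixed seed, and the complement step are hypothetical, and the complement is something a downstream step would have to perform, since this worker only returns the sampled rows.

```python
import random

# Hypothetical source dataset: original index -> record.
source = {i: {"value": i * 10} for i in range(10)}

rng = random.Random(42)  # fixed seed only to make the sketch repeatable

# Stand-in for the worker with num_samples=8 and reset_index='no':
# the sampled rows keep their original indices.
train_idx = rng.sample(sorted(source), 8)
train = {i: source[i] for i in train_idx}

# Downstream step (not part of this worker): rows whose index was not drawn.
test = {i: record for i, record in source.items() if i not in train}
```

Here train holds eight rows and test the remaining two, with no index shared between them. With reset_index set to ‘yes’ the original indices are lost, so this kind of complement would need another key column.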

Disciplines

  • ai_ml.preprocessing
  • data.dataset.transform

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_random_sampler


Auto-generated from transformation schema. Worker id: dataset_random_sampler. Schema hash: 2f5424603bbd. Hand-curated docs in workerexamples/ override this page when present.