SCAN A DATASET AGAINST A REFERENCE DATASET AND FLAG EACH ROW AS IN OR OUT OF RANGE OF THE REFERENCE VALUES

Scans a dataset by comparing selected column values against the min/max range observed in a reference dataset, then flags each row as in-range or out-of-range. Use this worker to quickly identify anomalous or invalid records relative to a known-good baseline.
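
The range check described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the worker's actual implementation: the function names reference_bounds and scan_rows and the flag column name range_flag are hypothetical, the bounds are treated as inclusive, a missing value counts as out-of-range, and a row is flagged out-of-range as soon as any selected column falls outside its reference bounds.

```python
def reference_bounds(reference_rows, cols):
    """Return {column: (min, max)} observed in the reference rows."""
    bounds = {}
    for col in cols:
        values = [row[col] for row in reference_rows if row.get(col) is not None]
        if values:  # skip columns with no usable reference values
            bounds[col] = (min(values), max(values))
    return bounds


def scan_rows(rows, reference_rows, cols, flag_col="range_flag"):
    """Copy each row and add a flag column (name is an assumption)."""
    bounds = reference_bounds(reference_rows, cols)
    scanned = []
    for row in rows:
        # Assumption: inclusive bounds; a missing value counts as out-of-range.
        in_range = all(
            row.get(col) is not None and low <= row[col] <= high
            for col, (low, high) in bounds.items()
        )
        flag = "in-range" if in_range else "out-of-range"
        scanned.append({**row, flag_col: flag})
    return scanned


# Hypothetical data: the reference defines temp in [20, 80] and load in [5, 9].
reference = [{"temp": 20.0, "load": 5.0}, {"temp": 80.0, "load": 9.0}]
dataset = [{"temp": 25.0, "load": 6.0}, {"temp": 95.0, "load": 7.0}]

scanned = scan_rows(dataset, reference, ["temp", "load"])
for row in scanned:
    print(row)
```

In this sketch the second row is flagged out-of-range because its temp value exceeds the reference maximum, even though its load value is within bounds.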

When to use

Classification: process.

Tagged: data_quality, flagging, out_of_range, range_check, reference_dataset, scan, validation.

Inputs

Label | ID | Type | Default | Required | Description
Dataset | dataset | dataset | | | The input dataset to be scanned; each row in the selected columns will be compared against the reference range.
Reference Dataset | dataset_ref | dataset | | | The reference dataset whose column min/max values define the valid range boundaries used for comparison.
Columns To Scan | cols | select | | | One or more column names (drawn from the input dataset) to evaluate against the reference range; leave empty to scan all matching columns.
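
The column-selection rule for cols can be sketched as follows. This assumes that "matching columns" means columns with the same name in both the input and the reference dataset; the schema does not spell this out, and resolve_columns is a hypothetical helper, not part of the worker.

```python
def resolve_columns(dataset_columns, reference_columns, cols=None):
    """Pick the columns to scan.

    Assumption: an empty selection falls back to every input column
    that also exists (by name) in the reference dataset.
    """
    reference_set = set(reference_columns)
    candidates = cols if cols else dataset_columns
    # Keep the input order; drop anything the reference cannot bound.
    return [col for col in candidates if col in reference_set]


# Hypothetical column names for illustration.
all_matching = resolve_columns(["id", "temp", "load"], ["temp", "load", "speed"])
selected = resolve_columns(["id", "temp", "load"], ["temp", "load", "speed"], ["temp"])
print(all_matching, selected)
```

Dropping columns that have no counterpart in the reference is a design choice of this sketch; the worker itself may instead raise an error or ignore them silently.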

Outputs

Label | ID | Type | Description
Dataset Scanned | dataset_scanned | dataset | A copy of the input dataset augmented with a flag column indicating whether each row’s values fall within (‘in-range’) or outside (‘out-of-range’) the reference bounds for the selected columns.

Disciplines

  • data.dataset.transform
  • data.statistics

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_scan_out_of_range_rows


Auto-generated from transformation schema. Worker id: dataset_scan_out_of_range_rows. Schema hash: 2a57ee4ac7d1. Hand-curated docs in workerexamples/ override this page when present.