SCAN THE DATASET AND COMPARE ITS COLUMN VALUES TO A REFERENCE DATASET AND FLAG IF THEY ARE IN OR OUT OF RANGE OF THE REFERENCE VALUES
Scans a dataset by comparing selected column values against the min/max range observed in a reference dataset, then flags each row as in-range or out-of-range. Use this worker to quickly identify anomalous or invalid records relative to a known-good baseline.
When to use
Classification: process.
Tagged: data_quality, flagging, out_of_range, range_check, reference_dataset, scan, validation.
Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Dataset | dataset | dataset | — | | The input dataset to be scanned; each row in the selected columns will be compared against the reference range. |
| Reference Dataset | dataset_ref | dataset | — | | The reference dataset whose column min/max values define the valid range boundaries used for comparison. |
| Columns To Scan | cols | select | — | | One or more column names (drawn from the input dataset) to evaluate against the reference range; leave empty to scan all matching columns. |
Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| Dataset Scanned | dataset_scanned | dataset | A copy of the input dataset augmented with a flag column indicating whether each row’s values fall within (‘in-range’) or outside (‘out-of-range’) the reference bounds for the selected columns. |
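The range check described above can be sketched in plain Python. This is only an illustration of the idea, not the worker's actual implementation: the function names, the `range_flag` column name, the inclusive bounds, the rule that a row is in-range only when every selected column is within bounds, and the reading of "all matching columns" as "columns present in both datasets" are all assumptions.

```python
def reference_bounds(reference, cols):
    """Return {column: (min, max)} as observed in the reference rows."""
    bounds = {}
    for col in cols:
        values = [row[col] for row in reference if row.get(col) is not None]
        if values:
            bounds[col] = (min(values), max(values))
    return bounds


def scan_out_of_range(dataset, reference, cols=None, flag_col="range_flag"):
    """Hypothetical helper: copy each row and add an in-range/out-of-range flag."""
    if not cols:
        # Assumption: an empty selection means every column found in both datasets.
        ref_cols = set().union(*(row.keys() for row in reference))
        cols = [c for c in dataset[0] if c in ref_cols] if dataset else []
    bounds = reference_bounds(reference, cols)

    scanned = []
    for row in dataset:
        # Assumption: bounds are inclusive, and a missing value counts as out-of-range.
        ok = all(
            row.get(col) is not None and lo <= row[col] <= hi
            for col, (lo, hi) in bounds.items()
        )
        scanned.append({**row, flag_col: "in-range" if ok else "out-of-range"})
    return scanned


# Made-up sample data, for illustration only.
reference = [
    {"temp": 10.0, "pressure": 1.0},
    {"temp": 20.0, "pressure": 1.5},
    {"temp": 30.0, "pressure": 2.0},
]
data = [
    {"temp": 15.0, "pressure": 1.2},
    {"temp": 35.0, "pressure": 1.1},
    {"temp": 25.0, "pressure": 2.5},
]

scanned = scan_out_of_range(data, reference, cols=["temp", "pressure"])
for row in scanned:
    print(row)
```

In this sketch the input rows are left unmodified and the flag is added to copies, mirroring the "copy of the input dataset" wording in the output description. How the actual worker names the flag column or treats missing values is not stated on this page.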
Disciplines
- data.dataset.transform
- data.statistics
Runnable example
A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_scan_out_of_range_rows
Auto-generated from transformation schema. Worker id: dataset_scan_out_of_range_rows. Schema hash: 2a57ee4ac7d1. Hand-curated docs in workerexamples/ override this page when present.