DATASET SORT BY ROW

Sorts the rows of a dataset by their similarity (or difference) to a reference row from a compare dataset, using raw, absolute, or squared difference metrics. Returns the top-N closest or most-distant rows in ascending or descending order, optionally restricting comparison to specific columns and applying normalization.

When to use

Classification: process.

Tagged: compare, dataset_sort, diff, filter, rank, row_distance, sort.

Inputs

Label | ID | Type | Default | Required | Description
Dataset | dataset | dataset | | | The input dataset whose rows will be scored and sorted against the reference row; each row is treated as a numeric vector for difference computation.
Compare Dataset | compare_row | dataset | | | Reference dataset whose first row is used as the comparison vector; all rows in ‘dataset’ are compared against this single row.
Diff Type | diff_type | scalar | raw_diff | | Method used to compute the difference between each row and the reference row: ‘raw_diff’ (signed), ‘abs_diff’ (absolute value), or ‘sq_diff’ (squared); defaults to ‘raw_diff’.
Order | order | scalar | desc | | Sort order for the output rows based on their computed difference score: ‘desc’ (largest difference first) or ‘asc’ (smallest difference first); defaults to ‘desc’.
Limit | limit | scalar | 1 | | Maximum number of rows to return after sorting; defaults to 1 (return only the top-scoring row).
Return Type | return_type | scalar | original | | Controls the content of the returned dataset: ‘original’ returns the original row values, or an alternative mode returns the computed difference values; defaults to ‘original’.
Columns To Match | columns | scalar | | | Columns used to compute the difference and find the closest match.
Normalize | normalize | scalar | no | | Whether to normalize columns before sorting; defaults to ‘no’.
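
The scoring and ranking described by these inputs can be sketched in plain Python. This is only an illustration, not the worker's implementation. The function name sort_by_row and the dict-per-row representation are hypothetical. Two details are assumptions, since this page does not specify them: each difference is taken as row value minus reference value, and the per-column differences are summed into a single score. Normalization and the alternative return type are left out because their exact behavior is not documented here.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, float]


def sort_by_row(
    dataset: Sequence[Row],
    compare_row: Row,
    diff_type: str = "raw_diff",
    order: str = "desc",
    limit: int = 1,
    columns: Optional[Sequence[str]] = None,
) -> List[Row]:
    """Rank rows by their difference from compare_row (illustrative sketch)."""
    # Compare on the requested columns, or on every reference column if none given.
    cols = list(columns) if columns else list(compare_row)
    if diff_type not in ("raw_diff", "abs_diff", "sq_diff"):
        raise ValueError(f"unknown diff_type: {diff_type}")

    def score(row: Row) -> float:
        total = 0.0
        for col in cols:
            # ASSUMPTION: difference direction is row minus reference.
            d = row[col] - compare_row[col]
            if diff_type == "abs_diff":
                d = abs(d)
            elif diff_type == "sq_diff":
                d = d * d
            # ASSUMPTION: per-column differences are summed into one score.
            total += d
        return total

    # 'desc' puts the largest score first, 'asc' the smallest.
    ranked = sorted(dataset, key=score, reverse=(order == "desc"))
    return ranked[:limit]


rows = [
    {"x": 1.0, "y": 2.0},
    {"x": 4.0, "y": 6.0},
    {"x": 1.5, "y": 2.5},
]
reference = {"x": 1.0, "y": 2.0}

# Two closest rows by absolute difference.
closest = sort_by_row(rows, reference, diff_type="abs_diff", order="asc", limit=2)
# Single most distant row by squared difference.
farthest = sort_by_row(rows, reference, diff_type="sq_diff", order="desc", limit=1)
```

Under these assumptions, ‘asc’ combined with ‘abs_diff’ or ‘sq_diff’ yields the closest matches. With ‘raw_diff’, positive and negative column differences can cancel, so a small score does not necessarily mean the row is close to the reference.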

Outputs

Label | ID | Type | Description
Dataset | dataset | dataset | Sorted and filtered dataset containing up to ‘limit’ rows ranked by their difference score relative to the reference row, in the specified order and return type format.

Disciplines

  • data.dataset.transform
  • data.statistics

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_sort_by_row


Auto-generated from transformation schema. Worker id: dataset_sort_by_row. Schema hash: 4195bb5e3bdd. Hand-curated docs in workerexamples/ override this page when present.