DATASET SORT BY ROW
Sorts the rows of a dataset by their similarity (or difference) to a reference row from a compare dataset, using raw, absolute, or squared difference metrics. Returns the top-N closest or most-distant rows in ascending or descending order, optionally restricting comparison to specific columns and applying normalization.
When to use
Classification: process.
Tagged: compare, dataset_sort, diff, filter, rank, row_distance, sort.
Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Dataset | dataset | dataset | — | | The input dataset whose rows are scored and sorted against the reference row; each row is treated as a numeric vector for difference computation. |
| Compare Dataset | compare_row | dataset | — | | Reference dataset whose first row is used as the comparison vector; all rows in ‘dataset’ are compared against this single row. |
| Diff Type | diff_type | scalar | raw_diff | | Method used to compute the difference between each row and the reference row: ‘raw_diff’ (signed), ‘abs_diff’ (absolute value), or ‘sq_diff’ (squared). |
| Order | order | scalar | desc | | Sort order for the output rows based on their computed difference score: ‘desc’ (largest difference first) or ‘asc’ (smallest difference first). |
| Limit | limit | scalar | 1 | | Maximum number of rows to return after sorting; the default of 1 returns only the top-ranked row. |
| Return Type | return_type | scalar | original | | Controls the content of the returned dataset: ‘original’ returns the original row values, or an alternative mode returns the computed difference values. |
| Columns To Match | columns | scalar | — | | Columns used to compute the difference and find the closest match. |
| Normalize | normalize | scalar | no | | Whether to normalize the columns before sorting. |
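
To make the interaction of `diff_type`, `order`, `limit`, and `columns` concrete, here is a minimal Python sketch of the ranking logic. It is not the worker's implementation. The function name `sort_by_row`, the list-of-dicts row representation, and the choice to sum per-column differences into a single row score are all assumptions made for illustration, since this page does not state how per-column differences are aggregated. Normalization and the non-‘original’ return type are left out because their exact behavior is not documented here.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, float]

# Per-column difference functions, keyed by the documented diff_type values.
DIFF_FUNCS = {
    "raw_diff": lambda value, ref: value - ref,        # signed
    "abs_diff": lambda value, ref: abs(value - ref),   # absolute value
    "sq_diff": lambda value, ref: (value - ref) ** 2,  # squared
}


def sort_by_row(
    dataset: List[Row],
    compare_dataset: List[Row],
    diff_type: str = "raw_diff",
    order: str = "desc",
    limit: int = 1,
    columns: Optional[Sequence[str]] = None,
) -> List[Row]:
    """Hypothetical re-implementation for illustration only."""
    reference = compare_dataset[0]  # only the first row is used
    cols = list(columns) if columns else list(reference.keys())
    diff = DIFF_FUNCS[diff_type]

    def score(row: Row) -> float:
        # ASSUMPTION: per-column differences are summed into one row score.
        return sum(diff(row[c], reference[c]) for c in cols)

    ranked = sorted(dataset, key=score, reverse=(order == "desc"))
    return ranked[:limit]  # rows are returned unchanged ('original')


rows = [
    {"x": 1.0, "y": 2.0},
    {"x": 4.0, "y": 6.0},
    {"x": 0.0, "y": 0.0},
]
reference = [{"x": 1.0, "y": 1.0}]

# Two rows closest to the reference under absolute difference.
closest = sort_by_row(rows, reference, diff_type="abs_diff", order="asc", limit=2)

# Single most distant row under squared difference.
farthest = sort_by_row(rows, reference, diff_type="sq_diff", order="desc", limit=1)
```

Note that with the defaults (`raw_diff`, `desc`, limit 1) positive and negative column differences can cancel under this summing assumption, so `abs_diff` or `sq_diff` with `order` set to ‘asc’ is the more natural choice when the goal is a nearest-row match.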
Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| Dataset | dataset | dataset | Sorted and filtered dataset containing up to ‘limit’ rows ranked by their difference score relative to the reference row, in the specified order and return type format. |
Disciplines
- data.dataset.transform
- data.statistics
Runnable example
A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_sort_by_row
Auto-generated from transformation schema. Worker id: dataset_sort_by_row. Schema hash: 4195bb5e3bdd. Hand-curated docs in workerexamples/ override this page when present.