JOIN DATASETS BASED ON PRIMARY KEYS

Joins two or more tabular datasets, either by matching rows on one or more shared primary-key columns (outer, inner, or left join) or by appending columns side by side with no key matching (simple join). Use this worker when you need to merge tabular datasets from different sources into a single unified dataset before downstream analysis or modeling.

When to use

Classification: process.

Tagged: dataset, inner_join, join, left_join, merge, outer_join, primary_key, process.

Inputs

Label | ID | Type | Default | Required | Description
Dataset1 | dataset1 | dataset |  |  | The primary (left-hand) dataset to join; must be a tabular dataset object. The join fails if this input is absent or invalid.
Dataset2 | dataset2 | dataset |  |  | The secondary (right-hand) dataset(s) to join onto Dataset1; supports repeated inputs so multiple datasets can be merged sequentially.
Join Type | join_type | scalar | simple |  | Join strategy: ‘simple’ (default, column-append with no key matching), ‘outer’ (all rows from both), ‘inner’ (only matching rows), or ‘left’ (all rows from Dataset1, matched rows from Dataset2).
Primary Keys | primarykeys | scalar |  |  | Comma-separated column name(s) used as the matching key(s) between the two datasets (e.g. ‘id,run_id’); leave blank for a simple side-by-side column append.
Datset1 Columns To Include | datset1_columns_to_include | text |  |  | Subset of columns from Dataset1 to carry into the output; leave empty to include all Dataset1 columns.
Datset2 Columns To Include | datset2_columns_to_include | text |  |  | Subset of columns from Dataset2 to carry into the output; leave empty to include all Dataset2 columns.
Prefix for Dataset1 | prefixfordataset1columns | scalar |  |  | Prefix applied to Dataset1 column names.
Prefix for Dataset 2 | prefixfordataset2columns | scalar |  |  | Prefix applied to Dataset2 column names.
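
The key, column-subset, and prefix inputs above can be pictured with a small plain-Python sketch. This is not the worker's implementation: the helper names parse_keys and select_and_prefix are hypothetical, rows are modelled as dictionaries, the include list is passed as a Python list, and two behaviours are assumed rather than documented (key columns are always kept, and key columns are not prefixed).

```python
def parse_keys(primarykeys):
    """Split a comma-separated key string such as 'id,run_id' into names."""
    return [k.strip() for k in primarykeys.split(",") if k.strip()]


def select_and_prefix(rows, include=None, prefix="", keys=()):
    """Keep the `include` columns (all if empty) and prefix non-key names.

    Assumptions (not confirmed by the worker schema): key columns are
    always carried through, and they keep their original names.
    """
    out = []
    for row in rows:
        cols = [c for c in row if not include or c in include or c in keys]
        out.append({(c if c in keys else prefix + c): row[c] for c in cols})
    return out


keys = parse_keys("id, run_id")
rows = [{"id": 1, "run_id": 7, "mass": 10.0, "name": "a"}]
prepared = select_and_prefix(rows, include=["mass"], prefix="d1_", keys=keys)
print(prepared)  # [{'id': 1, 'run_id': 7, 'd1_mass': 10.0}]
```

Leaving the key names unprefixed keeps them comparable across both datasets in this sketch; check the worker's actual output if prefixed key columns matter downstream.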

Outputs

Label | ID | Type | Description
dataset_join_output_1 | dataset_join_output_1 | dataset | Merged tabular dataset containing columns from both Dataset1 and Dataset2, combined according to the selected join type and primary keys.
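
To show how the join type shapes the merged output, here is a minimal plain-Python sketch of the four strategies on lists of dictionary rows. It illustrates the semantics described above and is not the worker's own code: join_rows is a hypothetical helper, unmatched cells are filled with None, and the 'simple' branch assumes rows are paired by position.

```python
from itertools import zip_longest


def join_rows(left, right, keys=(), how="simple"):
    """Join two lists of dict rows; illustrative, not the worker's own code."""
    if how not in ("simple", "inner", "left", "outer"):
        raise ValueError(f"unknown join type: {how}")
    left_cols = list(left[0]) if left else []
    right_cols = [c for c in (right[0] if right else {}) if c not in keys]

    def merged(lrow, rrow):
        # One output row; a missing side contributes None for its columns.
        row = {c: (lrow[c] if lrow is not None else None) for c in left_cols}
        if lrow is None and rrow is not None:
            row.update({k: rrow[k] for k in keys})  # keys from the right row
        row.update({c: (rrow[c] if rrow is not None else None) for c in right_cols})
        return row

    if how == "simple":
        # Assumption: rows are paired by position, with no key matching.
        return [merged(lrow, rrow) for lrow, rrow in zip_longest(left, right)]

    def key_of(row):
        return tuple(row[k] for k in keys)

    index = {}
    for rrow in right:
        index.setdefault(key_of(rrow), []).append(rrow)

    out, seen = [], set()
    for lrow in left:
        hits = index.get(key_of(lrow), [])
        seen.add(key_of(lrow))
        if hits:
            out.extend(merged(lrow, rrow) for rrow in hits)
        elif how in ("left", "outer"):
            out.append(merged(lrow, None))
    if how == "outer":
        # Append right-hand rows whose key never appeared on the left.
        for key, rrows in index.items():
            if key not in seen:
                out.extend(merged(None, rrow) for rrow in rrows)
    return out


left = [{"id": 1, "mass": 10.0}, {"id": 2, "mass": 12.5}, {"id": 3, "mass": 9.1}]
right = [{"id": 2, "score": 0.8}, {"id": 3, "score": 0.6}, {"id": 4, "score": 0.9}]

for how in ("inner", "left", "outer"):
    print(how, len(join_rows(left, right, keys=["id"], how=how)))
# inner 2
# left 3
# outer 4
```

With these sample rows, 'inner' keeps only ids 2 and 3, 'left' adds id 1 with an empty score, and 'outer' also adds id 4 with an empty mass.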

Disciplines

  • data.dataset.transform

Runnable example

A runnable example is registered for this worker. Open the example workflow on the d3VIEW canvas: /api/workflow/example?id=dataset_join


Auto-generated from transformation schema. Worker id: dataset_join. Schema hash: 991debb90eaa. Hand-curated docs in workerexamples/ override this page when present.