ONTOLOGY DATASET EXTRACTOR

Extracts entities from an RDF/SPARQL ontology store and flattens them into a tabular dataset ready for machine learning. Each row represents one entity of the specified type. Columns are derived from the entity’s own properties or from related child entities reached via configurable relationship paths, and are split into separate input-feature and target-label groups.
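
The flattening described above can be sketched in plain Python. This is only an illustration over an in-memory dictionary, not the worker's implementation: the graph layout, the `traverse` and `flatten` helpers, the choice of the child's `name` as the column name, and the use of a `value` property in ‘properties’ mode are all assumptions made for the example.

```python
from typing import Any

# Hypothetical in-memory stand-in for an ontology store:
# node id -> {"type", "props", "rels": {relationship name: [child ids]}}
GRAPH: dict[str, dict[str, Any]] = {
    "m1": {"type": "SystemModel", "props": {"name": "Model A"},
           "rels": {"hasSimulation": ["s1"]}},
    "m2": {"type": "SystemModel", "props": {"name": "Model B"},
           "rels": {"hasSimulation": []}},
    "s1": {"type": "Simulation", "props": {"name": "Sim 1"},
           "rels": {"hasResponse": ["r1"]}},
    "r1": {"type": "Response",
           "props": {"name": "PERFORMANCE_peak", "value": 4.2},
           "rels": {}},
}


def traverse(start: str, path: list[str]) -> list[str]:
    """Follow each relationship in `path` in turn; return the final child ids."""
    frontier = [start]
    for rel in path:
        frontier = [child for node in frontier
                    for child in GRAPH[node]["rels"].get(rel, [])]
    return frontier


def flatten(entity_type: str, path: list[str], mode: str) -> list[dict[str, Any]]:
    """Build one row per entity of `entity_type` (assumed column naming)."""
    entities = [i for i, n in GRAPH.items() if n["type"] == entity_type]
    # Column set: union of child names reachable from any row entity.
    columns = sorted({GRAPH[c]["props"]["name"]
                      for e in entities for c in traverse(e, path)})
    rows = []
    for e in entities:
        children = {GRAPH[c]["props"]["name"]: GRAPH[c]["props"]
                    for c in traverse(e, path)}
        row: dict[str, Any] = {"id": e, "name": GRAPH[e]["props"]["name"]}
        for col in columns:
            if mode == "binary":
                row[col] = 1 if col in children else 0
            else:  # "properties": here, the child's assumed "value" property
                row[col] = children.get(col, {}).get("value")
        rows.append(row)
    return rows


binary_rows = flatten("SystemModel", ["hasSimulation", "hasResponse"], "binary")
property_rows = flatten("SystemModel", ["hasSimulation", "hasResponse"], "properties")
```

In this sketch, `m1` reaches the response node through both relationships and gets 1 (or the response value) in the `PERFORMANCE_peak` column, while `m2` has no simulations and gets 0 (or an empty cell).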

When to use

Tagged: arc2, dataset, feature_extraction, ml_prep, ontology, rdf, sparql, tabular.

Inputs

Label	ID	Type	Default	Required	Description
Ontology Store Name	ontology_store_name	text	—	✓	Name of the ARC2 RDF ontology store to query; the prefix ‘arc2_ontology_’ is added automatically if omitted (e.g., enter ‘my_model’ to target the store ‘arc2_ontology_my_model’).
Entity Type (Rows)	entity_type	text	—	✓	RDF entity type whose instances become dataset rows (e.g., ‘SystemModel’); must match a type present in the ontology store.
Include Name Column	include_name_column	select	yes		Controls whether a human-readable name column is added alongside the entity ID column; default ‘yes’ includes both ID and Name, ‘no’ includes ID only.
Input Columns Relationship Path	input_cols_relationship_path	text	—		Comma-separated chain of RDF relationship names to traverse from the row entity to the child nodes that become input feature columns (e.g., ‘hasSimulation,hasResponse’); leave empty to use the entity’s own properties directly.
Input Columns Data Mode	input_cols_data	select	binary		Encoding mode for input columns: ‘binary’ (default) writes 1/0 for child presence and requires a non-empty relationship path; ‘properties’ writes the numeric/string property values of the resolved child nodes.
Input Columns Filter Key	input_cols_filter_key	text	—		Optional property name used to filter which child nodes contribute input columns (e.g., ‘name’); matching is case-insensitive substring containment; leave empty to include all children.
Input Columns Filter Value	input_cols_filter_value	text	—		Substring value to match against the property specified in input_cols_filter_key (e.g., ‘PERFORMANCE’); leave empty if no filtering is needed.
Target Columns Relationship Path	target_cols_relationship_path	text	—		Comma-separated RDF relationship path to the child nodes that become target/label columns (e.g., ‘hasSimulation,hasResponse’); leave empty to produce a dataset with no target columns.
Target Columns Data Mode	target_cols_data	select	properties		Encoding mode for target columns: ‘properties’ (default) writes child property values; ‘binary’ writes 1/0 for child presence and requires a non-empty relationship path.
Target Columns Filter Key	target_cols_filter_key	text	—		Optional property name used to filter which child nodes contribute target columns; uses substring containment matching; leave empty to include all children on the target path.
Target Columns Filter Value	target_cols_filter_value	text	—		Substring value to match against the property specified in target_cols_filter_key; leave empty if no filtering is needed on the target path.
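
The input rules in the table above (automatic store-name prefix, comma-separated relationship paths, substring filtering, and the path requirement for ‘binary’ mode) can be sketched as small helpers. The function names are hypothetical, and several details are assumptions not confirmed by this page: whitespace is stripped around names, an empty filter key or value disables filtering, and matching is case-insensitive (documented for the input filter only).

```python
PREFIX = "arc2_ontology_"


def normalize_store_name(name: str) -> str:
    """Add the 'arc2_ontology_' prefix unless the name already carries it."""
    name = name.strip()
    return name if name.startswith(PREFIX) else PREFIX + name


def parse_relationship_path(path: str) -> list[str]:
    """Split a comma-separated path into relationship names; '' gives []."""
    return [part.strip() for part in path.split(",") if part.strip()]


def child_matches(props: dict[str, object], key: str, value: str) -> bool:
    """Case-insensitive substring match on one property.

    Assumption: an empty key or value means no filtering (keep every child).
    """
    if not key or not value:
        return True
    return value.lower() in str(props.get(key, "")).lower()


def check_mode(mode: str, path: list[str]) -> None:
    """'binary' encodes child presence, so it needs a path to find children."""
    if mode == "binary" and not path:
        raise ValueError("binary mode requires a non-empty relationship path")


store = normalize_store_name("my_model")
path = parse_relationship_path("hasSimulation,hasResponse")
keep = child_matches({"name": "performance_peak"}, "name", "PERFORMANCE")
```

Because the default input mode is ‘binary’, leaving the input relationship path empty implies switching the mode to ‘properties’; `check_mode` makes that constraint explicit in the sketch.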

Outputs

Label	ID	Type	Description
Dataset	dataset	dataset	Tabular dataset (d3VIEW dataset type) with one row per entity instance and columns for entity ID, optional name, all resolved input features, and all resolved target labels.
Input Columns	input_columns	array	Ordered array of column name strings that correspond to the input (feature) columns in the extracted dataset, ready for direct use in ML worker feature-selection fields.
Target Columns	target_columns	array	Ordered array of column name strings that correspond to the target (label) columns in the extracted dataset, ready for direct use in ML worker target-selection fields.
Summary	summary	text	Human-readable text summary of the extraction result, including entity type queried, total row count, and number of input and target columns generated.
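
As a rough illustration of how the `input_columns` and `target_columns` outputs pair with the dataset, the sketch below splits rows into feature and target matrices. The d3VIEW dataset serialization is not described on this page, so the list-of-dicts shape is an assumption, and the column names and values are made up for the example.

```python
# Hypothetical worker outputs; real column names come from the ontology.
dataset = [
    {"id": "m1", "name": "Model A", "thickness": 1.2, "peak_force": 40.5},
    {"id": "m2", "name": "Model B", "thickness": 1.5, "peak_force": 47.1},
]
input_columns = ["thickness"]
target_columns = ["peak_force"]


def split_features_targets(rows, inputs, targets):
    """Return (X, y) as lists of lists, keeping the column order given."""
    X = [[row[c] for c in inputs] for row in rows]
    y = [[row[c] for c in targets] for row in rows]
    return X, y


X, y = split_features_targets(dataset, input_columns, target_columns)
```

The ID and name columns are left out of both matrices because they appear in neither column array.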

Disciplines

ai_ml.preprocessing
data.dataset.transform
platform.ontology

Auto-generated from platform schema. Worker id: ontology_dataset. Schema hash: b3bfaf1f5a0b. Hand-curated docs in workerexamples/ override this page when present.