ONTOLOGY DATASET EXTRACTOR

Extracts entities from an RDF/SPARQL ontology store and flattens them into a tabular dataset ready for machine learning. Each row represents one entity of the specified type. Columns come either from the entity’s own properties or from related child entities reached by traversing configurable relationship paths, and they are grouped separately into input-feature columns and target-label columns.
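
The flattening described above can be sketched roughly as follows. This is a minimal illustration only: the in-memory ENTITIES dictionary, the traverse and flatten helpers, and the choice of child names as column names are all assumptions made for the example. The actual worker queries an ARC2 store, and its internal logic and column naming are not documented here.

```python
# Toy in-memory stand-in for an ontology store (hypothetical data, not the ARC2 API).
ENTITIES = {
    "sys1": {"type": "SystemModel", "name": "Model A",
             "props": {"mass": 12.5}, "rels": {"hasSimulation": ["sim1"]}},
    "sim1": {"type": "Simulation", "name": "Sim 1",
             "props": {}, "rels": {"hasResponse": ["r1", "r2"]}},
    "r1": {"type": "Response", "name": "PERFORMANCE_peak",
           "props": {"value": 3.2}, "rels": {}},
    "r2": {"type": "Response", "name": "COST_total",
           "props": {"value": 7.0}, "rels": {}},
}


def traverse(entity_id, path):
    """Follow a chain of relationship names and return the ids reached at the end."""
    frontier = [entity_id]
    for rel in path:
        frontier = [child
                    for eid in frontier
                    for child in ENTITIES[eid]["rels"].get(rel, [])]
    return frontier


def flatten(entity_type, target_path):
    """Build one row per entity of entity_type.

    Input columns come from the entity's own properties (empty input path);
    target columns come from the 'value' property of children on target_path.
    Column naming by child name is an assumption of this sketch.
    """
    rows = []
    for eid, entity in ENTITIES.items():
        if entity["type"] != entity_type:
            continue
        row = {"id": eid, "name": entity["name"]}
        row.update(entity["props"])
        for child_id in traverse(eid, target_path):
            child = ENTITIES[child_id]
            row[child["name"]] = child["props"].get("value")
        rows.append(row)
    return rows


rows = flatten("SystemModel", ["hasSimulation", "hasResponse"])
print(rows)
```

Here the single SystemModel entity becomes one row with its ID, name, one input column (mass), and two target columns reached through hasSimulation and hasResponse.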

When to use

Tagged: arc2, dataset, feature_extraction, ml_prep, ontology, rdf, sparql, tabular.

Inputs

Label | ID | Type | Default | Required | Description
Ontology Store Name | ontology_store_name | text |  |  | Name of the ARC2 RDF ontology store to query; the prefix ‘arc2_ontology_’ is added automatically if omitted (e.g., enter ‘my_model’ to target the store ‘arc2_ontology_my_model’).
Entity Type (Rows) | entity_type | text |  |  | RDF entity type whose instances become dataset rows (e.g., ‘SystemModel’); must match a type present in the ontology store.
Include Name Column | include_name_column | select | yes |  | Controls whether a human-readable name column is added alongside the entity ID column; default ‘yes’ includes both ID and Name, ‘no’ includes ID only.
Input Columns Relationship Path | input_cols_relationship_path | text |  |  | Comma-separated chain of RDF relationship names to traverse from the row entity to the child nodes that become input feature columns (e.g., ‘hasSimulation,hasResponse’); leave empty to use the entity’s own properties directly.
Input Columns Data Mode | input_cols_data | select | binary |  | Encoding mode for input columns: ‘binary’ (default) writes 1/0 for child presence and requires a non-empty relationship path; ‘properties’ writes the numeric/string property values of the resolved child nodes.
Input Columns Filter Key | input_cols_filter_key | text |  |  | Optional property name used to filter which child nodes contribute input columns (e.g., ‘name’); matching is case-insensitive substring containment; leave empty to include all children.
Input Columns Filter Value | input_cols_filter_value | text |  |  | Substring value to match against the property specified in input_cols_filter_key (e.g., ‘PERFORMANCE’); leave empty if no filtering is needed.
Target Columns Relationship Path | target_cols_relationship_path | text |  |  | Comma-separated RDF relationship path to the child nodes that become target/label columns (e.g., ‘hasSimulation,hasResponse’); leave empty to produce a dataset with no target columns.
Target Columns Data Mode | target_cols_data | select | properties |  | Encoding mode for target columns: ‘properties’ (default) writes child property values; ‘binary’ writes 1/0 for child presence and requires a non-empty relationship path.
Target Columns Filter Key | target_cols_filter_key | text |  |  | Optional property name used to filter which child nodes contribute target columns; uses substring containment matching; leave empty to include all children on the target path.
Target Columns Filter Value | target_cols_filter_value | text |  |  | Substring value to match against the property specified in target_cols_filter_key; leave empty if no filtering is needed on the target path.
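
The input rules in the table above (automatic store-name prefix, comma-separated relationship paths, substring filtering, and the path requirement for ‘binary’ mode) could be approximated as in the sketch below. All function names are hypothetical and exist only for this illustration. Trimming of whitespace, treating a missing property as an empty string, and applying case-insensitive matching to both filters are assumptions; the table states case-insensitivity only for the input filter.

```python
PREFIX = "arc2_ontology_"


def normalize_store_name(name):
    """Add the 'arc2_ontology_' prefix when the caller omitted it."""
    name = name.strip()
    return name if name.startswith(PREFIX) else PREFIX + name


def parse_relationship_path(path):
    """Split 'hasSimulation,hasResponse' into an ordered list of relationship names."""
    return [part.strip() for part in path.split(",") if part.strip()]


def matches_filter(child_props, key, value):
    """Case-insensitive substring containment; an empty key keeps every child."""
    if not key:
        return True
    return value.lower() in str(child_props.get(key, "")).lower()


def validate_mode(mode, path):
    """'binary' mode needs a non-empty relationship path; 'properties' does not."""
    if mode not in ("binary", "properties"):
        raise ValueError("unknown data mode: " + repr(mode))
    if mode == "binary" and not parse_relationship_path(path):
        raise ValueError("binary mode requires a non-empty relationship path")


print(normalize_store_name("my_model"))
print(parse_relationship_path("hasSimulation,hasResponse"))
print(matches_filter({"name": "Peak_Performance_1"}, "name", "PERFORMANCE"))
```

An empty relationship path parses to an empty list, which corresponds to using the row entity’s own properties for input columns or to producing no target columns.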

Outputs

Label | ID | Type | Description
Dataset | dataset | dataset | Tabular dataset (d3VIEW dataset type) with one row per entity instance and columns for entity ID, optional name, all resolved input features, and all resolved target labels.
Input Columns | input_columns | array | Ordered array of column name strings that correspond to the input (feature) columns in the extracted dataset, ready for direct use in ML worker feature-selection fields.
Target Columns | target_columns | array | Ordered array of column name strings that correspond to the target (label) columns in the extracted dataset, ready for direct use in ML worker target-selection fields.
Summary | summary | text | Human-readable text summary of the extraction result, including entity type queried, total row count, and number of input and target columns generated.
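
As a rough illustration of how the three structured outputs fit together, the sketch below splits a dataset into feature and label matrices using the two column arrays. It assumes the dataset can be represented as a list of row dictionaries; the actual d3VIEW dataset type may be serialized differently. The sample rows, column names, and the split_features_and_targets helper are hypothetical.

```python
def split_features_and_targets(dataset, input_columns, target_columns):
    """Return (X, y) as lists of lists, preserving the order of the column arrays."""
    X = [[row.get(col) for col in input_columns] for row in dataset]
    y = [[row.get(col) for col in target_columns] for row in dataset]
    return X, y


# Hypothetical worker outputs, shaped as described in the table above.
dataset = [
    {"id": "sys1", "name": "Model A", "mass": 12.5, "PERFORMANCE_peak": 3.2},
    {"id": "sys2", "name": "Model B", "mass": 14.0, "PERFORMANCE_peak": 2.9},
]
input_columns = ["mass"]
target_columns = ["PERFORMANCE_peak"]

X, y = split_features_and_targets(dataset, input_columns, target_columns)
print(X)
print(y)
```

Because the ID and name columns appear in neither array, they are left out of both matrices without any extra handling.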

Disciplines

  • ai_ml.preprocessing
  • data.dataset.transform
  • platform.ontology

Auto-generated from platform schema. Worker id: ontology_dataset. Schema hash: fc7ecf5648e9. Hand-curated docs in workerexamples/ override this page when present.