ONTOLOGY DATASET EXTRACTOR
Extracts entities from an RDF/SPARQL ontology store and flattens them into a tabular dataset ready for machine learning. Each row represents one entity of the specified type; columns are derived from the entity’s own properties or from related child entities traversed via configurable relationship paths, with separate input-feature and target-label column groups.
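The flattening described above can be pictured with a minimal sketch. This is not the worker's implementation: the in-memory `GRAPH` dictionary, the helper names `resolve_children` and `flatten`, the sample entities, and the choice to name each column after the child's `name` property are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for an ontology store: every entity has a
# type, scalar properties, and named relationships pointing at child IDs.
GRAPH: Dict[str, dict] = {
    "m1": {"type": "SystemModel", "props": {"name": "Model A"},
           "rels": {"hasSimulation": ["s1"]}},
    "m2": {"type": "SystemModel", "props": {"name": "Model B"},
           "rels": {"hasSimulation": ["s2"]}},
    "s1": {"type": "Simulation", "props": {"name": "Sim 1"},
           "rels": {"hasResponse": ["r1", "r2"]}},
    "s2": {"type": "Simulation", "props": {"name": "Sim 2"},
           "rels": {"hasResponse": ["r3"]}},
    "r1": {"type": "Response", "props": {"name": "PERFORMANCE_mass", "value": 12.5}, "rels": {}},
    "r2": {"type": "Response", "props": {"name": "COST_total", "value": 300.0}, "rels": {}},
    "r3": {"type": "Response", "props": {"name": "PERFORMANCE_mass", "value": 14.0}, "rels": {}},
}


def resolve_children(entity_id: str, path: List[str]) -> List[str]:
    """Follow each relationship in `path` in order and return the final child IDs."""
    frontier = [entity_id]
    for rel in path:
        frontier = [child for node in frontier
                    for child in GRAPH[node]["rels"].get(rel, [])]
    return frontier


def flatten(entity_type: str, path: List[str]) -> List[dict]:
    """Build one row per entity of `entity_type`, one column per resolved child."""
    rows = []
    for entity_id, node in GRAPH.items():
        if node["type"] != entity_type:
            continue
        row = {"id": entity_id, "name": node["props"].get("name")}
        for child_id in resolve_children(entity_id, path):
            child = GRAPH[child_id]["props"]
            # Assumption: the column is named after the child's `name` property.
            row[child["name"]] = child["value"]
        rows.append(row)
    return rows


rows = flatten("SystemModel", ["hasSimulation", "hasResponse"])
for row in rows:
    print(row)
```

In this sketch an entity with no matching child simply lacks that column; how the worker fills such gaps is not stated here.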
When to use
Tagged: arc2, dataset, feature_extraction, ml_prep, ontology, rdf, sparql, tabular.
Inputs
| Label | ID | Type | Default | Required | Description |
|---|---|---|---|---|---|
| Ontology Store Name | ontology_store_name | text | — | ✓ | Name of the ARC2 RDF ontology store to query; the prefix ‘arc2_ontology_’ is added automatically if omitted (e.g., enter ‘my_model’ to target the store ‘arc2_ontology_my_model’). |
| Entity Type (Rows) | entity_type | text | — | ✓ | RDF entity type whose instances become dataset rows (e.g., ‘SystemModel’); must match a type present in the ontology store. |
| Include Name Column | include_name_column | select | yes | | Controls whether a human-readable name column is added alongside the entity ID column; default ‘yes’ includes both ID and Name, ‘no’ includes ID only. |
| Input Columns Relationship Path | input_cols_relationship_path | text | — | | Comma-separated chain of RDF relationship names to traverse from the row entity to the child nodes that become input feature columns (e.g., ‘hasSimulation,hasResponse’); leave empty to use the entity’s own properties directly. |
| Input Columns Data Mode | input_cols_data | select | binary | | Encoding mode for input columns: ‘binary’ (default) writes 1/0 for child presence and requires a non-empty relationship path; ‘properties’ writes the numeric/string property values of the resolved child nodes. |
| Input Columns Filter Key | input_cols_filter_key | text | — | | Optional property name used to filter which child nodes contribute input columns (e.g., ‘name’); matching is case-insensitive substring containment; leave empty to include all children. |
| Input Columns Filter Value | input_cols_filter_value | text | — | | Substring value to match against the property specified in input_cols_filter_key (e.g., ‘PERFORMANCE’); leave empty if no filtering is needed. |
| Target Columns Relationship Path | target_cols_relationship_path | text | — | | Comma-separated RDF relationship path to the child nodes that become target/label columns (e.g., ‘hasSimulation,hasResponse’); leave empty to produce a dataset with no target columns. |
| Target Columns Data Mode | target_cols_data | select | properties | | Encoding mode for target columns: ‘properties’ (default) writes child property values; ‘binary’ writes 1/0 for child presence and requires a non-empty relationship path. |
| Target Columns Filter Key | target_cols_filter_key | text | — | | Optional property name used to filter which child nodes contribute target columns; uses substring containment matching; leave empty to include all children on the target path. |
| Target Columns Filter Value | target_cols_filter_value | text | — | | Substring value to match against the property specified in target_cols_filter_key; leave empty if no filtering is needed on the target path. |
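
Several of the input rules in the table can be restated as small helpers. The sketch below is an illustration only, not the worker's code: the function names are hypothetical, and trimming surrounding whitespace is an assumption. The filter helper follows the case-insensitive substring rule documented for the input filter; the table does not state case handling for the target filter.

```python
from typing import Dict, List

STORE_PREFIX = "arc2_ontology_"


def normalize_store_name(name: str) -> str:
    """Add the 'arc2_ontology_' prefix when it has been omitted."""
    name = name.strip()  # assumption: surrounding whitespace is ignored
    return name if name.startswith(STORE_PREFIX) else STORE_PREFIX + name


def parse_relationship_path(path: str) -> List[str]:
    """Split a comma-separated path; an empty string yields an empty list."""
    return [part.strip() for part in path.split(",") if part.strip()]


def matches_filter(props: Dict[str, object], key: str, value: str) -> bool:
    """Case-insensitive substring match; an empty key keeps every child."""
    if not key:
        return True
    return value.lower() in str(props.get(key, "")).lower()


def validate_mode(mode: str, path: List[str]) -> None:
    """Reject unknown modes and 'binary' mode without a relationship path."""
    if mode not in ("binary", "properties"):
        raise ValueError(f"unknown data mode: {mode!r}")
    if mode == "binary" and not path:
        raise ValueError("'binary' mode requires a non-empty relationship path")


print(normalize_store_name("my_model"))
print(parse_relationship_path("hasSimulation,hasResponse"))
print(matches_filter({"name": "Performance_mass"}, "name", "PERFORMANCE"))
```

Note that with these rules, the default input mode ‘binary’ combined with an empty input relationship path is rejected, so using the entity’s own properties as inputs implies selecting ‘properties’.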
Outputs
| Label | ID | Type | Description |
|---|---|---|---|
| Dataset | dataset | dataset | Tabular dataset (d3VIEW dataset type) with one row per entity instance and columns for entity ID, optional name, all resolved input features, and all resolved target labels. |
| Input Columns | input_columns | array | Ordered array of column name strings that correspond to the input (feature) columns in the extracted dataset, ready for direct use in ML worker feature-selection fields. |
| Target Columns | target_columns | array | Ordered array of column name strings that correspond to the target (label) columns in the extracted dataset, ready for direct use in ML worker target-selection fields. |
| Summary | summary | text | Human-readable text summary of the extraction result, including entity type queried, total row count, and number of input and target columns generated. |
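
As a rough illustration of how the `input_columns` and `target_columns` arrays relate to the dataset, the sketch below splits rows into feature and label matrices. It assumes the dataset can be represented as a list of dictionaries keyed by column name; the actual d3VIEW dataset type may differ, and `split_features_targets`, the sample rows, and the column names are hypothetical.

```python
from typing import Dict, List, Tuple


def split_features_targets(
    dataset: List[Dict[str, object]],
    input_columns: List[str],
    target_columns: List[str],
) -> Tuple[List[list], List[list]]:
    """Select feature and label values per row, preserving column order."""
    X = [[row.get(col) for col in input_columns] for row in dataset]
    y = [[row.get(col) for col in target_columns] for row in dataset]
    return X, y


# Hypothetical extraction result: ID, name, one input column, one target column.
dataset = [
    {"id": "m1", "name": "Model A", "thickness": 1.2, "PERFORMANCE_mass": 12.5},
    {"id": "m2", "name": "Model B", "thickness": 1.5, "PERFORMANCE_mass": 14.0},
]
input_columns = ["thickness"]
target_columns = ["PERFORMANCE_mass"]

X, y = split_features_targets(dataset, input_columns, target_columns)
print(X)
print(y)
```

Because the two arrays list column names in order, the ID and name columns stay out of both matrices without extra bookkeeping.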
Disciplines
- ai_ml.preprocessing
- data.dataset.transform
- platform.ontology
Auto-generated from platform schema. Worker id: ontology_dataset. Schema hash: fc7ecf5648e9. Hand-curated docs in workerexamples/ override this page when present.