.. _auto_ontology_dataset:

*ONTOLOGY DATASET EXTRACTOR*
============================

Extracts entities from an RDF/SPARQL ontology store and flattens them into a tabular dataset ready for machine learning. Each row represents one entity of the specified type. Columns are derived either from the entity's own properties or from related child entities reached through configurable relationship paths, and they are grouped into input-feature columns and target-label columns.
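
The flattening described above can be pictured with a small in-memory stand-in for the ontology. This is only a sketch of the idea, not the worker's implementation: the ``entities`` and ``links`` structures, the ``children`` and ``flatten`` helpers, and the column naming (child name for ``binary``, ``child.property`` for ``properties``) are all assumptions made for illustration, and the case of an empty relationship path is not covered.

.. code-block:: python

   # Hypothetical toy graph standing in for the RDF store.
   entities = {
       "m1": {"type": "SystemModel", "name": "Model A", "props": {}},
       "m2": {"type": "SystemModel", "name": "Model B", "props": {}},
       "s1": {"type": "Simulation", "name": "frontal", "props": {"peak_g": 31.2}},
       "s2": {"type": "Simulation", "name": "side", "props": {"peak_g": 27.9}},
   }
   links = {("m1", "hasSimulation"): ["s1", "s2"], ("m2", "hasSimulation"): ["s1"]}


   def children(entity_id, path):
       """Follow each relationship in `path` in turn and return the reached nodes."""
       frontier = [entity_id]
       for rel in path:
           frontier = [c for node in frontier for c in links.get((node, rel), [])]
       return frontier


   def flatten(entity_type, path, mode):
       """Build one row per entity of `entity_type`; columns come from child nodes."""
       row_ids = [eid for eid, e in entities.items() if e["type"] == entity_type]
       reached = {eid: children(eid, path) for eid in row_ids}
       # Column order: first appearance of each child across all rows.
       all_children = []
       for eid in row_ids:
           for c in reached[eid]:
               if c not in all_children:
                   all_children.append(c)
       rows = []
       for eid in row_ids:
           row = {"id": eid, "name": entities[eid]["name"]}
           for c in all_children:
               child = entities[c]
               present = c in reached[eid]
               if mode == "binary":
                   # 1/0 flag for whether this row entity has the child.
                   row[child["name"]] = 1 if present else 0
               else:
                   # "properties": copy the child's values, None when absent.
                   for prop, value in child["props"].items():
                       row[f"{child['name']}.{prop}"] = value if present else None
           rows.append(row)
       return rows


   binary_rows = flatten("SystemModel", ["hasSimulation"], "binary")
   property_rows = flatten("SystemModel", ["hasSimulation"], "properties")

In this toy setup both modes produce one row per ``SystemModel``; the two modes differ only in what is written into the child-derived columns.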

When to use
-----------

Tagged: ``arc2``, ``dataset``, ``feature_extraction``, ``ml_prep``, ``ontology``, ``rdf``, ``sparql``, ``tabular``.

Inputs
------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20 20 20

   * - Label
     - ID
     - Type
     - Default
     - Required
     - Description
   * - Ontology Store Name
     - ontology_store_name
     - text
     - —
     - ✓
     - Name of the ARC2 RDF ontology store to query; the prefix 'arc2_ontology_' is added automatically if omitted (e.g., enter 'my_model' to target the store 'arc2_ontology_my_model').
   * - Entity Type (Rows)
     - entity_type
     - text
     - —
     - ✓
     - RDF entity type whose instances become dataset rows (e.g., 'SystemModel'); must match a type present in the ontology store.
   * - Include Name Column
     - include_name_column
     - select
     - yes
     - 
     - Controls whether a human-readable name column is added alongside the entity ID column: 'yes' (default) includes both the ID and Name columns; 'no' includes the ID column only.
   * - Input Columns Relationship Path
     - input_cols_relationship_path
     - text
     - —
     - 
     - Comma-separated chain of RDF relationship names to traverse from the row entity to the child nodes that become input feature columns (e.g., 'hasSimulation,hasResponse'); leave empty to use the entity's own properties directly.
   * - Input Columns Data Mode
     - input_cols_data
     - select
     - binary
     - 
     - Encoding mode for input columns: 'binary' (default) writes 1/0 for child presence and requires a non-empty relationship path; 'properties' writes the numeric/string property values of the resolved child nodes.
   * - Input Columns Filter Key
     - input_cols_filter_key
     - text
     - —
     - 
     - Optional property name used to filter which child nodes contribute input columns (e.g., 'name'); matching is case-insensitive substring containment; leave empty to include all children.
   * - Input Columns Filter Value
     - input_cols_filter_value
     - text
     - —
     - 
     - Substring value to match against the property specified in input_cols_filter_key (e.g., 'PERFORMANCE'); leave empty if no filtering is needed.
   * - Target Columns Relationship Path
     - target_cols_relationship_path
     - text
     - —
     - 
     - Comma-separated RDF relationship path to the child nodes that become target/label columns (e.g., 'hasSimulation,hasResponse'); leave empty to produce a dataset with no target columns.
   * - Target Columns Data Mode
     - target_cols_data
     - select
     - properties
     - 
     - Encoding mode for target columns: 'properties' (default) writes child property values; 'binary' writes 1/0 for child presence and requires a non-empty relationship path.
   * - Target Columns Filter Key
     - target_cols_filter_key
     - text
     - —
     - 
     - Optional property name used to filter which child nodes contribute target columns; uses substring containment matching; leave empty to include all children on the target path.
   * - Target Columns Filter Value
     - target_cols_filter_value
     - text
     - —
     - 
     - Substring value to match against the property specified in target_cols_filter_key; leave empty if no filtering is needed on the target path.
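
The input conventions in the table (automatic store prefix, comma-separated paths, substring filters, and the path requirement of ``binary`` mode) can be sketched as plain helper functions. The function names below are hypothetical and the worker's real handling may differ. In particular, trimming whitespace is an assumption, and case-insensitive matching is documented only for the input filter, so applying it to the target filter is also an assumption.

.. code-block:: python

   STORE_PREFIX = "arc2_ontology_"


   def normalize_store_name(name):
       """Add the store prefix unless the name already carries it."""
       name = name.strip()
       return name if name.startswith(STORE_PREFIX) else STORE_PREFIX + name


   def parse_relationship_path(path):
       """Split 'hasSimulation,hasResponse' into an ordered list; '' gives []."""
       return [part.strip() for part in path.split(",") if part.strip()]


   def matches_filter(child_props, key, value):
       """Case-insensitive substring match; an empty key or value keeps every child."""
       if not key or not value:
           return True
       return value.lower() in str(child_props.get(key, "")).lower()


   def check_mode(mode, path):
       """Reject unknown modes and 'binary' mode without a relationship path."""
       if mode not in ("binary", "properties"):
           raise ValueError(f"unknown data mode: {mode!r}")
       if mode == "binary" and not path:
           raise ValueError("binary mode requires a non-empty relationship path")


   store = normalize_store_name("my_model")
   path = parse_relationship_path("hasSimulation,hasResponse")
   keep = matches_filter({"name": "peak_performance_1"}, "name", "PERFORMANCE")
   check_mode("binary", path)

With these example values, ``store`` is ``'arc2_ontology_my_model'``, ``path`` is a two-element list, and ``keep`` is ``True`` because the match ignores case.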

Outputs
-------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20

   * - Label
     - ID
     - Type
     - Description
   * - Dataset
     - dataset
     - dataset
     - Tabular dataset (d3VIEW dataset type) with one row per entity instance and columns for entity ID, optional name, all resolved input features, and all resolved target labels.
   * - Input Columns
     - input_columns
     - array
     - Ordered array of column name strings that correspond to the input (feature) columns in the extracted dataset, ready for direct use in ML worker feature-selection fields.
   * - Target Columns
     - target_columns
     - array
     - Ordered array of column name strings that correspond to the target (label) columns in the extracted dataset, ready for direct use in ML worker target-selection fields.
   * - Summary
     - summary
     - text
     - Human-readable text summary of the extraction result, including entity type queried, total row count, and number of input and target columns generated.
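
Downstream, the ``input_columns`` and ``target_columns`` arrays can be used to split the dataset into features and labels. The sketch below assumes the dataset has already been converted to a list of row dictionaries; that representation, the sample values, and the ``split_features_targets`` helper are illustrative assumptions, not the d3VIEW dataset format itself.

.. code-block:: python

   # Hypothetical worker outputs, shaped as the table above describes.
   dataset = [
       {"id": "m1", "name": "Model A", "thickness": 1.2, "peak_g": 31.2},
       {"id": "m2", "name": "Model B", "thickness": 1.5, "peak_g": 27.9},
   ]
   input_columns = ["thickness"]
   target_columns = ["peak_g"]


   def split_features_targets(rows, feature_names, target_names):
       """Return (X, y) as lists of lists, keeping the given column order."""
       X = [[row[c] for c in feature_names] for row in rows]
       y = [[row[c] for c in target_names] for row in rows]
       return X, y


   X, y = split_features_targets(dataset, input_columns, target_columns)

Because both arrays are ordered, the columns of ``X`` and ``y`` line up with the names in ``input_columns`` and ``target_columns``; the ID and name columns are left out of both.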

Disciplines
-----------

- ai_ml.preprocessing
- data.dataset.transform
- platform.ontology

.. raw:: html

   <hr style="margin-top:2em">
   <p style="font-size:11px;color:#888">
   Auto-generated from <code>platform</code> schema. Worker id: <code>ontology_dataset</code>. Schema hash: <code>fc7ecf5648e9</code>. Hand-curated docs in <code>workerexamples/</code> override this page when present.
   </p>