.. _auto_doe_sampling_point_generator:

*GENERATE SAMPLING POINTS*
==========================

Generates a structured set of sampling points (experiments) from a defined variable space using a choice of classical and advanced DOE strategies — including LHS, full factorial, D-Optimal, Taguchi, and Definitive Screening Design. Use this worker to create the design matrix at the start of any design-exploration, surrogate-training, or sensitivity-analysis workflow.
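
As a rough illustration of two of the strategies named above, the sketch below builds a full-factorial grid and a Latin hypercube sample in plain Python. It is not the worker's implementation: the function names, the ``bounds`` dictionary, and the evenly spaced levels are assumptions made for this example.

.. code-block:: python

   import itertools
   import random


   def levels(lo, hi, k):
       """Return k evenly spaced levels from lo to hi, endpoints included."""
       if k < 2:
           raise ValueError("need at least two levels per variable")
       return [lo + (hi - lo) * i / (k - 1) for i in range(k)]


   def full_factorial(bounds, points_per_variable=3):
       """Every combination of levels: points_per_variable ** len(bounds) runs."""
       names = list(bounds)
       grids = [levels(*bounds[n], points_per_variable) for n in names]
       return [dict(zip(names, combo)) for combo in itertools.product(*grids)]


   def latin_hypercube(bounds, num_experiments=10, seed=0):
       """One point per equal-width stratum of each variable, paired at random."""
       rng = random.Random(seed)
       columns = {}
       for name, (lo, hi) in bounds.items():
           # Draw one value inside each of the num_experiments strata of [0, 1).
           unit = [(i + rng.random()) / num_experiments
                   for i in range(num_experiments)]
           rng.shuffle(unit)  # decouple the strata order between variables
           columns[name] = [lo + u * (hi - lo) for u in unit]
       return [{n: columns[n][i] for n in bounds} for i in range(num_experiments)]


   # Hypothetical variable space: name -> (min, max)
   bounds = {"X1": (0.0, 1.0), "X2": (10.0, 20.0)}
   grid = full_factorial(bounds, points_per_variable=3)
   lhs = latin_hypercube(bounds, num_experiments=10)
   print(len(grid), len(lhs))  # 9 10

The full-factorial run count is fixed by the number of levels and variables (3 levels for 2 variables gives 9 runs), whereas the Latin hypercube produces exactly the requested number of runs.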

When to use
-----------

Tagged: ``d-optimal``, ``design_matrix``, ``design_of_experiments``, ``doe``, ``dsd``, ``fractional_factorial``, ``full_factorial``, ``lhs``.

Inputs
------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20 20 20

   * - Label
     - ID
     - Type
     - Default
     - Required
     - Description
   * - Variables
     - variables
     - dataset
     - *(complex)*
     - ✓
     - Dataset defining each design variable — provide name, type (continuous / discrete / constant), and min/default/max bounds; at least one variable is required to build the design space.
   * - Sampling Type
     - sampling_type
     - select
     - d-opt
     - ✓
     - DOE strategy to use for generating points; choose one or more from D-Optimal, Full Factorial, LHS, D-Optimal GA, Space Filling, Taguchi Matrix, Definitive Screening Design, or Fractional Factorial — defaults to D-Optimal.
   * - Points Per Variable
     - num_points_per_variable
     - scalar
     - 3
     - ✓
     - Number of discrete level values sampled per variable when constructing the candidate set; default is 3 — increase for finer resolution in factorial-style methods.
   * - Number Of Experiments
     - num_experiments
     - scalar
     - 10
     - ✓
     - Total number of experimental runs (rows) to select for the final design matrix; default is 10 — ignored by methods that fix run count (e.g. full factorial).
   * - Normalize Variables
     - normalize
     - select
     - no
     - 
     - Whether to normalize all variable values to [0, 1] before generating points; default is 'no' — set to 'yes' when variables have very different scales.
   * - Use Per Variable Samples
     - use_per_variable_num_samples
     - select
     - no
     - 
     - Whether to use each variable's own sample-count setting instead of the global num_points_per_variable; default is 'no' — set to 'yes' when variables require different resolution levels.
   * - Split Input Ranges
     - split_input_ranges
     - select
     - no
     - 
     - When yes, split each eligible continuous variable's [min,max] into sub-ranges (size controlled by step_percentage) and run DOE for each bucket, aggregating the results. A variable is eligible when its max/min ratio exceeds 100/step_percentage. Variables listed in log_variables are split in log-space automatically (bucket boundaries become log-spaced in the original domain), which is recommended for variables spanning multiple orders of magnitude.
   * - Step Percentage
     - step_percentage
     - scalar
     - 20
     - 
     - Size of each sub-range as a percentage of the variable's full range. Also sets the eligibility threshold: a variable is split when max/min > 100/step_percentage. Default 20 gives 5 buckets and a 5x span threshold.
   * - Apply Split To All Variables
     - apply_split_to_all_variables
     - select
     - no
     - 
     - When yes, split every continuous variable regardless of how wide its range is. When no, split only variables whose max/min ratio exceeds 100/step_percentage.
   * - Constraints
     - constraints
     - dataset
     - —
     - 
     - Dataset with the columns needle, condition, and target, where needle is the variable to check, condition is the comparison operator (gt, lt, etc.), and target is the value to compare against. For example, to require X2 to be greater than X1, set needle=X2, condition=GT, target=X1. `View more <https://www.d3view.com/docs/master/workflows/Glossary.html#datasetinput>`_
   * - Filter Constants
     - filter_constants
     - select
     - no
     - 
     - Exclude constant variables prior to sampling
   * - Reverse Constraints Order
     - reverse_contraints_order
     - select
     - yes
     - 
     - 
   * - Design Prefix
     - prefix
     - scalar
     - ITER_
     - ✓
     - Design Prefix
   * - Design Iteration
     - design_iter
     - scalar
     - 1
     - ✓
     - Design Iteration
   * - Type of Design
     - design_type
     - select
     - no
     - 
     - Return just the first design
   * - Merge points based on Proximity
     - proximity_merge
     - select
     - no
     - 
     - When multiple sampling schemes are selected, merges closely neighboring points based on Euclidean distance.
   * - Proximity Merge Threshold
     - proximity_merge_threshold
     - scalar
     - 0.01
     - 
     - The Euclidean distance for each row is multiplied by this number to get the threshold value. Rows whose values are within this tolerance are replaced with the averaged values.
   * - Proximity Treatment
     - proximity_treatment
     - string
     - merge
     - 
     - How to treat points that are close to each other. By default they are merged; alternatively, they can be removed.
   * - Baseline Detection
     - baseline_type
     - string
     - value_match
     - 
     - Selects how the baseline design is picked from the generated experiments. If no match is found, the first row is used as the baseline design.
   * - Cross Sample Size
     - num_cross_size
     - text
     - —
     - 
     - Number for the Cross Sample Size
   * - Log Variables
     - log_variables
     - textarea
     - —
     - 
     - Comma- or newline-separated list of variable names to sample in log space. Their min/max/default are log-transformed before sampling and exp-transformed back on output. When split_input_ranges is yes, these variables also get log-spaced buckets (use for variables like beta that span several orders of magnitude).
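
The interaction between ``split_input_ranges``, ``step_percentage``, ``apply_split_to_all_variables``, and ``log_variables`` can be sketched as follows. This is a minimal illustration of the rules stated in the table, not the worker's code: the ``split_range`` helper, its ``force`` flag (standing in for ``apply_split_to_all_variables``), and the rounding of the bucket count are assumptions.

.. code-block:: python

   import math


   def split_range(vmin, vmax, step_percentage=20.0, log_space=False, force=False):
       """Return a list of (low, high) buckets for one continuous variable."""
       threshold = 100.0 / step_percentage  # default 20 -> 5x span threshold
       # Eligibility rule from the table: max/min must exceed the threshold.
       # Assumption: a non-positive minimum makes the ratio undefined, so the
       # variable is only split when forced.
       eligible = force or (vmin > 0 and vmax / vmin > threshold)
       if not eligible:
           return [(vmin, vmax)]
       if log_space and vmin <= 0:
           raise ValueError("log-space splitting needs a positive minimum")

       # Assumption: the bucket count is 100/step_percentage, rounded.
       n_buckets = max(1, round(100.0 / step_percentage))
       lo, hi = (math.log(vmin), math.log(vmax)) if log_space else (vmin, vmax)
       edges = [lo + (hi - lo) * i / n_buckets for i in range(n_buckets + 1)]
       if log_space:
           edges = [math.exp(e) for e in edges]  # back to the original domain
       edges[0], edges[-1] = vmin, vmax  # pin the ends against round-off
       return list(zip(edges[:-1], edges[1:]))


   print(len(split_range(1.0, 100.0)))               # 5 (ratio 100 > 5)
   print(len(split_range(10.0, 20.0)))               # 1 (ratio 2, not split)
   print(len(split_range(10.0, 20.0, force=True)))   # 5
   print(len(split_range(1e-3, 1e2, log_space=True)))  # 5, edges one decade apart

With the log-space option, a variable spanning 1e-3 to 1e2 gets one bucket per decade instead of five linear buckets that would crowd almost all of the small values into the first bucket.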

Outputs
-------

.. list-table::
   :header-rows: 1
   :widths: 20 20 20 20

   * - Label
     - ID
     - Type
     - Description
   * - Experiments
     - experiments
     - dataset
     - Dataset of generated experimental runs — each row is one simulation/test point with columns corresponding to the input variable names and their assigned values.
   * - Baseline Design
     - baseline_design
     - dataset
     - Dataset containing the baseline (default) design point derived from the variable definitions — useful as a reference run for delta-comparisons and normalization.
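
One plausible reading of how the two outputs relate is sketched below: the baseline is the generated row whose values match each variable's default, with the first row as the fallback described for ``baseline_type``. The ``pick_baseline`` helper, the tolerance, and the exact meaning of ``value_match`` are assumptions, not the documented behavior.

.. code-block:: python

   import math


   def pick_baseline(experiments, defaults, rel_tol=1e-9):
       """Return the row matching every default value, else the first row."""
       if not experiments:
           raise ValueError("no experiments to choose a baseline from")
       for row in experiments:
           if all(math.isclose(row[name], value, rel_tol=rel_tol)
                  for name, value in defaults.items()):
               return row
       return experiments[0]  # fallback: first generated row


   # Hypothetical experiments and variable defaults
   experiments = [
       {"X1": 0.0, "X2": 5.0},
       {"X1": 1.0, "X2": 2.0},
   ]
   print(pick_baseline(experiments, {"X1": 1.0, "X2": 2.0}))  # second row
   print(pick_baseline(experiments, {"X1": 9.0, "X2": 9.0}))  # first row (no match)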

Disciplines
-----------

- design_exploration.doe

.. raw:: html

   <hr style="margin-top:2em">
   <p style="font-size:11px;color:#888">
   Auto-generated from <code>platform</code> schema. Worker id: <code>doe_sampling_point_generator</code>. Schema hash: <code>aebc6d685ac6</code>. Hand-curated docs in <code>workerexamples/</code> override this page when present.
   </p>