.. _robustanalysisworkflow:

Robustness Analysis Workflow
===============================


Introduction
------------

Data exploration and analysis are key to understanding a dataset and gaining insights
from it. The Generic Data Analyzer workflow uses machine learning techniques to explore
and analyze datasets. When executing the workflow, users choose a task and receive
PPT slides summarizing the findings of that task.

Robust Parameter Design (RPD) is a methodology that focuses on making designs
insensitive to noise factors. Noise factors are variables whose values cannot be
consistently controlled. Instead of eliminating these factors, RPD seeks control
factor settings that minimize the variation of each target response across the
noise ranges while keeping its mean at the desired target value.
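
As a rough illustration of this trade-off, the following Python sketch evaluates a
handful of control settings against a shared set of noise samples and keeps the
setting with the best combined score. It is not the workflow's implementation: the
``response`` function, the candidate control values, the target of 12.0, and the score
(squared offset of the mean from the target plus the variance) are all assumptions
made for the example.

.. code-block:: python

   import random
   import statistics

   random.seed(0)

   def response(control, noise):
       # Hypothetical system: the effect of noise grows with the control value.
       return 10.0 + 2.0 * control + control * noise

   target = 12.0                                # assumed target value
   controls = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]    # candidate control settings
   # Noise samples shared by all control settings.
   noise = [random.gauss(0.0, 1.0) for _ in range(500)]

   results = []
   for c in controls:
       y = [response(c, z) for z in noise]
       mean, std = statistics.mean(y), statistics.stdev(y)
       # Penalize both the offset from the target and the spread caused by noise.
       score = (mean - target) ** 2 + std ** 2
       results.append((score, c, mean, std))

   best_score, best_control, best_mean, best_std = min(results)
   print(f"control={best_control}, mean={best_mean:.2f}, std={best_std:.2f}")

In this toy system the spread grows with the control value, so the selected setting
balances a small standard deviation against a mean close to the target.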

Model-based Monte Carlo Reliability Analysis (MMCRA) estimates the probability
that a design will meet requirements under uncertainty caused by noise.
A machine learning model is trained to represent system behavior, while uncertain
inputs are sampled from probability distributions. A large number of random samples
are propagated through the model to evaluate target variable performance.
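
The procedure can be sketched in a few lines of Python. The ``surrogate`` function
below merely stands in for a trained ML model, and the input distributions and the
requirement limits are hypothetical values chosen for the example.

.. code-block:: python

   import random

   random.seed(1)

   def surrogate(x1, x2):
       # Stand-in for a trained ML model (hypothetical closed-form response).
       return 3.0 * x1 + 0.5 * x2 ** 2

   lower, upper = 4.5, 5.5      # assumed requirement on the target variable
   n_samples = 50_000
   n_pass = 0
   for _ in range(n_samples):
       x1 = random.gauss(1.0, 0.05)      # normally distributed input
       x2 = random.uniform(1.8, 2.2)     # uniformly distributed input
       if lower <= surrogate(x1, x2) <= upper:
           n_pass += 1

   # Fraction of samples that satisfy the requirement.
   reliability = n_pass / n_samples
   print(f"estimated probability of meeting the requirement: {reliability:.3f}")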

Pre-requisites
--------------

Before using this workflow, users should be familiar with Workflow applications and
Workers in general, and have a working knowledge of robustness analysis, reliability
analysis, and machine learning principles.

Please contact support@d3view.com for more information.

Main Tasks
----------

The workflow provides the following two tasks:

1. Robust Parameter Design (RPD)
2. ML Model Based Monte Carlo Reliability Analysis (MMCRA)

Workflow Inputs
---------------

When executing the workflow, the **START worker** prompts users to update inputs
and settings.

Inputs fall into three main groups:

1. Generic Inputs
2. Robust Parameter Design Inputs
3. Model Based Monte Carlo Reliability Analysis Inputs

Generic Inputs
--------------

The following inputs are required for all tasks.

+-------------------+----------------------------------------------+
| Input             | Description                                  |
+===================+==============================================+
| Main Task         | Task to perform                              |
+-------------------+----------------------------------------------+
| Input Dataset     | Dataset used for DOE optimization (RPD) and  |
|                   | ML training (RPD and MMCRA)                  |
+-------------------+----------------------------------------------+
| Input Columns     | Columns representing input variables         |
+-------------------+----------------------------------------------+
| Target Columns    | Columns representing output variables        |
+-------------------+----------------------------------------------+

.. thumbnail:: /_images/Images/102workflowinputs.png
   :title:

   Generic Inputs used for both tasks

|

Robust Parameter Design Inputs
------------------------------

The following inputs are used for the RPD task.

+---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Input                                 | Description                                                                                                                                                                                                                                                      |
+=======================================+==================================================================================================================================================================================================================================================================+
| Noise Columns                         | Columns that are noise factors.                                                                                                                                                                                                                                  |
+---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Noise Columns Distribution            | A dataset that specifies the probability distribution for each noise factor. The dataset must contain the following columns in the same order:                                                                                                                 |
|                                       |                                                                                                                                                                                                                                                                  |
|                                       | 1. **name**                                                                                                                                                                                                                                                      |
|                                       |    This is the column name of the noise factor.                                                                                                                                                                                                                  |
|                                       |                                                                                                                                                                                                                                                                  |
|                                       | 2. **distribution**                                                                                                                                                                                                                                              |
|                                       |    Keyword for probability distribution. Currently, ``norm`` and ``uniform`` are supported for Normal distribution and Uniform distribution.                                                                                                                    |
|                                       |                                                                                                                                                                                                                                                                  |
|                                       | 3. **mean**                                                                                                                                                                                                                                                      |
|                                       |    Normal distribution parameter: mean. This is required when ``norm`` is specified in the ``distribution`` column.                                                                                                                                            |
|                                       |                                                                                                                                                                                                                                                                  |
|                                       | 4. **sd**                                                                                                                                                                                                                                                        |
|                                       |    Normal distribution parameter: standard deviation. This is required when ``norm`` is specified in the ``distribution`` column.                                                                                                                              |
|                                       |                                                                                                                                                                                                                                                                  |
|                                       | 5. **lower**                                                                                                                                                                                                                                                     |
|                                       |    Uniform distribution parameter: lower bound. This is required when ``uniform`` is specified in the ``distribution`` column.                                                                                                                                 |
|                                       |                                                                                                                                                                                                                                                                  |
|                                       | 6. **upper**                                                                                                                                                                                                                                                     |
|                                       |    Uniform distribution parameter: upper bound. This is required when ``uniform`` is specified in the ``distribution`` column.                                                                                                                                 |
|                                       |                                                                                                                                                                                                                                                                  |
|                                       | 7. ...                                                                                                                                                                                                                                                           |
+---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Output Target Values                  | A dataset that provides target values for each target variable to optimize. The dataset should have two rows. The first row contains the target column names. The second row contains the target values for each target variable specified in the "Target Columns". |
+---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Number of Cross Sample Size           | Number of random samples of noise factors for each combination of control variable values. A duplicate of each control variable combination will be created with different noise factor values to evaluate the mean and standard deviation of target responses. |
+---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Number of Points for Distribution     | A specified number of sampling points of noise factors will be generated at the optimal record at each iteration. This dataset is used to display the distribution of target responses at the current optimal control variable values.                     |
| Display                               |                                                                                                                                                                                                                                                                  |
+---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


.. thumbnail:: /_images/Images/102sampledatasetdis.png
   :title:

   Sample dataset for specifying sampling distribution for each variable

|

A quick preview of the provided distributions is available through the “Show Distribution Grid” option.
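
Outside the workflow, a distribution table of this shape can also be sanity-checked
with a short Python script. The sketch below uses only the standard library; the
variable names ``thickness`` and ``friction``, their parameter values, and the helper
``sample_noise`` are hypothetical and not part of d3VIEW. Only the six documented
columns are included.

.. code-block:: python

   import csv
   import io
   import random
   import statistics

   # Hypothetical distribution table using the documented column order.
   TABLE = """name,distribution,mean,sd,lower,upper
   thickness,norm,2.0,0.05,,
   friction,uniform,,,0.1,0.3
   """.replace("   ", "")

   def sample_noise(table_text, n, seed=0):
       """Draw n samples per row according to its distribution keyword."""
       rng = random.Random(seed)
       samples = {}
       for row in csv.DictReader(io.StringIO(table_text)):
           if row["distribution"] == "norm":
               mean, sd = float(row["mean"]), float(row["sd"])
               samples[row["name"]] = [rng.gauss(mean, sd) for _ in range(n)]
           elif row["distribution"] == "uniform":
               lo, hi = float(row["lower"]), float(row["upper"])
               samples[row["name"]] = [rng.uniform(lo, hi) for _ in range(n)]
           else:
               raise ValueError(f"unsupported distribution: {row['distribution']}")
       return samples

   preview = sample_noise(TABLE, n=10_000)
   for name, values in preview.items():
       print(name, round(statistics.mean(values), 3),
             round(min(values), 3), round(max(values), 3))

Rows with an unknown keyword raise an error rather than being skipped, which makes
typos in the ``distribution`` column easy to spot.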

.. thumbnail:: /_images/Images/102sampledaquickpreview.jpg
   :title:

   Options to view distribution plots

|


.. thumbnail:: /_images/Images/102samplprobabilitydistribution.png
   :title:

   Previews of the probability distributions specified in the dataset. Users can update the distribution parameters while inspecting the preview.

|


Model Based Monte Carlo Reliability Analysis Inputs
---------------------------------------------------

The following inputs and settings are used for the MMCRA task. This task examines how noise in the input values affects the target values in the neighborhood of a given design point. The standard deviation of each input distribution is therefore usually set to a relatively small value. Ideally, the target values of the vast majority of sampling points fall within the thresholds defined by the “Target Variables Lower and Upper Bounds” input dataset.
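
The effect of the input standard deviation on the share of samples inside the bounds
can be illustrated with a small experiment. Everything in the sketch below is an
assumption made for the example: the linear ``surrogate`` stands in for a trained ML
model, and the bounds row, the nominal input value of 1.0, and the tested standard
deviations are invented.

.. code-block:: python

   import random
   import statistics

   def surrogate(x):
       # Stand-in for a trained ML model (hypothetical).
       return 50.0 + 20.0 * (x - 1.0)

   # Hypothetical row of a bounds dataset with name, lower, and upper columns.
   bounds = {"name": "stress", "lower": 49.0, "upper": 51.0}
   n_sigma = 3   # multiple of sigma at which datum lines would be drawn

   def analyze(sd, n=20_000, seed=0):
       """Return the fraction of samples inside the bounds and the n-sigma marks."""
       rng = random.Random(seed)
       y = [surrogate(rng.gauss(1.0, sd)) for _ in range(n)]
       inside = sum(bounds["lower"] <= v <= bounds["upper"] for v in y) / n
       mu, sigma = statistics.mean(y), statistics.stdev(y)
       marks = (mu - n_sigma * sigma, mu + n_sigma * sigma)
       return inside, marks

   for sd in (0.01, 0.03, 0.10):
       inside, marks = analyze(sd)
       print(f"sd={sd}: fraction inside bounds={inside:.3f}, "
             f"{n_sigma}-sigma marks=({marks[0]:.2f}, {marks[1]:.2f})")

With the smallest standard deviation essentially all samples stay inside the bounds,
while the largest one pushes most of them outside.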

+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Inputs                                    | Description                                                                                                                                                                                                                                                      |
+===========================================+==================================================================================================================================================================================================================================================================+
| Input Variable Distributions              | A dataset that specifies the probability distribution for each noise factor. The dataset must contain the following columns in the same order:                                                                                                                 |
|                                           |                                                                                                                                                                                                                                                                  |
|                                           | 1. **name**                                                                                                                                                                                                                                                      |
|                                           |    This is the column name of the noise factor.                                                                                                                                                                                                                  |
|                                           |                                                                                                                                                                                                                                                                  |
|                                           | 2. **distribution**                                                                                                                                                                                                                                              |
|                                           |    Keyword for probability distribution. Currently, ``norm`` and ``uniform`` are supported for Normal distribution and Uniform distribution.                                                                                                                    |
|                                           |                                                                                                                                                                                                                                                                  |
|                                           | 3. **mean**                                                                                                                                                                                                                                                      |
|                                           |    Normal distribution parameter: mean. This is required when ``norm`` is specified in the ``distribution`` column.                                                                                                                                            |
|                                           |                                                                                                                                                                                                                                                                  |
|                                           | 4. **sd**                                                                                                                                                                                                                                                        |
|                                           |    Normal distribution parameter: standard deviation. This is required when ``norm`` is specified in the ``distribution`` column.                                                                                                                              |
|                                           |                                                                                                                                                                                                                                                                  |
|                                           | 5. **lower**                                                                                                                                                                                                                                                     |
|                                           |    Uniform distribution parameter: lower bound. This is required when ``uniform`` is specified in the ``distribution`` column.                                                                                                                                 |
|                                           |                                                                                                                                                                                                                                                                  |
|                                           | 6. **upper**                                                                                                                                                                                                                                                     |
|                                           |    Uniform distribution parameter: upper bound. This is required when ``uniform`` is specified in the ``distribution`` column.                                                                                                                                 |
+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Target Variables Lower and Upper Bounds   | A dataset with three columns: **name**, **lower**, and **upper**.                                                                                                                                                                                                |
|                                           |                                                                                                                                                                                                                                                                  |
|                                           | 1. **name** – The target variable as specified in the "Target Columns".                                                                                                                                                                                         |
|                                           | 2. **lower** – Lower threshold for the specified target variable.                                                                                                                                                                                               |
|                                           | 3. **upper** – Upper threshold for the specified target variable.                                                                                                                                                                                               |
|                                           |                                                                                                                                                                                                                                                                  |
|                                           | Lower and upper bound values are used for visualization in the analysis report.                                                                                                                                                                                 |
+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ML Model Option                           | Options to use an existing ML model or train a new ML model using the provided input dataset.                                                                                                                                                                   |
+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Saved ML Model ID                         | If **ML Model Option** is set to use an existing ML model, the model ID can be specified here.                                                                                                                                                                  |
+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Number of Sampling Points for Monte Carlo | Number of sampling points used for histogram visualization to display the results.                                                                                                                                                                              |
+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Show n*Sigma mark                         | Highlights **n × sigma** on the histogram chart with datum lines. The value of **n** can be 1, 2, or 3.                                                                                                                                                        |
+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


Advanced Options
----------------

Sequential Reduction Optimization (SRO) is performed for the RPD task. The SRO settings are located in the “Advanced” tab.
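
The details of the SRO algorithm are not described here. As a loose illustration of
the general idea, the Python sketch below runs a generic region-shrinking loop: it
samples the current region with a simple Latin hypercube, keeps the best point found
so far, and then re-centers and shrinks the region around it. The ``objective``
function, the shrink factor of 0.6, the iteration count, and the helper
``latin_hypercube`` are assumptions made for the example and may differ from what the
workflow does.

.. code-block:: python

   import random

   def latin_hypercube(n_points, bounds, rng):
       """Draw one point per stratum for every variable, then shuffle each column."""
       columns = []
       for low, high in bounds:
           width = (high - low) / n_points
           column = [low + (i + rng.random()) * width for i in range(n_points)]
           rng.shuffle(column)
           columns.append(column)
       return list(zip(*columns))

   def objective(x):
       # Hypothetical objective with its minimum at (1.0, -2.0).
       return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

   rng = random.Random(0)
   initial = [(-5.0, 5.0), (-5.0, 5.0)]   # full design space
   bounds = list(initial)
   shrink = 0.6                           # assumed reduction factor per iteration
   best = None

   for _ in range(15):
       points = latin_hypercube(20, bounds, rng)   # 20 experiments per iteration
       candidate = min(points, key=objective)
       if best is None or objective(candidate) < objective(best):
           best = candidate
       # Re-center the region on the best point so far, shrink it, and clip it
       # to the original design space.
       new_bounds = []
       for center, (low, high), (low0, high0) in zip(best, bounds, initial):
           half = shrink * (high - low) / 2
           new_bounds.append((max(low0, center - half), min(high0, center + half)))
       bounds = new_bounds

   print(f"best=({best[0]:.3f}, {best[1]:.3f}), objective={objective(best):.5f}")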

+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Options                        | Description                                                                                                                                                                                        |
+================================+====================================================================================================================================================================================================+
| Output Target Values           | A dataset provides target values for each target variable to optimize to. The dataset should have two rows. The first row contains the target column names.                                     |
|                                | The second row should contain the specified target values for each target variable listed in the **Target Columns**.                                                                             |
+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Sampling Method                | Sampling schemes used for generating new sampling points. Supported schemes include:                                                                                                             |
|                                |                                                                                                                                                                                                    |
|                                | - **d-opt** – D Optimization                                                                                                                                                                       |
|                                | - **full-factorial** – Full Factorial Design                                                                                                                                                       |
|                                | - **lhs** – Latin Hypercube Sampling                                                                                                                                                               |
|                                | - **spacefilling** – Space Filling Design                                                                                                                                                          |
+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Number of Values Per Variable  | Number of values for each variable to be considered for sampling.                                                                                                                                  |
+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Number of Experiments          | Number of sampling points generated.                                                                                                                                                               |
+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The “Advanced” tab also includes other options that can facilitate the robustness analysis.

+--------------------------------------+-------------------------------------------+
| Option                               | Description                               |
+======================================+===========================================+
| Include Input Dataset for Analysis   | Sampling points only, input points only,  |
| (MMCRA)                              | or both                                   |
+--------------------------------------+-------------------------------------------+
| Number of Bins for Histogram         | Number of bins for histogram plots        |
+--------------------------------------+-------------------------------------------+
| alpha                                | Confidence level (e.g. 0.90 = 90%)        |
+--------------------------------------+-------------------------------------------+
| Reliability P                        | Reliability level (e.g. 0.95 = 95%)       |
+--------------------------------------+-------------------------------------------+
| Reliability Sides                    | Two sided, lower, or upper limit          |
+--------------------------------------+-------------------------------------------+

Learning Inputs
---------------

These parameters control machine learning model training.
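
Several of the options in this section correspond to standard model-selection steps.
The sketch below shows, in plain Python, a train/test split, k-fold cross validation,
and a small grid search. The k-nearest-neighbour regressor, the synthetic data, the
mean squared error score, and the candidate values of ``k`` are stand-ins chosen for
brevity; the models and scoring that the workflow actually uses may differ.

.. code-block:: python

   import random
   import statistics

   rng = random.Random(0)

   # Hypothetical dataset: one input column and one noisy target column.
   xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
   data = [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]

   train_ratio = 0.8                      # share of rows used for training
   n_train = int(train_ratio * len(data))
   train, test = data[:n_train], data[n_train:]

   def knn_predict(train_rows, x, k):
       """Average the targets of the k training rows closest to x."""
       nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
       return statistics.mean(y for _, y in nearest)

   def mse(train_rows, eval_rows, k):
       """Mean squared error of the k-NN prediction on eval_rows."""
       return statistics.mean(
           (knn_predict(train_rows, x, k) - y) ** 2 for x, y in eval_rows
       )

   def cv_score(rows, k, n_folds=5):
       """Mean validation MSE over n_folds folds (lower is better)."""
       scores = []
       for fold in range(n_folds):
           valid = rows[fold::n_folds]
           fit = [row for i, row in enumerate(rows) if i % n_folds != fold]
           scores.append(mse(fit, valid, k))
       return statistics.mean(scores)

   grid = [1, 5, 15, 60]                  # candidate hyperparameter values
   best_k = min(grid, key=lambda k: cv_score(train, k))
   print(f"selected k={best_k}, test MSE={mse(train, test, best_k):.3f}")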

+------------------------------------+-------------------------------------------+
| Option                             | Description                               |
+====================================+===========================================+
| Save MathModel As                  | Name of saved ML model                    |
+------------------------------------+-------------------------------------------+
| Include Curve X Values For Learning| Include curve X values as targets         |
+------------------------------------+-------------------------------------------+
| Normalization Option               | Normalize input columns                   |
+------------------------------------+-------------------------------------------+
| Cross Validation Option            | Use cross validation to select best model |
+------------------------------------+-------------------------------------------+
| Train Test Split Train Ratio       | Percentage used for training              |
+------------------------------------+-------------------------------------------+
| Grid Search Option                 | Hyperparameter tuning                     |
+------------------------------------+-------------------------------------------+
| Cross Validation Score Type        | Score metric for cross validation         |
+------------------------------------+-------------------------------------------+
| CV Option for Grid Search          | Cross validation strategy                 |
+------------------------------------+-------------------------------------------+
| Grid Search Score Type             | Scoring metric for grid search            |
+------------------------------------+-------------------------------------------+
| Drop Training Data in Model        | Skip saving training dataset in model     |
+------------------------------------+-------------------------------------------+
| ML Models to Consider              | Candidate ML models                       |
+------------------------------------+-------------------------------------------+
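
To illustrate how cross validation can select among candidate models, the sketch below runs a k-fold comparison in plain Python. The two candidate models (a constant-mean predictor and a one-variable least-squares line), the mean-squared-error score, and all function names are assumptions made for illustration. The workflow's actual candidate models and score types are the ones chosen through the options above.

.. code-block:: python

   import random
   from statistics import mean


   def fit_mean(xs, ys):
       """Constant model: always predicts the training mean."""
       m = mean(ys)
       return lambda x: m


   def fit_line(xs, ys):
       """One-variable least-squares line y = a + b * x."""
       mx, my = mean(xs), mean(ys)
       sxx = sum((x - mx) ** 2 for x in xs)
       sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
       b = sxy / sxx if sxx else 0.0
       a = my - b * mx
       return lambda x: a + b * x


   def cv_mse(fit, xs, ys, k=5, seed=0):
       """Average held-out mean squared error over k shuffled folds."""
       idx = list(range(len(xs)))
       random.Random(seed).shuffle(idx)
       folds = [idx[i::k] for i in range(k)]
       scores = []
       for fold in folds:
           held_out = set(fold)
           train = [i for i in idx if i not in held_out]
           model = fit([xs[i] for i in train], [ys[i] for i in train])
           scores.append(mean((model(xs[i]) - ys[i]) ** 2 for i in fold))
       return mean(scores)


   def select_model(candidates, xs, ys, k=5):
       """Return the name of the candidate with the lowest CV error."""
       return min(candidates, key=lambda name: cv_mse(candidates[name], xs, ys, k))


   # Synthetic, nearly linear data (hypothetical).
   rng = random.Random(42)
   xs = [i / 10 for i in range(50)]
   ys = [2.0 + 3.0 * x + rng.gauss(0.0, 0.1) for x in xs]
   candidates = {"mean": fit_mean, "line": fit_line}
   best = select_model(candidates, xs, ys)

Because every candidate is scored only on data it was not trained on, the comparison favors models that generalize rather than models that merely fit the training points.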

Workflow Outputs
----------------

After execution, a summary report is generated and exported as PPT slides.


Robust Parameter Design Outputs
--------------------------------

The RPD report includes:

- Iteration history of target mean
- Iteration history of target standard deviation
- Input values evolution
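
The quantities in this report can be pictured with a minimal sketch of a robustness search. Everything in it is assumed for illustration: the ``response`` function stands in for the real system, the greedy one-dimensional search is a simplification that only reduces the standard deviation, and none of the names reflect the workflow's internal optimizer.

.. code-block:: python

   import random
   from statistics import mean, stdev


   def response(x, z):
       """Hypothetical system: control factor x, noise factor z."""
       return 5.0 + x + (x - 2.0) * z


   def evaluate(x, noise):
       """Mean and standard deviation of the response over noise samples."""
       ys = [response(x, z) for z in noise]
       return mean(ys), stdev(ys)


   def robust_search(x0, noise, step=1.0, iterations=20):
       """Greedy 1-D search that lowers the response standard deviation."""
       x = x0
       mu, sigma = evaluate(x, noise)
       history = [(x, mu, sigma)]  # one (input, mean, std) row per iteration
       for _ in range(iterations):
           trials = [(evaluate(c, noise), c) for c in (x - step, x + step)]
           (new_mu, new_sigma), cand = min(trials, key=lambda t: t[0][1])
           if new_sigma < sigma:
               x, mu, sigma = cand, new_mu, new_sigma
           else:
               step /= 2.0  # no improvement: refine the step size
           history.append((x, mu, sigma))
       return history


   # The same noise samples are reused for every evaluation so that
   # candidate settings are compared on equal terms.
   rng = random.Random(1)
   noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
   history = robust_search(x0=0.0, noise=noise)

Each row of ``history`` holds the input value, target mean, and target standard deviation for one iteration, which are the three quantities the report plots over the course of the optimization.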

.. thumbnail:: /_images/Images/102robustdesignrdpsample.jpg
   :title:

   Sample iteration history for target standard deviation, mean, and input values

|
|

.. thumbnail:: /_images/Images/102robustdesignrdpsampletarget.jpg
   :title:

   Sample animation of the histogram for target variable "s" at iteration 5. The datum lines mark the mean and one standard deviation from the mean.

|
|

When the workflow execution is complete, a notification message appears on the banner at the top of the canvas with direct access to the Simlytiks Dataset, the PPT Slides, and the robustness optimal record. The Simlytiks Dataset includes an animation of the target histogram at each iteration, showing how the target value and standard deviation change throughout the optimization process.


.. thumbnail:: /_images/Images/102robustdesignrdpsamplenotificationmessage.jpg
   :title:

   Sample notification message that provides direct access to the Simlytiks Dataset, the PPT Slides, and the optimal record.

|
|


Model Based Monte Carlo Reliability Analysis Outputs
------------------------------------------------------------

MMCRA generates a large number of new sampling points around the DOE optimal record. The provided or trained ML model then predicts the value of each target variable at these points. The resulting dataset can be used to evaluate how robust the solution is to noise.
A summary table is available on the first page of the analysis report. It includes summary statistics and the lower and upper tolerance limits for each target variable, along with the ratio of points that crossed the tolerance limits.
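
The Monte Carlo procedure described above can be sketched in a few lines. The ``surrogate`` function is a hypothetical stand-in for the trained ML model, and the input names, Gaussian noise assumption, noise magnitudes, and limits are invented for illustration. The sketch shows the general technique rather than the workflow's implementation.

.. code-block:: python

   import random
   from statistics import mean, stdev


   def surrogate(thickness, load):
       """Hypothetical stand-in for the trained ML model."""
       return 100.0 * load / thickness


   def mmcra(optimum, noise_sd, lower, upper, n_samples=10_000, seed=0):
       """Sample around the optimum and summarize the predicted target."""
       rng = random.Random(seed)
       rows = []
       for _ in range(n_samples):
           # Perturb each input with independent Gaussian noise (assumption).
           inputs = {k: rng.gauss(v, noise_sd[k]) for k, v in optimum.items()}
           y = surrogate(**inputs)
           rows.append({**inputs, "target": y, "pass": lower <= y <= upper})
       ys = [r["target"] for r in rows]
       summary = {
           "mean": mean(ys),
           "std": stdev(ys),
           "min": min(ys),
           "max": max(ys),
           # Fraction of sampled points outside [lower, upper].
           "ratio_crossing": sum(not r["pass"] for r in rows) / n_samples,
       }
       return summary, rows


   summary, rows = mmcra(
       optimum={"thickness": 2.0, "load": 1.5},
       noise_sd={"thickness": 0.05, "load": 0.05},
       lower=65.0,
       upper=85.0,
   )

Here ``summary`` plays the role of one row of the reliability summary table, and ``rows`` corresponds to the per-point dataset with its pass status.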

.. thumbnail:: /_images/Images/102robustsamplereliabitystats.jpg
   :title:

   Sample Reliability Summary Table with summary statistics for each target variable, the tolerance limits, and the ratio of points exceeding the specified limits.

|
|


The second table lists all data points used in the analysis for further exploration. It includes the input and target variable values and a pass status indicating whether each point falls within the specified lower and upper bounds. This table can be inspected interactively when opened in the d3VIEW Simlytiks application.

.. thumbnail:: /_images/Images/102robustsamplpassstats.jpg
   :title:

   Sample dataset with pass status, which users can explore further in the d3VIEW Simlytiks application.

|
|


A bar chart shows the top features that have the highest “Ratio of points crossing tolerance limits”, and a parallel chart visualizes the reliability summary statistics.


.. thumbnail:: /_images/Images/102robustsbarchartstop.png
   :title:

   Bar chart showing the top target features with the highest ratio of points crossing tolerance limits.

|
|


.. thumbnail:: /_images/Images/102sampleparallelsummary.jpg
   :title:

   Sample parallel chart showing the summary statistics for the reliability analysis.

|
|

A histogram of each target variable is shown in the MMCRA report, with the mean and 1 sigma labelled. Comparing the distribution to the threshold for each target variable shows how well that variable performs.


.. thumbnail:: /_images/Images/102samplemmcra.png
   :title:

   Sample MMCRA visualization for the target variable. The histogram gives an overview of how sensitive the target variable is to noise around the optimal record.

|
|

Additional visualizations in the analysis report give a quick summary of the sampling points and whether each falls inside (Good/true) or outside (Fail/false) the threshold.

.. thumbnail:: /_images/Images/102samplevisualizationssum.jpg
   :title:

   Sample visualizations from the MMCRA analysis report, giving a quick summary of whether each sampling point falls inside or outside the threshold.

|
|

When the workflow execution is complete, a notification message with direct access to the Simlytiks Dataset and the PPT Slides will show up on the banner at the top of the canvas.


.. thumbnail:: /_images/Images/102snotificationmessage.jpg
   :title:

   Sample notification message that provides direct access to the Simlytiks Dataset and the PPT Slides for the MMCRA analysis report.

|
|

Workflow User Interface
-----------------------

The workflow is designed so the user mainly interacts with the **START worker**.

.. thumbnail:: /_images/Images/102workflowuserinterface.jpg
   :title:

   Workflow structure overview

|
|


START Worker
------------

The START worker is identified as shown below and contains the inputs and options for the whole workflow. Once the user provides the required inputs and options, clicking the “Run” button at the bottom starts the workflow execution.


.. thumbnail:: /_images/Images/102workflowuserinterfacestartworker.png
   :title:

   Start Worker

|
|

Execution Panel
---------------

The execution panel includes the following controls.

+-----------+----------------------------------+
| Button    | Description                      |
+===========+==================================+
| Run       | Start workflow execution         |
+-----------+----------------------------------+
| Resume    | Continue from stopped point      |
+-----------+----------------------------------+
| Stop      | Pause execution                  |
+-----------+----------------------------------+
| Validate  | Check inputs before execution    |
+-----------+----------------------------------+
| Reset     | Reset workflow to initial state  |
+-----------+----------------------------------+


.. thumbnail:: /_images/Images/102workflowexecutioncontrols.jpg
   :title:

   Execution panel controls

|
|


Frequently Asked Questions
--------------------------

**Q1. Where can I find this workflow?**

The workflow with ID **102** can be found in the Workflow Library.


.. thumbnail:: /_images/Images/102workflowfaq1.jpg
   :title:

   FAQ1

|
|


**Q2. How can the data from an old Workflow be imported into a new Workflow in the library?**

To migrate data from an old workflow to a new one, export the old workflow's data using Export/Worker IO, which saves it to a JSON file. Then use Tools/Import/WorkflowIO JSON in the new workflow to overwrite its values with those saved in the JSON file.

.. thumbnail:: /_images/Images/102workflowfaq2.jpg
   :title:

   FAQ2

|
|

References
----------

1. d3VIEW documentation from https://www.d3view.com