4.6. Tasks

The Tasks Plugin allows for multiple Lucy tasks to be run at once.

python bin/lucy.egg plugins ml_tasks -h

Primary Arguments

Main specifications for running Tasks.

Testing

Tasks Examples

The examples outlined here are to assist in running multiple Lucy Tasks at once.

Input

To supply an input file, pass its path as the argument to the -input option. An example command using a Lucy Build is:

python bin/lucy.egg plugins ml_tasks -input ./tasks.json

The file tasks.json holds the information that has been populated and received from Simlytiks.

Using the Iris dataset as an example, the high level keys are:

{
    "data name": "IRIS",
    "date": "June 20, 2022",
    "data_in": "C:/lucy/data/iris.csv",
    "data_out": "C:/lucy/data/response_iris.json",
    "items": [...]
}
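A document with this top-level layout could be assembled from Python as sketched below. This is only an illustration, not part of Lucy: the helper name build_tasks_document is hypothetical, the values are the ones from the example above, and items is left empty because its entries come from the ml_items plugin and Simlytiks.

```python
import json


def build_tasks_document(name, date, data_in, data_out, items):
    """Return a dict with the top-level keys shown in the example above.

    Hypothetical helper for illustration; it is not a Lucy API.
    """
    return {
        "data name": name,
        "date": date,
        "data_in": data_in,
        "data_out": data_out,
        "items": list(items),  # one entry per Lucy Task
    }


document = build_tasks_document(
    "IRIS",
    "June 20, 2022",
    "C:/lucy/data/iris.csv",
    "C:/lucy/data/response_iris.json",
    items=[],  # left empty here; real entries are generated elsewhere
)

# Serialize with the same indentation style as the example above.
print(json.dumps(document, indent=4))
```

Writing the printed text to tasks.json would give a file that can be passed to the -input option.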

The items key holds one or more applicable Lucy Tasks. Each item (or Lucy Task) has information that is useful for Lucy and/or Simlytiks. The base information for items is generated by the ml_items Lucy plugin. This plugin is used to update Simlytiks with what Lucy ML Tasks and features are available for a user to access. The user interacts with Simlytiks and decides which task(s) to send to Lucy.

Below is an example of how an Explore Feature Importance Task looks. Keep in mind that this is an element of the items list from above:

{
    "label": "Feature Importance",
    "ml_type": "explore",
    "id": "feature_importance",
    "mapsTo": "feature_importance true",
    "description": "Provides scores for how each input relates to the target.",
    "viz_type": "barchart-horizontal",
    "fields": [...]
}

The fields list contains additional arguments that are required or optional and are used to customize a Task. Feature Importance has the following fields:

  • Input Features
  • Target Column
  • Normalize

For Feature Importance, a Target Column is required. The field for a Target Column will be included in the fields list and will look as follows:

{
    "label": "Target Column",
    "description": "The name of the target column.",
    "name": "clabel",
    "id": "clabel",
    "mapsTo": "clabel",
    "optional": false,
    "group": "ml",
    "dataTypes": ["text"],
    "source": "data",
    "value": "class",
    "validations": []
}

The elements of each field can be used by Simlytiks and/or Lucy. Whether to set or omit value is up to the user.
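As an illustration of filling in a value before the file is sent to Lucy, the sketch below looks up a field by its id inside a Task and sets value. The helper set_field_value is hypothetical (it is not a Lucy or Simlytiks function), and the Task dictionary is trimmed to the keys the example needs.

```python
def set_field_value(task, field_id, value):
    """Set "value" on the field of `task` whose "id" equals `field_id`.

    Hypothetical helper for illustration. Returns the updated field and
    raises KeyError when the Task has no field with that id.
    """
    for field in task.get("fields", []):
        if field.get("id") == field_id:
            field["value"] = value
            return field
    raise KeyError(f"no field with id {field_id!r} in task {task.get('id')!r}")


# Trimmed-down Feature Importance Task with only the Target Column field.
task = {
    "id": "feature_importance",
    "fields": [
        {"id": "clabel", "label": "Target Column", "optional": False},
    ],
}

set_field_value(task, "clabel", "class")
print(task["fields"][0]["value"])  # prints: class
```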

Output

An output is always produced; however, there are a few options:

  1. python bin/lucy.egg plugins ml_tasks -input ./tasks.json

    An output file is created at the location that was specified using the data_out key in the input file. Each Task has a new output key whose value is the location of that task’s Lucy-Response. These are stored in a “temp_items” folder in the same location as the Lucy Build that was used.

  2. python bin/lucy.egg plugins ml_tasks -input ./tasks.json -include-json

    This is similar to (1) with the difference being that, instead of storing a file path, output holds the entire Lucy-Response of that Task.

  3. python bin/lucy.egg plugins ml_tasks -input ./tasks.json -replace

    This is similar to (1) with the difference being that the data_out file path in the input file is ignored. Instead, the resulting JSON is written to the input file location. This means that the input file will be adjusted.

  4. python bin/lucy.egg plugins ml_tasks -input ./tasks.json -include-json -replace

    This is a combination of (2) and (3). That is, the resulting JSON is saved to the input file’s location and the output for each Task holds its respective Lucy-Response.
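The four invocations above differ only in which of the -include-json and -replace flags are present. A minimal sketch of assembling them from Python follows; the function names are hypothetical, the egg path is the one used in the commands above, and run_ml_tasks assumes a python executable and a Lucy Build are available in the working directory.

```python
import subprocess


def build_ml_tasks_command(input_path, include_json=False, replace=False,
                           egg="bin/lucy.egg"):
    """Return the argument list for one of the four invocations above."""
    command = ["python", egg, "plugins", "ml_tasks", "-input", input_path]
    if include_json:
        command.append("-include-json")  # embed each Lucy-Response in output
    if replace:
        command.append("-replace")  # write the result over the input file
    return command


def run_ml_tasks(input_path, **options):
    """Run the Tasks Plugin; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_ml_tasks_command(input_path, **options),
                          check=True)


# Option (4): both flags set.
print(" ".join(build_ml_tasks_command("./tasks.json",
                                      include_json=True, replace=True)))
```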

Example Learn/Explore Documents

The Input JSON is of the form that the Tasks Plugin expects. This JSON holds Lucy Explore and Learn Tasks that will be carried out. It is important to note that the data used by the Tasks is specified using the “data_in” key in the Input. The resulting Lucy-Responses are under the Output and are saved at the “data_out” location that is specified in its corresponding Input JSON.

Here it is assumed that -include-json was used in the CLI. Hence, the Output has the “output” key for each task populated with the Lucy-Response rather than just the file path to the temporary save of the Lucy-Response for that task.

Dataset: Iris
Input: tasks.json
Output: response.json
Tasks:

  • Explore - Feature Importance
  • Explore - Distribution (Simlytiks is capable of this without Lucy)
  • Learn - K-Means Clustering
  • Learn - Mean Shift Clustering
  • Learn - Decision Tree Classification
  • Learn - Random Forest Classification
  • Learn - Gaussian Naive Bayes Classification
  • Learn - Logistic Regression for Classification

Dataset: HIC
Input: tasks.json
Output: response.json
Tasks:

  • Explore - Feature Importance
  • Explore - Distribution (Simlytiks is capable of this without Lucy)
  • Learn - K-Means Clustering
  • Learn - Mean Shift Clustering
  • Learn - Linear Regression
  • Learn - Lasso Regression
  • Learn - Ridge Regression
  • Learn - Support Vector Regression
  • Learn - Decision Tree Regression
  • Learn - Random Forest Regression
  • Learn - Gaussian Process Regression
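Since the output key of a Task holds either a file path or the full Lucy-Response, a consumer of the Output may want to treat both cases uniformly. The sketch below is one way to do that under stated assumptions: the Output is assumed to keep the items list of the Input, a string under output is assumed to be a path to a JSON file, and any other value is assumed to be the embedded response. The function name and the placeholder response content are hypothetical.

```python
import json


def load_task_outputs(response_document):
    """Map each Task id to its Lucy-Response.

    Assumption: a string under "output" is a path to a JSON file holding
    the response; anything else is the response itself (-include-json).
    """
    outputs = {}
    for item in response_document.get("items", []):
        output = item.get("output")
        if isinstance(output, str):
            with open(output, encoding="utf-8") as handle:
                output = json.load(handle)
        outputs[item.get("id")] = output
    return outputs


# Embedded case; the response body here is a placeholder, not real output.
response = {
    "items": [
        {"id": "feature_importance", "output": {"placeholder": True}},
    ],
}
print(load_task_outputs(response))
```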

Example Predict Documents

The Input JSON is of the form that the Tasks Plugin expects. For Predict, this JSON holds a single Predict Task that will be carried out. It is important to note that the points to predict are specified using the “data_in” key and the desired model to use for the predictions is specified using the “mfile_in” key in the Input. The resulting Lucy-Responses are under the Output and are saved at the “data_out” location that is specified in its corresponding Input JSON.

Here it is assumed that -include-json was used in the CLI. Hence, the Output has the “output” key for the predict task populated with the Lucy-Response rather than just the file path to the temporary save of the Lucy-Response.

Dataset: HIC
Model Summary: Linear Regression using tbumper and thood to predict HIC
Input: predict_task.json (manual copy; could change once support is added)
Output: predict_response.json (manual copy; could change once support is added)
Tasks: Predict
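Before running a Predict Task, it can help to confirm that the Input names the three locations discussed above. The check below is a minimal sketch: the function name is hypothetical, the file paths are made-up placeholders, and the check covers only the data_in, mfile_in and data_out keys, not the rest of the Predict Input.

```python
# Keys that the text above names for a Predict Input.
PREDICT_KEYS = ("data_in", "mfile_in", "data_out")


def missing_predict_keys(document):
    """Return the keys from PREDICT_KEYS that are absent from `document`."""
    return [key for key in PREDICT_KEYS if key not in document]


# Placeholder paths for illustration only.
predict_input = {
    "data_in": "points_to_predict.csv",
    "data_out": "predict_response.json",
}

print(missing_predict_keys(predict_input))  # prints: ['mfile_in']
```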

python bin/lucy.egg plugins ml_tasks -input ml_tasks.json -output C:/items.json