4.6. Tasks¶
The Tasks Plugin allows for multiple Lucy tasks to be run at once.
python bin/lucy.egg plugins ml_tasks -h
Tasks Examples¶
The examples outlined here are to assist in running multiple Lucy Tasks at once.
Input¶
An input file is supplied by passing its path as the argument to the -input option. An example command using a Lucy Build is:
python bin/lucy.egg plugins ml_tasks -input ./tasks.json
The file tasks.json holds the information that has been populated by and received from Simlytiks.
Using the Iris dataset as an example, the high-level keys are:
{
"data name": "IRIS",
"date": "June 20, 2022",
"data_in": "C:/lucy/data/iris.csv",
"data_out": "C:/lucy/data/response_iris.json",
"items": [...]
}
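In practice this file is produced by Simlytiks, but for testing it can help to assemble the same top-level structure by hand. The following is a minimal sketch under that assumption; the helper names build_tasks_document and missing_top_level_keys are hypothetical and are not part of Lucy, and whether Lucy itself requires every one of these keys is not stated here.

```python
import json

# Top-level keys copied from the Iris example above.
TOP_LEVEL_KEYS = ("data name", "date", "data_in", "data_out", "items")


def build_tasks_document(data_name, date, data_in, data_out, items=()):
    """Assemble a dict with the top-level keys of the example input file."""
    return {
        "data name": data_name,
        "date": date,
        "data_in": data_in,
        "data_out": data_out,
        "items": list(items),  # one entry per Lucy Task
    }


def missing_top_level_keys(document):
    """Return the example's top-level keys that are absent from document."""
    return [key for key in TOP_LEVEL_KEYS if key not in document]


document = build_tasks_document(
    "IRIS",
    "June 20, 2022",
    "C:/lucy/data/iris.csv",
    "C:/lucy/data/response_iris.json",
)
# Text that could be written to tasks.json.
tasks_json = json.dumps(document, indent=2)
```

The items list is left empty here; each Task placed in it follows the structure described next.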
The items key holds one or more applicable Lucy Tasks. Each item (or Lucy Task) has information that is useful for Lucy and/or Simlytiks. The base information for items is generated by the ml_items Lucy plugin. This plugin is used to update Simlytiks with which Lucy ML Tasks and features are available for a user to access. The user interacts with Simlytiks and decides which task(s) to send to Lucy.
Below is an example of how an Explore Feature Importance Task looks. Keep in mind that it is one element of the items list shown above:
{
"label": "Feature Importance",
"ml_type": "explore",
"id": "feature_importance",
"mapsTo": "feature_importance true",
"description": "Provides scores for how each input relates to the target.",
"viz_type": "barchart-horizontal",
"fields": [...]
}
The fields list contains additional arguments that are required or optional and are used to customize a Task. Feature Importance has the following fields:
- Input Features
- Target Column
- Normalize
For Feature Importance, a Target Column is required. The field for a Target Column will be included in the fields list and will look as follows:
{
"label": "Target Column",
"description": "The name of the target column.",
"name": "clabel",
"id": "clabel",
"mapsTo": "clabel",
"optional": false,
"group": "ml",
"dataTypes": ["text"],
"source": "data",
"value": "class",
"validations": []
}
The elements of each field can be used by Simlytiks and/or Lucy. Whether to supply or omit the value entry is up to the user.
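As an illustration of how the fields list might be handled before a Task is sent to Lucy, the sketch below sets a field's value by its id and lists non-optional fields that still have no value. Both helper names are hypothetical, and treating a missing, null, or empty-string value as "not set" is an assumption made for this example rather than documented Lucy behaviour.

```python
def set_field_value(task, field_id, value):
    """Set "value" on the field of task whose "id" equals field_id."""
    for field in task.get("fields", []):
        if field.get("id") == field_id:
            field["value"] = value
            return field
    raise KeyError(f"no field with id {field_id!r}")


def missing_required_fields(task):
    """Return ids of non-optional fields whose value is not set.

    Assumption: "not set" means the value is absent, None, or "".
    """
    return [
        field["id"]
        for field in task.get("fields", [])
        if not field.get("optional", False)
        and field.get("value") in (None, "")
    ]


# Trimmed-down Feature Importance task holding only the Target Column field.
task = {
    "id": "feature_importance",
    "fields": [{"id": "clabel", "optional": False, "value": None}],
}
before = missing_required_fields(task)
set_field_value(task, "clabel", "class")
after = missing_required_fields(task)
```

With the Iris example, the Target Column field (id clabel) is reported as missing until its value is set to class.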
Output¶
An output is always produced; however, there are a few options:
1. python bin/lucy.egg plugins ml_tasks -input ./tasks.json
An output file is created at the location that was specified using the data_out key in the input file. Each Task has a new output key whose value is the location of that task’s Lucy-Response. These are stored in a “temp_items” folder in the same location as the Lucy Build that was used.
2. python bin/lucy.egg plugins ml_tasks -input ./tasks.json -include-json
This is similar to (1), except that output holds the entire Lucy-Response of that Task instead of a file path.
3. python bin/lucy.egg plugins ml_tasks -input ./tasks.json -replace
This is similar to (1), except that the data_out file path in the input file is ignored. Instead, the resulting JSON is written to the input file location, so the input file itself is modified.
4. python bin/lucy.egg plugins ml_tasks -input ./tasks.json -include-json -replace
This is a combination of (2) and (3). That is, the resulting JSON is saved to the input file’s location and the output for each Task holds its respective Lucy-Response.
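When the plugin is driven from a script, the flag combinations above can be assembled programmatically. The sketch below only builds the argument list and works out where the resulting JSON will be written, following the rules described in this section. The function names are hypothetical; only the -input, -include-json, and -replace flags come from the commands shown above.

```python
def build_ml_tasks_command(input_path, include_json=False, replace=False,
                           egg_path="bin/lucy.egg"):
    """Build the argument list for one of the command variants shown above."""
    command = ["python", egg_path, "plugins", "ml_tasks", "-input", input_path]
    if include_json:
        command.append("-include-json")  # embed each Lucy-Response in "output"
    if replace:
        command.append("-replace")  # write the result over the input file
    return command


def result_path(document, input_path, replace=False):
    """Return where the resulting JSON is written.

    With -replace the data_out key is ignored and the input file is
    overwritten; otherwise the data_out location is used.
    """
    return input_path if replace else document["data_out"]


command = build_ml_tasks_command("./tasks.json", include_json=True)
```

The returned list can be passed to subprocess.run(command, check=True) from the directory that contains the Lucy Build.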
Example Learn/Explore Documents¶
The Input JSON is of the form that the Tasks Plugin expects. This JSON holds the Lucy Explore and Learn Tasks that will be carried out. It is important to note that the data used by the Tasks is specified using the “data_in” key in the Input. The resulting Lucy-Responses are under the Output and are saved at the “data_out” location that is specified in its corresponding Input JSON.
Here it is assumed that -include-json was used in the CLI. Hence, the Output has the “output” key for each task populated with the Lucy-Response rather than just the file path to the temporary save of the Lucy-Response for that task.
Dataset | Input | Output | Tasks |
---|---|---|---|
Iris | tasks.json | response.json | Explore - Feature Importance, Explore - Distribution (Simlytiks is capable of this without Lucy), Learn - K-Means Clustering, Learn - Mean Shift Clustering, Learn - Decision Tree Classification, Learn - Random Forest Classification, Learn - Gaussian Naive Bayes Classification, Learn - Logistic Regression for Classification |
HIC | tasks.json | response.json | Explore - Feature Importance, Explore - Distribution (Simlytiks is capable of this without Lucy), Learn - K-Means Clustering, Learn - Mean Shift Clustering, Learn - Linear Regression, Learn - Lasso Regression, Learn - Ridge Regression, Learn - Support Vector Regression, Learn - Decision Tree Regression, Learn - Random Forest Regression, Learn - Gaussian Process Regression |
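Because a Task's output key holds either a file path or the full Lucy-Response depending on whether -include-json was used, code that reads a response file may need to handle both cases. The sketch below is one way to do that. It assumes that a string value is a path to a JSON file and that anything else is the embedded response; the function name is hypothetical.

```python
import json


def load_task_response(item):
    """Return the Lucy-Response for one entry of the items list.

    Assumption: a string "output" is a path to a saved JSON response,
    and any other value is the embedded response itself.
    """
    output = item.get("output")
    if isinstance(output, str):
        with open(output, encoding="utf-8") as handle:
            return json.load(handle)
    return output


def load_all_responses(document):
    """Map each task id to its Lucy-Response."""
    return {item["id"]: load_task_response(item) for item in document["items"]}
```

Calling load_all_responses on a parsed response.json gives the same dictionary whether or not -include-json was passed, as long as the temporary files are still present.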
Example Predict Documents¶
The Input JSON is of the form that the Tasks Plugin expects. For Predict, this JSON holds a single Predict Task that will be carried out. It is important to note that the points to predict are specified using the “data_in” key and the desired model to use for the predictions is specified using the “mfile_in” key in the Input. The resulting Lucy-Responses are under the Output and are saved at the “data_out” location that is specified in its corresponding Input JSON.
Here it is assumed that -include-json was used in the CLI. Hence, the Output has the “output” key for the predict task populated with the Lucy-Response rather than just the file path to the temporary save of the Lucy-Response.
Dataset | Model Summary | Input | Output | Tasks |
---|---|---|---|---|
HIC | Linear Regression using tbumper and thood to predict HIC | predict_task.json (Manual Copy and could change once support is added) | predict_response.json (Manual Copy and could change once support is added) | Predict |
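A Predict input can be sanity-checked before it is handed to the plugin. The sketch below checks only what this section states: the “data_in”, “mfile_in”, and “data_out” keys are present and items holds a single Task. The function name and the file paths in the example are hypothetical, and Lucy may perform different or additional validation.

```python
def predict_input_problems(document):
    """Return a list of problems found in a Predict input document."""
    problems = []
    for key in ("data_in", "mfile_in", "data_out"):
        if not document.get(key):
            problems.append(f"missing or empty key: {key}")
    items = document.get("items")
    if not isinstance(items, list) or len(items) != 1:
        problems.append("expected exactly one Predict Task in items")
    return problems


# Hypothetical paths, used for illustration only.
predict_document = {
    "data_in": "points_to_predict.csv",
    "mfile_in": "saved_model_file",
    "data_out": "predict_response.json",
    "items": [{"label": "Predict"}],
}
problems = predict_input_problems(predict_document)
```

An empty list means that none of the checks failed.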
python bin/lucy.egg plugins ml_tasks -input ml_tasks.json -output C:/items.json