BMS Publishing Guide
====================

About This Guide
^^^^^^^^^^^^^^^^

This guide provides an introduction to publishing test data from Battery Management Systems (BMS), from different labs such as CarTech or in-house systems, to d3VIEW, a data-to-decision software that helps with scientific data visualization.

Who Should Read This Guide
^^^^^^^^^^^^^^^^^^^^^^^^^^

Engineers interested in publishing test data from Battery Management Systems (BMS) to d3VIEW can use this guide.

What You Should Already Know
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This guide doesn't cover how to install and set up the publisher (lucy). In order to install lucy, the correct version and environment need to be installed and configured.

This guide is written with the intention that engineers without any specific background can publish tests to d3VIEW with lucy. However, some familiarity with the following topics will greatly help with understanding:

#. Basic computer skills
#. d3VIEW platform
#. d3VIEW data flow
#. Linux commands
#. BASH scripts
#. Programming experience

Introduction
^^^^^^^^^^^^

Battery Management Systems (BMS) provide very useful data that allow engineers to measure the performance of batteries and aid in improving it. Several challenges exist while handling BMS data, such as processing large time-history files, handling a variety of formats, viewing large data sets, and performing analytics on them. d3VIEW is a data-to-decision platform (www.d3view.com) that allows any engineer, with little or no programming experience, to import, process, and store BMS data, and provides advanced capabilities to view and analyze the data.

Getting Started
^^^^^^^^^^^^^^^

Overview of Data Formats
------------------------

Test data generated in the labs from experiments are collected and stored in various formats. There are three major types of data in production, and Table 1 provides a brief overview of them.

.. list-table:: An overview of the three major types of data in production
   :widths: 25 25 25 25
   :header-rows: 1

   * - Data Type
     - Pack Data
     - Cell Data
     - Others
   * - **Example**
     - BMS.ssv and other files
     - Filename.csv
     - Filename.csv
   * - **Source of meta-data**
     - WR file (not mandatory)
     - File name
     - Header section inside the data file
   * - **Data starts at row**
     - 1
     - 6
     - 46
   * - **Column separator**
     - Semicolon (;)
     - Comma (,)
     - Comma (,)
   * - **Number of responses**
     - 700-1000
     - 5
     - 700-1000
   * - **Size**
     - 10-300 MB
     - ~100 MB
     - 300 MB-6 GB

**BMS Data**

BMS data are stored in an .SSV file, with their meta-data stored separately in a .WR file. An additional Completed.txt file is created to indicate that the test is complete and ready to be published. These files are stored together under a directory named after the test name. BMS data directories are grouped by the year the lab experiments are planned to be conducted.

.. figure:: _images/bms/BMS_data_structure.png
   :scale: 80 %

   BMS data are grouped by year and their data files are stored in a directory named after the test name.

In the DATA.ssv file, the headers are defined on the first line, while the data follow on subsequent lines. Both the headers and the data are separated by the delimiter ';'. There are usually hundreds of signals for each BMS test and tens of thousands of points for each signal. Pack data may contain missing data, whose values are left blank.

.. figure:: _images/bms/Sample_illustrations_ssv_file.png
   :scale: 100 %

   Illustration of pack data organization. Pack data are grouped by year and their data files are stored in a directory named after the test name.

.. figure:: _images/bms/sample_wr_file.png
   :scale: 60 %

   Illustration of a WR (work order) file. Each line represents a meta-data item which d3VIEW associates with the test that corresponds to this file. For example, 'Temperature[degC]' is associated with a tag, which allows published tests to be searched by attribute.
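To make the pack-data layout concrete, here is a minimal sketch that parses a semicolon-separated file of this shape, treating blank fields as missing values. The sample content, signal names, and values are hypothetical, and this is not the publisher's actual reader, only an illustration of the format.

```python
import csv
import io

# Hypothetical miniature of a DATA.ssv file: ';'-delimited, headers on the
# first line, and a blank field where a pack-data value is missing.
SAMPLE_SSV = (
    "Time;Pack Voltage;Pack Current\n"
    "0.0;350.1;12.5\n"
    "0.1;350.2;\n"          # missing current value left blank
    "0.2;350.4;12.7\n"
)

def read_ssv(text):
    """Parse ';'-separated test data into {signal_name: [values]},
    keeping missing entries as None."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    headers = next(reader)
    signals = {h: [] for h in headers}
    for row in reader:
        for name, value in zip(headers, row):
            signals[name].append(float(value) if value != "" else None)
    return signals

signals = read_ssv(SAMPLE_SSV)
print(signals["Pack Voltage"])   # [350.1, 350.2, 350.4]
print(signals["Pack Current"])   # [12.5, None, 12.7]
```

In a real test there would be hundreds of such signal columns and tens of thousands of rows, but the parsing logic is the same.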
**Others**

Some data are stored in CSV files. At the start of each data file, there is a header section that contains all the meta-data. The test data start after the headers and follow a similar structure to the pack data test file, but with a comma ',' as the delimiter. They also share a similar number of signals per test as pack data. These test data usually have a considerably larger number of points, so the file size can be up to 6 GB.

.. figure:: _images/bms/sample_test_file_with_headers.png
   :scale: 60 %

   Illustration of test data with meta-data as headers inside the CSV file.

The meta-data "program name" is extracted from "Name". One type of such "name" is in the format "A_LC_B_SW_CDE", where "LC" is the 2-letter abbreviation of the test location and "SW_CDE" is the software name. Together, "AB_SW_CDE" forms the "Program Name" to be used in the mapping procedure.

Some of these test data stored in CSV files are converted from .MF4 files. We also support publishing the raw MF4 file without the CSV version, on the condition that the string "LC" appears in the MF4 file name.

**Summary**

In summary, BMS data are stored in the different data formats seen above, and it is critical for any data-visualization software to support them. The publisher, provided as part of d3VIEW, supports these different data formats and provides a simple way to detect and publish them. If you have a data format that does not match the ones listed above, you can contact your local support so d3VIEW's publisher can provide readers for it.

Overview of Publishing Procedure
--------------------------------

Experiments are performed in labs by engineers, and test data are collected and stored in various formats. These data are then shared with d3VIEW and published to the d3VIEW platform. Once test data are available on the d3VIEW platform, they can be accessed and shared by engineers, and a variety of visualization and analysis tools are available to users.

.. figure:: _images/bms/d3view_publishing_data_flow.png

   Publishing data flow on d3VIEW

There are two different ways to publish the test data to d3VIEW. In the first method, CLI-METHOD, we use Linux/Windows/Mac and use the terminal's Command Line Interface (CLI) to call the publisher. The advantage of this method is that it can be used directly in the terminal (CLI-METHOD-A), it can be called programmatically inside BASH scripts (CLI-METHOD-B), and lastly, it can be called as part of scheduled tasks, such as CRONJOBs (CLI-METHOD-C), to watch a directory and import new test data as they arrive.

.. figure:: _images/bms/Sample_publishing_command_CMA.png

   Two examples of the command format with different publishing options to be used for publishing with the CLI-METHOD-A method.

We can also put the commands in a BASH script and provide instructions for the publisher to automatically identify test data to be published.

.. figure:: _images/bms/example_CRONTAB_file.png

   A sample CRONJOB set up in the CRONTAB file, scheduled at 7pm every day and saving the logging to the specified file.

Users can choose their preferred method to publish the test data. Here is what is needed to publish in general:

#. A file or directory that contains test data.
#. Publishing commands or a BASH script for the CLI-METHOD method (provided by d3VIEW).
#. Python-enabled Linux/Windows/Mac with the necessary modules (please refer to the Lucy Administrators Manual).
#. The latest Lucy binary (provided by d3VIEW).
#. A licensed version of d3VIEW to which the data can be published.

**Publishing command and options (CLI-METHOD-A)**

The CLI-METHOD method calls the "bms" plugin from where the publisher (lucy) is installed to perform the publishing task in a manner controlled by the publishing options. The directory where the publisher is installed must first be specified in the publishing command for the CLI-METHOD-A method (Figure 9), followed by calling the "bms" plugin and the publishing options.
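As an illustration of how a script (CLI-METHOD-B) might assemble such a command programmatically, here is a sketch that builds the argument list from a few common publishing options. The install directory, binary name, user, URL, and data path below are all hypothetical placeholders; take the exact invocation from your own lucy installation and the publishing commands provided by d3VIEW.

```python
import shlex

# Hypothetical install location of the lucy publisher; substitute your own.
LUCY_DIR = "/opt/lucy"

def build_publish_command(data_dir, user, d3view_url, pattern="BMS.ssv"):
    """Assemble a CLI-METHOD publishing command for the 'bms' plugin
    from a few of the publishing options covered in this guide."""
    return [
        f"{LUCY_DIR}/lucy", "bms",     # publisher binary, then the plugin
        "-publish",                    # publish to d3VIEW
        "-user", user,                 # a valid user on d3VIEW
        "-d", data_dir,                # directory to search for data files
        "-f", pattern,                 # file name or regex pattern
        "-application-key", "lucy",
        "-d3view-url", d3view_url,     # d3VIEW site to publish to
    ]

cmd = build_publish_command("/data/bms/2024", "jdoe", "https://example.d3view.com")
print(shlex.join(cmd))
# A real script would then run the command, e.g. subprocess.run(cmd, check=True)
```

The same function could be called in a loop over newly arrived test directories, which is essentially what a CRONJOB-driven setup (CLI-METHOD-C) automates.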
BMS, Element and EMEA test data use the "bms" plugin to be published, while CarTech test data are published using the "cartech" plugin. Both plugins share most of the publishing options.

Publishing options are instructions that tell the publisher where to locate the data files to be published, and how and where they will be published. Depending on the requirements for publishing, each publishing task can have its own set of options. In production, BMS, Element and EMEA test data use the "bms" plugin for publishing because they share a lot of similarities in how their data are stored and structured. The table below provides an overview of some of the most used options with brief descriptions.

.. list-table::
   :widths: 25 60
   :header-rows: 1

   * - **Publishing options**
     - **Descriptions**
   * - **-publish**
     - Publish to d3VIEW
   * - **-user**
     - A valid user on d3VIEW
   * - **-f**
     - File name or regex pattern to search for (e.g., BMS.ssv, \*.ssv, \*.csv)
   * - **-application-key**
     - lucy
   * - **-d3view-url**
     - d3VIEW site where test data will be published to
   * - **-d**
     - Directory to search for data files
   * - **-mapping-file**
     - A mapping file used for publishing
   * - **-scratch-dir**
     - Directory for publishing-related scratch work
   * - **-match-type**
     - Determines which program's mapping table is to be used
   * - **-time-channel**
     - Raw channel names to be used for the time channel
   * - **-error-factor**
     - Down-sampling tolerance: if abs(y_prev - y_curr) < error factor × (y_max - y_min), the current point is dropped

**-use-mse (CrossCheck)**

With the CrossCheck feature, the mapping file carries additional columns:

.. list-table::
   :widths: 25 60
   :header-rows: 1

   * - **Column**
     - **Descriptions**
   * - CCCriteria
     - The condition used to compare the calculated metric against the reference (e.g., >, <, =)
   * - CCValue
     - The reference metric we want to compare to.
   * - Alternative
     - The next candidate to be mapped if the calculated metric using the candidate curve and the reference curve fails the condition set by the CCCriteria and CCValue.

For example, when a candidate curve is selected by the lucy publisher to map to the channel Cell Voltage_###_RAW_NAME, we calculate the MSE metric between the candidate curve and "Average Cell Voltage".
If the MSE is smaller than the CCValue, that is, the MSE computed using "Maximum Cell Voltage" and "Minimum Cell Voltage", then we use the candidate curve to map to the channel Cell Voltage_###. Otherwise, we go to the next candidate in the "Alternative" column, CellVolt_###_RAW_NAME. We then perform the same check until we find the correct candidate, or until we reach the end of the mapping table, or the "Alternative" is "NONE".

.. figure:: _images/bms/mapping_file_crosscheck_example.png

   A mapping file with additional columns for the CrossCheck feature (**-use-mse**)

**-use-mapper**

**-use-mapper** requires the same columns as "CrossCheck", but we can put "no" for all items in the mapping file. This option is intended to provide a solution in the following scenarios.

* Map a source name with a single digit to a dest name with multiple digits. For example, the source name "CellVolt_2" will be mapped to "Cell Voltage_2" and "CellVolt_10" will be mapped to "Cell Voltage_10". When we order them alphabetically, "Cell Voltage_10" will appear before "Cell Voltage_2", which is not what we want. So, we need to map "CellVolt_2" to "Cell Voltage_02" if the maximum number of the Cell Voltage signals has two digits, or to "Cell Voltage_002" if the maximum number has three digits.

* Allow more than one source name in one cell, separated by commas. Some tests have different channel names even though they are from the same program: channels from some files have the prefix "PREFIX\_". To accommodate this characteristic, we put both channel names in the source name, separated by a comma. This allows lucy, the publisher, to look for the second channel if the first doesn't exist in the data file, and map it to the dest name.

.. list-table::
   :widths: 20 40 40

   * - **Program**
     - **Source Name**
     - **Dest Name**
   * - ...
     - ...
     - ...
   * - PROG_NAME
     - Current_Raw_Name, PREFIX_Current_Raw_Name
     - Pack Current
   * - PROG_NAME
     - Voltage_Raw_Name, PREFIX_Voltage_Raw_Name
     - Pack Voltage
   * - ...
     - ...
     - ...
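The CrossCheck candidate selection described above can be sketched as follows. The curve names, sample data, and threshold are hypothetical; the real publisher reads the candidates, CCCriteria, CCValue, and Alternatives from the mapping file, and only the "smaller than" criterion is sketched here.

```python
def mse(a, b):
    # Mean squared error between two equally sampled curves.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pick_candidate(candidates, curves, reference, cc_value):
    # Walk the candidate chain (primary candidate, then each Alternative)
    # and return the first whose MSE against the reference curve passes.
    for name in candidates:
        if name in curves and mse(curves[name], curves[reference]) < cc_value:
            return name
    return None  # reached the end of the chain, i.e. Alternative is "NONE"

# Hypothetical sample curves; in production these are signals read from
# the test data file.
curves = {
    "Average Cell Voltage":      [3.70, 3.71, 3.72],   # reference curve
    "Cell Voltage_001_RAW_NAME": [3.70, 3.71, 3.73],   # close to reference
    "CellVolt_001_RAW_NAME":     [4.10, 4.12, 4.15],   # far from reference
}
chosen = pick_candidate(
    ["CellVolt_001_RAW_NAME", "Cell Voltage_001_RAW_NAME"],
    curves, "Average Cell Voltage", 0.01)
print(chosen)  # the first candidate fails the MSE check, the second passes
```

The fall-through behaviour is the point: a candidate that fails the CCCriteria/CCValue check is skipped in favour of its Alternative.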
Downsampling
------------

Downsampling methods help reduce the number of points from the raw data while losing as little information from the original signals as possible, or none at all. There are several methods available through lucy options in the publishing script:

#. **-rm-repeating**
#. **-error-factor**
#. **-rdp**

**-rm-repeating**

The option **-rm-repeating** removes points having the same values as their adjacent points on both the left and the right. Therefore, **-rm-repeating** reduces the size of the data without losing any information.

.. figure:: _images/bms/illustration_rmrepeating.png

   **-rm-repeating** removes points having the same values as both adjacent points.

.. figure:: _images/bms/rm_repeating_simple_example.png

   A simple example before (blue) and after (red) **-rm-repeating** is applied.

.. figure:: _images/bms/rm_repeating_simple_example_table_view.png

   Table view of a signal published with (right) and without (left) the **-rm-repeating** option.

The table view shows how **-rm-repeating** works by listing the x and y values from the raw data (left) and the simplified data (right) side by side. Clearly, the first 2 points from the simplified data don't contain any less information than the first 8 data points from the raw data do, only with fewer points. This can be verified from the line view of these two curves.

.. figure:: _images/bms/rm_repeating_simple_example_line_view.png

   Line view of the signal published with (red) and without (blue) the **-rm-repeating** option

The two signals, red published with **-rm-repeating** and blue published without it, match each other perfectly. This confirms that **-rm-repeating** doesn't discard any information from the data in the down-sampling procedure.

**-error-factor**

**-error-factor** is a parameter for the **-rm-repeating** option. It removes points under the following condition: if :math:`|y_{prev} - y_{curr}| <` error factor :math:`\times (y_{max} - y_{min})`, then :math:`y_{curr}` is dropped.
We always keep the first and last points of the curve. Then, we examine each point starting from the second point of the curve. The algorithm determines a tolerance value by taking a proportion (the error-factor value provided by the user) of the y range (the difference between the max and min y values of the specific signal). If the current point is outside of the tolerance, then both the current point and the previous point remain. Otherwise, if the current point is within the tolerance, it is skipped unless the point after it exceeds the tolerance.

.. figure:: _images/bms/illustration_errorfactor.png

   **-error-factor** removes points having "similar" values as both adjacent points.

.. figure:: _images/bms/error_factor_example.png

   Signals published with error factor 0 (left, blue) and 0.01 (right, red)

The figure shows two signals published with different **-error-factor** values, 0 and 0.01. The first few points from the raw data are all around 4.147 but vary by 0.001. In this case, **-rm-repeating** will not remove these points because they have different values. However, a variation of size 0.001 may not be of interest compared with the y range of the signal, which is about 1 (ymax = 4.148, ymin = 3.144). When the **-error-factor** is set to 0.01, then if the difference between two adjacent points is less than 1% (**-error-factor** 0.01) of the y range (i.e., 1), the points are dropped. Thus, **-error-factor** simplifies the raw data by making a small compromise on the information preserved in the publishing process.

**-rdp**

The Ramer-Douglas-Peucker algorithm decimates a curve composed of line segments to a similar curve with fewer points. It is similar to **-error-factor** in the sense that it takes an epsilon value that is then used as a tolerance to determine whether a point will be removed or not. Figure 25 shows the simplified signal with epsilon values 1 and 0.2.
A smaller epsilon value leads to fewer points being dropped; therefore, there always exists an epsilon value whose simplified data are close enough to the raw data.

.. figure:: _images/bms/rdp_example.png

   Simplified signal (red) with epsilon 1 (top) and 0.2 (bottom)

One way to reduce the effect of scale on the RDP method is to use the **-normalize** option, which applies min-max normalization to scale the raw data into a 1-by-1 box before performing the RDP algorithm. In this way, the variation in the optimal epsilon values for different data sets is much smaller.
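The three down-sampling options above can be sketched as follows, assuming simple list-based signals. This is an illustration of the rules as described, not the publisher's actual implementation; the RDP part is the textbook recursion, and the normalization follows the min-max scheme mentioned above.

```python
def rm_repeating(xs, ys):
    # -rm-repeating: drop interior points whose value equals both
    # neighbours, keeping the first and last point of every run.
    keep = [0] + [i for i in range(1, len(ys) - 1)
                  if not (ys[i - 1] == ys[i] == ys[i + 1])] + [len(ys) - 1]
    return [xs[i] for i in keep], [ys[i] for i in keep]

def error_factor_filter(xs, ys, factor):
    # -error-factor: drop a point when it differs from the previously kept
    # point by less than factor * (y_max - y_min); endpoints always remain.
    tol = factor * (max(ys) - min(ys))
    keep_x, keep_y = [xs[0]], [ys[0]]
    for x, y in zip(xs[1:-1], ys[1:-1]):
        if abs(y - keep_y[-1]) >= tol:
            keep_x.append(x)
            keep_y.append(y)
    keep_x.append(xs[-1])
    keep_y.append(ys[-1])
    return keep_x, keep_y

def _dist(p, a, b):
    # Perpendicular distance from point p to the line through a and b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = (dx * dx + dy * dy) ** 0.5
    if norm == 0.0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * px - dx * py + bx * ay - by * ax) / norm

def rdp(points, epsilon):
    # -rdp: keep the point farthest from the chord if its distance exceeds
    # epsilon, otherwise collapse the segment to its endpoints.
    if len(points) < 3:
        return list(points)
    dists = [_dist(p, points[0], points[-1]) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] > epsilon:
        return rdp(points[:idx + 1], epsilon)[:-1] + rdp(points[idx:], epsilon)
    return [points[0], points[-1]]

def normalized_rdp(points, epsilon):
    # -normalize idea: min-max scale x and y into a 1-by-1 box, run RDP
    # there, and return the surviving original points.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    def scale(v, lo, hi):
        return (v - lo) / (hi - lo) if hi > lo else 0.0
    normed = [(scale(x, min(xs), max(xs)), scale(y, min(ys), max(ys)))
              for x, y in points]
    kept = set(rdp(normed, epsilon))
    return [p for p, n in zip(points, normed) if n in kept]
```

For example, ``rm_repeating`` keeps only the endpoints of a constant run, and ``normalized_rdp`` lets one epsilon value behave consistently across signals whose y ranges differ by orders of magnitude.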