Load Balance Checker for MPP LS-DYNA

A simple C program is included that extracts the per-processor timing information from the D3HSP file and prints a load balance summary. Future versions of MPP LS-DYNA may include more information, such as the summary shown below.

Usage: load_balance_checker d3hsp_file_name {threshold_factor}
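
The optional threshold_factor argument has to be converted from text to a number before it can be used. The following is a minimal sketch of one way to do that with strtof; the helper name parse_threshold and the accepted range (strictly between 0 and 1) are assumptions made for illustration and are not part of the original program.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not in the original program): parse the optional
 * threshold_factor argument. Returns 0 on success and stores the value in
 * *out; returns -1 and leaves *out untouched if the text is not a number
 * strictly between 0 and 1 (that range is an assumption). */
int parse_threshold(const char *text, float *out)
{
    char *end = NULL;
    float value;

    errno = 0;
    value = strtof(text, &end);

    if (end == text || *end != '\0') return -1;     /* no digits, or trailing junk */
    if (errno == ERANGE) return -1;                  /* out of range for a float */
    if (!(value > 0.0f && value < 1.0f)) return -1;  /* also rejects NaN */

    *out = value;
    return 0;
}
```

In main, this could be called as parse_threshold(argv[2], &threshold_factor) when argc is 3, keeping the default of 0.2 whenever the call returns -1.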


Source Preview

/*
 Load Balance Diagnosis for MPP LS-DYNA
 Author: Suri Bala, Livermore Software
 Copyright: Livermore Software
 Credits: Brian Wainscott, Jason Wang
 Usage: exe_name d3hsp_file_name {threshold_factor}
 Description: Performs a load balance check for all processors
 Compilation: cc source_file_name.c -o exe_file_name
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_WIDTH 128

int main(int argc, char *argv[]) {

 FILE *d3hsp = NULL;
 char buffer[LINE_WIDTH];
 int i, num_threads=0, host_count=0;
 float *hosts = NULL, ratio, sum = 0, average_timing = 1, min_timing = 1e20, max_timing = -1e20;
 float threshold_factor = 0.2, max_threshold=0, min_threshold=0;

 // if no d3hsp file is specified, exit out
 if( argc == 1 ) {
    printf("Usage: %s d3hsp_file {threshold_factor}\n", argv[0]);
    return 1;
 }

 // open the d3hsp file
 d3hsp = fopen(argv[1], "r");
 if( !d3hsp ) {
   printf("Could not open file %s\n", argv[1]);
   return 1;
 }

 //if threshold is specified accept it
 if( argc==3) {
   // read the threshold from the command line (sscanf, not sprintf)
   sscanf(argv[2], "%f", &threshold_factor);
 }

 // start reading the lines from d3hsp file; fgets returns NULL at end of file
 while( fgets(buffer, LINE_WIDTH, d3hsp) ) {
     // get the num of threads
     if( strncmp(buffer+1,"Parallel", 8) == 0 )  {
         sscanf(buffer+25,"%d", &num_threads);
         hosts = (float *) malloc(num_threads*sizeof(float));
     }
     // store the host based timing (skip lines too short to hold a timing)
     if( hosts && host_count < num_threads && strlen(buffer) > 65 && buffer[25] == '#' ) {
         sscanf(buffer+65," %e ", &hosts[host_count]);
         if( hosts[host_count] > max_timing) max_timing = hosts[host_count];
         if( hosts[host_count] < min_timing) min_timing = hosts[host_count];
         sum += hosts[host_count];
         host_count++;
     }
 }
 fclose(d3hsp);

 average_timing = sum/num_threads;
 max_threshold = (float)1.0+threshold_factor;
 min_threshold = (float)1.0-threshold_factor;

 fprintf(stdout, "\n  Load Balance Summary \n\n");
 fprintf(stdout, "     Processor Number            Ratio           Status            Remarks\n");
 fprintf(stdout, " --------------------------------------------------------------------------------------------\n");
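 for(i=0; i<num_threads; i++) {
     // ratio of this processor's CPU time to the average
     ratio = hosts[i]/average_timing;
     // column widths chosen to line up with the sample output
     fprintf(stdout, "%15d%22.2f", i, ratio);
     if( ratio > max_threshold) fprintf(stdout,"%20s%10s%20s","Overloaded"," "," Better decomposition is needed");
     if( ratio < min_threshold ) fprintf(stdout,"%20s%10s%20s","Underloaded"," "," Better decomposition is needed");
     fprintf(stdout, "\n");
 }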
 fprintf(stdout, " --------------------------------------------------------------------------------------------\n");
 fprintf(stdout, "     Total number of threads: %10d\n", num_threads);
 fprintf(stdout, "     Timing \n");
 fprintf(stdout, "     Average CPU (seconds)  : %10.4f\n", average_timing);
 fprintf(stdout, "     Maximum CPU (seconds)  : %10.4f\n", max_timing);
 fprintf(stdout, "     Minimum CPU (seconds)  : %10.4f\n", min_timing);
 fprintf(stdout, "     Thresholds \n");
 fprintf(stdout, "     Maximum threshold used : %10.4f\n", max_threshold);
 fprintf(stdout, "     Minimum threshold used : %10.4f\n", min_threshold);
 fprintf(stdout, "\n\n");

 free(hosts);
 return 0;
}



Sample Output

  Load Balance Summary

     Processor Number            Ratio           Status            Remarks
              0                  0.81
              1                  0.86
              2                  0.87
              3                  1.00
              4                  1.36          Overloaded           Better decomposition is needed
              5                  1.02
              6                  1.11
              7                  1.34          Overloaded           Better decomposition is needed
              8                  1.14
              9                  0.89
             10                  0.82
             11                  0.78         Underloaded           Better decomposition is needed
     Total number of threads:         12
     Average CPU (seconds)  : 31791.2500
     Maximum CPU (seconds)  : 43180.0000
     Minimum CPU (seconds)  : 24829.0000
     Maximum threshold used :     1.2000
     Minimum threshold used :     0.8000
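
The Status column follows from comparing each processor's ratio (its CPU time divided by the average) against 1 plus or minus the threshold factor. A minimal sketch of that rule is shown here; the names load_status, load_ratio, and classify_load are hypothetical and introduced only for illustration, since the program itself prints the status directly.

```c
/* Hypothetical names for illustration only; the original program prints
 * the status directly instead of returning it. */
typedef enum { BALANCED, OVERLOADED, UNDERLOADED } load_status;

/* Ratio of one processor's CPU time to the average over all processors. */
double load_ratio(double cpu_seconds, double average_seconds)
{
    return cpu_seconds / average_seconds;
}

/* A processor is flagged when its ratio leaves the band
 * [1 - threshold_factor, 1 + threshold_factor]. */
load_status classify_load(double ratio, double threshold_factor)
{
    if (ratio > 1.0 + threshold_factor) return OVERLOADED;
    if (ratio < 1.0 - threshold_factor) return UNDERLOADED;
    return BALANCED;
}
```

With the sample values, the maximum CPU time gives 43180 / 31791.25 ≈ 1.36, which is above 1.2 and therefore overloaded, while the minimum gives 24829 / 31791.25 ≈ 0.78, which is below 0.8 and therefore underloaded.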

  • LowTower says:

    Hello Suri,

    I tried to compile the program with gcc under HP-UX.

    It results in the following errors:

    load_balance_check.c: In function `main':
    load_balance_check.c:23: parse error before `float'
    load_balance_check.c:41: `threshold_factor' undeclared (first use this function)
    load_balance_check.c:41: (Each undeclared identifier is reported only once
    load_balance_check.c:41: for each function it appears in.)
    load_balance_check.c:63: `max_threshold' undeclared (first use this function)
    load_balance_check.c:64: `min_threshold' undeclared (first use this function)

    I don’t know C very well, but it seems to me that there is one “;” too many in line 22!


  • LowTower says:

    Hello Suri,

    me again!

    Now I get the following:

    loadbalance_checker.c:11: `#include' expects "FILENAME" or <FILENAME>
    loadbalance_checker.c:12: `#include' expects "FILENAME" or <FILENAME>
    loadbalance_checker.c:13: `#include' expects "FILENAME" or <FILENAME>

    The same with “cc”.


  • Suri Bala says:

    Hi LowTower,

    I am not sure about this, since it compiles fine on Linux-based systems.
    I suggest downloading the code directly instead of copying it from the preview. This may help avoid text formatting issues. If this does not work, please let me know and I can try to create an executable for HP-UX.


  • Andy H says:

    I understand what you are calculating with the load checker, and I can see that my loads are reasonably balanced. However, I currently have a problem: I reduced my model from 1,050,000 elements to approximately 600,000, running on 16 CPUs, and the load balance is even, but my time per cycle has increased (hence a longer run time).

    So although a load balance analysis is a good idea, it doesn’t help in comparing model calculation performance.

    So my question is: is it possible to determine, in an MPP run, what is influencing the time per cycle? I think this is an area that would help people better optimise their runs on MPP.

    For example, in structural load cases, more CPUs in MPP mean reduced runtime (with global contacts). However, in occupant load cases, more CPUs in MPP mean increased runtime (with global contacts and many small contacts). Hence, I am interested in understanding what information in the DYNA ASCII files can help me determine how best to optimise my modelling.

  • Suri Bala says:


    That’s a good point. The load balance check looks at the problem as a whole but does not consider the dynamic TPZ (time per zone). There are several ways to look at this; I wrote about it a while ago, and perhaps it may help.

    Time per zone


  • kevinzeng says:

    This part may not work as expected:

    //if threshold is specified accept it
    if( argc==3) {
    sprintf(argv[2], "%10.4f", threshold_factor);
    }

    Perhaps it should be:

    if( argc==3) {
    sscanf(argv[2], "%f", &threshold_factor);
    }

  • Suri Bala says:


    Thanks for pointing this out. I will incorporate the fix.

