Galois
|
For performance analysis, programmers instrument pieces of codes to collect program statistics.
Galois relieves developers from this burden by providing (1) built-in and customizable instrumentation, and (2) friendly output of statistics.
Parallel loop iterators in Galois provide built-in instrumentation. Upon termination, Galois apps will output statistics in csv format, similar to the following:
STAT_TYPE, REGION, CATEGORY, TOTAL_TYPE, TOTAL
STAT, for_each_1, Iterations, TSUM, 9028387
STAT, for_each_1, Time, TMAX, 1663
STAT, for_each_1, Commits, TSUM, 9000000
STAT, for_each_1, Pushes, TSUM, 0
STAT, for_each_1, Conflicts, TSUM, 28387
The first row is the header of the csv output. REGION tells you which parallel loop the statistics are related. For example, "for_each_1" corresponds to the parallel loop which has an option of galois::loopname("for_each_1").
CATEGORY specifies what are being reported for the parallel region. For galois::for_each loops, the following five statistics are reported:
For galois::do_all loops, only time and iterations are reported, since there are no conflicts and pushes in galois::do_all loops.
TOTAL_TYPE tells you how the statistics are derived. TSUM means that the value is the sum of all iterations' contributions; TMAX means it is the maximum among all threads for this statistic. Apart from TMAX and TSUM, Galois offers the following derivation of statistics:
For lonestar apps, pass -statFile path_to_csv_file as part of the command-line arguments to redirect the output of statistics of the program to file path_to_csv_file.
Users can choose the amount of output statistics reported in the following ways.
Passing galois::no_stats as an option to the galois::do_all or galois::for_each call if the loopname is required, as shown in the following code snippet.
Report per-thread contribution by setting the environmental variable "PRINT_PER_THREAD_STATS" before executing a Galois app. Below is an example execution asking for per-thread stats:
$> PRINT_PER_THREAD_STATS=1 ./sssp input_graph -t 8
And below is the output with per-thread stats.
STAT_TYPE, REGION, CATEGORY, TOTAL_TYPE, TOTAL
STAT, PageAlloc, MeminfoPre, TSUM, 21
STAT, PageAlloc, MeminfoPre, ThreadValues, 7; 2; 2; 2; 2; 2; 2; 2
STAT, PageAlloc, MeminfoPost, TSUM, 43
STAT, PageAlloc, MeminfoPost, ThreadValues, 8; 5; 5; 5; 5; 5; 5; 5
STAT, SSSP, Iterations, TSUM, 482052
STAT, SSSP, Iterations, ThreadValues, 80602; 71506; 87690; 58227; 53784; 77878; 27112; 25253
STAT, SSSP, Commits, TSUM, 482052
STAT, SSSP, Commits, ThreadValues, 80602; 71506; 87690; 58227; 53784; 77878; 27112; 25253
STAT, SSSP, Pushes, TSUM, 482051
STAT, SSSP, Pushes, ThreadValues, 81305; 71954; 87562; 57907; 54168; 77942; 26152; 25061
STAT, SSSP, Conflicts, TSUM, 0
STAT, SSSP, Conflicts, ThreadValues, 0; 0; 0; 0; 0; 0; 0; 0
STAT, SSSP, Time, TMAX, 19
STAT, SSSP, Time, ThreadValues 19
STAT, (NULL), Time, TMAX, 19
STAT, (NULL), Time, ThreadValues, 19
...
Note the lines with TOTAL_TYPE of "ThreadValues" report sequences of statistics for threads 0 to 7 in this example. Per-thread values are reported only for stats aggregated by TSUM operation.
Monitor algorithm-specific statistics with the following steps.
Let us use single-source shortest path (SSSP) problem to illustrate the use of self-defined statistics. In SSSP, "bad works" mean the number of iterations in which the node distance is lowered more than once. Since the amount of bad works is the sum of the bad works encountered by each thread, bad works should be collected and reduced using sum operation, which calls for a galois::GAccumulator. We declare the collector before the parallel loop for SSSP as the following:
Then in the operator for the parallel loop for SSSP, collect per-thread contribution of bad works as the following:
Finally, after the parallel loop for SSSP is done, report the number of bad works as the following:
In the above example, the TOTAL_TYPE for BadWork is SINGLE.
galois::runtime::reportStat_Single can be used if the statistic is reduced to a single value before being reported. Other stat reporting functions are galois::runtime::reportStat_Tmax, galois::runtime::reportStat_Tmin, galois::runtime::reportStat_Tsum, and galois::runtime::reportStat_Tavg, but they are rarely used in user code. The suffix of a stat reporitng function indicates the TOTAL_TYPE for the statistic being reported. All the stat reporting functions take three parameters: name for the region, name for the statistic, and the value to be aggregated/reported.
Conditional reporting of a statistic can be achieved by using conditional stat reporting functions, i.e. galois::runtime::reportStatCond_Single, galois::runtime::reportStatCond_Tmax, galois::runtime::reportStatCond_Tmin, galois::runtime::reportStatCond_Tsum, and galois::runtime::reportStatCond_Tavg. The conditional stat reporting functions accept a constant expression as a template parameter, and the reporting is turned on only when the constant expression is evaluated to true. Below is an code snippet showing the use of reportStatCond_Single:
The runtime of a serial code region, in milliseconds, can be measured by enclosing the code region in between calls to galois::StatTimer::start and galois::StatTimer::stop. The following code snippet shows an example from lonestar/tutorial_examples/GraphTraversalSerial.cpp:
After program termination, the timer will output statistics like the following:
STAT_TYPE, REGION, CATEGORY, TOTAL_TYPE, TOTAL
STAT, (NULL), sum_serial, TMAX, 15
Note that the name fed to the timer is printed as a category under the region "(NULL)".