Galois
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Pages
Output Statistics of Galois Programs

For performance analysis, programmers instrument pieces of codes to collect program statistics.

Galois relieves developers from this burden by providing (1) built-in and customizable instrumentation, and (2) friendly output of statistics.

Basic Statistics

Parallel loop iterators in Galois provide built-in instrumentation. Upon termination, Galois apps will output statistics in csv format, similar to the following:

STAT_TYPE, REGION, CATEGORY, TOTAL_TYPE, TOTAL
STAT, for_each_1, Iterations, TSUM, 9028387
STAT, for_each_1, Time, TMAX, 1663
STAT, for_each_1, Commits, TSUM, 9000000
STAT, for_each_1, Pushes, TSUM, 0
STAT, for_each_1, Conflicts, TSUM, 28387

The first row is the header of the csv output. REGION tells you which parallel loop the statistics are related. For example, "for_each_1" corresponds to the parallel loop which has an option of galois::loopname("for_each_1").

CATEGORY specifies what are being reported for the parallel region. For galois::for_each loops, the following five statistics are reported:

  1. Iterations: the number of iterations executed
  2. Time: the runtime, in milliseconds
  3. Commits: the number of iterations committed
  4. Pushes: the number of iterations generated
  5. Conflicts: the number of iterations aborted due to conflicts

For galois::do_all loops, only time and iterations are reported, since there are no conflicts and pushes in galois::do_all loops.

TOTAL_TYPE tells you how the statistics are derived. TSUM means that the value is the sum of all iterations' contributions; TMAX means it is the maximum among all threads for this statistic. Apart from TMAX and TSUM, Galois offers the following derivation of statistics:

  • TMIN: the value is the minimum among all threads for this statistic.
  • TAVG: the value is the average of all contributions from all iterations to this statistic.
  • SINGLE: a single value is returned for this statistic. This is usually used for user-defined statistics. See Self-defined Statistics for examples.
  • ThreadValues: contribution of each thread to this statistic. See Advanced Control of Output Statistics for examples.

For lonestar apps, pass -statFile path_to_csv_file as part of the command-line arguments to redirect the output of statistics of the program to file path_to_csv_file.

Advanced Control of Output Statistics

Users can choose the amount of output statistics reported in the following ways.

  1. Turn off reporting statistics for a parallel loop by
  2. Report more statistics for a parallel loop by passing galois::more_stats as an option to the galois::do_all or galois::for_each call.
  3. Report per-thread contribution by setting the environmental variable "PRINT_PER_THREAD_STATS" before executing a Galois app. Below is an example execution asking for per-thread stats:

    $> PRINT_PER_THREAD_STATS=1 ./sssp input_graph -t 8

    And below is the output with per-thread stats.

    STAT_TYPE, REGION, CATEGORY, TOTAL_TYPE, TOTAL
    STAT, PageAlloc, MeminfoPre, TSUM, 21
    STAT, PageAlloc, MeminfoPre, ThreadValues, 7; 2; 2; 2; 2; 2; 2; 2
    STAT, PageAlloc, MeminfoPost, TSUM, 43
    STAT, PageAlloc, MeminfoPost, ThreadValues, 8; 5; 5; 5; 5; 5; 5; 5
    STAT, SSSP, Iterations, TSUM, 482052
    STAT, SSSP, Iterations, ThreadValues, 80602; 71506; 87690; 58227; 53784; 77878; 27112; 25253
    STAT, SSSP, Commits, TSUM, 482052
    STAT, SSSP, Commits, ThreadValues, 80602; 71506; 87690; 58227; 53784; 77878; 27112; 25253
    STAT, SSSP, Pushes, TSUM, 482051
    STAT, SSSP, Pushes, ThreadValues, 81305; 71954; 87562; 57907; 54168; 77942; 26152; 25061
    STAT, SSSP, Conflicts, TSUM, 0
    STAT, SSSP, Conflicts, ThreadValues, 0; 0; 0; 0; 0; 0; 0; 0
    STAT, SSSP, Time, TMAX, 19
    STAT, SSSP, Time, ThreadValues 19
    STAT, (NULL), Time, TMAX, 19
    STAT, (NULL), Time, ThreadValues, 19
    ...

    Note the lines with TOTAL_TYPE of "ThreadValues" report sequences of statistics for threads 0 to 7 in this example. Per-thread values are reported only for stats aggregated by TSUM operation.

Self-defined Statistics

Monitor algorithm-specific statistics with the following steps.

  1. Identify the aggregation of the desired statistics as reduction operations, e.g. sum, min, max, etc.
  2. Declare the reducible objects for the statistics before the parallel loop of interest. See Reduction Operations for reducible objects available in Galois.
  3. Collect per-thread contribution for the statistics in the parallel loop of interest.
  4. Reduce and report the statistics after the parallel loop of interest.

Let us use single-source shortest path (SSSP) problem to illustrate the use of self-defined statistics. In SSSP, "bad works" mean the number of iterations in which the node distance is lowered more than once. Since the amount of bad works is the sum of the bad works encountered by each thread, bad works should be collected and reduced using sum operation, which calls for a galois::GAccumulator. We declare the collector before the parallel loop for SSSP as the following:

Then in the operator for the parallel loop for SSSP, collect per-thread contribution of bad works as the following:

if (oldDist != SSSP::DIST_INFINITY) {
BadWork += 1;
}

Finally, after the parallel loop for SSSP is done, report the number of bad works as the following:

galois::runtime::reportStat_Single("SSSP", "BadWork", BadWork.reduce());

In the above example, the TOTAL_TYPE for BadWork is SINGLE.

galois::runtime::reportStat_Single can be used if the statistic is reduced to a single value before being reported. Other stat reporting functions are galois::runtime::reportStat_Tmax, galois::runtime::reportStat_Tmin, galois::runtime::reportStat_Tsum, and galois::runtime::reportStat_Tavg, but they are rarely used in user code. The suffix of a stat reporitng function indicates the TOTAL_TYPE for the statistic being reported. All the stat reporting functions take three parameters: name for the region, name for the statistic, and the value to be aggregated/reported.

Conditional reporting of a statistic can be achieved by using conditional stat reporting functions, i.e. galois::runtime::reportStatCond_Single, galois::runtime::reportStatCond_Tmax, galois::runtime::reportStatCond_Tmin, galois::runtime::reportStatCond_Tsum, and galois::runtime::reportStatCond_Tavg. The conditional stat reporting functions accept a constant expression as a template parameter, and the reporting is turned on only when the constant expression is evaluated to true. Below is an code snippet showing the use of reportStatCond_Single:

// use this line for production code
constexpr bool TRACE_WORK=false;
// use this line for profiling code
//constexpr bool TRACE_WORK=true;
galois::runtime::reportStatCond_Single< TRACE_WORK > ("SSSP", "BadWork", BadWork.reduce());

Timing Serial Code Region

The runtime of a serial code region, in milliseconds, can be measured by enclosing the code region in between calls to galois::StatTimer::start and galois::StatTimer::stop. The following code snippet shows an example from lonestar/tutorial_examples/GraphTraversalSerial.cpp:

//******************************************************
// serial traversal over a graph
// sum over nodes and edges in C++11 syntax
galois::StatTimer T("sum_serial");
T.start();
// iterate over nodes
for (auto n : g) {
auto& sum = g.getData(n); // get node data of n
sum = 0;
// iterate over edges from node n
for (auto e : g.edges(n)) {
sum += g.getEdgeData(e); // get edge data of e
}
}
T.stop();

After program termination, the timer will output statistics like the following:

STAT_TYPE, REGION, CATEGORY, TOTAL_TYPE, TOTAL
STAT, (NULL), sum_serial, TMAX, 15

Note that the name fed to the timer is printed as a category under the region "(NULL)".