Galois
Profiling Galois Code

When optimizing Galois applications, you may need an external profiler to understand their performance at the microarchitectural level.

Galois currently supports profiling with Intel VTune and PAPI. To use either one, include the header galois/runtime/Profile.h and instrument your code as described in the following sections.

Profiling with Intel VTune

Intel VTune is a profiling tool offered by Intel in Intel Parallel Studio.

Enable Intel VTune support by running cmake with the -DGALOIS_ENABLE_VTUNE=1 option. Instrument the code region of interest with galois::runtime::profileVtune, which expects two arguments: (1) the code region to be profiled, given as a callable such as a lambda expression or functor, and (2) a name for the code region. Below is an example of profiling the node-iterator algorithm for triangle counting with Intel VTune:

galois::runtime::profileVtune(
    // (1) Code region to profile: a parallel loop over the graph
    [&]() {
      galois::do_all(
          galois::iterate(graph),
          [&](const GNode& n) {
            // Partition neighbors
            // [first, ea) [n] [bb, last)
            Graph::edge_iterator first =
                graph.edge_begin(n, galois::MethodFlag::UNPROTECTED);
            Graph::edge_iterator last =
                graph.edge_end(n, galois::MethodFlag::UNPROTECTED);
            Graph::edge_iterator ea =
                lowerBound(first, last, LessThan<Graph>(graph, n));
            Graph::edge_iterator bb =
                lowerBound(first, last, GreaterThanOrEqual<Graph>(graph, n));
            for (; bb != last; ++bb) {
              GNode B = graph.getEdgeDst(bb);
              for (auto aa = first; aa != ea; ++aa) {
                GNode A = graph.getEdgeDst(aa);
                Graph::edge_iterator vv =
                    graph.edge_begin(A, galois::MethodFlag::UNPROTECTED);
                Graph::edge_iterator ev =
                    graph.edge_end(A, galois::MethodFlag::UNPROTECTED);
                Graph::edge_iterator it =
                    lowerBound(vv, ev, LessThan<Graph>(graph, B));
                // A, B, and n form a triangle if A has an edge to B
                if (it != ev && graph.getEdgeDst(it) == B) {
                  numTriangles += 1;
                }
              }
            }
          },
          galois::loopname("nodeIteratingAlgo"));
    },
    // (2) Name of the profiled region
    "nodeIteratorAlgo");

Compile your code and run with Intel VTune to collect statistics.
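
The calling convention described above (a callable for the region plus a region name) can be illustrated without Galois or VTune. The following is a minimal sketch only: profileRegion, regionTimes, and sumUpTo are hypothetical names invented for this illustration, and the wall-clock timing is a stand-in for whatever the real profiler collects. It is not how galois::runtime::profileVtune is implemented.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a profiling backend: region name -> elapsed ms.
// (Galois reports through its own statistics runtime instead.)
inline std::map<std::string, double>& regionTimes() {
  static std::map<std::string, double> times;
  return times;
}

// Hypothetical helper with the same argument shape as
// galois::runtime::profileVtune: (1) a callable wrapping the code region,
// (2) a name for the region.
template <typename F>
void profileRegion(F&& region, const std::string& name) {
  assert(!name.empty() && "a region needs a name to be reported under");
  const auto start = std::chrono::steady_clock::now();
  std::forward<F>(region)(); // run the code region of interest
  const auto stop = std::chrono::steady_clock::now();
  regionTimes()[name] +=
      std::chrono::duration<double, std::milli>(stop - start).count();
}

// Example use: sum 0..n-1 inside a named, profiled region.
inline long sumUpTo(long n) {
  long total = 0;
  profileRegion(
      [&]() {
        for (long i = 0; i < n; ++i) {
          total += i;
        }
      },
      "sumLoop");
  return total;
}
```

Because the lambda captures by reference, the region can update variables of the enclosing function, just as the triangle-counting loop updates numTriangles.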

Profiling with PAPI

PAPI stands for "Performance Application Programming Interface".

Enable PAPI support by running cmake with the -DGALOIS_ENABLE_PAPI=1 option. Instrument the code region of interest with galois::runtime::profilePapi, which expects two arguments: (1) the code region to be profiled, given as a callable such as a lambda expression or functor, and (2) a name for the code region. Below is an example of profiling the edge-iterator algorithm for triangle counting with PAPI:

galois::runtime::profilePapi(
    // (1) Code region to profile: a parallel loop over the work items
    [&]() {
      galois::do_all(
          galois::iterate(items),
          [&](const WorkItem& w) {
            // Compute intersection of range (w.src, w.dst) in neighbors of
            // w.src and w.dst
            Graph::edge_iterator abegin =
                graph.edge_begin(w.src, galois::MethodFlag::UNPROTECTED);
            Graph::edge_iterator aend =
                graph.edge_end(w.src, galois::MethodFlag::UNPROTECTED);
            Graph::edge_iterator bbegin =
                graph.edge_begin(w.dst, galois::MethodFlag::UNPROTECTED);
            Graph::edge_iterator bend =
                graph.edge_end(w.dst, galois::MethodFlag::UNPROTECTED);
            Graph::edge_iterator aa = lowerBound(
                abegin, aend, GreaterThanOrEqual<Graph>(graph, w.src));
            Graph::edge_iterator ea =
                lowerBound(abegin, aend, LessThan<Graph>(graph, w.dst));
            Graph::edge_iterator bb = lowerBound(
                bbegin, bend, GreaterThanOrEqual<Graph>(graph, w.src));
            Graph::edge_iterator eb =
                lowerBound(bbegin, bend, LessThan<Graph>(graph, w.dst));
            numTriangles += countEqual(graph, aa, ea, bb, eb);
          },
          galois::loopname("edgeIteratingAlgo"));
    },
    // (2) Name of the profiled region
    "edgeIteratorAlgo");

Compile your code and run it with the PAPI counters you want to collect listed, comma-separated, in the GALOIS_PAPI_EVENTS environment variable. Below is an example command line:

$> GALOIS_PAPI_EVENTS="PAPI_L1_DCM,PAPI_L2_DCM,PAPI_BR_MSP,PAPI_TOT_INS,PAPI_TOT_CYC" ./triangles input_graph -algo edgeiterator -t 24
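
The event list in the command above is a comma-separated string passed through the environment. How Galois parses it internally is not described here; the sketch below only illustrates one way such a list could be read and split. The helper names splitEventList and eventsFromEnv are hypothetical and not part of Galois or PAPI.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated list such as "PAPI_L1_DCM,PAPI_L2_DCM" into names.
// Empty entries (for example from a trailing comma) are skipped.
inline std::vector<std::string> splitEventList(const std::string& list) {
  std::vector<std::string> events;
  std::istringstream in(list);
  std::string name;
  while (std::getline(in, name, ',')) {
    if (!name.empty()) {
      events.push_back(name);
    }
  }
  return events;
}

// Read the list from an environment variable; returns an empty vector if the
// variable is not set.
inline std::vector<std::string> eventsFromEnv(const char* variable) {
  assert(variable != nullptr);
  const char* value = std::getenv(variable);
  return value ? splitEventList(value) : std::vector<std::string>{};
}
```

Calling eventsFromEnv("GALOIS_PAPI_EVENTS") in a process started with the command above would yield the five event names in the order given.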

Upon program termination, the values of the PAPI counters are reported along with the other statistics in CSV output, similar to the following:

STAT_TYPE, REGION, CATEGORY, TOTAL_TYPE, TOTAL
STAT, PageAlloc, MeminfoPre, TSUM, 53
STAT, PageAlloc, MeminfoPost, TSUM, 122
STAT, Initialize, Iterations, TSUM, 264346
STAT, Initialize, Time, TMAX, 17
STAT, edgeIteratingAlgo, Iterations, TSUM, 730100
STAT, edgeIteratingAlgo, Time, TMAX, 21
STAT, edgeIteratorAlgo, Time, TMAX, 21
STAT, edgeIteratorAlgo, PAPI_L1_DCM, TSUM, 613659
STAT, edgeIteratorAlgo, PAPI_L2_DCM, TSUM, 368932
STAT, edgeIteratorAlgo, PAPI_BR_MSP, TSUM, 1901191
STAT, edgeIteratorAlgo, PAPI_TOT_INS, TSUM, 293743102
STAT, edgeIteratorAlgo, PAPI_TOT_CYC, TSUM, 548013881
STAT, (NULL), Time, TMAX, 263
STAT, (NULL), GraphReadingTime, TMAX, 26
...

Note that the PAPI counters are reported as categories for the region "edgeIteratorAlgo", the name provided to the galois::runtime::profilePapi call.
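
Raw counter totals are often easier to interpret as ratios. The sketch below, which assumes the five-column CSV layout shown above has been captured into a string, collects the rows of one region and derives instructions per cycle from PAPI_TOT_INS and PAPI_TOT_CYC. The helpers trim, readRegion, and instructionsPerCycle are hypothetical names for this illustration and are not part of Galois.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Trim leading and trailing spaces from a CSV field.
inline std::string trim(const std::string& s) {
  const auto first = s.find_first_not_of(' ');
  if (first == std::string::npos) {
    return "";
  }
  const auto last = s.find_last_not_of(' ');
  return s.substr(first, last - first + 1);
}

// Collect CATEGORY -> TOTAL for all STAT rows that belong to `region`.
inline std::map<std::string, double> readRegion(const std::string& csv,
                                                const std::string& region) {
  std::map<std::string, double> totals;
  std::istringstream lines(csv);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string field[5];
    int n = 0;
    while (n < 5 && std::getline(fields, field[n], ',')) {
      field[n] = trim(field[n]);
      ++n;
    }
    // Skip the header, incomplete rows, and rows for other regions.
    if (n != 5 || field[0] != "STAT" || field[1] != region) {
      continue;
    }
    totals[field[2]] = std::stod(field[4]);
  }
  return totals;
}

// Instructions per cycle from the PAPI_TOT_INS and PAPI_TOT_CYC totals.
inline double instructionsPerCycle(const std::map<std::string, double>& t) {
  assert(t.count("PAPI_TOT_INS") && t.count("PAPI_TOT_CYC"));
  return t.at("PAPI_TOT_INS") / t.at("PAPI_TOT_CYC");
}
```

For the sample output above, readRegion(csv, "edgeIteratorAlgo") picks up the PAPI rows, and 293743102 instructions over 548013881 cycles gives roughly 0.54 instructions per cycle.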