Galois
When optimizing Galois applications, you may need an external profiler to understand performance at the micro-architectural level.
Galois currently supports profiling with Intel VTune and PAPI. To use either one, include the header galois/runtime/Profile.h and instrument your code as described in the following sections.
Intel VTune is a profiling tool offered by Intel in Intel Parallel Studio.
Turn on the use of Intel VTune by running cmake with the -DGALOIS_ENABLE_VTUNE=1 option. Instrument the code region of interest with galois::runtime::profileVtune, which expects two arguments: (1) the code region to be profiled, given as a callable such as a lambda expression or functor, and (2) a name for the code region. Below is an example of profiling the node-iterator algorithm for triangle counting with Intel VTune:
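A minimal sketch of this calling pattern, under stated assumptions: so that it compiles without Galois, it uses a hypothetical stand-in `sketch::profileVtune` that only mirrors the two-argument shape described above (a callable and a region name) and performs no profiling. The `Graph` type, the serial loop, and the region name "nodeIteratorAlgo" are likewise illustrative and not taken from the Galois sources. In a real build you would include galois/runtime/Profile.h and call galois::runtime::profileVtune instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in, NOT the Galois implementation: it runs the region
// and ignores the name. Replace with galois::runtime::profileVtune.
namespace sketch {
template <typename F>
void profileVtune(const F& region, const char* /*regionName*/) {
  region();
}
} // namespace sketch

// Undirected graph as sorted adjacency lists (illustrative, not a Galois graph).
using Graph = std::vector<std::vector<std::size_t>>;

// Serial node-iterator triangle counting: for every node v, test each pair of
// neighbors (u, w) with v < u < w for an edge between u and w.
std::size_t countTrianglesNodeIterator(const Graph& g) {
  std::size_t numTriangles = 0;
  sketch::profileVtune(
      [&] {
        for (std::size_t v = 0; v < g.size(); ++v) {
          const auto& nbrs = g[v];
          assert(std::is_sorted(nbrs.begin(), nbrs.end()));
          for (std::size_t i = 0; i < nbrs.size(); ++i) {
            if (nbrs[i] <= v) continue; // count each triangle at its smallest node
            for (std::size_t j = i + 1; j < nbrs.size(); ++j) {
              const auto& other = g[nbrs[i]];
              if (std::binary_search(other.begin(), other.end(), nbrs[j]))
                ++numTriangles;
            }
          }
        }
      },
      "nodeIteratorAlgo"); // region name chosen for this sketch
  return numTriangles;
}
```

Everything inside the lambda is the profiled region; the work before and after the call (graph loading, reporting) stays outside it, which keeps the collected data focused on the algorithm itself.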
Compile your code and run it under Intel VTune to collect statistics.
PAPI stands for "Performance Application Programming Interface".
Turn on the use of PAPI by running cmake with the -DGALOIS_ENABLE_PAPI=1 option. Instrument the code region of interest with galois::runtime::profilePapi, which expects two arguments: (1) the code region to be profiled, given as a callable such as a lambda expression or functor, and (2) a name for the code region. Below is an example of profiling the edge-iterator algorithm for triangle counting with PAPI:
Compile your code and run it with the list of PAPI counters you want to collect, passed through the GALOIS_PAPI_EVENTS environment variable. Below is an example command line:
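A minimal sketch of the same pattern for PAPI, again under stated assumptions: `sketch::profilePapi` is a hypothetical stand-in that only mirrors the two-argument shape described above and collects no counters, and the `Graph` type and serial loop are illustrative. The region name "edgeIteratorAlgo" matches the one that appears in the sample output in this section. In a real build you would include galois/runtime/Profile.h and call galois::runtime::profilePapi instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in, NOT the Galois implementation: it runs the region
// and ignores the name. Replace with galois::runtime::profilePapi.
namespace sketch {
template <typename F>
void profilePapi(const F& region, const char* /*regionName*/) {
  region();
}
} // namespace sketch

// Undirected graph as sorted adjacency lists (illustrative, not a Galois graph).
using Graph = std::vector<std::vector<std::size_t>>;

// Serial edge-iterator triangle counting: for every edge (u, v) with u < v,
// count the common neighbors w of u and v with w > v.
std::size_t countTrianglesEdgeIterator(const Graph& g) {
  std::size_t numTriangles = 0;
  sketch::profilePapi(
      [&] {
        for (std::size_t u = 0; u < g.size(); ++u) {
          assert(std::is_sorted(g[u].begin(), g[u].end()));
          for (std::size_t v : g[u]) {
            if (v <= u) continue; // visit each undirected edge once
            // Intersect the parts of both sorted lists that lie above v.
            auto a = std::upper_bound(g[u].begin(), g[u].end(), v);
            auto b = std::upper_bound(g[v].begin(), g[v].end(), v);
            while (a != g[u].end() && b != g[v].end()) {
              if (*a < *b) {
                ++a;
              } else if (*b < *a) {
                ++b;
              } else {
                ++numTriangles;
                ++a;
                ++b;
              }
            }
          }
        }
      },
      "edgeIteratorAlgo");
  return numTriangles;
}
```

The name passed as the second argument is what ties the reported counters back to this region in the statistics output.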
$> GALOIS_PAPI_EVENTS="PAPI_L1_DCM,PAPI_L2_DCM,PAPI_BR_MSP,PAPI_TOT_INS,PAPI_TOT_CYC" ./triangles input_graph -algo edgeiterator -t 24
Upon program termination, the values of the PAPI counters are reported along with the other statistics in CSV output, similar to the following:
STAT_TYPE, REGION, CATEGORY, TOTAL_TYPE, TOTAL
STAT, PageAlloc, MeminfoPre, TSUM, 53
STAT, PageAlloc, MeminfoPost, TSUM, 122
STAT, Initialize, Iterations, TSUM, 264346
STAT, Initialize, Time, TMAX, 17
STAT, edgeIteratingAlgo, Iterations, TSUM, 730100
STAT, edgeIteratingAlgo, Time, TMAX, 21
STAT, edgeIteratorAlgo, Time, TMAX, 21
STAT, edgeIteratorAlgo, PAPI_L1_DCM, TSUM, 613659
STAT, edgeIteratorAlgo, PAPI_L2_DCM, TSUM, 368932
STAT, edgeIteratorAlgo, PAPI_BR_MSP, TSUM, 1901191
STAT, edgeIteratorAlgo, PAPI_TOT_INS, TSUM, 293743102
STAT, edgeIteratorAlgo, PAPI_TOT_CYC, TSUM, 548013881
STAT, (NULL), Time, TMAX, 263
STAT, (NULL), GraphReadingTime, TMAX, 26
...
Note that the PAPI counters are reported as categories for the region "edgeIteratorAlgo", the name provided to the galois::runtime::profilePapi call.
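The raw counters are often easier to interpret as derived metrics, for example instructions per cycle (PAPI_TOT_INS divided by PAPI_TOT_CYC). The following is a minimal post-processing sketch, assuming the five-column comma-separated layout shown above; `readRegionStats` and `instructionsPerCycle` are hypothetical helper names and are not part of Galois.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace from a CSV field.
static std::string trim(const std::string& s) {
  const auto first = s.find_first_not_of(" \t\r");
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(" \t\r");
  return s.substr(first, last - first + 1);
}

// Collect CATEGORY -> TOTAL for every STAT row whose REGION equals `region`.
// The header row, truncated rows, and rows of other regions are skipped.
std::map<std::string, std::uint64_t> readRegionStats(const std::string& csv,
                                                     const std::string& region) {
  std::map<std::string, std::uint64_t> stats;
  std::istringstream lines(csv);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string f[5]; // STAT_TYPE, REGION, CATEGORY, TOTAL_TYPE, TOTAL
    bool complete = true;
    for (auto& field : f) {
      if (!std::getline(fields, field, ',')) {
        complete = false;
        break;
      }
      field = trim(field);
    }
    if (!complete || f[0] != "STAT" || f[1] != region) continue;
    stats[f[2]] = std::stoull(f[4]);
  }
  return stats;
}

// Instructions per cycle from the two PAPI totals; throws std::out_of_range
// if either counter was not collected for the region.
double instructionsPerCycle(const std::map<std::string, std::uint64_t>& stats) {
  const std::uint64_t cycles = stats.at("PAPI_TOT_CYC");
  assert(cycles != 0);
  return static_cast<double>(stats.at("PAPI_TOT_INS")) /
         static_cast<double>(cycles);
}
```

Applied to the "edgeIteratorAlgo" rows above, this gives 293743102 / 548013881, or roughly 0.54 instructions per cycle.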