Manual

Note that version 2.1 of Galois only supports a subset of functionality available in the previous version, version 2.0. In particular, the current version only supports unordered set iterators and cautious operators. Unordered set iterators do not depend on a specific order of processing items for correctness, although they may prefer a specific order for efficiency. Cautious operators acquire all necessary locks before modifying values or mutating the graph. Additionally, there is no support for Parameter profiles in the current version.

Compiling

Galois C++ uses cmake to generate makefiles (or other project files). To build Galois, from the source directory, type:

cd build
(mkdir release; cd release; cmake ../..)
(mkdir debug; cd debug; cmake -DCMAKE_BUILD_TYPE=Debug ../..)

This creates both a release and debug build tree from the source. Other build options are supported by cmake. See the cmake documentation for additional information.

Graph Classes

Galois C++ includes two tested graph classes:

1   Galois::Graph::FirstGraph<NodeData, EdgeData, Directed>
2   Galois::Graph::LC_CSR_Graph<NodeData, EdgeData>

Line 1 is FirstGraph, which allows concurrent mutation of the graph. Line 2 is a local computation graph, which does not allow graph mutation. The local computation graph directly mmaps a binary representation of the graph into memory for fast load times. The format of this binary representation is documented in the FileGraph class.
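The following sketch illustrates declaring both graph types, populating a FirstGraph, and loading a local computation graph from a file. The header paths and member functions shown (createNode, addNode, addEdge, structureFromFile) are assumptions based on the 2.x API and should be checked against the API documentation.

#include "Galois/Galois.h"
#include "Galois/Graphs/Graph.h"     // assumed header for FirstGraph
#include "Galois/Graphs/LCGraph.h"   // assumed header for LC_CSR_Graph
#include <string>

// A directed FirstGraph with int node data and int edge data.
typedef Galois::Graph::FirstGraph<int, int, true> MutableGraph;
typedef MutableGraph::GraphNode GNode;

void buildSmallGraph(MutableGraph& g) {
  GNode a = g.createNode(1);  // assumed call: make a node holding the value 1
  GNode b = g.createNode(2);
  g.addNode(a);               // nodes must be added to the graph before use
  g.addNode(b);
  g.addEdge(a, b);            // assumed call: add a directed edge a -> b
}

// A local computation graph loaded from the binary FileGraph format.
typedef Galois::Graph::LC_CSR_Graph<int, int> CSRGraph;

void loadGraph(CSRGraph& g, const std::string& filename) {
  g.structureFromFile(filename);  // assumed loader name; see the API documentation
}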

Command Line

Galois programs can be run (and built) just like any other application. By default, the benchmark programs distributed with the Galois system take a -t num_threads command line argument, which specifies the number of threads to use when running in parallel, and a --help argument, which prints program usage information.
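For example, a run of the spanning tree benchmark (whose output appears in the next section) on eight threads might look like the following. The positional arguments are app-specific and illustrative here; --help lists them for each program.

apps/spanningtree/spanningtree -t 8 inputs/structured/torus5.gr
apps/spanningtree/spanningtree --help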

Statistics

During execution of an application, the Galois runtime system collects and prints various statistics. These statistics help keep track of how well the application and the Galois system are performing. By default, the system keeps track of the runtime and the number of iterations executed and aborted. The following is example output:

INFO: ThreadPool Using maxwell for thread assignment policy
INFO: CommandLine apps/spanningtree/spanningtree 0 /h1/ddn/w/GaloisCpp/inputs/structured/torus5.gr
INFO: Hostname maxwell
Lonestar Benchmark Suite v3.0 (C++)
Copyright (C) 2011 The University of Texas at Austin
http://iss.ices.utexas.edu/lonestar/
application: Spanning-tree Algorithm
Compute the spanning tree (not mimimal) of a graph
Edges in spanning tree: 31
STAT SINGLE NodeSize (null) 96
STAT SINGLE Conflicts (null) 0
STAT SINGLE Iterations (null) 32
STAT SINGLE Time (null) 1
STAT DISTRIBUTION 0 ConflictsDistribution (null) n: 1 ave: 0.0 min: 0 max: 0 stdev: 0.0
STAT DISTRIBUTION 0 IterationsDistribution (null) n: 1 ave: 32.0 min: 32 max: 32 stdev: 0.0

Runtime is the time taken between calls to Galois::StatTimer.start() and Galois::StatTimer.stop(). For most applications included in the Galois distribution, this excludes the time to read input from files. The scripts in the scripts/ directory of the Galois distribution give examples of how to parse these results.
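A minimal sketch of timing a region explicitly is shown below. The Galois/Statistic.h header name and the assumption that a default-constructed timer is reported under the Time statistic should be verified against the API documentation.

#include "Galois/Statistic.h"   // assumed header for Galois::StatTimer

void timedRun() {
  Galois::StatTimer T;  // assumed to be reported under the Time statistic at exit
  T.start();
  // ... the computation being measured, e.g. a Galois::for_each loop ...
  T.stop();
}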

Running a Parallel Foreach Loop

Galois C++ executes parallel regions of code using Galois::for_each, which is similar to std::for_each (and in some cases has the same function signature). This function takes either a Galois worklist or a pair of iterators to use as the initial work. The operator is passed in as a C++ functor (recommended) or a function pointer. A templated functor is recommended because the signature of the user operator depends on the worklist being employed. In either case, the for_each executes the specified code on each element of the initial work in some order, but not necessarily the same order as the initial work iterator. This is referred to as an unordered Galois set iterator. An example:

struct Process {
  void operator()(T& req, Galois::UserContext<T>& lwl) {
    // Do stuff
  }
};

void runBodyParallel(T_iter begin, T_iter end) {
  Galois::for_each(begin, end, Process());
}

The call to Galois::for_each takes an optional template parameter indicating a suggestion to the runtime system about the order in which to execute iterations. There are several policies available.

PriQueue
implements a std::priority_queue compatible priority queue.
FIFO and LIFO
implement std::queue-like and std::stack-like behaviors, respectively.
OrderedByIntegerMetric
implements a priority queue based on a supplied function that maps a work item to an integer priority. Lower values indicate higher priority. An inner queue may be passed to control how items within the same priority are stored.
ChunkedFIFO and ChunkedLIFO
implement a chunked FIFO or LIFO strategy to reduce contention. Each thread fills one chunk of work when pushing and empties another chunk when popping. When a chunk is filled, it is placed on the central FIFO or LIFO.
dChunkedFIFO and dChunkedLIFO
behave like their non-d counterparts, but maintain a FIFO or LIFO per CPU package (usually the set of cores sharing an L3 cache). If a processor's package-local FIFO or LIFO is empty, it attempts to steal a chunk from another CPU package.
LocalQueues
creates local, non-shared worklists that hold all work generated during concurrent operation, and uses a global worklist for the initial work.

The policies are presented in ascending order of performance. Thus, in the absence of other considerations, such as load balancing or maintaining a certain priority order, policies later in the list should be preferred over earlier ones.

Scheduling policies can be nested as well. Each additional policy applies to elements in the previous policy that are not strictly ordered.
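Continuing the Process example above, the following sketch shows both a simple policy selection and a nested one. The worklist class names follow the list above, but the enclosing namespace (written here as GaloisRuntime::WorkList), the chunk-size template argument, and the Indexer functor are assumptions to check against the API documentation.

#include "Galois/Galois.h"

// Hypothetical priority function: maps a work item to an integer priority
// (lower values are a higher priority).
struct Indexer {
  unsigned operator()(const T& item) const { return getLevel(item); }  // getLevel is hypothetical
};

void runScheduled(T_iter begin, T_iter end) {
  using namespace GaloisRuntime::WorkList;  // assumed namespace of the worklist policies

  // Per-package chunked LIFO to reduce contention.
  Galois::for_each<dChunkedLIFO<256> >(begin, end, Process());

  // Nested policy: order items by Indexer; items with the same priority
  // are kept in the inner ChunkedLIFO.
  typedef OrderedByIntegerMetric<Indexer, ChunkedLIFO<256> > OBIM;
  Galois::for_each<OBIM>(begin, end, Process());
}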

Type Traits

The functor passed to for_each may be specialized with several type traits that optimize the runtime with respect to the features the functor uses. See the API documentation for more details. The following is a list of the available type traits.

needs_parallel_break
Indicates the operator may request that the parallel loop be suspended and a given function be run serially
does_not_need_parallel_push
Indicates the operator does not generate new work and push it on the worklist
needs_per_iter_alloc
Indicates the operator may request access to a per-iteration allocator
does_not_need_context
Indicates the operator doesn't need a per-iteration context
does_not_need_stats
Indicates the operator doesn't need its execution stats recorded
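In the 2.x releases these traits are usually declared as member typedefs inside the functor, with a tt_ prefix on the trait name; the exact typedef spellings below are assumptions derived from the trait names above, so verify them against Galois/TypeTraits.h. A sketch:

struct Process {
  // Assumed trait declarations (tt_ + trait name): this operator neither
  // pushes new work nor needs its statistics recorded.
  typedef int tt_does_not_need_parallel_push;
  typedef int tt_does_not_need_stats;

  void operator()(T& req, Galois::UserContext<T>& lwl) {
    // Do stuff (no lwl.push calls, consistent with the traits above)
  }
};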