Manual
Note that version 2.1 of Galois only supports a subset of functionality available in the previous version, version 2.0. In particular, the current version only supports unordered set iterators and cautious operators. Unordered set iterators do not depend on a specific order of processing items for correctness, although they may prefer a specific order for efficiency. Cautious operators acquire all necessary locks before modifying values or mutating the graph. Additionally, there is no support for Parameter profiles in the current version.
Compiling
Galois C++ uses cmake to generate makefiles (or other project files). To build Galois, from the source directory, type:
This creates both a release and debug build tree from the source. Other build options are supported by cmake. See the cmake documentation for additional information.
Graph Classes
Galois C++ includes two tested graph classes:
1 2 | Galois::Graph::FirstGraph<NodeData, EdgeData, Directed> Galois::Graph::LC_CSR_Graph<NodeData, EdgeData> |
Line 1 is FirstGraph
which allows concurrent mutation of the
graph. Line 2 is a local computation graph which does not allow
graph mutation. The Local Computation graph directly mmap
s a
binary representation of the graph into memory for fast load
times. The format of the graph is documented in the FileGraph
class.
Command Line
Galois programs can be run (and built) just like any other application. By
default, the benchmark programs distributed with the Galois system take a
-t num_threads
command line argument that specifies the
number of threads to use when running in parallel and a --help
argument that prints out program usage information.
Statistics
During execution of an application, the Galois runtime system will collect and print various statistics. These statistics help keep track of how well the application and the Galois system are performing. By default, the system keeps track of runtime, and number of iterations executed and aborted. The following is example output:
Runtime is the time taken between calls to
Galois::StatTimer.start()
and
Galois::StatTimer.stop()
. For most applications included in
the Galois distribution this excludes time to read input from files. The
scripts in the scripts/
directory in the Galois distribution
give examples on how to parse these results.
Running a Parallel Foreach Loop
Galois C++ executes parallel regions of code using
Galois::for_each
which is similar to std::for_each
(and in some cases has the same function signature). This function takes
either a Galois worklist or a pair of iterators to use as the initial work.
The operator is passed in as a C++ functor (recommended) or a function pointer.
A templated functor is recommended as the signature of the user operator
depends on the worklist being employed. In either case, the for_each
will execute the specified code on each element of the initial work
in some order but not necessarily the same order as the initial work
iterator. This is referred to an unordered Galois iterator.
An example:
The call to Galois::for_each
takes an optional template
parameter indicating a suggestion to the runtime system about the order in
which to execute iterations. There are several policies available.
- PriQueue
- implements a
std::pri_queue
compatible priority queue. - FIFO and LIFO
- implement
std::queue
andstd::stack
like behaviors respectively. - OrderedByIntegerMetric
- implements a priority queue based on a supplied function which maps a work item to an integer priority. Lower values are a higher priority. An inner queue may be passed to control how items within the same priority are stored.
- ChunkedFIFO and ChunkedLIFO
- implement a chunked FIFO or LIFO strategy to reduce contention. Each thread has a chunk of work which it is filling when pushing and a chunk which is being emptied by popping. When a chunk is filled, it is placed on the central FIFO or LIFO.
- dChunkedFIFO and dChunkedLIFO
- behave like their non-d counterparts, but maintain a FIFO or LIFO per CPU package (usually L3 cache). If a processor's package local FIFO or LIFO is empty, it attempts to steal a chunk from another CPU package.
- LocalQueues
- create local non-shared worklists which are used for all work generated during concurrent operation and use a global worklist for all initial work.
The policies are presented in ascending performance order. Thus, in absence of other considerations like load-balancing or maintaining a certain priority order, policies later down the list should be preferred over ones earlier.
Scheduling policies can be nested as well. Each additional policy applies to elements in the previous policy that are not strictly ordered.
Type Traits
The functor passed to for_each
may be specialized by
several different type-traits to optimize the runtime with respect to
the features used by the functor. See the API documentation for more
details. The following is list of available type-traits.
- needs_parallel_break
- Indicates the operator may request the parallel loop to be suspended and a given function run in serial
- does_not_need_parallel_push
- Indicates the operator does not generate new work and push it on the worklist
- needs_per_iter_alloc
- Indicates the operator may request the access to a per-iteration allocator
- does_not_need_context
- Indicates the operator doesn't need a per-iteration context
- does_not_need_stats
- Indicates the operator doesn't need its execution stats recorded