Galois
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Pages
Parallel Loops

Introduction

Galois includes three primary types of parallel loop constructs. These parallel loops provide an interface for running a given operator with various schedules. The type of parallel loop needed depends on the properties of the operator being run.

The three primary parallel loops in Galois are:

  • do_all
  • for_each
  • on_each

At a high level, galois::do_all is appropriate when a range of completely independent tasks need to be run. galois::for_each is appropriate in more general cases like when tasks may conflict, per-iteration allocation is needed within the operator being run, a special scheduling is required, or new work can be discovered as the loop runs. galois::on_each is a lower level interface for running an operation on each thread.

galois::do_all

galois::do_all divides the work items evenly among the threads available. It supports work stealing, but does not include it by default. It assumes that no conflict detection is needed. It is often used in cases like topology-driven algorithms that iterate over the nodes in a graph. It is also often used in cases where a group of independent work items need to be performed.

galois::do_all requires the following inputs:

  • A range of work items
  • An operator to run on each work item

The range of work items can be specified using an instance of galois::iterate. It can be constructed from any of the following:

  • A pair of iterators (begin and end)
  • A pair of unsigned integers for begin and end
  • An initializer list
  • A container that provides a forward iterator (or some more permissive type of iterator)

The operator can be a lambda expression, an object with a call operator (a functor), or a function pointer. It must be able to be called with a single work item as its only argument. The operator must not throw exceptions into its surrounding context. As of this writing, C++ exceptions can not be safely thrown from an operator into the context where it is called.

If the range is a range of iterators, the work item will be a reference to whatever is obtained by dereferencing an iterator of the given type. If the range is a range of integers, the work item will be an integer.

galois::do_all accepts the following optional parameters to enable/disable features:

The following is an example from the tutorial of how to use galois::do_all. Note that, in this example, the range of work items given to galois::do_all corresponds to the outer loop range in the serial implementation. Also note that the operator passed to galois::do_all corresponds to the body of the loop in the serial implementation.

galois::iterate(g.begin(), g.end()), // range
[&](GNode n) { // operator
auto& sum = g.getData(n);
sum = 0;
for (auto e : g.edges(n)) {
sum += g.getEdgeData(e);
}
},
galois::loopname("sum_in_do_all_with_lambda") // options
);

The full example is available at lonestar/tutorial_examples/GraphTraversalPullOperator.cpp

Work Stealing

Work stealing is implemented hierarchically similar to galois::worklists::PerThreadChunkFIFO. Threads steal work within their own socket when they finish their local work, and only the socket leader can steal from outside its socket. The stealing amount is half of the rest work of the stolen thread. Below is an example of turning on work stealing for a galois::do_all loop.

galois::do_all(galois::iterate(graph), incrementNeighborsAtomically,
galois::loopname("do_all_self_sync"), galois::steal(),

galois::for_each

galois::for_each is designed for cases where work items may be created dynamically or conflict detection between different threads is needed. It is built around parallel worklists with customizable scheduling policies, and is particularly useful in cases where

  • Operators that touch disjoint sections of data can be executed concurrently.
  • The only strict dependencies are between a given work item and the work items it creates.

The schedule is used to tell the runtime that the loop may run more quickly if certain work items are prioritized.

galois::for_each requires the following arguments:

  • A starting range of work items
  • An operator to apply to each work item

The range of work items to start from is specified using an instance of galois::iterate, as was the case with galois::do_all.

The operator can be a lambda expression, an object with a call operator (a functor), or a function pointer. The operator must be callable with the following arguments

The work item will have the same type as the iterators given to the loop do when dereferenced, but if integers are given instead of iterators, the work item will be an integer.

The reference to the galois::UserContext provides an interface that can be used within the body of the loop for creating new work items, performing conflict detection, and hosting per-iteration allocators. This user context is accessible to the user primarily so it can provide a means for users to add new work items to the underlying worklist. Although conflict detection is usually performed within the APIs of conflict-aware data structures, the user context also manages the locks acquired while a given operator runs.

The operator passed to galois::for_each is required to be cautious. In other words, the operator must acquire all the locks it will need before it modifies any data. In cases where an operator fails to acquire a lock, it will abort immediately, freeing any per-iteration allocations and releasing any locks it has already acquired. In these cases, if the operator is cautious, it is guaranteed to fail without leaving the shared data structure in an inconsistent state. The operator is also expected to not throw exceptions. As of this writing, C++ exceptions must not be thrown from an operator into the context where it is called.

galois::for_each also can be called with various optional arguments:

The following example from the tutorial shows how to use galois::for_each with conflict detection. This example uses a push-style algorithm where each node adds an integer stored as edge data to the node data of its neighbors.

This shows how to do this with conflict detection enabled:

//******************************************************
// parallel traversal over a graph using galois::for_each
// 1. push operator is specified using lambda expression
// 2. for_each is named "sum_in_for_each_with_push_operator" to show stat
// after this program finishes
initialize(g);
galois::iterate(g.begin(), g.end()), // range
[&](GNode n, auto&) { // operator
for (auto e : g.edges(n)) { // cautious point
auto dst = g.getEdgeDst(e);
g.getData(dst) += g.getEdgeData(e);
}
},
galois::loopname("sum_in_for_each_with_push_operator") // options
);

This shows how to do this without conflict detection:

// define lambda expression as a varible for reuse
auto sumEdgeWeightsAtomically = [&](GNode n) {
for (auto e : g.edges(n)) {
auto dst = g.getEdgeDst(e);
auto& dstData = g.getData(dst);
auto edgeWeight = g.getEdgeData(e);
__sync_fetch_and_add(&dstData, edgeWeight);
}
};
//******************************************************
// parallel traversal over a graph using galois::do_all w/o work stealing
// 1. push operator uses atomic intrinsic
// 2. do_all is named "sum_in_do_all_with_push_atomic" to show stat after this
// program finishes
initialize(g);
galois::do_all(galois::iterate(g.begin(), g.end()), // range
sumEdgeWeightsAtomically // operator
,
galois::loopname("sum_in_do_all_with_push_atomic") // options
);
//******************************************************
// parallel traversal over a graph using galois::for_each
// 1. push operator uses atomic intrinsic
// 2. for_each is named "sum_in_do_for_each_with_push_atomic" to show stat
// after this program finishes
initialize(g);
galois::iterate(g.begin(), g.end()), // range
[&](GNode n, auto&) { sumEdgeWeightsAtomically(n); } // operator
,
galois::loopname("sum_in_for_each_with_push_atomic") // options
,

Note that, since no new work is created and there are no conflicts between operators, this second example could just have well been implemented using galois::do_all. For more details, see lonestar/tutorial_examples/GraphTraversalPushOperator.cpp.

galois::on_each

galois::on_each is a lower level loop construct that simply calls the given operator from each of the threads in the Galois thread pool.

It requires an operator that accepts a thread id and the total number of threads.

galois::on_each can also be called with two optional arguments:

Specialized Parallel Loops

Galois provides the following specialized parallel loops.

  • Deterministic loop iterator: schedule active work items deterministically and produce the same answer across different platforms. See Deterministic Loop Iterator for details.
  • ParaMeter loop iterator: measure the amount of parallelism during loop execution. See ParaMeter Loop Iterator for details.