Worklists are required to track work items in data-driven algorithms. A scheduling policy determines the order in which work items are extracted from a worklist. Galois provides various scheduling policies in the namespace galois::worklists; a policy is selected by passing it as the template parameter of galois::wl, an optional argument to galois::for_each. Below we cover the Galois schedulers, starting with the most commonly used ones.
Chunked worklists assign work items to threads one chunk at a time. Similarly, each thread accumulates new work in a chunk before putting it on the worklist. Chunking improves scalability because threads amortize the cost of accessing the shared worklist over an entire chunk. The user chooses the chunk size: a large chunk size means less contention on the shared worklist but may lead to load imbalance, while a small chunk size may increase contention on the shared worklist.
The worklist of chunks itself can be organized in different ways, and we have observed that mapping the communication patterns of threads to the hardware topology leads to more scalable implementations: a single queue of chunks shared by all threads (ChunkFIFO/ChunkLIFO), a queue of chunks per socket (PerSocketChunkFIFO/PerSocketChunkLIFO), or a queue of chunks per thread (PerThreadChunkFIFO/PerThreadChunkLIFO).
Below is an example of using chunked worklists from lonestar/tutorial_examples/SSSPPushSimple.cpp:
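The listing below is a condensed sketch in the spirit of that file rather than its verbatim contents; the graph type, the atomic distance update, and the ssspPush wrapper are illustrative assumptions.

```cpp
#include <atomic>

#include "galois/Galois.h"
#include "galois/graphs/LCGraph.h"

// Node data: current shortest distance; edge data: edge weight.
using Graph = galois::graphs::LC_CSR_Graph<std::atomic<unsigned>, unsigned>;
using GNode = Graph::GraphNode;

constexpr unsigned CHUNK_SIZE = 16;
// One queue of chunks per socket; each chunk holds up to CHUNK_SIZE items.
using PSchunk = galois::worklists::PerSocketChunkFIFO<CHUNK_SIZE>;

void ssspPush(Graph& graph, GNode source) {
  galois::for_each(
      galois::iterate({source}),
      [&](GNode node, auto& ctx) {
        unsigned srcDist = graph.getData(node);
        for (auto e : graph.edges(node)) {
          GNode dst        = graph.getEdgeDst(e);
          unsigned newDist = srcDist + graph.getEdgeData(e);
          auto& dstDist    = graph.getData(dst);
          unsigned old     = dstDist;
          // Atomically lower dst's distance; if we improved it, generate
          // new work. ctx.push() adds dst to the current thread's chunk.
          while (newDist < old) {
            if (dstDist.compare_exchange_weak(old, newDist)) {
              ctx.push(dst);
              break;
            }
          }
        }
      },
      galois::wl<PSchunk>(), // select the chunked scheduling policy
      galois::loopname("ssspPush"));
}
```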
galois::worklists::OrderedByIntegerMetric (OBIM) is suitable for implementing soft priorities, where active work items can be prioritized but priority inversion will not lead to wrong answers or deadlock. OBIM expects two template parameters: an indexer, a function object that maps a work item to an integer priority bin (bins are served in ascending order), and the worklist used to store items that fall into the same bin.
Below is an example of defining a galois::worklists::OrderedByIntegerMetric scheduling in lonestar/tutorial_examples/SSSPPushSimple.cpp:
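The snippet below reconstructs the gist of that definition; UpdateRequest, stepShift, and the chunk size are illustrative assumptions in line with the tutorial's conventions, not verbatim code.

```cpp
// A work item: a node together with the distance at which it was pushed.
struct UpdateRequest {
  GNode node;
  unsigned dist;
};

constexpr unsigned stepShift = 3; // bin width of 2^3 = 8

// The indexer maps a work item to its integer priority bin; smaller
// distances land in smaller bins, which OBIM serves first.
auto reqIndexer = [](const UpdateRequest& req) {
  return req.dist >> stepShift;
};

namespace gwl = galois::worklists;
using PSchunk = gwl::PerSocketChunkLIFO<16>; // worklist within each bin
using OBIM    = gwl::OrderedByIntegerMetric<decltype(reqIndexer), PSchunk>;
```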
The reqIndexer object defines the priority binning function. Internally this OBIM uses a PerSocketChunkLIFO to store the items with the same priority. The following code snippet shows how to use OBIM with galois::for_each:
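A sketch of the call, assuming the definitions above; the initial container of requests is a placeholder of this sketch:

```cpp
std::deque<UpdateRequest> initial = {{source, 0}}; // requires <deque>

galois::for_each(
    galois::iterate(initial),
    [&](const UpdateRequest& req, auto& ctx) {
      // Relax the outgoing edges of req.node and ctx.push() an
      // UpdateRequest for every neighbor whose distance improved.
    },
    galois::wl<OBIM>(reqIndexer), // the indexer is forwarded to OBIM's constructor
    galois::loopname("ssspPushOBIM"));
```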
OBIM works well when the algorithm's performance is sensitive to scheduling and the work items can be grouped into a small number of bins ordered by integer priority (typically ~1000 bins). For example, in single-source shortest path, focusing on nodes with smaller distances converges faster, as long as there are enough nodes of similar priority to be processed in parallel.
When parallel execution is organized in rounds separated by barriers, existing work items are processed in the current round, while new items generated in the current round are postponed until the next round. In this case, galois::worklists::BulkSynchronous can be used to avoid maintaining two worklists explicitly in user code. The underlying worklist for each round can be customized by providing template parameters to galois::worklists::BulkSynchronous.
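As an illustration, a round-based BFS could select this policy as follows; the inner PerSocketChunkFIFO and all names here are assumptions of this sketch:

```cpp
// Items pushed during a round are held back until the barrier, then
// processed in the next round; no second worklist in user code.
using BSWL = galois::worklists::BulkSynchronous<
    galois::worklists::PerSocketChunkFIFO<16>>;

galois::for_each(
    galois::iterate({source}),
    [&](GNode n, auto& ctx) {
      // visit n; ctx.push() the next round's frontier
    },
    galois::wl<BSWL>(),
    galois::loopname("bfsByRounds"));
```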
galois::worklists::LocalQueue creates thread-local, non-shared worklists that hold all work generated during concurrent execution, and uses a global worklist for all initial work.
galois::worklists::OwnerComputes is similar to galois::worklists::LocalQueue, with one main difference: newly generated work items are not necessarily kept by the thread that created them; instead, an indexer function object, supplied as a template parameter, maps each item to its owner, and the item is pushed to the owner's worklist. A usage sketch for these policies follows below.
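Continuing the SSSP sketch, such a policy is again chosen via galois::wl; the default template parameters used here are an assumption and may differ across Galois versions:

```cpp
using LQ = galois::worklists::LocalQueue<>; // assumption: default parameters

galois::for_each(
    galois::iterate(initial),
    [&](const UpdateRequest& req, auto& ctx) { /* as in the OBIM sketch */ },
    galois::wl<LQ>(),
    galois::loopname("ssspLocalQueue"));
```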
galois::worklists::StableIterator can be used when the loop iterates over a fixed range of items and the operator does not generate new work items. It differs from galois::do_all in that it supports speculation and allows iterations to abort.
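For instance, a loop over all graph nodes whose operator may abort and retry could be scheduled as follows; the default template parameters are an assumption of this sketch:

```cpp
galois::for_each(
    galois::iterate(graph), // fixed range: all nodes of the graph
    [&](GNode n, auto& ctx) {
      // The operator may acquire locks on neighbors and abort on conflict;
      // it must not ctx.push() new work under this scheduler.
    },
    galois::wl<galois::worklists::StableIterator<>>(),
    galois::loopname("fixedRangeLoop"));
```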
Galois also provides sequential worklists, whose push() and pop() need to acquire locks. As such, sequential schedulers do not scale beyond a small number of threads, so we recommend against using them with parallel loops.