Triangle Counting

Application Description

This benchmark counts the number of triangles in a given undirected graph. It implements both the node-iterator and edge-iterator algorithms in [1]. All nodes/edges can be processed independently.

[1] Thomas Schank. Algorithmic Aspects of Triangle-Based Network Analysis. PhD Thesis. Universitat Karlsruhe. 2007.

Algorithms

Node-iterator algorithm works as the following. For a pair of nodes a and b, both are node n's neighbors, if a is b's neighbor, and vice versa, then there is a triangle composed of nodes n, a, and b. Therefore, we can count triangles by scanning through all pairs of neighbors for all nodes and see if the pair of nodes are each other's neighbor. To avoid double counting, we count a triangle only if a < n < b by some criteria. This requires pre-sorting of edge lists for all nodes. Figure 1 shows the pseudocode for node-iterator algorithm.

 1 2 3 4 5 6 7 8 9 10 11 12 /* sort edge lists for all nodes */ num_triangles = 0; foreach (Node n: graph) { foreach (Node a: n's neighbor) { if (a >= n) continue; foreach (Node b: n's neighbor) { if (b > n and b is a's neighbor) { num_triangles++; } } } }

Figure 1: Pseudocode for node iterator algorithm of triangle counting

Edge-iterator algorithm works as the following. Given an edge e whose end points are n and m, if n and m have a common neighbor a, then there is a triangle formed by nodes n, m, and a. Hence, we can count triangles by intersecting each edge's end points' edge lists and summing all intersections' sizes. To avoid double counting, we only consider n < a < m by some criteria when intersecting edge lists. This requires pre-sorting of edge lists for all nodes. Figure 2 shows the pseudocode for edge-iterator algorithm.

 1 2 3 4 5 6 7 8 9 /* sort edge lists for all nodes */ num_triangles = 0; foreach (Edge e: graph) { Node n = e.src; Node m = e.dst; NodeSet n_ngh = {s| s is n's neighbor and n < s < m}; NodeSet m_ngh = {t| t is m's neighbor and n < t < m}; num_triangles += |n_ngh intersect m_mgh|; }

Figure 2: Pseudocode for edge-iterator algorithm of triangle counting

Data Structures

There input graph is the data structure used in Triangle Counting for both node-iterator and edge-iterator algorithm.

Parallelism

Both node-iterator and edge-iterator algorithms are topology-driven algorithms, since all nodes/edges are active. The graph is only read, so all activities at nodes/edges can proceed independently.

Performance

Figures 2 and 4 show the execution time of the edge-iterator algorithm and of the node-iterator algorithm, respectively. Figures 3 and 5 show the self-relative speedup, i.e, speedup with respect to the single thread performance for edge-iterator algorithm an node-iterator algorithm, respectively. The input is LiveJournal graph for both algorithms.

Machine Description

Performance numbers are collected on a 4 package (14 cores per package) Intel(R) Xeon(R) Gold 5120 CPU machine at 2.20GHz from 1 thread to 56 threads in increments of 7 threads. The machine has 192GB of RAM. The operating system is CentOS Linux release 7.5.1804. All runs of the Galois benchmarks used gcc/g++ 7.2 to compile.