Pagerank

Problem Description

PageRank is a key technique in web mining to rank the importance of web pages. In PageRank, each web page is assigned a numerical weight to begin with, and the algorithm tries to estimate the importance of the web page relative to other web pages in the hyperlinked set of pages. The key assumption is that more important web pages are likely to receive more links from other websites. More details about the problem and different solutions can be found in [1, 2].

[1] https://en.wikipedia.org/wiki/PageRank
[2] Whang et al. Scalable Data-driven PageRank: Algorithms, System Issues, and Lessons Learned. European Conference on Parallel Processing, 2015.

Application Description

This benchmark computes the PageRank of the nodes for a given input graph using both a push-style residual-based algorithm. The algorithms takes the input as a graph, and some constant parameters that are used in the computation. The algorithmic parameters are the following:

  • ALPHA: ALPHA represents the damping factor, which is the probability that a web surfer will continue browsing by clicking on the linked pages. The damping factor is generally set to 0.85 in the literature.
  • TOLERANCE: It represents a bound on the error in the computation.
  • MAX_ITER: The number of iterations to repeat the PageRank computation.

The CUDA code for the implementation is generated using the IrGL compiler[1].

[1] https://users.oden.utexas.edu/~sreepai/sree-oopsla2016.pdf


Performance

Graph Time (ms)
USA-road-d.NY 28
USA-road-d.USA 5672

The above performance is observed on a 1733.500 Mhz GeForce GTX 1080 6000 with 8119 MB of main memory.