Matrix Completion
Problem Description
This benchmark implements Stochastic Gradient Descent (SGD). In particular, the benchmark uses SGD to complete unknown entries of a sparse ratings matrix R (movies x users). The goal of the benchmark is to build two matrices, A (1xm) and B (1xu), such that the matrix product AB approximately equals R. Each entry of A and B is a vector of real numbers, called a latent vector.
After A and B have been calculated, they can be used to predict unknown entries of R.
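For instance, a predicted rating reduces to the dot product of the two corresponding latent vectors. The following CUDA sketch illustrates this; the latent dimension K and the flat row-major storage of A and B are assumptions of this sketch, not details taken from the benchmark:

    // Latent dimension; an assumed value, not fixed by the benchmark.
    #define K 16

    // A holds m latent vectors (movies), B holds u latent vectors
    // (users), both stored row-major as flat arrays.
    __host__ __device__
    float predict(const float *A, const float *B, int movie, int user) {
        float r = 0.0f;
        for (int k = 0; k < K; ++k)
            r += A[movie * K + k] * B[user * K + k];
        return r;  // estimate of R[movie][user]
    }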
Algorithm
First, the entries of the latent vectors are initialized to random numbers, and the ratings are normalized. Then, for every edge in the graph R, we apply stochastic gradient descent as follows:
    function gradient_update(edge) {
        Mlv  = edge.src.lv
        Ulv  = edge.dst.lv
        diff = edge.rating - dotproduct(Mlv, Ulv)
        update(Mlv, diff)
        update(Ulv, diff)
    }
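Concretely, the two update calls move each latent vector along the gradient of the squared error on that edge. Below is a hedged CUDA sketch of the whole step; the learning rate LR, the regularization weight LAMBDA, and the Edge layout are assumptions of this sketch, not the benchmark's actual constants:

    #define K      16      // latent dimension (assumed)
    #define LR     0.001f  // learning rate (assumed)
    #define LAMBDA 0.05f   // L2 regularization weight (assumed)

    struct Edge { int movie, user; float rating; };

    __device__
    void gradient_update(const Edge &e, float *A, float *B) {
        float *Mlv = &A[e.movie * K];  // movie latent vector
        float *Ulv = &B[e.user * K];   // user latent vector

        // diff = rating - prediction, as in the pseudocode above.
        float pred = 0.0f;
        for (int k = 0; k < K; ++k) pred += Mlv[k] * Ulv[k];
        float diff = e.rating - pred;

        // Gradient step on both end-points of the edge.
        for (int k = 0; k < K; ++k) {
            float m = Mlv[k], u = Ulv[k];
            Mlv[k] = m + LR * (diff * u - LAMBDA * m);
            Ulv[k] = u + LR * (diff * m - LAMBDA * u);
        }
    }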
The key problem in a parallel implementation is to ensure that the gradient_update procedure has exclusive access to both end-points of the edge. Although this is conceptually simple, the benchmark shows that different locking strategies lead to different schedules, which in turn lead to very wide differences in performance, especially on scale-free graphs.
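One way to obtain that exclusive access is a per-node spinlock acquired in a fixed order. The CUDA sketch below (reusing Edge and gradient_update from the sketch above, and assuming nodes are numbered movies-first) shows this strategy; it is one possible implementation, not necessarily the one the benchmark uses:

    // Per-node spinlocks: 0 = free, 1 = held.
    __device__ void lock(int *locks, int n) {
        while (atomicCAS(&locks[n], 0, 1) != 0) { /* spin */ }
    }
    __device__ void unlock(int *locks, int n) {
        atomicExch(&locks[n], 0);
    }

    __device__
    void locked_update(const Edge &e, float *A, float *B,
                       int *locks, int num_movies) {
        // Assumed node numbering: movies first, then users.
        int a = e.movie;
        int b = num_movies + e.user;
        // Acquire in ascending id order so that two threads sharing
        // a node cannot deadlock on each other.
        int lo = min(a, b), hi = max(a, b);

        lock(locks, lo);
        lock(locks, hi);
        gradient_update(e, A, B);  // exclusive access to both end-points
        unlock(locks, hi);
        unlock(locks, lo);
    }

Even this simple scheme has pitfalls: on pre-Volta GPUs, threads spinning on a lock held by a thread in the same warp can livelock because of SIMT scheduling. Trade-offs of this kind are exactly what the paper cited below examines.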
For more information on this benchmark, see:
Rashid Kaleem, Anand Venkat, Sreepathi Pai, Mary Hall, Keshav Pingali, "Synchronization Trade-offs in GPU implementations of Graph Algorithms", IPDPS, 2016.

The benchmark is set to run for 5 rounds. It corresponds roughly to the GPU implementations described in the paper above.
Performance
Graph | Time (ms)
--- | ---
Netflix | 5500
Amazon | 22260
Yahoo | 17868
The above performance is observed on a GeForce GTX 1080 (1733.5 MHz) with 8119 MB of main memory.