# Matrix Completion

## Problem Description

This benchmark implements Stochastic Gradient Descent (SGD). In particular, the benchmark uses SGD to complete unknown entries of a sparse matrix R. The sparse matrix represents a bipartite graph, with one set of nodes represent movies, while the other set represents users. The edge connecting a movie node to a user node denotes that the user has rated the movie, with the edge label representing the rating assigned.

The goal of the benchmark is to build two matrices, A (1xm) and B (1xu), such that the matrix product AB approximately equals R, the original ratings matrix described above. Each entry of A and B is a vector of real numbers and is called a latent vector.

After A and B have been calculated, they can be used to predict unknown entries of R.

## Algorithm

First, the entries of the latent vector are initialized to random numbers. Then the ratings are normalized.

Then for every edge in the graph R, we apply stochastic gradient descent as follows:

```function gradient_update(edge) {
Mlv = edge.src.lv
Ulv = edge.dst.lv

diff = edge.rating - dotproduct(Mlv, Ulv)

update(Mlv, diff)
update(Ulv, diff)
}
```

The key problem in a parallel implementation is to ensure that the gradient_update procedure has exclusive access to the end-points of the edge. Although this is conceptually simple, the benchmarks show that different locking strategies lead to different schedules, which in turn lead to very wide differences in performance. This is especially the case when applied to scale-free graphs.