Approximate SDP solvers, Matrix Factorizations, the Netflix Prize, and PageRank. Mittagseminar Martin Jaggi, Oct


1 Approximate SDP solvers, Matrix Factorizations, the Netflix Prize, and PageRank Mittagseminar Martin Jaggi, Oct 6 009

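2 Sparse Approximation

The Problem ($f(\cdot)$ convex):

Vector case: $\min_{x \in \mathbb{R}^n} f(x)$ subject to $x \ge 0$, $\mathbf{1}^T x = 1$ (vectors living in the simplex).

Matrix case: $\min_{X \in S^{n \times n}} f(X)$ subject to $X \succeq 0$, $\operatorname{Tr}(X) = 1$ (symmetric matrices living in the spectahedron).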

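3 The Algorithm

Vector case (Coordinate Descent):
$x^{(k+1)} := (1-\lambda)\,x^{(k)} + \lambda\,e_i$, with $i := \arg\max_i \left(-\nabla f(x^{(k)})\right)_i$.

Matrix case (largest Eigenvector):
$X^{(k+1)} := (1-\lambda)\,X^{(k)} + \lambda\,vv^T$, with $v := \arg\max_{\|v\|=1} v^T\left(-\nabla f(X^{(k)})\right)v$.

With step size $\lambda = 1/k$, the iterate $x^{(k)}$ has Sparsity at most $k$, and $X^{(k)} = \frac{1}{k}\left(v^{(1)}v^{(1)T} + \dots + v^{(k)}v^{(k)T}\right) = UU^T$ has Rank at most $k$.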
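The matrix update on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the speaker's implementation: it uses an exact eigendecomposition (`np.linalg.eigh`) instead of an approximate eigenvector, the step size $\lambda = 1/k$ from the slide, and a hypothetical toy objective $f(X) = \frac{1}{2}\|X - A\|_{Fro}^2$ that is not part of the talk. The function name `hazan_spectahedron` is likewise an assumption.

```python
import numpy as np

def hazan_spectahedron(grad_f, n, steps):
    """Sketch of the update X <- (1 - lam) X + lam v v^T with lam = 1/k."""
    X = np.eye(n) / n                      # feasible start: X is PSD with Tr(X) = 1
    for k in range(1, steps + 1):
        M = -grad_f(X)
        M = (M + M.T) / 2                  # guard against round-off asymmetry
        _, vecs = np.linalg.eigh(M)        # eigenvalues are returned in ascending order
        v = vecs[:, -1]                    # unit eigenvector of the largest eigenvalue
        lam = 1.0 / k
        X = (1 - lam) * X + lam * np.outer(v, v)
    return X

# Hypothetical toy objective f(X) = 0.5 * ||X - A||_F^2 with A inside the spectahedron.
A = np.diag([0.5, 0.3, 0.2])
X = hazan_spectahedron(lambda X: X - A, n=3, steps=200)
f_val = 0.5 * np.sum((X - A) ** 2)
```

Every iterate is a convex combination of rank-one matrices $vv^T$ with unit $v$, so it stays positive semidefinite with trace 1 without any projection step.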

4 The Convergence

Vector case: after $O(1/\varepsilon)$ steps the primal-dual error is at most $\varepsilon$. [Clarkson SODA '08]
Matrix case: after $O(1/\varepsilon)$ steps the primal-dual error is at most $\varepsilon$. [Hazan LATIN '08]

Approximate Eigenvector computation

Instead of the exact $v := \arg\max_{\|v\|=1} v^T M v$ with $M := -\nabla f(X^{(k)})$, it is enough to work with a unit vector $v$ whose value $v^T M v$ is within a small additive error of $\lambda_{\max}(M)$. Such a $v$ can be found by doing $O(\ldots)$ Lanczos steps. Alternative: Power method.
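As a rough illustration of the Power method alternative mentioned here, the sketch below runs plain power iteration and compares the Rayleigh quotient $v^T M v$ with the exact $\lambda_{\max}(M)$. The function name `power_method`, the iteration count, and the small test matrix are assumptions made for the example. The sketch also assumes the largest eigenvalue of $M$ is the one of largest absolute value, which holds for the positive semidefinite example used.

```python
import numpy as np

def power_method(M, iters, seed=0):
    """Plain power iteration; returns a unit vector approximating the dominant eigenvector."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = M @ v                          # one matrix-vector product per step
        v = w / np.linalg.norm(w)
    return v

# Hypothetical symmetric PSD example with a clear eigenvalue gap.
M = np.diag([3.0, 1.0, 0.5])
v = power_method(M, iters=50)
rayleigh = v @ M @ v                       # value v^T M v achieved by the approximation
lam_max = np.linalg.eigvalsh(M)[-1]        # exact largest eigenvalue, for comparison
```

Each step costs one matrix-vector product, so the iteration can exploit sparsity in $M$ and stop as soon as the Rayleigh quotient is close enough to $\lambda_{\max}$.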

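5 A side note: How to solve general Semidefinite Programs?

Optimization Version: $\min \operatorname{Tr}(CX)$ s.t. $\operatorname{Tr}(A_i X) \le b_i$ for $i \in [m]$, $X \in S^{n \times n}$, $X \succeq 0$.

Feasibility Version: Find $X$ s.t. $\operatorname{Tr}(A_i X) \le b_i$ for $i \in [m+1]$, $X \in S^{n \times n}$, $X \succeq 0$, $\operatorname{Tr}(X) = 1$.

Soft Max: $f(X) := \frac{1}{M} \log\left(\sum_{i=1}^{m+1} e^{M(\operatorname{Tr}(A_i X) - b_i)}\right)$, to be minimized over $X \in S^{n \times n}$, $X \succeq 0$, $\operatorname{Tr}(X) = 1$.

By this trick, Hazan's algorithm is able to satisfy all constraints up to an error $\varepsilon$.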
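The Soft Max function on this slide and its gradient can be evaluated as follows. This is a sketch under assumptions: the constraints are taken in the form $\operatorname{Tr}(A_i X) \le b_i$ with symmetric $A_i$, and the function name `soft_max`, the tiny 2x2 instance, and the value of $M$ are hypothetical. The value always lies between the worst constraint violation and that violation plus $\log(\text{number of constraints})/M$.

```python
import numpy as np

def soft_max(X, A_list, b, M):
    """Return f(X) = (1/M) log sum_i exp(M (Tr(A_i X) - b_i)) and its gradient."""
    g = np.array([np.trace(A @ X) for A in A_list]) - b   # per-constraint violations
    z = M * g
    z_max = z.max()
    w = np.exp(z - z_max)                  # shifted exponent for numerical stability
    value = (z_max + np.log(w.sum())) / M
    p = w / w.sum()                        # soft-max weights, they sum to 1
    grad = sum(p_i * A for p_i, A in zip(p, A_list))      # valid for symmetric A_i
    return value, grad

# Hypothetical 2x2 instance with two constraints.
A_list = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
b = np.array([0.6, 0.4])
X = np.eye(2) / 2
M = 50.0
value, grad = soft_max(X, A_list, b, M)
worst = max(np.trace(A @ X) - b_i for A, b_i in zip(A_list, b))
```

A larger $M$ makes the function track the worst violation more tightly, at the price of a less smooth objective.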

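6 Matrix Factorizations and machine learning

$Y \approx UV^T$, where $Y$ is the Customer $\times$ Movie matrix, $U = (u^{(1)}, \dots, u^{(k)})$ and $V = (v^{(1)}, \dots, v^{(k)})$.

The Netflix Prize: a matrix of Movies and Customers whose observed entries are the Ratings; only a small percentage of the entries is observed.

Example interpretation of the factors: $v^{(l)}_j$ = "George Clooney plays in movie $j$", $u^{(l)}_i$ = "Customer $i$ is female".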

7 Matrix Factorizations for recommender systems

$Y \approx UV^T$ with factors $u^{(1)}, \dots, u^{(k)}$ and $v^{(1)}, \dots, v^{(k)}$. [Short IEEE article; Wikipedia: Netflix Prize]

[Figure: scatter plot of selected movies (e.g. Freddy Got Fingered, Citizen Kane, The Sound of Music); axes: Factor vector 1 and Factor vector 2.]

Figure 3. The first two vectors from a matrix decomposition of the Netflix Prize data. Selected movies are placed at the appropriate spot based on their factor vectors in two dimensions. The plot reveals distinct genres, including clusters of movies with strong female leads, fraternity humor, and quirky independent films.

8 Matrix Factorizations and machine learning

Applications:
Customer i × Product j (Amazon, Netflix, Migros Cumulus, etc.)
Customer i × Customer j (Symmetry? k = ?)
Word i × Document j (Search engines, Latent Semantic Analysis)
Many other applications (e.g. dimensionality reduction, clustering)

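9 Accuracy vs Model complexity

$Y \approx UV^T =: X$, an $n \times m$ matrix.

[Plot: Error vs. Model complexity]

$\min_{U,V} f(UV^T)$ subject to one of:
Low Rank: $\operatorname{rank}(UV^T) = k$, a constraint on $\operatorname{rank}(X)$.
Low Norm: $\|U\|_{Fro}^2 + \|V\|_{Fro}^2 = t$, a constraint on the trace norm $\|X\|_*$. [Srebro NIPS '05]

Here $f(X) := \sum_{ij \in S} (X_{ij} - Y_{ij})^2$, where $S$ is the set of observed entries.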

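10 Low Norm Matrix Factorization

$f(X) := \sum_{ij \in S} (X_{ij} - Y_{ij})^2$

$\min_{U,V} f(UV^T)$ s.t. $\|U\|_{Fro}^2 + \|V\|_{Fro}^2 = t$

is equivalent to

$\min_Z f(Z)$ s.t. $Z \in S^{(n+m) \times (n+m)}$, $Z \succeq 0$, $\operatorname{Tr}(Z) = t$,

where
$$Z := \begin{pmatrix} U \\ V \end{pmatrix} \begin{pmatrix} U^T & V^T \end{pmatrix} = \begin{pmatrix} UU^T & UV^T \\ VU^T & VV^T \end{pmatrix}$$
with $U$ of size $n \times k$, $V$ of size $m \times k$, and $\|U\|_{Fro}^2 + \|V\|_{Fro}^2 = \operatorname{Tr}(UU^T) + \operatorname{Tr}(VV^T) = \operatorname{Tr}(Z) = t$. The function $f(Z)$ only reads the off-diagonal block $UV^T$ of $Z$.

Perfectly fits Hazan's Algorithm.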
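The identities behind this reformulation are easy to check numerically. The sketch below stacks random factors (the sizes `n`, `m`, `k` and the random data are arbitrary choices for illustration) and builds the $(n+m) \times (n+m)$ matrix $Z$ from them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 4, 3, 2                          # hypothetical sizes: customers, movies, rank
U = rng.standard_normal((n, k))
V = rng.standard_normal((m, k))

W = np.vstack([U, V])                      # stacked factor (U; V), shape (n + m, k)
Z = W @ W.T                                # block matrix [[UU^T, UV^T], [VU^T, VV^T]]

trace_Z = np.trace(Z)
norm_sum = np.sum(U ** 2) + np.sum(V ** 2) # ||U||_Fro^2 + ||V||_Fro^2
upper_right = Z[:n, n:]                    # the block that f actually reads
```

Because $Z$ is a product $WW^T$, it is positive semidefinite with rank at most $k$, and its trace equals the sum of the squared Frobenius norms of the two factors.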

11 Low Norm Matrix Factorization

We need the largest Eigenvector of $M := -\nabla f(Z^{(k)})$, where $f(Z) := \sum_{ij \in S} (Z_{ij} - Y_{ij})^2$.

This is the largest Eigenvector of the bipartite weighted graph (Customers $n$, Movies $m$) with adjacency matrix
$$M = \begin{pmatrix} 0 & * \\ * & 0 \end{pmatrix}.$$

Have to be careful: the principal EV (the eigenvalue of largest absolute value, which is what the Power method finds) is not always the largest EV! Add a constant to the diagonal in that case.

Hazan's new approximate SDP solver applies to Low Norm Matrix Factorization.
Easy to parallelize (Power method).
The algorithm maintains the sparsity structure of the given matrix and needs no additional memory.
Speed is comparable to existing methods, and much better than generic SDP solvers.
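The diagonal shift can be illustrated with a power iteration that never forms $M$ explicitly. The sketch assumes $M$ has the block form $\begin{pmatrix} 0 & R \\ R^T & 0 \end{pmatrix}$ with $R$ an $n \times m$ residual block; the name `top_eigvec_bipartite`, the toy matrix `R`, and the choice of the Frobenius norm of $R$ as shift are assumptions made for the example. For such a matrix the nonzero eigenvalues come in pairs $\pm\sigma$, where $\sigma$ runs over the singular values of $R$, so an unshifted power iteration has no unique dominant eigenvalue.

```python
import numpy as np

def top_eigvec_bipartite(R, shift, iters, seed=0):
    """Power iteration on M + shift * I for M = [[0, R], [R^T, 0]], using only R."""
    n, m = R.shape
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n)             # customer part of the vector
    b = rng.standard_normal(m)             # movie part of the vector
    for _ in range(iters):
        # (M + shift * I) applied to (a, b); the right-hand side uses the old a and b
        a, b = R @ b + shift * a, R.T @ a + shift * b
        nrm = np.sqrt(a @ a + b @ b)
        a /= nrm
        b /= nrm
    return a, b

# Hypothetical residual block with singular values 2 and 1.
R = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
shift = np.linalg.norm(R)                  # Frobenius norm, at least the top singular value
a, b = top_eigvec_bipartite(R, shift, iters=200)
rayleigh = 2 * a @ R @ b                   # v^T M v for the unit vector v = (a, b)
sigma_max = np.linalg.svd(R, compute_uv=False)[0]
```

With a positive shift the eigenvalue $\sigma_{\max} + \text{shift}$ strictly dominates $|-\sigma_{\max} + \text{shift}|$, so the iteration converges to the eigenvector of the largest eigenvalue of $M$. The only storage needed beyond $R$ is the vector pair `(a, b)`.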

12 Thanks
