A Simple Algorithm for Nuclear Norm Regularized Problems


1 A Simple Algorithm for Nuclear Norm Regularized Problems
ICML 2010
Martin Jaggi, Marek Sulovský, ETH Zurich

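2 Matrix Factorizations for recommender systems
$Y \approx UV^T = \sum_{l=1}^{k} u^{(l)} v^{(l)T}$, where $Y$ is the Customer × Movie rating matrix.
The Netflix challenge: a Movies × Customers matrix of ratings in which only a small percentage of the entries is observed.
Example interpretation of a latent factor $k$: $v^{(k)}_j$ = "Angelina Jolie plays in movie j", $u^{(k)}_i$ = "Customer i is male".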

3 Matrix Factorizations in machine learning
Applications:
- Customer i × Product j (Amazon, Netflix, etc.)
- Customer i × Customer j
- Word i × Document j (search engines, Latent Semantic Analysis)
- many other applications (e.g. feature generation, dimensionality reduction, clustering)

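4 Regularization
$Y \approx UV^T =: X$, an $n \times m$ matrix.
Error (loss): $f(X) := \sum_{ij \in S} (X_{ij} - Y_{ij})^2$, where $S$ is the set of observed entries.
Model complexity (regularization): low rank, $\operatorname{rank}(X)$, or low norm, $\|X\|_*$.
Trade-off variant: $\min_X f(X) + \mu \operatorname{rank}(X)$, or $\min_X f(X) + \mu \|X\|_*$.
Constrained variant: $\min_X f(X)$ s.t. $\operatorname{rank}(X) \le k$, or $\min_X f(X)$ s.t. $\|X\|_* \le k$.
The low-norm variants are the nuclear norm regularized problems.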
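As a small illustration of the quantities on this slide, the following Python sketch evaluates the squared loss over the observed entries, the rank, and the nuclear norm (the sum of the singular values) with NumPy. The function names, the 0/1 `mask` encoding of the observed set S, and the toy matrix are assumptions made for this example and do not come from the talk.

```python
import numpy as np

def squared_loss(X, Y, mask):
    """f(X): sum of (X_ij - Y_ij)^2 over observed entries (mask == 1)."""
    diff = (X - Y) * mask  # unobserved entries contribute nothing
    return float(np.sum(diff ** 2))

def nuclear_norm(X):
    """Nuclear norm: the sum of the singular values of X."""
    return float(np.linalg.svd(X, compute_uv=False).sum())

# Hypothetical toy data: a rank-1 matrix with one unobserved entry.
Y = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
mask = np.ones_like(Y)
mask[2, 1] = 0.0

X = Y.copy()
X[2, 1] = 0.0  # X differs from Y only where nothing was observed

loss = squared_loss(X, Y, mask)         # unaffected by the unobserved entry
rank_Y = int(np.linalg.matrix_rank(Y))  # rank of the toy matrix
norm_Y = nuclear_norm(Y)                # for a rank-1 outer product: ||a|| * ||b||
```

For a rank-1 matrix the nuclear norm is the product of the two vector norms, which makes the toy case easy to check by hand.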

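5 Existing Methods
$UV^T =: X$, $f(\cdot)$ convex.
Optimization problem: $\min_X f(X)$ s.t. $\mathrm{constraint}(X)$.
Existing ML methods solve: $\min_{U,V} f(UV^T)$ s.t. $\mathrm{constraint}(U, V)$, which is not convex.
Nuclear norm case: $\|X\|_* \le t$ becomes $\|U\|_{Fro}^2 + \|V\|_{Fro}^2 \le t$ (up to a constant factor in $t$).
Consequence: local minima.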

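6 Convex optimization
$f(\cdot)$ convex.
$\begin{pmatrix} U \\ V \end{pmatrix} \begin{pmatrix} U^T & V^T \end{pmatrix} = \begin{pmatrix} UU^T & UV^T \\ VU^T & VV^T \end{pmatrix} = \begin{pmatrix} UU^T & X \\ X^T & VV^T \end{pmatrix} =: Z$, where $U$ has $n$ rows and $V$ has $m$ rows.
Optimization problem: $\min_X f(X)$ s.t. $\mathrm{constraint}(X)$.
Our method solves: $\min_{Z \in \mathrm{Sym}^{(n+m) \times (n+m)},\ Z \succeq 0} f(Z)$ s.t. $\mathrm{constraint}(Z)$, which is convex.
Nuclear norm case: $\|X\|_* \le t$ becomes $\operatorname{Tr}(Z) = t$ (up to a constant factor in $t$).
Consequence: no local minima.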
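The lifting on this slide can be checked numerically. The sketch below builds Z from random factors and verifies that Z is positive semidefinite, that its off-diagonal block is X = UV^T, that Tr(Z) equals the sum of the squared Frobenius norms, and that the nuclear norm of X is at most Tr(Z)/2 (the standard inequality behind the trace constraint). The dimensions, the random seed, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 5, 4, 2                 # hypothetical sizes
U = rng.standard_normal((n, k))
V = rng.standard_normal((m, k))

W = np.vstack([U, V])             # stacked factors, shape (n + m, k)
Z = W @ W.T                       # lifted symmetric matrix
X = Z[:n, n:]                     # upper-right block, equals U V^T

trace_Z = float(np.trace(Z))
fro_sum = float(np.sum(U ** 2) + np.sum(V ** 2))
nuc_X = float(np.linalg.svd(X, compute_uv=False).sum())
min_eig = float(np.linalg.eigvalsh(Z).min())  # >= 0 up to rounding
```

The factor 1/2 in the last check is why the trace bound and the nuclear norm bound agree only up to a constant.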

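7 Sparse Approximation: The Problem
$f(\cdot)$ convex, differentiable.
Vector case: $\min_{x \in \mathbb{R}^n} f(x)$ s.t. $x \ge 0$, $\mathbf{1}^T x = 1$.
Matrix case: $\min_{Z \in \mathrm{Sym}^{n \times n}} f(Z)$ s.t. $Z \succeq 0$, $\operatorname{Tr}(Z) = 1$.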

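8 The Problem and The Algorithm [Clarkson SODA '08] [Hazan LATIN '08]
The Problem:
- Vector case: $\min_{x \in \mathbb{R}^n} f(x)$ s.t. $x \ge 0$, $\mathbf{1}^T x = 1$.
- Matrix case: $\min_{Z \in \mathrm{Sym}^{n \times n}} f(Z)$ s.t. $Z \succeq 0$, $\operatorname{Tr}(Z) = 1$.
The Algorithm, vector case (coordinate descent):
$i := \arg\max_i \left(-\nabla f(x^{(k)})\right)_i$
$x^{(k+1)} := (1 - \lambda)\, x^{(k)} + \lambda\, e_i$
After $k$ steps, $x^{(k)}$ has sparsity at most $k$.
The Algorithm, matrix case (largest eigenvector):
$v := \arg\max_{\|v\| = 1} v^T \left(-\nabla f(Z^{(k)})\right) v$
$Z^{(k+1)} := (1 - \lambda)\, Z^{(k)} + \lambda\, v v^T$
After $k$ steps, $Z^{(k)}$ is built from $v^{(1)}, \dots, v^{(k)}$ and has rank at most $k$.
Step size: $\lambda = 1/k$.
No projection steps!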
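The vector-case update on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the step size 1/k and a user-supplied gradient function. The name `simplex_frank_wolfe`, the quadratic test objective, and the target point `p` are hypothetical and only serve the example.

```python
import numpy as np

def simplex_frank_wolfe(grad, n, steps):
    """Greedy coordinate updates over the unit simplex with step size 1/k."""
    x = np.zeros(n)
    x[0] = 1.0                      # start at a vertex of the simplex
    for k in range(1, steps + 1):
        g = grad(x)
        i = int(np.argmax(-g))      # coordinate with the steepest descent
        lam = 1.0 / k
        x *= (1.0 - lam)            # x <- (1 - lam) * x + lam * e_i
        x[i] += lam
    return x

# Hypothetical objective: f(x) = ||x - p||^2 with p inside the simplex.
p = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
x_hat = simplex_frank_wolfe(lambda x: 2.0 * (x - p), n=5, steps=500)
f_hat = float(np.sum((x_hat - p) ** 2))
```

Every iterate is a convex combination of simplex vertices, so it stays feasible without any projection, and each step adds at most one new nonzero coordinate.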

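9 The Algorithm and The Convergence
Vector case:
$i := \arg\max_i \left(-\nabla f(x^{(k)})\right)_i$
$x^{(k+1)} := (1 - \lambda)\, x^{(k)} + \lambda\, e_i$
After $O(1/\varepsilon)$ steps the primal-dual error is $\le \varepsilon$.
Matrix case:
$v := \arg\max_{\|v\| = 1} v^T \left(-\nabla f(Z^{(k)})\right) v$
$Z^{(k+1)} := (1 - \lambda)\, Z^{(k)} + \lambda\, v v^T$
After $O(1/\varepsilon)$ steps the primal-dual error is $\le \varepsilon$.
Approximate eigenvector computation: let $M := -\nabla f(Z^{(k)})$. Instead of the exact $v := \arg\max_{\|v\| = 1} v^T M v$, it is enough to work with a unit vector $v$ whose value $v^T M v$ is within a small additive error of $\lambda_{\max}(M)$.
Such a $v$ can be found by doing $O(\dots)$ Lanczos steps.
Alternative: power method.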
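The power-method alternative named on this slide can be sketched as follows. This is a minimal version that assumes the largest eigenvalue of M is also the largest in magnitude (for example, M positive semidefinite); otherwise a shift such as M + cI would be needed. The function name, iteration count, and the test matrix with a prescribed spectrum are assumptions for illustration.

```python
import numpy as np

def power_method(M, iters=200, seed=0):
    """Approximate the dominant eigenvector of a symmetric matrix M."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = M @ v                    # one matrix-vector multiplication
        norm = np.linalg.norm(w)
        if norm == 0.0:              # v lies in the null space; stop
            break
        v = w / norm
    return v

# Hypothetical test matrix with known eigenvalues 5, 3, 2, 1, 0.5, 0.1.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
M = Q @ np.diag([5.0, 3.0, 2.0, 1.0, 0.5, 0.1]) @ Q.T

v = power_method(M)
rayleigh = float(v @ M @ v)          # approaches lambda_max from below
```

Stopping after fewer iterations gives a vector whose Rayleigh quotient is slightly below the largest eigenvalue, which is the kind of approximate solution the slide says is sufficient.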

10 Low Norm Matrix Factorization
$f(Z) := \sum_{ij \in S} (Z_{ij} - Y_{ij})^2$
We need the largest eigenvector of $M := -\nabla f(Z^{(k)})$, a symmetric $(n+m) \times (n+m)$ matrix.
Power method: $v := \frac{Mv}{\|Mv\|}$
These computations correspond to Simon Funk's method.
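To make the matrix case concrete, here is a hedged Python sketch of the lifted update applied to a small matrix completion problem. Two substitutions are made for robustness and should not be read as the talk's method: the eigenvector is computed exactly with `numpy.linalg.eigh` instead of the power method, and the step size comes from an exact line search on the segment instead of 1/k. The function names, the toy data, and the choice of the trace bound `t` are all assumptions.

```python
import numpy as np

def observed_loss(X, Y, mask):
    """Sum of squared errors over observed entries (mask == 1)."""
    return float(np.sum((mask * (X - Y)) ** 2))

def complete(Y, mask, t, steps=200):
    """Greedy rank-1 updates on the lifted PSD matrix Z with Tr(Z) = t."""
    n, m = Y.shape
    Z = np.eye(n + m) * (t / (n + m))      # feasible start: PSD, trace t
    for _ in range(steps):
        R = mask * (Z[:n, n:] - Y)         # residual on observed entries
        M = np.zeros((n + m, n + m))       # symmetric descent matrix
        M[:n, n:] = -R
        M[n:, :n] = -R.T
        _, vecs = np.linalg.eigh(M)        # exact solver (assumption)
        v = vecs[:, -1]                    # eigenvector of largest eigenvalue
        D = t * np.outer(v, v) - Z         # direction towards t * v v^T
        P = mask * D[:n, n:]
        denom = float(np.sum(P * P))
        if denom == 0.0:
            break
        # Exact minimizer of the quadratic loss along the segment.
        lam = float(np.clip(-np.sum(R * P) / denom, 0.0, 1.0))
        Z = Z + lam * D
    return Z

# Hypothetical toy problem: rank-1 ratings, roughly 70% observed.
rng = np.random.default_rng(0)
Y = np.outer(rng.uniform(1, 2, 6), rng.uniform(1, 2, 5))
mask = (rng.random(Y.shape) < 0.7).astype(float)
mask[0, 0] = 1.0                           # ensure one observed entry
t = 2.0 * np.linalg.svd(Y, compute_uv=False).sum()

Z_hat = complete(Y, mask, t)
initial_loss = observed_loss(np.zeros_like(Y), Y, mask)
final_loss = observed_loss(Z_hat[:6, 6:], Y, mask)
```

The predicted ratings are read off the upper-right block of `Z_hat`. Because each update is a convex combination of feasible matrices, the trace and positive semidefiniteness are preserved without projection.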

11 Comparison
Criteria: convex, control on the rank, convergence guarantee, step complexity.
- MMMF, alternating gradient descent: $k(n + m)$
- Singular Value Thresholding methods: convergence guarantee $O(1/\sqrt{\varepsilon})$; each step computes an exact, full SVD
- Our method: convergence guarantee $O(1/\varepsilon)$; each step computes an approximate eigenvector
- Simon Funk's / SVD++ (*): each step is a matrix-vector multiplication
(*) different optimization problem

12 Experiments
> 5x faster than existing Singular Value Thresholding methods such as [Toh & Yun '09, Mazumder et al. '09, Ji & Ye ICML '09, ...]
Scales well to larger problems such as the Netflix data
[Figure: RMSE versus iteration k on MovieLens 10M (rb), with test and train curves for three step-size rules: 1/k, best on line segment, gradient interpolation.]
Prediction performance is
- comparable to the best non-linear MMMF methods such as [Lawrence & Urtasun ICML '09]
- slightly worse than the custom-engineered methods for Netflix.
Sensitivity to the regularization parameter:
[Figure: test RMSE versus the trace regularization t, at k = …]

13 Conclusions
- Overall computational cost is about the same as a single SVD
- First algorithm for nuclear norm optimization that does not need an SVD as an internal computation
- First Simon-Funk-type algorithm with a convergence guarantee
- Easy to implement and to parallelize; any approximate eigenvector method of choice can be used internally

14 Thanks
