Lecture: Matrix Completion

1 1/56 Lecture: Matrix Completion Acknowledgement: these slides are based on Prof. Jure Leskovec's and Prof. Emmanuel Candès's lecture notes

2 Recommendation systems References: 2/56

3 The Netflix Prize 3/56 Training data: 100 million ratings, 480,000 users, 17,770 movies; 6 years of data. Test data: last few ratings of each user (2.8 million). Evaluation criterion: root mean squared error (RMSE), $\sqrt{\tfrac{1}{N}\sum_{(x,i)}(\hat r_{xi} - r_{xi})^2}$ over the N test ratings, where $\hat r_{xi}$ and $r_{xi}$ are the predicted and true rating of user x on movie i. Netflix's own system, Cinematch, provided the baseline RMSE; the competition offered a $1 million prize for a 10% improvement on Cinematch.

4 Netflix: evaluation 4/56

5 Collaborative Filtering: weighted sum model 5/56
$\hat r_{xi} = b_{xi} + \sum_{j \in N(i;x)} w_{ij}\,(r_{xj} - b_{xj})$
Baseline estimate for $r_{xi}$: $b_{xi} = \mu + b_x + b_i$, where $\mu$ is the overall mean rating, $b_x$ is the rating deviation of user x, i.e. (avg. rating of user x) - $\mu$, and $b_i$ = (avg. rating of movie i) - $\mu$. We sum over all movies j that are similar to i and were rated by x. $w_{ij}$ is the interpolation weight (some real number); we allow $\sum_{j \in N(i;x)} w_{ij} \neq 1$. $w_{ij}$ models the interaction between pairs of movies (it does not depend on user x). $N(i;x)$: set of movies rated by user x that are similar to movie i.

6 Finding weights w_ij? 6/56 Find $w_{ij}$ such that they work well on known (user, item) ratings:
$\min_{w}\ F(w) := \sum_x \Big( b_{xi} + \sum_{j \in N(i;x)} w_{ij}(r_{xj} - b_{xj}) - r_{xi} \Big)^2$
This is an unconstrained optimization of a quadratic function:
$\frac{\partial F(w)}{\partial w_{ij}} = 2 \sum_x \Big( b_{xi} + \sum_{k \in N(i;x)} w_{ik}(r_{xk} - b_{xk}) - r_{xi} \Big)(r_{xj} - b_{xj}) = 0 \quad \text{for } j \in N(i;x) \text{ and all } i, x.$
Setting the gradient to zero is equivalent to solving a system of linear equations. Alternatively, use the steepest gradient descent method, $w^{k+1} = w^k - \tau \nabla F(w^k)$, or the conjugate gradient method.
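
As a minimal illustration of the gradient-descent option, the sketch below (Python/numpy) minimizes a least-squares objective of exactly this quadratic form on hypothetical toy data; the bookkeeping that maps users and movies into the matrix A and vector y is an assumption for the example, not the real recommender pipeline.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))   # row per user x: residuals r_xj - b_xj for the neighbours j of movie i
y = rng.standard_normal(200)         # per user x: residual r_xi - b_xi

def F(w):
    return np.sum((A @ w - y) ** 2)

w = np.zeros(10)
tau = 0.5 / np.linalg.norm(A, 2) ** 2        # = 1/L, where L = 2*||A||_2^2 is the gradient's Lipschitz constant
for _ in range(500):
    grad = 2 * A.T @ (A @ w - y)             # gradient of the quadratic objective
    w -= tau * grad

# sanity check against the linear-system (normal equations) solution
w_exact = np.linalg.lstsq(A, y, rcond=None)[0]
print(F(w), F(w_exact))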

7 Latent factor models 7/56 SVD on Netflix data: $R \approx Q P^T$. For now let's assume we can approximate the rating matrix R as a product of thin matrices Q and $P^T$. R has missing entries, but let's ignore that for now! Basically, we want the reconstruction error to be small on the known ratings, and we don't care about the values of the missing ones.

8 Ratings as products of factors 8/56 How to estimate the missing rating of user x for item i? $\hat r_{xi} = q_i p_x^T = \sum_f q_{if}\, p_{xf}$, where $q_i$ is row i of Q and $p_x$ is column x of $P^T$ (equivalently, row x of P).

9 SVD - Properties 9/56 Theorem (SVD): If A is a real m-by-n matrix, then there exist $U = [u_1, \ldots, u_m] \in R^{m \times m}$ and $V = [v_1, \ldots, v_n] \in R^{n \times n}$ such that $U^T U = I$, $V^T V = I$ and $U^T A V = \mathrm{diag}(\sigma_1, \ldots, \sigma_p) \in R^{m \times n}$, $p = \min(m, n)$, where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$.
Eckart & Young, 1936: Let the SVD of $A \in R^{m \times n}$ be given as in the theorem above. If $k < r = \mathrm{rank}(A)$ and $A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$, then $\min_{\mathrm{rank}(B)=k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}$.

10 What is SVD? 10/56 A: input data matrix; U: left singular vectors; V: right singular vectors; Σ: singular values. SVD gives the minimum reconstruction error (SSE):
$\min_{U,V,\Sigma}\ \sum_{ij}\big(A_{ij} - [U\Sigma V^T]_{ij}\big)^2$
In our case, SVD on Netflix data: $R \approx Q P^T$, i.e., $A = R$, $Q = U$, $P^T = \Sigma V^T$. But we are not done yet: R has missing entries!
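
A minimal numpy sketch of this rank-k, SSE-optimal approximation (random data; the sizes and k are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation (Eckart-Young)

sse = np.sum((A - A_k) ** 2)                  # equals the sum of the squared discarded singular values
print(sse, np.sum(s[k:] ** 2))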

11 Latent factor models 11/56 Minimize SSE on training data! Use specialized methods to find P, Q such that $\hat r_{xi} = q_i p_x^T$:
$\min_{P,Q}\ \sum_{(i,x) \in \text{training}} (r_{xi} - q_i p_x^T)^2$
We don't require the columns of P, Q to be orthogonal or of unit length; P, Q map users/movies to a latent space. Add regularization:
$\min_{P,Q}\ \sum_{(i,x) \in \text{training}} (r_{xi} - q_i p_x^T)^2 + \lambda \Big[ \sum_x \|p_x\|_2^2 + \sum_i \|q_i\|_2^2 \Big]$
$\lambda$ is called the regularization parameter.

12 Gradient descent method 12/56
$\min_{P,Q}\ F(P, Q) := \sum_{(i,x) \in \text{training}} (r_{xi} - q_i p_x^T)^2 + \lambda \Big[ \sum_x \|p_x\|_2^2 + \sum_i \|q_i\|_2^2 \Big]$
Gradient descent: initialize P and Q (e.g. using SVD, pretending the missing ratings are 0), then iterate
$P^{k+1} \leftarrow P^k - \tau \nabla_P F(P^k, Q^k)$, $Q^{k+1} \leftarrow Q^k - \tau \nabla_Q F(P^k, Q^k)$,
where $(\nabla_Q F)_{if} = \sum_x -2(r_{xi} - q_i p_x^T)\,p_{xf} + 2\lambda q_{if}$ (the sum runs over the users x with $(i,x)$ in the training set). Here $q_{if}$ is entry f of row $q_i$ of matrix Q. Computing full gradients is slow when the dimension is huge.

13 Stochastic gradient descent method 13/56 Observation: let $q_{if}$ be entry f of row $q_i$ of matrix Q. Then
$(\nabla_Q F)_{if} = \sum_{x,i} \big( -2(r_{xi} - q_i p_x^T)\,p_{xf} + 2\lambda q_{if} \big) = \sum_{x,i} \big(\nabla_Q F(r_{xi})\big)_{if}$
$(\nabla_P F)_{xf} = \sum_{x,i} \big( -2(r_{xi} - q_i p_x^T)\,q_{if} + 2\lambda p_{xf} \big) = \sum_{x,i} \big(\nabla_P F(r_{xi})\big)_{xf}$
Stochastic gradient descent: instead of evaluating the gradient over all ratings, evaluate it for each individual rating and make a step:
$P \leftarrow P - \tau \nabla_P F(r_{xi})$,  $Q \leftarrow Q - \tau \nabla_Q F(r_{xi})$.
This needs more steps, but each step is computed much faster.
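
A minimal numpy sketch of this SGD update on synthetic ratings (sizes, sampling rate, step size and λ are arbitrary choices; biases are omitted):

import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, f = 50, 40, 5
P_true = rng.standard_normal((n_users, f))
Q_true = rng.standard_normal((n_movies, f))
R = Q_true @ P_true.T                          # r_xi stored as R[i, x]
omega = [(i, x) for i in range(n_movies) for x in range(n_users) if rng.random() < 0.3]

P = 0.1 * rng.standard_normal((n_users, f))    # rows p_x
Q = 0.1 * rng.standard_normal((n_movies, f))   # rows q_i
tau, lam = 0.01, 0.1

for epoch in range(30):
    rng.shuffle(omega)
    for i, x in omega:
        err = R[i, x] - Q[i] @ P[x]            # r_xi - q_i p_x^T
        grad_q = -2 * err * P[x] + 2 * lam * Q[i]
        grad_p = -2 * err * Q[i] + 2 * lam * P[x]
        Q[i] -= tau * grad_q                   # Q <- Q - tau * grad_Q F(r_xi)
        P[x] -= tau * grad_p                   # P <- P - tau * grad_P F(r_xi)

print(np.mean([(R[i, x] - Q[i] @ P[x]) ** 2 for i, x in omega]))   # training SSE per rating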

14 Latent factor models with biases 14/56 Predicted model: $\hat r_{xi} = \mu + b_x + b_i + q_i p_x^T$, where $\mu$ is the overall mean rating, $b_x$ the bias for user x, and $b_i$ the bias for movie i. New model:
$\min_{P,Q,b_x,b_i}\ \sum_{(i,x) \in \text{training}} \big(r_{xi} - (\mu + b_x + b_i + q_i p_x^T)\big)^2 + \lambda \Big[ \sum_x \|p_x\|_2^2 + \sum_i \|q_i\|_2^2 + \sum_x b_x^2 + \sum_i b_i^2 \Big]$
Both the biases $b_x, b_i$ and the interactions $q_i, p_x$ are treated as parameters (we estimate them). One can also add time dependence to the biases: $\hat r_{xi} = \mu + b_x(t) + b_i(t) + q_i p_x^T$.

15 Netflix: performance 15/56

16 Netflix: performance 16/56

17 General matrix completion 17/56

18 18/56 Matrix completion Matrix $M \in R^{n_1 \times n_2}$; we observe a subset of its entries. Can we guess the missing entries?

19 19/56 Which algorithm? Hope: only one low-rank matrix is consistent with the sampled entries. Recovery by minimum complexity:
minimize $\mathrm{rank}(X)$ subject to $X_{ij} = M_{ij}$, $(i,j) \in \Omega$
Problem: this is NP-hard (doubly exponential in n?).

20 Nuclear-norm minimization 20/56 Singular value decomposition:
$X = \sum_{k=1}^r \sigma_k u_k v_k^T$
$\{\sigma_k\}$: singular values, $\{u_k\}, \{v_k\}$: singular vectors. Nuclear norm ($\sigma_i(X)$ is the i-th largest singular value of X):
$\|X\|_* = \sum_{i=1}^n \sigma_i(X)$
Heuristic: minimize $\|X\|_*$ subject to $X_{ij} = M_{ij}$, $(i,j) \in \Omega$. This is a convex relaxation of the rank minimization program.
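
A small sketch of this heuristic with the CVXPY modeling package (the synthetic low-rank M, the sampling rate and the sizes are arbitrary choices for illustration):

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 30, 30, 3
M = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))   # rank-r ground truth
P = (rng.random((n1, n2)) < 0.5).astype(float)                    # indicator of the sample set Omega

X = cp.Variable((n1, n2))
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),                     # minimize ||X||_*
                  [cp.multiply(P, X) == P * M])                   # X_ij = M_ij on Omega
prob.solve()

print(np.linalg.norm(X.value - M) / np.linalg.norm(M))            # relative recovery error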

21 Connections with compressed sensing 21/56 General setup:
Rank minimization: minimize $\mathrm{rank}(X)$ subject to $\mathcal{A}(X) = b$. Convex relaxation: minimize $\|X\|_*$ subject to $\mathcal{A}(X) = b$.
Suppose $X = \mathrm{diag}(x)$, $x \in R^n$. Then $\mathrm{rank}(X) = \sum_i 1(x_i \neq 0) = \|x\|_{\ell_0}$ and $\|X\|_* = \sum_i |x_i| = \|x\|_{\ell_1}$, so the two problems become:
Rank minimization: minimize $\|x\|_{\ell_0}$ subject to $Ax = b$. Convex relaxation: minimize $\|x\|_{\ell_1}$ subject to $Ax = b$.
This is compressed sensing!

22 Correspondence 22/56 (Table from Recht, Parrilo, Fazel (08); vector case vs. matrix case)
parsimony concept: cardinality | rank
Hilbert space norm: Euclidean | Frobenius
sparsity-inducing norm: $\ell_1$ | nuclear
dual norm: $\ell_\infty$ | operator norm
additivity: disjoint support | orthogonal row and column spaces
convex optimization: linear programming | semidefinite programming

23 Semidefinite programming (SDP) 23/56 A special class of convex optimization problems; a relatively natural extension of linear programming (LP); efficient numerical solvers (interior point methods).
LP ($x \in R^n$): minimize $\langle c, x \rangle$ subject to $Ax = b$, $x \ge 0$.
SDP ($X \in R^{n \times n}$): minimize $\langle C, X \rangle$ subject to $\langle A_k, X \rangle = b_k$, $X \succeq 0$.
Standard inner product: $\langle C, X \rangle = \mathrm{trace}(C^T X)$.

24 SOCP/SDP Duality 24/56
(P) $\min\ c^T x$ s.t. $Ax = b$, $x \succeq_Q 0$;   (D) $\max\ b^T y$ s.t. $A^T y + s = c$, $s \succeq_Q 0$.
(P) $\min\ \langle C, X \rangle$ s.t. $\langle A_i, X \rangle = b_i$, $i = 1, \ldots, m$, $X \succeq 0$;   (D) $\max\ b^T y$ s.t. $\sum_i y_i A_i + S = C$, $S \succeq 0$.
Strong duality: if $p^* > -\infty$ and (P) is strictly feasible, then (D) is feasible and $p^* = d^*$; if $d^* < +\infty$ and (D) is strictly feasible, then (P) is feasible and $p^* = d^*$; if (P) and (D) both have strictly feasible solutions, then both have optimal solutions.

25 Semidefinite program 25/56
(D) $\min\ b^T y$ s.t. $y_1 A_1 + \cdots + y_m A_m \succeq C$, with $A_i, C \in S^k$; the multiplier is a matrix $X \in S^k$.
Lagrangian: $L(y, X) = b^T y - \langle X,\ y_1 A_1 + \cdots + y_m A_m - C \rangle$.
Dual function: $g(X) = \inf_y L(y, X) = \langle C, X \rangle$ if $\langle A_i, X \rangle = b_i$ for all i, and $-\infty$ otherwise.
The dual of (D) is therefore $\min\ \langle C, X \rangle$ s.t. $\langle A_i, X \rangle = b_i$, $X \succeq 0$, and $p^* = d^*$ if the primal SDP is strictly feasible.

26 SDP Relaxation of Maxcut 26/56
$\min\ x^T W x$ s.t. $x_i^2 = 1$
is a nonconvex problem; its feasible set contains $2^n$ discrete points. Interpretation: partition $\{1, \ldots, n\}$ into two sets; $W_{ij}$ is the cost of assigning i, j to the same set and $-W_{ij}$ the cost of assigning them to different sets. Its Lagrangian dual is
$\max\ -\mathbf{1}^T \nu$ s.t. $W + \mathrm{diag}(\nu) \succeq 0$,
which is in turn dual to the SDP relaxation
$\min\ \mathrm{Tr}(WX)$ s.t. $X_{ii} = 1$, $X \succeq 0$.
Dual function:
$g(\nu) = \inf_x \big( x^T W x + \sum_i \nu_i (x_i^2 - 1) \big) = \inf_x\ x^T (W + \mathrm{diag}(\nu)) x - \mathbf{1}^T \nu = -\mathbf{1}^T \nu$ if $W + \mathrm{diag}(\nu) \succeq 0$, and $-\infty$ otherwise.

27 Positive semidefinite unknown: SDP formulation 27/56 Suppose the unknown matrix X is positive semidefinite. Then
$\min\ \sum_{i=1}^n \sigma_i(X)$ s.t. $X_{ij} = M_{ij}$, $(i,j) \in \Omega$, $X \succeq 0$   is equivalent to   $\min\ \mathrm{trace}(X)$ s.t. $X_{ij} = M_{ij}$, $(i,j) \in \Omega$, $X \succeq 0$.
Trace heuristic: Mesbahi & Papavassilopoulos (1997), Beck & D'Andrea (1998).

28 General SDP formulation 28/56 Let $X \in R^{m \times n}$. For a given norm $\|\cdot\|$, the dual norm $\|\cdot\|_d$ is defined as
$\|X\|_d := \sup\{\langle X, Y \rangle : Y \in R^{m \times n},\ \|Y\| \le 1\}$.
The nuclear norm and the spectral norm are dual to each other: $\|X\| := \sigma_1(X)$, $\|X\|_* = \sum_i \sigma_i(X)$. Hence
(P) $\max_Y\ \langle X, Y \rangle$ s.t. $\|Y\|_2 \le 1$
is equivalent (up to a factor 2) to
$\max_Y\ 2\langle X, Y \rangle$ s.t. $\begin{bmatrix} I_m & Y \\ Y^T & I_n \end{bmatrix} \succeq 0$,
whose SDP dual is
$\min_Z\ \Big\langle Z, \begin{bmatrix} 0 & X \\ X^T & 0 \end{bmatrix} \Big\rangle$ s.t. $Z_1 = I_m$, $Z_2 = I_n$, $Z = \begin{bmatrix} Z_1 & Z_3 \\ Z_3^T & Z_2 \end{bmatrix} \succeq 0$.

29 General SDP formulation 29/56 The Lagrangian dual problem is
$\max_{W_1, W_2}\ \min_{Z \succeq 0}\ \Big\langle Z, \begin{bmatrix} 0 & X \\ X^T & 0 \end{bmatrix} \Big\rangle + \langle Z_1 - I_m, W_1 \rangle + \langle Z_2 - I_n, W_2 \rangle.$
By strong duality, after a scaling of 1/2 and a change of variables X to -X, one obtains
(D) minimize $\tfrac{1}{2}\big(\mathrm{trace}(W_1) + \mathrm{trace}(W_2)\big)$ subject to $\begin{bmatrix} W_1 & X \\ X^T & W_2 \end{bmatrix} \succeq 0$.
Optimization variables: $W_1 \in R^{n_1 \times n_1}$, $W_2 \in R^{n_2 \times n_2}$. See Proposition 2.1 in "Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization", Benjamin Recht, Maryam Fazel, Pablo A. Parrilo.

30 General SDP formulation 30/56 Nuclear norm minimization:
$\min\ \|X\|_*$ s.t. $\mathcal{A}(X) = b$;   dual: $\max\ b^T y$ s.t. $\|\mathcal{A}^*(y)\| \le 1$.
SDP reformulation:
$\min\ \tfrac{1}{2}(\mathrm{trace}(W_1) + \mathrm{trace}(W_2))$ s.t. $\mathcal{A}(X) = b$, $\begin{bmatrix} W_1 & X \\ X^T & W_2 \end{bmatrix} \succeq 0$;
dual: $\max\ b^T y$ s.t. $\begin{bmatrix} I & \mathcal{A}^*(y) \\ (\mathcal{A}^*(y))^T & I \end{bmatrix} \succeq 0$.

31 Matrix recovery 31/56 Consider
$M = \sum_{k=1}^2 \sigma_k u_k u_k^T$,  $u_1 = (e_1 + e_2)/\sqrt{2}$,  $u_2 = (e_1 - e_2)/\sqrt{2}$.
This M is zero outside its top-left 2-by-2 block, so almost every sampled entry is zero and carries no information: it cannot be recovered from a small set of entries.

32 32/56 Rank-1 matrix $M = xy^T$, $M_{ij} = x_i y_j$. If a single row (or column) is not sampled, recovery is not possible. What happens for almost all sampling sets? Let Ω be a subset of m entries selected uniformly at random.

33 33/56 References
Jian-Feng Cai, Emmanuel Candès, Zuowei Shen, A singular value thresholding algorithm for matrix completion.
Shiqian Ma, Donald Goldfarb, Lifeng Chen, Fixed point and Bregman iterative methods for matrix rank minimization.
Zaiwen Wen, Wotao Yin, Yin Zhang, Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm.
Onureena Banerjee, Laurent El Ghaoui, Alexandre d'Aspremont, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data.
Zhaosong Lu, Smooth optimization approach for sparse covariance selection.

34 34/56 Matrix Rank Minimization Given $X \in R^{m \times n}$, $\mathcal{A}: R^{m \times n} \to R^p$, $b \in R^p$, we consider the matrix rank minimization problem
$\min\ \mathrm{rank}(X)$ s.t. $\mathcal{A}(X) = b$,
the matrix completion problem
$\min\ \mathrm{rank}(X)$ s.t. $X_{ij} = M_{ij}$, $(i,j) \in \Omega$,
and the nuclear norm minimization problem
$\min\ \|X\|_*$ s.t. $\mathcal{A}(X) = b$,
where $\|X\|_* = \sum_i \sigma_i$ and $\sigma_i$ is the i-th singular value of X.

35 Quadratic penalty framework 35/56 Unconstrained nuclear norm minimization:
$\min\ F(X) := \mu \|X\|_* + \tfrac{1}{2}\|\mathcal{A}(X) - b\|_2^2$
Optimality condition: $0 \in \mu\, \partial\|X^*\|_* + \mathcal{A}^*(\mathcal{A}(X^*) - b)$, where $\partial\|X\|_* = \{UV^T + W : U^T W = 0,\ WV = 0,\ \|W\|_2 \le 1\}$.
Linearization approach (g is the gradient of $\tfrac{1}{2}\|\mathcal{A}(X) - b\|_2^2$):
$X^{k+1} := \arg\min_X\ \mu\|X\|_* + \langle g^k, X - X^k \rangle + \tfrac{1}{2\tau}\|X - X^k\|_F^2 = \arg\min_X\ \mu\|X\|_* + \tfrac{1}{2\tau}\|X - (X^k - \tau g^k)\|_F^2$

36 Matrix Shrinkage Operator 36/56 For a matrix $Y \in R^{m \times n}$ with SVD $Y = U \mathrm{Diag}(\sigma) V^T$, consider
$\min_{X \in R^{m \times n}}\ \nu \|X\|_* + \tfrac{1}{2}\|X - Y\|_F^2$
The optimal solution is
$X^* := S_\nu(Y) = U \mathrm{Diag}(s_\nu(\sigma)) V^T$,
where the thresholding operator $s_\nu$ acts componentwise: $s_\nu(x) := \bar x$ with $\bar x_i = x_i - \nu$ if $x_i - \nu > 0$, and $0$ otherwise.
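
A direct numpy sketch of $S_\nu$ (the names shrink and nu are mine):

import numpy as np

def shrink(Y, nu):
    # singular value shrinkage S_nu(Y): soft-threshold the singular values of Y by nu
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - nu, 0.0)) @ Vt

Y = np.random.default_rng(0).standard_normal((6, 4))
print(np.linalg.svd(shrink(Y, 0.5), compute_uv=False))   # each singular value reduced by 0.5 (floored at 0)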

37 Fixed Point Method (Proximal gradient method) 37/56 Fixed point iterative scheme:
$Y^k = X^k - \tau \mathcal{A}^*(\mathcal{A}(X^k) - b)$,  $X^{k+1} = S_{\tau\mu}(Y^k)$.
Lemma: the matrix shrinkage operator is non-expansive, i.e., $\|S_\nu(Y_1) - S_\nu(Y_2)\|_F \le \|Y_1 - Y_2\|_F$.
Theorem: the sequence $\{X^k\}$ generated by the fixed point iterations converges to some $X^* \in \mathcal{X}^*$, where $\mathcal{X}^*$ is the optimal solution set.
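
A minimal numpy sketch of this scheme for matrix completion, where $\mathcal{A}$ is the sampling operator $P_\Omega$ (so $\mathcal{A}^*(\mathcal{A}(X) - b) = P_\Omega(X) - b$); the sizes, μ, τ and iteration count are arbitrary choices, and shrink is the singular value shrinkage from the previous sketch, repeated so this snippet stands alone:

import numpy as np

def shrink(Y, nu):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - nu, 0.0)) @ Vt

rng = np.random.default_rng(0)
n, r = 60, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # low-rank ground truth
mask = rng.random((n, n)) < 0.4                                  # observed entries Omega
b = mask * M                                                     # P_Omega(M)

mu, tau = 0.1, 1.0                                               # tau <= 1 suffices since ||P_Omega|| = 1
X = np.zeros((n, n))
for _ in range(300):
    Y = X - tau * (mask * X - b)     # gradient step on 0.5 * ||P_Omega(X) - b||_F^2
    X = shrink(Y, tau * mu)          # proximal (shrinkage) step

print(np.linalg.norm(X - M) / np.linalg.norm(M))                 # relative error on all entries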

38 38/56 SVT Linearized Bregman method:
$V^{k+1} := V^k - \tau \mathcal{A}^*(\mathcal{A}(X^k) - b)$,  $X^{k+1} := S_{\tau\mu}(V^{k+1})$.
It converges to the solution of $\min\ \tau\|X\|_* + \tfrac{1}{2}\|X\|_F^2$ s.t. $\mathcal{A}(X) = b$.

39 Accelerated proximal gradient (APG) method 39/56 Complexity of the fixed point method:
$F(X^k) - F(X^*) \le \dfrac{L_f \|X^0 - X^*\|^2}{2k}$
APG algorithm ($t^1 = t^0 = 1$):
$Y^k = X^k + \dfrac{t^{k-1} - 1}{t^k}(X^k - X^{k-1})$,
$G^k = Y^k - (\tau^k)^{-1} \mathcal{A}^*(\mathcal{A}(Y^k) - b)$,
$X^{k+1} = S_{\mu/\tau^k}(G^k)$,  $t^{k+1} = \dfrac{1 + \sqrt{1 + 4(t^k)^2}}{2}$.
Complexity: $F(X^k) - F(X^*) \le \dfrac{2 L_f \|X^0 - X^*\|^2}{(k+1)^2}$

40 Low-rank factorization model 40/56 Find a low-rank matrix W so that $\|P_\Omega(W - M)\|_F^2$, i.e. the distance between W and $\{Z \in R^{m \times n} : Z_{ij} = M_{ij},\ (i,j) \in \Omega\}$, is minimized. Any matrix $W \in R^{m \times n}$ with $\mathrm{rank}(W) \le K$ can be expressed as $W = XY$, where $X \in R^{m \times K}$ and $Y \in R^{K \times n}$. New model:
$\min_{X,Y,Z}\ \tfrac{1}{2}\|XY - Z\|_F^2$ s.t. $Z_{ij} = M_{ij}$, $(i,j) \in \Omega$
Advantage: SVD is no longer needed! Related work: the solver OptSpace, based on optimization on manifolds.

41 41/56 Nonlinear Gauss-Seidel scheme First variant of alternating minimization:
$X_+ \leftarrow ZY^T(YY^T)^\dagger$,  $Y_+ \leftarrow (X_+^T X_+)^\dagger X_+^T Z$,  $Z_+ \leftarrow X_+ Y_+ + P_\Omega(M - X_+ Y_+)$.
Let $P_A$ be the orthogonal projection onto the range space $\mathcal{R}(A)$. Then
$X_+ Y_+ = \big(X_+(X_+^T X_+)^\dagger X_+^T\big) Z = P_{X_+} Z.$
One can verify that $\mathcal{R}(X_+) = \mathcal{R}(ZY^T)$, so
$X_+ Y_+ = P_{ZY^T} Z = ZY^T (Y Z^T Z Y^T)^\dagger (Y Z^T) Z.$
Idea: modify $X_+$ or $Y_+$ to obtain the same product $X_+ Y_+$.

42 41/56 Nonlinear Gauss-Seidel scheme Second variant of alternating minimization:
$X_+ \leftarrow ZY^T$,  $Y_+ \leftarrow (X_+^T X_+)^\dagger X_+^T Z$,  $Z_+ \leftarrow X_+ Y_+ + P_\Omega(M - X_+ Y_+)$.
Third variant of alternating minimization:
$V = \mathrm{orth}(ZY^T)$,  $X_+ \leftarrow V$,  $Y_+ \leftarrow V^T Z$,  $Z_+ \leftarrow X_+ Y_+ + P_\Omega(M - X_+ Y_+)$.
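
A minimal numpy sketch of the third variant, with orth realized by a reduced QR factorization (the sizes, rank K, sampling rate and iteration count are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
m, n, K = 80, 60, 4
M = rng.standard_normal((m, K)) @ rng.standard_normal((K, n))   # rank-K ground truth
mask = rng.random((m, n)) < 0.4                                  # observed entries Omega

Z = mask * M                                                     # start from P_Omega(M)
Y = rng.standard_normal((K, n))
for _ in range(200):
    V, _ = np.linalg.qr(Z @ Y.T)     # V = orth(Z Y^T), an m-by-K orthonormal basis
    X = V                            # X_+ <- V
    Y = V.T @ Z                      # Y_+ <- V^T Z
    W = X @ Y
    Z = W + mask * (M - W)           # Z_+ <- X_+ Y_+ + P_Omega(M - X_+ Y_+)

print(np.linalg.norm(X @ Y - M) / np.linalg.norm(M))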

43 42/56 Sparse and low-rank matrix separation Given a matrix M, we want to find a low-rank matrix W and a sparse matrix E so that $W + E = M$. Convex approximation (robust PCA):
$\min_{W,E}\ \|W\|_* + \mu\|E\|_1$ s.t. $W + E = M$

44 Video separation 43/56 Partition the video into moving and static parts

45 ADMM 44/56 Convex approximation: $\min_{W,E}\ \|W\|_* + \mu\|E\|_1$ s.t. $W + E = M$. Augmented Lagrangian function:
$L(W, E, \Lambda) := \|W\|_* + \mu\|E\|_1 + \langle \Lambda, W + E - M \rangle + \tfrac{1}{2\beta}\|W + E - M\|_F^2$
Alternating direction augmented Lagrangian method:
$W^{j+1} := \arg\min_W\ L(W, E^j, \Lambda^j)$,
$E^{j+1} := \arg\min_E\ L(W^{j+1}, E, \Lambda^j)$,
$\Lambda^{j+1} := \Lambda^j + \tfrac{\gamma}{\beta}(W^{j+1} + E^{j+1} - M)$.

46 W-subproblem 45/56
$W^{j+1} := \arg\min_W\ L(W, E^j, \Lambda^j) = \arg\min_W\ \|W\|_* + \tfrac{1}{2\beta}\|W - (M - E^j - \beta\Lambda^j)\|_F^2 = S_\beta(M - E^j - \beta\Lambda^j) := U \mathrm{Diag}(s_\beta(\sigma)) V^T$,
where the SVD is $M - E^j - \beta\Lambda^j = U \mathrm{Diag}(\sigma) V^T$ and the thresholding operator $s_\nu$ acts componentwise: $s_\nu(x) := \bar x$ with $\bar x_i = x_i - \nu$ if $x_i - \nu > 0$, and $0$ otherwise.

47 E-subproblem 46/56
$E^{j+1} := \arg\min_E\ L(W^{j+1}, E, \Lambda^j) = \arg\min_E\ \|E\|_1 + \tfrac{1}{2\beta\mu}\|E - (M - W^{j+1} - \beta\Lambda^j)\|_F^2 = s_{\beta\mu}(M - W^{j+1} - \beta\Lambda^j)$,
where the scalar soft-thresholding operator (applied componentwise) is
$s_\nu(y) := \arg\min_x\ \nu|x| + \tfrac{1}{2}(x - y)^2 = y - \nu\,\mathrm{sgn}(y)$ if $|y| > \nu$, and $0$ otherwise.
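
A compact numpy sketch of the full ADMM loop for this W/E splitting (synthetic data; the choices of β, μ, γ and the iteration count are mine and arbitrary):

import numpy as np

def svd_shrink(Y, nu):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - nu, 0.0)) @ Vt

def soft(Y, nu):
    return np.sign(Y) * np.maximum(np.abs(Y) - nu, 0.0)

rng = np.random.default_rng(0)
n, r = 60, 3
W_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))          # low-rank part
E_true = (rng.random((n, n)) < 0.05) * 10 * rng.standard_normal((n, n))     # sparse spikes
M = W_true + E_true

mu = 1.0 / np.sqrt(n)            # a common weight for the l1 term
beta, gamma = 0.1, 1.0
W = np.zeros((n, n)); E = np.zeros((n, n)); Lam = np.zeros((n, n))
for _ in range(200):
    W = svd_shrink(M - E - beta * Lam, beta)      # W-subproblem: singular value shrinkage
    E = soft(M - W - beta * Lam, beta * mu)       # E-subproblem: entrywise soft-thresholding
    Lam = Lam + (gamma / beta) * (W + E - M)      # multiplier update

print(np.linalg.norm(W - W_true) / np.linalg.norm(W_true))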

48 Low-rank factorization model for matrix separation 47/56 Consider the model
$\min_{Z,S}\ \|S\|_1$ s.t. $Z + S = D$, $\mathrm{rank}(Z) \le K$.
With the low-rank factorization $Z = UV$ (and S eliminated via $S = D - Z$) this becomes
$\min_{U,V,Z}\ \|Z - D\|_1$ s.t. $UV - Z = 0$.
If only the entries $D_{ij}$, $(i,j) \in \Omega$, are given and $P_\Omega(D)$ denotes the projection of D onto Ω, the new model is
$\min_{U,V,Z}\ \|P_\Omega(Z - D)\|_1$ s.t. $UV - Z = 0$.
Advantage: SVD is no longer needed!

49 ADMM 48/56 Consider $\min_{U,V,Z}\ \|P_\Omega(Z - D)\|_1$ s.t. $UV - Z = 0$. Introduce the augmented Lagrangian function
$L_\beta(U, V, Z, \Lambda) = \|P_\Omega(Z - D)\|_1 + \langle \Lambda, UV - Z \rangle + \tfrac{\beta}{2}\|UV - Z\|_F^2$
Alternating direction augmented Lagrangian framework (Bregman):
$U^{j+1} := \arg\min_{U \in R^{m \times k}}\ L_\beta(U, V^j, Z^j, \Lambda^j)$,
$V^{j+1} := \arg\min_{V \in R^{k \times n}}\ L_\beta(U^{j+1}, V, Z^j, \Lambda^j)$,
$Z^{j+1} := \arg\min_{Z \in R^{m \times n}}\ L_\beta(U^{j+1}, V^{j+1}, Z, \Lambda^j)$,
$\Lambda^{j+1} := \Lambda^j + \gamma\beta(U^{j+1} V^{j+1} - Z^{j+1})$.

50 ADMM subproblems 49/56 Let $B = Z - \Lambda/\beta$. Then $U_+ = BV^T(VV^T)^\dagger$ and $V_+ = (U_+^T U_+)^\dagger U_+^T B$. Since $U_+ V_+ = U_+(U_+^T U_+)^\dagger U_+^T B = P_{U_+} B$, one can instead use
$Q := \mathrm{orth}(BV^T)$,  $U_+ = Q$,  $V_+ = Q^T B$.
Variable Z:
$P_\Omega(Z_+) = P_\Omega\Big( S\big(U_+ V_+ - D + \tfrac{\Lambda}{\beta},\ \tfrac{1}{\beta}\big) + D \Big)$,
$P_{\Omega^c}(Z_+) = P_{\Omega^c}\Big( U_+ V_+ + \tfrac{\Lambda}{\beta} \Big)$,
where $S(\cdot, \nu)$ denotes componentwise soft-thresholding.

51 Nonnegative matrix factorization completion 50/56 Model problem:
$\min\ \|P_\Omega(XY - M)\|_F$ s.t. $X \ge 0$, $Y \ge 0$,
reformulated as
$\min\ \tfrac{1}{2}\|XY - Z\|_F^2$ s.t. $X = U$, $Y = V$, $U \ge 0$, $V \ge 0$, $P_\Omega(Z - M) = 0$.
Augmented Lagrangian function:
$L_A(X, Y, Z, U, V, \Lambda, \Pi) = \tfrac{1}{2}\|XY - Z\|_F^2 + \Lambda \cdot (X - U) + \Pi \cdot (Y - V) + \tfrac{\alpha}{2}\|X - U\|_F^2 + \tfrac{\beta}{2}\|Y - V\|_F^2$,
where $A \cdot B := \sum_{i,j} a_{ij} b_{ij}$.

52 51/56 ADMM
$X_{k+1} = (Z_k Y_k^T + \alpha U_k - \Lambda_k)(Y_k Y_k^T + \alpha I)^{-1}$,  (3a)
$Y_{k+1} = (X_{k+1}^T X_{k+1} + \beta I)^{-1}(X_{k+1}^T Z_k + \beta V_k - \Pi_k)$,  (3b)
$Z_{k+1} = X_{k+1} Y_{k+1} + P_\Omega(M - X_{k+1} Y_{k+1})$,  (3c)
$U_{k+1} = P_+(X_{k+1} + \Lambda_k / \alpha)$,  (3d)
$V_{k+1} = P_+(Y_{k+1} + \Pi_k / \beta)$,  (3e)
$\Lambda_{k+1} = \Lambda_k + \gamma\alpha(X_{k+1} - U_{k+1})$,  (3f)
$\Pi_{k+1} = \Pi_k + \gamma\beta(Y_{k+1} - V_{k+1})$,  (3g)
where $\gamma \in (0, 1.618)$ and $(P_+(A))_{ij} = \max\{A_{ij}, 0\}$.
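
A minimal numpy sketch of updates (3a)-(3g) on synthetic nonnegative data (K, α, β, γ, the sampling rate and the iteration count are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
m, n, K = 60, 50, 4
M = rng.random((m, K)) @ rng.random((K, n))      # nonnegative rank-K ground truth
mask = rng.random((m, n)) < 0.5                  # observed entries Omega

alpha = beta = 1.0
gamma = 1.6                                      # gamma in (0, 1.618)
X, Y = rng.random((m, K)), rng.random((K, n))
U, V = X.copy(), Y.copy()
Z = mask * M
Lam, Pi = np.zeros((m, K)), np.zeros((K, n))
I_K = np.eye(K)

for _ in range(300):
    X = (Z @ Y.T + alpha * U - Lam) @ np.linalg.inv(Y @ Y.T + alpha * I_K)   # (3a)
    Y = np.linalg.inv(X.T @ X + beta * I_K) @ (X.T @ Z + beta * V - Pi)      # (3b)
    W = X @ Y
    Z = W + mask * (M - W)                                                   # (3c)
    U = np.maximum(X + Lam / alpha, 0.0)                                     # (3d)
    V = np.maximum(Y + Pi / beta, 0.0)                                       # (3e)
    Lam = Lam + gamma * alpha * (X - U)                                      # (3f)
    Pi = Pi + gamma * beta * (Y - V)                                         # (3g)

print(np.linalg.norm(mask * (X @ Y - M)) / np.linalg.norm(mask * M))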

53 Sparse covariance selection (A. d'Aspremont) 52/56 We estimate a covariance matrix Σ from empirical data and infer independence relationships between variables. Given m + 1 observations $x_i \in R^n$ on n random variables, compute the sample covariance
$S := \tfrac{1}{m+1} \sum_{i=1}^{m+1} (x_i - \bar x)(x_i - \bar x)^T$.
Choose a symmetric subset I of matrix coefficients and denote by J its complement. Choose a covariance matrix $\hat\Sigma$ such that $\hat\Sigma_{ij} = S_{ij}$ for all $(i,j) \in I$ and $(\hat\Sigma^{-1})_{ij} = 0$ for all $(i,j) \in J$.
Benefits: maximum entropy, maximum likelihood, existence and uniqueness. Applications: gene expression data, speech recognition and finance.

54 Maximum likelihood estimation 53/56 Consider the penalized estimation problem
$\max_{X \in S^n}\ \log\det X - \mathrm{Tr}(SX) - \rho\,\|X\|_0$
and its convex relaxation
$\max_{X \in S^n}\ \log\det X - \mathrm{Tr}(SX) - \rho\,\|X\|_1$
(here $\|X\|_1$ is the entrywise $\ell_1$ norm), whose dual problem is
$\max\ \log\det W$ s.t. $\|W - S\|_\infty \le \rho$.
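
A small CVXPY sketch of the convex relaxation (random data; n, the number of samples and ρ are arbitrary choices; the $\ell_1$ penalty is applied entrywise):

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n = 10
data = rng.standard_normal((200, n))
S = np.cov(data, rowvar=False)            # empirical covariance
rho = 0.1

X = cp.Variable((n, n), symmetric=True)   # estimate of the inverse covariance
obj = cp.Maximize(cp.log_det(X) - cp.trace(S @ X) - rho * cp.sum(cp.abs(X)))
cp.Problem(obj).solve()

print(np.sum(np.abs(X.value) > 1e-4))     # number of (numerically) nonzero entries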

55 APG 54/56 Zhaosong Lu (smooth optimization approach for sparse covariance selection) considers
$\max\ \log\det X - \mathrm{Tr}(SX) - \rho\|X\|_1$ s.t. $X \in \mathcal{X} := \{X \in S^n : \beta I \succeq X \succeq \alpha I\}$,
which is equivalent to (with $\mathcal{U} := \{U \in S^n : |U_{ij}| \le 1\ \forall i,j\}$)
$\max_{X \in \mathcal{X}}\ \min_{U \in \mathcal{U}}\ \log\det X - \langle S + \rho U, X \rangle$.
Let $f(U) := \max_{X \in \mathcal{X}}\ \log\det X - \langle S + \rho U, X \rangle$. Since $\log\det X$ is strongly concave on $\mathcal{X}$, $f(U)$ is continuously differentiable and $\nabla f(U)$ is Lipschitz continuous with $L = \rho\beta^2$. Therefore, APG can be applied to the dual problem $\min_{U \in \mathcal{U}} f(U)$.

56 Extension 55/56 Consider
$\max_{x \in X}\ g(x) := \min_{u \in U} \phi(x, u)$.
Assume $\phi(x, u)$ is a continuous function which is strictly concave in $x \in X$ for every fixed $u \in U$, and convex and differentiable in $u \in U$ for every fixed $x \in X$. Then $f(u) := \max_{x \in X} \phi(x, u)$ is differentiable, $\nabla f(u)$ is Lipschitz continuous, and the primal problem and the dual problem $\min_{u \in U} f(u)$ are both solvable and have the same optimal value. Nesterov's smooth minimization approach can then be applied to the dual.

57 56/56 Nesterov's smoothing technique Consider $\max_{x \in X}\ \min_{u \in U} \phi(x, u)$. Question: what if the assumptions above do not hold? Add a strictly convex function $\mu d(u)$ to the objective and define the smoothed function
$g_\mu(x) := \min_{u \in U}\ \phi(x, u) + \mu d(u)$.
Then $g_\mu(x)$ is differentiable, and Nesterov's smooth minimization can be applied. Complexity of finding an ε-suboptimal point: $O(1/\epsilon)$ iterations. Other smoothing techniques?
