ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference


1 ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
Low-rank matrix recovery via nonconvex optimization
Yuejie Chi
Department of Electrical and Computer Engineering
Spring 2018

2 Outline
Low-rank matrix completion and recovery
- Nuclear norm minimization (last lecture)
  - RIP and low-rank matrix recovery
  - Matrix completion
  - Algorithms for nuclear norm minimization
- Non-convex methods (this lecture)
  - Global landscape
  - Spectral methods
  - (Projected) gradient descent

3 Why nonconvex?
Consider completing an $n \times n$ matrix $M$ of rank $r$:
$$\operatorname*{minimize}_{X}\ \|\mathcal{P}_\Omega(X - M)\|_F^2 \quad \text{s.t.} \quad \mathrm{rank}(X) \le r,$$
where $r \ll n$.
- The size of the observation set $\Omega$ is about $nr\,\mathrm{polylog}\,n$;
- the degrees of freedom of $X$ is about $nr$.
Question: can we develop algorithms whose computational and memory complexity are nearly linear in $n$? This means we don't even want to store the matrix $X$, which requires $n^2$ storage. A nonconvex approach stores and updates a low-dimensional representation of $X$ throughout the execution of the algorithm.

4 Convex vs. nonconvex
$\operatorname*{minimize}_x\ f(x)$
(Figure: illustration of a convex landscape vs. a nonconvex landscape.)

5 Prelude: low-rank matrix approximation
(an optimization perspective)

6 Low-rank matrix approximation / PCA
Given $M \in \mathbb{R}^{n\times n}$ (not necessarily low-rank), solve the low-rank approximation problem (best rank-$r$ approximation):
$$\hat{M} = \operatorname*{argmin}_{X}\ \|X - M\|_F^2 \quad \text{s.t.} \quad \mathrm{rank}(X) \le r.$$
This is a nonconvex optimization problem, yet it is tractable. The solution is given by the Eckart-Young theorem: denote the SVD of $M = \sum_{i=1}^n \sigma_i u_i v_i^\top$, where the $\sigma_i$'s are in descending order; then
$$\hat{M} = \sum_{i=1}^r \sigma_i u_i v_i^\top.$$
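
The Eckart-Young solution can be computed directly from the SVD. A minimal numpy sketch (the matrix size and rank below are illustrative):

```python
import numpy as np

def best_rank_r(M, r):
    """Best rank-r approximation of M in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # singular values in descending order
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# illustrative check: the residual equals the energy in the trailing singular values
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
r = 5
M_hat = best_rank_r(M, r)
s = np.linalg.svd(M, compute_uv=False)
assert np.isclose(np.linalg.norm(M_hat - M, "fro") ** 2, np.sum(s[r:] ** 2))
```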

7 Optimization viewpoint
Let us factorize $X = UV^\top$, where $U, V \in \mathbb{R}^{n\times r}$. Our problem is equivalent to
$$\operatorname*{minimize}_{U,V}\ f(U,V) := \|UV^\top - M\|_F^2.$$
- The sizes of $U, V$ are $O(nr)$, much smaller than that of $X$;
- identifiability issues: for any orthonormal $R \in \mathbb{R}^{r\times r}$ and any $\alpha \ne 0$, we have $UV^\top = (\alpha UR)(\alpha^{-1}VR)^\top$. If $(U, V)$ is a global minimizer, so is $(\alpha UR, \alpha^{-1}VR)$.
Question: what does $f(U, V)$ look like (its landscape)? (We already know its global minima.)
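
A quick numerical sanity check of the identifiability issue (the sizes and scaling below are illustrative): the product $UV^\top$, and hence the loss, is invariant under an orthonormal rotation $R$ and a scaling $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 20, 3
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))

# random orthonormal R (QR of a Gaussian matrix) and an arbitrary scaling alpha
R, _ = np.linalg.qr(rng.standard_normal((r, r)))
alpha = 2.5

X1 = U @ V.T
X2 = (alpha * U @ R) @ (V @ R / alpha).T  # (alpha U R)(alpha^{-1} V R)^T
print(np.allclose(X1, X2))               # True: same X, hence the same loss f(U, V)
```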

8 The PSD case
For simplicity, consider the PSD case. Let $M$ be PSD, so that $M = \sum_{i=1}^n \sigma_i u_i u_i^\top$. Let $X = UU^\top$, where $U \in \mathbb{R}^{n\times r}$. We are interested in the landscape of
$$f(U) := \tfrac{1}{4}\|UU^\top - M\|_F^2.$$
Identifiability: for any orthonormal $R \in \mathbb{R}^{r\times r}$, we have $UU^\top = (UR)(UR)^\top$.
To make the exposition even simpler, set $r = 1$:
$$f(u) = \tfrac{1}{4}\|uu^\top - M\|_F^2.$$

9 Good news: benign landscape
Take $f(u) = \frac{1}{4}\left\|uu^\top - \begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}\right\|_F^2$.
Global optima: $u = \pm\begin{bmatrix}1\\1\end{bmatrix}$; strict saddle: $u = \begin{bmatrix}0\\0\end{bmatrix}$. No spurious local minima.

10 Critical points
Definition 7.1. A first-order critical point (stationary point) $u$ satisfies $\nabla f(u) = 0$.
Figure credit: Li et al.

11 Critical points of $f(u)$
Any critical point $u \in \mathbb{R}^n$ satisfies
$$\nabla f(u) = (uu^\top - M)u = 0 \quad\Longleftrightarrow\quad Mu = \|u\|_2^2\, u,$$
i.e., either $u$ aligns with an eigenvector of $M$, or $u = 0$.

12 Critical points of $f(u)$ (cont.)
Since $Mu_i = \sigma_i u_i$, the set of nonzero critical points is given by $\{\pm\sqrt{\sigma_i}\, u_i,\ i = 1,\dots,n\}$.

13 Categorization of critical points
Need to examine the Hessian:
$$\nabla^2 f(u) := 2uu^\top + \|u\|_2^2 I - M.$$
Plug in the nonzero critical points $\tilde u_k := \sqrt{\sigma_k}\, u_k$:
$$\nabla^2 f(\tilde u_k) = 2\sigma_k u_k u_k^\top + \sigma_k I - M = 2\sigma_k u_k u_k^\top + \sigma_k \sum_{i=1}^n u_i u_i^\top - \sum_{i=1}^n \sigma_i u_i u_i^\top = \sum_{i\ne k}(\sigma_k - \sigma_i)\, u_i u_i^\top + 2\sigma_k u_k u_k^\top.$$
Assume $\sigma_1 > \sigma_2 > \dots > \sigma_n > 0$:
- $k = 1$: $\nabla^2 f(\tilde u_1) \succ 0$, local minimum;
- $1 < k \le n$: $\lambda_{\min}(\nabla^2 f(\tilde u_k)) < 0$, $\lambda_{\max}(\nabla^2 f(\tilde u_k)) > 0$, strict saddle;
- $u = 0$: $\nabla^2 f(0) \prec 0$, local maximum.

14 Categorization of critical points (cont.)
Now relax the assumption to $\sigma_1 > \sigma_2 \ge \dots \ge \sigma_n \ge 0$:
- $k = 1$: $\nabla^2 f(\tilde u_1) \succeq 0$, local minimum;
- $1 < k \le n$: $\lambda_{\min}(\nabla^2 f(\tilde u_k)) < 0$, $\lambda_{\max}(\nabla^2 f(\tilde u_k)) > 0$, strict saddle;
- $u = 0$: $\nabla^2 f(0) \preceq 0$, strict saddle (since $\lambda_{\min}(\nabla^2 f(0)) = -\sigma_1 < 0$).
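
The classification above can be verified numerically for a small PSD matrix: the sketch below (dimension and spectrum chosen for illustration) evaluates the Hessian $\nabla^2 f(u) = 2uu^\top + \|u\|_2^2 I - M$ at each candidate critical point $\sqrt{\sigma_k}\,u_k$ and at $0$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
# a PSD matrix with distinct positive eigenvalues (illustrative spectrum)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
M = Q @ np.diag(sigma) @ Q.T

def hessian(u):
    return 2 * np.outer(u, u) + np.dot(u, u) * np.eye(n) - M

for k in range(n):
    u_k = np.sqrt(sigma[k]) * Q[:, k]            # critical point tilde{u}_k
    eigs = np.linalg.eigvalsh(hessian(u_k))
    kind = "local min" if eigs.min() > 0 else "strict saddle"
    print(f"k={k+1}: lambda_min={eigs.min():+.2f}, lambda_max={eigs.max():+.2f} -> {kind}")

eigs0 = np.linalg.eigvalsh(hessian(np.zeros(n)))
print("u=0:", eigs0.min(), eigs0.max())          # eigenvalues of -M: local max / strict saddle
```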

15 Summary
$$f(U) := \tfrac{1}{4}\|UU^\top - M\|_F^2, \qquad U \in \mathbb{R}^{n\times r}.$$
If $\sigma_r > \sigma_{r+1}$:
- all local minima are global: any local minimizer $U$ contains the top-$r$ eigenvectors up to an orthonormal transformation;
- strict saddle points: all stationary points other than the global optima are strict saddle points.

16 Undersampled regime
Consider linear measurements:
$$y = \mathcal{A}(M), \qquad y \in \mathbb{R}^m, \quad m \ll n^2,$$
where $M = U_0U_0^\top \in \mathbb{R}^{n\times n}$ is rank-$r$, $r \ll n$, and PSD (for simplicity). The loss function we consider:
$$f(U) := \tfrac{1}{4}\|\mathcal{A}(UU^\top - M)\|_F^2.$$
If $\mathbb{E}[\mathcal{A}^*\mathcal{A}] = \mathcal{I}$, then $\mathbb{E}[f(U)] = \tfrac{1}{4}\|UU^\top - M\|_F^2$.
Does $f(U)$ inherit the benign landscape?
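
The condition $\mathbb{E}[\mathcal{A}^*\mathcal{A}] = \mathcal{I}$ can be made concrete with a toy Gaussian ensemble. The sketch below normalizes the loss by the number of measurements $m$ so that the stated expectation holds; this normalization and all names and sizes are illustrative assumptions, not the exact setup of the slides.

```python
import numpy as np

def sensing_loss(U, M, A):
    """f(U) = 1/(4m) * sum_i <A_i, U U^T - M>^2 for measurement matrices A of shape (m, n, n).
    With i.i.d. standard Gaussian A_i, E[f(U)] = (1/4) ||U U^T - M||_F^2."""
    D = U @ U.T - M
    return np.sum(np.einsum("ijk,jk->i", A, D) ** 2) / (4 * A.shape[0])

# illustrative Monte Carlo check of the expectation
rng = np.random.default_rng(5)
n, r, m = 10, 2, 20000
U0 = rng.standard_normal((n, r))
M = U0 @ U0.T
U = rng.standard_normal((n, r))
A = rng.standard_normal((m, n, n))
print(sensing_loss(U, M, A), 0.25 * np.linalg.norm(U @ U.T - M, "fro") ** 2)
```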

17 Landscape preserving under RIP
Recall the definition of RIP:
Definition 7.2. The rank-$r$ restricted isometry constant $\delta_r$ is the smallest quantity such that
$$(1 - \delta_r)\|X\|_F^2 \le \|\mathcal{A}(X)\|_F^2 \le (1 + \delta_r)\|X\|_F^2, \qquad \forall X: \mathrm{rank}(X) \le r.$$

18 Landscape preserving under RIP (cont.)
Theorem 7.3 (Bhojanapalli et al. 2016, Ge et al. 2017). If $\mathcal{A}$ satisfies the RIP with $\delta_{2r} < \frac{1}{10}$, then $f(U)$ satisfies:
- all local minima are global: any local minimum $U$ of $f(U)$ satisfies $UU^\top = M$;
- strict saddle points: any critical point $U$ that is not a local minimum satisfies $\lambda_{\min}[\nabla^2 f(U)] \le -\tfrac{2}{5}\sigma_r$.

19 Proof of Theorem 7.3 when r = 1
Without loss of generality, assume $M = u_0u_0^\top$ and $\sigma_1 = 1$.
Step 1: check all the critical points:
$$\nabla f(u) = \sum_{i=1}^m \big\langle A_i,\ uu^\top - \underbrace{u_0u_0^\top}_{M}\big\rangle\, A_i u = 0.$$
Step 2: verify the Hessian at all the critical points:
$$\nabla^2 f(u) = \sum_{i=1}^m \Big[\big\langle A_i,\ uu^\top - u_0u_0^\top\big\rangle\, A_i + 2A_i uu^\top A_i\Big].$$

20 Proof of Theorem 7.3 when r = 1 (cont.)
Assume $u$ is first-order optimal. Consider the direction $\Delta := u - u_0$:
$$\Delta^\top \nabla^2 f(u)\,\Delta = \sum_{i=1}^m \Big[\big\langle A_i, uu^\top - u_0u_0^\top\big\rangle\big\langle A_i, \Delta\Delta^\top\big\rangle + 2\big\langle A_i, u\Delta^\top\big\rangle^2\Big] = \sum_{i=1}^m \Big[\big\langle A_i, \Delta\Delta^\top\big\rangle^2 - 3\big\langle A_i, uu^\top - u_0u_0^\top\big\rangle^2\Big],$$
where we have used the first-order optimality condition.

21 Proof of Theorem 7.3 when r = 1 (cont.)
By the RIP property:
$$\Delta^\top \nabla^2 f(u)\,\Delta = \sum_{i=1}^m \Big[\big\langle A_i, (u-u_0)(u-u_0)^\top\big\rangle^2 - 3\big\langle A_i, uu^\top - u_0u_0^\top\big\rangle^2\Big]$$
$$\le (1+\delta)\|(u-u_0)(u-u_0)^\top\|_F^2 - 3(1-\delta)\|uu^\top - u_0u_0^\top\|_F^2$$
$$\le \big[2(1+\delta) - 3(1-\delta)\big]\,\|uu^\top - u_0u_0^\top\|_F^2 = -(1-5\delta)\,\|uu^\top - u_0u_0^\top\|_F^2,$$
where we use $\|(u-u_0)(u-u_0)^\top\|_F^2 \le 2\|uu^\top - u_0u_0^\top\|_F^2$.

22 Landscape without RIP
In matrix completion, we need to regularize the loss function to promote incoherent solutions: set
$$Q(U) = \sum_{i=1}^{n}\big(\|e_i^\top U\|_2 - \alpha\big)_+^4,$$
where $\alpha$ is a regularization parameter and $z_+ = \max\{z, 0\}$.

23 Landscape without RIP (cont.)
Consider the regularized loss function
$$f(U) = \frac{1}{p}\|\mathcal{P}_\Omega(UU^\top - M)\|_F^2 + \lambda\, Q(U),$$
where $\lambda$ is a regularization parameter. Adding $Q(U)$ does not affect the global optimizer if $\alpha$ is set properly.
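
As a concrete illustration, a small numpy sketch of the regularized loss above; the observed set $\Omega$ is encoded as a boolean mask, and the values of $\alpha$ and $\lambda$ passed in are placeholders, not the theoretically prescribed choices.

```python
import numpy as np

def regularized_loss(U, M, mask, p, alpha, lam):
    """(1/p)||P_Omega(U U^T - M)||_F^2 + lam * Q(U), where
    Q(U) = sum_i (||e_i^T U||_2 - alpha)_+^4 promotes incoherent rows."""
    residual = mask * (U @ U.T - M)            # P_Omega(U U^T - M)
    data_fit = np.sum(residual ** 2) / p
    row_norms = np.linalg.norm(U, axis=1)      # ||e_i^T U||_2 for each row i
    Q = np.sum(np.maximum(row_norms - alpha, 0.0) ** 4)
    return data_fit + lam * Q
```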

24 MC doesn't have spurious local minima
Theorem 7.4 (Ge et al., 2016). If $p \gtrsim \frac{\mu^4 r^6\log n}{n}$, $\alpha^2 = \Theta\!\big(\frac{\mu r\sigma_1}{n}\big)$ and $\lambda = \Theta\!\big(\frac{n}{\mu r}\big)$, then with probability at least $1 - n^{-1}$:
- all local minima are global: any local minimum $U$ of $f(U)$ satisfies $UU^\top = M$;
- critical points that are not local minima are strict saddle points.
Saddle-point escaping algorithms can be used to guarantee convergence to local minima, which in this problem are global minima. Constructing saddle-point escaping algorithms is an active research area: (perturbed) gradient descent, trust-region methods, etc.; a toy sketch of the perturbation idea follows below.
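
The sketch below is a deliberately simplified caricature, in the spirit of perturbed gradient descent [Jin et al.]: when the gradient is nearly zero (a possible saddle), inject a small random perturbation. All thresholds and step sizes are illustrative assumptions, not the constants from the analysis.

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.01, g_thresh=1e-3, radius=1e-2, T=1000, seed=0):
    """Gradient descent that adds a small random perturbation whenever the gradient
    is nearly zero, to help escape strict saddle points (simplified sketch)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(T):
        g = grad(x)
        if np.linalg.norm(g) < g_thresh:      # possibly near a saddle point
            x = x + radius * rng.standard_normal(x.shape)
        else:
            x = x - eta * g
    return x
```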

25 Spectral methods: a one-shot approach

26 Setup
Consider $M \in \mathbb{R}^{n\times n}$ (square case for simplicity), $\mathrm{rank}(M) = r \ll n$. The thin singular value decomposition (SVD) of $M$:
$$M = \underbrace{U\Sigma V^\top}_{(2n-r)r \text{ degrees of freedom}} = \sum_{i=1}^r \sigma_i u_i v_i^\top,$$
where $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r)$ contains all singular values $\{\sigma_i\}$; $U := [u_1, \dots, u_r]$ and $V := [v_1, \dots, v_r]$ consist of the singular vectors.

27 Signal + noise
Each $(i, j) \in \Omega$ independently with probability $p$. One can write the observation $\mathcal{P}_\Omega(M)$ as
$$\frac{1}{p}\mathcal{P}_\Omega(M) = \underbrace{M}_{\text{signal}} + \underbrace{\frac{1}{p}\mathcal{P}_\Omega(M) - M}_{\text{noise}}.$$
The noise has mean zero: $\mathbb{E}\big[\frac{1}{p}\mathcal{P}_\Omega(M)\big] = M$.

28 Low-rank denoising
$$\frac{1}{p}\mathcal{P}_\Omega(M) = \underbrace{M}_{\text{low-rank signal}} + \underbrace{\frac{1}{p}\mathcal{P}_\Omega(M) - M}_{:=E\ \text{(zero-mean noise)}}.$$
Algorithm 7.1 (Spectral method): $\hat{M} \leftarrow$ best rank-$r$ approximation of $\frac{1}{p}\mathcal{P}_\Omega(M)$.
The spectral method can be computed via power methods or Lanczos methods, and we never need to form the matrix $\frac{1}{p}\mathcal{P}_\Omega(M)$ densely.
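
A minimal sketch of Algorithm 7.1 using a sparse partial SVD (ARPACK/Lanczos-type iterations), so the rescaled observation matrix is kept sparse throughout; the problem sizes below are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def spectral_estimate(rows, cols, vals, n, r, p):
    """Best rank-r approximation of (1/p) * P_Omega(M), given the observed entries."""
    Y = sp.coo_matrix((vals / p, (rows, cols)), shape=(n, n)).tocsr()
    U, s, Vt = svds(Y, k=r)          # partial SVD of the sparse matrix
    return (U * s) @ Vt              # rank-r estimate M_hat

# illustrative use: a random rank-3 matrix observed with probability p
rng = np.random.default_rng(3)
n, r, p = 200, 3, 0.2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = rng.random((n, n)) < p
rows, cols = np.nonzero(mask)
M_hat = spectral_estimate(rows, cols, M[rows, cols], n, r, p)
print(np.linalg.norm(M_hat - M, "fro") / np.linalg.norm(M, "fro"))
```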

29 Histograms of singular values of $\mathcal{P}_\Omega(M)$
(Figure: histogram of the singular values of $\mathcal{P}_\Omega(M)$ for a random rank-3 matrix $M$. Fig. credit: Keshavan, Montanari, Oh '10.)

30 Performance of spectral methods
Theorem 7.5 (Keshavan, Montanari, Oh '10). Suppose the number of observed entries $m$ obeys $m \gtrsim n\log n$. Then
$$\frac{\|\hat{M} - M\|_F}{\|M\|_F} \lesssim \underbrace{\frac{\max_{i,j}|M_{i,j}|}{\frac{1}{n}\|M\|_F}}_{:=\nu}\ \sqrt{\frac{nr\log^2 n}{m}}.$$
- $\nu$ reflects whether the energy of $M$ is spread out across entries (e.g., $|M_{i,j}| \lesssim \mu r/n$ for incoherent matrices, up to scaling);
- when $m \gtrsim \nu^2\, nr\log^2 n$, the estimate $\hat{M}$ is very close to the truth;¹
- degrees of freedom $\asymp nr$: nearly-optimal sample complexity for incoherent matrices.
¹ The logarithmic factor can be improved.

31 Perturbation bounds
To ensure $\hat{M}$ is a good estimate, it suffices to control the noise $E$.
Lemma 7.6. Suppose $\mathrm{rank}(M) = r$. For any perturbation $E$,
$$\|\mathcal{P}_r(M+E) - M\| \le 2\|E\|, \qquad \|\mathcal{P}_r(M+E) - M\|_F \le 2\sqrt{2r}\,\|E\|,$$
where $\mathcal{P}_r(X)$ is the best rank-$r$ approximation of $X$.

32 Primer on matrix perturbation theory
Lemma 7.7 (Weyl's inequality, 1912). Let $M, E$ be $n\times n$ matrices. Then
$$|\sigma_k(M+E) - \sigma_k(M)| \le \|E\|, \qquad k = 1, \dots, n.$$
Proof: invoke the Courant-Fischer minimax characterization:
$$\sigma_k(A) = \max_{\dim(S)=k}\ \min_{0\ne v\in S}\ \frac{\|Av\|_2}{\|v\|_2}.$$
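
A quick numerical check of Weyl's inequality on random matrices (sizes and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
M = rng.standard_normal((n, n))
E = 0.1 * rng.standard_normal((n, n))

sM = np.linalg.svd(M, compute_uv=False)        # sigma_k(M), descending
sME = np.linalg.svd(M + E, compute_uv=False)   # sigma_k(M + E), descending
spec_norm_E = np.linalg.norm(E, 2)             # operator norm ||E||

print(np.max(np.abs(sME - sM)) <= spec_norm_E)  # True: |sigma_k(M+E) - sigma_k(M)| <= ||E||
```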

33 Proof of Lemma 7.6
By matrix perturbation theory,
$$\|\mathcal{P}_r(M+E) - M\| \overset{\text{triangle inequality}}{\le} \|\mathcal{P}_r(M+E) - (M+E)\| + \|(M+E) - M\| = \sigma_{r+1}(M+E) + \|E\| \overset{\text{Weyl's inequality}}{\le} \underbrace{\sigma_{r+1}(M)}_{=0} + \|E\| + \|E\| = 2\|E\|.$$
The second inequality of Lemma 7.6 follows since both $\mathcal{P}_r(M+E)$ and $M$ are rank-$r$, so $\mathcal{P}_r(M+E) - M$ has rank at most $2r$, and hence
$$\|\mathcal{P}_r(M+E) - M\|_F \le \sqrt{2r}\,\|\mathcal{P}_r(M+E) - M\|.$$

34 Controlling the noise
Recall that the entries of $E = \frac{1}{p}\mathcal{P}_\Omega(M) - M$ are zero-mean and independent. A bit of random matrix theory...
Lemma 7.8 (Chapter 2.3, Tao '12). Suppose $X \in \mathbb{R}^{n\times n}$ is a random symmetric matrix obeying:
- $\{X_{i,j}: i < j\}$ are independent;
- $\mathbb{E}[X_{i,j}] = 0$ and $\mathrm{Var}[X_{i,j}] \le 1$;
- $\max_{i,j}|X_{i,j}| \le \sqrt{n}$.
Then $\|X\| \lesssim \sqrt{n\log n}$ with high probability.

35 Proof of Theorem 7.5
Look at the rescaled zero-mean matrix $\tilde{E} = \frac{\sqrt{p}}{\sqrt{\mu/n}}\,E$. Then
$$\mathrm{Var}[\tilde{E}_{i,j}] = p(1-p)\left(\frac{\sqrt{p}}{\sqrt{\mu/n}}\cdot\frac{M_{i,j}}{p}\right)^2 \le \left(\frac{M_{i,j}}{\sqrt{\mu/n}}\right)^2 \le 1, \qquad |\tilde{E}_{i,j}| \le \frac{|M_{i,j}|}{\sqrt{p\mu/n}} \le \frac{1}{\sqrt{p}},$$
where we have used the incoherence bound $|M_{i,j}| = |e_i^\top U\Sigma V^\top e_j| \le \|U^\top e_i\|\,\sigma_1\,\|V^\top e_j\|$.
Lemma 7.8 tells us that if $p \gtrsim \frac{\log n}{n}$, then (by our assumptions)
$$\|\tilde{E}\| \lesssim \sqrt{n\log n} \quad\Longrightarrow\quad \|E\| \lesssim \sqrt{\frac{\mu}{pn}}\cdot\sqrt{n\log n} = \sqrt{\frac{\mu\log n}{p}}.$$
This, together with Lemma 7.6 and the fact that $m \asymp pn^2$, establishes Theorem 7.5.

36 Gradient methods: iterative refinements

37 Iterative methods: an overview
$$\operatorname*{minimize}_{U,V}\ f(U,V) := \|\mathcal{P}_\Omega(UV^\top - M)\|_F^2.$$
Gradient descent (our focus):
$$U_{t+1} = \mathcal{P}_{\mathcal{U}}\big[U_t - \eta_t\nabla_U f(U_t, V_t)\big], \qquad V_{t+1} = \mathcal{P}_{\mathcal{V}}\big[V_t - \eta_t\nabla_V f(U_t, V_t)\big],$$
where $\eta_t$ is the step size and $\mathcal{P}_{\mathcal{U}}, \mathcal{P}_{\mathcal{V}}$ denote Euclidean projections onto some constraint sets.
Alternating minimization: optimize over $U$ and $V$ alternately while fixing the other, each a convex problem (see the sketch below):
$$U_{t+1} = \operatorname*{argmin}_U f(U, V_t), \qquad V_{t+1} = \operatorname*{argmin}_V f(U_{t+1}, V).$$
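
A minimal sketch of the alternating-minimization idea for matrix completion, without projection steps: each row of $U$ (resp. $V$) is updated by a small least-squares solve over its observed entries. The function name, the tiny ridge term for numerical stability, and all sizes are illustrative assumptions.

```python
import numpy as np

def altmin_mc(M_obs, mask, r, n_iters=50, reg=1e-6):
    """Alternating least squares for P_Omega(U V^T) ~ P_Omega(M_obs)."""
    n1, n2 = M_obs.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n1, r))
    V = rng.standard_normal((n2, r))
    for _ in range(n_iters):
        for j in range(n1):                      # update row j of U, fixing V
            obs = mask[j, :]
            Vj = V[obs, :]
            U[j, :] = np.linalg.solve(Vj.T @ Vj + reg * np.eye(r), Vj.T @ M_obs[j, obs])
        for k in range(n2):                      # update row k of V, fixing U
            obs = mask[:, k]
            Uk = U[obs, :]
            V[k, :] = np.linalg.solve(Uk.T @ Uk + reg * np.eye(r), Uk.T @ M_obs[obs, k])
    return U, V
```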

38 Gradient descent for matrix completion
$$\operatorname*{minimize}_{X\in\mathbb{R}^{n\times r}}\ f(X) = \sum_{(j,k)\in\Omega}\big(e_j^\top XX^\top e_k - M_{j,k}\big)^2$$
Algorithm 7.2 (Gradient descent for MC)
Input: $Y = [Y_{j,k}]_{1\le j,k\le n}$, $r$, $p$.
Spectral initialization: let $U^0\Sigma^0(U^0)^\top$ be the rank-$r$ eigendecomposition of $M^0 := \frac{1}{p}\mathcal{P}_\Omega(Y)$, and set $X^0 = U^0(\Sigma^0)^{1/2}$.
Gradient updates: for $t = 0, 1, 2, \dots, T-1$ do
$$X^{t+1} = X^t - \eta_t\nabla f(X^t).$$
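
A minimal numpy sketch of Algorithm 7.2 for the PSD case (no projection or regularization); the step size and any sizes used with it are illustrative, not the theoretically prescribed constants.

```python
import numpy as np

def gd_matrix_completion(Y, mask, r, p, eta, T):
    """Vanilla GD on f(X) = sum_{(j,k) in Omega} (e_j^T X X^T e_k - Y_{j,k})^2,
    with spectral initialization (PSD case)."""
    # spectral initialization: rank-r eigendecomposition of (1/p) P_Omega(Y)
    M0 = (mask * Y) / p
    M0 = (M0 + M0.T) / 2                                  # symmetrize before eigendecomposition
    w, Q = np.linalg.eigh(M0)                             # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:r]                         # indices of the top-r eigenvalues
    X = Q[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))      # X0 = U0 (Sigma0)^{1/2}
    # gradient updates
    for _ in range(T):
        residual = mask * (X @ X.T - Y)                   # P_Omega(X X^T - Y)
        grad = 2 * (residual + residual.T) @ X            # gradient of f(X)
        X = X - eta * grad
    return X
```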

39 Gradient descent for matrix completion
Define the optimal transform from the $t$-th iterate $X^t$ to $X^\star$ as
$$Q^t := \operatorname*{argmin}_{R\in\mathcal{O}^{r\times r}}\ \|X^tR - X^\star\|_F.$$
Theorem 7.9 (Ma et al., 2017). Suppose $M = X^\star X^{\star\top}$ is rank-$r$, incoherent and well-conditioned. Vanilla GD (with spectral initialization) achieves
$$\|X^tQ^t - X^\star\|_F \lesssim \rho^t\,\mu r\frac{1}{\sqrt{np}}\,\|X^\star\|_F, \qquad \|X^tQ^t - X^\star\| \lesssim \rho^t\,\mu r\frac{1}{\sqrt{np}}\,\|X^\star\| \quad \text{(spectral)},$$
$$\|X^tQ^t - X^\star\|_{2,\infty} \lesssim \rho^t\,\mu r\sqrt{\frac{\log n}{np}}\,\|X^\star\|_{2,\infty} \quad \text{(incoherence)},$$
where $\rho = 1 - \frac{\sigma_{\min}\eta}{5} < 1$, provided the step size satisfies $\eta \asymp 1/\sigma_{\max}$ and the sample complexity satisfies $n^2p \gtrsim \mu^3 nr^3\log^3 n$.

40 Gradient descent for matrix completion (cont.)
Theorem 7.9 implies linear convergence of $X^tX^{t\top} \to M$ in the Frobenius, spectral, and infinity norms.

41 Numerical evidence for noiseless data
(Figure 7.1: relative error of $X^tX^{t\top}$, measured in $\|\cdot\|_F$, $\|\cdot\|$, and $\|\cdot\|_\infty$, vs. iteration count for matrix completion, where $n = 1000$, $r = 10$, $p = 0.1$, with a constant step size $\eta_t$.)

42 Numerical evidence for noisy data
Set $\mathrm{SNR} := \frac{\|M\|_F^2}{n^2\sigma^2}$.
(Figure 7.2: squared relative error of the estimate $\hat{X}$, measured in $\|\cdot\|_F$, $\|\cdot\|$, and $\|\cdot\|_{2,\infty}$, and of $\hat{M} = \hat{X}\hat{X}^\top$, measured in $\|\cdot\|_\infty$, vs. SNR, where $n = 500$, $r = 10$, $p = 0.1$, with a constant step size $\eta_t$.)

43 Restricted strong convexity and smoothness
Lemma 7.10 (Restricted strong convexity and smoothness). Suppose that $n^2p \ge C\kappa^2\mu rn\log n$ for some large constant $C > 0$. Then with high probability, the Hessian $\nabla^2 f(X)$ obeys
$$\mathrm{vec}(V)^\top\,\nabla^2 f(X)\,\mathrm{vec}(V) \ge \frac{\sigma_{\min}}{2}\|V\|_F^2 \quad \text{(restricted strong convexity)}, \qquad \|\nabla^2 f(X)\| \le \frac{5}{2}\sigma_{\max} \quad \text{(smoothness)}$$
for all $X$ and $V = YH_Y - Z$, $H_Y := \operatorname*{argmin}_{R\in\mathcal{O}^{r\times r}}\|YR - Z\|_F$, satisfying
$$\|X - X^\star\|_{2,\infty} \le \epsilon\,\|X^\star\|_{2,\infty} \quad \text{(incoherence region)}, \qquad \|Z - X^\star\| \le \delta\,\|X^\star\|,$$
where $\epsilon \lesssim 1/\sqrt{\kappa^3\mu r\log^2 n}$ and $\delta \lesssim 1/\kappa$.

44 Linear convergence induction I
Given the definition of $Q^{t+1}$, we have
$$\|X^{t+1}Q^{t+1} - X^\star\|_F \le \|X^{t+1}Q^t - X^\star\|_F \overset{\text{(i)}}{=} \big\|\big[X^t - \eta\nabla f(X^t)\big]Q^t - X^\star\big\|_F \overset{\text{(ii)}}{=} \big\|X^tQ^t - \eta\nabla f(X^tQ^t) - X^\star\big\|_F \overset{\text{(iii)}}{=} \underbrace{\big\|X^tQ^t - \eta\nabla f(X^tQ^t) - \big(X^\star - \eta\nabla f(X^\star)\big)\big\|_F}_{:=\alpha},$$
where (i) follows from the GD update rule, (ii) follows from the identity $\nabla f(X^tR) = \nabla f(X^t)\,R$ for any $R\in\mathcal{O}^{r\times r}$, and (iii) follows from $\nabla f(X^\star) = 0$.

45 Linear convergence induction II
The fundamental theorem of calculus reveals
$$\mathrm{vec}\big[X^tQ^t - \eta\nabla f(X^tQ^t) - \big(X^\star - \eta\nabla f(X^\star)\big)\big] = \mathrm{vec}\big[X^tQ^t - X^\star\big] - \eta\,\mathrm{vec}\big[\nabla f(X^tQ^t) - \nabla f(X^\star)\big] = \Big(I_{nr} - \eta\underbrace{\int_0^1\nabla^2 f(X(\tau))\,d\tau}_{:=A}\Big)\,\mathrm{vec}\big(X^tQ^t - X^\star\big), \tag{7.1}$$
where we denote $X(\tau) := X^\star + \tau(X^tQ^t - X^\star)$. Taking the squared Euclidean norm of both sides of (7.1) leads to
$$\alpha^2 = \mathrm{vec}\big(X^tQ^t - X^\star\big)^\top(I_{nr} - \eta A)^2\,\mathrm{vec}\big(X^tQ^t - X^\star\big) \le \|X^tQ^t - X^\star\|_F^2 + \eta^2\|A\|^2\|X^tQ^t - X^\star\|_F^2 - 2\eta\,\mathrm{vec}\big(X^tQ^t - X^\star\big)^\top A\,\mathrm{vec}\big(X^tQ^t - X^\star\big). \tag{7.2}$$

46 Linear convergence induction III
Based on the incoherence of $X^\star$ and $X^t$, for all $\tau\in[0,1]$,
$$\|X(\tau) - X^\star\|_{2,\infty} \le \underbrace{\|X^tQ^t - X^\star\|_{2,\infty} \le C\mu r\sqrt{\frac{\log n}{np}}\,\|X^\star\|_{2,\infty}}_{\text{incoherence hypothesis}}.$$
Taking $X = X(\tau)$, $Y = X^t$ and $Z = X^\star$ in Lemma 7.10, one can easily verify the assumptions therein given $n^2p \gtrsim \kappa^3\mu^3r^3n\log^3 n$. Hence,
$$\mathrm{vec}\big(X^tQ^t - X^\star\big)^\top A\,\mathrm{vec}\big(X^tQ^t - X^\star\big) \ge \frac{\sigma_{\min}}{2}\|X^tQ^t - X^\star\|_F^2 \qquad \text{and} \qquad \|A\| \le \frac{5}{2}\sigma_{\max}.$$

47 Linear convergence induction IV
Substituting these two inequalities into (7.2) yields
$$\alpha^2 \le \Big(1 + \frac{25}{4}\eta^2\sigma_{\max}^2 - \sigma_{\min}\eta\Big)\|X^tQ^t - X^\star\|_F^2 \le \Big(1 - \frac{\sigma_{\min}}{2}\eta\Big)\|X^tQ^t - X^\star\|_F^2$$
as long as $0 < \eta \le (2\sigma_{\min})/(25\sigma_{\max}^2)$, which further implies that
$$\alpha \le \Big(1 - \frac{\sigma_{\min}}{4}\eta\Big)\|X^tQ^t - X^\star\|_F.$$
The incoherence hypothesis is important for fast convergence: the fact that $X^t$ stays incoherent throughout the execution is called implicit regularization, and can be established by a leave-one-out analysis trick [Ma et al., 2017].

48 References
[1] R. Sun and T. Luo, "Guaranteed matrix completion via non-convex factorization," IEEE Transactions on Information Theory, 2016.
[2] C. Davis and W. Kahan, "The rotation of eigenvectors by a perturbation," SIAM Journal on Numerical Analysis, 1970.
[3] R. Keshavan, A. Montanari, and S. Oh, "Matrix completion from a few entries," IEEE Transactions on Information Theory, 2010.
[4] Y. Chen and M. Wainwright, "Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees," arXiv preprint, 2015.
[5] C. Ma, K. Wang, Y. Chi, and Y. Chen, "Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution," arXiv preprint, 2017.

49 References (cont.)
[6] R. Ge, C. Jin, and Y. Zheng, "No spurious local minima in nonconvex low rank problems: A unified geometric analysis," ICML, 2017.
[7] X. Li et al., "Symmetry, saddle points, and global optimization landscape of nonconvex matrix factorization," arXiv preprint, 2016.
[8] T. Tao, "Topics in random matrix theory," American Mathematical Society, 2012.
[9] Y. Chen and Y. Chi, "Harnessing structures in big data via guaranteed low-rank matrix estimation," arXiv preprint, 2018.
[10] C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, "How to escape saddle points efficiently," arXiv preprint, 2017.
