ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
Low-rank matrix recovery via nonconvex optimization

Yuejie Chi, Department of Electrical and Computer Engineering, Spring 2018
Outline

- Low-rank matrix completion and recovery (last lecture): nuclear norm minimization; RIP and low-rank matrix recovery; matrix completion; algorithms for nuclear norm minimization
- Nonconvex methods (this lecture): global landscape; spectral methods; (projected) gradient descent
Why nonconvex?

Consider completing an n × n matrix M with rank r:

    minimize_X ‖P_Ω(X − M)‖_F²  s.t. rank(X) ≤ r,

where r ≪ n.

- The size of the observation set Ω is about nr · polylog(n);
- the degrees of freedom in X are about nr.

Question: can we develop algorithms whose computational and memory complexity are nearly linear in n? In particular, we don't even want to store the matrix X, which takes n² storage. A nonconvex approach stores and updates a low-dimensional representation of X throughout the execution of the algorithm.
Convex vs. nonconvex

    minimize_x f(x)

[Figure: a convex landscape with a single basin vs. a nonconvex landscape with multiple valleys and saddles.]
Prelude: low-rank matrix approximation, an optimization perspective
Low-rank matrix approximation / PCA

Given M ∈ R^{n×n} (not necessarily low-rank), solve the best rank-r approximation problem:

    M_r = argmin_X ‖X − M‖_F²  s.t. rank(X) ≤ r.

This is a nonconvex optimization problem, yet tractable: the solution is given by the Eckart–Young theorem. Write the SVD of M as M = Σ_{i=1}^n σ_i u_i v_i^T, with the σ_i in descending order; then

    M_r = Σ_{i=1}^r σ_i u_i v_i^T.
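The Eckart–Young theorem is easy to verify numerically. A minimal NumPy sketch (the random test matrix `M` is an arbitrary stand-in, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3

# A generic (full-rank) matrix and its SVD.
M = rng.standard_normal((n, n))
U, s, Vt = np.linalg.svd(M)

# Best rank-r approximation: keep the top-r singular triplets.
M_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Eckart-Young: the optimal Frobenius error equals the energy
# in the discarded singular values.
err = np.linalg.norm(M - M_r, "fro")
opt = np.sqrt(np.sum(s[r:] ** 2))
```

Here `err` and `opt` agree to machine precision; no rank-r matrix can do better in Frobenius norm.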
Optimization viewpoint

Let us factorize X = UV^T, where U, V ∈ R^{n×r}. Our problem is equivalent to

    minimize_{U,V} f(U, V) := ‖UV^T − M‖_F².

- The sizes of U and V are O(nr), much smaller than the n² size of X.
- Identifiability issues: for any orthonormal R ∈ R^{r×r} and any α ≠ 0, we have UV^T = (αUR)(α^{-1}VR)^T. Hence if (U, V) is a global minimizer, so is (αUR, α^{-1}VR).

Question: what does the landscape of f(U, V) look like? (We already know its global minima.)
The PSD case

For simplicity, consider the PSD case: let M be PSD, so that M = Σ_{i=1}^n σ_i u_i u_i^T, and let X = UU^T with U ∈ R^{n×r}. We are interested in the landscape of

    f(U) := (1/4)‖UU^T − M‖_F².

Identifiability: for any orthonormal R ∈ R^{r×r}, UU^T = (UR)(UR)^T. To make the exposition even simpler, set r = 1:

    f(u) = (1/4)‖uu^T − M‖_F².
Take f(x) = (1/4)‖xx^T − 11^T‖_F², i.e., M = [1 1; 1 1]. Good news: benign landscape. Global optima: x = ±[1, 1]^T; strict saddle: x = [0, 0]^T. No spurious local minima.
Critical points

Definition 7.1. A first-order critical point (stationary point) u satisfies ∇f(u) = 0.

(Figure credit: Li et al.)
Critical points of f(u)

Any critical point u ∈ R^n satisfies

    ∇f(u) = (uu^T − M)u = 0  ⟺  Mu = ‖u‖₂² u,

so either u aligns with an eigenvector of M, or u = 0. Since Mu_i = σ_i u_i, the set of critical points is

    {0} ∪ {±√σ_i · u_i, i = 1, ..., n}.
Categorization of critical points

We need to examine the Hessian:

    ∇²f(u) = 2uu^T + ‖u‖₂² I − M.

Plugging in the non-zero critical points ũ_k := √σ_k · u_k:

    ∇²f(ũ_k) = 2σ_k u_k u_k^T + σ_k I − M
             = 2σ_k u_k u_k^T + σ_k Σ_{i=1}^n u_i u_i^T − Σ_{i=1}^n σ_i u_i u_i^T
             = Σ_{i≠k} (σ_k − σ_i) u_i u_i^T + 2σ_k u_k u_k^T.

Assume first that σ_1 > σ_2 > ... > σ_n > 0:
- k = 1: ∇²f(ũ_1) ≻ 0, a local minimum;
- 1 < k ≤ n: λ_min(∇²f(ũ_k)) < 0 and λ_max(∇²f(ũ_k)) > 0, a strict saddle;
- u = 0: ∇²f(0) = −M ≺ 0, a local maximum.

Under the weaker assumption σ_1 > σ_2 ≥ ... ≥ σ_n ≥ 0, the conclusions are the same except that ∇²f(ũ_1) ⪰ 0, and u = 0 becomes a strict saddle, since λ_min(∇²f(0)) = −σ_1 < 0.
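This classification can be confirmed numerically by evaluating the gradient and the Hessian eigenvalues at the critical points (a sketch with a synthetic PSD `M` whose eigenvalues are chosen by hand; not code from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# PSD matrix with distinct positive eigenvalues sigma_1 > ... > sigma_n > 0.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.array([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])
M = Q @ np.diag(sigma) @ Q.T

def grad(u):
    # Gradient of f(u) = (1/4) ||uu^T - M||_F^2.
    return (np.outer(u, u) - M) @ u

def hess(u):
    # Hessian: 2 uu^T + ||u||^2 I - M.
    return 2 * np.outer(u, u) + np.dot(u, u) * np.eye(n) - M

u1 = np.sqrt(sigma[0]) * Q[:, 0]    # sqrt(sigma_1) u_1: global minimum
u2 = np.sqrt(sigma[1]) * Q[:, 1]    # sqrt(sigma_2) u_2: strict saddle

g1, g2 = np.linalg.norm(grad(u1)), np.linalg.norm(grad(u2))
ev1 = np.linalg.eigvalsh(hess(u1))  # eigenvalues in ascending order
ev2 = np.linalg.eigvalsh(hess(u2))
```

Both gradients vanish; the Hessian at √σ₁·u₁ is positive definite (its smallest eigenvalue is σ₁ − σ₂), while at √σ₂·u₂ it has eigenvalues of both signs.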
Summary

For f(U) := (1/4)‖UU^T − M‖_F² with U ∈ R^{n×r}, if σ_r > σ_{r+1}:
- all local minima are global: any local minimizer U consists of the top-r eigenvectors (appropriately scaled) up to an orthonormal transformation;
- strict saddle points: every other stationary point is a strict saddle.
Undersampled regime

Consider linear measurements

    y = A(M) ∈ R^m,  m ≪ n²,

where M = U_0 U_0^T ∈ R^{n×n} is rank-r, r ≪ n, and PSD (for simplicity). The loss function we consider is

    f(U) := (1/4)‖A(UU^T − M)‖₂².

If E[A*A] = I, then E[f(U)] = (1/4)‖UU^T − M‖_F². Does f(U) inherit the benign landscape?
Landscape preservation under RIP

Recall the definition of the RIP:

Definition 7.2. The rank-r restricted isometry constant δ_r is the smallest quantity such that

    (1 − δ_r)‖X‖_F² ≤ ‖A(X)‖₂² ≤ (1 + δ_r)‖X‖_F²  for all X with rank(X) ≤ r.

Theorem 7.3 (Bhojanapalli et al. 2016, Ge et al. 2017). If A satisfies the RIP with δ_{2r} < 1/10, then f(U) satisfies:
- all local minima are global: any local minimum U of f(U) satisfies UU^T = M;
- strict saddle points: any critical point U that is not a local minimum satisfies λ_min(∇²f(U)) ≤ −(2/5)σ_r.
Proof of Theorem 7.3 when r = 1

Without loss of generality, assume M = u_0 u_0^T and σ_1 = 1.

Step 1: characterize the critical points:

    ∇f(u) = Σ_{i=1}^m ⟨A_i, uu^T − u_0u_0^T⟩ A_i u = 0.

Step 2: examine the Hessian at the critical points:

    ∇²f(u) = Σ_{i=1}^m [⟨A_i, uu^T − u_0u_0^T⟩ A_i + 2(A_i u)(A_i u)^T].
Proof of Theorem 7.3 when r = 1 (continued)

Assume u is first-order critical, and consider the direction Δ := u − u_0. Then

    Δ^T ∇²f(u) Δ = Σ_{i=1}^m [⟨A_i, uu^T − u_0u_0^T⟩⟨A_i, ΔΔ^T⟩ + 2⟨A_i, Δu^T⟩²]
                 = (1/2) Σ_{i=1}^m [⟨A_i, ΔΔ^T⟩² − 3⟨A_i, uu^T − u_0u_0^T⟩²],

where the second equality uses the first-order optimality condition Σ_i ⟨A_i, uu^T − u_0u_0^T⟩ A_i u = 0.
Proof of Theorem 7.3 when r = 1 (continued)

By the RIP,

    Δ^T ∇²f(u) Δ = (1/2) Σ_{i=1}^m [⟨A_i, (u − u_0)(u − u_0)^T⟩² − 3⟨A_i, uu^T − u_0u_0^T⟩²]
                 ≤ (1/2) [(1 + δ)‖(u − u_0)(u − u_0)^T‖_F² − 3(1 − δ)‖uu^T − u_0u_0^T‖_F²]
                 ≤ (1/2) [2(1 + δ) − 3(1 − δ)] ‖uu^T − u_0u_0^T‖_F²
                 = −(1/2)(1 − 5δ) ‖uu^T − u_0u_0^T‖_F²,

where we used ‖(u − u_0)(u − u_0)^T‖_F² ≤ 2‖uu^T − u_0u_0^T‖_F² (choosing the sign of u_0 so that u^T u_0 ≥ 0). When δ < 1/5, this is strictly negative unless uu^T = u_0u_0^T: every critical point other than ±u_0 has a negative-curvature direction.
Landscape without RIP

In matrix completion, we regularize the loss function to promote incoherent solutions: set

    Q(U) = Σ_{i=1}^n (‖e_i^T U‖₂ − α)₊⁴,

where α is a regularization parameter and z₊ = max{z, 0}. Consider the loss function

    f(U) = (1/p)‖P_Ω(UU^T − M)‖_F² + λ Q(U),

where λ is a regularization parameter. Adding Q(U) does not affect the global optimizer if α is set properly.
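The regularized objective is straightforward to implement; a hedged sketch (the names `mc_loss`, `alpha`, `lam` are ours, and the planted `U_star` is a synthetic example):

```python
import numpy as np

def mc_loss(U, M, mask, p, alpha, lam):
    """Regularized MC loss: data fit on observed entries plus a fourth-power
    penalty on rows of U whose norm exceeds alpha (promotes incoherence)."""
    resid = mask * (U @ U.T - M)                 # P_Omega(UU^T - M)
    fit = np.linalg.norm(resid, "fro") ** 2 / p
    row_norms = np.linalg.norm(U, axis=1)        # ||e_i^T U||_2
    reg = np.sum(np.maximum(row_norms - alpha, 0.0) ** 4)
    return fit + lam * reg

rng = np.random.default_rng(2)
n, r, p = 40, 2, 0.5
U_star = rng.standard_normal((n, r)) / np.sqrt(n)
M = U_star @ U_star.T
mask = rng.random((n, n)) < p

# With alpha above every row norm of U_star, the penalty vanishes at the
# truth, so the regularizer does not move the global optimizer.
alpha = np.max(np.linalg.norm(U_star, axis=1)) + 0.1
loss_at_truth = mc_loss(U_star, M, mask, p, alpha, lam=1.0)
loss_small_alpha = mc_loss(U_star, M, mask, p, alpha=0.0, lam=1.0)
```

With a properly chosen α the loss at the ground truth is zero; shrinking α to 0 makes the penalty bite even at the truth, illustrating why α must be set by the incoherence level.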
MC has no spurious local minima

Theorem 7.4 (Ge et al., 2016). If p ≳ µ⁴r⁶ log n / n, α² = Θ(µrσ_1/n) and λ = Θ(n/(µr)), then with probability at least 1 − n⁻¹:
- all local minima are global: any local minimum U of f(U) satisfies UU^T = M;
- all saddle points that are not local minima are strict saddles.

Saddle-point-escaping algorithms can thus be used to guarantee convergence to local minima, which in this problem are global minima. Constructing such algorithms is an active research area: (perturbed) gradient descent, trust-region methods, etc.
Spectral methods: a one-shot approach
Setup

Consider M ∈ R^{n×n} (square case for simplicity) with rank(M) = r ≪ n. The thin singular value decomposition (SVD) of M is

    M = UΣV^T = Σ_{i=1}^r σ_i u_i v_i^T,

which has (2n − r)r degrees of freedom, where Σ = diag(σ_1, ..., σ_r) contains all singular values, and U := [u_1, ..., u_r], V := [v_1, ..., v_r] consist of the singular vectors.
Signal + noise

Each entry (i, j) belongs to Ω independently with probability p. One can decompose the observation P_Ω(M) as

    (1/p) P_Ω(M) = M (signal) + [(1/p) P_Ω(M) − M] (noise).

The noise has mean zero: E[(1/p) P_Ω(M)] = M.
Low-rank denoising

    (1/p) P_Ω(M) = M (low-rank signal) + E,  where E := (1/p) P_Ω(M) − M (zero-mean noise).

Algorithm 7.1 (Spectral method): set M̂ to the best rank-r approximation of (1/p) P_Ω(M).

The rank-r approximation can be computed via power methods or Lanczos methods, which only require multiplying vectors by the sparse matrix (1/p) P_Ω(M); we never need to form it as a dense matrix.
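Algorithm 7.1 in a few lines of NumPy (a sketch; for large n one would replace the dense SVD below with a Lanczos-type partial SVD, and the random rank-r `M` is a synthetic example):

```python
import numpy as np

def spectral_estimate(Y_obs, p, r):
    """Best rank-r approximation of the rescaled observation P_Omega(M)/p."""
    U, s, Vt = np.linalg.svd(Y_obs / p)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(3)
n, r, p = 300, 3, 0.5

# Random rank-r truth; each entry observed independently with probability p.
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n)) / np.sqrt(n)
mask = rng.random((n, n)) < p

M_hat = spectral_estimate(mask * M, p, r)
rel_err = np.linalg.norm(M_hat - M, "fro") / np.linalg.norm(M, "fro")
raw_err = np.linalg.norm(mask * M / p - M, "fro") / np.linalg.norm(M, "fro")
```

Truncating to rank r removes most of the sampling noise: `rel_err` is a fraction of `raw_err`, the error of the rescaled observation itself.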
Histograms of the singular values of P_Ω(M)

[Figure: histogram of the singular values of P_Ω(M) for a random rank-3 matrix M. Fig. credit: Keshavan, Montanari, Oh '10.]
Performance of spectral methods

Theorem 7.5 (Keshavan, Montanari, Oh '10). Suppose the number of observed entries m obeys m ≳ n log n. Then

    ‖M̂ − M‖_F / ‖M‖_F ≲ ν √(nr log² n / m),  where ν := n · max_{i,j}|M_{i,j}| / ‖M‖_F.

- ν reflects whether the energy of M is spread out; for µ-incoherent matrices, |M_{i,j}| ≲ µrσ_1/n, so ν is bounded.
- When m ≳ ν² nr log² n, the estimate M̂ is very close to the truth.¹
- Since the degrees of freedom are about nr, this is a nearly-optimal sample complexity for incoherent matrices.

¹ The logarithmic factor can be improved.
Perturbation bounds

To ensure M̂ is a good estimate, it suffices to control the noise E.

Lemma 7.6. Suppose rank(M) = r. For any perturbation E,

    ‖P_r(M + E) − M‖ ≤ 2‖E‖,   ‖P_r(M + E) − M‖_F ≤ 2√(2r) ‖E‖,

where P_r(X) denotes the best rank-r approximation of X and ‖·‖ the spectral norm.
A primer on matrix perturbation theory

Lemma 7.7 (Weyl's inequality, 1912). Let M and E be n × n matrices. Then

    |σ_k(M + E) − σ_k(M)| ≤ ‖E‖,  k = 1, ..., n.

Proof: invoke the Courant–Fischer minimax characterization

    σ_k(A) = max_{S: dim(S)=k} min_{0 ≠ v ∈ S} ‖Av‖₂ / ‖v‖₂.
Proof of Lemma 7.6

By the triangle inequality and Weyl's inequality,

    ‖P_r(M + E) − M‖ ≤ ‖P_r(M + E) − (M + E)‖ + ‖(M + E) − M‖
                     = σ_{r+1}(M + E) + ‖E‖
                     ≤ σ_{r+1}(M) + ‖E‖ + ‖E‖ = 2‖E‖,

since σ_{r+1}(M) = 0. The second inequality of Lemma 7.6 follows since both P_r(M + E) and M are rank-r, so their difference has rank at most 2r, whence ‖P_r(M + E) − M‖_F ≤ √(2r) ‖P_r(M + E) − M‖.
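Both Lemma 7.6 and Weyl's inequality can be verified numerically; a sketch (the rank-r `M` and perturbation `E` below are arbitrary synthetic choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 80, 4

def best_rank_r(X, r):
    U, s, Vt = np.linalg.svd(X)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

A = rng.standard_normal((n, r))
M = A @ A.T                               # exactly rank r
E = 0.5 * rng.standard_normal((n, n))     # arbitrary perturbation

opnorm = lambda X: np.linalg.norm(X, 2)   # spectral norm

# Weyl: each singular value moves by at most ||E||.
s_pert = np.linalg.svd(M + E, compute_uv=False)
s_true = np.linalg.svd(M, compute_uv=False)
weyl_gap = np.max(np.abs(s_pert - s_true))

# Lemma 7.6: the rank-r truncation of M + E stays within 2||E|| of M
# in spectral norm, and within 2 sqrt(2r) ||E|| in Frobenius norm.
M_hat = best_rank_r(M + E, r)
spec_err = opnorm(M_hat - M)
fro_err = np.linalg.norm(M_hat - M, "fro")
```

Both bounds are deterministic, so they hold for every draw, not just on average.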
Controlling the noise

Recall that the entries of E = (1/p) P_Ω(M) − M are zero-mean and independent. A bit of random matrix theory:

Lemma 7.8 (Chapter 2.3, Tao '12). Suppose X ∈ R^{n×n} is a random symmetric matrix obeying
- {X_{i,j} : i < j} are independent,
- E[X_{i,j}] = 0 and Var[X_{i,j}] ≤ 1,
- max_{i,j} |X_{i,j}| ≤ √n.

Then ‖X‖ ≲ √(n log n).
Proof of Theorem 7.5

Consider the rescaled zero-mean matrix Ẽ := (√p / (µ/n)) E, where µ/n := max_{i,j}|M_{i,j}|; note that by incoherence,

    |M_{i,j}| = |e_i^T UΣV^T e_j| ≤ ‖U^T e_i‖₂ σ_1 ‖V^T e_j‖₂ ≲ µ_0 r σ_1 / n.

Then

    Var[Ẽ_{i,j}] = (p / (µ/n)²) · ((1 − p)/p) M_{i,j}² = (1 − p)(M_{i,j} / (µ/n))² ≤ 1,
    |Ẽ_{i,j}| ≤ (√p / (µ/n)) · |M_{i,j}| / p ≤ 1/√p ≤ √n  (when p ≥ 1/n).

Lemma 7.8 then tells us that if p ≳ log n / n,

    ‖Ẽ‖ ≲ √(n log n)  ⟹  ‖E‖ ≲ (µ / (n√p)) √(n log n) = µ √(log n / (np)).

This, together with Lemma 7.6 and the fact m ≍ pn², establishes Theorem 7.5.
Gradient methods: iterative refinements
Iterative methods: an overview

    minimize_{U,V} f(U, V) := ‖P_Ω(UV^T − M)‖_F².

Gradient descent (our focus):

    U_{t+1} = P_U[U_t − η_t ∇_U f(U_t, V_t)],
    V_{t+1} = P_V[V_t − η_t ∇_V f(U_t, V_t)],

where η_t is the step size and P_U, P_V denote Euclidean projections onto some constraint sets.

Alternating minimization: optimize over U and V alternately while fixing the other; each subproblem is convex:

    U_{t+1} = argmin_U f(U, V_t),   V_{t+1} = argmin_V f(U_{t+1}, V).
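Alternating minimization is easy to sketch for this asymmetric model: each half-step decouples into one small least-squares problem per row (a hedged sketch; the helper `als_step` and the SVD-based initialization are our choices, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(6)
n, r, p = 60, 2, 0.6

# Rank-r ground truth M = U* V*^T, entries observed w.p. p.
U_star = rng.standard_normal((n, r))
V_star = rng.standard_normal((n, r))
M = U_star @ V_star.T
mask = rng.random((n, n)) < p

def als_step(V, M, mask):
    """One alternating-minimization half-step: with V fixed, each row of
    the optimal U solves a small least-squares problem over the observed
    entries in the corresponding row of M."""
    U = np.zeros((M.shape[0], V.shape[1]))
    for j in range(M.shape[0]):
        obs = mask[j]
        U[j], *_ = np.linalg.lstsq(V[obs], M[j, obs], rcond=None)
    return U

# Initialize V from the top-r right singular vectors of P_Omega(M)/p.
_, s, Vt = np.linalg.svd(mask * M / p)
V = Vt[:r].T * np.sqrt(s[:r])
for _ in range(30):
    U = als_step(V, M, mask)          # update U with V fixed
    V = als_step(U, M.T, mask.T)      # update V with U fixed

rel_err = np.linalg.norm(U @ V.T - M, "fro") / np.linalg.norm(M, "fro")
```

Each subproblem is an ordinary least squares, so every half-step decreases the loss; in this noiseless, well-sampled setup the iterates recover M to high accuracy.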
Gradient descent for matrix completion

    minimize_{X ∈ R^{n×r}} f(X) = Σ_{(j,k) ∈ Ω} (e_j^T XX^T e_k − M_{j,k})²

Algorithm 7.2 (Gradient descent for MC)
Input: Y = [Y_{j,k}]_{1≤j,k≤n}, r, p.
Spectral initialization: let U^0 Σ^0 (U^0)^T be the rank-r eigendecomposition of M^0 := (1/p) P_Ω(Y), and set X^0 = U^0 (Σ^0)^{1/2}.
Gradient updates: for t = 0, 1, 2, ..., T − 1:

    X^{t+1} = X^t − η_t ∇f(X^t).
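A minimal NumPy rendering of Algorithm 7.2 for the PSD model M = X⋆X⋆^T (the step-size constant, iteration count, and symmetric sampling below are illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, p = 100, 2, 0.5

X_star = rng.standard_normal((n, r))
M = X_star @ X_star.T
mask = rng.random((n, n)) < p
mask = mask | mask.T                       # symmetric sample set

Y = mask * M                               # observed entries P_Omega(M)

# Spectral initialization: rank-r eigendecomposition of P_Omega(Y)/p.
w, V = np.linalg.eigh(Y / p)
top = np.argsort(w)[::-1][:r]
X = V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# Gradient updates on f(X) = sum_{(j,k) in Omega} ((XX^T - M)_{jk})^2;
# the gradient is 2 (R + R^T) X with R = P_Omega(XX^T - M).
eta = 0.1 / np.max(w[top])                 # step size on the order of 1/sigma_max
for _ in range(400):
    R = mask * (X @ X.T - M)
    X = X - eta * 2 * (R + R.T) @ X

rel_err = np.linalg.norm(X @ X.T - M, "fro") / np.linalg.norm(M, "fro")
```

In this noiseless, well-sampled regime vanilla gradient descent with spectral initialization converges linearly, in line with Theorem 7.9 on the next slide.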
Gradient descent for matrix completion: theory

Define the optimal transform from the t-th iterate X^t to X⋆ as

    Q^t := argmin_{R ∈ O^{r×r}} ‖X^t R − X⋆‖_F.

Theorem 7.9 (Ma et al., 2017). Suppose M = X⋆X⋆^T is rank-r, incoherent and well-conditioned. Vanilla GD (with spectral initialization) achieves

    ‖X^t Q^t − X⋆‖_F ≲ ρ^t µr (1/√(np)) ‖X⋆‖_F,
    ‖X^t Q^t − X⋆‖ ≲ ρ^t µr (1/√(np)) ‖X⋆‖,  (spectral)
    ‖X^t Q^t − X⋆‖_{2,∞} ≲ ρ^t µr √(log n / (np)) ‖X⋆‖_{2,∞},  (incoherence)

where ρ = 1 − σ_min η / 5 < 1, provided the step size satisfies η ≍ 1/σ_max and the sample complexity satisfies n²p ≳ µ³nr³ log³ n.

In short: linear convergence of X^t(X^t)^T to M in Frobenius, spectral and entrywise norms.
Numerical evidence for noiseless data

Figure 7.1: Relative error of X^t(X^t)^T (measured in ‖·‖_F, ‖·‖, ‖·‖_∞) vs. iteration count for matrix completion, where n = 1000, r = 10, p = 0.1, with a constant step size η_t.
Numerical evidence for noisy data

Set SNR := ‖M‖_F² / (n²σ²).

Figure 7.2: Squared relative error of the estimate X̂ (measured in ‖·‖_F, ‖·‖, ‖·‖_{2,∞}) and of M̂ = X̂X̂^T (measured in ‖·‖_∞) vs. SNR, where n = 500, r = 10, p = 0.1, with a constant step size η_t.
Restricted strong convexity and smoothness

Lemma 7.10 (restricted strong convexity and smoothness). Suppose that n²p ≥ Cκ²µrn log n for some constant C > 0. Then with high probability, the Hessian ∇²f(X) obeys

    vec(V)^T ∇²f(X) vec(V) ≥ (σ_min/2) ‖V‖_F²  (restricted strong convexity)
    ‖∇²f(X)‖ ≤ (5/2) σ_max  (smoothness)

for all X and all V = Y H_Y − Z with H_Y := argmin_{R ∈ O^{r×r}} ‖YR − Z‖_F, satisfying

    ‖X − X⋆‖_{2,∞} ≤ ε ‖X⋆‖_{2,∞}  (incoherence region),   ‖Z − X⋆‖ ≤ δ ‖X⋆‖,

where ε ≲ 1/√(κ³µr log² n) and δ ≲ 1/κ.
Linear convergence: induction I

Given the definition of Q^{t+1}, we have

    ‖X^{t+1} Q^{t+1} − X⋆‖_F ≤ ‖X^{t+1} Q^t − X⋆‖_F
      (i)  = ‖[X^t − η ∇f(X^t)] Q^t − X⋆‖_F
      (ii) = ‖X^t Q^t − η ∇f(X^t Q^t) − X⋆‖_F
      (iii)= ‖X^t Q^t − η ∇f(X^t Q^t) − (X⋆ − η ∇f(X⋆))‖_F =: α,

where (i) follows from the GD update rule, (ii) from the identity ∇f(X^t R) = ∇f(X^t) R for any R ∈ O^{r×r}, and (iii) from ∇f(X⋆) = 0.
Linear convergence: induction II

The fundamental theorem of calculus reveals

    vec(X^t Q^t − η ∇f(X^t Q^t) − (X⋆ − η ∇f(X⋆)))
      = vec(X^t Q^t − X⋆) − η vec(∇f(X^t Q^t) − ∇f(X⋆))
      = (I_{nr} − η A) vec(X^t Q^t − X⋆),   (7.1)

where A := ∫₀¹ ∇²f(X(τ)) dτ and X(τ) := X⋆ + τ(X^t Q^t − X⋆). Taking the squared Euclidean norm of both sides of (7.1) leads to

    α² = vec(X^t Q^t − X⋆)^T (I_{nr} − ηA)² vec(X^t Q^t − X⋆)
       ≤ ‖X^t Q^t − X⋆‖_F² + η² ‖A‖² ‖X^t Q^t − X⋆‖_F² − 2η vec(X^t Q^t − X⋆)^T A vec(X^t Q^t − X⋆).   (7.2)
Linear convergence: induction III

Based on the incoherence of X⋆ and X^t, for all τ ∈ [0, 1],

    ‖X(τ) − X⋆‖_{2,∞} ≤ ‖X^t Q^t − X⋆‖_{2,∞} ≤ Cµr √(log n / (np)) ‖X⋆‖_{2,∞}  (incoherence hypothesis).

Taking X = X(τ), Y = X^t and Z = X⋆ in Lemma 7.10, one can verify its assumptions given n²p ≳ κ³µ³r³n log³ n. Hence,

    vec(X^t Q^t − X⋆)^T A vec(X^t Q^t − X⋆) ≥ (σ_min/2) ‖X^t Q^t − X⋆‖_F²  and  ‖A‖ ≤ (5/2) σ_max.
Linear convergence: induction IV

Substituting these two inequalities into (7.2) yields

    α² ≤ (1 + (25/4) η² σ_max² − σ_min η) ‖X^t Q^t − X⋆‖_F² ≤ (1 − (σ_min/2) η) ‖X^t Q^t − X⋆‖_F²

as long as 0 < η ≤ (2σ_min)/(25σ_max²), which further implies that

    α ≤ (1 − (σ_min/4) η) ‖X^t Q^t − X⋆‖_F.

The incoherence hypothesis is important for fast convergence: the fact that X^t stays incoherent throughout the execution is called implicit regularization, and can be established by a leave-one-out analysis [Ma et al., 2017].
References

[1] R. Sun and Z.-Q. Luo, "Guaranteed matrix completion via non-convex factorization," IEEE Transactions on Information Theory, 2016.
[2] C. Davis and W. Kahan, "The rotation of eigenvectors by a perturbation," SIAM Journal on Numerical Analysis, 1970.
[3] R. Keshavan, A. Montanari, and S. Oh, "Matrix completion from a few entries," IEEE Transactions on Information Theory, 2010.
[4] Y. Chen and M. Wainwright, "Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees," arXiv preprint, 2015.
[5] C. Ma, K. Wang, Y. Chi, and Y. Chen, "Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution," arXiv preprint, 2017.
[6] R. Ge, C. Jin, and Y. Zheng, "No spurious local minima in nonconvex low rank problems: A unified geometric analysis," ICML, 2017.
[7] X. Li et al., "Symmetry, saddle points, and global optimization landscape of nonconvex matrix factorization," arXiv preprint, 2016.
[8] T. Tao, Topics in Random Matrix Theory, American Mathematical Society, 2012.
[9] Y. Chen and Y. Chi, "Harnessing structures in big data via guaranteed low-rank matrix estimation," arXiv preprint, 2018.
[10] C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, "How to escape saddle points efficiently," arXiv preprint, 2017.
More informationSingular Value Decomposition
Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our
More informationSparse Parameter Estimation: Compressed Sensing meets Matrix Pencil
Sparse Parameter Estimation: Compressed Sensing meets Matrix Pencil Yuejie Chi Departments of ECE and BMI The Ohio State University Colorado School of Mines December 9, 24 Page Acknowledgement Joint work
More informationGuaranteed Rank Minimization via Singular Value Projection: Supplementary Material
Guaranteed Rank Minimization via Singular Value Projection: Supplementary Material Prateek Jain Microsoft Research Bangalore Bangalore, India prajain@microsoft.com Raghu Meka UT Austin Dept. of Computer
More informationConditions for Robust Principal Component Analysis
Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and
More informationarxiv: v3 [math.oc] 19 Oct 2017
Gradient descent with nonconvex constraints: local concavity determines convergence Rina Foygel Barber and Wooseok Ha arxiv:703.07755v3 [math.oc] 9 Oct 207 0.7.7 Abstract Many problems in high-dimensional
More informationMinimizing the Difference of L 1 and L 2 Norms with Applications
1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:
More informationMatrix completion: Fundamental limits and efficient algorithms. Sewoong Oh Stanford University
Matrix completion: Fundamental limits and efficient algorithms Sewoong Oh Stanford University 1 / 35 Low-rank matrix completion Low-rank Data Matrix Sparse Sampled Matrix Complete the matrix from small
More informationAnalysis of Greedy Algorithms
Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm
More informationOverparametrization for Landscape Design in Non-convex Optimization
Overparametrization for Landscape Design in Non-convex Optimization Jason D. Lee University of Southern California September 19, 2018 The State of Non-Convex Optimization Practical observation: Empirically,
More informationarxiv: v1 [stat.ml] 1 Mar 2015
Matrix Completion with Noisy Entries and Outliers Raymond K. W. Wong 1 and Thomas C. M. Lee 2 arxiv:1503.00214v1 [stat.ml] 1 Mar 2015 1 Department of Statistics, Iowa State University 2 Department of Statistics,
More informationMatrix Completion from Noisy Entries
Matrix Completion from Noisy Entries Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh Abstract Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations
More informationFast and Robust Phase Retrieval
Fast and Robust Phase Retrieval Aditya Viswanathan aditya@math.msu.edu CCAM Lunch Seminar Purdue University April 18 2014 0 / 27 Joint work with Yang Wang Mark Iwen Research supported in part by National
More informationPrincipal Component Analysis
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationNumerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725
Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple
More informationGI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil
GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection
More informationResearch Statement Qing Qu
Qing Qu Research Statement 1/4 Qing Qu (qq2105@columbia.edu) Today we are living in an era of information explosion. As the sensors and sensing modalities proliferate, our world is inundated by unprecedented
More informationExact Low-rank Matrix Recovery via Nonconvex M p -Minimization
Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization Lingchen Kong and Naihua Xiu Department of Applied Mathematics, Beijing Jiaotong University, Beijing, 100044, People s Republic of China E-mail:
More informationDeep Linear Networks with Arbitrary Loss: All Local Minima Are Global
homas Laurent * 1 James H. von Brecht * 2 Abstract We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima
More informationNew Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit
New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence
More informationEE 381V: Large Scale Learning Spring Lecture 16 March 7
EE 381V: Large Scale Learning Spring 2013 Lecture 16 March 7 Lecturer: Caramanis & Sanghavi Scribe: Tianyang Bai 16.1 Topics Covered In this lecture, we introduced one method of matrix completion via SVD-based
More informationCOMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017
COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY
More informationCSC 576: Variants of Sparse Learning
CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in
More informationTighter Low-rank Approximation via Sampling the Leveraged Element
Tighter Low-rank Approximation via Sampling the Leveraged Element Srinadh Bhojanapalli The University of Texas at Austin bsrinadh@utexas.edu Prateek Jain Microsoft Research, India prajain@microsoft.com
More informationIV. Matrix Approximation using Least-Squares
IV. Matrix Approximation using Least-Squares The SVD and Matrix Approximation We begin with the following fundamental question. Let A be an M N matrix with rank R. What is the closest matrix to A that
More informationLecture 9: Numerical Linear Algebra Primer (February 11st)
10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template
More informationSparse and Low Rank Recovery via Null Space Properties
Sparse and Low Rank Recovery via Null Space Properties Holger Rauhut Lehrstuhl C für Mathematik (Analysis), RWTH Aachen Convexity, probability and discrete structures, a geometric viewpoint Marne-la-Vallée,
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationDense Error Correction for Low-Rank Matrices via Principal Component Pursuit
Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit Arvind Ganesh, John Wright, Xiaodong Li, Emmanuel J. Candès, and Yi Ma, Microsoft Research Asia, Beijing, P.R.C Dept. of Electrical
More informationIncremental Reshaped Wirtinger Flow and Its Connection to Kaczmarz Method
Incremental Reshaped Wirtinger Flow and Its Connection to Kaczmarz Method Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yingbin Liang Department of EECS Syracuse
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional
More informationLecture Notes 10: Matrix Factorization
Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To
More informationOnline Generalized Eigenvalue Decomposition: Primal Dual Geometry and Inverse-Free Stochastic Optimization
Online Generalized Eigenvalue Decomposition: Primal Dual Geometry and Inverse-Free Stochastic Optimization Xingguo Li Zhehui Chen Lin Yang Jarvis Haupt Tuo Zhao University of Minnesota Princeton University
More informationData Mining Lecture 4: Covariance, EVD, PCA & SVD
Data Mining Lecture 4: Covariance, EVD, PCA & SVD Jo Houghton ECS Southampton February 25, 2019 1 / 28 Variance and Covariance - Expectation A random variable takes on different values due to chance The
More informationRecent Developments in Compressed Sensing
Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline
More informationLecture Notes 9: Constrained Optimization
Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form
More informationFunctional Analysis Review
Outline 9.520: Statistical Learning Theory and Applications February 8, 2010 Outline 1 2 3 4 Vector Space Outline A vector space is a set V with binary operations +: V V V and : R V V such that for all
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More information