Low-rank Solutions of Linear Matrix Equations via Procrustes Flow


Stephen Tu, Mahdi Soltanolkotabi, Ross Boczar, Benjamin Recht

July 11, 2015

Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, CA
Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA
Department of Statistics, UC Berkeley, Berkeley, CA

Abstract

In this paper we study the problem of recovering a low-rank positive semidefinite matrix from linear measurements. Our algorithm, which we call Procrustes Flow, starts from an initial estimate obtained by a thresholding scheme, followed by gradient descent on a non-convex objective. We show that as long as the measurements obey a standard restricted isometry property, our algorithm converges to the unknown matrix at a geometric rate. In the case of Gaussian measurements, such convergence occurs for an n × n matrix of rank r when the number of measurements is Ω(nr).

1 Introduction

In numerous applications from signal processing, machine learning, control, and wireless communications, it is often of interest to find a positive semidefinite matrix of minimal rank obeying a set of linear equations. In particular, one wishes to solve problems of the form

  min_{X ∈ R^{n×n}} rank(X)   subject to   A(X) = b and X ⪰ 0,   (1.1)

where A : R^{n×n} → R^m is a known affine transformation that maps matrices to vectors. More specifically, the k-th entry of A(X) is ⟨A_k, X⟩ := Tr(A_k X), where each A_k ∈ R^{n×n} is symmetric. Since the early seventies, a very popular heuristic for solving such problems has been to replace X with a low-rank factorization X = UU^T and to solve matrix quadratic equations of the form

  find U ∈ R^{n×r}   subject to   A(UU^T) = b,   (1.2)

via a local search heuristic [Ruh74]. Many researchers have demonstrated that such heuristics work well in practice for a variety of problems [RS05, Fun06, LRS+10, RR13]. However, these procedures lack the strong guarantees associated with convex programming heuristics for solving (1.1). In this paper we show that a local search heuristic provably solves (1.2) under standard restricted isometry assumptions on the linear map A. For standard ensembles of equality constraints, we demonstrate that X can be estimated to precision ε by such local search as long as we have Ω(nr) equations.¹

¹ Here and throughout we use f(x) = Ω(g(x)) if there is a positive constant C such that f(x) ≥ C g(x) for all x sufficiently large.
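To fix notation for what follows, the snippet below is a small NumPy sketch of the measurement model: the operator A maps a matrix to the m inner products ⟨A_k, X⟩ = Tr(A_k X), and the factored problem (1.2) asks for a tall matrix U reproducing b = A(UU^T). The function name apply_A and the unnormalized symmetric Gaussian A_k are illustrative assumptions of this sketch only; the specific measurement ensemble used for the theory is described in Section 3.

    import numpy as np

    def apply_A(A_list, X):
        # The affine map A: R^{n x n} -> R^m; the k-th entry is <A_k, X> = Tr(A_k X)
        # for symmetric A_k.
        return np.array([np.trace(Ak @ X) for Ak in A_list])

    # Toy instance of the factored problem (1.2): b = A(U U^T) with U of size n x r.
    rng = np.random.default_rng(0)
    n, r, m = 20, 2, 300
    U_true = rng.standard_normal((n, r))
    A_list = [(G + G.T) / 2 for G in rng.standard_normal((m, n, n))]
    b = apply_A(A_list, U_true @ U_true.T)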

This Ω(nr) requirement is merely a constant factor more than the number of parameters needed to specify an n × n matrix of rank r. Specialized to a random Gaussian model, our work improves upon recent and independent work by Zheng and Lafferty [ZL15].

2 Algorithm: Procrustes Flow

In this paper we study a local search heuristic for solving matrix quadratic equations of the form (1.2) which consists of two components: (1) a careful initialization obtained by a projected gradient scheme on n × n matrices, and (2) a series of successive refinements of this initial solution via a gradient descent scheme. This algorithm is a natural extension of the Wirtinger Flow algorithm developed in [CLS14b] for solving vector quadratic equations. Following [CLS14b], we shall refer to the combination of these two steps as the Procrustes Flow (PF) algorithm, detailed in Algorithm 1.

2.1 Initialization via low-rank projected gradients

In the initial phase of our algorithm we start from Z_0 = 0_{n×n} and apply, for T_init iterations, successive updates of the form

  Z_{τ+1} = P_r(Z_τ − α_{τ+1} Σ_{k=1}^m (⟨A_k, Z_τ⟩ − b_k) A_k).   (2.1)

Here, P_r denotes the projection onto rank-r positive semidefinite (PSD) matrices of size n × n, which can be computed efficiently via Lanczos methods. We set our initialization to an n × r matrix U_0 obeying Z_{T_init} = U_0 U_0^T. Updates of the form (2.1) have a long history in the compressed sensing/matrix sensing literature (see e.g. [TG07, GK09, NT09, NV09, BD09, MJD09, CCS10]). Furthermore, using the first step of the update (2.1) for the purposes of initialization has also been proposed in previous work (see e.g. [AM07, KMO10, JNS12]).

2.2 Successive refinement via gradient descent

As mentioned earlier, we are interested in finding a matrix U ∈ R^{n×r} obeying matrix quadratic equations of the form b = A(UU^T). We wish to refine our initial estimate by solving the following non-convex optimization problem

  min_{U ∈ R^{n×r}} f(U) := (1/4) ‖A(UU^T) − b‖²_{ℓ2} = (1/4) Σ_{k=1}^m (⟨A_k, UU^T⟩ − b_k)²,   (2.2)

which minimizes the misfit in our quadratic equations via the square loss. To solve (2.2), starting from our initial estimate U_0 ∈ R^{n×r} we apply the successive updates

  U_{τ+1} := U_τ − µ_{τ+1} (σ_r²(U_0)/σ_1⁴(U_0)) ∇f(U_τ) = U_τ − µ_{τ+1} (σ_r²(U_0)/σ_1⁴(U_0)) Σ_{k=1}^m (⟨A_k, U_τ U_τ^T⟩ − b_k) A_k U_τ.   (2.3)

Here and throughout, for a matrix X, σ_ℓ(X) denotes the ℓ-th largest singular value of X. We note that the update (2.3) is essentially gradient descent with a carefully chosen step size.
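The two phases just described can be summarized in a short NumPy sketch (cf. Algorithm 1 below). This is a minimal illustration under stated assumptions, not the authors' reference implementation: the names proj_rank_r_psd and procrustes_flow are ours, the projection P_r is computed with a dense eigendecomposition rather than the Lanczos method mentioned above, and the iteration counts are placeholders.

    import numpy as np

    def proj_rank_r_psd(Z, r):
        # P_r: project a symmetric matrix onto rank-r PSD matrices by keeping
        # the top-r eigenpairs and clipping negative eigenvalues at zero.
        w, V = np.linalg.eigh((Z + Z.T) / 2)
        idx = np.argsort(w)[::-1][:r]
        w_top = np.clip(w[idx], 0.0, None)
        return (V[:, idx] * w_top) @ V[:, idx].T

    def procrustes_flow(A_list, b, r, T_init=10, T_gd=500, mu=4.0 / 75.0):
        # Sketch of the two-phase iteration: updates (2.1) and (2.3).
        m = len(A_list)
        n = A_list[0].shape[0]
        alpha = 1.0 / m  # step size alpha_tau = 1/m, as used in Theorem 3.3

        # Initialization phase: projected gradient on n x n matrices, update (2.1).
        Z = np.zeros((n, n))
        for _ in range(T_init):
            resid = np.array([np.sum(Ak * Z) for Ak in A_list]) - b   # <A_k, Z> - b_k
            grad = sum(rk * Ak for rk, Ak in zip(resid, A_list))
            Z = proj_rank_r_psd(Z - alpha * grad, r)

        # Factor Z_{T_init} = U_0 U_0^T from its top-r eigendecomposition (U_0 = Q Sigma^{1/2}).
        w, V = np.linalg.eigh(Z)
        idx = np.argsort(w)[::-1][:r]
        U = V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))

        # Fixed step-size scaling sigma_r^2(U_0) / sigma_1^4(U_0) from update (2.3).
        s = np.linalg.svd(U, compute_uv=False)
        scale = s[-1] ** 2 / s[0] ** 4

        # Gradient descent phase: update (2.3).
        for _ in range(T_gd):
            resid = np.array([np.sum(Ak * (U @ U.T)) for Ak in A_list]) - b
            grad = sum(rk * (Ak @ U) for rk, Ak in zip(resid, A_list))
            U = U - mu * scale * grad
        return U

A data-dependent stopping rule for the initialization phase, based only on observable quantities, is given in Lemma 3.4 below.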

Algorithm 1 Procrustes Flow (PF)
Require: {A_k}_{k=1}^m, {b_k}_{k=1}^m, {α_τ}_{τ≥1}, {µ_τ}_{τ≥1}, T_init ∈ N
  // Initialization phase
  Z_0 := 0_{n×n}
  repeat
    Z_{τ+1} ← P_r(Z_τ − α_{τ+1} Σ_{k=1}^m (⟨A_k, Z_τ⟩ − b_k) A_k)   // projection onto rank-r PSD matrices
  until τ = T_init
  QΣQ^T := Z_{T_init}   // SVD of Z_{T_init}, with Q ∈ R^{n×r}, Σ ∈ R^{r×r}
  U_0 := QΣ^{1/2}
  // Gradient descent phase
  repeat
    U_{τ+1} ← U_τ − µ_{τ+1} (σ_r²(U_0)/σ_1⁴(U_0)) Σ_{k=1}^m (⟨A_k, U_τ U_τ^T⟩ − b_k) A_k U_τ
  until convergence

3 Main Results

For our theoretical results we shall focus on affine maps A which are isotropic and obey the matrix Restricted Isometry Property (RIP).

Definition 3.1 (Isotropic mapping). We say the random map A is an isotropic mapping if

  E[‖A(X)‖²_{ℓ2}] = ‖X‖_F²

holds for any fixed X ∈ R^{n×n} which is independent of A.

Definition 3.2 (Restricted Isometry Property (RIP) [CT05, RFP10]). The map A satisfies r-RIP with constant δ_r if

  (1 − δ_r) ‖X‖_F² ≤ ‖A(X)‖²_{ℓ2} ≤ (1 + δ_r) ‖X‖_F²

holds for all matrices X ∈ R^{n×n} of rank at most r.

We are interested in finding a matrix U ∈ R^{n×r} obeying quadratic equations of the form

  A(UU^T) = b,   (3.1)

where we assume b = A(XX^T) for a planted solution X ∈ R^{n×r}. We wish to understand when the Procrustes Flow algorithm recovers this planted solution X. We note that this is only possible up to a certain rotational factor: if U obeys (3.1), then so does any matrix UR with R ∈ R^{r×r} an orthonormal matrix satisfying R^T R = I_r. This naturally leads to defining the distance between two matrices U, X ∈ R^{n×r} as

  dist(U, X) := min_{R ∈ R^{r×r}: R^T R = I_r} ‖U − XR‖_F.   (3.2)

We note that this distance is the solution to the classic orthogonal Procrustes problem (hence the name of the algorithm). It is known that the optimal rotation matrix R minimizing ‖U − XR‖_F is equal to R = AB^T, where AΣB^T is the singular value decomposition (SVD) of X^T U.
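Since dist(·, ·) is used throughout the analysis, a short sketch of its computation may be helpful. It uses exactly the closed-form Procrustes solution just mentioned (R = AB^T from the SVD of X^T U); the function name is ours.

    import numpy as np

    def procrustes_dist(U, X):
        # dist(U, X) = min_R ||U - X R||_F over r x r matrices with R^T R = I,
        # via the closed-form Procrustes solution: R = A B^T where X^T U = A Sigma B^T.
        A, _, Bt = np.linalg.svd(X.T @ U)
        R = A @ Bt
        return np.linalg.norm(U - X @ R, ord="fro"), R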

We now have all of the elements in place to state our main result.

Theorem 3.3. Let X ∈ R^{n×r} be an arbitrary matrix with singular values σ_1(X) ≥ σ_2(X) ≥ ⋯ ≥ σ_r(X) > 0 and condition number κ = σ_1(X)/σ_r(X). Also, let b = A(XX^T) ∈ R^m be m quadratic samples. Furthermore, assume the mapping A is isotropic and obeys rank-4r RIP with RIP constant δ_{4r} ≤ 1/10. Then, using T_init ≥ log(√r κ²) + 2 iterations of the initialization phase of Procrustes Flow with α_τ = 1/m yields a solution U_0 obeying

  dist(U_0, X) ≤ (1/4) σ_r(X).   (3.3)

Furthermore, take a constant step size µ_τ = µ for all τ = 1, 2, … and assume µ ≤ 4/75. Then, starting from any initial solution obeying (3.3), we have

  dist(U_τ, X) ≤ (1/4) (1 − (24/125)(µ/κ⁴))^{τ/2} σ_r(X).   (3.4)

The above theorem shows that the Procrustes Flow algorithm achieves a good initialization under the isotropy and RIP assumptions on the mapping A. Also, starting from any sufficiently accurate initialization, the algorithm exhibits geometric convergence to the unknown matrix X. We note that in the above result we have not attempted to optimize the constants. Furthermore, there is a natural tradeoff between the upper bound on the RIP constant, the radius in which PF is contractive (3.3), and its rate of convergence (3.4). In particular, as will become clear in the proofs, one can increase the radius in which PF is contractive (increase the constant 1/4 in (3.3)) and the rate of convergence (increase the constant 24/125 in (3.4)) by assuming a smaller upper bound on the RIP constant.

The most common measurement ensemble which satisfies the isotropy and RIP assumptions is the spiked Gaussian ensemble, where each symmetric matrix A_k has N(0, 1/m) entries on the diagonal and N(0, 1/(2m)) entries elsewhere. For this ensemble to achieve a RIP constant of δ_r, we require at least m = Ω(nr/δ_r²) measurements. Applying Theorem 3.3 to this measurement ensemble, we conclude that the Procrustes Flow algorithm yields a solution with relative error ε (that is, dist(X̂, X)/‖X‖ ≤ ε) in O(log(√r/ε)) iterations using only Ω(nr) measurements.

We would like to note that if more measurements are available, it is not necessary to use multiple projected gradient updates in the initialization phase. In particular, for the Gaussian model, if m = Ω(nr²κ⁴), then (3.3) will hold after the first iteration (T_init = 1).

How to verify the initialization is complete. Theorem 3.3 requires that T_init = Ω(log(√r κ²)), but κ is a property of X and is hence unknown. However, as long as the RIP constant δ_{2r} satisfies δ_{2r} ≤ 1/10, we can use each iterate of the initialization to test whether or not we have entered the radius of convergence. The following lemma establishes a sufficient condition we can check using only information from Z_τ. The proof is deferred to Appendix B.

Lemma 3.4. Assume the RIP constant of A satisfies δ_{2r} ≤ 1/10. Let Z_τ denote the τ-th step of the initialization in Procrustes Flow, and let U_0 ∈ R^{n×r} be such that Z_τ = U_0 U_0^T. Define e_τ := ‖A(Z_τ) − b‖_{ℓ2} = ‖A(Z_τ − XX^T)‖_{ℓ2}. Then, if the following condition holds,

  e_τ ≤ (3/20) σ_r(Z_τ),

we have that dist(U_0, X) ≤ (1/4) σ_r(X).
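Returning to the measurement model, the spiked Gaussian ensemble described after Theorem 3.3 can be sampled as in the sketch below. The function name is ours; the entry variances match the ones stated above (N(0, 1/m) on the diagonal, N(0, 1/(2m)) off the diagonal), which is the normalization that makes E[‖A(X)‖²_{ℓ2}] = ‖X‖_F² for symmetric X.

    import numpy as np

    def spiked_gaussian_ensemble(n, m, rng=None):
        # Draw m symmetric A_k with N(0, 1/m) diagonal entries and
        # N(0, 1/(2m)) off-diagonal entries.
        rng = np.random.default_rng() if rng is None else rng
        A_list = []
        for _ in range(m):
            upper = np.triu(rng.normal(scale=np.sqrt(1.0 / (2 * m)), size=(n, n)), k=1)
            A = upper + upper.T
            A[np.diag_indices(n)] = rng.normal(scale=np.sqrt(1.0 / m), size=n)
            A_list.append(A)
        return A_list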

5 we have that dist(u 0, X) 1 4 σ r(x) One might consider using solely the projected gradient updates (ie set T init = ) as in previous approaches [TG07, GK09, NT09, NV09, BD09, MJD09, CCS] We note that the projected gradient updates in the initialization phase require computing the first r eigenvectors of a PSD matrix whereas the gradient updates do not require any eigenvector computations Such eigenvector computations may be prohibitive compared to the gradient updates, especially when n is large and for ensembles where matrix-vector multiplication is fast We would like to emphasize, however, that for small n and dense matrices using projected gradient updates may be more efficient due to faster convergence rate Our scheme is a natural interpolation: one could only do projected gradient steps, or one could do one projected gradient step Here we argue that very few projected gradients provide sufficient initialization such that gradient descent converges geometrically 4 Related work There is a vast literature dedicated to low-rank matrix recovery/sensing and semi-definite programming We shall only focus on the papers most related to our framework Recht, azel, and Parrilo were the first to study low-rank solutions of linear matrix equations under RIP assumptions [RP] They showed that if the rank-r RIP constant of A is less than a fixed numerical constant, then the matrix with minimum trace satisfying the equality constraints coincided with the minimum rank solution In particular, for the Gaussian ensemble the required number of measurements is Ω(nr) [CP11] Subsequently, a series of papers [CR09, Gro11, Rec11, CLS14a] showed that trace minimization and related convex optimization approaches also work for other measurement ensembles such as those arising in matrix completion and related problems In this paper we have established a similar result to [RP] (albeit only for PSD matrices) We require the same order of measurements Ω(nr) but use a more computationally friendly local search algorithm Also related to this work are projection gradient schemes with hard thresholding [TG07, GK09, NT09, NV09, BD09, MJD09, CCS] Such algorithms enjoy similar guarantees to that of [RP] and this work Indeed, we utilize such results in the initialization phase of our algorithm However, such algorithms require a rank-r SVD in each iteration which may be expensive for large problem sizes We would like to emphasize, however, that for small problem sizes and dense matrices (such as Gaussian ensembles) such algorithms may be faster than gradient descent approaches such as ours More recently, there have been a few results using non-convex optimization schemes for matrix recovery problems In particular, theoretical guarantees for matrix completion have been established using manifold optimization 2 [KMO] and alternating minimization [Kes12] (albeit with the caveat of requiring a fresh set of samples in each iteration) Later on, Jain etal [JNS12] analyzed the performance of alternating minimization under similar modeling assumptions to [RP] and this paper However, the requirements on the RIP constant in [JNS12] are more stringent compared to [RP] and ours In particular, the authors require δ 4r c/r whereas we only require δ 4r c We should note, however, that the authors study general low-rank matrices rather than PSD ones 2 See [JBAS] for related schemes using manifold optimization 5

Our algorithm and analysis are inspired by the recent paper [CLS14b] by Candès, Li and Soltanolkotabi; see also [Sol14, CLM15] for some stability results. In [CLS14b] the authors introduced a local regularity condition to analyze the convergence of a gradient descent-like scheme for phase retrieval. We use a similar regularity condition but generalize it to ranks higher than one. Recently, and independently of our work, Zheng and Lafferty [ZL15] provided an analysis of gradient descent using (2.2) via the same regularity condition. Zheng and Lafferty focus on the Gaussian ensemble, and establish a sample complexity of m = Ω(nr²κ⁴ log n).⁴ In comparison, we only require Ω(nr) measurements, both removing the dependence on κ in the sample complexity and improving the asymptotic rate. We would like to emphasize that the improvement in our result is not just due to the more sophisticated initialization scheme. In particular, Zheng and Lafferty show geometric convergence (albeit at a slower rate) starting from any initial solution obeying dist(U_0, X) ≤ c σ_r(X) as long as the number of measurements obeys m = Ω(nrκ⁴ log n). In contrast, we establish geometric convergence starting from the same neighborhood of U_0 with only Ω(nr) measurements. Moreover, the theory of restricted isometries in our work considerably simplifies the analysis. Finally, we would also like to mention [SOR14] for guarantees using stochastic gradient algorithms. The results of [SOR14] are applicable to a variety of models; focusing on the Gaussian ensemble, the authors require Ω((nr log n)/ε) samples to reach a relative error of ε. In contrast, our sample complexity is independent of the desired relative error ε. However, their algorithm only requires a random initialization.

⁴ We note that the paper [ZL15] defines κ in terms of the larger matrix XX^T, whereas our convention is to define κ for the smaller matrix X.

5 Proofs

Before we dive into the details of the proofs, we would like to mention that we will prove our results using the update

  U_{τ+1} = U_τ − (µ/(κ²‖X‖²)) ∇f(U_τ),   (5.1)

in lieu of the PF update

  U_{τ+1} = U_τ − µ (σ_r²(U_0)/σ_1⁴(U_0)) ∇f(U_τ).   (5.2)

As we prove in Section 5.3, our initial solution obeys ‖U_0 U_0^T − XX^T‖_F ≤ σ_r²(X)/4. Hence, applying Weyl's inequality we can conclude that

  σ_r²(U_0)/σ_1⁴(U_0) ≤ (σ_r²(X) + σ_r²(X)/4) / (σ_1²(X) − σ_1²(X)/4)² ≤ 3 σ_r²(X)/σ_1⁴(X),   (5.3)

and similarly,

  σ_r²(U_0)/σ_1⁴(U_0) ≥ (σ_r²(X) − σ_r²(X)/4) / (σ_1²(X) + σ_1²(X)/4)² = (12/25) σ_r²(X)/σ_1⁴(X).   (5.4)

7 and similarly, σr(u 2 0 ) σ1 4(U 0) 12 σr(x) 2 (54) 25 (X) Thus, any result proven for the update (51) will automatically carry over to the P update with a simple rescaling of the upper bound on the step size via (53) urthermore, we can upper bound the convergence rate of gradient descent using the P update in terms of properties of X instead of U 0 via (54) 51 Preliminaries We start with a well known characterization of RIP Lemma 51 [Can08] Let A satisfy 4r-RIP with constant δ 4r Then, for all fixed matrices X, Y of rank at most 2r which are independent of A, we have A(X), A(Y ) X, Y δ 4r X Y Next, we state a recent result which characterizes the convergence rate of projected gradient descent onto general non-convex sets specialized to our problem See [MJD09] for related results using singular value hard thresholding Lemma 52 [ORS15] Let X R n r be an arbitrary matrix Also, let b = A(XX T ) R m be m quadratic samples Consider the iterative updates ( ) Z τ+1 P r Z τ 1 m ( A k, Z τ b k )Z τ, m where P r projects its input matrix onto the rank-r PSD cone Then, Z τ XX T ρ(a) τ Z0 XX T, holds Here, ρ(a) is defined as ρ(a) := 2 sup k=1 X =1,rank(X) 2r, Y =1,rank(Y ) 2r σ 4 1 A(X), A(Y ) X, Y We shall make repeated use of the following lemma which upper bounds UU T XX T by some factor of dist(u, X) Lemma 53 or any U R n r obeying dist(u, X) 1 4 X, we have UU T XX T 9 4 X U XR Proof UU T XX T = U(U XR) T + (U XR)(XR) T ( U + X ) U XR 9 4 X U XR 7

Finally, we also need the following lemma, which upper bounds dist(U, X) by some factor of ‖UU^T − XX^T‖_F. We defer the proof of this result to Appendix A.

Lemma 5.4. For any U, X ∈ R^{n×r}, we have

  dist(U, X)² ≤ (1/(2(√2 − 1) σ_r²(X))) ‖UU^T − XX^T‖_F².

5.2 Proof of convergence of gradient descent updates (Equation (3.4))

We first outline the general proof strategy; please see [CLS14b, Sections 2.3 and 7.9] for related arguments. The first step is to show that gradient descent on the expected value of the function, which we call F(U) := E[f(U)], exhibits geometric convergence in a small neighborhood around X. The standard approach in optimization to show this is to prove that the function exhibits strong convexity. However, it is not possible for F(U) to be strongly convex in any neighborhood around X.⁵ Thus, we rely on the approach used by [CLS14b], which establishes a sufficient condition that only relies on first-order information along certain trajectories. After showing the sufficient condition holds in expectation, we use standard RIP results to show that this condition also holds for the function f(U) with high probability.

⁵ Such a proof is only possible in the special case when r = 1.

To begin our analysis, we start with the following formulas for the gradients of f(U) and F(U):

  ∇f(U) = Σ_{k=1}^m ⟨A_k, UU^T − XX^T⟩ A_k U,   ∇F(U) = (UU^T − XX^T) U.

Throughout the proof, R is the solution to the orthogonal Procrustes problem. That is,

  R = argmin_{R̃ ∈ R^{r×r}: R̃^T R̃ = I_r} ‖U − X R̃‖_F,

with the dependence on U omitted for the sake of exposition. The following definition defines a notion of strong convexity along certain trajectories of the function.

Definition 5.5 (Regularity condition, [CLS14b]). Let X ∈ R^{n×r} be a global optimum of a function f. Define the set B(δ) as

  B(δ) := {U ∈ R^{n×r} : dist(U, X) ≤ δ}.

The function f satisfies a regularity condition, denoted by RC(α, β, δ), if for all matrices U ∈ B(δ) the following inequality holds:

  ⟨∇f(U), U − XR⟩ ≥ (1/α) ‖U − XR‖_F² + (1/β) ‖∇f(U)‖_F².

If a function satisfies RC(α, β, δ), then as long as gradient descent starts from a point U_0 ∈ B(δ), it will have a geometric rate of convergence to the optimum X. This is formalized by the following lemma.

Lemma 5.6 ([CLS14b]). If f satisfies RC(α, β, δ) and U_0 ∈ B(δ), then the gradient descent update

  U_{τ+1} = U_τ − µ ∇f(U_τ),

with step size 0 < µ ≤ 2/β, obeys U_τ ∈ B(δ) and

  dist²(U_τ, X) ≤ (1 − 2µ/α)^τ dist²(U_0, X)

for all τ ≥ 0.

The proof is complete by showing that the regularity condition holds. To this end, we first show in Lemma 5.7 below that the expected function F(U) satisfies a slightly stronger variant of the regularity condition from Definition 5.5. We then show in Lemma 5.8 that the gradient of f is always close to the gradient of F.

Lemma 5.7. Let F(U) = (1/4) ‖UU^T − XX^T‖_F² denote the average function. For all U obeying ‖U − XR‖_F ≤ (1/4) σ_r(X), we have

  ⟨∇F(U), U − XR⟩ ≥ (1/20)(‖UU^T − XX^T‖_F² + ‖(U − XR)U^T‖_F²) + (σ_r²(X)/4) ‖U − XR‖_F² + (4/(25 κ²‖X‖²)) ‖∇F(U)‖_F².   (5.5)

Lemma 5.8. Let A be a linear map obeying rank-4r RIP with constant δ_{4r}. For any V ∈ R^{n×r} and any U ∈ R^{n×r} obeying dist(U, X) ≤ (1/4)‖X‖ and independent of A, we have

  |⟨∇F(U) − ∇f(U), V⟩| ≤ δ_{4r} ‖UU^T − XX^T‖_F ‖VU^T‖_F.

This immediately implies that for any U ∈ R^{n×r} obeying dist(U, X) ≤ (1/4)‖X‖ and independent of A, we have

  ‖∇f(U) − ∇F(U)‖_F ≤ δ_{4r} ‖UU^T − XX^T‖_F ‖U‖.

We shall prove these two lemmas in Sections 5.2.1 and 5.2.2. However, we first explain how the regularity condition follows from these two lemmas. To begin, note that

  ⟨∇F(U), U − XR⟩ = ⟨∇f(U), U − XR⟩ + ⟨∇F(U) − ∇f(U), U − XR⟩
    (a)≤ ⟨∇f(U), U − XR⟩ + (1/10) ‖UU^T − XX^T‖_F ‖(U − XR)U^T‖_F
    (b)≤ ⟨∇f(U), U − XR⟩ + (1/20)(‖UU^T − XX^T‖_F² + ‖(U − XR)U^T‖_F²),   (5.6)

where (a) follows from Lemma 5.8 together with the fact that δ_{4r} ≤ 1/10, as assumed in the statement of Theorem 3.3, and (b) follows from 2ab ≤ a² + b².

Combining (5.6) with Lemma 5.7, for any U obeying ‖U − XR‖_F ≤ (1/4) σ_r(X) we have

  ⟨∇f(U), U − XR⟩ ≥ (σ_r²(X)/4) ‖U − XR‖_F² + (4/(25 κ²‖X‖²)) ‖∇F(U)‖_F².   (5.7)

Furthermore, for any U obeying dist(U, X) ≤ (1/4)‖X‖, we have

  ‖∇F(U)‖_F² (a)≥ (1/2) ‖∇f(U)‖_F² − ‖∇F(U) − ∇f(U)‖_F²
             (b)≥ (1/2) ‖∇f(U)‖_F² − δ_{4r}² ‖UU^T − XX^T‖_F² ‖U‖²
             (c)≥ (1/2) ‖∇f(U)‖_F² − (25/16) δ_{4r}² ‖X‖² ‖UU^T − XX^T‖_F²
             (d)≥ (1/2) ‖∇f(U)‖_F² − 8 δ_{4r}² ‖X‖⁴ ‖U − XR‖_F².   (5.8)

Here, (a) holds by the triangle inequality and the inequality (a − b)² ≥ a²/2 − b², (b) holds by Lemma 5.8, (c) follows from the fact that for dist(U, X) ≤ (1/4)‖X‖ we have ‖U‖ ≤ (5/4)‖X‖, and (d) follows from Lemma 5.3. Equation (5.8) immediately implies

  (4/(25 κ²‖X‖²)) ‖∇F(U)‖_F² ≥ (2/(25 κ²‖X‖²)) ‖∇f(U)‖_F² − (32/25) δ_{4r}² (‖X‖²/κ²) ‖U − XR‖_F².   (5.9)

Plugging the latter into (5.7), and assuming δ_{4r} ≤ 1/10, we conclude that for any U obeying ‖U − XR‖_F ≤ (1/4) σ_r(X) we have

  ⟨∇f(U), U − XR⟩ ≥ (1/4 − (32/25) δ_{4r}²) σ_r²(X) ‖U − XR‖_F² + (2/(25 κ²‖X‖²)) ‖∇f(U)‖_F²
                  ≥ (σ_r²(X)/5) ‖U − XR‖_F² + (2/(25 κ²‖X‖²)) ‖∇f(U)‖_F².

Since dist(U, X) ≤ (1/4) σ_r(X) means exactly that ‖U − XR‖_F ≤ (1/4) σ_r(X), the last inequality shows that f(U) obeys RC(5/σ_r²(X), 25 κ²‖X‖²/2, (1/4) σ_r(X)). The convergence result in Equation (3.4) now follows from Lemma 5.6. All that remains is to prove Lemmas 5.7 and 5.8.

5.2.1 Proof of the regularity condition for the average function (Lemma 5.7)

We first state some properties of the Procrustes problem and its optimal solution. Let U, X ∈ R^{n×r} and define H := U − XR, where R is the orthogonal matrix which minimizes ‖U − XR‖_F. Let AΣB^T be the SVD of X^T U; we know that the optimal R is equal to R = AB^T. Thus, U^T XR = BΣB^T = (XR)^T U, which shows that U^T XR is a symmetric PSD matrix. Furthermore, since

  H^T XR = U^T XR − R^T X^T XR = (XR)^T U − R^T X^T XR = (XR)^T (U − XR) = (XR)^T H,

we can conclude that H^T XR is symmetric.

To avoid carrying R in our equations we perform the change of variable X ← XR. That is, without loss of generality we assume R = I, U^T X ⪰ 0, and H^T X = X^T H.

Note that for any U obeying dist(U, X) ≤ (1/4)‖X‖ we have

  ‖(UU^T − XX^T)U‖_F ≤ ‖UU^T − XX^T‖_F ‖U‖ ≤ (5/4) ‖X‖ ‖UU^T − XX^T‖_F.

Using the latter along with the simplifications discussed above, to prove the lemma (i.e. (5.5)) it suffices to prove

  ⟨(UU^T − XX^T)U, U − X⟩ ≥ (1/20)(‖UU^T − XX^T‖_F² + ‖(U − X)U^T‖_F²) + (σ_r²(X)/4) ‖U − X‖_F² + (1/(4κ²)) ‖UU^T − XX^T‖_F².   (5.10)

Equation (5.10) can equivalently be written in the form

  0 ≤ Tr( (H^T H)² + 3 H^T H H^T X + (H^T X)² + H^T H X^T X
        − (1/20)(1 + 5/κ²) [ (H^T H)² + 4 H^T H H^T X + 2 (H^T X)² + 2 H^T H X^T X ]
        − (1/20) [ (H^T H)² + 2 H^T H H^T X + H^T H X^T X ]
        − (σ_r²(X)/4) H^T H ).

Rearranging terms, we arrive at

  0 ≤ Tr( c₁ (H^T H)² + c₂ H^T H H^T X + c₃ (H^T X)² + c₄ H^T H X^T X − (σ_r²(X)/4) H^T H )
    = Tr( ( (c₂/(2√c₃)) H^T H + √c₃ H^T X )² + (c₁ − c₂²/(4c₃)) (H^T H)² + c₄ H^T H X^T X − (σ_r²(X)/4) H^T H ).   (5.11)

Here the constants c₁, c₂, c₃, c₄ are defined as

  c₁ = 9/10 − 1/(4κ²),  c₂ = 27/10 − 1/κ²,  c₃ = 9/10 − 1/(2κ²),  c₄ = 17/20 − 1/(2κ²).

Since κ ≥ 1, it is easy to verify that (a) c_i > 0 for i = 1, …, 4, and (b) c₁ ≤ c₂²/(4c₃). To prove (5.11) it thus suffices to prove

  ‖H‖² ≤ ((c₄ − 1/4) / (c₂²/(4c₃) − c₁)) σ_r²(X).

Using the fact that ‖H‖ ≤ (1/4) σ_r(X), we see that the proof is complete by verifying that (c₄ − 1/4)/(c₂²/(4c₃) − c₁) ≥ 1/16 for all κ ≥ 1.

5.2.2 Proof of gradient concentration (Lemma 5.8)

Define ∆ := UU^T − XX^T. Then,

  ⟨∇f(U) − ∇F(U), V⟩ = Σ_{k=1}^m ( ⟨A_k, ∆⟩ ⟨A_k, VU^T⟩ − E[⟨A_k, ∆⟩ ⟨A_k, VU^T⟩] )
    (a)= Σ_{k=1}^m ⟨A_k, ∆⟩ ⟨A_k, VU^T⟩ − ⟨∆, VU^T⟩
    (b)≤ δ_{4r} ‖UU^T − XX^T‖_F ‖VU^T‖_F,

where (a) follows from our assumption that A is isotropic and (b) follows from Lemma 5.1, since rank(∆) ≤ 2r and rank(VU^T) ≤ r. This proves the first part of the lemma. To prove the second part, by the variational form of the Frobenius norm we have

  ‖∇f(U) − ∇F(U)‖_F = sup_{V ∈ R^{n×r}, ‖V‖_F ≤ 1} ⟨∇f(U) − ∇F(U), V⟩ ≤ δ_{4r} ‖UU^T − XX^T‖_F sup_{V ∈ R^{n×r}, ‖V‖_F ≤ 1} ‖VU^T‖_F.

The result now follows from ‖VU^T‖_F ≤ ‖V‖_F ‖U‖ ≤ ‖U‖.

5.3 Proof of initialization (Equation (3.3))

Using Lemma 5.1, we can conclude that ρ(A) from Lemma 5.2 is bounded by ρ(A) ≤ 2δ_{4r} ≤ 1/5. Setting Z_0 = 0_{n×n} and applying Lemma 5.2 to our initialization iterates, we have that

  ‖Z_τ − XX^T‖_F ≤ (1/5)^τ ‖XX^T‖_F ≤ (1/5)^τ ‖X‖ ‖X‖_F.

From Lemma 5.4, we have that

  dist(U_0, X) ≤ √2 ‖Z_τ − XX^T‖_F / σ_r(X) ≤ √2 (1/5)^τ κ ‖X‖_F.

Hence, if we want the right-hand side to be upper bounded by (1/4) σ_r(X), we require

  √2 (1/5)^τ κ ‖X‖_F ≤ (1/4) σ_r(X),   i.e.,   (1/5)^τ ≤ σ_r(X) / (4√2 κ ‖X‖_F).

Since ‖X‖_F ≤ √r ‖X‖, it is enough to require that

  τ ≥ log(√r κ²) + 2.   (5.12)

Similarly, it is easy to check that if τ satisfies (5.12), then

  ‖Z_τ − XX^T‖ ≤ ‖Z_τ − XX^T‖_F ≤ σ_r²(X)/4

also holds.

Acknowledgements

BR is generously supported by ONR awards, NSF awards, an AFOSR award, and a Sloan Research Fellowship. This research is supported in part by an NSF CISE Expeditions Award, an LBNL Award, a DARPA XData Award, and gifts from Amazon Web Services, Google, SAP, The Thomas and Stacey Siebel Foundation, Adatao, Adobe, Apple Inc., Blue Goji, Bosch, C3Energy, Cisco, Cray, Cloudera, EMC2, Ericsson, Facebook, Guavus, HP, Huawei, Informatica, Intel, Microsoft, NetApp, Pivotal, Samsung, Schlumberger, Splunk, Virdata and VMware.

References

[AM07] D. Achlioptas and F. McSherry. Fast computation of low-rank matrix approximations. Journal of the ACM, 54(2), 2007.
[BD09] T. Blumensath and M. E. Davies. Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27(3), 2009.
[Can08] E. J. Candès. The restricted isometry property and its implications for compressed sensing. Comptes Rendus de l'Académie des Sciences, 2008.
[CCS10] J. Cai, E. J. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 2010.
[CLM15] T. T. Cai, X. Li, and Z. Ma. Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. CoRR, 2015.
[CLS14a] E. J. Candès, X. Li, and M. Soltanolkotabi. Phase retrieval from coded diffraction patterns. Applied and Computational Harmonic Analysis, 39(2), 2014.
[CLS14b] E. J. Candès, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. CoRR, 2014.
[CP11] E. J. Candès and Y. Plan. Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. IEEE Transactions on Information Theory, 57(4), 2011.
[CR09] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 2009.
[CT05] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12), 2005.
[Fun06] S. Funk. Netflix update: Try this at home, December 2006.
[GK09] R. Garg and R. Khandekar. Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property. In ICML, 2009.
[Gro11] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3), 2011.
[JBAS10] M. Journée, F. Bach, P.-A. Absil, and R. Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5), 2010.
[JNS12] P. Jain, P. Netrapalli, and S. Sanghavi. Low-rank matrix completion using alternating minimization. CoRR, 2012.

[Kes12] R. H. Keshavan. Efficient algorithms for collaborative filtering. PhD thesis, Stanford University, 2012.
[KMO10] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6), 2010.
[LRS+10] J. Lee, B. Recht, N. Srebro, J. A. Tropp, and R. Salakhutdinov. Practical large-scale optimization for max-norm regularization. In NIPS, 2010.
[MJD09] R. Meka, P. Jain, and I. S. Dhillon. Guaranteed rank minimization via singular value projection. CoRR, 2009.
[NT09] D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3), 2009.
[NV09] D. Needell and R. Vershynin. Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Foundations of Computational Mathematics, 9(3), 2009.
[ORS15] S. Oymak, B. Recht, and M. Soltanolkotabi. Sharp time-data tradeoffs for linear inverse problems. In preparation, 2015.
[Rec11] B. Recht. A simpler approach to matrix completion. Journal of Machine Learning Research, 12, 2011.
[RFP10] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 2010.
[RR13] B. Recht and C. Ré. Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation, 2013.
[RS05] J. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In ICML, 2005.
[Ruh74] A. Ruhe. Numerical computation of principal components when several observations are missing. Technical report, University of Umeå, Institute of Mathematics and Statistics, 1974.
[Sol14] M. Soltanolkotabi. Algorithms and Theory for Clustering and Nonconvex Quadratic Programming. PhD thesis, Stanford University, 2014.
[SOR14] C. De Sa, K. Olukotun, and C. Ré. Global convergence of stochastic gradient descent for some nonconvex matrix problems. CoRR, 2014.
[TG07] J. A. Tropp and A. C. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12), 2007.
[ZL15] Q. Zheng and J. Lafferty. A convergent gradient descent algorithm for rank minimization and semidefinite programming from random linear measurements. CoRR, 2015.

A Proof of Lemma 5.4

Define H = U − XR. Similar to the discussion at the beginning of Section 5.2.1, without loss of generality we can assume that (a) R = I, (b) U^T X ⪰ 0, and (c) H^T X = X^T H. With these simplifications, establishing the lemma is equivalent to showing that

  Tr( (H^T H)² + 4 H^T H H^T X + 2 (H^T X)² + 2 X^T X H^T H − η H^T H ) ≥ 0   (A.1)

holds with η = 2(√2 − 1) σ_r²(X).

We note that

  Tr( (H^T H + √2 H^T X)² + (4 − 2√2) H^T H H^T X + 2 X^T X H^T H − η H^T H )
    = Tr( (H^T H)² + 4 H^T H H^T X + 2 (H^T X)² + 2 X^T X H^T H − η H^T H ).

Hence, a sufficient condition for (A.1) to hold is

  (4 − 2√2) H^T X + 2 X^T X − η I_r ⪰ 0.   (A.2)

Recalling that H^T X = U^T X − X^T X, and that U^T X ⪰ 0, we have

  (4 − 2√2) H^T X + 2 X^T X − η I_r = (4 − 2√2) U^T X + (2 − (4 − 2√2)) X^T X − η I_r = (4 − 2√2) U^T X + 2(√2 − 1) X^T X − η I_r.

Since U^T X ⪰ 0, to show (A.2) it suffices to show

  2(√2 − 1) X^T X − η I_r ⪰ 0,   i.e.,   X^T X ⪰ (η/(2(√2 − 1))) I_r = σ_r²(X) I_r.

The right-hand side trivially holds, concluding the proof.

B Proof of Lemma 3.4

From RIP and the assumption that δ_{2r} ≤ 1/10, we have

  ‖Z_τ − XX^T‖_F ≤ (1/√(1 − δ_{2r})) ‖A(Z_τ − XX^T)‖_{ℓ2} ≤ √(10/9) e_τ.

By Weyl's inequalities, this means that

  σ_r²(X) ≥ σ_r(Z_τ) − √(10/9) e_τ.   (B.1)

Lemma 5.4 ensures that

  dist(U_0, X) ≤ ‖Z_τ − XX^T‖_F / (√(2(√2 − 1)) σ_r(X)).

We can upper bound the right-hand side by the following chain of inequalities:

  ‖Z_τ − XX^T‖_F / (√(2(√2 − 1)) σ_r(X))
    (a)≤ √(10/9) e_τ / (√(2(√2 − 1)) σ_r(X))
    (b)≤ σ_r(Z_τ) / (6 √(2(√2 − 1)) σ_r(X))
    (c)≤ (6/5) σ_r²(X) / (6 √(2(√2 − 1)) σ_r(X)) = σ_r(X) / (5 √(2(√2 − 1))) ≤ (1/4) σ_r(X),

where (a) follows from RIP, (b) follows since e_τ ≤ (3/20) σ_r(Z_τ) implies that √(10/9) e_τ ≤ (1/6) σ_r(Z_τ), and (c) follows because (B.1) combined with (b) gives σ_r(Z_τ) ≤ σ_r²(X) + (1/6) σ_r(Z_τ), i.e. σ_r(Z_τ) ≤ (6/5) σ_r²(X).


More information

Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization

Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization Xingguo Li Jarvis Haupt Dept of Electrical and Computer Eng University of Minnesota Email: lixx1661, jdhaupt@umdedu

More information

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Tao Wu Institute for Mathematics and Scientific Computing Karl-Franzens-University of Graz joint work with Prof.

More information

Error Correction via Linear Programming

Error Correction via Linear Programming Error Correction via Linear Programming Emmanuel Candes and Terence Tao Applied and Computational Mathematics, Caltech, Pasadena, CA 91125 Department of Mathematics, University of California, Los Angeles,

More information

arxiv: v2 [stat.ml] 27 Sep 2016

arxiv: v2 [stat.ml] 27 Sep 2016 Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach arxiv:1609.030v [stat.ml] 7 Sep 016 Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, and Sujay Sanghavi

More information

Conditions for Robust Principal Component Analysis

Conditions for Robust Principal Component Analysis Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and

More information

Strengthened Sobolev inequalities for a random subspace of functions

Strengthened Sobolev inequalities for a random subspace of functions Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)

More information

Incremental Reshaped Wirtinger Flow and Its Connection to Kaczmarz Method

Incremental Reshaped Wirtinger Flow and Its Connection to Kaczmarz Method Incremental Reshaped Wirtinger Flow and Its Connection to Kaczmarz Method Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yingbin Liang Department of EECS Syracuse

More information

On the l 1 -Norm Invariant Convex k-sparse Decomposition of Signals

On the l 1 -Norm Invariant Convex k-sparse Decomposition of Signals On the l 1 -Norm Invariant Convex -Sparse Decomposition of Signals arxiv:1305.6021v2 [cs.it] 11 Nov 2013 Guangwu Xu and Zhiqiang Xu Abstract Inspired by an interesting idea of Cai and Zhang, we formulate

More information

CSC 576: Variants of Sparse Learning

CSC 576: Variants of Sparse Learning CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in

More information

Universal low-rank matrix recovery from Pauli measurements

Universal low-rank matrix recovery from Pauli measurements Universal low-rank matrix recovery from Pauli measurements Yi-Kai Liu Applied and Computational Mathematics Division National Institute of Standards and Technology Gaithersburg, MD, USA yi-kai.liu@nist.gov

More information

arxiv: v1 [math.na] 26 Nov 2009

arxiv: v1 [math.na] 26 Nov 2009 Non-convexly constrained linear inverse problems arxiv:0911.5098v1 [math.na] 26 Nov 2009 Thomas Blumensath Applied Mathematics, School of Mathematics, University of Southampton, University Road, Southampton,

More information

Lecture Notes 10: Matrix Factorization

Lecture Notes 10: Matrix Factorization Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To

More information

Recovery of Simultaneously Structured Models using Convex Optimization

Recovery of Simultaneously Structured Models using Convex Optimization Recovery of Simultaneously Structured Models using Convex Optimization Maryam Fazel University of Washington Joint work with: Amin Jalali (UW), Samet Oymak and Babak Hassibi (Caltech) Yonina Eldar (Technion)

More information

ORTHOGONAL matching pursuit (OMP) is the canonical

ORTHOGONAL matching pursuit (OMP) is the canonical IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 9, SEPTEMBER 2010 4395 Analysis of Orthogonal Matching Pursuit Using the Restricted Isometry Property Mark A. Davenport, Member, IEEE, and Michael

More information

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016 Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use

More information