arxiv: v2 [stat.ml] 22 Nov 2016


Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent

Qinqing Zheng, John Lafferty
University of Chicago
November 3, 2016

Abstract

We address the rectangular matrix completion problem by lifting the unknown matrix to a positive semidefinite matrix in higher dimension, and optimizing a nonconvex objective over the semidefinite factor using a simple gradient descent scheme. With $O(\mu r^2 \kappa^2 n \max(\mu, \log n))$ random observations of an $n_1 \times n_2$ $\mu$-incoherent matrix of rank $r$ and condition number $\kappa$, where $n = \max(n_1, n_2)$, the algorithm linearly converges to the global optimum with high probability.

1 Introduction

A growing body of recent research is shedding new light on the role of nonconvex optimization for tackling large scale problems in machine learning, signal processing, and convex programming. This work is developing techniques that help to explain the surprising effectiveness of relatively simple first-order algorithms for certain nonconvex optimizations. When applied to problems that can be formulated as semidefinite programs, these techniques can often be viewed as part of a framework proposed by Burer and Monteiro [4]. The Burer-Monteiro technique is based on factoring the semidefinite variable, and applying classical optimization techniques to the resulting nonconvex objective over the factor. While worst-case complexity considerations imply that such an approach cannot succeed in general, a series of recent papers [11, 40, 35, 13, 1] has shown the strategy to be remarkably effective for a number of problems of practical interest, with analytical convergence guarantees and strong empirical performance.

In this paper, we enlarge the collection of problems to which the Burer-Monteiro technique can be successfully applied, by analyzing the convergence properties of gradient descent applied to the problem of rectangular matrix completion from incomplete measurements.

The standard matrix completion problem asks for the recovery of a low rank matrix $X^* \in \mathbb{R}^{n_1 \times n_2}$ given only a small fraction of observed entries. Let $\Omega$ be the set of $m$ indices of the observed entries. Fixing a target rank $r \ll \min(n_1, n_2)$, the natural, but nonconvex objective is

$$
\min_{X \in \mathbb{R}^{n_1 \times n_2}} \operatorname{rank}(X) \quad \text{subject to } X_{ij} = X^*_{ij},\ (i,j) \in \Omega. \tag{1}
$$

In order for this problem to be well-posed, it is important to understand when $X^*$ is identifiable and, in particular, the unique minimizer of (1). Moreover, because the problem is in general NP-hard, it is essential to identify tractable families of instances, together with efficient algorithms having global convergence guarantees.

In the current work, we apply the factorization technique by lifting the matrix $X^*$ to a positive semidefinite matrix $Y^* \in \mathbb{R}^{(n_1+n_2) \times (n_1+n_2)}$ in higher dimension. Lifting is an established method that recasts vector or matrix estimation problems in terms of positive semidefinite matrices with special structure. It has been applied to sparse eigenvector approximation [14] and phase retrieval [10], where the lifted matrix is of rank one. As explained in detail below, we can construct $Y^*$ to be of the same rank as $X^*$, thus obtaining a factorization $Y^* = Z^* Z^{*\top}$ for some $Z^* \in \mathbb{R}^{(n_1+n_2) \times r}$, and transforming the original matrix completion problem into the problem of recovering the semidefinite factor $Z^*$. We formulate this as minimizing a nonconvex objective $f(Z)$, to which we apply a gradient descent scheme, using a particular spectral initialization. Our analysis of this algorithm establishes a lower bound on the number of matrix measurements that are sufficient to guarantee identifiability of the true matrix and geometric convergence of the gradient descent algorithm, with explicit bounds on the rate.

In the following section we give a full description of our approach. Our theoretical results are presented in Section 3, with detailed proofs contained in the appendix. Our analysis subsumes the case where $X^*$ is positive semidefinite. In Section 4 we briefly review related work. The experimental results are presented in Section 5, and we conclude with a brief discussion of future work in Section 6.

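2 Semidefinite Lifting, Factorization, and Gradient Descent

For any $(n_1+n_2) \times r$ matrix $Z$, we will use $Z_{(i)}$ to denote its $i$th row, and $Z_U$ and $Z_V$ to denote the top $n_1$ and bottom $n_2$ rows. The operator, Frobenius and $\ell_\infty$ norms of matrices are denoted by $\|\cdot\|$, $\|\cdot\|_F$ and $\|\cdot\|_\infty$, respectively. We define $\|Z\|_{2,\infty} = \max_i \|Z_{(i)}\|_2$ as the largest $\ell_2$ norm of its rows, and similarly $\|Z\|_{\infty,2} = \max\{\|Z\|_{2,\infty}, \|Z^\top\|_{2,\infty}\}$. Let $P_\Omega : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^{n_1 \times n_2}$ be the operator where

$$
P_\Omega(X)_{ij} = \begin{cases} X_{ij} & \text{if } (i,j) \in \Omega, \\ 0 & \text{otherwise.} \end{cases} \tag{2}
$$

In this paper, we focus on completing an incoherent or non-spiky matrix $X^*$. With $U^* \Sigma^* V^{*\top}$ denoting the rank-$r$ SVD of $X^*$, we assume $X^*$ is $\mu$-incoherent, as defined below.

Definition 1. The matrix $X^*$ is $\mu$-incoherent with respect to the canonical basis if its singular vectors satisfy

$$
\|U^*\|_{2,\infty} \le \sqrt{\frac{\mu r}{n_1}}, \qquad \|V^*\|_{2,\infty} \le \sqrt{\frac{\mu r}{n_2}}, \tag{3}
$$

where $\mu$ is a constant. Note that $\mu \ge 1$, since $r = \|U^*\|_F^2 = \sum_{i \in [n_1]} \|U^*_{(i)}\|^2 \le \mu r$.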
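
The sampling operator and Definition 1 can be made concrete with a small NumPy sketch. The helper names `sample_mask`, `P_Omega` and `incoherence` are hypothetical and chosen here for illustration; `incoherence` returns the smallest $\mu$ for which the rank-$r$ singular vectors of a given matrix satisfy (3), assuming the rank-$r$ SVD is well defined.

```python
import numpy as np

def sample_mask(n1, n2, p, rng):
    """Bernoulli model: each entry is observed independently with probability p."""
    return rng.random((n1, n2)) < p

def P_Omega(X, mask):
    """Keep the observed entries of X and set the others to zero."""
    return np.where(mask, X, 0.0)

def incoherence(X, r):
    """Smallest mu such that the rank-r singular vectors of X satisfy (3)."""
    n1, n2 = X.shape
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    U, V = U[:, :r], Vt[:r].T
    row_U = np.sum(U**2, axis=1).max()   # squared (2,inf)-norm of U
    row_V = np.sum(V**2, axis=1).max()   # squared (2,inf)-norm of V
    return max(n1 * row_U, n2 * row_V) / r

flat = np.ones((6, 4))        # rank one, mass spread evenly over all entries
spiky = np.zeros((6, 4))      # rank one, all mass in a single entry
spiky[0, 0] = 1.0
print(incoherence(flat, 1), incoherence(spiky, 1))
```

The all-ones matrix attains the minimum value $\mu = 1$, while the single-spike matrix attains $\mu = \max(n_1, n_2)$, which is why spiky matrices are hard to complete from random samples.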

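Our main interest is the uniform model where $m$ entries of $X^*$ are observed uniformly at random, though we shall analyze a Bernoulli sampling model, where each entry of $X^*$ is observed with probability $p = m/(n_1 n_2)$. One can transfer the results back to the uniform model, as the probability of failure under the uniform model is at most twice that under the Bernoulli model; see [8, 9].

Using the rank-$r$ SVD of $X^*$, we can lift $X^*$ to

$$
Y^* = \begin{bmatrix} U^* \\ V^* \end{bmatrix} \Sigma^* \begin{bmatrix} U^* \\ V^* \end{bmatrix}^\top
= \begin{bmatrix} U^* \Sigma^* U^{*\top} & X^* \\ X^{*\top} & V^* \Sigma^* V^{*\top} \end{bmatrix}
= Z^* Z^{*\top}, \quad \text{where } Z^* = \begin{bmatrix} U^* \\ V^* \end{bmatrix} \Sigma^{*1/2}. \tag{4}
$$

The symmetric decomposition of $Y^*$ is not unique; our goal is to find a matrix in the set

$$
S = \bigl\{ Z \in \mathbb{R}^{(n_1+n_2) \times r} : Z = Z^* R \text{ for some } R \text{ with } R R^\top = R^\top R = I \bigr\}, \tag{5}
$$

since for any $Z \in S$ we have $X^* = Z_U Z_V^\top$. Let $\bar\Omega$ denote the corresponding observed entries of $Y^*$, and consider minimization of the squared error

$$
\min_Z \frac{1}{2p} \sum_{(i,j) \in \bar\Omega} \bigl( (ZZ^\top)_{ij} - Y^*_{ij} \bigr)^2 = \min_Z \frac{1}{2p} \bigl\| P_{\bar\Omega}(ZZ^\top - Y^*) \bigr\|_F^2. \tag{6}
$$

Note that $Y^*$ is not the unique minimizer of (6), nor is it the only possible positive semidefinite lifting of $X^*$. For example, let $P$ be an $r \times r$ nonsingular matrix, and form the matrices

$$
\tilde Z = \begin{bmatrix} U^* \Sigma^{*1/2} P \\ V^* \Sigma^{*1/2} P^{-\top} \end{bmatrix}, \qquad
\tilde Y = \tilde Z \tilde Z^\top = \begin{bmatrix} U^* \Sigma^{*1/2} P P^\top \Sigma^{*1/2} U^{*\top} & X^* \\ X^{*\top} & V^* \Sigma^{*1/2} P^{-\top} P^{-1} \Sigma^{*1/2} V^{*\top} \end{bmatrix}. \tag{7}
$$

Since $\bar\Omega$ does not contain any entry in the top-left or bottom-right block, $\tilde Y$ is also a minimizer of (6). Thus, the solution set of the lifted problem is much larger than the set $S$ of actual interest. For the sake of simple analysis, we shall focus on exact recovery of $Y^*$ only, and thus impose an additional regularizer to align the column spaces of $Z_U$ and $Z_V$, as in [35]. The regularized loss is

$$
f(Z) = \frac{1}{2p} \bigl\| P_{\bar\Omega}(ZZ^\top - Y^*) \bigr\|_F^2 + \frac{\lambda}{4} \bigl\| Z^\top D Z \bigr\|_F^2, \quad \text{where } D = \begin{bmatrix} I_{n_1} & 0 \\ 0 & -I_{n_2} \end{bmatrix}. \tag{8}
$$

While this apparently introduces an extra tuning parameter, our analysis establishes linear convergence of the projected gradient descent algorithm when $\lambda = 1/2$, and thus one may treat $\lambda$ as a fixed number.

It is discussed in [13] that one needs to ensure the iterates stay incoherent. Let $C$ be the set of incoherent matrices

$$
C = \Bigl\{ Z : \|Z\|_{2,\infty} \le \sqrt{\frac{2 \mu r}{\min(n_1, n_2)}}\, \|Z^0\| \Bigr\}, \tag{9}
$$

where we assume $\mu$ is known and $Z^0$ will be determined.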
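
The lifting (4), the non-uniqueness example (7) and the role of the regularizer in (8) can be illustrated numerically. The following is a minimal sketch, not the authors' code: the function names `lift` and `objective` are hypothetical, and the data-fit term is summed over the observed entries of the top-right block only, which is an assumption about how $\bar\Omega$ is counted (it matches the form of the objective given in Appendix A).

```python
import numpy as np

def lift(X, r):
    """Z* = [U; V] Sigma^{1/2}, built from the rank-r SVD of X as in (4)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s[:r])
    return np.vstack([U[:, :r] * root, Vt[:r].T * root])

def objective(Z, X, mask, lam=0.5):
    """Regularized loss (8); the fit term uses the observed top-right block."""
    n1 = X.shape[0]
    p = mask.mean()
    ZU, ZV = Z[:n1], Z[n1:]
    fit = np.sum((mask * (ZU @ ZV.T - X)) ** 2) / (2 * p)
    reg = lam / 4 * np.linalg.norm(ZU.T @ ZU - ZV.T @ ZV) ** 2  # ||Z^T D Z||_F^2
    return fit + reg

rng = np.random.default_rng(0)
n1, n2, r = 8, 6, 2
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
mask = rng.random((n1, n2)) < 0.7
Z_star = lift(X, r)

Q, _ = np.linalg.qr(rng.standard_normal((r, r)))   # an orthogonal R, so Z* R is in S
P = np.array([[2.0, 1.0], [0.0, 1.0]])             # nonsingular but not orthogonal
Z_rot = Z_star @ Q
Z_tilde = np.vstack([Z_star[:n1] @ P, Z_star[n1:] @ np.linalg.inv(P).T])

for name, Z in [("Z*", Z_star), ("Z* R", Z_rot), ("Z tilde", Z_tilde)]:
    fit_err = np.linalg.norm(Z[:n1] @ Z[n1:].T - X)
    print(name, fit_err, objective(Z, X, mask))
```

All three factors reproduce $X^*$ in the top-right block, but only the elements of $S$ have zero regularized loss; the factor built from a non-orthogonal $P$ is penalized by the term $\|Z^\top D Z\|_F^2$.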

Our algorithm is simply gradient descent on $f(Z)$, with projection onto $C$. Let $M = p^{-1} P_\Omega(Z_U Z_V^\top - X^*)$. Then the gradient of $f$ is given by

$$
\nabla f(Z) = \begin{bmatrix} 0 & M \\ M^\top & 0 \end{bmatrix} Z + \lambda D Z Z^\top D Z. \tag{10}
$$

The projection $P_C$ to the feasible set $C$ has a closed form solution, given by row-wise clipping:

$$
P_C(Z)_{(i)} = \begin{cases}
Z_{(i)} & \text{if } \|Z_{(i)}\| \le \sqrt{\frac{2\mu r}{\min(n_1, n_2)}}\, \|Z^0\|, \\[4pt]
\dfrac{Z_{(i)}}{\|Z_{(i)}\|} \sqrt{\frac{2\mu r}{\min(n_1, n_2)}}\, \|Z^0\| & \text{otherwise.}
\end{cases} \tag{11}
$$

Note that $X^0 = p^{-1} P_\Omega(X^*)$ is an unbiased estimator of $X^*$ under the Bernoulli model. To initialize, we thus construct $Z^0$ from the top rank-$r$ factors of $X^0$. This leads to the following algorithm.

Algorithm 1: Projected gradient descent for matrix completion

input: $\Omega$, $\{X^*_{ij} : (i,j) \in \Omega\}$, $m$, $n_1$, $n_2$, $r$, $\lambda$, $\eta$
initialization:
    $p = m/(n_1 n_2)$
    $U^0 \Sigma^0 V^{0\top}$ = rank-$r$ SVD of $p^{-1} P_\Omega(X^*)$
    $Z^0 = [U^0 (\Sigma^0)^{1/2};\ V^0 (\Sigma^0)^{1/2}]$
    $Z^1 = P_C(Z^0)$
    $k \leftarrow 1$
repeat
    $M^k = p^{-1} P_\Omega(Z_U^k Z_V^{k\top} - X^*)$
    $\nabla f(Z^k) = \begin{bmatrix} 0 & M^k \\ M^{k\top} & 0 \end{bmatrix} Z^k + \lambda D Z^k Z^{k\top} D Z^k$
    $Z^{k+1} = P_C\bigl(Z^k - \frac{\eta}{\|Z^0\|^2} \nabla f(Z^k)\bigr)$
    $k \leftarrow k + 1$
until convergence
output: $\hat Z = Z^k$, $\hat X = Z_U^k Z_V^{k\top}$.

Remarks. (i) The step size $\eta$ is normalized by $\|Z^0\|^2$. Our analysis will establish linear convergence when taking step sizes of the form $\eta/\sigma_1^*$, where $\eta$ is a sufficiently small constant. We replace $\sigma_1^*$ by $\|Z^0\|^2$ in the actual algorithm since it is unknown in practice. (ii) The feasible set (9) depends on $Z^0$ as well. Under the above spectral initialization, our analysis shows that when $p \ge O(\mu r^2 \kappa^2 \log n / \min(n_1, n_2))$, the term $\sqrt{2\mu r/\min(n_1, n_2)}\, \|Z^0\|$ is an upper bound of $\|Z^*\|_{2,\infty}$ with high probability (see Corollary 1 below). This means $S$ is a subset of $C$. Note that this does not change the global optimality of $Z^*$ and its equivalent elements, since $f(Z^*) = 0$. In practice, we find
that the iterates of our algorithm remain incoherent, so that one may drop the projection step. (iii) The column space regularizer (8) is needed in our analysis. We also found that when $\lambda = 0$, our algorithm typically converges to another PSD lifted matrix of $X^*$, with minor difference from $Y^*$ in the top-left and bottom-right blocks.

In the following section we state and sketch a proof of our main convergence result for this algorithm.

3 Main Result: Convergence Analysis

Theorem 1. Suppose that $X^*$ is of rank $r$, with condition number $\kappa = \sigma_1^*/\sigma_r^*$, and $\mu$-incoherent as defined in Definition 1. Suppose further that we observe $m$ entries of $X^*$ chosen uniformly at random. Let $Y^* = Z^* Z^{*\top}$ be the lifted matrix as in (4) and write $n = \max(n_1, n_2)$. Then there exist universal constants $c_0, c_1, c_2, c_3$ such that if

$$
m \ge c_0\, \mu r^2 \kappa^2 \max(\mu, \log n)\, n, \tag{12}
$$

then with probability at least $1 - c_1 n^{-c_2}$ the iterates of Algorithm 1 converge to $Z^*$ geometrically, when using regularization parameter $\lambda = 1/2$, correctly specified input rank $r$, and constant step size $\eta/\sigma_1^*$ with $\eta \le c_3/(\mu^2 r^2 \kappa)$.

We shall analyze the Bernoulli sampling model, as justified in Section 2. Let us define the distance to $Z^*$ in terms of the solution set $S$.

Definition 2. Define the distance between $Z$ and $Z^*$ as

$$
d(Z, Z^*) = \min_{\tilde Z \in S} \|Z - \tilde Z\|_F = \min_{R R^\top = R^\top R = I} \|Z - Z^* R\|_F.
$$

The next theorem establishes the global convergence of Algorithm 1, assuming that the input rank is correctly specified. The proof sketch is given in the next subsection.

Theorem 2. There exist universal constants $c_0, c_1, c_2$ such that if $p \ge c_0\, \mu r^2 \kappa^2 \log n / \min(n_1, n_2)$, with probability at least $1 - c_1 n^{-c_2}$, the initialization $Z^1 \in C$ satisfies

$$
d(Z^1, Z^*) \le \tfrac{1}{4} \sqrt{\sigma_r^*}. \tag{13}
$$

Moreover, there exist universal constants $c_3, c_4, c_5, c_6$ such that if $p \ge c_3 \max(\mu^2 r^2 \kappa^2, \mu r \log n) / \min(n_1, n_2)$, when using constant step size $\eta/\sigma_1^*$ with $\eta \le c_4/(\mu^2 r^2 \kappa)$ and initial value $Z^1 \in C$ obeying (13), the $k$th step of Algorithm 1 with $\lambda = 1/2$ satisfies

$$
d(Z^k, Z^*) \le \tfrac{1}{4} \Bigl( 1 - \tfrac{99}{256} \tfrac{\eta}{\kappa} \Bigr)^{k/2} \sqrt{\sigma_r^*}
$$

with probability at least $1 - c_5 n^{-c_6}$.
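
Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function name `projected_gd`, the dense boolean `mask` used to represent $\Omega$, the default `eta=0.5` and the fixed iteration count (in place of a convergence test) are all choices made here. The incoherence parameter `mu` is assumed known, as in the text, and the clipping radius uses the factor $\sqrt{2\mu r/\min(n_1, n_2)}\,\|Z^0\|$ from (9).

```python
import numpy as np

def projected_gd(X_obs, mask, r, mu, lam=0.5, eta=0.5, iters=2000):
    """Sketch of Algorithm 1. X_obs holds the observed entries (zero elsewhere),
    mask is the boolean indicator of Omega."""
    n1, n2 = X_obs.shape
    p = mask.sum() / (n1 * n2)
    # Spectral initialization from the rank-r SVD of p^{-1} P_Omega(X).
    U, s, Vt = np.linalg.svd(X_obs / p, full_matrices=False)
    root = np.sqrt(s[:r])
    Z = np.vstack([U[:, :r] * root, Vt[:r].T * root])
    z0_norm = np.linalg.norm(Z, 2)                      # operator norm of Z^0
    radius = np.sqrt(2 * mu * r / min(n1, n2)) * z0_norm

    def project(Z):
        # Row-wise clipping onto C, equation (11).
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        return Z * np.minimum(1.0, radius / np.maximum(norms, 1e-12))

    # Multiplying by D flips the sign of the bottom n2 rows.
    sign = np.concatenate([np.ones(n1), -np.ones(n2)])[:, None]
    Z = project(Z)
    for _ in range(iters):
        ZU, ZV = Z[:n1], Z[n1:]
        M = mask * (ZU @ ZV.T - X_obs) / p              # p^{-1} P_Omega(Z_U Z_V^T - X)
        grad = np.vstack([M @ ZV, M.T @ ZU])            # [0 M; M^T 0] Z
        grad += lam * sign * (Z @ (Z.T @ (sign * Z)))   # lambda D Z Z^T D Z
        Z = project(Z - eta / z0_norm**2 * grad)
    return Z[:n1] @ Z[n1:].T

# Small synthetic instance: rank 2, condition number 2, 80% of entries observed.
rng = np.random.default_rng(0)
n1, n2, r = 40, 30, 2
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
X = U @ np.diag([2.0, 1.0]) @ V.T
mask = rng.random((n1, n2)) < 0.8
mu = max(n1 * (U**2).sum(1).max(), n2 * (V**2).sum(1).max()) / r
X_hat = projected_gd(X * mask, mask, r, mu)
print("relative error:", np.linalg.norm(X_hat - X) / np.linalg.norm(X))
```

The regularizer gradient is evaluated as $D\,(Z\,(Z^\top (DZ)))$, so that no $(n_1+n_2) \times (n_1+n_2)$ matrix is formed. A practical implementation would store $M^k$ as a sparse matrix, which is what the per-iteration cost discussed in Section 5 relies on.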

Remarks. (i) After each update, the distance of our iterates to $Z^*$ is reduced by at least a factor of $1 - O(1/(\mu^2 r^2 \kappa^2))$. (ii) Hence, the output $\hat Z$ satisfies $d(\hat Z, Z^*) \le \varepsilon$ after at most

$$
2 \log\bigl( \sqrt{\sigma_r^*} / 4\varepsilon \bigr) \Big/ \log\Bigl( 1 \Big/ \bigl( 1 - \tfrac{99}{256} \tfrac{\eta}{\kappa} \bigr) \Bigr)
$$

iterations.

3.1 Proof Sketch

Our proof idea is of the same nature as the analysis in [11, 40]. We show two appealing properties when sufficient entries are observed. First, our spectral initialization produces a starting point within the $O(\sqrt{\sigma_r^*})$ neighborhood of the solution set.

Lemma 1. There exist universal constants $c, c_1, c_2$ such that if $p \ge c\, \mu r^2 \kappa^2 \log n / \min(n_1, n_2)$, then with probability at least $1 - c_1 n^{-c_2}$,

$$
d(Z^1, Z^*) \le d(Z^0, Z^*) \le \tfrac{1}{4} \sqrt{\sigma_r^*}.
$$

To demonstrate this, we exploit the concentration around the mean of $p^{-1} P_\Omega(X^*)$. See Appendix B for the proof. Using this lemma, we can immediately show that $Z^*$ and all other elements of $S$ are contained in the feasible set (9).

Corollary 1. With probability at least $1 - c_1 n^{-c_2}$,

$$
\|Z^*\|_{2,\infty} \le \sqrt{\frac{2\mu r}{\min(n_1, n_2)}}\, \|Z^0\|.
$$

The second crucial property is that $f(Z)$ is well-behaved within the $O(\sqrt{\sigma_r^*})$ neighborhood, so that the iterates move closer to the optima in every iteration. The key step is to set up a local regularity condition [11] similar to Nesterov's conditions [8].

Definition 3. Let $\bar Z = \arg\min_{\tilde Z \in S} \|Z - \tilde Z\|_F$ denote the matrix closest to $Z$ in the solution set. We say that $f$ satisfies the regularity condition $RC(\varepsilon, \alpha, \beta)$ if there exist constants $\alpha, \beta$ such that for any $Z \in C$ satisfying $d(Z, Z^*) \le \varepsilon$, we have

$$
\langle \nabla f(Z), Z - \bar Z \rangle \ge \frac{1}{\alpha} \sigma_r^* \|Z - \bar Z\|_F^2 + \frac{1}{\beta \sigma_1^*} \|\nabla f(Z)\|_F^2.
$$

Using this condition, one can show the iterates converge linearly to the optima if we start close enough to $Z^*$.

Lemma 2. Consider the update $Z^{k+1} = P_C\bigl(Z^k - \frac{\mu}{\sigma_1^*} \nabla f(Z^k)\bigr)$. If $f$ satisfies $RC(\varepsilon, \alpha, \beta)$, $d(Z^k, Z^*) \le \varepsilon$ and $0 < \mu \le \min(\alpha/2, 2/\beta)$, then

$$
d(Z^{k+1}, Z^*) \le \sqrt{1 - \frac{2\mu}{\alpha\kappa}}\; d(Z^k, Z^*).
$$
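
The distance of Definition 2 is an orthogonal Procrustes problem, so it can be evaluated without searching over rotations: the minimizing $R$ is $AB^\top$, where $A \Lambda B^\top$ is the SVD of $Z^{*\top} Z$ (this closed form is the one recalled in Appendix A). A minimal sketch, with the hypothetical helper name `dist_to_solution_set`:

```python
import numpy as np

def dist_to_solution_set(Z, Z_star):
    """Return d(Z, Z*) = min over orthogonal R of ||Z - Z* R||_F, and the minimizing R."""
    A, _, Bt = np.linalg.svd(Z_star.T @ Z)
    R = A @ Bt
    return np.linalg.norm(Z - Z_star @ R), R

rng = np.random.default_rng(1)
Z_star = rng.standard_normal((7, 2))
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))

# A rotated copy of Z* lies in S, so its distance should vanish up to rounding.
d_rot, _ = dist_to_solution_set(Z_star @ Q, Z_star)

# A perturbed factor: aligning first can only reduce the distance.
Z = Z_star + 0.1 * rng.standard_normal((7, 2))
d, R = dist_to_solution_set(Z, Z_star)
print(d_rot, d, np.linalg.norm(Z - Z_star))
```

With $\bar Z = Z^* R$ computed this way, the matrix $Z^\top \bar Z$ is symmetric positive semidefinite, which is the property the analysis in the appendix relies on.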

The following lemma illustrates the local regularity of $f(Z)$. Nesterov's criterion is established upon strong convexity and strong smoothness of the objective. Here we show that analogous curvature and smoothness conditions hold for $f(Z)$ locally within the $O(\sqrt{\sigma_r^*})$ neighborhood with high probability. Interestingly, we found that to show the local curvature condition holds, it suffices to set $\lambda = 1/2$. The proof can be found in Appendix C, for which we have generalized some technical lemmas of [13].

Lemma 3. Let the regularization constant be set to $\lambda = 1/2$. There exist universal constants $c, c_1, c_2$ such that if $p \ge c \max(\mu^2 r^2 \kappa^2, \mu r \log n) / \min(n_1, n_2)$, then $f$ satisfies $RC\bigl(\tfrac{1}{4}\sqrt{\sigma_r^*},\ 512/99,\ 13196\, \mu^2 r^2 \kappa\bigr)$ with probability at least $1 - c_1 n^{-c_2}$.

4 Related Work

Matrix completion is one instance of the general low rank linear inverse problem

$$
\text{find } X \text{ of minimum rank such that } \mathcal{A}(X) = b, \tag{14}
$$

where $\mathcal{A}$ is an affine transformation and $b = \mathcal{A}(X^*)$ is the measurement of the ground truth $X^*$. Considerable progress has been made towards algorithms for recovering $X^*$, including both convex and nonconvex approaches. One of the most popular methods is nuclear norm minimization, a convenient convex relaxation of rank minimization. It was first proposed in [15, 9], and analyzed under a certain restricted isometry property (RIP). Subsequent work clarified the conditions for reconstruction, and studied recovery guarantees for both exact and approximately low rank matrices, with or without noise [8, 9, 7, 1]. One significant advantage of this approach is its near-optimal sample complexity. Under the same incoherence assumption as ours, Chen [1] establishes the currently best-known bound of $O(\mu r n \log^2 n)$ samples. Using a closely related notion of incoherence, Negahban and Wainwright [7] show that if $X^*$ is $\alpha$-nonspiky with $\|X^*\|_\infty / \|X^*\|_F \le \alpha / \sqrt{n_1 n_2}$, then $O(\alpha^2 r n \log n)$ samples are sufficient for exact recovery. However, convexity and low sample complexity aside, in practice the power of nuclear norm relaxation is limited due to high computational cost.

The popular algorithms for nuclear norm minimization are proximal methods that perform iterative singular value thresholding [5, 34]. However, such algorithms do not scale to large instances because the per-iteration SVD is expensive. Another popular convex surrogate for the rank function is the max-norm [31, 17], given by $\|X\|_{\max} = \min_{X = UV^\top} \|U\|_{2,\infty} \|V\|_{2,\infty}$. For certain types of problems, the max-norm offers better generalization error bounds than the nuclear norm [30]. But practically solving large scale problems that incorporate the max-norm is also non-trivial. In 2010, Lee et al. [5] rephrased the max-norm constrained problem as an SDP, and applied Burer-Monteiro factorization. Although this ends up with an $\ell_{2,\infty}$ constraint similar to ours (9), we emphasize that the constraint plays a different role in our setting. While [31, 5] use it to promote low rank solutions, our purpose is to enforce incoherent solutions, and experimental results suggest that one can drop it. Moreover, the convergence of projected gradient descent for this problem was not previously understood.

In a parallel line of work, the problem of developing techniques that exactly solve nonconvex formulations has attracted significant recent research attention. In chronological order, Keshavan et al. [3] proposed a manifold gradient method for matrix completion. They factorize $X = U \Sigma V^\top$, where $U \in \mathbb{R}^{n_1 \times r}$, $U^\top U = n_1 I_r$ and $V \in \mathbb{R}^{n_2 \times r}$, $V^\top V = n_2 I_r$. Similar to our definition of $S$, the equivalence classes of $U$ and $V$ are Grassmann manifolds of $r$-dimensional subspaces. The authors then minimize the nonconvex objective $F(U, V) = \min_{S \in \mathbb{R}^{r \times r}} \|P_\Omega(U S V^\top - X^*)\|_F^2$ over the manifolds. In each iteration, $U$ and $V$ are updated along their manifold gradients, followed by the update of the optimal scaling matrix $S$. This algorithm can exactly reconstruct the matrix, though the convergence rate is unknown. However, its per-iteration update also has high computational complexity; see Section 5 for details. There are other manifold optimization methods for matrix completion including [, 6, 36].

In the same year, Jain et al. [1] suggested minimizing the squared residual $\|\mathcal{A}(X) - b\|^2$ under a rank constraint $\operatorname{rank}(X) \le r$. While this constraint is nonconvex, projection onto the feasible set can be computed using a low rank SVD. Under a certain RIP assumption on $\mathcal{A}$, Jain et al. establish the global convergence of projected gradient descent for (14). This algorithm is named Singular Value Projection (SVP). Yet in the setting of completion, only experimental support for the effectiveness of SVP is provided. More importantly, SVP also suffers from an expensive per-iteration SVD for large scale problems.

Keshavan [4], Jain et al. [] further analysed the alternating minimization procedure for (14). AltMin factorizes $X = UV^\top$ where $U \in \mathbb{R}^{n_1 \times r}$ and $V \in \mathbb{R}^{n_2 \times r}$, and alternately minimizes $\|\mathcal{A}(UV^\top) - b\|^2$ over $U$ and $V$, while fixing the other factor. The authors obtain sample complexity bounds with $r\kappa^8$ and $r^7\kappa^6$ dependency, respectively. In 2014, Hardt [19] improved the bounds to $r^2\kappa^2$.

Notably, all these works assume the use of resampling: independent sequences of samples $\Omega_k$, $k = 1, 2, \ldots$. In other words, in every iteration we can sample the true matrix under a certain Bernoulli model independently. However, in practice $\Omega$ is usually given and fixed. To get around the dependence on the sample sets, they partition $\Omega$ into a predefined number of subsets of equal size. However, sample sets obtained by partitioning are not independent, and partitioning, if used in practice, does not make the most efficient use of the data. Thus, Hardt and Wootters [0] considered a new resampling scheme. They assume a known generative model of $\{\Omega_k\}$, where each $\Omega_k$ is obtained under a Bernoulli model with probability $p_k$, $p = \sum_k p_k$ and $\Omega = \bigcup_k \Omega_k$. While not practical, under this assumption the authors obtain a sample complexity that is logarithmic in $\kappa$. Another theoretical disadvantage of the resampling scheme is that the sample complexity depends on the desired accuracy $\varepsilon$, as established by [4,, 19, 0]. As the accuracy goes to zero, the sample complexity increases. In contrast, our algorithm does not require resampling, and the sample complexity is independent of $\varepsilon$.

In 2014, Candès et al. [11] proposed Wirtinger flow for phase retrieval. Wirtinger flow is a fast first-order algorithm that minimizes a fourth order nonconvex objective, geometrically converging to the global optimum. While previous work [10, 6, 7] lifts the phase retrieval problem into an SDP where the solution is rank one, this work bridges SDP and first-order algorithms via the Burer-Monteiro technique. It has inspired further research on related topics; last year, the authors of [40, 35, 1, 13] considered factorizations for (14), assuming $X^*$ is semidefinite, and proved global optimality of first-order algorithms under appropriate initializations. Tu et al.
[35] have extended this algorithm to handle rectangular matrices via asymmetric factorization, and have shown exact recovery of $X^*$, assuming $\mathcal{A}$ satisfies a certain RIP. They use lifting implicitly, factorizing $X = Z_U Z_V^\top$ and applying gradient updates on both factors $Z_U$ and $Z_V$ simultaneously, with the nonconvex objective function

$$
g(Z_U, Z_V) = \frac{1}{2p} \bigl\| P_\Omega(Z_U Z_V^\top - X^*) \bigr\|_F^2 + \frac{\lambda}{4} \bigl\| Z_U^\top Z_U - Z_V^\top Z_V \bigr\|_F^2.
$$

Their proof strategy also shows convergence of $Z$ in the lifted space. For the specific case of matrix completion, Chen and Wainwright [13] obtained guarantees when $X^*$ is semidefinite. Our work generalizes the results obtained in [35, 13], extending the recent literature on first-order algorithms for factorized models.

After completing this work we learned of independent research of Sun and Luo [33], who also analysed a gradient algorithm for rectangular matrix completion. Their formulation is similar to ours, with additional Frobenius norm constraints on the factors. The authors established a sample complexity of $O(r^7 \kappa^6)$ observations; in comparison our bound scales as $O(r^2 \kappa^2)$. The authors also analyzed block coordinate descent type alternating minimization, which cyclically updates the rows of $U$ and then the rows of $V$, showing exact recovery of this algorithm without resampling. Recent independent work of Yi et al. [38] analyzes a gradient scheme for Robust PCA. Under the setting of partial observation without corruption, this is the standard matrix completion problem. In other related work, [39, 37] also study nonconvex optimization methods for matrix completion, using algorithms that still require a low rank SVD in each iteration.

5 Experiments

We conduct experiments on synthetic datasets to support our analytical results. As the column space regularizer and incoherence constraint of our gradient method (GD) are merely for analytical purposes, we drop them in all the experiments and simply optimize the $\ell_2$ loss $\frac{1}{2} \| P_{\bar\Omega}(ZZ^\top - Y^*) \|_F^2$.

We compare GD with SVP, OptSpace, nuclear norm minimization (nuclear) and trust region methods on Riemannian manifolds (trustregion). For nuclear, we rescale the standard objective to be

$$
\min_X \frac{1}{2\lambda} \bigl\| P_\Omega(X - X^*) \bigr\|_F^2 + \|X\|_*, \tag{16}
$$

where $\lambda = 0$ will enforce the minimizer fitting the observed values exactly. We use ADMM to solve (16). It is based on the algorithm for the matrix approach in [34], and can neatly handle the case $\lambda = 0$. We emphasize there is no computational difference between the cases where $\lambda$ is zero or not. All methods are implemented in MATLAB. We use the toolbox Manopt for trustregion [3] and the implementation of OptSpace from the authors. For AltMin, we use the same sample sets in every iteration. The experiments were run on a Linux machine with a 3.4GHz Intel Core i7 processor and 8 GB memory.

Computational Complexity. Table 1 summarizes the per-iteration complexity of all the methods for completing an $n \times n$ matrix. Since $M^k$ is a sparse matrix with $m$ nonzero entries, and we have
dropped the regularizer and constraint, our method GD only needs $mr + m + n^2 r$ operations to compute the gradient, and $4nr$ operations to update the iterate. The computation of nuclear is dominated by singular value thresholding and updating the objective value, which require a full SVD of cost $O(n^3)$. Similarly, SVP needs $O(n^2 r)$ operations to compute the rank-$r$ SVD for low rank projection. For OptSpace, $O(mr + n^2 r + nr)$ operations are needed to compute the manifold gradient and line search. The most expensive part is to determine the optimal scaling matrix $S \in \mathbb{R}^{r \times r}$, which boils down to solving an $r^2$ by $r^2$ dense linear system. In total $O(mr^3 + n^2 r + nr^4 + r^6)$ operations are used to construct and solve this system. For AltMin, in every iteration we have to solve $n_1 + n_2$ linear systems of size $r \times r$. See [3] for the exact formulation. The time cost for each iteration is $O(mr^2)$. One can see that GD requires less computation than the other methods. Though the dominating terms for SVP and GD are of the same order, in practice the partial SVD is more expensive than the gradient update, especially on large instances.

Table 1: Per-iteration complexities.

Method      Complexity
GD          $mr + m + n^2 r + 4nr$
SVP         $O(n^2 r)$
OptSpace    $O(mr^3 + n^2 r + nr^4 + r^6)$
nuclear     $O(n^3)$
AltMin      $O(mr^2)$

Runtime Comparison. We randomly generated a true matrix $X^*$ of size $4000 \times 2000$ and rank 3. It is constructed from the rank-3 SVD of a random matrix with i.i.d. normal entries. We sampled $m$ entries of $X^*$ uniformly at random, where $m$ is roughly equal to $nr \log n$ with $n = 4000$ and $r = 3$. For simplicity, we feed SVP, OptSpace and GD with the true rank. For all these methods, we use the randomized algorithm of Halko et al. [18] to compute the low rank SVD, which is approximately 15 times faster than the MATLAB built-in SVD on instances of such size. We report relative error measured in the Frobenius norm, defined as $\|\hat X - X^*\|_F / \|X^*\|_F$. For nuclear, we set $\lambda = 0$ to enforce exact fitting. The convergence speed of ADMM mildly depends on the choice of penalty parameter.
We tested 5 values (0.1, 0.2, 0.5, 1, 1.5) and selected 0.2, which leads to the fastest convergence. Similarly, for SVP, we would like to choose the largest step size for which the algorithm converges. We evaluated 15, 20, 30, 35, 40 and selected 30. The step size is chosen for GD in the same way. Five values (20, 50, 70, 75, 80) are tested for $\eta$ and we picked 70. For OptSpace, we compared fixed step sizes with line search, and found the algorithm converged fastest under line search. Figure 1a shows the results. GD is slightly slower than trustregion and faster than the competing approaches.

To further illustrate how runtime scales as the dimension increases, we run larger instances of size $10000 \times 5000$ and $20000 \times 5000$, where the true rank is 40. The parameters are selected in the same manner, and we terminate the computation once the relative error is below 1e-9. We report the results of AltMin, GD, SVP and trustregion in Figure 2a; nuclear and OptSpace do not
scale well to such sizes, so we did not include them. The runtime of AltMin scales the slowest, while the runtimes of GD and trustregion increase more slowly than that of SVP.

Figure 1: (a) Runtime comparison (relative error versus runtime in seconds) of GD, SVP, OptSpace, nuclear, AltMin and trustregion. (b) Magnified plots to compare the other methods except nuclear.

Sample Complexity. We evaluate the number of observations required by GD for exact recovery. For simplicity, we consider square but asymmetric $X^*$. We conducted experiments in 4 cases, where the randomly generated $X^*$ is of size $500 \times 500$ or $1000 \times 1000$, and of rank 10 or 20. In each case, we compute the solutions of GD given $m$ random observations, and a solution with relative error below 1e-6 is considered to be successful. We run 20 trials and compute the empirical probability of successful recovery. The results are shown in Figure 2b. For all four cases, the phase transitions occur around $m \approx 3.5nr$. This suggests that the actual sample complexity of GD may scale linearly with both the dimension $n$ and the rank $r$.

6 Conclusion

We propose a lifting procedure together with Burer-Monteiro factorization and a first-order algorithm to carry out rectangular matrix completion. While optimizing a nonconvex objective, we establish linear convergence of our method to the global optimum with $O(\mu r^2 \kappa^2 n \max(\mu, \log n))$ random observations. We conjecture that $O(nr)$ observations are sufficient for exact recovery, and that the column space regularizer can be dropped. We provide empirical evidence showing this simple algorithm is fast and scalable, suggesting that lifting techniques may be promising for much more general classes of problems.

Figure 2: (a) Runtime growth (in seconds) of AltMin, trustregion, GD and SVP on instances of size 4k x 2k (rank 3), 10k x 5k (rank 40) and 20k x 5k (rank 40). (b) Sample complexity of the gradient scheme: empirical probability of successful recovery versus $m/(nr)$, for $n = 500, 1000$ and $r = 10, 20$.

Acknowledgements

The authors thank Rina Foygel Barber, Yudong Chen and Ruoyu Sun for helpful comments. Research supported in part by ONR grant N and NSF grant DMS

A Technical Lemmas

Another way of writing the objective function is

$$
f(Z) = \frac{1}{2p} \sum_{l=1}^m \bigl( \langle A_l, ZZ^\top \rangle - b_l \bigr)^2 + \frac{\lambda}{4} \bigl\| Z^\top D Z \bigr\|_F^2,
$$

where $l$ is an index of $\Omega$, and $A_l$ is a matrix with 1 at the corresponding observed entry and 0 elsewhere. Let $H = Z - \bar Z$. The gradient can be written as

$$
\begin{aligned}
\nabla f(Z) &= \frac{1}{p} \sum_{l=1}^m \bigl( \langle A_l, ZZ^\top \rangle - b_l \bigr) (A_l + A_l^\top) Z + \lambda \underbrace{D Z Z^\top D Z}_{\Gamma} \\
&= \frac{1}{p} \sum_{l=1}^m \bigl\langle A_l, H \bar Z^\top + \bar Z H^\top + H H^\top \bigr\rangle (A_l + A_l^\top)(\bar Z + H) + \lambda \Gamma.
\end{aligned}
$$
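
The closed-form gradient can be checked against central finite differences on a small random instance. This is a sketch under assumptions: the function names `f`, `grad_f` and `numerical_grad` are hypothetical, $\Omega$ is stored as a dense boolean `mask`, and the step `h = 1e-6` is an arbitrary choice. The regularizer gradient uses $Z^\top D Z = Z_U^\top Z_U - Z_V^\top Z_V$.

```python
import numpy as np

def f(Z, X, mask, p, lam):
    """Objective: (1/2p) ||P_Omega(Z_U Z_V^T - X)||_F^2 + (lam/4) ||Z^T D Z||_F^2."""
    n1 = X.shape[0]
    ZU, ZV = Z[:n1], Z[n1:]
    fit = np.sum((mask * (ZU @ ZV.T - X)) ** 2) / (2 * p)
    return fit + lam / 4 * np.linalg.norm(ZU.T @ ZU - ZV.T @ ZV) ** 2

def grad_f(Z, X, mask, p, lam):
    """Closed form: [0 M; M^T 0] Z + lam * D Z Z^T D Z."""
    n1 = X.shape[0]
    ZU, ZV = Z[:n1], Z[n1:]
    M = mask * (ZU @ ZV.T - X) / p
    G = ZU.T @ ZU - ZV.T @ ZV                      # Z^T D Z, an r x r matrix
    return np.vstack([M @ ZV, M.T @ ZU]) + lam * np.vstack([ZU @ G, -ZV @ G])

def numerical_grad(fun, Z, h=1e-6):
    """Entry-wise central differences."""
    out = np.zeros_like(Z)
    for idx in np.ndindex(*Z.shape):
        E = np.zeros_like(Z)
        E[idx] = h
        out[idx] = (fun(Z + E) - fun(Z - E)) / (2 * h)
    return out

rng = np.random.default_rng(0)
n1, n2, r, p, lam = 5, 4, 2, 0.6, 0.5
X = rng.standard_normal((n1, n2))
mask = rng.random((n1, n2)) < p
Z = rng.standard_normal((n1 + n2, r))

analytic = grad_f(Z, X, mask, p, lam)
numeric = numerical_grad(lambda W: f(W, X, mask, p, lam), Z)
print("max abs difference:", np.abs(analytic - numeric).max())
```

Because $f$ is a quartic polynomial in the entries of $Z$, the two gradients should agree up to the truncation and rounding error of the finite-difference scheme.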

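We will use the following facts throughout the proof:

$$
\|\bar Z\|_{2,\infty} = \|Z^*\|_{2,\infty} \le \sqrt{\frac{\mu r}{\min(n_1, n_2)}} \sqrt{\sigma_1^*}, \tag{17}
$$
$$
\|H\|_{2,\infty} \le 3 \sqrt{\frac{\mu r}{\min(n_1, n_2)}} \sqrt{\sigma_1^*}, \tag{18}
$$
$$
\langle (A_l + A_l^\top) B, C \rangle = \langle A_l, B C^\top + C B^\top \rangle, \tag{19}
$$
$$
Z^\top \bar Z \text{ is positive semidefinite}, \quad H^\top \bar Z \text{ is symmetric}. \tag{20}
$$

Inequality (17) is a direct result of Definition 1. To see (18), note that

$$
\|H\|_{2,\infty} \le \|Z\|_{2,\infty} + \|\bar Z\|_{2,\infty} \le \sqrt{\frac{2\mu r}{\min(n_1, n_2)}}\, \|Z^0\| + \sqrt{\frac{\mu r}{\min(n_1, n_2)}} \sqrt{\sigma_1^*},
$$

and $\|Z^0\|^2 \le 2\sigma_1^*$ by the discussion of initialization in Appendix B. For (20), it holds that

$$
\arg\min_{R R^\top = R^\top R = I} \|Z - Z^* R\|_F = A B^\top,
$$

where $A \Lambda B^\top$ is the SVD of $Z^{*\top} Z$. Clearly, $Z^\top \bar Z = B \Lambda B^\top$ is positive semidefinite, and $H^\top \bar Z = Z^\top \bar Z - \bar Z^\top \bar Z = B \Lambda B^\top - \bar Z^\top \bar Z$ is symmetric.

Next, we list several technical lemmas that are utilized later. We will use $c$ to denote a numerical constant, whose value may vary from line to line.

Lemma 4. For any $Z$ of the form

$$
Z = \begin{bmatrix} Z_U \\ Z_V \end{bmatrix} = \begin{bmatrix} U \Sigma^{1/2} R \\ V \Sigma^{1/2} R \end{bmatrix},
$$

where $U, V, R$ are unitary matrices and $\Sigma \succeq 0$ is a diagonal matrix, we have

$$
\|Z Z^\top - Z^* Z^{*\top}\|_F \le 2\, \|U \Sigma V^\top - U^* \Sigma^* V^{*\top}\|_F.
$$

Proof. Recall that

$$
Z^* = \begin{bmatrix} Z^*_U \\ Z^*_V \end{bmatrix} = \begin{bmatrix} U^* \Sigma^{*1/2} \\ V^* \Sigma^{*1/2} \end{bmatrix},
$$

where $X^* = U^* \Sigma^* V^{*\top}$. We have

$$
\|Z Z^\top - Z^* Z^{*\top}\|_F^2 = \|U \Sigma U^\top - U^* \Sigma^* U^{*\top}\|_F^2 + \|V \Sigma V^\top - V^* \Sigma^* V^{*\top}\|_F^2 + 2\, \|U \Sigma V^\top - U^* \Sigma^* V^{*\top}\|_F^2, \tag{21}
$$

and

$$
\|U \Sigma U^\top - U^* \Sigma^* U^{*\top}\|_F^2 + \|V \Sigma V^\top - V^* \Sigma^* V^{*\top}\|_F^2 = 2\|\Sigma\|_F^2 + 2\|\Sigma^*\|_F^2 - 2 \bigl\langle \Sigma,\ U^\top U^* \Sigma^* U^{*\top} U + V^\top V^* \Sigma^* V^{*\top} V \bigr\rangle. \tag{22}
$$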

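We can obtain the lower bound

$$
\begin{aligned}
\bigl\langle \Sigma,\ U^\top U^* \Sigma^* U^{*\top} U + V^\top V^* \Sigma^* V^{*\top} V \bigr\rangle
&= \sum_{i=1}^r \sigma_i \bigl( U^\top U^* \Sigma^* U^{*\top} U + V^\top V^* \Sigma^* V^{*\top} V \bigr)_{ii} \\
&= \sum_{i=1}^r \sigma_i \sum_{k=1}^r \sigma_k^* \bigl( (U^\top U^*)_{ik}^2 + (V^\top V^*)_{ik}^2 \bigr) \\
&\ge 2 \sum_{i=1}^r \sigma_i \sum_{k=1}^r \sigma_k^* (U^\top U^*)_{ik} (V^\top V^*)_{ik} \\
&= 2 \bigl\langle \Sigma,\ U^\top U^* \Sigma^* V^{*\top} V \bigr\rangle.
\end{aligned} \tag{23}
$$

Combining (22) and (23), we obtain

$$
\begin{aligned}
\|U \Sigma U^\top - U^* \Sigma^* U^{*\top}\|_F^2 + \|V \Sigma V^\top - V^* \Sigma^* V^{*\top}\|_F^2
&\le 2\|\Sigma\|_F^2 + 2\|\Sigma^*\|_F^2 - 4 \bigl\langle \Sigma,\ U^\top U^* \Sigma^* V^{*\top} V \bigr\rangle \\
&= 2\|U \Sigma V^\top\|_F^2 + 2\|U^* \Sigma^* V^{*\top}\|_F^2 - 4 \bigl\langle U \Sigma V^\top,\ U^* \Sigma^* V^{*\top} \bigr\rangle \\
&= 2\, \|U \Sigma V^\top - U^* \Sigma^* V^{*\top}\|_F^2.
\end{aligned} \tag{24}
$$

Plugging (24) back into (21), we obtain the lemma.

Recall that $n = \max(n_1, n_2)$. We will exploit the following two known concentration results.

Lemma 5 (Chen [1], Lemma 2). For any fixed matrix $X \in \mathbb{R}^{n_1 \times n_2}$, there exist universal constants $c, c_1, c_2$ such that with probability at least $1 - c_1 n^{-c_2}$,

$$
\bigl\| p^{-1} P_\Omega(X) - X \bigr\| \le c \Bigl( \frac{\log n}{p} \|X\|_\infty + \sqrt{\frac{\log n}{p}}\, \|X\|_{\infty,2} \Bigr).
$$

Lemma 6 (Candès and Recht [8], Theorem 4.1). Define the subspace

$$
T = \bigl\{ M \in \mathbb{R}^{n_1 \times n_2} : M = U^* X^\top + Y V^{*\top} \text{ for some } X \text{ and } Y \bigr\}. \tag{25}
$$

Let $P_T$ be the Euclidean projection onto $T$. There is a numerical constant $c$ such that for any $\delta \in (0, 1]$, if $p \ge \frac{c}{\delta^2} \frac{\mu r \log n}{\min(n_1, n_2)}$, then with probability at least $1 - 3n^{-3}$, we have

$$
p^{-1} \bigl\| P_T P_\Omega P_T - p P_T \bigr\| \le \delta.
$$

Lemma 7 upper bounds the spectral norm of the adjacency matrix of a random Erdős-Rényi graph. It is a variant of Lemma 7.1 of Keshavan et al. [3], which uses known results of Feige and Ofek [16].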

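Lemma 7 (Chen and Wainwright [13], Lemma 9). Suppose that $\Omega \subset [d] \times [d]$ is the set of edges of a random Erdős-Rényi graph with $d$ nodes, where any pair of nodes is connected with probability $p$. There exist two numerical constants $c_1, c_2$ such that, for any $\delta \in (0, 1]$, if $p \ge c_1 \frac{\log d}{\delta^2 d}$, then with probability at least $1 - \frac{1}{2} d^{-4}$, uniformly for all $x, y \in \mathbb{R}^d$ it holds that

$$
p^{-1} \sum_{(i,j) \in \Omega} x_i y_j \le (1 + \delta) \|x\|_1 \|y\|_1 + c_2 \sqrt{\frac{d}{p}}\, \|x\|_2 \|y\|_2. \tag{26}
$$

We refer readers to [3] for a complete proof, in particular noticing that one can choose $p$ large enough so that the constant factor in the first term in (26) is only $1 + \delta$. Lemmas 8, 9 and 10 are direct generalizations of Lemma 4 and 5 of [13].

Lemma 8. There exists a constant $c$ such that, for any $\delta \in (0, 1]$, if

$$
p \ge \frac{c}{\delta^2} \max\Bigl( \frac{\log(n_1+n_2)}{n_1+n_2},\ \frac{\mu^2 r^2 \kappa^2}{\min(n_1, n_2)} \Bigr),
$$

then with probability at least $1 - \frac{1}{2}(n_1+n_2)^{-4}$, uniformly for all $H$ such that $\|H\|_{2,\infty} \le 3 \sqrt{\frac{\mu r}{\min(n_1, n_2)}} \sqrt{\sigma_1^*}$, we have

$$
p^{-1} \bigl\| P_{\bar\Omega}(H H^\top) \bigr\|_F^2 \le (1 + \delta) \|H\|_F^4 + \delta \sigma_r^* \|H\|_F^2.
$$

Proof. It holds that

$$
p^{-1} \bigl\| P_{\bar\Omega}(H H^\top) \bigr\|_F^2 = p^{-1} \sum_{(i,j) \in \bar\Omega} \langle H_{(i)}, H_{(j)} \rangle^2 \le p^{-1} \sum_{(i,j) \in \bar\Omega} \|H_{(i)}\|^2 \|H_{(j)}\|^2. \tag{27}
$$

Since $\bar\Omega$ is a reduced sampling of $Y^* \in \mathbb{R}^{(n_1+n_2) \times (n_1+n_2)}$ under a Bernoulli model, Lemma 7 is applicable here. Assume $p \ge c_1 \frac{\log(n_1+n_2)}{\delta^2 (n_1+n_2)}$. We then have, with probability at least $1 - \frac{1}{2}(n_1+n_2)^{-4}$, for all $H$ such that $\|H\|_{2,\infty} \le 3 \sqrt{\frac{\mu r}{\min(n_1, n_2)}} \sqrt{\sigma_1^*}$,

$$
\begin{aligned}
p^{-1} \bigl\| P_{\bar\Omega}(H H^\top) \bigr\|_F^2
&\le p^{-1} \sum_{(i,j) \in \bar\Omega} \|H_{(i)}\|^2 \|H_{(j)}\|^2 \\
&\overset{(a)}{\le} (1 + \delta) \Bigl( \sum_{i \in [n_1+n_2]} \|H_{(i)}\|^2 \Bigr)^2 + c_2 \sqrt{\frac{n_1+n_2}{p}} \sum_{i \in [n_1+n_2]} \|H_{(i)}\|^4 \\
&\le (1 + \delta) \|H\|_F^4 + c_2 \sqrt{\frac{n_1+n_2}{p}}\, \|H\|_F^2 \|H\|_{2,\infty}^2 \\
&\overset{(b)}{\le} (1 + \delta) \|H\|_F^4 + \|H\|_F^2 \sqrt{\frac{81 c_2^2 \mu^2 r^2 \sigma_1^{*2} (n_1+n_2)}{p \min(n_1, n_2)^2}},
\end{aligned} \tag{28}
$$

where (a) follows from Lemma 7 and (b) follows from $\|H\|_{2,\infty} \le 3 \sqrt{\frac{\mu r}{\min(n_1, n_2)}} \sqrt{\sigma_1^*}$.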

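Let us further assume $p \ge \frac{162\, c_2^2 \mu^2 r^2 \kappa^2 \gamma}{\delta^2 \min(n_1, n_2)}$, where $\gamma = n/\min(n_1, n_2)$ is a fixed constant. Then we can bound

$$
p^{-1} \bigl\| P_{\bar\Omega}(H H^\top) \bigr\|_F^2 \le (1 + \delta) \|H\|_F^4 + \delta \sigma_r^* \|H\|_F^2. \tag{29}
$$

The final threshold we obtain is thus $p \ge \frac{c}{\delta^2} \max\bigl( \frac{\log(n_1+n_2)}{n_1+n_2},\ \frac{\mu^2 r^2 \kappa^2}{\min(n_1, n_2)} \bigr)$ for some constant $c$.

Lemma 9. There exists a constant $c$ such that, if $p \ge c \frac{\log n}{\min(n_1, n_2)}$, then with probability at least $1 - n_1^{-4} - n_2^{-4}$, uniformly for all matrices $A, B$ such that $A B^\top$ is of size $(n_1+n_2) \times (n_1+n_2)$,

$$
p^{-1} \bigl\| P_{\bar\Omega}(A B^\top) \bigr\|_F^2 \le 2n \min\bigl\{ \|A\|_F^2 \|B\|_{2,\infty}^2,\ \|B\|_F^2 \|A\|_{2,\infty}^2 \bigr\}.
$$

Proof. Let $\bar\Omega_{Y_i} = \{ j : (i,j) \in \bar\Omega \}$ denote the set of entries sampled in the $i$th row of $A B^\top$. Note that because of the structure of $\bar\Omega$, at most $n_2$ entries are sampled in each of the first $n_1$ rows, and at most $n_1$ entries are sampled in each of the remaining $n_2$ rows. Using a binomial tail bound, if $p \ge c \frac{\log n}{n_2}$ for sufficiently large $c$, the event $\max_{i \in [n_1]} |\bar\Omega_{Y_i}| \le 2pn$ holds with probability at least $1 - n_1^{-4}$. Similarly for the remaining $n_2$ rows. Hence, if $p \ge c \frac{\log n}{\min(n_1, n_2)}$ for some constant $c$, with probability at least $1 - n_1^{-4} - n_2^{-4}$, we have $\max_{i \in [n_1+n_2]} |\bar\Omega_{Y_i}| \le 2pn$. Conditioning on this event, we then have for all $A, B$ of proper size,

$$
\begin{aligned}
p^{-1} \bigl\| P_{\bar\Omega}(A B^\top) \bigr\|_F^2
&= p^{-1} \sum_{i=1}^{n_1+n_2} \sum_{j \in \bar\Omega_{Y_i}} \langle A_{(i)}, B_{(j)} \rangle^2 \\
&\le p^{-1} \sum_{i=1}^{n_1+n_2} \|A_{(i)}\|^2 \sum_{j \in \bar\Omega_{Y_i}} \|B_{(j)}\|^2 \\
&\le p^{-1} \sum_{i=1}^{n_1+n_2} \|A_{(i)}\|^2 \max_{i \in [n_1+n_2]} |\bar\Omega_{Y_i}|\, \|B\|_{2,\infty}^2 \\
&\le 2n\, \|A\|_F^2 \|B\|_{2,\infty}^2.
\end{aligned}
$$

Similarly we can prove that, with probability at least $1 - n_1^{-4} - n_2^{-4}$,

$$
p^{-1} \bigl\| P_{\bar\Omega}(A B^\top) \bigr\|_F^2 \le 2n\, \|B\|_F^2 \|A\|_{2,\infty}^2.
$$

The following lemma establishes restricted strong convexity and smoothness of the observation operator for matrices in $T$.

Lemma 10. Let $T$ be the subspace defined in (25). There exists a universal constant $c$ such that, if $p \ge \frac{c}{\delta^2} \frac{\mu r \log n}{\min(n_1, n_2)}$, with probability at least $1 - 3n^{-3}$, uniformly for all $A \in T$, we have

$$
p(1 - \delta) \|A\|_F^2 \le \|P_\Omega(A)\|_F^2 \le p(1 + \delta) \|A\|_F^2. \tag{30}
$$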

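Consequently, uniformly for all $A, B \in T$,

$$\left|p^{-1}\langle\mathcal{P}_\Omega(A), \mathcal{P}_\Omega(B)\rangle - \langle A, B\rangle\right| \le \delta\|A\|_F\|B\|_F. \tag{31}$$

Proof. By Lemma 6, with probability at least $1 - 3n^{-3}$, for any $X \in T \subseteq \mathbb{R}^{n_1\times n_2}$ it holds that

$$p(1-\delta)\|X\|_F \le \|\mathcal{P}_T\mathcal{P}_\Omega\mathcal{P}_T(X)\|_F \le p(1+\delta)\|X\|_F. \tag{32}$$

Let $A$ be a matrix in $T$. Rewriting $\|\mathcal{P}_\Omega(A)\|_F^2 = \langle\mathcal{P}_\Omega\mathcal{P}_T(A), \mathcal{P}_\Omega\mathcal{P}_T(A)\rangle = \langle A, \mathcal{P}_T\mathcal{P}_\Omega\mathcal{P}_T(A)\rangle$, and using the Cauchy-Schwarz inequality and (32), we can bound

$$\|\mathcal{P}_\Omega(A)\|_F^2 \le p(1+\delta)\|A\|_F^2. \tag{33}$$

In addition, we have

$$\begin{aligned}
\|\mathcal{P}_\Omega(A)\|_F^2 &= \langle A, \mathcal{P}_T\mathcal{P}_\Omega\mathcal{P}_T(A)\rangle \\
&= \langle A, \mathcal{P}_T\mathcal{P}_\Omega\mathcal{P}_T(A) - p\mathcal{P}_T(A) + p\mathcal{P}_T(A)\rangle \\
&\ge -\|A\|_F\,\|(\mathcal{P}_T\mathcal{P}_\Omega\mathcal{P}_T - p\mathcal{P}_T)(A)\|_F + p\|A\|_F^2 \\
&\overset{(a)}{\ge} p(1-\delta)\|A\|_F^2,
\end{aligned} \tag{34}$$

where (a) follows from Lemma 6. Combining (33) and (34) proves (30).

To show (31), let $\bar A = A/\|A\|_F$ and $\bar B = B/\|B\|_F$. Both $\bar A + \bar B$ and $\bar A - \bar B$ are in $T$. We have

$$\begin{aligned}
\langle\mathcal{P}_\Omega(\bar A), \mathcal{P}_\Omega(\bar B)\rangle &= \frac{1}{4}\left\{\|\mathcal{P}_\Omega(\bar A + \bar B)\|_F^2 - \|\mathcal{P}_\Omega(\bar A - \bar B)\|_F^2\right\} \\
&\overset{(b)}{\le} \frac{1}{4}\left\{(1+\delta)p\|\bar A + \bar B\|_F^2 - (1-\delta)p\|\bar A - \bar B\|_F^2\right\} \\
&= \frac{1}{4}\left\{2\delta p\left(\|\bar A\|_F^2 + \|\bar B\|_F^2\right) + 4p\langle\bar A, \bar B\rangle\right\} \\
&= p\delta + p\langle\bar A, \bar B\rangle,
\end{aligned} \tag{35}$$

where (b) follows from (30). Thus, we have

$$p^{-1}\langle\mathcal{P}_\Omega(A), \mathcal{P}_\Omega(B)\rangle = p^{-1}\|A\|_F\|B\|_F\,\langle\mathcal{P}_\Omega(\bar A), \mathcal{P}_\Omega(\bar B)\rangle \le \delta\|A\|_F\|B\|_F + \langle A, B\rangle. \tag{36}$$

Similarly, we can show

$$p^{-1}\langle\mathcal{P}_\Omega(A), \mathcal{P}_\Omega(B)\rangle \ge -\delta\|A\|_F\|B\|_F + \langle A, B\rangle. \tag{37}$$

Last, we want to show that the projection onto the feasible set $\mathcal{C}$ is non-expansive: it never increases the distance to a feasible point.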
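As a quick numerical illustration of this claim, the sketch below clips a vector to a Euclidean ball of radius $\theta$ and checks that the distance to a point already inside the ball never increases. The function names `clip_to_ball` and `dist` are hypothetical helpers introduced here, not taken from the paper, and a random test is of course not a proof.

```python
import math
import random


def clip_to_ball(x, theta):
    """Project x onto the Euclidean ball of radius theta (one row of row-wise clipping)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= theta:
        return list(x)
    return [theta * v / norm for v in x]


def dist(a, b):
    """Euclidean distance between two vectors given as lists."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


random.seed(0)
theta, r = 1.0, 5
worst_gap = float("inf")
for _ in range(1000):
    x = [random.gauss(0.0, 2.0) for _ in range(r)]
    # Clipping a random vector guarantees that ||y|| <= theta.
    y = clip_to_ball([random.gauss(0.0, 1.0) for _ in range(r)], theta)
    # gap >= 0 means that clipping x did not move it farther from y.
    gap = dist(x, y) - dist(clip_to_ball(x, theta), y)
    worst_gap = min(worst_gap, gap)

print(worst_gap >= -1e-12)
```

Applying `clip_to_ball` to every row of a factor matrix gives the row-wise clipping that the proofs refer to as $\mathcal{P}_{\mathcal{C}}$, under the assumption that $\mathcal{C}$ is a bound on the row norms.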

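Lemma 11. Let $y \in \mathbb{R}^r$ be a vector such that $\|y\| \le \theta$. Then for any $x \in \mathbb{R}^r$, $\|\mathcal{P}_\theta(x) - y\| \le \|x - y\|$.

Proof. If $\|x\| \le \theta$, then $\mathcal{P}_\theta(x) = x$. Otherwise $\mathcal{P}_\theta(x) = \theta\bar x$, where $\bar x = x/\|x\|$. Writing $y = (y^\top\bar x)\bar x + \mathcal{P}_{\bar x}^{\perp}y$, we have

$$\|\theta\bar x - y\|^2 = \|\theta\bar x - (y^\top\bar x)\bar x - \mathcal{P}_{\bar x}^{\perp}y\|^2 = (\theta - y^\top\bar x)^2 + \|\mathcal{P}_{\bar x}^{\perp}y\|^2. \tag{38}$$

The same decomposition gives $\|x - y\|^2 = (\|x\| - y^\top\bar x)^2 + \|\mathcal{P}_{\bar x}^{\perp}y\|^2$, so it suffices to show

$$|\theta - y^\top\bar x| \le \big|\|x\| - y^\top\bar x\big|. \tag{39}$$

If $y^\top\bar x \le 0$, then (39) holds because $\|x\| > \theta$. If $y^\top\bar x > 0$, (39) still holds since $\|x\| > \theta \ge \|y\| \ge y^\top\bar x$.

B Initialization

B.1 Proof of Lemma 1

Let $\delta$ denote the upper bound of $\|p^{-1}\mathcal{P}_\Omega(X^\star) - X^\star\|$ as in Lemma 5, and let $\sigma_1 \ge \dots \ge \sigma_n$ denote the singular values of $p^{-1}\mathcal{P}_\Omega(X^\star)$. By Weyl's theorem, we have

$$|\sigma_i^\star - \sigma_i| \le \delta, \quad i \in [n]. \tag{40}$$

Note this implies $\sigma_{r+1} \le \delta$, as $\sigma_{r+1}^\star = 0$. By definition, $Z^0 = [U; V]\Sigma^{1/2}$, where $U\Sigma V^\top$ is the rank-$r$ SVD of $p^{-1}\mathcal{P}_\Omega(X^\star)$. According to Lemma 4, one has

$$\begin{aligned}
\|Z^0Z^{0\top} - Z^\star Z^{\star\top}\|_F &\le 2\|U\Sigma V^\top - X^\star\|_F \\
&\overset{(a)}{\le} 2\sqrt{2r}\,\|U\Sigma V^\top - X^\star\| \\
&\le 2\sqrt{2r}\left(\|U\Sigma V^\top - p^{-1}\mathcal{P}_\Omega(X^\star)\| + \|p^{-1}\mathcal{P}_\Omega(X^\star) - X^\star\|\right) \\
&\overset{(b)}{\le} 2\sqrt{2r}\,(\delta + \delta) = 4\sqrt{2r}\,\delta,
\end{aligned} \tag{41}$$

where (a) holds because $\mathrm{rank}(U\Sigma V^\top - X^\star) \le 2r$, and (b) holds since $\|U\Sigma V^\top - p^{-1}\mathcal{P}_\Omega(X^\star)\| = \sigma_{r+1} \le \delta$.

Let $H = Z^0 - \bar Z^0$, where $\bar Z^0 = \arg\min_{\tilde Z\in\mathcal{S}}\|Z^0 - \tilde Z\|_F$. We want to bound $d(Z^0, Z^\star)^2 = \|H\|_F^2$. According to (20), $H^\top\bar Z^0$ is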

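symmetric and $Z^{0\top}\bar Z^0$ is positive semidefinite (recall that $H = Z^0 - \bar Z^0$, where $\bar Z^0$ is the element of $\mathcal{S}$ closest to $Z^0$, so that $\bar Z^0\bar Z^{0\top} = Z^\star Z^{\star\top}$). Hence we can write

$$\begin{aligned}
\|Z^0Z^{0\top} - Z^\star Z^{\star\top}\|_F^2 &= \|H\bar Z^{0\top} + \bar Z^0H^\top + HH^\top\|_F^2 \\
&= \mathrm{tr}\left((H^\top H)^2 + 2(H^\top\bar Z^0)^2 + 2H^\top H\bar Z^{0\top}\bar Z^0 + 4H^\top HH^\top\bar Z^0\right) \\
&= \mathrm{tr}\left((H^\top H + \sqrt{2}H^\top\bar Z^0)^2 + (4 - 2\sqrt{2})H^\top HH^\top\bar Z^0 + 2H^\top H\bar Z^{0\top}\bar Z^0\right) \\
&\ge \mathrm{tr}\left((4 - 2\sqrt{2})H^\top HH^\top\bar Z^0 + 2H^\top H\bar Z^{0\top}\bar Z^0\right) \\
&= (4 - 2\sqrt{2})\,\mathrm{tr}\left(H^\top HZ^{0\top}\bar Z^0\right) + (2\sqrt{2} - 2)\|H\bar Z^{0\top}\|_F^2,
\end{aligned} \tag{42}$$

where in the second line we used that $H^\top\bar Z^0$ is symmetric. Besides, as $Z^{0\top}\bar Z^0$ is positive semidefinite, $(4 - 2\sqrt{2})\,\mathrm{tr}(H^\top HZ^{0\top}\bar Z^0)$ is nonnegative. Therefore,

$$\|Z^0Z^{0\top} - Z^\star Z^{\star\top}\|_F^2 \ge (2\sqrt{2} - 2)\|H\bar Z^{0\top}\|_F^2 \ge 4(\sqrt{2} - 1)\sigma_r^\star\|H\|_F^2. \tag{43}$$

Combining (41) and (43), it follows that

$$d(Z^0, Z^\star)^2 \le \frac{\|Z^0Z^{0\top} - Z^\star Z^{\star\top}\|_F^2}{4(\sqrt{2} - 1)\sigma_r^\star} \le \frac{8r\delta^2}{(\sqrt{2} - 1)\sigma_r^\star}. \tag{44}$$

Therefore,

$$\begin{aligned}
d(Z^0, Z^\star)^2 \le \frac{8r\delta^2}{(\sqrt{2} - 1)\sigma_r^\star} &\overset{(a)}{=} \frac{c\,r}{\sigma_r^\star}\left(\frac{\log n}{p}\|X^\star\|_\infty + \sqrt{\frac{\log n}{p}}\,\|X^\star\|_{\infty,2}\right)^2 \\
&\overset{(b)}{\le} \frac{c\,r\,\sigma_1^{\star 2}}{\sigma_r^\star}\left(\frac{\mu r\log n}{p\sqrt{n_1n_2}} + \sqrt{\frac{\mu r\log n}{p\,(n_1\wedge n_2)}}\right)^2,
\end{aligned} \tag{45}$$

where in (a) we replaced $\delta$ using Lemma 5, and (b) holds since by our incoherence assumption (3) we have

$$\|X^\star\|_\infty = \|U^\star\Sigma^\star V^{\star\top}\|_\infty \le \sigma_1^\star\max_{i,j}\|U_i^\star\|\|V_j^\star\| \le \sigma_1^\star\|U^\star\|_{2,\infty}\|V^\star\|_{2,\infty} \le \sigma_1^\star\frac{\mu r}{\sqrt{n_1n_2}}, \tag{46}$$

$$\|X^\star\|_{\infty,2} = \|U^\star\Sigma^\star V^{\star\top}\|_{\infty,2} \overset{(c)}{\le} \sigma_1^\star\max\left\{\|U^\star\|_{2,\infty}, \|V^\star\|_{2,\infty}\right\} \le \sigma_1^\star\sqrt{\frac{\mu r}{n_1\wedge n_2}}. \tag{47}$$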

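Note that for (c) we used $\|AB^\top\|_{2,\infty} \le \|A\|_{2,\infty}\|B\|$. Hence, to obtain $d(Z^0, Z^\star)^2 \le \frac{1}{16}\sigma_r^\star$, it suffices to have

$$p \ge \max\left\{\frac{c\,\mu r^{3/2}\kappa\log n}{\sqrt{n_1n_2}},\ \frac{c\,\mu r^2\kappa^2\log n}{n_1\wedge n_2}\right\} = \frac{c\,\mu r^2\kappa^2\log n}{n_1\wedge n_2}. \tag{48}$$

Since $\mathcal{P}_{\mathcal{C}}$ is just row-wise clipping, by Lemma 11 we have

$$d(Z^1, Z^\star) \le \|\mathcal{P}_{\mathcal{C}}(Z^0) - \bar Z^0\|_F \le \|Z^0 - \bar Z^0\|_F,$$

where $\bar Z^0$ denotes the element of $\mathcal{S}$ closest to $Z^0$.

B.2 Proof of Corollary 1

By the incoherence assumption, we have $\|Z^\star\|_{2,\infty} \le \sqrt{\frac{\mu r}{n_1\wedge n_2}\sigma_1^\star}$, see (17). It suffices to show $\sigma_1^\star \le 2\sigma_1$. From the above discussion, we can see that

$$\frac{8r\delta^2}{(\sqrt{2} - 1)\sigma_r^\star} \le \frac{1}{16}\sigma_r^\star \ \Longrightarrow\ \delta \le \frac{1}{16}\sigma_r^\star.$$

By Weyl's theorem, we have $|\sigma_1 - \sigma_1^\star| \le \delta \le \frac{1}{16}\sigma_r^\star$. As a result, $\sigma_1^\star \le 2\sigma_1$.

C Regularity Condition

Analogous to the restricted strong convexity (RSC) and restricted strong smoothness (RSS) conditions, we show that with high probability our objective function $f$ satisfies the local curvature and local smoothness conditions defined below. Here $H = Z - \bar Z$, where $\bar Z$ denotes the element of $\mathcal{S}$ closest to $Z$.

Local Curvature Condition. There exist constants $c_1, c_2$ such that for any $Z \in \mathcal{C}$ satisfying $d(Z, Z^\star) \le \frac{1}{4}\sqrt{\sigma_r^\star}$,

$$\langle\nabla f(Z), H\rangle \ge c_1\|H\|_F^2 + c_2\|H^\top D\bar Z\|_F^2.$$

Local Smoothness Condition. There exist constants $c_3, c_4$ such that for any $Z \in \mathcal{C}$ satisfying $d(Z, Z^\star) \le \frac{1}{4}\sqrt{\sigma_r^\star}$,

$$\|\nabla f(Z)\|_F^2 \le c_3\|H\|_F^2 + c_4\|H^\top D\bar Z\|_F^2.$$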
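To make the objects in these two conditions concrete, here is a minimal NumPy sketch of a lifted objective and its gradient. It assumes the form $f(Z) = \frac{1}{2p}\|\mathcal{P}_\Omega(ZZ^\top - Z^\star Z^{\star\top})\|_F^2 + \frac{\lambda}{4}\|Z^\top DZ\|_F^2$ with $D = \mathrm{diag}(I_{n_1}, -I_{n_2})$ and a symmetric sampling mask supported on the off-diagonal blocks. That form is an assumption chosen to agree with the regularizer gradient $\Gamma = DZZ^\top DZ$ used in the proofs, and the names `objective`, `gradient`, `mask`, and the problem sizes are illustrative only.

```python
import numpy as np


def objective(Z, Y, mask, D, p, lam):
    """Assumed objective: 1/(2p) ||mask * (Z Z^T - Y)||_F^2 + lam/4 ||Z^T D Z||_F^2."""
    R = mask * (Z @ Z.T - Y)
    return (R ** 2).sum() / (2 * p) + lam / 4 * ((Z.T @ D @ Z) ** 2).sum()


def gradient(Z, Y, mask, D, p, lam):
    """Gradient of `objective` for a symmetric 0/1 mask: (2/p) R Z + lam * D Z Z^T D Z."""
    R = mask * (Z @ Z.T - Y)
    return (2.0 / p) * (R @ Z) + lam * (D @ Z @ (Z.T @ D @ Z))


rng = np.random.default_rng(0)
n1, n2, r, p, lam = 6, 5, 2, 0.7, 0.5

# Balanced ground-truth factor Z_star = [U; V] Sigma^{1/2} from a rank-r SVD.
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z_star = np.vstack([U[:, :r], Vt[:r].T]) * np.sqrt(s[:r])
Y = Z_star @ Z_star.T

# Symmetric Bernoulli(p) mask on the two off-diagonal blocks of the lifted matrix.
B = (rng.random((n1, n2)) < p).astype(float)
mask = np.block([[np.zeros((n1, n1)), B], [B.T, np.zeros((n2, n2))]])
D = np.diag(np.r_[np.ones(n1), -np.ones(n2)])

# Central finite-difference check of the gradient along a random direction W.
Z = Z_star + 0.1 * rng.standard_normal(Z_star.shape)
W = rng.standard_normal(Z.shape)
t = 1e-5
fd = (objective(Z + t * W, Y, mask, D, p, lam)
      - objective(Z - t * W, Y, mask, D, p, lam)) / (2 * t)
an = float(np.sum(gradient(Z, Y, mask, D, p, lam) * W))
grad_at_truth = gradient(Z_star, Y, mask, D, p, lam)

print(abs(fd - an), np.abs(grad_at_truth).max())
```

The second printed number illustrates why the balanced factor is a stationary point: the data term vanishes at $Z^\star$, and so does the regularizer because $Z^{\star\top}DZ^\star = 0$.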

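C.1 Proof of the Local Curvature Condition

Write $Z = \bar Z + H$, where $\bar Z$ is the element of $\mathcal{S}$ closest to $Z$, and split $H = [H_U; H_V]$ and $\bar Z = [\bar Z_U; \bar Z_V]$ into their first $n_1$ and last $n_2$ rows. Let $a^2 = \sum_{l=1}^m\langle A_l, H\bar Z^\top + \bar ZH^\top\rangle^2$ and $b^2 = \sum_{l=1}^m\langle A_l, HH^\top\rangle^2$. Then

$$\begin{aligned}
\langle\nabla f(Z), H\rangle &= \frac{1}{p}\sum_{l=1}^m\langle A_l, H\bar Z^\top + \bar ZH^\top + HH^\top\rangle\,\langle(A_l + A_l^\top)(\bar Z + H), H\rangle + \lambda\,\mathrm{tr}(H^\top\Gamma) \\
&\overset{(i)}{=} \frac{1}{p}\sum_{l=1}^m\langle A_l, H\bar Z^\top + \bar ZH^\top + HH^\top\rangle\,\langle A_l, H\bar Z^\top + \bar ZH^\top + 2HH^\top\rangle + \lambda\,\mathrm{tr}(H^\top\Gamma) \\
&= \frac{1}{p}\Big\{a^2 + 2b^2 + 3\sum_{l=1}^m\langle A_l, H\bar Z^\top + \bar ZH^\top\rangle\langle A_l, HH^\top\rangle\Big\} + \lambda\,\mathrm{tr}(H^\top\Gamma) \\
&\overset{(ii)}{\ge} \frac{1}{p}\left\{a^2 + 2b^2 - 3ab\right\} + \lambda\,\mathrm{tr}(H^\top\Gamma) \\
&= \frac{1}{p}\Big\{\big(a - \tfrac{3}{2}b\big)^2 - \tfrac{1}{4}b^2\Big\} + \lambda\,\mathrm{tr}(H^\top\Gamma) \\
&\overset{(iii)}{\ge} \frac{1}{p}\Big\{\tfrac{1}{2}a^2 - \tfrac{5}{2}b^2\Big\} + \lambda\,\mathrm{tr}(H^\top\Gamma) \\
&= \frac{1}{2}p^{-1}\|\mathcal{P}_\Omega(H\bar Z^\top + \bar ZH^\top)\|_F^2 - \frac{5}{2}p^{-1}\|\mathcal{P}_\Omega(HH^\top)\|_F^2 + \lambda\,\mathrm{tr}(H^\top\Gamma),
\end{aligned} \tag{49}$$

where we used equation (19) for (i), the Cauchy-Schwarz inequality for (ii), and the inequality $(a - b)^2 \ge \frac{1}{2}a^2 - b^2$ for (iii). Finally, in the last line we used $\sum_{l=1}^m\langle A_l, M\rangle^2 = \|\mathcal{P}_\Omega(M)\|_F^2$.

We first lower bound $\frac{1}{2}p^{-1}\|\mathcal{P}_\Omega(H\bar Z^\top + \bar ZH^\top)\|_F^2$. By the symmetry of $\Omega$, it is equal to $p^{-1}\|\mathcal{P}_\Omega(H_U\bar Z_V^\top + \bar Z_UH_V^\top)\|_F^2$, which expands to

$$p^{-1}\|\mathcal{P}_\Omega(H_U\bar Z_V^\top)\|_F^2 + p^{-1}\|\mathcal{P}_\Omega(\bar Z_UH_V^\top)\|_F^2 + 2p^{-1}\langle\mathcal{P}_\Omega(H_U\bar Z_V^\top), \mathcal{P}_\Omega(\bar Z_UH_V^\top)\rangle. \tag{50}$$

As both $H_U\bar Z_V^\top$ and $\bar Z_UH_V^\top$ belong to $T$, we use Lemma 10 to lower bound the above three terms,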

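respectively. Writing $\bar Z = [\bar Z_U; \bar Z_V]$ for the element of $\mathcal{S}$ closest to $Z$ and $H = Z - \bar Z = [H_U; H_V]$, this gives us

$$\begin{aligned}
\frac{1}{2}p^{-1}\|\mathcal{P}_\Omega(H\bar Z^\top + \bar ZH^\top)\|_F^2 &\ge (1-\delta)\left(\|H_U\bar Z_V^\top\|_F^2 + \|\bar Z_UH_V^\top\|_F^2\right) + 2\langle H_U\bar Z_V^\top, \bar Z_UH_V^\top\rangle - 2\delta\|H_U\bar Z_V^\top\|_F\|\bar Z_UH_V^\top\|_F \\
&\ge (1-\delta)\left(\|H_U\bar Z_V^\top\|_F^2 + \|\bar Z_UH_V^\top\|_F^2\right) + 2\langle H_U\bar Z_V^\top, \bar Z_UH_V^\top\rangle - \delta\left(\|H_U\bar Z_V^\top\|_F^2 + \|\bar Z_UH_V^\top\|_F^2\right) \\
&\overset{(iv)}{\ge} (1-2\delta)\sigma_r^\star\left(\|H_U\|_F^2 + \|H_V\|_F^2\right) + 2\langle H_U\bar Z_V^\top, \bar Z_UH_V^\top\rangle \\
&= (1-2\delta)\sigma_r^\star\|H\|_F^2 + 2\langle H_U\bar Z_V^\top, \bar Z_UH_V^\top\rangle,
\end{aligned} \tag{51}$$

where we used $\|H_U\bar Z_V^\top\|_F^2 \ge \sigma_r^\star\|H_U\|_F^2$ and $\|\bar Z_UH_V^\top\|_F^2 \ge \sigma_r^\star\|H_V\|_F^2$ for (iv).

Until now, we obtain

$$\langle\nabla f(Z), H\rangle \ge (1-2\delta)\sigma_r^\star\|H\|_F^2 + 2\langle H_U\bar Z_V^\top, \bar Z_UH_V^\top\rangle + \lambda\,\mathrm{tr}(H^\top\Gamma) - \frac{5}{2}p^{-1}\|\mathcal{P}_\Omega(HH^\top)\|_F^2. \tag{52}$$

Next, we lower bound $2\langle H_U\bar Z_V^\top, \bar Z_UH_V^\top\rangle + \lambda\,\mathrm{tr}(H^\top\Gamma)$ together. Rewriting

$$\begin{aligned}
2\langle H_U\bar Z_V^\top, \bar Z_UH_V^\top\rangle &= \left\langle H, \begin{bmatrix} 0 & \bar Z_UH_V^\top \\ \bar Z_VH_U^\top & 0\end{bmatrix}\bar Z\right\rangle = \left\langle H, \tfrac{1}{2}\left(\bar ZH^\top - D\bar ZH^\top D\right)\bar Z\right\rangle, \\
ZZ^\top - \bar Z\bar Z^\top &= HH^\top + \bar ZH^\top + H\bar Z^\top,
\end{aligned} \tag{53}$$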

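and plugging in $\Gamma = DZZ^\top DZ$, we then have (with $\bar Z$ the element of $\mathcal{S}$ closest to $Z$ and $H = Z - \bar Z$)

$$\begin{aligned}
&2\langle H_U\bar Z_V^\top, \bar Z_UH_V^\top\rangle + \lambda\,\mathrm{tr}(H^\top\Gamma) \\
&\quad= \left\langle H, \tfrac{1}{2}(\bar ZH^\top - D\bar ZH^\top D)\bar Z\right\rangle + \lambda\langle H, D(ZZ^\top - \bar Z\bar Z^\top)DZ\rangle + \lambda\langle H, D\bar Z\bar Z^\top D\bar Z\rangle + \lambda\langle H, D\bar Z\bar Z^\top DH\rangle \\
&\quad\overset{(a)}{=} \left\langle H, \tfrac{1}{2}(\bar ZH^\top - D\bar ZH^\top D)\bar Z\right\rangle + \lambda\langle H, D(ZZ^\top - \bar Z\bar Z^\top)DZ\rangle + \lambda\|\bar Z^\top DH\|_F^2 \\
&\quad\overset{(b)}{=} \lambda\|\bar Z^\top DH\|_F^2 + \left\langle H, \tfrac{1}{2}(\bar ZH^\top - D\bar ZH^\top D)\bar Z + \lambda D(HH^\top + \bar ZH^\top + H\bar Z^\top)D(\bar Z + H)\right\rangle \\
&\quad\overset{(c)}{=} \lambda\|\bar Z^\top DH\|_F^2 + \tfrac{1}{2}\|H^\top\bar Z\|_F^2 + \lambda\|H^\top DH\|_F^2 + 3\lambda\,\mathrm{tr}(H^\top DHH^\top D\bar Z) + \left(\lambda - \tfrac{1}{2}\right)\mathrm{tr}(H^\top D\bar ZH^\top D\bar Z) \\
&\quad= \tfrac{\lambda}{2}\|\bar Z^\top DH\|_F^2 + \tfrac{1}{2}\|H^\top\bar Z\|_F^2 + \tfrac{\lambda}{2}\|\bar Z^\top DH + 3H^\top DH\|_F^2 - \tfrac{7}{2}\lambda\|H^\top DH\|_F^2 + \left(\lambda - \tfrac{1}{2}\right)\mathrm{tr}(H^\top D\bar ZH^\top D\bar Z) \\
&\quad\ge \tfrac{\lambda}{2}\|\bar Z^\top DH\|_F^2 - \tfrac{7}{2}\lambda\|H\|_F^4 + \left(\lambda - \tfrac{1}{2}\right)\mathrm{tr}(H^\top D\bar ZH^\top D\bar Z).
\end{aligned} \tag{54}$$

Equality (a) holds because $\bar Z^\top D\bar Z = 0$. We plug in (53) in (b). For (c), we use $\bar Z^\top D\bar Z = 0$ and that $H^\top\bar Z$ is symmetric. Finally, we take $\lambda = \frac{1}{2}$ and use Lemma 8 to upper bound $p^{-1}\|\mathcal{P}_\Omega(HH^\top)\|_F^2$:

$$\begin{aligned}
\langle\nabla f(Z), H\rangle &\ge (1-2\delta)\sigma_r^\star\|H\|_F^2 + \tfrac{1}{4}\|\bar Z^\top DH\|_F^2 - \tfrac{7}{4}\|H\|_F^4 - \tfrac{5}{2}(1+\delta)\|H\|_F^4 - \tfrac{5}{2}\delta\sigma_r^\star\|H\|_F^2 \\
&= \left\{(1-2\delta)\sigma_r^\star - \left(\tfrac{7}{4} + \tfrac{5}{2}(1+\delta)\right)\|H\|_F^2 - \tfrac{5}{2}\delta\sigma_r^\star\right\}\|H\|_F^2 + \tfrac{1}{4}\|\bar Z^\top DH\|_F^2.
\end{aligned} \tag{55}$$

For simplicity, we take $\delta = \frac{1}{16}$. We also have $\|H\|_F^2 \le \frac{1}{16}\sigma_r^\star$. This leads to

$$\langle\nabla f(Z), H\rangle \ge \frac{227}{512}\sigma_r^\star\|H\|_F^2 + \frac{1}{4}\|\bar Z^\top DH\|_F^2. \tag{56}$$

Note that this lower bound holds with high probability uniformly for all $Z \in \mathcal{C}$ such that $d(Z, Z^\star) \le \frac{1}{4}\sqrt{\sigma_r^\star}$, since Lemmas 8 and 10 hold uniformly.

When the ground truth $X^\star$ is positive semidefinite, we do not need to do lifting nor impose the regularizer. Using Lemma 10, we can lower bound $\frac{1}{2}p^{-1}\|\mathcal{P}_\Omega(H\bar Z^\top + \bar ZH^\top)\|_F^2 \ge (1-2\delta)\sigma_r^\star\|H\|_F^2$ directly. Taking proper constants, we can obtain the standard restricted strong convexity condition:

$$\langle\nabla f(Z), H\rangle \gtrsim \sigma_r^\star\|H\|_F^2.$$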
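The constant in front of $\sigma_r^\star\|H\|_F^2$ in the curvature bound can be checked with exact rational arithmetic. The short sketch below assumes the parameter choices $\lambda = 1/2$, $\delta = 1/16$ and $\|H\|_F^2 \le \sigma_r^\star/16$, and collects the three contributions (the $(1-2\delta)$ term, the $\tfrac{7}{2}\lambda\|H\|_F^4$ loss from the regularizer, and the $\tfrac{5}{2}$ multiple of the Lemma 8 bound). The variable names are illustrative.

```python
from fractions import Fraction as F

delta = F(1, 16)   # accuracy parameter delta in Lemmas 8 and 10
h2 = F(1, 16)      # ||H||_F^2 <= h2 * sigma_r
lam = F(1, 2)      # regularization weight lambda

curvature = 1 - 2 * delta                            # from the Lemma 10 lower bound
regularizer_loss = F(7, 2) * lam * h2                # (7/2) * lambda * ||H||_F^4 term
lemma8_loss = F(5, 2) * ((1 + delta) * h2 + delta)   # (5/2) * Lemma 8 upper bound

constant = curvature - regularizer_loss - lemma8_loss
zdh_coefficient = lam / 2                            # weight left on ||Zbar^T D H||_F^2

print(constant, zdh_coefficient)
```

Everything is expressed in units of $\sigma_r^\star\|H\|_F^2$, so the printed fraction is the curvature constant itself.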

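C.2 Proof of the Local Smoothness Condition

To upper bound $\|\nabla f(Z)\|_F^2 = \max_{\|W\|_F = 1}\langle\nabla f(Z), W\rangle^2$, it suffices to show that for any $(n_1+n_2)\times r$ matrix $W$ of unit Frobenius norm, $\langle\nabla f(Z), W\rangle^2$ is upper bounded. As before, $\bar Z$ denotes the element of $\mathcal{S}$ closest to $Z$ and $H = Z - \bar Z$. We first write

$$\begin{aligned}
\langle\nabla f(Z), W\rangle &= \frac{1}{p}\sum_{l=1}^m\langle A_l, H\bar Z^\top + \bar ZH^\top + HH^\top\rangle\,\langle(A_l + A_l^\top)(\bar Z + H), W\rangle + \lambda\,\mathrm{tr}(W^\top\Gamma) \\
&\overset{(i)}{=} \frac{1}{p}\sum_{l=1}^m\left(\langle A_l, H\bar Z^\top + \bar ZH^\top\rangle + \langle A_l, HH^\top\rangle\right)\left(\langle A_l, W\bar Z^\top + \bar ZW^\top\rangle + \langle A_l, WH^\top + HW^\top\rangle\right) + \lambda\,\mathrm{tr}(W^\top\Gamma) \\
&= \frac{1}{p}\Big\{\langle\mathcal{P}_\Omega(H\bar Z^\top + \bar ZH^\top), \mathcal{P}_\Omega(W\bar Z^\top + \bar ZW^\top)\rangle + \langle\mathcal{P}_\Omega(HH^\top), \mathcal{P}_\Omega(W\bar Z^\top + \bar ZW^\top)\rangle \\
&\qquad + \langle\mathcal{P}_\Omega(H\bar Z^\top + \bar ZH^\top), \mathcal{P}_\Omega(WH^\top + HW^\top)\rangle + \langle\mathcal{P}_\Omega(HH^\top), \mathcal{P}_\Omega(WH^\top + HW^\top)\rangle\Big\} + \lambda\,\mathrm{tr}(W^\top\Gamma),
\end{aligned} \tag{57}$$

where we used (19) for (i). Since $(a+b+c+d+e)^2 \le 5(a^2+b^2+c^2+d^2+e^2)$, we have

$$\begin{aligned}
\langle\nabla f(Z), W\rangle^2 &\le \frac{5}{p^2}\Big\{\langle\mathcal{P}_\Omega(H\bar Z^\top + \bar ZH^\top), \mathcal{P}_\Omega(W\bar Z^\top + \bar ZW^\top)\rangle^2 + \langle\mathcal{P}_\Omega(HH^\top), \mathcal{P}_\Omega(W\bar Z^\top + \bar ZW^\top)\rangle^2 \\
&\qquad + \langle\mathcal{P}_\Omega(H\bar Z^\top + \bar ZH^\top), \mathcal{P}_\Omega(WH^\top + HW^\top)\rangle^2 + \langle\mathcal{P}_\Omega(HH^\top), \mathcal{P}_\Omega(WH^\top + HW^\top)\rangle^2\Big\} + 5\lambda^2\left(\mathrm{tr}(W^\top\Gamma)\right)^2 \\
&\overset{(ii)}{\le} \frac{5}{p^2}\left(\|\mathcal{P}_\Omega(H\bar Z^\top + \bar ZH^\top)\|_F^2 + \|\mathcal{P}_\Omega(HH^\top)\|_F^2\right)\left(\|\mathcal{P}_\Omega(W\bar Z^\top + \bar ZW^\top)\|_F^2 + \|\mathcal{P}_\Omega(WH^\top + HW^\top)\|_F^2\right) + 5\lambda^2\|\Gamma\|_F^2\underbrace{\|W\|_F^2}_{=1} \\
&\overset{(iii)}{\le} \frac{5}{p^2}\Big(2\underbrace{\|\mathcal{P}_\Omega(H\bar Z^\top)\|_F^2}_{1} + 2\underbrace{\|\mathcal{P}_\Omega(\bar ZH^\top)\|_F^2}_{2} + \underbrace{\|\mathcal{P}_\Omega(HH^\top)\|_F^2}_{3}\Big) \\
&\qquad\times\Big(2\underbrace{\|\mathcal{P}_\Omega(W\bar Z^\top)\|_F^2}_{4} + 2\underbrace{\|\mathcal{P}_\Omega(\bar ZW^\top)\|_F^2}_{5} + 2\underbrace{\|\mathcal{P}_\Omega(WH^\top)\|_F^2}_{6} + 2\underbrace{\|\mathcal{P}_\Omega(HW^\top)\|_F^2}_{7}\Big) + 5\lambda^2\|\Gamma\|_F^2,
\end{aligned} \tag{58}$$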

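where we used the Cauchy-Schwarz inequality for (ii), and $(a+b)^2 \le 2a^2 + 2b^2$ for (iii). We then use Lemma 9 to upper bound terms 1, 2, 4, 5, 6, 7, and Lemma 8 for term 3. Also since $\|W\|_F = 1$, one has (with $\bar Z$ the element of $\mathcal{S}$ closest to $Z$ and $H = Z - \bar Z$)

$$\begin{aligned}
\langle\nabla f(Z), W\rangle^2 &\le 5\left(8n\|H\|_F^2\|\bar Z\|_{2,\infty}^2 + (1+\delta)\|H\|_F^4 + \delta\sigma_r^\star\|H\|_F^2\right)\left(8n\|\bar Z\|_{2,\infty}^2 + 8n\|H\|_{2,\infty}^2\right) + 5\lambda^2\|\Gamma\|_F^2 \\
&= 40n\left(8n\|\bar Z\|_{2,\infty}^2\|H\|_F^2 + (1+\delta)\|H\|_F^4 + \delta\sigma_r^\star\|H\|_F^2\right)\left(\|\bar Z\|_{2,\infty}^2 + \|H\|_{2,\infty}^2\right) + 5\lambda^2\|\Gamma\|_F^2 \\
&\le 400\mu r\sigma_1^\star\left(8\mu r\sigma_1^\star\|H\|_F^2 + (1+\delta)\|H\|_F^4 + \delta\sigma_r^\star\|H\|_F^2\right) + 5\lambda^2\|\Gamma\|_F^2,
\end{aligned} \tag{59}$$

where in the last line we plugged in $\|\bar Z\|_{2,\infty} \le \sqrt{\frac{\mu r}{n}\sigma_1^\star}$ and $\|H\|_{2,\infty} \le 3\sqrt{\frac{\mu r}{n}\sigma_1^\star}$, i.e. (17) and (18). Next, we bound $\|\Gamma\|_F^2$:

$$\begin{aligned}
\|\Gamma\|_F^2 = \|DZZ^\top DZ\|_F^2 &= \|D(ZZ^\top - \bar Z\bar Z^\top)DZ + D\bar Z\bar Z^\top DZ\|_F^2 \\
&\le 2\|D(ZZ^\top - \bar Z\bar Z^\top)DZ\|_F^2 + 2\|D\bar Z\bar Z^\top DZ\|_F^2 \\
&\overset{(a)}{\le} 2\|ZZ^\top - \bar Z\bar Z^\top\|_F^2\|Z\|^2 + 2\|\bar Z\|^2\|\bar Z^\top DZ\|_F^2 \\
&\overset{(b)}{=} 2\|HH^\top + \bar ZH^\top + H\bar Z^\top\|_F^2\|Z\|^2 + 2\|\bar Z\|^2\|\bar Z^\top DH\|_F^2 \\
&\le 6\left(\|HH^\top\|_F^2 + \|\bar ZH^\top\|_F^2 + \|H\bar Z^\top\|_F^2\right)\|Z\|^2 + 2\|\bar Z\|^2\|\bar Z^\top DH\|_F^2 \\
&\overset{(c)}{\le} 6\left(\|H\|_F^2 + 2\|\bar Z\|^2\right)\|H\|_F^2\|Z\|^2 + 2\|\bar Z\|^2\|\bar Z^\top DH\|_F^2 \\
&\overset{(d)}{=} 6\left(\|H\|_F^2 + 4\sigma_1^\star\right)\|H\|_F^2\|Z\|^2 + 4\sigma_1^\star\|\bar Z^\top DH\|_F^2.
\end{aligned} \tag{60}$$

Inequality (a) holds because $\|AB\|_F \le \|A\|\|B\|_F$ and $\|D\| = 1$. To get (b), for the first term in the third line we expand $ZZ^\top - \bar Z\bar Z^\top$; for the second term we expand $Z = \bar Z + H$ and use $\bar Z^\top D\bar Z = 0$. For (c), we use $\|AB\|_F \le \|A\|\|B\|_F \le \|A\|_F\|B\|_F$. Last, (d) holds because $\|\bar Z\| = \sqrt{2\sigma_1^\star}$.

Finally, we combine (59) and (60). As before, take $\lambda = \frac{1}{2}$, $\delta = \frac{1}{16}$, and $\|H\|_F^2 \le \frac{1}{16}\sigma_r^\star$; we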

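obtain, with $\bar Z$ the element of $\mathcal{S}$ closest to $Z$ and $H = Z - \bar Z$,

$$\begin{aligned}
\|\nabla f(Z)\|_F^2 &\le \left\{400\mu r\sigma_1^\star\left(8\mu r\sigma_1^\star + (1+\delta)\|H\|_F^2 + \delta\sigma_r^\star\right) + 30\lambda^2\left(\|H\|_F^2 + 4\sigma_1^\star\right)\|Z\|^2\right\}\|H\|_F^2 + 20\lambda^2\sigma_1^\star\|\bar Z^\top DH\|_F^2 \\
&\overset{(a),(b)}{\le} 3299\,\mu^2r^2\sigma_1^{\star 2}\|H\|_F^2 + 5\sigma_1^\star\|\bar Z^\top DH\|_F^2,
\end{aligned} \tag{61}$$

where for (a) we used $\|Z\| \le \|H\|_F + \|\bar Z\| \le \frac{1}{4}\sqrt{\sigma_r^\star} + \|\bar Z\|$ to bound $\|Z\|^2$ by a constant multiple of $\sigma_1^\star$, and for (b) we used $\mu, r \ge 1$. As before, this condition holds uniformly for all $Z$ such that $d(Z, Z^\star) \le \frac{1}{4}\sqrt{\sigma_r^\star}$ and satisfying the incoherence condition.

For the case where $X^\star$ is positive semidefinite, as we do not need to impose the regularizer, the standard restricted strong smoothness condition follows:

$$\|\nabla f(Z)\|_F^2 \lesssim \sigma_1^{\star 2}\|H\|_F^2.$$

C.3 Proof of Lemma 3

Rearranging the terms in the smoothness condition (61), and using that the left-hand side is nonnegative so that the right-hand side may be scaled by any factor in $[0, 1]$, we can further bound

$$\begin{aligned}
\frac{1}{4}\|\bar Z^\top DH\|_F^2 &\ge \frac{\|\nabla f(Z)\|_F^2}{20\mu^2r^2\kappa\sigma_1^\star} - \frac{3299}{20}\sigma_r^\star\|H\|_F^2 \\
&\ge \frac{\|\nabla f(Z)\|_F^2}{13196\,\mu^2r^2\kappa\sigma_1^\star} - \frac{1}{4}\sigma_r^\star\|H\|_F^2.
\end{aligned} \tag{62}$$

Combining equations (56) and (62), it follows that

$$\langle\nabla f(Z), H\rangle \ge \frac{99}{512}\sigma_r^\star\|H\|_F^2 + \frac{\|\nabla f(Z)\|_F^2}{13196\,\mu^2r^2\kappa\sigma_1^\star}. \tag{63}$$

Finally, by upper bounding the probability that Lemma 8, 9, or 10 fails, and the sampling probability $p$ these lemmas require, we conclude that once

$$p \ge c\max\left\{\frac{\mu r\log n}{n_1\wedge n_2},\ \frac{\mu^2r^2\kappa^2}{n_1\wedge n_2}\right\}, \tag{64}$$

the regularity condition (63) holds with probability at least $1 - c_1n^{-c_2}$, where $c, c_1, c_2$ are constants.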
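To get a feel for what these constants imply, the sketch below recomputes them exactly and turns them into a per-iteration contraction factor. It assumes the regularity condition is read as $\langle\nabla f(Z), H\rangle \ge \frac{1}{\alpha}\sigma_r^\star\|H\|_F^2 + \frac{1}{\beta\sigma_1^\star}\|\nabla f(Z)\|_F^2$, that the step size satisfies $\eta \le \min\{\alpha/2, 2/\beta\}$, and that one projected gradient step then contracts the squared distance by $1 - 2\eta/(\alpha\kappa)$. The function name `contraction_factor` and the sample values of $\mu$, $r$, $\kappa$ are hypothetical; these are worst-case constants from the analysis, not tuned step sizes.

```python
import math
from fractions import Fraction as F

c_curv = F(227, 512)   # curvature constant in front of sigma_r * ||H||_F^2
c_smooth = 3299        # smoothness constant in front of mu^2 r^2 sigma_1^2 ||H||_F^2

# Trading a quarter of sigma_r * ||H||_F^2 for the gradient-norm term.
alpha = 1 / (c_curv - F(1, 4))
beta_base = 4 * c_smooth   # beta = beta_base * mu^2 * r^2 * kappa


def contraction_factor(mu, r, kappa):
    """Return (per-step factor on the distance, largest admissible eta)."""
    beta = beta_base * mu ** 2 * r ** 2 * kappa
    eta = min(alpha / 2, F(2) / beta)
    return math.sqrt(1 - 2 * eta / (alpha * kappa)), eta


factor, eta = contraction_factor(mu=1, r=1, kappa=1)
# Iterations needed to shrink the distance by a factor of 1e-6 at this rate.
iterations = math.ceil(math.log(1e-6) / math.log(factor))

print(alpha, beta_base, eta, factor, iterations)
```

The resulting iteration count is very pessimistic even for $\mu = r = \kappa = 1$, which reflects how loose the explicit constants are rather than the practical behavior of the method.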

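D Linear Convergence

D.1 Proof of Lemma 2

Let $H^k = Z^k - \bar Z^k$, where $\bar Z^k$ denotes the element of $\mathcal{S}$ closest to $Z^k$. Our iterate is $Z^{k+1} = \mathcal{P}_{\mathcal{C}}\left(Z^k - \frac{\eta}{\sigma_1^\star}\nabla f(Z^k)\right)$. Since $\mathcal{P}_{\mathcal{C}}$ is just row-wise clipping, by Lemma 11 we have

$$\left\|\mathcal{P}_{\mathcal{C}}\left(Z^k - \frac{\eta}{\sigma_1^\star}\nabla f(Z^k)\right) - \bar Z^k\right\|_F \le \left\|Z^k - \frac{\eta}{\sigma_1^\star}\nabla f(Z^k) - \bar Z^k\right\|_F. \tag{65}$$

It follows that

$$\begin{aligned}
\|Z^{k+1} - \bar Z^k\|_F^2 &\le \left\|Z^k - \frac{\eta}{\sigma_1^\star}\nabla f(Z^k) - \bar Z^k\right\|_F^2 \\
&= \|H^k\|_F^2 + \frac{\eta^2}{\sigma_1^{\star 2}}\|\nabla f(Z^k)\|_F^2 - \frac{2\eta}{\sigma_1^\star}\langle\nabla f(Z^k), H^k\rangle \\
&\overset{(a)}{\le} \|H^k\|_F^2 + \frac{\eta^2}{\sigma_1^{\star 2}}\|\nabla f(Z^k)\|_F^2 - \frac{2\eta}{\sigma_1^\star}\left(\frac{1}{\alpha}\sigma_r^\star\|H^k\|_F^2 + \frac{1}{\beta\sigma_1^\star}\|\nabla f(Z^k)\|_F^2\right) \\
&= \left(1 - \frac{2\eta}{\alpha\kappa}\right)\|H^k\|_F^2 + \frac{\eta(\eta - 2/\beta)}{\sigma_1^{\star 2}}\|\nabla f(Z^k)\|_F^2 \\
&\overset{(b)}{\le} \left(1 - \frac{2\eta}{\alpha\kappa}\right)\|H^k\|_F^2,
\end{aligned} \tag{66}$$

where we use the definition of $RC(\varepsilon, \alpha, \beta)$ for (a) and $0 < \eta \le \min\{\alpha/2, 2/\beta\}$ for (b). Therefore,

$$d(Z^{k+1}, Z^\star) = \min_{\tilde Z\in\mathcal{S}}\|Z^{k+1} - \tilde Z\|_F \le \sqrt{1 - \frac{2\eta}{\alpha\kappa}}\;d(Z^k, Z^\star). \tag{67}$$

References

[1] Srinadh Bhojanapalli, Anastasios Kyrillidis, and Sujay Sanghavi. Dropping convexity for faster semi-definite optimization. arXiv preprint, 2015.
[2] Nicolas Boumal and Pierre-Antoine Absil. RTRMC: A Riemannian trust-region method for low-rank matrix completion. In Advances in Neural Information Processing Systems, 2011.
[3] Nicolas Boumal, Bamdev Mishra, P.-A. Absil, and Rodolphe Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15(Apr), 2014.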

[4] Samuel Burer and Renato D.C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95(2):329–357, 2003.
[5] Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 2010.
[6] Emmanuel Candès, Thomas Strohmer, and Vladislav Voroninski. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8), 2013.
[7] Emmanuel J. Candès and Xiaodong Li. Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. Foundations of Computational Mathematics, 14(5), 2014.
[8] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
[9] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. Information Theory, IEEE Transactions on, 56(5), 2010.
[10] Emmanuel J. Candès, Yonina C. Eldar, Thomas Strohmer, and Vladislav Voroninski. Phase retrieval via matrix completion. SIAM Review, 57(2):225–251, 2015.
[11] Emmanuel J. Candès, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. Information Theory, IEEE Transactions on, 61(4), 2015.
[12] Yudong Chen. Incoherence-optimal matrix completion. Information Theory, IEEE Transactions on, 61(5):2909–2923, 2015.
[13] Yudong Chen and Martin J. Wainwright. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees. arXiv preprint, 2015.
[14] Alexandre d'Aspremont, Laurent El Ghaoui, Michael I. Jordan, and Gert R.G. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 49(3), 2007.
[15] Maryam Fazel. Matrix rank minimization with applications. PhD thesis, Elec. Eng. Dept., Stanford University, 2002.
[16] Uriel Feige and Eran Ofek. Spectral techniques applied to sparse random graphs. Random Structures & Algorithms, 27(2):251–275, 2005.
[17] Rina Foygel and Nathan Srebro. Concentration-based guarantees for low-rank matrix reconstruction. arXiv preprint, 2011.
[18] Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
[19] Moritz Hardt. Understanding alternating minimization for matrix completion. In FOCS 2014. IEEE, 2014.
[20] Moritz Hardt and Mary Wootters. Fast matrix completion without the condition number. In COLT 2014, 2014.
[21] Prateek Jain, Raghu Meka, and Inderjit S. Dhillon. Guaranteed rank minimization via singular value projection. In Advances in Neural Information Processing Systems, 2010.
[22] Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing. ACM, 2013.
[23] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries. Information Theory, IEEE Transactions on, 56(6), 2010.
[24] Raghunandan Hulikal Keshavan. Efficient algorithms for collaborative filtering. PhD thesis, Stanford University, 2012.
[25] Jason D. Lee, Ben Recht, Nathan Srebro, Joel Tropp, and Ruslan R. Salakhutdinov. Practical large-scale optimization for max-norm regularization. In Advances in Neural Information Processing Systems, 2010.
[26] Bamdev Mishra, Gilles Meyer, Francis Bach, and Rodolphe Sepulchre. Low-rank optimization with trace norm penalty. SIAM Journal on Optimization, 23(4):2124–2149, 2013.
[27] Sahand Negahban and Martin J. Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. The Journal of Machine Learning Research, 13(1), 2012.
[28] Yurii Nesterov. Introductory lectures on convex optimization, volume 87. Springer Science & Business Media, 2004.
[29] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 2010.
[30] Nathan Srebro and Adi Shraibman. Rank, trace-norm and max-norm. In Learning Theory. Springer, 2005.
[31] Nathan Srebro, Jason Rennie, and Tommi S. Jaakkola. Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems, 2004.
[32] Ruoyu Sun. Matrix Completion via Nonconvex Factorization: Algorithms and Theory. PhD thesis, University of Minnesota Twin Cities, 2015.
[33] Ruoyu Sun and Zhi-Quan Luo. Guaranteed matrix completion via nonconvex factorization. In Foundations of Computer Science, IEEE 56th Annual Symposium on, pages 270–289, 2015.
[34] Ryota Tomioka, Kohei Hayashi, and Hisashi Kashima. Estimation of low-rank tensors via convex optimization. arXiv preprint, 2010.
[35] Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi, and Benjamin Recht. Low-rank solutions of linear matrix equations via Procrustes flow. In International Conference on Machine Learning, 2016.
[36] Bart Vandereycken. Low-rank matrix completion by Riemannian optimization. SIAM Journal on Optimization, 23(2), 2013.
[37] Ke Wei, Jian-Feng Cai, Tony F. Chan, and Shingyu Leung. Guarantees of Riemannian optimization for low rank matrix completion. arXiv preprint, 2016.
[38] Xinyang Yi, Dohyung Park, Yudong Chen, and Constantine Caramanis. Fast algorithms for robust PCA via gradient descent. In Advances in Neural Information Processing Systems, 2016.
[39] Tuo Zhao, Zhaoran Wang, and Han Liu. A nonconvex optimization framework for low rank matrix estimation. In Advances in Neural Information Processing Systems, 2015.
[40] Qinqing Zheng and John Lafferty. A convergent gradient descent algorithm for rank minimization and semidefinite programming from random linear measurements. In Advances in Neural Information Processing Systems, 2015.


Low rank matrix completion by alternating steepest descent methods Low rank matrix completion by alternating steepest descent methods Jared Tanner a, Ke Wei b a Mathematical Institute, University of Oxford, Andrew Wiles Building, Radcliffe Observatory Quarter, Woodstock

More information

Conditions for Robust Principal Component Analysis

Conditions for Robust Principal Component Analysis Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and

More information

Maximum Margin Matrix Factorization

Maximum Margin Matrix Factorization Maximum Margin Matrix Factorization Nati Srebro Toyota Technological Institute Chicago Joint work with Noga Alon Tel-Aviv Yonatan Amit Hebrew U Alex d Aspremont Princeton Michael Fink Hebrew U Tommi Jaakkola

More information

Lecture Notes 9: Constrained Optimization

Lecture Notes 9: Constrained Optimization Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form

More information

Fundamental Limits of Weak Recovery with Applications to Phase Retrieval

Fundamental Limits of Weak Recovery with Applications to Phase Retrieval Proceedings of Machine Learning Research vol 75:1 6, 2018 31st Annual Conference on Learning Theory Fundamental Limits of Weak Recovery with Applications to Phase Retrieval Marco Mondelli Department of

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage

Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage Madeleine Udell Operations Research and Information Engineering Cornell University Based on joint work with Alp Yurtsever (EPFL),

More information

Low-rank Solutions of Linear Matrix Equations via Procrustes Flow

Low-rank Solutions of Linear Matrix Equations via Procrustes Flow Low-rank Solutions of Linear Matrix Equations via Procrustes low Stephen Tu Mahdi Soltanolkotabi Ross Boczar Benjamin Recht July 11, 2015 Abstract In this paper we study the problem of recovering an low-rank

More information

Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit

Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit Arvind Ganesh, John Wright, Xiaodong Li, Emmanuel J. Candès, and Yi Ma, Microsoft Research Asia, Beijing, P.R.C Dept. of Electrical

More information

Tractable Upper Bounds on the Restricted Isometry Constant

Tractable Upper Bounds on the Restricted Isometry Constant Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.

More information

Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization

Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization Forty-Fifth Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 26-28, 27 WeA3.2 Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization Benjamin

More information

arxiv: v3 [math.oc] 19 Oct 2017

arxiv: v3 [math.oc] 19 Oct 2017 Gradient descent with nonconvex constraints: local concavity determines convergence Rina Foygel Barber and Wooseok Ha arxiv:703.07755v3 [math.oc] 9 Oct 207 0.7.7 Abstract Many problems in high-dimensional

More information

Fast recovery from a union of subspaces

Fast recovery from a union of subspaces Fast recovery from a union of subspaces Chinmay Hegde Iowa State University Piotr Indyk MIT Ludwig Schmidt MIT Abstract We address the problem of recovering a high-dimensional but structured vector from

More information

Median-Truncated Gradient Descent: A Robust and Scalable Nonconvex Approach for Signal Estimation

Median-Truncated Gradient Descent: A Robust and Scalable Nonconvex Approach for Signal Estimation Median-Truncated Gradient Descent: A Robust and Scalable Nonconvex Approach for Signal Estimation Yuejie Chi, Yuanxin Li, Huishuai Zhang, and Yingbin Liang Abstract Recent work has demonstrated the effectiveness

More information

Recovery of Simultaneously Structured Models using Convex Optimization

Recovery of Simultaneously Structured Models using Convex Optimization Recovery of Simultaneously Structured Models using Convex Optimization Maryam Fazel University of Washington Joint work with: Amin Jalali (UW), Samet Oymak and Babak Hassibi (Caltech) Yonina Eldar (Technion)

More information

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725 Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

More information

Self-Calibration and Biconvex Compressive Sensing

Self-Calibration and Biconvex Compressive Sensing Self-Calibration and Biconvex Compressive Sensing Shuyang Ling Department of Mathematics, UC Davis July 12, 2017 Shuyang Ling (UC Davis) SIAM Annual Meeting, 2017, Pittsburgh July 12, 2017 1 / 22 Acknowledgements

More information

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms François Caron Department of Statistics, Oxford STATLEARN 2014, Paris April 7, 2014 Joint work with Adrien Todeschini,

More information

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Yin Zhang Technical Report TR05-06 Department of Computational and Applied Mathematics Rice University,

More information

A New Theory for Matrix Completion

A New Theory for Matrix Completion A New Theory for Matrix Completion Guangcan Liu Qingshan Liu Xiao-Tong Yuan B-DAT, School of Information & Control, Nanjing Univ Informat Sci & Technol NO 29 Ningliu Road, Nanjing, Jiangsu, China, 20044

More information

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Tao Wu Institute for Mathematics and Scientific Computing Karl-Franzens-University of Graz joint work with Prof.

More information

Noisy Streaming PCA. Noting g t = x t x t, rearranging and dividing both sides by 2η we get

Noisy Streaming PCA. Noting g t = x t x t, rearranging and dividing both sides by 2η we get Supplementary Material A. Auxillary Lemmas Lemma A. Lemma. Shalev-Shwartz & Ben-David,. Any update of the form P t+ = Π C P t ηg t, 3 for an arbitrary sequence of matrices g, g,..., g, projection Π C onto

More information

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval and Matrix Completion

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval and Matrix Completion : Gradient Descent Converges Linearly for Phase Retrieval and Matrix Completion Cong Ma 1 Kaizheng Wang 1 Yuejie Chi Yuxin Chen 3 Abstract Recent years have seen a flurry of activities in designing provably

More information

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Adrien Todeschini Inria Bordeaux JdS 2014, Rennes Aug. 2014 Joint work with François Caron (Univ. Oxford), Marie

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Linear dimensionality reduction for data analysis

Linear dimensionality reduction for data analysis Linear dimensionality reduction for data analysis Nicolas Gillis Joint work with Robert Luce, François Glineur, Stephen Vavasis, Robert Plemmons, Gabriella Casalino The setup Dimensionality reduction for

More information

Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization

Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization Lingchen Kong and Naihua Xiu Department of Applied Mathematics, Beijing Jiaotong University, Beijing, 100044, People s Republic of China E-mail:

More information

Sparse and Low-Rank Matrix Decompositions

Sparse and Low-Rank Matrix Decompositions Forty-Seventh Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 30 - October 2, 2009 Sparse and Low-Rank Matrix Decompositions Venkat Chandrasekaran, Sujay Sanghavi, Pablo A. Parrilo,

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

Dropping Convexity for Faster Semi-definite Optimization

Dropping Convexity for Faster Semi-definite Optimization JMLR: Workshop and Conference Proceedings vol 49: 53, 06 Dropping Convexity for Faster Semi-definite Optimization Srinadh Bhojanapalli Toyota Technological Institute at Chicago Anastasios Kyrillidis Sujay

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Nonconvex Low Rank Matrix Factorization via Inexact First Order Oracle

Nonconvex Low Rank Matrix Factorization via Inexact First Order Oracle Nonconvex Low Ran Matrix Factorization via Inexact First Order Oracle Tuo Zhao Zhaoran Wang Han Liu Abstract We study the low ran matrix factorization problem via nonconvex optimization Compared with the

More information

Convex relaxation. In example below, we have N = 6, and the cut we are considering

Convex relaxation. In example below, we have N = 6, and the cut we are considering Convex relaxation The art and science of convex relaxation revolves around taking a non-convex problem that you want to solve, and replacing it with a convex problem which you can actually solve the solution

More information

Research Statement Qing Qu

Research Statement Qing Qu Qing Qu Research Statement 1/4 Qing Qu (qq2105@columbia.edu) Today we are living in an era of information explosion. As the sensors and sensing modalities proliferate, our world is inundated by unprecedented

More information

Breaking the Limits of Subspace Inference

Breaking the Limits of Subspace Inference Breaking the Limits of Subspace Inference Claudia R. Solís-Lemus, Daniel L. Pimentel-Alarcón Emory University, Georgia State University Abstract Inferring low-dimensional subspaces that describe high-dimensional,

More information

Non-convex Robust PCA: Provable Bounds

Non-convex Robust PCA: Provable Bounds Non-convex Robust PCA: Provable Bounds Anima Anandkumar U.C. Irvine Joint work with Praneeth Netrapalli, U.N. Niranjan, Prateek Jain and Sujay Sanghavi. Learning with Big Data High Dimensional Regime Missing

More information

On Iterative Hard Thresholding Methods for High-dimensional M-Estimation

On Iterative Hard Thresholding Methods for High-dimensional M-Estimation On Iterative Hard Thresholding Methods for High-dimensional M-Estimation Prateek Jain Ambuj Tewari Purushottam Kar Microsoft Research, INDIA University of Michigan, Ann Arbor, USA {prajain,t-purkar}@microsoft.com,

More information

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3 MI 6.97 Algebraic techniques and semidefinite optimization February 4, 6 Lecture 3 Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo In this lecture, we will discuss one of the most important applications

More information

Collaborative Filtering Matrix Completion Alternating Least Squares

Collaborative Filtering Matrix Completion Alternating Least Squares Case Study 4: Collaborative Filtering Collaborative Filtering Matrix Completion Alternating Least Squares Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 19, 2016

More information

arxiv: v1 [math.oc] 11 Jun 2009

arxiv: v1 [math.oc] 11 Jun 2009 RANK-SPARSITY INCOHERENCE FOR MATRIX DECOMPOSITION VENKAT CHANDRASEKARAN, SUJAY SANGHAVI, PABLO A. PARRILO, S. WILLSKY AND ALAN arxiv:0906.2220v1 [math.oc] 11 Jun 2009 Abstract. Suppose we are given a

More information

CSC 576: Variants of Sparse Learning

CSC 576: Variants of Sparse Learning CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in

More information

Structured matrix factorizations. Example: Eigenfaces

Structured matrix factorizations. Example: Eigenfaces Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix

More information

Tighter Low-rank Approximation via Sampling the Leveraged Element

Tighter Low-rank Approximation via Sampling the Leveraged Element Tighter Low-rank Approximation via Sampling the Leveraged Element Srinadh Bhojanapalli The University of Texas at Austin bsrinadh@utexas.edu Prateek Jain Microsoft Research, India prajain@microsoft.com

More information

Lecture 9: September 28

Lecture 9: September 28 0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

Gradient Descent Can Take Exponential Time to Escape Saddle Points

Gradient Descent Can Take Exponential Time to Escape Saddle Points Gradient Descent Can Take Exponential Time to Escape Saddle Points Simon S. Du Carnegie Mellon University ssdu@cs.cmu.edu Jason D. Lee University of Southern California jasonlee@marshall.usc.edu Barnabás

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the

More information

Rank minimization via the γ 2 norm

Rank minimization via the γ 2 norm Rank minimization via the γ 2 norm Troy Lee Columbia University Adi Shraibman Weizmann Institute Rank Minimization Problem Consider the following problem min X rank(x) A i, X b i for i = 1,..., k Arises

More information