Robust Matrix Decomposition with Sparse Corruptions

Daniel Hsu, Sham M. Kakade, and Tong Zhang

D. Hsu is with Microsoft Research New England, Cambridge, MA, USA (dahsu@microsoft.com). S. M. Kakade is with Microsoft Research New England, Cambridge, MA, USA, and also with the Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA (skakade@wharton.upenn.edu); this author was partially supported by NSF grant IIS-0865. T. Zhang is with the Department of Statistics, Rutgers University (tzhang@stat.rutgers.edu); this author was partially supported by the following grants: AFOSR FA, NSA-AMS 08024, NSF DMS, and NSF IIS-0606. Copyright (c) 2011 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Abstract: Suppose a given observation matrix can be decomposed as the sum of a low-rank matrix and a sparse matrix, and the goal is to recover these individual components from the observed sum. Such additive decompositions have applications in a variety of numerical problems including system identification, latent variable graphical modeling, and principal components analysis. We study conditions under which recovering such a decomposition is possible via a combination of l1 norm and trace norm minimization. We are specifically interested in the question of how many sparse corruptions are allowed so that convex programming can still achieve accurate recovery, and we obtain stronger recovery guarantees than previous studies. Moreover, we do not assume that the spatial pattern of corruptions is random, which stands in contrast to related analyses under such assumptions via matrix completion.

Index Terms: Matrix decompositions, sparsity, low-rank, outliers.

I. INTRODUCTION

THIS work studies additive decompositions of matrices into sparse and low-rank components. Such decompositions have found applications in a variety of numerical problems, including system identification [1], latent variable graphical modeling [2], and principal component analysis (PCA) [3]. In these settings, the user has an input matrix Y ∈ R^{m×n} which is believed to be the sum of a sparse matrix X̄_S and a low-rank matrix X̄_L. For instance, in the application to PCA, X̄_L represents a matrix of m data points from a low-dimensional subspace of R^n, and it is corrupted by a sparse matrix X̄_S of errors before being observed as

Y = X̄_S (sparse) + X̄_L (low-rank).

The goal is to recover the original data matrix X̄_L and the error components X̄_S from the corrupted observations Y.

In the latent variable model application of Chandrasekaran et al. [2], Y represents the precision matrix over the visible nodes of a Gaussian graphical model, and X̄_S represents the precision matrix over the visible nodes when conditioned on the hidden nodes. In general, Y may be dense as a result of dependencies between visible nodes through the hidden nodes. However, X̄_S will be sparse when the visible nodes are mostly independent after conditioning on the hidden nodes, and the difference X̄_L = Y − X̄_S will be low-rank when the number of hidden nodes is small. The goal is then to infer the relevant dependency structure from just the visible nodes and measurements of their correlations.

Even if the matrix Y is exactly the sum of a sparse matrix X̄_S and a low-rank matrix X̄_L, it may be impossible to identify these components from the sum. For instance, the sparse matrix X̄_S may itself be low-rank, or the low-rank matrix X̄_L may itself be sparse. In such cases, the components may be confused for each other, and thus the desired decomposition of Y may not be identifiable.
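To make the ambiguity concrete (this illustration is ours, not from the paper): if M is any non-zero matrix that is simultaneously sparse and low-rank, say M = e_1 e_1^⊤ with a single non-zero entry, then

Y = X̄_S + X̄_L = (X̄_S + t·M) + (X̄_L − t·M)  for every t ∈ R,

so the observed sum Y is consistent with infinitely many sparse-plus-low-rank splittings, and no method, convex or otherwise, can single out the intended one without further conditions.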
Therefore, one must impose conditions on the sparse and low-rank components in order to guarantee their identifiability from Y.

We present sufficient conditions under which X̄_S and X̄_L are identifiable from the sum Y. Essentially, we require that X̄_S not be too dense in any single row or column, and that the singular vectors of X̄_L not be too sparse. The levels of denseness and sparseness are considered jointly in the conditions in order to obtain the weakest possible conditions. Under a mild strengthening of the condition, we also show that X̄_S and X̄_L can be recovered by solving certain convex programs, and that the solution is robust under small perturbations of Y. The first program we consider is

min λ‖X_S‖_{vec(1)} + ‖X_L‖_*

subject to certain feasibility constraints such as ‖X_S + X_L − Y‖ ≤ ε, where ‖·‖_{vec(1)} is the entry-wise 1-norm and ‖·‖_* is the trace norm. These norms are natural convex surrogates for the sparsity of X_S and the rank of X_L [4], [5], which are generally intractable to optimize. We also consider a regularized formulation

min (1/(2µ))‖X_S + X_L − Y‖²_{vec(2)} + λ‖X_S‖_{vec(1)} + ‖X_L‖_*

where ‖·‖_{vec(2)} is the Frobenius norm; this formulation may be more suitable in certain applications and enjoys different recovery guarantees.

A. Related work

Our work closely follows that of Chandrasekaran et al. [1], who initiated the study of rank-sparsity incoherence and its application to matrix decompositions. There, the authors identify parameters that characterize the incoherence of X̄_S and X̄_L sufficient to guarantee identifiability and recovery using convex programs. However, their analysis of this characterization yields conditions that are significantly stronger than those given in our present work. For instance, the allowed fraction of non-zero entries in X̄_S is quickly vanishing as a function of the matrix size, even under the most favorable conditions on X̄_L; our analysis does not have this restriction and allows X̄_S to have up to Ω(mn) non-zero entries when X̄_L is low-rank and has non-sparse singular vectors. In terms of the PCA application, our analysis allows for up to a constant fraction of the data matrix entries to be corrupted by noise of arbitrary magnitude, while the analysis of [1] requires that this fraction decrease as a function of the matrix dimensions. Moreover, [1] only considers exact decompositions, which may be unrealistic in certain applications; we allow for approximate decompositions, and study the effect of perturbations on the accuracy of the recovered components.

The application to principal component analysis with gross sparse errors was studied by Candès et al. [3], building on previous results and analysis techniques for the related matrix completion problem (e.g., [6], [7]). The sparse errors model of [3] requires that the support of the sparse matrix X̄_S be random, which can be unrealistic in some settings. However, the conditions are significantly weaker than those of [1]: for instance, they allow for Ω(mn) non-zero entries in X̄_S. Our work makes no probabilistic assumption on the sparsity pattern of X̄_S and instead studies purely deterministic structural conditions. The price we pay, however, is roughly a factor of rank(X̄_L) in what is allowed for the support size of X̄_S relative to the probabilistic analysis of [3]. Narrowing this gap with alternative deterministic conditions is an interesting open problem. Follow-up work to [3] studies the robustness of the recovery procedure [8], as well as quantitatively weaker conditions on X̄_S [9], but these works are only considered under the random support model. Our work is therefore largely complementary to these probabilistic analyses.

B. Outline

We describe our main results in Section II. In Section III, we review a number of technical tools, such as matrix operator norms, that are used to characterize the rank-sparsity incoherence properties of the desired decomposition. Section IV analyzes these incoherence properties in detail, giving sufficient conditions for identifiability as well as for certifying the approximate optimality of a target decomposition for our optimization formulations. The main recovery guarantees are proved in Sections V and VI.

II. MAIN RESULTS

Fix an observation matrix Y ∈ R^{m×n}. Our goal is to approximately decompose the matrix Y into the sum of a sparse matrix X̄_S and a low-rank matrix X̄_L.

A. Optimization formulations

We consider two convex
optimization problems over (X_S, X_L) ∈ R^{m×n} × R^{m×n}. The first is the constrained formulation, parametrized by λ > 0, ε_{vec(1)} ≥ 0, and ε_* ≥ 0:

min  λ‖X_S‖_{vec(1)} + ‖X_L‖_*
s.t. ‖X_S + X_L − Y‖_{vec(1)} ≤ ε_{vec(1)}
     ‖X_S + X_L − Y‖_* ≤ ε_*                (1)

where ‖·‖_{vec(1)} is the entry-wise 1-norm and ‖·‖_* is the trace norm (i.e., the sum of singular values). The second is the regularized formulation with regularization parameter µ > 0:

min  (1/(2µ))‖X_S + X_L − Y‖²_{vec(2)} + λ‖X_S‖_{vec(1)} + ‖X_L‖_*                (2)

where ‖·‖_{vec(2)} is the Frobenius norm (entry-wise 2-norm). We also consider adding a constraint to control ‖X_L‖_{vec(∞)}, the entry-wise ∞-norm of X_L. To (1), we add the constraint ‖X_L‖_{vec(∞)} ≤ b; to (2), we add ‖X_S − Y‖_{vec(∞)} ≤ b.
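The paper analyzes the minimizers of (1) and (2) but does not prescribe a particular solver. For concreteness, the following is a minimal proximal-gradient sketch for the regularized formulation (2), omitting the optional box constraints just mentioned; it uses soft-thresholding for the entry-wise 1-norm and singular-value thresholding for the trace norm. The function and variable names are our own choices, not from the paper, and in practice one would add a stopping criterion or use an accelerated or ADMM variant.

```python
import numpy as np

def soft_threshold(A, t):
    """Entry-wise soft-thresholding: prox of t * ||.||_vec(1)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def svd_threshold(A, t):
    """Singular-value thresholding: prox of t * ||.||_* (trace norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def regularized_decomposition(Y, lam, mu, n_iter=500):
    """Proximal gradient (ISTA) for
       min (1/(2*mu))*||X_S + X_L - Y||_F^2 + lam*||X_S||_vec(1) + ||X_L||_*.
    """
    X_S = np.zeros_like(Y)
    X_L = np.zeros_like(Y)
    step = mu / 2.0                       # 1/L for the smooth term (L = 2/mu)
    for _ in range(n_iter):
        R = (X_S + X_L - Y) / mu          # gradient of the smooth term w.r.t. both blocks
        X_S = soft_threshold(X_S - step * R, step * lam)
        X_L = svd_threshold(X_L - step * R, step)
    return X_S, X_L
```

This is only a sketch of one standard way to solve such composite objectives; the guarantees below concern the exact minimizers of (1) and (2), not any particular iterative scheme.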

The parameter b is intended as a natural bound for ‖X̄_L‖_{vec(∞)} and is typically known in applications. For example, in image processing, the values of interest may lie in the interval [0, 255] (say); hence, we might take b = 500 as a relaxation of the box constraint [0, 255]. The core of our analyses does not rely on these additional constraints; we only consider them to obtain improved robustness guarantees for recovering X̄_L, which may be important in some applications.

B. Identifiability conditions

Our first result is a refinement of the rank-sparsity incoherence notion developed by [1]. We characterize a target decomposition of Y into Y = X̄_S + X̄_L by the projection operators to subspaces associated with X̄_S and X̄_L. Let

Ω̄ = Ω(X̄_S) := {X ∈ R^{m×n} : supp(X) ⊆ supp(X̄_S)}

be the space of matrices whose supports are subsets of the support of X̄_S, and let P_Ω̄ be the orthogonal projector to Ω̄ under the inner product ⟨A, B⟩ = tr(A^⊤B); this projection is given by

[P_Ω̄ M]_{i,j} = M_{i,j} if (i, j) ∈ supp(X̄_S), and 0 otherwise,

for all i ∈ [m] := {1, …, m} and j ∈ [n] := {1, …, n}. Furthermore, let

T̄ = T(X̄_L) := {X_1 + X_2 ∈ R^{m×n} : range(X_1) ⊆ range(X̄_L), range(X_2^⊤) ⊆ range(X̄_L^⊤)}

be the span of matrices either with column-space contained in that of X̄_L, or with row-space contained in that of X̄_L. Let P_T̄ be the orthogonal projector to T̄, again under the inner product ⟨A, B⟩ = tr(A^⊤B); this projection is given by

P_T̄ M = ŪŪ^⊤M + MV̄V̄^⊤ − ŪŪ^⊤MV̄V̄^⊤

where Ū ∈ R^{m×r̄} and V̄ ∈ R^{n×r̄} are, respectively, the matrices of left and right orthonormal singular vectors corresponding to the non-zero singular values of X̄_L, and r̄ is the rank of X̄_L.

We will see that certain operator norms of P_Ω̄ and P_T̄ can be bounded in terms of structural properties of X̄_S and X̄_L. The first property measures the maximum number of non-zero entries in any row or column of X̄_S:

α(ρ) := max{ρ‖sign(X̄_S)‖_{1→1}, ρ^{-1}‖sign(X̄_S)‖_{∞→∞}}

where ‖M‖_{p→q} := max{‖Mv‖_q : v ∈ R^n, ‖v‖_p ≤ 1}, sign(M)_{i,j} is −1 if M_{i,j} < 0, 0 if M_{i,j} = 0, and +1 if M_{i,j} > 0 (for i ∈ [m], j ∈ [n]), and ρ > 0 is a balancing parameter to accommodate disparity between the number of rows and columns; a natural choice for the balancing parameter is ρ := √(n/m). We remark that ρ is only a parameter for the analysis; the optimization formulations do not directly involve ρ. Note that X̄_S may have Ω(mn) non-zero entries and α(√(n/m)) = O(√(mn)) as long as the non-zero entries of X̄_S are spread out over the entire matrix. Conversely, a sparse matrix with just O(m + n) non-zero entries could have α(√(n/m)) = √(mn) by having all of its non-zero entries in just a few rows and columns.

The second property measures the sparseness of the singular vectors of X̄_L:

β(ρ) := ρ^{-1}‖ŪŪ^⊤‖_{vec(∞)} + ρ‖V̄V̄^⊤‖_{vec(∞)} + ‖Ū‖_{2→∞}‖V̄‖_{2→∞}.

For instance, if the singular vectors of X̄_L are perfectly aligned with the coordinate axes, then β(ρ) = Ω(1). On the other hand, if the left and right singular vectors have entries bounded by √(c/m) and √(c/n), respectively, for some c ≥ 1, then β(√(n/m)) ≤ 3c r̄/√(mn).

Our main identifiability result is the following.

Theorem 1: If inf_{ρ>0} α(ρ)β(ρ) < 1, then Ω̄ ∩ T̄ = {0}.

Theorem 1 is an immediate consequence of the following lemma (also given as Lemma 10).

Lemma 1: For all M ∈ R^{m×n},

‖P_Ω̄ P_T̄ M‖_{vec(∞)} ≤ inf_{ρ>0} α(ρ)β(ρ) ‖M‖_{vec(∞)}.

Proof of Theorem 1: Take any M ∈ Ω̄ ∩ T̄. By Lemma 1, ‖P_Ω̄ P_T̄ M‖_{vec(∞)} ≤ α(ρ)β(ρ)‖M‖_{vec(∞)}. On the other hand, P_Ω̄ P_T̄ M = M, so α(ρ)β(ρ) < 1 implies ‖M‖_{vec(∞)} = 0, i.e., M = 0.

Clearly, if Ω̄ ∩ T̄ contains a matrix other than 0, then {(X̄_S + M, X̄_L − M) : M ∈ Ω̄ ∩ T̄} gives a family of sparse/low-rank decompositions of Y = X̄_S + X̄_L with at least the same sparsity and rank as (X̄_S, X̄_L). Conversely, if Ω̄ ∩ T̄ = {0}, then any matrix in the direct sum Ω̄ ⊕ T̄ has exactly one decomposition into a matrix A ∈ Ω̄ plus a matrix B ∈ T̄; in this sense (X̄_S, X̄_L) is identifiable.
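To make these definitions concrete, the following is a small numpy sketch (ours, not from the paper) that forms the projections P_Ω̄ M and P_T̄ M for a given target pair and evaluates α(ρ) and β(ρ); it can be used to check the contraction of Lemma 1 numerically on small examples. The function names, the tolerance parameter, and the random test instance are our own choices.

```python
import numpy as np

def proj_Omega(M, X_S_bar, tol=1e-12):
    """P_Omega: keep the entries of M on the support of X_S_bar, zero out the rest."""
    return np.where(np.abs(X_S_bar) > tol, M, 0.0)

def proj_T(M, X_L_bar, tol=1e-12):
    """P_T: project M onto the span of matrices sharing row/column space with X_L_bar."""
    U, s, Vt = np.linalg.svd(X_L_bar, full_matrices=False)
    r = int((s > tol).sum())
    U, V = U[:, :r], Vt[:r, :].T
    UU, VV = U @ U.T, V @ V.T
    return UU @ M + M @ VV - UU @ M @ VV

def alpha(X_S_bar, rho, tol=1e-12):
    """alpha(rho): weighted max number of non-zeros per column / per row of X_S_bar."""
    S = np.abs(X_S_bar) > tol
    per_col = S.sum(axis=0).max()   # ||sign(X_S)||_{1->1}
    per_row = S.sum(axis=1).max()   # ||sign(X_S)||_{inf->inf}
    return max(rho * per_col, per_row / rho)

def beta(X_L_bar, rho, tol=1e-12):
    """beta(rho): coherence of the singular vectors of X_L_bar."""
    U, s, Vt = np.linalg.svd(X_L_bar, full_matrices=False)
    r = int((s > tol).sum())
    U, V = U[:, :r], Vt[:r, :].T
    return (np.abs(U @ U.T).max() / rho
            + rho * np.abs(V @ V.T).max()
            + np.linalg.norm(U, axis=1).max() * np.linalg.norm(V, axis=1).max())

# Numerical check of the Lemma 1 bound on a random small instance (a sanity check, not a proof).
rng = np.random.default_rng(0)
m, n, r = 40, 60, 2
X_L_bar = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
X_S_bar = np.zeros((m, n))
idx = rng.choice(m * n, size=100, replace=False)
X_S_bar.flat[idx] = rng.standard_normal(100)
rho = np.sqrt(n / m)
M = rng.standard_normal((m, n))
lhs = np.abs(proj_Omega(proj_T(M, X_L_bar), X_S_bar)).max()
rhs = alpha(X_S_bar, rho) * beta(X_L_bar, rho) * np.abs(M).max()
print(lhs <= rhs + 1e-9)   # the bound holds for each rho (it may be loose if alpha*beta >= 1)
```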
Note that, as we have argued above, the condition inf_{ρ>0} α(ρ)β(ρ) < 1 may be achieved even by matrices X̄_S with Ω(mn) non-zero entries, provided that the non-zero entries of X̄_S are sufficiently spread out and that X̄_L is low-rank and has singular vectors far

from the coordinate basis. This is in contrast with the conditions studied by [1]. Their analysis uses a different characterization of X̄_S and X̄_L, which leads to a stronger identifiability condition in certain cases. Roughly, if X̄_S has an approximately symmetric sparsity pattern (so sign(X̄_S) ≈ sign(X̄_S)^⊤), then [1] requires α(1)β(1) < 1 for square n × n matrices. Since β(1) = Ω(1/√n) for any X̄_L ∈ R^{n×n}, the condition forces α(1) = O(√n), and hence X̄_S can have at most O(n√n) non-zero entries. In other words, the fraction of non-zero entries allowed in X̄_S by the condition α(1)β(1) < 1 is quickly vanishing as a function of n.

C. Recovery guarantees

Our next results are guarantees for approximately recovering the sparse/low-rank decomposition (X̄_S, X̄_L) from Y ≈ X̄_S + X̄_L by solving either of the convex optimization problems (1) or (2). We require a mild strengthening of the condition inf_{ρ>0} α(ρ)β(ρ) < 1, as well as appropriate settings of λ > 0 and µ > 0, for our recovery guarantees. Before continuing, we first define another property of X̄_L:

γ := ‖ŪV̄^⊤‖_{vec(∞)}

which is approximately the same as (in fact, bounded above by) the third term in the definition of β(ρ).

The quantities α(ρ), β(ρ), and γ are central to our analysis. Therefore we state the following proposition for reference, which provides a more intuitive understanding of their behavior. We note that this is the only part in which any explicit dimensional dependencies come into our analysis.

Proposition 1: Let m_0 be the maximum number of non-zero entries of X̄_S per column, and let n_0 be the maximum number of non-zero entries of X̄_S per row. Let r̄ be the rank of ŪV̄^⊤. Assume further that m_0 ≤ c_1 m/r̄ and n_0 ≤ c_1 n/r̄ for some c_1 ∈ (0, 1), and that ‖Ū‖_{vec(∞)} ≤ √(c_2/m) and ‖V̄‖_{vec(∞)} ≤ √(c_2/n) for some c_2 > 0. Then, with ρ = √(n/m), we have

α(ρ) ≤ c_1 √(mn)/r̄,   β(ρ) ≤ 3c_2 r̄/√(mn),   γ ≤ c_2 r̄/√(mn).

We remark that [1] does not explicitly work out the non-square case, but claims that n can be replaced in their analysis by the larger matrix dimension max{m, n}. However, this does not seem possible, and the analysis there should only lead to the quite suboptimal dimensionality dependency min{m, n}. This is because a rectangular matrix X̄_L will have left and right singular vectors of different dimensions and thus different allowable ranges of infinity norms.

We now proceed with conditions for the regularized formulation (2). Let E := Y − (X̄_S + X̄_L), and define

ε_{2→2} := ‖E‖_{2→2}   and   ε_{vec(∞)} := ‖E‖_{vec(∞)} + ‖P_T̄ E‖_{vec(∞)}.

We require the following, for some ρ > 0 and c > 1:

α(ρ)β(ρ) < 1                                                                    (3)

λ ≤ (1 − α(ρ)β(ρ))(1 − c µ^{-1} ε_{2→2}) / (c α(ρ)) − µ^{-1} ε_{vec(∞)} − γ      (4)

λ ≥ c (γ + µ^{-1}(2 − α(ρ)β(ρ)) ε_{vec(∞)}) / (1 − α(ρ)β(ρ) − c α(ρ)β(ρ)) > 0    (5)

For instance, if for some ρ > 0,

α(ρ)γ ≤ 1/12   and   α(ρ)β(ρ) ≤ 1/4,                                            (6)

then the conditions are satisfied for c = 2 provided that µ and λ are chosen to satisfy

µ ≥ max{4 ε_{2→2}, 5 ε_{vec(∞)}/(2λ)}   and   (5/2)γ ≤ λ ≤ 5/(82 α(ρ)).          (7)

Note that (6) can be satisfied when c_1 c_2 ≤ 1/12 in Proposition 1.

For the constrained formulation (1), our analysis requires the same conditions as above, except with E set to 0. Note that our analysis still allows for approximate decompositions; it is only the conditions that are formulated with E = 0. Specifically, we require, for some ρ > 0 and c > 1:

α(ρ)β(ρ) < 1                                                                    (8)

λ ≤ (1 − α(ρ)β(ρ) − c α(ρ)γ) / (c α(ρ))                                          (9)

λ ≥ c γ / (1 − α(ρ)β(ρ) − c α(ρ)β(ρ)) > 0                                        (10)

For instance, if for some ρ > 0,

α(ρ)γ ≤ 1/15   and   α(ρ)β(ρ) ≤ 1/5,                                            (11)

then the conditions are satisfied for c = 2 provided that λ is chosen to satisfy

5γ ≤ λ ≤ 1/(3 α(ρ)).                                                             (12)

Note that (11) can be satisfied when c_1 c_2 ≤ 1/15 in Proposition 1.
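As a quick numerical illustration of Proposition 1 (our own sketch; the constants c_1, c_2 and dimensions below are arbitrary choices, not values from the paper), the bounds can be instantiated as follows; the point is that the products α(ρ)β(ρ) and α(ρ)γ stay bounded by 3c_1c_2 and c_1c_2 regardless of the matrix size.

```python
import numpy as np

def proposition1_bounds(m, n, r_bar, c1, c2):
    """Bounds on alpha(rho), beta(rho), gamma from Proposition 1 with rho = sqrt(n/m).

    Assumes at most c1*m/r_bar corruptions per column and c1*n/r_bar per row, and
    singular-vector entries bounded by sqrt(c2/m) and sqrt(c2/n) respectively.
    """
    alpha_bound = c1 * np.sqrt(m * n) / r_bar
    beta_bound = 3 * c2 * r_bar / np.sqrt(m * n)
    gamma_bound = c2 * r_bar / np.sqrt(m * n)
    return alpha_bound, beta_bound, gamma_bound

# Example: a 1000 x 2000 matrix of rank 5.
a, b, g = proposition1_bounds(1000, 2000, 5, c1=0.05, c2=1.0)
print(a * b, a * g)   # -> 0.15, 0.05  (dimension-free products)
```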

In summary, Proposition 1 shows that our results can be applied even with m_0 = Ω(m/r̄) and n_0 = Ω(n/r̄) corruptions. In contrast, the results of [1] only apply under the condition max{m_0, n_0} = O(√(min{m, n}/r̄)), which is significantly stronger. Moreover, unlike the analysis of [3], we do not have to assume that supp(X̄_S) is random.

The following theorem gives our recovery guarantee for the constrained formulation (1).

Theorem 2: Fix a target pair (X̄_S, X̄_L) ∈ R^{m×n} × R^{m×n} satisfying

‖Y − (X̄_S + X̄_L)‖_{vec(1)} ≤ ε_{vec(1)}   and   ‖Y − (X̄_S + X̄_L)‖_* ≤ ε_*.

Assume the conditions (8), (9), (10) hold for some ρ > 0 and c > 1. Let (X̂_S, X̂_L) ∈ R^{m×n} × R^{m×n} be the solution to the convex optimization problem (1). We have

max{‖X̂_S − X̄_S‖_{vec(1)}, ‖X̂_L − X̄_L‖_{vec(1)}}
  ≤ [1 + (1 + 1/c)(2 − α(ρ)β(ρ))/(1 − α(ρ)β(ρ))] ε_{vec(1)} + [(1 + 1/c)(2 − α(ρ)β(ρ))/(1 − α(ρ)β(ρ))] ε_*/λ.

If, in addition, for some b ≥ ‖X̄_L‖_{vec(∞)}, either: (i) the optimization problem (1) is augmented with the constraint ‖X_L‖_{vec(∞)} ≤ b (letting X̃_L := X̂_L), or (ii) X̂_L is post-processed by replacing [X̂_L]_{i,j} with [X̃_L]_{i,j} := min{max{[X̂_L]_{i,j}, −b}, b} for all (i, j), then we also have

‖X̃_L − X̄_L‖_{vec(2)} ≤ min{‖X̂_L − X̄_L‖_{vec(1)}, √(2b ‖X̂_L − X̄_L‖_{vec(1)})}.

The proof of Theorem 2 is in Section V. It is clear that if Y = X̄_S + X̄_L, then we can set ε_{vec(1)} = ε_* = 0, and we obtain exact recovery: X̂_S = X̄_S and X̂_L = X̄_L. Moreover, any perturbation Y − (X̄_S + X̄_L) affects the accuracy of (X̂_S, X̂_L) in entry-wise 1-norm by an amount O(ε_{vec(1)} + ε_*/λ). Note that here, the parameter λ serves to balance the entry-wise 1-norm and trace norm of the perturbation in the same way it is used in the objective function of (1). So, for instance, if we have the simplified conditions (11), then we may choose λ = √(5γ/(3α(ρ))) to satisfy (12), upon which the error bound becomes

max{‖X̂_S − X̄_S‖_{vec(1)}, ‖X̂_L − X̄_L‖_{vec(1)}} = O(ε_{vec(1)} + √(α(ρ)/γ) ε_*).

It is possible to modify the constraints in (1) to use norms other than ‖·‖_{vec(1)} and ‖·‖_*; the analysis could, at the very least, be modified by simply using standard relationships to change between norms, although this may introduce new slack in the bounds. Finally, the second part of the theorem shows how the accuracy of X̂_L in Frobenius norm can be improved by adding an additional constraint or by post-processing the solution.

Now we state our recovery guarantees for the regularized formulation (2).

Theorem 3: Fix a target pair (X̄_S, X̄_L) ∈ R^{m×n} × R^{m×n}. Let E := Y − (X̄_S + X̄_L),

ε_{2→2} := ‖E‖_{2→2},   ε_{vec(∞)} := ‖E‖_{vec(∞)} + ‖P_T̄ E‖_{vec(∞)},   ε_* := ‖P_T̄ E‖_*.

Let k̄ := |supp(X̄_S)| and r̄ := rank(X̄_L). Assume the conditions (3), (4), (5) hold for some ρ > 0 and c > 1. Let (X̂_S, X̂_L) ∈ R^{m×n} × R^{m×n} be the solution to the convex optimization problem (2) augmented with the constraint ‖X_S − Y‖_{vec(∞)} ≤ b for some b ≥ ‖X̄_S − Y‖_{vec(∞)} (b = ∞ is allowed). Let

√r̃ := (λ + µ^{-1} ε_{vec(∞)}) √(2k̄) + [2α(ρ)(λ + γ + µ^{-1} ε_{vec(∞)})/(1 − α(ρ)β(ρ)) + 1 + 2µ^{-1} ε_{2→2}] √(2r̄).

We have

‖X̂_S − X̄_S‖_{vec(1)} ≤ [ (λ + γ + µ^{-1} ε_{vec(∞)} + 1 + 2µ^{-1} ε_{2→2}) √r̃ µ / ((1 − 1/c)λ) + λ√k̄ µ + 2√k̄ √r̃ µ + √k̄ ε_{vec(∞)} ] / (1 − α(ρ)β(ρ)),

‖X̂_S − X̄_S‖_{vec(2)} ≤ min{‖X̂_S − X̄_S‖_{vec(1)}, √(2b ‖X̂_S − X̄_S‖_{vec(1)})},

and

‖X̂_L − X̄_L‖_* ≤ 2√r̃ ‖X̂_S − X̄_S‖_{vec(2)} + ε_* + r̃ µ / (2(1 − 1/c)).

The proof of Theorem 3 is in Section VI. As before, if Y = X̄_S + X̄_L (so E = 0), then we can let µ → 0 and obtain exact recovery with X̂_S = X̄_S and X̂_L = X̄_L. When the perturbation E is non-zero, we control the accuracy of X̂_S in entry-wise 1-norm and 2-norm, and the accuracy of X̂_L in trace norm. Under the simplified conditions (6), we can choose λ = 5/(82 α(ρ))

and µ = max{4 ε_{2→2}, 5 ε_{vec(∞)}/(2λ)} to satisfy (7); this leads to the error bounds

‖X̂_S − X̄_S‖_{vec(1)} = O(√r̄ α(ρ) max{ε_{2→2}, α(ρ) ε_{vec(∞)}})

and

‖X̂_L − X̄_L‖_* = O(√r̄ min{√(b ‖X̂_S − X̄_S‖_{vec(1)}), ‖X̂_S − X̄_S‖_{vec(1)}} + ε_* + √r̄ max{ε_{2→2}, α(ρ) ε_{vec(∞)}});

here, we have used the facts k̄ ≤ α(ρ)², α(ρ)λ = Θ(1), and √r̃ = O(√r̄), which also imply that √k̄ ε_{vec(∞)} = O(α(ρ) · α(ρ) ε_{vec(∞)}). Finally, note that if the constraint ‖X_S − Y‖_{vec(∞)} ≤ b is added (i.e., b < ∞), then the requirement b ≥ ‖X̄_S − Y‖_{vec(∞)} can be satisfied with b := ‖X̄_L‖_{vec(∞)} + ε_{vec(∞)}. This allows for a possibly improved bound on ‖X̂_L − X̄_L‖_*.

Our analysis centers around the construction of a dual certificate using a least-squares method similar to that in related works [1], [3]. The construction requires the invertibility of I − P_Ω̄ P_T̄ (a composition of projection operators), which is established in our analysis by studying certain operator norms of P_Ω̄ and P_T̄; in previous works, invertibility is established only under probabilistic assumptions [3] or stricter sparsity conditions [1]. The rest of the analysis then relates the accuracy of the solutions to (1) and (2) to properties of the constructed dual certificate.

D. Examples

We illustrate our main results with some simple examples.

1) Random models: We first consider a random model for the matrices X̄_S and X̄_L [1]. Let the support of X̄_S be chosen uniformly at random k times over the [m] × [n] matrix entries (so that one entry can be selected multiple times). The values of the entries in the chosen support can be arbitrary. With high probability, we have

‖sign(X̄_S)‖_{1→1} = O(k log(n)/n)   and   ‖sign(X̄_S)‖_{∞→∞} = O(k log(m)/m),

so for ρ := √(n log(m)/(m log(n))), we have

α(ρ) = O(k √(log(m) log(n)) / √(mn)).

The logarithmic factors are due to collisions in the random process. Now let Ū and V̄ be chosen uniformly at random over all families of r̄ orthonormal vectors in R^m and R^n, respectively. Using arguments similar to those in [6], one can show that with high probability,

‖ŪŪ^⊤‖_{vec(∞)} = O(r̄ log(m)/m),   ‖V̄V̄^⊤‖_{vec(∞)} = O(r̄ log(n)/n),
‖Ū‖_{2→∞} = O(√(r̄ log(m)/m)),   ‖V̄‖_{2→∞} = O(√(r̄ log(n)/n)),

so for the previously chosen ρ, we have

β(ρ) = O(r̄ √(log(m) log(n)) / √(mn))   and   γ = O(r̄ √(log(m) log(n)) / √(mn)).

Therefore

α(ρ)β(ρ) = O(k r̄ log(m) log(n) / (mn))   and   α(ρ)γ = O(k r̄ log(m) log(n) / (mn)),

both of which are sufficiently small provided that

k ≤ δ mn / (r̄ log(m) log(n))

for a small enough constant δ ∈ (0, 1). In other words, when X̄_L is low-rank, the matrix X̄_S can have nearly a constant fraction of its entries be non-zero while still allowing for exact decomposition of Y = X̄_S + X̄_L. Our guarantee improves over that of [1] by roughly a factor of Ω((mn)^{1/4}), but is worse by a factor of r̄ log(m) log(n) relative to the guarantees of [3] for the random model. Therefore there is a gap between our generic deterministic analysis and a direct probabilistic analysis of this random model, and this gap seems unavoidable with sparsity conditions based on α(ρ). This is because X̄_L could be (an n × n, for simplicity) block diagonal matrix with r̄ blocks of (n/r̄) × (n/r̄) rank-1 matrices; such a matrix guarantees β(1) = O(r̄/n) but has just n²/r̄ non-zero entries. It is an interesting open problem to find alternative characterizations of supp(X̄_S) that can narrow or close this gap.
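The random model above is easy to simulate. The following sketch is our own (it assumes the regularized_decomposition solver sketched in Section II-A is in scope, and the values of λ and µ are picked by hand for illustration rather than from the theory); it generates a random sparse-plus-low-rank instance with E = 0 and reports how well the low-rank component is recovered.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, k = 200, 200, 3, 2000          # dimensions, rank, number of corruptions

# Low-rank component with random (hence, with high probability, incoherent) factors.
X_L_bar = rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) / np.sqrt(r)

# Sparse component: k entries chosen uniformly at random, with arbitrary magnitudes.
X_S_bar = np.zeros((m, n))
idx = rng.choice(m * n, size=k, replace=False)
X_S_bar.flat[idx] = 10.0 * rng.standard_normal(k)

Y = X_S_bar + X_L_bar                   # exact decomposition (E = 0)

# Hand-picked parameters for this illustration only; the theory's choices depend on
# alpha(rho), beta(rho), and gamma as discussed above.
lam = 1.0 / np.sqrt(max(m, n))
mu = 0.01
X_S_hat, X_L_hat = regularized_decomposition(Y, lam, mu, n_iter=500)

rel_err = np.linalg.norm(X_L_hat - X_L_bar) / np.linalg.norm(X_L_bar)
print(f"relative Frobenius error of the low-rank part: {rel_err:.3f}")
```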

2) Principal component analysis with sparse corruptions: Suppose X̄_L is a matrix of m data points lying in a low-dimensional subspace of R^n, and Z is a random matrix with independent Gaussian noise entries of variance σ². Then Y = X̄_L + Z is the standard model for principal component analysis. We augment the model with a sparse noise component X̄_S to obtain Y = X̄_S + X̄_L + Z; here, we allow the non-zero entries of X̄_S to possibly approach infinity. According to Theorem 3, we need to estimate ‖Z‖_{2→2}, ‖Z‖_{vec(∞)}, ‖P_T̄ Z‖_{vec(∞)}, and ‖P_T̄ Z‖_*. We have the following with high probability [10]:

‖Z‖_{2→2} ≤ σ√m + σ√n + O(σ).

Using standard arguments with the rotational invariance of the Gaussian distribution, we also have

‖Z‖_{vec(∞)} ≤ O(σ√(log(mn)))   and   ‖P_T̄ Z‖_{vec(∞)} ≤ O(σ√(log(mn)))

with high probability. Finally, by Lemma 5, we have

‖P_T̄ Z‖_* ≤ √(2r̄) ‖Z‖_{2→2} ≤ √(2r̄)σ√m + √(2r̄)σ√n + O(√r̄ σ).

Suppose (X̄_S, X̄_L) has α(ρ) ≤ c_1√(mn)/r̄, β(ρ) = Θ(r̄/√(mn)), and γ = Θ(r̄/√(mn)), and satisfies the simplified condition (6). This can be achieved with c_1 c_2 ≤ 1/12 in Proposition 1. Also assume λ and µ are chosen to satisfy (7), and that b ≥ ‖X̄_L‖_{vec(∞)} + ε_{vec(∞)}. Then we note that k̄ = O(c_1² mn/r̄²), and thus we have from Theorem 3 (see the discussion thereafter):

‖X̂_S − X̄_S‖_{vec(1)} = O(c_1√(mn) · max{σ√m + σ√n, σ√(mn log(mn))/√r̄}) = O(σ c_1 mn √(log(mn))/√r̄)

and

‖X̂_L − X̄_L‖_* = O(√(b σ c_1 mn √(log(mn))/√r̄) + √r̄ σ(√m + √n) + c_1√(mn)),

where we may take b = O(σ√(log(mn))) + ‖X̄_L‖_{vec(∞)}. Now consider the situation where both m, n → ∞, and assume that ‖X̄_L‖_{vec(∞)} remains bounded. If c_1 (log(mn))² = o(1) (which means that the number of corruptions per column is o(m/(log(mn))²) and the number of deterministic corruptions per row is o(n/(log(mn))²)), then

‖X̂_L − X̄_L‖_* = O(√r̄ σ (√m + √n)),

so the normalized trace norm error of X̂_L tends to zero:

‖X̂_L − X̄_L‖_* / √(mn) → 0.

This means that we can correctly recover the principal components of X̄_L with both deterministic corruptions and random noise, when both m and n are large and c_1 (log(mn))² = o(1) in Proposition 1.

III. TECHNICAL PRELIMINARIES

A. Norms, inner products, and projections

Our analysis involves a variety of norms of vectors and matrices (viewed as elements of a vector space), as well as of linear operators on vectors and linear operators on matrices; we define these and related notions in this section.

1) Entry-wise norms: For any p ∈ [1, ∞], define ‖v‖_p := (Σ_i |v_i|^p)^{1/p} to be the p-norm of a vector v, with ‖v‖_∞ := max_i |v_i|. Also, define ‖M‖_{vec(p)} := (Σ_{i,j} |M_{i,j}|^p)^{1/p} to be the entry-wise p-norm of a matrix M (again, with ‖M‖_{vec(∞)} := max_{i,j} |M_{i,j}|). Note that ‖·‖_{vec(2)} corresponds to the Frobenius norm.

2) Inner products, linear operators, orthogonal projections: We endow R^{m×n} with the inner product ⟨·, ·⟩ between matrices that induces the Frobenius norm ‖·‖_{vec(2)}; this is given by ⟨M, N⟩ = tr(M^⊤N). For a linear operator T : R^{m×n} → R^{m×n}, we denote its adjoint by T*; this is the unique linear operator that satisfies ⟨T(M), N⟩ = ⟨M, T*(N)⟩ for all M ∈ R^{m×n} and N ∈ R^{m×n} (in this work, we only consider bounded linear operators). For any two linear operators T_1 and T_2, we let T_1 T_2 denote their composition, defined by T_1 T_2 (M) := T_1(T_2(M)). Given a subspace W ⊆ R^{m×n}, we let W^⊥ denote its orthogonal complement, and let P_W : R^{m×n} → R^{m×n} denote the orthogonal projector to W with respect to ⟨·, ·⟩, i.e., the unique linear operator with range W satisfying P_W = P_W* and P_W P_W = P_W.

3) Induced norms: For any two vector norms ‖·‖_p and ‖·‖_q, define ‖M‖_{p→q} := max_{x≠0} ‖Mx‖_q / ‖x‖_p to be the corresponding induced operator norm of a matrix M. Our analysis uses the following special cases, which have alternative definitions:

‖M‖_{1→1} = max_j ‖M e_j‖_1,
‖M‖_{1→2} = max_j ‖M e_j‖_2,
‖M‖_{2→2} = spectral norm of M (i.e., largest singular value of M),
‖M‖_{2→∞} = max_i ‖M^⊤ e_i‖_2,
‖M‖_{∞→∞} = max_i ‖M^⊤ e_i‖_1.

Here, e_i is the i-th coordinate vector, which has a 1 in the i-th position and 0 elsewhere.
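These special cases are straightforward to compute directly; the following is a small numpy reference of our own, matching the alternative definitions above.

```python
import numpy as np

def induced_norms(M):
    """Induced operator norms used in the analysis, via their alternative definitions."""
    return {
        "1->1":     np.abs(M).sum(axis=0).max(),        # max column absolute sum
        "1->2":     np.linalg.norm(M, axis=0).max(),    # max column Euclidean norm
        "2->2":     np.linalg.norm(M, 2),               # largest singular value
        "2->inf":   np.linalg.norm(M, axis=1).max(),    # max row Euclidean norm
        "inf->inf": np.abs(M).sum(axis=1).max(),        # max row absolute sum
    }
```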
Finally, we also consider induced operator norms of linear matrix operators T : R^{m×n} → R^{m×n} (in particular, projection operators) with respect to ⟨·, ·⟩. For any two matrix norms ‖·‖ and ‖·‖′, define

‖T‖_{‖·‖→‖·‖′} := max_{M≠0} ‖T(M)‖′ / ‖M‖.

4) Other norms: The trace norm (or nuclear norm) ‖M‖_* of a matrix M is the sum of the singular values of M. We will also make use of a hybrid matrix norm

‖·‖_ρ, parametrized by ρ > 0, which we define by

‖M‖_ρ := max{ρ‖M‖_{1→1}, ρ^{-1}‖M‖_{∞→∞}}.

Also define ‖M‖_{ρ*} := sup_{‖N‖_ρ ≤ 1} ⟨M, N⟩, i.e., the dual of ‖·‖_ρ (see below).

5) Dual pairs: The matrix norm ‖·‖′ is said to be dual to ‖·‖ if, for all M ∈ R^{m×n}, ‖M‖′ = sup_{‖N‖ ≤ 1} ⟨M, N⟩.

Proposition 2: Fix any matrix norm ‖·‖, and let ‖·‖′ be its dual. For all M ∈ R^{m×n} and N ∈ R^{m×n}, we have ⟨M, N⟩ ≤ ‖M‖ ‖N‖′.

Proposition 3: Fix any linear matrix operator T : R^{m×n} → R^{m×n} and any pair of matrix norms ‖·‖_a and ‖·‖_b. We have ‖T‖_{a→b} = ‖T*‖_{b′→a′}, where ‖·‖_{a′} is dual to ‖·‖_a and ‖·‖_{b′} is dual to ‖·‖_b.

The following pairs of matrix norms are dual to each other: 1) ‖·‖_{vec(p)} and ‖·‖_{vec(q)} where 1/p + 1/q = 1; 2) ‖·‖_* and ‖·‖_{2→2}; 3) ‖·‖_ρ and ‖·‖_{ρ*} (by definition).

6) Some lemmas: First we show that the ‖·‖_ρ norm, for any ρ > 0, bounds the spectral norm ‖·‖_{2→2}.

Lemma 2: For any M ∈ R^{m×n} we have, for all ρ > 0, ‖M‖_{2→2} ≤ ‖M‖_ρ.

Proof: Let σ be the largest singular value of M, and let u ∈ R^m and v ∈ R^n be, respectively, associated left and right singular vectors. Then

[[0, ρ^{-1}M], [ρM^⊤, 0]] [ρ^{-1/2}u; ρ^{1/2}v] = [ρ^{-1/2}Mv; ρ^{1/2}M^⊤u] = σ [ρ^{-1/2}u; ρ^{1/2}v].

Moreover, by the definition of ‖·‖_{∞→∞},

‖ [[0, ρ^{-1}M], [ρM^⊤, 0]] [ρ^{-1/2}u; ρ^{1/2}v] ‖_∞ ≤ ‖ [[0, ρ^{-1}M], [ρM^⊤, 0]] ‖_{∞→∞} ‖ [ρ^{-1/2}u; ρ^{1/2}v] ‖_∞.

Therefore

‖M‖_{2→2} = σ ≤ ‖ [[0, ρ^{-1}M], [ρM^⊤, 0]] ‖_{∞→∞} = max{ρ^{-1}‖M‖_{∞→∞}, ρ‖M^⊤‖_{∞→∞}} = max{ρ‖M‖_{1→1}, ρ^{-1}‖M‖_{∞→∞}} = ‖M‖_ρ.

The following lemma is the dual of Lemma 2.

Lemma 3: For any M ∈ R^{m×n} we have, for all ρ > 0, ‖M‖_{ρ*} ≤ ‖M‖_*.

Proof: We know that ‖M‖_{ρ*} = ⟨M, N⟩ for some matrix N such that ‖N‖_ρ = 1. Therefore ‖N‖_{2→2} ≤ 1 from Lemma 2, and thus, using Proposition 2,

‖M‖_{ρ*} = ⟨M, N⟩ ≤ ‖M‖_* ‖N‖_{2→2} ≤ ‖M‖_*.

Finally, we state a lemma concerning the invertibility of a certain block-form operator used in our analysis.

Lemma 4: Fix any matrix norm ‖·‖ on R^{m×n} and linear operators T_1 : R^{m×n} → R^{m×n} and T_2 : R^{m×n} → R^{m×n}. Let I : R^{m×n} → R^{m×n} be the identity operator, and suppose ‖T_1 T_2‖_{‖·‖→‖·‖} < 1.

1) I − T_1 T_2 is invertible and satisfies

‖(I − T_1 T_2)^{-1}‖_{‖·‖→‖·‖} ≤ 1 / (1 − ‖T_1 T_2‖_{‖·‖→‖·‖}).

2) The linear operator on R^{m×n} × R^{m×n}

[[I, T_1], [T_2, I]]

is invertible, and its inverse is given by

[[I, T_1], [T_2, I]]^{-1} = [[(I − T_1 T_2)^{-1}, −(I − T_1 T_2)^{-1} T_1], [−(I − T_2 T_1)^{-1} T_2, (I − T_2 T_1)^{-1}]].

Proof: The first claim is a standard application of Taylor expansions. The second claim then follows from the formulae for block matrix inverses using Schur complements.

B. Projection operators and subdifferential sets

Recall the definitions of the following subspaces:

Ω(X_S) := {X ∈ R^{m×n} : supp(X) ⊆ supp(X_S)},
T(X_L) := {X_1 + X_2 ∈ R^{m×n} : range(X_1) ⊆ range(X_L), range(X_2^⊤) ⊆ range(X_L^⊤)}.

The orthogonal projectors to these spaces are given in the following proposition.

9 Proposition 4 Fix any X S R m n X L R m n For any matrix M R m n, { Mi,j if i, j suppx [P ΩXS M] i,j = S 0 otherwise for all i m j n, P T XL M = UU M + MV V UU MV V where U V are the matrices of left right singular vectors of X L Lemma 5 Under the setting of Proposition 4, with k := suppx S, P ΩXS M vec k P ΩXS M vec2 k M vec2 P ΩXS M vec k P ΩXS M vec k M vec P T XL M M 2 2 P T XL M 2 rankx L M 2 2 P T XL M vec2 2 rankx L M 2 2 Proof: The first second claims rely on the fact that suppp ΩXS M suppx S, as well as the fact that P ΩXS is an orthonormal projector with respect to the inner product that induces the vec2 norm For the third claim, note that P T XL M 2 2 UU M I UU MV V M 2 2 The remaining claims use a similar decomposition as the third claim as well as the fact that rankuu M rankx L ranki UU MV V } rankx L Define signx S {, 0, +} m n to be the matrix whose i, jth entry is sign[x S ] i,j, define orthx L := UV, where U V, respectively, are matrices of the left right orthonormal singular vectors of X L corresponding to non-zero singular values The following proposition characterizes the subdifferential sets for the non-smooth norms vec [] Proposition 5 The subdifferential set of X S X S vec is XS X S vec = {G R m n : G vec, P ΩXS G = signx S }; the subdifferential set of X L X L is XL X L = {G R m n : G 2 2, P T XL G = orthx L } The following lemma is a simple consequence of subgradient properties Lemma 6 Fix λ > 0 define the function gx S, X L := λ X S vec + X L Consider any X S, X L in R m n R m n If there exists Q R m n such that: Q is a subgradient of λ X S vec at X S = X S, Q is a subgradient of X L at X L = X L, P Ω XS Q vec λ/c P T XL Q 2 2 /c for some c >, then gx S, X L g X S, X L Q, X S + X L X S X L + /cλ P Ω X S X S vec + /c P T X L X L for all X S, X L R m n R m n Proof: Let Ω := Ω XS, T := T XL, S := X S X S, L : X L X L For any subgradient G XS λ X S vec, we have G Q = P ΩG + P Ω G P ΩQ P Ω Q = P Ω G P Ω Q Therefore λ X S + S vec λ X S vec Q, S sup{ G, S Q, S : G XS λ X S vec } sup{ G Q, S : G XS λ X S vec } = sup{ P Ω G P Ω Q, S : G XS λ X S vec } = sup{ P Ω G P Ω Q, P Ω S : G XS λ X S vec } = sup{ P Ω G, P Ω S P Ω Q, P Ω S : G XS λ X S vec } = λ P Ω S vec P Ω Q, P Ω S λ P Ω S vec P Ω Q vec P Ω S vec λ /c P Ω S vec 9

10 where the second-to-last inequality uses the duality of vec vec Proposition 3 Similarly, X L L X L Q, L /c P T L by noting the duality of 2 2 Combining these gives the desired inequality IV RANK-SPARSITY INCOHERENCE Throughout this section, we fix a target X S, X L R m n R m n, let Ω := Ω X S T := T X L Also let Ū V be, respectively, matrices of the left right singular vectors of X L corresponding to non-zero singular values Recall the following structural properties of X S X L : αρ := sign X S ρ = max{ρ sign X S, ρ sign X S }; βρ := ρ ŪŪ vec + ρ V V vec + Ū 2 V 2 ; γ := orth X L vec = Ū V vec The parameter ρ is a balancing parameter to hle disparity between row column dimensions The quantity αρ is the maximum number of non-zero entries in any single row or column The quantities βρ γ measure the coherence of the singular vectors of X L, that is, the alignment of the singular vectors with the coordinate basis For instance, under the conditions of Proposition, we have with ρ = n/m αρ c mn, β ρ 3c 2 rank X L mn γ c 2 rank X L mn for some constants c c 2 A Operator norms of projection operators We show that under the condition inf ρ>0 αρβρ <, the pair X S, X L is identifiable from its sum X S + X L Theorem This is achieved by proving that the composition of projection operators P Ω P T is a contraction as per Lemma, which in turn implies that Ω T = {0} The following two lemmas bound the projection operators P Ω P T in complementary norms Lemma 7 For any M R m n p {, }, we have P ΩM p p sign X S p p M vec This implies, for all ρ > 0, P Ω vec ρ αρ Proof: Define sx S {0, } m n to be the entrywise absolute value of signx S We have P ΩM p p = max{ P ΩMv p : v p } P ΩM vec max{ sp ΩMv p : v p } M vec max{ s X S v p : v p } = M vec sign X S p p The second part follows from the definitions of ρ αρ Lemma 8 For any M R m n, we have P T M vec ŪŪ vec M + V V vec M + Ū 2 V 2 M 2 2 This implies, for all ρ > 0, P T ρ vec βρ Proof: We have P T M vec = ŪŪ M + M V V ŪŪ M V V vec ŪŪ M vec + M V V vec + ŪŪ M V V vec by the triangle inequality The bounds for each term now follow from the definitions: ŪŪ M vec = max M i ŪŪ e i M max i ŪŪ e i = M ŪŪ vec ; M V V vec = max M V V e j j M max V V e j j = M V V vec ; ŪŪ M V V vec = max e i ŪŪ M V V e j i,j max Ū e i 2 Ū M V 2 2 V e j 2 i,j M 2 2 Ū 2 V 2 M ρ Ū 2 V 2 where the second step follows by Cauchy-Schwarz, the fourth step follows from Lemma 2 The second part now follows the definition of βρ 0

11 Now we show that the composition of P Ω P T gives a contraction under the certain norms their duals Lemma 9 For all ρ > 0, P Ω P T ρ ρ αρβρ; 2 P T P Ω vec vec αρβρ; Proof: Immediate from Lemma 7 Lemma 8 Lemma 0 For all ρ > 0, P T P Ω ρ ρ αρβρ; 2 P Ω P T vec vec αρβρ Proof: First note that P T P Ω = P Ω P T = P Ω P T because P Ω P T are self-adjoint, similarly P Ω P T = P T P Ω Now the claim follows by Proposition 3 Lemma 9, using the facts that ρ is dual to ρ that vec is dual to vec Note that Lemma is encompassed by Lemma 0 Another consequence of these contraction properties is the following uncertainty principle, analogous to one stated by [], which effectively states that a matrix X cannot have both signx ρ orthx vec simultaneously small Theorem 4 If X = X S = X L 0, then inf ρ>0 αρβρ Proof: Note that the non-zero element X lives in Ω T, so we get the conclusion by the contrapositive of Theorem B Dual certificate The incoherence properties allow us to construct an approximate dual certificate Q Ω, Q T Ω T that is central to the analysis of the optimization problems 2 The certificate is constructed as the solution to the linear system { P ΩQ Ω + Q T + µ E = λ sign X S P T Q Ω + Q T + µ E = orth X L for some matrix E R m n ; this can be equivalently written as [ ][ ] [ ] I P Ω Q Ω λ sign XS µ = P ΩE I orth X L µ P T E P T Q T We show the existence of the dual certificate Q Ω, Q T under the conditions 3, 4, 5 relative to an arbitrary matrix E Recall that the recovery guarantees for the constrained formulation requires the conditions with E = 0, while the guarantees for the regularized formulation takes E = Y X S + X L Theorem 5 Pick any c >, ρ > 0, E R m n Let k := supp X S r := rank X L Let ɛ 2 2 := E 2 2 ɛ vec := E vec + P T E vec If the following conditions hold: αρβρ < 3 λ αρβρ c µ ɛ 2 2 c αρ αρµ ɛ vec + αργ αρ λ c γ + µ 2 αρβρɛ vec αρβρ c αρβρ 4 > 0 5 these are a restatement of 3, 4, 5, then Q Ω := I P Ω P T λ sign X S P Ωorth X L µ P Ω P T E Ω Q T := I P T P Ω orth X L λp T sign X S µ P T P Ω E T are well-defined satisfy P ΩQ Ω + Q T + µ E = λ sign X S P T Q Ω + Q T + µ E = orth X L P Ω Q Ω + Q T + µ E vec λ/c P T Q Ω + Q T + µ E 2 2 /c Moreover, αρ Q Ω 2 2 αρβρ λ + γ + µ ɛ vec 2αρ Q T 2 2 αρβρ λ + γ + µ ɛ vec + + 2µ ɛ 2 2 Q T 2 r Q T 2 2 Q T vec αρβρ λ + γ + µ ɛ vec 2 Q Ω vec αρβρ λ + γ + µ ɛ vec Q Ω vec k Q Ω vec Q Ω + Q T 2 vec2 λ Q Ω vec + µ λ ɛ vec + Q T + 2µ ɛ 2 2 Remark The dual certificate constitutes an approximate subgradient in the sense that Q Ω + Q T + µ E

12 is a subgradient of both λ X S vec at X S = X S, X L at X L = X L Proof: Under the condition 3, we have αρβρ <, therefore Lemma 9 Lemma 4 imply that the operators I P Ω P T I P T P Ω are invertible satisfy I P Ω P T ρ ρ αρβρ, I P T P Ω vec vec αρβρ Thus Q Ω Q T are well-defined We can bound Q Ω 2 2 as Q Ω 2 2 Q Ω ρ Lemma 2 = I P Ω P T λ sign X S P Ωorth X L µ P Ω P T E ρ αρβρ λ sign XS P Ωorth X L µ P Ω P T E ρ αρβρ λ sign X S ρ + P Ωorth X L ρ + µ P Ω P T E ρ αρ αρβρ λ + γ + µ P T E vec Lemma 7 αρ αρβρ λ + γ + µ ɛ vec Above, we have used the bound P T E vec = E P T E vec ɛ vec Therefore, P T Q Ω + µ E 2 2 I ŪŪ Q ΩI V V P T E 2 2 µ Q Ω µ ɛ 2 2 αρ αρβρ λ + γ + µ ɛ vec + µ ɛ 2 2 The condition 4 now implies that this quantity is at most /c Now we bound Q T vec as Q T vec = I P T P Ω orth X L λp T sign X S µ P T P Ω E vec αρβρ orth XL λp T sign X S µ P T P Ω E vec αρβρ orth X L vec + λ P T sign X S vec + µ P T P Ω E vec αρβρ γ + λαρβρ + µ ɛ vec Lemma 9 Above, we have used the bound P T P Ω E vec = P T E P T P ΩE vec P T E vec + αρβρ E vec ɛ vec Therefore, P Ω Q T + µ E vec Q T vec + µ P Ω E vec αρβρ γ + λαρβρ + µ ɛ vec + µ ɛ vec The condition 5 now implies that this quantity is at most λ/c We also have Q T 2 2 = P T Q Ω + µ E orth X L 2 2 2αρ αρβρ λ + γ + µ ɛ vec + + 2µ ɛ 2 2 since P T Q Ω Q Ω 2 2 P T E 2 2 2ɛ 2 2 by Lemma 5, Q Ω vec = P ΩQ T + µ E λ sign X S vec αρβρ λ + γ + µ ɛ vec + λ + µ ɛ vec The bounds on Q T Q Ω vec follow from the facts that rankq T 2 r suppq Ω k 2

13 Finally, Q Ω + Q T 2 vec2 = Q Ω, P ΩQ Ω + Q T + Q T, P T Q Ω + Q T = Q Ω, λp Ωsign X S µ P ΩE + Q T, P T orth X L µ P T E λ Q Ω vec + µ λ P ΩE vec + Q T + µ P T E 2 2 λ Q Ω vec + µ λ ɛ vec + Q T + 2µ ɛ 2 2 V ANALYSIS OF CONSTRAINED FORMULATION Throughout this section, we fix a target decomposition X S, X L that satisfies the constraints of, let ˆX S, ˆX L be the optimal solution to Let S := ˆX S X S L := ˆX L X L We show that under the conditions of Theorem 5 with E = 0 recall that this does not mean we assume Y X S X L = 0 appropriately chosen λ, solving accurately recovers the target decomposition X S, X L We decompose the errors into symmetric antisymmetric parts avg := S + L /2 mid := S L /2 The constraints allow us to easily bound avg, so most of the analysis involves bounding mid in terms of avg Lemma avg vec ɛ vec avg ɛ Proof: Since both ˆX S, ˆX L X S, X L are feasible solutions to, we have for {vec, }, avg = /2 S + L = /2 ˆX S + ˆX L Y X S + X L Y ˆX S + ˆX L Y + X S + X L Y /2 ɛ Lemma 2 Assume the conditions of Theorem 5 hold with E = 0 We have λ P Ω mid vec + P T mid /c λ avg vec + avg Proof: Let Q := Q Ω + Q T be the dual certificate guaranteed by Theorem 5 Note that Q satisfies the conditions of Lemma 6, so we have λ X S + mid vec + X L mid λ X S vec X L /c λ P Ω mid vec + P T mid Using the triangle inequality, we have λ ˆX S vec + ˆX L = λ X S + S vec + X L + L = λ X S + mid + avg vec + X L mid + avg λ X S + mid vec λ avg vec + X L mid avg Now using the fact that λ ˆX S vec + ˆX L λ X S vec + X L gives the claim Lemma 3 Let k := supp X S Assume the conditions of Theorem 5 hold with E = 0 We have P Ω mid vec /c αρβρ avg vec + avg /λ Proof: Because mid = P Ω mid + P Ω mid = P T mid + P T mid, we have the equation P Ω mid P T mid = P Ω mid + P T mid Separately applying P Ω P T to both sides gives [ I P Ω I P T ] [ ] P Ω mid = P T mid [ P Ω P T mid P T P Ω mid Under the condition αρβρ <, Lemma 0 Lemma 4 imply that I P Ω P T vec vec that αρβρ P Ω mid = I P Ω P T P Ω P T mid + P Ω P T P Ω mid ] 3

14 Therefore P Ω mid vec αρβρ P Ω P T mid vec + P Ω P T P Ω mid vec αρβρ k P T mid vec2 + αρβρ P Ω mid vec Lemma 0 αρβρ k P T mid + αρβρ P Ω mid vec /c αρβρ max{ k, } αρβρ/λ λ avg vec + avg Lemma 2 /c αρβρ avg vec + avg /λ where the last inequality uses the facts k αρ 2, αρβρ <, λαρ this last inequality uses the condition in 4 We now prove Theorem 2, which we restate here for convenience Theorem 6 Theorem 2 restated Assume the conditions of Theorem 5 hold with E = 0 We have by Lemma 2 Lemma 3 The bounds on S vec L vec follow from the bounds on mid vec, avg vec, avg from Lemma If the constraint X L vec b is added, then we can use the facts L vec ˆX L vec + X L vec 2b L vec2 L vec L vec 2b L vec If ˆXL is post-processed, then letting clip ˆX L be the result of the post-processing for all i, j, so [ X L ] i,j [ X L ] i,j [ ˆX L ] i,j [ X L ] i,j X L X L vec L vec X L X L vec2 2b X L X L vec 2b L vec max{ S vec, L vec } + /c 2 αρβρ ɛ vec αρβρ + /c 2 αρβρ αρβρ ɛ /λ If, in addition for some b X L vec, either: the optimization problem is augmented with the constraint X L vec b letting X L := ˆX L, or ˆXL is post-processed by replacing [ ˆX L ] i,j with [ X L ] i,j := min{max{[ ˆX L ] i,j, b}, b} for all i, j, then we also have X L X L vec2 min { L vec, 2b L vec } Proof: First note that since S = avg + mid L = avg mid, we have max{ S vec, L vec } avg vec + mid vec We can bound mid vec as mid vec P Ω mid vec + P Ω mid vec /c + αρβρ avg vec + avg /λ VI ANALYSIS OF REGULARIZED FORMULATION Throughout this section, we fix a target decomposition X S, X L that satisfies X S Y vec b, let ˆX S, ˆX L be the optimal solution to 2 augmented with the constraint X S Y vec b for some b X S Y vec b = is allowed Let S := ˆX S X S L := ˆX L X L We show that under the conditions of Theorem 5 with E = Y X S + X L appropriately chosen λ µ, solving 2 accurately recovers the target decomposition X S, X L Lemma 4 There exists G S, G L, H S R m n such that µ ˆX S + ˆX L Y + λg S + H S = 0; G S vec ; 2 µ ˆX S + ˆX L Y + G L = 0; G L 2 2 ; 3 [H S ] i,j [ S ] i,j 0 i, j Proof: We express the constraint X S Y vec b in 2 as 2mn constraints [X S ] i,j Y i,j b 0 [X S ] i,j + Y i,j b 0 for all i, j Now the corresponding Lagrangian is 2µ X S + X L Y 2 vec2 + λ X S vec + X L + Λ +, X S Y b m,n + Λ, X S + Y b m,n 4

15 where Λ +, Λ 0 m,n is the all-ones m n matrix First-order optimality conditions imply that there exists a subgradient G S of X S vec at X S = ˆX S a subgradient G L of X L at X L = ˆX L such that µ ˆX S + ˆX L Y + λg S + Λ + Λ = 0 µ ˆX S + ˆX L Y + G L = 0 hold with E = Y X S + X L, let Q Ω, Q T be the dual certificate from the conclusion We have αρβρ S vec λ /c Q Ω + Q T 2 vec2 µ + λ kµ + 2 k rµ + k P Ω P T E vec, Now since X S Y vec b, we have [ X S ] i,j Y i,j + b [ X S ] i,j Y i,j + b By complementary slackness, if Λ + i,j > 0, then [ ˆX S ] i,j Y i,j b = 0, which means [ ˆX S ] i,j [ X S ] i,j [ ˆX S ] i,j Y i,j + b = 0 So Λ + i,j [ S] i,j 0 Similarly, if Λ i,j > 0, then [ ˆX S ] i,j [ X S ] i,j 0 So Λ i,j [ S] i,j 0 Therefore H := Λ + Λ satisfies H i,j [ S ] i,j 0 Lemma 5 Assume the conditions of Theorem 5 hold with E = Y X S + X L, let Q Ω, Q T be the dual certificate from the conclusion We have λ P Ω S vec + P T L /c Q Ω + Q T 2 vec2 µ/2 Proof: Let Q := Q Ω + Q T := S + L Since Q + µ E satisfies the conditions of Lemma 6, /c λ P Ω S vec + P T L λ ˆX S vec + ˆX L λ X S vec + X L Q + µ E, S + L Furthermore, by the optimality of ˆX S, ˆX L, λ ˆX S vec + ˆX L λ X S vec + X L 2µ X S + X L Y 2 vec2 2µ ˆX S + ˆX L Y 2 vec2 = 2µ E 2 vec2 2µ S + L E 2 vec2 = 2 E,, 2µ Combining the inequalities gives /c λ P Ω S vec + P T L Q, 2µ, Q 2 vec2 µ/2 where the last inequality follows by taking the maximum value over at = µq Now we prove Theorem 3, restated below with an additional result for L ρ Theorem 7 Theorem 3 restated Let k := supp X S r := rank X L Assume the conditions of Theorem 5 { S vec2 min S vec, 2b S vec }, L ρ /c Q Ω + Q T 2 vec2 µ/2 + min {βρ S vec, } 2 r S vec2 + P T E + 2 rµ, L /c Q Ω + Q T 2 vec2 µ/2 + 2 r S vec2 + P T E + 2 rµ Proof: From Lemma 4, we obtain G S, G L, H S R m n the following equations: λp ΩG S = µ P Ω S + P Ω L P ΩE P ΩH S 6 P T G L = µ P T S + P T L P T E 7 P Ω P T G L = µ P Ω P T S + P Ω P T L P Ω P T E 8 Subtracting 8 from 6 gives µ P Ω S P Ω P T P Ω S P Ω P T P Ω S + P Ω P T L + P ΩH S = λp ΩG S + P Ω P T G L + µ P Ω P T E Moreover, we have sign S, P Ω S = P Ω S vec sign S, P ΩH S = P ΩH S vec, so taking inner products with sign S on both sides of the equation gives the 5

16 following chain of inequalities: µ P Ω S vec + P ΩH S vec µ P Ω P T P Ω S vec + µ P Ω P T P Ω S vec + µ P Ω P T L vec + λ P ΩG S vec + P Ω P T G L vec + µ P Ω P T E vec µ αρβρ P Ω S vec + µ αρβρ P Ω S vec + λ k + µ k P T L vec2 + k P T G L vec2 + µ k P Ω P T E vec µ αρβρ P Ω S vec + µ αρβρ P Ω S vec + µ k P T L vec2 + λ k + 2 k r GL µ k P Ω P T E vec µ αρβρ P Ω S vec + µ αρβρ P Ω S vec + µ k P T L vec2 + λ k + 2 k r + µ k P Ω P T E vec The second third inequalities above follow from Lemma 5 Lemma 0, the fourth inequality uses the fact that G L 2 2 Rearranging the inequality applying Lemma 5 gives αρβρ P Ω S vec αρβρ P Ω S vec + k P T L vec2 + λ kµ + 2 k rµ + k P Ω P T E vec max { αρβρ/λ, k} /c Q Ω + Q T 2 vec2 µ/2 + λ kµ + 2 k rµ + k P Ω P T E vec λ /c Q Ω + Q T 2 vec2 µ/2 + λ kµ + 2 k rµ + k P Ω P T E vec since k αρ 2, αρβρ <, λαρ Now we combine this with S vec P Ω S vec + P Ω S vec Lemma 5 to get the first bound For the second bound, we use the facts S vec ˆX S Y vec + X S Y vec 2b S vec2 S vec S vec 2b S vec For the third fourth bounds, we obtain from 7 P T L ρ P T S ρ + P T E ρ + µ P T G L ρ P T vec ρ S vec + P T E + µ P T G L Lemma 3 = P T ρ vec S vec + P T E + µ P T G L Proposition 3 βρ S vec + P T E + µ P T G L Lemma 8 βρ S vec + P T E + 2 rµ Lemma 5 G L 2 2 P T L P T S + P T E + µ P T G L 2 r S vec2 + P T E + 2 rµ Lemma 5 G L 2 2 Now we combine these with L ρ P T L ρ + P T L ρ P T L + min{ P T L, P T L ρ } Lemma 3 L P T L + P T L Lemma 5 Note that we have an error bound for L in ρ norm, which can be significantly smaller than the bound for the trace norm of L ACKNOWLEDGMENT We thank Emmanuel Cès for clarifications about the results in [3] REFERENCES [] V Chrasekaran, S Sanghavi, P A Parrilo, A S Willsky, Rank-Sparsity Incoherence for Matrix Decomposition, ArXiv e- prints, Jun 2009 [2] V Chrasekaran, P A Parrilo, A S Willsky, Latent Variable Graphical Model Selection via Convex Optimization, ArXiv e-prints, Aug 200 [3] E J Cès, X Li, Y Ma, J Wright, Robust Principal Component Analysis? ArXiv e-prints, Dec 2009 [4] R Tibshirani, Regression shrinkage selection via the lasso, J Royal Statist Soc B, vol 58, no, pp , 996 [5] M Fazel, Matrix rank minimization with applications, PhD dissertation, Department of Electrical Engineering, Stanford University, 2002 [6] E Cès R Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, vol 9, pp ,

[7] D. Gross, Recovering low-rank matrices from few coefficients in any basis, ArXiv e-prints, Oct. 2009.
[8] Z. Zhou, X. Li, J. Wright, E. J. Candès, and Y. Ma, Stable principal component pursuit, in Proceedings of the International Symposium on Information Theory, 2010.
[9] A. Ganesh, J. Wright, X. Li, E. J. Candès, and Y. Ma, Dense error correction for low-rank matrices via principal component pursuit, in Proceedings of the International Symposium on Information Theory, 2010.
[10] K. R. Davidson and S. J. Szarek, Local operator theory, random matrices and Banach spaces, in Handbook of the Geometry of Banach Spaces, Vol. I, 2001.
[11] G. A. Watson, Characterization of the subdifferential of some matrix norms, Linear Algebra and its Applications, vol. 170, 1992.

Daniel Hsu received a B.S. in Computer Science and Engineering from the University of California, Berkeley, in 2004, and a Ph.D. in Computer Science from the University of California, San Diego, in 2010. From 2010 to 2011, he was a postdoctoral scholar at Rutgers University and a visiting scholar at the Wharton School of the University of Pennsylvania. He is currently a postdoctoral researcher at Microsoft Research New England. His research interests are in algorithmic statistics and machine learning.

Sham M. Kakade is currently an associate professor of statistics at the Wharton School of the University of Pennsylvania and a Senior Researcher at Microsoft Research New England. He received his B.A. in Physics from the California Institute of Technology and his Ph.D. from the Gatsby Computational Neuroscience Unit, affiliated with University College London. He spent the following two years as a Postdoctoral Researcher at the Department of Computer and Information Science at the University of Pennsylvania. Subsequently, he joined the Toyota Technological Institute, where he was an assistant professor for four years. His research focuses on artificial intelligence and machine learning, and their connections to other areas such as game theory and economics.

Tong Zhang received a B.A. in Mathematics and Computer Science from Cornell University in 1994 and a Ph.D. in Computer Science from Stanford University in 1998. After graduation, he worked at the IBM T.J. Watson Research Center in Yorktown Heights, New York, and at Yahoo Research in New York City. He is currently a professor of statistics at Rutgers University. His research interests include machine learning, algorithms for statistical computation, their mathematical analysis, and applications.


More information

Optimisation Combinatoire et Convexe.

Optimisation Combinatoire et Convexe. Optimisation Combinatoire et Convexe. Low complexity models, l 1 penalties. A. d Aspremont. M1 ENS. 1/36 Today Sparsity, low complexity models. l 1 -recovery results: three approaches. Extensions: matrix

More information

Sparse representation classification and positive L1 minimization

Sparse representation classification and positive L1 minimization Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng

More information

A Randomized Algorithm for the Approximation of Matrices

A Randomized Algorithm for the Approximation of Matrices A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive

More information

Tighter Low-rank Approximation via Sampling the Leveraged Element

Tighter Low-rank Approximation via Sampling the Leveraged Element Tighter Low-rank Approximation via Sampling the Leveraged Element Srinadh Bhojanapalli The University of Texas at Austin bsrinadh@utexas.edu Prateek Jain Microsoft Research, India prajain@microsoft.com

More information

Information-Theoretic Limits of Matrix Completion

Information-Theoretic Limits of Matrix Completion Information-Theoretic Limits of Matrix Completion Erwin Riegler, David Stotz, and Helmut Bölcskei Dept. IT & EE, ETH Zurich, Switzerland Email: {eriegler, dstotz, boelcskei}@nari.ee.ethz.ch Abstract We

More information

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

Breaking the Limits of Subspace Inference

Breaking the Limits of Subspace Inference Breaking the Limits of Subspace Inference Claudia R. Solís-Lemus, Daniel L. Pimentel-Alarcón Emory University, Georgia State University Abstract Inferring low-dimensional subspaces that describe high-dimensional,

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY Uplink Downlink Duality Via Minimax Duality. Wei Yu, Member, IEEE (1) (2)

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY Uplink Downlink Duality Via Minimax Duality. Wei Yu, Member, IEEE (1) (2) IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY 2006 361 Uplink Downlink Duality Via Minimax Duality Wei Yu, Member, IEEE Abstract The sum capacity of a Gaussian vector broadcast channel

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

Part 1a: Inner product, Orthogonality, Vector/Matrix norm

Part 1a: Inner product, Orthogonality, Vector/Matrix norm Part 1a: Inner product, Orthogonality, Vector/Matrix norm September 19, 2018 Numerical Linear Algebra Part 1a September 19, 2018 1 / 16 1. Inner product on a linear space V over the number field F A map,

More information

Mathematical foundations - linear algebra

Mathematical foundations - linear algebra Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Review of Linear Algebra Denis Helic KTI, TU Graz Oct 9, 2014 Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 1 / 74 Big picture: KDDM Probability Theory

More information

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Emmanuel Candes and Prof. Wotao Yin

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016 Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use

More information

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison Raghunandan H. Keshavan, Andrea Montanari and Sewoong Oh Electrical Engineering and Statistics Department Stanford University,

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

Robust PCA via Outlier Pursuit

Robust PCA via Outlier Pursuit 1 Robust PCA via Outlier Pursuit Huan Xu, Constantine Caramanis, Member, and Sujay Sanghavi, Member Abstract Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

The Convex Geometry of Linear Inverse Problems

The Convex Geometry of Linear Inverse Problems Found Comput Math 2012) 12:805 849 DOI 10.1007/s10208-012-9135-7 The Convex Geometry of Linear Inverse Problems Venkat Chandrasekaran Benjamin Recht Pablo A. Parrilo Alan S. Willsky Received: 2 December

More information

IV. Matrix Approximation using Least-Squares

IV. Matrix Approximation using Least-Squares IV. Matrix Approximation using Least-Squares The SVD and Matrix Approximation We begin with the following fundamental question. Let A be an M N matrix with rank R. What is the closest matrix to A that

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Low-Rank Matrix Recovery

Low-Rank Matrix Recovery ELE 538B: Mathematics of High-Dimensional Data Low-Rank Matrix Recovery Yuxin Chen Princeton University, Fall 2018 Outline Motivation Problem setup Nuclear norm minimization RIP and low-rank matrix recovery

More information

INDUSTRIAL MATHEMATICS INSTITUTE. B.S. Kashin and V.N. Temlyakov. IMI Preprint Series. Department of Mathematics University of South Carolina

INDUSTRIAL MATHEMATICS INSTITUTE. B.S. Kashin and V.N. Temlyakov. IMI Preprint Series. Department of Mathematics University of South Carolina INDUSTRIAL MATHEMATICS INSTITUTE 2007:08 A remark on compressed sensing B.S. Kashin and V.N. Temlyakov IMI Preprint Series Department of Mathematics University of South Carolina A remark on compressed

More information

j=1 [We will show that the triangle inequality holds for each p-norm in Chapter 3 Section 6.] The 1-norm is A F = tr(a H A).

j=1 [We will show that the triangle inequality holds for each p-norm in Chapter 3 Section 6.] The 1-norm is A F = tr(a H A). Math 344 Lecture #19 3.5 Normed Linear Spaces Definition 3.5.1. A seminorm on a vector space V over F is a map : V R that for all x, y V and for all α F satisfies (i) x 0 (positivity), (ii) αx = α x (scale

More information

Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global

Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global homas Laurent * 1 James H. von Brecht * 2 Abstract We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 12 Luca Trevisan October 3, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 12 Luca Trevisan October 3, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analysis Handout 1 Luca Trevisan October 3, 017 Scribed by Maxim Rabinovich Lecture 1 In which we begin to prove that the SDP relaxation exactly recovers communities

More information

Mathematics Department Stanford University Math 61CM/DM Inner products

Mathematics Department Stanford University Math 61CM/DM Inner products Mathematics Department Stanford University Math 61CM/DM Inner products Recall the definition of an inner product space; see Appendix A.8 of the textbook. Definition 1 An inner product space V is a vector

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 2

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 2 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 2 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory April 5, 2012 Andre Tkacenko

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

More information

A Counterexample for the Validity of Using Nuclear Norm as a Convex Surrogate of Rank

A Counterexample for the Validity of Using Nuclear Norm as a Convex Surrogate of Rank A Counterexample for the Validity of Using Nuclear Norm as a Convex Surrogate of Rank Hongyang Zhang, Zhouchen Lin, and Chao Zhang Key Lab. of Machine Perception (MOE), School of EECS Peking University,

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

NORMS ON SPACE OF MATRICES

NORMS ON SPACE OF MATRICES NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,

More information

Normed & Inner Product Vector Spaces

Normed & Inner Product Vector Spaces Normed & Inner Product Vector Spaces ECE 174 Introduction to Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 27 Normed

More information

Necessary and Sufficient Conditions of Solution Uniqueness in 1-Norm Minimization

Necessary and Sufficient Conditions of Solution Uniqueness in 1-Norm Minimization Noname manuscript No. (will be inserted by the editor) Necessary and Sufficient Conditions of Solution Uniqueness in 1-Norm Minimization Hui Zhang Wotao Yin Lizhi Cheng Received: / Accepted: Abstract This

More information

High-dimensional Joint Sparsity Random Effects Model for Multi-task Learning

High-dimensional Joint Sparsity Random Effects Model for Multi-task Learning High-dimensional Joint Sparsity Random Effects Model for Multi-task Learning Krishnakumar Balasubramanian Georgia Institute of Technology krishnakumar3@gatech.edu Kai Yu Baidu Inc. yukai@baidu.com Tong

More information

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May University of Luxembourg Master in Mathematics Student project Compressed sensing Author: Lucien May Supervisor: Prof. I. Nourdin Winter semester 2014 1 Introduction Let us consider an s-sparse vector

More information

Jianhua Z. Huang, Haipeng Shen, Andreas Buja

Jianhua Z. Huang, Haipeng Shen, Andreas Buja Several Flawed Approaches to Penalized SVDs A supplementary note to The analysis of two-way functional data using two-way regularized singular value decompositions Jianhua Z. Huang, Haipeng Shen, Andreas

More information

Recovery of Simultaneously Structured Models using Convex Optimization

Recovery of Simultaneously Structured Models using Convex Optimization Recovery of Simultaneously Structured Models using Convex Optimization Maryam Fazel University of Washington Joint work with: Amin Jalali (UW), Samet Oymak and Babak Hassibi (Caltech) Yonina Eldar (Technion)

More information

Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization

Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization Lingchen Kong and Naihua Xiu Department of Applied Mathematics, Beijing Jiaotong University, Beijing, 100044, People s Republic of China E-mail:

More information

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3 MI 6.97 Algebraic techniques and semidefinite optimization February 4, 6 Lecture 3 Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo In this lecture, we will discuss one of the most important applications

More information

Supplemental for Spectral Algorithm For Latent Tree Graphical Models

Supplemental for Spectral Algorithm For Latent Tree Graphical Models Supplemental for Spectral Algorithm For Latent Tree Graphical Models Ankur P. Parikh, Le Song, Eric P. Xing The supplemental contains 3 main things. 1. The first is network plots of the latent variable

More information

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Yin Zhang Technical Report TR05-06 Department of Computational and Applied Mathematics Rice University,

More information

Contents. 0.1 Notation... 3

Contents. 0.1 Notation... 3 Contents 0.1 Notation........................................ 3 1 A Short Course on Frame Theory 4 1.1 Examples of Signal Expansions............................ 4 1.2 Signal Expansions in Finite-Dimensional

More information

Lecture 9: Low Rank Approximation

Lecture 9: Low Rank Approximation CSE 521: Design and Analysis of Algorithms I Fall 2018 Lecture 9: Low Rank Approximation Lecturer: Shayan Oveis Gharan February 8th Scribe: Jun Qi Disclaimer: These notes have not been subjected to the

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

Sparse Solutions of an Undetermined Linear System

Sparse Solutions of an Undetermined Linear System 1 Sparse Solutions of an Undetermined Linear System Maddullah Almerdasy New York University Tandon School of Engineering arxiv:1702.07096v1 [math.oc] 23 Feb 2017 Abstract This work proposes a research

More information

Robust PCA via Outlier Pursuit

Robust PCA via Outlier Pursuit Robust PCA via Outlier Pursuit Huan Xu Electrical and Computer Engineering University of Texas at Austin huan.xu@mail.utexas.edu Constantine Caramanis Electrical and Computer Engineering University of

More information

Universal low-rank matrix recovery from Pauli measurements

Universal low-rank matrix recovery from Pauli measurements Universal low-rank matrix recovery from Pauli measurements Yi-Kai Liu Applied and Computational Mathematics Division National Institute of Standards and Technology Gaithersburg, MD, USA yi-kai.liu@nist.gov

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

arxiv: v2 [stat.ml] 1 Jul 2013

arxiv: v2 [stat.ml] 1 Jul 2013 A Counterexample for the Validity of Using Nuclear Norm as a Convex Surrogate of Rank Hongyang Zhang, Zhouchen Lin, and Chao Zhang Key Lab. of Machine Perception (MOE), School of EECS Peking University,

More information

15-780: LinearProgramming

15-780: LinearProgramming 15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear

More information

DS-GA 1002 Lecture notes 10 November 23, Linear models

DS-GA 1002 Lecture notes 10 November 23, Linear models DS-GA 2 Lecture notes November 23, 2 Linear functions Linear models A linear model encodes the assumption that two quantities are linearly related. Mathematically, this is characterized using linear functions.

More information

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora Scribe: Today we continue the

More information