Robust Matrix Decomposition with Sparse Corruptions

Daniel Hsu, Sham M. Kakade, and Tong Zhang

D. Hsu is with Microsoft Research New England, Cambridge, MA, USA (dahsu@microsoft.com). S. M. Kakade is with Microsoft Research New England, Cambridge, MA, USA, and also with the Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA (skakade@wharton.upenn.edu); this author was partially supported by NSF grant IIS-0865. T. Zhang is with the Department of Statistics, Rutgers University (tzhang@stat.rutgers.edu); this author was partially supported by the following grants: AFOSR FA, NSA-AMS 08024, NSF DMS, and NSF IIS-0606. Copyright (c) 2011 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Abstract: Suppose a given observation matrix can be decomposed as the sum of a low-rank matrix and a sparse matrix, and the goal is to recover these individual components from the observed sum. Such additive decompositions have applications in a variety of numerical problems including system identification, latent variable graphical modeling, and principal components analysis. We study conditions under which recovering such a decomposition is possible via a combination of l1 norm and trace norm minimization. We are specifically interested in the question of how many sparse corruptions are allowed so that convex programming can still achieve accurate recovery, and we obtain stronger recovery guarantees than previous studies. Moreover, we do not assume that the spatial pattern of corruptions is random, which stands in contrast to related analyses under such assumptions via matrix completion.

Index Terms: Matrix decompositions, sparsity, low-rank, outliers.

I. INTRODUCTION

THIS work studies additive decompositions of matrices into sparse and low-rank components. Such decompositions have found applications in a variety of numerical problems, including system identification [1], latent variable graphical modeling [2], and principal component analysis (PCA) [3]. In these settings, the user has an input matrix Y ∈ R^{m×n} which is believed to be the sum of a sparse matrix X̄_S and a low-rank matrix X̄_L. For instance, in the application to PCA, X̄_L represents a matrix of m data points from a low-dimensional subspace of R^n, and it is corrupted by a sparse matrix X̄_S of errors before being observed as

Y = X̄_S (sparse) + X̄_L (low-rank).

The goal is to recover the original data matrix X̄_L and the error components X̄_S from the corrupted observations Y.

In the latent variable model application of Chandrasekaran et al. [2], Y represents the precision matrix over the visible nodes of a Gaussian graphical model, and X̄_S represents the precision matrix over the visible nodes when conditioned on the hidden nodes. In general, Y may be dense as a result of dependencies between visible nodes through the hidden nodes. However, X̄_S will be sparse when the visible nodes are mostly independent after conditioning on the hidden nodes, and the difference X̄_L = Y − X̄_S will be low-rank when the number of hidden nodes is small. The goal is then to infer the relevant dependency structure from just the visible nodes and measurements of their correlations.

Even if the matrix Y is exactly the sum of a sparse matrix X̄_S and a low-rank matrix X̄_L, it may be impossible to identify these components from the sum. For instance, the sparse matrix X̄_S may itself be low-rank, or the low-rank matrix X̄_L may itself be sparse. In such cases, the components may be confused for each other, and thus the desired decomposition of Y may not be identifiable.
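To make the ambiguity concrete (this illustration is ours, not from the paper): if M is any non-zero matrix that is simultaneously sparse and low-rank, say M = e_1 e_1^⊤ with a single non-zero entry, then

Y = X̄_S + X̄_L = (X̄_S + t·M) + (X̄_L − t·M)  for every t ∈ R,

so the observed sum Y is consistent with infinitely many sparse-plus-low-rank splittings, and no method, convex or otherwise, can single out the intended one without further conditions.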
Therefore, one must impose conditions on the sparse and low-rank components in order to guarantee their identifiability from Y.

We present sufficient conditions under which X̄_S and X̄_L are identifiable from the sum Y. Essentially, we require that X̄_S not be too dense in any single row or column, and that the singular vectors of X̄_L not be too sparse. The levels of denseness and sparseness are considered jointly in the conditions in order to obtain the weakest possible conditions. Under a mild strengthening of the condition, we also show that X̄_S and X̄_L can be recovered by solving certain convex programs, and that the solution is robust under small perturbations of Y. The first program we consider is

min λ‖X_S‖_{vec(1)} + ‖X_L‖_*

subject to certain feasibility constraints such as ‖X_S + X_L − Y‖ ≤ ε, where ‖·‖_{vec(1)} is the entry-wise 1-norm and ‖·‖_* is the trace norm. These norms are natural convex surrogates for the sparsity of X_S and the rank of X_L [4], [5], which are generally intractable to optimize. We also consider a regularized formulation

min (1/(2µ))‖X_S + X_L − Y‖²_{vec(2)} + λ‖X_S‖_{vec(1)} + ‖X_L‖_*

where ‖·‖_{vec(2)} is the Frobenius norm; this formulation may be more suitable in certain applications and enjoys different recovery guarantees.

A. Related work

Our work closely follows that of Chandrasekaran et al. [1], who initiated the study of rank-sparsity incoherence and its application to matrix decompositions. There, the authors identify parameters that characterize the incoherence of X̄_S and X̄_L sufficient to guarantee identifiability and recovery using convex programs. However, their analysis of this characterization yields conditions that are significantly stronger than those given in our present work. For instance, the allowed fraction of non-zero entries in X̄_S is quickly vanishing as a function of the matrix size, even under the most favorable conditions on X̄_L; our analysis does not have this restriction and allows X̄_S to have up to Ω(mn) non-zero entries when X̄_L is low-rank and has non-sparse singular vectors. In terms of the PCA application, our analysis allows for up to a constant fraction of the data matrix entries to be corrupted by noise of arbitrary magnitude, while the analysis of [1] requires that this fraction decrease as a function of the matrix dimensions. Moreover, [1] only considers exact decompositions, which may be unrealistic in certain applications; we allow for approximate decompositions, and study the effect of perturbations on the accuracy of the recovered components.

The application to principal component analysis with gross sparse errors was studied by Candès et al. [3], building on previous results and analysis techniques for the related matrix completion problem (e.g., [6], [7]). The sparse errors model of [3] requires that the support of the sparse matrix X̄_S be random, which can be unrealistic in some settings. However, the conditions are significantly weaker than those of [1]: for instance, they allow for Ω(mn) non-zero entries in X̄_S. Our work makes no probabilistic assumption on the sparsity pattern of X̄_S and instead studies purely deterministic structural conditions. The price we pay, however, is roughly a factor of rank(X̄_L) in what is allowed for the support size of X̄_S relative to the probabilistic analysis of [3]. Narrowing this gap with alternative deterministic conditions is an interesting open problem. Follow-up work to [3] studies the robustness of the recovery procedure [8], as well as quantitatively weaker conditions on X̄_S [9], but these works are only considered under the random support model. Our work is therefore largely complementary to these probabilistic analyses.

B. Outline

We describe our main results in Section II. In Section III, we review a number of technical tools, such as matrix operator norms, that are used to characterize the rank-sparsity incoherence properties of the desired decomposition. Section IV analyzes these incoherence properties in detail, giving sufficient conditions for identifiability as well as for certifying the approximate optimality of a target decomposition for our optimization formulations. The main recovery guarantees are proved in Sections V and VI.

II. MAIN RESULTS

Fix an observation matrix Y ∈ R^{m×n}. Our goal is to approximately decompose the matrix Y into the sum of a sparse matrix X̄_S and a low-rank matrix X̄_L.

A. Optimization formulations

We consider two convex
optimization problems over (X_S, X_L) ∈ R^{m×n} × R^{m×n}. The first is the constrained formulation, parametrized by λ > 0, ε_{vec(1)} ≥ 0, and ε_* ≥ 0:

min  λ‖X_S‖_{vec(1)} + ‖X_L‖_*
s.t. ‖X_S + X_L − Y‖_{vec(1)} ≤ ε_{vec(1)}
     ‖X_S + X_L − Y‖_* ≤ ε_*                (1)

where ‖·‖_{vec(1)} is the entry-wise 1-norm and ‖·‖_* is the trace norm (i.e., the sum of singular values). The second is the regularized formulation with regularization parameter µ > 0:

min  (1/(2µ))‖X_S + X_L − Y‖²_{vec(2)} + λ‖X_S‖_{vec(1)} + ‖X_L‖_*                (2)

where ‖·‖_{vec(2)} is the Frobenius norm (entry-wise 2-norm). We also consider adding a constraint to control ‖X_L‖_{vec(∞)}, the entry-wise ∞-norm of X_L. To (1), we add the constraint ‖X_L‖_{vec(∞)} ≤ b; to (2), we add ‖X_S − Y‖_{vec(∞)} ≤ b.
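The paper analyzes the minimizers of (1) and (2) but does not prescribe a particular solver. For concreteness, the following is a minimal proximal-gradient sketch for the regularized formulation (2), omitting the optional box constraints just mentioned; it uses soft-thresholding for the entry-wise 1-norm and singular-value thresholding for the trace norm. The function and variable names are our own choices, not from the paper, and in practice one would add a stopping criterion or use an accelerated or ADMM variant.

```python
import numpy as np

def soft_threshold(A, t):
    """Entry-wise soft-thresholding: prox of t * ||.||_vec(1)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def svd_threshold(A, t):
    """Singular-value thresholding: prox of t * ||.||_* (trace norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def regularized_decomposition(Y, lam, mu, n_iter=500):
    """Proximal gradient (ISTA) for
       min (1/(2*mu))*||X_S + X_L - Y||_F^2 + lam*||X_S||_vec(1) + ||X_L||_*.
    """
    X_S = np.zeros_like(Y)
    X_L = np.zeros_like(Y)
    step = mu / 2.0                       # 1/L for the smooth term (L = 2/mu)
    for _ in range(n_iter):
        R = (X_S + X_L - Y) / mu          # gradient of the smooth term w.r.t. both blocks
        X_S = soft_threshold(X_S - step * R, step * lam)
        X_L = svd_threshold(X_L - step * R, step)
    return X_S, X_L
```

This is only a sketch of one standard way to solve such composite objectives; the guarantees below concern the exact minimizers of (1) and (2), not any particular iterative scheme.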

The parameter b is intended as a natural bound for ‖X̄_L‖_{vec(∞)} and is typically known in applications. For example, in image processing, the values of interest may lie in the interval [0, 255] (say); hence, we might take b = 500 as a relaxation of the box constraint [0, 255]. The core of our analyses does not rely on these additional constraints; we only consider them to obtain improved robustness guarantees for recovering X̄_L, which may be important in some applications.

B. Identifiability conditions

Our first result is a refinement of the rank-sparsity incoherence notion developed by [1]. We characterize a target decomposition of Y into Y = X̄_S + X̄_L by the projection operators to subspaces associated with X̄_S and X̄_L. Let

Ω̄ = Ω(X̄_S) := {X ∈ R^{m×n} : supp(X) ⊆ supp(X̄_S)}

be the space of matrices whose supports are subsets of the support of X̄_S, and let P_Ω̄ be the orthogonal projector to Ω̄ under the inner product ⟨A, B⟩ = tr(A^⊤B); this projection is given by

[P_Ω̄ M]_{i,j} = M_{i,j} if (i, j) ∈ supp(X̄_S), and 0 otherwise,

for all i ∈ [m] := {1, …, m} and j ∈ [n] := {1, …, n}. Furthermore, let

T̄ = T(X̄_L) := {X_1 + X_2 ∈ R^{m×n} : range(X_1) ⊆ range(X̄_L), range(X_2^⊤) ⊆ range(X̄_L^⊤)}

be the span of matrices either with column-space contained in that of X̄_L, or with row-space contained in that of X̄_L. Let P_T̄ be the orthogonal projector to T̄, again under the inner product ⟨A, B⟩ = tr(A^⊤B); this projection is given by

P_T̄ M = ŪŪ^⊤M + MV̄V̄^⊤ − ŪŪ^⊤MV̄V̄^⊤

where Ū ∈ R^{m×r̄} and V̄ ∈ R^{n×r̄} are, respectively, the matrices of left and right orthonormal singular vectors corresponding to the non-zero singular values of X̄_L, and r̄ is the rank of X̄_L.

We will see that certain operator norms of P_Ω̄ and P_T̄ can be bounded in terms of structural properties of X̄_S and X̄_L. The first property measures the maximum number of non-zero entries in any row or column of X̄_S:

α(ρ) := max{ρ‖sign(X̄_S)‖_{1→1}, ρ^{-1}‖sign(X̄_S)‖_{∞→∞}}

where ‖M‖_{p→q} := max{‖Mv‖_q : v ∈ R^n, ‖v‖_p ≤ 1}, sign(M)_{i,j} is −1 if M_{i,j} < 0, 0 if M_{i,j} = 0, and +1 if M_{i,j} > 0 (for i ∈ [m], j ∈ [n]), and ρ > 0 is a balancing parameter to accommodate disparity between the number of rows and columns; a natural choice for the balancing parameter is ρ := √(n/m). We remark that ρ is only a parameter for the analysis; the optimization formulations do not directly involve ρ. Note that X̄_S may have Ω(mn) non-zero entries and α(√(n/m)) = O(√(mn)) as long as the non-zero entries of X̄_S are spread out over the entire matrix. Conversely, a sparse matrix with just O(m + n) non-zero entries could have α(√(n/m)) = √(mn) by having all of its non-zero entries in just a few rows and columns.

The second property measures the sparseness of the singular vectors of X̄_L:

β(ρ) := ρ^{-1}‖ŪŪ^⊤‖_{vec(∞)} + ρ‖V̄V̄^⊤‖_{vec(∞)} + ‖Ū‖_{2→∞}‖V̄‖_{2→∞}.

For instance, if the singular vectors of X̄_L are perfectly aligned with the coordinate axes, then β(ρ) = Ω(1). On the other hand, if the left and right singular vectors have entries bounded by √(c/m) and √(c/n), respectively, for some c ≥ 1, then β(√(n/m)) ≤ 3c r̄/√(mn).

Our main identifiability result is the following.

Theorem 1: If inf_{ρ>0} α(ρ)β(ρ) < 1, then Ω̄ ∩ T̄ = {0}.

Theorem 1 is an immediate consequence of the following lemma (also given as Lemma 10).

Lemma 1: For all M ∈ R^{m×n},

‖P_Ω̄ P_T̄ M‖_{vec(∞)} ≤ inf_{ρ>0} α(ρ)β(ρ) ‖M‖_{vec(∞)}.

Proof of Theorem 1: Take any M ∈ Ω̄ ∩ T̄. By Lemma 1, ‖P_Ω̄ P_T̄ M‖_{vec(∞)} ≤ α(ρ)β(ρ)‖M‖_{vec(∞)}. On the other hand, P_Ω̄ P_T̄ M = M, so α(ρ)β(ρ) < 1 implies ‖M‖_{vec(∞)} = 0, i.e., M = 0.

Clearly, if Ω̄ ∩ T̄ contains a matrix other than 0, then {(X̄_S + M, X̄_L − M) : M ∈ Ω̄ ∩ T̄} gives a family of sparse/low-rank decompositions of Y = X̄_S + X̄_L with at least the same sparsity and rank as (X̄_S, X̄_L). Conversely, if Ω̄ ∩ T̄ = {0}, then any matrix in the direct sum Ω̄ ⊕ T̄ has exactly one decomposition into a matrix A ∈ Ω̄ plus a matrix B ∈ T̄; in this sense (X̄_S, X̄_L) is identifiable.
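To make these definitions concrete, the following is a small numpy sketch (ours, not from the paper) that forms the projections P_Ω̄ M and P_T̄ M for a given target pair and evaluates α(ρ) and β(ρ); it can be used to check the contraction of Lemma 1 numerically on small examples. The function names, the tolerance parameter, and the random test instance are our own choices.

```python
import numpy as np

def proj_Omega(M, X_S_bar, tol=1e-12):
    """P_Omega: keep the entries of M on the support of X_S_bar, zero out the rest."""
    return np.where(np.abs(X_S_bar) > tol, M, 0.0)

def proj_T(M, X_L_bar, tol=1e-12):
    """P_T: project M onto the span of matrices sharing row/column space with X_L_bar."""
    U, s, Vt = np.linalg.svd(X_L_bar, full_matrices=False)
    r = int((s > tol).sum())
    U, V = U[:, :r], Vt[:r, :].T
    UU, VV = U @ U.T, V @ V.T
    return UU @ M + M @ VV - UU @ M @ VV

def alpha(X_S_bar, rho, tol=1e-12):
    """alpha(rho): weighted max number of non-zeros per column / per row of X_S_bar."""
    S = np.abs(X_S_bar) > tol
    per_col = S.sum(axis=0).max()   # ||sign(X_S)||_{1->1}
    per_row = S.sum(axis=1).max()   # ||sign(X_S)||_{inf->inf}
    return max(rho * per_col, per_row / rho)

def beta(X_L_bar, rho, tol=1e-12):
    """beta(rho): coherence of the singular vectors of X_L_bar."""
    U, s, Vt = np.linalg.svd(X_L_bar, full_matrices=False)
    r = int((s > tol).sum())
    U, V = U[:, :r], Vt[:r, :].T
    return (np.abs(U @ U.T).max() / rho
            + rho * np.abs(V @ V.T).max()
            + np.linalg.norm(U, axis=1).max() * np.linalg.norm(V, axis=1).max())

# Numerical check of the Lemma 1 bound on a random small instance (a sanity check, not a proof).
rng = np.random.default_rng(0)
m, n, r = 40, 60, 2
X_L_bar = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
X_S_bar = np.zeros((m, n))
idx = rng.choice(m * n, size=100, replace=False)
X_S_bar.flat[idx] = rng.standard_normal(100)
rho = np.sqrt(n / m)
M = rng.standard_normal((m, n))
lhs = np.abs(proj_Omega(proj_T(M, X_L_bar), X_S_bar)).max()
rhs = alpha(X_S_bar, rho) * beta(X_L_bar, rho) * np.abs(M).max()
print(lhs <= rhs + 1e-9)   # the bound holds for each rho (it may be loose if alpha*beta >= 1)
```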
Note that, as we have argued above, the condition inf_{ρ>0} α(ρ)β(ρ) < 1 may be achieved even by matrices X̄_S with Ω(mn) non-zero entries, provided that the non-zero entries of X̄_S are sufficiently spread out and that X̄_L is low-rank and has singular vectors far

from the coordinate basis. This is in contrast with the conditions studied by [1]. Their analysis uses a different characterization of X̄_S and X̄_L, which leads to a stronger identifiability condition in certain cases. Roughly, if X̄_S has an approximately symmetric sparsity pattern (so sign(X̄_S) ≈ sign(X̄_S)^⊤), then [1] requires α(1)β(1) < 1 for square n × n matrices. Since β(1) = Ω(1/√n) for any X̄_L ∈ R^{n×n}, the condition forces α(1) = O(√n), and hence X̄_S can have at most O(n√n) non-zero entries. In other words, the fraction of non-zero entries allowed in X̄_S by the condition α(1)β(1) < 1 is quickly vanishing as a function of n.

C. Recovery guarantees

Our next results are guarantees for approximately recovering the sparse/low-rank decomposition (X̄_S, X̄_L) from Y ≈ X̄_S + X̄_L by solving either of the convex optimization problems (1) or (2). We require a mild strengthening of the condition inf_{ρ>0} α(ρ)β(ρ) < 1, as well as appropriate settings of λ > 0 and µ > 0, for our recovery guarantees. Before continuing, we first define another property of X̄_L:

γ := ‖ŪV̄^⊤‖_{vec(∞)}

which is approximately the same as (in fact, bounded above by) the third term in the definition of β(ρ).

The quantities α(ρ), β(ρ), and γ are central to our analysis. Therefore we state the following proposition for reference, which provides a more intuitive understanding of their behavior. We note that this is the only part in which any explicit dimensional dependencies come into our analysis.

Proposition 1: Let m_0 be the maximum number of non-zero entries of X̄_S per column, and let n_0 be the maximum number of non-zero entries of X̄_S per row. Let r̄ be the rank of ŪV̄^⊤. Assume further that m_0 ≤ c_1 m/r̄ and n_0 ≤ c_1 n/r̄ for some c_1 ∈ (0, 1), and that ‖Ū‖_{vec(∞)} ≤ √(c_2/m) and ‖V̄‖_{vec(∞)} ≤ √(c_2/n) for some c_2 > 0. Then, with ρ = √(n/m), we have

α(ρ) ≤ c_1 √(mn)/r̄,   β(ρ) ≤ 3c_2 r̄/√(mn),   γ ≤ c_2 r̄/√(mn).

We remark that [1] does not explicitly work out the non-square case, but claims that n can be replaced in their analysis by the larger matrix dimension max{m, n}. However, this does not seem possible, and the analysis there should only lead to the quite suboptimal dimensionality dependency min{m, n}. This is because a rectangular matrix X̄_L will have left and right singular vectors of different dimensions and thus different allowable ranges of infinity norms.

We now proceed with conditions for the regularized formulation (2). Let E := Y − (X̄_S + X̄_L), and define

ε_{2→2} := ‖E‖_{2→2}   and   ε_{vec(∞)} := ‖E‖_{vec(∞)} + ‖P_T̄ E‖_{vec(∞)}.

We require the following, for some ρ > 0 and c > 1:

α(ρ)β(ρ) < 1                                                                    (3)

λ ≤ (1 − α(ρ)β(ρ))(1 − c µ^{-1} ε_{2→2}) / (c α(ρ)) − µ^{-1} ε_{vec(∞)} − γ      (4)

λ ≥ c (γ + µ^{-1}(2 − α(ρ)β(ρ)) ε_{vec(∞)}) / (1 − α(ρ)β(ρ) − c α(ρ)β(ρ)) > 0    (5)

For instance, if for some ρ > 0,

α(ρ)γ ≤ 1/12   and   α(ρ)β(ρ) ≤ 1/4,                                            (6)

then the conditions are satisfied for c = 2 provided that µ and λ are chosen to satisfy

µ ≥ max{4 ε_{2→2}, 5 ε_{vec(∞)}/(2λ)}   and   (5/2)γ ≤ λ ≤ 5/(82 α(ρ)).          (7)

Note that (6) can be satisfied when c_1 c_2 ≤ 1/12 in Proposition 1.

For the constrained formulation (1), our analysis requires the same conditions as above, except with E set to 0. Note that our analysis still allows for approximate decompositions; it is only the conditions that are formulated with E = 0. Specifically, we require, for some ρ > 0 and c > 1:

α(ρ)β(ρ) < 1                                                                    (8)

λ ≤ (1 − α(ρ)β(ρ) − c α(ρ)γ) / (c α(ρ))                                          (9)

λ ≥ c γ / (1 − α(ρ)β(ρ) − c α(ρ)β(ρ)) > 0                                        (10)

For instance, if for some ρ > 0,

α(ρ)γ ≤ 1/15   and   α(ρ)β(ρ) ≤ 1/5,                                            (11)

then the conditions are satisfied for c = 2 provided that λ is chosen to satisfy

5γ ≤ λ ≤ 1/(3 α(ρ)).                                                             (12)

Note that (11) can be satisfied when c_1 c_2 ≤ 1/15 in Proposition 1.
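As a quick numerical illustration of Proposition 1 (our own sketch; the constants c_1, c_2 and dimensions below are arbitrary choices, not values from the paper), the bounds can be instantiated as follows; the point is that the products α(ρ)β(ρ) and α(ρ)γ stay bounded by 3c_1c_2 and c_1c_2 regardless of the matrix size.

```python
import numpy as np

def proposition1_bounds(m, n, r_bar, c1, c2):
    """Bounds on alpha(rho), beta(rho), gamma from Proposition 1 with rho = sqrt(n/m).

    Assumes at most c1*m/r_bar corruptions per column and c1*n/r_bar per row, and
    singular-vector entries bounded by sqrt(c2/m) and sqrt(c2/n) respectively.
    """
    alpha_bound = c1 * np.sqrt(m * n) / r_bar
    beta_bound = 3 * c2 * r_bar / np.sqrt(m * n)
    gamma_bound = c2 * r_bar / np.sqrt(m * n)
    return alpha_bound, beta_bound, gamma_bound

# Example: a 1000 x 2000 matrix of rank 5.
a, b, g = proposition1_bounds(1000, 2000, 5, c1=0.05, c2=1.0)
print(a * b, a * g)   # -> 0.15, 0.05  (dimension-free products)
```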

In summary, Proposition 1 shows that our results can be applied even with m_0 = Ω(m/r̄) and n_0 = Ω(n/r̄) corruptions. In contrast, the results of [1] only apply under the condition max{m_0, n_0} = O(√(min{m, n}/r̄)), which is significantly stronger. Moreover, unlike the analysis of [3], we do not have to assume that supp(X̄_S) is random.

The following theorem gives our recovery guarantee for the constrained formulation (1).

Theorem 2: Fix a target pair (X̄_S, X̄_L) ∈ R^{m×n} × R^{m×n} satisfying

‖Y − (X̄_S + X̄_L)‖_{vec(1)} ≤ ε_{vec(1)}   and   ‖Y − (X̄_S + X̄_L)‖_* ≤ ε_*.

Assume the conditions (8), (9), (10) hold for some ρ > 0 and c > 1. Let (X̂_S, X̂_L) ∈ R^{m×n} × R^{m×n} be the solution to the convex optimization problem (1). We have

max{‖X̂_S − X̄_S‖_{vec(1)}, ‖X̂_L − X̄_L‖_{vec(1)}}
  ≤ [1 + (1 + 1/c)(2 − α(ρ)β(ρ))/(1 − α(ρ)β(ρ))] ε_{vec(1)} + [(1 + 1/c)(2 − α(ρ)β(ρ))/(1 − α(ρ)β(ρ))] ε_*/λ.

If, in addition, for some b ≥ ‖X̄_L‖_{vec(∞)}, either: (i) the optimization problem (1) is augmented with the constraint ‖X_L‖_{vec(∞)} ≤ b (letting X̃_L := X̂_L), or (ii) X̂_L is post-processed by replacing [X̂_L]_{i,j} with [X̃_L]_{i,j} := min{max{[X̂_L]_{i,j}, −b}, b} for all (i, j), then we also have

‖X̃_L − X̄_L‖_{vec(2)} ≤ min{‖X̂_L − X̄_L‖_{vec(1)}, √(2b ‖X̂_L − X̄_L‖_{vec(1)})}.

The proof of Theorem 2 is in Section V. It is clear that if Y = X̄_S + X̄_L, then we can set ε_{vec(1)} = ε_* = 0, and we obtain exact recovery: X̂_S = X̄_S and X̂_L = X̄_L. Moreover, any perturbation Y − (X̄_S + X̄_L) affects the accuracy of (X̂_S, X̂_L) in entry-wise 1-norm by an amount O(ε_{vec(1)} + ε_*/λ). Note that here, the parameter λ serves to balance the entry-wise 1-norm and trace norm of the perturbation in the same way it is used in the objective function of (1). So, for instance, if we have the simplified conditions (11), then we may choose λ = √(5γ/(3α(ρ))) to satisfy (12), upon which the error bound becomes

max{‖X̂_S − X̄_S‖_{vec(1)}, ‖X̂_L − X̄_L‖_{vec(1)}} = O(ε_{vec(1)} + √(α(ρ)/γ) ε_*).

It is possible to modify the constraints in (1) to use norms other than ‖·‖_{vec(1)} and ‖·‖_*; the analysis could, at the very least, be modified by simply using standard relationships to change between norms, although this may introduce new slack in the bounds. Finally, the second part of the theorem shows how the accuracy of X̂_L in Frobenius norm can be improved by adding an additional constraint or by post-processing the solution.

Now we state our recovery guarantees for the regularized formulation (2).

Theorem 3: Fix a target pair (X̄_S, X̄_L) ∈ R^{m×n} × R^{m×n}. Let E := Y − (X̄_S + X̄_L),

ε_{2→2} := ‖E‖_{2→2},   ε_{vec(∞)} := ‖E‖_{vec(∞)} + ‖P_T̄ E‖_{vec(∞)},   ε_* := ‖P_T̄ E‖_*.

Let k̄ := |supp(X̄_S)| and r̄ := rank(X̄_L). Assume the conditions (3), (4), (5) hold for some ρ > 0 and c > 1. Let (X̂_S, X̂_L) ∈ R^{m×n} × R^{m×n} be the solution to the convex optimization problem (2) augmented with the constraint ‖X_S − Y‖_{vec(∞)} ≤ b for some b ≥ ‖X̄_S − Y‖_{vec(∞)} (b = ∞ is allowed). Let

√r̃ := (λ + µ^{-1} ε_{vec(∞)}) √(2k̄) + [2α(ρ)(λ + γ + µ^{-1} ε_{vec(∞)})/(1 − α(ρ)β(ρ)) + 1 + 2µ^{-1} ε_{2→2}] √(2r̄).

We have

‖X̂_S − X̄_S‖_{vec(1)} ≤ [ (λ + γ + µ^{-1} ε_{vec(∞)} + 1 + 2µ^{-1} ε_{2→2}) √r̃ µ / ((1 − 1/c)λ) + λ√k̄ µ + 2√k̄ √r̃ µ + √k̄ ε_{vec(∞)} ] / (1 − α(ρ)β(ρ)),

‖X̂_S − X̄_S‖_{vec(2)} ≤ min{‖X̂_S − X̄_S‖_{vec(1)}, √(2b ‖X̂_S − X̄_S‖_{vec(1)})},

and

‖X̂_L − X̄_L‖_* ≤ 2√r̃ ‖X̂_S − X̄_S‖_{vec(2)} + ε_* + r̃ µ / (2(1 − 1/c)).

The proof of Theorem 3 is in Section VI. As before, if Y = X̄_S + X̄_L (so E = 0), then we can let µ → 0 and obtain exact recovery with X̂_S = X̄_S and X̂_L = X̄_L. When the perturbation E is non-zero, we control the accuracy of X̂_S in entry-wise 1-norm and 2-norm, and the accuracy of X̂_L in trace norm. Under the simplified conditions (6), we can choose λ = 5/(82 α(ρ))

and µ = max{4 ε_{2→2}, 5 ε_{vec(∞)}/(2λ)} to satisfy (7); this leads to the error bounds

‖X̂_S − X̄_S‖_{vec(1)} = O(√r̄ α(ρ) max{ε_{2→2}, α(ρ) ε_{vec(∞)}})

and

‖X̂_L − X̄_L‖_* = O(√r̄ min{√(b ‖X̂_S − X̄_S‖_{vec(1)}), ‖X̂_S − X̄_S‖_{vec(1)}} + ε_* + √r̄ max{ε_{2→2}, α(ρ) ε_{vec(∞)}});

here, we have used the facts k̄ ≤ α(ρ)², α(ρ)λ = Θ(1), and √r̃ = O(√r̄), which also imply that √k̄ ε_{vec(∞)} = O(α(ρ) · α(ρ) ε_{vec(∞)}). Finally, note that if the constraint ‖X_S − Y‖_{vec(∞)} ≤ b is added (i.e., b < ∞), then the requirement b ≥ ‖X̄_S − Y‖_{vec(∞)} can be satisfied with b := ‖X̄_L‖_{vec(∞)} + ε_{vec(∞)}. This allows for a possibly improved bound on ‖X̂_L − X̄_L‖_*.

Our analysis centers around the construction of a dual certificate using a least-squares method similar to that in related works [1], [3]. The construction requires the invertibility of I − P_Ω̄ P_T̄ (a composition of projection operators), which is established in our analysis by studying certain operator norms of P_Ω̄ and P_T̄; in previous works, invertibility is established only under probabilistic assumptions [3] or stricter sparsity conditions [1]. The rest of the analysis then relates the accuracy of the solutions to (1) and (2) to properties of the constructed dual certificate.

D. Examples

We illustrate our main results with some simple examples.

1) Random models: We first consider a random model for the matrices X̄_S and X̄_L [1]. Let the support of X̄_S be chosen uniformly at random k times over the [m] × [n] matrix entries (so that one entry can be selected multiple times). The values of the entries in the chosen support can be arbitrary. With high probability, we have

‖sign(X̄_S)‖_{1→1} = O(k log(n)/n)   and   ‖sign(X̄_S)‖_{∞→∞} = O(k log(m)/m),

so for ρ := √(n log(m)/(m log(n))), we have

α(ρ) = O(k √(log(m) log(n)) / √(mn)).

The logarithmic factors are due to collisions in the random process. Now let Ū and V̄ be chosen uniformly at random over all families of r̄ orthonormal vectors in R^m and R^n, respectively. Using arguments similar to those in [6], one can show that with high probability,

‖ŪŪ^⊤‖_{vec(∞)} = O(r̄ log(m)/m),   ‖V̄V̄^⊤‖_{vec(∞)} = O(r̄ log(n)/n),
‖Ū‖_{2→∞} = O(√(r̄ log(m)/m)),   ‖V̄‖_{2→∞} = O(√(r̄ log(n)/n)),

so for the previously chosen ρ, we have

β(ρ) = O(r̄ √(log(m) log(n)) / √(mn))   and   γ = O(r̄ √(log(m) log(n)) / √(mn)).

Therefore

α(ρ)β(ρ) = O(k r̄ log(m) log(n) / (mn))   and   α(ρ)γ = O(k r̄ log(m) log(n) / (mn)),

both of which are sufficiently small provided that

k ≤ δ mn / (r̄ log(m) log(n))

for a small enough constant δ ∈ (0, 1). In other words, when X̄_L is low-rank, the matrix X̄_S can have nearly a constant fraction of its entries be non-zero while still allowing for exact decomposition of Y = X̄_S + X̄_L. Our guarantee improves over that of [1] by roughly a factor of Ω((mn)^{1/4}), but is worse by a factor of r̄ log(m) log(n) relative to the guarantees of [3] for the random model. Therefore there is a gap between our generic deterministic analysis and a direct probabilistic analysis of this random model, and this gap seems unavoidable with sparsity conditions based on α(ρ). This is because X̄_L could be (an n × n, for simplicity) block diagonal matrix with r̄ blocks of (n/r̄) × (n/r̄) rank-1 matrices; such a matrix guarantees β(1) = O(r̄/n) but has just n²/r̄ non-zero entries. It is an interesting open problem to find alternative characterizations of supp(X̄_S) that can narrow or close this gap.
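The random model above is easy to simulate. The following sketch is our own (it assumes the regularized_decomposition solver sketched in Section II-A is in scope, and the values of λ and µ are picked by hand for illustration rather than from the theory); it generates a random sparse-plus-low-rank instance with E = 0 and reports how well the low-rank component is recovered.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, k = 200, 200, 3, 2000          # dimensions, rank, number of corruptions

# Low-rank component with random (hence, with high probability, incoherent) factors.
X_L_bar = rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) / np.sqrt(r)

# Sparse component: k entries chosen uniformly at random, with arbitrary magnitudes.
X_S_bar = np.zeros((m, n))
idx = rng.choice(m * n, size=k, replace=False)
X_S_bar.flat[idx] = 10.0 * rng.standard_normal(k)

Y = X_S_bar + X_L_bar                   # exact decomposition (E = 0)

# Hand-picked parameters for this illustration only; the theory's choices depend on
# alpha(rho), beta(rho), and gamma as discussed above.
lam = 1.0 / np.sqrt(max(m, n))
mu = 0.01
X_S_hat, X_L_hat = regularized_decomposition(Y, lam, mu, n_iter=500)

rel_err = np.linalg.norm(X_L_hat - X_L_bar) / np.linalg.norm(X_L_bar)
print(f"relative Frobenius error of the low-rank part: {rel_err:.3f}")
```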

2) Principal component analysis with sparse corruptions: Suppose X̄_L is a matrix of m data points lying in a low-dimensional subspace of R^n, and Z is a random matrix with independent Gaussian noise entries of variance σ². Then Y = X̄_L + Z is the standard model for principal component analysis. We augment the model with a sparse noise component X̄_S to obtain Y = X̄_S + X̄_L + Z; here, we allow the non-zero entries of X̄_S to possibly approach infinity. According to Theorem 3, we need to estimate ‖Z‖_{2→2}, ‖Z‖_{vec(∞)}, ‖P_T̄ Z‖_{vec(∞)}, and ‖P_T̄ Z‖_*. We have the following with high probability [10]:

‖Z‖_{2→2} ≤ σ√m + σ√n + O(σ).

Using standard arguments with the rotational invariance of the Gaussian distribution, we also have

‖Z‖_{vec(∞)} ≤ O(σ√(log(mn)))   and   ‖P_T̄ Z‖_{vec(∞)} ≤ O(σ√(log(mn)))

with high probability. Finally, by Lemma 5, we have

‖P_T̄ Z‖_* ≤ √(2r̄) ‖Z‖_{2→2} ≤ √(2r̄)σ√m + √(2r̄)σ√n + O(√r̄ σ).

Suppose (X̄_S, X̄_L) has α(ρ) ≤ c_1√(mn)/r̄, β(ρ) = Θ(r̄/√(mn)), and γ = Θ(r̄/√(mn)), and satisfies the simplified condition (6). This can be achieved with c_1 c_2 ≤ 1/12 in Proposition 1. Also assume λ and µ are chosen to satisfy (7), and that b ≥ ‖X̄_L‖_{vec(∞)} + ε_{vec(∞)}. Then we note that k̄ = O(c_1² mn/r̄²), and thus we have from Theorem 3 (see the discussion thereafter):

‖X̂_S − X̄_S‖_{vec(1)} = O(c_1√(mn) · max{σ√m + σ√n, σ√(mn log(mn))/√r̄}) = O(σ c_1 mn √(log(mn))/√r̄)

and

‖X̂_L − X̄_L‖_* = O(√(b σ c_1 mn √(log(mn))/√r̄) + √r̄ σ(√m + √n) + c_1√(mn)),

where we may take b = O(σ√(log(mn))) + ‖X̄_L‖_{vec(∞)}. Now consider the situation where both m, n → ∞, and assume that ‖X̄_L‖_{vec(∞)} remains bounded. If c_1 (log(mn))² = o(1) (which means that the number of corruptions per column is o(m/(log(mn))²) and the number of deterministic corruptions per row is o(n/(log(mn))²)), then

‖X̂_L − X̄_L‖_* = O(√r̄ σ (√m + √n)),

so the normalized trace norm error of X̂_L tends to zero:

‖X̂_L − X̄_L‖_* / √(mn) → 0.

This means that we can correctly recover the principal components of X̄_L with both deterministic corruptions and random noise, when both m and n are large and c_1 (log(mn))² = o(1) in Proposition 1.

III. TECHNICAL PRELIMINARIES

A. Norms, inner products, and projections

Our analysis involves a variety of norms of vectors and matrices (viewed as elements of a vector space), as well as of linear operators on vectors and linear operators on matrices; we define these and related notions in this section.

1) Entry-wise norms: For any p ∈ [1, ∞], define ‖v‖_p := (Σ_i |v_i|^p)^{1/p} to be the p-norm of a vector v, with ‖v‖_∞ := max_i |v_i|. Also, define ‖M‖_{vec(p)} := (Σ_{i,j} |M_{i,j}|^p)^{1/p} to be the entry-wise p-norm of a matrix M (again, with ‖M‖_{vec(∞)} := max_{i,j} |M_{i,j}|). Note that ‖·‖_{vec(2)} corresponds to the Frobenius norm.

2) Inner products, linear operators, orthogonal projections: We endow R^{m×n} with the inner product ⟨·, ·⟩ between matrices that induces the Frobenius norm ‖·‖_{vec(2)}; this is given by ⟨M, N⟩ = tr(M^⊤N). For a linear operator T : R^{m×n} → R^{m×n}, we denote its adjoint by T*; this is the unique linear operator that satisfies ⟨T(M), N⟩ = ⟨M, T*(N)⟩ for all M ∈ R^{m×n} and N ∈ R^{m×n} (in this work, we only consider bounded linear operators). For any two linear operators T_1 and T_2, we let T_1 T_2 denote their composition, defined by T_1 T_2 (M) := T_1(T_2(M)). Given a subspace W ⊆ R^{m×n}, we let W^⊥ denote its orthogonal complement, and let P_W : R^{m×n} → R^{m×n} denote the orthogonal projector to W with respect to ⟨·, ·⟩, i.e., the unique linear operator with range W satisfying P_W = P_W* and P_W P_W = P_W.

3) Induced norms: For any two vector norms ‖·‖_p and ‖·‖_q, define ‖M‖_{p→q} := max_{x≠0} ‖Mx‖_q / ‖x‖_p to be the corresponding induced operator norm of a matrix M. Our analysis uses the following special cases, which have alternative definitions:

‖M‖_{1→1} = max_j ‖M e_j‖_1,
‖M‖_{1→2} = max_j ‖M e_j‖_2,
‖M‖_{2→2} = spectral norm of M (i.e., largest singular value of M),
‖M‖_{2→∞} = max_i ‖M^⊤ e_i‖_2,
‖M‖_{∞→∞} = max_i ‖M^⊤ e_i‖_1.

Here, e_i is the i-th coordinate vector, which has a 1 in the i-th position and 0 elsewhere.
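These special cases are straightforward to compute directly; the following is a small numpy reference of our own, matching the alternative definitions above.

```python
import numpy as np

def induced_norms(M):
    """Induced operator norms used in the analysis, via their alternative definitions."""
    return {
        "1->1":     np.abs(M).sum(axis=0).max(),        # max column absolute sum
        "1->2":     np.linalg.norm(M, axis=0).max(),    # max column Euclidean norm
        "2->2":     np.linalg.norm(M, 2),               # largest singular value
        "2->inf":   np.linalg.norm(M, axis=1).max(),    # max row Euclidean norm
        "inf->inf": np.abs(M).sum(axis=1).max(),        # max row absolute sum
    }
```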
Finally, we also consider induced operator norms of linear matrix operators T : R^{m×n} → R^{m×n} (in particular, projection operators) with respect to ⟨·, ·⟩. For any two matrix norms ‖·‖ and ‖·‖′, define

‖T‖_{‖·‖→‖·‖′} := max_{M≠0} ‖T(M)‖′ / ‖M‖.

4) Other norms: The trace norm (or nuclear norm) ‖M‖_* of a matrix M is the sum of the singular values of M. We will also make use of a hybrid matrix norm

‖·‖_ρ, parametrized by ρ > 0, which we define by

‖M‖_ρ := max{ρ‖M‖_{1→1}, ρ^{-1}‖M‖_{∞→∞}}.

Also define ‖M‖_{ρ*} := sup_{‖N‖_ρ ≤ 1} ⟨M, N⟩, i.e., the dual of ‖·‖_ρ (see below).

5) Dual pairs: The matrix norm ‖·‖′ is said to be dual to ‖·‖ if, for all M ∈ R^{m×n}, ‖M‖′ = sup_{‖N‖ ≤ 1} ⟨M, N⟩.

Proposition 2: Fix any matrix norm ‖·‖, and let ‖·‖′ be its dual. For all M ∈ R^{m×n} and N ∈ R^{m×n}, we have ⟨M, N⟩ ≤ ‖M‖ ‖N‖′.

Proposition 3: Fix any linear matrix operator T : R^{m×n} → R^{m×n} and any pair of matrix norms ‖·‖_a and ‖·‖_b. We have ‖T‖_{a→b} = ‖T*‖_{b′→a′}, where ‖·‖_{a′} is dual to ‖·‖_a and ‖·‖_{b′} is dual to ‖·‖_b.

The following pairs of matrix norms are dual to each other: 1) ‖·‖_{vec(p)} and ‖·‖_{vec(q)} where 1/p + 1/q = 1; 2) ‖·‖_* and ‖·‖_{2→2}; 3) ‖·‖_ρ and ‖·‖_{ρ*} (by definition).

6) Some lemmas: First we show that the ‖·‖_ρ norm, for any ρ > 0, bounds the spectral norm ‖·‖_{2→2}.

Lemma 2: For any M ∈ R^{m×n} we have, for all ρ > 0, ‖M‖_{2→2} ≤ ‖M‖_ρ.

Proof: Let σ be the largest singular value of M, and let u ∈ R^m and v ∈ R^n be, respectively, associated left and right singular vectors. Then

[[0, ρ^{-1}M], [ρM^⊤, 0]] [ρ^{-1/2}u; ρ^{1/2}v] = [ρ^{-1/2}Mv; ρ^{1/2}M^⊤u] = σ [ρ^{-1/2}u; ρ^{1/2}v].

Moreover, by the definition of ‖·‖_{∞→∞},

‖ [[0, ρ^{-1}M], [ρM^⊤, 0]] [ρ^{-1/2}u; ρ^{1/2}v] ‖_∞ ≤ ‖ [[0, ρ^{-1}M], [ρM^⊤, 0]] ‖_{∞→∞} ‖ [ρ^{-1/2}u; ρ^{1/2}v] ‖_∞.

Therefore

‖M‖_{2→2} = σ ≤ ‖ [[0, ρ^{-1}M], [ρM^⊤, 0]] ‖_{∞→∞} = max{ρ^{-1}‖M‖_{∞→∞}, ρ‖M^⊤‖_{∞→∞}} = max{ρ‖M‖_{1→1}, ρ^{-1}‖M‖_{∞→∞}} = ‖M‖_ρ.

The following lemma is the dual of Lemma 2.

Lemma 3: For any M ∈ R^{m×n} we have, for all ρ > 0, ‖M‖_{ρ*} ≤ ‖M‖_*.

Proof: We know that ‖M‖_{ρ*} = ⟨M, N⟩ for some matrix N such that ‖N‖_ρ = 1. Therefore ‖N‖_{2→2} ≤ 1 from Lemma 2, and thus, using Proposition 2,

‖M‖_{ρ*} = ⟨M, N⟩ ≤ ‖M‖_* ‖N‖_{2→2} ≤ ‖M‖_*.

Finally, we state a lemma concerning the invertibility of a certain block-form operator used in our analysis.

Lemma 4: Fix any matrix norm ‖·‖ on R^{m×n} and linear operators T_1 : R^{m×n} → R^{m×n} and T_2 : R^{m×n} → R^{m×n}. Let I : R^{m×n} → R^{m×n} be the identity operator, and suppose ‖T_1 T_2‖_{‖·‖→‖·‖} < 1.

1) I − T_1 T_2 is invertible and satisfies

‖(I − T_1 T_2)^{-1}‖_{‖·‖→‖·‖} ≤ 1 / (1 − ‖T_1 T_2‖_{‖·‖→‖·‖}).

2) The linear operator on R^{m×n} × R^{m×n}

[[I, T_1], [T_2, I]]

is invertible, and its inverse is given by

[[I, T_1], [T_2, I]]^{-1} = [[(I − T_1 T_2)^{-1}, −(I − T_1 T_2)^{-1} T_1], [−(I − T_2 T_1)^{-1} T_2, (I − T_2 T_1)^{-1}]].

Proof: The first claim is a standard application of Taylor expansions. The second claim then follows from the formulae for block matrix inverses using Schur complements.

B. Projection operators and subdifferential sets

Recall the definitions of the following subspaces:

Ω(X_S) := {X ∈ R^{m×n} : supp(X) ⊆ supp(X_S)},
T(X_L) := {X_1 + X_2 ∈ R^{m×n} : range(X_1) ⊆ range(X_L), range(X_2^⊤) ⊆ range(X_L^⊤)}.

The orthogonal projectors to these spaces are given in the following proposition.

9 Proposition 4 Fix any X S R m n X L R m n For any matrix M R m n, { Mi,j if i, j suppx [P ΩXS M] i,j = S 0 otherwise for all i m j n, P T XL M = UU M + MV V UU MV V where U V are the matrices of left right singular vectors of X L Lemma 5 Under the setting of Proposition 4, with k := suppx S, P ΩXS M vec k P ΩXS M vec2 k M vec2 P ΩXS M vec k P ΩXS M vec k M vec P T XL M M 2 2 P T XL M 2 rankx L M 2 2 P T XL M vec2 2 rankx L M 2 2 Proof: The first second claims rely on the fact that suppp ΩXS M suppx S, as well as the fact that P ΩXS is an orthonormal projector with respect to the inner product that induces the vec2 norm For the third claim, note that P T XL M 2 2 UU M I UU MV V M 2 2 The remaining claims use a similar decomposition as the third claim as well as the fact that rankuu M rankx L ranki UU MV V } rankx L Define signx S {, 0, +} m n to be the matrix whose i, jth entry is sign[x S ] i,j, define orthx L := UV, where U V, respectively, are matrices of the left right orthonormal singular vectors of X L corresponding to non-zero singular values The following proposition characterizes the subdifferential sets for the non-smooth norms vec [] Proposition 5 The subdifferential set of X S X S vec is XS X S vec = {G R m n : G vec, P ΩXS G = signx S }; the subdifferential set of X L X L is XL X L = {G R m n : G 2 2, P T XL G = orthx L } The following lemma is a simple consequence of subgradient properties Lemma 6 Fix λ > 0 define the function gx S, X L := λ X S vec + X L Consider any X S, X L in R m n R m n If there exists Q R m n such that: Q is a subgradient of λ X S vec at X S = X S, Q is a subgradient of X L at X L = X L, P Ω XS Q vec λ/c P T XL Q 2 2 /c for some c >, then gx S, X L g X S, X L Q, X S + X L X S X L + /cλ P Ω X S X S vec + /c P T X L X L for all X S, X L R m n R m n Proof: Let Ω := Ω XS, T := T XL, S := X S X S, L : X L X L For any subgradient G XS λ X S vec, we have G Q = P ΩG + P Ω G P ΩQ P Ω Q = P Ω G P Ω Q Therefore λ X S + S vec λ X S vec Q, S sup{ G, S Q, S : G XS λ X S vec } sup{ G Q, S : G XS λ X S vec } = sup{ P Ω G P Ω Q, S : G XS λ X S vec } = sup{ P Ω G P Ω Q, P Ω S : G XS λ X S vec } = sup{ P Ω G, P Ω S P Ω Q, P Ω S : G XS λ X S vec } = λ P Ω S vec P Ω Q, P Ω S λ P Ω S vec P Ω Q vec P Ω S vec λ /c P Ω S vec 9

10 where the second-to-last inequality uses the duality of vec vec Proposition 3 Similarly, X L L X L Q, L /c P T L by noting the duality of 2 2 Combining these gives the desired inequality IV RANK-SPARSITY INCOHERENCE Throughout this section, we fix a target X S, X L R m n R m n, let Ω := Ω X S T := T X L Also let Ū V be, respectively, matrices of the left right singular vectors of X L corresponding to non-zero singular values Recall the following structural properties of X S X L : αρ := sign X S ρ = max{ρ sign X S, ρ sign X S }; βρ := ρ ŪŪ vec + ρ V V vec + Ū 2 V 2 ; γ := orth X L vec = Ū V vec The parameter ρ is a balancing parameter to hle disparity between row column dimensions The quantity αρ is the maximum number of non-zero entries in any single row or column The quantities βρ γ measure the coherence of the singular vectors of X L, that is, the alignment of the singular vectors with the coordinate basis For instance, under the conditions of Proposition, we have with ρ = n/m αρ c mn, β ρ 3c 2 rank X L mn γ c 2 rank X L mn for some constants c c 2 A Operator norms of projection operators We show that under the condition inf ρ>0 αρβρ <, the pair X S, X L is identifiable from its sum X S + X L Theorem This is achieved by proving that the composition of projection operators P Ω P T is a contraction as per Lemma, which in turn implies that Ω T = {0} The following two lemmas bound the projection operators P Ω P T in complementary norms Lemma 7 For any M R m n p {, }, we have P ΩM p p sign X S p p M vec This implies, for all ρ > 0, P Ω vec ρ αρ Proof: Define sx S {0, } m n to be the entrywise absolute value of signx S We have P ΩM p p = max{ P ΩMv p : v p } P ΩM vec max{ sp ΩMv p : v p } M vec max{ s X S v p : v p } = M vec sign X S p p The second part follows from the definitions of ρ αρ Lemma 8 For any M R m n, we have P T M vec ŪŪ vec M + V V vec M + Ū 2 V 2 M 2 2 This implies, for all ρ > 0, P T ρ vec βρ Proof: We have P T M vec = ŪŪ M + M V V ŪŪ M V V vec ŪŪ M vec + M V V vec + ŪŪ M V V vec by the triangle inequality The bounds for each term now follow from the definitions: ŪŪ M vec = max M i ŪŪ e i M max i ŪŪ e i = M ŪŪ vec ; M V V vec = max M V V e j j M max V V e j j = M V V vec ; ŪŪ M V V vec = max e i ŪŪ M V V e j i,j max Ū e i 2 Ū M V 2 2 V e j 2 i,j M 2 2 Ū 2 V 2 M ρ Ū 2 V 2 where the second step follows by Cauchy-Schwarz, the fourth step follows from Lemma 2 The second part now follows the definition of βρ 0

11 Now we show that the composition of P Ω P T gives a contraction under the certain norms their duals Lemma 9 For all ρ > 0, P Ω P T ρ ρ αρβρ; 2 P T P Ω vec vec αρβρ; Proof: Immediate from Lemma 7 Lemma 8 Lemma 0 For all ρ > 0, P T P Ω ρ ρ αρβρ; 2 P Ω P T vec vec αρβρ Proof: First note that P T P Ω = P Ω P T = P Ω P T because P Ω P T are self-adjoint, similarly P Ω P T = P T P Ω Now the claim follows by Proposition 3 Lemma 9, using the facts that ρ is dual to ρ that vec is dual to vec Note that Lemma is encompassed by Lemma 0 Another consequence of these contraction properties is the following uncertainty principle, analogous to one stated by [], which effectively states that a matrix X cannot have both signx ρ orthx vec simultaneously small Theorem 4 If X = X S = X L 0, then inf ρ>0 αρβρ Proof: Note that the non-zero element X lives in Ω T, so we get the conclusion by the contrapositive of Theorem B Dual certificate The incoherence properties allow us to construct an approximate dual certificate Q Ω, Q T Ω T that is central to the analysis of the optimization problems 2 The certificate is constructed as the solution to the linear system { P ΩQ Ω + Q T + µ E = λ sign X S P T Q Ω + Q T + µ E = orth X L for some matrix E R m n ; this can be equivalently written as [ ][ ] [ ] I P Ω Q Ω λ sign XS µ = P ΩE I orth X L µ P T E P T Q T We show the existence of the dual certificate Q Ω, Q T under the conditions 3, 4, 5 relative to an arbitrary matrix E Recall that the recovery guarantees for the constrained formulation requires the conditions with E = 0, while the guarantees for the regularized formulation takes E = Y X S + X L Theorem 5 Pick any c >, ρ > 0, E R m n Let k := supp X S r := rank X L Let ɛ 2 2 := E 2 2 ɛ vec := E vec + P T E vec If the following conditions hold: αρβρ < 3 λ αρβρ c µ ɛ 2 2 c αρ αρµ ɛ vec + αργ αρ λ c γ + µ 2 αρβρɛ vec αρβρ c αρβρ 4 > 0 5 these are a restatement of 3, 4, 5, then Q Ω := I P Ω P T λ sign X S P Ωorth X L µ P Ω P T E Ω Q T := I P T P Ω orth X L λp T sign X S µ P T P Ω E T are well-defined satisfy P ΩQ Ω + Q T + µ E = λ sign X S P T Q Ω + Q T + µ E = orth X L P Ω Q Ω + Q T + µ E vec λ/c P T Q Ω + Q T + µ E 2 2 /c Moreover, αρ Q Ω 2 2 αρβρ λ + γ + µ ɛ vec 2αρ Q T 2 2 αρβρ λ + γ + µ ɛ vec + + 2µ ɛ 2 2 Q T 2 r Q T 2 2 Q T vec αρβρ λ + γ + µ ɛ vec 2 Q Ω vec αρβρ λ + γ + µ ɛ vec Q Ω vec k Q Ω vec Q Ω + Q T 2 vec2 λ Q Ω vec + µ λ ɛ vec + Q T + 2µ ɛ 2 2 Remark The dual certificate constitutes an approximate subgradient in the sense that Q Ω + Q T + µ E

12 is a subgradient of both λ X S vec at X S = X S, X L at X L = X L Proof: Under the condition 3, we have αρβρ <, therefore Lemma 9 Lemma 4 imply that the operators I P Ω P T I P T P Ω are invertible satisfy I P Ω P T ρ ρ αρβρ, I P T P Ω vec vec αρβρ Thus Q Ω Q T are well-defined We can bound Q Ω 2 2 as Q Ω 2 2 Q Ω ρ Lemma 2 = I P Ω P T λ sign X S P Ωorth X L µ P Ω P T E ρ αρβρ λ sign XS P Ωorth X L µ P Ω P T E ρ αρβρ λ sign X S ρ + P Ωorth X L ρ + µ P Ω P T E ρ αρ αρβρ λ + γ + µ P T E vec Lemma 7 αρ αρβρ λ + γ + µ ɛ vec Above, we have used the bound P T E vec = E P T E vec ɛ vec Therefore, P T Q Ω + µ E 2 2 I ŪŪ Q ΩI V V P T E 2 2 µ Q Ω µ ɛ 2 2 αρ αρβρ λ + γ + µ ɛ vec + µ ɛ 2 2 The condition 4 now implies that this quantity is at most /c Now we bound Q T vec as Q T vec = I P T P Ω orth X L λp T sign X S µ P T P Ω E vec αρβρ orth XL λp T sign X S µ P T P Ω E vec αρβρ orth X L vec + λ P T sign X S vec + µ P T P Ω E vec αρβρ γ + λαρβρ + µ ɛ vec Lemma 9 Above, we have used the bound P T P Ω E vec = P T E P T P ΩE vec P T E vec + αρβρ E vec ɛ vec Therefore, P Ω Q T + µ E vec Q T vec + µ P Ω E vec αρβρ γ + λαρβρ + µ ɛ vec + µ ɛ vec The condition 5 now implies that this quantity is at most λ/c We also have Q T 2 2 = P T Q Ω + µ E orth X L 2 2 2αρ αρβρ λ + γ + µ ɛ vec + + 2µ ɛ 2 2 since P T Q Ω Q Ω 2 2 P T E 2 2 2ɛ 2 2 by Lemma 5, Q Ω vec = P ΩQ T + µ E λ sign X S vec αρβρ λ + γ + µ ɛ vec + λ + µ ɛ vec The bounds on Q T Q Ω vec follow from the facts that rankq T 2 r suppq Ω k 2

13 Finally, Q Ω + Q T 2 vec2 = Q Ω, P ΩQ Ω + Q T + Q T, P T Q Ω + Q T = Q Ω, λp Ωsign X S µ P ΩE + Q T, P T orth X L µ P T E λ Q Ω vec + µ λ P ΩE vec + Q T + µ P T E 2 2 λ Q Ω vec + µ λ ɛ vec + Q T + 2µ ɛ 2 2 V ANALYSIS OF CONSTRAINED FORMULATION Throughout this section, we fix a target decomposition X S, X L that satisfies the constraints of, let ˆX S, ˆX L be the optimal solution to Let S := ˆX S X S L := ˆX L X L We show that under the conditions of Theorem 5 with E = 0 recall that this does not mean we assume Y X S X L = 0 appropriately chosen λ, solving accurately recovers the target decomposition X S, X L We decompose the errors into symmetric antisymmetric parts avg := S + L /2 mid := S L /2 The constraints allow us to easily bound avg, so most of the analysis involves bounding mid in terms of avg Lemma avg vec ɛ vec avg ɛ Proof: Since both ˆX S, ˆX L X S, X L are feasible solutions to, we have for {vec, }, avg = /2 S + L = /2 ˆX S + ˆX L Y X S + X L Y ˆX S + ˆX L Y + X S + X L Y /2 ɛ Lemma 2 Assume the conditions of Theorem 5 hold with E = 0 We have λ P Ω mid vec + P T mid /c λ avg vec + avg Proof: Let Q := Q Ω + Q T be the dual certificate guaranteed by Theorem 5 Note that Q satisfies the conditions of Lemma 6, so we have λ X S + mid vec + X L mid λ X S vec X L /c λ P Ω mid vec + P T mid Using the triangle inequality, we have λ ˆX S vec + ˆX L = λ X S + S vec + X L + L = λ X S + mid + avg vec + X L mid + avg λ X S + mid vec λ avg vec + X L mid avg Now using the fact that λ ˆX S vec + ˆX L λ X S vec + X L gives the claim Lemma 3 Let k := supp X S Assume the conditions of Theorem 5 hold with E = 0 We have P Ω mid vec /c αρβρ avg vec + avg /λ Proof: Because mid = P Ω mid + P Ω mid = P T mid + P T mid, we have the equation P Ω mid P T mid = P Ω mid + P T mid Separately applying P Ω P T to both sides gives [ I P Ω I P T ] [ ] P Ω mid = P T mid [ P Ω P T mid P T P Ω mid Under the condition αρβρ <, Lemma 0 Lemma 4 imply that I P Ω P T vec vec that αρβρ P Ω mid = I P Ω P T P Ω P T mid + P Ω P T P Ω mid ] 3

14 Therefore P Ω mid vec αρβρ P Ω P T mid vec + P Ω P T P Ω mid vec αρβρ k P T mid vec2 + αρβρ P Ω mid vec Lemma 0 αρβρ k P T mid + αρβρ P Ω mid vec /c αρβρ max{ k, } αρβρ/λ λ avg vec + avg Lemma 2 /c αρβρ avg vec + avg /λ where the last inequality uses the facts k αρ 2, αρβρ <, λαρ this last inequality uses the condition in 4 We now prove Theorem 2, which we restate here for convenience Theorem 6 Theorem 2 restated Assume the conditions of Theorem 5 hold with E = 0 We have by Lemma 2 Lemma 3 The bounds on S vec L vec follow from the bounds on mid vec, avg vec, avg from Lemma If the constraint X L vec b is added, then we can use the facts L vec ˆX L vec + X L vec 2b L vec2 L vec L vec 2b L vec If ˆXL is post-processed, then letting clip ˆX L be the result of the post-processing for all i, j, so [ X L ] i,j [ X L ] i,j [ ˆX L ] i,j [ X L ] i,j X L X L vec L vec X L X L vec2 2b X L X L vec 2b L vec max{ S vec, L vec } + /c 2 αρβρ ɛ vec αρβρ + /c 2 αρβρ αρβρ ɛ /λ If, in addition for some b X L vec, either: the optimization problem is augmented with the constraint X L vec b letting X L := ˆX L, or ˆXL is post-processed by replacing [ ˆX L ] i,j with [ X L ] i,j := min{max{[ ˆX L ] i,j, b}, b} for all i, j, then we also have X L X L vec2 min { L vec, 2b L vec } Proof: First note that since S = avg + mid L = avg mid, we have max{ S vec, L vec } avg vec + mid vec We can bound mid vec as mid vec P Ω mid vec + P Ω mid vec /c + αρβρ avg vec + avg /λ VI ANALYSIS OF REGULARIZED FORMULATION Throughout this section, we fix a target decomposition X S, X L that satisfies X S Y vec b, let ˆX S, ˆX L be the optimal solution to 2 augmented with the constraint X S Y vec b for some b X S Y vec b = is allowed Let S := ˆX S X S L := ˆX L X L We show that under the conditions of Theorem 5 with E = Y X S + X L appropriately chosen λ µ, solving 2 accurately recovers the target decomposition X S, X L Lemma 4 There exists G S, G L, H S R m n such that µ ˆX S + ˆX L Y + λg S + H S = 0; G S vec ; 2 µ ˆX S + ˆX L Y + G L = 0; G L 2 2 ; 3 [H S ] i,j [ S ] i,j 0 i, j Proof: We express the constraint X S Y vec b in 2 as 2mn constraints [X S ] i,j Y i,j b 0 [X S ] i,j + Y i,j b 0 for all i, j Now the corresponding Lagrangian is 2µ X S + X L Y 2 vec2 + λ X S vec + X L + Λ +, X S Y b m,n + Λ, X S + Y b m,n 4

15 where Λ +, Λ 0 m,n is the all-ones m n matrix First-order optimality conditions imply that there exists a subgradient G S of X S vec at X S = ˆX S a subgradient G L of X L at X L = ˆX L such that µ ˆX S + ˆX L Y + λg S + Λ + Λ = 0 µ ˆX S + ˆX L Y + G L = 0 hold with E = Y X S + X L, let Q Ω, Q T be the dual certificate from the conclusion We have αρβρ S vec λ /c Q Ω + Q T 2 vec2 µ + λ kµ + 2 k rµ + k P Ω P T E vec, Now since X S Y vec b, we have [ X S ] i,j Y i,j + b [ X S ] i,j Y i,j + b By complementary slackness, if Λ + i,j > 0, then [ ˆX S ] i,j Y i,j b = 0, which means [ ˆX S ] i,j [ X S ] i,j [ ˆX S ] i,j Y i,j + b = 0 So Λ + i,j [ S] i,j 0 Similarly, if Λ i,j > 0, then [ ˆX S ] i,j [ X S ] i,j 0 So Λ i,j [ S] i,j 0 Therefore H := Λ + Λ satisfies H i,j [ S ] i,j 0 Lemma 5 Assume the conditions of Theorem 5 hold with E = Y X S + X L, let Q Ω, Q T be the dual certificate from the conclusion We have λ P Ω S vec + P T L /c Q Ω + Q T 2 vec2 µ/2 Proof: Let Q := Q Ω + Q T := S + L Since Q + µ E satisfies the conditions of Lemma 6, /c λ P Ω S vec + P T L λ ˆX S vec + ˆX L λ X S vec + X L Q + µ E, S + L Furthermore, by the optimality of ˆX S, ˆX L, λ ˆX S vec + ˆX L λ X S vec + X L 2µ X S + X L Y 2 vec2 2µ ˆX S + ˆX L Y 2 vec2 = 2µ E 2 vec2 2µ S + L E 2 vec2 = 2 E,, 2µ Combining the inequalities gives /c λ P Ω S vec + P T L Q, 2µ, Q 2 vec2 µ/2 where the last inequality follows by taking the maximum value over at = µq Now we prove Theorem 3, restated below with an additional result for L ρ Theorem 7 Theorem 3 restated Let k := supp X S r := rank X L Assume the conditions of Theorem 5 { S vec2 min S vec, 2b S vec }, L ρ /c Q Ω + Q T 2 vec2 µ/2 + min {βρ S vec, } 2 r S vec2 + P T E + 2 rµ, L /c Q Ω + Q T 2 vec2 µ/2 + 2 r S vec2 + P T E + 2 rµ Proof: From Lemma 4, we obtain G S, G L, H S R m n the following equations: λp ΩG S = µ P Ω S + P Ω L P ΩE P ΩH S 6 P T G L = µ P T S + P T L P T E 7 P Ω P T G L = µ P Ω P T S + P Ω P T L P Ω P T E 8 Subtracting 8 from 6 gives µ P Ω S P Ω P T P Ω S P Ω P T P Ω S + P Ω P T L + P ΩH S = λp ΩG S + P Ω P T G L + µ P Ω P T E Moreover, we have sign S, P Ω S = P Ω S vec sign S, P ΩH S = P ΩH S vec, so taking inner products with sign S on both sides of the equation gives the 5

16 following chain of inequalities: µ P Ω S vec + P ΩH S vec µ P Ω P T P Ω S vec + µ P Ω P T P Ω S vec + µ P Ω P T L vec + λ P ΩG S vec + P Ω P T G L vec + µ P Ω P T E vec µ αρβρ P Ω S vec + µ αρβρ P Ω S vec + λ k + µ k P T L vec2 + k P T G L vec2 + µ k P Ω P T E vec µ αρβρ P Ω S vec + µ αρβρ P Ω S vec + µ k P T L vec2 + λ k + 2 k r GL µ k P Ω P T E vec µ αρβρ P Ω S vec + µ αρβρ P Ω S vec + µ k P T L vec2 + λ k + 2 k r + µ k P Ω P T E vec The second third inequalities above follow from Lemma 5 Lemma 0, the fourth inequality uses the fact that G L 2 2 Rearranging the inequality applying Lemma 5 gives αρβρ P Ω S vec αρβρ P Ω S vec + k P T L vec2 + λ kµ + 2 k rµ + k P Ω P T E vec max { αρβρ/λ, k} /c Q Ω + Q T 2 vec2 µ/2 + λ kµ + 2 k rµ + k P Ω P T E vec λ /c Q Ω + Q T 2 vec2 µ/2 + λ kµ + 2 k rµ + k P Ω P T E vec since k αρ 2, αρβρ <, λαρ Now we combine this with S vec P Ω S vec + P Ω S vec Lemma 5 to get the first bound For the second bound, we use the facts S vec ˆX S Y vec + X S Y vec 2b S vec2 S vec S vec 2b S vec For the third fourth bounds, we obtain from 7 P T L ρ P T S ρ + P T E ρ + µ P T G L ρ P T vec ρ S vec + P T E + µ P T G L Lemma 3 = P T ρ vec S vec + P T E + µ P T G L Proposition 3 βρ S vec + P T E + µ P T G L Lemma 8 βρ S vec + P T E + 2 rµ Lemma 5 G L 2 2 P T L P T S + P T E + µ P T G L 2 r S vec2 + P T E + 2 rµ Lemma 5 G L 2 2 Now we combine these with L ρ P T L ρ + P T L ρ P T L + min{ P T L, P T L ρ } Lemma 3 L P T L + P T L Lemma 5 Note that we have an error bound for L in ρ norm, which can be significantly smaller than the bound for the trace norm of L ACKNOWLEDGMENT We thank Emmanuel Cès for clarifications about the results in [3] REFERENCES [] V Chrasekaran, S Sanghavi, P A Parrilo, A S Willsky, Rank-Sparsity Incoherence for Matrix Decomposition, ArXiv e- prints, Jun 2009 [2] V Chrasekaran, P A Parrilo, A S Willsky, Latent Variable Graphical Model Selection via Convex Optimization, ArXiv e-prints, Aug 200 [3] E J Cès, X Li, Y Ma, J Wright, Robust Principal Component Analysis? ArXiv e-prints, Dec 2009 [4] R Tibshirani, Regression shrinkage selection via the lasso, J Royal Statist Soc B, vol 58, no, pp , 996 [5] M Fazel, Matrix rank minimization with applications, PhD dissertation, Department of Electrical Engineering, Stanford University, 2002 [6] E Cès R Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, vol 9, pp ,

[7] D. Gross, Recovering low-rank matrices from few coefficients in any basis, ArXiv e-prints, Oct. 2009.
[8] Z. Zhou, X. Li, J. Wright, E. J. Candès, and Y. Ma, Stable principal component pursuit, in Proceedings of the International Symposium on Information Theory, 2010.
[9] A. Ganesh, J. Wright, X. Li, E. J. Candès, and Y. Ma, Dense error correction for low-rank matrices via principal component pursuit, in Proceedings of the International Symposium on Information Theory, 2010.
[10] K. R. Davidson and S. J. Szarek, Local operator theory, random matrices and Banach spaces, in Handbook of the Geometry of Banach Spaces, Vol. I, 2001.
[11] G. A. Watson, Characterization of the subdifferential of some matrix norms, Linear Algebra and its Applications, vol. 170, 1992.

Daniel Hsu received a B.S. in Computer Science and Engineering from the University of California, Berkeley, in 2004, and a Ph.D. in Computer Science from the University of California, San Diego, in 2010. From 2010 to 2011, he was a postdoctoral scholar at Rutgers University and a visiting scholar at the Wharton School of the University of Pennsylvania. He is currently a postdoctoral researcher at Microsoft Research New England. His research interests are in algorithmic statistics and machine learning.

Sham M. Kakade is currently an associate professor of statistics at the Wharton School of the University of Pennsylvania and a Senior Researcher at Microsoft Research New England. He received his B.A. in Physics from the California Institute of Technology and his Ph.D. from the Gatsby Computational Neuroscience Unit, affiliated with University College London. He spent the following two years as a Postdoctoral Researcher at the Department of Computer and Information Science at the University of Pennsylvania. Subsequently, he joined the Toyota Technological Institute, where he was an assistant professor for four years. His research focuses on artificial intelligence and machine learning, and their connections to other areas such as game theory and economics.

Tong Zhang received a B.A. in Mathematics and Computer Science from Cornell University in 1994 and a Ph.D. in Computer Science from Stanford University in 1998. After graduation, he worked at the IBM T.J. Watson Research Center in Yorktown Heights, New York, and at Yahoo Research in New York City. He is currently a professor of statistics at Rutgers University. His research interests include machine learning, algorithms for statistical computation, their mathematical analysis, and applications.


More information

Optimisation Combinatoire et Convexe.

Optimisation Combinatoire et Convexe. Optimisation Combinatoire et Convexe. Low complexity models, l 1 penalties. A. d Aspremont. M1 ENS. 1/36 Today Sparsity, low complexity models. l 1 -recovery results: three approaches. Extensions: matrix

More information

Sparse representation classification and positive L1 minimization

Sparse representation classification and positive L1 minimization Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng

More information

A Randomized Algorithm for the Approximation of Matrices

A Randomized Algorithm for the Approximation of Matrices A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive

More information

Tighter Low-rank Approximation via Sampling the Leveraged Element

Tighter Low-rank Approximation via Sampling the Leveraged Element Tighter Low-rank Approximation via Sampling the Leveraged Element Srinadh Bhojanapalli The University of Texas at Austin bsrinadh@utexas.edu Prateek Jain Microsoft Research, India prajain@microsoft.com

More information

Information-Theoretic Limits of Matrix Completion

Information-Theoretic Limits of Matrix Completion Information-Theoretic Limits of Matrix Completion Erwin Riegler, David Stotz, and Helmut Bölcskei Dept. IT & EE, ETH Zurich, Switzerland Email: {eriegler, dstotz, boelcskei}@nari.ee.ethz.ch Abstract We

More information

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

Breaking the Limits of Subspace Inference

Breaking the Limits of Subspace Inference Breaking the Limits of Subspace Inference Claudia R. Solís-Lemus, Daniel L. Pimentel-Alarcón Emory University, Georgia State University Abstract Inferring low-dimensional subspaces that describe high-dimensional,

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY Uplink Downlink Duality Via Minimax Duality. Wei Yu, Member, IEEE (1) (2)

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY Uplink Downlink Duality Via Minimax Duality. Wei Yu, Member, IEEE (1) (2) IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY 2006 361 Uplink Downlink Duality Via Minimax Duality Wei Yu, Member, IEEE Abstract The sum capacity of a Gaussian vector broadcast channel

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

Part 1a: Inner product, Orthogonality, Vector/Matrix norm

Part 1a: Inner product, Orthogonality, Vector/Matrix norm Part 1a: Inner product, Orthogonality, Vector/Matrix norm September 19, 2018 Numerical Linear Algebra Part 1a September 19, 2018 1 / 16 1. Inner product on a linear space V over the number field F A map,

More information

Mathematical foundations - linear algebra

Mathematical foundations - linear algebra Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Review of Linear Algebra Denis Helic KTI, TU Graz Oct 9, 2014 Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 1 / 74 Big picture: KDDM Probability Theory

More information

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Emmanuel Candes and Prof. Wotao Yin

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016 Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use

More information

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison Raghunandan H. Keshavan, Andrea Montanari and Sewoong Oh Electrical Engineering and Statistics Department Stanford University,

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

Robust PCA via Outlier Pursuit

Robust PCA via Outlier Pursuit 1 Robust PCA via Outlier Pursuit Huan Xu, Constantine Caramanis, Member, and Sujay Sanghavi, Member Abstract Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

The Convex Geometry of Linear Inverse Problems

The Convex Geometry of Linear Inverse Problems Found Comput Math 2012) 12:805 849 DOI 10.1007/s10208-012-9135-7 The Convex Geometry of Linear Inverse Problems Venkat Chandrasekaran Benjamin Recht Pablo A. Parrilo Alan S. Willsky Received: 2 December

More information

IV. Matrix Approximation using Least-Squares

IV. Matrix Approximation using Least-Squares IV. Matrix Approximation using Least-Squares The SVD and Matrix Approximation We begin with the following fundamental question. Let A be an M N matrix with rank R. What is the closest matrix to A that

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Low-Rank Matrix Recovery

Low-Rank Matrix Recovery ELE 538B: Mathematics of High-Dimensional Data Low-Rank Matrix Recovery Yuxin Chen Princeton University, Fall 2018 Outline Motivation Problem setup Nuclear norm minimization RIP and low-rank matrix recovery

More information

INDUSTRIAL MATHEMATICS INSTITUTE. B.S. Kashin and V.N. Temlyakov. IMI Preprint Series. Department of Mathematics University of South Carolina

INDUSTRIAL MATHEMATICS INSTITUTE. B.S. Kashin and V.N. Temlyakov. IMI Preprint Series. Department of Mathematics University of South Carolina INDUSTRIAL MATHEMATICS INSTITUTE 2007:08 A remark on compressed sensing B.S. Kashin and V.N. Temlyakov IMI Preprint Series Department of Mathematics University of South Carolina A remark on compressed

More information

j=1 [We will show that the triangle inequality holds for each p-norm in Chapter 3 Section 6.] The 1-norm is A F = tr(a H A).

j=1 [We will show that the triangle inequality holds for each p-norm in Chapter 3 Section 6.] The 1-norm is A F = tr(a H A). Math 344 Lecture #19 3.5 Normed Linear Spaces Definition 3.5.1. A seminorm on a vector space V over F is a map : V R that for all x, y V and for all α F satisfies (i) x 0 (positivity), (ii) αx = α x (scale

More information

Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global

Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global homas Laurent * 1 James H. von Brecht * 2 Abstract We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 12 Luca Trevisan October 3, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 12 Luca Trevisan October 3, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analysis Handout 1 Luca Trevisan October 3, 017 Scribed by Maxim Rabinovich Lecture 1 In which we begin to prove that the SDP relaxation exactly recovers communities

More information

Mathematics Department Stanford University Math 61CM/DM Inner products

Mathematics Department Stanford University Math 61CM/DM Inner products Mathematics Department Stanford University Math 61CM/DM Inner products Recall the definition of an inner product space; see Appendix A.8 of the textbook. Definition 1 An inner product space V is a vector

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 2

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 2 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 2 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory April 5, 2012 Andre Tkacenko

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

More information

A Counterexample for the Validity of Using Nuclear Norm as a Convex Surrogate of Rank

A Counterexample for the Validity of Using Nuclear Norm as a Convex Surrogate of Rank A Counterexample for the Validity of Using Nuclear Norm as a Convex Surrogate of Rank Hongyang Zhang, Zhouchen Lin, and Chao Zhang Key Lab. of Machine Perception (MOE), School of EECS Peking University,

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

NORMS ON SPACE OF MATRICES

NORMS ON SPACE OF MATRICES NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,

More information

Normed & Inner Product Vector Spaces

Normed & Inner Product Vector Spaces Normed & Inner Product Vector Spaces ECE 174 Introduction to Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 27 Normed

More information

Necessary and Sufficient Conditions of Solution Uniqueness in 1-Norm Minimization

Necessary and Sufficient Conditions of Solution Uniqueness in 1-Norm Minimization Noname manuscript No. (will be inserted by the editor) Necessary and Sufficient Conditions of Solution Uniqueness in 1-Norm Minimization Hui Zhang Wotao Yin Lizhi Cheng Received: / Accepted: Abstract This

More information

High-dimensional Joint Sparsity Random Effects Model for Multi-task Learning

High-dimensional Joint Sparsity Random Effects Model for Multi-task Learning High-dimensional Joint Sparsity Random Effects Model for Multi-task Learning Krishnakumar Balasubramanian Georgia Institute of Technology krishnakumar3@gatech.edu Kai Yu Baidu Inc. yukai@baidu.com Tong

More information

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May University of Luxembourg Master in Mathematics Student project Compressed sensing Author: Lucien May Supervisor: Prof. I. Nourdin Winter semester 2014 1 Introduction Let us consider an s-sparse vector

More information

Jianhua Z. Huang, Haipeng Shen, Andreas Buja

Jianhua Z. Huang, Haipeng Shen, Andreas Buja Several Flawed Approaches to Penalized SVDs A supplementary note to The analysis of two-way functional data using two-way regularized singular value decompositions Jianhua Z. Huang, Haipeng Shen, Andreas

More information

Recovery of Simultaneously Structured Models using Convex Optimization

Recovery of Simultaneously Structured Models using Convex Optimization Recovery of Simultaneously Structured Models using Convex Optimization Maryam Fazel University of Washington Joint work with: Amin Jalali (UW), Samet Oymak and Babak Hassibi (Caltech) Yonina Eldar (Technion)

More information

Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization

Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization Lingchen Kong and Naihua Xiu Department of Applied Mathematics, Beijing Jiaotong University, Beijing, 100044, People s Republic of China E-mail:

More information

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3

MIT Algebraic techniques and semidefinite optimization February 14, Lecture 3 MI 6.97 Algebraic techniques and semidefinite optimization February 4, 6 Lecture 3 Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo In this lecture, we will discuss one of the most important applications

More information

Supplemental for Spectral Algorithm For Latent Tree Graphical Models

Supplemental for Spectral Algorithm For Latent Tree Graphical Models Supplemental for Spectral Algorithm For Latent Tree Graphical Models Ankur P. Parikh, Le Song, Eric P. Xing The supplemental contains 3 main things. 1. The first is network plots of the latent variable

More information

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Yin Zhang Technical Report TR05-06 Department of Computational and Applied Mathematics Rice University,

More information

Contents. 0.1 Notation... 3

Contents. 0.1 Notation... 3 Contents 0.1 Notation........................................ 3 1 A Short Course on Frame Theory 4 1.1 Examples of Signal Expansions............................ 4 1.2 Signal Expansions in Finite-Dimensional

More information

Lecture 9: Low Rank Approximation

Lecture 9: Low Rank Approximation CSE 521: Design and Analysis of Algorithms I Fall 2018 Lecture 9: Low Rank Approximation Lecturer: Shayan Oveis Gharan February 8th Scribe: Jun Qi Disclaimer: These notes have not been subjected to the

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

Sparse Solutions of an Undetermined Linear System

Sparse Solutions of an Undetermined Linear System 1 Sparse Solutions of an Undetermined Linear System Maddullah Almerdasy New York University Tandon School of Engineering arxiv:1702.07096v1 [math.oc] 23 Feb 2017 Abstract This work proposes a research

More information

Robust PCA via Outlier Pursuit

Robust PCA via Outlier Pursuit Robust PCA via Outlier Pursuit Huan Xu Electrical and Computer Engineering University of Texas at Austin huan.xu@mail.utexas.edu Constantine Caramanis Electrical and Computer Engineering University of

More information

Universal low-rank matrix recovery from Pauli measurements

Universal low-rank matrix recovery from Pauli measurements Universal low-rank matrix recovery from Pauli measurements Yi-Kai Liu Applied and Computational Mathematics Division National Institute of Standards and Technology Gaithersburg, MD, USA yi-kai.liu@nist.gov

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

arxiv: v2 [stat.ml] 1 Jul 2013

arxiv: v2 [stat.ml] 1 Jul 2013 A Counterexample for the Validity of Using Nuclear Norm as a Convex Surrogate of Rank Hongyang Zhang, Zhouchen Lin, and Chao Zhang Key Lab. of Machine Perception (MOE), School of EECS Peking University,

More information

15-780: LinearProgramming

15-780: LinearProgramming 15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear

More information

DS-GA 1002 Lecture notes 10 November 23, Linear models

DS-GA 1002 Lecture notes 10 November 23, Linear models DS-GA 2 Lecture notes November 23, 2 Linear functions Linear models A linear model encodes the assumption that two quantities are linearly related. Mathematically, this is characterized using linear functions.

More information

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora Scribe: Today we continue the

More information