Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization


John Wright, Arvind Ganesh, Shankar Rao, and Yi Ma
Department of Electrical Engineering, University of Illinois at Urbana-Champaign
Visual Computing Group, Microsoft Research Asia

Abstract. Principal component analysis is a fundamental operation in computational data analysis, with myriad applications ranging from web search to bioinformatics to computer vision and image analysis. However, its performance and applicability in real scenarios are limited by a lack of robustness to outlying or corrupted observations. This paper considers the idealized robust principal component analysis problem of recovering a low-rank matrix A from corrupted observations D = A + E. Here, the error entries E can be arbitrarily large (modeling grossly corrupted observations common in visual and bioinformatic data), but are assumed to be sparse. We prove that most matrices A can be efficiently and exactly recovered from most error sign-and-support patterns, by solving a simple convex program. Our result holds even when the rank of A grows nearly proportionally (up to a logarithmic factor) to the dimensionality of the observation space and the number of errors in E grows in proportion to the total number of entries in the matrix. A by-product of our analysis is the first proportional growth results for the related but somewhat easier problem of completing a low-rank matrix from a small fraction of its entries. We propose an algorithm based on iterative thresholding that, for large matrices, is significantly faster and more scalable than general-purpose solvers. We give simulations and real-data examples corroborating the theoretical results.

Corresponding author: John Wright, Room 146 Coordinated Science Laboratory, 1308 West Main Street, Urbana, IL. Email: jnwright@uiuc.edu

1 Introduction

The problem of finding and exploiting low-dimensional structure in high-dimensional data is taking on increasing importance in image, audio and video processing, web search, and bioinformatics, where datasets now routinely lie in thousand- or even million-dimensional observation spaces. The curse of dimensionality is in full play here: meaningful inference with a limited number of observations requires some assumption that the data have low intrinsic complexity, e.g., that they are low-rank [15], sparse in some basis [10], or lie on some low-dimensional manifold [7, ]. Perhaps the simplest useful assumption is that the observations all lie near some low-dimensional subspace. In other words, if we stack all the observations as column vectors of a matrix M ∈ R^{m×n}, the matrix should be (approximately) low rank. Principal component analysis (PCA) [15, 20] seeks the best (in an ℓ²-sense) such low-rank representation of the given data matrix. It enjoys a number of optimality properties when the data are only mildly corrupted by small noise, and can be stably and efficiently computed via the singular value decomposition.

One major shortcoming of classical PCA is its brittleness with respect to grossly corrupted or outlying observations [20]. Gross errors are ubiquitous in modern applications in imaging and bioinformatics, where some measurements may be arbitrarily corrupted (e.g., due to occlusion or sensor failure) or simply irrelevant to the structure we are trying to identify. A number of natural approaches to robustifying PCA have been explored in the literature, including influence function techniques [19, 8], multivariate trimming [18], alternating minimization [1], and random sampling techniques [16]. Unfortunately, none of these existing approaches yields a polynomial-time algorithm with strong performance guarantees.¹

In this paper, we consider an idealization of the robust PCA problem, in which the goal is to recover a low-rank matrix A from highly corrupted measurements D = A + E.
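The brittleness of classical PCA under gross corruption is easy to reproduce numerically. The following sketch is not from the paper; the matrix sizes, corruption fraction, and error magnitude are illustrative choices. It compares the best rank-r approximation (truncated SVD) of a clean and a sparsely corrupted data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 200, 5

# Low-rank data matrix A0 and a sparse matrix of gross errors E
A0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))
support = rng.random((m, m)) < 0.01            # ~1% of entries corrupted
E = 100.0 * support * rng.choice([-1.0, 1.0], size=(m, m))
D = A0 + E

def pca_rank(M, r):
    """Best rank-r approximation in the least-squares sense (classical PCA)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

clean_err = np.linalg.norm(pca_rank(A0, r) - A0) / np.linalg.norm(A0)
corrupt_err = np.linalg.norm(pca_rank(D, r) - A0) / np.linalg.norm(A0)
print(f"rank-{r} PCA relative error, clean data:     {clean_err:.2e}")
print(f"rank-{r} PCA relative error, corrupted data: {corrupt_err:.2e}")
```

On this instance the clean reconstruction is exact to machine precision, while even ~1% gross corruption leaves a large relative error, illustrating the failure mode the paper addresses.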
The errors E can be arbitrary in magnitude, but are assumed to be sparsely supported, affecting only a fraction of the entries of D. This should be contrasted with the classical setting, in which the matrix A is perturbed by small (but densely supported) noise. In that setting, classical PCA, computed via the singular value decomposition, remains optimal if the noise is Gaussian. Here, on the other hand, even a small fraction of large errors can cause arbitrary corruption in PCA's estimate of the low-rank structure, A.

Our approach to robust PCA is motivated by two recent, and tightly related, lines of research. The first set of results concerns the robust solution of over-determined linear systems of equations in the presence of arbitrary, but sparse, errors. These results imply that for generic systems of equations, it is possible to correct a constant fraction of arbitrary errors in polynomial time [7]. This is achieved by employing the ℓ¹-norm as a convex surrogate for the highly non-convex ℓ⁰-norm. A parallel (and still emerging) line of work concerns the problem of computing low-rank matrix solutions to underdetermined linear equations [6, 26]. One of the most striking results concerns the exact completion of low-rank matrices from only a small fraction of their entries [6, 8, 5]. There, a similar convex relaxation is employed, replacing the highly non-convex matrix rank with the nuclear norm (or sum of singular values). The robust PCA problem outlined above combines aspects of both of these lines of work: we wish

¹ Random sampling approaches guarantee near-optimal estimates, but have complexity exponential in the rank of the matrix A_0. Trimming algorithms have comparatively lower computational complexity, but guarantee only locally optimal solutions.
² A major difference between robust PCA and low-rank matrix completion is that here we do not know which entries are corrupted, whereas in matrix completion the support of the missing entries is given.

to recover a low-rank matrix from large but sparse errors. We will show that combining the solutions to the above problems (nuclear norm minimization for low-rank recovery and ℓ¹-minimization for error correction) yields a polynomial-time algorithm for robust PCA that provably succeeds under broad conditions:

With high probability, solving a simple convex program perfectly recovers a generic matrix A ∈ R^{m×m} of rank as large as C m / log(m), from errors affecting a constant fraction of the entries.

Notice that the conclusion holds with high probability only when the dimensionality m is high enough. Thus, we will not be able to truly witness the striking ability of the convex program in recovering the low-rank and sparse structures from the data unless m is large enough. This phenomenon has been dubbed the "blessing of dimensionality" [1]. Thus, we need scalable algorithms to solve the associated convex program. To this end, we show how a near-solution to this convex program can be obtained relatively efficiently via iterative thresholding techniques, similar to those proposed for matrix completion [4]. For large matrices, this algorithm is significantly faster and more scalable than general-purpose convex program solvers.

Finally, our proof implies strong results for the low-rank matrix completion problem, including the first results applicable to the proportional growth setting, where the rank of the matrix grows as a constant (non-vanishing) fraction of the dimensionality:

With overwhelming probability, solving a simple convex program perfectly recovers a generic matrix A ∈ R^{m×m} of rank as large as C m, from observations consisting of only a fraction ρ (ρ < 1) of its entries.

1.1 Organization of the paper

This paper is organized as follows. Section 2 formulates the robust principal component analysis problem more precisely and states the main results of this paper, placing these results in the context of existing work.
Section 3 begins the proof of the main result by providing sufficient conditions for a desired pair (A, E) to be recoverable by convex programming. These conditions will be stated in terms of the existence of a dual vector satisfying certain constraints. Section 4 then shows that with high probability such a certifying dual vector indeed exists. This establishes our main result. Section 5 explores the implications of this analysis for the related problem of completing a low-rank matrix from an observation consisting of only a small subset of its entries. Section 6 gives a simple but fast algorithm for approximately solving the robust PCA problem that extends existing iterative thresholding techniques, and is more efficient and scalable than generic interior-point methods. In Section 7, we perform simulations and experiments corroborating the theoretical results and suggesting their applicability to real-world problems in computer vision. Finally, in Section 8, we outline several promising directions for future work.

2 Problem Setting and Main Results

We assume that the observed data matrix D ∈ R^{m×n} was generated from a low-rank matrix A ∈ R^{m×n}, by corrupting some of its entries. The corruption can be represented as an additive error E ∈ R^{m×n}, so that D = A + E. Because the error affects only a portion of the entries of D, E is a sparse matrix. The idealized (or noise-free) robust PCA problem can then be formulated as follows:

Problem 2.1 (Robust PCA). Given D = A + E, where A and E are unknown, but A is known to be low rank and E is known to be sparse, recover A.

This problem formulation immediately suggests a conceptual solution: seek the lowest-rank A that could have generated the data, subject to the constraint that the errors are sparse: ||E||_0 ≤ k. The Lagrangian reformulation of this optimization problem is

min_{A,E} rank(A) + γ ||E||_0 subj A + E = D. (1)

If we could solve this problem for appropriate γ, we might hope to exactly recover the pair (A_0, E_0) that generated the data D. Unfortunately, (1) is a highly nonconvex optimization problem, and no efficient solution is known.³ We can obtain a tractable optimization problem by relaxing (1), replacing the ℓ⁰-norm with the ℓ¹-norm, and the rank with the nuclear norm ||A||_* = sum_i σ_i(A), yielding the following convex surrogate:

min_{A,E} ||A||_* + λ ||E||_1 subj A + E = D. (2)

This relaxation can be motivated by observing that ||A||_* + λ ||E||_1 is the convex envelope of rank(A) + λ ||E||_0 over the set of (A, E) such that ||A||_{2,2} + ||E||_{1,∞} ≤ 1. Moreover, recent advances in our understanding of the nuclear norm heuristic for low-rank solutions to matrix equations [6, 26] and the ℓ¹ heuristic for sparse solutions to underdetermined linear systems [7, 13] suggest that there might be circumstances under which solving the tractable problem (2) perfectly recovers the low-rank matrix A_0. The main result of this paper will be to show that this is indeed true under surprisingly broad conditions. A sketch of the result is as follows:

For almost all pairs (A_0, E_0) consisting of a low-rank matrix A_0 and a sparse matrix E_0,

(A_0, E_0) = arg min_{A,E} ||A||_* + λ ||E||_1 subj A + E = A_0 + E_0,

and the minimizer is uniquely defined.

That is, under natural probabilistic models for low-rank and sparse matrices, almost all observations D = A_0 + E_0 generated as the sum of a low-rank matrix A_0 and a sparse matrix E_0 can be efficiently and exactly decomposed into their generating parts by solving a convex program.⁴
Of course, this is only possible with an appropriate choice of the regularizing parameter λ > 0. From the optimality conditions for the convex program (2), it is not difficult to show that for matrices D ∈ R^{m×m}, the correct scaling is λ = O(1/√m). Throughout this paper, unless otherwise stated, we will fix

λ = 1/√m. (3)

³ In a sense, this problem subsumes both the low-rank matrix completion problem and the ℓ⁰-minimization problem, both of which are NP-hard and hard to approximate. While we are not aware of any formal hardness result for (1), it is reasonable to conjecture that it is similarly difficult.

⁴ Notice that this is not an equivalence result for the intractable optimization (1) and the convex program (2): rather than asserting that the solutions of these two problems are equal with high probability, we directly prove that the convex program correctly decomposes D = A_0 + E_0 into (A_0, E_0). A natural conjecture, however, is that under the conditions of our main result, Theorem 2.4, (A_0, E_0) is also the solution to (1) for some choice of γ.
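For intuition, the convex program (2) can be approximately solved with generic operator-splitting methods. The sketch below is an ADMM-style iteration combining singular value thresholding (for the nuclear norm term) with entrywise soft thresholding (for the ℓ¹ term). It is not the iterative thresholding algorithm of Section 6, and the penalty-parameter heuristic `mu` is an assumption made here for illustration:

```python
import numpy as np

def soft(X, t):
    """Entrywise soft thresholding: shrink each entry toward zero by t."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def rpca(D, lam=None, mu=None, iters=500):
    """ADMM-style sketch for  min ||A||_* + lam ||E||_1  subj  A + E = D.

    lam defaults to the 1/sqrt(m) scaling of (3); mu is a penalty parameter
    (the heuristic below is an assumption, not taken from the paper).
    """
    m = D.shape[0]
    if lam is None:
        lam = 1.0 / np.sqrt(m)
    if mu is None:
        mu = 0.25 * D.size / (np.abs(D).sum() + 1e-12)
    E = np.zeros_like(D)
    Y = np.zeros_like(D)                  # dual variable for A + E = D
    for _ in range(iters):
        # A-step: singular value thresholding of D - E + Y/mu
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = (U * soft(s, 1.0 / mu)) @ Vt
        # E-step: entrywise soft thresholding
        E = soft(D - A + Y / mu, lam / mu)
        # dual ascent on the equality constraint
        Y += mu * (D - A - E)
    return A, E
```

On a synthetic D = A_0 + E_0 with small rank and error fraction, a few hundred iterations of this scheme typically recover A_0 to high accuracy, consistent with Theorem 2.4's prediction that the minimizer of (2) is exactly (A_0, E_0).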

For simplicity, all of our results in this paper will be stated for square matrices D ∈ R^{m×m}, although there is little difficulty in extending them to non-square matrices.

It should be clear that not all matrices A_0 can be successfully recovered by solving the convex program (2). Consider, e.g., the rank-1 case where U = [e_i] and V = [e_j]. Without additional prior knowledge, the low-rank matrix A = USV^* cannot be recovered from even a single gross error. We therefore restrict our attention to matrices A_0 whose row and column spaces are not aligned with the standard basis. This can be done probabilistically, by asserting that the marginal distributions of U and V are uniform on the Stiefel manifold W^m_r. In [6], this was called the random orthogonal model.

Definition 2.2 (Random orthogonal model [6]). We consider a matrix A_0 to be distributed according to the random orthogonal model of rank r if its left and right singular vectors are independent uniformly distributed m × r matrices with orthonormal columns.⁵ In this model, the nonzero singular values of A_0 can be arbitrary.

Our model for errors is similarly natural: each entry of the matrix is independently corrupted with some probability ρ_s, and the signs of the corruptions are independent Rademacher random variables.

Definition 2.3 (Bernoulli error signs and support). We consider an error matrix E_0 to be drawn from the Bernoulli sign and support model with parameter ρ_s if the entries of sign(E_0) are independently distributed, each taking on value 0 with probability 1 − ρ_s, and ±1 with probability ρ_s/2 each. In this model, the magnitude of the nonzero entries in E_0 can be arbitrary.

Our main result is the following:

Theorem 2.4 (Robust recovery from non-vanishing error fractions).
For any p > 0, there exist constants (C_0 > 0, ρ_s* > 0, m_0) with the following property: if m > m_0 and (A_0, E_0) ∈ R^{m×m} × R^{m×m}, with the singular spaces of A_0 distributed according to the random orthogonal model of rank

r ≤ C_0 m / log(m), (4)

and the signs and support of E_0 distributed according to the Bernoulli sign-and-support model with error probability ρ_s ≤ ρ_s*, then with probability at least 1 − C m^{−p},

(A_0, E_0) = arg min_{A,E} ||A||_* + (1/√m) ||E||_1 subj A + E = A_0 + E_0, (5)

and the minimizer is uniquely defined.

In other words, matrices A_0 whose singular spaces are distributed according to the random orthogonal model can, with probability approaching one, be efficiently recovered from almost all corruption sign and support patterns, without prior knowledge of the pattern of corruption. Our line of analysis also implies strong results for the matrix completion problem studied in [6, 8, 5].

⁵ That is, the left and right singular vectors are independent samples from the Haar measure on the Stiefel manifold W^m_r.
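The two probabilistic models of Definitions 2.2 and 2.3 are straightforward to sample. A sketch follows; the singular value distribution and the fixed error magnitude are arbitrary illustrative choices, since both definitions leave them free:

```python
import numpy as np

def haar_stiefel(m, r, rng):
    """Uniform (Haar) sample from the Stiefel manifold W^m_r: orthonormalize
    a Gaussian matrix, fixing the sign convention of the QR factorization."""
    Q, R = np.linalg.qr(rng.standard_normal((m, r)))
    return Q * np.sign(np.diag(R))

def random_orthogonal_model(m, r, rng):
    """A0 per Definition 2.2: independent Haar-distributed singular spaces.
    The nonzero singular values may be arbitrary; here drawn from [1, 2]."""
    U, V = haar_stiefel(m, r, rng), haar_stiefel(m, r, rng)
    s = 1.0 + rng.random(r)
    return (U * s) @ V.T

def bernoulli_errors(m, rho_s, rng, magnitude=10.0):
    """E0 per Definition 2.3: each entry nonzero with probability rho_s,
    carrying an independent Rademacher sign. The fixed magnitude is an
    illustrative choice; the model allows arbitrary magnitudes."""
    support = rng.random((m, m)) < rho_s
    return magnitude * support * rng.choice([-1.0, 1.0], size=(m, m))

rng = np.random.default_rng(2)
A0 = random_orthogonal_model(200, 10, rng)
E0 = bernoulli_errors(200, 0.05, rng)
D = A0 + E0
```

Instances generated this way are the natural test cases for the recovery claims of Theorems 2.4 and 2.5.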

Theorem 2.5 (Matrix completion in proportional growth). There exist numerical constants m_0, ρ_r*, ρ_s*, C, all > 0, with the following property: if m > m_0 and A_0 ∈ R^{m×m} is distributed according to the random orthogonal model of rank

r ≤ ρ_r* m, (6)

and Υ ⊆ [m] × [m] is an independently chosen subset of [m] × [m] in which the inclusion of each pair (i, j) is an independent Bernoulli(1 − ρ_s) random variable with ρ_s ≤ ρ_s*, then with probability at least 1 − exp(−C m),

A_0 = arg min_A ||A||_* subj A(i, j) = A_0(i, j) for all (i, j) ∈ Υ, (7)

and the minimizer is uniquely defined.

Finally, in Section 6 we extend existing iterative thresholding techniques for solving equality-constrained ℓ¹-norm minimization problems [31] and nuclear norm minimization problems [4] to give an algorithm that produces a near-solution to (2) more efficiently and scalably than off-the-shelf interior-point methods.

2.1 Relationship to existing work

During the preparation of this manuscript, we were informed of results by [9] analyzing the same convex program. That paper's main result shows that under the probabilistic model studied here, exact recovery can be guaranteed with high probability when

||E_0||_0 ≤ C m^{1.5} / (log(m) max(r, log m)). (8)

While this is an interesting result, even for constant rank r it guarantees correction of only a vanishing fraction o(m^{1.5}) of errors. In contrast, our main result, Theorem 2.4, states that even if r grows proportionally to m/log(m), non-vanishing fractions of errors are corrected with high probability. Both analyses start from the optimality condition for the convex program (2). The key technical component of this improved result is a probabilistic analysis of an iterative refinement technique for producing a dual vector that certifies optimality of the pair (A_0, E_0). This proof technique is related to techniques used in [7, 30], with additional care required to handle an operator norm constraint arising from the presence of the nuclear norm in (2).
Finally, while Theorem 2.5 is merely a byproduct of our analysis and not the main focus of this paper, it is interesting in light of results by [8]. That work proves that in the probabilistic model considered here, a generic rank-r matrix can, with high probability, be efficiently and exactly completed from a subset of only

C m r log⁸(m) (9)

entries. When r > C m / log⁸(m), this bound exceeds the number m² of possible observations. In contrast, our Theorem 2.5 implies that for certain scenarios with r as large as ρ_r* m, the matrix can be completed from a subset of (1 − ρ_s) m² entries. For matrices of relatively large rank, this is a significant extension of [8]. Our result does not supersede (9) for smaller ranks, however. These results are further contrasted in Section 5.

3 Analysis Framework

In this section, we begin our analysis of the semidefinite program (2). After introducing notation, we provide two sufficient conditions for a pair (A, E) to be the unique optimal solution to (2). The first condition, given in Lemma 3.1 below, is stated in terms of the existence of a dual vector W that certifies optimality of the pair (A, E). The existence (or non-existence) of such a dual vector is itself a random convex programming feasibility problem, and so the condition in this lemma is not directly amenable to analysis. The next step, outlined in Lemma 3.3, is to show that if there exists some W_0 that does not violate the constraints of this feasibility problem too badly, then it can be refined to produce a W that satisfies the constraints and certifies optimality of (A, E). The probabilistic analysis in the following Section 4 will then show that, with high probability, such a W_0 exists.

3.1 Notation

For any n ∈ Z_+, [n] = {1, ..., n}. For M ∈ R^{m×m} and I, J ⊆ [m], M_{I,J} will denote the submatrix of M consisting of those rows indexed by I and those columns indexed by J. We will use • as shorthand for the entire index set: M_{I,•} is the submatrix consisting of those rows indexed by I. M^* will denote the transpose of M. For matrices P, Q ∈ R^{m×n}, ⟨P, Q⟩ := trace[P^* Q] will denote the (Frobenius) inner product. The symbol I will denote the identity matrix or identity operator on matrices; the distinction will be clear from context.

||M||_{p,q} will denote the operator norm of the matrix M, as a linear map between ℓ^p and ℓ^q. Important special cases are the spectral norm ||M||_{2,2}, the max row- and column norms

||M||_{∞,∞} = max_i ||M_{i,•}||_1 and ||M||_{1,1} = max_j ||M_{•,j}||_1, (10)

and the max element norm

||M||_{1,∞} = max_{i,j} |M_{ij}|. (11)

We will also often reason about linear operators on matrices. If L : R^{m×m} → R^{m×m} is a linear map, ||L||_{F,F} will denote its operator norm with respect to the Frobenius norm on matrices:

||L||_{F,F} = sup_{M ∈ R^{m×m} \ {0}} ||L[M]||_F / ||M||_F. (12)

If S is a subspace, π_S will denote the projection operator onto that subspace.
For U ∈ R^{m×r}, π_U will denote the projection operator onto the range of U. Finally, if I ⊆ [m], π_I := sum_{i∈I} e_i e_i^* will denote the projection onto those coordinates indexed by I.

We will let

Ω := {(i, j) | E_{0,ij} ≠ 0} (13)

denote the set of corrupted entries. By slight abuse of notation, we will identify Ω with the subspace {M | M_{ij} = 0 for all (i, j) ∈ Ω^c}, so π_Ω will denote the projection onto the corrupted elements. We will let Σ ∈ R^{m×m} be an iid Rademacher (Bernoulli ±1) matrix that defines the signs of the corrupted entries:

sign(E_0) = π_Ω[Σ]. (14)

Expressing sign(E_0) in this way allows us to exploit independence between Ω and Σ.
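These projections have simple matrix expressions when U and V have orthonormal columns. A sketch follows; π_Θ is computed via the identity π_Θ[M] = π_U M + M π_V − π_U M π_V spelled out in Section 4.1, and the boolean-mask representation of Ω is an implementation choice:

```python
import numpy as np

def proj_omega(M, omega):
    """pi_Omega: keep the entries on the corrupted support Omega, zero the
    rest. omega is a boolean mask of the support."""
    return np.where(omega, M, 0.0)

def proj_theta(M, U, V):
    """pi_Theta: project onto matrices with column space in range(U) or row
    space in range(V). U and V are assumed to have orthonormal columns."""
    PU = U @ U.T               # orthogonal projector onto range(U)
    PV = V @ V.T               # orthogonal projector onto range(V)
    return PU @ M + M @ PV - PU @ M @ PV
```

Both maps are orthogonal projections, so applying either twice gives the same result as applying it once; the quantity ||π_Ω π_Θ||_{F,F} appearing below can then be estimated numerically, e.g., by power iteration on the composed map.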

The symbol ⊗ will denote the Kronecker product between matrices. vec will denote the operator that vectorizes a matrix by stacking the columns. For M ∈ R^{m×m}, this operator stacks the entries M_ij in lexicographic order. For x = vec[M] ∈ R^{m²}, we write x_Ω as shorthand for x_{L(Ω)}, where L(Ω) = {jm + i | (i, j) ∈ Ω} ⊆ [m²], and as further shorthand, we will occasionally use vec_Ω[M] to denote [vec[M]]_Ω.

In both our analysis and optimization algorithm, we will make frequent use of the soft thresholding operator, which we denote S_γ:

S_γ[x] = { 0, |x| ≤ γ; sign(x)(|x| − γ), |x| > γ. (15)

For matrices X, S_γ[X] will denote the matrix obtained by applying S_γ to each element X_ij. Finally, we often find it convenient to use the notation

S_γ^{Ω^c}[X] := π_{Ω^c}[S_γ[X]]. (16)

That is, S_γ^{Ω^c} applies the soft threshold to X but only retains those elements not in Ω.

Throughout, we will use the symbol W^m_r to refer to the Stiefel manifold of matrices U ∈ R^{m×r} with orthonormal columns: U^*U = I_{r×r}. We will use SO(r) to refer to the special orthogonal group of r × r matrices R such that R^*R = RR^* = I_{r×r} and det(R) = 1.

3.2 Optimality conditions for (A_0, E_0)

We begin with a simple sufficient condition for the pair (A_0, E_0) to be the unique optimal solution to (2). These conditions are stated in terms of a dual vector, the existence of which certifies optimality. Our analysis will then show that under the stated circumstances, such a dual certificate can be produced with high probability.

Lemma 3.1. Let (A_0, E_0) ∈ R^{m×m} × R^{m×m}, with

Ω := supp(E_0) ⊆ [m] × [m]. (17)

Let A_0 = USV^* denote the compact singular value decomposition of A_0, and let Θ denote the subspace generated by matrices with column space range(U) or row space range(V):

Θ := {UM^* | M ∈ R^{m×r}} + {MV^* | M ∈ R^{m×r}} ⊆ R^{m×m}. (18)

Suppose that ||π_Ω π_Θ||_{F,F} < 1 and there exists W ∈ R^{m×m} such that

[UV^* + W]_{ij} = λ sign(E_{0,ij}) for all (i, j) ∈ Ω,
|[UV^* + W]_{ij}| < λ for all (i, j) ∈ Ω^c,
U^*W = 0, WV = 0,
||W||_{2,2} < 1. (19)

Then the pair (A_0, E_0) is the unique optimal solution to (2).

Proof. We show that if a Lagrange multiplier vector Y
:= UV^* + W satisfying this system exists, then any nonzero perturbation A_0 → A_0 + Δ, E_0 → E_0 − Δ respecting the constraint A + E = A_0 + E_0

will increase the objective function. By convexity, we may restrict our interest to Δ satisfying ||Δ||_{1,∞} < min_{(i,j)∈Ω} |E_{0,ij}|. For such Δ,

λ ||E_0 − Δ||_1 − λ ||E_0||_1 = λ sum_{(i,j)∈Ω} (−sign(E_{0,ij}) Δ_{ij}) + λ sum_{(i,j)∉Ω} |Δ_{ij}| ≥ −⟨Y, Δ⟩,

with equality if and only if π_{Ω^c}[Δ] = 0 (since on this set |Y_{ij}| is strictly smaller than λ). Similarly, since Y = UV^* + W is a subgradient of the nuclear norm at A_0,

||A_0 + Δ||_* ≥ ||A_0||_* + ⟨Y, Δ⟩. (20)

Moreover, since ||A_0||_* = ⟨UV^*, A_0⟩ = ⟨Y, A_0⟩, equality in (20) gives ||A_0 + Δ||_* = ⟨Y, A_0⟩ + ⟨Y, Δ⟩ = ⟨Y, A_0 + Δ⟩. Thus, by Lemma 3.2 below, if equality holds in (20), then Δ ∈ Θ. Summing the two subgradient bounds, we have

||A_0 + Δ||_* + λ ||E_0 − Δ||_1 ≥ ||A_0||_* + λ ||E_0||_1,

with equality only if Δ ∈ Ω ∩ Θ. If ||π_Ω π_Θ||_{F,F} < 1, then Ω ∩ Θ = {0}, and so either Δ = 0 or

||A_0 + Δ||_* + λ ||E_0 − Δ||_1 > ||A_0||_* + λ ||E_0||_1.

Lemma 3.2. Consider P ∈ R^{m×m} with ||P||_{2,2} = 1 and σ_min(P) < 1. Write the full singular value decomposition of P as

P = [U_1 U_2] [ I 0 ; 0 Σ ] [V_1 V_2]^*,

where ||Σ||_{2,2} < 1. Let Q ∈ R^{m×m} have reduced singular value decomposition Q = U Σ_Q V^*. Then if ⟨P, Q⟩ = ||Q||_*, we have U_2^* U = 0 and V_2^* V = 0.

Proof. Let r denote the rank of Q. Then

⟨P, Q⟩ = ⟨ [ I 0 ; 0 Σ ], [U_1 U_2]^* U Σ_Q V^* [V_1 V_2] ⟩.

Let Y, Z ∈ R^{m×r} denote the matrices Y := [U_1 U_2]^* U and Z := [V_1 V_2]^* V, and notice that both Y and Z have orthonormal columns. Then, after the above rotation,

⟨P, Q⟩ = sum_{i=1}^{m} sum_{j=1}^{r} σ_i(P) σ_j(Q) Y_{ij} Z_{ij} = sum_{j=1}^{r} σ_j(Q) sum_{i=1}^{m} σ_i(P) Y_{ij} Z_{ij}. (21)

For all j,

sum_{i=1}^{m} σ_i(P) Y_{ij} Z_{ij} ≤ || [ I 0 ; 0 Σ ] Z_{•,j} || · ||Y_{•,j}|| ≤ 1,

with equality if and only if Y_{•,j} = Z_{•,j} and Z_{ij} = 0 whenever σ_i(P) < 1. Since ⟨P, Q⟩ = ||Q||_* forces each of the sums sum_i σ_i(P) Y_{ij} Z_{ij} to be one, this implies that U_2^* U and V_2^* V are both zero.

Thus, we can guarantee that (A_0, E_0) is optimal with high probability by asserting that a certain random convex feasibility problem is satisfiable with high probability. While there are potentially many W satisfying these constraints, most of them lack an explicit expression. As has proven fruitful in a variety of related problems, we will instead consider a putative dual vector W_0 that does have an explicit expression: the minimum Frobenius norm solution to the equality constraints in (19).
We will see that this vector already satisfies the operator norm constraint with high probability. However, the box constraint due to sign(E_0) is likely to be violated. We then describe an iterative procedure that, with high probability, fixes the box constraint while respecting the equality and operator norm constraints. The output of this iteration is the desired certifying dual vector.
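The operators (15) and (16) used by this construction are one-liners; a sketch, representing Ω as a boolean mask:

```python
import numpy as np

def soft_threshold(X, gamma):
    """S_gamma of (15): zero entries with |x| <= gamma, shrink the rest
    toward zero by gamma."""
    return np.sign(X) * np.maximum(np.abs(X) - gamma, 0.0)

def soft_threshold_off_support(X, gamma, omega):
    """S_gamma^{Omega^c} of (16): soft threshold X, then retain only the
    entries outside the corrupted support Omega (a boolean mask)."""
    return np.where(omega, 0.0, soft_threshold(X, gamma))
```

In the iteration of Lemma 3.3 below, this restricted operator extracts the violations of the box constraint, which shrink geometrically when the restricted operator norm ξ_c is smaller than one.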

3.3 Iterative construction of the dual certificate

In constructing the dual certificate, we will use the fact that the violations of the inequality constraints, viewed as a matrix, often have sparse rows and sparse columns. To formalize this, fix a small constant c ∈ (0, 1], and define

Ψ_c := { M ∈ R^{m×m} | ||M_{•,j}||_0 ≤ cm for all j, ||M_{i,•}||_0 ≤ cm for all i, π_Ω[M] = 0 }. (22)

We will show that such matrices are near the nullspace of the equality constraints in (19). The equality constraints in (19) can be expressed as π_Ω[W] = π_Ω[λ sign(E_0) − UV^*] and π_Θ[W] = 0. We will let Γ denote the orthogonal complement of the nullspace of these constraints:

Γ := Θ + Ω. (23)

Let ξ_c denote the operator norm of π_Γ, with respect to the Frobenius norm on R^{m×m}, restricted to Ψ_c:

ξ_c := sup_{M ∈ Ψ_c \ {0}} ||π_Γ M||_F / ||M||_F ∈ [0, 1]. (24)

In Section 4.3, we establish a useful probabilistic upper bound for this quantity. Before investigating these properties, we show that if ξ_c and W_0 are well controlled, we can find a dual vector certifying optimality of (A_0, E_0).

Lemma 3.3 (Iterative construction of a dual certificate). Suppose that for some c ∈ (0, 1) and ε > 0, there exists W_0 satisfying π_Θ[W_0] = 0 and π_Ω[W_0] = π_Ω[λ sign(E_0) − UV^*], such that

||UV^* + W_0||_{1,1} + (1/(1 − ξ_c)) ||S^{Ω^c}_{λ−ε}(UV^* + W_0)||_F ≤ (λ − ε) cm, (25)

||UV^* + W_0||_{∞,∞} + (1/(1 − ξ_c)) ||S^{Ω^c}_{λ−ε}(UV^* + W_0)||_F ≤ (λ − ε) cm, (26)

and

||W_0||_{2,2} + (1/(1 − ξ_c)) ||S^{Ω^c}_{λ−ε}(UV^* + W_0)||_F < 1. (27)

Then there exists W satisfying the system (19).

Proof. We construct a convergent sequence W_0, W_1, ... whose limit satisfies (19). For each k, set

W_k = W_{k−1} − π_{Γ^⊥} S^{Ω^c}_{λ−ε}(UV^* + W_{k−1}). (28)

Notice that π_Θ[W_k] = π_Θ[W_{k−1}] and π_Ω[W_k] = π_Ω[W_{k−1}], so for all k, π_Θ[W_k] = 0 and π_Ω[W_k] = π_Ω[λ sign(E_0) − UV^*]. We will further show by induction that the sequence (W_k) satisfies the following properties:

||UV^* + W_k||_{1,1} ≤ ||UV^* + W_0||_{1,1} + ||S^{Ω^c}_{λ−ε}(UV^* + W_0)||_F sum_{i=0}^{k−1} ξ_c^i, (29)

||UV^* + W_k||_{∞,∞} ≤ ||UV^* + W_0||_{∞,∞} + ||S^{Ω^c}_{λ−ε}(UV^* + W_0)||_F sum_{i=0}^{k−1} ξ_c^i, (30)

max_i ||[S^{Ω^c}_{λ−ε}(UV^* + W_{k−1})]_{i,•}||_0 ≤ cm, (31)

max_j ||[S^{Ω^c}_{λ−ε}(UV^* + W_{k−1})]_{•,j}||_0 ≤ cm, (32)

||S^{Ω^c}_{λ−ε}(UV^* + W_k)||_F ≤ ξ_c^k ||S^{Ω^c}_{λ−ε}(UV^* + W_0)||_F. (33)

For k = 0, (29), (30) and (33) hold trivially. The sparsity assertion (31) follows from the assumptions of the lemma: for all i,

||[S^{Ω^c}_{λ−ε}(UV^* + W_0)]_{i,•}||_0 ≤ (λ − ε)^{−1} ||[UV^* + W_0]_{i,•}||_1 ≤ (λ − ε)^{−1} ||UV^* + W_0||_{∞,∞} ≤ cm.

The exact same reasoning applied to the columns gives (32). Now, suppose the statements (29)-(33) hold for W_0, ..., W_{k−1}. Then

||UV^* + W_k||_{1,1} ≤ ||UV^* + W_{k−1}||_{1,1} + ||S^{Ω^c}_{λ−ε}(UV^* + W_{k−1})||_F ≤ ||UV^* + W_{k−1}||_{1,1} + ξ_c^{k−1} ||S^{Ω^c}_{λ−ε}(UV^* + W_0)||_F ≤ ||UV^* + W_0||_{1,1} + ||S^{Ω^c}_{λ−ε}(UV^* + W_0)||_F sum_{i=0}^{k−1} ξ_c^i,

establishing (29) for k. The same reasoning applied to the (∞,∞)-norm establishes (30) for k. Bounding the sum in (29) by 1/(1 − ξ_c) and applying assumption (25) of the lemma gives that ||UV^* + W_k||_{1,1} ≤ (λ − ε) cm. This implies (32): the number of entries of each column of UV^* + W_k that exceed λ − ε in absolute value is at most cm. The same chain of reasoning, using (26) and (30), establishes (31): the rows of S^{Ω^c}_{λ−ε}(UV^* + W_k) are also cm-sparse.

Finally, notice that

S^{Ω^c}_{λ−ε}(UV^* + W_k) = S^{Ω^c}_{λ−ε}(UV^* + W_{k−1} − π_{Γ^⊥} S^{Ω^c}_{λ−ε}(UV^* + W_{k−1})) = S^{Ω^c}_{λ−ε}(UV^* + W_{k−1} − S^{Ω^c}_{λ−ε}(UV^* + W_{k−1}) + π_Γ S^{Ω^c}_{λ−ε}(UV^* + W_{k−1})).

Since the entries of UV^* + W_{k−1} − S^{Ω^c}_{λ−ε}(UV^* + W_{k−1}) have magnitude at most λ − ε, π_Γ S^{Ω^c}_{λ−ε}(UV^* + W_{k−1}) dominates S^{Ω^c}_{λ−ε}(UV^* + W_k) elementwise in absolute value. Hence,

||S^{Ω^c}_{λ−ε}(UV^* + W_k)||_F ≤ ||π_Γ S^{Ω^c}_{λ−ε}(UV^* + W_{k−1})||_F ≤ ξ_c ||S^{Ω^c}_{λ−ε}(UV^* + W_{k−1})||_F,

where we have used that S^{Ω^c}_{λ−ε}(UV^* + W_{k−1}) ∈ Ψ_c. Thus each of the three statements holds for all k, and ||S^{Ω^c}_{λ−ε}(UV^* + W_k)||_F decreases geometrically. For all k sufficiently large, ||UV^* + W_k||_{1,∞} ≤ λ. Meanwhile, notice that

||W_k||_{2,2} ≤ ||W_{k−1}||_{2,2} + ||π_{Γ^⊥} S^{Ω^c}_{λ−ε}(UV^* + W_{k−1})||_{2,2} ≤ ||W_{k−1}||_{2,2} + ||S^{Ω^c}_{λ−ε}(UV^* + W_{k−1})||_F ≤ ||W_{k−1}||_{2,2} + ξ_c^{k−1} ||S^{Ω^c}_{λ−ε}(UV^* + W_0)||_F.

By induction, it is easy to show that

||W_k||_{2,2} ≤ ||W_0||_{2,2} + ||S^{Ω^c}_{λ−ε}(UV^* + W_0)||_F sum_{i=0}^{k−1} ξ_c^i.

By assumption (27) of the lemma, this quantity is bounded by 1 for all k.

4 Probabilistic Analysis of the Initial Dual Vector

In this section, we analyze the minimum Frobenius norm solution to the equality constraints in the optimality condition (19). We show that with high probability, this initial dual vector satisfies the operator norm constraint, and that the violations of the box constraint are small enough that the iteration described in (28) will succeed in producing a dual certificate.

The analysis is organized as follows: in Section 4.1, we introduce several tools and preliminary results. Section 4.2 then shows that with overwhelming probability the operator norm of the initial dual vector is bounded by a constant that can be made arbitrarily small by assuming the error probability ρ_s and rank r are both low enough. Section 4.3 then shows that the restricted operator norm ξ_c is also bounded by a small constant, establishing that all row- and column-sparse matrices are nearly orthogonal to the subspace spanned by the equality constraints in (19). Section 4.4 analyzes the violations of the box constraint. Finally, Section 4.5 combines these results with the results of Section 3 to prove our main result, Theorem 2.4.

4.1 Tools and preliminaries

In this section, we will often need to refer to the following subspaces:

Ξ_U := {UM^* | M ∈ R^{m×r}}, Ξ_V := {MV^* | M ∈ R^{m×r}}, Ξ_{UV} := {UMV^* | M ∈ R^{r×r}}.

Notice that π_Θ = π_{Ξ_U} + π_{Ξ_V} − π_{Ξ_{UV}}, and that π_{Ξ_{UV}} = π_{Ξ_U} π_{Ξ_V} = π_{Ξ_V} π_{Ξ_U}.

In the Bernoulli support model, the number of errors in each row and column concentrates about ρ_s m. For each j ∈ [m], let

I_j := {i | (i, j) ∈ Ω}, (34)

i.e., I_j contains the indices of the errors in the j-th column. Similarly, for i ∈ [m], set

J_i := {j | (i, j) ∈ Ω}. (35)

In terms of these quantities, for each η > 0, define the event

E_Ω(η) : max_j |I_j| < (1 + η) ρ_s m and max_i |J_i| < (1 + η) ρ_s m.

Much of our analysis hinges on the operator norm of π_Ω π_Θ.
We will see that on the above event E_Ω(η), this quantity can be controlled by bounding norms of submatrices of the singular vectors U

and V. This results in a number of bounds involving the following function of the rank and error probability:

τ(r/m, ρ_s) := 2 (√(r/m) + √ρ_s) / (1 − √(r/m)). (36)

Where τ is used as shorthand, it should be understood as τ(r/m, ρ_s). We will repeatedly refer to the following good events:

E_ΩU : ||π_Ω π_{Ξ_U}||_{F,F} ≤ τ(r/m, ρ_s),
E_ΩV : ||π_Ω π_{Ξ_V}||_{F,F} ≤ τ(r/m, ρ_s),
E_ΩΘ : ||π_Ω π_Θ||_{F,F} ≤ 2 τ(r/m, ρ_s).

In establishing that these events are overwhelmingly likely, the following result on singular values of Gaussian matrices will prove useful:

Fact 4.1. Let M ∈ R^{m×n}, m > n, and suppose that the elements of M are independent identically distributed N(0, 1) random variables. Then

P[σ_max(M) ≥ √m + √n + t] ≤ exp(−t²/2), P[σ_min(M) ≤ √m − √n − t] ≤ exp(−t²/2).

This result is widely used in the literature, with various estimates of the error exponent. The form stated here can be obtained via the bounds E[σ_max(M)] ≤ √m + √n and E[σ_min(M)] ≥ √m − √n, in conjunction with [ ], Proposition 2.18, Equation (2.35), and the observation that the singular values are 1-Lipschitz.

Lemma 4.2. Fix any η ∈ (0, 1/16). Consider (U, V, Ω) drawn from the random orthogonal model of rank r < m with error probability ρ_s > 0. Then there exists t*(r/m, ρ_s) > 0 such that

P_{U,V,Ω}[E_Ω(η) ∩ E_ΩU ∩ E_ΩV ∩ E_ΩΘ] ≥ 1 − 4m exp(−m t*²/2) − 4m exp(−η² ρ_s² m/2). (37)

In particular, if for all m larger than some m_0 ∈ Z, r/m ≤ ρ_r < 1, then

P_{U,V,Ω}[E_Ω(η) ∩ E_ΩU ∩ E_ΩV ∩ E_ΩΘ] ≥ 1 − exp(−C m + O(log(m))). (38)

Proof. Each of the random variables |I_j| is a sum of m independent Bernoulli(ρ_s) random variables X_{1,j}, X_{2,j}, ..., X_{m,j}. The partial sum sum_{i=1}^k X_{i,j} − ρ_s k is a martingale whose differences are all bounded by 1. So by Azuma's inequality, we have P[|I_j| − ρ_s m > t] < exp(−t²/(2m)). The same calculation clearly holds for the |J_i|, and so, setting t = η ρ_s m,

P[E_Ω(η)^c] ≤ 2m P[|I_j| > ρ_s m (1 + η)] ≤ 2m exp(−η² ρ_s² m/2). (39)

So, with high probability, each row and column of E_0 is (1 + η)ρ_s m-sparse. We next show that such sparse vectors are nearly orthogonal to the random subspace range(U). The matrix U is uniformly

14 distributed on W r, and can be realized by orthogonalizing a Gaussian atrix. Let Z R r be an iid N (0, 1 ) atrix; then U is equal in distribution to Z(Z Z) 1/. or each I j, Z Ij, (Z Z) 1/, Z I j,, σ in (Z), where σ in (Z) denotes the r-th singular value of the r atrix Z. Now, for any t > 0 P σ in (Z) < 1 ] ( ) r/ t < exp t. (40) Meanwhile, on the event E Ω (η), P Z Ω Z Ij,, > ] r/ + I j / + t ( ) < exp t. (41) Hence, ] r/ η ρs + t P U Ij,, > 1 r/ t ] r/ η ρs + t P U Ij U Ij,, > 1 I j < (1 + η)ρ s + P I j (1 + η)ρ s ] r/ t ) ( < exp ( t + exp η ρ ) s. Set t in ( r 3, ) ηρ s. By the assuption of the lea, 1 + η + η < 4/3, and r/ η ρs + t 1 r/ t ( ) r/ η + η ρs ( 3 1 ) r/ r/ ρs ( 3 1 ) τ(r/, ρ s ). r/ So, (applying a syetric arguent to the V Ji ), then, for the event E 1 defined below, ( ) E 1 : ax ax U j Ij,,, ax V i Ji,, τ(r/, ρ s ), (4) P U,V,Ω E 1 E Ω (η)] > 1 4 exp ( t ) ( 4 exp η ρ ) s. (43) If for all > 0, r/ ρ r < 1, then t is bounded away fro zero. We next show that E 1 iplies E ΩU, E ΩV, and E ΩΘ. We can express π Ω M] in ters of its action on the coluns of M: π Ω M] = j π I j M,j e j. So, π π ΞU ΩM] = π U π Ij M,j e j, (44) j 14

15 π U π Ij M,j U π Ij M,j and π ΞU π Ω M] = j j U Ij,, M,j = j ( ) ax U j Ij,, M, and so on E 1, π Ω π ΞU, π ΞU π Ω, τ(r/, ρ s ); and so E 1 establishes that E 1 = E ΩV. Now, notice that since = E ΩU. A syetric arguent π Ω π Θ M] = π Ω π ΞU M] + π Ω π U Mπ V ], (45) if we choose a basis B R r for the orthogonal copleent of range(u) and define Ξ U V {BQV Q R r r } R, then π Ω π Θ, π Ω π ΞU, + πω π ΞU V,. (46). = Since πω π ΞU V, = sup π Ω M] M Ξ U V \{0} sup π Ω M] = π Ω π ΞV,, M Ξ V \{0} E 1 = E ΩΘ and the proof is coplete. We will need to understand concentration of Lipschitz functions of atrices that are uniforly distributed (according to the Haar easure) on two anifolds of interest: the Stiefel anifold W r and the group of r r orthogonal atrices with deterinant one, SO(r). This is governed by the concentration function on these spaces: Definition 4.3. ] Let (X, d) be a etric space. or a given easure µ on X, the concentration function is defined as α X,d,µ (t). = sup { 1 µ(a t) A X, µ(a) 1 }, (47) where A t = {x d(x, A) < t} is a t-neighborhood of A. The concentration functions for W r and SO(r) are well known: act 4.4 (4] Theores and 6.7.1). or r < the anifold W r, with distance d(x, Y ) =. X Y, the Haar easure µ has concentration function ) π α W,d,µ (t) ( 8 exp t. (48) 8 Siilarly, on SO(r) with δ(x, Y ) =. X Y, and ν the Haar easure, ) π α SO(r),δ,ν (t) ( 8 exp rt. (49) 8 15
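Both probabilistic ingredients of the proof of Lemma 4.2 — that a Bernoulli(ρ_s) support has no overly dense rows or columns, and that a Haar-distributed U (realized by orthogonalizing a Gaussian matrix) has row-submatrices of small spectral norm — are easy to check in a few lines. All sizes below are our own illustrative choices, not values from the analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, rho_s, eta = 500, 25, 0.05, 1.0    # illustrative values

# 1. Row/column sparsity of a Bernoulli(rho_s) support (the event E_Omega(eta)).
Omega = rng.random((m, m)) < rho_s
counts = np.concatenate([Omega.sum(axis=0), Omega.sum(axis=1)])
frac_dense = np.mean(counts > (1 + eta) * rho_s * m)   # Azuma predicts this is ~0

# 2. A Haar-distributed U via orthogonalization: U = Z (Z^T Z)^{-1/2}.
Z = rng.standard_normal((m, r))
w, Q = np.linalg.eigh(Z.T @ Z)
U = Z @ (Q * w ** -0.5) @ Q.T            # multiply by (Z^T Z)^{-1/2}
orth_err = np.linalg.norm(U.T @ U - np.eye(r))

# A sparse row-index set I is nearly orthogonal to range(U):
# ||U_{I,.}||_{2,2} is on the order of sqrt(r/m) + sqrt(|I|/m), far below 1.
I = rng.choice(m, size=m // 10, replace=False)
sub_norm = np.linalg.norm(U[I, :], 2)
```

The spectral norm of the submatrix stays near √(r/m) + √(|I|/m), which is the quantity the function τ packages together with the error probability.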

16 Propositions 1.3 and 1.8 of ] then iply that Lipschitz functions on these spaces concentrate about their edians and expectations: Corollary 4.5. Suppose r <, and let f : R r R with Lipschitz nor f lip. = sup X Y f(x) f(y ) X Y. (50) Then if U is distributed according to the Haar easure on W r, and edian(f) denotes any edian, ( ) P f(u) edian(f) + t] exp t. (51) Siilarly, if g : R r r R with Lipschitz nor 8 f lip g lip. = sup X Y g(x) g(y ) X Y. (5) Then if R is distributed according to the Haar easure on SO(r), g satisfies the following tail bound: ] ( ) 8π P g(r) Eg(R)] g lip r + t exp rt. (53) 8 g lip Proof. The concentration result (51) is a restateent of ] Proposition 1.3. or (53), notice first that ] Proposition 1.3 iplies that ( ) P g(r) edian(g) t] exp rt. (54) If we set ā. = then ] Proposition 1.8 gives 0 exp ( rt 8 g lip ) P g(r) Eg] ā + t] exp 8 g lip dt = g lip 8π r, (55) ( rt 8 g lip ). (56) 4. Bounding the operator nor In this section, we begin our analysis of the iniu robenius nor solution to the equality constraints of the feasibility proble (19). We show that with overwheling probability this atrix also has operator nor bounded by a sall constant. An iportant byproduct of this analysis is a siple proof for that, with the sae odels studied here, the easier proble of atrix copletion filling issing entries in a low-rank atrix can be efficiently and exactly solved by convex optiization, even for cases when the rank of the atrix grows in proportion to the diensionality. Section 5 further elucidates connections to that proble. 16

17 4..1 General approach We begin by showing that with high probability the iniu robenius nor solution is unique, and giving an explicit representation for the pseudoinverse operator in that case. This operator, denoted H, is applied to the atrix λ signe 0 ] UV to give the initial dual vector. We separately bound the nor of the two ters induced by H signe 0 ], and H UV ],, respectively. Both arguents follow in a fairly straightforward anner by reducing to a net and then applying concentration inequalities. Throughout this section, N will denote a 1 -net for S 1. By ] Lea 3.18, there is such a net with size at ost exp(4). Moving fro A, = sup x,y S 1 x Ay to our product of nets loses at ost a constant factor in the estiate: A, 4 sup x,y N x Ay (see e.g., 9] Proposition.6). We will argue that for our A of interest, f(a) = x Ay concentrates, and union bound over all exp(8) pairs in N N. 4.. Representation and uniqueness of W 0. Lea 4.6 (Representation of the pseudoinverse). Suppose that we have π Θ π Ω, < 1. Then the operator I π Ω π Θ π Ω is invertible, and for any M the optiization proble has unique solution in W subj π Θ W ] = 0, π Ω W ] = π Ω M] (57) Ŵ = π Θ π Ω (I π Ω π Θ π Ω ) 1 π Ω M] = π Θ π Ω (π Ω π Θ π Ω ) k π Ω M]. Proof. Choose atrices U, V R r whose coluns for orthonoral bases for the orthogonal copleent of the ranges of U and V, respectively. Let Q denote the atrix Q. = V U V U V U Notice that the rows of Q are orthonoral, and that they for a basis for the subspace of vectors {vecm] M Θ}. The equality constraint in (57) can therefore be expressed as ] ] Q 0 vec(w ) =, (58) I Ω, vec Ω M] Here, I is the identity atrix, and I Ω, is the subatrix of I consisting of those rows indexed by Ω, taken in lexicographic order. The iniu robenius nor solution W is siply the iniu l nor solution to this syste of equations. Define the atrix Notice that for any atrix M, P. = Q I,Ω.. Q P I Ω, vecm] = vec π Θ π Ω M]. k=0 17
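The series representation in Lemma 4.6 can be verified numerically. The sketch below implements the projections π_Ω and π_Θ for small illustrative sizes of our own choosing, forms the truncated Neumann series, and checks that the result satisfies both equality constraints:

```python
import numpy as np

rng = np.random.default_rng(2)
m, r, rho_s = 60, 3, 0.05      # illustrative sizes

U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
Omega = rng.random((m, m)) < rho_s

PU, PV = U @ U.T, V @ V.T

def pi_Omega(M):               # restriction to the support Omega
    return np.where(Omega, M, 0.0)

def pi_Theta(M):               # orthogonal projection onto Theta = {UM*} + {MV*}
    return PU @ M + M @ PV - PU @ M @ PV

M = rng.standard_normal((m, m))

# Truncated Neumann series:
#   W_hat = pi_{Theta^perp} pi_Omega sum_k (pi_Omega pi_Theta pi_Omega)^k pi_Omega[M]
term = pi_Omega(M)
acc = term.copy()
for _ in range(300):
    term = pi_Omega(pi_Theta(term))      # terms stay supported on Omega
    acc = acc + term
W_hat = acc - pi_Theta(acc)              # apply pi_{Theta^perp}

# W_hat satisfies both equality constraints of the feasibility problem.
constraint_theta = np.linalg.norm(pi_Theta(W_hat))
constraint_omega = np.linalg.norm(pi_Omega(W_hat) - pi_Omega(M))
```

The series converges because ‖π_Ω π_Θ π_Ω‖ < 1 at these sizes; both residuals come out at numerical noise.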

18 Since Q and I Ω, each have orthonoral rows, π Θ π Ω, = Q ] P I Ω,, = P,. So, by the I P assuption of the lea P, < 1, and the atrix P is nonsingular. We therefore have I the following explicit expression for the unique iniu l -nor solution to (58): vecŵ ] = Q I,Ω ] I P P I ] 1 0 vec Ω M] ]. (59) Applying the Schur copleent forula (which is justified since I P P ), the above is equal to ] ] Q P I,Ω (I P P ) 1 vec I Ω M] = (I Q Q)I,Ω (I P P ) 1 I Ω, vecm] = (I Q Q)I,Ω (I Ω, Q QI,Ω ) k I Ω, vecm] = vec π Θ π Ω k=0 k=0 (π Ω π Θ π Ω ) k π Ω M] yielding the representation in the stateent of the lea Effect of the singular vectors We next analyze the part of the initial dual vector W 0 induced by UV. ro the previous lea we have the expression H ] = π Θ π Ω k=0 ], (π Ω π Θ π Ω ) k π Ω ], (60) whenever π Ω π Θ, < 1. Analysis of H UV ] is coplicated by the fact that UV and Θ are dependent rando variables. Notice, however, that π Θ depends only on the subspace spanned by the coluns of U, not on any choice of basis for that subspace. UV, on the other hand, does depend on the choice of basis. We will use this fact to decouple H and UV, and show that the l operator nor of H UV ] is bounded below a sall constant with overwheling probability. Lea 4.7. Let (U, V, Ω) be sapled fro the rando orthogonal odel of rank r with error probability ρ s. Suppose that r and ρ s satisfy τ(r/, ρ s ) < 1 4. (61) Then with probability at least 1 exp ( C + O(log )), the solution W UV 0 to the optiization proble in W subj U W = 0, W V = 0, π Ω W ] = π Ω UV ] (6) is unique, and satisfies ( W UV r ) ( 0, 64 τ, ρ s r 1 ) π r 1. (63) 18

19 Proof. irst consider the event ix x, y N and write E 1 : π Ω UV ], 64τ. x π Ω UV ]y = U π Ω xy ], V. On the event E ΩU, U π Ω xy ] τ. So, as a function of V, f(v ) =. x π Ω UV ]y is τ-lipschitz. Since the distribution of V is invariant under the orthogonal transforation V V, f(v ) is a syetric rando variable and 0 is a edian. Hence, on the event E ΩU the tail bound (51) iplies that P V U,Ω f(v ) > t] < exp ( t 8τ ). (64) Set t = 16τ. A union bound over the exp(8) eleents of N N shows that on E ΩU, ] ] P V U,Ω π Ω UV ], > 64τ P V U,Ω sup x π Ω UV ]y > 16τ and hence we can conclude that x,y N N exp ( 4), P U,V,Ω E 1 ] 1 exp ( 4) PE c ΩU ] 1 exp ( C + O(log )). Now, consider the cobined event E 1 E ΩΘ. On this event, the representation in Lea 4.6 is valid and W0 UV = π Θ π Ω (π Ω π Θ π Ω ) k π Ω UV ]. (65) k=0 or any M, π Θ M] = π U Mπ V, π Θ M], M,, so W0 UV, π Ω (π Ω π Θ π Ω ) k π Ω UV ] k=0, π Ω UV ], + (π Ω π Θ π Ω ) k UV ] k=1,. (66) We have already addressed the first ter. To bound the second, we expand our probability space as follows. Consider Ũ and Ṽ distributed according to the Haar easure on W 1. Identify U and V with the first r coluns of Ũ and Ṽ, respectively, and write Û and ˆV for the reaining 1 r coluns. Notice that U and V are indeed distributed according to the rando orthogonal odel of rank r, and that Û and ˆV follow the rando orthogonal odel of rank r 1 (although, of course, U and Û now dependent rando variables). Write (π Ω π Θ π Ω ) k UV ] = Ω π Θ π Ω ), k=1(π k ŨṼ Û ˆV ], Ω π Θ π Ω ) k=1(π k ŨṼ ] + Ω π Θ π Ω ), k=1(π k Û ˆV ]., k=1 19

20 We next show that with overwheling probability each of these ters is well-controlled. Notice that if R SO ( r 1) is any orthogonal atrix, then the joint distribution Ũ, U, and Û is invariant under the ap ] I 0 Ũ Ũ. (67) 0 R This follows fro the right orthogonal invariance of the Haar easure on W 1 (see e.g., 11] Section 1.4.3). Since this ap preserves U and V, it also preserves Θ. Hence, the ter of interest, k=1 (π Ωπ Θ π Ω ) Û k ˆV ], is equal in distribution to k=1 (π Ωπ Θ π Ω ) ÛR k ˆV ],. The orthogonal atrix R is independent of all of the other ters in this expression. This independence allows us to estiate the nor by first bounding the operator nor of the ap k=1 (π Ωπ Θ π Ω ) k and then applying easure concentration on SO ( r 1). or fixed x, y N, consider the quantity ( ) x (π Ω π Θ π Ω ) k ÛR ˆV ] y = Û (π Ω π Θ π Ω ) k xy ] ˆV, R k=1. = M, R. This is a M -Lipschitz function of R. Since for any w S 1, Rw is uniforly distributed on S 1, Ee i Rw] = 0 for all i. So, ER] = 0 and E R M M, R = 0. On the event E ΩΘ, M k=1 (π Ω π Θ π Ω ) k xy ] π Ωπ Θ π Ω, xy 4τ 1 π Ω π Θ π Ω, 1 4τ 4τ 3. (68) k=1 Hence, on E ΩΘ by (53) ) 9( r 1)t P R M M, R > γ() + t] < exp ( 18 τ, (69). where γ() = 4τ( r,ρs) 1 4τ( r 8π,ρs) r 1 1 8π 3 r 1. or copactness of notation, set β(, r). =. Then, setting t = 64τ, union bounding over the exp(8) pairs in N N, and then r 1 3β oving fro N N to S 1 S 1 (losing at ost a factor of 4) gives P R Ũ,V,Ω Ω π Θ π Ω ) k=1(π k ÛR ˆV ] > 56 τ 3β(, r) + 4γ() < exp ( 4). (70), And so, Ω π Θ π Ω ) k=1(π k Û ˆV ] > 56 τ 3β(, r) + 4γ(), = P R Ũ,V,Ω Ω π Θ π Ω ) k=1(π k ÛR ˆV ] > 56 τ 3β(, r) + 4γ(), exp ( 4) + P EΩΘ] c exp ( C + O(log )). PŨ,V,Ω 0

21 An identical arguent, this tie randoizing over R =. ] I 0 shows that 0 R PŨ,V,Ω Ω π Θ π Ω ) k=1(π k ŨṼ ] > 56 τ 3β(, r) + 4γ() exp ( C + O(log )). (71), Cobining ters yields the bound Effect of the error signs We next consider the effect of the error signs. As in the previous proposition, we handle the k = 0 and k = 1... parts of the representation in Lea 4.6 separately. Lea 4.8. There exists a function φ(ρ s ) satisfying li ρs 0 φ(ρ s ) = 0, such that if E 0 is distributed according to the Bernoulli sign and support odel with error probability ρ s. P Ω,Σ sign(e 0 ), φ(ρ s ) ] exp ( C). (7) Proof. We first provide a bound on the oent-generating-function for the iid rando variables. Y ij = sign(e0ij ) of the for Eexp(tY )] exp(αt ). The oents of Y are 1 k = 0, EY k ] = 0 k odd, ρ s k > 0, k even, so Eexp(tY )] = 1 + ρ st k k=1 (k)!. Since exp(αt ) = 1 + k=1 αk t k k!, it suffices to choose α such that ρ s (k)! α k k = 1,,.... k! Since (k)! k! k k 1, α ax k=1,,... k ρ 1 k s suffices. Consider the function ψ : 1, ) 0, 1] R defined by ψ(x, y) = 1 x y 1 x. ψ(1, y) = y, and for all y li x ψ(x, y) = 0. The only stationary point occurs at ( ) x = log(1/y), and hence for y > 0 its axiu on 1, ) is ψ(x 1 (y), y) = ax y, log(y 1 ) y 1 log(y 1 ). Notice that li y 0 ψ(x (y), y) = 0, and that Now, for any fixed pair x, y N, let Z. = x sign(e 0 ) y. Then Eexp(tY )] exp(ψ(x (ρ s ), ρ s )t ). (73) Eexp(tZ)] = ij Eexp(tx i y j Y ij )] ij exp(ψ(x (ρ s ), ρ s )t (x i y j ) ) = exp(ψ(x (ρ s ), ρ s )t ). Applying a Chernoff bound (and optiizing the exponent) gives ( P x t ) sign(e 0 )y > t] < exp 4ψ(x. (74) (ρ s ), ρ s ) 1

22 Union bounding (and recognizing that oving fro N N to S 1 loses at ost a factor of 4) gives P sign(e 0 ), 4t ] ( t ) exp 8 4ψ(x. (75) (ρ s ), ρ s ) If we set, e.g., t(ρ s ) = 8ψ 1/ (x (ρ s ), ρ s ), the probability of failure will be bounded by exp( 8). urther setting φ(ρ s ) = 4t(ρ s ) gives the stateent of the lea. The above lea goes part of the way to controlling the coponent of the initial dual vector induced by the errors, H λ sign(e 0 )]. A straightforward Martingale arguent, detailed in the following lea, copletes the proof. Lea 4.9. Consider (U, V, Ω, Σ) drawn fro the rando orthogonal odel of rank r < with Bernoulli error probability ρ s and rando signs, and suppose that r and ρ s satisfy τ (r/, ρ s ) 1 4. (76) Then with probability at least 1 exp( C + O(log )) in (U, V, Ω, Σ), the solution W0 E optiization proble to the in W subj U W = 0, W V = 0, π Ω W ] = π Ω λ sign(e 0 )] (77) is unique, and satisfies W E, 0 φ(ρ s ) + 18τ(r/, ρ s), (78) 3 where φ( ) is the function defined in Lea 4.8. Proof. On event E ΩΘ, the iniizer is unique, and can be expressed as W0 E = π Θ π Ω (π Ω π Θ π Ω ) k π Ω λ sign(e 0 )]. k=0 Since π Θ is a contraction in the (, ) nor, W0 E, π Ω (π Ω π Θ π Ω ) k π Ω λ sign(e 0 )] k=0, π Ω λ signe 0 ]], + (π Ω π Θ π Ω ) k π Ω λ sign(e 0 )] k=1,. Lea 4.8 controls the first ter below φ(ρ s ) with overwheling probability. or the second ter, it is convenient to treat sign(e 0 ) as the projection of an iid Radeacher atrix Σ onto Ω: sign(e 0 ) = π Ω Σ]. ix x, y N, and notice that x (π Ω π Θ π Ω ) k π Ω λ sign(e 0 )]y = λ (π Ω π Θ π Ω ) k xy ], Σ. k=1 k=1. = M, Σ.

23 On the event E ΩΘ, M R 1 has robenius nor at ost 4τ 4τ 1 4τ 3. Order the indices (i, j) 1 i, j arbitrarily, and consider the Martingale Z defined by Z 0 = 0, Z k = Z k 1 + M ik,j k Σ ik,j k. The k-th Martingale difference is bounded by M ik,j k, and the overall squared l -nor of the differences is bounded by M. Hence, by Azua s inequality, P M, Σ > t] < exp ) ( 9t 3τ. (79) Set t = 3τ 3. A union bound over the exp(8) eleents in N N shows that on the event E ΩΘ, P Σ U,V,Ω (π Ω π Θ π Ω ) k π Ω λ sign(e 0 )] > 18τ exp ( 4). (80) 3 k=1, Hence, P U,V,Ω,Σ k=1 (π Ωπ Θ π Ω ) k π Ω λ sign(e 0 )], > 18τ 3 ] is bounded by exp ( 4) + PE c ΩΘ] exp ( C + O(log )). 4.3 Controlled feedback for sparse atrices We next bound the operator nor of the projection π Γ, restricted to row- and colun-sparse atrices Ψ c. Lea 4.10 (Representation of iterates). Let Θ, Γ, Ω be defined as above. Suppose that π Ω π Θ, < 1. Then for all M Ω, π Γ M] = π Ω π Θ k=0 (π Θ π Ω π Θ ) k π Θ M]. (81) Proof. or M Ω, vecπ Γ M] = Q I P I,Ω ] P I = Q I P I,Ω ] P I ] 1 Q I Ω, ] vecm] ] 1 Q vecm] 0 ]. Under the condition of the lea, P, < 1 so the above inverse is indeed well-defined, and can be expressed via the Schur copleent forula: vecπ Γ M] = Q (I P P ) 1 QvecM] I,Ω P (I P P ) 1 QvecM] = π Ω Q (I P P ) 1 QvecM] = π Ω Q (P P ) k QvecM]. k=0 3

24 Recognizing that Q (P P ) k QvecM] = vec k=0 k=0 π Θ (π Θ π Ω π Θ ) k π Θ M] ] copletes the proof. Lea ix any c (0, 1). Consider (U, V, Ω) drawn fro the rando orthogonal odel of rank r <, with Bernoulli error probability ρ s. urther suppose that r and ρ s satisfy Then with probability at least 1 exp ( C + O(log )), ξ c 3 9 where H(c) is the base-e binary entropy function. τ ( r, ρ ) 1 s 4. (8) ( r + c + ) H(c), (83) Proof. ro the above representation, whenever π Ω π Θ, < 1, ξ c = sup π Ω π Θ (π Θ π Ω π Θ ) k π Θ M] M Ψ c k=0 M 1 Θ π Ω π Θ ) k=0(π k sup π Θ M], M Ψ c M 1 Now, 1 1 π Θ π Ω π Θ, sup M Ψ c M 1 π Θ M]. sup M Ψ c M 1 π Θ M] sup M Ψ c M 1 sup I c U I,, + π U M + Mπ V sup J c V J,,. Identify U with the orthogonalization Z(Z Z) 1/ of an r iid N (0, 1 ) atrix Z. Then for any given I ] of size c, U I,, Z I,, σ. Now, in(z) P σ in (Z) > 1 ] ( ) r/ t 1 < exp t 1. (84) 4

25 Meanwhile, for each I of size c, again P Z I,, > r/ + ] ( ) c + t < exp t. (85) There are at ost exp (H(c) + O(log )) such subsets I, where H( ) denotes the base-e binary entropy function, so ] P ax I ( ] c) Z I,, > r/ + c + t < exp ( (t / H(c)) + O(log ) ). (86) Choosing t 1 = 1 1 r, and set t = H(c) and cobining bounds gives r/ + c + H(c) ξ c (1 4τ(r/, ρ s ) )(1 r/). (87) Noticing that r/ < τ and applying the bound τ < 1/4 to the ters in the denoinator copletes the proof. 4.4 Controlling the initial violations In this section, we analyze the initial dual vector W 0 given by the iniu robenius nor solution to the equality constraints of the optiality condition (19), and show that for any constant β, the probability that S 5 UV + W 6 0 ] > 3β approaches zero. We will treat the parts of UV + W 0 = UV + H UV ] + H λ sign(e 0 )] separately, and use the following siple lea to cobine the bounds: Lea 4.1. or all atrices A, B, and α (0, 1), Sγ Ωc A + B] Sαγ Ωc A] + S(1 α)γ Ωc B]. (88) Proof. Notice that for scalars x, S γ x] is a convex nonnegative function. X R, Sγ Ωc X] is again convex (see, e.g, 3] Exaple 3.14), and so S Ωc γ A + B] = Sγ Ωc α Sγ Ωc A α α A B α + (1 α) 1 α] ] + (1 α) S Ωc γ = S Ωc αγ A] + S Ωc (1 α)γ B]. B 1 α] Hence, for atrices The strategy, then, is to bound the expectation of Sλ/3 Ωc ] for each of the three ters, using the following lea. It will turn out that for any prespecified p, we can choose C 0 such if r < C 0 log, the expected robenius nor of the violations is O( p ); an application of the Markov inequality then bounds the probability that the value deviates above any fixed constant β. 5
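The subgaussian soft-thresholding estimates that drive this strategy behave as advertised in simulation: for X ~ N(0, 1), the second moment of the soft-thresholded variable S_γ(X) decays like exp(−γ²/2). A quick Monte Carlo sketch, with our own illustrative threshold γ:

```python
import numpy as np

rng = np.random.default_rng(4)
gamma = 2.0                        # illustrative threshold
x = rng.standard_normal(1_000_000)

# Soft thresholding: S_gamma(x) = sign(x) * max(|x| - gamma, 0).
y = np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
second_moment = np.mean(y ** 2)

# The subgaussian tail exp(-t^2/2) predicts decay on the scale exp(-gamma^2/2).
subgaussian_bound = np.exp(-gamma ** 2 / 2)
```

For γ = 2 the empirical second moment is about 0.012, an order of magnitude below exp(−2) ≈ 0.135, consistent with the exponential decay the lemmas below exploit.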

26 Lea Let X be a syetric rando variable satisfying the (subgaussian) tail bound P X t] exp( Ct ). Let Y. = S γ (X). Then E Y ] C exp ( Cγ ). (89) Proof. Since X is syetric, Y is also syetric, so EY ] = t PY t]dt 4 (s γ) exp( Cs )ds 4 γ γ 0 t exp( C(t + γ) )dt s exp( Cs )ds. Lea Let X be a rando variable satisfying a tail bound of the for Suppose γ > a and let Y. = S γ (X). Then Proof. EY ] = = 0 γ P X a + t] C 1 exp( C t ). E Y ] C 1 C exp ( C (γ a) ). (90) t P Y t]dt = 0 tp X γ + t]dt (s γ)p X s]ds C 1 (s γ) exp ( C (s a) ) ds = C 1 (q + a γ) exp( C q )dq C 1 exp ( C (γ a) ). C γ a γ In the previous section, we were interested in controlling the quadratic products x W 0 y involving the initial dual vector W 0 and arbitrary unit vectors. Here, because the soft thresholding operator S γ acts eleentwise, in this section, we require tighter control over e i W 0e j. Because there are only pairs (e i, e j ), uch tighter control can be established, as is foralized in the following lea: Lea Consider U, V drawn fro the rando orthogonal odel of rank r. or any κ > 0, define the events 1 E U (κ) : ax U i, i κ log, 1 E V (κ) : ax V i, i κ log. 6

27 Then if r < C0 log() for soe C 0 < 1 4κ, On these good events, P U,V E U (κ) E V (κ)] 1 exp ax i,j π Θ e i e j ] ( (κ 1/ C 0 ) ). (91) 8 log κ log. (9) Proof. Notice that f(u) =. U i, is a 1-Lipschitz function of U. Since i U i, = U = r, by syetry E U i, ] = r. By the Markov inequality, f has a edian no larger than Ef] Ef ] = r. Invoking Lipschitz concentration on W r and union bounding over the choices of i, ] ( 1 P ax U i, > exp ( ) ) 1 r i κ log 8 κ log. (93) An identical calculation applies to E V (κ). Suing the two probabilities of failure copletes the proof. Lea Let (U, V ) be distributed according to the rando orthogonal odel of rank r <. or any fixed Ω ] ] and κ > 0, on the good event E U (κ) E V U S Ωc 1 UV 4 ] ( 1 3 κ 144 ). (94) κ log Proof. ix U and consider UV ] i,j = U i, Vj, as a U i, -Lipschitz function of V. On E U (κ), the Lipschitz constant is bounded by (κ log ) 1/, and ( ) P V U UV ] ij > t] < exp κt log. (95) 8 By Lea 4.13, on E U, ( ) ] E V U S 1 UV ] 3 ij ] Hence, suing over pairs (i, j) Ω c E V U S Ω c 1 3 UV ] Applying the Cauchy-Schwarz inequality copletes the proof. 16 ( κ log exp κ ) 7 log(). (96) ] 16 κ κ log (1 7 ). (97) Lea ix any β > 0, κ > 0. Consider (U, V, Ω) drawn fro the rando orthogonal odel of rank r, with Bernoulli error probability ρ s, and suppose that r, ρ s satisfy r < C 0 log, C 0 < 1 4κ, τ ( r, ρ ) 1 s < 4. (98) Then there exists 0, C 1, C > 0 such that for all > 0, S Ω P c U,V,Ω 1 H UV ] ] ] > β C 1 1 κ β κ log + P U,V,Ω E U (κ) c E V (κ) c EΩΘ] c. 7

28 Proof. We use a siilar splitting trick to that in Lea 4.7. Again let Ũ and Ṽ be uniforly distributed on W 1. Identify U and V with their first r coluns, and let Û and ˆV denote the reaining r 1 coluns. Write S Ωc 1 H UV ] ] = S Ωc 3 1 H ŨṼ Û ˆV ]] 3 S Ωc 1 H ŨṼ ]] + S Ωc 6 1 H Û ˆV ]]. 6 We first address the second ter. On the event E ΩΘ, S Ωc 1 H Û ˆV ]] = 6 SΩc 1 π 6 Θ π Ω Ω π Θ π Ω ) k=0(π k π Ω Û ˆV ]] = SΩc 1 π 6 Ω π Θ π Ω Ω π Θ π Ω ) k=0(π k π Ω Û ˆV ]] = SΩc 1 π 6 Θ π Ω Ω π Θ π Ω ) k=0(π k π Ω Û ˆV ]]. Here, we have used that π Ω π Θ π Ω = π Ω (I π Θ )π Ω = π Ω π Θ π Ω. Now, since for any R SO( r 1), the joint distribution of (Ũ, Ṽ, Ω) is invariant under the ap ] I 0 Ũ Ũ. (99) 0 R Moreover, this ap preserves U and V, and therefore H. Hence, if R is distributed accord- ing to the invariant easure on SO( r 1), SΩc 1 H Û ˆV ]] is equal in distribution to 6 π Θ π Ω k=0 (π Ωπ Θ π Ω ) k π Ω ÛR ˆV ]]. Consider the i, j eleent of this atrix, SΩc 1 6 e i π Θ π Ω k=0 (π Ω π Θ π Ω ) k π Ω ÛR ˆV ]e j = ( ) Û (π Ω π Θ π Ω ) k π Ω π Θ e i e j ] ˆV, R. = M, R. k=0 This is a M -Lipschitz function of R. On E U (κ) E V (κ) E ΩΘ, π Ω π Θ, M π Θ e i e 4τ 1 j ] 1 π Ω π Θ π Ω, 1 4τ κ log() Let δ =. r 1. Since E R M, R ] = 0, the tail bound (53) iplies that ( 9 κδ log()t where P R M M, R > q() + t] < exp q() = κ log. ), (100) 4 8π 3δ κ log. (101) 8

29 ] By Lea 4.14, E R Ũ,Ṽ S,Ω 1 M, R ] κδ log() exp κδ log() 3 ( 1 4 δ 8π κ log() ). (10) or copactness, let ζ(r/, ρ s, ). = 1 κδ 3 ( ) 1 4 8π. (103) δ κ log() Then suing over the eleents in Ω c gives that on the event E U (κ) E V (κ) E ΩΘ, S Ω E c R Ũ,Ṽ,Ω 1 H ÛR ˆV ]] ] κδ log ζ. (104) By the Markov inequality, on E U (κ) E V (κ) E ΩΘ, P R Ũ,Ṽ,Ω S Ω c 1 6 H ÛR ˆV ]] > β An identical calculation shows that if we set R = So, P U,V,Ω S Ω c 1 3 P R Ũ,Ṽ,Ω S Ω c 1 6 H UV ] ] I 0 0 R H Ũ RṼ ]] > β ] < ] > β Under the assuptions of the Lea, δ = < 3 ζ/ ] 16 ζ/ < 3βδ κ log(). (105) ] SO( 1), on E U (κ) E V (κ) E ΩΘ 16 ζ/ 3βδ κ log(). (106) 3βδ κ log() + P(E U (κ) E V (κ)) c ] + PE c ΩΘ]. r 1 that for all > δ, δ(, r) > 1 and so ζ 1 κ 64 1 C0 log 1. Hence, δ > 0 such ( π κ log ) urtherore, for any κ, κ such that for > κ, 16 8π κ log < 1 4. or > ax( δ, κ ), ζ < 1 κ 104. or such sufficiently large, the ultiplier 3 the stateent of the lea. 3δ 3 3. Choosing this value for C 1 and setting 0 = ax( δ, κ ) gives Lea 4.18 (Box violations induced by error). ix any α (0, 1). Let (U, V, Ω, Σ) be distributed according to the rando orthogonal odel of rank r and with error probability ρ s, with r and ρ s satisfying τ(r/, ρ s ) < 1/4. (107) Then on the good event E U (κ) E V (κ) E ΩΘ, E Σ U,V,Ω S Ω c α H λ sign(e 0 )] ] κ log() 1 ακ 4. (108) 9

30 Proof. We can exploit independence of Ω and Σ by writing sign(e 0 ) = π Ω Σ]. ro the representation in the previous lea, S Ωc α H λ sign(e 0 )] ] = S α (π Θ π Ω ) k λ Σ]]. The i, j eleent of the atrix of interest is (π Θ π Ω ) k λ Σ] e j = λ (π Ω π Θ ) k π Θ e i e j ], Σ e i k=1 k=1 k=1. = M, Σ. On E U (κ) E V (κ), π Θ e i e j ] 1 κ log. On E ΩΘ, k=1 (π Θπ Ω ) k, τ 1 τ 1 M κ log. The sae Martingale arguent as in Lea 4.9 shows that Then by Lea 4.13, 1, and so ) ) P Σ M, Σ > t] < exp ( t κ log()t M = exp (. (109) ( ) ] E Σ U,V,Ω S α M, Σ ] ( κ log() exp ) ακ log(). (110) Suing over the eleents in Ω c to bound E ] and then applying the Cauchy-Schwarz inequality establishes the result. The three leas in this section cobine to yield the following bound on the robenius nor of the violations. Corollary 4.19 (Control of box violations). ix any β > 0, p > 0. There exist constants C 0 (p) > 0, ρ s > 0, 0 with the following property: if > 0 and (U, V, Ω, Σ) are distributed according to the rando orthogonal odel of rank r < C 0 (p) log, (111) with Bernoulli error probability ρ s ρ s and rando signs, then S ] Ω P c U,V,Ω,Σ 5 UV + W 6 0 ] 3β 1 C β p. Proof. By Lea 4.1, S Ωc 5 UV + W 6 0 ] UV ] S Ωc S Ωc 1 3 H UV ]] + S Ωc 1 6 H λ sign(e 0 )] ]. We use the three previous leas to estiate each of these ters. Set κ = 048p + 104, and C 0 = 1 16κ. ro Lea 4.16 and the Markov inequality, for any Ω, on E U, ] P V U S Ω c 1 3 UV ] β 30 4 β κ log() 1/ κ/144,

31 and so S Ω P c U,V,Ω,Σ 1 UV ] 3 ro Lea 4.18, P U,V,Ω,Σ S Ωc 1 6 ] 4 β β κ log() 1/ κ/144 + PE U (κ) c ] = o ( p). H λ sign(e 0 )] ] ] β κ log() 1/ κ/4 + P (E U (κ) E V (κ)) c ] + P E c ΩΘ] = o ( p). inally, by Lea 4.17 P U,V,Ω,Σ S Ωc 1 3 H UV ] ] ] β = C 1 β κ log() 1/ κ/048 + P (E U (κ) E V (κ)) c ] + P E c ΩΘ] C 1 p β κ log() + o ( p). Suing the three failure probabilities copletes the proof. 4.5 Pulling it all together We close by fitting the probabilistic leas in the previous three sections together to give a proof of our ain result, Theore.4. Proof. We show that C 0 and 0 can be selected such that with high probability the conditions of Lea 3.3 are satisfied. Set ε = 1 6. ix 0 large enough and c > 0 sall enough that on the good event fro Lea 4.11, as long as C 0 1, ξ c 3 ( 1 + c 9 log + ) H(c ) 1. (11) 0 We then ust show that UV + W 0 1, + Notice that S Ωc 5 6 UV + W 0 ] 5 6 c. (113) UV + W 0 1, UV 1, + W 0 1, ax V i, + W 0,. (114) i ro Leas 4.7 and 4.9, for any choice of C 0, W 0, is with overwheling probability bounded by a linear function of τ. or any fixed C 0, li τ(r/, ρ s ) = ρ s. Hence, by choosing 0 large and ρ s sall, we can bound W 0, 5 c (115) 18 with overwheling probability. Meanwhile, on the overwhelingly likely good event E V, V i, 5 c (116) 18 31

for m sufficiently large. Finally, fix β = (5/36)√c and choose C_0 small enough and m_0 large enough that the conditions of Corollary 4.19 are satisfied. Then, with probability at least 1 − C m^{−p}, (25) is satisfied. The same calculations apply to (26). Finally, on these good events,

    ‖W_0‖_{2,2} + (1/(1 − ξ_c)) ‖S^{Ω^c}_{5λ/6}[UV* + W_0]‖_F ≤ (5/18)√c + (5/36)√c / (1 − ξ_c) < 1,    (117)

so (27) holds. By Lemma 3.3, then, there exists a certifying dual vector, and the proof is complete.

5 Implications for Low-Rank Matrix Completion

Our result has strong implications for the low-rank matrix completion problem studied in [6, 8, 5]. In matrix completion, the goal is to recover a low rank matrix A_0 from an observation consisting of a known subset Υ := [m] × [m] \ Ω of its entries.⁶ [6] and [8] studied the following convex programming heuristic for the matrix completion problem:

    Â_0 = arg min_A ‖A‖_*   subj   A(i, j) = A_0(i, j)  ∀(i, j) ∈ Υ.    (118)

Duality considerations in [6] give the following characterization of the A_0 that can be recovered by solving (118):

Lemma 5.1 ([6], Lemma 3.1). Let A_0 ∈ R^{m×m} be a rank-r matrix with reduced singular value decomposition USV*. As above, let

    Θ := span( {UM* | M ∈ R^{m×r}} ∪ {MV* | M ∈ R^{m×r}} ) ⊂ R^{m×m}.    (119)

Suppose there exists a matrix Y such that

    { π_Θ[Y] = UV*,  π_{Υ^c}[Y] = 0,  ‖π_{Θ^⊥}[Y]‖_{2,2} < 1 }.    (120)

Then A_0 is the unique solution to the semidefinite program

    min ‖A‖_*   subj   π_Υ[A] = π_Υ[A_0].    (121)

Candès and collaborators then consider the minimum Frobenius norm solution Y_0 to the system of equations (120) and show that if the number of observations |Υ| is large enough, ‖π_{Θ^⊥}[Y_0]‖_{2,2} is bounded below one with high probability. Recall that in Section 4, we analyzed the operator norm of the minimum Frobenius norm solution to a similar system of equations. In fact, we will see that Y_0 is exactly equal to UV* + W_0^{UV}, where W_0^{UV} is the part of the initial (RPCA) dual vector that was induced by the singular vectors of A_0. Using Lemma 4.7, the operator norm of W_0^{UV} can be bounded below one with overwhelming probability, even in a proportional growth setting.
This yields Theorem 2.5, which we formally prove below.

⁶In [6], Ω denotes this set.

33 Proof. Notice that (10) is feasible if and only if the syste { π Θ W ] = 0, π Υ W ] = π Υ UV ], W, < 1 } (1) is feasible in W =. Y UV. Here, Υ is the set of issing eleents fro the atrix to be copleted; we can identify this set with the set of corrupted entries Ω in the low-rank recovery proble that is the ain focus of this paper. This is again a rando subset of ] ] in which the inclusion of each pair (i, j) is an independent Bernoulli(ρ s ) rando variable. Hence, if we can show that the atrix W0 UV defined in Lea 4.7 as the iniu robenius nor solution to the equality constraints π Θ W ] = 0, π Ω W ] = π Ω UV ] has operator nor bounded below 1, we will have further established that A 0 uniquely solves (118), and hence can be efficiently recovered by nuclear nor iniization. Lea 4.7 iediately iplies Theore.5 as follows. irst, notice that ax(ρ r, ρ s ) < 1 89 = τ(ρ r, ρ s ) < 1 4. Under this condition, with probability at least 1 exp (C + O(log )), the iniizer W UV ( W0 UV, 64τ(ρ r, ρ s ) is uniquely defined and satisfies ) ρ r 1 π (1 ρ r ) 1. (13) Choose 0 large enough that the second ter is < 1 3 and ρ r 4. Then on the above 1 good event, W0 UV, < 56τ(ρ r, ρ s ) + 1. or ρ r, ρ s sufficiently sall, this is bounded below one: for exaple, ax(ρ r, ρ s ) < suffices. 7 Thus, atrices A 0 distributed according to the rando orthogonal odel of rank as large as r = C can be recovered fro incoplete subsets Υ of size (1 ρ s ). This is the first result to suggest that atrix copletion should succeed in such a proportional growth scenario. The previous best result for A 0 distributed according to the rando orthogonal odel, due to 8], showed that Υ = C r log 8 () (14) observations suffice. When r, this result becoes epty, since in this case the prescribed C log 8 () nuber of easureents exceeds. 
The fact that our result holds in proportional growth may seem surprising in light of discussions in [8]: as discussed there, if all one assumes is that the singular spaces of A_0 are incoherent with the standard basis, then |Υ| = O(mr log m) measurements are necessary. It is important to note, though, that the singular spaces of matrices A_0 distributed according to the random orthogonal model satisfy rich regularity properties in addition to incoherence. In particular, the submatrix norm considerations used in bounding ‖π_Θ π_Ω‖ in our proof of Theorem 2.4 can be viewed as a kind of restricted isometry property for the singular vectors U and V. Furthermore, our proofs make heavy use of the independence of U and V in the random orthogonal model. Independence and orthogonal invariance allow us to introduce an auxiliary randomization R over the choice of basis for U, decoupling H (which only depends on the subspace spanned by U, not on any choice of basis for the space) from UV*, which very clearly does depend on the choice of basis. Finally, one should also note that our Theorem 2.5 only supersedes (14) for relatively large rank r. For smaller (say fixed) rank r, (14) remains the strongest available result for matrix completion in the random orthogonal model.

⁷These estimates are clearly pessimistic: larger constant fractions ρ_r are possible. However, the argument given here already suffices to establish that matrix completion succeeds in a new regime of matrices whose rank is a constant fraction of the dimensionality.
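As an illustration of completion in this regime, a singular-value-thresholding iteration in the spirit of the algorithms cited above recovers a random low-rank matrix from a constant fraction of its entries. All parameters here (τ, δ, the iteration count, and the problem sizes) are heuristic choices of ours, not values prescribed by the analysis:

```python
import numpy as np

rng = np.random.default_rng(5)
m, r, p_obs = 100, 3, 0.6                    # illustrative rank and observed fraction

A0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))
mask = rng.random((m, m)) < p_obs            # observed set Upsilon

def shrink_singular_values(M, tau):
    """Soft-threshold the singular values of M at level tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

tau, delta = 5.0 * m, 1.5                    # heuristic parameter choices
Y = np.zeros((m, m))
for _ in range(500):
    A = shrink_singular_values(Y, tau)       # current low-rank estimate
    Y += delta * np.where(mask, A0 - A, 0.0) # dual update on observed entries only

rel_err = np.linalg.norm(A - A0) / np.linalg.norm(A0)
```

With 60% of the entries observed and rank 3, the relative recovery error drops to a small value after a few hundred iterations.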

6 Iterative Thresholding for Corrupted Low-rank Matrix Recovery

There are a number of possible approaches to solving the robust PCA semidefinite program (2). For small problem sizes, interior point methods offer superior accuracy and convergence rates. However, off-the-shelf interior point solvers become impractical for data matrices larger than about 70 × 70, due to the O(m⁶) complexity of solving the associated Newton system. In this section, we propose an alternative first-order method with far better scaling behavior, with only O(m³) complexity. It can easily work for matrices up to 1,000 × 1,000 on a desktop PC. We will use this algorithm for all of our experiments in the next section.

Our approach is a straightforward combination of iterative thresholding algorithms for nuclear norm minimization [4] and ℓ¹-norm minimization [31]. As such, it solves a slightly modified version of (2), in which a small multiple of the Frobenius norm is appended:

    min_{A,E}  ‖A‖_* + λ‖E‖₁ + (1/2τ)‖A‖²_F + (1/2τ)‖E‖²_F   subj   A + E = D.    (125)

Here, τ is a large constant; as τ → ∞, we expect the solution to coincide with that of (2).⁸ Our algorithm for solving (125) combines the algorithms of [4] for nuclear norm minimization and [31] for ℓ¹-norm minimization. It maintains an estimate of a dual variable Y ∈ R^{m×n} consisting of the Lagrange multipliers for the equality constraint D = A + E. The algorithm proceeds by soft thresholding the singular values of Y to obtain an estimate of A, and soft thresholding the matrix entries themselves to obtain an estimate of E. This is made precise in Algorithm 1 below.

Algorithm 1: Iterative Thresholding for Corrupted Low-rank Matrix Recovery
Input: Observation matrix D ∈ R^{m×n}, weights λ, τ.
while not converged
    (U, S, V) ← svd(Y_{k−1}),
    A_k ← U [S − τI]₊ V*,
    E_k ← sign(Y_{k−1}) ∘ [|Y_{k−1}| − λτ 11*]₊,
    Y_k ← Y_{k−1} + δ_k (D − A_k − E_k),
end
Output: A.

We prove that the iteration in Algorithm 1 converges to the global optimum of (125). The proof is a straightforward modification of the arguments of [4]. Let φ_τ(A, E) := τ‖A‖_* + λτ‖E‖₁ + ½‖A‖²_F + ½‖E‖²_F.

Lemma 6.1.
Let (Z_1, Z_2) ∈ ∂φ_τ(A, E) and (Z_1′, Z_2′) ∈ ∂φ_τ(A′, E′). Then

  ⟨Z_1 − Z_1′, A − A′⟩ + ⟨Z_2 − Z_2′, E − E′⟩ ≥ ‖A − A′‖_F^2 + ‖E − E′‖_F^2.

Proof. Since (Z_1, Z_2) ∈ ∂φ_τ(A, E), Z_1 ∈ ∂(τ‖A‖_* + (1/2)‖A‖_F^2) and Z_2 ∈ ∂(λτ‖E‖_1 + (1/2)‖E‖_F^2). We can therefore write Z_1 = τ(UV^T + Q) + A, where USV^T is a compact singular value decomposition of A, ‖Q‖ ≤ 1, U^T Q = 0, and QV = 0. A similar expression can be given for Z_1′. So,

  ⟨Z_1 − Z_1′, A − A′⟩ = τ⟨UV^T + Q − U′V′^T − Q′, A − A′⟩ + ‖A − A′‖_F^2
   = τ( ‖A‖_* + ‖A′‖_* − ⟨UV^T + Q, A′⟩ − ⟨U′V′^T + Q′, A⟩ ) + ‖A − A′‖_F^2,   (16)

8 This can most likely be shown by a similar argument to Theorem 3.1 of [4].

where we have used that ⟨UV^T, A⟩ = tr[V U^T U S V^T] = tr[S] = ‖A‖_*. Since ⟨UV^T + Q, A′⟩ ≤ ‖A′‖_* (and likewise for ⟨U′V′^T + Q′, A⟩), the right-hand side of (16) is ≥ ‖A − A′‖_F^2. Similarly, Z_2 = λτW + E, where W_ij = sign(E_ij) for E_ij ≠ 0, and W_ij ∈ [−1, 1] if E_ij = 0. So, we have that

  ⟨Z_2 − Z_2′, E − E′⟩ = λτ⟨W − W′, E − E′⟩ + ‖E − E′‖_F^2
   = λτ( ⟨W, E⟩ + ⟨W′, E′⟩ − ⟨W, E′⟩ − ⟨W′, E⟩ ) + ‖E − E′‖_F^2
   = λτ( ‖E‖_1 + ‖E′‖_1 − ⟨W, E′⟩ − ⟨W′, E⟩ ) + ‖E − E′‖_F^2.

Since again ⟨W, E′⟩ ≤ ‖E′‖_1 and ⟨W′, E⟩ ≤ ‖E‖_1, this quantity is bounded below by ‖E − E′‖_F^2. Combining the bounds for the two parts completes the proof.

As in [4], let D_α denote the singular value shrinkage operator, which maps a matrix A with SVD A = USV^T to U[S − αI]_+ V^T, and let S_β denote the entrywise soft thresholding operator, (S_β[E])_ij = sign(E_ij) [ |E_ij| − β ]_+.

Lemma 6.2. The thresholding operator ψ_{α,β} : (A, E) ↦ (D_α[A], S_β[E]) solves the optimization problem

  min_{B_1, B_2} α‖B_1‖_* + β‖B_2‖_1 + (1/2)‖B_1 − A‖_F^2 + (1/2)‖B_2 − E‖_F^2.   (17)

Proof. This follows from the optimality properties of D_α and S_β individually.

Theorem 6.3. Suppose that 0 < δ_k < 1 for all k. Then the sequence (A_k, E_k) obtained by Algorithm 1 converges to the unique global optimum of (15).

Proof. Suppose that (A★, E★) is optimal for (15), and Y★ is the matrix of Lagrange multipliers for the constraint A + E = D. Then there exist (Z_1★, Z_2★) ∈ ∂φ_τ(A★, E★) such that Z_1★ − Y★ = 0 and Z_2★ − Y★ = 0. Similarly, since (A_k, E_k) = ψ_{τ,λτ}(Y_{k−1}, Y_{k−1}), by Lemma 6.2,^9 there exist (Z_1^k, Z_2^k) ∈ ∂φ_τ(A_k, E_k) such that Z_1^k − Y_{k−1} = 0 and Z_2^k − Y_{k−1} = 0. Combining the two, we can write

  Y_{k−1} − Y★ = Z_1^k − Z_1★   and   Y_{k−1} − Y★ = Z_2^k − Z_2★.

This implies, via Lemma 6.1, that

  ⟨A_k − A★, Y_{k−1} − Y★⟩ = ⟨A_k − A★, Z_1^k − Z_1★⟩ ≥ ‖A_k − A★‖_F^2,
  ⟨E_k − E★, Y_{k−1} − Y★⟩ = ⟨E_k − E★, Z_2^k − Z_2★⟩ ≥ ‖E_k − E★‖_F^2.

Now, using D = A★ + E★,

  ‖Y_k − Y★‖_F^2 = ‖Y_{k−1} − Y★ + δ_k(D − A_k − E_k)‖_F^2
   = ‖Y_{k−1} − Y★‖_F^2 + 2δ_k⟨Y_{k−1} − Y★, D − A_k − E_k⟩ + δ_k^2 ‖D − A_k − E_k‖_F^2
   = ‖Y_{k−1} − Y★‖_F^2 − 2δ_k⟨Y_{k−1} − Y★, A_k − A★⟩ − 2δ_k⟨Y_{k−1} − Y★, E_k − E★⟩ + δ_k^2 ‖D − A_k − E_k‖_F^2
   ≤ ‖Y_{k−1} − Y★‖_F^2 − 2δ_k‖A_k − A★‖_F^2 − 2δ_k‖E_k − E★‖_F^2 + 2δ_k^2 ‖A_k − A★‖_F^2 + 2δ_k^2 ‖E_k − E★‖_F^2,

where the last step bounds ‖D − A_k − E_k‖_F^2 = ‖(A★ − A_k) + (E★ − E_k)‖_F^2 ≤ 2‖A_k − A★‖_F^2 + 2‖E_k − E★‖_F^2.

9 Notice that, with (α, β) = (τ, λτ), the objective function in (17) can be written as φ_τ(B_1, B_2) − ⟨(B_1, B_2), (A, E)⟩ + (1/2)‖(A, E)‖_F^2.
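For concreteness, the iteration analyzed above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the authors' reference code: the stopping rule, the fixed step size delta, and all function and variable names are our own assumptions.

```python
import numpy as np

def sv_shrink(Y, tau):
    """D_tau: soft-threshold the singular values of Y (Lemma 6.2)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def soft(Y, beta):
    """S_beta: entrywise soft thresholding."""
    return np.sign(Y) * np.maximum(np.abs(Y) - beta, 0.0)

def robust_pca_it(D, lam, tau, delta=0.9, max_iter=20000, tol=1e-4):
    """Iterative-thresholding sketch of Algorithm 1 for problem (15):
    min ||A||_* + lam ||E||_1 + (1/2 tau)(||A||_F^2 + ||E||_F^2)
    subject to A + E = D."""
    Y = np.zeros_like(D)              # dual variable (Lagrange multipliers)
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        A = sv_shrink(Y, tau)         # A_k = U [S - tau I]_+ V^T
        E = soft(Y, lam * tau)        # E_k = sgn(Y) [|Y| - lam tau]_+
        R = D - A - E                 # primal residual
        Y = Y + delta * R             # dual ascent step, 0 < delta < 1
        if np.linalg.norm(R, 'fro') <= tol * max(np.linalg.norm(D, 'fro'), 1.0):
            break
    return A, E
```

Per Theorem 6.3, with the step sizes in (0, 1) the residual D − A_k − E_k is driven to zero, so the loop's exit test is a natural (though heuristic) convergence criterion.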

36 Hence, if 0 < δ k < 1, the sequence Y k Y is nonincreasing. This sequence therefore converges to a liit (possibly 0), and therefore A k A 0 and E k E 0, establishing the result. 7 Siulations and Experients In this section, we first perfor several siulations corroborating the our theoretical results and clarifying their iplications. We then sketch applications to two coputer vision tasks involving the recovery of intrinsically low-diensional data fro gross corruption: background estiation fro video and face alignent under varying illuination Siulation: proportional growth. We deonstrate the efficacy of the proposed iterative thresholding algorith on randoly generated atrices. or siplicity, we restrict our exaples to square atrices. We draw A 0 according to the independent rando orthogonal odel described earlier. We generate E 0 as a sparse atrix whose support is chosen uniforly at rando, and whose non-zero entries are independent and uniforly distributed in the range 500, 500]. We apply our proposed algorith on the atrix D =. A 0 + E 0 to recover  and Ê. The results are presented in Table 1. or these experients, we choose λ = 1/ and τ = 10, 000. We observe that the proposed algorith is successful in recovering A 0 even when 10% of its entries are corrupted. The relative error in recovering A 0 decreases with increase in diensionality. We note that though the algorith in soe cases overestiates the rank of A, we observe that the extra nonzero singular values of  tend to have very sall agnitude. rank(a 0 ) E 0 0  A 0 A 0 rank(â) Ê 0 #iterations tie (s) , , ,011 8, , ,014 10,000 6, , ,081 6,775 31, , ,005 10, , ,0 10, , ,091 10,000 6, , ,55 10,000 35,981 Table 1: Proportional growth. Here the rank of the atrix grows in proportion (5%) to the diensionality ; and the nuber of corrupted easureents grows in proportion to the nuber of entries, top 5% and botto 10%, respectively. 
10 Note that it might seem to be overkill to cast these relatively simple vision tasks as robust PCA problems. There may exist easier and faster solutions that harness special domain information or additional structures in the visual data. Here, we merely use these intuitive examples and data to illustrate how our algorithm can be used as a general tool to effectively extract low-dimensional and sparse structures from real data.

7.2 Simulation: phase transition w.r.t. (ρ_r, ρ_s). We examine how the rank of A_0 and the proportion of errors in E_0 affect the performance of our algorithm. We fix m = 200 and then construct a random matrix pair (A_0, E_0) satisfying rank(A_0) = ρ_r m and ‖E_0‖_0 = ρ_s m^2, where both ρ_r and ρ_s vary between 0 and 1. The parameters λ and τ are the same as in the previous experiment. For each choice of ρ_r and ρ_s, we apply the proposed algorithm to D, and term a given experiment a success if the relative error ‖Â − A_0‖_F / ‖A_0‖_F falls below a small fixed threshold. Each experiment is repeated over 10 trials, and the results are plotted in Figure 1 (left). The color of each cell reflects the empirical recovery rate: white denotes perfect recovery in all trials, and black denotes failure in all trials. We see a relatively sharp phase transition between success and failure of the algorithm near the line ρ_r + ρ_s = 0.5. To verify this behavior, we repeat the experiment, but only vary ρ_r and ρ_s between 0 and 0.3. These results, shown in Figure 1 (right), confirm that the phase transition remains fairly sharp even at higher resolution. These results are conservative, and can potentially be improved by increasing the maximum number of iterations (here chosen to be 5,000), or by varying the choices of λ and τ.

Figure 1: Phase transition w.r.t. (ρ_r, ρ_s). Left: (ρ_r, ρ_s) ∈ [0, 1]^2. Right: (ρ_r, ρ_s) ∈ [0, 0.3]^2.

7.3 Experiment: background modeling from video. Background modeling, or subtraction, from video sequences is a popular approach to detecting activity in a scene, and finds application in video surveillance from static cameras. The key problem here is to guarantee robustness to partial occlusions and illumination changes in the scene. Background subtraction fits nicely into our model: if the individual frames are stacked as columns of a matrix D, then D can be expressed as the sum of a low-rank background matrix and a sparse error matrix representing the activity in the scene. We illustrate this idea with two examples (see Figures 2 and 3).^10
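A minimal sketch of the frame stacking just described (the function names are ours): each frame becomes one column of D, and a low-rank/sparse split of D is read back frame by frame.

```python
import numpy as np

def frames_to_matrix(frames):
    """Stack h-by-w frames as the columns of D (one column per frame)."""
    return np.stack([f.ravel() for f in frames], axis=1)

def matrix_to_frames(M, shape):
    """Undo the stacking: column k of M becomes frame k of size `shape`."""
    return [M[:, k].reshape(shape) for k in range(M.shape[1])]
```

A perfectly static scene yields a rank-one D, so the background component of a real sequence is naturally low-rank, while moving objects contribute sparse errors to individual columns.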
In Figure 2, the video sequence consists of 200 frames of a scene in an airport. There is no significant change in illumination in the video, but a lot of activity in the foreground. We observe that our algorithm is very effective in separating the background from the activity. In Figure 3, we have 550 frames from a scene in a lobby. There is little activity in the video, but the illumination changes drastically towards the end of the sequence. We see that our algorithm is once again able to recover the background, irrespective of the illumination change.


More information

Highly Robust Error Correction by Convex Programming

Highly Robust Error Correction by Convex Programming Highly Robust Error Correction by Convex Prograing Eanuel J. Candès and Paige A. Randall Applied and Coputational Matheatics, Caltech, Pasadena, CA 9115 Noveber 6; Revised Noveber 7 Abstract This paper

More information

Generalized eigenfunctions and a Borel Theorem on the Sierpinski Gasket.

Generalized eigenfunctions and a Borel Theorem on the Sierpinski Gasket. Generalized eigenfunctions and a Borel Theore on the Sierpinski Gasket. Kasso A. Okoudjou, Luke G. Rogers, and Robert S. Strichartz May 26, 2006 1 Introduction There is a well developed theory (see [5,

More information

OPTIMIZATION in multi-agent networks has attracted

OPTIMIZATION in multi-agent networks has attracted Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

Optimal Jamming Over Additive Noise: Vector Source-Channel Case

Optimal Jamming Over Additive Noise: Vector Source-Channel Case Fifty-first Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 2-3, 2013 Optial Jaing Over Additive Noise: Vector Source-Channel Case Erah Akyol and Kenneth Rose Abstract This paper

More information

Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel

Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel 1 Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel Rai Cohen, Graduate Student eber, IEEE, and Yuval Cassuto, Senior eber, IEEE arxiv:1510.05311v2 [cs.it] 24 ay 2016 Abstract In

More information

Convex Programming for Scheduling Unrelated Parallel Machines

Convex Programming for Scheduling Unrelated Parallel Machines Convex Prograing for Scheduling Unrelated Parallel Machines Yossi Azar Air Epstein Abstract We consider the classical proble of scheduling parallel unrelated achines. Each job is to be processed by exactly

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

A Markov Framework for the Simple Genetic Algorithm

A Markov Framework for the Simple Genetic Algorithm A arkov Fraework for the Siple Genetic Algorith Thoas E. Davis*, Jose C. Principe Electrical Engineering Departent University of Florida, Gainesville, FL 326 *WL/NGS Eglin AFB, FL32542 Abstract This paper

More information

VC Dimension and Sauer s Lemma

VC Dimension and Sauer s Lemma CMSC 35900 (Spring 2008) Learning Theory Lecture: VC Diension and Sauer s Lea Instructors: Sha Kakade and Abuj Tewari Radeacher Averages and Growth Function Theore Let F be a class of ±-valued functions

More information

A note on the realignment criterion

A note on the realignment criterion A note on the realignent criterion Chi-Kwong Li 1, Yiu-Tung Poon and Nung-Sing Sze 3 1 Departent of Matheatics, College of Willia & Mary, Williasburg, VA 3185, USA Departent of Matheatics, Iowa State University,

More information

IN modern society that various systems have become more

IN modern society that various systems have become more Developent of Reliability Function in -Coponent Standby Redundant Syste with Priority Based on Maxiu Entropy Principle Ryosuke Hirata, Ikuo Arizono, Ryosuke Toohiro, Satoshi Oigawa, and Yasuhiko Takeoto

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

Physics 139B Solutions to Homework Set 3 Fall 2009

Physics 139B Solutions to Homework Set 3 Fall 2009 Physics 139B Solutions to Hoework Set 3 Fall 009 1. Consider a particle of ass attached to a rigid assless rod of fixed length R whose other end is fixed at the origin. The rod is free to rotate about

More information

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data Suppleentary to Learning Discriinative Bayesian Networks fro High-diensional Continuous Neuroiaging Data Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, and Dinggang Shen Proposition. Given a sparse

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

The Fundamental Basis Theorem of Geometry from an algebraic point of view

The Fundamental Basis Theorem of Geometry from an algebraic point of view Journal of Physics: Conference Series PAPER OPEN ACCESS The Fundaental Basis Theore of Geoetry fro an algebraic point of view To cite this article: U Bekbaev 2017 J Phys: Conf Ser 819 012013 View the article

More information