Highly Robust Error Correction by Convex Programming


Emmanuel J. Candès and Paige A. Randall

Applied and Computational Mathematics, Caltech, Pasadena, CA 91125

November 2006; Revised November 2007

Abstract

This paper discusses a stylized communications problem where one wishes to transmit a real-valued signal x ∈ Rⁿ (a block of n pieces of information) to a remote receiver. We ask whether it is possible to transmit this information reliably when a fraction of the transmitted codeword is corrupted by arbitrary gross errors, and when in addition, all the entries of the codeword are contaminated by smaller errors (e.g. quantization errors). We show that if one encodes the information as Ax, where A ∈ R^{m×n} (m ≥ n) is a suitable coding matrix, there are two decoding schemes that allow the recovery of the block of n pieces of information x with nearly the same accuracy as if no gross errors occurred upon transmission (or equivalently as if one had an oracle supplying perfect information about the sites and amplitudes of the gross errors). Moreover, both decoding strategies are very concrete and only involve solving simple convex optimization programs, either a linear program or a second-order cone program. We complement our study with numerical simulations showing that the encoder/decoder pair performs remarkably well.

Keywords. Linear codes, decoding of (random) linear codes, sparse solutions to underdetermined systems, ℓ1 minimization, linear programming, second-order cone programming, the Dantzig selector, restricted orthonormality, Gaussian random matrices and random projections.

Acknowledgments. This work has been partially supported by National Science Foundation grants ITR ACI-0204932 and CCF-0515362 and by the 2006 Waterman Award (NSF). E. C. would like to thank the Centre Interfacultaire Bernoulli of the Ecole Polytechnique Fédérale de Lausanne for hospitality during June and July 2006. These results were presented at WavE 2006, Lausanne, Switzerland, July 2006. Thanks to Mike Wakin for his careful reading of the manuscript. We would also like to thank the anonymous referees and the editor for their useful comments.

1 Introduction

This paper discusses a coding problem over the reals. We wish to transmit a block of n real values, a vector x ∈ Rⁿ, to a remote receiver. A possible way to address this problem is to communicate the codeword Ax, where A is an m by n coding matrix with m ≥ n. Now a recurrent problem with real communication or storage devices is that some portions of the transmitted codeword may become corrupted; when this occurs, parts of the received codeword are unreliable and may

have nothing to do with their original values. We represent this as receiving a distorted codeword y = Ax + z̃. The question is whether one can recover the signal x from the received data y. It has recently been shown [7, 6] that one can recover the information x exactly, under suitable conditions on the coding matrix A, provided that the fraction of corrupted entries of Ax is not too large. In greater detail, [7] proved that if the corruption z̃ contains at most a fixed fraction of nonzero entries, then the signal x ∈ Rⁿ is the unique solution of the minimum-ℓ1 approximation problem

    min_{x̃ ∈ Rⁿ} ‖y − Ax̃‖_{ℓ1}.    (1.1)

What may appear as a surprise is the fact that this requires no assumption whatsoever about the corruption pattern z̃ except that it must be sparse. In particular, the decoding algorithm is provably exact even though the entries of z̃ (and thus of y as well) may be arbitrarily large, for example. While this is interesting, it may not be realistic to assume that, except for some gross errors, one is able to receive the values of Ax with infinite precision. A better model would assume instead that the receiver gets

    y = Ax + z̃,    z̃ = e + z,    (1.2)

where e is a possibly sparse vector of gross errors and z is a vector of small errors affecting all the entries. In other words, one is willing to assume that there are malicious errors affecting a fraction of the entries of the transmitted codeword and, in addition, smaller errors affecting all the entries. For instance, one could think of z as some sort of quantization error which limits the precision/resolution of the transmitted information. In this more practical scenario, we ask whether it is still possible to recover the signal x accurately. The subject of this paper is to show that it is in fact possible to recover the original signal with nearly the same accuracy as if one had a perfect communication system in which no gross errors occur upon transmission. Further, the recovery algorithms are very concrete and practical; they involve solving very convenient convex optimization problems.

Before expanding on our results, we would like to comment on the practical relevance of our model. Coding theory generally assumes that data take on values in a finite field, but there are a number of applications where encoding over the reals is of direct interest. We give two examples. The first example concerns Orthogonal Frequency-Division Multiplexing for wireless and wideband digital communication. Here, one can experience deep fades at certain frequencies (because of multipathing, for instance) and/or frequency jamming (because of strong interferers), so that large parts of the data are unreliable. The second example is in the area of digital computations. Here, researchers are currently interested in error correction over the reals to protect real-valued results of onboard computations which are executed by circuits that are subject to faults due, for example, to radiation. As we will see, our work introduces an encoding strategy which is robust to such errors, which runs in polynomial time, and which provably obeys optimal bounds.

To understand the claims of this paper in a more quantitative fashion, suppose that we had a perfect channel in which no gross errors ever occur; that is, we assume e = 0 in (1.2). Then we would receive y = Ax + z and would reconstruct x by the method of least squares which, assuming that A has full rank, takes the form

    x_Ideal = (AᵀA)⁻¹Aᵀy.    (1.3)
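For concreteness, the ideal least-squares baseline (1.3) is a one-liner in any numerical environment. The following minimal sketch (in Python with NumPy; the function name is ours, not from the paper) computes x_Ideal:

```python
import numpy as np

def ideal_decode(A, y):
    """Ideal least-squares baseline (1.3): x_Ideal = (A^T A)^{-1} A^T y.

    This is only 'ideal' when no gross errors occurred (e = 0); it is
    the benchmark against which the paper's decoders are compared."""
    # lstsq computes the minimizer of ||y - A x||_2, i.e. (A^T A)^{-1} A^T y
    return np.linalg.lstsq(A, y, rcond=None)[0]
```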

In this ideal situation, the reconstruction error would then obey

    ‖x_Ideal − x‖_{ℓ2} = ‖(AᵀA)⁻¹Aᵀz‖_{ℓ2}.    (1.4)

Suppose we design the coding matrix A with orthonormal columns so that AᵀA = I. Then we would obtain a reconstruction error whose maximum size is just about that of z. If the smaller errors z_i are i.i.d. N(0, σ²), then the mean-squared error (MSE) would obey

    E‖x_Ideal − x‖²_{ℓ2} = σ² Tr((AᵀA)⁻¹).

If AᵀA = I, then the MSE is equal to nσ². The question then is, can one hope to do almost as well as this optimal mean squared error without knowing e, or even the support of e, in advance? This paper shows that one can in fact do almost as well by solving very simple convex programs. This holds for all signals x ∈ Rⁿ and all sparse gross errors, no matter how adversarial. Two concrete decoding strategies are introduced: one based on second-order cone programming (SOCP) in Section 2, and another based on linear programming (LP) in Section 3. We will discuss the differences between the SOCP and the LP decoders, and then compare their empirical performances in Section 4.

2 Decoding by Second-Order Cone Programming

To recover the signal x from the corrupted vector y (1.2), we propose solving the following optimization program:

    (P2)  min ‖y − Ax̃ − z̃‖_{ℓ1}  subject to  ‖z̃‖_{ℓ2} ≤ ε,  Aᵀz̃ = 0,    (2.1)

with variables x̃ ∈ Rⁿ and z̃ ∈ Rᵐ. The parameter ε above depends on the magnitude of the small errors and shall be specified later. The program (P2) is equivalent to

    min 1ᵀũ  subject to  −ũ ≤ y − Ax̃ − z̃ ≤ ũ,  ‖z̃‖_{ℓ2} ≤ ε,  Aᵀz̃ = 0,    (2.2)

where we added the slack optimization variable ũ ∈ Rᵐ. In the above formulation, 1 is a vector of ones, and the vector inequality u ≤ v means componentwise, i.e., u_i ≤ v_i for all i. The program (2.2) is a second-order cone program and, as a result, (P2) can be solved efficiently using standard optimization algorithms, see [2].

The first key point of this paper is that the SOCP decoder is highly robust against imperfections in communication channels. Here and below, V denotes the subspace spanned by the columns of A, and Q ∈ R^{m×(m−n)} is a matrix whose columns form an orthobasis of V⊥, the orthogonal complement to V. Such a matrix Q is a kind of parity-check matrix since QᵀA = 0. Applying Qᵀ on both sides of (1.2) gives

    Qᵀy = Qᵀe + Qᵀz.    (2.3)
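The program (P2) can be stated almost verbatim in a modern convex-programming layer. The following sketch uses Python with CVXPY (our choice of tooling; the paper's own experiments were run in Matlab), with variable names mirroring (2.1):

```python
import cvxpy as cp

def socp_decode(A, y, eps):
    """Sketch of the SOCP decoder (P2), eq. (2.1):
        minimize ||y - A x - z||_1
        subject to ||z||_2 <= eps and A^T z = 0."""
    m, n = A.shape
    x = cp.Variable(n)
    z = cp.Variable(m)
    problem = cp.Problem(
        cp.Minimize(cp.norm1(y - A @ x - z)),
        [cp.norm(z, 2) <= eps, A.T @ z == 0],
    )
    problem.solve()  # any SOCP-capable solver works here
    return x.value, z.value
```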

Now if we could somehow get an accurate estimate ê of e from Qᵀy, we could reconstruct x by applying the method of Least Squares to the vector y corrected for the gross errors:

    x̂ = (AᵀA)⁻¹Aᵀ(y − ê).    (2.4)

If ê were very accurate, we would probably do very well. The point is that under suitable conditions, (P2) provides such accurate estimates. Introduce ẽ = y − Ax̃ − z̃, and observe the following equivalence:

    (P2)  min ‖ẽ‖_{ℓ1}  subject to  ẽ = y − Ax̃ − z̃, Aᵀz̃ = 0, ‖z̃‖_{ℓ2} ≤ ε
    ⇔  (P′2)  min ‖ẽ‖_{ℓ1}  subject to  ‖Qᵀ(y − ẽ)‖_{ℓ2} ≤ ε.    (2.5)

We only need to argue about the second equivalence since the first is immediate. Observe that the condition Aᵀz̃ = 0 decomposes y − ẽ as the superposition of an arbitrary element in V (the vector Ax̃) and of an element in V⊥ (the vector z̃) whose Euclidean length is less than ε. In other words, z̃ = P_{V⊥}(y − ẽ), where P_{V⊥} = QQᵀ is the orthonormal projector onto V⊥, so that the problem is that of minimizing the ℓ1 norm of ẽ under the constraint ‖P_{V⊥}(y − ẽ)‖_{ℓ2} ≤ ε. The claim follows from the identity ‖P_{V⊥}v‖_{ℓ2} = ‖Qᵀv‖_{ℓ2}, which holds for all v ∈ Rᵐ.

The equivalence between (P2) and (P′2) asserts that if (x̂, ẑ) is solution to (P2), then ê = y − Ax̂ − ẑ is solution to (P′2), and vice versa; if ê is solution to (P′2), then there is a unique way to write y − ê as the sum Ax̂ + ẑ with ẑ ∈ V⊥, and the pair (x̂, ẑ) is solution to (P2). We note, and this is important, that the solution x̂ to (P2) is also given by the corrected least squares formula (2.4). Equally important is to note that even though we use the matrix Q to explain the rationale behind the methodology, one should keep in mind that Q does not play any special role in (P2).

The issue here is that if ‖P_{V⊥}v‖_{ℓ2} is approximately proportional to ‖v‖_{ℓ2} for all sparse vectors v ∈ Rᵐ, then the solution ê to (P′2) is close to e, provided that e is sufficiently sparse [5]. Quantitatively speaking, if ε is chosen so that ‖P_{V⊥}z‖_{ℓ2} ≤ ε, then ‖e − ê‖_{ℓ2} is less than a numerical constant times ε; that is, the reconstruction error is within the noise level. The key concept underlying this theory is the so-called restricted isometry property.

Definition 2.1 Define the isometry constant δ_k of a matrix Φ as the smallest number such that

    (1 − δ_k)‖x‖²_{ℓ2} ≤ ‖Φx‖²_{ℓ2} ≤ (1 + δ_k)‖x‖²_{ℓ2}    (2.6)

holds for all k-sparse vectors x (a k-sparse vector has at most k nonzero entries).

In the sequel, we shall be concerned with the isometry constants of Aᵀ times a scalar. Since AAᵀ is the orthogonal projection P_V onto V, we will thus be interested in subspaces V such that P_{V⊥} nearly acts as an isometry on sparse vectors. Our first result states that the SOCP decoder is provably accurate.

Theorem 2.2 Choose a coding matrix A ∈ R^{m×n} with orthonormal columns spanning V, and let (δ_k) be the isometry constants of the rescaled matrix √(m/n) Aᵀ. Suppose ‖P_{V⊥}z‖_{ℓ2} ≤ ε. Then the solution x̂ to (P2) obeys

    ‖x̂ − x‖_{ℓ2} ≤ C ε/√(1 − n/m) + ‖x_Ideal − x‖_{ℓ2}    (2.7)

for some C = C(c), provided that the number k of gross errors obeys δ_{3k} + ½δ_{2k} < (c/2)(m/n − 1) for some c < 1; x_Ideal is the ideal solution (1.3) one would get if no gross errors ever occurred (e = 0). If the (orthonormal) columns of A are selected uniformly at random, then with probability at least 1 − O(e^{−γ(m−n)}) for some positive constant γ, the estimate (2.7) holds for k ≤ ρm, provided ρ ≤ ρ₀(n/m), which is a constant depending only on n/m.¹

This theorem is of significant appeal because it says that the reconstruction error is in some sense within a constant factor of the ideal solution. Indeed, suppose all we know about z is that ‖z‖_{ℓ2} ≤ ε. Then ‖x_Ideal − x‖_{ℓ2} = ‖Aᵀz‖_{ℓ2} may be as large as ε. Thus for m = 2n, say, (2.7) asserts that the reconstruction error is bounded by a constant times the ideal reconstruction error. In addition, if one selects a coding matrix with random orthonormal columns (one way of doing so is to sample X ∈ R^{m×n} with i.i.d. N(0, 1) entries and orthonormalize the columns by means of the QR factorization), then one can correct a positive fraction of arbitrarily corrupted entries in a near-ideal fashion. Note that in the case where there are no small errors (z = 0), the decoding is exact since ε = 0 and x_Ideal = x. Hence, this generalizes earlier results [7]. We would like to emphasize that there is nothing special about the fact that the columns of A are taken to be orthonormal in Theorem 2.2. In fact, one could just as well obtain equivalent statements for general matrices. Our assumption only allows us to formulate simple and useful results.

While the previous result discussed arbitrary small errors, the next is about stochastic errors.

Corollary 2.3 Suppose the small errors are i.i.d. N(0, σ²) and set ε² := (m − n)(1 + t)²σ² for some fixed t > 0. Then under the same hypotheses about the restricted isometry constants of A and the number of gross errors as in Theorem 2.2, the solution to (P2) obeys

    ‖x̂ − x‖²_{ℓ2} ≤ C² m σ²,    (2.8)

for some numerical constant C, with probability exceeding 1 − e^{−γ(m−n)/2} − e^{−m/2}, where γ > 0 depends only on t. In particular, this last statement holds with overwhelming probability if A is chosen at random as in Theorem 2.2.

Suppose for instance that m = 2n to make things concrete, so that the MSE of the ideal estimate is equal to (m/2)σ². Then the SOCP reconstruction is within a multiplicative factor 2C² of the ideal MSE. Our experiments show that in practice the constant is small: e.g. when m = 2n, one can correct 15% of arbitrary errors and, in the overwhelming majority of cases, obtain a decoded vector whose MSE is less than 3 times larger than the ideal MSE.

¹ Analysis shows ρ₀ to be of the form ρ₀ = O((1 − n/m)/log(1 − n/m)⁻¹), but this is not informative because the constant is unknown. Determining the constant is extremely challenging; for an analysis with sparse errors see [12, 15].
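The random coding matrices of Theorem 2.2 are easy to generate by the QR recipe mentioned above. A minimal sketch (Python/NumPy; the function name is ours):

```python
import numpy as np

def random_coding_matrix(m, n, rng=np.random.default_rng()):
    """Sample A with orthonormal columns spanning a uniformly random
    n-dimensional subspace of R^m: draw i.i.d. N(0,1) entries and
    orthonormalize the columns by means of the QR factorization."""
    X = rng.standard_normal((m, n))
    Q, _ = np.linalg.qr(X)  # reduced QR: Q is m-by-n with Q^T Q = I_n
    return Q
```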

3 Decoding by Linear Programming

Another way to recover the signal x from the corrupted vector y (1.2) is by linear programming:

    (P∞)  min ‖y − Ax̃ − z̃‖_{ℓ1}  subject to  ‖z̃‖_{ℓ∞} ≤ λ,  Aᵀz̃ = 0,    (3.1)

with variables x̃ ∈ Rⁿ and z̃ ∈ Rᵐ. As is well known, the program (P∞) may also be re-expressed as a linear program by introducing slack variables just as in (P2); we omit the standard details. As with (P2), the parameter λ here is related to the size of the small errors and will be discussed shortly. In the sequel, we shall also be interested in the more general formulation of (P∞)

    min ‖y − Ax̃ − z̃‖_{ℓ1}  subject to  |z̃_i| ≤ λ_i, 1 ≤ i ≤ m,  Aᵀz̃ = 0,    (3.2)

which gives additional flexibility for adjusting the thresholds λ₁, λ₂, ..., λ_m to the noise level. The same arguments as before prove that (P∞) is equivalent to

    (P′∞)  min ‖ẽ‖_{ℓ1}  subject to  ‖QQᵀ(y − ẽ)‖_{ℓ∞} ≤ λ,    (3.3)

where we recall that P_{V⊥} = QQᵀ is the orthonormal projector onto V⊥ (V is the column space of A); that is, if ê is solution to (P′∞), then there is a unique decomposition y − ê = Ax̂ + ẑ, where Aᵀẑ = 0, and (x̂, ẑ) is solution to (P∞). The converse is also true. Similarly, the more general program (3.2) is equivalent to minimizing the ℓ1 norm of ẽ under the constraint |(P_{V⊥}(y − ẽ))_i| ≤ λ_i, 1 ≤ i ≤ m.

In statistics, the estimator ê solution to (P′∞) is known as the Dantzig selector [8]. It was originally introduced to estimate the vector e from data ȳ and the model

    ȳ = Qᵀe + z̄,    (3.4)

where z̄ is a vector of stochastic errors, e.g. independent mean-zero Gaussian random variables. The connection with our problem is clear since applying the parity-check matrix Qᵀ on both sides of (1.2) gives Qᵀy = Qᵀe + Qᵀz as before. If z is stochastic noise, we can use the Dantzig selector to recover e from Qᵀy. Moreover, available statistical theory asserts that if Qᵀ obeys nice restricted isometry properties and e is sufficiently sparse just as before, then this estimation procedure is extremely accurate and in some sense optimal.

It remains to discuss how one should specify the parameter λ in (3.1)–(3.3), which is easy. Suppose the small errors are stochastic. Then we fix λ so that the true vector e is feasible for (P′∞) with very high probability; i.e. we adjust λ so that ‖P_{V⊥}(y − e)‖_{ℓ∞} = ‖P_{V⊥}z‖_{ℓ∞} ≤ λ with high probability. In the more general formulation, the thresholds are adjusted so that sup_{1≤i≤m} |(P_{V⊥}z)_i|/λ_i ≤ 1 with high probability.

The main result of this section is that the LP decoder is also provably accurate.
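Like (P2), the LP decoder is a few lines in CVXPY. The sketch below (names ours) implements (3.1); passing a length-m vector for lam gives the more general program (3.2) with per-coordinate thresholds:

```python
import cvxpy as cp

def lp_decode(A, y, lam):
    """Sketch of the LP decoder (P_inf), eq. (3.1)/(3.2):
        minimize ||y - A x - z||_1
        subject to |z_i| <= lam_i and A^T z = 0.
    A scalar lam recovers (3.1); a vector lam recovers (3.2)."""
    m, n = A.shape
    x = cp.Variable(n)
    z = cp.Variable(m)
    problem = cp.Problem(
        cp.Minimize(cp.norm1(y - A @ x - z)),
        [cp.abs(z) <= lam, A.T @ z == 0],
    )
    problem.solve()
    return x.value, z.value
```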

Theorem 3.1 Choose a coding matrix A ∈ R^{m×n} with orthonormal columns spanning V, and let (δ_k) be the isometry constants of the rescaled matrix √(m/n) Aᵀ. Suppose ‖P_{V⊥}z‖_{ℓ∞} ≤ λ. Then the solution x̂ to (P∞) obeys

    ‖x̂ − x‖_{ℓ2} ≤ C₁ √k λ/(1 − n/m) + ‖x_Ideal − x‖_{ℓ2}    (3.5)

for some C₁ = C₁(c), provided that the number k of gross errors obeys δ_{3k} + δ_{2k} < c(m/n − 1) for some c < 1; x_Ideal is the ideal solution (1.3) one would get if no gross errors ever occurred. If the (orthonormal) columns of A are selected uniformly at random, then with probability at least 1 − O(e^{−γ(m−n)}) for some positive constant γ, the estimate (3.5) holds for k ≤ ρm, provided ρ ≤ ρ₀(n/m).

In effect, the LP decoder efficiently corrects a positive fraction of arbitrarily corrupted entries. Again, when there are no small errors (z = 0), the decoding is exact. (Also, and just as before, there is nothing special about the fact that the columns of A are taken to be orthonormal.) We now consider the interesting case in which the small errors are stochastic. Below, we conveniently adjust the thresholds λ_i so that the true vector e is feasible with high probability; see Section 5.4 for details.

Corollary 3.2 Choose a coding matrix A with (orthonormal) columns selected uniformly at random and suppose the small errors are i.i.d. N(0, σ²). Fix λ_i = √(2 log m) (1 − ‖A_{i,·}‖²_{ℓ2})^{1/2} σ in (3.2), where ‖A_{i,·}‖_{ℓ2} = (Σ_{1≤j≤n} A²_{i,j})^{1/2} is the ℓ2 norm of the ith row. Then if the number k of gross errors is no more than a fraction of m as in Theorem 3.1, the solution x̂ obeys

    ‖x̂ − x‖_{ℓ2} ≤ [1 + C₁ s] ‖x_Ideal − x‖_{ℓ2},  s = √(2k (log m)/n) (1 − n/m)⁻¹,    (3.6)

with very large probability, where C₁ is some numerical constant.

In effect, ‖x̂ − x‖²_{ℓ2} is bounded by just about [1 + C₁s]² nσ², since ‖x_Ideal − x‖²_{ℓ2} is distributed as σ² times a chi-square with n degrees of freedom, and is tightly concentrated around nσ². Recall that the MSE is equal to nσ² when there are no gross errors and, therefore, this last result asserts that the reconstruction error is bounded by a constant times the ideal reconstruction error. Suppose for instance that m = 2n. Then s = 4√(k (log m)/m) and we see that s is small when there are few gross errors. In this case, the recovery error is very close to that attained by the ideal procedure. Our experiments show that in practice the constant C₁ is quite small: for instance, when m = 2n, one can correct 15% of arbitrary errors and, in the overwhelming majority of cases, obtain a decoded vector whose MSE is less than 3 times larger than the ideal MSE.

Finally, this last result is in some way more subtle than the corresponding result for the SOCP decoder. Indeed, note the explicit dependence on k of the scaling factor in (3.6) that is not present in the corresponding expression for the SOCP decoder (2.8). This says that in some sense the accuracy of the LP decoder automatically adapts to the number k of gross errors which were introduced. The smaller this number, the smaller the recovery error. For small values of k, the bound in (3.6) may in fact be considerably smaller than its analog (2.8).
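The thresholds of Corollary 3.2 depend only on the row norms of A and are cheap to compute. A sketch (Python/NumPy; function name ours):

```python
import numpy as np

def dantzig_thresholds(A, sigma):
    """Thresholds of Corollary 3.2:
        lam_i = sqrt(2 log m) * sqrt(1 - ||A_i,.||^2) * sigma,
    where sqrt(1 - ||A_i,.||^2) equals the norm of the ith row of the
    parity-check basis Q when A has orthonormal columns."""
    m = A.shape[0]
    row_norms_sq = np.sum(A * A, axis=1)
    return np.sqrt(2 * np.log(m)) * np.sqrt(np.clip(1.0 - row_norms_sq, 0.0, None)) * sigma
```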

4 Numerical Experiments

As mentioned earlier, numerical studies show that the empirical performance of the proposed decoding strategies is noticeable. To confirm these findings, this section discusses an experimental setup and presents numerical results. The reader wanting to reproduce our results may find the Matlab file available at emmanuel/convexdecode useful. Here are the steps we used:

1. Choose a pair (n, m) and sample an m by n matrix A with independent standard normal entries; the coding matrix is fixed throughout.
2. Choose a fraction ρ of grossly corrupted entries and define the number of corrupted entries as k = round(ρm); e.g. if m = 512 and 10% of the entries are corrupted, k = 51.
3. Sample a block of information x ∈ Rⁿ with independent and identically distributed Gaussian entries. Compute Ax.
4. Select k locations uniformly at random and flip the signs of Ax at these locations.
5. Sample the vector z = (z₁, ..., z_m) of smaller errors with z_i i.i.d. N(0, σ²), and add z to the outcome of the previous step. Obtain y.
6. Obtain x̂ by solving both (P2) and (P∞), followed by a reprojection step discussed below [8].
7. Repeat steps 3)–6) 500 times.

We briefly discuss the reprojection step. As observed in [8], both programs (P2) and (P∞) have a tendency to underestimate the vector e (they tend to be akin to soft-thresholding procedures). One can easily correct for this bias as follows: 1) solve (P2) or (P∞) and obtain ê; 2) estimate the support of the gross errors e via I := {i : |ê_i| > σ}, where σ is the standard deviation of the smaller errors; recall that ȳ := Qᵀy = Qᵀe + Qᵀz, and update the estimate by regressing ȳ onto the selected columns of Qᵀ via the method of least squares

    ê = argmin ‖ȳ − Qᵀẽ‖_{ℓ2}  subject to  ẽ_i = 0, i ∈ Iᶜ;

3) finally, obtain x̂ via (AᵀA)⁻¹Aᵀ(y − ê), where ê is the reprojected estimate calculated in the previous step.

In our series of experiments, we used m = 2n = 512 and a corruption rate of 10%. The standard deviation σ is selected in such a way that just about the first three binary digits of each entry of the codeword Ax are reliable. Formally, σ = median(|Ax|)/16. Finally, and to be complete, we set the threshold ε in (P2) so that ‖Qᵀz‖_{ℓ2} ≤ ε with probability 0.95; in other words, ε² = χ²_{m−n}(0.95) σ², where χ²_{m−n}(0.95) is the 95th percentile of a chi-squared distribution with m − n degrees of freedom. We also set the thresholds in the general formulation (3.2) of (P∞) in a similar fashion. The distribution of (QQᵀz)_i is normal with mean 0 and variance s²_i = (QQᵀ)_{i,i} σ², so that the variable z*_i = (QQᵀz)_i/s_i is standard normal. We choose λ_i = λ* s_i, where λ* obeys

    sup_{1≤i≤m} |z*_i| ≤ λ*

with probability at least 0.95.
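Before moving to the results, here is a sketch of the reprojection step described above (Python/NumPy; the paper's reference implementation is the Matlab file mentioned earlier, so the names and details here are ours):

```python
import numpy as np

def reproject(A, Q, y, e_hat, sigma):
    """Two-step refinement of Section 4:
    1) estimate the support I = {i : |e_hat_i| > sigma},
    2) re-fit the gross errors on I by regressing ybar = Q^T y
       onto the selected columns of Q^T,
    3) decode by corrected least squares (A^T A)^{-1} A^T (y - e)."""
    ybar = Q.T @ y                       # ybar = Q^T e + Q^T z
    I = np.flatnonzero(np.abs(e_hat) > sigma)
    e_ref = np.zeros_like(e_hat)
    if I.size > 0:
        e_ref[I] = np.linalg.lstsq(Q.T[:, I], ybar, rcond=None)[0]
    return np.linalg.lstsq(A, y - e_ref, rcond=None)[0]
```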

In both cases, our selection makes the true vector e of gross errors feasible with probability at least 0.95. In our simulations, the thresholds for the SOCP and LP decoders (the parameters χ²_{m−n}(0.95) and λ*) were computed by Monte Carlo simulations.

To evaluate the accuracy of the decoders, we report two statistics,

    ρ_Ideal = ‖x̂ − x‖/‖x_Ideal − x‖  and  ρ_Oracle = ‖x̂ − x‖/‖x_Oracle − x‖,    (4.1)

which compare the performance of our decoders with that of ideal strategies which assume either exact knowledge of the gross errors or exact knowledge of their locations. As discussed earlier, x_Ideal is the reconstructed vector one would obtain if the gross errors were known to the receiver exactly (which is of course equivalent to having no gross errors at all). The reconstruction x_Oracle is the one one would obtain if, instead, one had available an oracle supplying perfect information about the location of the gross errors (but not their value). Then one could simply delete the corrupted entries of the received codeword y and reconstruct x by the method of least squares, i.e. find the minimizer of

    ‖y_Oracle − A_Oracle x̃‖_{ℓ2},

where A_Oracle (resp. y_Oracle) is obtained from A (resp. y) by deleting the corrupted rows.

The results are presented in Figure 1 and summarized in Table 1. These results show that both our approaches work extremely well. As one can see, our methods give reconstruction errors which are nearly as sharp as if no gross errors had occurred, or as if one knew the locations of these large errors exactly. Put in a different way, the constants appearing in our quantitative bounds are in practice very small. Finally, the SOCP and LP decoders have about the same performance although, upon closer inspection, one could argue that the LP decoder is perhaps a tiny bit more accurate.

              median of ρ_Ideal   mean of ρ_Ideal   median of ρ_Oracle   mean of ρ_Oracle
SOCP decoder
LP decoder

Table 1: Summary statistics of the ratios ρ_Ideal and ρ_Oracle (4.1) for the Gaussian coding matrix.

We also repeated the same experiment but with a coding matrix A consisting of n = 256 randomly sampled columns of the discrete Fourier transform, and obtained very similar results. The results are presented in Figure 2 and summarized in Table 2. The numbers are remarkably close to our earlier findings, and again both our methods work extremely well (again, the LP decoder is a tiny bit more accurate). This experiment is of special interest since it suggests that one can apply our decoding algorithms to very large data vectors, e.g. with sizes ranging in the hundreds of thousands. The reason is that one can use off-the-shelf interior point algorithms which only need to be able to apply A or Aᵀ to arbitrary vectors and never need to manipulate the entries of A (or even store them). When A is a partial Fourier transform, one can evaluate Ax and Aᵀy by means of the FFT and, hence, this is well suited for very large problems. See [3] for very large scale experiments of a similar flavor.
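For completeness, the oracle benchmark in (4.1) is itself a one-line least-squares fit. A sketch (Python/NumPy; names ours):

```python
import numpy as np

def oracle_decode(A, y, corrupted_mask):
    """Oracle reconstruction for rho_Oracle: given the locations of the
    gross errors (a boolean mask of length m), delete the corrupted
    rows of A and entries of y, then solve least squares."""
    keep = ~corrupted_mask
    return np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]
```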

[Figure 1: four histograms; panel titles: "Ideal LS ratio for the SOCP decoder", "Ideal oracle ratio for the SOCP decoder", "Ideal LS ratio for the LP decoder", "Ideal oracle ratio for the LP decoder"; horizontal axes: ratio, vertical axes: frequency.]

Figure 1: Statistics of the ratios (4.1), ρ_Ideal (first column) and ρ_Oracle (second column), which compare the performance of the proposed decoders with that of ideal strategies which assume either exact knowledge of the gross errors or exact knowledge of their locations. The first row shows the performance of the SOCP decoder, the second that of the LP decoder.

[Figure 2: two histograms; panel titles: "Ideal oracle ratio for the SOCP decoder", "Ideal oracle ratio for the LP decoder"; horizontal axes: ratio, vertical axes: frequency.]

Figure 2: Statistics of the ratios ρ_Oracle for the SOCP decoder (first column) and the LP decoder (second column) in the case where the coding matrix is a partial Fourier transform.

              median of ρ_Ideal   mean of ρ_Ideal   median of ρ_Oracle   mean of ρ_Oracle
SOCP decoder
LP decoder

Table 2: Summary statistics of the ratios ρ_Ideal and ρ_Oracle (4.1) for the Fourier coding matrix.

5 Proofs

In this section, we prove all of our results. We begin with some preliminaries which will be used throughout, then prove the claims about the SOCP decoder, and end this section with the LP decoder. Our work builds on [5] and [8].

5.1 Preliminaries

We shall make extensive use of two simple lemmas that we now record.

Lemma 5.1 Let Y_d ∼ χ²_d be distributed as a chi-squared random variable with d degrees of freedom. Then for each t > 0,

    P(Y_d ≥ d + t√(2d) + t²) ≤ e^{−t²/2}  and  P(Y_d ≤ d − t√(2d)) ≤ e^{−t²/2}.    (5.1)

This is fairly standard [20]; see also [19] for very slightly refined estimates. We will use (5.1) as follows: for each ɛ ∈ (0, 1), we have

    P(Y_d ≥ d(1 − ɛ)⁻¹) ≤ e^{−ɛ²d/4}  and  P(Y_d ≤ d(1 − ɛ)) ≤ e^{−ɛ²d/4}    (5.2)

(for the second inequality, take t = ɛ√(d/2) in (5.1); the first follows similarly since (1 − ɛ)⁻¹ ≥ 1 + ɛ). A consequence of these large deviation bounds is the estimate below.

Lemma 5.2 Let (u₁, u₂, ..., u_m) be a vector uniformly distributed on the unit sphere in m dimensions, and let Z_n = u₁² + ... + u_n² be the squared length of its first n components. Then for each 0 < t < 1,

    P(Z_n ≤ (n/m)(1 − t)) ≤ e^{−nt²/16} + e^{−mt²/16},    (5.3)

and

    P(Z_n ≥ (n/m)(1 − t)⁻¹) ≤ e^{−nt²/16} + e^{−mt²/16}.    (5.4)

Proof A result of this kind would essentially follow from the measure concentration on the sphere [1], but we prefer giving a short and elementary argument. Suppose X₁, X₂, ..., X_m are i.i.d. N(0, 1). Then the distribution of (u₁, u₂, ..., u_m) is that of the vector X/‖X‖_{ℓ2} and, therefore, the law of Z_n is that of Y_n/Y_m, where Y_k = Σ_{j≤k} X²_j. For a fixed t ∈ (0, 1), define the events A = {Y_m ≥ m/(1 − t/2)} and B = {Y_n/Y_m ≤ (n/m)(1 − t)}. We have

    P(B) = P(B ∩ Aᶜ) + P(B ∩ A) ≤ P(Y_n ≤ n(1 − t)/(1 − t/2)) + P(Y_m ≥ m/(1 − t/2)).

For t ≤ 1, we have (1 − t)/(1 − t/2) ≤ 1 − t/2, and thus

    P(Z_n ≤ (n/m)(1 − t)) ≤ P(Y_n ≤ n(1 − t/2)) + P(Y_m ≥ m/(1 − t/2)) ≤ e^{−nt²/16} + e^{−mt²/16},

which follows from (5.2). For the second inequality, we employ a similar strategy with A = {Y_m ≤ m(1 − t/2)} and B = {Y_n/Y_m ≥ (n/m)(1 − t)⁻¹}, which leads to

    P(Z_n ≥ (n/m)(1 − t)⁻¹) ≤ P(Y_n ≥ n/(1 − t/2)) + P(Y_m ≤ m(1 − t/2)) ≤ e^{−nt²/16} + e^{−mt²/16},

as claimed.

5.2 Restricted isometries

For a matrix Φ, define the sequences (a_k) and (b_k) as, respectively, the largest and smallest numbers obeying

    a_k ‖x‖_{ℓ2} ≤ ‖Φx‖_{ℓ2} ≤ b_k ‖x‖_{ℓ2}    (5.5)

for all k-sparse vectors. In other words, if we list all the singular values of all the submatrices of Φ with k columns, a_k is the smallest element from that list and b_k the largest. Note of course the resemblance with (2.6); only this is slightly more general. Restricted extremal singular values of random orthonormal projections will play an important role in the sequel. The following lemma states that for an r × m random orthogonal projection, the numbers a_k and b_k are about √(r/m).

Lemma 5.3 Let Φ be the first r rows of a random orthogonal m × m matrix (sampled from the Haar measure). Then the restricted extremal singular values of Φ obey

    P(a_k(Φ) ≤ (7/8)√(r/m)) ≤ c e^{−γr}    (5.6)

and

    P(b_k(Φ) ≥ (9/8)√(r/m)) ≤ c′ e^{−γ′r}    (5.7)

for some universal positive constants c, c′, γ, γ′, provided that k ≤ c₁ r/log(m/r) for some c₁ > 0.

Proof Put Σ_k for the set of all unit-normed k-sparse vectors. By definition, a_k(Φ) = inf_{x∈Σ_k} ‖Φx‖_{ℓ2}. Take a fixed vector x in Σ_k. Then ‖Φx‖²_{ℓ2} is distributed as Z_r in Lemma 5.2. To see why this is true, note that Φx are the first r components of Ux, where U is an m × m random orthogonal matrix. The claim follows from the fact that Ux is uniformly distributed on the (m − 1)-dimensional unit sphere. This is useful because Lemma 5.2 can be employed to show that, for a fixed x ∈ Σ_k, ‖Φx‖²_{ℓ2}

cannot deviate much from r/m. To develop an inequality concerning all sparse vectors, we now employ a covering number argument. Consider an ɛ-net N(ɛ) of Σ_k. An ɛ-net is a subset N(ɛ) of Σ_k such that for all x ∈ Σ_k, there is an x′ ∈ N(ɛ) such that ‖x − x′‖_{ℓ2} ≤ ɛ. In other words, N(ɛ) approximates Σ_k to within distance ɛ. For each x ∈ Σ_k,

    ‖Φx‖_{ℓ2} ≥ ‖Φx′‖_{ℓ2} − ‖Φ(x − x′)‖_{ℓ2} ≥ ‖Φx′‖_{ℓ2} − ɛ,

for some x′ ∈ N(ɛ) obeying ‖x − x′‖_{ℓ2} ≤ ɛ, where the last inequality follows from the fact that the operator norm of Φ is bounded by 1. Hence,

    a_k(Φ) ≥ inf_{x′∈N(ɛ)} ‖Φx′‖_{ℓ2} − ɛ.

Now set ɛ = (1/16)√(r/m). Then

    P(a_k(Φ) < (7/8)√(r/m)) ≤ P(inf_{x′∈N(ɛ)} ‖Φx′‖_{ℓ2} < (15/16)√(r/m)) ≤ |N(ɛ)| P(Z_r < (r/m)(15/16)²),

which comes from the union bound together with the fact that ‖Φx′‖²_{ℓ2} is distributed as Z_r for each x′. Further, one can find N(ɛ) obeying

    |N(ɛ)| ≤ (m choose k) (3/ɛ)^k.

The reason is simple. First, one can find an ɛ-net of the (k − 1)-dimensional sphere whose cardinality does not exceed (3/ɛ)^k, see [22, Lemma 4.16]. And second, Σ_k is a union of (m choose k) (k − 1)-dimensional spheres. We then apply this fact together with Lemma 5.2, and obtain

    P(a_k(Φ) < (7/8)√(r/m)) ≤ 2 (m choose k) 48^k (m/r)^{k/2} e^{−rt²/16},  t = 1 − (15/16)².

Next, there is a bound on binomial coefficients of the form log(m choose k) ≤ k(log(m/k) + 1), so that

    (m choose k) 48^k (m/r)^{k/2} ≤ exp(k(5 + 0.5 log(m/r) + log(m/k))).

One can check that if k ≤ c₁ r/log(m/r) for c₁ sufficiently small, the right-hand side of the last inequality is bounded by e^{βr} for some β smaller than t²/16, so that the overall probability decays exponentially in r. This establishes the first part of the theorem, namely, (5.6). The second part is nearly identical and is only sketched. We have that

    b_k(Φ) = sup_{x∈Σ_k} ‖Φx‖_{ℓ2} ≤ sup_{x′∈N(ɛ)} ‖Φx′‖_{ℓ2} + ɛ.

The proof now proceeds as before, noting that (5.4) gives a bound on the probability that, for each x′, ‖Φx′‖²_{ℓ2} exceeds r/m times a small multiplicative factor. Note that in this proof, we have not tried to derive the optimal constants, and a more refined analysis would surely yield far better numerical constants.
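Lemma 5.3 is easy to probe numerically. The sketch below (Python/NumPy; a crude random search over supports, not part of the proof) estimates a_k and b_k for the first r rows of a Haar orthogonal matrix and compares them with √(r/m):

```python
import numpy as np

def restricted_extremes_mc(m, r, k, trials=500, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of a_k(Phi), b_k(Phi) for Phi = first r rows
    of a random orthogonal matrix; Lemma 5.3 predicts both concentrate
    near sqrt(r/m) when k is small."""
    U, _ = np.linalg.qr(rng.standard_normal((m, m)))  # Haar, up to signs
    Phi = U[:r]
    a, b = np.inf, 0.0
    for _ in range(trials):
        S = rng.choice(m, size=k, replace=False)      # random k-sparse support
        s = np.linalg.svd(Phi[:, S], compute_uv=False)
        a, b = min(a, s[-1]), max(b, s[0])
    return a, b, np.sqrt(r / m)
```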

5.3 The SOCP decoder

We begin by adapting an important result from [5].

Lemma 5.4 (adapted from [5]) Set Φ ∈ R^{r×m} and let (a_k) and (b_k) be the restricted extremal singular values of Φ as in (5.5). Any point x̃ ∈ Rᵐ obeying

    ‖x̃‖_{ℓ1} ≤ ‖x‖_{ℓ1}  and  ‖Φx̃ − Φx‖_{ℓ2} ≤ 2ε    (5.8)

also obeys

    ‖x̃ − x‖_{ℓ2} ≤ 6ε/(a_{3k}(Φ) − (1/√2) b_{2k}(Φ)),    (5.9)

provided that x is k-sparse with k such that a_{3k}(Φ) − (1/√2) b_{2k}(Φ) > 0.

The proof follows the same steps as that of Theorem 1.1 in [5], and is omitted. In particular, it follows from (2.6) in the aforementioned reference with M = 2T and a_{M+T} (resp. b_M) in place of √(1 − δ_{T+M}) (resp. √(1 + δ_M)) in the definition of C_{T,M}.

5.3.1 Proof of Theorem 2.2

Recall that the solution (x̂, ẑ) to (P2) obeys (2.4), where ê is the solution to (P′2). Replacing y in (2.4) with Ax + e + z gives

    x̂ − x = (AᵀA)⁻¹Aᵀ(e − ê) + (AᵀA)⁻¹Aᵀz = (AᵀA)⁻¹Aᵀ(e − ê) + (x_Ideal − x),    (5.10)

and since AᵀA = I,

    ‖x̂ − x‖_{ℓ2} ≤ ‖Aᵀ(e − ê)‖_{ℓ2} + ‖x_Ideal − x‖_{ℓ2}.

To prove (2.7), it then suffices to show that ‖e − ê‖_{ℓ2} ≤ C ε/√(1 − n/m), since the 2-norm of Aᵀ is at most 1. By assumption, ‖Qᵀ(y − e)‖_{ℓ2} = ‖Qᵀz‖_{ℓ2} ≤ ε and thus e is feasible for (P′2), which implies ‖ê‖_{ℓ1} ≤ ‖e‖_{ℓ1}. Moreover,

    ‖Qᵀe − Qᵀê‖_{ℓ2} ≤ ‖Qᵀ(y − e)‖_{ℓ2} + ‖Qᵀ(y − ê)‖_{ℓ2} ≤ 2ε.

We then apply Lemma 5.4 (with Φ = Qᵀ) and obtain

    ‖e − ê‖_{ℓ2} ≤ 6ε/(a_{3k}(Qᵀ) − (1/√2) b_{2k}(Qᵀ)).    (5.11)

Now since the matrix obtained by concatenating the columns of A and Q is an isometry, we have

    ‖Aᵀx‖²_{ℓ2} + ‖Qᵀx‖²_{ℓ2} = ‖x‖²_{ℓ2}  for all x ∈ Rᵐ,

whence

    a_k(Qᵀ) = √(1 − b²_k(Aᵀ)),  b_k(Qᵀ) = √(1 − a²_k(Aᵀ)).

Assuming that a_{3k}(Qᵀ) ≥ (1/√2) b_{2k}(Qᵀ), we deduce from (5.11) that

    ‖e − ê‖_{ℓ2} ≤ 6ε (a_{3k}(Qᵀ) + (1/√2) b_{2k}(Qᵀ)) / ((1 − b²_{3k}(Aᵀ)) − ½(1 − a²_{2k}(Aᵀ)))
               ≤ 12ε a_{3k}(Qᵀ) / (½(1 + a²_{2k}(Aᵀ)) − b²_{3k}(Aᵀ)).    (5.12)

Recall that (δ_k) are the restricted isometry constants of √(m/n) Aᵀ, and observe that by definition, for each k = 1, 2, ...,

    a²_k(Aᵀ) ≥ (n/m)(1 − δ_k),  b²_k(Aᵀ) ≤ (n/m)(1 + δ_k).

It follows that the denominator on the right-hand side of (5.12) is greater than or equal to

    ½ + (n/2m)(1 − δ_{2k}) − (n/m)(1 + δ_{3k}) = ½(1 − n/m) − (n/m)(δ_{3k} + ½ δ_{2k}).

Now suppose that for some 0 < c < 1,

    δ_{3k} + ½ δ_{2k} ≤ (c/2)(m/n − 1).

This automatically implies a_{3k}(Qᵀ) ≥ (1/√2) b_{2k}(Qᵀ), and the denominator on the right-hand side of (5.12) is greater than or equal to ½(1 − c)(1 − n/m). The numerator obeys

    a²_{3k}(Qᵀ) = 1 − b²_{3k}(Aᵀ) ≤ 1 − a²_{3k}(Aᵀ) ≤ 1 − (1 − δ_{3k})(n/m).

Since (n/m) δ_{3k} ≤ (c/2)(1 − n/m), we also have a²_{3k}(Qᵀ) ≤ (1 + c/2)(1 − n/m). In summary, (5.12) gives

    ‖e − ê‖_{ℓ2} ≤ C ε/√(1 − n/m),

where one can take C as 24√(1 + c/2)/(1 − c). This establishes the first part of the claim.

We now turn to the second part of the theorem and argue that if the orthonormal columns of A are chosen uniformly at random, the error bound (2.7) is valid as long as we have a constant fraction of gross errors. Put r = m − n and let X be an m by r matrix with independent Gaussian entries with mean 0 and variance 1/m. Consider now the reduced singular value decomposition of X,

    X = UΣVᵀ,  U ∈ R^{m×r} and Σ, V ∈ R^{r×r}.

Then the columns of U are r orthonormal vectors selected uniformly at random, and thus U and Q have the same distribution. Thus we can think of Q as being the left singular vectors of a Gaussian matrix X with independent entries. From now on, we identify U with Q. Observe now that

    ‖Xᵀ(ê − e)‖_{ℓ2} = ‖VΣQᵀ(ê − e)‖_{ℓ2} = ‖ΣQᵀ(ê − e)‖_{ℓ2} ≤ σ₁(X) ‖Qᵀ(ê − e)‖_{ℓ2},

where σ₁(X) is the largest singular value of X. The singular values of Gaussian matrices are well concentrated, and a classical result [11, Theorem II.13] shows that

    P(σ₁(X) > 1 + √(r/m) + t) ≤ e^{−mt²/2}.    (5.13)

By choosing t = 1 in the above formula, we have ‖Xᵀ(ê − e)‖_{ℓ2} ≤ 3‖Qᵀ(ê − e)‖_{ℓ2} ≤ 6ε with probability at least 1 − e^{−m/2}, since ‖Qᵀ(ê − e)‖_{ℓ2} ≤ 2ε. We now apply Lemma 5.4 with Φ = Xᵀ (and 3ε in place of ε), which gives

    ‖e − ê‖_{ℓ2} ≤ 3·6ε/(a_{3k}(Xᵀ) − (1/√2) b_{2k}(Xᵀ)) = √(m/r) · 3·6ε/(a_{3k}(Yᵀ) − (1/√2) b_{2k}(Yᵀ)),    (5.14)

where Yᵀ = √(m/r) Xᵀ. The theorem is proved since it is well known that if k ≤ c r/log(m/r) for some constant c, we have a_{3k}(Yᵀ) − (1/√2) b_{2k}(Yᵀ) ≥ c₁ with probability at least 1 − O(e^{−γr}) for some universal constants c₁ and γ; this follows from available bounds on the restricted isometry constants of Gaussian matrices [9, 7, 14, 24].

5.3.2 Proof of Corollary 2.3

First, we can just assume that σ = 1, as the general case is treated by a simple rescaling. Put r = m − n. Since the random vector z follows a multivariate normal distribution with mean zero and covariance matrix I_m (I_m is the identity matrix in m dimensions), Qᵀz is also multivariate normal with mean zero and covariance matrix QᵀQ = I_r. Consequently, ‖Qᵀz‖²_{ℓ2} is distributed as a chi-squared variable with r degrees of freedom. Pick t = √(γr) in (5.1), and obtain

    P(‖Qᵀz‖²_{ℓ2} ≥ (1 + √(2γ) + γ)r) ≤ e^{−γr/2}.

Choosing γ so that 1 + √(2γ) + γ = (1 + t)², we have ‖Qᵀz‖²_{ℓ2} ≤ r(1 + t)² with probability at least 1 − e^{−γ(m−n)/2}. On this event, Theorem 2.2 asserts that

    ‖x̂ − x‖_{ℓ2} ≤ C(1 + t)√m + ‖x − x_Ideal‖_{ℓ2}.

This essentially concludes the proof of the corollary since the size of ‖x − x_Ideal‖²_{ℓ2} is about n. Indeed, ‖x − x_Ideal‖²_{ℓ2} = ‖Aᵀz‖²_{ℓ2} ∼ χ²_n as observed earlier. As a consequence, for each t′ > 0, we have ‖x − x_Ideal‖²_{ℓ2} ≤ n(1 + t′)²σ² with probability at least 1 − e^{−γ′n/2}, where γ′ is the same function of t′ as before. Selecting t′ = m/n, say, gives the result.

5.4 The LP decoder

Before we begin, we introduce the number θ_{k,k′} of a matrix Φ ∈ R^{r×m} (for k + k′ ≤ m), called the (k, k′)-restricted orthogonality constant. This is the smallest quantity such that

    |⟨Φv, Φv′⟩| ≤ θ_{k,k′} ‖v‖_{ℓ2} ‖v′‖_{ℓ2}    (5.15)

holds for all k- and k′-sparse vectors v, v′ supported on disjoint sets. Small values of restricted orthogonality constants indicate that disjoint subsets of columns span nearly orthogonal subspaces. The following lemma, which relates the number θ_{k,k′} to the extremal singular values, will prove useful.

Lemma 5.5 For any matrix Φ ∈ R^{r×m}, we have

    θ_{k,k′}(Φ) ≤ ½ (b²_{k+k′}(Φ) − a²_{k+k′}(Φ)).

Proof Consider two unit-normed vectors v and v′ with disjoint supports which are respectively k- and k′-sparse. By definition we have

    2a²_{k+k′}(Φ) ≤ ‖Φv + Φv′‖²_{ℓ2} ≤ 2b²_{k+k′}(Φ),
    2a²_{k+k′}(Φ) ≤ ‖Φv − Φv′‖²_{ℓ2} ≤ 2b²_{k+k′}(Φ),

and the conclusion follows from the parallelogram identity

    ⟨Φv, Φv′⟩ = ¼ (‖Φv + Φv′‖²_{ℓ2} − ‖Φv − Φv′‖²_{ℓ2}) ≤ ½ (b²_{k+k′}(Φ) − a²_{k+k′}(Φ)).

The argument underlying Theorem 3.1 uses an intermediate result whose proof may be found in the Appendix. Here and in the remainder of this paper, x_I is the restriction of the vector x to an index set I, and for a matrix X, X_I is the submatrix formed by selecting the columns of X with indices in I.

Lemma 5.6 Let Φ be an r × m matrix and suppose T is a set of cardinality k. For a vector h ∈ Rᵐ, we let T₁ be the k′ largest positions of h outside of T. Put T₀₁ = T ∪ T₁ and let Φ_{T₀₁} and h_{T₀₁} be the coordinate restrictions of Φ and h to T₀₁, respectively. Then

    ‖h_{T₀₁}‖_{ℓ2} ≤ (1/a²_{k+k′}(Φ)) ‖Φᵀ_{T₀₁}Φh‖_{ℓ2} + (θ_{k′,k+k′}(Φ)/a²_{k+k′}(Φ)) ‖h_{Tᶜ}‖_{ℓ1}/√k′    (5.16)

and

    ‖h‖_{ℓ2} ≤ ‖h_{T₀₁}‖_{ℓ2} + ‖h_{Tᶜ}‖_{ℓ1}/√k′.    (5.17)

5.4.1 Proof of Theorem 3.1

Just as before, it suffices to show that ‖e − ê‖_{ℓ2} ≤ C√k λ (1 − n/m)⁻¹. Set h = ê − e and let T be the support of e (which has size k). Because e is feasible for (P′∞), we have on the one hand ‖ê‖_{ℓ1} ≤ ‖e‖_{ℓ1}, which gives

    ‖e_T‖_{ℓ1} − ‖h_T‖_{ℓ1} + ‖h_{Tᶜ}‖_{ℓ1} ≤ ‖e + h‖_{ℓ1} ≤ ‖e‖_{ℓ1}  ⟹  ‖h_{Tᶜ}‖_{ℓ1} ≤ ‖h_T‖_{ℓ1}.

Note that this has an interesting consequence since, by Cauchy–Schwarz,

    ‖h_{Tᶜ}‖_{ℓ1} ≤ ‖h_T‖_{ℓ1} ≤ √k ‖h_T‖_{ℓ2}.    (5.18)

On the other hand,

    ‖QQᵀh‖_{ℓ∞} ≤ ‖QQᵀ(ê − y)‖_{ℓ∞} + ‖QQᵀ(y − e)‖_{ℓ∞} ≤ 2λ.    (5.19)

The ingredients are now in place to establish the claim. We set k′ = k, apply Lemma 5.6 (with Φ = Qᵀ) to the vector h = ê − e, and obtain

    ‖h‖_{ℓ2} ≤ 2‖h_{T₀₁}‖_{ℓ2},  and  ‖h_{T₀₁}‖_{ℓ2} ≤ (a²_{2k}(Qᵀ) − θ_{k,2k}(Qᵀ))⁻¹ ‖(QQᵀh)_{T₀₁}‖_{ℓ2}.    (5.20)

Since each component of (QQᵀh)_{T₀₁} is at most equal to 2λ (see (5.19)), we have ‖(QQᵀh)_{T₀₁}‖_{ℓ2} ≤ 2√(2k) λ. We then conclude from Lemma 5.5 that

    ‖h‖_{ℓ2} ≤ 4√(2k) λ / (a²_{2k}(Qᵀ) + ½ a²_{3k}(Qᵀ) − ½ b²_{3k}(Qᵀ)).    (5.21)

For each k, recall the relations a²_k(Qᵀ) = 1 − b²_k(Aᵀ) and b²_k(Qᵀ) = 1 − a²_k(Aᵀ), which give

    ‖h‖_{ℓ2} ≤ 4√(2k) λ / D,  D := (1 − b²_{2k}(Aᵀ)) − ½ b²_{3k}(Aᵀ) + ½ a²_{3k}(Aᵀ).

Now, just as before, it follows from our definitions that for each k, b²_k(Aᵀ) ≤ (n/m)(1 + δ_k) and a²_k(Aᵀ) ≥ (n/m)(1 − δ_k). These inequalities imply

    D ≥ 1 − (n/m)(1 + δ_{2k} + δ_{3k}).

Therefore, if one assumes that

    δ_{2k} + δ_{3k} ≤ c (m/n − 1)

for some fixed constant 0 < c < 1, then

    ‖e − ê‖_{ℓ2} = ‖h‖_{ℓ2} ≤ (4√(2k)/(1 − c)) λ/(1 − n/m).

This establishes the first part of the theorem.

We turn to the second part of the claim; if the orthonormal columns of A are chosen uniformly at random, we show that the error bound (3.5) is valid with large probability as long as we have a constant fraction of gross errors. To do this, it suffices to show that the denominator D in (5.21) obeys

    D ≥ (3/2) a²_{3k}(Qᵀ) − ½ b²_{3k}(Qᵀ) ≥ r/(2m).

This follows from Lemma 5.3. If k is sufficiently small, we have that a_{3k}(Qᵀ) ≥ (7/8)√(r/m) and b_{3k}(Qᵀ) ≤ (9/8)√(r/m), except on a set of exponentially small probability, which gives

    D ≥ (r/m)((3/2)(7/8)² − ½(9/8)²) ≥ r/(2m).

5.4.2 Proof of Corollary 3.2

First, we can just assume that σ = 1, as the general case is treated by a simple rescaling. The random vector QQᵀz follows a multivariate normal distribution with mean zero and covariance matrix QQᵀ. In particular, (QQᵀz)_i ∼ N(0, s²_i), where s²_i = (QQᵀ)_{i,i}. This implies that z*_i = (QQᵀz)_i/s_i is standard normal with density φ(t) = (2π)^{−1/2}e^{−t²/2}. For each i, P(|z*_i| > t) ≤ 2φ(t)/t, and thus

    P(sup_{1≤i≤m} |z*_i| ≥ t) ≤ 2m φ(t)/t.

With t = √(2 log m), this gives P(sup_{1≤i≤m} |z*_i| ≥ √(2 log m)) ≤ 1/√(π log m). Better bounds are possible, but we will not pursue these refinements here. Observe now that s²_i = ‖Q_{i,·}‖²_{ℓ2} = 1 − ‖A_{i,·}‖²_{ℓ2}, and since λ_i = √(2 log m) ‖Q_{i,·}‖_{ℓ2}, we have that

    |(QQᵀz)_i| ≤ λ_i,  1 ≤ i ≤ m,    (5.22)

with probability at least 1 − 1/√(π log m). On the event (5.22), Theorem 3.1 then shows that

    ‖x̂ − x‖_{ℓ2} ≤ C√k (m/r) max_i λ_i + ‖x − x_Ideal‖_{ℓ2}.    (5.23)

We claim that

    max_i λ_i = √(2 log m) max_i ‖Q_{i,·}‖_{ℓ2} ≤ √(2 log m) √(3r/(2m))    (5.24)

with probability at least 1 − e^{−γm} for some positive constant γ. Combining (5.23) and (5.24) yields

    ‖x̂ − x‖_{ℓ2} ≤ C √(2k (log m)/n) (1 − n/m)⁻¹ √n + ‖x − x_Ideal‖_{ℓ2}.

This would essentially conclude the proof of the corollary since the size of ‖x − x_Ideal‖²_{ℓ2} is about n. Exact bounds for ‖x − x_Ideal‖_{ℓ2} are found in the proof of Corollary 2.3 and we do not repeat the argument.

It remains to check why (5.24) is true. For r ≥ 2m/3, and since ‖Q_{i,·}‖_{ℓ2} ≤ 1, the claim holds with probability 1 because then 3r/(2m) ≥ 1! For r ≤ 2m/3, it follows from ‖Q_{i,·}‖²_{ℓ2} + ‖A_{i,·}‖²_{ℓ2} = 1 that

    P(max_i ‖Q_{i,·}‖²_{ℓ2} ≥ 3r/(2m)) = P(min_i ‖A_{i,·}‖²_{ℓ2} ≤ 1 − 3r/(2m)) ≤ m P(‖A_{1,·}‖²_{ℓ2} ≤ (n/m)(1 − r/(2n))).

The claim follows by applying Lemma 5.2 since r/(2n) ≤ 1.

6 Discussion

We have introduced two decoding strategies for recovering a block x ∈ Rⁿ of n pieces of information from a codeword Ax which has been corrupted both by adversarial and small errors. Our methods are concrete, efficient and guaranteed to perform well. Because we are working with real-valued

inputs, we emphasize that this work has nothing to do with the use of linear programming methods proposed by Feldman and his colleagues to decode binary codes such as turbo codes or low-density parity-check codes [16, 17, 18]. Instead, it has much to do with the recent literature on compressive sampling or compressed sensing [4, 9, 13, 5, 10, 3]; see also [26, 21] for related work. In particular, it draws on recent results [5, 8] showing that the theory and practice of compressed sensing is robust vis-a-vis noise.

On the practical end, we truly recommend using the two-step refinement discussed in Section 4 (the reprojection step), as this really tends to enhance the performance. We anticipate that other tweaks of this kind might also work and provide additional enhancement. On the theoretical end, we have not tried to obtain the best possible constants, and there is little doubt that a more careful analysis will provide sharper constants. Also, we presented some results for coding matrices with orthonormal columns for ease of exposition, but this is unessential. In fact, our results can be extended to nonorthogonal matrices. For instance, one could just as well obtain similar results for m × n coding matrices A with independent Gaussian entries.

There are also variations on how one might want to decode. We focused on constraints of the form ‖P_{V⊥}z̃‖, where ‖·‖ is either the ℓ2 norm or the ℓ∞ norm, and P_{V⊥} is the orthoprojector onto V⊥, the orthogonal subspace to the column space of A. But one could also imagine choosing other types of constraints, e.g. of the form ‖Xᵀz̃‖_{ℓ2} ≤ ε for (P2), or ‖XXᵀz̃‖_{ℓ∞} ≤ λ for (P∞) (or constraints about the individual magnitudes of the coordinates (XXᵀz̃)_i in the more general formulation), where the columns of X span V⊥. In fact, one could choose the decoding matrix X first, and then A so that the ranges of A and X are orthogonal. Choosing X ∈ R^{m×r} with i.i.d. mean-zero Gaussian entries and applying the LP decoder with a constraint on ‖XXᵀz̃‖_{ℓ∞} instead of ‖z̃‖_{ℓ∞} would simplify the argument, since restricted isometry constants for Gaussian matrices are already readily available [9, 7, 14, 24]! Finally, we discussed the use of coding matrices which have fast algorithms, thus enabling large-scale problems. Exploring further opportunities in this area seems a worthy pursuit.

7 Appendix: Proof of Lemma 5.6

The proof is a variation on that of Lemma 3.1 in [8]. In the sequel, T ⊂ {1, ..., m} is a set of size k, T₁ is the k′ largest positions of h outside of T, and T₀₁ = T ∪ T₁. Next, divide Tᶜ into subsets of size k′ and enumerate Tᶜ as n₁, n₂, ..., n_{m−|T|} in decreasing order of magnitude of h_{Tᶜ}. Set T_j = {n_l : (j − 1)k′ + 1 ≤ l ≤ jk′}. That is, T₁ is as before and contains the indices of the k′ largest coefficients of h_{Tᶜ}, T₂ contains the indices of the next k′ largest coefficients, and so on. Observe that Φh_{T₀₁} = Φh − Σ_{j≥2} Φh_{T_j}, so that

    ‖Φh_{T₀₁}‖²_{ℓ2} = ⟨Φh_{T₀₁}, Φh⟩ − Σ_{j≥2} ⟨Φh_{T₀₁}, Φh_{T_j}⟩.

On the one hand, we have

    ⟨Φh_{T₀₁}, Φh⟩ = ⟨h_{T₀₁}, Φᵀ_{T₀₁}Φh⟩ ≤ ‖h_{T₀₁}‖_{ℓ2} ‖Φᵀ_{T₀₁}Φh‖_{ℓ2},

and on the other,

    |⟨Φh_{T₀₁}, Φh_{T_j}⟩| ≤ θ_{k+k′,k′} ‖h_{T₀₁}‖_{ℓ2} ‖h_{T_j}‖_{ℓ2}.

This gives

    a²_{k+k′} ‖h_{T₀₁}‖²_{ℓ2} ≤ ‖Φh_{T₀₁}‖²_{ℓ2} ≤ ‖h_{T₀₁}‖_{ℓ2} (‖Φᵀ_{T₀₁}Φh‖_{ℓ2} + θ_{k+k′,k′} Σ_{j≥2} ‖h_{T_j}‖_{ℓ2}),    (7.1)

where for simplicity, we have omitted the dependence on Φ in the constants a_k(Φ) and θ_{k,k′}(Φ). We then develop an upper bound on Σ_{j≥2} ‖h_{T_j}‖_{ℓ2} as in [5]. By construction, the magnitude of each coefficient in T_{j+1} is less than the average of the magnitudes in T_j,

    ‖h_{T_{j+1}}‖_{ℓ∞} ≤ ‖h_{T_j}‖_{ℓ1}/k′  ⟹  ‖h_{T_{j+1}}‖²_{ℓ2} ≤ ‖h_{T_j}‖²_{ℓ1}/k′.

Therefore,

    Σ_{j≥2} ‖h_{T_j}‖_{ℓ2} ≤ Σ_{j≥1} ‖h_{T_j}‖_{ℓ1}/√k′ = ‖h_{Tᶜ}‖_{ℓ1}/√k′.    (7.2)

Hence, we deduce from (7.1) that

    ‖h_{T₀₁}‖_{ℓ2} ≤ ‖Φᵀ_{T₀₁}Φh‖_{ℓ2}/a²_{k+k′} + (θ_{k+k′,k′}/a²_{k+k′}) ‖h_{Tᶜ}‖_{ℓ1}/√k′,

which proves the first part of the lemma. For the second part, it follows from (7.2) that

    ‖h_{T₀₁ᶜ}‖_{ℓ2} = ‖Σ_{j≥2} h_{T_j}‖_{ℓ2} ≤ Σ_{j≥2} ‖h_{T_j}‖_{ℓ2} ≤ ‖h_{Tᶜ}‖_{ℓ1}/√k′.

References

[1] A. Barvinok. A course in convexity (Graduate Studies in Mathematics 54). American Mathematical Society, Providence, RI, 2002.

[2] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, Cambridge, 2004.

[3] E. J. Candès and J. Romberg. Practical signal recovery from random projections. In SPIE International Symposium on Electronic Imaging: Computational Imaging III. SPIE, January 2005.

[4] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, February 2006.

[5] E. J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. on Pure and Applied Math., 59(8):1207–1223, 2006.

[6] E. J. Candès, M. Rudelson, T. Tao, and R. Vershynin. Error correction via linear programming. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 295–308, 2005.

[7] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Trans. Inform. Theory, 51(12):4203–4215, December 2005.

[8] E. J. Candès and T. Tao. The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist., 2005. To appear.

[9] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406–5425, December 2006.

[10] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. Preprint, 2006.

[11] K. R. Davidson and S. J. Szarek. Local operator theory, random matrices and Banach spaces. In Handbook of the geometry of Banach spaces, Vol. I, pages 317–366. North-Holland, Amsterdam, 2001.

[12] D. L. Donoho. Neighborly polytopes and sparse solutions of underdetermined linear equations. Technical Report, Stanford University, 2005.

[13] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, April 2006.

[14] D. L. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Comm. Pure Appl. Math., 59(6):797–829, 2006.

[15] C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of LP decoding. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, San Diego, California, USA, pages 85–94, 2007.

[16] J. Feldman. Decoding Error-Correcting Codes via Linear Programming. PhD thesis, Massachusetts Institute of Technology, 2003.

[17] J. Feldman. LP decoding achieves capacity. In SODA. SIAM, 2005.

[18] J. Feldman, T. Malkin, C. Stein, R. A. Servedio, and M. J. Wainwright. LP decoding corrects a constant fraction of errors. In Proc. IEEE International Symposium on Information Theory (ISIT). IEEE, 2004.

[19] I. M. Johnstone. Chi-square oracle inequalities. In State of the art in probability and statistics (Leiden, 1999), volume 36 of IMS Lecture Notes Monogr. Ser. Inst. Math. Statist., Beachwood, OH, 2001.

[20] B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. Ann. Statist., 28(5):1302–1338, 2000.

[21] P. Marziliano, M. Vetterli, and T. Blu. Sampling and exact reconstruction of bandlimited signals with additive shot noise. IEEE Trans. Inform. Theory, 52(5):2230–2233, 2006.

[22] G. Pisier. The Volume of Convex Bodies and Banach Space Geometry. Cambridge Tracts in Mathematics 94, Cambridge University Press, Cambridge, 1989.

[23] M. Rudelson and R. Vershynin. Geometric approach to error-correcting codes and reconstruction of signals. Int. Math. Res. Not., (64):4019–4041, 2005.

[24] M. Rudelson and R. Vershynin. Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements. In CISS, 2006.

[25] J. Tropp and A. Gilbert. Signal recovery from partial information via orthogonal matching pursuit. Submitted manuscript, April 2005.

[26] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. IEEE Trans. Signal Process., 50(6):1417–1428, 2002.


ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

Compressive Sensing Over Networks

Compressive Sensing Over Networks Forty-Eighth Annual Allerton Conference Allerton House, UIUC, Illinois, USA Septeber 29 - October, 200 Copressive Sensing Over Networks Soheil Feizi MIT Eail: sfeizi@it.edu Muriel Médard MIT Eail: edard@it.edu

More information

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Tie-Varying Jaing Links Jun Kurihara KDDI R&D Laboratories, Inc 2 5 Ohara, Fujiino, Saitaa, 356 8502 Japan Eail: kurihara@kddilabsjp

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

Tail estimates for norms of sums of log-concave random vectors

Tail estimates for norms of sums of log-concave random vectors Tail estiates for nors of sus of log-concave rando vectors Rados law Adaczak Rafa l Lata la Alexander E. Litvak Alain Pajor Nicole Toczak-Jaegerann Abstract We establish new tail estiates for order statistics

More information

Hamming Compressed Sensing

Hamming Compressed Sensing Haing Copressed Sensing Tianyi Zhou, and Dacheng Tao, Meber, IEEE Abstract arxiv:.73v2 [cs.it] Oct 2 Copressed sensing CS and -bit CS cannot directly recover quantized signals and require tie consuing

More information

Necessity of low effective dimension

Necessity of low effective dimension Necessity of low effective diension Art B. Owen Stanford University October 2002, Orig: July 2002 Abstract Practitioners have long noticed that quasi-monte Carlo ethods work very well on functions that

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

Exact tensor completion with sum-of-squares

Exact tensor completion with sum-of-squares Proceedings of Machine Learning Research vol 65:1 54, 2017 30th Annual Conference on Learning Theory Exact tensor copletion with su-of-squares Aaron Potechin Institute for Advanced Study, Princeton David

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Lecture 9 November 23, 2015

Lecture 9 November 23, 2015 CSC244: Discrepancy Theory in Coputer Science Fall 25 Aleksandar Nikolov Lecture 9 Noveber 23, 25 Scribe: Nick Spooner Properties of γ 2 Recall that γ 2 (A) is defined for A R n as follows: γ 2 (A) = in{r(u)

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

On the Use of A Priori Information for Sparse Signal Approximations

On the Use of A Priori Information for Sparse Signal Approximations ITS TECHNICAL REPORT NO. 3/4 On the Use of A Priori Inforation for Sparse Signal Approxiations Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst Signal Processing Institute ITS) Ecole Polytechnique

More information

Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization

Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization Robust Principal Coponent Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optiization John Wright, Arvind Ganesh, Shankar Rao, and Yi Ma Departent of Electrical Engineering University

More information

Chaotic Coupled Map Lattices

Chaotic Coupled Map Lattices Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each

More information

Reed-Muller codes for random erasures and errors

Reed-Muller codes for random erasures and errors Reed-Muller codes for rando erasures and errors Eanuel Abbe Air Shpilka Avi Wigderson Abstract This paper studies the paraeters for which Reed-Muller (RM) codes over GF (2) can correct rando erasures and

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

1 Generalization bounds based on Rademacher complexity

1 Generalization bounds based on Rademacher complexity COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

Physics 215 Winter The Density Matrix

Physics 215 Winter The Density Matrix Physics 215 Winter 2018 The Density Matrix The quantu space of states is a Hilbert space H. Any state vector ψ H is a pure state. Since any linear cobination of eleents of H are also an eleent of H, it

More information

Principal Components Analysis

Principal Components Analysis Principal Coponents Analysis Cheng Li, Bingyu Wang Noveber 3, 204 What s PCA Principal coponent analysis (PCA) is a statistical procedure that uses an orthogonal transforation to convert a set of observations

More information

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding IEEE TRANSACTIONS ON INFORMATION THEORY (SUBMITTED PAPER) 1 Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding Lai Wei, Student Meber, IEEE, David G. M. Mitchell, Meber, IEEE, Thoas

More information

Recovery of Sparsely Corrupted Signals

Recovery of Sparsely Corrupted Signals TO APPEAR IN IEEE TRANSACTIONS ON INFORMATION TEORY 1 Recovery of Sparsely Corrupted Signals Christoph Studer, Meber, IEEE, Patrick Kuppinger, Student Meber, IEEE, Graee Pope, Student Meber, IEEE, and

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

The Transactional Nature of Quantum Information

The Transactional Nature of Quantum Information The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

Multi-Scale/Multi-Resolution: Wavelet Transform

Multi-Scale/Multi-Resolution: Wavelet Transform Multi-Scale/Multi-Resolution: Wavelet Transfor Proble with Fourier Fourier analysis -- breaks down a signal into constituent sinusoids of different frequencies. A serious drawback in transforing to the

More information

Least Squares Fitting of Data

Least Squares Fitting of Data Least Squares Fitting of Data David Eberly, Geoetric Tools, Redond WA 98052 https://www.geoetrictools.co/ This work is licensed under the Creative Coons Attribution 4.0 International License. To view a

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

List Scheduling and LPT Oliver Braun (09/05/2017)

List Scheduling and LPT Oliver Braun (09/05/2017) List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)

More information

ORIE 6340: Mathematics of Data Science

ORIE 6340: Mathematics of Data Science ORIE 6340: Matheatics of Data Science Daek Davis Contents 1 Estiation in High Diensions 1 1.1 Tools for understanding high-diensional sets................. 3 1.1.1 Concentration of volue in high-diensions...............

More information

Solutions of some selected problems of Homework 4

Solutions of some selected problems of Homework 4 Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next

More information

1 Identical Parallel Machines

1 Identical Parallel Machines FB3: Matheatik/Inforatik Dr. Syaantak Das Winter 2017/18 Optiizing under Uncertainty Lecture Notes 3: Scheduling to Miniize Makespan In any standard scheduling proble, we are given a set of jobs J = {j

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs On the Inapproxiability of Vertex Cover on k-partite k-unifor Hypergraphs Venkatesan Guruswai and Rishi Saket Coputer Science Departent Carnegie Mellon University Pittsburgh, PA 1513. Abstract. Coputing

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm Acta Polytechnica Hungarica Vol., No., 04 Sybolic Analysis as Universal Tool for Deriving Properties of Non-linear Algoriths Case study of EM Algorith Vladiir Mladenović, Miroslav Lutovac, Dana Porrat

More information

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality

More information

Weighted Superimposed Codes and Constrained Integer Compressed Sensing

Weighted Superimposed Codes and Constrained Integer Compressed Sensing Weighted Superiposed Codes and Constrained Integer Copressed Sensing Wei Dai and Olgica Milenovic Dept. of Electrical and Coputer Engineering University of Illinois, Urbana-Chapaign Abstract We introduce

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

A Bernstein-Markov Theorem for Normed Spaces

A Bernstein-Markov Theorem for Normed Spaces A Bernstein-Markov Theore for Nored Spaces Lawrence A. Harris Departent of Matheatics, University of Kentucky Lexington, Kentucky 40506-0027 Abstract Let X and Y be real nored linear spaces and let φ :

More information

arxiv: v1 [math.na] 10 Oct 2016

arxiv: v1 [math.na] 10 Oct 2016 GREEDY GAUSS-NEWTON ALGORITHM FOR FINDING SPARSE SOLUTIONS TO NONLINEAR UNDERDETERMINED SYSTEMS OF EQUATIONS MÅRTEN GULLIKSSON AND ANNA OLEYNIK arxiv:6.395v [ath.na] Oct 26 Abstract. We consider the proble

More information

arxiv: v1 [math.nt] 14 Sep 2014

arxiv: v1 [math.nt] 14 Sep 2014 ROTATION REMAINDERS P. JAMESON GRABER, WASHINGTON AND LEE UNIVERSITY 08 arxiv:1409.411v1 [ath.nt] 14 Sep 014 Abstract. We study properties of an array of nubers, called the triangle, in which each row

More information

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13 CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information