arXiv:0809.0745v2 [cs.IT] 10 Aug 2009

Sparse recovery by non-convex optimization - instance optimality

Rayan Saab
Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, B.C. Canada V6T 1Z4

Özgür Yılmaz
Department of Mathematics, University of British Columbia, Vancouver, B.C. Canada V6T 1Z2

Abstract

In this note, we address the theoretical properties of $\Delta_p$, a class of compressed sensing decoders that rely on $\ell^p$ minimization with $0 < p < 1$ to recover estimates of sparse and compressible signals from incomplete and inaccurate measurements. In particular, we extend the results of Candès, Romberg and Tao [4] and Wojtaszczyk [30] regarding the decoder $\Delta_1$, based on $\ell^1$ minimization, to $\Delta_p$ with $0 < p < 1$. Our results are two-fold. First, we show that under certain sufficient conditions that are weaker than the analogous sufficient conditions for $\Delta_1$, the decoders $\Delta_p$ are robust to noise and stable in the sense that they are $(2,p)$ instance optimal for a large class of encoders. Second, we extend the results of Wojtaszczyk to show that, like $\Delta_1$, the decoders $\Delta_p$ are $(2,2)$ instance optimal in probability provided the measurement matrix is drawn from an appropriate distribution.

Email addresses: rayans@ece.ubc.ca (Rayan Saab), oyilmaz@math.ubc.ca (Özgür Yılmaz).
1 This work was supported in part by a Discovery Grant and by a CRD Grant (DNOISE) from the Natural Sciences and Engineering Research Council of Canada. R. Saab also acknowledges a UGF award from the UBC, and a Pacific Century Graduate Scholarship from the Province of British Columbia through the Ministry of Advanced Education.
Preprint submitted to Elsevier 23 October 2018

1 Introduction

The sparse recovery problem has received a lot of attention lately, both because of its role in transform coding with redundant dictionaries (e.g., [9, 28, 29]), and perhaps more importantly because it inspired compressed sensing [3, 4, 13], a novel method of acquiring signals with certain properties more efficiently compared to the classical approach based on Nyquist-Shannon sampling theory.

Define $\Sigma_S^N$ to be the set of all $S$-sparse vectors, i.e.,
\[ \Sigma_S^N := \{x \in \mathbb{R}^N : |\mathrm{supp}(x)| \le S\}, \]
and define compressible vectors as vectors that can be well approximated in $\Sigma_S^N$. Let $\sigma_S(x)_{\ell^p}$ denote the best $S$-term approximation error of $x$ in the $\ell^p$ (quasi-)norm where $p > 0$, i.e.,
\[ \sigma_S(x)_{\ell^p} := \min_{v \in \Sigma_S^N} \|x - v\|_p. \]
Throughout the text, $A$ denotes an $M \times N$ real matrix where $M < N$. Let the associated encoder be the map $x \mapsto Ax$ (also denoted by $A$). The transform coding and compressed sensing problems mentioned above require the existence of decoders, say $\Delta : \mathbb{R}^M \to \mathbb{R}^N$, with roughly the following properties:

(C1) $\Delta(Ax) = x$ whenever $x \in \Sigma_S^N$ with sufficiently small $S$.
(C2) $\|x - \Delta(Ax + e)\| \lesssim \|e\| + \sigma_S(x)_{\ell^p}$, where the norms are appropriately chosen. Here $e$ denotes measurement error, e.g., thermal and computational noise.
(C3) $\Delta(Ax)$ can be computed efficiently (in some sense).

Below, we denote the (in general noisy) encoding of $x$ by $b$, i.e.,
\[ b = Ax + e. \tag{1} \]
In general, the problem of constructing decoders with properties (C1)-(C3) is non-trivial (even in the noise-free case) as $A$ is overcomplete, i.e., the linear system of $M$ equations in (1) is underdetermined, and thus, if consistent, it admits infinitely many solutions. In order for a decoder to satisfy (C1)-(C3), it must choose the correct solution among these infinitely many solutions. Under the assumption that the original signal $x$ is sparse, one can phrase the problem of finding the desired solution as an optimization problem where the objective is to maximize an appropriate measure of sparsity while simultaneously satisfying the constraints defined by (1). In the noise-free case, i.e., when $e = 0$ in (1), under certain conditions on the $M \times N$ matrix $A$, i.e., if $A$ is in general position, there is a decoder $\Delta_0$ which satisfies $\Delta_0(Ax) = x$ for all $x \in \Sigma_S^N$ whenever $S < M/2$, e.g., see [14]. This decoder can be explicitly computed via the optimization problem
\[ \Delta_0(b) := \arg\min_y \|y\|_0 \quad \text{subject to } b = Ay. \tag{2} \]

Here $\|y\|_0$ denotes the number of non-zero entries of the vector $y$, equivalently its so-called $\ell^0$-norm. Clearly, the sparsity of $y$ is reflected by its $\ell^0$-norm.

1.1 Decoding by $\ell^1$ minimization

As mentioned above, $\Delta_0(Ax) = x$ exactly if $x$ is sufficiently sparse depending on the matrix $A$. However, the associated optimization problem is combinatorial in nature, thus its complexity grows quickly as $N$ becomes much larger than $M$. Naturally, one then seeks to modify the optimization problem so that it lends itself to solution methods that are more tractable than combinatorial search. In fact, in the noise-free setting, the decoder defined by $\ell^1$ minimization, given by
\[ \Delta_1(b) := \arg\min_y \|y\|_1 \quad \text{subject to } Ay = b, \tag{3} \]
recovers $x$ exactly if $x$ is sufficiently sparse and the matrix $A$ has certain properties (e.g., [4, 6, 9, 14, 15, 26]). In particular, it has been shown in [4] that if $x \in \Sigma_S^N$ and $A$ satisfies a certain restricted isometry property, e.g., $\delta_{3S} < 1/3$ or more generally $\delta_{(k+1)S} < \frac{k-1}{k+1}$ for some $k > 1$ such that $k \in \frac{1}{S}\mathbb{N}$, then $\Delta_1(Ax) = x$ (in what follows, $\mathbb{N}$ denotes the set of positive integers, i.e., $0 \notin \mathbb{N}$). Here $\delta_S$ are the $S$-restricted isometry constants of $A$, as introduced by Candès, Romberg and Tao (see, e.g., [4]), defined as the smallest constants satisfying
\[ (1 - \delta_S)\|c\|_2^2 \le \|Ac\|_2^2 \le (1 + \delta_S)\|c\|_2^2 \tag{4} \]
for every $c \in \Sigma_S^N$. Throughout the paper, using the notation of [30], we say that a matrix satisfies RIP$(S,\delta)$ if $\delta_S < \delta$.

Checking whether a given matrix satisfies a certain RIP is computationally intensive, and becomes rapidly intractable as the size of the matrix increases. On the other hand, there are certain classes of random matrices which have favorable RIP. In fact, let $A$ be an $M \times N$ matrix the columns of which are independent, identically distributed (i.i.d.) random vectors with any sub-Gaussian distribution.
It has been shown that $A$ satisfies RIP$(S,\delta)$ with any $0 < \delta < 1$ when
\[ S \le c_1 M / \log(N/M), \tag{5} \]
with probability greater than $1 - 2e^{-c_2 M}$ (see, e.g., [1], [5], [6]), where $c_1$ and $c_2$ are positive constants that only depend on $\delta$ and on the actual distribution from which $A$ is drawn.

In addition to recovering sparse vectors from error-free observations, it is important that the decoder be robust to noise and stable with regards to the compressibility of $x$. In other words, we require that the reconstruction error scale well with the measurement error and with the non-sparsity of the signal (i.e., (C2) above). For matrices that satisfy RIP$((k+1)S,\delta)$, with $\delta < \frac{k-1}{k+1}$ for some $k > 1$ such that $k \in \frac{1}{S}\mathbb{N}$, it has been shown in [4] that there exists a feasible decoder $\Delta_1^\epsilon$ for which the approximation error $\|\Delta_1^\epsilon(b) - x\|_2$ scales linearly with the measurement error $\|e\|_2 \le \epsilon$ and with $\sigma_S(x)_{\ell^1}$. More specifically, define the decoder
\[ \Delta_1^\epsilon(b) = \arg\min_y \|y\|_1 \quad \text{subject to } \|Ay - b\|_2 \le \epsilon. \tag{6} \]
The following theorem of Candès et al. in [4] provides error guarantees when $x$ is not sparse and when the observation is noisy.

Theorem 1.1 [4] Fix $\epsilon \ge 0$, suppose that $x$ is arbitrary, and let $b = Ax + e$ where $\|e\|_2 \le \epsilon$. If $A$ satisfies $\delta_{3S} + 3\delta_{4S} < 2$, then
\[ \|\Delta_1^\epsilon(b) - x\|_2 \le C_{1,S}\,\epsilon + C_{2,S}\,\frac{\sigma_S(x)_{\ell^1}}{\sqrt{S}}. \tag{7} \]
For reasonable values of $\delta_{4S}$, the constants are well behaved; e.g., $C_{1,S} = 12.04$ and $C_{2,S} = 8.77$ for $\delta_{4S} = 1/5$.

Remark 1.2 This means that given $b = Ax + e$, where $x$ is sufficiently sparse, $\Delta_1^\epsilon(b)$ recovers the underlying sparse signal within the noise level. Consequently the recovery is perfect if $\epsilon = 0$.

Remark 1.3 By explicitly assuming $x$ to be sparse, Candès et al. [4] proved a version of the above result with smaller constants, i.e., for $b = Ax + e$ with $x \in \Sigma_S^N$ and $\|e\|_2 \le \epsilon$,
\[ \|\Delta_1^\epsilon(b) - x\|_2 \le C_S\,\epsilon, \tag{8} \]
where $C_S < C_{1,S}$.

Remark 1.4 Recently, Candès [2] showed that $\delta_{2S} < \sqrt{2} - 1$ is sufficient to guarantee robust and stable recovery in the sense of (7) with slightly better constants.

In the noise-free case, i.e., when $\epsilon = 0$, the reconstruction error in Theorem 1.1 is bounded above by $C_{2,S}\,\sigma_S(x)_{\ell^1}/\sqrt{S}$, see (7). This upper bound would sharpen if one could replace $\sigma_S(x)_{\ell^1}/\sqrt{S}$ with $\sigma_S(x)_{\ell^2}$ on the right hand side of (7) (note that $\sigma_S(x)_{\ell^1}$ can be large even if all the entries of the reconstruction error are small but nonzero; this follows from the fact that for any vector $y \in \mathbb{R}^N$, $\|y\|_2 \le \|y\|_1 \le \sqrt{N}\,\|y\|_2$, and consequently there are vectors $x \in \mathbb{R}^N$ for which $\sigma_S(x)_{\ell^1}/\sqrt{S} \gg \sigma_S(x)_{\ell^2}$, especially when $N$ is large).
In [10] it was shown that the term $C_{2,S}\,\sigma_S(x)_{\ell^1}/\sqrt{S}$ on the right hand side of (7) cannot be replaced with $C\sigma_S(x)_{\ell^2}$ if one seeks the inequality to hold for all $x \in \mathbb{R}^N$ with a fixed matrix $A$, unless $M > cN$ for some constant $c$. This is unsatisfactory since the paradigm of compressed sensing relies on the ability of recovering sparse or compressible vectors $x$ from significantly fewer measurements than the ambient dimension $N$.

Even though one cannot obtain bounds on the approximation error in terms of $\sigma_S(x)_{\ell^2}$ with constants that are uniform in $x$ (with a fixed matrix $A$), the situation is significantly better if we relax the uniformity requirement and seek a version of (7) that holds with high probability. Indeed, it has been recently shown by Wojtaszczyk that for any specific $x$, $\sigma_S(x)_{\ell^2}$ can be placed in (7) in lieu of $\sigma_S(x)_{\ell^1}/\sqrt{S}$ (with different constants that are still independent of $x$) with high probability on the draw of $A$ if (i) $M > cS\log N$ and (ii) the entries of $A$ are drawn i.i.d. from a Gaussian distribution or the columns of $A$ are drawn i.i.d. from the uniform distribution on the unit sphere in $\mathbb{R}^M$ [30]. In other words, the decoder $\Delta_1 = \Delta_1^0$ is $(2,2)$ instance optimal in probability for encoders associated with such $A$, a property which was discussed in [10].

Following the notation of [30], we say that an encoder-decoder pair $(A,\Delta)$ is $(q,p)$ instance optimal of order $S$ with constant $C$ if
\[ \|\Delta(Ax) - x\|_q \le C\,\frac{\sigma_S(x)_{\ell^p}}{S^{1/p - 1/q}} \tag{9} \]
holds for all $x \in \mathbb{R}^N$. Moreover, for random matrices $A_\omega$, $(A_\omega,\Delta)$ is said to be $(q,p)$ instance optimal in probability if for any $x$ (9) holds with high probability on the draw of $A_\omega$. Note that with this notation Theorem 1.1 implies that $(A,\Delta_1)$ is $(2,1)$ instance optimal (set $\epsilon = 0$), provided $A$ satisfies the conditions of the theorem.

The preceding discussion makes it clear that $\Delta_1$ satisfies conditions (C1) and (C2), at least when $A$ is a sub-Gaussian random matrix and $S$ is sufficiently small. It only remains to note that decoding by $\Delta_1$ amounts to solving an $\ell^1$ minimization problem, and is thus tractable, i.e., we also have (C3). In fact, $\ell^1$ minimization problems as described above can be solved efficiently with solvers specifically designed for the sparse recovery scenarios (e.g., [27], [16], [11]).
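To make the gap between the two error measures concrete, the following minimal sketch computes the best $S$-term approximation error $\sigma_S(x)_{\ell^p}$ and compares $\sigma_S(x)_{\ell^1}/\sqrt{S}$ with $\sigma_S(x)_{\ell^2}$ on a flat vector. The function name `best_s_term_error` and the test vector are illustrative assumptions, not part of the paper.

```python
import numpy as np

def best_s_term_error(x, S, p):
    """sigma_S(x)_{l^p}: the l^p (quasi-)norm of x after removing its S
    largest-magnitude entries (hypothetical helper, p > 0)."""
    mags = np.sort(np.abs(np.asarray(x, dtype=float)))  # ascending order
    tail = mags[: max(mags.size - S, 0)]                # everything but the S largest
    return float(np.sum(tail ** p) ** (1.0 / p))

# A flat vector is far from sparse: every entry carries the same energy.
N, S = 1000, 10
x = np.ones(N)
l1_term = best_s_term_error(x, S, 1) / np.sqrt(S)  # right hand side of (7), up to C_{2,S}
l2_term = best_s_term_error(x, S, 2)               # the sharper quantity
ratio = l1_term / l2_term                          # equals sqrt((N - S) / S) for this x
print(l1_term, l2_term, ratio)
```

For this particular vector the ratio is $\sqrt{(N-S)/S}$, so it grows with $N$ for fixed $S$, which is the effect described above.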
1.2 Decoding by $\ell^p$ minimization

We have so far seen that with appropriate encoders, the decoders $\Delta_1^\epsilon$ provide robust and stable recovery for compressible signals even when the measurements are noisy [4], and that $(A_\omega,\Delta_1)$ is $(2,2)$ instance optimal in probability [30] when $A_\omega$ is an appropriate random matrix. In particular, stability and robustness properties are conditioned on an appropriate RIP while the instance optimality property is dependent on the draw of the encoder matrix (which is typically called the measurement matrix) from an appropriate distribution, in addition to RIP.
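Since these guarantees are conditioned on RIP, and since verifying (4) exactly requires examining every support of size $S$, one can only estimate $\delta_S$ in practice. The sketch below is one simple way to do so, assuming random sampling of supports; it returns a lower bound on $\delta_S$, because supports that are not sampled may be worse. The function name, the number of trials, and the matrix size are assumptions made for illustration.

```python
import numpy as np

def estimate_rip_lower_bound(A, S, trials=200, seed=0):
    """Monte Carlo lower bound on the S-restricted isometry constant of A.

    For each random support T of size S, the extreme singular values of the
    column submatrix A[:, T] bound ||Ac||_2^2 / ||c||_2^2 over vectors c
    supported on T, as in (4).
    """
    rng = np.random.default_rng(seed)
    N = A.shape[1]
    delta = 0.0
    for _ in range(trials):
        T = rng.choice(N, size=S, replace=False)
        sv = np.linalg.svd(A[:, T], compute_uv=False)  # sorted, largest first
        delta = max(delta, sv[0] ** 2 - 1.0, 1.0 - sv[-1] ** 2)
    return delta

# Illustrative Gaussian matrix with entries of variance 1/M (assumed sizes).
M, N = 100, 300
A = np.random.default_rng(1).standard_normal((M, N)) / np.sqrt(M)
deltas = [estimate_rip_lower_bound(A, S) for S in (5, 10, 20)]
print(deltas)
```

Increasing `trials` can only raise the estimate, so the returned value should be read as optimistic.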

Recall that the decoders $\Delta_1$ and $\Delta_1^\epsilon$ were devised because their action can be computed by solving convex approximations to the combinatorial optimization problem (2) that is required to compute $\Delta_0$. The decoders defined by
\[ \Delta_p^\epsilon(b) := \arg\min_y \|y\|_p \quad \text{s.t. } \|Ay - b\|_2 \le \epsilon, \tag{10} \]
and
\[ \Delta_p(b) := \arg\min_y \|y\|_p \quad \text{s.t. } Ay = b, \tag{11} \]
with $0 < p < 1$ are also approximations of $\Delta_0$, the actions of which are computed via non-convex optimization problems that can be solved, at least locally, still much faster than (2). It is natural to ask whether the decoders $\Delta_p$ and $\Delta_p^\epsilon$ possess robustness, stability, and instance optimality properties similar to those of $\Delta_1$ and $\Delta_1^\epsilon$, and whether these are obtained under weaker conditions on the measurement matrices than the analogous ones with $p = 1$.

Early work by Gribonval and co-authors [19-22] takes some initial steps in answering these questions. In particular, they devise metrics that lead to sufficient conditions for uniqueness of $\Delta_1(b)$ to imply uniqueness of $\Delta_p(b)$ and specifically for having $\Delta_p(b) = \Delta_1(b) = x$. The authors also present stability conditions in terms of various norms that bound the error, and they conclude that the smaller the value of $p$ is, the more non-zero entries can be recovered by (11). These conditions, however, are hard to check explicitly and no class of deterministic or random matrices was shown to satisfy them, at least with high probability. On the other hand, the authors provide lower bounds for their metrics in terms of generalized mutual coherence. Still, these conditions are pessimistic in the sense that they generally guarantee recovery of only very sparse vectors.

Recently, Chartrand showed that in the noise-free setting, a sufficiently sparse signal can be recovered perfectly with $\Delta_p$, where $0 < p < 1$, under less restrictive RIP requirements than those needed to guarantee perfect recovery with $\Delta_1$. The following theorem was proved in [7].

Theorem 1.5 [7] Let $0 < p \le 1$, and let $S \in \mathbb{N}$. Suppose that $x$ is $S$-sparse, and set $b = Ax$. If $A$ satisfies
\[ \delta_{kS} + k^{\frac{2}{p}-1}\,\delta_{(k+1)S} < k^{\frac{2}{p}-1} - 1 \]
for some $k > 1$ such that $k \in \frac{1}{S}\mathbb{N}$, then $\Delta_p(b) = x$.

Note that, for example, when $p = 0.5$ and $k = 3$, we have $k^{2/p-1} = 3^3 = 27$, so the above theorem only requires $\delta_{3S} + 27\delta_{4S} < 26$ to guarantee perfect recovery with $\Delta_{0.5}$, a less restrictive condition than the analogous one needed to guarantee perfect reconstruction with $\Delta_1$, i.e., $\delta_{3S} + 3\delta_{4S} < 2$. Moreover, in [8], Staneva and Chartrand study a modified RIP that is defined by replacing $\|Ac\|_2$ in (4) with $\|Ac\|_p$. They show that under this new definition of $\delta_S$, the same sufficient condition as in Theorem 1.5 guarantees perfect recovery. Staneva and Chartrand also show that if $A$ is an $M \times N$ Gaussian matrix, their sufficient condition is satisfied provided $M > C_1(p)S + C_2(p)S\log(N/S)$, where $C_1(p)$ and $C_2(p)$ are given explicitly in [8]. It is important to note that $C_2(p)$ goes to zero as $p$ goes to zero. In other words, the dependence on $N$ of the required number of measurements $M$ (that guarantees perfect recovery for all $x \in \Sigma_S^N$) disappears as $p$ approaches 0. This result motivates a more detailed study to understand the properties of the decoders $\Delta_p$ in terms of stability and robustness, which is the objective of this paper.

Algorithmic Issues

Clearly, recovery by $\ell^p$ minimization poses a non-convex optimization problem with many local minimizers. It is encouraging that simulation results from recent papers, e.g., [7, 25], strongly indicate that simple modifications to known approaches like iterated reweighted least squares algorithms and projected gradient algorithms yield $x^*$ that are the global minimizers of the associated $\ell^p$ minimization problem (or approximate the global optimizers very well). It is also encouraging to note that even though the results presented in this work and in others [7, 19-22, 25] assume that the global minimizer has been found, a significant set of these results, including all results in this paper, continue to hold if we could obtain a feasible point $\tilde{x}$ which satisfies $\|\tilde{x}\|_p \le \|x\|_p$ (where $x$ is the vector to be recovered). Nevertheless, it should be stated that to our knowledge, the modified algorithms mentioned above have only been shown to converge to local minima.

1.3 Paper Outline

In what follows, we present generalizations of the above results, giving stability and robustness guarantees for $\ell^p$ minimization. In Section 2.1 we show that the decoders $\Delta_p$ and $\Delta_p^\epsilon$ are robust to noise and $(2,p)$ instance optimal in the case of appropriate measurement matrices. For this section we rely and expand on our note [25]. In Section 2.3 we extend [30] and show that for the same range of dimensions as for decoding by $\ell^1$ minimization, i.e., when $A_\omega \in \mathbb{R}^{M \times N}$ with $M > cS\log(N)$, $(A_\omega,\Delta_p)$ is also $(2,2)$ instance optimal in probability for $0 < p < 1$, provided the measurement matrix $A_\omega$ is drawn from an appropriate distribution.
The generalization follows the proof of Wojtaszczyk in [30]; however, it is non-trivial and requires a variant of a result by Gordon and Kalton [18] on the Banach-Mazur distance between a $p$-convex body and its convex hull. In Section 3 we present some numerical results, further illustrating the possible benefits of using $\ell^p$ minimization and highlighting the behavior of the $\Delta_p$ decoder in terms of stability and robustness. Finally, in Section 4 we present the proofs of the main theorems and corollaries.

While writing this paper, we became aware of the work of Foucart and Lai [17] which also shows similar $(2,p)$ instance optimality results for $0 < p < 1$ under different sufficient conditions. In essence, one could use the $(2,p)$-results of Foucart and Lai to obtain $(2,2)$ instance optimality in probability results similar to the ones we present in this paper, albeit with different constants. Since neither the sufficient conditions for $(2,p)$ instance optimality presented in [17] nor the ones in this paper are uniformly weaker, and since neither provide uniformly better constants, we simply use our estimates throughout.

2 Main Results

In this section, we present our theoretical results on the ability of $\ell^p$ minimization to recover sparse and compressible signals in the presence of noise.

2.1 Sparse recovery with $\Delta_p$: stability and robustness

We begin with a deterministic stability and robustness theorem for decoders $\Delta_p$ and $\Delta_p^\epsilon$ when $0 < p < 1$ that generalizes Theorem 1.1 of Candès et al. Note that the associated sufficient conditions on the measurement matrix, given in (12) below, are weaker for smaller values of $p$ than those that correspond to $p = 1$. The results in this subsection were initially reported, in part, in [25].

In what follows, we say that a matrix $A$ satisfies the property $P(k,S,p)$ if it satisfies
\[ \delta_{kS} + k^{\frac{2}{p}-1}\,\delta_{(k+1)S} < k^{\frac{2}{p}-1} - 1, \tag{12} \]
for $S \in \mathbb{N}$ and $k > 1$ such that $k \in \frac{1}{S}\mathbb{N}$.

Theorem 2.1 (General Case) Let $0 < p \le 1$. Suppose that $x$ is arbitrary and $b = Ax + e$ where $\|e\|_2 \le \epsilon$. If $A$ satisfies $P(k,S,p)$, then
\[ \|\Delta_p^\epsilon(b) - x\|_2^p \le C_1\,\epsilon^p + C_2\,\frac{\sigma_S(x)_{\ell^p}^p}{S^{1-p/2}}, \tag{13} \]
where
\[ C_1 = 2^p\,\frac{1 + k^{p/2-1}\,(2/p - 1)^{-p/2}}{(1-\delta_{(k+1)S})^{p/2} - (1+\delta_{kS})^{p/2}\,k^{p/2-1}}, \quad \text{and} \tag{14} \]
\[ C_2 = \frac{2\left(\frac{p}{2-p}\right)^{p/2}}{k^{1-p/2}} \left[ 1 + \frac{\left((2/p-1)^{p/2} + k^{p/2-1}\right)(1+\delta_{kS})^{p/2}}{(1-\delta_{(k+1)S})^{p/2} - (1+\delta_{kS})^{p/2}\,k^{p/2-1}} \right]. \tag{15} \]

Remark 2.2 By setting $p = 1$ and $k = 3$ in Theorem 2.1, we obtain Theorem 1.1, with precisely the same constants.
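As a numerical sanity check on (14) and (15), the sketch below evaluates both constants. It assumes, purely for illustration, that $\delta_{kS}$ is set equal to $\delta_{(k+1)S}$ (the text quotes only one of the two); under that assumption it reproduces the constants quoted for Theorem 1.1 and those quoted in Remark 2.3. The function name is hypothetical.

```python
def theorem_2_1_constants(p, k, delta_kS, delta_k1S):
    """Evaluate C_1 of (14) and C_2 of (15) for 0 < p <= 1 and k > 1."""
    a = p / 2.0
    # Common denominator; it is positive exactly when condition (12) holds.
    denom = (1.0 - delta_k1S) ** a - (1.0 + delta_kS) ** a * k ** (a - 1.0)
    if denom <= 0.0:
        raise ValueError("condition (12) fails for these parameters")
    C1 = 2.0 ** p * (1.0 + k ** (a - 1.0) * (2.0 / p - 1.0) ** (-a)) / denom
    C2 = (2.0 * (p / (2.0 - p)) ** a / k ** (1.0 - a)) * (
        1.0 + ((2.0 / p - 1.0) ** a + k ** (a - 1.0)) * (1.0 + delta_kS) ** a / denom
    )
    return C1, C2

# p = 1, k = 3, delta = 1/5: compare with the constants of Theorem 1.1.
c1_l1, c2_l1 = theorem_2_1_constants(1.0, 3, 0.2, 0.2)
# p = 0.5, k = 3, delta = 0.5: compare with Remark 2.3.
c1_half, c2_half = theorem_2_1_constants(0.5, 3, 0.5, 0.5)
print(round(c1_l1, 2), round(c2_l1, 2), round(c1_half, 2), round(c2_half, 2))
```

The `ValueError` branch makes the link to (12) explicit: the bound (13) is only available when the shared denominator is positive.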

Remark 2.3 The constants in Theorem 2.1 are generally well behaved; e.g., $C_1 = 5.31$ and $C_2 = 4.31$ for $\delta_{4S} = 0.5$ and $p = 0.5$. Note that for $\delta_{4S} = 0.5$ the sufficient condition (12) is not satisfied when $p = 1$, and thus Theorem 2.1 does not yield any upper bounds on $\|\Delta_1(b) - x\|_2$ in terms of $\sigma_S(x)_{\ell^1}$.

Corollary 2.4 ($(2,p)$ instance optimality) Let $0 < p \le 1$. Suppose that $A$ satisfies $P(k,S,p)$. Then $(A,\Delta_p)$ is $(2,p)$ instance optimal of order $S$ with constant $C_2^{1/p}$ where $C_2$ is as in (15).

Corollary 2.5 (sparse case) Let $0 < p \le 1$. Suppose $x \in \Sigma_S^N$ and $b = Ax + e$ where $\|e\|_2 \le \epsilon$. If $A$ satisfies $P(k,S,p)$, then
\[ \|\Delta_p^\epsilon(b) - x\|_2 \le (C_1)^{1/p}\,\epsilon, \]
where $C_1$ is as in (14).

Remark 2.6 Corollaries 2.4 and 2.5 follow from Theorem 2.1 by setting $\epsilon = 0$ and $\sigma_S(x)_{\ell^p} = 0$, respectively. Furthermore, Corollary 2.5 can be proved independently of Theorem 2.1, leading to smaller constants. See [25] for the explicit values of these improved constants. Finally, note that by setting $\epsilon = 0$ in Corollary 2.5, we obtain Theorem 1.5 as a corollary.

Remark 2.7 In [17], Foucart and Lai give different sufficient conditions for exact recovery than those we present. In particular, they show that if
\[ \delta_{mS} < g(m) := \frac{4(\sqrt{2}-1)(m/2)^{1/p-1/2}}{4(\sqrt{2}-1)(m/2)^{1/p-1/2} + 2} \tag{16} \]
holds for some $m \ge 2$, $m \in \frac{1}{S}\mathbb{N}$, then $\Delta_p$ will recover signals in $\Sigma_S^N$ exactly. Note that the sufficient condition in this paper, i.e., (12), holds when
\[ \delta_{mS} < f(m) := \frac{(m-1)^{2/p-1} - 1}{(m-1)^{2/p-1} + 1} \tag{17} \]
for some $m \ge 2$, $m \in \frac{1}{S}\mathbb{N}$. In Figure 1, we compare these different sufficient conditions as a function of $m$ for $p = 0.1$, $0.5$, and $0.9$ respectively. Figure 1 indicates that neither sufficient condition is weaker than the other for all values of $m$. In fact, we can deduce that (16) is weaker when $m$ is close to 2, while (17) is weaker when $m$ starts to grow larger. Since both conditions are only sufficient, if either one of them holds for an appropriate $m$, then $\Delta_p$ recovers all signals in $\Sigma_S^N$.
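The comparison in Remark 2.7 can be checked directly by evaluating the two thresholds. The sketch below codes $g$ from (16) and $f$ from (17) and tabulates them at a few values of $m$; the sampled values of $m$ are arbitrary choices made for illustration.

```python
import math

def f(m, p):
    """Threshold on delta_{mS} from (17), the condition used in this paper."""
    t = (m - 1.0) ** (2.0 / p - 1.0)
    return (t - 1.0) / (t + 1.0)

def g(m, p):
    """Threshold on delta_{mS} from (16), the Foucart-Lai condition."""
    c = 4.0 * (math.sqrt(2.0) - 1.0) * (m / 2.0) ** (1.0 / p - 0.5)
    return c / (c + 2.0)

# A larger threshold means a weaker (easier to satisfy) sufficient condition.
for p in (0.1, 0.5, 0.9):
    for m in (2, 3, 5, 10):
        weaker = "(16)" if g(m, p) > f(m, p) else "(17)"
        print(f"p={p}, m={m}: f={f(m, p):.4f}, g={g(m, p):.4f}, weaker: {weaker}")
```

At $m = 2$ the threshold $f(2)$ is zero for every $p$, so (16) is always the weaker condition there, in line with the remark.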
[Figure 1: three panels titled "Sufficient conditions for exact recovery", each plotting $f$ and $g$ against $m$.]

Fig. 1. A comparison of the sufficient conditions on $\delta_{mS}$ in (17) and (16) as a function of $m$, for $p = 0.1$ (top), $p = 0.5$ (center) and $p = 0.9$ (bottom).

Remark 2.8 In [12], Davies and Gribonval showed that if one chooses $\delta_{2S} > \delta(p)$ (where $\delta(p)$ can be computed implicitly for $0 < p \le 1$), then there exist matrices (matrices in $\mathbb{R}^{(N-1) \times N}$ that correspond to tight Parseval frames in $\mathbb{R}^{N-1}$) with the prescribed $\delta_{2S}$ for which $\Delta_p$ fails to recover signals in $\Sigma_S^N$. Note that this result does not contradict the results that we present in this paper: we provide sufficient conditions (e.g., (12)) in terms of $\delta_{(k+1)S}$, where $k > 1$ and $kS \in \mathbb{N}$, that guarantee recovery by $\Delta_p$. These conditions are weaker than the corresponding conditions ensuring recovery by $\Delta_1$, which suggests that using $\Delta_p$ can be beneficial. Moreover, the numerical examples we provide in Section 3 indicate that by using $\Delta_p$, $0 < p < 1$, one can indeed recover signals in $\Sigma_S^N$, even when $\Delta_1$ fails to recover them (see Figure 2).

Remark 2.9 In summary, Theorem 2.1 states that if (12) is satisfied then we can recover signals in $\Sigma_S^N$ stably by decoding with $\Delta_p^\epsilon$. It is worth mentioning that the sufficient conditions presented here reduce the gap between the conditions for exact recovery with $\Delta_0$ (i.e., $\delta_{2S} < 1$) and with $\Delta_1$, e.g., $\delta_{3S} < 1/3$. For example, for $k = 2$ and $p = 0.5$, $\delta_{3S} < 7/9$ is sufficient. In the next subsection, we quantify this improvement.

2.2 The relationship between $S_1$ and $S_p$

Let $A$ be an $M \times N$ matrix and suppose $\delta_m$, $m \in \{1,\dots,\lfloor M/2 \rfloor\}$, are its $m$-restricted isometry constants. Define $S_p$ for $A$ with $0 < p \le 1$ as the largest value of $S \in \mathbb{N}$ for which the slightly stronger version of (12) given by
\[ \delta_{(k+1)S} < \frac{k^{\frac{2}{p}-1} - 1}{k^{\frac{2}{p}-1} + 1} \tag{18} \]
holds for some $k > 1$, $k \in \frac{1}{S}\mathbb{N}$. Consequently, by Theorem 2.1, $\Delta_p(Ax) = x$ for all $x \in \Sigma_{S_p}^N$. We now establish a relationship between $S_1$ and $S_p$.

Proposition 2.10 Suppose, in the above described setting, there exists $S_1 \in \mathbb{N}$ and $k > 1$, $k \in \frac{1}{S_1}\mathbb{N}$ such that
\[ \delta_{(k+1)S_1} < \frac{k-1}{k+1}. \tag{19} \]
Then $\Delta_1$ recovers all $S_1$-sparse vectors, and $\Delta_p$ recovers all $S_p$-sparse vectors with
\[ S_p = \left\lfloor \frac{k+1}{k^{\frac{p}{2-p}} + 1}\,S_1 \right\rfloor. \]

Remark 2.11 For example, if $\delta_{5S_1} < 3/5$ then using $\Delta_{2/3}$, we can recover all $S_{2/3}$-sparse vectors with $S_{2/3} = \lfloor \frac{5}{3} S_1 \rfloor$. (Here $k = 4$ and $p/(2-p) = 1/2$, so $k^{p/(2-p)} + 1 = 3$.)

2.3 Instance optimality in probability and $\Delta_p$

In this section, we show that $(A_\omega,\Delta_p)$ is $(2,2)$ instance optimal in probability when $A_\omega$ is an appropriate random matrix. Our approach is based on that of [30], which we summarize now. A matrix $A$ is said to possess the LQ$_1(\alpha)$ property if and only if
\[ A(B_1^N) \supseteq \alpha B_2^M, \]
where $B_q^n$ denotes the $\ell^q$ unit ball in $\mathbb{R}^n$. In [30], Wojtaszczyk shows that random Gaussian matrices of size $M \times N$ as well as matrices whose columns are drawn uniformly from the sphere possess, with high probability, the LQ$_1(\alpha)$ property with $\alpha = \mu\sqrt{\log(N/M)/M}$. Noting that such matrices also satisfy RIP$((k+1)S,\delta)$ with $S < cM/\log(N/M)$, again with high probability, Wojtaszczyk proves that $\Delta_1$, for these matrices, is $(2,2)$ instance optimal in probability of order $S$. Our strategy for generalizing this result to $\Delta_p$ with $0 < p < 1$ relies on a generalization of the LQ$_1$ property to an LQ$_p$ property. Specifically, we say that a matrix $A$ satisfies LQ$_p(\alpha)$ if and only if
\[ A(B_p^N) \supseteq \alpha B_2^M. \]

We first show that a random matrix $A_\omega$, either Gaussian or uniform as mentioned above, satisfies the LQ$_p(\alpha)$ property with
\[ \alpha = \frac{1}{C(p)} \left( \mu^2\,\frac{\log(N/M)}{M} \right)^{1/p - 1/2}. \]
Once we establish this property, the proof of instance optimality in probability for $\Delta_p$ proceeds largely unchanged from Wojtaszczyk's proof, with modifications to account only for the non-convexity of the $\ell^p$-quasinorm with $0 < p < 1$.

Next, we present our results on instance optimality of the $\Delta_p$ decoder, while deferring the proofs to Section 4. Throughout the rest of the paper, we focus on two classes of random matrices: $A_\omega$ denotes $M \times N$ matrices, the entries of which are drawn from a zero mean, normalized column-variance Gaussian distribution, i.e., $A_\omega = (a_{i,j})$ where $a_{i,j} \sim \mathcal{N}(0, 1/\sqrt{M})$; in this case, we say that $A_\omega$ is an $M \times N$ Gaussian random matrix. $\tilde{A}_\omega$, on the other hand, denotes $M \times N$ matrices, the columns of which are drawn uniformly from the sphere; in this case we say that $\tilde{A}_\omega$ is an $M \times N$ uniform random matrix. In each case, $(\Omega, P)$ denotes the associated probability space.

We start with a lemma (which generalizes an analogous result of [30]) that shows that the matrices $A_\omega$ and $\tilde{A}_\omega$ satisfy the LQ$_p$ property with high probability.

Lemma 2.12 Let $0 < p \le 1$, and let $A_\omega$ be an $M \times N$ Gaussian random matrix. For $0 < \mu < 1/\sqrt{2}$, suppose that $K_1 M(\log M)^\xi \le N \le e^{K_2 M}$ for some $\xi > (1 - 2\mu^2)^{-1}$ and some constants $K_1, K_2 > 0$. Then, there exists a constant $c = c(\mu,\xi,K_1,K_2) > 0$, independent of $p$, $M$, and $N$, and a set
\[ \Omega_\mu = \left\{ \omega \in \Omega : A_\omega(B_p^N) \supseteq \frac{1}{C(p)} \left( \mu^2\,\frac{\log(N/M)}{M} \right)^{1/p - 1/2} B_2^M \right\} \]
such that $P(\Omega_\mu) \ge 1 - e^{-cM}$.

In other words, $A_\omega$ satisfies the LQ$_p(\alpha)$ property, $\alpha = \frac{1}{C(p)} \left( \mu^2 \log(N/M)/M \right)^{1/p - 1/2}$, with probability at least $1 - e^{-cM}$ on the draw of the matrix. Here $C(p)$ is a positive constant that depends only on $p$. (In particular, $C(1) = 1$, and see (5) for the explicit value of $C(p)$ when $0 < p < 1$.) This statement is true also for $\tilde{A}_\omega$.

The above lemma for $p = 1$ can be found in [30]. As we will see in Section 4, the generalization of this result to $0 < p < 1$ is non-trivial and requires a result from [18], cf. [23], relating certain distances of $p$-convex bodies to their convex hulls. It is important to note that this lemma provides the machinery needed to prove the following theorem, which extends to $\Delta_p$, $0 < p < 1$, the analogous result of Wojtaszczyk [30] for $\Delta_1$.

In what follows, for a set $T \subseteq \{1,\dots,N\}$, $T^c := \{1,\dots,N\} \setminus T$; for $y \in \mathbb{R}^N$, $y_T$ denotes the vector with entries $y_T(j) = y(j)$ for all $j \in T$, and $y_T(j) = 0$ for $j \in T^c$.

Theorem 2.13 Let $0 < p < 1$. Suppose that $A \in \mathbb{R}^{M \times N}$ satisfies RIP$(S,\delta)$ and LQ$_p\!\left( \frac{1}{C(p)} (\mu^2/S)^{1/p - 1/2} \right)$ for some $\mu > 0$ and $C(p)$ as in (5). Let $\Delta$ be an arbitrary decoder. If $(A,\Delta)$ is $(2,p)$ instance optimal of order $S$ with constant $C_{2,p}$, then for any $x \in \mathbb{R}^N$ and $e \in \mathbb{R}^M$, all of the following hold.

(i) $\|\Delta(Ax + e) - x\|_2 \le C\left( \|e\|_2 + \dfrac{\sigma_S(x)_{\ell^p}}{S^{1/p - 1/2}} \right)$
(ii) $\|\Delta(Ax) - x\|_2 \le C\left( \|Ax_{T^c}\|_2 + \sigma_S(x)_{\ell^2} \right)$
(iii) $\|\Delta(Ax + e) - x\|_2 \le C\left( \|e\|_2 + \sigma_S(x)_{\ell^2} + \|Ax_{T^c}\|_2 \right)$

Above, $T$ denotes the set of indices of the largest (in magnitude) $S$ coefficients of $x$; the constants (all denoted by $C$) depend on $\delta$, $\mu$, $p$, and $C_{2,p}$ but not on $M$ and $N$. For the explicit values of these constants see (38) and (39).

Finally, our main theorem on the instance optimality in probability of the $\Delta_p$ decoder follows.

Theorem 2.14 Let $0 < p < 1$, and let $A_\omega$ be an $M \times N$ Gaussian random matrix. Suppose that $N \ge M[\log(M)]^2$. There exist constants $c_1, c_2, c_3 > 0$ such that for all $S \in \mathbb{N}$ with $S \le c_1 M/\log(N/M)$, the following are true.

(i) There exists $\Omega_1$ with $P(\Omega_1) \ge 1 - 3e^{-c_2 M}$ such that for all $\omega \in \Omega_1$
\[ \|\Delta_p(A_\omega(x) + e) - x\|_2 \le C\left( \|e\|_2 + \frac{\sigma_S(x)_{\ell^p}}{S^{1/p - 1/2}} \right), \tag{20} \]
for any $x \in \mathbb{R}^N$ and for any $e \in \mathbb{R}^M$.
(ii) For any $x \in \mathbb{R}^N$, there exists $\Omega_x$ with $P(\Omega_x) \ge 1 - 4e^{-c_3 M}$ such that for all $\omega \in \Omega_x$
\[ \|\Delta_p(A_\omega(x) + e) - x\|_2 \le C\left( \|e\|_2 + \sigma_S(x)_{\ell^2} \right), \tag{21} \]
for any $e \in \mathbb{R}^M$.

The statement also holds for $\tilde{A}_\omega$, i.e., for random matrices the columns of which are drawn independently from a uniform distribution on the sphere.

Remark 2.15 The constants above (both denoted by $C$) depend on the parameters of the particular LQ$_p$ and RIP properties that the matrix satisfies, and are given explicitly in Section 4, see (38) and (41). The constants $c_1$, $c_2$, and $c_3$ depend only on $p$ and the distribution of the underlying random matrix (see the proof in Section 4.5) and are independent of $M$ and $N$.

Remark 2.16 Clearly, the statements do not make sense if the hypothesis of the theorem forces $S$ to be 0. In turn, for a given $(M,N)$ pair, it is possible that there is no positive integer $S$ for which the conclusions of Theorem 2.14 hold. In particular, to get a non-trivial statement, one needs $M > \frac{1}{c_1}\log(N/M)$.

Remark 2.17 Note the difference in the order of the quantifiers between conclusions (i) and (ii) of Theorem 2.14. Specifically, with statement (i), once the matrix is drawn from the good set $\Omega_1$, we obtain the error guarantee (20) for every $x$ and $e$. In other words, after the initial draw of a good matrix $A$, stability and robustness in the sense of (20) are ensured. On the other hand, statement (ii) concludes that associated with every $x$ is a good set $\Omega_x$ (possibly different for different $x$) such that if the matrix is drawn from $\Omega_x$, then stability and robustness in the sense of (21) are guaranteed. Thus, in (ii), for every $x$, a different matrix is drawn, and with high probability on that draw (21) holds.

Remark 2.18 The above theorem pertains to the decoders $\Delta_p$ which, like the analogous theorem for $\Delta_1$ presented in [30], requires no knowledge of the noise level. In other words, $\Delta_p$ provides estimates of sparse and compressible signals from limited and noisy observations without having to explicitly account for the noise in the decoding. This provides an improvement on Theorem 2.1 and a practical advantage when estimates of measurement noise levels are absent.

3 Numerical Experiments

In this section, we present some numerical experiments to highlight important aspects of sparse reconstruction by decoding using $\Delta_p$, $0 < p \le 1$. First, we compare the sufficient conditions under which decoding with $\Delta_p$ guarantees perfect recovery of signals in $\Sigma_S^N$ for different values of $p$ and $S$. Next, we present numerical results illustrating the robustness and instance optimality of the $\Delta_p$ decoder. Here, we wish to observe the linear growth of the $\ell^2$ reconstruction error $\|\Delta_p(Ax + e) - x\|_2$, as a function of $\sigma_S(x)_{\ell^2}$ and of $\|e\|_2$.
To that end, we generate a $100 \times 300$ matrix $A$ whose columns are drawn from a Gaussian distribution and we estimate its RIP constants $\delta_S$ via Monte Carlo (MC) simulations. Under the assumption that the estimated constants are the correct ones (while in fact they are only lower bounds), Figure 2 (left) shows the regions where (12) guarantees recovery for different $(S,p)$-pairs. On the other hand, Figure 2 (right) shows the empirical recovery rates via $\ell^p$ quasinorm minimization. To obtain this figure, for every $S = 1,\dots,49$, we chose 5 different instances of $x \in \Sigma_S^{300}$ where the non-zero coefficients of each were drawn i.i.d. from the standard Gaussian distribution. These vectors were encoded using the same measurement matrix $A$ as above. Since there is no known algorithm that will yield the global minimizer of the optimization problem (11), we approximated the action of $\Delta_p$ by using a projected gradient algorithm on a sequence of smoothed versions of the $\ell^p$ minimization problem: in (11), instead of minimizing $\|y\|_p$, we minimized $\left( \sum_i (y_i^2 + \epsilon^2)^{p/2} \right)^{1/p}$, initially with a large $\epsilon$. We then used the corresponding solution as the starting point of the next subproblem obtained by decreasing the value of $\epsilon$ according to the rule $\epsilon_n = (0.99)\,\epsilon_{n-1}$. We continued reducing the value of $\epsilon$ and solving the corresponding subproblem until $\epsilon$ became very small. Note that this approach is similar to the one described in [7].

The empirical results show that $\Delta_p$ (in fact, the approximation of $\Delta_p$ as described above) is successful in a wider range of scenarios than those predicted by Theorem 2.1. This can be attributed to the fact that the conditions presented in this paper are only sufficient, or to the fact that in practice what is observed is not necessarily a manifestation of uniform recovery. Rather, the practical results could be interpreted as success of $\Delta_p$ with high probability on either $x$ or $A$.

[Figure 2: two panels plotted over $p$ and $S$. Left: "Region where recovery with $\Delta_p$ is "guaranteed" for $p$ and $S$ (Light Shading = Recoverable)". Right: "Empirical Recovery Rates with $\Delta_p$".]

Fig. 2. For a Gaussian matrix $A \in \mathbb{R}^{100 \times 300}$, whose $\delta_S$ values are estimated via MC simulations, we generate the theoretical (left) and practical (right) phase-diagrams for reconstruction via $\ell^p$ minimization.

Next, we generate scenarios that allude to the conclusions of Theorem 2.14. To that end, we generate a signal composed of $x_T \in \Sigma_{40}^{300}$, supported on an index set $T$, and a signal $z_{T^c}$ supported on $T^c$, where all the coefficients are drawn from the standard Gaussian distribution. We then normalize $x_T$ and $z_{T^c}$ so that $\|x_T\|_2 = \|z_{T^c}\|_2 = 1$ and generate $x = x_T + \lambda z_{T^c}$ with increasing values of $\lambda$ (starting from 0), thereby increasing $\sigma_{40}(x)_{\ell^2} \approx \lambda$. For this experiment, we choose our measurement matrix $A \in \mathbb{R}^{100 \times 300}$ by drawing its columns uniformly from the sphere.
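The continuation idea described above (minimize a smoothed objective, then shrink $\epsilon$ by the rule $\epsilon_n = 0.99\,\epsilon_{n-1}$) can be sketched in a few lines. Note that the sketch below is not the projected gradient method used for the experiments: it swaps in an iteratively reweighted least squares (IRLS) update, one of the approaches mentioned in Section 1.2, because each step then has a closed form. The function name, the stopping value of $\epsilon$, and the problem sizes are all assumptions made for illustration, and like the method in the text it carries no guarantee of finding the global minimizer.

```python
import numpy as np

def irls_lp(A, b, p=0.5, eps0=1.0, eps_min=1e-4, shrink=0.99):
    """Approximate argmin ||y||_p subject to Ay = b (0 < p <= 1).

    Each step minimizes the weighted least squares surrogate
    sum_i w_i y_i^2 with w_i = (y_i^2 + eps^2)^(p/2 - 1), subject to Ay = b;
    its solution is y = Q A^T (A Q A^T)^{-1} b with Q = diag(1 / w_i).
    """
    y = np.linalg.lstsq(A, b, rcond=None)[0]      # minimum l^2-norm feasible start
    eps = eps0
    while eps > eps_min:
        q = (y ** 2 + eps ** 2) ** (1.0 - p / 2.0)  # inverse weights 1 / w_i
        AQ = A * q                                  # scales column j of A by q[j]
        y = q * (A.T @ np.linalg.solve(AQ @ A.T, b))
        eps *= shrink                               # smoothing schedule from the text
    return y

# Small noise-free demonstration (sizes are hypothetical, not those of Section 3).
rng = np.random.default_rng(0)
M, N, S = 40, 100, 5
A = rng.standard_normal((M, N)) / np.sqrt(M)
x = np.zeros(N)
x[rng.choice(N, size=S, replace=False)] = rng.standard_normal(S)
b = A @ x
y = irls_lp(A, b, p=0.5)
print(np.linalg.norm(A @ y - b), np.linalg.norm(y - x))
```

Every iterate is feasible by construction, so the only question in practice is which local minimizer the sequence settles on; starting with a large $\epsilon$ is what makes the early subproblems close to a convex least squares problem.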
For each value of λ we measure the reconstruction error (Ax) x 2, and we reeat the rocess 1 times while randomizing the index set T but reserving the coefficient values. We reort the averaged results in Figure 3 (left) for different values of. Similarly, we generate noisy observations Ax T +λe, of a sarse signal x T Σ 3 4 where x T 2 = e 2 = 1 15

and we increase the noise level starting from $\lambda = 0$. Here, again, the non-zero entries of $x_T$ and all entries of $e$ were chosen i.i.d. from the standard Gaussian distribution and then the vectors were properly normalized. Next, we measure $\|\Delta_p(Ax_T + \lambda e) - x_T\|_2$ (for 10 realizations where we randomize $T$) and report the averaged results in Figure 3 (right) for different values of $p$.

[Figure 3. Left panel: "Performance of $\Delta_p$ for Compressible Signal vs. Compressibility" ($S = 40$, $\|x_T\|_2 = 1$, $\|z_{T^c}\|_2 = 1$, $x = x_T + \lambda z_{T^c}$); vertical axis $\|\Delta_p(Ax) - x\|_2$, horizontal axis $\lambda$. Right panel: "Performance of $\Delta_p$ for Noisy Observation vs. Noise" ($S = 40$, $\|x_T\|_2 = 1$, $\|e\|_2 = 1$); vertical axis $\|\Delta_p(Ax + \lambda e) - x\|_2$, horizontal axis $\lambda$. Curves for $p = 0.4, 0.6, 0.8, 1$.]

Fig. 3. Reconstruction error with compressible signals (left), noisy observations (right). Observe the almost linear growth of the error in compressible signals and for different values of $p$, highlighting the instance optimality of the decoders. The plots were generated by averaging the results of 10 experiments with the same matrix $A$ and randomized locations of the coefficients of $x$.

In both these experiments, we observe that the error increases roughly linearly as we increase $\lambda$, i.e., $\sigma_{40}(x)_{\ell^2}$ and the noise power, respectively. Moreover, when the signal is highly compressible or when the noise level is low, we observe that reconstruction using $\Delta_p$ with $0 < p < 1$ yields a lower approximation error than that with $p = 1$. It is also worth noting that for values of $p$ close to one, even in the case of sparse signals with no noise, the average reconstruction error is non-zero. This may be due to the fact that for such large $p$ the number of measurements is not sufficient for the recovery of signals with $S = 40$, further highlighting the benefits of using the decoder $\Delta_p$ with smaller values of $p$.

Finally, in Figure 4, we plot the results of an experiment in which we generate signals $x \in \mathbb{R}^{200}$ with sorted coefficients $x(j)$ that decay according to some power law.
In particular, for various values of $0 < q < 1$, we set $x(j) = c\,j^{-1/q}$ such that $\|x\|_2 = 1$. We then encode $x$ with 50 different $100 \times 200$ measurement matrices, the columns of which were drawn from the uniform distribution on the sphere, and examine the approximations obtained by decoding with $\Delta_p$ for different values of $0 < p < 1$. The results indicate that values of $p \approx q$ provide the lowest reconstruction errors. Note that in Figure 4, we report the results in the form of signal to noise ratios defined as

$$\mathrm{SNR} = 20\log_{10}\left(\frac{\|x\|_2}{\|\Delta(Ax) - x\|_2}\right).$$
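The power-law test signals and the SNR figure of merit described above can be sketched as follows. This is a minimal illustration, not the code used for Figure 4: the function names are hypothetical, and the "reconstruction" `x_hat` is simply a truncation of `x` standing in for a decoder output, since the decoder itself is not implemented here.

```python
import math


def power_law_signal(n, q):
    """Return x with x(j) = c * j**(-1/q), j = 1..n, with c chosen so ||x||_2 = 1."""
    raw = [j ** (-1.0 / q) for j in range(1, n + 1)]
    c = 1.0 / math.sqrt(sum(v * v for v in raw))
    return [c * v for v in raw]


def snr_db(x, x_hat):
    """SNR = 20 log10(||x||_2 / ||x_hat - x||_2), in dB."""
    signal_norm = math.sqrt(sum(v * v for v in x))
    error_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, x)))
    return 20.0 * math.log10(signal_norm / error_norm)


# Illustrative parameters (assumptions): signal length 200 and decay rate q = 0.4.
x = power_law_signal(200, 0.4)

# Stand-in for a decoder output: keep the 40 largest coefficients, zero the rest.
x_hat = x[:40] + [0.0] * 160

print(snr_db(x, x_hat))
```

Smaller values of `q` give faster decay, so a truncation (or any good sparse approximation) attains a higher SNR. This matches the sense in which the text calls such signals highly compressible.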

[Figure 4. Two panels titled "Performance of $\Delta_p$ on Compressible Signals". Vertical axes: average reconstruction signal to noise ratio in dB. Horizontal axes: $q$ (left panel, curves for $p = 0.4, 0.6, 0.8, 1$) and $p$ (right panel, one curve per value of $q$).]

Fig. 4. Reconstruction signal to noise ratios (in dB) obtained by using $\Delta_p$ to recover signals whose sorted coefficients decay according to a power law ($x(j) = c\,j^{-1/q}$, $\|x\|_2 = 1$) as a function of $q$ (left) and as a function of $p$ (right). The presented results are averages of 50 experiments performed with different matrices in $\mathbb{R}^{100\times 200}$. Observe that for highly compressible signals, e.g., for $q = 0.4$, there is a 5 dB gain in using $p < 0.6$ as compared to $p = 1$. The performance advantage is about 2 dB for $q = 0.6$. As the signals become much less compressible, i.e., as we increase $q$ to 0.9, the performances are almost identical.

4 Proofs

4.1 Proof of Proposition 2.1

First, note that for any $A \in \mathbb{R}^{M\times N}$, $\delta_m$ is non-decreasing in $m$. Also, the map $k \mapsto \frac{k-1}{k+1}$ is increasing in $k$ for $k \ge 0$. Set

$$L := (k+1)S_1, \qquad \tilde{\ell} = k^{\frac{p}{2-p}}, \qquad \text{and} \qquad \tilde{S}_p = \frac{L}{\tilde{\ell}+1}.$$

Then

$$\delta_{(\tilde{\ell}+1)\tilde{S}_p} = \delta_{(k+1)S_1} < \frac{k-1}{k+1} = \frac{\tilde{\ell}^{\frac{2-p}{p}} - 1}{\tilde{\ell}^{\frac{2-p}{p}} + 1}.$$

We now describe how to choose $\ell$ and $S_p$ such that $\ell \ge \tilde{\ell}$, $S_p \in \mathbb{N}$, and $(\ell+1)S_p = L$ (this will be sufficient to complete the proof using the monotonicity observations above). First, note that this last equality is satisfied only if $(\ell, S_p)$ is in the set

$$\left\{\left(\frac{n}{L-n},\, L-n\right) : n = 1, \dots, L-1\right\}.$$

Let $n^*$ be such that

$$\frac{n^*-1}{L-n^*+1} < \tilde{\ell} \le \frac{n^*}{L-n^*}. \tag{22}$$

To see that such an $n^*$ exists, recall that $\tilde{\ell} = k^{\frac{p}{2-p}}$ where $0 < p < 1$. Also,

$(k+1)S_1 = L$ with $S_1 \in \mathbb{N}$, and $k > 1$. Consequently, $1 < \tilde{\ell} < k \le L-1$, and $k \in \left\{\frac{n}{L-n} : n = \lceil L/2 \rceil, \dots, L-1\right\}$. Thus, we know that we can find $n^*$ as above. Furthermore, $\frac{n^*}{L-n^*} > 1$. It follows from (22) that

$$L - n^* \le \tilde{S}_p < L - n^* + 1.$$

We now choose

$$\ell = \frac{n^*}{L-n^*}, \qquad \text{and} \qquad S_p = \lfloor \tilde{S}_p \rfloor = L - n^*.$$

Then $(\ell+1)S_p = L$, and $\ell \ge \tilde{\ell}$. So, we conclude that for $\ell$ as above and

$$S_p = \lfloor \tilde{S}_p \rfloor = \left\lfloor \frac{k+1}{k^{\frac{p}{2-p}}+1}\, S_1 \right\rfloor,$$

we have

$$\delta_{(\ell+1)S_p} < \frac{\ell^{\frac{2-p}{p}} - 1}{\ell^{\frac{2-p}{p}} + 1}.$$

Consequently, the condition of Corollary 2.5 is satisfied and we have the desired conclusion.

4.2 Proof of Theorem 2.1

We modify the proof of Candès et al. of the analogous result for the decoder $\Delta_1$ (Theorem 2 in [4]) to account for the non-convexity of the $\ell^p$ quasinorm. We give the full proof for completeness. We stick to the notation of [4] whenever possible.

Let $0 < p < 1$, $x \in \mathbb{R}^N$ be arbitrary, and define $x^* := \Delta_p^{\epsilon}(b)$ and $h := x^* - x$. Our goal is to obtain an upper bound on $\|h\|_2$ given that $\|Ah\|_2 \le 2\epsilon$ (by definition of $\Delta_p^{\epsilon}$).

Below, for a set $T \subseteq \{1,\dots,N\}$, $T^c := \{1,\dots,N\} \setminus T$; for $y \in \mathbb{R}^N$, $y_T$ denotes the vector with entries $y_T(j) = y(j)$ for all $j \in T$, and $y_T(j) = 0$ for $j \in T^c$.

(I) We start by decomposing $h$ as a sum of sparse vectors with disjoint support. In particular, denote by $T_0$ the set of indices of the largest (in magnitude) $S$ coefficients of $x$ (here $S$ is to be determined later). Next, partition $T_0^c$ into sets $T_1, T_2, \dots$, $|T_j| = L$ for $j \ge 1$, where $L \in \mathbb{N}$ (also to be determined later), such that $T_1$ is the set of indices of the $L$ largest (in magnitude) coefficients of $h_{T_0^c}$, $T_2$ is the set of indices of the second $L$ largest coefficients of $h_{T_0^c}$, and so on. Finally, let $T_{01} := T_0 \cup T_1$. We now obtain a lower bound for $\|Ah\|_2^p$ using the RIP constants of the matrix $A$. In particular, we have

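$$\begin{aligned}
\|Ah\|_2^p &= \Big\|Ah_{T_{01}} + \sum_{j\ge 2} Ah_{T_j}\Big\|_2^p \ge \|Ah_{T_{01}}\|_2^p - \sum_{j\ge 2}\|Ah_{T_j}\|_2^p \\
&\ge (1-\delta_{L+|T_0|})^{p/2}\,\|h_{T_{01}}\|_2^p - (1+\delta_L)^{p/2}\sum_{j\ge 2}\|h_{T_j}\|_2^p.
\end{aligned} \tag{23}$$

Above, together with RIP, we used the fact that $\|\cdot\|_2^p$ satisfies the triangle inequality for any $0 < p < 1$. What now remains is to relate $\|h_{T_{01}}\|_2^p$ and $\sum_{j\ge 2}\|h_{T_j}\|_2^p$ to $\|h\|_2$.

(II) Next, we aim to bound $\sum_{j\ge 2}\|h_{T_j}\|_2^p$ from above in terms of $\|h\|_2$. To that end, we proceed as in [4]. First, note that $|h_{T_{j+1}}(\ell)|^p \le |h_{T_j}(\ell')|^p$ for all $\ell \in T_{j+1}$, $\ell' \in T_j$, and thus $|h_{T_{j+1}}(\ell)|^p \le \|h_{T_j}\|_p^p / L$. It follows that $\|h_{T_{j+1}}\|_2^2 \le L^{1-\frac{2}{p}}\,\|h_{T_j}\|_p^2$, and consequently

$$\sum_{j\ge 2}\|h_{T_j}\|_2^p \le L^{\frac{p}{2}-1}\sum_{j\ge 1}\|h_{T_j}\|_p^p = L^{\frac{p}{2}-1}\,\|h_{T_0^c}\|_p^p. \tag{24}$$

Next, note that, similar to the case when $p = 1$ as shown in [4], the error $h$ is concentrated on the essential support of $x$ (in our case $T_0$). To quantify this claim, we repeat the analogous calculation in [4]. Note, first, that by definition of $x^*$,

$$\|x^*\|_p^p = \|x+h\|_p^p = \|x_{T_0} + h_{T_0}\|_p^p + \|x_{T_0^c} + h_{T_0^c}\|_p^p \le \|x\|_p^p.$$

As $\|\cdot\|_p^p$ satisfies the triangle inequality, we then have

$$\|x_{T_0}\|_p^p - \|h_{T_0}\|_p^p + \|h_{T_0^c}\|_p^p - \|x_{T_0^c}\|_p^p \le \|x\|_p^p.$$

Consequently,

$$\|h_{T_0^c}\|_p^p \le \|h_{T_0}\|_p^p + 2\|x_{T_0^c}\|_p^p, \tag{25}$$

which, together with (24), implies

$$\sum_{j\ge 2}\|h_{T_j}\|_2^p \le L^{\frac{p}{2}-1}\big(\|h_{T_0}\|_p^p + 2\|x_{T_0^c}\|_p^p\big) \le \rho^{1-\frac{p}{2}}\big(\|h_{T_{01}}\|_2^p + 2|T_0|^{\frac{p}{2}-1}\|x_{T_0^c}\|_p^p\big), \tag{26}$$

where $\rho := |T_0|/L$, and we used the fact that $\|h_{T_0}\|_p^p \le |T_0|^{1-\frac{p}{2}}\|h_{T_0}\|_2^p$ (which follows as $|\mathrm{supp}(h_{T_0})| = |T_0|$). Using (26) and (23), we obtain

$$\|Ah\|_2^p \ge C_{p,L,|T_0|}\,\|h_{T_{01}}\|_2^p - 2\rho^{1-\frac{p}{2}}\,|T_0|^{\frac{p}{2}-1}(1+\delta_L)^{\frac{p}{2}}\,\|x_{T_0^c}\|_p^p, \tag{27}$$

where

$$C_{p,L,|T_0|} := (1-\delta_{L+|T_0|})^{\frac{p}{2}} - (1+\delta_L)^{\frac{p}{2}}\,\rho^{1-\frac{p}{2}}. \tag{28}$$

At this point, using $\|Ah\|_2 \le 2\epsilon$, we obtain an upper bound on $\|h_{T_{01}}\|_2$ given by

$$\|h_{T_{01}}\|_2^p \le \frac{1}{C_{p,L,|T_0|}}\left((2\epsilon)^p + 2\rho^{1-\frac{p}{2}}(1+\delta_L)^{\frac{p}{2}}\,\frac{\|x_{T_0^c}\|_p^p}{|T_0|^{1-\frac{p}{2}}}\right), \tag{29}$$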

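provided $C_{p,L,|T_0|} > 0$ (this will impose the condition given in (12) on the RIP constants of the underlying matrix $A$).

(III) To complete the proof, we will show that the error vector $h$ is concentrated on $T_{01}$. Denote by $h_{T_0^c}[m]$ the $m$th largest (in magnitude) coefficient of $h_{T_0^c}$ and observe that $|h_{T_0^c}[m]|^p \le \|h_{T_0^c}\|_p^p / m$. As $h_{T_{01}^c}[m] = h_{T_0^c}[L+m]$, we then have

$$\|h_{T_{01}^c}\|_2^2 = \sum_{m\ge L+1} |h_{T_0^c}[m]|^2 \le \sum_{m\ge L+1}\left(\frac{\|h_{T_0^c}\|_p^p}{m}\right)^{\frac{2}{p}} \le \frac{\|h_{T_0^c}\|_p^2}{L^{\frac{2}{p}-1}(2/p-1)}. \tag{30}$$

Here, the last inequality follows because for $0 < p < 1$,

$$\sum_{m\ge L+1} m^{-\frac{2}{p}} \le \int_L^{\infty} t^{-\frac{2}{p}}\,dt = \frac{1}{L^{\frac{2}{p}-1}(2/p-1)}.$$

Finally, we use (25) and (30) to conclude

$$\begin{aligned}
\|h\|_2^2 &= \|h_{T_{01}}\|_2^2 + \|h_{T_{01}^c}\|_2^2 \le \|h_{T_{01}}\|_2^2 + \left[\frac{\|h_{T_0}\|_p^p + 2\|x_{T_0^c}\|_p^p}{L^{1-\frac{p}{2}}(2/p-1)^{\frac{p}{2}}}\right]^{\frac{2}{p}} \\
&\le \left[\Big(1 + \rho^{1-\frac{p}{2}}(2/p-1)^{-\frac{p}{2}}\Big)\|h_{T_{01}}\|_2^p + 2\rho^{1-\frac{p}{2}}(2/p-1)^{-\frac{p}{2}}\,\frac{\|x_{T_0^c}\|_p^p}{|T_0|^{1-\frac{p}{2}}}\right]^{\frac{2}{p}}.
\end{aligned} \tag{31}$$

Above, we used the fact that $\|h_{T_0}\|_p^p \le |T_0|^{1-\frac{p}{2}}\|h_{T_0}\|_2^p$, and that for any $a, b \ge 0$ and $\alpha \ge 1$, $a^{\alpha} + b^{\alpha} \le (a+b)^{\alpha}$.

(IV) We now set $|T_0| = S$, $L = kS$, where $k$ and $S$ are chosen such that $C_{p,kS,S} > 0$, which is equivalent to having $k$, $S$, and $p$ satisfy (12). In this case, $\|x_{T_0^c}\|_p = \sigma_S(x)_{\ell^p}$, $\rho = 1/k$, and combining (29) and (31) yields

$$\|h\|_2^p \le C_1\,\epsilon^p + C_2\,\frac{\sigma_S(x)_{\ell^p}^p}{S^{1-\frac{p}{2}}}, \tag{32}$$

where $C_1$ and $C_2$ are as in (14) and (15), respectively.

4.3 Proof of Lemma 2.12

(I) The following result of Wojtaszczyk [30, Proposition 2.2] will be useful.

Proposition 4.1 ([30]) Let $A_\omega$ be an $M \times N$ Gaussian random matrix, let $0 < \mu < 1/\sqrt{2}$, and suppose that $K_1 M(\log M)^{\xi} \le N \le e^{K_2 M}$ for some $\xi > (1-2\mu^2)^{-1}$ and some constants $K_1, K_2 > 0$. Then, there exists a constant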

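$c = c(\mu, \xi, K_1, K_2) > 0$, independent of $M$ and $N$, and a set

$$\Omega_\mu = \left\{\omega : A_\omega(B_1^N) \supset \mu\sqrt{\frac{\log (N/M)}{M}}\, B_2^M\right\}$$

such that $P(\Omega_\mu) \ge 1 - e^{-cM}$. The above statement is true also for $\tilde{A}_\omega$.

We will also use the following adaptation of [18, Lemma 2], for which we will first introduce some notation. Define a body to be a compact set containing the origin as an interior point and star shaped with respect to the origin [23]. Below, we use $\mathrm{conv}(K)$ to denote the convex hull of a body $K$. For $K \subseteq B$, we denote by $d_1(K,B)$ the distance between $K$ and $B$ given by

$$d_1(K,B) := \inf\{\lambda > 0 : K \subseteq B \subseteq \lambda K\} = \inf\{\lambda > 0 : \tfrac{1}{\lambda}B \subseteq K \subseteq B\}.$$

Finally, we call a body $K$ $p$-convex if for any $x, y \in K$, $\lambda x + \mu y \in K$ whenever $\lambda, \mu \in [0,1]$ are such that $\lambda^p + \mu^p = 1$.

Lemma 4.2 Let $0 < p < 1$, and let $K$ be a $p$-convex body in $\mathbb{R}^n$. If $\mathrm{conv}(K) \subseteq B_2^n$, then

$$d_1(K, B_2^n) \le C(p)\, d_1(\mathrm{conv}(K), B_2^n)^{(2/p-1)},$$

where $C(p)$ is an explicit constant that depends only on $p$.

We defer the proof of this lemma to the Appendix.

(II) Note that $\tilde{A}_\omega(B_1^N) \subseteq B_2^M$. This follows because $\|\tilde{A}_\omega\|_{1\to 2}$, which is equal to the largest column norm of $\tilde{A}_\omega$, is 1 by construction. Thus, for $x \in B_1^N$,

$$\|\tilde{A}_\omega(x)\|_2 \le \|\tilde{A}_\omega\|_{1\to 2}\,\|x\|_1 \le 1,$$

that is, $\tilde{A}_\omega(B_1^N) \subseteq B_2^M$, and so $d_1(\tilde{A}_\omega(B_1^N), B_2^M)$ is well-defined. Next, by Proposition 4.1, we know that there exists $\Omega_\mu$ with $P(\Omega_\mu) \ge 1 - e^{-cM}$ such that for all $\omega \in \Omega_\mu$,

$$\tilde{A}_\omega(B_1^N) \supseteq \mu\sqrt{\frac{\log (N/M)}{M}}\, B_2^M. \tag{33}$$

From this point on, let $\omega \in \Omega_\mu$. Then

$$B_2^M \supseteq \tilde{A}_\omega(B_1^N) \supseteq \mu\sqrt{\frac{\log (N/M)}{M}}\, B_2^M,$$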

and consequently

$$d_1(\tilde{A}_\omega(B_1^N), B_2^M) \le \left(\mu\sqrt{\frac{\log (N/M)}{M}}\right)^{-1}. \tag{34}$$

The next step is to note that $\mathrm{conv}(B_p^N) = B_1^N$ and consequently

$$\mathrm{conv}\big(\tilde{A}_\omega(B_p^N)\big) = \tilde{A}_\omega\big(\mathrm{conv}(B_p^N)\big) = \tilde{A}_\omega(B_1^N).$$

We can now invoke Lemma 4.2 to conclude that

$$d_1(\tilde{A}_\omega(B_p^N), B_2^M) \le C(p)\, d_1\big(\mathrm{conv}(\tilde{A}_\omega(B_p^N)), B_2^M\big)^{\frac{2}{p}-1} = C(p)\, d_1(\tilde{A}_\omega(B_1^N), B_2^M)^{\frac{2}{p}-1}. \tag{35}$$

Finally, by using (34), we find that

$$d_1(\tilde{A}_\omega(B_p^N), B_2^M) \le C(p)\left(\frac{\mu^2\log (N/M)}{M}\right)^{1/2-1/p}, \tag{36}$$

and consequently

$$\tilde{A}_\omega(B_p^N) \supseteq \frac{1}{C(p)}\left(\frac{\mu^2\log (N/M)}{M}\right)^{(1/p-1/2)} B_2^M. \tag{37}$$

In other words, the matrix $\tilde{A}_\omega$ has the $LQ_p(\alpha)$ property with the desired value of $\alpha$ for every $\omega \in \Omega_\mu$ with $P(\Omega_\mu) \ge 1 - e^{-cM}$. Here $c$ is as specified in Proposition 4.1.

To see that the same is true for $A_\omega$, note that there exists a set $\Omega_0$ with $P(\Omega_0) > 1 - e^{-cM}$ such that for all $\omega \in \Omega_0$, $\|A_j(\omega)\|_2 < 2$ for every column $A_j$ of $A_\omega$ (this follows from RIP). Using this observation one can trace the above proof with minor modifications.

4.4 Proof of Theorem 2.13

We start with the following lemma, the proof of which for $p < 1$ follows with very little modification from the analogous proof of Lemma 3.1 in [30] and shall be omitted.

Lemma 4.3 Let $0 < p < 1$ and suppose that $A$ satisfies RIP$(S,\delta)$ and $LQ_p\big(\gamma_p / S^{1/p-1/2}\big)$ with $\gamma_p := \mu^{2/p-1}/C(p)$. Then for every $x \in \mathbb{R}^N$, there

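exists $\tilde{x} \in \mathbb{R}^N$ such that

$$A x = A\tilde{x}, \qquad \|\tilde{x}\|_p \le \frac{S^{1/p-1/2}}{\gamma_p}\,\|Ax\|_2, \qquad \text{and} \qquad \|\tilde{x}\|_2 \le C_3\,\|Ax\|_2.$$

Here, $C_3 = \frac{1}{\gamma_p} + \frac{\gamma_p(1-\delta)+1}{(1-\delta^2)\gamma_p}$. Note that $C_3$ depends only on $\mu$, $\delta$ and $p$.

We now proceed to prove Theorem 2.13. Our proof follows the steps of [30] and differs in the handling of the non-convexity of the $\ell^p$ quasinorms for $0 < p < 1$.

First, recall that $A$ satisfies RIP$(S,\delta)$ and $LQ_p\big(\gamma_p / S^{1/p-1/2}\big)$, so by Lemma 4.3, there exists $z \in \mathbb{R}^N$ such that $Az = e$, $\|z\|_p \le \frac{S^{1/p-1/2}}{\gamma_p}\|e\|_2$, and $\|z\|_2 \le C_3\|e\|_2$. Now, $A(x+z) = Ax + e$, and $\Delta$ is $(2,p)$ instance optimal with constant $C_{2,p}$. Thus,

$$\|\Delta(A(x)+e) - (x+z)\|_2 \le C_{2,p}\,\frac{\sigma_S(x+z)_{\ell^p}}{S^{1/p-1/2}},$$

and consequently

$$\begin{aligned}
\|\Delta(A(x)+e) - x\|_2 &\le \|z\|_2 + C_{2,p}\,\frac{\sigma_S(x+z)_{\ell^p}}{S^{1/p-1/2}} \\
&\le C_3\|e\|_2 + C_{2,p}\,\frac{\sigma_S(x+z)_{\ell^p}}{S^{1/p-1/2}} \\
&\le C_3\|e\|_2 + 2^{1/p-1}\,C_{2,p}\,\frac{\sigma_S(x)_{\ell^p} + \|z\|_p}{S^{1/p-1/2}} \\
&\le C_3\|e\|_2 + 2^{1/p-1}\,C_{2,p}\,\frac{\sigma_S(x)_{\ell^p}}{S^{1/p-1/2}} + 2^{1/p-1}\,C_{2,p}\,\frac{\|e\|_2}{\gamma_p},
\end{aligned}$$

where in the third inequality we used the fact that any $\ell^p$ quasinorm satisfies the inequality $\|a+b\|_p \le 2^{1/p-1}\big(\|a\|_p + \|b\|_p\big)$ for all $a, b \in \mathbb{R}^N$. So, we conclude

$$\|\Delta(A(x)+e) - x\|_2 \le \big(C_3 + 2^{1/p-1}\,C_{2,p}/\gamma_p\big)\|e\|_2 + 2^{1/p-1}\,C_{2,p}\,\frac{\sigma_S(x)_{\ell^p}}{S^{1/p-1/2}}. \tag{38}$$

That is, (i) holds with $C = C_3 + 2^{1/p-1}\,C_{2,p}\,(1/\gamma_p + 1)$.

Next, we prove parts (ii) and (iii) of Theorem 2.13. As in the analogous proof of [30], Theorem 2.13 (ii) can be seen as a special case of Theorem 2.13 (iii), with $e = 0$. We therefore turn to proving (iii). Once again, by Lemma 4.3, there exist $v$ and $z$ in $\mathbb{R}^N$ such that the following hold:

$$Av = e; \qquad \|v\|_p \le \frac{S^{1/p-1/2}}{\gamma_p}\|e\|_2, \qquad \|v\|_2 \le C_3\|e\|_2,$$

and

$$Az = Ax_{T_0^c}; \qquad \|z\|_p \le \frac{S^{1/p-1/2}}{\gamma_p}\|Ax_{T_0^c}\|_2, \qquad \|z\|_2 \le C_3\|Ax_{T_0^c}\|_2.$$

Here $T_0$ is the set of indices of the largest (in magnitude) $S$ coefficients of $x$, and $T_0^c$ and $x_{T_0^c}$ are as in the proof of Theorem 2.1.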
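The quasi-triangle inequality $\|a+b\|_p \le 2^{1/p-1}(\|a\|_p + \|b\|_p)$ invoked in the proof above is the only place where the non-convexity of the $\ell^p$ quasinorm costs an extra constant. The following sketch checks it numerically on random vectors and exhibits a case where it is tight. The function names, the dimension 8, the values of $p$, and the random seed are arbitrary choices made for illustration.

```python
import random


def lp_quasinorm(v, p):
    """l^p quasinorm (sum_i |v_i|^p)^(1/p) for 0 < p <= 1."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def quasi_triangle_holds(a, b, p, tol=1e-12):
    """Check ||a + b||_p <= 2^(1/p - 1) * (||a||_p + ||b||_p) up to rounding."""
    lhs = lp_quasinorm([s + t for s, t in zip(a, b)], p)
    rhs = 2.0 ** (1.0 / p - 1.0) * (lp_quasinorm(a, p) + lp_quasinorm(b, p))
    return lhs <= rhs * (1.0 + tol)


rng = random.Random(0)  # fixed seed so the run is reproducible
checks = []
for p in (0.25, 0.5, 0.75, 1.0):
    for _ in range(200):
        a = [rng.gauss(0.0, 1.0) for _ in range(8)]
        b = [rng.gauss(0.0, 1.0) for _ in range(8)]
        checks.append(quasi_triangle_holds(a, b, p))

# Tight case: a = e_1 and b = e_2 (disjoint supports, equal magnitudes), p = 1/2.
p = 0.5
tight_lhs = lp_quasinorm([1.0, 1.0], p)            # ||e_1 + e_2||_p = 2^(1/p)
tight_rhs = 2.0 ** (1.0 / p - 1.0) * (1.0 + 1.0)   # 2^(1/p - 1) * (1 + 1)

print(all(checks), tight_lhs, tight_rhs)
```

The tight case shows that the factor $2^{1/p-1}$ cannot be improved in general. This is why it appears explicitly in the constants of (38).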

Similar to the previous part, we can see that $A(x_{T_0} + z + v) = Ax + e$, and by the hypothesis of $(2,p)$ instance optimality of $\Delta$, we have

$$\|\Delta(Ax+e) - (x_{T_0} + z + v)\|_2 \le C_{2,p}\,\frac{\sigma_S(x_{T_0}+z+v)_{\ell^p}}{S^{1/p-1/2}}.$$

Consequently, observing that $x_{T_0} = x - x_{T_0^c}$ and using the triangle inequality,

$$\begin{aligned}
\|\Delta(A(x)+e) - x\|_2 &\le \|x_{T_0^c} - z - v\|_2 + C_{2,p}\,\frac{\sigma_S(x_{T_0}+z+v)_{\ell^p}}{S^{1/p-1/2}} \\
&\le \|x_{T_0^c} - z - v\|_2 + 2^{1/p-1}\,C_{2,p}\,\frac{\|z\|_p + \|v\|_p}{S^{1/p-1/2}} \\
&\le \sigma_S(x)_{\ell^2} + \|z\|_2 + \|v\|_2 + 2^{1/p-1}\,C_{2,p}\left(\frac{\|Ax_{T_0^c}\|_2}{\gamma_p} + \frac{\|e\|_2}{\gamma_p}\right) \\
&\le \sigma_S(x)_{\ell^2} + \left(C_3 + \frac{2^{1/p-1}\,C_{2,p}}{\gamma_p}\right)\big(\|e\|_2 + \|Ax_{T_0^c}\|_2\big).
\end{aligned} \tag{39}$$

That is, (iii) holds with $C = 1 + C_3 + 2^{1/p-1}\,C_{2,p}/\gamma_p$. By setting $e = 0$, one can see that this is the same constant associated with (ii). This concludes the proof of this theorem.

4.5 Proof of Theorem

First, we show that $(A_\omega, \Delta_p)$ is $(2,p)$ instance optimal of order $S$ for an appropriate range of $S$ with high probability. One of the fundamental results in compressed sensing theory states that for any $\delta \in (0,1)$, there exist $\tilde{c}_1, \tilde{c}_2 > 0$ and $\Omega_{RIP}$ with $P(\Omega_{RIP}) \ge 1 - 2e^{-\tilde{c}_2 M}$, all depending only on $\delta$, such that $A_\omega$, $\omega \in \Omega_{RIP}$, satisfies RIP$(\ell,\delta)$ for any $\ell \le \tilde{c}_1\frac{M}{\log(N/M)}$. See, e.g., [6], [1], for the proof of this statement as well as for the explicit values of the constants. Now, choose $\delta \in (0,1)$ such that $\delta < \frac{2^{2/p-1}-1}{2^{2/p-1}+1}$. Then, with $\tilde{c}_1$, $\tilde{c}_2$, and $\Omega_{RIP}$ as above, for every $\omega \in \Omega_{RIP}$ and for every $S < \frac{\tilde{c}_1}{3}\frac{M}{\log(N/M)}$, the RIP constants of $A_\omega$ satisfy (18) (and hence (12)), with $k = 2$. Thus, by Corollary 2.4, $(A_\omega, \Delta_p)$ is instance optimal of order $S$ with constant $C_2^{1/p}$ as in (15).

Now, set $S_1 = c_1\frac{M}{\log(N/M)}$ with $c_1 \le \tilde{c}_1/3$ such that $S_1 \in \mathbb{N}$ (note that such a $c_1$ exists if $M$ and $N$ are sufficiently large). By the hypothesis of the theorem, $M$ and $N$ satisfy the hypothesis of Lemma 2.12 with $\xi = 2$, $K_1 = 1$, some $0 < \mu < 1/2$, and an appropriate $K_2$ (determined by $c_1$ above). Because

$$\left(\frac{\mu^2\log(N/M)}{M}\right)^{1/p-1/2} = \left(\frac{\mu^2 c_1}{S_1}\right)^{1/p-1/2},$$

by Lemma 2.12, there exists $\Omega_\mu$, $P(\Omega_\mu) \ge 1 - e^{-cM}$, such that for every $\omega \in \Omega_\mu$,

$A_\omega$ satisfies $LQ_p\left(\frac{\gamma_p(\mu)}{S_1^{1/p-1/2}}\right)$, where $\gamma_p(\mu) := c_1^{1/p-1/2}\,\frac{\mu^{2/p-1}}{C(p)}$. Consequently, set $\Omega_1 := \Omega_{RIP} \cap \Omega_\mu$. Then,

$$P(\Omega_1) \ge 1 - 2e^{-\tilde{c}_2 M} - e^{-cM} \ge 1 - 3e^{-c_2 M},$$

for $c_2 = \min\{\tilde{c}_2, c\}$. Note that $c_2$ depends on $c$, which is now a universal constant, and $\tilde{c}_2$, which depends only on the distribution of $A_\omega$ (and in particular its concentration of measure properties, see [1]). Now, if $\omega \in \Omega_1$, $A_\omega$ satisfies RIP$(3S_1,\delta)$, thus RIP$(S_1,\delta)$, as well as $LQ_p\left(\frac{\gamma_p}{S_1^{1/p-1/2}}\right)$. Therefore, we can apply part (i) of Theorem 2.13 to get the first part of this theorem, i.e.,

$$\|\Delta_p(A_\omega(x)+e) - x\|_2 \le C\left(\|e\|_2 + \frac{\sigma_{S_1}(x)_{\ell^p}}{S_1^{1/p-1/2}}\right). \tag{40}$$

Here $C$ is as in (38) with $C_{2,p} = C_2^{1/p}$. To finish the proof of part (i), note that for $S \le S_1$, $\sigma_{S_1}(x)_{\ell^p} \le \sigma_S(x)_{\ell^p}$ and $S^{1/p-1/2} \le S_1^{1/p-1/2}$.

To prove part (ii), first define $T_0$ as the support of the $S_1$ largest coefficients (in magnitude) of $x$ and $T_0^c = \{1,\dots,N\} \setminus T_0$. Now, note that for any $x$ there exists a set $\tilde{\Omega}_x$ with $P(\tilde{\Omega}_x) \ge 1 - e^{-\tilde{c}M}$ for some universal constant $\tilde{c} > 0$, such that for all $\omega \in \tilde{\Omega}_x$, $\|A_\omega x_{T_0^c}\|_2 \le 2\|x_{T_0^c}\|_2 = 2\sigma_{S_1}(x)_{\ell^2}$ (this follows from the concentration of measure property of Gaussian matrices, see, e.g., [1]). Define $\Omega_x := \tilde{\Omega}_x \cap \Omega_1$. Thus,

$$P(\Omega_x) \ge 1 - 3e^{-c_2 M} - e^{-\tilde{c}M} \ge 1 - 4e^{-c_3 M},$$

where $c_3 = \min\{c_2, \tilde{c}\}$. Note that the dependencies of $c_3$ are identical to those of $c_2$ discussed above. Recall that for $\omega \in \Omega_1$, $A_\omega$ satisfies both RIP$(S_1,\delta)$ and $LQ_p\left(\frac{\gamma_p}{(S_1)^{1/p-1/2}}\right)$. We can now apply part (iii) of Theorem 2.13 to obtain, for $\omega \in \Omega_x$,

$$\|\Delta_p(A_\omega(x)+e) - x\|_2 \le C\big(3\sigma_{S_1}(x)_{\ell^2} + \|e\|_2\big). \tag{41}$$

Above, the constant $C$ is as in (39). Once again, note that for $S \le S_1$, $\sigma_{S_1}(x)_{\ell^2} \le \sigma_S(x)_{\ell^2}$ to finish the proof for any $S \le S_1$.

5 Appendix: Proof of Lemma 4.2

In this section we provide the proof of Lemma 4.2 for the sake of completeness and also because we explicitly calculate the optimal constants involved. Let us first introduce some notation used in [18] and [23].
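For a body $K \subset \mathbb{R}^n$, define its gauge functional by $\|x\|_K := \inf\{t > 0 : x \in tK\}$, and let $T_q(K)$, $q \in (1,2]$, be the smallest constant $C$ such that

$$\forall m \in \mathbb{N},\ x_1,\dots,x_m \in K: \qquad \inf_{\epsilon_i = \pm 1}\left\{\Big\|\sum_{i=1}^{m}\epsilon_i x_i\Big\|_K\right\} \le C\,m^{1/q}.$$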

Given a $p$-convex body $K$ and a positive integer $r$, define

$$\alpha_r = \alpha_r(K) := \sup\left\{\frac{\big\|\sum_{i=1}^{r} x_i\big\|_K}{r} : x_i \in K,\ i \le r\right\}.$$

Note that $\alpha_r \le r^{-1+1/p}$. Finally, conforming with the notation used in [18] and [23], we define $\delta_K := d_1(K, \mathrm{conv}(K))$. Note that this should not cause confusion as we do not refer to the RIP constants throughout the rest of the paper. It can be shown by a result of [24] that $\delta_K = \sup_r \alpha_r(K)$, cf. [18, Lemma 1] for a proof.

We will need the following propositions.

Proposition 5.1 (sub-additivity of $\|\cdot\|_K^p$) For the gauge functional $\|\cdot\|_K$ associated with a $p$-convex body $K \subset \mathbb{R}^n$, the following inequality holds for any $x, y \in \mathbb{R}^n$:

$$\|x+y\|_K^p \le \|x\|_K^p + \|y\|_K^p. \tag{42}$$

PROOF. Let $r = \|x\|_K$ and $u = \|y\|_K$. If at least one of $r$ and $u$ is zero, then (42) holds trivially. (Note that, as $K$ is a body, $\|x\|_K = 0$ if and only if $x = 0$.) So, we may assume that both $r$ and $u$ are strictly positive. Since $K$ is compact, it follows that $x/r \in K$ and $y/u \in K$. Furthermore, $K$ is $p$-convex, i.e., for all $\alpha, \beta \in [0,1]$ with $\alpha + \beta = 1$, we have $\alpha^{1/p}\,x/r + \beta^{1/p}\,y/u \in K$. In particular, choose $\alpha = \frac{r^p}{r^p+u^p}$ and $\beta = \frac{u^p}{r^p+u^p}$. This gives

$$\frac{x+y}{(r^p+u^p)^{1/p}} \in K.$$

Consequently, by the definition of the gauge functional, $\left\|\frac{x+y}{(r^p+u^p)^{1/p}}\right\|_K \le 1$. Finally,

$$\left\|\frac{x+y}{(r^p+u^p)^{1/p}}\right\|_K^p = \frac{\|x+y\|_K^p}{r^p+u^p} \le 1 \qquad \text{and} \qquad \|x+y\|_K^p \le r^p + u^p = \|x\|_K^p + \|y\|_K^p.$$

Proposition 5.2 $T_2(B_2^n) = 1$.

PROOF. Note that $\|\cdot\|_{B_2^n} = \|\cdot\|_2$, and thus, by definition, $T_2(B_2^n)$ is the smallest constant $C$ such that for every positive integer $m$ and for every choice of points $x_1,\dots,x_m \in B_2^n$,

$$\inf_{\epsilon_i = \pm 1}\left\{\Big\|\sum_{i=1}^{m}\epsilon_i x_i\Big\|_2\right\} \le C\sqrt{m}. \tag{43}$$

For $m \le n$, we can choose $\{x_1,\dots,x_m\}$ to be orthonormal. Consequently,

$$\Big\|\sum_{i=1}^{m}\epsilon_i x_i\Big\|_2^2 = \sum_{i=1}^{m}\epsilon_i^2 = m,$$

and thus, $T_2 = T_2(B_2^n) \ge 1$. On the other hand, let $m$ be an arbitrary positive
