arxiv: v1 [cs.it] 29 May 2013

Size: px

Start display at page:

Download "arxiv: v1 [cs.it] 29 May 2013"

Rosemary Golden
6 years ago
Views:

1 GREEDY TYPE ALGORITHMS FOR RIP MATRICES. A STUDY OF TWO SELECTION RULES. EUGENIO HERNÁNDEZ AND DANIEL VERA arxiv: v1 [cs.it] 29 May 2013 Abstract. On [24] some consequences of the Restricted Isometry Property (RIP) of matrices have been applied to develop a greedy algorithm called ROMP (Regularized Orthogonal Matching Pursuit) to recover sparse signals and to approximate non-sparse ones. These consequences were subsequently applied to other greedy and thresholding algorithms lie SThresh, CoSaMP, StOMP and SWCGP. In thispaper, wefind anotherconsequenceoftherippropertyanduseit toanalyzethe approximation to -sparse signals with Stagewise Wea versions of Gradient Pursuit (SWGP), Matching Pursuit (SWMP) and Orthogonal Matching Pursuit (SWOMP) algorithms described in in [5]. We combine the above mentioned algorithms with another selection rule similar to the ones that appeared in [8] and [15] showing that results are obtained with less restrictions in the RIP constant, but we need a smaller threshold parameter for the coefficients. The results of some experiments are shown. 1. Introduction One problem in Compressed Sensing (CS) is to reconstruct a sparse vector (all except elements are zero) x R N from a lower dimension vector y = Φx, where Φ R m N is called a CS matrix or measurements ensemble. The aim of CS is to compress the signal while taing samples at the same time in such a way that a good reconstruction is possible. The next definition (see [6]) is a sufficient condition for the so called CS matrices to yield exact reconstruction of sparse signals with the Basis Pursuit (BP) algorithm or l 1 minimization with equality constraints (as firstly proposed by Donoho and collaborators for dictionaries in the signal processing community). This property will play a central role in the results developed here. Definition 1.1. Given N, a matrix Φ C m N (m > ) is said to satisfy the Restricted Isometry Property with parameter δ, 0 < δ < 1 (called the Restricted Isometry Constant), if (1 δ ) x 2 l 2 (R N ) Φx 2 l 2 (R m ) (1+δ ) x 2 l 2 (R N ) (1.1) for all sparse vectors x R N. It is nown that those matrices that satisfy RIP and allow the least number of measurements m for reconstruction of all sparse signals are some random matrices (therefore reconstruction is in probability). For example, in [6] and [14], it is shown that it suffices to tae m linearly with the sparsity and polylogarithmic with the Date: May 31, Mathematics Subject Classification. 41A46, 68W20. Key words and phrases. Compressed sensing, convergence rate, greedy algorithms, random matrices, restricted isometry property. Research supported by Grant MTM of Spain. 1

2 2 EUGENIO HERNÁNDEZ AND DANIEL VERA ambient dimension N, i.e. m C log(n/) for Gaussian and Bernoulli matrices, and m Clog 6 (N) for Fourier matrices (for better bounds see [26]) to reconstruct, with high probability, a -sparse vector with the BP algorithm. It has become standard to use greedy algorithms to iteratively identify the support Γ := supp (x) of a sparse signal. This is done by computing the inner product of the residue r n 1 of the approximation at step n 1 and the columns of the matrix Φ, i.e. g n = Φ r n 1 (Φ denotes de transpose of Φ), and then select the largest element(s) (in absolute value) in g n = (g n 1,...,gn N )t, where each g n i = φ i,r n 1 and φ i is the i-th column of Φ. Since Φ verifies the RIP property, then such an inner product gives an idea on where the support may be because the square of the energies of the -sparse vector signal x and the observation vector y = Φx should not differ more than δ. To see this more clearly, suppose we now the true support of x, Γ = supp (x), and y = Φx is the observation. Writing Φ Γ (resp. x Γ ) to denote the matrix Φ (resp. the vector x) restricted to the columns (resp. the elements) indexed by Γ 1,2,...,N}, and (Φ Γ ) = (Φ Γ Φ Γ ) 1 Φ Γ for the pseudo inverse of Φ Γ (which exists by (1.1)), we can recover x from y using (Φ Γ ) since y = Φx = Φ Γ x Γ and (Φ Γ ) y = (Φ Γ Φ Γ ) 1 Φ Γ Φ Γ x Γ = x Γ. (1.2) We shall use the notation R Γ to denote the subspace of the ambient space R N with significant coordinates in Γ 1,...,N}. Notice the prominent role of Φ Γ : R m R Γ in the above argument. The matrix Φ (the transpose of Φ) will be used in the algorithms below. Also, r span (Φ Γ ) means that r is a linear combination of the columns of Φ indexed by Γ. Section 2 studies conditions to identify the support of a sparse signal sensed with RIP matrices using different greedy type algorithms. A review of Stagewise Wea versions of the Gradient Pursuit (SWGP), Matching Pursuit (SWMP), and Orthogonal Matching Pursuit (SWOMP) algorithms, as first proposed in [5], is done in Subsection 2.1. The main novelty, as pointed out in [5], is that not only one but several elements are allowed to be selected in each iteration. This is a feature also present in [15], a paper published in 2012 but circulated since 2006 as a preprint. Wea algorithms were used in non-linear approximation theory before the development of Compressed Sensing (see [17], [2] and more recently [12], [27], [28], [29] and the references therein), but in all of these wors only one element was selected at each iteration. Known properties of RIP matrices are stated in Subsection 2.2, while a new property of these matrices is proved in Subsection 2.3 (see Lemma 2.2). We give conditions on Φ and on the weaness parameter α of the selection rule to identify the support of a sparse signal in Subsection 2.4 (see Theorem 2.3) for all of the Stagewise Wea algorithms mentioned above. Another selection rule, called relaxed in this paper, is introduced in Section 3 given rise to new algorithms that we name Relaxed Wea Gradient Pursuit (RWGP), Relaxed Wea Matching Pursuit (RWMP) and Relaxed Wea Orthogonal Matching Pursuit (RWOMP). The strategy of selecting several elements in each iteration is also

3 GREEDY TYPE ALGORITHMS FOR RIP MATRICES 3 used in these algorithms. In the Relaxed selection rule, elements are chosen if their magnitude is larger than a fraction of the energy of the residue at an iteration. This procedure has also been used in [8] and [15] (see the second paragraph in Section 3). The name Wea Relaxed has appeared in the non-linear approximation theory associated to greedy algorithms (see [12], [27]), but with a different meaning that in this paper. In Theorem 3.1 we give conditions on a matrix Φ satisfying RIP and on the weaness parameter α of the Relaxed selection rule to identify the support of a sparse signal. In Section 4 the convergence of all the above algorithms is studied. The energy of the residual of the observation at iteration n is compared with the energy at iteration n 1, i.e, r n l 2 C r n 1 l 2.FortheGP,SWGPandRWGPalgorithmsweestablish in Theorem 4.1 the above inequality with C = (1 1 δ (1+δ ) )1/2 < 1; this is a more explicit version than the ones already nown for GP and SWGP (see details in Section 4). For the SWMP, SWOMP, RWMP, and RWOMP the result is stated in Theorem 4.7 giving C = (1 (1 δ ) 2 ) 1/2 < 1. Notice that Gradient Pursuit algorithms seem to have stronger rate of convergence than Matching Pursuit ones, a fact present in the experiments shown for images in Section 6. In Section 5 we study the behavior of the selection rules with particular Gaussian and Bernoulli random matrices not necessarily satisfying RIP. Here we prove that with high probability the algorithms allow to recover the position of the entries of a given -sparse signal. Finally, some experiments are shown in Section 6 for the algorithms described in the above sections, and compare results with already existing algorithms to recover sparse signals and approximate compressible images with a sparse representation. 2. Support Identification with RIP. We will review Stagewise Wea versions of the Gradient Pursuit (SWGP), Matching Pursuit (SWMP) and Orthogonal Matching Pursuit (SWOMP) algorithms and some consequences of the RIP property. Then, we will develop a new consequence of RIP and find some conditions so the algorithms select elements on the support of a sparse signal x on each iteration SWGP, SWMP and SWOMP algorithms. The Stagewise Wea Gradient Pursuit (SWGP), Stagewise Wea Matching Pursuit (SWMP) and Stagewise Wea Orthogonal Matching Pursuit (SWOMP) algorithms select a set of elements (possibly not new) in each iteration by comparing the maximum of the inner products between the columns of the measurement ensemble with the residue at the previous iteration. This stage in the algorithms is the selection rule, where a weaness parameter α is introduced. The main differences between these algorithms are the direction search and the way to update the approximation. For some history on MP and OMP see [30] and references therein. The stagewise wea selection rule is defined in [5]. STAGEWISE WEAK MATCHING PURSUIT (SWMP)

4 4 EUGENIO HERNÁNDEZ AND DANIEL VERA TheMPalgorithmwasthefirstofthegreedyalgorithmstobeusedinsignalprocessing (see [23]); as far as we now, the weaness parameter appeared for the first time in [17] and has been extensively used in non-linear approximation (see [12], [27], [28], [29] and the references therein). In the initialisation we set: at iteration n = 0 the residue is r 0 = y, the approximation to the observation is y 0 = 0 and the estimation of the signal is x 0 = 0. Recall that φ i denotes the i-th column of Φ. The loop until some criteria are met follows the next steps: g n = Φ r n 1, the proxy of the signal. } I n := I n (α) := i : gi n α Φ r n 1 l (R N ), the stagewise wea selection rule with 0 < α 1. y n = y n 1 + i I n gi nφ i = y n 1 + i I n φ i,r n 1 φ i, approximation to the observation. x n i = x n 1 i +g n i = x n 1 i + φ i,r n 1, i I n and x n j = x n 1 j of the signal. r n = r n 1 i I n g n i φ i = r n 1 i I n φ i,r n 1 φ i, the residual. if j / I n, estimation Byrecursiononecanseefromthedefinitionofy n andr n thatr n = r n 1 (y n y n 1 ) = = y y n. STAGEWISE WEAK ORTHOGONAL MATCHING PURSUIT (SWOMP) SWOMP is similar to SWMP. Once I n has been selected in SWMP, the updating of the approximation with i I n gi n φ i might not be the best approximation from the subspace spanned by the columns φ i } i In. For SWOMP instead, the update of the approximation is y n = P Γ ny where P Γ n is the orthogonal projection into the span of the columns Γ n of Φ: y n = Φ Γ nφ Γ ny = Φ Γ n(φ Γ nφ Γ n) 1 Φ Γny on the indices Γ n = n =1 I and therefore the residue becomes r n = y P Γ ny. The residue is then orthogonal to all elements previously selected as can be seen from Φ Γ nrn = Φ Γ n(y Φ Γ nφ Γ ny) = Φ Γ ny Φ Γ nφ Γ n(φ Γ nφ Γ n) 1 Φ Γny = 0. Thus, at every iteration new elements are selected. The initialisation is as in SWMP and the recursion loop is: g n = Φ r n 1, the proxy of the signal. } I n := I n (α) := i : gi n α Φ r n 1 l (R N ), the stagewise wea selection rule with 0 < α 1. Γ n = Γ n 1 I n, update of selected elements. x n := x n Γ n = Φ Γny, estimation of the signal. y n = Φ Γ nφ Γny, approximation to the observation. r n = y y n, the residual. STAGEWISE WEAK GRADIENT PURSUIT (SWGP) The Gradient Pursuit (GP) is described in [4] and the Stagewise Wea Gradient Pursuit (SWGP) was developed in [5] (together with some other variants). We describe SWGP since GP is obtained from SWGP by setting the weaness parameter α = 1 in the selection rule.

5 GREEDY TYPE ALGORITHMS FOR RIP MATRICES 5 At iteration n = 0 we have: x 0 = 0, the estimation of the signal; r 0 = y, the residue; Γ 0 =, the support. Then, the recursion at step n until some criteria are met is the following: g n = Φ r n 1, the proxy of the signal. I n := I n (α) := i : gi n α rn 1 l (R N )}, the stagewise wea selection rule with 0 < α 1. Γ n = Γ n 1 I n, the updated support. d n Γ = n Φ Γ nrn 1, the updated direction. a n = rn 1,Φ Γ nd n Γ n, the optimised step. Φ Γ nd n Γ n 2 l 2 (R m ) x n := x n Γ = n xn 1 +a n d n Γn, the estimation to the signal. y n = y n 1 +a n Φ Γ nd n Γn, the approximation to the observation. r n = r n 1 a n Φ Γ nd n Γ n. Again, one can prove that r n = y y n. In [5] it is proved that SWGP converges to the solution at least as good as the simpler version GP Some Consequences of RIP. This section is based on some implicit results in [6], see especially Lemma 2.1, that were made explicit in [24]; for the proofs the reader can also see [8]. Lemma 2.1. Assume that Φ R m N satisfies RIP with δ, Γ, supp(u) = Γ. Then Φ Γ l 2 (R Γ ) l 2 (R m ) = Φ Γ l 2 (R m ) l 2 (R Γ ) (1+δ ) 1/2, (2.1) (1 δ ) u l 2 (R Γ ) Φ ΓΦ Γ u l 2 (R Γ ) (1+δ ) u l 2 (R Γ ), (2.2) (1+δ ) 1 u l 2 (R Γ ) (Φ Γ Φ Γ) 1 u l 2 (R Γ ) (1 δ ) 1 u l 2 (R Γ ), (2.3) For disjoint sets Γ,Γ such that Γ Γ, we have Φ Γ Φ Γu l 2 (R Γ ) δ u l 2 (R Γ ). (2.4) All of these results have become standard for the greedy algorithms in compressed sensing One more consequence. We derive here another bound as a consequence of the RIP property. Lemma 2.2. Let Φ verify RIP with parameter δ and let Γ. For all r span (Φ Γ ) Φ Γr l 2 (R Γ ) (1 δ ) 1 2 r l 2 (R m ) (2.5) Proof. For the pseudo-inverse we have Φ Γ r l (Φ 2 (R )= Γ Γ Φ Γ) 1 Φ Γ r l 2 (R Γ ) (1 δ ) 1 Φ Γ r l 2 (R Γ ),

6 6 EUGENIO HERNÁNDEZ AND DANIEL VERA where we have used (2.3) in the last inequality. We next use this result and the Schwarz inequality to get Φ Γr 2 l 2 (R Γ ) (1 δ ) Φ Γ r l 2 (R ) Φ Γr l Γ 2 (R Γ ) (1 δ ) Φ Γ r,φ Γ r = (1 δ )(Φ Γ r) Φ Γ r = (1 δ )r Φ Γ (Φ Γ Φ Γ) 1 Φ Γ r = (1 δ )y ΓΦ ΓΦ Γ (Φ ΓΦ Γ ) 1 Φ ΓΦ Γ y Γ = (1 δ )y Γ Φ Γ Φ Γy Γ = (1 δ ) r 2 l 2 (R m ), using the fact that, since r is in the span of Φ Γ, we may write r = Φ Γ y Γ for some y Γ, and conclude the proof Support Identification with SWGP, SWMP and SWOMP. We now give sufficient conditions on the matrix Φ so that the SWGP, SWMP and SWOMP algorithms select elements on the support Γ of the sparse signal x. The SWOMP algorithm will select new atoms in each iteration due to the orthogonality with previous residues, therefore convergence to exact reconstruction of sparse vectors is guaranteed in at most iterations when the condition is met. As previously mentioned, the stagewise wea selection rule is I n := I n (α) := i : gi n α r n 1 } l, (2.6) (R N ) for some α (0,1]. For the next result we follow the line of reasoning of [30], (see also [16] and [5]), where the results are given for OMP on quasi-incoherent dictionaries. Here we use Lemma 2.2. Theorem 2.3. Let Φ satisfies RIP with δ +1. A sufficient condition for the SWGP, SWMP and SWOMP algorithms with selection rule I n (α) given by (2.6), 0 < α 1, to identify elements in the support Γ of the sparse signal x is δ+1 α >. (2.7) 1 δ Proof. Since supp(x) = Γ, then r 0 span(φ Γ ). The algorithms update the approximations and residuals precisely in the indices contained in I n, so that to proceed by induction we can assume that after n 1 iterations we have r n 1 span(φ Γ ). We drop the superindex of r for the proof. The condition I n Γ is implied by Φ Γ c r l (R Γ c ) < α Φ Γ r l (R Γ ). Using the assumption on r, we can write r = Φ Γ y Γ for some y Γ. Rearranging to express the condition as a quotient (called the greedy selection ratio), squaring and choosing λ as one index in Γ c with the largest value, it yields Φ Γ c r 2 l (R Γ c ) Γ r 2 l (R Γ ) = Φ λ r 2 l (R λ ) Γ r 2 l (R Γ ) Φ λ r 2 l 2 (R λ ) 1 Φ r 2 Γ l 2 (R Γ ) = Φ λ Φ Γ y Γ 2 l 2 (R λ ) Φ Φ Γ Γ y Γ 2 l 2 (R Γ ) δ2 +1 (1 δ ) 2

7 GREEDY TYPE ALGORITHMS FOR RIP MATRICES 7 where the first inequality is due to usual norm inequalities and the second by (2.4) in the numerator and consecutive application of (2.5) and the left-hand side of (1.1) in the denominator. Comparing the right side of the last inequality with α 2 ends the proof. Corollary 2.4. With the conditions of Theorem 2.3, if any of the algorithms has selected elements, then it has found the whole support of the -sparse vector x. Since SWOMP, as well as OMP, select new elements at each iteration, in at most steps they recover the -sparse signal by (1.2). Thus we have Corollary 2.5. With the conditions of Theorem 2.3, the SWOMP and OMP algorithms recover every -sparse vector x in at most iterations. Remar 2.6. Observe that δ δ +1 since the set of all -sparse vectors is contained in the set of all ( +1)-sparse vectors. Moreover, to achieve (2.7) the RIP constants δ +1 and δ must satisfy δ +1 < α(1 δ )/. This gives a restriction on, that is < (α(1 δ )/δ +1 ) 2. Remar 2.7. The proof of Theorem 2.3 follow the ideas of the proof in [30], regarding deterministic quasi-incoherent dictionaries, which uses the fact that r = P Γ r = Φ Γ Φ Γ r (that is OMP type algorithms) and then bounds with usual norm inequalities. The condition obtained is then called Exact Reconstruction Condition ERC α for quasi-incoherent dictionaries since the result is on OMP and therefore exact reconstruction is guaranteed in at most iterations; a similar result is called Stability Condition in [16] since it is applied to MP and no exact reconstruction of sparse signals is guaranteed in iterations in this case. In our case, a straight use of the consequences of the RIP property lead us to the result. Remar 2.8. Suppose that δ +1 < Since δ δ +1 we have δ +1 < 1 1+ = δ. Thus, condition (2.7) is satisfied with α = 1. By Corollary 2.5, if δ +1 < 1 1+ the OMP algorithm recovers any -sparse vector x in at most iterations. This result has recently appeared in [20] and [22] (see also [11] and [18] for previous smaller bounds on δ +1 ). Moreover, it is proved in [22] that there exits a -sparse vector x R +1 and a ( +1) ( +1) matrix Φ satisfying RIP with δ +1 = 1 such that OMP does not recover x in iterations (proving a conjecture estated in [9]). Thus the bound δ +1 < 1 1+ is nearly optimal. 3. Support Identification with a Relaxed Wea Selection Rule Next, we consider another decision rule to select indices in the true support of a sparse signal. With the same notation as in section 2, let Ĩ n := Ĩn( α) := i : gi n α } r n 1 l, (3.1) 2 (R m )

8 8 EUGENIO HERNÁNDEZ AND DANIEL VERA be the relaxed wea selection rule. Rule (3.1) compares the absolute value of gi n = φ i,r n 1 with r n 1 l 2 (R m ). It can be proved (see the proof of Theorem 9.10 in [21], p. 422) that there exists β 0 > 0 such that for any x R m, sup i=1,...,n φ i,x β 0 x l 2 (R m ). Thus, Ĩn(α) is a non empty set if we choose α β 0. is Selection rule (3.1) is similar to the ones considered in [15] and [8]. In [15] the rule J n (t) := i : gi n t } r n 1 m l, 2 (R m ) and the algorithm developed is called StOMP. Thus, Ĩn( α) is J n (t) with α = t/ m. In [8] the rule is D n (δ) := i : gi n t } r n 1 l, 2 (R m ) and the algorithm developed is called DTresh (and its cousin STresh). Thus, Ĩn( α) is D n (t) with α = t/. In both of these algorithms the update of the approximation is done using the same updating as the OMP algorithm. In rule (3.1) we compare with the energy of the residual r n 1 l 2 (R m ) instead of comparing withthemaximum ofthecorrelationsoftheresidue withthe atoms ofφ, i.e. Φ r n 1 l (R N ). We choose the term relaxed since, as we will see in Theorem 3.1, the condition on δ +1 and δ is weaer than the one required in Theorem 2.3. Therefore, we decided to call the greedy algorithms with this selection rule Relaxed WGP, WMP or WOMP algorithms, writing RWGP, RWMP and RWOMP respetively. We still consider x a sparse vector with support on Γ. Theorem 3.1. Let Φ satisfies RIP with δ +1. A sufficient condition for the RWGP, RWMP and RWOMP algorithms to identify elements on the support Γ of x with the relaxed wea selection rule Ĩn( α) given by (3.1) is that Ĩn( α) and α > δ +1 (1 δ ) 1/2. (3.2) Proof. It is basically the same proof as for Theorem 2.3. Again, we drop the superindex of r and mae the assumption r span (Φ Γ ) to proceed by induction. The condition Ĩn Γ is implied by Φ Γ c r l (Γ c ) < α r l 2 (R m ). As before, rearranging to express it as a quotient (called relaxed wea greedy selection ratio), squaring and choosing λ as one index in Γ c with the largest value of Φ Γ c r, it

9 GREEDY TYPE ALGORITHMS FOR RIP MATRICES 9 gives Γ c r 2 l (R Γ c ) r 2 l 2 (R m ) = Φ λ r 2 l (R λ ) Φ Γ y Γ 2 l 2 (R m ) = Φ λ Φ Γ y Γ 2 l 2 (R λ ) Φ Γ y Γ 2 l 2 (R m ) Φ λ Φ Γ 2 l 2 (R Γ ) l 2 (R λ ) y Γ 2 l 2 (R Γ ) Φ Γ y Γ 2 l 2 (R m ) δ 2 +1 (1 δ ), where the first and second inequalities are due to usual norm inequalities and the third by (2.4) in the numerator and the left side of (1.1) in the denominator. Comparing the left hand side of the last inequality with α 2 ends the proof. Remar 3.2. Suppose r span(φ Γ ) where Φ is a RIP matrix; then Φ Γ r l (R Γ ) 1 Φ Γ r 2 l 2 (R Γ ) (1 δ ) 1/2 r 2 l 2 (R Γ ), by the usual norm inequality and Lemma 2.2. Thus taing α (1 δ ) 1/2 (3.3) we always have Ĩn( α) at each iteration of the algorithms with the Relaxed selection rule. Notice that (3.2) and (3.3) could hold at the same time for some value of α only if δ +1 (1 δ ) 1/2 (1 δ ) 1/2, which gives the following restriction: 1 δ δ +1. Remar 3.3. Theorem 3.1 could be considered as an Exact Reconstruction Condition for the Relaxed Wea OMP (RWOMP) algorithm with RIP matrices and with parameter δ +1 (see Corollary 2.5) as long as (3.3) is satisfied. The probability of success depends exclusively on the probability that a random ensemble verifies RIP. It has been proved that Gaussian, Bernoulli and partial Fourier matrices verify RIP with very high probability (exponential concentration) as long as the number of measurements m Clog(N/) for the first two ensembles (see [6],[7]) and m Clog 5 (N) for the random Fourier (see [6], [26]). 4. Convergence. The Sparse Case. In this section we obtain convergence rates for the SWGP, SWMP and SWOMP algorithms and their relaxed counterparts for matrices satisfying RIP. The results here are given in terms of the reduccion of the energy of the residuals r n = y y n of the observation y = Φx rather in the energy of the residuals of the approximation, x x n, as is done in [25] for CoSaMP and in [8] for DThresh.

10 10 EUGENIO HERNÁNDEZ AND DANIEL VERA 4.1. Convergence of GP, SWGP and RWGP. In [4] the analysis of convergence ofthegpisbasedonanexistencetheorem(seetheorem9.10in[21])whenφ C N N 1 is a dictionary (this means that Φ contains at least a base for R N and thus N 1 N) that verifies the Exact Reconstruction Condition ERC α (Γ) for quasi-incoherent dictionaries (see [30]). The analysis in [31] is done for random admissible matrices. The result in the theorem below is obtained for matrices satisfying RIP. We still consider x a sparse vector with support on Γ. Theorem 4.1. Consider the algorithms GP, SWGP and RWGP and suppose that at iteration n we have Γ s Γ, s = 1,2,...,n. Let Φ verifies RIP with δ. Then, for all -sparse vectors x R N (supp (x) = Γ ), r n 2 l 2 (R m ) C r n 1 2, (4.1) l 2 (R m ) with C = (1 1 δ (1+δ ) )1/2 < 1. In the case RWGP suppose α (1 δ ) 1/2 Ĩ n ( α). Proof. To shorten notation we will write d n = d n Γn. We have so that r n 2 l 2 (R m ) = r n 1 a n Φ Γ nd n,r n 1 a n Φ Γ nd n = r n 1 2 l 2 (R m ) an r n 1,Φ Γ nd n a n Φ Γ nd n,r n 1 + a n Φ Γ nd n,a n Φ Γ nd n = r n 1 2 l 2 (R m ) 2 Φ Γ ndn Γ n,rn 1 2 Φ Γ nd n 2 l 2 (R m ) + Φ Γ ndn,r n 1 2 Φ Γ nd n 4 l 2 (R m ) Φ Γ nd n 2 l 2 (R m ) = r n 1 2 rn 1,Φ Γ nd n 2 l 2 (R m ) Φ Γ nd n 2. (4.2) l 2 (R m ) Since d n = Φ Γ nrn 1, the second term above can be bounded below by r n 1,Φ Γ nd n 2 Φ Γ nd n 2 l 2 (R m ) = Φ Γ nrn 1,d n 2 Φ Γ nd n 2 l 2 (R m ) Φ Γ nrn 1 4 l 2 (R Γn ) Φ Γ n 2 l 2 (R Γn ) l 2 (R m ) Φ Γ nrn 1 2 l 2 (R Γn ) We have Φ Γ nrn 1 2 l 2 (R Γn ) 1+δ. (4.3) Φ Γ nrn 1 l (R Γn ) In r n 1 l (R In ) = r n 1 l (R N ), (4.4) because Γ n I n in the inequality and because the definition of I n (in GPand SWGP) in the equality. Since we are supposing that Γ n Γ, using (4.4) yields Γ nr n 1 l 2 (R Γn ) Γ nr n 1 l (R Γn ) r n 1 l (R N ) Γ r n 1 l (R Γ ). (4.5) By (4.5) and usual norm inequality Γ nr n 1 l 2 (R Γn ) Φ Γ r n 1 1 l (R Γ ) Φ Γ r n 1. l 2 (R Γ )

11 GREEDY TYPE ALGORITHMS FOR RIP MATRICES 11 Since r n span (Φ Γ ) (assuming conditions in Theorem 2.3 are met and because r n = y y n in GP and SWGP), by Lemma 2.2 we can write Γ nr n δ l 2 (R Γn ) r n 1 2. (4.6) l 2 (R m ) Substituting (4.6) in (4.3) and (4.2) one gets ( r n 2 l 2 (R m ) 1 1 δ ) r n 1 2 (1+δ ), l 2 (R m ) which shows the result for GP and SWGP algorithms. For RWGP the selection rule Ĩ n ( α) = i : φi,r n 1 α r n 1 l 2 (R m ) } do not allow us to write (4.4). But in this case using Remar 3.2 we can write Γ nr n 1 2 l 2 (R Γn ) Γ nr n 1 2 l (R Γn ) Φ Ĩn r n δ r n 1 2 l 2 (R m ), which shows (4.6) for RWGP and therefore the result. l (RĨn ) Remar 4.2. In [4, Theorem 3], it is proved that r n 2 l 2 (R m ) c rn 1 2 l 2 (R m ) with c = (1 ω ) and ω is a positive real number such that Φx 2 Φ 2 l (R N ) > ω x 2 l 2 (R m ) for 2 all x R m. Our Theorem 4.1 gives a value of C depending on the restricted isometry constant δ and the sparseness. This value of C is less than 1, but very close to 1 when increases. It is easy to show that it can never hold C 1/2 when 2. Using Theorems 2.3, 3.1 and 4.1 we deduce the following: Corollary 4.3. Suppose that Φ satisfies RIP with constants δ and δ +1. δ+1 i) Suppose 1 δ < 1; then, the GP algorithm satisfies (4.1). ii) Let 0 < α 1 and suppose δ+1 1 δ < α; then, the SWGP algorithm with selection rule I n (α) satisfies (4.1). iii) Suppose that δ +1 1 δ < 1 δ and that α is chosen such that +1 < α < (1 δ ) 1/2 (1 δ ) 1/2 ; then, the RWGP algorithm with selection rule Ĩn( α) satisfies (4.1). Remar 4.4. Convergence in algorithms of type Gradient Pursuit (GP, WGP y RWGP) is given in Theorem 4.1 in terms of the convergence of the energy of residual r n = y y n. If Φ satisfies RIP, convergence of the residual in terms of estimation x x n can be obtained from the last result. For algorithms of type GP it is not hard to show that y n = Φx n. Then, y y n l 2 (R m ) = Φx Φx n l 2 (R m ) = Φ Γ (x xn ) l 2 (R m ) (1 δ ) 1/2 x x n l 2 (R N ) (4.7) because the left-hand side of RIP as long as Γ n Γ (which is verified under the conditions of Theorems 2.3 and 3.1). Analogously, if we also assume Γ n 1 Γ we

12 12 EUGENIO HERNÁNDEZ AND DANIEL VERA have y y n 1 l 2 (R m ) = x Φx n 1 l 2 (R m ) = Γ (x x n 1 ) l 2 (R m ) (1+δ ) 1/2 x x n 1 l 2 (R N ) (4.8) because the right-hand side of RIP. From (4.7) and (4.8) we deduce x x n l 2 (R N ) C ( 1+δ 1 δ ) 1/2 x x n 1 l 2 (R N ), (4.9) where C is the constant in Theorem 4.1. Observe that, besides conditions of Corollary 4.3, if we want to assure convergence in l 2 (R N ) of x n to x we need which requires δ < ( 1+δ 1 δ ) 1/2 (1 1 δ (1+δ ) )1/2 < 1, If from some iteration we had Γ n = Γ = supp (x) reduction of energy of residuals would be faster than the given in (4.1), as the next result shows. Theorem 4.5. Consider GP, WGP and RWGP algorithms and suppose that at iteration n 0 we have Γ n 0 = Γ = sop (x). Suppose that α, α > δ +1 and Φ verifies (1 δ ) 1/2 RIP with parameter δ +1. Then, for all n n 0, r n l 2 (R m ) D r n 1 l (4.10) 2 (R m ) with D = (1 1 δ 1+δ ) 1/2 = ( 2δ 1+δ ) 1/2 < 1. Proof. Equality (4.2) and inequality (4.3) in the proof of Theorem 4.1 are still valid in our context. Since Γ n 0 = Γ we have Γ n = Γ for n n 0. The fact that α, α > δ +1 allow us to conclude (1 δ ) 1/2 Ĩn 0,Ĩn Γ. Hence, we can replace Γ n by Γ in (4.3) and since r n 1 span (Φ Γ ), by Lemma 2.2 we can write Γ nr n 1 2 = l 2 (R m ) Φ Γ r n 1 2 l 2 (R m ) (1 δ ) r n 1 2 l 2 (R m ). Substituting this inequality in (4.2) and (4.3) we obtain the result for the three algorithms. Remar 4.6. Constant D of Theorem 4.5 can be made as close to 0 as desired taing δ small enough. In particular, it is enough to tae δ 1/ to reduce the energy of the residuals by half in just one iteration, if conditions of Theorem 4.1 are satisfied. Reasoning as in Remar 4.4 we have x x n l 2 (R N ) (1+δ 1 δ ) 1/2 (1 1 δ 1+δ ) 1/2 x x n 1 l 2 (R N ) for which it is enough to tae δ < 1/3 to assure convergence of x n to x in l 2 (R N ) once we have Γ n 0 = Γ.

13 GREEDY TYPE ALGORITHMS FOR RIP MATRICES Convergence of SWMP and RWMP. The results on convergence in this section are similar to those obtained in [16] for quasi-incoherent dictionaries. Theorem 4.7. Consider the algorithms SWMP and RWMP. Suppose that conditions of Theorems 2.3 (for SWMP) and 3.1 are verified so that Γ s Γ, s = 1,2,... Then, for every -sparse vector x R N with supp (x) = Γ, r n 2 l 2 (R m ) C r n 1 2, (4.11) l 2 (R m ) with C = (1 (1 δ ) 2 ) 1/2 < 1. Proof. We have for SWMP as well as for RWMP algorithms that r n 2 l 2 (R m ) = r n 1 Φ In Φ I n r n 1,r n 1 Φ In Φ I n r n 1 = r n 1 2 l 2 (R m ) rn 1,Φ In Φ I n r n 1 Φ In Φ I n r n 1,r n 1 + In Φ I n r n 1 2 l 2 (R m ) = r n 1 2 l 2 (R m ) 2 In r n 1 2 l 2 (R In ) + In Φ I n r n 1 2 l 2 (R m ).(4.12) By (2.1) we have In Φ I n r n 1 2 l 2 (R m ) (1+δ ) In r n 1 2 l 2 (R In ) since I n Γ and Γ =. Hence, from (4.12) we have r n 2 l 2 (R m ) r n 1 2 l 2 (R m ) 2 In r n 1 2 l 2 (R In ) +(1+δ ) In r n 1 2 l 2 (R In ) = r n 1 2 l 2 (R m ) (1 δ ) In r n 1 2 l 2 (R In ). (4.13) For SWMP as well as RWMP we have In r n δ l 2 (R In ) r n 1 2. (4.14) l 2 (R m ) For SWMP the proof is as in (4.6) from Theorem 4.1 and it is not hard to prove it also for RWMP. Substituting (4.14) in (4.13) we have r n 2 l 2 (R m ) (1 (1 δ ) 2 ) r n 1 2 l 2 (R m ) which is the desired result. Corollary 4.8. Suppose Φ satisfies RIP with constants δ and δ +1. δ+1 a) Let 0 < α 1 and suppose that 1 δ < α; then, SWMP with selection rule I n (α) satisfies (4.11). b) Suppose that δ +1 1 δ < 1 and α is chosen so that δ +1 (1 δ ) 1/2 < α < (1 δ ) 2 ; then, RWMP with selection rule Ĩn( α) satisfies (4.11). Remar 4.9. Constant C of Theorem 4.7 is a numer less than 1, so there is always a decreasing in the residual energy. However, it is close to 1 when increases. It is easy to show that we can never have C 1/2 when 2.

14 14 EUGENIO HERNÁNDEZ AND DANIEL VERA Remar As in Remar 4.4 the decreasing of the residual energy given in (4.11) for SWMP y RWMP can be translated to convergence of the estimation. Thus, x x n l 2 (R N ) (1+δ ) 1/2 C x x n 1 1 δ. l 2 (R N ) For ( 1+δ 1 δ ) 1/2 (1 (1 δ ) 2 ) 1/2 to be less than 1 we must have 1 (1 δ ) 2 < 1 δ 2δ < (1 δ ) 2. 1+δ 1+δ Since (1+δ )(1 δ ) 2 = (1 δ )(1 δ 2) = 1 δ δ 2+δ3 last inequality is equivalent to 2δ < 1 δ δ 2 +δ 3 δ 3 δ 2 (1+2)δ +1 > 0. If δ < 1, we have 2+2 from which 1 > δ +(2 +1)δ > δ 2 +(2+1)δ > δ 2 +(2 +1)δ δ 3, δ < is enough to have convergence of approximations in WMP y RWMP Convergencia de WOMP y RWOMP. In WOMP as well as in RWOMP the residual r n is the vector that carries out the distance from y to the subspace V n = Φ Γ nx : x R n with supp(x) Γ n }. Since Γ n 1 Γ n, it is clear that r n l 2 (R m ) r n 1 l 2 (R m ) is always accomplished. A strict inequality can be obtained observing that the residual in WOMP or RWOMP, temporarily denoted r n OMP, has an energy no larger than that for WGP, RWGP, WMP or RWMP, temporarily denoted r n GP and rn MP, since yn GP as well as ymp n are elements from V n. Therefore, if conditions of Theorems 2.3 are verified (for WGP) or 3.1 (for RWGP) we can apply Theorem 4.1 to get r n OMP l 2 (R m ) r n GP l 2 (R m ) C r n 1 GP l 2 (R m ) C n r 0 l GP 2 (R m ) Cn y l 2 (R m ), (4.15) with C < 1, the constant of Theorem 4.1. Analogously, but using Theorem 4.7 we have r n OMP l 2 (R m ) C n y l 2 (R m ), (4.16) with C < 1 the constant of Theorem 4.7. Among the constants C and C the relation is C < C since C < C 1 δ (1+δ ) > (1 δ ) 2 1 > 1 δ 2, 1 1+δ > 1 δ and the last inequality is true because 0 < δ < 1. Therefore, (4.15) gives faster convergence than (4.16) and it proves that GP algorithms converge faster than MP algorithms.

15 GREEDY TYPE ALGORITHMS FOR RIP MATRICES 15 Sincer n OMP isavectorperpendiculartov n, theconstantin(4.15)couldbeimproved, at least in principle. Since r n OMP = y y n = y Φ Γ nφ Γ ny, where Φ Γ is the pseudo-inverse of Φ n Γn, we have to study r n OMP 2 l 2 (R m ) = y 2 l 2 (R m ) y,φ Γ nφ Γ y Φ n Γ nφ Γ y,y n + Φ Γ nφ Γ ny 2. (4.17) l 2 (R m ) We are not yet able to find a bound for (4.17) of the form r n OMP l 2 (R m ) Bn y l 2 (R m ) (4.18) with B K < C. In the case that conditions of Theorems 2.3 and 3.1 are satisfied (for example, if δ +1 < α 1 δ δ for WOMP and +1 < α < (1 δ ) 1/2 (1 δ ) 1/2 for RWOMP) inequalities (4.15)and(4.16) aretrivial ifn since algorithmsoftypeompidentify the support of a -sparse signal x in as much iterations, and then r n = 0. Therefore, it is only interesting to find bounds of the form (4.18) from expression (4.17) if it is satisfied with values of δ +1 andδ less restrictive than those in Theorems 2.3 and 3.1, for which algorithms do not identify the support of x. 5. Behavior of the selection rules for some random matrices The reader can find in [13] a way to construct matrices that satisfy RIP deterministically. These matrices are of order m N with m = p 2 (p a prime number) and N = p r+1, 0 < r < p, and satisfy RIP with < p + 1 and δ r = ( 1)r/p. Therefore, m = p 2 > ( 1) 2 r 2 > ( 1) 2 ; which gives a value of m much larger than the necessary to recover a -sparse signal with the l 1 minimization which is m Clog(N/). It is nown (see [3]) that there exist random matrices that satisfy RIP with parameter δ for any m Clog(N/) with probability greater than 1 2e c m. Among those are the matrices that satisfy an inequality nown as concentration of measure, namely, } Φ(ω)x l P 2 (R m ) x l 2 (R N ) ε x 2 l 2 (R N ) 2e mc0(ε), 0 < ε < 1, (5.1) where the probability is taen over all random matrices Φ(ω) of order m N and c 0 (ε) > 0 is a constant that depends only on ε. An example of such matrices are those Φ = (φ i,j ) such that φ i,j is an independent Gaussian random variable N(0,1/ m), that is, with mean 0 and standard deviation 1/ m. In this case we have c 0 (ε) = ε2 ε3 (see [10]). 4 6 Another example are the matrices whose entries are independent Bernoulli random variables with values 1/ m,1/ m}, with probability 1/2 each. In this case we also have c 0 (ε) = ε2 ε3 (see [1]). 4 6 In this section we will study the behavior of the selection rules and algorithms given in 2.1 and 3 with respect to the random matrices Φ as just described above. The aim is to prove directly that this ind of matrices select elements of the support of a - sparse signal with high probability, following the reasoning given in [31] for Orhogonal Matching Pursuit (OMP).

16 16 EUGENIO HERNÁNDEZ AND DANIEL VERA We start with a result on random processes (see [31] and the references cited in this article), whose proof is given for completeness: Lemma 5.1. a) Letzbeavector withdimensionmwhosecomponentsare Gaussian r.v. N(0,1/ m) i.i.d. Independently a vector u unitary in l 2 (R m ) is chosen. We have, for 0 < ε 1, P u,z ε} e ε2 2 m. (5.2) b) Let w a vector of dimension m whose entries are symmetric Bernoulli r.v. 1/ m,1/ m} i.i.d. Independently a vector u unitary in l 2 (R m ) is chosen. We have, for 0 < ε 1, P u,w ε} 2e ε2 2 m. (5.3) Proof. The inner product u,z = m i=1 u iz i is a Gaussian r.v. (the sum of Gaussian r.v. is a Gaussian r.v.) with mean E u,z } = m i=1 u iez i } = 0 and standard deviation (E u,z 2} m ) 1/2 = ( u 2 i E } m zi 2 + u i u j Ez i z j }) 1/2 = 1, m i=1 since the r.v. z i are independents and that u, unitary, is independent from z. Hence, since u, z is symmetric m 2 P u,z ε} = 2P u,z ε} = 2 e 1 2 mx2 dx = e y2 2π ε π ε 2 dy (5.4) m maing the change of variable mx = y. Let We have where I 2 = I = ε m i=1 j i e y2 2 dy. ( )( ) e y2 ε 2 dy e x2 m ε 2 dx = m e y2 +x 2 2 dxdy, R ε,m Passing to polar coordinates I 2 Therefore, I π ε m ( e y2 +x 2 ε m R ε,m = (x,y) = w R 2 : x,y 0, w 2 ε 2m}. π/2 0 2 e ε 2 m ε 2m ) 2 dy π [ ] e r2 2 rdrdθ = e r2 2 2 ε = π m 2m 2 e ε2 2. 2, which substituting in (5.4) yields (5.2). dx

17 GREEDY TYPE ALGORITHMS FOR RIP MATRICES 17 b) In this case the Hoeffding inequality can be applied (see Theorem 4 in [19]) to get m } m m } } P u,w ε } = 2P u i w i ε = 2P u i w i E u i w i ε i=1 i=1 = 2e 2ε2 / m i=1 (u i/ m ( u i / m)) 2 = 2e ε2 m 2, since E m i=1 u iw i } = m i=1 u iew i } = 0 because u is independent from the r.v. w i and u 2 = 1. In this section we will consider random matrices Φ(ω) R m N that satisfy next conditions, similar to those conditions for admissible matrices in [31]: (M1) The columns of Φ(ω) are statistically independents. } (M2) For each column φ j (ω), j = 1,...,N, of Φ(ω) we have E φ j (ω) 2 l 2 (R m ) = 1. (M3) Let u R m a vector with u l 2 (R m ) 1. If φ(ω) is a column of Φ(ω) independent from u, P φ(ω),u ε} q 1 e c 1ε 2m, with q 1,c 1 constants, q 1 1. (M4) Forevery set Γ 1,...,N}with Γ < N andforevery r span (Φ Γ (ω)) we have i=1 P Φ Γ(ω)r l 2 (R Γ ) 1 } 2 r l 2 (R m ) 1 q 2 D e c2m, with q 2,D and c 2 constants, q 2,D > 1. These properties are satisfied by the Gaussian and Bernoulli random matrices as described at the beginning of this section. Property (M3) es the content of Lemma 5.1 (observe that if u 2 1 Lemma 5.1 is also verified since if we substitute u for a unitary vector in its direction then the probability increases) and condition (M4) is proved in Lemma 5.1 in [3], where a proof is given from a concentration of measure inequality as that in (5.1) Probabilistic support identification for relaxed algorithms. In this section we show that matrices satisfying (M1), (M2), (M3) and (M4) allow to identify indices in the support of a -sparse signal with high probability. The result follows the arguments in [31], with necessary modifications to fit the relaxed selection rule Ĩ( α) given in (3.1). Theorem 5.2. Let Φ(ω) R m N be a random matrix satisfyiing (M1), (M2), (M3) and (M4). Let x R N with supp (x) = Γ y Γ < N. Sea y = Φx. Suppose that α 1/2 and given l, 1 l < N, m max 1 c 1 α 2 lnq 3l(N ), 2 c 2 lnd}, (5.5) with q 3 = q 1 +q 2. Algorithms RWMP, RWOMP y RWGP, with selection rule Ĩn( α) given by(3.1), identify elements from Γ in the first l iterations with probability greater or equal to 1 q 3 l(n )e c 1 α 2m.

18 18 EUGENIO HERNÁNDEZ AND DANIEL VERA Proof. For any of the given algorithms with selection rule Ĩn( α), let E n be the set of matrices that identify elements from Γ = sop (x) at iteration n, n = 1,2,3,... We start bounding PE 1 }, that is, the probability that a matrix identify elements from Γ in the first iteration. For a random matrix Φ(ω) to belong to E 1 it is enough that the next two conditions are satisfied: (A 1 ) That Ĩ1( α). (B 1 ) For u 0 = r 0 / r 0 l 2 (R m ) (we remind that r0 = y) it should be true that Γ c (ω)r 0 l (R Γ c ) r 0 l 2 (R m ) = Φ Γ c (ω)u 0 l (R Γ c ) = max j Γ c u 0,φ j (ω) < α. With abuse of notation, we call A 1 and B 1 the sets of matrices that satisfy (A 1 ) and (B 1 ) respectively. We have PE 1 } PA 1 B 1 } = PA 1 B 1 }PB 1 }. (5.6) To estimate PB 1 } we write } PB 1 } = P max u 0,φ j (ω) < α = P j Γ c u 0,φ j (ω) < α} } j Γ c = P u 0,φ j (ω) < α} j Γ c due to the independence of the columns φ j from Φ(ω) expressed in (M1). Since u 0 = y/ y 2 is unitary and y = Φx = Φ Γ x Γ only depends on the columns φ j with j Γ, u 0 is independent from the columns φ j with j Γ c, and we can use (M3) to get PB 1 } j Γ c (1 q 1 e c1 α2m ) = (1 q 1 e c1 α2m ) N 1 q 1 (N )e c1 α2m, (5.7) since (1 x) n 1 nx if n 1 and x 1. On the other hand, conditioned to B 1 we have that Φ A 1 Φ Γ y α y 2. Since α 1 2, } PA 1 B 1 } = P Φ Γ (ω)y α y l (R Γ ) l 2 (R m ) } P Φ Γ (ω)y α l y 2 (R Γ ) l 2 (R m ) P Φ Γ (ω)y 1 } l 2 (R Γ ) 2 y l 2 (R m ). Using now (M4) we deduce (observe that y = Φx, therefore y span (Φ Γ )): PA 1 B 1 } 1 q 2 D e c 2m 1 q 2 e c 2m/2 taing c 2 m+lnd c 2 m/2, that is, m 2 c 2 lnd. Substituting (5.7) and (5.8) in (5.6) we get PE 1 } (1 q 1 (N )e c 1 α 2m )(1 q 2 e c 2m/2 ) 1 q 1 (N )e c 1 α 2m q 2 e c 2m/2 (5.8) 1 q 3 (N )e c 1 α 2m, (5.9)

19 GREEDY TYPE ALGORITHMS FOR RIP MATRICES 19 since α < 1/2, with q 3 = q 1 +q 2. We have proved the result for l = 1. Suppose the result is true until iteration l 1, this is, P l 1 n=1 E n} 1 q3 (l 1)(N )e c 1 α 2m. (5.10) Until iteration l we have P l n=1 E } n = P El l 1 n=1 E } n P l 1 n=1 E n}. (5.11) For a random matrix Φ(ω) to belong to E l it is enough that next two conditions are satisfied: (A l ) That Ĩl( α). (B l ) For u l 1 = r l 1 / r l 1 2 it should be true that Γ c (ω)r l 1 l (R Γ c ) r l 1 l 2 (R m ) = Φ Γ c (ω)u l 1 l (R Γ c ) = max j Γ c u l 1,φ j (ω) < α. With abuse of notation, we call A l and B l the sets of matrices that satisfy (A l ) and (B l ) respectively. We have P E l l 1 n=1 E n } P Al B l ( l 1 n=1e n ) } P B l l 1 n=1 E n }. (5.12) With the same reasoning that leads to (5.7) we get P B l l 1 n=1 E } n 1 q1 (N )e c 1 α 2 m (5.13) since we are conditioned to the fact that the algorithm has selected indices from Γ until iteration l 1, we have that r l 1 only depends on the columns φ j with j Γ, therefore u l 1 is a unitary vector independent from the columns φ j with j Γ c. Conditioned to B l, Φ A l Γ (ω)r l 1 α r l 1 2. Since α 1 2, we have P A l B l ( l 1 n=1e n ) } = P Γ (ω)r l 1 α } l (R Γ ) r l 1 l 2 (R m ) } l 1 n=1 E n P Φ Γ (ω)y α l } 2 (R Γ ) r l 1 l 2 (R m ) } l 1 n=1 E n Φ P Γ (ω)r l 1 1 } l 2 (R Γ ) r l 1 2 l 2 (R m ) l 1 n=1 E n. It is now possible to use (M4) since r l 1 span (Φ Γ ) (which is not hard to prove) and since we are computing the probability conditioned to the fact that the algorithm has selected indices from Γ. Hence, with the same reasoning that leads to (5.8) we get P A l B l ( l 1 n=1 E n) } 1 q 2 e c2m/2. (5.14) From (5.13) and (5.14) we deduce P E l l 1 n=1 E n } 1 q3 (N )e c 1 α 2 m (5.15) reasoning as in the chain of inequalities that leads to (5.9). Substitute (5.15) and (5.10) in (5.11) to get P l n=1 E n} (1 q3 (N )e c 1 α 2 m/2 )(1 q 3 (l 1)(N )e c 1 α 2m ) 1 q 3 (N )e c 1 α 2 m/2 q 3 (l 1)(N )e c 1 α 2 m = 1 q 3 l(n )e c 1 α 2m, (5.16)

20 20 EUGENIO HERNÁNDEZ AND DANIEL VERA which is the desired estimation. Condition(5.5) assures that this probability is greater than zero. Remar 5.3. Taing α = 1 2, condition (5.5) is written m max 4 c 1 lnq 3 l(n ), 2 c 2 lnd}. Hence, if we tae m Clnl(N ), with C large enough, inequality (5.5) is guaranteed. For RWOMP with parameter α each iteration adds one element at least, as long as Ĩ n ( α). In this situation, RWOMP identifies every index within Γ in as much iteration. Besides, the OMP type algorithms recover any -sparse vector x once the support is nown. Since (N ) N 2 /4 if < N, we have the following corollary. Corollary 5.4. Suppose the same hypothesis that in Theorem 5.2 substituting (5.5) by m max 1 c lnq N 2 1 α 2 3 4, 2 lnd}. (5.17) c 2 Then, RWOMP algorithm recovers the -sparse vector x in the first iterations with probability greater or equal to 1 q 3 N 2 4 e c 1 α2m. Remar 5.5. Suppose we want to obtain the result in Theorem 5.2 with a probability greater than or equal to 1 β for a number β (0,1). It would be enough to tae This is accomplished if for which it is enough to tae To be sure that 1 q 3 l(n )e c 1 α 2m 1 β. βe c 1 α 2m q 3 l(n ), m max 2 c ln q 3l(N ), 2 lnd}. (5.18) 1 α 2 β c 2 ln q 3l(N ) β for all q 3 1, l < N and < N, it suffices to tae β (0,1/e). An analogous comment can be done for Corollary 5.4, and in this case (5.17) should be substituted by m max 1 c ln q 3N 2 1 α 2 4β, 2 lnd} (5.19) c 2 for RWOMP to recover a -sparse signal x within the first iterations with probability greater than or equal to 1 β. > 1

21 GREEDY TYPE ALGORITHMS FOR RIP MATRICES Probabilistic support identification with selection rule I(α). In this section we will study the behavior of matrices satisfying (M1), (M2), (M3) and (M4) with respect to the algorithms with selection rule I n (α) = i : φi,r n 1 α Φ r n 1 l }, (5.20) (R N ) 0 < α 1. An advantage of I n (α) with respect to the selection rule Ĩn( α) given in (3.1) is that I n (α) for all α (0,1) and for all n = 1,2,... There is a disadvantage, however. If we want to prove that I n (α) Γ = supp (x) we need Γ c r n 1 Φ r n 1 < α to be satisfied. If we wrote u n 1 = r n 1 / Φ r n 1 as in the proof of Theorem 5.2 we would not have u n 1 2 1, and could not use (M3). Property (M4) saves this situation as shown in the next result. Theorem 5.6. Choose Φ(ω) R m N a random matrix satisfying (M1), (M2), (M3) y (M4). Let x R N with supp (x) = Γ and Γ < N. Let y = Φx. Suppose that m max 4 c 1 α lnq 3l(N ), 2 lnd}, (5.21) 2 c 2 with q 3 = q 1 + q 2. Algorithms WMP, WOMP and WGP, with selection rule I n (α) given in 5.20, identify elements from Γ in the first l iterations with probability greater than or equal to 1 q 3 l(n )e c 1 α2 4 m. Proof. For any of the given algorithms with selection rule I n (α), let E n be the set of matrices that identify elements from Γ = supp (x) at iteration n, n = 1,2,... We start bounding PE 1 }, that is, the probability that a matrix identify elements from Γ in the first iteration. For a random matrix Φ(ω) belong to E 1 it is enough that Φ Γ (ω)r 0 c l (R Γ c ) (ω)r 0 < α, (r 0 = y), (5.22) Γ l (R N ) is verified. Let u 0 = y/2 Γ y ; we have Γ (ω)y c Φ Φ (ω)y (ω)y Γ Φ Γ c (ω)y = 2 Φ Γ (ω)u c 0 Γ 2 = 2 sup φ j (ω),u 0. (5.23) j Γ c Let A 1 be the set of matrices satisfying (M1), (M2), (M3) and (M4) such that sup φ j (ω),u 0 < α j Γ c 2. (5.24) From (5.23) we deduce that if Φ(ω) A 1, then Φ(ω) satisfy (5.22) and we have Φ(ω) E 1. Therefore, PE 1 } PA 1 }. (5.25)

22 22 EUGENIO HERNÁNDEZ AND DANIEL VERA Let B 1 be the set of matrices Φ(ω) satisfying (M1), (M2), (M3) and (M4) such that From (5.25) we deduce Φ Γ (ω)y y 2. (5.26) PE 1 } PA 1 } PA 1 B 1 } = PA 1 B 1 }PB 1 }. (5.27) Since y = Φx = Φ Γ x Γ span (Φ Γ ) from (M4) we deduce PB 1 } 1 q 2 D e c 2m 1 q 2 e c 2m/2 (5.28) taing c 2 m+lnd c 2 m/2, this is, m 2 c 2 lnd. Conditioned to B 1 the vector u 0 = y/2 Φ y Γ 2 satisfy u 0 2 = 1 y 2 2 Φ y 1. Γ 2 Therefore, to bound } PA 1 B 1 } = P sup φ j (ω),u 0 < α j Γ c 2 B 1 we use (M3) to get PA 1 B 1 } = P = j Γ c φ j (ω),u 0 < α j Γ c P 2 } B 1 } φ j (ω),u 0 < α 2 B 1 (1 q 1 e c 1 α2 4 m ) N duetotheindependenceofthecolumnsφ j ofφ(ω)expressedin(m1)andbecauseu 0 = y/2 Γ y 2 only depends on the columns φ j with j Γ, and therefore independent of the columns φ j with j Γ c. Since (1 x) n 1 nx if n 1, x 1, we can write Substituting (5.29) and (5.28) in (5.27) we get PA 1 B 1 } 1 q 1 (N )e c 1 α2 4 m. (5.29) PE 1 } (1 q 1 (N )e c 1 α2 4 m )(1 q 2 e c 1 m 2 ) 1 q 1 (N )e c 1 α2 4 m q 2 e c 1 m 2 1 q 3 (N )e c 1 α2 4 m (5.30) with q 3 = q 1 +q 2, since α2 1/4 < 1/2. This proves the result for l = 1. 4 Suppose that the result is true until iteration l 1, this is Until iteration l we have P } l 1 n=1e n 1 q3 (l 1)(N )e c 1 α 2 4 m. (5.31) P l n=1 E n } = P El l 1 n=1 E n } P l 1 n=1 E n}. (5.32) }

23 GREEDY TYPE ALGORITHMS FOR RIP MATRICES 23 Write u l 1 = r l 1 /2 r l 1 Γ 2. Let A l be the set of matrices satisfying (M1), (M2), (M3) and (M4) such that sup φ j (ω),u l 1 < α j Γ c 2. (5.33) If Φ A l we have Γ (ω)r l 1 c Φ (ω)r l 1 Γ Φ Γ c (ω)r l 1 (ω)r l 1 Γ 2 = 2 sup φ j (ω),u l 1 < α, j Γ c = 2 Φ Γ c(ω)u l 1 and this is enough to assure that I n (α) Γ, this is, Φ E l. Therefore, P E l l 1 n=1 E } n P Al l 1 n=1 E n}. (5.34) Let B l be the set of matrices Φ(ω) satisfying (M1), (M2), (M3) and (M4) such that From (5.34) we deduce Γ (ω)r n r n (5.35) P } } E l l 1 n=1 E n P Al B l l 1 n=1 E n = P Al B l ( l 1 n=1e n ) } P } B l l 1 n=1 E n. (5.36) Conditioned to B l, r l 1 2 u l 1 2 = 1 2 Φ r l 1 1 Γ 2 due to (5.35). By (M1) the columns φ j of Φ(ω) are independents among them. Besides, conditioned to l 1 n=1 E n, the vector r n 1 only depends on the columns φ j with j Γ since I n (α) Γ, n = 1,2,...,l 1. Hence, u l 1 is independent from the columns φ j with j Γ c and we can use (M3). Then, P A l B l ( l 1 n=1e n ) } = P sup φ j (ω),u l 1 < α j Γ c 2 B l ( l 1 n=1e n ) = φ j (ω),u l 1 < α } 2 B l ( l 1 n=1 E n) j Γ c P (1 q 1 e c 1 α2 4 m ) N 1 q 1 (N )e c 1 α2 4 m. (5.37) Conditioned to l 1 n=1e n, it is not hard to prove that the vector r n 1 span (Φ Γ ). We can use (M4) to get P B l l 1 n=1 E n } 1 q1 D e c 2m 1 q 1 e c 2m/2 } (5.38) m 2 taing c 2 m+lnd c 2, this is, m 2 c 2 lnd. Substituting (5.36) y (5.38) en (5.35) and proceed as in the calculations that lead to (5.30) to get P E l l 1 n=1 E } n 1 q3 (N )e c 1 α2 4 m, (5.39)

24 24 EUGENIO HERNÁNDEZ AND DANIEL VERA with q 3 = q 1 +q 2. Substitute (5.39) and (5.31) in (5.32) to get P l n=1 E } n (1 q3 (N )e c 1 α2 4 m )(1 q 3 (l 1)(N )e c 1 α2 4 m ) 1 q 3 (N )e c 1 α2 4 m q 3 (l 1)(N )e c 1 α2 4 m = 1 q 3 l(n )e c 1 α2 4 m, which is what we wanted to prove. For the last probability be greater than 0 we must tae m 4 c 1 α 2 lnq 3l(N ). We can now state similar comments to those at the end of the proof of Theorem 5.2. We emphasize next corollary, that in the case α = 1 gives Theorem 6 of [31]. Corollary 5.7. Choose the same conditions as in Theorem 5.6, substituting (5.21) by m C α 2 ln(q 3N 2 /4) (5.40) with C large enough. Then, WOMP algorithm recovers the -sparse vector x in the first iterations with probability greater than or equal to 1 q 3 N 2 α 2 4 e c 1 4 m. 6. Some Experiments. In this section we present experiments on the sparse recovery and non sparse approximation problems for the orthogonal and gradient algorithms with the selection rule Ĩ = Ĩ(α) given by (3.1) and compare them against the results obtained with the more classical selection rule I = I(α) given by (2.6). In every set of experiments one Gaussian matrix was created of order m N. For some signals the results are shown as percentage of elements recovered and for others we use the signal-to-noise ratio of the energy of the original signal x R N and the energy of the difference between the signal and the approximation a, given by x SNR = 10log 10 ( 2 ). x a 2 Figure 1 shows the percentage of elements recovered with the RWOMP algorithm with parameter α = The Gaussian matrices generated have N = 256 and m = 10l, l = 1,2,...,25. The sparsity levels have been chosen to be = 4,12,20,28,36, (each graph corresponds to one of them). For each pair (m,), 200 experiments were run for different signals. The results can be compared with those obtained on Figure 1 of [31] for OMP (α = 1). The parameters N, and m tae the same value, but 1000 experiments for each set were performed in [31]. The results for RWOMP are better that those for OMP in [31] for = 20,28,36. Next, for computational purposes, we introduce a minor modification of the RWOMP algorithm, called -RWOMP algorithm. At each iteration in RWOMP we eep the -largest elements of the orthogonal approximation x n. Figure 2 shows the results of applying the -RWOMP with the same parameters of the experiments described for

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit

New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence