Hard Thresholding Pursuit Algorithms: Number of Iterations

Jean-Luc Bouchot, Simon Foucart, Pawel Hitczenko

Abstract. The Hard Thresholding Pursuit algorithm for sparse recovery is revisited using a new theoretical analysis. The main result states that all sparse vectors can be exactly recovered from compressive linear measurements in a number of iterations at most proportional to the sparsity level as soon as the measurement matrix obeys a certain restricted isometry condition. The recovery is also robust to measurement error. The same conclusions are derived for a variation of Hard Thresholding Pursuit, called Graded Hard Thresholding Pursuit, which is a natural companion to Orthogonal Matching Pursuit and runs without a prior estimation of the sparsity level. In addition, for two extreme cases of the vector shape, it is shown that, with high probability on the draw of random measurements, a fixed sparse vector is robustly recovered in a number of iterations precisely equal to the sparsity level. These theoretical findings are experimentally validated, too.

Key words and phrases: compressive sensing, uniform sparse recovery, nonuniform sparse recovery, random measurements, iterative algorithms, hard thresholding.

AMS classification: 65F10, 65J20, 15A29, 94A12.

S. F. and J.-L. B. partially supported by NSF (DMS-2622); P. H. partially supported by the Simons Foundation (grant 28766). E-mail: simon.foucart@centraliens.net

1 Introduction

This paper deals with the standard compressive sensing problem, i.e., the reconstruction of vectors x ∈ C^N from an incomplete set of m ≪ N linear measurements organized in the form y = Ax ∈ C^m for some matrix A ∈ C^{m×N}. It is now well known that if x is s-sparse (i.e., has only s nonzero entries) and if A is a random matrix whose number m of rows scales like s times some logarithmic factors, then the reconstruction of x is achievable via a variety of methods. ℓ1-minimization is probably the most popular one, but simple iterative algorithms do provide alternative methods. We consider here the hard thresholding pursuit (HTP) algorithm [6] as well as a novel variation, and we focus on the number of iterations needed for the reconstruction. This reconstruction is addressed in two settings:

- An idealized situation, where the vectors x ∈ C^N are exactly sparse and where the measurements y ∈ C^m are exactly equal to Ax. In this case, the exact reconstruction of x ∈ C^N is targeted.

- A realistic situation, where the vectors x ∈ C^N are not exactly sparse and where the measurements y ∈ C^m contain errors e ∈ C^m, i.e., y = Ax + e. In this case, only an approximate reconstruction of x ∈ C^N is targeted. Precisely, the reconstruction error should be controlled by the sparsity defect and by the measurement error. The sparsity defect can be incorporated in the measurement error if y = Ax + e is rewritten as y = Ax_S + e′, where S is an index set of s largest absolute entries of x and e′ := Ax_{S̄} + e.

We shall mainly state and prove our results in the realistic situation. They specialize to the idealized situation simply by setting e = 0. In fact, setting e = 0 inside the proofs would simplify them considerably. Let us now recall that (HTP) consists in constructing a sequence (x^n) of s-sparse vectors, starting with an initial s-sparse x^0 ∈ C^N (we take x^0 = 0) and iterating the scheme

(HTP_1)  S^n := index set of s largest absolute entries of x^{n-1} + A*(y − Ax^{n-1}),
(HTP_2)  x^n := argmin{ ‖y − Az‖_2 : supp(z) ⊆ S^n },

until a stopping criterion is met. It had been shown [ ] that, in the idealized situation, exact reconstruction of every s-sparse x ∈ C^N is achieved in s iterations of (HTP) with y = Ax provided the coherence of the matrix A ∈ C^{m×N} satisfies µ < 1/(3s) (note that this condition can be fulfilled when m ≍ s²). Exact and approximate reconstructions were treated in [6], where it was in particular shown that every s-sparse x ∈ C^N is the limit of the sequence (x^n) produced by (HTP) with y = Ax provided the 3s-th restricted isometry constant of the measurement matrix A ∈ C^{m×N} obeys δ_{3s} < 1/√3 ≈ 0.577 (note that this condition can be fulfilled when m ≍ s ln(N/s)). As a reminder, the k-th restricted isometry constant δ_k of A is defined as the smallest constant δ ≥ 0 such that

(1 − δ) ‖z‖_2² ≤ ‖Az‖_2² ≤ (1 + δ) ‖z‖_2²   for all k-sparse z ∈ C^N.

In fact, it was shown in [6] that the convergence is achieved in a finite number n⋆ of iterations. In the idealized situation, it can be estimated as

(1)  n⋆ ≤ ⌈ ln( √(2/3) ‖x‖_2/ξ ) / ln(1/ρ_{3s}) ⌉,   ρ_{3s} := √( 2δ_{3s}²/(1 − δ_{3s}²) ) < 1,   ξ := min_{j ∈ supp(x)} |x_j|.

This paper establishes that the number of iterations can be estimated independently of the shape of x: under a restricted isometry condition, it is at most proportional to the sparsity s¹ (see Theorem 5 and a robust version in Theorem 6).

¹ Exact arithmetic is assumed and, among several candidates for the index set of largest absolute entries, the smallest one in lexicographic order is always chosen.
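As an aside for readers who wish to experiment with the scheme, the (HTP) iteration can be sketched in a few lines of Python/NumPy. This is an illustrative sketch only (function name, least-squares solver, and tie-breaking are our own choices; the lexicographic rule mentioned above is not enforced).

```python
import numpy as np

def htp(A, y, s, max_iter=1000):
    """Hard Thresholding Pursuit, steps (HTP_1)-(HTP_2): an illustrative sketch.

    A: (m, N) measurement matrix, y: (m,) measurements, s: sparsity level.
    Stops when S^n == S^{n-1} (the natural criterion of Section 2.3) and returns
    the current iterate, its support, and the number of iterations performed.
    """
    N = A.shape[1]
    x = np.zeros(N)
    S_old = None
    for n in range(1, max_iter + 1):
        u = x + A.conj().T @ (y - A @ x)                    # (HTP_1): gradient-type step
        S = np.sort(np.argsort(np.abs(u))[-s:])             # indices of the s largest entries
        coef = np.linalg.lstsq(A[:, S], y, rcond=None)[0]   # (HTP_2): least squares on S
        x = np.zeros(N, dtype=coef.dtype)
        x[S] = coef
        if S_old is not None and np.array_equal(S, S_old):  # natural stopping criterion
            return x, set(S.tolist()), n
        S_old = S
    return x, set(S.tolist()), max_iter
```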

This is reminiscent of the work of T. Zhang [5] on orthogonal matching pursuit (OMP), see also [7, Theorem 6.25], where it is proved that n⋆ ≤ 12s provided that δ_{13s} < 1/6. However, (HTP) presents a significant drawback in that a prior estimation of the sparsity s is required to run the algorithm, while (OMP) does not (although stopping (OMP) at iteration 12s does require this estimation). We therefore consider a variation of (HTP) avoiding the prior estimation of s. We call it the graded hard thresholding pursuit (GHTP) algorithm, because the index set has a size that increases with the iteration. Precisely, starting with x^0 = 0, a sequence (x^n) of n-sparse vectors is constructed according to

(GHTP_1)  S^n := index set of n largest absolute entries of x^{n-1} + A*(y − Ax^{n-1}),
(GHTP_2)  x^n := argmin{ ‖y − Az‖_2 : supp(z) ⊆ S^n },

until a stopping criterion is met. For (GHTP), too, it is established that a restricted isometry condition implies that the number of iterations needed for robust reconstruction is at most proportional to the sparsity s, see Theorem 8. One expects the number of (GHTP) iterations needed for s-sparse recovery to be exactly equal to s, though. Such a statement is indeed proved, but in the nonuniform setting rather than in the uniform setting (i.e., when successful recovery is sought with high probability on the draw of random measurements for a fixed x ∈ C^N rather than for all x ∈ C^N simultaneously) and in two extreme cases on the shape of the s-sparse vector to be recovered. The first case deals with vectors that are flat on their support: robust nonuniform recovery is guaranteed with m ≍ s ln(N) Gaussian measurements, see Proposition 9. The second case deals with vectors that are decaying exponentially fast on their support: robust nonuniform recovery is guaranteed with m ≍ max{s, ln(N)} Gaussian measurements or with m ≍ s ln(N) Fourier measurements, see Proposition 10. This is an improvement (attenuated by the extra shape condition) on uniform recovery results based on the restricted isometry property, which require m ≍ s ln(N/s) Gaussian measurements or (currently) m ≍ s ln⁴(N) Fourier measurements.

To conclude, the (HTP) and (GHTP) algorithms are tested numerically and benchmarked against the (OMP) algorithm. Their performance is roughly similar in terms of success rate, with different behaviors mostly explained by the shape of the vectors to be recovered. In terms of number of iterations, however, (HTP) does perform best, owing to the fact that an a priori knowledge of the sparsity s is used to run the algorithm (which also provides a convenient stopping criterion). It also seems that (GHTP) always requires fewer iterations than (OMP). The empirical observation that flat vectors are least favorable, coupled with the theoretical findings of Sections 3 and 5, supports the belief that uniform recovery via (HTP) is achievable in many fewer iterations than the upper bound cs of Theorem 5 and that nonuniform recovery via (GHTP) is achievable in exactly s iterations whatever the shape of the vector. Finally, the numerical part ends with an experimental study of the influence of the measurement error.
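A matching sketch of the (GHTP) iteration, again illustrative rather than the authors' reference code, differs only in that the index set selected at iteration n has size n; no sparsity estimate is needed to run the iteration itself.

```python
import numpy as np

def ghtp(A, y, max_iter=None, res_tol=1e-10):
    """Graded Hard Thresholding Pursuit, steps (GHTP_1)-(GHTP_2): an illustrative sketch.

    res_tol / max_iter stand in for the stopping criterion discussed in Section 2.3
    (a small residual ||y - A x^n||_2, or a cap proportional to the sparsity).
    """
    m, N = A.shape
    if max_iter is None:
        max_iter = m // 2                                   # illustrative cap on the active-set size
    x = np.zeros(N)
    S = np.array([], dtype=int)
    for n in range(1, max_iter + 1):
        u = x + A.conj().T @ (y - A @ x)                    # (GHTP_1)
        S = np.sort(np.argsort(np.abs(u))[-n:])             # indices of the n largest entries
        coef = np.linalg.lstsq(A[:, S], y, rcond=None)[0]   # (GHTP_2)
        x = np.zeros(N, dtype=coef.dtype)
        x[S] = coef
        if np.linalg.norm(y - A @ x) <= res_tol:
            break
    return x, set(S.tolist()), n
```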

Throughout the paper, we make use of the following notation:

- x* ∈ R^N_+ is the nonincreasing rearrangement of a vector x ∈ C^N, i.e., x*_1 ≥ x*_2 ≥ ... ≥ x*_N ≥ 0 and there exists a permutation π of {1,...,N} such that x*_j = |x_{π(j)}| for all j ∈ {1,...,N};
- S is the support or an index set of s largest absolute entries of a vector x ∈ C^N;
- x_T is the restriction of a vector x ∈ C^N to an index set T; depending on the context, it is interpreted either as a vector in C^T or as a vector in C^N;
- T̄ is the complementary set of an index set T, i.e., T̄ = {1,...,N} \ T;
- T Δ T′ is the symmetric difference of sets T and T′, i.e., T Δ T′ = (T \ T′) ∪ (T′ \ T);
- ρ_s, τ_s are quantities associated with the s-th restricted isometry constant δ_s < 1 via

(2)  ρ_s := √( 2δ_s²/(1 − δ_s²) ),   τ_s := √( 2/(1 − δ_s) ) + √(1 + δ_s)/(1 − δ_s).

2 Preliminary Results

This section collects the key facts needed in the subsequent analyses of (HTP) and (GHTP). It also includes a discussion on appropriate stopping criteria.

2.1 Extensions of previous arguments

The arguments used in [6] to prove the convergence of the sequence (x^n) produced by (HTP) in the idealized situation, say, relied on the exponential decay of ‖x − x^n‖_2, precisely

‖x − x^n‖_2 ≤ ρ_{3s} ‖x − x^{n-1}‖_2,   n ≥ 1,

where ρ_{3s} < 1 provided δ_{3s} < 1/√3. This followed from two inequalities derived as results of (HTP_1) and (HTP_2), namely

(3)  ‖x_{S̄^n}‖_2 ≤ √2 δ_{3s} ‖x − x^{n-1}‖_2,
(4)  ‖x − x^n‖_2 ≤ (1/√(1 − δ_{2s}²)) ‖x_{S̄^n}‖_2.

For the analyses of (HTP) and (GHTP), similar arguments are needed in the more general form below.

1. Consequence of (HTP_1)-(GHTP_1). If x ∈ C^N is s-sparse, if y = Ax + e for some e ∈ C^m, if x′ ∈ C^N is s′-sparse, and if T is an index set of t ≥ s largest absolute entries of x′ + A*(y − Ax′), then

(5)  ‖x_{T̄}‖_2 ≤ √2 δ_{s+s′+t} ‖x − x′‖_2 + √2 ‖(A*e)_{S Δ T}‖_2.

Indeed, with S := supp(x), we first notice that

‖(x′ + A*(A(x − x′)) + A*e)_T‖_2² ≥ ‖(x′ + A*(A(x − x′)) + A*e)_S‖_2².

Eliminating the contribution on S ∩ T gives

‖(x′ + A*(A(x − x′)) + A*e)_{T\S}‖_2 ≥ ‖(x′ + A*(A(x − x′)) + A*e)_{S\T}‖_2.

The left-hand side satisfies

‖(x′ + A*(A(x − x′)) + A*e)_{T\S}‖_2 = ‖((A*A − I)(x − x′) + A*e)_{T\S}‖_2,

while the right-hand side satisfies

‖(x′ + A*(A(x − x′)) + A*e)_{S\T}‖_2 ≥ ‖x_{S\T}‖_2 − ‖((A*A − I)(x − x′) + A*e)_{S\T}‖_2.

Therefore, we obtain

‖x_{S\T}‖_2 ≤ ‖((A*A − I)(x − x′) + A*e)_{T\S}‖_2 + ‖((A*A − I)(x − x′) + A*e)_{S\T}‖_2
(6)        ≤ √2 ‖((A*A − I)(x − x′) + A*e)_{T Δ S}‖_2
           ≤ √2 ‖((A*A − I)(x − x′))_{T Δ S}‖_2 + √2 ‖(A*e)_{T Δ S}‖_2
           ≤ √2 δ_{s+s′+t} ‖x − x′‖_2 + √2 ‖(A*e)_{T Δ S}‖_2,

where the last step used (3.8) of [6]. Since ‖x_{T̄}‖_2 = ‖x_{S\T}‖_2, the resulting inequality is the one announced in (5).

2. Consequence of (HTP_2)-(GHTP_2). If x ∈ C^N is s-sparse, if y = Ax + e for some e ∈ C^m, if T is an index set of size t, and if x′ is a minimizer of ‖y − Az‖_2 subject to supp(z) ⊆ T, then

(7)  ‖x − x′‖_2 ≤ (1/√(1 − δ_{s+t}²)) ‖x_{T̄}‖_2 + (1/(1 − δ_{s+t})) ‖(A*e)_T‖_2.

Indeed, the vector x′ is characterized by the orthogonality condition (A*(y − Ax′))_T = 0 (in other words, by the normal equations), therefore

‖(x − x′)_T‖_2² = ⟨(x − x′)_T, (x − x′)_T⟩ = ⟨(x − x′ − A*(y − Ax′))_T, (x − x′)_T⟩
              ≤ ⟨((I − A*A)(x − x′))_T, (x − x′)_T⟩ + |⟨(A*e)_T, (x − x′)_T⟩|
(8)           ≤ ‖((I − A*A)(x − x′))_T‖_2 ‖(x − x′)_T‖_2 + ‖(A*e)_T‖_2 ‖(x − x′)_T‖_2
              ≤ δ_{s+t} ‖x − x′‖_2 ‖(x − x′)_T‖_2 + ‖(A*e)_T‖_2 ‖(x − x′)_T‖_2,

where the last step used (3.8) of [6]. After dividing through by ‖(x − x′)_T‖_2, we obtain

‖(x − x′)_T‖_2 ≤ δ_{s+t} ‖x − x′‖_2 + ‖(A*e)_T‖_2.

It then follows that

‖x − x′‖_2² = ‖(x − x′)_T‖_2² + ‖(x − x′)_{T̄}‖_2² ≤ ( δ_{s+t} ‖x − x′‖_2 + ‖(A*e)_T‖_2 )² + ‖x_{T̄}‖_2²,

in other words P(‖x − x′‖_2) ≤ 0, where the quadratic polynomial P is defined by

P(z) = (1 − δ_{s+t}²) z² − 2 δ_{s+t} ‖(A*e)_T‖_2 z − ‖(A*e)_T‖_2² − ‖x_{T̄}‖_2².

This implies that ‖x − x′‖_2 is smaller than the largest root of P, i.e.,

‖x − x′‖_2 ≤ ( δ_{s+t} ‖(A*e)_T‖_2 + √( ‖(A*e)_T‖_2² + (1 − δ_{s+t}²) ‖x_{T̄}‖_2² ) ) / (1 − δ_{s+t}²)
           ≤ ( (1 + δ_{s+t}) ‖(A*e)_T‖_2 + √(1 − δ_{s+t}²) ‖x_{T̄}‖_2 ) / (1 − δ_{s+t}²).

This is the inequality announced in (7).

Remark 1. The full restricted isometry property is not needed to derive (5) and (7). The restricted isometry constants only appear in (6) and (8), where ‖((I − A*A)(x − x′))_U‖_2 is bounded for U = T Δ S and U = T. This is easier to fulfill than the restricted isometry property if supp(x), supp(x′), and T are all included in a fixed index set, as will be the case in Section 5.

From inequalities (5) and (7), we deduce the decay of the sequences (‖x − x^n‖_2)_{n≥0} for (HTP) and (‖x − x^n‖_2)_{n≥s} for (GHTP).

Corollary 2. Let x ∈ C^N be s-sparse and let y = Ax + e for some e ∈ C^m. If (x^n) is the sequence produced by (HTP), then

(9)  ‖x − x^n‖_2 ≤ ρ_{3s} ‖x − x^{n-1}‖_2 + τ_{2s} ‖e‖_2   for any n ≥ 1.

If (x^n) is the sequence produced by (GHTP), then

(10)  ‖x − x^n‖_2 ≤ ρ_{s+2n} ‖x − x^{n-1}‖_2 + τ_{s+n} ‖e‖_2   for any n ≥ s.

Proof. We consider (HTP) first. Applying (5) to x′ = x^{n-1} and T = S^n gives

‖x_{S̄^n}‖_2 ≤ √2 δ_{3s} ‖x − x^{n-1}‖_2 + √2 ‖(A*e)_{S Δ S^n}‖_2,

and applying (7) to T = S^n and x′ = x^n gives

‖x − x^n‖_2 ≤ (1/√(1 − δ_{2s}²)) ‖x_{S̄^n}‖_2 + (1/(1 − δ_{2s})) ‖(A*e)_{S^n}‖_2.

Combining these two inequalities, while using δ_{2s} ≤ δ_{3s}, yields

‖x − x^n‖_2 ≤ √( 2δ_{3s}²/(1 − δ_{3s}²) ) ‖x − x^{n-1}‖_2 + ( √2/√(1 − δ_{2s}²) + 1/(1 − δ_{2s}) ) ‖(A*e)_{S ∪ S^n}‖_2.

In view of ‖(A*e)_{S ∪ S^n}‖_2 ≤ √(1 + δ_{2s}) ‖e‖_2 (see e.g. (3.9) in [6]), we recognize (9) with the values of ρ_{3s} and τ_{2s} given in (2). Let us now consider (GHTP). Applying (5) to x′ = x^{n-1} and T = S^n for n ≥ s gives

‖x_{S̄^n}‖_2 ≤ √2 δ_{s+2n} ‖x − x^{n-1}‖_2 + √2 ‖(A*e)_{S Δ S^n}‖_2,

and applying (7) to T = S^n and x′ = x^n gives

‖x − x^n‖_2 ≤ (1/√(1 − δ_{s+n}²)) ‖x_{S̄^n}‖_2 + (1/(1 − δ_{s+n})) ‖(A*e)_{S^n}‖_2.

Combining these two inequalities, while using δ_{s+n} ≤ δ_{s+2n}, yields

‖x − x^n‖_2 ≤ √( 2δ_{s+2n}²/(1 − δ_{s+2n}²) ) ‖x − x^{n-1}‖_2 + ( √2/√(1 − δ_{s+n}²) + 1/(1 − δ_{s+n}) ) ‖(A*e)_{S ∪ S^n}‖_2.

In view of ‖(A*e)_{S ∪ S^n}‖_2 ≤ √(1 + δ_{s+n}) ‖e‖_2, we recognize (10) with the values of ρ_{s+2n} and τ_{s+n} given in (2).

2.2 Additional arguments

While the above argument probed only the closeness of x^n to x, the following lemmas examine the indices of nonzero entries of x that are correctly captured in the support sets produced by (HTP) and (GHTP). These lemmas are the main addition to the earlier proof techniques. They show how many iterations are necessary to increase the number of correct indices by a specified amount. The first lemma concerns (HTP).

Lemma 3. Let x ∈ C^N be s-sparse and let (S^n) be the sequence of index sets produced by (HTP) with y = Ax + e for some e ∈ C^m. For integers n, p ≥ 0, suppose that S^n contains the indices of p largest absolute entries of x. Then, for integers k, q ≥ 1, S^{n+k} contains the indices of p + q largest absolute entries of x, provided

(11)  x*_{p+q} > ρ_{3s}^k ‖x*_{\{p+1,...,s\}}‖_2 + κ_{3s} ‖e‖_2,

where the constant κ_{3s} depends only on δ_{3s}. For instance, δ_{3s} ≤ 1/3 yields κ_{3s} ≤ 5.72.

Proof. Let π be the permutation of {1, 2, ..., N} for which |x_{π(j)}| = x*_j for all j ∈ {1, 2, ..., N}. Our hypothesis is π({1,...,p}) ⊆ S^n and our aim is to prove that π({1,...,p+q}) ⊆ S^{n+k}. In other words, we aim at proving that the |(x^{n+k-1} + A*(y − Ax^{n+k-1}))_{π(j)}| for j ∈ {1,...,p+q} are among the s largest values of |(x^{n+k-1} + A*(y − Ax^{n+k-1}))_i| for i ∈ {1,...,N}. With supp(x) ⊆ S, card(S) = s, it is then enough to show that

(12)  min_{j ∈ {1,...,p+q}} |(x^{n+k-1} + A*(y − Ax^{n+k-1}))_{π(j)}| > max_{l ∈ S̄} |(x^{n+k-1} + A*(y − Ax^{n+k-1}))_l|.

We notice that, for j ∈ {1,...,p+q} and l ∈ S̄,

|(x^{n+k-1} + A*(y − Ax^{n+k-1}))_{π(j)}| ≥ |x_{π(j)}| − |(−x + x^{n+k-1} + A*(y − Ax^{n+k-1}))_{π(j)}| ≥ x*_{p+q} − |((A*A − I)(x − x^{n+k-1}) + A*e)_{π(j)}|,
|(x^{n+k-1} + A*(y − Ax^{n+k-1}))_l| = |(−x + x^{n+k-1} + A*(y − Ax^{n+k-1}))_l| = |((A*A − I)(x − x^{n+k-1}) + A*e)_l|.

Therefore, (12) will be shown as soon as it is proved that, for all j ∈ {1,...,p+q} and l ∈ S̄,

(13)  x*_{p+q} > |((A*A − I)(x − x^{n+k-1}) + A*e)_{π(j)}| + |((A*A − I)(x − x^{n+k-1}) + A*e)_l|.

The right-hand side can be bounded by

√2 ‖((I − A*A)(x − x^{n+k-1}) + A*e)_{\{π(j),l\}}‖_2 ≤ √2 ‖((I − A*A)(x − x^{n+k-1}))_{\{π(j),l\}}‖_2 + √2 ‖(A*e)_{\{π(j),l\}}‖_2
  ≤ √2 δ_{2s+2} ‖x − x^{n+k-1}‖_2 + √(2(1 + δ_2)) ‖e‖_2
  ≤ √2 δ_{3s} ( ρ_{3s}^{k-1} ‖x − x^n‖_2 + (τ_{2s}/(1 − ρ_{3s})) ‖e‖_2 ) + √(2(1 + δ_{2s})) ‖e‖_2,

where (9) was used k−1 times in the last step. Using (7) and the assumption that π({1,...,p}) ⊆ S^n, we derive

√2 ‖((I − A*A)(x − x^{n+k-1}) + A*e)_{\{π(j),l\}}‖_2
  ≤ √2 δ_{3s} ρ_{3s}^{k-1} ( (1/√(1 − δ_{2s}²)) ‖x*_{\{p+1,...,s\}}‖_2 + (1/(1 − δ_{2s})) ‖(A*e)_{S^n}‖_2 ) + √2 δ_{3s} (τ_{2s}/(1 − ρ_{3s})) ‖e‖_2 + √(2(1 + δ_{2s})) ‖e‖_2
  ≤ ρ_{3s}^k ‖x*_{\{p+1,...,s\}}‖_2 + κ_{3s} ‖e‖_2,

where the constant κ_{3s} is determined by

κ_{3s} := √2 δ_{3s} √(1 + δ_{2s})/(1 − δ_{2s}) + √2 δ_{3s} τ_{2s}/(1 − ρ_{3s}) + √(2(1 + δ_{2s})) ≤ √(2(1 + δ_{3s})) ( δ_{3s}/(1 − δ_{3s}) + 1 ) + √2 δ_{3s} τ_{3s}/(1 − ρ_{3s}),

having used √2 δ_{3s}/√(1 − δ_{2s}²) ≤ ρ_{3s}, ρ_{3s}^{k-1} ≤ 1, and ‖(A*e)_{S^n}‖_2 ≤ √(1 + δ_{2s}) ‖e‖_2. Therefore, (13) is proved as soon as condition (11) holds. The proof is completed by noticing that δ_{3s} ≤ 1/3 yields ρ_{3s} ≤ 1/2, τ_{3s} ≤ 2√3, and finally κ_{3s} ≤ 14/√6 ≈ 5.72.
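As a quick numerical sanity check of the values just quoted, the constants ρ_s and τ_s from (2) and the bound on κ_{3s} in the form written above can be evaluated at δ = 1/3. The helper below is an illustrative sketch based on this reconstruction of the formulas.

```python
import numpy as np

def htp_constants(delta):
    """rho_s, tau_s from (2) and the kappa_{3s} bound from the proof of Lemma 3 as written above."""
    rho = np.sqrt(2 * delta**2 / (1 - delta**2))
    tau = np.sqrt(2 / (1 - delta)) + np.sqrt(1 + delta) / (1 - delta)
    kappa = np.sqrt(2 * (1 + delta)) * (delta / (1 - delta) + 1) + np.sqrt(2) * delta * tau / (1 - rho)
    return rho, tau, kappa

# delta = 1/3 gives rho = 0.5, tau = 2*sqrt(3) ~ 3.46, kappa ~ 5.72, the values quoted above.
print(htp_constants(1 / 3))
```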

For (GHTP), an analog of Lemma 3 is needed in the form below.

Lemma 4. Let x ∈ C^N be s-sparse and let (S^n) be the sequence of index sets produced by (GHTP) with y = Ax + e for some e ∈ C^m. For integers n ≥ s and p ≥ 0, suppose that S^n contains the indices of p largest absolute entries of x. Then, for integers k, q ≥ 1, S^{n+k} contains the indices of p + q largest absolute entries of x, provided

(14)  x*_{p+q} > ρ_{s+2n+2k}^k ‖x*_{\{p+1,...,s\}}‖_2 + κ_{s+2n+2k} ‖e‖_2,

where the constant κ_{s+2n+2k} depends only on δ_{s+2n+2k}.

Proof. Given the permutation π of {1, 2, ..., N} for which |x_{π(j)}| = x*_j for all j ∈ {1,...,N}, our hypothesis is π({1,...,p}) ⊆ S^n and our aim is to prove that π({1,...,p+q}) ⊆ S^{n+k}. For this purpose, we aim at proving that the |(x^{n+k-1} + A*(y − Ax^{n+k-1}))_{π(j)}| for j ∈ {1,...,p+q} are among the n + k largest values of |(x^{n+k-1} + A*(y − Ax^{n+k-1}))_i|, i ∈ {1,...,N}. Since n + k ≥ s, it is enough to show that they are among the s largest values of |(x^{n+k-1} + A*(y − Ax^{n+k-1}))_i|, i ∈ {1,...,N}. The rest of the proof duplicates the proof of Lemma 3, modulo the change imposed by (10) for the indices of δ and κ.

2.3 Stopping the (HTP) and (GHTP) algorithms

The stopping criterion, which is an integral part of the algorithm, has not been addressed yet. We take a few moments to discuss this issue here. For (HTP), the natural stopping criterion² is S^n = S^{n-1}, since the algorithm always produces the same vector afterwards. Moreover, under the condition δ_{3s} ≤ 1/3:

1. if S^n = S^{n-1} occurs, then robustness is guaranteed, since x^n = x^{n-1} substituted in (9) yields ‖x − x^n‖_2 ≤ ρ_{3s} ‖x − x^n‖_2 + τ_{2s} ‖e‖_2, i.e., ‖x − x^n‖_2 ≤ (τ_{2s}/(1 − ρ_{3s})) ‖e‖_2;

2. and S^n = S^{n-1} does occur provided ‖e‖_2 < (1 − ρ_{3s}) x*_s/(4 τ_{2s}). Indeed, for e ≠ 0³ and for n large enough to have ρ_{3s}^n ‖x‖_2 ≤ (τ_{2s}/(1 − ρ_{3s})) ‖e‖_2, the inequality (9) applied n times implies that ‖x − x^n‖_2 ≤ ρ_{3s}^n ‖x‖_2 + (τ_{2s}/(1 − ρ_{3s})) ‖e‖_2 ≤ 2(τ_{2s}/(1 − ρ_{3s})) ‖e‖_2. In turn, for j ∈ supp(x) and l ∉ supp(x), the inequalities

|x^n_j| ≥ |x_j| − |(x − x^n)_j| ≥ x*_s − 2(τ_{2s}/(1 − ρ_{3s})) ‖e‖_2 > 2(τ_{2s}/(1 − ρ_{3s})) ‖e‖_2,
|x^n_l| = |(x − x^n)_l| ≤ 2(τ_{2s}/(1 − ρ_{3s})) ‖e‖_2

show that S^n = supp(x). The same reasoning gives S^{n+1} = supp(x), hence S^{n+1} = S^n.

² Again, exact arithmetic is assumed and the lexicographic rule to break ties applies.
³ In the case e = 0, we have x^n = x for some n (see (1)), hence x^{n+1} = x, and in particular S^{n+1} = S^n.

For (GHTP), a stopping criterion not requiring any estimation of s or ‖e‖_2 seems to be lost. It is worth recalling that iterative algorithms such as CoSaMP [2], IHT [2], or HTP [6] typically involve an estimation of s, and that optimization methods such as quadratically constrained ℓ1-minimization typically involve an estimation of ‖e‖_2, unless the measurement matrix satisfies the ℓ1-quotient property, see [4]. Here, we settle for a stopping criterion requiring an estimation of either one of s or ‖e‖_2: precisely, (GHTP) is stopped either when n = 4s or when ‖y − Ax^n‖_2 ≤ d ‖e‖_2 for some constant d ≥ 2.83, say. Indeed, under the condition δ_{9s} ≤ 1/3, robustness is guaranteed from Theorem 8 by the fact that ‖x − x^{n⋆}‖_2 ≤ d′ ‖e‖_2 for some n⋆ ≤ 4s and some d′ ≤ 2.45. Thus, in the alternative n = 4s,

‖x − x^n‖_2 ≤ ρ_{9s}^{4s−n⋆} ‖x − x^{n⋆}‖_2 + (τ_{9s}/(1 − ρ_{9s})) ‖e‖_2 ≤ ( d′ + τ_{9s}/(1 − ρ_{9s}) ) ‖e‖_2 ≤ d″ ‖e‖_2,

and in the alternative ‖y − Ax^n‖_2 ≤ d ‖e‖_2 (which is guaranteed to occur for some n ≤ n⋆ by virtue of ‖y − Ax^{n⋆}‖_2 ≤ ‖e‖_2 + √(1 + δ_{9s}) ‖x − x^{n⋆}‖_2 ≤ (1 + √(1 + δ_{9s}) d′) ‖e‖_2),

‖x − x^n‖_2 ≤ (1/√(1 − δ_{9s})) ‖A(x − x^n)‖_2 ≤ (1/√(1 − δ_{9s})) ( ‖y − Ax^n‖_2 + ‖e‖_2 ) ≤ ( (d + 1)/√(1 − δ_{9s}) ) ‖e‖_2 ≤ d″ ‖e‖_2,

where d″ := max{ d′ + τ_{9s}/(1 − ρ_{9s}), (d + 1)/√(1 − δ_{9s}) } ≤ 9.38. We point out that, according to the result of [5], a similar stopping criterion involving either one of s or ‖e‖_2 can be used for (OMP). This makes the resemblance between (GHTP) and (OMP) even more compelling. Their empirical performances will be compared in Section 6.

3 Uniform Recovery via Hard Thresholding Pursuit

This section is dedicated to the main results about (HTP), namely that sparse recovery is achieved in a number of iterations at most proportional to the sparsity level, independently of the shape of the vector to be recovered. The results are uniform in the sense that the same measurement matrix allows for the recovery of all s-sparse vectors simultaneously. For clarity of exposition, we first state and prove a result in the idealized situation only; the realistic situation is considered thereafter.

Theorem 5. If the restricted isometry constant of the matrix A ∈ C^{m×N} obeys δ_{3s} ≤ 1/√5, then every s-sparse vector x ∈ C^N is recovered from the measurement vector y = Ax ∈ C^m via a number n⋆ of iterations of (HTP) satisfying n⋆ ≤ c s. The constant c ≤ 3 depends only on δ_{3s}. For instance, δ_{3s} ≤ 1/3 yields c ≤ 2.

Proof. Let π be the permutation of {1, 2, ..., N} for which |x_{π(j)}| = x*_j for all j ∈ {1, 2, ..., N}. Our goal is to prove that supp(x) ⊆ S^{n⋆} for some n⋆ ≤ 3s; then (HTP_2) ensures that Ax^{n⋆} = Ax, and x^{n⋆} = x follows from δ_{3s} < 1. For this purpose, we shall consider a partition Q_1 ∪ Q_2 ∪ ... ∪ Q_r, r ≤ s, of supp(x) = π({1,...,s}). Lemma 3 enables us to argue repeatedly that, if Q_1 ∪ ... ∪ Q_{i-1} has been correctly captured, then a suitable number k_i of additional (HTP) iterations captures Q_1 ∪ ... ∪ Q_{i-1} ∪ Q_i. The remaining step is just the estimation of k_1 + ... + k_r. Let us set q_0 = 0 and let us define the sets Q_1, ..., Q_r inductively by

(15)  Q_i = π({q_{i-1}+1, ..., q_i}),   q_i := maximum index q ≥ q_{i-1}+1 such that x*_q > x*_{q_{i-1}+1}/√2.

We notice that x*_{q_i+1} ≤ x*_{q_{i-1}+1}/√2 for all i ∈ {1,...,r−1}. With the artificial introduction of Q_0 = ∅ and k_0 = 0, we now prove by induction on i ∈ {0, 1, ..., r} that

(16)  Q_0 ∪ Q_1 ∪ ... ∪ Q_i ⊆ S^{k_0 + k_1 + ... + k_i},

where k_1, ..., k_r are defined, with ρ := ρ_{3s} < 1, as

(17)  k_i := ⌈ ln( 2( card(Q_i) + card(Q_{i+1})/2 + ... + card(Q_r)/2^{r-i} ) ) / ln(1/ρ²) ⌉.

For i = 0, the result holds trivially. Next, if (16) holds for i−1, i ∈ {1,...,r}, Lemma 3 (with e = 0) guarantees that (16) holds for i, provided

(18)  (x*_{q_i})² > ρ^{2k_i} ( ‖x_{Q_i}‖_2² + ‖x_{Q_{i+1}}‖_2² + ... + ‖x_{Q_r}‖_2² ).

In view of (x*_{q_i})² > (x*_{q_{i-1}+1})²/2 and of

‖x_{Q_i}‖_2² + ‖x_{Q_{i+1}}‖_2² + ... + ‖x_{Q_r}‖_2² ≤ (x*_{q_{i-1}+1})² card(Q_i) + (x*_{q_i+1})² card(Q_{i+1}) + ... + (x*_{q_{r-1}+1})² card(Q_r)
  ≤ (x*_{q_{i-1}+1})² ( card(Q_i) + card(Q_{i+1})/2 + ... + card(Q_r)/2^{r-i} ),

we verify that condition (18) is fulfilled thanks to the definition (17) of k_i. This concludes the inductive proof. We derive that the support π({1,...,s}) of x is recovered in a number of iterations at most

n⋆ = Σ_{i=1}^r k_i ≤ Σ_{i=1}^r ( 1 + ln( 2( card(Q_i) + card(Q_{i+1})/2 + ... + card(Q_r)/2^{r-i} ) ) / ln(1/ρ²) )
   = r + (1/ln(1/ρ²)) Σ_{i=1}^r ln( 2( card(Q_i) + card(Q_{i+1})/2 + ... + card(Q_r)/2^{r-i} ) ).

Using the concavity of the logarithm, we obtain

(ln(1/ρ²)/r)(n⋆ − r) ≤ ln( (2/r) Σ_{i=1}^r ( card(Q_i) + card(Q_{i+1})/2 + ... + card(Q_r)/2^{r-i} ) )
  = ln( (2/r)( card(Q_1) + (1 + 1/2) card(Q_2) + ... + (1 + 1/2 + ... + 1/2^{r-1}) card(Q_r) ) )
  ≤ ln( (4/r)( card(Q_1) + card(Q_2) + ... + card(Q_r) ) ) = ln(4s/r).

Since ln(4x)/x ≤ ln(4) for x ≥ 1, we deduce that

n⋆ − r ≤ (s/ln(1/ρ²)) · ln(4s/r)/(s/r) ≤ (ln(4)/ln(1/ρ²)) s,

and we subsequently obtain

n⋆ ≤ r + (ln(4)/ln(1/ρ²)) s ≤ ( 1 + ln(4)/ln(1/ρ²) ) s = ( ln(4/ρ²)/ln(1/ρ²) ) s.

This is the desired result with c = ln( 2(1 − δ_{3s}²)/δ_{3s}² ) / ln( (1 − δ_{3s}²)/(2δ_{3s}²) ). We observe that δ_{3s} ≤ 1/√5 yields c ≤ ln(8)/ln(2) = 3 and that δ_{3s} ≤ 1/3 yields c ≤ ln(16)/ln(4) = 2. The proof is complete.
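An illustrative empirical check of Theorem 5 can be run with the htp() sketch given in the introduction; the problem sizes below are arbitrary choices, and the observed iteration count typically stays far below the bound c s.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, s = 200, 1000, 20
A = rng.standard_normal((m, N)) / np.sqrt(m)          # normalized Gaussian measurement matrix
x = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x[support] = rng.standard_normal(s)                   # s-sparse vector with Gaussian entries
x_rec, S_rec, n_iter = htp(A, A @ x, s)               # htp() as sketched in the introduction
print(np.allclose(x, x_rec), S_rec == set(support.tolist()), n_iter)
```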

Since ln4x)/x ln4) for x, we deduce that and we subsequently obtain n r + ln/ρ 2 ) n r s ln 4s/r) s/r ln4), ln4) ln/ρ 2 ) s + ln4) ) ln/ρ 2 s = ln4/ρ2 ) ) ln/ρ 2 ) s. This is the desired result with c = ln2 δ3s 2 )/δ2 3s )/ ln δ2 3s )/2δ2 3s )). We observe that δ 3s / 5 yields c ln8)/ ln2) = 3 and that δ 3s /3 yields c ln6)/ ln4) = 2. The proof is complete. The approach above does not allow for the reduction of the number of iterations below c s. Indeed, if the s-sparse vector x C N has nonzero entries x j = µ j for µ, /2), then each set Q i consists of one element, so there are s of these sets, and n = s i= k i s. However, such a vector whose nonincreasing rearrangement decays exponentially fast is highly compressible applying Theorem 6 turns out to be more appropriate see Remark 7). For other types of sparse vectors, the number of iterations can be significantly lowered. We give two examples below. In each of them, the partition involves a single set Q. This corresponds to invoking ). Sparse vectors whose nonincreasing rearrangement decays slowly. For s-sparse vectors x C N satisfying x j = /jα for all j {,..., s} and for some α flat vectors are included with the choice α = ), the number of iterations is at most proportional to lns). Indeed, we have and it follows from ) that x 2 2 x s) 2 = + + /s2α /s 2α s /s 2α = s2α+, n ln 2/3 s α+/2 ) c δ3s,α lns). ln/ρ 3s ) Gaussian vectors. In numerical experiments, the nonzero entries of an s-sparse vector g C N are often taken as independent standard Gaussian random variables g j, j S. In this case, with probability at least ε, the vector g is recovered from y = Ag with a number of HTP) iterations at most proportional to lns/ε). Indeed, we establish below that 9) 2) π P gs < 8 g 2 2 > 3π 2 P ) ε ε s 2, ) s ε ε 2. Thus, with failure probability at most ε, we have gs π/8 ε/s and g 2 3π/2 s/ε. It follows from ) that the number of iterations necessary to recover g satisfies ln2s/ε) 3/2 ) 3 ln2s/ε) n = c δ3s lns/ε). ln/ρ 3s ) 2 ln/ρ 3s ) 2

To obtain (19), with g denoting a standard Gaussian random variable and with t := √(π/8) ε/s, we write

P( g*_s < t ) = P( |g_j| < t for some j ∈ S ) ≤ s P( |g| < t ) = s ∫_{-t}^{t} exp(−u²/2)/√(2π) du ≤ s √(2/π) t = ε/2.

To obtain (20), with u ∈ (0, 1/4) to be chosen later and with t := (3π/2) s/ε, we write

P( ‖g‖_2² > t ) = P( exp( u Σ_{j∈S} g_j² ) > exp(ut) ) ≤ E[ exp( u Σ_{j∈S} g_j² ) ] / exp(ut) = Π_{j∈S} E[ exp(u g_j²) ] / exp(ut) = (1 − 2u)^{-s/2} / exp(ut).

Using (1 − 2u)^{-1/2} = (1 + 2u/(1 − 2u))^{1/2} ≤ (1 + 4u)^{1/2} ≤ exp(2u) for u ∈ (0, 1/4), we derive

P( ‖g‖_2² > t ) ≤ exp(2us − ut) = exp( −us( 3π/(2ε) − 2 ) ) = ε/2,

where we have chosen us = ln(2/ε)/( 3π/(2ε) − 2 ), so that u ≤ us < 1/4 when ε < 1/2.

We now state the extension of Theorem 5 to the realistic situation. We need to assume that the measurement error is not too large compared with the smallest nonzero absolute entry of the sparse vector (such a condition is quite common in the literature, see e.g. [, Theorem 3.2] or [9, Theorem 4.1], and is further discussed in Subsection 6.3).

Theorem 6. If the restricted isometry constant of the matrix A ∈ C^{m×N} obeys δ_{3s} ≤ 1/3, then the sequence (x^n) produced by (HTP) with y = Ax + e ∈ C^m for some s-sparse x ∈ C^N and some e ∈ C^m with ‖e‖_2 ≤ γ x*_s satisfies

‖x − x^{n⋆}‖_2 ≤ d ‖e‖_2,   n⋆ ≤ c s.

The constants c ≤ 3, d ≤ 2.45, and γ ≥ 0.079 depend only on δ_{3s}.

Proof. Let π be the permutation of {1, 2, ..., N} for which |x_{π(j)}| = x*_j for all j ∈ {1, 2, ..., N}. As before, we partition supp(x) = π({1,...,s}) as Q_1 ∪ Q_2 ∪ ... ∪ Q_r, r ≤ s, where the Q_i are defined as in (15). With Q_0 = ∅ and k_0 = 0, we prove by induction on i ∈ {0, 1, ..., r} that

Q_0 ∪ Q_1 ∪ ... ∪ Q_i ⊆ S^{k_0 + k_1 + ... + k_i},

where k_1, ..., k_r are defined, with ρ := ρ_{3s} < 1, by

(21)  k_i := ⌈ ln( 16( card(Q_i) + card(Q_{i+1})/2 + ... + card(Q_r)/2^{r-i} ) ) / ln(1/ρ²) ⌉.

The induction hypothesis holds trivially for i = 0. Next, if it holds for i−1, i ∈ {1,...,r}, then Lemma 3 guarantees that it holds for i, provided

(22)  x*_{q_i} > ρ^{k_i} √( ‖x_{Q_i}‖_2² + ‖x_{Q_{i+1}}‖_2² + ... + ‖x_{Q_r}‖_2² ) + κ_{3s} ‖e‖_2.

With the choice γ := (2√2 − 1)/(4κ_{3s}), we have κ_{3s} ‖e‖_2 ≤ κ_{3s} γ x*_s = (1/√2 − 1/4) x*_s ≤ (1/√2 − 1/4) x*_{q_{i-1}+1}. Moreover, in view of x*_{q_i} > x*_{q_{i-1}+1}/√2 and of

‖x_{Q_i}‖_2² + ‖x_{Q_{i+1}}‖_2² + ... + ‖x_{Q_r}‖_2² ≤ (x*_{q_{i-1}+1})² ( card(Q_i) + card(Q_{i+1})/2 + ... + card(Q_r)/2^{r-i} ),

we verify that (22) is fulfilled thanks to the definition (21) of k_i. This concludes the induction. In particular, the support π({1,...,s}) of x is included in S^{n⋆}, where n⋆ can be estimated (using similar arguments as before) by

n⋆ = Σ_{i=1}^r k_i ≤ c s,   c := ln(16/ρ²)/ln(1/ρ²).

It follows from (HTP_2) that ‖y − Ax^{n⋆}‖_2 ≤ ‖y − Ax‖_2 = ‖e‖_2. In turn, we derive that

‖x − x^{n⋆}‖_2 ≤ (1/√(1 − δ_{cs})) ‖A(x − x^{n⋆})‖_2 ≤ (1/√(1 − δ_{cs})) ( ‖y − Ax^{n⋆}‖_2 + ‖e‖_2 ) ≤ (2/√(1 − δ_{cs})) ‖e‖_2 =: d ‖e‖_2.

We note that δ_{3s} ≤ 1/3 yields c ≤ 3, then d ≤ √6, and finally γ ≥ (4 − √2)√3/56.

Remark 7. When the vector x is not exactly s-sparse, we denote by S an index set of its s largest absolute entries. Writing y = Ax + e as y = Ax_S + e′ with e′ := Ax_{S̄} + e, Theorem 6 guarantees that, under the restricted isometry condition δ_{3s} ≤ 1/3,

‖x − x^{n⋆}‖_2 ≤ ‖x_{S̄}‖_2 + ‖x_S − x^{n⋆}‖_2 ≤ ‖x_{S̄}‖_2 + d ‖e′‖_2,   n⋆ ≤ c s,

provided ‖e′‖_2 ≤ γ x*_s. Let us now assume that e = 0 for simplicity. With S_1, S_2, ... denoting index sets of the s largest absolute entries of x_{S̄}, of the next s largest absolute entries of x_{S̄}, etc., we notice that

‖e′‖_2 = ‖Ax_{S̄}‖_2 ≤ Σ_{k≥1} ‖Ax_{S_k}‖_2 ≤ √(1 + δ_s) Σ_{k≥1} ‖x_{S_k}‖_2.

In particular, vectors whose nonincreasing rearrangement decays as x*_j = µ^j for µ ∈ (0, 1/2) satisfy ‖Ax_{S̄}‖_2 ≤ c_{δ_s} µ^{s+1}, so that ‖e′‖_2 ≤ γ x*_s holds for small enough µ. Taking ‖x_{S̄}‖_2 ≤ c µ^{s+1} into account, too, the reconstruction bound becomes ‖x − x^{n⋆}‖_2 ≤ c_{δ_s} µ^{s+1}. As a matter of fact, the same bound is valid if s is replaced by a smaller value t (even t = 1), so the reconstruction error becomes quite satisfactory after a relatively small number c t of iterations.

4 Uniform Recovery via Graded Hard Thresholding Pursuit

This section contains the main result about (GHTP) in the uniform setting, namely the fact that the reconstruction of all s-sparse vectors is achieved under a restricted isometry condition in a number of iterations at most proportional to s. This is comparable to the main result about (HTP).

We recall, however, the added benefits that (GHTP) does not need an estimation of the sparsity to run and that the stopping criterion involves an estimation of either one of the sparsity or the measurement error (the only other such algorithm we are aware of is (OMP)). In passing, we note that (in both uniform and nonuniform settings) the number of (GHTP) iterations for s-sparse recovery cannot be less than s, since t < s iterations only produce t-sparse vectors. We prove the result directly in the realistic situation and we invite the reader to weaken the restricted isometry condition in the idealized situation.

Theorem 8. If the restricted isometry constant of the matrix A ∈ C^{m×N} obeys δ_{9s} ≤ 1/3, then the sequence (x^n) produced by (GHTP) with y = Ax + e ∈ C^m for some s-sparse x ∈ C^N and some e ∈ C^m with ‖e‖_2 ≤ γ x*_s satisfies

‖x − x^{n⋆}‖_2 ≤ d ‖e‖_2,   n⋆ ≤ c s.

The constants c ≤ 4, d ≤ 2.45, and γ ≥ 0.079 depend only on δ_{9s}.

Proof. Let π be the permutation of {1, 2, ..., N} for which |x_{π(j)}| = x*_j for all j ∈ {1, 2, ..., N}. Our goal is to show that supp(x) = π({1,...,s}) ⊆ S^n for some n ≤ 4s. The arguments are very similar to the ones used in the proof of Theorem 5, except for the notable difference that the first s iterations are ignored. We still partition π({1,...,s}) as Q_1 ∪ Q_2 ∪ ... ∪ Q_r, r ≤ s, where the Q_i are defined as in (15), and we prove by induction on i ∈ {0, 1, ..., r} that

(23)  Q_0 ∪ Q_1 ∪ ... ∪ Q_i ⊆ S^{s + k_0 + k_1 + ... + k_i},

where Q_0 = ∅, k_0 = 0, and k_1, ..., k_r are defined, with ρ := ρ_{9s} ≤ 1/2, by

(24)  k_i := ⌈ ln( 16( card(Q_i) + card(Q_{i+1})/2 + ... + card(Q_r)/2^{r-i} ) ) / ln(1/ρ²) ⌉.

Based on arguments already used, we observe that

Σ_{i=1}^r k_i ≤ ( ln(16/ρ²)/ln(1/ρ²) ) s ≤ 3s.

We now note that (23) holds trivially for i = 0. Next, if (23) holds for i−1, i ∈ {1,...,r}, then Lemma 4 guarantees that (23) holds for i, provided

(25)  x*_{q_i} > ρ^{k_i} √( ‖x_{Q_i}‖_2² + ‖x_{Q_{i+1}}‖_2² + ... + ‖x_{Q_r}‖_2² ) + κ_{9s} ‖e‖_2,

since ρ_{s+2(s+k_1+...+k_{i-1})+2k_i} ≤ ρ_{9s}. With γ := (2√2 − 1)/(4κ_{9s}) ≥ (4 − √2)√3/56, we verify that (25) is fulfilled thanks to the definition (24) of k_i. This concludes the induction. In particular, the support π({1,...,s}) of x is included in S^{n⋆} with n⋆ = s + k_1 + ... + k_r ≤ 4s. We derive that ‖y − Ax^{n⋆}‖_2 ≤ ‖y − Ax‖_2 = ‖e‖_2 from (GHTP_2), and then

‖x − x^{n⋆}‖_2 ≤ (1/√(1 − δ_{4s})) ‖A(x − x^{n⋆})‖_2 ≤ (1/√(1 − δ_{4s})) ( ‖y − Ax^{n⋆}‖_2 + ‖e‖_2 ) ≤ (2/√(1 − δ_{9s})) ‖e‖_2,

which is the desired result with d = 2/√(1 − δ_{9s}) ≤ √6.

5 Nonuniform Recovery via Graded Hard Thresholding Pursuit

In this section, we demonstrate that s-sparse recovery via exactly s iterations of (GHTP) can be expected in the nonuniform setting, i.e., given an s-sparse vector x ∈ C^N, a random measurement matrix allows for the recovery of this specific x with high probability. Precisely, we obtain results in two extreme cases: when the nonincreasing rearrangement of the sparse vector does not decay much and when it decays exponentially fast. Proposition 9 covers the first case, suspected to be the least favorable one (see Subsection 6.2). The result and its proof have similarities with the work of [3], which was extended to incorporate measurement error in [9]. The result is stated directly in this realistic situation, and it is worth pointing out its validity for all measurement errors e ∈ C^m simultaneously.

Proposition 9. Let λ ≥ 1 and let x ∈ C^N be an s-sparse vector such that x*_1 ≤ λ x*_s. If A ∈ R^{m×N} is a (normalized) Gaussian random matrix with m ≥ C s ln(N), then it occurs with probability at least 1 − 2N^{-c} that, for all e ∈ C^m with ‖e‖_2 ≤ γ x*_s, the sequences (S^n) and (x^n) produced by (GHTP) with y = Ax + e satisfy, at iteration s,

S^s = supp(x)   and   ‖x − x^s‖_2 ≤ d ‖e‖_2.

The constants γ and d depend only on λ, while the constant C depends on γ and c.

Proof. The result is in fact valid if (normalized) Gaussian random matrices are replaced by matrices with independent subgaussian entries with mean zero and variance 1/m. One part of the argument relies on the fact that, for a fixed index set S of size s,

(26)  P( ‖A*_S A_S − I‖_{2→2} > δ ) ≤ 2 exp(−c′ δ² m)

provided m ≥ C′ s/δ², with c′, C′ depending only on the subgaussian distribution (see e.g. [7, Theorem 9.9], which leads to the improvement [7, Theorem 9.11] of [, Theorem 5.2] in terms of dependence of constants on δ). Another part of the argument relies on the fact that, for a vector v ∈ C^N and an index l ∈ {1,...,N}, with a_l denoting the l-th column of the matrix A,

(27)  P( |⟨a_l, v⟩| > t ‖v‖_2 ) ≤ 4 exp(−c′ t² m)

for a constant c′ depending only on the subgaussian distribution (see e.g. [7, Theorem 7.27] in the case v ∈ R^N). With S := supp(x), let us now define, for n ∈ {1,...,s}, two random variables χ_n and ζ_n as

χ_n := [ (x^{n-1} + A*(y − Ax^{n-1}))_S ]*_n,   ζ_n := [ (x^{n-1} + A*(y − Ax^{n-1}))_{S̄} ]*_1,

i.e., χ n is the nth largest value of x n + A Ax x n )) j when j runs through S and ζ n is the largest value of x n + A Ax x n )) l when l runs through S. We are going to prove that, with high probability, S n S for all n {,..., s}, which follows from χ n > ζ n for all n {,..., s}. The failure probability of this event is P := P n {,..., s} : ζ n χ n and χ n > ζ n,..., χ > ζ )) ) 28) P A S {l} A S {l} I 2 2 > δ for some l S s ) 29) + P ζ n χ n, χ n > ζ n,..., χ > ζ ), A S {l} A S {l} I 2 2 δ for all l S), n= where the constant δ = δ λ is to be chosen later. According to 26), the term in 28) is bounded by 2N s) exp c δ 2 m) since a proper choice of C depending on λ through δ) guarantees that m C s + )/δ 2. Let us turn to the term in 29) and let us use the shorthand P E) to denote the probability of an event E intersected with the events χ n > ζ n,..., χ > ζ ) and A S {l} A S {l} I 2 2 δ for all l S), which are assumed to hold below. On the one hand, with T s n+ S representing an index set corresponding to s n + smallest values of x n + A y Ax n )) j when j runs through S, we notice that χ n s n + x n + A y Ax n )) T s n+ 2 s n + xt s n+ 2 A A I)x x n )) T s n+ 2 A e) T s n+ 2 ). Then, since the condition χ n > ζ n yields S n S, we obtain A A I)x x n )) T s n+ 2 A SA S I)x x n ) 2 δ x x n 2. We also use the inequality A e) T s n+ 2 + δ e 2 to derive χ n x T s n+ 2 δ x x n 2 ) + δ e 2. s n + On the other hand, since S n S, we have It follows that P ζ n χ n ) ζ n = max l S A y Ax n )) l max l S max a l, Ax x n ) + + δ e 2. l S P max a l, Ax x n ) l S P max a l, Ax x n ) δ ) x x n 2, l S s A Ax x n )) l + max A e) l l S ) xt s n+ 2 δ x x n ) 2 2 + δ e 2 s n + 7

where the last step will be justified by the inequality s n + x T s n+ 2 2 + δ e 2 2δ s n + x x n 2. In order to fulfill this inequality, we remark that x T s n+ 2 s n + x s, that e 2 γ x s, and, thanks to the extension of 7) mentioned in Remark, that x x n 2 x δ 2 S n 2 + s n + + δ δ A e) S n 2 x δ 2 + δ e 2 ) λ + δ γ s n + δ 2 x s + x s, δ so that it is enough to choose δ then γ depending on λ) small enough to have 2 ) λ + δ γ + δ γ 2δ +. δ 2 δ Taking Ax x n ) 2 + δ x x n 2 into account, we obtain 3) P ζ n χ n ) P max a l, Ax x n ) l S ) δ Ax x n ) 2. + δ s Let us now remark that the condition χ = [A A S x + A e) S ] > ζ = [A A S x + A e) S ] implies that the random set S corresponding to the largest value of A S A Sx + A S e) depends only on the random submatrix A S, so that the random vector x = A y depends only on A S S, too; then, since x is supported on S S, the condition χ 2 = [x + A A S x x ) + A e) S ] 2 > ζ 2 = [x + A A S x x ) + A e) S ] implies that the random set S2 corresponding to the two largest values of x S + A S A Sx x ) + A S e) depends only on the random submatrix A S, so that the random vector x 2 = A y depends only on A S 2 S, too; and so on until we infer that the random vector x n, supported on S n S, depends only on A S, and in turn so does Ax x n ). Exploiting the independence of Ax x n ) and of the columns a l for l S, the estimate 27) can be applied in 3) to yield P ζ n χ n ) l S P a l, Ax x n ) 4N s) exp c δ 2 ) m. + δ)s ) δ Ax x n ) 2 + δ s Altogether, the failure probability P satisfies P 2N s) exp c δ 2 m) + 4sN s) exp c δ 2 ) m + δ)s 2N ) exp c δ 2 m) + N 2 exp c δ 2 ) m. + δ)s 8

In particular, we have χ_s > ζ_s, and in turn S^s = S, with failure probability at most P. Then, since ‖A*_S A_S − I‖_{2→2} ≤ δ with failure probability given in (26), we also have

‖x − x^s‖_2 ≤ (1/√(1 − δ)) ‖A(x − x^s)‖_2 ≤ (1/√(1 − δ)) ( ‖y − Ax^s‖_2 + ‖y − Ax‖_2 ) ≤ (2/√(1 − δ)) ‖e‖_2,

which is the desired result with d = 2/√(1 − δ). The resulting failure probability is bounded by

2N exp(−c′ δ² m) + N² exp( −c′ δ² m/((1 + δ)s) ) ≤ 2N² exp( −c″ δ² m/s ) ≤ 2N^{-c},

where the last inequality holds with a proper choice of the constant C in m ≥ C s ln(N).

The previous result is limited by its nonapplicability to random partial Fourier matrices. For such matrices, the uniform results of the two previous sections (or any recovery result based on the restricted isometry property) do apply, but the restricted isometry property is only known to be fulfilled with a number of measurements satisfying m ≥ c s ln⁴(N). This threshold can be lowered in the nonuniform setting by reducing the power of the logarithm. This is typically achieved via ℓ1-minimization (see [4, Theorem 1.3], [3, Theorem 1.1], or [8, V.3]), for which the successful recovery of a vector does not depend on its shape but only on its sign pattern. We establish below a comparable result for (GHTP): fewer measurements guarantee the high probability of nonuniform s-sparse recovery via exactly s iterations of (GHTP). There is an additional proviso on the vector shape, namely its nonincreasing rearrangement should decay exponentially fast. This is admittedly a strong assumption, but possible improvements in the proof strategy (perhaps incorporating ideas from Proposition 9) suggest that the result could be extended beyond this class of sparse vectors for Fourier measurements. Incidentally, for this class of vectors, it is striking that the number of Gaussian measurements can be reduced even below the classical reduction (see [5, Theorems 1.2 and 1.3]) of the constant c in m ≥ c s ln(N/s) when passing from the uniform to the nonuniform setting.

Proposition 10. Let µ ∈ (0, 1) and let x ∈ C^N be an s-sparse vector such that x*_{j+1} ≤ µ x*_j for all j ∈ {1,...,s−1}. If A ∈ R^{m×N} is a (normalized) Gaussian random matrix with m ≥ max{C s, C′ ln(N)}, or if A ∈ C^{m×N} is a (normalized) random partial Fourier matrix with m ≥ C″ s ln(N), then it occurs with probability at least 1 − 2N^{-c} that, for all e ∈ C^m with ‖e‖_2 ≤ γ x*_s, the sequences (S^n) and (x^n) produced by (GHTP) with y = Ax + e satisfy, at iteration s,

S^s = supp(x)   and   ‖x − x^s‖_2 ≤ d ‖e‖_2.

The constants C, γ, and d depend only on µ, while the constants C′ and C″ depend on µ and c.

Proof. The result would also be valid if normalized) Gaussian random matrices were replaced by normalized) subgaussian random matrices and if normalized) random partial Fourier matrices were replaced by normalized) random sampling matrices associated to bounded orthonormal systems. We recall see [7, chapter 2] for details) that a bounded orthonormal system is a system φ,..., φ N ) of functions, defined on a set D endowed with a probability measure ν, that satisfy, for some constant K, 3) sup φ j t) K and φ k t)φ l t)dνt) = δ k,l t D for all j, k, l {,..., N}. The normalized random sampling matrix A C m N associated to φ,..., φ N ) has entries A k,l = m φ l t k ), k {,..., m}, l {,..., N}, D where the sampling points t,..., t m D are selected independently at random according to the probability measure ν. For instance, the random partial Fourier matrix corresponds to the choices D = {,..., N}, νt ) = cardt )/N for all T {,..., N}, and φ k t) = expi2πkt), in which case K =. The argument here relies on the fact see [7, Theorem 2.2]) that, for a fixed index set S of size s, ) 32) P A SA S I 2 2 > δ) 2s exp 3δ2 m 8K 2. s Let π be the permutation of {, 2,..., N} for which x πj) = x j for all j {, 2,..., N} and let Π n denote the set π{, 2,..., n}) for each n {,..., N}. Let δ = δ µ > be chosen small enough to have 2δ/ δ 2 µ) µ 2 /3 for reason that will become apparent later. Invoking 26) and 32) for the sets Π s {k}, k Π s, ensures that 33) A Π s {k} A Π s {k} I 2 2 δ. In the Gausian case, provided m C µ s, this holds with failure probability at most 2N s) exp c µ m) 2N exp c µ C µ,c lnn)) 2N c where C µ,c is chosen large enough. In the Fourier case, this holds with failure probability at most 2N s)s exp c ) µm 2N 2 exp c µ C µ,c lnn)) 2N c s where C µ,c is chosen large enough. We are going to prove by induction that S n = Π n for all n {,,..., s}. The result holds trivially for n =. To increment the induction from n to n, n {,..., s}, we need to show that min j=,...,n xn + A y Ax n )) πj) > max l=n+,...,n xn + A y Ax n )) πl). 2

We observe that for j {,..., n} and l {n +,..., N}, x n + A y Ax n )) πj) x πj) A A I)x x n ) + A e) πj) x n A A I)x x n )) πj) A e) πj), x n + A y Ax n )) πl) x πl) + A A I)x x n ) + A e) πl) x n+ + A A I)x x n )) πl) + A e) πl). Hence, it is enough to prove that, for all j {,..., n} and l {n +,..., N}, 34) x n > x n++ A A I)x x n )) πj) + A e) πj) + A A I)x x n )) πl) + A e) πl). For any k {,..., N}, using the the induction hypothesis and 33), we obtain 35) A A I)x x n )) k A Π s {k} A Π s {k} I)x x n ) 2 δ x x n 2. Moreover, in view of the extension of 7) mentioned in Remark, we have 36) x x n 2 Combining 35) and 36) gives, for any k {,..., N}, A A I)x x n )) k + A e) k x δ 2 Π n 2 + δ A e) Πn 2. δ x δ 2 Π n 2 + δ δ A e) Πn 2 + A e) k δ δ 2 µ 2 x n + δ A e) Πs {k} 2, where we have used x Πn 2 2 x n) 2 + µ 2 + + µ 2 ) s n ) x n) 2 / µ 2 ) in the last step. Using also the estimate imposed by 33) for the maximal singular value of A Πs {k} in A e) Πs {k} 2 + δ e 2 + δ γ x s + δ γ x n, we arrive at ) A A I)x x n )) k + A δ + δ γ e) k δ 2 µ + x 2 δ n. Taking the fact that x n+ µ x n into account, we observe that 34) is fulfilled as soon as 2δ > µ + δ 2 µ + 2 + δ γ. 2 δ The latter holds with the choice of δ made at the beginning and then with an appropriate choice of γ depending only on µ). At this point, we have justified inductively that S n = Π n for all n {,..., s}, hence in particular that S s = Π s = suppx). To conclude the proof, we remark that GHTP 2 ) yields y Ax s 2 y Ax 2 = e 2 and we also use the estimate imposed by 33) for the minimal singular value of A Πs to derive x x s 2 δ Ax x s ) 2 The constant d := 2/ δ depends only on µ. δ y Ax s 2 + y Ax 2 ) 2 δ e 2. 2

6 Numerical experiments

This section provides some empirical validation of our theoretical results. We focus on the performances of (HTP), (GHTP), and its natural companion (OMP). All the tests were carried out for Gaussian random matrices A ∈ R^{m×N} with m = 200 and N = 1000. We generated a batch of such random matrices. To assess the influence of the decay in the nonincreasing rearrangement of the s-sparse vectors x ∈ R^N to be recovered, we tested flat vectors with x_j = 1 for j ∈ {1,...,s}, linear vectors with x_j = (s + 1 − j)/s for j ∈ {1,...,s}, and Gaussian vectors whose s nonzero entries are independent standard normal random variables. We generated a batch of such vectors for each matrix. All these experiments can be reproduced by downloading the associated MATLAB files from the authors' webpages. These files contain supplementary numerical tests, e.g. one involving Fourier submatrices to illustrate Proposition 10.

6.1 Frequency of successful recoveries

We present here a comparison between (HTP), (GHTP), and (OMP) in terms of successful recovery; this complements the experiments of [6], where (HTP) was compared to basis pursuit, compressive sampling matching pursuit, iterative hard thresholding, and normalized iterative hard thresholding. We have also included a comparison with a version of (HTP), referred to below as the 2s-version of (HTP), where the number of largest absolute entries kept at each iteration is 2s instead of s. The natural stopping criterion S^n = S^{n-1} was used for (HTP) and its 2s-version, and we used the criterion⁴ [supp(x) ⊆ S^n or ‖x − x^n‖_2/‖x‖_2 < 10^{-4}] for (GHTP) and (OMP).

[Figure 1: Frequency of successful recoveries (percentage of success vs. sparsity) for (HTP), (GHTP), (OMP), and the 2s-version of (HTP) using Gaussian measurements; (a) Gaussian vectors, (b) linear vectors, (c) flat vectors.]

Figure 1 suggests that no algorithm consistently outperforms the other ones, although (GHTP) is always marginally better than (OMP). Moreover, the 2s-version of (HTP) underperforms in two cases, possibly because theoretical requirements such as restricted isometry properties are harder to fulfill when 2s replaces s. The shape of the vector to be recovered has a strong influence on the success rate, in contrast with basis pursuit, whose success depends only on the sign pattern.

⁴ The true sparse solution is available in this experiment, unlike in a practical scenario.
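A minimal sketch of this success-rate experiment can be assembled from the htp(), ghtp(), and sparse_test_vector() sketches given earlier; the trial count, tolerance, and problem sizes below are illustrative stand-ins, not the paper's exact protocol.

```python
import numpy as np

def success_rate(algo, shape, s, m=200, N=1000, trials=20, seed=0):
    """Fraction of exact recoveries of s-sparse vectors of a given shape (illustrative)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((m, N)) / np.sqrt(m)
        x = sparse_test_vector(N, s, shape, rng)
        y = A @ x
        x_rec = algo(A, y, s)[0] if algo is htp else algo(A, y)[0]
        hits += np.linalg.norm(x - x_rec) < 1e-4 * np.linalg.norm(x)
    return hits / trials

# e.g. compare (HTP) and (GHTP) on flat vectors as the sparsity grows
for s in (20, 40, 60):
    print(s, success_rate(htp, 'flat', s), success_rate(ghtp, 'flat', s))
```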

6.2 Number of iterations and indices correctly captured

We now focus on the information provided by our experiments in terms of the number of iterations needed for the recovery of a sparse vector and in terms of the number of indices from the true support correctly captured in the active sets as a function of the iteration count.

First, Figure 2 displays the maximum over all our trials of the number n(s) of iterations needed for s-sparse recovery as a function of s in the sparsity regions where recovery is certain (see Figure 1). With this indicator, (HTP) is the best algorithm, thanks to the fact that a prior knowledge of s is exploited. For this algorithm, no notable difference seems to result from the vector shape (but remember that it did influence the success rate), so the upper bound n(s) ≤ c s from Theorem 5 is likely to be an overestimation, to be replaced by the bound n(s) ≤ c ln(s), which is known to be valid for flat vectors. As for (GHTP) and (OMP), we observe that the vector shape influences the regions (predicted by Propositions 9 and 10) of s-values for which n(s) = s, and that (GHTP) generally requires fewer iterations than (OMP), probably thanks to the fact that incorrectly selected indices can be rejected at the next (GHTP) iteration (however, this complicates speed-ups based on QR-updates).

[Figure 2: Maximum number of iterations vs. sparsity for (HTP), (GHTP), and (OMP) using Gaussian measurements; (a) Gaussian vectors, (b) linear vectors, (c) flat vectors.]

Next, Figures 3, 4, and 5 display the number i(n) of indices from the true support that are correctly captured in the active set S^n as a function of the iteration count n. We point out that these figures do not reflect the worst case anymore; instead they present an average over all successful recoveries. Once again, (HTP) is the superior algorithm because several correct indices can be captured at once. For (GHTP) and (OMP), the ideal situation corresponds to the selection of correct indices only, i.e., i(n) = n for all n ≤ s. This seems to occur for small sparsities, with larger regions of validity for vectors whose nonincreasing rearrangements decay quickly. This is an encouraging fact, since i(n) = n has been justified for all n ≤ s in the proof of Proposition 9 for the least favorable case of flat vectors. For these vectors, Figure 5 also reveals that, even in the region where s-sparse recovery via (GHTP) and (OMP) is certain, the recovery is sometimes achieved in more than s iterations.
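The quantity i(n) can be tracked by instrumenting the (GHTP) sketch; the helper below records, at every iteration, how many true support indices the active set has captured (an illustrative sketch, not the authors' MATLAB code).

```python
import numpy as np

def ghtp_capture_curve(A, y, true_support, n_max):
    """Return i(n) = card(supp(x) ∩ S^n) along the (GHTP) iteration, for n = 1..n_max."""
    N = A.shape[1]
    x = np.zeros(N)
    true_set = set(int(j) for j in true_support)
    curve = []
    for n in range(1, n_max + 1):
        u = x + A.conj().T @ (y - A @ x)                    # (GHTP_1)
        S = np.sort(np.argsort(np.abs(u))[-n:])
        coef = np.linalg.lstsq(A[:, S], y, rcond=None)[0]   # (GHTP_2)
        x = np.zeros(N, dtype=coef.dtype)
        x[S] = coef
        curve.append(len(true_set & set(S.tolist())))
    return curve
```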

[Figure 3: Number of support indices of Gaussian vectors correctly captured by the active set as a function of the iteration count, for (HTP), (GHTP), and (OMP) using Gaussian measurements; sparsities s = 20 to 50.]

[Figure 4: Number of support indices of linear vectors correctly captured by the active set as a function of the iteration count, for (HTP), (GHTP), and (OMP) using Gaussian measurements; sparsities s = 20 to 45.]

[Figure 5: Number of support indices of flat vectors correctly captured by the active set as a function of the iteration count, for (HTP), (GHTP), and (OMP) using Gaussian measurements; sparsities s = 20 to 45.]

Finally, Figure 6 shows the effect of overestimating the sparsity level as 2s instead of s in each hard thresholding pursuit iteration. We notice a slight improvement for small sparsities, as more correct indices can be captured at each iteration, but a degradation for larger sparsities, again possibly due to harder theoretical requirements.

[Figure 6: Number of support indices correctly captured by the active set as a function of the iteration count for the 2s-version of (HTP) using Gaussian measurements; (a) Gaussian vectors, (b) linear vectors, (c) flat vectors.]

6.3 Robustness to measurement error

Our final experiment highlights the effect of the measurement error e ∈ C^m on the recovery via (HTP), (GHTP), and (OMP) of sparse vectors x ∈ C^N acquired through y = Ax + e ∈ C^m. For a sparsity level set to s = 25, with the other parameters still fixed at m = 200 and N = 1000, the test was carried out on linear and flat s-sparse vectors. For an error level η varying from 0 to 5 in small increments, we generated Gaussian matrices A, then instances of a random support S and of a Gaussian noise vector e normalized by ‖e‖_2 = η x*_s, and we ran the three algorithms on the resulting measurement vectors for each error level. After the algorithms exited, we averaged the number of indices from the true support correctly caught, as well as the reconstruction error measured in ℓ2-norm.

Under restricted isometry conditions, the proofs of Theorems 6 and 8 guarantee that all indices are correctly caught provided η is below some threshold η⋆. This phenomenon is apparent in Figure 7 (abscissa: η = ‖e‖_2/x*_s; ordinate: card(S ∩ S♯), where S♯ denotes the support of the output x♯ of the algorithm). The figure also shows that the threshold is approximately the same for the three algorithms and that it is higher for flat vectors. For linear vectors, the value of η⋆ does not differ too much from the value γ ≥ 0.079 obtained in Theorems 6 and 8. Once the threshold is passed, we also notice that (HTP) catches fewer correct indices than (GHTP) and (OMP), probably due to the fact that it produces index sets of smaller size. In terms of reconstruction error, Figure 8 (abscissa: η = ‖e‖_2/x*_s; ordinate: ‖x − x♯‖_2/x*_s) displays transitions around the same values η⋆ as Figure 7. Although the ratio ‖x − x♯‖_2/‖e‖_2 of reconstruction error by measurement error increases after the threshold, it does not rule out a robustness estimate of the type ‖x − x♯‖_2 ≤ d ‖e‖_2 valid for all e ∈ C^m with an absolute constant d (this is established in [6] for (HTP) under a restricted isometry condition).