Compressive Distilled Sensing: Sparse Recovery Using Adaptivity in Compressive Measurements


Jarvis D. Haupt¹, Richard G. Baraniuk¹, Rui M. Castro², and Robert D. Nowak³

¹ Dept. of Electrical and Computer Engineering, Rice University, Houston, TX 77005
² Dept. of Electrical Engineering, Columbia University, New York, NY 10027
³ Dept. of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53706

(This work was partially supported by ARO grant no. W911NF-09-1-0383, NSF grant no. CCF-0353079, and AFOSR grant no. FA9550-09-1-0140.)

Abstract: The recently-proposed theory of distilled sensing establishes that adaptivity in sampling can dramatically improve the performance of sparse recovery in noisy settings. In particular, it is now known that adaptive point sampling enables the detection and/or support recovery of sparse signals that are otherwise too weak to be recovered using any method based on non-adaptive point sampling. In this paper the theory of distilled sensing is extended to highly-undersampled regimes, as in compressive sensing. A simple adaptive sampling-and-refinement procedure called compressive distilled sensing is proposed, where each step of the procedure utilizes information from previous observations to focus subsequent measurements into the proper signal subspace, resulting in a significant improvement in effective measurement SNR on the signal subspace. As a result, for the same budget of sensing resources, compressive distilled sensing can result in significantly improved error bounds compared to those for traditional compressive sensing.

I. INTRODUCTION

Let $x \in \mathbb{R}^n$ be a sparse vector supported on the set $S = \{i : x_i \neq 0\}$, where $|S| = s \ll n$, and consider observing $x$ according to the linear observation model
$$y = Ax + w, \qquad (1)$$
where $A$ is an $m \times n$ real-valued matrix (possibly random) that satisfies $\mathbb{E}[\|A\|_F^2] \leq n$, and where $w_i \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma^2)$ for some $\sigma \geq 0$. This model is central to the emerging field of compressive sensing (CS), which deals primarily with recovery of $x$ in highly-underdetermined settings, that is, where the number of measurements $m \ll n$. Initial results in CS establish a rather surprising fact: using certain observation matrices $A$ for which the number of rows $m$ is a constant multiple of $s \log n$, it is possible to recover $x$ exactly from $\{y, A\}$, and in addition, the recovery can be accomplished by solving a tractable convex optimization [1]-[3]. Matrices $A$ for which this exact recovery is possible are easy to construct in practice. For example, matrices whose entries are i.i.d. realizations of certain zero-mean distributions (Gaussian, symmetric Bernoulli, etc.) are sufficient to allow this recovery with high probability [2], [4].

In practice, however, it is rarely the case that observations are perfectly noise-free. In these settings, rather than attempt to recover $x$ exactly, the goal becomes to estimate $x$ to high accuracy in some metric, such as the $\ell_2$ norm [5], [6]. One such estimation procedure is the Dantzig selector, proposed in [6], which establishes that CS recovery remains stable in the presence of noise. We state the result here as a lemma.

Lemma 1 (Dantzig selector). For $m = \Omega(s \log n)$, generate a random $m \times n$ matrix $A$ whose entries are i.i.d. $\mathcal{N}(0, 1/m)$, and collect observations $y$ according to (1). The estimate
$$\hat{x} = \arg\min_{z \in \mathbb{R}^n} \|z\|_{\ell_1} \quad \text{subject to} \quad \|A^T(y - Az)\|_{\ell_\infty} \leq \lambda,$$
where $\lambda = \Theta(\sigma\sqrt{\log n})$, satisfies $\|\hat{x} - x\|_{\ell_2}^2 = O(s\sigma^2 \log n)$ with probability at least $1 - O(n^{-C_0})$ for some constant $C_0 > 0$.

Remark 1. The constants in the above can be specified explicitly or bounded appropriately, but we choose to present the results here (and, where appropriate, in the sequel) in terms of scaling relationships¹ in the interest of simplicity.

¹ Recall that for functions $f = f(n)$ and $g = g(n)$, $f = O(g)$ means $f \leq c\,g$ for some constant $c$ and all $n$ sufficiently large, $f = \Omega(g)$ means $f \geq c'\,g$ for some constant $c'$ and all $n$ sufficiently large, and $f = \Theta(g)$ means that $f = O(g)$ and $f = \Omega(g)$. In addition, we will use the notation $f = o(g)$ to indicate that $\lim_{n\to\infty} f/g = 0$.
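To make the statement of Lemma 1 concrete, here is a minimal illustrative sketch (not from the paper) of the Dantzig selector posed as a convex program using the CVXPY library; the problem sizes, the random seed, and the particular choice $\lambda = \sigma\sqrt{2\log n}$ are assumptions made only for this demo.

```python
# Illustrative sketch of the Dantzig selector of Lemma 1 (assumed demo
# parameters; requires cvxpy with any installed conic solver).
import numpy as np
import cvxpy as cp

n, s, sigma = 200, 5, 0.1
m = 4 * s * int(np.log(n))                     # m on the order of s log n
rng = np.random.default_rng(0)

x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = 1.0       # s-sparse signal
A = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))  # entries i.i.d. N(0, 1/m)
y = A @ x + rng.normal(0.0, sigma, m)          # observations per model (1)

lam = sigma * np.sqrt(2 * np.log(n))           # lambda = Theta(sigma sqrt(log n))
z = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm(z, 1)),
                  [cp.norm(A.T @ (y - A @ z), "inf") <= lam])
prob.solve()
print("squared l2 error:", float(np.sum((z.value - x) ** 2)))  # ~ s sigma^2 log n scale
```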
On the other hand, suppose that an oracle were to identify the locations of the nonzero signal components (or, equivalently, the support $S$) prior to recovery. Then one could construct the least-squares estimate $\hat{x}_{\mathrm{LS}} = (A_S^T A_S)^{-1} A_S^T y$, where $A_S$ denotes the submatrix of $A$ formed from the columns indexed by the elements of $S$. The error of this estimate is $\|\hat{x}_{\mathrm{LS}} - x\|_{\ell_2}^2 = O(s\sigma^2)$ with probability $1 - O(n^{-C_1})$ for some $C_1 > 0$, as shown in [6]. Comparing this oracle-assisted bound with the result of Lemma 1, we see that the primary difference is the presence of the logarithmic term in the error bound of the latter, which can be interpreted as the searching penalty associated with having to learn the correct signal subspace.

Of course, the signal subspace will rarely, if ever, be known a priori. But suppose that it were possible to learn the signal subspace from the data, in a sequential and adaptive fashion, as the data are collected. In this case, sensing energy could be focused only into the true signal subspace, gradually improving the effective measurement SNR.
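For comparison with the Dantzig selector sketch above, the following is an equally small toy example (again with arbitrary sizes, not from the paper) of the oracle-assisted least-squares estimate $\hat{x}_{\mathrm{LS}}$, which regresses $y$ onto the oracle-provided columns $A_S$ only.

```python
# Illustrative sketch of the oracle-assisted least-squares estimate x_LS,
# which assumes the true support S is known in advance. Demo values only.
import numpy as np

n, s, sigma = 200, 5, 0.1
m = 4 * s * int(np.log(n))
rng = np.random.default_rng(1)

S = rng.choice(n, s, replace=False)            # support revealed by the oracle
x = np.zeros(n)
x[S] = 1.0
A = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))
y = A @ x + rng.normal(0.0, sigma, m)

coef, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)  # least squares on A_S only
x_ls = np.zeros(n)
x_ls[S] = coef
print("oracle squared l2 error:", float(np.sum((x_ls - x) ** 2)))  # ~ s sigma^2 scale
```

The missing $\log n$ factor relative to the Dantzig selector error is exactly the searching penalty discussed above.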

Intuitively, one might expect that this type of procedure could ultimately yield an estimate whose accuracy would be closer to that of the oracle-assisted estimator, since the effective observation matrix would begin to assume the structure of $A_S$. Such adaptive compressive sampling methods have been proposed and examined empirically [7]-[9], but to date the performance benefits of adaptivity in compressive sampling have not been established theoretically. In this paper we take a step in that direction by analyzing the performance of a multi-step adaptive sampling-and-refinement procedure called compressive distilled sensing (CDS), extending our own prior work in distilled sensing, where the theoretical advantages of adaptive sampling in uncompressed settings were quantified [10], [11]. Our main results here guarantee that, for signals having not too many nonzero entries and for which the dynamic range is not too large, a total of $O(s \log n)$ adaptively-collected measurements yields an estimator that, with high probability, achieves the $O(s\sigma^2)$ error bound of the oracle-assisted estimator.

The remainder of the paper is organized as follows. The CDS procedure is described in Sec. II and its performance is quantified as a theorem in Sec. III. Extensions and conclusions are briefly described in Sec. IV, and a sketch of the proof of the main result and associated lemmata appear in the Appendix.

Algorithm 1: Compressive distilled sensing (CDS)
Input: Number of observation steps $k$; resource allocation $\{R^j\}_{j=1}^{k}$ such that $\sum_{j=1}^{k} R^j \leq n$; measurement allocation $\{m_j\}_{j=1}^{k}$ such that $\sum_{j=1}^{k} m_j \leq m$;
Initialize: Initial index set $I^1 = \{1, 2, \ldots, n\}$;
Distillation: for $j = 1$ to $k$ do
  Compute $\tau^j = R^j / |I^j|$;
  Construct $A^j$, where
  $$A^j_{u,v} \overset{\mathrm{iid}}{\sim} \begin{cases} \mathcal{N}(0, \tau^j/m_j), & u \in \{1,\ldots,m_j\},\ v \in I^j, \\ 0, & u \in \{1,\ldots,m_j\},\ v \notin I^j; \end{cases}$$
  Collect $y^j = A^j x + w^j$;
  Compute $\hat{x}^j = (A^j)^T y^j$;
  Refine: $I^{j+1} = \{i \in I^j : \hat{x}^j_i > 0\}$;
end
Output: Distilled observations $\{y^j, A^j\}_{j=1}^{k}$;

II. COMPRESSIVE DISTILLED SENSING

In this section we describe the compressive distilled sensing (CDS) procedure, which is a natural generalization of the distilled sensing (DS) procedure [10], [11]. The CDS procedure, given in Algorithm 1, is an adaptive procedure comprised of an alternating sequence of sampling (or observation) steps and refinement (or distillation) steps, for which the observations are subject to a global budget of sensing resources, or sensing energy, that effectively quantifies the average measurement precision. The key point is that the adaptive nature of the procedure allows for sensing resources to be allocated nonuniformly; in particular, proportionally more of the resources can be devoted to subspaces of interest as they are identified.

In the $j$th sampling step, for $j = 1, \ldots, k$, we collect measurements only at locations of $x$ corresponding to indices in a set $I^j$, where $I^1 = \{1, \ldots, n\}$ initially. The $j$th refinement step, for $j = 1, \ldots, k-1$, identifies the set of locations $I^{j+1} \subseteq I^j$ for which the corresponding signal components are to be measured in step $j+1$. It is clear that, in order to leverage the benefit of adaptivity, the distillation step should have the property that $I^{j+1}$ contains most (or, ideally, all) of the indices in $I^j$ that correspond to true signal components. In addition, and perhaps more importantly, we also want the set $I^{j+1}$ to be significantly smaller than $I^j$, since in that case we can realize an SNR improvement from focusing our sensing resources into the appropriate subspace.

In the DS procedure examined in [10], [11], observations were in the form of noisy samples of $x$ at any location $i \in \{1, \ldots, n\}$ at each step $j$.
In that case, it was shown that a simple refinement operation, identifying all locations for which the corresponding observation exceeded a threshold, was sufficient to ensure that, with high probability, $I^{j+1}$ would contain most of the indices in $I^j$ corresponding to true signal components, but only about half of the remaining indices, even when the signal is very weak. On the other hand, here we utilize a compressive sensing observation model, where at each step the observations are in the form of a low-dimensional vector $y^j \in \mathbb{R}^{m_j}$ with $m_j \ll n$. In an attempt to mimic the uncompressed case, we propose a similar refinement step applied to the back-projection estimate $(A^j)^T y^j = \hat{x}^j \in \mathbb{R}^n$, which can essentially be thought of as one of many possible estimates or reconstructions of $x$ that can be obtained from $y^j$ and $A^j$. The results in the next section quantify the improvements that can be achieved using this approach.

III. MAIN RESULTS

To state our main results, we set the input parameters of Algorithm 1 as follows. Choose $\alpha \in (0, 1/3)$, let $b = (1-\alpha)/(1-2\alpha)$, and let $k = 1 + \lceil \log_b \log n \rceil$. Allocate sensing resources according to
$$R^j = \begin{cases} \alpha n \left( \dfrac{1-2\alpha}{1-\alpha} \right)^{j-1}, & j = 1, \ldots, k-1, \\ \alpha n, & j = k, \end{cases}$$
and note that this allocation guarantees that $R^{j+1}/R^j > 1/2$ and $\sum_{j=1}^{k} R^j \leq n$. The latter inequality ensures that the total sensing energy does not exceed the total sensing energy used in conventional CS. The number of measurements acquired in each step is
$$m_j = \begin{cases} \rho_0\, s \log n/(k-1), & j = 1, \ldots, k-1, \\ \rho_1\, s \log n, & j = k, \end{cases}$$
for some constants $\rho_0$ (which depends on the dynamic range) and $\rho_1$ sufficiently large so that the results of Lemma 1 hold. Note that $m = O(s \log n)$, the same order as the minimum number of measurements required by conventional CS.
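As a reference for how these pieces fit together, the following is a minimal numpy sketch of Algorithm 1 under the parameter choices of this section; the specific constants ($\alpha$, $\rho_0$, $\rho_1$), the demo signal, and the guard against an empty index set are illustrative assumptions rather than the paper's settings.

```python
# Minimal sketch of Algorithm 1 (CDS) with the Sec. III parameter choices.
# Constants and sizes are arbitrary demo values.
import numpy as np

def cds(x, sigma, alpha=0.25, rho0=4.0, rho1=4.0, seed=0):
    rng = np.random.default_rng(seed)
    n = x.size
    s = max(1, np.count_nonzero(x))
    b = (1 - alpha) / (1 - 2 * alpha)
    k = 1 + int(np.ceil(np.log(np.log(n)) / np.log(b)))   # k = 1 + ceil(log_b log n)

    I = np.arange(n)                                       # I^1 = {1, ..., n}
    for j in range(1, k + 1):
        last = (j == k)
        R = alpha * n if last else alpha * n * ((1 - 2 * alpha) / (1 - alpha)) ** (j - 1)
        m_j = int(rho1 * s * np.log(n)) if last else max(1, int(rho0 * s * np.log(n) / (k - 1)))
        if I.size == 0:                                    # demo-only guard
            break
        tau = R / I.size                                   # tau^j = R^j / |I^j|
        A = np.zeros((m_j, n))
        A[:, I] = rng.normal(0.0, np.sqrt(tau / m_j), (m_j, I.size))
        y = A @ x + rng.normal(0.0, sigma, m_j)            # y^j = A^j x + w^j
        xhat = A.T @ y                                     # back-projection estimate
        if not last:
            I = I[xhat[I] > 0]                             # refine: keep positive coordinates
    return y, A                                            # distilled final observations

# Usage: the final (y, A) pair can be handed to any CS reconstruction,
# e.g. the Dantzig selector, as in Theorem 1.
x = np.zeros(512)
x[:8] = 2.0
y_k, A_k = cds(x, sigma=0.1)
```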

Our main result of the paper, stated below and proved in the Appendix, quantifies the error performance of one particular estimate obtained from adaptive observations collected using the CDS procedure.

Theorem 1. Assume that $x \in \mathbb{R}^n$ is sparse with $s = n^{\beta/\log\log n}$ for some constant $0 < \beta < 1$. Furthermore, assume that each nonzero component of $x$ satisfies $\sigma\mu \leq x_i \leq D\sigma\mu$ for some $\mu > 0$. Here $\sigma$ is the noise standard deviation, $D > 1$ is the dynamic range of the signal, and $\mu^2$ is the SNR. Adaptively measure $x$ according to Algorithm 1 with the input parameters as specified above, and construct the estimator $\hat{x}_{\mathrm{CDS}}$ by applying the Dantzig selector with $\lambda = \Theta(\sigma)$ to the output of the algorithm, i.e., with $A = A^k$ and $y = y^k$.
1) There exists $\mu_0 = \Omega(\sqrt{\log n/\log\log n})$ such that if $\mu \geq \mu_0$, then $\|\hat{x}_{\mathrm{CDS}} - x\|_{\ell_2}^2 = O(s\sigma^2)$ with probability $1 - O(n^{-C_0/\log\log n})$ for some $C_0 > 0$.
2) There exists $\mu_1 = \Omega(\sqrt{\log\log\log n})$ such that if $\mu_1 \leq \mu < \mu_0$, then $\|\hat{x}_{\mathrm{CDS}} - x\|_{\ell_2}^2 = O(s\sigma^2)$ with probability $1 - O(e^{-C_1\mu^2})$ for some $C_1 > 0$.
3) If $\mu < \mu_1$, then $\|\hat{x}_{\mathrm{CDS}} - x\|_{\ell_2}^2 = O(s\sigma^2\log\log\log n)$ with probability $1 - O(n^{-C_2})$ for some $C_2 > 0$.

In words, when the SNR is sufficiently large, the estimate achieves the error performance of the oracle-assisted estimator, albeit with a slower (slightly sub-polynomial) convergence rate. For a class of slightly weaker signals, the oracle-assisted error performance is still achieved, but with a rate of convergence that is inversely proportional to the SNR. Note that we may summarize the results of the theorem with the general claim $\|\hat{x}_{\mathrm{CDS}} - x\|_{\ell_2}^2 = O(s\sigma^2\log\log\log n)$ with probability $1 - o(1)$. It is worth pointing out that for many problems of practical interest the $\log\log\log n$ term can be negligible, whereas $\log n$ is not; for example, $\log\log\log(10^6) < 1$, but $\log(10^6) \approx 14$.

IV. EXTENSIONS AND CONCLUSIONS

Although the CDS procedure was specified under the assumption that the nonzero signal components were positive, it can easily be extended to signals having negative entries as well. In that case, one could split the budget of sensing resources in half, executing the procedure once as written and again with the refinement step replaced by $I^{j+1} = \{i \in I^j : \hat{x}^j_i < 0\}$. In addition, the results presented here also apply if the signal is sparse in another basis. To implement the procedure in that case, one would generate the $A^j$ as above, but observations of $x$ would be obtained using $A^j T$, where $T \in \mathbb{R}^{n\times n}$ is an appropriate orthonormal transformation matrix (a discrete wavelet or cosine transform, for example). In either case the qualitative behavior is the same: observations are collected by projecting $x$ onto a superposition of basis elements from the appropriate basis.

We have shown here that the compressive distilled sensing procedure can significantly improve the theoretical performance of compressive sensing. In experiments (not shown here due to space limitations), we have found that CDS can perform significantly better than CS in practice, as can similar previously proposed adaptive methods [7]-[9]. We remark that our theoretical analysis shows that CDS is sensitive to the dynamic range of the signal. This is an artifact of the method used to obtain the signal estimate $\hat{x}^j$ at each step. As alluded to at the end of Section II, $\hat{x}^j$ could be obtained using any of a number of methods, including, for example, Dantzig selector estimation with a smaller value of $\lambda$, or other mixed-norm reconstruction techniques such as the LASSO with sufficiently small regularization parameters. Such extensions will be explored in future work.
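As a rough illustration of the change-of-basis extension described above, the following sketch assumes the sparse coefficient vector is $\theta = Tx$ with $T$ the orthonormal DCT matrix; the particular transform and the convention $\theta = Tx$ are assumptions made for this demo, not fixed by the paper. Right-multiplying a sensing matrix by $T$ then yields ordinary compressive measurements of $\theta$.

```python
# Sketch of the basis-change extension: if theta = T x is sparse for an
# orthonormal T (here an assumed DCT example), then measuring x with A T
# gives y = A (T x) = A theta, so CDS can run unchanged on theta.
import numpy as np
from scipy.fft import dct, idct

n, m = 256, 64
rng = np.random.default_rng(0)

theta = np.zeros(n)
theta[:4] = 3.0                                # sparse DCT coefficients
x = idct(theta, norm="ortho")                  # signal is dense in the canonical basis

T = dct(np.eye(n), axis=0, norm="ortho")       # orthonormal DCT matrix: T @ v == dct(v)
A = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))  # one CDS-style sensing matrix A^j
y = (A @ T) @ x                                # observe x through A^j T
assert np.allclose(y, A @ theta)               # identical to compressive samples of theta
```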
V. APPENDIX

A. Lemmata

We first establish several key lemmata that will be used in the sketch of the proof of the main result. In particular, the first two results presented below quantify the effects of each refinement step.

Lemma 2. Let $x \in \mathbb{R}^n$ be supported on $S$ with $|S| = s$, and let $x_S$ denote the subvector of $x$ composed of the entries of $x$ whose indices are in $S$. Let $A$ be an $m \times n$ matrix whose entries are i.i.d. $\mathcal{N}(0, \tau/m)$ for some $0 < \tau_{\min} \leq \tau$, and let $A_S$ and $A_{S^c}$ be the submatrices of $A$ composed of the columns of $A$ corresponding to the indices in the sets $S$ and $S^c$, respectively. Let $w \in \mathbb{R}^m$ be independent of $A$ and have i.i.d. $\mathcal{N}(0, \sigma^2)$ entries. For the $z \times 1$ vector
$$U = A_{S^c}^T A_S x_S + A_{S^c}^T w,$$
where $z = |S^c| = n - s$, we have
$$(1/2 - \epsilon)\, z \;\leq\; \sum_{i=1}^{z} 1_{\{U_i > 0\}} \;\leq\; (1/2 + \epsilon)\, z$$
for any $\epsilon \in (0, 1/2)$, with probability at least $1 - 2\exp(-2\epsilon^2 z)$.

Proof: Define $Y = Ax + w = A_S x_S + w$, and note that, given $Y$, the entries of $U = A_{S^c}^T Y$ are i.i.d. $\mathcal{N}(0, \|Y\|_{\ell_2}^2\, \tau/m)$. Thus, when $Y \neq 0$, we have $\Pr(U_i > 0) = 1/2$ for all $i = 1, \ldots, z$. Let $T_i = 1_{\{U_i > 0\}}$ and apply Hoeffding's inequality to obtain, for any $\epsilon \in (0, 1/2)$,
$$\Pr\left( \left| \sum_{i=1}^{z} T_i - \frac{z}{2} \right| > \epsilon z \;\Big|\; Y \right) \leq 2\exp(-2\epsilon^2 z), \qquad Y \neq 0.$$
Now we integrate to obtain
$$\Pr\left( \left| \sum_{i=1}^{z} T_i - \frac{z}{2} \right| > \epsilon z \right) \leq \int_{Y:\, Y \neq 0} 2\exp(-2\epsilon^2 z)\, dP_Y + \int_{Y:\, Y = 0} 1\, dP_Y \leq 2\exp(-2\epsilon^2 z).$$
The last step follows from the fact that the event $Y = 0$ has probability zero, since $Y$ is Gaussian-distributed.
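A small Monte Carlo check of Lemma 2 (not part of the paper's argument; the sizes below are arbitrary): conditioned on $Y$, the off-support back-projection coordinates are symmetric about zero, so thresholding at zero retains roughly half of them.

```python
# Monte Carlo illustration of Lemma 2: off-support entries of A^T(A_S x_S + w)
# are positive with probability 1/2, so the refinement step discards about
# half of the off-support indices. Demo values only.
import numpy as np

n, s, m, tau, sigma = 2000, 10, 200, 1.0, 0.5
rng = np.random.default_rng(0)

x = np.zeros(n)
x[:s] = 1.0                                    # support S = {0, ..., s-1}
A = rng.normal(0.0, np.sqrt(tau / m), (m, n))  # entries i.i.d. N(0, tau/m)
y = A @ x + rng.normal(0.0, sigma, m)
U = A[:, s:].T @ y                             # back-projection restricted to S^c

print("fraction of off-support coordinates kept:", float(np.mean(U > 0)))  # ~ 0.5
```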

Lemma 3. Let $x$, $x_S$, $A$, $A_S$, and $w$ be as defined in the previous lemma. Assume further that the entries of $x$ satisfy $\sigma\mu \leq x_i \leq D\sigma\mu$ for $i \in S$, for some $\mu > 0$ and fixed $D > 1$. Define
$$\tilde{\gamma} = \exp\left( -\frac{m}{32\left( sD^2 + m\mu^{-2} \right)/\tau_{\min}} \right) < 1.$$
Then, for the $s \times 1$ vector $V = A_S^T A_S x_S + A_S^T w$, either of the following bounds is valid:
$$\Pr\left( \sum_{i=1}^{s} 1_{\{V_i > 0\}} \neq s \right) \leq 2 s \tilde{\gamma}^2,$$
or
$$\Pr\left( \sum_{i=1}^{s} 1_{\{V_i > 0\}} < s(1 - 3\tilde{\gamma}) \right) \leq 4\tilde{\gamma}.$$

Proof: Given $A_i$, the $i$th column of $A$, we have
$$V_i \sim \mathcal{N}\left( \|A_i\|_{\ell_2}^2\, x_i,\; \|A_i\|_{\ell_2}^2 \left( \frac{\tau}{m}\sum_{j=1,\, j\neq i}^{s} x_j^2 + \sigma^2 \right) \right),$$
and so, by a standard Gaussian tail bound,
$$\Pr(V_i \leq 0 \mid A_i) = \Pr\left( \mathcal{N}(0,1) > \frac{\|A_i\|_{\ell_2}\, x_i}{\sqrt{\frac{\tau}{m}\sum_{j\neq i} x_j^2 + \sigma^2}} \right) \leq \frac{1}{2}\exp\left( -\frac{\|A_i\|_{\ell_2}^2\, x_i^2}{2\left( \tau\|x\|^2/m + \sigma^2 \right)} \right).$$
Now we can leverage a result on the tails of a chi-squared random variable from [12] to obtain that, for any $\gamma \in (0,1)$, $\Pr\left( \|A_i\|^2 \leq (1-\gamma)\tau \right) \leq \exp(-m\gamma^2/4)$. Again we employ conditioning to obtain
$$\Pr(V_i \leq 0) \leq \int_{A_i:\, \|A_i\|^2 \leq (1-\gamma)\tau} 1\, dP_{A_i} + \int_{A_i:\, \|A_i\|^2 > (1-\gamma)\tau} \Pr(V_i \leq 0 \mid A_i)\, dP_{A_i}$$
$$\leq \exp\left( -\frac{m\gamma^2}{4} \right) + \exp\left( -\frac{\tau(1-\gamma)\, x_i^2}{2\left( \tau\|x\|^2/m + \sigma^2 \right)} \right) \leq \exp\left( -\frac{m\gamma^2}{4} \right) + \exp\left( -\frac{\tau(1-\gamma)\mu^2}{2\left( \tau s D^2\mu^2/m + 1 \right)} \right),$$
where the last bound follows from the conditions on the $x_i$. Now, to simplify, we choose $\gamma = \gamma^* \in (0,1)$ to balance the two terms, obtaining
$$\gamma^* = \left( sD^2 + \frac{m}{\tau\mu^2} \right)^{-1}\left( \sqrt{1 + 2\left( sD^2 + \frac{m}{\tau\mu^2} \right)} - 1 \right).$$
Using the fact that $(\sqrt{1+2t} - 1)/t > 1/\sqrt{2t}$ for $t > 1$, we can conclude that $\gamma^* > \left( 2\left( sD^2 + m/(\tau\mu^2) \right) \right)^{-1/2}$, since $s > 1$ by assumption. Now, using the fact that $\tau \geq \tau_{\min}$, we have that $\Pr(V_i \leq 0) \leq 2\tilde{\gamma}^2$, where $\tilde{\gamma} = \exp\left( -m/\left( 32\left( sD^2 + m\mu^{-2} \right)/\tau_{\min} \right) \right)$.

The first result follows from
$$\Pr\left( \sum_{i=1}^{s} 1_{\{V_i > 0\}} \neq s \right) = \Pr\left( \bigcup_{i=1}^{s} \{V_i \leq 0\} \right) \leq s \max_{i \in \{1,\ldots,s\}} \Pr(V_i \leq 0) \leq 2 s \tilde{\gamma}^2.$$
For the second result, let us simplify notation by introducing the variables $T_i = 1_{\{V_i > 0\}}$ and $t_i = \mathbb{E}[T_i]$. By Markov's inequality we have
$$\Pr\left( \left| \sum_{i=1}^{s} T_i - \sum_{i=1}^{s} t_i \right| > p \right) \leq p^{-1}\, \mathbb{E}\left| \sum_{i=1}^{s} (T_i - t_i) \right| \leq p^{-1} \sum_{i=1}^{s} \mathbb{E}\left| T_i - t_i \right| \leq p^{-1}\, s \max_{i \in \{1,\ldots,s\}} \mathbb{E}\left| T_i - t_i \right|.$$
Now note that
$$|T_i - t_i| = \begin{cases} 1 - \Pr(V_i > 0), & V_i > 0, \\ \Pr(V_i > 0), & V_i \leq 0, \end{cases}$$
and so $\mathbb{E}|T_i - t_i| \leq 2\Pr(V_i \leq 0)$. Thus we have that $\max_{i \in \{1,\ldots,s\}} \mathbb{E}|T_i - t_i| \leq 4\tilde{\gamma}^2$, and so
$$\Pr\left( \left| \sum_{i=1}^{s} T_i - \sum_{i=1}^{s} t_i \right| > p \right) \leq 4 p^{-1} s \tilde{\gamma}^2.$$
Now let $p = s\tilde{\gamma}$ to obtain $\Pr\left( \sum_{i} T_i < \sum_{i} t_i - s\tilde{\gamma} \right) \leq 4\tilde{\gamma}$. Since $t_i = 1 - \Pr(V_i \leq 0)$, we have $\sum_{i} t_i \geq s(1 - 2\tilde{\gamma}^2)$, and thus $\Pr\left( \sum_{i} T_i < s(1 - 2\tilde{\gamma}^2 - \tilde{\gamma}) \right) \leq 4\tilde{\gamma}$. The result follows from the fact that $2\tilde{\gamma}^2 + \tilde{\gamma} < 3\tilde{\gamma}$.

Lemma 4. For $0 < p < 1$ and $q > 0$, we have $(1-p)^q \geq 1 - qp/(1-p)$.

Proof: We have $\log\left( (1-p)^q \right) = q\log(1-p) = -q\log\left( 1 + p/(1-p) \right) \geq -qp/(1-p)$, where the last bound follows from the fact that $\log(1+t) \leq t$ for $t \geq 0$. Thus $(1-p)^q \geq \exp\left( -qp/(1-p) \right) \geq 1 - qp/(1-p)$, the last bound following from the fact that $e^t \geq 1+t$ for all $t \in \mathbb{R}$.

B. Sketch of Proof of Theorem 1

To establish the main results of the paper, we will first show that the final set of observations of the CDS procedure is, with high probability, equivalent in distribution to a set of observations of the form (1), but with different parameters (a smaller effective dimension $n_{\mathrm{eff}}$ and effective noise power $\sigma_{\mathrm{eff}}^2$), and for which some fraction of the original signal components may be absent. To that end, let $S^j = S \cap I^j$ and $Z^j = S^c \cap I^j$, for $j = 1, \ldots, k$, denote the subsets of indices of $S$ and its complement, respectively, that remain to be measured in step $j$. Note that at each step of the procedure the back-projection estimate $\hat{x}^j = (A^j)^T A^j x + (A^j)^T w^j$ can be decomposed into
$$\hat{x}^j_{S^j} = (A^j_{S^j})^T A^j_{S^j} x_{S^j} + (A^j_{S^j})^T w^j \quad \text{and} \quad \hat{x}^j_{Z^j} = (A^j_{Z^j})^T A^j_{S^j} x_{S^j} + (A^j_{Z^j})^T w^j,$$
and that these subvectors are precisely of the form specified in the conditions of Lemmas 2 and 3.

Let $z^j = |Z^j|$ and $s^j = |S^j|$, and in particular note that $s^1 = s$ and $z^1 = z = n - s$. Choose the parameters of the CDS algorithm as specified in Section III. Iteratively applying Lemma 2, we have that, for any fixed $\epsilon \in (0, 1/2)$, the bounds
$$(1/2 - \epsilon)^{j-1} z \;\leq\; z^j \;\leq\; (1/2 + \epsilon)^{j-1} z$$
hold simultaneously for all $j = 1, 2, \ldots, k$ with probability at least $1 - 2(k-1)\exp\left( -2z\epsilon^2 (1/2 - \epsilon)^{k-2} \right)$, which is no less than $1 - O\left( \exp(-c_0\, n/\log^{c_1} n) \right)$ for some constants $c_0 > 0$ and $c_1 > 0$, for $n$ sufficiently large.² As a result, with the same probability, the total number of locations in the set $I^j$ satisfies $|I^j| \leq s^1 + z^1 (1/2 + \epsilon)^{j-1}$ for all $j = 1, 2, \ldots, k$. Thus we can lower bound $\tau^j = R^j/|I^j|$ at each step by
$$\tau^j \geq \begin{cases} \dfrac{\alpha n \left( (1-2\alpha)/(1-\alpha) \right)^{j-1}}{s + z\left( (1+2\epsilon)/2 \right)^{j-1}}, & j = 1, \ldots, k-1, \\[2ex] \dfrac{\alpha n}{s + z\left( (1+2\epsilon)/2 \right)^{j-1}}, & j = k. \end{cases}$$
Now note that, when $n$ is sufficiently large,³ we have $s \leq z(1/2 + \epsilon)^{j-1}$ for all $j = 1, \ldots, k$. Letting $\epsilon = (1-3\alpha)/(2-2\alpha)$, we can simplify the bounds on $\tau^j$ to obtain $\tau^j \geq \alpha/2$ for $j = 1, \ldots, k-1$ and $\tau^k \geq \alpha\log n/2$.

² In particular, we require $n \geq c_0 (\log\log\log n)(\log n)^{c_1}/\left( 1 - n^{c_2/\log\log n - 1} \right)$, where $c_0$, $c_1$, and $c_2$ are positive functions of $\epsilon$ and $\beta$.
³ In particular, we require $n \geq (1 + \log n)^{\log\log n/(\log\log n - \beta)}$.

The salient point to note here is the value of $\tau^k$, and in particular its dependence on the signal dimension $n$. This essentially follows from the fact that the set of indices to measure decreases by a fixed factor with each distillation step, and so after $O(\log\log n)$ steps the number of indices to measure is smaller than in the initial step by a factor of about $\log n$. Thus, for the same allocation of resources ($R^1 = R^k$), the SNR of the final set of observations is larger than that of the first set by a factor of $\log n$.

Now, the final set of observations is $y^k = A^k x^k + w^k$, where $x^k \in \mathbb{R}^{n_{\mathrm{eff}}}$ for some $n_{\mathrm{eff}} < n$ is supported on the set $S^k = S \cap I^k$, $A^k$ is an $m_k \times n_{\mathrm{eff}}$ matrix, and the $w_i$ are i.i.d. $\mathcal{N}(0, \sigma^2)$. We can divide throughout by $\sqrt{\tau^k}$ to obtain the equivalent statement $\tilde{y} = \tilde{A} x^k + \tilde{w}$, where now the entries of $\tilde{A}$ are i.i.d. $\mathcal{N}(0, 1/m_k)$ and the $\tilde{w}_i$ are i.i.d. $\mathcal{N}(0, \tilde{\sigma}^2)$, where $\tilde{\sigma}^2 \leq 2\sigma^2/(\alpha\log n)$.

To bound the overall squared error, we consider the variance associated with estimating the components of $x$ using the Dantzig selector (cf. Lemma 1), as well as the squared bias arising from the fact that some signal components may not be present in the final support set $S^k$. In particular, a bound for the overall error is given by
$$\|\hat{x} - x\|_{\ell_2}^2 = \|\hat{x} - x^k + x^k - x\|_{\ell_2}^2 \leq 2\|\hat{x} - x^k\|_{\ell_2}^2 + 2\|x^k - x\|_{\ell_2}^2.$$
We can bound the first term by applying the result of Lemma 1 to obtain that, for $\rho_1$ sufficiently large, $\|\hat{x} - x^k\|_{\ell_2}^2 = O(s\tilde{\sigma}^2\log n) = O(s\sigma^2)$ holds with probability $1 - O(n^{-C_0})$ for some $C_0 > 0$. Now let $\delta = |S \setminus S^k|/s$ denote the fraction of true signal components that are rejected by the CDS procedure. Then we have $\|x^k - x\|_{\ell_2}^2 = O(s\sigma^2\delta\mu^2)$, and so overall we have $\|\hat{x} - x\|_{\ell_2}^2 = O(s\sigma^2 + s\sigma^2\delta\mu^2)$ with probability $1 - O(n^{-C_0})$. The method for bounding the second term in the error bound varies depending on the signal amplitude $\mu$; we consider three cases below.

1) $\mu \geq 8D\sqrt{(3/\alpha)\log n/\log\log n}$: Conditioned on the event that the stated lower bounds for $\tau^j$ are valid, we can iteratively apply Lemma 3, taking $\tau_{\min} = \alpha/2$. For $\rho_0 = 96D^2/\log b$, where $b$ is the parameter from the expression for $k$, let $m_j = \rho_0\, s\log n/\log_b\log n$. Then we obtain that, for all $n$ sufficiently large, $\delta = 0$ with probability at least $1 - O(n^{-C_0/\log\log n})$ for some constant $C_0 > 0$. Since this term governs the rate, we have overall that $\|\hat{x} - x\|_{\ell_2}^2 = O(s\sigma^2)$ holds with probability $1 - O(n^{-C_0/\log\log n})$, as claimed.

2) $16\sqrt{(2/\alpha)(\log b)(\log\log\log n)} \leq \mu < 8D\sqrt{(3/\alpha)\log n/\log\log n}$: For this range of signal amplitudes we will need to control $\delta$ explicitly. Conditioned on the event that the lower bounds for $\tau^j$ hold, we iteratively apply Lemma 3, where for $\rho_0 = 96D^2/\log b$ we let $m_j = \rho_0\, s\log n/\log_b\log n$. Now we invoke Lemma 4 to obtain that, for $n$ sufficiently large, $\delta \leq 1 - (1 - 3\tilde{\gamma})^{k-1} = O(e^{-C_1\mu^2})$ with probability at least $1 - O(e^{-C_1\mu^2})$ for some $C_1 > 0$. It follows that $\delta\mu^2$ is $O(1)$, and so overall $\|\hat{x} - x\|_{\ell_2}^2 = O(s\sigma^2)$ with probability $1 - O(e^{-C_1\mu^2})$.

3) $\mu < 16\sqrt{(2/\alpha)(\log b)(\log\log\log n)}$: Invoking the trivial bound $\delta \leq 1$, it follows from the above that, for $n$ sufficiently large, the error satisfies $\|\hat{x} - x\|_{\ell_2}^2 = O(s\sigma^2\log\log\log n)$ with probability $1 - O(n^{-C_2})$ for some constant $C_2 > 0$, as claimed.

REFERENCES

[1] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006.
[2] D. L. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289-1306, Apr. 2006.
[3] E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" IEEE Trans. Inform. Theory, vol. 52, no. 12, pp. 5406-5425, Dec. 2006.
[4] R. Baraniuk, M. Davenport, R. A. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constructive Approximation, 2008.
[5] J. Haupt and R. Nowak, "Signal reconstruction from noisy random projections," IEEE Trans. Inform. Theory, vol. 52, no. 9, pp. 4036-4048, Sept. 2006.
[6] E. J. Candès and T. Tao, "The Dantzig selector: Statistical estimation when p is much larger than n," Ann. Statist., vol. 35, no. 6, pp. 2313-2351, Dec. 2007.
[7] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Trans. Signal Processing, vol. 56, no. 6, pp. 2346-2356, June 2008.
[8] R. Castro, J. Haupt, R. Nowak, and G. Raz, "Finding needles in noisy haystacks," in Proc. IEEE Conf. Acoustics, Speech and Signal Proc., Honolulu, HI, Apr. 2008, pp. 5133-5136.
[9] J. Haupt, R. Castro, and R. Nowak, "Adaptive sensing for sparse signal recovery," in Proc. IEEE 13th Digital Sig. Proc./5th Sig. Proc. Education Workshop, Marco Island, FL, Jan. 2009, pp. 702-707.
[10] J. Haupt, R. Castro, and R. Nowak, "Adaptive discovery of sparse signals in noise," in Proc. 42nd Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, Oct. 2008, pp. 1727-1731.
[11] J. Haupt, R. Castro, and R. Nowak, "Distilled sensing: Selective sampling for sparse signal recovery," in Proc. 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater Beach, FL, Apr. 2009, pp. 216-223.
[12] B. Laurent and P. Massart, "Adaptive estimation of a quadratic functional by model selection," Ann. Statist., vol. 28, no. 5, pp. 1302-1338, Oct. 2000.