Performance Analysis for Sparse Support Recovery
Gongguo Tang and Arye Nehorai
ESE, Washington University
April 21st, 2009
Outline
1 Background and Motivation
2 Research Overview
3 Mathematical Model
4 Theoretical Analysis
5 Conclusions
6 Future Work
Background and Motivation
Background: Basic Concepts and Notations

Sparse signals are signals that have only a few nonzero components under a common basis/dictionary. The set of indices corresponding to the nonzero components is called the support of the signal. If several sparse signals share a common support, we call them jointly sparse.

Sparse signal support recovery aims at identifying the true support of jointly sparse signals from their noisy linear measurements.

Suppose S is an index set. For a vector x ∈ F^N, x_S denotes the vector formed by the components of x indexed by S; for a matrix A ∈ F^{M×N}, A_S denotes the matrix formed by the columns of A indexed by S.
Background: Review of Compressive Sensing

Long-established paradigm for digital data acquisition:
sample data at the Nyquist rate (2x bandwidth)
compress data (signal-dependent, nonlinear)
brick wall to resolution/performance

This slide is adapted from R. Baraniuk, J. Romberg, and M. Wakin's "Tutorial on Compressive Sensing".
"Why go to so much e ort to acquire all the data when most of what we get will be thrown away? Can t we just directly measure the part that won t end up being thrown away?" David L. Donoho Gongguo Tang and Arye Nehorai (Institute) Performance Analysis for Sparse Support Recovery April 21st 2009 6 / 41
Background: Review of Compressive Sensing

Directly acquire compressed data: replace N samples by M more general measurements, with K < M ≪ N.

This slide is adapted from R. Baraniuk, J. Romberg, and M. Wakin's "Tutorial on Compressive Sensing".
Background: Review of Compressive Sensing

When data is sparse/compressible, we can directly acquire a condensed representation with no/little information loss. A random projection will work.

This slide is adapted from R. Baraniuk, J. Romberg, and M. Wakin's "Tutorial on Compressive Sensing".
Background: Previous Assumptions

When there is measurement noise, there are different criteria for measuring the recovery performance:
various ℓ_p norms E‖x̂ − x‖_p, especially ℓ_2 and ℓ_1
predictive power (e.g., E‖y − ŷ‖²₂, where ŷ is the estimate of y based on x̂)
0-1 loss associated with the event of recovering the correct support S

Assumptions on noise:
bounded noise
sparse noise
Gaussian noise
Background: Previous Assumptions

Assumptions on the sparse signal:
deterministic with unknown support but known component values
deterministic with unknown support and unknown component values
random with unknown support

Assumptions on the measurement matrix:
standard Gaussian ensemble
Bernoulli ensemble
random but with a structure, such as Toeplitz
deterministic
Motivation: Why Support Recovery?

The support of a sparse signal has physical significance:
the timing of events
the locations of objects or anomalies (Compressive Radar Imaging, Compressive Sensor Networks)
the frequency components (Compressive Spectrum Analysis)
the existence of certain substances such as chemicals and mRNAs (Compressed Sensing DNA Microarrays)
Motivation: Theoretical Consideration

After the support is recovered, the magnitudes of the nonzero components can be obtained by solving a least-squares problem.
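A minimal sketch of this step in Python/NumPy (the dimensions and noise level below are illustrative assumptions, not values from the talk): once the support S is known, the measurement matrix is restricted to the columns A_S and the component values are recovered by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 30, 100, 5                           # measurements, ambient dim, sparsity
A = rng.standard_normal((M, N))
S = np.sort(rng.choice(N, K, replace=False))   # true support
x = np.zeros(N)
x[S] = rng.standard_normal(K)
y = A @ x + 0.01 * rng.standard_normal(M)      # noisy linear measurements

# With S known, solve the overdetermined system y = A_S x_S in least squares.
x_hat = np.zeros(N)
x_hat[S], *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
print(np.max(np.abs(x_hat - x)))               # small reconstruction error
```

With M = 30 rows and only K = 5 active columns, A_S is well conditioned with high probability, so the least-squares step is stable.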
Motivation: Other Applications Leading to Support Recovery

Consider the parameter estimation problem associated with the following widely applied model,

y(t) = A(θ)x(t) + w(t), t = 1, ..., T,

where A(θ) = [φ(θ₁) φ(θ₂) ⋯ φ(θ_K)] and θ₁, θ₂, ..., θ_K are the true parameters. To solve this problem, we sample the parameter space at θ̃₁, θ̃₂, ..., θ̃_N and form

Ã(θ̃) = [φ(θ̃₁) φ(θ̃₂) ⋯ φ(θ̃_N)].

Define the vector x̃(t) by setting its components to those of x(t) at the locations corresponding to the true parameters and to zero otherwise. We have thus transformed a traditional parameter estimation problem into one of support recovery.
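To make the dictionary construction concrete, here is a minimal sketch (Python/NumPy) for frequency estimation, assuming φ(θ) is a complex sinusoid sampled at M time instants; the grid size and the two on-grid frequencies are hypothetical choices, not from the talk.

```python
import numpy as np

M, N = 64, 256
n = np.arange(M)
grid = np.linspace(0.0, 0.5, N, endpoint=False)   # sampled parameter space: theta~_1..theta~_N
A_tilde = np.exp(2j * np.pi * np.outer(n, grid))  # A~(theta~) = [phi(theta~_1) ... phi(theta~_N)]

# A 2-sparse x~ whose support encodes the two true (on-grid) frequencies
x_tilde = np.zeros(N, dtype=complex)
x_tilde[[40, 170]] = [1.0, 0.7]
y = A_tilde @ x_tilde                              # noiseless measurement, for illustration

# Recovering supp(x~) = {40, 170} recovers the parameters grid[40] and grid[170]
```

The point of the construction is that the continuous unknowns θ_k become discrete indices into the dictionary, so estimating them reduces to finding the support of x̃.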
Research Overview
Research Overview

Introduce hypothesis testing problems for sparse signal support recovery
Derive an upper bound on the probability of error (PoE) for a general measurement matrix
Study the effect of different parameters
Analyze the PoE for multiple hypothesis testing and its implications for system design
Mathematical Model
Mathematical Model: Measurement Model

We focus on the following model:

y(t) = Ax(t) + w(t), t = 1, ..., T,   (1)

or in matrix form,

Y = AX + W.

Here x(t) ∈ F^N, w(t) ∈ F^M, y(t) ∈ F^M with F = R or C. The matrices X, W, Y have columns {x(t)}, {w(t)}, {y(t)}, t = 1, ..., T, respectively. Our analysis involves a constant κ, which is 1/2 for F = R and 1 for F = C.

Generally M is the dimension of the hardware while T is the number of time samples. Hence increasing M is more expensive.
Mathematical Model: Assumptions on Signal and Noise

We make the following assumptions:
{x(t)} are jointly sparse signals with a common support S = supp(X).
{x_S(t)} are i.i.d. FN(0, I_K).
{w(t)} are i.i.d. FN(0, σ²I_M) and independent of {x(t)}.

Note that the noise variance σ² can be viewed as 1/SNR.
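A minimal simulation of this measurement model in Python/NumPy (real case F = R; the specific dimensions and noise variance are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K, T = 20, 50, 3, 10
sigma2 = 0.5

A = rng.standard_normal((M, N))                    # Gaussian measurement matrix
S = np.sort(rng.choice(N, K, replace=False))       # common support of size K
X = np.zeros((N, T))
X[S, :] = rng.standard_normal((K, T))              # x_S(t) i.i.d. N(0, I_K)
W = np.sqrt(sigma2) * rng.standard_normal((M, T))  # noise with variance sigma^2
Y = A @ X + W                                      # Y = AX + W
```

Each column of Y is one temporal snapshot y(t); the rows of X outside S are identically zero, which is exactly the joint sparsity assumption.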
Mathematical Model: Assumptions on the Measurement Matrix

We consider two types of measurement matrices:
1 Non-degenerate measurement matrix: a general measurement matrix A ∈ F^{M×N} is non-degenerate if every M × M submatrix of A is nonsingular.
2 Gaussian measurement matrix: the elements a_ij of A are i.i.d. FN(0, 1).
Mathematical Model: Hypothesis Testing

We focus on two hypothesis testing problems:
1 Binary hypothesis testing (BHT) with |S₀| = |S₁|:
H₀: supp(X) = S₀
H₁: supp(X) = S₁.
2 Multiple hypothesis testing (MHT):
H₁: supp(X) = S₁
...
H_L: supp(X) = S_L,
where the Sᵢ are candidate supports with the same cardinality |Sᵢ| = K.
Mathematical Model: Probability of Error

Our aim is to compute an accurate upper bound on the PoE and analyze the effect of M, T, and the noise variance σ². For BHT,

p_err(A) = (1/2) ∫_{H₁} Pr(Y|H₀) dY + (1/2) ∫_{H₀} Pr(Y|H₁) dY,

and for MHT,

p_err(A) = (1/L) Σᵢ₌₁ᴸ ∫_{H_j : j ≠ i} Pr(Y|Hᵢ) dY.
Theoretical Analysis
Theoretical Analysis: Optimal Decision Rule for BHT

With Y = AX + W, the BHT problem is equivalent to deciding between two distributions of Y:

Y|H₀ ~ FN_{M,T}(0, Σ₀ ⊗ I_T) or Y|H₁ ~ FN_{M,T}(0, Σ₁ ⊗ I_T),

where Σᵢ = σ²I_M + A_{Sᵢ}A*_{Sᵢ}. With equal prior probabilities for S₀ and S₁, the optimal decision rule is given by the likelihood ratio test f(Y|H₁)/f(Y|H₀) ≷ 1, equivalently:

decide H₁ if tr[Y*(Σ₁⁻¹ − Σ₀⁻¹)Y] < T log(|Σ₀|/|Σ₁|), and H₀ otherwise.
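A sketch of this likelihood ratio test in Python/NumPy (real-valued case; the dimensions, σ², and the two disjoint candidate supports are illustrative assumptions). It estimates the false-alarm probability by Monte Carlo, drawing Y under H₀:

```python
import numpy as np

rng = np.random.default_rng(2)
M, T, K, sigma2 = 20, 8, 3, 0.5
A = rng.standard_normal((M, 2 * K))
S0, S1 = np.arange(K), np.arange(K, 2 * K)     # disjoint candidate supports

def cov(S):
    As = A[:, S]
    return sigma2 * np.eye(M) + As @ As.T      # Sigma_i = sigma^2 I_M + A_S A_S^T

Sig0, Sig1 = cov(S0), cov(S1)
thresh = T * (np.linalg.slogdet(Sig0)[1] - np.linalg.slogdet(Sig1)[1])

def decide(Y):
    stat = np.trace(Y.T @ (np.linalg.inv(Sig1) - np.linalg.inv(Sig0)) @ Y)
    return 1 if stat < thresh else 0           # decide H1 when the statistic is small

# Monte-Carlo estimate of p_FA = Pr{decide H1 | H0}
trials = 500
L = np.linalg.cholesky(Sig0)                   # Y|H0 has column covariance Sigma_0
fa = sum(decide(L @ rng.standard_normal((M, T))) for _ in range(trials))
print(fa / trials)
```

Because the two supports are disjoint and the noise is moderate, the estimated p_FA comes out well below 1/2 here; the analysis that follows quantifies exactly how fast it decays.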
Theoretical Analysis: Calculation of PoE for BHT

By the symmetry of H₀ and H₁, it suffices to compute the probability of false alarm:

p_FA = Pr{H₁|H₀}
     = Pr{ tr[Y*(Σ₁⁻¹ − Σ₀⁻¹)Y] < T log(|Σ₀|/|Σ₁|) | H₀ }
     = Pr{ tr[Z*(Σ₀^{1/2}Σ₁⁻¹Σ₀^{1/2} − I_M)Z] < T log(|Σ₀|/|Σ₁|) | H₀ },

where Z = Σ₀^{−1/2}Y ~ FN(0, I_M ⊗ I_T). We define H = Σ₀^{1/2}Σ₁⁻¹Σ₀^{1/2} with Σᵢ = A_{Sᵢ}A*_{Sᵢ} + σ²I_M, which is a fundamental matrix in our analysis.
Theoretical Analysis: Calculation of PoE for BHT

Suppose the ordered eigenvalues of H are

σ₁ < σ₂ < ⋯ < σ_{k₁} < 1 = 1 = ⋯ = 1 < λ₁ < λ₂ < ⋯ < λ_{k₀},

and H is diagonalized by an orthogonal/unitary matrix Q. The transformation Z = QN then gives

p_FA = Pr{ Σᵢ₌₁^{k₀} (λᵢ − 1) Σₜ₌₁ᵀ |N_{it}|² − Σᵢ₌₁^{k₁} (1 − σᵢ) Σₜ₌₁ᵀ |N_{(i+k₀)t}|² < T log(|Σ₀|/|Σ₁|) | H₀ }.
Theoretical Analysis: Eigenvalue Structure of H

The eigenvalue structure of H, especially the eigenvalues greater than 1, determines the performance of the measurement matrix A in distinguishing between different supports. We study the structure of H in a slightly more general setting where the sizes of the two candidate supports may differ.

Problem
1 How many eigenvalues of H are less than 1, greater than 1, and equal to 1? Is there a general rule?
2 Can we give tight lower bounds on the eigenvalues greater than 1? The bounds should have a nice distribution that can be handled easily.
Theoretical Analysis: Eigenvalue Structure of H

[Figure: eigenvalues of H for M = 200, |S₀ ∩ S₁| = 20, |S₀\S₁| = 80, |S₁\S₀| = 60, with the elements of A i.i.d. real Gaussian.]
Theoretical Analysis: Eigenvalue Structure of H

Note that |S₁\S₀| = 60 eigenvalues of H are less than 1, |S₀\S₁| = 80 are greater than 1, and M − (|S₀\S₁| + |S₁\S₀|) = 60 are identical to 1.
Theoretical Analysis: Eigenvalue Structure of H

Theorem. Suppose kᵢ = |S₀ ∩ S₁|, k₀ = |S₀\S₁|, k₁ = |S₁\S₀|, and M > k₀ + k₁. For a general non-degenerate measurement matrix, k₀ eigenvalues of H are greater than 1, k₁ are less than 1, and M − (k₀ + k₁) are equal to 1.

Note that, from the bound we present later, √(∏ᵢ₌₁^{k₀} λᵢ ∏ᵢ₌₁^{k₁} (1/σᵢ)) determines the performance of the optimal BHT decision rule. Hence, generally and quite intuitively, the larger the difference set between S₀ and S₁, the easier it is to distinguish between the two candidate supports.
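The eigenvalue counts are easy to check numerically. A sketch in Python (NumPy/SciPy, illustrative sizes), using the fact that H = Σ₀^{1/2}Σ₁⁻¹Σ₀^{1/2} shares its spectrum with the symmetric-definite generalized eigenproblem Σ₀v = λΣ₁v:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
M, ki, k0, k1 = 40, 4, 8, 6                 # |S0 ∩ S1|, |S0\S1|, |S1\S0|
sigma2 = 1.0
A = rng.standard_normal((M, ki + k0 + k1))
S0 = np.arange(ki + k0)                     # overlap plus the part unique to S0
S1 = np.r_[np.arange(ki), np.arange(ki + k0, ki + k0 + k1)]

def cov(S):
    As = A[:, S]
    return sigma2 * np.eye(M) + As @ As.T   # Sigma_i = sigma^2 I_M + A_S A_S^T

# Generalized eigenvalues of (Sigma_0, Sigma_1) = eigenvalues of H
eig = eigh(cov(S0), cov(S1), eigvals_only=True)
print((eig > 1 + 1e-6).sum(),               # should be k0 = 8
      (eig < 1 - 1e-6).sum(),               # should be k1 = 6
      np.isclose(eig, 1).sum())             # should be M - k0 - k1 = 26
```

Using the generalized symmetric eigensolver rather than forming Σ₁⁻¹Σ₀ directly keeps the repeated unit eigenvalues numerically clean.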
Theoretical Analysis: Eigenvalue Structure of H

Theorem. For a Gaussian measurement matrix, the sorted eigenvalues of H that are greater than 1 are lower bounded, with probability one, by those of I_{k₀} + (1/σ²)V, where V is a matrix obtained from the measurement matrix A and V follows the Wishart distribution W_{k₀}(I_{k₀}, 2κ(M − k₁ − kᵢ)).

We comment that, generally, the larger M − k₁ − kᵢ = M − |S₁|, the larger the eigenvalues of I_{k₀} + (1/σ²)V, and hence the better we can distinguish the true support from the false one.
Theoretical Analysis: A Lower Bound on Eigenvalues

[Figure: M = 200, |S₀ ∩ S₁| = 20, |S₀\S₁| = 80, |S₁\S₀| = 60, σ² = 4, with the elements of A i.i.d. real Gaussian. The blue line shows the true sorted eigenvalues of H that are greater than 1; the red line shows the lower bound.]
Theoretical Analysis: Bound on PoE

Theorem. The probability of false alarm can be bounded as

p_FA = Pr(S₁|H₀) ≤ [ (λ_g(S₀, S₁)/4)^{k_d/2} (λ_g(S₁, S₀)/4)^{k_d/2} ]^{−κT},

where k_d = |S₀\S₁|, λ_g(S₀, S₁) = (∏ⱼ₌₁^{k_d} λⱼ)^{1/k_d} with the λⱼ the eigenvalues of

H = (A_{S₀}A*_{S₀} + σ²I_M)^{1/2} (A_{S₁}A*_{S₁} + σ²I_M)⁻¹ (A_{S₀}A*_{S₀} + σ²I_M)^{1/2}

that are greater than one, and λ_g(S₁, S₀) is defined symmetrically with the roles of S₀ and S₁ exchanged.
Theoretical Analysis: Implications of the Bound

The bound can be equivalently written as

( (∏ᵢ₌₁^{k_d} λᵢ ∏ᵢ₌₁^{k_d} (1/σᵢ))^{1/(2k_d)} / 4 )^{−κ k_d T},

with the λᵢ and σᵢ the eigenvalues of H that are greater than and less than 1, respectively. Hence these eigenvalues determine the system's ability to distinguish two supports. As we will see, the minimum over all λ_g(Sᵢ, Sⱼ) determines the system's ability to distinguish all candidate supports, and can be viewed as a measure of incoherence.

The logarithm of the bound can be approximated by

−κ k_d T [ (1/2) log(λ_g(S₀, S₁) λ_g(S₁, S₀)) − log 4 ].

Hence, if we can guarantee that λ_g(S₀, S₁) λ_g(S₁, S₀) of our measurement matrix is greater than some constant, then we can make p_FA arbitrarily small by taking more temporal samples.
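A numerical sketch of the bound (Python with NumPy/SciPy; the sizes, σ², and the two disjoint supports are illustrative assumptions). It computes λ_g(S₀, S₁) and λ_g(S₁, S₀) from the spectrum of H and evaluates the bound for increasing T:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
M, K, sigma2, kappa = 30, 4, 1.0, 0.5        # kappa = 1/2 for F = R
A = rng.standard_normal((M, 2 * K))
S0, S1 = np.arange(K), np.arange(K, 2 * K)   # disjoint supports, so k_d = K
kd = K

def cov(S):
    As = A[:, S]
    return sigma2 * np.eye(M) + As @ As.T

eig = eigh(cov(S0), cov(S1), eigvals_only=True)        # spectrum of H
lam_g01 = np.prod(eig[eig > 1 + 1e-9]) ** (1.0 / kd)   # geometric mean of eigenvalues > 1
lam_g10 = np.prod(1.0 / eig[eig < 1 - 1e-9]) ** (1.0 / kd)

def bound(T):
    # p_FA <= [ (lam_g01/4)^{kd/2} (lam_g10/4)^{kd/2} ]^{-kappa T}
    return ((lam_g01 / 4) ** (kd / 2) * (lam_g10 / 4) ** (kd / 2)) ** (-kappa * T)

for T in (1, 5, 20):
    print(T, bound(T))
```

Once λ_g(S₀, S₁)·λ_g(S₁, S₀) exceeds 16, the bound decays geometrically in T, which is the "take more temporal samples" message of the slide.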
Theoretical Analysis: Multiple Hypothesis Testing

We now turn to the MHT problem
H₁: supp(X) = S₁
...
H_L: supp(X) = S_L,
where the Sᵢ are candidate supports with the same cardinality |Sᵢ| = K and L = C(N, K), the total number of candidate supports of size K.
Theoretical Analysis: PoE for MHT

Theorem. Denote λ_min = min_{i≠j} λ_g(Sᵢ, Sⱼ). Then the total PoE for MHT can be bounded as

p_err ≤ C exp{ −κT [ log λ_min − log( 4 [K(N − K)]^{1/(κT)} ) ] }.
Theoretical Analysis: Multiple Hypothesis Testing

Theorem. For T = O( log N / log[K log(N/K)] ) and M = O(K log(N/K)), as N, K, M → ∞,

Pr{ λ_min > 4 [K(N − K)]^{1/(κT)} } → 1.
Theoretical Analysis: Discussion

M = O(K log(N/K)) is the same as in conventional compressive sensing.

We need MT samples in total. When K is sufficiently small compared with N, this value is still much smaller than N.

The required value of T is actually not very large. For example, for N = 10¹⁰⁰ and K = 10⁵, we have log N / log[K log(N/K)] ≈ 13; for N = 10¹⁰⁰ and K = 10⁹⁸, we have log N / log[K log(N/K)] ≈ 1.

After we recover the support, we can get the component values by solving a least-squares problem.
Theoretical Analysis: Implications of the Bound

In practice, given N and K, we take M = O(K log(N/K)) and T = O( log N / log[K log(N/K)] ), and generate the measurement matrix A. Then with large probability we get λ_min > 4 [K(N − K)]^{1/(κT)}.

To be safe, we can:
compute λ_min and find T large enough that λ_min > 4 [K(N − K)]^{1/(κT)}
continue to increase T so that p_err < α.
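The "find T large enough" step is a simple search. A sketch in Python (the values of λ_min, K, and N below are hypothetical; κ = 1/2 for the real case). Note the search can only terminate when λ_min > 4, since the right-hand side decreases toward 4 as T grows:

```python
def min_T(lam_min, K, N, kappa=0.5):
    """Smallest integer T with lam_min > 4 * (K * (N - K)) ** (1 / (kappa * T)).

    Requires lam_min > 4; otherwise no finite T satisfies the condition.
    """
    if lam_min <= 4:
        raise ValueError("lam_min must exceed 4")
    T = 1
    while 4 * (K * (N - K)) ** (1 / (kappa * T)) >= lam_min:
        T += 1
    return T

print(min_T(lam_min=8.0, K=10, N=1000))
```

After this T is reached, one keeps increasing T until the PoE bound of the MHT theorem drops below the desired level α.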
Conclusions

Hypothesis testing for sparse signal support recovery: BHT and MHT
Bound on the PoE for a non-degenerate measurement matrix
The behavior of the critical quantity
Implications for system design
Another dimension of data collection (the temporal one) gives us more flexibility
Future Work

Design a measurement system with optimal λ_min
Establish a necessary condition on M and T
Analyze the behavior of λ_g(S₀, S₁) and λ_min for other measurement matrix structures
Devise an efficient algorithm for support recovery and compare its performance with the optimal one; analyze the performance of the ℓ₁-minimization algorithm
Develop an algorithm to compute λ_min for a given measurement matrix
Explore the relationship between λ_min and the Restricted Isometry Property (RIP)
Apply these results to the design of transmitted signals in Compressive Radar Imaging
Thank you!