Recovering Block-structured Activations Using Compressive Measurements


Sivaraman Balakrishnan, Mladen Kolar, Alessandro Rinaldo, and Aarti Singh*

Abstract

We consider the problems of detection and support recovery of a contiguous block of weak activation in a large matrix, from a small number of noisy, possibly adaptive, compressive (linear) measurements. We characterize the tradeoffs between the various problem dimensions, the signal strength and the number of measurements required to reliably detect and recover the support of the signal. In each case sufficient conditions are complemented with information-theoretic lower bounds. This is closely related to the problem of (noisy) compressed sensing, where the analogous task is to detect or recover the support of a sparse vector using a small number of linear measurements. In compressed sensing, it has been shown that, at least in a minimax sense, for both detection and support recovery, adaptivity and contiguous structure only reduce signal strength requirements by logarithmic factors. By contrast, we show that while for detection neither adaptivity nor structure reduces the signal strength requirement, for support recovery the signal strength requirement is strongly influenced by both structure and the ability to choose measurements adaptively.

Keywords: Adaptive procedures, compressive measurements, large average submatrix, signal detection, signal localization, structured Normal means

1 Introduction

In this paper, we consider the problems of detecting the presence of, and estimating the support of, a small contiguous block of positive signal that is embedded in a large data matrix, given access to noisy compressive measurements, that is, linear combinations of the matrix entries corrupted with noise. Statistical inference leveraging large structured matrices is one of the central themes in several applications such as matrix completion, community detection and multivariate linear regression. Data matrices with sparse, contiguous block-structured signal components form an idealized model for signals arising in several real-world applications. Consider, for instance, the expression pattern resulting from genes, grouped by pathways, expressed under the influence of drugs, grouped by similarity. The block structure in this case arises from the fact that groups of genes (belonging to common pathways, say) are co-expressed under

*Department of Statistics, UC Berkeley; e-mail: sbalakri@cs.cmu.edu. Machine Learning Department, Carnegie Mellon University; e-mail: mladenk@cs.cmu.edu. Department of Statistics, Carnegie Mellon University; e-mail: arinaldo@stat.cmu.edu. Machine Learning Department, Carnegie Mellon University; e-mail: aarti@cs.cmu.edu.

the influence of sets of similar drugs (Yoon et al., 2005). Similarly, the automated detection of particular artificial objects in natural satellite images is an important task in computer vision (Caron et al., 2002). The block-structured model can be seen as an idealization of a model for vehicles or buildings in natural images. The block structure in this case is a consequence of the geometry of the artificial objects. More generally, the task of detecting and localizing groups of anomalies in networks has a variety of applications in environmental monitoring using sensor networks, as detailed in the work of Arias-Castro et al. (2011a). The block-structured model we consider is an idealization of the so-called thick cluster model considered in that work, and it can be expected that several of our results and techniques can be extended to handle this more flexible notion of structure. Indeed, in work that followed up on the initial posting of this work, Krishnamurthy et al. (2013) extend our results to the more general setting of graph-structured signals, although at the expense of the optimal results in the setting we consider.

In several applications, it is often difficult to measure, compute or store all the entries of the data matrix. For example, acquiring and storing high-resolution images is expensive, while measuring expression levels of all genes under all possible drugs is time-consuming and costly. However, if we have access to linear combinations of matrix entries (that is, compressive measurements), such as those obtained from compressive imaging cameras (Wakin et al., 2006) or from the combined expression of multiple genes under the influence of multiple drugs (see, for example, Kainkaryam et al., 2010, Kainkaryam and Woolf, 2009), then we might need to make and store only a few such measurements, while still being able to infer the existence or estimate the position of the block of signal in the data matrix. In each of these cases the goal is to detect or recover the signal block, corresponding to a set of co-expressed genes and drugs or to the vehicles/buildings, using only a few compressive measurements of the data matrix.

In this work we consider both passive (non-adaptive) and active (adaptive) compressive measurements. The non-adaptive measurements are random or pre-specified linear combinations of matrix entries. The active measurement schemes are adaptive and use feedback from previously taken measurements to sequentially design linear combinations that are more informative.

Related work. In the last few years, there has been a flurry of interest in the use of compressive measurements, primarily for inferring sparse vectors. Specifically, several papers, including Candès and Tao (2006), Donoho (2006), Candès and Tao (2007), Candès and Wakin (2008), have shown that it is possible to recover, in an ℓ_2 sense, a k-sparse vector in n dimensions using only O(k log n) incoherent compressive measurements, instead of measuring all of the n coordinates. Along with ℓ_2 recovery, researchers have also considered the problems of detection (Duarte et al., 2006, Haupt and Nowak, 2007, Arias-Castro, 2012, Arias-Castro et al., 2011b, Ingster et al., 2010) and support recovery (Wainwright, 2009a,b, Aeron et al., 2010, Reeves and Gastpar, 2012) of the non-zero entries of a sparse vector from non-adaptive compressive measurements. More recently, researchers have analyzed the possibility of taking adaptive compressive measurements, that is, where subsequent measurements are designed based on past observations (Candès and Davenport, 2013, Arias-Castro et al., 2013, Haupt et al., 2009).
For example, Haupt et al. (2009) demonstrate upper bounds on the ℓ_2 error for recovering a sparse vector using an adaptive procedure called compressive distilled sensing.

Malloy and Nowak (2012b) show that a compressive version of standard binary search achieves minimax performance for the localization of a one-sparse vector, and Malloy and Nowak (2012a) extend the binary search procedure to identifying the support of k-sparse vectors. In Arias-Castro et al. (2013), the authors show that the improvements adaptive compressive schemes offer over passive schemes are, in terms of ℓ_2 recovery and support recovery, limited to a log factor. Arias-Castro (2012) further extends these results to the problem of detection from adaptive compressive measurements.

In order to compare with the results we present in this work, Table 1 summarizes some results for the detection and exact support recovery of a sparse vector using passive and active compressive measurements. In order to avoid clutter, in each case we only give sufficient conditions. The corresponding references detail all the assumptions on the measurement matrices (for instance, the result of Wainwright (2009b) applies to Gaussian ensembles). In the table, we follow the standardization used throughout this paper (and various others on this topic): the length of the vector is n and the number of non-zero coordinates in the vector is k. The number of measurements is m, and each measurement is assumed to have unit ℓ_2 norm (in expectation, if the measurement is random).

Table 1: Summary of known results for the sparse vector case. µ_min represents the minimum (absolute) signal amplitude over all non-zero coordinates, and σ represents the standard deviation of the noise. C is a universal constant independent of the problem parameters.

  Detection, passive:     µ_min ≥ Cσ √(n/(m k²)) for positive signals;
                          µ_min ≥ Cσ √(n/(m k)) for arbitrary signals.   Arias-Castro (2012)
  Detection, active:      µ_min ≥ Cσ √(n/(m k²)).                        Arias-Castro (2012)
  Localization, passive:  m ≥ C k log(n/k) and µ_min ≥ Cσ √(n log n / m). Wainwright (2009b)
  Localization, active:   m ≥ C k log(n/k) and µ_min ≥ Cσ √(n log k / m). Arias-Castro et al. (2013); Malloy and Nowak (2012a)

Almost all of the prior work described so far has focused on inference for sparse unstructured data vectors from (passive or adaptive) compressed measurements. In this paper, we focus on the relatively unexplored problems of detection and support recovery for structured data matrices from compressive measurements. We are concerned with signals that are both sparse and highly structured, taking the form of a contiguous sub-matrix within a larger noisy matrix. If the signal components are unstructured, the treatment of data matrices is exactly equivalent to the treatment of data vectors. However, in the structured case, it is clear that some notions of structure are more natural when considered in the context of data matrices rather than data vectors. Specifically, the contiguous block signals considered in this work can (somewhat unnaturally) be viewed as arising from a particular collection of structured signals on vectors. The works of Baraniuk et al. (2010) and Huang et al. (2011) have analyzed different forms of structured sparsity in the vector setting, for example, when the non-zero locations in a data vector form non-overlapping or partially-overlapping groups. These works do not, however, provide sharp results when specialized to the setting we consider, and do not address the adaptive setting. Castro (2012) and Tánczos and Castro (2013) study support recovery of unstructured and structured signals from adaptive measurements when the signal is observed directly.

Soni and Haupt (2011) consider the case of adaptive sensing of tree-structured signals. The class of tree-structured signals is quite different from the class of signals we consider, and the gains from adaptive sensing for tree-structured signals are relatively limited (to a log factor, as in the unstructured setting).

Also related is the recent work on the recovery of low-rank matrices (see, for example, Negahban and Wainwright, 2011, Koltchinskii et al., 2011). Indeed, our analysis for support recovery in the passive setting (see Theorem 3) is inspired by the restricted strong convexity analysis of Negahban and Wainwright (2011). The signals under consideration in this paper are a special case of low-rank and sparse matrices. Richard et al. (2012) study the recovery of low-rank and sparse matrices from direct observations, while Golbabaee and Vandergheynst (2012) study recovery from compressive measurements. Each of these results focuses on recovery in Frobenius norm from passive measurements, while our focus is on detection and signal support recovery from possibly active measurements. Furthermore, the additional structure of block-structured signals is crucially leveraged in our analysis and by our estimators, and specializing these earlier results does not give sharp results even in the passive setting. Results applicable to the sparse and low-rank setting do, however, extend to the setup where there is a non-contiguous sub-matrix or block, also considered in Sun and Nobel (2013), Butucea and Ingster (2013), Butucea et al. (2013), Bhamidi et al. (2012) and Kolar et al. (2011) in the case of direct measurements. This setting is beyond the scope of our paper.

Summary of our contributions. We establish lower bounds on the minimum number of compressive measurements and the minimum (non-zero) signal amplitude needed to detect the presence of a block of positive signal, as well as to recover the support of the signal block, using both non-adaptive and adaptive measurements. We also establish matching upper bounds for procedures that guarantee consistent detection and support recovery of weak block-structured signals using few non-adaptive and adaptive compressive measurements. Our results indicate that adaptivity and structure play a key role and that significant improvements are possible over non-adaptive and unstructured settings. This is unlike the vector case where, at least in the settings analogous to the ones we consider here, contiguous structure and adaptivity have been shown to provide minor, if any, improvement.

In our setting we take compressive measurements of a data matrix of size (n_1 × n_2); the block containing the signal is of size (k_1 × k_2), with minimum signal amplitude µ_min in each coordinate of the (k_1 × k_2) block. The measurements are corrupted with Gaussian noise with standard deviation σ, and we have a budget of m compressive measurements, with each measurement matrix constrained to have unit Frobenius norm. Table 2 describes our main findings. In this table we describe both necessary and sufficient conditions. To reduce clutter in the expressions we use n_1 = n_2 =: n and k_1 = k_2 =: k.

For detection, akin to the vector setting, structure and adaptivity play no role. The structured data matrix setting requires the amplitude of the signal to scale as σ n/(k² √m) for both the non-adaptive and adaptive cases, which is the same as the amplitude needed to detect a k²-sparse non-negative vector of length n², as demonstrated in Arias-Castro (2012).
As in the vector case, the structure of the signal components as well as the power of adaptivity offer no advantage in the detection problem. For support recovery of the signal block, first notice that the necessary and sufficient conditions are tight in the scalings of µ_min/σ up to log factors (which we ignore for this discussion). We use C for possibly different universal constants.

Table 2: Summary of the main results of our paper. We use C for possibly different universal constants.

  Detection (passive and active):
    µ_min ≤ Cσ √(n²/(m k⁴)), necessary. Theorem 1.
    µ_min ≥ Cσ √(n²/(m k⁴)), sufficient. Arias-Castro (2012).
  Localization, passive:
    µ_min ≤ Cσ √((n²/(m k)) max(1, (log n)/k)), necessary. Theorem 2.
    m ≥ C log n and µ_min ≥ Cσ √((n² log n)/(m k)), sufficient. Theorem 3.
  Localization, active:
    µ_min ≤ Cσ max( √(n²/(m k⁴)), √(k/m) ), necessary. Theorem 4.
    m ≥ C log n and µ_min ≥ Cσ √(log(k) log log(k)) · max( √(n²/(m k⁴)), √(k/m) ), sufficient. Theorem 5.

The structured data matrix setting requires µ_min/σ to scale as C n/√(m k) when using non-adaptive compressive measurements. In contrast, the unstructured setting requires a higher amplitude of µ_min ≥ Cσ n/√m (the necessary conditions appear in Wainwright (2009b), Aeron et al. (2010), and Reeves and Gastpar (2012)). Structure without adaptivity already yields a factor of √k reduction in the smallest µ_min that still allows for reliable support recovery. Adaptivity in the compressive measurement design yields further improvements: with adaptive measurements, identifying the signal block requires µ_min ≥ Cσ max( n/(k² √m), √(k/m) ). In contrast, for the sparse vector case, Arias-Castro et al. (2013) showed that adaptive compressive measurements cannot recover the locations of the non-zero components if the amplitude of the signal is smaller than Cσ n/√m. It is clear that the potential gains from an adaptive measurement scheme are considerable: either a k² factor or an n/√k factor. Exploiting the structure of the signal components and designing adaptive linear measurements can both yield significant gains if the signal corresponds to a contiguous block in a data matrix.

The rest of this paper is organized as follows. We describe the problem setup and notation in Section 2. We study the detection problem in Section 3. Section 4 is devoted to non-adaptive support recovery, while Section 5 is focused on adaptive support recovery. Finally, in Section 6 we present and discuss some simulations that support our findings. The proofs are given in the Appendix.

2 Preliminaries

We begin by defining some notation that will be used frequently in the paper.

Notation: We denote by [n] the set {1, ..., n}. I{·} denotes the indicator function. For a vector a ∈ R^n, we let supp(a) = {j : a_j ≠ 0} be the support set (with an analogous definition for matrices A ∈ R^{n_1 × n_2}).

For q ∈ [1, ∞), ‖a‖_q denotes the ℓ_q-norm, defined as ‖a‖_q = (Σ_{i∈[n]} |a_i|^q)^{1/q}, with the usual extensions for q ∈ {0, ∞}, that is, ‖a‖_0 = |supp(a)| and ‖a‖_∞ = max_{i∈[n]} |a_i|. For a matrix A ∈ R^{n_1 × n_2}, we use the notation vec(A) to denote the vector in R^{n_1 n_2} formed by stacking the columns of A. We denote the Frobenius norm of A by ‖A‖_fro, with ‖A‖²_fro := Σ_{i∈[n_1], j∈[n_2]} a²_ij. For two matrices A and B of identical dimensions, ⟨A, B⟩ denotes the trace inner product, ⟨A, B⟩ = trace(AᵀB). For a matrix A ∈ R^{n_1 × n_2}, define n := min(n_1, n_2), and denote the ordered singular values of A by σ_1(A) ≥ ... ≥ σ_n(A). Denote the operator norm of A by ‖A‖_op := σ_1(A). For two sequences of numbers {a_n}_{n≥1} and {b_n}_{n≥1}, we use a_n = O(b_n) to denote that a_n ≤ C b_n for some finite positive constant C and all n large enough. If a_n = O(b_n) and b_n = O(a_n), we use the notation a_n ≍ b_n. The notation a_n = o(b_n) is used to denote that a_n / b_n → 0.

Let A ∈ R^{n_1 × n_2} be a signal matrix with unknown entries. We are interested in a structured setting where the support set of the matrix A, that is, the set of non-zero elements, is a contiguous block of size (k_1 × k_2), and the elements in the support set are equal to or larger than the level µ_min > 0. Although we focus on the case when k_1 and k_2 are known, we also discuss the cases in which one can adapt to the unknown size of the support. In order to simplify the presentation of our results we will assume that k_j ≤ n_j / 2 for j = 1, 2. The following collection contains the locations of all contiguous blocks of size k_1 × k_2:

B = { I_r × I_c : I_r and I_c are contiguous subsets of [n_1] and [n_2], |I_r| = k_1, |I_c| = k_2 }.    (2.1)

With this, the signal matrix A = (a_ij) can be represented as a_ij = µ_ij I{(i,j) ∈ B*} with µ_ij ≥ µ_min > 0 for some (unknown) B* ∈ B. We consider the following observation model, under which we collect m noisy linear measurements {y_1, ..., y_m} of A:

y_i = ⟨A, X_i⟩ + ε_i,   i = 1, ..., m,    (2.2)

where ε_1, ..., ε_m are iid N(0, σ²), with σ > 0 known. (The measurement scheme could equivalently be written in vector form as y_i = vec(A)ᵀ vec(X_i) + ε_i; to emphasize that the support of the signal is a contiguous block, we prefer the representation of Eq. (2.2).) We use the term amplitude to refer to µ_min. The sensing matrices (X_i)_{i∈[m]} ⊂ R^{n_1 × n_2} are normalized to satisfy one of the following two conditions: (1) ‖X_i‖²_fro = 1, or (2) E‖X_i‖²_fro = 1, for i ∈ [m]. This normalization ensures that every measurement is made with the same amount of energy (possibly in expectation). The first condition is used when the measurements are non-random, see, for example, the detection problem described in Section 3, while the second condition is used for random measurements.

Under the observation model in Eq. (2.2), we study two tasks: (1) detecting whether a contiguous block of positive signal exists in A, and (2) identifying the support of the signal, B*. (We deal only with positive signals in the main text and briefly discuss how the methods should be changed to address signals that have both positive and negative entries.) Both of these tasks are formally defined below. We develop efficient algorithms for these two tasks that provably require the smallest number of measurements (up to logarithmic factors). The algorithms are designed for one of two measurement schemes, described next.

In the first scheme, the measurements can be taken in an adaptive or sequential fashion, by letting each X_i be a (possibly randomized) function of (y_j, X_j)_{j∈[i−1]}; in the second, the measurement matrices are chosen all at once, ignoring the outcomes of previous measurements. We call the first scheme active, while the second scheme is referred to as passive.

Detection. The detection problem consists in deciding whether a (positive) contiguous block of signal exists in A. As we will show later, we can detect the presence of a contiguous block with a much smaller number of measurements than is required for identifying its support. Formally, detection is a hypothesis testing problem, with a composite alternative, of the form

H_0: A = 0_{n_1 × n_2}    (2.3)
H_1: A ∈ A(µ_min),

where the notation 0_{n_1 × n_2} is used for the all-zero matrix in R^{n_1 × n_2} and

A(µ_min) = { A : A has entries a_ij = µ_ij I{(i,j) ∈ B*}, µ_ij ≥ µ_min, B* ∈ B }.

A test T is a function that takes (y_i, X_i)_{i∈[m]} as input and outputs 1 if the null hypothesis is rejected, and 0 otherwise. For any test T, we define its risk as

R_det(T) := P_0[ T((y_i, X_i)_{i∈[m]}) = 1 ] + sup_{A∈A(µ_min)} P_A[ T((y_i, X_i)_{i∈[m]}) = 0 ],

where P_0 and P_A denote the joint probability distributions of (y_i, X_i)_{i∈[m]} under the null hypothesis and the alternative hypothesis, respectively. The risk R_det(T) measures the sum of the type I and maximal type II errors over the set of alternatives. The overall difficulty of the detection problem is quantified by the minimax detection risk R*_det := inf_T R_det(T), where the infimum is taken over all tests. When the amplitude of the signal is sufficiently small, as measured by µ_min, the minimax risk can remain bounded away from zero, which implies that no test can reliably (that is, with a vanishing probability of error) distinguish H_0 from H_1. In Section 3 we characterize necessary and sufficient conditions for distinguishing H_0 and H_1 in terms of µ_min.

Identifying the support of the signal A. The problem of identifying the support of the signal A (or support recovery) is that of determining the exact location of the non-zero elements of A. Let Ψ be an estimator of B*, that is, a function that takes (y_i, X_i)_{i∈[m]} as input and outputs an element of B, the estimated location of the support. We define the risk of any such estimator as

R_sup(Ψ) := sup_{A∈A(µ_min)} P_A[ Ψ((y_i, X_i)_{i∈[m]}) ≠ supp(A) ],

while the minimax support recovery risk is

R*_sup := inf_Ψ R_sup(Ψ),

where the infimum is taken over all such estimators Ψ. As in the detection task, the minimax risk specifies the minimal risk of any support identification procedure. We focus in this paper on exact support recovery.

An interesting extension of the work in this paper, which we do not address, would be to understand the minimax risk of support recovery in the Hamming metric (for instance).

Below we provide upper and lower bounds on the minimax risk for the problems of detection and support recovery as functions of the tuple (n_1, n_2, k_1, k_2, m, µ_min, σ), for both the active and passive sampling schemes. Our results identify (up to logarithmic factors) necessary and sufficient conditions on the minimal amplitude of the signal µ_min that can be detected, or whose support can be recovered, given a budget of m possibly adaptive measurements.

3 Detection of contiguous blocks

In this section, we give necessary and sufficient conditions on the problem parameters (n_1, n_2, k_1, k_2, m, µ_min, σ) for distinguishing between H_0 and H_1 defined in Eq. (2.3). We start with the model (2.2) and assume that each measurement matrix X_i can be chosen adaptively. We have the following lower bound on the magnitude of the signal.

Theorem 1. Fix any 0 < α < 1. Based on m (possibly adaptive) measurements, if

µ_min ≤ σ √( 8(1−α) n_1 n_2 / (m k_1² k_2²) ),

then the minimax detection risk R*_det ≥ α.

The lower bound given in Theorem 1 is established by analyzing the risk of the (optimal) likelihood ratio test under a uniform prior over the alternatives. Careful modifications of standard arguments are necessary to account for adaptivity. We closely follow the approach of Arias-Castro (2012), who established the analogue of Theorem 1 in the vector setting. We make minor modifications (detailed in Appendix A.1) to deal with the particular form of the block-structured signals we consider.

Next, we describe a test for distinguishing H_0 and H_1, proposed in Arias-Castro (2012). The test is given as

T((y_i)_{i∈[m]}) = I{ m^{−1/2} Σ_{i∈[m]} y_i > σ √(2 log(α^{−1})) }    (3.1)

with the measurement matrices chosen passively as X_i = (n_1 n_2)^{−1/2} 1_{n_1} 1_{n_2}ᵀ (i = 1, ..., m), where 1_n is used to denote the vector of ones in R^n. From Proposition 1 in Arias-Castro (2012), if

µ_min ≥ C σ √( n_1 n_2 log(α^{−1}) / (m k_1² k_2²) )    (3.2)

for some universal constant C > 0, then R_det(T) ≤ α. The lower bound given in Theorem 1 matches the upper bound given in Eq. (3.2) up to constants. Taken together they establish that the minimax rate for detection under the model in Eq. (2.2) is

µ_min ≍ (σ / (k_1 k_2)) √( n_1 n_2 / m ).

It is worth pointing out that the structure of the activation pattern does not play any role in the minimax detection problem. Indeed, the rate established matches the known bounds for detection in the unstructured vector case (Arias-Castro, 2012). We will contrast this with the problem of support recovery below. Furthermore, the procedure that achieves the adaptive lower bound (up to constants) is non-adaptive, indicating that adaptivity cannot help much in the detection problem.
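For concreteness, the test in Eq. (3.1) takes only a few lines to simulate. The sketch below draws measurements from the model in Eq. (2.2) with the all-ones sensing matrix and applies the threshold test; the problem sizes and helper names are illustrative choices, not values from the paper.

```python
import numpy as np

def detection_test(n1, n2, m, A, sigma, alpha, rng):
    """Sketch of the test in Eq. (3.1): every measurement uses the
    normalized all-ones sensing matrix X_i = (n1*n2)**(-1/2) * 1 1^T."""
    X = np.ones((n1, n2)) / np.sqrt(n1 * n2)            # unit Frobenius norm
    y = np.array([np.sum(A * X) + sigma * rng.standard_normal()
                  for _ in range(m)])                   # model of Eq. (2.2)
    # Reject H0 when the scaled sum of measurements exceeds the threshold.
    return np.sum(y) / np.sqrt(m) > sigma * np.sqrt(2 * np.log(1 / alpha))

rng = np.random.default_rng(0)
n1 = n2 = 100; k1 = k2 = 10; m = 50; sigma = 1.0; mu_min = 2.0
A = np.zeros((n1, n2)); A[:k1, :k2] = mu_min            # one contiguous block
print(detection_test(n1, n2, m, A, sigma, alpha=0.05, rng=rng))
```

Note that the statistic depends on the signal only through its sum, which is why structure plays no role in detection.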

4 Support recovery from passive measurements

In this section, we address the problem of estimating the support set B* of the signal A from noisy linear measurements as in Eq. (2.2). We give a lower bound for the case when the measurement matrices (X_i)_{i∈[m]} are independent with jointly Gaussian entries, with variance 1/(n_1 n_2). The variance of the entries is set so that the energy of each measurement is E‖X_i‖²_fro = 1. For the upper bound we consider the case when the signal is block constant, that is, µ_ij = µ_min for (i,j) ∈ B*. We consider the more general case of measurement matrices which have possibly correlated sub-Gaussian entries. In particular, denote by X the (m × n_1 n_2) matrix whose i-th row is vec(X_i)ᵀ. We give upper bounds for the case when X = ΨΣ^{1/2}, where Ψ ∈ R^{m × n_1 n_2} has independent sub-Gaussian entries with variance 1/(n_1 n_2), and Σ ∈ R^{n_1 n_2 × n_1 n_2} satisfies 0 < c ≤ λ_min(Σ) ≤ λ_max(Σ) ≤ C for universal constants c, C. We state our results in terms of λ_max(Σ) and λ_min(Σ) and track the dependence on them in our proofs. The energy of each measurement is E‖X_i‖²_fro = tr(Σ)/(n_1 n_2). In order to compare results across different measurement schemes we prefer to treat λ_min(Σ) and λ_max(Σ) as universal constants, so that the energy of each measurement is also at most a universal constant. In this case we say the measurement matrices are drawn from the Σ sub-Gaussian ensemble. It is easy to see that this generalizes the case of an independent Gaussian ensemble (by taking λ_min(Σ) = λ_max(Σ) = 1).

4.1 Lower bound

The following theorem gives a lower bound on the amplitude of the signal, below which no procedure can reliably identify the support set B*.

Theorem 2. Consider sensing with measurement matrices X_i for which each entry is independent Gaussian with variance 1/(n_1 n_2). There exist positive constants C, α > 0 independent of the problem parameters (k_1, k_2, n_1, n_2, m), such that if

µ_min ≤ C σ √( (n_1 n_2 / m) max( 1/min(k_1, k_2), log(n_1 n_2)/(k_1 k_2) ) ),

then the minimax support recovery risk R*_sup ≥ α > 0.

The proof is based on a standard technique described in Chapter 2.6 of Tsybakov (2009). We start by identifying a subset of matrices that are hard to distinguish. Once a suitable finite set is identified, tools for establishing lower bounds on the error in multiple-hypothesis testing can be directly applied. These tools only require computing the Kullback-Leibler (KL) divergence between the induced distributions, which in our case are two multivariate normal distributions. Notice that the bound in Theorem 2 has two terms. The first term characterizes the difficulty of distinguishing two signals whose supports overlap considerably. The second term characterizes the difficulty of identifying the correct support when the signal is one of a large family of signals whose supports do not overlap. These constructions and calculations are described in detail in Appendix A.2.
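To make the key computation concrete: conditionally on a fixed design (X_1, ..., X_m), the measurement vectors under two candidate signals A and A′ are Gaussian with common variance σ², so the KL divergence has the standard closed form (stated here as a reminder, not as a quotation of the appendix calculation):

```latex
\mathrm{KL}\bigl(\mathbb{P}_{A} \,\|\, \mathbb{P}_{A'}\bigr)
  = \sum_{i=1}^{m} \frac{\langle A - A',\, X_i \rangle^{2}}{2\sigma^{2}} .
```

Averaging over the Gaussian ensemble with entry variance 1/(n_1 n_2) gives E[KL] = m ‖A − A′‖²_fro / (2σ² n_1 n_2), so candidate blocks whose difference has small Frobenius norm — precisely the heavily overlapping supports behind the first term of Theorem 2 — are the hardest to distinguish.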

4.2 Upper bound

We investigate a procedure that searches over all contiguous blocks of size (k_1 × k_2) as defined in Eq. (2.1). For any block B ∈ B, let A_B ∈ R^{n_1 × n_2} denote the indicator matrix I{B}, whose entries are 1 for indices in B and 0 otherwise. Our procedure is to solve the following program:

B̂ = argmin_{B∈B} min_{µ∈R_+} (1/2) Σ_{i∈[m]} ( y_i − µ ⟨A_B, X_i⟩ )².    (4.1)

This program can be solved in polynomial time by a brute-force search over the outer minimization. We show the following result for the output of this procedure.

Theorem 3. Consider sensing with measurement matrices X_i that are drawn from the Σ sub-Gaussian ensemble. For block-constant signals, for

m ≥ C (λ_max(Σ)/λ_min(Σ))² log( 2 n_1 n_2 / α ),

for a universal constant C, if

µ_min ≥ 2 (λ_max(Σ)/λ_min(Σ)) σ √( 2 n_1 n_2 log(2 n_1 n_2 / α) / (m min(k_1, k_2)) ),

the procedure described above outputs a block B̂ such that R_sup(B̂) ≤ α.

The proof follows the general recipe outlined in Negahban and Wainwright (2011) of establishing the restricted strong convexity of the measurement ensemble for matrices that satisfy a certain cone condition. However, a direct application of those results leads to much weaker bounds in our setting, and several careful modifications are needed in order to obtain sharp results.

We present two other analyses in the Appendix. In Appendix A.4, we study a procedure that outputs the block B for which

Ψ(B) := Σ_{(a,b)∈B} Σ_{i∈[m]} y_i X_{i,ab}    (4.2)

is maximized. We show that this procedure achieves the same rate for uncorrelated designs but without the assumption of a block-constant signal. We also show, in Appendix B, that a modification of the procedure in Eq. (4.1) is able to estimate the support of a signal that has both positive and negative components, without the assumption of a block-constant signal, albeit at a slower rate.

Comparing to the lower bound in Theorem 2, we observe that the procedure outlined in this section achieves the lower bound up to a log(n_1 n_2) factor. It is also worth noting that the amplitude of the signal needed for correct support recovery is larger than the bound we saw earlier for passive detection.

The block structure of the activation allows us, even in the passive setting, to localize much weaker signals.

A straightforward adaptation of results on the LASSO (Wainwright, 2009a) suggests that if the non-zero entries are spread out (say at random), then we would require µ_min ≥ C σ √( n_1 n_2 log(n_1 n_2) / m ) to correctly identify the block B*. Exploiting the structure in the signal collection allows us to gain a √(min(k_1, k_2)) factor. As we will see in Section 5, even more substantial gains are possible when we choose measurements adaptively.

We also note that one can use a similar procedure to adapt to the unknown size of the activation block. In particular, one can perform the exhaustive search procedure over all possible sizes of activation blocks. Small modifications to the proof of Theorem 3 can be used to show that this procedure adapts to the unknown block size while still achieving the same risk. We omit the details but note that similar modifications have been detailed in Butucea and Ingster (2013) for a related problem.

5 Support recovery from active measurements

In this section, we study identification of B* using adaptive procedures, that is, the measurement matrix X_i may be a function of (y_j, X_j)_{j∈[i−1]}.

5.1 Lower bound

The following theorem gives a lower bound on the signal amplitude needed for any active procedure to estimate the support set B*.

Theorem 4. Fix any 0 < α < 1. There exists a constant C > 0 independent of the problem parameters (k_1, k_2, n_1, n_2, m), such that, given m adaptively chosen measurements, if

µ_min < C(1−α) σ max( √( n_1 n_2 / (m k_1² k_2²) ), √( min(k_1, k_2) / m ) ),

then R*_sup ≥ α.

The proof is similar to the proof of Theorem 2 in that we construct specific pairs of hypotheses that are hard to distinguish. The two terms in the lower bound reflect the hardness of estimating the support of the signal in two important cases. The first term reflects the difficulty of approximately estimating the support B*. This term grows at the same rate as the detection lower bound, and its proof is similar. Given a coarse estimate of the support, we still need to identify the exact support of the signal. The hardness of this problem gives rise to the second term in the lower bound. This term is independent of n_1 and n_2 but has a considerably worse dependence on k_1 and k_2.

5.2 Upper bound

The upper bound is established by analyzing the procedures described in Algorithms 1 and 2 for approximate and exact recovery of the support of the signal. Algorithm 1 is used to approximately estimate the support B*, that is, it locates an 8k_1 × 8k_2 block that contains the support B* with high probability.

Algorithm 1 Approximate support recovery

input: Measurement budget m ≥ log_2 p, a collection D of p blocks of size (2k_1 × 2k_2). (Recall that p = n_1 n_2 / (4 k_1 k_2); we assume p is dyadic to simplify our presentation of the algorithm.)
Initialize: J_0^(1) := {1, ..., p}, s_0 := log_2 p.
For each s in 1, ..., log_2 p:
  1. Allocate: m_s ∝ m s 2^{−s} measurements to this round (normalized so that Σ_s m_s ≤ m).
  2. Split: Partition J_0^(s) into two collections of blocks of equal size: J_1^(s) contains the first p/2^s blocks and J_2^(s) the remainder.
  3. Sensing matrix: X_s = 1/√(2^{s_0−s+1} · 4 k_1 k_2) on the blocks in J_1^(s), X_s = −1/√(2^{s_0−s+1} · 4 k_1 k_2) on the blocks in J_2^(s), and 0 otherwise.
  4. Measure: y_i^(s) = ⟨A, X_s⟩ + z_i^(s) for i = 1, ..., m_s.
  5. Update support: J_0^(s+1) := J_1^(s) if Σ_{i ≤ m_s} y_i^(s) > 0, and J_0^(s+1) := J_2^(s) otherwise.
output: The single block in J_0^(s_0+1).

Algorithm 2 Exact recovery (of columns)

input: Measurement budget m, a sub-matrix B ∈ R^{4k_1 × 4k_2}, success probability α.
  1. Measure: y_i^c = (4k_1)^{−1/2} Σ_{j=1}^{4k_1} B_{jc} + z_i^c for i = 1, ..., m/5 and c ∈ {1, k_2+1, 2k_2+1, 3k_2+1}.
  2. Let l := argmax_c Σ_{i ≤ m/5} y_i^c, r := l + k_2, b := 16 log_2 k_2.
  3. While r − l > 1:
     (a) Let c = ⌈(r + l)/2⌉.
     (b) Measure y_i^c = (4k_1)^{−1/2} Σ_{j=1}^{4k_1} B_{jc} + z_i^c for i = 1, ..., b.
     (c) If (1/b) Σ_{i ≤ b} y_i^c exceeds a threshold of order σ √( log(2 log_2(k_2)/α) / b ) (the exact constants appear in the proof of Theorem 5), then l := c; otherwise r := c.
output: Set of columns {l − k_2 + 1, ..., l}.
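To make the halving loop of Algorithm 1 concrete, here is a minimal Python sketch. It splits the measurement budget equally across rounds instead of using the m_s allocation above, and the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def compressive_binary_search(A, blocks, m, sigma, rng):
    """Sketch of Algorithm 1: locate the one block (from a disjoint
    collection `blocks` of (row_slice, col_slice) pairs, dyadic length)
    that contains the positive signal, using about m measurements."""
    cand = list(blocks)
    n_steps = int(np.log2(len(cand)))
    m_s = max(m // n_steps, 1)               # equal budget per round
    for _ in range(n_steps):
        half = len(cand) // 2
        X = np.zeros_like(A, dtype=float)
        for rows, cols in cand[:half]:
            X[rows, cols] = 1.0              # + on the first half
        for rows, cols in cand[half:]:
            X[rows, cols] = -1.0             # - on the second half
        X /= np.linalg.norm(X)               # unit Frobenius norm
        y = np.inner(A.ravel(), X.ravel()) + sigma * rng.standard_normal(m_s)
        cand = cand[:half] if y.sum() > 0 else cand[half:]
    return cand[0]
```

One run of this sketch per collection mirrors how Algorithm 1 is applied to each of the four collections D_1, ..., D_4 described below.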

The algorithm essentially performs compressive binary search (Davenport and Arias-Castro, 2012) on a collection of non-overlapping blocks that partition the matrix. Specifically, we create 4 collections D_1, D_2, D_3 and D_4, defined as follows (for simplicity, we assume n_1 is a multiple of 2k_1 and n_2 of 2k_2):

D_1 := { B_{1,1} := [1, ..., 2k_1] × [1, ..., 2k_2], B_{1,2} := [2k_1+1, ..., 4k_1] × [1, ..., 2k_2], ..., B_{1, n_1 n_2/4k_1k_2} := [n_1−2k_1+1, ..., n_1] × [n_2−2k_2+1, ..., n_2] },

D_2 := { B_{2,1} := [k_1+1, ..., 3k_1] × [k_2+1, ..., 3k_2], B_{2,2} := [3k_1+1, ..., 5k_1] × [k_2+1, ..., 3k_2], ..., B_{2, n_1 n_2/4k_1k_2} := [n_1−k_1+1, ..., n_1, 1, ..., k_1] × [n_2−k_2+1, ..., n_2, 1, ..., k_2] },

D_3 := { B_{3,1} := [k_1+1, ..., 3k_1] × [1, ..., 2k_2], B_{3,2} := [3k_1+1, ..., 5k_1] × [1, ..., 2k_2], ..., B_{3, n_1 n_2/4k_1k_2} := [n_1−k_1+1, ..., n_1, 1, ..., k_1] × [n_2−2k_2+1, ..., n_2] },

D_4 := { B_{4,1} := [1, ..., 2k_1] × [k_2+1, ..., 3k_2], B_{4,2} := [2k_1+1, ..., 4k_1] × [k_2+1, ..., 3k_2], ..., B_{4, n_1 n_2/4k_1k_2} := [n_1−2k_1+1, ..., n_1] × [n_2−k_2+1, ..., n_2, 1, ..., k_2] },

each of which contains p := n_1 n_2 / (4 k_1 k_2) non-overlapping blocks of the matrix A. D_1 is a partition of the matrix into disjoint blocks of size (2k_1 × 2k_2); D_3 is a similar partition shifted down by k_1 rows; D_4 is shifted to the right by k_2 columns; and D_2 is both shifted down by k_1 rows and to the right by k_2 columns. Figure 1 illustrates this. Notice that one of these collections (one of D_1 through D_4) must include a block that completely contains the support B*.

Algorithm 1 is run separately on each of the collections, and it outputs 4 blocks. As we will show in our analysis of this procedure, one of these 4 blocks contains the support B* with high probability. Algorithm 1 is a modification of compressive binary search (Davenport and Arias-Castro, 2012), adapted to perform the compressive binary search in a matrix setting. We focus our attention on the collection that completely contains B*. The algorithm proceeds in steps. Starting with a collection of disjoint blocks, at each step the collection is split into two halves and a test is performed to determine which half is more likely to contain the support of B*. This process continues for log_2(p) steps until only one block remains.

Algorithm 2 is used next to precisely identify B* within one of the four blocks identified by Algorithm 1. Algorithm 2 itself works in several stages: in the first stage the procedure measures repeatedly a small number of columns, exactly one of which is contained in B*, to identify an active column with high probability. The next stage finds the first non-active column to the left and right by testing columns using a binary search procedure. In this way, all the active columns are located. Finally, Algorithm 2 is repeated on the rows to identify the active rows.
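The column-wise binary search at the heart of Algorithm 2 can likewise be sketched in a few lines. The sketch below uses a plain zero threshold in place of the calibrated threshold from step 3(c), assumes the active columns form a contiguous run ending at some boundary, and uses illustrative names:

```python
import numpy as np

def last_active_column(B, l0, sigma, b, rng):
    """Sketch of step 3 of Algorithm 2: starting from a known active
    column l0, binary-search for the right-most active column. Each
    test takes b noisy measurements of one normalized column."""
    n_rows, n_cols = B.shape
    lo, hi = l0, n_cols - 1          # invariant: column `lo` is active
    while lo < hi:
        c = (lo + hi + 1) // 2       # bias upward so the loop terminates
        y = B[:, c].sum() / np.sqrt(n_rows) + sigma * rng.standard_normal(b)
        if y.mean() > 0:             # column c looks active
            lo = c
        else:                        # column c looks inactive
            hi = c - 1
    return lo
```

Running the same routine leftwards, and then on the transposed sub-matrix for the rows, recovers the full support, as in the description above.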

Figure 1: The collection of blocks D_1 is shown in solid lines and the collection D_2 is shown in dashed lines. The collections D_3 and D_4 overlap with these and are not shown. The (k_1 × k_2) block of activation is shown in red.

The following theorem states that Algorithm 1 and Algorithm 2 succeed in estimating the support B* with high probability if the amplitude of the signal is large enough.

Theorem 5. If

µ_min ≥ C σ √( log(max(k_1, k_2)) log log(k_1 k_2) log(1/α) ) · max( √( n_1 n_2 / (m k_1² k_2²) ), √( min(k_1, k_2) / m ) )

and m ≥ C log(n_1 n_2), then R_sup(B̂) ≤ α, where B̂ is the block output by running Algorithm 1 for approximate localization followed by Algorithm 2 on the subset of rows and columns returned by Algorithm 1.

Note that the upper bound given in Theorem 5 matches the lower bound up to a O(√(log(max(k_1, k_2)) log log(k_1 k_2))) factor. It is worth noting that for small k_1 and k_2, when the first term dominates, our active support recovery procedure achieves the detection limits. This is the best result we could hope for. When the support is larger, the lower bound indicates that no procedure can achieve the detection rate. The active procedure still remains significantly more efficient than the passive one, and is able to recover the support of signals that are weaker by as much as a √(n_1 n_2) factor. The great potential for gains from adaptive measurements is clearly seen in our model, which captures the fundamental interplay between structure and adaptivity.

6 Experiments

In this section, we perform a set of simulation studies to illustrate the finite sample performance of the proposed procedures. We let n_1 = n_2 = n and k_1 = k_2 = k. Theorem 8 and Theorem 5 characterize the signal amplitude needed for the passive and active identification of a contiguous block, respectively. We demonstrate that the scalings predicted by these theorems are sharp by plotting the probability of successful recovery against the appropriately rescaled signal amplitude and showing that the curves for different values of n and k line up.

15 Experient. Figure 2 shows the probability of successful localization of B using B defined in Eq. (A.5) plotted against n k (µ in /), where the nuber of easureents = 00. Each plot in Figure 2 represents different relationship between k and n; in the first plot, k = Θ(log n), in the second k = Θ( n), while in the third plot k = Θ(n). The dashed vertical line denotes the threshold position for the scaled aplitude at which the probability of success is larger than We observe that irrespective of the proble size and the relationship between n and k, Theore 8 tightly characterizes the iniu signal aplitude needed for successful identification. Probability of success n k SNR (n, k) = ( 50, 4) (n, k) = (00, 5) (n, k) = (50, 6) (n, k) = (200, 7) n k SNR (n, k) = ( 50, 7) (n, k) = (00, 0) (n, k) = (50, 2) (n, k) = (200, 4) n k SNR (n, k) = ( 50, 0) (n, k) = (00, 20) (n, k) = (50, 30) (n, k) = (200, 40) Figure 2: Probability of success with passive easureents (averaged over 00 siulation runs). Experient 2. Figure 3 shows the probability of successful localization of B using the procedure outlined in Section 5.2., with = 500 adaptively chosen easureents, plotted against the scaled signal aplitude. (µ in /) is scaled by n k 2 in the first two plots where k = Θ(log n) and k = Θ( n) respectively, while in the third plot the aplitude is scaled by k/ log k as k = Θ(n). The dashed vertical line denotes the threshold position for the scaled signal aplitude at which the probability of success is larger than We observe that Theore 5 sharply characterizes the iniu signal aplitude needed for successful identification. Probability of success (n, k) = (280, 0) (n, k) = (286, ) (n, k) = (644, 2) n k 2 SNR (n, k) = ( 8960, 70) (n, k) = (0240, 80) (n, k) = (520, 90) n k 2 SNR Probability of success (n, k) = ( 4000, 000) (n, k) = (6000, 2000) (n, k) = (24000, 3000) k/log(k) SNR Figure 3: Probability of success with adaptively chosen easureents (averaged over 00 siulation runs). 5

7 Discussion

In this paper, we establish the fundamental limits of the problem of detecting and localizing a contiguous block of weak activation in a data matrix from either adaptive or non-adaptive compressive measurements. Our bounds characterize the tradeoffs between the signal amplitude, the size of the matrix, the size of the non-zero block and the number of measurements. We also demonstrate constructive, computationally efficient estimators that achieve these bounds. Some interesting direct extensions of the work in this paper would be to address the case of unknown signal support size, to address the more general problem of recovering a signal block within a randomly permuted data matrix, commonly known as biclustering, and finally to understand localization beyond exact recovery.

The contiguous block model we consider is one useful form of the more general class of structured sparse signals, where adaptivity helps considerably. Techniques developed in this work have since been applied to the problems of adaptive compressive sensing of tree-sparse and graph-structured signals in the works of Soni and Haupt (2013) and Krishnamurthy et al. (2013). We expect that a full characterization of structured signals for which adaptive sensing can be useful will be a fruitful path for future investigation.

Acknowledgements

We would like to thank Larry Wasserman for his ideas, indispensable advice and wise guidance. SB would also like to thank Martin Wainwright for helpful discussions related to the proof of Theorem 3. This research is supported in part by AFOSR (grant FA-), NSF (grant IIS-1116458) and an NSF CAREER grant (DMS-).

References

S. Aeron, V. Saligrama, and M. Zhao. Information theoretic bounds for compressed sensing. IEEE Trans. Inf. Theory, 56(10):5111–5130, 2010.

E. Arias-Castro. Detecting a vector based on linear measurements. Electron. J. Stat., 6, 2012.

E. Arias-Castro, E. J. Candès, and A. Durand. Detection of an anomalous cluster in a network. Ann. Stat., 39(1), 2011a.

E. Arias-Castro, E. J. Candès, and Y. Plan. Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Stat., 39(5), 2011b.

E. Arias-Castro, E. J. Candès, and M. A. Davenport. On the fundamental limits of adaptive sensing. IEEE Trans. Inf. Theory, 59(1):472–481, 2013.

R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde. Model-based compressive sensing. IEEE Trans. Inf. Theory, 56(4), 2010.

S. Bhamidi, P. S. Dey, and A. B. Nobel. Energy landscape for large average submatrix detection problems in Gaussian random matrices. arXiv preprint arXiv:1211.2284, 2012.

L. Birgé. An alternative point of view on Lepski's method. In State of the Art in Probability and Statistics (Leiden, 1999), volume 36 of IMS Lecture Notes Monogr. Ser. Inst. Math. Statist., Beachwood, OH, 2001.

C. Butucea and Y. I. Ingster. Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli, 19(5B), 2013.

C. Butucea, Y. I. Ingster, and I. Suslina. Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. arXiv preprint, 2013.

E. J. Candès and M. A. Davenport. How well can we estimate a sparse vector? Appl. Comput. Harmon. Anal., 34(2):317–323, 2013.

E. J. Candès and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Stat., 35(6), 2007.

E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inf. Theory, 52(12), 2006.

E. J. Candès and M. B. Wakin. An introduction to compressive sampling. IEEE Signal Process. Mag., 25(2):21–30, 2008.

Y. Caron, P. Makris, and N. Vincent. A method for detecting artificial objects in natural environments. In 16th Int. Conf. Pattern Recogn., volume 1. IEEE, 2002.

R. M. Castro. Adaptive sensing performance lower bounds for sparse signal detection and support estimation. arXiv preprint, 2012.

M. A. Davenport and E. Arias-Castro. Compressive binary search. arXiv preprint, 2012.

D. L. Donoho. Compressed sensing. IEEE Trans. Inf. Theory, 52(4), 2006.

M. F. Duarte, M. A. Davenport, M. J. Wainwright, and R. G. Baraniuk. Sparse signal detection from incoherent projections. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pages III-305–III-308, 2006.

M. Golbabaee and P. Vandergheynst. Compressed sensing of simultaneous low-rank and joint-sparse matrices. arXiv preprint arXiv:1211.5058, 2012.

J. D. Haupt and R. D. Nowak. Compressive sampling for signal detection. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, volume 3, pages III-509–III-512. IEEE, 2007.

J. D. Haupt, R. G. Baraniuk, R. M. Castro, and R. D. Nowak. Compressive distilled sensing: Sparse recovery using adaptivity in compressive measurements. In Proc. 43rd Asilomar Conf. Signals, Systems and Computers. IEEE, 2009.

J. Huang, T. Zhang, and D. Metaxas. Learning with structured sparsity. J. Mach. Learn. Res., 12, 2011.

Y. I. Ingster, A. B. Tsybakov, and N. Verzelen. Detection boundary in sparse regression. Electron. J. Stat., 4, 2010.

I. M. Johnstone and A. Y. Lu. On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc., 104(486), 2009.

R. M. Kainkaryam and P. J. Woolf. Pooling in high-throughput drug screening. Current Opinion in Drug Discovery & Development, 12(3), 2009.

R. M. Kainkaryam, A. Bruex, A. C. Gilbert, J. Schiefelbein, and P. J. Woolf. poolMC: Smart pooling of mRNA samples in microarray experiments. Bioinformatics, 11:299, 2010.

M. Kolar, S. Balakrishnan, A. Rinaldo, and A. Singh. Minimax localization of structural information in large noisy matrices. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, 2011.

V. Koltchinskii, K. Lounici, and A. B. Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Stat., 39(5), 2011.

A. Krishnamurthy, J. Sharpnack, and A. Singh. Recovering graph-structured activations using adaptive compressive measurements. arXiv preprint, 2013.

B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. Ann. Stat., 28(5), 2000.

M. L. Malloy and R. D. Nowak. Near-optimal adaptive compressed sensing. In Proc. 46th Asilomar Conf. Signals, Systems and Computers. IEEE, 2012a.

M. L. Malloy and R. D. Nowak. Near-optimal compressive binary search. arXiv preprint, 2012b.

S. Negahban and M. J. Wainwright. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Stat., 39(2), 2011.

G. Reeves and M. Gastpar. The sampling rate-distortion tradeoff for sparsity pattern recovery in compressed sensing. IEEE Trans. Inf. Theory, 58(5), 2012.

E. Richard, P.-A. Savalle, and N. Vayatis. Estimation of simultaneously sparse and low rank matrices. In Proc. 29th Int. Conf. Mach. Learn., 2012.

M. Rudelson and R. Vershynin. Non-asymptotic theory of random matrices: extreme singular values. arXiv preprint, 2010.

M. Rudelson and R. Vershynin. Hanson-Wright inequality and sub-Gaussian concentration. Electron. Commun. Probab., 18(10), 2013.

A. Soni and J. D. Haupt. Efficient adaptive compressive sensing using sparse hierarchical learned dictionaries. In Proc. 45th Conf. Signals, Systems and Computers. IEEE, 2011.

A. Soni and J. D. Haupt. On the fundamental limits of recovering tree sparse vectors from noisy linear measurements. arXiv preprint, 2013.

X. Sun and A. B. Nobel. On the maximal size of large-average and ANOVA-fit submatrices in a Gaussian random matrix. Bernoulli, 19(1), 2013.

E. Tánczos and R. M. Castro. Adaptive sensing for estimation of structured sparse signals. arXiv preprint arXiv:1311.7118, 2013.

A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, 2009.

M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ_1-constrained quadratic programming (Lasso). IEEE Trans. Inf. Theory, 55(5), 2009a.

M. J. Wainwright. Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inf. Theory, 55(12), 2009b.

M. B. Wakin, J. N. Laska, M. F. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. F. Kelly, and R. G. Baraniuk. An architecture for compressive imaging. In Proc. IEEE Int. Conf. Image Processing. IEEE, 2006.

S. Yoon, C. Nardini, L. Benini, and G. De Micheli. Discovering coherent biclusters from gene expression data using zero-suppressed binary decision diagrams. IEEE/ACM Trans. Comput. Biol. Bioinf., 2(4), 2005.

A Proofs of Main Results

In this appendix, we collect the proofs of the results stated in the paper. Throughout the proofs, we denote by c_1, c_2, ... positive constants that may change their value from line to line.

A.1 Proof of Theorem 1

We lower bound the Bayes risk of any test T. Recall the null and alternative hypotheses defined in Eq. (2.3),

H_0: A = 0_{n_1 × n_2}
H_1: A ∈ A(µ_min).

We will accomplish this by lower bounding the Bayes risk of any test T for a simpler hypothesis testing problem,

H_0: A = 0_{n_1 × n_2}
H_1: A = (a_ij) with a_ij = µ_min I{(i,j) ∈ B}, B ∈ B.

It is clear that the infimum of R_det(T) over tests for the original problem is lower bounded by the corresponding infimum for this sub-problem. To further bound the worst-case risk of T, we consider a uniform prior π over the alternatives, and bound the average risk

R_det^π(T) = P_0[T = 1] + E_{A∼π} P_A[T = 0],

which provides the desired lower bound. Under the prior π, the hypothesis testing problem becomes that of distinguishing

H_0: A = 0_{n_1 × n_2}
H_1: A = (a_ij) with a_ij = µ_min I{(i,j) ∈ B}, B ∼ π.

Both H_0 and H_1 are simple (the alternative being a mixture over B), and the likelihood ratio test is optimal by the Neyman-Pearson lemma. The likelihood ratio is

L = E_π P_A[(y_i, X_i)_{i∈[m]}] / P_0[(y_i, X_i)_{i∈[m]}] = E_π Π_{i∈[m]} P_A[y_i | X_i] / P_0[y_i | X_i],

where the second equality follows by decomposing the probabilities by the chain rule and observing that P_0[X_i | (y_j, X_j)_{j∈[i−1]}] = P_A[X_i | (y_j, X_j)_{j∈[i−1]}], since the sampling strategy (whether active or passive) is the same irrespective of the true hypothesis. The likelihood ratio can be further simplified as

L = E_π exp( Σ_{i∈[m]} ( 2 y_i ⟨A, X_i⟩ − ⟨A, X_i⟩² ) / (2σ²) ).

The average risk of the likelihood ratio test,

R_det^π(T*) = 1 − ‖E_π P_A − P_0‖_TV,

is determined by the total variation distance between the mixture of alternatives and the null. By Pinsker's inequality (Tsybakov, 2009),

‖E_π P_A − P_0‖_TV ≤ √( KL(P_0, E_π P_A) / 2 ),


More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

Highly Robust Error Correction by Convex Programming

Highly Robust Error Correction by Convex Programming Highly Robust Error Correction by Convex Prograing Eanuel J. Candès and Paige A. Randall Applied and Coputational Matheatics, Caltech, Pasadena, CA 9115 Noveber 6; Revised Noveber 7 Abstract This paper

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Exact tensor completion with sum-of-squares

Exact tensor completion with sum-of-squares Proceedings of Machine Learning Research vol 65:1 54, 2017 30th Annual Conference on Learning Theory Exact tensor copletion with su-of-squares Aaron Potechin Institute for Advanced Study, Princeton David

More information

Generalized Augmentation for Control of the k-familywise Error Rate

Generalized Augmentation for Control of the k-familywise Error Rate International Journal of Statistics in Medical Research, 2012, 1, 113-119 113 Generalized Augentation for Control of the k-failywise Error Rate Alessio Farcoeni* Departent of Public Health and Infectious

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

A Simple Homotopy Algorithm for Compressive Sensing

A Simple Homotopy Algorithm for Compressive Sensing A Siple Hootopy Algorith for Copressive Sensing Lijun Zhang Tianbao Yang Rong Jin Zhi-Hua Zhou National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China Departent of Coputer

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials Fast Montgoery-like Square Root Coputation over GF( ) for All Trinoials Yin Li a, Yu Zhang a, a Departent of Coputer Science and Technology, Xinyang Noral University, Henan, P.R.China Abstract This letter

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

Multi-Dimensional Hegselmann-Krause Dynamics

Multi-Dimensional Hegselmann-Krause Dynamics Multi-Diensional Hegselann-Krause Dynaics A. Nedić Industrial and Enterprise Systes Engineering Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu B. Touri Coordinated Science Laboratory

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

On the theoretical analysis of cross validation in compressive sensing

On the theoretical analysis of cross validation in compressive sensing MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.erl.co On the theoretical analysis of cross validation in copressive sensing Zhang, J.; Chen, L.; Boufounos, P.T.; Gu, Y. TR2014-025 May 2014 Abstract

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

Compressive Sensing Over Networks

Compressive Sensing Over Networks Forty-Eighth Annual Allerton Conference Allerton House, UIUC, Illinois, USA Septeber 29 - October, 200 Copressive Sensing Over Networks Soheil Feizi MIT Eail: sfeizi@it.edu Muriel Médard MIT Eail: edard@it.edu

More information

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks Intelligent Systes: Reasoning and Recognition Jaes L. Crowley MOSIG M1 Winter Seester 2018 Lesson 7 1 March 2018 Outline Artificial Neural Networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

Reed-Muller codes for random erasures and errors

Reed-Muller codes for random erasures and errors Reed-Muller codes for rando erasures and errors Eanuel Abbe Air Shpilka Avi Wigderson Abstract This paper studies the paraeters for which Reed-Muller (RM) codes over GF (2) can correct rando erasures and

More information

Recovery of Sparsely Corrupted Signals

Recovery of Sparsely Corrupted Signals TO APPEAR IN IEEE TRANSACTIONS ON INFORMATION TEORY 1 Recovery of Sparsely Corrupted Signals Christoph Studer, Meber, IEEE, Patrick Kuppinger, Student Meber, IEEE, Graee Pope, Student Meber, IEEE, and

More information

Testing Properties of Collections of Distributions

Testing Properties of Collections of Distributions Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality

More information

An RIP-based approach to Σ quantization for compressed sensing

An RIP-based approach to Σ quantization for compressed sensing An RIP-based approach to Σ quantization for copressed sensing Joe-Mei Feng and Felix Kraher October, 203 Abstract In this paper, we provide new approach to estiating the error of reconstruction fro Σ quantized

More information

Kernel-Based Nonparametric Anomaly Detection

Kernel-Based Nonparametric Anomaly Detection Kernel-Based Nonparaetric Anoaly Detection Shaofeng Zou Dept of EECS Syracuse University Eail: szou@syr.edu Yingbin Liang Dept of EECS Syracuse University Eail: yliang6@syr.edu H. Vincent Poor Dept of

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks Bounds on the Miniax Rate for Estiating a Prior over a VC Class fro Independent Learning Tasks Liu Yang Steve Hanneke Jaie Carbonell Deceber 01 CMU-ML-1-11 School of Coputer Science Carnegie Mellon University

More information

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models 2014 IEEE International Syposiu on Inforation Theory Copression and Predictive Distributions for Large Alphabet i.i.d and Markov odels Xiao Yang Departent of Statistics Yale University New Haven, CT, 06511

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding IEEE TRANSACTIONS ON INFORMATION THEORY (SUBMITTED PAPER) 1 Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding Lai Wei, Student Meber, IEEE, David G. M. Mitchell, Meber, IEEE, Thoas

More information

Leonardo R. Bachega*, Student Member, IEEE, Srikanth Hariharan, Student Member, IEEE Charles A. Bouman, Fellow, IEEE, and Ness Shroff, Fellow, IEEE

Leonardo R. Bachega*, Student Member, IEEE, Srikanth Hariharan, Student Member, IEEE Charles A. Bouman, Fellow, IEEE, and Ness Shroff, Fellow, IEEE Distributed Signal Decorrelation and Detection in Sensor Networks Using the Vector Sparse Matrix Transfor Leonardo R Bachega*, Student Meber, I, Srikanth Hariharan, Student Meber, I Charles A Bouan, Fellow,

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning Boosting Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Weak Learning Definition: concept class C is weakly PAC-learnable if there exists a (weak)

More information

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data Suppleentary to Learning Discriinative Bayesian Networks fro High-diensional Continuous Neuroiaging Data Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, and Dinggang Shen Proposition. Given a sparse

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

A Theoretical Framework for Deep Transfer Learning

A Theoretical Framework for Deep Transfer Learning A Theoretical Fraewor for Deep Transfer Learning Toer Galanti The School of Coputer Science Tel Aviv University toer22g@gail.co Lior Wolf The School of Coputer Science Tel Aviv University wolf@cs.tau.ac.il

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher

More information

Some Proofs: This section provides proofs of some theoretical results in section 3.

Some Proofs: This section provides proofs of some theoretical results in section 3. Testing Jups via False Discovery Rate Control Yu-Min Yen. Institute of Econoics, Acadeia Sinica, Taipei, Taiwan. E-ail: YMYEN@econ.sinica.edu.tw. SUPPLEMENTARY MATERIALS Suppleentary Materials contain

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

New Slack-Monotonic Schedulability Analysis of Real-Time Tasks on Multiprocessors

New Slack-Monotonic Schedulability Analysis of Real-Time Tasks on Multiprocessors New Slack-Monotonic Schedulability Analysis of Real-Tie Tasks on Multiprocessors Risat Mahud Pathan and Jan Jonsson Chalers University of Technology SE-41 96, Göteborg, Sweden {risat, janjo}@chalers.se

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Tie-Varying Jaing Links Jun Kurihara KDDI R&D Laboratories, Inc 2 5 Ohara, Fujiino, Saitaa, 356 8502 Japan Eail: kurihara@kddilabsjp

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information