Information-Theoretic Limits of Group Testing: Phase Transitions, Noisy Tests, and Partial Recovery

Information-Theoretic Limits of Group Testing: Phase Transitions, Noisy Tests, and Partial Recovery
Jonathan Scarlett (jonathan.scarlett@epfl.ch)
Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Bristol Statistics Seminar, October 2015
Joint work with Volkan Cevher @ LIONS

Group testing

In this talk:
- Defective set: $S \sim \mathrm{Uniform}\big(\binom{[p]}{k}\big)$
- Measurement matrix: $X \sim \prod_{i=1}^{n} \prod_{j=1}^{p} P_X(x_{i,j})$ with $P_X \sim \mathrm{Bernoulli}(\nu/k)$
- Perfect recovery: $P_e := \mathbb{P}[\hat{S} \ne S]$
- Partial recovery: $P_e(d_{\max}) := \mathbb{P}\big[ |S \setminus \hat{S}| > d_{\max} \,\cup\, |\hat{S} \setminus S| > d_{\max} \big]$

Goal: conditions on $n$ under which $P_e \to 0$ or $P_e(d_{\max}) \to 0$.
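A minimal simulation sketch of this model (not from the talk; the function name and parameter values are illustrative only) draws a defective set, an i.i.d. Bernoulli$(\nu/k)$ test matrix, and the noiseless or noisy outcomes:

```python
import numpy as np

def group_testing_model(p=500, k=10, n=200, nu=np.log(2), rho=0.0, seed=0):
    """Draw (S, X, Y) from the i.i.d. Bernoulli group testing model."""
    rng = np.random.default_rng(seed)
    S = sorted(rng.choice(p, size=k, replace=False))  # defective set, uniform over size-k subsets
    X = rng.random((n, p)) < nu / k                   # X_{ij} ~ Bernoulli(nu/k)
    Y = X[:, S].any(axis=1)                           # noiseless OR over the defective items
    if rho > 0:                                       # optional Bernoulli(rho) test flips
        Y = Y ^ (rng.random(n) < rho)
    return S, X, Y

S, X, Y = group_testing_model()
print("fraction of positive tests:", round(Y.mean(), 3), "(expect about 1 - exp(-nu) = 0.5)")
```

At $\nu = \log 2$, a test is positive with probability $1 - (1-\nu/k)^k \approx 1 - e^{-\nu} = 1/2$, which the printout confirms.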

Noiseless Group Testing (Exact Recovery)

Observation model:
$$Y = \mathbb{1}\Big\{ \bigcup_{i \in S} \{X_i = 1\} \Big\}$$

Bernoulli measurements, sparsity $k = O(p^{\theta})$.

Sufficient for $P_e \to 0$:
$$n \ge \inf_{\nu > 0} \max\left\{ \frac{\theta}{e^{-\nu}\nu(1-\theta)},\ \frac{1}{H_2(e^{-\nu})} \right\} \Big(k \log\frac{p}{k}\Big)(1+\eta)$$

Conversely, $P_e \to 1$ whenever
$$n \le \frac{k \log\frac{p}{k}}{\log 2}(1-\eta)$$

Noiseless Group Testing (Exact Recovery)

[Figure: achievable and converse thresholds plotted against $\theta \in (0,1)$; plot data omitted in the transcription.]

Key implication: i.i.d. Bernoulli measurements are asymptotically as good as optimal adaptive measurements when $k = O(p^{1/3})$.
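The $k = O(p^{1/3})$ claim can be sanity-checked numerically: the sufficient condition above matches the converse constant $1/\log 2$ exactly when the inf-max equals $1/\log 2$, which happens for $\theta \le 1/3$ (at $\nu = \log 2$, both terms of the max equal $1/\log 2$ when $\theta = 1/3$). A small sketch of my own, not from the talk:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def H2(q):
    """Binary entropy in nats."""
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

def noiseless_constant(theta):
    """inf over nu of max{theta / (e^{-nu} nu (1-theta)), 1 / H2(e^{-nu})}."""
    obj = lambda nu: max(theta / (np.exp(-nu) * nu * (1 - theta)),
                         1 / H2(np.exp(-nu)))
    return minimize_scalar(obj, bounds=(1e-3, 10), method="bounded").fun

for theta in [0.1, 0.2, 1/3, 0.4, 0.6]:
    print(f"theta={theta:.3f}: constant={noiseless_constant(theta):.4f}"
          f"  (converse constant = {1/np.log(2):.4f})")
```

For $\theta \le 1/3$ the printed constant stays at $1/\log 2 \approx 1.4427$, and it strictly exceeds it beyond that point.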

Noisy Group Testing (Exact Recovery)

Observation model:
$$Y = \mathbb{1}\Big\{ \bigcup_{i \in S} \{X_i = 1\} \Big\} \oplus Z, \qquad Z \sim \mathrm{Bernoulli}(\rho)$$

Bernoulli measurements, sparsity $k = O(p^{\theta})$.

Sufficient for $P_e \to 0$:
$$n \ge \inf_{\delta_2 \in (0,1)} \max\left\{ \zeta(\rho, \delta_2, \theta),\ \frac{1}{\log 2 - H_2(\rho)} \right\} \Big(k \log\frac{p}{k}\Big)(1+\eta)$$
where
$$\zeta(\rho, \delta_2, \theta) := \frac{2}{\log 2} \max\left\{ \frac{2\big(1 + \frac{1}{3}\delta_2(1-2\rho)\big)}{\delta_2^2 (1-2\rho)^2} \cdot \frac{\theta}{1-\theta},\ \frac{1+2\theta}{1-\theta} \cdot \frac{1}{(1-2\rho)\log\frac{1-\rho}{\rho}\,(1-\delta_2)} \right\}.$$

Conversely, $P_e \to 1$ whenever
$$n \le \frac{k \log\frac{p}{k}}{\log 2 - H_2(\rho)}(1-\eta)$$

Noisy Group Testing (Exact Recovery)

[Figure: achievable and converse thresholds plotted against $\theta \in (0,1)$ for the noisy model; plot data omitted in the transcription.]

Key implication: i.i.d. Bernoulli measurements are asymptotically as good as optimal adaptive measurements when $k = O(p^{\theta})$ for sufficiently small $\theta$.
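For concreteness, the noisy converse threshold $\frac{k \log(p/k)}{\log 2 - H_2(\rho)}$ is easy to evaluate; a quick sketch of my own (parameter values are illustrative only):

```python
import numpy as np

def H2(q):
    """Binary entropy in nats."""
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

def noisy_converse_tests(p, k, rho):
    """Threshold k log(p/k) / (log 2 - H2(rho)); P_e -> 1 below (1 - eta) times it."""
    return k * np.log(p / k) / (np.log(2) - H2(rho))

p, k = 10**6, 100
print(f"noiseless: n* = {k * np.log(p / k) / np.log(2):.0f}")
for rho in [0.01, 0.05, 0.11]:
    print(f"rho={rho:.2f}: n* = {noisy_converse_tests(p, k, rho):.0f}")
```

The denominator $\log 2 - H_2(\rho)$ shrinks quickly with $\rho$, so even mild test noise substantially inflates the required number of tests.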

Partial Recovery

Recovery criterion:
$$P_e(d_{\max}) := \mathbb{P}\big[ |S \setminus \hat{S}| > d_{\max} \,\cup\, |\hat{S} \setminus S| > d_{\max} \big]$$
where $d_{\max} = \alpha^* k$ for some $\alpha^* \in (0,1)$.

Sufficient for $P_e(d_{\max}) \to 0$:
$$n \ge \frac{k \log\frac{p}{k}}{\log 2 - H_2(\rho)}(1+\eta)$$

Conversely, $P_e(d_{\max}) \to 1$ whenever
$$n \le \frac{(1-\alpha^*) k \log\frac{p}{k}}{\log 2 - H_2(\rho)}(1-\eta)$$

For small $\theta$, the reduction in the number of tests from allowing partial recovery is at most a factor of $1-\alpha^*$ asymptotically.

Channel coding

Message $S$ → Encoder → Codeword $X_S$ → Channel $P^n_{Y|X_S}$ → Output $Y$ → Decoder → Estimate $\hat{S}$

e.g., see [Wainwright, 2009], [Atia and Saligrama, 2012], [Aksoylar et al., 2013]

Thresholding Techniques for Channel Coding

Mutual information:
$$I(X;Y) := \sum_{x,y} P_{XY}(x,y) \log\frac{P_{Y|X}(y|x)}{P_Y(y)}$$

Information density:
$$\imath(x;y) := \log\frac{P_{Y|X}(y|x)}{P_Y(y)}$$

Non-asymptotic channel coding bounds [Han, 2003]:
$$P_e \le \mathbb{P}\big[\imath^n(X;Y) \le nR - \log\delta\big] + \delta$$
$$P_e \ge \mathbb{P}\big[\imath^n(X;Y) \le nR + \log\delta\big] - \delta$$
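To make the information density concrete, here is a small Monte Carlo sketch of my own for a binary symmetric channel with uniform inputs, where $\imath^n$ concentrates around $nI(X;Y) = n(\log 2 - H_2(\rho))$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, trials = 1000, 0.11, 5000

# BSC(rho) with uniform inputs: P_Y = (1/2, 1/2), so the per-symbol density is
# i(x; y) = log(2 (1 - rho)) if y == x, else log(2 rho)
X = rng.integers(0, 2, size=(trials, n))
Y = X ^ (rng.random((trials, n)) < rho).astype(int)
i_n = np.where(X == Y, np.log(2 * (1 - rho)), np.log(2 * rho)).sum(axis=1)

capacity = np.log(2) + rho * np.log(rho) + (1 - rho) * np.log(1 - rho)  # log 2 - H2(rho), nats
print(f"mean i^n / n = {i_n.mean() / n:.4f}, capacity = {capacity:.4f}")
print(f"P[i^n <= 0.9 n C] = {(i_n <= 0.9 * n * capacity).mean():.4f}")
```

The empirical tail probability is exactly the quantity the Han-style bounds control: it vanishes for rates below capacity and tends to one above it.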

Non-Asymptotic Bounds

Information density:
$$\imath(x_{s_{dif}}; y \,|\, x_{s_{eq}}) := \log\frac{P_{Y|X_{s_{dif}} X_{s_{eq}}}(y \,|\, x_{s_{dif}}, x_{s_{eq}})}{P_{Y|X_{s_{eq}}}(y \,|\, x_{s_{eq}})}$$
where $(s_{dif}, s_{eq})$ ranges over the partitions of $s$.

Achievability:
$$P_e \le \mathbb{P}\left[ \bigcup_{(s_{dif}, s_{eq})} \left\{ \imath^n(X_{s_{dif}}; Y | X_{s_{eq}}) \le \log\binom{p-k}{|s_{dif}|} + \log\frac{k}{\delta_1} \right\} \right] + \delta_1$$

Converse:
$$P_e \ge \mathbb{P}\left[ \bigcup_{(s_{dif}, s_{eq})} \left\{ \imath^n(X_{s_{dif}}; Y | X_{s_{eq}}) \le \log\binom{p-k+|s_{dif}|}{|s_{dif}|} + \log\delta_1 \right\} \right] - \delta_1$$

Achievability Bound

Decoder: search for the unique set $s \in \mathcal{S}$ such that
$$\imath^n(x_{s_{dif}}; y | x_{s_{eq}}) > \gamma_{|s_{dif}|}$$
for all partitions $(s_{dif}, s_{eq})$ of $s$ with $s_{dif} \ne \emptyset$.

Initial bound:
$$P_e \le \mathbb{P}\left[ \bigcup_{(s_{dif}, s_{eq})} \left\{ \imath^n(X_{s_{dif}}; Y | X_{s_{eq}}) \le \gamma_{|s_{dif}|} \right\} \right] + \sum_{\bar{s} \in \mathcal{S} \setminus \{s\}} \mathbb{P}\left[ \imath^n(X_{\bar{s} \setminus s}; Y | X_{\bar{s} \cap s}) > \gamma_{|\bar{s} \setminus s|} \right]$$

Analysis of the second term (with $\ell := |\bar{s} \setminus s|$):
$$\mathbb{P}\left[ \imath^n(X_{\bar{s} \setminus s}; Y | X_{\bar{s} \cap s}) > \gamma_\ell \right] = \sum_{x_{\bar{s} \cap s},\, x_{\bar{s} \setminus s},\, y} P_X^{n(k-\ell)}(x_{\bar{s} \cap s})\, P_X^{n\ell}(x_{\bar{s} \setminus s})\, P^n_{Y|X_{s_{eq}}}(y | x_{\bar{s} \cap s})\, \mathbb{1}\left\{ \log\frac{P^n_{Y|X_{s_{dif}} X_{s_{eq}}}(y | x_{\bar{s} \setminus s}, x_{\bar{s} \cap s})}{P^n_{Y|X_{s_{eq}}}(y | x_{\bar{s} \cap s})} > \gamma_\ell \right\}$$
$$\le \sum_{x_{\bar{s} \cap s},\, x_{\bar{s} \setminus s},\, y} P_X^{n(k-\ell)}(x_{\bar{s} \cap s})\, P_X^{n\ell}(x_{\bar{s} \setminus s})\, P^n_{Y|X_{s_{dif}} X_{s_{eq}}}(y | x_{\bar{s} \setminus s}, x_{\bar{s} \cap s})\, e^{-\gamma_\ell} = e^{-\gamma_\ell}$$
(The inequality holds because whenever the indicator equals one, $P^n_{Y|X_{s_{eq}}} \le P^n_{Y|X_{s_{dif}} X_{s_{eq}}} e^{-\gamma_\ell}$; the final equality sums a product distribution to one.)

After some re-arrangements and choosing $\{\gamma_\ell\}$ appropriately,
$$P_e \le \mathbb{P}\left[ \bigcup_{(s_{dif}, s_{eq})} \left\{ \imath^n(X_{s_{dif}}; Y | X_{s_{eq}}) \le \log\binom{p-k}{|s_{dif}|} + \log\frac{k}{\delta_1} \right\} \right] + \delta_1$$
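To make the threshold decoder concrete, here is a brute-force toy implementation of my own for noisy group testing. The tiny parameters are illustrative, and the threshold $\gamma_\ell = \log\big(\binom{p-k}{\ell} k / \delta_1\big)$ is one natural choice making the $e^{-\gamma_\ell}$ terms sum to at most $\delta_1$ (the slides only say the $\gamma_\ell$ are "chosen appropriately"):

```python
import numpy as np
from itertools import combinations
from math import comb, log

rng = np.random.default_rng(0)
p, k, n, rho, delta1 = 12, 2, 60, 0.05, 0.1
nu = log(2)

# generate noisy group testing data
S = sorted(rng.choice(p, size=k, replace=False))
X = (rng.random((n, p)) < nu / k).astype(int)
Y = X[:, S].max(axis=1) ^ (rng.random(n) < rho).astype(int)

def log_p_y(y, or_known, n_unknown):
    """log P(y | known columns' OR), marginalizing n_unknown Bernoulli(nu/k) columns."""
    if or_known == 1:
        p1 = 1 - rho
    else:
        q0 = (1 - nu / k) ** n_unknown          # all unknown entries zero
        p1 = q0 * rho + (1 - q0) * (1 - rho)
    return log(p1) if y == 1 else log(1 - p1)

def info_density(s_dif, s_eq):
    """i^n(x_{s_dif}; y | x_{s_eq}) summed over the n tests."""
    total = 0.0
    for t in range(n):
        or_eq = int(X[t, list(s_eq)].max()) if s_eq else 0
        or_all = max(or_eq, int(X[t, list(s_dif)].max()))
        total += log_p_y(Y[t], or_all, 0) - log_p_y(Y[t], or_eq, len(s_dif))
    return total

def gamma(l):                                    # log(C(p-k, l) * k / delta1)
    return log(comb(p - k, l) * k / delta1)

# threshold decoder: accept s if every partition clears its threshold
accepted = [s for s in combinations(range(p), k)
            if all(info_density(set(d), set(s) - set(d)) > gamma(len(d))
                   for r in range(1, k + 1) for d in combinations(s, r))]
print("true set:", S, " accepted:", accepted)
```

With these (generous) parameters the true set typically is the unique survivor; shrinking $n$ toward the thresholds in the corollaries makes the decoder start to fail, which is exactly the phase transition being characterized.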

Converse Bound

Consider a genie that reveals part of the defective set, $S_{eq} \subset S$, to the decoder, which is left to estimate $S_{dif} := S \setminus S_{eq}$ (with $\ell := |S_{dif}|$).

Starting point:
$$P_e(s_{eq}) \ge \mathbb{P}[A(s_{eq})] - \mathbb{P}[A(s_{eq}) \,|\, \text{no error}],$$
where $A(s_{eq}) = \{ \imath^n(X_{S_{dif}}; Y | X_{s_{eq}}) \le \gamma \}$.

Analysis of the second term:
$$\mathbb{P}[A(s_{eq}) \,|\, \text{no error}] = \frac{1}{\binom{p-k+\ell}{\ell}} \sum_{s_{dif} \in \mathcal{S}_{dif}(s_{eq})} \sum_{(x,y) \in \mathcal{D}(s_{dif}|s_{eq})} P_X^{np}(x)\, P^n_{Y|X_{s_{dif}} X_{s_{eq}}}(y | x_{s_{dif}}, x_{s_{eq}})\, \mathbb{1}\left\{ \log\frac{P^n_{Y|X_{s_{dif}} X_{s_{eq}}}(y | x_{s_{dif}}, x_{s_{eq}})}{P^n_{Y|X_{s_{eq}}}(y | x_{s_{eq}})} \le \gamma \right\}$$
$$\le \frac{1}{\binom{p-k+\ell}{\ell}} \sum_{s_{dif} \in \mathcal{S}_{dif}(s_{eq})} \sum_{(x,y) \in \mathcal{D}(s_{dif}|s_{eq})} P_X^{np}(x)\, P^n_{Y|X_{s_{eq}}}(y | x_{s_{eq}})\, e^{\gamma} \le \frac{e^{\gamma}}{\binom{p-k+\ell}{\ell}},$$
the final step using the fact that the decoding regions $\mathcal{D}(s_{dif}|s_{eq})$ are disjoint.

After some re-arrangements and choosing $\{\gamma_\ell\}$ appropriately,
$$P_e \ge \mathbb{P}\left[ \bigcup_{(s_{dif}, s_{eq})} \left\{ \imath^n(X_{s_{dif}}; Y | X_{s_{eq}}) \le \log\binom{p-k+|s_{dif}|}{|s_{dif}|} + \log\delta_1 \right\} \right] - \delta_1$$

Non-Asymptotic Bounds (recap)

Achievability:
$$P_e \le \mathbb{P}\left[ \bigcup_{(s_{dif}, s_{eq})} \left\{ \imath^n(X_{s_{dif}}; Y | X_{s_{eq}}) \le \log\binom{p-k}{|s_{dif}|} + \log\frac{k}{\delta_1} \right\} \right] + \delta_1$$

Converse:
$$P_e \ge \mathbb{P}\left[ \bigcup_{(s_{dif}, s_{eq})} \left\{ \imath^n(X_{s_{dif}}; Y | X_{s_{eq}}) \le \log\binom{p-k+|s_{dif}|}{|s_{dif}|} + \log\delta_1 \right\} \right] - \delta_1$$

Application Techniques

General steps for application:
1. Bound the tail probabilities $\mathbb{P}\big[ \imath^n(X_{s_{dif}}; Y | X_{s_{eq}}) \le \mathbb{E}[\imath^n] \pm n\delta \big]$
2. Control and simplify the remainder terms

General form of corollaries: $P_e \to 0$ if
$$n \ge \max_{(s_{dif}, s_{eq})} \frac{\log\binom{p-k}{|s_{dif}|}}{I(X_{s_{dif}}; Y | X_{s_{eq}})}(1+\eta)$$
and $P_e \to 1$ if
$$n \le \max_{(s_{dif}, s_{eq})} \frac{\log\binom{p-k+|s_{dif}|}{|s_{dif}|}}{I(X_{s_{dif}}; Y | X_{s_{eq}})}(1-\eta)$$
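As a schematic of how such a corollary is evaluated once the conditional mutual informations $I^{(\ell)} = I(X_{s_{dif}}; Y | X_{s_{eq}})$ with $|s_{dif}| = \ell$ are in hand, here is a sketch of my own; the numeric $I^{(\ell)}$ values below are purely hypothetical placeholders:

```python
import numpy as np
from scipy.special import gammaln

def log_binom(a, b):
    """Natural log of the binomial coefficient C(a, b)."""
    return gammaln(a + 1) - gammaln(b + 1) - gammaln(a - b + 1)

# hypothetical per-level conditional mutual informations I^(l), l = 1..k, in nats
p, k = 1000, 5
I = np.array([0.05, 0.12, 0.21, 0.30, 0.40])

bounds = [log_binom(p - k, l) / I[l - 1] for l in range(1, k + 1)]
print(f"P_e -> 0 once n exceeds about {max(bounds):.0f} tests (times (1 + eta))")
```

The maximum over $\ell$ reflects that every partition of the support must be distinguishable; typically small $\ell$ (a single misplaced item, with the smallest $I^{(\ell)}$) or $\ell = k$ dominates.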

Concentration Inequalities

Noiseless and noisy cases (this bound alone is enough for partial recovery!):
$$\mathbb{P}\Big[ \big|\imath^n(X_{s_{dif}}; Y | X_{s_{eq}}) - nI^{(\ell)}\big| \ge n\delta \Big] \le 2\exp\left( -\frac{\delta^2 n}{4(8+\delta)} \right)$$
Proved using Bernstein's inequality; the only property used is that $|\mathcal{Y}| = 2$.

Noiseless case:
$$\mathbb{P}\Big[ \imath^n \le nI^{(\ell)}(1-\delta_2) \Big] \le \exp\left( -n \frac{\ell}{k} e^{-\nu}\nu \big( (1-\delta_2)\log(1-\delta_2) + \delta_2 \big) (1-\epsilon) \right)$$
Proved by writing $\imath^n(X_{s_{dif}}; Y | X_{s_{eq}})$ in terms of binomial random variables, then applying binomial tail bounds.

Noisy case with crossover probability $\rho$:
$$\mathbb{P}\Big[ \imath^n \le nI^{(\ell)}(1-\delta_2) \Big] \le \exp\left( -n \frac{\ell}{k} e^{-\nu}\nu \frac{\delta_2^2 (1-2\rho)^2}{2\big(1 + \frac{1}{3}\delta_2(1-2\rho)\big)} (1-\epsilon) \right)$$
Proved using Bennett's inequality; may be somewhat crude.
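For a rough sense of how the generic Bernstein-type bound compares with the model-specific ones, the sketch below (my own; the parameter choices are hypothetical, and the generic bound uses an absolute deviation $\delta$ while the others use a relative $\delta_2$, so the comparison is only indicative) evaluates the exponents multiplying $n$ in each bound:

```python
import numpy as np

nu, rho, ell_over_k = np.log(2), 0.05, 1.0   # hypothetical parameter choices
for d in [0.1, 0.3, 0.5]:                    # deviation parameter (delta or delta_2)
    generic = d**2 / (4 * (8 + d))           # Bernstein-type; only uses |Y| = 2
    noiseless = ell_over_k * np.exp(-nu) * nu * ((1 - d) * np.log(1 - d) + d)
    noisy = ell_over_k * np.exp(-nu) * nu * (d**2 * (1 - 2 * rho)**2
            / (2 * (1 + d * (1 - 2 * rho) / 3)))
    print(f"d={d}: generic={generic:.4f}  noiseless={noiseless:.4f}  noisy={noisy:.4f}")
```

The model-specific exponents are markedly larger, which is why they drive the sharp constants for exact recovery while the generic bound suffices only for partial recovery.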

Recap of Results

Noiseless case (exact recovery):
- Sufficient for $P_e \to 0$: $n \ge \inf_{\nu>0} \max\big\{ \frac{\theta}{e^{-\nu}\nu(1-\theta)}, \frac{1}{H_2(e^{-\nu})} \big\} \big(k \log\frac{p}{k}\big)(1+\eta)$
- $P_e \to 1$ if $n \le \frac{k \log\frac{p}{k}}{\log 2}(1-\eta)$

Noisy case (exact recovery):
- Sufficient for $P_e \to 0$: $n \ge \inf_{\delta_2 \in (0,1)} \max\big\{ \zeta(\rho,\delta_2,\theta), \frac{1}{\log 2 - H_2(\rho)} \big\} \big(k \log\frac{p}{k}\big)(1+\eta)$
- $P_e \to 1$ if $n \le \frac{k \log\frac{p}{k}}{\log 2 - H_2(\rho)}(1-\eta)$

Noiseless and noisy cases (partial recovery):
- Sufficient for $P_e(d_{\max}) \to 0$: $n \ge \frac{k \log\frac{p}{k}}{\log 2 - H_2(\rho)}(1+\eta)$
- $P_e(d_{\max}) \to 1$ if $n \le \frac{(1-\alpha^*) k \log\frac{p}{k}}{\log 2 - H_2(\rho)}(1-\eta)$

More General Support Recovery

$X \in \mathbb{R}^{n \times p}$, $\beta \in \mathbb{R}^{p}$, $Y \in \mathbb{R}^{n}$

General probabilistic models:
- Support: $S \sim \mathrm{Uniform}\big(\binom{[p]}{k}\big)$
- Measurement matrix: $X \sim \prod_{i=1}^{n} \prod_{j=1}^{p} P_X(x_{i,j})$
- Non-zero entries: $\beta_S \sim P_{\beta_S}$
- Observations: $(Y | X, \beta) \sim P^n_{Y|X_S \beta_S}$

Support recovery: $P_e := \mathbb{P}[\hat{S} \ne S]$
Partial recovery: $P_e(d_{\max}) := \mathbb{P}\big[ |S \setminus \hat{S}| > d_{\max} \,\cup\, |\hat{S} \setminus S| > d_{\max} \big]$

Goal: conditions on $n$ under which $P_e \to 0$ or $P_e(d_{\max}) \to 0$.

Channel coding

Channel state $\beta_S \sim P_{\beta_S}$; Message $S$ → Encoder → Codeword $X_S$ → Channel $P^n_{Y|X_S \beta_S}$ → Output $Y$ → Decoder → Estimate $\hat{S}$

e.g., see [Wainwright, 2009], [Atia and Saligrama, 2012], [Aksoylar et al., 2013]

More General Support Recovery

Steps for applying the generalized non-asymptotic bounds:
1. Construct a typical set $T_\beta$ of non-zero entries $\beta_S$
2. Bound the tail probabilities $\mathbb{P}\big[ \imath^n(X_{s_{dif}}; Y | X_{s_{eq}}, \beta_s) \le \mathbb{E}[\imath^n] \pm n\delta \,\big|\, \beta_s = b_s \big]$
3. Control and simplify the remainder terms

General form of corollaries: $P_e \to 0$ if
$$n \ge \max_{(s_{dif}, s_{eq})} \frac{\log\binom{p-k}{|s_{dif}|}}{I(X_{s_{dif}}; Y | X_{s_{eq}}, \beta_s = b_s)}(1+\eta) \quad \text{for all } b_s \in T_\beta$$
and $P_e \to 1$ if
$$n \le \max_{(s_{dif}, s_{eq})} \frac{\log\binom{p-k+|s_{dif}|}{|s_{dif}|}}{I(X_{s_{dif}}; Y | X_{s_{eq}}, \beta_s = b_s)}(1-\eta) \quad \text{for all } b_s \in T_\beta$$

Exact Recovery for Linear and 1-bit Models

Observation models:
$$Y = \langle X, \beta \rangle + Z, \qquad Y = \mathrm{sign}\big( \langle X, \beta \rangle + Z \big)$$

Case 1: Gaussian $X$, discrete $\beta_S$, sparsity $k = \Theta(1)$, low SNR. Necessary and sufficient conditions:
- Linear: $n \ge \max_{(s_{dif}, s_{eq})} \dfrac{|s_{dif}| \log p}{\frac{1}{2\sigma^2} \sum_{i \in s_{dif}} b_i^2}$
- 1-bit: $n \ge \max_{(s_{dif}, s_{eq})} \dfrac{|s_{dif}| \log p}{\frac{1}{\pi\sigma^2} \sum_{i \in s_{dif}} b_i^2}$

Only a factor $\frac{\pi}{2}$ difference.

Case 2: Gaussian $X$, fixed $\beta_S$, sparsity $k = \Theta(p)$, moderate SNR. Conditions:
- Linear: $\Theta(p)$ sufficient
- 1-bit: $\Omega(p \log p)$ necessary
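The $\frac{\pi}{2}$ gap reflects the low-SNR scalar mutual informations, $I_{\mathrm{lin}} = \frac{1}{2}\log(1+\mathrm{snr}) \approx \frac{\mathrm{snr}}{2}$ versus $I_{\mathrm{1bit}} \approx \frac{\mathrm{snr}}{\pi}$ (in nats). A quick Monte Carlo sketch of my own for the 1-bit case:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def H2(q):
    """Binary entropy in nats, clipped for numerical safety."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

def one_bit_mi(snr, samples=200_000):
    """I(X; sign(aX + Z)) with X, Z ~ N(0,1) and a = sqrt(snr), in nats."""
    a = np.sqrt(snr)
    x = rng.standard_normal(samples)
    return np.log(2) - H2(norm.cdf(a * x)).mean()   # log 2 - E[H2(P(Y=1|X))]

for snr in [0.01, 0.05, 0.1]:
    lin, ob = 0.5 * np.log(1 + snr), one_bit_mi(snr)
    print(f"snr={snr}: linear={lin:.5f}  1-bit={ob:.5f}  "
          f"ratio={lin/ob:.3f} (pi/2 = {np.pi/2:.3f})")
```

As the SNR decreases, the printed ratio approaches $\pi/2 \approx 1.571$, matching the factor separating the two sample-complexity expressions above.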

Partial Recovery for Linear and 1-bit Models

Partial recovery:
$$P_e(d_{\max}) := \mathbb{P}\big[ |S \setminus \hat{S}| > d_{\max} \,\cup\, |\hat{S} \setminus S| > d_{\max} \big]$$

Linear and 1-bit models: Gaussian $X$, Gaussian $\beta_S$, sparsity $k = o(p)$, allowed errors $d_{\max} = \alpha^* k$.

Sufficient for $P_e(d_{\max}) \to 0$:
$$n \ge \max_{\alpha \in [\alpha^*, 1]} \frac{\alpha k \log\frac{p}{k}}{f(\alpha)}(1+\eta)$$

Conversely, $P_e(d_{\max}) \to 1$ whenever
$$n \le \max_{\alpha \in [\alpha^*, 1]} \frac{(\alpha - \alpha^*) k \log\frac{p}{k}}{f(\alpha)}(1-\eta)$$

Partial Recovery

[Figure: partial recovery thresholds on a logarithmic scale; plot data omitted in the transcription.]

Conclusion

Contributions:
- New information-theoretic limits for group testing and other support recovery problems
- Exact thresholds (phase transitions), or near-exact

Implications for group testing:
- Optimality of i.i.d. Bernoulli measurements in many cases
- Limited gain from moving to partial recovery

Future work:
- Closing the remaining gaps (better concentration inequalities?)
- Non-i.i.d. measurements
- Applications to other non-linear models
- Information-spectrum-type analysis of other statistical problems

Further Details

Group testing: http://infoscience.epfl.ch/record/206886 (accepted to the 2016 SODA conference)
General models: http://arxiv.org/abs/1501.07440 (submitted to the IEEE Transactions on Information Theory)

References

M. Wainwright, "Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting," IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5728-5741, Dec. 2009.

G. Atia and V. Saligrama, "Boolean compressed sensing and noisy group testing," IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1880-1901, Mar. 2012.

C. Aksoylar, G. Atia, and V. Saligrama, "Sparse signal processing with linear and non-linear observations: A unified Shannon theoretic approach," Apr. 2013, http://arxiv.org/abs/1304.0682.

T. S. Han, Information-Spectrum Methods in Information Theory. Springer, 2003.