Information-Theoretic Limits of Group Testing: Phase Transitions, Noisy Tests, and Partial Recovery

Jonathan Scarlett (jonathan.scarlett@epfl.ch)
Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

Bristol Statistics Seminar, October 2015
Joint work with Volkan Cevher @ LIONS
Group testing

In this talk:
- Defective set: S ~ Uniform over the size-k subsets of {1, ..., p}
- Measurement matrix: X ~ \prod_{i=1}^{n} \prod_{j=1}^{p} P_X(x_{i,j}) with P_X ~ Bernoulli(ν/k)
- Perfect recovery: P_e := P[Ŝ ≠ S]
- Partial recovery: P_e(d_max) := P[ {|S \ Ŝ| > d_max} ∪ {|Ŝ \ S| > d_max} ]
- Goal: conditions on n under which P_e → 0 or P_e(d_max) → 0
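To make the setup concrete, here is a minimal simulation sketch of the model above (my own, not from the slides; the values of p, k, n, ν, ρ are illustrative, and the Bernoulli(ρ) noise only enters in the noisy model introduced later):

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n, nu, rho = 500, 10, 200, np.log(2), 0.05   # illustrative parameters

S = rng.choice(p, size=k, replace=False)           # defective set, uniform over size-k subsets
X = rng.random((n, p)) < nu / k                    # test matrix with i.i.d. Bernoulli(nu/k) entries
Y = X[:, S].any(axis=1)                            # noiseless test outcomes: OR over the defectives
Z = rng.random(n) < rho                            # Bernoulli(rho) flips (used in the noisy model later)
Y_noisy = Y ^ Z                                    # noisy test outcomes

print(Y[:10].astype(int), Y_noisy[:10].astype(int))
```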
Noiseless Group Testing (Exact Recovery)

Observation model:
    Y = 1\Big\{ \bigcup_{i \in S} \{X_i = 1\} \Big\}
(a test is positive if and only if it includes at least one defective item)

Bernoulli measurements, sparsity k = O(p^θ)

Sufficient for P_e → 0:
    n \ge \inf_{\nu > 0} \max\bigg\{ \frac{\theta}{e^{-\nu}\nu(1-\theta)}, \; \frac{1}{H_2(e^{-\nu})} \bigg\} \, k \log\frac{p}{k} \, (1+\eta)

Necessary for P_e ↛ 1:
    n \ge \frac{k \log\frac{p}{k}}{\log 2} \, (1-\eta)
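As a sanity check on the sufficient condition above, the sketch below (my own, not from the slides; natural logarithms assumed throughout) numerically evaluates the constant c(θ) = inf_{ν>0} max{ θ/(e^{-ν}ν(1−θ)), 1/H_2(e^{-ν}) } over a grid of ν; for θ ≤ 1/3 it matches the converse constant 1/log 2, which is the content of the next slide.

```python
import numpy as np

def H2(q):
    """Binary entropy in nats."""
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

def noiseless_constant(theta, nus=np.linspace(1e-3, 5.0, 5000)):
    # c(theta) = inf_nu max{ theta / (e^{-nu} nu (1-theta)), 1 / H2(e^{-nu}) }
    vals = np.maximum(theta / (np.exp(-nus) * nus * (1 - theta)),
                      1.0 / H2(np.exp(-nus)))
    return vals.min()

for theta in [0.1, 1/3, 0.5, 0.8]:
    print(f"theta={theta:.3f}  c(theta)={noiseless_constant(theta):.4f}  1/log2={1/np.log(2):.4f}")
```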
Noiseless Group Testing (Exact Recovery)

[Plot omitted: achievability and converse thresholds as a function of θ ∈ (0, 1).]

Key implication: i.i.d. Bernoulli measurements are asymptotically as good as optimal adaptive measurements when k = O(p^{1/3}).
Noisy Group Testing (Exact Recovery)

Observation model:
    Y = 1\Big\{ \bigcup_{i \in S} \{X_i = 1\} \Big\} \oplus Z, \qquad Z \sim \mathrm{Bernoulli}(\rho)
(the noiseless test outcome is flipped with probability ρ)

Bernoulli measurements, sparsity k = O(p^θ)

Sufficient for P_e → 0:
    n \ge \inf_{\delta_2 \in (0,1)} \max\bigg\{ \frac{1}{\log 2 - H_2(\rho)}, \; \zeta(\rho, \delta_2, \theta) \bigg\} \, k \log\frac{p}{k} \, (1+\eta)

where
    \zeta(\rho, \delta_2, \theta) := \frac{2}{\log 2} \max\bigg\{ \frac{2\big(1 + \frac{1}{3}\delta_2(1-2\rho)\big)}{\delta_2^2 (1-2\rho)^2} \, \frac{\theta}{1-\theta}, \;\; \frac{1}{(1-2\rho)\log\frac{1-\rho}{\rho}\,(1-\delta_2)} \, \frac{1+2\theta}{1-\theta} \bigg\}

Necessary for P_e ↛ 1:
    n \ge \frac{k \log\frac{p}{k}}{\log 2 - H_2(\rho)} \, (1-\eta)
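A small numerical sketch (again my own, not from the slides) of how the noise level enters the matching term above: it evaluates k log(p/k) / (log 2 − H_2(ρ)) for a few crossover probabilities ρ, illustrating how the required number of tests grows as H_2(ρ) approaches log 2.

```python
import numpy as np

def H2(q):
    """Binary entropy in nats."""
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

def noisy_test_threshold(p, k, rho):
    # k log(p/k) / (log 2 - H2(rho)): the common term in the sufficient/necessary conditions
    return k * np.log(p / k) / (np.log(2) - H2(rho))

for rho in [0.001, 0.01, 0.05, 0.11, 0.2]:
    print(f"rho={rho:<6} n ~ {noisy_test_threshold(p=10_000, k=10, rho=rho):.0f}")
```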
Noisy Group Testing (Exact Recovery)

[Plot omitted: achievability and converse thresholds as a function of θ ∈ (0, 1) for a fixed noise level ρ.]

Key implication: i.i.d. Bernoulli measurements are asymptotically as good as optimal adaptive measurements when k = O(p^θ) for sufficiently small θ.
Partial Recovery

Recovery criterion:
    P_e(d_max) := P[ {|S \ Ŝ| > d_max} ∪ {|Ŝ \ S| > d_max} ],  where d_max = α*k for some α* ∈ (0, 1)

Sufficient for P_e(d_max) → 0:
    n \ge \frac{k \log\frac{p}{k}}{\log 2 - H_2(\rho)} \, (1+\eta)

Necessary for P_e(d_max) ↛ 1:
    n \ge \frac{(1-\alpha^*)\, k \log\frac{p}{k}}{\log 2 - H_2(\rho)} \, (1-\eta)

For small θ, the reduction in the number of tests gained by allowing d_max errors is at most a factor of 1 − α* asymptotically.
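For concreteness, a tiny hypothetical helper (not part of the slides) that checks the partial-recovery criterion for a given estimate:

```python
def within_dmax(S, S_hat, d_max):
    """True iff |S - S_hat| <= d_max and |S_hat - S| <= d_max (set differences)."""
    S, S_hat = set(S), set(S_hat)
    return len(S - S_hat) <= d_max and len(S_hat - S) <= d_max

# One missed defective and one false alarm are tolerated when d_max = 1:
print(within_dmax({1, 2, 3, 4}, {1, 2, 3, 7}, d_max=1))  # True
print(within_dmax({1, 2, 3, 4}, {1, 2, 7, 8}, d_max=1))  # False
```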
Channel coding

    Message S → Encoder → Codeword X_S → Channel P^n_{Y|X_S} → Output Y → Decoder → Estimate Ŝ

e.g., see [Wainwright, 2009], [Atia and Saligrama, 2012], [Aksoylar et al., 2013]
Thresholding Techniques for Channel Coding

Mutual information:
    I(X; Y) := \sum_{x,y} P_{XY}(x, y) \log \frac{P_{Y|X}(y|x)}{P_Y(y)}

Information density:
    \imath(x; y) := \log \frac{P_{Y|X}(y|x)}{P_Y(y)}

Non-asymptotic channel coding bounds [Han, 2003]:
    P_e \le P\big[ \imath^n(X; Y) \le nR - \log\delta \big] + \delta
    P_e \ge P\big[ \imath^n(X; Y) \le nR + \log\delta \big] - \delta
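To illustrate how such thresholding bounds behave, here is a rough Monte Carlo sketch (my own, not from the slides) for a binary symmetric channel with crossover probability ρ and uniform inputs: the information density per use equals log 2(1−ρ) on agreement and log 2ρ on disagreement, and for a rate R below I(X;Y) the probability in the first bound is already negligible at moderate block lengths.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, delta, trials = 2000, 0.11, 1e-3, 5_000

H2 = -rho * np.log(rho) - (1 - rho) * np.log(1 - rho)
I = np.log(2) - H2                    # mutual information per channel use (nats)
R = 0.8 * I                           # a rate strictly below I

# i^n(X; Y) = sum over uses of log(2(1-rho)) [agreement] or log(2 rho) [disagreement]
agree = rng.random((trials, n)) >= rho
i_n = np.where(agree, np.log(2 * (1 - rho)), np.log(2 * rho)).sum(axis=1)

# Empirical estimate of P[i^n(X; Y) <= nR - log(delta)] from the first bound
print(np.mean(i_n <= n * R - np.log(delta)))   # essentially zero here
```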
Non-Asymptotic Bounds

Information density:
    \imath(x_{s_{dif}}; y \mid x_{s_{eq}}) := \log \frac{P_{Y|X_{s_{dif}} X_{s_{eq}}}(y \mid x_{s_{dif}}, x_{s_{eq}})}{P_{Y|X_{s_{eq}}}(y \mid x_{s_{eq}})}

where (s_dif, s_eq) is a partition of s.

Achievability:
    P_e \le P\bigg[ \bigcup_{(s_{dif}, s_{eq})} \Big\{ \imath^n(X_{s_{dif}}; Y \mid X_{s_{eq}}) \le \log \frac{\binom{p-k}{|s_{dif}|}}{\delta_1} \Big\} \bigg] + \delta_1

Converse:
    P_e \ge P\bigg[ \imath^n(X_{s_{dif}}; Y \mid X_{s_{eq}}) \le \log \binom{p-k+|s_{dif}|}{|s_{dif}|} + \log \delta_1 \bigg] - \delta_1
Achievability Bound

Decoder: search for the unique size-k set s such that
    \imath^n(x_{s_{dif}}; y \mid x_{s_{eq}}) > \gamma_{|s_{dif}|}
for all partitions (s_dif, s_eq) of s with s_dif ≠ ∅.

Initial bound:
    P_e \le P\bigg[ \bigcup_{(s_{dif}, s_{eq})} \Big\{ \imath^n(X_{s_{dif}}; Y \mid X_{s_{eq}}) \le \gamma_{|s_{dif}|} \Big\} \bigg] + \sum_{\bar{s} \ne s} P\Big[ \imath^n(X_{\bar{s} \setminus s}; Y \mid X_{\bar{s} \cap s}) > \gamma_{|\bar{s} \setminus s|} \Big]
where the sum is over all other size-k candidate sets \bar{s}.

Analysis of the second term (with ℓ := |\bar{s} \setminus s|); the inequality uses the indicator to change measure from P^n_{Y|X_{s_{eq}}} to P^n_{Y|X_{s_{dif}} X_{s_{eq}}} at a cost of e^{-γ_ℓ}:
    P\big[ \imath^n(X_{\bar{s} \setminus s}; Y \mid X_{\bar{s} \cap s}) > \gamma_\ell \big]
      = \sum_{x_{\bar{s} \cap s},\, x_{\bar{s} \setminus s},\, y} P_X^{n(k-\ell)}(x_{\bar{s} \cap s})\, P_X^{n\ell}(x_{\bar{s} \setminus s})\, P^n_{Y|X_{s_{eq}}}(y \mid x_{\bar{s} \cap s}) \, 1\bigg\{ \log \frac{P^n_{Y|X_{s_{dif}} X_{s_{eq}}}(y \mid x_{\bar{s} \setminus s}, x_{\bar{s} \cap s})}{P^n_{Y|X_{s_{eq}}}(y \mid x_{\bar{s} \cap s})} > \gamma_\ell \bigg\}
      \le \sum_{x_{\bar{s} \cap s},\, x_{\bar{s} \setminus s},\, y} P_X^{n(k-\ell)}(x_{\bar{s} \cap s})\, P_X^{n\ell}(x_{\bar{s} \setminus s})\, P^n_{Y|X_{s_{dif}} X_{s_{eq}}}(y \mid x_{\bar{s} \setminus s}, x_{\bar{s} \cap s}) \, e^{-\gamma_\ell}
      = e^{-\gamma_\ell}

After some re-arrangements and choosing {γ_ℓ} appropriately,
    P_e \le P\bigg[ \bigcup_{(s_{dif}, s_{eq})} \Big\{ \imath^n(X_{s_{dif}}; Y \mid X_{s_{eq}}) \le \log \frac{\binom{p-k}{|s_{dif}|}}{\delta_1} \Big\} \bigg] + \delta_1
Converse Bound

Consider a genie that reveals part of the defective set, S_eq ⊂ S, to the decoder; the decoder is left to estimate S_dif := S \ S_eq.

Starting point:
    P_e(s_{eq}) \ge P[A(s_{eq})] - P[A(s_{eq}) \cap \text{no error}],
where A(s_{eq}) := \{ \imath^n(X_{S_{dif}}; Y \mid X_{s_{eq}}) \le \gamma \}.

Analysis of the second term:
    P[A(s_{eq}) \cap \text{no error}]
      = \frac{1}{\binom{p-k+\ell}{\ell}} \sum_{s_{dif} \in S_{dif}(s_{eq})} \; \sum_{(x,y) \in D(s_{dif} \mid s_{eq})} P_X^n(x)\, P^n_{Y|X_{s_{dif}} X_{s_{eq}}}(y \mid x_{s_{dif}}, x_{s_{eq}}) \, 1\bigg\{ \log \frac{P^n_{Y|X_{s_{dif}} X_{s_{eq}}}(y \mid x_{s_{dif}}, x_{s_{eq}})}{P^n_{Y|X_{s_{eq}}}(y \mid x_{s_{eq}})} \le \gamma \bigg\}
      \le \frac{1}{\binom{p-k+\ell}{\ell}} \sum_{s_{dif} \in S_{dif}(s_{eq})} \; \sum_{(x,y) \in D(s_{dif} \mid s_{eq})} P_X^n(x)\, P^n_{Y|X_{s_{eq}}}(y \mid x_{s_{eq}})\, e^{\gamma}
      = \frac{e^{\gamma}}{\binom{p-k+\ell}{\ell}}

After some re-arrangements and choosing {γ_ℓ} appropriately,
    P_e \ge P\bigg[ \imath^n(X_{s_{dif}}; Y \mid X_{s_{eq}}) \le \log \binom{p-k+|s_{dif}|}{|s_{dif}|} + \log \delta_1 \bigg] - \delta_1
Non-Asymptotic Bounds (recap)

Achievability:
    P_e \le P\bigg[ \bigcup_{(s_{dif}, s_{eq})} \Big\{ \imath^n(X_{s_{dif}}; Y \mid X_{s_{eq}}) \le \log \frac{\binom{p-k}{|s_{dif}|}}{\delta_1} \Big\} \bigg] + \delta_1

Converse:
    P_e \ge P\bigg[ \imath^n(X_{s_{dif}}; Y \mid X_{s_{eq}}) \le \log \binom{p-k+|s_{dif}|}{|s_{dif}|} + \log \delta_1 \bigg] - \delta_1
Application Techniques

General steps for application:
1. Bound the tail probabilities P[ ı^n(X_{s_dif}; Y | X_{s_eq}) ≤ E[ı^n] − nδ ] and P[ ı^n(X_{s_dif}; Y | X_{s_eq}) ≥ E[ı^n] + nδ ]
2. Control and simplify the remainder terms

General form of corollaries: P_e → 0 if
    n \ge \max_{(s_{dif}, s_{eq})} \frac{\log \binom{p-k}{|s_{dif}|}}{I(X_{s_{dif}}; Y \mid X_{s_{eq}})} \, (1+\eta)
and P_e → 1 if
    n \le \max_{(s_{dif}, s_{eq})} \frac{\log \binom{p-k+|s_{dif}|}{|s_{dif}|}}{I(X_{s_{dif}}; Y \mid X_{s_{eq}})} \, (1-\eta)
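As an illustration of the achievability corollary, the sketch below (my own, not from the slides) evaluates max_ℓ log C(p−k, ℓ) / I_ℓ for the noiseless model, where ℓ = |s_dif|. Since Y is a deterministic OR of the k indicators, a direct calculation gives I_ℓ = (1−q)^{k−ℓ} H_2((1−q)^ℓ) with q = ν/k: only tests containing none of s_eq reveal information about s_dif.

```python
from math import comb, log

def H2(q):
    """Binary entropy in nats."""
    return 0.0 if q in (0.0, 1.0) else -q * log(q) - (1 - q) * log(1 - q)

def noiseless_corollary_threshold(p, k, nu):
    q = nu / k
    best = 0.0
    for l in range(1, k + 1):
        # I_l = P[none of s_eq in the test] * H(Y | that event)
        I_l = (1 - q) ** (k - l) * H2((1 - q) ** l)
        best = max(best, log(comb(p - k, l)) / I_l)
    return best

# Rough required number of tests (ignoring the (1 + eta) factor)
print(noiseless_corollary_threshold(p=10_000, k=10, nu=log(2)))
```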
Concentration Inequalities

Noiseless and noisy cases (this one alone is enough for partial recovery):
    P\Big[ \big| \imath^n(X_{s_{dif}}; Y \mid X_{s_{eq}}) - n I^{(\ell)} \big| \ge n\delta \Big] \le 2\exp\Big( -\frac{\delta^2 n}{4(8+\delta)} \Big)
- Proved using Bernstein's inequality
- The only property used is that |Y| = 2

Noiseless case:
    P\Big[ \imath^n \le n I^{(\ell)}(1-\delta_2) \Big] \le \exp\Big( -n \frac{\ell}{k} e^{-\nu} \nu \big( (1-\delta_2)\log(1-\delta_2) + \delta_2 \big) (1-\epsilon) \Big)
- Proved by writing ı^n(X_{s_dif}; Y | X_{s_eq}) in terms of Binomial random variables, then applying Binomial tail bounds

Noisy case with crossover probability ρ:
    P\Big[ \imath^n \le n I^{(\ell)}(1-\delta_2) \Big] \le \exp\Big( -n \frac{\ell}{k} e^{-\nu} \nu \, \frac{\delta_2^2(1-2\rho)^2}{2\big(1 + \frac{1}{3}\delta_2(1-2\rho)\big)} \, (1-\epsilon) \Big)
- Proved using Bennett's inequality; may be somewhat crude
Recap of Results

Noiseless case (exact recovery):
- Sufficient for P_e → 0:  n ≥ inf_{ν>0} max{ θ/(e^{-ν}ν(1−θ)), 1/H_2(e^{-ν}) } · k log(p/k) · (1+η)
- Necessary for P_e ↛ 1:  n ≥ (k log(p/k) / log 2) (1−η)

Noisy case (exact recovery):
- Sufficient for P_e → 0:  n ≥ inf_{δ_2 ∈ (0,1)} max{ 1/(log 2 − H_2(ρ)), ζ(ρ, δ_2, θ) } · k log(p/k) · (1+η)
- Necessary for P_e ↛ 1:  n ≥ (k log(p/k) / (log 2 − H_2(ρ))) (1−η)

Noiseless and noisy cases (partial recovery):
- Sufficient for P_e(d_max) → 0:  n ≥ (k log(p/k) / (log 2 − H_2(ρ))) (1+η)
- Necessary for P_e(d_max) ↛ 1:  n ≥ ((1−α*) k log(p/k) / (log 2 − H_2(ρ))) (1−η)
More General Support Recovery

X ∈ R^{n×p},  Y ∈ R^n,  β ∈ R^p

General probabilistic models:
- Support: S ~ Uniform over the size-k subsets of {1, ..., p}
- Measurement matrix: X ~ \prod_{i=1}^{n} \prod_{j=1}^{p} P_X(x_{i,j})
- Non-zero entries: β_S ~ P_{β_S}
- Observations: (Y | X, β) ~ P^n_{Y|X_S β_S}

Support recovery: P_e := P[Ŝ ≠ S]
Partial recovery: P_e(d_max) := P[ {|S \ Ŝ| > d_max} ∪ {|Ŝ \ S| > d_max} ]

Goal: conditions on n for P_e → 0 or P_e(d_max) → 0
Channel coding

    Channel state β_S ~ P_{β_S}
    Message S → Encoder → Codeword X_S → Channel P^n_{Y|X_S β_S} → Output Y → Decoder → Estimate Ŝ

e.g., see [Wainwright, 2009], [Atia and Saligrama, 2012], [Aksoylar et al., 2013]
More General Support Recovery

Steps for applying the generalized non-asymptotic bounds:
1. Construct a typical set T_β of non-zero entries β_S
2. Bound the conditional tail probabilities P[ ı^n(X_{s_dif}; Y | X_{s_eq}, β_s) ≤ E[ı^n] − nδ | β_s = b_s ] and the corresponding upper tails
3. Control and simplify the remainder terms

General form of corollaries: P_e → 0 if
    n \ge \max_{(s_{dif}, s_{eq})} \frac{\log \binom{p-k}{|s_{dif}|}}{I(X_{s_{dif}}; Y \mid X_{s_{eq}}, \beta_s = b_s)} \, (1+\eta)  for all b_s ∈ T_β
and P_e → 1 if
    n \le \max_{(s_{dif}, s_{eq})} \frac{\log \binom{p-k+|s_{dif}|}{|s_{dif}|}}{I(X_{s_{dif}}; Y \mid X_{s_{eq}}, \beta_s = b_s)} \, (1-\eta)  for all b_s ∈ T_β
Exact Recovery for Linear and 1-bit Models

Observation models:
    Y = ⟨X, β⟩ + Z          (linear)
    Y = sign(⟨X, β⟩ + Z)    (1-bit)

Case 1: Gaussian X, discrete β_S, sparsity k = Θ(1), low SNR
Necessary and sufficient conditions:
    Linear:  n ~ \max_{(s_{dif}, s_{eq})} \frac{|s_{dif}| \log p}{\frac{1}{2\sigma^2} \sum_{i \in s_{dif}} b_i^2}
    1-bit:   n ~ \max_{(s_{dif}, s_{eq})} \frac{|s_{dif}| \log p}{\frac{1}{\pi\sigma^2} \sum_{i \in s_{dif}} b_i^2}
Only a factor of π/2 difference.

Case 2: Gaussian X, fixed β_S, sparsity k = Θ(p), moderate SNR
Conditions:
    Linear: Θ(p) sufficient
    1-bit:  Ω(p log p) necessary
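A minimal sketch (my own, not from the slides; the parameter values are illustrative) of the two observation models above, with Gaussian measurements and additive Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(2)
p, k, n, sigma = 200, 5, 100, 1.0      # illustrative parameters

S = rng.choice(p, size=k, replace=False)
beta = np.zeros(p)
beta[S] = rng.normal(size=k)           # non-zero entries (Gaussian here, for illustration)
X = rng.normal(size=(n, p))            # Gaussian measurement matrix
Z = sigma * rng.normal(size=n)         # additive Gaussian noise

Y_linear = X @ beta + Z                # linear model
Y_1bit = np.sign(X @ beta + Z)         # 1-bit model
print(Y_linear[:5], Y_1bit[:5])
```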
Partial Recovery for Linear and 1-bit Models

Partial recovery:
    P_e(d_max) := P[ {|S \ Ŝ| > d_max} ∪ {|Ŝ \ S| > d_max} ]

Linear and 1-bit models; Gaussian X, Gaussian β_S, sparsity k = o(p), allowed errors d_max = α*k

Sufficient for P_e(d_max) → 0:
    n \ge \max_{\alpha \in [\alpha^*, 1]} \frac{\alpha k \log\frac{p}{k}}{f(\alpha)} \, (1+\eta)

Necessary for P_e(d_max) ↛ 1:
    n \ge \max_{\alpha \in [\alpha^*, 1]} \frac{(\alpha - \alpha^*) k \log\frac{p}{k}}{f(\alpha)} \, (1-\eta)
Partial Recovery

[Plot omitted: partial-recovery performance for the linear and 1-bit models; log-scale vertical axis, horizontal axis ranging from −20 to 100.]
Conclusion

Contributions:
- New information-theoretic limits for group testing and other support recovery problems
- Exact thresholds (phase transitions), or near-exact

Implications for group testing:
- Optimality of i.i.d. Bernoulli measurements in many cases
- Limited gain from moving to partial recovery

Future work:
- Closing the remaining gaps (better concentration inequalities?)
- Non-i.i.d. measurements
- Applications to other non-linear models
- Information-spectrum-type analysis of other statistical problems
Further Details

Group testing: http://infoscience.epfl.ch/record/206886 (accepted to SODA 2016)
General models: http://arxiv.org/abs/1501.07440 (submitted to IEEE Transactions on Information Theory)
References

M. Wainwright, "Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting," IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5728–5741, Dec. 2009.

G. Atia and V. Saligrama, "Boolean compressed sensing and noisy group testing," IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1880–1901, Mar. 2012.

C. Aksoylar, G. Atia, and V. Saligrama, "Sparse signal processing with linear and non-linear observations: A unified Shannon theoretic approach," Apr. 2013, http://arxiv.org/abs/1304.0682.

T. S. Han, Information-Spectrum Methods in Information Theory. Springer, 2003.