Stats 300C: Theory of Statistics — Spring 2018

Lecture 2: April 04, 2018

Prof. Emmanuel Candès — Scribe: Paulo Orenstein; edited by Stephen Bates, XY Han

Outline

Agenda: Global testing
1. Needle in a Haystack Problem
2. Threshold Phenomenon
3. Optimality of Bonferroni's Global Test

Last time: We introduced Bonferroni's global test. In this lecture, we show that Bonferroni's method is in some sense optimal for testing against sparse alternatives. This claim relies on power calculations, which require us to specify alternatives. In this lecture, we consider an independent Gaussian sequence model:

$$y_i \overset{\text{ind}}{\sim} N(\mu_i, 1), \qquad i = 1, \ldots, n.$$

We are interested in the $n$ hypotheses

$$H_{0,i}: \mu_i = 0,$$

so that in this case the global null asserts that all the means $\mu_i$ vanish. Under the alternative $H_1$, some means $\mu_i \ne 0$.

We saw that Bonferroni's method rejects if $\max_i y_i \ge z(\alpha/n)$ in the one-sided case, and if $\max_i |y_i| \ge z(\alpha/2n)$ in the two-sided case. Put another way, Bonferroni rejects the global null hypothesis if the largest $y_i$ is large enough. For the special case where the $n$ tests are mutually independent, we also calculated

$$P_{H_0}(\text{Type I Error}) := q(\alpha) \to 1 - e^{-\alpha} \approx \alpha.$$

2 Magnitude of Bonferroni's Threshold

How large is our threshold $t = z(\alpha/n)$ (one-sided) or $z(\alpha/2n)$ (two-sided)? If $\phi(t)$ is the standard normal pdf, then we can derive by Markov's inequality the useful result

$$\frac{\phi(t)}{t}\left(1 - \frac{1}{t^2}\right) \le P(Z > t) \le \frac{\phi(t)}{t},$$
[Figure 1: $z(\alpha/n)$ ("True"), the approximation $B\left(1 - \frac{\log B}{B^2}\right)$ ("Approx"), and $\sqrt{2\log n}$, plotted for $n \in \{10^2, \ldots, 10^{12}\}$; top panel $\alpha = 0.05$, bottom panel $\alpha = 0.01$.]

where $Z \sim N(0,1)$. That is, for large $t$, $\phi(t)/t$ is a good approximation to the normal tail probability, and hence to the Gaussian quantile. Roughly speaking, then, we solve

$$P(Z > t) = \alpha/n \approx \frac{\phi(t)}{t}$$

for $t$. Holding $\alpha$ fixed, we can then show that for large $n$,

$$z(\alpha/n) = \sqrt{2\log n}\left[1 - \frac{\log\log n}{4\log n} + o\!\left(\frac{\log\log n}{\log n}\right)\right].$$

Hence, the quantiles grow like $\sqrt{2\log n}$, with a small correction factor. Figure 1 plots $z(\alpha/n)$ and $\sqrt{2\log n}$. Notice that Bonferroni then basically amounts to rejecting when $\max_i y_i > \sqrt{2\log n}$. One remarkable fact about all of this is that there is (asymptotically) no dependence on $\alpha$. That is, whatever $\alpha$ we use, our rejection threshold for $\max_i y_i$ is asymptotic to $\sqrt{2\log n}$. This is a consequence of the fact that, under $H_0$,

$$\frac{\max_i y_i}{\sqrt{2\log n}} \overset{p}{\to} 1.$$
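As a numerical aside (not part of the original notes), both the tail-probability sandwich and the $\sqrt{2\log n}$ growth of the quantiles are easy to check in a few lines of Python. The sketch below assumes `scipy` is available; `norm.isf(alpha/n)` gives $z(\alpha/n)$ and `norm.sf(t)` gives $P(Z > t)$.

```python
import math
from scipy.stats import norm

def phi(t):
    """Standard normal pdf."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

# 1) The sandwich bound: phi(t)/t * (1 - 1/t^2) <= P(Z > t) <= phi(t)/t.
for t in [2.0, 3.0, 5.0]:
    tail = norm.sf(t)  # exact P(Z > t)
    assert phi(t) / t * (1 - 1 / t**2) <= tail <= phi(t) / t

# 2) z(alpha/n) grows like sqrt(2 log n), with a slowly vanishing correction.
alpha = 0.05
for n in [10**2, 10**6, 10**12]:
    exact = norm.isf(alpha / n)         # z(alpha/n)
    crude = math.sqrt(2 * math.log(n))  # first-order term
    print(f"n = {n:.0e}: z(alpha/n) = {exact:.3f}, "
          f"sqrt(2 log n) = {crude:.3f}, ratio = {exact / crude:.4f}")
```

The printed ratios drift toward 1 as $n$ grows, illustrating the asymptotic insensitivity to $\alpha$.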
In other words, the first-order term $\sqrt{2\log n}$ asymptotically dominates the terms containing $\alpha$. For finite samples, it is of course possible to develop approximations to $z(\alpha/n)$ which are more accurate than $\sqrt{2\log n}$. Set

$$B = \sqrt{2\log(n/\alpha) - \log(2\pi)} = \sqrt{2\log(n/\alpha) - 1.8379}.$$

Then

$$z(\alpha/n) \approx B\left(1 - \frac{\log B}{B^2}\right).$$

Figure 1 shows that this approximation is nearly indistinguishable from $z(\alpha/n)$, even for modest values of $n$.

3 Sharp Detection Threshold for the Needle in a Haystack

Asymptotic Power: Consider a sequence of problems with $n \to \infty$. How powerful is Bonferroni, or, put another way, what is the limiting power $P_{H_1}(\max_i y_i > z(\alpha/n))$?

Needle in a Haystack Problem: To answer the question above, we need to specify alternative hypotheses. The needle in a haystack problem is this: under the alternative, exactly one $\mu_i = \mu > 0$. We don't know which one.

For the needle in the haystack problem, we shall see that the answer to the power question depends very sensitively on the limiting ratio $\mu^{(n)}/\sqrt{2\log n}$, where $\mu^{(n)} > 0$ is the value of the single nonzero mean. (The $(n)$ in the superscript captures the dependence of $\mu^{(n)}$ on $n$.) There are two cases.

1. Asymptotic full power above threshold: Suppose $\mu^{(n)} > (1+\varepsilon)\sqrt{2\log n}$. Then, assuming without loss of generality that $\mu_1 = \mu^{(n)}$,

$$P_{H_1}\left(\max_i y_i > z(\alpha/n)\right) \ge P\left(y_1 > z(\alpha/n)\right) = P\left(z > z(\alpha/n) - \mu^{(n)}\right) \to 1.$$

In the second-to-last step, we use the fact that $y_1 = z + \mu^{(n)}$, where $z$ follows $N(0,1)$; the limit holds because $z(\alpha/n) \sim \sqrt{2\log n}$, so $z(\alpha/n) - \mu^{(n)} \to -\infty$.

2. Asymptotic powerlessness below threshold: Suppose $\mu^{(n)} < (1-\varepsilon)\sqrt{2\log n}$. Then

$$P_{H_1}\left(\max_i y_i > z(\alpha/n)\right) \le P\left(y_1 > z(\alpha/n)\right) + P\left(\max_{i>1} y_i > z(\alpha/n)\right) = P\left(z > z(\alpha/n) - \mu^{(n)}\right) + P\left(\max_{i>1} z_i > z(\alpha/n)\right) \to 0 + q(\alpha) \approx \alpha.$$

This is a bad test because we can obtain the same level and power by flipping a biased coin that rejects $\alpha$ of the time.
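A quick Monte Carlo sketch makes the two regimes vivid. This simulation is illustrative and not part of the notes; the choices $n = 10^5$, $\varepsilon = 0.5$, and 200 repetitions are arbitrary.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, alpha, eps, reps = 100_000, 0.05, 0.5, 200
cutoff = norm.isf(alpha / n)            # one-sided Bonferroni cutoff z(alpha/n)
root = np.sqrt(2 * np.log(n))           # the detection threshold sqrt(2 log n)

def power(mu):
    """Fraction of repetitions in which max y_i exceeds the cutoff
    when a single coordinate has mean mu."""
    hits = 0
    for _ in range(reps):
        y = rng.standard_normal(n)
        y[0] += mu                      # plant the needle (its position is irrelevant)
        hits += bool(y.max() > cutoff)
    return hits / reps

p_above = power((1 + eps) * root)       # supercritical regime
p_below = power((1 - eps) * root)       # subcritical regime
print(p_above, p_below)
```

Above the threshold the empirical power is close to 1; below it, the rejection rate stays near the nominal level $\alpha$, as predicted.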
Conclusion: We effectively see that $\sqrt{2\log n}$ constitutes a sharp detection threshold. When $\mu^{(n)}/\sqrt{2\log n} \ge 1 + \varepsilon$, we always detect the needle $\mu > 0$. We can even achieve $P_{H_0}(\text{Type I Error}) \to 0$ and $P_{H_1}(\text{Type II Error}) \to 0$ if we use $\sqrt{2\log n}$ instead of $z(\alpha/n)$ as our threshold. In other words, asymptotically we make no mistakes.

However, when $\mu^{(n)}/\sqrt{2\log n} \le 1 - \varepsilon$, with $q(\alpha) = 1 - e^{-\alpha} \approx \alpha$ being the asymptotic size, Bonferroni's global test gives $P(\text{Type I Error}) \to q(\alpha)$ and $P(\text{Type II Error}) \to 1 - q(\alpha)$; that is, it does no better than flipping a coin.

Can we do better than Bonferroni? When $\mu^{(n)} = (1-\varepsilon)\sqrt{2\log n}$, we saw that, roughly, $P_{H_0}(\text{Type I Error}) \approx \alpha$ and $P_{H_1}(\text{Type II Error}) \approx 1 - \alpha$, so we are doing no better than flipping a biased coin that disregards the actual data. This is in fact true for any test in this scenario. To see this, we first reduce our composite hypothesis to a simple one, and then we show that even the optimal test given by the Neyman–Pearson Lemma does no better than flipping a coin.

4 Optimality of Detection Threshold

Bayesian Decision Problem: Consider

$$H_0: \mu_i = 0 \text{ for all } i$$
$$H_1: \{\mu_i\} \sim \pi,$$

where $\pi$ selects a coordinate $I$ uniformly at random and sets $\mu_I = \mu$, with all other $\mu_i = 0$. This setup differs from the previous problem in the important respect that $H_0$ and $H_1$ are both simple hypotheses, and we can now apply Neyman–Pearson. The optimal test rejects for large values of the likelihood ratio. The densities under the null and the alternative are given by

$$f_0(y) = \prod_{j=1}^n \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}y_j^2},$$

$$f_1(y) = \frac{1}{n} \sum_{i=1}^n \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(y_i-\mu)^2} \prod_{j: j \ne i} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}y_j^2}.$$

After cancellations, the likelihood ratio is given by

$$L = \frac{f_1}{f_0} = \frac{1}{n} \sum_{i=1}^n e^{y_i \mu - \frac{1}{2}\mu^2}.$$
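A practical note, not from the lecture: evaluating $L$ naively can overflow once $\mu$ is of order $\sqrt{2\log n}$, since individual terms $e^{y_i\mu - \mu^2/2}$ can be astronomically large or small. A standard fix is to sum in log space; the sketch below uses `scipy.special.logsumexp`.

```python
import numpy as np
from scipy.special import logsumexp

def likelihood_ratio(y, mu):
    """L = (1/n) * sum_i exp(y_i * mu - mu^2 / 2), computed stably in log space."""
    return float(np.exp(logsumexp(y * mu - mu**2 / 2) - np.log(len(y))))

rng = np.random.default_rng(1)
y = rng.standard_normal(10_000)   # data drawn under H_0
L_val = likelihood_ratio(y, mu=2.0)
print(L_val)                      # E_0[L] = 1, so values hover around 1
```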
Properties of $L$ under $H_0$: Writing $X_i = e^{y_i\mu - \frac{1}{2}\mu^2}$, we have that under $H_0$ the $X_i$ are iid and $L = \frac{1}{n}\sum_i X_i$; this is a sample average with mean $E X_1$ and variance $\frac{1}{n}\operatorname{Var} X_1$.

First impulse: We would like to apply the CLT; however, because $\mu$ is not fixed but rather $\mu^{(n)} = (1-\varepsilon)\sqrt{2\log n}$, we would need a triangular array argument. The (sufficient but not necessary) Lyapunov condition, for instance, is violated for $q = 3$:

$$\frac{\sum_i E\left|X_i - E X_i\right|^3}{\left[\sum_i \operatorname{Var}(X_i)\right]^{3/2}} \not\to 0 \quad \text{as } n \to \infty.$$

We shall, therefore, focus on deriving a weaker result.

Proposition 1. If $\mu = (1-\varepsilon)\sqrt{2\log n}$, then $L \overset{p}{\to} 1$.

Proof. Provided at the end of the notes.

This already hints at the fact that the likelihood ratio test cannot do very well. But before formally proving this (the proof will not be given in this lecture), we skip to the punchline.

Proposition 2. Set the threshold $T_n(\alpha)$ such that $P_0(L \ge T_n(\alpha)) = \alpha$. Then

$$\lim_{n\to\infty} P(\text{Type II error}) = 1 - \alpha.$$

Proof. Note that

$$P(\text{Type II Error}) = P_1\left(L \le T_n(\alpha)\right) = \int \mathbf{1}_{\{L \le T_n(\alpha)\}}\, dP_1 = \int \mathbf{1}_{\{L \le T_n(\alpha)\}}\, L\, dP_0$$

$$= \int \mathbf{1}_{\{L \le T_n(\alpha)\}}\, dP_0 + \int \mathbf{1}_{\{L \le T_n(\alpha)\}}(L-1)\, dP_0 = (1-\alpha) + \int \mathbf{1}_{\{L \le T_n(\alpha)\}}(L-1)\, dP_0 \to (1-\alpha).$$

The last claim follows from the fact that $L \overset{p}{\to} 1$. We can make this rigorous as follows: let $Z_n = \mathbf{1}_{\{L \le T_n(\alpha)\}}(L-1)$. First, $Z_n \overset{p}{\to} 0$. Second, because $L \overset{p}{\to} 1$, $T_n(\alpha)$ is uniformly bounded, and hence so is $Z_n$. The bounded convergence theorem [1, Section 3.6] then gives that $E|Z_n| \to 0$ (this is a simple result that can also be checked by hand).
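Proposition 1 can also be checked empirically. The sketch below (illustrative parameters, not from the notes: $n = 10^6$, $\varepsilon = 0.4$, 20 repetitions) simulates $L$ under $H_0$ with $\mu = (1-\varepsilon)\sqrt{2\log n}$ and watches it concentrate near 1.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(2)
n, eps, reps = 1_000_000, 0.4, 20
mu = (1 - eps) * np.sqrt(2 * np.log(n))   # below the detection threshold

vals = []
for _ in range(reps):
    y = rng.standard_normal(n)            # data drawn under H_0
    log_L = logsumexp(y * mu - mu**2 / 2) - np.log(n)
    vals.append(float(np.exp(log_L)))

print(np.mean(vals), np.std(vals))        # sample mean of L sits near 1
```

Note the individual draws of $L$ are heavy-tailed, so the concentration is slow; this is exactly why the proof truncates at $T_n$ rather than invoking a CLT.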
Conclusion: If $\mu^{(n)} = (1-\varepsilon)\sqrt{2\log n}$, then the optimal test has

$$P(\text{Type I Error}) + P(\text{Type II Error}) \to 1.$$

Broad Conclusion: Let's think back to the original problem, with $H_1: \mu_i > 0$ for exactly one $i$, a composite of $n$ alternatives. We have shown today that the average Type II error (Bayes risk) of any level-$\alpha$ procedure is no better than $1 - \alpha$, from which it of course follows that the worst-case error (minimax risk) is no better either: i.e., for any test,

$$\liminf_{n\to\infty}\left[P_{H_0}(\text{Type I Error}) + \sup_{H_1} P(\text{Type II Error})\right] \ge 1,$$

where the sup is taken over all alternatives in which one coordinate has mean $\mu^{(n)} = (1-\varepsilon)\sqrt{2\log n}$. In this regime, the Bonferroni procedure is optimal for testing the global null. Asymptotically, it is able to perfectly differentiate between the null and alternative hypotheses when $\mu^{(n)}$ is larger than the $\sqrt{2\log n}$ threshold, and we have just shown that no test is able to do better in minimax risk than a coin flip when $\mu^{(n)}$ is smaller than the $\sqrt{2\log n}$ threshold.

5 Proof of the proposition

Recall the statement of Proposition 1: if $\mu = (1-\varepsilon)\sqrt{2\log n}$, then $L \overset{p}{\to} 1$.

Proof. Recall $L = \frac{1}{n}\sum_i X_i$ with $X_i = e^{y_i\mu - \frac{1}{2}\mu^2}$ iid. Assume first $0 < \varepsilon < 1/2$, take $T_n = \sqrt{2\log n}$, and write

$$\tilde{L} = \frac{1}{n}\sum_i X_i\, \mathbf{1}_{\{y_i \le T_n\}}.$$

We have

$$P(\tilde{L} \ne L) \le P\left(\max_i y_i \ge T_n\right) \to 0,$$

and it suffices to establish that

$$\tilde{L} = \Phi\left(\varepsilon\sqrt{2\log n}\right) + o_{P_0}(1),$$

which in particular follows if

1. $E_0(\tilde{L}) = \Phi(\varepsilon\sqrt{2\log n}) \to 1$,
2. $\operatorname{Var}_0(\tilde{L}) = o(1)$.
Proceeding,

$$E_0(\tilde{L}) = E_0\left[X_1 \mathbf{1}_{\{y_1 \le T_n\}}\right] = \int_{-\infty}^{T_n} e^{\mu z - \mu^2/2}\, \frac{e^{-z^2/2}}{\sqrt{2\pi}}\, dz = \int_{-\infty}^{T_n} \frac{1}{\sqrt{2\pi}}\, e^{-(z-\mu)^2/2}\, dz = \Phi(T_n - \mu) = \Phi\left(\varepsilon\sqrt{2\log n}\right).$$

Furthermore,

$$\operatorname{Var}_0(\tilde{L}) = \frac{1}{n}\operatorname{Var}\left(X_1 \mathbf{1}_{\{y_1 \le T_n\}}\right) \le \frac{1}{n}\, E_0\left[X_1^2\, \mathbf{1}_{\{y_1 \le T_n\}}\right] = \frac{1}{n} \int_{-\infty}^{T_n} e^{2\mu z - \mu^2}\, \phi(z)\, dz = \frac{1}{n}\, e^{\mu^2}\, \Phi(T_n - 2\mu).$$

Since $\Phi(T_n - 2\mu) \le \phi(2\mu - T_n)$ (note that $2\mu - T_n = (1-2\varepsilon)T_n > 0$ for $\varepsilon < 1/2$), this gives

$$\operatorname{Var}_0(\tilde{L}) \le \frac{1}{n}\, e^{\mu^2}\, \phi(2\mu - T_n) \le \frac{1}{n}\, e^{(1-\varepsilon)^2 T_n^2}\, e^{-(1-2\varepsilon)^2 T_n^2/2} = \frac{1}{n}\, e^{(1-2\varepsilon^2) T_n^2/2} = e^{-\varepsilon^2 T_n^2} \to 0.$$

This proves the result for $0 < \varepsilon < 1/2$. The claim for $1 > \varepsilon > 1/2$ is even simpler, since $\exp(\mu^2)/n$ converges to zero in this case.

References

[1] D. Williams. Probability with Martingales. Cambridge University Press, 1991.
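The identity $E_0(\tilde{L}) = \Phi(T_n - \mu)$ derived above can be confirmed by simulation. This is a sketch with illustrative parameters ($n = 10^4$, $\varepsilon = 1/2$, two million Monte Carlo draws), not part of the original notes.

```python
import math
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, eps = 10_000, 0.5
T = math.sqrt(2 * math.log(n))            # truncation level T_n
mu = (1 - eps) * T

# Monte Carlo estimate of E_0[X_1 * 1{y_1 <= T_n}], X_1 = exp(y_1*mu - mu^2/2)
y = rng.standard_normal(2_000_000)
est = float(np.mean(np.exp(y * mu - mu**2 / 2) * (y <= T)))
target = norm.cdf(T - mu)                 # the closed form Phi(T_n - mu)
print(est, target)                        # the two should agree closely
```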