Lecture 2 April 04, 2018


Stats 300C: Theory of Statistics, Spring 2018

Lecture 2: April 04, 2018

Prof. Emmanuel Candès. Scribe: Paulo Orenstein; edited by Stephen Bates, XY Han.

Outline

Agenda: Global testing.
1. Needle in a Haystack Problem
2. Threshold Phenomenon
3. Optimality of Bonferroni's Global Test

Last time: We introduced Bonferroni's global test. In this lecture, we show that Bonferroni's method is somehow optimal for testing against sparse alternatives. This claim relies on power calculations, which require us to specify alternatives. In this lecture, we consider an independent Gaussian sequence model:

    y_i ~ind N(µ_i, 1),  i = 1, ..., n.

We are interested in the n hypotheses

    H_{0,i}: µ_i = 0,

so that in this case, the global null asserts that all the means µ_i vanish. Under the alternative H_1, some means µ_i ≠ 0.

We saw that Bonferroni's method rejects if max_i y_i ≥ z(α/n) in the one-sided case, and if max_i |y_i| ≥ z(α/2n) in the two-sided case. Put another way, Bonferroni rejects the global null hypothesis if the largest y_i is large enough. For the special case where the n tests are mutually independent, we also calculated

    P_{H_0}(Type I Error) := q(α) → 1 − e^{−α} ≈ α.

2 Magnitude of Bonferroni's Threshold

How large is our threshold t = z(α/n) (one-sided) or z(α/2n) (two-sided)? If φ(t) is the standard normal pdf, then we can derive by Markov's inequality the useful result:

    (φ(t)/t)(1 − 1/t²) ≤ P(Z > t) ≤ φ(t)/t,
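These tail bounds are easy to check numerically. A minimal sketch using only the standard library (the function names `phi` and `gauss_tail` are our own, not from the notes):

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def gauss_tail(t):
    """P(Z > t) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2))

# The sandwich (phi(t)/t)(1 - 1/t^2) <= P(Z > t) <= phi(t)/t holds for t > 1,
# and both sides approach the tail probability as t grows.
for t in [1.5, 2.0, 3.0, 4.0, 5.0]:
    lower = phi(t) / t * (1 - 1 / t**2)
    upper = phi(t) / t
    assert lower <= gauss_tail(t) <= upper
    print(t, lower, gauss_tail(t), upper)
```

The relative gap between the two sides is of order 1/t², which is why the approximation becomes very accurate for the large thresholds relevant here.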

[Figure 1: z(α/n), the approximation B(1 − log B / B²) ("Approx"), and √(2 log n), plotted for n ranging over 10² to 10¹², in two panels: α = 0.05 and α = 0.01.]

where Z ~ N(0, 1). That is, for large t, φ(t)/t is a good approximation to the normal tail probability P(Z > t). Roughly speaking, then, the Gaussian quantile t = z(α/n) solves

    P(Z > t) = α/n  ⟺  φ(t)/t ≈ α/n.

Holding α fixed, then, we can show that for large n,

    z(α/n) = √(2 log n) · [1 − (log log n + log(4π/α²))/(4 log n) + o(1/log n)] ≈ √(2 log n).

Hence, the quantiles grow like √(2 log n), with a small correction factor. Figure 1 plots z(α/n) and √(2 log n). Notice that Bonferroni then basically amounts to rejecting when max_i y_i > √(2 log n).

One remarkable fact about all of this is that there is (asymptotically) no dependence on α. That is, whatever α we use, our rejection threshold for max_i y_i is asymptotic to √(2 log n). This is a consequence of the fact that, under H_0,

    max_i y_i / √(2 log n) →p 1.
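This concentration of the maximum can be seen in a quick simulation (the choice of n and the seed are arbitrary); keep in mind the convergence is slow because of the log log n correction term:

```python
import math
import random

random.seed(0)

# Under H0, max_i y_i / sqrt(2 log n) -> 1 in probability, so Bonferroni's
# cutoff z(alpha/n) and the simpler sqrt(2 log n) rule behave alike.
n = 100_000
y_max = max(random.gauss(0.0, 1.0) for _ in range(n))
ratio = y_max / math.sqrt(2 * math.log(n))
print(ratio)  # near 1, though it approaches 1 slowly (second-order correction)
```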

In other words, the first-order term √(2 log n) asymptotically dominates the terms containing α. For finite samples, it is of course possible to develop approximations to z(α/n) which are more accurate than √(2 log n). Set

    B = √(2 log(n/α) − log(2π)) = √(2 log(n/α) − 1.8379).

Then

    z(α/n) ≈ B(1 − log B / B²).

Figure 1 shows that this approximation is nearly indistinguishable from z(α/n), even for modest values of n.

3 Sharp Detection Threshold for the Needle in a Haystack

Asymptotic Power: Consider a sequence of problems with n → ∞. How powerful is Bonferroni, or, put another way, what is the limiting power P_{H_1}(max_i y_i > z(α/n))?

Needle in a Haystack Problem: To answer the question above, we need to specify alternative hypotheses. The needle in a haystack problem is this: under the alternative, one µ_i = µ > 0. We don't know which one.

For the needle in the haystack problem, we shall see that the answer to the power question depends very sensitively on the limiting ratio µ^(n)/√(2 log n), where µ^(n) > 0 is the value of the single nonzero mean. (The (n) in the superscript captures the dependence of µ^(n) on n.) There are two cases.

1. Asymptotic full power above threshold: Suppose µ^(n) > (1 + ε)√(2 log n). Then, assuming without loss of generality that µ_1 = µ^(n),

    P_{H_1}(max_i y_i > z(α/n)) ≥ P(y_1 > z(α/n)) = P(z_1 > z(α/n) − µ^(n)) → 1.

In the second-to-last step, we use the fact that y_1 = z_1 + µ^(n), where z_1 follows N(0, 1).

2. Asymptotic powerlessness below threshold: Suppose µ^(n) < (1 − ε)√(2 log n). Then

    P_{H_1}(max_i y_i > z(α/n)) ≤ P(y_1 > z(α/n)) + P(max_{i>1} y_i > z(α/n))
                                = P(z_1 > z(α/n) − µ^(n)) + P(max_{i>1} z_i > z(α/n))
                                → 0 + q(α) ≈ α.

This is a bad test because we can obtain the same level and power by flipping a biased coin that rejects α of the time.
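The two regimes show up clearly in a small Monte Carlo experiment. The constants 1.5 and 0.5 below (standing in for 1 ± ε), the problem size, and the number of repetitions are arbitrary illustrative choices:

```python
import math
import random
from statistics import NormalDist

random.seed(1)

n = 10_000
alpha = 0.05
cutoff = NormalDist().inv_cdf(1 - alpha / n)  # one-sided Bonferroni cutoff z(alpha/n)
root = math.sqrt(2 * math.log(n))
reps = 200

def detection_rate(mu):
    """Fraction of trials in which max_i y_i exceeds the cutoff, with one
    needle of mean mu hidden among n - 1 standard normals."""
    hits = 0
    for _ in range(reps):
        y_max = max(random.gauss(0.0, 1.0) for _ in range(n - 1))
        y_max = max(y_max, random.gauss(mu, 1.0))
        hits += y_max > cutoff
    return hits / reps

power_above = detection_rate(1.5 * root)  # needle above the threshold
power_below = detection_rate(0.5 * root)  # needle below the threshold
print(power_above, power_below)  # near 1 and near alpha, respectively
```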

Conclusion: We effectively see that √(2 log n) constitutes a sharp detection threshold. When µ^(n)/√(2 log n) ≥ 1 + ε, we always detect the needle µ > 0. We can even achieve P_{H_0}(Type I Error) → 0 and P_{H_1}(Type II Error) → 0 if we use √(2 log n) instead of z(α/n) as our threshold. In other words, asymptotically we make no mistakes.

However, when µ^(n)/√(2 log n) ≤ 1 − ε, with q(α) = 1 − e^{−α} ≈ α being the asymptotic size, Bonferroni's global test gives P(Type I Error) → q(α) and P(Type II Error) → 1 − q(α); that is, it does no better than flipping a coin.

Can we do better than Bonferroni? When µ^(n) = (1 − ε)√(2 log n), we saw that, roughly, P_{H_0}(Type I Error) ≈ α and P_{H_1}(Type II Error) ≈ 1 − α, so we are doing no better than flipping a biased coin that disregards the actual data. This is in fact true for any test in this scenario. To see this, we first reduce our composite hypothesis to a simple one, and then we show that even the optimal test given by the Neyman-Pearson Lemma does no better than flipping a coin.

4 Optimality of Detection Threshold

Bayesian Decision Problem: Consider

    H_0: µ_i = 0 for all i
    H_1: {µ_i} ~ π,

where π selects a coordinate I uniformly at random and sets µ_I = µ, with all other µ_i = 0. This setup differs from the previous problem in the important respect that H_0 and H_1 are both simple hypotheses, and we can now apply Neyman-Pearson. The optimal test rejects for large values of the likelihood ratio. The densities under the null and the alternative are given by

    f_0(y) = ∏_{j=1}^n (1/√(2π)) e^{−y_j²/2},
    f_1(y) = (1/n) ∑_{i=1}^n (1/√(2π)) e^{−(y_i − µ)²/2} ∏_{j≠i} (1/√(2π)) e^{−y_j²/2}.

After cancellations, the likelihood ratio is given by

    L = f_1/f_0 = (1/n) ∑_{i=1}^n e^{y_i µ − µ²/2}.
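Before turning to the formal argument, a quick simulation (with illustrative choices of n and ε) suggests the behavior established below: when µ sits below the threshold, the likelihood ratio concentrates near 1 under H_0, so it carries essentially no evidence:

```python
import math
import random

random.seed(2)

n = 100_000
eps = 0.5
mu = (1 - eps) * math.sqrt(2 * math.log(n))  # a needle size below the threshold

# Draw y under H0 and evaluate L = (1/n) * sum_i exp(y_i * mu - mu^2 / 2).
L = sum(math.exp(random.gauss(0.0, 1.0) * mu - mu * mu / 2) for _ in range(n)) / n
print(L)  # close to 1: the likelihood ratio carries almost no evidence
```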

Properties of L under H_0: Writing X_i = e^{y_i µ − µ²/2}, we have that under H_0 the X_i are iid and L = (1/n) ∑_i X_i; this is a sample average with mean E X_1 and variance (1/n) Var X_1.

First impulse: We would like to apply the CLT; however, because µ is not fixed but rather µ^(n) = (1 − ε)√(2 log n), we would need a triangular-array argument. The (sufficient but not necessary) Lyapunov condition, for instance, is violated for q = 3:

    ∑_i E|X_i − E X_i|³ / [∑_i Var(X_i)]^{3/2} → ∞  as n → ∞.

We shall, therefore, focus on deriving a weaker result.

Proposition 1. If µ = (1 − ε)√(2 log n), then L →p 1.

Proof. Provided at the end of the notes.

This already hints at the fact that the likelihood ratio test cannot do very well. But before we formally prove this (this will not be done in this lecture), we skip to the punchline.

Proposition 2. Set the threshold T_n(α) such that P_0(L ≥ T_n(α)) = α. Then

    lim_{n→∞} P(Type II Error) = 1 − α.

Proof. Note that

    P(Type II Error) = P_1(L < T_n(α))
                     = ∫ 1{L < T_n(α)} dP_1
                     = ∫ 1{L < T_n(α)} L dP_0
                     = ∫ 1{L < T_n(α)} dP_0 + ∫ 1{L < T_n(α)} (L − 1) dP_0
                     = (1 − α) + ∫ 1{L < T_n(α)} (L − 1) dP_0
                     → 1 − α.

The last claim follows from the fact that L →p 1. We can make this rigorous as follows: let Z_n = 1{L < T_n(α)}(L − 1). First, Z_n →p 0. Second, because L →p 1, T_n(α) is uniformly bounded, and hence so is Z_n. The bounded convergence theorem [1, Section 13.6] then gives that E|Z_n| → 0 (this is a simple result that can be checked by hand).
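The key step in the proof is the change of measure ∫ 1{L < T} dP_1 = ∫ 1{L < T} L dP_0. This identity can be sanity-checked by Monte Carlo on a tiny instance (the values of n, µ, and the cutoff below are arbitrary); since L is symmetric in the coordinates of y, we may place the needle in the first coordinate when sampling from the alternative:

```python
import math
import random

random.seed(3)

n = 5         # a tiny problem size, just for a fast check
mu = 2.0      # illustrative needle size
cutoff = 0.5  # an arbitrary cutoff standing in for T_n(alpha)
reps = 200_000

def lr(y):
    """L = f1/f0 = (1/n) sum_i exp(y_i * mu - mu^2 / 2)."""
    return sum(math.exp(v * mu - mu * mu / 2) for v in y) / n

p1_direct = 0.0    # Monte Carlo estimate of P1(L < cutoff)
e0_weighted = 0.0  # Monte Carlo estimate of E0[ 1{L < cutoff} * L ]
for _ in range(reps):
    # Sample from the alternative; by symmetry of L, put the needle first.
    y1 = [random.gauss(mu, 1.0)] + [random.gauss(0.0, 1.0) for _ in range(n - 1)]
    p1_direct += lr(y1) < cutoff
    # Sample from the null and reweight by L.
    y0 = [random.gauss(0.0, 1.0) for _ in range(n)]
    L0 = lr(y0)
    e0_weighted += L0 * (L0 < cutoff)
p1_direct /= reps
e0_weighted /= reps
print(p1_direct, e0_weighted)  # the two agree up to Monte Carlo error
```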

Conclusion: If µ^(n) = (1 − ε)√(2 log n), then the optimal test has

    P(Type I Error) + P(Type II Error) → 1.

Broad Conclusion: Let's think back to the original problem, with H_1: µ_i > 0 for one i, a composite of n alternatives. We have shown today that the average Type II error (Bayes risk) of any level-α procedure is no better than 1 − α, from which it of course follows that the worst-case error (minimax risk) is no better either: i.e., for any test,

    lim inf_{n→∞} [P_{H_0}(Type I Error) + sup_{H_1} P(Type II Error)] ≥ 1,

where the sup is taken over all alternatives in which one coordinate has mean µ^(n) = (1 − ε)√(2 log n). In this regime, the Bonferroni procedure is optimal for testing the global null. Asymptotically, it is able to perfectly differentiate between the null and alternative hypotheses when µ^(n) is larger than the √(2 log n) threshold, and we have just shown that no test is able to do better in minimax risk than a coin flip when µ^(n) is smaller than the √(2 log n) threshold.

5 Proof of the Proposition

Recall the statement of Proposition 1: If µ = (1 − ε)√(2 log n), then L →p 1.

Proof. Recall L = (1/n) ∑_i X_i with X_i = e^{y_i µ − µ²/2} iid. Assume first 0 < ε < 1/2, take T_n = √(2 log n), and write

    L̃ = (1/n) ∑_i X_i 1{y_i ≤ T_n}.

We have

    P(L̃ ≠ L) ≤ P(max_i y_i ≥ T_n) → 0,

and it suffices to establish that

    L̃ = Φ(ε√(2 log n)) + o_{P_0}(1) → 1,

which in particular follows if

1. E_0(L̃) = Φ(ε√(2 log n)), and
2. Var_0(L̃) = o(1).
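The first of these, which reduces to the identity E_0[X_1 1{y_1 ≤ T_n}] = Φ(T_n − µ), can be sanity-checked by Monte Carlo. The values of µ and T below are arbitrary illustrative choices, not the asymptotic quantities in the proof:

```python
import math
import random
from statistics import NormalDist

random.seed(4)

mu, T = 3.0, 4.0  # illustrative stand-ins for mu and T_n
reps = 500_000

# Monte Carlo estimate of E0[ exp(y*mu - mu^2/2) * 1{y <= T} ] for y ~ N(0, 1).
acc = 0.0
for _ in range(reps):
    y = random.gauss(0.0, 1.0)
    if y <= T:
        acc += math.exp(y * mu - mu * mu / 2)
estimate = acc / reps
exact = NormalDist().cdf(T - mu)  # Phi(T - mu)
print(estimate, exact)
```

The agreement reflects exactly the completing-the-square computation carried out below.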

Proceeding,

    E_0(L̃) = E_0[X_1 1{y_1 ≤ T_n}]
           = ∫_{−∞}^{T_n} e^{µz − µ²/2} · (1/√(2π)) e^{−z²/2} dz
           = ∫_{−∞}^{T_n} (1/√(2π)) e^{−(z−µ)²/2} dz
           = Φ(T_n − µ)
           = Φ(ε√(2 log n)).

Furthermore,

    Var_0(L̃) = (1/n) Var(X_1 1{y_1 ≤ T_n})
            ≤ (1/n) E_0[X_1² 1{y_1 ≤ T_n}]
            = (1/n) ∫_{−∞}^{T_n} e^{2µz − µ²} φ(z) dz
            = (1/n) e^{µ²} Φ(T_n − 2µ).

Since Φ(T_n − 2µ) ≤ φ(2µ − T_n), this gives

    Var_0(L̃) ≤ (1/n) e^{µ²} φ(2µ − T_n)
            = (1/(√(2π) n)) e^{(1−ε)² T_n²} e^{−(1−2ε)² T_n²/2}
            = (1/√(2π)) (1/n) e^{(1−2ε²) T_n²/2}
            = (1/√(2π)) e^{−ε² T_n²} → 0.

This proves the result for 0 < ε < 1/2. The claim for 1/2 < ε < 1 is even simpler, since exp(µ²)/n converges to zero in this case.

References

[1] D. Williams. Probability with Martingales. Cambridge University Press, 1991.