Lecture 5. Power properties of EL and EL for vectors

Stats 34: Empirical Likelihood, Oct. 8
Instructor: Art B. Owen, Stanford University. Scribe: Jingshu Wang

1 Power properties of empirical likelihood

1.1 Power of the EL ratio test

Some nonparametric (NP) tests can lose power (e.g., the sign test for the median). For the univariate empirical likelihood ratio test, one can show that
$$-2\log R\big(\mu_0 + \tau\sigma_0/\sqrt{n}\big) \xrightarrow{d} \chi'^2_{(1)}(\tau^2),$$
where $\mu_0$ is the true mean, $\tau$ is a noncentrality parameter, $\sigma_0^2 = \mathrm{Var}(X_i)$, and $\chi'^2_{(1)}(\tau^2)$ is the noncentral chi-squared distribution with one degree of freedom and noncentrality $\tau^2$. This means that empirical likelihood inferences have roughly the same power as parametric inferences (up to first order) in a family with Fisher information equal to $1/\sigma_0^2$.

1.2 Empirical discovery of parametric families

If $F_0$ is in fact inside a known parametric family, it is impossible to discover that family from EL, as there is no unique parametric family through $F_0$ to discover. For example, suppose $X_i \sim N(1, 1)$. Then $F_0$ could come from the family $\{N(\mu, 1) : \mu \in \mathbb{R}\}$ or from $\{N(\mu, \mu^2) : \mu \in (0, \infty)\}$, etc. The plot below shows the curves representing three different parametric families through $N(1, 1)$.
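As a worked check that different families through $N(1,1)$ carry different amounts of information about $\mu$ (our own computation, not part of the lecture), take three one-parameter families indexed by their mean $\mu$ and compute the Fisher information at $\mu = 1$, writing $Z = X - 1 \sim N(0,1)$ and using $\mathrm{Var}(Z^2) = 2$, $\mathrm{Cov}(Z, Z^2) = 0$:

```latex
\begin{align*}
\{N(\mu,1)\}:&\quad \partial_\mu \log f = x-\mu
  \;\Rightarrow\; I(1) = \mathrm{Var}(Z) = 1,\\
\{N(\mu,\mu)\}:&\quad \partial_\mu \log f = -\tfrac{1}{2\mu} + \tfrac{x-\mu}{\mu} + \tfrac{(x-\mu)^2}{2\mu^2}
  \;\Rightarrow\; I(1) = \mathrm{Var}\big(Z + \tfrac12 Z^2\big) = 1 + \tfrac12 = \tfrac32,\\
\{N(\mu,\mu^2)\}:&\quad \partial_\mu \log f = -\tfrac{1}{\mu} + \tfrac{x-\mu}{\mu^2} + \tfrac{(x-\mu)^2}{\mu^3}
  \;\Rightarrow\; I(1) = \mathrm{Var}\big(Z + Z^2\big) = 1 + 2 = 3.
\end{align*}
```

Since a single observation $X$ is unbiased for $\mu$ in any family parametrized by its mean, the Cramér-Rao bound gives $I(\mu) \ge 1/\mathrm{Var}_\mu(X)$ under the usual regularity conditions. So no such family through $N(1,1)$ has information below $1/\sigma_0^2 = 1$, and $\{N(\mu,1)\}$ attains that value, matching the Fisher information $1/\sigma_0^2$ in Section 1.1.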

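In fact, we expect the confidence interval obtained from empirical likelihood to be at least as wide as the C.I. from the least favorable parametric family (that is, the family having minimum Fisher information among the distributions with the same $\mu$).

1.3 Coverage levels: a general fact

Bahadur and Savage (1956) show that there is no nontrivial C.I. for the mean over a large family of distributions $\mathcal{F}$.

Theorem 1.1. Let $\mathcal{F}$ be any set of distributions for $X \in \mathbb{R}$. Assume it satisfies the three conditions:
1. $\int |x|\,dF < \infty$ for all $F \in \mathcal{F}$;
2. for each $m \in \mathbb{R}$ there is an $F \in \mathcal{F}$ with $\int x\,dF = m$;
3. if $F, G \in \mathcal{F}$, then $\lambda F + (1-\lambda)G \in \mathcal{F}$ for $0 \le \lambda \le 1$.
Then, if a C.I. has at least $1-\alpha$ coverage for all $F \in \mathcal{F}$, for any $F \in \mathcal{F}$ and any $m \in \mathbb{R}$ the C.I. includes $m$ with probability at least $1-\alpha$ when sampling from $F$.

In practice, for EL, people use approximate C.I.s based on the ELT (empirical likelihood theorem).

2 Empirical likelihood for vectors

2.1 EL for a multivariate mean

For $X \in \mathbb{R}^d$, the NPMLE is still the ECDF $\hat F$. The profile empirical likelihood ratio is
$$R(\mu) = \max\Big\{\prod_{i=1}^n n w_i \;\Big|\; \sum_{i=1}^n w_i X_i = \mu,\ w_i \ge 0,\ \sum_{i=1}^n w_i = 1\Big\}.$$
The confidence region is
$$C_{r,n} = \Big\{\sum_{i=1}^n w_i X_i \;\Big|\; \prod_{i=1}^n n w_i \ge r,\ w_i \ge 0,\ \sum_{i=1}^n w_i = 1\Big\}.$$

Theorem 2.1. If $X_i \sim F_0$ are i.i.d. with mean $\mu_0$ and a finite covariance matrix of rank $q > 0$, then $C_{r,n}$ is convex and, as $n \to \infty$,
$$-2\log R(\mu_0) \xrightarrow{d} \chi^2_{(q)}, \qquad q = \operatorname{rank}[\mathrm{Var}(X_i)].$$

2.2 Computing $R(\mu)$

2.2.1 The Lagrange dual problem

The domain
$$H = H(X_1, \dots, X_n) = \Big\{\sum_{i=1}^n w_i X_i \;\Big|\; w_i \ge 0,\ \sum_{i=1}^n w_i = 1\Big\}$$
is the convex hull of the points $X_1, \dots, X_n$. If $\mu$ is on the (relative) boundary of $H$, then at least one $w_i = 0$, which means that $R(\mu) = 0$. So we only consider $\mu$ in the relative interior of $H$. Using the method of Lagrange multipliers, write
$$G = \sum_{i=1}^n \log(n w_i) - n\lambda^T \sum_{i=1}^n w_i (X_i - \mu) + \gamma\Big(\sum_{i=1}^n w_i - 1\Big).$$
Setting to 0 the derivatives with respect to $w_i$,
$$\frac{\partial G}{\partial w_i} = \frac{1}{w_i} - n\lambda^T(X_i - \mu) + \gamma = 0,$$
so, using the constraints,
$$0 = \sum_{i=1}^n w_i \frac{\partial G}{\partial w_i} = n + \gamma \quad\Longrightarrow\quad \gamma = -n.$$
Therefore
$$w_i = \frac{1}{n}\,\frac{1}{1 + \lambda^T(X_i - \mu)},$$
and $\lambda \in \mathbb{R}^d$ satisfies
$$0 = \sum_{i=1}^n \frac{X_i - \mu}{1 + \lambda^T(X_i - \mu)}.$$
Plugging into $G$, the dual function is
$$L(\lambda) = -\sum_{i=1}^n \log\big(1 + \lambda^T(X_i - \mu)\big) \qquad (= \log R(\mu) \text{ at the optimal } \lambda).$$
Based on convex duality, we now minimize $L(\lambda)$ over $\lambda \in \mathbb{R}^d$. Since
$$\frac{\partial L}{\partial \lambda} = -\sum_{i=1}^n \frac{X_i - \mu}{1 + \lambda^T(X_i - \mu)}, \qquad \frac{\partial^2 L}{\partial \lambda\, \partial \lambda^T} = \sum_{i=1}^n \frac{(X_i - \mu)(X_i - \mu)^T}{\big(1 + \lambda^T(X_i - \mu)\big)^2} \succeq 0,$$
and the constraints have the form $1 + \lambda^T(X_i - \mu) > 0$ for all $i$, which define an intersection of half-spaces, the dual problem is minimizing a convex function over a convex set.

2.2.2 Remove the constraints

Since $w_i \le 1$, which is
$$\frac{1}{n}\,\frac{1}{1 + \lambda^T(X_i - \mu)} \le 1 \iff 1 + \lambda^T(X_i - \mu) \ge \frac{1}{n},$$
we can change the form of $L$ for $1 + \lambda^T(X_i - \mu) < 1/n$ to preserve convexity but remove the positivity constraints. Let
$$\log_\star(z) = \begin{cases} \log z, & z \ge 1/n,\\ \log(1/n) - 3/2 + 2nz - (nz)^2/2, & z < 1/n.\end{cases}$$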
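A minimal sketch of the pseudo-logarithm $\log_\star$ together with its first two derivatives, checking numerically that the two branches join smoothly at $z = 1/n$ and that the resulting dual objective is finite and convex everywhere. The function names (`log_star`, `log_star_d1`, `log_star_d2`, `dual_objective`) are our own, and the dual objective is written only for scalar data ($d = 1$) to keep it short.

```python
import math


def log_star(z: float, n: int) -> float:
    """Pseudo-logarithm: log(z) for z >= 1/n, quadratic continuation for z < 1/n."""
    eps = 1.0 / n
    if z >= eps:
        return math.log(z)
    nz = n * z
    return math.log(eps) - 1.5 + 2.0 * nz - 0.5 * nz ** 2


def log_star_d1(z: float, n: int) -> float:
    """First derivative: 1/z above the threshold, 2n - n^2 z below it."""
    return 1.0 / z if z >= 1.0 / n else 2.0 * n - n * n * z


def log_star_d2(z: float, n: int) -> float:
    """Second derivative: -1/z^2 above the threshold, the constant -n^2 below it."""
    return -1.0 / z ** 2 if z >= 1.0 / n else -float(n * n)


def dual_objective(lam: float, xs: list[float], mu: float) -> float:
    """L_star(lam) = -sum_i log_star(1 + lam * (x_i - mu)) in the scalar case d = 1."""
    n = len(xs)
    return -sum(log_star(1.0 + lam * (x - mu), n) for x in xs)
```

Because the quadratic branch is defined for every real $z$, `dual_objective` can be evaluated at any `lam`, including values where some $1 + \lambda(x_i - \mu) \le 0$ and the plain $\log$ would fail.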

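The first two derivatives of $\log_\star$ are
$$\log_\star'(z) = \begin{cases} 1/z, & z \ge 1/n,\\ 2n - n^2 z, & z < 1/n,\end{cases} \qquad \log_\star''(z) = \begin{cases} -1/z^2, & z \ge 1/n,\\ -n^2, & z < 1/n.\end{cases}$$
Both branches agree at $z = 1/n$, so $\log_\star$ is twice continuously differentiable and concave. Let
$$L_\star(\lambda) = -\sum_{i=1}^n \log_\star\big(1 + \lambda^T(X_i - \mu)\big);$$
then we can minimize $L_\star(\lambda)$ over $\lambda \in \mathbb{R}^d$ without any constraints. We have replaced a constrained optimization over $n$ parameters by an unconstrained dual optimization over $d$ parameters.

Moreover, we can detect whether $\mu \in H$ by optimizing $L_\star(\lambda)$ and inspecting the result: if, at the $\lambda$ that minimizes $L_\star(\lambda)$, the corresponding $w_i$'s do not satisfy the constraints $0 < w_i < 1$, then $L_\star(\lambda)$ (or $L(\lambda)$) attains its minimum on the boundary of the set defined by the constraints. This means that some $w_i = 1$ or some $w_i = 0$, i.e., $\mu$ is not a relative interior point of $H$.

2.2.3 Newton's method to find the optimal $\lambda$

A self-concordant function on $\mathbb{R}$ is a convex function $f: \mathbb{R} \to \mathbb{R}$ for which
$$|f'''(x)| \le 2 f''(x)^{3/2}.$$
For example, the function $f(x) = \max\{(x+1)^2, (x-1)^2\}$ is not self-concordant. The function $-\log_\star$ is not self-concordant, since $\log_\star'''(x)$ does not exist at $x = 1/n$. Instead define
$$\log_{\star\star}(z) = \begin{cases} \log z, & z \ge 1/n,\\ Q(z), & z < 1/n,\end{cases}$$
where $Q(z)$ is a polynomial (Taylor) approximation to $\log$ at $1/n$ that matches $\log$ through the third derivative (a quadratic, as in $\log_\star$, cannot do this). Then $-\log_{\star\star}$ is self-concordant, which guarantees the convergence of Newton's method and yields a bound on the number of steps.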
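The procedure above (minimize $L_\star$ without constraints, then inspect the implied weights) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the lecture's code: it uses the quadratic $\log_\star$ rather than $\log_{\star\star}$, so to compensate for the missing self-concordance guarantee it adds a backtracking (Armijo) line search, which is our own choice. It also assumes $\mathrm{Var}(X_i)$ has full rank $d$ so the Hessian is invertible. The names `el_mean` and `_log_star` are hypothetical.

```python
import numpy as np


def _log_star(z: np.ndarray, n: int):
    """Elementwise log_star(z) (threshold 1/n) and its first two derivatives."""
    eps = 1.0 / n
    above = z >= eps
    zs = np.where(above, z, eps)  # safe argument for the log branch
    f = np.where(above, np.log(zs), np.log(eps) - 1.5 + 2.0 * n * z - 0.5 * (n * z) ** 2)
    d1 = np.where(above, 1.0 / zs, 2.0 * n - n * n * z)
    d2 = np.where(above, -1.0 / zs ** 2, -float(n * n))
    return f, d1, d2


def el_mean(X, mu, tol=1e-10, max_iter=100):
    """Empirical likelihood for a vector mean via damped Newton on L_star.

    Returns (lam, w, in_hull, stat) with stat = -2 log R(mu), or np.inf when
    the implied weights show mu is not in the relative interior of the hull.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Z = X - np.asarray(mu, dtype=float)  # rows are X_i - mu
    lam = np.zeros(d)

    def objective(l):
        f, _, _ = _log_star(1.0 + Z @ l, n)
        return -f.sum()

    for _ in range(max_iter):
        _, d1, d2 = _log_star(1.0 + Z @ lam, n)
        g = -Z.T @ d1                   # gradient of L_star
        if np.linalg.norm(g) < tol:
            break
        H = Z.T @ (-d2[:, None] * Z)    # Hessian of L_star (positive definite)
        step = -np.linalg.solve(H, g)   # Newton direction -H^{-1} g
        t, L0, slope = 1.0, objective(lam), g @ step
        # Backtracking line search: halve t until sufficient decrease.
        while objective(lam + t * step) > L0 + 0.25 * t * slope and t > 1e-12:
            t *= 0.5
        lam = lam + t * step

    w = 1.0 / (n * (1.0 + Z @ lam))     # implied weights w_i
    in_hull = bool(np.all((w > 0) & (w < 1)) and abs(w.sum() - 1.0) < 1e-6)
    stat = float(-2.0 * np.sum(np.log(n * w))) if in_hull else np.inf
    return lam, w, in_hull, stat
```

For $\mu$ outside $H$, $L_\star$ is unbounded below, so $\lambda$ drifts off to infinity and the implied weights sum to far less than 1; the `in_hull` flag catches this, as the text suggests. For $\mu$ in the interior, the converged weights satisfy $\sum_i w_i = 1$ and $\sum_i w_i X_i = \mu$ up to rounding.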

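Now back to using Newton's method to minimize $L_\star(\lambda)$. Each step is
$$\lambda \leftarrow \lambda - H^{-1} g,$$
where $H$ is the Hessian matrix and $g$ is the gradient of $L_\star$. We can turn $H^{-1}g$ into a least squares problem: by analogy with the regression formula $(X^TX)^{-1}X^Ty$, we use $(J^TJ)^{-1}J^Ty$, where $J$ is the Jacobian matrix:
$$\frac{\partial L_\star}{\partial \lambda} = -\sum_{i=1}^n \log_\star'\big(1 + \lambda^T(X_i - \mu)\big)(X_i - \mu) = -J^Ty,$$
$$\frac{\partial^2 L_\star}{\partial \lambda\, \partial \lambda^T} = -\sum_{i=1}^n \log_\star''\big(1 + \lambda^T(X_i - \mu)\big)(X_i - \mu)(X_i - \mu)^T = J^TJ,$$
where $J \in \mathbb{R}^{n \times d}$ has $i$th row
$$J_i = \Big[-\log_\star''\big(1 + \lambda^T(X_i - \mu)\big)\Big]^{1/2}(X_i - \mu)^T,$$
and $y \in \mathbb{R}^n$ has
$$y_i = \frac{\log_\star'\big(1 + \lambda^T(X_i - \mu)\big)}{\Big[-\log_\star''\big(1 + \lambda^T(X_i - \mu)\big)\Big]^{1/2}}.$$
Thus the Newton step is $\lambda \leftarrow \lambda + (J^TJ)^{-1}J^Ty$.

Remark. For $z \ge 1/n$, $y_i = \log_\star'(z)/[-\log_\star''(z)]^{1/2} = (1/z)/(1/z) = 1$.

Why turn $H^{-1}g$ into a least squares problem? Let $J = U\Sigma V^T$ be the skinny SVD, where $J$ and $U$ are $n \times d$, and $\Sigma$ and $V$ are $d \times d$. Then
$$(J^TJ)^{-1}J^Ty = V\Sigma^{-1}U^Ty,$$
and computing $\Sigma^{-1}$ is more stable than computing $H^{-1}$, because the condition number of $H = J^TJ$ is the square of the condition number of $J$.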
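A minimal NumPy sketch of the least squares form of the Newton step: it builds $J$ and $y$ from the quadratic $\log_\star$ derivatives and computes $(J^TJ)^{-1}J^Ty$ through the skinny SVD, never forming $H$. The function name `newton_step_lstsq` is our own; the returned `delta` is the increment in $\lambda \leftarrow \lambda + \delta$.

```python
import numpy as np


def newton_step_lstsq(X, mu, lam):
    """Newton step for L_star written as least squares; returns (delta, J, y).

    delta = (J^T J)^{-1} J^T y, so the Newton update is lam <- lam + delta.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = X - np.asarray(mu, dtype=float)          # rows X_i - mu
    z = 1.0 + Z @ lam                            # arguments 1 + lam^T (X_i - mu)
    eps = 1.0 / n
    above = z >= eps
    zs = np.where(above, z, eps)
    d1 = np.where(above, 1.0 / zs, 2.0 * n - n * n * z)   # log_star'
    d2 = np.where(above, -1.0 / zs ** 2, -float(n * n))   # log_star''
    root = np.sqrt(-d2)
    J = root[:, None] * Z                        # i-th row: sqrt(-log_star'') (X_i - mu)^T
    y = d1 / root                                # y_i = log_star' / sqrt(-log_star'')
    # Skinny SVD J = U diag(s) V^T, so (J^T J)^{-1} J^T y = V diag(1/s) U^T y.
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    delta = Vt.T @ ((U.T @ y) / s)
    return delta, J, y
```

At $\lambda = 0$ every argument equals $1 \ge 1/n$, so $J$ reduces to the centered data and every $y_i = 1$, as in the Remark. In practice `np.linalg.lstsq(J, y, rcond=None)` gives the same step via an SVD-based solver.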
