Likelihood inference in the presence of nuisance parameters

Nancy Reid, University of Toronto

1. Notation, Fisher information, orthogonal parameters
2. Likelihood inference with no nuisance parameters; first and third order
3. Profile log-likelihood
4. Adjustments to profile log-likelihood
5. Third order p-values
6. Model classes

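1. Notation...

- model: $Y \sim f(y; \theta)$, $\theta \in \mathbb{R}^d$, $\theta = (\psi, \lambda)$
- likelihood: $L(\theta) = L(\theta; y) = f(y; \theta)$, $l(\theta) = \log L(\theta)$
- i.i.d. sampling: $y = (y_1, \dots, y_n)$, $L(\theta; y) = \prod_{i=1}^{n} f(y_i; \theta)$
- m.l.e.: $\sup_\theta L(\theta) = L(\hat\theta)$
- observed information: $j(\hat\theta) = -l''(\hat\theta)$
- expected information: $i(\theta) = E\{-l''(\theta)\}$
- partitioned information: $i(\theta) = \begin{pmatrix} i_{\psi\psi} & i_{\psi\lambda} \\ i_{\lambda\psi} & i_{\lambda\lambda} \end{pmatrix}$
- partitioned inverse: $i^{-1}(\theta) = \begin{pmatrix} i^{\psi\psi} & i^{\psi\lambda} \\ i^{\lambda\psi} & i^{\lambda\lambda} \end{pmatrix}$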

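$i(\theta) = \begin{pmatrix} i_{\psi\psi} & i_{\psi\lambda} \\ i_{\lambda\psi} & i_{\lambda\lambda} \end{pmatrix}$

$\psi$ is orthogonal to $\lambda$ if $i_{\psi\lambda}(\theta) = 0$. This implies in particular that $\hat\psi$ and $\hat\lambda$ are asymptotically independent.

Example: ratio of Poisson means
$y_1 \sim Po(\lambda)$, $y_2 \sim Po(\psi\lambda)$:
$L(\psi, \lambda; y_1, y_2) = e^{-\lambda(1+\psi)} \psi^{y_2} \lambda^{y_1+y_2}$
In fact, with $y_+ = y_1 + y_2$ and the total mean $\lambda(1+\psi)$ taken as the nuisance parameter, $L = L_1(\psi; y_2 \mid y_+)\, L_2(\lambda(1+\psi); y_+)$, which is stronger than orthogonality.

Example: exponential regression
$y_i$ follows an exponential distribution with $E(y_i) = \lambda \exp(-\psi x_i)$; $\sum x_i = 0$
$l(\psi, \lambda; y) = -n \log \lambda - \lambda^{-1} \sum y_i \exp(\psi x_i)$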
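The orthogonality in the exponential regression example can be checked by working out the cross term of the expected information. The sketch below assumes the mean parametrization $E(y_i) = \lambda \exp(-\psi x_i)$ with $\sum x_i = 0$; the signs are a reconstruction and should be checked against the original slide.

```latex
\begin{align*}
l(\psi,\lambda) &= -n\log\lambda - \lambda^{-1}\sum_i y_i e^{\psi x_i}, \\
\frac{\partial l}{\partial\lambda} &= -\frac{n}{\lambda} + \lambda^{-2}\sum_i y_i e^{\psi x_i}, \\
\frac{\partial^2 l}{\partial\psi\,\partial\lambda} &= \lambda^{-2}\sum_i x_i\, y_i e^{\psi x_i}, \\
i_{\psi\lambda}(\theta) &= E\Bigl\{-\frac{\partial^2 l}{\partial\psi\,\partial\lambda}\Bigr\}
  = -\lambda^{-2}\sum_i x_i\, E\bigl(y_i e^{\psi x_i}\bigr)
  = -\lambda^{-1}\sum_i x_i = 0 .
\end{align*}
```

The last step uses $E(y_i e^{\psi x_i}) = \lambda$, so the centring condition $\sum x_i = 0$ is exactly what makes $\psi$ orthogonal to $\lambda$.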

Likelihood inference with no nuisance parameters

- Plot the likelihood
- $\hat\theta$ is asymptotically normal, mean $\theta$, variance $i^{-1}(\theta)$
- $r(\theta) = \pm[2\{l(\hat\theta) - l(\theta)\}]^{1/2}$ is asymptotically $N(0, 1)$ (better)
- $r^*(\theta) = r(\theta) + \dfrac{1}{r(\theta)} \log\left\{\dfrac{q(\theta)}{r(\theta)}\right\}$ is asymptotically $N(0, 1)$ (even better)
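As an illustration of the first- and third-order quantities listed above, here is a minimal Python sketch for a single Poisson observation. The slides do not define $q(\theta)$; the sketch assumes the standard choice for a scalar exponential family, the Wald statistic in the canonical parameter $\varphi = \log\theta$. The null value `theta0 = 10.0` and all function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglik(theta, y):
    """Poisson log-likelihood, dropping the term that is free of theta."""
    return y * math.log(theta) - theta

def signed_root(theta, y):
    """r(theta), signed by theta_hat - theta, where theta_hat = y."""
    theta_hat = float(y)
    deviance = 2.0 * (loglik(theta_hat, y) - loglik(theta, y))
    return math.copysign(math.sqrt(max(deviance, 0.0)), theta_hat - theta)

def wald_canonical(theta, y):
    """Assumed q(theta): Wald statistic in phi = log(theta).
    The observed information for phi at theta_hat is y."""
    return (math.log(y) - math.log(theta)) * math.sqrt(y)

def r_star(theta, y):
    """r* = r + (1/r) log(q/r); undefined at theta = theta_hat, where r = 0."""
    r = signed_root(theta, y)
    q = wald_canonical(theta, y)
    return r + math.log(q / r) / r

def poisson_cdf(k, theta):
    """Exact P(Y <= k) for Y ~ Po(theta), for comparison."""
    return sum(math.exp(j * math.log(theta) - theta - math.lgamma(j + 1))
               for j in range(k + 1))

y_obs, theta0 = 17, 10.0   # y = 17 as in the example; theta0 is hypothetical
print("Phi(r)   =", norm_cdf(signed_root(theta0, y_obs)))
print("Phi(r*)  =", norm_cdf(r_star(theta0, y_obs)))
print("P(Y<=16) =", poisson_cdf(y_obs - 1, theta0))
print("P(Y<=17) =", poisson_cdf(y_obs, theta0))
```

With this sign convention $\Phi(r)$ and $\Phi(r^*)$ approximate the distribution function of $Y$ at the observed value. Because $Y$ is discrete, the continuous approximations are best compared with the interval between the two exact values printed last.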

Example: $Y \sim Po(\theta)$, $\theta > b$, $b$ known; $b = 6.7$, $y = 17$.
[Figure: likelihood as a function of µ. Source: Fraser, Reid, Wong]

[Figure: p-value as a function of µ. Legend: upper, lower and mid p-values; curves based on $r^*$, $r$ and $\hat\theta$]

[Figure: two panels, likelihood against µ and p-value against µ]

Nuisance parameters: profile likelihood

$\theta = (\psi, \lambda)$
restricted m.l.e. $\hat\lambda_\psi$: attains $\sup_\lambda L(\psi, \lambda)$
$L_p(\psi) = L(\psi, \hat\lambda_\psi)$ (concentrated likelihood)

For $\lambda$ of fixed dimension, i.i.d. sampling $y$:
- $\sup_\psi L_p(\psi) = L(\hat\psi, \hat\lambda)$
- $r_p(\psi) = \pm[2\{l_p(\hat\psi) - l_p(\psi)\}]^{1/2} \xrightarrow{d} N(0, 1)$
- $\hat\psi$ is asymptotically normal with mean $\psi$ and variance consistently estimated by $\{-l_p''(\hat\psi)\}^{-1} = j^{\psi\psi}(\hat\psi, \hat\lambda)$
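The profile construction can be sketched in Python for a case where the restricted m.l.e. has a closed form. The example is not from the slides: it assumes an i.i.d. $N(\psi, \lambda)$ sample with the mean $\psi$ of interest and the variance $\lambda$ as nuisance parameter, so that $\hat\lambda_\psi = n^{-1}\sum (y_i - \psi)^2$. The data values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def restricted_mle(psi, y):
    """lambda_hat_psi: the variance m.l.e. with the mean held fixed at psi."""
    return sum((yi - psi) ** 2 for yi in y) / len(y)

def profile_loglik(psi, y):
    """l_p(psi) = l(psi, lambda_hat_psi), up to an additive constant."""
    n = len(y)
    return -0.5 * n * math.log(restricted_mle(psi, y)) - 0.5 * n

def r_p(psi, y):
    """Signed root of the profile log-likelihood ratio."""
    psi_hat = sum(y) / len(y)   # overall m.l.e. of the mean
    deviance = 2.0 * (profile_loglik(psi_hat, y) - profile_loglik(psi, y))
    return math.copysign(math.sqrt(max(deviance, 0.0)), psi_hat - psi)

y = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]   # hypothetical data
for psi in (4.0, 4.5, 5.0, 5.5):
    print(psi, r_p(psi, y), norm_cdf(r_p(psi, y)))
```

Plotting `norm_cdf(r_p(psi, y))` against `psi` gives the first-order p-value function for the mean.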

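But the profile likelihood can be too concentrated, and maximized at the wrong point.

Example: linear regression
$y_i = x_i \beta + \epsilon_i$, $x_i = (x_{i1}, \dots, x_{ip})$, $\epsilon_i \sim N(0, \psi)$
$\hat\psi = \dfrac{1}{n} \sum (y_i - x_i \hat\beta)^2$
[Figure: profile likelihood plotted against σ]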
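The regression example can be made explicit with a short derivation. It assumes a design matrix of full rank $p$ and writes $RSS$ (a label introduced here) for the residual sum of squares.

```latex
\begin{align*}
l(\beta,\psi) &= -\tfrac{n}{2}\log\psi - \frac{1}{2\psi}\sum_i (y_i - x_i\beta)^2, \\
\hat\beta_\psi &= \hat\beta \quad\text{for every } \psi, \\
l_p(\psi) &= -\tfrac{n}{2}\log\psi - \frac{RSS}{2\psi},
  \qquad RSS = \sum_i (y_i - x_i\hat\beta)^2, \\
\hat\psi &= \frac{RSS}{n},
  \qquad E(\hat\psi) = \frac{n-p}{n}\,\psi .
\end{align*}
```

The profile log-likelihood treats $\hat\beta$ as if it were known, so it ignores the $p$ degrees of freedom spent on estimating $\beta$. Its maximum sits below $\psi$ on average, and the effect is not negligible when $p$ is large relative to $n$.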

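Adjustments to profile log-likelihood

If $\psi$ is orthogonal to $\lambda$:
$l_a(\psi) = l_p(\psi) - \tfrac{1}{2} \log |j_{\lambda\lambda}(\psi, \hat\lambda_\psi)|$
$j(\theta) = -l''(\theta) = \begin{pmatrix} j_{\psi\psi} & j_{\psi\lambda} \\ j_{\lambda\psi} & j_{\lambda\lambda} \end{pmatrix}$
$l_p$ is $O_p(n)$, $\log |j_{\lambda\lambda}|$ is $O_p(1)$

Example: product of exponential means
$y_{1i} \sim Exp(\psi\lambda_i)$, $y_{2i} \sim Exp(\psi/\lambda_i)$, $i = 1, \dots, n$ (means $\psi\lambda_i$ and $\psi/\lambda_i$)
$\hat\psi \to \dfrac{\pi}{4}\psi$, $\hat\psi_a \to \dfrac{\pi}{3}\psi$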
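The two limits in the product-of-exponential-means example can be checked by simulation. The sketch below rests on a derivation that is not on the slide. With $S = \sum \sqrt{y_{1i} y_{2i}}$, the restricted m.l.e. is $\hat\lambda_{i,\psi} = \sqrt{y_{1i}/y_{2i}}$, so $l_p(\psi) = -2n\log\psi - 2S/\psi$. Since $\log|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)| = -n\log\psi + \text{const}$, it follows that $l_a(\psi) = -\tfrac{3n}{2}\log\psi - 2S/\psi$. The range of the $\lambda_i$, the sample size and the seed are arbitrary choices.

```python
import math
import random

def simulate_pairs(psi, n, seed=0):
    """Draw (y1i, y2i) with means psi*lambda_i and psi/lambda_i."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        lam = rng.uniform(0.5, 2.0)               # hypothetical nuisance values
        y1 = rng.expovariate(1.0 / (psi * lam))   # expovariate takes the rate
        y2 = rng.expovariate(lam / psi)
        pairs.append((y1, y2))
    return pairs

def psi_hat_profile(pairs):
    """Maximizer of l_p(psi) = -2n log(psi) - 2S/psi, i.e. S/n."""
    s = sum(math.sqrt(y1 * y2) for y1, y2 in pairs)
    return s / len(pairs)

def psi_hat_adjusted(pairs):
    """Maximizer of l_a(psi) = -(3n/2) log(psi) - 2S/psi, i.e. 4S/(3n)."""
    return 4.0 * psi_hat_profile(pairs) / 3.0

psi_true = 2.0
pairs = simulate_pairs(psi_true, n=20000, seed=1)
print("psi_hat / psi   =", psi_hat_profile(pairs) / psi_true, " pi/4 =", math.pi / 4)
print("psi_hat_a / psi =", psi_hat_adjusted(pairs) / psi_true, " pi/3 =", math.pi / 3)
```

Neither estimator is consistent, because the number of nuisance parameters grows with $n$, but the adjustment moves the limit from about $0.785\,\psi$ to about $1.047\,\psi$.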

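This adjustment is not invariant to (one-one) reparametrizations of $\lambda$; it is better to use
$l_a(\psi) = l_p(\psi) - \tfrac{1}{2} \log |j_{\lambda\lambda}(\psi, \hat\lambda_\psi)| + B(\psi)$
with $B(\psi) = O_p(1)$. This can make $l_a$ invariant and remove the need for an orthogonal parametrization.

$B(\psi) = \tfrac{1}{2} \log |\varphi_\lambda^{T}(\psi, \hat\lambda_\psi)\, j_{\varphi\varphi}(\hat\psi, \hat\lambda)\, \varphi_\lambda(\psi, \hat\lambda_\psi)|$
with $\varphi = \varphi(\theta) = l_{;V}(\theta; y^0)$, which comes from an approximating location model: Fraser (2003), Biometrika.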

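p-values from profile likelihood

First order: $p(\psi) \doteq \Phi(r_p)$
$r_p(\psi) = \pm[2\{l_p(\hat\psi) - l_p(\psi)\}]^{1/2} \;\dot\sim\; N(0, 1)$

Third order: $p(\psi) \doteq \Phi(r^*)$
$r^*(\psi) = r_p(\psi) + \dfrac{1}{r_p} \log \dfrac{Q}{r_p}$
$Q = (\hat\nu - \hat\nu_\psi)/\hat\sigma_\nu$
$\nu(\theta) = e_\psi^{T} \varphi(\theta)$, $e_\psi = \psi_{\varphi}(\hat\theta_\psi) / |\psi_{\varphi}(\hat\theta_\psi)|$,
$\hat\sigma_\nu^2 = |j_{(\lambda\lambda)}(\hat\theta_\psi)| / |j_{(\theta\theta)}(\hat\theta)|$,
$|j_{(\theta\theta)}(\hat\theta)| = |j_{\theta\theta}(\hat\theta)|\, |\varphi_\theta(\hat\theta)|^{-2}$,
$|j_{(\lambda\lambda)}(\hat\theta_\psi)| = |j_{\lambda\lambda}(\hat\theta_\psi)|\, |\varphi_\lambda(\hat\theta_\psi)|^{-2}$.

Fraser, Reid, Wu (1999), Biometrika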

Example: $\log y \sim N(\mu, \sigma^2)$: inference for $\psi = \log E(y)$
[Figure: p-value as a function of ψ; solid: 3rd order, dotted: 1st order]
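The first-order curve in the lognormal example can be reproduced with a short Python sketch. It assumes $\psi = \log E(y) = \mu + \sigma^2/2$ and works with $x_i = \log y_i$. Substituting $\mu = \psi - \sigma^2/2$ and solving the score equation in $\sigma^2$ gives the closed form $\hat\sigma^2_\psi = 2\{\sqrt{1 + m_2(\psi)} - 1\}$ with $m_2(\psi) = n^{-1}\sum (x_i - \psi)^2$. This derivation and the data values are not from the slides. The third-order curve needs the $Q$ of Fraser, Reid, Wu (1999) and is not attempted here.

```python
import math

def sigma2_hat_given_psi(psi, x):
    """Restricted m.l.e. of sigma^2 under mu = psi - sigma^2 / 2."""
    m2 = sum((xi - psi) ** 2 for xi in x) / len(x)
    return 2.0 * (math.sqrt(1.0 + m2) - 1.0)

def profile_loglik(psi, x):
    """l_p(psi) for the normal sample x = log y, up to a constant."""
    n = len(x)
    s = sigma2_hat_given_psi(psi, x)
    mu = psi - 0.5 * s
    rss = sum((xi - mu) ** 2 for xi in x)
    return -0.5 * n * math.log(s) - rss / (2.0 * s)

def mle_psi(x):
    """psi_hat = xbar + s0/2, with s0 the (divisor n) variance m.l.e."""
    n = len(x)
    xbar = sum(x) / n
    s0 = sum((xi - xbar) ** 2 for xi in x) / n
    return xbar + 0.5 * s0

def r_p(psi, x):
    """Signed root of the profile log-likelihood ratio for psi."""
    psi_hat = mle_psi(x)
    deviance = 2.0 * (profile_loglik(psi_hat, x) - profile_loglik(psi, x))
    return math.copysign(math.sqrt(max(deviance, 0.0)), psi_hat - psi)

x = [0.2, 1.1, -0.4, 0.9, 1.6, 0.3, 0.7]   # hypothetical values of log y
print("psi_hat =", mle_psi(x))
for psi in (0.5, 1.0, 1.5):
    print(psi, r_p(psi, x))
```

The first-order p-value function is then $\Phi\{r_p(\psi)\}$, evaluated on a grid of $\psi$ values.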

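Example: comparing two binomials

Employment of men and women at the Space Telescope Science Institute (from Science magazine, Volume 299, page 993, 14 February 2003).

[Table: rows Men, Women, Total; columns Left, Stayed, Total. The cell counts are not preserved in this copy.]

$Y_1 \sim Bin(19, p_1)$, $Y_2 \sim Bin(7, p_2)$
$\psi = \log \dfrac{p_1 (1 - p_2)}{p_2 (1 - p_1)}$

p-value for testing $\psi = 0$ (the numerical values are not preserved in this copy):
- using the normal approximation to the maximum likelihood estimate
- using the normal approximation to $r_p$ (1st order)
- using the normal approximation to $r^*$ (3rd order)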
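The first two p-values in the comparison can be computed with the Python sketch below. Under $\psi = 0$ the two success probabilities are equal and the restricted m.l.e. is the pooled proportion. The counts used here are hypothetical stand-ins, chosen only to match the sample sizes 19 and 7; they are not the data from Science. The third-order value requires $Q$ and is not attempted.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_loglik(p, y, n):
    """Binomial log-likelihood, dropping the binomial coefficient."""
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def log_odds_ratio_tests(y1, n1, y2, n2):
    """Return (psi_hat, Wald z, r_p) for testing psi = 0.
    Requires every cell of the 2x2 table to be nonzero."""
    p1, p2 = y1 / n1, y2 / n2
    psi_hat = math.log(p1 * (1.0 - p2) / (p2 * (1.0 - p1)))
    # Usual large-sample standard error of the log odds ratio
    se = math.sqrt(1.0 / y1 + 1.0 / (n1 - y1) + 1.0 / y2 + 1.0 / (n2 - y2))
    wald = psi_hat / se
    # Restricted m.l.e. under psi = 0: common (pooled) probability
    p0 = (y1 + y2) / (n1 + n2)
    l_hat = binom_loglik(p1, y1, n1) + binom_loglik(p2, y2, n2)
    l_0 = binom_loglik(p0, y1, n1) + binom_loglik(p0, y2, n2)
    r_p = math.copysign(math.sqrt(max(2.0 * (l_hat - l_0), 0.0)), psi_hat)
    return psi_hat, wald, r_p

# Hypothetical counts: 3 of 19 in group 1, 4 of 7 in group 2
psi_hat, wald, r_p = log_odds_ratio_tests(3, 19, 4, 7)
print("psi_hat =", psi_hat)
print("Phi(Wald z) =", norm_cdf(wald))
print("Phi(r_p)    =", norm_cdf(r_p))
```

Both values follow the convention $p(\psi) \doteq \Phi(\cdot)$ used in the slides; a two-sided p-value would double the smaller tail.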

Some more technical points

In special model classes, it is possible to eliminate nuisance parameters by either conditioning or marginalizing. The conditional or marginal likelihood then gives essentially exact inference for the parameter of interest, if this likelihood can itself be computed exactly.

The main example is the canonical parameter of an exponential family:
$f(y; \psi, \lambda) = \exp\{\psi s + \lambda^{T} t - c(\psi, \lambda) - d(y)\}$;
$f(s \mid t; \psi) = \exp\{\psi s - C_t(\psi) - D_t(s)\}$
$l_{cond}(\psi) = \psi s - C_t(\psi)$

The adjusted log-likelihood $l_a(\psi) = l_p(\psi) - \tfrac{1}{2} \log |j_{\lambda\lambda}|$ approximates $l_{cond}$.

The 3rd order p-value approximation is particularly simple:
$r^* = r_a + \dfrac{1}{r_a} \log\left(\dfrac{Q}{r_a}\right)$
$r = r_a = \pm[2\{l_a(\hat\psi_a) - l_a(\psi)\}]^{1/2}$
$Q = (\hat\psi_a - \psi)\{j_a(\hat\psi)\}^{1/2}$

A similar discussion applies to the class of transformation models, using marginal approximations. Both classes are reviewed in Reid (1996). The approximations given earlier reduce to these special cases.

Some References

Fraser, D.A.S., Reid, N., Wong, A. (2003). Inference for bounded parameters. xxx.lanl.gov/
Fraser, D.A.S. (2003). Likelihood for component parameters. Biometrika 90.
Fraser, D.A.S., Reid, N., Wu, J. (1999). A simple general formula for tail probabilities for frequentist and Bayesian inference. Biometrika 86.
Reid, N. (2003). Asymptotics and the theory of inference. Ann. Statist., to appear.
Reid, N. (1992). Aspects of modified profile likelihood. In Nonparametric Statistics and Related Topics, A.K.Md.E. Saleh, ed. North-Holland, Amsterdam.
Cox, D.R. and Reid, N. (1987). Parameter orthogonality and approximate conditional inference (with discussion). J. R. Statist. Soc. B 49.
Reid, N. (1996). Likelihood and higher-order approximations to tail areas: a review and annotated bibliography. Canad. J. Statist

Likelihood Inference in the Presence of Nuisance Parameters

PHYSTAT2003, SLAC, September 8-11, 2003
Likelihood Inference in the Presence of Nuisance Parameters
N. Reid, D.A.S. Fraser
Department of Statistics, University of Toronto, Toronto, Canada M5S 3G3
We describe