Maximum Likelihood Tests and Quasi-Maximum-Likelihood
Wendelin Schnedler
Department of Economics, University of Heidelberg
10 December 2007
Objectives

After this lecture, you should be able to
- test restrictions using the maximum likelihood method by
  - examining the restriction (Wald test)
  - comparing the values of the likelihood (likelihood ratio test)
  - checking the first-order condition (Lagrange multiplier test)
- decide when to apply which test
- explain what happens if the wrong density is used for the likelihood
Motivation

- often the linear model is not reasonable
  - last time: non-linear relationships
  - next time: limited dependent variables
- How can we estimate $\theta$ in such models?
  - GMM, non-linear regression, or maximum likelihood
  - today: maximum likelihood
- $H_0: R\theta = r$, how can we test this?
  - in the linear model: F-test
  - and now?
Maximum-Likelihood revisited

- $(Y, X) \sim f(y, x, \theta)$ and data $(y_i, X_i)$ for $i = 1, \dots, n$, i.i.d.
- joint density evaluated at the data: $\prod_i f_i(y_i, X_i, \theta)$; this is called the likelihood
- taking the logarithm gives the log-likelihood: $\ell(\theta) = \sum_i \ell_i(\theta) = \sum_i \log f_i(y_i, X_i, \theta)$
- the maximum likelihood estimator $\hat\theta^{ML}$ maximises the (log-)likelihood
- necessary condition: $\nabla \ell(\hat\theta^{ML}) = 0$, i.e. the score equals zero
- properties:
  - $\hat\theta^{ML}$ is consistent
  - $\sqrt{n}(\hat\theta^{ML} - \theta)$ is asymptotically normally distributed with variance $V = J^{-1}$, $J$: information matrix
  - $V$ equals the Cramér-Rao lower bound, so $\hat\theta^{ML}$ is asymptotically efficient
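To make the mechanics concrete, here is a minimal numerical sketch (not part of the original slides; the model, data, and names are made up for illustration): it maximises the log-likelihood of a normal linear model with `scipy.optimize.minimize` and recovers the ML estimates.

```python
# Minimal sketch: maximum likelihood for an assumed normal linear model
# y_i = x_i'beta + eps_i, eps_i ~ N(0, sigma^2), via numerical optimisation.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true, sigma_true = np.array([1.0, 2.0]), 0.5
y = X @ beta_true + sigma_true * rng.normal(size=n)

def neg_loglik(theta):
    # theta = (beta_0, beta_1, log sigma); log-parameterise sigma to keep it positive
    beta, log_sigma = theta[:2], theta[2]
    sigma = np.exp(log_sigma)
    resid = y - X @ beta
    # negative sum of log N(resid | 0, sigma^2) densities, for minimisation
    return 0.5 * np.sum(resid**2) / sigma**2 + n * log_sigma + 0.5 * n * np.log(2 * np.pi)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
theta_ml = res.x
print("beta_ML =", theta_ml[:2], " sigma_ML =", np.exp(theta_ml[2]))
```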
Consistent estimates of V

- Hessian notation of $V$:
  $V(\theta) = J^{-1} = \left( -\frac{1}{n} \sum_{i=1}^{n} E\left[ \frac{\partial^2 \log f_i(y_i, X_i, \theta)}{\partial\theta\,\partial\theta'} \right] \right)^{-1} \overset{\text{i.i.d.}}{=} \left( -E\left[ \frac{\partial^2 \log f(y, x, \theta)}{\partial\theta\,\partial\theta'} \right] \right)^{-1}$
- estimate: $\hat V_H = \left( -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 \log f_i(y_i, X_i, \hat\theta^{ML})}{\partial\theta\,\partial\theta'} \right)^{-1}$ is a consistent estimator of $V$!
- gradient notation of $V$:
  $V(\theta) = \left( \frac{1}{n} \sum_{i=1}^{n} E\left[ \left( \nabla \log f_i(y_i, X_i, \theta) \right)\left( \nabla \log f_i(y_i, X_i, \theta) \right)' \right] \right)^{-1}$
- estimate: $\hat V_G = \left( \frac{1}{n} \sum_{i=1}^{n} \left( \nabla \log f_i(y_i, X_i, \hat\theta^{ML}) \right)\left( \nabla \log f_i(y_i, X_i, \hat\theta^{ML}) \right)' \right)^{-1}$ is a consistent estimator of $V$
- $\hat V_G$: Berndt, Hall, Hall and Hausman (BHHH) estimator
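A numerical sketch of both estimates (again not from the slides; the simple model $y_i \sim N(\mu, \sigma^2)$ and all names are assumptions for illustration): $\hat V_H$ from a finite-difference Hessian and $\hat V_G$ from the outer product of per-observation scores (BHHH).

```python
# Minimal sketch: Hessian-based and BHHH estimates of V for y ~ N(mu, sigma^2),
# evaluated at the ML estimates.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=2.0, size=500)
n = y.size
theta_ml = np.array([y.mean(), y.std()])       # ML estimates (mu, sigma)

def score_i(theta):
    # per-observation gradient of log N(y_i | mu, sigma^2) w.r.t. (mu, sigma)
    mu, sigma = theta
    return np.column_stack([(y - mu) / sigma**2,
                            (y - mu)**2 / sigma**3 - 1.0 / sigma])

# BHHH / gradient form: V_G = ( (1/n) sum_i s_i s_i' )^{-1}
S = score_i(theta_ml)
V_G = np.linalg.inv(S.T @ S / n)

# Hessian form via central finite differences of the average score
def avg_score(theta):
    return score_i(theta).mean(axis=0)

eps, k = 1e-5, theta_ml.size
H = np.zeros((k, k))
for j in range(k):
    e = np.zeros(k); e[j] = eps
    H[:, j] = (avg_score(theta_ml + e) - avg_score(theta_ml - e)) / (2 * eps)
V_H = np.linalg.inv(-H)

print("V_H =\n", V_H, "\nV_G =\n", V_G)       # close to each other for large n
```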
The Wald test

- $H_0: R\theta - r = 0$, number of restrictions: $q$
- test idea: how well does this equation hold?
- estimate without restriction: $\hat\theta^{ML}$
- How is $\sqrt{n}(R\hat\theta^{ML} - r)$ asymptotically distributed under $H_0$?
- under $H_0$: $E\left[\sqrt{n}(R\hat\theta^{ML} - r)\right] = 0$ asymptotically. Why? Because $\hat\theta^{ML}$ is close to $\theta$.
- asymptotic variance of $\sqrt{n}(R\hat\theta^{ML} - r)$ is $RVR'$
- $\sqrt{n}(R\hat\theta^{ML} - r) \overset{asy.}{\sim} N(0, RVR')$
- replace the unknown $V$ by a consistent $\hat V$:
  $W := n\,(R\hat\theta^{ML} - r)'\,(R \hat V R')^{-1}\,(R\hat\theta^{ML} - r) \overset{asy.}{\sim} \chi^2(q)$
- $q$: number of restrictions (elements of $r$)
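A sketch of the statistic with purely hypothetical numbers ($\hat\theta^{ML}$, $\hat V$, $R$, $r$ are made up; only the formula is from the slide):

```python
# Minimal sketch: Wald statistic W = n (R theta - r)' (R V R')^{-1} (R theta - r)
# and its chi^2(q) p-value.
import numpy as np
from scipy.stats import chi2

n = 200
theta_ml = np.array([0.9, 2.1, 0.4])            # assumed unrestricted ML estimate
V_hat = np.diag([0.5, 0.8, 0.3])                # assumed consistent estimate of V
R = np.array([[1.0, 0.0, 0.0],                  # H0: theta_1 = 1 and theta_2 = 2
              [0.0, 1.0, 0.0]])
r = np.array([1.0, 2.0])
q = R.shape[0]

diff = R @ theta_ml - r
W = n * diff @ np.linalg.inv(R @ V_hat @ R.T) @ diff
print(f"W = {W:.3f}, p-value = {chi2.sf(W, df=q):.3f}")
```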
More on the Wald test

- alternative (but equivalent) test: take $q\hat V$ instead of $\hat V$
- this gives $W_q := n\,(R\hat\theta^{ML} - r)'\,(R\, q\hat V\, R')^{-1}\,(R\hat\theta^{ML} - r) = W/q \overset{asy.}{\sim} F(q, n-k)$, $k$: number of estimated parameters
- $W_q$ and $W$ are asymptotically equivalent
- in the linear model with normally distributed errors:
  - ML estimator = OLS estimator
  - variance of the ML estimator: $(X'X)^{-1}\sigma^2$, can be estimated by $(X'X)^{-1}s^2$
  - then $W_q \sim F(q, n-k)$ exactly
- normal linear model: the Wald test is therefore also valid in finite samples
Do restrictions have to be linear?

- until now $H_0: R\theta - r = 0$
- a mighty class of restrictions, which includes
  - $\theta_1 = 4$ with $R = (1, 0, \dots, 0)$ and $r = 4$
  - $\theta_1 = \theta_2$ with $R = (1, -1, 0, \dots, 0)$ and $r = 0$
  - $\theta_1 = 2\theta_2$ with $R = (1, -2, 0, \dots, 0)$ and $r = 0$
  - $\theta_1 = \theta_2^2$ with $R = \dots$ no! not possible
- extend the tests such that the restriction is $h(\theta) = 0$
- requirements: $\nabla h(\theta)$ has full rank, $\hat\theta_R^{ML}$ in the interior of the parameter space
- replace $R\theta - r$ by $h(\theta)$ in all formulas and the variance $R \hat V R'$ by $\nabla h(\theta)\, \hat V\, \nabla h(\theta)'$, where $\nabla h$ is the Jacobian of $h$
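A sketch of the resulting Wald test for a non-linear restriction (hypothetical estimates and variance; the restriction $\theta_1 = \theta_2^2$ is the example from the slide):

```python
# Minimal sketch: Wald test of h(theta) = 0 with h(theta) = theta_1 - theta_2^2,
# replacing R by the Jacobian of h in the variance formula.
import numpy as np
from scipy.stats import chi2

n = 200
theta_ml = np.array([4.2, 1.9])                 # assumed unrestricted ML estimate
V_hat = np.array([[0.6, 0.1],
                  [0.1, 0.4]])                  # assumed consistent estimate of V

def h(theta):                                   # restriction: theta_1 = theta_2^2
    return np.array([theta[0] - theta[1]**2])

def h_jac(theta):                               # Jacobian of h, shape (q, k)
    return np.array([[1.0, -2.0 * theta[1]]])

Hj = h_jac(theta_ml)
W = n * h(theta_ml) @ np.linalg.inv(Hj @ V_hat @ Hj.T) @ h(theta_ml)
q = h(theta_ml).size
print(f"W = {float(W):.3f}, p-value = {chi2.sf(float(W), df=q):.3f}")
```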
Likelihood ratio test

- $H_0: h(\theta) = 0$
- test idea: how much larger is the likelihood without $H_0$?
- estimate without restriction: $\hat\theta^{ML}$
- estimate with restriction: $\hat\theta_R^{ML}$; how? maximise the likelihood subject to the side-constraint $h(\theta) = 0$
- note that $\ell(\hat\theta^{ML}) - \ell(\hat\theta_R^{ML}) \geq 0$
- what is the distribution? Expand using a Taylor series:
  $2\left(\ell(\hat\theta^{ML}) - \ell(\hat\theta_R^{ML})\right) = -(\hat\theta^{ML} - \hat\theta_R^{ML})'\, \nabla^2 \ell(\hat\theta^{ML})\, (\hat\theta^{ML} - \hat\theta_R^{ML}) + \text{rest} \overset{asy.}{\sim} \chi^2(q)$
- $q$: number of restrictions
- the Taylor expansion is useful for any two nested models
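A numerical sketch (assumed setup, not from the slides): both maximisation problems have closed forms in a normal model, so the LR statistic can be computed directly.

```python
# Minimal sketch: likelihood ratio test in y ~ N(mu, sigma^2),
# restriction H0: mu = 0 (q = 1 restriction).
import numpy as np
from scipy.stats import chi2, norm

rng = np.random.default_rng(2)
y = rng.normal(loc=0.2, scale=1.0, size=300)

# unrestricted ML: mu_hat = sample mean, sigma_hat = ML standard deviation
mu_u, sigma_u = y.mean(), y.std()
# restricted ML under mu = 0: sigma maximises the likelihood given mu = 0
sigma_r = np.sqrt(np.mean(y**2))

loglik_u = norm.logpdf(y, loc=mu_u, scale=sigma_u).sum()
loglik_r = norm.logpdf(y, loc=0.0, scale=sigma_r).sum()

LR = 2 * (loglik_u - loglik_r)                  # >= 0 by construction
print(f"LR = {LR:.3f}, p-value = {chi2.sf(LR, df=1):.3f}")
```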
Score test (preparation)

- $H_0: h(\theta) = 0$
- recall: $s(\hat\theta^{ML}) := \nabla \ell(\hat\theta^{ML}) = 0$
- if $H_0$ is valid: $\nabla \ell(\hat\theta_R^{ML}) \approx 0$
- test idea: examine the size of the score
- how is the score distributed?
  $s(\hat\theta_R^{ML}) = \nabla \ell(\hat\theta_R^{ML}) = \sum_i \nabla \log f_i(y_i, X_i, \hat\theta_R^{ML}) = \sum_i s_i(\hat\theta_R^{ML})$
- thus $\frac{1}{\sqrt{n}}\, s(\hat\theta_R^{ML})$ is asymptotically normal with
  $E\left[\frac{1}{\sqrt{n}}\, s(\hat\theta_R^{ML})\right] = 0$ and $\mathrm{VAR}\left[\frac{1}{\sqrt{n}}\, s(\hat\theta_R^{ML})\right] = -\frac{1}{n} \sum_{i=1}^{n} E\left[ \frac{\partial^2 \log f_i(y_i, X_i, \hat\theta_R^{ML})}{\partial\theta\,\partial\theta'} \right] = J$
- estimate for $J^{-1}$? as before: $\hat V$
Score test

- $H_0: h(\theta) = 0$
- consider $\frac{1}{\sqrt{n}}\, s(\hat\theta_R^{ML})'\, \hat V(\hat\theta_R^{ML})\, \frac{1}{\sqrt{n}}\, s(\hat\theta_R^{ML}) \overset{asy.}{\sim} \chi^2(q)$, where $q$ is the number of restrictions in $H_0$
- score test or Lagrange multiplier (LM-) test
- why the name Lagrange multiplier test? maximise $\ell(\theta)$ under the restriction $H_0: h(\theta) = 0$
- Lagrange approach: $\ell(\theta) - \lambda' h(\theta)$
- first-order condition: $\nabla \ell(\hat\theta_R^{ML}) = \nabla h(\hat\theta_R^{ML})' \lambda$, the same as $s(\hat\theta_R^{ML}) = \nabla h(\hat\theta_R^{ML})' \lambda$
- tests based on $s$ and $\lambda$ are equivalent (as $\nabla h(\theta)$ has full rank)
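A numerical sketch (same assumed normal model as in the LR sketch above; all names are illustrative): the score is evaluated at the restricted estimate and weighted by the BHHH estimate of $V = J^{-1}$.

```python
# Minimal sketch: score / LM test in y ~ N(mu, sigma^2), restriction H0: mu = 0,
# evaluated at the restricted ML estimate (0, sigma_r).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
y = rng.normal(loc=0.2, scale=1.0, size=300)
n = y.size
sigma_r = np.sqrt(np.mean(y**2))                # restricted ML of sigma under mu = 0
theta_r = np.array([0.0, sigma_r])

def score_i(theta):
    # per-observation score of log N(y_i | mu, sigma^2) w.r.t. (mu, sigma)
    mu, sigma = theta
    return np.column_stack([(y - mu) / sigma**2,
                            (y - mu)**2 / sigma**3 - 1.0 / sigma])

S = score_i(theta_r)
s_bar = S.sum(axis=0) / np.sqrt(n)              # (1/sqrt(n)) * total score
V_hat = np.linalg.inv(S.T @ S / n)              # BHHH estimate of V = J^{-1}
LM = s_bar @ V_hat @ s_bar
print(f"LM = {LM:.3f}, p-value = {chi2.sf(LM, df=1):.3f}")
```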
Similarities of Wald, score and likelihood ratio test

[Figure: log-likelihood plotted against the parameter, marking the unrestricted estimate $\hat\theta^{ML}$ and the restricted estimate $\hat\theta_R^{ML}$; the likelihood ratio test looks at the difference in log-likelihood between the two, the Wald test at the distance between the estimates, and the score test at the slope of the log-likelihood at $\hat\theta_R^{ML}$.]
Overview of the tests

- how do you test the hypothesis $H_0: h(\theta) = 0$?
- all three tests ("holy trinity") are asymptotically equivalent
- problem: which test should be used? criteria:
  - ease of computation (number of maximisation problems):
    - Wald test: one (unrestricted estimate)
    - score test: one (restricted estimate)
    - likelihood ratio test: two (both)
  - finite-sample properties: few analytical results; the Wald test is exact for the linear model with normal errors
  - robustness: sensitive to ...
    - ... algebraically equivalent parameterisations? Wald: yes; score: sometimes; LR: never (its value is invariant to reparameterisation)
    - ... wrong specification of the density? next
Maximum likelihood as a moment estimator

- assumed density: $g(y_i|x_i, \theta)$, true density: $f(y_i|x_i, \theta)$
- what if $\hat\theta$ is a moment estimator? if the moment condition is valid, the estimator is still consistent
- can we write the likelihood estimator as a moment estimator?
- definition of a density: $\int f(y_i|x_i; \theta)\, dy_i = 1$
- differentiate: $\int \nabla f(y_i|x_i; \theta)\, dy_i = 0$
- observe that $\nabla \log f(y_i|x_i; \theta) = \frac{\nabla f(y_i|x_i; \theta)}{f(y_i|x_i; \theta)}$
- so that $\int \nabla f(y_i|x_i; \theta)\, dy_i = \int \nabla \log f(y_i|x_i; \theta)\, f(y_i|x_i; \theta)\, dy_i$
- which means $E\left[\nabla \log f(y_i|x_i; \theta)\right] = 0$: a valid moment condition for any density
- identical to the first-order condition that identifies the ML estimator
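A quick numerical check of this moment condition (assumed toy setup, not from the slides): at the true parameter, the sample mean of the score is close to zero.

```python
# Minimal sketch: check E[d/dmu log f(y|mu)] = 0 at the true parameter,
# here for f = N(mu, 1) with true mu = 1.5.
import numpy as np

rng = np.random.default_rng(4)
mu_true = 1.5
y = rng.normal(loc=mu_true, scale=1.0, size=100_000)

score = y - mu_true                             # d/dmu log N(y | mu, 1) = y - mu
print("sample mean of score:", score.mean())    # close to 0
```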
Saving maximum likelihood estimation

- assumed density: $g(y_i|x_i, \theta)$, true density: $f(y_i|x_i, \theta)$
- but $E\left[\nabla \log g(y_i|x_i; \theta)\right] = 0$ can still be a valid moment condition
- use method-of-moments techniques: replace the theoretical moment by the sample moment
- resulting moment estimator: quasi-maximum-likelihood estimator
- result by Gourieroux/Monfort/Trognon (1984): $\sqrt{n}(\hat\theta - \theta) \overset{asy.}{\sim} N(0, V)$ with $V = I^{-1} J I^{-1}$ and
  $I = -\frac{1}{n} \sum_{i=1}^{n} E\left[ \frac{\partial^2 \log g_i(y_i, x_i, \theta)}{\partial\theta\,\partial\theta'} \right]$, $J = \frac{1}{n} \sum_{i=1}^{n} E\left[ \left(\nabla \log g_i(y_i, x_i, \theta)\right)\left(\nabla \log g_i(y_i, x_i, \theta)\right)' \right]$
- estimate $I$ by $\hat I = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 \log g_i(y_i, x_i, \hat\theta)}{\partial\theta\,\partial\theta'}$
- estimate $J$ by $\hat J = \frac{1}{n} \sum_{i=1}^{n} \left(\nabla \log g_i(y_i, x_i, \hat\theta)\right)\left(\nabla \log g_i(y_i, x_i, \hat\theta)\right)'$
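A numerical sketch of the sandwich formula (assumed setup, all names illustrative): a Gaussian quasi-likelihood with fixed $\sigma = 1$ for a linear model whose true errors are heteroscedastic, so $g \neq f$. For the linear model this reproduces the heteroscedasticity-consistent covariance estimator mentioned on the next slide.

```python
# Minimal sketch: sandwich estimator V = I^{-1} J I^{-1} for a Gaussian
# quasi-likelihood of a linear model with heteroscedastic true errors.
import numpy as np

rng = np.random.default_rng(5)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
eps = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))   # heteroscedastic errors
y = X @ beta_true + eps

# Gaussian QMLE of beta with sigma fixed at 1 is just OLS
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# per-observation score of log g_i w.r.t. beta: x_i * resid_i (sigma = 1)
S = X * resid[:, None]
I_hat = X.T @ X / n                    # minus average Hessian of log g_i
J_hat = S.T @ S / n                    # average outer product of scores
V_sandwich = np.linalg.inv(I_hat) @ J_hat @ np.linalg.inv(I_hat)

# robust standard errors for beta_hat: sqrt(diag(V/n))
print("robust s.e.:", np.sqrt(np.diag(V_sandwich / n)))
```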
More results on quasi-maximum likelihood

- if $g = f$, it follows that $J = I$: back to ordinary ML
- $V$ is called the sandwich formula; the information matrix term is sandwiched between correction matrices
- estimating $V$ by $\hat I^{-1} \hat J \hat I^{-1}$: sandwich estimator
- the sandwich estimator leads to robust standard errors, also called White or Huber estimates
- here: robustness with respect to density misspecification
- can be used in the Wald test; how? replace $\hat V$ by $\hat I^{-1} \hat J \hat I^{-1}$
- by the way: in the linear model the sandwich estimator is an old friend: the heteroscedasticity-consistent covariance estimator
Where are we?

- when estimating using maximum likelihood:
  - how can we test hypotheses?
  - what happens if the density is wrong?
- next: models where the dependent variable is limited
- key point: such models can only be estimated with ML