Questions and Answers on Maximum Likelihood

L. Magee, Fall 2008

1. Given an observation-specific log likelihood function $l_i(\theta) = \ln f(y_i \mid x_i, \theta)$, the log likelihood function $l(\theta \mid y, X) = \sum_{i=1}^n l_i(\theta)$, a data set $(x_i, y_i)$, $i = 1, \dots, n$, and a value for the maximum likelihood estimator $\hat\theta$ of the parameter vector $\theta$, briefly describe how you would compute
(a) the negative Hessian estimator of the variance of $\hat\theta$
(b) the outer product of gradient (OPG) estimator of the variance of $\hat\theta$
(c) a misspecification-consistent variance estimator that follows from interpreting the ML estimator as a method of moments estimator

2. The random variable $y$ has a probability density function
$f(y) = (1-\theta) + 2\theta y$ for $0 < y < 1$
$f(y) = 0$ otherwise
for $-1 < \theta < 1$. There are $n$ observations $y_i$, $i = 1, \dots, n$, drawn independently from this distribution.
(a) (i) Write the cumulative distribution function of $y$.
(ii) Derive the expected value of $y$.
(iii) Suggest a method of moments estimator for $\theta$ based on the sample mean $\bar y$.
(b) (i) Write the log likelihood function for $\theta$.
(ii) Write the first-order condition for the ML estimator of $\theta$.

3. $y_1, \dots, y_n$ are independent draws from an exponential distribution. The probability density function of each $y_i$ is $f(y_i \mid \theta) = \theta^{-1}\exp(-y_i/\theta)$, where $y_i > 0$ and $\theta > 0$. The exponential distribution has the property $E(y_i) = \theta$.
(a) Derive
(i) the observation-specific log likelihood function $l_i(\theta)$
(ii) the log likelihood function $l(\theta)$
(iii) the maximum likelihood (ML) estimator of $\theta$, $\hat\theta$.

(b) Derive the following estimators of the variance of $\hat\theta$, showing their general formulas as part of your answer.
(i) the negative Hessian variance estimator
(ii) the information matrix variance estimator
(iii) the outer product of gradient (OPG) variance estimator
(iv) the misspecification-consistent variance estimator that follows from interpreting the ML estimator as a method of moments estimator

4. Given observations on the scalar $x_i$, $i = 1, \dots, n$, each $y_i$ is independently drawn according to the conditional pdf
$f(y_i \mid x_i, \theta) = (x_i\theta)^{-1}\exp\left(-\frac{y_i}{x_i\theta}\right)$
where $y_i > 0$, $x_i > 0$, and $\theta > 0$. $\theta$ is an unknown scalar parameter.
(a) Write the observation-specific log likelihood function $l_i(\theta)$.
(b) Write the log likelihood function $l(\theta) = \sum_i l_i(\theta)$.
(c) Derive the maximum likelihood (ML) estimator of $\theta$.
(d) In this model, $E(y_i \mid x_i, \theta) = x_i\theta$. Using this fact, suggest another consistent estimator of $\theta$ that is different from the ML estimator in (c). No explanation is required.

5. (16 marks: 4 for each part) Let $y_i$, $i = 1, \dots, n$, be independently observed non-negative integers drawn from a Poisson distribution
$\mathrm{Prob}(y_i \mid \theta) = \frac{\theta^{y_i} e^{-\theta}}{y_i!}$, $y_i = 0, 1, 2, \dots$
The Poisson distribution has the property $E(y_i \mid \theta) = \theta$.
(Aside: "!" is known as the factorial operator. $y_i!$, or "$y_i$ factorial", is defined as $y_i! = 1 \times 2 \times \cdots \times (y_i - 1) \times y_i$, with $0! = 1$. In the current question, this term serves as a normalizing constant and has no effect on the derivations of the maximum likelihood estimator or its variance estimators, much like the $\sqrt{2\pi}$ term in the denominator of the normal pdf.)
(a) Write the observation-specific log likelihood function $l_i(\theta)$.
(b) Write the log likelihood function $l(\theta) = \sum_i l_i(\theta)$.
(c) Derive $\hat\theta$, the maximum likelihood (ML) estimator of $\theta$.
(d) Derive an estimator of the variance of $\hat\theta$ using any one of the four standard methods.

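Answers

1. (a) The negative Hessian estimator: $\hat V_a = \left(-\frac{\partial^2 l}{\partial\theta\,\partial\theta'}\right)^{-1}$, evaluated at $\theta = \hat\theta$.
(b) The OPG estimator: $\hat V_b = \left(\sum_{i=1}^n \frac{\partial l_i}{\partial\theta}\frac{\partial l_i}{\partial\theta'}\right)^{-1}$, where the $\partial l_i/\partial\theta$'s are evaluated at $\theta = \hat\theta$.
(c) The misspecification-consistent estimator: given the definitions in (a) and (b), it can be written as $\hat V_c = \hat V_a \hat V_b^{-1} \hat V_a$, or
$\hat V_c = \left(-\frac{\partial^2 l}{\partial\theta\,\partial\theta'}\right)^{-1}\left(\sum_{i=1}^n \frac{\partial l_i}{\partial\theta}\frac{\partial l_i}{\partial\theta'}\right)\left(-\frac{\partial^2 l}{\partial\theta\,\partial\theta'}\right)^{-1}$

2. (a) (i) The probability density function satisfies $f(y) = 0$ when $y < 0$ and $f(y) = 0$ when $y > 1$. Therefore when $y < 0$, the cdf is $F(y) = \int_{-\infty}^{y} f(s)\,ds = 0$, and when $y > 1$, $F(y) = 1$. When $0 < y < 1$,
$F(y) = \int_0^y \left((1-\theta) + 2\theta s\right) ds = \left[(1-\theta)s + \theta s^2\right]_{s=0}^{y} = (1-\theta)y + \theta y^2, \quad 0 < y < 1$
(ii) $E(y) = \int_0^1 y f(y)\,dy = \int_0^1 y\left((1-\theta) + 2\theta y\right) dy = \left[\tfrac12(1-\theta)y^2 + \tfrac23\theta y^3\right]_{y=0}^{1} = \tfrac12(1-\theta) + \tfrac23\theta = \tfrac12 + \tfrac16\theta$
(iii) From (ii), $E(y) = \frac12 + \frac16\theta$, which gives a population moment condition
$E\left(y - \left(\tfrac12 + \tfrac16\theta\right)\right) = 0$
The sample moment condition is
$\frac1n\sum_{i=1}^n \left(y_i - \left(\tfrac12 + \tfrac16\hat\theta\right)\right) = 0$
which can be written as $\bar y - \frac12 - \frac16\hat\theta = 0$, and the estimator is $\hat\theta = 6\bar y - 3$.
(b) (i) $l(\theta) = \sum_{i=1}^n \ln(1 - \theta + 2\theta y_i)$
(ii) There is no closed-form solution. The first-order condition is $\partial l(\theta)/\partial\theta = 0$, or
$\sum_{i=1}^n \frac{2y_i - 1}{1 - \theta + 2\theta y_i} = 0$ at $\theta = \hat\theta$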
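To make answer 2 concrete, here is a minimal Python sketch, assuming the sample is a plain list of floats strictly between 0 and 1; the function names `cdf`, `mom_estimate`, `score` and `ml_estimate` are hypothetical. Because the first-order condition has no closed-form solution, the sketch swaps in bisection on the score, a numerical method of our choosing rather than part of the answer above. Bisection is reasonable here because the second derivative, $-\sum_i (2y_i-1)^2/(1-\theta+2\theta y_i)^2$, is never positive, so the score is non-increasing in $\theta$. When the score does not change sign on $(-1, 1)$, the sketch returns a point next to the relevant boundary, which is an assumption about how to treat that case.

```python
# Hypothetical helpers (not from the original notes) illustrating answer 2.


def cdf(y, theta):
    """CDF of f(y) = (1 - theta) + 2*theta*y on (0, 1), zero elsewhere."""
    if y <= 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    return (1.0 - theta) * y + theta * y * y


def mom_estimate(ys):
    """Method of moments: solve ybar = 1/2 + theta/6 for theta."""
    ybar = sum(ys) / len(ys)
    return 6.0 * ybar - 3.0


def score(theta, ys):
    """First derivative of l(theta) = sum ln(1 - theta + 2*theta*y_i)."""
    return sum((2.0 * y - 1.0) / (1.0 - theta + 2.0 * theta * y) for y in ys)


def ml_estimate(ys, tol=1e-12):
    """Solve score(theta) = 0 on (-1, 1) by bisection (assumed approach)."""
    lo, hi = -1.0 + 1e-9, 1.0 - 1e-9
    # The score is non-increasing, so no sign change means a boundary maximum.
    if score(lo, ys) <= 0.0:
        return lo
    if score(hi, ys) >= 0.0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, ys) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)
```

For example, with the symmetric sample [0.2, 0.8], $\bar y = 1/2$ and the score is exactly zero at $\theta = 0$, so both `mom_estimate` and `ml_estimate` return (approximately) 0.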

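3. (a) (i) $l_i(\theta) = \ln f(y_i \mid \theta) = -\ln\theta - y_i/\theta$
(ii) $l(\theta) = \sum_{i=1}^n l_i(\theta) = -n\ln\theta - \sum_{i=1}^n y_i/\theta$
(iii) $\hat\theta$ is the value of $\theta$ that solves $\partial l/\partial\theta = 0$. Therefore
$\frac{\partial l}{\partial\theta} = -\frac{n}{\theta} + \frac{\sum y_i}{\theta^2}$
and setting this to zero at $\hat\theta$ gives
$\frac{n}{\hat\theta} = \frac{\sum y_i}{\hat\theta^2} \implies \hat\theta = \frac{\sum y_i}{n} = \bar y$
(b) (i) $\frac{\partial^2 l}{\partial\theta^2} = \frac{n}{\theta^2} - \frac{2\sum y_i}{\theta^3}$
Evaluating this at $\hat\theta = \bar y$ and subbing out $\sum y_i = n\bar y$ gives
$\frac{\partial^2 l(\hat\theta)}{\partial\theta^2} = \frac{n}{\hat\theta^2} - \frac{2(n\bar y)}{\hat\theta^3} = -\frac{n}{\hat\theta^2}$
The negative Hessian variance estimator is
$\hat V_1 = \left(-\frac{\partial^2 l(\hat\theta)}{\partial\theta^2}\right)^{-1} = \frac{\hat\theta^2}{n}$
(ii) The information matrix is minus one times the expected value of the second derivative matrix derived in part (i). The exponential density assumption implies $E(y_i) = \theta$, so
$E\left(\frac{\partial^2 l}{\partial\theta^2}\right) = \frac{n}{\theta^2} - \frac{2\sum E(y_i)}{\theta^3} = \frac{n}{\theta^2} - \frac{2n\theta}{\theta^3} = -\frac{n}{\theta^2}$
The information matrix variance estimator is the inverse of the information matrix, evaluated at $\hat\theta$:
$\hat V_2 = \left(\frac{n}{\hat\theta^2}\right)^{-1} = \frac{\hat\theta^2}{n}$
(iii) Evaluate the gradient, or first derivative, of $l_i$ at $\hat\theta$:
$\frac{\partial l_i(\hat\theta)}{\partial\theta} = -\frac{1}{\hat\theta} + \frac{y_i}{\hat\theta^2} = \frac{y_i - \hat\theta}{\hat\theta^2} = \frac{y_i - \bar y}{\hat\theta^2}$
For notational convenience, use $\hat\sigma^2 = \frac1n\sum_{i=1}^n (y_i - \bar y)^2$, even though there is no $\sigma^2$ parameter in the model. Then the OPG is
$\sum_i \left(\frac{\partial l_i(\hat\theta)}{\partial\theta}\right)\left(\frac{\partial l_i(\hat\theta)}{\partial\theta}\right) = \sum_i \left(\frac{\partial l_i(\hat\theta)}{\partial\theta}\right)^2 = \frac{\sum_i (y_i - \bar y)^2}{\hat\theta^4} = \frac{n\hat\sigma^2}{\hat\theta^4}$
The outer product of gradient (OPG) variance estimator is the inverse of this OPG:
$\hat V_3 = \frac{\hat\theta^4}{n\hat\sigma^2}$
(Aside: $\hat V_3$ has the odd feature that $\hat\sigma^2$ appears in the denominator rather than the numerator. But it turns out that for the exponential distribution, $\mathrm{Var}(y_i) = \theta^2$. Since $\mathrm{plim}(\hat\sigma^2) = \mathrm{Var}(y_i)$, then as $n \to \infty$, $\hat\sigma^2$ and $\hat\theta^2$ both converge in probability to $\theta^2$. So as $n \to \infty$, $n\hat V_3$ approaches $\theta^4/\theta^2 = \theta^2$, the same limit as $n\hat V_1$ and $n\hat V_2$. This equivalence depends on the assumption that $y_i$ has an exponential distribution.)
(iv) $\hat V_4 = \left(-\frac{\partial^2 l(\hat\theta)}{\partial\theta^2}\right)^{-1}\left(\sum_i \frac{\partial l_i(\hat\theta)}{\partial\theta}\frac{\partial l_i(\hat\theta)}{\partial\theta}\right)\left(-\frac{\partial^2 l(\hat\theta)}{\partial\theta^2}\right)^{-1} = \hat V_1 (\hat V_3)^{-1} \hat V_1 = \left(\frac{\hat\theta^2}{n}\right)\left(\frac{n\hat\sigma^2}{\hat\theta^4}\right)\left(\frac{\hat\theta^2}{n}\right) = \frac{\hat\sigma^2}{n}$

4. (a) $l_i(\theta) = -\ln(x_i\theta) - \frac{y_i}{x_i\theta}$
(b) $l(\theta) = \sum_i l_i(\theta) = -\sum_i \ln(x_i\theta) - \sum_i \frac{y_i}{x_i\theta}$
(c) $\frac{\partial l(\theta)}{\partial\theta} = -\sum_i \frac{\partial \ln(x_i\theta)}{\partial\theta} - \sum_i \frac{\partial (y_i/(x_i\theta))}{\partial\theta} = -\frac{n}{\theta} + \sum_i \frac{y_i}{x_i}\cdot\frac{1}{\theta^2}$
This equals 0 when $-\frac{n}{\hat\theta} + \frac{1}{\hat\theta^2}\sum_i \frac{y_i}{x_i} = 0$, so
$\hat\theta = \frac1n\sum_i \frac{y_i}{x_i}$
(d) Since $E(y_i \mid x_i, \theta) = x_i\theta$, then $E\,m(y_i, x_i, \theta) = 0$ where $m = y_i - x_i\theta$. This population moment condition leads to the sample moment condition $\frac1n\sum_i (y_i - x_i\hat\theta) = 0$. Solving for $\hat\theta$ gives $\hat\theta = \sum_i y_i / \sum_i x_i = \bar y/\bar x$. (Another choice of moment condition is $E\,x_i(y_i - x_i\theta) = 0$, which leads to OLS, $\hat\theta = \sum_i x_i y_i / \sum_i x_i^2$.)

5. (a) $l_i(\theta) = \ln(\mathrm{Prob}(y_i \mid \theta)) = y_i\ln\theta - \theta - \ln(y_i!)$
(b) $l(\theta) = \sum_i l_i(\theta) = \left(\sum_i y_i\right)\ln\theta - n\theta - \sum_i \ln(y_i!)$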
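The variance formulas in answer 3(b) and the estimators in answer 4 can be checked numerically. The following is a minimal Python sketch, assuming plain lists of positive floats with at least two distinct $y_i$ values (otherwise the OPG is zero and $\hat V_3$ is undefined); the function names `exponential_variances` and `q4_estimators` are hypothetical. For the exponential model it builds the Hessian, the expected Hessian and the per-observation gradients directly from the derivative formulas, so its output should agree with the closed forms $\hat\theta^2/n$, $\hat\theta^2/n$, $\hat\theta^4/(n\hat\sigma^2)$ and $\hat\sigma^2/n$.

```python
# Hypothetical helpers (not from the original notes) that check answers 3(b) and 4.


def exponential_variances(ys):
    """Return theta_hat and the four variance estimates V1..V4 for the exponential model."""
    n = len(ys)
    theta = sum(ys) / n  # ML estimator from 3(a)(iii): theta_hat = ybar
    # Observed second derivative n/theta^2 - 2*sum(y)/theta^3 at theta_hat
    hessian = n / theta**2 - 2.0 * sum(ys) / theta**3
    # Expected second derivative, replacing each y_i by E(y_i) = theta
    expected_hessian = n / theta**2 - 2.0 * n * theta / theta**3
    # Per-observation gradients dl_i/dtheta = (y_i - theta)/theta^2
    grads = [(y - theta) / theta**2 for y in ys]
    opg = sum(g * g for g in grads)  # scalar outer product of gradients
    v1 = -1.0 / hessian           # negative Hessian
    v2 = -1.0 / expected_hessian  # information matrix
    v3 = 1.0 / opg                # OPG (undefined if all y_i are equal)
    v4 = v1 * opg * v1            # sandwich: V1 * V3^{-1} * V1
    return theta, v1, v2, v3, v4


def q4_estimators(xs, ys):
    """Return the ML, method-of-moments and OLS estimates of theta from answer 4."""
    n = len(ys)
    ml = sum(y / x for x, y in zip(xs, ys)) / n                       # (1/n) sum y_i/x_i
    mom = sum(ys) / sum(xs)                                            # ybar / xbar
    ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)  # sum x_i y_i / sum x_i^2
    return ml, mom, ols
```

For instance, with $y = (1, 2, 3)$ we have $\hat\theta = 2$ and $\hat\sigma^2 = 2/3$, so the four estimates are $4/3$, $4/3$, $8$ and $2/9$.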

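(c) $\hat\theta$ is the value of $\theta$ satisfying the first-order condition $\partial l/\partial\theta = 0$:
$\frac{\partial l}{\partial\theta} = \frac{\sum_i y_i}{\theta} - n = 0$ at $\hat\theta = \frac{\sum_i y_i}{n} = \bar y$
(d) The negative Hessian variance estimator is $\hat V(\hat\theta) = \left(-\frac{\partial^2 l}{\partial\theta^2}\right)^{-1}$ evaluated at $\theta = \hat\theta$, and $\frac{\partial^2 l}{\partial\theta^2} = -\frac{\sum_i y_i}{\theta^2}$, therefore
$\hat V(\hat\theta) = \left(\frac{\sum_i y_i}{\hat\theta^2}\right)^{-1} = \frac{\hat\theta^2}{\sum_i y_i} = \frac{\hat\theta^2}{n\hat\theta} = \frac{\hat\theta}{n} = \frac{\bar y}{n}$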
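A minimal Python sketch of answer 5, assuming the counts are a list of non-negative integers with at least one positive value (so that $\hat\theta > 0$); `poisson_loglik` and `poisson_ml` are hypothetical names. It uses `math.lgamma(y + 1)` for $\ln(y_i!)$, a term which, as the aside in question 5 notes, does not affect $\hat\theta$ or its variance estimator.

```python
import math

# Hypothetical helpers (not from the original notes) illustrating answer 5.


def poisson_loglik(theta, ys):
    """l(theta) = sum(y_i) ln(theta) - n*theta - sum ln(y_i!)."""
    return (sum(ys) * math.log(theta)
            - len(ys) * theta
            - sum(math.lgamma(y + 1) for y in ys))  # lgamma(y + 1) = ln(y!)


def poisson_ml(ys):
    """Return theta_hat = ybar and the negative Hessian variance estimate."""
    n = len(ys)
    theta_hat = sum(ys) / n
    neg_hessian = sum(ys) / theta_hat**2  # -d^2 l / d theta^2 at theta_hat
    return theta_hat, 1.0 / neg_hessian   # equals theta_hat / n = ybar / n
```

With $y = (0, 1, 2, 3, 4)$, $\hat\theta = 2$ and the negative Hessian variance estimate is $2/5 = 0.4$, matching $\bar y/n$.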