Chapter 9
Maximum Likelihood Estimation

9.1 The Likelihood Function

Maximum likelihood is the most widely used estimation method. This chapter discusses the most important concepts behind maximum likelihood estimation along with some examples.

Let the probability density function (pdf) of a random variable, y, conditional on a set of parameters, θ, be denoted by f(y | θ). This function identifies the data-generating process that underlies an observed sample and provides a mathematical description of the data that the process will produce. The joint density of n independent and identically distributed (i.i.d.) observations from this process is the product of the individual densities,

    f(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta) = L(\theta \mid y).    (9.1)

This joint density is the likelihood function, and it is defined as a function of the unknown parameter vector, θ, and the collection of sample data, y. Writing the parameters as a function of the data, the log of the likelihood function can be written as

    \ln L(\theta \mid y) = \sum_{i=1}^{n} \ln f(y_i \mid \theta).    (9.2)

We can generalize the concept and allow the likelihood function to depend on other conditioning variables, x. Suppose that in the classical linear regression model the disturbance follows a normal distribution. Then, conditional on x_i, y_i is normally distributed with mean μ_i = x_i′β and variance σ². The log likelihood is

    \ln L(\theta \mid y, X) = \sum_{i=1}^{n} \ln f(y_i \mid x_i, \theta)    (9.3)
        = -\frac{1}{2} \sum_{i=1}^{n} \left[ \ln \sigma^2 + \ln(2\pi) + \frac{(y_i - x_i'\beta)^2}{\sigma^2} \right],

where X is the n × K matrix of data with the ith row equal to x_i′. We say that the parameter vector θ is identified (estimable) if for any other parameter vector θ* ≠ θ there is some data y for which L(θ* | y) ≠ L(θ | y).

9.2 Properties of Maximum Likelihood Estimators

Maximum likelihood estimators (MLEs) are attractive mainly because of their large-sample, or asymptotic, properties. Given certain regularity conditions, the MLE is asymptotically efficient. That is, it is consistent, asymptotically normally distributed, and has an asymptotic covariance matrix that is not larger than the asymptotic covariance matrix of any other consistent, asymptotically normally distributed estimator.

9.3 Maximum Likelihood Estimators: Two Examples

9.3.1 MLE in Stata: A Linear Regression Model

Suppose we want to estimate the following model:

    \text{foreign}_i = \beta_0 + \beta_1 \text{mpg}_i + \beta_2 \text{weight}_i + \varepsilon_i    (9.4)

The Stata MLE program is

capture program drop myols
program myols
        version 11
        args lnf xb lnsigma
        local y "$ML_y1"
        quietly replace `lnf' = ln(normalden(`y', `xb', exp(`lnsigma')))
end

To run this program we need to load the data and then type

use http://www.stata-press.com/data/r11/auto
ml model lf myols (xb: foreign = mpg weight) (lnsigma:)
ml maximize

This yields the following regression output:

initial:       log likelihood = -79.001451
Iteration 5:   log likelihood = -29.838155

                                                  Number of obs   =         74
                                                  Wald chi2(2)    =      43.88
Log likelihood = -29.838155                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
         mpg |  -.0194295   .0124106    -1.57   0.117    -.0437539    .0048949
      weight |  -.0004678   .0000924    -5.06   0.000    -.0006488   -.0002867
       _cons |   2.123506   .5181488     4.10   0.000     1.107953    3.139059
-------------+----------------------------------------------------------------
lnsigma      |
       _cons |   -1.01572   .0821995   -12.36   0.000    -1.176828   -.8546122
------------------------------------------------------------------------------

We can check these results against the OLS estimates obtained with the command reg foreign mpg weight:

reg foreign mpg weight

This yields the following regression output:

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   21.05
       Model |  5.75462023     2  2.87731012           Prob > F      =  0.0000
    Residual |  9.70483923    71  .136687876           R-squared     =  0.3722
-------------+------------------------------           Adj R-squared =  0.3546
       Total |  15.4594595    73  .211773417           Root MSE      =  .36971

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0194295   .0126701    -1.53   0.130     -.044693     .005834
      weight |  -.0004678   .0000943    -4.96   0.000    -.0006558   -.0002797
       _cons |   2.123506   .5289824     4.01   0.000     1.068745    3.178267
------------------------------------------------------------------------------

9.3.2 Binary Choice Models: Probit

A probit is a model for binary choice variables. The idea is that when the dependent variable takes only the values Y = 1 or Y = 0, we should use a binary choice model. For example, if a person chooses to buy a foreign car (foreign = 1), the probability of choosing this type of car can be modeled as a function of mpg and weight. In general,

    \text{Prob}(Y = 1 \mid x) = F(x, \beta)    (9.5)
    \text{Prob}(Y = 0 \mid x) = 1 - F(x, \beta)

The set of parameters β reflects the impact of changes in x on the probability. The problem at this point is to devise a suitable model for the right-hand side of the equation. The simplest approach is to retain the familiar linear regression,

    F(x, \beta) = x'\beta.    (9.6)

Because E[y | x] = F(x, β), we can construct the regression model

    y = E[y \mid x] + (y - E[y \mid x]) = x'\beta + \varepsilon.    (9.7)
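Returning briefly to the linear model of Section 9.3.1: the fact that maximizing the normal log likelihood reproduces the OLS coefficient estimates can be verified numerically. The following Python sketch is illustrative only — it uses synthetic data rather than the auto dataset, and parameterizes the standard deviation as exp(lnsigma), as the myols program does:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data standing in for the auto dataset (illustrative only)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([2.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

def neg_loglik(params):
    # params = (beta_0, beta_1, beta_2, ln sigma), mirroring the myols setup
    beta, lnsigma = params[:3], params[3]
    resid = y - X @ beta
    sigma2 = np.exp(2 * lnsigma)
    # Negative of the normal log likelihood from Equation 9.3
    return 0.5 * np.sum(np.log(sigma2) + np.log(2 * np.pi) + resid**2 / sigma2)

res = minimize(neg_loglik, x0=np.zeros(4), method="BFGS")
beta_mle = res.x[:3]

# OLS closed form: (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(beta_mle, beta_ols, atol=1e-3))  # True
```

Whatever the value of σ, the β that maximizes the normal likelihood is the one that minimizes the sum of squared residuals, which is why the two coefficient vectors agree.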
This linear probability model has a number of shortcomings. The first complication is that ε is heteroscedastic. Because x′β + ε must equal either 0 or 1, ε equals either −x′β or 1 − x′β, with probabilities 1 − F(·) and F(·), respectively.¹ The second complication is that we cannot constrain x′β to lie in the 0–1 interval. This means that the model can generate probabilities outside [0, 1] and potentially negative variances.

Fig. 9.1 Model for a probability, from Greene (2008).

As Figure 9.1 suggests, for a given regressor vector x, we would like

    \lim_{x'\beta \to +\infty} \text{Prob}(Y = 1 \mid x) = 1    (9.8)
    \lim_{x'\beta \to -\infty} \text{Prob}(Y = 1 \mid x) = 0

A continuous probability distribution defined over the real line should work. When the normal distribution is used, this gives rise to the probit model,

    \text{Prob}(Y = 1 \mid x) = \int_{-\infty}^{x'\beta} \phi(t)\, dt = \Phi(x'\beta),    (9.9)

where φ(·) and Φ(·) are the pdf and the cdf of the standard normal distribution, respectively.

¹ It can be shown that the variance is Var[ε | x] = x′β(1 − x′β).
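The advantage of Equation 9.9 over the linear probability model can be seen numerically: Φ(x′β) always lies strictly between 0 and 1, while the linear index x′β itself does not. A small Python sketch (the index values are made up for illustration):

```python
from scipy.stats import norm

# Made-up values of the index x'b, including ones a linear
# probability model would map outside [0, 1]
index = [-3.0, -1.0, 0.0, 1.0, 3.0]

for xb in index:
    p = norm.cdf(xb)  # probit: Prob(Y = 1 | x) = Phi(x'b)
    print(f"x'b = {xb:+.1f}  linear: {xb:+.2f}  probit: {p:.4f}")

# The probit probabilities stay inside (0, 1) and approach the
# limits in Equation 9.8 as x'b goes to +/- infinity.
```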
9.3.3 Probit via Step-by-Step MLE in Stata

In a probit model the log-likelihood function for Equation 9.5 is given by

    \ln \ell_j = \begin{cases} \ln \Phi(\theta_{1j}) & \text{if } y_j = 1 \\ \ln(1 - \Phi(\theta_{1j})) & \text{if } y_j = 0 \end{cases}

where θ_{1j} = x_j′b_1. The probit program is

capture program drop myprobit
program myprobit
        version 11
        args lnf xb
        local y "$ML_y1"
        quietly replace `lnf' = ln(normal(`xb')) if `y' == 1
        quietly replace `lnf' = ln(1 - normal(`xb')) if `y' == 0
end

To run this program

use http://www.stata-press.com/data/r11/auto
ml model lf myprobit (foreign = mpg weight)
ml maximize

The probit regression output is

initial:       log likelihood = -51.292891
Iteration 5:   log likelihood = -26.844189

                                                  Number of obs   =         74
                                                  Wald chi2(2)    =      20.75
Log likelihood = -26.844189                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1039503   .0515689    -2.02   0.044    -.2050235   -.0028772
      weight |  -.0023355   .0005661    -4.13   0.000     -.003445   -.0012261
       _cons |   8.275464   2.554142     3.24   0.001     3.269437    13.28149
------------------------------------------------------------------------------

You can check these results using the Stata built-in command for probit estimation:

probit foreign mpg weight

The probit regression output is

Iteration 0:   log likelihood =  -45.03321
Iteration 5:   log likelihood = -26.844189

Probit regression                                 Number of obs   =         74
                                                  LR chi2(2)      =      36.38
                                                  Prob > chi2     =     0.0000
Log likelihood = -26.844189                       Pseudo R2       =     0.4039

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1039503   .0515689    -2.02   0.044    -.2050235   -.0028772
      weight |  -.0023355   .0005661    -4.13   0.000     -.003445   -.0012261
       _cons |   8.275464   2.554142     3.24   0.001     3.269437    13.28149
------------------------------------------------------------------------------
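The same two-branch log likelihood that myprobit evaluates can be maximized outside Stata as a cross-check. The Python sketch below is illustrative only: it generates synthetic binary-choice data (not the auto dataset) and uses scipy's general-purpose optimizer; note that ln(1 − Φ(z)) is computed as norm.logcdf(−z) for numerical stability:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Synthetic binary-choice data (illustrative; not the auto dataset)
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def neg_loglik(beta):
    # Same two-branch log likelihood as the myprobit program:
    # ln Phi(x'b) if y = 1, ln(1 - Phi(x'b)) if y = 0,
    # using logcdf(-xb) = ln(1 - Phi(xb)) by symmetry of the normal
    xb = X @ beta
    return -np.sum(np.where(y == 1, norm.logcdf(xb), norm.logcdf(-xb)))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.success, res.x)  # estimates should land near beta_true
```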
The marginal effects in the probit model are given by

    \frac{\partial E[y \mid x]}{\partial x} = \phi(x'\beta)\beta.    (9.10)

In Stata these can be computed using

mfx compute

The output is

Marginal effects after probit
      y  = Pr(foreign) (predict)
         =  .16096991

------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     mpg |  -.0253924      .0128    -1.98   0.047  -.050478 -.000307   21.2973
  weight |  -.0005705      .00013   -4.25   0.000  -.000833 -.000308   3019.46
------------------------------------------------------------------------------

This command evaluates x′β in Equation 9.10 at the sample means of the regressors.

As a final comment, maximum likelihood estimation can be used in a large number of cases. Some examples include latent regression models, SUR, simultaneous equations models, nonlinear regression models, stochastic frontier models, count data models, dynamic discrete choice models, GARCH models, etc.
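Equation 9.10 is easy to verify numerically. The Python sketch below uses made-up coefficients and regressor means (not the estimates from the output above) and checks φ(x′β)β against a finite-difference derivative of Φ(x′β):

```python
import numpy as np
from scipy.stats import norm

# Illustrative probit coefficients and sample means (made up;
# not the estimates from the auto dataset above)
beta = np.array([4.0, -0.08, -0.001])   # _cons, x1, x2
xbar = np.array([1.0, 21.0, 2000.0])    # leading 1 for the constant term

# Equation 9.10 evaluated at the sample means: dE[y|x]/dx = phi(xbar'b) * b
xb = xbar @ beta
mfx = norm.pdf(xb) * beta[1:]           # effects for the two regressors

# Cross-check the first effect with a finite difference of Phi(x'b)
# with respect to x1 (which perturbs the index by beta[1] * h)
h = 1e-6
fd = (norm.cdf(xb + beta[1] * h) - norm.cdf(xb)) / h
print(np.allclose(mfx[0], fd, rtol=1e-4))  # True
```

Because φ(x′β) is a common scale factor, the ratio of any two marginal effects equals the ratio of the corresponding coefficients, a useful sanity check when reading probit output.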