Chapter 9
Maximum Likelihood Estimation

9.1 The Likelihood Function

The maximum likelihood estimator is the most widely used estimation method. This chapter discusses the most important concepts behind maximum likelihood estimation along with some examples.

Let the probability density function (pdf) of a random variable, y, conditional on a set of parameters, θ, be denoted by f(y | θ). This function identifies the data-generating process that underlies an observed sample and provides a mathematical description of the data that the process will produce. The joint density of n independent and identically distributed (i.i.d.) observations from this process is the product of the individual densities:

    f(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta) = L(\theta \mid y)    (9.1)

This joint density is the likelihood function, and it is defined as a function of the unknown parameter vector, θ, and the collection of sample data, y. Writing the parameters as a function of the data, the log of the likelihood function can be written as:

    \ln L(\theta \mid y) = \sum_{i=1}^{n} \ln f(y_i \mid \theta)    (9.2)

We can generalize the concept and allow the likelihood function to depend on other conditioning variables, x. Suppose that in the classical linear regression model the disturbance follows a normal distribution. Then, conditional on x_i, y_i is normally distributed with mean μ_i = x_i'β and variance σ². The log likelihood is:

    \ln L(\theta \mid y, X) = \sum_{i=1}^{n} \ln f(y_i \mid x_i, \theta)
                            = -\frac{1}{2} \sum_{i=1}^{n} \left[ \ln \sigma^2 + \ln(2\pi) + \frac{(y_i - x_i'\beta)^2}{\sigma^2} \right]    (9.3)

where X is the n × K matrix of data with the ith row equal to x_i'.
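The maximization of (9.3) can be done in closed form; the following derivation is not in the original text but is standard. Setting each derivative of the log likelihood to zero shows that the ML coefficient estimator coincides with OLS, while the ML variance estimator divides by n rather than n − K:

    \frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i (y_i - x_i'\beta) = 0
        \;\Longrightarrow\; \hat{\beta}_{\text{ML}} = (X'X)^{-1} X'y,

    \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - x_i'\beta)^2 = 0
        \;\Longrightarrow\; \hat{\sigma}^2_{\text{ML}} = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - x_i'\hat{\beta}_{\text{ML}} \bigr)^2.

This is why, in the Stata example of Section 9.3.1 below, the ML point estimates match those of reg exactly while the standard errors differ slightly.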

We say that the parameter vector θ is identified (estimable) if, for any other parameter vector θ* ≠ θ and for some data y, L(θ* | y) ≠ L(θ | y).

9.2 Properties of Maximum Likelihood Estimators

Maximum likelihood estimators (MLEs) are attractive mostly because of their large-sample, or asymptotic, properties. Given certain regularity conditions, the MLE is asymptotically efficient. That is, it is consistent, asymptotically normally distributed, and has an asymptotic covariance matrix that is not larger than the asymptotic covariance matrix of any other consistent, asymptotically normally distributed estimator.

9.3 Maximum Likelihood Estimators: Two examples

9.3.1 MLE in Stata: A linear regression model

Suppose we want to estimate the following model:

    \text{foreign}_i = \beta_0 + \beta_1 \text{mpg}_i + \beta_2 \text{weight}_i + \varepsilon_i    (9.4)

The Stata MLE program is

    capture program drop myols
    program myols
        version 11
        args lnf xb lnsigma
        local y "$ML_y1"
        quietly replace `lnf' = ln(normalden(`y', `xb', exp(`lnsigma')))
    end

To run this program we need to load the data and then type

    use http://www.stata-press.com/data/r11/auto
    ml model lf myols (xb: foreign = mpg weight) (lnsigma:)
    ml maximize
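An optional intermediate step, not shown in the original text: between ml model and ml maximize, the built-in diagnostic ml check can be run; it probes the evaluator program for common programming mistakes before optimization starts.

    ml check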

This yields the following regression output:

initial:       log likelihood = -79.001451
Iteration 5:   log likelihood = -29.838155

                                                  Number of obs   =         74
                                                  Wald chi2(2)    =      43.88
Log likelihood = -29.838155                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
         mpg |  -.0194295   .0124106    -1.57   0.117    -.0437539    .0048949
      weight |  -.0004678   .0000924    -5.06   0.000    -.0006488   -.0002867
       _cons |   2.123506   .5181488     4.10   0.000     1.107953    3.139059
-------------+----------------------------------------------------------------
lnsigma      |
       _cons |   -1.01572   .0821995   -12.36   0.000    -1.176828   -.8546122
------------------------------------------------------------------------------

We can check these results against the OLS estimates obtained with the command reg foreign mpg weight.

    reg foreign mpg weight

This yields the following regression output:

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   21.05
       Model |  5.75462023     2  2.87731012           Prob > F      =  0.0000
    Residual |  9.70483923    71  .136687876           R-squared     =  0.3722
-------------+------------------------------           Adj R-squared =  0.3546
       Total |  15.4594595    73  .211773417           Root MSE      =  .36971

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0194295   .0126701    -1.53   0.130     -.044693     .005834
      weight |  -.0004678   .0000943    -4.96   0.000    -.0006558   -.0002797
       _cons |   2.123506   .5289824     4.01   0.000     1.068745    3.178267
------------------------------------------------------------------------------

9.3.2 Binary choice models: Probit

A probit is a model to explain binary choice variables. The idea is that when the dependent variable is either Y = 1 or Y = 0 we should use binary choice models. For example, if a person chooses to buy a foreign car (foreign = 1), the probability of choosing this type of car can be modeled as a function of mpg and weight. In general,

    \text{Prob}(Y = 1 \mid x) = F(x, \beta)    (9.5)
    \text{Prob}(Y = 0 \mid x) = 1 - F(x, \beta)

The set of parameters β reflects the impact of changes in x on the probability. The problem at this point is to devise a suitable model for the right-hand side of the equation. The simplest approach is to retain the familiar linear regression,

    F(x, \beta) = x'\beta.    (9.6)

Because E[y | x] = F(x, β), we can construct the regression model

    y = E[y \mid x] + (y - E[y \mid x]) = x'\beta + \varepsilon.    (9.7)
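As a brief illustration (our addition, not in the original text), the linear probability model (9.7) can be fit by OLS on the same auto dataset; the sketch below also counts fitted "probabilities" that escape the unit interval, a shortcoming discussed next. The variable name phat is ours.

    use http://www.stata-press.com/data/r11/auto, clear
    regress foreign mpg weight
    predict phat, xb                  // fitted values = estimated probabilities
    count if phat < 0 | phat > 1      // predictions outside the unit interval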

This linear probability model has a number of shortcomings. The first complication is that ε is heteroscedastic. Because x'β + ε must equal either 0 or 1, ε equals either −x'β or 1 − x'β, with probabilities 1 − F(·) and F(·), respectively.¹ The second complication is that we cannot constrain x'β to lie in the 0–1 interval. This means that the model can generate probabilities outside [0, 1] and potentially negative variances.

Fig. 9.1 Model for a Probability, from [Greene (2008)]. [figure not reproduced]

As Figure 9.1 suggests, for a given regressor vector x, we would like

    \lim_{x'\beta \to +\infty} \text{Prob}(Y = 1 \mid x) = 1, \qquad
    \lim_{x'\beta \to -\infty} \text{Prob}(Y = 1 \mid x) = 0.    (9.8)

A continuous probability distribution defined over the real line should work. When the normal distribution is used, this gives rise to the probit model,

    \text{Prob}(Y = 1 \mid x) = \int_{-\infty}^{x'\beta} \phi(t)\, dt = \Phi(x'\beta),    (9.9)

where φ(·) and Φ(·) are the pdf and the cdf of the standard normal distribution, respectively.

¹ It can be shown that the variance is Var[ε | x] = x'β(1 − x'β).
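To connect (9.9) with the estimation step in the next subsection, note (a standard derivation, not spelled out in the original) that for an independent sample each observation contributes F(x_j'β) when y_j = 1 and 1 − F(x_j'β) when y_j = 0:

    L(\beta \mid y, X) = \prod_{j=1}^{n} \bigl[ F(x_j'\beta) \bigr]^{y_j} \bigl[ 1 - F(x_j'\beta) \bigr]^{1 - y_j},

    \ln L = \sum_{j=1}^{n} \Bigl\{ y_j \ln F(x_j'\beta) + (1 - y_j) \ln \bigl[ 1 - F(x_j'\beta) \bigr] \Bigr\}.

Setting F = Φ yields exactly the two-branch log likelihood coded below.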

9.3.3 Probit via step-by-step MLE in Stata

In a probit model the log likelihood function for Equation 9.5 is given by

    \ln L_j =
    \begin{cases}
        \ln \Phi(\theta_{1j})                 & \text{if } y_j = 1 \\
        \ln \bigl( 1 - \Phi(\theta_{1j}) \bigr) & \text{if } y_j = 0
    \end{cases}

where θ_{1j} = x_j'b_1. The probit program is

    capture program drop myprobit
    program myprobit
        version 11
        args lnf xb
        local y "$ML_y1"
        quietly replace `lnf' = ln(normal(`xb')) if `y' == 1
        quietly replace `lnf' = ln(1 - normal(`xb')) if `y' == 0
    end

To run this program, type

    use http://www.stata-press.com/data/r11/auto
    ml model lf myprobit (foreign = mpg weight)
    ml maximize

The probit regression output is

initial:       log likelihood = -51.292891
Iteration 5:   log likelihood = -26.844189

                                                  Number of obs   =         74
                                                  Wald chi2(2)    =      20.75
Log likelihood = -26.844189                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1039503   .0515689    -2.02   0.044    -.2050235   -.0028772
      weight |  -.0023355   .0005661    -4.13   0.000     -.003445   -.0012261
       _cons |   8.275464   2.554142     3.24   0.001     3.269437    13.28149
------------------------------------------------------------------------------

You can check these results against Stata's built-in procedure for probit estimation using the command

    probit foreign mpg weight

The probit regression output is

Iteration 0:   log likelihood =  -45.03321
Iteration 5:   log likelihood = -26.844189

Probit regression                                 Number of obs   =         74
                                                  LR chi2(2)      =      36.38
                                                  Prob > chi2     =     0.0000
Log likelihood = -26.844189                       Pseudo R2       =     0.4039

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1039503   .0515689    -2.02   0.044    -.2050235   -.0028772
      weight |  -.0023355   .0005661    -4.13   0.000     -.003445   -.0012261
       _cons |   8.275464   2.554142     3.24   0.001     3.269437    13.28149
------------------------------------------------------------------------------
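As a convenience (our addition; estimates store and estimates table are standard Stata commands, and the names handroll and builtin are ours), the two fits can be placed side by side:

    ml model lf myprobit (foreign = mpg weight)
    ml maximize
    estimates store handroll
    probit foreign mpg weight
    estimates store builtin
    estimates table handroll builtin, se

Coefficients and standard errors agree; only the headline chi-squared statistics differ, because the hand-rolled ml output reports a Wald test while the built-in probit reports a likelihood-ratio test, as the two outputs above show.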

The marginal effects in the probit model are given by

    \frac{\partial E[y \mid x]}{\partial x} = \phi(x'\beta)\, \beta.    (9.10)

In Stata these can be computed using

    mfx compute

The output is

Marginal effects after probit
      y  = Pr(foreign) (predict)
         =  .16096991
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     mpg |  -.0253924      .0128    -1.98   0.047  -.050478 -.000307   21.2973
  weight |  -.0005705      .00013   -4.25   0.000  -.000833 -.000308   3019.46
------------------------------------------------------------------------------

This command evaluates x'β in Equation 9.10 at the sample means of the regressors. As a final comment, MLEs can be used in a large number of cases. Some examples include latent regression models, SUR, simultaneous equations models, nonlinear regression models, stochastic frontier models, count data models, dynamic discrete choice models, GARCH models, etc.
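As a closing check (a minimal sketch, our addition), the mpg entry of the mfx table can be reproduced by hand from Equation 9.10 by evaluating the index at the sample means; summarize, _b[], normalden(), and display are standard Stata, while the local macro names are ours.

    quietly probit foreign mpg weight
    quietly summarize mpg
    local mbar = r(mean)
    quietly summarize weight
    local wbar = r(mean)
    * index x'b evaluated at the sample means
    local xb = _b[_cons] + _b[mpg]*`mbar' + _b[weight]*`wbar'
    * phi(x'b) * b_mpg, the marginal effect of mpg at the means
    display normalden(`xb') * _b[mpg]

This displays approximately -.0253924, matching the dy/dx column above.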