Maximum Likelihood (ML) Estimation


Econometrics 2, Fall 2004
Maximum Likelihood (ML) Estimation
Heino Bohn Nielsen

Outline of the Lecture

(1) Introduction.
(2) ML estimation defined.
(3) Example I: Binomial trials.
(4) Example II: Linear regression.
(5) Pseudo-ML estimation.
(6) Test principles: the Wald test, the likelihood ratio (LR) test, and the Lagrange multiplier (LM) or score test.

Introduction: GMM and ML

Consider stochastic variables $y_t$ ($1 \times 1$) and $x_t$ ($K \times 1$) and the observations $(y_t, x_t)$ for $t = 1, 2, \dots, T$. Assume that we have a conditional model for $y_t \mid x_t$ in mind.

GMM derives moment conditions $E[f(y_t, x_t, \theta)] = 0$. The sample moments $g(\theta) = T^{-1} \sum_{t=1}^{T} f(y_t, x_t, \theta)$ link the conditions to the data. The GMM estimator, $\hat\theta_{GMM}$, minimizes $g(\theta)'\, W_T\, g(\theta)$.

Maximum likelihood (ML) assumes knowledge of the distribution of the data, i.e. the density $f(y_t \mid x_t; \theta)$, up to the unknown parameters $\theta$. The ML estimator, $\hat\theta_{ML}$, maximizes the probability of the observed data under the assumed density.

The Likelihood Principle

The probability of observing $y_t$ given $x_t$ is given by the conditional density $f(y_t \mid x_t; \theta)$. If the observations are i.i.d., the probability of $y_1, \dots, y_T$ (given $x_1, \dots, x_T$) is

$$f(y_1, y_2, \dots, y_T \mid x_1, \dots, x_T; \theta) = \prod_{t=1}^{T} f(y_t \mid x_t; \theta).$$

The likelihood function for the sample is defined as

$$L(\theta) = \prod_{t=1}^{T} f(y_t \mid x_t; \theta) = \prod_{t=1}^{T} L_t(\theta),$$

where $L_t(\theta)$ is called the likelihood contribution for observation $t$. The ML estimator $\hat\theta$ is chosen to maximize $L(\theta)$.

The ML Estimator

It is easier to maximize the log-likelihood function

$$\log L(\theta) = \sum_{t=1}^{T} \log L_t(\theta) = \sum_{t=1}^{T} \log f(y_t \mid x_t; \theta).$$

Define the score vector ($K \times 1$) as

$$s(\theta) = \frac{\partial \log L(\theta)}{\partial \theta} = \sum_{t=1}^{T} \frac{\partial \log L_t(\theta)}{\partial \theta} = \sum_{t=1}^{T} s_t(\theta),$$

where $s_t(\theta)$ is the score of each observation. The first order conditions, the so-called likelihood equations, state that

$$s(\hat\theta) = \sum_{t=1}^{T} s_t(\hat\theta) = 0.$$

These might be difficult to solve in practice; use numerical optimization (see the sketch below).

Properties of ML

Given correct specification (and weak regularity conditions):

- The ML estimator is consistent, $\operatorname{plim} \hat\theta = \theta_0$.
- The ML estimator is asymptotically normal, $\sqrt{T}(\hat\theta - \theta_0) \to N(0, I(\theta_0)^{-1})$, where $I(\theta_0) = -E\!\left[ T^{-1}\, \partial^2 \log L(\theta) / \partial\theta\, \partial\theta' \right]$. The negative expected Hessian, $I(\theta_0)$, is called the information matrix: the more curvature of the likelihood function, the more precision.
- The ML estimator is asymptotically efficient. All other consistent and asymptotically normal estimators have an asymptotic variance at least as large as $I(\theta_0)^{-1}$, which is denoted the Cramér–Rao lower bound.
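As a minimal sketch of the numerical approach (not part of the lecture; it assumes NumPy and SciPy are available and uses an illustrative exponential model $f(y; \lambda) = \lambda e^{-\lambda y}$ with simulated data):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    y = rng.exponential(scale=2.0, size=500)   # illustrative sample, T = 500

    def neg_loglik(theta):
        lam = theta[0]
        if lam <= 0:
            return np.inf                      # keep the optimizer in the admissible region
        return -np.sum(np.log(lam) - lam * y)  # minus log L(lambda)

    res = minimize(neg_loglik, x0=[1.0], method="Nelder-Mead")
    lam_hat = res.x[0]                         # numerically close to 1 / y.mean()

For this particular model the likelihood equation has the closed-form solution $\hat\lambda = 1/\bar y$, which provides a check on the numerical answer; in richer models no closed form exists and the numerical step is essential.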

Asymptotic Inference

Asymptotic inference can be based on

$$\hat\theta \overset{a}{\sim} N\!\left( \theta_0,\; T^{-1} \hat I^{-1} \right),$$

where $\hat I$ is a consistent estimate of $I(\theta_0)$. One possibility is minus the sample average of the second derivatives,

$$\hat I_H = -\frac{1}{T} \sum_{t=1}^{T} \left. \frac{\partial^2 \log L_t(\theta)}{\partial\theta\, \partial\theta'} \right|_{\theta = \hat\theta}.$$

It can be shown that $E[s_t(\theta_0)\, s_t(\theta_0)'] = I(\theta_0)$. An alternative estimator is therefore the outer product of the scores,

$$\hat I_{OP} = \frac{1}{T} \sum_{t=1}^{T} s_t(\hat\theta)\, s_t(\hat\theta)'.$$

Joint and Conditional Distributions

Above we considered a model for $y_t \mid x_t$, corresponding to the conditional distribution $f(y_t \mid x_t; \theta)$. From a statistical point of view a natural alternative would be a model for all the data, $z_t = (y_t, x_t')'$, corresponding to the joint density $f(z_t; \phi)$. Recall the factorization

$$f(y_t, x_t; \phi) = f(y_t \mid x_t; \theta) \cdot f(x_t; \psi).$$

If the two sets of parameters, $\theta$ and $\psi$, are not related (and $x_t$ is exogenous in a certain sense), we can estimate $\theta$ from the conditional model $f(y_t \mid x_t; \theta)$ and ignore the marginal model $f(x_t; \psi)$.
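Both information estimates are easy to compute once the scores are available. A hedged continuation of the exponential sketch above (the data are regenerated so the block runs on its own; for this model $s_t(\lambda) = 1/\lambda - y_t$ and the second derivative is $-1/\lambda^2$):

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.exponential(scale=2.0, size=500)   # same illustrative data as above
    T = y.size
    lam_hat = 1.0 / y.mean()                   # closed-form ML estimate

    scores = 1.0 / lam_hat - y                 # s_t evaluated at lam_hat
    I_op = np.mean(scores**2)                  # outer product of the scores
    I_h = 1.0 / lam_hat**2                     # minus the average second derivative
    se_op = np.sqrt(1.0 / (T * I_op))          # two asymptotically equivalent
    se_h = np.sqrt(1.0 / (T * I_h))            # standard errors for lam_hat

The two standard errors differ in finite samples but converge to the same limit under correct specification.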

Time Series (Non-i.i.d.) Data and Factorization

The multiplicative form

$$f(y_1, y_2, \dots, y_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta)$$

follows from the i.i.d. assumption, which cannot be made for many time series. For a time series, the object of interest is often $E[y_t \mid y_{t-1}, \dots, y_1]$. This can be used to factorize the likelihood function. Recall again that

$$f(y_1, y_2; \theta) = f(y_2 \mid y_1; \theta)\, f(y_1; \theta).$$

Repeating the argument for the whole sample,

$$f(y_1, y_2, \dots, y_T; \theta) = f(y_1; \theta) \prod_{t=2}^{T} f(y_t \mid y_{t-1}, \dots, y_1; \theta).$$

Conditioning on the first observation, $y_1$, gives a multiplicative structure:

$$f(y_2, \dots, y_T \mid y_1; \theta) = \frac{f(y_1, \dots, y_T; \theta)}{f(y_1; \theta)} = \prod_{t=2}^{T} f(y_t \mid y_{t-1}, \dots, y_1; \theta).$$

Example I: Binomial Trials

Consider $T$ random draws from a pool of red and blue balls. We are interested in the proportion of red balls, $p$. Let

$$y_t = \begin{cases} 1 & \text{if ball } t \text{ is red} \\ 0 & \text{if ball } t \text{ is blue.} \end{cases}$$

Consider a data set of $T$ draws, $y_1, y_2, \dots, y_T$, e.g. $1\;0\;0\;1\;0\;1$. The model implies that $\operatorname{prob}(y_t = 1) = p$ and $\operatorname{prob}(y_t = 0) = 1 - p$. The density function for $y_t$ is given by the binomial

$$f(y_t; p) = p^{y_t} (1 - p)^{(1 - y_t)}.$$

Since the draws are independent, the likelihood function is given by

$$L(p) = \prod_{t=1}^{T} f(y_t; p) = \prod_{t=1}^{T} p^{y_t} (1 - p)^{(1 - y_t)},$$

and the log-likelihood function is

$$\log L(p) = \sum_{t=1}^{T} \left[\, y_t \log(p) + (1 - y_t) \log(1 - p) \,\right].$$

The ML estimator, $\hat p$, is chosen to maximize this expression. The score for an individual observation is given by

$$s_t(p) = \frac{\partial \log L_t(p)}{\partial p} = \frac{y_t}{p} - \frac{1 - y_t}{1 - p}.$$

The likelihood equation is given by the first order condition

$$s(p) = \sum_{t=1}^{T} s_t(p) = \sum_{t=1}^{T} \left( \frac{y_t}{p} - \frac{1 - y_t}{1 - p} \right) = 0.$$

This implies that

$$(1 - p) \sum_{t=1}^{T} y_t = p \sum_{t=1}^{T} (1 - y_t) \quad\Longleftrightarrow\quad \sum_{t=1}^{T} y_t = p\, T,$$

so that

$$\hat p = \frac{1}{T} \sum_{t=1}^{T} y_t.$$

The second derivative is

$$\frac{\partial^2 \log L_t(p)}{\partial p^2} = -\frac{y_t}{p^2} - \frac{1 - y_t}{(1 - p)^2}.$$

Using that $E[y_t] = p$, the information is given by

$$I(p) = -E\!\left[ \frac{\partial^2 \log L_t(p)}{\partial p^2} \right] = \frac{p}{p^2} + \frac{1 - p}{(1 - p)^2} = \frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1 - p)}.$$

Inference on $p$ can be based on

$$\hat p \overset{a}{\sim} N\!\left( p,\; \frac{\hat p (1 - \hat p)}{T} \right).$$

ML estimation can easily be done in PcGive using numerical optimization. Specify:

actual: the observed data series.
fitted: the fitted values; nothing in this case.
loglik: the likelihood contribution, $\log L_t(p) = y_t \log(p) + (1 - y_t) \log(1 - p)$.

Initial values for the parameters, denoted &0, &1, ..., &k, must also be supplied. In our case (where the data series is denoted Bin):

    actual = Bin;
    fitted = 0;
    loglik = actual*log(&0)+(1-actual)*log(1-&0);
    &0 = 0.5;
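For readers without PcGive, the same numerical ML estimation can be sketched in Python (an illustration assuming SciPy is available; the data are the six example draws from the slide):

    import numpy as np
    from scipy.optimize import minimize_scalar

    y = np.array([1, 0, 0, 1, 0, 1])            # the example draws
    T = y.size

    def neg_loglik(p):
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    p_hat = res.x                                # equals the closed form y.mean() = 0.5
    se = np.sqrt(p_hat * (1 - p_hat) / T)        # asymptotic standard error of p_hat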

Example II: Linear Regression

Consider a linear regression model

$$y_t = x_t'\beta + \epsilon_t, \qquad t = 1, 2, \dots, T.$$

Make the following assumptions:

(A) $E[\epsilon_t \mid x_t] = 0$.
(B) $E[\epsilon_t^2 \mid x_t] = \sigma^2$.

(A) is a moment condition; the model is the conditional expectation. (B) excludes heteroscedasticity and autocorrelation.

For ML estimation, we have to specify a distribution for $\epsilon_t$. Assume that

$$\epsilon_t \mid x_t \sim N(0, \sigma^2).$$

This implies that

$$y_t \mid x_t \sim N(x_t'\beta,\, \sigma^2).$$

The probability of observing $y_t$ given $x_t$ under the model is the density

$$f(y_t \mid x_t; \beta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y_t - x_t'\beta)^2}{2\sigma^2} \right).$$

Since the observations are assumed i.i.d., the probability of $y_1, \dots, y_T$ is

$$f(y_1, \dots, y_T \mid x_1, \dots, x_T; \beta, \sigma^2) = \prod_{t=1}^{T} f(y_t \mid x_t; \beta, \sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{T/2} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - x_t'\beta)^2 \right).$$

The likelihood function is given by

$$L(\beta, \sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{T/2} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - x_t'\beta)^2 \right).$$

The log-likelihood function is

$$\log L(\beta, \sigma^2) = -\frac{T}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - x_t'\beta)^2.$$

The ML estimators $\hat\beta$ and $\hat\sigma^2$ are chosen to maximize this expression.

The likelihood contributions are

$$\log L_t(\beta, \sigma^2) = -\frac{1}{2} \log(2\pi\sigma^2) - \frac{(y_t - x_t'\beta)^2}{2\sigma^2}.$$

The individual scores ($(K+1) \times 1$) are given by the first derivatives,

$$s_t(\beta, \sigma^2) = \begin{pmatrix} \partial \log L_t(\beta, \sigma^2)/\partial\beta \\[4pt] \partial \log L_t(\beta, \sigma^2)/\partial\sigma^2 \end{pmatrix} = \begin{pmatrix} \sigma^{-2}\, x_t (y_t - x_t'\beta) \\[4pt] -\dfrac{1}{2\sigma^2} + \dfrac{(y_t - x_t'\beta)^2}{2\sigma^4} \end{pmatrix}.$$

The first order conditions are given by

$$s(\beta, \sigma^2) = \sum_{t=1}^{T} s_t(\beta, \sigma^2) = \begin{pmatrix} \sigma^{-2} \sum_{t=1}^{T} x_t (y_t - x_t'\beta) \\[4pt] -\dfrac{T}{2\sigma^2} + \dfrac{1}{2\sigma^4} \sum_{t=1}^{T} (y_t - x_t'\beta)^2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

The first condition,

$$\sum_{t=1}^{T} x_t (y_t - x_t'\hat\beta) = 0,$$

implies

$$\hat\beta = \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \sum_{t=1}^{T} x_t y_t,$$

i.e. the OLS estimator. Letting $\hat\epsilon_t = y_t - x_t'\hat\beta$, the second condition yields

$$-\frac{T}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4} \sum_{t=1}^{T} \hat\epsilon_t^2 = 0 \quad\Longleftrightarrow\quad \hat\sigma^2 = \frac{1}{T} \sum_{t=1}^{T} \hat\epsilon_t^2,$$

which is slightly different from the OLS variance estimator (the latter divides by $T - K$ rather than $T$). The ML estimator of the variance is biased but consistent.

The second derivatives are given by

$$\frac{\partial^2 \log L_t(\beta, \sigma^2)}{\partial\beta\, \partial\beta'} = -\frac{x_t x_t'}{\sigma^2}, \qquad \frac{\partial^2 \log L_t(\beta, \sigma^2)}{\partial\beta\, \partial\sigma^2} = -\frac{x_t (y_t - x_t'\beta)}{\sigma^4}, \qquad \frac{\partial^2 \log L_t(\beta, \sigma^2)}{\partial(\sigma^2)^2} = \frac{1}{2\sigma^4} - \frac{(y_t - x_t'\beta)^2}{\sigma^6}.$$

Taking expectations (using $E[x_t \epsilon_t] = 0$ and $E[\epsilon_t^2] = \sigma^2$) gives the information matrix

$$I(\beta, \sigma^2) = \begin{pmatrix} \sigma^{-2} E[x_t x_t'] & 0 \\ 0 & \dfrac{1}{2\sigma^4} \end{pmatrix}.$$

Note that the information matrix is block diagonal.
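A small simulation sketch (illustrative names and data, assuming NumPy) confirms that ML reproduces OLS for $\beta$ while dividing the residual sum of squares by $T$ rather than $T - K$:

    import numpy as np

    rng = np.random.default_rng(1)
    T, K = 100, 2
    X = np.column_stack([np.ones(T), rng.normal(size=T)])  # constant plus one regressor
    y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.8, size=T)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # ML estimate of beta (= OLS)
    resid = y - X @ beta_hat
    sigma2_ml = resid @ resid / T                  # ML variance: biased but consistent
    sigma2_ols = resid @ resid / (T - K)           # OLS variance: unbiased
    V_beta = sigma2_ml * np.linalg.inv(X.T @ X)    # estimated variance of beta_hat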

Recall that

$$\begin{pmatrix} \hat\beta \\ \hat\sigma^2 \end{pmatrix} \overset{a}{\sim} N\!\left( \begin{pmatrix} \beta \\ \sigma^2 \end{pmatrix},\; T^{-1} I(\beta, \sigma^2)^{-1} \right).$$

This implies

$$\hat\beta \overset{a}{\sim} N\!\left( \beta,\; T^{-1} \sigma^2 \left( E[x_t x_t'] \right)^{-1} \right),$$

where the variance can be estimated by

$$\widehat{V}(\hat\beta) = \hat\sigma^2 \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1}.$$

Furthermore,

$$\hat\sigma^2 \overset{a}{\sim} N\!\left( \sigma^2,\; \frac{2\sigma^4}{T} \right).$$

Pseudo-ML Estimation

The first order conditions for ML estimation can be seen as a sample counterpart to a moment condition:

$$\frac{1}{T} \sum_{t=1}^{T} s_t(\theta) = 0 \quad\text{corresponds to}\quad E[s_t(\theta)] = 0,$$

and ML becomes a special case of GMM. Consequently, $\hat\theta$ is consistent under weaker assumptions than those maintained by ML. The FOC for a normal regression model corresponds to

$$E[x_t (y_t - x_t'\beta)] = 0,$$

which is weaker than the assumption that the entire distribution is correctly specified: OLS is consistent even if $\epsilon_t$ is not normal. An ML estimation that maximizes a likelihood function different from the true model's likelihood is referred to as a pseudo-ML or a quasi-ML estimator.

Three Classical Test Principles

Consider a null hypothesis of interest, $H_0$, and an alternative, $H_A$; e.g. $H_0: r(\theta) = 0$ against $H_A: r(\theta) \neq 0$, where $r(\cdot)$ imposes $R$ restrictions. Let $\tilde\theta$ and $\hat\theta$ denote the ML estimates under $H_0$ and $H_A$, respectively.

- Wald test. Estimate the model only under $H_A$, and look at the distance $r(\hat\theta) - 0$ normalized by the covariance matrix.
- Likelihood ratio (LR) test. Estimate under $H_0$ and under $H_A$, and look at the loss in likelihood, $\log L(\hat\theta) - \log L(\tilde\theta)$.
- Lagrange multiplier (LM) or score test. Estimate $\tilde\theta$ under $H_0$ and see if the first order conditions $\sum_t s_t(\tilde\theta) = 0$ are significantly violated.

The tests are asymptotically equivalent. For the normal regression model, $\xi_W \geq \xi_{LR} \geq \xi_{LM}$ in finite samples.

[Figure: the log-likelihood function $\log L(\theta)$, with the restricted and unrestricted estimates marked, illustrating the distances measured by the W, LR, and LM tests.]

Wald Test

Recall that $\hat\theta \overset{a}{\sim} N(\theta_0,\, \widehat{V}(\hat\theta))$, so that $r(\hat\theta) \overset{a}{\sim} N(0,\, \widehat{V}(r(\hat\theta)))$ if the null hypothesis $r(\theta) = 0$ is true, and a natural test statistic is

$$\xi_W = r(\hat\theta)'\, \widehat{V}(r(\hat\theta))^{-1}\, r(\hat\theta).$$

Under the null this is distributed as $\chi^2(R)$. An example is the $t$-ratio for $H_0: \theta_i = 0$,

$$t = \frac{\hat\theta_i}{\sqrt{\widehat{V}(\hat\theta_i)}} \overset{a}{\sim} N(0, 1).$$

The Wald test requires only estimation under the alternative, $H_A$.

Likelihood Ratio (LR) Test

For the LR test we estimate both under $H_0$ and under $H_A$. The LR test statistic is given by

$$\xi_{LR} = -2 \log\!\left( \frac{L(\tilde\theta)}{L(\hat\theta)} \right) = -2 \left( \log L(\tilde\theta) - \log L(\hat\theta) \right),$$

where $L(\tilde\theta)$ and $L(\hat\theta)$ are the two likelihood values. Under the null, this is asymptotically distributed as $\chi^2(R)$. The test is insensitive to how the model and restrictions are formulated. The test is only appropriate when the models are nested.
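A hedged sketch of both statistics for the simulated regression above, testing $H_0: \beta_1 = 0$ (one restriction, so both statistics are $\chi^2(1)$ under the null; the log-likelihood is concentrated with respect to $\sigma^2$ using $\hat\sigma^2 = T^{-1} \sum_t \hat\epsilon_t^2$):

    import numpy as np

    rng = np.random.default_rng(1)
    T = 100
    X = np.column_stack([np.ones(T), rng.normal(size=T)])
    y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.8, size=T)

    def loglik(b):
        e = y - X @ b
        s2 = e @ e / T                            # concentrated ML variance
        return -0.5 * T * (np.log(2 * np.pi * s2) + 1)

    b_u = np.linalg.solve(X.T @ X, X.T @ y)       # unrestricted ML (= OLS)
    e_u = y - X @ b_u
    V = (e_u @ e_u / T) * np.linalg.inv(X.T @ X)  # estimated variance of b_u
    W = b_u[1]**2 / V[1, 1]                       # Wald statistic (squared t-ratio)

    b_r = np.array([y.mean(), 0.0])               # restricted ML: constant only
    LR = -2 * (loglik(b_r) - loglik(b_u))         # LR statistic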

Lagrange Multiplier (LM) or Score Test

Recall that the average score is zero at the unrestricted estimate,

$$s(\hat\theta) = \sum_{t=1}^{T} s_t(\hat\theta) = 0.$$

If the restriction is true, the same first order conditions should also be approximately satisfied at the restricted estimate,

$$s(\tilde\theta) = \sum_{t=1}^{T} s_t(\tilde\theta) \approx 0.$$

This can be tested by the quadratic form

$$\xi_{LM} = \left( \sum_{t=1}^{T} s_t(\tilde\theta) \right)' \left[ \operatorname{Variance}\!\left( \sum_{t=1}^{T} s_t(\tilde\theta) \right) \right]^{-1} \left( \sum_{t=1}^{T} s_t(\tilde\theta) \right),$$

which under the null is $\chi^2(R)$.

The variance of the individual score, $s_t(\theta)$, is the information matrix,

$$I(\theta) = E[s_t(\theta)\, s_t(\theta)'],$$

which can be estimated as

$$\hat I(\tilde\theta) = \frac{1}{T} \sum_{t=1}^{T} s_t(\tilde\theta)\, s_t(\tilde\theta)'.$$

The estimated variance of $\sum_{t=1}^{T} s_t(\tilde\theta)$ is therefore

$$T\, \hat I(\tilde\theta) = \sum_{t=1}^{T} s_t(\tilde\theta)\, s_t(\tilde\theta)',$$

and the LM test can be written as

$$\xi_{LM} = \left( \sum_{t=1}^{T} s_t(\tilde\theta) \right)' \left( \sum_{t=1}^{T} s_t(\tilde\theta)\, s_t(\tilde\theta)' \right)^{-1} \left( \sum_{t=1}^{T} s_t(\tilde\theta) \right),$$

which will have a $\chi^2(R)$ distribution under the null.
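As a small illustration (again a sketch, not from the lecture), the LM statistic for the binomial example under the illustrative null $H_0: p = 0.3$ can be computed directly from the unrestricted scores evaluated at the restricted value:

    import numpy as np

    y = np.array([1, 0, 0, 1, 0, 1])              # the example draws
    p0 = 0.3                                      # restricted (null) value; illustrative
    scores = y / p0 - (1 - y) / (1 - p0)          # s_t(p0), the scores at the null
    LM = scores.sum()**2 / np.sum(scores**2)      # quadratic form; chi^2(1) under H0

Note that no unrestricted estimation is needed: only the scores evaluated at the restricted estimate enter the statistic.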

Note that the quadratic form is of dimension $K$, but $K - R$ of the elements are unrestricted, and $\sum_t s_t(\tilde\theta) = 0$ holds for those elements. Hence the statistic is a $\chi^2(R)$.

In the case where the null hypothesis is $H_0: \theta_2 = 0$, for a partitioning

$$\theta = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \quad\text{and}\quad s_t(\theta) = \begin{pmatrix} s_{1t}(\theta) \\ s_{2t}(\theta) \end{pmatrix},$$

the LM statistic only depends on the scores corresponding to $\theta_2$:

$$\xi_{LM} = \left( \sum_{t=1}^{T} s_{2t}(\tilde\theta) \right)' \left[ \operatorname{Variance}\!\left( \sum_{t=1}^{T} s_{2t}(\tilde\theta) \right) \right]^{-1} \left( \sum_{t=1}^{T} s_{2t}(\tilde\theta) \right).$$

Note, however, that to calculate the inverse variance we need the full score vector, except if the covariance matrix is block diagonal.

LM Tests by Auxiliary Regressions

Define the $T \times K$ matrix of scores and the $T \times 1$ vector of ones,

$$S = \begin{pmatrix} s_1(\tilde\theta)' \\ \vdots \\ s_T(\tilde\theta)' \end{pmatrix} \quad\text{and}\quad \iota = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}.$$

Then

$$S'\iota = \sum_{t=1}^{T} s_t(\tilde\theta) \quad (K \times 1) \qquad\text{and}\qquad S'S = \sum_{t=1}^{T} s_t(\tilde\theta)\, s_t(\tilde\theta)' \quad (K \times K).$$

Therefore, we can write the LM statistic as

$$\xi_{LM} = \left( \sum_{t=1}^{T} s_t(\tilde\theta) \right)' \left( \sum_{t=1}^{T} s_t(\tilde\theta)\, s_t(\tilde\theta)' \right)^{-1} \left( \sum_{t=1}^{T} s_t(\tilde\theta) \right) = \iota' S (S'S)^{-1} S' \iota.$$

Now consider the regression model

$$\iota = S\gamma + \text{residual}. \qquad (*)$$

The OLS estimator and the predicted values are given by, respectively,

$$\hat\gamma = (S'S)^{-1} S'\iota \quad\text{and}\quad \hat\iota = S\hat\gamma = S(S'S)^{-1} S'\iota.$$

The LM test can then be written as

$$\xi_{LM} = \iota' S (S'S)^{-1} S'\iota = \hat\iota'\hat\iota = T \cdot \frac{\hat\iota'\hat\iota}{\iota'\iota} = T \cdot \frac{ESS}{TSS} = T \cdot R^2,$$

using that $\iota'\iota = T$, where $R^2$ is the uncentered coefficient of determination from the regression $(*)$. The regression $(*)$ is not always the most convenient, however; often, alternative auxiliary regressions are used.

Examples of LM tests in a linear regression, based on $T \cdot R^2$ from auxiliary regressions (a sketch of the last one follows below):

- Test for omitted variables, $z_t$. Run the regression $\hat\epsilon_t = x_t'\delta + z_t'\gamma + \text{residual}$.
- Breusch–Godfrey test for no first order autocorrelation. Run the regression $\hat\epsilon_t = \rho\, \hat\epsilon_{t-1} + x_t'\delta + \text{residual}$.
- Breusch–Pagan test for no heteroscedasticity. Run the regression $\hat\epsilon_t^2 = z_t'\gamma + \text{residual}$.
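A minimal sketch of the Breusch–Pagan version (simulated, illustrative data, assuming NumPy), computing the LM statistic as $T \cdot R^2$ from the auxiliary regression of the squared residuals:

    import numpy as np

    rng = np.random.default_rng(2)
    T = 200
    x = rng.normal(size=T)
    X = np.column_stack([np.ones(T), x])
    y = 1.0 + 0.5 * x + rng.normal(size=T) * np.exp(0.3 * x)   # heteroscedastic errors

    b = np.linalg.solve(X.T @ X, X.T @ y)        # OLS estimation under the null
    e2 = (y - X @ b) ** 2                        # squared residuals
    g = np.linalg.solve(X.T @ X, X.T @ e2)       # auxiliary regression of e^2 on (1, x)
    fit = X @ g
    R2 = np.sum((fit - e2.mean())**2) / np.sum((e2 - e2.mean())**2)
    LM = T * R2                                   # chi^2(1) under homoscedasticity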