Setting the RHS to zero,

    0 = d_θ(θ*) + [∂²L(θ*)/∂θ∂θ′](θ − θ*),

so that

    θ − θ* = −[∂²L(θ*)/∂θ∂θ′]⁻¹ d_θ(θ*) = −H_θθ(θ*)⁻¹ d_θ(θ*),

where H_θθ is the Hessian matrix, d_θ is the derivative (score) vector, and θ is the solution of this set of equations.

[Figure 1: The Newton-Raphson Algorithm. Successive iterates θ₁, θ₂, θ₃ converging to the solution of d_θ(θ) = 0.]

Rewriting the above,

    θ = θ* − H_θθ(θ*)⁻¹ d_θ(θ*).

This turns out to be an updating formula for the estimate of θ. Notice that if θ* already solves d_θ(θ) = 0, then d_θ(θ*) = 0 and thus θ = θ*. This suggests that if θ ≠ θ*, i.e., if d_θ(θ*) ≠ 0, we need to iterate the formula,

    θ** = θ* − H_θθ(θ*)⁻¹ d_θ(θ*),

and the updating sequence does not terminate until d_θ ≈ 0. More generally,

    θ̂^(n) = θ̂^(n−1) − H_θθ(θ̂^(n−1))⁻¹ d_θ(θ̂^(n−1)),

where θ̂^(n) is the estimate of θ at the end of the nth iteration. (In our notation, θ̂^(0) = θ*, θ̂^(1) = θ**, θ̂^(2) = θ***.)
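To make the updating formula concrete, here is a minimal Newton-Raphson sketch in Python (NumPy only). The Poisson example, the simulated data, and the helper name `newton_raphson` are hypothetical illustrations, not part of the notes; the example is chosen because the score and Hessian have simple closed forms, and the MLE of a Poisson rate is just the sample mean, which the iteration should recover.

```python
import numpy as np

def newton_raphson(theta, score, hessian, tol=1e-10, max_iter=100):
    """Iterate theta_new = theta - H(theta)^{-1} d(theta) until the score is ~ 0."""
    for _ in range(max_iter):
        d = score(theta)
        if np.max(np.abs(d)) < tol:
            break
        theta = theta - np.linalg.solve(hessian(theta), d)
    return theta

# Hypothetical data: iid Poisson(lam); log L(lam) = sum(y) log(lam) - T lam + const.
rng = np.random.default_rng(0)
y = rng.poisson(3.0, size=200)
score = lambda lam: np.atleast_1d(y.sum() / lam[0] - len(y))   # d_theta
hessian = lambda lam: np.atleast_2d(-y.sum() / lam[0] ** 2)    # H_theta_theta

lam_hat = newton_raphson(np.array([1.0]), score, hessian)
print(lam_hat, y.mean())   # the iteration converges to the sample mean, the MLE
```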

2. Scoring

This method replaces the Hessian matrix with the Fisher information matrix. Denote the information matrix for sample size T by F_θθ,T. Because F_θθ,T = −E(∂²L/∂θ∂θ′), this suggests replacing −H_θθ by F_θθ,T. In other words, the updating formula becomes

    θ̂^(n) = θ̂^(n−1) + F_θθ,T(θ̂^(n−1))⁻¹ d_θ(θ̂^(n−1)).

Why should we do this? Sometimes F_θθ,T is easier to compute than the Hessian matrix, because there are usually fewer elements to compute (for example, F_βσ²,T(θ) = 0 in the previous regression model), and because it uses statistical information about the problem we study.

There is a variant of scoring that deserves mention; it is what LIMDEP, an econometric software package, implements. Note that F_θθ,T = T·E[(∂L/∂θ)(∂L/∂θ′)], so with the LLN we would expect

    (1/T) Σ_t [∂ℓ_t(θ̂^(n−1))/∂θ][∂ℓ_t(θ̂^(n−1))/∂θ′] ≡ F̂_θθ,T(θ̂^(n−1)) →p F_θθ(θ),

where ℓ_t = ln f_t. (We divide by T, rather than multiply by T, because the log-likelihood has already been multiplied by 1/T.) This computational proposal simply shows that econometrics is not just computer science: taking advantage of the statistical information at hand makes the problem easier to solve.

Now we look at an important case, the nonlinear regression model

    y_t = g(x_t, β) + u_t,  u_t ~ nid(0, σ²),

where the x_t are assumed to be fixed. The (average) log-likelihood is

    L(y₁, …, y_T; θ) = −(1/2) log 2π − (1/2) log σ² − (1/(2Tσ²)) Σ_t (y_t − g(x_t, β))².

A bit of calculation leads to

    ∂L/∂β = (1/(Tσ²)) Σ_t (y_t − g(x_t, β)) ∂g(x_t, β)/∂β,
    ∂L/∂σ² = −1/(2σ²) + (1/(2Tσ⁴)) Σ_t (y_t − g(x_t, β))²,

    ∂²L/∂β∂β′ = −(1/(Tσ²)) Σ_t [∂g(x_t, β)/∂β][∂g(x_t, β)/∂β′] + (1/(Tσ²)) Σ_t (y_t − g(x_t, β)) ∂²g(x_t, β)/∂β∂β′,
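Because the outer-product variant needs only first derivatives, it is especially easy to code. Below is a minimal BHHH-style scoring sketch, again for a hypothetical iid Poisson(λ) sample and following the average-score convention of these notes: the update is λ_new = λ + F̂⁻¹ d_θ, with F̂ the averaged outer product of the per-observation scores.

```python
import numpy as np

# Hypothetical data: iid Poisson(lam); per-observation score is y_t/lam - 1.
rng = np.random.default_rng(1)
y = rng.poisson(3.0, size=500)

lam = 1.0
for _ in range(100):
    s_t = y / lam - 1.0          # per-observation scores, d l_t / d lam
    d = s_t.mean()               # average score d_theta
    F_hat = (s_t ** 2).mean()    # OPG estimate of the information F_theta_theta
    step = d / F_hat             # scoring step: + F^{-1} d
    lam += step
    if abs(step) < 1e-10:
        break
print(lam, y.mean())             # again converges to the sample mean
```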

    ∂²L/∂β∂σ² = −(1/(Tσ⁴)) Σ_t (y_t − g(x_t, β)) ∂g(x_t, β)/∂β,
    ∂²L/∂σ²∂σ² = 1/(2σ⁴) − (1/(Tσ⁶)) Σ_t (y_t − g(x_t, β))².

Take expectations to get F_θθ,T, where

    F_ββ,T = (1/(Tσ²)) Σ_t [∂g(x_t, β)/∂β][∂g(x_t, β)/∂β′],  F_βσ²,T = 0,  F_σ²σ²,T = 1/(2σ⁴).

The first relation is due to

    E[(y_t − g(x_t, β)) ∂²g(x_t, β)/∂β∂β′] = [∂²g(x_t, β)/∂β∂β′] E(y_t − g(x_t, β)) = 0.

The scoring algorithm is

    [β̂_(n); σ̂²_(n)] = [β̂_(n−1); σ̂²_(n−1)] + [F_ββ,T(θ̂^(n−1)), F_βσ²,T(θ̂^(n−1)); F_σ²β,T(θ̂^(n−1)), F_σ²σ²,T(θ̂^(n−1))]⁻¹ [∂L/∂β(θ̂^(n−1)); ∂L/∂σ²(θ̂^(n−1))].

Because F_βσ²,T(θ̂^(n−1)) = 0 = F_σ²β,T(θ̂^(n−1)), the β-update decouples:

    β̂_(n) = β̂_(n−1) + F_ββ,T(θ̂^(n−1))⁻¹ ∂L/∂β(θ̂^(n−1))
          = β̂_(n−1) + [(1/(Tσ̂²_(n−1))) Σ_t ĝ_t ĝ_t′]⁻¹ [(1/(Tσ̂²_(n−1))) Σ_t (y_t − g(x_t, β̂_(n−1))) ĝ_t]
          = β̂_(n−1) + [Σ_t z_t z_t′]⁻¹ Σ_t z_t (y_t − g(x_t, β̂_(n−1))),

where ĝ_t = ∂g(x_t, β)/∂β evaluated at β = β̂_(n−1), and we write z_t ≡ ĝ_t.

A regression interpretation emerges from this derivation: [Σ_t z_t z_t′]⁻¹ Σ_t z_t (y_t − g(x_t, β̂_(n−1))) is the estimated coefficient vector in the regression of y_t − g(x_t, β̂_(n−1)) on z_t. (To see where the interpretation comes from, compare the simple regression model y_t = x_t′β + e_t, with β̂_OLS = (Σ_t x_t x_t′)⁻¹ Σ_t x_t y_t, to the nonlinear case, where y_t − g(x_t, β̂_(n−1)) plays the role of the dependent variable and z_t plays the role of the regressors x_t.) In general, the updating procedure is an iterative least squares estimation. This is the so-called Gauss-Newton algorithm. In this case, ML estimation amounts to nonlinear regression estimation whenever the problem can be formulated as above. The Gauss-Newton algorithm is one of the simplest and most effective ways of maximizing the likelihood.

7 Wald, LM, and LR tests: the trinity

A major goal of econometric exercises is to draw inferences from observed data. Based on the MLE results, there are generally three important testing strategies we can undertake.
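As a sketch of the Gauss-Newton iteration just derived, the following snippet fits a hypothetical nonlinear regression with g(x, β) = β₀ exp(β₁x); the model, data, and starting values are invented for illustration. Each step regresses the current residual y_t − g(x_t, β̂_(n−1)) on z_t = ∂g/∂β, exactly the iterative least squares described above.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 100)
beta_true = np.array([2.0, -1.0])
y = beta_true[0] * np.exp(beta_true[1] * x) + 0.05 * rng.standard_normal(100)

def g(x, b):
    return b[0] * np.exp(b[1] * x)

def grad_g(x, b):
    # columns are dg/db0 and dg/db1; the rows are the z_t
    return np.column_stack([np.exp(b[1] * x), b[0] * x * np.exp(b[1] * x)])

b = np.array([1.0, 0.0])                 # starting values
for _ in range(50):
    z = grad_g(x, b)
    resid = y - g(x, b)
    step, *_ = np.linalg.lstsq(z, resid, rcond=None)   # OLS of resid on z_t
    b = b + step
    if np.max(np.abs(step)) < 1e-10:
        break
print(b)                                 # close to beta_true
```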

They differ from each other in whether the restrictions under test are taken into account in estimation. In what follows, we present them in order. Suppose we would like to test whether the true parameters, θ₀, satisfy the restrictions

    H₀: Rθ₀ = r.

For a concrete idea of what R and r look like, suppose we want to test whether the production function is Cobb-Douglas, in which case the parameters should meet the constraint α₀ + β₀ = 1. In correspondence to this example,

    R = [1, 1],  θ₀ = (α₀, β₀)′,  r = 1.

More generally, R can be a matrix and r a vector when more than one constraint is jointly under test.

7.1 Wald test

The Wald test does not use information about the null hypothesis when forming the statistic. Suppose θ̂ is an estimate of θ₀, possibly obtained by maximum likelihood. If the data are really drawn from the null, we should expect θ̂ to differ little from the true value θ₀. As a result, under the null hypothesis being tested, we expect

    Rθ̂ − r ≈ 0, since Rθ₀ = r and θ̂ ≈ θ₀.

This is a quantity we can use to discriminate the null hypothesis from its alternative: if the data are not drawn from the null, Rθ̂ − r will instead differ from 0, and the larger the difference, the stronger the evidence against the null. The question now is how to employ this notion to construct a statistic that is powerful against the alternative. To do so with the Wald test, note that

    Rθ̂ − r = Rθ̂ − Rθ₀ + Rθ₀ − r = R(θ̂ − θ₀),

since under H₀, Rθ₀ − r = 0. The quantity R(θ̂ − θ₀) is a random variable with some distribution. To investigate what that distribution is, observe the asymptotic normality of the MLE:

    T^(1/2)(θ̂ − θ₀) →d N(0, F_θθ(θ₀)⁻¹).

Since R(θ̂ − θ₀) is just a linear transformation of θ̂ − θ₀, it is straightforward to show that

    T^(1/2) R(θ̂ − θ₀) →d N(0, R F_θθ(θ₀)⁻¹ R′),

which gives the distribution we desired. It is natural to form the statistic

    W ≡ T (Rθ̂ − r)′ (R F_θθ(θ₀)⁻¹ R′)⁻¹ (Rθ̂ − r) →d χ²(dim(r)).
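The statistic is made feasible below by plugging a consistent estimate into F_θθ; anticipating that, here is a minimal sketch of the feasible Wald test. The numbers (θ̂, the estimated F⁻¹, T) are hypothetical placeholders, and the example restriction is the Cobb-Douglas constraint α + β = 1 from above.

```python
import numpy as np
from scipy import stats

def wald_test(theta_hat, F_inv, R, r, T):
    """W = T (R theta_hat - r)' (R F^{-1} R')^{-1} (R theta_hat - r) ~ chi2(dim(r))."""
    diff = R @ theta_hat - r
    V = R @ F_inv @ R.T                     # asymptotic covariance of T^(1/2) R(theta_hat - theta0)
    W = T * diff @ np.linalg.solve(V, diff)
    return W, stats.chi2.sf(W, df=len(r))   # statistic and asymptotic p-value

# Hypothetical unrestricted estimates and inverse information:
theta_hat = np.array([0.65, 0.30])
F_inv = np.array([[0.4, -0.1], [-0.1, 0.3]])
W, pval = wald_test(theta_hat, F_inv, R=np.array([[1.0, 1.0]]), r=np.array([1.0]), T=200)
print(W, pval)
```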

The asymptotic result comes from the facts that T^(1/2) R(θ̂ − θ₀) is normal and that R F_θθ(θ₀)⁻¹ R′ is the scaling covariance. (Recall that if X ~ N_j(µ, Ω), then (X − µ)′ Ω⁻¹ (X − µ) ~ χ²(j).) It should be emphasized that in the Wald statistic the parameters of concern, θ̂, are estimated without using the restrictions of the null: θ̂ is an unrestricted MLE. The Wald statistic above is in fact infeasible, because F_θθ(θ₀)⁻¹ involves the unknown parameters θ₀. The test is made feasible by replacing θ₀ with its consistent counterpart θ̂ (the unrestricted MLE is consistent under both the null and the alternative), so the statistic takes the form

    W ≡ T (Rθ̂ − r)′ (R F_θθ,T(θ̂)⁻¹ R′)⁻¹ (Rθ̂ − r).

The statistic behaves quite differently under the alternative hypothesis, where the constraint Rθ₀ − r = 0 does not hold. Under the alternative, Rθ̂ − r is not close to 0, and the Wald statistic asymptotically follows a non-central χ² distribution.

7.2 LM test

The major difference between the Lagrange multiplier (LM) test and the Wald test lies in whether the parameter estimates used are unrestricted or restricted. In the construction of the LM test, the parameters are estimated using the restrictions of the null. In this sense, the LM statistic has the lowest computational cost among the three statistics under study. The restricted MLE solves

    max L(θ) subject to Rθ = r.

To solve this maximization problem, we form the Lagrangian

    L* = L(θ) + λ′(Rθ − r),

where λ is the Lagrange multiplier, and the LM test is a statistic based on this quantity. The first-order conditions are

    ∂L*/∂θ = d_θ(θ) + R′λ = 0,  ∂L*/∂λ = Rθ − r = 0.

Let (θ̃, λ̃) be the solutions, where λ̃ is the estimate of the Lagrange multiplier, so that

    d_θ(θ̃) + R′λ̃ = 0.
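As a sketch of restricted estimation, the snippet below imposes Rθ = r numerically in a simple Gaussian linear model (σ² = 1 known, so the average log-likelihood is OLS-like) and then recovers λ̃ from the first-order condition d_θ(θ̃) + R′λ̃ = 0, which implies λ̃ = −(RR′)⁻¹ R d_θ(θ̃). The model and data are hypothetical, and the optimizer choice (SciPy's SLSQP) is just one convenient way to handle the equality constraint.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2))
y = X @ np.array([0.7, 0.3]) + rng.standard_normal(200)   # null alpha + beta = 1 holds

R = np.array([[1.0, 1.0]])
r = np.array([1.0])

negL = lambda th: 0.5 * np.mean((y - X @ th) ** 2)   # -(average log L), up to constants
score = lambda th: X.T @ (y - X @ th) / len(y)       # average score d_theta

res = minimize(negL, x0=np.zeros(2), method="SLSQP",
               constraints={"type": "eq", "fun": lambda th: R @ th - r})
theta_tilde = res.x
lam_tilde = -np.linalg.solve(R @ R.T, R @ score(theta_tilde))   # from the FOC
print(theta_tilde, lam_tilde)   # the score at theta~ equals -R' lam~
```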

Now, under the null, where the restrictions Rθ₀ − r = 0 are valid, imposing the restrictions should change the likelihood very little. This implies that the Lagrange multiplier λ̃ should be very small, and thus

    d_θ(θ̃) ≈ 0 if λ̃ ≈ 0.

Thus, testing whether λ̃ = 0 is equivalent to testing whether d_θ(θ̃) = 0, and testing whether λ̃ = 0 is really testing whether the restriction imposed on the estimation is correct. We can therefore employ d_θ(θ̃) = 0 to build a test statistic. The LM statistic takes the form

    LM ≡ T d_θ(θ̃)′ F_θθ,T(θ̃)⁻¹ d_θ(θ̃) →d χ²(dim(r)).

Again, under H₀ the statistic converges in distribution to a χ² distribution, as for the Wald test, with degrees of freedom dim(r).

A few observations on this statistic. First, it is natural to ask why F_θθ,T is used as the scaling factor. Simply note that the score has mean zero, so its variance E(d_θ d_θ′) is the appropriate scaling quantity, and its scaled limit is F_θθ. Since θ₀ is unknown, in practice replacing it by θ̃ does the job. Second, why is there a T in front of the LM statistic? Because we work with the average score in the maximization problem, i.e., d_θ is (1/T) times the derivative of the total log-likelihood. Furthermore, because the densities are independent across observations, it is easy to obtain

    T^(1/2) d_θ(θ₀) →d N(0, F_θθ).

Collecting these arguments, the asymptotic χ² distribution of the LM statistic is well expected.

7.3 LR test

The LR test is another intuitive test statistic. It uses information from both the restricted and the unrestricted ML estimation. Before spelling out the statistic, first note that under the null the ratio of the restricted likelihood to the unrestricted likelihood should be close to 1. In notation,

    λ = L*(θ̃) / L*(θ̂),

where θ̃ and θ̂ are, respectively, the restricted and unrestricted estimates, and L* denotes the likelihood itself (as opposed to the average log-likelihood L). The reason this ratio is close to 1 under the null is that the restricted and unrestricted estimates are of similar magnitude under the null; we discussed this notion previously. Therefore, under H₀, θ̃ ≈ θ̂, and thus log λ ≈ 0. But to do inference, we need the distribution of the LR statistic. Fortunately, such a result exists: under H₀,

    −2 log λ = 2[log L*(θ̂) − log L*(θ̃)] = 2T[L(θ̂) − L(θ̃)] →d χ²(dim(r)).
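To see the trinity side by side, here is a minimal sketch computing W, LR, and LM for H₀: α + β = 1 in a Gaussian linear model with σ² = 1 known (so F_θθ = X′X/T, and the restricted MLE has the standard restricted-least-squares closed form). The data are hypothetical. In this special linear, known-variance case the three statistics coincide exactly; in general they differ in finite samples, with the ordering discussed next.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200
X = rng.standard_normal((T, 2))
y = X @ np.array([0.7, 0.3]) + rng.standard_normal(T)
R, r = np.array([[1.0, 1.0]]), np.array([1.0])

avg_logL = lambda th: -0.5 * np.mean((y - X @ th) ** 2)   # up to constants
score = lambda th: X.T @ (y - X @ th) / T                 # average score d_theta
F = X.T @ X / T                                           # information (sigma^2 = 1)

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)             # unrestricted MLE (OLS)
XtXi = np.linalg.inv(X.T @ X)                             # restricted LS formula:
theta_til = theta_hat - XtXi @ R.T @ np.linalg.solve(R @ XtXi @ R.T, R @ theta_hat - r)

diff = R @ theta_hat - r
W = T * diff @ np.linalg.solve(R @ np.linalg.inv(F) @ R.T, diff)
LM = T * score(theta_til) @ np.linalg.solve(F, score(theta_til))
LR = 2 * T * (avg_logL(theta_hat) - avg_logL(theta_til))
print(W, LR, LM)   # identical here; all asymptotically chi2(1) under H0
```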

The LR test probably involves a higher computational cost than the other tests, because both the unrestricted and the restricted ML estimations must be performed before the statistic can be computed.

Asymptotically, the three tests, Wald, LM, and LR, are equivalent. But for a given data set, they differ from each other in small samples. Specifically, we can obtain the ordering

    LM ≤ LR ≤ Wald.

That is, the LM test is the most conservative, and the Wald test the most liberal. A figure illustrating the differences among the three tests is below.

[Figure 2: Three Asymptotically Equivalent Tests. The log-likelihood L(θ) and the restriction φ(θ) = Rθ − r are plotted against θ, with θ̃ and θ̂ marked: the LR test works off the vertical distance L(θ̂) − L(θ̃), the Wald test off the horizontal distance φ(θ̂), and the LM test off the slope dL(θ)/dθ at θ̃.]

8 Non-linear restrictions

We now switch attention to the case of testing nonlinear restrictions. An example of a nonlinear restriction is

    β₁β₂ − β₃ = 0,

in contrast to the linear restriction β₁ + β₂ = 1 seen before. While nonlinear restrictions may appear quite difficult to deal with, testing them turns out to be similar

to what we did in the linear case. In general, we are interested in testing

    H₀: φ(θ₀) = 0,

where φ(·) is a known function. Though nonlinear, φ can be linearized by Taylor expansion. Suppose θ̂ is the unrestricted ML estimate. Taking a Taylor expansion of φ(θ̂) about θ₀,

    φ(θ̂) = φ(θ₀) + [∂φ(θ₀)/∂θ′](θ̂ − θ₀) + (1/2)(θ̂ − θ₀)′[∂²φ(θ₀)/∂θ∂θ′](θ̂ − θ₀) + ⋯,

writing the second-order term as if φ were scalar. A bit of calculation leads to

    T^(1/2)(φ(θ̂) − φ(θ₀)) = [∂φ(θ₀)/∂θ′] T^(1/2)(θ̂ − θ₀) + (1/2)[T^(1/2)(θ̂ − θ₀)]′[∂²φ(θ₀)/∂θ∂θ′](θ̂ − θ₀) + ⋯,

where φ(θ₀) = 0 under the null. Note that the second term on the RHS is asymptotically negligible, because T^(1/2)(θ̂ − θ₀) converges in distribution to a normal while (θ̂ − θ₀) →p 0. Any higher-order terms are asymptotically negligible by the same token. Therefore, asymptotically (in large samples) under H₀,

    T^(1/2) φ(θ̂) ≈ [∂φ(θ₀)/∂θ′] T^(1/2)(θ̂ − θ₀) ≡ R T^(1/2)(θ̂ − θ₀),

where R = ∂φ(θ₀)/∂θ′ is the first-derivative matrix, whose elements are constants. The statistic now has the same expression as in the linear case, except that in the nonlinear case R is the matrix of first derivatives with respect to the parameters, evaluated at θ₀. Naturally, the Wald test is computed as

    W = [T^(1/2) φ(θ̂)]′ Var(T^(1/2) φ(θ̂))⁻¹ [T^(1/2) φ(θ̂)] ≈ T φ(θ̂)′ (R̂ F_θθ,T(θ̂)⁻¹ R̂′)⁻¹ φ(θ̂),

where the second expression is obtained by replacing the unknown θ₀ with the consistent estimate θ̂; correspondingly, R̂ = ∂φ(θ̂)/∂θ′.

The discussion so far has concentrated on the Wald test. How, then, are the LM and LR tests calculated under nonlinear restrictions? Because these two tests do not use the difference between the true parameters and their estimated counterparts to construct the statistic, their calculation remains the same, except that the restrictions of concern are nonlinear.
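A minimal sketch of the nonlinear (delta-method) Wald test for the example restriction φ(θ) = β₁β₂ − β₃ = 0; the estimates, the inverse information, and T below are hypothetical placeholders.

```python
import numpy as np
from scipy import stats

phi = lambda th: np.array([th[0] * th[1] - th[2]])     # phi(theta)
dphi = lambda th: np.array([[th[1], th[0], -1.0]])     # R_hat = d phi / d theta'

theta_hat = np.array([0.5, 0.8, 0.35])                 # assumed unrestricted MLE
F_inv = np.diag([0.2, 0.2, 0.1])                       # assumed F^{-1}_(theta theta,T)(theta_hat)
T = 500

R_hat = dphi(theta_hat)
V = R_hat @ F_inv @ R_hat.T                            # R_hat F^{-1} R_hat'
W = T * phi(theta_hat) @ np.linalg.solve(V, phi(theta_hat))
print(W, stats.chi2.sf(W, df=1))                       # compare with chi2(dim(phi))
```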