The outline for Unit 3


Unit 1. Introduction: The regression model.
Unit 2. Estimation principles.
Unit 3. Hypothesis testing principles.
    3.1 Wald test.
    3.2 Lagrange multiplier test.
    3.3 Likelihood ratio test.
    3.4 Comparison between LR, Wald and LM tests.
    Addendum A. Hypothesis testing using bootstrap.
Unit 4. Heteroscedasticity in the regression model.
Unit 5. Endogeneity of regressors.

Economic restrictions

In many cases, economic theory suggests restrictions on the parameters of a model. For example, a demand function is supposed to be homogeneous of degree zero in prices and income. If we have a Cobb-Douglas model,
$$\ln q = \beta_0 + \beta_1 \ln p_1 + \beta_2 \ln p_2 + \beta_3 \ln m + \varepsilon,$$
then homogeneity of degree zero requires that, for any $k > 0$,
$$\ln q = \beta_0 + \beta_1 \ln kp_1 + \beta_2 \ln kp_2 + \beta_3 \ln km + \varepsilon,$$
so the only way to guarantee this for arbitrary $k$ is to set
$$\beta_1 + \beta_2 + \beta_3 = 0,$$
which is a parameter restriction.
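
As a quick numerical illustration (a minimal sketch with made-up coefficients, not estimates from any dataset), the predictions at prices and income $(p_1, p_2, m)$ and $(kp_1, kp_2, km)$ differ by $(\beta_1 + \beta_2 + \beta_3)\ln k$, which is zero under the restriction:

```python
import numpy as np

# Hypothetical Cobb-Douglas demand coefficients satisfying b1 + b2 + b3 = 0
b0, b1, b2, b3 = 1.0, -0.8, 0.3, 0.5
p1, p2, m = 2.0, 3.0, 10.0
k = 1.7  # arbitrary positive scaling of prices and income

lnq = b0 + b1 * np.log(p1) + b2 * np.log(p2) + b3 * np.log(m)
lnq_scaled = b0 + b1 * np.log(k * p1) + b2 * np.log(k * p2) + b3 * np.log(k * m)

# The two predictions differ by (b1 + b2 + b3) * ln(k), which is zero here
print(np.isclose(lnq, lnq_scaled))  # True
```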

Linear restrictions - 1

The general formulation of linear equality restrictions is the model
$$y = X\beta + \varepsilon, \qquad R\beta = r,$$
where $R$ is a $Q \times K$ matrix, $Q < K$, and $r$ is a $Q \times 1$ vector of constants. We assume $R$ is of rank $Q$, so that there are no redundant restrictions. We also assume that there exists a $\beta$ that satisfies the restrictions: they are not infeasible.

Linear restrictions - 2

Let's consider how to estimate $\beta$ subject to the restrictions $R\beta = r$. The most obvious approach is to set up the Lagrangean
$$\min_\beta \; s(\beta) = (y - X\beta)'(y - X\beta) + 2\lambda'(R\beta - r).$$
The Lagrange multipliers are scaled by 2, which makes things less messy. The first order necessary conditions (fonc) are
$$D_\beta s(\hat\beta_R, \hat\lambda) = -2X'y + 2X'X\hat\beta_R + 2R'\hat\lambda \equiv 0$$
$$D_\lambda s(\hat\beta_R, \hat\lambda) = R\hat\beta_R - r \equiv 0,$$
which can be written as
$$\begin{bmatrix} X'X & R' \\ R & 0 \end{bmatrix}\begin{bmatrix} \hat\beta_R \\ \hat\lambda \end{bmatrix} = \begin{bmatrix} X'y \\ r \end{bmatrix}.$$

Linear restrictions - 3

We get
$$\begin{bmatrix} \hat\beta_R \\ \hat\lambda \end{bmatrix} = \begin{bmatrix} X'X & R' \\ R & 0 \end{bmatrix}^{-1}\begin{bmatrix} X'y \\ r \end{bmatrix}.$$
So, using the partitioned inverse with $P = R(X'X)^{-1}R'$,
$$\begin{bmatrix} \hat\beta_R \\ \hat\lambda \end{bmatrix} = \begin{bmatrix} (X'X)^{-1} - (X'X)^{-1}R'P^{-1}R(X'X)^{-1} & (X'X)^{-1}R'P^{-1} \\ P^{-1}R(X'X)^{-1} & -P^{-1} \end{bmatrix}\begin{bmatrix} X'y \\ r \end{bmatrix}$$
$$= \begin{bmatrix} \hat\beta - (X'X)^{-1}R'P^{-1}\left(R\hat\beta - r\right) \\ P^{-1}\left(R\hat\beta - r\right) \end{bmatrix}$$
$$= \begin{bmatrix} \left(I_K - (X'X)^{-1}R'P^{-1}R\right)\hat\beta + (X'X)^{-1}R'P^{-1}r \\ P^{-1}R\hat\beta - P^{-1}r \end{bmatrix}.$$
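
A minimal numpy sketch of this solution, assuming simulated data and a hypothetical single restriction $\beta_1 + \beta_2 = 1$ (the data and variable names are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 0.6, 0.4])       # satisfies b1 + b2 = 1
y = X @ beta_true + rng.normal(size=n)

R = np.array([[0.0, 1.0, 1.0]])             # one restriction: b1 + b2 = 1
r = np.array([1.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ y                # unrestricted OLS
P = R @ XtX_inv @ R.T                       # P = R (X'X)^{-1} R'
lam = np.linalg.solve(P, R @ beta_ols - r)  # Lagrange multipliers
beta_R = beta_ols - XtX_inv @ R.T @ lam     # restricted estimator

print(beta_R, R @ beta_R - r)               # restriction holds up to rounding
```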

Linear restrictions - 4

If the number of restrictions is small, we can impose them by substitution. Write
$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
$$\begin{bmatrix} R_1 & R_2 \end{bmatrix}\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} = r,$$
where $R_1$ is $Q \times Q$ nonsingular. Supposing the $Q$ restrictions are linearly independent, one can always make $R_1$ nonsingular by reorganizing the columns of $X$. Then
$$\beta_1 = R_1^{-1}r - R_1^{-1}R_2\beta_2.$$
Substitute this into the model:
$$y = X_1R_1^{-1}r - X_1R_1^{-1}R_2\beta_2 + X_2\beta_2 + \varepsilon$$
$$y - X_1R_1^{-1}r = \left[X_2 - X_1R_1^{-1}R_2\right]\beta_2 + \varepsilon,$$
or, with the appropriate definitions,
$$y_R = X_R\beta_2 + \varepsilon.$$

Linear restrictions - 5

This model satisfies the classical assumptions, supposing the restriction is true. One can estimate it by OLS. The variance of $\hat\beta_2$ is
$$V(\hat\beta_2) = (X_R'X_R)^{-1}\sigma_0^2$$
and its estimator is
$$\hat V(\hat\beta_2) = (X_R'X_R)^{-1}\hat\sigma^2,$$
where one estimates $\sigma_0^2$ in the normal way, using the restricted model, i.e.,
$$\hat\sigma_0^2 = \frac{\left(y_R - X_R\hat\beta_2\right)'\left(y_R - X_R\hat\beta_2\right)}{n - (K - Q)}.$$
To recover $\hat\beta_1$, use the restriction. To find the variance of $\hat\beta_1$, use the fact that it is a linear function of $\hat\beta_2$, so
$$V(\hat\beta_1) = R_1^{-1}R_2\,V(\hat\beta_2)\,R_2'\left(R_1^{-1}\right)' = R_1^{-1}R_2(X_R'X_R)^{-1}R_2'\left(R_1^{-1}\right)'\sigma_0^2.$$
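
A sketch of the substitution approach for the same simulated data and hypothetical restriction as above; the particular partition of $\beta$ into $(\beta_1, \beta_2)$ and the choice of $R_1$, $R_2$ are ours for illustration. With the same seed, this reproduces the restricted estimates from the Lagrangean solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.6, 0.4]) + rng.normal(size=n)

# Restriction b1 + b2 = 1: take beta_1 = (b1) and beta_2 = (b0, b2),
# so that R1 = [1] is trivially nonsingular and R2 = [0, 1] picks out b2.
X1, X2 = X[:, [1]], X[:, [0, 2]]
R1, R2, r = np.array([[1.0]]), np.array([[0.0, 1.0]]), np.array([1.0])

y_R = y - X1 @ np.linalg.solve(R1, r)                # y_R = y - X1 R1^{-1} r
X_R = X2 - X1 @ np.linalg.solve(R1, R2)              # X_R = X2 - X1 R1^{-1} R2

beta2_R, *_ = np.linalg.lstsq(X_R, y_R, rcond=None)  # OLS on the restricted model
beta1_R = np.linalg.solve(R1, r - R2 @ beta2_R)      # recover beta_1 from the restriction
print(beta1_R, beta2_R)
```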

Properties of the restricted estimator - 1

We have that
$$\begin{aligned}
\hat\beta_R &= \hat\beta - (X'X)^{-1}R'P^{-1}\left(R\hat\beta - r\right)\\
&= \hat\beta + (X'X)^{-1}R'P^{-1}r - (X'X)^{-1}R'P^{-1}R(X'X)^{-1}X'y\\
&= \beta + (X'X)^{-1}X'\varepsilon + (X'X)^{-1}R'P^{-1}\left[r - R\beta\right] - (X'X)^{-1}R'P^{-1}R(X'X)^{-1}X'\varepsilon,
\end{aligned}$$
so
$$\hat\beta_R - \beta = (X'X)^{-1}X'\varepsilon + (X'X)^{-1}R'P^{-1}\left[r - R\beta\right] - (X'X)^{-1}R'P^{-1}R(X'X)^{-1}X'\varepsilon.$$
The mean squared error is
$$\text{MSE}(\hat\beta_R) = E\left(\hat\beta_R - \beta\right)\left(\hat\beta_R - \beta\right)'.$$

Properties of the restricted estimator - 2

Noting that the cross products between the second term and the other terms have expectation zero, and that the cross products of the first and third terms partially cancel with the square of the third, we obtain
$$\text{MSE}(\hat\beta_R) = (X'X)^{-1}\sigma^2 + (X'X)^{-1}R'P^{-1}\left[r - R\beta\right]\left[r - R\beta\right]'P^{-1}R(X'X)^{-1} - (X'X)^{-1}R'P^{-1}R(X'X)^{-1}\sigma^2.$$
So:
- The first term is the OLS covariance.
- If the restriction is true, the second term is 0 and the third term is subtracted, so we are better off. True restrictions improve the efficiency of estimation.
- If the restriction is false, we may be better or worse off, in terms of MSE, depending on the magnitudes of $r - R\beta$ and $\sigma^2$.

Testing: t-test

Suppose one has the model $y = X\beta + \varepsilon$ and one wishes to test the single restriction $H_0\colon R\beta = r$ vs. $H_A\colon R\beta \neq r$. Under $H_0$, with normality of the errors,
$$R\hat\beta - r \sim N\left(0, R(X'X)^{-1}R'\sigma_0^2\right),$$
so
$$\frac{R\hat\beta - r}{\sqrt{R(X'X)^{-1}R'\,\sigma_0^2}} = \frac{R\hat\beta - r}{\sigma_0\sqrt{R(X'X)^{-1}R'}} \sim N(0, 1).$$
The problem is that $\sigma_0^2$ is unknown. One could use the consistent estimator $\hat\sigma^2$ in place of $\sigma_0^2$, but the test would only be valid asymptotically in this case.

Testing: t-test - 2

Proposition 1:
$$\frac{N(0,1)}{\sqrt{\chi^2(q)/q}} \sim t(q),$$
as long as the $N(0,1)$ and the $\chi^2(q)$ are independent.

Proposition 2: If $x \sim N(\mu, I_n)$ is a vector of $n$ independent r.v.'s, then
$$x'x \sim \chi^2(n, \lambda),$$
where $\lambda = \sum_i \mu_i^2 = \mu'\mu$ is the noncentrality parameter. When a $\chi^2$ r.v. has noncentrality parameter equal to zero, it is referred to as a central $\chi^2$ r.v., and its distribution is written as $\chi^2(n)$, suppressing the noncentrality parameter.

Testing: t-test - 3

Proposition 3: If the $n$-dimensional random vector $x \sim N(0, V)$, then $x'V^{-1}x \sim \chi^2(n)$.

Proof: Factor $V^{-1}$ as $P'P$ (this is the Cholesky factorization). Then consider $y = Px$. We have $y \sim N(0, PVP')$, but
$$VP'P = I_n \;\Rightarrow\; PVP'P = P \;\Rightarrow\; PVP' = I_n,$$
and thus $y \sim N(0, I_n)$. Thus $y'y \sim \chi^2(n)$, but
$$y'y = x'P'Px = x'V^{-1}x.$$

Testing: t-test - 4

A more general proposition which implies this result is

Proposition 4: If the $n$-dimensional random vector $x \sim N(0, V)$, then
$$x'Bx \sim \chi^2(\rho(B))$$
if and only if $BV$ is idempotent.

An immediate consequence is

Proposition 5: If the $n$-dimensional random vector $x \sim N(0, I)$, and $B$ is idempotent with rank $r$, then
$$x'Bx \sim \chi^2(r).$$

Testing: t-test - 5

Application:
$$\frac{\hat\varepsilon'\hat\varepsilon}{\sigma_0^2} = \frac{\varepsilon'M_X\varepsilon}{\sigma_0^2} = \left(\frac{\varepsilon}{\sigma_0}\right)'M_X\left(\frac{\varepsilon}{\sigma_0}\right) \sim \chi^2(n - K).$$

Proposition 6: If the $n$-dimensional random vector $x \sim N(0, I)$, then $Ax$ and $x'Bx$ are independent if $AB = 0$.

Now consider (remember that we have only one restriction in this case)
$$\frac{\dfrac{R\hat\beta - r}{\sigma_0\sqrt{R(X'X)^{-1}R'}}}{\sqrt{\dfrac{\hat\varepsilon'\hat\varepsilon}{(n-K)\sigma_0^2}}} = \frac{R\hat\beta - r}{\hat\sigma\sqrt{R(X'X)^{-1}R'}}.$$
This will have a $t(n - K)$ distribution if $\hat\beta$ and $\hat\varepsilon'\hat\varepsilon$ are independent.

Testing: t-test - 6

But $\hat\beta = \beta + (X'X)^{-1}X'\varepsilon$ and $(X'X)^{-1}X'M_X = 0$, so
$$\frac{R\hat\beta - r}{\hat\sigma\sqrt{R(X'X)^{-1}R'}} \sim t(n - K).$$
In particular, for the commonly encountered test of significance of an individual coefficient, for which $H_0\colon \beta_i = 0$ vs. $H_A\colon \beta_i \neq 0$, the test statistic is
$$\frac{\hat\beta_i}{\hat\sigma_{\hat\beta_i}} \sim t(n - K).$$

Note: the t-test is strictly valid only if the errors are normally distributed. If one has nonnormal errors, one could use the above asymptotic result to justify taking critical values from the $N(0,1)$ distribution, since $t(n - K) \xrightarrow{d} N(0,1)$ as $n \to \infty$. In practice, a conservative procedure is to take critical values from the t distribution if nonnormality is suspected. This will reject $H_0$ less often, since the t distribution is fatter-tailed than the normal.
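
A minimal sketch of the t-test for a single linear restriction, under the same simulated setup as before (the tested restriction $b_1 + b_2 = 1$ is hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.6, 0.4]) + rng.normal(size=n)

R, r = np.array([0.0, 1.0, 1.0]), 1.0       # H0: b1 + b2 = 1

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2_hat = resid @ resid / (n - K)        # unbiased estimator of sigma^2

t_stat = (R @ beta - r) / np.sqrt(sigma2_hat * (R @ XtX_inv @ R))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - K)
print(t_stat, p_value)
```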

Testing: F-test

The F test allows testing multiple restrictions jointly.

Proposition 7: If $x \sim \chi^2(r)$ and $y \sim \chi^2(s)$, then
$$\frac{x/r}{y/s} \sim F(r, s),$$
provided that $x$ and $y$ are independent.

Proposition 8: If the $n$-dimensional random vector $x \sim N(0, I)$, then $x'Ax$ and $x'Bx$ are independent if $AB = 0$.

Using these results, and previous results on the $\chi^2$ distribution, it is simple to show that the following statistic has the F distribution:
$$F = \frac{\left(R\hat\beta - r\right)'\left(R(X'X)^{-1}R'\right)^{-1}\left(R\hat\beta - r\right)}{q\hat\sigma^2} \sim F(q, n - K).$$

Testing: F-test - 2

A numerically equivalent expression is
$$\frac{(ESS_R - ESS_U)/q}{ESS_U/(n - K)} \sim F(q, n - K),$$
where $ESS_R$ and $ESS_U$ are the error (residual) sums of squares of the restricted and unrestricted models.

Note: the F test is strictly valid only if the errors are truly normally distributed. The following tests will be appropriate when one cannot assume normally distributed errors.
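
The following sketch checks the numerical equivalence of the two forms on simulated data, imposing two hypothetical joint restrictions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, K, q = 100, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, 0.5, 0.0]) + rng.normal(size=n)

R = np.array([[0.0, 1.0, -1.0, 0.0],        # H0: b1 = b2 ...
              [0.0, 0.0, 0.0, 1.0]])        # ... and b3 = 0
r = np.zeros(q)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
ess_u = np.sum((y - X @ beta) ** 2)         # unrestricted error sum of squares
sigma2_hat = ess_u / (n - K)

# First form: quadratic form in R beta_hat - r
d = R @ beta - r
F1 = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / (q * sigma2_hat)

# Second form: restricted vs. unrestricted error sums of squares
beta_R = beta - XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, d)
ess_r = np.sum((y - X @ beta_R) ** 2)
F2 = ((ess_r - ess_u) / q) / (ess_u / (n - K))

print(F1, F2, stats.f.sf(F1, q, n - K))     # F1 and F2 agree up to rounding
```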

Testing: Wald-test

The Wald principle is based on the idea that if a restriction is true, the unrestricted model should approximately satisfy the restriction. Given that the least squares estimator is asymptotically normally distributed,
$$\sqrt{n}\left(\hat\beta - \beta_0\right) \xrightarrow{d} N\left(0, \sigma_0^2 Q_X^{-1}\right),$$
then under $H_0\colon R\beta_0 = r$ we have
$$\sqrt{n}\left(R\hat\beta - r\right) \xrightarrow{d} N\left(0, \sigma_0^2 RQ_X^{-1}R'\right),$$
so by Proposition 3 we have
$$n\left(R\hat\beta - r\right)'\left(\sigma_0^2 RQ_X^{-1}R'\right)^{-1}\left(R\hat\beta - r\right) \xrightarrow{d} \chi^2(q).$$

Testing: Wald-test - 2

Note that $Q_X^{-1}$ and $\sigma_0^2$ are not observable. The test statistic we use substitutes the consistent estimators. Use $(X'X/n)^{-1}$ as the consistent estimator of $Q_X^{-1}$. With this, there is a cancellation of $n$'s, and the statistic to use is
$$\left(R\hat\beta - r\right)'\left(\hat\sigma^2 R(X'X)^{-1}R'\right)^{-1}\left(R\hat\beta - r\right) \xrightarrow{d} \chi^2(q).$$
The Wald test is a simple way to test restrictions without having to estimate the restricted model. Note that this formula is similar to one of the formulae provided for the F test.
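
A sketch of the Wald statistic for the same hypothetical restrictions as in the F-test sketch; note that with the same $\hat\sigma^2$, $W$ equals $qF$ exactly:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, K, q = 100, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, 0.5, 0.0]) + rng.normal(size=n)
R = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])        # H0: b1 = b2 and b3 = 0
r = np.zeros(q)

XtX_inv = np.linalg.inv(X.T @ X)            # only the unrestricted model is needed
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2_hat = resid @ resid / (n - K)

d = R @ beta - r
W = d @ np.linalg.solve(sigma2_hat * (R @ XtX_inv @ R.T), d)
print(W, stats.chi2.sf(W, df=q))            # here W = q * F from the F-test sketch
```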

Testing: Score-test

In some cases, an unrestricted model may be nonlinear in the parameters, but the model is linear in the parameters under the null hypothesis. For example, the model
$$y = (X\beta)^\gamma + \varepsilon$$
is nonlinear in $\beta$ and $\gamma$, but is linear in $\beta$ under $H_0\colon \gamma = 1$. Estimation of nonlinear models is a bit more complicated, so one might prefer to have a test based upon the restricted, linear model. The score test is useful in this situation. Score-type tests are based upon the general principle that the gradient vector of the unrestricted model, evaluated at the restricted estimate, should be asymptotically normally distributed with mean zero, if the restrictions are true. The original development was for ML estimation, but the principle is valid for a wide variety of estimation methods.

Testing: Score-test - 2

We have seen that
$$\hat\lambda = \left(R(X'X)^{-1}R'\right)^{-1}\left(R\hat\beta - r\right) = P^{-1}\left(R\hat\beta - r\right).$$
Given that
$$\sqrt{n}\left(R\hat\beta - r\right) \xrightarrow{d} N\left(0, \sigma_0^2 RQ_X^{-1}R'\right)$$
under the null hypothesis, and writing $n^{-1/2}\hat\lambda = (nP)^{-1}\sqrt{n}\left(R\hat\beta - r\right)$, we have
$$n^{-1/2}\hat\lambda \xrightarrow{d} N\left(0, \sigma_0^2 \lim (nP)^{-1}RQ_X^{-1}R'(nP)^{-1}\right),$$
since inserting the limit of a matrix of constants changes nothing.

Testing: Score-test - 3

However,
$$\lim nP = \lim nR(X'X)^{-1}R' = \lim R\left(\frac{X'X}{n}\right)^{-1}R' = RQ_X^{-1}R'.$$
So there is a cancellation and we get
$$n^{-1/2}\hat\lambda \xrightarrow{d} N\left(0, \sigma_0^2 \lim (nP)^{-1}\right).$$
In this case,
$$\frac{\hat\lambda'\left(R(X'X)^{-1}R'\right)\hat\lambda}{\hat\sigma_0^2} \xrightarrow{d} \chi^2(q),$$
since the powers of $n$ cancel. To get a usable test statistic, substitute a consistent estimator of $\sigma_0^2$.

Testing: Score-test - 4

This makes it clear why the test is sometimes referred to as a Lagrange multiplier test. It may seem that one needs the actual Lagrange multipliers to calculate this, and if we impose the restrictions by substitution, these are not available. However, note that the test can be written as
$$\frac{\left(R'\hat\lambda\right)'(X'X)^{-1}R'\hat\lambda}{\hat\sigma_0^2} \xrightarrow{d} \chi^2(q),$$
and we can use the fonc for the restricted estimator,
$$-X'y + X'X\hat\beta_R + R'\hat\lambda = 0,$$
to get
$$R'\hat\lambda = X'\left(y - X\hat\beta_R\right) = X'\hat\varepsilon_R.$$
Substituting this into the above, we get
$$\frac{\hat\varepsilon_R'X(X'X)^{-1}X'\hat\varepsilon_R}{\hat\sigma_0^2} \xrightarrow{d} \chi^2(q),$$
but this is simply
$$\frac{\hat\varepsilon_R'P_X\hat\varepsilon_R}{\hat\sigma_0^2} \xrightarrow{d} \chi^2(q).$$

Testing: Score-test - 5

To see why the test is also known as a score test, note that the fonc for restricted least squares,
$$-X'y + X'X\hat\beta_R + R'\hat\lambda = 0,$$
give us
$$R'\hat\lambda = X'y - X'X\hat\beta_R,$$
and the rhs is simply the gradient (score) of the unrestricted model, evaluated at the restricted estimator. The scores evaluated at the unrestricted estimate are identically zero. The logic behind the score test is that the scores evaluated at the restricted estimate should be approximately zero, if the restriction is true. The test is also known as a Rao test, since C. R. Rao first proposed it in 1948.
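
A sketch of the score (LM) statistic $\hat\varepsilon_R'P_X\hat\varepsilon_R/\hat\sigma_0^2$, which needs only the restricted residuals, under the same simulated setup and hypothetical restrictions as before (dividing by $n$ to estimate $\sigma_0^2$ from the restricted residuals is one consistent choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, K, q = 100, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, 0.5, 0.0]) + rng.normal(size=n)
R = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])        # H0: b1 = b2 and b3 = 0
r = np.zeros(q)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
beta_R = beta - XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ beta - r)
resid_R = y - X @ beta_R                    # restricted residuals

sigma2_0 = resid_R @ resid_R / n            # consistent estimate from restricted model
Px_resid = X @ (XtX_inv @ (X.T @ resid_R))  # P_X applied to the restricted residuals
LM = resid_R @ Px_resid / sigma2_0
print(LM, stats.chi2.sf(LM, df=q))
```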

Testing: Likelihood ratio-type tests

The Wald test can be calculated using the unrestricted model. The score test can be calculated using only the restricted model. The likelihood ratio test, on the other hand, uses both the restricted and the unrestricted estimators. The test statistic is
$$LR = 2\left(\ln L(\hat\theta) - \ln L(\tilde\theta)\right),$$
where $\hat\theta$ is the unrestricted estimate and $\tilde\theta$ is the restricted estimate. To show that it is asymptotically $\chi^2$, take a second order Taylor series expansion of $\ln L(\tilde\theta)$ about $\hat\theta$:
$$\ln L(\tilde\theta) \simeq \ln L(\hat\theta) + \frac{n}{2}\left(\tilde\theta - \hat\theta\right)'H(\hat\theta)\left(\tilde\theta - \hat\theta\right)$$
(note: the first order term drops out since $D_\theta \ln L(\hat\theta) \equiv 0$ by the fonc, and we need to multiply the second order term by $n$ since $H(\theta)$ is defined in terms of $\frac{1}{n}\ln L(\theta)$), so
$$LR \simeq -n\left(\tilde\theta - \hat\theta\right)'H(\hat\theta)\left(\tilde\theta - \hat\theta\right).$$

Testing: Likelihood ratio-type tests - 2

As $n \to \infty$, $H(\hat\theta) \to H_\infty(\theta_0) = -I_\infty(\theta_0)$, by the information matrix equality. So
$$LR \stackrel{a}{=} n\left(\tilde\theta - \hat\theta\right)'I_\infty(\theta_0)\left(\tilde\theta - \hat\theta\right).$$
We also have that
$$\sqrt{n}\left(\hat\theta - \theta_0\right) \stackrel{a}{=} I_\infty(\theta_0)^{-1}n^{1/2}g(\theta_0).$$
An analogous result for the restricted estimator is
$$\sqrt{n}\left(\tilde\theta - \theta_0\right) \stackrel{a}{=} I_\infty(\theta_0)^{-1}\left(I_K - R'\left(RI_\infty(\theta_0)^{-1}R'\right)^{-1}RI_\infty(\theta_0)^{-1}\right)n^{1/2}g(\theta_0).$$
Combining the last two equations,
$$\sqrt{n}\left(\tilde\theta - \hat\theta\right) \stackrel{a}{=} -n^{1/2}I_\infty(\theta_0)^{-1}R'\left(RI_\infty(\theta_0)^{-1}R'\right)^{-1}RI_\infty(\theta_0)^{-1}g(\theta_0).$$

Testing: Likelihood ratio-type tests - 3

So,
$$LR \stackrel{a}{=} \left[n^{1/2}g(\theta_0)'I_\infty(\theta_0)^{-1}R'\right]\left[RI_\infty(\theta_0)^{-1}R'\right]^{-1}\left[RI_\infty(\theta_0)^{-1}n^{1/2}g(\theta_0)\right].$$
But since
$$n^{1/2}g(\theta_0) \xrightarrow{d} N\left(0, I_\infty(\theta_0)\right),$$
the linear function
$$RI_\infty(\theta_0)^{-1}n^{1/2}g(\theta_0) \xrightarrow{d} N\left(0, RI_\infty(\theta_0)^{-1}R'\right).$$
We can see that $LR$ is a quadratic form of this r.v., with the inverse of its variance in the middle, so
$$LR \xrightarrow{d} \chi^2(q).$$
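
Assuming normal errors (the likelihood used later in these slides), concentrating $\sigma$ out of the log-likelihood gives $\ln L = -\frac{n}{2}(\ln 2\pi + 1) - \frac{n}{2}\ln(ESS/n)$, so the LR statistic reduces to $n\ln(ESS_R/ESS_U)$. A sketch under the same simulated setup and hypothetical restrictions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, K, q = 100, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, 0.5, 0.0]) + rng.normal(size=n)
R = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])        # H0: b1 = b2 and b3 = 0
r = np.zeros(q)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                    # unrestricted estimate
beta_R = beta - XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ beta - r)

ess_u = np.sum((y - X @ beta) ** 2)
ess_r = np.sum((y - X @ beta_R) ** 2)
LR = n * np.log(ess_r / ess_u)              # 2(lnL_U - lnL_R) under normal errors
print(LR, stats.chi2.sf(LR, df=q))
```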

The asymptotic equivalence of LR, Wald and score tests

We have seen that the three tests all converge to $\chi^2$ random variables. In fact, they all converge to the same $\chi^2$ r.v., under the null hypothesis. We'll show that the Wald and LR tests are asymptotically equivalent. We have seen that the Wald test is asymptotically equivalent to
$$W \stackrel{a}{=} n\left(R\hat\beta - r\right)'\left(\sigma_0^2 RQ_X^{-1}R'\right)^{-1}\left(R\hat\beta - r\right) \xrightarrow{d} \chi^2(q).$$
Using
$$\hat\beta - \beta_0 = (X'X)^{-1}X'\varepsilon$$
and
$$R\hat\beta - r = R\left(\hat\beta - \beta_0\right),$$

The asymptotic equivalence of LR, Wald and score tests - 2

we get
$$\sqrt{n}R\left(\hat\beta - \beta_0\right) = \sqrt{n}R(X'X)^{-1}X'\varepsilon = R\left(\frac{X'X}{n}\right)^{-1}n^{-1/2}X'\varepsilon.$$
Substituting this into the Wald statistic, we get
$$\begin{aligned}
W &\stackrel{a}{=} n^{-1}\varepsilon'XQ_X^{-1}R'\left(\sigma_0^2 RQ_X^{-1}R'\right)^{-1}RQ_X^{-1}X'\varepsilon\\
&\stackrel{a}{=} \varepsilon'X(X'X)^{-1}R'\left(\sigma_0^2 R(X'X)^{-1}R'\right)^{-1}R(X'X)^{-1}X'\varepsilon\\
&\stackrel{a}{=} \frac{\varepsilon'A(A'A)^{-1}A'\varepsilon}{\sigma_0^2}\\
&\stackrel{a}{=} \frac{\varepsilon'P_R\varepsilon}{\sigma_0^2},
\end{aligned}$$
where $P_R$ is the projection matrix formed by the matrix $A = X(X'X)^{-1}R'$. Note that $A$ has $q$ columns, so the projection matrix $P_R$ is idempotent with rank $q$.

The asymptotic equivalence of LR, Wald and score tests - 3

Now consider the likelihood ratio statistic
$$LR \stackrel{a}{=} n^{1/2}g(\theta_0)'I_\infty(\theta_0)^{-1}R'\left(RI_\infty(\theta_0)^{-1}R'\right)^{-1}RI_\infty(\theta_0)^{-1}n^{1/2}g(\theta_0).$$
Under normality, we have seen that the likelihood function is
$$\ln L(\beta, \sigma) = -\frac{n}{2}\ln 2\pi - n\ln\sigma - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}.$$
Using this,
$$g(\beta_0) \equiv D_\beta \frac{1}{n}\ln L(\beta, \sigma) = \frac{X'(y - X\beta_0)}{n\sigma^2} = \frac{X'\varepsilon}{n\sigma^2}.$$

The asymptotic equivalence of LR, Wald and score tests - 4

Also, by the information matrix equality,
$$I_\infty(\theta_0) = -H_\infty(\theta_0) = -\lim D_{\beta'}g(\beta_0) = -\lim D_{\beta'}\frac{X'(y - X\beta_0)}{n\sigma^2} = \lim\frac{X'X}{n\sigma^2} = \frac{Q_X}{\sigma^2},$$
so
$$I_\infty(\theta_0)^{-1} = \sigma^2 Q_X^{-1}.$$

The asymptotic equivalence of LR, Wald and score tests - 5

Substituting these last expressions into the LR test statistic, we get
$$\begin{aligned}
LR &\stackrel{a}{=} \varepsilon'X(X'X)^{-1}R'\left(\sigma_0^2 R(X'X)^{-1}R'\right)^{-1}R(X'X)^{-1}X'\varepsilon\\
&\stackrel{a}{=} \frac{\varepsilon'P_R\varepsilon}{\sigma_0^2}\\
&\stackrel{a}{=} W.
\end{aligned}$$
This completes the proof that the Wald and LR tests are asymptotically equivalent. Similarly, one can show that, under the null hypothesis,
$$qF \stackrel{a}{=} W \stackrel{a}{=} LM \stackrel{a}{=} LR.$$

The asymptotic equivalence of LR, Wald and score tests - 6

The proof for the statistics other than LR does not depend upon normality of the errors, as can be verified by examining the expressions for the statistics. The LR statistic is based upon distributional assumptions, since one can't write the likelihood function without them. However, due to the close relationship between the statistics $qF$ and $LR$, supposing normality, the $qF$ statistic can be thought of as a pseudo-LR statistic: like the LR statistic, it uses the values of the objective functions of the restricted and unrestricted models, but it doesn't require distributional assumptions. The presentation of the score and Wald tests has been done in the context of the linear model. This is readily generalizable to nonlinear models and/or other estimation methods.

Testing nonlinear restrictions

Testing nonlinear restrictions is not much more difficult, at least when the model itself is linear. Since estimation subject to nonlinear restrictions requires nonlinear estimation methods, which are beyond the scope of this course, we'll just consider the Wald test for nonlinear restrictions on a linear model. Consider the $q$ nonlinear restrictions
$$r(\beta_0) = 0,$$
where $r(\cdot)$ is a $q$-vector valued function. Write the derivative of the restrictions evaluated at $\beta$ as
$$D_{\beta'}r(\beta)\big|_\beta = R(\beta).$$

Testing nonlinear restrictions - 2

We suppose that the restrictions are not redundant in a neighborhood of $\beta_0$, so that $\rho(R(\beta)) = q$ in a neighborhood of $\beta_0$. Take a first order Taylor series expansion of $r(\hat\beta)$ about $\beta_0$:
$$r(\hat\beta) = r(\beta_0) + R(\beta^*)\left(\hat\beta - \beta_0\right),$$
where $\beta^*$ is a convex combination of $\hat\beta$ and $\beta_0$. Under the null hypothesis we have
$$r(\hat\beta) = R(\beta^*)\left(\hat\beta - \beta_0\right).$$
Due to consistency of $\hat\beta$ we can replace $\beta^*$ by $\beta_0$, asymptotically, so
$$\sqrt{n}\,r(\hat\beta) \stackrel{a}{=} \sqrt{n}\,R(\beta_0)\left(\hat\beta - \beta_0\right).$$
We've already seen the distribution of $\sqrt{n}\left(\hat\beta - \beta_0\right)$. Using this we get
$$\sqrt{n}\,r(\hat\beta) \xrightarrow{d} N\left(0, R(\beta_0)Q_X^{-1}R(\beta_0)'\sigma_0^2\right).$$

Testing nonlinear restrictions - 3

Considering the quadratic form,
$$\frac{n\,r(\hat\beta)'\left(R(\beta_0)Q_X^{-1}R(\beta_0)'\right)^{-1}r(\hat\beta)}{\sigma_0^2} \xrightarrow{d} \chi^2(q)$$
under the null hypothesis. Substituting consistent estimators for $\beta_0$, $Q_X$ and $\sigma_0^2$, the resulting statistic is
$$\frac{r(\hat\beta)'\left(R(\hat\beta)(X'X)^{-1}R(\hat\beta)'\right)^{-1}r(\hat\beta)}{\hat\sigma^2} \xrightarrow{d} \chi^2(q).$$
This is known in the literature as the delta method, or as Klein's approximation. Since this is a Wald test, it will tend to over-reject in finite samples. The score and LR tests are also possibilities, but they require estimation methods for nonlinear models, which aren't in the scope of this course.
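
A sketch of this Wald test for a hypothetical nonlinear restriction $r(\beta) = \beta_1\beta_2 - 1 = 0$ on simulated data (the restriction and its Jacobian are ours, chosen only for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, K, q = 200, 3, 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)   # b1 * b2 = 1 holds

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2_hat = resid @ resid / (n - K)

# Hypothetical nonlinear restriction r(beta) = b1 * b2 - 1 = 0
r_hat = np.array([beta[1] * beta[2] - 1.0])
R_hat = np.array([[0.0, beta[2], beta[1]]])              # Jacobian of r at beta_hat

W = r_hat @ np.linalg.solve(sigma2_hat * (R_hat @ XtX_inv @ R_hat.T), r_hat)
print(W, stats.chi2.sf(W, df=q))
```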

Empirical comparison of LM, Wald and score tests

Example 1: Let's assume that the DGP satisfies
$$\ln C = \beta_1 + \beta_2\ln Q + \beta_3\ln P_L + \beta_4\ln P_F + \beta_5\ln P_K + \epsilon,$$
where the variables are cost ($C$), output ($Q$), price of labor ($P_L$), price of fuel ($P_F$) and price of capital ($P_K$). The following restrictions are imposed:
- Homogeneity of degree one in prices (HOD1), i.e., $\sum_{i=3}^{5}\beta_i = 1$.
- Constant returns to scale (CRTS) technology, i.e., $\gamma = 1/\beta_2 = 1$, where $\beta_2$ is the coefficient on $\ln Q$.

Compare the Wald and score tests.

Empirical comparison of LM, Wald and score tests - 2

Example 1 (cont.): HOD1.

[Figure: comparison of the tests for the HOD1 restriction; four panels, two for sample size 25 and two for sample size 1000, with both axes running from 0 to 1.]

Empirical comparison of LM, Wald and score tests - 3

Example 1 (cont.): CRTS.

[Figure: comparison of the tests for the CRTS restriction; four panels, two for sample size 25 and two for sample size 1000, with both axes running from 0 to 1.]

Empirical comparison of LM, Wald and score tests - 4

Example 1 (cont.): Estimated sizes.

[Figure: two panels plotting the estimated sizes of the tests against sample sizes from 25 to 1000; the estimated sizes range roughly from 0.03 to 0.09.]