
Practical Econometrics for Finance and Economics (Econometrics 2)

Seppo Pynnönen and Bernd Pape
Department of Mathematics and Statistics, University of Vaasa

1. Introduction

1.1 Econometrics

Econometrics is a discipline of statistics specialized in using and developing mathematical and statistical tools for the empirical estimation of economic relationships, the testing of economic theories, making economic predictions, and evaluating government and business policy.

Data: nonexperimental (observational)

Major tool: regression analysis (in a wide sense)

1.2 Types of Economic Data

(a) Cross-sectional Data

Data collected at a given point in time, e.g. a sample of households or firms, from each of which a number of variables such as turnover, operating margin, and market value of shares are measured. From an econometric point of view it is important that the observations constitute a random sample from the underlying population.

(b) Time Series Data

A time series consists of observations on one or more variables over time. Typical examples are daily share prices, interest rates, and CPI values. An important additional feature compared to cross-sectional data is the ordering of the observations, which may convey important information. A further feature is the data frequency, which may require special attention.

(c) Pooled Cross-sections

Pooled cross-sections have both time series and cross-section features. For example, a number of firms are randomly selected, say in 1990, and another sample is selected in 2000. If the same features are measured in both samples, combining the two years forms a pooled cross-section data set.

Pooled cross-section data are analyzed in much the same way as usual cross-section data. However, it may be important to pay special attention to the fact that there are 10 years in between. Usually the interest is in whether there are important changes between the time points. The statistical tools are usually the same as those used for analyzing differences between two independently sampled populations.

(d) Panel Data

Panel data (longitudinal data) consist of (time series) data on the same cross-section units over time. Panel data allow the analysis of much richer dependencies than pure cross-section data.

Example 1.1: Job training data from Holzer et al. (1993). Are training subsidies effective? The Michigan experience, Industrial and Labor Relations Review 19, 625-636.

Excerpt from the data (missing values denoted by "."):

year  fcode   employ  sales     avgsal
1987  410032  100     4.70E+07  35000
1988  410032  131     4.30E+07  37000
1989  410032  123     4.90E+07  39000
1987  410440  12      1560000   10500
1988  410440  13      1970000   11000
1989  410440  14      2350000   11500
1987  410495  20      750000    17680
1988  410495  25      110000    18720
1989  410495  24      950000    19760
1987  410500  200     2.37E+07  13729
1988  410500  155     1.97E+07  14287
1989  410500  80      2.60E+07  15758
1987  410501  .       6000000   .
1988  410501  .       8000000   .
1989  410501  .       1.00E+07  .
etc.

1.3 The linear regression model

The linear regression model is the single most useful tool in econometrics.

Assumption: each observation i, i = 1, ..., n, is generated by the underlying process described by

(1)  y_i = β_0 + β_1 x_{i1} + ⋯ + β_k x_{ik} + u_i,

where y_i is the dependent or explained variable, x_{i1}, x_{i2}, ..., x_{ik} are the independent or explanatory variables, u_i is the error term, β_1, ..., β_k are the regression (slope) coefficients, and β_0 is the intercept term or constant term.

A notational convenience:

(2)  y_i = x_i′β + u_i,

where x_i = (1, x_{i1}, ..., x_{ik})′ and β = (β_0, β_1, ..., β_k)′ are (k + 1)-dimensional column vectors. Stacking the x-observation vectors into an n × (k + 1) matrix

(3)  X = (x_1, x_2, ..., x_i, ..., x_n)′ =
     [ 1  x_11  x_12  ...  x_1k ]
     [ 1  x_21  x_22  ...  x_2k ]
     [ ⋮   ⋮     ⋮          ⋮   ]
     [ 1  x_i1  x_i2  ...  x_ik ]
     [ ⋮   ⋮     ⋮          ⋮   ]
     [ 1  x_n1  x_n2  ...  x_nk ]

we can write

(4)  y = Xβ + u,

where y = (y_1, ..., y_n)′ and u = (u_1, ..., u_n)′.

Example 1.2: In Example 1.1 the interest is in whether a grant for employee education decreases product failures. The estimated model is assumed to be

(5)  log(scrap) = β_0 + β_1 grant + β_2 grant_1 + u,

where scrap is the scrap rate (per 100 items),

grant = 1 if the firm received a grant in year t, grant = 0 otherwise, and

grant_1 = 1 if the firm received a grant in the previous year, grant_1 = 0 otherwise.

The above model does not take into account that the data consist of three consecutive yearly measurements on the same firms (i.e., panel data).

Ordinary Least Squares (OLS) estimation yields (Stata):

regress lscrap grant grant_1

      Source |       SS         df       MS            Number of obs =     162
-------------+------------------------------           F(  2,   159) =    0.30
       Model |  1.34805124      2   .67402562          Prob > F      =  0.7395
    Residual |  354.397022    159  2.22891209          R-squared     =  0.0038
-------------+------------------------------           Adj R-squared = -0.0087
       Total |  355.745073    161  2.20959673          Root MSE      =   1.493

------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------------
       grant |   .0543534    .310501     0.18   0.861
     grant_1 |  -.2652102     .36995    -0.72   0.474
       _cons |   .4150563    .139828     2.97   0.003
------------------------------------------------------

Neither of the coefficients is statistically significant, and grant even has a positive sign, although close to zero. When dealing with panel estimation later, we will see that the situation can be improved.
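
As a cross-check, a pooled OLS regression of this kind can be reproduced with standard statistical software. The short Python sketch below uses statsmodels; the file name jtrain.csv and the availability of the variables lscrap, grant and grant_1 in it are assumptions for illustration, not part of the original example.

# Illustrative sketch: pooled OLS of lscrap on grant and grant_1 with statsmodels.
# The path "jtrain.csv" and the column names are assumed, not taken from the notes.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("jtrain.csv")                          # hypothetical path to the job-training data
df = df.dropna(subset=["lscrap", "grant", "grant_1"])   # keep complete observations

# Pooled OLS, ignoring the panel structure (as in the example above)
model = smf.ols("lscrap ~ grant + grant_1", data=df).fit()
print(model.summary())                                  # coefficients, std. errors, t-values, R-squared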

The problem with the above estimation is that the OLS assumptions are usually not met in panel data. This will be discussed in the next chapter.

The OLS assumptions are:

(i)   E[u_i | X] = 0 for all i,
(ii)  Var[u_i | X] = σ_u² for all i,
(iii) Cov[u_i, u_j | X] = 0 for all i ≠ j,
(iv)  X is an n × (k + 1) matrix with rank k + 1.

Remark 1.1: Assumption (i) implies

(6)  Cov[u_i, X] = 0,

which is crucial in OLS estimation.

Under assumptions (i)-(iv) the OLS estimator

(7)  β̂ = (X′X)^{-1} X′y

is the Best Linear Unbiased Estimator (BLUE) of the regression coefficients β of the linear model in equation (4). This is known as the Gauss-Markov theorem.

The variance-covariance matrix of β̂ is

(8)  Var[β̂] = σ_u² (X′X)^{-1},

which depends upon the unknown variance σ_u² of the error terms u_i. In order to obtain an estimator for Var[β̂], use the residuals

(9)  û = y − Xβ̂

in order to calculate

(10)  s_u² = û′û / (n − k − 1),

which is an unbiased estimator of the error variance σ_u². Then replace σ_u² in (8) with s_u² in order to obtain

(11)  V̂ar[β̂] = s_u² (X′X)^{-1}

as an unbiased estimator of Var[β̂].
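
For illustration, the following short Python/numpy sketch computes equations (7)-(11) directly from the matrix formulas. The simulated data and the parameter values are purely illustrative assumptions.

# Minimal sketch of (7)-(11): OLS coefficients, residuals, error variance and
# coefficient covariance matrix, on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1), first column = constant
beta_true = np.array([1.0, 0.5, -0.3])
y = X @ beta_true + rng.normal(size=n)                       # u ~ N(0, 1)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)                 # (7)  (X'X)^{-1} X'y
u_hat = y - X @ beta_hat                                     # (9)  residuals
s2_u = u_hat @ u_hat / (n - k - 1)                           # (10) unbiased error variance
var_beta_hat = s2_u * np.linalg.inv(X.T @ X)                 # (11) estimated Var[beta_hat]
se_beta_hat = np.sqrt(np.diag(var_beta_hat))                 # standard errors

print(beta_hat, se_beta_hat)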

1.4 Regression statistics

Sum of Squares (SS) identity:

(12)  SST = SSR + SSE,

where

(13)  Total:     SST = Σ_{i=1}^n (y_i − ȳ)²,
(14)  Model:     SSR = Σ_{i=1}^n (ŷ_i − ȳ)²,
(15)  Residual:  SSE = Σ_{i=1}^n (y_i − ŷ_i)²,

with ŷ_i = x_i′β̂ and ȳ = (1/n) Σ_{i=1}^n y_i, the sample mean.

Goodness of fit: R-square, R²

(16)  R² = SSR/SST = 1 − SSE/SST.

Adjusted R-square (Adj R-square), R̄²:

(17)  R̄² = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)] = 1 − s_u²/s_y²,

where

(18)  s_u² = (1/(n − k − 1)) Σ_{i=1}^n û_i² = SSE/(n − k − 1)

is an estimator of the variance σ_u² = Var[u_i] of the error term (s_u = √(s_u²), Root MSE in the Stata output), and

(19)  s_y² = (1/(n − 1)) Σ_{i=1}^n (y_i − ȳ)²

is the sample variance of y.
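
A minimal sketch of the goodness-of-fit formulas (12)-(19) in Python, assuming numpy arrays y and y_hat of observed and fitted values are already available:

# Sum-of-squares decomposition, R-squared and adjusted R-squared for a model
# with k regressors (plus a constant).
import numpy as np

def r_squared(y, y_hat, k):
    n = len(y)
    sst = np.sum((y - y.mean()) ** 2)                      # (13) total sum of squares
    sse = np.sum((y - y_hat) ** 2)                         # (15) residual sum of squares
    ssr = sst - sse                                        # (12)/(14), valid for OLS with an intercept
    r2 = 1.0 - sse / sst                                   # (16)
    r2_adj = 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))   # (17)
    return r2, r2_adj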

1.5 Inference

Assumption (v): u ~ N(0, σ_u² I), where I is an n × n identity matrix.

Individual coefficient restrictions:

Hypotheses are of the form

(20)  H_0: β_j = β_j*,

where β_j* is a given constant.

t-statistic:

(21)  t = (β̂_j − β_j*) / s.e.(β̂_j),

where

(22)  s.e.(β̂_j) = s_u √[(X′X)^{-1}_{jj}],

and (X′X)^{-1}_{jj} is the jth diagonal element of (X′X)^{-1} (first diagonal element for β̂_0, second diagonal element for β̂_1, etc.).

Confidence intervals:

A 100(1 − α)% confidence interval for a single parameter is of the form

(23)  β̂_j ± t_{α/2} s.e.(β̂_j),

where t_{α/2} is the 1 − α/2 percentile of the t-distribution with df = n − k − 1 degrees of freedom, which may be obtained from Excel with the command TINV(α, df).
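
The t-statistic (21), its p-value, and the confidence interval (23) can also be computed, for instance, in Python with scipy; the numbers in the example call below are hypothetical.

# t-test and confidence interval for a single coefficient, given the estimate,
# its standard error, and the residual degrees of freedom df = n - k - 1.
from scipy import stats

def t_test_and_ci(beta_hat_j, se_j, df, beta_null=0.0, alpha=0.05):
    t_stat = (beta_hat_j - beta_null) / se_j               # (21)
    p_value = 2 * stats.t.sf(abs(t_stat), df)              # two-sided p-value
    t_crit = stats.t.ppf(1 - alpha / 2, df)                # 1 - alpha/2 percentile (cf. Excel TINV)
    ci = (beta_hat_j - t_crit * se_j, beta_hat_j + t_crit * se_j)   # (23)
    return t_stat, p_value, ci

print(t_test_and_ci(0.28, 0.08, df=22))                    # illustrative numbers only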

F-test:

The overall hypothesis that none of the explanatory variables influence the y-variable, i.e.,

(24)  H_0: β_1 = β_2 = ⋯ = β_k = 0,

is tested by an F-test of the form

(25)  F = [SSR/k] / [SSE/(n − k − 1)],

which is F-distributed with degrees of freedom f_1 = k and f_2 = n − k − 1 if the null hypothesis is true.
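
A small Python sketch of the overall F-test (25); the example call uses the Model and Residual sums of squares from the Stata output of Example 1.2 as a check.

# Overall F-test of H0: beta_1 = ... = beta_k = 0, from the SS decomposition.
from scipy import stats

def overall_f_test(ssr, sse, n, k):
    f_stat = (ssr / k) / (sse / (n - k - 1))               # (25)
    p_value = stats.f.sf(f_stat, k, n - k - 1)             # upper-tail probability
    return f_stat, p_value

# Example 1.2: Model SS = 1.34805124, Residual SS = 354.397022, n = 162, k = 2
print(overall_f_test(1.34805124, 354.397022, n=162, k=2))  # F ≈ 0.30, p ≈ 0.74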

General (linear) restrictions:

(26)  H_0: Rβ = q,

where R is a fixed m × (k + 1) matrix and q is a fixed m-vector; m indicates the number of independent linear restrictions imposed on the coefficients. The alternative hypothesis is

(27)  H_1: Rβ ≠ q.

The null hypothesis in (26) can be tested with an F-statistic of the form

(28)  F = [(SSE_R − SSE_U)/m] / [SSE_U/(n − k − 1)],

which under the null hypothesis has the F-distribution with degrees of freedom f_1 = m and f_2 = n − k − 1. SSE_R and SSE_U denote the residual sums of squares obtained from the restricted and unrestricted models, respectively.
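
The restriction test (28) only needs the residual sums of squares of the restricted and unrestricted fits; a minimal Python sketch:

# F-test of m linear restrictions from restricted and unrestricted SSE.
from scipy import stats

def restriction_f_test(sse_r, sse_u, m, n, k):
    f_stat = ((sse_r - sse_u) / m) / (sse_u / (n - k - 1))   # (28)
    p_value = stats.f.sf(f_stat, m, n - k - 1)
    return f_stat, p_value

In statsmodels, essentially the same test is also available directly through the fitted model's f_test method.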

Example 1.3: Consider the model

(29)  y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_4 x_4 + u.

In terms of the general linear hypothesis (26), testing a single coefficient, e.g.,

(30)  H_0: β_1 = 0,

is obtained by selecting

(31)  R = (0 1 0 0 0)  and  q = 0.

The null hypothesis in (24), i.e.,

(32)  H_0: β_1 = β_2 = β_3 = β_4 = 0,

is obtained by selecting

(33)  R = [ 0 1 0 0 0 ]          [ 0 ]
          [ 0 0 1 0 0 ]   q =    [ 0 ]
          [ 0 0 0 1 0 ]          [ 0 ]
          [ 0 0 0 0 1 ],         [ 0 ].

The hypothesis

(34)  H_0: β_1 + β_2 = 1, β_3 = β_4

corresponds to

(35)  R = [ 0 1 1 0  0 ]   and   q = [ 1 ]
          [ 0 0 0 1 −1 ]             [ 0 ].

Example 1.4: Consider the following consumption function (C = consumption, Y = disposable income):

(36)  C_t = β_0 + β_1 Y_t + β_2 C_{t−1} + u_t.

Then β_1 = dC_t/dY_t is called the short-run MPC (marginal propensity to consume). The long-run MPC, β_lrmpc = dE(C)/dE(Y), is

(37)  β_lrmpc = β_1 / (1 − β_2).

Test the hypothesis that the long-run MPC = 1, i.e.,

(38)  H_0: β_1 / (1 − β_2) = 1.

This is equivalent to β_1 + β_2 = 1. Thus, the nonlinear hypothesis (38) reduces in this case to the linear hypothesis

(39)  H_0: β_1 + β_2 = 1,

and we can use the general linear hypothesis of the form (26) with

(40)  R = (0 1 1)  and  q = 1.

Remark 1.2: Hypotheses of the form (39) can easily be tested with the standard t-test by re-parameterizing the model. Defining Z_t = C_{t−1} − Y_t, equation (36) is (statistically) equivalent to

(41)  C_t = β_0 + γ Y_t + β_2 Z_t + u_t,

where γ = β_1 + β_2. Thus, in terms of (41), testing hypothesis (38) reduces to testing

(42)  H_0: γ = 1,

which can be worked out with the usual t-statistic

(43)  t = (γ̂ − 1) / s.e.(γ̂).

Example 1.5: Generalized Cobb-Douglas production function in the transportation industry. Y = value added (output), L = labor, K = capital, and N = the number of establishments in the transportation industry.

(44)  log(Y/N) = β_0 + β_1 log(K/N) + β_2 log(L/N) + u.

Estimation results:

Dependent Variable: LOG(VALUEADD/NFIRM)
Method: Least Squares
Sample: 1 25
Included observations: 25

Variable              Coefficient   Std. Error   t-Statistic   Prob.
C                        2.293263     0.107183     21.39582    0.0000
LOG(CAPITAL/NFIRM)       0.278982     0.080686     3.457639    0.0022
LOG(LABOR/NFIRM)         0.927312     0.098322     9.431359    0.0000

R-squared            0.959742    Mean dependent var     0.771734
Adjusted R-squared   0.956082    S.D. dependent var     0.899306
S.E. of regression   0.188463    Akaike info criter.   -0.387663
Sum squared resid    0.781403    Schwarz criterion     -0.241398
Log likelihood       7.845786    Hannan-Quinn criter.  -0.347095
F-statistic          262.2396    Durbin-Watson stat     1.937830
Prob(F-statistic)    0.000000

Zellner, A. and N. Revankar (1970). Generalized production functions, Review of Economic Studies 37, 241-250. Data source: http://people.stern.nyu.edu/wgreene/text/econometricanalysis.htm, Table F7.2.

According to the results, the capital elasticity is 0.279 and the labor elasticity is 0.927; production is thus labor intensive.

Remark 1.3: Estimates of the regression parameters under restrictions of the form Rβ = q are obtained by restricted least squares, provided by modern statistical packages.

Let us test for constant returns to scale, i.e.,

(45)  H_0: β_1 + β_2 = 1.

The general restricted hypothesis method (26) yields

Test statistic   Value      df        p-value
F-statistic      14.82203   (1, 22)   0.0009

which rejects the null hypothesis.

In order to demonstrate the re-parametrization approach, define the regression model

(46)  log(Y/N) = β_0 + γ log(K/N) + β_2 log(L/K) + u.

Estimation of this specification yields

Dependent Variable: LOG(VALUEADD/NFIRM)
Method: Least Squares
Sample: 1 25
Included observations: 25

Variable              Coefficient   Std. Error   t-Statistic   Prob.
C                        2.293263     0.107183     21.39582    0.0000
LOG(CAPITAL/NFIRM)       1.206294     0.053584     22.51232    0.0000
LOG(LABOR/CAPITAL)       0.927312     0.098322     9.431359    0.0000

R-squared            0.959742    Mean dependent var     0.771734
Adjusted R-squared   0.956082    S.D. dependent var     0.899306
S.E. of regression   0.188463    Akaike info criter.   -0.387663
Sum squared resid    0.781403    Schwarz criterion     -0.241398
Log likelihood       7.845786    Hannan-Quinn criter.  -0.347095
F-statistic          262.2396    Durbin-Watson stat     1.937830
Prob(F-statistic)    0.000000

All goodness-of-fit statistics of these models are exactly the same, indicating the equivalence of the models in a statistical sense. The null hypothesis of constant returns to scale in terms of this model is

(47)  H_0: γ = 1.

The t-value is

(48)  t = (γ̂ − 1) / s.e.(γ̂) = (1.206294 − 1) / 0.053584 ≈ 3.85

with p-value = 0.0009, exactly the same as above, again rejecting the null hypothesis.
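
The restriction test and the re-parametrization of Example 1.5 can also be reproduced, for instance, with Python's statsmodels. The sketch below is illustrative only: the file name and the column names (valueadd, capital, labor, nfirm) are assumptions about how the Table F7.2 data are stored.

# Sketch of the constant-returns-to-scale test on (assumed) Table F7.2 data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("tablef7-2.csv")                    # hypothetical path and column names
df["ly"] = np.log(df["valueadd"] / df["nfirm"])
df["lk"] = np.log(df["capital"] / df["nfirm"])
df["ll"] = np.log(df["labor"] / df["nfirm"])

fit = smf.ols("ly ~ lk + ll", data=df).fit()
print(fit.f_test("lk + ll = 1"))                     # F-test of H0: beta_1 + beta_2 = 1, cf. (45)

# Re-parametrization (46): regress on lk and log(L/K); gamma is the coefficient on lk
df["llk"] = df["ll"] - df["lk"]
fit2 = smf.ols("ly ~ lk + llk", data=df).fit()
t_stat = (fit2.params["lk"] - 1) / fit2.bse["lk"]    # (48)
print(t_stat)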

1.6 Nonlinear hypotheses

Economic theory sometimes implies nonlinear hypotheses. In fact, the long-run MPC example involved a nonlinear hypothesis, which we could transform into a linear one. This is not always possible. For example, a hypothesis of the form

(49)  H_0: β_1 β_2 = 1

is nonlinear.

Nonlinear hypotheses can be tested using the Wald test, the Lagrange multiplier (LM) test, or the likelihood ratio (LR) test. Under the null hypothesis all of these are asymptotically χ²-distributed with degrees of freedom equal to the number of restrictions imposed on the parameters. These tests will be considered more closely later, after a brief discussion of maximum likelihood estimation.

1.7 Maximum Likelihood Estimation

Likelihood Function

Generally, suppose that the probability distribution of a random variable Y depends on a set of parameters θ = (θ_1, ..., θ_q); then the probability density of the random variable is denoted by f_Y(y; θ). If for example Y ~ N(µ, σ²), then θ = (µ, σ²) and

(50)  f_Y(y; µ, σ²) = (1/√(2πσ²)) exp[−(y − µ)²/(2σ²)].

In probability calculus we consider θ as given and use the density f_Y to calculate the probability that Y attains a value near y as

P(y − Δy ≤ Y ≤ y + Δy) = ∫_{y−Δy}^{y+Δy} f_Y(u; θ) du.

In maximum likelihood estimation we consider the data point y as given and ask which parameter set θ most likely produced it. In that context f_Y(y; θ) is called the likelihood of observation y on the random variable Y.

In statistical analysis we may regard a sample of observations y_1, ..., y_n as realisations (observed values) of independent random variables Y_1, ..., Y_n. If all random variables are identically distributed, that is, they all share the same density f(y_i; θ), then the likelihood function of (y_1, ..., y_n) is the product of the likelihoods of the individual points, that is,

(51)  L(θ) ≡ L(θ; y_1, ..., y_n) = ∏_{i=1}^n f(y_i; θ).

Taking (natural) logarithms on both sides, we get the log-likelihood function

(52)  l(θ) ≡ log L(θ) = Σ_{i=1}^n log f(y_i; θ).

Denoting the log-likelihoods of individual observations by l_i(θ) = log f(y_i; θ), we can write (52) as

(53)  l(θ) = Σ_{i=1}^n l_i(θ).

Example 1.6: Under the normality assumption on the error term u_i in the regression

(54)  y_i = x_i′β + u_i,

(55)  u_i ~ N(0, σ_u²),

it follows that, given x_i,

(56)  y_i | x_i ~ N(x_i′β, σ_u²).

Thus, with θ = (β, σ_u²), the (conditional) density function is

(57)  f(y_i | x_i; θ) = (1/√(2πσ_u²)) exp[−(y_i − x_i′β)²/(2σ_u²)],

(58)  l_i(θ) = −(1/2) log(2π) − (1/2) log σ_u² − (y_i − x_i′β)²/(2σ_u²),

and

(59)  l(θ) = −(n/2) log(2π) − (n/2) log σ_u² − (1/(2σ_u²)) Σ_{i=1}^n (y_i − x_i′β)².

In matrix form (59) becomes

(60)  l(θ) = −(n/2) log(2π) − (n/2) log σ_u² − (1/(2σ_u²)) (y − Xβ)′(y − Xβ).
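
As an illustration, the log-likelihood (60) can also be maximized numerically. The following Python sketch uses simulated data (an assumption purely for illustration) and compares the numerical MLE of β with the OLS estimate, which coincide under normality.

# Numerical MLE of the linear regression model by minimizing -l(theta) from (60).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 2.0]) + rng.normal(size=n)

def neg_loglik(theta):
    beta, log_s2 = theta[:-1], theta[-1]       # parametrize sigma_u^2 = exp(log_s2) > 0
    s2 = np.exp(log_s2)
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi) + 0.5 * n * np.log(s2) + resid @ resid / (2 * s2)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_mle, s2_mle = res.x[:-1], np.exp(res.x[-1])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_mle, beta_ols, s2_mle)              # beta_mle ≈ beta_ols; s2_mle ≈ SSE/n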

Maximum Likelihood Estimate

We say that the parameter vector θ is identified or estimable if for any other parameter vector θ* ≠ θ and for some data y, L(θ*; y) ≠ L(θ; y).

Given data y, the maximum likelihood estimate (MLE) of θ is the value θ̂ of the parameter for which

(61)  L(θ̂) = max_θ L(θ),

i.e., the parameter value that maximizes the likelihood function. The MLE of a parameter vector θ solves

(62)  ∂L(θ; y)/∂θ_i = 0,  i = 1, ..., q,

provided the matrix of second derivatives is negative definite. In practice it is usually more convenient to maximize the log-likelihood, such that the MLE of θ is the value θ̂ which satisfies

(63)  l(θ̂) = max_θ l(θ).

Example 1.7: Consider the simple regression model

(64)  y_i = β_0 + β_1 x_i + u_i,  with u_i ~ N(0, σ_u²).

Given a sample of observations (y_1, x_1), (y_2, x_2), ..., (y_n, x_n), the log-likelihood is

(65)  l(θ) = −(n/2) log(2π) − (n/2) log σ_u² − (1/2) Σ_{i=1}^n (y_i − β_0 − β_1 x_i)²/σ_u²,

where θ = (β_0, β_1, σ_u²).

The maximum of (65) can be found by setting the partial derivatives to zero, that is,

(66)  ∂l/∂β_0 = Σ_{i=1}^n (y_i − β_0 − β_1 x_i)/σ_u² = 0,

      ∂l/∂β_1 = Σ_{i=1}^n x_i (y_i − β_0 − β_1 x_i)/σ_u² = 0,

      ∂l/∂σ_u² = −n/(2σ_u²) + (1/(2(σ_u²)²)) Σ_{i=1}^n (y_i − β_0 − β_1 x_i)² = 0.

Solving these gives

(67)  β̂_1 = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)²,

(68)  β̂_0 = ȳ − β̂_1 x̄,

and

(69)  σ̂_u² = (1/n) Σ_{i=1}^n û_i²,

where

(70)  û_i = y_i − β̂_0 − β̂_1 x_i

is the regression residual, and ȳ and x̄ are the sample means of y_i and x_i. In this particular case the ML estimators of the regression parameters β_0 and β_1 coincide with the OLS estimators. In OLS the estimator of the error variance σ_u² is

(71)  s² = (1/(n − 2)) Σ_{i=1}^n û_i² = (n/(n − 2)) σ̂_u².

Properties of Maximum Likelihood Estimators

Let θ_0 be the population value of the parameter (vector) θ and let θ̂ be the MLE of θ_0. Then

(a) Consistency: plim θ̂ = θ_0, i.e., θ̂ is a consistent estimator of θ_0.

(b) Asymptotic normality: θ̂ ~ N(θ_0, I(θ_0)^{-1}) asymptotically, where

(72)  I(θ_0) = −E[∂²l(θ)/∂θ∂θ′]|_{θ=θ_0}.

That is, θ̂ is asymptotically normally distributed. I(θ_0) is called the Fisher information matrix, and

(73)  H = ∂²l(θ)/∂θ∂θ′

is called the Hessian of the log-likelihood.

(c) Asymptotic efficiency: θ̂ is asymptotically efficient. That is, in the limit as the sample size grows, the MLE is unbiased and its (limiting) variance is smallest among estimators that are asymptotically unbiased.

(d) Invariance: The MLE of γ_0 = g(θ_0) is g(θ̂), where g is a (continuously differentiable) function.

Example 1.8: In Example 1.7 the MLE of the error variance σ_u² is given by σ̂_u² defined in equation (69). Using property (d), the MLE of the standard deviation σ_u = √(σ_u²) is σ̂_u = √(σ̂_u²).

Remark 1.5: The inverse of the Fisher information matrix defined in (72), I(θ_0)^{-1}, plays a similar role in MLE as σ²(X′X)^{-1} does in OLS, i.e. it may be used to find the standard errors of the ML estimators.

Example 1.9: Consider MLE from n observations on a normally distributed random variable with unknown parameter vector θ = (µ, σ²). The log-likelihood function is, in analogy to (65),

(74)  l(θ) = −(n/2) log(2π) − (n/2) log σ² − (1/2) Σ_{i=1}^n (y_i − µ)²/σ².

The first partial derivatives are

(75)  ∂l/∂µ = Σ (y_i − µ)/σ²,   ∂l/∂σ² = −n/(2σ²) + (1/(2(σ²)²)) Σ_{i=1}^n (y_i − µ)².

Setting these equal to zero yields the ML estimators

(76)  µ̂ = (1/n) Σ_{i=1}^n y_i = ȳ   and   σ̂² = (1/n) Σ_{i=1}^n (y_i − ȳ)².

The second derivatives are

(77)  ∂²l/∂µ² = −n/σ²,   ∂²l/∂(σ²)² = n/(2(σ²)²) − Σ (y_i − µ)²/(σ²)³,   and   ∂²l/∂µ∂σ² = ∂²l/∂σ²∂µ = −Σ (y_i − µ)/(σ²)²,

such that the Hessian matrix becomes

(78)  H = [ −n/σ²                 −Σ(y_i − µ)/(σ²)²                ]
          [ −Σ(y_i − µ)/(σ²)²      n/(2(σ²)²) − Σ(y_i − µ)²/(σ²)³ ].

Taking expectations and multiplying by −1 yields the information matrix

(79)  I(θ_0) = [ n/σ²    0       ]
               [ 0       n/(2σ⁴) ]

with inverse

(80)  I(θ_0)^{-1} = [ σ²/n    0     ]
                    [ 0       2σ⁴/n ].

The standard errors of the ML estimators are found by taking the square roots of the diagonal elements of I(θ̂_0)^{-1}, that is, σ̂/√n is the standard error of µ̂, and σ̂²√(2/n) is the standard error of σ̂². These may be used to construct confidence intervals in the usual way.
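
A short numerical illustration of Example 1.9 in Python, with simulated data (the sample and its parameters are assumptions for illustration only):

# ML estimates of (mu, sigma^2) for a normal sample and their standard errors
# from the inverse information matrix (80).
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=10.0, scale=2.0, size=500)
n = len(y)

mu_hat = y.mean()                              # (76)
s2_hat = np.mean((y - mu_hat) ** 2)            # (76), divisor n rather than n - 1

se_mu = np.sqrt(s2_hat / n)                    # sqrt of the first diagonal element of (80)
se_s2 = s2_hat * np.sqrt(2.0 / n)              # sqrt of the second diagonal element of (80)
print(mu_hat, se_mu, s2_hat, se_s2)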

1.8 Likelihood Ratio, Wald, and Lagrange Multiplier tests

For testing general restrictions of the form

(81)  H_0: c(θ) = 0,

where c(·) is some (vector-valued) function, there are three general-purpose test methods, discussed on the following slides.

Remark 1.6: We could alternatively specify the above hypothesis as

(82)  H_0: r(θ) = q,

where r(·) is some function and q is some constant. Defining c(θ) = r(θ) − q then reduces this to hypothesis (81) stated above.

1. Likelihood ratio test (LR test)

(83)  LR = −2 log(L_R/L_U) = −2(l_R − l_U),

where

L_R = max_{θ: c(θ)=0} L(θ)

is the maximum of the likelihood under the restriction of hypothesis (81), and

L_U = max_θ L(θ)

is the unrestricted maximum of the likelihood function (l_U = log L_U and l_R = log L_R).

Remark 1.7: Use of the LR test requires computing both the restricted MLE of θ (to compute l_R) and the unrestricted MLE (to compute l_U).

Example 1.10: The LR test for the linear regression model is

(84)  LR = n (log SSE_R − log SSE_U),

where SSE_R and SSE_U are the residual sums of squares of the restricted and unrestricted model, respectively, and where this time arbitrary (not only linear) restrictions are allowed. To see that (84) holds, insert the residual sum of squares SSE = Σ_{i=1}^n (y_i − x_i′β̂)² and the ML estimate σ̂_u² = SSE/n into (60):

(85)  l(θ̂) = −(n/2) log(2π) − (n/2) log(SSE/n) − (n/(2 SSE)) SSE
           = −(n/2) [1 + log(2π) + log(SSE/n)].

Hence

LR = −2(l_R − l_U) = n [log(SSE_R/n) − log(SSE_U/n)] = n (log SSE_R − log SSE_U).
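
A minimal Python sketch of the LR statistic (84); the data are simulated and the restriction β_1 = β_2 is used purely as an illustrative example.

# LR test of one linear restriction from restricted and unrestricted OLS fits.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.6 * x1 + 0.4 * x2 + rng.normal(size=n)

def sse(y, X):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

X_u = np.column_stack([np.ones(n), x1, x2])        # unrestricted model
X_r = np.column_stack([np.ones(n), x1 + x2])       # restricted model: beta_1 = beta_2
lr = n * (np.log(sse(y, X_r)) - np.log(sse(y, X_u)))    # (84)
p_value = stats.chi2.sf(lr, df=1)                  # one restriction
print(lr, p_value)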

2. Wald test

(86)  W = c(θ̂)′ V^{-1} c(θ̂),

where V is the asymptotic variance-covariance matrix of c(θ̂).

Remark 1.8: Use of the Wald test requires only finding the unrestricted MLE.

The Wald test for linear regression with normally distributed errors is

(87)  W = (SSE_R − SSE_U) / [SSE_U/(n − k − 1)],

where k is the number of regressors (without the constant).

3. Lagrange multiplier test (LM)

(88)  LM = (∂l(θ̂_R)/∂θ)′ [I(θ̂_R)]^{-1} (∂l(θ̂_R)/∂θ),

where θ̂_R is the restricted MLE satisfying the restriction c(θ̂_R) = 0 of the general hypothesis (81).

Remark 1.9: Use of the LM test requires only the restricted MLE.

The Lagrange multiplier test for linear regression with normally distributed errors is

(89)  LM = (SSE_R − SSE_U) / [SSE_R/(n − k + q − 1)],

where k is the number of regressors (without the constant) and q is the number of restrictions.
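
For completeness, a small Python sketch computing the regression versions of the Wald (87) and LM (89) statistics; SSE_R and SSE_U are assumed to come from previously fitted restricted and unrestricted models.

# Wald and LM statistics for a regression with normal errors, with q restrictions.
from scipy import stats

def wald_lm(sse_r, sse_u, n, k, q):
    w = (sse_r - sse_u) / (sse_u / (n - k - 1))        # (87)
    lm = (sse_r - sse_u) / (sse_r / (n - k + q - 1))   # (89)
    p_w = stats.chi2.sf(w, df=q)                       # asymptotic chi-square(q) p-values
    p_lm = stats.chi2.sf(lm, df=q)
    return (w, p_w), (lm, p_lm)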

Under the null hypothesis (81), each of these test statistics is asymptotically χ²-distributed with degrees of freedom equal to the number of restrictions q. Thus they are asymptotically equivalent; in small samples, however, their numerical values may differ. Usually the LR test is preferred, because under fairly general conditions it can be shown to be the most powerful test.

Bear in mind that while the tests can be developed for arbitrary distributions of the error term, their exact form depends upon that distribution. That is, the test statistics (84), (87) and (89) apply only to regressions with normally distributed error terms. Also bear in mind that the tests apply only in large samples.