
Part III: Multiple Regression Analysis (as of Sep 26, 2017)

Contents

1 Multiple Regression Analysis
    Estimation
      Matrix form
    Goodness-of-Fit
      R-square
      Adjusted R-square
    Expected values of the OLS estimators
      Irrelevant variables in a regression
      Omitted variable bias
    Variances of the OLS estimators
      Multicollinearity
      Variances of misspecified models
    Estimating the error variance σ²_u
    Standard error of β̂_j
    The Gauss-Markov Theorem

The general linear regression with k explanatory variables is an extension of the simple regression:

  y = β_0 + β_1 x_1 + ... + β_k x_k + u.   (1)

Because

  ∂y/∂x_j = β_j,   j = 1, ..., k,   (2)

the coefficient β_j indicates the marginal effect of variable x_j: the amount y is expected to change as x_j changes by one unit while the other variables are kept constant (ceteris paribus).

Compared with simple regression, multiple regression opens up several additional options to enrich the analysis and make the modeling more realistic.

Example 3.1: Consider the consumption function C = f(Y), where Y is income. Suppose the hypothesis is that as income grows, the marginal propensity to consume decreases. In simple regression we could try to fit a level-log model or a log-log model. One further possibility is β_1 = β_1l + β_1q Y, where according to our hypothesis β_1q < 0. The consumption function then becomes

  C = β_0 + (β_1l + β_1q Y)Y + u = β_0 + β_1l Y + β_1q Y² + u.

This is a multiple regression model with x_1 = Y and x_2 = Y².

This simple example demonstrates that we can meaningfully enrich simple regression analysis (even though we have essentially only two variables, C and Y) and at the same time obtain a meaningful interpretation of the above polynomial model. Technically, starting from the simple regression C = β_0 + β_1 Y + v, the extension C = β_0 + β_1l Y + β_1q Y² + u means that we have extracted the quadratic term Y² out of the error term v of the simple regression.
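As a minimal sketch of how such a quadratic specification could be estimated (the consumption data here are simulated, since the example provides none), it suffices to include Y² as a second regressor:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated income and consumption with a decreasing marginal propensity to consume
n = 200
Y = rng.uniform(10, 100, n)                             # income
C = 5 + 0.9 * Y - 0.003 * Y**2 + rng.normal(0, 2, n)    # "true" model with beta_1q < 0

# Multiple regression with x1 = Y and x2 = Y^2
X = np.column_stack([np.ones(n), Y, Y**2])
beta_hat, *_ = np.linalg.lstsq(X, C, rcond=None)
print(beta_hat)   # estimates of (beta_0, beta_1l, beta_1q); beta_1q should come out negative
```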

Estimation

In order to estimate the model we replace classical Assumption 4 by:

4. The explanatory variables are linearly independent in the mathematical sense; that is, no column of the data matrix X (see below) is a linear combination of the other columns.

The key assumption (Assumption 5) in terms of the conditional expectation becomes

  E[u | x_1, ..., x_k] = 0.   (3)

Given observations (y_i, x_i1, ..., x_ik), i = 1, ..., n (n = the number of observations), the estimation method is again OLS, which produces the estimates β̂_0, β̂_1, ..., β̂_k by minimizing

  Σ_{i=1}^{n} (y_i − β_0 − β_1 x_i1 − ... − β_k x_ik)²   (4)

with respect to the parameters. Again, the first-order conditions set the (k + 1) partial derivatives equal to zero. The solution is straightforward, although the explicit form of the estimators becomes complicated.
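A small numerical sketch of this minimization (simulated data; scipy is assumed to be available): the objective below is exactly the sum of squares (4), and its minimizer satisfies the (k + 1) first-order conditions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, k = 100, 2
x = rng.normal(size=(n, k))
y = 1.0 + x @ np.array([0.5, -0.3]) + rng.normal(scale=0.5, size=n)

def ssr(b):
    """Sum of squared residuals (4) as a function of (beta_0, beta_1, ..., beta_k)."""
    resid = y - b[0] - x @ b[1:]
    return resid @ resid

# Minimize (4) numerically over the (k + 1) parameters
result = minimize(ssr, x0=np.zeros(k + 1))
print(result.x)   # close to the true values (1.0, 0.5, -0.3)
```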

Matrix form

Matrix algebra simplifies the notation of multiple regression considerably. Denote the observation vector on y as

  y = (y_1, ..., y_n)',   (5)

where the prime denotes transposition. In the same manner, denote the data matrix of the x-variables, augmented with ones in the first column, as the n × (k + 1) matrix

        | 1  x_11  x_12  ...  x_1k |
        | 1  x_21  x_22  ...  x_2k |
  X  =  | .   .     .          .   |   (6)
        | 1  x_n1  x_n2  ...  x_nk |

where k < n (the number of observations, n, is larger than the number of x-variables, k).

Then we can present the whole set of regression equations for the sample,

  y_1 = β_0 + β_1 x_11 + ... + β_k x_1k + u_1
  y_2 = β_0 + β_1 x_21 + ... + β_k x_2k + u_2
   ⋮
  y_n = β_0 + β_1 x_n1 + ... + β_k x_nk + u_n,   (7)

in matrix form as

  | y_1 |   | 1  x_11  x_12  ...  x_1k | | β_0 |   | u_1 |
  | y_2 | = | 1  x_21  x_22  ...  x_2k | | β_1 | + | u_2 |   (8)
  |  ⋮  |   |            ⋮             | |  ⋮  |   |  ⋮  |
  | y_n |   | 1  x_n1  x_n2  ...  x_nk | | β_k |   | u_n |

or shortly

  y = Xβ + u,   (9)

where β = (β_0, β_1, ..., β_k)'.

The normal equations for the first-order conditions of the minimization of (4) are, in matrix form, simply

  X'X β̂ = X'y,   (10)

which gives the explicit solution for the OLS estimator of β as

  β̂ = (X'X)⁻¹ X'y,   (11)

where β̂ = (β̂_0, β̂_1, ..., β̂_k)'. The fitted model is

  ŷ_i = β̂_0 + β̂_1 x_i1 + ... + β̂_k x_ik,   i = 1, ..., n.   (12)

Remark 3.1: For example, if you fit the simple regression ỹ = β̃_0 + β̃_1 x_1, where β̃_0 and β̃_1 are the OLS estimators, and also fit the multiple regression ŷ = β̂_0 + β̂_1 x_1 + β̂_2 x_2, then generally β̃_1 ≠ β̂_1 unless β̂_2 = 0, or x_1 and x_2 are uncorrelated.
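A minimal sketch of (10)-(12) with simulated data; in practice the normal equations are solved directly rather than by forming (X'X)⁻¹ explicitly.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 200, 3
x = rng.normal(size=(n, k))
beta_true = np.array([1.0, 0.5, -0.3, 2.0])        # (beta_0, beta_1, beta_2, beta_3)

X = np.column_stack([np.ones(n), x])               # n x (k+1) data matrix, cf. (6)
y = X @ beta_true + rng.normal(scale=0.7, size=n)  # model (9)

# OLS via the normal equations (10): solve X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # equals (X'X)^{-1} X'y as in (11)
y_fitted = X @ beta_hat                            # fitted values (12)
print(beta_hat)
```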

Example 3.2: Consider the hourly wage example and enhance the model to

  log(w) = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3,   (13)

where w = average hourly earnings, x_1 = years of education (educ), x_2 = years of labor market experience (exper), and x_3 = years with the current employer (tenure).

Dependent Variable: LOG(WAGE)
Method: Least Squares
Sample: 1 526, Included observations: 526

  Variable   Coefficient   Std. Error   t-Statistic   Prob.
  C           0.284360     0.104190      2.729230     0.0066
  EDUC        0.092029     0.007330     12.55525      0.0000
  EXPER       0.004121     0.001723      2.391437     0.0171
  TENURE      0.022067     0.003094      7.133070     0.0000

  R-squared            0.316013   Mean dependent var     1.623268
  Adjusted R-squared   0.312082   S.D. dependent var     0.531538
  S.E. of regression   0.440862   Akaike info criterion  1.207406
  Sum squared resid  101.4556     Schwarz criterion      1.239842
  Log likelihood    -313.5478     F-statistic           80.39092
  Durbin-Watson stat   1.768805   Prob(F-statistic)      0.000000
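A sketch of how this regression could be reproduced in Python with statsmodels, assuming a data frame with Wooldridge's wage1 variables (wage, educ, exper, tenure) has been loaded; the file name below is only a placeholder for whatever source of that data is available.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder path: any source of the wage1 data with these columns will do
df = pd.read_csv('wage1.csv')

# log(wage) on education, experience and tenure, cf. equation (13)
fit = smf.ols('np.log(wage) ~ educ + exper + tenure', data=df).fit()
print(fit.summary())   # coefficients should match the table above
```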

Goodness-of-Fit

Again, in the same manner as with the simple regression, we have

  Σ_{i=1}^{n} (y_i − ȳ)² = Σ_{i=1}^{n} (ŷ_i − ȳ)² + Σ_{i=1}^{n} (y_i − ŷ_i)²,   (14)

or

  SST = SSE + SSR,   (15)

where

  ŷ_i = β̂_0 + β̂_1 x_i1 + ... + β̂_k x_ik.   (16)

R-square

The R-square of the regression is

  R² = SSE/SST = 1 − SSR/SST.   (17)

Again, as in the case of the simple regression, R² can be shown to be the squared correlation coefficient between the actual y_i and the fitted ŷ_i. This correlation is called the multiple correlation,

  R = Σ_i (y_i − ȳ)(ŷ_i − ŷ̄) / sqrt( Σ_i (y_i − ȳ)² · Σ_i (ŷ_i − ŷ̄)² ),   (18)

where ŷ̄ denotes the mean of the fitted values.

Remark 3.2: ŷ̄ = ȳ.

Remark 3.3: R² never decreases when an explanatory variable is added to the model.
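A short sketch (simulated data) verifying the decomposition (14)-(15) and the fact that R² in (17) equals the squared multiple correlation (18):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.8, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

sst = np.sum((y - y.mean())**2)        # total sum of squares
sse = np.sum((y_hat - y.mean())**2)    # explained sum of squares
ssr = np.sum((y - y_hat)**2)           # residual sum of squares

r2 = 1 - ssr / sst                     # equation (17)
r_multiple = np.corrcoef(y, y_hat)[0, 1]
print(np.isclose(sst, sse + ssr))      # decomposition (14)-(15)
print(np.isclose(r2, r_multiple**2))   # R^2 is the squared multiple correlation
```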

Adjusted R-square

The adjusted R-square is

  R̄² = 1 − s²_u / s²_y,   (19)

where

  s²_u = 1/(n − k − 1) Σ_{i=1}^{n} (y_i − ŷ_i)² = 1/(n − k − 1) Σ_{i=1}^{n} û_i²   (20)

is an estimator of the error variance σ²_u = var[u_i], and s²_y = 1/(n − 1) Σ_{i=1}^{n} (y_i − ȳ)² is the sample variance of y.
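A small helper sketching (19)-(20); the function name is illustrative only.

```python
import numpy as np

def adjusted_r2(y, y_hat, k):
    """Adjusted R-square, 1 - s_u^2 / s_y^2, as in (19)-(20)."""
    n = len(y)
    s2_u = np.sum((y - y_hat)**2) / (n - k - 1)    # estimator of the error variance
    s2_y = np.sum((y - np.mean(y))**2) / (n - 1)   # sample variance of y
    return 1 - s2_u / s2_y
```

For the wage example above (k = 3, n = 526) this is the quantity reported as "Adjusted R-squared" in the output table.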

Expected values of the OLS estimators

Under the classical assumptions 1-5 (with Assumption 5 updated to the multiple regression case) we can show that the estimators of the regression coefficients are unbiased. That is:

Theorem 3.1: Given the observations on the x-variables,

  E[β̂_j] = β_j,   j = 0, 1, ..., k.   (21)

Using matrix notation, the proof of Theorem 3.1 is pretty straightforward. Write

  β̂ = (X'X)⁻¹X'y
     = (X'X)⁻¹X'(Xβ + u)
     = (X'X)⁻¹X'Xβ + (X'X)⁻¹X'u   (22)
     = β + (X'X)⁻¹X'u.

Given X, the expected value of β̂ is

  E[β̂] = β + (X'X)⁻¹X'E[u] = β,   (23)

because by Assumption 1, E[u] = 0. Thus the OLS estimators of the regression coefficients are unbiased.
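A Monte Carlo sketch of Theorem 3.1 (simulated data, X held fixed across replications):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 5000
beta = np.array([1.0, 0.5, -0.3])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # X held fixed

estimates = np.empty((reps, 3))
for r in range(reps):
    u = rng.normal(size=n)                   # E[u] = 0
    y = X @ beta + u
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))                # close to (1.0, 0.5, -0.3): unbiasedness
```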

Remark 3.4: If z = (z_1, ..., z_n)' is a random vector, then E[z] = (E[z_1], ..., E[z_n])'. That is, the expected value of a vector is the vector whose components are the individual expected values.

Irrelevant variables in a regression

Suppose the correct model is

  y = β_0 + β_1 x_1 + u,   (24)

but we estimate the model as

  y = β_0 + β_1 x_1 + β_2 x_2 + u.   (25)

Thus β_2 = 0 in reality. The OLS estimation yields

  ŷ = β̂_0 + β̂_1 x_1 + β̂_2 x_2.   (26)

By Theorem 3.1, E[β̂_j] = β_j, and thus in particular E[β̂_2] = β_2 = 0, implying that the inclusion of extra variables in a regression does not bias the results. However, as will be seen later, it decreases the accuracy of estimation by increasing the variances of the OLS estimates.
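A simulation sketch of this point (simulated data): x_2 is irrelevant in the true model, and its estimated coefficient averages to zero.

```python
import numpy as np

rng = np.random.default_rng(11)
n, reps = 200, 4000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)       # irrelevant regressor, correlated with x1
X = np.column_stack([np.ones(n), x1, x2])

b2 = np.empty(reps)
for r in range(reps):
    y = 1.0 + 0.5 * x1 + rng.normal(size=n)           # true model: beta_2 = 0
    b2[r] = np.linalg.solve(X.T @ X, X.T @ y)[2]

print(b2.mean())    # approximately 0: the irrelevant regressor causes no bias
```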

Omitted variable bias

Suppose now, as an example, that the correct model is

  y = β_0 + β_1 x_1 + β_2 x_2 + u,   (27)

but we misspecify the model as

  y = β_0 + β_1 x_1 + v,   (28)

where the omitted variable is embedded in the error term, v = β_2 x_2 + u. The OLS estimator of β_1 for specification (28) is

  β̃_1 = Σ_i (x_i1 − x̄_1) y_i / Σ_i (x_i1 − x̄_1)².   (29)

From Equation (2.38) we have

  β̃_1 = β_1 + Σ_{i=1}^{n} a_i v_i,   (30)

where

  a_i = (x_i1 − x̄_1) / Σ_i (x_i1 − x̄_1)².   (31)

Thus, because E[v_i] = E[β_2 x_i2 + u_i] = β_2 x_i2,

  E[β̃_1] = β_1 + Σ_i a_i E[v_i] = β_1 + Σ_i a_i β_2 x_i2 = β_1 + β_2 Σ_i (x_i1 − x̄_1) x_i2 / Σ_i (x_i1 − x̄_1)²,   (32)

i.e.,

  E[β̃_1] = β_1 + β_2 Σ_i (x_i1 − x̄_1) x_i2 / Σ_i (x_i1 − x̄_1)²,   (33)

implying that β̃_1 is biased for β_1 unless x_1 and x_2 are uncorrelated (or β_2 = 0). This is called the omitted variable bias.

The direction of the omitted variable bias is as follows:

              corr[x_1, x_2] > 0    corr[x_1, x_2] < 0
  β_2 > 0     positive bias         negative bias
  β_2 < 0     negative bias         positive bias
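A simulation sketch of the bias formula (33) (simulated data with β_2 > 0 and corr[x_1, x_2] > 0, so the table predicts a positive bias):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 300, 4000
beta1, beta2 = 0.5, 1.0

b1_short = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)             # corr(x1, x2) > 0
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    xd = x1 - x1.mean()
    b1_short[r] = (xd @ y) / (xd @ xd)             # estimator (29) of the short regression

# Mean is about beta1 + beta2 * 0.8 = 1.3: positive bias, as (33) and the table predict
print(b1_short.mean())
```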

Variances of the OLS estimators

Write the regression model

  y_i = β_0 + β_1 x_i1 + ... + β_k x_ik + u_i,   i = 1, ..., n,   (34)

in matrix form as

  y = Xβ + u.   (35)

Then we can write the OLS estimators compactly as

  β̂ = (X'X)⁻¹X'y.   (36)

Under the classical assumptions 1-5, and assuming X fixed, we can show that the variance-covariance matrix of β̂ is

  cov[β̂] = σ²_u (X'X)⁻¹.   (37)

The variances of the individual coefficients are obtained from the main diagonal of this matrix and can be shown to be of the form

  var[β̂_j] = σ²_u / ( (1 − R²_j) Σ_{i=1}^{n} (x_ij − x̄_j)² ),   j = 1, ..., k,   (38)

where R²_j is the R-square from regressing x_j on the other explanatory variables and the constant term.
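A numerical sketch (simulated regressors, σ²_u set by hand) checking that the diagonal of (37) agrees with formula (38) for one slope coefficient:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)          # correlated regressors
X = np.column_stack([np.ones(n), x1, x2])
sigma2_u = 1.5

# Variance of beta_1_hat from the matrix formula (37)
var_b1_matrix = (sigma2_u * np.linalg.inv(X.T @ X))[1, 1]

# The same variance from (38), using R_1^2 of x1 regressed on a constant and x2
Z = np.column_stack([np.ones(n), x2])
resid = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
R2_1 = 1 - resid @ resid / np.sum((x1 - x1.mean())**2)
var_b1_formula = sigma2_u / ((1 - R2_1) * np.sum((x1 - x1.mean())**2))

print(np.isclose(var_b1_matrix, var_b1_formula))   # True: (37) and (38) agree
```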

Multicollinearity

In terms of linear algebra, we say that vectors x_1, x_2, ..., x_k are linearly independent if

  a_1 x_1 + ... + a_k x_k = 0

holds only when a_1 = ... = a_k = 0. Otherwise x_1, ..., x_k are linearly dependent. In that case some a_l ≠ 0 and we can write

  x_l = c_1 x_1 + ... + c_{l−1} x_{l−1} + c_{l+1} x_{l+1} + ... + c_k x_k,

where c_j = −a_j / a_l; that is, x_l can be represented as a linear combination of the other variables.

In statistics, the multiple correlation measures the degree of linear dependence. If the variables are perfectly linearly dependent, that is, if for example x_j is a linear combination of the other variables, then the multiple correlation R_j = 1. Perfect linear dependence is rare between random variables; however, particularly between macroeconomic variables, dependencies are often high.

From the variance equation (38) we see that var[β̂_j] → ∞ as R²_j → 1. That is, the more linearly dependent the explanatory variables are, the larger the variance becomes, and hence the coefficient estimates become increasingly unstable. High (but not perfect) correlation between two or more explanatory variables is called multicollinearity.

Symptoms of multicollinearity:

1 High correlations between explanatory variables.
2 R² is relatively high, but the coefficient estimates tend to be statistically insignificant (see the section on hypothesis testing).
3 Some of the coefficients have the wrong sign and some coefficients are at the same time unreasonably large.
4 Coefficient estimates change substantially from one model alternative to another.
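One common diagnostic is to compute each R²_j from (38), or equivalently the variance inflation factors 1/(1 − R²_j). A sketch with simulated, nearly collinear data (the helper name is illustrative):

```python
import numpy as np

def r2_of_aux_regression(X, j):
    """R^2_j from regressing column j of X on the other columns and a constant."""
    n = X.shape[0]
    xj = X[:, j]
    Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    coef = np.linalg.lstsq(Z, xj, rcond=None)[0]
    resid = xj - Z @ coef
    return 1 - resid @ resid / np.sum((xj - xj.mean())**2)

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)     # nearly collinear with x1
x3 = rng.normal(size=n)                 # unrelated regressor
X = np.column_stack([x1, x2, x3])

for j in range(3):
    r2j = r2_of_aux_regression(X, j)
    print(j, r2j, 1 / (1 - r2j))        # large values flag the collinear columns
```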

Example 3.3: Let E_t denote the expenditure costs of a sample of Toyota Mark II cars at time point t, M_t the mileage, and A_t the age. Consider the model alternatives:

  Model A: E_t = α_0 + α_1 A_t + u_1t
  Model B: E_t = β_0 + β_1 M_t + u_2t
  Model C: E_t = γ_0 + γ_1 M_t + γ_2 A_t + u_3t

Estimation results (t-values in parentheses):

  Variable     Model A     Model B     Model C
  Constant     -626.24     -796.07        7.29
               (-5.98)     (-5.91)       (0.06)
  Age             7.35                   27.58
               (22.16)                   (9.58)
  Miles                      53.45     -151.15
                           (18.27)      (-7.06)
  df                55          55          54
  R²             0.897       0.856       0.946
  σ̂             368.6       437.0       268.3

Findings: A priori, the coefficients α_1, β_1, γ_1, and γ_2 should all be positive. However, the mileage coefficient in Model C is γ̂_1 = −151.15 (!!?), even though β̂_1 = 53.45 in Model B. The correlation is r_{M,A} = 0.996!

Remedies: In the collinearity problem the issue is that there is not enough information in the sample to reliably identify each variable's contribution as an explanatory variable in the model. Thus, in order to alleviate the problem:

1 Use non-sample information, if available, to impose restrictions between coefficients.
2 Increase the sample size if possible.
3 Drop the most collinear variables (on the basis of R²_j).
4 If a linear combination (usually a sum) of the most collinear variables is meaningful, replace the collinear variables by the linear combination.

Remark: Multicollinearity is not always harmful. If an explanatory variable is not correlated with the rest of the explanatory variables, multicollinearity among those other variables (provided it is not perfect) does not harm the variance of the slope coefficient estimator of the uncorrelated explanatory variable. If x_j is not correlated with any of the other predictors, then R²_j = 0 and hence, as seen from equation (38), the factor (1 − R²_j) drops out of the variance var[β̂_j] of its slope coefficient estimator. Another case where multicollinearity (again excluding the perfect case) is not a problem is when we are purely interested in the predictions of the model. Predictions are affected essentially only by the usefulness of the available information.

Exploring further (Wooldridge 5e, p. 93): Consider a model explaining final exam score by class attendance (number of classes attended). To control for student ability and effort outside the classroom, additional explanatory variables like cumulative GPA, SAT score, and a measure of high school performance are included. Someone says, "You cannot learn much from this exercise because cumulative GPA, SAT score, and high school performance are likely to be highly collinear." What should be your response?

Variances of misspecified models

Consider again, as in (27), the regression model

  y = β_0 + β_1 x_1 + β_2 x_2 + u.   (39)

Suppose the following models are estimated by OLS:

  ŷ = β̂_0 + β̂_1 x_1 + β̂_2 x_2   (40)

and

  ỹ = β̃_0 + β̃_1 x_1.   (41)

Then

  var[β̂_1] = σ²_u / ( (1 − r²_12) Σ_i (x_i1 − x̄_1)² )   (42)

and

  var[β̃_1] = σ²_u / Σ_i (x_i1 − x̄_1)²,   (43)

where r_12 is the sample correlation between x_1 and x_2.

Thus var[β̃_1] ≤ var[β̂_1], and the inequality is strict if r_12 ≠ 0. In summary (assuming r_12 ≠ 0):

1 If β_2 ≠ 0, then β̃_1 is biased, β̂_1 is unbiased, and var[β̃_1] < var[β̂_1].
2 If β_2 = 0, then both β̃_1 and β̂_1 are unbiased, but var[β̃_1] < var[β̂_1].
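A small sketch (simulated regressors, σ²_u set by hand) evaluating (42) and (43) numerically:

```python
import numpy as np

rng = np.random.default_rng(13)
n = 400
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
sigma2_u = 1.0

r12 = np.corrcoef(x1, x2)[0, 1]
sst1 = np.sum((x1 - x1.mean())**2)

var_b1_long = sigma2_u / ((1 - r12**2) * sst1)   # (42): x2 included in the regression
var_b1_short = sigma2_u / sst1                   # (43): x2 excluded
print(var_b1_short < var_b1_long)                # True whenever r12 != 0
```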

Estimating the error variance σ²_u

An unbiased estimator of the error variance var[u] = σ²_u is

  σ̂²_u = 1/(n − k − 1) Σ_{i=1}^{n} û_i²,   (44)

where

  û_i = y_i − β̂_0 − β̂_1 x_i1 − ... − β̂_k x_ik.   (45)

The term n − k − 1 in (44) is the degrees of freedom (df). It can be shown that

  E[σ̂²_u] = σ²_u,   (46)

i.e., σ̂²_u is an unbiased estimator of σ²_u. σ̂_u is called the standard error of the regression.
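A Monte Carlo sketch (simulated data) illustrating the unbiasedness (46) of the error-variance estimator (44):

```python
import numpy as np

rng = np.random.default_rng(17)
n, k, reps = 60, 2, 5000
sigma2_u = 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

s2_estimates = np.empty(reps)
for r in range(reps):
    y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=np.sqrt(sigma2_u), size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    s2_estimates[r] = resid @ resid / (n - k - 1)   # (44) with df = n - k - 1

print(s2_estimates.mean())   # close to 2.0, illustrating E[sigma_hat^2] = sigma^2
```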

Standard error of β̂_j

The standard deviation of β̂_j is the square root of (38), i.e.,

  σ_β̂j = sqrt(var[β̂_j]) = σ_u / sqrt( (1 − R²_j) Σ_i (x_ij − x̄_j)² ).   (47)

Substituting σ_u by its estimate σ̂_u = sqrt(σ̂²_u) gives the standard error of β̂_j:

  se(β̂_j) = σ̂_u / sqrt( (1 − R²_j) Σ_i (x_ij − x̄_j)² ).   (48)
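In practice the standard errors come from the diagonal of the estimated covariance matrix σ̂²_u (X'X)⁻¹, which is equivalent to (48). A sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(21)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
k = X.shape[1] - 1
sigma2_hat = resid @ resid / (n - k - 1)            # (44)

cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)       # estimated cov[beta_hat], cf. (37)
se = np.sqrt(np.diag(cov_hat))                      # standard errors, cf. (48)
print(np.column_stack([beta_hat, se]))
```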

The Gauss-Markov Theorem

Theorem 3.2 (Gauss-Markov): Under the classical assumptions 1-5, β̂_0, β̂_1, ..., β̂_k are the best linear unbiased estimators (BLUEs) of β_0, β_1, ..., β_k, respectively.

BLUE:
  Best: the variance of the OLS estimator is the smallest among all linear unbiased estimators of β_j.
  Linear: β̂_j = Σ_{i=1}^{n} w_ij y_i, where the weights w_ij may depend on the x-variables.
  Unbiased: E[β̂_j] = β_j.