Multiple Regression Analysis


Multiple Regression Analysis

y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + u

6. Heteroskedasticity

What is Heteroskedasticity?

- Recall that the assumption of homoskedasticity implied that, conditional on the explanatory variables, the variance of the unobserved error u was constant.
- If this is not true, that is, if the variance of u differs for different values of the x's, then the errors are heteroskedastic.
- Example: we estimate returns to education, but ability is unobservable, and we think the variance in ability differs by educational attainment.

Example of Heteroskedasticity

[Figure: conditional densities f(y|x) at values x₁, x₂, x₃, spreading out around the regression line E(y|x) = β₀ + β₁x as x increases.]

Why Worry About Heteroskedasticity?

- We didn't need homoskedasticity to show that our OLS estimates are unbiased and consistent.
- However, the estimated standard errors of the β̂ⱼ are biased under heteroskedasticity, i.e. we cannot use the usual t, F, or LM statistics for drawing inferences.
- And this problem doesn't go away as n increases.
- Our estimates are also no longer BLUE: we could find other estimators with smaller variance.

Variance with Heteroskedasticity

- Suppose that in a simple (univariate) regression model the first four Gauss-Markov assumptions hold, but Var(uᵢ|xᵢ) = σᵢ² (i.e. the variance depends on xᵢ).
- Then the OLS estimator is just

  β̂₁ = β₁ + Σᵢ(xᵢ − x̄)uᵢ / SSTₓ, where SSTₓ = Σᵢ(xᵢ − x̄)².

- We can show that

  Var(β̂₁) = Σᵢ(xᵢ − x̄)²σᵢ² / SSTₓ².

- Without the subscript i (σᵢ² = σ²) this reduces to the usual σ²/SSTₓ.

Heteroskedasticity Robust SE

- So, we need some way to estimate this variance.
- A valid estimator of it when Var(uᵢ|xᵢ) = σᵢ² is

  Σᵢ(xᵢ − x̄)²ûᵢ² / SSTₓ², where the ûᵢ are the OLS residuals.

- For the general multiple regression model, a valid estimator of Var(β̂ⱼ) with heteroskedasticity is

  Σᵢ r̂ᵢⱼ²ûᵢ² / SSRⱼ²,

  where r̂ᵢⱼ is the i-th residual from regressing xⱼ on all the other x's, and SSRⱼ is the sum of squared residuals from that regression.
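As an illustration (not from the slides): a minimal Python sketch on simulated data that computes this robust variance for the simple-regression slope by hand and compares it with the usual homoskedasticity-only formula. All names and numbers are hypothetical.

```python
# Sketch: robust vs. usual variance of the slope in a simple regression,
# on simulated heteroskedastic data (all names hypothetical).
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
u = rng.normal(0, 1 + 0.3 * x)          # error spread grows with x
y = 2.0 + 0.5 * x + u

xbar = x.mean()
sst_x = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * y) / sst_x     # OLS slope
b0 = y.mean() - b1 * xbar
uhat = y - b0 - b1 * x                  # OLS residuals

# Robust: sum((x_i - xbar)^2 * uhat_i^2) / SST_x^2
var_b1_robust = np.sum((x - xbar) ** 2 * uhat ** 2) / sst_x ** 2
# Usual (homoskedasticity-only): sigma2_hat / SST_x
var_b1_usual = np.sum(uhat ** 2) / (n - 2) / sst_x

print("robust SE:", np.sqrt(var_b1_robust))
print("usual SE: ", np.sqrt(var_b1_usual))
```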

Heteroskedasticity Robust SE (cont)

- To see this, recall that under homoskedasticity (σᵢ² = σ²):

  Var(β̂ⱼ) = Σᵢ r̂ᵢⱼ²σ² / SSRⱼ² = σ² Σᵢ r̂ᵢⱼ² / SSRⱼ² = σ²/SSRⱼ = σ² / [SSTⱼ(1 − Rⱼ²)],

  since SSRⱼ = Σᵢ r̂ᵢⱼ².

- With heteroskedasticity we can't pull σᵢ² out front, so each ûᵢ² must stay inside the sum.

Robust Standard Errors

- Now that we have a consistent estimate of the variance, its square root can be used as a standard error for inference.
- We typically call these robust standard errors.
- Sometimes the estimated variance is corrected for degrees of freedom by multiplying by n/(n − k − 1).
- As n tends to infinity it's all the same, though.
- Note: the robust standard errors only have asymptotic justification.

Robust Standard Errors (cont)

- We can use the robust SEs to construct robust t-statistics if heteroskedasticity is a problem (just substitute the robust SE when forming the t-stat).
- With small sample sizes, t-statistics formed with robust standard errors will not have a distribution close to the t, and inferences will not be correct.
- As the sample size gets large this problem disappears (i.e. het-robust t-stats are asymptotically t-distributed).
- Thus, if homoskedasticity holds, it is better to use the usual standard errors in small samples.
- In EViews, robust standard errors are easily obtained.
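The slides use EViews; as a hedged alternative for Python users, the same comparison in statsmodels, where cov_type="HC1" applies the n/(n − k − 1) correction mentioned above (simulated data):

```python
# Sketch: heteroskedasticity-robust SEs in statsmodels (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0, 10, (n, 2))
u = rng.normal(0, 1 + 0.5 * x[:, 0])
y = 1.0 + 0.8 * x[:, 0] - 0.3 * x[:, 1] + u

X = sm.add_constant(x)
usual = sm.OLS(y, X).fit()                  # classical SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")   # robust, df-corrected by n/(n-k-1)

print(usual.bse)    # usual standard errors
print(robust.bse)   # heteroskedasticity-robust standard errors
```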

A Robust LM Statistic

- We can compute LM statistics that are robust to heteroskedasticity to test multiple exclusion restrictions.
- Suppose y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + u and we want to test H₀: β₁ = 0, β₂ = 0.

1. Run OLS on the restricted model and save the residuals û.
2. Regress each of the excluded variables on all of the included variables (q different regressions) and save each set of residuals r̃₁, r̃₂, i.e. x₁ on x₃, …, xₖ and x₂ on x₃, …, xₖ.

A Robust LM Statistic (cont)

3. Create a variable equal to 1 for all observations and regress it on r̃₁û, r̃₂û with no intercept.

- The heteroskedasticity-robust LM statistic turns out to be n − SSR₁, where SSR₁ is the sum of squared residuals from this final regression.
- LM still has a chi-squared distribution with q degrees of freedom.
- I offer no explanation as to why or how this works, as the derivation is tedious. Just know it can be done.
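A sketch of the three-step procedure on simulated data (q = 2 excluded variables; all names hypothetical):

```python
# Sketch: heteroskedasticity-robust LM test of H0: beta1 = beta2 = 0
# in y = b0 + b1*x1 + b2*x2 + b3*x3 + u (simulated data, H0 true here).
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(2)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 0.5 + 1.0 * x3 + rng.normal(0, np.exp(0.3 * x3), n)

# Step 1: restricted model (x1, x2 excluded); save residuals.
u_tilde = sm.OLS(y, sm.add_constant(x3)).fit().resid

# Step 2: regress each excluded variable on the included ones.
r1 = sm.OLS(x1, sm.add_constant(x3)).fit().resid
r2 = sm.OLS(x2, sm.add_constant(x3)).fit().resid

# Step 3: regress 1 on r1*u_tilde, r2*u_tilde with no intercept; LM = n - SSR1.
Z = np.column_stack([r1 * u_tilde, r2 * u_tilde])
ssr1 = sm.OLS(np.ones(n), Z).fit().ssr
lm = n - ssr1
print("robust LM:", lm, "p-value:", chi2.sf(lm, df=2))   # chi2 with q = 2 df
```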

Testing for Heteroskedasticity

- We may still want to know whether heteroskedasticity is present if, for example, we have a small sample or if we want to know the form of the heteroskedasticity.
- Essentially we want to test H₀: Var(u|x₁, x₂, …, xₖ) = σ².
- This is equivalent to H₀: E(u²|x₁, x₂, …, xₖ) = E(u²) = σ² {using E(u|x) = 0}.
- So to test for heteroskedasticity we want to test whether u² is related to any of the x's.

The Breusch-Pagan Test

- If we assume the relationship between u² and the xⱼ is linear, we can test it as a linear restriction.
- So, for u² = δ₀ + δ₁x₁ + … + δₖxₖ + v, this means testing H₀: δ₁ = δ₂ = … = δₖ = 0.
- We don't observe the error, but we can estimate it with the residuals from the OLS regression.
- Thus, regress the squared residuals on all the x's.
- The R² from this regression can be used to form an F or LM statistic to test for overall significance.

The Breusch-Pagan Test (cont)

F-statistic:
- The F statistic is just the reported F statistic for overall significance of the regression:
  F = [R²/k] / [(1 − R²)/(n − k − 1)], which is distributed F(k, n − k − 1).

LM-statistic (Breusch-Pagan):
- The LM statistic is LM = nR², which is distributed χ²(k).
- Note that, unlike the robust LM statistic earlier, we did not have to run any additional auxiliary regression in this case.
- We could carry out this test for any of the x's individually (instead of for all x's together).
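A minimal sketch of the test on simulated data, both by hand (LM = nR²) and via statsmodels' het_breuschpagan:

```python
# Sketch: Breusch-Pagan by hand (LM = n*R^2) and via statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy.stats import chi2

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1, 5, (n, 2))
y = 1 + x @ np.array([0.5, -0.2]) + rng.normal(0, x[:, 0], n)  # Var(u|x) grows with x1

X = sm.add_constant(x)
uhat = sm.OLS(y, X).fit().resid

aux = sm.OLS(uhat ** 2, X).fit()      # regress squared residuals on the x's
lm = n * aux.rsquared
print("LM:", lm, "p:", chi2.sf(lm, df=2))

print(het_breuschpagan(uhat, X))      # (LM, LM p-value, F, F p-value)
```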

The White Test

- The Breusch-Pagan test will detect any linear form of heteroskedasticity.
- The White test allows for nonlinearities by using squares and cross-products of all the x's.
- e.g. with two x's, estimate

  û² = δ₀ + δ₁x₁ + δ₂x₂ + δ₃x₁² + δ₄x₂² + δ₅x₁x₂ + error

- Still just use an F or LM statistic to test whether the xⱼ, xⱼ², and xⱼxₕ are jointly significant.
- This can get to be unwieldy pretty quickly as the number of x variables gets bigger.
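For illustration, statsmodels' het_white builds the squares and cross-products of the regressors automatically (simulated data):

```python
# Sketch: White test via statsmodels, which adds the squares and
# cross-products of the regressors automatically (simulated data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=(n, 2))
y = 1 + x @ np.array([1.0, 0.5]) + rng.normal(0, np.exp(0.4 * x[:, 0]), n)

X = sm.add_constant(x)
uhat = sm.OLS(y, X).fit().resid
lm, lm_p, f, f_p = het_white(uhat, X)
print("White LM:", lm, "p:", lm_p)
```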

Alternate Form of the White Test

- We can get all the squares and cross-products on the RHS in an easier way if we recognize that the fitted values from OLS, the ŷᵢ, are a function of all the x's.
- Thus ŷ² will be a function of the squares and cross-products of the xⱼ's.
- So another way to do the White test is to regress the squared residuals on ŷ and ŷ²:

  û² = δ₀ + δ₁ŷ + δ₂ŷ² + error

- Then, as before, use the R² to form an F or LM statistic.
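A sketch of this special form on simulated data; note the test always has two restrictions here:

```python
# Sketch: the "special" White test using fitted values (simulated data).
# Only two restrictions, no matter how many regressors the model has.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(5)
n = 300
x = rng.normal(size=(n, 3))
y = 1 + x @ np.array([1.0, 0.5, -0.2]) + rng.normal(0, np.exp(0.4 * x[:, 0]), n)

fit = sm.OLS(y, sm.add_constant(x)).fit()
uhat, yhat = fit.resid, fit.fittedvalues

aux_X = sm.add_constant(np.column_stack([yhat, yhat ** 2]))
aux = sm.OLS(uhat ** 2, aux_X).fit()  # regress uhat^2 on yhat, yhat^2
lm = n * aux.rsquared
print("special White LM:", lm, "p:", chi2.sf(lm, df=2))
```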

Alternate Form (cont)

- Here the null hypothesis is H₀: δ₁ = 0, δ₂ = 0.
- So there are only two restrictions, regardless of the number of independent variables in the original model.
- It is important to test for functional form first; otherwise we could reject H₀ even if it is true. Make sure your functional form is right before doing heteroskedasticity tests.
- e.g. don't estimate a model with experience only when you should have experience and experience². Your heteroskedasticity test (using the misspecified model) would lead to rejection of the null when the null is true.

Weighted Least Squares

- While it's always possible to estimate robust standard errors for OLS estimates (see above), the OLS estimates themselves may not be efficient.
- If we know something about the specific form of the heteroskedasticity, we can obtain more efficient estimates than OLS.
- The basic idea is going to be to transform the model into one that has homoskedastic errors.
- We call this the weighted least squares approach.

Heteroskedasticity Is Known up to a Multiplicative Constant

- Suppose the heteroskedasticity can be modeled as Var(u|x) = σ²h(x), where h(x) is some function of the x's that is greater than zero.
- Here h(x) is assumed to be known (Case 1); the trick is to figure out what h(x) looks like.
- Thus, for a random sample from the population,

  σᵢ² = Var(uᵢ|xᵢ) = σ²hᵢ, where hᵢ = h(xᵢ).

- How can we use this information to get homoskedastic errors and therefore estimate the β's more efficiently?

Heteroskedasticity Is Known

- First note that E(uᵢ/√hᵢ | x) = 0, since hᵢ is a function of the x's, and

  Var(uᵢ/√hᵢ | x) = E(uᵢ²/hᵢ | x) = σ²hᵢ/hᵢ = σ², since Var(uᵢ|x) = σ²hᵢ.

- So, if we divide our whole equation by √hᵢ, we have a model where the error is homoskedastic:

  yᵢ/√hᵢ = β₀/√hᵢ + β₁(x₁ᵢ/√hᵢ) + … + βₖ(xₖᵢ/√hᵢ) + uᵢ/√hᵢ

Heteroskedasticity Is Known (cont)

- In transformed (starred) notation:

  yᵢ* = β₀xᵢ₀* + β₁xᵢ₁* + … + βₖxᵢₖ* + uᵢ*

- So, estimate the transformed equation to get more efficient estimates of the parameters.
- Notice that the β's have the same interpretation as in the original model.
- Example: Cigs = β₀ + β₁Cigprice + β₂Rest + … + βₖIncome + u.
  We might think that Var(uᵢ|xᵢ) = σ²·Income, i.e. that h(x) = Income.

Generalized Least Squares

- Estimating the transformed equation by OLS is an example of generalized least squares (GLS).
- GLS will be BLUE as long as the original model satisfies the Gauss-Markov assumptions other than homoskedasticity.
- GLS is a weighted least squares (WLS) procedure where each squared residual is weighted by the inverse of Var(uᵢ|xᵢ).
- So, less weight is given to observations with a higher error variance.

Weighted Least Squares (cont)

- While it is intuitive to see why performing OLS on a transformed equation is appropriate, it can be tedious to do the transformation.
- Weighted least squares is a way of getting the same thing without the transformation.
- The idea is to minimize the sum of squared residuals, weighted by 1/hᵢ:

  minimize  Σᵢ (yᵢ − b₀ − b₁xᵢ₁ − … − bₖxᵢₖ)² / hᵢ

- This is the same as the transformed regression.
- EViews will do this without transforming the variables.
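Continuing the cigarette example with invented data (all coefficients hypothetical), a Python sketch showing that WLS with weights 1/hᵢ reproduces OLS on the √hᵢ-transformed variables:

```python
# Sketch: WLS with h(x) = income, on invented data for the cigarette example.
# WLS with weights 1/h_i matches OLS on the sqrt(h_i)-transformed variables.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 500
income = rng.uniform(10, 100, n)
cigprice = rng.uniform(3, 8, n)
cigs = 5 - 0.5 * cigprice + 0.05 * income + rng.normal(0, np.sqrt(income))

X = sm.add_constant(np.column_stack([cigprice, income]))
wls = sm.WLS(cigs, X, weights=1 / income).fit()    # weight = 1/h(x)

w = 1 / np.sqrt(income)                            # divide through by sqrt(h_i)
ols_transformed = sm.OLS(cigs * w, X * w[:, None]).fit()

print(wls.params)
print(ols_transformed.params)   # identical coefficients
```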

More on WLS

- WLS is great if we know what Var(uᵢ|xᵢ) looks like.
- In most cases, though, we won't know the form of the heteroskedasticity (Case 2).

Feasible GLS

- In the more typical case where you don't know the form of the heteroskedasticity, you can estimate h(xᵢ).
- Typically, we start with the assumption of a fairly flexible model, such as

  Var(u|x) = σ²exp(δ₀ + δ₁x₁ + … + δₖxₖ), where h(xᵢ) = exp(δ₀ + δ₁x₁ᵢ + … + δₖxₖᵢ)

- We use exp(·) to ensure that h(xᵢ) > 0.
- Since we don't know the δ's, we must estimate them.

Feasible GLS (continued)

- Our assumption implies that u² = σ²exp(δ₀ + δ₁x₁ + … + δₖxₖ)v, where E(v|x) = 1.
- If v is independent of x, then

  ln(u²) = α₀ + δ₁x₁ + … + δₖxₖ + e,

  where E(e) = 0 and e is independent of x.
- The intercept has a different interpretation (α₀ ≠ δ₀).
- Now, since û² is an estimate of u², we can estimate this by OLS: regress ln(û²) on the x's.

Feasible GLS (continued)

- We then obtain the fitted values ĝᵢ from this regression.
- The estimate of hᵢ is then ĥᵢ = exp(ĝᵢ).
- Now all we need to do is run a WLS regression using 1/ĥᵢ as the weights.

Notes:
- FGLS is biased (due to the use of the ĥᵢ) but consistent and asymptotically more efficient than OLS.
- Remember, the estimates β̂ⱼ give the marginal effect of xⱼ on y, not of xⱼ* on y*.
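A sketch of the full FGLS recipe on simulated data, where the true variance function is known to the simulation but not to the estimator:

```python
# Sketch of FGLS: estimate h(x) = exp(delta0 + delta1*x1 + ...) from a
# regression of log(uhat^2) on the x's, then WLS with weights 1/h_hat.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0, 2, (n, 2))
h = np.exp(0.5 + 0.8 * x[:, 0])                 # true variance function (unknown to us)
y = 1 + x @ np.array([1.0, -0.5]) + rng.normal(0, np.sqrt(h), n)

X = sm.add_constant(x)
uhat = sm.OLS(y, X).fit().resid                 # OLS residuals

g_hat = sm.OLS(np.log(uhat ** 2), X).fit().fittedvalues   # fitted g_i hats
h_hat = np.exp(g_hat)                           # h_i hats

fgls = sm.WLS(y, X, weights=1 / h_hat).fit()    # WLS with weights 1/h_hat
print(fgls.params)
```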

WLS Wrapup

- F statistics after WLS (whether in R² or SSR form) are a bit tricky.
- You need to form the weights from the unrestricted model and use those to estimate WLS for the restricted model as well (i.e. use the same weighted x's).
- The OLS and WLS estimates will be different due to sampling error.
- However, if they are very different, then it's likely that some other Gauss-Markov assumption is false, e.g. the zero conditional mean assumption.