
Lecture 19. Learning to worry about and deal with stationarity, and a common problem in cross-section estimation: heteroskedasticity. For each: what is it, why does it matter, and what to do about it.

Stationarity. Ultimately, whether you can sensibly include lags of either the dependent or explanatory variables, or indeed the current level of a variable, in a regression also depends on whether the time series data you are analysing are stationary. A variable is said to be (weakly) stationary if 1) its mean, 2) its variance, and 3) its autocovariance Cov(Y_t, Y_t-s), s ≠ 0, do not change over time. Stationarity is needed if the Gauss-Markov conditions for unbiased, efficient OLS estimation are to be met by time series data. (Essentially, any variable that is trended is unlikely to be stationary.)

Example: a plot of nominal GDP over time, using the data set stationary.dta.

. use "E:\qm2\Lecture 17\stationary.dta", clear
. two (scatter gdp year)

[Figure: scatter of GDP against year, 1940-2020; GDP rises steadily over the sample]

GDP displays a distinct upward trend and so is unlikely to be stationary. Neither its mean value nor its variance is stable over time:

. su gdp if year<1980

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         gdp |        31    451016.3      109771     293576     644491

. su gdp if year>=1980

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         gdp |        28    900660.8    196878.9     622722    1266397

Some series are already stationary: there is no obvious trend and there is some sort of reversion to a long-run value. The UK inflation rate is one example (from the data set stationary.dta):

. two (line inflation year)

[Figure: line plot of UK inflation against year, 1950-2000; no trend, fluctuation around a long-run value]

In general, just looking at the time series of a variable will not be enough to judge whether the variable is stationary or not (though it is good practice to graph the series anyway).

If a variable is not stationary then its values are persistent. This means that the level of the variable at some point in the past continues to influence the level of the variable today. The simplest way of modelling the persistence of a non-stationary process is the random walk

Y_t = Y_t-1 + e_t

- the value of Y today equals last period's value plus an unpredictable random error e (hence the name) and no other lags. This means that the best forecast of this period's level is last period's level.

Y_t = Y_t-1 + e_t

is similar to the AR(1) model Y_t = ρY_t-1 + e_t used for autocorrelation, but with the coefficient ρ set to 1. A coefficient of one means that the series is a unit root process.

Since many series (like GDP) have an obvious trend, we can adapt this model to allow for a movement ("drift") in one direction or the other by adding a constant term. So

Y_t = Y_t-1 + e_t

becomes

Y_t = b_0 + Y_t-1 + e_t

This is a random walk with drift: the best forecast of this period's level is now last period's value plus a positive constant b_0 (a more realistic model of GDP growing at, say, 2% a year).

Can also model this by adding a time trend (t = year):

Y_t = b_0 + Y_t-1 + b_1*t + e_t

What this means is that a series can be stationary around an upward (or downward) trend.
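To see what these processes look like, it is easy to simulate them. A minimal Stata sketch (the seed, sample size and drift value of 2 are arbitrary illustrative choices, not from the lecture data):

. clear
. set obs 200
. set seed 12345
. gen t = _n
. gen e = rnormal()                              /* unpredictable random error e_t */
. gen y_rw = e in 1
. replace y_rw = y_rw[_n-1] + e in 2/l           /* Y_t = Y_t-1 + e_t */
. gen y_drift = e in 1
. replace y_drift = 2 + y_drift[_n-1] + e in 2/l /* Y_t = b_0 + Y_t-1 + e_t, b_0 = 2 */
. two (line y_rw t) (line y_drift t)

The pure random walk wanders with no fixed mean, while the drift version trends steadily upward, much like the GDP series plotted earlier.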

Consequences. One can show that if variables are NOT stationary then:

1. OLS t values on any variables are biased.

2. This often leads to spurious regression: variables appear to be related (significant in a regression) but only because both are trended. If the trend were taken out, they would not be related.

3. OLS estimates of the coefficient on a lagged dependent variable are biased toward zero.

Note that any concerns about endogeneity are dwarfed by the issue of stationarity, since the bias in OLS is given by

b1^OLS = b1 + Cov(X,u)/Var(X)

and in non-stationary series the variance of X goes to infinity as the sample size T (the number of time periods) increases, so the second term effectively goes to zero and endogeneity is less of an issue in (long) time series data.

Example: suppose you decide to regress the United States inflation rate on the level of British GDP. There should, in truth, be very little relationship between the two (it is difficult to argue how British GDP could really affect US inflation). If you regress US inflation rates on UK GDP for the period 1956-1979:

. u gdp_sta
. reg usinf gdp if year<1980 & quarter==1

      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F(  1,    22) =   50.81
       Model |  156.605437     1  156.605437           Prob > F      =  0.0000
    Residual |  67.8141518    22  3.08246144           R-squared     =  0.6978
-------------+------------------------------           Adj R-squared =  0.6841
       Total |  224.419589    23  9.75737343           Root MSE      =  1.7557

------------------------------------------------------------------------------
       usinf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .0001402   .0000197     7.13   0.000     .0000994     .000181
       _cons |  -9.352343   1.945736    -4.81   0.000    -13.38755   -5.317133
------------------------------------------------------------------------------

which appears to suggest a significant positive (causal) relationship between the two, and the R2 is also very high. And if you regress US inflation rates on UK GDP for the period 1980-2002:

. reg usinf gdp if year>=1980 & quarter==1

      Source |       SS       df       MS              Number of obs =      23
-------------+------------------------------           F(  1,    21) =   11.48
       Model |  59.6216433     1  59.6216433           Prob > F      =  0.0028
    Residual |  109.033142    21  5.19205437           R-squared     =  0.3535
-------------+------------------------------           Adj R-squared =  0.3227
       Total |  168.654785    22  7.66612659           Root MSE      =  2.2786

------------------------------------------------------------------------------
       usinf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |  -.0000589   .0000174    -3.39   0.003     -.000095   -.0000227
       _cons |   13.77226   2.904938     4.74   0.000     7.731107    19.81341
------------------------------------------------------------------------------

this now gives a significant negative relationship, and the R2 is much lower.

0 5 usinf 10 15 60000 80000 100000 120000 140000 gdp In truth it is hard to believe that UK GDP has any real effect on US inflation rates. The reason why there appears to be a significant relation is because both variables are trended upward in the 1 st period and the regression picks up the common (but unrelated) trends. This is spurious regression twoway (scatter usinf year if year<=1980) (scatter gdp year if year<=1980, yaxis(2)) 1955 1960 1965 1970 1975 1980 year... usinf gdp

. twoway (scatter usinf year if year>1980) (scatter gdp year if year>1980, yaxis(2))

[Figure: US inflation and UK GDP plotted against year, 1980-2000; the common upward trend is gone]
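The spurious regression problem is easy to reproduce with purely artificial data: regress one simulated random walk on another, completely independent, random walk and the t statistic will typically appear "significant". A minimal sketch (variable names and the seed are illustrative only):

. clear
. set obs 100
. set seed 54321
. gen t = _n
. gen x = sum(rnormal())    /* running sum of random errors = a random walk */
. gen y = sum(rnormal())    /* a second, independent random walk */
. reg y x                   /* t value on x is often spuriously significant */
. gen dy = y - y[_n-1]
. gen dx = x - x[_n-1]
. reg dy dx                 /* in differences the spurious relationship disappears */

Since y and x are generated independently, any "significant" coefficient in the levels regression is purely an artefact of the two stochastic trends.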

What to do? Make the variables stationary and OLS will be OK. Often the easiest way to do this is by differencing the data (i.e. taking last period's value away from this period's value).

E.g. if Y_t = Y_t-1 + e_t is non-stationary, then take Y_t-1 to the other side to get the difference

Y_t - Y_t-1 = ΔY_t = e_t

which should be stationary, i.e. random and not trended, since the differenced variable is just equal to the random error term, which has no trend or systematic behaviour.

Example: the first difference of GDP looks more likely to be stationary.

. use "E:\qm2\Lecture 17\stationary.dta", clear

[Figure: first difference of GDP plotted against year, 1940-2020; values fluctuate around a stable level]

By inspection there seems to be no trend in the difference of GDP over time (and hence the mean and variance look reasonably stable over time).
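The differenced series can be created by hand with the lag subscript, or more conveniently with Stata's time-series operators once the data have been tsset. A sketch (assuming a yearly time variable, as in stationary.dta):

. use "E:\qm2\Lecture 17\stationary.dta", clear
. tsset year                  /* declare the data to be a yearly time series */
. g dgdp = gdp - gdp[_n-1]    /* first difference by hand */
. g dgdp2 = D.gdp             /* same thing using the difference operator */
. two (line dgdp year)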

Note: sometimes taking the (natural) log of a series can make the standard deviation of the log of the series constant. If the series is exponential (as GDP sometimes is), then the log of the series will be linear and the standard deviation of the log across subperiods will be constant (if the series changes by the same proportional amount in each period, then the log of the series changes by the same amount in each subperiod).
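This is easy to check on the GDP data by comparing the standard deviation of log GDP across subperiods, just as was done for the level earlier. A sketch:

. use "E:\qm2\Lecture 17\stationary.dta", clear
. g lgdp = log(gdp)        /* natural log of the series */
. su lgdp if year<1980
. su lgdp if year>=1980    /* the standard deviations should be closer than for the level */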

In practice it is not always easy to tell just by looking at a series whether it is a random walk (non-stationary) or not, so we need to test this formally, using the Dickey-Fuller test.

Detection. Given that

Y_t = Y_t-1 + e_t is non-stationary (1)

but

Y_t = bY_t-1 + e_t is stationary if b < 1 (2)

(one can show the variance of Y is constant for (2)), the test of stationarity is a test of whether b = 1.

In practice, subtract Y_t-1 from both sides of (2) (the two Y_t-1 terms cancel out):

Y_t - Y_t-1 = bY_t-1 - Y_t-1 + e_t
ΔY_t = (b-1)Y_t-1 + e_t
ΔY_t = gY_t-1 + e_t (3)

and test whether the coefficient g = b-1 = 0 (if g = 0 then b = 1). If so, the data follow a random walk and the variable is non-stationary. This is called the Dickey-Fuller test.

So estimate ΔY t = gy t-1 + e t by OLS and accept null of random walk if g is not significantly different from zero.

So estimate ΔY t = gy t-1 + e t by OLS and accept null of random walk if g is not significantly different from zero. Turns out that the critical values of this test differ from the normal t test critical values

So estimate ΔY t = gy t-1 + e t by OLS and accept null of random walk if g is not significantly different from zero. Turns out that the critical values of this test differ from the normal t test critical values Use instead (asymptotic) 5 % critical value = 1.94 and 2.86 if there is a constant in the regression

So estimate ΔY t = gy t-1 + e t by OLS and accept null of random walk if g is not significantly different from zero. Turns out that the critical values of this test differ from the normal t test critical values Use instead (asymptotic) 5 % critical value = 1.94 and 2.86 if there is a constant in the regression and 3.41 if there is a constant and a time trend in the regression

So estimate ΔY t = gy t-1 + e t by OLS and accept null of random walk if g is not significantly different from zero. Turns out that the critical values of this test differ from the normal t test critical values Use instead (asymptotic) 5 % critical value = 1.94 and 2.86 if there is a constant in the regression and 3.41 if there is a constant and a time trend in the regression and as a general rule only regress variables that are stationary on each other.

So estimate ΔY t = gy t-1 + e t by OLS and accept null of random walk if g is not significantly different from zero. Turns out that the critical values of this test differ from the normal t test critical values Use instead (asymptotic) 5 % critical value = 1.94 and 2.86 if there is a constant in the regression and 3.41 if there is a constant and a time trend in the regression and as a general rule only regress variables that are stationary on each other.

If a variable fails the Dickey-Fuller test (i.e. the null of a unit root cannot be rejected), then try using the difference of that variable instead.

Example: to test formally whether UK house prices are stationary or not.

. u price_sta
. tsset TIME
        time variable:  TIME, 24004 to 24084
                delta:  1 unit

. g dprice=price-price[_n-1]      /* creates 1st difference variable */
(1 missing value generated)

. g d2price=dprice-dprice[_n-1]

. reg dprice l.price

      Source |       SS       df       MS              Number of obs =      80
-------------+------------------------------           F(  1,    78) =    3.32
       Model |  8932482.25     1  8932482.25           Prob > F      =  0.0724
    Residual |   210035668    78  2692764.98           R-squared     =  0.0408
-------------+------------------------------           Adj R-squared =  0.0285
       Total |   218968150    79  2771748.74           Root MSE      =    1641

------------------------------------------------------------------------------
      dprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |
         L1. |  -.0088124   .0048384    -1.82   0.072     -.018445    .0008202
       _cons |   3012.817   869.2151     3.47   0.001     1282.343    4743.291
------------------------------------------------------------------------------

. dfuller price

Dickey-Fuller test for unit root                   Number of obs   =        80

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -1.821            -3.538            -2.906            -2.588
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.3699

Since the estimated t value is below the Dickey-Fuller critical value (2.86 in absolute terms), we cannot reject the null that g = 0 (and b = 1), so the original series (i.e. the level, not the change, of prices) follows a random walk. So conclude that house prices are a non-stationary series.

If we repeat the test for the 1st difference in prices (i.e. the change in prices):

. reg d2price l.dprice

      Source |       SS       df       MS              Number of obs =      79
-------------+------------------------------           F(  1,    77) =   29.84
       Model |  67875874.8     1  67875874.8           Prob > F      =  0.0000
    Residual |   175154796    77  2274737.61           R-squared     =  0.2793
-------------+------------------------------           Adj R-squared =  0.2699
       Total |   243030671    78  3115777.83           Root MSE      =  1508.2

------------------------------------------------------------------------------
     d2price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      dprice |
         L1. |  -.5589261   .1023204    -5.46   0.000    -.7626721   -.3551801
       _cons |   810.6236   225.3341     3.60   0.001     361.9261    1259.321
------------------------------------------------------------------------------

Since the estimated t value now exceeds the Dickey-Fuller critical value (2.86 in absolute terms), we reject the null that g = 0 (and b = 1), and so the new series (i.e. the change in, not the level of, prices) is a stationary series. We should therefore use the change in prices rather than the level of prices in any OLS estimation (and the same test should be applied to any other variables used in a regression).

Note: Stata will do (a variant of) this test automatically. Note that the critical values are different, since Stata can include lagged values of the dependent variable in the test (the augmented Dickey-Fuller test).

. dfuller dprice, regress

Dickey-Fuller test for unit root                   Number of obs   =        79

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -5.463            -3.539            -2.907            -2.588
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0000

------------------------------------------------------------------------------
    D.dprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      dprice |
         L1. |  -.5589261   .1023204    -5.46   0.000    -.7626721   -.3551801
       _cons |   810.6236   225.3341     3.60   0.001     361.9261    1259.321
------------------------------------------------------------------------------

The p-value is below .05, so again reject the null that g = 0 (and b = 1).

Heteroskedasticity. Occurs when the Gauss-Markov assumption that the residual variance is constant across all observations in the data set fails, so that

E(u_i^2 | X_i) = σ_i^2

(the residual variance differs across observations i). In practice this means the spread of observations at any given value of X will not now be constant. E.g. food expenditure is known to vary much more at higher levels of income than at lower levels of income, and the level of profits tends to vary more across large firms than across small firms.

Example: the data set food.dta contains information on food expenditure and income. A graph of the residuals from a regression of food spending on total household expenditure clearly shows that the residuals tend to be more spread out at higher levels of income; this is the typical pattern associated with heteroskedasticity.

. reg food expnethsum

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =  107.19
       Model |  22490.0823     1  22490.0823           Prob > F      =  0.0000
    Residual |  41544.8096   198  209.822271           R-squared     =  0.3512
-------------+------------------------------           Adj R-squared =  0.3479
       Total |  64034.8918   199  321.783376           Root MSE      =  14.485

------------------------------------------------------------------------------
        food |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  expnethsum |   .0355189   .0034308    10.35   0.000     .0287534    .0422844
       _cons |   28.55002    1.56964    18.19   0.000     25.45466    31.64537
------------------------------------------------------------------------------

. predict res, resid
. two (scatter res expnet if expnet<500)

[Figure: residuals plotted against household expenditure net of housing, 0-500; the spread of the residuals widens as expenditure rises]

Consequences of heteroskedasticity. One can show:

1) OLS estimates of the coefficients remain unbiased (as with autocorrelation), since given

Y_i = b_0 + b_1*X_i + u_i

and

b1^OLS = Cov(X,Y)/Var(X)

substituting in Y_i = b_0 + b_1*X_i + u_i gives

b1^OLS = b_1 + Cov(X,u)/Var(X)

The heteroskedasticity assumption that E(u_i^2 | X_i) = σ_i^2 does not affect the condition Cov(X,u) = 0 needed to prove unbiasedness, so the OLS estimates of the coefficients remain unbiased in the presence of heteroskedasticity. But...

2) One can show that heteroskedasticity (like autocorrelation) means the OLS estimates of the standard errors (and hence t and F tests) are biased. (Intuitively, if the observations are distributed unevenly about the regression line, OLS is unable to distinguish the quality of the observations: observations further away from the regression line should be given less weight in the calculation of the standard errors, since they are more unreliable, but OLS cannot do this, so the standard errors are biased.)

Testing for heteroskedasticity

1. Residual plots. In the absence of heteroskedasticity there should be no obvious pattern to the spread of the residuals, so it is useful to plot the residuals against the X variable thought to be causing the problem - assuming you know which X variable it is (often difficult).

2. Goldfeld-Quandt. Again assuming we know which variable is causing the problem, we can test formally whether the residual spread varies with the values of the suspect X variable.

i) Order the data by the size of the X variable and split the data into 2 equal sub-groups (one high variance, the other low variance).
ii) Drop the middle c observations, where c is approximately 30% of the sample.
iii) Run separate regressions for the high and low variance subsamples.
iv) Compute

F = RSS_highvariance / RSS_lowvariance ~ F[(N-c-2k)/2, (N-c-2k)/2]

v) If the estimated F > Fcritical, reject the null of no heteroskedasticity (intuitively, the residuals from the high variance sub-sample are much larger than the residuals from the low variance sub-sample).

This is fine if you are certain which variable is causing the problem, less so if unsure; a sketch of the steps is given below.
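A sketch of these steps in Stata, using the food data from the earlier example (N = 200, so the middle c = 60 observations are dropped and each subsample has 70 observations; k = 2 parameters per regression; the RSS is read from e(rss) after each fit):

. use food.dta, clear
. sort expnethsum                   /* i) order by the suspect X variable */
. reg food expnethsum in 1/70       /* iii) low variance subsample */
. scalar rss_low = e(rss)
. reg food expnethsum in 131/200    /*      high variance subsample; ii) obs 71-130 skipped */
. scalar rss_high = e(rss)
. scalar F_gq = rss_high/rss_low    /* iv) F with (N-c-2k)/2 = 68 df in each tail */
. di F_gq "   5% critical value: " invFtail(68, 68, .05)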

3. Breusch-Pagan test. In most cases involving more than one right-hand-side variable, it is unlikely that you will know which variable is causing the problem. A more general test is therefore to regress an approximation of the (unknown) residual variance on all the right-hand-side variables and test whether they have any significant explanatory power (if they do, suspect heteroskedasticity).

Breusch-Pagan test. Given

Y_i = a + b_1*X_1 + b_2*X_2 + u_i (1)

i) Estimate (1) by OLS and save the residuals û.

ii) Square the residuals and regress them on all the original X variables in (1). These squared OLS residuals proxy the unknown true residual variance and should not be correlated with the X variables:

û_i^2 = g_0 + g_1*X_1 + g_2*X_2 + v_i (2)

Using (2), either compute

F = [R2_auxiliary/(k-1)] / [(1-R2_auxiliary)/(N-k)] ~ F[k-1, N-k]

i.e. a test of the goodness of fit of the model in this auxiliary regression, or compute

N*R2_auxiliary ~ χ2(k-1)

(k-1 since we are not testing the constant). If F or N*R2_auxiliary exceeds the respective critical value, reject the null of no heteroskedasticity.

Example: Breusch-Pagan test of heteroskedasticity. The data set smoke.dta contains information on the smoking habits, wages, age and gender of a cross-section of individuals.

. u smoke.dta   /* read in data */
. reg lhw age age2 female smoke

      Source |       SS       df       MS              Number of obs =    7970
-------------+------------------------------           F(  4,  7965) =  284.04
       Model |  304.964893     4  76.2412233           Prob > F      =  0.0000
    Residual |  2137.94187  7965  .268417059           R-squared     =  0.1248
-------------+------------------------------           Adj R-squared =  0.1244
       Total |  2442.90677  7969  .306551232           Root MSE      =  .51809

------------------------------------------------------------------------------
         lhw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0728466   .0031712    22.97   0.000     .0666301    .0790631
        age2 |   -.000847   .0000382   -22.17   0.000    -.0009219   -.0007721
      female |  -.2583456   .0116394   -22.20   0.000    -.2811618   -.2355294
      smokes |  -.1501679   .0128866   -11.65   0.000    -.1754291   -.1249068
       _cons |   .8732505    .062907    13.88   0.000     .7499363    .9965646
------------------------------------------------------------------------------

. predict reshat, resid   /* save residuals */
. g reshat2=reshat^2      /* square them */

/* regress square of residuals on all original rhs variables */
. reg reshat2 age age2 female smoke

      Source |       SS       df       MS              Number of obs =    7970
-------------+------------------------------           F(  4,  7965) =    6.59
       Model |  13.2179958     4  3.30449895           Prob > F      =  0.0000
    Residual |  3996.90523  7965  .501808566           R-squared     =  0.0033
-------------+------------------------------           Adj R-squared =  0.0028
       Total |  4010.12323  7969  .503215363           Root MSE      =  .70838

------------------------------------------------------------------------------
     reshat2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0012546    .004336     0.29   0.772    -.0072452    .0097544
        age2 |   .0000252   .0000522     0.48   0.630    -.0000772    .0001276
      female |   .0022702   .0159145     0.14   0.887    -.0289264    .0334668
      smokes |  -.0174587   .0176199    -0.99   0.322    -.0519983    .0170808
       _cons |   .1766929   .0860128     2.05   0.040     .0080854    .3453004
------------------------------------------------------------------------------

The Breusch-Pagan statistic is N*R2:

. di 7970*.0033
26.301

which is chi-squared with k-1 degrees of freedom (4 in this case); the critical value is 9.48, so the estimated value exceeds the critical value and we reject the null of no heteroskedasticity.

Similarly, the F test for goodness of fit in the top right corner of the Stata output is a test of the joint significance of all the rhs variables in this model (excluding the constant). From the F tables, Fcritical at the 5% level (4, 7965) = 2.37, so the estimated F = 6.59 > Fcritical and we again reject the null of no heteroskedasticity.

Or you could use Stata's version of the Breusch-Pagan test:

. bpagan lhw age age2 female smoke

Breusch-Pagan LM statistic: 2405.986  Chi-sq( 5)  P-value = 0
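Note that bpagan is a user-written command; the same Breusch-Pagan/Cook-Weisberg test is available through Stata's built-in post-estimation command estat hettest (by default it uses the fitted values rather than the full set of regressors, so list the variables to match the version above):

. reg lhw age age2 female smoke
. estat hettest                         /* default: based on fitted values */
. estat hettest age age2 female smoke   /* based on the original regressors */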

What to do if heteroskedasticity is present?

1. Try a different functional form. Sometimes taking logs of the dependent or explanatory variables can reduce the problem.

. reg food expnethsum if exp<1000

      Source |       SS       df       MS              Number of obs =     192
-------------+------------------------------           F(  1,   190) =  110.00
       Model |  21179.4196     1  21179.4196           Prob > F      =  0.0000
    Residual |  36583.9436   190  192.547072           R-squared     =  0.3667
-------------+------------------------------           Adj R-squared =  0.3633
       Total |  57763.3632   191  302.425986           Root MSE      =  13.876

------------------------------------------------------------------------------
        food |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  expnethsum |   .0504532   .0048106    10.49   0.000     .0409641    .0599423
       _cons |   24.60655   1.770093    13.90   0.000     21.11499     28.0981
------------------------------------------------------------------------------

. bpagan expn

Breusch-Pagan LM statistic: 7.54351  Chi-sq( 1)  P-value = .006

The Breusch-Pagan test indicates the presence of heteroskedasticity (the estimated chi-squared value exceeds the critical value). This means the standard errors, t statistics etc. are biased.

If we use the log of the dependent variable rather than the level:

. g lfood=log(food)
. reg lfood expnethsum if exp<1000

      Source |       SS       df       MS              Number of obs =     192
-------------+------------------------------           F(  1,   190) =  109.54
       Model |  15.6957883     1  15.6957883           Prob > F      =  0.0000
    Residual |  27.2248994   190  .143288944           R-squared     =  0.3657
-------------+------------------------------           Adj R-squared =  0.3624
       Total |  42.9206877   191  .224715642           Root MSE      =  .37854

------------------------------------------------------------------------------
       lfood |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  expnethsum |   .0013735   .0001312    10.47   0.000     .0011146    .0016323
       _cons |   3.166388   .0482874    65.57   0.000     3.071139    3.261636
------------------------------------------------------------------------------

. bpagan expnethsum

Breusch-Pagan LM statistic: 1.398311  Chi-sq( 1)  P-value = .237

and the null of no heteroskedasticity is no longer rejected.

2. Drop outliers. Sometimes heteroskedasticity can be influenced by 1 or 2 observations in the data set which stand a long way from the main concentration of the data - outliers. Often these observations may be genuine, in which case you should not drop them, but sometimes they may be the result of measurement error or miscoding, in which case you may have a case for dropping them.

Example: the data set infmort.dta gives infant mortality for 51 U.S. states along with the number of doctors per capita in each state. A graph of infant mortality against the number of doctors clearly shows that Washington D.C. is something of an outlier (it has lots of doctors but also a very high infant mortality rate).

. twoway (scatter infmort state, mlabel(state)), ytitle(infmort) ylabel(, labels) xtitle(state)

[Figure: infant mortality by state, with state labels; the dc observation lies far above all the others]

A regression of infant mortality on (the log of) doctor numbers for all 51 observations suffers from heteroskedasticity:

. reg infmort ldocs

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  1,    49) =    4.08
       Model |  17.7855153     1  17.7855153           Prob > F      =  0.0488
    Residual |  213.461954    49   4.3563664           R-squared     =  0.0769
-------------+------------------------------           Adj R-squared =  0.0581
       Total |  231.247469    50  4.62494938           Root MSE      =  2.0872

------------------------------------------------------------------------------
     infmort |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ldocs |   2.130049   1.054189     2.02   0.049     .0115765    4.248522
       _cons |  -1.959674   5.572467    -0.35   0.727    -13.15797    9.238617
------------------------------------------------------------------------------

. bpagan ldocs

Breusch-Pagan LM statistic: 67.14974  Chi-sq( 1)  P-value = 2.5e-16

However, if the outlier is excluded:

. reg infmort ldocs if dc==0

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =    5.13
       Model |  9.49879378     1  9.49879378           Prob > F      =  0.0280
    Residual |  88.8244081    48   1.8505085           R-squared     =  0.0966
-------------+------------------------------           Adj R-squared =  0.0778
       Total |  98.3232019    49  2.00659596           Root MSE      =  1.3603

------------------------------------------------------------------------------
     infmort |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ldocs |  -1.915912   .8456428    -2.27   0.028    -3.616191   -.2156336
       _cons |   19.12582   4.448765     4.30   0.000     10.18098    28.07066
------------------------------------------------------------------------------

. bpagan ldocs

Breusch-Pagan LM statistic: .0825086  Chi-sq( 1)  P-value = .7739

The problem of heteroskedasticity disappears, though the D.C. observation is genuine, so you need to think carefully about the benefits of dropping it against the costs.

3. Feasible GLS. If (and this is a big if) you think you know the exact functional form of the heteroskedasticity - e.g. you know that Var(u_i) = σ^2*X_1i^2 (and not, say, σ^2*X_2i^3) - so that there is a common component to the variance, σ^2, and a part that rises with the square of the level of the variable X_1.

Consider the term u_i/X_i. Its variance is

Var(u_i/X_i) = (1/X_i^2)*Var(u_i) = (1/X_i^2)*σ^2*X_i^2 = σ^2

so the variance of this transformed residual is constant for all observations in the data set. This means that if we divide all the observations by X_i (not X_i^2), then

Y_i = b_0 + b_1*X_i + u_i (1)

becomes

Y_i/X_i = b_0/X_i + b_1*(X_i/X_i) + u_i/X_i (2)

and the estimates of b_0 and b_1 in (2) will not be affected by heteroskedasticity.

This is called a feasible generalised least squares (FGLS) estimator, and it will be more efficient than OLS IF the assumption about the form of the heteroskedasticity is correct. If not, the solution may be much worse than OLS.

Example:

. reg hourpay age

      Source |       SS       df       MS              Number of obs =   12098
-------------+------------------------------           F(  1, 12096) =  133.08
       Model |  5207.03058     1  5207.03058           Prob > F      =  0.0000
    Residual |  473292.608 12096  39.1280264           R-squared     =  0.0109
-------------+------------------------------           Adj R-squared =  0.0108
       Total |  478499.638 12097  39.5552317           Root MSE      =  6.2552

------------------------------------------------------------------------------
     hourpay |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0586134    .005081    11.54   0.000     .0486539    .0685729
       _cons |   6.168383   .2066433    29.85   0.000     5.763329    6.573437
------------------------------------------------------------------------------

. bpagan age

Breusch-Pagan LM statistic: 17.27396  Chi-sq( 1)  P-value = 3.2e-05

The test suggests heteroskedasticity is present. Suppose you decide that the heteroskedasticity is given by Var(u_i) = σ^2*Age_i. Then transform the variables by dividing by the SQUARE ROOT of Age (including the constant):

. g ha=hourpay/sqrt(age)
. g aa=age/sqrt(age)
. g ac=1/sqrt(age)   /* this is the new constant term */

. reg ha aa ac, nocon

      Source |       SS       df       MS              Number of obs =   12098
-------------+------------------------------           F(  2, 12096) = 10990.27
       Model |   22854.251     2  11427.1255           Prob > F      =  0.0000
    Residual |  12576.8073 12096  1.03974928           R-squared     =  0.6450
-------------+------------------------------           Adj R-squared =  0.6450
       Total |  35431.0584 12098  2.92867072           Root MSE      =  1.0197

------------------------------------------------------------------------------
          ha |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          aa |   .0932672   .0049446    18.86   0.000     .0835749    .1029594
          ac |   4.813435    .184437    26.10   0.000     4.451908    5.174961
------------------------------------------------------------------------------

If the heteroskedasticity assumption is correct, these are the GLS estimates and should be preferred to OLS. If the assumption is not correct, they will be misleading.
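Note that the same transformed regression can be run in one step with analytic weights, since reg with [aw=1/age] minimises the sum of squared residuals weighted by 1/age, which is exactly what dividing every variable (including the constant) by sqrt(age) achieves. A sketch, which should reproduce the coefficient estimates above:

. reg hourpay age [aw=1/age]   /* WLS under the assumption Var(u_i) = sigma^2 * age_i */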

4. White adjustment (OLS robust standard errors). As with autocorrelation, the best fix may be to make the OLS standard errors unbiased (if inefficient) when we do not know the precise form of the heteroskedasticity. In the absence of heteroskedasticity, the OLS estimate of the variance of any slope coefficient is Var(b1^OLS) = σ^2 / Σ(X_i - Xbar)^2; White's adjustment replaces the single common σ^2 with the squared OLS residuals, observation by observation, giving standard errors that are valid whatever the form of the heteroskedasticity.
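In Stata, White's heteroskedasticity-robust standard errors are requested with the robust option; the coefficient estimates are exactly the OLS ones, and only the standard errors (and hence the t and F statistics) change. A sketch using the wage example above:

. reg hourpay age, robust   /* same coefficients, White-adjusted standard errors */

Comparing this output with the plain reg output above shows how far the conventional standard errors are affected by the heteroskedasticity in this data set.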