Lecture 19. Stationarity, and a common problem in cross section estimation: heteroskedasticity


1 Lecture 19

Learning to worry about and deal with stationarity
A common problem in cross section estimation: heteroskedasticity

- What is it?
- Why does it matter?
- What to do about it?

2 Stationarity

Ultimately, whether you can sensibly include lags of either the dependent or explanatory variables (or indeed the current level of a variable) in a regression also depends on whether the time series data that you are analysing are stationary.

A variable is said to be (weakly) stationary if
1) its mean
2) its variance
3) its autocovariance Cov(Y_t, Y_t-s), for s ≠ 0
do not change over time.

Stationarity is needed if the Gauss-Markov conditions for unbiased, efficient OLS estimation are to be met by time series data. (Essentially, any variable that is trended is unlikely to be stationary.)

3 Example: A plot of nominal GDP over time using the data set stationary.dta

use "E:\qm2\Lecture 17\stationary.dta", clear
two (scatter gdp year)

GDP displays a distinct upward trend and so is unlikely to be stationary: neither its mean value nor its variance is stable over time.

su gdp if year<1980
su gdp if year>=1980

(summary output omitted; the mean and standard deviation of gdp differ markedly across the two subperiods)

4 Some series are already stationary: there is no obvious trend, and there is some sort of reversion to a long-run value. The UK inflation rate is one example (from the data set stationary.dta):

two (line inflation year)

5 In general, just looking at the time series of a variable will not be enough to judge whether the variable is stationary or not (though it is good practice to graph the series anyway).

If a variable is not stationary then its values are persistent: the level of the variable at some point in the past continues to influence the level of the variable today.

The simplest way of modelling the persistence of a non-stationary process is the random walk

Y_t = Y_t-1 + e_t

- the value of Y today equals last period's value plus an unpredictable random error e (hence the name) and no other lags. This means that the best forecast of this period's level is last period's level.
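The lecture's examples use Stata; as a quick illustration in Python (not part of the original notes; the seed and sample length are arbitrary choices), a simulated random walk shows the forecast property directly: the one-step forecast error Y_t - Y_t-1 is just the shock e_t, so the errors average out to roughly zero.

```python
import random

random.seed(0)

def random_walk(T):
    """Simulate Y_t = Y_{t-1} + e_t with Y_0 = 0 and e_t ~ N(0, 1)."""
    y = [0.0]
    for _ in range(T):
        y.append(y[-1] + random.gauss(0, 1))
    return y

y = random_walk(200)

# Using last period's level as the forecast of this period's level,
# the forecast error Y_t - Y_{t-1} is just the unpredictable shock e_t,
# so the errors average out to roughly zero.
errors = [y[t] - y[t - 1] for t in range(1, len(y))]
mean_error = sum(errors) / len(errors)
```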

11 Y_t = Y_t-1 + e_t

is similar to the AR(1) model used for autocorrelation,

Y_t = ρY_t-1 + e_t

but with the coefficient ρ set to 1. A coefficient of one means that the series is a unit root process.

14 Since many series (like GDP) have an obvious trend, we can adapt this model to allow for a movement ("drift") in one direction or the other by adding a constant term. So

Y_t = Y_t-1 + e_t

becomes

Y_t = b0 + Y_t-1 + e_t

This is a random walk with drift: the best forecast of this period's level is now last period's value plus a positive constant b0 (a more realistic model of GDP growing at, say, 2% a year).

Can also model this by adding a time trend (t = year):

Y_t = b0 + Y_t-1 + t + e_t

What this means is that a series can be stationary around an upward (or downward) trend.
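As an illustrative Python sketch (the drift value b0 = 0.5 is an arbitrary choice, not from the lecture), a short simulation confirms that a random walk with drift trends at roughly b0 per period:

```python
import random

random.seed(1)
b0 = 0.5  # illustrative drift term (arbitrary value)

# Random walk with drift: Y_t = b0 + Y_{t-1} + e_t
y = [0.0]
for _ in range(500):
    y.append(b0 + y[-1] + random.gauss(0, 1))

# The average one-period change converges to the drift b0,
# so the series trends upward at roughly b0 per period.
changes = [y[t] - y[t - 1] for t in range(1, len(y))]
avg_change = sum(changes) / len(changes)
```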

21 Consequences

Can show that if variables are NOT stationary then:

1. OLS t values on any variables are biased
2. This often leads to spurious regression: variables appear to be related (significant in a regression) only because both are trended; if the trend were taken out, they would not be
3. OLS estimates of the coefficient on a lagged dependent variable are biased toward zero
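The spurious regression problem can be demonstrated by simulation: regressing one random walk on another, completely independent, random walk produces "significant" t statistics far more often than the nominal 5%. A Python sketch (not from the lecture; sample sizes and seed are arbitrary):

```python
import math
import random

random.seed(2)

def walk(T):
    """A pure random walk of length T."""
    y, out = 0.0, []
    for _ in range(T):
        y += random.gauss(0, 1)
        out.append(y)
    return out

def t_of_slope(x, y):
    """t-statistic on the slope from a simple OLS regression of y on x."""
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (T - 2)
    return b / math.sqrt(s2 / sxx)

# If the t test were valid it should reject at the 5% level about
# 5% of the time; with two independent non-stationary series it
# rejects far more often -- the spurious regression problem.
rejects = sum(abs(t_of_slope(walk(200), walk(200))) > 1.96
              for _ in range(50))
```

With non-stationary series the rejection count is typically well over half of the 50 simulations, rather than the 2 or 3 a valid 5% test would give.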

25 Note that any concerns about endogeneity are dwarfed by the issue of stationarity, since the OLS estimate satisfies

b^_OLS = b + Cov(X, u)/Var(X)

and in non-stationary series the variance of X goes to infinity as the sample size T (the number of time periods) increases, so the 2nd term effectively goes to zero and endogeneity is less of an issue in (long) time series data.
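The claim that the variance of a non-stationary series grows with T can be checked directly by simulation. An illustrative Python sketch (simulation sizes are arbitrary): for a random walk, Var(Y_T) = T × Var(e), so quadrupling T roughly quadruples the variance.

```python
import random
import statistics

random.seed(3)

def endpoint(T):
    """Level of a random walk after T standard-normal shocks."""
    return sum(random.gauss(0, 1) for _ in range(T))

# For a random walk, Var(Y_T) = T * Var(e): the variance grows
# without bound as the number of time periods T increases.
var_short = statistics.variance(endpoint(50) for _ in range(2000))
var_long = statistics.variance(endpoint(200) for _ in range(2000))
```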

29 Example: Suppose you decide to regress the United States inflation rate on the level of British GDP. There should, in truth, be very little relationship between the two (it is difficult to argue how British GDP could really affect US inflation).

u gdp_sta
reg usinf gdp if year<1980 & quarter==1

(output omitted) For the earlier period the output appears to suggest a significant positive (causal) relationship between the two, and the R2 is also very high.

reg usinf gdp if year>=1980 & quarter==1

For the later period this now gives a significant negative relationship, and the R2 is much lower.

30 In truth it is hard to believe that UK GDP has any real effect on US inflation rates. The reason why there appears to be a significant relation is that both variables trend upward in the 1st period, and the regression picks up the common (but unrelated) trends. This is spurious regression.

twoway (scatter usinf year if year<=1980) (scatter gdp year if year<=1980, yaxis(2))

31 The same plot for the later period:

twoway (scatter usinf year if year>1980) (scatter gdp year if year>1980, yaxis(2))

32 What to do? Make the variables stationary and OLS will be OK.

Often the easiest way to do this is by differencing the data (ie taking last period's value away from this period's value).

Eg if Y_t = Y_t-1 + e_t is non-stationary, take Y_t-1 over to the other side to get the difference

Y_t - Y_t-1 = ΔY_t = e_t

which should be stationary, ie random and not trended, since the differenced variable is just equal to the random error term, which has no trend or systematic behaviour.
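A quick Python sketch (illustrative only; seed and length are arbitrary) confirms that differencing a random walk recovers a series whose variance is stable across subperiods:

```python
import random
import statistics

random.seed(4)

# Build a random walk, then take its first difference.
y = [0.0]
for _ in range(1000):
    y.append(y[-1] + random.gauss(0, 1))
dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # recovers the shocks e_t

# The differenced series has a stable variance across subperiods,
# unlike the level of the series.
first_half_var = statistics.variance(dy[:500])
second_half_var = statistics.variance(dy[500:])
```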

38 Example: The % change in GDP looks more likely to be stationary.

use "E:\qm2\Lecture 17\stationary.dta", clear

By inspection there is no obvious trend in the difference of GDP over time (and hence the mean and variance look reasonably stable over time).

39 Note: Sometimes taking the (natural) log of a series can make the standard deviation of the log of the series constant. If the series is exponential (as GDP sometimes is) then the log of the series will be linear, and the standard deviation of the log across subperiods will be constant (if the series changes by the same proportional amount in each period, then the log of the series changes by the same absolute amount in each subperiod).
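This log property can be illustrated with a simulated exponential series (the 2% growth rate and noise level below are arbitrary choices, not estimates): the spread of log changes is the same in every subperiod, while the spread of level changes grows with the level itself.

```python
import math
import random
import statistics

random.seed(5)

# A series growing by roughly the same *proportion* (about 2%) each
# period, with proportional noise -- an exponential series like GDP.
y = [100.0]
for _ in range(400):
    y.append(y[-1] * math.exp(0.02 + random.gauss(0, 0.05)))

logs = [math.log(v) for v in y]
log_changes = [logs[t] - logs[t - 1] for t in range(1, len(logs))]
level_changes = [y[t] - y[t - 1] for t in range(1, len(y))]

# Log changes have the same spread in every subperiod ...
sd_log_early = statistics.stdev(log_changes[:200])
sd_log_late = statistics.stdev(log_changes[200:])
# ... while changes in the level grow with the level itself.
sd_level_early = statistics.stdev(level_changes[:200])
sd_level_late = statistics.stdev(level_changes[200:])
```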

43 In practice it is not always easy to tell just by looking at a series whether it is a random walk (non-stationary) or not, so we need to test this formally.

44 Use the Dickey-Fuller test

45 Detection

Given that Y_t = Y_t-1 + e_t is non-stationary (1)

but Y_t = bY_t-1 + e_t is stationary if b < 1 (2)

(can show that the variance of Y is constant for (2)), the test of stationarity is a test of whether b = 1.

In practice, subtract Y_t-1 from both sides of (2):

Y_t - Y_t-1 = bY_t-1 - Y_t-1 + e_t
ΔY_t = (b-1)Y_t-1 + e_t
ΔY_t = gY_t-1 + e_t (3)

and test whether the coefficient g = b-1 = 0 (if g = 0 then b = 1). If so, the data follow a random walk and the variable is non-stationary.

This is called the Dickey-Fuller Test.
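The test in (3) amounts to an OLS regression of ΔY_t on Y_t-1 and a look at the t statistic on g. An illustrative Python sketch (not the lecture's Stata code; sample length, seed and the AR coefficient 0.5 are arbitrary): for a stationary AR(1) series the t statistic is large and negative, while for a random walk it sits near zero.

```python
import math
import random

random.seed(6)

def dickey_fuller_t(y):
    """t-statistic on g in the no-constant regression dY_t = g*Y_{t-1} + e_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lag = y[:-1]
    n = len(dy)
    sxx = sum(v * v for v in lag)  # no constant term in this regression
    g = sum(v * d for v, d in zip(lag, dy)) / sxx
    resid = [d - g * v for v, d in zip(lag, dy)]
    s2 = sum(e * e for e in resid) / (n - 1)
    return g / math.sqrt(s2 / sxx)

def simulate(b, T=300):
    """Y_t = b*Y_{t-1} + e_t; a unit root process when b = 1."""
    y = [0.0]
    for _ in range(T):
        y.append(b * y[-1] + random.gauss(0, 1))
    return y[1:]

t_unit_root = dickey_fuller_t(simulate(1.0))   # random walk: t near zero
t_stationary = dickey_fuller_t(simulate(0.5))  # stationary AR(1): t far below -1.94
```

Note that the t statistic from this regression does not follow the usual t distribution, which is exactly why the Dickey-Fuller critical values on the next slide are needed.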

58 So estimate ΔY_t = gY_t-1 + e_t by OLS and accept the null of a random walk if g is not significantly different from zero.

It turns out that the critical values of this test differ from the usual t-test critical values. Use instead the (asymptotic) 5% critical values: 1.94, or 2.86 if there is a constant in the regression, and 3.41 if there is a constant and a time trend in the regression (the Dickey-Fuller statistic is negative, so these are compared in absolute value).

As a general rule, only regress variables that are stationary on each other.

64 If a variable fails the Dickey-Fuller test (ie the null of a unit root cannot be rejected), try using the difference of that variable instead.

65 Example: To test formally whether UK house prices are stationary or not.

u price_sta
tsset TIME
g dprice=price-price[_n-1] /* creates 1st difference variable */
g d2price=dprice-dprice[_n-1]
reg dprice l.price
dfuller price

(output omitted) Since the estimated t value is smaller in absolute terms than the Dickey-Fuller critical value (2.86), we can't reject the null that g = 0 (and b = 1), so the original series (ie the level of, not the change in, prices) follows a random walk. So conclude that house prices are a non-stationary series.

If we repeat the test for the 1st difference in prices (ie the change in prices):

66 reg d2price l.dprice

(output omitted) Since the estimated t value is now larger in absolute terms than the Dickey-Fuller critical value (2.86), reject the null that g = 0 (and b = 1), so the new series (ie the change in, not the level of, prices) is stationary.

Should therefore use the change in prices rather than the level of prices in any OLS estimation (the same test should be applied to any other variables used in a regression).

Note: Stata will do (a variant of) this test automatically; note that the critical values are different, since Stata includes lagged values of the dependent variable in the test (the augmented Dickey-Fuller test).

dfuller dprice, regress

The p value is < .05, so again reject the null that g = 0 (and b = 1).

67 Heteroskedasticity

Occurs when the Gauss-Markov assumption that the residual variance is constant across all observations in the data set fails, so that

E(u_i^2 | X_i) = σ_i^2 ≠ σ^2

In practice this means the spread of observations at any given value of X will not now be constant.

Eg food expenditure is known to vary much more at higher levels of income than at lower levels of income, and the level of profits tends to vary more across large firms than across small firms.

71 Example: the data set food.dta contains information on food expenditure and income. A graph of the residuals from a regression of food spending on total household expenditure (net of housing) shows clearly that the residuals tend to be more spread out at higher levels of income; this is the typical pattern associated with heteroskedasticity.

reg food expnethsum
predict res, resid
two (scatter res expnet if expnet<500)

72 Consequences of Heteroskedasticity

Can show:

1) OLS estimates of the coefficients remain unbiased (as with autocorrelation), since given

Y_i = b0 + b1 X_i + u_i

and

b1^_OLS = Cov(X, Y)/Var(X)

then substituting in Y_i = b0 + b1 X_i + u_i gives

b1^_OLS = b1 + Cov(X, u)/Var(X)

The heteroskedasticity assumption E(u_i^2 | X_i) ≠ σ^2 does not affect the condition Cov(X, u) = 0 needed to prove unbiasedness, so the OLS estimate of the coefficients remains unbiased in the presence of heteroskedasticity.

80 2) But can show that heteroskedasticity (like autocorrelation) means the OLS estimates of the standard errors (and hence t and F tests) are biased.

(Intuitively, if observations are distributed unevenly about the regression line then OLS is unable to distinguish the quality of the observations: observations further away from the regression line should be given less weight in the calculation of the standard errors (since they are more unreliable), but OLS can't do this, so the standard errors are biased.)

82 Testing for Heteroskedasticity

1. Residual Plots

In the absence of heteroskedasticity there should be no obvious pattern to the spread of the residuals, so it is useful to plot the residuals against the X variable thought to be causing the problem, assuming you know which X variable it is (often difficult).

84 2. Goldfeld-Quandt

Again, assuming we know which variable is causing the problem, we can test formally whether the residual spread varies with the values of the suspect X variable.

i) Order the data by the size of the X variable and split the data into 2 equal sub-groups (one high variance, the other low variance)
ii) Drop the middle c observations, where c is approximately 30% of the sample
iii) Run separate regressions for the high and low variance subsamples
iv) Compute

F = RSS_high variance subsample / RSS_low variance subsample ~ F[(N-c-2k)/2, (N-c-2k)/2]

v) If estimated F > Fcritical, reject the null of no heteroskedasticity (intuitively, the residuals from the high variance sub-sample are much larger than the residuals from the low variance sub-sample)

Fine if certain which variable is causing the problem, less so if unsure.
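The steps above can be sketched in Python on simulated heteroskedastic data (all parameter values below are arbitrary illustrations, not from the lecture):

```python
import random

random.seed(7)

def rss_from_ols(points):
    """Residual sum of squares from a simple OLS fit of y on x."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / sxx
    a = my - b * mx
    return sum((p[1] - a - b * p[0]) ** 2 for p in points)

# Simulated data where the error spread grows with x (heteroskedastic):
# y = 2 + 3x + u, with sd(u) proportional to x.
pts = [(x, 2 + 3 * x + random.gauss(0, 0.5 * x))
       for x in (random.uniform(1, 10) for _ in range(300))]

pts.sort()                # i) order the data by the size of x
low = pts[:105]           # ii) drop the middle ~30% (90 observations)
high = pts[-105:]
# iii)-iv) separate regressions; F = ratio of residual variances
F = (rss_from_ols(high) / (len(high) - 2)) / (rss_from_ols(low) / (len(low) - 2))
```

With this data-generating process F comes out well above the F critical value, so step v) rejects the null of no heteroskedasticity.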

85 3. Breusch-Pagan Test

In most cases involving more than one right hand side variable it is unlikely that you will know which variable is causing the problem. A more general test is therefore to regress an approximation of the (unknown) residual variance on all the right hand side variables and test for a significant causal effect (if there is one, suspect heteroskedasticity).

86 Breusch-Pagan Test

Given Y_i = a + b1 X1 + b2 X2 + u_i (1)

i) Estimate (1) by OLS and save the residuals u^
ii) Square the residuals and regress them on all the original X variables in (1); these squared OLS residuals proxy the unknown true residual variance and, under the null, should not be correlated with the X variables:

u^2_i = g0 + g1 X1 + g2 X2 + v_i (2)

Using (2), either compute

F = [R2_auxiliary / (k-1)] / [(1 - R2_auxiliary) / (N-k)] ~ F[k-1, N-k]

ie a test of the goodness of fit of this auxiliary regression, or compute

N*R2_auxiliary ~ χ2(k-1) (k-1 since not testing the constant)

If F or N*R2_auxiliary exceeds the respective critical value, reject the null of no heteroskedasticity.
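The N*R2 version of the test can be sketched in Python on simulated data (one regressor only for brevity; all parameter values are arbitrary illustrations, not the lecture's Stata example):

```python
import random

random.seed(8)

def ols(x, y):
    """Simple OLS of y on a constant and one regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b  # intercept, slope

n = 400
x = [random.uniform(0, 10) for _ in range(n)]
# Heteroskedastic errors: sd(u) grows with x.
y = [1 + 2 * xi + random.gauss(0, 1 + 0.5 * xi) for xi in x]

# i) estimate the model by OLS and save the squared residuals
a, b = ols(x, y)
u2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]

# ii) regress the squared residuals on x; the test statistic is
#     N * R^2 of this auxiliary regression ~ chi-squared(k-1)
g0, g1 = ols(x, u2)
mean_u2 = sum(u2) / n
ss_total = sum((v - mean_u2) ** 2 for v in u2)
ss_model = sum((g0 + g1 * xi - mean_u2) ** 2 for xi in x)
bp_statistic = n * ss_model / ss_total
```

With one regressor in the auxiliary regression the statistic is compared against the chi-squared(1) 5% critical value of 3.84; the simulated heteroskedasticity pushes it well past that, so the test rejects.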

94 Example: Breusch-Pagan Test of Heteroskedasticity

The data set smoke.dta contains information on the smoking habits, wages, age and gender of a cross-section of individuals.

u smoke.dta /* read in data */
reg lhw age age2 female smoke
predict reshat, resid /* save residuals */
g reshat2=reshat^2 /* square them */
reg reshat2 age age2 female smoke /* regress square of residuals on all original rhs variables */

(output omitted; the auxiliary regression has N = 7970 and F(4, 7965) = 6.59)

95 reshat2 Coef. Std. Err. t P> t [95% Conf. Interval] age age female smokes _cons The Breusch-Pagan test is N*R^2. di 7970* which is chi-squared with k-1 degrees of freedom (4 in this case); the 5% critical value of a chi-squared(4) is 9.49, so the estimated value exceeds the critical value. Similarly the F test for goodness of fit in the top right corner of the Stata output is a test for joint significance of all the rhs variables in this model (excluding the constant). From the F tables, Fcritical at the 5% level (4,7970) = 2.37. So estimated F = 6.59 > Fcritical, so reject the null of no heteroskedasticity. Or could use Stata's version of the Breusch-Pagan test bpagan lhw age age2 female smoke Breusch-Pagan LM statistic: Chi-sq( 5) P-value = 0
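The by-hand steps above can also be sketched outside Stata. The following is an illustrative Python sketch on simulated data (the variable names mimic the smoke.dta example but the numbers are invented; NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 60, n)
female = rng.integers(0, 2, n).astype(float)
# build heteroskedasticity in by construction: residual spread grows with age
u = rng.normal(0.0, 0.02 * age)
lhw = 1.0 + 0.03 * age - 0.1 * female + u

def ols(X, y):
    """OLS coefficients and residuals; X must already contain a constant."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

X = np.column_stack([np.ones(n), age, female])
_, resid = ols(X, lhw)            # step i): estimate by OLS, save residuals
_, aux_resid = ols(X, resid**2)   # step ii): regress squared residuals on the Xs
e2 = resid**2
r2_aux = 1.0 - (aux_resid**2).sum() / ((e2 - e2.mean())**2).sum()
bp = n * r2_aux                   # N*R^2_auxiliary ~ chi-squared(k-1), here k-1 = 2

print(bp, bp > 5.99)              # 5.99 is the 5% critical value of chi-squared(2)
```

With this simulated design the statistic should comfortably exceed the critical value, mirroring the rejection in the wage example.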

96 What to do if heteroskedasticity is present? 1. Try a different functional form Sometimes taking logs of the dependent or explanatory variables can reduce the problem

97 . reg food expnethsum if exp<1000 Source SS df MS Number of obs = F( 1, 190) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = food Coef. Std. Err. t P> t [95% Conf. Interval] expnethsum _cons bpagan expn Breusch-Pagan LM statistic: Chi-sq( 1) P-value =.006 The Breusch-Pagan test indicates the presence of heteroskedasticity (estimated chi-squared value > critical value). This means the standard errors, t statistics etc are biased If we use the log of the dependent variable rather than the level. g lfood=log(food). reg lfood expnethsum if exp<1000 Source SS df MS Number of obs = F( 1, 190) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = lfood Coef. Std. Err. t P> t [95% Conf. Interval] expnethsum _cons bpagan expnethsum Breusch-Pagan LM statistic: Chi-sq( 1) P-value =.237 and the test no longer rejects the null of homoskedasticity.
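To see why logs can help, here is a simulated sketch (invented data, not the food-expenditure file; NumPy assumed): when the error is multiplicative, the levels regression is heteroskedastic but the log regression is not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
expend = rng.uniform(100, 1000, n)                            # invented regressor
food = np.exp(1.0 + 0.003 * expend + rng.normal(0, 0.2, n))   # multiplicative error

def bp_stat(y, x):
    """N*R^2 from regressing squared OLS residuals on x (Breusch-Pagan form)."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    g, *_ = np.linalg.lstsq(X, e2, rcond=None)
    r2 = 1.0 - ((e2 - X @ g) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
    return len(y) * r2

print(bp_stat(food, expend), bp_stat(np.log(food), expend))
```

The first statistic should sit far above any chi-squared(1) critical value, while the second stays in the usual homoskedastic range, just as the P-values move from .006 to .237 in the Stata example.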

99 2. Drop Outliers Sometimes heteroskedasticity can be influenced by 1 or 2 observations in the data set which stand a long way from the main concentration of data - outliers. Often these observations may be genuine in which case you should not drop them but sometimes they may be the result of measurement error or miscoding in which case you may have a case for dropping them.

100 Example The data set infmort.dta gives infant mortality for 51 U.S. states along with the number of doctors per capita in each state. A graph of infant mortality against number of doctors clearly shows that Washington D.C. is something of an outlier (it has lots of doctors but also a very high infant mortality rate). twoway (scatter infmort state, mlabel(state)), ytitle(infmort) ylabel(, labels) xtitle(state) [scatter plot of infmort by state: dc sits far above every other state] A regression of infant mortality on (the log of) doctor numbers for all 51 observations suffers from heteroskedasticity. reg infmort ldocs Source SS df MS Number of obs = F( 1, 49) = 4.08 Model Prob > F =

101 Residual R-squared = Adj R-squared = Total Root MSE = infmort Coef. Std. Err. t P> t [95% Conf. Interval] ldocs _cons bpagan ldocs Breusch-Pagan LM statistic: Chi-sq( 1) P-value = 2.5e-16 However if the outlier is excluded then. reg infmort ldocs if dc==0 Source SS df MS Number of obs = F( 1, 48) = 5.13 Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = infmort Coef. Std. Err. t P> t [95% Conf. Interval] ldocs _cons bpagan ldocs Breusch-Pagan LM statistic: Chi-sq( 1) P-value =.7739 The problem of heteroskedasticity disappears, though the D.C. observation is genuine, so you need to weigh the benefits of dropping it against the costs.
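A simulated sketch (invented numbers, not the infmort data; NumPy assumed) of how a single outlier can generate a heteroskedasticity rejection on its own:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, n)   # homoskedastic by construction
x[0], y[0] = 3.0, 15.0                        # one D.C.-style outlier

def bp_stat(y, x):
    """N*R^2 from regressing squared OLS residuals on x."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    g, *_ = np.linalg.lstsq(X, e2, rcond=None)
    r2 = 1.0 - ((e2 - X @ g) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
    return len(y) * r2

print(bp_stat(y, x), bp_stat(y[1:], x[1:]))   # with vs without the outlier
```

The statistic with the outlier included should be much larger than without it, even though the underlying errors are homoskedastic, which is why it pays to graph the data before reaching for a formal fix.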

102 3. Feasible GLS

If (and this is a big if) you think you know the exact functional form of the heteroskedasticity

eg you know that Var(u_i) = σ^2 X_1i^2 (and not say σ^2 X_2i^3)

so that there is a common component to the variance, σ^2, and a part that rises with the square of the level of the variable X_1

Consider the term u_i/X_i. Its variance is

Var(u_i/X_i) = 1/X_i^2 * Var(u_i) = 1/X_i^2 * σ^2 X_i^2 = σ^2

So the variance of this transformed residual is constant for all observations in the data set

This means that if we divide all the observations by X_i (not X_i^2)

Y_i = b_0 + b_1 X_i + u_i (1)

becomes

Y_i/X_i = b_0/X_i + b_1 X_i/X_i + u_i/X_i (2)

and the estimates of b_0 and b_1 in (2) will not be affected by heteroskedasticity.
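A sketch of this transformation on simulated data (Var(u_i) = σ^2 X_i^2 is built in by construction; NumPy assumed). OLS on the divided-through equation is the GLS estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(1.0, 10.0, n)
sigma = 0.5
u = rng.normal(0.0, sigma * x)      # Var(u_i) = sigma^2 * X_i^2
y = 3.0 + 2.0 * x + u               # Y_i = b0 + b1*X_i + u_i, true b0=3, b1=2

# Transformed equation (2): Y_i/X_i = b0*(1/X_i) + b1 + u_i/X_i
Z = np.column_stack([1.0 / x, np.ones(n)])   # regressors: 1/X_i and the new "constant"
g, *_ = np.linalg.lstsq(Z, y / x, rcond=None)
b0_gls, b1_gls = g

# the transformed error u_i/X_i has constant variance sigma^2, so these
# estimates are efficient when the variance assumption is right
print(b0_gls, b1_gls)               # should sit near the true values 3 and 2
```

Running plain OLS of y on x would also be unbiased here, but the transformed regression has the smaller sampling variance, which is the point of GLS.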

113 This is called a Feasible Generalised Least Squares (FGLS) estimator and will be more efficient than OLS IF the assumption about the form of the heteroskedasticity is correct. If not, the solution may be much worse than OLS.

116 Example. reg hourpay age Source SS df MS Number of obs = F( 1, 12096) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = hourpay Coef. Std. Err. t P> t [95% Conf. Interval] age _cons bpagan age Breusch-Pagan LM statistic: Chi-sq( 1) P-value = 3.2e-05 The test suggests heteroskedasticity is present Suppose you decide that the heteroskedasticity is given by Var(u_i) = σ^2 Age_i So transform the variables by dividing by the SQUARE ROOT of Age (including the constant). g ha=hourpay/sqrt(age). g aa=age/sqrt(age). g ac=1/sqrt(age) /* this is new constant term */. reg ha aa ac, nocon Source SS df MS Number of obs = F( 2, 12096) = Model Prob > F = Residual R-squared = Adj R-squared =

117 Total Root MSE = ha Coef. Std. Err. t P> t [95% Conf. Interval] aa ac If the heteroskedasticity assumption is correct these are the GLS estimates and should be preferred to OLS. If the assumption is not correct they will be misleading.

119 4. White adjustment (OLS robust standard errors) As with autocorrelation, the best fix may be to make the OLS standard errors unbiased (if inefficient) when we don't know the precise form of the heteroskedasticity. In the absence of heteroskedasticity we know the OLS estimate of the variance of any coefficient is Var(bhat_1) = σ^2 / Σ(X_i - Xbar)^2
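A sketch of the robust "sandwich" variance that the White adjustment computes, in its simplest HC0 form (illustrative Python with invented data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1.0, 5.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, x)   # heteroskedastic: error sd grows with x

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

XtX_inv = np.linalg.inv(X.T @ X)
# conventional OLS standard errors assume one common sigma^2
se_ols = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - 2))
# White/HC0 sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1 keeps each e_i^2 separate
V_white = XtX_inv @ (X.T * e**2) @ X @ XtX_inv
se_white = np.sqrt(np.diag(V_white))

print(se_ols, se_white)   # same OLS coefficients either way; only the s.e.s differ
```

In Stata the same adjustment is a one-word option, reg y x, robust: the coefficients are identical to plain OLS and only the standard errors change.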


More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Fall 2008 Environmental Econometrics (GR03) TSII Fall 2008 1 / 16 More on AR(1) In AR(1) model (Y t = µ + ρy t 1 + u t ) with ρ = 1, the series is said to have a unit root or a

More information

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 217, Chicago, Illinois Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control

More information

Outline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity

Outline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity 1/25 Outline Basic Econometrics in Transportation Heteroscedasticity What is the nature of heteroscedasticity? What are its consequences? How does one detect it? What are the remedial measures? Amir Samimi

More information

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) 1 2 Model Consider a system of two regressions y 1 = β 1 y 2 + u 1 (1) y 2 = β 2 y 1 + u 2 (2) This is a simultaneous equation model

More information

Interpreting coefficients for transformed variables

Interpreting coefficients for transformed variables Interpreting coefficients for transformed variables! Recall that when both independent and dependent variables are untransformed, an estimated coefficient represents the change in the dependent variable

More information

Lab 07 Introduction to Econometrics

Lab 07 Introduction to Econometrics Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand

More information

Lab 10 - Binary Variables

Lab 10 - Binary Variables Lab 10 - Binary Variables Spring 2017 Contents 1 Introduction 1 2 SLR on a Dummy 2 3 MLR with binary independent variables 3 3.1 MLR with a Dummy: different intercepts, same slope................. 4 3.2

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

F Tests and F statistics

F Tests and F statistics F Tests and F statistics Testing Linear estrictions F Stats and F Tests F Distributions F stats (w/ ) F Stats and tstat s eported F Stat's in OLS Output Example I: Bodyfat Babies and Bathwater F Stats,

More information

Covers Chapter 10-12, some of 16, some of 18 in Wooldridge. Regression Analysis with Time Series Data

Covers Chapter 10-12, some of 16, some of 18 in Wooldridge. Regression Analysis with Time Series Data Covers Chapter 10-12, some of 16, some of 18 in Wooldridge Regression Analysis with Time Series Data Obviously time series data different from cross section in terms of source of variation in x and y temporal

More information

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends

More information

Wooldridge, Introductory Econometrics, 2d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of

Wooldridge, Introductory Econometrics, 2d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of Wooldridge, Introductory Econometrics, d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of homoskedasticity of the regression error term: that its

More information

Question 1 [17 points]: (ch 11)

Question 1 [17 points]: (ch 11) Question 1 [17 points]: (ch 11) A study analyzed the probability that Major League Baseball (MLB) players "survive" for another season, or, in other words, play one more season. They studied a model of

More information

Autoregressive models with distributed lags (ADL)

Autoregressive models with distributed lags (ADL) Autoregressive models with distributed lags (ADL) It often happens than including the lagged dependent variable in the model results in model which is better fitted and needs less parameters. It can be

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Econometrics Part Three

Econometrics Part Three !1 I. Heteroskedasticity A. Definition 1. The variance of the error term is correlated with one of the explanatory variables 2. Example -- the variance of actual spending around the consumption line increases

More information

Heteroskedasticity Example

Heteroskedasticity Example ECON 761: Heteroskedasticity Example L Magee November, 2007 This example uses the fertility data set from assignment 2 The observations are based on the responses of 4361 women in Botswana s 1988 Demographic

More information

AUTOCORRELATION. Phung Thanh Binh

AUTOCORRELATION. Phung Thanh Binh AUTOCORRELATION Phung Thanh Binh OUTLINE Time series Gauss-Markov conditions The nature of autocorrelation Causes of autocorrelation Consequences of autocorrelation Detecting autocorrelation Remedial measures

More information

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 11 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 30 Recommended Reading For the today Advanced Time Series Topics Selected topics

More information

Semester 2, 2015/2016

Semester 2, 2015/2016 ECN 3202 APPLIED ECONOMETRICS 5. HETEROSKEDASTICITY Mr. Sydney Armstrong Lecturer 1 The University of Guyana 1 Semester 2, 2015/2016 WHAT IS HETEROSKEDASTICITY? The multiple linear regression model can

More information

1 The Multiple Regression Model: Freeing Up the Classical Assumptions

1 The Multiple Regression Model: Freeing Up the Classical Assumptions 1 The Multiple Regression Model: Freeing Up the Classical Assumptions Some or all of classical assumptions were crucial for many of the derivations of the previous chapters. Derivation of the OLS estimator

More information

ECO375 Tutorial 7 Heteroscedasticity

ECO375 Tutorial 7 Heteroscedasticity ECO375 Tutorial 7 Heteroscedasticity Matt Tudball University of Toronto Mississauga November 9, 2017 Matt Tudball (University of Toronto) ECO375H5 November 9, 2017 1 / 24 Review: Heteroscedasticity Consider

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

Lecture 3: Multivariate Regression

Lecture 3: Multivariate Regression Lecture 3: Multivariate Regression Rates, cont. Two weeks ago, we modeled state homicide rates as being dependent on one variable: poverty. In reality, we know that state homicide rates depend on numerous

More information

ECON Introductory Econometrics. Lecture 16: Instrumental variables

ECON Introductory Econometrics. Lecture 16: Instrumental variables ECON4150 - Introductory Econometrics Lecture 16: Instrumental variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 12 Lecture outline 2 OLS assumptions and when they are violated Instrumental

More information

ECON3150/4150 Spring 2016

ECON3150/4150 Spring 2016 ECON3150/4150 Spring 2016 Lecture 6 Multiple regression model Siv-Elisabeth Skjelbred University of Oslo February 5th Last updated: February 3, 2016 1 / 49 Outline Multiple linear regression model and

More information

Econometrics and Structural

Econometrics and Structural Introduction to Time Series Econometrics and Structural Breaks Ziyodullo Parpiev, PhD Outline 1. Stochastic processes 2. Stationary processes 3. Purely random processes 4. Nonstationary processes 5. Integrated

More information

Econometrics Multiple Regression Analysis: Heteroskedasticity

Econometrics Multiple Regression Analysis: Heteroskedasticity Econometrics Multiple Regression Analysis: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19 Properties

More information

Econometrics Lecture 9 Time Series Methods

Econometrics Lecture 9 Time Series Methods Econometrics Lecture 9 Time Series Methods Tak Wai Chau Shanghai University of Finance and Economics Spring 2014 1 / 82 Time Series Data I Time series data are data observed for the same unit repeatedly

More information

Heteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance.

Heteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance. Heteroskedasticity y i = β + β x i + β x i +... + β k x ki + e i where E(e i ) σ, non-constant variance. Common problem with samples over individuals. ê i e ˆi x k x k AREC-ECON 535 Lec F Suppose y i =

More information