Lecture 19. Stationarity, and a common problem in cross section estimation: heteroskedasticity


1 Lecture 19

Learning to worry about and deal with stationarity
A common problem in cross section estimation: heteroskedasticity

- What is it?
- Why does it matter?
- What to do about it?

2 Stationarity

Ultimately, whether you can sensibly include lags of either the dependent or explanatory variables (or indeed the current level of a variable) in a regression also depends on whether the time series data that you are analysing are stationary.

A variable is said to be (weakly) stationary if
1) its mean
2) its variance
3) its autocovariance Cov(Y_t, Y_t-s), for s ≠ 0
do not change over time.

Stationarity is needed if the Gauss-Markov conditions for unbiased, efficient OLS estimation are to be met by time series data. (Essentially, any variable that is trended is unlikely to be stationary.)

3 Example: A plot of nominal GDP over time using the data set stationary.dta

use "E:\qm2\Lecture 17\stationary.dta", clear
two (scatter gdp year)

GDP displays a distinct upward trend and so is unlikely to be stationary: neither its mean value nor its variance is stable over time.

su gdp if year<1980
su gdp if year>=1980

(summary output omitted; the mean and standard deviation of gdp differ markedly across the two subperiods)

4 Some series are already stationary: there is no obvious trend, and there is some sort of reversion to a long-run value. The UK inflation rate is one example (from the data set stationary.dta):

two (line inflation year)

5 In general, just looking at the time series of a variable will not be enough to judge whether the variable is stationary or not (though it is good practice to graph the series anyway).

If a variable is not stationary then its values are persistent: the level of the variable at some point in the past continues to influence the level of the variable today.

The simplest way of modelling the persistence of a non-stationary process is the random walk

Y_t = Y_t-1 + e_t

- the value of Y today equals last period's value plus an unpredictable random error e (hence the name) and no other lags. This means that the best forecast of this period's level is last period's level.
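The lecture's examples use Stata; as a quick illustration in Python (not part of the original notes; the seed and sample length are arbitrary choices), a simulated random walk shows the forecast property directly: the one-step forecast error Y_t - Y_t-1 is just the shock e_t, so the errors average out to roughly zero.

```python
import random

random.seed(0)

def random_walk(T):
    """Simulate Y_t = Y_{t-1} + e_t with Y_0 = 0 and e_t ~ N(0, 1)."""
    y = [0.0]
    for _ in range(T):
        y.append(y[-1] + random.gauss(0, 1))
    return y

y = random_walk(200)

# Using last period's level as the forecast of this period's level,
# the forecast error Y_t - Y_{t-1} is just the unpredictable shock e_t,
# so the errors average out to roughly zero.
errors = [y[t] - y[t - 1] for t in range(1, len(y))]
mean_error = sum(errors) / len(errors)
```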

11 Y_t = Y_t-1 + e_t

is similar to the AR(1) model used for autocorrelation,

Y_t = ρY_t-1 + e_t

but with the coefficient ρ set to 1. A coefficient of one means that the series is a unit root process.

14 Since many series (like GDP) have an obvious trend, we can adapt this model to allow for a movement ("drift") in one direction or the other by adding a constant term. So

Y_t = Y_t-1 + e_t

becomes

Y_t = b0 + Y_t-1 + e_t

This is a random walk with drift: the best forecast of this period's level is now last period's value plus a positive constant b0 (a more realistic model of GDP growing at, say, 2% a year).

Can also model this by adding a time trend (t = year):

Y_t = b0 + Y_t-1 + t + e_t

What this means is that a series can be stationary around an upward (or downward) trend.
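As an illustrative Python sketch (the drift value b0 = 0.5 is an arbitrary choice, not from the lecture), a short simulation confirms that a random walk with drift trends at roughly b0 per period:

```python
import random

random.seed(1)
b0 = 0.5  # illustrative drift term (arbitrary value)

# Random walk with drift: Y_t = b0 + Y_{t-1} + e_t
y = [0.0]
for _ in range(500):
    y.append(b0 + y[-1] + random.gauss(0, 1))

# The average one-period change converges to the drift b0,
# so the series trends upward at roughly b0 per period.
changes = [y[t] - y[t - 1] for t in range(1, len(y))]
avg_change = sum(changes) / len(changes)
```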

21 Consequences

Can show that if variables are NOT stationary then:

1. OLS t values on any variables are biased
2. This often leads to spurious regression: variables appear to be related (significant in a regression) only because both are trended; if the trend were taken out, they would not be
3. OLS estimates of the coefficient on a lagged dependent variable are biased toward zero
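The spurious regression problem can be demonstrated by simulation: regressing one random walk on another, completely independent, random walk produces "significant" t statistics far more often than the nominal 5%. A Python sketch (not from the lecture; sample sizes and seed are arbitrary):

```python
import math
import random

random.seed(2)

def walk(T):
    """A pure random walk of length T."""
    y, out = 0.0, []
    for _ in range(T):
        y += random.gauss(0, 1)
        out.append(y)
    return out

def t_of_slope(x, y):
    """t-statistic on the slope from a simple OLS regression of y on x."""
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (T - 2)
    return b / math.sqrt(s2 / sxx)

# If the t test were valid it should reject at the 5% level about
# 5% of the time; with two independent non-stationary series it
# rejects far more often -- the spurious regression problem.
rejects = sum(abs(t_of_slope(walk(200), walk(200))) > 1.96
              for _ in range(50))
```

With non-stationary series the rejection count is typically well over half of the 50 simulations, rather than the 2 or 3 a valid 5% test would give.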

25 Note that any concerns about endogeneity are dwarfed by the issue of stationarity, since the OLS estimate satisfies

b^_OLS = b + Cov(X, u)/Var(X)

and in non-stationary series the variance of X goes to infinity as the sample size T (the number of time periods) increases, so the 2nd term effectively goes to zero and endogeneity is less of an issue in (long) time series data.
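The claim that the variance of a non-stationary series grows with T can be checked directly by simulation. An illustrative Python sketch (simulation sizes are arbitrary): for a random walk, Var(Y_T) = T × Var(e), so quadrupling T roughly quadruples the variance.

```python
import random
import statistics

random.seed(3)

def endpoint(T):
    """Level of a random walk after T standard-normal shocks."""
    return sum(random.gauss(0, 1) for _ in range(T))

# For a random walk, Var(Y_T) = T * Var(e): the variance grows
# without bound as the number of time periods T increases.
var_short = statistics.variance(endpoint(50) for _ in range(2000))
var_long = statistics.variance(endpoint(200) for _ in range(2000))
```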

29 Example: Suppose you decide to regress the United States inflation rate on the level of British GDP. There should, in truth, be very little relationship between the two (it is difficult to argue how British GDP could really affect US inflation).

u gdp_sta
reg usinf gdp if year<1980 & quarter==1

(output omitted) For the earlier period the output appears to suggest a significant positive (causal) relationship between the two, and the R2 is also very high.

reg usinf gdp if year>=1980 & quarter==1

For the later period this now gives a significant negative relationship, and the R2 is much lower.

30 In truth it is hard to believe that UK GDP has any real effect on US inflation rates. The reason why there appears to be a significant relation is that both variables trend upward in the 1st period, and the regression picks up the common (but unrelated) trends. This is spurious regression.

twoway (scatter usinf year if year<=1980) (scatter gdp year if year<=1980, yaxis(2))

31 The same plot for the later period:

twoway (scatter usinf year if year>1980) (scatter gdp year if year>1980, yaxis(2))

32 What to do? Make the variables stationary and OLS will be OK.

Often the easiest way to do this is by differencing the data (ie taking last period's value away from this period's value).

Eg if Y_t = Y_t-1 + e_t is non-stationary, take Y_t-1 over to the other side to get the difference

Y_t - Y_t-1 = ΔY_t = e_t

which should be stationary, ie random and not trended, since the differenced variable is just equal to the random error term, which has no trend or systematic behaviour.
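A quick Python sketch (illustrative only; seed and length are arbitrary) confirms that differencing a random walk recovers a series whose variance is stable across subperiods:

```python
import random
import statistics

random.seed(4)

# Build a random walk, then take its first difference.
y = [0.0]
for _ in range(1000):
    y.append(y[-1] + random.gauss(0, 1))
dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # recovers the shocks e_t

# The differenced series has a stable variance across subperiods,
# unlike the level of the series.
first_half_var = statistics.variance(dy[:500])
second_half_var = statistics.variance(dy[500:])
```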

38 Example: The % change in GDP looks more likely to be stationary.

use "E:\qm2\Lecture 17\stationary.dta", clear

By inspection there is no obvious trend in the difference of GDP over time (and hence the mean and variance look reasonably stable over time).

39 Note: Sometimes taking the (natural) log of a series can make the standard deviation of the log of the series constant. If the series is exponential (as GDP sometimes is) then the log of the series will be linear, and the standard deviation of the log across subperiods will be constant (if the series changes by the same proportional amount in each period, then the log of the series changes by the same absolute amount in each subperiod).
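This log property can be illustrated with a simulated exponential series (the 2% growth rate and noise level below are arbitrary choices, not estimates): the spread of log changes is the same in every subperiod, while the spread of level changes grows with the level itself.

```python
import math
import random
import statistics

random.seed(5)

# A series growing by roughly the same *proportion* (about 2%) each
# period, with proportional noise -- an exponential series like GDP.
y = [100.0]
for _ in range(400):
    y.append(y[-1] * math.exp(0.02 + random.gauss(0, 0.05)))

logs = [math.log(v) for v in y]
log_changes = [logs[t] - logs[t - 1] for t in range(1, len(logs))]
level_changes = [y[t] - y[t - 1] for t in range(1, len(y))]

# Log changes have the same spread in every subperiod ...
sd_log_early = statistics.stdev(log_changes[:200])
sd_log_late = statistics.stdev(log_changes[200:])
# ... while changes in the level grow with the level itself.
sd_level_early = statistics.stdev(level_changes[:200])
sd_level_late = statistics.stdev(level_changes[200:])
```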

43 In practice it is not always easy to tell just by looking at a series whether it is a random walk (non-stationary) or not, so we need to test this formally.

44 Use the Dickey-Fuller test

45 Detection

Given that Y_t = Y_t-1 + e_t is non-stationary (1)

but Y_t = bY_t-1 + e_t is stationary if b < 1 (2)

(can show that the variance of Y is constant for (2)), the test of stationarity is a test of whether b = 1.

In practice, subtract Y_t-1 from both sides of (2):

Y_t - Y_t-1 = bY_t-1 - Y_t-1 + e_t
ΔY_t = (b-1)Y_t-1 + e_t
ΔY_t = gY_t-1 + e_t (3)

and test whether the coefficient g = b-1 = 0 (if g = 0 then b = 1). If so, the data follow a random walk and the variable is non-stationary.

This is called the Dickey-Fuller Test.
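The test in (3) amounts to an OLS regression of ΔY_t on Y_t-1 and a look at the t statistic on g. An illustrative Python sketch (not the lecture's Stata code; sample length, seed and the AR coefficient 0.5 are arbitrary): for a stationary AR(1) series the t statistic is large and negative, while for a random walk it sits near zero.

```python
import math
import random

random.seed(6)

def dickey_fuller_t(y):
    """t-statistic on g in the no-constant regression dY_t = g*Y_{t-1} + e_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lag = y[:-1]
    n = len(dy)
    sxx = sum(v * v for v in lag)  # no constant term in this regression
    g = sum(v * d for v, d in zip(lag, dy)) / sxx
    resid = [d - g * v for v, d in zip(lag, dy)]
    s2 = sum(e * e for e in resid) / (n - 1)
    return g / math.sqrt(s2 / sxx)

def simulate(b, T=300):
    """Y_t = b*Y_{t-1} + e_t; a unit root process when b = 1."""
    y = [0.0]
    for _ in range(T):
        y.append(b * y[-1] + random.gauss(0, 1))
    return y[1:]

t_unit_root = dickey_fuller_t(simulate(1.0))   # random walk: t near zero
t_stationary = dickey_fuller_t(simulate(0.5))  # stationary AR(1): t far below -1.94
```

Note that the t statistic from this regression does not follow the usual t distribution, which is exactly why the Dickey-Fuller critical values on the next slide are needed.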

58 So estimate ΔY_t = gY_t-1 + e_t by OLS and accept the null of a random walk if g is not significantly different from zero.

It turns out that the critical values of this test differ from the usual t-test critical values. Use instead the (asymptotic) 5% critical values: 1.94, or 2.86 if there is a constant in the regression, and 3.41 if there is a constant and a time trend in the regression (the Dickey-Fuller statistic is negative, so these are compared in absolute value).

As a general rule, only regress variables that are stationary on each other.

64 If a variable fails the Dickey-Fuller test (ie the null of a unit root cannot be rejected), try using the difference of that variable instead.

65 Example: To test formally whether UK house prices are stationary or not.

u price_sta
tsset TIME
g dprice=price-price[_n-1] /* creates 1st difference variable */
g d2price=dprice-dprice[_n-1]
reg dprice l.price
dfuller price

(output omitted) Since the estimated t value is smaller in absolute terms than the Dickey-Fuller critical value (2.86), we can't reject the null that g = 0 (and b = 1), so the original series (ie the level of, not the change in, prices) follows a random walk. So conclude that house prices are a non-stationary series.

If we repeat the test for the 1st difference in prices (ie the change in prices):

66 reg d2price l.dprice

(output omitted) Since the estimated t value is now larger in absolute terms than the Dickey-Fuller critical value (2.86), reject the null that g = 0 (and b = 1), so the new series (ie the change in, not the level of, prices) is stationary.

Should therefore use the change in prices rather than the level of prices in any OLS estimation (the same test should be applied to any other variables used in a regression).

Note: Stata will do (a variant of) this test automatically; note that the critical values are different, since Stata includes lagged values of the dependent variable in the test (the augmented Dickey-Fuller test).

dfuller dprice, regress

The p value is < .05, so again reject the null that g = 0 (and b = 1).

67 Heteroskedasticity

Occurs when the Gauss-Markov assumption that the residual variance is constant across all observations in the data set fails, so that

E(u_i^2 | X_i) = σ_i^2 ≠ σ^2

In practice this means the spread of observations at any given value of X will not now be constant.

Eg food expenditure is known to vary much more at higher levels of income than at lower levels of income, and the level of profits tends to vary more across large firms than across small firms.

71 Example: the data set food.dta contains information on food expenditure and income. A graph of the residuals from a regression of food spending on total household expenditure (net of housing) shows clearly that the residuals tend to be more spread out at higher levels of income; this is the typical pattern associated with heteroskedasticity.

reg food expnethsum
predict res, resid
two (scatter res expnet if expnet<500)

72 Consequences of Heteroskedasticity

Can show:

1) OLS estimates of the coefficients remain unbiased (as with autocorrelation), since given

Y_i = b0 + b1 X_i + u_i

and

b1^_OLS = Cov(X, Y)/Var(X)

then substituting in Y_i = b0 + b1 X_i + u_i gives

b1^_OLS = b1 + Cov(X, u)/Var(X)

The heteroskedasticity assumption E(u_i^2 | X_i) ≠ σ^2 does not affect the condition Cov(X, u) = 0 needed to prove unbiasedness, so the OLS estimate of the coefficients remains unbiased in the presence of heteroskedasticity.

80 2) But can show that heteroskedasticity (like autocorrelation) means the OLS estimates of the standard errors (and hence t and F tests) are biased.

(Intuitively, if observations are distributed unevenly about the regression line then OLS is unable to distinguish the quality of the observations: observations further away from the regression line should be given less weight in the calculation of the standard errors (since they are more unreliable), but OLS can't do this, so the standard errors are biased.)

82 Testing for Heteroskedasticity

1. Residual Plots

In the absence of heteroskedasticity there should be no obvious pattern to the spread of the residuals, so it is useful to plot the residuals against the X variable thought to be causing the problem, assuming you know which X variable it is (often difficult).

84 2. Goldfeld-Quandt

Again, assuming we know which variable is causing the problem, we can test formally whether the residual spread varies with the values of the suspect X variable.

i) Order the data by the size of the X variable and split the data into 2 equal sub-groups (one high variance, the other low variance)
ii) Drop the middle c observations, where c is approximately 30% of the sample
iii) Run separate regressions for the high and low variance subsamples
iv) Compute

F = RSS_high variance subsample / RSS_low variance subsample ~ F[(N-c-2k)/2, (N-c-2k)/2]

v) If estimated F > Fcritical, reject the null of no heteroskedasticity (intuitively, the residuals from the high variance sub-sample are much larger than the residuals from the low variance sub-sample)

Fine if certain which variable is causing the problem, less so if unsure.
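The steps above can be sketched in Python on simulated heteroskedastic data (all parameter values below are arbitrary illustrations, not from the lecture):

```python
import random

random.seed(7)

def rss_from_ols(points):
    """Residual sum of squares from a simple OLS fit of y on x."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / sxx
    a = my - b * mx
    return sum((p[1] - a - b * p[0]) ** 2 for p in points)

# Simulated data where the error spread grows with x (heteroskedastic):
# y = 2 + 3x + u, with sd(u) proportional to x.
pts = [(x, 2 + 3 * x + random.gauss(0, 0.5 * x))
       for x in (random.uniform(1, 10) for _ in range(300))]

pts.sort()                # i) order the data by the size of x
low = pts[:105]           # ii) drop the middle ~30% (90 observations)
high = pts[-105:]
# iii)-iv) separate regressions; F = ratio of residual variances
F = (rss_from_ols(high) / (len(high) - 2)) / (rss_from_ols(low) / (len(low) - 2))
```

With this data-generating process F comes out well above the F critical value, so step v) rejects the null of no heteroskedasticity.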

85 3. Breusch-Pagan Test

In most cases involving more than one right hand side variable it is unlikely that you will know which variable is causing the problem. A more general test is therefore to regress an approximation of the (unknown) residual variance on all the right hand side variables and test for a significant causal effect (if there is one, suspect heteroskedasticity).

86 Breusch-Pagan Test

Given Y_i = a + b1 X1 + b2 X2 + u_i (1)

i) Estimate (1) by OLS and save the residuals u^
ii) Square the residuals and regress them on all the original X variables in (1); these squared OLS residuals proxy the unknown true residual variance and, under the null, should not be correlated with the X variables:

u^2_i = g0 + g1 X1 + g2 X2 + v_i (2)

Using (2), either compute

F = [R2_auxiliary / (k-1)] / [(1 - R2_auxiliary) / (N-k)] ~ F[k-1, N-k]

ie a test of the goodness of fit of this auxiliary regression, or compute

N*R2_auxiliary ~ χ2(k-1) (k-1 since not testing the constant)

If F or N*R2_auxiliary exceeds the respective critical value, reject the null of no heteroskedasticity.
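The N*R2 version of the test can be sketched in Python on simulated data (one regressor only for brevity; all parameter values are arbitrary illustrations, not the lecture's Stata example):

```python
import random

random.seed(8)

def ols(x, y):
    """Simple OLS of y on a constant and one regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b  # intercept, slope

n = 400
x = [random.uniform(0, 10) for _ in range(n)]
# Heteroskedastic errors: sd(u) grows with x.
y = [1 + 2 * xi + random.gauss(0, 1 + 0.5 * xi) for xi in x]

# i) estimate the model by OLS and save the squared residuals
a, b = ols(x, y)
u2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]

# ii) regress the squared residuals on x; the test statistic is
#     N * R^2 of this auxiliary regression ~ chi-squared(k-1)
g0, g1 = ols(x, u2)
mean_u2 = sum(u2) / n
ss_total = sum((v - mean_u2) ** 2 for v in u2)
ss_model = sum((g0 + g1 * xi - mean_u2) ** 2 for xi in x)
bp_statistic = n * ss_model / ss_total
```

With one regressor in the auxiliary regression the statistic is compared against the chi-squared(1) 5% critical value of 3.84; the simulated heteroskedasticity pushes it well past that, so the test rejects.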

94 Example: Breusch-Pagan Test of Heteroskedasticity

The data set smoke.dta contains information on the smoking habits, wages, age and gender of a cross-section of individuals.

u smoke.dta /* read in data */
reg lhw age age2 female smoke
predict reshat, resid /* save residuals */
g reshat2=reshat^2 /* square them */
reg reshat2 age age2 female smoke /* regress square of residuals on all original rhs variables */

(output omitted; the auxiliary regression has N = 7970 and F(4, 7965) = 6.59)

95 reshat2 Coef. Std. Err. t P> t [95% Conf. Interval] age age female smokes _cons The Breusch-Pagan test is N*R^2. di 7970* which is chi-squared with k-1 degrees of freedom (4 in this case); the 5% critical value of a chi-squared(4) is 9.49, so the estimated value exceeds the critical value. Similarly the F test for goodness of fit in the top right corner of the Stata output is a test for joint significance of all the rhs variables in this model (excluding the constant). From the F tables, Fcritical at the 5% level (4,7970) = 2.37. So estimated F = 6.59 > Fcritical, so reject the null of no heteroskedasticity. Or could use Stata's version of the Breusch-Pagan test bpagan lhw age age2 female smoke Breusch-Pagan LM statistic: Chi-sq( 5) P-value = 0
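The by-hand steps above can also be sketched outside Stata. The following is an illustrative Python sketch on simulated data (the variable names mimic the smoke.dta example but the numbers are invented; NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 60, n)
female = rng.integers(0, 2, n).astype(float)
# build heteroskedasticity in by construction: residual spread grows with age
u = rng.normal(0.0, 0.02 * age)
lhw = 1.0 + 0.03 * age - 0.1 * female + u

def ols(X, y):
    """OLS coefficients and residuals; X must already contain a constant."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

X = np.column_stack([np.ones(n), age, female])
_, resid = ols(X, lhw)            # step i): estimate by OLS, save residuals
_, aux_resid = ols(X, resid**2)   # step ii): regress squared residuals on the Xs
e2 = resid**2
r2_aux = 1.0 - (aux_resid**2).sum() / ((e2 - e2.mean())**2).sum()
bp = n * r2_aux                   # N*R^2_auxiliary ~ chi-squared(k-1), here k-1 = 2

print(bp, bp > 5.99)              # 5.99 is the 5% critical value of chi-squared(2)
```

With this simulated design the statistic should comfortably exceed the critical value, mirroring the rejection in the wage example.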

96 What to do if heteroskedasticity is present? 1. Try a different functional form Sometimes taking logs of the dependent or explanatory variables can reduce the problem

97 . reg food expnethsum if exp<1000 Source SS df MS Number of obs = F( 1, 190) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = food Coef. Std. Err. t P> t [95% Conf. Interval] expnethsum _cons bpagan expn Breusch-Pagan LM statistic: Chi-sq( 1) P-value =.006 The Breusch-Pagan test indicates the presence of heteroskedasticity (estimated chi-squared value > critical value). This means the standard errors, t statistics etc are biased If we use the log of the dependent variable rather than the level. g lfood=log(food). reg lfood expnethsum if exp<1000 Source SS df MS Number of obs = F( 1, 190) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = lfood Coef. Std. Err. t P> t [95% Conf. Interval] expnethsum _cons bpagan expnethsum Breusch-Pagan LM statistic: Chi-sq( 1) P-value =.237 and the test no longer rejects the null of homoskedasticity.
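To see why logs can help, here is a simulated sketch (invented data, not the food-expenditure file; NumPy assumed): when the error is multiplicative, the levels regression is heteroskedastic but the log regression is not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
expend = rng.uniform(100, 1000, n)                            # invented regressor
food = np.exp(1.0 + 0.003 * expend + rng.normal(0, 0.2, n))   # multiplicative error

def bp_stat(y, x):
    """N*R^2 from regressing squared OLS residuals on x (Breusch-Pagan form)."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    g, *_ = np.linalg.lstsq(X, e2, rcond=None)
    r2 = 1.0 - ((e2 - X @ g) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
    return len(y) * r2

print(bp_stat(food, expend), bp_stat(np.log(food), expend))
```

The first statistic should sit far above any chi-squared(1) critical value, while the second stays in the usual homoskedastic range, just as the P-values move from .006 to .237 in the Stata example.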

99 2. Drop Outliers Sometimes heteroskedasticity can be influenced by 1 or 2 observations in the data set which stand a long way from the main concentration of data - outliers. Often these observations may be genuine in which case you should not drop them but sometimes they may be the result of measurement error or miscoding in which case you may have a case for dropping them.

100 Example The data set infmort.dta gives infant mortality for 51 U.S. states along with the number of doctors per capita in each state. A graph of infant mortality against number of doctors clearly shows that Washington D.C. is something of an outlier (it has lots of doctors but also a very high infant mortality rate). twoway (scatter infmort state, mlabel(state)), ytitle(infmort) ylabel(, labels) xtitle(state) [scatter plot of infmort by state: dc sits far above every other state] A regression of infant mortality on (the log of) doctor numbers for all 51 observations suffers from heteroskedasticity. reg infmort ldocs Source SS df MS Number of obs = F( 1, 49) = 4.08 Model Prob > F =

101 Residual R-squared = Adj R-squared = Total Root MSE = infmort Coef. Std. Err. t P> t [95% Conf. Interval] ldocs _cons bpagan ldocs Breusch-Pagan LM statistic: Chi-sq( 1) P-value = 2.5e-16 However if the outlier is excluded then. reg infmort ldocs if dc==0 Source SS df MS Number of obs = F( 1, 48) = 5.13 Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = infmort Coef. Std. Err. t P> t [95% Conf. Interval] ldocs _cons bpagan ldocs Breusch-Pagan LM statistic: Chi-sq( 1) P-value =.7739 The problem of heteroskedasticity disappears, though the D.C. observation is genuine, so you need to weigh the benefits of dropping it against the costs.
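A simulated sketch (invented numbers, not the infmort data; NumPy assumed) of how a single outlier can generate a heteroskedasticity rejection on its own:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, n)   # homoskedastic by construction
x[0], y[0] = 3.0, 15.0                        # one D.C.-style outlier

def bp_stat(y, x):
    """N*R^2 from regressing squared OLS residuals on x."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    g, *_ = np.linalg.lstsq(X, e2, rcond=None)
    r2 = 1.0 - ((e2 - X @ g) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
    return len(y) * r2

print(bp_stat(y, x), bp_stat(y[1:], x[1:]))   # with vs without the outlier
```

The statistic with the outlier included should be much larger than without it, even though the underlying errors are homoskedastic, which is why it pays to graph the data before reaching for a formal fix.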

102 3. Feasible GLS

If (and this is a big if) you think you know the exact functional form of the heteroskedasticity

eg you know that Var(u_i) = σ^2 X_1i^2 (and not say σ^2 X_2i^3)

so that there is a common component to the variance, σ^2, and a part that rises with the square of the level of the variable X_1

Consider the term u_i/X_i. Its variance is

Var(u_i/X_i) = 1/X_i^2 * Var(u_i) = 1/X_i^2 * σ^2 X_i^2 = σ^2

So the variance of this transformed residual is constant for all observations in the data set

This means that if we divide all the observations by X_i (not X_i^2)

Y_i = b_0 + b_1 X_i + u_i (1)

becomes

Y_i/X_i = b_0/X_i + b_1 X_i/X_i + u_i/X_i (2)

and the estimates of b_0 and b_1 in (2) will not be affected by heteroskedasticity.
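A sketch of this transformation on simulated data (Var(u_i) = σ^2 X_i^2 is built in by construction; NumPy assumed). OLS on the divided-through equation is the GLS estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(1.0, 10.0, n)
sigma = 0.5
u = rng.normal(0.0, sigma * x)      # Var(u_i) = sigma^2 * X_i^2
y = 3.0 + 2.0 * x + u               # Y_i = b0 + b1*X_i + u_i, true b0=3, b1=2

# Transformed equation (2): Y_i/X_i = b0*(1/X_i) + b1 + u_i/X_i
Z = np.column_stack([1.0 / x, np.ones(n)])   # regressors: 1/X_i and the new "constant"
g, *_ = np.linalg.lstsq(Z, y / x, rcond=None)
b0_gls, b1_gls = g

# the transformed error u_i/X_i has constant variance sigma^2, so these
# estimates are efficient when the variance assumption is right
print(b0_gls, b1_gls)               # should sit near the true values 3 and 2
```

Running plain OLS of y on x would also be unbiased here, but the transformed regression has the smaller sampling variance, which is the point of GLS.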

113 This is called a Feasible Generalised Least Squares (FGLS) estimator and will be more efficient than OLS IF the assumption about the form of the heteroskedasticity is correct. If not, the solution may be much worse than OLS.

116 Example. reg hourpay age Source SS df MS Number of obs = F( 1, 12096) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = hourpay Coef. Std. Err. t P> t [95% Conf. Interval] age _cons bpagan age Breusch-Pagan LM statistic: Chi-sq( 1) P-value = 3.2e-05 The test suggests heteroskedasticity is present Suppose you decide that the heteroskedasticity is given by Var(u_i) = σ^2 Age_i So transform the variables by dividing by the SQUARE ROOT of Age (including the constant). g ha=hourpay/sqrt(age). g aa=age/sqrt(age). g ac=1/sqrt(age) /* this is new constant term */. reg ha aa ac, nocon Source SS df MS Number of obs = F( 2, 12096) = Model Prob > F = Residual R-squared = Adj R-squared =

117 Total Root MSE = ha Coef. Std. Err. t P> t [95% Conf. Interval] aa ac If the heteroskedasticity assumption is correct these are the GLS estimates and should be preferred to OLS. If the assumption is not correct they will be misleading.

119 4. White adjustment (OLS robust standard errors) As with autocorrelation, the best fix may be to make the OLS standard errors unbiased (if inefficient) when we don't know the precise form of the heteroskedasticity. In the absence of heteroskedasticity we know the OLS estimate of the variance of any coefficient is Var(bhat_1) = σ^2 / Σ(X_i - Xbar)^2
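A sketch of the robust "sandwich" variance that the White adjustment computes, in its simplest HC0 form (illustrative Python with invented data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1.0, 5.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, x)   # heteroskedastic: error sd grows with x

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

XtX_inv = np.linalg.inv(X.T @ X)
# conventional OLS standard errors assume one common sigma^2
se_ols = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - 2))
# White/HC0 sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1 keeps each e_i^2 separate
V_white = XtX_inv @ (X.T * e**2) @ X @ XtX_inv
se_white = np.sqrt(np.diag(V_white))

print(se_ols, se_white)   # same OLS coefficients either way; only the s.e.s differ
```

In Stata the same adjustment is a one-word option, reg y x, robust: the coefficients are identical to plain OLS and only the standard errors change.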


More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Fall 2008 Environmental Econometrics (GR03) TSII Fall 2008 1 / 16 More on AR(1) In AR(1) model (Y t = µ + ρy t 1 + u t ) with ρ = 1, the series is said to have a unit root or a

More information

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 217, Chicago, Illinois Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control

More information

Outline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity

Outline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity 1/25 Outline Basic Econometrics in Transportation Heteroscedasticity What is the nature of heteroscedasticity? What are its consequences? How does one detect it? What are the remedial measures? Amir Samimi

More information

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) 1 2 Model Consider a system of two regressions y 1 = β 1 y 2 + u 1 (1) y 2 = β 2 y 1 + u 2 (2) This is a simultaneous equation model

More information

Interpreting coefficients for transformed variables

Interpreting coefficients for transformed variables Interpreting coefficients for transformed variables! Recall that when both independent and dependent variables are untransformed, an estimated coefficient represents the change in the dependent variable

More information

Lab 07 Introduction to Econometrics

Lab 07 Introduction to Econometrics Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand

More information

Lab 10 - Binary Variables

Lab 10 - Binary Variables Lab 10 - Binary Variables Spring 2017 Contents 1 Introduction 1 2 SLR on a Dummy 2 3 MLR with binary independent variables 3 3.1 MLR with a Dummy: different intercepts, same slope................. 4 3.2

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

F Tests and F statistics

F Tests and F statistics F Tests and F statistics Testing Linear estrictions F Stats and F Tests F Distributions F stats (w/ ) F Stats and tstat s eported F Stat's in OLS Output Example I: Bodyfat Babies and Bathwater F Stats,

More information

Covers Chapter 10-12, some of 16, some of 18 in Wooldridge. Regression Analysis with Time Series Data

Covers Chapter 10-12, some of 16, some of 18 in Wooldridge. Regression Analysis with Time Series Data Covers Chapter 10-12, some of 16, some of 18 in Wooldridge Regression Analysis with Time Series Data Obviously time series data different from cross section in terms of source of variation in x and y temporal

More information

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends

More information

Wooldridge, Introductory Econometrics, 2d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of

Wooldridge, Introductory Econometrics, 2d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of Wooldridge, Introductory Econometrics, d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of homoskedasticity of the regression error term: that its

More information

Question 1 [17 points]: (ch 11)

Question 1 [17 points]: (ch 11) Question 1 [17 points]: (ch 11) A study analyzed the probability that Major League Baseball (MLB) players "survive" for another season, or, in other words, play one more season. They studied a model of

More information

Autoregressive models with distributed lags (ADL)

Autoregressive models with distributed lags (ADL) Autoregressive models with distributed lags (ADL) It often happens than including the lagged dependent variable in the model results in model which is better fitted and needs less parameters. It can be

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Econometrics Part Three

Econometrics Part Three !1 I. Heteroskedasticity A. Definition 1. The variance of the error term is correlated with one of the explanatory variables 2. Example -- the variance of actual spending around the consumption line increases

More information

Heteroskedasticity Example

Heteroskedasticity Example ECON 761: Heteroskedasticity Example L Magee November, 2007 This example uses the fertility data set from assignment 2 The observations are based on the responses of 4361 women in Botswana s 1988 Demographic

More information

AUTOCORRELATION. Phung Thanh Binh

AUTOCORRELATION. Phung Thanh Binh AUTOCORRELATION Phung Thanh Binh OUTLINE Time series Gauss-Markov conditions The nature of autocorrelation Causes of autocorrelation Consequences of autocorrelation Detecting autocorrelation Remedial measures

More information

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 11 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 30 Recommended Reading For the today Advanced Time Series Topics Selected topics

More information

Semester 2, 2015/2016

Semester 2, 2015/2016 ECN 3202 APPLIED ECONOMETRICS 5. HETEROSKEDASTICITY Mr. Sydney Armstrong Lecturer 1 The University of Guyana 1 Semester 2, 2015/2016 WHAT IS HETEROSKEDASTICITY? The multiple linear regression model can

More information

1 The Multiple Regression Model: Freeing Up the Classical Assumptions

1 The Multiple Regression Model: Freeing Up the Classical Assumptions 1 The Multiple Regression Model: Freeing Up the Classical Assumptions Some or all of classical assumptions were crucial for many of the derivations of the previous chapters. Derivation of the OLS estimator

More information

ECO375 Tutorial 7 Heteroscedasticity

ECO375 Tutorial 7 Heteroscedasticity ECO375 Tutorial 7 Heteroscedasticity Matt Tudball University of Toronto Mississauga November 9, 2017 Matt Tudball (University of Toronto) ECO375H5 November 9, 2017 1 / 24 Review: Heteroscedasticity Consider

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

Lecture 3: Multivariate Regression

Lecture 3: Multivariate Regression Lecture 3: Multivariate Regression Rates, cont. Two weeks ago, we modeled state homicide rates as being dependent on one variable: poverty. In reality, we know that state homicide rates depend on numerous

More information

ECON Introductory Econometrics. Lecture 16: Instrumental variables

ECON Introductory Econometrics. Lecture 16: Instrumental variables ECON4150 - Introductory Econometrics Lecture 16: Instrumental variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 12 Lecture outline 2 OLS assumptions and when they are violated Instrumental

More information

ECON3150/4150 Spring 2016

ECON3150/4150 Spring 2016 ECON3150/4150 Spring 2016 Lecture 6 Multiple regression model Siv-Elisabeth Skjelbred University of Oslo February 5th Last updated: February 3, 2016 1 / 49 Outline Multiple linear regression model and

More information

Econometrics and Structural

Econometrics and Structural Introduction to Time Series Econometrics and Structural Breaks Ziyodullo Parpiev, PhD Outline 1. Stochastic processes 2. Stationary processes 3. Purely random processes 4. Nonstationary processes 5. Integrated

More information

Econometrics Multiple Regression Analysis: Heteroskedasticity

Econometrics Multiple Regression Analysis: Heteroskedasticity Econometrics Multiple Regression Analysis: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19 Properties

More information

Econometrics Lecture 9 Time Series Methods

Econometrics Lecture 9 Time Series Methods Econometrics Lecture 9 Time Series Methods Tak Wai Chau Shanghai University of Finance and Economics Spring 2014 1 / 82 Time Series Data I Time series data are data observed for the same unit repeatedly

More information

Heteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance.

Heteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance. Heteroskedasticity y i = β + β x i + β x i +... + β k x ki + e i where E(e i ) σ, non-constant variance. Common problem with samples over individuals. ê i e ˆi x k x k AREC-ECON 535 Lec F Suppose y i =

More information