Lecture 19. Stationarity, and a common problem in cross-section estimation: heteroskedasticity
- Anthony Armstrong
1 Lecture 19. Learning to worry about and deal with stationarity. A common problem in cross-section estimation: heteroskedasticity. What is it? Why does it matter? What to do about it?
2 Stationarity. Ultimately, whether you can sensibly include lags of either the dependent or explanatory variables (or indeed the current level of a variable) in a regression also depends on whether the time series data that you are analysing are stationary. A variable is said to be (weakly) stationary if 1) its mean, 2) its variance and 3) its autocovariance Cov(Y_t, Y_t-s), s >= 1, do not change over time. Stationarity is needed if the Gauss-Markov conditions for unbiased, efficient OLS estimation are to be met by time series data. (Essentially, any variable that is trended is unlikely to be stationary.)
3 Example: A plot of nominal GDP over time using the data set stationary.dta

use "E:\qm2\Lecture 17\stationary.dta", clear
two (scatter gdp year)

GDP displays a distinct upward trend and so is unlikely to be stationary. Neither its mean value nor its variance is stable over time:

su gdp if year<1980
su gdp if year>=1980

(summary output omitted: the two subperiods show clearly different means and standard deviations for gdp)
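The informal check used here - comparing means and standard deviations across subperiods - can be sketched in a few lines of Python. The two simulated series below are illustrative stand-ins, not the stationary.dta data:

```python
import random
import statistics

random.seed(1)

# Simulated series for illustration (not the stationary.dta data):
# one trended series (like nominal GDP) and one mean-reverting one
# (like inflation).
trended, reverting = [], []
level, y = 100.0, 0.0
for _ in range(80):
    level += 2.0 + random.gauss(0, 1)   # steady upward drift
    y = 0.5 * y + random.gauss(0, 1)    # AR(1) with |b| < 1: stationary
    trended.append(level)
    reverting.append(y)

def subperiod_moments(series):
    """Mean and std dev in the first and second halves of the sample."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return (statistics.mean(first), statistics.stdev(first),
            statistics.mean(second), statistics.stdev(second))

for name, s in (("trended", trended), ("reverting", reverting)):
    m1, sd1, m2, sd2 = subperiod_moments(s)
    print(f"{name:9s}: mean {m1:7.2f} -> {m2:7.2f}, sd {sd1:5.2f} -> {sd2:5.2f}")
```

The trended series' subperiod means drift far apart, while the mean-reverting series' stay close - which is what the `su` comparison above is looking for.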
4 Some series are already stationary if there is no obvious trend and some sort of reversion to a long-run value. The UK inflation rate is one example (from the data set stationary.dta):

two (line inflation year)
5-10 In general, just looking at the time series of a variable will not be enough to judge whether the variable is stationary or not (though it is good practice to graph the series anyway).

If a variable is non-stationary then its values are persistent. This means that the level of the variable at some point in the past continues to influence the level of the variable today.

The simplest way of modelling the persistence of a non-stationary process is the random walk

Y_t = Y_t-1 + e_t

- the value of Y today equals last period's value plus an unpredictable random error e (hence the name) and no other lags. This means that the best forecast of this period's level is last period's level.
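A random walk is easy to simulate; this minimal Python sketch (an illustration, not part of the lecture's Stata material) shows why it is non-stationary: the spread of Y_t across realisations keeps growing with t.

```python
import random
import statistics

random.seed(0)

def random_walk(n):
    """Y_t = Y_{t-1} + e_t : each value is last period's value plus pure noise."""
    y, path = 0.0, []
    for _ in range(n):
        y += random.gauss(0, 1)
        path.append(y)
    return path

# Simulate many walks and look at the spread of Y_t across walks at two dates:
# for a random walk Var(Y_t) = t * Var(e), so the variance is not constant
# over time -- the defining failure of stationarity.
walks = [random_walk(100) for _ in range(2000)]
spread_early = statistics.stdev(w[9] for w in walks)    # t = 10
spread_late = statistics.stdev(w[99] for w in walks)    # t = 100
print(f"sd across walks at t=10 : {spread_early:.2f} (theory: sqrt(10) ~ 3.16)")
print(f"sd across walks at t=100: {spread_late:.2f} (theory: sqrt(100) = 10)")
```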
11-13 Y_t = Y_t-1 + e_t is just

Y_t = ρY_t-1 + e_t

- similar then to the AR(1) model used for autocorrelation, but with the coefficient set to 1. A coefficient of one means that the series is a unit root process.
14-20 Since many series (like GDP) have an obvious trend, can adapt this model to allow for a movement ("drift") in one direction or the other by adding a constant term. So

Y_t = Y_t-1 + e_t

becomes

Y_t = b_0 + Y_t-1 + e_t

This is a random walk with drift: the best forecast of this period's level is now last period's value plus a positive constant b_0 (a more realistic model of GDP growing at, say, 2% a year).

Can also model this by adding a time trend (t = year)

Y_t = b_0 + Y_t-1 + t + e_t

- what this means is that a series can be stationary around an upward (or downward) trend.
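A random walk with drift can be simulated the same way. In this Python sketch the drift of 0.5 per period is an assumed value for illustration:

```python
import random

random.seed(2)

def walk(n, drift=0.0):
    """Y_t = b_0 + Y_{t-1} + e_t -- a random walk with optional drift b_0."""
    y, path = 0.0, []
    for _ in range(n):
        y = drift + y + random.gauss(0, 1)
        path.append(y)
    return path

plain = walk(200)                # pure random walk: wanders with no direction
drifting = walk(200, drift=0.5)  # best forecast = last value + 0.5 each period
# After 200 periods the drifting walk sits near 0.5 * 200 = 100 (give or take
# a few standard deviations), while the pure walk stays near zero.
print(f"plain walk ends at    {plain[-1]:7.1f}")
print(f"drifting walk ends at {drifting[-1]:7.1f}")
```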
21-24 Consequences. Can show that if variables are NOT stationary then:

1. OLS t values on any variables are biased.

2. This often leads to spurious regression: variables appear to be related (significant in a regression) but only because both are trended. If you took the trend out, they would not be.

3. OLS estimates of the coefficient on a lagged dependent variable are biased toward zero.
25-28 Note that any concerns about endogeneity are dwarfed compared to the issue of stationarity, since the bias in OLS is given by

b^_OLS = b + Cov(X, u)/Var(X)

and in non-stationary series the variance of X goes to infinity as the sample size T (number of time periods) increases, so the 2nd term effectively goes to zero and endogeneity is less of an issue in (long) time series data.
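The claim about Var(X) can be checked by simulation. In this Python sketch (simulated random walks, not real data), the average sample variance of a random walk grows roughly in proportion to T, so a fixed Cov(X, u) term gets divided by an ever-larger Var(X):

```python
import random
import statistics

random.seed(3)

def random_walk(n):
    """Y_t = Y_{t-1} + e_t with standard normal errors."""
    y, path = 0.0, []
    for _ in range(n):
        y += random.gauss(0, 1)
        path.append(y)
    return path

# Average the sample variance of a random walk over many replications:
# it grows roughly in proportion to T, so a fixed Cov(X,u) bias term gets
# divided by an ever-larger Var(X) as the series lengthens.
avg_var = {}
for t_len in (50, 500, 5000):
    reps = [statistics.variance(random_walk(t_len)) for _ in range(200)]
    avg_var[t_len] = statistics.mean(reps)
    print(f"T = {t_len:4d}: average sample Var(X) = {avg_var[t_len]:8.1f}")
```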
29 Example: Suppose you decide to regress the United States inflation rate on the level of British GDP. There should, in truth, be very little relationship between the two (it is difficult to argue how British GDP could really affect US inflation). If you regress US inflation rates on UK GDP for the period before 1980:

u gdp_sta
reg usinf gdp if year<1980 & quarter==1

(output omitted) this appears to suggest a significant positive (causal) relationship between the two, and the R2 is also very high. If you regress US inflation rates on UK GDP for the period from 1980 on:

reg usinf gdp if year>=1980 & quarter==1

(output omitted) this now gives a significant negative relationship, and the R2 is much lower.
30 In truth it is hard to believe that UK GDP has any real effect on US inflation rates. The reason why there appears to be a significant relation is that both variables are trended upward in the 1st period, and the regression picks up the common (but unrelated) trends. This is spurious regression.

twoway (scatter usinf year if year<=1980) (scatter gdp year if year<=1980, yaxis(2))
31 twoway (scatter usinf year if year>1980) (scatter gdp year if year>1980, yaxis(2))
32-37 What to do? Make the variables stationary and OLS will be OK. Often the easiest way to do this is by differencing the data (ie taking last period's value away from this period's value).

Eg if Y_t = Y_t-1 + e_t is non-stationary, then take Y_t-1 to the other side to get the difference

Y_t - Y_t-1 = ΔY_t = e_t

which should be stationary, ie random and not trended - since the differenced variable is just equal to the random error term, which has no trend or systematic behaviour.
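Differencing can be demonstrated in a short Python sketch (simulated data, not the lecture's): the level of a random walk has an unstable spread, while its first difference is just the white-noise error.

```python
import random
import statistics

random.seed(4)

# Build a random walk, then difference it: dY_t = Y_t - Y_{t-1} = e_t, which
# is just the white-noise error and so should be stationary.
y, walk = 0.0, []
for _ in range(400):
    y += random.gauss(0, 1)
    walk.append(y)
diffed = [b - a for a, b in zip(walk, walk[1:])]

half = len(diffed) // 2
print(f"level sd, 1st vs 2nd half: "
      f"{statistics.stdev(walk[:200]):.2f} vs {statistics.stdev(walk[200:]):.2f}")
print(f"diff  sd, 1st vs 2nd half: "
      f"{statistics.stdev(diffed[:half]):.2f} vs {statistics.stdev(diffed[half:]):.2f}")
```

The differenced series' standard deviation sits near the error sd (1 here) in both halves of the sample.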
38 Example: The % change in GDP looks more likely to be stationary.

use "E:\qm2\Lecture 17\stationary.dta", clear

By inspection it seems there is no trend in the difference of GDP over time (and hence the mean and variance look reasonably stable over time).
39-42 Note: Sometimes taking the (natural) log of a series can make the standard deviation of the log of the series constant. If the series is exponential (as GDP sometimes is) then the log of the series will be linear, and the standard deviation of the log across subperiods will be constant (if the series changes by the same proportional amount in each period, then the log of the series changes by the same absolute amount in each subperiod).
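A quick Python illustration of this point, using a hypothetical series that grows about 5% a period: the standard deviation of the level explodes across subperiods, while that of the log stays roughly constant.

```python
import math
import random
import statistics

random.seed(8)

# An exponential series: the level grows by roughly 5% per period, so the
# level's spread explodes over time while its log is close to a straight line.
level, series = 100.0, []
for _ in range(200):
    level *= math.exp(0.05 + random.gauss(0, 0.02))
    series.append(level)
logs = [math.log(v) for v in series]

half = len(series) // 2
for name, s in (("level", series), ("log", logs)):
    sd1 = statistics.stdev(s[:half])
    sd2 = statistics.stdev(s[half:])
    print(f"{name:5s} sd: 1st half {sd1:12.2f}, 2nd half {sd2:12.2f}")
# The log's subperiod sds are close; the level's second-half sd is far larger.
```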
43 In practice not always easy to tell by looking at a series whether it is a random walk (non-stationary) or not. So need to test this formally
44 Use the Dickey-Fuller test
45-57 Detection. Given

Y_t = Y_t-1 + e_t is non-stationary (1)

But

Y_t = bY_t-1 + e_t is stationary if b<1 (2)

(can show the variance of Y is constant for (2))

So the test of stationarity is a test of whether b=1.

In practice, can subtract Y_t-1 from both sides of (2):

Y_t - Y_t-1 = bY_t-1 - Y_t-1 + e_t
ΔY_t = (b-1)Y_t-1 + e_t
ΔY_t = gY_t-1 + e_t (3)

and test whether the coefficient g = b-1 = 0 (if g=0 then b=1). If so, the data follow a random walk, and so the variable is non-stationary.

This is called the Dickey-Fuller Test.
58-63 So estimate ΔY_t = gY_t-1 + e_t by OLS and accept the null of a random walk if g is not significantly different from zero.

Turns out that the critical values of this test differ from the normal t test critical values. Use instead the (asymptotic) 5% critical values (in absolute value): 1.94; 2.86 if there is a constant in the regression; and 3.41 if there is a constant and a time trend in the regression.

And as a general rule, only regress variables that are stationary on each other.
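The whole procedure can be sketched in Python: regress ΔY_t on Y_t-1 (with a constant) and look at the t value on the lag. This is a from-scratch illustration on simulated series, not Stata's dfuller:

```python
import math
import random
import statistics

random.seed(5)

def df_stat(series):
    """t value on g from the Dickey-Fuller regression dY_t = a + g*Y_{t-1} + e_t."""
    x = series[:-1]                                    # Y_{t-1}
    dy = [b - a for a, b in zip(series, series[1:])]   # delta-Y_t
    n = len(dy)
    xbar, ybar = statistics.mean(x), statistics.mean(dy)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    g = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, dy)) / sxx
    a = ybar - g * xbar
    rss = sum((yi - a - g * xi) ** 2 for xi, yi in zip(x, dy))
    return g / math.sqrt(rss / (n - 2) / sxx)

# A random walk (unit root) versus a stationary AR(1) with b = 0.5:
y1, y2, walk_series, ar_series = 0.0, 0.0, [], []
for _ in range(500):
    y1 = y1 + random.gauss(0, 1)
    y2 = 0.5 * y2 + random.gauss(0, 1)
    walk_series.append(y1)
    ar_series.append(y2)

print(f"random walk : t on Y_t-1 = {df_stat(walk_series):6.2f}")
print(f"AR(1) b=0.5 : t on Y_t-1 = {df_stat(ar_series):6.2f}")
# Typically the walk's |t| falls short of the 2.86 critical value (cannot
# reject a unit root) while the AR(1)'s t is far below -2.86 (stationary).
```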
64 If a variable fails the Dickey-Fuller test (ie you cannot reject the null of a unit root), then try using the difference of that variable instead.
65 Example: To test formally whether UK house prices are stationary or not.

u price_sta
tsset TIME
g dprice=price-price[_n-1] /* creates 1st difference variable */
g d2price=dprice-dprice[_n-1]
reg dprice l.price
dfuller price

(output omitted) Since the estimated |t| value on lagged price is less than the Dickey-Fuller critical value (2.86), can't reject the null that g=0 (and b=1), and so the original series (ie the level, not the change in prices) follows a random walk. So conclude that house prices are a non-stationary series.

If we repeat the test for the 1st difference in prices (ie the change in prices):
66 reg d2price l.dprice

(output omitted) Since the estimated |t| value is now greater than the Dickey-Fuller critical value (2.86), reject the null that g=0 (and b=1), and so the new series (ie the change in, not the level of, prices) is a stationary series. Should therefore use the change in prices rather than the level of prices in any OLS estimation (the same test should be applied to any other variables used in a regression).

Note: Stata will do (a variant of) this test automatically - note that the critical values are different, since Stata includes lagged values of the dependent variable in the test (the augmented Dickey-Fuller test):

dfuller dprice, regress

(output omitted) The p-value is < .05, so again reject the null that g=0 (and b=1).
67-70 Heteroskedasticity occurs when the Gauss-Markov assumption that the residual variance is constant across all observations in the data set fails, so that

E(u_i^2 | X_i) ≠ σ^2 for all i

In practice this means the spread of observations at any given value of X will not now be constant. (Eg food expenditure is known to vary much more at higher levels of income than at lower levels of income; the level of profits tends to vary more across large firms than across small firms.)
71 Example: the data set food.dta contains information on food expenditure and income. A graph of the residuals from a regression of food spending on total household expenditure shows clearly that the residuals tend to be more spread out at higher levels of income - this is the typical pattern associated with heteroskedasticity.

reg food expnethsum
predict res, resid
two (scatter res expnet if expnet<500)

(regression output omitted; the graph plots the residuals against household expenditure net of housing)
72-79 Consequences of Heteroskedasticity. Can show:

1) OLS estimates of the coefficients remain unbiased (as with autocorrelation) - since given

Y_i = b_0 + b_1 X_i + u_i

and

b^_1 OLS = Cov(X, Y)/Var(X)

sub in Y_i = b_0 + b_1 X_i + u_i to get

b^_1 OLS = b_1 + Cov(X, u)/Var(X)

The heteroskedasticity assumption that E(u_i^2 | X_i) ≠ σ^2 does not affect the condition Cov(X, u) = 0 needed to prove unbiasedness, so the OLS estimates of the coefficients remain unbiased in the presence of heteroskedasticity. But ...
80-81 2) Can show that heteroskedasticity (like autocorrelation) means the OLS estimates of the standard errors (and hence t and F tests) are biased. (Intuitively, if the observations are spread unevenly about the regression line then OLS is unable to distinguish the quality of the observations - observations further away from the regression line should be given less weight in the calculation of the standard errors (since they are more unreliable), but OLS can't do this, so the standard errors are biased.)
82-83 Testing for Heteroskedasticity

1. Residual Plots. In the absence of heteroskedasticity there should be no obvious pattern to the spread of the residuals, so it is useful to plot the residuals against the X variable thought to be causing the problem - assuming you know which X variable it is (often difficult).
84 2. Goldfeld-Quandt. Again assuming you know which variable is causing the problem, then can test formally whether the residual spread varies with values of the suspect X variable.

i) Order the data by the size of the X variable and split the data into 2 equal sub-groups (one high variance, the other low variance)
ii) Drop the middle c observations, where c is approximately 30% of your sample
iii) Run separate regressions for the high and low variance subsamples
iv) Compute

F = RSS_high variance subsample / RSS_low variance subsample ~ F[(N-c-2k)/2, (N-c-2k)/2]

v) If estimated F > Fcritical, reject the null of no heteroskedasticity (intuitively, the residuals from the high variance sub-sample are much larger than the residuals from the low variance subsample)

Fine if certain which variable is causing the problem, less so if unsure.
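Steps i)-v) can be sketched in Python on simulated heteroskedastic data (the data-generating process below, with error spread proportional to x, is an assumption for illustration):

```python
import random

random.seed(6)

def ols_rss(xs, ys):
    """Bivariate OLS of y on a constant and x; returns the residual sum of squares."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

# Simulated heteroskedastic data: the error spread grows with x
# (like food spending varying more at higher incomes).
data = []
for _ in range(200):
    x = random.uniform(1, 10)
    data.append((x, 2 + 3 * x + random.gauss(0, 0.5 * x)))

# i) order by the x variable; ii) drop the middle c ~ 30% of observations;
# iii) run separate regressions on the low- and high-x subsamples;
# iv) F = RSS(high variance) / RSS(low variance)
data.sort()
c = int(0.3 * len(data))
m = (len(data) - c) // 2
lo_x, lo_y = zip(*data[:m])
hi_x, hi_y = zip(*data[-m:])
f_stat = ols_rss(hi_x, hi_y) / ols_rss(lo_x, lo_y)
print(f"Goldfeld-Quandt F = {f_stat:.2f}")   # well above 1 under heteroskedasticity
```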
85 3. Breusch-Pagan Test. In most cases involving more than one right hand side variable it is unlikely that you will know which variable is causing the problem. A more general test is therefore to regress an approximation of the (unknown) residual variance on all the right hand side variables and test for a significant effect (if there is one, then you suspect heteroskedasticity).
86-93 Breusch-Pagan Test. Given

Y_i = a + b_1 X_1 + b_2 X_2 + u_i (1)

i) Estimate (1) by OLS and save the residuals u^

ii) Square the residuals and regress them on all the original X variables in (1) - these squared OLS residuals proxy the unknown true residual variance and should not be correlated with the X variables:

u^_i^2 = g_0 + g_1 X_1 + g_2 X_2 + v_i (2)

Using (2), either compute

F = (R^2_auxiliary / (k-1)) / ((1 - R^2_auxiliary) / (N-k)) ~ F[k-1, N-k]

ie a test of goodness of fit for the model in this auxiliary regression, or compute

N*R^2_auxiliary ~ χ^2(k-1) (k-1 since not testing the constant)

If F or N*R^2_auxiliary exceed their respective critical values, reject the null of no heteroskedasticity.
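The same recipe can be sketched in Python for the bivariate case (one X, so k-1 = 1 and N*R^2_auxiliary is compared with the chi-squared(1) 5% critical value 3.84). The data below are simulated with error spread rising in x, an assumed setup for illustration:

```python
import random

random.seed(7)

def ols(xs, ys):
    """Bivariate OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    return a, b, [y - a - b * x for x, y in zip(xs, ys)]

def r_squared(xs, ys):
    """R^2 from a bivariate OLS regression of y on a constant and x."""
    _, _, res = ols(xs, ys)
    ybar = sum(ys) / len(ys)
    return 1 - sum(e * e for e in res) / sum((y - ybar) ** 2 for y in ys)

# Simulated data with error spread rising in x (an assumed DGP for illustration).
xs, ys = [], []
for _ in range(500):
    x = random.uniform(1, 10)
    xs.append(x)
    ys.append(2 + 3 * x + random.gauss(0, 0.5 * x))

# Breusch-Pagan: i) estimate by OLS and save residuals; ii) regress the squared
# residuals on the original x; iii) compare N * R^2_aux with chi-squared(k-1).
_, _, res = ols(xs, ys)
bp = len(xs) * r_squared(xs, [e * e for e in res])
print(f"N * R^2 = {bp:.1f}  (5% critical value for chi-squared(1) is 3.84)")
```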
94 Example: Breusch-Pagan Test of Heteroskedasticity. The data set smoke.dta contains information on the smoking habits, wages, age and gender of a cross-section of individuals.

u smoke.dta /* read in data */
reg lhw age age2 female smoke

(output omitted)

/* save residuals */
predict reshat, resid
g reshat2=reshat^2 /* square them */
/* regress square of residuals on all original rhs variables */
reg reshat2 age age2 female smoke

(output omitted; N = 7970, F(4, 7965) = 6.59)
95 The Breusch-Pagan test statistic is N*R² from the auxiliary regression:

. di 7970*

This is chi-squared with k-1 degrees of freedom (4 in this case) and the 5% critical value is 9.49, so the estimated value exceeds the critical value.

Similarly, the F test for goodness of fit in the Stata output (top right corner) is a test of the joint significance of all the rhs variables in this model (excluding the constant).

From the F tables, the 5% critical value F(4, 7970) = 2.37. The estimated F = 6.59 > Fcritical, so reject the null of no heteroskedasticity.

Or use Stata's version of the Breusch-Pagan test:

. bpagan lhw age age2 female smoke
Breusch-Pagan LM statistic: Chi-sq( 5) P-value = 0
96 What to do if heteroskedasticity is present?

1. Try a different functional form

Sometimes taking logs of the dependent or explanatory variables can reduce the problem.
97 . reg food expnethsum if exp<1000

[OLS output: F( 1, 190)]

. bpagan expn
Breusch-Pagan LM statistic: Chi-sq( 1) P-value = .006

The Breusch-Pagan test indicates the presence of heteroskedasticity (estimated chi-squared value > critical value). This means the standard errors, t statistics etc. are biased.

If instead we use the log of the dependent variable rather than its level:

. g lfood=log(food)
. reg lfood expnethsum if exp<1000

[OLS output: F( 1, 190)]

. bpagan expnethsum
Breusch-Pagan LM statistic: Chi-sq( 1) P-value = .237

and the test no longer rejects the null of no heteroskedasticity.
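Why logging can help is sketched below with a small constructed example, assuming a multiplicative error process (an assumption of this sketch, not something established by the food-expenditure data). If errors are multiplicative, the levels regression is heteroskedastic but the log regression is not:

```python
# Hedged sketch (illustrative data, not the lecture's): with multiplicative
# errors y_i = exp(a + b*x_i) * m_i, residual spread in levels grows with x,
# while log(y_i) = a + b*x_i + log(m_i) has a constant error spread.
import math

def ols_r2_and_resid(x, y):
    """OLS of y on x: return (residuals, R^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in resid)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return resid, 1 - ss_res / ss_tot

def bp_lm(x, y):
    """Breusch-Pagan LM statistic: N * R^2 of squared residuals on x."""
    resid, _ = ols_r2_and_resid(x, y)
    _, r2_aux = ols_r2_and_resid(x, [e ** 2 for e in resid])
    return len(x) * r2_aux

# Multiplicative errors: m alternates deterministically between 0.5 and 2
x = list(range(1, 41))
m = [0.5 if i % 2 else 2.0 for i in range(1, 41)]
y = [math.exp(1 + 0.05 * xi) * mi for xi, mi in zip(x, m)]

lm_levels = bp_lm(x, y)                          # levels: spread grows with x
lm_logs = bp_lm(x, [math.log(yi) for yi in y])   # logs: constant error spread
```

In this construction the LM statistic for the levels regression exceeds the one for the log regression, mirroring the drop in the bpagan p-value pattern above.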
99 2. Drop outliers

Sometimes heteroskedasticity can be driven by one or two observations in the data set which stand a long way from the main concentration of data - outliers.

Often these observations are genuine, in which case you should not drop them, but sometimes they are the result of measurement error or miscoding, in which case you may have a case for dropping them.
100 Example: The data set infmort.dta gives infant mortality for the 50 U.S. states plus Washington D.C., along with the number of doctors per capita in each state. A graph of infant mortality by state clearly shows that Washington D.C. is something of an outlier (it has lots of doctors but also a very high infant mortality rate).

. twoway (scatter infmort state, mlabel(state)), ytitle(infmort) ylabel(, labels) xtitle(state)

[scatter plot of infmort by state, with dc far above the cluster of other states]

A regression of infant mortality on (the log of) doctor numbers for all 51 observations suffers from heteroskedasticity.

. reg infmort ldocs

[OLS output: F( 1, 49) = 4.08]
. bpagan ldocs
Breusch-Pagan LM statistic: Chi-sq( 1) P-value = 2.5e-16

However, if the outlier is excluded:

. reg infmort ldocs if dc==0

[OLS output: F( 1, 48) = 5.13]

. bpagan ldocs
Breusch-Pagan LM statistic: Chi-sq( 1) P-value = .7739

The problem of heteroskedasticity disappears. The D.C. observation is genuine, though, so you need to think carefully about the benefits of dropping it against the costs.
112 3. Feasible GLS

If (and this is a big if) you think you know the exact functional form of the heteroskedasticity, e.g. you know that Var(u_i) = σ²X_i² (and not, say, σ²X_i³), so that there is a common component to the variance, σ², and a part that rises with the square of the level of the variable X.

Consider the term

Var(u_i / X_i) = (1/X_i²) Var(u_i) = (1/X_i²) * σ²X_i² = σ²

So the variance of this transformed residual is constant for all observations in the data set.

This means that if we divide all the observations by X_i (not X_i²)

Y_i = b_0 + b_1 X_i + u_i (1)

becomes

Y_i / X_i = b_0 / X_i + b_1 (X_i / X_i) + u_i / X_i (2)

and the estimates of b_0 and b_1 in (2) will not be affected by heteroskedasticity.
115 This is called a Feasible Generalised Least Squares (FGLS) estimator and will be more efficient than OLS IF the assumption about the form of the heteroskedasticity is correct. If it is not, the solution may be much worse than OLS.
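Under the stated (big) assumption Var(u_i) = σ²X_i², the transformation in (2) can be sketched in standard-library Python. Note that after dividing through by X_i, the original intercept b_0 becomes the coefficient on 1/X_i and the original slope b_1 becomes the intercept. Names here are illustrative:

```python
# Hedged sketch of FGLS when Var(u_i) = sigma^2 * X_i^2 (an assumption of
# this sketch). Dividing Y_i = b0 + b1*X_i + u_i through by X_i gives
#   Y_i/X_i = b0*(1/X_i) + b1 + u_i/X_i,
# so OLS of Y/X on 1/X (with intercept) recovers b1 as the intercept and
# b0 as the slope, with a homoskedastic error u_i/X_i.

def ols_line(x, y):
    """Intercept and slope of an OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def fgls_x_squared(x, y):
    """FGLS estimates of (b0, b1) under Var(u_i) = sigma^2 * X_i^2."""
    y_t = [yi / xi for xi, yi in zip(x, y)]   # Y_i / X_i
    invx = [1 / xi for xi in x]               # transformed regressor is 1/X_i
    b1_hat, b0_hat = ols_line(invx, y_t)      # intercept -> b1, slope -> b0
    return b0_hat, b1_hat
```

With noiseless data generated as Y = b_0 + b_1 X, the transformed regression recovers b_0 and b_1 exactly, which is a quick sanity check on the algebra.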
116 Example:

. reg hourpay age

[OLS output: F( 1, 12096)]

. bpagan age
Breusch-Pagan LM statistic: Chi-sq( 1) P-value = 3.2e-05

The test suggests heteroskedasticity is present. Suppose you decide that the heteroskedasticity is given by Var(u_i) = σ²Age_i. Then transform the variables by dividing by the SQUARE ROOT of Age (including the constant):

. g ha=hourpay/sqrt(age)
. g aa=age/sqrt(age)
. g ac=1/sqrt(age) /* this is the new constant term */
. reg ha aa ac, nocon
117 [OLS output for the transformed regression: coefficients on aa and ac]

If the heteroskedasticity assumption is correct, these are the GLS estimates and should be preferred to OLS. If the assumption is not correct, they will be misleading.
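The sqrt(Age) weighting above can be sketched the same way, this time solving the two-regressor no-constant normal equations directly. The assumption Var(u_i) = σ²Age_i and all names are illustrative:

```python
# Hedged sketch (illustrative data, not the lecture's wage data set):
# assuming Var(u_i) = sigma^2 * Age_i, divide Y_i = b0 + b1*Age_i + u_i
# through by sqrt(Age_i):
#   Y_i/sqrt(Age_i) = b0*(1/sqrt(Age_i)) + b1*sqrt(Age_i) + u_i/sqrt(Age_i)
# and run OLS with NO constant on the two transformed regressors.
import math

def fgls_sqrt_weight(age, y):
    """Solve the 2x2 normal equations for y_t = b0*z1 + b1*z2 (no constant)."""
    y_t = [yi / math.sqrt(a) for a, yi in zip(age, y)]  # like ha above
    z1 = [1 / math.sqrt(a) for a in age]                # like ac (new constant)
    z2 = [math.sqrt(a) for a in age]                    # like aa
    s11 = sum(v * v for v in z1)
    s22 = sum(v * v for v in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))            # equals N here
    r1 = sum(a * b for a, b in zip(z1, y_t))
    r2 = sum(a * b for a, b in zip(z2, y_t))
    det = s11 * s22 - s12 * s12
    b0 = (r1 * s22 - r2 * s12) / det
    b1 = (r2 * s11 - r1 * s12) / det
    return b0, b1
```

Again, noiseless data recover the original intercept and slope exactly, confirming that the transformation itself does not change what is being estimated, only the weighting of observations.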
119 4. White adjustment (OLS robust standard errors)

As with autocorrelation, the best fix may be to make the OLS standard errors unbiased (even if inefficient) when we don't know the precise form of the heteroskedasticity.

In the absence of heteroskedasticity, we know the OLS estimate of the variance of any coefficient, e.g. the slope in a single-variable regression, is Var(b_1) = σ² / Σ(X_i - X̄)².
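The slide breaks off before stating White's correction, so the following sketch states it as an assumption: for the single-regressor case, the standard heteroskedasticity-robust (HC0) variance replaces the common σ² with each observation's squared residual, Var(b_1) = Σ(X_i - X̄)² û_i² / (Σ(X_i - X̄)²)². Data and names are illustrative:

```python
# Hedged sketch of White's heteroskedasticity-robust standard error for the
# slope in a one-regressor model (HC0 form, stated as an assumption since the
# slide truncates before giving the formula). Standard-library Python only.
import math

def white_robust_se(x, y):
    """Return (b1, conventional_se, robust_se) for OLS of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)   # homoskedastic estimate
    conventional = math.sqrt(sigma2 / sxx)
    # Robust: weight each squared residual by its (x_i - xbar)^2
    robust = math.sqrt(sum(((xi - xbar) ** 2) * (e ** 2)
                           for xi, e in zip(x, resid)) / sxx ** 2)
    return b1, conventional, robust
```

When the error variance rises with x, the robust standard error is typically larger than the conventional one, which is why ignoring heteroskedasticity tends to overstate the precision of OLS.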
120 Heteroskedasticity occurs when the Gauss-Markov assumption that the residual variance is constant across all observations in the data set fails, so that E(u_i²|X_i) = σ_i² rather than a common σ². (In practice this means the spread of observations around any given value of X will not now be constant.)
More informationEconometrics Part Three
!1 I. Heteroskedasticity A. Definition 1. The variance of the error term is correlated with one of the explanatory variables 2. Example -- the variance of actual spending around the consumption line increases
More informationHeteroskedasticity Example
ECON 761: Heteroskedasticity Example L Magee November, 2007 This example uses the fertility data set from assignment 2 The observations are based on the responses of 4361 women in Botswana s 1988 Demographic
More informationAUTOCORRELATION. Phung Thanh Binh
AUTOCORRELATION Phung Thanh Binh OUTLINE Time series Gauss-Markov conditions The nature of autocorrelation Causes of autocorrelation Consequences of autocorrelation Detecting autocorrelation Remedial measures
More informationEconometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague
Econometrics Week 11 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 30 Recommended Reading For the today Advanced Time Series Topics Selected topics
More informationSemester 2, 2015/2016
ECN 3202 APPLIED ECONOMETRICS 5. HETEROSKEDASTICITY Mr. Sydney Armstrong Lecturer 1 The University of Guyana 1 Semester 2, 2015/2016 WHAT IS HETEROSKEDASTICITY? The multiple linear regression model can
More information1 The Multiple Regression Model: Freeing Up the Classical Assumptions
1 The Multiple Regression Model: Freeing Up the Classical Assumptions Some or all of classical assumptions were crucial for many of the derivations of the previous chapters. Derivation of the OLS estimator
More informationECO375 Tutorial 7 Heteroscedasticity
ECO375 Tutorial 7 Heteroscedasticity Matt Tudball University of Toronto Mississauga November 9, 2017 Matt Tudball (University of Toronto) ECO375H5 November 9, 2017 1 / 24 Review: Heteroscedasticity Consider
More informationMultiple Regression Analysis
Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators
More informationLecture 3: Multivariate Regression
Lecture 3: Multivariate Regression Rates, cont. Two weeks ago, we modeled state homicide rates as being dependent on one variable: poverty. In reality, we know that state homicide rates depend on numerous
More informationECON Introductory Econometrics. Lecture 16: Instrumental variables
ECON4150 - Introductory Econometrics Lecture 16: Instrumental variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 12 Lecture outline 2 OLS assumptions and when they are violated Instrumental
More informationECON3150/4150 Spring 2016
ECON3150/4150 Spring 2016 Lecture 6 Multiple regression model Siv-Elisabeth Skjelbred University of Oslo February 5th Last updated: February 3, 2016 1 / 49 Outline Multiple linear regression model and
More informationEconometrics and Structural
Introduction to Time Series Econometrics and Structural Breaks Ziyodullo Parpiev, PhD Outline 1. Stochastic processes 2. Stationary processes 3. Purely random processes 4. Nonstationary processes 5. Integrated
More informationEconometrics Multiple Regression Analysis: Heteroskedasticity
Econometrics Multiple Regression Analysis: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19 Properties
More informationEconometrics Lecture 9 Time Series Methods
Econometrics Lecture 9 Time Series Methods Tak Wai Chau Shanghai University of Finance and Economics Spring 2014 1 / 82 Time Series Data I Time series data are data observed for the same unit repeatedly
More informationHeteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance.
Heteroskedasticity y i = β + β x i + β x i +... + β k x ki + e i where E(e i ) σ, non-constant variance. Common problem with samples over individuals. ê i e ˆi x k x k AREC-ECON 535 Lec F Suppose y i =
More information