1 Motivation

Autocorrelation occurs when what happens today has an impact on what happens tomorrow, and perhaps further into the future. This is a phenomenon mainly found in time-series applications, typically in financial data, macro data, and wage data. Autocorrelation can only run from the past into the present, not from the future, and it implies

cov(ε_i, ε_j) ≠ 0 for some i ≠ j

2 AR(1) Errors

AR(1) errors occur when

y_i = X_i β + ε_i  and  ε_i = ρε_{i-1} + u_i

where ρ is the autocorrelation coefficient, |ρ| < 1, and u_i ~ N(0, σ²_u).

Note: In general we can have AR(p) errors, which implies p lagged terms in the error structure, i.e., ε_i = ρ_1 ε_{i-1} + ρ_2 ε_{i-2} + ... + ρ_p ε_{i-p}.

Note: We will need |ρ| < 1 for stability and stationarity. The possible regimes of ρ are:

1. ρ = 0: No serial correlation present.
2. ρ > 1: The process explodes.
3. ρ = 1: The process follows a random walk.
4. ρ = -1: The process is oscillatory.
5. ρ < -1: The process explodes in an oscillatory fashion.

The consequences for OLS: β̂ is unbiased and consistent but no longer efficient, and the usual statistical inference is rendered invalid.
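These dynamics are easy to see by simulation. Below is a minimal numpy sketch (the function name and parameter values are illustrative) that generates AR(1) errors and checks that the lag-1 sample autocorrelation is close to ρ:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1_errors(n, rho, sigma_u=1.0):
    """Generate eps_t = rho * eps_{t-1} + u_t with u_t ~ N(0, sigma_u^2)."""
    u = rng.normal(0.0, sigma_u, n)
    eps = np.empty(n)
    eps[0] = u[0]
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + u[t]
    return eps

eps = simulate_ar1_errors(5000, rho=0.8)
# lag-1 sample autocorrelation; should be near rho = 0.8
r1 = np.corrcoef(eps[1:], eps[:-1])[0, 1]
```

With ρ near 1 the simulated path wanders persistently; with ρ = 0 it reduces to white noise.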
Lemma: ε_i = Σ_{j=0}^∞ ρ^j u_{i-j}

The expectation of ε_i is

E[ε_i] = E[Σ_{j=0}^∞ ρ^j u_{i-j}] = Σ_{j=0}^∞ ρ^j E[u_{i-j}] = Σ_{j=0}^∞ ρ^j · 0 = 0

The variance of ε_i is

var(ε_i) = E[ε_i²] = E[(u_i + ρu_{i-1} + ρ²u_{i-2} + ...)²]
         = E[u_i² + ρ²u²_{i-1} + ρ⁴u²_{i-2} + ...]   (the cross terms have zero expectation)
         = σ²_u + ρ²σ²_u + ρ⁴σ²_u + ...

Therefore, the var(ε_i) is

var(ε_i) = σ²_u + ρ²σ²_u + ρ⁴σ²_u + ... = σ²_u + ρ² var(ε_{i-1})

But, assuming homoscedasticity, var(ε_i) = var(ε_{i-1}), so that

var(ε_i) = σ²_u + ρ² var(ε_i)  ⟹  var(ε_i) = σ²_u / (1 - ρ²) ≡ σ²

Note: This is why we need |ρ| < 1 for stability in the process. If |ρ| > 1 the denominator is negative, and var(ε_i) cannot be negative.

We note the correlation between ε_i and ε_{i-1}:

corr(ε_i, ε_{i-1}) = cov(ε_i, ε_{i-1}) / √(var(ε_i) var(ε_{i-1})) = [ρ σ²_u/(1 - ρ²)] / [σ²_u/(1 - ρ²)] = ρ

so ρ is the correlation coefficient.

At this point the following results hold:

1. The OLS estimate s² is biased but consistent.
2. s² is usually biased downward because we usually find ρ > 0 in economic data. This implies that σ²(X'X)⁻¹ tends to be less than σ²(X'X)⁻¹X'ΩX(X'X)⁻¹ when ρ > 0 and the variables in X are positively correlated over time. As a result, t-statistics are over-stated and we may introduce Type I errors in our inferences.

How do we know if we have autocorrelation or not?
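The stationary variance formula var(ε_i) = σ²_u/(1 - ρ²) can be checked numerically; a sketch with illustrative values ρ = 0.6 and σ_u = 2, so the theoretical variance is 4/0.64 = 6.25:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, sigma_u, n = 0.6, 2.0, 200_000

u = rng.normal(0.0, sigma_u, n)
eps = np.empty(n)
eps[0] = u[0] / np.sqrt(1 - rho**2)  # start eps_0 at its stationary variance
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]

theoretical = sigma_u**2 / (1 - rho**2)  # 4 / 0.64 = 6.25
empirical = eps.var()
# corr(eps_i, eps_{i-1}) should likewise be near rho = 0.6
r1 = np.corrcoef(eps[1:], eps[:-1])[0, 1]
```

The same simulation also confirms the lemma's correlation result, corr(ε_i, ε_{i-1}) = ρ.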
3 Tests for Autocorrelation

1. Plot the residuals ε̂_i against time.
2. Plot the residuals ε̂_i against ε̂_{i-1}.
3. Durbin-Watson Test. This is the most popular test. Assumptions:
(a) The regression has a constant term.
(b) No lagged dependent variables.
(c) No missing values.
(d) AR(1) error structure.

The null hypothesis is ρ = 0, i.e., there is no serial correlation. The test statistic is calculated as

d = Σ_{i=2}^N (ε̂_i - ε̂_{i-1})² / Σ_{i=1}^N ε̂_i²

which is equivalent to ε̂'Aε̂ / ε̂'ε̂, where

A = [  1  -1   0  ...   0   0
      -1   2  -1  ...   0   0
       0  -1   2  ...   0   0
      ...
       0   0   0  ...   2  -1
       0   0   0  ...  -1   1 ]

An approximately equivalent statistic is d = 2(1 - ρ̂), where ρ̂ comes from ε̂_i = ρε̂_{i-1} + u_i. Note that -1 ≤ ρ ≤ 1, so d ∈ [0, 4], where
(a) d = 0 indicates perfect positive serial correlation,
(b) d = 4 indicates perfect negative serial correlation,
(c) d = 2 indicates no serial correlation.

In practice the test is carried out with tabulated lower and upper bounds, DW_L and DW_U, rather than a single critical value. The reason for this is that the DW statistic does not follow a standard distribution: its distribution depends on the ε̂_i, which in turn depend on the X's in the model. Further, there are different degrees of freedom that must be controlled for.

For example, let N = 25 and k = 3; then DW_L = 0.906 and DW_U = 1.409. If d = 1.78, then d > DW_U and d < 4 - DW_U, and we fail to reject the null. Graphically, the decision regions over [0, 4] look like:

0 to DW_L:             Reject H_0 (positive correlation)
DW_L to DW_U:          Inconclusive zone
DW_U to 4 - DW_U:      Fail to reject H_0
4 - DW_U to 4 - DW_L:  Inconclusive zone
4 - DW_L to 4:         Reject H_0 (negative correlation)
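The d statistic is straightforward to compute from a residual vector; a minimal sketch (the function name is ours):

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_{i=2}^N (e_i - e_{i-1})^2 / sum_{i=1}^N e_i^2."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# white-noise "residuals" should give d near 2 (no serial correlation)
rng = np.random.default_rng(2)
d = durbin_watson(rng.normal(size=10_000))
```

Positively autocorrelated residuals push d toward 0; negatively autocorrelated residuals push it toward 4.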
Let's look a little closer at our DW statistic:

DW = [Σ_{i=2}^N ε̂_i² - 2 Σ_{i=2}^N ε̂_i ε̂_{i-1} + Σ_{i=2}^N ε̂²_{i-1}] / Σ_{i=1}^N ε̂_i²

Note the following:

Σ_{i=2}^N ε̂_i²     = ε̂_2² + ε̂_3² + ... + ε̂_N²      = Σ_{i=1}^N ε̂_i² - ε̂_1²
Σ_{i=2}^N ε̂²_{i-1} = ε̂_1² + ε̂_2² + ... + ε̂²_{N-1}  = Σ_{i=1}^N ε̂_i² - ε̂_N²

Therefore we have simply added and subtracted ε̂_1² and ε̂_N². Therefore,

DW = [2 ε̂'ε̂ - 2 Σ_{i=2}^N ε̂_i ε̂_{i-1} - ε̂_1² - ε̂_N²] / ε̂'ε̂
   = 2 - [2 Σ_{i=2}^N (ρ̂ε̂_{i-1} + u_i) ε̂_{i-1} + ε̂_1² + ε̂_N²] / ε̂'ε̂

Dropping the u_i ε̂_{i-1} terms, which have zero expectation, we get

DW = 2 - 2γ_1 ρ̂ - γ_2

where γ_1 = Σ_{i=2}^N ε̂²_{i-1} / ε̂'ε̂ and γ_2 = (ε̂_1² + ε̂_N²) / ε̂'ε̂. Note that as N → ∞, γ_1 → 1 and γ_2 → 0, so that DW → 2 - 2ρ̂. Under H_0: ρ = 0, and thus DW → 2.

Note: We can back out an estimate of ρ as ρ̂ = 1 - 0.5 DW.

Durbin's h-test for the lagged dependent variable

The Durbin-Watson test assumes that X is non-stochastic. This may not always be the case, e.g., if we include lagged dependent variables on the right-hand side. Durbin offers an alternative test in this case. Under the null hypothesis that ρ = 0, the test statistic becomes

h = (1 - d/2) √( N / (1 - N var(α̂)) )

where α is the coefficient on the lagged dependent variable. Note: If N var(α̂) > 1 then we have a problem, because we can't take the square root of a negative number. Durbin's h statistic is approximately distributed as a normal with unit variance.
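Durbin's h is just as mechanical to compute; a sketch with made-up inputs (d, N, and var(α̂) below are hypothetical numbers, not from any regression in these notes):

```python
import numpy as np

def durbin_h(d, n, var_alpha):
    """h = (1 - d/2) * sqrt(N / (1 - N * var(alpha)));
    undefined when N * var(alpha) >= 1."""
    if n * var_alpha >= 1.0:
        raise ValueError("h is undefined when N * var(alpha) >= 1")
    return (1.0 - d / 2.0) * np.sqrt(n / (1.0 - n * var_alpha))

# hypothetical inputs: d = 1.5 (so rho_hat = 0.25), N = 100, var(alpha) = 0.004
h = durbin_h(1.5, 100, 0.004)  # 0.25 * sqrt(100 / 0.6), about 3.23
```

Since h is approximately standard normal under H_0, a value of about 3.23 would reject no-autocorrelation at conventional levels.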
Breusch-Godfrey Test

This is basically a Lagrange Multiplier test of H_0: no autocorrelation versus H_a: the errors are AR(p). Regress ε̂_i on X_i, ε̂_{i-1}, ..., ε̂_{i-p} and obtain

NR² ~ χ²_p

where p is the number of lagged values that contribute to the correlation. The intuition behind this test is rather straightforward: we know that X'ε̂ = 0, so any R² > 0 must be caused by correlation between the current and the lagged residuals.

4 Correcting an AR(1) Process

One way to fix the problem is to get the error term of the estimated equation to satisfy the full ideal conditions. One way to do this might be through substitution. Consider the model we estimate is

y_t = β_0 + β_1 X_t + ε_t, where ε_t = ρε_{t-1} + u_t and u_t ~ (0, σ²_u).

It is possible to rewrite the original model as

y_t = β_0 + β_1 X_t + ρε_{t-1} + u_t

but ε_{t-1} = y_{t-1} - β_0 - β_1 X_{t-1}, thus

y_t = β_0 + β_1 X_t + ρ(y_{t-1} - β_0 - β_1 X_{t-1}) + u_t        (via substitution)
y_t - ρy_{t-1} = β_0(1 - ρ) + β_1(X_t - ρX_{t-1}) + u_t           (via gathering terms)
y*_t = β*_0 + β_1 X*_t + u_t

We can estimate the transformed model, which satisfies the full ideal conditions as long as u_t satisfies the full ideal conditions. One downside is the loss of the first observation, which can be a considerable sacrifice in degrees of freedom.

What if ρ is unknown? We seek a consistent estimator of ρ so as to run Feasible GLS.

Methods of estimating ρ:

1. Cochrane-Orcutt: Throw out the first observation. We assume an AR(1) process, which implies ε_i = ρε_{i-1} + u_i. So, we run OLS on ε̂_i = ρε̂_{i-1} + u_i and obtain

ρ̂ = Σ_{i=2}^N ε̂_i ε̂_{i-1} / Σ_{i=2}^N ε̂²_{i-1}

which is the OLS estimator of ρ.
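The Breusch-Godfrey recipe (regress ε̂ on X and lagged residuals, then compare NR² to χ²_p) can be sketched with plain numpy. The data below are simulated with ρ = 0.7, and the 5% χ²(1) critical value of 3.84 is hard-coded:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)

# simulate y with AR(1) errors, rho = 0.7
u = rng.normal(scale=0.5, size=n)
eps = np.empty(n)
eps[0] = u[0]
for t in range(1, n):
    eps[t] = 0.7 * eps[t - 1] + u[t]
y = 1.0 + 2.0 * x + eps

# step 1: OLS residuals
X = np.column_stack([np.ones(n), x])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# step 2: auxiliary regression of e_i on X_i and e_{i-1} (here p = 1)
Z = np.column_stack([X[1:], e[:-1]])
g = np.linalg.lstsq(Z, e[1:], rcond=None)[0]
r2 = 1.0 - np.sum((e[1:] - Z @ g) ** 2) / np.sum((e[1:] - e[1:].mean()) ** 2)
lm = (n - 1) * r2  # NR^2, asymptotically chi^2(p) under H0

# with rho = 0.7 this should far exceed the 5% chi^2(1) critical value of 3.84
```

For AR(p) one simply appends p lagged-residual columns to the auxiliary regression and compares NR² to χ²_p.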
2. Durbin's Method: After substituting for ε_{i-1} we see that

y_i = β_0 + β_1 X_{i1} + β_2 X_{i2} + ... + β_k X_{ik} + ρε_{i-1} + u_i
    = β_0 + β_1 X_{i1} + ... + β_k X_{ik} + ρ(y_{i-1} - β_0 - β_1 X_{i-1,1} - ... - β_k X_{i-1,k}) + u_i

So, we run OLS on

y_i = ρy_{i-1} + (1 - ρ)β_0 + β_1 X_{i1} - ρβ_1 X_{i-1,1} + ... + β_k X_{ik} - ρβ_k X_{i-1,k} + u_i

From this we obtain ρ̂, which is the coefficient on y_{i-1}. This parameter estimate is biased but consistent. Note: When k is large, we may have a problem with degrees of freedom. To preserve the degrees of freedom, we must have N > 2k + 1 observations to employ this method. In small samples, this method may not be feasible.

3. Newey-West Covariance Matrix: We can correct the covariance matrix of β̂ much like we did in the case of heteroscedasticity. This extension of White (1980) was offered by Newey and West. We seek a consistent estimator of X'ΩX, which then leads to

cov(β̂) = σ²(X'X)⁻¹ X'ΩX (X'X)⁻¹

where

X'ΩX = (1/N) Σ_{i=1}^N ε̂_i² X_i X_i' + (1/N) Σ_{l=1}^L Σ_{j=l+1}^N ω_l ε̂_j ε̂_{j-l} (X_j X'_{j-l} + X_{j-l} X'_j)

and ω_l = 1 - l/(L + 1). A possible problem in this approach is determining L, i.e., how far back into the past to go to correct the covariance matrix for autocorrelation.
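A numpy sketch of the Newey-West sandwich with Bartlett weights ω_l = 1 - l/(L + 1); the function name is ours, the data are simulated, and no small-sample correction is applied:

```python
import numpy as np

def newey_west_cov(X, e, L):
    """HAC covariance of beta_hat: a sandwich (X'X/N)^{-1} S (X'X/N)^{-1} / N,
    where S sums Bartlett-weighted autocovariances of e_i * X_i."""
    n, _ = X.shape
    S = (X * (e ** 2)[:, None]).T @ X / n           # lag-0 (White) term
    for l in range(1, L + 1):
        w = 1.0 - l / (L + 1.0)                     # Bartlett weight
        G = (X[l:] * (e[l:] * e[:-l])[:, None]).T @ X[:-l] / n
        S += w * (G + G.T)
    B = np.linalg.inv(X.T @ X / n)
    return B @ S @ B / n

# simulated illustration
rng = np.random.default_rng(5)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
se = np.sqrt(np.diag(newey_west_cov(X, e, L=1)))
```

With L = 0 this collapses to White's heteroscedasticity-consistent estimator; choosing L is exactly the practical problem of deciding how far back into the past to go.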
Forecasting in the AR(1) Environment

Having estimated β_GLS, we know that β_GLS is BLUE when cov(ε) = σ²Ω with Ω ≠ I. With an AR(1) process, we know that tomorrow's output is dependent upon today's output and today's random error. We estimate

y_t = X_t β + ε_t, where ε_t = ρε_{t-1} + u_t.

The forecast becomes

y_{t+1} = X_{t+1} β + ε_{t+1} = X_{t+1} β + ρε_t + u_{t+1}

To finish the forecast, we need ρ̂ from our previous estimation techniques, and then we recognize that ε̂_t = y_t - X_t β̂ from the GLS estimation. We assume that u_{t+1} has a zero mean. Then we see that

ŷ_{t+1} = X_{t+1} β̂ + ρ̂ε̂_t

Example: Gasoline Retail Prices

In this example we look at the relationship between the U.S. average retail price of gasoline and the wholesale price of gasoline from January 1985 through February 2006. As an initial step, we plot the two series over time and notice a highly correlated pair of series:

[Figure: time-series plot of allgradesprice and wprice against obs.]
A simple OLS regression model produces:

. reg allgradesprice wprice

      Source |       SS       df       MS              Number of obs =     254
-------------+------------------------------           F(  1,   252) = 3467.83
       Model |   279156.17     1   279156.17           Prob > F      =  0.0000
    Residual |  20285.6879   252  80.4987614           R-squared     =  0.9323
-------------+------------------------------           Adj R-squared =  0.9320
       Total |  299441.858   253  1183.56466           Root MSE      =  8.9721

allgradesp~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      wprice |   1.219083   .0207016    58.89   0.000     1.178313    1.259853
       _cons |   31.98693   1.715235    18.65   0.000     28.60891    35.36495

The results suggest that for every penny increase in the wholesale price, there is a 1.21 penny increase in the average retail price of gasoline. The constant term suggests that, on average, there is approximately a 32 cent difference between retail and wholesale prices, comprised of profits and state and federal taxes.

A Durbin-Watson statistic calculated after the regression yields

Durbin-Watson d-statistic(2, 254) = 0.1905724

The DW statistic suggests that the data suffer from significant autocorrelation. Reversing out an estimate via ρ̂ = 1 - d/2 suggests that ρ̂ = 0.904. Here is a picture of the fitted residuals against time:

[Figure: fitted residuals plotted against obs, fluctuating roughly between -20 and 20.]
Here are robust-regression results:

. reg allgradesprice wprice, robust

Regression with robust standard errors        Number of obs =     254
                                              F(  1,   252) = 5951.12
                                              Prob > F      =  0.0000
                                              R-squared     =  0.9323
                                              Root MSE      =  8.9721

             |     Robust
allgradesp~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      wprice |   1.219083   .0158028    77.14   0.000      1.18796    1.250205
       _cons |   31.98693   1.502928    21.28   0.000     29.02703    34.94683

The robust regression results suggest that naive OLS over-states the variance in the parameter estimate on wprice, but the positive value of ρ suggests the opposite is likely true. Various fixes are possible. First, Newey-West standard errors:

. newey allgradesprice wprice, lag(1)

Regression with Newey-West standard errors    Number of obs =     254
maximum lag: 1                                F(  1,   252) = 3558.42
                                              Prob > F      =  0.0000

             |   Newey-West
allgradesp~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      wprice |   1.219083   .0204364    59.65   0.000     1.178835    1.259331
       _cons |   31.98693   2.023802    15.81   0.000     28.00121    35.97265

The Newey-West corrected standard errors, assuming AR(1) errors, are significantly higher than the robust OLS standard errors but are only slightly lower than those in naive OLS.

Cochrane-Orcutt AR(1) regression -- iterated estimates

      Source |       SS       df       MS              Number of obs =     253
-------------+------------------------------           F(  1,   251) =  606.24
       Model |  5740.73875     1  5740.73875           Prob > F      =  0.0000
    Residual |  2376.81962   251  9.46940088           R-squared     =  0.7072
-------------+------------------------------           Adj R-squared =  0.7060
       Total |  8117.55837   252  32.2125332           Root MSE      =  3.0772

allgradesp~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      wprice |   .8133207   .0330323    24.62   0.000     .7482648    .8783765
       _cons |   75.27718   12.57415     5.99   0.000      50.5129    100.0415
         rho |   .9840736

Durbin-Watson statistic (original)    0.190572
Durbin-Watson statistic (transformed) 2.065375
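Finally, the one-step-ahead AR(1) forecast ŷ_{t+1} = X_{t+1}β̂ + ρ̂ε̂_t from the forecasting section is a one-liner. The numbers below are hypothetical, loosely in the spirit of the gasoline example, and are not taken from the output above:

```python
import numpy as np

def ar1_forecast(X_next, beta_hat, rho_hat, last_resid):
    """One-step-ahead forecast: X_{t+1} beta_hat + rho_hat * eps_hat_t
    (u_{t+1} is replaced by its zero mean)."""
    return X_next @ beta_hat + rho_hat * last_resid

beta_hat = np.array([32.0, 1.2])   # hypothetical intercept and slope
X_next = np.array([1.0, 150.0])    # [1, next-period wholesale price]
y_hat = ar1_forecast(X_next, beta_hat, rho_hat=0.9, last_resid=-3.0)
# 32 + 1.2 * 150 + 0.9 * (-3) = 209.3
```

The ρ̂ε̂_t term pulls the forecast below the pure regression prediction here because the last residual is negative; ignoring the AR(1) structure would discard that information.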