Chapter 10: Multicollinearity Iris Wang iris.wang@kau.se
Econometric problems
Multicollinearity

What does it mean? A high degree of correlation amongst the explanatory variables.

What are its consequences? It may be difficult to separate out the effects of the individual regressors. Standard errors may be overestimated and t values depressed. Note: a symptom may be a high R² but low t values.

How can you detect the problem? Examine the correlation matrix of the regressors, carry out auxiliary regressions amongst the regressors, and look at the variance inflation factor (VIF).

NOTE: be careful not to apply t tests mechanically without checking for multicollinearity. Multicollinearity is a data problem, not a misspecification problem.
Variance inflation factor (VIF)

Multicollinearity inflates the variance of an estimator:

VIF_J = 1 / (1 − R_J²)

where R_J² measures the R² from a regression of X_J on the other X variable(s). A serious multicollinearity problem is indicated if VIF_J > 5.
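The VIF formula above can be sketched directly: for each regressor, run the auxiliary regression on the remaining regressors and plug its R² into 1/(1 − R²). This is a minimal numpy illustration on simulated data, not code from the course.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the n x k regressor
    matrix X: VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing
    column j on the other columns (with an intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])      # intercept + other X's
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        tss = (y - y.mean()) @ (y - y.mean())
        r2 = 1.0 - (resid @ resid) / tss
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated example: x2 is nearly collinear with x1, x3 is independent,
# so the first two VIFs should exceed the rule-of-thumb threshold of 5.
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + 0.1 * rng.normal(size=300)
x3 = rng.normal(size=300)
print(vif(np.column_stack([x1, x2, x3])))
```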
Econometric problems
Heteroskedasticity

What does it mean? The variance of the error term is not constant.

What are its consequences? The least squares results are no longer efficient, and t test and F test results may be misleading.

How can you detect the problem? Plot the residuals against each of the regressors, or use one of the more formal tests.

How can we remedy the problem? Respecify the model: look for other missing variables; perhaps take logs or choose some other appropriate functional form; or make sure relevant variables are expressed per capita.
The Homoskedastic Case
The Heteroskedastic Case
The consequences of heteroskedasticity

OLS estimators are still unbiased (unless there are also omitted variables). However, OLS estimators are no longer efficient or minimum variance. The formulae used to estimate the coefficient standard errors are no longer correct, so the t-tests will be misleading and confidence intervals based on these standard errors will be wrong.
Detecting heteroskedasticity

Visual inspection of a scatter diagram of the residuals, or the Goldfeld-Quandt test, which is suitable for a simple form of heteroskedasticity.
Goldfeld-Quandt test (JASA, 1965), p. 382

Suppose it looks as if σ_ui = σ_u X_i, i.e. the error variance is proportional to the square of one of the X's. Rank the data according to that variable and conduct an F test using RSS_2/RSS_1, where these RSS are based on regressions using the first and last (n − c)/2 observations (c is a central section of data, usually about 25% of n). Reject H_0 of homoskedasticity if F_cal > F_tables.
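The steps above (rank by X, drop a central band of about 25% of n, fit OLS on each tail, form RSS_2/RSS_1) can be sketched as follows. This is an illustrative numpy implementation on simulated data, assuming a simple bivariate model; it is not the textbook's code.

```python
import numpy as np

def goldfeld_quandt(y, x, drop_frac=0.25):
    """Goldfeld-Quandt test sketch: sort the data by x, drop a central
    band (drop_frac of n), fit OLS on the low and high subsamples, and
    return the ratio RSS_2/RSS_1, compared with F(m-2, m-2)."""
    order = np.argsort(x)
    y, x = y[order], x[order]
    n = len(y)
    c = int(drop_frac * n)               # central observations to drop
    m = (n - c) // 2                     # size of each tail subsample

    def rss(ys, xs):
        Z = np.column_stack([np.ones(len(xs)), xs])
        b, *_ = np.linalg.lstsq(Z, ys, rcond=None)
        e = ys - Z @ b
        return e @ e

    rss1 = rss(y[:m], x[:m])             # low-x subsample
    rss2 = rss(y[-m:], x[-m:])           # high-x subsample
    return rss2 / rss1

# Simulated data with error sd proportional to x, so the high-x tail
# should have a much larger RSS and the F ratio should be well above 1.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=400)
y = 2 + 0.5 * x + rng.normal(size=400) * x
print(goldfeld_quandt(y, x))
```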
Remedies

Respecification of the model: include relevant omitted variable(s); express the model in log-linear form or some other appropriate functional form; express variables in per capita form. Where respecification won't solve the problem, use robust heteroskedasticity-consistent standard errors (due to Hal White, Econometrica 1980).
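White's heteroskedasticity-consistent (HC0) standard errors mentioned above replace the usual OLS variance formula with the sandwich estimator (X'X)⁻¹ X' diag(e_i²) X (X'X)⁻¹. A minimal numpy sketch, using simulated data rather than anything from the course:

```python
import numpy as np

def ols_white_se(y, X):
    """OLS coefficients with White (1980) HC0 robust standard errors:
    Var(beta_hat) = (X'X)^-1 X' diag(e_i^2) X (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    meat = X.T @ (X * (e ** 2)[:, None])   # X' diag(e^2) X
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Heteroskedastic simulated data: y = 1 + 2x + u with sd(u) growing in x.
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1, 5, size=n)
y = 1 + 2 * x + rng.normal(size=n) * x
X = np.column_stack([np.ones(n), x])
beta, se = ols_white_se(y, X)
print(beta, se)
```

The point estimates are identical to plain OLS; only the standard errors change, so the t statistics built from them remain valid under heteroskedasticity.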
Basic Econometrics, Spring 2012 Chapter 11: Heteroskedasticity Iris Wang iris.wang@kau.se
Chapter 11: Heteroskedasticity

Definition: Heteroskedasticity occurs when the constant variance assumption, i.e. Var(u_i | X_i) = σ², fails. This happens when the variance of the error term (u_i) changes across different values of X_i.

Example: Savings_i = α_0 + α_1 income_i + u_i. Heteroskedasticity is present if the variance of the unobserved factors affecting savings (u_i) increases with income, i.e. higher variance of u_i for higher income.
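The savings example can be made concrete with a short simulation. The parameter values and the assumption that the error standard deviation is proportional to income are purely illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
income = rng.uniform(10, 100, size=n)        # hypothetical income data
# Error standard deviation grows with income -> heteroskedastic u_i
u = rng.normal(0, 0.05 * income)
savings = 5 + 0.2 * income + u               # Savings_i = a0 + a1*income_i + u_i

# Residual spread in the bottom vs top income quartile: the top quartile
# should show a much larger spread, the visual signature of heteroskedasticity.
lo = u[income < np.quantile(income, 0.25)]
hi = u[income > np.quantile(income, 0.75)]
print(lo.std(), hi.std())
```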
Chapter 11: Heteroskedasticity

Outline
1. Consequences of Heteroskedasticity
2. Testing for Heteroskedasticity
1. Consequences of Heteroskedasticity

OLS is unbiased and consistent under the following four assumptions: linear in parameters; random sampling; no perfect collinearity; zero conditional mean (E(u|X) = 0). The homoskedasticity assumption (MLR.4), stating constant error variance (Var(u|X) = σ²), plays no role in showing that OLS is unbiased and consistent. Heteroskedasticity doesn't cause bias or inconsistency in the OLS estimators.
1. Consequences of Heteroskedasticity (cntd)

However, estimators of the variances Var(β̂_j) are biased without homoskedasticity, so the OLS standard errors are biased. Standard confidence intervals and t and F statistics, which are based on these standard errors, are no longer valid: the t and F statistics no longer have t and F distributions respectively, and this is not resolved in large samples. OLS is no longer BLUE or asymptotically efficient; it is possible to find estimators that are more efficient than OLS (e.g. GLS, Generalized Least Squares). Solutions involve using: i. Generalized least squares (GLS); ii. Weighted least squares (WLS), a special case of GLS, p. 373.
Weighted Least Squares (WLS)

Aim: to specify the form of the heteroskedasticity detected and use the weighted least squares estimator. If we have correctly specified the form of the variance, then WLS is more efficient than OLS. If we use the wrong form of variance, WLS will be biased, but it is generally consistent as long as E(u|X) = 0; however, the efficiency of WLS is not guaranteed when using the wrong form of variance. We use the specified variance form to transform the original regression equation into one with a homoskedastic error term; the bias will improve with large N.
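To see the WLS transformation concretely, assume the variance form Var(u_i | x_i) = σ² x_i² (the form used in the Goldfeld-Quandt discussion). Dividing y_i = β_0 + β_1 x_i + u_i through by x_i gives y_i/x_i = β_0 (1/x_i) + β_1 + u_i/x_i, where the new error u_i/x_i is homoskedastic. A numpy sketch under that assumed variance form:

```python
import numpy as np

def wls_prop_x2(y, x):
    """WLS sketch assuming Var(u_i) = sigma^2 * x_i^2: divide the model
    y_i = b0 + b1*x_i + u_i through by x_i and run OLS of y/x on 1/x
    and a constant; the constant's coefficient estimates b1."""
    ys = y / x
    Z = np.column_stack([1.0 / x, np.ones(len(x))])  # columns: b0*(1/x), b1*1
    b, *_ = np.linalg.lstsq(Z, ys, rcond=None)
    return b[0], b[1]                                # (b0_hat, b1_hat)

# Simulated data consistent with the assumed variance form.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=2000)
y = 3 + 1.5 * x + rng.normal(size=2000) * x * 0.5
b0, b1 = wls_prop_x2(y, x)
print(b0, b1)
```

If the true variance form differs from x_i², the weights are wrong and the efficiency gain over OLS is lost, which is the caveat stated above.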
2. Testing for Heteroskedasticity

Why test for heteroskedasticity? First, unless there is evidence of heteroskedasticity, many prefer to use the usual t statistics under OLS, because these have an exact t distribution under the assumptions of homoskedasticity and normally distributed errors. Second, if heteroskedasticity is present, it is possible to obtain a better estimator than OLS when the form of heteroskedasticity is known.

In the regression model Y = β_0 + β_1 x_1 + … + β_k x_k + u, we assume that E(u | x_1, …, x_k) = 0, so OLS is unbiased and consistent. To test for violation of the homoskedasticity assumption, we test the null hypothesis H_0: Var(u | x_1, …, x_k) = σ².
2. Testing for Heteroskedasticity (cntd)

To test the null hypothesis above, we test whether the expected value of u² is related to one or more of the explanatory variables. If we reject H_0, then heteroskedasticity is a problem and needs to be treated. Two types of heteroskedasticity tests: A. Goldfeld-Quandt test for heteroskedasticity, p. 382; B. White's general heteroskedasticity test, p. 386. Once we reject H_0 of homoskedasticity, we should treat the heteroskedasticity problem.
B. White heteroskedasticity test

The homoskedasticity assumption Var(u|X) = σ² can be replaced with the weaker assumption that u² is uncorrelated with: all the independent variables (x_j), their squared terms (x_j²), and their cross products (x_j x_h for all h ≠ j). Under this weaker assumption, OLS standard errors and test statistics are asymptotically valid. White's heteroskedasticity test is motivated by this assumption. For example, for k = 3:

û² = δ_0 + δ_1 x_1 + δ_2 x_2 + δ_3 x_3 + δ_4 x_1² + δ_5 x_2² + δ_6 x_3² + δ_7 x_1 x_2 + δ_8 x_1 x_3 + δ_9 x_2 x_3 + v

The White test is the F statistic for testing that all the δ_j except δ_0 are zero. Limitation: it consumes degrees of freedom (for k = 3, we needed 9 variables).
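The auxiliary regression above can be sketched in numpy. This version computes the LM form of the White test, n·R², which is asymptotically chi-squared with degrees of freedom equal to the number of auxiliary slopes (the slides describe the equivalent F form); the data are simulated for illustration.

```python
import numpy as np

def white_test_stat(y, X):
    """White's general heteroskedasticity test (LM form): regress the
    squared OLS residuals on the regressors, their squares, and their
    cross products, and return n*R^2 with its degrees of freedom."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e2 = (y - Z @ b) ** 2                              # squared residuals
    # Auxiliary regressors: levels, then squares and cross products.
    cols = [X[:, j] for j in range(k)]
    cols += [X[:, j] * X[:, h] for j in range(k) for h in range(j, k)]
    A = np.column_stack([np.ones(n)] + cols)
    g, *_ = np.linalg.lstsq(A, e2, rcond=None)
    v = e2 - A @ g
    tss = (e2 - e2.mean()) @ (e2 - e2.mean())
    r2 = 1.0 - (v @ v) / tss
    return n * r2, A.shape[1] - 1                      # statistic, df

# Heteroskedastic vs homoskedastic simulated data (k = 1, so df = 2).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 5, size=n).reshape(-1, 1)
y_het = 1 + 2 * x[:, 0] + rng.normal(size=n) * x[:, 0]
y_hom = 1 + 2 * x[:, 0] + rng.normal(size=n)
print(white_test_stat(y_het, x), white_test_stat(y_hom, x))
```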
Basic Econometrics Autocorrelation Iris Wang iris.wang@kau.se
Econometric problems
Topics to be covered Overview of autocorrelation First order autocorrelation and the Durbin Watson test Higher order autocorrelation and the Breusch Godfrey test Dealing with autocorrelation Examples and practical illustrations
Autocorrelated series and autocorrelated disturbances
Overview of autocorrelation

What is meant by autocorrelation? The error terms are not independent from observation to observation: u_t depends on one or more past values of u.

What are its consequences? The least squares estimators are no longer efficient (i.e. they don't have the lowest variance). More seriously, autocorrelation may be a symptom of model misspecification.

How can you detect the problem? Plot the residuals against time or their own lagged values, calculate the Durbin-Watson statistic, or use some other test of autocorrelation such as the Breusch-Godfrey (BG) test.

How can you remedy the problem? Consider possible re-specification of the model: a different functional form, missing variables, lags etc. If all else fails, you could correct for autocorrelation by using the Cochrane-Orcutt procedure or Autoregressive Least Squares.
First order autocorrelation
The sources of autocorrelation
The consequences of autocorrelation
Detecting autocorrelation
The Durbin Watson test
More on the Durbin Watson statistic
Using the Durbin Watson statistic
Durbin Watson critical values
The Breusch Godfrey (BG) test
The Breusch Godfrey test continued
Dealing with autocorrelation

How should you deal with a problem of autocorrelation? Consider possible re-specification of the model: a different functional form, the inclusion of additional explanatory variables, or the inclusion of lagged variables (independent and dependent). If all else fails, you can correct for autocorrelation by using Autoregressive Least Squares.
Quick questions and answers
Question 1: What is the problem of autocorrelation?
Answer: Autocorrelation is the problem where the disturbances in a regression model are not independent of one another from observation to observation (it is mainly a problem for models estimated using time series data)
Question 2: Is serial correlation the same as autocorrelation?
Answer: Yes. Serially correlated disturbances or errors are the same as autocorrelated ones.
Question 3: What is meant by AR(1) errors?
Answer: This means that the errors or disturbances follow a first order autoregressive pattern: u_t = ρu_{t−1} + ε_t
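The AR(1) pattern u_t = ρu_{t−1} + ε_t is easy to simulate, which also shows what "autocorrelated disturbances" look like in practice: the lag-1 sample autocorrelation of the simulated series sits close to ρ. A small numpy sketch (parameter values are illustrative):

```python
import numpy as np

def simulate_ar1_errors(n, rho, sigma=1.0, seed=0):
    """Simulate AR(1) disturbances u_t = rho*u_{t-1} + eps_t,
    with eps_t ~ N(0, sigma^2) and u_0 = 0."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0, sigma, size=n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    return u

u = simulate_ar1_errors(5000, rho=0.7, seed=1)
# Lag-1 autocorrelation of the series should be close to rho = 0.7.
print(np.corrcoef(u[1:], u[:-1])[0, 1])
```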
Question 4: What is the best known test for AR(1) disturbances?
Answer: The Durbin-Watson test. The null hypothesis of no autocorrelation (serial independence) is H_0: ρ = 0
Question 5: What is the range of possible values for the DW statistic?
Answer: 0 ≤ DW ≤ 4. If there is no autocorrelation you would expect to get a DW statistic of around 2.
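The DW statistic is computed directly from the regression residuals as the sum of squared first differences over the sum of squares, DW = Σ(e_t − e_{t−1})² / Σe_t², and is approximately 2(1 − ρ̂). A numpy sketch with simulated residuals showing both ends of the range:

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic: sum of squared first differences of the
    residuals over their sum of squares; approximately 2*(1 - rho_hat)."""
    d = np.diff(e)
    return (d @ d) / (e @ e)

rng = np.random.default_rng(0)
# White-noise residuals: DW should be near 2.
e_wn = rng.normal(size=2000)
# AR(1) residuals with rho = 0.8: DW should be near 2*(1 - 0.8) = 0.4.
e_ar = np.zeros(2000)
eps = rng.normal(size=2000)
for t in range(1, 2000):
    e_ar[t] = 0.8 * e_ar[t - 1] + eps[t]
print(durbin_watson(e_wn), durbin_watson(e_ar))
```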
Question 6: What are the three main limitations of the DW test?
Answer: 1. It only tests for AR(1) errors. 2. It has regions where the test is inconclusive (between d_L and d_U). 3. The DW statistic is biased towards 2 in models with a lagged dependent variable.
Question 7: How do you test for higher order autocorrelated errors?
Answer: Using the Breusch Godfrey (BG) test
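The Breusch-Godfrey test mentioned above regresses the OLS residuals on the original regressors plus p lagged residuals and uses LM = n·R², which is asymptotically chi-squared with p degrees of freedom under the null of no autocorrelation. A numpy sketch on simulated data (the model and parameters are illustrative):

```python
import numpy as np

def breusch_godfrey(y, X, p=1):
    """Breusch-Godfrey LM test sketch for AR(p) errors: regress the OLS
    residuals e_t on the original regressors and e_{t-1},...,e_{t-p}
    (pre-sample residuals set to zero); return LM = n*R^2 ~ chi^2(p)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    lags = np.column_stack([np.r_[np.zeros(i), e[:-i]] for i in range(1, p + 1)])
    A = np.column_stack([Z, lags])
    g, *_ = np.linalg.lstsq(A, e, rcond=None)
    v = e - A @ g
    r2 = 1.0 - (v @ v) / (e @ e)   # e has mean ~0 (OLS with intercept)
    return n * r2

# Model with AR(1) errors vs iid errors: the LM statistic should be
# large in the first case and small in the second.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
u = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + eps[t]
y_ar = 1 + x + u
y_iid = 1 + x + rng.normal(size=n)
print(breusch_godfrey(y_ar, x.reshape(-1, 1), p=1))
print(breusch_godfrey(y_iid, x.reshape(-1, 1), p=1))
```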
Question 9: How do I know what order of autocorrelation to test for?
Answer: With annual data a first order test is probably enough; with quarterly or monthly data check for AR(4) or AR(12) errors if you have enough data. If in doubt, repeat the test for a number of different maximum lags.
Question 10: What should I do if my model exhibits autocorrelation?
Answer: In the first instance, try model re-specification (additional lagged values of variables or a log transformation of some series). If this doesn't deal with the problem, use Autoregressive Least Squares rather than OLS estimation.