CHAPTER 4: Forecasting by Regression
Prof. Alan Wan
Table of contents

1. Revision of Linear Regression
2. Multicollinearity
3.1 First-order Autocorrelation and the Durbin-Watson Test
3.2 Correction for Autocorrelation
Revision of Linear Regression

One main purpose of regression is to forecast an outcome, also called the response variable or dependent variable, based on certain factors, also called explanatory variables or regressors. The outcome has to be quantitative, but the explanatory variables can be either quantitative or qualitative.

Linear regression postulates a linear association between the response and each of the explanatory variables; simple regression deals with situations with one explanatory variable, whereas multiple regression tackles cases with more than one regressor.
A multiple linear regression model may be expressed as:

Y_t = β_0 + β_1 X_1t + β_2 X_2t + β_3 X_3t + ... + β_k X_kt + ε_t,

where ε_t ~ N(0, σ²). Hence

E(Y_t) = β_0 + β_1 X_1t + β_2 X_2t + β_3 X_3t + ... + β_k X_kt.

The estimated sample multiple linear regression model is thus

Ŷ_t = b_0 + b_1 X_1t + b_2 X_2t + b_3 X_3t + ... + b_k X_kt,

where b_0, b_1, ..., b_k are the ordinary least squares (O.L.S.) estimators of β_0, β_1, ..., β_k respectively, obtained by the criterion

min Σ_{t=1}^n e_t² = min Σ_{t=1}^n (Y_t − Ŷ_t)².
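As an illustration of the least-squares criterion above, the following sketch (plain Python, with made-up data rather than any series from this chapter) fits a simple regression Ŷ_t = b_0 + b_1 X_t using the closed-form O.L.S. formulas:

```python
# Minimal O.L.S. sketch for a simple regression Y_t = b0 + b1*X_t + e_t.
# The data below are hypothetical, chosen only to show the mechanics.

def ols_simple(x, y):
    """Return (b0, b1) minimising the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum (x_t - x_bar)(y_t - y_bar) / sum (x_t - x_bar)^2
    sxy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = ols_simple([1, 2, 3, 4], [3, 5, 7, 9])  # data lie exactly on Y = 1 + 2X
```

For a model with k regressors the same criterion leads to the normal equations, usually solved in matrix form; statistical packages such as SAS's PROC REG do this internally.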
A slope coefficient represents the marginal change of Y_t with respect to a one-unit change in the corresponding explanatory variable.

The linear regression model assumes
1. that there is a linear association between the response and each of the explanatory variables;
2. that E(ε_t) = 0 for all t, meaning that no relevant explanatory variable has been omitted;
3. that the disturbances are homoscedastic, i.e., var(ε_t) = σ² for all t;
4. that the disturbances are uncorrelated, i.e., cov(ε_t, ε_{t+s}) = 0 for all t and s ≠ 0;
5. the absence of perfect multicollinearity, i.e., no exact linear association exists among the explanatory variables;
6. normality of the ε_t's (this assumption is needed only when conducting inference).
The O.L.S. estimator b_j is a linear estimator of β_j for j = 0, ..., k, because each b_j can be written as a linear combination of the Y_t's, weighted by a mixture of the values of the X_t's.

When Assumptions 1-5 are fulfilled, b_j is the best linear unbiased estimator (B.L.U.E.) of β_j, meaning that the linear estimator b_j is unbiased (i.e., E(b_j) = β_j for j = 0, ..., k) and has the smallest variance (and hence the highest average precision) of all linear unbiased estimators of β_j. The theorem establishing this result is known as the Gauss-Markov Theorem.
Common model diagnostics include
1. t-tests of significance of individual coefficients;
2. the F test of model significance;
3. R² and adjusted R² for goodness of fit;
4. tests of autocorrelation (usually for time series data);
5. tests of homoscedasticity (usually for cross-section data);
6. tests of autoregressive conditional heteroscedasticity (usually for financial time series data);
7. detection of outliers;
8. tests of normality of errors;
9. tests of coefficient constancy (structural change);
and others.
The following example, with n = 34 annual observations, is taken from Griffiths, Hill and Judge (1993). We are concerned with the relationship between the area of sugarcane planted in a region of Bangladesh (A, in thousands of hectares) and the prices of sugarcane and jute. By using area planted instead of quantity produced as the dependent variable, we are eliminating yield uncertainty. It is thought that when farmers decide on an area for sugarcane production, their decision is largely determined by the price of sugarcane (PS, in taka/tonne) and that of its main substitute, jute (PJ, in taka/tonne). Assuming a log-linear functional form for constant elasticity, we specify the model as

lnA_t = β_0 + β_1 lnPS_t + β_2 lnPJ_t + ε_t.
PROC REG of SAS produces the following results:

[SAS PROC REG output: analysis-of-variance table and parameter estimates for MODEL1, dependent variable lnA, 34 observations read and used; the numeric values were not preserved in this transcription. lnPS is significant with Pr > |t| < .0001.]
The estimated regression equation is thus

lnÂ_t = b_0 + b_1 lnPS_t + b_2 lnPJ_t

(the slide reported the coefficient estimates with standard errors in parentheses; these numeric values were not preserved in the transcription).

A test of H_0: β_1 = 0 vs. H_1: β_1 ≠ 0 yields

t = (b_1 − 0) / s.e.(b_1) = 6.98,

with a p-value < 0.0001. Hence β_1 is significantly different from zero, and lnPS is therefore a significant explanatory variable. However, the same cannot be said about β_2, or lnPJ.
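The t-ratio computation is mechanical; the following sketch (plain Python, with a hypothetical estimate and standard error rather than the sugarcane regression's actual output) shows the test of H_0: β_j = 0:

```python
# t-test of an individual coefficient: t = (b_j - beta0) / s.e.(b_j).
# The estimate and standard error below are hypothetical.

def t_stat(b, se, beta0=0.0):
    """t-ratio for H0: beta_j = beta0."""
    return (b - beta0) / se

t = t_stat(b=1.40, se=0.20)  # hypothetical b_1 and s.e.(b_1)
# |t| would be compared with the t distribution with n - (k+1)
# degrees of freedom (31 for the sugarcane model, where n = 34, k = 2).
```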
A test of H_0: β_1 = β_2 = 0 vs. H_1: otherwise is conducted by the F test:

F = (RSS/k) / (ESS/(n − (k+1))),

which here has (2, 31) degrees of freedom; the computed F value (not preserved in this transcription) has a p-value < 0.0001, confirming the overall significance of the model.

Question: Why should we test the overall significance of the model in addition to testing the individual regressors' significance?

R² = 0.6206, meaning that the estimated regression can explain 62.06% of the variability of lnA in the sample; after adjusting for the model's d.o.f., the explanatory power of the model is 59.61%, as indicated by the adjusted R².
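These goodness-of-fit quantities can be sketched from the fitted values alone; the Python below uses tiny made-up data (not the sugarcane series) and the formulas R² = 1 − ESS/TSS, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), and F = (RSS/k)/(ESS/(n − k − 1)):

```python
# Goodness-of-fit and overall F statistic from observed y and fitted yhat.
# The data are illustrative only.

def fit_summary(y, yhat, k):
    """Return (R2, adj_R2, F) for a model with k slope coefficients."""
    n = len(y)
    y_bar = sum(y) / n
    tss = sum((yt - y_bar) ** 2 for yt in y)              # total SS
    ess = sum((yt - ft) ** 2 for yt, ft in zip(y, yhat))  # error SS
    rss = tss - ess  # regression SS (valid for an O.L.S. fit with intercept)
    r2 = 1 - ess / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f = (rss / k) / (ess / (n - k - 1))
    return r2, adj_r2, f

r2, adj_r2, f = fit_summary([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], k=1)
```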
Removing the insignificant lnPJ and re-running the regression yields:

[SAS PROC REG output: analysis of variance and parameter estimates for the simple regression of lnA on lnPS, 34 observations; numeric values not preserved in this transcription. lnPS remains significant with Pr > |t| < .0001.]
Note that R² decreases by 4.38% from 0.6206, whereas adjusted R² decreases by only 2.58% from 0.5961. Recall that when explanatory variables are dropped (added), R² always falls (rises), but adjusted R² may rise or fall; when the model contains fewer (more) variables, adjusted R² will rise (drop) if the increase (decrease) in d.o.f. due to the omission (addition) of variables outweighs the fall (rise) in the explanatory power of the regression.

For the simple linear regression model, the t statistic for H_0: β_1 = 0 is 6.834, which is the square root of the model's F statistic (i.e., t² = F). This result does not hold for multiple regression.
Multicollinearity

There is another serious consequence of adding too many variables to a model besides depleting the model's d.o.f. If a model has several variables, it is likely that some of them will be strongly correlated. This problem, known as multicollinearity, can drastically alter the results from one model to another, making them harder to interpret.

The most extreme form of multicollinearity is perfect multicollinearity, which refers to the situation where an explanatory variable can be expressed as an exact linear combination of some of the others. Under perfect multicollinearity, O.L.S. fails to produce estimates of the coefficients. A classic example of perfect multicollinearity is the dummy variable trap.
(Imperfect) multicollinearity is also known as near collinearity: the explanatory variables are linearly correlated, but they do not obey an exact linear relationship.

Consider the following three models explaining the relationship between HOUSING (number of housing starts, in thousands, in the U.S.) and POP (U.S. population in millions), GDP (U.S. Gross Domestic Product in billions of dollars) and INTRATE (new home mortgage interest rate), from 1963 to 1985:

1) HOUSING_t = β_0 + β_1 POP_t + β_2 INTRATE_t + ε_t
2) HOUSING_t = β_0 + β_3 GDP_t + β_2 INTRATE_t + ε_t
3) HOUSING_t = β_0 + β_1 POP_t + β_2 INTRATE_t + β_3 GDP_t + ε_t
Results for the first model:

[SAS PROC REG output: dependent variable HOUSING, regressors POP and INTRATE, 23 observations; numeric values not preserved in this transcription.]
Results for the second model:

[SAS PROC REG output: dependent variable HOUSING, regressors GDP and INTRATE, 23 observations; numeric values not preserved in this transcription.]
Results from Models 1) and 2) both make sense: estimates of the coefficients are of the expected signs (β_1 > 0, β_2 < 0 and β_3 > 0) and the coefficients are all highly significant.

Consider the third model, which combines the regressors of the first and second models:

[SAS PROC REG output: dependent variable HOUSING, regressors POP, GDP and INTRATE, 23 observations; numeric values not preserved in this transcription.]
In the third model, POP and GDP become insignificant, although both are significant when entered separately in the first and second models. This is because the three explanatory variables are strongly correlated. The pairwise sample correlations are r_GDP,POP = 0.99 and r_GDP,INTRATE = 0.88, with r_POP,INTRATE also high (its value was not preserved in the transcription).
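A quick way to screen for this is to compute the pairwise sample correlations directly; the sketch below (plain Python, toy data rather than the housing series) implements r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² Σ(y − ȳ)²):

```python
import math

# Pairwise sample correlation coefficient; the data are toy numbers,
# not the actual POP/GDP/INTRATE series.
def corr(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Two smoothly trending series are almost perfectly correlated,
# much like POP and GDP over 1963-1985:
r = corr([100, 105, 111, 118, 126], [1.0, 1.4, 1.9, 2.5, 3.2])
```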
Consider another example that relates EXPENSES, the cumulative expenditure on the maintenance of an automobile, to MILES, the cumulative mileage in thousands of miles, and WEEKS, the automobile's age in weeks since first purchase, for 57 automobiles. The following three models are considered:

1) EXPENSES_t = β_0 + β_1 WEEKS_t + ε_t
2) EXPENSES_t = β_0 + β_2 MILES_t + ε_t
3) EXPENSES_t = β_0 + β_1 WEEKS_t + β_2 MILES_t + ε_t

A priori, we expect β_1 > 0 and β_2 > 0: a car that is driven more should have a greater maintenance expense; similarly, the older the car, the greater the cost of maintaining it.
Consider results for the three models:

[SAS PROC REG output for Model 1): dependent variable EXPENSES, regressor WEEKS, 57 observations; numeric values not preserved in this transcription. Both the intercept and WEEKS have Pr > |t| < .0001.]
[SAS PROC REG output for Model 2): dependent variable EXPENSES, regressor MILES, 57 observations; numeric values not preserved in this transcription. Both the intercept and MILES have Pr > |t| < .0001.]
[SAS PROC REG output for Model 3): dependent variable EXPENSES, regressors WEEKS and MILES, 57 observations; numeric values not preserved in this transcription. WEEKS and MILES have Pr > |t| < .0001.]
It is interesting to note that even though the coefficient estimate for MILES is positive in the second model, it is negative in the third model: there is a reversal in sign. The magnitude of the coefficient estimate for WEEKS also changes substantially, and the t-statistics for MILES and WEEKS are much lower in the third model, even though both variables remain significant. The problem is again high correlation, here between WEEKS and MILES.
To explain, consider the model

Y_t = β_0 + β_1 X_1t + β_2 X_2t + ε_t.

It can be shown that

var(b_1) = σ² / [Σ_{t=1}^n (X_1t − X̄_1)² (1 − r_12²)]

and

var(b_2) = σ² / [Σ_{t=1}^n (X_2t − X̄_2)² (1 − r_12²)],

where r_12 is the sample correlation between X_1t and X_2t.
The effect of increasing r_12 on var(b_2) follows directly from the formula: writing V = σ² / Σ_{t=1}^n (X_2t − X̄_2)² for the variance when r_12 = 0, we have var(b_2) = V / (1 − r_12²), which grows without bound as |r_12| → 1. [The slide's table of var(b_2) as a multiple of V for increasing values of r_12 was not preserved in the transcription.]

The sign reversal and the decrease in t values (in absolute terms) are caused by the inflated variances of the estimators.
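The lost table is easy to regenerate from var(b_2) = V/(1 − r_12²); the sketch below tabulates the multiplier for a few illustrative correlations (the specific r values are my choice, not necessarily the slide's):

```python
# Variance multiplier var(b2)/V = 1/(1 - r12^2) as collinearity grows.
# The r values below are illustrative.

def var_multiplier(r12):
    return 1.0 / (1.0 - r12 ** 2)

table = {r: var_multiplier(r) for r in (0.0, 0.5, 0.9, 0.99)}
# r12 = 0    -> 1.0   (no inflation)
# r12 = 0.9  -> about 5.3
# r12 = 0.99 -> about 50
```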
Common consequences of multicollinearity:
- Wider confidence intervals.
- Insignificant t statistics.
- High R², so that the F test can convincingly reject H_0: β_1 = β_2 = ... = β_k = 0, yet few significant t values.
- O.L.S. estimates and their standard errors are very sensitive to small changes in the model.

Multicollinearity is very much the norm in regression analysis involving non-experimental data, and it can never be eliminated. The question is not about the existence or non-existence of multicollinearity, but how serious the problem is.
Identifying multicollinearity

How to identify multicollinearity?
- High R² (and a significant F value) but low values of the t statistics.
- Coefficient estimates and standard errors that are sensitive to small changes in the model specification.
- High pairwise correlations between the explanatory variables; note, however, that the converse need not be true. Multicollinearity can still be a problem even though no pairwise correlation appears high: it is possible for three or more variables to be strongly correlated jointly while all pairwise correlations are low.
Another tool is the variance inflation factor (VIF). The VIF for the variable X_j is

VIF_j = 1 / (1 − R_j²),

where R_j² is the coefficient of determination of the regression of X_j on the remaining explanatory variables. The VIF is a measure of the strength of the relationship between each explanatory variable and all the other explanatory variables. As R_j² rises towards 1, VIF_j rises from 1 towards infinity. [The slide's table relating values of R_j² to VIF_j was not preserved in the transcription.]
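For two regressors, the auxiliary R_1² from regressing X_1 on X_2 equals the squared pairwise correlation, so the VIF can be sketched directly; the Python below (toy data, my own construction) computes VIF_1 this way:

```python
# VIF for X1 in a two-regressor model: the auxiliary R^2 of X1 on X2
# equals the squared sample correlation r12^2, so VIF_1 = 1/(1 - r12^2).
# The data are a toy example, not from the chapter.

def vif_two_regressors(x1, x2):
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r2 = sxy ** 2 / (sxx * syy)  # auxiliary R^2
    return 1.0 / (1.0 - r2)

vif = vif_two_regressors([1, 2, 3, 4], [0, 1, 0, 1])
```

With more than two regressors the auxiliary regression is a multiple regression, which a package (e.g. SAS's VIF option on the MODEL statement) computes for each X_j in turn.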
Rule of thumb for using VIF:
- An individual VIF_j larger than 10 indicates that multicollinearity may be seriously influencing the least squares estimates of the regression coefficients.
- If the average of the VIF_j's of the model exceeds 5, then multicollinearity is considered to be serious.
For the HOUSING example:

[SAS PROC REG output: dependent variable HOUSING, regressors POP, GDP and INTRATE, 23 observations, with a variance-inflation column; numeric values not preserved in this transcription.]
Solutions to multicollinearity

Benign neglect: If an analyst is less interested in interpreting individual coefficients and more interested in forecasting, then multicollinearity may not be a serious concern. Even with high correlations among the independent variables, if the regression coefficients are significant and have meaningful signs and magnitudes, one need not be too concerned with multicollinearity.

Eliminating variables: Removing the variable most strongly correlated with the rest would generally improve the significance of the other variables. There is a danger, however, in removing too many variables from the model, because that would lead to bias in the estimates.
Respecify the model: For example, in the housing regression, we can express the variables in per capita terms rather than including population as an explanatory variable, leading to

HOUSING_t / POP_t = β_0 + β_1 GDP_t / POP_t + β_2 INTRATE_t + ε_t.

[SAS PROC REG output: dependent variable PHOUSING, regressors PGDP and INTRATE, 23 observations, with a variance-inflation column; numeric values not preserved in this transcription.]
Increase the sample size if additional information is available.

Use alternative estimation techniques such as ridge regression and principal components analysis (beyond the scope of this course).
3.1 First-order Autocorrelation and the Durbin-Watson Test

First-order autocorrelation

As described previously, the standard linear regression model assumes that ε_t and ε_{t+k} are uncorrelated for all k ≠ 0. When this assumption fails, the situation is known as autocorrelation or serial correlation.

The interpretation of such a situation is that the disturbance at time t influences not only the current value of the dependent variable but also values of the dependent variable at other times.

Many factors can cause autocorrelation, e.g., omitted explanatory variables, misspecification of the functional form, measurement errors, and patterns of business cycles, to name a few.
There are many possible specifications of correlation among disturbances. The simplest, and also the most common, type is first-order autocorrelation, by which the current disturbance depends linearly upon the immediately past disturbance plus another disturbance term that exhibits no autocorrelation over time, i.e.,

ε_t = ρ ε_{t−1} + ν_t,

where the ν_t's are uncorrelated and ρ is an autocorrelation coefficient. It is required that −1 < ρ < 1 to fulfill the assumption of stationarity (see Chapter 5).
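A first-order autocorrelated disturbance series is straightforward to generate recursively; the sketch below (plain Python, with an arbitrary innovation sequence chosen only to show the recursion) builds ε_t = ρε_{t−1} + ν_t:

```python
# Generate first-order autocorrelated disturbances eps_t = rho*eps_{t-1} + nu_t.
# The innovation sequence nu is arbitrary, not drawn from any real model.

def ar1_disturbances(nu, rho, eps0=0.0):
    eps, prev = [], eps0
    for v in nu:
        prev = rho * prev + v
        eps.append(prev)
    return eps

eps = ar1_disturbances([1.0, 0.0, 0.0, 0.0], rho=0.5)
# A single unit shock decays geometrically: 1.0, 0.5, 0.25, 0.125
```

The geometric decay of a single shock shows why |ρ| < 1 is needed: with |ρ| ≥ 1 the effect of a past disturbance never dies out.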
Durbin-Watson Test

The problem with O.L.S. under autocorrelation is that it leads to inefficient estimators of the coefficients and a biased estimator of the error variance. Alternative estimation strategies are therefore typically used when the disturbances are autocorrelated.

How do we test for first-order autocorrelation? The Durbin-Watson (DW) test is the most common test. The DW test statistic is given by

DW = Σ_{t=2}^n (e_t − e_{t−1})² / Σ_{t=1}^n e_t²,

where e_t = Y_t − Ŷ_t.
Note that

DW = Σ_{t=2}^n (e_t − e_{t−1})² / Σ_{t=1}^n e_t²
   = [Σ_{t=2}^n e_t² + Σ_{t=2}^n e_{t−1}² − 2 Σ_{t=2}^n e_t e_{t−1}] / Σ_{t=1}^n e_t²
   = [Σ_{t=1}^n e_t² − e_1² + Σ_{t=1}^n e_t² − e_n² − 2 Σ_{t=2}^n e_t e_{t−1}] / Σ_{t=1}^n e_t²
   = [2 Σ_{t=1}^n e_t² − 2 Σ_{t=2}^n e_t e_{t−1} − (e_1² + e_n²)] / Σ_{t=1}^n e_t²
   = 2(1 − r) − (e_1² + e_n²) / Σ_{t=1}^n e_t²,

where r = Σ_{t=2}^n e_t e_{t−1} / Σ_{t=1}^n e_t² is the sample autocorrelation coefficient.
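The identity above can be checked numerically; the Python sketch below computes DW both from its definition and from 2(1 − r) − (e_1² + e_n²)/Σe_t², using an arbitrary residual vector:

```python
# Check DW = 2*(1 - r) - (e_1^2 + e_n^2)/sum(e_t^2) on arbitrary residuals.

def dw_stat(e):
    """DW statistic computed directly from its definition."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(x * x for x in e)

def dw_identity(e):
    """DW statistic computed via the algebraic identity derived above."""
    s = sum(x * x for x in e)
    r = sum(e[t] * e[t - 1] for t in range(1, len(e))) / s
    return 2 * (1 - r) - (e[0] ** 2 + e[-1] ** 2) / s

e = [1.0, -1.0, 2.0, -2.0, 0.5]  # arbitrary residuals
# dw_stat(e) and dw_identity(e) agree to floating-point precision
```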
When the sample size is sufficiently large, DW ≈ 2(1 − r). If this were based on the true ε_t's, then DW would tend in the limit to 2(1 − ρ) as n increases. This means:
- if ρ ≈ 0, then DW ≈ 2;
- if ρ ≈ 1, then DW ≈ 0;
- if ρ ≈ −1, then DW ≈ 4.

Therefore, a test of H_0: ρ = 0 can be based on whether DW is close to 2 or not. Unfortunately, the critical values of DW depend on the values of the explanatory variables, and these vary from one data set to another.
78 3.1 First-order Autocorrelation and the Durbin-Watson Test 3.2 Correction for Autocorrelation Durbin-Watson Test To get around this problem, Durbin and Watson established the lower (d L ) and upper (d U ) bounds for the DW critical value. If DW > 4 d L or DW < d L, then we reject H 0. If the observed d U < DW < 4 d U, then we do not reject H 0. If DW lies in neither of these two regions then the test is inconclusive. See the DW table uploaded on the website. Note that d L and d U are tabulated in terms of n and k = k 1 = number of coefficients excluding the intercept. An intercept term must be present in order for d L s and d U s to be valid. 40 / 57
79 To be more specific, for testing H_0: ρ = 0 vs. H_1: ρ > 0, the decision rule is to reject H_0 if DW < d_L and not to reject H_0 if DW > d_U; the test is inconclusive if d_L < DW < d_U. For testing H_0: ρ = 0 vs. H_1: ρ < 0, the decision rule is to reject H_0 if DW > 4 − d_L and not to reject H_0 if DW < 4 − d_U; the test is inconclusive if 4 − d_U < DW < 4 − d_L. 41 / 57
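The two one-sided decision rules can be collected into a single helper. A Python sketch — note the bounds passed in below are illustrative placeholders, not values taken from the published DW table:

```python
def dw_decision(dw, d_l, d_u, alternative="positive"):
    """Bounds test of H0: rho = 0 against the stated alternative.

    alternative='positive' tests H1: rho > 0 (reject if DW < d_L);
    alternative='negative' tests H1: rho < 0 (reject if DW > 4 - d_L).
    Returns 'reject', 'do not reject', or 'inconclusive'.
    """
    if alternative == "positive":
        if dw < d_l:
            return "reject"
        if dw > d_u:
            return "do not reject"
        return "inconclusive"
    # Mirror the bounds around 2 for the negative-autocorrelation test.
    if dw > 4 - d_l:
        return "reject"
    if dw < 4 - d_u:
        return "do not reject"
    return "inconclusive"

# Illustrative bounds only -- look up d_L and d_U for your own n and k.
print(dw_decision(1.10, 1.33, 1.58))              # reject
print(dw_decision(1.45, 1.33, 1.58))              # inconclusive
print(dw_decision(2.90, 1.33, 1.58, "negative"))  # reject
```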
81 SAS calculates the DW statistic via the DW option in PROC REG. For example, for our previous sugarcane plant area example, one can calculate the DW statistic by

proc reg data=bangladesh;
model lna=lnps lnpj/dw;
run;

yielding the results

The REG Procedure
Model: MODEL1
Dependent Variable: lna

Durbin-Watson D
Number of Observations     34
1st Order Autocorrelation

42 / 57
83 As r = 0.412, we test H_0: ρ = 0 vs. H_1: ρ > 0. For n = 34 and k = 2, at the 5% significance level the table gives d_L = 1.33. Since DW < d_L, we reject H_0 and conclude that there is significant first-order autocorrelation in the disturbances. 43 / 57
86 3.2 Correction for Autocorrelation

Many alternative least squares procedures have been introduced for autocorrelation correction, e.g., the Cochrane-Orcutt procedure and the Prais-Winsten procedure.

SAS offers the AUTOREG procedure, which augments the original regression model with the autocorrelated disturbance function. For example, in the case of the sugarcane plant area regression example, AUTOREG considers the following model:

lnA_t = β_0 + β_1 lnPS_t + β_2 lnPJ_t + ɛ_t;  ɛ_t = ζɛ_{t−1} + ν_t,

where ζ = ρ. The procedure simultaneously estimates β_0, β_1, β_2 and ρ. 44 / 57
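For intuition, a single Cochrane-Orcutt iteration can be sketched in Python for the one-regressor case. This is an illustration only — PROC AUTOREG's default is Yule-Walker estimation, and the synthetic data below are made up:

```python
import random

def ols_line(x, y):
    """Simple OLS of y on x with an intercept, in closed form."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b1 * xbar, b1

def cochrane_orcutt_step(x, y):
    """One Cochrane-Orcutt iteration: (1) OLS, (2) estimate rho from the
    residuals, (3) OLS of (y_t - rho*y_{t-1}) on (x_t - rho*x_{t-1})."""
    b0, b1 = ols_line(x, y)
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    rho = sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(v * v for v in e)
    ystar = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    xstar = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    a0, a1 = ols_line(xstar, ystar)
    # The transformed intercept estimates beta0*(1 - rho); undo the scaling.
    return rho, a0 / (1 - rho), a1

# Synthetic data: y = 1 + 2x + e, with AR(1) errors e_t = 0.7 e_{t-1} + v_t
random.seed(42)
x = [0.1 * t for t in range(120)]
e, y = 0.0, []
for xi in x:
    e = 0.7 * e + random.gauss(0.0, 0.3)
    y.append(1.0 + 2.0 * xi + e)
rho_hat, b0_hat, b1_hat = cochrane_orcutt_step(x, y)
```

Here rho_hat should land near the true 0.7 and b1_hat near 2; the numbers will not match SAS output, since AUTOREG estimates the joint model rather than iterating in this two-step fashion.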
88 The SAS commands and outputs are as follows:

proc autoreg data=bangladesh;
model lna=lnps lnpj/nlag=1;
run;

The AUTOREG Procedure

Estimates of Autoregressive Parameters
                  Standard
Lag  Coefficient     Error   t Value

Yule-Walker Estimates
SSE              DFE               30
MSE              Root MSE
SBC              AIC
MAE              AICC
MAPE             HQC
Durbin-Watson    Regress R-Square
                 Total R-Square

Parameter Estimates
                         Standard             Approx
Variable   DF  Estimate     Error   t Value   Pr > |t|
Intercept
lnps                                           <.0001
lnpj

45 / 57
89 The DW value has increased, resulting in non-rejection of H_0: ρ = 0. The coefficient β_2 changes from being insignificant (under O.L.S.) to significant. The estimated equation is

lnÂ_t = b_0 + b_1 lnPS_t + b_2 lnPJ_t + ρ̂ e_{t−1}
      (2.3813)  (0.1902)    (0.3465)   (       )

The forecast of lnA_t thus depends on e_{t−1}, the error in the last period. For out-of-sample forecasts of more than one period ahead, e_{t−1} is unknown and is set to zero, since E(e_t) = 0. 46 / 57
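The forecast recursion implied by a fitted equation of this form can be sketched as follows, with purely hypothetical coefficient values. The conditional expectation ρ^h · e_n used here decays geometrically toward the zero value that the slides substitute beyond one step ahead:

```python
def ar1_forecasts(b0, b1, rho, x_future, e_last):
    """h-step forecasts from y_t = b0 + b1*x_t + e_t with
    e_t = rho*e_{t-1} + v_t.  The best guess of the future error is
    E(e_{t+h} | e_t) = rho**h * e_t, which shrinks toward zero as h grows."""
    preds, e = [], e_last
    for x in x_future:
        e = rho * e                  # advance the error forecast one period
        preds.append(b0 + b1 * x + e)
    return preds

# Hypothetical values, purely for illustration:
print(ar1_forecasts(1.0, 2.0, 0.5, [1.0, 1.0, 1.0], 0.4))
# -> [3.2, 3.1, 3.05]
```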
93 Seemingly Unrelated Regression Equations

Sometimes different regression equations may be connected not because they interact, but because their error terms are related. For example:
- In demand studies, a system of demand equations is specified to explain consumption of different commodities; potential correlations of the disturbances across the equations arise because a shock affecting the demand for one good may spill over and affect the demand for other goods.
- Firms in the same branch of industry are likely subject to similar disturbances.
47 / 57
95 The seemingly unrelated regression equations (S.U.R.E.) model pools the observations of the different regressions together and allows for contemporaneous correlations of the disturbances across the different equations. S.U.R.E. usually (but not always) leads to improved precision over O.L.S. applied to each equation separately. The equations are seemingly unrelated because they are related only through the disturbance terms. 48 / 57
98 A standard two-equation S.U.R.E. model may be expressed as:

Y_t = β_0 + β_1 X_1t + β_2 X_2t + β_3 X_3t + ... + β_k X_kt + ɛ_t
W_t = γ_0 + γ_1 Z_1t + γ_2 Z_2t + γ_3 Z_3t + ... + γ_k Z_kt + u_t,

where

E(ɛ_t) = E(u_t) = 0, var(ɛ_t) = σ_1^2, var(u_t) = σ_2^2,
cov(ɛ_t, ɛ_{t−j}) = cov(u_t, u_{t−j}) = 0 for j ≠ 0,
cov(ɛ_t, u_t) ≠ 0 and cov(ɛ_t, u_{t−j}) = 0 for j ≠ 0. 49 / 57
99 Thus, the standard S.U.R.E. model rules out serial correlation and heteroscedasticity within each equation, as well as serial correlation across equations, but permits contemporaneous correlations across the equations. The standard S.U.R.E. model has been extended to allow for these non-standard features, as well as different numbers of explanatory variables across the equations, but such extensions are beyond the scope of our discussion here. 50 / 57
101 To illustrate the S.U.R.E. technique, consider two firms, General Electric and Westinghouse, indexed by 1 and 2 respectively, and the following economic model describing the gross investment of the two firms:

I_1t = β_0 + β_1 V_1t + β_2 K_1t + ɛ_t
I_2t = γ_0 + γ_1 V_2t + γ_2 K_2t + u_t,   t = 1, ..., 20,

where I, V and K are, respectively, annual gross investment, the stock market value of the firm, and the capital stock of the firm at the beginning of the year. The data are taken from Griffiths, Hill and Judge (1993). 51 / 57
102 As General Electric and Westinghouse are in similar lines of business, the unexplained disturbances that affect the two firms' investment decisions may be contemporaneously correlated (i.e., the unexplained factor that affects General Electric's investment at time t may be correlated with a similar factor that affects Westinghouse's at the same time). O.L.S. estimation of the individual regressions cannot capture this correlation. We therefore pool the 40 observations and treat the model as a two-equation system. 52 / 57
103 The SAS commands for S.U.R.E. estimation of the above model are as follows. PROC SYSLIN first produces the O.L.S. results from estimating the equations separately, followed by the S.U.R.E. results from joint estimation:

data invest;
input i1 v1 k1 i2 v2 k2;
cards;
...;
proc syslin sur;
model i1=v1 k1;
model i2=v2 k2;
run;

53 / 57
104 The SAS System

SYSLIN Procedure
Ordinary Least Squares Estimation

Model: I1
Dependent variable: I1

Analysis of Variance
                     Sum of      Mean
Source      DF      Squares    Square   F Value   Prob>F
Model
Error
C Total

Root MSE         R-Square
Dep Mean         Adj R-Sq
C.V.

Parameter Estimates
                 Parameter   Standard   T for H0:
Variable    DF    Estimate      Error   Parameter=0   Prob > |T|
INTERCEP
V1
K1

SYSLIN Procedure
Ordinary Least Squares Estimation

Model: I2
Dependent variable: I2

Analysis of Variance
                     Sum of      Mean
Source      DF      Squares    Square   F Value   Prob>F
Model
Error
C Total

Root MSE         R-Square
Dep Mean         Adj R-Sq
C.V.

Parameter Estimates
                 Parameter   Standard   T for H0:
Variable    DF    Estimate      Error   Parameter=0   Prob > |T|
INTERCEP
V2
K2
105 SYSLIN Procedure
Seemingly Unrelated Regression Estimation

Cross Model Correlation
Corr        I1        I2
I1
I2

Model: I1
Dependent variable: I1

Parameter Estimates
                 Parameter   Standard   T for H0:
Variable    DF    Estimate      Error   Parameter=0   Prob > |T|
INTERCEP
V1
K1

Model: I2
Dependent variable: I2

Parameter Estimates
                 Parameter   Standard   T for H0:
Variable    DF    Estimate      Error   Parameter=0   Prob > |T|
INTERCEP
V2
K2
106 Hence O.L.S. estimation produces

Î_1t = b_0 + b_1 V_1t + b_2 K_1t
      (   )   (   )     (   )
Î_2t = g_0 + g_1 V_2t + g_2 K_2t
      (   )   (   )     (   )

whereas S.U.R.E. estimation yields

Î_1t = b_0* + b_1* V_1t + b_2* K_1t
      (   )    (   )      (   )
Î_2t = g_0* + g_1* V_2t + g_2* K_2t
      (   )    (   )      (   )

(standard errors in parentheses). 56 / 57
107 S.U.R.E. estimation results in smaller standard errors of the estimates and hence more precise estimates of the coefficients. S.U.R.E. estimation will result in no efficiency gain over O.L.S. if
1. cov(ɛ_t, u_t) = 0, or
2. the equations contain identical explanatory variables, e.g., V_1t = V_2t and K_1t = K_2t for all t.

In our example, the O.L.S. residuals from the two equations have a contemporaneous correlation reported in the cross-model correlation matrix of the SYSLIN output. It can be tested whether the disturbances are indeed correlated. 57 / 57
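That test can be sketched from first principles: run O.L.S. equation by equation, correlate the two residual series, and compare the Breusch-Pagan LM statistic λ = n·r² against a χ²(1) critical value (3.84 at the 5% level). The data below are made up for illustration and are not the Griffiths-Hill-Judge investment data:

```python
import random

def ols(X, y):
    """OLS via the normal equations (X'X) b = X'y, Gaussian elimination."""
    k = len(X[0])
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivot
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (M[r][k] - sum(M[r][j] * b[j] for j in range(r + 1, k))) / M[r][r]
    return b

def residuals(X, y):
    b = ols(X, y)
    return [yi - sum(bi * xi for bi, xi in zip(b, row)) for row, yi in zip(X, y)]

def corr(a, b):
    n = len(a)
    ca = [x - sum(a) / n for x in a]
    cb = [x - sum(b) / n for x in b]
    return (sum(x * y for x, y in zip(ca, cb))
            / (sum(x * x for x in ca) * sum(y * y for y in cb)) ** 0.5)

# Made-up two-equation system sharing a common shock in the disturbances.
random.seed(1)
n = 20
X1 = [[1.0, random.random(), random.random()] for _ in range(n)]
X2 = [[1.0, random.random(), random.random()] for _ in range(n)]
common = [random.gauss(0.0, 1.0) for _ in range(n)]
y1 = [2 + 3 * r[1] + r[2] + c + random.gauss(0, 0.2) for r, c in zip(X1, common)]
y2 = [1 - 2 * r[1] + 4 * r[2] + c + random.gauss(0, 0.2) for r, c in zip(X2, common)]

r12 = corr(residuals(X1, y1), residuals(X2, y2))
lm = n * r12 ** 2   # Breusch-Pagan LM statistic; compare with 3.84
```

With a strong common shock built into both disturbances, λ should comfortably exceed 3.84, pointing toward a genuine efficiency gain from S.U.R.E. over equation-by-equation O.L.S.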
PhD/MA Econometrics Examination January, 2015 Total Time: 8 hours MA students are required to answer from A and B. PhD students are required to answer from A, B, and C. PART A (Answer any TWO from Part
More information7. Integrated Processes
7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider
More informationMultiple Regression Analysis
Chapter 4 Multiple Regression Analysis The simple linear regression covered in Chapter 2 can be generalized to include more than one variable. Multiple regression analysis is an extension of the simple
More informationINTRODUCTORY REGRESSION ANALYSIS
;»»>? INTRODUCTORY REGRESSION ANALYSIS With Computer Application for Business and Economics Allen Webster Routledge Taylor & Francis Croup NEW YORK AND LONDON TABLE OF CONTENT IN DETAIL INTRODUCTORY REGRESSION
More informationRecent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data
Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More informationEcon 510 B. Brown Spring 2014 Final Exam Answers
Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity
More informationStat 500 Midterm 2 12 November 2009 page 0 of 11
Stat 500 Midterm 2 12 November 2009 page 0 of 11 Please put your name on the back of your answer book. Do NOT put it on the front. Thanks. Do not start until I tell you to. The exam is closed book, closed
More informationLECTURE 10: MORE ON RANDOM PROCESSES
LECTURE 10: MORE ON RANDOM PROCESSES AND SERIAL CORRELATION 2 Classification of random processes (cont d) stationary vs. non-stationary processes stationary = distribution does not change over time more
More informationDiagnostics of Linear Regression
Diagnostics of Linear Regression Junhui Qian October 7, 14 The Objectives After estimating a model, we should always perform diagnostics on the model. In particular, we should check whether the assumptions
More information7. Integrated Processes
7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider
More informationEconometrics Homework 4 Solutions
Econometrics Homework 4 Solutions Question 1 (a) General sources of problem: measurement error in regressors, omitted variables that are correlated to the regressors, and simultaneous equation (reverse
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationRegression Analysis. BUS 735: Business Decision Making and Research
Regression Analysis BUS 735: Business Decision Making and Research 1 Goals and Agenda Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn
More informationPlease discuss each of the 3 problems on a separate sheet of paper, not just on a separate page!
Econometrics - Exam May 11, 2011 1 Exam Please discuss each of the 3 problems on a separate sheet of paper, not just on a separate page! Problem 1: (15 points) A researcher has data for the year 2000 from
More informationRef.: Spring SOS3003 Applied data analysis for social science Lecture note
SOS3003 Applied data analysis for social science Lecture note 05-2010 Erling Berge Department of sociology and political science NTNU Spring 2010 Erling Berge 2010 1 Literature Regression criticism I Hamilton
More informationDr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)
Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are
More informationUsing EViews Vox Principles of Econometrics, Third Edition
Using EViews Vox Principles of Econometrics, Third Edition WILLIAM E. GRIFFITHS University of Melbourne R. CARTER HILL Louisiana State University GUAY С LIM University of Melbourne JOHN WILEY & SONS, INC
More informationMultiple Regression Methods
Chapter 1: Multiple Regression Methods Hildebrand, Ott and Gray Basic Statistical Ideas for Managers Second Edition 1 Learning Objectives for Ch. 1 The Multiple Linear Regression Model How to interpret
More informationARDL Cointegration Tests for Beginner
ARDL Cointegration Tests for Beginner Tuck Cheong TANG Department of Economics, Faculty of Economics & Administration University of Malaya Email: tangtuckcheong@um.edu.my DURATION: 3 HOURS On completing
More informationLeast Squares Estimation-Finite-Sample Properties
Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions
More information1 Linear Regression Analysis The Mincer Wage Equation Data Econometric Model Estimation... 11
Econ 495 - Econometric Review 1 Contents 1 Linear Regression Analysis 4 1.1 The Mincer Wage Equation................. 4 1.2 Data............................. 6 1.3 Econometric Model.....................
More informationChapter 16. Simple Linear Regression and Correlation
Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationPanel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63
1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS17/18 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:
More informationTypes of economic data
Types of economic data Time series data Cross-sectional data Panel data 1 1-2 1-3 1-4 1-5 The distinction between qualitative and quantitative data The previous data sets can be used to illustrate an important
More informationECON2228 Notes 10. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 48
ECON2228 Notes 10 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 10 2014 2015 1 / 48 Serial correlation and heteroskedasticity in time series regressions Chapter 12:
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More informationECON2228 Notes 10. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 54
ECON2228 Notes 10 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 10 2014 2015 1 / 54 erial correlation and heteroskedasticity in time series regressions Chapter 12:
More information