Advanced Quantitative Methods: specification and multicollinearity


1 Advanced Quantitative Methods: Specification and multicollinearity Johan A. Elkink University College Dublin 16 February 2011

2 Outline 1 Specification 2 Outliers, leverage, influence 3 Multicollinearity 4 Heteroscedasticity

3 Outline 1 Specification: Omitted variables; Nonlinearities 2 Outliers, leverage, influence 3 Multicollinearity 4 Heteroscedasticity

7 Omitting a relevant independent variable For omitted variable z:
β̂^OLS will be biased iff cor(z, x) ≠ 0
the intercept will be biased iff E(z) ≠ 0
σ̂² will be biased upward ⇒ V̂(β̂^OLS) will be biased upward
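
A small simulation sketch of this bias in R (our own illustration with made-up data; z is the omitted variable, correlated with x):
set.seed(1)
n <- 1000
z <- rnorm(n)
x <- 0.5 * z + rnorm(n)            # cor(z, x) != 0
y <- 1 + 2 * x + 3 * z + rnorm(n)  # true slope on x is 2
coef(lm(y ~ x + z))                # z included: slope close to 2
coef(lm(y ~ x))                    # z omitted: slope biased upward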

8 Omitting a relevant variable z: graphical intuition (Ballentine Venn diagram of y, x and z)
If z is omitted, Areas II and III reflect the information used to estimate β̂_x^OLS; if z were included, only Area II would be used.
Only Area I is used to estimate σ̂², except when z is excluded, in which case Area IV is also used.
If x is orthogonal to z, there is no Area III and the bias disappears.

10 Including an irrelevant independent variable For unnecessarily included variable z:
β̂_x^OLS and V̂(β̂_x^OLS) will remain unbiased
β̂_x^OLS will be less efficient (its variance, and hence MSE, increases)

11 Adding an irrelevant variable: graphical intuition (Ballentine Venn diagram of y, x and z)
Area II reflects variation in y due entirely to x, so β̂_x^OLS is unbiased.
Since Area II < Area II + III, V(β̂_x^OLS) increases.
Area I, used to estimate σ̂², is unaffected, so V̂(β̂^OLS) remains unbiased.
If z is orthogonal to x, there is no Area III and thus no efficiency loss.

12 Testing restrictions
H0: y = X(1) β1 + ε, with X(1) an n × k matrix
H1: y = X(1) β1 + X(2) β2 + ε, with X(2) an n × r matrix
USSR: unrestricted sum of squared residuals
RSSR: restricted sum of squared residuals
F_β2 = [(RSSR − USSR)/r] / [USSR/(n − k − r)] ~ F(r, n − k − r)
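
A minimal R sketch of this restriction test, assuming a hypothetical data frame d with regressors x1, x2 and r = 2 candidate additions z1, z2:
m0 <- lm(y ~ x1 + x2, data=d)            # restricted model (H0)
m1 <- lm(y ~ x1 + x2 + z1 + z2, data=d)  # unrestricted model (H1)
anova(m0, m1)                            # F-test of the restrictions
# the same F-statistic by hand:
ussr <- sum(residuals(m1)^2)
rssr <- sum(residuals(m0)^2)
r <- 2
F <- ((rssr - ussr) / r) / (ussr / m1$df.residual)
1 - pf(F, r, m1$df.residual)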

14 Non-linearity If there is non-linearity in the variables, but not in the parameters, there is no problem: e.g.
y_i = β0 + β1 x_i + β2 x_i² + ε_i
can be estimated with OLS. If there are other non-linearities, sometimes the equation can be transformed, e.g.
y_i = β0 x_i^β1 ε_i
log(y_i) = log(β0) + β1 log(x_i) + log(ε_i)
y*_i = β0* + β1 x*_i + ε*_i

18 Functional forms for additional non-linear transformations
log-linear: as in the previous example
semi-log has two forms:
y_i = β0 + β1 log(x_i), where β1 is the change in y due to a percentage change in x
log(y_i) = β0 + β1 x_i, where β1 is the percentage change in y due to a unit change in x
inverse or reciprocal: y_i = β0 + β1 (1/x_i)
polynomial: y_i = β0 + β1 x_i + β2 x_i²
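
Each of these forms can be fit with lm() after transforming the variables; a sketch assuming a hypothetical data frame d containing y and x (with positive values where logs are taken):
lm(log(y) ~ log(x), data=d)   # log-linear (log-log)
lm(y ~ log(x), data=d)        # semi-log, first form
lm(log(y) ~ x, data=d)        # semi-log, second form
lm(y ~ I(1/x), data=d)        # inverse / reciprocal
lm(y ~ x + I(x^2), data=d)    # polynomial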

19 Exercise Open the mnrf.csv data file. Using plots and diagnostics, find the right specification for f = f(m, n, r). Of course, start with:
f_i = β0 + β1 m_i + β2 n_i + β3 r_i + ε_i

20 Outline 1 Specification 2 Outliers, leverage, influence 3 Multicollinearity 4 Heteroscedasticity

23 Hat matrix
ŷ = Xβ̂^OLS = X(X'X)⁻¹X'y = Hy
H is called the hat matrix (it puts a hat on y), or sometimes the prediction matrix P.
var(ŷ) = σ²H
var(e) = σ²(I − H)

24 Hat matrix (figure: geometric view, with ŷ = Hy the projection of y onto the plane spanned by x1 and x2, and e = (I − H)y orthogonal to it)

26 Leverage The elements on the diagonal of H are called the leverage of each case: the higher the leverage, the more this particular case contributed to the predicted dependent variable. For the remainder we will use h_i = H_ii = x_i(X'X)⁻¹x_i', so h_i represents the leverage of observation i (x_i is the row vector of independent variable values for case i).
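
In R the leverages can be obtained directly, or computed from the definition; a minimal sketch for any fitted lm model m:
h <- hatvalues(m)                       # diagonal of H
X <- model.matrix(m)
H <- X %*% solve(t(X) %*% X) %*% t(X)   # hat matrix from the definition
all.equal(unname(diag(H)), unname(h))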

27 Studentized and standardized residuals The standardized residual is:
r_i = e_i / √(s²(1 − h_i))
The studentized residual uses the root mean squared error of the regression with the ith observation removed:
t_i = e_i / √(s²_(−i)(1 − h_i)),
with s²_(−i) representing s² for the model without observation i. Both standardized and studentized residuals are attempts to adjust residuals by their standard errors, since var(e_i) = σ²(1 − h_i).
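
R computes both directly; a minimal sketch for a fitted lm model m:
rstandard(m)  # standardized residuals r_i
rstudent(m)   # studentized residuals t_i (leave-one-out)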

28 Outliers An outlier is an observation with a large residual, i.e. a point that lies far from the regression line.

29 Example model
sr_i = β0 + β1 pop15_i + β2 pop75_i + β3 dpi_i + β4 ddpi_i + ε_i
sr: savings rate (personal savings divided by disposable income)
pop15: percent population under age 15
pop75: percent population over age 75
dpi: per capita disposable income in dollars
ddpi: percent growth rate of dpi

30 Example model
summary(m <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data=savings))
(coefficient table with estimates, standard errors, t values and p-values not reproduced in this transcription)

31 Residuals vs fitted
plot(m, which=1, bty="n", pch=19)
(figure: residuals against fitted values; labelled cases: Chile, Philippines, Zambia)

32 Normal Q-Q plot
plot(m, which=2, bty="n", pch=19)
(figure: normal Q-Q plot of the standardized residuals; labelled cases: Chile, Philippines, Zambia)

33 Scale-Location plot
plot(m, which=3, bty="n", pch=19)
(figure: scale-location plot of the standardized residuals; labelled cases: Zambia, Philippines, Chile)

34 Leverage
h <- influence(m)$hat
h <- h[order(h)]
plot(h, type="n", bty="n", xlab="", ylab="leverage")
text(h, label=names(h), cex=.7, pos=2)
points(h, col="red", pch=19)
(figure: the 50 countries plotted in increasing order of leverage)

35 Residuals vs leverage
plot(m, which=5, bty="n", pch=19)
(figure: standardized residuals against leverage with Cook's distance contours; labelled cases: Zambia, Japan, Libya)

36 Leverage and influence A point with high leverage is located far from the other points. A high leverage point that strongly influences the regression line is called an influential point.

37 Outlier, low leverage, low influence (illustrative scatter plot with regression line)

38 High leverage, low influence (illustrative scatter plot with regression line)

39 High leverage, high influence (illustrative scatter plot with regression line)

41 Cook's Distance
D_i = (β̂_(−i)^OLS − β̂^OLS)' X'X (β̂_(−i)^OLS − β̂^OLS) / (k s²)
= [e_i / (s√(1 − h_i))]² · h_i / (k(1 − h_i))
= (t_i²/k) · var(ŷ_i)/var(e_i)
~ F(k, n − k)
The F-test here refers to whether β̂^OLS would be significantly different if observation i were to be removed (H0: β = β_(−i)) (Cook 1979: 169).

42 Cook's Distance
D_i = (t_i²/k) · var(ŷ_i)/var(e_i)
"t_i² is a measure of the degree to which the ith observation can be considered as an outlier from the assumed model. The ratios var(ŷ_i)/var(e_i) measure the relative sensitivity of the estimate, β̂^OLS, to potential outlying values at each data point." (Cook 1977: 16)
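
In R, cooks.distance() returns D_i directly; as a sketch, it can be checked against the formula above for a fitted lm model m with k estimated coefficients:
d <- cooks.distance(m)
k <- length(coef(m))
h <- hatvalues(m)
r <- rstandard(m)
all.equal(unname(d), unname(r^2 * h / (k * (1 - h))))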

43 Cook's Distance plot
plot(m, which=4, pch=19, bty="n")
(figure: Cook's distance per observation; labelled cases: Libya, Japan, Zambia)

44 Cook's Distance vs leverage
plot(m, which=6, pch=19, bty="n")
(figure: Cook's distance against h_ii/(1 − h_ii); labelled cases: Libya, Zambia, Japan)

45 Examine the outliers
> savings[c("Japan", "Zambia", "Libya", "United States", "Ireland"), ]
(rows of sr, pop15, pop75, dpi and ddpi for these countries; values not reproduced in this transcription)

49 What to do with outliers? Options:
1 Ignore the problem
2 Investigate why the data are outliers: what makes them unusual?
3 Consider respecifying the model, either by transforming a variable or by including an additional variable (but beware of overfitting)
4 Consider a variant of robust regression that downweights outliers

50 Exercise
library(faraway)
data(teengamb)
m <- lm(gamble ~ sex + status + income + verbal, data=teengamb)
Check for leverage, outliers, influential points and nonlinearities.

51 Outline 1 Specification 2 Outliers, leverage, influence 3 Multicollinearity 4 Heteroscedasticity

54 Collinearity When some variables are linear combinations of others we have exact (or perfect) collinearity, and there is no unique least squares estimate of β: (X'X)⁻¹ will not exist if r(X) < k. When X variables are highly correlated, we have multicollinearity. Detecting multicollinearity:
look at the correlation matrix of predictors for pairwise correlations
regress x_j on X_(−j) to produce R²_j, and look for high values (close to 1.0)
examine the eigenvalues of X'X
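
The three diagnostics in this list translate directly into R; a minimal sketch for a fitted lm model m:
X <- model.matrix(m)[, -1]            # predictor matrix, intercept dropped
cor(X)                                # pairwise correlations
sapply(1:ncol(X), function(j)         # R^2_j from regressing x_j on the rest
  summary(lm(X[, j] ~ X[, -j]))$r.squared)
eigen(crossprod(X))$values            # eigenvalues of X'X; values near 0 signal collinearity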

55 The extent to which multicollinearity is a problem is debatable. The issue is comparable to that of sample size: if n is too small, we have difficulty picking up effects even if they really exist; the same holds for variables that are highly multicollinear, making it difficult to separate their effects on y.

56 However, high multicollinearity does cause some problems:
small changes in the data can lead to large changes in the estimates
high standard errors on individual coefficients despite joint significance
coefficients may have the wrong sign or implausible magnitudes
(Greene 2002: 57)

57 (Ballentine Venn diagram of y, x and z with x and z overlapping heavily: under high multicollinearity little independent variation in x or z remains to estimate their separate effects)

61 Variance of β̂^OLS
var(β̂_k^OLS) = σ² / [(1 − R²_k) Σᵢ (x_ik − x̄_k)²]
σ²: all else equal, the better the fit, the lower the variance
(1 − R²_k): all else equal, the lower the R² from regressing the kth independent variable on all other independent variables, the lower the variance
Σᵢ (x_ik − x̄_k)²: all else equal, the more variation in x, the lower the variance
(Greene 2002: 57)

63 Variance Inflation Factor
var(β̂_k^OLS) = σ² / [(1 − R²_k) Σᵢ (x_ik − x̄_k)²]
VIF_k = 1 / (1 − R²_k), thus VIF_k shows the increase in var(β̂_k^OLS) due to the variable being collinear with other independent variables.
library(faraway)
vif(lm(...))
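
vif() is just 1/(1 − R²_k) applied to each regressor in turn; a manual sketch for a hypothetical model with regressors x1, x2, x3 in data frame d:
r2 <- summary(lm(x1 ~ x2 + x3, data=d))$r.squared  # R^2_k for x1
1 / (1 - r2)                                       # VIF for x1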

64 Multicollinearity: solutions
Check for coding or logical mistakes (esp. in cases of perfect multicollinearity)
Increase n
Remove one of the collinear variables (apparently not adding much)
Combine multiple variables into indices or underlying dimensions
Formalise the relationship

65 Exercise
library(faraway)
data(longley)
Regress Employed on the other variables in the longley data and investigate for multicollinearity and other issues.

66 Outline 1 Specification 2 Outliers, leverage, influence 3 Multicollinearity 4 Heteroscedasticity

67 Homoscedasticity (figure: scatter plot with constant error variance around the regression line)

68 (figure: scatter plot with non-constant error variance, illustrating heteroscedasticity)

70 Regression disturbances whose variances are not constant across observations are heteroscedastic. Under heteroscedasticity, the OLS estimators remain unbiased and consistent, but are no longer BLUE or asymptotically efficient. (Thomas 1985, 94)

78 Causes of heteroscedasticity
More variation for larger sizes (e.g. profits of firms vary more for larger firms)
More variation across different groups in the sample
Learning effects in time series
Variation in data collection quality (e.g. historical data)
Turbulence after shocks in time series (e.g. financial markets)
Omitted variable
Wrong functional form
Aggregation with varying sizes of populations etc.

80 Heteroscedasticity: aggregation example Imagine we have the following model:
y_ij = β0 + β1 x_ij + ε_ij,
whereby i indicates the individual and j the region of this individual, with n_j individuals per region. Say we only have regional-level data, ȳ_j = (1/n_j) Σᵢ y_ij and x̄_j = (1/n_j) Σᵢ x_ij:
ȳ_j = β0 + β1 x̄_j + ε̄_j, where ε̄_j = (1/n_j) Σᵢ ε_ij.
(Thomas 1985, 98)

81 Heteroscedasticity: aggregation example
ȳ_j = β0 + β1 x̄_j + ε̄_j
E(ε̄_j) = 0
E(ε̄_j²) = (1/n_j²) Σᵢ E(ε_ij²) = (n_j/n_j²) σ² = σ²/n_j
Therefore, var(ε̄_j) depends on n_j and thus varies across cases. (Judge et al 1985)

82 Heteroscedasticity: aggregation example
ȳ_j = β0 + β1 x̄_j + ε̄_j
In this case the fix is actually easy: since var(ε̄_j) = σ²/n_j, var(√n_j ε̄_j) = σ², so the heteroscedasticity can be avoided by transforming the variables:
√n_j ȳ_j = β0 √n_j + β1 √n_j x̄_j + ε*_j
(Thomas 1985, 98)
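
In R this transformation is equivalent to weighted least squares with weights n_j; a minimal sketch, assuming a data frame reg with hypothetical columns ybar, xbar and group sizes nj:
m_wls <- lm(ybar ~ xbar, data=reg, weights=nj)  # minimizes sum of nj * e_j^2
# the same fit via the explicit sqrt(nj) transformation (sqrt(nj) replaces the intercept column):
m_tr <- lm(I(sqrt(nj) * ybar) ~ 0 + sqrt(nj) + I(sqrt(nj) * xbar), data=reg)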

85 Since OLS is no longer BLUE or asymptotically efficient,
other linear unbiased estimators exist which have smaller sampling variances;
other consistent estimators exist which collapse more quickly to the true values as n increases;
we can no longer trust hypothesis tests, because var(β̂^OLS) is biased:
if cov(x_i², σ_i²) > 0, then var(β̂^OLS) is underestimated
if cov(x_i², σ_i²) = 0, then there is no bias in var(β̂^OLS)
if cov(x_i², σ_i²) < 0, then var(β̂^OLS) is overestimated (inefficient)
(Thomas 1985, 94-95; Judge et al 1985, 422)

86 Normally we assume:
E(εε'|X) = σ²I = diag(σ², σ², ..., σ²)
For the heteroscedastic model we have:
E(εε'|X) = Ω = diag(ω1, ω2, ..., ωn) = diag(σ1², σ2², ..., σn²)

87 Deriving var(β̂^OLS)
var(β̂^OLS) = E[(β̂^OLS − β)(β̂^OLS − β)']
= E[((X'X)⁻¹X'ε)((X'X)⁻¹X'ε)']
= E[(X'X)⁻¹X'εε'X(X'X)⁻¹]
= (X'X)⁻¹X'E[εε']X(X'X)⁻¹
= (X'X)⁻¹X'σ²IX(X'X)⁻¹
= σ²(X'X)⁻¹X'X(X'X)⁻¹
= σ²(X'X)⁻¹

89 Deriving var(β̂^OLS) under heteroscedasticity
var(β̂^OLS) = E[(β̂^OLS − β)(β̂^OLS − β)']
= E[((X'X)⁻¹X'ε)((X'X)⁻¹X'ε)']
= E[(X'X)⁻¹X'εε'X(X'X)⁻¹]
= (X'X)⁻¹X'E[εε']X(X'X)⁻¹
= (X'X)⁻¹X'ΩX(X'X)⁻¹,
which cannot be simplified further and requires knowledge of Ω to estimate.

90 Efficiency Because observations with low variance will contain more information about the parameters than observations with high variance, an estimator which weighs all observations equally, like OLS, will not be the most efficient. (Davidson & MacKinnon 1999: 197)

92 Heteroscedasticity: solution When the type of heteroscedasticity is known, we can often transform the data. An example is the multiplication by √n_j of each term in the equation for the group means regression. Another example: if var(ε_i) = σ²x_i1², then var(ε_i/x_i1) = σ², so:
y_i = β0 + β1 x_i1 + β2 x_i2 + ε_i
y_i/x_i1 = β0 (1/x_i1) + β1 + β2 (x_i2/x_i1) + ε_i/x_i1
y*_i = β1 + β0 x*_i1 + β2 x*_i2 + ε*_i
(note the interpretation of the intercept)
(Thomas 1985, 98)

93 Generalized Least Squares More in general, if var(ε_i) = σ²λ_i, with λ_i being some function of X_i, then we can always transform our model by dividing all variables by √λ_i. This is referred to as generalized least squares (GLS). (It is a generalization because, for λ_i = 1, we have OLS.) With GLS, observations with lower σ_i² are weighted more heavily. (Thomas 1985, 98; Judge et al 1985, 421)
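
A minimal sketch of this in R when λ_i is known, assuming a hypothetical column lambda in data frame d; lm()'s weights argument expects 1/λ_i:
m_gls <- lm(y ~ x1 + x2, data=d, weights=1/lambda)
# or via the nlme package, modelling the variance as proportional to lambda:
library(nlme)
m_gls2 <- gls(y ~ x1 + x2, data=d, weights=varFixed(~lambda))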

95 Estimated Generalized Least Squares To perform GLS estimation, σ_i² has to be known. In some cases we can estimate σ_i², in which case we talk of estimated generalized least squares (EGLS). To estimate a model with minimal restrictions on σ_i², we would be estimating a model with n + k unknown parameters, i.e. the number of parameters to be estimated increases as n increases, and the estimator is by definition inconsistent. (Judge et al 1985, 423)

96 Estimated Generalized Least Squares Special cases where EGLS estimation might be possible:
σ_i² constant within subgroups
σ_i² = (z_i'α)², i.e. σ_i is a linear function of exogenous variables
σ_i² = z_i'α, i.e. σ_i² is a linear function of exogenous variables
σ_i² = σ²(x_i'β)^p, i.e. var(y) is proportional to a power of its expectation
σ_i² = e^(z_i'α), multiplicative heteroscedasticity
ε_t = v_t √(α0 + α1 ε²_(t−1)), autoregressive conditional heteroscedasticity (ARCH)
See Judge et al (1985, 424ff) for an overview of estimators.

99 When the form of the heteroscedasticity is unknown, we can get consistent estimates of var(β̂^OLS) using a heteroscedasticity consistent covariance matrix (HCCM):
var(β̂^OLS) = (X'X)⁻¹X'ΩX(X'X)⁻¹
HCCM: estimate ω̂_ii = (e_i − 0)² = e_i², so that we have the variance estimator
var(β̂^OLS) = (X'X)⁻¹X'diag(e_i²)X(X'X)⁻¹
Since there are several variations, this estimator is called HC0 in the literature.

102 Residuals vs errors Note that:
h_ii = x_i(X'X)⁻¹x_i'
var(e_i) = σ²(1 − h_ii) ≤ σ², therefore var(e_i) underestimates σ²,
and even when the errors (ε) are homoscedastic, the residuals (e) are not.
So e_i², used in White's HC0, is, even though consistent, a biased estimator. The small-sample properties turn out not to be very good. (Long & Ervin 2000)

103 HCCM variations
HC0 = (X'X)⁻¹X'diag(e_i²)X(X'X)⁻¹
HC1 = n/(n − k) (X'X)⁻¹X'diag(e_i²)X(X'X)⁻¹ = n/(n − k) HC0
HC2 = (X'X)⁻¹X'diag(e_i²/(1 − h_ii))X(X'X)⁻¹
HC3 = (X'X)⁻¹X'diag(e_i²/(1 − h_ii)²)X(X'X)⁻¹
Based on Monte Carlo analyses, HC3 is best in small samples. (Long & Ervin 2000)

104 HCCM in R
library(car)
m <- lm(...)
summary(m)
vcov <- hccm(m, type="hc3")
sqrt(diag(vcov))
(See notes for manual version.)
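
The manual version referred to might look like this: our own sketch of the HC3 formula above for a fitted lm model m, not necessarily the version in the notes:
X <- model.matrix(m)
e <- residuals(m)
h <- hatvalues(m)
XtXi <- solve(t(X) %*% X)
hc3 <- XtXi %*% t(X) %*% diag(e^2 / (1 - h)^2) %*% X %*% XtXi
sqrt(diag(hc3))  # HC3 standard errors, matching hccm(m, type="hc3")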

105 Another solution for dealing with heteroscedasticity is to bootstrap to acquire standard errors.
n <- nrow(data)
se <- NULL
for (i in 1:1000) {
  sel <- sample(1:n, n, replace=TRUE)
  mbs <- lm(y ~ x1 + x2, data=data[sel,])
  se <- rbind(se, sqrt(diag(vcov(mbs))))
}
colMeans(se)

106 Exercise Using the p4factor.csv data and the model
dem_i = β0 + β1 cwar_i + β3 laggdppc_i + β4 pnbdem_i + β6 egr_i + ε_i,
calculate the standard errors using
normal OLS estimation;
the four HCCM variations;
bootstrapping.

107 Residual plots: heteroscedasticity To detect heteroscedasticity (unequal variances), it is useful to plot:
residuals against fitted values
residuals against the dependent variable
residuals against the independent variable(s)
Usually the first one is sufficient to detect heteroscedasticity, and can simply be obtained with:
m <- lm(y ~ x)
plot(m)

108 Residual plots: heteroscedasticity (figures on slides 108 to 111: a scatter plot of y and plots of residuals(m) against the fitted values, the dependent variable and the independent variable)

112 Residual plots: homoscedasticity (figures on slides 112 to 115: the same four plots for a homoscedastic example)

116 Residual plots: heteroscedasticity (figures on slides 116 to 119: the same four plots for a second heteroscedastic example)

120 Known groups One way of testing for heteroscedasticity, if you expect that the variances might differ between two groups, is to run two separate regressions for the two groups:
[SSR1/(n1 − k)] / [SSR2/(n2 − k)] ~ F(n1 − k, n2 − k) under H0: σ1² = σ2²
(Wallace & Silver 1988: 267)

121 Known groups For example,
f <- dem ~ cwar + laggdppc + pnbdem
m1 <- lm(f, data=p4, subset=(bautlag == 1))
m2 <- lm(f, data=p4, subset=(bautlag == 0))
ssr1 <- sum(residuals(m1)^2)
ssr2 <- sum(residuals(m2)^2)
F <- (ssr1/m1$df) / (ssr2/m2$df)
1 - pf(F, m1$df, m2$df)

125 Breusch-Pagan test
σ_i² = f(z_i'α), with α = (α0, α*)', H0: α* = 0, H1: α* ≠ 0,
where f(z_i'α) is any function of z_i'α that does not depend on i.
So this includes scenarios where σ_i² = (z_i'α)², or σ_i = z_i'α, or σ_i = e^(z_i'α).
If Z contains dummies for groups, it also includes heteroscedasticity due to different variances across subgroups.
Assumes ε_i ~ N(0, σ_i²).

126 Breusch-Pagan test
η = q'Z(Z'Z)⁻¹Z'q / (2σ̂⁴) ~ χ²(s − 1) asymptotically, where
q_i = e_i² − σ̂²
σ̂² = (1/n) e'e
and Z an n × s matrix of exogenous variables.

127 Breusch-Pagan test
m <- lm(y ~ x)
r <- residuals(m)
n <- length(r)
s2 <- c(t(r) %*% r / n)
q <- r^2 - s2
Z <- cbind(1, x)
s <- dim(Z)[2]
eta <- (t(q) %*% Z %*% solve(t(Z) %*% Z) %*% t(Z) %*% q)
eta <- eta / (2 * s2^2)
p <- 1 - pchisq(eta, s - 1)
library(lmtest)
bptest(m, studentize=FALSE)

128 Breusch-Pagan test With more than one independent variable, an alternative approach is to look at an auxiliary regression:
e_i² = γ0 + γ1 ŷ_i² + v_i
If the model is homoscedastic and the variance is unrelated to ŷ, then H0: γ1 = 0. For this regression, nR² ~ χ²(1).
summary(lm(residuals(m)^2 ~ fitted(m)))$r.sq * n
(Thomas 1985, 96-97)

129 Breusch-Pagan test library(lmtest) m <- lm(y ~ x) bptest(m) bptest(m, ~ z1 + z2) By default, R assumes Z = X.

130 Goldfeld-Quandt test To run a Goldfeld-Quandt test:
1 Omit r central observations from the data
2 Run two separate regressions, one for the first (n − r)/2 observations and one for the last (n − r)/2
3 Calculate R = SSR1/SSR2
4 Perform the test based on R ~ F(½(n − r − 2k), ½(n − r − 2k))
(Judge et al 1985, 449)

131 Goldfeld-Quandt test
m <- lm(y ~ x)
m1 <- lm(y[1:20] ~ x[1:20])
m2 <- lm(y[(n-20):n] ~ x[(n-20):n])
ssr1 <- sum(residuals(m1)^2)
ssr2 <- sum(residuals(m2)^2)
R <- ssr1/ssr2
p1 <- 1 - pf(R, 18, 18)
p2 <- 1 - pf(1/R, 18, 18)
library(lmtest)
gqtest(m, fraction = n - 40)

133 Harrison-McCabe test For the above tests we always run several regressions, because even if the errors are uncorrelated, the residuals are not independent. If residuals are not independent, a ratio of subsets of these residuals does not have an F-distribution, while if we run separate regressions, the residuals will be independent (if the errors are) and such a ratio will have an F-distribution. Harrison & McCabe (1979) suggest that such a ratio of subsets of the residuals does have a β-distribution, however.
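
lmtest provides an implementation of this test; a minimal sketch for a fitted lm model m (the p-value is obtained by simulation by default):
library(lmtest)
hmctest(m)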

134 Harrison-McCabe test (figure: the HMC statistic plotted with its upper and lower bounds, E(b) and a 95% C.I. for the two-tailed test)

135 White s test One solution for dealing with heteroscedasticity is calculating White s heteroscedasticity-corrected standard errors. The reasoning behind the White test is very straightforward: if there is homoscedasticity, the corrected standard errors should not be significantly different from the normal ones.

137 White's test
1 Regress e_i² on x_i, all the variables in x_i squared, and all cross-products of x_i; e.g. if
y_i = β0 + β1 x_i1 + β2 x_i2 + ε_i
then run the regression
e_i² = γ0 + γ1 x_i1 + γ2 x_i2 + γ3 x²_i1 + γ4 x²_i2 + γ5 x_i1 x_i2 + v_i
and calculate R²;
2 Perform the test on the basis of nR² ~ χ²(p − 1), whereby p is the number of regressors in the auxiliary regression (6 in the example).
(Greene 2003, 222)

138 White s test in R m <- lm(y ~ x1 + x2) bptest(m, ~ x1 * x2 + I(x1^2) + I(x2^2)) I.e. there does not appear to be an implementation of White s test in R, but it is equivalent to the Breusch-Pagan test with the independent variables as discussed.

141 Heteroscedasticity tests In general,
many of these tests require some idea about the shape of the heteroscedasticity;
many of these tests have weak power, depending on the type of heteroscedasticity;
if there is good reason to suspect heteroscedasticity, it is generally better to just use some robust estimation rather than to test first: the tests are not reliable enough.

142 Exercise Using the p4factor.csv data in 1998 and the model
dem_i = β0 + β1 cwar_i + β3 laggdppc_i + β4 pnbdem_i + β6 egr_i + ε_i,
run tests for heteroscedasticity:
Breusch-Pagan;
Goldfeld-Quandt;
Harrison-McCabe;
White.

143 Exercise Test the following model for heteroscedasticity and calculate corrected standard errors:
library(lmtest)
data(unemployment)
myunemployment <- window(unemployment, start=1895, end=1956)
time <- 6:67
modelrea <- UN ~ log(m/p) + log(g) + log(x) + time
m <- lm(modelrea, data = myunemployment)


Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014 Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2014 1 / 13 Chapter 8: Polynomials & Interactions

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

Solow model: Convergence

Solow model: Convergence Solow model: Convergence Per capita income k(0)>k* Assume same s, δ, & n, but no technical progress y* k(0)=k* k(0) k Assume same s, δ, &

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within

More information

Volume 31, Issue 1. Mean-reverting behavior of consumption-income ratio in OECD countries: evidence from SURADF panel unit root tests

Volume 31, Issue 1. Mean-reverting behavior of consumption-income ratio in OECD countries: evidence from SURADF panel unit root tests Volume 3, Issue Mean-reverting behavior of consumption-income ratio in OECD countries: evidence from SURADF panel unit root tests Shu-Yi Liao Department of Applied Economics, National Chung sing University,

More information

ETH Zürich, October 25, 2010

ETH Zürich, October 25, 2010 Marcel Dettling Institute for Data nalysis and Process Design Zurich University of pplied Sciences marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling t t th h/ ttli ETH Zürich, October 25, 2010 1 Mortality

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Gov 2000: 9. Regression with Two Independent Variables

Gov 2000: 9. Regression with Two Independent Variables Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics

More information

POLSCI 702 Non-Normality and Heteroskedasticity

POLSCI 702 Non-Normality and Heteroskedasticity Goals of this Lecture POLSCI 702 Non-Normality and Heteroskedasticity Dave Armstrong University of Wisconsin Milwaukee Department of Political Science e: armstrod@uwm.edu w: www.quantoid.net/uwm702.html

More information

Testing Linear Restrictions: cont.

Testing Linear Restrictions: cont. Testing Linear Restrictions: cont. The F-statistic is closely connected with the R of the regression. In fact, if we are testing q linear restriction, can write the F-stastic as F = (R u R r)=q ( R u)=(n

More information

Reliability of inference (1 of 2 lectures)

Reliability of inference (1 of 2 lectures) Reliability of inference (1 of 2 lectures) Ragnar Nymoen University of Oslo 5 March 2013 1 / 19 This lecture (#13 and 14): I The optimality of the OLS estimators and tests depend on the assumptions of

More information

Linear Regression Models

Linear Regression Models Linear Regression Models November 13, 2018 1 / 89 1 Basic framework Model specification and assumptions Parameter estimation: least squares method Coefficient of determination R 2 Properties of the least

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information