Advanced Quantitative Methods: specification and multicollinearity


1 Advanced Quantitative Methods: Specification and multicollinearity Johan A. Elkink University College Dublin 16 February 2011

2 Outline 1 Specification 2 Outliers, leverage, influence 3 Multicollinearity 4 Heteroscedasticity

3 Outline 1 Specification: Omitted variables; Nonlinearities 2 Outliers, leverage, influence 3 Multicollinearity 4 Heteroscedasticity

7 Omitting a relevant independent variable For omitted variable z:
β̂^OLS will be biased iff cor(z, x) ≠ 0
the intercept will be biased iff E(z) ≠ 0
σ̂² will be biased upward ⇒ V̂(β̂^OLS) will be biased upward
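
A small simulation sketch of this bias in R (our own illustration with made-up data; z is the omitted variable, correlated with x):
set.seed(1)
n <- 1000
z <- rnorm(n)
x <- 0.5 * z + rnorm(n)            # cor(z, x) != 0
y <- 1 + 2 * x + 3 * z + rnorm(n)  # true slope on x is 2
coef(lm(y ~ x + z))                # z included: slope close to 2
coef(lm(y ~ x))                    # z omitted: slope biased upward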

8 Omitting a relevant variable z: graphical intuition (Ballentine Venn diagram of y, x and z)
If z is omitted, Areas II and III reflect the information used to estimate β̂_x^OLS; if z were included, only Area II would be used.
Only Area I is used to estimate σ̂², except when z is excluded, in which case Area IV is also used.
If x is orthogonal to z, there is no Area III and the bias disappears.

10 Including an irrelevant independent variable For unnecessarily included variable z:
β̂_x^OLS and V̂(β̂_x^OLS) will remain unbiased
β̂_x^OLS will be less efficient (its variance, and hence MSE, increases)

11 Adding an irrelevant variable: graphical intuition (Ballentine Venn diagram of y, x and z)
Area II reflects variation in y due entirely to x, so β̂_x^OLS is unbiased.
Since Area II < Area II + III, V(β̂_x^OLS) increases.
Area I, used to estimate σ̂², is unaffected, so V̂(β̂^OLS) remains unbiased.
If z is orthogonal to x, there is no Area III and thus no efficiency loss.

12 Testing restrictions
H0: y = X(1) β1 + ε, with X(1) an n × k matrix
H1: y = X(1) β1 + X(2) β2 + ε, with X(2) an n × r matrix
USSR: unrestricted sum of squared residuals
RSSR: restricted sum of squared residuals
F_β2 = [(RSSR − USSR)/r] / [USSR/(n − k − r)] ~ F(r, n − k − r)
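
A minimal R sketch of this restriction test, assuming a hypothetical data frame d with regressors x1, x2 and r = 2 candidate additions z1, z2:
m0 <- lm(y ~ x1 + x2, data=d)            # restricted model (H0)
m1 <- lm(y ~ x1 + x2 + z1 + z2, data=d)  # unrestricted model (H1)
anova(m0, m1)                            # F-test of the restrictions
# the same F-statistic by hand:
ussr <- sum(residuals(m1)^2)
rssr <- sum(residuals(m0)^2)
r <- 2
F <- ((rssr - ussr) / r) / (ussr / m1$df.residual)
1 - pf(F, r, m1$df.residual)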

14 Non-linearity If there is non-linearity in the variables, but not in the parameters, there is no problem: e.g.
y_i = β0 + β1 x_i + β2 x_i² + ε_i
can be estimated with OLS. If there are other non-linearities, sometimes the equation can be transformed, e.g.
y_i = β0 x_i^β1 ε_i
log(y_i) = log(β0) + β1 log(x_i) + log(ε_i)
y*_i = β0* + β1 x*_i + ε*_i

18 Functional forms for additional non-linear transformations
log-linear: as in the previous example
semi-log has two forms:
y_i = β0 + β1 log(x_i), where β1 is the change in y due to a percentage change in x
log(y_i) = β0 + β1 x_i, where β1 is the percentage change in y due to a unit change in x
inverse or reciprocal: y_i = β0 + β1 (1/x_i)
polynomial: y_i = β0 + β1 x_i + β2 x_i²
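
Each of these forms can be fit with lm() after transforming the variables; a sketch assuming a hypothetical data frame d containing y and x (with positive values where logs are taken):
lm(log(y) ~ log(x), data=d)   # log-linear (log-log)
lm(y ~ log(x), data=d)        # semi-log, first form
lm(log(y) ~ x, data=d)        # semi-log, second form
lm(y ~ I(1/x), data=d)        # inverse / reciprocal
lm(y ~ x + I(x^2), data=d)    # polynomial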

19 Exercise Open the mnrf.csv data file. Using plots and diagnostics, find the right specification for f = f(m, n, r). Of course, start with:
f_i = β0 + β1 m_i + β2 n_i + β3 r_i + ε_i

20 Outline 1 Specification 2 Outliers, leverage, influence 3 Multicollinearity 4 Heteroscedasticity

23 Hat matrix
ŷ = Xβ̂^OLS = X(X'X)⁻¹X'y = Hy
H is called the hat matrix (it puts a hat on y), or sometimes the prediction matrix P.
var(ŷ) = σ²H
var(e) = σ²(I − H)

24 Hat matrix (figure: geometric view, with ŷ = Hy the projection of y onto the plane spanned by x1 and x2, and e = (I − H)y orthogonal to it)

26 Leverage The elements on the diagonal of H are called the leverage of each case: the higher the leverage, the more this particular case contributed to the predicted dependent variable. For the remainder we will use h_i = H_ii = x_i(X'X)⁻¹x_i', so h_i represents the leverage of observation i (x_i is the row vector of independent variable values for case i).
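
In R the leverages can be obtained directly, or computed from the definition; a minimal sketch for any fitted lm model m:
h <- hatvalues(m)                       # diagonal of H
X <- model.matrix(m)
H <- X %*% solve(t(X) %*% X) %*% t(X)   # hat matrix from the definition
all.equal(unname(diag(H)), unname(h))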

27 Studentized and standardized residuals The standardized residual is:
r_i = e_i / √(s²(1 − h_i))
The studentized residual uses the root mean squared error of the regression with the ith observation removed:
t_i = e_i / √(s²_(−i)(1 − h_i)),
with s²_(−i) representing s² for the model without observation i. Both standardized and studentized residuals are attempts to adjust residuals by their standard errors, since var(e_i) = σ²(1 − h_i).
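
R computes both directly; a minimal sketch for a fitted lm model m:
rstandard(m)  # standardized residuals r_i
rstudent(m)   # studentized residuals t_i (leave-one-out)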

28 Outliers An outlier is an observation with a large residual, i.e. a point that lies far from the regression line.

29 Example model
sr_i = β0 + β1 pop15_i + β2 pop75_i + β3 dpi_i + β4 ddpi_i + ε_i
sr: savings rate (personal savings divided by disposable income)
pop15: percent population under age 15
pop75: percent population over age 75
dpi: per capita disposable income in dollars
ddpi: percent growth rate of dpi

30 Example model
summary(m <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data=savings))
(coefficient table with estimates, standard errors, t values and p-values not reproduced in this transcription)

31 Residuals vs fitted
plot(m, which=1, bty="n", pch=19)
(figure: residuals against fitted values; labelled cases: Chile, Philippines, Zambia)

32 Normal Q-Q plot
plot(m, which=2, bty="n", pch=19)
(figure: normal Q-Q plot of the standardized residuals; labelled cases: Chile, Philippines, Zambia)

33 Scale-Location plot
plot(m, which=3, bty="n", pch=19)
(figure: scale-location plot of the standardized residuals; labelled cases: Zambia, Philippines, Chile)

34 Leverage
h <- influence(m)$hat
h <- h[order(h)]
plot(h, type="n", bty="n", xlab="", ylab="leverage")
text(h, label=names(h), cex=.7, pos=2)
points(h, col="red", pch=19)
(figure: the 50 countries plotted in increasing order of leverage)

35 Residuals vs leverage
plot(m, which=5, bty="n", pch=19)
(figure: standardized residuals against leverage with Cook's distance contours; labelled cases: Zambia, Japan, Libya)

36 Leverage and influence A point with high leverage is located far from the other points. A high leverage point that strongly influences the regression line is called an influential point.

37 Outlier, low leverage, low influence (illustrative scatter plot with regression line)

38 High leverage, low influence (illustrative scatter plot with regression line)

39 High leverage, high influence (illustrative scatter plot with regression line)

41 Cook's Distance
D_i = (β̂_(−i)^OLS − β̂^OLS)' X'X (β̂_(−i)^OLS − β̂^OLS) / (k s²)
= [e_i / (s√(1 − h_i))]² · h_i / (k(1 − h_i))
= (t_i²/k) · var(ŷ_i)/var(e_i)
~ F(k, n − k)
The F-test here refers to whether β̂^OLS would be significantly different if observation i were to be removed (H0: β = β_(−i)) (Cook 1979: 169).

42 Cook's Distance
D_i = (t_i²/k) · var(ŷ_i)/var(e_i)
"t_i² is a measure of the degree to which the ith observation can be considered as an outlier from the assumed model. The ratios var(ŷ_i)/var(e_i) measure the relative sensitivity of the estimate, β̂^OLS, to potential outlying values at each data point." (Cook 1977: 16)
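
In R, cooks.distance() returns D_i directly; as a sketch, it can be checked against the formula above for a fitted lm model m with k estimated coefficients:
d <- cooks.distance(m)
k <- length(coef(m))
h <- hatvalues(m)
r <- rstandard(m)
all.equal(unname(d), unname(r^2 * h / (k * (1 - h))))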

43 Cook's Distance plot
plot(m, which=4, pch=19, bty="n")
(figure: Cook's distance per observation; labelled cases: Libya, Japan, Zambia)

44 Cook's Distance vs leverage
plot(m, which=6, pch=19, bty="n")
(figure: Cook's distance against h_ii/(1 − h_ii); labelled cases: Libya, Zambia, Japan)

45 Examine the outliers
> savings[c("Japan", "Zambia", "Libya", "United States", "Ireland"), ]
(rows of sr, pop15, pop75, dpi and ddpi for these countries; values not reproduced in this transcription)

49 What to do with outliers? Options:
1 Ignore the problem
2 Investigate why the data are outliers: what makes them unusual?
3 Consider respecifying the model, either by transforming a variable or by including an additional variable (but beware of overfitting)
4 Consider a variant of robust regression that downweights outliers

50 Exercise
library(faraway)
data(teengamb)
m <- lm(gamble ~ sex + status + income + verbal, data=teengamb)
Check for leverage, outliers, influential points and nonlinearities.

51 Outline 1 Specification 2 Outliers, leverage, influence 3 Multicollinearity 4 Heteroscedasticity

54 Collinearity When some variables are linear combinations of others we have exact (or perfect) collinearity, and there is no unique least squares estimate of β: (X'X)⁻¹ will not exist if r(X) < k. When X variables are highly correlated, we have multicollinearity. Detecting multicollinearity:
look at the correlation matrix of predictors for pairwise correlations
regress x_j on X_(−j) to produce R²_j, and look for high values (close to 1.0)
examine the eigenvalues of X'X
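
The three diagnostics in this list translate directly into R; a minimal sketch for a fitted lm model m:
X <- model.matrix(m)[, -1]            # predictor matrix, intercept dropped
cor(X)                                # pairwise correlations
sapply(1:ncol(X), function(j)         # R^2_j from regressing x_j on the rest
  summary(lm(X[, j] ~ X[, -j]))$r.squared)
eigen(crossprod(X))$values            # eigenvalues of X'X; values near 0 signal collinearity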

55 The extent to which multicollinearity is a problem is debatable. The issue is comparable to that of sample size: if n is too small, we have difficulty picking up effects even if they really exist; the same holds for variables that are highly multicollinear, making it difficult to separate their effects on y.

56 However, high multicollinearity does cause some problems:
small changes in the data can lead to large changes in the estimates
high standard errors on individual coefficients despite joint significance
coefficients may have the wrong sign or implausible magnitudes
(Greene 2002: 57)

57 (Ballentine Venn diagram of y, x and z with x and z overlapping heavily: under high multicollinearity little independent variation in x or z remains to estimate their separate effects)

61 Variance of β̂^OLS
var(β̂_k^OLS) = σ² / [(1 − R²_k) Σᵢ (x_ik − x̄_k)²]
σ²: all else equal, the better the fit, the lower the variance
(1 − R²_k): all else equal, the lower the R² from regressing the kth independent variable on all other independent variables, the lower the variance
Σᵢ (x_ik − x̄_k)²: all else equal, the more variation in x, the lower the variance
(Greene 2002: 57)

63 Variance Inflation Factor
var(β̂_k^OLS) = σ² / [(1 − R²_k) Σᵢ (x_ik − x̄_k)²]
VIF_k = 1 / (1 − R²_k), thus VIF_k shows the increase in var(β̂_k^OLS) due to the variable being collinear with other independent variables.
library(faraway)
vif(lm(...))
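
vif() is just 1/(1 − R²_k) applied to each regressor in turn; a manual sketch for a hypothetical model with regressors x1, x2, x3 in data frame d:
r2 <- summary(lm(x1 ~ x2 + x3, data=d))$r.squared  # R^2_k for x1
1 / (1 - r2)                                       # VIF for x1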

64 Multicollinearity: solutions
Check for coding or logical mistakes (esp. in cases of perfect multicollinearity)
Increase n
Remove one of the collinear variables (apparently not adding much)
Combine multiple variables into indices or underlying dimensions
Formalise the relationship

65 Exercise
library(faraway)
data(longley)
Regress Employed on the other variables in the longley data and investigate for multicollinearity and other issues.

66 Outline 1 Specification 2 Outliers, leverage, influence 3 Multicollinearity 4 Heteroscedasticity

67 Homoscedasticity (figure: scatter plot with constant error variance around the regression line)

68 (figure: scatter plot with non-constant error variance, illustrating heteroscedasticity)

70 Regression disturbances whose variances are not constant across observations are heteroscedastic. Under heteroscedasticity, the OLS estimators remain unbiased and consistent, but are no longer BLUE or asymptotically efficient. (Thomas 1985, 94)

78 Causes of heteroscedasticity
More variation for larger sizes (e.g. profits of firms vary more for larger firms)
More variation across different groups in the sample
Learning effects in time series
Variation in data collection quality (e.g. historical data)
Turbulence after shocks in time series (e.g. financial markets)
Omitted variable
Wrong functional form
Aggregation with varying sizes of populations etc.

80 Heteroscedasticity: aggregation example Imagine we have the following model:
y_ij = β0 + β1 x_ij + ε_ij,
whereby i indicates the individual and j the region of this individual, with n_j individuals per region. Say we only have regional-level data, ȳ_j = (1/n_j) Σᵢ y_ij and x̄_j = (1/n_j) Σᵢ x_ij:
ȳ_j = β0 + β1 x̄_j + ε̄_j, where ε̄_j = (1/n_j) Σᵢ ε_ij.
(Thomas 1985, 98)

81 Heteroscedasticity: aggregation example
ȳ_j = β0 + β1 x̄_j + ε̄_j
E(ε̄_j) = 0
E(ε̄_j²) = (1/n_j²) Σᵢ E(ε_ij²) = (n_j/n_j²) σ² = σ²/n_j
Therefore, var(ε̄_j) depends on n_j and thus varies across cases. (Judge et al 1985)

82 Heteroscedasticity: aggregation example
ȳ_j = β0 + β1 x̄_j + ε̄_j
In this case the fix is actually easy: since var(ε̄_j) = σ²/n_j, var(√n_j ε̄_j) = σ², so the heteroscedasticity can be avoided by transforming the variables:
√n_j ȳ_j = β0 √n_j + β1 √n_j x̄_j + ε*_j
(Thomas 1985, 98)
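
In R this transformation is equivalent to weighted least squares with weights n_j; a minimal sketch, assuming a data frame reg with hypothetical columns ybar, xbar and group sizes nj:
m_wls <- lm(ybar ~ xbar, data=reg, weights=nj)  # minimizes sum of nj * e_j^2
# the same fit via the explicit sqrt(nj) transformation (sqrt(nj) replaces the intercept column):
m_tr <- lm(I(sqrt(nj) * ybar) ~ 0 + sqrt(nj) + I(sqrt(nj) * xbar), data=reg)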

85 Since OLS is no longer BLUE or asymptotically efficient,
other linear unbiased estimators exist which have smaller sampling variances;
other consistent estimators exist which collapse more quickly to the true values as n increases;
we can no longer trust hypothesis tests, because var(β̂^OLS) is biased:
if cov(x_i², σ_i²) > 0, then var(β̂^OLS) is underestimated
if cov(x_i², σ_i²) = 0, then there is no bias in var(β̂^OLS)
if cov(x_i², σ_i²) < 0, then var(β̂^OLS) is overestimated (inefficient)
(Thomas 1985, 94-95; Judge et al 1985, 422)

86 Normally we assume:
E(εε'|X) = σ²I = diag(σ², σ², ..., σ²)
For the heteroscedastic model we have:
E(εε'|X) = Ω = diag(ω1, ω2, ..., ωn) = diag(σ1², σ2², ..., σn²)

87 Deriving var(β̂^OLS)
var(β̂^OLS) = E[(β̂^OLS − β)(β̂^OLS − β)']
= E[((X'X)⁻¹X'ε)((X'X)⁻¹X'ε)']
= E[(X'X)⁻¹X'εε'X(X'X)⁻¹]
= (X'X)⁻¹X'E[εε']X(X'X)⁻¹
= (X'X)⁻¹X'σ²IX(X'X)⁻¹
= σ²(X'X)⁻¹X'X(X'X)⁻¹
= σ²(X'X)⁻¹

89 Deriving var(β̂^OLS) under heteroscedasticity
var(β̂^OLS) = E[(β̂^OLS − β)(β̂^OLS − β)']
= E[((X'X)⁻¹X'ε)((X'X)⁻¹X'ε)']
= E[(X'X)⁻¹X'εε'X(X'X)⁻¹]
= (X'X)⁻¹X'E[εε']X(X'X)⁻¹
= (X'X)⁻¹X'ΩX(X'X)⁻¹,
which cannot be simplified further and requires knowledge of Ω to estimate.

90 Efficiency Because observations with low variance will contain more information about the parameters than observations with high variance, an estimator which weighs all observations equally, like OLS, will not be the most efficient. (Davidson & MacKinnon 1999: 197)

92 Heteroscedasticity: solution When the type of heteroscedasticity is known, we can often transform the data. An example is the multiplication by √n_j of each term in the equation for the group means regression. Another example: if var(ε_i) = σ²x_i1², then var(ε_i/x_i1) = σ², so:
y_i = β0 + β1 x_i1 + β2 x_i2 + ε_i
y_i/x_i1 = β0 (1/x_i1) + β1 + β2 (x_i2/x_i1) + ε_i/x_i1
y*_i = β1 + β0 x*_i1 + β2 x*_i2 + ε*_i
(note the interpretation of the intercept)
(Thomas 1985, 98)

93 Generalized Least Squares More in general, if var(ε_i) = σ²λ_i, with λ_i being some function of X_i, then we can always transform our model by dividing all variables by √λ_i. This is referred to as generalized least squares (GLS). (It is a generalization because, for λ_i = 1, we have OLS.) With GLS, observations with lower σ_i² are weighted more heavily. (Thomas 1985, 98; Judge et al 1985, 421)
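
A minimal sketch of this in R when λ_i is known, assuming a hypothetical column lambda in data frame d; lm()'s weights argument expects 1/λ_i:
m_gls <- lm(y ~ x1 + x2, data=d, weights=1/lambda)
# or via the nlme package, modelling the variance as proportional to lambda:
library(nlme)
m_gls2 <- gls(y ~ x1 + x2, data=d, weights=varFixed(~lambda))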

95 Estimated Generalized Least Squares To perform GLS estimation, σ_i² has to be known. In some cases we can estimate σ_i², in which case we talk of estimated generalized least squares (EGLS). To estimate a model with minimal restrictions on σ_i², we would be estimating a model with n + k unknown parameters, i.e. the number of parameters to be estimated increases as n increases, and the estimator is by definition inconsistent. (Judge et al 1985, 423)

96 Estimated Generalized Least Squares Special cases where EGLS estimation might be possible:
σ_i² constant within subgroups
σ_i² = (z_i'α)², i.e. σ_i is a linear function of exogenous variables
σ_i² = z_i'α, i.e. σ_i² is a linear function of exogenous variables
σ_i² = σ²(x_i'β)^p, i.e. var(y) is proportional to a power of its expectation
σ_i² = e^(z_i'α), multiplicative heteroscedasticity
ε_t = v_t √(α0 + α1 ε²_(t−1)), autoregressive conditional heteroscedasticity (ARCH)
See Judge et al (1985, 424ff) for an overview of estimators.

99 When the form of the heteroscedasticity is unknown, we can get consistent estimates of var(β̂^OLS) using a heteroscedasticity consistent covariance matrix (HCCM):
var(β̂^OLS) = (X'X)⁻¹X'ΩX(X'X)⁻¹
HCCM: estimate ω̂_ii = (e_i − 0)² = e_i², so that we have the variance estimator
var(β̂^OLS) = (X'X)⁻¹X'diag(e_i²)X(X'X)⁻¹
Since there are several variations, this estimator is called HC0 in the literature.

102 Residuals vs errors Note that:
h_ii = x_i(X'X)⁻¹x_i'
var(e_i) = σ²(1 − h_ii) ≤ σ², therefore var(e_i) underestimates σ²,
and even when the errors (ε) are homoscedastic, the residuals (e) are not.
So e_i², used in White's HC0, is, even though consistent, a biased estimator. The small-sample properties turn out not to be very good. (Long & Ervin 2000)

103 HCCM variations
HC0 = (X'X)⁻¹X'diag(e_i²)X(X'X)⁻¹
HC1 = n/(n − k) (X'X)⁻¹X'diag(e_i²)X(X'X)⁻¹ = n/(n − k) HC0
HC2 = (X'X)⁻¹X'diag(e_i²/(1 − h_ii))X(X'X)⁻¹
HC3 = (X'X)⁻¹X'diag(e_i²/(1 − h_ii)²)X(X'X)⁻¹
Based on Monte Carlo analyses, HC3 is best in small samples. (Long & Ervin 2000)

104 HCCM in R
library(car)
m <- lm(...)
summary(m)
vcov <- hccm(m, type="hc3")
sqrt(diag(vcov))
(See notes for manual version.)
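
The manual version referred to might look like this: our own sketch of the HC3 formula above for a fitted lm model m, not necessarily the version in the notes:
X <- model.matrix(m)
e <- residuals(m)
h <- hatvalues(m)
XtXi <- solve(t(X) %*% X)
hc3 <- XtXi %*% t(X) %*% diag(e^2 / (1 - h)^2) %*% X %*% XtXi
sqrt(diag(hc3))  # HC3 standard errors, matching hccm(m, type="hc3")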

105 Another solution for dealing with heteroscedasticity is to bootstrap to acquire standard errors.
n <- nrow(data)
se <- NULL
for (i in 1:1000) {
  sel <- sample(1:n, n, replace=TRUE)
  mbs <- lm(y ~ x1 + x2, data=data[sel,])
  se <- rbind(se, sqrt(diag(vcov(mbs))))
}
colMeans(se)

106 Exercise Using the p4factor.csv data and the model
dem_i = β0 + β1 cwar_i + β3 laggdppc_i + β4 pnbdem_i + β6 egr_i + ε_i,
calculate the standard errors using
normal OLS estimation;
the four HCCM variations;
bootstrapping.

107 Residual plots: heteroscedasticity To detect heteroscedasticity (unequal variances), it is useful to plot:
residuals against fitted values
residuals against the dependent variable
residuals against the independent variable(s)
Usually the first one is sufficient to detect heteroscedasticity, and can simply be obtained with:
m <- lm(y ~ x)
plot(m)

108 Residual plots: heteroscedasticity (figures on slides 108 to 111: a scatter plot of y and plots of residuals(m) against the fitted values, the dependent variable and the independent variable)

112 Residual plots: homoscedasticity (figures on slides 112 to 115: the same four plots for a homoscedastic example)

116 Residual plots: heteroscedasticity (figures on slides 116 to 119: the same four plots for a second heteroscedastic example)

120 Known groups One way of testing for heteroscedasticity, if you expect that the variances might differ between two groups, is to run two separate regressions for the two groups:
[SSR1/(n1 − k)] / [SSR2/(n2 − k)] ~ F(n1 − k, n2 − k) under H0: σ1² = σ2²
(Wallace & Silver 1988: 267)

121 Known groups For example,
f <- dem ~ cwar + laggdppc + pnbdem
m1 <- lm(f, data=p4, subset=(bautlag == 1))
m2 <- lm(f, data=p4, subset=(bautlag == 0))
ssr1 <- sum(residuals(m1)^2)
ssr2 <- sum(residuals(m2)^2)
F <- (ssr1/m1$df) / (ssr2/m2$df)
1 - pf(F, m1$df, m2$df)

125 Breusch-Pagan test
σ_i² = f(z_i'α), with α = (α0, α*)', H0: α* = 0, H1: α* ≠ 0,
where f(z_i'α) is any function of z_i'α that does not depend on i.
So this includes scenarios where σ_i² = (z_i'α)², or σ_i = z_i'α, or σ_i = e^(z_i'α).
If Z contains dummies for groups, it also includes heteroscedasticity due to different variances across subgroups.
Assumes ε_i ~ N(0, σ_i²).

126 Breusch-Pagan test
η = q'Z(Z'Z)⁻¹Z'q / (2σ̂⁴) ~ χ²(s − 1) asymptotically, where
q_i = e_i² − σ̂²
σ̂² = (1/n) e'e
and Z an n × s matrix of exogenous variables.

127 Breusch-Pagan test
m <- lm(y ~ x)
r <- residuals(m)
n <- length(r)
s2 <- c(t(r) %*% r / n)
q <- r^2 - s2
Z <- cbind(1, x)
s <- dim(Z)[2]
eta <- (t(q) %*% Z %*% solve(t(Z) %*% Z) %*% t(Z) %*% q)
eta <- eta / (2 * s2^2)
p <- 1 - pchisq(eta, s - 1)
library(lmtest)
bptest(m, studentize=FALSE)

128 Breusch-Pagan test With more than one independent variable, an alternative approach is to look at an auxiliary regression:
e_i² = γ0 + γ1 ŷ_i² + v_i
If the model is homoscedastic and the variance is unrelated to ŷ, then H0: γ1 = 0. For this regression, nR² ~ χ²(1).
summary(lm(residuals(m)^2 ~ fitted(m)))$r.sq * n
(Thomas 1985, 96-97)

129 Breusch-Pagan test library(lmtest) m <- lm(y ~ x) bptest(m) bptest(m, ~ z1 + z2) By default, R assumes Z = X.

130 Goldfeld-Quandt test To run a Goldfeld-Quandt test:
1 Omit r central observations from the data
2 Run two separate regressions, one for the first (n − r)/2 observations and one for the last (n − r)/2
3 Calculate R = SSR1/SSR2
4 Perform the test based on R ~ F(½(n − r − 2k), ½(n − r − 2k))
(Judge et al 1985, 449)

131 Goldfeld-Quandt test
m <- lm(y ~ x)
m1 <- lm(y[1:20] ~ x[1:20])
m2 <- lm(y[(n-20):n] ~ x[(n-20):n])
ssr1 <- sum(residuals(m1)^2)
ssr2 <- sum(residuals(m2)^2)
R <- ssr1/ssr2
p1 <- 1 - pf(R, 18, 18)
p2 <- 1 - pf(1/R, 18, 18)
library(lmtest)
gqtest(m, fraction = n - 40)

133 Harrison-McCabe test For the above tests we always run several regressions, because even if the errors are uncorrelated, the residuals are not independent. If residuals are not independent, a ratio of subsets of these residuals does not have an F-distribution, while if we run separate regressions, the residuals will be independent (if the errors are) and such a ratio will have an F-distribution. Harrison & McCabe (1979) suggest that such a ratio of subsets of the residuals does have a β-distribution, however.
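
lmtest provides an implementation of this test; a minimal sketch for a fitted lm model m (the p-value is obtained by simulation by default):
library(lmtest)
hmctest(m)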

134 Harrison-McCabe test (figure: the HMC statistic plotted with its upper and lower bounds, E(b) and a 95% C.I. for the two-tailed test)

135 White s test One solution for dealing with heteroscedasticity is calculating White s heteroscedasticity-corrected standard errors. The reasoning behind the White test is very straightforward: if there is homoscedasticity, the corrected standard errors should not be significantly different from the normal ones.

137 White's test
1 Regress e_i² on x_i, all the variables in x_i squared, and all cross-products of x_i; e.g. if
y_i = β0 + β1 x_i1 + β2 x_i2 + ε_i
then run the regression
e_i² = γ0 + γ1 x_i1 + γ2 x_i2 + γ3 x²_i1 + γ4 x²_i2 + γ5 x_i1 x_i2 + v_i
and calculate R²;
2 Perform the test on the basis of nR² ~ χ²(p − 1), whereby p is the number of regressors in the auxiliary regression (6 in the example).
(Greene 2003, 222)

138 White s test in R m <- lm(y ~ x1 + x2) bptest(m, ~ x1 * x2 + I(x1^2) + I(x2^2)) I.e. there does not appear to be an implementation of White s test in R, but it is equivalent to the Breusch-Pagan test with the independent variables as discussed.

141 Heteroscedasticity tests In general,
many of these tests require some idea about the shape of the heteroscedasticity;
many of these tests have weak power, depending on the type of heteroscedasticity;
if there is good reason to suspect heteroscedasticity, it is generally better to just use some robust estimation rather than to test first: the tests are not reliable enough.

142 Exercise Using the p4factor.csv data in 1998 and the model
dem_i = β0 + β1 cwar_i + β3 laggdppc_i + β4 pnbdem_i + β6 egr_i + ε_i,
run tests for heteroscedasticity:
Breusch-Pagan;
Goldfeld-Quandt;
Harrison-McCabe;
White.

143 Exercise Test the following model for heteroscedasticity and calculate corrected standard errors:
library(lmtest)
data(unemployment)
myunemployment <- window(unemployment, start=1895, end=1956)
time <- 6:67
modelrea <- UN ~ log(m/p) + log(g) + log(x) + time
m <- lm(modelrea, data = myunemployment)


Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014 Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2014 1 / 13 Chapter 8: Polynomials & Interactions

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

Solow model: Convergence

Solow model: Convergence Solow model: Convergence Per capita income k(0)>k* Assume same s, δ, & n, but no technical progress y* k(0)=k* k(0) k Assume same s, δ, &

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within

More information

Volume 31, Issue 1. Mean-reverting behavior of consumption-income ratio in OECD countries: evidence from SURADF panel unit root tests

Volume 31, Issue 1. Mean-reverting behavior of consumption-income ratio in OECD countries: evidence from SURADF panel unit root tests Volume 3, Issue Mean-reverting behavior of consumption-income ratio in OECD countries: evidence from SURADF panel unit root tests Shu-Yi Liao Department of Applied Economics, National Chung sing University,

More information

ETH Zürich, October 25, 2010

ETH Zürich, October 25, 2010 Marcel Dettling Institute for Data nalysis and Process Design Zurich University of pplied Sciences marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling t t th h/ ttli ETH Zürich, October 25, 2010 1 Mortality

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Gov 2000: 9. Regression with Two Independent Variables

Gov 2000: 9. Regression with Two Independent Variables Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics

More information

POLSCI 702 Non-Normality and Heteroskedasticity

POLSCI 702 Non-Normality and Heteroskedasticity Goals of this Lecture POLSCI 702 Non-Normality and Heteroskedasticity Dave Armstrong University of Wisconsin Milwaukee Department of Political Science e: armstrod@uwm.edu w: www.quantoid.net/uwm702.html

More information

Testing Linear Restrictions: cont.

Testing Linear Restrictions: cont. Testing Linear Restrictions: cont. The F-statistic is closely connected with the R of the regression. In fact, if we are testing q linear restriction, can write the F-stastic as F = (R u R r)=q ( R u)=(n

More information

Reliability of inference (1 of 2 lectures)

Reliability of inference (1 of 2 lectures) Reliability of inference (1 of 2 lectures) Ragnar Nymoen University of Oslo 5 March 2013 1 / 19 This lecture (#13 and 14): I The optimality of the OLS estimators and tests depend on the assumptions of

More information

Linear Regression Models

Linear Regression Models Linear Regression Models November 13, 2018 1 / 89 1 Basic framework Model specification and assumptions Parameter estimation: least squares method Coefficient of determination R 2 Properties of the least

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information