Advanced Quantitative Methods: specification and multicollinearity
University College Dublin, February 16, 2011
Outline

1 Specification
2 Outliers, leverage, and influence
3 Multicollinearity
4 Heteroscedasticity
Omitting a relevant independent variable

For omitted variable z:
- β̂_OLS will be biased iff cor(z, x) ≠ 0
- the intercept will be biased iff E(z) ≠ 0
- σ̂² will be biased upward, so V(β̂_OLS) will be biased upward
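The bias condition is easy to see in a small simulation. This is an illustrative sketch, not from the slides; the coefficient values and variable names are made up.

```r
# Illustrative simulation (made-up DGP): omitting z biases the
# coefficient on x whenever cor(z, x) != 0.
set.seed(42)
n <- 10000
x <- rnorm(n)
z <- 0.7 * x + rnorm(n)            # z correlated with x
y <- 1 + 2 * x + 3 * z + rnorm(n)  # true model: beta_x = 2, beta_z = 3

b_full  <- coef(lm(y ~ x + z))["x"]  # close to the true 2
b_short <- coef(lm(y ~ x))["x"]      # close to 2 + 3 * 0.7 = 4.1

c(full = b_full, short = b_short)
```

The short regression picks up the part of z's effect that travels through its correlation with x; with cor(z, x) = 0 the two estimates would coincide in expectation.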
Omitting a relevant variable z: graphical intuition

[Ballentine/Venn diagram of y, x, and z.] If z is omitted, Areas II and III reflect the information used to estimate β̂_OLS,X; if z is included, only Area II would be used. Only Area I is used to estimate σ̂², except when z is excluded, in which case Area IV is also used. If x is orthogonal to z, there is no Area III and the bias disappears.
Including an irrelevant independent variable

For unnecessarily included variable z:
- β̂_OLS,X and V(β̂_OLS,X) will remain unbiased
- β̂_OLS will be less efficient (increases MSE)
Adding an irrelevant variable: graphical intuition

[Ballentine/Venn diagram of y, x, and z.] Area II reflects variation in y due entirely to x, so β̂_OLS,X is unbiased. Since Area II < (Area II + III), V(β̂_OLS,X) increases. Area I, used to estimate σ̂², is unbiased, so V(β̂_OLS) remains unbiased. If z is orthogonal to x, there is no Area III and no efficiency loss.
Testing restrictions

H₀: y = X⁽¹⁾β₁ + ε
H₁: y = X⁽¹⁾β₁ + X⁽²⁾β₂ + ε,

where X⁽¹⁾ is n × k and X⁽²⁾ is n × r.

USSR: unrestricted sum of squared residuals
RSSR: restricted sum of squared residuals

F_β₂ = [(RSSR − USSR)/r] / [USSR/(n − k − r)] ~ F(r, n − k − r)
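The F-statistic above can be computed directly from the two fitted models; a sketch on simulated data (made-up names and values), checked against R's anova():

```r
# Sketch of the restriction F-test: compare restricted and unrestricted fits.
set.seed(1)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + 2 * x1 + rnorm(n)       # x2 and x3 are truly irrelevant

m_r <- lm(y ~ x1)                # restricted model (k = 2 parameters)
m_u <- lm(y ~ x1 + x2 + x3)      # unrestricted model (r = 2 extra)

rssr <- sum(residuals(m_r)^2)    # restricted sum of squared residuals
ussr <- sum(residuals(m_u)^2)    # unrestricted sum of squared residuals
r <- 2; k <- 2
Fstat <- ((rssr - ussr) / r) / (ussr / (n - k - r))

# anova() computes the same statistic
all.equal(Fstat, anova(m_r, m_u)$F[2])
```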
Non-linearity

If there is non-linearity in the variables, but not in the parameters, there is no problem. E.g.

y_i = β₀ + β₁x_i + β₂x_i² + ε_i

can be estimated with OLS.

If there are other non-linearities, sometimes the equation can be transformed. E.g.

y_i = β₀ x_i^β₁ ε_i
log(y_i) = log(β₀) + β₁ log(x_i) + log(ε_i)
y*_i = β₀* + β₁ x*_i + ε*_i
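The log transformation can be sketched on simulated data (parameter values and names are made up for illustration):

```r
# Sketch: y = b0 * x^b1 * eps becomes linear in the parameters after logs.
set.seed(2)
n <- 500
x <- runif(n, 1, 10)
eps <- exp(rnorm(n, sd = 0.1))     # multiplicative error, E(log eps) = 0
y <- 3 * x^1.5 * eps               # true b0 = 3, b1 = 1.5

m <- lm(log(y) ~ log(x))
b0_hat <- exp(coef(m)[1])          # back-transformed intercept, near 3
b1_hat <- coef(m)[2]               # elasticity, near 1.5
```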
Functional forms for additional non-linear transformations

- log-linear: as with the previous example
- semi-log has two forms:
  y_i = β₀ + β₁ log(x_i), where β₁ is the change in y due to a proportional (%) change in x
  log(y_i) = β₀ + β₁ x_i, where β₁ is the proportional (%) change in y due to a unit change in x
- inverse or reciprocal: y_i = β₀ + β₁ (1/x_i)
- polynomial: y_i = β₀ + β₁x_i + β₂x_i²
Exercise

Open the mnrf.csv data file. Using plots and diagnostics, find the right specification for f = f(m, n, r). Of course, start with:

f_i = β₀ + β₁m_i + β₂n_i + β₃r_i + ε_i
Hat matrix

ŷ = Xβ̂_OLS = X(X′X)⁻¹X′y = Hy

H is called the hat matrix (it puts a hat on y), or sometimes the prediction matrix P.

var(ŷ) = σ²H
var(e) = σ²(I − H)
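The definitions above can be verified by hand on simulated data (an illustrative sketch; the data-generating values are made up):

```r
# Sketch: build H from the formula and check it against lm().
set.seed(3)
n <- 30
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)

X <- cbind(1, x1, x2)
H <- X %*% solve(t(X) %*% X) %*% t(X)  # hat matrix
m <- lm(y ~ x1 + x2)

all.equal(as.vector(H %*% y), unname(fitted(m)))  # H puts the hat on y
all.equal(diag(H), unname(hatvalues(m)))          # diagonal = leverages
```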
Hat matrix

[Figure: geometric view. ŷ = Hy is the projection of y onto the space spanned by x₁ and x₂, and e = (I − H)y is orthogonal to it.]
Leverage

The elements on the diagonal of H are called the leverage of each case: the higher the leverage, the more this particular case contributed to the predicted dependent variable.

For the remainder we will use h_i = H_ii = x_i(X′X)⁻¹x_i′, so h_i represents the leverage of observation i (x_i is a row vector of the independent variables for case i).
Studentized and standardized residuals

The standardized residual is:

r_i = e_i / √(s²(1 − h_i))

The studentized residual uses the root mean squared error of the regression with the ith observation removed:

t_i = e_i / √(s²₍₋ᵢ₎(1 − h_i)),

with s²₍₋ᵢ₎ representing s² for the model without observation i.

Both standardized and studentized residuals are attempts to adjust residuals by their standard errors, since var(e_i) = σ²(1 − h_i).
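These formulas correspond to what R's rstandard() and rstudent() compute; a sketch using the built-in cars data:

```r
# Check the standardized-residual formula against rstandard().
m <- lm(dist ~ speed, data = cars)
e <- residuals(m)
h <- hatvalues(m)
s2 <- summary(m)$sigma^2

r_manual <- e / sqrt(s2 * (1 - h))
all.equal(r_manual, rstandard(m))   # identical

# rstudent(m) gives the studentized version, which replaces s^2 by
# the residual variance estimated with observation i removed.
```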
Outliers

An outlier is a point whose residual is large, i.e. a point that lies far from the regression line.
Example model

sr_i = β₀ + β₁pop15_i + β₂pop75_i + β₃dpi_i + β₄ddpi_i + ε_i

sr: savings rate (personal savings divided by disposable income)
pop15: percent population under age 15
pop75: percent population over age 75
dpi: per capita disposable income in dollars
ddpi: percent growth rate of dpi
Example model

summary(m <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data=savings))

[Coefficient table (estimates, standard errors, t values, p values) omitted.]
Residuals vs fitted

plot(m, which=1, bty="n", pch=19)

[Residuals-vs-fitted plot; the Philippines, Zambia, and Chile are flagged.]
Normal Q-Q plot

plot(m, which=2, bty="n", pch=19)

[Normal Q-Q plot of standardized residuals; Chile, the Philippines, and Zambia are flagged.]
Scale-Location plot

plot(m, which=3, bty="n", pch=19)

[Scale-location plot; Zambia, the Philippines, and Chile are flagged.]
Leverage

h <- influence(m)$hat
h <- h[order(h)]
plot(h, type="n", bty="n", xlab="", ylab="leverage")
text(h, label=names(h), cex=.7, pos=2)
points(h, col="red", pch=19)

[Plot of sorted leverage values labelled by country; Libya and the United States stand out at the high end.]
Residuals vs leverage

plot(m, which=5, bty="n", pch=19)

[Standardized residuals vs leverage, with Cook's distance contours; Zambia, Japan, and Libya are flagged.]
Leverage and influence

A point with high leverage is located far from the other points. A high leverage point that strongly influences the regression line is called an influential point.
[Three illustrative scatterplots: an outlier with low leverage and low influence; a high-leverage point with low influence; and a high-leverage point with high influence.]
Cook's Distance

D_i = (β̂_OLS,(−i) − β̂_OLS)′ X′X (β̂_OLS,(−i) − β̂_OLS) / (k s²)
    = (e_i / (s√(1 − h_i)))² · h_i / (k(1 − h_i))
    = (t_i²/k) · var(ŷ_i)/var(e_i) ~ F(k, n − k)

The F-test here refers to whether β̂_OLS would be significantly different if observation i were to be removed (H₀: β = β₍₋ᵢ₎) (Cook 1979: 169).
Cook's Distance

D_i = (t_i²/k) · var(ŷ_i)/var(e_i)

"t_i² is a measure of the degree to which the ith observation can be considered as an outlier from the assumed model. The ratios var(ŷ_i)/var(e_i) measure the relative sensitivity of the estimate, β̂_OLS, to potential outlying values at each data point." (Cook 1977: 16)
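In terms of the standardized residual r_i, the formula reduces to D_i = (r_i²/k) · h_i/(1 − h_i), which matches R's cooks.distance(); a sketch with the built-in cars data:

```r
# Check Cook's distance formula against cooks.distance().
m <- lm(dist ~ speed, data = cars)
r <- rstandard(m)      # standardized residuals
h <- hatvalues(m)      # leverages
k <- length(coef(m))   # number of estimated parameters

D_manual <- (r^2 / k) * h / (1 - h)
all.equal(D_manual, cooks.distance(m))   # identical
```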
Cook's Distance plot

plot(m, which=4, pch=19, bty="n")

[Cook's distance by observation; Libya, Japan, and Zambia stand out.]
Cook's Distance vs leverage

plot(m, which=6, pch=19, bty="n")

[Cook's distance plotted against h_ii/(1 − h_ii); Libya, Zambia, and Japan are flagged.]
Examine the outliers

> savings[c("Japan", "Zambia", "Libya", "United States", "Ireland"), ]

[Rows of sr, pop15, pop75, dpi, and ddpi for these five countries; numeric values omitted.]
What to do with outliers?

Options:
1 Ignore the problem
2 Investigate why the data are outliers: what makes them unusual?
3 Consider respecifying the model, either by transforming a variable or by including an additional variable (but beware of overfitting)
4 Consider a variant of robust regression that downweights outliers
Exercise

library(faraway)
data(teengamb)
lm(gamble ~ sex + status + income + verbal, data=teengamb)

Check for leverage, outliers, influential points and nonlinearities.
Collinearity

When some variables are linear combinations of others, we have exact (or perfect) collinearity, and there is no unique least squares estimate of β: (X′X)⁻¹ will not exist if rank(X) < k.

When X variables are highly correlated, we have multicollinearity. Detecting multicollinearity:
- look at the correlation matrix of predictors for pairwise correlations
- regress x_j on X₍₋ⱼ₎ to produce R_j², and look for high values (close to 1.0)
- examine the eigenvalues of X′X
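The three diagnostics can be sketched on simulated data (an illustrative, made-up data-generating process):

```r
# Sketch: three multicollinearity diagnostics on nearly collinear predictors.
set.seed(4)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)
X <- cbind(x1, x2, x3)

round(cor(X), 3)                              # pairwise correlations
R2_1 <- summary(lm(x1 ~ x2 + x3))$r.squared   # R^2_j close to 1
eigen(cor(X))$values                          # one near-zero eigenvalue
```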
The extent to which multicollinearity is a problem is debatable. The issue is comparable to that of sample size: if n is too small, we have difficulty picking up effects even if they really exist; the same holds for variables that are highly multicollinear, making it difficult to separate their effects on y.
However, there are some problems with high multicollinearity:
- small changes in the data can lead to large changes in estimates
- high standard errors but joint significance
- coefficients may have the wrong sign or implausible magnitudes

(Greene 2002: 57)
[Ballentine/Venn diagram of y, x, and z, with Areas I-IV marking the overlapping variation.]
Variance of β̂_OLS

var(β̂_OLS,k) = σ² / [(1 − R_k²) Σᵢ (x_ik − x̄_k)²]

- σ²: all else equal, the better the fit, the lower the variance
- (1 − R_k²): all else equal, the lower the R² from regressing the kth independent variable on all other independent variables, the lower the variance
- Σᵢ (x_ik − x̄_k)²: all else equal, the more variation in x, the lower the variance

(Greene 2002: 57)
Variance Inflation Factor

var(β̂_OLS,k) = σ² / [(1 − R_k²) Σᵢ (x_ik − x̄_k)²]

VIF_k = 1 / (1 − R_k²),

thus VIF_k shows the increase in var(β̂_OLS,k) due to the variable being collinear with other independent variables.

library(faraway)
vif(lm(...))
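The VIF can also be computed by hand from the auxiliary R_k², and the link to var(β̂_k) is exact; a sketch on simulated data (made-up values, base R only):

```r
# Sketch: VIF_k = 1/(1 - R_k^2), and its exact link to var(beta_k).
set.seed(5)
n <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.81) * rnorm(n)   # cor(x1, x2) about 0.9
y <- 1 + x1 + x2 + rnorm(n)
m <- lm(y ~ x1 + x2)

R2_1 <- summary(lm(x1 ~ x2))$r.squared
vif_1 <- 1 / (1 - R2_1)

# var(beta_1) = s^2 * VIF_1 / sum((x1 - mean(x1))^2), exactly:
v_formula <- summary(m)$sigma^2 * vif_1 / sum((x1 - mean(x1))^2)
all.equal(unname(vcov(m)["x1", "x1"]), v_formula)
```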
Multicollinearity: solutions

- Check for coding or logical mistakes (esp. in cases of perfect multicollinearity)
- Increase n
- Remove one of the collinear variables (apparently not adding much)
- Combine multiple variables into indices or underlying dimensions
- Formalise the relationship
Exercise

library(faraway)
data(longley)

Regress Employed on the other independent variables and investigate for multicollinearity and other issues.
Homoscedasticity

[Illustrative scatterplots contrasting homoscedastic and heteroscedastic disturbances.]
Heteroscedasticity

Regression disturbances whose variances are not constant across observations are heteroscedastic. Under heteroscedasticity, the OLS estimators remain unbiased and consistent, but are no longer BLUE or asymptotically efficient. (Thomas 1985, 94)
Causes of heteroscedasticity

- More variation for larger sizes (e.g. profits of firms vary more for larger firms)
- More variation across different groups in the sample
- Learning effects in time-series
- Variation in data collection quality (e.g. historical data)
- Turbulence after shocks in time-series (e.g. financial markets)
- Omitted variable
- Wrong functional form
- Aggregation with varying sizes of populations etc.
Heteroscedasticity: aggregation example

Imagine we have the following model:

y_ij = β₀ + β₁x_ij + ε_ij,

whereby i indicates the individual and j the region of this individual, with n_j individuals per region.

Say we only have regional-level data, ȳ_j = (1/n_j) Σᵢ y_ij and x̄_j = (1/n_j) Σᵢ x_ij:

ȳ_j = β₀ + β₁x̄_j + ε̄_j, where ε̄_j = (1/n_j) Σᵢ ε_ij.

(Thomas 1985, 98)
Heteroscedasticity: aggregation example

ȳ_j = β₀ + β₁x̄_j + ε̄_j

E(ε̄_j) = 0
E(ε̄_j²) = (1/n_j²) Σᵢ E(ε_ij²) = (n_j/n_j²) σ² = (1/n_j) σ²

Therefore var(ε̄_j) depends on n_j and thus varies across cases. (Judge et al 1985)
Heteroscedasticity: aggregation example

ȳ_j = β₀ + β₁x̄_j + ε̄_j

In this case the fix is actually easy: since var(ε̄_j) = σ²/n_j, var(√n_j ε̄_j) = σ², so the heteroscedasticity can be avoided by transforming the variables:

√n_j ȳ_j = β₀√n_j + β₁√n_j x̄_j + ε*_j

(Thomas 1985, 98)
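The √n_j transformation is equivalent to weighted least squares with weights n_j; an illustrative sketch on simulated group-mean data (made-up numbers):

```r
# Sketch: group means with var(eps_j) = sigma^2 / n_j; WLS with weights n_j
# is equivalent to the sqrt(n_j) transformation and is more efficient.
set.seed(6)
J <- 500
nj <- sample(5:100, J, replace = TRUE)              # group sizes
xbar <- rnorm(J)
ybar <- 1 + 2 * xbar + rnorm(J, sd = 1 / sqrt(nj))  # sigma = 1

ols <- lm(ybar ~ xbar)
wls <- lm(ybar ~ xbar, weights = nj)

se_ols <- summary(ols)$coefficients["xbar", "Std. Error"]
se_wls <- summary(wls)$coefficients["xbar", "Std. Error"]
c(ols = se_ols, wls = se_wls)   # WLS standard error is smaller
```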
Since OLS is no longer BLUE or asymptotically efficient:
- other linear unbiased estimators exist which have smaller sampling variances;
- other consistent estimators exist which collapse more quickly to the true values as n increases;
- we can no longer trust hypothesis tests, because var(β̂_OLS) is biased:
  - if cov(x_i², σ_i²) > 0, then var(β̂_OLS) is underestimated
  - if cov(x_i², σ_i²) = 0, then there is no bias in var(β̂_OLS)
  - if cov(x_i², σ_i²) < 0, then var(β̂_OLS) is overestimated (inefficient)

(Thomas 1985, 94-95; Judge et al 1985, 422)
Normally we assume:

E(εε′|X) = σ²I = diag(σ², σ², ..., σ²)

For the heteroscedastic model we have:

E(εε′|X) = Ω = diag(ω₁, ω₂, ..., ω_n) = diag(σ₁², σ₂², ..., σ_n²)
Deriving var(β̂_OLS)

var(β̂_OLS) = E[(β̂_OLS − β)(β̂_OLS − β)′]
 = E[((X′X)⁻¹X′ε)((X′X)⁻¹X′ε)′]
 = E[(X′X)⁻¹X′εε′X(X′X)⁻¹]
 = (X′X)⁻¹X′E[εε′]X(X′X)⁻¹
 = (X′X)⁻¹X′σ²IX(X′X)⁻¹
 = σ²(X′X)⁻¹X′X(X′X)⁻¹
 = σ²(X′X)⁻¹
Deriving var(β̂_OLS) under heteroscedasticity

var(β̂_OLS) = E[(β̂_OLS − β)(β̂_OLS − β)′]
 = E[((X′X)⁻¹X′ε)((X′X)⁻¹X′ε)′]
 = E[(X′X)⁻¹X′εε′X(X′X)⁻¹]
 = (X′X)⁻¹X′E[εε′]X(X′X)⁻¹
 = (X′X)⁻¹X′ΩX(X′X)⁻¹,

which cannot be simplified further and requires knowledge of Ω to estimate.
Efficiency

"Because observations with low variance will contain more information about the parameters than observations with high variance, an estimator which weighs all observations equally, like OLS, will not be the most efficient." (Davidson & MacKinnon 1999: 197)
Heteroscedasticity: solution

When the type of heteroscedasticity is known, we can often transform the data. An example is the multiplication with √n_j of each term in the equation for the group-means regression.

Another example: if var(ε_i) = σ²x_i1², then var(ε_i/x_i1) = σ², so:

y_i = β₀ + β₁x_i1 + β₂x_i2 + ε_i
y_i/x_i1 = β₀(1/x_i1) + β₁ + β₂(x_i2/x_i1) + ε_i/x_i1
y*_i = β₁ + β₀x*_i1 + β₂x*_i2 + ε*_i

(note the intercept's interpretation)

(Thomas 1985, 98)
Generalized Least Squares

More in general, if var(ε_i) = σ²λ_i, with λ_i being some function of X_i, then we can always transform our model by dividing all variables by √λ_i. This is referred to as generalized least squares (GLS). (It is a generalization because, if λ_i = 1 for all i, we have OLS.)

With GLS, observations with lower σ² are weighted more heavily.

(Thomas 1985, 98; Judge et al 1985, 421)
Estimated Generalized Least Squares

To perform GLS estimation, σ_i² has to be known. In some cases we can estimate σ_i², in which case we talk of estimated generalized least squares (EGLS).

To estimate a model with minimal restrictions on σ_i², we would be estimating a model with n + k unknown parameters, i.e. the number of parameters to be estimated increases as n increases, and the estimator is by definition inconsistent. (Judge et al 1985, 423)
Estimated Generalized Least Squares

Special cases where estimation might be possible:
- σ_i² constant within subgroups
- σ_i² = (Zα)², i.e. σ is a linear function of exogenous variables
- σ_i² = Zα, i.e. σ² is a linear function of exogenous variables
- σ_i² = σ²(Xβ)^p, i.e. var(y) is proportional to a power of its expectation
- σ_i² = e^(Zα), multiplicative heteroscedasticity
- ε_t = v_t √(α₀ + α₁ε²_(t−1)), autoregressive conditional heteroscedasticity (ARCH)

See Judge et al (1985, 424ff) for an overview of estimators.
When the form of the heteroscedasticity is unknown, we can get consistent estimates of var(β̂_OLS) using a heteroscedasticity consistent covariance matrix (HCCM).

var(β̂_OLS) = (X′X)⁻¹X′ΩX(X′X)⁻¹

HCCM: estimate ω̂_ii = (e_i − 0)² = e_i², so that we have the variance estimator

var(β̂_OLS) = (X′X)⁻¹X′diag(e_i²)X(X′X)⁻¹

Since there are several variations, this one is called HC0.
Residuals vs errors

Note that:

h_ii = x_i(X′X)⁻¹x_i′
var(e_i) = σ²(1 − h_ii) ≤ σ²,

therefore var(e_i) underestimates σ², and even when the errors (ε) are homoscedastic, the residuals (e) are not.

So e_i², used in White's HC0, is, even though consistent, a biased estimator. The small-sample properties turn out not to be very good. (Long & Ervin 2000)
HCCM variations

HC0 = (X′X)⁻¹X′diag(e_i²)X(X′X)⁻¹
HC1 = n/(n − k) · (X′X)⁻¹X′diag(e_i²)X(X′X)⁻¹ = n/(n − k) · HC0
HC2 = (X′X)⁻¹X′diag(e_i²/(1 − h_ii))X(X′X)⁻¹
HC3 = (X′X)⁻¹X′diag(e_i²/(1 − h_ii)²)X(X′X)⁻¹

Based on Monte Carlo analyses, HC3 is best in small samples. (Long & Ervin 2000)
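The sandwich form can be computed directly from these formulas; an illustrative sketch with a made-up heteroscedastic data-generating process, using base R only:

```r
# Sketch: HC0 and HC3 sandwich estimators computed from the formulas above.
set.seed(7)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = abs(x) + 0.5)   # heteroscedastic errors
m <- lm(y ~ x)

X <- model.matrix(m)
e <- residuals(m)
h <- hatvalues(m)
B <- solve(t(X) %*% X)                          # the "bread"

hc0 <- B %*% t(X) %*% diag(e^2) %*% X %*% B
hc3 <- B %*% t(X) %*% diag(e^2 / (1 - h)^2) %*% X %*% B

# HC3 inflates each e_i^2 by 1/(1 - h_ii)^2, so its variances are larger
diag(hc3) >= diag(hc0)
```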
HCCM in R

library(car)
m <- lm(...)
summary(m)
vcov <- hccm(m, type="hc3")
sqrt(diag(vcov))

(See notes for manual version.)
Another solution for dealing with heteroscedasticity is to bootstrap to acquire standard errors.

se <- NULL
for (i in 1:1000) {
  sel <- sample(1:n, n, TRUE)
  mbs <- lm(y ~ x1 + x2, data=data[sel,])
  se <- rbind(se, sqrt(diag(vcov(mbs))))
}
colMeans(se)
Exercise

Using the p4factor.csv data and model

dem_i = β₀ + β₁cwar_i + β₃laggdppc_i + β₄pnbdem_i + β₆egr_i + ε_i,

calculate the standard errors using:
- normal OLS estimation;
- the four HCCM variations;
- bootstrapping.
Residual plots: heteroscedasticity

To detect heteroscedasticity (unequal variances), it is useful to plot:
- residuals against fitted values
- residuals against the dependent variable
- residuals against independent variable(s)

Usually, the first one is sufficient to detect heteroscedasticity, and can simply be found by:

m <- lm(y ~ x)
plot(m)
[Illustrative heteroscedastic example: scatterplot of y against x and residual plots showing a fanning-out pattern.]
[Illustrative homoscedastic example: scatterplot and residual plots showing constant spread.]
[Another heteroscedastic example: scatterplot and residual plots with non-constant spread.]
Known groups

One way of testing for heteroscedasticity, if you expect that the variances might differ between two groups, is to run two separate regressions for the two groups:

[SSR₁/(n₁ − k)] / [SSR₂/(n₂ − k)] ~ F(n₁ − k, n₂ − k)

H₀: σ₁² = σ₂²

(Wallace & Silver 1988: 267)
Known groups

For example,

f <- dem ~ cwar + laggdppc + pnbdem
m1 <- lm(f, data=p4, subset=(bautlag == 1))
m2 <- lm(f, data=p4, subset=(bautlag == 0))
ssr1 <- sum(residuals(m1)^2)
ssr2 <- sum(residuals(m2)^2)
F <- (ssr1/m1$df) / (ssr2/m2$df)
1 - pf(F, m1$df, m2$df)
Breusch-Pagan test

σ_i² = f(Zα),  α = [α₀ α*′]′

H₀: α* = 0
H₁: α* ≠ 0

with f(Zα) being any function of Zα that does not depend on t.

So this includes scenarios where σ_i² = (Zα)², or σ_i² = Zα, or σ_i² = e^(Zα). If Z contains dummies for groups, it also includes heteroscedasticity due to different variances across subgroups.

Assumes ε_i ~ N(0, σ_i²).
Breusch-Pagan test

η = q′Z(Z′Z)⁻¹Z′q / (2σ̂⁴) ~ χ²(s − 1) asymptotically,

where q_i = e_i² − σ̂², σ̂² = e′e/n, and Z is an n × s matrix of exogenous variables.
Breusch-Pagan test

m <- lm(y ~ x)
r <- residuals(m)
s2 <- sum(r^2) / n
q <- r^2 - s2
Z <- cbind(1, x)
s <- dim(Z)[2]
eta <- (t(q) %*% Z %*% solve(t(Z) %*% Z) %*% t(Z) %*% q)
eta <- eta / (2 * s2^2)
p <- 1 - pchisq(eta, s-1)

Or, using the lmtest package:

bptest(m, studentize=F)
Breusch-Pagan test

With more than one independent variable, an alternative approach is to look at an auxiliary regression:

e_i² = γ₀ + γ₁ŷ_i² + v_i

If the model is homoscedastic and the variance is unrelated to ŷ, then H₀: γ₁ = 0. For this regression, nR² ~ χ²(1).

summary(lm(residuals(m)^2 ~ fitted(m)))$r.sq * n

(Thomas 1985, 96-97)
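The auxiliary-regression version can be sketched end-to-end on simulated data (a made-up heteroscedastic data-generating process, regressing e² on the fitted values as in the one-liner above):

```r
# Sketch: Breusch-Pagan via n * R^2 from the auxiliary regression.
set.seed(8)
n <- 300
x <- runif(n, 1, 5)
y <- 1 + 2 * x + rnorm(n, sd = x)   # error sd grows with x
m <- lm(y ~ x)

aux <- lm(residuals(m)^2 ~ fitted(m))
stat <- n * summary(aux)$r.squared
pval <- 1 - pchisq(stat, df = 1)
pval                                 # small: homoscedasticity rejected
```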
Breusch-Pagan test

library(lmtest)
m <- lm(y ~ x)
bptest(m)
bptest(m, ~ z1 + z2)

By default, R assumes Z = X.
Goldfeld-Quandt test

To run a Goldfeld-Quandt test:
1 Omit r central observations from the data
2 Run two separate regressions, one for the first (n − r)/2 observations and one for the last
3 Calculate R = SSR₁/SSR₂
4 Perform a test based on R ~ F(½(n − r − 2k), ½(n − r − 2k))

(Judge et al 1985, 449)
Goldfeld-Quandt test

m <- lm(y ~ x)
m1 <- lm(y[1:20] ~ x[1:20])
m2 <- lm(y[(n-20):n] ~ x[(n-20):n])
ssr1 <- sum(residuals(m1)^2)
ssr2 <- sum(residuals(m2)^2)
R <- ssr1/ssr2
p1 <- 1 - pf(R, 18, 18)
p2 <- 1 - pf(1/R, 18, 18)

Or, using the lmtest package:

library(lmtest)
gqtest(m, n-40)
Harrison-McCabe test

For the above tests we always run several regressions, because even if errors are uncorrelated, residuals are not independent. If residuals are not independent, a ratio of subsets of these residuals does not have an F-distribution, while if we run separate regressions, the residuals will be independent (if the errors are) and such a ratio will have an F-distribution.

Harrison & McCabe (1979) suggest that such a ratio of subsets of the residuals does have a β-distribution, however.
Harrison-McCabe test

[Figure: the Harrison-McCabe statistic plotted with its upper and lower bounds, E(b), and the 95% confidence interval for a two-tailed test.]
White's test

One solution for dealing with heteroscedasticity is calculating White's heteroscedasticity-corrected standard errors. The reasoning behind the White test is very straightforward: if there is homoscedasticity, the corrected standard errors should not be significantly different from the normal ones.
White's test

1 Regress e_i² on x_i, all the variables in x_i squared, and all cross-products of x_i; e.g. if

y_i = β₀ + β₁x_i1 + β₂x_i2 + ε_i

then run the regression

e_i² = γ₀ + γ₁x_i1 + γ₂x_i2 + γ₃x_i1² + γ₄x_i2² + γ₅x_i1x_i2 + v_i

and calculate R²;

2 Perform a test on the basis of nR² ~ χ²(p − 1), whereby p is the number of regressors in the auxiliary regression (6 in the example).

(Greene 2003, 222)
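The two steps can be sketched directly on simulated data (a made-up two-regressor data-generating process with variance depending on x1):

```r
# Sketch: White's test via its auxiliary regression.
set.seed(11)
n <- 300
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 + x2 + rnorm(n, sd = exp(0.5 * x1))   # variance depends on x1
m <- lm(y ~ x1 + x2)

aux <- lm(residuals(m)^2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))
stat <- n * summary(aux)$r.squared
pval <- 1 - pchisq(stat, df = 5)    # p - 1 = 6 - 1 regressors
pval                                 # small: homoscedasticity rejected
```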
White's test in R

m <- lm(y ~ x1 + x2)
bptest(m, ~ x1 * x2 + I(x1^2) + I(x2^2))

I.e. there does not appear to be an implementation of White's test in R, but it is equivalent to the Breusch-Pagan test with the independent variables as discussed.
Heteroscedasticity tests

In general:
- many of these tests require some idea about the shape of the heteroscedasticity;
- many of these tests have weak power, depending on the type of heteroscedasticity;
- if there is good reason to suspect heteroscedasticity, it is generally better to just use some robust estimation rather than test first: the tests are not reliable enough.
Exercise

Using the p4factor.csv data in 1998 and model

dem_i = β₀ + β₁cwar_i + β₃laggdppc_i + β₄pnbdem_i + β₆egr_i + ε_i,

run tests for heteroscedasticity:
- Breusch-Pagan;
- Goldfeld-Quandt;
- Harrison-McCabe;
- White.
Exercise

Test the following model for heteroscedasticity and calculate corrected standard errors:

library(lmtest)
data(unemployment)
myunemployment <- window(unemployment, start=1895, end=1956)
time <- 6:67
modelrea <- UN ~ log(m/p) + log(g) + log(x) + time
m <- lm(modelrea, data = myunemployment)
More informationReview of Econometrics
Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,
More informationRegression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin
Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n
More informationOutline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity
1/25 Outline Basic Econometrics in Transportation Heteroscedasticity What is the nature of heteroscedasticity? What are its consequences? How does one detect it? What are the remedial measures? Amir Samimi
More informationMultiple Linear Regression
Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from
More informationDealing with Heteroskedasticity
Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing
More informationHeteroskedasticity and Autocorrelation
Lesson 7 Heteroskedasticity and Autocorrelation Pilar González and Susan Orbe Dpt. Applied Economics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 7. Heteroskedasticity
More informationIntroduction to regression. Paul Schrimpf. Paul Schrimpf. UBC Economics 326. January 23, 2018
Introduction UBC Economics 326 January 23, 2018 Review of last week Expectations and conditional expectations Linear Iterated expectations Asymptotics using large sample distribution to approximate finite
More informationLecture 1: Linear Models and Applications
Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation
More informationHomoskedasticity. Var (u X) = σ 2. (23)
Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This
More informationEconometrics Multiple Regression Analysis: Heteroskedasticity
Econometrics Multiple Regression Analysis: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19 Properties
More information2017 Source of Foreign Income Earned By Fund
2017 Source of Foreign Income Earned By Fund Putnam Emerging Markets Equity Fund EIN: 26-2670607 FYE: 08/31/2017 Statement Pursuant to 1.853-4: The fund is hereby electing to apply code section 853 for
More informationMultivariate Regression Analysis
Matrices and vectors The model from the sample is: Y = Xβ +u with n individuals, l response variable, k regressors Y is a n 1 vector or a n l matrix with the notation Y T = (y 1,y 2,...,y n ) 1 x 11 x
More informationIntroductory Econometrics
Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 11, 2012 Outline Heteroskedasticity
More informationIntroduction to Estimation Methods for Time Series models. Lecture 1
Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation
More informationMulticollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.
Multicollinearity Read Section 7.5 in textbook. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Example of multicollinear
More informationHow Well Are Recessions and Recoveries Forecast? Prakash Loungani, Herman Stekler and Natalia Tamirisa
How Well Are Recessions and Recoveries Forecast? Prakash Loungani, Herman Stekler and Natalia Tamirisa 1 Outline Focus of the study Data Dispersion and forecast errors during turning points Testing efficiency
More informationLECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity
LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationCh 3: Multiple Linear Regression
Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery
More informationThe general linear regression with k explanatory variables is just an extension of the simple regression as follows
3. Multiple Regression Analysis The general linear regression with k explanatory variables is just an extension of the simple regression as follows (1) y i = β 0 + β 1 x i1 + + β k x ik + u i. Because
More informationNote on Bivariate Regression: Connecting Practice and Theory. Konstantin Kashin
Note on Bivariate Regression: Connecting Practice and Theory Konstantin Kashin Fall 2012 1 This note will explain - in less theoretical terms - the basics of a bivariate linear regression, including testing
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationLinear Regression. Junhui Qian. October 27, 2014
Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency
More informationØkonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning
Økonomisk Kandidateksamen 2004 (I) Econometrics 2 Rettevejledning This is a closed-book exam (uden hjælpemidler). Answer all questions! The group of questions 1 to 4 have equal weight. Within each group,
More informationMa 3/103: Lecture 24 Linear Regression I: Estimation
Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the
More informationMotivation for multiple regression
Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope
More informationMa 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA
Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA March 6, 2017 KC Border Linear Regression II March 6, 2017 1 / 44 1 OLS estimator 2 Restricted regression 3 Errors in variables 4
More informationRegression Diagnostics for Survey Data
Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics
More informationLecture 1: OLS derivations and inference
Lecture 1: OLS derivations and inference Econometric Methods Warsaw School of Economics (1) OLS 1 / 43 Outline 1 Introduction Course information Econometrics: a reminder Preliminary data exploration 2
More informationDealing With Endogeneity
Dealing With Endogeneity Junhui Qian December 22, 2014 Outline Introduction Instrumental Variable Instrumental Variable Estimation Two-Stage Least Square Estimation Panel Data Endogeneity in Econometrics
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationReview of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley
Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate
More informationIntroduction to Econometrics. Heteroskedasticity
Introduction to Econometrics Introduction Heteroskedasticity When the variance of the errors changes across segments of the population, where the segments are determined by different values for the explanatory
More informationRegression diagnostics
Regression diagnostics Leiden University Leiden, 30 April 2018 Outline 1 Error assumptions Introduction Variance Normality 2 Residual vs error Outliers Influential observations Introduction Errors and
More informationCOMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION
COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION Answer all parts. Closed book, calculators allowed. It is important to show all working,
More informationECON Introductory Econometrics. Lecture 13: Internal and external validity
ECON4150 - Introductory Econometrics Lecture 13: Internal and external validity Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 9 Lecture outline 2 Definitions of internal and external
More informationLecture 5: Omitted Variables, Dummy Variables and Multicollinearity
Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity R.G. Pierse 1 Omitted Variables Suppose that the true model is Y i β 1 + β X i + β 3 X 3i + u i, i 1,, n (1.1) where β 3 0 but that the
More informationSimple Linear Regression
Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.
More informationRemedial Measures for Multiple Linear Regression Models
Remedial Measures for Multiple Linear Regression Models Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Remedial Measures for Multiple Linear Regression Models 1 / 25 Outline
More informationGraduate Econometrics Lecture 4: Heteroskedasticity
Graduate Econometrics Lecture 4: Heteroskedasticity Department of Economics University of Gothenburg November 30, 2014 1/43 and Autocorrelation Consequences for OLS Estimator Begin from the linear model
More informationSTAT 4385 Topic 06: Model Diagnostics
STAT 4385 Topic 06: Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 1/ 40 Outline Several Types of Residuals Raw, Standardized, Studentized
More informationIntroductory Econometrics
Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction
More informationInstrumental Variables, Simultaneous and Systems of Equations
Chapter 6 Instrumental Variables, Simultaneous and Systems of Equations 61 Instrumental variables In the linear regression model y i = x iβ + ε i (61) we have been assuming that bf x i and ε i are uncorrelated
More informationEmpirical Economic Research, Part II
Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction
More information14 Multiple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in
More informationRegression diagnostics
Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model
More informationThe Simple Regression Model. Part II. The Simple Regression Model
Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square
More informationFinal Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58
Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple
More informationL7: Multicollinearity
L7: Multicollinearity Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Introduction ï Example Whats wrong with it? Assume we have this data Y
More informationMulticollinearity and A Ridge Parameter Estimation Approach
Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 11-1-016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com
More information04 June Dim A W V Total. Total Laser Met
4 June 218 Member State State as on 4 June 218 Acronyms are listed in the last page of this document. AUV Mass and Related Quantities Length PR T TF EM Mass Dens Pres F Torq Visc H Grav FF Dim A W V Total
More informationPANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1
PANEL DATA RANDOM AND FIXED EFFECTS MODEL Professor Menelaos Karanasos December 2011 PANEL DATA Notation y it is the value of the dependent variable for cross-section unit i at time t where i = 1,...,
More informationIris Wang.
Chapter 10: Multicollinearity Iris Wang iris.wang@kau.se Econometric problems Multicollinearity What does it mean? A high degree of correlation amongst the explanatory variables What are its consequences?
More informationSTAT5044: Regression and Anova
STAT5044: Regression and Anova Inyoung Kim 1 / 49 Outline 1 How to check assumptions 2 / 49 Assumption Linearity: scatter plot, residual plot Randomness: Run test, Durbin-Watson test when the data can
More informationSimple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.
Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1
More informationEconometrics. 9) Heteroscedasticity and autocorrelation
30C00200 Econometrics 9) Heteroscedasticity and autocorrelation Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Heteroscedasticity Possible causes Testing for
More information2 Prediction and Analysis of Variance
2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering
More informationEconometrics of Panel Data
Econometrics of Panel Data Jakub Mućk Meeting # 4 Jakub Mućk Econometrics of Panel Data Meeting # 4 1 / 30 Outline 1 Two-way Error Component Model Fixed effects model Random effects model 2 Non-spherical
More informationChapter 2 Multiple Regression I (Part 1)
Chapter 2 Multiple Regression I (Part 1) 1 Regression several predictor variables The response Y depends on several predictor variables X 1,, X p response {}}{ Y predictor variables {}}{ X 1, X 2,, X p
More informationHeteroskedasticity. Part VII. Heteroskedasticity
Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least
More informationEconometrics Master in Business and Quantitative Methods
Econometrics Master in Business and Quantitative Methods Helena Veiga Universidad Carlos III de Madrid Models with discrete dependent variables and applications of panel data methods in all fields of economics
More informationLeast Squares Estimation-Finite-Sample Properties
Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions
More informationUnit 10: Simple Linear Regression and Correlation
Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationEcon 510 B. Brown Spring 2014 Final Exam Answers
Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationHeteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance.
Heteroskedasticity y i = β + β x i + β x i +... + β k x ki + e i where E(e i ) σ, non-constant variance. Common problem with samples over individuals. ê i e ˆi x k x k AREC-ECON 535 Lec F Suppose y i =
More informationRef.: Spring SOS3003 Applied data analysis for social science Lecture note
SOS3003 Applied data analysis for social science Lecture note 05-2010 Erling Berge Department of sociology and political science NTNU Spring 2010 Erling Berge 2010 1 Literature Regression criticism I Hamilton
More informationDay 4: Shrinkage Estimators
Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have
More informationECON Introductory Econometrics. Lecture 6: OLS with Multiple Regressors
ECON4150 - Introductory Econometrics Lecture 6: OLS with Multiple Regressors Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 6 Lecture outline 2 Violation of first Least Squares assumption
More informationStatistics 910, #5 1. Regression Methods
Statistics 910, #5 1 Overview Regression Methods 1. Idea: effects of dependence 2. Examples of estimation (in R) 3. Review of regression 4. Comparisons and relative efficiencies Idea Decomposition Well-known
More informationPanel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63
1 / 63 Panel Data Models Chapter 5 Financial Econometrics Michael Hauser WS17/18 2 / 63 Content Data structures: Times series, cross sectional, panel data, pooled data Static linear panel data models:
More informationThe Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University
The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor
More informationIntroductory Econometrics
Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 17, 2012 Outline Heteroskedasticity
More informationReference: Davidson and MacKinnon Ch 2. In particular page
RNy, econ460 autumn 03 Lecture note Reference: Davidson and MacKinnon Ch. In particular page 57-8. Projection matrices The matrix M I X(X X) X () is often called the residual maker. That nickname is easy
More informationReview: Second Half of Course Stat 704: Data Analysis I, Fall 2014
Review: Second Half of Course Stat 704: Data Analysis I, Fall 2014 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2014 1 / 13 Chapter 8: Polynomials & Interactions
More informationThe regression model with one fixed regressor cont d
The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8
More informationSolow model: Convergence
Solow model: Convergence Per capita income k(0)>k* Assume same s, δ, & n, but no technical progress y* k(0)=k* k(0) k Assume same s, δ, &
More informationEconometrics of Panel Data
Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within
More informationVolume 31, Issue 1. Mean-reverting behavior of consumption-income ratio in OECD countries: evidence from SURADF panel unit root tests
Volume 3, Issue Mean-reverting behavior of consumption-income ratio in OECD countries: evidence from SURADF panel unit root tests Shu-Yi Liao Department of Applied Economics, National Chung sing University,
More informationETH Zürich, October 25, 2010
Marcel Dettling Institute for Data nalysis and Process Design Zurich University of pplied Sciences marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling t t th h/ ttli ETH Zürich, October 25, 2010 1 Mortality
More informationPeter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall
More informationGov 2000: 9. Regression with Two Independent Variables
Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics
More informationPOLSCI 702 Non-Normality and Heteroskedasticity
Goals of this Lecture POLSCI 702 Non-Normality and Heteroskedasticity Dave Armstrong University of Wisconsin Milwaukee Department of Political Science e: armstrod@uwm.edu w: www.quantoid.net/uwm702.html
More informationTesting Linear Restrictions: cont.
Testing Linear Restrictions: cont. The F-statistic is closely connected with the R of the regression. In fact, if we are testing q linear restriction, can write the F-stastic as F = (R u R r)=q ( R u)=(n
More informationReliability of inference (1 of 2 lectures)
Reliability of inference (1 of 2 lectures) Ragnar Nymoen University of Oslo 5 March 2013 1 / 19 This lecture (#13 and 14): I The optimality of the OLS estimators and tests depend on the assumptions of
More informationLinear Regression Models
Linear Regression Models November 13, 2018 1 / 89 1 Basic framework Model specification and assumptions Parameter estimation: least squares method Coefficient of determination R 2 Properties of the least
More information1 Motivation for Instrumental Variable (IV) Regression
ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data
More information