Applied Statistics and Econometrics


Applied Statistics and Econometrics
Lecture 6

Saul Lach
September 2017

Outline of Lecture 6

1 Omitted variable bias (SW 6.1)
2 Multiple regression model (SW 6.2, 6.3)
3 Measures of fit (SW 6.4)
4 The Least Squares Assumptions (SW 6.5)
5 Sampling distribution of the OLS estimator (SW 6.6)
6 Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)

Omitted variable bias

The model with a single regressor is

Y = β0 + β1 X + u.

The error u arises because of factors that influence Y but are not included in the regression. These excluded factors are omitted variables from the regression.
There are always omitted variables, and sometimes this can lead to a bias in the OLS estimator. We will study when such a bias arises and the likely direction of this bias.

Example: test scores and STR

The estimated regression line of the test score-class size relationship is

testscore = 698.9 − 2.28 STR

A likely omitted variable here is family income. Suppose that in high-income districts classes are smaller and test scores are higher.
Is 2.28 a credible estimate of the causal effect on test scores of a change in the student-teacher ratio? Probably not, because it is likely that the estimated effect of STR also reflects the impact on test scores of variations in income across districts. Districts with smaller STR have higher test scores partly because of higher income. Thus 2.28 is a larger effect (in absolute value) than the true causal effect of class size.

Omitted variable bias

The bias in the OLS estimator that occurs as a result of an omitted factor is called the omitted variable bias (OVB). Given that there are always omitted variables, it is important to understand when such an OVB occurs.
For OVB to occur, the omitted factor, which we call Z, must satisfy two conditions:
1 Z is a determinant of Y (i.e., Z is part of u).
2 Z is correlated with the regressor X.
Both conditions must hold for the omission of Z to result in omitted variable bias.

Omitted variable bias: test scores and class size

Another omitted variable could be English language ability.
1 English language ability (whether the student has English as a second language) plausibly affects standardized test scores: Z is a determinant of Y.
2 Immigrant communities tend to be less affluent and thus have smaller school budgets and higher STR: Z is correlated with X.
Accordingly, β̂1 is biased: what is the direction of the bias? That is, what is the sign of this bias? If intuition fails you, there is a formula... soon.

Conditions for OVB in CASchools data

Sometimes we can actually check these conditions (at least in a given sample). The California Schools dataset has data on the percentage of students learning English; the variable is el_pct.

. summarize el_pct
[summary statistics for el_pct: Obs, Mean, Std. Dev., Min, Max]

Is this variable correlated with STR and testscore (at least in this sample)?

. correlate el_pct str testscr
[correlation matrix of el_pct, str, testscr]

Conditions for OVB in CASchools data

[Scatterplots: testscore against english, and str against english]

Districts with a lower percentage of English learners have higher test scores.
Districts with a lower percentage of English learners have smaller classes.

OVB formula

Recall from Lecture 4 (Preliminary algebra 3 slide) that we can write

β̂1 − β1 = [Σi (Xi − X̄) ui] / [Σi (Xi − X̄)²] = [(1/n) Σi (Xi − X̄) ui] / [(1/n) Σi (Xi − X̄)²] = [(1/n) Σi (Xi − X̄) ui] / s²_X

Under assumptions LS2 and LS3 we have

β̂1 →p β1 + Cov(X, u)/Var(X)

where the second term is the OVB.
If LS1 holds, then Cov(X, u) = 0 and β̂1 →p β1 (and also E(β̂1) = β1).
If LS1 does not hold, then Cov(X, u) ≠ 0 and β̂1 →p β1 + Cov(X, u)/Var(X) ≠ β1 (and also E(β̂1) ≠ β1).

OVB formula in terms of omitted variable

The previous formula is in terms of the error term u and, although it is called the OVB, it is more general: the formula is correct irrespective of the reason for the correlation (or covariance) between u and X.
Suppose now that we assert that a variable Z is omitted from the regression. We are then saying that Z is part of u, and w.l.o.g. we can write

u = β2 Z + ε

where β2 is a coefficient. Then

Cov(X, u) = Cov(X, β2 Z + ε) = β2 Cov(X, Z)

assuming ε is uncorrelated with X.

OVB formula in terms of omitted variable

The OVB formula in this case becomes

β̂1 →p β1 + β2 Cov(X, Z)/Var(X)

where the second term is the OVB. The math makes clear the two conditions for an OVB:
1 Z is a determinant of Y ⟹ β2 ≠ 0.
2 Z is correlated with the regressor X ⟹ Cov(X, Z) ≠ 0.

OVB formula: correlation version

An alternative formulation of the OVB formula is in terms of the correlation rather than the covariance:

β̂1 →p β1 + Cov(X, u)/Var(X) = β1 + ρ_Xu (σ_u/σ_X)

β̂1 →p β1 + β2 Cov(X, Z)/Var(X) = β1 + β2 ρ_XZ (σ_Z/σ_X)
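The probability-limit formula above can be checked numerically. The sketch below uses simulated data with made-up parameter values (not the CASchools data): Z is omitted from the regression of Y on X, and the short-regression slope converges to β1 + β2 ρ_XZ σ_Z/σ_X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # large n, so the OLS slope is close to its probability limit

# True model: Y = b0 + b1*X + b2*Z + eps, with Z correlated with X.
# All parameter values are illustrative choices, not estimates.
b0, b1, b2 = 1.0, -2.0, -0.5
rho_xz, sigma_x, sigma_z = 0.6, 1.0, 2.0

X = rng.normal(0.0, sigma_x, n)
# Construct Z so that corr(X, Z) = rho_xz and sd(Z) = sigma_z
Z = sigma_z * (rho_xz * X / sigma_x + np.sqrt(1 - rho_xz**2) * rng.normal(size=n))
Y = b0 + b1 * X + b2 * Z + rng.normal(size=n)

# Short regression of Y on X alone (Z omitted): slope = sample Cov(X,Y)/Var(X)
beta1_hat = np.cov(X, Y)[0, 1] / np.var(X)

# OVB formula: plim of the short-regression slope
predicted = b1 + b2 * rho_xz * sigma_z / sigma_x  # = -2.6 here
print(beta1_hat, predicted)  # the two numbers are close
```

With these values the bias term is β2 ρ_XZ σ_Z/σ_X = (−0.5)(0.6)(2) = −0.6, so the short regression converges to −2.6 rather than the true −2.0.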

OVB formula in the test score-class size example

We usually use the OVB formula to try to sign the direction of the bias. For example, when Z is the % of English learners it is likely that β2 < 0 (the sample correlation also suggests this). And ρ_XZ is likely to be positive, ρ_XZ > 0 (also suggested by the sample correlation).
Thus

β2 ρ_XZ (σ_Z/σ_X) = (−)·(+)·(+) < 0

so that β̂1 converges to something smaller than the true parameter β1. Ignoring English learners overstates (in absolute value) the class size effect.
What is the likely sign of the bias when Z is family income?

Three ways to overcome omitted variable bias

1. Run a randomized controlled experiment in which treatment (STR) is randomly assigned: then el_pct is still a determinant of testscore, but el_pct is uncorrelated with STR. Such random experiments are unrealistic in practice.

Three ways to overcome omitted variable bias

2. Adopt the cross-tabulation approach: divide the sample into groups having approximately the same value of el_pct and analyze the relationship within groups. Problems: 1) we soon run out of data; 2) there are other determinants (e.g., family income, parental education) that are still omitted.

Three ways to overcome omitted variable bias

3. Use a regression in which the omitted variable (el_pct) is no longer omitted: include el_pct as an additional regressor in a multiple regression. This is the approach we will focus on.

Where are we?

1 Omitted variable bias (SW 6.1)
2 Multiple regression model (SW 6.2, 6.3)
3 Measures of fit (SW 6.4)
4 The Least Squares Assumptions (SW 6.5)
5 Sampling distribution of the OLS estimator (SW 6.6)
6 Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)

The multiple regression model

The population regression model (or function) is

Y = β0 + β1 X1 + β2 X2 + ... + βk Xk + u

Y is the dependent variable.
X1, X2, ..., Xk are the k independent variables (regressors).
β0 is the (unknown) intercept and β1, ..., βk are the (unknown) slopes.
u is the regression error reflecting other omitted factors affecting Y.
We assume right away that E(u | X1, X2, ..., Xk) = 0, so that the population regression line is the conditional expectation of Y given the k X's and the slope parameters can be interpreted as causal effects.

Interpretation of coefficients (slopes) in multiple regression

Consider changing X1 from x1 to x1 + Δ1, while holding all the other X's fixed.
Before the change we have

E(Y | X1 = x1, ..., Xk = xk) = β0 + β1 x1 + β2 x2 + ... + βk xk

After the change we have

E(Y | X1 = x1 + Δ1, ..., Xk = xk) = β0 + β1 (x1 + Δ1) + β2 x2 + ... + βk xk

The difference is

E(Y | X1 = x1 + Δ1, ..., Xk = xk) − E(Y | X1 = x1, ..., Xk = xk) = β1 Δ1

Interpretation of coefficients (slopes) in multiple regression

When Δ1 = 1 we have

E(Y | X1 = x1 + 1, ..., Xk = xk) − E(Y | X1 = x1, ..., Xk = xk) = β1

β1 measures the effect on (expected) Y of a unit change in X1, holding the other regressors X2, ..., Xk fixed (we also say controlling for X2, ..., Xk).
Whether this partial effect can be given a causal interpretation depends on what we assume about E(u | X1, X2, ..., Xk). If E(u | X1, X2, ..., Xk) is constant, as assumed here, then β1 is the causal effect of X1 on Y. Otherwise, it is not a causal effect. Why?
The same interpretation applies to βj, j = 2, ..., k.

The multiple regression model in the sample

The regression model (or function) in the sample is

Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + ui,  i = 1, ..., n

The i-th observation in the sample is (Yi, X1i, X2i, ..., Xki).

Estimation

To simplify the presentation we assume that we have two regressors only, k = 2:

Yi = β0 + β1 X1i + β2 X2i + ui

With two regressors, the OLS estimator solves:

min over b0, b1, b2 of  Σi (Yi − (b0 + b1 X1i + b2 X2i))²

The OLS estimator minimizes the sum of squared differences between the actual values Yi and the predictions (predicted values) b0 + b1 X1i + b2 X2i based on such b's.
This minimization problem is solved using calculus. The result is the OLS estimators of β0, β1, β2, denoted, respectively, by β̂0, β̂1, β̂2.
This generalizes the case with one regressor (k = 1).

Graphic intuition

min over b0, b1 of  Σi (Yi − b0 − b1 X1i)²     fits a line through the points in R².

min over b0, b1, b2 of  Σi (Yi − b0 − b1 X1i − b2 X2i)²     fits a plane through the points in R³.

[Plots: fitted line in (str, testscore) space; fitted plane in (str, english, testscore) space]

Matrix notation

The multiple regression model

Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + ui,  i = 1, ..., n

can be written in matrix form as

Y = Xβ + u

where

Y = (Y1, Y2, ..., Yn)′ is n × 1,

X = [ 1  X11  X21  ...  Xk1
      1  X12  X22  ...  Xk2
      ...
      1  X1n  X2n  ...  Xkn ]  is n × (k+1),

β = (β0, β1, ..., βk)′ is (k+1) × 1,  and  u = (u1, u2, ..., un)′ is n × 1.

OLS in matrix form

Using matrix notation, the minimization of the sum of squared residuals can be compactly written as

min over β of  (Y − Xβ)′(Y − Xβ)

and the first-order conditions are

X′(Y − Xβ) = 0  ⟹  X′X β = X′Y,

where X′X is (k+1) × (k+1) and β and X′Y are (k+1) × 1. This is a system of linear equations that can be solved for β (recall Ax = b, with A = X′X, x = β and b = X′Y). The solution is the OLS estimator

β̂ = (X′X)⁻¹ X′Y

provided X′X is invertible.

Example: the CASchools test score data

What happens to the coefficient on STR?

. reg testscr str
(420 observations; coefficient on str: −2.28)

. reg testscr str el_pct
(420 observations; coefficient on str: −1.10; coefficient on el_pct: −0.65)
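The closed-form solution β̂ = (X′X)⁻¹X′Y can be sketched in a few lines of NumPy. The data below are simulated with arbitrary true coefficients (not the CASchools file): we build the n × (k+1) design matrix with a column of ones, solve the normal equations, and check the answer against a library least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
# True coefficients (illustrative): beta0 = 2.0, beta1 = 1.5, beta2 = -0.7
Y = 2.0 + 1.5 * X1 - 0.7 * X2 + rng.normal(size=n)

# Design matrix with a column of ones for the intercept: n x (k+1)
X = np.column_stack([np.ones(n), X1, X2])

# Solve the normal equations X'X beta = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Same answer via least squares (numerically preferable to an explicit inverse)
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)  # close to [2.0, 1.5, -0.7]
```

Solving the normal equations directly and calling a least-squares routine give the same estimates; in practice the latter is preferred because it avoids forming (X′X)⁻¹ explicitly.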

OLS predicted values and residuals

Just as in the single-regressor model, the predicted value is

Ŷi = β̂0 + β̂1 X1i + β̂2 X2i + ... + β̂k Xki

and the residual is

ûi = Yi − Ŷi = Yi − (β̂0 + β̂1 X1i + β̂2 X2i + ... + β̂k Xki)

so that we can write

Yi = Ŷi + ûi = β̂0 + β̂1 X1i + β̂2 X2i + ... + β̂k Xki + ûi

Where are we?

1 Omitted variable bias (SW 6.1)
2 Multiple regression model (SW 6.2, 6.3)
3 Measures of fit (SW 6.4)
4 The Least Squares Assumptions (SW 6.5)
5 Sampling distribution of the OLS estimator (SW 6.6)
6 Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)
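The decomposition Yi = Ŷi + ûi is exact in any OLS fit, and the first-order conditions X′û = 0 force the residuals to sum to zero (through the constant column) and to be uncorrelated in-sample with every regressor. A quick check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X1, X2 = rng.normal(size=n), rng.normal(size=n)
# Arbitrary illustrative coefficients
Y = 1.0 + 0.5 * X1 + 2.0 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

Y_hat = X @ beta_hat   # predicted values
u_hat = Y - Y_hat      # residuals

# Y decomposes exactly into fit plus residual,
# and the first-order conditions make X'u_hat = 0 (up to rounding):
print(np.allclose(Y, Y_hat + u_hat))  # True
print(np.allclose(X.T @ u_hat, 0))    # True, so the residuals also sum to 0
```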

Measures of fit for multiple regression

Same measures as before:
SER (RMSE) = standard deviation of the residuals ûi.
R² = fraction of the variance of Y explained (accounted for) by X1, ..., Xk.
(New!) R̄² is the adjusted R²: R² adjusted for the number of regressors.

Measures of fit: SER and RMSE

As in the regression with a single regressor, the SER/RMSE measures the spread of the Y's around the estimated regression line:

SER (RMSE) = sqrt( [1/(n − k − 1)] Σi ûi² )

Measures of fit: R squared

As in the regression with a single regressor, the R² is the fraction of the variance of Y accounted for by the model (i.e., by X1, ..., Xk):

R² = ESS/TSS = 1 − SSR/TSS

where

ESS = Σi (Ŷi − Ȳ)²,  TSS = Σi (Yi − Ȳ)²,  SSR = Σi ûi²

The R² never decreases when another regressor is added (i.e., when k increases). (Why?) This is not a good feature for a measure of fit.

Measures of fit: adjusted R squared

The adjusted R², R̄², addresses this issue by penalizing you for including another regressor:

R̄² = 1 − [(n − 1)/(n − k − 1)] (SSR/TSS) = R² − [k/(n − k − 1)] (SSR/TSS)

Note that R̄² < R², but their difference tends to vanish for large n.
R̄² does not necessarily increase with k (although SSR decreases, (n − 1)/(n − k − 1) increases).
R̄² can be negative!
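A small sketch (simulated data, illustrative coefficients) that computes R², R̄² and the SER from the formulas above, and illustrates the point about adding regressors: appending a pure-noise regressor can never lower R², while R̄² applies the degrees-of-freedom penalty.

```python
import numpy as np

def r2_stats(Y, X):
    """R^2, adjusted R^2 and SER for an OLS fit of Y on X (X includes the constant)."""
    n, kp1 = X.shape                       # kp1 = k + 1 columns
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    u = Y - X @ beta
    ssr = u @ u                            # sum of squared residuals
    tss = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
    r2 = 1 - ssr / tss
    r2_adj = 1 - (n - 1) / (n - kp1) * ssr / tss
    ser = np.sqrt(ssr / (n - kp1))         # SER with the n - k - 1 correction
    return r2, r2_adj, ser

rng = np.random.default_rng(3)
n = 100
X1 = rng.normal(size=n)
Y = 1 + 2 * X1 + rng.normal(size=n)
junk = rng.normal(size=n)                  # pure noise, unrelated to Y

X_small = np.column_stack([np.ones(n), X1])
X_big = np.column_stack([np.ones(n), X1, junk])

r2_s, adj_s, _ = r2_stats(Y, X_small)
r2_b, adj_b, _ = r2_stats(Y, X_big)
print(r2_b >= r2_s)  # True: R^2 never falls when a regressor is added
print(adj_b < r2_b)  # True: adjusted R^2 sits below R^2
```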

How to interpret the simple and adjusted R squared?

A high R² (or R̄²) means that the regressors account for much of the variation in Y.
A high R² (or R̄²) does not mean that you have eliminated omitted variable bias.
A high R² (or R̄²) does not mean that you have an unbiased estimator of a causal effect.
A high R² (or R̄²) does not mean that the included variables are statistically significant: this must be determined using hypothesis tests.
Maximizing R² (or R̄²) is not a criterion we use to select regressors.

CASchools data example

Regression of testscore against STR:

testscore = 698.9 − 2.28 str,  R² = 0.05

Regression of testscore against STR and el_pct:

testscore = 686.0 − 1.10 str − 0.65 el_pct,  R² = 0.426

Adding the % of English learners substantially improves the fit of the regression. The two regressors account for almost 43% of the variation of test scores across districts.

Where are we?

1 Omitted variable bias (SW 6.1)
2 Multiple regression model (SW 6.2, 6.3)
3 Measures of fit (SW 6.4)
4 The Least Squares Assumptions (SW 6.5)
5 Sampling distribution of the OLS estimator (SW 6.6)
6 Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)

The Least Squares Assumptions for multiple regression

The multiple regression model is

Y = β0 + β1 X1 + β2 X2 + ... + βk Xk + u

The four least squares assumptions are:
Assumption #1: The conditional distribution of u given all the X's has mean zero, that is, E(u | X1 = x1, ..., Xk = xk) = 0 for all (x1, ..., xk).
Assumption #2: (Yi, X1i, ..., Xki), i = 1, ..., n, are i.i.d.
Assumption #3: Large outliers in Y and the X's are unlikely: X1, ..., Xk and Y have finite fourth moments, E(Y⁴) < ∞, E(X1⁴) < ∞, ..., E(Xk⁴) < ∞.
Assumption #4: There is no perfect multicollinearity.

Assumption #1: mean independence

E(u | X1 = x1, ..., Xk = xk) = 0

Same interpretation as in the regression with a single regressor. This assumption gives a causal interpretation to the parameters (the β's).
If an omitted variable (a) belongs in the equation (so it is in u) and (b) is correlated with an included X, then this condition fails and there is OVB (omitted variable bias).
The solution, when possible, is to include the omitted variable in the regression. Usually, this assumption is more likely to hold when one controls for more factors by including them in the regression.

Assumption #2: i.i.d. sample

Same assumption as in the single-regressor model. It is satisfied automatically if the data are collected by simple random sampling.

Assumption #3: large outliers are unlikely

Same assumption as in the single-regressor model. OLS can be sensitive to large outliers, so it is recommended to check the data (via scatterplots, etc.) to make sure there are no large outliers (due to typos, coding errors, etc.). This is a technical assumption that is satisfied automatically by variables with a bounded domain.

Assumption #4: no perfect multicollinearity

This is a new assumption that applies when there is more than a single regressor.
Perfect multicollinearity occurs when one of the regressors is an exact linear function of the other regressors. Assumption #4 rules this out.
We cannot estimate the effect of, say, X1 holding all other variables constant if one of these variables is a perfect linear function of X1.
When there is perfect multicollinearity, the statistical software will let you know: it may crash, give an error message, or drop one of the regressors arbitrarily.

Including a perfectly collinear regressor in Stata

Example: generate str_new = 5 + .2*str and add it to the regression. What happens? Stata drops one of the collinear variables.

. g str_new=5+.2*str
. reg testscr str str_new
note: str_new omitted because of collinearity
(in the output table, str_new appears as 0 (omitted); the estimates on str and _cons are those from regressing testscr on str alone)

The dummy variable trap

Suppose you have a set of multiple binary (dummy) variables which are mutually exclusive and exhaustive; that is, there are multiple categories and every observation falls in one and only one category (think of region of residence: Sicily, Lazio, Tuscany, etc.).
If you include all these dummy variables and a constant in the regression, you will have perfect multicollinearity; this is sometimes called the dummy variable trap. Why is there perfect multicollinearity here?
Solutions to the dummy variable trap:
1 Omit one of the groups (e.g., Lazio), or
2 Omit the intercept.
What are the implications of (1) or (2) for the interpretation of the coefficients? We will analyze this later in an example.
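What Stata detects here can be seen directly in the linear algebra: with str_new = 5 + 0.2·str and a constant, the design matrix has rank 2 rather than 3, so X′X is singular and (X′X)⁻¹X′Y does not exist. A sketch with made-up class-size numbers (not the CASchools data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
str_ = rng.normal(20, 2, size=n)   # hypothetical student-teacher ratios
str_new = 5 + 0.2 * str_           # exact linear function of str, as in the slide

# Design matrix: constant, str, str_new
X = np.column_stack([np.ones(n), str_, str_new])

# str_new = 5*(constant column) + 0.2*(str column), so only 2 columns
# are linearly independent and the normal equations cannot be solved uniquely.
print(np.linalg.matrix_rank(X))    # 2, not 3
```

Dropping either str or str_new restores full column rank, which is exactly what the software does when it "omits" a variable.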

Assumption #4: no perfect multicollinearity

Perfect multicollinearity usually reflects a mistake in the definition of the regressors, or an oddity in the data. The solution is to modify the list of regressors so that you no longer have perfect multicollinearity.

Imperfect multicollinearity

Imperfect and perfect multicollinearity are quite different despite the similarity of their names.
Imperfect multicollinearity occurs when two or more regressors are very highly (but not perfectly) correlated.
Why the term "multicollinearity"? If two regressors are very highly correlated, their scatterplot will pretty much look like a straight line; they are "co-linear". But unless the correlation is exactly ±1, that collinearity is imperfect.

Imperfect multicollinearity

Imperfect multicollinearity implies that one or more of the regression coefficients will be imprecisely estimated.
Intuition: the coefficient on X1 is the effect of X1 holding X2 constant; but if X1 and X2 are highly correlated, there is very little variation in X1 once X2 is held constant, so the data are pretty much uninformative about what happens when X1 changes but X2 does not. This means that the variance of the OLS estimator of the coefficient on X1 will be large.
Thus, imperfect multicollinearity (correctly) results in large standard errors for one or more of the OLS coefficients.
Importantly, imperfect multicollinearity does not violate Assumption #4. The OLS regression will run.

Where are we?

1 Omitted variable bias (SW 6.1)
2 Multiple regression model (SW 6.2, 6.3)
3 Measures of fit (SW 6.4)
4 The Least Squares Assumptions (SW 6.5)
5 Sampling distribution of the OLS estimator (SW 6.6)
6 Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)
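The variance inflation from near-collinearity is easy to see in a Monte Carlo sketch with illustrative values: the sampling standard deviation of β̂1 grows sharply as corr(X1, X2) approaches 1, even though OLS still runs without complaint.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 100, 1000

def sd_of_b1(rho):
    """Monte Carlo sd of the OLS slope on X1 when corr(X1, X2) = rho."""
    b1 = np.empty(reps)
    for r in range(reps):
        X1 = rng.normal(size=n)
        X2 = rho * X1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        Y = 1.0 + X1 + X2 + rng.normal(size=n)   # true slopes are 1 (illustrative)
        X = np.column_stack([np.ones(n), X1, X2])
        b1[r] = np.linalg.lstsq(X, Y, rcond=None)[0][1]
    return b1.std()

low, high = sd_of_b1(0.1), sd_of_b1(0.95)
print(low, high)  # high is several times larger than low
```

With homoskedastic errors the theoretical sd of β̂1 is proportional to 1/sqrt(n(1 − ρ²)), so moving ρ from 0.1 to 0.95 roughly triples it, which is what the simulation shows.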

The sampling distribution of OLS

Under the four LS assumptions:
1 β̂0, β̂1, ..., β̂k are unbiased and consistent estimators of β0, β1, ..., βk.
2 The joint sampling distribution of β̂0, β̂1, ..., β̂k is well approximated by a multivariate normal distribution.
3 This implies that, in large samples, for j = 0, 1, ..., k,

β̂j ~ N(βj, σ²_β̂j)   or, equivalently,   (β̂j − βj)/σ_β̂j ~ N(0, 1)

The variance of the OLS estimator

There is a more complicated formula for the estimator of the variance of β̂j... but the software computes it for us!
As in the single-regressor case, there is a formula that holds only under homoskedasticity, i.e., when Var(u | X1, ..., Xk) is a constant that does not vary with the values of (X1, ..., Xk), and another formula that holds under heteroskedasticity.
As in the single-regressor case, we prefer the formula that is robust to heteroskedasticity because it is also correct under homoskedasticity.
Intuitively, we expect our estimator to be less precise (to have higher sampling variance) when using the same data to estimate more parameters. This is indeed correct, and the formula for the variance of β̂j (not shown) reflects this intuition, as it usually increases with the number of variables (k) included in the regression. This result prevents us from adding regressors without limit.
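A Monte Carlo sketch of these results, using simulated data with arbitrary true coefficients: across repeated samples, the OLS estimates of a single coefficient center on the true value, and roughly 95% of them fall within two standard deviations of their mean, as the normal approximation predicts.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 2000
beta1 = 1.5                                 # true coefficient on X1 (illustrative)
b1_hats = np.empty(reps)

for r in range(reps):
    X1 = rng.normal(size=n)
    X2 = 0.5 * X1 + rng.normal(size=n)      # correlated regressors are fine
    Y = 1.0 + beta1 * X1 - 0.7 * X2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), X1, X2])
    b1_hats[r] = np.linalg.lstsq(X, Y, rcond=None)[0][1]

# Unbiasedness: the estimates center on the true beta1.
print(b1_hats.mean())                       # close to 1.5

# Approximate normality: about 95% of estimates lie within 2 sd of the mean.
frac = np.mean(np.abs(b1_hats - b1_hats.mean()) < 2 * b1_hats.std())
print(frac)                                 # close to 0.95
```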

Where are we?

1 Omitted variable bias (SW 6.1)
2 Multiple regression model (SW 6.2, 6.3)
3 Measures of fit (SW 6.4)
4 The Least Squares Assumptions (SW 6.5)
5 Sampling distribution of the OLS estimator (SW 6.6)
6 Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)

Hypothesis tests and confidence intervals for a single coefficient

This follows the same logic and recipe as for the slope coefficient in a single-regressor model.
Because (β̂j − βj)/σ_β̂j is approximately distributed N(0, 1) in large samples (under the four LS assumptions), hypotheses on β1 can be tested using the usual t-statistic

t = (β̂1 − β1,0) / SE(β̂1)

and 95% confidence intervals are constructed as

β̂1 ± 1.96 SE(β̂1)

Similarly for β2, ..., βk.
β̂1 and β̂2 are generally not independently distributed, so neither are their t-statistics (more on this later).

The California school dataset

Single-regressor estimates (heteroskedasticity-robust standard errors):

. reg testscr str, robust
(420 observations; coefficient on str: −2.28)

The California school dataset

Multiple regression estimates (heteroskedasticity-robust standard errors):

. reg testscr str el_pct, robust
(420 observations; coefficient on str: −1.10, robust SE 0.43; coefficient on el_pct: −0.65)

Testing hypotheses and CIs in the California school dataset

The coefficient on STR in the multiple regression is the effect on testscore of a unit change in STR, holding constant the percentage of English learners in the district.
The coefficient on STR falls by one-half (in absolute value) when el_pct is added to the regression (does it make sense?).
The 95% confidence interval for the coefficient on STR is

{−1.10 ± 1.96 × 0.43} ≈ (−1.95, −0.25)

The t-statistic testing H0: β_STR = 0 is

t = (β̂_STR − 0)/SE(β̂_STR) = −1.10/0.43 ≈ −2.54

so we reject the null hypothesis at the 5% significance level.
We use heteroskedasticity-robust standard errors for exactly the same reasons as in the case of a single regressor.
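The slide's arithmetic can be reproduced directly from the reported coefficient and robust standard error; the small discrepancies relative to the slide come from using the rounded SE of 0.43 rather than the unrounded value.

```python
# Coefficient and robust SE on str from the multiple regression slide.
beta_hat, se = -1.10, 0.43

t = (beta_hat - 0) / se                            # t-statistic for H0: beta_STR = 0
ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)  # 95% confidence interval
reject = abs(t) > 1.96                             # reject H0 at the 5% level?

print(round(t, 2))                                 # -2.56 (slide, with unrounded SE: -2.54)
print(round(ci[0], 2), round(ci[1], 2))            # -1.94 -0.26 (slide: -1.95, -0.25)
```

Since |t| exceeds the 5% critical value 1.96 and the interval excludes 0, the two ways of stating the result agree, as they must.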


More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

Contest Quiz 3. Question Sheet. In this quiz we will review concepts of linear regression covered in lecture 2.

Contest Quiz 3. Question Sheet. In this quiz we will review concepts of linear regression covered in lecture 2. Updated: November 17, 2011 Lecturer: Thilo Klein Contact: tk375@cam.ac.uk Contest Quiz 3 Question Sheet In this quiz we will review concepts of linear regression covered in lecture 2. NOTE: Please round

More information

ECON3150/4150 Spring 2016

ECON3150/4150 Spring 2016 ECON3150/4150 Spring 2016 Lecture 4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo Last updated: January 26, 2016 1 / 49 Overview These lecture slides covers: The linear regression

More information

The F distribution. If: 1. u 1,,u n are normally distributed; and 2. X i is distributed independently of u i (so in particular u i is homoskedastic)

The F distribution. If: 1. u 1,,u n are normally distributed; and 2. X i is distributed independently of u i (so in particular u i is homoskedastic) The F distribution If: 1. u 1,,u n are normally distributed; and. X i is distributed independently of u i (so in particular u i is homoskedastic) then the homoskedasticity-only F-statistic has the F q,n-k

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Econometrics 1. Lecture 8: Linear Regression (2) 黄嘉平

Econometrics 1. Lecture 8: Linear Regression (2) 黄嘉平 Econometrics 1 Lecture 8: Linear Regression (2) 黄嘉平 中国经济特区研究中 心讲师 办公室 : 文科楼 1726 E-mail: huangjp@szu.edu.cn Tel: (0755) 2695 0548 Office hour: Mon./Tue. 13:00-14:00 The linear regression model The linear

More information

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C =

Multiple Regression. Midterm results: AVG = 26.5 (88%) A = 27+ B = C = Economics 130 Lecture 6 Midterm Review Next Steps for the Class Multiple Regression Review & Issues Model Specification Issues Launching the Projects!!!!! Midterm results: AVG = 26.5 (88%) A = 27+ B =

More information

ECON Introductory Econometrics. Lecture 16: Instrumental variables

ECON Introductory Econometrics. Lecture 16: Instrumental variables ECON4150 - Introductory Econometrics Lecture 16: Instrumental variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 12 Lecture outline 2 OLS assumptions and when they are violated Instrumental

More information

ECO321: Economic Statistics II

ECO321: Economic Statistics II ECO321: Economic Statistics II Chapter 6: Linear Regression a Hiroshi Morita hmorita@hunter.cuny.edu Department of Economics Hunter College, The City University of New York a c 2010 by Hiroshi Morita.

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

ECON Introductory Econometrics. Lecture 4: Linear Regression with One Regressor

ECON Introductory Econometrics. Lecture 4: Linear Regression with One Regressor ECON4150 - Introductory Econometrics Lecture 4: Linear Regression with One Regressor Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 4 Lecture outline 2 The OLS estimators The effect of

More information

ECON Introductory Econometrics. Lecture 17: Experiments

ECON Introductory Econometrics. Lecture 17: Experiments ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.

More information

Statistical Inference with Regression Analysis

Statistical Inference with Regression Analysis Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Steven Buck Lecture #13 Statistical Inference with Regression Analysis Next we turn to calculating confidence intervals and hypothesis testing

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics STAT-S-301 Panel Data (2016/2017) Lecturer: Yves Dominicy Teaching Assistant: Elise Petit 1 Regression with Panel Data A panel dataset contains observations on multiple entities

More information

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0 Introduction to Econometrics Midterm April 26, 2011 Name Student ID MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. (5,000 credit for each correct

More information

Econ 1123: Section 2. Review. Binary Regressors. Bivariate. Regression. Omitted Variable Bias

Econ 1123: Section 2. Review. Binary Regressors. Bivariate. Regression. Omitted Variable Bias Contact Information Elena Llaudet Sections are voluntary. My office hours are Thursdays 5pm-7pm in Littauer Mezzanine 34-36 (Note room change) You can email me administrative questions to ellaudet@gmail.com.

More information

Multivariate Regression: Part I

Multivariate Regression: Part I Topic 1 Multivariate Regression: Part I ARE/ECN 240 A Graduate Econometrics Professor: Òscar Jordà Outline of this topic Statement of the objective: we want to explain the behavior of one variable as a

More information

Chapter 2: simple regression model

Chapter 2: simple regression model Chapter 2: simple regression model Goal: understand how to estimate and more importantly interpret the simple regression Reading: chapter 2 of the textbook Advice: this chapter is foundation of econometrics.

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018 Econometrics I KS Module 1: Bivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: March 12, 2018 Alexander Ahammer (JKU) Module 1: Bivariate

More information

Lecture #8 & #9 Multiple regression

Lecture #8 & #9 Multiple regression Lecture #8 & #9 Multiple regression Starting point: Y = f(x 1, X 2,, X k, u) Outcome variable of interest (movie ticket price) a function of several variables. Observables and unobservables. One or more

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

THE MULTIVARIATE LINEAR REGRESSION MODEL

THE MULTIVARIATE LINEAR REGRESSION MODEL THE MULTIVARIATE LINEAR REGRESSION MODEL Why multiple regression analysis? Model with more than 1 independent variable: y 0 1x1 2x2 u It allows : -Controlling for other factors, and get a ceteris paribus

More information

Introduction to Econometrics. Regression with Panel Data

Introduction to Econometrics. Regression with Panel Data Introduction to Econometrics The statistical analysis of economic (and related) data STATS301 Regression with Panel Data Titulaire: Christopher Bruffaerts Assistant: Lorenzo Ricci 1 Regression with Panel

More information

4. Nonlinear regression functions

4. Nonlinear regression functions 4. Nonlinear regression functions Up to now: Population regression function was assumed to be linear The slope(s) of the population regression function is (are) constant The effect on Y of a unit-change

More information

Simple Linear Regression: The Model

Simple Linear Regression: The Model Simple Linear Regression: The Model task: quantifying the effect of change X in X on Y, with some constant β 1 : Y = β 1 X, linear relationship between X and Y, however, relationship subject to a random

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Lecture notes to Stock and Watson chapter 8

Lecture notes to Stock and Watson chapter 8 Lecture notes to Stock and Watson chapter 8 Nonlinear regression Tore Schweder September 29 TS () LN7 9/9 1 / 2 Example: TestScore Income relation, linear or nonlinear? TS () LN7 9/9 2 / 2 General problem

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

Lecture 8: Instrumental Variables Estimation

Lecture 8: Instrumental Variables Estimation Lecture Notes on Advanced Econometrics Lecture 8: Instrumental Variables Estimation Endogenous Variables Consider a population model: y α y + β + β x + β x +... + β x + u i i i i k ik i Takashi Yamano

More information

Empirical Application of Simple Regression (Chapter 2)

Empirical Application of Simple Regression (Chapter 2) Empirical Application of Simple Regression (Chapter 2) 1. The data file is House Data, which can be downloaded from my webpage. 2. Use stata menu File Import Excel Spreadsheet to read the data. Don t forget

More information

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10) Name Economics 170 Spring 2004 Honor pledge: I have neither given nor received aid on this exam including the preparation of my one page formula list and the preparation of the Stata assignment for the

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

Introductory Econometrics. Lecture 13: Hypothesis testing in the multiple regression model, Part 1

Introductory Econometrics. Lecture 13: Hypothesis testing in the multiple regression model, Part 1 Introductory Econometrics Lecture 13: Hypothesis testing in the multiple regression model, Part 1 Jun Ma School of Economics Renmin University of China October 19, 2016 The model I We consider the classical

More information

Introduction to Econometrics. Review of Probability & Statistics

Introduction to Econometrics. Review of Probability & Statistics 1 Introduction to Econometrics Review of Probability & Statistics Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com Introduction 2 What is Econometrics? Econometrics consists of the application of mathematical

More information

8. Instrumental variables regression

8. Instrumental variables regression 8. Instrumental variables regression Recall: In Section 5 we analyzed five sources of estimation bias arising because the regressor is correlated with the error term Violation of the first OLS assumption

More information

Linear Regression with one Regressor

Linear Regression with one Regressor 1 Linear Regression with one Regressor Covering Chapters 4.1 and 4.2. We ve seen the California test score data before. Now we will try to estimate the marginal effect of STR on SCORE. To motivate these

More information

Lab 07 Introduction to Econometrics

Lab 07 Introduction to Econometrics Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand

More information

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2

Economics 326 Methods of Empirical Research in Economics. Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Economics 326 Methods of Empirical Research in Economics Lecture 14: Hypothesis testing in the multiple regression model, Part 2 Vadim Marmer University of British Columbia May 5, 2010 Multiple restrictions

More information

Specification Error: Omitted and Extraneous Variables

Specification Error: Omitted and Extraneous Variables Specification Error: Omitted and Extraneous Variables Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 5, 05 Omitted variable bias. Suppose that the correct

More information

Econometrics -- Final Exam (Sample)

Econometrics -- Final Exam (Sample) Econometrics -- Final Exam (Sample) 1) The sample regression line estimated by OLS A) has an intercept that is equal to zero. B) is the same as the population regression line. C) cannot have negative and

More information

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e Economics 102: Analysis of Economic Data Cameron Spring 2016 Department of Economics, U.C.-Davis Final Exam (A) Tuesday June 7 Compulsory. Closed book. Total of 58 points and worth 45% of course grade.

More information

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation Michele Aquaro University of Warwick This version: July 21, 2016 1 / 31 Reading material Textbook: Introductory

More information

Exam ECON3150/4150: Introductory Econometrics. 18 May 2016; 09:00h-12.00h.

Exam ECON3150/4150: Introductory Econometrics. 18 May 2016; 09:00h-12.00h. Exam ECON3150/4150: Introductory Econometrics. 18 May 2016; 09:00h-12.00h. This is an open book examination where all printed and written resources, in addition to a calculator, are allowed. If you are

More information

6. Assessing studies based on multiple regression

6. Assessing studies based on multiple regression 6. Assessing studies based on multiple regression Questions of this section: What makes a study using multiple regression (un)reliable? When does multiple regression provide a useful estimate of the causal

More information

Econometrics Review questions for exam

Econometrics Review questions for exam Econometrics Review questions for exam Nathaniel Higgins nhiggins@jhu.edu, 1. Suppose you have a model: y = β 0 x 1 + u You propose the model above and then estimate the model using OLS to obtain: ŷ =

More information

Motivation for multiple regression

Motivation for multiple regression Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope

More information

Lab 6 - Simple Regression

Lab 6 - Simple Regression Lab 6 - Simple Regression Spring 2017 Contents 1 Thinking About Regression 2 2 Regression Output 3 3 Fitted Values 5 4 Residuals 6 5 Functional Forms 8 Updated from Stata tutorials provided by Prof. Cichello

More information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2016-17 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This

More information

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is

5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is Practice Final Exam Last Name:, First Name:. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

Linear Regression with 1 Regressor. Introduction to Econometrics Spring 2012 Ken Simons

Linear Regression with 1 Regressor. Introduction to Econometrics Spring 2012 Ken Simons Linear Regression with 1 Regressor Introduction to Econometrics Spring 2012 Ken Simons Linear Regression with 1 Regressor 1. The regression equation 2. Estimating the equation 3. Assumptions required for

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

2. (3.5) (iii) Simply drop one of the independent variables, say leisure: GP A = β 0 + β 1 study + β 2 sleep + β 3 work + u.

2. (3.5) (iii) Simply drop one of the independent variables, say leisure: GP A = β 0 + β 1 study + β 2 sleep + β 3 work + u. BOSTON COLLEGE Department of Economics EC 228 Econometrics, Prof. Baum, Ms. Yu, Fall 2003 Problem Set 3 Solutions Problem sets should be your own work. You may work together with classmates, but if you

More information

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity R.G. Pierse 1 Omitted Variables Suppose that the true model is Y i β 1 + β X i + β 3 X 3i + u i, i 1,, n (1.1) where β 3 0 but that the

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

CHAPTER 6: SPECIFICATION VARIABLES

CHAPTER 6: SPECIFICATION VARIABLES Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

ECON Introductory Econometrics. Lecture 13: Internal and external validity

ECON Introductory Econometrics. Lecture 13: Internal and external validity ECON4150 - Introductory Econometrics Lecture 13: Internal and external validity Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 9 Lecture outline 2 Definitions of internal and external

More information

Lab 11 - Heteroskedasticity

Lab 11 - Heteroskedasticity Lab 11 - Heteroskedasticity Spring 2017 Contents 1 Introduction 2 2 Heteroskedasticity 2 3 Addressing heteroskedasticity in Stata 3 4 Testing for heteroskedasticity 4 5 A simple example 5 1 1 Introduction

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

Econometrics. 8) Instrumental variables

Econometrics. 8) Instrumental variables 30C00200 Econometrics 8) Instrumental variables Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Thery of IV regression Overidentification Two-stage least squates

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

Econometrics I Lecture 3: The Simple Linear Regression Model

Econometrics I Lecture 3: The Simple Linear Regression Model Econometrics I Lecture 3: The Simple Linear Regression Model Mohammad Vesal Graduate School of Management and Economics Sharif University of Technology 44716 Fall 1397 1 / 32 Outline Introduction Estimating

More information

Econ 1123: Section 5. Review. Internal Validity. Panel Data. Clustered SE. STATA help for Problem Set 5. Econ 1123: Section 5.

Econ 1123: Section 5. Review. Internal Validity. Panel Data. Clustered SE. STATA help for Problem Set 5. Econ 1123: Section 5. Outline 1 Elena Llaudet 2 3 4 October 6, 2010 5 based on Common Mistakes on P. Set 4 lnftmpop = -.72-2.84 higdppc -.25 lackpf +.65 higdppc * lackpf 2 lnftmpop = β 0 + β 1 higdppc + β 2 lackpf + β 3 lackpf

More information

Problem Set #3-Key. wage Coef. Std. Err. t P> t [95% Conf. Interval]

Problem Set #3-Key. wage Coef. Std. Err. t P> t [95% Conf. Interval] Problem Set #3-Key Sonoma State University Economics 317- Introduction to Econometrics Dr. Cuellar 1. Use the data set Wage1.dta to answer the following questions. a. For the regression model Wage i =

More information

Regression #8: Loose Ends

Regression #8: Loose Ends Regression #8: Loose Ends Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #8 1 / 30 In this lecture we investigate a variety of topics that you are probably familiar with, but need to touch

More information

Econometrics - 30C00200

Econometrics - 30C00200 Econometrics - 30C00200 Lecture 11: Heteroskedasticity Antti Saastamoinen VATT Institute for Economic Research Fall 2015 30C00200 Lecture 11: Heteroskedasticity 12.10.2015 Aalto University School of Business

More information

1 Linear Regression Analysis The Mincer Wage Equation Data Econometric Model Estimation... 11

1 Linear Regression Analysis The Mincer Wage Equation Data Econometric Model Estimation... 11 Econ 495 - Econometric Review 1 Contents 1 Linear Regression Analysis 4 1.1 The Mincer Wage Equation................. 4 1.2 Data............................. 6 1.3 Econometric Model.....................

More information

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35 Rewrap ECON 4135 November 18, 2011 () Rewrap ECON 4135 November 18, 2011 1 / 35 What should you now know? 1 What is econometrics? 2 Fundamental regression analysis 1 Bivariate regression 2 Multivariate

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information