Review - Interpreting the Regression

If we estimate

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$$

it can be shown that

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} \hat{r}_{i1}\, y_i}{\sum_{i=1}^{n} \hat{r}_{i1}^2}$$

where the $\hat{r}_{i1}$ are the residuals obtained when we estimate the regression

$$\hat{x}_1 = \hat{\gamma}_0 + \hat{\gamma}_2 x_2$$

The estimated effect of $x_1$ on $y$ equals the (simple regression) estimated effect of the part of $x_1$ that is not explained by $x_2$. Note that the average of the residuals is always 0, hence the expression for the simple linear regression estimator simplifies as above. This interpretation holds in general (with more variables).
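The partialling-out result above can be checked numerically. A minimal sketch with simulated data (the variable names, coefficient values, and data-generating process are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)        # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Multiple regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Partialling out: residuals of x1 after regressing it on (1, x2)
Z = np.column_stack([np.ones(n), x2])
gamma = np.linalg.lstsq(Z, x1, rcond=None)[0]
r1 = x1 - Z @ gamma

# Simple-regression slope of y on r1 (r1 has mean 0, so no intercept needed)
beta1_partial = (r1 @ y) / (r1 @ r1)

print(beta[1], beta1_partial)   # the two estimates coincide
```

The simple-regression slope of $y$ on the residuals $\hat{r}_{i1}$ reproduces the multiple-regression coefficient on $x_1$ exactly (the Frisch-Waugh-Lovell result).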
Review - Conditions under which exclusion of variables preserves unbiasedness of estimators

Estimate the following regressions:

$$\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1$$
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$$

- If $\hat{\beta}_2 = 0$, then $\tilde{\beta}_1 = \hat{\beta}_1$ (check the first order conditions)
- If $x_1$ and $x_2$ are uncorrelated, then $\tilde{\beta}_1 = \hat{\beta}_1$
- However, in general it will be the case that $\tilde{\beta}_1 \neq \hat{\beta}_1$
Review - More or Less Variables?

In general, and assuming MLR.1 to MLR.4 hold for as many variables as those under consideration:

- If we do not include a variable and this variable is uncorrelated with the included regressors, then the OLS estimators will be unbiased. Remember, if the other factors (in $u$) are uncorrelated with the regressors, we can still interpret the estimated effects as ceteris paribus effects.
- If we do not include a variable and this variable is correlated with the included regressors, then the OLS estimators will be biased, except if the coefficient of the excluded variable is 0 in the full model.
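The comparison between the "short" and "long" regressions satisfies the exact in-sample identity $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \hat{\delta}_1$, where $\hat{\delta}_1$ is the slope from regressing $x_2$ on $x_1$: the omitted-variable distortion vanishes when $\hat{\beta}_2 = 0$ or when $x_1$ and $x_2$ are uncorrelated. A sketch with simulated data (all names and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Long regression: y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Short regression: y on (1, x1) -- x2 omitted
Z = np.column_stack([np.ones(n), x1])
b_tilde = np.linalg.lstsq(Z, y, rcond=None)[0]

# Auxiliary regression: x2 on (1, x1); slope is delta1
delta = np.linalg.lstsq(Z, x2, rcond=None)[0]

# Omitted-variable algebra: tilde_beta1 = hat_beta1 + hat_beta2 * delta1
print(b_tilde[1], b_hat[1] + b_hat[2] * delta[1])
```

Here the short-regression slope is far from the long-regression one because $x_2$ both matters and is correlated with $x_1$.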
So, always more variables?

Even if they are irrelevant (or almost irrelevant) and therefore do not induce bias in the other estimators? No! Why? Variances of the estimators can become large! It can be shown, under MLR.1 to MLR.5, that:

$$\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)}, \qquad j = 1, 2, \dots, k$$

where $SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$ and $R_j^2$ is the coefficient of determination from regressing $x_j$ on all the other regressors. It tells us how much the other regressors explain $x_j$.
Understanding OLS Variances

$$\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)}, \qquad j = 1, 2, \dots, k$$

- Strong linear relations among the independent variables are harmful: a larger $R_j^2$ implies a larger variance for the estimators (near multicollinearity).
- If some irrelevant variable is uncorrelated with the remaining regressors, then including it leaves the variances unchanged (not an interesting case).
- Typically, variables that you think would be useful but turn out to seem irrelevant are highly correlated with variables already included. This is undesirable, as the variances of the estimators become large. So avoid including these variables: the estimators for the other coefficients will remain unbiased and display a smaller variance.
- A larger $\sigma^2$ implies a larger variance of the OLS estimators.
- A larger $SST_j$ implies a smaller variance of the estimators ($SST_j$ increases with the sample size, so in large samples we should not be too worried!).
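The variance formula above agrees exactly with the usual matrix expression $\widehat{\operatorname{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}$. A sketch with simulated, deliberately collinear data (names and values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # highly collinear with x1
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
k = 2
sigma2_hat = resid @ resid / (n - k - 1)       # estimate of sigma^2

# Var(beta1_hat) from the matrix formula sigma^2 * (X'X)^{-1}
var_direct = sigma2_hat * np.linalg.inv(X.T @ X)[1, 1]

# Var(beta1_hat) from sigma^2 / (SST_1 * (1 - R_1^2))
Z = np.column_stack([np.ones(n), x2])
g = np.linalg.lstsq(Z, x1, rcond=None)[0]
r1 = x1 - Z @ g                                # part of x1 not explained by x2
sst1 = np.sum((x1 - x1.mean())**2)
R1sq = 1 - (r1 @ r1) / sst1
var_formula = sigma2_hat / (sst1 * (1 - R1sq))

print(var_direct, var_formula)   # identical up to rounding
```

With $\rho = 0.95$, $R_1^2$ is close to 0.9, so the variance is roughly ten times what it would be with an uncorrelated $x_2$.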
Review - The Gauss-Markov Theorem

Under MLR.1 to MLR.5 (the so-called Gauss-Markov assumptions) it can be shown that OLS is BLUE: the Best Linear Unbiased Estimator. Thus, if the 5 assumptions are presumed to hold, use OLS. No other linear and unbiased estimator has a variance smaller than OLS. Variances here are matrices: we are saying that $\operatorname{Var}(\tilde{\beta}) - \operatorname{Var}(\hat{\beta})$ is a positive semi-definite matrix for any other linear unbiased estimator $\tilde{\beta}$ (this implies that all individual OLS parameter estimators have smaller variance than any other linear unbiased estimator for those parameters).
Inference in the Multiple Linear Regression Model
Inference in the multiple linear regression model

Suppose you want to test whether a variable is important in explaining variation in the dependent variable:
- E.g., is the effect of tenure on wages statistically significant (i.e., different from zero)? Is the effect of height on wages statistically significant?

Or suppose you want to test whether a coefficient has a particular value:
- E.g., is the effect of one additional year of schooling on expected monthly wages equal to 200?

We need to take into account the sampling distribution of our estimators. We will check whether, under the maintained hypothesis (or null hypothesis), the observed values of certain test statistics are likely. If they are not, we say we reject the null.
Inference in the multiple linear regression model

Assumption MLR.6 (Normality): The distribution of the population error $u$ is independent of $x_1, x_2, \dots, x_k$, and $u$ is normally distributed with mean 0 and variance $\sigma^2$. We write: $u \sim \text{Normal}(0, \sigma^2)$.

Independence is stronger than MLR.4 (zero conditional mean); it implies MLR.4. Also, normality and independence imply MLR.5, so all the results regarding unbiasedness and variance of the estimators remain valid.

Normality is unrealistic in many cases (e.g., wages cannot be negative, but normality of $u$ could deliver negative wages). However, most results would hold in large samples without the normality assumption.
Classical Linear Model Assumptions

MLR.1 through MLR.6 are the Classical Linear Model assumptions. With these assumptions, one can prove that the OLS estimators are the minimum variance unbiased estimators: no other unbiased estimator has a variance smaller than OLS.
Distribution of OLS estimators

Under MLR.1 through MLR.6 it is straightforward to show that:

$$y \mid x \sim \text{Normal}(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,\ \sigma^2)$$

Also, since the OLS estimators are a linear function of the error term $u$, then (conditional on the x's):

$$\hat{\beta}_j \sim \text{Normal}(\beta_j,\ \operatorname{Var}(\hat{\beta}_j))$$

so that:

$$\frac{\hat{\beta}_j - \beta_j}{\operatorname{sd}(\hat{\beta}_j)} \sim \text{Normal}(0, 1)$$

where sd stands for standard deviation (the square root of the variance, derived in previous classes).
Distribution of OLS estimators

Now, the $\sigma^2$ that appears in the expression for the standard deviation of the estimators must be estimated. Also, conditional on the x's,

$$(n - k - 1)\,\hat{\sigma}^2 / \sigma^2 \sim \chi^2_{n-k-1}$$

which implies:

$$\frac{\hat{\beta}_j - \beta_j}{\operatorname{se}(\hat{\beta}_j)} = \frac{(\hat{\beta}_j - \beta_j)/\operatorname{sd}(\hat{\beta}_j)}{\operatorname{se}(\hat{\beta}_j)/\operatorname{sd}(\hat{\beta}_j)} = \frac{(\hat{\beta}_j - \beta_j)/\operatorname{sd}(\hat{\beta}_j)}{\sqrt{\hat{\sigma}^2/\sigma^2}} \sim \frac{\text{Normal}(0,1)}{\sqrt{\chi^2_{n-k-1}/(n-k-1)}}$$

Therefore, conditional on the x's, we have:

$$\frac{\hat{\beta}_j - \beta_j}{\operatorname{se}(\hat{\beta}_j)} \sim t_{n-k-1}$$

Degrees of freedom: $n - k - 1$ (for large $n$ this is similar to a Normal(0, 1)).
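The standard errors and t statistics can be computed directly from $\hat{\sigma}^2 (X'X)^{-1}$. A sketch with simulated data (variable names, coefficient values, and the data-generating process are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 200, 2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)   # x2 truly irrelevant

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
df = n - k - 1
sigma2_hat = resid @ resid / df                      # unbiased estimator of sigma^2
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

t_stats = beta / se                                  # t statistics for H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df)       # two-sided p-values, t_{n-k-1}
print(t_stats, p_values)
```

The t statistic on $x_1$ is huge (its true coefficient is 2), while the one on $x_2$ should be modest, since its true coefficient is 0.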
Performing a test on a coefficient

1 - Set the null hypothesis (and the alternative). E.g., $H_0: \beta_j = 0$ (coefficient on experience in our wage regression) and $H_1: \beta_j > 0$.

2 - Choose a significance level $\alpha$ (the probability of rejecting the null if the null is actually true). E.g., $\alpha = 0.05$.

3 - Look at the sampling distribution of the test statistic $t$ (a random variable) involving the parameter:

$$t = \frac{\hat{\beta}_j - \beta_j}{\operatorname{se}(\hat{\beta}_j)} \sim t_{n-k-1}$$

Under the null hypothesis, the test statistic should be small across samples. Reject the null if the observed value of the test statistic is very unlikely (very large).
Performing a test on a coefficient

4 - For one-sided tests where the alternative is favoured if $t_{obs}$ is large and positive (e.g., $H_1: \beta_j > 0$), reject the null if the observed test statistic, $t_{obs}$, is larger than $c$, where $c$ is implicitly given by: $\text{Prob}[t > c \mid H_0 \text{ is true}] = \alpha$.

For one-sided tests where the alternative is favoured if $t_{obs}$ is large and negative (e.g., $H_1: \beta_j < 0$), reject the null if the observed test statistic, $t_{obs}$, is smaller than $-c$, where $c$ is implicitly given by: $\text{Prob}[t < -c \mid H_0 \text{ is true}] = \alpha$.

For two-sided tests, where the alternative is favoured if $t_{obs}$ is large in absolute value (e.g., $H_1: \beta_j \neq 0$), reject the null if the absolute value of the observed test statistic, $|t_{obs}|$, is larger than $c$, where $c$ is implicitly given by: $\text{Prob}[|t| > c \mid H_0 \text{ is true}] = \alpha$.
One-Sided Alternative

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i$$
$$H_0: \beta_j = 0 \qquad H_1: \beta_j > 0$$

Test statistic:

$$t = \frac{\hat{\beta}_j - \beta_j}{\operatorname{se}(\hat{\beta}_j)} \sim t_{n-k-1}$$

[Figure: distribution of the test statistic under the null. If $t_{obs}$ falls in the region of probability $1 - \alpha$ to the left of the critical value $c$, we fail to reject the null; if $t_{obs}$ falls in the right tail of probability $\alpha$ beyond $c$, we reject the null.]
Two-Sided Alternatives

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i$$
$$H_0: \beta_j = 0 \qquad H_1: \beta_j \neq 0$$

Test statistic:

$$t = \frac{\hat{\beta}_j - \beta_j}{\operatorname{se}(\hat{\beta}_j)} \sim t_{n-k-1}$$

[Figure: distribution of the test statistic under the null. If $t_{obs}$ falls between $-c$ and $c$ (probability $1 - \alpha$), we fail to reject the null; if $t_{obs}$ falls in either tail beyond $-c$ or $c$ (probability $\alpha/2$ each), we reject the null.]
Example: Hypothesis testing

Dependent variable: log of wages.

[Regression output table omitted.]

The t ratios are the observed values of the test statistic for testing $\beta_j = 0$, e.g., 96.75 = 0.07614/0.00079.
Example: Hypothesis testing

Choose $\alpha = 0.05$. Test $H_0: \beta_j = 0$ (coefficient on education in our wage regression) against $H_1: \beta_j \neq 0$.

$t_{obs} = (0.07614 - 0)/0.00079 = 96.75$

$|t_{obs}| > 1.96$, so we reject the null. We say the coefficient for education is significant at the 5% level. We use the Normal approximation since $n$ is large.

[Figure: distribution of the test statistic under the null, with critical values $-c = -1.96$ and $c = 1.96$; central probability 0.95, and 0.025 in each tail.]
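The two-sided test above can be reproduced mechanically (a sketch using the slide's reported coefficient and standard error; note the rounded inputs reproduce the reported t ratio only approximately):

```python
from scipy import stats

beta_hat, se = 0.07614, 0.00079    # education coefficient and its s.e. (from the slide)
t_obs = (beta_hat - 0.0) / se      # test H0: beta = 0
alpha = 0.05
c = stats.norm.ppf(1 - alpha / 2)  # two-sided critical value, Normal approx. (large n)
reject = abs(t_obs) > c
print(t_obs, c, reject)
```

The observed statistic is far beyond the 1.96 cutoff, so the rejection is unambiguous.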
Example: Hypothesis testing

Choose $\alpha = 0.05$. Test $H_0: \beta_j = 0$ (coefficient on education in our wage regression) against $H_1: \beta_j > 0$ (clearly more reasonable).

$t_{obs} = (0.07614 - 0)/0.00079 = 96.75$

$t_{obs} > 1.645$, so we reject the null. We use the Normal approximation since $n$ is large.

[Figure: distribution of the test statistic under the null, with critical value $c = 1.645$; probability 0.95 to its left and 0.05 in the right tail.]
Example: Hypothesis testing

Choose $\alpha = 0.05$. Test $H_0: \beta_j = 0.07$ (coefficient on education in our wage regression) against $H_1: \beta_j \neq 0.07$.

$t_{obs} = (0.07614 - 0.07)/0.00079 = 7.772$

$|t_{obs}| > 1.96$, so we reject the null. We use the Normal approximation since $n$ is large.

[Figure: distribution of the test statistic under the null, with critical values $-c = -1.96$ and $c = 1.96$; central probability 0.95, and 0.025 in each tail.]
P-Value

Given the observed value of the t statistic, what would be the smallest significance level at which the null $H_0: \beta_j = 0$ would be rejected against the alternative $H_1: \beta_j \neq 0$? This is the p-value. It is given by $\text{Prob}[|t| > |t_{obs}| \mid H_0 \text{ true}]$.

[Figure: distribution of the test statistic under the null, with probability p-value/2 in each tail beyond $-|t_{obs}|$ and $|t_{obs}|$, and probability $1 -$ p-value in between.]

If $\alpha >$ p-value, we would reject the null!
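A two-sided p-value is just twice the upper-tail probability beyond $|t_{obs}|$. A sketch (the observed statistic and degrees of freedom are hypothetical values chosen for illustration):

```python
from scipy import stats

df = 1000                                    # assumed degrees of freedom (large sample)
t_obs = 2.3                                  # hypothetical observed t statistic
p_value = 2 * stats.t.sf(abs(t_obs), df)     # two-sided p-value
alpha = 0.05
print(p_value, p_value < alpha)              # reject at 5% iff p-value < alpha
```

Here the p-value is a bit above 0.02, so the null is rejected at the 5% level but not at the 1% level.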
Confidence intervals

A $(1 - \alpha) \cdot 100\%$ confidence interval is defined as:

$$\hat{\beta}_j \pm c \cdot \operatorname{se}(\hat{\beta}_j)$$

where $c$ is the $1 - \alpha/2$ percentile of the $t_{n-k-1}$ distribution.

If the hypothesized value of a parameter ($b_j$) is inside the confidence interval, we would not reject the null $\beta_j = b_j$ against $\beta_j \neq b_j$ at the significance level $\alpha$.
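The duality between confidence intervals and two-sided tests can be illustrated with the earlier education example (the degrees of freedom are an illustrative assumption for a large sample):

```python
from scipy import stats

beta_hat, se = 0.07614, 0.00079     # illustrative values from the earlier example
df = 11000                          # roughly n - k - 1 in a large sample (assumed)
alpha = 0.05
c = stats.t.ppf(1 - alpha / 2, df)  # the 1 - alpha/2 percentile of t_{df}
ci = (beta_hat - c * se, beta_hat + c * se)
print(ci)
```

The value 0.07 lies outside this 95% interval, which matches the earlier rejection of $H_0: \beta = 0.07$ at the 5% level.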
Testing multiple exclusion restrictions

Unrestricted model:

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$$

Restricted model (under $H_0: \beta_{k-q+1} = \dots = \beta_k = 0$):

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_{k-q} x_{k-q} + u$$

$H_1$: Not $H_0$.

Under the null:

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} \sim F_{q,\, n-k-1}$$

where $r$ stands for restricted, $ur$ for unrestricted, and $q$ is the number of restrictions. Does $SSR_{ur}$ decrease enough compared to $SSR_r$? If $F_{obs}$ is too large, we reject the null.
Testing multiple exclusion restrictions

$H_1$: Not $H_0$. Under the null:

$$F = \frac{(R_{ur}^2 - R_r^2)/q}{(1 - R_{ur}^2)/(n - k - 1)} \sim F_{q,\, n-k-1}$$

This expression is obtained by dividing the numerator and denominator above by $SST$. This is different from testing the significance of each coefficient individually!! It is a test of joint significance.
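The SSR form of the F statistic can be computed by estimating both models and comparing residual sums of squares. A sketch with simulated data (all names, coefficient values, and the choice of restrictions are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.3 * x2 + 0.2 * x3 + rng.normal(size=n)

def ssr(X, y):
    """Sum of squared residuals from OLS of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

k, q = 3, 2                                  # 3 regressors; test H0: beta2 = beta3 = 0
X_ur = np.column_stack([np.ones(n), x1, x2, x3])   # unrestricted model
X_r = np.column_stack([np.ones(n), x1])            # restricted model

F_obs = ((ssr(X_r, y) - ssr(X_ur, y)) / q) / (ssr(X_ur, y) / (n - k - 1))
c = stats.f.ppf(0.95, q, n - k - 1)          # 5% critical value of F_{q, n-k-1}
p = stats.f.sf(F_obs, q, n - k - 1)          # p-value
print(F_obs, c, F_obs > c, p)
```

Since $SSR_r \geq SSR_{ur}$ always holds, $F_{obs}$ is nonnegative; the question is whether the drop in SSR from adding $x_2$ and $x_3$ is large enough relative to the noise.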
Testing multiple exclusion restrictions: F test

Reject the null if the observed test statistic, $F_{obs}$, is larger than $c$, where $c$ is implicitly given by: $\text{Prob}[F > c \mid H_0 \text{ is true}] = \alpha$.

[Figure: density $f(F)$ of the F distribution under the null; fail to reject in the region of probability $1 - \alpha$ below the critical value $c$, reject in the right tail of probability $\alpha$ above $c$.]
Testing multiple exclusion restrictions: Example

Dependent variable: log of monthly wages, $n = 11064$.

[Regression output table omitted.]
Testing multiple exclusion restrictions: Example

$\alpha = 0.05$
Overall Significance of the model

Use:

$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$

$H_1$ is: Not $H_0$.

Under the null:

$$F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} \sim F_{k,\, n-k-1}$$

Testing general linear restrictions: in the practical sessions!