Multiple Regression Analysis


1 Multiple Regression Analysis

y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k + u

2. Inference

2 Assumptions of the Classical Linear Model (CLM)

- So far, we know: 1. the mean and variance of the OLS estimators; 2. that OLS is BLUE, given the Gauss-Markov assumptions
- This still leaves open the shape of the distribution of the OLS estimators
- To do classical hypothesis testing, we need to make an assumption about this distribution
- Assumption 6 (Normality): u is independent of x_1, x_2, ..., x_k, and u is normally distributed with zero mean and variance σ^2: u ~ Normal(0, σ^2)

3 CLM Assumptions (cont)

- This is a very strong assumption; it implies E(u|x) = E(u) = 0 and Var(u|x) = Var(u) = σ^2
- Assumptions 1-6 are called the Classical Linear Model (CLM) Assumptions
- Under CLM, OLS is not only BLUE but is the minimum variance unbiased estimator; it is no longer limited to the class of linear estimators

4 CLM Assumptions (cont)

- We can summarize the population assumptions of the CLM as: y|x ~ Normal(β_0 + β_1 x_1 + ... + β_k x_k, σ^2)
- While for now we just assume normality, it is clearly sometimes not the case
- Example: suppose we're interested in modeling house prices (our y variable) as a function of x variables. House prices are never below zero, but a normally distributed variable takes on negative values; wages have the same problem
- We could take the log of house prices to convert them to something close to normal; analysis of wages as the y variable often involves taking logs
- Or, if our sample is large enough, non-normality ceases to be a problem, thanks to the Central Limit Theorem

5 The homoskedastic normal distribution with a single explanatory variable

[Figure: at each value of x (x_1, x_2, ...), the conditional density f(y|x) is a normal distribution with the same variance, centered on the regression line E(y|x) = β_0 + β_1 x.]

6 Normal Sampling Distributions

- Under the CLM assumptions, conditional on the sample values of the independent variables (x): β̂_j ~ Normal(β_j, Var(β̂_j))
- Why? Because each estimated β is a linear combination of the errors, u, in the sample. Recall:
  β̂_j = β_j + [Σ_{i=1}^n (x_ij - x̄_j) u_i] / [Σ_{i=1}^n (x_ij - x̄_j)^2], and u_i ~ Normal(0, σ^2)
- Any linear combination of iid normal random variables is normal (recall Appendix C)
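The decomposition above can be checked numerically. A minimal sketch for the simple-regression case (the data, errors, and true parameters below are made up for illustration), verifying that the OLS slope equals the true β plus a weighted sum of the errors:

```python
# Check that beta1_hat = beta1 + sum(w_i * u_i), with weights
# w_i = (x_i - xbar) / sum((x_j - xbar)^2) -- hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
us = [0.3, -0.1, 0.2, -0.4, 0.1]      # errors, drawn for illustration
beta0, beta1 = 2.0, 0.5               # "true" population parameters
ys = [beta0 + beta1 * x + u for x, u in zip(xs, us)]

xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
sxx = sum((x - xbar) ** 2 for x in xs)

# The usual OLS slope formula...
beta1_hat = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx

# ...equals beta1 plus a linear combination of the errors.
beta1_decomp = beta1 + sum((x - xbar) * u for x, u in zip(xs, us)) / sxx

assert abs(beta1_hat - beta1_decomp) < 1e-12
```

Since β̂_1 is the true parameter plus normal weights times normal errors, normality of the errors carries over to the estimator.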

7 Normal Sampling Distributions (cont)

- Thus, (β̂_j - β_j) / sd(β̂_j) ~ Normal(0, 1)
- To standardize (as before), we just subtract the mean and divide by the standard deviation
- The fact that the estimated βs are distributed normally has further implications we will use: 1) any linear combination of the estimated βs will be normally distributed; 2) any subset of the estimated βs will be jointly normally distributed

8 The t Test

- Testing hypotheses about a single population parameter
- Remember, the β_j are unknown; however, we can hypothesize about the value of β_j and use statistical inference to test our hypothesis
- To perform a hypothesis test, we first note that under CLM:
  (β̂_j - β_j) / se(β̂_j) ~ t_{n-k-1}

9 The t Test (cont)

- Knowing the sampling distribution of the standardized estimator allows us to carry out hypothesis tests
- Start with a null hypothesis, for example H_0: β_j = 0
- The null here is making the claim that variable x_j has no effect on y

10 The t Test (cont)

- To perform our test (given this particular statement of the null hypothesis), we first form the t statistic for β̂_j:
  t_{β̂_j} ≡ (β̂_j - 0) / se(β̂_j) = β̂_j / se(β̂_j)
- Notice that we subtract the value of β_j under the null (in this case, 0) from our estimate of β_j. Intuitively, if our estimate is close to the null, our t statistic will be close to zero, and we will be unable to reject the null that β_j = 0
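The calculation above is just a ratio. A quick sketch (the coefficient estimate and standard error below are hypothetical numbers, not from any regression in these notes):

```python
def t_stat(beta_hat, se, null_value=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (beta_hat - null_value) / se

# Zero null: t takes the sign of the estimate and counts how many
# standard errors the estimate lies from 0.
t = t_stat(0.093, se=0.032)   # hypothetical beta_j_hat and its se
```

The same function handles the more general null H_0: β_j = a_j by passing `null_value=a_j`.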

11 Things to notice about the t-stat

- The denominator is always positive, so for a zero null the t statistic always takes the sign of your estimate of β_j: positive for a positive estimated β_j, negative for a negative estimated β_j
- We can interpret the t statistic as an estimate of how many standard deviations from the null (0, in this case) our estimate of β_j lies. If it equals 3, the estimate lies three standard deviations to the right of 0; if it equals -3, it lies three standard deviations to the left of 0

12 Intuition for the t Test

- Because it is a continuous random variable, the estimated β_j will never precisely equal 0, even if the underlying population parameter β_j equals 0
- We want to know how far the estimated β_j is from 0. Essentially, if our estimated β_j lies near the null, we think there is a good chance that estimate would be generated if the null were true (so we fail to reject the null); if it lies far from the null, we think it unlikely that estimate would be generated under the null, so we reject the null
- The question is: how far is far enough from the null to warrant rejection?
- We need a rejection rule!

13 t-test: One-Sided Alternatives

- Besides our null, H_0, we need an alternative hypothesis, H_1, and a significance level
- H_1 may be one-sided or two-sided
- Examples of one-sided alternatives: H_1: β_j < 0 or H_1: β_j > 0
- A two-sided alternative is H_1: β_j ≠ 0
- We also need to choose a significance level for our test (recall that the significance level is the probability of a Type I error: that is, the probability of rejecting the null when the null is true)

14 One-Sided Alternatives (cont)

- Having picked a significance level, α, we look up the (1 - α)th percentile of a t distribution with n - k - 1 df and call this c, the critical value
- With a 5% significance level and a one-sided positive alternative, c is the 95th percentile. If H_0 is true, 95% of the time our t statistic will lie below c, so 95% of the time we will correctly fail to reject the null
- If our t statistic is greater than c, then either 1) H_0 is true and an unlikely event (a Type I error) has occurred, or 2) H_0 is false
- Since the odds of (1) are very small (assuming we choose a conventional significance level like 5%), we choose the second interpretation (but never forget that (1) is a possibility!)

15 Rejection Rule

- Thus, our rejection rule for H_1: β_j > 0 is as follows:
- We reject the null hypothesis if the t statistic is greater than the critical value
- We fail to reject the null hypothesis if the t statistic is less than the critical value
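The rule above reduces to a one-line comparison. In the sketch below, c = 1.645 is roughly the large-df 95th percentile, used purely as an illustrative assumption; in practice you look up the exact critical value for your n - k - 1 df:

```python
def reject_one_sided_positive(t, c):
    """Reject H0: beta_j = 0 in favor of H1: beta_j > 0 when t > c."""
    return t > c

# Illustrative, assumed large-df 5% one-sided critical value: c = 1.645
print(reject_one_sided_positive(2.41, c=1.645))  # True: reject H0
print(reject_one_sided_positive(0.87, c=1.645))  # False: fail to reject
```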

16 One-Sided Alternative

y_i = β_0 + β_1 x_i1 + ... + β_k x_ik + u_i
H_0: β_j = 0    H_1: β_j > 0

[Figure: t distribution with a rejection region of area α to the right of the critical value c; the fail-to-reject region, of area 1 - α, lies to the left of c.]

17 One-sided vs Two-sided

- Because the t distribution is symmetric, testing H_1: β_j < 0 is straightforward: the critical value is just the negative of before
- We reject the null if the t statistic < -c (with c the positive critical value from before), while if t > -c we fail to reject the null
- For a two-sided test, we set the critical value based on α/2 and reject H_0 in favor of H_1: β_j ≠ 0 if the absolute value of the t statistic > c
- In the 5% case, c is the 97.5th percentile of the t distribution with n - k - 1 df

18 Two-Sided Alternative

y_i = β_0 + β_1 x_i1 + ... + β_k x_ik + u_i
H_0: β_j = 0    H_1: β_j ≠ 0

[Figure: t distribution with rejection regions of area α/2 in each tail, beyond -c and c; the fail-to-reject region, of area 1 - α, lies between them.]

19 Summary for H_0: β_j = 0

- Unless otherwise stated, the alternative is assumed to be two-sided; in fact, EViews automatically generates t statistics for you assuming your null value is 0 and your test is two-sided
- If we reject the null, we typically say our estimated β_j is "statistically significant at the 5% level" or "statistically different from zero (the null) at the 5% level"
- If we fail to reject the null, we typically say our estimate of β_j is "statistically insignificant" or "statistically indistinguishable from zero (the null)"

20 Testing other hypotheses

- A more general form of the t statistic recognizes that we may want to test something like H_0: β_j = a_j, where a_j is a constant (e.g. test H_0: β_j = 1)
- In this case, the appropriate t statistic is
  t ≡ (β̂_j - a_j) / se(β̂_j), where a_j = 0 for the standard test
- The t statistic tells us how many standard deviations from the null our estimate lies
- As before, this can be used with a one-sided or two-sided alternative

21 Confidence Intervals

- Another way to use classical statistical testing is to construct a confidence interval, using the same critical value as for a two-sided test
- A 100(1 - α)% confidence interval is defined as β̂_j ± c · se(β̂_j), where c is the (1 - α/2)th percentile of a t_{n-k-1} distribution
- e.g. α = 5% gives a 95% confidence interval
- This implies that the true value of β_j will be contained in the interval in 95% of random samples
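A minimal sketch of the interval calculation. The estimate and standard error below are hypothetical, and c = 1.96 is the large-df 97.5th percentile used as an assumption; small samples need the exact t critical value:

```python
def conf_interval(beta_hat, se, c):
    """(1 - alpha) confidence interval: beta_hat +/- c * se(beta_hat)."""
    return (beta_hat - c * se, beta_hat + c * se)

# Hypothetical estimate and se; assumed large-df critical value c = 1.96
lo, hi = conf_interval(0.093, se=0.032, c=1.96)

# The two-sided test of H0: beta_j = 0 at level alpha rejects exactly
# when 0 falls outside (lo, hi).
```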

22 Confidence Intervals (cont)

- Confidence intervals are easily calculated, since we have β̂_j and se(β̂_j)
- Once we have the confidence interval, it is easy to do a two-tailed test
- If testing H_0: β_j = a_j against H_1: β_j ≠ a_j, then: reject H_0 if a_j does not lie in the confidence interval for your sample; do not reject H_0 if a_j does lie in the confidence interval for your sample

23 Computing p-values for t tests

- An alternative to the classical approach is to ask: what is the smallest significance level at which the null would be rejected?
- So, compute the t statistic, and then look up what percentile it is in the appropriate t distribution; this is called the p-value
- Interpretation: the p-value gives the probability of observing a t statistic as large as we did, if the null is in fact true
- If p = 0.01, we would obtain a t statistic as big in magnitude (or bigger) in only 1% of samples, if the null is true
- Small p-values are evidence against the null hypothesis
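For large df, the t distribution is close to the standard normal, so a two-sided p-value can be sketched with the standard library's normal CDF. This is an approximation (exact t-distribution p-values require a statistics package), and the t statistic below is a hypothetical value:

```python
from statistics import NormalDist

def p_value_two_sided(t):
    """Approximate two-sided p-value, treating t as standard normal (large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

p = p_value_two_sided(2.91)   # small p-value: evidence against H0
```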

24 EViews and p-values, t tests, etc.

- Most computer packages will compute the p-value for you, assuming a two-sided test
- If you really want a one-sided alternative, just divide the two-sided p-value by 2
- EViews provides the t statistic and the p-value for H_0: β_j = 0 in columns labeled "t-Statistic" and "Prob.", respectively. It also gives you the coefficient estimate and standard error, which can be used to calculate confidence intervals

25 Testing Hypotheses about a Linear Combination of Parameters

- Suppose that instead of testing whether β_1 is equal to a constant, you want to test whether it is equal to another parameter, e.g. H_0: β_1 = β_2
- This is a hypothesis test concerning two parameters (we haven't seen this before)
- We can use the same basic procedure as before to form the t statistic if we write the null as H_0: β_1 - β_2 = 0:
  t = (β̂_1 - β̂_2) / se(β̂_1 - β̂_2)

26 Testing Linear Combo (cont)

- The problem/difficulty is calculating the standard error se(β̂_1 - β̂_2). Recall (from review) that
  Var(β̂_1 - β̂_2) = Var(β̂_1) + Var(β̂_2) - 2 Cov(β̂_1, β̂_2)
- Since [se(β̂_j)]^2 is an unbiased estimate of Var(β̂_j),
  se(β̂_1 - β̂_2) = { [se(β̂_1)]^2 + [se(β̂_2)]^2 - 2 s_12 }^(1/2)
  where s_12 is an estimate of Cov(β̂_1, β̂_2)
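The formula above can be sketched directly. All inputs below are hypothetical values standing in for regression output, including the covariance estimate s_12:

```python
import math

def se_of_difference(se1, se2, s12):
    """se(b1_hat - b2_hat) = sqrt(se(b1)^2 + se(b2)^2 - 2*s12)."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * s12)

# Hypothetical standard errors and covariance estimate
se_diff = se_of_difference(se1=0.04, se2=0.05, s12=0.0006)

# t statistic for H0: beta_1 = beta_2, with hypothetical estimates
t = (0.31 - 0.27) / se_diff
```

Note that the covariance term matters: ignoring s_12 and just combining the two standard errors gives the wrong denominator whenever the estimates are correlated.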

27 Testing a Linear Combo (cont)

- So, to use the formula, we need s_12, which standard EViews output does not give
- Many packages will have an option to get it, or will just perform the test for you
- In lab you will learn how to use EViews to test whether β_1 = β_2
- More generally, you can always restate the problem to get the test you want

28 Example

- Suppose you are interested in the effect of campaign expenditures by Party A and Party B on votes received by Party A's candidate
- Model: votea = β_0 + β_1 log(expenda) + β_2 log(expendb) + β_3 prtystra + u
- You might think that expenda and expendb should have equal yet opposite effects on votes received by Party A's candidate, i.e. H_0: β_1 = -β_2
- We could define a new parameter, θ_1, where θ_1 = β_1 + β_2

29 Example (cont)

- If we could generate an estimate of se(θ̂_1), we could test H_0: θ_1 = β_1 + β_2 = 0
- We can do this by rewriting the model. Recall β_1 = θ_1 - β_2, so substitute in and rearrange:
  votea = β_0 + θ_1 log(expenda) + β_2 [log(expendb) - log(expenda)] + β_3 prtystra + u
- This is the same model as before, but now we get a direct estimate of θ_1 and its standard error, which makes for easier hypothesis testing

30 Example (cont)

- The t statistic can then be written: t = θ̂_1 / se(θ̂_1)
- In fact, any linear combination of parameters can be tested in a similar manner
- Other examples of hypotheses about a single linear combination of parameters: β_1 = 1 + β_2; β_1 = 5β_2; β_1 = -(1/2)β_2; etc.

31 Multiple Linear Restrictions

- Everything we've done so far has involved testing a single linear restriction (e.g. β_1 = 0 or β_1 = β_2)
- However, we may want to jointly test multiple hypotheses about our parameters
- A typical example is testing exclusion restrictions: we want to know if a group of parameters are all equal to zero
- Example: suppose that you are a college administrator and you want to know what determines which students will do well

32 Exclusion Restrictions

- You might estimate the following model (US-applicable):
  ColGPA = β_0 + β_1 highscgpa + β_2 VerbSAT + β_3 MathSAT + u
- Once we control for high-school GPA, do the SAT scores have any effect on (or at least predictive power for) college GPA?
- Here, the null hypothesis is H_0: β_2 = 0, β_3 = 0 (each of these is referred to as an "exclusion restriction"); the alternative hypothesis is that H_0 is not true
- If we can't reject the null, it suggests we should drop VerbSAT and MathSAT from the model
- While it is tempting to do separate t tests on β_2 and β_3, this is not appropriate, because each t test assumes no restrictions are simultaneously being placed on the other parameter values (i.e. the t test is designed for a single restriction)

33 Testing Exclusion Restrictions

- We want to know if the parameters are jointly significant at a given level
- For example, it is possible that neither β_2 nor β_3 is individually significant at, say, the 5% level, but that they are jointly significant at that level
- To test the restrictions we need to estimate: 1. the restricted model (without x_2 and x_3); 2. the unrestricted model (with x_1, x_2, and x_3)
- We can use the SSRs from each of these regressions to form a test statistic

34 Exclusion Restrictions (cont)

- Clearly, the SSR will be greater in the restricted model (with fewer right-hand-side variables, we explain less of the variation in y and so get larger residuals, on average)
- Intuitively, we want to know if the increase in SSR is big enough to warrant inclusion of x_2 and x_3
- The F statistic in the general case with q exclusion restrictions is:
  F ≡ [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)]
  where r refers to the restricted model and ur refers to the unrestricted model
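The formula is a simple ratio of two sums of squares. A sketch of the computation (the SSRs, q, n, and k below are invented for illustration):

```python
def f_stat(ssr_r, ssr_ur, q, n, k):
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)]."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# Hypothetical inputs: q = 2 restrictions, n = 104 observations, k = 3
# regressors in the unrestricted model, so the df are 2 and 100.
F = f_stat(ssr_r=200.0, ssr_ur=180.0, q=2, n=104, k=3)
```

The resulting F would then be compared to the critical value from an F distribution with q and n - k - 1 degrees of freedom.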

35 The F statistic

- The F statistic is always positive, since the SSR from the restricted model can't be less than the SSR from the unrestricted model
- Essentially, the F statistic measures the relative increase in SSR when moving from the unrestricted to the restricted model
- q = number of restrictions = numerator degrees of freedom = df_r - df_ur
- n - k - 1 = denominator degrees of freedom = df_ur

36 The F statistic (cont)

- To decide if the increase in SSR when we move to a restricted model is big enough to reject the exclusions, we need to know the sampling distribution of our F statistic under H_0
- It turns out that F ~ F_{q, n-k-1}, where q is the numerator degrees of freedom and n - k - 1 is the denominator degrees of freedom
- We can use the back of the text (Table G.3) or EViews to tabulate the F distribution

37 The F statistic (cont)

[Figure: F distribution density f(F); the rejection region of area α lies to the right of the critical value c, and the fail-to-reject region of area 1 - α lies to its left. Reject H_0 at the α significance level if F > c.]

38 Summary of the F statistic

- If H_0 is rejected, we say, for example, that x_2 and x_3 are "jointly significant"
- If H_0 is not rejected, the variables are "jointly insignificant"
- The F test is particularly useful when a group of the x variables are highly correlated, e.g. expenditures on hospitals, doctors, nurses, etc. We don't know which are more important, and multicollinearity won't allow for informative individual t tests (due to high standard errors of individual coefficient estimates)

39 The R^2 form of the F statistic

- Because the SSRs may be large and unwieldy, an alternative form of the formula is useful
- We use the fact that SSR = SST(1 - R^2) for any regression, so we can substitute in for SSR_r and SSR_ur:
  F ≡ [(R^2_ur - R^2_r) / q] / [(1 - R^2_ur) / (n - k - 1)]
- Again, r = restricted and ur = unrestricted
- Notice that in the numerator the unrestricted R-squared comes first
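The equivalence of the two forms can be checked numerically, since the common SST cancels. The SST and R-squared values below are hypothetical:

```python
def f_stat_r2(r2_ur, r2_r, q, n, k):
    """F = [(R2_ur - R2_r) / q] / [(1 - R2_ur) / (n - k - 1)]."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))

sst = 500.0                   # hypothetical total sum of squares
r2_r, r2_ur = 0.60, 0.64      # hypothetical restricted/unrestricted R^2
ssr_r, ssr_ur = sst * (1 - r2_r), sst * (1 - r2_ur)   # SSR = SST(1 - R^2)

F_r2 = f_stat_r2(r2_ur, r2_r, q=2, n=104, k=3)
F_ssr = ((ssr_r - ssr_ur) / 2) / (ssr_ur / 100)
assert abs(F_r2 - F_ssr) < 1e-9   # same statistic: SST cancels out
```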

40 Overall Significance

- A special case of exclusion restrictions is to test H_0: β_1 = β_2 = ... = β_k = 0
- Since the R^2 from a model with only an intercept is zero, the F statistic is simply
  F = [R^2 / k] / [(1 - R^2) / (n - k - 1)]
- If we can't reject the null, this suggests the data are consistent with the right-hand-side variables explaining none of the variation in y
- Even if the R-squared is very small, we can use this statistic to test whether the x's are explaining any of y

41 General Linear Restrictions

- The basic form of the F statistic will work for any set of linear restrictions (not just setting a couple of betas equal to zero)
- First estimate the unrestricted model, then estimate the restricted model; in each case, make note of the SSR
- Imposing the restrictions can be tricky; you will likely have to redefine variables again

42 Example

- Use the same voting model as before: votea = β_0 + β_1 log(expenda) + β_2 log(expendb) + β_3 prtystra + u
- Now suppose the null is H_0: β_1 = 1, β_3 = 0
- Substituting in the restrictions gives votea = β_0 + log(expenda) + β_2 log(expendb) + u, so use
  votea - log(expenda) = β_0 + β_2 log(expendb) + u
  as the restricted model
- Note: we can't use the R-squared form of the F statistic here, because the two models have different dependent variables and hence different SSTs

43 F-statistic Summary

- Just as with t statistics, p-values can be calculated by looking up the percentile in the appropriate F distribution
- In lab, you'll learn how to do this in EViews
- If only one exclusion is being tested, then F = t^2, and the p-values will be the same


More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Intro to Linear Regression

Intro to Linear Regression Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

Lecture 4: Testing Stuff

Lecture 4: Testing Stuff Lecture 4: esting Stuff. esting Hypotheses usually has three steps a. First specify a Null Hypothesis, usually denoted, which describes a model of H 0 interest. Usually, we express H 0 as a restricted

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

Intro to Linear Regression

Intro to Linear Regression Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

ECON The Simple Regression Model

ECON The Simple Regression Model ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In

More information

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique

More information

Lab 11 - Heteroskedasticity

Lab 11 - Heteroskedasticity Lab 11 - Heteroskedasticity Spring 2017 Contents 1 Introduction 2 2 Heteroskedasticity 2 3 Addressing heteroskedasticity in Stata 3 4 Testing for heteroskedasticity 4 5 A simple example 5 1 1 Introduction

More information

Distribution-Free Procedures (Devore Chapter Fifteen)

Distribution-Free Procedures (Devore Chapter Fifteen) Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal

More information

Review of Statistics

Review of Statistics Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and

More information

Multiple Linear Regression

Multiple Linear Regression Chapter 3 Multiple Linear Regression 3.1 Introduction Multiple linear regression is in some ways a relatively straightforward extension of simple linear regression that allows for more than one independent

More information

Chapter 8 Handout: Interval Estimates and Hypothesis Testing

Chapter 8 Handout: Interval Estimates and Hypothesis Testing Chapter 8 Handout: Interval Estimates and Hypothesis esting Preview Clint s Assignment: aking Stock General Properties of the Ordinary Least Squares (OLS) Estimation Procedure Estimate Reliability: Interval

More information

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

More information

Using SPSS for One Way Analysis of Variance

Using SPSS for One Way Analysis of Variance Using SPSS for One Way Analysis of Variance This tutorial will show you how to use SPSS version 12 to perform a one-way, between- subjects analysis of variance and related post-hoc tests. This tutorial

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

More information

appstats27.notebook April 06, 2017

appstats27.notebook April 06, 2017 Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves

More information

Chapter 11 Handout: Hypothesis Testing and the Wald Test

Chapter 11 Handout: Hypothesis Testing and the Wald Test Chapter 11 Handout: Hypothesis Testing and the Wald Test Preview No Money Illusion Theory: Calculating True] o Clever Algebraic Manipulation o Wald Test Restricted Regression Reflects Unrestricted Regression

More information

Recall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as:

Recall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as: 1 Joint hypotheses The null and alternative hypotheses can usually be interpreted as a restricted model ( ) and an model ( ). In our example: Note that if the model fits significantly better than the restricted

More information

Introduction to Econometrics. Multiple Regression

Introduction to Econometrics. Multiple Regression Introduction to Econometrics The statistical analysis of economic (and related) data STATS301 Multiple Regression Titulaire: Christopher Bruffaerts Assistant: Lorenzo Ricci 1 OLS estimate of the TS/STR

More information

Multiple Regression Analysis

Multiple Regression Analysis Chapter 4 Multiple Regression Analysis The simple linear regression covered in Chapter 2 can be generalized to include more than one variable. Multiple regression analysis is an extension of the simple

More information

Chapter 1. An Overview of Regression Analysis. Econometrics and Quantitative Analysis. What is Econometrics? (cont.) What is Econometrics?

Chapter 1. An Overview of Regression Analysis. Econometrics and Quantitative Analysis. What is Econometrics? (cont.) What is Econometrics? Econometrics and Quantitative Analysis Using Econometrics: A Practical Guide A.H. Studenmund 6th Edition. Addison Wesley Longman Chapter 1 An Overview of Regression Analysis Instructor: Dr. Samir Safi

More information

Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 1 SOLUTIONS Fall 2016 Instructor: Martin Farnham

Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 1 SOLUTIONS Fall 2016 Instructor: Martin Farnham Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 1 SOLUTIONS Fall 2016 Instructor: Martin Farnham Last name (family name): First name (given name):

More information

Lecture 2 Multiple Regression and Tests

Lecture 2 Multiple Regression and Tests Lecture 2 and s Dr.ssa Rossella Iraci Capuccinello 2017-18 Simple Regression Model The random variable of interest, y, depends on a single factor, x 1i, and this is an exogenous variable. The true but

More information

Econometrics Review questions for exam

Econometrics Review questions for exam Econometrics Review questions for exam Nathaniel Higgins nhiggins@jhu.edu, 1. Suppose you have a model: y = β 0 x 1 + u You propose the model above and then estimate the model using OLS to obtain: ŷ =

More information

1 Independent Practice: Hypothesis tests for one parameter:

1 Independent Practice: Hypothesis tests for one parameter: 1 Independent Practice: Hypothesis tests for one parameter: Data from the Indian DHS survey from 2006 includes a measure of autonomy of the women surveyed (a scale from 0-10, 10 being the most autonomous)

More information

Hint: The following equation converts Celsius to Fahrenheit: F = C where C = degrees Celsius F = degrees Fahrenheit

Hint: The following equation converts Celsius to Fahrenheit: F = C where C = degrees Celsius F = degrees Fahrenheit Amherst College Department of Economics Economics 360 Fall 2014 Exam 1: Solutions 1. (10 points) The following table in reports the summary statistics for high and low temperatures in Key West, FL from

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

ECO375 Tutorial 4 Introduction to Statistical Inference

ECO375 Tutorial 4 Introduction to Statistical Inference ECO375 Tutorial 4 Introduction to Statistical Inference Matt Tudball University of Toronto Mississauga October 19, 2017 Matt Tudball (University of Toronto) ECO375H5 October 19, 2017 1 / 26 Statistical

More information

Wednesday, October 17 Handout: Hypothesis Testing and the Wald Test

Wednesday, October 17 Handout: Hypothesis Testing and the Wald Test Amherst College Department of Economics Economics 360 Fall 2012 Wednesday, October 17 Handout: Hypothesis Testing and the Wald Test Preview No Money Illusion Theory: Calculating True] o Clever Algebraic

More information

Solutions to Problem Set 4 (Due November 13) Maximum number of points for Problem set 4 is: 66. Problem C 6.1

Solutions to Problem Set 4 (Due November 13) Maximum number of points for Problem set 4 is: 66. Problem C 6.1 Solutions to Problem Set 4 (Due November 13) EC 228 01, Fall 2013 Prof. Baum, Mr. Lim Maximum number of points for Problem set 4 is: 66 Problem C 6.1 (i) (3 pts.) If the presence of the incinerator depresses

More information

CHAPTER 9: HYPOTHESIS TESTING

CHAPTER 9: HYPOTHESIS TESTING CHAPTER 9: HYPOTHESIS TESTING THE SECOND LAST EXAMPLE CLEARLY ILLUSTRATES THAT THERE IS ONE IMPORTANT ISSUE WE NEED TO EXPLORE: IS THERE (IN OUR TWO SAMPLES) SUFFICIENT STATISTICAL EVIDENCE TO CONCLUDE

More information

STA Module 10 Comparing Two Proportions

STA Module 10 Comparing Two Proportions STA 2023 Module 10 Comparing Two Proportions Learning Objectives Upon completing this module, you should be able to: 1. Perform large-sample inferences (hypothesis test and confidence intervals) to compare

More information

σ σ MLR Models: Estimation and Inference v.3 SLR.1: Linear Model MLR.1: Linear Model Those (S/M)LR Assumptions MLR3: No perfect collinearity

σ σ MLR Models: Estimation and Inference v.3 SLR.1: Linear Model MLR.1: Linear Model Those (S/M)LR Assumptions MLR3: No perfect collinearity Comparison of SLR and MLR analysis: What s New? Roadmap Multicollinearity and standard errors F Tests of linear restrictions F stats, adjusted R-squared, RMSE and t stats Playing with Bodyfat: F tests

More information

Lecture 3: Multiple Regression

Lecture 3: Multiple Regression Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u

More information

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n = Hypothesis testing I I. What is hypothesis testing? [Note we re temporarily bouncing around in the book a lot! Things will settle down again in a week or so] - Exactly what it says. We develop a hypothesis,

More information

Correlation and regression

Correlation and regression NST 1B Experimental Psychology Statistics practical 1 Correlation and regression Rudolf Cardinal & Mike Aitken 11 / 12 November 2003 Department of Experimental Psychology University of Cambridge Handouts:

More information

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2 LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2 Data Analysis: The mean egg masses (g) of the two different types of eggs may be exactly the same, in which case you may be tempted to accept

More information

Simple Linear Regression: One Quantitative IV

Simple Linear Regression: One Quantitative IV Simple Linear Regression: One Quantitative IV Linear regression is frequently used to explain variation observed in a dependent variable (DV) with theoretically linked independent variables (IV). For example,

More information

POL 681 Lecture Notes: Statistical Interactions

POL 681 Lecture Notes: Statistical Interactions POL 681 Lecture Notes: Statistical Interactions 1 Preliminaries To this point, the linear models we have considered have all been interpreted in terms of additive relationships. That is, the relationship

More information

Chapter 8 Heteroskedasticity

Chapter 8 Heteroskedasticity Chapter 8 Walter R. Paczkowski Rutgers University Page 1 Chapter Contents 8.1 The Nature of 8. Detecting 8.3 -Consistent Standard Errors 8.4 Generalized Least Squares: Known Form of Variance 8.5 Generalized

More information

LECTURE 15: SIMPLE LINEAR REGRESSION I

LECTURE 15: SIMPLE LINEAR REGRESSION I David Youngberg BSAD 20 Montgomery College LECTURE 5: SIMPLE LINEAR REGRESSION I I. From Correlation to Regression a. Recall last class when we discussed two basic types of correlation (positive and negative).

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Measuring the fit of the model - SSR

Measuring the fit of the model - SSR Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

More information

Hypothesis Tests and Confidence Intervals in Multiple Regression

Hypothesis Tests and Confidence Intervals in Multiple Regression Hypothesis Tests and Confidence Intervals in Multiple Regression (SW Chapter 7) Outline 1. Hypothesis tests and confidence intervals for one coefficient. Joint hypothesis tests on multiple coefficients

More information

Chapter 1 Review of Equations and Inequalities

Chapter 1 Review of Equations and Inequalities Chapter 1 Review of Equations and Inequalities Part I Review of Basic Equations Recall that an equation is an expression with an equal sign in the middle. Also recall that, if a question asks you to solve

More information

Steps in Regression Analysis

Steps in Regression Analysis MGMG 522 : Session #2 Learning to Use Regression Analysis & The Classical Model (Ch. 3 & 4) 2-1 Steps in Regression Analysis 1. Review the literature and develop the theoretical model 2. Specify the model:

More information

Multiple Regression Analysis: Inference ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

Multiple Regression Analysis: Inference ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD Multiple Regression Analysis: Inference ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD Introduction When you perform statistical inference, you are primarily doing one of two things: Estimating the boundaries

More information

Problem Set 4 ANSWERS

Problem Set 4 ANSWERS Economics 20 Problem Set 4 ANSWERS Prof. Patricia M. Anderson 1. Suppose that our variable for consumption is measured with error, so cons = consumption + e 0, where e 0 is uncorrelated with inc, educ

More information