LECTURE 5
Introduction to Econometrics: Hypothesis Testing
October 18, 2016
ON TODAY'S LECTURE
- We are going to discuss how hypotheses about coefficients can be tested in regression models
- We will explain what significance of coefficients means
- We will learn how to read regression output
- Readings for this week: Studenmund, Chapters 5.1-5.4; Wooldridge, Chapter 4
QUESTIONS WE ASK
- What conclusions can we draw from our regression?
- What can we learn about the real world from a sample?
- Is it likely that our results could have been obtained by chance?
- If our theory is correct, what are the odds that this particular outcome would have been observed?
HYPOTHESIS TESTING
- We cannot prove that a given hypothesis is correct using hypothesis testing
- All that can be done is to state that a particular sample conforms to a particular hypothesis
- We can often reject a given hypothesis with a certain degree of confidence
- In such a case, we conclude that it is very unlikely the sample result would have been observed if the hypothesized theory were correct
NULL AND ALTERNATIVE HYPOTHESES
- First step in hypothesis testing: state explicitly the hypothesis to be tested
- Null hypothesis: statement of the range of values of the regression coefficient that would be expected to occur if the researcher's theory were not correct
- Alternative hypothesis: specification of the range of values of the coefficient that would be expected to occur if the researcher's theory were correct
- In other words: we define the null hypothesis as the result we do not expect
NULL AND ALTERNATIVE HYPOTHESES
- Notation:
  H0 ... null hypothesis
  HA ... alternative hypothesis
- Examples:
  One-sided test: H0: β ≤ 0 vs HA: β > 0
  Two-sided test: H0: β = 0 vs HA: β ≠ 0
TYPE I AND TYPE II ERRORS
- It would be unrealistic to think that conclusions drawn from regression analysis will always be right
- There are two types of errors we can make:
  Type I: we reject a true null hypothesis
  Type II: we do not reject a false null hypothesis
- Example: H0: β = 0 vs HA: β ≠ 0
  Type I error: β = 0 holds, but we conclude that β ≠ 0
  Type II error: β ≠ 0 holds, but we conclude that β = 0
TYPE I AND TYPE II ERRORS
- Example: H0: the defendant is innocent vs HA: the defendant is guilty
  Type I error = sending an innocent person to jail
  Type II error = freeing a guilty person
- Obviously, lowering the probability of Type I error means increasing the probability of Type II error
- In hypothesis testing, we focus on the Type I error and ensure that its probability is not unreasonably large
DECISION RULE
1. Calculate the sample statistic
2. Compare the sample statistic with the critical value (from the statistical tables)
- The critical value divides the range of possible values of the statistic into two regions: the acceptance region and the rejection region
- If the sample statistic falls into the rejection region, we reject H0
- If the sample statistic falls into the acceptance region, we do not reject H0
- The idea is that if the value of the coefficient does not support H0, the sample statistic should fall into the rejection region
ONE-SIDED REJECTION REGION
- H0: β ≤ 0 vs HA: β > 0
- [Figure: distribution of β̂, divided into an acceptance region and a one-sided rejection region; the shaded tail marks the probability of Type I error]
TWO-SIDED REJECTION REGION
- H0: β = 0 vs HA: β ≠ 0
- [Figure: distribution of β̂, with a central acceptance region and rejection regions in both tails; the shaded tails mark the probability of Type I error]
THE t-TEST
- We use the t-test to test hypotheses about individual regression slope coefficients
- Tests of more than one coefficient at a time (joint hypotheses) are typically done with the F-test (see next lecture)
- The t-test is appropriate when the stochastic error term is normally distributed and the variance of that distribution is unknown
- These are the usual assumptions in regression analysis
- The t-test accounts for differences in the units of measurement of the variables
THE t-TEST
- Consider the model y = β0 + β1 x1 + β2 x2 + ε
- Suppose we want to test (b is some constant): H0: β1 = b vs HA: β1 ≠ b
- We know that β̂1 ~ N(β1, Var(β̂1)), so that (β̂1 − β1) / √Var(β̂1) ~ N(0, 1)
THE t-TEST
- Problem: Var(β̂1) depends on the variance of the error term σ², which is unobservable and therefore unknown
- It has to be estimated as σ̂² := s² = e′e / (n − k), where k is the number of regression coefficients (here k = 3) and e is the vector of residuals
- We denote the standard error of β̂1 (the sample counterpart of the standard deviation σ_β̂1) as s.e.(β̂1)
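The estimator s² = e′e / (n − k) can be computed by hand. Below is a minimal NumPy sketch on simulated (hypothetical) data for the two-regressor model above; the true coefficients 1.0, 2.0 and −0.5 are made up for illustration, and the standard errors come from the usual formula Var(β̂) = s²(X′X)⁻¹.

```python
# Illustrative sketch (simulated data): estimating sigma^2 and the
# standard error of a slope coefficient with NumPy.
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3                      # n observations, k = 3 coefficients
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)   # hypothetical true model

X = np.column_stack([np.ones(n), x1, x2])            # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS estimates

e = y - X @ beta_hat               # residual vector
s2 = (e @ e) / (n - k)             # s^2 = e'e / (n - k)

cov = s2 * np.linalg.inv(X.T @ X)  # estimated Var(beta_hat) = s^2 (X'X)^{-1}
se = np.sqrt(np.diag(cov))         # standard errors = sqrt of the diagonal
print(s2, se[1])                   # s^2 and s.e. of the coefficient on x1
```

Since the simulated error has unit variance, s² should land close to 1.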
THE t-TEST
- We define the t-statistic: t := (β̂1 − β1) / s.e.(β̂1) ~ t_{n−k}, where β̂1 is the estimated coefficient and β1 is the value of the coefficient stated in our hypothesis
- This statistic depends only on the estimate β̂1 and our hypothesis about β1, and it has a known distribution
TWO-SIDED t-TEST
- Our hypothesis is H0: β1 = b vs HA: β1 ≠ b
- Hence, our t-statistic is t = (β̂1 − b) / s.e.(β̂1), where
  β̂1 is the estimated regression coefficient β1
  b is the constant from our null hypothesis
  s.e.(β̂1) is the estimated standard error of β̂1
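As a sketch of the two-sided decision rule, the snippet below tests a hypothetical H0: β1 = 0.5 for an estimate β̂1 = 0.644 with s.e. = 0.054 (figures borrowed from the wage example later in the lecture); the degrees of freedom n − k = 522 are an assumption made for illustration.

```python
# Two-sided t-test sketch: t = (beta_hat - b) / s.e.(beta_hat),
# compared with the t_{n-k} critical value at the 5% significance level.
from scipy import stats

beta_hat, se, df = 0.644, 0.054, 522   # estimate, std. error, assumed d.o.f.
b = 0.5                                # hypothesized value under H0

t_stat = (beta_hat - b) / se
t_crit = stats.t.ppf(0.975, df)        # two-sided: 2.5% in each tail

reject = abs(t_stat) > t_crit          # reject H0 if |t| exceeds the critical value
print(round(t_stat, 3), round(t_crit, 3), reject)
```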
TWO-SIDED t-TEST
- How do we determine the critical value for this test statistic?
- The critical value is the value that distinguishes the acceptance region from the rejection region
1. We set the probability of Type I error
   Let us set the probability of Type I error to 5%
   We say the significance level of the test is 5%, or that we have a test at the 95% confidence level
2. We find the critical values in the statistical tables: t_{n−k,0.975} and t_{n−k,0.025}
   The critical value depends on the chosen level of Type I error and on n − k
   Note that t_{n−k,0.975} = −t_{n−k,0.025}
TWO-SIDED t-TEST
- [Figure: t_{n−k} distribution with rejection regions of 2.5% in each tail, bounded by t_{n−k,0.025} and t_{n−k,0.975}; significance level = 5%]
- We reject H0 if |t| > t_{n−k,0.975}
ONE-SIDED t-TEST
- Suppose our hypothesis is H0: β1 ≤ b vs HA: β1 > b
- Our t-statistic is still t = (β̂1 − b) / s.e.(β̂1)
- We set the probability of Type I error to 5%
- We compare our statistic to the critical value t_{n−k,0.95}
ONE-SIDED t-TEST
- [Figure: t_{n−k} distribution with a single 5% rejection region in the right tail, bounded by t_{n−k,0.95}]
- We reject H0 if t > t_{n−k,0.95}
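The one-sided counterpart differs from the two-sided test only in the critical value (95th percentile instead of 97.5th) and in dropping the absolute value. A short sketch with made-up numbers (β̂1 = 0.30, s.e. = 0.12, n − k = 45 are all hypothetical):

```python
# One-sided t-test sketch: H0: beta1 <= b vs HA: beta1 > b.
# The whole 5% rejection region sits in the right tail.
from scipy import stats

beta_hat, se, b, df = 0.30, 0.12, 0.0, 45   # hypothetical numbers

t_stat = (beta_hat - b) / se
t_crit = stats.t.ppf(0.95, df)              # 95th percentile of t_{n-k}

print(t_stat > t_crit)                      # reject H0 if t exceeds the critical value
```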
SIGNIFICANCE OF THE COEFFICIENT
- The most common test performed in regression is H0: β = 0 vs HA: β ≠ 0, with the t-statistic t = β̂ / s.e.(β̂) ~ t_{n−k}
- If we reject H0: β = 0, we say the coefficient β is significant
- This t-statistic is displayed in most regression outputs
THE p-VALUE
- Classical approach to hypothesis testing: first choose the significance level, then test the hypothesis at the given level of significance (e.g. 5%)
- However, there is no single "correct" significance level
- We can instead ask a more informative question: what is the smallest significance level at which the null hypothesis would still be rejected? This level of significance is known as the p-value
- Remember that the significance level is the probability of Type I error; hence, the smaller the p-value, the smaller the probability of rejecting a true null hypothesis (and the greater our confidence that the null hypothesis is correctly rejected)
- The p-value for H0: β = 0 is displayed in most regression outputs
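For a two-sided test of H0: β = 0, the p-value is twice the upper-tail probability of |t| under the t_{n−k} distribution. A minimal sketch (the t-statistic 2.1 and 60 degrees of freedom are hypothetical):

```python
# p-value sketch: smallest significance level at which H0 is rejected.
from scipy import stats

t_stat, df = 2.1, 60                       # hypothetical t-statistic and d.o.f.
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value
print(round(p_value, 4))
```

Here the p-value falls just below 0.05, so this H0 would be rejected at the 5% level but not, say, at the 1% level.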
EXAMPLE
- Let us study the impact of years of education on wages: wage = β0 + β1 education + β2 experience + ε
- [Figure: regression output from Gretl]
CONFIDENCE INTERVAL
- A 95% confidence interval of β is an interval centered around β̂ such that β falls into this interval with probability 95%:
  P(β̂ − c < β < β̂ + c) = P(−c / s.e.(β̂) < (β̂ − β) / s.e.(β̂) < c / s.e.(β̂)) = 0.95
- Since (β̂ − β) / s.e.(β̂) ~ t_{n−k}, we derive the confidence interval: β̂ ± t_{n−k,0.975} · s.e.(β̂)
CONFIDENCE INTERVAL
- Output from Gretl (wage regression): confidence interval for the coefficient on education:
  β̂ ± t_{n−k,0.975} · s.e.(β̂) = 0.644 ± 1.960 · 0.054
  β ∈ [0.538; 0.750] with 95% probability
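The interval above can be reproduced directly; the calculation below uses the slide's rounded critical value 1.960 (the normal approximation for large n − k) together with the reported estimate and standard error.

```python
# Reproducing the slide's 95% confidence interval for the education coefficient.
beta_hat, se = 0.644, 0.054
t_crit = 1.960                             # critical value used on the slide

lower = beta_hat - t_crit * se
upper = beta_hat + t_crit * se
print(round(lower, 3), round(upper, 3))    # prints 0.538 0.75
```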
SUMMARY
- We discussed the principle of hypothesis testing
- We derived the t-statistic
- We defined the concept of the p-value
- We explained what significance of a coefficient means
- We examined a regression output in an example