ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests


ECON4150 - Introductory Econometrics
Lecture 5: OLS with One Regressor: Hypothesis Tests
Monique de Haan (moniqued@econ.uio.no)
Stock and Watson, Chapter 5

Lecture outline
- Testing hypotheses about one of the regression coefficients
  - Repetition: testing a hypothesis concerning a population mean
  - Testing a 2-sided hypothesis concerning β1
  - Testing a 1-sided hypothesis concerning β1
- Confidence interval for a regression coefficient
- Efficiency of the OLS estimator
  - Best Linear Unbiased Estimator (BLUE)
  - Gauss-Markov theorem
  - Heteroskedasticity & homoskedasticity
- Regression when X_i is a binary variable
  - Interpretation of β0 and β1
  - Hypothesis tests concerning β1

Repetition: testing a hypothesis concerning a population mean

H0: E(Y) = μ_{Y,0}  vs.  H1: E(Y) ≠ μ_{Y,0}

Step 1: Compute the sample average Ȳ
Step 2: Compute the standard error of Ȳ: SE(Ȳ) = s_Y / √n
Step 3: Compute the t-statistic: t_act = (Ȳ − μ_{Y,0}) / SE(Ȳ)
Step 4: Reject the null hypothesis at the 5% significance level if |t_act| > 1.96 or if the p-value < 0.05
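The four steps can be written out directly as a small sketch in Python (the slides themselves use Stata); the sample values below are made up for illustration:

```python
import math

def t_test_mean(y, mu0, crit=1.96):
    """Four-step t-test of H0: E(Y) = mu0 against a 2-sided alternative."""
    n = len(y)
    y_bar = sum(y) / n                                      # Step 1: sample average
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    se = s_y / math.sqrt(n)                                 # Step 2: SE(Y-bar) = s_Y / sqrt(n)
    t_act = (y_bar - mu0) / se                              # Step 3: t-statistic
    reject = abs(t_act) > crit                              # Step 4: compare to critical value
    return y_bar, se, t_act, reject

# Hypothetical sample; H0: E(Y) = 650
y = [648.0, 655.0, 652.0, 660.0, 649.0, 658.0]
y_bar, se, t_act, reject = t_test_mean(y, 650)
```

With this tiny sample the t-statistic falls below 1.96, so H0 is not rejected at the 5% level.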

Repetition: testing a hypothesis concerning a population mean
Example: California test score data; mean test scores

Suppose we would like to test
H0: E(TestScore) = 650  vs.  H1: E(TestScore) ≠ 650
using the sample of 420 California districts.

Step 1: TestScore-bar = 654.16
Step 2: SE(TestScore-bar) = 0.93
Step 3: t_act = (654.16 − 650) / 0.93 = 4.47
Step 4: Using a 5% significance level, we reject H0 because |t_act| = 4.47 > 1.96

Repetition: testing a hypothesis concerning a population mean
Example: California test score data; mean test scores (Stata output)

. ttest test_score = 650

One-sample t test
test_score: Obs = 420, Mean = 654.1565, Std. Err. = .9297082, Std. Dev. = 19.05335, 95% Conf. Interval = (652.3291, 655.984)

mean = mean(test_score), t = 4.4708, degrees of freedom = 419
Ho: mean = 650
Ha: mean < 650: Pr(T < t) = 1.0000
Ha: mean != 650: Pr(|T| > |t|) = 0.0000
Ha: mean > 650: Pr(T > t) = 0.0000

Testing a 2-sided hypothesis concerning β1

The testing procedure for the population mean is justified by the Central Limit Theorem, which states that the t-statistic (the standardized sample average) has an approximate N(0, 1) distribution in large samples.

The Central Limit Theorem also implies that β̂0 and β̂1 have approximately normal distributions in large samples, so the standardized regression coefficients have an approximate N(0, 1) distribution in large samples.

We can therefore use the same general approach to test hypotheses about β0 and β1. Throughout, we assume that the least squares assumptions hold.

Testing a 2-sided hypothesis concerning β1

H0: β1 = β1,0  vs.  H1: β1 ≠ β1,0

Step 1: Estimate Y_i = β0 + β1 X_i + u_i by OLS to obtain β̂1
Step 2: Compute the standard error of β̂1
Step 3: Compute the t-statistic: t_act = (β̂1 − β1,0) / SE(β̂1)
Step 4: Reject the null hypothesis if |t_act| > critical value, or if p-value < significance level

Testing a 2-sided hypothesis concerning β1
The standard error of β̂1

The standard error of β̂1 is an estimate of the standard deviation σ_β̂1 of its sampling distribution. Recall from the previous lecture:

σ²_β̂1 = (1/n) · Var[(X_i − μ_X) u_i] / [Var(X_i)]²

It can be shown that

SE(β̂1) = √( (1/n) · [ (1/(n−2)) Σ_{i=1}^n (X_i − X̄)² û_i² ] / [ (1/n) Σ_{i=1}^n (X_i − X̄)² ]² )
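As a sketch in Python (the slides use Stata), the slope estimate and the standard-error formula above can be computed directly; the data in the usage line are made up:

```python
import math

def ols_slope_robust_se(x, y):
    """OLS slope and its (heteroskedasticity-robust) SE, following the slide's formula."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    u_hat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]        # residuals
    # SE(b1)^2 = (1/n) * [1/(n-2) * sum (Xi - Xbar)^2 u_i^2] / [(1/n) sum (Xi - Xbar)^2]^2
    num = sum((xi - x_bar) ** 2 * ui ** 2 for xi, ui in zip(x, u_hat)) / (n - 2)
    den = (sxx / n) ** 2
    return b1, math.sqrt(num / den / n)

# Perfect-fit toy data: slope is exactly 2, residuals are zero, so the SE is zero
b1, se = ols_slope_robust_se([1, 2, 3, 4], [2, 4, 6, 8])
```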

Testing a 2-sided hypothesis concerning β1 (Stata output)

TestScore_i = β0 + β1 ClassSize_i + u_i

. regress test_score class_size, robust

Linear regression: Number of obs = 420, F(1, 418) = 19.26, Prob > F = 0.0000, R-squared = 0.0512, Root MSE = 18.581
class_size: Coef. = -2.279808, Robust Std. Err. = .5194892, t = -4.39, P>|t| = 0.000, 95% Conf. Interval = (-3.300945, -1.258671)
_cons: Coef. = 698.933, Robust Std. Err. = 10.36436, t = 67.44, P>|t| = 0.000, 95% Conf. Interval = (678.5602, 719.3057)

Suppose we would like to test the hypothesis that class size does not affect test scores (β1 = 0).

Testing a 2-sided hypothesis concerning β1, 5% significance level

H0: β1 = 0  vs.  H1: β1 ≠ 0

Step 1: β̂1 = −2.28
Step 2: SE(β̂1) = 0.52
Step 3: t_act = (−2.28 − 0) / 0.52 = −4.39
Step 4: We reject the null hypothesis at the 5% significance level because |−4.39| > 1.96 and p-value = 0.000 < 0.05

Testing a 2-sided hypothesis concerning β1
Critical value of the t-statistic

The critical value of the t-statistic depends on the significance level α (large-sample N(0, 1) distribution of the t-statistic):
α = 0.01: reject if |t_act| > 2.58 (0.005 probability in each tail)
α = 0.05: reject if |t_act| > 1.96 (0.025 probability in each tail)
α = 0.10: reject if |t_act| > 1.64 (0.05 probability in each tail)
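A minimal Python sketch of this decision rule, hardcoding the large-sample two-sided critical values quoted above:

```python
# Two-sided critical values of the large-sample N(0, 1) distribution,
# keyed by significance level alpha (values as quoted on the slide).
CRITICAL = {0.01: 2.58, 0.05: 1.96, 0.10: 1.64}

def reject_two_sided(t_act, alpha=0.05):
    """Reject H0 when |t_act| exceeds the critical value for alpha."""
    return abs(t_act) > CRITICAL[alpha]

# Class-size example: t_act = -4.39 is rejected at every listed level,
# while t_act = -0.54 is rejected at none.
```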

Testing a 2-sided hypothesis concerning β1, 1% and 10% significance levels

Step 1: β̂1 = −2.28
Step 2: SE(β̂1) = 0.52
Step 3: t_act = (−2.28 − 0) / 0.52 = −4.39
Step 4 (10% level): We reject the null hypothesis because |−4.39| > 1.64 and p-value = 0.000 < 0.1
Step 4 (1% level): We reject the null hypothesis because |−4.39| > 2.58 and p-value = 0.000 < 0.01

Testing a 2-sided hypothesis concerning β1, 5% significance level

H0: β1 = −2  vs.  H1: β1 ≠ −2

Step 1: β̂1 = −2.28
Step 2: SE(β̂1) = 0.52
Step 3: t_act = (−2.28 − (−2)) / 0.52 = −0.54
Step 4: We don't reject the null hypothesis at the 5% significance level because |−0.54| < 1.96

Testing a 2-sided hypothesis concerning β1, 5% significance level (Stata output)

. regress test_score class_size, robust
(same output as before: class_size Coef. = -2.279808, Robust Std. Err. = .5194892)

. lincom class_size - (-2)
( 1) class_size = -2        [H0: β1 = −2, i.e. β1 − (−2) = 0]
(1): Coef. = -.2798083, Std. Err. = .5194892, t = -0.54, P>|t| = 0.590, 95% Conf. Interval = (-1.300945, .7413286)

Testing a 1-sided hypothesis concerning β1, 5% significance level

H0: β1 = −2  vs.  H1: β1 < −2

Step 1: β̂1 = −2.28
Step 2: SE(β̂1) = 0.52
Step 3: t_act = (−2.28 − (−2)) / 0.52 = −0.54
Step 4: We don't reject the null hypothesis at the 5% significance level because −0.54 > −1.64

Confidence interval for a regression coefficient

The method for constructing a confidence interval for a population mean is easily extended to a regression coefficient. Using a two-sided test, a hypothesized value for β1 will be rejected at the 5% significance level if |t| > 1.96, and will be in the confidence set if |t| ≤ 1.96. Thus the 95% confidence interval for β1 consists of the values β1,0 within ±1.96 standard errors of β̂1:

95% confidence interval for β1: β̂1 ± 1.96 · SE(β̂1)

Confidence interval for β_ClassSize (Stata output)

. regress test_score class_size, robust

Linear regression: Number of obs = 420, F(1, 418) = 19.26, Prob > F = 0.0000, R-squared = 0.0512, Root MSE = 18.581
class_size: Coef. = -2.279808, Robust Std. Err. = .5194892, t = -4.39, P>|t| = 0.000, 95% Conf. Interval = (-3.300945, -1.258671)
_cons: Coef. = 698.933, Robust Std. Err. = 10.36436, t = 67.44, P>|t| = 0.000, 95% Conf. Interval = (678.5602, 719.3057)

95% confidence interval for β1 (shown in the output): (−3.30, −1.26)

90% confidence interval for β1 (not shown in the output):
β̂1 ± 1.64 · SE(β̂1) = −2.27 ± 1.64 × 0.52 = (−3.12, −1.42)
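The 90% interval can be reproduced with a small helper, a Python sketch using the rounded slide values and the large-sample critical values:

```python
def conf_interval(beta_hat, se, crit=1.96):
    """Large-sample confidence interval: beta_hat +/- crit * SE(beta_hat)."""
    return (beta_hat - crit * se, beta_hat + crit * se)

# 90% interval for the class-size slope (crit = 1.64, rounded slide values)
lo, hi = conf_interval(-2.27, 0.52, crit=1.64)
```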

Properties of the OLS estimator of β1

Recall the 3 least squares assumptions:
Assumption 1: E(u_i | X_i) = 0
Assumption 2: (Y_i, X_i) for i = 1, ..., n are i.i.d.
Assumption 3: Large outliers are unlikely

If the 3 least squares assumptions hold, the OLS estimator β̂1:
- is an unbiased estimator of β1
- is a consistent estimator of β1
- has an approximately normal sampling distribution for large n

Properties of Ȳ as an estimator of μY

In Lecture 2 we discussed that:
- Ȳ is an unbiased estimator of μY
- Ȳ is a consistent estimator of μY
- Ȳ has an approximately normal sampling distribution for large n

AND Ȳ is the Best Linear Unbiased Estimator (BLUE): it is the most efficient estimator of μY among all unbiased estimators that are weighted averages of Y1, ..., Yn.

Let μ̂Y be an unbiased estimator of μY of the form μ̂Y = (1/n) Σ_{i=1}^n a_i Y_i, with a_1, ..., a_n nonrandom constants. Then Ȳ is more efficient than μ̂Y, that is, Var(Ȳ) < Var(μ̂Y).
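A quick numerical illustration (a Python sketch, not from the slides): for i.i.d. observations with variance σ², an unbiased weighted average Σ w_i Y_i (weights summing to 1) has variance σ² Σ w_i², and that factor is smallest when every weight equals 1/n, i.e. when the estimator is Ȳ:

```python
def variance_factor(weights):
    """Sum of squared weights: Var of the weighted average is sigma^2 times this."""
    assert abs(sum(weights) - 1.0) < 1e-9   # unbiasedness: weights sum to 1
    return sum(w ** 2 for w in weights)

n = 4
equal = variance_factor([1 / n] * n)             # Y-bar: factor = 1/n
unequal = variance_factor([0.4, 0.3, 0.2, 0.1])  # any other weights give more
```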

Best Linear Unbiased Estimator (BLUE)

If we add a fourth least squares assumption:
Assumption 4: the error terms are homoskedastic, Var(u_i | X_i) = σ²_u

then β̂1^OLS is the Best Linear Unbiased Estimator (BLUE): it is the most efficient estimator of β1 among all conditionally unbiased estimators that are linear functions of Y1, ..., Yn.

Let β̃1 = Σ_{i=1}^n a_i Y_i be an unbiased estimator of β1, where a_1, ..., a_n can depend on X_1, ..., X_n (but not on Y_1, ..., Y_n). Then β̂1^OLS is more efficient than β̃1, that is,

Var(β̂1^OLS | X_1, ..., X_n) < Var(β̃1 | X_1, ..., X_n)

Gauss-Markov theorem for β1

The Gauss-Markov theorem states that if the following 3 Gauss-Markov conditions hold:
1. E(u_i | X_1, ..., X_n) = 0
2. Var(u_i | X_1, ..., X_n) = σ²_u, with 0 < σ²_u < ∞
3. E(u_i u_j | X_1, ..., X_n) = 0 for i ≠ j

then the OLS estimator of β1 is BLUE.

It is shown in S&W Appendix 5.2 that the following 4 least squares assumptions imply the Gauss-Markov conditions:
Assumption 1: E(u_i | X_i) = 0
Assumption 2: (Y_i, X_i) for i = 1, ..., n are i.i.d.
Assumption 3: Large outliers are unlikely
Assumption 4: The error terms are homoskedastic: Var(u_i | X_i) = σ²_u

Heteroskedasticity & homoskedasticity

The fourth least squares assumption, Var(u_i | X_i) = σ²_u, states that the conditional variance of the error term does not depend on the regressor X. Under this assumption the variances of the OLS estimators simplify to

σ²_β̂0 = E(X_i²) σ²_u / (n σ²_X)    and    σ²_β̂1 = σ²_u / (n σ²_X)

Is homoskedasticity a plausible assumption?

[Figures: example of homoskedasticity, Var(u_i | X_i) = σ²_u, and example of heteroskedasticity, Var(u_i | X_i) ≠ σ²_u]

Heteroskedasticity & homoskedasticity
Example: the returns to education

[Figure: scatter plot of ln(wage) against years of education, with a fitted regression line]

The spread of the dots around the line is clearly increasing with years of education X_i: variation in (log) wages is higher at higher levels of education. This implies that Var(u_i | X_i) ≠ σ²_u.

Heteroskedasticity & homoskedasticity

If we assume that the error terms are homoskedastic, the standard errors of the OLS estimators simplify to

SE(β̂1) = √( s²_û / Σ_{i=1}^n (X_i − X̄)² )
SE(β̂0) = √( ( (1/n) Σ_{i=1}^n X_i² ) s²_û / Σ_{i=1}^n (X_i − X̄)² )

In many applications homoskedasticity is not a plausible assumption. If the error terms are heteroskedastic, that is Var(u_i | X_i) ≠ σ²_u, and the above formulas are used to compute the standard errors of β̂0 and β̂1, then:
- the standard errors are wrong (often too small)
- the t-statistic does not have a N(0, 1) distribution (not even in large samples)
- the probability that a 95% confidence interval contains the true value is not 95% (not even in large samples)

Heteroskedasticity & homoskedasticity

If the error terms are heteroskedastic we should use the following heteroskedasticity-robust standard errors:

SE(β̂1) = √( (1/n) · [ (1/(n−2)) Σ_{i=1}^n (X_i − X̄)² û_i² ] / [ (1/n) Σ_{i=1}^n (X_i − X̄)² ]² )

SE(β̂0) = √( (1/n) · [ (1/(n−2)) Σ_{i=1}^n Ĥ_i² û_i² ] / [ (1/n) Σ_{i=1}^n Ĥ_i² ]² )

with Ĥ_i = 1 − ( X̄ / ( (1/n) Σ_{i=1}^n X_i² ) ) · X_i

Since homoskedasticity is a special case of heteroskedasticity, these heteroskedasticity-robust formulas are also valid if the error terms are homoskedastic. Hypothesis tests and confidence intervals based on these standard errors are valid both under homoskedasticity and under heteroskedasticity.

Heteroskedasticity & homoskedasticity

In Stata the default is to assume homoskedasticity. Since in many applications homoskedasticity is not a plausible assumption, it is best to use heteroskedasticity-robust standard errors. To obtain them, add the robust option:

regress y x, robust

Heteroskedasticity & homoskedasticity (Stata output)

. regress test_score class_size

Number of obs = 420, F(1, 418) = 22.58, Prob > F = 0.0000, R-squared = 0.0512, Adj R-squared = 0.0490, Root MSE = 18.581
(Model SS = 7794.11004 on 1 df; Residual SS = 144315.484 on 418 df; Total SS = 152109.594 on 419 df)
class_size: Coef. = -2.279808, Std. Err. = .4798256, t = -4.75, P>|t| = 0.000, 95% Conf. Interval = (-3.22298, -1.336637)
_cons: Coef. = 698.933, Std. Err. = 9.467491, t = 73.82, P>|t| = 0.000, 95% Conf. Interval = (680.3231, 717.5428)

. regress test_score class_size, robust

Number of obs = 420, F(1, 418) = 19.26, Prob > F = 0.0000, R-squared = 0.0512, Root MSE = 18.581
class_size: Coef. = -2.279808, Robust Std. Err. = .5194892, t = -4.39, P>|t| = 0.000, 95% Conf. Interval = (-3.300945, -1.258671)
_cons: Coef. = 698.933, Robust Std. Err. = 10.36436, t = 67.44, P>|t| = 0.000, 95% Conf. Interval = (678.5602, 719.3057)

Heteroskedasticity & homoskedasticity

If the error terms are heteroskedastic:
- the fourth OLS assumption, Var(u_i | X_i) = σ²_u, is violated
- the Gauss-Markov conditions do not hold
- the OLS estimator is not BLUE (not efficient)

but (given that the other OLS assumptions hold):
- the OLS estimators are unbiased
- the OLS estimators are consistent
- the OLS estimators are normally distributed in large samples

Regression when X_i is a binary variable

Sometimes a regressor is binary:
- X = 1 if small class size, = 0 if not
- X = 1 if female, = 0 if male
- X = 1 if treated (experimental drug), = 0 if not

Binary regressors are sometimes called dummy variables. So far, β1 has been called a slope, but that doesn't make sense if X is binary. How do we interpret a regression with a binary regressor?

Regression when X_i is a binary variable
Interpreting regressions with a binary regressor

Y_i = β0 + β1 X_i + u_i

When X_i = 0:
E(Y_i | X_i = 0) = E(β0 + β1 · 0 + u_i | X_i = 0) = β0 + E(u_i | X_i = 0) = β0

When X_i = 1:
E(Y_i | X_i = 1) = E(β0 + β1 · 1 + u_i | X_i = 1) = β0 + β1 + E(u_i | X_i = 1) = β0 + β1

This implies that β1 = E(Y_i | X_i = 1) − E(Y_i | X_i = 0) is the population difference in group means.
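That the OLS slope on a binary regressor equals the difference in sample group means can be checked numerically; a Python sketch with made-up data:

```python
def ols_slope(x, y):
    """OLS slope estimate for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))

# Hypothetical data: x is binary (e.g. 1 = small class, 0 = not)
x = [0, 0, 0, 1, 1, 1, 1]
y = [648.0, 651.0, 650.0, 655.0, 659.0, 657.0, 656.0]
b1 = ols_slope(x, y)
mean1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
mean0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
# b1 coincides with mean1 - mean0, the sample difference in group means
```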

Regression when X_i is a binary variable
Example: the effect of being in a small class on test scores

TestScore_i = β0 + β1 SmallClass_i + u_i

Let SmallClass_i be a binary variable:
SmallClass_i = 1 if class size < 20, = 0 if class size ≥ 20

Interpretation of β0: population mean test score in districts where classes are large (not small):
β0 = E(TestScore_i | SmallClass_i = 0)

Interpretation of β1: the difference in population mean test scores between districts with small classes and districts with larger (not small) classes:
β1 = E(TestScore_i | SmallClass_i = 1) − E(TestScore_i | SmallClass_i = 0)

Regression when X_i is a binary variable (Stata output)
Example: the effect of being in a small class on test scores

. tab small_class
small_class = 0: Freq. 182 (43.33%)
small_class = 1: Freq. 238 (56.67%)
Total: 420 (100.00%)

. bys small_class: sum class_size
small_class = 0: Obs 182, Mean 21.28359, Std. Dev. 1.155685, Min 20, Max 25.8
small_class = 1: Obs 238, Mean 18.38389, Std. Dev. 1.283886, Min 14, Max 19.96154

Regression when X_i is a binary variable (Stata output)
Example: the effect of being in a small class on test scores

. regress test_score small_class, robust

Linear regression: Number of obs = 420, F(1, 418) = 16.34, Prob > F = 0.0001, R-squared = 0.0369, Root MSE = 18.721
small_class: Coef. = 7.37241, Robust Std. Err. = 1.823578, t = 4.04, P>|t| = 0.000, 95% Conf. Interval = (3.787884, 10.95694)
_cons: Coef. = 649.9788, Robust Std. Err. = 1.322892, t = 491.33, P>|t| = 0.000, 95% Conf. Interval = (647.3785, 652.5792)

β̂0 = 649.98 is the sample average of test scores in districts with an average class size ≥ 20.
β̂1 = 7.37 is the difference between the sample average of test scores in districts with class size < 20 and districts with average class size ≥ 20.

Regression when X_i is a binary variable (Stata output)
Example: the effect of being in a small class on test scores

. ttest test_score, by(small_class) unequal

Two-sample t test with unequal variances
Group 0: Obs 182, Mean 649.9788, Std. Err. 1.323379, Std. Dev. 17.85336, 95% Conf. Interval (647.3676, 652.5901)
Group 1: Obs 238, Mean 657.3513, Std. Err. 1.254794, Std. Dev. 19.35801, 95% Conf. Interval (654.8793, 659.8232)
Combined: Obs 420, Mean 654.1565, Std. Err. .9297082, Std. Dev. 19.05335, 95% Conf. Interval (652.3291, 655.984)
diff = mean(0) − mean(1) = −7.37241, Std. Err. 1.823689, 95% Conf. Interval (−10.95752, −3.787296)
t = −4.0426, Satterthwaite's degrees of freedom = 403.607
Ho: diff = 0
Ha: diff < 0: Pr(T < t) = 0.0000
Ha: diff != 0: Pr(|T| > |t|) = 0.0001
Ha: diff > 0: Pr(T > t) = 1.0000

Regression when X_i is a binary variable
Testing a 2-sided hypothesis concerning β1, 1% significance level

H0: β1 = 0  vs.  H1: β1 ≠ 0

Step 1: β̂1 = 7.37
Step 2: SE(β̂1) = 1.82
Step 3: t_act = (7.37 − 0) / 1.82 = 4.04
Step 4: We reject the null hypothesis at the 1% significance level because 4.04 > 2.58 and p-value = 0.000 < 0.01

Regression when X_i is a binary variable
Example: the effect of high per-student expenditure on test scores

TestScore_i = β0 + β1 HighExpenditure_i + u_i

Let HighExpenditure_i be a binary variable:
HighExpenditure_i = 1 if per-student expenditure > $6000, = 0 if per-student expenditure ≤ $6000

Interpretation of β0: population mean test score in districts with low per-student expenditure:
β0 = E(TestScore_i | HighExpenditure_i = 0)

Interpretation of β1: the difference in population mean test scores between districts with high and districts with low per-student expenditure:
β1 = E(TestScore_i | HighExpenditure_i = 1) − E(TestScore_i | HighExpenditure_i = 0)

Regression when X_i is a binary variable (Stata output)
Example: the effect of high per-student expenditure on test scores

. regress test_score high_expenditure, robust

Linear regression: Number of obs = 420, F(1, 418) = 8.02, Prob > F = 0.0048, R-squared = 0.0295, Root MSE = 18.792
high_expenditure: Coef. = 10.01216, Robust Std. Err. = 3.535408, t = 2.83, P>|t| = 0.005, 95% Conf. Interval = (3.062764, 16.96155)
_cons: Coef. = 652.9408, Robust Std. Err. = .9311991, t = 701.18, P>|t| = 0.000, 95% Conf. Interval = (651.1104, 654.7712)

β̂0 = 652.94 is the sample average of test scores in districts with low per-student expenditure.
β̂1 = 10.01 is the difference between the sample average of test scores in districts with high and districts with low per-student expenditure.

Regression when X_i is a binary variable
Testing a 2-sided hypothesis concerning β1, 10% significance level

H0: β1 = 0  vs.  H1: β1 ≠ 0

Step 1: β̂1 = 10.01
Step 2: SE(β̂1) = 3.54
Step 3: t_act = (10.01 − 0) / 3.54 = 2.83
Step 4: We reject the null hypothesis at the 10% significance level because 2.83 > 1.64 and p-value = 0.005 < 0.10