Multiple Regression: Inference


The t-test: is β̂_j big and precise enough? We test the null hypothesis H_0: β_j = 0, i.e. we test that x_j has no effect on y once the other explanatory variables are controlled for. β̂_j will never be exactly 0, but how far is it from 0? We need to weigh the size of the estimate against its sampling error. We define the t-statistic as: t_{β̂_j} = β̂_j / se(β̂_j). Reject H_0: β_j = 0 if |t| is sufficiently large: the threshold depends on the chosen significance level. Note: we test β_j = 0, never β̂_j = 0.
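As an aside, a minimal Stata sketch of computing this t-statistic and its two-sided p-value by hand after a regression (y, x1, x2 are hypothetical variable names):

. regress y x1 x2
. display _b[x1]/_se[x1]                          // t-statistic for H0: beta_1 = 0
. display 2*ttail(e(df_r), abs(_b[x1]/_se[x1]))   // two-sided p-value, n-k-1 df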

Normality provides a benchmark for the t-test. Assumption 6: Normality. The population error u is independent of the explanatory variables x_1, x_2, …, x_k and is normally distributed with zero mean and variance σ²: u ~ Normal(0, σ²). 4 assumptions => OLS gives us an unbiased estimate of the coefficients. 4+1 assumptions => OLS gives us an unbiased estimate of the variance of the coefficient estimates, and OLS is efficient (BLUE). 4+2 assumptions (= Classical Linear Model assumptions) => the coefficient estimators have a normal distribution, and the OLS estimators are the best estimators: smallest variance among ALL estimators, not only the linear ones.

What if the normality assumption fails? Example: Crime data, variable narr86.

. use http://fmwww.bc.edu/ec-p/data/wooldridge/crime1
. hist narr86, discrete

[Histogram of narr86: density (0 to .8) on the vertical axis, narr86 (0 to 15) on the horizontal axis.]

Non-normality of the errors will not be a problem if:
- the sample size is large;
- we log-transform the dependent variable;
- we drop outliers.
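One way to eyeball the assumption directly is to inspect the residuals; a minimal sketch (the choice of regressors from crime1 is hypothetical, and uhat is a hypothetical name for the stored residuals):

. reg narr86 pcnv avgsen tottime    // hypothetical regressor choice
. predict uhat, resid               // store the OLS residuals
. kdensity uhat, normal             // residual density with a normal overlay for comparison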

Distribution of OLS estimators. 5 Gauss-Markov assumptions + normality assumption => the OLS estimators are normally distributed: β̂_j ~ Normal(β_j, Var(β̂_j)), or (β̂_j − β_j)/sd(β̂_j) ~ Normal(0, 1). So under the CLM assumptions, (β̂_j − β_j)/se(β̂_j) ~ t_{n−k−1}. Careful: this is different from the previous theorem, which involved the constant σ in sd(β̂_j), while the t-statistic involves the random variable σ̂ in se(β̂_j). Note: normality of the OLS estimators is still approximately true in large samples even without normality of the errors.

Testing against two-sided alternatives: null hypothesis H_0: β_j = 0 against H_1: β_j ≠ 0. We need to decide on a significance level, i.e. the probability of rejecting H_0 when it is true. A common choice for the significance level is 5%. When the alternative is two-sided, we are interested in the absolute value of the t-statistic => the rejection rule is |t_{β̂_j}| > c, where the critical value c depends on the significance level and the degrees of freedom (df = n−k−1): when df < 120, see table G2; when df > 120, use the standard normal critical value.
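For reference, these critical values can also be pulled from Stata rather than table G2; a small sketch:

. display invttail(658, 0.025)   // two-tailed 5% critical value with 658 df (close to 1.96)
. display invnormal(0.975)       // standard normal critical value used when df > 120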

Example: Correlates of education. Source: WAGE2.dta, Wooldridge. Population model to be estimated: educ = β_0 + β_1 sibs + β_2 feduc + β_3 meduc + β_4 brthord + u

. reg educ sibs meduc feduc brthord

      Source |       SS       df       MS           Number of obs =     663
-------------+------------------------------       F(  4,   658) =   43.75
       Model |  692.455912     4  173.113978       Prob > F      =  0.0000
    Residual |  2603.75525   658  3.95707485       R-squared     =  0.2101
-------------+------------------------------       Adj R-squared =  0.2053
       Total |  3296.21116   662  4.97917094       Root MSE      =  1.9892

        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        sibs |  -.0910052   .0426604    -2.13   0.033    -.1747722   -.0072383
       meduc |   .1214772   .0343839     3.53   0.000     .0539617    .1889926
       feduc |   .2152426   .0288683     7.46   0.000     .1585576    .2719277
     brthord |  -.0122032   .0647301    -0.19   0.851    -.1393056    .1148992
       _cons |   10.43929    .392977    26.56   0.000     9.667645    11.21093

Test the significance of one variable: H_0: β_1 = 0. Is the relationship between education and the number of siblings significant? Is β̂_1 significantly different from 0? We can reject the null hypothesis (β_1 = 0) at the 5% significance level, because the absolute value of the t-stat exceeds the critical value: |t_{β̂_1}| = |β̂_1/se(β̂_1)| = |−0.091/0.0427| = 2.13 > 1.96, where 1.96 is the critical value for a two-tailed test at 5% when we have more than 120 degrees of freedom (here n−k−1 = 658).

What about the other variables? The coefficients on mother's and father's education are statistically significant at the 1% level and below (their t-stats are larger than 2.576). However, we fail to reject the null hypothesis that brthord has no effect on education at the 1, 5, and 10% significance levels. Note that this last finding suggests that brthord is an irrelevant variable in the regression. Taking it out of the regression does not have a strong effect on any of the coefficient estimates (see the regression below).

What happens if you drop one irrelevant variable?

. reg educ sibs meduc feduc if brthord != .

      Source |       SS       df       MS           Number of obs =     663
-------------+------------------------------       F(  3,   659) =   58.40
       Model |  692.315273     3  230.771758       Prob > F      =  0.0000
    Residual |  2603.89589   659  3.95128359       R-squared     =  0.2100
-------------+------------------------------       Adj R-squared =  0.2064
       Total |  3296.21116   662  4.97917094       Root MSE      =  1.9878

        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        sibs |  -.0953664   .0358173    -2.66   0.008    -.1656961   -.0250367
       meduc |   .1221488   .0341738     3.57   0.000     .0550461    .1892515
       feduc |   .2155921   .0287876     7.49   0.000     .1590655    .2721186
       _cons |   10.41426   .3696027    28.18   0.000     9.688516       11.14

Note: we see that once the irrelevant variable brthord is excluded, the standard error of sibs decreases and the absolute value of its t-stat increases: sibs is now statistically significant at the 1% level.

Practical guidance. One can NEVER accept the null hypothesis: with a low t-stat (or a high p-value), we fail to reject the null hypothesis. Statistical significance ≠ (economic) importance: significance relates to the t-statistic t = β̂_j/se(β̂_j), while importance relates to the magnitude of the coefficient β̂_j. If a coefficient is insignificant (low t-value), there is no meaningful interpretation of its sign and magnitude => just ignore it. Practical advice: with bigger samples, standard errors decrease, which results in more statistical significance => decrease the significance level to be sure.

Testing against one-sided alternatives: H_0: β_j = 0 against H_1: β_j < 0. Here we only care about the alternative H_1: β_j < 0. Why? Introspection, economic theory. We are looking for a sufficiently large negative value of t_{β̂_j} in order to reject H_0 in favor of H_1 => rejection rule: H_0 is rejected in favor of H_1 if t < −c (i.e. |t| > c with t negative). Remember: to reject H_0 against the negative alternative, we must get a negative t-statistic.

A few things about the critical value. The critical value c is smaller than for the 2-sided test (see table G2). As the significance level falls, the critical value increases. So if H_0 is rejected at the 1% level, then it is also rejected at the 5% and 10% levels. Testing H_0 against the alternative hypothesis H_1: β_j > 0 leads to the rejection rule: H_0 is rejected in favor of H_1 if t > c.

Testing other hypotheses about β_j: H_0: β_j = a against H_1: β_j ≠ a, where a is the hypothesized ceteris paribus effect of x_j on y. The t-stat can be written as: t = (β̂_j − a)/se(β̂_j). We reject H_0 if |t| > c. Alternatively: reject H_0 if a is not in the 95% confidence interval: β̂_j is then statistically different from a at the 5% significance level. If H_1: β_j > a, reject if t > c. Note: depending on whether the alternative is one-sided or two-sided, c will not be the same, see table G2.

Confidence intervals for β_j. Using the fact that (β̂_j − β_j)/se(β̂_j) ~ t_{n−k−1}, a 95% confidence interval for the population parameter β_j is given by: [β̂_j − c·se(β̂_j), β̂_j + c·se(β̂_j)], where the constant c is the 97.5th percentile of a t_{n−k−1} distribution (as before). Ex. (see appendix G2): for df = n−k−1 = 25, a 95% CI is β̂_j ± 2.06·se(β̂_j). When df > 50, we can take c ≈ 2. Application: H_0: β_j = a is rejected if a is not in the 95% CI (the same applies if a is 0).
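A minimal sketch of building this 95% CI by hand after any regression (y, x1, x2 are hypothetical variable names):

. regress y x1 x2
. scalar c = invttail(e(df_r), 0.025)               // 97.5th percentile of t with n-k-1 df
. display _b[x1] - c*_se[x1], _b[x1] + c*_se[x1]    // matches Stata's reported 95% CI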

Example: Rationality of house assessments. Source: HPRICE1.dta, Wooldridge.

. regress lprice lassess

      Source |       SS       df       MS           Number of obs =      88
-------------+------------------------------       F(  1,    86) =  280.94
       Model |  6.13852904     1  6.13852904       Prob > F      =  0.0000
    Residual |  1.87907448    86  .021849703       R-squared     =  0.7656
-------------+------------------------------       Adj R-squared =  0.7629
       Total |  8.01760352    87  .092156362       Root MSE      =  .14782

      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lassess |   1.013407   .0604609    16.76   0.000     .8932147    1.133599
       _cons |  -.1614743   .3460739    -0.47   0.642    -.8494464    .5264978

Test whether the elasticity of actual price w.r.t. assessed price is zero, i.e. the assessment has no impact on the actual price. H_0: β_1 = 0 against H_1: β_1 ≠ 0.

H_0: β_1 = 0 against H_1: β_1 ≠ 0. Compute the t-stat to test H_0: β_1 = 0: t = β̂_1/se(β̂_1) = 1.013/0.06 ≈ 16.88 => greater than the critical values at any significance level => we reject the null hypothesis and conclude that β_1 ≠ 0, i.e. the assessed price does impact the actual price of the house. Alternatively, we can see that 0 is not included in the 95% confidence interval given by Stata for β̂_1, so we can reject H_0 at the 5% level (at least).

H_0: β_1 = 1 against H_1: β_1 ≠ 1. Compute the t-stat to test H_0: β_1 = 1: t = (β̂_1 − 1)/se(β̂_1) = (1.013 − 1)/0.06 ≈ 0.22 => smaller than the critical values at the 1, 5, or even 10% levels (c ≈ 1.7 for a two-tailed test at 10%) => we cannot reject the null hypothesis. Alternatively, we could have looked at the 95% confidence interval in the Stata output. Given that 1 lies inside the interval, we cannot reject H_0 at the 5% significance level.
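Instead of computing (β̂_1 − 1)/se(β̂_1) by hand, a t-test of this hypothesis can be obtained with lincom; a minimal sketch:

. regress lprice lassess
. lincom lassess - 1    // t-test of H0: beta_1 = 1; the reported estimate is beta_1_hat - 1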

Is the price assessment rational? What if we include other characteristics? We estimate the model: lprice = β_0 + β_1 lassess + β_2 llotsize + β_3 lsqrft + β_4 bdrms + u. We think that once the assessed price is controlled for, the other characteristics should not impact the actual price => test 3 null hypotheses: H_0: β_2 = 0; H_0: β_3 = 0; H_0: β_4 = 0. Stata commands:

eststo clear
eststo: reg lprice lassess
eststo: reg lprice lassess llotsize lsqrft bdrms
eststo: reg lprice llotsize lsqrft bdrms
esttab, r2

Interpretation of the results. Is the house assessment rational? I.e. do other house characteristics impact the actual sales price of the house when the assessed price is controlled for? The results in column 2 do not allow us to reject the 3 null hypotheses, and hence provide support for the rational-assessment interpretation. Not surprisingly, we see that the R-squared does not increase much: house characteristics do not explain much more of the variation in sales prices once assessed prices are controlled for. Moreover, looking at column 3, we see that, as one would expect, we do find significant effects of some of the house characteristics on the sales price if the assessed price of the house is not controlled for. Note also that the coefficient of log(sqrft) is negative in column 2 but positive in column 3. We don't have to worry about this, because we only interpret the sign of a coefficient if it is significant. Given that it is insignificant in column 2, the counterintuitive sign there does not matter.

P-values for t-tests. Given an observed t-statistic, what is the smallest significance level at which H_0 would be rejected? p-value = P(|T| > |t|): the "p-value for testing H_0: β_j = 0 against the two-sided alternative", with T a random variable with n−k−1 df and t the numerical value of the test statistic. It is the probability of observing a t-statistic as large (in absolute value) as we did if the null hypothesis is true => think of it as the probability of rejecting H_0 while H_0 is true. It is the lowest significance level at which you can reject H_0. Note: to obtain the one-sided p-value, just divide the two-sided p-value by 2.

Testing multiple/joint linear restrictions: the F-test. Testing exclusion restrictions in: y = β_0 + β_1x_1 + β_2x_2 + β_3x_3 + β_4x_4 + β_5x_5 + u. H_0: β_3 = 0; β_4 = 0; β_5 = 0. H_1: H_0 does not hold, i.e. x_3, x_4 and x_5 combined have an effect on y. We need to test the restrictions jointly. The F-test estimates the model with (= unrestricted) and without (= restricted) x_3, x_4 and x_5, and compares the sums of squared residuals: how much does the SSR increase when we drop these variables? If this increase is big enough, we reject the joint null hypothesis.

The F-test. F-statistic: F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n−k−1)], where q = the number of restrictions. Under H_0, and assuming the CLM assumptions hold, F ~ F_{q, n−k−1} => reject H_0 if SSR_r is relatively large compared to SSR_ur, more specifically if F > c, where the critical value c depends on the chosen significance level, the number of restrictions q, and the degrees of freedom (n−k−1) (see table G3). Terminology: if H_0 is rejected, x_3, x_4 and x_5 are jointly statistically significant.

Notes on the F-test. The F-stat is always ≥ 0. Even if the t-tests on each coefficient conclude that x_3, x_4 and x_5 are individually statistically insignificant, it may be that x_3, x_4 and x_5 are jointly statistically significant (e.g. due to multicollinearity). Be careful when comparing two models: the same observations should be used => beware of missing values! (see the sketch below)
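A minimal sketch of one way to keep the estimation samples identical, using e(sample) from the unrestricted fit (variable names are hypothetical):

. quietly regress y x1 x2 x3 x4 x5    // unrestricted model
. regress y x1 x2 if e(sample)        // restricted model on exactly the same observations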

Ex: Effect of personal characteristics and marriage characteristics on the number of extramarital affairs. Source: affairs.dta (Wooldridge), data originally used in R.C. Fair (1978), "A Theory of Extramarital Affairs," Journal of Political Economy 86, 45-61.

. desc naffairs age educ occup yrsmarr ratemarr

              storage  display    value
variable name   type   format     label    variable label
---------------------------------------------------------------------------
naffairs        byte   %9.0g               number of affairs within last year
age             float  %9.0g               in years
educ            byte   %9.0g               years schooling
occup           byte   %9.0g               occupation, reverse Hollingshead scale
yrsmarr         float  %9.0g               years married
ratemarr        byte   %9.0g               5 = vry hap marr, 4 = hap than avg, 3 = avg, 2 = smewht unhap, 1 = vry unhap

. sum naffairs age educ occup yrsmarr ratemarr

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    naffairs |       601    1.455907    3.298758          0         12
         age |       601    32.48752    9.288762       17.5         57
        educ |       601    16.16639    2.402555          9         20
       occup |       601    4.194676    1.819443          1          7
     yrsmarr |       601    8.177696    5.571303       .125         15
    ratemarr |       601     3.93178    1.103179          1          5

. tab naffairs

 number of  |
    affairs |
within last |
       year |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        451       75.04       75.04
          1 |         34        5.66       80.70
          2 |         17        2.83       83.53
          3 |         19        3.16       86.69
          7 |         42        6.99       93.68
         12 |         38        6.32      100.00
------------+-----------------------------------
      Total |        601      100.00

. tab ratemarr

   ratemarr |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         16        2.66        2.66
          2 |         66       10.98       13.64
          3 |         93       15.47       29.12
          4 |        194       32.28       61.40
          5 |        232       38.60      100.00
------------+-----------------------------------
      Total |        601      100.00

(5 = vry hap marr, 4 = hap than avg, 3 = avg, 2 = smewht unhap, 1 = vry unhap)

. regress naffairs age educ occup yrsmarr ratemarr

      Source |       SS       df       MS           Number of obs =     601
-------------+------------------------------       F(  5,   595) =   13.91
       Model |  683.446785     5  136.689357       Prob > F      =  0.0000
    Residual |  5845.63475   595  9.82459621       R-squared     =  0.1047
-------------+------------------------------       Adj R-squared =  0.0972
       Total |  6529.08153   600  10.8818026       Root MSE      =  3.1344

    naffairs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0553558   .0224629    -2.46   0.014     -.099472   -.0112396
        educ |  -.0009097   .0636454    -0.01   0.989    -.1259067    .1240873
       occup |   .1258993   .0841543     1.50   0.135    -.0393763    .2911749
     yrsmarr |   .1442325   .0372344     3.87   0.000     .0711058    .2173593
    ratemarr |  -.7548703    .120666    -6.26   0.000    -.9918534   -.5178873
       _cons |   4.529373    1.06696     4.25   0.000     2.433907     6.62484

. regress naffairs yrsmarr ratemarr

      Source |       SS       df       MS           Number of obs =     601
-------------+------------------------------       F(  2,   598) =   30.71
       Model |  608.178688     2  304.089344       Prob > F      =  0.0000
    Residual |  5920.90284   598  9.90117532       R-squared     =  0.0931
-------------+------------------------------       Adj R-squared =  0.0901
       Total |  6529.08153   600  10.8818026       Root MSE      =  3.1466

    naffairs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     yrsmarr |   .0748148   .0237706     3.15   0.002     .0281307    .1214989
    ratemarr |  -.7439478    .120047    -6.20   0.000    -.9797128   -.5081827
       _cons |   3.769133   .5671483     6.65   0.000     2.655289    4.882978

. esttab, r2 scalars(rss df_r)

----------------------------------------
                      (1)           (2)
                 naffairs      naffairs
----------------------------------------
age               -0.0554*
                  (-2.46)
educ            -0.000910
                  (-0.01)
occup               0.126
                   (1.50)
yrsmarr          0.144***      0.0748**
                   (3.87)        (3.15)
ratemarr        -0.755***     -0.744***
                  (-6.26)       (-6.20)
_cons            4.529***      3.769***
                   (4.25)        (6.65)
----------------------------------------
N                     601           601
R-sq                0.105         0.093
rss                5845.6        5920.9
df_r                  595           598
----------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Interpretation: These two regressions allow us to test whether individual characteristics jointly have an effect on the number of affairs a person has in a year. In other words, are affairs mainly explained by characteristics of the marriage itself, or do individual characteristics play a role? H_0: β_1 = 0, β_2 = 0, β_3 = 0. In order to figure this out, we estimate a regression including both individual and marriage characteristics (the unrestricted model), and one with only marriage characteristics (the restricted model). We use an F-test to test the exclusion restrictions. We obtain SSR_ur = 5846, SSR_r = 5921, q = 3 (number of restrictions), n−k−1 = 595 (degrees of freedom). F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n−k−1)] = [(5921 − 5846)/3] / [5846/595] ≈ 2.55. The critical value for q = 3 and n−k−1 = 595, at the 5% level, is 2.60. Hence we cannot reject the null hypothesis at the 5% significance level. We can, however, reject the null hypothesis at the 10% level (critical value 2.08).
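A minimal sketch of this F computation in Stata, pulling the SSRs from e(rss) after each fit:

. quietly regress naffairs age educ occup yrsmarr ratemarr   // unrestricted model
. scalar ssr_ur = e(rss)
. scalar dfr    = e(df_r)                                    // n-k-1 = 595
. quietly regress naffairs yrsmarr ratemarr                  // restricted model
. scalar ssr_r  = e(rss)
. scalar F = ((ssr_r - ssr_ur)/3) / (ssr_ur/dfr)
. display F, Ftail(3, dfr, F)                                // F-stat and its p-value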

Notes on the F-test. R-squared form of the F-stat: because SSR = SST(1 − R²), F = [(R²_ur − R²_r)/q] / [(1 − R²_ur)/(n−k−1)]. P-values for the F-test: the probability of observing a value of F as large as we did, given that H_0 is true => to reject H_0, the p-value has to be low. F-stat for the overall significance of a regression: H_0: β_1 = β_2 = … = β_k = 0. Non-zero hypotheses can be incorporated in an F-test. An F-test for 1 restriction (β_j = 0) is equivalent to the two-sided t-test.

Ex. 2: Effect of mother's and father's education on wage.

. eststo clear
. eststo: regress wage educ IQ meduc feduc

      Source |       SS       df       MS           Number of obs =     722
-------------+------------------------------       F(  4,   717) =   29.16
       Model |  16801948.1     4  4200487.02       Prob > F      =  0.0000
    Residual |   103272846   717  144034.653       R-squared     =  0.1399
-------------+------------------------------       Adj R-squared =  0.1351
       Total |   120074794   721  166539.243       Root MSE      =  379.52

        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    32.3182   7.907736     4.09   0.000     16.79312    47.84329
          IQ |    4.71663   1.154909     4.08   0.000     2.449221    6.984038
       meduc |   7.785773   6.245587     1.25   0.213    -4.476051     20.0476
       feduc |   9.326488   5.467631     1.71   0.088    -1.407992    20.06097
       _cons |  -125.9063   107.7862    -1.17   0.243    -337.5206    85.70794

. test (meduc=0) (feduc=0)

 ( 1)  meduc = 0
 ( 2)  feduc = 0

       F(  2,   717) =    4.26
            Prob > F =    0.0145

Interpretation: We want to test whether mother's and father's education have a jointly significant effect on the wage of the child. H_0: β_meduc = 0, β_feduc = 0. We estimate the unrestricted model and ask Stata to do the F-test. Stata indicates there are 2 restrictions and 717 degrees of freedom, and the calculated F-value is 4.26. It also reports the p-value of the F-test: 0.0145. Hence we can reject the null hypothesis at the 5% level but not at the 1% level. To be more precise, the probability of observing an F-value of 4.26 when the null hypothesis holds is 1.45%. Note that the significance of this joint test is much higher than for the individual t-tests on these parameters. This can be explained by multicollinearity. We could have obtained the same result by estimating the restricted model and using the R-squared form of the F-statistic. Note that we need to be careful to estimate the restricted model on the same observations (i.e. excluding those for which meduc or feduc have missing values, see below).

. eststo: regress wage educ IQ if e(sample)

      Source |       SS       df       MS           Number of obs =     722
-------------+------------------------------       F(  2,   719) =   53.58
       Model |    15574479     2  7787239.51       Prob > F      =  0.0000
    Residual |   104500315   719  145341.189       R-squared     =  0.1297
-------------+------------------------------       Adj R-squared =  0.1273
       Total |   120074794   721  166539.243       Root MSE      =  381.24

. esttab, r2 scalars(rss df_r)

------------------------------------------
                      (1)             (2)
                     wage            wage
------------------------------------------
educ             32.32***        39.66***
                   (4.09)          (5.27)
IQ               4.717***        5.338***
                   (4.08)          (4.68)
meduc               7.786
                   (1.25)
feduc               9.326
                   (1.71)
_cons              -125.9          -109.9
                  (-1.17)         (-1.03)
------------------------------------------
N                     722             722
R-sq                0.140           0.130
rss           103272845.9     104500315.0
df_r                  717             719
------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

F = [(R²_ur − R²_r)/q] / [(1 − R²_ur)/(n−k−1)] = [(0.1399 − 0.1297)/2] / [(1 − 0.1399)/717] ≈ 4.25, which is larger than 3.00, the critical value at 5%. The regression output also shows that the F-value for the test of overall significance of the regression is very high, 53.58; indeed, its p-value shows significance at very low levels.
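The same number can be produced from the stored R-squared values; a minimal sketch:

. quietly regress wage educ IQ meduc feduc     // unrestricted model
. scalar r2_ur = e(r2)
. scalar dfr   = e(df_r)                       // 717
. quietly regress wage educ IQ if e(sample)    // restricted model, same sample
. scalar r2_r  = e(r2)
. display ((r2_ur - r2_r)/2) / ((1 - r2_ur)/dfr)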

Test H_0: a linear combination of the parameters = 0. H_0: β_1 = β_2 against H_1: β_1 ≠ β_2. Method 1: test H_0: β_1 − β_2 = 0. We need the t-stat: t = (β̂_1 − β̂_2)/se(β̂_1 − β̂_2) => we need to compute the denominator: se(β̂_1 − β̂_2) = [se(β̂_1)² + se(β̂_2)² − 2s_12]^{1/2}, with s_12 the estimate of the covariance between β̂_1 and β̂_2. Note: se(β̂_1 − β̂_2) ≠ se(β̂_1) − se(β̂_2).
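A minimal sketch of method 1 in Stata, pulling s_12 from the estimated variance-covariance matrix e(V) (variable names are hypothetical, and the matrix indices assume β̂_1 and β̂_2 are the first two coefficients):

. quietly regress y x1 x2 x3
. matrix V = e(V)                                // variance-covariance matrix of the estimates
. scalar sed = sqrt(V[1,1] + V[2,2] - 2*V[2,1])  // se(b1_hat - b2_hat)
. display (_b[x1] - _b[x2]) / sed                // t-statistic for H0: beta_1 = beta_2

In practice, lincom x1 - x2 performs the same computation in one step.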

Ex. 2b: Effect of mother's and father's education on wage.

. regress wage educ IQ meduc feduc

      Source |       SS       df       MS           Number of obs =     722
-------------+------------------------------       F(  4,   717) =   29.16
       Model |  16801948.1     4  4200487.02       Prob > F      =  0.0000
    Residual |   103272846   717  144034.653       R-squared     =  0.1399
-------------+------------------------------       Adj R-squared =  0.1351
       Total |   120074794   721  166539.243       Root MSE      =  379.52

        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    32.3182   7.907736     4.09   0.000     16.79312    47.84329
          IQ |    4.71663   1.154909     4.08   0.000     2.449221    6.984038
       meduc |   7.785773   6.245587     1.25   0.213    -4.476051     20.0476
       feduc |   9.326488   5.467631     1.71   0.088    -1.407992    20.06097
       _cons |  -125.9063   107.7862    -1.17   0.243    -337.5206    85.70794

. test meduc = feduc

 ( 1)  meduc - feduc = 0

       F(  1,   717) =    0.02
            Prob > F =    0.8788

Ex. 3: Effect of years of tenure in the company and years of experience on wage. wage = β_0 + β_1 exper + β_2 tenure + β_3 educ + β_4 IQ + u

. regress wage exper tenure educ IQ

      Source |       SS       df       MS           Number of obs =     935
-------------+------------------------------       F(  4,   930) =   47.59
       Model |  25948278.7     4  6487069.67       Prob > F      =  0.0000
    Residual |   126767890   930  136309.559       R-squared     =  0.1699
-------------+------------------------------       Adj R-squared =  0.1663
       Total |   152716168   934  163507.675       Root MSE      =   369.2

        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   14.97782   3.208597     4.67   0.000     8.680892    21.27475
      tenure |   7.359893    2.46962     2.98   0.003      2.51322    12.20657
        educ |   57.19797   7.033118     8.13   0.000     43.39534    71.00059
          IQ |   4.872941   .9391096     5.19   0.000     3.029922    6.715961
       _cons |  -532.4066   116.2499    -4.58   0.000    -760.5492     -304.264

. test exper == tenure

 ( 1)  exper - tenure = 0

       F(  1,   930) =    2.84
            Prob > F =    0.0923

Test H_0: a linear combination of parameters = 0 (2). H_0: β_1 = β_2 against H_1: β_1 ≠ β_2. Method 2: define θ_1 = β_1 − β_2 and test H_0: θ_1 = 0 versus H_1: θ_1 ≠ 0 => use a standard t-test. We need to redefine the variables: given that β_1 = θ_1 + β_2, y = β_0 + (θ_1 + β_2)x_1 + β_2x_2 + … + β_kx_k + u = β_0 + θ_1x_1 + β_2(x_1 + x_2) + … + β_kx_k + u. Estimating this model allows us to test H_0: θ_1 = 0 using a simple t-test on one variable (here x_1).

Ex. 3b: Effect of years of tenure in the company and years of experience on wage. The new model estimated, using method 2, is: wage = β_0 + θ_1 exper + β_2 (exper + tenure) + β_3 educ + β_4 IQ + u

. gen sum = exper + tenure
. regress wage exper sum educ IQ

      Source |       SS       df       MS           Number of obs =     935
-------------+------------------------------       F(  4,   930) =   47.59
       Model |  25948278.7     4  6487069.67       Prob > F      =  0.0000
    Residual |   126767890   930  136309.559       R-squared     =  0.1699
-------------+------------------------------       Adj R-squared =  0.1663
       Total |   152716168   934  163507.675       Root MSE      =   369.2

        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   7.617928   4.520722     1.69   0.092    -1.254071    16.48993
         sum |   7.359893    2.46962     2.98   0.003      2.51322    12.20657
        educ |   57.19797   7.033118     8.13   0.000     43.39534    71.00059
          IQ |   4.872941   .9391096     5.19   0.000     3.029922    6.715961
       _cons |  -532.4066   116.2499    -4.58   0.000    -760.5492     -304.264

Interpretation: we have reformulated the problem in such a way that we can look at the t-test on the first variable (exper) and use it to test the null hypothesis. The t-test shows we can reject the null hypothesis at the 10% level but not at the 5% level. Note that the value of the t-test is the square root of the value of the F-test shown earlier: 1.69² ≈ 2.84. Of course, we should be careful when interpreting the coefficient estimates. To find the total effect of experience, we add up the coefficients of exper and sum: 7.618 + 7.360 ≈ 14.978, the same effect as previously.

Can we use regression outputs to test joint hypotheses?

Can we test whether log(lotsize), log(sqrft) and bdrms jointly have a significant effect, once the assessed price is controlled for? Yes: given that we have the R-squared of the restricted and the unrestricted model, and the number of observations, we can calculate the R-squared form of the F-test. Can we test whether price assessments are rational, when we define rationality as β_1 = 1? The rationality hypothesis can be restated in these terms: a 1% change in assess would be associated with a 1% change in price, that is β_1 = 1; in addition, lotsize, sqrft, and bdrms should not help to explain log(price) once the assessed value has been controlled for. Answer: no, we would need access to the data, because one of the coefficients is hypothesized not to equal zero.

How can we test the latter joint hypothesis? There are four restrictions to be tested; three are exclusion restrictions, but β_1 = 1 is not. How can we test this hypothesis using an F-stat? => estimate the unrestricted and restricted models. Unrestricted model: y = β_0 + β_1x_1 + β_2x_2 + β_3x_3 + β_4x_4 + u, versus restricted model: y − x_1 = β_0 + u. The F-stat is simply: F = [(SSR_r − SSR_ur)/4] / [SSR_ur/(n − 5)] = [(1.880 − 1.822)/4] / [1.822/83] ≈ 0.661. The 5% critical value in an F distribution with (4, 83) df is about 2.50, so we fail to reject H_0. There is no evidence that the assessed values are not rational.
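With the data in hand, the four restrictions can also be imposed at once with Stata's test command; a minimal sketch (variable names as in the earlier HPRICE1 example):

. regress lprice lassess llotsize lsqrft bdrms
. test (lassess = 1) (llotsize = 0) (lsqrft = 0) (bdrms = 0)   // joint F-test of all four restrictions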