CHAPTER 4

SOLUTIONS TO PROBLEMS

4.1 (i) and (iii) generally cause the t statistics not to have a t distribution under H0. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR.3. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one.

4.2 (i) H0: β3 = 0. H1: β3 > 0.

(ii) The proportionate effect on salary is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50 point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.

(iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.

(iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

4.3 (i) Holding profmarg fixed, Δrdintens = .321 Δlog(sales) = (.321/100)[100·Δlog(sales)] ≈ .00321(%Δsales). Therefore, if %Δsales = 10, Δrdintens ≈ .032, or only about 1/30 of a percentage point. For such a large percentage increase in sales, this seems like a practically small effect.

(ii) H0: β1 = 0 versus H1: β1 > 0, where β1 is the population slope on log(sales). The t statistic is .321/.216 ≈ 1.486. The 5% critical value for a one-tailed test, with df = 32 − 3 = 29, is obtained from Table G.2 as 1.699; so we cannot reject H0 at the 5% level. But the 10% critical value is 1.311; since the t statistic is above this value, we reject H0 in favor of H1 at the 10% level.

(iii) Not really. Its t statistic is only 1.087, which is well below even the 10% critical value for a one-tailed test.

4.4 (i) H0: β3 = 0. H1: β3 ≠ 0.
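The one-sided t tests in Problems 4.2 and 4.3 reduce to dividing a coefficient by its standard error and comparing against a one-tailed critical value. A minimal sketch in Python, using the ros values from Problem 4.2 (taken as the solution's reported figures):

```python
# One-sided t test from Problem 4.2: H0: beta_ros = 0 vs. H1: beta_ros > 0.
beta_hat = 0.00024  # estimated coefficient on ros
se_hat = 0.00054    # its standard error
t_stat = beta_hat / se_hat
print(round(t_stat, 2))  # about .44, far below the 10% one-tailed critical value 1.282
```

Since .44 < 1.282, we fail to reject H0 even at the 10% level, matching the conclusion in the solution.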
(ii) Other things equal, a larger population increases the demand for rental housing, which should increase rents. The demand for overall housing is higher when average income is higher, pushing up the cost of housing, including rental rates.

(iii) The coefficient on log(pop) is an elasticity. A correct statement is that a 10% increase in population increases rent by .066(10) = .66%.

(iv) With df = 64 − 4 = 60, the 1% critical value for a two-tailed test is 2.660. The t statistic is about 3.29, which is well above the critical value. So β3 is statistically different from zero at the 1% level.

4.5 (i) .412 ± 1.96(.094), or about .228 to .596.

(ii) No, because the value .4 is well inside the 95% CI.

(iii) Yes, because 1 is well outside the 95% CI.

4.6 (i) With df = n − 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about −.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject β0 = 0. The t statistic for H0: β1 = 1 is (.976 − 1)/.049 ≈ −.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)

(ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore, F = [(209,448.99 − 165,644.51)/165,644.51](86/2) ≈ 11.37, which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.

(iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions and there are 88 − 5 = 83 df in the unrestricted model. The F statistic is [(.829 − .820)/(1 − .829)](83/3) ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.

(iv) If heteroskedasticity were present, Assumption MLR.5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.

4.7 (i) While the standard error on hrsemp has not changed, the magnitude of the coefficient has increased by half.
The t statistic on hrsemp has gone from about −1.47 to −2.21, so now the coefficient is statistically less than zero at the 5% level. (From Table G.2 the 5% critical value with 40 df is 1.684. The 1% critical value is 2.423, so the p-value is between .01 and .05.)
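Several of the answers above, such as Problem 4.5, apply the rule CI = β̂ ± c·se(β̂), where c is the relevant critical value. A quick numerical check of the Problem 4.5 interval:

```python
# 95% CI from Problem 4.5: beta_hat +/- 1.96 * se (large-sample normal critical value).
beta_hat, se = 0.412, 0.094
lo = beta_hat - 1.96 * se
hi = beta_hat + 1.96 * se
print(round(lo, 3), round(hi, 3))  # 0.228 0.596
# A hypothesized value inside the CI cannot be rejected at the 5% level:
print(lo < 0.4 < hi)  # True  -> cannot reject beta = .4
print(lo < 1.0 < hi)  # False -> reject beta = 1
```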

(ii) If we add and subtract β2 log(employ) from the right-hand side and collect terms, we have

log(scrap) = β0 + β1 hrsemp + [β2 log(sales) − β2 log(employ)] + [β2 log(employ) + β3 log(employ)] + u
           = β0 + β1 hrsemp + β2 log(sales/employ) + (β2 + β3) log(employ) + u,

where the second equality follows from the fact that log(sales/employ) = log(sales) − log(employ). Defining θ3 ≡ β2 + β3 gives the result.

(iii) No. We are interested in the coefficient on log(employ), which has a t statistic of .2, which is very small. Therefore, we conclude that the size of the firm, as measured by employees, does not matter, once we control for training and sales per employee (in a logarithmic functional form).

(iv) The null hypothesis in the model from part (ii) is H0: β2 = −1. The t statistic is [−.951 − (−1)]/.37 = (.049)/.37 ≈ .13; this is very small, and we fail to reject whether we specify a one- or two-sided alternative.

4.8 (i) We use Property VAR.3 from Appendix B: Var(β̂1 − 3β̂2) = Var(β̂1) + 9 Var(β̂2) − 6 Cov(β̂1, β̂2).

(ii) t = (β̂1 − 3β̂2 − 1)/se(β̂1 − 3β̂2), so we need the standard error of β̂1 − 3β̂2.

(iii) Because θ1 = β1 − 3β2, we can write β1 = θ1 + 3β2. Plugging this into the population model gives

y = β0 + (θ1 + 3β2)x1 + β2 x2 + β3 x3 + u = β0 + θ1 x1 + β2 (3x1 + x2) + β3 x3 + u.

This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.

4.9 (i) With df = 706 − 4 = 702, we use the standard normal critical value (df = ∞ in Table G.2), which is 1.96 for a two-tailed test at the 5% level. Now t_educ = −11.13/5.88 ≈ −1.89, so |t_educ| = 1.89 < 1.96, and we fail to reject H0: β_educ = 0 at the 5% level. Also, t_age ≈ 1.52, so age is also statistically insignificant at the 5% level.
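The variance formula in Problem 4.8(i) is mechanical once estimates of the variances and the covariance are in hand. The numbers below are made up purely for illustration; they do not come from any dataset in the text:

```python
import math

# Var(b1 - 3*b2) = Var(b1) + 9*Var(b2) - 6*Cov(b1, b2)  (Problem 4.8(i))
var_b1, var_b2, cov_b12 = 0.25, 0.04, 0.01  # illustrative values only
var_diff = var_b1 + 9 * var_b2 - 6 * cov_b12
se_diff = math.sqrt(var_diff)  # standard error of b1 - 3*b2
print(round(var_diff, 2), round(se_diff, 2))  # 0.55 0.74
```

With this standard error in hand, the t statistic from part (ii) follows directly.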

(ii) We need to compute the R-squared form of the F statistic for joint significance. But F = [(.113 − .103)/(1 − .113)](702/2) ≈ 3.96. The 5% critical value in the F(2,702) distribution can be obtained from Table G.3b with denominator df = ∞: cv = 3.00. Therefore, educ and age are jointly significant at the 5% level (3.96 > 3.00). In fact, the p-value is about .019, and so educ and age are jointly significant at the 2% level.

(iii) Not really. These variables are jointly significant, but including them only changes the coefficient on totwrk from −.151 to −.148.

(iv) The standard t and F statistics that we used assume homoskedasticity, in addition to the other CLM assumptions. If there is heteroskedasticity in the equation, the tests are no longer valid.

4.10 (i) We need to compute the F statistic for the overall significance of the regression with n = 142 and k = 4: F = [.0395/(1 − .0395)](137/4) ≈ 1.41. The 5% critical value with 4 numerator df and 120 denominator df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is on dkr, t_dkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.

(ii) The F statistic (with the same df) is now [.0330/(1 − .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.

(iii) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.

4.11 (i) In columns (2) and (3), the coefficient on profmarg is actually negative, although its t statistic is only about −1. It appears that, once firm sales and market value have been controlled for, profit margin has no effect on CEO salary.

(ii) We use column (3), which controls for the most factors affecting salary.
The t statistic on log(mktval) is about 2.05, which is just significant at the 5% level against a two-sided alternative. (We can use the standard normal critical value, 1.96.) So log(mktval) is statistically significant. Because the coefficient is an elasticity, a ceteris paribus 10% increase in market value is predicted to increase salary by 1%. This is not a huge effect, but it is not negligible, either.

(iii) These variables are individually significant at low significance levels, with t_ceoten ≈ 3.11 and t_comten ≈ −2.79. Other factors fixed, another year as CEO with the company increases salary by about 1.71%. On the other hand, another year with the company, but not as CEO, lowers salary by about .92%. This second finding at first seems surprising, but could be related to the superstar effect: firms that hire CEOs from outside the company often go after a small pool of highly regarded candidates, and salaries of these people are bid up. More non-CEO years with a company makes it less likely the person was hired as an outside superstar.
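The R-squared form of the F statistic, used repeatedly in Problems 4.9 and 4.10, is easy to package as a small helper (the function name here is ours, not the text's). Plugging in the Problem 4.9(ii) values as reported in that solution:

```python
def f_stat_r2(r2_ur, r2_r, q, df_ur):
    """R-squared form of the F statistic for q exclusion restrictions."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

# Joint significance of educ and age in the sleep equation (Problem 4.9(ii)).
F = f_stat_r2(r2_ur=0.113, r2_r=0.103, q=2, df_ur=702)
print(round(F, 2))  # about 3.96, above the 5% critical value of 3.00
```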

SOLUTIONS TO COMPUTER EXERCISES

4.12 (i) Holding other factors fixed,

ΔvoteA = β1 Δlog(expendA) = (β1/100)[100·Δlog(expendA)] ≈ (β1/100)(%ΔexpendA),

where we use the fact that 100·Δlog(expendA) ≈ %ΔexpendA. So β1/100 is the (ceteris paribus) percentage point change in voteA when expendA increases by one percent.

(ii) The null hypothesis is H0: β2 = −β1, which means that a z% increase in expenditure by A and a z% increase in expenditure by B leaves voteA unchanged. We can equivalently write H0: β1 + β2 = 0.

(iii) The estimated equation (with standard errors in parentheses below estimates) is

voteA = 45.08 + 6.083 log(expendA) − 6.615 log(expendB) + .152 prtystrA
        (3.93)  (0.382)             (0.379)              (.062)
n = 173, R2 = .793.

The coefficient on log(expendA) is very significant (t statistic ≈ 15.92), as is the coefficient on log(expendB) (t statistic ≈ −17.45). The estimates imply that a 10% ceteris paribus increase in spending by candidate A increases the predicted share of the vote going to A by about .61 percentage points. [Recall that, holding other factors fixed, ΔvoteA ≈ (6.083/100)(%ΔexpendA).] Similarly, a 10% ceteris paribus increase in spending by B reduces the predicted voteA by about .66 percentage points. These effects certainly cannot be ignored. While the coefficients on log(expendA) and log(expendB) are of similar magnitudes (and opposite in sign, as we expect), we do not have the standard error of β̂1 + β̂2, which is what we would need to test the hypothesis from part (ii).

(iv) Write θ1 = β1 + β2, or β1 = θ1 − β2. Plugging this into the original equation, and rearranging, gives

voteA = β0 + θ1 log(expendA) + β2 [log(expendB) − log(expendA)] + β3 prtystrA + u.

When we estimate this equation we obtain θ̂1 ≈ −.532 and se(θ̂1) ≈ .533. The t statistic for the hypothesis in part (ii) is −.532/.533 ≈ −1. Therefore, we fail to reject H0: β2 = −β1.

4.13 (i) In the model

log(salary) = β0 + β1 LSAT + β2 GPA + β3 log(libvol) + β4 log(cost) + β5 rank + u,

the hypothesis that rank has no effect on log(salary) is H0: β5 = 0. The estimated equation (now with standard errors) is

log(salary) = 8.34 + .0047 LSAT + .248 GPA + .095 log(libvol) + .038 log(cost) − .0033 rank
              (0.53)  (.0040)      (.090)     (.033)            (.032)           (.0003)
n = 136, R2 = .842.

The t statistic on rank is −11, which is very significant. If rank decreases by 10 (which is a move up for a law school), median starting salary is predicted to increase by about 3.3%.

(ii) LSAT is not statistically significant (t statistic ≈ 1.18) but GPA is very significant (t statistic ≈ 2.76). The test for joint significance is moot given that GPA is so significant, but for completeness the F statistic is about 9.95 (with 2 and 130 df) and p-value ≈ .0001.

(iii) When we add clsize and faculty to the regression we lose five observations. The test of their joint significance (with 2 and 131 − 8 = 123 df) gives F ≈ .95 and p-value ≈ .39. So these two variables are not jointly significant unless we use a very large significance level.

(iv) If we want to just determine the effect of numerical ranking on starting law school salaries, we should control for other factors that affect salaries and rankings. The idea is that there is some randomness in rankings, or the rankings might depend partly on frivolous factors that do not affect the quality of the students. LSAT scores and GPA are perhaps good controls for student quality. However, if there are differences in gender and racial composition across schools, and systematic gender and race differences in salaries, we could also control for these. However, it is unclear why these would be correlated with rank. Faculty quality, as perhaps measured by publication records, could be included. Such things do enter rankings of law schools.

4.14 (i) The estimated model is

log(price) = 11.67 + .000379 sqrft + .0289 bdrms
             (0.10)  (.000043)       (.0296)
n = 88, R2 = .588.

Therefore, θ̂1 = 150(.000379) + .0289 = .0858, which means that an additional 150 square foot bedroom increases the predicted price by about 8.6%.

(ii) β2 = θ1 − 150β1, and so

log(price) = β0 + β1 sqrft + (θ1 − 150β1) bdrms + u
           = β0 + β1 (sqrft − 150 bdrms) + θ1 bdrms + u.

(iii) From part (ii), we run the regression log(price) on (sqrft − 150 bdrms) and bdrms, and obtain the standard error on bdrms. We already know that θ̂1 = .0858; now we also get se(θ̂1) = .0268. The 95% confidence interval reported by my software package is .0326 to .1390 (or about 3.3% to 13.9%).

4.15 The R-squared from the regression bwght on cigs, parity, and faminc, using all 1,388 observations, is about .0348. This means that, if we mistakenly use this in place of .0364, which is the R-squared using the same 1,191 observations available in the unrestricted regression, we would obtain F = [(.0387 − .0348)/(1 − .0387)](1,185/2) ≈ 2.40, which yields p-value ≈ .091 in an F distribution with 2 and 1,185 df. This is significant at the 10% level, but it is incorrect. The correct F statistic was computed as 1.42 in Example 4.9, with p-value ≈ .242.

4.16 (i) If we drop rbisyr the estimated equation becomes

log(salary) = 11.02 + .0677 years + .0158 gamesyr + .0014 bavg + .0359 hrunsyr
              (0.27)  (.0121)       (.0026)         (.0011)     (.0072)
n = 353, R2 = .625.

Now hrunsyr is very statistically significant (t statistic ≈ 4.99), and its coefficient has increased by about two and one-half times.

(ii) The equation with runsyr, fldperc, and sbasesyr added is

log(salary) = 10.41 + .0700 years + .0079 gamesyr + .00053 bavg + .0232 hrunsyr
              (2.00)  (.0120)       (.0027)         (.00110)     (.0086)
              + .0174 runsyr + .0010 fldperc − .0064 sbasesyr
                (.0051)         (.0020)        (.0052)
n = 353, R2 = .639.

Of the three additional independent variables, only runsyr is statistically significant (t statistic = .0174/.0051 ≈ 3.41). The estimate implies that one more run per year, other factors fixed, increases predicted salary by about 1.74%, a substantial increase. The stolen bases variable even has the wrong sign, with a t statistic of about −1.23, while fldperc has a t statistic of only .5. Most major league baseball players are pretty good fielders; in fact, the smallest fldperc is 800 (which means .800). With relatively little variation in fldperc, it is perhaps not surprising that its effect is hard to estimate.

(iii) From their t statistics, bavg, fldperc, and sbasesyr are individually insignificant. The F statistic for their joint significance (with 3 and 345 df) is about .69 with p-value ≈ .56. Therefore, these variables are jointly very insignificant.

4.17 (i) In the model

log(wage) = β0 + β1 educ + β2 exper + β3 tenure + u,

the null hypothesis of interest is H0: β2 = β3.

(ii) Let θ2 = β2 − β3. Then we can estimate the equation

log(wage) = β0 + β1 educ + θ2 exper + β3 (exper + tenure) + u

to obtain the 95% CI for θ2. This turns out to be about .002 ± 1.96(.0047), or about −.0072 to .0112. Because zero is in this CI, θ2 is not statistically different from zero at the 5% level, and we fail to reject H0: β2 = β3 at the 5% level.

4.18 (i) The minimum value is 0, the maximum is 99, and the average is about 56.16.

(ii) When phsrank is added to (4.26), we get the following:

log(wage) = 1.459 − .0093 jc + .0755 totcoll + .0049 exper + .00030 phsrank
            (0.024)  (.0070)    (.0026)         (.0002)       (.00024)
n = 6,763, R2 = .223.

So phsrank has a t statistic equal to only 1.25; it is not statistically significant. If we increase phsrank by 10, log(wage) is predicted to increase by 10(.00030) = .003. This implies a .3% increase in wage, which seems a modest increase given a 10 percentage point increase in phsrank. (However, the sample standard deviation of phsrank is about 24.)

(iii) Adding phsrank makes the t statistic on jc even smaller in absolute value, about 1.33, but the coefficient magnitude is similar to (4.26).
Therefore, the basic point remains unchanged: the return to a junior college is estimated to be somewhat smaller, but the difference is not significant at standard significance levels.
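The percentage interpretation used for phsrank in part (ii) is the usual log-level rule: in a log(wage) equation, a change of Δx in a level regressor changes wage by roughly 100·β·Δx percent. A sketch using the phsrank coefficient reported in the estimated equation:

```python
# Log-level approximation: % change in wage = 100 * beta * delta_x (Problem 4.18(ii)).
def pct_wage_change(beta, delta_x):
    return 100 * beta * delta_x

print(round(pct_wage_change(0.00030, 10), 1))  # about a .3% increase in wage
```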

(iv) The variable id is just a worker identification number, which should be randomly assigned (at least roughly). Therefore, id should not be correlated with any variable in the regression equation. It should be insignificant when added to (4.17) or (4.26). In fact, its t statistic is about .54.

4.19 (i) There are 2,017 single people in the sample of 9,275.

(ii) The estimated equation is

nettfa = −43.04 + .799 inc + .843 age
         (4.08)   (.060)     (.092)
n = 2,017, R2 = .119.

The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.

(iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.

(iv) The t statistic is (.843 − 1)/.092 ≈ −1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).

(v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; refer back to page 79 of the text.
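The t statistic in Problem 4.19(iv) illustrates testing against a hypothesized value other than zero: the null value is subtracted from the estimate before dividing by the standard error. A sketch using the age coefficient and standard error reported above:

```python
# t statistic for H0: beta_age = 1 vs. H1: beta_age < 1 (Problem 4.19(iv)).
b_age, se_age, null_value = 0.843, 0.092, 1.0
t_stat = (b_age - null_value) / se_age
print(round(t_stat, 2))  # about -1.71, so we reject H0 at the 5% level (one-sided)
```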