Multiple Regression Analysis: Further Issues

Size: px

Start display at page:

Download "Multiple Regression Analysis: Further Issues"

Curtis Anthony
5 years ago
Views:

1 Multiple Regression Analysis: Further Issues Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) MLR: Further Issues 1 / 36

2 Effects of Data Scaling on OLS Statistics Effects of Data Scaling on OLS Statistics Ping Yu (HKU) MLR: Further Issues 2 / 36

3 Effects of Data Scaling on OLS Statistics Section 2.4: SLR The fitted regression line in the example of CEO salary and return on equity is \salary = roe, n = 209,R 2 =.0132 How will the intercept and slope estimates change when the units of measurement of the dependent and independent variables changes? Suppose the salary is measured in dollars rather than thousands of dollars, the intercept should be 963,191 and the slope should be 18,501. (why?) Solution: convert a new problem to an old problem whose solution is known. Ping Yu (HKU) MLR: Further Issues 3 / 36

4 Effects of Data Scaling on OLS Statistics Relationship between Objective Functions Recall that where arg means arguments. If y i = w 1 y i and x i bβ 0, β b n 1 = arg min β 0,β 1 i=1 = w 2 x i, then (y i β 0 β 1 x i ) 2, n min (y β 0,β i β 0 β 1 x i ) 2 1 i=1 n = min (w 1 y i β β 0,β 0 β 1 w 2x i ) 2 1 i=1 = min w 2 n β β 0,β y i β w 2 1 x w 1 i=1 1 w i. 1 Ping Yu (HKU) MLR: Further Issues 4 / 36

5 Effects of Data Scaling on OLS Statistics Relationship between b β s and bσ 2 s So (why?) Also, bσ 2 = = = bβ 0 = bβ 0 and β b w 1 = β b w w 1 =) b β 0 = w 1 b β 0 and b β 1 = w 1 w 2 b β 1 n 1 n 2 bu i 2 = 1 n n 2 yi i=1 i=1 w1 2 n n 2 y i i=1 β b 0 2 β b 1 x i w1 2 n n 2 bu i 2 i=1 = w 2 1 bσ 2. bβ 0 bβ 2 1 x i Or, the standard error of the regression (SER) bσ = w 1 bσ. Ping Yu (HKU) MLR: Further Issues 5 / 36

6 Effects of Data Scaling on OLS Statistics Relationship between R 2 s and Standard Errors It turns out that R 2 = R 2 : R 2 = 1 = 1 SSR SST n i=1 yi n i=1 bβ 0 y i y 2 bβ 2 1 x i Standard Errors: = 1 = 1 w 2 1 n i=1 bu2 i w 2 1 n i=1 (y i y) 2 SSR SST = R2 bβ 0 = w b 1β 0 and β b 1 = w 1 β b w 1 2 =) se bβ 0 = w 1 se bβ 0 and se bβ 1 = w 1 se bβ w 1 2 Ping Yu (HKU) MLR: Further Issues 6 / 36

7 Effects of Data Scaling on OLS Statistics Relationship between t Statistics and CIs (Section 6.1: MLR) It is natural to predict that t statistic is the same as before: t bβ 1 bβ 1 = b β 1 se = w 1 w 2 h bβ 1 = w 1 b w 2 β 1 β = b 1 = t w bβ. 1 w 2 se bβ 1 se bβ 1 1 The CI for β 1 is the original CI multiplied by w 1 w 2 : h bβ se bβ 1, β b se bβ i se bβ 1, β b se bβ 1 i. The results for β 0 are similar; the only difference is to replace w 1 w 2 by w 1. Ping Yu (HKU) MLR: Further Issues 7 / 36

8 Effects of Data Scaling on OLS Statistics Unit Change in Logarithmic Form Changing the unit of measurement of y and x, when they appear in logarithmic form, does not affect any of the slope estimates, but may affect the intercept estimate. why? so n min [log(y β 0,β i ) β 0 β 1 log(x i )] 2 1 i=1 n = min [log(y i ) + log(w 1 ) β β 0,β 0 β 1 log(x i ) β 1 log(w 2)] 2 1 i=1 n = min [log(y i ) (β β 0,β 0 + β 1 log(w 2) log(w 1 )) β 1 log(x i )] 2, 1 i=1 bβ 0 = b β 0 + b β 1 log(w 2) log(w 1 ) and b β 1 = b β 1 =) b β 0 = b β 0 b β 1 log(w 2 ) + log(w 1 ) and b β 1 = b β 1. I.e., the elasticity is invariant to the units of measurement of either y or x, and the intercept is related to both the original intercept and slope. Ping Yu (HKU) MLR: Further Issues 8 / 36

9 Effects of Data Scaling on OLS Statistics Example: Japanese Learning Suppose we want to study the SLR, y i = β 0 + β 1 x i + u i, where y i = exam mark in Japanese language course x i = hours of study per day during the semester (average hours) The fitted regression line is by i = x i (5.6) (3.5) If you do not study at all, the predicted mark is 20. One additional hour of study per day increases exam mark by 14 marks. At the mean value x = 3 hours of study per day is expected to result in a mark of y = 62. Ping Yu (HKU) MLR: Further Issues 9 / 36

10 Effects of Data Scaling on OLS Statistics continue Now if we report hours of study per week xi rather than per day, the variable has been scaled: xi = w 2 x i = 7x i and yi = w 1 y i = y i. If we run the regression based on xi and y i, we get by i = xi (5.6) (0.5) Each additional hour of study per week increases the exam mark by 2 marks (intuition here). 2 t statistic remains the same: 0.5 = The CI for β 1, [ , ] = [1.02,2.98], is 1/7 of the CI for β 1, which is [ , ] = [7.14,20.98]. Note that the means x and y will still be on the newly estimated regression line: y = x 62 = Ping Yu (HKU) MLR: Further Issues 10 / 36

11 More on Functional Form More on Functional Form Ping Yu (HKU) MLR: Further Issues 11 / 36

12 More on Functional Form More on Using Logarithmic Functional Forms Logarithmic transformations have the convenient percentage/elasticity interpretation. Slope coefficients of logged variables are invariant to rescalings. Taking logs often eliminates/mitigates problems with outliers. (why? figure here) Taking logs often helps to secure normality (e.g., log(wage) vs. wage) and homoskedasticity (see Chapter 8 for an example). Variables measured in units such as years (e.g., education, experience, tenure, age, etc) should not be logged. Variables measured in percentage points (e.g., unemployment rate, participation rate of a pension plan, etc.) should also not be logged. Logs must not be used if variables take on zero or negative values (e.g., hours of work during a month). It is hard to reverse the log-operation when constructing predictions. (we will discuss more on this point later in this chapter) Ping Yu (HKU) MLR: Further Issues 12 / 36

13 More on Functional Form logx Figure: logx < x and lim x! x = 0 Ping Yu (HKU) MLR: Further Issues 13 / 36

14 More on Functional Form Models with Quadratics Example: Suppose the fitted regression line for the wage equation is \wage = exper.0061exper 2 (.35) (.041) (.0009) n = 526,R 2 =.093 (quite small) The predicted wage is a concave function of exper. [figure here] The marginal effect of exper on wage is wage exper = b β b β 2 exper = exper. The first year of experience increases the wage by some $.30, the second year by.298 2(.0061)(1) = $.29 < $.30 etc. Ping Yu (HKU) MLR: Further Issues 14 / 36

15 More on Functional Form Figure: Wage Maximum with Respect to Work Experience Mincer (1974, Schooling, Experience and Earnings) assumed log(wage) = β 0 + β 1 educ + β 2 exper + β 3 exper 2 + u. Ping Yu (HKU) MLR: Further Issues 15 / 36

More on Functional Form History of the Wage Equation Jacob Mincer (1922-2006),

16 More on Functional Form History of the Wage Equation Jacob Mincer ( ), Columbia, father of modern labor economics Ping Yu (HKU) MLR: Further Issues 16 / 36

17 More on Functional Form Example: Effects of Pollution on Housing Prices The fitted regression line is \ log(price) = log(nox).087log(dist) (.57) (.115) (.043).545rooms +.062rooms 2.048stratio (.165) (.013) (.006) n = 506,R 2 =.603 where nox = nitrogen oxide in air dist = distance from employment centers, in miles stratio = student/teacher ratio The predicted log(price) is a convex function of rooms. [figure here] The coefficient of rooms is negative. Does this mean that, at a low number of rooms, more rooms are associated with lower prices? The marginal effect of rooms on log(price) is log(price) rooms = price/price rooms = rooms. Ping Yu (HKU) MLR: Further Issues 17 / 36

18 More on Functional Form Figure: Calculation of the Turnaround Point Ping Yu (HKU) MLR: Further Issues 18 / 36

19 More on Functional Form Other Possibilities Using Quadratics Along with Logarithms: which implies log(price) = β 0 + β 1 log(nox) + β 2 log(nox) 2 +β 3 crime + β 4 rooms + β 5 rooms 2 + β 6 stratio + u, log(price) log(nox) = % price % nox = β 1 + 2β 2 log(nox). Higher Order Polynomials: It is often assumed that the total cost takes the following form, cost = β 0 + β 1 quantity + β 2 quantity 2 + β 3 quantity 3 + u, which implies a U-shaped marginal cost (MC). [figure here] Ping Yu (HKU) MLR: Further Issues 19 / 36

20 More on Functional Form TVC TC TFC 0 0 Figure: Quadratic MC Implies Cubic TC: q is the inflection point Ping Yu (HKU) MLR: Further Issues 20 / 36

21 More on Functional Form Models with Interaction Terms In the model price = β 0 + β 1 sqrft + β 2 bdrms + β 3 sqrft bdrms + β 4 bthrms + u, sqrft bdrms is the interaction term. The marginal effect of bdrms on price is price bdrms = β 2 + β 3 sqrft. The effect of the number of bedrooms depends on the level of square footage. Interaction effects complicate interpretation of parameters: β 2 is the effect of number of bedrooms, but for a square footage of zero. How to avoid this interpretation difficulty? Ping Yu (HKU) MLR: Further Issues 21 / 36

22 More on Functional Form Reparametrization of Interaction Effects The model can be reparametrized as y = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 1 x 2 + u y = α 0 + δ 1 x 1 + δ 2 x 2 + β 3 (x 1 µ 1 ) (x 2 µ 2 ) + u, where µ 1 = E [x 1 ] and µ 2 = E [x 2 ] are population means of x 1 and x 2, and can be replaced by their sample means. What is the relationship between bα 0, δ b 1, δ b 2 and bβ 0, β b 1, β b 2? (Exercise) Now, y x 2 = δ 2 + β 3 (x 1 µ 1 ), i.e., δ 2 is the effect of x 2 if all variables take on their mean values. Advantages of reparametrization: - It is easy to interpret all parameters. - Standard errors for partial effects at the mean values are available. - If necessary, interaction may be centered at other interesting values. Ping Yu (HKU) MLR: Further Issues 22 / 36

23 More on Goodness-of-Fit and Selection of Regressors More on Goodness-of-Fit and Selection of Regressors Ping Yu (HKU) MLR: Further Issues 23 / 36

24 More on Goodness-of-Fit and Selection of Regressors Adjusted R-Squared General remarks on R-squared: - A high R-squared does not imply that there is a causal interpretation. - A low R-squared does not preclude precise estimation of partial effects. Recall that R 2 SSR/n = 1 SST /n = 1 eσ 2 u eσ 2, y so R 2 is estimating the population R-squared ρ 2 = 1 σ 2 u σ 2, y the proportion of the variation in y in the population explained by the independent variables. Adjusted R-Squared: R 2 = 1 SSR/ (n k 1) SST / (n 1) = 1 bσ 2 u bσ 2, y is sometimes also called R-bar squared, where bσ 2 u and bσ 2 y are unbiased estimators of σ 2 u and σ 2 y due to the correction of dfs. Ping Yu (HKU) MLR: Further Issues 24 / 36

25 History of R 2 More on Goodness-of-Fit and Selection of Regressors Henri Theil ( ), Chicago and Florida Ping Yu (HKU) MLR: Further Issues 25 / 36

26 More on Goodness-of-Fit and Selection of Regressors continue R 2 takes into account degrees of freedom of the numerator and denominator, so is generally a better measure of goodness-of-fit. R 2 imposes a penalty for adding new regressors: k " =) R 2 # R 2 increases if and only if the t statistic of a newly added regressor is greater than one in absolute value. [proof not required] - Compared with y = β 0 + β 1 x 1 + u, the regression y = β 0 + β 1 x 1 + β 2 x 2 + u has a larger R 2 if and only if tbβ 2 > 1. Relationship between R 2 and R 2 : by we have 1 R 2 = SSR SST = n k 1 n 1 R 2 = 1 SSR/ (n k 1) SST / (n 1) unless k = 0 or R 2 = 1. Note that R 2 even gets negative if R 2 < n k 1. 1 R 2 n 1 n k 1 < R2 = n k 1 1 R 2, n 1 Ping Yu (HKU) MLR: Further Issues 26 / 36

27 More on Goodness-of-Fit and Selection of Regressors Using Adjusted R-squared to Choose between Nonnested Models Models are nonnested if neither model is a special case of the other. For example, to incorporate diminishing return of sales to R&D, we consider two models: where rdintens = R&D intensity. rdintens = β 0 + β 1 log(sales) + u, rdintens = β 0 + β 1 sales + β 2 sales 2 + u, R 2 =.061 and R 2 =.030 in model 1 and R 2 =.148 and R 2 =.090 in model 2. A comparison between the R-squared of both models would be unfair to the first model because the first model contains fewer parameters. In the given example, even after adjusting for the difference in degrees of freedom, the quadratic model is preferred. Ping Yu (HKU) MLR: Further Issues 27 / 36

28 More on Goodness-of-Fit and Selection of Regressors Comparing Models with Different Dependent Variables R-squared or adjusted R-squared must not be used to compare models which differ in their definition of the dependent variable. Example (CEO Compensation and Firm Performance): and \salary = sales roe (223.90)(.0089) (11.08) n = 209,R 2 =.029,R 2 =.020,SST = 391,732,982 \ lsalary = lsales roe (0.29)(.033) (.0040) n = 209,R 2 =.282,R 2 =.275,SST = There is much less variation in log(salary) that needs to be explained than in salary, so it is not fair to compare R 2 and R 2 of the two models. (we will discuss how to compare the fitting of these two models later in this chapter) Ping Yu (HKU) MLR: Further Issues 28 / 36

29 More on Goodness-of-Fit and Selection of Regressors Controlling for Too Many Factors in Regression Analysis In some cases, certain variables should not be held fixed: - In a regression of traffic fatalities on state beer taxes (and other factors) one should not directly control for beer consumption. - why? Beer taxes influence traffic fatalities only through beer consumption; if beer consumption is controlled, then the coefficient of beer taxes measures the indirect effect of beer taxes, which is hardly interesting. - In a regression of family health expenditures on pesticide usage among farmers one should not control for doctor visits. - why? Health expenditures include doctor visits, an we would like to pick up all effects of pesticide use on health expenditure. Different regressions may serve different purposes: - In a regression of house prices on house characteristics, one would include price assessments and also housing attributes if the purpose of the regression is to study the validity of assessments; one should not include price assessments if the purpose of the regression is to estimate a hedonic price model, which measures the marginal values of various housing attributes. Ping Yu (HKU) MLR: Further Issues 29 / 36

30 More on Goodness-of-Fit and Selection of Regressors Adding Regressors to Reduce the Error Variance Recall that Var bβ j = σ 2 SST j 1 R 2 j. - Adding regressors may exacerbate multicollinearity problems (R 2 j "). - On the other hand, adding regressors reduces the error variance (σ 2 #). Variables that are uncorrelated with other regressors should be added because they reduce error variance (σ 2 #) without increasing multicollinearity (Rj 2 remains the same). However, such uncorrelated variables may be hard to find. Example (Individual Beer Consumption and Beer Prices): Including individual characteristics in a regression of beer consumption on beer prices leads to more precise estimates of the price elasticity if individual characteristics are uncorrelated with beer prices. log(cons) = β 0 + β 1 log(price) + indchar {z } + u. uncorrelated with log(price) Ping Yu (HKU) MLR: Further Issues 30 / 36

31 Prediction and Residual Analysis (*) Prediction and Residual Analysis Ping Yu (HKU) MLR: Further Issues 31 / 36

32 Prediction and Residual Analysis Predicting y When log(y) is the Dependent Variable We study only this prediction problem as promised. Note that implies log(y) = β 0 + β 1 x β k x k + u y = exp(β 0 + β 1 x β k x k )exp(u) = m(x)exp(u). Under the additional assumption that u is independent of (x 1,,x k ), we have E [yjx] = exp(β 0 + β 1 x β k x k )E [exp(u)jx] = exp(β 0 + β 1 x β k x k )E [exp(u)] m(x)α 0, where the second equality is due to the independence between u and x, so the predicted y is by = bm(x)bα 0 where bm(x) = exp bβ 0 + β b 1 x β b k x k and bα 0 = 1 n n exp(bu i ). i=1 Ping Yu (HKU) MLR: Further Issues 32 / 36

33 Prediction and Residual Analysis E [exp(u)] 1 Recall that E [u] = 0, so E [exp(u)] exp(e [u]) = exp(0) = 1. In the following figure, suppose u takes only two values u 1 and u 2 with probability 1 2 and 1 2, respectively. Since E [u] = 1 2 (u 1 + u 2 ) = 0, u 1 = u 2. Now, E [exp(u)] = 1 2 (exp(u 1) + exp(u 2 )) 1 exp 2 (u 1 + u 2 ) = exp(e [u]) = 1, where the equality is achieved only if u 1 = u 2 = 0, i.e., u = 0. As a result, ey = bm(x) = exp \ log(y) under-estimates E [yjx]! Ping Yu (HKU) MLR: Further Issues 33 / 36

34 Prediction and Residual Analysis Figure: Illustration of Jensen s Inequality Ping Yu (HKU) MLR: Further Issues 34 / 36

35 Prediction and Residual Analysis Comparing R-Squared of a Logged and an Unlogged Specification Reconsider the CEO salary problem: and \salary = sales mktval ceoten (65.23) (.0100) (.0095) (5.61) n = 177,R 2 =.201 \ lsalary = lsales mktval ceoten (0.257)(.039) (.050) (.0053) n = 177, e R 2 =.318 R 2 and e R 2 are the R-squareds for the predictions of the unlogged salary variable (although the second regression is originally for logged salaries). Both R-squareds can now be directly compared Ping Yu (HKU) MLR: Further Issues 35 / 36

36 Prediction and Residual Analysis About e R 2 Recall that where by is the predicted value of y. R 2 = [ Corr (y, by) 2, When lsalary is the dependent variable, the predicted value of y is bm(x)bα 0 = bα 0 ey. Since bα 0 > 0, [Corr (y, by) = [ Corr (y, bα 0 ey) = [ Corr (y, ey) invariant to bα 0, where recall that for any a > 0, Corr (X,aY ) = Corr (X,Y ) = Cov (X,Y ) p Var (X )Var (Y ). As a result, er 2 = [ Corr (y, ey) 2. Ping Yu (HKU) MLR: Further Issues 36 / 36

7. Prediction. Outline: Read Section 6.4. Mean Prediction

7. Prediction. Outline: Read Section 6.4. Mean Prediction Outline: Read Section 6.4 II. Individual Prediction IV. Choose between y Model and log(y) Model 7. Prediction Read Wooldridge (2013), Chapter 6.4 2 Mean Prediction Predictions are useful But they are subject