Last Name (Print): Solution
First Name (Print):
Student Number:

MGECHY L
Introduction to Regression Analysis
Term Test
Friday July, PM
Instructor: Victor Yu

Aids allowed: Calculator and one aid sheet (two 8.5" x 11" pages), written or typed on both sides
Time allowed: Two (2) hours

This exam consists of questions in pages including this cover page. It is the student's responsibility to hand in all 9 pages of this exam. Any missing page will get a zero mark. Show your work in part. No marks will be given if you do not show your work. This exam is worth % of your course grade.

The University of Toronto's Code of Behaviour on Academic Matters applies to all University of Toronto Scarborough students. The Code prohibits all forms of academic dishonesty including, but not limited to, cheating, plagiarism, and the use of unauthorized aids. Students violating the Code may be subject to penalties up to and including suspension or expulsion from the University.

Management, 1265 Military Trail, Toronto, ON, M1C 1A4, Canada
www.utsc.utoronto.ca/mgmt
Part I. Multiple Choice. marks in each question. No part marks. Circle only one answer. If there is more than one correct answer, circle the best one.

1. The model y x x u is x
(a) a simple regression model
(b) a linear multiple regression model
(c) a non-linear multiple regression model
(d) all of the above can be correct
(e) none of the above is correct

2. Let \(y = \beta_0 + \beta_1 x + u\) be a regression model with one regressor, and let \(\rho\) be the correlation between x and y. Which one of the following statements is false?
(a) If the F value in the ANOVA table is less than 1, the model is not significant.
(b) Testing \(H_0: \beta_1 = 0\) is equivalent to testing \(H_0: \rho = 0\).
(c) The OLS estimator \(b_1\) always has the same sign as the sample correlation coefficient r.
(d) To test the significance of the model, we must assume that the error u has a normal distribution with mean 0 and variance \(\sigma^2\).
(e) The estimated regression equation of y on x is always the same as that of x on y.

3. Let \(y = \beta_0 + \beta_1 x + u\) be a regression model with one regressor. To obtain the sample regression coefficient \(b_1\) using the method of least squares, which one of the following statements is false?
E b (a) n (b) y y i n (c) y i y i i n (d) y i y (e) i n i i y i y i is a minimum is a minimum

4. Given a set of data \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\). If the correlation coefficient is computed to be r., what is the correlation coefficient computed from the set of data x, y, x, y,..., x n, y n?
(a). (b). 7 (c). (d).7 (e) cannot be calculated from the given condition
5. For the pairs of measurements \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\), the OLS regression line of y on x is y. 4x ; and the OLS regression line of x on y is x y. What is the correlation coefficient between x and y?
(a).4 (b). (c). (d). (e).4

6. The regression \(R^2\) is a measure of
(a) whether or not X causes Y.
(b) the goodness of fit of your regression line.
(c) whether or not ESS > TSS.
(d) the square of the regression coefficient.
(e) none of the above

7. The reason why estimators have a sampling distribution is that
(a) economics is not a precise science.
(b) individuals respond differently to incentives.
(c) in real life you typically get to sample many times.
(d) the values of the explanatory variable and the error term differ across samples.
(e) the values of the explanatory variable and the error term are the same across samples.

8. Imagine you regressed earnings of individuals on a constant, a binary variable ("Male") which takes on the value 1 for males and is 0 otherwise, and another binary variable ("Female") which takes on the value 1 for females and is 0 otherwise. Because females typically earn less than males, you would expect
(a) the coefficient for Male to have a positive sign, and for Female a negative sign.
(b) both coefficients to be the same distance from the constant, one above and the other below.
(c) none of the OLS estimators to exist because there is perfect multicollinearity.
(d) this to yield a difference in means statistic.
(e) this to yield better results.

9. The intercept in the multiple regression model
(a) should be excluded if one explanatory variable has negative values.
(b) determines the height of the regression line.
(c) should be excluded because the population regression function does not go through the origin.
(d) is statistically significant if it is larger than 1.96.
(e) is always statistically significant.
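The question above on the two OLS lines (y on x, and x on y) rests on the identity that the product of the two slopes equals the squared correlation, with r taking the sign of the slopes. The sketch below checks that identity on a small made-up data set; the data values and the helper names `ols_slope` and `corr_from_slopes` are hypothetical and are not taken from the exam.

```python
import math


def ols_slope(xs, ys):
    """Slope of the OLS regression line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def corr_from_slopes(b_yx, b_xy):
    """Correlation recovered from the two slopes: r^2 = b_yx * b_xy."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("the two OLS slopes must share the same sign")
    # r carries the common sign of the two slopes.
    return math.copysign(math.sqrt(product), b_yx)


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

b_yx = ols_slope(xs, ys)   # slope of y on x
b_xy = ols_slope(ys, xs)   # slope of x on y
r = corr_from_slopes(b_yx, b_xy)
print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}, r = {r:.4f}")
```

The same identity explains why the two fitted lines coincide only when |r| = 1.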
10. In multiple regression, the \(R^2\) increases whenever a regressor is
(a) added unless the coefficient on the added regressor is exactly zero.
(b) added even when the coefficient on the added regressor is exactly zero.
(c) added unless there is heteroskedasticity.
(d) greater than 1.96 in absolute value.
(e) greater than 1.64 in absolute value.

11. In the multiple regression model, the adjusted \(R^2\)
(a) cannot be negative.
(b) will never be greater than the regression \(R^2\).
(c) equals the square of the correlation coefficient r.
(d) cannot decrease when an additional explanatory variable is added.
(e) is none of the above.

12. Let \(R^2_{unrestricted}\) and \(R^2_{restricted}\) be 0.4366 and 0.4149 respectively in multiple regression. The difference between the unrestricted and the restricted model is that you have imposed two restrictions. There are 420 observations. The F-statistic in this case is closest to
(a) 4.61 (b) 8.01 (c) 10.34 (d) 7.71 (e).4

13. Consider the following regression output where the dependent variable is test scores and the two explanatory variables are the student-teacher ratio (STR) and the percent of English learners (PctEL): \(\widehat{TestScore} = 698.9 - 1.10\,STR - 0.650\,PctEL\). You are told that the t-statistic on the student-teacher ratio coefficient is 2.56. The standard error therefore is approximately
(a) 0.25 (b) 1.96 (c) 0.650 (d) 0.43 (e) 1.64

Questions 14-17: Suppose in a sample of 5 men that their monthly income (in thousands of dollars), years of schooling and ages are as follows:

Men | y (income in $1,000) | \(x_1\) (Years of Schooling) | \(x_2\) (Age)
6 8 4 7 4 8 6 9 4 4
Assume a linear model \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u\). Computer outputs show the following results:

Regression Statistics
Multiple R .998
R Square .8449
Adj R Square .68989
Standard Error .9796
Observations

ANOVA
df SS MS F Significance F
Regression 6...44786.44
Residual..7
Total 4 74

Coefficients Standard Error
Intercept..6
x.8.89
x -..4

14. At % significance level, the model is
(a) Significant
(b) Not Significant
(c) not able to determine the significance

15. For someone with years of schooling and years of age, the expected monthly income is closest to:
(a) $,8 (b) $,88 (c) $6, (d) $6,7 (e) $6,9

16. At % significance, which one of the following statements is true?
(a) Both \(x_1\) and \(x_2\) are significant variables to predict y.
(b) Only \(x_1\) is significant, \(x_2\) is not significant.
(c) Only \(x_2\) is significant, \(x_1\) is not significant.
(d) Both \(x_1\) and \(x_2\) are not significant variables to predict y.
(e) None of the above is true.

17. A 9% confidence interval for is closest to
b t / SE b. 4..4 That is,.46,.76.. 96
Part II. Show your work in each question.

18. (6 marks) You have obtained a sample of 744 individuals from the Current Population Survey (CPS) and are interested in the relationship between weekly earnings and age. The regression yielded the following result:

\(\widehat{Earn} = 239.16 + 5.20\,Age\), \(R^2 = 0.05\), SER = 287.21,
(20.24) (0.57)

where Earn and Age are measured in dollars and years respectively.

(a) Interpret the regression coefficient 5.20. ( marks)
Solution: On average, a person who is one year older earns $5.20 more per week.

(b) Interpret the measures of fit \(R^2\). ( marks)
Solution: The regression \(R^2\) indicates that five percent of the variation in earnings is explained by the model. The typical error is $287.21.

(c) Is the relationship between Age and Earn statistically significant? (Use 5% significance level.) ( marks)
Solution: \(H_0: \beta_1 = 0\), \(H_1: \beta_1 \neq 0\).
Method 1: At 5% significance level, do not reject \(H_0\) if \(-1.96 \le t \le 1.96\); reject \(H_0\) if \(t < -1.96\) or \(t > 1.96\). Test statistic is \(t = b_1 / SE(b_1) = 5.20 / 0.57 = 9.1228\), which falls in the rejection region. Reject \(H_0\) and conclude that the model is significant.
Method 2: At 5% significance level, do not reject \(H_0\) if \(F \le 3.84\); reject \(H_0\) if \(F > 3.84\). Test statistic is \(F = (n - 2) R^2 / (1 - R^2)\) = 9.684, which falls in the rejection region. Reject \(H_0\) and conclude that the model is significant.

(d) Construct a 95% confidence interval for the slope. ( marks)
Solution: \(b_1 \pm t_{\alpha/2}\, SE(b_1) = 5.20 \pm 1.96(0.57) = 5.20 \pm 1.1172\), or (4.0828, 6.3172).
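The slope test and confidence interval in this question follow a fixed recipe: divide the estimate by its standard error, compare with the critical value, and step the same critical value out on both sides of the estimate. A minimal sketch follows, assuming a slope of 5.20 with standard error 0.57 and the large-sample critical value 1.96; these inputs and the function name `slope_inference` are assumptions made for illustration.

```python
def slope_inference(b, se, crit=1.96):
    """Return the t-ratio, the confidence interval, and the test decision.

    b    : estimated slope
    se   : standard error of the slope
    crit : two-sided critical value (1.96 for a 5% test in large samples)
    """
    t = b / se
    interval = (b - crit * se, b + crit * se)
    reject_h0 = abs(t) > crit
    return t, interval, reject_h0


# Assumed inputs for illustration.
t, (lower, upper), reject_h0 = slope_inference(b=5.20, se=0.57)
print(f"t = {t:.4f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
print("reject H0" if reject_h0 else "do not reject H0")
```

The interval excludes zero exactly when the two-sided t-test rejects, so parts (c) and (d) always agree.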
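19. (6 marks) An analyst studies the effects of age (\(x_1\)), body size (\(x_2\)), and smoking history (Z) on systolic blood pressure (y) for a sample of people. The multiple regression model is:

\(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 Z + \beta_4 x_1 Z + \beta_5 x_2 Z + u\)

The fitted regression equations for smokers and non-smokers are, respectively:
Smokers: y 48.7.466x 6. 744x
Non-smokers: y 48.6.9x. 4x

(a) Obtain the estimates of \(\beta_0, \beta_1, \beta_2, \beta_3, \beta_4\), and \(\beta_5\) in the model above. Write your answers down below. Show your work below. (6 marks)

Solution: For smokers (Z = 1), the model is \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 + \beta_4 x_1 + \beta_5 x_2 + u\), which is \(y = (\beta_0 + \beta_3) + (\beta_1 + \beta_4) x_1 + (\beta_2 + \beta_5) x_2 + u\).
Therefore \(b_0 + b_3\) = 48. 7, \(b_1 + b_4\) = . 466, \(b_2 + b_5\) = 6. 744.
For non-smokers (Z = 0), the model is \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u\), hence \(b_0\) = 48.6, \(b_1\) = .9, \(b_2\) = . 4.
And \(b_3\) = . 8, \(b_4\) = . 47, \(b_5\) = . 77.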
(b) The Sum of Squares in the ANOVA table for the model with parameters \(\beta_0, \beta_1, \beta_2, \beta_3, \beta_4\), and \(\beta_5\) are given below.

SS df MS F
Regression 4916
Residual
Total 6426

Test the overall significance of the model using a significance level 0.05. (You must write down the null and alternative hypotheses, the test statistic and the conclusion.) (4 marks)

Solution: \(H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0\), \(H_1\): At least one \(\beta_i \neq 0\), i = 1, 2, 3, 4, 5.
[Alternatively, \(H_0\): Model is NOT significant, \(H_1\): Model is significant.]
The ANOVA table is

             SS     df   MS        F
Regression   4916    5   983.2     16.93
Residual     1510   26   58.0769
Total        6426   31

At 5% significance level, reject \(H_0\) if F > 2.59, do not reject \(H_0\) if F ≤ 2.59, where F has df = (5, 26). The F statistic from the ANOVA table is 16.93; reject \(H_0\) and conclude that the model is significant.

(c) The Sum of Squares in the ANOVA table for the model with parameters \(\beta_0, \beta_1, \beta_2\), and \(\beta_3\) are given below.

SS df MS F
Regression 4890
Residual
Total

Test the hypothesis \(H_0: \beta_4 = \beta_5 = 0\) versus \(H_1\): At least one of \(\beta_4\) or \(\beta_5\) is non-zero. Use the significance level 0.05. (6 marks)

Solution: The partial F test is
\(F = \dfrac{(SSR_{res} - SSR_{unres}) / q}{SSR_{unres} / (n - k - 1)} = \dfrac{26 / 2}{58.0769} = 0.22384\),
where SSR denotes the residual sum of squares of the restricted and unrestricted models. The conclusion is not to reject \(H_0\).
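The partial F test in part (c) compares the residual sums of squares of the restricted and unrestricted models. A minimal sketch of that calculation follows. The inputs are assumptions made for illustration: residual sums of squares of 1536 (restricted) and 1510 (unrestricted), q = 2 restrictions, 26 residual degrees of freedom, and 3.37 as the tabulated 5% critical value of F(2, 26). The function name `partial_f` is hypothetical.

```python
def partial_f(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """Partial F statistic for q linear restrictions.

    ssr_* are residual sums of squares; df_unrestricted is n - k - 1
    for the unrestricted model.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


# Assumed inputs for illustration.
f_stat = partial_f(ssr_restricted=1536.0, ssr_unrestricted=1510.0,
                   q=2, df_unrestricted=26)
F_CRIT = 3.37  # assumed 5% critical value for F(2, 26) from a standard F-table

print(f"F = {f_stat:.5f}")
print("reject H0" if f_stat > F_CRIT else "do not reject H0")
```

Because the restricted model can never fit better than the unrestricted one, the numerator is non-negative, and a value this close to zero says the two interaction terms add almost nothing.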
20. (7 marks) The cost of attending your college has once again gone up. Although you have been told that education is investment in human capital, which carries a return of roughly 10% a year, you (and your parents) are not pleased. One of the administrators at your university/college does not make the situation better by telling you that you pay more because the reputation of your institution is better than that of others. To investigate this hypothesis, you collect data randomly for 100 national universities and liberal arts colleges from the 2000-2001 U.S. News and World Report annual rankings. Next you perform the following regression:

\(\widehat{Cost}\) = 7,311.17 + 3,985.20 Reputation - 0.20 Size + 8,406.79 Dpriv - 416.38 Dlibart - 2,376.51 Dreligion
        (2,058.63)  (664.58)             (0.13)      (2,154.85)       (1,121.92)       (1,007.86)
\(R^2\) = 0.72, SER = 3,773.35

where Cost is Tuition, Fees, Room and Board in dollars, Reputation is the index used in U.S. News and World Report (based on a survey of university presidents and chief academic officers), which ranges from 1 ("marginal") to 5 ("distinguished"), Size is the number of undergraduate students, and Dpriv, Dlibart, and Dreligion are binary variables indicating whether the institution is private, a liberal arts college, and has a religious affiliation. The numbers in parentheses are standard errors.

(a) Indicate whether or not each coefficient is significantly different from zero. (4 marks)

Solution: \(H_0: \beta_i = 0\), \(H_1: \beta_i \neq 0\), i = 1, 2, 3, 4, 5.
For \(\alpha = 0.05\), reject \(H_0\) if t < -1.984 or t > 1.984. Do not reject \(H_0\) if \(-1.984 \le t \le 1.984\), where we have used the t distribution with 100 degrees of freedom. The actual degrees of freedom for this t-test is n - k - 1 = 94, but the corresponding t-value is not available from our t-table. [To marker: some students may use 1.96 from the Z-table since the sample size is large. Please consider it correct.]
The t-test for each coefficient is \(t = b_i / SE(b_i)\):

Variable     Coefficient          SE                     t         Conclusion
Reputation   b1 = 3985.20         SE(b1) = 664.58        5.9966    significant
Size         b2 = -0.20           SE(b2) = 0.13          -1.538    not significant
Dpriv        b3 = 8406.79         SE(b3) = 2154.85       3.9013    significant
Dlibart      b4 = -416.38         SE(b4) = 1121.92       -0.3711   not significant
Dreligion    b5 = -2376.51        SE(b5) = 1007.86       -2.358    significant
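The coefficient-by-coefficient test in part (a) is the same ratio applied five times. The sketch below runs it over a small table; the coefficient and standard-error values are assumed inputs for illustration, 1.984 is the two-sided 5% critical value used in the solution, and the helper name `t_ratios` is hypothetical.

```python
# (coefficient, standard error) per regressor: assumed inputs for illustration.
coefs = {
    "Reputation": (3985.20, 664.58),
    "Size": (-0.20, 0.13),
    "Dpriv": (8406.79, 2154.85),
    "Dlibart": (-416.38, 1121.92),
    "Dreligion": (-2376.51, 1007.86),
}
T_CRIT = 1.984  # two-sided 5% critical value, t distribution with 100 df


def t_ratios(table, crit):
    """Map each regressor to (t-ratio, significant at the given critical value)."""
    results = {}
    for name, (b, se) in table.items():
        t = b / se
        results[name] = (t, abs(t) > crit)
    return results


results = t_ratios(coefs, T_CRIT)
for name, (t, significant) in results.items():
    verdict = "significant" if significant else "not significant"
    print(f"{name:<10} t = {t:8.4f}  {verdict}")
```

Swapping `T_CRIT` for 1.96 gives the large-sample version of the same test and, for these inputs, the same five conclusions.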
(b) What is the p-value for the null hypothesis that the coefficient on Size is equal to zero? Based on this, should you eliminate the variable from the regression? Why or why not? ( marks)

Solution: From the Z-table, p-value = P(|Z| > 1.54) = 2(0.0618) = 0.1236.
Alternative solution: From the t-table with 100 degrees of freedom, 0.05 < p-value/2 < 0.10, hence 0.10 < p-value < 0.20.

(c) You want to test simultaneously the hypotheses that \(\beta_{Size} = 0\) and \(\beta_{Dlibart} = 0\). Your regression package returns the F-statistic of 1.23. At 5% significance level, can you reject the null hypothesis? ( marks)

Solution: The degrees of freedom for this partial F test is (2, 94). From the F-table, the critical value is 3.09. Reject \(H_0\) if F > 3.09, do not reject \(H_0\) if F ≤ 3.09. Since the regression package returns the F-statistic of 1.23, we do not reject the null hypothesis.

(d) Eliminating the Size and Dlibart variables from your regression, the estimated regression becomes

\(\widehat{Cost}\) = 5,450.35 + 3,538.84 Reputation + 10,935.70 Dpriv - 2,783.31 Dreligion;
        (1,772.35)  (590.49)             (875.51)          (1,180.57)
\(R^2\) = 0.72, SER = 3,792.68

Test the overall significance of this model. Why do you think that the effect of attending a private institution has increased now? ( marks)

Solution: \(H_0\): Model is not significant, \(H_1\): Model is significant.
At 5% significance level, reject \(H_0\) if F > 2.70 and do not reject \(H_0\) if F ≤ 2.70, where F has df = (3, 96). We have used df = (3, 100) here, which is the closest.
Test statistic is \(F = \dfrac{R^2 / k}{(1 - R^2) / (n - k - 1)} = \dfrac{0.72 / 3}{(1 - 0.72) / (100 - 3 - 1)} = \dfrac{0.24}{0.00292} = 82.1918\).
Reject \(H_0\) and conclude that the model is significant.
Private institutions are smaller, on average, and some of these are liberal arts colleges. Both of these variables had negative coefficients, so once Size and Dlibart are dropped, the Dpriv coefficient absorbs part of their effect.
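Two calculations from this question are easy to check numerically: the overall F statistic computed from \(R^2\), and a two-sided p-value from the standard normal distribution. The sketch below uses only the Python standard library. The inputs (R² = 0.72, n = 100, k = 3, z = 1.54) are assumed values for illustration, and the function names are hypothetical.

```python
import math


def overall_f(r2, n, k):
    """Overall F statistic from R^2 with k regressors and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def two_sided_p_normal(z):
    """Two-sided p-value P(|Z| > |z|) for a standard normal Z.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# Assumed inputs for illustration.
f_stat = overall_f(r2=0.72, n=100, k=3)
p_value = two_sided_p_normal(1.54)

print(f"F = {f_stat:.4f}")
print(f"two-sided p-value = {p_value:.4f}")
```

Carrying full precision gives F of about 82.29. Rounding the denominator to a few digits before dividing shifts the second decimal but not the conclusion.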