Ch 3 & 4 - Regression Analysis

Simple Regression Model

I. Multiple Choice:

1. A simple regression is a regression model that contains
   a. only one independent variable
   b. only one dependent variable
   c. more than one independent variable
   d. both a and b

2. The least squares regression line minimizes the sum of
   a. errors
   b. squared errors
   c. predictions
   d. none of the above

3. The value of the correlation coefficient is always in the range
   a. 0 to 1
   b. -1 to 1
   c. -1 to 0
   d. none of the above

4. MS(Reg) = 400, MS(Residual error) = 00, MS(Total) = 5. What is the standard deviation of the errors?
   a. 0
   b. 0
   c. 5
   d. 30

5. In question (4), what is the value of the F statistic?
   a. 0.8
   b. 4
   c. .778
   d. 3

6. Eleven cars of a certain model, between one and seven years of age, were randomly selected from the classified ads. Data were collected on their ages (x in years) and prices (y in 000 dollars). The least-squares regression equation is ŷ 9.8.56 x. Which of the following statements is correct?
   a. The price will increase by $560 for every 1-year increase in age.
   b. The price will decrease by $560 for every 1-year decrease in age.
   c. The relationship between price and age is positive.
   d. The price for a new car (with 0 year of age) is $9800.
   e. None of the above.

7. In linear regression analysis, the F statistic is large when
   a. the amount of variation unexplained by the model is large compared to the amount of variation explained by the model
   b. the amount of variation explained by the model is large compared to the amount of variation unexplained by the model
   c. the amount of variation explained by the model is small compared to the amount of variation unexplained by the model
   d. the amount of variation explained by the model and the amount of variation unexplained by the model are always equal

8. For children between the ages of 8 months and 9 months, there is an approximately linear relationship between "height" and "age". The relationship can be represented by: Ŷ = 64.93 + 0.63(x), where Y represents height (in Ahmad's residual (Closest to)?
   a. 79.
   b. -0.9
   c. +0.9
   d. 56.6
   e. 64.93

9. Suppose a straight line is fit to data having response variable y and independent variable x. The slope of the line is negative and the portion of the variation in the values of y that is explained by the least-squares regression of y on x is 0.64. What is the value of the correlation coefficient between x and y?
   a. 0.64
   b. -0.8
   c. -0.64
   d. 0.8
   e. 0.50

10. Which of the following statements is true?
   a. The correlation coefficient equals the proportion of times two variables lie on a straight line.
   b. The correlation coefficient will be +1.0 only if all the data lie on a perfectly horizontal straight line.
   c. The correlation coefficient measures the fraction of outliers that appear in a scatter plot.
   d. The correlation coefficient is a unit-less number and must always be between -1.0 and +1.0.
   e. All of the above.
11. In a simple regression analysis, if R 0. 5 and the total sum of squares (SST) = 000, then the error sum of squares (SSE) will have the value of
   a. 750
   b. 500
   c. 8000
   d. 500
   e. 500

Questions 12-15 refer to the following:
Regression analysis was applied between sales data (in $1000) and advertising data (in $100) and the following information was obtained:

   ŷ = 12 + 1.8x
   n = 17
   SSR = 225
   SSE = 75
   s_b1 = 0.2683 (standard error of the point estimate of the slope)

12. Based on the above estimated regression equation, if advertising is $3000, then the predicted value of sales (in dollars) is
   a. $66,000
   b. $5,4
   c. $66
   d. $7,400

13. The F statistic computed from the above data is
   a. 3
   b. 45
   c. 48
   d. 54

14. The critical F value at α = 0.05 is
   a. 3.59
   b. 3.68
   c. 4.45
   d. 4.54

15. The t statistic for testing the significance of the slope is
   a. .80
   b. .96
   c. 6.708
   d. 0.555

II. Solve The Following Problems

1. A final exam in Statistics consists of questions in 8 sections. A teacher believes that the most important of these sections is the section of problem solving. She analyses the scores of 36 randomly chosen students using Minitab software and produces the following printout relating the total score (Y) to the problem solving subscore (x):

   Predictor    Coef     SE Coef    T
   Constant     .960     6.8        ?
   ProbSolve    4.06     0.5393     ?

   S = .09    R-Sq = 6.0%    R-Sq(adj) = 60.9%

   a. What is the predicted Total score if the ProbSolve score was 0?
   b. What is the residual (error) for the data point (0, 55)?
   c. At the 0.05 significance level, test whether the slope of the regression line for all statistics students is different from zero.
   d. Calculate the 95% confidence interval of the slope of the regression line for all statistics students.
   e. Are the results reached through the test in (c) and the confidence interval in (d) consistent? Explain the reasons for your answer.

2. The following table gives the ages (in years) and prices (in $100) of 8 randomly selected cars.

   Age (X)      8 3 6 9 5 6 3
   Price (Y)    8 94 50 45 4 36 99

   a. Find the least squares regression line.
   b. Give a practical interpretation of the values of the y-intercept and slope calculated in part (a).
   c. Compute r and r² and explain what they mean in the context of the problem.
   d. Predict the price for a 7-year-old car.
   e. Compute the standard deviation of errors.
   f. Compute a 99% confidence interval for the slope.
   g. Testing at % significance level, can you conclude that the population slope is negative?

3. The following data give information on the ages (in years) and the number of breakdowns during the past month for a sample of seven machines at a large company.

   Age (X)                7 8 3 9 4
   # of breakdowns (Y)    0 5 4 7

   x 55, y 4, 46 x, y 339 xy, 57

   a. Taking age as an independent variable and number of breakdowns as a dependent variable, find the least squares regression line.
   b. Give a brief interpretation of the values of a and b calculated in part (a).
   c. Compute r and r² and explain what they mean.
   d. Compute the standard deviation of errors.

4. A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums.

   Driving Experience (years) X    5 9 5 6 5 6
   Monthly Premium Y               $64 87 50 7 44 56 4 60

   x 90, y 474, xy 4739, x 396, 964

   a. Find the least squares regression line.
   b. Give a brief interpretation of the values of a and b calculated in part (a).
   c. Compute r and r² and explain what they mean.
   d. Predict the monthly auto insurance premium for a driver with 0 years of driving experience.
   e. Compute the standard deviation of errors.

5. Given are five observations collected in a regression study on two variables. Pts each.

                          Sum       Sum of squares
   X     4 5 7 8 6        x =58
   Y     3 6 4 7          y =69     SSxx=.8    SSyy=.
   XY    4 0 4 3 00                 SSxy=.6

   a. Develop the estimated regression equation for these data.
   b. Use the estimated regression equation to predict the value of y when x = 4.
   c. What percentage of the total sum of squares can be accounted for by the estimated regression equation?
   d. What is the value of the sample correlation coefficient?
   e. What is the value of the mean square error? What is the value of the standard error of the estimate?
   f. Test for a significant relationship by using the t test. Use α = .05.
   g. Use the F test to test for a significant relationship. Use α = .05. What is your conclusion?
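Problems 3 to 5 supply summary sums rather than asking for everything from raw data. As a minimal sketch of how those sums feed the usual simple-regression quantities, the Python function below takes n, Σx, Σy, Σxy, Σx² and Σy² and returns the slope, intercept, r, r², standard deviation of errors, and the t and F statistics for the slope. The function name and the five-point data set in the usage note are hypothetical illustrations, not values from this problem set.

```python
import math

def simple_regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Least-squares quantities for y = a + b*x, computed from summary sums."""
    # Corrected sums of squares and cross-products
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n

    b = ss_xy / ss_xx                      # slope
    a = sum_y / n - b * sum_x / n          # intercept
    r = ss_xy / math.sqrt(ss_xx * ss_yy)   # correlation coefficient

    sse = ss_yy - b * ss_xy                # error sum of squares
    s_e = math.sqrt(sse / (n - 2))         # standard deviation of errors
    s_b = s_e / math.sqrt(ss_xx)           # standard error of the slope
    t = b / s_b                            # t statistic for H0: slope = 0

    return {
        "slope": b,
        "intercept": a,
        "r": r,
        "r2": r ** 2,
        "s_e": s_e,
        "s_b": s_b,
        "t": t,
        "F": t ** 2,                       # in simple regression F equals t squared
    }

# Hypothetical data: x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 5
result = simple_regression_from_sums(n=5, sum_x=15, sum_y=20,
                                     sum_xy=66, sum_x2=55, sum_y2=86)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

For the hypothetical data, the corrected sums are SSxy = 6, SSxx = 10 and SSyy = 6, so the slope is 0.6 and the intercept is 2.2. The same call pattern would apply to the sums given in a problem once their values are read from a clean copy of the sheet.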
6. The commercial division of a real estate firm is conducting a regression analysis of the relationship between x, annual gross rents (in thousands of euros), and y, selling price (in thousands of euros) for apartment buildings. Data were collected on several properties sold and the following computer output was obtained.

   Regression Analysis: y versus x

   Predictor    Coef    SE Coef    T
   Constant     0.00    3.3        6.
   x            7.      .366       5.9

   Analysis of Variance
   Source            DF    SS
   Regression        1     4587.3
   Residual Error    7
   Total             8     5984.

   a. How many apartment buildings were in the sample?
   b. Write the estimated regression equation.
   c. What is the value of S_b?
   d. Use the F statistic to test the significance of the relationship at α = 0.05.
   e. Estimate the selling price of an apartment building with gross annual rents of 50000.

Multiple Regression Model

I. Choice Questions

1. n = 0, and there are 4 residual error degrees of freedom. How many independent variables are there?
   a. 7
   b. 6
   c. 5
   d. 4

2. In a multiple regression model with the usual assumptions, if we add an independent variable:
   a. The explained variability will increase or stay the same
   b. The unexplained variability will increase
   c. The total variability will increase
   d. The coefficient of determination decreases

3. Suppose that a sample of 5 observations follows the multiple regression model y = β0 + β1x1 + β2x2 + ε, with the usual assumptions. Individual tests of significance have been done for both variables as well as the global regression test. The results are:

   H0: β1 = 0, and the value of the test statistic "t" 0. 5
   H0: β2 = 0, and the value of the test statistic "t" 5. 6
   H0: β1 = β2 = 0, and the value of the test statistic "F" 5

   If we use a level of significance α = 0.05, the results indicate that:
   a. the regression model is useful for predicting, but x is individually not significant
   b. both variables x1 and x2 are not significant
   c. the regression model is not useful for predicting
   d. both variables x1 and x2 are significant
Questions 4-7 refer to the following:
Consider the output below from a regression of the sales price of 50 homes on their square footage and the number of bedrooms. Some of the output details have been deleted.

   Predictor             Coef     StDev     T        P
   Constant              49.      6.749     7.8      0.000
   Square Footage        ???      0.5556    8.66     0.000
   Number of Bedrooms    -0.78    .49       -0.99    0.905

   Analysis of Variance
   Source            DF     SS        MS     F      P
   Regression        2      8559.6    ???    ???    0.000
   Residual Error    ???    ???       ???
   Total             49     395.

4. What is the value of the missing slope coefficient?
   a. 5.58
   b. 4.8
   c. 4.5
   d. 0.06

5. What is the estimate of the variance s_e² of the error term?
   a. 9.5
   b. 90.9
   c. 89.7
   d. 866.9

6. What is the value of the missing F-statistic?
   a. 0.06
   b. .39
   c. 99.85
   d. 5.5

7. Let β1 and β2 be the slope coefficients in the regression. You would ______ the null hypothesis H0: β1 = β2 = 0 at the ______ level. (Fill in the blanks)
   a. Reject, 5% level
   b. Fail to reject, 0% level
   c. Reject, % level
   d. Both (a) and (c)
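Questions 4 to 6 depend on how the cells of a regression printout are tied together: Coef = T × StDev, SSE = SST − SSR, each mean square is a sum of squares divided by its degrees of freedom, and F = MSR / MSE. The Python sketch below shows those relations under assumed inputs; the function name and every numeric value in the usage lines are hypothetical and are not the values from the printout above.

```python
def fill_regression_output(n, k, ss_regression, ss_total, t_stat, std_err):
    """Recover deleted cells of a regression printout from the remaining ones.

    n is the sample size, k the number of independent variables.
    """
    df_regression = k
    df_residual = n - k - 1
    df_total = n - 1

    ss_residual = ss_total - ss_regression          # SSE = SST - SSR
    ms_regression = ss_regression / df_regression   # MSR
    ms_residual = ss_residual / df_residual         # MSE, estimate of error variance
    f_stat = ms_regression / ms_residual            # overall F statistic

    coefficient = t_stat * std_err                  # since T = Coef / StDev

    return {
        "df": (df_regression, df_residual, df_total),
        "SSE": ss_residual,
        "MSR": ms_regression,
        "MSE": ms_residual,
        "F": f_stat,
        "coef": coefficient,
    }

# Hypothetical inputs chosen only to illustrate the arithmetic
cells = fill_regression_output(n=50, k=2, ss_regression=800.0, ss_total=1000.0,
                               t_stat=5.0, std_err=0.4)
print(cells)
```

With these hypothetical inputs the residual degrees of freedom are 47, SSE is 200, MSR is 400 and the coefficient is 2.0.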
II. Solve the following problems

1. The owner of Showtime Movie Theaters, Inc., would like to estimate weekly gross revenue as a function of advertising expenditures. Historical data for a sample of 8 weeks follow.

   Weekly Gross Revenue    Television Advertising    Newspaper Advertising
   ($1000s)                ($1000s)                  ($1000s)
   96                      5.0                       .5
   90                      .0                        .0
   95                      4.0                       .5
   9                       .5                        .5
   95                      3.0                       3.3
   94                      3.5                       .3
   94                      .5                        4.
   94                      3.0                       .5

   A portion of the Minitab computer output follows.

   Regression Analysis: Weekly Gross versus Television A; Newspaper Ad

   The regression equation is:
   Weekly Gross Revenue = 83. + .9 Television Advertising + .30 Newspaper Advertising

   Predictor                 Coef    SE Coef    T
   Constant                          .574
   Television Advertising            0.304
   Newspaper Advertising             0.307

   S = 0.64587    R-Sq = 91.9%    R-Sq(adj) = 88.7%

   Analysis of Variance
   Source            DF    SS       MS    F
   Regression              3.435
   Residual Error    5
   Total

   a. What is the estimate of the weekly gross revenue for a week when $3500 is spent on television advertising and $800 is spent on newspaper advertising?
   b. Find and interpret R².
   c. When television advertising was the only independent variable, R² = 0.653. Do you prefer the multiple regression results? Explain.
   d. Use α = 0.05 to test the hypotheses
         H0: β1 = β2 = 0
         H1: β1 and/or β2 is not equal to zero
      Did the estimated regression equation provide a good fit to the data? Explain.
   e. Find the mean square error. Find the standard error of the estimate.
   f. Use α = 0.05 to test the significance of each independent variable. Should X1 or X2 be dropped from the model?
   g. Construct 95% C.I. for. Use this confidence interval to test whether X is significant or not.
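Parts (b) to (d) turn on how R², adjusted R² and the overall F statistic relate when there are n observations and k independent variables. The Python sketch below states those two standard relations. The sample size n = 8 and k = 2 match this problem, but the R² value passed in is a hypothetical placeholder and the function names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def overall_f(r2, n, k):
    """Overall F statistic expressed through R-squared.

    F = MSR / MSE = (R2 / k) / ((1 - R2) / (n - k - 1)).
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# n and k follow the problem; r2 is a hypothetical value for illustration only
n, k, r2 = 8, 2, 0.90
print("adjusted R-sq:", adjusted_r2(r2, n, k))
print("overall F:", overall_f(r2, n, k))
```

Adjusted R² penalizes each added variable through the factor (n − 1)/(n − k − 1), which is why it is the fairer figure when comparing the two-variable model in part (c) with the television-only model.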