IT 403 Practice Problems (2-2) Answers

#1. Which of the following is correct with respect to the correlation coefficient (r) and the slope of the least-squares regression line (Choose one)?
a. They will always have the same sign.
b. They will have opposite signs.
c. Nothing, because they are two different measures that are not related to one another.

a (same sign)

#2. What does r² measure?

r², the coefficient of determination, measures the fraction (or percent) of the variation in the values of y that is explained by the least-squares regression of y on x.

#3. [Exercise 2.78, p. 120 (slightly re-worded)] Refer to Exercise 2.75, where you examined the relationship between the number of undergraduate college students and the populations for the 50 states. Figure 2.21 gives the output from a software package for the regression. Use this output to answer the following questions:

i. What is the equation of the least-squares regression line?
ŷ = 15044.917 + 0.053x

ii. What is the value of r²?
0.968

iii. Interpret the value of r².
96.8% of the variation in the number of undergraduates is accounted for by the population size.

iv. Does the software output tell you that the relationship is linear and not, for example, curved? Explain your answer.
The software does not report the nature of the relationship; it is assuming a linear relationship in the calculations shown.
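As a numeric check on the answers to #1 and #2 above, the following minimal sketch fits a least-squares line by hand and compares the sign of r with the sign of the slope, and r² with the fraction of variation explained. The function name `fit_summary` and both data sets are hypothetical, invented for illustration; they are not taken from the exercises.

```python
from math import sqrt

def fit_summary(xs, ys):
    """Return (r, slope, fraction_explained) for the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    r = sxy / sqrt(sxx * syy)          # correlation coefficient
    slope = sxy / sxx                  # least-squares slope
    intercept = mean_y - slope * mean_x

    # Fraction of the variation in y explained by the regression line
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    fraction_explained = 1 - ss_residual / syy
    return r, slope, fraction_explained

# Hypothetical data (an assumption for illustration, not from the exercises)
xs = [1, 2, 3, 4, 5]
ys_rising = [2.1, 2.9, 4.2, 4.8, 6.1]
ys_falling = [6.0, 5.1, 3.9, 3.2, 1.8]

for label, ys in (("rising", ys_rising), ("falling", ys_falling)):
    r, slope, explained = fit_summary(xs, ys)
    print(f"{label}: r = {r:.3f}, slope = {slope:.3f}, "
          f"r^2 = {r * r:.3f}, explained = {explained:.3f}")
```

In both cases r and the slope share a sign, because both are computed from the same cross-product sum divided by a positive quantity, and the explained fraction agrees with r².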
#4. [Exercise 2.80, p. 121] The following 20 observations on Y and X were generated by a computer program.

i. Make a scatterplot and describe the relationship between Y and X.
Note: to obtain a scatterplot in SPSS, choose Graphs >> Chart Builder, select Scatter Plot from the Gallery list, and drag the respective variables to the X and Y axes.
There seems to be a weak positive linear relationship between y and x.

ii. Find the equation of the least-squares regression line and add the line to your plot.

Coefficients (a)
                 Unstandardized Coefficients    Standardized Coefficients
Model            B          Std. Error          Beta                         t        Sig.
1  (Constant)    17.380     4.742                                            3.665    .002
   x             .623       .239                .523                         2.604    .018
a. Dependent Variable: y

The least-squares regression equation is ŷ = 17.380 + 0.623x
You can obtain the regression equation in SPSS with Analyze >> Regression >> Linear. The equation is in the output titled Coefficients. Focus on the values in column B under Unstandardized Coefficients: the intercept (b0) is shown for (Constant) and the slope is shown for x.

iii. What percent of the variability in Y is explained by X?
Model Summary
Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .523 (a)   .274        .233                 1.93911
a. Predictors: (Constant), x

In another output, titled Model Summary, you see R Square, which is 0.274. So 27.4% of the variability in Y is explained by X.

iv. Summarize your analysis of these data in a short paragraph.
[Textbook solution] The x variable only accounts for 27.37% of the variation in y, so the relationship is fairly weak.

#5. A chemist was conducting an experiment to find how many ml of a particular substance dissolves in different temperatures of water. A correlation of 0.87 was computed. Which interpretation is TRUE (Choose one)?
a. 87% of the variation in the amount of dissolved substance is explained by temperature.
b. Correlation cannot be computed because temperature is not a continuous variable.
c. 76% of the variation in the amount of dissolved substance is explained by temperature.

c (76%), since r² = 0.87² ≈ 0.76

#6. Do heavier cars use more gasoline? To answer this question, a researcher randomly selected 15 cars. He collected data about the weight (in hundreds of pounds) and the mileage (mpg) for each car. From a scatter plot made with the data, a linear model seems appropriate. The percentage of variation in mileage that is accounted for by the linear relationship between mileage and weight is approximately 44%. What is the value of the correlation coefficient between the weight and the mileage of a car?

√0.44 = 0.663. So, r = 0.663 in magnitude. Note that r² does not give the sign of r; r takes the sign of the slope, so r = −0.663 if mileage decreases as weight increases.

#7. Below is a plot of the Olympic gold-medal-winning performance in the high jump (in inches) for the years 1900 to 1996. The equation of the least-squares regression line of Winning Height (in inches) on Year is
Winning Height = −364.90 + 0.23 Year
In another millennium (the year 3000), if the Olympics continue to be held, we can expect the Winning Height to be about (Choose one):
a. 325 inches.
b. 690 inches.
c. none of the above.

c (none of the above). The year 3000 lies far outside the 1900 to 1996 range of the data, so any prediction from the line is an unreliable extrapolation.

#8. The British government conducts regular surveys of household spending. The average weekly household spending on tobacco products and spending on alcoholic beverages for each of 11 regions in Great Britain were recorded. A scatter plot of spending on tobacco versus spending on alcohol is given below:

Determine whether each of the following statements is true or false.
i. The observation in the lower-right corner of the plot is influential. -- TRUE
ii. There is clear evidence of a negative association between spending on alcohol and spending on tobacco. -- FALSE
iii. The equation of the least-squares regression line for this plot would be approximately y = 10 − 2x. -- FALSE
iv. If we measured the spending in dollars instead of pounds, the correlation coefficient would decrease because a dollar is worth less than a pound. -- FALSE

#9. A(n) ______ is an observation that is substantially different from the other observations. (Choose one)
a. outlier
b. lurking variable
c. confounding variable
d. None of the above.
a (outlier)

#10. It is known that not exercising may lead to poor health. However, it is possible that people who are already in poor health do not have the ability or energy to exercise. This example is one of ______. (Choose one)
a. causation
b. common response
c. confounding
d. None of the above.

c (confounding)

#11. [Exercise 2.96, p. 134 (slightly re-worded)] Barium-137m is a radioactive form of the element barium that decays very rapidly. It is easy and safe to use for lab experiments in schools and colleges. In a typical experiment, the radioactivity of a sample of barium-137m is measured for one minute. It is then measured for three additional one-minute periods, separated by two minutes. So data are recorded at one, three, five, and seven minutes after the start of the first counting period. The measurement units are counts. Here are the data for one of these experiments.

Time    Count    LogCount
1       578      6.35957
3       317      5.75890
5       203      5.31321
7       118      4.77068

i. Using the least-squares regression equation count = 602.8 − (74.7 × time) and the observed data, find the residuals for the counts.
In SPSS, you can save the residual for every observation (the observed value of the response variable minus the predicted value). To do so, choose Analyze >> Regression >> Linear, click the Save button, and in the window that opens check Unstandardized under Residuals (as shown in the next figure).
ii. Plot the residuals versus time.
The graph on the right was obtained using SPSS as a scatter plot between Time and the residuals.

iii. Write a short paragraph assessing the fit of the least-squares regression line to these data based on your interpretation of the residual plot.
[Textbook solution] There is a clear curve in the residual plot; this is not a good model for these data.
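The residuals asked for in #11 can also be computed by hand as observed count minus predicted count. The sketch below assumes the regression equation reads count = 602.8 − 74.7 × time; under that reading the residuals sum to zero, as least-squares residuals must. The names `predicted_count` and `residuals` are illustrative only.

```python
# Data from the barium-137m experiment in #11
times = [1, 3, 5, 7]
counts = [578, 317, 203, 118]

def predicted_count(time):
    """Predicted count, assuming the line count = 602.8 - 74.7 * time."""
    return 602.8 - 74.7 * time

# Residual = observed - predicted, rounded to one decimal place
residuals = [round(c - predicted_count(t), 1) for t, c in zip(times, counts)]

for t, c, e in zip(times, counts, residuals):
    print(f"time = {t}: observed = {c}, "
          f"predicted = {predicted_count(t):.1f}, residual = {e:+.1f}")
```

The residuals come out positive, negative, negative, positive (49.9, −61.7, −26.3, 38.1), which is the curved pattern described in the textbook solution to part iii.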
#12. [Exercise 2.101, p. 134] What's wrong? Each of the following statements contains an error. Describe each error and explain why the statement is wrong.

i. An influential observation will always have a large residual.
If the line is pulled toward the influential point, the observation will not necessarily have a large residual.

ii. High correlation is never present when there is causation.
High correlation is always present if there is causation.

iii. If we have data at values of x equal to 1, 2, 3, 4, and 5, and we try to predict the value of y for x = 2.5 using a least-squares regression equation, we are doing an extrapolation.
Extrapolation is using a regression to predict for x-values outside the range of the data (here, using 20, for example).

#13. Correlations caused by lurking variables are called ______. (Choose one)
a. nonsense correlations
b. association correlations
c. reverse correlations
d. None of the above.

a (nonsense correlations)
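The distinction in #12 iii comes down to a range check on x. This is a minimal sketch, with `is_extrapolation` as a hypothetical helper name, applied to the x-values from #12 and to the 1900 to 1996 year range from #7.

```python
def is_extrapolation(x_new, observed_xs):
    """True if x_new lies outside the range of the observed x-values."""
    return x_new < min(observed_xs) or x_new > max(observed_xs)

observed_xs = [1, 2, 3, 4, 5]

print(is_extrapolation(2.5, observed_xs))     # False: 2.5 is inside 1..5
print(is_extrapolation(20, observed_xs))      # True: 20 is outside 1..5
print(is_extrapolation(3000, [1900, 1996]))   # True: year 3000 in #7
```

A prediction inside the observed range can still be poor if the line fits badly; the check only flags predictions the data cannot support at all.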