ST430 Exam 1 with Answers
Date: October 5, 2015
Name:

Guideline: You may use one page (front and back of a standard A4 sheet) of notes. No laptops or textbooks are permitted, but you may use a calculator. Giving or receiving assistance from other students is not allowed. Show your work to receive partial credit! Partial credit will be given, but only for work written on the exam. The total is 25 points. Good luck!
1. Assume that the math scores of high school seniors in North Carolina are normally distributed with mean 82 and standard deviation 5.

(a) (2 points) Compute the z-score of a student with math score 85. Would you say this student has an extremely high math score? Justify your answer.

ANSWER: z-score = (85 - 82)/5 = .6. For the standard normal distribution, about 95% of the data lie within 2 standard deviations of 0. Here .6 < 2, so 85 is not an extreme value.

(b) (3 points) Let $\bar{X}$ be the mean math score of a class of 25. What is the probability that $\bar{X}$ is greater than 83.6?

ANSWER: The sample mean is normally distributed with mean 82 and standard deviation $5/\sqrt{25} = 1$, so

$$P(\bar{X} > 83.6) = P\left(\frac{\bar{X} - 82}{5/\sqrt{25}} > \frac{83.6 - 82}{5/\sqrt{25}}\right) = P\left(Z > \frac{83.6 - 82}{5/\sqrt{25}}\right) = P(Z > 1.6) = .0548$$
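The two calculations in Problem 1 can be checked numerically. The following is a minimal sketch in Python using only the standard library; the variable names are illustrative and are not part of the exam.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 82.0, 5.0   # population mean and standard deviation from the problem
n = 25                  # class size in part (b)

# (a) z-score of a single student's score of 85
z_single = (85 - mu) / sigma

# (b) the mean of n scores has standard error sigma / sqrt(n)
se = sigma / sqrt(n)
z_mean = (83.6 - mu) / se
p_upper = 1 - NormalDist().cdf(z_mean)   # upper-tail probability P(Z > z_mean)

print(f"z-score for 85: {z_single:.2f}")
print(f"P(Xbar > 83.6): {p_upper:.4f}")
```

Part (b) divides by the standard error of the mean, not by the population standard deviation, which is why a gap of 1.6 points counts as 1.6 standard errors for a class of 25.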
2. Consider the following three scatter plots with least squares lines.

(a) (1 point) Which least squares line has the largest intercept?

ANSWER: Plot 3. Its intercept is about 3.1.

(b) (1 point) Which least squares line has the largest slope?

ANSWER: Plot 1 has the largest slope.

(c) (2 points) Which simple linear regression has the largest coefficient of determination ($R^2$)?

ANSWER: Plot 3, as its line fits the data best among the three.
3. The British Journal of Sports Medicine (April 2000) published a study of the effect of massage on boxing performance. Two variables measured on the boxers were blood lactate concentration (mM) and the boxer's perceived recovery (28-point scale). The data were obtained for 16 five-round boxing performances, where a massage was given to the boxer between rounds. The plot below gives the 95% intervals for the average value (confidence interval) and for a particular value (prediction interval) of perceived recovery at several levels of blood lactate concentration.

(a) (2 points) Explain why the interval for a particular value is considerably wider than the interval for the average value.

ANSWER: Recall the formulas for the confidence interval and the prediction interval. Both margins have the form $t_{\alpha/2}\, s$ times a square-root factor, and the only difference lies in that factor:

$$\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} < \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$$

Intuitively, for the confidence interval we estimate $E(y) = \beta_0 + \beta_1 x$ with $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. The error is just $\hat{y} - E(y)$. For the prediction interval, however, we estimate $y$ with $\hat{y}$, and the error is

$$\hat{y} - y = (\hat{y} - E(y)) + (E(y) - y) = (\hat{y} - E(y)) - \epsilon,$$

which carries the additional error $\epsilon$ from the individual observation.

(b) (1 point) Would it be wise to use this simple linear regression model to predict a boxer's perceived recovery if the blood lactate level is 1 mM? Explain.

ANSWER: No. Prediction at a value of the explanatory variable outside the range of the observed data (extrapolation) produces unreliable results. 1 mM is well below the lower bound of the data.
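The comparison in 3(a) can be illustrated by computing the two square-root factors side by side. This is only a sketch: n = 16 comes from the problem, but the values of x_bar and ss_xx below are hypothetical placeholders (the exam does not report them), and the function name is made up for illustration.

```python
from math import sqrt

def interval_factors(n, x_p, x_bar, ss_xx):
    """Return the square-root factors of the CI margin and the PI margin."""
    shared = 1 / n + (x_p - x_bar) ** 2 / ss_xx   # term common to both margins
    ci_factor = sqrt(shared)        # interval for the average value E(y)
    pi_factor = sqrt(1 + shared)    # interval for a particular value y
    return ci_factor, pi_factor

n = 16          # number of boxing performances, from the problem
x_bar = 5.0     # hypothetical mean lactate level (not given in the exam)
ss_xx = 20.0    # hypothetical SS_xx (not given in the exam)

for x_p in (5.0, 6.0, 8.0):
    ci, pi = interval_factors(n, x_p, x_bar, ss_xx)
    print(f"x_p = {x_p}: CI factor = {ci:.3f}, PI factor = {pi:.3f}")
```

Whatever values are plugged in, the prediction factor exceeds 1 while the confidence factor shrinks toward 0 as n grows, and both factors increase as x_p moves away from x_bar.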
4. The R output for the data of the previous problem is:

Call:
lm(formula = RECOVERY ~ LACTATE, data = BOXING2)

Residuals:
   Min     1Q Median     3Q    Max
-6.577 -3.752  0.060  3.067  8.043

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.7967     4.9838   0.561   0.5836
LACTATE       2.5667     0.9883   2.597   0.0211 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.28 on 14 degrees of freedom
Multiple R-squared: 0.3251, Adjusted R-squared: 0.2769
F-statistic: 6.744 on 1 and 14 DF, p-value: 0.0211

(a) (1 point) Give the least squares regression line.

ANSWER: $\hat{y} = 2.7967 + 2.5667 \cdot \text{LACTATE}$

(b) (2 points) Give the slope and its interpretation in the context of the problem.

ANSWER: The slope is 2.5667. Thus, the boxer's perceived recovery (28-point scale) is estimated to increase by 2.5667 points on average for each 1 mM increase in blood lactate concentration.

(c) (1 point) Give the sample correlation between blood lactate level and perceived recovery.

ANSWER: The sample correlation is the square root of $R^2$, with the same sign as the slope: $r = \sqrt{R^2} = \sqrt{.3251} = .57$. It is positive because the slope is positive.

(d) (2 points) Is there a statistically significant association between blood lactate level and perceived recovery at the 0.05 level? Explain.

ANSWER: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$, where $\beta_1$ is the slope. The t value for LACTATE is 2.597, with p-value .0211 < .05. Therefore, we reject the null hypothesis and conclude that there is a statistically significant association between blood lactate level and perceived recovery.

(e) (1 point) Would you say there is a strong linear relationship between blood lactate level and perceived recovery? Explain.

ANSWER: The R-squared is 0.3251, which means only about 32.5 percent of the variation in perceived recovery is explained by the model. Hence the linear relationship is not very strong.
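Several numbers in the R output for Problem 4 are tied together and can be reproduced from one another. The sketch below, in Python with illustrative variable names, recomputes the t value, the sample correlation, and the F-statistic from the reported estimates; small differences from the printed output are due to rounding of the inputs.

```python
from math import sqrt, copysign

slope, se_slope = 2.5667, 0.9883   # LACTATE row of the coefficient table
r_squared = 0.3251                 # Multiple R-squared

# t value for the slope: estimate divided by its standard error
t_value = slope / se_slope

# sample correlation: square root of R^2, carrying the sign of the slope
r = copysign(sqrt(r_squared), slope)

# in simple linear regression the overall F-statistic equals t^2
f_value = t_value ** 2

print(f"t = {t_value:.3f}, r = {r:.2f}, F = {f_value:.2f}")
```

Because F equals the square of t when there is a single predictor, the F-test and the t-test for the slope share the p-value 0.0211 in the output.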
5. The first-order multiple regression model with two predictors is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$, where $Y$ is the dependent variable, $X_1$ and $X_2$ are the independent variables, and $\epsilon$ is the random error. We collect 32 observations and perform a multiple regression. The R output is:

Call:
lm(formula = Y ~ X1 + X2)

Residuals:
    Min      1Q  Median      3Q     Max
-2.2128 -0.5937  0.1083  0.7110  1.8639

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.8676     1.3173  -2.177  0.03778 *
X1            2.4296     0.6857   3.543  0.00136 **
X2            2.2206     0.6615   3.357  0.00222 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.012 on 29 degrees of freedom
Multiple R-squared: 0.396, Adjusted R-squared: 0.3543
F-statistic: 9.505 on 2 and 29 DF, p-value: 0.0006691

(a) (1 point) What is the least squares regression line?

ANSWER: $\hat{y} = -2.8676 + 2.4296 X_1 + 2.2206 X_2$

(b) (1 point) Conduct a test of overall model utility. Use $\alpha = .05$.

ANSWER: $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one of $\beta_1$ and $\beta_2$ is not equal to zero. The F-statistic is 9.505 with p-value .00067, which is less than .05. Thus we reject $H_0$: the model is useful in that it explains some of the variation in $Y$ using $X_1$ and $X_2$.

(c) (2 points) Conduct a test of whether $X_1$ is significantly associated with $Y$. Use $\alpha = .05$.

ANSWER: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$. The t value for $X_1$ is 3.543. Its p-value is .00136, which is less than .05. Thus we reject the null hypothesis and conclude that $X_1$ is significantly associated with $Y$.

(d) (2 points) What assumptions about the distribution of $\epsilon$ are needed for the test in (c)?

ANSWER: We need the following:
1. $\epsilon_i$ follows the same normal distribution $N(0, \sigma^2)$ for all $i$;
2. $\epsilon_i$ is independent of $\epsilon_j$ for $j \neq i$.
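The overall F-statistic and the adjusted R-squared in Problem 5 can both be recovered from the multiple R-squared, the sample size, and the number of predictors. The sketch below uses Python with illustrative variable names; since the printed R-squared is rounded to three decimals, the recomputed F agrees with the output only approximately.

```python
n, k = 32, 2          # observations and number of predictors, from the problem
r_squared = 0.396     # Multiple R-squared from the R output

df_model = k
df_resid = n - k - 1  # residual degrees of freedom (29 in the output)

# overall model utility F-statistic expressed through R^2
f_value = (r_squared / df_model) / ((1 - r_squared) / df_resid)

# adjusted R^2 penalizes R^2 for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_resid

print(f"residual df = {df_resid}")
print(f"F = {f_value:.2f} on {df_model} and {df_resid} DF")
print(f"adjusted R-squared = {adj_r_squared:.4f}")
```

The residual degrees of freedom, n - k - 1, are the same 29 that govern the t-tests for the individual coefficients in part (c).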