PRACTICE PROBLEMS

This document contains 3 sets of practice problems.

Correlation: 3 problems
Regression: 4 problems
ANOVA: 8 problems

You should print a copy of these practice problems and bring them with you to class. Solutions will be reviewed in class, and you will have trouble keeping up if you do not have a copy with you.

Note: Additional problems, with real-time solutions, are provided online in the form of screencasts. The additional problems are also provided in PDF form via a link on that site. You are strongly encouraged to try working those problems before watching the screencasts. The additional problems will NOT be covered during class time.

Primary URL: http://awarnach.mathstat.dal.ca/~joeb/stats1060_webcasts/welcome.html
Alternate URL: http://web.me.com/cadair_idris/stats1060/welcome.html
CORRELATION:

Problem 1. In each of the following settings, decide which variable you would consider explanatory and which one you would consider the response. Predict the type of association.
a. The amount of time spent studying for the 1060 midterm and the exam grade.
b. Your grade on the 1060 exam and your height at age 6.
c. The amount of yearly rainfall and the biomass of an ecosystem.
d. The weight and height of a 1060 student.

Problem 2. The figure shows several scatterplots. Match the following correlation coefficients to the scatterplots: 0.71, -0.55, 0.002, -0.908. Indicate for which, if any, of these datasets you might question the use of correlation.

[Figure: four scatterplots, labeled A, B, C, and D]

Problem 3. Identify the errors in the following statements.
a. There was a high correlation between the occupational class of the father and the occupational class of the son.
b. We detected a high correlation (1.10) between the length and cost of a long distance phone call.
c. The strong positive correlation between the percentage of the population that owns a cell phone and life expectancy means that countries that want to improve the health of their citizens should invest in more cell phone networks.
d. We found the correlation between rainfall and crop yield to be r = 0.26 bushels per inch of rainfall.
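As background for Correlation Problem 3 (b) and (d), the following is a minimal sketch of how Pearson's r is computed and why it carries no units. The rainfall and yield values are hypothetical (they do not come from this handout), and `pearson_r` is an illustrative helper name. The point is only that r stays between -1 and 1 and does not change when one variable is converted to different units.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only (not from the handout).
rain_inches = [10.0, 14.0, 18.0, 22.0, 26.0]
yield_bushels = [31.0, 40.0, 38.0, 52.0, 49.0]

r_inches = pearson_r(rain_inches, yield_bushels)

# Convert rainfall from inches to centimetres and recompute r.
rain_cm = [2.54 * x for x in rain_inches]
r_cm = pearson_r(rain_cm, yield_bushels)

print(f"r with rainfall in inches: {r_inches:.4f}")
print(f"r with rainfall in cm:     {r_cm:.4f}")
```

Because the deviations in the numerator and denominator are rescaled by the same factor, the unit conversion cancels out, which is why a correlation should never be reported with units attached.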
REGRESSION:

Problem 1. Variable 1 (var1) is the annual wine consumption per person, measured in liters of alcohol consumed. Variable 2 (var2) is the annual death rate, given as number per 10,000. Data for these two variables were collected from 19 countries and analyzed using Minitab. Given the following output from Minitab, compute the slope and intercept of the least squares regression line:

    MTB> corr var1, var2
    Correlation of var1 and var2 = -0.843

    MTB> desc var1 var2
             N     MEAN   MEDIAN   TRMEAN   STDEV
    var1    19    3.026    2.400    2.806   2.510
    var2    19    191.1    199.0    191.7    68.4

Problem 2. The variables (var1 and var2) are the same as in Problem 1 above. In this case, Minitab was used to conduct a linear regression analysis. Use the Minitab output in the box to answer the questions below.

    MTB> regr var2 var1
    The regression equation is
    var2 = 261 - 23.0 var1

    Predictor       coef    stdev   t-ratio      p
    Constant      260.56    13.38     18.83    0.0
    var1         -22.969    3.557     -6.46    0.0

    s = 37.88    R-sq = 71.0%

    Analysis of Variance
    SOURCE        DF       SS       MS        F       P
    Regression     1    59814    59814    41.69    0.00
    Error         17    24391     1435
    Total         18    84205

a) What is the equation for the least squares regression line?
b) State the prediction of the model in words.
c) Write out the ratio of the variance in the model predictions to the total variance in the data.
d) What is the standard deviation of the residuals?
e) If Canada and France differ by 6.7 liters of wine consumption, what would be the predicted difference in risk of heart disease?
f) If we changed the units of wine consumption from liters to ounces, would the r² value change? Would the slope change?
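For Regression Problem 1, the least squares line can be recovered from summary statistics alone, using slope = r × (s_y / s_x) and intercept = ȳ - slope × x̄. The sketch below applies those two formulas to the Minitab summary values given in Problem 1. The function name `least_squares_from_summary` is a hypothetical helper introduced here for illustration, and the small difference from the Minitab regression output comes from rounding in the summary statistics.

```python
def least_squares_from_summary(r, mean_x, s_x, mean_y, s_y):
    """Slope and intercept of the least squares line from summary statistics."""
    slope = r * s_y / s_x                  # b1 = r * (s_y / s_x)
    intercept = mean_y - slope * mean_x    # b0 = ybar - b1 * xbar
    return slope, intercept

# Summary values from the Minitab output in Regression Problem 1
# (var1 is the explanatory variable x, var2 is the response y).
slope, intercept = least_squares_from_summary(
    r=-0.843, mean_x=3.026, s_x=2.510, mean_y=191.1, s_y=68.4
)

print(f"slope     = {slope:.2f}")
print(f"intercept = {intercept:.1f}")
```

The line always passes through the point (x̄, ȳ), which is what the intercept formula enforces.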
Problem 3. Data were collected from 78 seventh grade students to investigate the relationship between grade point average (GPA) and intelligence quotient score (IQ). GPA was considered the response variable, and IQ was considered the predictor variable. The following statistics were obtained from these data:

    x̄ = 108.9231    ȳ = 7.4464    r = 0.6337    s_x = 13.1710    s_y = 2.0996

a) Is the association positive or negative?
b) What is the equation for the best-fit straight line?
c) What proportion of the variation in GPA can be accounted for by IQ?
d) What GPA would you predict for a student with an IQ of 100?
e) Given that the total variation in GPA is 84342 (SST), compute the sum of the squared errors (SSE).
f) If we wanted to predict IQ from GPA, what would be the best-fit regression line?

Problem 4. We return to the dataset used in Problems 1 and 2 above. Again, use the Minitab output in the box to answer the questions below.

    MTB> regr var2 var1
    The regression equation is
    var2 = 261 - 23.0 var1

    Predictor       coef    stdev   t-ratio      p
    Constant      260.56    13.38     18.83    0.0
    var1         -22.969    3.557     -6.46    0.0

    s = 37.88    R-sq = 71.0%

    Analysis of Variance
    SOURCE        DF       SS       MS        F       P
    Regression     1    59814    59814    41.69    0.00
    Error         17    24391     1435
    Total         18    84205

a) State the null and alternative hypotheses in terms of the slope (numerically and in words).
b) Assume that the assumptions for inference have been met and perform the t-test for a relationship between the two variables (assume α = 0.05). State your conclusion.
c) Construct a 95% confidence interval for the slope.
d) Explain the meaning of the confidence interval.
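For Regression Problem 4 (b) and (c), the arithmetic behind the slope t-test and confidence interval can be sketched as follows. The t-ratio is the estimated slope divided by its standard error, and the interval is slope ± t* × SE, with n - 2 = 17 degrees of freedom. The critical value t* = 2.110 is taken from a standard t table (two-sided 95%, 17 df) rather than computed, and `slope_t_and_ci` is a hypothetical helper name used only for this illustration.

```python
def slope_t_and_ci(coef, se, t_star):
    """t-ratio for H0: slope = 0, and the interval coef +/- t_star * se."""
    t_ratio = coef / se
    margin = t_star * se
    return t_ratio, (coef - margin, coef + margin)

# Table value (assumption: read from a t table): two-sided 95%, 17 df.
T_STAR_17DF_95 = 2.110

# Slope estimate and its standard error from the Minitab output (var1 row).
t_ratio, (lower, upper) = slope_t_and_ci(coef=-22.969, se=3.557, t_star=T_STAR_17DF_95)

print(f"t-ratio = {t_ratio:.2f}")
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

An interval that excludes zero leads to the same decision as rejecting the null hypothesis of zero slope in the two-sided t-test at α = 0.05.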
ANOVA

Problem 1: Will the grand mean of the sample always be equal to the true mean? If no, then explain why they could differ.

Problem 2: A psychologist is interested in investigating if the three different treatments available to her have different effects on client improvement. She randomly chose seven clients from those undergoing each of the three different treatments, and asked them to rate their level of satisfaction on a scale of 1 to 100.
(a) Assuming that she will carry out an ANOVA on these data, state the null and alternative hypotheses.
(b) What is the minimum condition that will satisfy the alternative hypothesis in this case?

Problem 3: For each of the two boxplots below (a) state the null and alternative hypotheses, and (b) comment on the possibility that the null hypothesis is false.

[Figure: two sets of side-by-side boxplots; the first compares Population 1, Population 2, and Population 3; the second compares Treatment 1, Treatment 2, Treatment 3, and Treatment 4]
Problem 4: Calculate a-e below by using only the summary statistics for Treatments A, B and C.
a. df_1 and df_2
b. ȳ (the grand mean)
c. SSTR
d. SSE
e. SST

    Treatment A     Treatment B     Treatment C
    ȳ_A = 10        ȳ_B = 12        ȳ_C = 8
    s_A = 1         s_B = 1         s_C = 1
    n_A = 5         n_B = 5         n_C = 5

Problem 5: A researcher has randomly sampled grades for six students for each of three available ways to take introductory biology: (i) completely online, (ii) in a traditional classroom setting, and (iii) in a hybrid system having both traditional classroom lectures and online activities. Use the data in the table below to (a) construct an ANOVA table, and (b) comment on the possibility that student grades differ among the three groups.

    Online            Traditional       Hybrid
    ȳ_A = 71.6667     ȳ_B = 74.1667     ȳ_C = 80
    s_A = 15.0555     s_B = 13.1927     s_C = 12.6491
    n_A = 6           n_B = 6           n_C = 6

Problem 6: For a-d below, find the critical value of F given values for α, df_1 and df_2.
a. α = 0.01, df_1 = 1, and df_2 = 7
b. α = 0.05, df_1 = 1, and df_2 = 7
c. α = 0.05, df_1 = 2, and df_2 = 15
d. α = 0.05, df_1 = 9, and df_2 = 16

Problem 7: Complete the elements of the partial ANOVA table for an analysis of three experimental treatments, each having six observations.

    Source of variation    DF    Sum of squares    Mean square    F statistic
    Treatment
    Error                        70
    Total                        140
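For problems such as ANOVA Problem 4, where only group means, standard deviations, and sample sizes are given, the sums of squares can be built directly from those summaries: SSTR = Σ n_i (ȳ_i - ȳ)², SSE = Σ (n_i - 1) s_i², and SST = SSTR + SSE. The sketch below is one way to organize that arithmetic, shown with the Treatment A, B, and C values from Problem 4. The function name `anova_from_summary` and the dictionary keys are hypothetical choices made for this illustration.

```python
def anova_from_summary(means, sds, ns):
    """One-way ANOVA quantities from group means, SDs, and sample sizes."""
    k = len(means)                      # number of groups
    n_total = sum(ns)                   # total number of observations
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df1, df2 = k - 1, n_total - k
    f_stat = (sstr / df1) / (sse / df2)  # MSTR / MSE
    return {
        "df1": df1, "df2": df2, "grand_mean": grand_mean,
        "SSTR": sstr, "SSE": sse, "SST": sstr + sse, "F": f_stat,
    }

# Summary statistics for Treatments A, B, and C from ANOVA Problem 4.
result = anova_from_summary(means=[10, 12, 8], sds=[1, 1, 1], ns=[5, 5, 5])

for name, value in result.items():
    print(f"{name:>10}: {value}")
```

The same function could be reused with the Online, Traditional, and Hybrid summaries in Problem 5; the resulting F statistic would then be compared with a critical value from an F table.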
Problem 8: Using the ANOVA table that you completed in Problem 7, complete steps a-d below to test for inequality in the group means (assume α = 0.01).
a. State the hypotheses (clearly define µ_i)
b. Find F_crit and the rejection rule
c. Compare F_data with F_crit
d. State your conclusion and interpretation