Basic Statistics Exercises 66


42. Suppose we are interested in predicting a person's height from the person's length of stride (distance between footprints). The following data is recorded for a random sample of 5 people:

Length of Stride (inches): 14, 13, 21, 25, 17
Height (inches): 61, 54, 63, 72, 59

(a) Identify the dependent (response) variable and the independent (explanatory) variable for a regression analysis. The dependent (response) variable is Y = height, and the independent (explanatory) variable is stride length.

(b) Does the data appear to be observational or experimental? Since the stride lengths look random, it appears that the data is observational.

(c) Use the formulas from Exercise 41 to find the equation of the least squares line. $\hat{y} = 40.2 + 1.2x$

(d) Use the least squares line to predict the height of a person whose length of stride is 15 inches. $\hat{y} = 40.2 + 1.2(15) = 58.2$ inches
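Exercise 41's formulas are not reproduced in this section. Assuming they are the usual least squares formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$, the following Python sketch reproduces the line in part (c) and the prediction in part (d). The helper name `predict` is illustrative only.

```python
# Least squares line for Exercise 42 (stride length -> height),
# using b = Sxy / Sxx and a = ybar - b * xbar.
stride = [14, 13, 21, 25, 17]   # x, inches
height = [61, 54, 63, 72, 59]   # y, inches

n = len(stride)
x_bar = sum(stride) / n
y_bar = sum(height) / n

# Sums of squared deviations and cross-products about the means
s_xx = sum((x - x_bar) ** 2 for x in stride)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(stride, height))

b = s_xy / s_xx          # slope
a = y_bar - b * x_bar    # intercept


def predict(x):
    """Predicted height (inches) for a stride length x (inches)."""
    return a + b * x


print(f"y-hat = {a:.1f} + {b:.1f}x")              # y-hat = 40.2 + 1.2x
print(f"prediction at 15 in: {predict(15):.1f}")  # prediction at 15 in: 58.2
```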

43. The prediction of a male's right-hand grip strength from age is to be studied. A 0.05 significance level is to be used with a simple linear regression. The following data is recorded for a random sample of males:

Age (years): 15, 17, 19, 11, 16, 22, 17, 25, 12, 14, 25, 23
Grip Strength (lbs.): 50, 54, 66, 46, 58, 54, 64, 80, 46, 70, 76, 80

(a) Identify the dependent (response) variable and the independent (explanatory) variable for a regression analysis. The dependent (response) variable is Y = grip strength, and the independent (explanatory) variable is age.

(b) Does the data appear to be observational or experimental? Since the ages look random, it appears that the data is observational.

(c) In order to use SPSS to do the calculations needed for the statistical analysis, enter the data into an SPSS data file named age_grip containing two variables named age and grip. Then, go to the document titled Using SPSS for Windows (which can be accessed from the appropriate link on the course syllabus web page), go to the section titled Hypothesis Tests Involving Two Variables, and read the steps in the subsection titled Performing a Simple Linear Regression with Checks of Linearity, Homoscedasticity, and Normality Assumptions. Use these steps as a guide to obtaining the output needed for the remainder of this exercise. Once you have successfully generated SPSS output, add a title to the top of the output in the following format: YOUR NAME Basic Statistics Exercise 43(c). Verify that your SPSS output contains all of the following: a scatter plot displaying the least squares line; tables titled Descriptive Statistics, Correlations, Model Summary, ANOVA, and Coefficients; a normal probability plot; a histogram on which a bell-shaped curve has been superimposed; a plot of standardized predicted values versus standardized residuals.


(b) Use the SPSS output to find each of the following: $n = 12$, $\bar{x} = 18$, $\bar{y} = 62$, $r = +0.770$, $\sum(x - \bar{x})^2 = (n - 1)s_x^2 = (11)(4.824)^2 = 255.98$

(c) Use the SPSS output to find the equation of the least squares line. The least squares line can be written $\hat{y} = 26 + 2x$.

(d) Write a one-sentence interpretation of the slope in the least squares line. Grip strength appears to increase on average by about 2 pounds with each increase of one year in age.

(e) Find the coefficient of determination, and write a one-sentence interpretation. From the SPSS output, we find $r^2 = 0.593$. About 59.3% of the variation in grip strength is explained by age.

(f) Find the standard error of estimate. From the SPSS output, we find $s = 8.390$.
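As a cross-check on the SPSS values above, the sketch below recomputes the summary statistics directly from the raw data with the standard formulas $b = S_{xy}/S_{xx}$, $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, and $s = \sqrt{SSE/(n-2)}$. It is an illustrative Python sketch, not part of the SPSS procedure. Note that the exact value of $\sum(x - \bar{x})^2$ is 256; the 255.98 above comes from squaring the rounded standard deviation 4.824.

```python
import math

# Exercise 43 data: age (years) and right-hand grip strength (lbs.)
age = [15, 17, 19, 11, 16, 22, 17, 25, 12, 14, 25, 23]
grip = [50, 54, 66, 46, 58, 54, 64, 80, 46, 70, 76, 80]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(grip) / n

# Corrected sums of squares and cross-products
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in grip)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, grip))

b = s_xy / s_xx                      # slope
a = y_bar - b * x_bar                # intercept
r = s_xy / math.sqrt(s_xx * s_yy)    # correlation coefficient

sse = s_yy - b * s_xy                # error sum of squares
s = math.sqrt(sse / (n - 2))         # standard error of estimate

print(n, x_bar, y_bar, s_xx)         # 12 18.0 62.0 256.0
print(f"y-hat = {a:.0f} + {b:.0f}x; r = {r:.3f}; r^2 = {r * r:.3f}; s = {s:.3f}")
```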

(g) Use the SPSS output to make a statement concerning whether each of the following assumptions in a simple linear regression is satisfied:

The linearity assumption: The data points appear to be randomly distributed about the least squares line on the scatter plot, and the residuals plotted against the predicted values look random. Consequently, the linearity assumption appears to be satisfied.

The uniform variance (homoscedasticity) assumption: The variation in standardized residuals around the horizontal line looks reasonably uniform.

The normality assumption: The histogram of standardized residuals looks somewhat bell-shaped, and the points on the normal probability plot do not seem to depart too far from the diagonal line.

Since the necessary assumptions appear to be satisfied, we feel it is appropriate to proceed with the regression analysis.

(h) A 0.05 significance level is chosen for a hypothesis test to see if there is any evidence that the linear relationship between age and grip strength is significant, that is, that the slope in the regression is significantly different from zero (0). Write the results of this hypothesis test two different ways:

Write the results of the f test in the ANOVA table in a format suitable for a journal article to be submitted for publication. The f test in the ANOVA for the regression to predict grip strength from age is statistically significant at the 0.05 level ($f_{1,10} = 14.545$, $f_{1,10;\,0.05} = 4.96$, $0.001 < p < 0.01$ OR $p = 0.003$). We conclude that the linear relationship between age and grip strength is significant, and the data suggest a positive relationship.

Write the results of the t test about the slope in a format suitable for a journal article to be submitted for publication. With a t test, the sample slope (2.00 lbs. per year) is statistically significantly different from zero at the 0.05 level ($t_{10} = 3.814$, $t_{10;\,0.025} = 2.228$, $0.001 < p < 0.01$ OR $p = 0.003$).

Considering the results of the hypothesis test, decide whether or not a 95% confidence interval for the slope in the regression would be of interest. If yes, find and interpret the confidence interval; if not, explain why. Since rejecting $H_0$ suggests that the hypothesized zero slope is not correct, a 95% confidence interval will provide us with some information about the slope, which estimates the average change in grip strength with an increase of one year in age.

$2 - (2.228)(8.390/\sqrt{255.98}),\ 2 + (2.228)(8.390/\sqrt{255.98}) = (0.832,\ 3.168)$

We can be 95% confident that the slope in the regression to predict grip strength from age is between 0.832 and 3.168 lbs. per year.
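The f and t statistics and the confidence interval in part (h) follow from the ANOVA decomposition of the simple linear regression. The sketch below reproduces them in plain Python. The critical values 4.96 and 2.228 are copied from the text rather than computed (no statistics library is assumed), so p-values are not reproduced here.

```python
import math

age = [15, 17, 19, 11, 16, 22, 17, 25, 12, 14, 25, 23]
grip = [50, 54, 66, 46, 58, 54, 64, 80, 46, 70, 76, 80]
F_CRIT = 4.96    # f_{1,10; 0.05}, quoted in part (h)
T_CRIT = 2.228   # t_{10; 0.025}, quoted in part (h)

n = len(age)
x_bar, y_bar = sum(age) / n, sum(grip) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in grip)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, grip))

b = s_xy / s_xx
ssr = b * s_xy                 # regression sum of squares, 1 df
mse = (s_yy - ssr) / (n - 2)   # error mean square, n - 2 = 10 df

f_stat = ssr / mse             # f test from the ANOVA table
se_b = math.sqrt(mse / s_xx)   # standard error of the slope
t_stat = b / se_b              # t test of H0: slope = 0 (t^2 equals f)
ci = (b - T_CRIT * se_b, b + T_CRIT * se_b)

print(f"f = {f_stat:.3f} vs {F_CRIT}; t = {t_stat:.3f} vs {T_CRIT}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")  # (0.832, 3.168)
```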

(i) A 0.05 significance level is chosen for a hypothesis test to see if there is any evidence that the mean grip strength for 20-year-old right-handed males is different from 80 lbs. Write the results of this hypothesis test in a format suitable for a journal article to be submitted for publication. With a t test, the estimated mean grip strength (66 lbs.) is statistically significantly different from the hypothesized mean (80 lbs.) at the 0.05 level ($t_{10} = -5.304$, $t_{10;\,0.025} = 2.228$, $p < 0.001$).

Considering the results of the hypothesis test, decide whether or not a 95% confidence interval for the mean grip strength for 20-year-old right-handed males would be of interest. If yes, find and interpret the confidence interval; if not, explain why. Since rejecting $H_0$ suggests that the hypothesized mean grip strength for 20-year-old right-handed males is not correct, a 95% confidence interval will provide us with some information about this mean.

$\hat{y} = 26 + 2(20) = 66$

$66 - (2.228)(8.390)\sqrt{1/12 + (20 - 18)^2/255.98},\ 66 + (2.228)(8.390)\sqrt{1/12 + (20 - 18)^2/255.98} = (60.12,\ 71.88)$

We can be 95% confident that the mean grip strength for 20-year-old right-handed males is between 60.12 and 71.88 lbs.
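The hypothesis test and interval in part (i) use the standard error of the estimated mean response, $s\sqrt{1/n + (x_0 - \bar{x})^2/\sum(x - \bar{x})^2}$. A minimal Python sketch of that arithmetic follows. It uses the exact $\sum(x - \bar{x})^2 = 256$ rather than the rounded 255.98 (which does not change the reported values at two decimals) and takes the critical value 2.228 from the text.

```python
import math

age = [15, 17, 19, 11, 16, 22, 17, 25, 12, 14, 25, 23]
grip = [50, 54, 66, 46, 58, 54, 64, 80, 46, 70, 76, 80]
T_CRIT = 2.228    # t_{10; 0.025}, quoted in the text
X0, MU0 = 20, 80  # age of interest and hypothesized mean grip strength

n = len(age)
x_bar, y_bar = sum(age) / n, sum(grip) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in grip)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, grip))
b = s_xy / s_xx
a = y_bar - b * x_bar
s = math.sqrt((s_yy - b * s_xy) / (n - 2))   # standard error of estimate

y_hat0 = a + b * X0                          # estimated mean at age 20
se_mean = s * math.sqrt(1 / n + (X0 - x_bar) ** 2 / s_xx)
t_stat = (y_hat0 - MU0) / se_mean            # test of H0: mean = 80
ci = (y_hat0 - T_CRIT * se_mean, y_hat0 + T_CRIT * se_mean)

print(f"estimate {y_hat0:.0f}, t = {t_stat:.3f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
# estimate 66, t = -5.304, CI = (60.12, 71.88)
```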

(j) Find and interpret a 95% prediction interval for the grip strength of a 20-year-old right-handed male.

$\hat{y} = 26 + 2(20) = 66$

$66 - (2.228)(8.390)\sqrt{1 + 1/12 + (20 - 18)^2/255.98},\ 66 + (2.228)(8.390)\sqrt{1 + 1/12 + (20 - 18)^2/255.98} = (46.40,\ 85.60)$

We are 95% confident that the grip strength for a randomly selected 20-year-old right-handed male will be between 46.40 and 85.60 lbs. OR Roughly 95% of 20-year-old right-handed males have a grip strength between 46.40 and 85.60 lbs.

(k) For what age group of right-handed males will the confidence interval for mean grip strength and the prediction interval for a particular grip strength both have the smallest length? 18-year-olds, since 18 is the sample mean age, where the term $(x - \bar{x})^2$ in both intervals equals zero.
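Part (j) differs from part (i) only by the extra 1 under the square root, and part (k) follows because both widths depend on the age $x_0$ only through $(x_0 - \bar{x})^2$. The sketch below (plain Python; `half_widths` is a hypothetical helper name) computes the prediction interval and scans whole-number ages in the sample range to locate the narrowest intervals.

```python
import math

age = [15, 17, 19, 11, 16, 22, 17, 25, 12, 14, 25, 23]
grip = [50, 54, 66, 46, 58, 54, 64, 80, 46, 70, 76, 80]
T_CRIT = 2.228   # t_{10; 0.025}, quoted in the text

n = len(age)
x_bar, y_bar = sum(age) / n, sum(grip) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in grip)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, grip))
b = s_xy / s_xx
a = y_bar - b * x_bar
s = math.sqrt((s_yy - b * s_xy) / (n - 2))


def half_widths(x0):
    """Return (CI half-width for the mean, PI half-width) at age x0."""
    h = 1 / n + (x0 - x_bar) ** 2 / s_xx
    return T_CRIT * s * math.sqrt(h), T_CRIT * s * math.sqrt(1 + h)


y_hat20 = a + b * 20
pi_hw = half_widths(20)[1]
pi = (y_hat20 - pi_hw, y_hat20 + pi_hw)
print(f"95% PI at age 20: ({pi[0]:.2f}, {pi[1]:.2f})")   # (46.40, 85.60)

# Both half-widths are smallest where (x0 - x_bar)^2 is smallest.
ages = range(min(age), max(age) + 1)
narrowest_ci = min(ages, key=lambda x0: half_widths(x0)[0])
narrowest_pi = min(ages, key=lambda x0: half_widths(x0)[1])
print(narrowest_ci, narrowest_pi)   # 18 18
```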

44. The prediction of score (0-100) on a test from hours of study is to be studied. A 0.05 significance level is to be used with a simple linear regression. The following data is recorded for a random sample of students:

Study Time (hrs): 2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 8, 10, 10, 10
Test Score (points): 65, 61, 95, 65, 61, 73, 77, 73, 82, 79, 81, 73, 99, 88, 79

(a) Identify the dependent (response) variable and the independent (explanatory) variable for a regression analysis. The dependent (response) variable is Y = test score, and the independent (explanatory) variable is study time.

(b) Does the data appear to be observational or experimental? Since the study times do not look random (each of 2, 4, 6, 8, and 10 hours is used for exactly three students), it appears that the data is experimental.

(c) In order to use SPSS to do the calculations needed for the statistical analysis, enter the data into an SPSS data file named time_score containing two variables named time and score. Then, go to the document titled Using SPSS for Windows (which can be accessed from the appropriate link on the course syllabus web page), go to the section titled Hypothesis Tests Involving Two Variables, and read the steps in the subsection titled Performing a Simple Linear Regression with Checks of Linearity, Homoscedasticity, and Normality Assumptions. Use these steps as a guide to obtaining the output needed for the remainder of this exercise. Once you have successfully generated SPSS output, add a title to the top of the output in the following format: YOUR NAME Basic Statistics Exercise 44(c). Verify that your SPSS output contains all of the following: a scatter plot displaying the least squares line; tables titled Descriptive Statistics, Correlations, Model Summary, ANOVA, and Coefficients; a normal probability plot; a histogram on which a bell-shaped curve has been superimposed; a plot of standardized predicted values versus standardized residuals.

(b) Use the SPSS output to find each of the following: $n = 15$, $\bar{x} = 6$, $\bar{y} = 76.73$, $r = +0.530$, $\sum(x - \bar{x})^2 = (n - 1)s_x^2 = (14)(2.928)^2 \approx 120.02$

(c) Use the SPSS output to find the equation of the least squares line. The least squares line can be written $\hat{y} = 64.333 + 2.067x$.

(d) Write a one-sentence interpretation of the slope in the least squares line. Test score appears to increase on average by about 2.067 points with each increase of one hour in study time.

(e) Find the coefficient of determination, and write a one-sentence interpretation. From the SPSS output, we find $r^2 = 0.281$. About 28.1% of the variation in test score is explained by study time.

(f) Find the standard error of estimate. From the SPSS output, we find $s = 10.048$.
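For Exercise 44 the same quantities can be recomputed from the raw data listed in the problem. This is an illustrative Python cross-check of the SPSS output, assuming the data are entered exactly as listed; it also shows that $\sum(x - \bar{x})^2$ is exactly 120, since each study time appears three times.

```python
import math

# Exercise 44 data: study time (hrs) and test score (points)
time = [2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 8, 10, 10, 10]
score = [65, 61, 95, 65, 61, 73, 77, 73, 82, 79, 81, 73, 99, 88, 79]

n = len(time)
x_bar, y_bar = sum(time) / n, sum(score) / n
s_xx = sum((x - x_bar) ** 2 for x in time)
s_yy = sum((y - y_bar) ** 2 for y in score)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(time, score))

b = s_xy / s_xx                        # slope
a = y_bar - b * x_bar                  # intercept
r = s_xy / math.sqrt(s_xx * s_yy)      # correlation coefficient
s = math.sqrt((s_yy - b * s_xy) / (n - 2))   # standard error of estimate

print(f"n = {n}, x-bar = {x_bar:.0f}, y-bar = {y_bar:.2f}, Sxx = {s_xx:.0f}")
print(f"y-hat = {a:.3f} + {b:.3f}x; r = {r:.3f}; r^2 = {r * r:.3f}; s = {s:.3f}")
# y-hat = 64.333 + 2.067x; r = 0.530; r^2 = 0.281; s = 10.048
```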

(g) Use the SPSS output to make a statement concerning whether each of the following assumptions in a simple linear regression is satisfied:

The linearity assumption: The data points appear to be randomly distributed about the least squares line on the scatter plot, and the residuals plotted against the predicted values look random. Consequently, the linearity assumption appears to be satisfied.

The uniform variance (homoscedasticity) assumption: The variation in standardized residuals around the horizontal line looks reasonably uniform.

The normality assumption: The histogram of standardized residuals looks only somewhat bell-shaped, although the points on the normal probability plot do not seem to depart too far from the diagonal line.

Since the necessary assumptions do not appear to be drastically violated, we feel it is appropriate to proceed with the regression analysis.

In the Word document named Basic_Statistics_Result_Summaries (created previously), begin a section titled Basic Statistics Exercises 44. In this section, create a subsection for each of parts (h), (i), and (j) which follow, and in each subsection created, write the summaries for the corresponding part. Print the page(s) and insert them immediately after this page.

(h) A 0.05 significance level is chosen for a hypothesis test to see if there is any evidence that the linear relationship between study time and test score is significant, that is, that the slope in the regression is significantly different from zero (0). Write the results of this hypothesis test two different ways: Write the results of the f test in the ANOVA table in a format suitable for a journal article to be submitted for publication. Write the results of the t test about the slope in a format suitable for a journal article to be submitted for publication. Also, considering the results of this hypothesis test, decide whether or not a 95% confidence interval for the slope in the regression would be of interest. If yes, find and interpret the confidence interval; if not, explain why.

(i) A 0.05 significance level is chosen for a hypothesis test to see if there is any evidence that the mean test score for students who study for 5 hours is different from 85 points. Write the results of this hypothesis test in a format suitable for a journal article to be submitted for publication. Considering the results of the hypothesis test, decide whether or not a 95% confidence interval for the mean test score for students who study for 5 hours would be of interest. If yes, find and interpret the confidence interval; if not, explain why.

(j) Find and interpret a 95% prediction interval for the test score of a student who studied for 5 hours.

(k) For what study time will the confidence interval for mean test score and the prediction interval for a particular test score both have the smallest length? 6 hours, the sample mean study time ($\bar{x} = 6$).
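Parts (h) through (j) of Exercise 44 ask for written summaries rather than numbers, but the underlying arithmetic mirrors Exercise 43. The sketch below computes the f and t statistics, the slope interval, the test of a mean score of 85 at 5 hours, and the two intervals at 5 hours. The critical values $f_{1,13;\,0.05} = 4.67$ and $t_{13;\,0.025} = 2.160$ are taken from standard tables, since the exercise does not state them; p-values and the wording of the conclusions are left to the reader.

```python
import math

time = [2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 8, 10, 10, 10]
score = [65, 61, 95, 65, 61, 73, 77, 73, 82, 79, 81, 73, 99, 88, 79]
# Critical values for n - 2 = 13 df, taken from standard tables
# (assumption: they are not stated in the exercise itself).
F_CRIT = 4.67    # f_{1,13; 0.05}
T_CRIT = 2.160   # t_{13; 0.025}
X0, MU0 = 5, 85  # study time of interest and hypothesized mean score

n = len(time)
x_bar, y_bar = sum(time) / n, sum(score) / n
s_xx = sum((x - x_bar) ** 2 for x in time)
s_yy = sum((y - y_bar) ** 2 for y in score)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(time, score))
b = s_xy / s_xx
a = y_bar - b * x_bar
ssr = b * s_xy
mse = (s_yy - ssr) / (n - 2)
s = math.sqrt(mse)

# (h) f test, t test, and 95% CI for the slope
f_stat = ssr / mse
se_b = math.sqrt(mse / s_xx)
t_slope = b / se_b
ci_slope = (b - T_CRIT * se_b, b + T_CRIT * se_b)

# (i) t test of H0: mean score at 5 hours = 85, and 95% CI for that mean
h = 1 / n + (X0 - x_bar) ** 2 / s_xx
y_hat0 = a + b * X0
se_mean = s * math.sqrt(h)
t_mean = (y_hat0 - MU0) / se_mean
ci_mean = (y_hat0 - T_CRIT * se_mean, y_hat0 + T_CRIT * se_mean)

# (j) 95% prediction interval for one student's score at 5 hours
se_pred = s * math.sqrt(1 + h)
pi = (y_hat0 - T_CRIT * se_pred, y_hat0 + T_CRIT * se_pred)

print(f"(h) f = {f_stat:.3f} vs {F_CRIT}; t = {t_slope:.3f}; "
      f"slope CI = ({ci_slope[0]:.3f}, {ci_slope[1]:.3f})")
print(f"(i) estimate = {y_hat0:.3f}; t = {t_mean:.3f}; "
      f"CI = ({ci_mean[0]:.2f}, {ci_mean[1]:.2f})")
print(f"(j) PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```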

45. In a study of the impact of temperature during the summer months on the maximum amount of power that must be generated to meet demand each day, the prediction of daily peak power load (megawatts) from daily high temperature (degrees Fahrenheit) is of interest. Data for 25 randomly selected summer days is stored in the SPSS data file powerloads (which can be accessed from the appropriate link on the course syllabus web page). A 0.05 significance level is chosen for hypothesis testing.

(a) Identify the dependent (response) variable and the independent (explanatory) variable for a regression analysis. The dependent (response) variable is Y = daily peak power load, and the independent (explanatory) variable is X = daily high temperature.

(b) Does the data appear to be observational or experimental? Since the daily high temperature is random, the data is observational.

(c) Use SPSS to do the calculations needed for a simple linear regression by going to the document titled Using SPSS for Windows (which can be accessed from the appropriate link on the course syllabus web page), going to the section titled Hypothesis Tests Involving Two Variables, and reading the steps in the subsection titled Performing a Simple Linear Regression with Checks of Linearity, Homoscedasticity, and Normality Assumptions. Once you have successfully generated SPSS output, add a title to the top of the output in the following format: YOUR NAME Basic Statistics Exercise 45(c). Verify that your SPSS output contains all of the following: a scatter plot displaying the least squares line; tables titled Descriptive Statistics, Correlations, Model Summary, ANOVA, and Coefficients; a normal probability plot; a histogram on which a bell-shaped curve has been superimposed; a plot of standardized predicted values versus standardized residuals.


(d) Use the SPSS output to make a statement concerning whether each of the following assumptions in a simple linear regression is satisfied:

The linearity assumption: The data points do not appear to be randomly distributed about the least squares line on the scatter plot; it seems that as temperature increases, power load increases at a faster rate. Also, the residuals plotted against the predicted values do not look random. Consequently, the linearity assumption does not appear to be satisfied.

The uniform variance (homoscedasticity) assumption and the normality assumption: Since the linearity assumption does not appear to be satisfied, it is not possible (or even relevant) to consider the uniform variance and normality assumptions.

We do not feel it is appropriate to proceed with the regression analysis.

(e) It is decided that a quadratic model to predict daily peak power load from daily high temperature will be considered to improve prediction. Write an equation which describes this model. $Y = a + b_1X + b_2X^2$ OR $\text{powerload} = a + b_1(\text{temp}) + b_2(\text{temp})^2$

(f) Use SPSS to do the calculations needed for a quadratic regression by going to the document titled Using SPSS for Windows (which can be accessed from the appropriate link on the course syllabus web page), going to the section titled Hypothesis Tests Involving Two or More Variables, and reading the steps in the subsection titled Performing a Quadratic Regression with Checks of Model, Homoscedasticity, and Normality Assumptions. Once you have successfully generated SPSS output, add a title to the top of the output in the following format: YOUR NAME Basic Statistics Exercise 45(f). Verify that your SPSS output contains all of the following: tables titled Descriptive Statistics, Correlations, Model Summary, ANOVA, and Coefficients; a normal probability plot; a histogram on which a bell-shaped curve has been superimposed; a plot of standardized predicted values versus standardized residuals.


(g) Use the SPSS output to make a statement concerning whether each of the following assumptions in the quadratic regression is satisfied:

The assumption of the quadratic model: Since the residuals plotted against the predicted values look random, the assumption of the quadratic model appears to be correct.

The uniform variance (homoscedasticity) assumption: The variation in standardized residuals around the horizontal line looks reasonably uniform.

The normality assumption: The histogram of standardized residuals looks somewhat bell-shaped, and the points on the normal probability plot do not seem to depart too far from the diagonal line.

Based on these observations, we feel it is appropriate to proceed with the quadratic regression analysis.

(h) A 0.05 significance level is chosen for the following hypothesis tests: First, write the results of the f test in the ANOVA table for the quadratic regression in a format suitable for a journal article to be submitted for publication. The f test in the ANOVA for the regression to predict power load from temperature and squared temperature is statistically significant at the 0.05 level ($f_{2,22} = 259.687$, $f_{2,22;\,0.05} = 3.44$, $p < 0.001$).

Next, complete the steps outlined below to calculate the f statistic for the hypothesis test to see if there is any evidence that the addition of squared temperature (the quadratic term) after temperature (the linear term) is statistically significant; then, write the results of this f test in a format suitable for a journal article to be submitted for publication.

$SSR(\text{temp}, \text{temp}^2)$ = the regression sum of squares from the ANOVA table with both temperature and squared temperature in the model = 15011.772
$SSR(\text{temp})$ = the regression sum of squares from the ANOVA table with only temperature in the model = 13196.400
$MSE(\text{temp}, \text{temp}^2)$ = the error mean square from the ANOVA table with both temperature and squared temperature in the model = 28.904
numerator df for the f statistic = number of new terms added to the model = 1
denominator df for the f statistic = df associated with $MSE(\text{temp}, \text{temp}^2)$ = 22

$f = \dfrac{SSR(\text{temp}, \text{temp}^2) - SSR(\text{temp})}{MSE(\text{temp}, \text{temp}^2)} = \dfrac{15011.772 - 13196.400}{28.904} = 62.807$

The addition of squared temperature after temperature to predict power load is statistically significant at the 0.05 level ($f_{1,22} = 62.807$, $f_{1,22;\,0.05} = 4.30$, $p < 0.001$).
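The extra-sum-of-squares calculation above generalizes to adding $q$ terms: $f = \{[SSR(\text{full}) - SSR(\text{reduced})]/q\}/MSE(\text{full})$. The Python sketch below applies it to the values quoted in part (h), with `partial_f` as a hypothetical helper name, and also recomputes the overall f statistic. The small discrepancy in the overall f (about 259.68 versus 259.687) comes from using the rounded MSE.

```python
# Values read from the two ANOVA tables in Exercise 45(h)
SSR_FULL = 15011.772     # SSR(temp, temp^2)
SSR_REDUCED = 13196.400  # SSR(temp)
MSE_FULL = 28.904        # MSE(temp, temp^2), 22 df
F_CRIT = 4.30            # f_{1,22; 0.05}, quoted in the text


def partial_f(ssr_full, ssr_reduced, mse_full, q):
    """Extra-sum-of-squares f statistic for adding q terms to a model."""
    return ((ssr_full - ssr_reduced) / q) / mse_full


f_quad = partial_f(SSR_FULL, SSR_REDUCED, MSE_FULL, q=1)
f_overall = (SSR_FULL / 2) / MSE_FULL   # overall f test, 2 and 22 df

print(f"partial f = {f_quad:.3f}; overall f = {f_overall:.2f}")
print("quadratic term significant at 0.05:", f_quad > F_CRIT)
```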

(i) Use the SPSS output to find the equation of the least squares quadratic. The least squares quadratic can be written $\widehat{\text{load}} = 385.048 - 8.293(\text{temp}) + 0.060(\text{temp})^2$.

(j) Find the multiple $R^2$, and write a one-sentence interpretation. From the SPSS output, we find $R^2 = 0.959$. About 95.9% of the variation in power load is explained by temperature and squared temperature.

(k) Find the standard error of estimate. From the SPSS output, we find $s = 5.37620$.

(l) Use the least squares quadratic to predict the daily peak power load on a day when the high temperature is 75 degrees Fahrenheit, and also on a day when the high temperature is 85 degrees Fahrenheit.

$385.048 - 8.293(75) + 0.060(75)^2 = 100.573$ megawatts

$385.048 - 8.293(85) + 0.060(85)^2 = 113.643$ megawatts
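The predictions in part (l) are direct evaluations of the fitted quadratic. A short Python sketch, using the coefficients as printed (so results inherit their rounding), also recovers $R^2$ from the ANOVA quantities quoted in part (h) via $SSE = MSE \times \text{df}_{\text{error}}$. The name `predict_load` is illustrative.

```python
# Coefficients of the least squares quadratic from Exercise 45(i)
A, B1, B2 = 385.048, -8.293, 0.060


def predict_load(temp):
    """Predicted daily peak power load (megawatts) at high temperature temp (deg F)."""
    return A + B1 * temp + B2 * temp ** 2


for temp in (75, 85):
    print(temp, round(predict_load(temp), 3))   # 75 100.573, then 85 113.643

# R^2 recovered from the ANOVA quantities in part (h): SSE = MSE * error df
SSR, MSE, DF_ERROR = 15011.772, 28.904, 22
r_squared = SSR / (SSR + MSE * DF_ERROR)
print(round(r_squared, 3))   # 0.959
```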

46. In a study to predict IgG (milligrams of immunoglobulin in blood), which is an indicator of long-term immunity, from maximal oxygen uptake (milliliters per kilogram), which is a measure of aerobic fitness level, data is taken on randomly selected subjects and stored in the SPSS data file aerobic (which can be accessed from the appropriate link on the course syllabus web page). A 0.05 significance level is chosen for hypothesis testing.

(a) Identify the dependent (response) variable and the independent (explanatory) variable for a regression analysis. The dependent (response) variable is Y = IgG, and the independent (explanatory) variable is X = maximal oxygen uptake.

(b) Does the data appear to be observational or experimental? Since the maximal oxygen uptake is random, the data is observational.

(c) Use SPSS to do the calculations needed for a simple linear regression by going to the document titled Using SPSS for Windows (which can be accessed from the appropriate link on the course syllabus web page), going to the section titled Hypothesis Tests Involving Two Variables, and reading the steps in the subsection titled Performing a Simple Linear Regression with Checks of Linearity, Homoscedasticity, and Normality Assumptions. Once you have successfully generated SPSS output, add a title to the top of the output in the following format: YOUR NAME Basic Statistics Exercise 46(c). Verify that your SPSS output contains all of the following: a scatter plot displaying the least squares line; tables titled Descriptive Statistics, Correlations, Model Summary, ANOVA, and Coefficients; a normal probability plot; a histogram on which a bell-shaped curve has been superimposed; a plot of standardized predicted values versus standardized residuals.

(d) Use the SPSS output to make a statement concerning whether each of the following assumptions in a simple linear regression is satisfied:

The linearity assumption: The data points do not appear to be randomly distributed about the least squares line on the scatter plot; it seems that as maximal oxygen uptake increases, IgG increases at a slower rate. Also, the residuals plotted against the predicted values do not look random. Consequently, the linearity assumption does not appear to be satisfied.

The uniform variance (homoscedasticity) assumption and the normality assumption: Since the linearity assumption does not appear to be satisfied, it is not possible (or even relevant) to consider the uniform variance and normality assumptions.

We do not feel it is appropriate to proceed with the regression analysis.

(e) It is decided that a quadratic model to predict IgG from maximal oxygen uptake will be considered to improve prediction. Write an equation which describes this model. $Y = a + b_1X + b_2X^2$ OR $\text{IgG} = a + b_1(\text{maxoxy}) + b_2(\text{maxoxy})^2$

(f) Use SPSS to do the calculations needed for a quadratic regression by going to the document titled Using SPSS for Windows (which can be accessed from the appropriate link on the course syllabus web page), going to the section titled Hypothesis Tests Involving Two or More Variables, and reading the steps in the subsection titled Performing a Quadratic Regression with Checks of Model, Homoscedasticity, and Normality Assumptions. Once you have successfully generated SPSS output, add a title to the top of the output in the following format: YOUR NAME Basic Statistics Exercise 46(f). Verify that your SPSS output contains all of the following: tables titled Descriptive Statistics, Correlations, Model Summary, ANOVA, and Coefficients; a normal probability plot; a histogram on which a bell-shaped curve has been superimposed; a plot of standardized predicted values versus standardized residuals.

(g) Use the SPSS output to make a statement concerning whether each of the following assumptions in the quadratic regression is satisfied:

The assumption of the quadratic model: Since the residuals plotted against the predicted values look random, the assumption of the quadratic model appears to be correct.

The uniform variance (homoscedasticity) assumption: The variation in standardized residuals around the horizontal line looks reasonably uniform.

The normality assumption: The histogram of standardized residuals looks somewhat bell-shaped, and the points on the normal probability plot do not seem to depart too far from the diagonal line.

Based on these observations, we feel it is appropriate to proceed with the quadratic regression analysis.

(h) A 0.05 significance level is chosen for the following hypothesis tests: First, write the results of the f test in the ANOVA table for the quadratic regression in a format suitable for a journal article to be submitted for publication. The f test in the ANOVA for the regression to predict IgG from maximal oxygen uptake and squared maximal oxygen uptake is statistically significant at the 0.05 level ($f_{2,27} = 203.159$, $f_{2,27;\,0.05} = 3.35$, $p < 0.001$).

Next, complete the steps outlined below to calculate the f statistic for the hypothesis test to see if there is any evidence that the addition of squared maximal oxygen uptake (the quadratic term) after maximal oxygen uptake (the linear term) is statistically significant; then, write the results of this f test in a format suitable for a journal article to be submitted for publication.

$SSR(\text{maxoxy}, \text{maxoxy}^2)$ = the regression sum of squares from the ANOVA table with both maximal oxygen uptake and squared maximal oxygen uptake in the model = 4602210.632
$SSR(\text{maxoxy})$ = the regression sum of squares from the ANOVA table with only maximal oxygen uptake in the model = 4472047.115
$MSE(\text{maxoxy}, \text{maxoxy}^2)$ = the error mean square from the ANOVA table with both maximal oxygen uptake and squared maximal oxygen uptake in the model = 11326.605
numerator df for the f statistic = number of new terms added to the model = 1
denominator df for the f statistic = df associated with $MSE(\text{maxoxy}, \text{maxoxy}^2)$ = 27

$f = \dfrac{SSR(\text{maxoxy}, \text{maxoxy}^2) - SSR(\text{maxoxy})}{MSE(\text{maxoxy}, \text{maxoxy}^2)} = \dfrac{4602210.632 - 4472047.115}{11326.605} = 11.492$

The addition of squared maximal oxygen uptake after maximal oxygen uptake to predict IgG is statistically significant at the 0.05 level ($f_{1,27} = 11.492$, $f_{1,27;\,0.05} = 4.21$, $0.001 < p < 0.01$).
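To see why the p-value for this test falls between 0.001 and 0.01 rather than below 0.001, compare the f statistic with upper-tail critical values of the $F(1, 27)$ distribution. In the sketch below, 4.21 is quoted in the text; 7.68 and 13.61 are taken from a standard F table (an assumption of this example, rounded to two decimals).

```python
# ANOVA quantities from Exercise 46(h)
SSR_FULL = 4602210.632     # SSR(maxoxy, maxoxy^2)
SSR_REDUCED = 4472047.115  # SSR(maxoxy)
MSE_FULL = 11326.605       # MSE(maxoxy, maxoxy^2), 27 df

# Upper-tail critical values of F(1, 27), keyed by significance level.
# 4.21 is quoted in the text; 7.68 and 13.61 come from a standard F table.
CRITICAL = {0.05: 4.21, 0.01: 7.68, 0.001: 13.61}

f_quad = (SSR_FULL - SSR_REDUCED) / 1 / MSE_FULL   # one new term, so q = 1
significant_at = [alpha for alpha, crit in CRITICAL.items() if f_quad > crit]

print(f"f = {f_quad:.3f}; significant at: {significant_at}")
# f = 11.492; significant at: [0.05, 0.01], so 0.001 < p < 0.01
```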

(i) Use the SPSS output to find the equation of the least squares quadratic. The least squares quadratic can be written $\widehat{\text{IgG}} = -1464.404 + 88.307(\text{maxoxy}) - 0.536(\text{maxoxy})^2$.

(j) Find the multiple $R^2$, and write a one-sentence interpretation. From the SPSS output, we find $R^2 = 0.938$. About 93.8% of the variation in IgG is explained by maximal oxygen uptake and squared maximal oxygen uptake.

(k) Find the standard error of estimate. From the SPSS output, we find $s = 106.427$.

(l) Use the least squares quadratic to predict the IgG for a person whose maximal oxygen uptake is 40 milliliters per kilogram.

$-1464.404 + 88.307(40) - 0.536(40)^2 = 1210.276$ milligrams
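With the negative intercept and negative quadratic coefficient, the fitted equation reproduces the prediction in part (l). The Python sketch below evaluates it and, as for Exercise 45, recovers $R^2$ and $s$ from the ANOVA quantities in part (h); `predict_igg` is an illustrative name, and the coefficients are used as printed.

```python
import math

# Coefficients of the least squares quadratic from Exercise 46(i)
A, B1, B2 = -1464.404, 88.307, -0.536


def predict_igg(maxoxy):
    """Predicted IgG (milligrams) at maximal oxygen uptake maxoxy (ml/kg)."""
    return A + B1 * maxoxy + B2 * maxoxy ** 2


print(round(predict_igg(40), 3))   # 1210.276

# Model summary recovered from the ANOVA quantities in part (h)
SSR, MSE, DF_ERROR = 4602210.632, 11326.605, 27
r_squared = SSR / (SSR + MSE * DF_ERROR)   # SST = SSR + SSE
s = math.sqrt(MSE)                         # standard error of estimate
print(round(r_squared, 3), round(s, 3))    # 0.938 106.427
```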