Self-Assessment Weeks 6 and 7: Multiple Regression with a Qualitative Predictor; Multiple Comparisons


1. Suppose we wish to assess the impact of five treatments on an outcome Y. How would these five treatments be coded as dummy variables? Present actual data to illustrate the coding.

Each case receiving a given treatment is coded 1 on that treatment's dummy variable, and 0 otherwise:

Y       Treatment 1   Treatment 2   Treatment 3   Treatment 4   Treatment 5   Treatment Received
Score        1             0             0             0             0               1
Score        0             1             0             0             0               2
Score        0             0             1             0             0               3
Score        0             0             0             1             0               4
Score        0             0             0             0             1               5

(In a regression model with an intercept, only four of these five indicators would be entered; the omitted treatment serves as the reference group.)

2. Below is a regression analysis with two variables:

Heart rate = beats per minute
Blood pressure medication = four drug prescriptions (Losartan, Ziac, Lisinopril [12.5 mg], and Lisinopril [40 mg])

Dummy variables were created for Ziac, Lisinopril 12.5, and Lisinopril 40; Losartan is the reference group.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .741(a)   .550       .531                4.75098
a Predictors: (Constant), lisinopril_40, lisinopril_12_5, ziac

ANOVA(b)
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   1982.987         3    660.996       29.284   .000(a)
  Residual     1625.171         72   22.572
  Total        3608.158         75
a Predictors: (Constant), lisinopril_40, lisinopril_12_5, ziac
b Dependent Variable: heart_rate

Coefficients(a)
                    Unstandardized Coefficients   Standardized Coefficients
Model               B          Std. Error         Beta      t        Sig.
1 (Constant)        61.714     1.037                        59.527   .000
  ziac              -12.597    1.550              -.762     -8.127   .000
  lisinopril_12_5   -1.067     1.550              -.065     -.689    .493
  lisinopril_40     -.190      1.466              -.012     -.130    .897
a Dependent Variable: heart_rate

Coefficients(a)
                    95% Confidence Interval for B
Model               Lower Bound   Upper Bound
1 (Constant)        59.648        63.781
  ziac              -15.687       -9.507
  lisinopril_12_5   -4.157        2.023
  lisinopril_40     -3.113        2.732
a Dependent Variable: heart_rate

(a) Overall, does it appear heart rate differs by blood pressure medication at the .05 level? Explain how you arrived at an answer for this question.

Yes. The model F ratio, 29.28, is significant at the .05 level (p < .001). Because drug prescription is the only predictor in the model, this F test assesses whether mean heart rate differs among the four drugs.

(b) Provide literal interpretations for each of the four unstandardized regression coefficients presented in the coefficients table.

b0 = 61.714: the mean heart rate for those taking Losartan (the reference group).
b1 = -12.597: the mean heart rate is 12.597 lower under Ziac than under Losartan.
b2 = -1.067: the mean heart rate is 1.067 lower under Lisinopril 12.5 than under Losartan.
b3 = -0.190: the mean heart rate is 0.190 lower under Lisinopril 40 than under Losartan.

(c) What is the predicted mean heart rate for someone taking Lisinopril 40? For someone taking Ziac?

Lisinopril 40:
Predicted heart rate = 61.714 - 12.597(Ziac) - 1.067(Lisin 12.5) - 0.190(Lisin 40)
                     = 61.714 - 12.597(0) - 1.067(0) - 0.190(1)
                     = 61.714 - 0.190
                     = 61.524

Ziac:
Predicted heart rate = 61.714 - 12.597(Ziac) - 1.067(Lisin 12.5) - 0.190(Lisin 40)
                     = 61.714 - 12.597(1) - 1.067(0) - 0.190(0)
                     = 61.714 - 12.597
                     = 49.117

(d) Which regression coefficients are statistically significant at the .05 level for the dummy variables?

Only the Ziac coefficient is statistically significant (t = -8.13, p < .001).
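The dummy coding from question 1 and the substitution in (c) can both be checked with a short Python sketch. The coefficients are taken from the SPSS table above; the function and variable names are illustrative, not part of the assignment:

```python
# Each drug's dummy variable is 1 for cases receiving that drug, 0 otherwise;
# Losartan is the reference group and gets no dummy of its own.
coefficients = {"ziac": -12.597, "lisinopril_12_5": -1.067, "lisinopril_40": -0.190}
intercept = 61.714  # mean heart rate for the Losartan reference group

def dummies_for(drug):
    """0/1 dummy codes for one case, given the drug received."""
    return {name: int(name == drug) for name in coefficients}

def predicted_heart_rate(drug):
    """Intercept plus the dummy-weighted coefficients for this case."""
    codes = dummies_for(drug)
    return intercept + sum(coefficients[name] * codes[name] for name in coefficients)

print(dummies_for("ziac"))                              # {'ziac': 1, 'lisinopril_12_5': 0, 'lisinopril_40': 0}
print(round(predicted_heart_rate("lisinopril_40"), 3))  # 61.524
print(round(predicted_heart_rate("ziac"), 3))           # 49.117
print(round(predicted_heart_rate("losartan"), 3))       # 61.714 (all dummies 0)
```

Note that a case in the reference group scores 0 on every dummy, so its prediction is just the intercept.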

(e) Write a very brief, one-sentence interpretation for each dummy variable coefficient in the SPSS coefficients table above. Take into account significance testing with an alpha = .05.

Ziac, b = -12.597: significant, so mean heart rate is lower when taking Ziac than when taking Losartan.
Lisin 12.5, b = -1.067: not significant, so mean heart rate is similar between Lisinopril 12.5 and Losartan.
Lisin 40, b = -0.190: not significant, so mean heart rate is similar between Lisinopril 40 and Losartan.

(f) Model R2 = .55; what does this tell us?

About 55% of the variance in heart rate can be predicted by drug prescription.

(g) What is the predicted mean heart rate difference between Ziac and Lisinopril 12.5?

Predicted mean for Ziac = 49.117
Predicted mean for Lisin 12.5 = 61.714 - 1.067 = 60.647
Mean difference = 49.117 - 60.647 = -11.53

(h) What is the interpretation for the 95% confidence interval for b3 (Lisin 40 dummy)?

One may be 95% confident that the true mean difference in heart rate between Lisinopril 40 and Losartan lies between -3.113 and 2.732.

3. If one wishes to compare Y across five treatments, and the per-comparison alpha = .10,

(a) how many pairwise comparisons are possible, and

With five treatments there are 10 possible pairwise comparisons. The following formula gives the number of pairwise comparisons, where N is the number of groups to compare:

Number of pairwise comparisons = N(N - 1) / 2 = (5 × 4) / 2 = 20 / 2 = 10

(b) what would be the familywise error rate?

The familywise error rate is 1 - (1 - alpha per comparison)^C, where C is the number of comparisons performed:

Familywise error rate = 1 - (1 - .10)^10 = 1 - (.90)^10 = .6513
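The two formulas in question 3 are easy to verify numerically. A minimal sketch (function names are illustrative):

```python
def n_pairwise(n_groups):
    """Number of distinct pairwise comparisons among n_groups."""
    return n_groups * (n_groups - 1) // 2

def familywise_error(alpha_pc, n_comparisons):
    """Familywise Type I error rate for independent comparisons
    at per-comparison alpha alpha_pc."""
    return 1 - (1 - alpha_pc) ** n_comparisons

print(n_pairwise(5))                          # 10 comparisons among 5 treatments
print(n_pairwise(4))                          # 6 comparisons among 4 drugs (question 4)
print(round(familywise_error(0.10, 10), 4))   # 0.6513
```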

(c) Interpret the familywise error rate calculated in (b) above.

The probability of committing at least one Type 1 error across the 10 comparisons is .6513 when each comparison has a per-comparison alpha of .10.

4. Question 2 above presented SPSS results for heart rate compared across four drug prescriptions. The per-comparison alpha was set at .05. The number of observations for the study is 76.

(a) There are six possible pairwise comparisons among the four drug prescriptions. This means there are six different confidence intervals that could be computed. Use a familywise error rate of .05 when considering these comparisons. Based upon this information, calculate and present the Bonferroni confidence interval for the Ziac vs. Losartan comparison.

For simplicity, I used the Bonferroni-adjusted critical t values found in this table:
http://www.bwgriffin.com/gsu/courses/edur8132/notes/bonferroni_critical_t_values.pdf

Bonferroni-adjusted t = 2.721 (for 6 comparisons, alpha = .05, and df2 = 60; note one cannot use the next tabled level of 120).

.95 CI = b ± (Bonferroni t)(SE b)
       = -12.597 ± (2.721)(1.55)
Upper  = -8.379
Lower  = -16.814

A more precise critical t value can be obtained in Excel with the function =T.INV.2T(adjusted alpha, df). The adjusted alpha is .05/6 = .008333, and the residual df for this model is 76 - 3 - 1 = 72 (also reported in the ANOVA table from SPSS above), so

=T.INV.2T(.008333, 72) = 2.7129

With this critical t the CI would be

.95 CI = b ± (Bonferroni t)(SE b)
       = -12.597 ± (2.7129)(1.55)
Upper  = -8.392
Lower  = -16.802

This interval agrees slightly better with the CI provided by SPSS.
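The Bonferroni interval above can be reproduced with a few lines of arithmetic. The critical t (2.7129) is the Excel T.INV.2T(.05/6, 72) value quoted in the answer; Python's standard library has no Student-t inverse, so it is hardcoded here:

```python
# Bonferroni confidence interval for the Ziac vs. Losartan comparison.
b, se_b = -12.597, 1.55   # dummy coefficient and its standard error from SPSS
t_bonf = 2.7129           # T.INV.2T(.05/6, 72), taken from the text above

lower = b - t_bonf * se_b
upper = b + t_bonf * se_b
print(round(lower, 3), round(upper, 3))   # -16.802 -8.392
```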

(b) Using the same information presented in (a), construct and present a Scheffé confidence interval for the Ziac vs. Losartan comparison.

To calculate the CI, first find the critical F ratio:

df1 = J - 1 = 4 - 1 = 3
df2 = n - k - 1 = 76 - 3 - 1 = 72

where J is the number of groups and k is the number of dummy variables in the regression equation. We want an overall familywise error rate of .05. The critical F can be found in Excel using =F.INV.RT(alpha level, df1, df2):

=F.INV.RT(0.05, 3, 72) = 2.7318

Next, convert this critical F ratio to a Scheffé-adjusted F ratio:

Scheffé F = (J - 1)(original critical F) = (3)(2.7318) = 8.1954

Then convert the Scheffé F to a critical Scheffé t value by taking its square root:

Scheffé t = √(Scheffé F) = √8.1954 = 2.8627

Now use the CI formula to find the upper and lower limits:

.95 CI = b ± (Scheffé t)(SE b)
       = -12.597 ± (2.8627)(1.55)
Upper  = -8.1598
Lower  = -17.034
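The Scheffé steps above can be chained in one short script. The critical F (2.7318) is the Excel F.INV.RT(0.05, 3, 72) value quoted in the answer and is hardcoded here:

```python
import math

# Scheffé confidence interval for the Ziac vs. Losartan comparison.
J = 4                     # number of groups (drug prescriptions)
F_crit = 2.7318           # F.INV.RT(0.05, 3, 72), taken from the text above
b, se_b = -12.597, 1.55   # dummy coefficient and its standard error from SPSS

scheffe_F = (J - 1) * F_crit        # 8.1954
scheffe_t = math.sqrt(scheffe_F)    # about 2.8628 (the handout rounds to 2.8627)
lower = b - scheffe_t * se_b
upper = b + scheffe_t * se_b
print(round(lower, 3), round(upper, 3))   # -17.034 -8.16
```

The Scheffé interval is slightly wider than the Bonferroni interval from (a), which is typical when only pairwise comparisons are of interest.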

5. Below is a data file containing the following variables for cars from model years 1970 through 1982:

mpg: miles per gallon
engine: engine displacement in cubic inches
horse: horsepower
weight: vehicle weight in pounds
accel: time to accelerate from 0 to 60 mph in seconds
year: model year (70 = 1970, to 82 = 1982)
origin: country of origin (1 = American, 2 = European, 3 = Japanese)
cylinder: number of cylinders

SPSS Data: http://www.bwgriffin.com/gsu/courses/edur8132/selfassessments/week04/cars_missing_deleted.sav
(Note: There are underscore marks between words in the SPSS data file name.)

Other Data Format: If you prefer a data file format other than SPSS, let me know.

For this problem we wish to know whether MPG differs among car origins:

Predicted MPG = b0 + origin of car with appropriate dummy variables

Present an APA-styled regression analysis with DV = MPG and IV = origin. Set alpha = .01. You will have to create the dummy variables for origin. Also present both the Bonferroni and Scheffé confidence intervals comparing each of the origins.

APA-styled response:

Table 1
Descriptive Statistics and Correlations Between MPG and Origin of Vehicle Manufacturer

           Correlations
Variable   MPG     USA     Europe
MPG        ---
USA        -.56*   ---
Europe     .24*    -.59*   ---
Mean       23.48   0.62    0.17
SD         7.78    0.48    0.38

Note. USA (1 = made in USA, 0 = others) and Europe (1 = made in Europe, 0 = others) are dummy variables. n = 391.
*p < .01.

Table 2
Regression Analysis of MPG by Origin of Vehicle

Variable    b        se b   99% CI          t
USA         -10.37   0.83   -12.51, -8.23   -12.56*
Europe      -2.85    1.06   -5.58, -0.12    -2.70*
Intercept   30.45    0.72   28.59, 32.31    42.42*

Note. R2 = .33, adj. R2 = .33, F(2, 388) = 96.03*, MSE = 40.70, n = 391. USA (1 = made in USA, 0 = others) and Europe (1 = made in Europe, 0 = others) are dummy variables.
*p < .01.
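A quick way to sanity-check the Mean row of Table 1: the mean of a 0/1 dummy variable is just the proportion of cases coded 1. Using the group counts from the SPSS descriptives reported below (244 American, 68 European, 79 Japanese):

```python
# Mean of a 0/1 dummy equals the proportion of cases in that group.
n_american, n_european, n_japanese = 244, 68, 79
n_total = n_american + n_european + n_japanese   # 391

print(round(n_american / n_total, 2))   # 0.62  (mean of the USA dummy in Table 1)
print(round(n_european / n_total, 2))   # 0.17  (mean of the Europe dummy in Table 1)
```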

Table 3
Multiple Comparisons in MPG by Origin of Vehicle

Comparison         Estimated Mean   Standard Error   99% Bonferroni-Adjusted   99% Scheffé-Adjusted
                   Difference       of Difference    CI of Mean Difference     CI of Mean Difference
USA vs. Japan      -10.37*          0.83             -12.81, -7.93             -12.89, -7.85
Europe vs. Japan   -2.85            1.06             -5.97, 0.27               -6.07, 0.37
USA vs. Europe     -7.52*           0.88             -10.11, -4.94             -10.20, -4.86

*p < .01, where p values are adjusted using the Bonferroni or Scheffé method depending upon which column is viewed.

(Note to EDUR 8132 students: Normally one would not present both the Bonferroni and Scheffé corrected confidence intervals, but for convenience I have included both in one table.)

Regression results show statistically significant mean differences in MPG across vehicles based on vehicle origin. More specifically, the table of multiple comparisons shows that vehicles made in the USA tend to have lower MPG than those made in Japan or Europe, and these differences are significant at the .01 level using the Bonferroni (or Scheffé) adjustment. The MPG difference between European and Japanese cars is not statistically significant, suggesting that vehicles from these two regions have similar mean MPG.

SPSS Output Used to Create the Above Tables

Correlations obtained using the SPSS correlation command.

Descriptive Statistics
                   Mean    Std. Deviation   N
Miles per Gallon   23.48   7.781            391
USA                .6240   .48499           391
Europe             .1739   .37952           391

Correlations
                                        Miles per Gallon   USA         Europe
Miles per Gallon   Pearson Correlation  1                  -.564(**)   .243(**)
                   Sig. (2-tailed)                         .000        .000
                   N                    391                391         391
USA                Pearson Correlation  -.564(**)          1           -.591(**)
                   Sig. (2-tailed)      .000                           .000
                   N                    391                391         391
Europe             Pearson Correlation  .243(**)           -.591(**)   1
                   Sig. (2-tailed)      .000               .000
                   N                    391                391         391
** Correlation is significant at the 0.01 level (2-tailed).

Below are SPSS results obtained with the General Linear Model; these were converted to regression results. See the Parameter Estimates table for the regression coefficients.
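The USA vs. Europe row of Table 3 can be recovered from the two dummy coefficients in Table 2, since both are deviations from the Japanese reference group. A quick check (values from the tables above):

```python
# With Japan as the reference group, each dummy coefficient is that origin's
# mean MPG minus the Japanese mean, so USA vs. Europe = b_USA - b_Europe.
intercept = 30.451   # Japanese mean MPG (from the SPSS parameter estimates)
b_usa = -10.372
b_europe = -2.848

usa_vs_europe = b_usa - b_europe
print(round(usa_vs_europe, 3))       # -7.524 (matches Table 3)
print(round(intercept + b_usa, 3))   # 20.079 (USA mean MPG, matches descriptives)
```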
The General Linear Model is convenient to use because it can provide Bonferroni and Scheffé comparisons.

Descriptive Statistics
Country of Origin   Mean    Std. Deviation   N
American            20.08   6.415            244
European            27.60   6.580            68
Japanese            30.45   6.090            79
Total               23.48   7.781            391

Tests of Between-Subjects Effects
Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   7817.309(a)               2     3908.655      96.030     .000
Intercept         194029.594                1     194029.594    4767.050   .000
origin            7817.309                  2     3908.655      96.030     .000
Error             15792.466                 388   40.702
Total             239224.740                391
Corrected Total   23609.775                 390
a R Squared = .331 (Adjusted R Squared = .328)

Estimates
                                          99% Confidence Interval
Country of Origin   Mean     Std. Error   Lower Bound   Upper Bound
American            20.079   .408         19.021        21.136
European            27.603   .774         25.600        29.606
Japanese            30.451   .718         28.593        32.309

Parameter Estimates
                                                     99% Confidence Interval
Parameter    B         Std. Error   t         Sig.   Lower Bound   Upper Bound
Intercept    30.451    .718         42.423    .000   28.593        32.309
[origin=1]   -10.372   .826         -12.559   .000   -12.510       -8.234
[origin=2]   -2.848    1.055        -2.698    .007   -5.580        -.116
[origin=3]   0(a)      .            .         .      .             .
a This parameter is set to zero because it is redundant.
Origin: 1 = American, 2 = European, 3 = Japanese

Pairwise Comparisons
(I) Country   (J) Country   Mean Difference                         99% Confidence Interval for Difference(a)
of Origin     of Origin     (I-J)             Std. Error   Sig.(a)   Lower Bound   Upper Bound
American      European      -7.524(*)         .875         .000      -10.108       -4.940
              Japanese      -10.372(*)        .826         .000      -12.811       -7.933
European      American      7.524(*)          .875         .000      4.940         10.108
              Japanese      -2.848            1.055        .022      -5.965        .269
Japanese      American      10.372(*)         .826         .000      7.933         12.811
              European      2.848             1.055        .022      -.269         5.965
Based on estimated marginal means
* The mean difference is significant at the .01 level.

a Adjustment for multiple comparisons: Bonferroni.

Multiple Comparisons
                                          Mean Difference                       99% Confidence Interval
Method       (I) Country   (J) Country    (I-J)             Std. Error   Sig.   Lower Bound   Upper Bound
Scheffe      American      European       -7.52(*)          .875         .000   -10.20        -4.85
                           Japanese       -10.37(*)         .826         .000   -12.89        -7.85
             European      American       7.52(*)           .875         .000   4.85          10.20
                           Japanese       -2.85             1.055        .027   -6.07         .37
             Japanese      American       10.37(*)          .826         .000   7.85          12.89
                           European       2.85              1.055        .027   -.37          6.07
Bonferroni   American      European       -7.52(*)          .875         .000   -10.11        -4.94
                           Japanese       -10.37(*)         .826         .000   -12.81        -7.93
             European      American       7.52(*)           .875         .000   4.94          10.11
                           Japanese       -2.85             1.055        .022   -5.96         .27
             Japanese      American       10.37(*)          .826         .000   7.93          12.81
                           European       2.85              1.055        .022   -.27          5.96
Based on observed means.
* The mean difference is significant at the .01 level.