Stat 6640 Solution to Midterm #2

1. A study was conducted to examine how three statistical software packages used in a statistics course affect the statistical competence a student achieves. At the end of the course, the student's score on a comprehensive, standardized examination is used to measure the student's competence. Twelve professors were randomly chosen and randomly and evenly split among the three statistical software packages. During the semester of interest, this course had 24 sections, and each of the 12 professors taught two sections. The data follow:

            package 1                  package 2                  package 3
     P1    P2    P3    P4       P5    P6    P7    P8       P9    P10   P11   P12
    83.2  85.4  88.1  75.8     89.0  74.7  74.2  80.2     83.7  79.9  74.7  88.8
    80.3  91.3  84.0  92.2     70.4  79.9  75.6  80.3     76.8  79.7  77.5  80.4

Of the four ANOVA tables in Appendix A, only one is correct for the problem.

(a) (2 points) Which ANOVA table provides the correct analysis? Why?

ANOVA #3 is the correct one. This is a fully nested design with the random factor Professor nested within the fixed factor Package. The error term for Package is Professor(Package), so the test statistic for Package is F = 102.86/17.39 = 5.91 on (2, 9) degrees of freedom.

(b) (2 points) Using your choice in Part (a), draw the appropriate conclusion. Use the 0.05 significance level.

Reject $H_0^a: \alpha_i = 0$ at the 0.05 significance level since the P-value is 0.023. Student competence appears to differ among the three packages.

(c) (4 points) Perform a stage 2 analysis on the factor Package using Tukey's all-pairwise 95% confidence intervals. A table of studentized range distribution quantiles is provided in Appendix A.

The half-width of the intervals for Tukey's Honest Significant Differences is

$$ q_{\alpha;k,df_E} \sqrt{\frac{MS_{B(A)}}{bn}} = q_{0.05;3,9} \sqrt{\frac{17.39}{4 \cdot 2}} = 3.948 \sqrt{\frac{17.39}{8}} = 5.82. $$

The mean competence scores are 85.04, 78.04, and 80.19, respectively, for the three packages. The table below gives the pairwise differences in means $\bar{y}_{i\cdot\cdot} - \bar{y}_{j\cdot\cdot}$. A difference is significant when its absolute value exceeds 5.82; significant differences are marked with an asterisk:

             j = 2     j = 3
    i = 1    7.00*     4.85
    i = 2             -2.15

To summarize the above findings, with the packages ordered by mean competence:

    Package              2       3       1
    Mean Competence    78.04   80.19   85.04

Only packages 1 and 2 differ significantly; package 3 is not significantly different from either of the other two.

2. (4 points) A study was set up to determine whether the relative cholesterol measurements (mg/dl) for patients were consistent from run to run in the clinic. Serum samples were taken from four patients randomly selected from a pool of patients, and the cholesterol level was measured from these samples. Two independent replicate tubes were prepared for each patient for each of three (randomly chosen) runs on a spectrophotometer. The data are listed:

                      Patient
    run       1       2       3       4
     1      167.3   186.7   214.5   148.5
            166.7   184.2   215.3   148.5
     2      179.6   193.8   228.9   158.6
            175.3   198.9   220.4   154.7
     3      169.4   179.4   208.2   144.7
            165.9   177.6   207.1   145.9

Use the computer output in Appendix B to analyze the data and draw a conclusion. (Use the 0.10 significance level for the interaction and the 0.05 significance level for the main factors.)
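The ANOVA table in Appendix B can be cross-checked directly from the data listed above. The following is a minimal sketch in plain Python, assuming the balanced two-factor crossed layout described in the problem (3 runs, 4 patients, 2 replicates) and a random-effects model in which both main effects are tested against the interaction. The variable names are illustrative and not part of the original output; P-values are not computed because that would need an F-distribution routine from an external library.

```python
# Cholesterol data from Problem 2: data[run][patient] = (replicate 1, replicate 2)
data = [
    [(167.3, 166.7), (186.7, 184.2), (214.5, 215.3), (148.5, 148.5)],  # run 1
    [(179.6, 175.3), (193.8, 198.9), (228.9, 220.4), (158.6, 154.7)],  # run 2
    [(169.4, 165.9), (179.4, 177.6), (208.2, 207.1), (144.7, 145.9)],  # run 3
]
a, b, n = len(data), len(data[0]), len(data[0][0])  # runs, patients, replicates


def mean(values):
    values = list(values)
    return sum(values) / len(values)


grand = mean(y for row in data for cell in row for y in cell)
run_means = [mean(y for cell in row for y in cell) for row in data]
patient_means = [mean(y for row in data for y in row[j]) for j in range(b)]
cell_means = [[mean(cell) for cell in row] for row in data]

# Sums of squares for a balanced two-way layout with replication.
ss_run = b * n * sum((m - grand) ** 2 for m in run_means)
ss_patient = a * n * sum((m - grand) ** 2 for m in patient_means)
ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
ss_inter = ss_cells - ss_run - ss_patient
ss_error = sum((y - cell_means[i][j]) ** 2
               for i in range(a) for j in range(b) for y in data[i][j])

ms_run = ss_run / (a - 1)
ms_patient = ss_patient / (b - 1)
ms_inter = ss_inter / ((a - 1) * (b - 1))
ms_error = ss_error / (a * b * (n - 1))

# Random-effects model: main effects are tested against the interaction,
# and the interaction is tested against the error.
f_run = ms_run / ms_inter
f_patient = ms_patient / ms_inter
f_inter = ms_inter / ms_error

for name, ss, ms, f in [("run", ss_run, ms_run, f_run),
                        ("patient", ss_patient, ms_patient, f_patient),
                        ("run*patient", ss_inter, ms_inter, f_inter)]:
    print(f"{name:12s} SS={ss:10.1f} MS={ms:8.1f} F={f:7.2f}")
print(f"{'Error':12s} SS={ss_error:10.1f} MS={ms_error:8.1f}")
```

The printed sums of squares and F ratios should agree with the Appendix B table up to rounding.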

This is a two-factor completely crossed random-effects model. The interaction is not significant at the 0.10 significance level since its P-value is 0.231. Both run and patient are significant at the 0.05 significance level since both P-values are reported as 0.000. Suppose the negligible variance component due to the interaction is regarded as part of the error. The total variance of an observation is then 50.328 + 764.044 + (1.957 + 6.556) = 822.885. The contributions of the variance components are 6.1%, 92.8%, and 1.0%, respectively, for run, patient, and error. By far the largest share of the variation comes from patients.

3. (3 points) Eight factorial treatments are compared in an 8 × 8 Latin square. The eight treatment combinations 11, 12, 13, 14, 21, 22, 23, and 24 of two factors of primary interest correspond to the Latin letters A, B, C, D, E, F, G, and H. These two factors are Temperature at two levels (level 1 = 150 °C and level 2 = 200 °C) and Catalyst Charge at four levels (level 1 = 2%, level 2 = 4%, level 3 = 6%, and level 4 = 8%). The two block factors are Material (row) and Day (column). What are the degrees of freedom of the following ANOVA table? Briefly describe how the SS for the Interaction (i.e., the Temperature by Catalyst Charge interaction) is computed.

                          column (Day)
                    11 12 13 14 21 22 23 24
                    12 11 14 13 22 21 24 23
                    13 14 11 12 23 24 21 22
    row             14 13 12 11 24 23 22 21
    (Material)      21 22 23 24 11 12 13 14
                    22 21 24 23 12 11 14 13
                    23 24 21 22 13 14 11 12
                    24 23 22 21 14 13 12 11

    ANOVA
    Source                     df
    Material                    7
    Day                         7
    Treatment Combination       7
      Temperature               1
      Catalyst Charge           3
      Interaction               3
    Error                      42
    Total                      63

The degrees of freedom are given in the ANOVA table above. Note that the row, column, and Latin-letter (treatment) factors each have $s - 1 = 8 - 1 = 7$ degrees of freedom, and the Error term has $(s - 1)(s - 2) = (8 - 1)(8 - 2) = 42$ degrees of freedom.

Denote by $y_{ijkl}$ the response at Material $i$, Day $j$, Temperature $k$, and Catalyst Charge $l$ for a combination $(i, j, k, l)$ from the Latin square. The grand mean is computed by

$$ \bar{y}_{\cdot\cdot\cdot\cdot} = \frac{1}{64} \sum_{(i,j,k,l)} y_{ijkl}. $$

The treatment combination means are

$$ \bar{y}_{\cdot\cdot kl} = \frac{1}{8} \sum_{(i,j)} y_{ijkl}, \quad k = 1, 2, \; l = 1, 2, 3, 4. $$

The Temperature means are

$$ \bar{y}_{\cdot\cdot k\cdot} = \frac{1}{32} \sum_{(i,j,l)} y_{ijkl}, \quad k = 1, 2. $$

The Catalyst Charge means are

$$ \bar{y}_{\cdot\cdot\cdot l} = \frac{1}{16} \sum_{(i,j,k)} y_{ijkl}, \quad l = 1, 2, 3, 4. $$

The treatment combination sum of squares is

$$ SS_{treatment} = 8 \sum_{(k,l)} (\bar{y}_{\cdot\cdot kl} - \bar{y}_{\cdot\cdot\cdot\cdot})^2. $$

The Temperature sum of squares is

$$ SS_{Temperature} = 32 \sum_{k=1}^{2} (\bar{y}_{\cdot\cdot k\cdot} - \bar{y}_{\cdot\cdot\cdot\cdot})^2. $$

The Catalyst Charge sum of squares is

$$ SS_{Catalyst\,Charge} = 16 \sum_{l=1}^{4} (\bar{y}_{\cdot\cdot\cdot l} - \bar{y}_{\cdot\cdot\cdot\cdot})^2. $$

The interaction sum of squares is

$$ SS_{interaction} = SS_{treatment} - SS_{Temperature} - SS_{Catalyst\,Charge}. $$
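The decomposition above can be illustrated numerically. The sketch below uses the Latin square layout from the problem, but the responses are HYPOTHETICAL: the exam gives no response data, so values are simulated from a made-up model purely to exercise the formulas. It checks that the layout is a Latin square, computes the three sums of squares, obtains the interaction by subtraction, and compares that with the direct interaction formula $8 \sum_{(k,l)} (\bar{y}_{\cdot\cdot kl} - \bar{y}_{\cdot\cdot k\cdot} - \bar{y}_{\cdot\cdot\cdot l} + \bar{y}_{\cdot\cdot\cdot\cdot})^2$, which is equivalent in this balanced design.

```python
import random

# Latin square layout from Problem 3: rows = Material, columns = Day.
# Each entry "kl" means Temperature level k and Catalyst Charge level l.
layout = [
    "11 12 13 14 21 22 23 24",
    "12 11 14 13 22 21 24 23",
    "13 14 11 12 23 24 21 22",
    "14 13 12 11 24 23 22 21",
    "21 22 23 24 11 12 13 14",
    "22 21 24 23 12 11 14 13",
    "23 24 21 22 13 14 11 12",
    "24 23 22 21 14 13 12 11",
]
square = [[(int(t[0]), int(t[1])) for t in row.split()] for row in layout]
s = len(square)
treatments = {(k, l) for k in (1, 2) for l in (1, 2, 3, 4)}

# Latin property: every treatment appears once in each row and each column.
is_latin = (all(set(row) == treatments for row in square) and
            all({square[i][j] for i in range(s)} == treatments
                for j in range(s)))

# Degrees of freedom for an s x s Latin square.
df = {"Material": s - 1, "Day": s - 1, "Treatment": s - 1,
      "Error": (s - 1) * (s - 2)}

# HYPOTHETICAL responses (not from the exam): made-up effects plus noise.
rng = random.Random(0)
y = {}
for i in range(s):
    for j in range(s):
        k, l = square[i][j]
        y[(i, j, k, l)] = (50 + 0.5 * i + 0.3 * j + 2.0 * k + 1.5 * l
                           + 0.4 * k * l + rng.gauss(0, 1))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


grand = mean(y.values())
trt_mean = {t: mean(v for (_, _, k, l), v in y.items() if (k, l) == t)
            for t in treatments}
temp_mean = {k: mean(v for (_, _, kk, _), v in y.items() if kk == k)
             for k in (1, 2)}
cat_mean = {l: mean(v for (_, _, _, ll), v in y.items() if ll == l)
            for l in (1, 2, 3, 4)}

ss_treatment = 8 * sum((m - grand) ** 2 for m in trt_mean.values())
ss_temperature = 32 * sum((m - grand) ** 2 for m in temp_mean.values())
ss_catalyst = 16 * sum((m - grand) ** 2 for m in cat_mean.values())
ss_interaction = ss_treatment - ss_temperature - ss_catalyst

# Direct formula for the interaction, for comparison.
ss_interaction_direct = 8 * sum(
    (trt_mean[(k, l)] - temp_mean[k] - cat_mean[l] + grand) ** 2
    for (k, l) in treatments)

print(is_latin, df, sum(df.values()))
print(ss_treatment, ss_temperature, ss_catalyst, ss_interaction)
```

The multipliers 8, 32, and 16 are the numbers of observations behind each treatment combination mean, Temperature mean, and Catalyst Charge mean, respectively.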

Appendix A: ANOVA Tables for Problem 1

ANOVA #1
    Source                DF      SS       MS       F       P      Expected Mean Square
    Package                2    205.72   102.86    2.97   0.089    σ^2 + 8κ_A^2
    Professor              3     51.36    17.12    0.50   0.692    σ^2 + 6κ_B^2
    Package*Professor      6    105.16    17.53    0.51   0.792    σ^2 + 2κ_AB^2

ANOVA #2
    Source                DF      SS       MS       F       P      Expected Mean Square
    Package                2    205.72   102.86    5.87   0.039    σ^2 + 2σ_AB^2 + 8κ_A^2
    Professor              3     51.36    17.12    0.98   0.463    σ^2 + 2σ_AB^2 + 6σ_B^2
    Package*Professor      6    105.16    17.53    0.51   0.792    σ^2 + 2σ_AB^2

ANOVA #3
    Source                DF      SS       MS       F       P      Expected Mean Square
    Package                2    205.72   102.86    5.91   0.023    σ^2 + 2σ_b^2 + 8κ_a^2
    Professor(Package)     9    156.52    17.39    0.50   0.846    σ^2 + 2σ_b^2

ANOVA #4
    Source                DF      SS       MS       F       P      Expected Mean Square
    Package                2    205.72   102.86    2.97   0.089    σ^2 + 8κ_a^2
    Professor(Package)     9    156.52    17.39    0.50   0.846    σ^2 + 2κ_b^2

The table below gives the upper α-fractile of the Studentized range distribution, $q_{\alpha;k,df_E}$, where $k$ is the number of treatment means under comparison and $df_E$ is the error degrees of freedom of the respective model.

                     α = 0.05                          α = 0.01
     k    df_E = 6  df_E = 9  df_E = 12     df_E = 6  df_E = 9  df_E = 12
     2     3.46      3.199     3.081         5.243     4.596     4.32
     3     4.339     3.948     3.773         6.331     5.428     5.046
     4     4.896     4.415     4.199         7.033     5.957     5.502
    12     6.789     5.983     5.615         9.485     7.784     7.06
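The entries of ANOVA #3 and the Tukey comparison for Problem 1 can be reproduced from the raw scores. This is a minimal sketch in plain Python, assuming the nested layout described in the problem (4 professors within each of 3 packages, 2 sections per professor) and taking the critical value 3.948 from the table above (α = 0.05, k = 3, df_E = 9). The variable names are illustrative only.

```python
import math
from itertools import combinations

# Exam scores from Problem 1:
# scores[package][professor] = (section 1, section 2)
scores = [
    [(83.2, 80.3), (85.4, 91.3), (88.1, 84.0), (75.8, 92.2)],  # package 1: P1-P4
    [(89.0, 70.4), (74.7, 79.9), (74.2, 75.6), (80.2, 80.3)],  # package 2: P5-P8
    [(83.7, 76.8), (79.9, 79.7), (74.7, 77.5), (88.8, 80.4)],  # package 3: P9-P12
]
a, b, n = len(scores), len(scores[0]), len(scores[0][0])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


grand = mean(y for pkg in scores for prof in pkg for y in prof)
pkg_means = [mean(y for prof in pkg for y in prof) for pkg in scores]
prof_means = [[mean(prof) for prof in pkg] for pkg in scores]

# Nested design: professors are compared with their own package mean.
ss_package = b * n * sum((m - grand) ** 2 for m in pkg_means)
ss_prof = n * sum((prof_means[i][j] - pkg_means[i]) ** 2
                  for i in range(a) for j in range(b))
ms_package = ss_package / (a - 1)
ms_prof = ss_prof / (a * (b - 1))

# Package (fixed) is tested against Professor(Package) (random).
f_package = ms_package / ms_prof

# Tukey half-width with the tabulated critical value q_{0.05;3,9}.
q_crit = 3.948
half_width = q_crit * math.sqrt(ms_prof / (b * n))

# Package pairs (1-based) whose mean difference exceeds the half-width.
significant = [(i + 1, j + 1) for i, j in combinations(range(a), 2)
               if abs(pkg_means[i] - pkg_means[j]) > half_width]

print([round(m, 2) for m in pkg_means])
print(round(ss_package, 2), round(ss_prof, 2), round(f_package, 2))
print(round(half_width, 2), significant)
```

The denominator of the half-width uses MS for Professor(Package) rather than the residual mean square, matching the choice of error term for Package in ANOVA #3.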

Appendix B: Computer Output for Problem 2

Analysis of Variance for cholesterol

    Source         DF        SS        MS        F       P
    run             2      826.2     413.1     39.46   0.000
    patient         3    13784.2    4594.7    438.85   0.000
    run*patient     6       62.8      10.5      1.60   0.231
    Error          12       78.7       6.6
    Total          23    14751.9

    S = 2.56052   R-Sq = 99.47%   R-Sq(adj) = 98.98%

                       Variance   Error   Expected Mean Square for Each Term
       Source         component    term   (using unrestricted model)
    1  run               50.328      3    (4) + 2 (3) + 8 (1)
    2  patient          764.044      3    (4) + 2 (3) + 6 (2)
    3  run*patient        1.957      4    (4) + 2 (3)
    4  Error              6.556           (4)
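The variance components in the output follow from equating each mean square to its expected mean square and solving. The sketch below, in plain Python, starts from the SS and DF columns of the table above (the MS column is rounded to one decimal, so SS/DF is used instead) and then pools the interaction with the error, as in the Problem 2 solution. The dictionary and variable names are illustrative assumptions, not part of the original output.

```python
# SS and DF as printed in the Appendix B ANOVA table.
ss = {"run": 826.2, "patient": 13784.2, "run*patient": 62.8, "error": 78.7}
dof = {"run": 2, "patient": 3, "run*patient": 6, "error": 12}
ms = {term: ss[term] / dof[term] for term in ss}

# Expected mean squares from the output (unrestricted model):
#   run:         (4) + 2 (3) + 8 (1)
#   patient:     (4) + 2 (3) + 6 (2)
#   run*patient: (4) + 2 (3)
#   error:       (4)
# Solving these equations gives the method-of-moments estimates.
var_error = ms["error"]
var_inter = (ms["run*patient"] - ms["error"]) / 2
var_run = (ms["run"] - ms["run*patient"]) / 8
var_patient = (ms["patient"] - ms["run*patient"]) / 6

# Treat the negligible interaction component as part of the error.
var_pooled = var_inter + var_error
total = var_run + var_patient + var_pooled

shares = {
    "run": 100 * var_run / total,
    "patient": 100 * var_patient / total,
    "error (pooled)": 100 * var_pooled / total,
}

print(round(var_run, 3), round(var_patient, 3),
      round(var_inter, 3), round(var_error, 3))
for name, pct in shares.items():
    print(f"{name:15s} {pct:5.1f}%")
```

Because the inputs are the rounded SS values, the components may differ from the printed ones in the third decimal place; the percentage shares are not sensitive to this.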