We start from the identity

  Y_ijk − Ȳ_... = (Y_ijk − Ȳ_ij.) + (Ȳ_i.. − Ȳ_...) + (Ȳ_.j. − Ȳ_...) + (Ȳ_ij. − Ȳ_i.. − Ȳ_.j. + Ȳ_...)

Theorem A: Expectations of Sums of Squares

Under the two-way ANOVA model,

(1) E(MSE) = E(SSE/[IJ(K−1)]) = σ²

(2) E(MSA) = E(SSA/(I−1)) = σ² + (JK/(I−1)) Σ_{i=1}^I α_i²

(3) E(MSB) = E(SSB/(J−1)) = σ² + (IK/(J−1)) Σ_{j=1}^J β_j²

(4) E(MSAB) = E(SSAB/[(I−1)(J−1)]) = σ² + (K/[(I−1)(J−1)]) Σ_{i=1}^I Σ_{j=1}^J δ_ij²

Proof: These can be proved by using

Lemma A: Let X_i, i = 1,…,n, be independent random variables with E(X_i) = µ_i and Var(X_i) = σ². Then

  E(X_i − X̄)² = (µ_i − µ̄)² + ((n−1)/n) σ²,  where µ̄ = (1/n) Σ_{i=1}^n µ_i.

Apply it to (1): within cell (i,j) all K observations have the same mean, so

  E(Y_ijk − Ȳ_ij.)² = (0 − 0)² + ((K−1)/K) σ²,

and, summing over all IJK terms,

  E(SSE) = Σ_{i,j,k} E(Y_ijk − Ȳ_ij.)² = IJK · ((K−1)/K) σ² = IJ(K−1) σ².

Applying the lemma to (2) and to (3) is similar. For (4), notice that Y_ijk ∼ N(µ + α_i + β_j + δ_ij, σ²) and apply the lemma to SSTO:
  E(SSTO) = Σ_{i,j,k} E(Y_ijk − Ȳ_...)²
          = Σ_{i,j,k} [(µ + α_i + β_j + δ_ij − µ)² + ((IJK−1)/(IJK)) σ²]
          = (IJK−1) σ² + K Σ_{i,j} (α_i + β_j + δ_ij)²
          = (IJK−1) σ² + JK Σ_{i=1}^I α_i² + IK Σ_{j=1}^J β_j² + K Σ_{i=1}^I Σ_{j=1}^J δ_ij²

The last step is true because of the constraints on the parameters: when (α_i + β_j + δ_ij)² is expanded, all the cross terms vanish. For example,

  K Σ_{i=1}^I Σ_{j=1}^J α_i β_j = K (Σ_{i=1}^I α_i)(Σ_{j=1}^J β_j) = 0.

Based on (1)–(3) and E(SSTO), we can prove (4).

Theorem B: Distributions of Sums of Squares

Assume that the errors are independent and normally distributed with mean zero and variance σ². Then

(1) SSE/σ² follows a chi-squared distribution with IJ(K−1) degrees of freedom.

(2) Under the null H_0: α_i = 0, i = 1,…,I, SSA/σ² follows a chi-squared distribution with I−1 degrees of freedom.

(3) Under the null H_0: β_j = 0, j = 1,…,J, SSB/σ² follows a chi-squared distribution with J−1 degrees of freedom.

(4) Under the null H_0: δ_ij = 0, i = 1,…,I, j = 1,…,J, SSAB/σ² follows a chi-squared distribution with (I−1)(J−1) degrees of freedom.

(5) The sums of squares are independently distributed.

Proof. We only provide the proof for (1).
  SSE/σ² = (1/σ²) Σ_{i,j,k} (Y_ijk − Ȳ_ij.)²

Consider the sample in cell (i,j): {Y_ij1, Y_ij2, …, Y_ijK}. Here we fixed the level of factor A at i and the level of factor B at j. All the observations have the same distribution, N(µ + α_i + β_j + δ_ij, σ²), and they are independent. This is an iid sample because we assumed that Y_ijk = µ + α_i + β_j + δ_ij + ε_ijk, where ε_ijk ∼ N(0, σ²). So {Y_ij1, Y_ij2, …, Y_ijK} is a random sample from N(µ + α_i + β_j + δ_ij, σ²).

In 120B we learned the distribution of the sample variance of a random sample from a normal distribution. Let S_ij² denote the sample variance of this random sample; then

  (1/σ²) Σ_{k=1}^K (Y_ijk − Ȳ_ij.)² = (K−1) S_ij² / σ² ∼ χ²_{K−1}.

Since the S_ij² are computed from independent samples, they are independent. In addition, it is not difficult to verify that

  SSE = Σ_i Σ_j (K−1) S_ij².

Therefore SSE/σ² is a sum of IJ independent χ²_{K−1} random variables, so

  SSE/σ² ∼ χ²_{IJ(K−1)}.

Proofs for (2) and (3) are similar. Proofs for (4) and (5) are not required in this course.

For (2): think about the sample variance, under the null, of the random sample {Ȳ_1.., Ȳ_2.., …, Ȳ_I..}.

For (3): think about the sample variance, under the null, of the random sample {Ȳ_.1., Ȳ_.2., …, Ȳ_.J.}.

For (4): the rationale for df = (I−1)(J−1): there are IJ parameters δ_ij, but they are redundant. Given the constraints Σ_i δ_ij = 0 and Σ_j δ_ij = 0, the last row and the last column are determined by the others, which leaves (I−1)(J−1) free parameters.
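The conclusion SSE/σ² ∼ χ²_{IJ(K−1)} can be checked by simulation. Below is a minimal sketch in Python, assuming numpy and scipy are available; the design sizes I, J, K, the error variance, and the cell means are all made up for illustration (the distribution of SSE does not depend on the cell means, since each cell's mean is subtracted out):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
I, J, K, sigma = 3, 4, 5, 2.0        # made-up design: 3 x 4 cells, K = 5 replicates
n_sim = 2000

sse_over_sigma2 = np.empty(n_sim)
for s in range(n_sim):
    # arbitrary cell means mu + alpha_i + beta_j + delta_ij; they cancel
    # inside each cell, so any values work for checking SSE's distribution
    mu = rng.normal(size=(I, J, 1))
    y = mu + rng.normal(scale=sigma, size=(I, J, K))
    cell_means = y.mean(axis=2, keepdims=True)
    sse_over_sigma2[s] = ((y - cell_means) ** 2).sum() / sigma**2

df = I * J * (K - 1)                  # 48 for this design
print(np.mean(sse_over_sigma2))       # should be close to df
print(stats.kstest(sse_over_sigma2, "chi2", args=(df,)).pvalue)
```

The sample mean of the simulated values should be near IJ(K−1) = 48, and the Kolmogorov–Smirnov test should not reject the χ²_{48} distribution.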
Implication of a very large SSA: what does a large SSA tell us? Under the null

  H_0: α_i = 0, i = 1,…,I,

E(MSA) = E(MSE). If the null hypothesis is not true, E(MSA) > E(MSE). Therefore, we can use MSA/MSE as a test statistic. In fact, under the null it follows the F_{I−1, IJ(K−1)} distribution.

Summary:

(1) Under the null H_0: α_i = 0, i = 1,…,I,

  F = MSA/MSE = [SSA/(I−1)] / [SSE/[IJ(K−1)]]
    = [(SSA/σ²)/(I−1)] / [(SSE/σ²)/[IJ(K−1)]]
    =_d [χ²_{I−1}/(I−1)] / [χ²_{IJ(K−1)}/[IJ(K−1)]]
    ∼ F_{I−1, IJ(K−1)}

(2) Under the null H_0: β_j = 0, j = 1,…,J,

  F = MSB/MSE ∼ F_{J−1, IJ(K−1)}

(3) Under the null H_0: δ_ij = 0, i = 1,…,I, j = 1,…,J,

  F = MSAB/MSE ∼ F_{(I−1)(J−1), IJ(K−1)}

All the distributions have the same denominator degrees of freedom. This is because all the sums of squares are compared to the mean squared error.

ANOVA table for two-way models
Source   SS     df            MS                          F
A        SSA    I−1           MSA = SSA/(I−1)             MSA/MSE
B        SSB    J−1           MSB = SSB/(J−1)             MSB/MSE
AB       SSAB   (I−1)(J−1)    MSAB = SSAB/[(I−1)(J−1)]    MSAB/MSE
Error    SSE    IJ(K−1)       MSE = SSE/[IJ(K−1)]
Total    SSTO   IJK−1

The ANOVA table for the iron retention example:

Source        SS       df    MS      F
Iron form     2.074    1     2.074   5.99
Dosage        15.588   2     7.794   22.53
Interaction   0.810    2     0.405   1.17
Error         35.396   102   0.346
Total         53.768   107

To test the effect of iron form, we test

  H_0: α_1 = α_2 = 0  vs.  H_1: not all α_i equal 0

using the statistic F = MSA/MSE = … . Because the test statistic is …, we conclude that … . Similarly, dosage also has an effect on the iron retention. Last, we want to test whether there is an interaction effect by considering the test statistic F = MSAB/MSE = … . Since …, we conclude that … .
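The F ratios and their p-values can be recomputed directly from the mean squares reported in the iron retention table. A short sketch in Python, assuming scipy is available (only the numbers from the table above are used):

```python
from scipy import stats

# mean squared error and degrees of freedom from the iron retention ANOVA table
MSE, df_error = 0.346, 102
rows = {
    "Iron form":   (2.074, 1),   # (mean square, numerator df)
    "Dosage":      (7.794, 2),
    "Interaction": (0.405, 2),
}

for name, (MS, df) in rows.items():
    F = MS / MSE                      # e.g. iron form: 2.074 / 0.346 = 5.99
    p = stats.f.sf(F, df, df_error)   # upper-tail probability of F_{df, 102}
    print(f"{name}: F = {F:.2f}, p = {p:.4f}")
```

At level 0.05 this reproduces the table's F values and suggests that iron form and dosage are significant while the interaction is not, consistent with the discussion above.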
A brief review of all the tests we have discussed:

One group: X_1,…,X_n, H_0: µ = µ_0. We use the one-sample t-test.

Two groups, paired: we use the t-test for paired samples, which is the one-sample t-test applied to the within-pair differences.

Two independent groups: X_1,…,X_m and Y_1,…,Y_n. We use the two-sample t-test.

I groups: Y_ij. We use one-way ANOVA.

Two factors: Y_ijk. We use two-way ANOVA.
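Each test in this review has a standard implementation. A minimal sketch using scipy.stats on simulated data (the sample sizes and effect sizes are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# one group: H0: mu = mu_0
x = rng.normal(loc=0.5, scale=1.0, size=30)
t1 = stats.ttest_1samp(x, popmean=0.0)

# two paired groups: the paired t-test equals a one-sample t-test on differences
before = rng.normal(size=25)
after = before + rng.normal(loc=0.3, scale=0.5, size=25)
t_paired = stats.ttest_rel(after, before)
t_diff = stats.ttest_1samp(after - before, popmean=0.0)
assert np.isclose(t_paired.statistic, t_diff.statistic)  # same test

# two independent groups
y = rng.normal(loc=1.0, size=20)
t2 = stats.ttest_ind(x, y)

# I groups: one-way ANOVA
g1, g2, g3 = rng.normal(size=(3, 15)) + [[0.0], [0.5], [1.0]]
f1 = stats.f_oneway(g1, g2, g3)

# two factors: scipy has no single two-way ANOVA call; it is usually
# run via statsmodels (ols formula + anova_lm), not shown here
print(t1.pvalue, t_paired.pvalue, t2.pvalue, f1.pvalue)
```

The in-code assertion makes the point from the list above concrete: the paired t-test statistic is exactly the one-sample t-test statistic computed on the differences.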