Analysis of variance using orthogonal projections


Rasmus Waagepetersen

Abstract

The purpose of this note is to show how statistical theory for inference in balanced ANOVA models can be conveniently developed using orthogonal projections. The approach is exemplified for one- and two-way ANOVA, and finally some hints are given regarding the extension to three- or higher-way ANOVA.

1 Prerequisites on factors and projections

Analysis of variance (ANOVA) models are specified in terms of grouping variables, or factors. Suppose observations $y_i$ are indexed by a set $I$ of cardinality $n$. A factor $F$ is a function that assigns a grouping label, from a finite set of labels, to each observation. E.g. $F(i) = q$ means that observation $y_i$ (or index $i$) is assigned to group/level $q$ for the factor $F$. Suppose $F$ generates $k$ groups. The design matrix $Z_F$ corresponding to $F$ is then $n \times k$, and the $iq$th entry of $Z_F$ is 1 if $i$ is assigned to group $q$ and 0 otherwise. Note that in many applications $i$ is a multi-index of the form $i = i_1 i_2 \cdots i_p$ for $p \ge 1$.

For any factor $F$ we denote by $L_F$ the column space of $Z_F$. The orthogonal projection on $L_F$ is denoted $P_F$. The result of applying $P_F$ to a vector $(y_i)_{i \in I}$ is that $y_i$ is replaced by the average of the observations in the group that $i$ belongs to (i.e. the average of those $y_l$ for which $F(l) = F(i)$).

Two factors play a special role. The unit factor $I$ has a unique level for each observation, so $L_I = \mathbb{R}^n$ and $P_I = I$ (with an abuse of notation, $I$ is used both for the factor and the identity matrix). The factor $0$ assigns all observations to the same group, so $L_0 = \mathrm{span}\{1_n\}$ and $P_0 = 1_n 1_n^T / n$. Note that $P_0 y$ is simply the vector where each component is given by the average $\bar{y} = \sum_{i \in I} y_i / n$. For any factor $F$, $L_0 \subseteq L_F \subseteq L_I$.
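To make the definitions concrete, here is a minimal numerical sketch (in Python/NumPy, with a made-up factor of $k = 3$ groups; none of this is part of the note itself) of the design matrix $Z_F$ and the projections $P_F$ and $P_0$: applying $P_F$ returns group averages and applying $P_0$ returns the overall average.

```python
# Illustration only: a made-up factor with k = 3 groups and n = 6 observations.
import numpy as np

def design_matrix(labels):
    """n x k indicator matrix Z_F: entry (i, q) is 1 if observation i has level q."""
    levels = sorted(set(labels))
    return np.array([[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels])

def projection(Z):
    """Orthogonal projection onto the column space of Z (full column rank assumed)."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

F = [1, 1, 2, 2, 3, 3]
y = np.array([4.0, 6.0, 1.0, 3.0, 7.0, 9.0])

Z_F = design_matrix(F)
P_F = projection(Z_F)
P_0 = np.ones((6, 6)) / 6          # projection onto span{1_n}

print(P_F @ y)                     # group averages: 5, 5, 2, 2, 8, 8
print(P_0 @ y)                     # overall average 5 in every entry
```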

A factor $F$ is said to be balanced if there is a common number $m$ of observations at each of the $k$ levels (whereby $n = mk$). In this case, the orthogonal projection $P_F$ on $L_F$ is
$$P_F = \frac{1}{m} Z_F Z_F^T.$$
This type of identity is crucial in the following and is the reason why we focus on balanced factors.

2 One-way ANOVA

Consider the model
$$Y_{ij} = \xi + U_i + \epsilon_{ij}, \quad i = 1, \dots, k, \ j = 1, \dots, n_i,$$
where $\xi \in \mathbb{R}$, $U_i \sim N(0, \tau^2)$, $\epsilon_{ij} \sim N(0, \sigma^2)$, $\tau^2, \sigma^2 \ge 0$, and the variables $U_i$ and $\epsilon_{ij}$ are independent, $i = 1, \dots, k$ and $j = 1, \dots, n_i$. Let $I$ consist of the indices $ij$ for $i = 1, \dots, k$ and $j = 1, \dots, n_i$, and define the factor $F$ by $F(ij) = i$. Then, stacking variables on top of each other, we can write the model in vector form as
$$Y = 1_n \xi + Z_F U + \epsilon$$
where $n$ is the total number of observations. As noted in the previous section, $Z_F$ is the design matrix corresponding to $F$: the $(ij, q)$th entry of $Z_F$ is 1 if $Y_{ij}$ belongs to the $q$th group and zero otherwise.

2.1 Orthogonal decomposition

We now obtain an orthogonal decomposition of $\mathbb{R}^n$:
$$\mathbb{R}^n = V_0 \oplus V_F \oplus V_I$$
where $V_0 = L_0 = \mathrm{span}(1_n)$, $V_F = L_F \ominus V_0$ and $V_I = \mathbb{R}^n \ominus L_F$. Here $\oplus$ denotes the sum of orthogonal subspaces: $L_1 \oplus L_2 = \{x + y \mid x \in L_1, y \in L_2\}$ for orthogonal subspaces $L_1$ and $L_2$, and $\ominus$ denotes the orthogonal complement: $L_2 \ominus L_1 = \{x \in L_2 \mid x^T y = 0 \ \forall y \in L_1\}$

for $L_1 \subseteq L_2$. The dimensions of $V_0$, $V_F$ and $V_I$ are 1, $k - 1$ and $n - k$, and the orthogonal projections on $V_0$, $V_F$ and $V_I$ are $Q_0 = P_0$, $Q_F = P_F - P_0$ and $Q_I = I - P_F$ (see exercise 1). Using the orthogonal projections we also obtain an orthogonal decomposition of the data vector
$$Y = Q_0 Y + Q_F Y + Q_I Y$$
into components falling in $V_0$, $V_F$ and $V_I$.

The covariance matrix is decomposed as
$$\mathrm{Cov}\, Y = \mathrm{Cov}\, Z_F U + \mathrm{Cov}\, \epsilon = m\tau^2 P_F + \sigma^2 I = \lambda P_F + \sigma^2 Q_I$$
where $\lambda = m\tau^2 + \sigma^2$. Here we used $\mathrm{Cov}\, Z_F U = \tau^2 Z_F Z_F^T = \tau^2 m P_F$ and $I = P_F + Q_I$. Note that there is a one-to-one correspondence between the pairs $(\lambda, \sigma^2)$ and $(\tau^2, \sigma^2)$.

The components $Q_0 Y$, $Q_F Y$ and $Q_I Y$ are independent since their covariances are zero. For example,
$$\mathrm{Cov}(Q_0 Y, Q_F Y) = Q_0 \Sigma Q_F = Q_0 (\lambda P_F + \sigma^2 Q_I) Q_F$$
which is zero since $P_F = Q_0 + Q_F$ and all the products $Q_0 Q_F = Q_0 Q_I = Q_F Q_I = 0$.

Since $\mathrm{Cov}\, Y = \lambda P_F + \sigma^2 Q_I = \mathrm{Cov}\, P_F Y + \mathrm{Cov}\, Q_I Y$ we also consider in the next section the coarser decomposition $Y = P_F Y + Q_I Y$ of $Y$ into independent components $P_F Y$ and $Q_I Y$ falling in $L_F$ and $V_I$.

2.2 Estimation using factorization of likelihood

Note $P_F 1_n = Q_0 1_n = 1_n$ and $Q_I 1_n = 0$. Moreover $Q_I P_F = 0$. Hence
$$\begin{bmatrix} P_F Y \\ Q_I Y \end{bmatrix} \sim N\left( \begin{bmatrix} 1_n \xi \\ 0_n \end{bmatrix}, \begin{bmatrix} \lambda P_F & 0 \\ 0 & \sigma^2 Q_I \end{bmatrix} \right).$$
We can thus base maximum likelihood estimation of $(\xi, \lambda)$ on $P_F Y$ and maximum likelihood estimation of $\sigma^2$ on $Q_I Y$.
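The identities above are easy to check numerically. The sketch below (a made-up balanced layout with $k = 3$, $m = 4$ and arbitrary values of $\tau^2$ and $\sigma^2$) verifies the covariance decomposition $\mathrm{Cov}\, Y = \lambda P_F + \sigma^2 Q_I$, the mutual orthogonality of the $Q$ projections and the dimensions of $V_F$ and $V_I$.

```python
# Illustration only: balanced one-way layout, k = 3 groups, m = 4 replicates.
import numpy as np

k, m = 3, 4
n = k * m
tau2, sigma2 = 2.0, 1.5
lam = m * tau2 + sigma2

Z_F = np.kron(np.eye(k), np.ones((m, 1)))     # design matrix of the group factor
P_F = Z_F @ Z_F.T / m                         # balanced: P_F = Z_F Z_F^T / m
P_0 = np.ones((n, n)) / n
Q_0, Q_F, Q_I = P_0, P_F - P_0, np.eye(n) - P_F

Cov = tau2 * Z_F @ Z_F.T + sigma2 * np.eye(n)                 # Cov(Z_F U + eps)
print(np.allclose(Cov, lam * P_F + sigma2 * Q_I))             # True: Cov Y = lambda P_F + sigma^2 Q_I
print(np.allclose(Q_0 @ Q_F, 0), np.allclose(Q_F @ Q_I, 0))   # projections are mutually orthogonal
print(np.linalg.matrix_rank(Q_F), np.linalg.matrix_rank(Q_I)) # dimensions k - 1 = 2 and n - k = 9
```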

More precisely,
$$|\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(Y - 1_n\xi)^T \Sigma^{-1} (Y - 1_n\xi)\right) = \lambda^{-k/2} \exp\!\left(-\tfrac{1}{2\lambda}\|P_F Y - 1_n\xi\|^2\right) (\sigma^2)^{-k(m-1)/2} \exp\!\left(-\tfrac{1}{2\sigma^2}\|Q_I Y\|^2\right)$$
where we used $\Sigma^{-1} = \sigma^{-2} Q_I + \lambda^{-1} P_F$ and $|\Sigma| = \lambda^k (\sigma^2)^{mk - k}$ (see exercises 2, 3, 5). The two factors in the above likelihood are in fact generalized densities of the degenerate normal vectors $P_F Y$ and $Q_I Y$ (i.e. these vectors are $n$-dimensional but their distributions are concentrated on lower-dimensional subspaces of dimension $k$ and $n - k$).

Consider e.g. the factor $\lambda^{-k/2} \exp(-\tfrac{1}{2\lambda}\|P_F Y - 1_n\xi\|^2)$ involving the parameters $\lambda$ and $\xi$. We can maximize this with respect to $\lambda$ and $\xi$ in exactly the same way as when we previously considered the likelihood of a linear normal model $N_n(\mu, \sigma^2 I)$. Since $L_0 = \mathrm{span}\{1_n\}$ we thus obtain $1_n \hat\xi = P_0 P_F Y = P_0 Y$ and $\hat\lambda = \|P_F Y - P_0 Y\|^2/k = SSB/k$. Proceeding in the same way for the second factor (where there is no mean parameter), we obtain $\hat\sigma^2 = \|Q_I Y\|^2/(k(m-1)) = SSE/(k(m-1))$.

Note that $Q_F Y \sim N_n(0, \lambda Q_F)$. By exercise 6, $\|P_F Y - P_0 Y\|^2 = \|Q_F Y\|^2 \sim \lambda \chi^2(k-1)$, which has mean $\lambda(k-1)$. Thus $\hat\lambda$ is biased. An unbiased (REML) estimate is given by $\tilde\lambda = \|P_F Y - P_0 Y\|^2/(k-1) = SSB/(k-1)$.

3 Two-way analysis of variance

Consider now a model with two factors $T$ (treatment) and $P$ (plot) with numbers of levels $d_T$ and $d_P$. Moreover let $P \times T$ be the cross-factor (it has $d_P d_T$ levels, one for each combination of levels of $P$ and $T$). More specifically, $I$ consists of indices $ptr$ and the factor mappings are $P(ptr) = p$, $T(ptr) = t$, and $P \times T(ptr) = pt$. Assume $P \times T$ is balanced with $n_{P \times T} = m$ observations for each level. Then $P$ and $T$ are balanced too with numbers of observations $n_P = m d_T$ and $n_T = m d_P$ for each level.
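The setup can be illustrated numerically; in the sketch below the layout $d_P = 4$, $d_T = 3$, $m = 2$ is an arbitrary choice, and the design matrices are built as Kronecker products for indices $ptr$ in lexicographic order. The last lines check the balance relation $P_P P_T = P_T P_P = P_0$ used in the next subsection (exercise 7).

```python
# Illustration only: balanced two-way layout, indices ptr in lexicographic order.
import numpy as np

d_P, d_T, m = 4, 3, 2
n = d_P * d_T * m

Z_P  = np.kron(np.eye(d_P), np.ones((d_T * m, 1)))                        # plot indicator
Z_T  = np.kron(np.ones((d_P, 1)), np.kron(np.eye(d_T), np.ones((m, 1))))  # treatment indicator
Z_PT = np.kron(np.eye(d_P * d_T), np.ones((m, 1)))                        # cell (p, t) indicator

P_P  = Z_P  @ Z_P.T  / (m * d_T)      # n_P  = m d_T observations per plot level
P_T  = Z_T  @ Z_T.T  / (m * d_P)      # n_T  = m d_P observations per treatment level
P_PT = Z_PT @ Z_PT.T / m              # n_PT = m observations per cell
P_0  = np.ones((n, n)) / n

print(np.allclose(P_P @ P_T, P_0), np.allclose(P_T @ P_P, P_0))   # exercise 7
print(np.allclose(P_PT @ P_P, P_P))                               # L_P is contained in L_{PxT}
```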

A two-way ANOVA model with fixed $T$ effects and random $P$ and $P \times T$ effects is
$$Y_{ptr} = \xi + \beta_t + U_p + U_{pt} + \epsilon_{ptr}, \quad p = 1, \dots, d_P, \ t = 1, \dots, d_T, \ r = 1, \dots, m,$$
where the $U_p \sim N(0, \sigma_P^2)$, $U_{pt} \sim N(0, \sigma_{P \times T}^2)$ and $\epsilon_{ptr} \sim N(0, \sigma^2)$ are independent. With a convenient abuse of terminology we refer to $P$ and $P \times T$ as random factors. The random effects $U_p$ and $U_{pt}$ can e.g. serve to model soil variation between plots in a field in the case of an agricultural experiment. In vector form the model is
$$Y = \mu + Z_P U_P + Z_{P \times T} U_{P \times T} + \epsilon$$
where $\mu \in L_T$. The vectors $Y$, $U_P$ and $U_{P \times T}$ are again obtained by stacking, e.g. $U_P = (U_1, \dots, U_{d_P})^T$.

3.1 Orthogonal decomposition

Similarly to the one-way ANOVA we define
$$V_0 = L_0, \quad V_T = L_T \ominus V_0, \quad V_P = L_P \ominus V_0.$$
Since $P \times T$ is balanced it follows that $P_T P_P = P_P P_T = P_0$ (exercise 7), which implies $Q_T Q_P = 0$ where $Q_T = P_T - P_0$ and $Q_P = P_P - P_0$. Hence $V_T$ and $V_P$ are orthogonal and we obtain an orthogonal decomposition of the sum of $L_P$ and $L_T$:
$$L_P + L_T = \{v + w \mid v \in L_P, \ w \in L_T\} = V_0 \oplus V_P \oplus V_T.$$
We further define
$$V_{P \times T} = L_{P \times T} \ominus (L_P + L_T), \quad V_I = \mathbb{R}^n \ominus L_{P \times T}$$
and obtain the orthogonal decomposition
$$\mathbb{R}^n = V_0 \oplus V_P \oplus V_T \oplus V_{P \times T} \oplus V_I.$$
The dimensions of the $V$ spaces are $f_0 = 1$, $f_P = d_P - 1$, $f_T = d_T - 1$, $f_{P \times T} = d_P d_T - d_P - d_T + 1 = (d_P - 1)(d_T - 1)$ and $f_I = n - d_P d_T$. The orthogonal projections on the $V$ spaces are $Q_0 = P_0$, $Q_P = P_P - Q_0$, $Q_T = P_T - Q_0$, $Q_{P \times T} = P_{P \times T} - Q_P - Q_T - Q_0$ and $Q_I = I - P_{P \times T}$.
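Continuing the same illustrative layout as above ($d_P = 4$, $d_T = 3$, $m = 2$, an arbitrary choice), the following sketch forms the $Q$ projections and confirms that their ranks equal the dimensions $f_0, f_P, f_T, f_{P \times T}, f_I$, that they sum to the identity, and that they are mutually orthogonal.

```python
# Illustration only: ranks of the Q projections and completeness of the decomposition.
import numpy as np

d_P, d_T, m = 4, 3, 2
n = d_P * d_T * m
Z_P  = np.kron(np.eye(d_P), np.ones((d_T * m, 1)))
Z_T  = np.kron(np.ones((d_P, 1)), np.kron(np.eye(d_T), np.ones((m, 1))))
Z_PT = np.kron(np.eye(d_P * d_T), np.ones((m, 1)))
P_P, P_T, P_PT = Z_P @ Z_P.T/(m*d_T), Z_T @ Z_T.T/(m*d_P), Z_PT @ Z_PT.T/m
P_0 = np.ones((n, n)) / n

Q_0, Q_P, Q_T = P_0, P_P - P_0, P_T - P_0
Q_PT = P_PT - Q_P - Q_T - Q_0
Q_I  = np.eye(n) - P_PT

Qs = [Q_0, Q_P, Q_T, Q_PT, Q_I]
print([np.linalg.matrix_rank(Q) for Q in Qs])   # [1, 3, 2, 6, 12] = [f_0, f_P, f_T, f_PT, f_I]
print(np.allclose(sum(Qs), np.eye(n)))          # the V spaces decompose R^n
print(np.allclose(Q_PT @ Q_P, 0))               # mutual orthogonality, e.g. of V_PT and V_P
```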

In line with the orthogonal decomposition of $\mathbb{R}^n$ we also obtain two decompositions of $Y$ into orthogonal components:
$$Y = Q_0 Y + Q_P Y + Q_T Y + Q_{P \times T} Y + Q_I Y = \tilde Q_P Y + \tilde Q_{P \times T} Y + \tilde Q_I Y$$
where $\tilde Q_P = Q_0 + Q_P = P_P$, $\tilde Q_{P \times T} = Q_{P \times T} + Q_T$ and $\tilde Q_I = Q_I$. The second decomposition corresponds to a coarser decomposition $\mathbb{R}^n = \tilde V_P \oplus \tilde V_{P \times T} \oplus \tilde V_I$ into orthogonal subspaces $\tilde V_P = V_0 \oplus V_P = L_P$, $\tilde V_{P \times T} = V_T \oplus V_{P \times T}$ and $\tilde V_I = V_I$ associated with each random factor $P$, $P \times T$ and $I$.

The coarser decomposition of $\mathbb{R}^n$ corresponds to a decomposition of the covariance matrix using the $\tilde Q$ projection matrices:
$$\mathrm{Cov}\, Y = \sigma_P^2 n_P P_P + \sigma_{P \times T}^2 n_{P \times T} P_{P \times T} + \sigma^2 I = \lambda_P \tilde Q_P + \lambda_{P \times T} \tilde Q_{P \times T} + \lambda_I \tilde Q_I$$
where
$$\lambda_P = \sigma^2 + n_{P \times T} \sigma_{P \times T}^2 + n_P \sigma_P^2 \qquad (1)$$
$$\lambda_{P \times T} = \sigma^2 + n_{P \times T} \sigma_{P \times T}^2 \qquad (2)$$
$$\lambda_I = \sigma^2.$$
The mean vector $\mu \in L_T = V_0 \oplus V_T$ is decomposed as
$$\mu = \tilde Q_P \mu + \tilde Q_{P \times T} \mu + \tilde Q_I \mu = Q_0 \mu + Q_T \mu + 0 = \mu_0 + \mu_T$$
where $\mu_0 \in L_0$ and $\mu_T \in V_T$. Note the one-to-one correspondences between the $\sigma^2$'s and the $\lambda$'s and between $\mu$ and $(\mu_0, \mu_T)$.

3.2 Relation to sum-to-zero constraint

In the model for the mean vector $\mu$, the parameters $\xi$ and $\beta_1, \dots, \beta_{d_T}$ are not identifiable. This is because, for any constant $k$,
$$\mu_{ptr} = \xi + \beta_t = (\xi + k) + (\beta_t - k).$$

Hence if we add $k$ to $\xi$ we obtain the same mean vector by just subtracting $k$ from the $\beta_t$'s. One way to enforce identifiability is to require that one $\beta_t$ is zero (i.e. choosing a reference group). Another option is to introduce the sum-to-zero constraint $\sum_{t=1}^{d_T} \beta_t = 0$.

Note that vectors $v \in V_T$ are of the form $v_{ptr} = \beta_t$ since they lie in $L_T$, and they further satisfy the sum-to-zero constraint
$$1_n^T v = d_P m \sum_{t=1}^{d_T} \beta_t = 0$$
due to the orthogonality of $V_0$ and $V_T$. In other words, enforcing the sum-to-zero constraint on the $\beta_t$ parameters corresponds to the decomposition of $\mu$ into a constant vector $\xi 1_n$ falling in $V_0$ and a vector $v = (\beta_t)_{ptr} \in V_T$.

3.3 Estimation using factorization of likelihood

As for the one-way ANOVA, $Y$ is decomposed into independent normal vectors
$$\tilde Q_P Y \sim N(\mu_0, \lambda_P \tilde Q_P), \quad \tilde Q_{P \times T} Y \sim N(\mu_T, \lambda_{P \times T} \tilde Q_{P \times T}), \quad \tilde Q_I Y \sim N(0, \lambda_I \tilde Q_I).$$
Accordingly, the density for $Y$ factorizes into a product of (generalized) densities for $\tilde Q_P Y$, $\tilde Q_{P \times T} Y$ and $\tilde Q_I Y$, similar to the case of the one-way ANOVA. For instance, the density for $\tilde Q_P Y$ is proportional to
$$\lambda_P^{-d_P/2} \exp\!\left(-\tfrac{1}{2\lambda_P}\|\tilde Q_P Y - \mu_0\|^2\right).$$
Thus, in analogy with estimation in the usual linear model $Y \sim N_n(\mu, \sigma^2 I)$, we obtain
$$\hat\mu_0 = P_0 \tilde Q_P Y = P_0 Y = 1_n \bar Y \quad\text{and}\quad \hat\lambda_P = \|\tilde Q_P Y - P_0 Y\|^2/d_P = \|Q_P Y\|^2/d_P.$$
Similarly,
$$\hat\mu_T = Q_T \tilde Q_{P \times T} Y = Q_T Y, \qquad \hat\mu = \hat\mu_0 + \hat\mu_T = P_T Y,$$
$$\hat\lambda_{P \times T} = \|\tilde Q_{P \times T} Y - Q_T Y\|^2/(d_P d_T - d_P) = \|Q_{P \times T} Y\|^2/(d_P d_T - d_P),$$
and $\hat\lambda_I = \hat\sigma^2 = \|Q_I Y\|^2/(n - d_P d_T)$.
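A small simulation sketch (all parameter values and the layout below are made up) shows how these estimates are just squared norms of projected data divided by the corresponding denominators; the unbiased variants discussed next simply replace $d_P$ and $d_P d_T - d_P$ by $f_P$ and $f_{P \times T}$.

```python
# Simulation sketch only: made-up values, d_P = 8 plots, d_T = 3 treatments, m = 4.
import numpy as np

rng = np.random.default_rng(1)
d_P, d_T, m = 8, 3, 4
n = d_P * d_T * m
sigma2_P, sigma2_PT, sigma2 = 1.0, 0.5, 0.25
beta = np.array([0.0, 1.0, 2.0])            # treatment effects; xi = 0 for simplicity

Z_P  = np.kron(np.eye(d_P), np.ones((d_T * m, 1)))
Z_T  = np.kron(np.ones((d_P, 1)), np.kron(np.eye(d_T), np.ones((m, 1))))
Z_PT = np.kron(np.eye(d_P * d_T), np.ones((m, 1)))
P_P, P_T, P_PT = Z_P @ Z_P.T/(m*d_T), Z_T @ Z_T.T/(m*d_P), Z_PT @ Z_PT.T/m
P_0  = np.ones((n, n)) / n
Q_P, Q_PT, Q_I = P_P - P_0, P_PT - P_P - P_T + P_0, np.eye(n) - P_PT

Y = (Z_T @ beta + Z_P @ rng.normal(0, np.sqrt(sigma2_P), d_P)
     + Z_PT @ rng.normal(0, np.sqrt(sigma2_PT), d_P * d_T)
     + rng.normal(0, np.sqrt(sigma2), n))

mu_hat     = P_T @ Y                          # MLE of the mean vector
lam_P_hat  = Y @ Q_P  @ Y / d_P               # squared norm of Q_P Y over d_P
lam_PT_hat = Y @ Q_PT @ Y / (d_P * d_T - d_P)
sigma2_hat = Y @ Q_I  @ Y / (n - d_P * d_T)
print(lam_P_hat, lam_PT_hat, sigma2_hat)
# true values: lambda_P = 14.25, lambda_PT = 2.25, sigma^2 = 0.25 for the values above
```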

Since $V_P$ and $V_{P \times T}$ have dimensions $f_P = d_P - 1$ and $f_{P \times T} = (d_P - 1)(d_T - 1)$, we often use these as denominators (cf. exercise 6) instead of $d_P$ and $d_P d_T - d_P$ for $\lambda_P$ and $\lambda_{P \times T}$, thereby obtaining the unbiased estimates
$$\tilde\lambda_P = \|Q_P Y\|^2/f_P, \qquad \tilde\lambda_{P \times T} = \|Q_{P \times T} Y\|^2/f_{P \times T}$$
of $\lambda_P$ and $\lambda_{P \times T}$.

Note that the MLE for $\mu$ is in fact identical to the MLE in the linear model with $\mu \in L_T$ and no random effects.

Estimation of original variance components

Estimates of the original variances $\sigma_P^2$ and $\sigma_{P \times T}^2$ can be obtained by simply solving (1) and (2) with respect to these parameters. That is,
$$\tilde\sigma_P^2 = (\tilde\lambda_P - \tilde\lambda_{P \times T})/n_P, \qquad \tilde\sigma_{P \times T}^2 = (\tilde\lambda_{P \times T} - \hat\sigma^2)/n_{P \times T}.$$
Note that there is no guarantee that these estimates are non-negative. In fact there could be several reasons for obtaining negative variance estimates:

- if e.g. the true $\sigma_P^2$ is zero or close to zero, it is not unlikely to encounter $\tilde\lambda_P < \tilde\lambda_{P \times T}$ due to sampling variation.
- observations within a group could be negatively correlated, which could be reflected by a negative variance estimate (remember the definition of the intra-class correlation).
- the data deviates in some way (e.g. outliers) from the assumed model.

In general, if a negative variance estimate is obtained, it is recommended (as always) to check carefully whether the model assumptions seem valid. If the data does not seem to contradict the model, one may consider refitting the model without the random factor for which a negative variance estimate was obtained. If negative correlation within groups of the data is plausible, one may look into an alternative model to the assumed linear mixed model.

4 Strata

A crucial common point for the one- and two-way ANOVA models in the previous sections is that the factorization of the likelihood is based on an orthogonal decomposition induced by the random factors only. This is due

to the decomposition of the covariance matrix into terms given by scaled projection matrices for the random factors. The fixed effects are decomposed into components according to $V$ spaces for the fixed factors. These components are then assigned to strata corresponding to $\tilde V$ spaces for the random factors.

We say that a factor $F$ is finer than $G$ (or $G$ coarser than $F$) if the levels of $G$ can be obtained by merging levels of $F$. We then write $G \preceq F$. The rule for allocating fixed effects to strata is then that $F$ belongs to the $B$ stratum (where $B$ is a random factor) if $B$ is the coarsest random factor which is finer than $F$. This is of course assuming that this rule makes sense, i.e. that we work with an experimental design where the rule gives a unique allocation of each fixed factor to one random factor. It is for instance required that $I$ is a random factor. Also, for fixed $F$ and random $B_1$, $B_2$, we can not have both $F \preceq B_1$ and $F \preceq B_2$ unless either $B_1 \preceq B_2$ or $B_2 \preceq B_1$.

[Figure 1: Structure diagram for the two-way ANOVA with random effects, showing the factors $I$, $P \times T$, $P$, $T$ and $O$.]

To get an overview of the allocation to strata it is useful to draw a structure diagram as shown for the two-way ANOVA in Figure 1. In the structure diagram there is an arrow from $F$ to $G$ if $G$ is coarser than $F$ and there is no intermediate factor which is coarser than $F$ and finer than $G$. In Figure 1

there are three strata (black, green and red) corresponding to the random factors $I$, $P \times T$ and $P$. Note that here we can not have both $P$ and $T$ random unless $O$ is random too (if $O$ was fixed in this case there would not be a unique allocation of $O$ to a random factor stratum). We also want to respect the hierarchical principle for the mean structure, so $P \times T$ can not be fixed if $P$ is random. A more general discussion of allocation to strata is given in Section 7.

5 Hypothesis tests and confidence intervals

We here exemplify how hypothesis tests are conducted in balanced ANOVAs by considering one- and two-way ANOVAs.

5.1 One-way ANOVA

For the one-way ANOVA with random effects consider the $F$-statistic
$$F = \frac{SSB/(k-1)}{SSE/(k(m-1))} = \frac{\tilde\lambda}{\hat\sigma^2} \sim \frac{\sigma^2 + m\tau^2}{\sigma^2} F(k-1, k(m-1)) = (1 + m\gamma) F(k-1, k(m-1))$$
where $\gamma = \tau^2/\sigma^2$ is the signal-to-noise ratio. Thus with $q_L$ and $q_U$ e.g. the 2.5% and 97.5% quantiles of $F(k-1, k(m-1))$, we have
$$P(q_L \le F/(1 + m\gamma) \le q_U) = 95\% \iff P\big((F/q_U - 1)/m \le \gamma \le (F/q_L - 1)/m\big) = 95\%$$
so that $[(F/q_U - 1)/m, (F/q_L - 1)/m]$ is a 95% confidence interval for $\gamma$; see also Remark 5.10 in [1].

One hypothesis is of particular interest: $H_0: \tau^2 = 0$, i.e. there is no variation between groups. A simple $F$-test for this is based on the observation
$$\tau^2 = 0 \iff \gamma = 0 \iff \sigma^2 = \lambda.$$

Thus large values of the $F$-statistic above are critical for $H_0$, and under $H_0$, $F$ has an $F(k-1, k(m-1))$ distribution. Note that the $F$-statistic coincides with the $F$-test for no group effects in a one-way ANOVA with fixed group effects (which is equivalent to the likelihood ratio test for no group effects in the fixed effects one-way ANOVA).

5.2 Tests for fixed effects in two-way ANOVA

For the two-way ANOVA considered in Section 3 the interesting hypothesis regarding fixed effects is
$$H_0: \beta_1 = \beta_2 = \dots = \beta_{d_T},$$
i.e. no fixed group effects. This is equivalent to $\mu_T = 0$ (referring to the notation of Section 3.1). Thus for constructing the likelihood ratio test we only need to consider the factor corresponding to the term $\tilde Q_{P \times T} Y \sim N(\mu_T, \lambda_{P \times T} \tilde Q_{P \times T})$, which has density
$$\lambda_{P \times T}^{-(d_P d_T - d_P)/2} \exp\!\left(-\tfrac{1}{2\lambda_{P \times T}}\|\tilde Q_{P \times T} Y - \mu_T\|^2\right).$$
Thus the situation is equivalent to testing $\mu = 0$ for a linear normal model $N_{d_P d_T - d_P}(\mu, \lambda_{P \times T} I)$. We can thus proceed as for a usual linear normal model (without random effects) and obtain the $F$-test
$$F = \frac{\|Q_T Y\|^2/(d_T - 1)}{\tilde\lambda_{P \times T}} \sim F(d_T - 1, (d_P - 1)(d_T - 1)).$$

5.3 Test for variance components in two-way ANOVA

Recall $\lambda_I = \sigma^2$, $\lambda_{P \times T} = \sigma^2 + n_{P \times T} \sigma_{P \times T}^2$ and $\lambda_P = \sigma^2 + n_{P \times T} \sigma_{P \times T}^2 + n_P \sigma_P^2$. Hence e.g.
$$\sigma_{P \times T}^2 = 0 \iff \lambda_I = \lambda_{P \times T}.$$
Thus a natural statistic (but not a likelihood-ratio statistic) for testing $\sigma_{P \times T}^2 = 0$ is
$$F = \frac{\tilde\lambda_{P \times T}}{\hat\sigma^2},$$
which has an $F((d_P - 1)(d_T - 1), n - d_P d_T)$ distribution if $\sigma_{P \times T}^2 = 0$. Large values of $F$ are critical for $\sigma_{P \times T}^2 = 0$. Note $\tilde\lambda_{P \times T} = \|Q_{P \times T} Y\|^2/((d_P - 1)(d_T - 1))$, so $F$ is identical to the $F$-statistic for testing the hypothesis of no (fixed) effect of the factor $P \times T$ in a linear normal model without random effects.
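The two tests can be computed directly from the projections. The following self-contained sketch (made-up layout and parameter values; `scipy.stats` is used only for the $F$ distribution) simulates data with both treatment effects and interaction variance present and reports the two $F$ statistics with their p-values.

```python
# Simulation sketch only: F tests for no treatment effect and for sigma^2_PT = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
d_P, d_T, m = 10, 4, 3
n = d_P * d_T * m
Z_P  = np.kron(np.eye(d_P), np.ones((d_T * m, 1)))
Z_T  = np.kron(np.ones((d_P, 1)), np.kron(np.eye(d_T), np.ones((m, 1))))
Z_PT = np.kron(np.eye(d_P * d_T), np.ones((m, 1)))
P_P, P_T, P_PT = Z_P @ Z_P.T/(m*d_T), Z_T @ Z_T.T/(m*d_P), Z_PT @ Z_PT.T/m
P_0 = np.ones((n, n)) / n
Q_T, Q_PT, Q_I = P_T - P_0, P_PT - P_P - P_T + P_0, np.eye(n) - P_PT
f_T, f_PT, f_I = d_T - 1, (d_P - 1) * (d_T - 1), n - d_P * d_T

beta = np.array([0.0, 0.5, 1.0, 1.5])        # treatment effects present
Y = (Z_T @ beta + Z_P @ rng.normal(0, 1.0, d_P)
     + Z_PT @ rng.normal(0, 0.7, d_P * d_T) + rng.normal(0, 0.5, n))

lam_PT = Y @ Q_PT @ Y / f_PT                 # unbiased estimate of lambda_PT
sigma2 = Y @ Q_I @ Y / f_I                   # estimate of sigma^2

F_treat = (Y @ Q_T @ Y / f_T) / lam_PT       # Section 5.2: beta_1 = ... = beta_dT
F_inter = lam_PT / sigma2                    # Section 5.3: sigma^2_PT = 0
print(F_treat, stats.f.sf(F_treat, f_T, f_PT))
print(F_inter, stats.f.sf(F_inter, f_PT, f_I))
```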

5.4 Confidence intervals for parameters in two-way ANOVA

Regarding the mean vector $\mu = 1_n \xi + Z_T \beta$ where $\beta = (\beta_1, \dots, \beta_{d_T})$, we can consider confidence intervals for components $\xi + \beta_i$ of $\mu$ or for contrasts $\delta_{ij} = \beta_i - \beta_j$ corresponding to differences between group means. Here it is easy to derive confidence intervals for the contrasts: employing the orthogonal decomposition $\mu = \mu_0 + \mu_T$, the contrasts only involve $\mu_T$ in the $P \times T$ stratum. E.g. if $c$ is a vector with $c_{1i1} = 1$, $c_{1j1} = -1$ and zeroes elsewhere, then $\delta_{ij} = \beta_i - \beta_j = c^T\mu = c^T\mu_T$. Thus $\delta_{ij}$ can be estimated as
$$\hat\delta_{ij} = c^T\hat\mu_T = c^T Q_T \tilde Q_{P \times T} Y \sim N(\beta_i - \beta_j, \lambda_{P \times T}\, c^T Q_T c)$$
and we can thus follow the usual route to constructing confidence intervals based on the $t$-distribution, combining the above normal distribution for $\hat\delta_{ij}$ and the scaled $\chi^2$ distribution of $\tilde\lambda_{P \times T}$ with $(d_P - 1)(d_T - 1)$ degrees of freedom. Note
$$c^T Q_T \tilde Q_{P \times T} Y = c^T Q_T Y = \bar y_{\cdot i \cdot} - \bar y_{\cdot j \cdot}$$
and
$$c^T Q_T c = c^T P_T c - c^T P_0 c = c^T P_T c = (1, -1)(1/n_T, -1/n_T)^T = \frac{2}{n_T}.$$
Hence the $t$-statistic becomes
$$\frac{\bar y_{\cdot i \cdot} - \bar y_{\cdot j \cdot}}{\sqrt{2\tilde\lambda_{P \times T}/n_T}},$$
which has a $t((d_P - 1)(d_T - 1))$ distribution under the null hypothesis $\beta_i = \beta_j$.

You can also get the above results by simply observing that
$$\bar y_{\cdot i \cdot} - \bar y_{\cdot j \cdot} = \beta_i - \beta_j + \frac{m}{n_T}\sum_p U_{pi} - \frac{m}{n_T}\sum_p U_{pj} + \frac{1}{n_T}\sum_{p,r}\epsilon_{pir} - \frac{1}{n_T}\sum_{p,r}\epsilon_{pjr} \sim N\!\left(\beta_i - \beta_j, \frac{2}{n_T}(m\sigma_{P \times T}^2 + \sigma^2)\right).$$
Note here $m = n_{P \times T}$, so $m\sigma_{P \times T}^2 + \sigma^2 = \lambda_{P \times T}$.

Confidence intervals for the $\lambda$ variance parameters (including $\sigma^2 = \lambda_I$) are straightforward due to the exact scaled $\chi^2$ distributions of their estimates. Confidence intervals for the original variance parameters (other than $\sigma^2$) are less straightforward since estimates of these are distributed as differences of scaled $\chi^2$ variables.
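A sketch of the contrast confidence interval (the same kind of made-up balanced layout and simulated data as above; the 0-based indices $i$, $j$ pick out two treatments) is given below.

```python
# Simulation sketch only: t-based confidence interval for a treatment contrast.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
d_P, d_T, m = 10, 4, 3
n, n_T = d_P * d_T * m, m * d_P
Z_P  = np.kron(np.eye(d_P), np.ones((d_T * m, 1)))
Z_T  = np.kron(np.ones((d_P, 1)), np.kron(np.eye(d_T), np.ones((m, 1))))
Z_PT = np.kron(np.eye(d_P * d_T), np.ones((m, 1)))
P_P, P_T, P_PT = Z_P @ Z_P.T/(m*d_T), Z_T @ Z_T.T/(m*d_P), Z_PT @ Z_PT.T/m
P_0 = np.ones((n, n)) / n
Q_PT = P_PT - P_P - P_T + P_0
f_PT = (d_P - 1) * (d_T - 1)

beta = np.array([0.0, 0.5, 1.0, 1.5])
Y = (Z_T @ beta + Z_P @ rng.normal(0, 1.0, d_P)
     + Z_PT @ rng.normal(0, 0.7, d_P * d_T) + rng.normal(0, 0.5, n))

i, j = 1, 0                                   # contrast beta_2 - beta_1 (0-based columns)
ybar = Z_T.T @ Y / n_T                        # treatment means
delta_hat = ybar[i] - ybar[j]
lam_PT = Y @ Q_PT @ Y / f_PT
se = np.sqrt(2 * lam_PT / n_T)
t_quant = stats.t.ppf(0.975, f_PT)
print(delta_hat, (delta_hat - t_quant * se, delta_hat + t_quant * se))
```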

6 Orthogonal decomposition for a K-way ANOVA

In this section we first construct an orthogonal decomposition for a balanced three-way ANOVA and then generalize the approach to a $K$-way balanced ANOVA for arbitrary $K \ge 1$.

We know by now that for a two-way ANOVA corresponding to factors $F_1$ and $F_2$ with $F_1 \times F_2$ balanced, there exists a decomposition of $\mathbb{R}^n$ into orthogonal subspaces $V_I$, $V_{F_1 \times F_2}$, $V_{F_1}$, $V_{F_2}$ and $V_0$. Suppose we introduce a third factor $F_3$ so that $F_1 \times F_2 \times F_3$ is balanced too. Then $\{V_I, V_{F_1 \times F_3}, V_{F_1}, V_{F_3}, V_0\}$ and $\{V_I, V_{F_2 \times F_3}, V_{F_2}, V_{F_3}, V_0\}$ are also sets of orthogonal subspaces. By analogy with a two-way ANOVA with factors $F = F_1 \times F_2$ and $G = F_3$ it also follows that $V_{F_1 \times F_2}$ and $V_{F_3}$ are orthogonal, and similarly for the pairs $V_{F_1 \times F_3}, V_{F_2}$ and $V_{F_2 \times F_3}, V_{F_1}$.

It is also easy to see that
$$P_{F_1 \times F_2} P_{F_2 \times F_3} = P_{F_2}. \qquad (3)$$
This implies that $V_{F_1 \times F_2}$ and $V_{F_2 \times F_3}$ are orthogonal. To see this, note that the corresponding projections are $Q_{F_1 \times F_2} = P_{F_1 \times F_2} - P_{F_2} - Q_{F_1}$ and $Q_{F_2 \times F_3} = P_{F_2 \times F_3} - P_{F_2} - Q_{F_3}$. We now check that $Q_{F_1 \times F_2} Q_{F_2 \times F_3} = 0$, from which the required orthogonality follows:
$$Q_{F_1 \times F_2} Q_{F_2 \times F_3} = (P_{F_1 \times F_2} - P_{F_2} - Q_{F_1})(P_{F_2 \times F_3} - P_{F_2} - Q_{F_3}) = P_{F_2} - P_{F_2} - P_{F_1 \times F_2} Q_{F_3} - P_{F_2} + P_{F_2} + 0 - Q_{F_1} P_{F_2 \times F_3} + 0 + 0 = 0.$$
Here, $P_{F_1 \times F_2} Q_{F_3} = 0$ in analogy with a two-way ANOVA with factors $F = F_1 \times F_2$ and $G = F_3$, and $Q_{F_1} P_{F_2 \times F_3} = 0$ by the same argument. Thus $V_{F_1 \times F_2}$, $V_{F_1 \times F_3}$ and $V_{F_2 \times F_3}$ are orthogonal too. Defining
$$V_{F_1 \times F_2 \times F_3} = L_{F_1 \times F_2 \times F_3} \ominus (V_0 \oplus V_{F_1} \oplus V_{F_2} \oplus V_{F_3} \oplus V_{F_1 \times F_2} \oplus V_{F_1 \times F_3} \oplus V_{F_2 \times F_3}), \qquad V_I = \mathbb{R}^n \ominus L_{F_1 \times F_2 \times F_3},$$
we arrive at the orthogonal decomposition
$$\mathbb{R}^n = V_I \oplus \bigoplus_{A \subseteq \{1,2,3\}} V_{\times_{l \in A} F_l},$$
letting $V_{\times_{l \in \emptyset} F_l} = V_0$. The approach for the three-way ANOVA can be generalized to obtain the following main result:

Theorem 1. Suppose $F_1 \times \dots \times F_K$ is balanced, $K \ge 1$. Define, recursively, for $A \subseteq \{1, \dots, K\}$,
$$V_{\times_{l \in A} F_l} = L_{\times_{l \in A} F_l} \ominus \bigoplus_{B \subsetneq A} V_{\times_{l \in B} F_l}$$
with $V_{\times_{l \in \emptyset} F_l} = V_0 = L_0$, and let $V_I = \mathbb{R}^n \ominus L_{F_1 \times \dots \times F_K}$. Then we have the orthogonal decomposition
$$\mathbb{R}^n = V_I \oplus \bigoplus_{A \subseteq \{1, \dots, K\}} V_{\times_{l \in A} F_l}.$$

Proof. We need to show that (*) $V_{\times_{l \in A} F_l}$ and $V_{\times_{l \in B} F_l}$ are orthogonal whenever $A, B \subseteq \{1, \dots, K\}$, $A \ne B$ (this is already shown for $K = 1, 2, 3$). If $A \cap B = \emptyset$ then (*) follows by analogy to a two-way ANOVA with factors $F = \times_{l \in A} F_l$ and $G = \times_{l \in B} F_l$. If $B \subset A$ we define $F = \times_{l \in B} F_l$ and $G = \times_{l \in A \setminus B} F_l$. Then $V_{\times_{l \in A} F_l} = V_{F \times G}$ and the result again follows by analogy to a two-way ANOVA. The case $A \subset B$ is handled similarly. Finally, if $C = A \cap B$ is non-empty and neither $A \subseteq B$ nor $B \subseteq A$, let $F = \times_{l \in A \setminus C} F_l$, $G = \times_{l \in B \setminus C} F_l$, and $H = \times_{l \in C} F_l$. Then (*) holds by analogy to a three-way ANOVA.

7 Decomposition into strata

In this section we discuss a general decomposition into strata corresponding to factors with random effects. Let $D$ be a set of factors and let $\mathcal{B} \subseteq D$ denote the set of factors with random effects. We assume that

- the factors in $D$ are distinct in the sense that we can not have both $F \preceq F'$ and $F' \preceq F$ for different $F, F' \in D$ (otherwise two factors in $D$ would induce the same grouping of the observations);
- there is an orthogonal decomposition $\mathbb{R}^n = \bigoplus_{F \in D} V_F$ of $\mathbb{R}^n$ into a sum of subspaces $V_F$, $F \in D$, with associated orthogonal projection matrices $Q_F$;
- the covariance matrix of the data vector is decomposed as
$$\Sigma = \sum_{B \in \mathcal{B}} \sigma_B^2 n_B P_B$$

where, for $B \in \mathcal{B}$,
$$P_B = \sum_{F \in D:\, F \preceq B} Q_F. \qquad (4)$$
We want matrices $\tilde Q_B$, $B \in \mathcal{B}$, so that
$$P_B = \sum_{B' \in \mathcal{B}:\, B' \preceq B} \tilde Q_{B'} \qquad (5)$$
where for each $B \in \mathcal{B}$, $\tilde Q_B$ is an orthogonal projection on a subspace $\tilde V_B$, where $\mathbb{R}^n = \bigoplus_{B \in \mathcal{B}} \tilde V_B$ and $\tilde V_B$ is a sum of $V_F$ subspaces. We want this because then we obtain the decomposition
$$\Sigma = \sum_{B \in \mathcal{B}} \sigma_B^2 n_B P_B = \sum_{B \in \mathcal{B}} \lambda_B \tilde Q_B \quad\text{where}\quad \lambda_B = \sum_{B' \in \mathcal{B}:\, B \preceq B'} n_{B'} \sigma_{B'}^2,$$
and hence a decomposition of $Y$ into independent terms $\tilde Q_B Y$, $B \in \mathcal{B}$. The following theorem tells us when we may get what we want.

Theorem 2. A necessary and sufficient condition for (5) is that for all $F \in D$ there exists a $B \in \mathcal{B}$ such that $F \preceq B$ and $B \preceq B'$ for all other $B' \in \mathcal{B}$ with $F \preceq B'$. If this condition holds we can define
$$\tilde V_B = \bigoplus_{F \in D:\, B(F) = B} V_F \quad\text{and}\quad \tilde Q_B = \sum_{F \in D:\, B(F) = B} Q_F.$$

Remark 1. Note that this condition implies that $I$ has to be in $\mathcal{B}$. This is not a serious restriction since we will always include measurement error in our models. It also implies that the $B$ mentioned in the condition is unique. In the statement above and in the proof below we use the notation $B(F)$ for this unique $B$ corresponding to $F$, $F \in D$.

Proof. We first show the sufficiency. Each of the factors $F \preceq B$ on the right hand side of (4) has $B(F) = B'$ for precisely one $B' \in \mathcal{B}$, and this $B'$ satisfies $B' \preceq B$. Hence
$$P_B = \sum_{B' \in \mathcal{B}:\, B' \preceq B} \ \sum_{F \in D:\, B(F) = B'} Q_F.$$
Thus we can define, for $B \in \mathcal{B}$, a subspace and associated orthogonal projection
$$\tilde V_B = \bigoplus_{F \in D:\, B(F) = B} V_F \quad\text{and}\quad \tilde Q_B = \sum_{F \in D:\, B(F) = B} Q_F.$$

Each $V_F$, $F \in D$, is contained in precisely one $\tilde V_B$, so $\mathbb{R}^n = \bigoplus_{B \in \mathcal{B}} \tilde V_B$.

To show that the condition is necessary, note first that to get (5), for each $F \in D$ we must have $V_F \subseteq \tilde V_B$ for some $B \in \mathcal{B}$. Moreover, by equating (4) and (5) we need $F \preceq B$. Suppose next that $B_i$, $i = 1, \dots, m$, $m \ge 1$, are the elements of $\mathcal{B}$ that satisfy $F \preceq B_i$, $i = 1, \dots, m$. If $m \ge 2$, assume by contradiction that for each $B_i$ there is a $B_j$ so that $B_i \not\preceq B_j$. Assume without loss of generality that $V_F \subseteq \tilde V_{B_1}$ and that $B_1 \not\preceq B_2$. Since $F \preceq B_2$ we would then by (4) need to include $\tilde Q_{B_1}$ in the sum (5) for $P_{B_2}$, but this would be a contradiction with the restriction on the sum in (5) when $B_1 \not\preceq B_2$.

8 Concluding remarks

This note essentially gives a simplified exposition of results presented in [3]. A general exposition can also be found in [2]. We have here considered the use of orthogonal decompositions only in the context of balanced ANOVAs. However, the same principles apply for other types of linear mixed models allowing for relevant orthogonal decompositions of the sample space. One example is a random coefficient model for the well-known orthodontic data set consisting of data for 27 children, each with 4 measurements. Here one can e.g. observe that the MLEs of the mean parameters are the same in the models with and without random child effects.

References

[1] Henrik Madsen and Poul Thyregod. Introduction to General and Generalized Linear Models. Chapman & Hall/CRC Texts in Statistical Science.

[2] Jesper Møller. Centrale statistiske modeller og likelihood baserede metoder. Department of Mathematical Sciences, Aarhus University.

[3] Tue Tjur. Analysis of variance models in orthogonal designs. International Statistical Review, 52.

A Orthogonal projections

Suppose $L$ is a subspace of $\mathbb{R}^n$, $n \ge 1$. Let $L^\perp = \{v \in \mathbb{R}^n \mid v \cdot w = 0 \text{ for all } w \in L\}$ denote its orthogonal complement.

Orthogonal decomposition: each $x \in \mathbb{R}^n$ has a unique decomposition
$$x = u + v$$
where $u \in L$ and $v \in L^\perp$.

Orthogonal projection: $u$ and $v$ above are the orthogonal projections $p_L(x)$ and $p_{L^\perp}(x)$ of $x$ on respectively $L$ and $L^\perp$. Due to the orthogonality of $u$ and $v$ we obtain Pythagoras' theorem: $\|x\|^2 = \|u\|^2 + \|v\|^2$.

A few facts regarding orthogonal projections:

(a) the orthogonal projection $p_L : \mathbb{R}^n \to L$ on $L$ is a linear mapping. It is thus given by a unique matrix transformation $p_L(x) = Px$ where $P$ is an $n \times n$ matrix.

(b) the projection matrix $P$ is symmetric ($P^T = P$) and idempotent ($P^2 = P$).

(c) conversely, if a matrix $Q$ is symmetric and idempotent and $L = \mathrm{col}\, Q$, then $Q$ is the matrix of the orthogonal projection on $L$.

(d) if $L = \mathrm{col}\, X$ and $X$ has full rank, then $P = X(X^T X)^{-1} X^T$.

(e) if $P$ is the matrix of the orthogonal projection on $L$, then $I - P$ is the matrix of the orthogonal projection on $L^\perp$.

(f) if $L_1$ and $L_2$ are subspaces with $L_1 \subseteq L_2$ and associated orthogonal projections $P_1$ and $P_2$, then the orthogonal projection on $L_2 \ominus L_1$ is $P_2 - P_1$.

Example: the orthogonal projection of $x$ on the subspace spanned by a single vector $v$ is $\frac{v^T x}{\|v\|^2} v$.

Example: the orthogonal projection of $x$ on a subspace spanned by orthogonal vectors $v_1, \dots, v_p$ is $\sum_{i=1}^{p} \frac{v_i^T x}{\|v_i\|^2} v_i$.
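These facts are easy to verify numerically; the sketch below uses an arbitrary random full-rank matrix $X$ to check (b), (d), (e), (f) and the single-vector example.

```python
# Illustration only: verify basic projection facts for a random full-rank X.
import numpy as np

rng = np.random.default_rng(0)
n, p = 7, 3
X = rng.normal(size=(n, p))                      # full column rank with probability 1
P = X @ np.linalg.solve(X.T @ X, X.T)            # fact (d): projection onto col X

print(np.allclose(P, P.T), np.allclose(P @ P, P))   # fact (b): symmetric and idempotent
print(np.allclose((np.eye(n) - P) @ X, 0))          # fact (e): I - P annihilates col X

v = X[:, 0]                                      # subspace L_1 = span{v} inside L_2 = col X
P1 = np.outer(v, v) / (v @ v)                    # single-vector example: projection onto span{v}
D = P - P1                                       # fact (f): projection onto L_2 minus L_1
print(np.allclose(D @ D, D), np.allclose(D @ v, 0))
```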

B Exercises

1. Let $L_1 \subseteq L_2$ with orthogonal projections $P_1$ and $P_2$. Show that $P_2 - P_1$ is the orthogonal projection on $L_2 \ominus L_1$.

2. Show that an orthogonal projection only has eigenvalues 1 or 0.

3. For a symmetric matrix $A$, show that the determinant $|A|$ is the product of $A$'s eigenvalues.

4. Show that $Q_I P_F = 0$.

5. Let $S = aP + bQ$ where $P$ and $Q$ are orthogonal projections with $P + Q = I$ and $a, b > 0$. Show that the eigenvalues of $S$ are the nonzero eigenvalues $a$ and $b$ of $aP$ and $bQ$. Show that $S^{-1} = a^{-1}P + b^{-1}Q$.

6. Show that $\|Y\|^2 \sim \sigma^2\chi^2(d)$ if $Y \sim N(0, \sigma^2 P)$ and $P$ is an orthogonal projection on a subspace of dimension $d$ (hint: use the spectral decomposition and the result above regarding the eigenvalues of $P$).

7. Check that $P_P P_T = P_0$ when $P \times T$ is balanced.

8. Check that $L_T + L_P = V_0 \oplus V_P \oplus V_T$.
