Stat 502 Design and Analysis of Experiments General Linear Model

Slide 1: Stat 502 Design and Analysis of Experiments: General Linear Model
Fritz Scholz, Department of Statistics, University of Washington
December 6, 2013

Slide 2: General Linear Hypothesis

We assume the data vector $Y = (Y_1,\dots,Y_N)$ consists of independent $Y_i \sim N(\mu_i, \sigma^2)$, for $i = 1,\dots,N$. We also have a data model hypothesis, namely that the mean vector $\mu$ can be any point in a given $s$-dimensional linear subspace $\Pi_\Omega \subset \mathbb{R}^N$, where $s < N$. Many statistical problems can be formulated as follows: we identify a linear subspace $\Pi_\omega \subset \Pi_\Omega$ of dimension $s - r$, with $0 < r \le s$, and we test the hypothesis $H_0\colon \mu \in \Pi_\omega$ against the alternative $H_1\colon \mu \in \Pi_\Omega \setminus \Pi_\omega$. To distinguish $\Pi_\omega$ from $\Pi_\Omega$ we call $\Pi_\omega$ the test hypothesis.

Slide 3: The Two-Sample Example

Let $Y_1,\dots,Y_m \sim N(\xi, \sigma^2)$ and $Y_{m+1},\dots,Y_{m+n} \sim N(\eta, \sigma^2)$ be independent samples. Here $N = m + n$, and the data model below specifies or reflects two samples with common variance $\sigma^2$ and possibly different means $\xi$ and $\eta$.

$\Pi_\Omega$ is 2-dimensional ($s = 2$), consisting of all vectors of the form
$$\mu = (\mu_1,\dots,\mu_N) = \xi\,(\underbrace{1,\dots,1}_{m\ \text{1's}},\underbrace{0,\dots,0}_{n\ \text{0's}}) + \eta\,(\underbrace{0,\dots,0}_{m\ \text{0's}},\underbrace{1,\dots,1}_{n\ \text{1's}}) = \xi a_1 + \eta a_2.$$
The orthogonal vectors $a_1$ and $a_2$ span $\Pi_\Omega$.

Here we want to test the hypothesis $H_0\colon \xi = \eta$, i.e., $\mu = (\mu_1,\dots,\mu_N) = \xi(a_1 + a_2)$, and $\Pi_\omega$ is the linear subspace spanned by $1 = a_1 + a_2 = (\underbrace{1,\dots,1}_{N\ \text{1's}})$, i.e., $r = 1$ and $s - r = 1$.

Slide 4: The k-Sample Example

Let $Y_{i1},\dots,Y_{in_i} \sim N(\xi_i, \sigma^2)$, $i = 1,\dots,k$, be independent random samples. Here $N = n_1 + \dots + n_k$, and we have used the traditional double indexing on the $Y$'s, but we could equally well have used a more awkward single index $i = 1,\dots,N$. The data model below specifies or reflects $k$ samples with common variance $\sigma^2$ and possibly different means $\xi_1,\dots,\xi_k$.

$\Pi_\Omega$ is $k$-dimensional ($s = k$), consisting of all vectors of the form
$$\mu = (\mu_{11},\dots,\mu_{kn_k}) = \xi_1\,(\underbrace{1,\dots,1}_{n_1\ \text{1's}},\underbrace{0,\dots,0}_{(N-n_1)\ \text{0's}}) + \dots + \xi_k\,(\underbrace{0,\dots,0}_{(N-n_k)\ \text{0's}},\underbrace{1,\dots,1}_{n_k\ \text{1's}}) = \xi_1 a_1 + \dots + \xi_k a_k.$$
The orthogonal vectors $a_1,\dots,a_k$ span $\Pi_\Omega$; $a_i$ has 1's in positions $(i,1),\dots,(i,n_i)$ and 0's in the remaining positions.

Slide 5: $H_0\colon \xi_1 = \dots = \xi_k$

Here we want to test the hypothesis $H_0\colon \xi_1 = \dots = \xi_k$, i.e.,
$$\mu = (\mu_{11},\dots,\mu_{kn_k}) = \xi_1 a_1 + \dots + \xi_k a_k = \xi_1 (a_1 + \dots + a_k).$$
$\Pi_\omega$ is the $(s-r)$-dimensional linear subspace spanned by $1 = a_1 + \dots + a_k = (\underbrace{1,\dots,1}_{N\ \text{1's}})$, i.e., $s - r = k - r = 1$, or $r = k - 1$.

Slide 6: Least Squares Estimates (LSE's)

The $\Pi_\Omega$-least squares estimate $\hat\mu = \hat\mu(Y)$ of $\mu$ minimizes $\|Y - \mu\|^2 = \sum_{i=1}^N (Y_i - \mu_i)^2$ over $\mu \in \Pi_\Omega$:
$$\hat\mu = \hat\mu(Y) = \text{projection of } Y \text{ onto } \Pi_\Omega = \text{point in } \Pi_\Omega \text{ closest to } Y.$$

The $\Pi_\omega$-least squares estimate $\hat{\hat\mu} = \hat{\hat\mu}(Y)$ of $\mu$ minimizes $\|Y - \mu\|^2 = \sum_{i=1}^N (Y_i - \mu_i)^2$ over $\mu \in \Pi_\omega$:
$$\hat{\hat\mu} = \hat{\hat\mu}(Y) = \text{projection of } Y \text{ onto } \Pi_\omega = \text{point in } \Pi_\omega \text{ closest to } Y.$$
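The projection view of the LSE can be sketched numerically. This is an illustrative check, not from the slides: a made-up two-sample data vector (m = 3, n = 2) with numpy's least squares routine playing the role of the projector.

```python
import numpy as np

# Two-sample layout from slide 3: m = 3, n = 2, so N = 5.
# Columns a1, a2 span Pi_Omega; their sum (the all-ones vector) spans Pi_omega.
a1 = np.array([1., 1., 1., 0., 0.])
a2 = np.array([0., 0., 0., 1., 1.])
X_Omega = np.column_stack([a1, a2])
X_omega = np.ones((5, 1))

Y = np.array([4., 6., 5., 1., 3.])   # hypothetical data

def project(Y, X):
    """Orthogonal projection of Y onto the column space of X (the LSE of mu)."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return X @ beta

mu_hat = project(Y, X_Omega)      # group means repeated: (5, 5, 5, 2, 2)
mu_hathat = project(Y, X_omega)   # grand mean repeated: (3.8, ..., 3.8)
print(mu_hat, mu_hathat)
```

As expected, the $\Pi_\Omega$-projection replaces each observation by its group mean and the $\Pi_\omega$-projection by the grand mean.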

Slide 7: General Linear Model Schematic Diagram

[Diagram: the data vector $Y \in \mathbb{R}^N$, its projection $\hat\mu$ onto $\Pi_\Omega$ with residual $Y - \hat\mu = e$, and the further projection $\hat{\hat\mu}$ onto $\Pi_\omega \subset \Pi_\Omega$.]

Slide 8: Comments on the Previous Diagram

$\hat\mu \in \Pi_\Omega$ is the orthogonal projection of the data vector $Y$ onto $\Pi_\Omega$, i.e., the best explanation of $Y$ in terms of $\Pi_\Omega$. The orthogonal complement $e = Y - \hat\mu$ is the residual error vector, i.e., that part of $Y$ not explained by $\hat\mu \in \Pi_\Omega$: $Y = \hat\mu + (Y - \hat\mu) = \hat\mu + e$.

$\hat{\hat\mu}$ is the orthogonal projection of the data vector $Y$ onto $\Pi_\omega$, i.e., it is the best explanation of $Y$ in terms of $\Pi_\omega$. $\hat{\hat\mu}$ is also the orthogonal projection of $\hat\mu$ onto $\Pi_\omega$, i.e., it is the best explanation of $\hat\mu$ in terms of $\Pi_\omega$: $\hat\mu = \hat{\hat\mu} + (\hat\mu - \hat{\hat\mu})$. The orthogonal complement $\hat\mu - \hat{\hat\mu}$ is that part of $\hat\mu$ that cannot be explained by $\Pi_\omega$; it expresses the estimated discrepancy of the unknown $\mu$ from $\Pi_\omega$.

Slide 9: Orthogonal Least Squares Decomposition

$Y$ is the orthogonal sum of the following three component vectors:
$$Y = \hat{\hat\mu} + (\hat\mu - \hat{\hat\mu}) + (Y - \hat\mu) = \hat{\hat\mu} + (\hat\mu - \hat{\hat\mu}) + e.$$
Orthogonality $\Longrightarrow$ these three components are independent of each other, assuming normality of $Y$. Pythagoras:
$$\|Y\|^2 = \|\hat\mu\|^2 + \|Y - \hat\mu\|^2 = \|\hat\mu\|^2 + \|e\|^2$$
$$\|Y\|^2 = \|\hat{\hat\mu}\|^2 + \|Y - \hat{\hat\mu}\|^2$$
$$\|\hat\mu\|^2 = \|\hat{\hat\mu}\|^2 + \|\hat\mu - \hat{\hat\mu}\|^2$$
$$\|Y - \hat{\hat\mu}\|^2 = \|\hat\mu - \hat{\hat\mu}\|^2 + \|Y - \hat\mu\|^2 = \|\hat\mu - \hat{\hat\mu}\|^2 + \|e\|^2$$
$$\|Y\|^2 = \|\hat{\hat\mu}\|^2 + \|\hat\mu - \hat{\hat\mu}\|^2 + \|e\|^2 \qquad \text{(double Pythagoras)}$$
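The double Pythagoras identity is easy to verify numerically. A minimal sketch, assuming a two-sample layout and simulated data (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 7, 4
# Pi_Omega: two-sample mean space; Pi_omega: common-mean subspace.
X_Omega = np.zeros((N, 2)); X_Omega[:m, 0] = 1; X_Omega[m:, 1] = 1
X_omega = np.ones((N, 1))

Y = rng.normal(size=N)
proj = lambda v, X: X @ np.linalg.lstsq(X, v, rcond=None)[0]
mu_hat, mu_hathat = proj(Y, X_Omega), proj(Y, X_omega)

e = Y - mu_hat           # residual, orthogonal to Pi_Omega
d = mu_hat - mu_hathat   # part of mu_hat unexplained by Pi_omega

# The three components are mutually orthogonal ...
assert abs(mu_hathat @ d) < 1e-10 and abs(mu_hathat @ e) < 1e-10 and abs(d @ e) < 1e-10
# ... so the squared norms add up ("double Pythagoras").
assert np.isclose(Y @ Y, mu_hathat @ mu_hathat + d @ d + e @ e)
print("double Pythagoras verified")
```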

Slide 10: LSE's = MLE's (Maximum Likelihood Estimates)

Under the normal distribution model for the $Y_i$ the LSE's are also the maximum likelihood estimates (MLE's) of $\mu$ w.r.t. the respective model constraints $\mu \in \Pi_\Omega$ and $\mu \in \Pi_\omega$. This follows immediately from the likelihood function for the observed $Y = y = (y_1,\dots,y_N)$:
$$L(\mu, \sigma) = f_{\mu,\sigma}(y_1,\dots,y_N) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^N \exp\left(-\sum_{i=1}^N (y_i - \mu_i)^2 \Big/ (2\sigma^2)\right).$$
It is maximized over $\Pi_\Omega$ ($\Pi_\omega$) by minimizing $\sum_{i=1}^N (y_i - \mu_i)^2$ w.r.t. $\mu \in \Pi_\Omega$ ($\Pi_\omega$), taking $\hat\sigma^2 = \sum_{i=1}^N (y_i - \hat\mu_i)^2/N$ and $\hat{\hat\sigma}^2 = \sum_{i=1}^N (y_i - \hat{\hat\mu}_i)^2/N$, respectively.

Slide 11: General Linear Model Theorem (Proof: Slide 36 ff.)

Theorem: Under the general linear model assumption we have:
$$\|\hat\mu - \hat{\hat\mu}\|^2 = \sum_{i=1}^N (\hat\mu_i - \hat{\hat\mu}_i)^2 = \|Y - \hat{\hat\mu}\|^2 - \|Y - \hat\mu\|^2 = \sum_{i=1}^N (Y_i - \hat{\hat\mu}_i)^2 - \sum_{i=1}^N (Y_i - \hat\mu_i)^2 \sim \sigma^2 \chi^2_{r,\lambda}$$
is independent of
$$SSE = \|Y - \hat\mu\|^2 = \sum_{i=1}^N (Y_i - \hat\mu_i)^2 \sim \sigma^2 \chi^2_{N-s}$$
and thus
$$F = \frac{\|\hat\mu - \hat{\hat\mu}\|^2 / r}{\|Y - \hat\mu\|^2 / (N - s)} \sim F_{r,\,N-s,\,\lambda},$$
with noncentrality parameter $\lambda = \|\hat\mu(\mu) - \hat{\hat\mu}(\mu)\|^2/\sigma^2 = \|\mu - \hat{\hat\mu}(\mu)\|^2/\sigma^2$, where $\hat{\hat\mu}(\mu)$ is viewed as the $\Pi_\omega$-LSE when $Y = \mu$ and $\hat\mu(\mu)$ as the $\Pi_\Omega$-LSE when $Y = \mu$. Of course, $\hat\mu(\mu) = \mu$ since $\mu \in \Pi_\Omega$.

Slide 12: General Linear Model Theorem (2-Sample Case)

Here the $\Pi_\Omega$-LSE of $\mu$ minimizes
$$\sum_{i=1}^N (Y_i - \mu_i)^2 = \sum_{i=1}^m (Y_i - \xi)^2 + \sum_{i=m+1}^{m+n} (Y_i - \eta)^2$$
$$\Longrightarrow\quad \hat\mu = (\hat\xi,\dots,\hat\xi,\hat\eta,\dots,\hat\eta) = \hat\xi a_1 + \hat\eta a_2 \quad\text{with } \hat\xi = \bar Y_1 = \sum_{i=1}^m Y_i/m \text{ and } \hat\eta = \bar Y_2 = \sum_{i=m+1}^{m+n} Y_i/n.$$
The $\Pi_\omega$-LSE of $\mu$ minimizes $\sum_{i=1}^N (Y_i - \mu)^2 \Longrightarrow \hat{\hat\mu} = (\hat\mu,\dots,\hat\mu) = \hat\mu\,1$ with $\hat\mu = \bar Y = (m\bar Y_1 + n\bar Y_2)/N$.
$$\|\hat\mu - \hat{\hat\mu}\|^2 = \frac{mn}{N}(\bar Y_1 - \bar Y_2)^2$$
$$F = \frac{\|\hat\mu - \hat{\hat\mu}\|^2 / r}{\|Y - \hat\mu\|^2 / (N - s)} = \frac{\left([\bar Y_1 - \bar Y_2]\big/\sqrt{1/m + 1/n}\right)^2 \big/ 1}{\sum_{i=1}^{m+n}(Y_i - \hat\mu_i)^2/(m + n - 2)} \sim F_{1,\,N-2,\,\lambda} = (\text{2-sample } t\text{-statistic})^2.$$
$F$-test $\Longleftrightarrow$ 2-sided $t$-test.
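A quick numerical check of $F = t^2$ in the two-sample case, using made-up data and scipy's equal-variance t-test (not part of the slides):

```python
import numpy as np
from scipy import stats

y1 = np.array([4.1, 6.0, 5.2, 4.7])   # sample 1 (m = 4), hypothetical
y2 = np.array([1.9, 3.4, 2.8])        # sample 2 (n = 3), hypothetical
m, n = len(y1), len(y2)
N = m + n

# Pooled variance estimate (denominator of F, df = N - 2)
sse = ((y1 - y1.mean())**2).sum() + ((y2 - y2.mean())**2).sum()
s2 = sse / (N - 2)

# F statistic from the slide: ||mu_hat - mu_hathat||^2 / s2, with mn/N = 1/(1/m + 1/n)
F = (m * n / N) * (y1.mean() - y2.mean())**2 / s2

# Equals the square of the equal-variance two-sample t statistic
t, _ = stats.ttest_ind(y1, y2, equal_var=True)
print(F, t**2)
```

The two printed values agree to floating-point precision.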

Slide 13: Noncentrality Parameter in the 2-Sample Case

To find the noncentrality parameter $\lambda$ we replace $Y$ by $\mu = (\xi,\dots,\xi,\eta,\dots,\eta)$ in
$$\|\hat\mu(Y) - \hat{\hat\mu}(Y)\|^2 = \frac{mn}{N}(\bar Y_1 - \bar Y_2)^2 \;\Longrightarrow\; \|\hat\mu(\mu) - \hat{\hat\mu}(\mu)\|^2 = \frac{mn}{N}(\xi - \eta)^2 = \left(\frac{\xi - \eta}{\sqrt{1/m + 1/n}}\right)^2.$$
Thus the noncentrality parameter (ncp) is
$$\lambda = \left(\frac{\xi - \eta}{\sqrt{1/m + 1/n}}\right)^2 \Big/ \sigma^2 = \delta^2, \quad\text{where}\quad \delta = \frac{\xi - \eta}{\sigma\sqrt{1/m + 1/n}}$$
is the ncp in the two-sample $t$-test.

Slide 14: General Linear Model Theorem (k-Sample Case)

Here the $\Pi_\Omega$-LSE of $\mu$ minimizes
$$\sum_{i=1}^N (Y_i - \mu_i)^2 = \sum_{i=1}^k \sum_{j=1}^{n_i} (Y_{ij} - \xi_i)^2 \;\Longrightarrow\; \hat\mu = \hat\xi_1 a_1 + \dots + \hat\xi_k a_k \quad\text{with } \hat\xi_i = \bar Y_{i.} = \sum_{j=1}^{n_i} Y_{ij}/n_i.$$
The $\Pi_\omega$-LSE of $\mu$ minimizes $\sum_{i=1}^k \sum_{j=1}^{n_i} (Y_{ij} - \mu)^2 \Longrightarrow \hat{\hat\mu} = (\hat\mu,\dots,\hat\mu) = \hat\mu\,1$ with $\hat\mu = \bar Y_{..} = \sum_{i=1}^k n_i \bar Y_{i.}/N$.
$$\|\hat\mu - \hat{\hat\mu}\|^2 = \sum_{i=1}^k n_i (\bar Y_{i.} - \bar Y_{..})^2$$
$$F = \frac{\|\hat\mu - \hat{\hat\mu}\|^2 / r}{\|Y - \hat\mu\|^2 / (N - s)} = \frac{\sum_{i=1}^k n_i (\bar Y_{i.} - \bar Y_{..})^2 / (k - 1)}{\sum_{i=1}^k \sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{i.})^2 / (N - k)} \sim F_{k-1,\,N-k,\,\lambda}.$$
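The k-sample F statistic can be checked against scipy's one-way ANOVA; the three small samples below are hypothetical:

```python
import numpy as np
from scipy import stats

groups = [np.array([5.1, 4.8, 5.5]),
          np.array([3.9, 4.2, 4.0, 4.4]),
          np.array([6.0, 5.7])]
k = len(groups)
N = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()    # Y-bar..

# Numerator and denominator sums of squares from the slide
ss_between = sum(len(g) * (g.mean() - grand)**2 for g in groups)
ss_within  = sum(((g - g.mean())**2).sum() for g in groups)
F = (ss_between / (k - 1)) / (ss_within / (N - k))

# scipy's one-way ANOVA computes the same statistic
F_scipy, p = stats.f_oneway(*groups)
print(F, F_scipy)
```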

Slide 15: Noncentrality Parameter in the k-Sample Case

To get the noncentrality parameter $\lambda$ we replace $Y$ by $\mu = (\xi_1,\dots,\xi_1,\dots,\xi_k,\dots,\xi_k)$ in
$$\|\hat\mu(Y) - \hat{\hat\mu}(Y)\|^2 = \sum_{i=1}^k n_i (\bar Y_{i.} - \bar Y_{..})^2 \;\Longrightarrow\; \|\hat\mu(\mu) - \hat{\hat\mu}(\mu)\|^2 = \sum_{i=1}^k n_i (\xi_i - \bar\xi)^2 \quad\text{with } \bar\xi = \sum_{i=1}^k n_i \xi_i / N.$$
The noncentrality parameter is $\lambda = \sum_{i=1}^k n_i (\xi_i - \bar\xi)^2 / \sigma^2$.
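The ncp feeds directly into power calculations via the noncentral F distribution. A sketch with hypothetical group means $\xi_i$ and equal $n_i = 8$, using scipy's noncentral F (`scipy.stats.ncf`):

```python
import numpy as np
from scipy import stats

sigma = 1.0
xi = np.array([0.0, 0.5, 1.0])    # hypothetical group means
n_i = np.array([8, 8, 8])
k, N = len(xi), n_i.sum()

xi_bar = (n_i * xi).sum() / N
lam = (n_i * (xi - xi_bar)**2).sum() / sigma**2   # noncentrality parameter

# Power of the level-0.05 F test: P(F_{k-1, N-k, lam} > critical value)
crit = stats.f.ppf(0.95, k - 1, N - k)
power = stats.ncf.sf(crit, k - 1, N - k, lam)
print(round(lam, 3), round(power, 3))
```

Here $\lambda = 8(0.25 + 0 + 0.25) = 4$; larger $\lambda$ (bigger spread of the $\xi_i$, or larger $n_i$) yields higher power.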

Slide 16: The Two-Factor Experiment with Unequal Cell Counts

We consider the following model with two factors A and B:
$$Y_{ijk} = \mu_{ij} + e_{ijk} = \mu + a_i + b_j + c_{ij} + e_{ijk}$$
with $e_{ijk}$ i.i.d. $N(0, \sigma^2)$, $i = 1,\dots,t_1$, $j = 1,\dots,t_2$, $k = 1,\dots,n_{ij}$.

The replications $n_{ij}$ may vary from cell to cell. Even if we plan $n_{ij} = n$ for all $i, j$, accidents happen: lost data. Sometimes imbalance is planned (economic constraints). We may even have $n_{ij} = 0$ for some cells, which constrains the analysis. We require $n_{i.} = \sum_j n_{ij} > 0$ for all $i$, and similarly $n_{.j} > 0$ for all $j$.

Slide 17: Parametrization Equivalence: $n_{ij} > 0$ for all $i, j$

To establish a 1-1 correspondence between the $t_1 t_2$ freely varying cell means $\mu_{ij}$ and the model parameters $\mu, a_i, b_j, c_{ij}$, we impose the set-to-zero constraints on the latter, i.e., $a_1 = 0$, $b_1 = 0$ and $c_{1j} = c_{i1} = 0$ for all $i, j$. This leaves
$$1 + (t_1 - 1) + (t_2 - 1) + (t_1 - 1)(t_2 - 1) = t_1 t_2$$
freely varying model parameters. Clearly the $t_1 t_2$ freely varying model parameters $\mu, a_i, b_j, c_{ij}$ with set-to-zero side conditions give us $\mu_{ij} = \mu + a_i + b_j + c_{ij}$. Conversely, we get the model parameters with set-to-zero side conditions from the $\mu_{ij}$ via
$$\mu = \mu_{11}, \quad a_i = \mu_{i1} - \mu_{11}, \quad b_j = \mu_{1j} - \mu_{11},$$
$$c_{ij} = \mu_{ij} - \mu_{11} - (\mu_{i1} - \mu_{11}) - (\mu_{1j} - \mu_{11}) = \mu_{ij} - \mu_{i1} - \mu_{1j} + \mu_{11},$$
again with
$$\mu_{ij} = \mu + a_i + b_j + c_{ij} = \mu_{11} + (\mu_{i1} - \mu_{11}) + (\mu_{1j} - \mu_{11}) + (\mu_{ij} - \mu_{i1} - \mu_{1j} + \mu_{11}).$$
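The 1-1 correspondence can be verified mechanically. A numpy sketch with random cell means, using exactly the conversion formulas above:

```python
import numpy as np

rng = np.random.default_rng(1)
t1, t2 = 3, 4
mu_cell = rng.normal(size=(t1, t2))    # freely varying cell means mu_ij

# Cell means -> set-to-zero parameters
mu = mu_cell[0, 0]
a = mu_cell[:, 0] - mu_cell[0, 0]      # a_1 = 0 automatically
b = mu_cell[0, :] - mu_cell[0, 0]      # b_1 = 0 automatically
c = mu_cell - mu_cell[:, :1] - mu_cell[:1, :] + mu_cell[0, 0]  # c_1j = c_i1 = 0

# Parameters -> cell means: the round trip is exact
rebuilt = mu + a[:, None] + b[None, :] + c
assert np.allclose(rebuilt, mu_cell)
assert np.allclose(c[0, :], 0) and np.allclose(c[:, 0], 0)
print("1-1 parametrization verified")
```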

Slide 18: Problems Caused by Unequal Replications

$n_{ij} = n$ for all $i, j$ $\Longrightarrow$ orthogonality $\Longrightarrow$ nice ANOVA decomposition. When the $n_{ij}$ vary, that decomposition usually does not hold, i.e., we still have
$$Y_{ijk} = \bar Y_{...} + (\bar Y_{i..} - \bar Y_{...}) + (\bar Y_{.j.} - \bar Y_{...}) + (\bar Y_{ij.} - \bar Y_{i..} - \bar Y_{.j.} + \bar Y_{...}) + (Y_{ijk} - \bar Y_{ij.})$$
but typically $SS_T \ne SS_A + SS_B + SS_{AB} + SS_E$: the orthogonality no longer holds! However, we can still test various hypotheses:

$H_0\colon c_{ij} = 0$ for all $i, j$ (no interactions), i.e., the additive model holds.
$H_0^A\colon a_i = 0$ for all $i$, given no interactions (no main effect from A).
$H_0^B\colon b_j = 0$ for all $j$, given no interactions (no main effect from B).

The denominator error sum of squares and df will be different when testing $H_0$ or $H_0^A$ ($H_0^B$).

Slide 19: The Order of Factors

When both effects are tested in the additive model, mind the order in which the factors are specified in the model. In lm(Y ~ A + B), F_A tests $H_0^A$ within the additive model; F_B then tests factor B after accounting for A. The reason for the distinction is that the subspaces corresponding to A and B are not orthogonal: part of B may be explainable by A. Only the part of B orthogonal to the subspace corresponding to A comes into play when B is specified as the second factor. The sums of squares still add up; they are least squares distances of projections onto subspaces.
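The order dependence of these sequential sums of squares can be demonstrated with projections; the unbalanced layout below is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Unbalanced two-factor layout (no interaction fitted): cell counts vary,
# so the A and B subspaces are not orthogonal after removing the mean.
levels_A = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
levels_B = np.array([0, 1, 1, 0, 0, 0, 1, 1, 1])
y = rng.normal(size=len(levels_A))

def dummies(z):
    return (z[:, None] == np.unique(z)[None, :]).astype(float)

one = np.ones((len(y), 1))
XA, XB = dummies(levels_A), dummies(levels_B)

def rss(X):
    """Residual sum of squares after projecting y onto col(X)."""
    fit = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return ((y - fit)**2).sum()

# Sequential sums of squares for the two orderings (as anova(lm(...)) reports)
ss_A_first = rss(one) - rss(np.hstack([one, XA]))
ss_B_after = rss(np.hstack([one, XA])) - rss(np.hstack([one, XA, XB]))
ss_B_first = rss(one) - rss(np.hstack([one, XB]))
ss_A_after = rss(np.hstack([one, XB])) - rss(np.hstack([one, XA, XB]))

# Each ordering decomposes the same total, but the per-factor pieces differ
assert np.isclose(ss_A_first + ss_B_after, ss_B_first + ss_A_after)
print(ss_A_first, ss_A_after)   # generally not equal in unbalanced data
```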

Slide 20: Illustration: With Interactions, $n_{ij} > 0$ for all $i, j$

# for factorial.test see class web page
> nmat <- cbind(c(2,1,2), c(2,3,2), c(2,3,1))
> out <- factorial.test(nmat=nmat, sig=1.5, int=T)
> out
[data frame listing with columns Y, a, b not preserved in extraction]

Slide 21: ANOVAs for Illustration Data

> anova(lm(Y ~ a + b + a*b, data=out))
Analysis of Variance Table
Response: Y
[numeric columns not preserved in extraction; significance: a *, b **, a:b *]

> anova(lm(Y ~ b + a + a*b, data=out))
Analysis of Variance Table
Response: Y
[numeric columns not preserved in extraction; significance: b **, a **, b:a *]

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Slide 22: Some Comments

Note the order of a+b and b+a in the previous models. The results for those two factors are different. This is due to the order of subspace projections. Sums of squares = squared distance within the respective effect subspace to the hypothesis that the effect is zero. The residual sum of squares measures the variation perpendicular to the involved effects.

Slide 23: Illustration: No Interactions, $n_{ij} > 0$ for all $i, j$

> nmat <- cbind(c(2,1,2), c(2,3,2), c(2,3,1))
> out <- factorial.test(nmat=nmat, sig=.5, int=F)
> out
[data frame listing with columns Y, a, b not preserved in extraction]

Slide 24: ANOVAs for Illustration Data

> anova(lm(Y ~ a + b + a*b, data=out))
Analysis of Variance Table
Response: Y
[numeric columns not preserved in extraction; significance: a *, b *, a:b not significant]

> anova(lm(Y ~ b + a + a*b, data=out))
Analysis of Variance Table
Response: Y
[numeric columns not preserved in extraction; significance: b *, a **, b:a not significant]

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Slide 25: Illustration: With Interactions, Some $n_{ij} = 0$

> nmat <- cbind(c(0, 1, 2), c(2, 3, 2), c(2, 3, 0))
> out <- factorial.test(nmat=nmat, sig=.5, int=T)
> out
[data frame listing with columns Y, a, b not preserved in extraction]

Slide 26: ANOVAs for Illustration Data

> anova(lm(Y ~ a + b + a*b, data=out))
Analysis of Variance Table
Response: Y
[numeric columns not preserved in extraction; significance: a ***, b **, a:b not significant]

> anova(lm(Y ~ b + a + a*b, data=out))
Analysis of Variance Table
Response: Y
[numeric columns not preserved in extraction; significance: b ***, a ***, b:a not significant]

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Slide 27: ANOVAs for Illustration Data

> anova(lm(Y ~ a + b, data=out))
Analysis of Variance Table
Response: Y
[numeric columns not preserved in extraction; significance: a ***, b **]

> anova(lm(Y ~ a:b, data=out))
Analysis of Variance Table
Response: Y
[numeric columns not preserved in extraction; significance: a:b ***]

Note: the residual sum of squares of the a:b (cell means) fit equals $\sum_{i,j} \sum_{k:\, n_{ij} > 0} (Y_{ijk} - \bar Y_{ij.})^2$, the residual sum of squares from the previous slide.

Slide 28: Comments on Previous ANOVAs

b:a only has 2 degrees of freedom, in place of the previous 4: in the full model we are missing 2 of the $\mu_{ij}$'s. This works as long as the pattern of non-missing cells is connected, i.e., any $\mu_{ij}$ with $n_{ij} > 0$ can be connected by a rectangular path to any other $\mu_{i'j'}$ with $n_{i'j'} > 0$. The rectangular path can only traverse cells with $n_{ij} > 0$. See Chapter 5 of Linear Models for Unbalanced Data by Shayle R. Searle, Wiley 1987.
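Connectedness of the non-missing cell pattern can be checked mechanically. A small sketch (the function name is ours, not from the course code): two occupied cells are linked when they share a row or a column, and the pattern is connected when every occupied cell is reachable through such links.

```python
import numpy as np
from collections import deque

def is_connected(nmat):
    """Check whether the cells with n_ij > 0 are connected by rectangular
    (same-row / same-column) moves through occupied cells."""
    cells = {(i, j) for i, j in zip(*np.nonzero(nmat))}
    if not cells:
        return True
    start = next(iter(cells))
    seen, queue = {start}, deque([start])
    while queue:
        i, j = queue.popleft()
        for cell in cells - seen:
            if cell[0] == i or cell[1] == j:   # share a row or a column
                seen.add(cell)
                queue.append(cell)
    return seen == cells

# Incidence pattern of slide 25 (rows of cbind(c(0,1,2), c(2,3,2), c(2,3,0)))
nmat = np.array([[0, 2, 2], [1, 3, 3], [2, 3, 0]])
print(is_connected(nmat))                      # connected despite two empty cells
disconnected = np.array([[1, 0], [0, 1]])
print(is_connected(disconnected))              # diagonal pattern is not connected
```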

Slide 29: Analysis of Covariance (ANCOVA)

The simplest example model is as follows:
$$Y_{ij} = \mu + \tau_i + \beta(x_{ij} - \bar x_{..}) + \epsilon_{ij} \quad\text{with } \epsilon_{ij} \text{ i.i.d. } N(0, \sigma^2).$$
We have $a$ treatments, $i = 1,\dots,a$, and $n$ observations $Y_{ij}$ at level $i$, with covariates $x_{ij}$. The $\tau_i$ satisfy sum-to-zero or set-to-zero side conditions. If we ignore the $x_{ij}$, the response variability may be inflated, and the $x_{ij}$ may suggest (false) treatment effects. The above model is an ANOVA/regression hybrid.

There are many generalizations: more treatments; more covariates; different slopes, i.e., covariate/treatment interactions; treatment interactions. ANCOVA is useful for adjusting for variation due to uncontrollable nuisance variables. Contrast this with blocking.
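The ANCOVA model can be fit by ordinary least squares on a design matrix with treatment dummies and the centered covariate. A sketch on simulated data with hypothetical parameter values (set-to-zero coding, treatment 1 as baseline):

```python
import numpy as np

rng = np.random.default_rng(3)
a, n = 3, 5                           # a treatments, n observations each
tau = np.array([0.0, 1.0, -1.0])      # hypothetical treatment effects
beta, mu, sigma = 2.0, 10.0, 0.5      # hypothetical slope, mean, error sd

x = rng.uniform(0, 1, size=(a, n))    # covariate values x_ij
eps = rng.normal(0, sigma, size=(a, n))
y = mu + tau[:, None] + beta * (x - x.mean()) + eps

# Design matrix: intercept, set-to-zero treatment dummies, centered covariate
groups = np.repeat(np.arange(a), n)
X = np.column_stack([np.ones(a * n),
                     (groups == 1).astype(float),
                     (groups == 2).astype(float),
                     (x - x.mean()).ravel()])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
print(np.round(coef, 2))   # approx. [mu + tau_1, tau_2 - tau_1, tau_3 - tau_1, beta]
```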

Slide 30: Breaking Strength Example Data (Montgomery)

The following data were taken from Montgomery. They represent breaking strength data for monofilament fiber produced by 3 different machines $M_i$, $i = 1, 2, 3$. The strength $Y_{ij}$ is measured in pounds. $Y_{ij}$ is affected by the fiber diameter $x_{ij}$. The pairs $(x_{ij}, Y_{ij})$, $j = 1,\dots,5$, were taken from machine $M_i$.

Slide 31: The Data

> breakstrength
[data frame with columns strength, diameter, machine; the row listing was not preserved in extraction]

Slide 32: A First Look

[Scatterplot of breaking strength y against diameter x, plotted separately for Machines 1, 2 and 3.]

Slide 33: ANOVA of Diameter vs Machine

> anova(lm(diameter ~ machine, data=breakstrength))
Analysis of Variance Table
Response: diameter
[numeric columns not preserved in extraction; machine is not significant]

When using ANCOVA it is useful to understand the relationship between the factor (machine) and the covariate (diameter). When adjusting for the covariate first, we may also adjust away part of the factor effect. Here machine seems not to affect the covariate.

Slide 34: ANCOVA

> anova(lm(strength ~ machine + diameter, data=breakstrength))
Analysis of Variance Table
Response: strength
[numeric columns not preserved in extraction; machine *** (p on the order of 1e-05), diameter *** (p on the order of 1e-06)]

> anova(lm(strength ~ diameter + machine, data=breakstrength))
Analysis of Variance Table
Response: strength
[numeric columns not preserved in extraction; diameter *** (p on the order of 1e-07), machine no longer significant]

Note the difference in significance!

Slide 35: Diagnostic Plots

[Diagnostic plots: residuals against fitted values, residuals against diameter, residuals by machine, and a normal Q-Q plot of the residuals.]

Slide 36: Appendix A: General Linear Model Theorem Proof

The proof is essentially that in Testing Statistical Hypotheses, 3rd edition, Ch. 7, by E.L. Lehmann and J.P. Romano (2005), where it is presented in the context of certain optimality properties and followed by many explicit examples. The proof is first given for a special case of linear subspaces $\Pi_\Omega$ and $\Pi_\omega \subset \Pi_\Omega$, where the statement of the theorem is immediate from the definitions of $\chi^2_f$, $\chi^2_{g,\lambda}$ and $F_{f,g,\lambda}$. The general case can always be orthonormally transformed to the special case. This does not change the meaning of LSE: LSE's in one framework rotate to LSE's in the other, and orthonormal transforms don't change distances or independence.

Slide 37: General Linear Model Theorem: Special Case

$U_i \sim N(\nu_i, \sigma^2)$, $i = 1,\dots,N$, with model hypothesis $\nu_{s+1} = \dots = \nu_N = 0$. This describes an $s$-dimensional linear subspace $\Pi_\Omega$ of $\mathbb{R}^N$. Test $H_0\colon \nu_1 = \dots = \nu_r = 0$ (inside the model hypothesis); this describes an $(s-r)$-dimensional subspace $\Pi_\omega \subset \Pi_\Omega$. Then
$$\sum_{i=s+1}^N U_i^2/\sigma^2 \sim \chi^2_{N-s} \quad\text{and}\quad \sum_{i=1}^r U_i^2/\sigma^2 \sim \chi^2_{r,\lambda}$$
are independent, with ncp $\lambda = \sum_{i=1}^r \nu_i^2/\sigma^2$. The natural test rejects $H_0$ when the corresponding $F$-statistic is too large, where
$$F = \frac{\sum_{i=1}^r U_i^2 / r}{\sum_{i=s+1}^N U_i^2 / (N - s)} \sim F_{r,\,N-s,\,\lambda}.$$

Slide 38: LSE View in the Special Case

The $\Pi_\Omega$-LSE $\hat\nu$ minimizes
$$\|U - \nu\|^2 = \sum_{i=1}^N (U_i - \nu_i)^2 = \sum_{i=1}^s (U_i - \nu_i)^2 + \sum_{i=s+1}^N U_i^2 \quad\text{over } \nu \in \Pi_\Omega,$$
i.e., $\hat\nu_i = U_i$ for $i = 1,\dots,s$ and $\hat\nu_i = 0$ for $i = s+1,\dots,N$. Similarly, the $\Pi_\omega$-LSE $\hat{\hat\nu}$ minimizes
$$\|U - \nu\|^2 = \sum_{i=1}^r U_i^2 + \sum_{i=r+1}^s (U_i - \nu_i)^2 + \sum_{i=s+1}^N U_i^2 \quad\text{over } \nu \in \Pi_\omega,$$
i.e., $\hat{\hat\nu}_i = U_i$ for $i = r+1,\dots,s$ and $\hat{\hat\nu}_i = 0$ for $i = 1,\dots,r,\,s+1,\dots,N$. Clearly
$$\|U - \hat\nu\|^2 = \sum_{i=s+1}^N U_i^2 \quad\text{and}\quad \|\hat\nu - \hat{\hat\nu}\|^2 = \|U - \hat{\hat\nu}\|^2 - \|U - \hat\nu\|^2 = \sum_{i=1}^r U_i^2,$$
$$F = \frac{\|\hat\nu - \hat{\hat\nu}\|^2 / r}{\|U - \hat\nu\|^2 / (N - s)} = \frac{\sum_{i=1}^r U_i^2 / r}{\sum_{i=s+1}^N U_i^2 / (N - s)}.$$

Slide 39: LSE View of the Noncentrality Parameter

Using $U = \nu \in \Pi_\Omega$ in the $\Pi_\omega$-LSE derivation of $\nu_\omega$ we have
$$\|U - \nu_\omega\|^2 = \|\nu - \nu_\omega\|^2 = \sum_{i=1}^N (\nu_i - \nu_{\omega,i})^2 = \sum_{i=1}^r \nu_i^2 + \sum_{i=r+1}^s (\nu_i - \nu_{\omega,i})^2 + \sum_{i=s+1}^N \nu_i^2 \quad\text{since } \nu_\omega \in \Pi_\omega \subset \Pi_\Omega$$
$$= \sum_{i=1}^r \nu_i^2 + \sum_{i=r+1}^s (\nu_i - \nu_{\omega,i})^2 \quad\text{since } \nu \in \Pi_\Omega$$
$$= \sum_{i=1}^r \nu_i^2 \quad\text{after minimizing over } \Pi_\omega, \text{ i.e., } \nu_{\omega,i} = \nu_i,\ i = r+1,\dots,s$$
$$\Longrightarrow\quad \lambda = \|\nu - \nu_\omega\|^2/\sigma^2 = \sum_{i=1}^r \nu_i^2/\sigma^2.$$

Slide 40: Orthonormal Transform to the Special Case

The general case can always be orthonormally transformed to the special case. Let $C$ be an orthonormal matrix with rows $c_i'$, where $c_i$, $i = 1,\dots,s$, span $\Pi_\Omega$ and $c_i$, $i = r+1,\dots,s$, span $\Pi_\omega$. Such orthonormal basis vectors can always be constructed via the Gram-Schmidt process. Transform $U = CY$ with mean vector $\nu = C\mu$; $U_1,\dots,U_N$ are again independent with common variance $\sigma^2$. Now note that
$$\mu \in \Pi_\Omega \iff \mu \perp c_i, \text{ i.e., } \nu_i = c_i'\mu = 0, \text{ for } i = s+1,\dots,N \iff \nu \in \Pi_\Omega \text{ (special-case form)}$$
$$\mu \in \Pi_\omega \iff \mu \perp c_i, \text{ i.e., } \nu_i = c_i'\mu = 0, \text{ for } i = 1,\dots,r,\,s+1,\dots,N \iff \nu \in \Pi_\omega \text{ (special-case form)}$$
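The reduction to the special case can be illustrated numerically: completing an orthonormal basis of $\Pi_\Omega$ to a full orthonormal matrix $C$ (numpy's QR plays the role of Gram-Schmidt here) turns the LS distance to $\Pi_\Omega$ into the norm of the trailing coordinates of $U = CY$. The subspace and data below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
N, s = 6, 3

# An s-dimensional subspace Pi_Omega of R^N with orthonormal basis (via QR)
X, _ = np.linalg.qr(rng.normal(size=(N, s)))
Y = rng.normal(size=N)

proj = lambda v, B: B @ (B.T @ v)   # projection onto col(B), B orthonormal

# Complete X to a full orthonormal matrix; its transpose has rows c_i',
# the first s of which span Pi_Omega.
C = np.linalg.qr(np.column_stack([X, rng.normal(size=(N, N - s))]))[0].T
U = C @ Y                            # transformed data, nu = C mu

# mu in Pi_Omega <=> nu_{s+1} = ... = nu_N = 0, and LS distances are unchanged:
# the distance from Y to Pi_Omega equals the norm of the trailing coordinates of U.
print(np.linalg.norm(Y - proj(Y, X)), np.linalg.norm(U[s:]))
```

Both printed distances agree, which is exactly the invariance the next two slides use.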

Slide 41: Orthonormal Transform and Independence

Suppose $V_1,\dots,V_N$ are i.i.d. $N(0,1)$; then $Z = CV$ has components $Z_1,\dots,Z_N$ i.i.d. $N(0,1)$:
$$f(v) = \left(\frac{1}{\sqrt{2\pi}}\right)^N \exp\left(-\sum_{i=1}^N v_i^2/2\right) = \left(\frac{1}{\sqrt{2\pi}}\right)^N \exp\left(-\|v\|^2/2\right)$$
has constant density if and only if $\|v\|$ is constant, i.e., the constant-level density contour surfaces consist of points that are equidistant from the origin. Orthonormal transformations $z = Cv$ preserve distances ($\|v\| = \|z\|$), so the transformed vector $Z$ has the same density $f(z)$.

The general independence result may be seen as follows: $Y = \mu + \sigma V$ has independent components $Y_1,\dots,Y_N$, and $U = CY = C\mu + \sigma CV = \nu + \sigma Z$ has independent components.

Slide 42: Orthonormal Transform and LSE's

The principle of LSE's is based on minimizing distances, and orthonormal transforms don't change distances: with $b = Ca$,
$$\|b\|^2 = b'b = a'C'Ca = a'a = \|a\|^2.$$
Thus the LSE's (w.r.t. $\Pi_\Omega$ or $\Pi_\omega$) based on the untransformed $Y$ simply transform to the LSE's w.r.t. the transformed $\Pi_\Omega$ or $\Pi_\omega$, based on the transformed $U$. Numerator and denominator of $F$ involve LS distances, which are unchanged as we pass from $Y$ to $U = CY$. The same holds for the ncp, also characterized in terms of an LS distance$/\sigma^2$, i.e.,
$$\text{ncp} = \|\hat{\hat\mu}(\mu) - \mu\|^2/\sigma^2 = \|\hat{\hat\nu}(\nu) - \nu\|^2/\sigma^2,$$
based on $Y = \mu$ or $U = \nu$, respectively.


More information

Lec 5: Factorial Experiment

Lec 5: Factorial Experiment November 21, 2011 Example Study of the battery life vs the factors temperatures and types of material. A: Types of material, 3 levels. B: Temperatures, 3 levels. Example Study of the battery life vs the

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Two or more categorical predictors. 2.1 Two fixed effects

Two or more categorical predictors. 2.1 Two fixed effects Two or more categorical predictors Here we extend the ANOVA methods to handle multiple categorical predictors. The statistician has to watch carefully to see whether the effects being considered are properly

More information

STA 114: Statistics. Notes 21. Linear Regression

STA 114: Statistics. Notes 21. Linear Regression STA 114: Statistics Notes 1. Linear Regression Introduction A most widely used statistical analysis is regression, where one tries to explain a response variable Y by an explanatory variable X based on

More information

36-707: Regression Analysis Homework Solutions. Homework 3

36-707: Regression Analysis Homework Solutions. Homework 3 36-707: Regression Analysis Homework Solutions Homework 3 Fall 2012 Problem 1 Y i = βx i + ɛ i, i {1, 2,..., n}. (a) Find the LS estimator of β: RSS = Σ n i=1(y i βx i ) 2 RSS β = Σ n i=1( 2X i )(Y i βx

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Analysis of Covariance

Analysis of Covariance Analysis of Covariance (ANCOVA) Bruce A Craig Department of Statistics Purdue University STAT 514 Topic 10 1 When to Use ANCOVA In experiment, there is a nuisance factor x that is 1 Correlated with y 2

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

Unit 6: Orthogonal Designs Theory, Randomized Complete Block Designs, and Latin Squares

Unit 6: Orthogonal Designs Theory, Randomized Complete Block Designs, and Latin Squares Unit 6: Orthogonal Designs Theory, Randomized Complete Block Designs, and Latin Squares STA 643: Advanced Experimental Design Derek S. Young 1 Learning Objectives Understand the basics of orthogonal designs

More information

Chapter 4: Randomized Blocks and Latin Squares

Chapter 4: Randomized Blocks and Latin Squares Chapter 4: Randomized Blocks and Latin Squares 1 Design of Engineering Experiments The Blocking Principle Blocking and nuisance factors The randomized complete block design or the RCBD Extension of the

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Course topics (tentative) The role of random effects

Course topics (tentative) The role of random effects Course topics (tentative) random effects linear mixed models analysis of variance frequentist likelihood-based inference (MLE and REML) prediction Bayesian inference The role of random effects Rasmus Waagepetersen

More information

Math 261 Lecture Notes: Sections 6.1, 6.2, 6.3 and 6.4 Orthogonal Sets and Projections

Math 261 Lecture Notes: Sections 6.1, 6.2, 6.3 and 6.4 Orthogonal Sets and Projections Math 6 Lecture Notes: Sections 6., 6., 6. and 6. Orthogonal Sets and Projections We will not cover general inner product spaces. We will, however, focus on a particular inner product space the inner product

More information

Chapter 12. Analysis of variance

Chapter 12. Analysis of variance Serik Sagitov, Chalmers and GU, January 9, 016 Chapter 1. Analysis of variance Chapter 11: I = samples independent samples paired samples Chapter 1: I 3 samples of equal size J one-way layout two-way layout

More information

STATISTICS 368 AN EXPERIMENT IN AIRCRAFT PRODUCTION Christopher Wiens & Douglas Wiens

STATISTICS 368 AN EXPERIMENT IN AIRCRAFT PRODUCTION Christopher Wiens & Douglas Wiens STATISTICS 368 AN EXPERIMENT IN AIRCRAFT PRODUCTION Christopher Wiens & Douglas Wiens April 21, 2005 The progress of science. 1. Preliminary description of experiment We set out to determine the factors

More information

3. Design Experiments and Variance Analysis

3. Design Experiments and Variance Analysis 3. Design Experiments and Variance Analysis Isabel M. Rodrigues 1 / 46 3.1. Completely randomized experiment. Experimentation allows an investigator to find out what happens to the output variables when

More information

Example 1 describes the results from analyzing these data for three groups and two variables contained in test file manova1.tf3.

Example 1 describes the results from analyzing these data for three groups and two variables contained in test file manova1.tf3. Simfit Tutorials and worked examples for simulation, curve fitting, statistical analysis, and plotting. http://www.simfit.org.uk MANOVA examples From the main SimFIT menu choose [Statistcs], [Multivariate],

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

CHAPTER 4 Analysis of Variance. One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication

CHAPTER 4 Analysis of Variance. One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication CHAPTER 4 Analysis of Variance One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication 1 Introduction In this chapter, expand the idea of hypothesis tests. We

More information

General Linear Model: Statistical Inference

General Linear Model: Statistical Inference Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least

More information

Two-Way Analysis of Variance - no interaction

Two-Way Analysis of Variance - no interaction 1 Two-Way Analysis of Variance - no interaction Example: Tests were conducted to assess the effects of two factors, engine type, and propellant type, on propellant burn rate in fired missiles. Three engine

More information

Mathematics Qualifying Examination January 2015 STAT Mathematical Statistics

Mathematics Qualifying Examination January 2015 STAT Mathematical Statistics Mathematics Qualifying Examination January 2015 STAT 52800 - Mathematical Statistics NOTE: Answer all questions completely and justify your derivations and steps. A calculator and statistical tables (normal,

More information

1 Use of indicator random variables. (Chapter 8)

1 Use of indicator random variables. (Chapter 8) 1 Use of indicator random variables. (Chapter 8) let I(A) = 1 if the event A occurs, and I(A) = 0 otherwise. I(A) is referred to as the indicator of the event A. The notation I A is often used. 1 2 Fitting

More information

Stat 217 Final Exam. Name: May 1, 2002

Stat 217 Final Exam. Name: May 1, 2002 Stat 217 Final Exam Name: May 1, 2002 Problem 1. Three brands of batteries are under study. It is suspected that the lives (in weeks) of the three brands are different. Five batteries of each brand are

More information

1 One-way Analysis of Variance

1 One-way Analysis of Variance 1 One-way Analysis of Variance Suppose that a random sample of q individuals receives treatment T i, i = 1,,... p. Let Y ij be the response from the jth individual to be treated with the ith treatment

More information

Multi-Factor Experiments

Multi-Factor Experiments ETH p. 1/31 Multi-Factor Experiments Twoway anova Interactions More than two factors ETH p. 2/31 Hypertension: Effect of biofeedback Biofeedback Biofeedback Medication Control + Medication 158 188 186

More information

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow) STAT40 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 018 3:00 4:15p SOLUTIONS (Yellow) Question 1 (15 points) (10 points) 3 (50 points) extra ( points) Total (77 points) Points

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Written Exam (2 hours)

Written Exam (2 hours) M. Müller Applied Analysis of Variance and Experimental Design Summer 2015 Written Exam (2 hours) General remarks: Open book exam. Switch off your mobile phone! Do not stay too long on a part where you

More information

STAT 705 Chapter 16: One-way ANOVA

STAT 705 Chapter 16: One-way ANOVA STAT 705 Chapter 16: One-way ANOVA Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 21 What is ANOVA? Analysis of variance (ANOVA) models are regression

More information

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 Time allowed: 3 HOURS. STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator.

More information

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018 Mathematics Ph.D. Qualifying Examination Stat 52800 Probability, January 2018 NOTE: Answers all questions completely. Justify every step. Time allowed: 3 hours. 1. Let X 1,..., X n be a random sample from

More information

COMPLETELY RANDOMIZED DESIGNS (CRD) For now, t unstructured treatments (e.g. no factorial structure)

COMPLETELY RANDOMIZED DESIGNS (CRD) For now, t unstructured treatments (e.g. no factorial structure) STAT 52 Completely Randomized Designs COMPLETELY RANDOMIZED DESIGNS (CRD) For now, t unstructured treatments (e.g. no factorial structure) Completely randomized means no restrictions on the randomization

More information

Statistics - Lecture Three. Linear Models. Charlotte Wickham 1.

Statistics - Lecture Three. Linear Models. Charlotte Wickham   1. Statistics - Lecture Three Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Linear Models 1. The Theory 2. Practical Use 3. How to do it in R 4. An example 5. Extensions

More information

Analysis of Variance. Read Chapter 14 and Sections to review one-way ANOVA.

Analysis of Variance. Read Chapter 14 and Sections to review one-way ANOVA. Analysis of Variance Read Chapter 14 and Sections 15.1-15.2 to review one-way ANOVA. Design of an experiment the process of planning an experiment to insure that an appropriate analysis is possible. Some

More information

Chapter 9: Hypothesis Testing Sections

Chapter 9: Hypothesis Testing Sections Chapter 9: Hypothesis Testing Sections 9.1 Problems of Testing Hypotheses 9.2 Testing Simple Hypotheses 9.3 Uniformly Most Powerful Tests Skip: 9.4 Two-Sided Alternatives 9.6 Comparing the Means of Two

More information

Orthogonal contrasts for a 2x2 factorial design Example p130

Orthogonal contrasts for a 2x2 factorial design Example p130 Week 9: Orthogonal comparisons for a 2x2 factorial design. The general two-factor factorial arrangement. Interaction and additivity. ANOVA summary table, tests, CIs. Planned/post-hoc comparisons for the

More information

Some General Types of Tests

Some General Types of Tests Some General Types of Tests We may not be able to find a UMP or UMPU test in a given situation. In that case, we may use test of some general class of tests that often have good asymptotic properties.

More information

Orthogonal Complements

Orthogonal Complements Orthogonal Complements Definition Let W be a subspace of R n. If a vector z is orthogonal to every vector in W, then z is said to be orthogonal to W. The set of all such vectors z is called the orthogonal

More information

Multiple Regression: Example

Multiple Regression: Example Multiple Regression: Example Cobb-Douglas Production Function The Cobb-Douglas production function for observed economic data i = 1,..., n may be expressed as where O i is output l i is labour input c

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

MODELS WITHOUT AN INTERCEPT

MODELS WITHOUT AN INTERCEPT Consider the balanced two factor design MODELS WITHOUT AN INTERCEPT Factor A 3 levels, indexed j 0, 1, 2; Factor B 5 levels, indexed l 0, 1, 2, 3, 4; n jl 4 replicate observations for each factor level

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Cuckoo Birds. Analysis of Variance. Display of Cuckoo Bird Egg Lengths

Cuckoo Birds. Analysis of Variance. Display of Cuckoo Bird Egg Lengths Cuckoo Birds Analysis of Variance Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 29th November 2005 Cuckoo birds have a behavior in which they lay their

More information

Analysis of variance using orthogonal projections

Analysis of variance using orthogonal projections Analysis of variance using orthogonal projections Rasmus Waagepetersen Abstract The purpose of this note is to show how statistical theory for inference in balanced ANOVA models can be conveniently developed

More information

Example - Alfalfa (11.6.1) Lecture 14 - ANOVA cont. Alfalfa Hypotheses. Treatment Effect

Example - Alfalfa (11.6.1) Lecture 14 - ANOVA cont. Alfalfa Hypotheses. Treatment Effect (11.6.1) Lecture 14 - ANOVA cont. Sta102 / BME102 Colin Rundel March 19, 2014 Researchers were interested in the effect that acid has on the growth rate of alfalfa plants. They created three treatment

More information

PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design

PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design The purpose of this experiment was to determine differences in alkaloid concentration of tea leaves, based on herb variety (Factor A)

More information

11 Hypothesis Testing

11 Hypothesis Testing 28 11 Hypothesis Testing 111 Introduction Suppose we want to test the hypothesis: H : A q p β p 1 q 1 In terms of the rows of A this can be written as a 1 a q β, ie a i β for each row of A (here a i denotes

More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information