THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2012, Mr. Ruey S. Tsay


Lecture 3: Comparisons between several multivariate means

Key concepts: 1. Paired comparison & repeated measures 2. Comparing means of two populations 3. Comparing means of several populations: one-way multivariate analysis of variance 4. Testing for equality of covariance matrices 5. Two-way multivariate analysis of variance 6. Profile analysis 7. Growth curves

Key assumption: Normality or large sample sizes.

1 Paired comparison

A procedure to eliminate the influence of extraneous unit-to-unit variation. Measurements are taken on the same or identical units under different treatments. Recall the univariate paired t-test. Let $X_{j1}$ and $X_{j2}$ be the responses of unit $j$ to treatments 1 and 2, respectively, where $j = 1, \ldots, n$. Let $D_j = X_{j1} - X_{j2}$ be the difference between the treatments; the extraneous variation cancels because both responses come from the same or identical unit. Assume $D_j \sim N(\delta, \sigma_d^2)$. Consider the testing problem $H_o: \delta = 0$ versus $H_a: \delta \neq 0$. The paired t-test is
$t = \dfrac{\bar D - \delta}{s_d/\sqrt{n}}$, where $\bar D = \frac{1}{n}\sum_{j=1}^n D_j$ and $s_d^2 = \frac{1}{n-1}\sum_{j=1}^n (D_j - \bar D)^2$.
One rejects $H_o$ if and only if $|t| \geq t_{n-1}(\alpha/2)$. The corresponding $100(1-\alpha)\%$ confidence interval for the mean difference $\delta = E(X_{j1} - X_{j2})$ is
$\bar D - t_{n-1}(\alpha/2)\, s_d/\sqrt{n} \;\leq\; \delta \;\leq\; \bar D + t_{n-1}(\alpha/2)\, s_d/\sqrt{n}$.

Generalization: Suppose that $p$ measurements are taken from each unit, so the responses are $X_{1ji}$ and $X_{2ji}$, where $X_{1ji}$ is the measurement of the $i$th variable on the $j$th unit under treatment 1, and $X_{2ji}$ is that under treatment 2. The differences are $D_{ji} = X_{1ji} - X_{2ji}$ and $D_j = (D_{j1}, \ldots, D_{jp})'$. Assume that $E(D_j) = \delta = (\delta_1, \ldots, \delta_p)'$ and $\mathrm{cov}(D_j) = \Sigma_d$. If we further assume $D_j \sim N_p(\delta, \Sigma_d)$, then we can make inference about $\delta$ using Hotelling's $T^2$ statistic
$T^2 = n(\bar D - \delta)' S_d^{-1} (\bar D - \delta)$,
where $\bar D = \frac{1}{n}\sum_{j=1}^n D_j$ and $S_d = \frac{1}{n-1}\sum_{j=1}^n (D_j - \bar D)(D_j - \bar D)'$.

Result 6.1. Let the differences $D_1, \ldots, D_n$ be a random sample from an $N_p(\delta, \Sigma_d)$ population. Then $T^2 = n(\bar D - \delta)' S_d^{-1} (\bar D - \delta)$ is distributed as an $[(n-1)p/(n-p)]F_{p,n-p}$ random variable. If $n$ and $n-p$ are both large, $T^2$ is approximately distributed as a $\chi^2_p$ random variable.

Inference: Suppose the observed sample consists of $d_1, \ldots, d_n$ and the population is $N_p(\delta, \Sigma_d)$. Then reject $H_o: \delta = 0$ in favor of $H_a: \delta \neq 0$ if
$T^2 = n \bar d'\, S_d^{-1} \bar d > \dfrac{(n-1)p}{n-p} F_{p,n-p}(\alpha)$.
A $100(1-\alpha)\%$ confidence region for $\delta$ is
$(\bar d - \delta)' S_d^{-1} (\bar d - \delta) \leq \dfrac{(n-1)p}{n(n-p)} F_{p,n-p}(\alpha)$.
The $100(1-\alpha)\%$ simultaneous confidence intervals for the individual mean differences $\delta_i$ are
$\bar d_i \pm \sqrt{\dfrac{(n-1)p}{n-p} F_{p,n-p}(\alpha)} \sqrt{\dfrac{s_{d_i}^2}{n}}$,
where $\bar d_i$ is the $i$th element of $\bar d$ and $s_{d_i}^2$ is the $(i,i)$th element of $S_d$. The Bonferroni $100(1-\alpha)\%$ simultaneous confidence intervals for the individual mean differences are
$\bar d_i \pm t_{n-1}(\alpha/(2p)) \sqrt{s_{d_i}^2/n}$.
Finally, if $n$ and $n-p$ are sufficiently large, the normality assumption can be dropped, and the simultaneous C.I.s can be obtained by replacing $[(n-1)p/(n-p)]F_{p,n-p}(\alpha)$ by $\chi^2_p(\alpha)$.

Remark: The R programs of Chapter 5 can be used to perform paired comparison.

Example: Consider the data in Table 6.1 of the text.
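The paired-comparison $T^2$ test above is easy to compute directly. Below is a minimal sketch in Python (the course scripts are in R; this is only an illustrative translation, and the function name `hotelling_paired` is my own):

```python
import numpy as np
from scipy import stats

def hotelling_paired(x1, x2):
    """Hotelling T^2 test of H_o: E(D) = 0 for paired p-variate responses.

    Returns (T2, p_value) using the exact [(n-1)p/(n-p)] F_{p,n-p} calibration.
    """
    d = np.asarray(x1, float) - np.asarray(x2, float)   # differences D_j
    n, p = d.shape
    dbar = d.mean(axis=0)                               # mean difference vector
    S = np.cov(d, rowvar=False)                         # S_d with (n-1) divisor
    T2 = n * dbar @ np.linalg.solve(S, dbar)
    F = (n - p) / ((n - 1) * p) * T2                    # ~ F_{p, n-p} under H_o
    return T2, stats.f.sf(F, p, n - p)
```

With large $T^2$ the p-value is small and $H_o: \delta = 0$ is rejected, exactly as in the T6-1 analysis below.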

> x=read.table("t6-1.dat")
> d1=x[,1]-x[,3]
> d2=x[,2]-x[,4]
> d=cbind(d1,d2)
> source("Hotelling.R")
> Hotelling(d,rep(0,2))
            [,1]
Hotelling-T
p.value
> source("cregion.r")
> confreg(d)
[1] "C.R. based on T^2"
     [,1] [,2]
[1,]
[2,]
[1] "CR based on individual t"
     [,1] [,2]
[1,]
[2,]
[1] "CR based on Bonferroni"
     [,1] [,2]
[1,]
[2,]
[1] "Asymp. simu. CR"
     [,1] [,2]
[1,]
[2,]

Contrast matrix. Definition: A $p$-dimensional vector is called a contrast vector if its elements sum to zero; by definition, contrast vectors are orthogonal to the vector of ones. An $m \times k$ matrix is called a contrast matrix if all its rows are contrast vectors. For example, $c = (1, 0, -1, 0)'$ is a contrast vector.

The above paired comparisons can be achieved by using a contrast matrix. For example, consider the effluent data of Example 6.1. Instead of computing the differenced data, we can use the observations in Table 6.1 directly. The observation for Sample 1 is $x_1 = (6, 27, 25, 15)'$. Construct the contrast matrix
$C = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}$. (1)

Clearly, the differenced data are $d_j = C x_j$ for $j = 1, \ldots, n$. Furthermore, $\bar d = C\bar x$ and $S_d = CSC'$, where $S$ is the sample covariance matrix of the data. The $T^2$ statistic then becomes
$T^2 = n \bar x' C' (CSC')^{-1} C \bar x$.
Consequently, there is no need to compute the differenced data at all.

This idea is particularly useful in analyzing repeated measures, in which different treatments are applied to each unit once over successive periods of time. Suppose there are $q$ treatments; then the observation for the $j$th unit is $X_j = (X_{j1}, X_{j2}, \ldots, X_{jq})'$, $j = 1, \ldots, n$. Let $\mu = E(X_j)$. Testing the hypothesis that all treatments have the same effect is equivalent to testing that all elements of $\mu$ are equal. To this end, we can construct a contrast matrix $C_1$ via
$(\mu_1 - \mu_2,\; \mu_1 - \mu_3,\; \ldots,\; \mu_1 - \mu_q)' = C_1 \mu$,
or $C_2$ via
$(\mu_1 - \mu_2,\; \mu_2 - \mu_3,\; \ldots,\; \mu_{q-1} - \mu_q)' = C_2 \mu$.
The problem then is to test $C_1\mu = 0$ or $C_2\mu = 0$. [Other contrast matrices are available.] This results in the $T^2$ test statistic
$T^2 = n(C\bar X)'(CSC')^{-1} C\bar X$.

Remark. The $T^2$ statistic does not depend on the choice of contrast matrix $C$. This is because $\mathrm{rank}(C_1) = \mathrm{rank}(C_2) = q-1$ and each row of $C_i$ is orthogonal to the vector of ones. Consequently, the rows of $C_1$ and the rows of $C_2$ span the same $(q-1)$-dimensional subspace orthogonal to $\mathbf{1}_q$. Thus there exists a non-singular $(q-1) \times (q-1)$ matrix $B$ such that $C_1 = B C_2$. For instance, for the $C_1$ and $C_2$ matrices given above, $B$ is the lower-triangular matrix of ones, since $\mu_1 - \mu_{j+1} = (\mu_1 - \mu_2) + (\mu_2 - \mu_3) + \cdots + (\mu_j - \mu_{j+1})$. It is then easy to show that $C_1$ and $C_2$ give the same $T^2$ statistic.
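The invariance of $T^2$ to the choice of contrast matrix can be checked numerically. A small sketch (Python illustration with made-up data; `repeated_T2` is my own helper, not a course script):

```python
import numpy as np

def repeated_T2(X, C):
    """T^2 = n (C xbar)' (C S C')^{-1} (C xbar) for repeated measures."""
    X = np.asarray(X, float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)       # sample covariance matrix
    v = C @ xbar
    return n * v @ np.linalg.solve(C @ S @ C.T, v)

X = [[1, 2, 3], [2, 1, 4], [3, 5, 2], [4, 3, 6], [0, 2, 1]]  # toy data, q = 3
C1 = np.array([[1., -1., 0.], [1., 0., -1.]])   # rows: mu1-mu2, mu1-mu3
C2 = np.array([[1., -1., 0.], [0., 1., -1.]])   # rows: successive differences
```

Evaluating `repeated_T2(X, C1)` and `repeated_T2(X, C2)` gives the same value, illustrating the remark above.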

Based on the prior discussion, we can test the equality of treatment effects in a repeated-measures setting by using the result below. Consider an $N_q(\mu, \Sigma)$ population and let $C$ be a contrast matrix. An $\alpha$-level test of $H_o: C\mu = 0$ versus $H_a: C\mu \neq 0$ rejects $H_o$ if
$T^2 = n(C\bar x)'(CSC')^{-1} C\bar x > \dfrac{(n-1)(q-1)}{n-q+1} F_{q-1,n-q+1}(\alpha)$,
where $\bar x$ and $S$ are the sample mean vector and covariance matrix. A confidence region for the contrasts $C\mu$ is
$n(C\bar x - C\mu)'(CSC')^{-1}(C\bar x - C\mu) \leq \dfrac{(n-1)(q-1)}{n-q+1} F_{q-1,n-q+1}(\alpha)$.
Consequently, simultaneous $100(1-\alpha)\%$ confidence intervals for single contrasts $c'\mu$, for any contrast vectors $c$ of interest, are
$c'\bar x \pm \sqrt{\dfrac{(n-1)(q-1)}{n-q+1} F_{q-1,n-q+1}(\alpha)} \sqrt{\dfrac{c'Sc}{n}}$.

Example. Consider the sleeping-dog data in Table 6.2. There are 19 observations and four treatments. To analyze the data, an R program called contrast.r is developed. The analysis is given below:
> x=read.table("t6-2.dat")
> dim(x)
[1] 19 4
> x
   V1 V2 V3 V4

> source("contrast.r")
> cmtx=matrix(c(-1,1,1,-1,-1,-1,1,1,-1,1,-1,1),3,4)
> cmtx
     [,1] [,2] [,3] [,4]
[1,]   -1   -1    1    1
[2,]    1   -1    1   -1
[3,]    1   -1   -1    1
> contrast(x,cmtx)
[1] "Hotelling Tsq statistics & p-value"
[1] e e-07
[1] "Simultaneous C.I. for each contrast"
     [,1] [,2]
[1,]
[2,]
[3,]

2 Comparing mean vectors of two populations

The setup:
1. $X_{11}, X_{12}, \ldots, X_{1,n_1}$ is a $p$-dimensional random sample of size $n_1$ from a population with mean $\mu_1$ and covariance matrix $\Sigma_1$.
2. $X_{21}, X_{22}, \ldots, X_{2,n_2}$ is a $p$-dimensional random sample of size $n_2$ from a population with mean $\mu_2$ and covariance matrix $\Sigma_2$.
3. The two random samples are independent.

If $n_1$ and $n_2$ are small, some additional assumptions are needed: 1. both populations are normal, and 2. $\Sigma_1 = \Sigma_2 = \Sigma$.

Problem of interest: $H_o: \mu_1 - \mu_2 = \delta_0$ versus $H_a: \mu_1 - \mu_2 \neq \delta_0$. Denote the sample means and covariances of the two random samples by $\bar x_1, S_1$ and $\bar x_2, S_2$, respectively. Under the assumption that $\Sigma_1 = \Sigma_2$, we can obtain a pooled estimate of the covariance matrix
$S_{pool} = \dfrac{n_1-1}{n_1+n_2-2} S_1 + \dfrac{n_2-1}{n_1+n_2-2} S_2$.

This pooled estimate is unbiased, as $E(S_{pool}) = \Sigma$. Note that $E(\bar X_1 - \bar X_2) = \mu_1 - \mu_2$ and
$\mathrm{Cov}(\bar X_1 - \bar X_2) = \mathrm{Cov}(\bar X_1) + \mathrm{Cov}(\bar X_2) = \frac{1}{n_1}\Sigma + \frac{1}{n_2}\Sigma$,
which can be estimated by $(\frac{1}{n_1} + \frac{1}{n_2}) S_{pool}$. The following result holds.

Result 6.2. If $X_{11}, \ldots, X_{1,n_1}$ form a random sample from $N_p(\mu_1, \Sigma)$, $X_{21}, \ldots, X_{2,n_2}$ form a random sample from $N_p(\mu_2, \Sigma)$, and the two random samples are independent, then
$T^2 = [\bar X_1 - \bar X_2 - (\mu_1-\mu_2)]' \left[\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) S_{pool}\right]^{-1} [\bar X_1 - \bar X_2 - (\mu_1-\mu_2)]$
is distributed as $\dfrac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,n_1+n_2-p-1}$. Consequently, $\Pr(T^2 \leq c^2) = 1-\alpha$, where
$c^2 = \dfrac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,n_1+n_2-p-1}(\alpha)$.

Proof: (1) $\bar X_1 - \bar X_2 \sim N_p(\mu_1-\mu_2, [(1/n_1)+(1/n_2)]\Sigma)$; (2) $(n_1-1)S_1 \sim W_{n_1-1}(\Sigma)$ and $(n_2-1)S_2 \sim W_{n_2-1}(\Sigma)$; and (3) $(n_1-1)S_1$ and $(n_2-1)S_2$ are independent, so that $(n_1-1)S_1 + (n_2-1)S_2 \sim W_{n_1+n_2-2}(\Sigma)$.

Result 6.3. Let $c^2 = \dfrac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,n_1+n_2-p-1}(\alpha)$. With probability $1-\alpha$,
$a'(\bar X_1 - \bar X_2) \pm c\sqrt{a'\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) S_{pool}\, a}$
will cover $a'(\mu_1-\mu_2)$ for all $a$. Thus, the simultaneous confidence intervals for $\mu_{1i} - \mu_{2i}$ are
$(\bar X_{1i} - \bar X_{2i}) \pm c\sqrt{\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) s_{ii,pool}}$, $\quad i = 1, \ldots, p$,
where $s_{ii,pool}$ denotes the $(i,i)$th element of the matrix $S_{pool}$. The Bonferroni $100(1-\alpha)\%$ simultaneous C.I.s for $\mu_{1i}-\mu_{2i}$ are
$(\bar x_{1i} - \bar x_{2i}) \pm t_{n_1+n_2-2}(\alpha/(2p)) \sqrt{\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) s_{ii,pool}}$.

Case: $\Sigma_1 \neq \Sigma_2$. In this case there is no pooling in covariance matrix estimation, and we typically require that $n_1 - p$ and $n_2 - p$ be sufficiently large. One can then replace $(\frac{1}{n_1}+\frac{1}{n_2}) S_{pool}$ by $\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2$ and the $F$-distribution by the $\chi^2_p$ distribution.
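The pooled two-sample $T^2$ of Result 6.2 can be sketched as follows (Python illustration; the function name is my own):

```python
import numpy as np
from scipy import stats

def two_sample_T2(X1, X2):
    """Pooled-covariance Hotelling T^2 for H_o: mu_1 - mu_2 = 0 (Result 6.2)."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    S_pool = ((n1 - 1) * np.cov(X1, rowvar=False)
              + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    T2 = diff @ np.linalg.solve((1 / n1 + 1 / n2) * S_pool, diff)
    F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2   # ~ F_{p, n1+n2-p-1}
    return T2, stats.f.sf(F, p, n1 + n2 - p - 1)
```

As a sanity check, feeding the same sample in as both groups gives $T^2 = 0$ and a p-value of 1.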

When $n_1 = n_2 = n$,
$\frac{1}{n}S_1 + \frac{1}{n}S_2 = \frac{1}{n}(S_1 + S_2) = \left(\frac{1}{n}+\frac{1}{n}\right)\frac{S_1+S_2}{2} = \left(\frac{1}{n}+\frac{1}{n}\right)\left[\frac{(n-1)S_1}{(n-1)+(n-1)} + \frac{(n-1)S_2}{(n-1)+(n-1)}\right] = \left(\frac{1}{n}+\frac{1}{n}\right) S_{pool}$.
Thus, when $n_1 = n_2$, the large-sample procedure is essentially the same as the one using the pooled covariance matrix. The impact of unequal covariance matrices is therefore least when the sample sizes are equal; it would be greater if either $n_1 \gg n_2$ or $n_2 \gg n_1$.

The Behrens-Fisher problem. Test $H_o: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$, where the two populations are normally distributed but have different covariance matrices, and the sample sizes are not large. [Of course, $n_1 > p$ and $n_2 > p$ are needed in order to estimate the covariance matrices.] The key issue is the distribution of
$T^2 = [\bar X_1 - \bar X_2 - (\mu_1-\mu_2)]' \left[\tfrac{1}{n_1}S_1 + \tfrac{1}{n_2}S_2\right]^{-1} [\bar X_1 - \bar X_2 - (\mu_1-\mu_2)]$
when $n_1 - p$ and $n_2 - p$ are small. This problem has been widely studied in the literature; see, for instance, Krishnamoorthy and Yu (2004, Statistics & Probability Letters) and Nel and Van der Merwe (1986, Communications in Statistics - Theory and Methods). A recommended method is to approximate the distribution of $T^2$ by
$T^2 \sim \dfrac{vp}{v-p+1} F_{p,v-p+1}$,
where
$v = \dfrac{p + p^2}{\sum_{i=1}^2 \frac{1}{n_i}\left\{ \mathrm{tr}\left[\left(\frac{1}{n_i}S_i \left(\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right)^{-1}\right)^2\right] + \left(\mathrm{tr}\left[\frac{1}{n_i}S_i \left(\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right)^{-1}\right]\right)^2 \right\}}$,
and $\min(n_1, n_2) \leq v \leq n_1 + n_2$.

Remark: An R script Behrens.R is available to perform the test. See the course web. For illustration, consider the effluent data in Table 6.1. The paired comparison rejects the null hypothesis of equal means. The result of using the Behrens-Fisher approach is given below.
> x=read.table("t6-1.dat")

> dim(x)
[1] 11 4
> x1=x[,1:2]
> x2=x[,3:4]
> source("behrens.r")
> Behrens(x1,x2)
[1] "Estimate of v: "
[1]
[1] "Test result:"
        [,1]
Test-T
p.value

It also rejects the null hypothesis.

3 Comparing mean vectors of several populations

Setup: $g$ populations, with $n_l$ observations from population $l$.
1. $\{X_{l,1}, X_{l,2}, \ldots, X_{l,n_l}\}$ is a random sample of size $n_l$ from a population with mean $\mu_l$, where $l = 1, \ldots, g$. The random samples from different populations are independent.
2. All populations have a common covariance matrix $\Sigma$, which is positive definite.
3. Each population is multivariate normal with dimension $p$.
The normality assumption in Condition 3 can be relaxed when the sample sizes are sufficiently large.

Hypothesis of interest: $H_o: \mu_1 = \mu_2 = \cdots = \mu_g$ versus $H_a: \mu_i \neq \mu_j$ for some $1 \leq i, j \leq g$ with $i \neq j$.

Univariate case: Recall the case of $p = 1$. The null hypothesis $\mu_1 = \mu_2 = \cdots = \mu_g$ can be written as $\tau_1 = \tau_2 = \cdots = \tau_g = 0$, where $\tau_l$ is the deviation of $\mu_l$ from the overall mean $\mu$, i.e., $\mu_l = \mu + \tau_l$. The model can then be written as
$X_{l,j} = \mu + \tau_l + e_{l,j}$, $\quad l = 1, \ldots, g$; $j = 1, \ldots, n_l$,
where $e_{l,j} \sim N(0, \sigma^2)$. For unique identification of the parameters, it is commonly assumed that $\sum_{l=1}^g n_l \tau_l = 0$. For the data, an analogous decomposition is
$x_{l,j} = \bar x + (\bar x_l - \bar x) + (x_{l,j} - \bar x_l)$,

where $\bar x = (\sum_{l=1}^g \sum_{j=1}^{n_l} x_{l,j})/n$ with $n = \sum_{l=1}^g n_l$, and $\bar x_l = (\sum_{j=1}^{n_l} x_{l,j})/n_l$. Here $\bar x$ is an estimate of $\mu$, $(\bar x_l - \bar x)$ is an estimate of $\tau_l$, and $(x_{l,j} - \bar x_l)$ is an estimate of the error term $e_{l,j}$. Subtracting $\bar x$ from the prior equation, squaring, and summing, we have the identity
$\sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar x)^2 = \sum_{l=1}^g n_l (\bar x_l - \bar x)^2 + \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar x_l)^2$. (2)
The cross-product term drops out because it is zero, indicating that the terms are orthogonal to each other. This identity is often thought of as
(Sum of Squares of Total Variation) = (Sum of Squares of Treatments) + (Sum of Squares of Residuals).
In addition, the numbers of independent quantities in the terms of the identity are related by
$\sum_{l=1}^g n_l - 1 = (g-1) + \sum_{l=1}^g (n_l - 1)$.
These are known as the degrees of freedom of the terms. The univariate analysis of variance (ANOVA) table summarizes the above results.

Source of variation   Sum of Squares                                                   Degrees of freedom
Treatments            $SS_{tr} = \sum_{l=1}^g n_l(\bar x_l - \bar x)^2$                $g-1$
Residuals             $SS_{res} = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar x_l)^2$   $\sum_{l=1}^g n_l - g$
Total                 $SS_{tot} = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar x)^2$     $\sum_{l=1}^g n_l - 1$

The usual $F$-test rejects the null hypothesis $H_o: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ at the $\alpha$ level if
$F = \dfrac{SS_{tr}/(g-1)}{SS_{res}/(\sum_l n_l - g)} > F_{g-1,\, \sum_l n_l - g}(\alpha)$.
The rationale for the $F$-test is as follows. $\bar x_l$ is an estimate of $\mu_l$, so the numerator is a weighted measure of the variation of the $\bar x_l$ between the $g$ populations, where the weights depend on the sample size of each population. The issue then is to judge the magnitude of this variation. The denominator provides a reference measure of variation because it is an estimate of the random variation (i.e., $\sigma^2$) of the data. If the variation between the populations is large relative to the random noise, then the means are declared different.

Remark: The R command for univariate analysis of variance is aov. For illustration, consider the data in Example 6.7 of the text. The R analysis corresponding to that of Example 6.8 is as follows.

> x=c(1,1,1,2,2,3,3,3)
> y=c(9,6,9,0,2,3,1,2)
> g1=factor(x)
> g1
[1] 1 1 1 2 2 3 3 3
Levels: 1 2 3
> help(aov)
> m1=aov(y~g1)
> m1
Call:
   aov(formula = y ~ g1)

Terms:
                g1 Residuals
Sum of Squares  78        10
Deg. of Freedom  2         5

Residual standard error: 1.414214
Estimated effects may be unbalanced
> summary(m1)
            Df Sum Sq Mean Sq F value Pr(>F)
g1           2     78    39.0    19.5     **
Residuals    5     10     2.0

Multivariate case. When $p > 1$, the model becomes
$X_{l,j} = \mu + \tau_l + e_{l,j}$, $\quad j = 1, \ldots, n_l$; $l = 1, \ldots, g$,
where $e_{l,j} \sim N_p(0, \Sigma)$. As before, $\mu$ is the overall mean vector, and $\tau_l$ denotes the $l$th treatment effect, satisfying $\sum_{l=1}^g n_l \tau_l = 0$. The data can be decomposed as
$x_{l,j} = \bar x + (\bar x_l - \bar x) + (x_{l,j} - \bar x_l)$.
Subtracting $\bar x$ from the prior equation, post-multiplying each term by its own transpose, and summing, we obtain
$\sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j}-\bar x)(x_{l,j}-\bar x)' = \sum_{l=1}^g n_l (\bar x_l - \bar x)(\bar x_l - \bar x)' + \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j}-\bar x_l)(x_{l,j}-\bar x_l)'$,
where, as in the univariate case, the cross-product term sums to zero. For ease of notation, we define
$W = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j}-\bar x_l)(x_{l,j}-\bar x_l)' = (n_1-1)S_1 + (n_2-1)S_2 + \cdots + (n_g-1)S_g$,

to represent the within-population sum of squares and cross-products matrix, and
$B = \sum_{l=1}^g n_l (\bar x_l - \bar x)(\bar x_l - \bar x)'$
to denote the between-population sum of squares and cross-products matrix. The hypothesis of no treatment effects, $H_o: \tau_1 = \tau_2 = \cdots = \tau_g = 0$, is tested by considering the relative sizes of the treatment and residual sums of squares and cross-products. The multivariate analysis of variance (MANOVA) table is given by

Source of variation   Matrix of sum of squares and cross-products                              Degrees of freedom
Treatment             $B = \sum_{l=1}^g n_l(\bar x_l - \bar x)(\bar x_l - \bar x)'$            $g-1$
Residuals             $W = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j}-\bar x_l)(x_{l,j}-\bar x_l)'$   $\sum_l n_l - g$
Total                 $B+W = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j}-\bar x)(x_{l,j}-\bar x)'$     $\sum_l n_l - 1$

The test then involves generalized variances, i.e., determinants of the sums-of-squares-and-cross-products matrices. Specifically, one rejects $H_o$ if
$\Lambda^* = \dfrac{|W|}{|B+W|} = \dfrac{\left|\sum_l \sum_j (x_{l,j}-\bar x_l)(x_{l,j}-\bar x_l)'\right|}{\left|\sum_l \sum_j (x_{l,j}-\bar x)(x_{l,j}-\bar x)'\right|}$
is too small. This test statistic was proposed by Wilks and is commonly referred to as Wilks' lambda. The distribution of $\Lambda^*$ is given in Table 6.3 of the text for some special cases (p. 303). For other cases and large sample sizes, a modification of $\Lambda^*$ due to Bartlett (1938) can be used. Specifically, if $H_o$ is true and $n = \sum_l n_l$ is large,
$-\left(n - 1 - \dfrac{p+g}{2}\right) \ln(\Lambda^*) = -\left(n - 1 - \dfrac{p+g}{2}\right) \ln\dfrac{|W|}{|B+W|}$
has approximately a chi-square distribution with $p(g-1)$ degrees of freedom.

Remark. The R command for multivariate analysis of variance is manova. Below are some examples.
> help(manova)
> help(summary.manova)
** Example 6.9 of the text, pages 304-305 **
> x=matrix(c(1,1,1,2,2,3,3,3,9,6,9,0,2,3,1,2,3,2,7,4,0,8,9,7),8,3)

> x
     [,1] [,2] [,3]
[1,]    1    9    3
[2,]    1    6    2
[3,]    1    9    7
[4,]    2    0    4
[5,]    2    2    0
[6,]    3    3    8
[7,]    3    1    9
[8,]    3    2    7
> fac1=factor(x[,1])
> xx=x[,2:3]
> m2=manova(xx~fac1)
> m2
Call:
   manova(xx ~ fac1)

Terms:
                fac1 Residuals
resp 1
resp 2
Deg. of Freedom    2         5

Residual standard error:
Estimated effects may be unbalanced
> summary(m2)
          Df Pillai approx F num Df den Df Pr(>F)
fac1                                          **
Residuals
> summary(m2,test="Wilks")
          Df  Wilks approx F num Df den Df Pr(>F)
fac1                                          **
Residuals

** Another example **
> help(gl)   # generates factors

> da=read.table("t6-9.dat")
> dim(da)
[1] 48 4
> y=cbind(da[,1],da[,2],da[,3])
> gen=factor(gl(2,24))
> gen
 [1]
[25]
Levels: 1 2
> m1=manova(y~gen)
> m1
Call:
   manova(y ~ gen)

Terms:
                gen Residuals
resp 1
resp 2
resp 3
Deg. of Freedom   1        46

Residual standard error:
Estimated effects may be unbalanced
> summary(m1,test="Wilks")
          Df Wilks approx F num Df den Df Pr(>F)
gen        1                                e-09 ***
Residuals 46
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If the null hypothesis of equal mean vectors is rejected, one can use simultaneous confidence intervals for the components of the mean vectors to study the differences between the means or, equivalently, consider the components of the treatment-effect vectors $\tau_l$. Consider the difference between the $i$th components of the treatment vectors $\tau_k$ and $\tau_l$, i.e., $\tau_{k,i} - \tau_{l,i}$. The sample estimate is
$\hat\tau_{k,i} - \hat\tau_{l,i} = (\bar x_{k,i} - \bar x_i) - (\bar x_{l,i} - \bar x_i) = \bar x_{k,i} - \bar x_{l,i}$.
Since the random samples are independent across populations, we have
$\mathrm{Var}(\hat\tau_{k,i} - \hat\tau_{l,i}) = \mathrm{Var}(\bar X_{k,i} - \bar X_{l,i}) = \left(\frac{1}{n_k} + \frac{1}{n_l}\right)\sigma_{ii}$,
where $\sigma_{ii}$ is the $(i,i)$th element of $\Sigma$. In addition, $\hat\sigma_{ii} = \frac{w_{ii}}{n-g}$, where $w_{ii}$ is the $(i,i)$th element of $W$ and $n = \sum_{l=1}^g n_l$. For $g$ populations with $p$-dimensional data, there are $p$ variables and $g(g-1)/2$ pairwise differences, so the Bonferroni critical value is $t_{n-g}(\alpha/(2m))$ with $m = pg(g-1)/2$. Consequently, with probability at least $1-\alpha$, $\tau_{k,i} - \tau_{l,i}$ belongs to
$(\bar x_{k,i} - \bar x_{l,i}) \pm t_{n-g}\!\left(\frac{\alpha}{pg(g-1)}\right) \sqrt{\frac{w_{ii}}{n-g}\left(\frac{1}{n_k} + \frac{1}{n_l}\right)}$
for all $i = 1, \ldots, p$ and all differences $l < k = 1, \ldots, g$.

4 Testing for equality of covariance matrices

The setup: $g$ populations and $p$ variables. The covariance matrix of population $j$ is $\Sigma_j$, which is positive definite. $H_o: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_g = \Sigma$ versus $H_a: \Sigma_i \neq \Sigma_j$ for some $1 \leq i \neq j \leq g$. The most commonly used test statistic is Box's M test. It is a likelihood-ratio type of test. Under the normality assumption, the likelihood ratio statistic for testing equality of covariance matrices is
$\Lambda = \prod_{l=1}^g \left( \dfrac{|S_l|}{|S_{pool}|} \right)^{(n_l-1)/2}$,
where $n_l$ is the sample size of the $l$th population, $S_l$ is the sample covariance matrix of the $l$th population, and
$S_{pool} = \dfrac{1}{\sum_{l=1}^g (n_l-1)} [(n_1-1)S_1 + (n_2-1)S_2 + \cdots + (n_g-1)S_g]$
is the pooled sample covariance matrix. Box's test is based on a $\chi^2$ approximation to the sampling distribution of $-2\ln(\Lambda)$. Specifically,
$M \equiv -2\ln(\Lambda) = \left[\sum_{l=1}^g (n_l-1)\right] \ln(|S_{pool}|) - \sum_{l=1}^g [(n_l-1)\ln(|S_l|)]$.
Under $H_o$, the $S_l$ are not expected to differ much, so they should all be close to $S_{pool}$. In this case, the ratios of determinants are close to 1 and the M-statistic is small.

Box's test. Let
$u = \left[\sum_{l=1}^g \frac{1}{n_l-1} - \frac{1}{\sum_l (n_l-1)}\right] \cdot \dfrac{2p^2 + 3p - 1}{6(p+1)(g-1)}$,
where $p$ is the number of variables and $g$ is the number of populations. Then
$C = (1-u)M = (1-u)\left\{\left[\sum_{l=1}^g (n_l-1)\right]\ln(|S_{pool}|) - \sum_{l=1}^g [(n_l-1)\ln(|S_l|)]\right\}$

has an approximate $\chi^2$ distribution with $v = \frac{1}{2}p(p+1)(g-1)$ degrees of freedom. One rejects $H_o$ if $C > \chi^2_{p(p+1)(g-1)/2}(\alpha)$.

Remark: A simple R script, box_m.r, is written to perform the Box M test for equal covariance matrices. For illustration, consider the data in Table 6.1: the null hypothesis cannot be rejected at the 5% level. The program requires two input variables: (a) the data set and (b) a vector $(n_1, n_2, \ldots, n_g)$ of sample sizes. The data set is arranged in population order matching the sample-size vector.
> source("box_m.r")
> mm=box_m(y,nv)
[1] "Test result:"
        [,1]
Box.M-C
p.value
> names(mm)
[1] "Box.M" "Test.Stat" "p.value"

5 Two-way multivariate analysis of variance

Univariate case: The model is
$X_{lkr} = \mu + \tau_l + \beta_k + \gamma_{lk} + e_{lkr}$; $\quad l = 1, \ldots, g$; $k = 1, \ldots, b$; $r = 1, \ldots, n$,
where $\sum_{l=1}^g \tau_l = \sum_{k=1}^b \beta_k = \sum_{l=1}^g \gamma_{lk} = \sum_{k=1}^b \gamma_{lk} = 0$ and $e_{lkr} \sim N(0, \sigma^2)$. Here $\mu$ is the overall mean, representing the general level of response; $\tau_l$ is the fixed effect of factor 1; $\beta_k$ is the fixed effect of factor 2; and $\gamma_{lk}$ is the interaction between factor 1 and factor 2. For the data, the corresponding decomposition is
$x_{lkr} = \bar x + (\bar x_{l\cdot} - \bar x) + (\bar x_{\cdot k} - \bar x) + (\bar x_{lk} - \bar x_{l\cdot} - \bar x_{\cdot k} + \bar x) + (x_{lkr} - \bar x_{lk})$,
where $\bar x$ is the overall sample mean, $\bar x_{l\cdot} = \frac{1}{bn}\sum_{k=1}^b \sum_{r=1}^n x_{lkr}$, $\bar x_{\cdot k} = \frac{1}{gn}\sum_{l=1}^g \sum_{r=1}^n x_{lkr}$, and $\bar x_{lk} = \frac{1}{n}\sum_{r=1}^n x_{lkr}$. Subtracting $\bar x$, squaring, and summing, we have the identity
$\sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (x_{lkr} - \bar x)^2 = \sum_{l=1}^g bn(\bar x_{l\cdot} - \bar x)^2 + \sum_{k=1}^b gn(\bar x_{\cdot k} - \bar x)^2 + \sum_{l=1}^g \sum_{k=1}^b n(\bar x_{lk} - \bar x_{l\cdot} - \bar x_{\cdot k} + \bar x)^2 + \sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (x_{lkr} - \bar x_{lk})^2$.
This identity is commonly expressed as $SS_{tot} = SS_{fac1} + SS_{fac2} + SS_{int} + SS_{res}$.
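The univariate two-way decomposition above can be verified numerically. A short Python sketch (the array layout and function name are my own; `x` is indexed as (factor-1 level, factor-2 level, replicate)):

```python
import numpy as np

def two_way_ss(x):
    """Two-way ANOVA sums of squares; x has shape (g, b, n)."""
    g, b, n = x.shape
    grand = x.mean()
    m_l = x.mean(axis=(1, 2))              # factor-1 level means, x-bar_{l.}
    m_k = x.mean(axis=(0, 2))              # factor-2 level means, x-bar_{.k}
    m_lk = x.mean(axis=2)                  # cell means, x-bar_{lk}
    ss1 = b * n * ((m_l - grand) ** 2).sum()
    ss2 = g * n * ((m_k - grand) ** 2).sum()
    ssint = n * ((m_lk - m_l[:, None] - m_k[None, :] + grand) ** 2).sum()
    ssres = ((x - m_lk[:, :, None]) ** 2).sum()
    sstot = ((x - grand) ** 2).sum()
    return ss1, ss2, ssint, ssres, sstot
```

For perfectly additive cell means the interaction sum of squares is zero, and the four components always add up to $SS_{tot}$, as the identity asserts.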

The corresponding degrees of freedom satisfy
$gbn - 1 = (g-1) + (b-1) + (g-1)(b-1) + gb(n-1)$.
The univariate analysis of variance table is simply a summary of the prior two equations.

Univariate Two-Way Analysis of Variance Table
Source of variation   Sum of Squares   Degrees of Freedom   Mean Squares                          F-ratio
Factor 1              $SS_{fac1}$      $g-1$                $MS_{fac1} = SS_{fac1}/(g-1)$          $MS_{fac1}/MSE$
Factor 2              $SS_{fac2}$      $b-1$                $MS_{fac2} = SS_{fac2}/(b-1)$          $MS_{fac2}/MSE$
Interaction           $SS_{int}$       $(g-1)(b-1)$         $MS_{int} = SS_{int}/[(g-1)(b-1)]$     $MS_{int}/MSE$
Residuals             $SS_{res}$       $gb(n-1)$            $MSE = SS_{res}/[gb(n-1)]$
Total                 $SS_{tot}$       $gbn-1$

In the table, a mean square is the corresponding sum of squares divided by its degrees of freedom. For instance,
$MSE = \frac{1}{gb(n-1)} \sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (x_{lkr} - \bar x_{lk})^2$,
which is an estimate of $\sigma^2$. The hypothesis of no interaction, $H_o: \gamma_{lk} = 0$ for all $l$ and $k$ versus $H_a: \gamma_{lk} \neq 0$ for some $l$ and $k$, can be tested by the $F$-ratio
$F = \dfrac{MS_{int}}{MSE} \sim F_{(g-1)(b-1),\, gb(n-1)}$.
Similar tests can be done for the factor effects.

Multivariate case. The multivariate version of the model is
$X_{lkr} = \mu + \tau_l + \beta_k + \gamma_{lk} + e_{lkr}$, $\quad l = 1, \ldots, g$; $k = 1, \ldots, b$; $r = 1, \ldots, n$,
where $e_{lkr} \sim N_p(0, \Sigma)$ and $\sum_l \tau_l = \sum_k \beta_k = \sum_l \gamma_{lk} = \sum_k \gamma_{lk} = 0$. The corresponding decomposition for the data is
$x_{lkr} = \bar x + (\bar x_{l\cdot} - \bar x) + (\bar x_{\cdot k} - \bar x) + (\bar x_{lk} - \bar x_{l\cdot} - \bar x_{\cdot k} + \bar x) + (x_{lkr} - \bar x_{lk})$.
This leads to the identity
$\sum_{l}\sum_{k}\sum_{r} (x_{lkr}-\bar x)(x_{lkr}-\bar x)' = \sum_{l} bn(\bar x_{l\cdot}-\bar x)(\bar x_{l\cdot}-\bar x)' + \sum_{k} gn(\bar x_{\cdot k}-\bar x)(\bar x_{\cdot k}-\bar x)' + \sum_{l}\sum_{k} n(\bar x_{lk}-\bar x_{l\cdot}-\bar x_{\cdot k}+\bar x)(\bar x_{lk}-\bar x_{l\cdot}-\bar x_{\cdot k}+\bar x)' + \sum_{l}\sum_{k}\sum_{r} (x_{lkr}-\bar x_{lk})(x_{lkr}-\bar x_{lk})'$.
Denote the identity as $SSP_{tot} = SSP_{fac1} + SSP_{fac2} + SSP_{int} + SSP_{res}$,

where SSP stands for sum of squares and cross-products. The identity for the degrees of freedom remains the same as in the univariate case. Similarly, we can construct a multivariate two-way analysis of variance table as in the univariate case; however, the tests are conducted with generalized variances. A test of no interaction,
$H_o: \gamma_{lk} = 0$ for all $l, k$ versus $H_a: \gamma_{lk} \neq 0$ for some $l, k$,
uses the likelihood ratio statistic
$\Lambda^* = \dfrac{|SSP_{res}|}{|SSP_{int} + SSP_{res}|}$.
Using Bartlett's approximation, one rejects $H_o$ at the $\alpha$ level if
$-\left[gb(n-1) - \dfrac{p+1-(g-1)(b-1)}{2}\right] \ln(\Lambda^*) > \chi^2_{(g-1)(b-1)p}(\alpha)$.
The main effect of factor 1 is tested by
$H_o: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ versus $H_a: \tau_l \neq 0$ for some $l$.
The test statistic is
$\Lambda^* = \dfrac{|SSP_{res}|}{|SSP_{fac1} + SSP_{res}|}$,
and the corresponding Bartlett approximation is
$-\left[gb(n-1) - \dfrac{p+1-(g-1)}{2}\right] \ln(\Lambda^*) \sim \chi^2_{(g-1)p}$.
Similarly, the main effect of factor 2 is tested by
$H_o: \beta_1 = \beta_2 = \cdots = \beta_b = 0$ versus $H_a: \beta_k \neq 0$ for some $k$,
with test statistic
$\Lambda^* = \dfrac{|SSP_{res}|}{|SSP_{fac2} + SSP_{res}|}$
and Bartlett approximation
$-\left[gb(n-1) - \dfrac{p+1-(b-1)}{2}\right] \ln(\Lambda^*) \sim \chi^2_{(b-1)p}$.
When a null hypothesis is rejected, one can consider simultaneous confidence intervals (based on the Bonferroni method) to conduct further study.
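Given a computed Wilks' $\Lambda^*$, the Bartlett-corrected chi-square statistic for the interaction test is a direct transcription of the formula above. A sketch (Python; the function name is my own):

```python
import math
from scipy import stats

def bartlett_interaction_test(lam, g, b, n, p):
    """Chi-square approximation for H_o: no interaction, given Wilks' Lambda."""
    scale = g * b * (n - 1) - (p + 1 - (g - 1) * (b - 1)) / 2
    stat = -scale * math.log(lam)
    df = (g - 1) * (b - 1) * p
    return stat, stats.chi2.sf(stat, df)
```

With $\Lambda^* = 1$ (no evidence of interaction) the statistic is 0 and the p-value is 1; smaller $\Lambda^*$ gives a larger statistic and a smaller p-value.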

*** Third example: two factors ***
> da=read.table("t6-4.dat")
> da
   V1 V2 V3 V4 V5
> y=cbind(da[,3],da[,4],da[,5])
> fac1=factor(da[,1])
> fac1
 [1]
Levels: 0 1
> fac2=factor(da[,2])

**** Analyze individual response variables ****
> y1=da[,3]
> m1=aov(y1~fac1+fac2+fac1*fac2)
> summary(m1)
            Df Sum Sq Mean Sq F value Pr(>F)
fac1                                       **
fac2                                        *
fac1:fac2
Residuals

> y2=da[,4]
> m2=aov(y2~fac1+fac2+fac1*fac2)
> summary(m2)
            Df Sum Sq Mean Sq F value Pr(>F)
fac1                                        *
fac2
fac1:fac2
Residuals
> y3=da[,5]
> m3=aov(y3~fac1+fac2+fac1*fac2)
> summary(m3)
            Df Sum Sq Mean Sq F value Pr(>F)
fac1
fac2
fac1:fac2
Residuals

**** Joint analysis ****
> m2=manova(y~fac1+fac2+fac1*fac2)
> m2
Call:
   manova(y ~ fac1 + fac2 + fac1 * fac2)

Terms:
                fac1 fac2 fac1:fac2 Residuals
resp 1
resp 2
resp 3
Deg. of Freedom

Residual standard error:
Estimated effects may be unbalanced
> summary(m2,test="Wilks")
          Df  Wilks approx F num Df den Df Pr(>F)
fac1                                           **
fac2                                            *
fac1:fac2
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(m2,test="pillai")
          Df Pillai approx F num Df den Df Pr(>F)
fac1                                           **
fac2                                            *
fac1:fac2
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6 Profile analysis

Profile analysis pertains to situations in which a battery of $p$ treatments is administered to two or more groups of subjects. All responses must be expressed in similar units, and the responses of the different groups are assumed to be independent of one another. In profile analysis, the question of equality of mean vectors is divided into several specific possibilities. For example, consider the case of two groups with mean vectors $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{ip})'$, $i = 1, 2$. The questions of interest in profile analysis are:
1. Are the profiles parallel? Equivalently, is $H_{o1}: \mu_{1j} - \mu_{1,j-1} = \mu_{2j} - \mu_{2,j-1}$ for $j = 2, 3, \ldots, p$, acceptable?
2. Assuming that the profiles are parallel, are they coincident? Equivalently, is $H_{o2}: \mu_{1j} = \mu_{2j}$ for $j = 1, 2, \ldots, p$, acceptable?
3. Assuming that the profiles are coincident, are they level? That is, are all the means equal to the same value? Equivalently, is $H_{o3}: \mu_{11} = \mu_{12} = \cdots = \mu_{1p} = \mu_{21} = \mu_{22} = \cdots = \mu_{2p}$ acceptable?

The null hypothesis $H_{o1}$ can be written as $H_{o1}: C\mu_1 = C\mu_2$, where $C$ is the $(p-1) \times p$ contrast matrix
$C = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 & 1 \end{bmatrix}$.
The data can then be transformed to obtain the samples $\{Cx_{1j}\}_{j=1}^{n_1}$ and $\{Cx_{2j}\}_{j=1}^{n_2}$, where $n_1$ and $n_2$ are the sample sizes of the two groups, respectively. Consequently, to test for parallel profiles for two normal populations, one rejects $H_{o1}: C\mu_1 = C\mu_2$ at level $\alpha$ if
$T^2 = (\bar x_1 - \bar x_2)' C' \left[\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) C S_{pool} C'\right]^{-1} C(\bar x_1 - \bar x_2) > c^2$,
where
$c^2 = \dfrac{(n_1+n_2-2)(p-1)}{n_1+n_2-p} F_{p-1,\, n_1+n_2-p}(\alpha)$.
When the profiles are parallel, either $\mu_{1i} > \mu_{2i}$ for all $i$, or $\mu_{1i} < \mu_{2i}$ for all $i$. Under this condition, the profiles will be coincident only if $\sum_{i=1}^p \mu_{1i} = \sum_{i=1}^p \mu_{2i}$, i.e., $\mathbf{1}'\mu_1 = \mathbf{1}'\mu_2$, where $\mathbf{1}$ is the $p$-dimensional vector of ones. Therefore, the second stage of the test is $H_{o2}: \mathbf{1}'\mu_1 = \mathbf{1}'\mu_2$. One can transform the data and apply the usual two-sample t-test. Specifically, to test for coincident profiles, given that the profiles are parallel, one rejects $H_{o2}$ at level $\alpha$ if
$T^2 = \left[ \dfrac{\mathbf{1}'(\bar x_1 - \bar x_2)}{\sqrt{\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) \mathbf{1}' S_{pool} \mathbf{1}}} \right]^2 > t^2_{n_1+n_2-2}(\alpha/2)$.
The next step is to check whether all variables have the same mean, so that the common profile is level. When $H_{o1}$ and $H_{o2}$ hold, the common mean vector $\mu$ is estimated by
$\bar x = \dfrac{n_1}{n_1+n_2}\bar x_1 + \dfrac{n_2}{n_1+n_2}\bar x_2$.
If the common profile is level, then $\mu_1 = \mu_2 = \cdots = \mu_p$, and the third null hypothesis is $H_{o3}: C\mu = 0$, where $C$ is defined in step 1. Thus, to test for level profiles, given that the profiles are coincident, one rejects $H_{o3}$ at level $\alpha$ if
$(n_1+n_2)\, \bar x' C' [CSC']^{-1} C\bar x > c^2$,
where $S$ is the sample covariance matrix based on all $n_1+n_2$ observations and
$c^2 = \dfrac{(n_1+n_2-1)(p-1)}{n_1+n_2-p+1} F_{p-1,\, n_1+n_2-p+1}(\alpha)$.

Remark: An R script profile.r is available on the course web to perform the three profile tests discussed. For illustration, consider the data in Table 6.14 of the textbook. The results are given below:
> source("profile.r")
> da=read.table("T6-14.DAT")
> x1=da[1:30,1:4]
> x2=da[31:60,1:4]
> cbind(x1,x2)
   V1 V2 V3 V4 V1 V2 V3 V4

> profile(x1,x2)
[1] "Are the profiles parallel?"
        [,1]
Test-T
p.value
[1] "Are the profiles coincident?"
        [,1] [,2]
Test-T
p.value
[1] "Are the profiles level?"
        [,1] [,2] [,3]
Test-T   e+01
p.value  e-04
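The first-stage parallelism statistic can be sketched directly (Python illustration using successive-difference contrasts; the function name is my own):

```python
import numpy as np

def parallel_profile_T2(X1, X2):
    """T^2 for H_o1: parallel profiles, using successive-difference contrasts."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    C = np.diff(np.eye(p), axis=0)           # (p-1) x p rows (-1, 1, 0, ...)
    S_pool = ((n1 - 1) * np.cov(X1, rowvar=False)
              + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    d = C @ (X1.mean(axis=0) - X2.mean(axis=0))
    M = (1 / n1 + 1 / n2) * (C @ S_pool @ C.T)
    return d @ np.linalg.solve(M, d)
```

Shifting one group's profile by a constant moves it up or down but keeps it parallel, so the statistic is zero in that case.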

7 Growth curves

The growth curve is a special case of the repeated-measures problem: a single treatment is applied to each subject and a single characteristic is observed over a period of time. For example, we could measure the weight of each puppy at birth and then once a month for a period of time. The weight curve of a dog is of interest and, hence, is referred to as a growth curve.

Consider the Potthoff-Roy model for quadratic growth. Here $p$ measurements on all subjects are taken at times $t_1, t_2, \ldots, t_p$, and the model is
$E(X) = E\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 t_1 + \beta_2 t_1^2 \\ \beta_0 + \beta_1 t_2 + \beta_2 t_2^2 \\ \vdots \\ \beta_0 + \beta_1 t_p + \beta_2 t_p^2 \end{bmatrix}$,
where the $i$th mean $\mu_i$ is the quadratic expression evaluated at $t_i$. When several groups of subjects are involved, one would like to compare the growth curves across the groups. Assume that $g$ groups of subjects are involved and that, for group $l$, the random sample consists of $X_{l1}, \ldots, X_{l,n_l}$, where $n_l > 0$ is the sample size.

Assumption: All of the $X_{lj}$ are independent and have the same covariance matrix $\Sigma$. Under the quadratic growth model, the mean vectors are
$E[X_{lj}] = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_p & t_p^2 \end{bmatrix} \begin{bmatrix} \beta_{l0} \\ \beta_{l1} \\ \beta_{l2} \end{bmatrix} \equiv B\beta_l$.
The model can easily be generalized to a $q$th-order polynomial. Under the assumption of multivariate normality, the MLEs of the $\beta_l$ are
$\hat\beta_l = (B' S_{pool}^{-1} B)^{-1} B' S_{pool}^{-1} \bar X_l$, $\quad l = 1, 2, \ldots, g$,
where
$S_{pool} = \dfrac{1}{N-g}[(n_1-1)S_1 + \cdots + (n_g-1)S_g] = \dfrac{1}{N-g} W$,
with $N = \sum_{l=1}^g n_l$, is the pooled estimator of the common covariance matrix $\Sigma$. The estimated covariances of the maximum likelihood estimates (MLEs) are
$\widehat{\mathrm{Cov}}(\hat\beta_l) = \dfrac{k}{n_l} (B' S_{pool}^{-1} B)^{-1}$, $\quad l = 1, \ldots, g$,

where $k = (N-g)(N-g-1)/[(N-g-p+q)(N-g-p+q+1)]$. The covariance between $\hat\beta_i$ and $\hat\beta_j$ is 0 for $i \neq j$. To test that a $q$th-order polynomial is adequate, the polynomial fit is compared with the unrestricted fit, in which the mean of each group is estimated by its sample mean vector (yielding $W$). Under the polynomial model, the error sum of squares and cross-products matrix becomes
$W_q = \sum_{l=1}^g \sum_{j=1}^{n_l} (X_{lj} - B\hat\beta_l)(X_{lj} - B\hat\beta_l)'$,
which has $N - g + p - q - 1$ degrees of freedom. The likelihood ratio test of the null hypothesis that the $q$th-order polynomial is adequate can be based on Wilks' lambda
$\Lambda^* = \dfrac{|W|}{|W_q|}$.
The difference in the number of parameters between the null and alternative hypotheses is $g(p-q-1)$, so that
$-\left[N - \tfrac{1}{2}(p - q + g)\right] \ln(\Lambda^*) \sim \chi^2_{(p-q-1)g}$
when the $n_l$ are sufficiently large.

Remark: An R script for growth curve analysis, growth.r, is available on the course web. For demonstration, consider the data in Tables 6.5 and 6.6.
> source("growth.r")
> x1=read.table("t6-5.dat")
> x2=read.table("t6-6.dat")
> x=rbind(x1,x2)
> nv=c(nrow(x1),nrow(x2))
> pv=c(0,1,2,3)
> growth(x,nv,pv,2)
[1] "Growth curve model"
[1] "Order: "
[1] 2
[1] "Beta-hat: "
     [,1] [,2]
[1,]
[2,]
[3,]
[1] "Standard errors: "
     [,1] [,2]
[1,]
[2,]
[3,]
[1] "W"

   V1 V2 V3 V4
V1
V2
V3
V4
[1] "Wq"
   V1 V2 V3 V4
V1
V2
V3
V4
[1] "Lambda:"
[1]
[1] "Test result:"
        [,1]
LR-stat
p.value
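The estimator $\hat\beta_l = (B' S_{pool}^{-1} B)^{-1} B' S_{pool}^{-1} \bar x_l$ can be sketched as follows (Python illustration; the function name is my own, and with $S_{pool} = I$ the estimator reduces to ordinary least squares, which the example exploits):

```python
import numpy as np

def growth_betahat(xbar, S_pool, times, q):
    """GLS-type estimate beta = (B' S^{-1} B)^{-1} B' S^{-1} xbar
    for a q-th order polynomial growth curve."""
    B = np.vander(np.asarray(times, float), q + 1, increasing=True)  # cols 1, t, ..., t^q
    Si = np.linalg.inv(S_pool)
    return np.linalg.solve(B.T @ Si @ B, B.T @ Si @ xbar)
```

When the mean vector lies exactly on a quadratic in $t$, the quadratic coefficients are recovered exactly.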


More information

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay Solutions to Final Exam 1. (13 pts) Consider the monthly log returns, in percentages, of five

More information

Chapter 7, continued: MANOVA

Chapter 7, continued: MANOVA Chapter 7, continued: MANOVA The Multivariate Analysis of Variance (MANOVA) technique extends Hotelling T 2 test that compares two mean vectors to the setting in which there are m 2 groups. We wish to

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 17 for Applied Multivariate Analysis Outline Multivariate Analysis of Variance 1 Multivariate Analysis of Variance The hypotheses:

More information

Comparisons of Several Multivariate Populations

Comparisons of Several Multivariate Populations Comparisons of Several Multivariate Populations Edps/Soc 584, Psych 594 Carolyn J Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees,

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Comparisons of Two Means Edps/Soc 584 and Psych 594 Applied Multivariate Statistics Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN c

More information

You can compute the maximum likelihood estimate for the correlation

You can compute the maximum likelihood estimate for the correlation Stat 50 Solutions Comments on Assignment Spring 005. (a) _ 37.6 X = 6.5 5.8 97.84 Σ = 9.70 4.9 9.70 75.05 7.80 4.9 7.80 4.96 (b) 08.7 0 S = Σ = 03 9 6.58 03 305.6 30.89 6.58 30.89 5.5 (c) You can compute

More information

Inferences about a Mean Vector

Inferences about a Mean Vector Inferences about a Mean Vector Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University

More information

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization. 1 Chapter 1: Research Design Principles The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization. 2 Chapter 2: Completely Randomized Design

More information

Multivariate Linear Regression Models

Multivariate Linear Regression Models Multivariate Linear Regression Models Regression analysis is used to predict the value of one or more responses from a set of predictors. It can also be used to estimate the linear association between

More information

Group comparison test for independent samples

Group comparison test for independent samples Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences between means. Supposing that: samples come from normal populations

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

STAT 501 Assignment 2 NAME Spring Chapter 5, and Sections in Johnson & Wichern.

STAT 501 Assignment 2 NAME Spring Chapter 5, and Sections in Johnson & Wichern. STAT 01 Assignment NAME Spring 00 Reading Assignment: Written Assignment: Chapter, and Sections 6.1-6.3 in Johnson & Wichern. Due Monday, February 1, in class. You should be able to do the first four problems

More information

Multiple comparisons - subsequent inferences for two-way ANOVA

Multiple comparisons - subsequent inferences for two-way ANOVA 1 Multiple comparisons - subsequent inferences for two-way ANOVA the kinds of inferences to be made after the F tests of a two-way ANOVA depend on the results if none of the F tests lead to rejection of

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation Using the least squares estimator for β we can obtain predicted values and compute residuals: Ŷ = Z ˆβ = Z(Z Z) 1 Z Y ˆɛ = Y Ŷ = Y Z(Z Z) 1 Z Y = [I Z(Z Z) 1 Z ]Y. The usual decomposition

More information

ANOVA Longitudinal Models for the Practice Effects Data: via GLM

ANOVA Longitudinal Models for the Practice Effects Data: via GLM Psyc 943 Lecture 25 page 1 ANOVA Longitudinal Models for the Practice Effects Data: via GLM Model 1. Saturated Means Model for Session, E-only Variances Model (BP) Variances Model: NO correlation, EQUAL

More information

MANOVA MANOVA,$/,,# ANOVA ##$%'*!# 1. $!;' *$,$!;' (''

MANOVA MANOVA,$/,,# ANOVA ##$%'*!# 1. $!;' *$,$!;' ('' 14 3! "#!$%# $# $&'('$)!! (Analysis of Variance : ANOVA) *& & "#!# +, ANOVA -& $ $ (+,$ ''$) *$#'$)!!#! (Multivariate Analysis of Variance : MANOVA).*& ANOVA *+,'$)$/*! $#/#-, $(,!0'%1)!', #($!#$ # *&,

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

BIOL 458 BIOMETRY Lab 8 - Nested and Repeated Measures ANOVA

BIOL 458 BIOMETRY Lab 8 - Nested and Repeated Measures ANOVA BIOL 458 BIOMETRY Lab 8 - Nested and Repeated Measures ANOVA PART 1: NESTED ANOVA Nested designs are used when levels of one factor are not represented within all levels of another factor. Often this is

More information

Example 1 describes the results from analyzing these data for three groups and two variables contained in test file manova1.tf3.

Example 1 describes the results from analyzing these data for three groups and two variables contained in test file manova1.tf3. Simfit Tutorials and worked examples for simulation, curve fitting, statistical analysis, and plotting. http://www.simfit.org.uk MANOVA examples From the main SimFIT menu choose [Statistcs], [Multivariate],

More information

MANOVA is an extension of the univariate ANOVA as it involves more than one Dependent Variable (DV). The following are assumptions for using MANOVA:

MANOVA is an extension of the univariate ANOVA as it involves more than one Dependent Variable (DV). The following are assumptions for using MANOVA: MULTIVARIATE ANALYSIS OF VARIANCE MANOVA is an extension of the univariate ANOVA as it involves more than one Dependent Variable (DV). The following are assumptions for using MANOVA: 1. Cell sizes : o

More information

3. (a) (8 points) There is more than one way to correctly express the null hypothesis in matrix form. One way to state the null hypothesis is

3. (a) (8 points) There is more than one way to correctly express the null hypothesis in matrix form. One way to state the null hypothesis is Stat 501 Solutions and Comments on Exam 1 Spring 005-4 0-4 1. (a) (5 points) Y ~ N, -1-4 34 (b) (5 points) X (X,X ) = (5,8) ~ N ( 11.5, 0.9375 ) 3 1 (c) (10 points, for each part) (i), (ii), and (v) are

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Mean Vector Inferences

Mean Vector Inferences Mean Vector Inferences Lecture 5 September 21, 2005 Multivariate Analysis Lecture #5-9/21/2005 Slide 1 of 34 Today s Lecture Inferences about a Mean Vector (Chapter 5). Univariate versions of mean vector

More information

Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models

Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models EPSY 905: Multivariate Analysis Spring 2016 Lecture #12 April 20, 2016 EPSY 905: RM ANOVA, MANOVA, and Mixed Models

More information

Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur

Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 29 Multivariate Linear Regression- Model

More information

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance

More information

MULTIVARIATE ANALYSIS OF VARIANCE

MULTIVARIATE ANALYSIS OF VARIANCE MULTIVARIATE ANALYSIS OF VARIANCE RAJENDER PARSAD AND L.M. BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 0 0 lmb@iasri.res.in. Introduction In many agricultural experiments,

More information

Statistics for EES Factorial analysis of variance

Statistics for EES Factorial analysis of variance Statistics for EES Factorial analysis of variance Dirk Metzler June 12, 2015 Contents 1 ANOVA and F -Test 1 2 Pairwise comparisons and multiple testing 6 3 Non-parametric: The Kruskal-Wallis Test 9 1 ANOVA

More information

Notes on Maxwell & Delaney

Notes on Maxwell & Delaney Notes on Maxwell & Delaney PSY710 12 higher-order within-subject designs Chapter 11 discussed the analysis of data collected in experiments that had a single, within-subject factor. Here we extend those

More information

STAT 730 Chapter 5: Hypothesis Testing

STAT 730 Chapter 5: Hypothesis Testing STAT 730 Chapter 5: Hypothesis Testing Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 28 Likelihood ratio test def n: Data X depend on θ. The

More information

SOME ASPECTS OF MULTIVARIATE BEHRENS-FISHER PROBLEM

SOME ASPECTS OF MULTIVARIATE BEHRENS-FISHER PROBLEM SOME ASPECTS OF MULTIVARIATE BEHRENS-FISHER PROBLEM Junyong Park Bimal Sinha Department of Mathematics/Statistics University of Maryland, Baltimore Abstract In this paper we discuss the well known multivariate

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

Lecture 6: Single-classification multivariate ANOVA (k-group( MANOVA)

Lecture 6: Single-classification multivariate ANOVA (k-group( MANOVA) Lecture 6: Single-classification multivariate ANOVA (k-group( MANOVA) Rationale and MANOVA test statistics underlying principles MANOVA assumptions Univariate ANOVA Planned and unplanned Multivariate ANOVA

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Two sample T 2 test 1 Two sample T 2 test 2 Analogous to the univariate context, we

More information

Profile Analysis Multivariate Regression

Profile Analysis Multivariate Regression Lecture 8 October 12, 2005 Analysis Lecture #8-10/12/2005 Slide 1 of 68 Today s Lecture Profile analysis Today s Lecture Schedule : regression review multiple regression is due Thursday, October 27th,

More information

Multivariate analysis of variance and covariance

Multivariate analysis of variance and covariance Introduction Multivariate analysis of variance and covariance Univariate ANOVA: have observations from several groups, numerical dependent variable. Ask whether dependent variable has same mean for each

More information

Multivariate Analysis of Variance

Multivariate Analysis of Variance Chapter 15 Multivariate Analysis of Variance Jolicouer and Mosimann studied the relationship between the size and shape of painted turtles. The table below gives the length, width, and height (all in mm)

More information

Rejection regions for the bivariate case

Rejection regions for the bivariate case Rejection regions for the bivariate case The rejection region for the T 2 test (and similarly for Z 2 when Σ is known) is the region outside of an ellipse, for which there is a (1-α)% chance that the test

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Applied Multivariate Analysis

Applied Multivariate Analysis Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2017 Discriminant Analysis Background 1 Discriminant analysis Background General Setup for the Discriminant Analysis Descriptive

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

STAT 501 EXAM I NAME Spring 1999

STAT 501 EXAM I NAME Spring 1999 STAT 501 EXAM I NAME Spring 1999 Instructions: You may use only your calculator and the attached tables and formula sheet. You can detach the tables and formula sheet from the rest of this exam. Show your

More information

Multivariate Linear Models

Multivariate Linear Models Multivariate Linear Models Stanley Sawyer Washington University November 7, 2001 1. Introduction. Suppose that we have n observations, each of which has d components. For example, we may have d measurements

More information

Analysis of variance using orthogonal projections

Analysis of variance using orthogonal projections Analysis of variance using orthogonal projections Rasmus Waagepetersen Abstract The purpose of this note is to show how statistical theory for inference in balanced ANOVA models can be conveniently developed

More information

STT 843 Key to Homework 1 Spring 2018

STT 843 Key to Homework 1 Spring 2018 STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ

More information

Chapter 9. Multivariate and Within-cases Analysis. 9.1 Multivariate Analysis of Variance

Chapter 9. Multivariate and Within-cases Analysis. 9.1 Multivariate Analysis of Variance Chapter 9 Multivariate and Within-cases Analysis 9.1 Multivariate Analysis of Variance Multivariate means more than one response variable at once. Why do it? Primarily because if you do parallel analyses

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Addressing ourliers 1 Addressing ourliers 2 Outliers in Multivariate samples (1) For

More information

Notes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1

Notes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1 Notes for Wee 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1 Exam 3 is on Friday May 1. A part of one of the exam problems is on Predictiontervals : When randomly sampling from a normal population

More information

ANOVA approaches to Repeated Measures. repeated measures MANOVA (chapter 3)

ANOVA approaches to Repeated Measures. repeated measures MANOVA (chapter 3) ANOVA approaches to Repeated Measures univariate repeated-measures ANOVA (chapter 2) repeated measures MANOVA (chapter 3) Assumptions Interval measurement and normally distributed errors (homogeneous across

More information

3. The F Test for Comparing Reduced vs. Full Models. opyright c 2018 Dan Nettleton (Iowa State University) 3. Statistics / 43

3. The F Test for Comparing Reduced vs. Full Models. opyright c 2018 Dan Nettleton (Iowa State University) 3. Statistics / 43 3. The F Test for Comparing Reduced vs. Full Models opyright c 2018 Dan Nettleton (Iowa State University) 3. Statistics 510 1 / 43 Assume the Gauss-Markov Model with normal errors: y = Xβ + ɛ, ɛ N(0, σ

More information

Booth School of Business, University of Chicago Business 41914, Spring Quarter 2017, Mr. Ruey S. Tsay. Solutions to Midterm

Booth School of Business, University of Chicago Business 41914, Spring Quarter 2017, Mr. Ruey S. Tsay. Solutions to Midterm Booth School of Business, University of Chicago Business 41914, Spring Quarter 017, Mr Ruey S Tsay Solutions to Midterm Problem A: (51 points; 3 points per question) Answer briefly the following questions

More information

4.1 Order Specification

4.1 Order Specification THE UNIVERSITY OF CHICAGO Booth School of Business Business 41914, Spring Quarter 2009, Mr Ruey S Tsay Lecture 7: Structural Specification of VARMA Models continued 41 Order Specification Turn to data

More information

Analysis of Variance. ภาว น ศ ร ประภาน ก ล คณะเศรษฐศาสตร มหาว ทยาล ยธรรมศาสตร

Analysis of Variance. ภาว น ศ ร ประภาน ก ล คณะเศรษฐศาสตร มหาว ทยาล ยธรรมศาสตร Analysis of Variance ภาว น ศ ร ประภาน ก ล คณะเศรษฐศาสตร มหาว ทยาล ยธรรมศาสตร pawin@econ.tu.ac.th Outline Introduction One Factor Analysis of Variance Two Factor Analysis of Variance ANCOVA MANOVA Introduction

More information

Multivariate Data Analysis Notes & Solutions to Exercises 3

Multivariate Data Analysis Notes & Solutions to Exercises 3 Notes & Solutions to Exercises 3 ) i) Measurements of cranial length x and cranial breadth x on 35 female frogs 7.683 0.90 gave x =(.860, 4.397) and S. Test the * 4.407 hypothesis that =. Using the result

More information

On Selecting Tests for Equality of Two Normal Mean Vectors

On Selecting Tests for Equality of Two Normal Mean Vectors MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department

More information

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum T-test: means of Spock's judge versus all other judges 1 The TTEST Procedure Variable: pcwomen judge1 N Mean Std Dev Std Err Minimum Maximum OTHER 37 29.4919 7.4308 1.2216 16.5000 48.9000 SPOCKS 9 14.6222

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Random Matrices and Multivariate Statistical Analysis

Random Matrices and Multivariate Statistical Analysis Random Matrices and Multivariate Statistical Analysis Iain Johnstone, Statistics, Stanford imj@stanford.edu SEA 06@MIT p.1 Agenda Classical multivariate techniques Principal Component Analysis Canonical

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

Multiple Pairwise Comparison Procedures in One-Way ANOVA with Fixed Effects Model

Multiple Pairwise Comparison Procedures in One-Way ANOVA with Fixed Effects Model Biostatistics 250 ANOVA Multiple Comparisons 1 ORIGIN 1 Multiple Pairwise Comparison Procedures in One-Way ANOVA with Fixed Effects Model When the omnibus F-Test for ANOVA rejects the null hypothesis that

More information

Week 14 Comparing k(> 2) Populations

Week 14 Comparing k(> 2) Populations Week 14 Comparing k(> 2) Populations Week 14 Objectives Methods associated with testing for the equality of k(> 2) means or proportions are presented. Post-testing concepts and analysis are introduced.

More information

6 Multivariate Regression

6 Multivariate Regression 6 Multivariate Regression 6.1 The Model a In multiple linear regression, we study the relationship between several input variables or regressors and a continuous target variable. Here, several target variables

More information

WITHIN-PARTICIPANT EXPERIMENTAL DESIGNS

WITHIN-PARTICIPANT EXPERIMENTAL DESIGNS 1 WITHIN-PARTICIPANT EXPERIMENTAL DESIGNS I. Single-factor designs: the model is: yij i j ij ij where: yij score for person j under treatment level i (i = 1,..., I; j = 1,..., n) overall mean βi treatment

More information

Comparing two independent samples

Comparing two independent samples In many applications it is necessary to compare two competing methods (for example, to compare treatment effects of a standard drug and an experimental drug). To compare two methods from statistical point

More information

Multivariate Regression (Chapter 10)

Multivariate Regression (Chapter 10) Multivariate Regression (Chapter 10) This week we ll cover multivariate regression and maybe a bit of canonical correlation. Today we ll mostly review univariate multivariate regression. With multivariate

More information

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Ma 3/103: Lecture 24 Linear Regression I: Estimation Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the

More information

MULTIVARIATE POPULATIONS

MULTIVARIATE POPULATIONS CHAPTER 5 MULTIVARIATE POPULATIONS 5. INTRODUCTION In the following chapters we will be dealing with a variety of problems concerning multivariate populations. The purpose of this chapter is to provide

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013

18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013 18.S096 Problem Set 3 Fall 013 Regression Analysis Due Date: 10/8/013 he Projection( Hat ) Matrix and Case Influence/Leverage Recall the setup for a linear regression model y = Xβ + ɛ where y and ɛ are

More information

M M Cross-Over Designs

M M Cross-Over Designs Chapter 568 Cross-Over Designs Introduction This module calculates the power for an x cross-over design in which each subject receives a sequence of treatments and is measured at periods (or time points).

More information

ANOVA in SPSS. Hugo Quené. opleiding Taalwetenschap Universiteit Utrecht Trans 10, 3512 JK Utrecht.

ANOVA in SPSS. Hugo Quené. opleiding Taalwetenschap Universiteit Utrecht Trans 10, 3512 JK Utrecht. ANOVA in SPSS Hugo Quené hugo.quene@let.uu.nl opleiding Taalwetenschap Universiteit Utrecht Trans 10, 3512 JK Utrecht 7 Oct 2005 1 introduction In this example I ll use fictitious data, taken from http://www.ruf.rice.edu/~mickey/psyc339/notes/rmanova.html.

More information

T. Mark Beasley One-Way Repeated Measures ANOVA handout

T. Mark Beasley One-Way Repeated Measures ANOVA handout T. Mark Beasley One-Way Repeated Measures ANOVA handout Profile Analysis Example In the One-Way Repeated Measures ANOVA, two factors represent separate sources of variance. Their interaction presents an

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Design of Engineering Experiments Part 5 The 2 k Factorial Design

Design of Engineering Experiments Part 5 The 2 k Factorial Design Design of Engineering Experiments Part 5 The 2 k Factorial Design Text reference, Special case of the general factorial design; k factors, all at two levels The two levels are usually called low and high

More information

Hypothesis Testing hypothesis testing approach

Hypothesis Testing hypothesis testing approach Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Chapter 9. Hotelling s T 2 Test. 9.1 One Sample. The one sample Hotelling s T 2 test is used to test H 0 : µ = µ 0 versus
