THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2012, Mr. Ruey S. Tsay


Lecture 3: Comparisons between several multivariate means

Key concepts: 1. Paired comparison & repeated measures 2. Comparing means of two populations 3. Comparing means of several populations: one-way multivariate analysis of variance 4. Testing for equality of covariance matrices 5. Two-way multivariate analysis of variance 6. Profile analysis 7. Growth curves

Key assumption: Normality or large sample sizes.

1 Paired comparison

A procedure to eliminate the influence of extraneous unit-to-unit variation. Measurements are taken on the same or identical units under different treatments. Recall the univariate paired t-test. Let $X_{j1}$ and $X_{j2}$ be the responses of unit $j$ to treatments 1 and 2, respectively, where $j = 1, \ldots, n$. Let $D_j = X_{j1} - X_{j2}$ be the difference between the treatments; the extraneous variation cancels because both responses come from the same or identical unit. Assume $D_j \sim N(\delta, \sigma_d^2)$. Consider the testing problem $H_o: \delta = 0$ versus $H_a: \delta \neq 0$. The paired t-test is
$t = \dfrac{\bar D - \delta}{s_d/\sqrt{n}}$, where $\bar D = \frac{1}{n}\sum_{j=1}^n D_j$ and $s_d^2 = \frac{1}{n-1}\sum_{j=1}^n (D_j - \bar D)^2$.
One rejects $H_o$ if and only if $|t| \geq t_{n-1}(\alpha/2)$. The corresponding $100(1-\alpha)\%$ confidence interval for the mean difference $\delta = E(X_{j1} - X_{j2})$ is
$\bar D - t_{n-1}(\alpha/2)\, s_d/\sqrt{n} \;\leq\; \delta \;\leq\; \bar D + t_{n-1}(\alpha/2)\, s_d/\sqrt{n}$.

Generalization: Suppose that $p$ measurements are taken from each unit, so the responses are $X_{1ji}$ and $X_{2ji}$, where $X_{1ji}$ is the measurement of the $i$th variable on the $j$th unit under treatment 1, and $X_{2ji}$ is that under treatment 2. The differences are $D_{ji} = X_{1ji} - X_{2ji}$ and $D_j = (D_{j1}, \ldots, D_{jp})'$. Assume that $E(D_j) = \delta = (\delta_1, \ldots, \delta_p)'$ and $\mathrm{cov}(D_j) = \Sigma_d$. If we further assume $D_j \sim N_p(\delta, \Sigma_d)$, then we can make inference about $\delta$ using Hotelling's $T^2$ statistic
$T^2 = n(\bar D - \delta)' S_d^{-1} (\bar D - \delta)$,
where $\bar D = \frac{1}{n}\sum_{j=1}^n D_j$ and $S_d = \frac{1}{n-1}\sum_{j=1}^n (D_j - \bar D)(D_j - \bar D)'$.

Result 6.1. Let the differences $D_1, \ldots, D_n$ be a random sample from an $N_p(\delta, \Sigma_d)$ population. Then $T^2 = n(\bar D - \delta)' S_d^{-1} (\bar D - \delta)$ is distributed as an $[(n-1)p/(n-p)]F_{p,n-p}$ random variable. If $n$ and $n-p$ are both large, $T^2$ is approximately distributed as a $\chi^2_p$ random variable.

Inference: Suppose the observed sample consists of $d_1, \ldots, d_n$ and the population is $N_p(\delta, \Sigma_d)$. Then reject $H_o: \delta = 0$ in favor of $H_a: \delta \neq 0$ if
$T^2 = n \bar d'\, S_d^{-1} \bar d > \dfrac{(n-1)p}{n-p} F_{p,n-p}(\alpha)$.
A $100(1-\alpha)\%$ confidence region for $\delta$ is
$(\bar d - \delta)' S_d^{-1} (\bar d - \delta) \leq \dfrac{(n-1)p}{n(n-p)} F_{p,n-p}(\alpha)$.
The $100(1-\alpha)\%$ simultaneous confidence intervals for the individual mean differences $\delta_i$ are
$\bar d_i \pm \sqrt{\dfrac{(n-1)p}{n-p} F_{p,n-p}(\alpha)} \sqrt{\dfrac{s_{d_i}^2}{n}}$,
where $\bar d_i$ is the $i$th element of $\bar d$ and $s_{d_i}^2$ is the $(i,i)$th element of $S_d$. The Bonferroni $100(1-\alpha)\%$ simultaneous confidence intervals for the individual mean differences are
$\bar d_i \pm t_{n-1}(\alpha/(2p)) \sqrt{s_{d_i}^2/n}$.
Finally, if $n$ and $n-p$ are sufficiently large, the normality assumption can be dropped, and the simultaneous C.I.s can be obtained by replacing $[(n-1)p/(n-p)]F_{p,n-p}(\alpha)$ by $\chi^2_p(\alpha)$.

Remark: The R programs of Chapter 5 can be used to perform paired comparison.

Example: Consider the data in Table 6.1 of the text.
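The paired-comparison $T^2$ test above is easy to compute directly. Below is a minimal sketch in Python (the course scripts are in R; this is only an illustrative translation, and the function name `hotelling_paired` is my own):

```python
import numpy as np
from scipy import stats

def hotelling_paired(x1, x2):
    """Hotelling T^2 test of H_o: E(D) = 0 for paired p-variate responses.

    Returns (T2, p_value) using the exact [(n-1)p/(n-p)] F_{p,n-p} calibration.
    """
    d = np.asarray(x1, float) - np.asarray(x2, float)   # differences D_j
    n, p = d.shape
    dbar = d.mean(axis=0)                               # mean difference vector
    S = np.cov(d, rowvar=False)                         # S_d with (n-1) divisor
    T2 = n * dbar @ np.linalg.solve(S, dbar)
    F = (n - p) / ((n - 1) * p) * T2                    # ~ F_{p, n-p} under H_o
    return T2, stats.f.sf(F, p, n - p)
```

With large $T^2$ the p-value is small and $H_o: \delta = 0$ is rejected, exactly as in the T6-1 analysis below.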

> x=read.table("t6-1.dat")
> d1=x[,1]-x[,3]
> d2=x[,2]-x[,4]
> d=cbind(d1,d2)
> source("Hotelling.R")
> Hotelling(d,rep(0,2))
            [,1]
Hotelling-T
p.value
> source("cregion.r")
> confreg(d)
[1] "C.R. based on T^2"
     [,1] [,2]
[1,]
[2,]
[1] "CR based on individual t"
     [,1] [,2]
[1,]
[2,]
[1] "CR based on Bonferroni"
     [,1] [,2]
[1,]
[2,]
[1] "Asymp. simu. CR"
     [,1] [,2]
[1,]
[2,]

Contrast matrix. Definition: A $p$-dimensional vector is called a contrast vector if its elements sum to zero; by definition, contrast vectors are orthogonal to the vector of ones. An $m \times k$ matrix is called a contrast matrix if all its rows are contrast vectors. For example, $c = (1, 0, -1, 0)'$ is a contrast vector.

The above paired comparisons can be achieved by using a contrast matrix. For example, consider the effluent data of Example 6.1. Instead of computing the differenced data, we can use the observations in Table 6.1 directly. The observation for Sample 1 is $x_1 = (6, 27, 25, 15)'$. Construct the contrast matrix
$C = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}$. (1)

Clearly, the differenced data are $d_j = C x_j$ for $j = 1, \ldots, n$. Furthermore, $\bar d = C\bar x$ and $S_d = CSC'$, where $S$ is the sample covariance matrix of the data. The $T^2$ statistic then becomes
$T^2 = n \bar x' C' (CSC')^{-1} C \bar x$.
Consequently, there is no need to compute the differenced data at all.

This idea is particularly useful in analyzing repeated measures, in which different treatments are applied to each unit once over successive periods of time. Suppose there are $q$ treatments; then the observation for the $j$th unit is $X_j = (X_{j1}, X_{j2}, \ldots, X_{jq})'$, $j = 1, \ldots, n$. Let $\mu = E(X_j)$. Testing the hypothesis that all treatments have the same effect is equivalent to testing that all elements of $\mu$ are equal. To this end, we can construct a contrast matrix $C_1$ via
$(\mu_1 - \mu_2,\; \mu_1 - \mu_3,\; \ldots,\; \mu_1 - \mu_q)' = C_1 \mu$,
or $C_2$ via
$(\mu_1 - \mu_2,\; \mu_2 - \mu_3,\; \ldots,\; \mu_{q-1} - \mu_q)' = C_2 \mu$.
The problem then is to test $C_1\mu = 0$ or $C_2\mu = 0$. [Other contrast matrices are available.] This results in the $T^2$ test statistic
$T^2 = n(C\bar X)'(CSC')^{-1} C\bar X$.

Remark. The $T^2$ statistic does not depend on the choice of contrast matrix $C$. This is because $\mathrm{rank}(C_1) = \mathrm{rank}(C_2) = q-1$ and each row of $C_i$ is orthogonal to the vector of ones. Consequently, the rows of $C_1$ and the rows of $C_2$ span the same $(q-1)$-dimensional subspace orthogonal to $\mathbf{1}_q$. Thus there exists a non-singular $(q-1) \times (q-1)$ matrix $B$ such that $C_1 = B C_2$. For instance, for the $C_1$ and $C_2$ matrices given above, $B$ is the lower-triangular matrix of ones, since $\mu_1 - \mu_{j+1} = (\mu_1 - \mu_2) + (\mu_2 - \mu_3) + \cdots + (\mu_j - \mu_{j+1})$. It is then easy to show that $C_1$ and $C_2$ give the same $T^2$ statistic.
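The invariance of $T^2$ to the choice of contrast matrix can be checked numerically. A small sketch (Python illustration with made-up data; `repeated_T2` is my own helper, not a course script):

```python
import numpy as np

def repeated_T2(X, C):
    """T^2 = n (C xbar)' (C S C')^{-1} (C xbar) for repeated measures."""
    X = np.asarray(X, float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)       # sample covariance matrix
    v = C @ xbar
    return n * v @ np.linalg.solve(C @ S @ C.T, v)

X = [[1, 2, 3], [2, 1, 4], [3, 5, 2], [4, 3, 6], [0, 2, 1]]  # toy data, q = 3
C1 = np.array([[1., -1., 0.], [1., 0., -1.]])   # rows: mu1-mu2, mu1-mu3
C2 = np.array([[1., -1., 0.], [0., 1., -1.]])   # rows: successive differences
```

Evaluating `repeated_T2(X, C1)` and `repeated_T2(X, C2)` gives the same value, illustrating the remark above.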

Based on the prior discussion, we can test the equality of treatment effects in a repeated-measures setting by using the result below. Consider an $N_q(\mu, \Sigma)$ population and let $C$ be a contrast matrix. An $\alpha$-level test of $H_o: C\mu = 0$ versus $H_a: C\mu \neq 0$ rejects $H_o$ if
$T^2 = n(C\bar x)'(CSC')^{-1} C\bar x > \dfrac{(n-1)(q-1)}{n-q+1} F_{q-1,n-q+1}(\alpha)$,
where $\bar x$ and $S$ are the sample mean vector and covariance matrix. A confidence region for the contrasts $C\mu$ is
$n(C\bar x - C\mu)'(CSC')^{-1}(C\bar x - C\mu) \leq \dfrac{(n-1)(q-1)}{n-q+1} F_{q-1,n-q+1}(\alpha)$.
Consequently, simultaneous $100(1-\alpha)\%$ confidence intervals for single contrasts $c'\mu$, for any contrast vectors $c$ of interest, are
$c'\bar x \pm \sqrt{\dfrac{(n-1)(q-1)}{n-q+1} F_{q-1,n-q+1}(\alpha)} \sqrt{\dfrac{c'Sc}{n}}$.

Example. Consider the sleeping-dog data in Table 6.2. There are 19 observations and four treatments. To analyze the data, an R program called contrast.r is developed. The analysis is given below:
> x=read.table("t6-2.dat")
> dim(x)
[1] 19 4
> x
   V1 V2 V3 V4

> source("contrast.r")
> cmtx=matrix(c(-1,1,1,-1,-1,-1,1,1,-1,1,-1,1),3,4)
> cmtx
     [,1] [,2] [,3] [,4]
[1,]   -1   -1    1    1
[2,]    1   -1    1   -1
[3,]    1   -1   -1    1
> contrast(x,cmtx)
[1] "Hotelling Tsq statistics & p-value"
[1] e e-07
[1] "Simultaneous C.I. for each contrast"
     [,1] [,2]
[1,]
[2,]
[3,]

2 Comparing mean vectors of two populations

The setup:
1. $X_{11}, X_{12}, \ldots, X_{1,n_1}$ is a $p$-dimensional random sample of size $n_1$ from a population with mean $\mu_1$ and covariance matrix $\Sigma_1$.
2. $X_{21}, X_{22}, \ldots, X_{2,n_2}$ is a $p$-dimensional random sample of size $n_2$ from a population with mean $\mu_2$ and covariance matrix $\Sigma_2$.
3. The two random samples are independent.

If $n_1$ and $n_2$ are small, some additional assumptions are needed: 1. both populations are normal, and 2. $\Sigma_1 = \Sigma_2 = \Sigma$.

Problem of interest: $H_o: \mu_1 - \mu_2 = \delta_0$ versus $H_a: \mu_1 - \mu_2 \neq \delta_0$. Denote the sample means and covariances of the two random samples by $\bar x_1, S_1$ and $\bar x_2, S_2$, respectively. Under the assumption that $\Sigma_1 = \Sigma_2$, we can obtain a pooled estimate of the covariance matrix
$S_{pool} = \dfrac{n_1-1}{n_1+n_2-2} S_1 + \dfrac{n_2-1}{n_1+n_2-2} S_2$.

This pooled estimate is unbiased, as $E(S_{pool}) = \Sigma$. Note that $E(\bar X_1 - \bar X_2) = \mu_1 - \mu_2$ and
$\mathrm{Cov}(\bar X_1 - \bar X_2) = \mathrm{Cov}(\bar X_1) + \mathrm{Cov}(\bar X_2) = \frac{1}{n_1}\Sigma + \frac{1}{n_2}\Sigma$,
which can be estimated by $(\frac{1}{n_1} + \frac{1}{n_2}) S_{pool}$. The following result holds.

Result 6.2. If $X_{11}, \ldots, X_{1,n_1}$ form a random sample from $N_p(\mu_1, \Sigma)$, $X_{21}, \ldots, X_{2,n_2}$ form a random sample from $N_p(\mu_2, \Sigma)$, and the two random samples are independent, then
$T^2 = [\bar X_1 - \bar X_2 - (\mu_1-\mu_2)]' \left[\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) S_{pool}\right]^{-1} [\bar X_1 - \bar X_2 - (\mu_1-\mu_2)]$
is distributed as $\dfrac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,n_1+n_2-p-1}$. Consequently, $\Pr(T^2 \leq c^2) = 1-\alpha$, where
$c^2 = \dfrac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,n_1+n_2-p-1}(\alpha)$.

Proof: (1) $\bar X_1 - \bar X_2 \sim N_p(\mu_1-\mu_2, [(1/n_1)+(1/n_2)]\Sigma)$; (2) $(n_1-1)S_1 \sim W_{n_1-1}(\Sigma)$ and $(n_2-1)S_2 \sim W_{n_2-1}(\Sigma)$; and (3) $(n_1-1)S_1$ and $(n_2-1)S_2$ are independent, so that $(n_1-1)S_1 + (n_2-1)S_2 \sim W_{n_1+n_2-2}(\Sigma)$.

Result 6.3. Let $c^2 = \dfrac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,n_1+n_2-p-1}(\alpha)$. With probability $1-\alpha$,
$a'(\bar X_1 - \bar X_2) \pm c\sqrt{a'\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) S_{pool}\, a}$
will cover $a'(\mu_1-\mu_2)$ for all $a$. Thus, the simultaneous confidence intervals for $\mu_{1i} - \mu_{2i}$ are
$(\bar X_{1i} - \bar X_{2i}) \pm c\sqrt{\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) s_{ii,pool}}$, $\quad i = 1, \ldots, p$,
where $s_{ii,pool}$ denotes the $(i,i)$th element of the matrix $S_{pool}$. The Bonferroni $100(1-\alpha)\%$ simultaneous C.I.s for $\mu_{1i}-\mu_{2i}$ are
$(\bar x_{1i} - \bar x_{2i}) \pm t_{n_1+n_2-2}(\alpha/(2p)) \sqrt{\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) s_{ii,pool}}$.

Case: $\Sigma_1 \neq \Sigma_2$. In this case there is no pooling in covariance matrix estimation, and we typically require that $n_1 - p$ and $n_2 - p$ be sufficiently large. One can then replace $(\frac{1}{n_1}+\frac{1}{n_2}) S_{pool}$ by $\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2$ and the $F$-distribution by the $\chi^2_p$ distribution.
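The pooled two-sample $T^2$ of Result 6.2 can be sketched as follows (Python illustration; the function name is my own):

```python
import numpy as np
from scipy import stats

def two_sample_T2(X1, X2):
    """Pooled-covariance Hotelling T^2 for H_o: mu_1 - mu_2 = 0 (Result 6.2)."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    S_pool = ((n1 - 1) * np.cov(X1, rowvar=False)
              + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    T2 = diff @ np.linalg.solve((1 / n1 + 1 / n2) * S_pool, diff)
    F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2   # ~ F_{p, n1+n2-p-1}
    return T2, stats.f.sf(F, p, n1 + n2 - p - 1)
```

As a sanity check, feeding the same sample in as both groups gives $T^2 = 0$ and a p-value of 1.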

When $n_1 = n_2 = n$,
$\frac{1}{n}S_1 + \frac{1}{n}S_2 = \frac{1}{n}(S_1 + S_2) = \left(\frac{1}{n}+\frac{1}{n}\right)\frac{S_1+S_2}{2} = \left(\frac{1}{n}+\frac{1}{n}\right)\left[\frac{(n-1)S_1}{(n-1)+(n-1)} + \frac{(n-1)S_2}{(n-1)+(n-1)}\right] = \left(\frac{1}{n}+\frac{1}{n}\right) S_{pool}$.
Thus, when $n_1 = n_2$, the large-sample procedure is essentially the same as the one using the pooled covariance matrix. The impact of unequal covariance matrices is therefore least when the sample sizes are equal; it would be greater if either $n_1 \gg n_2$ or $n_2 \gg n_1$.

The Behrens-Fisher problem. Test $H_o: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$, where the two populations are normally distributed but have different covariance matrices, and the sample sizes are not large. [Of course, $n_1 > p$ and $n_2 > p$ are needed in order to estimate the covariance matrices.] The key issue is the distribution of
$T^2 = [\bar X_1 - \bar X_2 - (\mu_1-\mu_2)]' \left[\tfrac{1}{n_1}S_1 + \tfrac{1}{n_2}S_2\right]^{-1} [\bar X_1 - \bar X_2 - (\mu_1-\mu_2)]$
when $n_1 - p$ and $n_2 - p$ are small. This problem has been widely studied in the literature; see, for instance, Krishnamoorthy and Yu (2004, Statistics & Probability Letters) and Nel and Van der Merwe (1986, Communications in Statistics - Theory and Methods). A recommended method is to approximate the distribution of $T^2$ by
$T^2 \sim \dfrac{vp}{v-p+1} F_{p,v-p+1}$,
where
$v = \dfrac{p + p^2}{\sum_{i=1}^2 \frac{1}{n_i}\left\{ \mathrm{tr}\left[\left(\frac{1}{n_i}S_i \left(\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right)^{-1}\right)^2\right] + \left(\mathrm{tr}\left[\frac{1}{n_i}S_i \left(\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right)^{-1}\right]\right)^2 \right\}}$,
and $\min(n_1, n_2) \leq v \leq n_1 + n_2$.

Remark: An R script Behrens.R is available to perform the test. See the course web. For illustration, consider the effluent data in Table 6.1. The paired comparison rejects the null hypothesis of equal means. The result of using the Behrens-Fisher approach is given below.
> x=read.table("t6-1.dat")

> dim(x)
[1] 11 4
> x1=x[,1:2]
> x2=x[,3:4]
> source("behrens.r")
> Behrens(x1,x2)
[1] "Estimate of v: "
[1]
[1] "Test result:"
        [,1]
Test-T
p.value

It also rejects the null hypothesis.

3 Comparing mean vectors of several populations

Setup: $g$ populations, with $n_l$ observations from population $l$.
1. $\{X_{l,1}, X_{l,2}, \ldots, X_{l,n_l}\}$ is a random sample of size $n_l$ from a population with mean $\mu_l$, where $l = 1, \ldots, g$. The random samples from different populations are independent.
2. All populations have a common covariance matrix $\Sigma$, which is positive definite.
3. Each population is multivariate normal with dimension $p$.
The normality assumption in Condition 3 can be relaxed when the sample sizes are sufficiently large.

Hypothesis of interest: $H_o: \mu_1 = \mu_2 = \cdots = \mu_g$ versus $H_a: \mu_i \neq \mu_j$ for some $1 \leq i, j \leq g$ with $i \neq j$.

Univariate case: Recall the case of $p = 1$. The null hypothesis $\mu_1 = \mu_2 = \cdots = \mu_g$ can be written as $\tau_1 = \tau_2 = \cdots = \tau_g = 0$, where $\tau_l$ is the deviation of $\mu_l$ from the overall mean $\mu$, i.e., $\mu_l = \mu + \tau_l$. The model can then be written as
$X_{l,j} = \mu + \tau_l + e_{l,j}$, $\quad l = 1, \ldots, g$; $j = 1, \ldots, n_l$,
where $e_{l,j} \sim N(0, \sigma^2)$. For unique identification of the parameters, it is commonly assumed that $\sum_{l=1}^g n_l \tau_l = 0$. For the data, an analogous decomposition is
$x_{l,j} = \bar x + (\bar x_l - \bar x) + (x_{l,j} - \bar x_l)$,

where $\bar x = (\sum_{l=1}^g \sum_{j=1}^{n_l} x_{l,j})/n$ with $n = \sum_{l=1}^g n_l$, and $\bar x_l = (\sum_{j=1}^{n_l} x_{l,j})/n_l$. Here $\bar x$ is an estimate of $\mu$, $(\bar x_l - \bar x)$ is an estimate of $\tau_l$, and $(x_{l,j} - \bar x_l)$ is an estimate of the error term $e_{l,j}$. Subtracting $\bar x$ from the prior equation, squaring, and summing, we have the identity
$\sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar x)^2 = \sum_{l=1}^g n_l (\bar x_l - \bar x)^2 + \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar x_l)^2$. (2)
The cross-product term drops out because it is zero, indicating that the terms are orthogonal to each other. This identity is often thought of as
(Sum of Squares of Total Variation) = (Sum of Squares of Treatments) + (Sum of Squares of Residuals).
In addition, the numbers of independent quantities in the terms of the identity are related by
$\sum_{l=1}^g n_l - 1 = (g-1) + \sum_{l=1}^g (n_l - 1)$.
These are known as the degrees of freedom of the terms. The univariate analysis of variance (ANOVA) table summarizes the above results.

Source of variation   Sum of Squares                                                   Degrees of freedom
Treatments            $SS_{tr} = \sum_{l=1}^g n_l(\bar x_l - \bar x)^2$                $g-1$
Residuals             $SS_{res} = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar x_l)^2$   $\sum_{l=1}^g n_l - g$
Total                 $SS_{tot} = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar x)^2$     $\sum_{l=1}^g n_l - 1$

The usual $F$-test rejects the null hypothesis $H_o: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ at the $\alpha$ level if
$F = \dfrac{SS_{tr}/(g-1)}{SS_{res}/(\sum_l n_l - g)} > F_{g-1,\, \sum_l n_l - g}(\alpha)$.
The rationale for the $F$-test is as follows. $\bar x_l$ is an estimate of $\mu_l$, so the numerator is a weighted measure of the variation of the $\bar x_l$ between the $g$ populations, where the weights depend on the sample size of each population. The issue then is to judge the magnitude of this variation. The denominator provides a reference measure of variation because it is an estimate of the random variation (i.e., $\sigma^2$) of the data. If the variation between the populations is large relative to the random noise, then the means are declared different.

Remark: The R command for univariate analysis of variance is aov. For illustration, consider the data in Example 6.7 of the text. The R analysis corresponding to that of Example 6.8 is as follows.

> x=c(1,1,1,2,2,3,3,3)
> y=c(9,6,9,0,2,3,1,2)
> g1=factor(x)
> g1
[1] 1 1 1 2 2 3 3 3
Levels: 1 2 3
> help(aov)
> m1=aov(y~g1)
> m1
Call:
   aov(formula = y ~ g1)

Terms:
                g1 Residuals
Sum of Squares  78        10
Deg. of Freedom  2         5

Residual standard error: 1.414214
Estimated effects may be unbalanced
> summary(m1)
            Df Sum Sq Mean Sq F value Pr(>F)
g1           2     78    39.0    19.5     **
Residuals    5     10     2.0

Multivariate case. When $p > 1$, the model becomes
$X_{l,j} = \mu + \tau_l + e_{l,j}$, $\quad j = 1, \ldots, n_l$; $l = 1, \ldots, g$,
where $e_{l,j} \sim N_p(0, \Sigma)$. As before, $\mu$ is the overall mean vector, and $\tau_l$ denotes the $l$th treatment effect, satisfying $\sum_{l=1}^g n_l \tau_l = 0$. The data can be decomposed as
$x_{l,j} = \bar x + (\bar x_l - \bar x) + (x_{l,j} - \bar x_l)$.
Subtracting $\bar x$ from the prior equation, post-multiplying each term by its own transpose, and summing, we obtain
$\sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j}-\bar x)(x_{l,j}-\bar x)' = \sum_{l=1}^g n_l (\bar x_l - \bar x)(\bar x_l - \bar x)' + \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j}-\bar x_l)(x_{l,j}-\bar x_l)'$,
where, as in the univariate case, the cross-product term sums to zero. For ease of notation, we define
$W = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j}-\bar x_l)(x_{l,j}-\bar x_l)' = (n_1-1)S_1 + (n_2-1)S_2 + \cdots + (n_g-1)S_g$,

to represent the within-population sum of squares and cross-products matrix, and
$B = \sum_{l=1}^g n_l (\bar x_l - \bar x)(\bar x_l - \bar x)'$
to denote the between-population sum of squares and cross-products matrix. The hypothesis of no treatment effects, $H_o: \tau_1 = \tau_2 = \cdots = \tau_g = 0$, is tested by considering the relative sizes of the treatment and residual sums of squares and cross-products. The multivariate analysis of variance (MANOVA) table is given by

Source of variation   Matrix of sum of squares and cross-products                              Degrees of freedom
Treatment             $B = \sum_{l=1}^g n_l(\bar x_l - \bar x)(\bar x_l - \bar x)'$            $g-1$
Residuals             $W = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j}-\bar x_l)(x_{l,j}-\bar x_l)'$   $\sum_l n_l - g$
Total                 $B+W = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j}-\bar x)(x_{l,j}-\bar x)'$     $\sum_l n_l - 1$

The test then involves generalized variances, i.e., determinants of the sums-of-squares-and-cross-products matrices. Specifically, one rejects $H_o$ if
$\Lambda^* = \dfrac{|W|}{|B+W|} = \dfrac{\left|\sum_l \sum_j (x_{l,j}-\bar x_l)(x_{l,j}-\bar x_l)'\right|}{\left|\sum_l \sum_j (x_{l,j}-\bar x)(x_{l,j}-\bar x)'\right|}$
is too small. This test statistic was proposed by Wilks and is commonly referred to as Wilks' lambda. The distribution of $\Lambda^*$ is given in Table 6.3 of the text for some special cases (p. 303). For other cases and large sample sizes, a modification of $\Lambda^*$ due to Bartlett (1938) can be used. Specifically, if $H_o$ is true and $n = \sum_l n_l$ is large,
$-\left(n - 1 - \dfrac{p+g}{2}\right) \ln(\Lambda^*) = -\left(n - 1 - \dfrac{p+g}{2}\right) \ln\dfrac{|W|}{|B+W|}$
has approximately a chi-square distribution with $p(g-1)$ degrees of freedom.

Remark. The R command for multivariate analysis of variance is manova. Below are some examples.
> help(manova)
> help(summary.manova)
** Example 6.9 of the text, pages 304-305 **
> x=matrix(c(1,1,1,2,2,3,3,3,9,6,9,0,2,3,1,2,3,2,7,4,0,8,9,7),8,3)

> x
     [,1] [,2] [,3]
[1,]    1    9    3
[2,]    1    6    2
[3,]    1    9    7
[4,]    2    0    4
[5,]    2    2    0
[6,]    3    3    8
[7,]    3    1    9
[8,]    3    2    7
> fac1=factor(x[,1])
> xx=x[,2:3]
> m2=manova(xx~fac1)
> m2
Call:
   manova(xx ~ fac1)

Terms:
                fac1 Residuals
resp 1
resp 2
Deg. of Freedom    2         5

Residual standard error:
Estimated effects may be unbalanced
> summary(m2)
          Df Pillai approx F num Df den Df Pr(>F)
fac1                                          **
Residuals
> summary(m2,test="Wilks")
          Df  Wilks approx F num Df den Df Pr(>F)
fac1                                          **
Residuals

** Another example **
> help(gl)   # generates factors

> da=read.table("t6-9.dat")
> dim(da)
[1] 48 4
> y=cbind(da[,1],da[,2],da[,3])
> gen=factor(gl(2,24))
> gen
 [1]
[25]
Levels: 1 2
> m1=manova(y~gen)
> m1
Call:
   manova(y ~ gen)

Terms:
                gen Residuals
resp 1
resp 2
resp 3
Deg. of Freedom   1        46

Residual standard error:
Estimated effects may be unbalanced
> summary(m1,test="Wilks")
          Df Wilks approx F num Df den Df Pr(>F)
gen        1                                e-09 ***
Residuals 46
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If the null hypothesis of equal mean vectors is rejected, one can use simultaneous confidence intervals for the components of the mean vectors to study the differences between the means or, equivalently, consider the components of the treatment-effect vectors $\tau_l$. Consider the difference between the $i$th components of the treatment vectors $\tau_k$ and $\tau_l$, i.e., $\tau_{k,i} - \tau_{l,i}$. The sample estimate is
$\hat\tau_{k,i} - \hat\tau_{l,i} = (\bar x_{k,i} - \bar x_i) - (\bar x_{l,i} - \bar x_i) = \bar x_{k,i} - \bar x_{l,i}$.
Since the random samples are independent across populations, we have
$\mathrm{Var}(\hat\tau_{k,i} - \hat\tau_{l,i}) = \mathrm{Var}(\bar X_{k,i} - \bar X_{l,i}) = \left(\frac{1}{n_k} + \frac{1}{n_l}\right)\sigma_{ii}$,
where $\sigma_{ii}$ is the $(i,i)$th element of $\Sigma$. In addition, $\hat\sigma_{ii} = \frac{w_{ii}}{n-g}$, where $w_{ii}$ is the $(i,i)$th element of $W$ and $n = \sum_{l=1}^g n_l$. For $g$ populations with $p$-dimensional data, there are $p$ variables and $g(g-1)/2$ pairwise differences, so the Bonferroni critical value is $t_{n-g}(\alpha/(2m))$ with $m = pg(g-1)/2$. Consequently, with probability at least $1-\alpha$, $\tau_{k,i} - \tau_{l,i}$ belongs to
$(\bar x_{k,i} - \bar x_{l,i}) \pm t_{n-g}\!\left(\frac{\alpha}{pg(g-1)}\right) \sqrt{\frac{w_{ii}}{n-g}\left(\frac{1}{n_k} + \frac{1}{n_l}\right)}$
for all $i = 1, \ldots, p$ and all differences $l < k = 1, \ldots, g$.

4 Testing for equality of covariance matrices

The setup: $g$ populations and $p$ variables. The covariance matrix of population $j$ is $\Sigma_j$, which is positive definite. $H_o: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_g = \Sigma$ versus $H_a: \Sigma_i \neq \Sigma_j$ for some $1 \leq i \neq j \leq g$. The most commonly used test statistic is Box's M test. It is a likelihood-ratio type of test. Under the normality assumption, the likelihood ratio statistic for testing equality of covariance matrices is
$\Lambda = \prod_{l=1}^g \left( \dfrac{|S_l|}{|S_{pool}|} \right)^{(n_l-1)/2}$,
where $n_l$ is the sample size of the $l$th population, $S_l$ is the sample covariance matrix of the $l$th population, and
$S_{pool} = \dfrac{1}{\sum_{l=1}^g (n_l-1)} [(n_1-1)S_1 + (n_2-1)S_2 + \cdots + (n_g-1)S_g]$
is the pooled sample covariance matrix. Box's test is based on a $\chi^2$ approximation to the sampling distribution of $-2\ln(\Lambda)$. Specifically,
$M \equiv -2\ln(\Lambda) = \left[\sum_{l=1}^g (n_l-1)\right] \ln(|S_{pool}|) - \sum_{l=1}^g [(n_l-1)\ln(|S_l|)]$.
Under $H_o$, the $S_l$ are not expected to differ much, so they should all be close to $S_{pool}$. In this case, the ratios of determinants are close to 1 and the M-statistic is small.

Box's test. Let
$u = \left[\sum_{l=1}^g \frac{1}{n_l-1} - \frac{1}{\sum_l (n_l-1)}\right] \cdot \dfrac{2p^2 + 3p - 1}{6(p+1)(g-1)}$,
where $p$ is the number of variables and $g$ is the number of populations. Then
$C = (1-u)M = (1-u)\left\{\left[\sum_{l=1}^g (n_l-1)\right]\ln(|S_{pool}|) - \sum_{l=1}^g [(n_l-1)\ln(|S_l|)]\right\}$

has an approximate $\chi^2$ distribution with $v = \frac{1}{2}p(p+1)(g-1)$ degrees of freedom. One rejects $H_o$ if $C > \chi^2_{p(p+1)(g-1)/2}(\alpha)$.

Remark: A simple R script, box_m.r, is written to perform the Box M test for equal covariance matrices. For illustration, consider the data in Table 6.1: the null hypothesis cannot be rejected at the 5% level. The program requires two input variables: (a) the data set and (b) a vector $(n_1, n_2, \ldots, n_g)$ of sample sizes. The data set is arranged in population order matching the sample-size vector.
> source("box_m.r")
> mm=box_m(y,nv)
[1] "Test result:"
        [,1]
Box.M-C
p.value
> names(mm)
[1] "Box.M" "Test.Stat" "p.value"

5 Two-way multivariate analysis of variance

Univariate case: The model is
$X_{lkr} = \mu + \tau_l + \beta_k + \gamma_{lk} + e_{lkr}$; $\quad l = 1, \ldots, g$; $k = 1, \ldots, b$; $r = 1, \ldots, n$,
where $\sum_{l=1}^g \tau_l = \sum_{k=1}^b \beta_k = \sum_{l=1}^g \gamma_{lk} = \sum_{k=1}^b \gamma_{lk} = 0$ and $e_{lkr} \sim N(0, \sigma^2)$. Here $\mu$ is the overall mean, representing the general level of response; $\tau_l$ is the fixed effect of factor 1; $\beta_k$ is the fixed effect of factor 2; and $\gamma_{lk}$ is the interaction between factor 1 and factor 2. For the data, the corresponding decomposition is
$x_{lkr} = \bar x + (\bar x_{l\cdot} - \bar x) + (\bar x_{\cdot k} - \bar x) + (\bar x_{lk} - \bar x_{l\cdot} - \bar x_{\cdot k} + \bar x) + (x_{lkr} - \bar x_{lk})$,
where $\bar x$ is the overall sample mean, $\bar x_{l\cdot} = \frac{1}{bn}\sum_{k=1}^b \sum_{r=1}^n x_{lkr}$, $\bar x_{\cdot k} = \frac{1}{gn}\sum_{l=1}^g \sum_{r=1}^n x_{lkr}$, and $\bar x_{lk} = \frac{1}{n}\sum_{r=1}^n x_{lkr}$. Subtracting $\bar x$, squaring, and summing, we have the identity
$\sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (x_{lkr} - \bar x)^2 = \sum_{l=1}^g bn(\bar x_{l\cdot} - \bar x)^2 + \sum_{k=1}^b gn(\bar x_{\cdot k} - \bar x)^2 + \sum_{l=1}^g \sum_{k=1}^b n(\bar x_{lk} - \bar x_{l\cdot} - \bar x_{\cdot k} + \bar x)^2 + \sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (x_{lkr} - \bar x_{lk})^2$.
This identity is commonly expressed as $SS_{tot} = SS_{fac1} + SS_{fac2} + SS_{int} + SS_{res}$.
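The univariate two-way decomposition above can be verified numerically. A short Python sketch (the array layout and function name are my own; `x` is indexed as (factor-1 level, factor-2 level, replicate)):

```python
import numpy as np

def two_way_ss(x):
    """Two-way ANOVA sums of squares; x has shape (g, b, n)."""
    g, b, n = x.shape
    grand = x.mean()
    m_l = x.mean(axis=(1, 2))              # factor-1 level means, x-bar_{l.}
    m_k = x.mean(axis=(0, 2))              # factor-2 level means, x-bar_{.k}
    m_lk = x.mean(axis=2)                  # cell means, x-bar_{lk}
    ss1 = b * n * ((m_l - grand) ** 2).sum()
    ss2 = g * n * ((m_k - grand) ** 2).sum()
    ssint = n * ((m_lk - m_l[:, None] - m_k[None, :] + grand) ** 2).sum()
    ssres = ((x - m_lk[:, :, None]) ** 2).sum()
    sstot = ((x - grand) ** 2).sum()
    return ss1, ss2, ssint, ssres, sstot
```

For perfectly additive cell means the interaction sum of squares is zero, and the four components always add up to $SS_{tot}$, as the identity asserts.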

The corresponding degrees of freedom satisfy
$gbn - 1 = (g-1) + (b-1) + (g-1)(b-1) + gb(n-1)$.
The univariate analysis of variance table is simply a summary of the prior two equations.

Univariate Two-Way Analysis of Variance Table
Source of variation   Sum of Squares   Degrees of Freedom   Mean Squares                          F-ratio
Factor 1              $SS_{fac1}$      $g-1$                $MS_{fac1} = SS_{fac1}/(g-1)$          $MS_{fac1}/MSE$
Factor 2              $SS_{fac2}$      $b-1$                $MS_{fac2} = SS_{fac2}/(b-1)$          $MS_{fac2}/MSE$
Interaction           $SS_{int}$       $(g-1)(b-1)$         $MS_{int} = SS_{int}/[(g-1)(b-1)]$     $MS_{int}/MSE$
Residuals             $SS_{res}$       $gb(n-1)$            $MSE = SS_{res}/[gb(n-1)]$
Total                 $SS_{tot}$       $gbn-1$

In the table, a mean square is the corresponding sum of squares divided by its degrees of freedom. For instance,
$MSE = \frac{1}{gb(n-1)} \sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (x_{lkr} - \bar x_{lk})^2$,
which is an estimate of $\sigma^2$. The hypothesis of no interaction, $H_o: \gamma_{lk} = 0$ for all $l$ and $k$ versus $H_a: \gamma_{lk} \neq 0$ for some $l$ and $k$, can be tested by the $F$-ratio
$F = \dfrac{MS_{int}}{MSE} \sim F_{(g-1)(b-1),\, gb(n-1)}$.
Similar tests can be done for the factor effects.

Multivariate case. The multivariate version of the model is
$X_{lkr} = \mu + \tau_l + \beta_k + \gamma_{lk} + e_{lkr}$, $\quad l = 1, \ldots, g$; $k = 1, \ldots, b$; $r = 1, \ldots, n$,
where $e_{lkr} \sim N_p(0, \Sigma)$ and $\sum_l \tau_l = \sum_k \beta_k = \sum_l \gamma_{lk} = \sum_k \gamma_{lk} = 0$. The corresponding decomposition for the data is
$x_{lkr} = \bar x + (\bar x_{l\cdot} - \bar x) + (\bar x_{\cdot k} - \bar x) + (\bar x_{lk} - \bar x_{l\cdot} - \bar x_{\cdot k} + \bar x) + (x_{lkr} - \bar x_{lk})$.
This leads to the identity
$\sum_{l}\sum_{k}\sum_{r} (x_{lkr}-\bar x)(x_{lkr}-\bar x)' = \sum_{l} bn(\bar x_{l\cdot}-\bar x)(\bar x_{l\cdot}-\bar x)' + \sum_{k} gn(\bar x_{\cdot k}-\bar x)(\bar x_{\cdot k}-\bar x)' + \sum_{l}\sum_{k} n(\bar x_{lk}-\bar x_{l\cdot}-\bar x_{\cdot k}+\bar x)(\bar x_{lk}-\bar x_{l\cdot}-\bar x_{\cdot k}+\bar x)' + \sum_{l}\sum_{k}\sum_{r} (x_{lkr}-\bar x_{lk})(x_{lkr}-\bar x_{lk})'$.
Denote the identity as $SSP_{tot} = SSP_{fac1} + SSP_{fac2} + SSP_{int} + SSP_{res}$,

where SSP stands for sum of squares and cross-products. The identity for the degrees of freedom remains the same as in the univariate case. Similarly, we can construct a multivariate two-way analysis of variance table as in the univariate case; however, the tests are conducted with generalized variances. A test of no interaction,
$H_o: \gamma_{lk} = 0$ for all $l, k$ versus $H_a: \gamma_{lk} \neq 0$ for some $l, k$,
uses the likelihood ratio statistic
$\Lambda^* = \dfrac{|SSP_{res}|}{|SSP_{int} + SSP_{res}|}$.
Using Bartlett's approximation, one rejects $H_o$ at the $\alpha$ level if
$-\left[gb(n-1) - \dfrac{p+1-(g-1)(b-1)}{2}\right] \ln(\Lambda^*) > \chi^2_{(g-1)(b-1)p}(\alpha)$.
The main effect of factor 1 is tested by
$H_o: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ versus $H_a: \tau_l \neq 0$ for some $l$.
The test statistic is
$\Lambda^* = \dfrac{|SSP_{res}|}{|SSP_{fac1} + SSP_{res}|}$,
and the corresponding Bartlett approximation is
$-\left[gb(n-1) - \dfrac{p+1-(g-1)}{2}\right] \ln(\Lambda^*) \sim \chi^2_{(g-1)p}$.
Similarly, the main effect of factor 2 is tested by
$H_o: \beta_1 = \beta_2 = \cdots = \beta_b = 0$ versus $H_a: \beta_k \neq 0$ for some $k$,
with test statistic
$\Lambda^* = \dfrac{|SSP_{res}|}{|SSP_{fac2} + SSP_{res}|}$
and Bartlett approximation
$-\left[gb(n-1) - \dfrac{p+1-(b-1)}{2}\right] \ln(\Lambda^*) \sim \chi^2_{(b-1)p}$.
When a null hypothesis is rejected, one can consider simultaneous confidence intervals (based on the Bonferroni method) to conduct further study.
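Given a computed Wilks' $\Lambda^*$, the Bartlett-corrected chi-square statistic for the interaction test is a direct transcription of the formula above. A sketch (Python; the function name is my own):

```python
import math
from scipy import stats

def bartlett_interaction_test(lam, g, b, n, p):
    """Chi-square approximation for H_o: no interaction, given Wilks' Lambda."""
    scale = g * b * (n - 1) - (p + 1 - (g - 1) * (b - 1)) / 2
    stat = -scale * math.log(lam)
    df = (g - 1) * (b - 1) * p
    return stat, stats.chi2.sf(stat, df)
```

With $\Lambda^* = 1$ (no evidence of interaction) the statistic is 0 and the p-value is 1; smaller $\Lambda^*$ gives a larger statistic and a smaller p-value.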

*** Third example: two factors ***
> da=read.table("t6-4.dat")
> da
   V1 V2 V3 V4 V5
> y=cbind(da[,3],da[,4],da[,5])
> fac1=factor(da[,1])
> fac1
 [1]
Levels: 0 1
> fac2=factor(da[,2])

**** Analyze individual response variables ****
> y1=da[,3]
> m1=aov(y1~fac1+fac2+fac1*fac2)
> summary(m1)
            Df Sum Sq Mean Sq F value Pr(>F)
fac1                                       **
fac2                                        *
fac1:fac2
Residuals

> y2=da[,4]
> m2=aov(y2~fac1+fac2+fac1*fac2)
> summary(m2)
            Df Sum Sq Mean Sq F value Pr(>F)
fac1                                        *
fac2
fac1:fac2
Residuals
> y3=da[,5]
> m3=aov(y3~fac1+fac2+fac1*fac2)
> summary(m3)
            Df Sum Sq Mean Sq F value Pr(>F)
fac1
fac2
fac1:fac2
Residuals

**** Joint analysis ****
> m2=manova(y~fac1+fac2+fac1*fac2)
> m2
Call:
   manova(y ~ fac1 + fac2 + fac1 * fac2)

Terms:
                fac1 fac2 fac1:fac2 Residuals
resp 1
resp 2
resp 3
Deg. of Freedom

Residual standard error:
Estimated effects may be unbalanced
> summary(m2,test="Wilks")
          Df  Wilks approx F num Df den Df Pr(>F)
fac1                                           **
fac2                                            *
fac1:fac2
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(m2,test="pillai")
          Df Pillai approx F num Df den Df Pr(>F)
fac1                                           **
fac2                                            *
fac1:fac2
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6 Profile analysis

Profile analysis pertains to situations in which a battery of $p$ treatments is administered to two or more groups of subjects. All responses must be expressed in similar units, and the responses of the different groups are assumed to be independent of one another. In profile analysis, the question of equality of mean vectors is divided into several specific possibilities. For example, consider the case of two groups with mean vectors $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{ip})'$, $i = 1, 2$. The questions of interest in profile analysis are:
1. Are the profiles parallel? Equivalently, is $H_{o1}: \mu_{1j} - \mu_{1,j-1} = \mu_{2j} - \mu_{2,j-1}$ for $j = 2, 3, \ldots, p$, acceptable?
2. Assuming that the profiles are parallel, are they coincident? Equivalently, is $H_{o2}: \mu_{1j} = \mu_{2j}$ for $j = 1, 2, \ldots, p$, acceptable?
3. Assuming that the profiles are coincident, are they level? That is, are all the means equal to the same value? Equivalently, is $H_{o3}: \mu_{11} = \mu_{12} = \cdots = \mu_{1p} = \mu_{21} = \mu_{22} = \cdots = \mu_{2p}$ acceptable?

The null hypothesis $H_{o1}$ can be written as $H_{o1}: C\mu_1 = C\mu_2$, where $C$ is the $(p-1) \times p$ contrast matrix
$C = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -1 & 1 \end{bmatrix}$.
The data can then be transformed to obtain the samples $\{Cx_{1j}\}_{j=1}^{n_1}$ and $\{Cx_{2j}\}_{j=1}^{n_2}$, where $n_1$ and $n_2$ are the sample sizes of the two groups, respectively. Consequently, to test for parallel profiles for two normal populations, one rejects $H_{o1}: C\mu_1 = C\mu_2$ at level $\alpha$ if
$T^2 = (\bar x_1 - \bar x_2)' C' \left[\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) C S_{pool} C'\right]^{-1} C(\bar x_1 - \bar x_2) > c^2$,
where
$c^2 = \dfrac{(n_1+n_2-2)(p-1)}{n_1+n_2-p} F_{p-1,\, n_1+n_2-p}(\alpha)$.
When the profiles are parallel, either $\mu_{1i} > \mu_{2i}$ for all $i$, or $\mu_{1i} < \mu_{2i}$ for all $i$. Under this condition, the profiles will be coincident only if $\sum_{i=1}^p \mu_{1i} = \sum_{i=1}^p \mu_{2i}$, i.e., $\mathbf{1}'\mu_1 = \mathbf{1}'\mu_2$, where $\mathbf{1}$ is the $p$-dimensional vector of ones. Therefore, the second stage of the test is $H_{o2}: \mathbf{1}'\mu_1 = \mathbf{1}'\mu_2$. One can transform the data and apply the usual two-sample t-test. Specifically, to test for coincident profiles, given that the profiles are parallel, one rejects $H_{o2}$ at level $\alpha$ if
$T^2 = \left[ \dfrac{\mathbf{1}'(\bar x_1 - \bar x_2)}{\sqrt{\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) \mathbf{1}' S_{pool} \mathbf{1}}} \right]^2 > t^2_{n_1+n_2-2}(\alpha/2)$.
The next step is to check whether all variables have the same mean, so that the common profile is level. When $H_{o1}$ and $H_{o2}$ hold, the common mean vector $\mu$ is estimated by
$\bar x = \dfrac{n_1}{n_1+n_2}\bar x_1 + \dfrac{n_2}{n_1+n_2}\bar x_2$.
If the common profile is level, then $\mu_1 = \mu_2 = \cdots = \mu_p$, and the third null hypothesis is $H_{o3}: C\mu = 0$, where $C$ is defined in step 1. Thus, to test for level profiles, given that the profiles are coincident, one rejects $H_{o3}$ at level $\alpha$ if
$(n_1+n_2)\, \bar x' C' [CSC']^{-1} C\bar x > c^2$,
where $S$ is the sample covariance matrix based on all $n_1+n_2$ observations and
$c^2 = \dfrac{(n_1+n_2-1)(p-1)}{n_1+n_2-p+1} F_{p-1,\, n_1+n_2-p+1}(\alpha)$.

Remark: An R script profile.r is available on the course web to perform the three profile tests discussed. For illustration, consider the data in Table 6.14 of the textbook. The results are given below:
> source("profile.r")
> da=read.table("T6-14.DAT")
> x1=da[1:30,1:4]
> x2=da[31:60,1:4]
> cbind(x1,x2)
   V1 V2 V3 V4 V1 V2 V3 V4

> profile(x1,x2)
[1] "Are the profiles parallel?"
        [,1]
Test-T
p.value
[1] "Are the profiles coincident?"
        [,1] [,2]
Test-T
p.value
[1] "Are the profiles level?"
        [,1] [,2] [,3]
Test-T   e+01
p.value  e-04
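The first-stage parallelism statistic can be sketched directly (Python illustration using successive-difference contrasts; the function name is my own):

```python
import numpy as np

def parallel_profile_T2(X1, X2):
    """T^2 for H_o1: parallel profiles, using successive-difference contrasts."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    C = np.diff(np.eye(p), axis=0)           # (p-1) x p rows (-1, 1, 0, ...)
    S_pool = ((n1 - 1) * np.cov(X1, rowvar=False)
              + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    d = C @ (X1.mean(axis=0) - X2.mean(axis=0))
    M = (1 / n1 + 1 / n2) * (C @ S_pool @ C.T)
    return d @ np.linalg.solve(M, d)
```

Shifting one group's profile by a constant moves it up or down but keeps it parallel, so the statistic is zero in that case.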

7 Growth curves

The growth curve is a special case of the repeated-measures problem: a single treatment is applied to each subject and a single characteristic is observed over a period of time. For example, we could measure the weight of each puppy at birth and then once a month for a period of time. The weight curve of a dog is of interest and, hence, is referred to as a growth curve.

Consider the Potthoff-Roy model for quadratic growth. Here $p$ measurements on all subjects are taken at times $t_1, t_2, \ldots, t_p$, and the model is
$E(X) = E\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 t_1 + \beta_2 t_1^2 \\ \beta_0 + \beta_1 t_2 + \beta_2 t_2^2 \\ \vdots \\ \beta_0 + \beta_1 t_p + \beta_2 t_p^2 \end{bmatrix}$,
where the $i$th mean $\mu_i$ is the quadratic expression evaluated at $t_i$. When several groups of subjects are involved, one would like to compare the growth curves across the groups. Assume that $g$ groups of subjects are involved and that, for group $l$, the random sample consists of $X_{l1}, \ldots, X_{l,n_l}$, where $n_l > 0$ is the sample size.

Assumption: All of the $X_{lj}$ are independent and have the same covariance matrix $\Sigma$. Under the quadratic growth model, the mean vectors are
$E[X_{lj}] = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_p & t_p^2 \end{bmatrix} \begin{bmatrix} \beta_{l0} \\ \beta_{l1} \\ \beta_{l2} \end{bmatrix} \equiv B\beta_l$.
The model can easily be generalized to a $q$th-order polynomial. Under the assumption of multivariate normality, the MLEs of the $\beta_l$ are
$\hat\beta_l = (B' S_{pool}^{-1} B)^{-1} B' S_{pool}^{-1} \bar X_l$, $\quad l = 1, 2, \ldots, g$,
where
$S_{pool} = \dfrac{1}{N-g}[(n_1-1)S_1 + \cdots + (n_g-1)S_g] = \dfrac{1}{N-g} W$,
with $N = \sum_{l=1}^g n_l$, is the pooled estimator of the common covariance matrix $\Sigma$. The estimated covariances of the maximum likelihood estimates (MLEs) are
$\widehat{\mathrm{Cov}}(\hat\beta_l) = \dfrac{k}{n_l} (B' S_{pool}^{-1} B)^{-1}$, $\quad l = 1, \ldots, g$,

where $k = (N-g)(N-g-1)/[(N-g-p+q)(N-g-p+q+1)]$. The covariance between $\hat\beta_i$ and $\hat\beta_j$ is 0 for $i \neq j$. To test that a $q$th-order polynomial is adequate, the polynomial fit is compared with the unrestricted fit, in which the mean of each group is estimated by its sample mean vector (yielding $W$). Under the polynomial model, the error sum of squares and cross-products matrix becomes
$W_q = \sum_{l=1}^g \sum_{j=1}^{n_l} (X_{lj} - B\hat\beta_l)(X_{lj} - B\hat\beta_l)'$,
which has $N - g + p - q - 1$ degrees of freedom. The likelihood ratio test of the null hypothesis that the $q$th-order polynomial is adequate can be based on Wilks' lambda
$\Lambda^* = \dfrac{|W|}{|W_q|}$.
The difference in the number of parameters between the null and alternative hypotheses is $g(p-q-1)$, so that
$-\left[N - \tfrac{1}{2}(p - q + g)\right] \ln(\Lambda^*) \sim \chi^2_{(p-q-1)g}$
when the $n_l$ are sufficiently large.

Remark: An R script for growth curve analysis, growth.r, is available on the course web. For demonstration, consider the data in Tables 6.5 and 6.6.
> source("growth.r")
> x1=read.table("t6-5.dat")
> x2=read.table("t6-6.dat")
> x=rbind(x1,x2)
> nv=c(nrow(x1),nrow(x2))
> pv=c(0,1,2,3)
> growth(x,nv,pv,2)
[1] "Growth curve model"
[1] "Order: "
[1] 2
[1] "Beta-hat: "
     [,1] [,2]
[1,]
[2,]
[3,]
[1] "Standard errors: "
     [,1] [,2]
[1,]
[2,]
[3,]
[1] "W"

   V1 V2 V3 V4
V1
V2
V3
V4
[1] "Wq"
   V1 V2 V3 V4
V1
V2
V3
V4
[1] "Lambda:"
[1]
[1] "Test result:"
        [,1]
LR-stat
p.value
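The estimator $\hat\beta_l = (B' S_{pool}^{-1} B)^{-1} B' S_{pool}^{-1} \bar x_l$ can be sketched as follows (Python illustration; the function name is my own, and with $S_{pool} = I$ the estimator reduces to ordinary least squares, which the example exploits):

```python
import numpy as np

def growth_betahat(xbar, S_pool, times, q):
    """GLS-type estimate beta = (B' S^{-1} B)^{-1} B' S^{-1} xbar
    for a q-th order polynomial growth curve."""
    B = np.vander(np.asarray(times, float), q + 1, increasing=True)  # cols 1, t, ..., t^q
    Si = np.linalg.inv(S_pool)
    return np.linalg.solve(B.T @ Si @ B, B.T @ Si @ xbar)
```

When the mean vector lies exactly on a quadratic in $t$, the quadratic coefficients are recovered exactly.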


More information

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay Solutions to Final Exam 1. (13 pts) Consider the monthly log returns, in percentages, of five

More information

Chapter 7, continued: MANOVA

Chapter 7, continued: MANOVA Chapter 7, continued: MANOVA The Multivariate Analysis of Variance (MANOVA) technique extends Hotelling T 2 test that compares two mean vectors to the setting in which there are m 2 groups. We wish to

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 17 for Applied Multivariate Analysis Outline Multivariate Analysis of Variance 1 Multivariate Analysis of Variance The hypotheses:

More information

Comparisons of Several Multivariate Populations

Comparisons of Several Multivariate Populations Comparisons of Several Multivariate Populations Edps/Soc 584, Psych 594 Carolyn J Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees,

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Comparisons of Two Means Edps/Soc 584 and Psych 594 Applied Multivariate Statistics Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN c

More information

You can compute the maximum likelihood estimate for the correlation

You can compute the maximum likelihood estimate for the correlation Stat 50 Solutions Comments on Assignment Spring 005. (a) _ 37.6 X = 6.5 5.8 97.84 Σ = 9.70 4.9 9.70 75.05 7.80 4.9 7.80 4.96 (b) 08.7 0 S = Σ = 03 9 6.58 03 305.6 30.89 6.58 30.89 5.5 (c) You can compute

More information

Inferences about a Mean Vector

Inferences about a Mean Vector Inferences about a Mean Vector Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University

More information

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization. 1 Chapter 1: Research Design Principles The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization. 2 Chapter 2: Completely Randomized Design

More information

Multivariate Linear Regression Models

Multivariate Linear Regression Models Multivariate Linear Regression Models Regression analysis is used to predict the value of one or more responses from a set of predictors. It can also be used to estimate the linear association between

More information

Group comparison test for independent samples

Group comparison test for independent samples Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences between means. Supposing that: samples come from normal populations

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

STAT 501 Assignment 2 NAME Spring Chapter 5, and Sections in Johnson & Wichern.

STAT 501 Assignment 2 NAME Spring Chapter 5, and Sections in Johnson & Wichern. STAT 01 Assignment NAME Spring 00 Reading Assignment: Written Assignment: Chapter, and Sections 6.1-6.3 in Johnson & Wichern. Due Monday, February 1, in class. You should be able to do the first four problems

More information

Multiple comparisons - subsequent inferences for two-way ANOVA

Multiple comparisons - subsequent inferences for two-way ANOVA 1 Multiple comparisons - subsequent inferences for two-way ANOVA the kinds of inferences to be made after the F tests of a two-way ANOVA depend on the results if none of the F tests lead to rejection of

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation Using the least squares estimator for β we can obtain predicted values and compute residuals: Ŷ = Z ˆβ = Z(Z Z) 1 Z Y ˆɛ = Y Ŷ = Y Z(Z Z) 1 Z Y = [I Z(Z Z) 1 Z ]Y. The usual decomposition

More information

ANOVA Longitudinal Models for the Practice Effects Data: via GLM

ANOVA Longitudinal Models for the Practice Effects Data: via GLM Psyc 943 Lecture 25 page 1 ANOVA Longitudinal Models for the Practice Effects Data: via GLM Model 1. Saturated Means Model for Session, E-only Variances Model (BP) Variances Model: NO correlation, EQUAL

More information

MANOVA MANOVA,$/,,# ANOVA ##$%'*!# 1. $!;' *$,$!;' (''

MANOVA MANOVA,$/,,# ANOVA ##$%'*!# 1. $!;' *$,$!;' ('' 14 3! "#!$%# $# $&'('$)!! (Analysis of Variance : ANOVA) *& & "#!# +, ANOVA -& $ $ (+,$ ''$) *$#'$)!!#! (Multivariate Analysis of Variance : MANOVA).*& ANOVA *+,'$)$/*! $#/#-, $(,!0'%1)!', #($!#$ # *&,

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

BIOL 458 BIOMETRY Lab 8 - Nested and Repeated Measures ANOVA

BIOL 458 BIOMETRY Lab 8 - Nested and Repeated Measures ANOVA BIOL 458 BIOMETRY Lab 8 - Nested and Repeated Measures ANOVA PART 1: NESTED ANOVA Nested designs are used when levels of one factor are not represented within all levels of another factor. Often this is

More information

Example 1 describes the results from analyzing these data for three groups and two variables contained in test file manova1.tf3.

Example 1 describes the results from analyzing these data for three groups and two variables contained in test file manova1.tf3. Simfit Tutorials and worked examples for simulation, curve fitting, statistical analysis, and plotting. http://www.simfit.org.uk MANOVA examples From the main SimFIT menu choose [Statistcs], [Multivariate],

More information

MANOVA is an extension of the univariate ANOVA as it involves more than one Dependent Variable (DV). The following are assumptions for using MANOVA:

MANOVA is an extension of the univariate ANOVA as it involves more than one Dependent Variable (DV). The following are assumptions for using MANOVA: MULTIVARIATE ANALYSIS OF VARIANCE MANOVA is an extension of the univariate ANOVA as it involves more than one Dependent Variable (DV). The following are assumptions for using MANOVA: 1. Cell sizes : o

More information

3. (a) (8 points) There is more than one way to correctly express the null hypothesis in matrix form. One way to state the null hypothesis is

3. (a) (8 points) There is more than one way to correctly express the null hypothesis in matrix form. One way to state the null hypothesis is Stat 501 Solutions and Comments on Exam 1 Spring 005-4 0-4 1. (a) (5 points) Y ~ N, -1-4 34 (b) (5 points) X (X,X ) = (5,8) ~ N ( 11.5, 0.9375 ) 3 1 (c) (10 points, for each part) (i), (ii), and (v) are

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Mean Vector Inferences

Mean Vector Inferences Mean Vector Inferences Lecture 5 September 21, 2005 Multivariate Analysis Lecture #5-9/21/2005 Slide 1 of 34 Today s Lecture Inferences about a Mean Vector (Chapter 5). Univariate versions of mean vector

More information

Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models

Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models EPSY 905: Multivariate Analysis Spring 2016 Lecture #12 April 20, 2016 EPSY 905: RM ANOVA, MANOVA, and Mixed Models

More information

Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur

Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Applied Multivariate Statistical Modeling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 29 Multivariate Linear Regression- Model

More information

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance

More information

MULTIVARIATE ANALYSIS OF VARIANCE

MULTIVARIATE ANALYSIS OF VARIANCE MULTIVARIATE ANALYSIS OF VARIANCE RAJENDER PARSAD AND L.M. BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 0 0 lmb@iasri.res.in. Introduction In many agricultural experiments,

More information

Statistics for EES Factorial analysis of variance

Statistics for EES Factorial analysis of variance Statistics for EES Factorial analysis of variance Dirk Metzler June 12, 2015 Contents 1 ANOVA and F -Test 1 2 Pairwise comparisons and multiple testing 6 3 Non-parametric: The Kruskal-Wallis Test 9 1 ANOVA

More information

Notes on Maxwell & Delaney

Notes on Maxwell & Delaney Notes on Maxwell & Delaney PSY710 12 higher-order within-subject designs Chapter 11 discussed the analysis of data collected in experiments that had a single, within-subject factor. Here we extend those

More information

STAT 730 Chapter 5: Hypothesis Testing

STAT 730 Chapter 5: Hypothesis Testing STAT 730 Chapter 5: Hypothesis Testing Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 28 Likelihood ratio test def n: Data X depend on θ. The

More information

SOME ASPECTS OF MULTIVARIATE BEHRENS-FISHER PROBLEM

SOME ASPECTS OF MULTIVARIATE BEHRENS-FISHER PROBLEM SOME ASPECTS OF MULTIVARIATE BEHRENS-FISHER PROBLEM Junyong Park Bimal Sinha Department of Mathematics/Statistics University of Maryland, Baltimore Abstract In this paper we discuss the well known multivariate

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

Lecture 6: Single-classification multivariate ANOVA (k-group( MANOVA)

Lecture 6: Single-classification multivariate ANOVA (k-group( MANOVA) Lecture 6: Single-classification multivariate ANOVA (k-group( MANOVA) Rationale and MANOVA test statistics underlying principles MANOVA assumptions Univariate ANOVA Planned and unplanned Multivariate ANOVA

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Two sample T 2 test 1 Two sample T 2 test 2 Analogous to the univariate context, we

More information

Profile Analysis Multivariate Regression

Profile Analysis Multivariate Regression Lecture 8 October 12, 2005 Analysis Lecture #8-10/12/2005 Slide 1 of 68 Today s Lecture Profile analysis Today s Lecture Schedule : regression review multiple regression is due Thursday, October 27th,

More information

Multivariate analysis of variance and covariance

Multivariate analysis of variance and covariance Introduction Multivariate analysis of variance and covariance Univariate ANOVA: have observations from several groups, numerical dependent variable. Ask whether dependent variable has same mean for each

More information

Multivariate Analysis of Variance

Multivariate Analysis of Variance Chapter 15 Multivariate Analysis of Variance Jolicouer and Mosimann studied the relationship between the size and shape of painted turtles. The table below gives the length, width, and height (all in mm)

More information

Rejection regions for the bivariate case

Rejection regions for the bivariate case Rejection regions for the bivariate case The rejection region for the T 2 test (and similarly for Z 2 when Σ is known) is the region outside of an ellipse, for which there is a (1-α)% chance that the test

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Applied Multivariate Analysis

Applied Multivariate Analysis Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2017 Discriminant Analysis Background 1 Discriminant analysis Background General Setup for the Discriminant Analysis Descriptive

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

STAT 501 EXAM I NAME Spring 1999

STAT 501 EXAM I NAME Spring 1999 STAT 501 EXAM I NAME Spring 1999 Instructions: You may use only your calculator and the attached tables and formula sheet. You can detach the tables and formula sheet from the rest of this exam. Show your

More information

Multivariate Linear Models

Multivariate Linear Models Multivariate Linear Models Stanley Sawyer Washington University November 7, 2001 1. Introduction. Suppose that we have n observations, each of which has d components. For example, we may have d measurements

More information

Analysis of variance using orthogonal projections

Analysis of variance using orthogonal projections Analysis of variance using orthogonal projections Rasmus Waagepetersen Abstract The purpose of this note is to show how statistical theory for inference in balanced ANOVA models can be conveniently developed

More information

STT 843 Key to Homework 1 Spring 2018

STT 843 Key to Homework 1 Spring 2018 STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ

More information

Chapter 9. Multivariate and Within-cases Analysis. 9.1 Multivariate Analysis of Variance

Chapter 9. Multivariate and Within-cases Analysis. 9.1 Multivariate Analysis of Variance Chapter 9 Multivariate and Within-cases Analysis 9.1 Multivariate Analysis of Variance Multivariate means more than one response variable at once. Why do it? Primarily because if you do parallel analyses

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Addressing ourliers 1 Addressing ourliers 2 Outliers in Multivariate samples (1) For

More information

Notes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1

Notes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1 Notes for Wee 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1 Exam 3 is on Friday May 1. A part of one of the exam problems is on Predictiontervals : When randomly sampling from a normal population

More information

ANOVA approaches to Repeated Measures. repeated measures MANOVA (chapter 3)

ANOVA approaches to Repeated Measures. repeated measures MANOVA (chapter 3) ANOVA approaches to Repeated Measures univariate repeated-measures ANOVA (chapter 2) repeated measures MANOVA (chapter 3) Assumptions Interval measurement and normally distributed errors (homogeneous across

More information

3. The F Test for Comparing Reduced vs. Full Models. opyright c 2018 Dan Nettleton (Iowa State University) 3. Statistics / 43

3. The F Test for Comparing Reduced vs. Full Models. opyright c 2018 Dan Nettleton (Iowa State University) 3. Statistics / 43 3. The F Test for Comparing Reduced vs. Full Models opyright c 2018 Dan Nettleton (Iowa State University) 3. Statistics 510 1 / 43 Assume the Gauss-Markov Model with normal errors: y = Xβ + ɛ, ɛ N(0, σ

More information

Booth School of Business, University of Chicago Business 41914, Spring Quarter 2017, Mr. Ruey S. Tsay. Solutions to Midterm

Booth School of Business, University of Chicago Business 41914, Spring Quarter 2017, Mr. Ruey S. Tsay. Solutions to Midterm Booth School of Business, University of Chicago Business 41914, Spring Quarter 017, Mr Ruey S Tsay Solutions to Midterm Problem A: (51 points; 3 points per question) Answer briefly the following questions

More information

4.1 Order Specification

4.1 Order Specification THE UNIVERSITY OF CHICAGO Booth School of Business Business 41914, Spring Quarter 2009, Mr Ruey S Tsay Lecture 7: Structural Specification of VARMA Models continued 41 Order Specification Turn to data

More information

Analysis of Variance. ภาว น ศ ร ประภาน ก ล คณะเศรษฐศาสตร มหาว ทยาล ยธรรมศาสตร

Analysis of Variance. ภาว น ศ ร ประภาน ก ล คณะเศรษฐศาสตร มหาว ทยาล ยธรรมศาสตร Analysis of Variance ภาว น ศ ร ประภาน ก ล คณะเศรษฐศาสตร มหาว ทยาล ยธรรมศาสตร pawin@econ.tu.ac.th Outline Introduction One Factor Analysis of Variance Two Factor Analysis of Variance ANCOVA MANOVA Introduction

More information

Multivariate Data Analysis Notes & Solutions to Exercises 3

Multivariate Data Analysis Notes & Solutions to Exercises 3 Notes & Solutions to Exercises 3 ) i) Measurements of cranial length x and cranial breadth x on 35 female frogs 7.683 0.90 gave x =(.860, 4.397) and S. Test the * 4.407 hypothesis that =. Using the result

More information

On Selecting Tests for Equality of Two Normal Mean Vectors

On Selecting Tests for Equality of Two Normal Mean Vectors MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department

More information

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum T-test: means of Spock's judge versus all other judges 1 The TTEST Procedure Variable: pcwomen judge1 N Mean Std Dev Std Err Minimum Maximum OTHER 37 29.4919 7.4308 1.2216 16.5000 48.9000 SPOCKS 9 14.6222

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Random Matrices and Multivariate Statistical Analysis

Random Matrices and Multivariate Statistical Analysis Random Matrices and Multivariate Statistical Analysis Iain Johnstone, Statistics, Stanford imj@stanford.edu SEA 06@MIT p.1 Agenda Classical multivariate techniques Principal Component Analysis Canonical

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

Multiple Pairwise Comparison Procedures in One-Way ANOVA with Fixed Effects Model

Multiple Pairwise Comparison Procedures in One-Way ANOVA with Fixed Effects Model Biostatistics 250 ANOVA Multiple Comparisons 1 ORIGIN 1 Multiple Pairwise Comparison Procedures in One-Way ANOVA with Fixed Effects Model When the omnibus F-Test for ANOVA rejects the null hypothesis that

More information

Week 14 Comparing k(> 2) Populations

Week 14 Comparing k(> 2) Populations Week 14 Comparing k(> 2) Populations Week 14 Objectives Methods associated with testing for the equality of k(> 2) means or proportions are presented. Post-testing concepts and analysis are introduced.

More information

6 Multivariate Regression

6 Multivariate Regression 6 Multivariate Regression 6.1 The Model a In multiple linear regression, we study the relationship between several input variables or regressors and a continuous target variable. Here, several target variables

More information

WITHIN-PARTICIPANT EXPERIMENTAL DESIGNS

WITHIN-PARTICIPANT EXPERIMENTAL DESIGNS 1 WITHIN-PARTICIPANT EXPERIMENTAL DESIGNS I. Single-factor designs: the model is: yij i j ij ij where: yij score for person j under treatment level i (i = 1,..., I; j = 1,..., n) overall mean βi treatment

More information

Comparing two independent samples

Comparing two independent samples In many applications it is necessary to compare two competing methods (for example, to compare treatment effects of a standard drug and an experimental drug). To compare two methods from statistical point

More information

Multivariate Regression (Chapter 10)

Multivariate Regression (Chapter 10) Multivariate Regression (Chapter 10) This week we ll cover multivariate regression and maybe a bit of canonical correlation. Today we ll mostly review univariate multivariate regression. With multivariate

More information

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Ma 3/103: Lecture 24 Linear Regression I: Estimation Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the

More information

MULTIVARIATE POPULATIONS

MULTIVARIATE POPULATIONS CHAPTER 5 MULTIVARIATE POPULATIONS 5. INTRODUCTION In the following chapters we will be dealing with a variety of problems concerning multivariate populations. The purpose of this chapter is to provide

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013

18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013 18.S096 Problem Set 3 Fall 013 Regression Analysis Due Date: 10/8/013 he Projection( Hat ) Matrix and Case Influence/Leverage Recall the setup for a linear regression model y = Xβ + ɛ where y and ɛ are

More information

M M Cross-Over Designs

M M Cross-Over Designs Chapter 568 Cross-Over Designs Introduction This module calculates the power for an x cross-over design in which each subject receives a sequence of treatments and is measured at periods (or time points).

More information

ANOVA in SPSS. Hugo Quené. opleiding Taalwetenschap Universiteit Utrecht Trans 10, 3512 JK Utrecht.

ANOVA in SPSS. Hugo Quené. opleiding Taalwetenschap Universiteit Utrecht Trans 10, 3512 JK Utrecht. ANOVA in SPSS Hugo Quené hugo.quene@let.uu.nl opleiding Taalwetenschap Universiteit Utrecht Trans 10, 3512 JK Utrecht 7 Oct 2005 1 introduction In this example I ll use fictitious data, taken from http://www.ruf.rice.edu/~mickey/psyc339/notes/rmanova.html.

More information

T. Mark Beasley One-Way Repeated Measures ANOVA handout

T. Mark Beasley One-Way Repeated Measures ANOVA handout T. Mark Beasley One-Way Repeated Measures ANOVA handout Profile Analysis Example In the One-Way Repeated Measures ANOVA, two factors represent separate sources of variance. Their interaction presents an

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Design of Engineering Experiments Part 5 The 2 k Factorial Design

Design of Engineering Experiments Part 5 The 2 k Factorial Design Design of Engineering Experiments Part 5 The 2 k Factorial Design Text reference, Special case of the general factorial design; k factors, all at two levels The two levels are usually called low and high

More information

Hypothesis Testing hypothesis testing approach

Hypothesis Testing hypothesis testing approach Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Chapter 9. Hotelling s T 2 Test. 9.1 One Sample. The one sample Hotelling s T 2 test is used to test H 0 : µ = µ 0 versus
