THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2012, Mr. Ruey S. Tsay
Lecture 3: Comparisons between several multivariate means

Key concepts:
1. Paired comparison & repeated measures
2. Comparing means of two populations
3. Comparing means of several populations: one-way multivariate analysis of variance
4. Testing for equality of covariance matrices
5. Two-way multivariate analysis of variance
6. Profile analysis
7. Growth curves

Key assumption: normality or large sample sizes.

1 Paired comparison

A procedure to eliminate the influence of extraneous unit-to-unit variation: measurements are taken on the same (or identical) units under different treatments.

Recall the univariate paired t-test. Let $X_{j1}$ and $X_{j2}$ be the responses of unit $j$ to treatments 1 and 2, respectively, where $j = 1, \ldots, n$. Let $D_j = X_{j1} - X_{j2}$ be the difference between the treatments. Because both responses come from the same or an identical unit, the unit-to-unit variation cancels in $D_j$. Assume $D_j \sim N(\delta, \sigma_d^2)$. Consider testing $H_0: \delta = 0$ versus $H_a: \delta \neq 0$. The paired t-statistic is
\[
t = \frac{\bar{D} - \delta}{s_d/\sqrt{n}}, \qquad \text{where } \bar{D} = \frac{1}{n}\sum_{j=1}^n D_j \ \text{ and } \ s_d^2 = \frac{1}{n-1}\sum_{j=1}^n (D_j - \bar{D})^2.
\]
One rejects $H_0$ if and only if $|t| \geq t_{n-1}(\alpha/2)$. The corresponding $100(1-\alpha)\%$ confidence interval for the mean difference $\delta = E(X_{j1} - X_{j2})$ is
\[
\bar{D} - t_{n-1}(\alpha/2)\,\frac{s_d}{\sqrt{n}} \ \leq\ \delta \ \leq\ \bar{D} + t_{n-1}(\alpha/2)\,\frac{s_d}{\sqrt{n}}.
\]

Generalization: Suppose that $p$ measurements are taken from each unit, so the responses are $X_{1ji}$ and $X_{2ji}$, where $X_{1ji}$ is the measurement of the $i$th variable on the $j$th unit under treatment
1, and $X_{2ji}$ is the measurement of the $i$th variable on the $j$th unit under treatment 2. The difference is then $D_{ji} = X_{1ji} - X_{2ji}$ and $\mathbf{D}_j = (D_{j1}, \ldots, D_{jp})'$. Assume that $E(\mathbf{D}_j) = \boldsymbol{\delta} = (\delta_1, \ldots, \delta_p)'$ and $\mathrm{Cov}(\mathbf{D}_j) = \Sigma_d$. If we further assume $\mathbf{D}_j \sim N_p(\boldsymbol{\delta}, \Sigma_d)$, then we can make inference about $\boldsymbol{\delta}$ using Hotelling's $T^2$ statistic
\[
T^2 = n(\bar{\mathbf{D}} - \boldsymbol{\delta})' S_d^{-1} (\bar{\mathbf{D}} - \boldsymbol{\delta}),
\]
where $\bar{\mathbf{D}} = \frac{1}{n}\sum_{j=1}^n \mathbf{D}_j$ and $S_d = \frac{1}{n-1}\sum_{j=1}^n (\mathbf{D}_j - \bar{\mathbf{D}})(\mathbf{D}_j - \bar{\mathbf{D}})'$.

Result 6.1. Let the differences $\mathbf{D}_1, \ldots, \mathbf{D}_n$ be a random sample from an $N_p(\boldsymbol{\delta}, \Sigma_d)$ population. Then $T^2 = n(\bar{\mathbf{D}} - \boldsymbol{\delta})' S_d^{-1} (\bar{\mathbf{D}} - \boldsymbol{\delta})$ is distributed as an $[(n-1)p/(n-p)]\,F_{p,n-p}$ random variable. If $n$ and $n - p$ are both large, $T^2$ is approximately distributed as a $\chi^2_p$ random variable.

Inference: Suppose the observed sample is $\mathbf{d}_1, \ldots, \mathbf{d}_n$ from an $N_p(\boldsymbol{\delta}, \Sigma_d)$ population. Then reject $H_0: \boldsymbol{\delta} = \mathbf{0}$ in favor of $H_a: \boldsymbol{\delta} \neq \mathbf{0}$ if
\[
T^2 = n\,\bar{\mathbf{d}}' S_d^{-1} \bar{\mathbf{d}} > \frac{(n-1)p}{n-p}\, F_{p,n-p}(\alpha).
\]
A $100(1-\alpha)\%$ confidence region for $\boldsymbol{\delta}$ is
\[
(\bar{\mathbf{d}} - \boldsymbol{\delta})' S_d^{-1} (\bar{\mathbf{d}} - \boldsymbol{\delta}) \leq \frac{(n-1)p}{n(n-p)}\, F_{p,n-p}(\alpha).
\]
The $100(1-\alpha)\%$ simultaneous confidence intervals for the individual mean differences $\delta_i$ are
\[
\bar{d}_i \pm \sqrt{\frac{(n-1)p}{n-p}\, F_{p,n-p}(\alpha)}\ \sqrt{\frac{s^2_{d_i}}{n}},
\]
where $\bar{d}_i$ is the $i$th element of $\bar{\mathbf{d}}$ and $s^2_{d_i}$ is the $(i,i)$th element of $S_d$. The Bonferroni $100(1-\alpha)\%$ simultaneous confidence intervals for the individual mean differences are
\[
\bar{d}_i \pm t_{n-1}(\alpha/(2p))\,\sqrt{\frac{s^2_{d_i}}{n}}.
\]
Finally, if $n$ and $n - p$ are sufficiently large, the normality assumption can be dropped, and the simultaneous C.I.s can be obtained by replacing $[(n-1)p/(n-p)]F_{p,n-p}(\alpha)$ with $\chi^2_p(\alpha)$.

Remark: The R programs of Chapter 5 can be used to perform paired comparisons.

Example: Consider the data in Table 6.1 of the text.
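The course script Hotelling.R is not reproduced in these notes. As a rough guide, below is a minimal sketch of the computation such a function presumably performs for the paired differences (the function name, argument order, and output layout are assumptions); the actual session using the course scripts follows.

# Sketch: Hotelling's T^2 test that the mean of the differences d equals mu0.
paired.hotelling <- function(d, mu0) {
  d <- as.matrix(d)
  n <- nrow(d); p <- ncol(d)
  dbar <- colMeans(d)                       # sample mean of the differences
  Sd <- cov(d)                              # sample covariance of the differences
  T2 <- n * t(dbar - mu0) %*% solve(Sd) %*% (dbar - mu0)
  Fstat <- (n - p) / ((n - 1) * p) * T2     # exact F conversion under normality
  pval <- 1 - pf(Fstat, p, n - p)
  c(T2 = T2, p.value = pval)
}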
> x=read.table("t6-1.dat")
> d1=x[,1]-x[,3]
> d2=x[,2]-x[,4]
> d=cbind(d1,d2)
> source("Hotelling.R")
> Hotelling(d,rep(0,2))
              [,1]
Hoteliing-T
p.value
> source("cregion.r")
> confreg(d)
[1] "C.R. based on T^2"
     [,1] [,2]
[1,]
[2,]
[1] "CR based on individual t"
     [,1] [,2]
[1,]
[2,]
[1] "CR based on Bonferroni"
     [,1] [,2]
[1,]
[2,]
[1] "Asymp. simu. CR"
     [,1] [,2]
[1,]
[2,]

Contrast matrices.

Definition: A $p$-dimensional vector is called a contrast vector if its elements sum to zero. By definition, contrast vectors are orthogonal to the vector of ones. An $m \times k$ matrix is called a contrast matrix if all of its rows are contrast vectors. For example, $\mathbf{c} = (1, 0, -1, 0)'$ is a contrast vector.

The paired comparisons above can be carried out with a contrast matrix. For example, consider the effluent data of Example 6.1. Instead of computing the differenced data, we can work directly with the observations in Table 6.1. The observation for Sample 1 is $\mathbf{x}_1' = (6, 27, 25, 15)$. Construct the contrast matrix
\[
C = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}. \tag{1}
\]
Clearly, the differenced data are $\mathbf{d}_j = C\mathbf{x}_j$ for $j = 1, \ldots, n$. Furthermore, $\bar{\mathbf{d}} = C\bar{\mathbf{x}}$ and $S_d = CSC'$, where $S$ is the sample covariance matrix of the data. The $T^2$ statistic then becomes
\[
T^2 = n\,\bar{\mathbf{x}}' C' (CSC')^{-1} C\bar{\mathbf{x}}.
\]
Consequently, there is no need to calculate the differenced data $\mathbf{d}_j$.

This idea is particularly useful in analyzing repeated measures, in which different treatments are applied to each unit once over successive periods of time. Suppose there are $q$ treatments; then the observation for the $j$th unit is $\mathbf{X}_j = (X_{j1}, X_{j2}, \ldots, X_{jq})'$, $j = 1, \ldots, n$. Let $\boldsymbol{\mu} = E(\mathbf{X}_j)$. Testing the hypothesis that all treatments have the same effect is equivalent to testing that all elements of $\boldsymbol{\mu}$ are equal. To this end, we can construct a contrast matrix $C_1$ so that
\[
C_1\boldsymbol{\mu} = \begin{bmatrix} \mu_1 - \mu_2 \\ \mu_1 - \mu_3 \\ \vdots \\ \mu_1 - \mu_q \end{bmatrix},
\qquad \text{or } C_2 \text{ so that} \qquad
C_2\boldsymbol{\mu} = \begin{bmatrix} \mu_1 - \mu_2 \\ \mu_2 - \mu_3 \\ \vdots \\ \mu_{q-1} - \mu_q \end{bmatrix}.
\]
The problem then is to test $C_1\boldsymbol{\mu} = \mathbf{0}$ or $C_2\boldsymbol{\mu} = \mathbf{0}$. [Other contrast matrices are available.] This leads to the $T^2$ test statistic
\[
T^2 = n(C\bar{\mathbf{X}})' (CSC')^{-1} (C\bar{\mathbf{X}}).
\]

Remark. The $T^2$ statistic does not depend on the choice of contrast matrix $C$. This is because $\mathrm{rank}(C_1) = \mathrm{rank}(C_2) = q - 1$ and every row of each $C_i$ is orthogonal to the vector of ones. Consequently, the rows of $C_1$ and the rows of $C_2$ span the same $(q-1)$-dimensional subspace orthogonal to $\mathbf{1}_q$. Thus there exists a non-singular $(q-1)\times(q-1)$ matrix $B$ such that $C_1 = BC_2$. For instance, for the $C_1$ and $C_2$ matrices given above, $B$ is the lower-triangular matrix of ones, since row $i$ of $C_1$ equals the sum of the first $i$ rows of $C_2$. It is then easy to show that $C_1$ and $C_2$ give the same $T^2$ statistic.
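The course script contrast.r used later in this section is not reproduced here. The following is a minimal sketch of the contrast-based repeated-measures computation it presumably performs (function name, arguments, and output format are assumptions, not the actual script):

# Sketch: repeated-measures T^2 test of H0: C mu = 0 for an n x q data matrix x,
# with simultaneous confidence intervals for each row contrast of C.
contrast.T2 <- function(x, C, alpha = 0.05) {
  x <- as.matrix(x)
  n <- nrow(x); q <- ncol(x)
  xbar <- colMeans(x)
  S <- cov(x)
  CS <- C %*% S %*% t(C)
  T2 <- n * t(C %*% xbar) %*% solve(CS) %*% (C %*% xbar)
  crit <- (n - 1) * (q - 1) / (n - q + 1) * qf(1 - alpha, q - 1, n - q + 1)
  pval <- 1 - pf((n - q + 1) / ((n - 1) * (q - 1)) * T2, q - 1, n - q + 1)
  hw <- sqrt(crit) * sqrt(diag(CS) / n)      # half-widths of the simultaneous C.I.s
  ci <- cbind(lower = C %*% xbar - hw, upper = C %*% xbar + hw)
  list(T2 = as.numeric(T2), p.value = as.numeric(pval), CI = ci)
}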
Based on the prior discussion, we can test the equality of treatments in a repeated-measures setting using the result below. Consider an $N_q(\boldsymbol{\mu}, \Sigma)$ population and let $C$ be a contrast matrix. An $\alpha$-level test of $H_0: C\boldsymbol{\mu} = \mathbf{0}$ versus $H_a: C\boldsymbol{\mu} \neq \mathbf{0}$ rejects $H_0$ if
\[
T^2 = n(C\bar{\mathbf{x}})' (CSC')^{-1} (C\bar{\mathbf{x}}) > \frac{(n-1)(q-1)}{n-q+1}\, F_{q-1,n-q+1}(\alpha),
\]
where $\bar{\mathbf{x}}$ and $S$ are the sample mean and covariance matrix. A confidence region for the contrasts $C\boldsymbol{\mu}$ is
\[
n(C\bar{\mathbf{x}} - C\boldsymbol{\mu})' (CSC')^{-1} (C\bar{\mathbf{x}} - C\boldsymbol{\mu}) \leq \frac{(n-1)(q-1)}{n-q+1}\, F_{q-1,n-q+1}(\alpha).
\]
Consequently, simultaneous $100(1-\alpha)\%$ confidence intervals for single contrasts $\mathbf{c}'\boldsymbol{\mu}$, for any contrast vectors $\mathbf{c}$ of interest, are
\[
\mathbf{c}'\bar{\mathbf{x}} \pm \sqrt{\frac{(n-1)(q-1)}{n-q+1}\, F_{q-1,n-q+1}(\alpha)}\ \sqrt{\frac{\mathbf{c}'S\mathbf{c}}{n}}.
\]

Example. Consider the sleeping-dog data in Table 6.2. There are 19 observations and four treatments. To analyze the data, an R program called contrast.r was developed. The analysis is given below:

> x=read.table("t6-2.dat")
> dim(x)
[1] 19  4
> x
   V1 V2 V3 V4
> source("contrast.r")
> cmtx=matrix(c(-1,1,1,-1,-1,-1,1,1,-1,1,-1,1),3,4)
> cmtx
     [,1] [,2] [,3] [,4]
[1,]   -1   -1    1    1
[2,]    1   -1    1   -1
[3,]    1   -1   -1    1
> contrast(x,cmtx)
[1] "Hotelling Tsq statistics & p-value"
[1]   e   e-07
[1] "Simultaneous C.I. for each contrast"
     [,1] [,2]
[1,]
[2,]
[3,]

2 Comparing mean vectors of two populations

The setup:
1. $\mathbf{X}_{11}, \mathbf{X}_{12}, \ldots, \mathbf{X}_{1,n_1}$ is a $p$-dimensional random sample of size $n_1$ from a population with mean $\boldsymbol{\mu}_1$ and covariance matrix $\Sigma_1$.
2. $\mathbf{X}_{21}, \mathbf{X}_{22}, \ldots, \mathbf{X}_{2,n_2}$ is a $p$-dimensional random sample of size $n_2$ from a population with mean $\boldsymbol{\mu}_2$ and covariance matrix $\Sigma_2$.
3. The two random samples are independent.

If $n_1$ and $n_2$ are small, some additional assumptions are needed:
1. both populations are normal;
2. $\Sigma_1 = \Sigma_2 = \Sigma$.

Problem of interest: $H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \boldsymbol{\delta}_0$ versus $H_a: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 \neq \boldsymbol{\delta}_0$.

Denote the sample means and covariance matrices of the two random samples by $\bar{\mathbf{x}}_1, S_1$ and $\bar{\mathbf{x}}_2, S_2$, respectively. Under the assumption that $\Sigma_1 = \Sigma_2$, we can obtain a pooled estimate of the covariance matrix,
\[
S_{\mathrm{pool}} = \frac{n_1 - 1}{n_1 + n_2 - 2}\, S_1 + \frac{n_2 - 1}{n_1 + n_2 - 2}\, S_2.
\]
This pooled estimator is unbiased: $E(S_{\mathrm{pool}}) = \Sigma$. Note that $E(\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2) = \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2$ and
\[
\mathrm{Cov}(\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2) = \mathrm{Cov}(\bar{\mathbf{X}}_1) + \mathrm{Cov}(\bar{\mathbf{X}}_2) = \frac{1}{n_1}\Sigma + \frac{1}{n_2}\Sigma,
\]
which can be estimated by $\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{\mathrm{pool}}$. The following result holds.

Result 6.2. If $\mathbf{X}_{11}, \ldots, \mathbf{X}_{1,n_1}$ form a random sample from $N_p(\boldsymbol{\mu}_1, \Sigma)$, $\mathbf{X}_{21}, \ldots, \mathbf{X}_{2,n_2}$ form a random sample from $N_p(\boldsymbol{\mu}_2, \Sigma)$, and the two random samples are independent, then
\[
T^2 = [\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2 - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)]' \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{\mathrm{pool}}\right]^{-1} [\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2 - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)]
\]
is distributed as
\[
\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}\, F_{p,\, n_1 + n_2 - p - 1}.
\]
Consequently, $P(T^2 \leq c^2) = 1 - \alpha$, where $c^2 = \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\, n_1 + n_2 - p - 1}(\alpha)$.

Proof: (1) $\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2 \sim N_p(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2, [(1/n_1) + (1/n_2)]\Sigma)$; (2) $(n_1 - 1)S_1 \sim W_{n_1-1}(\Sigma)$ and $(n_2 - 1)S_2 \sim W_{n_2-1}(\Sigma)$; and (3) $(n_1-1)S_1$ and $(n_2-1)S_2$ are independent, so that $(n_1-1)S_1 + (n_2-1)S_2 \sim W_{n_1+n_2-2}(\Sigma)$.

Result 6.3. Let $c^2 = \frac{(n_1+n_2-2)p}{n_1+n_2-p-1} F_{p,\,n_1+n_2-p-1}(\alpha)$. With probability $1 - \alpha$,
\[
\mathbf{a}'(\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2) \pm c\,\sqrt{\mathbf{a}'\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{\mathrm{pool}}\,\mathbf{a}}
\]
will cover $\mathbf{a}'(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$ for all $\mathbf{a}$. Thus, the simultaneous confidence intervals for $\mu_{1i} - \mu_{2i}$ are
\[
(\bar{X}_{1i} - \bar{X}_{2i}) \pm c\,\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{ii,\mathrm{pool}}}, \qquad i = 1, \ldots, p,
\]
where $s_{ii,\mathrm{pool}}$ denotes the $(i,i)$th element of the matrix $S_{\mathrm{pool}}$. The Bonferroni $100(1-\alpha)\%$ simultaneous C.I.s for $\mu_{1i} - \mu_{2i}$ are
\[
(\bar{x}_{1i} - \bar{x}_{2i}) \pm t_{n_1+n_2-2}(\alpha/(2p))\,\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{ii,\mathrm{pool}}}.
\]

Case: $\Sigma_1 \neq \Sigma_2$. In this case there is no pooling in covariance matrix estimation, and we typically require that $n_1 - p$ and $n_2 - p$ be sufficiently large. One can then replace $\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{\mathrm{pool}}$ by $\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2$ and the $F$-distribution by the $\chi^2_p$ distribution.
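Both versions of the statistic are easy to compute directly. The following minimal R sketch is not from the course scripts; the function name and the switch between the pooled and the large-sample unequal-covariance versions are assumptions.

# Sketch: two-sample Hotelling T^2 test of H0: mu1 - mu2 = delta0.
# pooled = TRUE uses S_pool with the exact F reference distribution;
# pooled = FALSE uses S1/n1 + S2/n2 with the large-sample chi-square reference.
two.sample.T2 <- function(x1, x2, delta0 = NULL, pooled = TRUE) {
  x1 <- as.matrix(x1); x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  if (is.null(delta0)) delta0 <- rep(0, p)
  dbar <- colMeans(x1) - colMeans(x2) - delta0
  S1 <- cov(x1); S2 <- cov(x2)
  if (pooled) {
    Sp <- ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    V <- (1 / n1 + 1 / n2) * Sp
    T2 <- drop(t(dbar) %*% solve(V) %*% dbar)
    Fstat <- (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
    pval <- 1 - pf(Fstat, p, n1 + n2 - p - 1)
  } else {
    V <- S1 / n1 + S2 / n2
    T2 <- drop(t(dbar) %*% solve(V) %*% dbar)
    pval <- 1 - pchisq(T2, p)               # large-sample approximation
  }
  c(T2 = T2, p.value = pval)
}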
When $n_1 = n_2 = n$,
\[
\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2 = \frac{1}{n}(S_1 + S_2)
= \left(\frac{1}{n} + \frac{1}{n}\right)\left[\frac{(n-1)S_1}{(n-1)+(n-1)} + \frac{(n-1)S_2}{(n-1)+(n-1)}\right]
= \left(\frac{1}{n} + \frac{1}{n}\right)\frac{(n-1)S_1 + (n-1)S_2}{n + n - 2}
= \left(\frac{1}{n} + \frac{1}{n}\right) S_{\mathrm{pool}}.
\]
Thus, when $n_1 = n_2$, the large-sample procedure is essentially the same as the one using the pooled covariance matrix. The impact of unequal covariance matrices is therefore least when the sample sizes are equal, and would be greater if either $n_1 \ll n_2$ or $n_2 \ll n_1$.

The Behrens-Fisher problem. Test $H_0: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$ versus $H_a: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 \neq \mathbf{0}$, where the two populations are normally distributed but have different covariance matrices, and the sample sizes are not large. [Of course, $n_1 > p$ and $n_2 > p$ are needed in order to estimate the covariance matrices.] The key issue is the distribution of
\[
T^2 = [\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2 - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)]' \left[\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right]^{-1} [\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2 - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)]
\]
when $n_1 - p$ and $n_2 - p$ are small. This problem has been widely studied in the literature; see, for instance, Krishnamoorthy and Yu (2004, Statistics & Probability Letters) and Nel and Van der Merwe (1986, Communications in Statistics - Theory and Methods). A recommended method is to approximate the distribution of $T^2$ by
\[
T^2 \approx \frac{vp}{v - p + 1}\, F_{p,\, v - p + 1},
\]
where
\[
v = \frac{p + p^2}{\displaystyle\sum_{i=1}^2 \frac{1}{n_i}\left\{ \mathrm{tr}\!\left[\left(\frac{1}{n_i}S_i\left(\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right)^{-1}\right)^{2}\right] + \left(\mathrm{tr}\!\left[\frac{1}{n_i}S_i\left(\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right)^{-1}\right]\right)^{2}\right\}},
\]
with $\min(n_1, n_2) \leq v \leq n_1 + n_2$.

Remark: An R script Behrens.R is available to perform the test; see the course web site. For illustration, consider the effluent data in Table 6.1. The paired comparison rejects the null hypothesis of equal means. The result of using the Behrens-Fisher approach is given below.

> x=read.table("t6-1.dat")
> dim(x)
[1] 11  4
> x1=x[,1:2]
> x2=x[,3:4]
> source("Behrens.R")
> Behrens(x1,x2)
[1] "Estimate of v: "
[1]
[1] "Test result:"
        [,1]
Test-T
p.value

It also rejects the null hypothesis.

3 Comparing mean vectors of several populations

Setup: $g$ populations, with $n_l$ observations from population $l$.
1. $\{\mathbf{X}_{l,1}, \mathbf{X}_{l,2}, \ldots, \mathbf{X}_{l,n_l}\}$ is a random sample of size $n_l$ from a population with mean $\boldsymbol{\mu}_l$, where $l = 1, \ldots, g$. The random samples from different populations are independent.
2. All populations have a common covariance matrix $\Sigma$, which is positive definite.
3. Each population is multivariate normal with dimension $p$.

The normality assumption in Condition 3 can be relaxed when the sample sizes are sufficiently large.

Hypothesis of interest:
\[
H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_g \quad \text{versus} \quad H_a: \boldsymbol{\mu}_i \neq \boldsymbol{\mu}_j \ \text{for some } 1 \leq i, j \leq g,\ i \neq j.
\]

Univariate case: Recall the case of $p = 1$. The null hypothesis $\mu_1 = \mu_2 = \cdots = \mu_g$ can be written as $\tau_1 = \tau_2 = \cdots = \tau_g = 0$, where $\tau_l$ is the deviation of $\mu_l$ from the overall mean $\mu$, i.e., $\mu_l = \mu + \tau_l$. The model can then be written as
\[
X_{l,j} = \mu + \tau_l + e_{l,j}, \qquad l = 1, \ldots, g;\ j = 1, \ldots, n_l,
\]
where $e_{l,j} \sim N(0, \sigma^2)$. For unique identification of the parameters, it is commonly assumed that $\sum_{l=1}^g n_l \tau_l = 0$. For the data, an analogous decomposition is
\[
x_{l,j} = \bar{x} + (\bar{x}_l - \bar{x}) + (x_{l,j} - \bar{x}_l),
\]
where $\bar{x} = \left(\sum_{l=1}^g \sum_{j=1}^{n_l} x_{l,j}\right)/n$ with $n = \sum_{l=1}^g n_l$, and $\bar{x}_l = \left(\sum_{j=1}^{n_l} x_{l,j}\right)/n_l$. Here $\bar{x}$ is an estimate of $\mu$, $(\bar{x}_l - \bar{x})$ is an estimate of $\tau_l$, and $(x_{l,j} - \bar{x}_l)$ is an estimate of the error term $e_{l,j}$. Subtracting $\bar{x}$ from the prior equation, squaring, and summing, we have the identity
\[
\sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar{x})^2 = \sum_{l=1}^g n_l (\bar{x}_l - \bar{x})^2 + \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar{x}_l)^2. \tag{2}
\]
The cross-product term drops out because it is zero, indicating that the terms are orthogonal to each other. This identity is often thought of as
\[
\begin{pmatrix}\text{Sum of Squares}\\ \text{of Total Variation}\end{pmatrix} = \begin{pmatrix}\text{Sum of Squares}\\ \text{of Treatments}\end{pmatrix} + \begin{pmatrix}\text{Sum of Squares}\\ \text{of Residuals}\end{pmatrix}.
\]
In addition, the numbers of independent quantities in the terms of the identity are related by
\[
\sum_{l=1}^g n_l - 1 = (g - 1) + \sum_{l=1}^g (n_l - 1).
\]
These are the degrees of freedom of the terms. The univariate analysis of variance (ANOVA) table summarizes the above results.

Source of variation   Sum of Squares                                                        Degrees of freedom
Treatments            $SS_{tr} = \sum_{l=1}^g n_l (\bar{x}_l - \bar{x})^2$                    $g - 1$
Residuals             $SS_{res} = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar{x}_l)^2$      $\sum_{l=1}^g n_l - g$
Total                 $SS_{tot} = \sum_{l=1}^g \sum_{j=1}^{n_l} (x_{l,j} - \bar{x})^2$        $\sum_{l=1}^g n_l - 1$

The usual $F$-test rejects the null hypothesis $H_0: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ at the $\alpha$ level if
\[
F = \frac{SS_{tr}/(g-1)}{SS_{res}/\left(\sum_{l=1}^g n_l - g\right)} > F_{g-1,\ \sum_l n_l - g}(\alpha).
\]
The rationale for the $F$-test is as follows. $\bar{x}_l$ is an estimate of $\mu_l$, so the numerator is a weighted measure of the variation of the $\bar{x}_l$ between the $g$ populations, where the weights depend on the sample size of each population. The issue then is to judge the magnitude of this variation. The denominator provides a reference measure because it is an estimate of the random variation (i.e., $\sigma^2$) of the data. If the variation between the populations is large relative to the random noise, the means are declared to be different.

Remark: The R command for univariate analysis of variance is aov. For illustration, consider the data in Example 6.7 of the text. The R analysis corresponding to Example 6.8 is as follows.
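For reference, the decomposition in (2) can also be verified by hand. The short sketch below is not part of the course materials; it simply computes the two sums of squares and the F ratio for the same data used in the aov session that follows.

# Sketch: one-way ANOVA sums of squares and F test computed directly.
y <- c(9, 6, 9, 0, 2, 3, 1, 2)            # responses (Example 6.7 of the text)
x <- c(1, 1, 1, 2, 2, 3, 3, 3)            # population (treatment) labels
g <- length(unique(x)); n <- length(y)
ybar <- mean(y)                            # overall mean
ybar.l <- tapply(y, x, mean)               # group means
n.l <- tapply(y, x, length)                # group sizes
SS.tr <- sum(n.l * (ybar.l - ybar)^2)      # treatment sum of squares
SS.res <- sum((y - ybar.l[as.character(x)])^2)   # residual sum of squares
Fstat <- (SS.tr / (g - 1)) / (SS.res / (n - g))
pval <- 1 - pf(Fstat, g - 1, n - g)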
> x=c(1,1,1,2,2,3,3,3)
> y=c(9,6,9,0,2,3,1,2)
> g1=factor(x)
> g1
[1] 1 1 1 2 2 3 3 3
Levels: 1 2 3
> help(aov)
> m1=aov(y~g1)
> m1
Call:
   aov(formula = y ~ g1)

Terms:
                 g1 Residuals
Sum of Squares
Deg. of Freedom   2         5

Residual standard error:
Estimated effects may be unbalanced
> summary(m1)
            Df Sum Sq Mean Sq F value Pr(>F)
g1           2                             **
Residuals    5

Multivariate case. When $p > 1$, the model becomes
\[
\mathbf{X}_{l,j} = \boldsymbol{\mu} + \boldsymbol{\tau}_l + \mathbf{e}_{l,j}, \qquad j = 1, \ldots, n_l;\ l = 1, \ldots, g,
\]
where $\mathbf{e}_{l,j} \sim N_p(\mathbf{0}, \Sigma)$. As before, $\boldsymbol{\mu}$ is the overall mean vector, and $\boldsymbol{\tau}_l$ denotes the $l$th treatment effect, satisfying $\sum_{l=1}^g n_l \boldsymbol{\tau}_l = \mathbf{0}$. The data can be decomposed as
\[
\mathbf{x}_{l,j} = \bar{\mathbf{x}} + (\bar{\mathbf{x}}_l - \bar{\mathbf{x}}) + (\mathbf{x}_{l,j} - \bar{\mathbf{x}}_l).
\]
Subtracting $\bar{\mathbf{x}}$ from the prior equation, post-multiplying by its own transpose, and summing, we obtain
\[
\sum_{l=1}^g \sum_{j=1}^{n_l} (\mathbf{x}_{l,j} - \bar{\mathbf{x}})(\mathbf{x}_{l,j} - \bar{\mathbf{x}})' = \sum_{l=1}^g n_l (\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' + \sum_{l=1}^g \sum_{j=1}^{n_l} (\mathbf{x}_{l,j} - \bar{\mathbf{x}}_l)(\mathbf{x}_{l,j} - \bar{\mathbf{x}}_l)',
\]
where, as in the univariate case, the cross-product terms sum to zero. For ease of notation, we define
\[
W = \sum_{l=1}^g \sum_{j=1}^{n_l} (\mathbf{x}_{l,j} - \bar{\mathbf{x}}_l)(\mathbf{x}_{l,j} - \bar{\mathbf{x}}_l)' = (n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g
\]
to represent the within-population sum of squares and cross-products matrix, and
\[
B = \sum_{l=1}^g n_l (\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})'
\]
to denote the between-population sum of squares and cross-products matrix. The hypothesis of no treatment effects, $H_0: \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2 = \cdots = \boldsymbol{\tau}_g = \mathbf{0}$, is tested by considering the relative sizes of the treatment and residual sums of squares and cross-products. The multivariate analysis of variance (MANOVA) table is given by

Source of variation   Matrix of sum of squares and cross-products                                              Degrees of freedom
Treatment             $B = \sum_{l=1}^g n_l (\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})'$                  $g - 1$
Residuals             $W = \sum_{l=1}^g \sum_{j=1}^{n_l} (\mathbf{x}_{l,j} - \bar{\mathbf{x}}_l)(\mathbf{x}_{l,j} - \bar{\mathbf{x}}_l)'$      $\sum_{l=1}^g n_l - g$
Total                 $B + W = \sum_{l=1}^g \sum_{j=1}^{n_l} (\mathbf{x}_{l,j} - \bar{\mathbf{x}})(\mathbf{x}_{l,j} - \bar{\mathbf{x}})'$      $\sum_{l=1}^g n_l - 1$

The test then involves generalized variances, i.e., determinants of the sum-of-squares matrices. Specifically, one rejects $H_0$ if
\[
\Lambda^* = \frac{|W|}{|B + W|} = \frac{\left|\sum_{l=1}^g \sum_{j=1}^{n_l} (\mathbf{x}_{l,j} - \bar{\mathbf{x}}_l)(\mathbf{x}_{l,j} - \bar{\mathbf{x}}_l)'\right|}{\left|\sum_{l=1}^g \sum_{j=1}^{n_l} (\mathbf{x}_{l,j} - \bar{\mathbf{x}})(\mathbf{x}_{l,j} - \bar{\mathbf{x}})'\right|}
\]
is too small. This test statistic was proposed by Wilks and is commonly referred to as Wilks' lambda. The distribution of $\Lambda^*$ is given in Table 6.3 of the text for some special cases (p. 303). For other cases and large sample sizes, a modification of $\Lambda^*$ due to Bartlett (1938) can be used. Specifically, if $H_0$ is true and $n = \sum_l n_l$ is large,
\[
-\left(n - 1 - \frac{p + g}{2}\right)\ln(\Lambda^*) = -\left(n - 1 - \frac{p + g}{2}\right)\ln\left(\frac{|W|}{|B + W|}\right)
\]
has approximately a chi-square distribution with $p(g-1)$ degrees of freedom.

Remark. The R command for multivariate analysis of variance is manova. Below are some examples.

> help(manova)
> help(summary.manova)

** Example 6.9 of the text on pages 304 and 305 **
> x=matrix(c(1,1,1,2,2,3,3,3,9,6,9,0,2,3,1,2,3,2,7,4,0,8,9,7),8,3)
> x
     [,1] [,2] [,3]
[1,]    1    9    3
[2,]    1    6    2
[3,]    1    9    7
[4,]    2    0    4
[5,]    2    2    0
[6,]    3    3    8
[7,]    3    1    9
[8,]    3    2    7
> fac1=factor(x[,1])
> xx=x[,2:3]
> m2=manova(xx~fac1)
> m2
Call:
   manova(xx ~ fac1)

Terms:
                fac1 Residuals
resp 1
resp 2
Deg. of Freedom    2         5

Residual standard error:
Estimated effects may be unbalanced
> summary(m2)
          Df Pillai approx F num Df den Df Pr(>F)
fac1       2                                    **
Residuals  5
> summary(m2,test="Wilks")
          Df  Wilks approx F num Df den Df Pr(>F)
fac1       2                                    **
Residuals  5

** Another example **
> help(gl)    # generates factors
> da=read.table("t6-9.dat")
> dim(da)
[1] 48  4
> y=cbind(da[,1],da[,2],da[,3])
> gen=factor(gl(2,24))
> gen
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[25] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Levels: 1 2
> m1=manova(y~gen)
> m1
Call:
   manova(y ~ gen)

Terms:
                gen Residuals
resp 1
resp 2
resp 3
Deg. of Freedom   1        46

Residual standard error:
Estimated effects may be unbalanced
> summary(m1,test="Wilks")
          Df Wilks approx F num Df den Df Pr(>F)
gen        1                              e-09 ***
Residuals 46
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If the null hypothesis of equal mean vectors is rejected, one can use simultaneous confidence intervals for the components of the mean vectors to study the differences between the means or, equivalently, consider the components of the treatment-effect vectors $\boldsymbol{\tau}_l$. Consider the difference between the $i$th components of the treatment vectors $\boldsymbol{\tau}_k$ and $\boldsymbol{\tau}_l$, i.e., $\tau_{k,i} - \tau_{l,i}$. The sample estimate is
\[
\hat{\tau}_{k,i} - \hat{\tau}_{l,i} = (\bar{x}_{k,i} - \bar{x}_i) - (\bar{x}_{l,i} - \bar{x}_i) = \bar{x}_{k,i} - \bar{x}_{l,i}.
\]
Since the random samples are independent between populations, we have
\[
\mathrm{Var}(\hat{\tau}_{k,i} - \hat{\tau}_{l,i}) = \mathrm{Var}(\bar{X}_{k,i} - \bar{X}_{l,i}) = \left(\frac{1}{n_k} + \frac{1}{n_l}\right)\sigma_{ii},
\]
where $\sigma_{ii}$ is the $(i,i)$th element of $\Sigma$. In addition, $\hat{\sigma}_{ii} = \frac{w_{ii}}{n-g}$, where $w_{ii}$ is the $(i,i)$th element of $W$ and $n = \sum_{l=1}^g n_l$. For $g$ populations with $p$-dimensional data, there are $p$ variables
and $g(g-1)/2$ pairwise differences, so the Bonferroni critical value is $t_{n-g}(\alpha/(2m))$ with $m = pg(g-1)/2$. Consequently, with probability at least $1 - \alpha$, $\tau_{k,i} - \tau_{l,i}$ belongs to
\[
(\bar{x}_{k,i} - \bar{x}_{l,i}) \pm t_{n-g}\!\left(\frac{\alpha}{pg(g-1)}\right)\sqrt{\frac{w_{ii}}{n-g}\left(\frac{1}{n_k} + \frac{1}{n_l}\right)}
\]
for all $i = 1, \ldots, p$ and all differences $l < k = 1, \ldots, g$.

4 Testing for equality of covariance matrices

The setup: $g$ populations and $p$ variables. The covariance matrix of population $j$ is $\Sigma_j$, which is positive definite.
\[
H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_g = \Sigma \quad \text{versus} \quad H_a: \Sigma_i \neq \Sigma_j \ \text{for some } 1 \leq i \neq j \leq g.
\]
The most commonly used test statistic is Box's M test. It is a likelihood-ratio type of test. Under the normality assumption, the likelihood ratio statistic for testing equality of the covariance matrices is
\[
\Lambda = \prod_{l=1}^g \left(\frac{|S_l|}{|S_{\mathrm{pool}}|}\right)^{(n_l - 1)/2},
\]
where $n_l$ is the sample size of the $l$th population, $S_l$ is the sample covariance matrix of the $l$th population, and
\[
S_{\mathrm{pool}} = \frac{1}{\sum_{l=1}^g (n_l - 1)}\left[(n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g\right]
\]
is the pooled sample covariance matrix. Box's test is based on a $\chi^2$ approximation to the sampling distribution of $-2\ln(\Lambda)$. Specifically,
\[
M \equiv -2\ln(\Lambda) = \left[\sum_{l=1}^g (n_l - 1)\right]\ln(|S_{\mathrm{pool}}|) - \sum_{l=1}^g \left[(n_l - 1)\ln(|S_l|)\right].
\]
Under $H_0$, the $S_l$ are not expected to differ too much, so they should be close to $S_{\mathrm{pool}}$. In that case the ratio of determinants is close to 1 and the M-statistic is small.

Box's test. Let
\[
u = \left[\sum_{l=1}^g \frac{1}{n_l - 1} - \frac{1}{\sum_{l=1}^g (n_l - 1)}\right]\left[\frac{2p^2 + 3p - 1}{6(p+1)(g-1)}\right],
\]
where $p$ is the number of variables and $g$ is the number of populations. Then
\[
C = (1 - u)M = (1 - u)\left\{\left[\sum_{l=1}^g (n_l - 1)\right]\ln(|S_{\mathrm{pool}}|) - \sum_{l=1}^g \left[(n_l - 1)\ln(|S_l|)\right]\right\}
\]
has an approximate $\chi^2$ distribution with $v = \frac{1}{2}p(p+1)(g-1)$ degrees of freedom. One rejects $H_0$ if $C > \chi^2_{p(p+1)(g-1)/2}(\alpha)$.

Remark: A simple R script, called Box-M.R, performs the Box M test for equal covariance matrices. For illustration, consider the data in Table 6.1; the null hypothesis cannot be rejected at the 5% level. The program requires two input variables: (a) the data set and (b) a vector $(n_1, n_2, \ldots, n_g)$ of sample sizes. The data set is arranged so that the population ordering matches the sample-size vector.

> source("Box_M.R")
> mm=Box_M(y,nv)
[1] "Test result:"
        [,1]
Box.M-C
p.value
> names(mm)
[1] "Box.M"     "Test.Stat" "p.value"

5 Two-way multivariate analysis of variance

Univariate case: The model is
\[
X_{lkr} = \mu + \tau_l + \beta_k + \gamma_{lk} + e_{lkr}; \qquad l = 1, \ldots, g;\ k = 1, \ldots, b;\ r = 1, \ldots, n,
\]
where $\sum_{l=1}^g \tau_l = \sum_{k=1}^b \beta_k = \sum_{l=1}^g \gamma_{lk} = \sum_{k=1}^b \gamma_{lk} = 0$ and $e_{lkr} \sim N(0, \sigma^2)$. Here $\mu$ is the overall mean, representing the general level of response, $\tau_l$ is the fixed effect of factor 1, $\beta_k$ is the fixed effect of factor 2, and $\gamma_{lk}$ is the interaction between factor 1 and factor 2. For the data, the corresponding decomposition is
\[
x_{lkr} = \bar{x} + (\bar{x}_{l.} - \bar{x}) + (\bar{x}_{.k} - \bar{x}) + (\bar{x}_{lk} - \bar{x}_{l.} - \bar{x}_{.k} + \bar{x}) + (x_{lkr} - \bar{x}_{lk}),
\]
where $\bar{x}$ is the overall sample mean, $\bar{x}_{l.} = \frac{1}{bn}\sum_{k=1}^b \sum_{r=1}^n x_{lkr}$, $\bar{x}_{.k} = \frac{1}{gn}\sum_{l=1}^g \sum_{r=1}^n x_{lkr}$, and $\bar{x}_{lk} = \frac{1}{n}\sum_{r=1}^n x_{lkr}$. Subtracting $\bar{x}$, squaring, and summing, we have the identity
\[
\sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (x_{lkr} - \bar{x})^2 = \sum_{l=1}^g bn(\bar{x}_{l.} - \bar{x})^2 + \sum_{k=1}^b gn(\bar{x}_{.k} - \bar{x})^2 + \sum_{l=1}^g \sum_{k=1}^b n(\bar{x}_{lk} - \bar{x}_{l.} - \bar{x}_{.k} + \bar{x})^2 + \sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (x_{lkr} - \bar{x}_{lk})^2.
\]
This identity is commonly expressed as $SS_{tot} = SS_{fac1} + SS_{fac2} + SS_{int} + SS_{res}$.
The corresponding degrees of freedom satisfy
\[
gbn - 1 = (g - 1) + (b - 1) + (g - 1)(b - 1) + gb(n - 1).
\]
The univariate analysis of variance table is simply a summary of the prior two equations.

Univariate Two-Way Analysis of Variance Table

Source of variation   Sum of Squares   Degrees of Freedom   Mean Squares                            F-ratio
Factor 1              $SS_{fac1}$      $g-1$                $MS_{fac1} = \frac{SS_{fac1}}{g-1}$              $\frac{MS_{fac1}}{MSE}$
Factor 2              $SS_{fac2}$      $b-1$                $MS_{fac2} = \frac{SS_{fac2}}{b-1}$              $\frac{MS_{fac2}}{MSE}$
Interaction           $SS_{int}$       $(g-1)(b-1)$         $MS_{int} = \frac{SS_{int}}{(g-1)(b-1)}$         $\frac{MS_{int}}{MSE}$
Residuals             $SS_{res}$       $gb(n-1)$            $MSE = \frac{SS_{res}}{gb(n-1)}$
Total                 $SS_{tot}$       $gbn-1$

In the table, a mean square is defined as the sum of squares divided by its degrees of freedom. For instance,
\[
MSE = \frac{1}{gb(n-1)}\sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (x_{lkr} - \bar{x}_{lk})^2,
\]
which is an estimate of $\sigma^2$. The hypothesis of no interaction, $H_0: \gamma_{lk} = 0$ for all $l$ and $k$ versus $H_a: \gamma_{lk} \neq 0$ for some $l$ and $k$, can be tested by the $F$-ratio
\[
F = \frac{MS_{int}}{MSE} \sim F_{(g-1)(b-1),\, gb(n-1)}.
\]
Similar tests can be done for the factor effects.

Multivariate case. The multivariate version of the model is
\[
\mathbf{X}_{lkr} = \boldsymbol{\mu} + \boldsymbol{\tau}_l + \boldsymbol{\beta}_k + \boldsymbol{\gamma}_{lk} + \mathbf{e}_{lkr}, \qquad l = 1, \ldots, g;\ k = 1, \ldots, b;\ r = 1, \ldots, n,
\]
where $\mathbf{e}_{lkr} \sim N_p(\mathbf{0}, \Sigma)$ and $\sum_{l=1}^g \boldsymbol{\tau}_l = \sum_{k=1}^b \boldsymbol{\beta}_k = \sum_{l=1}^g \boldsymbol{\gamma}_{lk} = \sum_{k=1}^b \boldsymbol{\gamma}_{lk} = \mathbf{0}$. The corresponding decomposition for the data is
\[
\mathbf{x}_{lkr} = \bar{\mathbf{x}} + (\bar{\mathbf{x}}_{l.} - \bar{\mathbf{x}}) + (\bar{\mathbf{x}}_{.k} - \bar{\mathbf{x}}) + (\bar{\mathbf{x}}_{lk} - \bar{\mathbf{x}}_{l.} - \bar{\mathbf{x}}_{.k} + \bar{\mathbf{x}}) + (\mathbf{x}_{lkr} - \bar{\mathbf{x}}_{lk}).
\]
This leads to the identity
\[
\sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (\mathbf{x}_{lkr} - \bar{\mathbf{x}})(\mathbf{x}_{lkr} - \bar{\mathbf{x}})' = \sum_{l=1}^g bn(\bar{\mathbf{x}}_{l.} - \bar{\mathbf{x}})(\bar{\mathbf{x}}_{l.} - \bar{\mathbf{x}})' + \sum_{k=1}^b gn(\bar{\mathbf{x}}_{.k} - \bar{\mathbf{x}})(\bar{\mathbf{x}}_{.k} - \bar{\mathbf{x}})' + \sum_{l=1}^g \sum_{k=1}^b n(\bar{\mathbf{x}}_{lk} - \bar{\mathbf{x}}_{l.} - \bar{\mathbf{x}}_{.k} + \bar{\mathbf{x}})(\bar{\mathbf{x}}_{lk} - \bar{\mathbf{x}}_{l.} - \bar{\mathbf{x}}_{.k} + \bar{\mathbf{x}})' + \sum_{l=1}^g \sum_{k=1}^b \sum_{r=1}^n (\mathbf{x}_{lkr} - \bar{\mathbf{x}}_{lk})(\mathbf{x}_{lkr} - \bar{\mathbf{x}}_{lk})'.
\]
Denote the identity as
\[
SSP_{tot} = SSP_{fac1} + SSP_{fac2} + SSP_{int} + SSP_{res},
\]
where SSP stands for sum of squares and cross-products. The identity for the degrees of freedom remains the same as in the univariate case. Similarly, we can construct a multivariate two-way analysis of variance table as in the univariate case. However, the tests are conducted based on generalized variances.

A test of no interaction,
\[
H_0: \boldsymbol{\gamma}_{lk} = \mathbf{0} \ \text{for all } l, k \quad \text{vs} \quad H_a: \boldsymbol{\gamma}_{lk} \neq \mathbf{0} \ \text{for some } l, k,
\]
uses the likelihood ratio statistic
\[
\Lambda^* = \frac{|SSP_{res}|}{|SSP_{int} + SSP_{res}|}.
\]
Using Bartlett's approximation, one rejects $H_0$ at the $\alpha$ level if
\[
-\left[gb(n-1) - \frac{p + 1 - (g-1)(b-1)}{2}\right]\ln(\Lambda^*) > \chi^2_{(g-1)(b-1)p}(\alpha).
\]
The main effect of factor 1 is tested by
\[
H_0: \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2 = \cdots = \boldsymbol{\tau}_g = \mathbf{0} \quad \text{vs} \quad H_a: \boldsymbol{\tau}_l \neq \mathbf{0} \ \text{for some } l.
\]
The test statistic is
\[
\Lambda^* = \frac{|SSP_{res}|}{|SSP_{fac1} + SSP_{res}|},
\]
and the corresponding Bartlett approximation is
\[
-\left[gb(n-1) - \frac{p + 1 - (g-1)}{2}\right]\ln(\Lambda^*) \sim \chi^2_{(g-1)p}.
\]
Similarly, the main effect of factor 2 is tested by
\[
H_0: \boldsymbol{\beta}_1 = \boldsymbol{\beta}_2 = \cdots = \boldsymbol{\beta}_b = \mathbf{0} \quad \text{vs} \quad H_a: \boldsymbol{\beta}_k \neq \mathbf{0} \ \text{for some } k.
\]
The test statistic is
\[
\Lambda^* = \frac{|SSP_{res}|}{|SSP_{fac2} + SSP_{res}|},
\]
and the corresponding Bartlett approximation is
\[
-\left[gb(n-1) - \frac{p + 1 - (b-1)}{2}\right]\ln(\Lambda^*) \sim \chi^2_{(b-1)p}.
\]
When a null hypothesis is rejected, one can use simultaneous confidence intervals (based on the Bonferroni method) to conduct further study.
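Each of these tests is a ratio of determinants. A minimal sketch of the computation, given the relevant SSP matrices, is shown below; it is not from the course scripts (the recommended route is manova, as in the session that follows), and the function name and arguments are assumptions.

# Sketch: Wilks' lambda and Bartlett chi-square statistic for one effect in a
# balanced two-way MANOVA, given that effect's SSP matrix (SSP.eff, with df.eff
# degrees of freedom) and the residual SSP matrix.
wilks.bartlett <- function(SSP.eff, SSP.res, df.eff, g, b, n, p) {
  Lambda <- det(SSP.res) / det(SSP.eff + SSP.res)
  stat <- -(g * b * (n - 1) - (p + 1 - df.eff) / 2) * log(Lambda)
  pval <- 1 - pchisq(stat, df.eff * p)
  c(Wilks = Lambda, Bartlett = stat, p.value = pval)
}
# e.g. for the interaction: wilks.bartlett(SSP.int, SSP.res, (g-1)*(b-1), g, b, n, p)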
*** Third example: two factors ***

> da=read.table("t6-4.dat")
> da
   V1 V2 V3 V4 V5
> y=cbind(da[,3],da[,4],da[,5])
> fac1=factor(da[,1])
> fac1
 [1]
Levels: 0 1
> fac2=factor(da[,2])

**** Analyze individual response variables ****
> y1=x[,3]
> m1=aov(y1~fac1+fac2+fac1*fac2)
> summary(m1)
            Df Sum Sq Mean Sq F value Pr(>F)
fac1                                        **
fac2                                        *
fac1:fac2
Residuals
> y2=x[,4]
> m2=aov(y2~fac1+fac2+fac1*fac2)
> summary(m2)
            Df Sum Sq Mean Sq F value Pr(>F)
fac1                                        *
fac2
fac1:fac2
Residuals
> y3=x[,5]
> m3=aov(y3~fac1+fac2+fac1*fac2)
> summary(m3)
            Df Sum Sq Mean Sq F value Pr(>F)
fac1
fac2
fac1:fac2
Residuals

**** Joint analysis ****
> m2=manova(y~fac1+fac2+fac1*fac2)
> m2
Call:
   manova(y ~ fac1 + fac2 + fac1 * fac2)

Terms:
                fac1 fac2 fac1:fac2 Residuals
resp 1
resp 2
resp 3
Deg. of Freedom

Residual standard error:
Estimated effects may be unbalanced
> summary(m2,test="Wilks")
          Df Wilks approx F num Df den Df Pr(>F)
fac1                                           **
fac2                                           *
fac1:fac2
Residuals
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> summary(m2,test="Pillai")
          Df Pillai approx F num Df den Df Pr(>F)
fac1                                            **
fac2                                            *
fac1:fac2
Residuals
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6 Profile analysis

Profile analysis pertains to situations in which a battery of $p$ treatments is administered to two or more groups of subjects. All responses must be expressed in similar units, and the responses for the different groups are assumed to be independent of one another. In profile analysis, the question of equality of mean vectors is divided into several specific possibilities. For example, consider the case of two groups. The mean vectors are $\boldsymbol{\mu}_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{ip})'$, $i = 1, 2$. The questions of interest in profile analysis are:

1. Are the profiles parallel? Equivalently, is $H_{01}: \mu_{1j} - \mu_{1,j-1} = \mu_{2j} - \mu_{2,j-1}$ for $j = 2, 3, \ldots, p$, acceptable?
2. Assuming that the profiles are parallel, are the profiles coincident? Equivalently, is $H_{02}: \mu_{1j} = \mu_{2j}$ for $j = 1, 2, \ldots, p$, acceptable?
3. Assuming that the profiles are coincident, are the profiles level? That is, are all the means equal to the same value? Equivalently, is $H_{03}: \mu_{11} = \mu_{12} = \cdots = \mu_{1p} = \mu_{21} = \mu_{22} = \cdots = \mu_{2p}$ acceptable?

The null hypothesis $H_{01}$ can be written as $H_{01}: C\boldsymbol{\mu}_1 = C\boldsymbol{\mu}_2$, where $C$ is the $(p-1)\times p$ contrast matrix
\[
C = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{bmatrix}.
\]
The data can then be transformed to obtain the samples $\{C\mathbf{x}_{1j}\}_{j=1}^{n_1}$ and $\{C\mathbf{x}_{2j}\}_{j=1}^{n_2}$, where $n_1$ and $n_2$ are the sample sizes of the two groups. Consequently, to test for parallel profiles for two normal populations, one rejects $H_{01}: C\boldsymbol{\mu}_1 = C\boldsymbol{\mu}_2$ at level $\alpha$ if
\[
T^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' C' \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) C S_{\mathrm{pool}} C'\right]^{-1} C(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) > c^2,
\]
where
\[
c^2 = \frac{(n_1 + n_2 - 2)(p - 1)}{n_1 + n_2 - p}\, F_{p-1,\, n_1+n_2-p}(\alpha).
\]

When the profiles are parallel, either $\mu_{1i} > \mu_{2i}$ for all $i$ or $\mu_{1i} < \mu_{2i}$ for all $i$. Under this condition, the profiles will be coincident only if $\sum_{i=1}^p \mu_{1i} = \sum_{i=1}^p \mu_{2i}$, i.e., $\mathbf{1}'\boldsymbol{\mu}_1 = \mathbf{1}'\boldsymbol{\mu}_2$, where $\mathbf{1}$ is the $p$-dimensional vector of 1's. Therefore, the second-stage hypothesis is $H_{02}: \mathbf{1}'\boldsymbol{\mu}_1 = \mathbf{1}'\boldsymbol{\mu}_2$. One can transform the data and apply the usual two-sample t-test. Specifically, to test for coincident profiles, given that the profiles are parallel, one rejects $H_{02}$ at level $\alpha$ if
\[
T^2 = \mathbf{1}'(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\mathbf{1}' S_{\mathrm{pool}}\, \mathbf{1}\right]^{-1}\mathbf{1}'(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)
= \left(\frac{\mathbf{1}'(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\mathbf{1}' S_{\mathrm{pool}}\, \mathbf{1}}}\right)^2 > t^2_{n_1+n_2-2}(\alpha/2).
\]
The next step is to check whether all variables have the same mean, so that the common profile is level. When $H_{01}$ and $H_{02}$ hold, the common mean vector $\boldsymbol{\mu}$ is estimated by
\[
\bar{\mathbf{x}} = \frac{n_1}{n_1 + n_2}\bar{\mathbf{x}}_1 + \frac{n_2}{n_1 + n_2}\bar{\mathbf{x}}_2.
\]
If the common profile is level, then $\mu_1 = \mu_2 = \cdots = \mu_p$, and the third null hypothesis is $H_{03}: C\boldsymbol{\mu} = \mathbf{0}$, where $C$ is the contrast matrix defined in step 1. Thus, to test for level profiles, given that the profiles are coincident, one rejects $H_{03}$ at level $\alpha$ if
\[
(n_1 + n_2)\,\bar{\mathbf{x}}' C' [CSC']^{-1} C\bar{\mathbf{x}} > c^2,
\]
where $S$ is the sample covariance matrix based on all $n_1 + n_2$ observations and
\[
c^2 = \frac{(n_1 + n_2 - 1)(p - 1)}{n_1 + n_2 - p + 1}\, F_{p-1,\, n_1+n_2-p+1}(\alpha).
\]

Remark: An R script profile.r is available on the course web to perform the three profile tests discussed above. For illustration, consider the data in Table 6.14 of the textbook. The results are given below:

> source("profile.r")
> da=read.table("T6-14.DAT")
> x1=da[1:30,1:4]
> x2=da[31:60,1:4]
> cbind(x1,x2)
   V1 V2 V3 V4 V1 V2 V3 V4
> profile(x1,x2)
[1] "Are the profiles parallel?"
        [,1]
Test-T
p.value
[1] "Are the profiles coincident?"
        [,1] [,2]
Test-T
p.value
[1] "Are the profiles level?"
        [,1] [,2] [,3]
Test-T        e+01
p.value       e-04
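The script profile.r itself is not shown in the notes. The following is a rough sketch of the first stage of the procedure, the parallel-profiles test (the function and variable names are assumptions); the coincident and level stages would follow the formulas above in the same way.

# Sketch: test of parallel profiles for two groups (stage 1 of profile analysis).
parallel.profiles <- function(x1, x2, alpha = 0.05) {
  x1 <- as.matrix(x1); x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  # Successive-difference contrast matrix C of dimension (p-1) x p
  C <- cbind(-diag(p - 1), 0) + cbind(0, diag(p - 1))
  Spool <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  d <- C %*% (colMeans(x1) - colMeans(x2))
  V <- (1 / n1 + 1 / n2) * C %*% Spool %*% t(C)
  T2 <- drop(t(d) %*% solve(V) %*% d)
  crit <- (n1 + n2 - 2) * (p - 1) / (n1 + n2 - p) * qf(1 - alpha, p - 1, n1 + n2 - p)
  c(T2 = T2, critical.value = crit)
}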
7 Growth curves

The growth curve is a special case of the repeated-measures problem. Here a single treatment is applied to each subject and a single characteristic is observed over a period of time. For example, we could measure the weight of each puppy at birth and then once a month for a period of time. The weight curve of a dog is of interest and, hence, is referred to as a growth curve.

Consider the Potthoff-Roy model for quadratic growth. Here $p$ measurements on all subjects are taken at times $t_1, t_2, \ldots, t_p$, and the model is
\[
E(\mathbf{X}) = E\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 t_1 + \beta_2 t_1^2 \\ \beta_0 + \beta_1 t_2 + \beta_2 t_2^2 \\ \vdots \\ \beta_0 + \beta_1 t_p + \beta_2 t_p^2 \end{bmatrix},
\]
where the $i$th mean $\mu_i$ is the quadratic expression evaluated at $t_i$. When several groups of subjects are involved, one would like to compare the growth curves among the groups. Assume that $g$ groups of subjects are involved and that for group $l$ the random sample consists of $\mathbf{X}_{l1}, \ldots, \mathbf{X}_{l,n_l}$, where $n_l > 0$ is the sample size.

Assumption: All of the $\mathbf{X}_{lj}$ are independent and have the same covariance matrix $\Sigma$. Under the quadratic growth model, the mean vectors are
\[
E[\mathbf{X}_{lj}] = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_p & t_p^2 \end{bmatrix}
\begin{bmatrix} \beta_{l0} \\ \beta_{l1} \\ \beta_{l2} \end{bmatrix} \equiv B\boldsymbol{\beta}_l.
\]
The model can easily be generalized to a $q$th-order polynomial. Under the assumption of multivariate normality, the MLEs of the $\boldsymbol{\beta}_l$ are
\[
\hat{\boldsymbol{\beta}}_l = (B' S_{\mathrm{pool}}^{-1} B)^{-1} B' S_{\mathrm{pool}}^{-1} \bar{\mathbf{X}}_l, \qquad l = 1, 2, \ldots, g,
\]
where
\[
S_{\mathrm{pool}} = \frac{1}{N - g}\left[(n_1 - 1)S_1 + \cdots + (n_g - 1)S_g\right] = \frac{1}{N - g} W, \qquad N = \sum_{l=1}^g n_l,
\]
is the pooled estimator of the common covariance matrix $\Sigma$. The estimated covariances of the maximum likelihood estimates (MLEs) are
\[
\widehat{\mathrm{Cov}}(\hat{\boldsymbol{\beta}}_l) = \frac{k}{n_l}(B' S_{\mathrm{pool}}^{-1} B)^{-1}, \qquad l = 1, \ldots, g,
\]
where $k = (N - g)(N - g - 1)/[(N - g - p + q)(N - g - p + q + 1)]$. The covariance between $\hat{\boldsymbol{\beta}}_i$ and $\hat{\boldsymbol{\beta}}_j$ is $\mathbf{0}$ for $i \neq j$.

To test that a $q$th-order polynomial is adequate, the model is fit without restriction; that is, one fits the model separately to each group. The sum of squares and cross-products matrix then becomes
\[
W_q = \sum_{l=1}^g \sum_{j=1}^{n_l} (\mathbf{X}_{lj} - B\hat{\boldsymbol{\beta}}_l)(\mathbf{X}_{lj} - B\hat{\boldsymbol{\beta}}_l)',
\]
which has $N - g + p - q - 1$ degrees of freedom. The likelihood ratio test of the null hypothesis that the $q$th-order polynomial is adequate can be based on Wilks' lambda
\[
\Lambda^* = \frac{|W|}{|W_q|}.
\]
The difference in the number of parameters between the null and alternative hypotheses is $g(p - q - 1)$, so, when the $n_l$ are sufficiently large,
\[
-\left(N - \frac{1}{2}(p - q + g)\right)\ln(\Lambda^*) \sim \chi^2_{(p-q-1)g}.
\]

Remark: An R script for growth curve analysis, called growth.r, is available on the course web. For demonstration, consider the data in Tables 6.5 and 6.6.

> source("growth.r")
> x1=read.table("t6-5.dat")
> x2=read.table("t6-6.dat")
> x=rbind(x1,x2)
> nv=c(nrow(x1),nrow(x2))
> pv=c(0,1,2,3)
> growth(x,nv,pv,2)
[1] "Growth curve model"
[1] "Order: "
[1] 2
[1] "Beta-hat: "
     [,1] [,2]
[1,]
[2,]
[3,]
[1] "Standard errors: "
     [,1] [,2]
[1,]
[2,]
[3,]
[1] "W"
   V1 V2 V3 V4
V1
V2
V3
V4
[1] "Wq"
   V1 V2 V3 V4
V1
V2
V3
V4
[1] "Lambda:"
[1]
[1] "Test result:"
        [,1]
LR-stat
p.value
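The script growth.r is not reproduced in the notes. Below is a rough sketch of the core computation under the formulas above (the function name and its list-based interface are assumptions and differ from the growth(x,nv,pv,order) call used in the session).

# Sketch: growth-curve MLEs and the qth-order adequacy test.
growth.mle <- function(xlist, times, q) {
  # xlist: list of (n_l x p) data matrices, one per group; times: t_1,...,t_p
  g <- length(xlist); p <- length(times)
  N <- sum(sapply(xlist, nrow))
  B <- outer(times, 0:q, "^")                       # p x (q+1) design matrix
  W <- Reduce("+", lapply(xlist, function(x) (nrow(x) - 1) * cov(x)))
  Spool <- W / (N - g)
  SpinvB <- solve(Spool, B)                         # S_pool^{-1} B
  beta <- lapply(xlist, function(x)
    solve(t(B) %*% SpinvB, t(SpinvB) %*% colMeans(x)))  # (B'S^-1 B)^-1 B'S^-1 xbar_l
  # Residual SSP about the fitted growth curves (used for the adequacy test)
  Wq <- Reduce("+", mapply(function(x, b) {
    r <- t(t(x) - drop(B %*% b))
    t(r) %*% r
  }, xlist, beta, SIMPLIFY = FALSE))
  Lambda <- det(W) / det(Wq)
  stat <- -(N - 0.5 * (p - q + g)) * log(Lambda)
  list(beta = beta, Lambda = Lambda, chisq = stat,
       p.value = 1 - pchisq(stat, (p - q - 1) * g))
}

For the session above, a call along the lines of growth.mle(list(as.matrix(x1), as.matrix(x2)), times = c(0, 1, 2, 3), q = 2) would correspond to fitting a quadratic growth curve to the two groups in Tables 6.5 and 6.6.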