Comparisons of Several Multivariate Populations
1 Comparisons of Several Multivariate Populations
Edps/Soc 584, Psych 594
Carolyn J. Anderson
Department of Educational Psychology, University of Illinois at Urbana-Champaign
© Board of Trustees, University of Illinois
Spring 2017
2 Overview
- 1-way ANOVA
  - Classic treatment
  - As a general linear model
- 1-way MANOVA
  - The model: generalization of ANOVA to multivariate data
  - Hypothesis testing
  - Example 1: Massed vs. distributed practice
- Multivariate general linear model and Example 2: Increased survival
- Following up a significant result
  - Multivariate contrasts
  - Simultaneous confidence intervals
  - Discriminant function
- Summary of PCA, MANOVA, DA
- SAS IML and PROC GLM
Reading: Johnson & Wichern
CJ Anderson (Illinois), Comparisons of Several Multivariate Populations, Spring 2017
3 Generalizing 1-way ANOVA to Multivariate Data
This also generalizes the multivariate $T^2$ to more than two populations.
Suppose that we have random samples from $g$ populations and measures on $p$ variables:
Population 1: $x_{11}, x_{12}, \ldots, x_{1n_1}$
Population 2: $x_{21}, x_{22}, \ldots, x_{2n_2}$
...
Population g: $x_{g1}, x_{g2}, \ldots, x_{gn_g}$
where each $x_{lj}$ is a $(p \times 1)$ vector.
4 Examples
- Are scores on 5 standardized tests the same for high school students who attend different high school programs (i.e., general, vo/tech, academic)?
- Do survival times, measured in two ways, differ over six types of cancer among patients treated with supplemental vitamin C?
- Others?
5 Basic Assumptions
Assumptions needed for statistical inference:
- $X_{l1}, X_{l2}, \ldots, X_{ln_l}$ is a random sample of size $n_l$ from a population with mean $\mu_l$, for $l = 1, \ldots, g$ (i.e., observations within populations are independent and representative of their populations).
- Random samples from different populations are independent.
- All populations have the same covariance matrix, $\Sigma$.
- $X_{lj} \sim N_p(\mu_l, \Sigma)$; that is, each population is multivariate normal.
If a population is not multivariate normal, then for large $n_l$ the central limit theorem may kick in.
6 One-way ANOVA Review
The univariate case, where $p = 1$.
Assumptions: $X_{lj} \sim N(\mu_l, \sigma^2)$ and independent for $j = 1, \ldots, n_l$ and $l = 1, \ldots, g$.
Hypotheses: $H_o: \mu_1 = \mu_2 = \cdots = \mu_g$ versus $H_a$: not $H_o$.
We usually express $\mu_l$ as the sum of a grand mean and a deviation from the grand mean:
$$\underbrace{\mu_l}_{l\text{th pop. mean}} = \underbrace{\mu}_{\text{grand mean}} + \underbrace{(\mu_l - \mu)}_{l\text{th pop. treatment effect}} = \mu + \tau_l$$
If $\mu_1 = \mu_2 = \cdots = \mu_g$, then an equivalent way to write the null hypothesis is $H_o: \tau_1 = \tau_2 = \cdots = \tau_g = 0$.
7 The Model for an Observation
$$X_{lj} = \mu + \tau_l + \epsilon_{lj}$$
where $\epsilon_{lj} \sim N(0, \sigma^2)$ and independent; $\epsilon_{lj}$ is random error.
We typically impose the condition $\sum_{l=1}^{g} \tau_l = 0$ as an identification constraint.
The decomposition of an observation is
$$\underbrace{X_{lj}}_{\text{observation}} = \underbrace{\bar{X}}_{\text{overall sample mean}} + \underbrace{(\bar{X}_l - \bar{X})}_{\text{estimated treatment effect}} + \underbrace{(X_{lj} - \bar{X}_l)}_{\text{residual error}}$$
- $\bar{X}$ is the estimator of $\mu$.
- $\hat{\tau}_l = (\bar{X}_l - \bar{X})$ is the estimator of $\tau_l$.
- $\hat{\epsilon}_{lj} = (X_{lj} - \bar{X}_l)$ is the estimator of $\epsilon_{lj}$.
8 The Sums of Squares
The sum of squared observations:
$$SS_{obs} = SS_{total} = \sum_{l=1}^{g} \sum_{j=1}^{n_l} X_{lj}^2$$
We also take the three components of $X_{lj}$ and form sums of squares:
$$SS_{mean} = \sum_{l=1}^{g} \sum_{j=1}^{n_l} \bar{X}^2 = \Big(\sum_{l=1}^{g} n_l\Big) \bar{X}^2$$
$$SS_{treatment} = \sum_{l=1}^{g} \sum_{j=1}^{n_l} \hat{\tau}_l^2 = \sum_{l=1}^{g} n_l (\bar{X}_l - \bar{X})^2$$
$$SS_{res} = \sum_{l=1}^{g} \sum_{j=1}^{n_l} \hat{\epsilon}_{lj}^2 = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (X_{lj} - \bar{X}_l)^2$$
9 Sums of Squares Decomposition & Geometry
$$SS_{obs} = SS_{mean} + SS_{tr} + SS_{res}$$
or
$$SS_{corrected} = SS_{obs} - SS_{mean} = SS_{tr} + SS_{res}$$
This works because the components (whose squared lengths are the sums of squares) are orthogonal.
Geometry: Consider the $n = \sum_{l=1}^{g} n_l$ dimensional observation space, where each observation defines a dimension. We break this space into three orthogonal sub-spaces, one for each component. The dimensionality of each sub-space equals the degrees of freedom for the corresponding SS (see text for more details).
10 ANOVA Summary Table
Let $n_+ = \sum_{l=1}^{g} n_l$, the total sample size.
Source of variation | Sum of squares | df
Treatment | $SS_{tr} = \sum_{l=1}^{g} n_l (\bar{X}_l - \bar{X})^2$ | $g - 1$
Residual | $SS_{res} = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (X_{lj} - \bar{X}_l)^2$ | $n_+ - g$
Total (corrected for mean) | $SS_{obs} - SS_{mean} = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (X_{lj} - \bar{X})^2$ | $n_+ - 1$
The test statistic for $H_o: \mu_1 = \cdots = \mu_g$ (or $H_o: \tau_1 = \cdots = \tau_g = 0$) and its sampling distribution are
$$F = \frac{SS_{tr}/(g-1)}{SS_{res}/(n_+ - g)} \sim F_{(g-1),(n_+ - g)}$$
Reject $H_o$ for
- large values of $SS_{tr}/SS_{res}$,
- large values of $1 + SS_{tr}/SS_{res}$,
- small values of $(1 + SS_{tr}/SS_{res})^{-1} = SS_{res}/(SS_{tr} + SS_{res})$.
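The entries of the summary table can be checked numerically. The following is a minimal Python sketch and not part of the original slides; the data values and the function name `one_way_anova` are made up for illustration.

```python
# Small illustrative data set (hypothetical): g = 3 groups, one response.
groups = [[9.0, 6.0, 9.0], [0.0, 2.0], [3.0, 1.0, 2.0]]


def one_way_anova(groups):
    """Return (SS_tr, SS_res, F) for a one-way layout."""
    n_plus = sum(len(g) for g in groups)              # total sample size
    grand_mean = sum(sum(g) for g in groups) / n_plus
    means = [sum(g) / len(g) for g in groups]         # group means
    # Treatment SS: n_l * (group mean - grand mean)^2, summed over groups
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual SS: squared deviations of observations from their group mean
    ss_res = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_tr, df_res = len(groups) - 1, n_plus - len(groups)
    f_stat = (ss_tr / df_tr) / (ss_res / df_res)
    return ss_tr, ss_res, f_stat


ss_tr, ss_res, f_stat = one_way_anova(groups)
print(ss_tr, ss_res, f_stat)
```

For these data the corrected total sum of squares is the sum of the two returned components, which is the decomposition the table summarizes.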
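11 One-Way ANOVA as a GLM
$$\underbrace{X_{n_+ \times 1}}_{\text{Dependent}} = \underbrace{A_{n_+ \times g}}_{\text{Design matrix}} \; \underbrace{\beta_{g \times 1}}_{\text{Parameters}} + \underbrace{\epsilon_{n_+ \times 1}}_{\text{Residuals}}$$
where $X = (X_{11}, \ldots, X_{1n_1}, X_{21}, \ldots, X_{2n_2}, \ldots, X_{g1}, \ldots, X_{gn_g})'$, $\beta = (\mu, \tau_1, \tau_2, \ldots, \tau_{g-1})'$, and $\epsilon = (\epsilon_{11}, \ldots, \epsilon_{1n_1}, \ldots, \epsilon_{g1}, \ldots, \epsilon_{gn_g})'$.
[The entries of the design matrix $A$ are missing from this transcription.]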
12 Least Squares Estimates of GLM
How we get parameter estimates depends on how the design matrix is set up, and there are multiple ways of setting it up. We'll use the rank $g$ matrix $A$ on the previous slide.
$$\hat{\beta} = (A'A)^{-1} A'X$$
$$\hat{X} = A\hat{\beta} = \{\bar{x}_l\}_{n_+ \times 1}$$
$$\hat{\epsilon} = X - \hat{X} = X - A(A'A)^{-1}A'X = (I - A(A'A)^{-1}A')X$$
Our hypothesis test of equal population means,
$$H_o: \mu_1 = \mu_2 = \cdots = \mu_g \iff \tau_1 = \tau_2 = \cdots = \tau_g = 0,$$
can be expressed as $H_o: C\beta = 0$, where $C$ is a contrast matrix.
13 Testing Using C
For example, $C_{(g-1) \times g}$ = [matrix entries missing from this transcription], so
$$H_o: C\beta = (\tau_1 - \tau_2,\ \tau_2 - \tau_3,\ \ldots,\ \tau_{g-2} - \tau_{g-1})' = 0$$
Our $F$ test (given before) tests $H_o: C\beta = 0$.
From the GLM framework:
- You can introduce continuous (numerical) variables.
- ANOVA and multiple regression are essentially the same.
- We can generalize the GLM to the multivariate GLM.
- SAS PROC GLM will make more sense.
14 One-Way MANOVA
The MANOVA model for comparing $g$ population mean vectors parallels univariate ANOVA:
$$\underbrace{X_{lj}}_{\substack{\text{observation vector} \\ (p \times 1),\ \text{random}}} = \underbrace{\mu}_{\substack{\text{overall mean vector} \\ (p \times 1),\ \text{fixed}}} + \underbrace{\tau_l}_{\substack{l\text{th treatment effect vector} \\ (p \times 1),\ \text{fixed}}} + \underbrace{\epsilon_{lj}}_{\substack{\text{residual for } l\text{th group, } j\text{th case} \\ (p \times 1),\ \text{random}}}$$
where $\epsilon_{lj} \sim N_p(0, \Sigma)$ and all are independent, for $j = 1, \ldots, n_l$ cases per group and $l = 1, \ldots, g$ groups.
For identification, $\sum_{l=1}^{g} n_l \tau_l = 0$.
15 Observation Vectors
Each component of $X_{lj}$ satisfies the 1-way ANOVA model, but now the model includes covariances among the components. The covariances are assumed to be equal across populations.
A vector of observations can be decomposed as
$$\underbrace{X_{lj}}_{\text{observation}} = \underbrace{\bar{X}}_{\text{overall sample mean}} + \underbrace{(\bar{X}_l - \bar{X})}_{\text{estimated treatment effect}} + \underbrace{(X_{lj} - \bar{X}_l)}_{\text{residual}} = \hat{\mu} + \hat{\tau}_l + \hat{\epsilon}_{lj}$$
We also have a decomposition of sums of squares and cross-products, or SSCP for short.
16 Sums-of-Squares and Cross-Products (SSCP)
First we'll find the total corrected squares and cross-products:
$$(x_{lj} - \bar{x})(x_{lj} - \bar{x})' = [(x_{lj} - \bar{x}_l) + (\bar{x}_l - \bar{x})][(x_{lj} - \bar{x}_l) + (\bar{x}_l - \bar{x})]'$$
$$= \underbrace{(x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)' + (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})'}_{\text{squares \& cross-products}} + \underbrace{(x_{lj} - \bar{x}_l)(\bar{x}_l - \bar{x})' + (\bar{x}_l - \bar{x})(x_{lj} - \bar{x}_l)'}_{\text{cross-products}}$$
Next, sum all of this over cases and groups. Since summation distributes over addition, we'll do this in pieces, looking at just the cross-product terms first.
17 Sum of Cross-Products
$$\sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(\bar{x}_l - \bar{x})' = \Big[\sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)\Big](\bar{x}_l - \bar{x})' = \Big(\sum_{j=1}^{n_l} x_{lj} - n_l \bar{x}_l\Big)(\bar{x}_l - \bar{x})' = n_l \underbrace{(\bar{x}_l - \bar{x}_l)}_{0}(\bar{x}_l - \bar{x})' = 0$$
18 Sum of Squares
Now summing the rest over $j$ and $l$, we get
$$\sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x})(x_{lj} - \bar{x})' = \sum_{l=1}^{g} n_l (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})' + \sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)'$$
Total (corrected) SSCP = Treatment + Residual
= Between Groups + Within Groups
= Hypothesis + Error
19 A Closer Look at Within Groups SSCP
$$W = E = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)'$$
$$= \sum_{j=1}^{n_1} (x_{1j} - \bar{x}_1)(x_{1j} - \bar{x}_1)' + \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)(x_{2j} - \bar{x}_2)' + \cdots + \sum_{j=1}^{n_g} (x_{gj} - \bar{x}_g)(x_{gj} - \bar{x}_g)'$$
$$= W_1 + W_2 + \cdots + W_g = (n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g$$
- $S_l$ is the sample covariance matrix for the $l$th group (treatment, condition, etc.).
- $W$ ($E$) is proportional to a pooled estimate of the common $\Sigma$.
20 Between Groups SSCP & Test Statistic
With respect to the between groups SSCP,
$$B = H = \sum_{l=1}^{g} n_l (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})' = \sum_{l=1}^{g} n_l \hat{\tau}_l \hat{\tau}_l'$$
If $H_o: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ is true, then $B$ (or $H$) should be close to $0$.
To test $H_o$, we consider the ratio of generalized variances (determinants of SSCP matrices),
$$\Lambda^* = \frac{|W|}{|W + B|} = \frac{|W|}{|T|}$$
where $T = W + B$ (i.e., the total corrected SSCP).
$\Lambda^*$ is known as Wilks' Lambda. It is equivalent to the likelihood ratio statistic.
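The decomposition $T = B + W$ and the ratio $|W|/|T|$ can be traced on a tiny data set. This is a hedged sketch for $p = 2$ only, written in plain Python; the data and all function names are illustrative assumptions, not taken from the slides.

```python
# Hypothetical bivariate data: g = 3 groups, each case is (x1, x2).
groups = [
    [(9.0, 3.0), (6.0, 2.0), (9.0, 7.0)],
    [(0.0, 4.0), (2.0, 0.0)],
    [(3.0, 8.0), (1.0, 9.0), (2.0, 7.0)],
]


def outer(u, v):
    """Outer product u v' as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sscp_decomposition(groups):
    """Return (B, W): between-groups and within-groups SSCP matrices."""
    p = len(groups[0][0])
    grand = mean_vec([x for g in groups for x in g])
    B = [[0.0] * p for _ in range(p)]
    W = [[0.0] * p for _ in range(p)]
    for g in groups:
        m = mean_vec(g)
        tau = [mi - gi for mi, gi in zip(m, grand)]       # estimated effect
        B = mat_add(B, [[len(g) * v for v in row] for row in outer(tau, tau)])
        for x in g:
            r = [xi - mi for xi, mi in zip(x, m)]         # residual vector
            W = mat_add(W, outer(r, r))
    return B, W


B, W = sscp_decomposition(groups)
T = mat_add(B, W)                       # total corrected SSCP
wilks = det2(W) / det2(T)               # Wilks' Lambda
print(B, W, wilks)
```

A value of `wilks` near 0 indicates that most of the generalized variance lies between groups rather than within them.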
21 Hypothesis Testing with $\Lambda^*$
$\Lambda^*$ is a ratio of generalized sampling variances:
$$\Lambda^* = \frac{|W|}{|T|} = \frac{\prod_{i=1}^{p} \lambda_i}{\prod_{i=1}^{p} \lambda_i^*}$$
where the $\lambda_i$ are the eigenvalues of $W$ and the $\lambda_i^*$ are the eigenvalues of $T$.
- If $H_o: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ is true, then $B$ is close to $0$ $\Rightarrow$ $T \approx W$ $\Rightarrow$ $\prod \lambda_i \approx \prod \lambda_i^*$ $\Rightarrow$ $\Lambda^*$ is close to 1.
- If $H_o$ is false, then $B$ is not close to $0$ $\Rightarrow$ the values on the diagonal of $T$, which are positive, will be large $\Rightarrow$ $\prod \lambda_i < \prod \lambda_i^*$ $\Rightarrow$ $\Lambda^*$ is small.
The exact distribution of $\Lambda^*$ can be derived for special cases of $p$ and $g$.
22 Distribution of Wilks' Lambda $\Lambda^*$
$$\text{Wilks' } \Lambda^* = \frac{|SSCP_e|}{|SSCP_e + SSCP_h|}$$
Number of variables | df for hypothesis | Sampling distribution for multivariate data
$p = 1$ | $\nu_h \ge 1$ | $\left(\frac{\nu_e}{\nu_h}\right)\left(\frac{1 - \Lambda^*}{\Lambda^*}\right) \sim F_{\nu_h,\, \nu_e}$
$p = 2$ | $\nu_h \ge 1$ | $\left(\frac{\nu_e - 1}{\nu_h}\right)\left(\frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F_{2\nu_h,\, 2(\nu_e - 1)}$
$p \ge 1$ | $\nu_h = 1$ | $\left(\frac{\nu_e + \nu_h - p}{p}\right)\left(\frac{1 - \Lambda^*}{\Lambda^*}\right) \sim F_{p,\, (\nu_e + \nu_h - p)}$
$p \ge 2$ | $\nu_h = 2$ | $\left(\frac{\nu_e + \nu_h - p - 1}{p}\right)\left(\frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F_{2p,\, 2(\nu_e + \nu_h - p - 1)}$
where $\nu_h$ = degrees of freedom for the hypothesis and $\nu_e$ = degrees of freedom for error (residual).
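The four exact transformations can be collected in one helper. This is a minimal sketch, assuming the table reads as the standard exact results for Wilks' Lambda; the function name `wilks_to_f` and the order in which overlapping cases are checked are my own choices, not the author's.

```python
import math


def wilks_to_f(lam, p, nu_h, nu_e):
    """Exact F transformation of Wilks' Lambda for the tabulated cases.

    Returns (F, df1, df2). When more than one row of the table applies
    (for example p = 2 and nu_h = 1), the nu_h = 1 row is used; both rows
    are exact, so this ordering is only a convention of this sketch.
    """
    root = math.sqrt(lam)
    if p == 1:
        return (nu_e / nu_h) * (1 - lam) / lam, nu_h, nu_e
    if nu_h == 1:
        d = nu_e + nu_h - p
        return (d / p) * (1 - lam) / lam, p, d
    if p == 2:
        return ((nu_e - 1) / nu_h) * (1 - root) / root, 2 * nu_h, 2 * (nu_e - 1)
    if nu_h == 2:
        d = nu_e + nu_h - p - 1
        return (d / p) * (1 - root) / root, 2 * p, 2 * d
    raise ValueError("no exact F transformation tabulated for this (p, nu_h)")


# Example call with p = 2 variables, nu_h = 2, nu_e = 57.
f_stat, df1, df2 = wilks_to_f(0.188, p=2, nu_h=2, nu_e=57)
print(f_stat, df1, df2)
```

Outside the tabulated cases the function raises an error rather than guessing; a large-sample approximation would be needed there.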
23 Other Test Statistics
There is more than one way to combine the information in $B$ and $W$ (or $H$ and $E$).
Wilks' $\Lambda$:
$$\Lambda^* = \frac{|W|}{|W + B|} = \frac{|E|}{|E + H|} = \frac{\prod_{i=1}^{p} \lambda_i}{\prod_{i=1}^{p} \lambda_i^*}$$
where the $\lambda_i$ are eigenvalues of $E$ and the $\lambda_i^*$ are eigenvalues of $(E + H)$.
Hotelling-Lawley Trace Criterion:
$$\text{trace}(E^{-1}H) = \text{tr}(HE^{-1}) = \sum_{i} \tilde{\lambda}_i$$
where $\tilde{\lambda}_i$ is an eigenvalue of $HE^{-1}$.
Reject $H_o$ when $\text{tr}(HE^{-1})$ is large. When $H_o$ is true, a suitably scaled version of the trace (for example $\nu_e\,\text{tr}(HE^{-1})$) is approximately $\chi^2_{p(g-1)}$ in large samples.
Note: df = rank of design matrix (GLM approach).
24 Pillai's Trace and Roy's Largest Root
Pillai's Trace Criterion:
$$\text{trace}(B(B + W)^{-1}) = \text{trace}(H(H + E)^{-1}) = \sum_{i=1}^{p} \frac{\tilde{\lambda}_i}{1 + \tilde{\lambda}_i}$$
where $\tilde{\lambda}_i$ is an eigenvalue (root) of $HE^{-1}$.
Roy's Largest Root Criterion:
$$\theta = \text{largest root of } (E + H)^{-1}H = \text{largest root of } H(E + H)^{-1} = \frac{\tilde{\lambda}_1}{1 + \tilde{\lambda}_1}$$
where $\tilde{\lambda}_1$ is the largest root of $E^{-1}H$ (equivalently, of $HE^{-1}$).
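25 How They are All Related to Wilks' $\Lambda$
Let $\tilde{\lambda}_i$ be a root of $HE^{-1}$ (an "eigenvalue of $H$ relative to $E$"), with $\tilde{\lambda}_1 \ge \tilde{\lambda}_2 \ge \cdots \ge \tilde{\lambda}_p \ge 0$. Then we can write
$$\Lambda^* = \frac{|E|}{|E + H|} = \frac{|E|}{|E|\,|I_p + E^{-1}H|} = \frac{1}{|I_p + HE^{-1}|} = \frac{1}{\prod_{i=1}^{p} (1 + \tilde{\lambda}_i)} = \prod_{i=1}^{p} \frac{1}{1 + \tilde{\lambda}_i}$$
The steps rely on standard determinant results: $|AB| = |A||B|$, and a determinant equals the product of the eigenvalues.
So $\Lambda^*$ is a decreasing function of the $\tilde{\lambda}_i$. Equivalently, with $\theta_i = \tilde{\lambda}_i/(1 + \tilde{\lambda}_i)$, $\Lambda^* = \prod_{i=1}^{p} (1 - \theta_i)$.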
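All four statistics are functions of the roots of $E^{-1}H$, which a short numerical check makes concrete. The sketch below handles only $p = 2$ (closed-form eigenvalues); the matrices `H` and `E` and the function names are hypothetical illustrations.

```python
import math

# Hypothetical 2 x 2 hypothesis and error SSCP matrices.
H = [[78.0, -12.0], [-12.0, 48.0]]
E = [[10.0, 1.0], [1.0, 24.0]]


def roots_of_Einv_H(H, E):
    """Eigenvalues of E^{-1} H for 2 x 2 matrices, largest first."""
    det_E = E[0][0] * E[1][1] - E[0][1] * E[1][0]
    E_inv = [[E[1][1] / det_E, -E[0][1] / det_E],
             [-E[1][0] / det_E, E[0][0] / det_E]]
    M = [[sum(E_inv[i][k] * H[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    # Roots are real and non-negative for SSCP matrices; guard rounding.
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def manova_statistics(H, E):
    lam1, lam2 = roots_of_Einv_H(H, E)
    return {
        "wilks": 1.0 / ((1.0 + lam1) * (1.0 + lam2)),
        "hotelling_lawley": lam1 + lam2,
        "pillai": lam1 / (1.0 + lam1) + lam2 / (1.0 + lam2),
        "roy": lam1 / (1.0 + lam1),
    }


stats = manova_statistics(H, E)
print(stats)
```

The `wilks` entry computed from the roots agrees with the determinant ratio $|E|/|E + H|$, which is the identity derived on this slide.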
26 Which Test Statistic to Use
- Wilks' $\Lambda$ = likelihood ratio statistic.
- If all statistics lead to the same conclusion, use $\Lambda$.
- If the statistics lead to different conclusions, you need to figure out why.
From simulation studies (power & robustness):
- Roy's largest root was found to be the least useful, except when the population structure is such that groups differ in one dimension and one group is much more different from the rest.
- The others all do reasonably well with respect to power (they use more of the information in $E$ and $H$ than Roy's does).
- Pillai's trace criterion
  - is least affected by departures from the usual population model (i.e., more robust against departures from normality);
  - is better for diffuse alternative hypotheses than for sharper ones;
  - has the best power when the roots are approximately equal.
- Wilks' and Hotelling-Lawley have about the same power over a wider range (spectrum) of alternative hypotheses.
27 Other Cases and Summary MANOVA
For cases not covered: if $H_o$ is true and $\sum_{l=1}^{g} n_l = n$ is large, then
$$-\Big(n - 1 - \frac{p + g}{2}\Big) \ln \Lambda^* = -\Big(n - 1 - \frac{p + g}{2}\Big) \ln\Big(\frac{|W|}{|B + W|}\Big) \;\approx\; \chi^2_{p(g-1)}$$
You should examine the residual vectors (i.e., the $\hat{\epsilon}_{lj}$'s) for normality and outliers; maybe use PCA or methods mentioned in the text.
Source of variation | SSCP | df | Wilks' $\Lambda$
Treatment (Between) | $B = \sum_{l=1}^{g} n_l (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})'$ | $g - 1$ | $|W|/|T|$
Residual (Within) | $W = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)'$ | $n - g$ |
Total (corrected for mean) | $T = W + B = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x})(x_{lj} - \bar{x})'$ | $n - 1$ |
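The large-sample statistic is a one-line computation. The sketch below is illustrative only; the function name is made up, and the inputs ($\Lambda^* = 0.188$, $n = 60$, $p = 2$, $g = 3$) are borrowed from Example 1 in these slides with the decimal point of $\Lambda^*$ assumed.

```python
import math


def bartlett_chi2(lam, n, p, g):
    """Large-sample chi-square statistic for Wilks' Lambda.

    Returns (statistic, degrees of freedom), where the statistic is
    -(n - 1 - (p + g)/2) * ln(Lambda) and df = p * (g - 1).
    """
    stat = -(n - 1 - (p + g) / 2.0) * math.log(lam)
    return stat, p * (g - 1)


stat, df = bartlett_chi2(0.188, n=60, p=2, g=3)
print(stat, df)
```

The statistic would be compared with an upper percentile of $\chi^2_{p(g-1)}$; for 4 degrees of freedom the 5% critical value is about 9.49.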
28 Example 1: Distributed vs Massed Practice
1-way MANOVA: Data from Tatsuoka (1988), Multivariate Analysis: Techniques for Educational and Psychological Research (updated story).
An experiment was conducted to compare 2 methods (A & B) of teaching computer programming to 60 female seniors in a technical training high school program. Also of interest were the effects of distributed versus massed practice:
- $C_1$: 2 hours of instruction/day for 6 weeks
- $C_2$: 3 hours of instruction/day for 4 weeks
- $C_3$: 4 hours of instruction/day for 3 weeks
Each subject received a total of 12 hours of instruction.
For now, we'll just look at the effect of distributed versus massed practice.
Note: $n_l = 20$ for $l = 1, 2, 3$.
Two variables (dependent measures): $X_1$ = speed and $X_2$ = accuracy.
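29 Descriptive Statistics
The overall mean vector $\bar{x}$ and the mean vectors $\bar{x}_1, \bar{x}_2, \bar{x}_3$ for each condition: [values missing from this transcription].
The treatment effect vectors (i.e., $\hat{\tau}_l = \bar{x}_l - \bar{x}$), as they appear in the between-groups SSCP computation on a later slide:
$$\hat{\tau}_1 = \begin{pmatrix} 4.93 \\ 5.45 \end{pmatrix}, \quad \hat{\tau}_2 = \begin{pmatrix} -5.32 \\ -5.40 \end{pmatrix}, \quad \hat{\tau}_3 = \begin{pmatrix} 0.38 \\ -0.05 \end{pmatrix}$$
Sample covariance matrices $S_1$, $S_2$, $S_3$: [values missing from this transcription].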
30 Means and Confidence Regions
[Figure: confidence regions for $\mu_1$, $\mu_2$ and $\mu_3$, where $n_l = 20$. Axes: Speed and Accuracy. Plotted points: $\bar{x}$ (overall), $\bar{x}_1$ (2 hours/day for 6 weeks), $\bar{x}_2$ (3 hours/day for 4 weeks), $\bar{x}_3$ (4 hours/day for 3 weeks).]
31 Hypothesis Test
No difference between massed versus distributed practice on either speed or accuracy:
$$H_o: \tau_1 = \tau_2 = \tau_3 = 0 \quad \text{versus} \quad H_a: \tau_l \ne 0 \text{ for at least one } l = 1, 2, 3$$
The within groups (residual) sums of squares and cross-products matrix:
$$W = (n_1 - 1)S_1 + (n_2 - 1)S_2 + (n_3 - 1)S_3 = \begin{pmatrix} 1773.15 & 414.20 \\ 414.20 & 289.95 \end{pmatrix}$$
[The individual terms $19\,S_l$ are missing from this transcription; the entries of $W$ are those used in the determinant computed on a later slide.]
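32 Hypothesis Test continued
The between groups SSCP matrix:
$$B = \sum_{l=1}^{3} n_l (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})'$$
$$= 20\begin{pmatrix} 4.93 \\ 5.45 \end{pmatrix}(4.93,\ 5.45) + 20\begin{pmatrix} -5.32 \\ -5.40 \end{pmatrix}(-5.32,\ -5.40) + 20\begin{pmatrix} 0.38 \\ -0.05 \end{pmatrix}(0.38,\ -0.05) = \begin{pmatrix} 1055.03 & 1111.55 \\ 1111.55 & 1177.30 \end{pmatrix}$$
$$T = W + B = \begin{pmatrix} 2828.18 & 1525.75 \\ 1525.75 & 1467.25 \end{pmatrix}$$
Or $T = (n - 1)S$, where $S$ is the covariance matrix computed over all groups and $n$ is the total sample size. Then $B = T - W$.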
33 Test Statistic & Distribution
$$|W| = (1773.15)(289.95) - (414.20)^2 \approx 342{,}563$$
$$|T| = (2828.18)(1467.25) - (1525.75)^2 \approx 1{,}821{,}734$$
$$\Lambda^* = \frac{|W|}{|T|} = 0.188$$
For $p = 2$ and $g = 3$, we can use the exact sampling distribution:
$$\Big(\frac{n - g - 1}{g - 1}\Big)\Big(\frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\Big) = \Big(\frac{n - p - 2}{p}\Big)\Big(\frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\Big) \sim F_{2(g-1),\, 2(n-g-1)}$$
34 Test Statistic & Distribution
For this example,
$$\frac{(60 - 3 - 1)}{(3 - 1)}\Big(\frac{1 - \sqrt{.188}}{\sqrt{.188}}\Big) = 28\Big(\frac{.566}{.434}\Big) \approx 36.6$$
Since $F_{4,112}(\alpha = .05) \approx F_{4,120}(\alpha = .05) = 2.45$, reject $H_o$ that the treatment vectors are all equal to $0$.
The data support the conclusion that there is an effect of massed versus distributed practice.
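The arithmetic of this example can be reproduced in a few lines. This sketch assumes the matrix entries read with decimal points and minus signs restored as shown in the code (an assumption about the transcription, cross-checked in the code by comparing $T - W$ with $\sum n_l \hat{\tau}_l \hat{\tau}_l'$); variable names are illustrative.

```python
import math

# Assumed readings of the transcribed entries (decimal points restored).
W = [[1773.15, 414.20], [414.20, 289.95]]
T = [[2828.18, 1525.75], [1525.75, 1467.25]]
taus = [(4.93, 5.45), (-5.32, -5.40), (0.38, -0.05)]  # estimated effects
n_l, n, g, p = 20, 60, 3, 2


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Cross-check: B computed from the effect vectors should equal T - W.
B_from_taus = [[n_l * sum(t[i] * t[j] for t in taus) for j in range(p)]
               for i in range(p)]
B_from_diff = [[T[i][j] - W[i][j] for j in range(p)] for i in range(p)]

wilks = det2(W) / det2(T)
root = math.sqrt(wilks)
f_stat = ((n - g - 1) / (g - 1)) * (1 - root) / root
df = (2 * (g - 1), 2 * (n - g - 1))
print(round(wilks, 3), round(f_stat, 2), df)
```

The two versions of $B$ agree to within rounding of the reported effect vectors, which supports the assumed decimal placement.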
35 Following up a Significant Result
- Multivariate contrasts & confidence regions
- Tests on individual variables (simultaneous confidence intervals for group/treatment differences)
- Discriminant analysis
Multivariate Contrasts
We need the multivariate generalization of the general linear model:
$$X_{gn \times p} = A_{gn \times (g+1)} B_{(g+1) \times p} + E_{gn \times p}$$
where $A$ is the design matrix (it could have $g$ or $g + 1$ columns, depending on the parameterization) and $B$ is a matrix of coefficients (model parameters). Some examples follow.
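36 A is $n_+ \times g$ with dummy codes
$$X_{n_+ \times p} = A_{n_+ \times g} \begin{pmatrix} \beta_{o1} & \beta_{o2} & \cdots & \beta_{op} \\ \beta_{11} & \beta_{12} & \cdots & \beta_{1p} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2p} \\ \vdots & \vdots & & \vdots \\ \beta_{g-1,1} & \beta_{g-1,2} & \cdots & \beta_{g-1,p} \end{pmatrix} + E_{n_+ \times p}$$
Here $A$ has a column of ones followed by $g - 1$ dummy-coded (0/1) columns for groups $1, \ldots, g - 1$, with group $g$ as the reference. [The entries of $A$ are missing from this transcription.]
Given this design matrix, $\beta_{ok} = \mu_{gk}$ and $\beta_{lk} = \mu_{lk} - \mu_{gk}$.
If $p = 1$, we would have 1-way ANOVA.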
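37 SAS PROC GLM design: A is $n_+ \times (g + 1)$
An alternative design matrix and parameter matrix:
$$X_{n_+ \times p} = A_{n_+ \times (g+1)} \begin{pmatrix} \beta_{o1} & \beta_{o2} & \cdots & \beta_{op} \\ \beta_{11} & \beta_{12} & \cdots & \beta_{1p} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2p} \\ \vdots & \vdots & & \vdots \\ \beta_{g-1,1} & \beta_{g-1,2} & \cdots & \beta_{g-1,p} \\ \beta_{g,1} & \beta_{g,2} & \cdots & \beta_{g,p} \end{pmatrix} + E_{n_+ \times p}$$
[The entries of $A$ are missing from this transcription.]
Normally, $\hat{B} = (A'A)^{-1}A'X$; however, the rank of $A$ defined above (and hence of $A'A$) is only $g$ $\Rightarrow$ there is no unique solution to $A'X = A'AB$.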
38 What's Interesting
We're interested in differences between group means; that is,
$$\mu_i - \mu_k = (\mu + \tau_i)_{p \times 1} - (\mu + \tau_k)_{p \times 1} = \tau_i - \tau_k$$
Even if we can't get unique estimates of the elements of $B$, we can get unique estimates of differences between parameter estimates, which correspond to differences between group means, regardless of which generalized inverse of $(A'A)$ is used.
- A generalized inverse (for example, the Moore-Penrose inverse) of the non-full-rank square matrix $(A'A)$ is denoted by $(A'A)^-$.
- SAS PROC GLM uses a generalized inverse of $(A'A)$.
- In SAS PROC IML, the Moore-Penrose inverse is obtained with the function ginv( ), for example: giaa = ginv(A`*A);
39 Estimable and Testable
What we can do is test linear combinations of elements of $B$, provided the linear combination is a contrast.
- Estimable: a linear function $c'B$ is estimable if $(A'A)(A'A)^- c = c$.
- Testable: a linear hypothesis is testable if it involves only estimable functions of $B$.
Contrasts of elements of $B$ are estimable and therefore testable. These correspond to differences between means.
We'll demonstrate the multivariate general linear model by example.
40 Example: Cameron & Pauling Data
Increase in survival of cancer patients given supplemental treatment with vitamin C.
Increase in survival = the number of days a patient survives minus the number of days a matched control survives.
- $x_1 = d_1$ = increase in survival measured as days from first hospitalization
- $x_2 = d_2$ = increase in survival measured as days from un-treatability
- type = type of cancer (1 = stomach, 2 = bronchus, 3 = colon, 4 = rectum, 5 = bladder, 6 = kidney)
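41 Example: Descriptive statistics
$l$ | Type | $n_l$ | $\bar{x}_l = \bar{d}_l$ | $S_l$
1 | Stomach | 12 | [missing] | [missing]
2 | Bronchus | 16 | [missing] | [missing]
3 | Colon | 16 | [missing] | [missing]
4 | Rectum | 7 | [missing] | [missing]
5 | Bladder | 5 | [missing] | [missing]
6 | Kidney | 7 | [missing] | [missing]
(The group sizes are those listed with the plot of means; the mean vectors and covariance matrices are missing from this transcription.)
$$S_{pool} = \frac{1}{n_+ - g} \sum_{l=1}^{g} (n_l - 1) S_l = \text{[values missing]}$$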
42 Plot of Means and 95% Confidence Regions
[Figure: group means $\bar{x}_1$ (stomach), $\bar{x}_2$ (bronchus), $\bar{x}_3$ (colon), $\bar{x}_4$ (rectum), $\bar{x}_5$ (bladder), $\bar{x}_6$ (kidney) and the overall mean $\bar{x}$ with 95% confidence regions. Axes: d1 (1st hospitalization) and d2 (untreatable).]
The regions are computed using the separate $S_l$.
$n_1 = 12$, $n_2 = 16$, $n_3 = 16$, $n_4 = 7$, $n_5 = 5$, $n_6 = 7$
43 Plot of Means and 95% Confidence Regions
[Figure: the same group means (stomach, bronchus, colon, rectum, bladder, kidney) and overall mean with 95% confidence regions. Axes: d1 and d2.]
The regions are computed using $S_{pool}$.
44 Results of MANOVA Hypothesis Test
$$H_o: \mu_{stomach} = \mu_{bronchus} = \mu_{colon} = \mu_{rectum} = \mu_{bladder} = \mu_{kidney}$$
or equivalently
$$H_o: \tau_{stomach} = \tau_{bronchus} = \tau_{colon} = \tau_{rectum} = \tau_{bladder} = \tau_{kidney}$$
- df for type of cancer (hypothesis): $\nu_h = g - 1 = 6 - 1 = 5$
- df within (error): $\nu_e = \sum_l n_l - g = 63 - 6 = 57$
45 Results of MANOVA Hypothesis Test continued
Wilks' $\Lambda$ = det($W$)/det($T$) = [value missing]
Since there are $p = 2$ dependent variables, Wilks' $\Lambda$ has an exact sampling distribution that is $F$; in particular,
$$F = \Big(\frac{\nu_e - 1}{\nu_h}\Big)\Big(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\Big) \sim F_{2\nu_h,\, 2(\nu_e - 1)}$$
$F$ = [value missing] and p-value = 0005 [sic; decimal point lost in transcription].
Reject $H_o$. The data support the conclusion that not all of the means (or $\tau$'s) are equal.
46 Estimated MANOVA parameters
$\hat{\mu}$ = [value missing]
Type | $\hat{\tau}_l$ (hospitalization, untreatable), digits as transcribed
Stomach | $\hat{\tau}_1$ = ([missing], 71543)
Bronchus | $\hat{\tau}_2$ = (156055, 59585)
Colon | $\hat{\tau}_3$ = (88368, [missing])
Rectum | $\hat{\tau}_4$ = (91873, 60111)
Bladder | $\hat{\tau}_5$ = ([missing], 36660)
Kidney | $\hat{\tau}_6$ = (86698, 64746)
(Decimal points and minus signs did not survive transcription.)
Recall that $\mu_l = \mu + \tau_l$.
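47 MANOVA as a multivariate GLM
An intercept and six dummy variables (this is what PROC GLM does). So the design matrix $A_{n_+ \times 7}$ has a column of ones followed by six 0/1 columns, with rows grouped in blocks of $n_{stomach}$, $n_{bronchus}$, $n_{colon}$, $n_{rectum}$, $n_{bladder}$, and $n_{kidney}$ cases. [The entries of $A$ are missing from this transcription.]
MANOVA (multivariate general linear model):
$$y_{n_+ \times 2} = A_{n_+ \times 7} B_{7 \times 2} + \epsilon_{n_+ \times 2}$$
Estimation: $\hat{B} = (A'A)^- A'y$
Predicted values: $\hat{y} = A\hat{B}$, where $\hat{y}_{lj} = (\bar{x}_{l1}, \bar{x}_{l2})$ for every case $j$ in group $l$.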
48 MANOVA as a multivariate GLM (continued)
For $\hat{y}_{lj} = (\bar{x}_{l1}, \bar{x}_{l2})$, it is the case that
$$\bar{x}_{l1} = b_{o1} + b_{l1} \quad \text{and} \quad \bar{x}_{l2} = b_{o2} + b_{l2}$$
So to compare two groups (types of cancer) $l$ and $l'$ on variable $i$,
$$\bar{x}_{li} - \bar{x}_{l'i} = (b_{oi} + b_{li}) - (b_{oi} + b_{l'i}) = b_{li} - b_{l'i}$$
Consider a contrast between the means for two types of cancer, for example, stomach and bronchus:
$$c'B = (0, 1, -1, 0, 0, 0, 0) \begin{pmatrix} b_{o1} & b_{o2} \\ b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \\ b_{41} & b_{42} \\ b_{51} & b_{52} \\ b_{61} & b_{62} \end{pmatrix}$$
49 MANOVA as a multivariate GLM (continued)
$$H_o: c'B = 0$$
$$c'\hat{B} = ((b_{11} - b_{21}),\ (b_{12} - b_{22})) = ((\bar{x}_{11} - \bar{x}_{21}),\ (\bar{x}_{12} - \bar{x}_{22}))$$
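50 With the Parameter Estimates
$\hat{B}$ = [matrix missing from this transcription]
In the values below, decimal points and minus signs are restored, and the intercept terms (326.31 and 158.84) and the second bronchus mean (106.87) are recovered from the surviving sums and differences.
stomach:
$\bar{x}_{11} = b_{o1} + b_{11} = 326.31 + (-255.64) = 70.67$
$\bar{x}_{12} = b_{o2} + b_{12} = 158.84 + (-63.93) = 94.91$
bronchus:
$\bar{x}_{21} = b_{o1} + b_{21} = 326.31 + (-276.81) = 49.50$
$\bar{x}_{22} = b_{o2} + b_{22} = 158.84 + (-51.97) = 106.87$
The estimate of the contrast in $H_o: (0, 1, -1, 0, 0, 0, 0)B = 0$ is
$$(0, 1, -1, 0, 0, 0, 0)\hat{B} = ((70.67 - 49.50),\ (94.91 - 106.87)) = (21.17,\ -11.96) = ((\bar{x}_{11} - \bar{x}_{21}),\ (\bar{x}_{12} - \bar{x}_{22}))$$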
51 Testing $H_o: CBM = 0$
Our hypothesis tests can be of the form
$$H_o: C_{r \times g} B_{g \times p} M_{p \times s} = 0_{r \times s}$$
- $C$ defines hypotheses (contrasts) on the elements of the columns of $B$; that is, comparisons between the means on the same variable over groups.
- $M$ defines hypotheses (contrasts) on the elements of the rows of $B$; that is, comparisons between the means for the same group over variables.
For now $M = I$, and we'll consider hypotheses of the form $H_o: CB = 0_{r \times p}$.
Specifically, we want to consider (for example)
$$H_o: 0\,b_{0k} + c_1 b_{1k} + c_2 b_{2k} + \cdots + c_g b_{gk} = 0, \quad \text{i.e.,} \quad c_1 \tau_1 + c_2 \tau_2 + \cdots + c_g \tau_g = 0,$$
where $\sum_{l=1}^{g} c_l = 0$.
52 Testing Contrasts: The H matrix
For a simple contrast, such as $c' = (0, 1, -1, 0, 0)$, we could do this as a multivariate $T^2$ test for independent groups; however, we'll stay within the MANOVA and multivariate linear model framework (so we can test multiple contrasts at once).
Suppose that we have a contrast matrix $C_{r \times (g+1)}$ whose rows are $r$ orthogonal contrasts. The hypothesis matrix equals
$$H = (C\hat{B})'(C(A'A)^- C')^{-1}(C\hat{B})$$
For a balanced design (i.e., $n_1 = n_2 = \cdots = n_g = n$) and a single contrast (i.e., $r = 1$), this reduces to
$$H = \frac{n}{\sum_{l=1}^{g} c_l^2} \Big(\sum_{l=1}^{g} c_l \bar{x}_l\Big)\Big(\sum_{l=1}^{g} c_l \bar{x}_l\Big)'$$
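The balanced, single-contrast form of $H$ is easy to compute directly. In this sketch the contrast coefficients are given over the $g$ groups only (no leading intercept coefficient), the function name is made up, and the input vectors are the Example 1 treatment effects with decimal points assumed; since the contrast sums to zero, effects and means give the same $H$.

```python
def contrast_h_balanced(c, means, n):
    """Hypothesis SSCP for one contrast in a balanced one-way design.

    c     : contrast coefficients over the g groups (should sum to zero)
    means : list of g group mean (or effect) vectors, each of length p
    n     : common group size
    """
    p = len(means[0])
    # Linear combination sum_l c_l * xbar_l, one entry per variable
    combo = [sum(cl * m[i] for cl, m in zip(c, means)) for i in range(p)]
    scale = n / sum(cl * cl for cl in c)
    return [[scale * a * b for b in combo] for a in combo]


# Assumed effect vectors from Example 1 (decimal points restored).
effects = [(4.93, 5.45), (-5.32, -5.40), (0.38, -0.05)]
H = contrast_h_balanced([1, -1, 0], effects, n=20)  # condition 1 vs 2
print(H)
```

The resulting matrix has rank 1, as expected for a single contrast, so Wilks' Lambda for this test would use $\nu_h = 1$.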
53 Testing Contrasts
The error matrix is $W$; that is,
$$E = W = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)' = X'X - \hat{B}'(A'A)\hat{B}$$
Wilks' Lambda for the test of $H_o: CB = 0$ is
$$\Lambda = \frac{\det(E)}{\det(H + E)}$$
To find the transformation of this to an $F$ distribution:
- $p$ = anything
- $\nu_{hypothesis} = r$
- $\nu_{error} = \sum_l n_l - g$
54 Example: Five types the same?
Three equivalent forms for Hypothesis 1:
$$H_o: \mu_{bronchus} = \mu_{colon} = \mu_{kidney} = \mu_{rectum} = \mu_{stomach}$$
$$H_o: \tau_{bronchus} = \tau_{colon} = \tau_{kidney} = \tau_{rectum} = \tau_{stomach}$$
$$H_o: \beta_{bronchus} = \beta_{colon} = \beta_{kidney} = \beta_{rectum} = \beta_{stomach}$$
where $\beta_l$ is a $p \times 1$ column vector (i.e., a row of $B$ written as a column).
For the contrast matrix we need to know the order of the effects in the GLM. I re-order them so that they are in alphabetical order, because PROC GLM puts them in alphabetical order (or numerical order if groups are coded that way).
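55 $H_o$: Five types the same? (continued)
$$H_o: CB = 0, \quad B = \begin{pmatrix} \beta_{o1} & \beta_{o2} \\ \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \beta_{31} & \beta_{32} \\ \beta_{41} & \beta_{42} \\ \beta_{51} & \beta_{52} \\ \beta_{61} & \beta_{62} \end{pmatrix} \begin{matrix} \text{intercept} \\ \text{bladder} \\ \text{bronchus} \\ \text{colon} \\ \text{kidney} \\ \text{rectum} \\ \text{stomach} \end{matrix}$$
[The entries of $C$ are missing from this transcription; the result below shows four contrasts, each comparing bronchus with one of the other four types.]
So
$$H_o: \begin{pmatrix} (\beta_{21} - \beta_{31}) & (\beta_{22} - \beta_{32}) \\ (\beta_{21} - \beta_{41}) & (\beta_{22} - \beta_{42}) \\ (\beta_{21} - \beta_{51}) & (\beta_{22} - \beta_{52}) \\ (\beta_{21} - \beta_{61}) & (\beta_{22} - \beta_{62}) \end{pmatrix} = \begin{pmatrix} (\tau_{21} - \tau_{31}) & (\tau_{22} - \tau_{32}) \\ (\tau_{21} - \tau_{41}) & (\tau_{22} - \tau_{42}) \\ (\tau_{21} - \tau_{51}) & (\tau_{22} - \tau_{52}) \\ (\tau_{21} - \tau_{61}) & (\tau_{22} - \tau_{62}) \end{pmatrix} = 0$$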
56 Hypothesis Matrices
$$H = (C\hat{B})'(C(A'A)^- C')^{-1}(C\hat{B}) = \text{[values missing]}$$
The error SSCP matrix $E$ is the same as the $W$ that we used before, which equals
$$E = W = X'X - \hat{B}'(A'A)\hat{B} = (n_s + n_b + n_c + n_r + n_d + n_k - 6)\,S_{pool} = \text{[values missing]}$$
Wilks' Lambda:
$$\Lambda = \frac{\det(E)}{\det(H + E)} = \text{[values garbled in transcription: 44877E E13]}$$
57 Results
- $\nu_h$ = number of rows of $C$ = 4
- $\nu_e$ = (number of rows of $X$) $- g$ = 63 $-$ 6 = 57
Referring to the table of transformations of $\Lambda$ that have $F$ sampling distributions, we use the one for $p = 2$ and $\nu_h \ge 1$, which is
$$F = \Big(\frac{\nu_e - 1}{\nu_h}\Big)\Big(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\Big) = \Big(\frac{56}{4}\Big)\Big(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\Big) = 1.45$$
If the null is true, then this should have an $F_{2\nu_h,\, 2(\nu_e - 1)}$ sampling distribution.
Comparing $F = 1.45$ to the $F_{8,112}$ distribution, we find that the p-value is .18.
Retain the null hypothesis. The data suggest no difference in increased survival of patients over the different types of cancer (excluding bladder).
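58 Five versus the Rest
$$H_o: \tau_{bladder} = (\tau_{bronchus} + \tau_{colon} + \tau_{kidney} + \tau_{rectum} + \tau_{stomach})/5$$
or equivalently
$$H_o: CB = (0, -5, 1, 1, 1, 1, 1) \begin{pmatrix} \beta_{o1} & \beta_{o2} \\ \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \beta_{31} & \beta_{32} \\ \beta_{41} & \beta_{42} \\ \beta_{51} & \beta_{52} \\ \beta_{61} & \beta_{62} \end{pmatrix} \begin{matrix} \text{intercept} \\ \text{bladder} \\ \text{bronchus} \\ \text{colon} \\ \text{kidney} \\ \text{rectum} \\ \text{stomach} \end{matrix} = 0$$
$E$ is the same as before, but now $H$ = [values missing].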
59 The Test and Result
$\Lambda$ = [values garbled in transcription: 44877E E13]
- $\nu_e = \sum_{l=1}^{g} n_l - g = 57$
- $\nu_h = 1$, the number of rows of $C$
So, for $\nu_h = 1$, use
$$F = \Big(\frac{\nu_e + \nu_h - p}{p}\Big)\Big(\frac{1 - \Lambda}{\Lambda}\Big) = \text{[value missing]},$$
which, if the null is true (and the assumptions are valid), should have a sampling distribution that is $F_{p,\, (\nu_e + \nu_h - p)}$.
Comparing $F$ to $F_{2,56}$, we get a p-value < .01. Reject $H_o$.
Summary: The mean survival of patients with bladder cancer differs from that of patients with other types of cancer; however, there is no support for differences among the other types.
Question: Are there differences for survival from first hospitalization?
60 Simultaneous Confidence Intervals
We can construct simultaneous confidence intervals for components of the differences $\tau_l - \tau_{l'}$ (which equal $\mu_l - \mu_{l'}$) or for other linear combinations such as $\tau_1 - (\tau_2 + \tau_3)/2$.
There are at least three ways of doing this:
- Specify a matrix $M$ in the hypothesis test $H_o: CBM = 0$ that is a $(p \times 1)$ vector of zeros with a 1 in the $i$th position, $M' = (0, \ldots, \underbrace{1}_{i\text{th}}, \ldots, 0)$.
- Bonferroni-type: Same as above, but split the $\alpha$ into pieces, one part for each of the planned comparisons.
- Roy's method, which is based on the union-intersection principle.
61 Using CBM = 0
- $C$ picks out which two (or more) groups to compare; e.g., to compare bladder with the rest, $C = (0, 1, -.2, -.2, -.2, -.2, -.2)$.
- $M$ picks out which variable (or linear combination of variables); e.g., to compare just $d_1$, increase in survival from first hospitalization, $M' = (1, 0)$.
Putting these together in our example gives us
$$(0, 1, -.2, -.2, -.2, -.2, -.2) \begin{pmatrix} \beta_{o1} & \beta_{o2} \\ \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \beta_{31} & \beta_{32} \\ \beta_{41} & \beta_{42} \\ \beta_{51} & \beta_{52} \\ \beta_{61} & \beta_{62} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \beta_{11} - \frac{1}{5} \sum_{l=2}^{6} \beta_{l1}$$
62 Confidence interval for CBM
$$CBM = \beta_{11} - \frac{1}{5} \sum_{l=2}^{6} \beta_{l1} = \tau_{11} - \frac{1}{5} \sum_{l=2}^{6} \tau_{l1}$$
We need two things:
- a "fudge factor": a value from a probability distribution;
- an estimate of the standard error.
A $(1 - \alpha)100\%$ confidence statement, given vectors $C_{1 \times (g+1)}$ and $M_{p \times 1}$, is
$$C\hat{B}M \pm \sqrt{F_{1,\nu_e}(\alpha)\,(M'S_{pool}M)\,(C(A'A)^- C')}$$
Note: Consider two columns of $B$, $\beta_i$ and $\beta_k$; the covariance matrix between them is $\text{cov}(\beta_i, \beta_k) = s_{pool,ik}(A'A)^-$.
63 Our example: CI for CBM
$$\hat{\beta}_{11} - \frac{1}{5} \sum_{l=2}^{6} \hat{\beta}_{l1} = 1173.47 \quad \text{(the midpoint of the interval below; the slide's own arithmetic is missing)}$$
$$(M'S_{pool}M)(C(A'A)^- C') = s_{pool,11}\,(C(A'A)^- C') = (306.34)^2 \quad \text{(intermediate values missing)}$$
And $F_{1,57}(.05) = 4.01$.
So our 95% confidence interval is
$$1173.47 \pm \sqrt{4.01}\,(306.34) \longrightarrow (560.03,\ 1786.91)$$
Since 0 is not in the interval, the mean increase in survival from first hospitalization for bladder cancer is larger than the average of the other means.
Should we test whether the same is true for increase in survival from time of untreatability?
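The interval can be reproduced from its pieces. This is a sketch under stated assumptions: the standard error 306.34, the critical value 4.01 and the endpoints are read from the slide with decimal points restored, the point estimate 1173.47 is taken as the interval midpoint, and the function names are made up. It also uses the fact that, for a contrast among group means in a one-way layout, $C(A'A)^-C'$ reduces to $\sum_l c_l^2 / n_l$.

```python
import math


def contrast_variance_factor(c, n):
    """sum_l c_l^2 / n_l: the value of C (A'A)^- C' for a contrast among
    group means in a one-way layout."""
    return sum(cl ** 2 / nl for cl, nl in zip(c, n))


def simultaneous_ci(estimate, se, f_crit):
    """Interval: estimate +/- sqrt(F critical value) * standard error."""
    half = math.sqrt(f_crit) * se
    return estimate - half, estimate + half


# Alphabetical order: bladder, bronchus, colon, kidney, rectum, stomach.
c = [1, -0.2, -0.2, -0.2, -0.2, -0.2]
n = [5, 16, 16, 7, 7, 12]
factor = contrast_variance_factor(c, n)

lower, upper = simultaneous_ci(1173.47, 306.34, 4.01)
print(round(factor, 4), round(lower, 2), round(upper, 2))
```

Multiplying `factor` by $s_{pool,11}$ and taking the square root would give the standard error; $s_{pool,11}$ itself is not legible in the transcription, so the standard error is passed in directly here.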
64 Plot of Means
[Figure: plot of the six group means $\bar{x}_1, \ldots, \bar{x}_6$ and the overall mean $\bar{x}$. Axes: d1 and d2.]
65 Notes about These CIs
- If you're only looking at (testing) the difference between two means, e.g., $\tau_{li} - \tau_{l'i}$, then the standard error is just $\sqrt{s_{pool,ii}\,(1/n_l + 1/n_{l'})}$.
- When looking at a difference for a single variable (e.g., as above), these confidence statements are equivalent to what you would get from 1-way ANOVA using Fisher's least significant differences; that is, they are univariate CIs.
- When considering a linear combination of variables, these CIs are equivalent to univariate CIs where you've analyzed a new or composite variable defined by the linear combination.
- In our example, we don't have to worry too much about an inflated Type I error rate, because we only did one CI after rejecting the overall test and using multivariate contrasts to narrow down where differences exist.
- If you do all pairwise differences, there are $g(g - 1)/2$ pairs times $p$ variables (e.g., $2(6)(5)/2 = 30$).
66 Bonferroni Intervals
If you have planned to look at all pairwise comparisons before looking at the data (i.e., $m = pg(g - 1)/2$), then you can use $t_{\nu_e}(\alpha/(2m))$ as your fudge factor.
Let $n_+ = \sum_{l=1}^{g} n_l$. For the model $X_{lj} = \mu + \tau_l + \epsilon_{lj}$ with $j = 1, \ldots, n_l$ and $l = 1, \ldots, g$, with confidence at least $(1 - \alpha)$, $(\tau_{li} - \tau_{l'i})$ belongs to
$$(\bar{x}_{li} - \bar{x}_{l'i}) \pm t_{\nu_e}(\alpha/(2m)) \sqrt{s_{pool,ii}\Big(\frac{1}{n_l} + \frac{1}{n_{l'}}\Big)}$$
for all components (variables) $i = 1, \ldots, p$ and all differences $l < l' = 1, \ldots, g$.
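The bookkeeping for Bonferroni intervals can be sketched as follows. The function names are illustrative, and the $t$ critical value is passed in as an argument because the Python standard library does not provide $t$ quantiles; it would come from a table or a statistics package.

```python
import math


def bonferroni_plan(p, g, alpha=0.05):
    """Number of pairwise comparisons m = p*g*(g-1)/2 and the upper-tail
    probability alpha/(2m) at which to look up the t critical value."""
    m = p * g * (g - 1) // 2
    return m, alpha / (2 * m)


def bonferroni_interval(diff, t_crit, s_pool_ii, n_l, n_lp):
    """Interval for one component of a difference between two groups."""
    half = t_crit * math.sqrt(s_pool_ii * (1.0 / n_l + 1.0 / n_lp))
    return diff - half, diff + half


m, tail = bonferroni_plan(p=2, g=6)       # the cancer example: 30 intervals
print(m, tail)

# Hypothetical numbers, only to show the call.
print(bonferroni_interval(10.0, t_crit=2.0, s_pool_ii=4.0, n_l=8, n_lp=8))
```

With $p = 2$ and $g = 6$ each interval uses a tail probability of $.05/60$, which is why Bonferroni intervals widen quickly as the number of planned comparisons grows.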
67 Roy's Method
This is based on the union-intersection principle.
It is more like the first method that we considered (i.e., $CBM$); however, we use a different distribution for our fudge factor.
We use the greatest root of
$$|H - \theta(E + H)| = 0$$
where $H$ (between groups or hypothesis SSCP matrix) and $E = W$ (error or within groups SSCP matrix) are independent Wishart matrices.
To apply this result, we need percentiles of the distribution of the largest root $\lambda$ of the equation $|H - \lambda E| = 0$. Percentiles can be found in tables.
This distribution does not depend on $\Sigma$, but only on the degrees of freedom (which involve $n$, $g$, and $p$).
68 Roy's Method
Tables and charts of the greatest root distribution exist; however, they are difficult to read (you can find them in the older literature).
Recommendation: I'd suggest using Scheffé's method, where you do a 1-way ANOVA on a linear combination of variables and then specify the contrast that you want.
69 A Truly Multivariate Follow-Up: Discriminant Analysis
The first discriminant function gives the linear combination of the $p$ variables that yields the greatest differences between the means of the groups.
You can get up to $\min(p, g-1)$ functions. They come from the characteristic roots and vectors of $E^{-1}H$.
For now, we'll just get them from SAS/PROC GLM.
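Why the characteristic vectors of $E^{-1}H$ appear can be sketched in a few lines. The coefficient vector $a$ is notation introduced here for illustration, and $E$ is assumed nonsingular.

$$\max_{a \neq 0} \frac{a'Ha}{a'Ea} \;\Longrightarrow\; Ha = \lambda Ea \;\Longleftrightarrow\; E^{-1}Ha = \lambda a, \qquad \lambda = \frac{a'Ha}{a'Ea}.$$

The ratio compares between-group to within-group variation of the composite $a'x$. The characteristic vector belonging to the largest root $\lambda_1$ gives the first discriminant function, and the remaining vectors give the later ones in order of decreasing $\lambda$.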
70 Summary: PCA, MANOVA, DA
[Figure: points x1 through x6 plotted on axes d1 and d2, with the directions of the 1st PC and the 1st discriminant marked.]
71 SAS IML and GLM
SAS IML code using the traditional approach, and the PROC GLM one:

    PROC GLM data=vitc;
      class type;
      model d1 d2 = type / solution;
      * Note: The order of the values in the contrast are alphabetical,
        in this case order is bladder bronchus colon kidney rectum stomach;
      contrast 'bronchus=colon=kidney=rectum=stomach' type , type , type , type ;
      contrast 'bladder vs others' type ;
      manova h=type / printh printe;
      estimate 'b vs o' type ;
      lsmeans type;
      title 'MANOVA of vitamin C and Cancer';

Alternate MANOVA statement where M is entered as $M'$:

    manova h=type M=(1 0);
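The contrast and estimate coefficients are not recoverable from the slide text. The sketch below shows one set of coefficients that would express the two labelled hypotheses, given the alphabetical level order stated in the note (bladder, bronchus, colon, kidney, rectum, stomach). All numeric coefficients, the `divisor=5` option, and the closing `run; quit;` are assumptions added for illustration, not the values used in the original program.

```sas
/* Hypothetical coefficients: one valid choice for the level order
   bladder bronchus colon kidney rectum stomach. Not from the slide. */
proc glm data=vitc;
  class type;
  model d1 d2 = type / solution;
  /* Four rows that together test bronchus=colon=kidney=rectum=stomach */
  contrast 'bronchus=colon=kidney=rectum=stomach'
           type 0 1 -1  0  0  0,
           type 0 0  1 -1  0  0,
           type 0 0  0  1 -1  0,
           type 0 0  0  0  1 -1;
  /* Bladder mean versus the average of the other five means */
  contrast 'bladder vs others' type 5 -1 -1 -1 -1 -1;
  manova h=type / printh printe;
  /* divisor=5 puts the estimate on the scale of a mean difference */
  estimate 'b vs o' type 5 -1 -1 -1 -1 -1 / divisor=5;
  lsmeans type;
  title 'MANOVA of vitamin C and Cancer';
run;
quit;
```

Any other set of four linearly independent rows that each sum to zero and put a zero on bladder would test the same first hypothesis; adjacent differences are used here only because they are easy to read.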
72 SAS GLM Output
Univariate ANOVAs for each dependent variable.
If requested, the E (printe) and H (printh) SSCP matrices.
The p characteristic roots and vectors of $E^{-1}H$ (i.e., discriminant functions).
Other requested statistics: contrasts, estimates of contrasts, cell means, etc.
Test statistics for no overall effect specified in the MANOVA statement.
Show SAS program and output.