Comparisons of Several Multivariate Populations

Edps/Soc 584, Psych 594
Carolyn J. Anderson
Department of Educational Psychology, University of Illinois at Urbana-Champaign
© Board of Trustees, University of Illinois
Spring 2017

Overview
1-way ANOVA: classic treatment; as a general linear model
1-way MANOVA: the model (generalization of ANOVA to multivariate data); hypothesis testing; Example 1: massed vs. distributed practice
Multivariate general linear model and Example 2: increased survival
Following up a significant result: multivariate contrasts; simultaneous confidence intervals; discriminant function
Summary of PCA, MANOVA, DA
SAS IML and PROC GLM
Reading: Johnson & Wichern, pages 296-323

Generalizing 1-way ANOVA to Multivariate Data
This generalizes 1-way ANOVA to multivariate data, and generalizes the multivariate $T^2$ to more than two populations.
Suppose that we have random samples from $g$ populations and measures on $p$ variables:
Population 1: $\mathbf{x}_{11}, \mathbf{x}_{12}, \ldots, \mathbf{x}_{1n_1}$
Population 2: $\mathbf{x}_{21}, \mathbf{x}_{22}, \ldots, \mathbf{x}_{2n_2}$
...
Population $g$: $\mathbf{x}_{g1}, \mathbf{x}_{g2}, \ldots, \mathbf{x}_{gn_g}$
where each $\mathbf{x}_{lj}$ is a $(p \times 1)$ vector.

Examples:
Are 5 standardized test scores the same for high school students who attend different high school programs (i.e., general, vo/tech, academic)?
Are survival times, measured in two ways, different for those treated with supplemental vitamin C across six types of cancer?
Others?

Basic Assumptions
Assumptions needed for statistical inference:
$\mathbf{X}_{l1}, \mathbf{X}_{l2}, \ldots, \mathbf{X}_{ln_l}$ is a random sample of size $n_l$ from a population with mean $\boldsymbol{\mu}_l$, for $l = 1, \ldots, g$ (i.e., observations within populations are independent and representative of their populations).
Random samples from different populations are independent.
All populations have the same covariance matrix, $\boldsymbol{\Sigma}$.
$\mathbf{X}_{lj} \sim N_p(\boldsymbol{\mu}_l, \boldsymbol{\Sigma})$; that is, each population is multivariate normal.
If a population is not multivariate normal, then for large $n_l$ the central limit theorem may apply.

One-way ANOVA Review
The univariate case, where $p = 1$.
Assumptions: $X_{lj} \sim N(\mu_l, \sigma^2)$ i.i.d. for $j = 1, \ldots, n_l$ and $l = 1, \ldots, g$.
Hypotheses: $H_o: \mu_1 = \mu_2 = \cdots = \mu_g$ versus $H_a$: not $H_o$.
We usually express $\mu_l$ as the sum of a grand mean and a deviation from the grand mean:
$$\underbrace{\mu_l}_{l\text{th pop. mean}} = \underbrace{\mu}_{\text{grand mean}} + \underbrace{(\mu_l - \mu)}_{l\text{th pop. treatment effect}} = \mu + \tau_l$$
If $\mu_1 = \mu_2 = \cdots = \mu_g$, then an equivalent way to write the null hypothesis is $H_o: \tau_1 = \tau_2 = \cdots = \tau_g = 0$.

The Model for an Observation
$$X_{lj} = \mu + \tau_l + \epsilon_{lj}$$
where $\epsilon_{lj} \sim N(0, \sigma^2)$ and independent; $\epsilon_{lj}$ is random error.
We typically impose the condition $\sum_{l=1}^{g} \tau_l = 0$ as an identification constraint.
The decomposition of an observation is
$$\underbrace{X_{lj}}_{\text{observation}} = \underbrace{\bar{X}}_{\text{overall sample mean}} + \underbrace{(\bar{X}_l - \bar{X})}_{\text{estimated treatment effect}} + \underbrace{(X_{lj} - \bar{X}_l)}_{\text{residual}}$$
$\bar{X}$ is the estimator of $\mu$.
$\hat{\tau}_l = (\bar{X}_l - \bar{X})$ is the estimator of $\tau_l$.
$\hat{\epsilon}_{lj} = (X_{lj} - \bar{X}_l)$ is the estimator of $\epsilon_{lj}$.

The Sums of Squares
The sum of squared observations:
$$SS_{obs} = SS_{total} = \sum_{l=1}^{g}\sum_{j=1}^{n_l} X_{lj}^2$$
We also take the three components of $X_{lj}$ and form sums of squares:
$$SS_{mean} = \sum_{l=1}^{g}\sum_{j=1}^{n_l} \bar{X}^2 = \Big(\sum_{l=1}^{g} n_l\Big)\bar{X}^2$$
$$SS_{treatment} = \sum_{l=1}^{g}\sum_{j=1}^{n_l} \hat{\tau}_l^2 = \sum_{l=1}^{g}\sum_{j=1}^{n_l} (\bar{X}_l - \bar{X})^2 = \sum_{l=1}^{g} n_l(\bar{X}_l - \bar{X})^2$$
$$SS_{res} = \sum_{l=1}^{g}\sum_{j=1}^{n_l} \hat{\epsilon}_{lj}^2 = \sum_{l=1}^{g}\sum_{j=1}^{n_l} (X_{lj} - \bar{X}_l)^2$$

Sums of Squares Decomposition & Geometry
$$SS_{obs} = SS_{mean} + SS_{tr} + SS_{res}$$
or
$$SS_{corrected} = SS_{obs} - SS_{mean} = SS_{tr} + SS_{res}$$
This works because the components (sums of squares) are orthogonal.
Geometry: Consider the $n = \sum_{l=1}^{g} n_l$ dimensional observation space, where each observation defines a dimension. We break this space into three orthogonal sub-spaces, one for each component. The dimensionality of each sub-space equals the degrees of freedom for the corresponding SS (see text for more details).

ANOVA Summary Table
Let $n_+ = \sum_{l=1}^{g} n_l$, the total sample size.

| Source of variation | Sum of squares | df |
|---|---|---|
| Treatment | $SS_{tr} = \sum_{l=1}^{g} n_l(\bar{X}_l - \bar{X})^2$ | $g - 1$ |
| Residual | $SS_{res} = \sum_{l=1}^{g}\sum_{j=1}^{n_l}(X_{lj} - \bar{X}_l)^2$ | $n_+ - g$ |
| Total (corrected for mean) | $SS_{obs} - SS_{mean} = \sum_{l=1}^{g}\sum_{j=1}^{n_l}(X_{lj} - \bar{X})^2$ | $n_+ - 1$ |

The test statistic for $H_o: \mu_1 = \cdots = \mu_g$ (or $H_o: \tau_1 = \cdots = \tau_g = 0$) and its sampling distribution are
$$F = \frac{SS_{tr}/(g-1)}{SS_{res}/(n_+ - g)} \sim F_{(g-1),(n_+ - g)}$$
Reject $H_o$ for large values of $F$; equivalently, for
large values of $SS_{tr}/SS_{res}$,
large values of $1 + SS_{tr}/SS_{res}$,
small values of $(1 + SS_{tr}/SS_{res})^{-1} = SS_{res}/(SS_{tr} + SS_{res})$.
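The decomposition and the F ratio in the summary table can be checked numerically. The following is a minimal sketch in plain Python; the data values are hypothetical toy numbers chosen for easy arithmetic and do not come from the lecture.

```python
# Hypothetical toy data (not from the lecture): g = 3 groups, 3 cases each.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]

g = len(groups)
n_plus = sum(len(grp) for grp in groups)          # total sample size n_+
grand_mean = sum(sum(grp) for grp in groups) / n_plus
group_means = [sum(grp) / len(grp) for grp in groups]

# The four sums of squares from the slides.
ss_obs = sum(x * x for grp in groups for x in grp)
ss_mean = n_plus * grand_mean ** 2
ss_tr = sum(len(grp) * (m - grand_mean) ** 2
            for grp, m in zip(groups, group_means))
ss_res = sum((x - m) ** 2
             for grp, m in zip(groups, group_means) for x in grp)

# F = [SS_tr / (g - 1)] / [SS_res / (n_+ - g)]
f_stat = (ss_tr / (g - 1)) / (ss_res / (n_plus - g))
print(ss_obs, ss_mean + ss_tr + ss_res, f_stat)
```

The first two printed values should agree, which is the orthogonal decomposition $SS_{obs} = SS_{mean} + SS_{tr} + SS_{res}$.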

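One-Way ANOVA as a GLM
$$\begin{pmatrix} X_{11} \\ \vdots \\ X_{1n_1} \\ X_{21} \\ \vdots \\ X_{2n_2} \\ \vdots \\ X_{g1} \\ \vdots \\ X_{gn_g} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & -1 & -1 & \cdots & -1 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & -1 & -1 & \cdots & -1 \end{pmatrix} \begin{pmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \vdots \\ \tau_{g-1} \end{pmatrix} + \begin{pmatrix} \epsilon_{11} \\ \vdots \\ \epsilon_{1n_1} \\ \epsilon_{21} \\ \vdots \\ \epsilon_{2n_2} \\ \vdots \\ \epsilon_{g1} \\ \vdots \\ \epsilon_{gn_g} \end{pmatrix}$$
$$\underbrace{\mathbf{X}_{n_+ \times 1}}_{\text{Dependent}} = \underbrace{\mathbf{A}_{n_+ \times g}}_{\text{Design Matrix}} \underbrace{\boldsymbol{\beta}_{g \times 1}}_{\text{Parameters}} + \underbrace{\boldsymbol{\epsilon}_{n_+ \times 1}}_{\text{Residuals}}$$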

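Least Squares Estimates of GLM
How we get parameter estimates depends on how the design matrix is set up, and there are multiple ways of setting it up. We'll use the rank $g$ matrix $\mathbf{A}$ on the previous slide.
$$\hat{\boldsymbol{\beta}} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{X}$$
$$\hat{\mathbf{X}} = \mathbf{A}\hat{\boldsymbol{\beta}} = \{\bar{x}_l\}_{n_+ \times 1}$$
$$\hat{\boldsymbol{\epsilon}} = \mathbf{X} - \hat{\mathbf{X}} = \mathbf{X} - \mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{X} = (\mathbf{I} - \mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}')\mathbf{X}$$
Our hypothesis test of equal population means,
$$H_o: \mu_1 = \mu_2 = \cdots = \mu_g \iff \tau_1 = \tau_2 = \cdots = \tau_g = 0,$$
can be expressed as $H_o: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$, where $\mathbf{C}$ is a contrast matrix.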

Testing Using C
For example,
$$\mathbf{C}_{(g-1) \times g} = \begin{pmatrix} 0 & 1 & -1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & -1 & \cdots & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & -1 \end{pmatrix}$$
So
$$H_o: \mathbf{C}\boldsymbol{\beta} = \begin{pmatrix} \tau_1 - \tau_2 \\ \tau_2 - \tau_3 \\ \vdots \\ \tau_{g-2} - \tau_{g-1} \end{pmatrix} = \mathbf{0}$$
Our $F$ test (given before) tests $H_o: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$.
From the GLM framework:
You can introduce continuous (numerical) variables.
ANOVA and multiple regression are essentially the same.
We can generalize the GLM to the multivariate GLM.
SAS PROC GLM will make more sense.

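One-Way MANOVA
The MANOVA model for comparing $g$ population mean vectors parallels univariate ANOVA:
$$\underbrace{\mathbf{X}_{lj}}_{\substack{\text{observation vector} \\ (p \times 1,\ \text{random})}} = \underbrace{\boldsymbol{\mu}}_{\substack{\text{overall mean vector} \\ (p \times 1,\ \text{fixed})}} + \underbrace{\boldsymbol{\tau}_l}_{\substack{l\text{th treatment effect vector} \\ (p \times 1,\ \text{fixed})}} + \underbrace{\boldsymbol{\epsilon}_{lj}}_{\substack{\text{residual for } l\text{th group, } j\text{th case} \\ (p \times 1,\ \text{random})}}$$
where $\boldsymbol{\epsilon}_{lj} \sim N_p(\mathbf{0}, \boldsymbol{\Sigma})$ and all are independent, for $j = 1, \ldots, n_l$ cases per group and $l = 1, \ldots, g$ groups.
For identification, $\sum_{l=1}^{g} n_l \boldsymbol{\tau}_l = \mathbf{0}$.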

Observation Vectors
Each component of $\mathbf{X}_{lj}$ satisfies the 1-way ANOVA model, but now the model includes covariances among the components. The covariances are assumed to be equal across populations.
A vector of observations can be decomposed as
$$\underbrace{\mathbf{x}_{lj}}_{\text{observation}} = \underbrace{\bar{\mathbf{x}}}_{\text{overall sample mean}} + \underbrace{(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})}_{\text{estimated treatment effect}} + \underbrace{(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)}_{\text{residual}} = \hat{\boldsymbol{\mu}} + \hat{\boldsymbol{\tau}}_l + \hat{\boldsymbol{\epsilon}}_{lj}$$
We also have a decomposition of sums of squares and cross-products, or SSCP for short.

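Sums-of-Squares and Cross-Products (SSCP)
First we'll find the total corrected squares and cross-products:
$$(\mathbf{x}_{lj} - \bar{\mathbf{x}})(\mathbf{x}_{lj} - \bar{\mathbf{x}})' = [(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l) + (\bar{\mathbf{x}}_l - \bar{\mathbf{x}})][(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l) + (\bar{\mathbf{x}}_l - \bar{\mathbf{x}})]'$$
$$= \underbrace{(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)' + (\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})'}_{\text{squares \& cross-products}} + \underbrace{(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' + (\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)'}_{\text{cross-products}}$$
Next, sum all of this over cases and groups. Since summation distributes over the terms, we'll do this in pieces, looking at just the cross-product terms first.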

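Sum of Cross-Products
$$\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' = \Big[\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)\Big](\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' = \Big[\sum_{j=1}^{n_l} \mathbf{x}_{lj} - n_l\bar{\mathbf{x}}_l\Big](\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' = n_l\underbrace{(\bar{\mathbf{x}}_l - \bar{\mathbf{x}}_l)}_{\mathbf{0}}(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' = \mathbf{0}$$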

Sum of Squares
Now summing the rest over $j$ and $l$, we get
$$\sum_{l=1}^{g}\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}})(\mathbf{x}_{lj} - \bar{\mathbf{x}})' = \sum_{l=1}^{g} n_l(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' + \sum_{l=1}^{g}\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)'$$
Total (corrected) SSCP = Treatment + Residual
= Between Groups + Within Groups
= Hypothesis + Error
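The SSCP decomposition can be verified numerically in the same way as the univariate one. This is a small sketch in plain Python, assuming hypothetical toy data with $g = 2$ groups and $p = 2$ variables; the helper names (`outer`, `mat_add`, and so on) are illustrative and not from the lecture.

```python
def outer(u, v):
    """Outer product u v' of two vectors, as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def mean_vec(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

# Hypothetical toy data: g = 2 groups, p = 2 variables, unequal n_l.
groups = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]],
    [[6.0, 1.0], [8.0, 3.0]],
]
p = 2
all_rows = [x for grp in groups for x in grp]
xbar = mean_vec(all_rows)                     # overall mean vector

def zeros():
    return [[0.0] * p for _ in range(p)]

T, B, W = zeros(), zeros(), zeros()
for x in all_rows:                            # total (corrected) SSCP
    d = diff(x, xbar)
    T = mat_add(T, outer(d, d))
for grp in groups:
    xbar_l = mean_vec(grp)
    tau_hat = diff(xbar_l, xbar)              # estimated treatment effect
    B = mat_add(B, scale(len(grp), outer(tau_hat, tau_hat)))
    for x in grp:                             # within-group residuals
        r = diff(x, xbar_l)
        W = mat_add(W, outer(r, r))

# Largest absolute entry of T - (B + W); should be zero up to rounding.
max_gap = max(abs(T[i][j] - (B[i][j] + W[i][j]))
              for i in range(p) for j in range(p))
print(max_gap)
```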

A Closer Look at Within Groups SSCP
$$\mathbf{W} = \mathbf{E} = \sum_{l=1}^{g}\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)'$$
$$= \sum_{j=1}^{n_1} (\mathbf{x}_{1j} - \bar{\mathbf{x}}_1)(\mathbf{x}_{1j} - \bar{\mathbf{x}}_1)' + \sum_{j=1}^{n_2} (\mathbf{x}_{2j} - \bar{\mathbf{x}}_2)(\mathbf{x}_{2j} - \bar{\mathbf{x}}_2)' + \cdots + \sum_{j=1}^{n_g} (\mathbf{x}_{gj} - \bar{\mathbf{x}}_g)(\mathbf{x}_{gj} - \bar{\mathbf{x}}_g)'$$
$$= \mathbf{W}_1 + \mathbf{W}_2 + \cdots + \mathbf{W}_g = (n_1 - 1)\mathbf{S}_1 + (n_2 - 1)\mathbf{S}_2 + \cdots + (n_g - 1)\mathbf{S}_g$$
$\mathbf{S}_l$ is the sample covariance matrix for the $l$th group (treatment, condition, etc.).
$\mathbf{W}$ ($\mathbf{E}$) is proportional to a pooled estimate of the common $\boldsymbol{\Sigma}$.

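Between Groups SSCP & Test Statistic
With respect to the between groups SSCP,
$$\mathbf{B} = \mathbf{H} = \sum_{l=1}^{g} n_l(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' = \sum_{l=1}^{g} n_l\hat{\boldsymbol{\tau}}_l\hat{\boldsymbol{\tau}}_l'$$
If $H_o: \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2 = \cdots = \boldsymbol{\tau}_g = \mathbf{0}$ is true, then $\mathbf{B}$ (or $\mathbf{H}$) should be close to $\mathbf{0}$.
To test $H_o$, we consider the ratio of generalized variances (determinants of the SSCP matrices),
$$\Lambda = \frac{|\mathbf{W}|}{|\mathbf{W} + \mathbf{B}|} = \frac{|\mathbf{W}|}{|\mathbf{T}|}$$
where $\mathbf{T} = \mathbf{W} + \mathbf{B}$ (i.e., the total corrected SSCP).
$\Lambda$ is known as Wilks' Lambda. It is equivalent to the likelihood ratio statistic.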

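Hypothesis Testing with Λ
$\Lambda$ is a ratio of generalized sampling variances:
$$\Lambda = \frac{|\mathbf{W}|}{|\mathbf{T}|} = \frac{\prod_{i=1}^{p} \lambda_i}{\prod_{i=1}^{p} \lambda_i^*}$$
where the $\lambda_i$ are the eigenvalues of $\mathbf{W}$ and the $\lambda_i^*$ are the eigenvalues of $\mathbf{T}$.
If $H_o: \boldsymbol{\tau}_1 = \cdots = \boldsymbol{\tau}_g = \mathbf{0}$ is true, then $\mathbf{B}$ is close to $\mathbf{0}$ $\Rightarrow$ $\mathbf{T} \approx \mathbf{W}$ $\Rightarrow$ $\prod \lambda_i \approx \prod \lambda_i^*$ $\Rightarrow$ $\Lambda$ is close to 1.
If $H_o$ is false, then $\mathbf{B}$ is not close to $\mathbf{0}$ $\Rightarrow$ the values on the diagonal of $\mathbf{T}$, which are positive, will be large $\Rightarrow$ $\prod \lambda_i < \prod \lambda_i^*$ $\Rightarrow$ $\Lambda$ is small.
The exact distribution of $\Lambda$ can be derived for special cases of $p$ and $g$.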

Distribution of Wilks' Lambda
$$\text{Wilks' } \Lambda = \frac{|\mathbf{SSCP}_e|}{|\mathbf{SSCP}_e + \mathbf{SSCP}_h|}$$

| Number of variables | df for hypothesis | Sampling distribution for multivariate data |
|---|---|---|
| $p = 1$ | $\nu_h \geq 1$ | $\left(\frac{\nu_e}{\nu_h}\right)\left(\frac{1 - \Lambda}{\Lambda}\right) \sim F_{\nu_h,\,\nu_e}$ |
| $p = 2$ | $\nu_h \geq 1$ | $\left(\frac{\nu_e - 1}{\nu_h}\right)\left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right) \sim F_{2\nu_h,\,2(\nu_e - 1)}$ |
| $p \geq 1$ | $\nu_h = 1$ | $\left(\frac{\nu_e + \nu_h - p}{p}\right)\left(\frac{1 - \Lambda}{\Lambda}\right) \sim F_{p,\,(\nu_e + \nu_h - p)}$ |
| $p \geq 1$ | $\nu_h = 2$ | $\left(\frac{\nu_e + \nu_h - p - 1}{p}\right)\left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right) \sim F_{2p,\,2(\nu_e + \nu_h - p - 1)}$ |

where $\nu_h$ = degrees of freedom for hypothesis, and $\nu_e$ = degrees of freedom for error (residual).
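The four exact cases in the table can be collected into one helper. The sketch below is an illustration only: the function name `wilks_to_f` and its return convention are assumptions, and it covers just the cases listed in the table, returning `None` otherwise.

```python
import math

def wilks_to_f(lam, p, nu_h, nu_e):
    """Transform Wilks' Lambda to an F statistic for the exact cases above.

    Returns (F, df1, df2), or None when (p, nu_h) is not one of the
    four tabled cases. Hypothetical helper, not part of any library.
    """
    if p == 1:
        return (nu_e / nu_h) * (1 - lam) / lam, nu_h, nu_e
    if p == 2:
        root = math.sqrt(lam)
        return ((nu_e - 1) / nu_h) * (1 - root) / root, 2 * nu_h, 2 * (nu_e - 1)
    if nu_h == 1:
        df2 = nu_e + nu_h - p
        return (df2 / p) * (1 - lam) / lam, p, df2
    if nu_h == 2:
        root = math.sqrt(lam)
        m = nu_e + nu_h - p - 1
        return (m / p) * (1 - root) / root, 2 * p, 2 * m
    return None

# Usage with made-up inputs: p = 2 variables, nu_h = 3, nu_e = 21.
print(wilks_to_f(0.25, p=2, nu_h=3, nu_e=21))
```

The resulting F value would then be compared with an F table (or a statistics package) at the returned degrees of freedom.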

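Other Test Statistics
There is more than one way to combine the information in $\mathbf{B}$ and $\mathbf{W}$ (or $\mathbf{H}$ and $\mathbf{E}$).
Wilks' Λ:
$$\Lambda = \frac{|\mathbf{W}|}{|\mathbf{W} + \mathbf{B}|} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \frac{\prod_{i=1}^{p} \lambda_i}{\prod_{i=1}^{p} \lambda_i^*}$$
where the $\lambda_i$ are eigenvalues of $\mathbf{E}$ and the $\lambda_i^*$ are eigenvalues of $(\mathbf{E} + \mathbf{H})$.
Hotelling-Lawley Trace Criterion:
$$\mathrm{trace}(\mathbf{E}^{-1}\mathbf{H}) = \mathrm{tr}(\mathbf{H}\mathbf{E}^{-1}) = \sum_{i} \tilde{\lambda}_i$$
where $\tilde{\lambda}_i$ is an eigenvalue of $\mathbf{H}\mathbf{E}^{-1}$.
Reject $H_o$ when $\mathrm{tr}(\mathbf{H}\mathbf{E}^{-1})$ is large. When $H_o$ is true, $\mathrm{tr}(\mathbf{H}\mathbf{E}^{-1})$, scaled by the error degrees of freedom, is approximately $\chi^2_{p(g-1)}$ in large samples.
Note: df = rank of design matrix (GLM approach).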

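Pillai's Trace and Roy's Largest Root
Pillai's Trace Criterion:
$$\mathrm{trace}(\mathbf{B}(\mathbf{B} + \mathbf{W})^{-1}) = \mathrm{trace}(\mathbf{H}(\mathbf{H} + \mathbf{E})^{-1}) = \sum_{i=1}^{p} \frac{\tilde{\lambda}_i}{1 + \tilde{\lambda}_i}$$
where $\tilde{\lambda}_i$ is an eigenvalue (root) of $\mathbf{H}\mathbf{E}^{-1}$.
Roy's Largest Root Criterion:
$$\theta = \text{largest root of } (\mathbf{E} + \mathbf{H})^{-1}\mathbf{H} = \text{largest root of } \mathbf{H}(\mathbf{E} + \mathbf{H})^{-1} = \frac{\tilde{\lambda}_1}{1 + \tilde{\lambda}_1}$$
where $\tilde{\lambda}_1$ is the largest root of $\mathbf{E}^{-1}\mathbf{H}$ (equivalently, of $\mathbf{H}\mathbf{E}^{-1}$).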

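How They are All Related to Wilks' Λ
Let $\tilde{\lambda}_i$ be a root of $\mathbf{H}\mathbf{E}^{-1}$ (an "eigenvalue of $\mathbf{H}$ relative to $\mathbf{E}$"), with all $\tilde{\lambda}_i \geq 0$ (i.e., $\tilde{\lambda}_1 \geq \tilde{\lambda}_2 \geq \cdots \geq \tilde{\lambda}_p \geq 0$). Then we can write
$$\Lambda = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \frac{|\mathbf{E}|}{|\mathbf{E}|\,|\mathbf{I}_p + \mathbf{E}^{-1}\mathbf{H}|} = \frac{1}{|\mathbf{I} + \mathbf{H}\mathbf{E}^{-1}|} = \prod_{i=1}^{p} \frac{1}{1 + \tilde{\lambda}_i}$$
So $\Lambda$ is a decreasing function of each $\tilde{\lambda}_i$.
The last step holds because a determinant equals the product of the eigenvalues, and the eigenvalues of $\mathbf{I} + \mathbf{H}\mathbf{E}^{-1}$ are $1 + \tilde{\lambda}_i$. The roots $\theta_i$ of $\mathbf{H}(\mathbf{E} + \mathbf{H})^{-1}$ are related to these by $\theta_i = \tilde{\lambda}_i/(1 + \tilde{\lambda}_i)$.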
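To make the relationship concrete, the sketch below computes the roots of $\mathbf{H}\mathbf{E}^{-1}$ for a $2 \times 2$ case and forms all four statistics from them. It assumes the $\mathbf{W}$ and $\mathbf{B}$ matrices reported for Example 1 in these notes (decimal places as reconstructed there); the quadratic-formula shortcut only works for $p = 2$.

```python
import math

# E = W and H = B as reported for Example 1 in these notes.
E = [[1773.15, 414.20], [414.20, 289.95]]
H = [[1055.033, 1111.55], [1111.55, 1177.30]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# For the 2x2 matrix M = H E^{-1}, the roots solve r^2 - tr(M) r + det(M) = 0.
det_e = det2(E)
trace_m = (H[0][0] * E[1][1] - H[0][1] * E[1][0]
           - H[1][0] * E[0][1] + H[1][1] * E[0][0]) / det_e
det_m = det2(H) / det_e
disc = math.sqrt(trace_m ** 2 - 4.0 * det_m)
roots = [(trace_m + disc) / 2.0, (trace_m - disc) / 2.0]  # largest first

# The four statistics written in terms of the roots.
wilks_from_roots = 1.0 / ((1.0 + roots[0]) * (1.0 + roots[1]))
hotelling_lawley = sum(roots)
pillai = sum(r / (1.0 + r) for r in roots)
roy_theta = roots[0] / (1.0 + roots[0])

# Wilks' Lambda computed directly from determinants, for comparison.
E_plus_H = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]
wilks_from_dets = det_e / det2(E_plus_H)

print(wilks_from_roots, wilks_from_dets)
print(hotelling_lawley, pillai, roy_theta)
```

The two Wilks' values on the first printed line should agree up to rounding, which is the identity $\Lambda = \prod_i 1/(1 + \tilde{\lambda}_i)$.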

Which Test Statistic to Use
Wilks' Λ = likelihood ratio statistic.
If all statistics lead to the same conclusion, use Λ.
If the statistics lead to different conclusions, you need to figure out why.
From simulation studies (power & robustness):
Roy's largest root was found to be the least useful, except when the population structure is such that groups differ in one dimension and one group is much more different from the rest.
The others all do well with respect to power (they use more of the information in $\mathbf{E}$ and $\mathbf{H}$ than Roy's does).
Pillai's trace criterion is least affected by departures from the usual population model (i.e., more robust against departures from normality); it is better for diffuse alternative hypotheses than for sharper ones; and when the roots are approximately equal, it has the best power.
Wilks' and Hotelling-Lawley have about the same power over a wider range (spectrum) of alternative hypotheses.

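Other Cases and Summary MANOVA
For cases not covered: if $H_o$ is true and $\sum_{l=1}^{g} n_l = n$ is large, then
$$-\left(n - 1 - \tfrac{1}{2}(p + g)\right)\ln\Lambda = -\left(n - 1 - \tfrac{1}{2}(p + g)\right)\ln\left(\frac{|\mathbf{W}|}{|\mathbf{B} + \mathbf{W}|}\right) \sim \chi^2_{p(g-1)}$$
You should examine the residual vectors (i.e., the $\hat{\boldsymbol{\epsilon}}_{lj}$) for normality and outliers; maybe use PCA or methods mentioned in the text.

| Source of variation | SSCP | df | Wilks' Λ |
|---|---|---|---|
| Treatment (Between) | $\mathbf{B} = \sum_{l=1}^{g} n_l(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})'$ | $g - 1$ | $\det(\mathbf{W})/\det(\mathbf{T})$ |
| Residual (Within) | $\mathbf{W} = \sum_{l=1}^{g}\sum_{j=1}^{n_l}(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)'$ | $n - g$ | |
| Total (corrected for mean) | $\mathbf{T} = \mathbf{W} + \mathbf{B} = \sum_{l=1}^{g}\sum_{j=1}^{n_l}(\mathbf{x}_{lj} - \bar{\mathbf{x}})(\mathbf{x}_{lj} - \bar{\mathbf{x}})'$ | $n - 1$ | |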
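As a purely numerical illustration of the large-sample statistic, the sketch below plugs in $n = 60$, $p = 2$, $g = 3$ and $\Lambda = 0.188$, the values from Example 1 in these notes. That example is actually covered by an exact F transformation, so this only shows the arithmetic, not the recommended test for that case.

```python
import math

# Values taken from Example 1 in these notes; used here only to show the arithmetic.
n, p, g = 60, 2, 3
wilks_lambda = 0.188

# -(n - 1 - (p + g)/2) * ln(Lambda), referred to chi-square with p(g - 1) df.
multiplier = n - 1 - 0.5 * (p + g)
chi_sq_stat = -multiplier * math.log(wilks_lambda)
chi_sq_df = p * (g - 1)

print(chi_sq_stat, chi_sq_df)
```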

Example 1: Distributed vs Massed Practice
1-way MANOVA: Data from Tatsuoka (1988), Multivariate Analysis: Techniques for Educational and Psychological Research, pp. 273-279 (updated story).
An experiment was conducted to compare 2 methods (A & B) of teaching computer programming to 60 female seniors in a technical training high school program. Also of interest were the effects of distributed versus massed practice:
$C_1$: 2 hours of instruction/day for 6 weeks
$C_2$: 3 hours of instruction/day for 4 weeks
$C_3$: 4 hours of instruction/day for 3 weeks
Each subject received a total of 12 hours of instruction.
For now, we'll just look at the effect of distributed versus massed practice.
Note: $n_l = 20$ for $l = 1, 2, 3$.
Two variables (dependent measures): $X_1$ = speed and $X_2$ = accuracy.

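Descriptive Statistics
The overall mean vector and the mean vectors for each condition:
$$\bar{\mathbf{x}} = \begin{pmatrix} 33.62 \\ 18.25 \end{pmatrix} \quad \bar{\mathbf{x}}_1 = \begin{pmatrix} 38.55 \\ 23.70 \end{pmatrix} \quad \bar{\mathbf{x}}_2 = \begin{pmatrix} 34.00 \\ 18.20 \end{pmatrix} \quad \bar{\mathbf{x}}_3 = \begin{pmatrix} 28.30 \\ 12.85 \end{pmatrix}$$
The treatment effect vectors (i.e., $\hat{\boldsymbol{\tau}}_l = \bar{\mathbf{x}}_l - \bar{\mathbf{x}}$):
$$\hat{\boldsymbol{\tau}}_1 = \begin{pmatrix} 4.93 \\ 5.45 \end{pmatrix} \quad \hat{\boldsymbol{\tau}}_2 = \begin{pmatrix} 0.38 \\ -0.05 \end{pmatrix} \quad \hat{\boldsymbol{\tau}}_3 = \begin{pmatrix} -5.32 \\ -5.40 \end{pmatrix}$$
Sample covariance matrices:
$$\mathbf{S}_1 = \begin{pmatrix} 49.52 & 13.17 \\ 13.17 & 7.59 \end{pmatrix} \quad \mathbf{S}_2 = \begin{pmatrix} 27.47 & 4.21 \\ 4.21 & 4.48 \end{pmatrix} \quad \mathbf{S}_3 = \begin{pmatrix} 16.33 & 4.42 \\ 4.42 & 3.19 \end{pmatrix}$$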

Means and Confidence Regions
[Figure: 95% confidence regions for $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$ and $\boldsymbol{\mu}_3$, where $n_l = 20$, plotted as Accuracy versus Speed. Points shown: $\bar{\mathbf{x}}$, $\bar{\mathbf{x}}_1$ (2 hours/day for 6 weeks), $\bar{\mathbf{x}}_2$ (3 hours/day for 4 weeks), $\bar{\mathbf{x}}_3$ (4 hours/day for 3 weeks).]

Hypothesis Test
No difference between massed versus distributed practice on either speed or accuracy:
$$H_o: \boldsymbol{\tau}_1 = \boldsymbol{\tau}_2 = \boldsymbol{\tau}_3 = \mathbf{0} \quad \text{versus} \quad H_a: \boldsymbol{\tau}_l \neq \mathbf{0} \text{ for at least one } l = 1, 2, 3$$
The within groups (residual) sums of squares and cross-products matrix:
$$\mathbf{W} = (n_1 - 1)\mathbf{S}_1 + (n_2 - 1)\mathbf{S}_2 + (n_3 - 1)\mathbf{S}_3 = 19\begin{pmatrix} 49.52 & 13.17 \\ 13.17 & 7.59 \end{pmatrix} + 19\begin{pmatrix} 27.47 & 4.21 \\ 4.21 & 4.48 \end{pmatrix} + 19\begin{pmatrix} 16.33 & 4.42 \\ 4.42 & 3.19 \end{pmatrix} = \begin{pmatrix} 1773.15 & 414.20 \\ 414.20 & 289.95 \end{pmatrix}$$

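Hypothesis Test continued
The between groups SSCP matrix:
$$\mathbf{B} = \sum_{l=1}^{3} n_l(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' = 20\begin{pmatrix} 4.93 \\ 5.45 \end{pmatrix}(4.93,\ 5.45) + 20\begin{pmatrix} 0.38 \\ -0.05 \end{pmatrix}(0.38,\ -0.05) + 20\begin{pmatrix} -5.32 \\ -5.40 \end{pmatrix}(-5.32,\ -5.40) = \begin{pmatrix} 1055.033 & 1111.55 \\ 1111.55 & 1177.30 \end{pmatrix}$$
$$\mathbf{T} = \mathbf{W} + \mathbf{B} = \begin{pmatrix} 2828.18 & 1525.75 \\ 1525.75 & 1467.25 \end{pmatrix}$$
Or $\mathbf{T} = (n - 1)\mathbf{S}$, where $\mathbf{S}$ is the covariance matrix computed over all groups and $n$ is the total sample size. Then $\mathbf{B} = \mathbf{T} - \mathbf{W}$.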

Test Statistic & Distribution
$$|\mathbf{W}| = (1773.15)(289.95) - (414.20)^2 = 342563.2$$
$$|\mathbf{T}| = (2828.18)(1467.25) - (1525.75)^2 = 1821738.9$$
$$\Lambda = \frac{|\mathbf{W}|}{|\mathbf{T}|} = \frac{342563.2}{1821738.9} = 0.188$$
For $p = 2$ and $g = 3$, we can use the exact sampling distribution:
$$\left(\frac{n - g - 1}{g - 1}\right)\left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right) = \left(\frac{n - p - 2}{p}\right)\left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right) \sim F_{2(g-1),\,2(n-g-1)}$$

Test Statistic & Distribution
For this example,
$$\frac{(60 - 3 - 1)}{(3 - 1)}\left(\frac{1 - \sqrt{.188}}{\sqrt{.188}}\right) = \frac{56}{2}\left(\frac{.566}{.434}\right) = 36.568$$
(the final value is computed from unrounded intermediate quantities).
Since $F_{4,112}(\alpha = .05) \approx F_{4,120}(\alpha = .05) = 2.45$, reject $H_o$ that the treatment vectors are all equal to $\mathbf{0}$.
The data support the conclusion that there is an effect of massed versus distributed practice.
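The whole Example 1 calculation can be reproduced from the summary statistics. The sketch below assumes the group means and covariance matrices as given on the Descriptive Statistics slide (decimal places as reconstructed in these notes); because those inputs are rounded, the results should match the slide values only to about three significant figures.

```python
import math

n_l, g, p = 20, 3, 2                      # balanced design: 20 cases per condition
means = [(38.55, 23.70), (34.00, 18.20), (28.30, 12.85)]
covs = [
    [[49.52, 13.17], [13.17, 7.59]],
    [[27.47, 4.21], [4.21, 4.48]],
    [[16.33, 4.42], [4.42, 3.19]],
]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Within-groups SSCP: W = sum over groups of (n_l - 1) S_l.
W = [[(n_l - 1) * sum(S[i][j] for S in covs) for j in range(p)]
     for i in range(p)]

# Overall mean; with equal n_l it is the plain average of the group means.
xbar = [sum(m[k] for m in means) / g for k in range(p)]

# Between-groups SSCP: B = sum over groups of n_l * tau_hat tau_hat'.
B = [[n_l * sum((m[i] - xbar[i]) * (m[j] - xbar[j]) for m in means)
      for j in range(p)] for i in range(p)]

T = [[W[i][j] + B[i][j] for j in range(p)] for i in range(p)]

wilks_lambda = det2(W) / det2(T)

# Exact F transformation for p = 2.
n = n_l * g
root = math.sqrt(wilks_lambda)
f_stat = ((n - g - 1) / (g - 1)) * (1 - root) / root
df = (2 * (g - 1), 2 * (n - g - 1))

print(round(wilks_lambda, 3), round(f_stat, 2), df)
```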

Following up a Significant Result
Multivariate contrasts & confidence regions
Tests on individual variables (simultaneous confidence intervals for group/treatment differences)
Discriminant analysis

Multivariate Contrasts
We need the multivariate generalization of the general linear model:
$$\mathbf{X}_{gn \times p} = \mathbf{A}_{gn \times (g+1)}\mathbf{B}_{(g+1) \times p} + \mathbf{E}_{gn \times p}$$
where $\mathbf{A}$ is the design matrix (it could have $g$ or $g + 1$ columns depending on the parameterization), and $\mathbf{B}$ is a matrix of coefficients (model parameters). Some examples follow.

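A is $n_+ \times g$ with dummy codes
$$\mathbf{X}_{n_+ \times p} = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & 0 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} \beta_{o1} & \beta_{o2} & \cdots & \beta_{op} \\ \beta_{11} & \beta_{12} & \cdots & \beta_{1p} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2p} \\ \vdots & & & \vdots \\ \beta_{g-1,1} & \beta_{g-1,2} & \cdots & \beta_{g-1,p} \end{pmatrix} + \mathbf{E}_{n_+ \times p}$$
Given the design matrix above, $\beta_{ok} = \mu_{gk}$ and $\beta_{lk} = \mu_{lk} - \mu_{gk}$.
If $p = 1$, we would have 1-way ANOVA.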

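SAS PROC GLM design: A is $n_+ \times (g + 1)$
An alternative design matrix and parameter matrix:
$$\mathbf{X}_{n_+ \times p} = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \\ \vdots & & & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{pmatrix}_{n_+ \times (g+1)} \begin{pmatrix} \beta_{o1} & \beta_{o2} & \cdots & \beta_{op} \\ \beta_{11} & \beta_{12} & \cdots & \beta_{1p} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2p} \\ \vdots & & & \vdots \\ \beta_{g-1,1} & \beta_{g-1,2} & \cdots & \beta_{g-1,p} \\ \beta_{g,1} & \beta_{g,2} & \cdots & \beta_{g,p} \end{pmatrix} + \mathbf{E}_{n_+ \times p}$$
Normally, $\hat{\mathbf{B}} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{X}$; however, the rank of $\mathbf{A}$ defined above (and hence of $\mathbf{A}'\mathbf{A}$) is only $g$, so there is no unique solution to $\mathbf{A}'\mathbf{X} = \mathbf{A}'\mathbf{A}\mathbf{B}$.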

What's Interesting
We're interested in differences between group means; that is,
$$\boldsymbol{\mu}_i - \boldsymbol{\mu}_k = (\boldsymbol{\mu} + \boldsymbol{\tau}_i)_{p \times 1} - (\boldsymbol{\mu} + \boldsymbol{\tau}_k)_{p \times 1} = \boldsymbol{\tau}_i - \boldsymbol{\tau}_k$$
Even if we can't get unique estimates of the elements of $\mathbf{B}$, we can get unique estimates of differences between parameter estimates, which correspond to differences between group means, regardless of which generalized inverse of $(\mathbf{A}'\mathbf{A})$ is used.
The Moore-Penrose inverse of the non-full-rank square matrix $(\mathbf{A}'\mathbf{A})$ is denoted by $(\mathbf{A}'\mathbf{A})^{-}$.
SAS PROC GLM uses the Moore-Penrose inverse of $(\mathbf{A}'\mathbf{A})$.
In SAS/PROC IML, the Moore-Penrose inverse is obtained by the command ginv( ); for example, giaa = ginv(A`*A);

Estimable and Testable
What we can do is test linear combinations of elements of $\mathbf{B}$, if the linear combination is a contrast.
Estimable: A linear function $\mathbf{c}'\mathbf{B}$ is estimable if $(\mathbf{A}'\mathbf{A})(\mathbf{A}'\mathbf{A})^{-}\mathbf{c} = \mathbf{c}$.
Testable: A linear function is testable if it only involves the estimable functions of $\mathbf{B}$.
Contrasts of elements of $\mathbf{B}$ are estimable and therefore testable. These correspond to differences between means.
We'll demonstrate the multivariate general linear model by example.

Example: Cameron & Pauling Data
Increase in survival of cancer patients given supplemental treatment with vitamin C.
Increase in survival = the number of days a patient survives minus the number of days a matched control survives.
$x_1 = d_1$ = increase in survival measured as days from first hospitalization
$x_2 = d_2$ = increase in survival measured as days from un-treatability
type = type of cancer (1 = stomach, 2 = bronchus, 3 = colon, 4 = rectum, 5 = bladder, 6 = kidney)

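Example: Descriptive statistics
Group means $\bar{\mathbf{x}}_l = (\bar{d}_1, \bar{d}_2)'$ and sample covariance matrices $\mathbf{S}_l$, with entries $s_{11}$, $s_{12}$, $s_{22}$:

| $l$ | Type | $n_l$ | $\bar{d}_1$ | $\bar{d}_2$ | $s_{11}$ | $s_{12}$ | $s_{22}$ |
|---|---|---|---|---|---|---|---|
| 1 | Stomach | 12 | 70.67 | 94.92 | 125869.88 | 15206.70 | 12235.36 |
| 2 | Bronchus | 16 | 49.50 | 106.88 | 23915.07 | 17461.47 | 16619.45 |
| 3 | Colon | 16 | 117.19 | 293.19 | 252027.76 | 147884.76 | 118274.56 |
| 4 | Rectum | 7 | 297.43 | 226.57 | 508715.62 | 155250.05 | 56340.62 |
| 5 | Bladder | 5 | 1304.20 | 129.80 | 3747663.20 | 214071.05 | 14697.70 |
| 6 | Kidney | 7 | 118.86 | 101.71 | 129018.48 | 3344.62 | 17392.91 |

Pooled covariance matrix, formed from the $(n_l - 1)\mathbf{S}_l$ summed over groups:
$$\mathbf{S}_{pool} = \begin{pmatrix} 419668.42 & 76815.85 \\ 76815.85 & 45848.122 \end{pmatrix}$$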

Plot of Means and 95% Confidence Regions
[Figure: group means $\bar{\mathbf{x}}_1$ (stomach), $\bar{\mathbf{x}}_2$ (bronchus), $\bar{\mathbf{x}}_3$ (colon), $\bar{\mathbf{x}}_4$ (rectum), $\bar{\mathbf{x}}_5$ (bladder), $\bar{\mathbf{x}}_6$ (kidney) with 95% confidence regions; horizontal axis d1 (1st hospitalization), vertical axis d2 (untreatable).]
Using $\mathbf{S}_l$ to compute the regions.
$n_1 = 12$, $n_2 = 16$, $n_3 = 16$, $n_4 = 7$, $n_5 = 5$, $n_6 = 7$

Plot of Means and 95% Confidence Regions
[Figure: the same six group means with 95% confidence regions; horizontal axis d1, vertical axis d2.]
Using $\mathbf{S}_{pool}$ to compute the regions.

Results of MANOVA Hypothesis Test
$$H_o: \boldsymbol{\mu}_{stomach} = \boldsymbol{\mu}_{bronchus} = \boldsymbol{\mu}_{colon} = \boldsymbol{\mu}_{rectum} = \boldsymbol{\mu}_{bladder} = \boldsymbol{\mu}_{kidney}$$
or equivalently
$$H_o: \boldsymbol{\tau}_{stomach} = \boldsymbol{\tau}_{bronchus} = \boldsymbol{\tau}_{colon} = \boldsymbol{\tau}_{rectum} = \boldsymbol{\tau}_{bladder} = \boldsymbol{\tau}_{kidney}$$
df for type of cancer (hypothesis): $\nu_h = g - 1 = 6 - 1 = 5$
df within (error): $\nu_e = \sum_l n_l - g = 63 - 6 = 57$

Results of MANOVA Hypothesis Test continued
Wilks' $\Lambda = \det(\mathbf{W})/\det(\mathbf{T}) = 0.5817749$
Since there are $p = 2$ dependent variables, Wilks' Λ has an exact sampling distribution that is F; in particular,
$$F = \left(\frac{\nu_e - 1}{\nu_h}\right)\left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right) \sim F_{2\nu_h,\,2(\nu_e - 1)}$$
$F = 3.4838$ and p-value $= .0005$.
Reject $H_o$. The data support the conclusion that not all of the means (or $\boldsymbol{\tau}$'s) are equal.

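Estimated MANOVA parameters
$$\hat{\boldsymbol{\mu}} = \begin{pmatrix} 205.56 \\ 166.46 \end{pmatrix}$$

| Type | Estimate | hospitalization | untreatable |
|---|---|---|---|
| Stomach | $\hat{\boldsymbol{\tau}}_1$ | -134.888 | -71.543 |
| Bronchus | $\hat{\boldsymbol{\tau}}_2$ | -156.055 | -59.585 |
| Colon | $\hat{\boldsymbol{\tau}}_3$ | -88.368 | 126.727 |
| Rectum | $\hat{\boldsymbol{\tau}}_4$ | 91.873 | 60.111 |
| Bladder | $\hat{\boldsymbol{\tau}}_5$ | 1098.644 | -36.660 |
| Kidney | $\hat{\boldsymbol{\tau}}_6$ | -86.698 | -64.746 |

Recall that $\boldsymbol{\mu}_l = \boldsymbol{\mu} + \boldsymbol{\tau}_l$.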

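MANOVA as a multivariate GLM
A main effect (intercept) and six dummy variables (this is what PROC GLM does). So the design matrix looks like
$$\mathbf{A}_{n_+ \times 7} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{matrix} \}\ n_{stomach} \text{ rows} \\ \}\ n_{bronchus} \text{ rows} \\ \}\ n_{colon} \text{ rows} \\ \}\ n_{rectum} \text{ rows} \\ \}\ n_{bladder} \text{ rows} \\ \}\ n_{kidney} \text{ rows} \end{matrix}$$
MANOVA (multivariate general linear model): $\mathbf{y}_{n_+ \times 2} = \mathbf{A}_{n_+ \times 7}\mathbf{B}_{7 \times 2} + \boldsymbol{\epsilon}_{n_+ \times 2}$
Estimation: $\hat{\mathbf{B}} = (\mathbf{A}'\mathbf{A})^{-}\mathbf{A}'\mathbf{y}$
Predicted values: $\hat{\mathbf{y}} = \mathbf{A}\hat{\mathbf{B}}$, where $\hat{\mathbf{y}}_{jl} = (\bar{x}_{1l}, \bar{x}_{2l})$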

MANOVA as a multivariate GLM (continued)
For $\hat{\mathbf{y}}_{jl} = (\bar{x}_{1l}, \bar{x}_{2l})$, it is the case that
$$\bar{x}_{1l} = b_{o1} + b_{l1} \quad \text{and} \quad \bar{x}_{2l} = b_{o2} + b_{l2}$$
So to compare two groups (types of cancer) $l$ and $l'$ on variable $i$,
$$\bar{x}_{il} - \bar{x}_{il'} = (b_{oi} + b_{li}) - (b_{oi} + b_{l'i}) = b_{li} - b_{l'i}$$
Consider a contrast between means for two types of cancer, for example, stomach and bronchus:
$$\mathbf{c}'\hat{\mathbf{B}} = (0, 1, -1, 0, 0, 0, 0)\begin{pmatrix} b_{o1} & b_{o2} \\ b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \\ b_{41} & b_{42} \\ b_{51} & b_{52} \\ b_{61} & b_{62} \end{pmatrix}$$

MANOVA as a multivariate GLM (continued)
$$H_o: \mathbf{c}'\mathbf{B} = \mathbf{0}$$
$$\mathbf{c}'\hat{\mathbf{B}} = ((b_{11} - b_{21}),\ (b_{12} - b_{22})) = ((\bar{x}_{11} - \bar{x}_{21}),\ (\bar{x}_{12} - \bar{x}_{22}))$$

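With the Parameter Estimates
Rows of $\hat{\mathbf{B}}$ shown (intercept first, then group effects):
$$\hat{\mathbf{B}} = \begin{pmatrix} 326.31 & 158.84 \\ -255.64 & -63.93 \\ -276.81 & -51.97 \\ -209.12 & 134.34 \\ -28.88 & 67.73 \\ 977.89 & -29.04 \end{pmatrix}$$
stomach:
$\bar{x}_{11} = b_{o1} + b_{11} = 326.31 + (-255.64) = 70.67$
$\bar{x}_{12} = b_{o2} + b_{12} = 158.84 + (-63.93) = 94.91$
bronchus:
$\bar{x}_{21} = b_{o1} + b_{21} = 326.31 + (-276.81) = 49.50$
$\bar{x}_{22} = b_{o2} + b_{22} = 158.84 + (-51.97) = 106.87$
For $H_o: (0, 1, -1, 0, 0, 0, 0)\mathbf{B} = \mathbf{0}$, the estimate is
$$(0, 1, -1, 0, 0, 0, 0)\hat{\mathbf{B}} = ((-255.64 + 276.81),\ (-63.93 + 51.97)) = (21.17,\ -11.96) = ((\bar{x}_{11} - \bar{x}_{21}),\ (\bar{x}_{12} - \bar{x}_{22}))$$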
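The point that contrasts do not depend on the parameterization can be illustrated with the estimates above. The sketch below assumes the coefficient rows as listed in these notes (decimal places as reconstructed) and, as a hypothetical alternative coding, re-expresses the same group means relative to bladder as a reference group; the stomach-minus-bronchus contrast comes out the same either way.

```python
# Coefficient rows as listed in the notes: (d1, d2) columns.
b_hat = {
    "intercept": (326.31, 158.84),
    "stomach": (-255.64, -63.93),
    "bronchus": (-276.81, -51.97),
    "colon": (-209.12, 134.34),
    "rectum": (-28.88, 67.73),
    "bladder": (977.89, -29.04),
}
groups = ["stomach", "bronchus", "colon", "rectum", "bladder"]

def group_mean(name):
    """Fitted group mean: intercept row plus the group's effect row."""
    return tuple(b0 + bl for b0, bl in zip(b_hat["intercept"], b_hat[name]))

# Contrast c'B with c picking out stomach minus bronchus.
contrast_effects = tuple(s - b for s, b in
                         zip(b_hat["stomach"], b_hat["bronchus"]))

# Hypothetical alternative coding: effects measured from bladder as reference.
reference = group_mean("bladder")
ref_coded = {name: tuple(m - r for m, r in zip(group_mean(name), reference))
             for name in groups}
contrast_ref = tuple(s - b for s, b in
                     zip(ref_coded["stomach"], ref_coded["bronchus"]))

print(contrast_effects, contrast_ref)
```

Individual rows differ between the two codings, but the difference between any two group rows, and so any contrast among groups, is unchanged.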

Testing $H_o: \mathbf{CBM} = \mathbf{0}$
Our hypothesis tests can be of the form
$$H_o: \mathbf{C}_{r \times g}\mathbf{B}_{g \times p}\mathbf{M}_{p \times s} = \mathbf{0}_{r \times s}$$
$\mathbf{C}$ defines hypotheses (contrasts) on the elements of columns of $\mathbf{B}$; that is, comparisons between the means on the same variables over groups.
$\mathbf{M}$ defines hypotheses (contrasts) on the elements of rows of $\mathbf{B}$; that is, comparisons between the means on the same group over variables.
For now $\mathbf{M} = \mathbf{I}$, and we'll consider hypotheses of the form $H_o: \mathbf{CB} = \mathbf{0}_{r \times p}$.
Specifically, we want to consider (for example)
$$H_o: 0\,b_{0k} + c_1 b_{1k} + c_2 b_{2k} + \cdots + c_g b_{gk} = 0,$$
which corresponds to $c_1\boldsymbol{\tau}_1 + c_2\boldsymbol{\tau}_2 + \cdots + c_g\boldsymbol{\tau}_g = \mathbf{0}$, where $\sum_{l=1}^{g} c_l = 0$.

Testing Contrasts: The H matrix
For a simple contrast, such as $\mathbf{c} = (0, 1, -1, 0, 0)$, we could do this as a multivariate $T^2$ test for independent groups; however, we'll stay within the MANOVA and multivariate linear model framework (so we can test multiple contrasts).
Suppose that we have a contrast matrix $\mathbf{C}_{r \times (g+1)}$ whose rows are $r$ orthogonal contrasts. The hypothesis matrix equals
$$\mathbf{H} = (\mathbf{C}\hat{\mathbf{B}})'(\mathbf{C}(\mathbf{A}'\mathbf{A})^{-}\mathbf{C}')^{-1}(\mathbf{C}\hat{\mathbf{B}})$$
For a balanced design (i.e., $n_1 = n_2 = \cdots = n_g = n$) and a single contrast (i.e., $r = 1$), this reduces to
$$\mathbf{H} = \frac{n}{\sum_{l=1}^{g} c_l^2}\left(\sum_{l=1}^{g} c_l\bar{\mathbf{x}}_l\right)\left(\sum_{l=1}^{g} c_l\bar{\mathbf{x}}_l\right)'$$
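The balanced, single-contrast formula is easy to evaluate by hand or in code. The sketch below assumes the group means and $\mathbf{W}$ from the massed versus distributed practice example in these notes ($n = 20$ per condition); the contrast $(1, -1, 0)$ comparing the first two conditions is a hypothetical choice made for illustration, written over the $g$ group means only (no intercept position).

```python
n = 20                                     # cases per group (balanced design)
means = [(38.55, 23.70), (34.00, 18.20), (28.30, 12.85)]
c = (1.0, -1.0, 0.0)                       # hypothetical contrast: C1 vs C2
assert abs(sum(c)) < 1e-12                 # contrast coefficients sum to zero
p = 2

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# sum_l c_l * xbar_l, a p-vector
contrast_vec = [sum(cl * m[k] for cl, m in zip(c, means)) for k in range(p)]

# H = n / sum(c_l^2) * (contrast_vec)(contrast_vec)'
factor = n / sum(cl * cl for cl in c)
H = [[factor * contrast_vec[i] * contrast_vec[j] for j in range(p)]
     for i in range(p)]

# Wilks' Lambda for this contrast, using W as reported in the notes.
W = [[1773.15, 414.20], [414.20, 289.95]]
H_plus_W = [[H[i][j] + W[i][j] for j in range(p)] for i in range(p)]
wilks_lambda = det2(W) / det2(H_plus_W)

print(H, wilks_lambda)
```

With $r = 1$ the hypothesis matrix has rank one, which is why it is an outer product of a single vector with itself.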

Testing Contrasts

The Error matrix is $W$; that is,

$$E = W = \sum_{l=1}^{g}\sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)' = X'X - B'(A'A)B$$

Wilks' Lambda for the test $H_o: CB = 0$ is

$$\Lambda = \frac{\det(E)}{\det(H + E)}$$

To find the transformation of this to an F distribution:

- $p$ = anything
- $\nu_{\text{hypothesis}} = r$
- $\nu_{\text{error}} = \sum_l n_l - g$
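The two expressions for the error matrix can be compared numerically. The sketch below uses made-up, unbalanced data (the group sizes and values are assumptions, not the cancer data) and again takes `numpy.linalg.pinv` as the generalized inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [3, 5, 4]                                    # hypothetical group sizes
groups = np.repeat(np.arange(len(sizes)), sizes)
X = rng.normal(size=(sum(sizes), 2))                 # made-up responses, p = 2

A = np.column_stack(
    [np.ones(len(groups))] + [(groups == l).astype(float) for l in range(len(sizes))]
)
B = np.linalg.pinv(A.T @ A) @ A.T @ X

# Definition: pooled within-group sums of squares and cross-products.
W = np.zeros((2, 2))
for l in range(len(sizes)):
    dev = X[groups == l] - X[groups == l].mean(axis=0)
    W += dev.T @ dev

# Matrix form: total SSCP minus the SSCP of the fitted values A @ B.
E = X.T @ X - B.T @ (A.T @ A) @ B

print(np.allclose(W, E))
```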

Example: Five types the same?

Three equivalent forms for Hypothesis 1:

$$H_o: \mu_{\text{bronchus}} = \mu_{\text{colon}} = \mu_{\text{kidney}} = \mu_{\text{rectum}} = \mu_{\text{stomach}}$$
$$H_o: \tau_{\text{bronchus}} = \tau_{\text{colon}} = \tau_{\text{kidney}} = \tau_{\text{rectum}} = \tau_{\text{stomach}}$$
$$H_o: \beta_{\text{bronchus}} = \beta_{\text{colon}} = \beta_{\text{kidney}} = \beta_{\text{rectum}} = \beta_{\text{stomach}}$$

where $\beta_l$ is a $p \times 1$ column vector of $B'$ (i.e., a row of $B$ written as a column).

For the contrast matrix we need to know the order of the effects in the GLM. I re-order them so that they are in alphabetical order, because PROC GLM puts them in alphabetical order (or numerical order, if groups are coded that way).

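$H_o$: Five types the same?

$$H_o: CB = \begin{pmatrix} 0 & 0 & 1 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} \beta_{o1} & \beta_{o2} \\ \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \beta_{31} & \beta_{32} \\ \beta_{41} & \beta_{42} \\ \beta_{51} & \beta_{52} \\ \beta_{61} & \beta_{62} \end{pmatrix} = 0$$

The rows of $B$ are, in order: intercept, bladder, bronchus, colon, kidney, rectum, stomach.

So

$$H_o: \begin{pmatrix} (\beta_{21} - \beta_{31}) & (\beta_{22} - \beta_{32}) \\ (\beta_{21} - \beta_{41}) & (\beta_{22} - \beta_{42}) \\ (\beta_{21} - \beta_{51}) & (\beta_{22} - \beta_{52}) \\ (\beta_{21} - \beta_{61}) & (\beta_{22} - \beta_{62}) \end{pmatrix} = \begin{pmatrix} (\tau_{21} - \tau_{31}) & (\tau_{22} - \tau_{32}) \\ (\tau_{21} - \tau_{41}) & (\tau_{22} - \tau_{42}) \\ (\tau_{21} - \tau_{51}) & (\tau_{22} - \tau_{52}) \\ (\tau_{21} - \tau_{61}) & (\tau_{22} - \tau_{62}) \end{pmatrix} = 0$$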
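The rows of this $C$ are not mutually orthogonal, whereas the PROC GLM contrast statement in the SAS code for this example uses four orthogonal rows. Both sets span the same space of contrasts among the five types, and $H$ depends only on that row space. The sketch below checks this on made-up data; the group sizes and responses are assumptions, and only the two contrast matrices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = [4, 6, 5, 3, 4, 5]                 # hypothetical sizes for six groups
groups = np.repeat(np.arange(6), sizes)
X = rng.normal(size=(sum(sizes), 2))       # made-up responses, p = 2

A = np.column_stack([np.ones(len(groups))] + [(groups == l).astype(float) for l in range(6)])
G = np.linalg.pinv(A.T @ A)                # a generalized inverse of A'A
B = G @ A.T @ X

def hypothesis_sscp(C):
    """H = (CB)' (C (A'A)^- C')^{-1} (CB)."""
    CB = C @ B
    return CB.T @ np.linalg.inv(C @ G @ C.T) @ CB

# Columns: intercept, bladder, bronchus, colon, kidney, rectum, stomach.
C_slide = np.array([[0, 0, 1, -1, 0, 0, 0],
                    [0, 0, 1, 0, -1, 0, 0],
                    [0, 0, 1, 0, 0, -1, 0],
                    [0, 0, 1, 0, 0, 0, -1]], dtype=float)
C_sas = np.array([[0, 0, 1, 0, 0, 0, -1],
                  [0, 0, 0, 1, -1, 0, 0],
                  [0, 0, 1, -1, -1, 0, 1],
                  [0, 0, 1, 1, 1, -4, 1]], dtype=float)

print(np.allclose(hypothesis_sscp(C_slide), hypothesis_sscp(C_sas)))
```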

Hypothesis Matrices

$$H = (CB)'\big(C(A'A)^{-}C'\big)^{-1}(CB) = \begin{pmatrix} 324369.92785 & 180717.48204 \\ 180717.48204 & 429243.40815 \end{pmatrix}$$

with error degrees of freedom $n_s + n_b + n_c + n_r + n_d + n_k - 6$.

The error SSCP matrix $E$ is the same as the $W$ that we used before, which equals

$$E = W = X'X - B'(A'A)B = \begin{pmatrix} 24340768.476 & 4455319.3042 \\ 4455319.3042 & 2659191.047 \end{pmatrix}$$

Wilks' Lambda:

$$\Lambda = \frac{\det(E)}{\det(H + E)} = \frac{4.4877\text{E}13}{5.4684\text{E}13} = 0.820661$$
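As a check on the determinants, here is a small sketch that recomputes Wilks' Lambda from the two SSCP matrices on this slide. The helper `det2` is a hypothetical name introduced only for this illustration.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Error and hypothesis SSCP matrices from the slide.
E = [[24340768.476, 4455319.3042],
     [4455319.3042, 2659191.047]]
H = [[324369.92785, 180717.48204],
     [180717.48204, 429243.40815]]

H_plus_E = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]

wilks = det2(E) / det2(H_plus_E)
print(f"det(E) = {det2(E):.4e}, det(H+E) = {det2(H_plus_E):.4e}, Lambda = {wilks:.6f}")
```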

Results

- $\nu_h$ = number of rows of $C$ = 4
- $\nu_e$ = number of rows of $X$ minus the number of groups $g$ = 57

Referring to the table of transformations of $\Lambda$ that have F sampling distributions, we use the one for $p = 2$ and $\nu_h \ge 1$, which is

$$F = \left(\frac{\nu_e - 1}{\nu_h}\right)\left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right) = \left(\frac{56}{4}\right)\left(\frac{1 - \sqrt{0.820661}}{\sqrt{0.820661}}\right) = 1.4541861$$

If the null is true, then this should have an $F_{2\nu_h,\,2(\nu_e - 1)}$ sampling distribution.

Comparing $F = 1.45$ to the $F_{8,112}$ distribution, we find that the p-value is .18. Retain the null hypothesis.

The data suggest no difference in increased survival of patients over different types of cancer (except bladder).
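The transformation for $p = 2$ can be written as a small function. This is a sketch under the slide's definitions; the function name is an assumption, and the p-value lookup is left to an F table or statistical software.

```python
from math import sqrt

def wilks_to_f_p2(wilks, nu_h, nu_e):
    """F transformation of Wilks' Lambda for p = 2 variables.

    Returns the F statistic and its (numerator, denominator) degrees of freedom.
    """
    root = sqrt(wilks)
    f_stat = ((nu_e - 1) / nu_h) * ((1 - root) / root)
    return f_stat, (2 * nu_h, 2 * (nu_e - 1))

f_stat, df = wilks_to_f_p2(0.820661, nu_h=4, nu_e=57)
print(f"F = {f_stat:.7f} on {df} degrees of freedom")
```

With $\nu_h = 4$ and $\nu_e = 57$ the degrees of freedom come out as (8, 112).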

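Five versus the Rest

$$H_o: \tau_{\text{bladder}} = (\tau_{\text{bronchus}} + \tau_{\text{colon}} + \tau_{\text{kidney}} + \tau_{\text{rectum}} + \tau_{\text{stomach}})/5$$

or equivalently

$$H_o: CB = (0, -5, 1, 1, 1, 1, 1) \begin{pmatrix} \beta_{o1} & \beta_{o2} \\ \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \beta_{31} & \beta_{32} \\ \beta_{41} & \beta_{42} \\ \beta_{51} & \beta_{52} \\ \beta_{61} & \beta_{62} \end{pmatrix} = 0$$

The rows of $B$ are, in order: intercept, bladder, bronchus, colon, kidney, rectum, stomach.

$E$ is the same as before, but now

$$H = \begin{pmatrix} 6266038.6574 & -186105.9247 \\ -186105.9247 & 5527.481893 \end{pmatrix}$$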

The Test and Result

$$\Lambda = \frac{4.4877\text{E}13}{6.3332\text{E}13} = 0.7085934$$

- $\nu_e = \sum_{l=1}^{g} n_l - g = 57$
- $\nu_h = 1$, the number of rows of $C$

So, for $\nu_h = 1$, use

$$F = \left(\frac{\nu_e + \nu_h - p}{p}\right)\left(\frac{1 - \Lambda}{\Lambda}\right) = 11.514901,$$

which, if the null is true (and the assumptions are valid), should have an $F_{p,\,(\nu_e + \nu_h - p)}$ sampling distribution.

Comparing $F$ to $F_{2,56}$, we get a p-value < .01. Reject $H_o$.

Summary: The mean survival of patients with bladder cancer differs from that of those with other types of cancer; however, there is no support for differences between the other types.

Question: Are there differences for survival from first hospitalization?
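The test can be reproduced end to end from the SSCP matrices of this example. In the sketch below the off-diagonal entries of $H$ are taken as negative, the sign that reproduces $\det(H + E) = 6.3332\text{E}13$. The p-value uses the closed form for an F distribution with 2 numerator degrees of freedom, $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$, so no statistical library is needed; `det2` is a hypothetical helper name.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

E = [[24340768.476, 4455319.3042],
     [4455319.3042, 2659191.047]]
# Hypothesis SSCP for the bladder-versus-rest contrast.
H = [[6266038.6574, -186105.9247],
     [-186105.9247, 5527.481893]]

p, nu_h, nu_e = 2, 1, 57
H_plus_E = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]
wilks = det2(E) / det2(H_plus_E)

# Exact F transformation when nu_h = 1.
df2 = nu_e + nu_h - p
f_stat = (df2 / p) * ((1 - wilks) / wilks)

# Upper-tail probability of F(2, df2); this closed form holds only for 2 numerator df.
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)

print(f"Lambda = {wilks:.7f}, F = {f_stat:.4f} on (2, {df2}) df, p = {p_value:.2e}")
```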

Simultaneous Confidence Intervals

We can construct simultaneous confidence intervals for components of differences $\tau_l - \tau_{l'}$ (which equal $\mu_l - \mu_{l'}$) or other linear combinations such as $\tau_1 - (\tau_2 + \tau_3)/2$.

There are at least three ways of doing this:

- Specify a matrix $M$ in the hypothesis test $H_o: CBM = 0$ that is a $(p \times 1)$ vector of zeros with a 1 in the $i$th position, $M' = (0, \ldots, 1, \ldots, 0)$.
- Bonferroni-type: same as above, but split $\alpha$ into pieces, one part for each of the planned comparisons.
- Roy's method, which is based on the union-intersection principle.

Using $CBM = 0$

- $C$ picks out which two (or more) groups to compare; e.g., to compare bladder with the rest, $C = (0, 1, -.2, -.2, -.2, -.2, -.2)$.
- $M$ picks out which variable (or linear combination of variables); e.g., to compare just $d_1$, increase in survival from first hospitalization, $M' = (1, 0)$.

Putting these together in our example gives us

$$(0, 1, -.2, -.2, -.2, -.2, -.2) \begin{pmatrix} \beta_{o1} & \beta_{o2} \\ \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \beta_{31} & \beta_{32} \\ \beta_{41} & \beta_{42} \\ \beta_{51} & \beta_{52} \\ \beta_{61} & \beta_{62} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \beta_{11} - \frac{1}{5}\sum_{l=2}^{6} \beta_{l1}$$
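A short sketch of how $C$ and $M$ select one number out of $B$. The first column holds the six $d_1$ group means quoted in this example, entered as effects with a zero intercept (one valid solution of the over-parameterized model), with the five non-bladder means assumed to be listed in alphabetical order. The second column is filled with placeholder zeros, which is an assumption: $M' = (1, 0)$ ignores that column, so the placeholders do not affect the result.

```python
import numpy as np

# Rows: intercept, bladder, bronchus, colon, kidney, rectum, stomach.
# Column 1: d1 group means; column 2: PLACEHOLDER zeros (not real estimates).
B = np.array([
    [0.0,     0.0],
    [1304.20, 0.0],
    [49.50,   0.0],
    [117.19,  0.0],
    [118.86,  0.0],
    [297.43,  0.0],
    [70.67,   0.0],
])

C = np.array([[0.0, 1.0, -0.2, -0.2, -0.2, -0.2, -0.2]])  # bladder vs. mean of the rest
M = np.array([[1.0], [0.0]])                               # keep d1 only

CBM = (C @ B @ M).item()
print(f"CBM = {CBM:.2f}")
```

The contrast coefficients sum to zero over the groups, so the intercept row drops out whatever value it holds.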

Confidence interval for CBM

$$\beta_{11} - \frac{1}{5}\sum_{l=2}^{6} \beta_{l1} = \tau_{11} - \frac{1}{5}\sum_{l=2}^{6} \tau_{l1}$$

We need two things:

- A fudge-factor: a value from a probability distribution.
- An estimate of the standard error.

A $(1 - \alpha)100\%$ confidence statement given vectors $C_{1 \times (g+1)}$ and $M_{p \times 1}$ is

$$CBM \pm \sqrt{F_{1,\nu_e}(\alpha)}\,\sqrt{(M'S_{pool}M)\big(C(A'A)^{-}C'\big)}$$

Note: Consider two columns of $B$, $\beta_i$ and $\beta_k$; the covariance matrix between them is

$$\text{cov}(\beta_i, \beta_k) = s_{pool,ik}(A'A)^{-}$$

Our example: CI for CBM

$$\beta_{11} - \frac{1}{5}\sum_{l=2}^{6} \beta_{l1} = 1304.20 - \frac{1}{5}(49.50 + 117.19 + 118.86 + 297.43 + 70.67) = 1173.47$$

$$(M'S_{pool}M)\big(C(A'A)^{-}C'\big) = s_{pool,11}(0.2197619) = (427031.03)(0.2197619) = 93845.152$$

And $F_{1,57}(.05) = 4.01$.

So our 95% confidence interval is

$$1173.47 \pm \sqrt{4.01}\,\sqrt{93845.152} \;\longrightarrow\; 1173.47 \pm \sqrt{4.01}\,(306.34) \;\longrightarrow\; (560.03,\ 1786.91)$$

Since 0 is not in the interval, the mean increase in survival from first hospitalization for bladder cancer is larger than the average of the other groups' means.

Should we test whether the same is true for increase in survival from time of untreatability?
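The interval can be recomputed from the quantities quoted on this slide. The critical value 4.01 is taken from the slide rather than computed, and the variable names are illustrative.

```python
from math import sqrt

estimate = 1304.20 - (49.50 + 117.19 + 118.86 + 297.43 + 70.67) / 5  # CBM
s_pool_11 = 427031.03          # pooled variance of d1, i.e. M' S_pool M
c_ginv_c = 0.2197619           # C (A'A)^- C'
f_crit = 4.01                  # F_{1,57}(.05), as quoted on the slide

std_err = sqrt(s_pool_11 * c_ginv_c)
half_width = sqrt(f_crit) * std_err
lower, upper = estimate - half_width, estimate + half_width

print(f"estimate = {estimate:.2f}, se = {std_err:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```

The endpoints agree with the slide's (560.03, 1786.91) up to rounding in the second decimal place.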

Plot of Means

[Figure: the six group mean vectors $\bar{x}_1, \ldots, \bar{x}_6$ plotted with $d_1$ on the horizontal axis and $d_2$ on the vertical axis.]

Notes about These CIs

- If you're only looking at (testing) the difference between two means, e.g., $\tau_{li} - \tau_{l'i}$, then the standard error is just $\sqrt{s_{pool,ii}\,(1/n_l + 1/n_{l'})}$.
- When looking at a difference for a single variable (e.g., above), these confidence statements are equivalent to what you would get from 1-way ANOVA using Fisher's least significant differences; that is, they are univariate CIs.
- When considering a linear combination of variables, these CIs are equivalent to univariate CIs where you've analyzed a new or composite variable defined by the linear combination.
- In our example, we don't have to worry too much about an inflated Type I error rate, because we only did one CI after rejecting the overall test and using multivariate contrasts to narrow down where differences exist.
- If you do all pairwise differences, there are $g(g-1)/2$ pairs times $p$ variables (e.g., $2(6)(5)/2 = 30$).

Bonferroni Intervals

If you have planned to look at all pairwise comparisons before looking at the data (i.e., $m = pg(g-1)/2$), then you can use as your fudge factor

$$t_{\nu_e}(\alpha/(2m))$$

Let $n_+ = \sum_{l=1}^{g} n_l$. For the model $X_{lj} = \mu + \tau_l + \epsilon_{lj}$ with $j = 1, \ldots, n_l$ and $l = 1, \ldots, g$, with confidence at least $(1 - \alpha)$, $(\tau_{li} - \tau_{l'i})$ belongs to

$$(\bar{x}_{li} - \bar{x}_{l'i}) \pm t_{\nu_e}(\alpha/(2m))\,\sqrt{s_{pool,ii}\left(\frac{1}{n_l} + \frac{1}{n_{l'}}\right)}$$

for all components (variables) $i = 1, \ldots, p$ and all differences $l < l' = 1, \ldots, g$.
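A small sketch of the bookkeeping for Bonferroni intervals. The function names are hypothetical, and the $t$ critical value is passed in as an argument (to be read from a table or software) rather than computed here.

```python
from math import sqrt

def bonferroni_setup(p, g, alpha=0.05):
    """Number of planned pairwise comparisons m and the per-tail probability alpha/(2m)."""
    m = p * g * (g - 1) // 2
    return m, alpha / (2 * m)

def half_width(t_crit, s_pool_ii, n_l, n_lprime):
    """Half-width of the interval for tau_li - tau_l'i, given the critical value t_crit."""
    return t_crit * sqrt(s_pool_ii * (1.0 / n_l + 1.0 / n_lprime))

# Six groups and two variables, as in the cancer example.
m, tail = bonferroni_setup(p=2, g=6)
print(f"m = {m}, look up t with upper-tail probability {tail:.6f}")
```

With $p = 2$ and $g = 6$ this gives $m = 30$, so each interval uses an upper-tail probability of $.05/60$.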

Roy's Method

This is based on the union-intersection principle.

This is more like the first method that we considered (i.e., $CBM$); however, we use a different distribution for our fudge factor. We use the greatest root of

$$|H - \theta(E + H)| = 0$$

where $H$ (between-groups or hypothesis SSCP matrix) and $E = W$ (error or within-groups SSCP matrix) are independent Wishart matrices.

To apply this result, we need percentiles of the greatest root distribution of the largest root $\lambda$ of the equation $|H - \lambda E| = 0$. Percentiles can be found in tables.

This distribution does not depend on $\Sigma$ but only on $df = n - g - p - 1$.
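The two determinantal equations are linked by $\theta = \lambda/(1 + \lambda)$, and the roots also recover Wilks' Lambda through $\Lambda = \prod_i 1/(1 + \lambda_i)$. The sketch below computes the roots for the five-types contrast, using the $E$ and $H$ matrices from that example; percentiles of the greatest root distribution are not computed.

```python
import numpy as np

E = np.array([[24340768.476, 4455319.3042],
              [4455319.3042, 2659191.047]])
H = np.array([[324369.92785, 180717.48204],
              [180717.48204, 429243.40815]])

# Roots of |H - lambda E| = 0 are the eigenvalues of E^{-1} H.
lam = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]

# Roots of |H - theta (E + H)| = 0.
theta = lam / (1.0 + lam)

# Wilks' Lambda expressed through the same roots.
wilks = np.prod(1.0 / (1.0 + lam))

print(f"lambda = {lam}, greatest root theta = {theta[0]:.6f}, Lambda = {wilks:.6f}")
```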

Roy's Method

Tables and charts of the greatest root distribution exist; however, these are difficult to read (you can find them in older literature).

Recommendation: I'd suggest using Scheffé's method, where you do 1-way ANOVA on a linear combination of variables and then specify the contrast that you want.

A Truly Multivariate Follow-Up

Discriminant Analysis

- The first discriminant function gives a linear combination of the $p$ variables that yields the greatest differences between the means of the groups.
- You can get up to $\min(p, g - 1)$ functions.
- They are given by the characteristic vectors of $E^{-1}H$; the characteristic roots measure the separation each function achieves.
- For now, we'll just get them from SAS/PROC GLM.
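A sketch of how the first discriminant function can be obtained outside SAS, assuming the $E$ and $H$ matrices from the five-types contrast. The eigenvector for the largest root of $E^{-1}H$ maximizes the ratio $a'Ha / a'Ea$, which is compared here against a few random directions; the scaling of the vector is arbitrary and differs from SAS's normalization.

```python
import numpy as np

E = np.array([[24340768.476, 4455319.3042],
              [4455319.3042, 2659191.047]])
H = np.array([[324369.92785, 180717.48204],
              [180717.48204, 429243.40815]])

vals, vecs = np.linalg.eig(np.linalg.solve(E, H))
order = np.argsort(vals.real)[::-1]
lam1 = vals.real[order[0]]          # largest characteristic root
a1 = vecs[:, order[0]].real         # coefficients of the first discriminant function

def separation(a):
    """Between-to-within ratio a'Ha / a'Ea for a linear combination a."""
    return (a @ H @ a) / (a @ E @ a)

# No other direction should separate the groups better than a1.
rng = np.random.default_rng(3)
others = [separation(rng.normal(size=2)) for _ in range(1000)]

print(f"largest root = {lam1:.6f}, ratio at a1 = {separation(a1):.6f}, "
      f"best random direction = {max(others):.6f}")
```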

Summary: PCA, MANOVA, DA

[Figure: the six group mean vectors $\bar{x}_1, \ldots, \bar{x}_6$ plotted with $d_1$ on the horizontal axis and $d_2$ on the vertical axis, with the directions of the 1st PC and the 1st discriminant marked.]

SAS IML and GLM

SAS IML code using the traditional approach, and the GLM one:

    PROC GLM data=vitc;
      class type;
      model d1 d2 = type / solution;
      * Note: The order of the values in the contrast are alphabetical,
        in this case order is bladder bronchus colon kidney rectum stomach;
      contrast 'bronchus=colon=kidney=rectum=stomach'
               type 0 1  0  0  0 -1,
               type 0 0  1 -1  0  0,
               type 0 1 -1 -1  0  1,
               type 0 1  1  1 -4  1;
      contrast 'bladder vs others' type -5 1 1 1 1 1;
      manova h=type / printh printe;
      estimate 'b vs o' type 1 -.2 -.2 -.2 -.2 -.2;
      lsmeans type;
      title 'MANOVA of vitamin C and Cancer';
    run;

Alternate MANOVA statement where M is entered as M':

    manova h=type M=(1 0);

SAS GLM Output

- Univariate ANOVAs for each dependent variable.
- If requested, the E (printe) and H (printh) SSCP matrices.
- The p characteristic roots and vectors of $E^{-1}H$ (i.e., discriminant functions).
- Other requested statistics: contrasts, estimates of contrasts, cell means, etc.
- Test statistics for no overall effect specified in the MANOVA statement.

Show SAS program and output.