Analysis of variance, multivariate (MANOVA)

Abstract: A designed experiment is one in which the system studied is under the control of an investigator: the individuals, the treatments, and the variables measured are decided by the investigator. In a clinical trial, patients satisfying some eligibility criteria are assigned at random either to a new treatment or to a placebo; the patients are followed for some time and a few responses are recorded for each patient. In an agricultural experiment, a field is divided into plots of shape and size determined by the investigator; each plot is assigned at random to one of a few treatments, which could be, e.g., different fertilizers, and the variables measured on each plot could be the yields of some crops. The general objective of a designed experiment is to assess the effects of the different treatments on the responses. These effects are evaluated by statistical estimates and confidence intervals for the magnitude of the differences between treatments. The estimates should avoid biases, and the random errors should be kept as small as possible. The statistical methodology used to analyze such designed experiments when there are several responses is termed MANOVA, an acronym for Multivariate ANalysis Of VAriance.

Keywords: MANOVA; linear model; factor; experiment

Multivariate regression model and the general linear hypothesis

The linear model considered is
$$ Y = XB + E, $$
where $Y : n \times p$ is the matrix of responses, $B : k \times p$ is the matrix of unknown regression coefficients, and $X : n \times k$ is a fixed and known design matrix of rank $k$. The rows of the error term $E$ are independently distributed as multivariate normal $N_p(0, \Sigma)$ with an unknown positive definite covariance matrix $\Sigma$. The maximum likelihood estimators of the unknown parameters are
$$ \hat{B} = (X'X)^{-1}X'Y \quad \text{and} \quad \hat{\Sigma} = (Y - X\hat{B})'(Y - X\hat{B})/n. $$
The fundamental property of these estimates is that they form a set of sufficient statistics. Moreover, they are independently distributed as multivariate normal, $\hat{B} \sim N\!\left(B, (X'X)^{-1} \otimes \Sigma\right)$, and Wishart, $n\hat{\Sigma} \sim W_p(n - k, \Sigma)$. The general linear hypothesis is the hypothesis $H_0 : CB = 0$, where the known matrix $C : r \times k$ is of rank $r$. This null hypothesis is tested against all alternatives in the multivariate regression model.
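As a concrete illustration of these estimators, the following short numpy sketch (not part of the original article; the function name is illustrative) computes $\hat{B}$ and $\hat{\Sigma}$ directly from a response matrix $Y$ and a full-rank design matrix $X$.

```python
import numpy as np

def manova_mle(Y, X):
    """Maximum likelihood estimates in the model Y = XB + E.

    Y : (n, p) response matrix, X : (n, k) full-rank design matrix.
    Returns (B_hat, Sigma_hat) with B_hat = (X'X)^{-1} X'Y and
    Sigma_hat = (Y - X B_hat)'(Y - X B_hat) / n.
    """
    n = Y.shape[0]
    # Solve the normal equations instead of forming an explicit inverse.
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    R = Y - X @ B_hat                      # residual matrix
    Sigma_hat = (R.T @ R) / n              # the MLE divides by n, not n - k
    return B_hat, Sigma_hat
```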

A canonical form

The canonical form, due to Roy [8] and Anderson [1] (see also Bilodeau and Brenner [3] or Eaton [5]), is obtained by transforming the original response $Y$ so that in the new model $X$ becomes $X_0 = \binom{I_k}{0}$ and $C$ reduces to $C_0 = (I_r, 0)$. It can be shown that an equivalent problem in its canonical form is to test $H_0 : M_1 = 0$ against $H_1 : M_1 \neq 0$ based on
$$ Z_1 : r \times p \sim N(M_1, I_r \otimes \Sigma), \qquad Z_2 : (k - r) \times p \sim N(M_2, I_{k-r} \otimes \Sigma), \qquad Z_3 : (n - k) \times p \sim N(0, I_{n-k} \otimes \Sigma), $$
where $Z_1$, $Z_2$, and $Z_3$ are independent. The Wilks likelihood ratio test (LRT) rejects the null hypothesis for small values of
$$ \Lambda = \frac{|Z_3'Z_3|^{n/2}}{|Z_1'Z_1 + Z_3'Z_3|^{n/2}}. $$
At this point it is convenient to define the so-called $U(p; m, n)$ distribution, which is the distribution of $|W_1| / |W_1 + W_2|$, where $W_1 \sim W_p(n, I)$ is distributed independently of $W_2 \sim W_p(m, I)$. The likelihood ratio statistic then satisfies $\Lambda^{2/n} \sim U(p; r, n - k)$. It can be shown that $-\left[n - \tfrac{1}{2}(p - m + 1)\right] \log U(p; m, n)$ has a limiting chi-square distribution with $pm$ degrees of freedom. This chi-square approximation can be adjusted by a $C$ factor to obtain a test of exact significance level $\alpha$: one computes the test statistic $U(p; m, n)$ and rejects the null hypothesis at significance level $\alpha$ if
$$ -\left[n - \tfrac{1}{2}(p - m + 1)\right] \log U(p; m, n) > C_{p,m,n-p+1}(\alpha)\, \chi^2_{pm}(\alpha). $$
Tables of the $C$ factors can be found in Anderson [2], and more extensive tables are available in Pillai and Gupta [7].
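To make the Wilks criterion and its limiting chi-square approximation concrete, here is a minimal numpy/scipy sketch. It works from a hypothesis sum-of-squares matrix $H$ and an error sum-of-squares matrix $E$ (in the canonical form, $H = Z_1'Z_1$ and $E = Z_3'Z_3$, with $m = r$ hypothesis and $n - k$ error degrees of freedom), and it applies only the limiting chi-square, not the exact-level $C$ factor from Anderson's tables; the function and argument names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def wilks_test(H, E, m, n_e):
    """Wilks criterion U(p; m, n_e) and its limiting chi-square approximation.

    H   : p x p hypothesis sum-of-squares matrix (Z_1'Z_1 in canonical form)
    E   : p x p error sum-of-squares matrix (Z_3'Z_3)
    m   : hypothesis degrees of freedom (r)
    n_e : error degrees of freedom (n - k)
    """
    p = H.shape[0]
    # U = |E| / |E + H|; use log-determinants for numerical stability.
    _, logdet_E = np.linalg.slogdet(E)
    _, logdet_T = np.linalg.slogdet(E + H)
    log_U = logdet_E - logdet_T
    # Bartlett-type approximation: -[n_e - (p - m + 1)/2] log U ~ chi^2_{pm}.
    stat = -(n_e - 0.5 * (p - m + 1)) * log_U
    return np.exp(log_U), stat, chi2.sf(stat, df=p * m)
```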

Other invariant tests

The problem in its canonical form is left invariant by the group of transformations
$$ Z_1 \to \Gamma_1 Z_1 A, \qquad Z_2 \to \Gamma_2 Z_2 A + N, \qquad Z_3 \to \Gamma_3 Z_3 A, $$
where $\Gamma_1$, $\Gamma_2$, and $\Gamma_3$ are orthogonal, $A$ is non-singular, and $N$ is arbitrary. The LRT is invariant with respect to this group of transformations. Let $t = \min(p, r)$; a maximal invariant is the set of non-zero eigenvalues $f_1 \geq \cdots \geq f_t$ of the matrix $Z_1'Z_1(Z_3'Z_3)^{-1}$. In the very special case $t = 1$, the LRT is uniformly most powerful invariant (UMPI). When $p = 1$ the LRT is equivalent to the usual $F$ ratio, whereas when $r = 1$ it is equivalent to Hotelling's $T^2$ test; both of these tests are UMPI. Unfortunately, when $t > 1$ no UMPI test exists (not even a locally most powerful invariant one); see Fujikoshi [6]. Expressed in terms of the original MANOVA problem, $f_1 \geq \cdots \geq f_t$ are the $t$ largest eigenvalues of
$$ \hat{B}'C'\left[C(X'X)^{-1}C'\right]^{-1}C\hat{B}\,(n\hat{\Sigma})^{-1}, $$
and the Wilks LRT rejects for small values of
$$ \prod_{i=1}^{t}(1 + f_i)^{-1} = \frac{|n\hat{\Sigma}|}{\left|n\hat{\Sigma} + \hat{B}'C'\left[C(X'X)^{-1}C'\right]^{-1}C\hat{B}\right|}. $$
Other invariant tests offered by popular statistical software are the Lawley-Hotelling trace test
$$ \sum_{i=1}^{t} f_i = \operatorname{tr}\!\left[Z_1'Z_1(Z_3'Z_3)^{-1}\right], $$
the Bartlett-Nanda-Pillai test
$$ \sum_{i=1}^{t} f_i(1 + f_i)^{-1} = \operatorname{tr}\!\left[Z_1'Z_1(Z_1'Z_1 + Z_3'Z_3)^{-1}\right], $$
and Roy's maximum root test $f_1$. Tables of critical values for these tests are available in Anderson [2].
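All of these statistics are functions of the eigenvalues $f_i$ alone, so once $H$ and $E$ are available they can be computed together. The following numpy sketch (names illustrative, not from the article) does exactly that.

```python
import numpy as np

def invariant_statistics(H, E):
    """Classical MANOVA statistics from the eigenvalues f_i of H E^{-1}.

    H : hypothesis sum-of-squares matrix
    E : error sum-of-squares matrix (n * Sigma_hat)
    """
    # Eigenvalues of H E^{-1}; solve(E, H) gives E^{-1} H, which has the
    # same eigenvalues. Keep the real parts (imaginary parts are rounding).
    f = np.sort(np.real(np.linalg.eigvals(np.linalg.solve(E, H))))[::-1]
    f = np.clip(f, 0.0, None)              # discard tiny negative noise
    return {
        "Wilks":            np.prod(1.0 / (1.0 + f)),
        "Lawley-Hotelling": np.sum(f),
        "Pillai":           np.sum(f / (1.0 + f)),
        "Roy":              f[0],
    }
```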

Models of MANOVA

Completely randomized design with one factor

An experiment is set up to investigate the effects of a factor $A$, at levels $1, \ldots, I$, on $p$ responses. A number $n_i$ of experimental units are chosen at random and assigned to the treatment corresponding to level $i$ of the factor, so that there are in total $n = \sum_{i=1}^{I} n_i$ experimental units. The linear model used is
$$ y_{ij} = \mu + \alpha_i + e_{ij}, \quad i = 1, \ldots, I;\; j = 1, \ldots, n_i, $$
where $y_{ij} : p \times 1$ is the vector of the $p$ responses observed on experimental unit $j$ at level $i$ of the factor. The parameters represent an overall effect $\mu$ and an effect $\alpha_i$ due to factor $A$ at level $i$. The error terms $e_{ij}$ are assumed independent $N_p(0, \Sigma)$. The constraint $\sum_{i=1}^{I} n_i\alpha_i = 0$ is imposed to avoid over-parametrization. The null hypothesis of interest is that of the absence of an effect of factor $A$, $H_A : \alpha_1 = \cdots = \alpha_I = 0$. Let $\bar{y}_{..}$ be the mean over all units and $\bar{y}_{i.}$ be the mean over the units having received factor $A$ at level $i$. Wilks' test is expressed in terms of the sums of squares
$$ SST = \sum_{i=1}^{I}\sum_{j=1}^{n_i} y_{ij}y_{ij}' - n\,\bar{y}_{..}\bar{y}_{..}', \qquad SS(A) = \sum_{i=1}^{I} n_i\,\bar{y}_{i.}\bar{y}_{i.}' - n\,\bar{y}_{..}\bar{y}_{..}', \qquad SSE = SST - SS(A), $$
and rejects the null hypothesis whenever
$$ U(p; I-1, n-I) = \frac{|SSE|}{|SSE + SS(A)|} $$
is small. Simultaneous $(1-\alpha)100\%$ confidence intervals on contrasts of treatment effects, $\sum_{i=1}^{I} a'\alpha_i b_i$, where $a : p \times 1$ is arbitrary and the constants $b_i$ satisfying $\sum_{i=1}^{I} b_i = 0$ define the contrast, can be constructed using the Bonferroni method or using Roy's maximum root test; see Srivastava [10]. These simultaneous confidence intervals can be adapted to other MANOVA models such as those in the subsequent sections.
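A minimal numpy sketch of this one-way decomposition follows, assuming the data arrive as a list of $(n_i, p)$ arrays, one per level of factor $A$; the function name is illustrative. The returned $U$ could then be referred to the chi-square approximation sketched earlier.

```python
import numpy as np

def one_way_manova(groups):
    """One-way MANOVA sums of squares and the Wilks criterion.

    groups : list of (n_i, p) arrays, one per level of factor A.
    Returns SS(A), SSE, and U(p; I-1, n-I) = |SSE| / |SSE + SS(A)|.
    """
    Y = np.vstack(groups)                          # all n observations
    n = Y.shape[0]
    ybar = Y.mean(axis=0)                          # overall mean vector
    SST = Y.T @ Y - n * np.outer(ybar, ybar)
    SSA = sum(len(g) * np.outer(g.mean(axis=0), g.mean(axis=0)) for g in groups)
    SSA = SSA - n * np.outer(ybar, ybar)
    SSE = SST - SSA
    U = np.linalg.det(SSE) / np.linalg.det(SSE + SSA)
    return SSA, SSE, U
```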

Randomized complete block design with one factor

In a clinical trial, a priori information on variables associated with the responses, such as sex, age, and stage of a disease, is used to group experimental units into homogeneous blocks. In the absence of an effect of the factor, experimental units within a block should give similar responses. When the factor $A$ has $I$ levels, blocks of $I$ homogeneous experimental units are formed. Within a block, the $I$ levels of the factor are assigned at random to the $I$ experimental units using, e.g., a random permutation. The linear model with $J$ blocks becomes
$$ y_{ij} = \mu + \alpha_i + \beta_j + e_{ij}, \quad i = 1, \ldots, I;\; j = 1, \ldots, J, $$
subject to the constraints $\sum_{i=1}^{I}\alpha_i = \sum_{j=1}^{J}\beta_j = 0$. The parameter $\mu$ is an overall effect, $\alpha_i$ is the effect due to factor $A$ at level $i$, and $\beta_j$ is the effect of block $j$. The sums of squares required are
$$ SST = \sum_{i=1}^{I}\sum_{j=1}^{J} y_{ij}y_{ij}' - IJ\,\bar{y}_{..}\bar{y}_{..}', \qquad SS(A) = J\sum_{i=1}^{I} \bar{y}_{i.}\bar{y}_{i.}' - IJ\,\bar{y}_{..}\bar{y}_{..}', $$
$$ SS(\text{blocks}) = I\sum_{j=1}^{J} \bar{y}_{.j}\bar{y}_{.j}' - IJ\,\bar{y}_{..}\bar{y}_{..}', \qquad SSE = SST - SS(A) - SS(\text{blocks}). $$
Wilks' test for the hypothesis of absence of a factor $A$ effect, $H_A : \alpha_1 = \cdots = \alpha_I = 0$, rejects for small values of
$$ U(p; I-1, (I-1)(J-1)) = \frac{|SSE|}{|SSE + SS(A)|}. $$
A similar test for the effect of blocking can be conducted but is usually of no intrinsic interest. The latin square design is another specialized blocking technique; see Cox and Reid [4] for a general discussion and Srivastava [10] for the MANOVA analysis.
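The block-design decomposition can be coded in the same way. A short numpy sketch follows, assuming the responses are arranged in an array of shape $(I, J, p)$ indexed by treatment level and block; the function name is illustrative.

```python
import numpy as np

def block_manova(Y):
    """Randomized complete block MANOVA sums of squares.

    Y : array of shape (I, J, p); Y[i, j] is the response vector for
        treatment level i in block j.
    Returns SS(A), SS(blocks), SSE, and U(p; I-1, (I-1)(J-1)).
    """
    I, J, p = Y.shape
    ybar = Y.mean(axis=(0, 1))                     # overall mean vector
    ybar_i = Y.mean(axis=1)                        # (I, p) treatment means
    ybar_j = Y.mean(axis=0)                        # (J, p) block means
    correction = I * J * np.outer(ybar, ybar)
    SST = np.einsum('ijk,ijl->kl', Y, Y) - correction
    SSA = J * ybar_i.T @ ybar_i - correction
    SSB = I * ybar_j.T @ ybar_j - correction
    SSE = SST - SSA - SSB
    U = np.linalg.det(SSE) / np.linalg.det(SSE + SSA)
    return SSA, SSB, SSE, U
```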

Balanced factorial design

In agriculture, the effects of three fertilizers, say nitrogen, potassium, and potash, each applied in a low or a high quantity, on the yield of some crops is an example of a factorial experiment with two factors: fertilizer and quantity. A balanced factorial design consists of an equal number of replicates of all possible treatments, a treatment being any combination of the levels of the factors. Factorial experiments are usually preferred to two separate experiments investigating one factor at a time: they are more efficient for estimating main effects, which are the effects of a single factor averaged over all units, and interactions among factors can be assessed in a factorial experiment but not from two one-at-a-time experiments; see Cox and Reid [4]. The linear model of a two-factor balanced factorial design with interaction, replicated $r$ times, is
$$ y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}, \quad i = 1, \ldots, I;\; j = 1, \ldots, J;\; k = 1, \ldots, r, $$
subject to the constraints $\sum_{i=1}^{I}\alpha_i = \sum_{j=1}^{J}\beta_j = \sum_{i=1}^{I}(\alpha\beta)_{ij} = \sum_{j=1}^{J}(\alpha\beta)_{ij} = 0$. The parameter $\mu$ is an overall effect, $\alpha_i$ is a main effect of factor $A$, $\beta_j$ is a main effect of factor $B$, and $(\alpha\beta)_{ij}$ is an interaction effect between the two factors. The sums of squares required are
$$ SST = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{r} y_{ijk}y_{ijk}' - IJr\,\bar{y}_{...}\bar{y}_{...}', \qquad SS(A) = Jr\sum_{i=1}^{I} \bar{y}_{i..}\bar{y}_{i..}' - IJr\,\bar{y}_{...}\bar{y}_{...}', $$
$$ SS(B) = Ir\sum_{j=1}^{J} \bar{y}_{.j.}\bar{y}_{.j.}' - IJr\,\bar{y}_{...}\bar{y}_{...}', \qquad SS(\text{Treatments}) = r\sum_{i=1}^{I}\sum_{j=1}^{J} \bar{y}_{ij.}\bar{y}_{ij.}' - IJr\,\bar{y}_{...}\bar{y}_{...}', $$
$$ SS(A \times B) = SS(\text{Treatments}) - SS(A) - SS(B), \qquad SSE = SST - SS(\text{Treatments}). $$
The hypotheses of absence of $A$ main effects, $H_A : \alpha_i = 0$ for all $i$, of absence of $B$ main effects, $H_B : \beta_j = 0$ for all $j$, and of absence of interactions, $H_{A \times B} : (\alpha\beta)_{ij} = 0$ for all $i$ and $j$, can be tested using Wilks' tests given respectively by
$$ U(p; I-1, IJ(r-1)) = \frac{|SSE|}{|SSE + SS(A)|}, \qquad U(p; J-1, IJ(r-1)) = \frac{|SSE|}{|SSE + SS(B)|}, \qquad U(p; (I-1)(J-1), IJ(r-1)) = \frac{|SSE|}{|SSE + SS(A \times B)|}. $$
The computations of any MANOVA can be performed easily with the manova statement of either the SAS [9] proc anova procedure (for balanced designs) or, more generally, proc glm.
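For readers working outside SAS, a rough Python analogue (a sketch under assumed column names, not the procedure the article refers to) is the MANOVA class in statsmodels, whose mv_test method reports Wilks' lambda, Pillai's trace, the Lawley-Hotelling trace, and Roy's largest root for each model term.

```python
# Sketch of a two-factor MANOVA with interaction in Python; the file name
# and the columns y1, y2, A, B are hypothetical placeholders.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("factorial_experiment.csv")          # hypothetical data set
mod = MANOVA.from_formula("y1 + y2 ~ C(A) * C(B)", data=df)
print(mod.mv_test())   # Wilks, Pillai, Lawley-Hotelling, Roy for each term
```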

References

[1] Anderson T.W. (1958). An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, New York.
[2] Anderson T.W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed. John Wiley & Sons, New York.
[3] Bilodeau M., Brenner D. (1999). Theory of Multivariate Statistics. Springer-Verlag, New York.
[4] Cox D.R., Reid N. (2000). The Theory of the Design of Experiments. Chapman and Hall/CRC.
[5] Eaton M.L. (1983). Multivariate Statistics: A Vector Space Approach. John Wiley & Sons, New York.
[6] Fujikoshi Y. (1988). Comparison of powers of a class of tests for multivariate linear hypothesis and independence. Journal of Multivariate Analysis 26, 48-58.
[7] Pillai K.C.S., Gupta A.K. (1969). On the exact distribution of Wilks's criterion. Biometrika 56, 109-118.
[8] Roy S.N. (1957). Some Aspects of Multivariate Analysis. John Wiley & Sons, New York.
[9] SAS Institute Inc. (2000-2004). SAS 9.1.3 Help and Documentation. SAS Institute Inc., Cary, NC.
[10] Srivastava M.S. (2002). Methods of Multivariate Statistics. John Wiley & Sons, New York.