Lecture 6: Single-classification multivariate ANOVA (k-group MANOVA)


Lecture 6: Single-classification multivariate ANOVA (k-group MANOVA)
Outline: univariate ANOVA; multivariate ANOVA (MANOVA): rationale, underlying principles and procedures; MANOVA test statistics; MANOVA assumptions; planned and unplanned comparisons.

When to use ANOVA
ANOVA tests for the effect of discrete independent variables. Each independent variable is called a factor, and each factor may have two or more levels or treatments (e.g. crop yields with nitrogen (N) or with nitrogen and phosphorus (N + P) added). ANOVA tests whether all group means are the same. Use it when the number of levels (groups) is greater than two.
[Figure: frequency distributions of yield for the Control, Experimental (N), and Experimental (N + P) groups, with means µ_C, µ_N, and µ_N+P.]

Why not use multiple 2-sample tests?
For k comparisons, the probability of accepting a true H_0 for all k is (1 - α)^k. For 4 means there are 6 pairwise comparisons, so (1 - α)^k = (0.95)^6 = 0.735, and α over all comparisons = 1 - 0.735 = 0.265. So, when comparing the means of four samples from the same population, we would expect to detect a significant difference in at least one pair about 27% of the time.
[Figure: the yield distributions above, with the three pairwise comparisons µ_C : µ_N, µ_C : µ_N+P, and µ_N : µ_N+P.]
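As a quick check of the arithmetic above, a minimal sketch in plain Python (standard library only) that computes the experiment-wise error rate for a given per-comparison α and number of groups:

```python
from math import comb

def experimentwise_alpha(alpha: float, n_groups: int) -> float:
    """Probability of at least one false rejection across all pairwise
    comparisons, assuming independent tests each run at level alpha."""
    k = comb(n_groups, 2)              # number of pairwise comparisons
    return 1.0 - (1.0 - alpha) ** k

# Four groups at per-comparison alpha = 0.05 -> 6 comparisons, about 0.265
print(experimentwise_alpha(0.05, 4))
```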

What ANOVA does/doesn't do
ANOVA tells us whether all group means are equal (at a specified α level)... but if we reject H_0, it does not tell us which pairs of means differ from one another.
[Figure: two sets of yield distributions for the Control, Experimental (N), and Experimental (N + P) groups, illustrating different patterns of differences among µ_C, µ_N, and µ_N+P.]

Model I ANOVA: effects of temperature on trout growth
Three treatments determined (set) by the investigator. The dependent variable is growth rate (λ); the factor (X) is temperature. Since X is controlled, we can estimate the effect of a unit increase in X (temperature) on λ (the effect size)... and can predict λ at other temperatures.
[Figure: growth rate λ (cm/day) plotted against water temperature (°C) for the three treatments.]

Model II ANOVA: geographical variation in body size of black bears
Three locations (groups) sampled from a set of possible locations. The dependent variable is body size; the factor (X) is location. Even if locations differ, we have no idea what factors control this variability... so we cannot predict body size at other locations.
[Figure: body size (kg) of bears at Riding Mountain, Kluane, and Algonquin.]

Model differences
In Model I, the putative causal factor(s) can be manipulated by the experimenter, whereas in Model II they cannot. In Model I, we can estimate the magnitude of treatment effects and make predictions, whereas in Model II we can do neither. In one-way (single-classification) ANOVA the calculations are identical for both models, but this is NOT so for multiple-classification ANOVA!

How is it done? And why call it ANOVA?
In ANOVA, the total variance in the dependent variable is partitioned into two components:
among-groups: the variance of the means of the different groups (treatments);
within-groups (error): the variance of individual observations within groups around the group mean.

The general ANOVA model
The general model is Y_ij = µ + α_i + ε_ij. ANOVA algorithms fit this model (by least squares) to estimate the α_i. H_0: all α_i = 0, i.e. µ_1 = µ_2 = µ_3 = µ and α_1 = α_2 = α_3 = 0.
[Figure: observations in Groups 1-3 around their group means, showing the decomposition of an observation Y into the grand mean µ, the group effect α_i, and the residual ε_ij.]

Partitioning the total sums of squares
[Figure: observations in Groups 1-3 around the grand mean µ and the group means µ_1, µ_2, µ_3, showing the partition of the Total SS into the Model (Groups) SS and the Error SS.]

The ANOVA table
Source of variation | Sum of squares                             | df    | Mean square | F
Total               | Σ_{i=1}^{k} Σ_{j=1}^{n_i} (Y_ij - Ȳ)²      | n - 1 | SS/df       |
Groups              | Σ_{i=1}^{k} n_i (Ȳ_i - Ȳ)²                 | k - 1 | SS/df       | MS_groups / MS_error
Error               | Σ_{i=1}^{k} Σ_{j=1}^{n_i} (Y_ij - Ȳ_i)²    | n - k | SS/df       |

Use of single-classification MANOVA
The data set consists of k groups ("treatments"), with n_i observations per group and p variables per observation. Question: do the groups differ with respect to their multivariate means? In single-classification MANOVA, we assume that a single factor varies among groups, i.e., that all other factors which may possibly affect the variables in question are randomized among groups.
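To make the partition in the ANOVA table concrete, a minimal sketch assuming NumPy is available; the group data are illustrative, not from the lecture:

```python
import numpy as np

def one_way_anova(groups):
    """groups: list of 1-D arrays, one per treatment.
    Returns SS_groups, SS_error, their df, and the F statistic."""
    all_y = np.concatenate(groups)
    grand_mean = all_y.mean()
    n, k = all_y.size, len(groups)

    ss_groups = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

    ms_groups = ss_groups / (k - 1)
    ms_error = ss_error / (n - k)
    return ss_groups, ss_error, k - 1, n - k, ms_groups / ms_error

# Illustrative data: three treatments of 10 observations each
rng = np.random.default_rng(0)
data = [rng.normal(loc, 1.0, size=10) for loc in (5.0, 5.5, 7.0)]
print(one_way_anova(data))
```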

Examples
Good(ish): 4 different concentrations of some suspected contaminant; young fish randomly assigned to each treatment; at a given age, a number of measurements taken on each surviving fish.
Bad(ish): young fish reared in 4 different treatments, each treatment consisting of water samples taken at different stages of treatment in a water-treatment plant.

Multivariate variance: a geometric interpretation
Univariate variance is a measure of the volume occupied by sample points in one dimension. Multivariate variance involving m variables is the volume occupied by sample points in an m-dimensional space.
[Figure: scatter plots of X_2 against X_1 showing a larger and a smaller occupied volume, i.e. larger and smaller multivariate variance.]

Multivariate variance: effects of correlations among variables
Correlations between pairs of variables reduce the volume occupied by sample points and hence reduce the multivariate variance.
[Figure: scatter plots of X_2 against X_1 with no correlation, positive correlation, and negative correlation; the correlated cases occupy a smaller volume.]

|C| and the generalized multivariate variance
The determinant of the sample covariance matrix C is a generalized multivariate variance. For two variables, the correlation r fixes the angle θ between the sides of a parallelogram whose side lengths are the two standard deviations, with r = cos θ; on the slide, r = 0.5, so θ = 60° and the height is h = (side) × sin 60° = (side) × √3/2. The area of this parallelogram, Area = Base × Height = s_1 s_2 sin θ, satisfies Area² = s_1² s_2² (1 - r²) = |C|: the squared area occupied by the variables equals the determinant of C.
[Figure: the parallelogram with sides equal to the two standard deviations and angle θ = 60°, whose squared area equals |C|.]
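A quick numerical check of the Area² = |C| relation; a minimal sketch assuming NumPy, where the covariance matrix below is illustrative (chosen so that r = 0.5, matching the slide's correlation) and not the one in the lecture:

```python
import numpy as np

# Illustrative covariance matrix with correlation r = 0.5; the specific
# entries are an assumption, not taken from the lecture slide.
C = np.array([[4.0, 2.0],
              [2.0, 4.0]])

s1, s2 = np.sqrt(np.diag(C))          # standard deviations (parallelogram sides)
r = C[0, 1] / (s1 * s2)               # correlation = cos(theta)
theta = np.arccos(r)                  # angle between the sides (60 degrees here)

area = s1 * s2 * np.sin(theta)        # parallelogram area
print(area ** 2, np.linalg.det(C))    # both equal |C| (here 12)
```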

SSCP matrices: within, between, and total The total (T) SSCP matrix (based on p variables X, X,, X p ) in a sample of objects belonging to m groups G, G,, G m with sizes n, n,, n m can be partitioned into withingroups (W) and betweengroups (B) SSCP matrices: T = B+ W x ijk x jk x k n j t = ( x x )( x x ) t m Value of variable X k for ith observation in group j Mean of variable X k for group j Overall mean of variable X k rc, w Element in row r and rc column c of total (T, t) and within (W, w) SSCP rc ijr r ijc c j= i= m n j rc = ijr jr ijc jc j= i= w ( x x )( x x ) L6.9 The distribution of Λ Unlike F, Λ has a very complicated distribution but, given certain assumptions it can be approximated b as Bartlett s χ (for moderate to large samples) or Rao s F (for small samples) χ = [( N ) 0.5( p+ k)]ln Λ df = p( k ) F / s = Λ ms p( k )/+ Λ / s pk ( ) m= N ( p+ k)/ p ( k ) 4 s = p + ( k ) 5 df= pk ( ), ms pk ( )/+ L6.0 Assumptions All observations are independent (residuals are uncorrelated) Within each sample (group), variables (residuals) are multivariate normally distributed Each sample (group) has the same covariance matrix (compound symmetry) L6.

Effect of violation of assumptions Assumption Effect on α Effect on power Independence of observations Normality Equality of covariance matrices Very large, actual α much larger than nominal α Small to negligible Small to negligible if group Ns similar, if Ns very unequal, actual α larger than nominal α Large, power much reduced Reduced power for platykurtotic distributions, skewness has little effect Power reduced, reduction greater for unequal Ns. L6. Checking assumptions in MANOVA Independence (intraclass correlation, ACF) No Use group means as unit of analysis Assess MV normality Yes N i > 0 Check group sizes N i < 0 MVN graph test Check Univariate normality L6.3 Checking assumptions in MANOVA (cont d) MV normal? Most variables normal? No Transform offending variables Yes Yes Check homogeneity of covariance matrices No Group sizes more or less equal (R <.5)? Yes No Yes END Yes Groups reasonably large (> 5)? Transform variables, or adjust α L6.4

Then what? Question Procedure What variables are responsible for detected differences among groups? Do certain groups (determined beforehand) differ from one another? Which pairs of groups differ from one another (groups not specified beforehand)? Check univariate F tests as a guide; use another multivariate procedure (e.g. discriminant function analysis) Planned multiple comparisons Unplanned multiple comparisons L6.5 What are multiple comparisons? Pair-wise comparisons of different treatments These comparisons may involve group means, medians, variances, etc. for means, done after ANOVA In all cases, H 0 is that the groups in question do not differ. Frequency µ C : µ N+P µ c :µ N µ N :µ N+P µ C µ N µ N+P Control Yield Experimental (N) Experimental (N+P) L6.6 Types of comparisons Y planned (a priori): independent of ANOVA results; theory predicts Planned which treatments should be different. X X X 3 X 4 X 5 unplanned (a posteriori): unplanned depend on ANOVA results; unclear which Y treatments should be different. Test of significance are very different between the X X X 3 X 4 X 5 two! L6.7

Planned comparisons (a( a priori contrasts): catecholamine levels in stressed fish Comparisons of interest are 0.7 determined by experimenter 0.6 beforehand based on theory 0.5 and do not depend on 0.4 ANOVA results. 0.3 Prediction from theory: 0. catecholamine levels 0. increase above basal levels 0.0 only after threshold PA O = 30 0 0 30 40 torr is reached. PA O (torr) So, compare only treatments 50 above and below 30 torr (N T = Predicted threshold ). L6.8 [Catecholamine] Unplanned comparisons (a( a posteriori contrasts): catecholamine levels in stressed fish Comparisons are determined by ANOVA results. Prediction from theory: catecholamine levels increase with increasing PA O. So, comparisons between any pairs of treatments may be warranted (N T = ). [Catecholamine] 0.7 0.6 0.5 0.4 0.3 0. 0. 0.0 0 0 30 40 50 PA O (torr) Predicted relationship L6.9 The problem: controlling experiment-wise α error For k comparisons, the probability of accepting H 0 (no difference) is ( - α) k. For 4 treatments, ( - α) k = (0.95) 6 =.735, so experiment-wise α (α e ) = 0.65. Thus we would expect to 0.0 reject H 0 for at least one 0 4 6 8 0 paired comparison about Number of treatments 7% of the time, even if all four treatments are Nominal α =.05 identical. L6.30 Experiment-wise α (α e ).0 0.8 0.6 0.4 0.

Unplanned comparisons: Hotelling T and univariate F tests Follow rejection of null Then use univariate t- in original MANOVA by tests to determine all pairwise multivariate which variables are tests using Hotelling T contributing to the to determine which detected pairwise groups are different differences but test at modified α opinion is divided as to maintain overall to whether these nominal type I error should be done at a rate (e.g. Bonferroni modified α. correction) L6.3 How many different variables for a MANOVA? In general, try to use a Measurement error is small number of variables multiplicative among because: variables: the larger the In MANOVA, power number of variables, generally declines with the larger the increasing number of measurement noise variables. Interpretation is easier If a number of variables with a smaller number are included that do not of variables differ among groups, this will obscure differences on a few variables L6.3 How many different variables for a MANOVA : recommendation Choose variables carefully, attempting to keep them to a minimum Try to reduce the number of variables by using multivariate procedures (e.g. PCA) to generate composite, uncorrelated variables which can then be used as input. Use multivariate procedures (such as discriminant function analysis) to optimize set of variables. L6.33