Lecture 5: Hypothesis tests for more than one sample


1/23 Lecture 5: Hypothesis tests for more than one sample
Måns Thulin
Department of Mathematics, Uppsala University
thulin@math.uu.se
Multivariate Methods, 8/4 2011

2/23 Outline
- Paired comparisons
- Repeated measures
- Comparing mean vectors from two populations
- Comparing mean vectors from more than two populations: MANOVA

3/23 Repetition: Testing $H_0: \mu = \mu_0$

Let $X \sim N_p(\mu, \Sigma)$. When testing the hypothesis $H_0: \mu = \mu_0$, we use Hotelling's $T^2$:

$$T^2 = n(\bar{X} - \mu_0)' S^{-1} (\bar{X} - \mu_0).$$

Under $H_0$,

$$\frac{n-p}{(n-1)p}\, T^2 \sim F_{p,\,n-p}.$$

The $T^2$ test therefore rejects $H_0: \mu = \mu_0$ at level $\alpha$ if

$$T^2 > \frac{(n-1)p}{n-p}\, F_{p,\,n-p}(\alpha).$$

Similarly, the p-value of the test is obtained as $p = P(T^2 > x)$, where $x$ is the observed value of the statistic.
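As a sketch of how this one-sample test can be computed in practice (the lecture does not prescribe any software; NumPy/SciPy and the made-up data below are assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Made-up data: n = 20 observations of a p = 3 dimensional vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3)) + np.array([0.5, 0.0, -0.5])
mu0 = np.zeros(3)  # hypothesised mean vector

n, p = X.shape
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)  # sample covariance (divisor n - 1)

# T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
diff = xbar - mu0
T2 = n * diff @ np.linalg.solve(S, diff)

# Under H0, (n - p) / ((n - 1) p) * T^2 ~ F_{p, n-p}
F_stat = (n - p) / ((n - 1) * p) * T2
p_value = stats.f.sf(F_stat, p, n - p)
print(f"T^2 = {T2:.3f}, p-value = {p_value:.4f}")
```

Using `np.linalg.solve` instead of explicitly inverting $S$ is numerically preferable and gives the same quadratic form.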

4/23 Paired comparisons
If we wish to study the effect of a treatment, it is often desirable to measure the response variables of interest on a single unit before and after the treatment is applied to that unit. This procedure eliminates unit-to-unit variation. Examples: the pH of lakes before and after chalk is added, the health of patients before and after medication, people's view of Uppsala University before and after a nationwide advertising campaign... Similarly, if we wish to compare two treatments, we can apply both treatments to the same (or identical) experimental units. Such experimental designs are called paired comparisons, since the measurements are made in pairs.

5/23 Paired comparisons: Hotelling's $T^2$
Let $X_{j1}$ denote the response to treatment 1 and $X_{j2}$ the response to treatment 2 for experimental unit $j$. If $X_{j1}$ and $X_{j2}$ are multivariate normal, then

$$D_j = X_{j1} - X_{j2} \sim N_p(\delta, \Sigma_d),$$

where $\delta$ is the mean difference between the treatments. If the treatments are applied independently to $n$ independent units, so that $D_1, \ldots, D_n$ are independent $N_p(\delta, \Sigma_d)$ random vectors, then

$$T^2 = n(\bar{D} - \delta)' S_d^{-1} (\bar{D} - \delta) \sim \frac{p(n-1)}{n-p}\, F_{p,\,n-p}.$$

This is simply the result about Hotelling's $T^2$ from the last lecture. The problem of comparing the two samples $X_{11}, \ldots, X_{1n_1}$ and $X_{21}, \ldots, X_{2n_2}$ is simplified to the familiar one-sample problem by looking at the pairwise differences.

6/23 Paired comparisons: testing
The hypothesis $H_0: \delta = 0$ is rejected in favour of the alternative $H_1: \delta \neq 0$ if

$$T^2 = n \bar{d}' S_d^{-1} \bar{d} > \frac{p(n-1)}{n-p}\, F_{p,\,n-p}(\alpha),$$

where $d_j = (d_{j1}, d_{j2}, \ldots, d_{jp})'$, $j = 1, \ldots, n$, are the observed differences between the $n$ units. A confidence region for $\delta$ with confidence level $1 - \alpha$ consists of all $\delta$ such that

$$(\bar{d} - \delta)' S_d^{-1} (\bar{d} - \delta) \leq \frac{p(n-1)}{n(n-p)}\, F_{p,\,n-p}(\alpha).$$

The simultaneous Bonferroni confidence intervals for the individual mean differences $\delta_i$ are given by

$$I_{\delta_i} = \bar{d}_i \pm t_{n-1}\!\left(\frac{\alpha}{2p}\right) \sqrt{s_{d_i}^2 / n}.$$
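The test and the Bonferroni intervals above can be sketched as follows (NumPy/SciPy and the hypothetical before/after data are assumptions, not part of the lecture):

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements on n units, p = 2 response variables.
rng = np.random.default_rng(1)
n, p = 15, 2
before = rng.normal(size=(n, p))
after = before + rng.normal(0.3, 0.5, size=(n, p))  # treatment shifts the mean

D = before - after                 # pairwise differences d_1, ..., d_n
dbar = D.mean(axis=0)
Sd = np.cov(D, rowvar=False)

# Test of H0: delta = 0 at level alpha = 0.05
alpha = 0.05
T2 = n * dbar @ np.linalg.solve(Sd, dbar)
crit = p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
reject = T2 > crit

# Bonferroni intervals: dbar_i +/- t_{n-1}(alpha / (2p)) * sqrt(s^2_{d_i} / n)
tq = stats.t.ppf(1 - alpha / (2 * p), n - 1)
half = tq * np.sqrt(np.diag(Sd) / n)
intervals = np.column_stack([dbar - half, dbar + half])
print(reject, intervals)
```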

7/23 Paired comparisons: contrasts If we use a little matrix algebra, it is not necessary to calculate all the differences. Instead, we can use contrast matrices. See blackboard! On the other hand, it may be advisable to calculate the differences d 1,..., d n in order to assess their normality. The notion of contrast matrices can also be used for repeated measures designs.

8/23 Repeated measures
A situation similar to the one we just studied arises when we wish to compare the effects of $q$ different treatments on a single response variable. Let $X_1, \ldots, X_n$ be i.i.d. $N_q(\mu, \Sigma)$ observations, with $X_j = (X_{j1}, X_{j2}, \ldots, X_{jq})'$, where $X_{ji}$ is the response to the $i$th treatment on the $j$th experimental unit. Typically, we wish to test the hypothesis that there is no difference between the treatment means. This is stated using contrast matrices. See blackboard!

9/23 Repeated measures: testing
When the treatment means are equal, $C_1\mu = C_2\mu = 0$. In fact, $C\mu = 0$ for any contrast matrix $C$. Given $C$, we can compute the observed contrasts $Cx_j$, with mean $C\bar{x}$ and sample covariance $CSC'$. The hypothesis $C\mu = 0$ is tested using

$$T^2 = n(C\bar{x})'(CSC')^{-1}(C\bar{x}) \sim \frac{(q-1)(n-1)}{n-q+1}\, F_{q-1,\,n-q+1}$$

under $H_0$. The statistic $T^2$ is independent of the choice of contrast matrix $C$.
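A minimal sketch of this contrast-based test, with a particular choice of successive-differences contrast matrix and made-up data (both assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Made-up data: q = 3 treatment responses for each of n = 12 units.
rng = np.random.default_rng(2)
n, q = 12, 3
X = rng.normal(size=(n, q))

# A (q-1) x q contrast matrix: each row sums to zero, full row rank.
C = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

Cx = C @ xbar
CSC = C @ S @ C.T
T2 = n * Cx @ np.linalg.solve(CSC, Cx)

# Under H0: C mu = 0, (n - q + 1) / ((q - 1)(n - 1)) * T^2 ~ F_{q-1, n-q+1}
F_stat = (n - q + 1) / ((q - 1) * (n - 1)) * T2
p_value = stats.f.sf(F_stat, q - 1, n - q + 1)
print(f"T^2 = {T2:.3f}, p-value = {p_value:.4f}")
```

Replacing `C` by any other valid contrast matrix (e.g. comparing each treatment to the first) leaves $T^2$ unchanged, which is easy to verify numerically.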

10/23 Comparing mean vectors from two populations
Often we wish to compare the mean vectors of two populations in situations where it isn't possible to use paired comparisons. Assume that we have a $p$-variate sample $X_{11}, X_{12}, \ldots, X_{1n_1}$ from a distribution with mean $\mu_1$ and covariance $\Sigma_1$, and a $p$-variate sample $X_{21}, X_{22}, \ldots, X_{2n_2}$ from a distribution with mean $\mu_2$ and covariance $\Sigma_2$. Furthermore, assume that the two samples are independent. We wish to test the hypothesis that $\mu_1 - \mu_2 = \delta_0$.

11/23 Two populations: Hotelling's $T^2$
In order to construct a test statistic for this hypothesis, we think about how Hotelling's $T^2$ is constructed in the one-sample case. See blackboard!

Result 6.2. If $X_{11}, X_{12}, \ldots, X_{1n_1}$ are i.i.d. $N_p(\mu_1, \Sigma)$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$ are i.i.d. $N_p(\mu_2, \Sigma)$, then

$$T^2 = \left(\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)\right)' \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_p\right]^{-1} \left(\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)\right) \sim \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}\, F_{p,\,n_1+n_2-p-1}.$$

The assumption that the covariance matrices are equal is quite strong! There are $p$ variances and $p(p-1)/2$ distinct covariances in the covariance matrix. On the other hand, the real null hypothesis may be that the distributions, and not just the mean vectors, are equal for the two treatments.
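A sketch of the pooled two-sample $T^2$ test from Result 6.2, on made-up data (the library choice and data are assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Two independent made-up samples with a common covariance matrix.
rng = np.random.default_rng(3)
p = 2
X1 = rng.normal(size=(25, p))
X2 = rng.normal(size=(30, p)) + np.array([0.8, 0.0])
n1, n2 = X1.shape[0], X2.shape[0]

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = np.cov(X1, rowvar=False)
S2 = np.cov(X2, rowvar=False)
Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)  # pooled covariance

# Test of H0: mu_1 - mu_2 = 0
diff = xbar1 - xbar2
T2 = diff @ np.linalg.solve((1 / n1 + 1 / n2) * Sp, diff)

# (n1 + n2 - p - 1) / ((n1 + n2 - 2) p) * T^2 ~ F_{p, n1+n2-p-1} under H0
F_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
p_value = stats.f.sf(F_stat, p, n1 + n2 - p - 1)
print(f"T^2 = {T2:.3f}, p-value = {p_value:.4f}")
```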

12/23 Two populations: the Behrens-Fisher problem
The problem of making inferences about the means of two (univariate) normal populations without assuming that the variances are equal is called the Behrens-Fisher problem. Different approaches to this problem have been proposed by Fisher, Behrens, Chapman, Dudewicz and Ahmed, among others. The most commonly used solution was given by Welch, who proposed a t-test using $s_d^2 = s_1^2/n_1 + s_2^2/n_2$. His statistic is approximately t-distributed, with a complicated expression for the degrees of freedom.

Further reading: Kim, S.-H., Cohen, A.S. (1998). On the Behrens-Fisher problem: a review. Journal of Educational and Behavioral Statistics, 23, pp. 356-377.

13/23 Two populations: the Behrens-Fisher problem
When comparing the mean vectors of two multivariate normal populations with unequal covariance matrices, the problem becomes even more complicated. Some possible solutions are:

- Use the fact that
$$T^2 = \left(\bar{X}_1 - \bar{X}_2 - \delta_0\right)' \left[\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2\right]^{-1} \left(\bar{X}_1 - \bar{X}_2 - \delta_0\right) \sim \chi^2_p$$
approximately under $H_0$ when $n_1 - p$ and $n_2 - p$ are large, even if the data is non-normal.
- Use the fact that, for normal data, $T^2$ above is approximately distributed as
$$\frac{\nu p}{\nu - p + 1}\, F_{p,\,\nu - p + 1},$$
where $\nu$ is given by the complicated expression (6-29) in J&W.
- Use a different, more robust, test! (e.g. Tiku and Singh (1982))
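The first (large-sample chi-square) solution can be sketched as follows; note that, unlike Result 6.2, the covariance matrices are not pooled (NumPy/SciPy and the made-up data are assumptions):

```python
import numpy as np
from scipy import stats

# Made-up samples with clearly different covariance structures.
rng = np.random.default_rng(4)
p = 3
X1 = rng.normal(size=(60, p))
X2 = 2.0 * rng.normal(size=(80, p))
n1, n2 = X1.shape[0], X2.shape[0]
delta0 = np.zeros(p)  # hypothesised mean difference

diff = X1.mean(axis=0) - X2.mean(axis=0) - delta0
# Each sample contributes its own covariance estimate: S1/n1 + S2/n2.
V = np.cov(X1, rowvar=False) / n1 + np.cov(X2, rowvar=False) / n2

T2 = diff @ np.linalg.solve(V, diff)
p_value = stats.chi2.sf(T2, p)  # T^2 ~ chi^2_p approximately under H0
print(f"T^2 = {T2:.3f}, p-value = {p_value:.4f}")
```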

14/23 MANOVA: Multivariate ANalysis Of VAriance
Now let's assume that we have observations from $g$ populations:

Population 1: $X_{11}, X_{12}, \ldots, X_{1n_1}$
Population 2: $X_{21}, X_{22}, \ldots, X_{2n_2}$
...
Population $g$: $X_{g1}, X_{g2}, \ldots, X_{gn_g}$

and that we wish to test the hypothesis that all populations have the same mean. If there are differences, we'd like to be able to say which means differ.

15/23 MANOVA: Assumptions
For MANOVA, we make the following assumptions:
- $X_{l1}, X_{l2}, \ldots, X_{ln_l}$ are i.i.d. with mean $\mu_l$, $l = 1, 2, \ldots, g$.
- The samples from different populations are independent.
- All populations have the same covariance matrix $\Sigma$.
- The populations are multivariate normal.

If the sample sizes are large, MANOVA can be used as an approximate method due to the multivariate central limit theorem.

16/23 MANOVA: Model
Linear model:

$$X_{lj} = \mu + \tau_l + e_{lj}, \quad j = 1, 2, \ldots, n_l, \quad l = 1, 2, \ldots, g,$$

where the $e_{lj}$ are independent $N_p(0, \Sigma)$ variables. Here the parameter vector $\mu$ is an overall mean and $\tau_l$ represents the $l$th treatment effect, with

$$\sum_{l=1}^{g} n_l \tau_l = 0.$$

We wish to test

$$H_0: \tau_1 = \tau_2 = \ldots = \tau_g$$

against the hypothesis that at least two effects differ.

17/23 MANOVA: Sums of squares and cross products
In analogy with univariate ANOVA, the total sum of squares (and cross products) is partitioned into different sources of variation:

$$\sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x})(x_{lj} - \bar{x})' = \sum_{l=1}^{g} n_l (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})' + \sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)' = B + W,$$

where $B$ is the treatment (Between) sum of squares and cross products and $W$ is the residual (Within) sum of squares and cross products. $B$ and $W$ are $p \times p$ matrices. The latter can be rewritten as

$$W = (n_1 - 1)S_1 + (n_2 - 1)S_2 + \ldots + (n_g - 1)S_g.$$
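The decomposition above can be computed and checked numerically; this sketch uses made-up data for $g = 3$ groups (the data and library choice are assumptions):

```python
import numpy as np

# Made-up data: three groups with different mean levels, p = 2 variables.
rng = np.random.default_rng(5)
p = 2
groups = [rng.normal(loc=m, size=(n, p))
          for m, n in [(0.0, 20), (0.5, 25), (1.0, 15)]]

grand_mean = np.vstack(groups).mean(axis=0)

B = np.zeros((p, p))
W = np.zeros((p, p))
for X in groups:
    n_l = X.shape[0]
    xbar_l = X.mean(axis=0)
    d = (xbar_l - grand_mean).reshape(-1, 1)
    B += n_l * d @ d.T                        # treatment (Between) SSCP
    W += (n_l - 1) * np.cov(X, rowvar=False)  # residual (Within) SSCP

# Sanity check: total SSCP equals B + W.
Xall = np.vstack(groups)
T = (Xall - grand_mean).T @ (Xall - grand_mean)
print(np.allclose(T, B + W))
```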

18/23 MANOVA: Test statistic
In univariate ANOVA, $H_0: \tau_1 = \tau_2 = \ldots = \tau_g$ is tested by studying a suitable rescaling of $SS_{Tr}/SS_{Res}$. This is equivalent to studying $1 + SS_{Tr}/SS_{Res} = (SS_{Res} + SS_{Tr})/SS_{Res}$, which in turn is equivalent to studying $SS_{Res}/(SS_{Tr} + SS_{Res})$. We would like to construct a similar statistic for MANOVA, but ratios of matrices are not defined. Wilks suggested using the statistic

$$\Lambda = \frac{\det W}{\det(B + W)},$$

known as Wilks' lambda.

19/23 MANOVA: Distribution of Wilks' $\Lambda$
What can be said about the distribution of $\Lambda = \det W / \det(B + W)$? Let $N = \sum_{l=1}^{g} n_l$. Then we have the following exact results:

$p = 1$, $g \geq 2$: $\quad \dfrac{N - g}{g - 1} \cdot \dfrac{1 - \Lambda}{\Lambda} \sim F_{g-1,\,N-g}$

$p = 2$, $g \geq 2$: $\quad \dfrac{N - g - 1}{g - 1} \cdot \dfrac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}} \sim F_{2(g-1),\,2(N-g-1)}$

$p \geq 1$, $g = 2$: $\quad \dfrac{N - p - 1}{p} \cdot \dfrac{1 - \Lambda}{\Lambda} \sim F_{p,\,N-p-1}$

$p \geq 1$, $g = 3$: $\quad \dfrac{N - p - 2}{p} \cdot \dfrac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}} \sim F_{2p,\,2(N-p-2)}$

Approximate result (for $N$ large):

$$-\left(N - 1 - \frac{p + g}{2}\right) \ln \Lambda \sim \chi^2_{p(g-1)}.$$
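Wilks' $\Lambda$ and the large-sample (Bartlett) chi-square approximation can be sketched as follows, building $B$ and $W$ from made-up data (data and library choice are assumptions):

```python
import numpy as np
from scipy import stats

# Made-up data: g = 3 groups, p = 2 response variables.
rng = np.random.default_rng(6)
p, g = 2, 3
groups = [rng.normal(loc=m, size=(n, p))
          for m, n in [(0.0, 20), (0.4, 25), (0.8, 15)]]
N = sum(X.shape[0] for X in groups)

grand_mean = np.vstack(groups).mean(axis=0)
B = sum(X.shape[0] * np.outer(X.mean(0) - grand_mean, X.mean(0) - grand_mean)
        for X in groups)
W = sum((X.shape[0] - 1) * np.cov(X, rowvar=False) for X in groups)

# Wilks' lambda: det(W) / det(B + W), always in (0, 1].
Lambda = np.linalg.det(W) / np.linalg.det(B + W)

# Bartlett: -(N - 1 - (p + g)/2) ln(Lambda) ~ chi^2_{p(g-1)} under H0
chi2_stat = -(N - 1 - (p + g) / 2) * np.log(Lambda)
p_value = stats.chi2.sf(chi2_stat, p * (g - 1))
print(f"Lambda = {Lambda:.4f}, p-value = {p_value:.4f}")
```

For these dimensions ($p = 2$, $g \geq 2$) the exact F-transformation in the second row of the table above could be used instead of the approximation.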

20/23 MANOVA: Other test statistics
Three other test statistics are also common for MANOVA:

- Lawley-Hotelling trace: $\mathrm{tr}(BW^{-1})$
- Pillai trace: $\mathrm{tr}(B(B+W)^{-1})$
- Roy's largest root: the maximum eigenvalue of $W(B+W)^{-1}$

For $g = 2$, all four statistics reduce to Hotelling's $T^2$. For large samples, all four are nearly equivalent.

21/23 MANOVA: Confidence intervals
Simultaneous confidence intervals for the mean differences are obtained using the Bonferroni approach. Let $N = \sum_{l=1}^{g} n_l$. Then

$$\bar{x}_{ki} - \bar{x}_{li} \pm t_{N-g}\!\left(\frac{\alpha}{pg(g-1)}\right) \sqrt{\frac{w_{ii}}{N - g}\left(\frac{1}{n_k} + \frac{1}{n_l}\right)},$$

where $w_{ii}$ is the $i$th diagonal element of $W$, is a confidence interval for $\tau_{ki} - \tau_{li}$ with confidence level at least $1 - \alpha$.

22/23 Equality of covariance matrices
As previously mentioned, the assumption of equal covariance matrices is quite strong, as there are $p(p+1)/2$ distinct elements in the covariance matrix. There are a few methods to investigate the assumption of equality:

- Visual investigation of the matrices.
- Box's M test. Discussed in J&W. Good theoretical properties, but not as good in practice: some authors call this test super-sensitive and say that it isn't usable for $\alpha > 0.01$.
- Bartlett's test or Levene's test for equal variances, applied to the marginals.
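A sketch of Box's M test with its standard chi-square approximation; the scale factor below follows the usual textbook formula, and the data are made up (both, and the library choice, are assumptions rather than part of the lecture):

```python
import numpy as np
from scipy import stats

# Made-up data: g = 3 groups, p = 2 variables, equal covariances under H0.
rng = np.random.default_rng(7)
p, g = 2, 3
groups = [rng.normal(size=(n, p)) for n in (20, 25, 30)]
ns = np.array([X.shape[0] for X in groups])
N = ns.sum()

Ss = [np.cov(X, rowvar=False) for X in groups]
Sp = sum((n - 1) * S for n, S in zip(ns, Ss)) / (N - g)  # pooled covariance

# M = (N - g) ln|S_pooled| - sum_l (n_l - 1) ln|S_l|  (always >= 0)
M = (N - g) * np.log(np.linalg.det(Sp)) - sum(
    (n - 1) * np.log(np.linalg.det(S)) for n, S in zip(ns, Ss))

# Box's correction factor and the chi-square approximation.
c = (sum(1 / (n - 1) for n in ns) - 1 / (N - g)) * \
    (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
df = p * (p + 1) * (g - 1) / 2
p_value = stats.chi2.sf((1 - c) * M, df)
print(f"M = {M:.3f}, p-value = {p_value:.4f}")
```

As the slide notes, the test is very sensitive to non-normality, so a non-significant result here should not be over-interpreted.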

23/23 Summary
- Paired comparisons
- Repeated measures
- Comparing mean vectors from two populations
- Comparing mean vectors from more than two populations: MANOVA
  - Different statistics to choose from
  - Equality of covariance matrices