Comparisons of Two Means
Edps/Soc 584 and Psych 594: Applied Multivariate Statistics
Carolyn J. Anderson
Department of Educational Psychology, University of Illinois at Urbana-Champaign
© Board of Trustees, University of Illinois

Outline

- Matched pairs (i.e., dependent samples) on $p$ variables: $H_o: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \boldsymbol{\delta} = \mathbf{0}$
- Repeated measures designs, 1 variable measured multiple times: $H_o: \mathbf{L}\boldsymbol{\mu} = \mathbf{0}$
- Two independent samples: four cases of $H_o: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$
- Summary
- Missing data: later in the semester

Reading: Johnson & Wichern.

Matched pairs (dependent samples)

Paired observations arise in a number of different ways:
- Every subject (case) responds twice (e.g., pre/post test).
- Cases may be matched (on relevant variables) and then randomly assigned to one of two treatments.
- Naturally occurring pairs: husbands/wives, siblings, etc.

The plan: review the univariate case and then generalize to the multivariate situation.

For $j = 1,\ldots,n$ (the number of pairs), let
- $X_{j1}$ = measurement (response) of the $j$th case given treatment 1,
- $X_{j2}$ = measurement (response) of the $j$th case given treatment 2.

We want to examine the differences $D_j = X_{j1} - X_{j2}$.

Univariate Case

$D_j = X_{j1} - X_{j2}$

If $D_j \sim N(\delta, \sigma_D^2)$, then the statistic
\[ t = \frac{\bar{D} - \delta}{s_D/\sqrt{n}} \]
follows Student's $t$ distribution with $n-1$ degrees of freedom, where
\[ \bar{D} = \frac{1}{n}\sum_{j=1}^{n} D_j = \frac{1}{n}\sum_{j=1}^{n} (X_{j1} - X_{j2}), \qquad s_D^2 = \frac{1}{n-1}\sum_{j=1}^{n} (D_j - \bar{D})^2. \]

Test $H_o: \delta = 0$ versus $H_A: \delta \neq 0$ (or $H_o: \delta = \delta_o$ versus $H_A: \delta \neq \delta_o$).

A $100(1-\alpha)\%$ confidence interval (estimate) of $\delta$:
\[ \bar{D} \pm t_{n-1}(\alpha/2)\,\frac{s_D}{\sqrt{n}}. \]
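The paired $t$ statistic above can be computed in a few lines. The following is a minimal Python sketch, not part of the original slides; the function name `paired_t` and the pre/post scores are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math
import statistics


def paired_t(x1, x2, delta0=0.0):
    """Return (mean difference, sd of differences, t statistic) for paired data."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]   # D_j = X_j1 - X_j2
    n = len(d)
    d_bar = statistics.mean(d)            # D-bar
    s_d = statistics.stdev(d)             # s_D, divides by n - 1
    t = (d_bar - delta0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t


# Hypothetical pre/post scores for n = 5 cases (illustration only).
pre = [12.0, 15.0, 11.0, 14.0, 13.0]
post = [10.0, 14.0, 8.0, 13.0, 10.0]
d_bar, s_d, t_stat = paired_t(pre, post)
print(d_bar, s_d, t_stat)
```

The resulting `t_stat` would be referred to a $t$ distribution with $n - 1 = 4$ degrees of freedom; the quantile itself is not computed here because the Python standard library has no $t$ quantile function.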

Advantage

The advantage of looking at paired differences: it eliminates the effects of case-to-case variation, because the variance (standard deviation) of the differences is reduced to the extent that the scores/measurements are positively correlated:
\[ \sigma_D^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 - 2\sigma_{X_1,X_2}. \]

This result comes from what we know about linear combinations:
\[ D = \mathbf{a}'\mathbf{X} = (1, -1)\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = X_1 - X_2, \]
so
\[ \mu_D = \mathbf{a}'\boldsymbol{\mu}, \qquad \operatorname{var}(D) = \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a}, \]
where $\boldsymbol{\mu}$ $(2 \times 1)$ is the mean vector for $\mathbf{X}$ and $\boldsymbol{\Sigma}$ $(2 \times 2)$ is the covariance matrix for $\mathbf{X}$.
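Expanding the quadratic form shows where the covariance term comes from. The numbers in the second line are assumed purely for illustration (equal variances of 100 and a correlation of 0.8); they do not come from the slides.

```latex
\[
\operatorname{var}(D) = \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a}
= (1,\,-1)
\begin{pmatrix} \sigma_{X_1}^2 & \sigma_{X_1,X_2} \\ \sigma_{X_1,X_2} & \sigma_{X_2}^2 \end{pmatrix}
\begin{pmatrix} 1 \\ -1 \end{pmatrix}
= \sigma_{X_1}^2 + \sigma_{X_2}^2 - 2\sigma_{X_1,X_2}
\]
\[
\text{Assumed values: } \sigma_{X_1}^2 = \sigma_{X_2}^2 = 100,\ \rho = 0.8
\ \Rightarrow\ \sigma_{X_1,X_2} = 80,\quad
\sigma_D^2 = 100 + 100 - 2(80) = 40 .
\]
```

With uncorrelated measurements the same variances would give $\sigma_D^2 = 200$, so under these assumed values pairing reduces the variance of the difference by a factor of five.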

Multivariate Situation

Record $p$ variables for each treatment (condition) for each member of each pair. For case $j$, we have

$X_{1j1}$ = variable 1, treatment 1;  $X_{2j1}$ = variable 1, treatment 2
$X_{1j2}$ = variable 2, treatment 1;  $X_{2j2}$ = variable 2, treatment 2
...
$X_{1jp}$ = variable $p$, treatment 1;  $X_{2jp}$ = variable $p$, treatment 2

where $j = 1,\ldots,n$ ($n$ = the number of pairs that we have).

We study the differences
\[ D_{j1} = X_{1j1} - X_{2j1}, \quad D_{j2} = X_{1j2} - X_{2j2}, \quad \ldots, \quad D_{jp} = X_{1jp} - X_{2jp}, \]
collected in the vector
\[ \mathbf{D}_j = (D_{j1}, D_{j2}, \ldots, D_{jp})'. \]

Needed for Statistical Inference

Assume that the $\mathbf{D}_j \sim N_p(\boldsymbol{\delta}, \boldsymbol{\Sigma}_D)$ and are i.i.d. for $j = 1,\ldots,n$, where
\[ \boldsymbol{\delta} = (\delta_1, \delta_2, \ldots, \delta_p)' = E(\mathbf{D}_j). \]

If the differences $\mathbf{D}_1, \mathbf{D}_2, \ldots, \mathbf{D}_n$ are a random sample from a $N_p(\boldsymbol{\delta}, \boldsymbol{\Sigma}_D)$ population, then
\[ T^2 = n(\bar{\mathbf{D}} - \boldsymbol{\delta})'\,\mathbf{S}_d^{-1}\,(\bar{\mathbf{D}} - \boldsymbol{\delta}) \sim \frac{(n-1)p}{n-p}\,F_{p,\,n-p}. \]

Modification for large samples: if $n$ and $(n-p)$ are large, then $T^2$ is approximately distributed as a $\chi^2_p$ random variable regardless of the distribution of $\mathbf{D}_j$ (i.e., $\mathbf{D}_j$ may not be multivariate normal, but $\boldsymbol{\delta}$ and $\boldsymbol{\Sigma}_D^{-1}$ exist).

Statistical Inference

Suppose that we have observations $\mathbf{d}_j = (d_{j1}, d_{j2}, \ldots, d_{jp})'$ for $j = 1,\ldots,n$.

Descriptive statistics:
\[ \bar{\mathbf{d}}_{(p \times 1)} = \frac{1}{n}\sum_{j=1}^{n} \mathbf{d}_j \qquad \text{and} \qquad \mathbf{S}_{d,(p \times p)} = \frac{1}{n-1}\sum_{j=1}^{n} (\mathbf{d}_j - \bar{\mathbf{d}})(\mathbf{d}_j - \bar{\mathbf{d}})'. \]

Hypothesis test: $H_o: \boldsymbol{\delta} = \mathbf{0}$ versus $H_A: \boldsymbol{\delta} \neq \mathbf{0}$, assuming the $\mathbf{D}_j \sim N_p(\boldsymbol{\delta}, \boldsymbol{\Sigma}_D)$ and are i.i.d.

Reject $H_o$ if
\[ T^2 = n\,\bar{\mathbf{d}}'\,\mathbf{S}_d^{-1}\,\bar{\mathbf{d}} > \frac{(n-1)p}{n-p}\,F_{p,\,n-p}(\alpha). \]
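To make the computation concrete, here is a small Python sketch of the paired $T^2$ statistic for the special case $p = 2$, where the $2 \times 2$ inverse can be written out by hand. It is an illustration only: the function names and the five difference vectors are hypothetical, and a general-$p$ implementation would need a proper matrix inverse.

```python
def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def covariance_2x2(rows):
    """Sample covariance matrix (divisor n - 1) of bivariate rows."""
    n = len(rows)
    m = mean_vector(rows)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        for a in range(2):
            for b in range(2):
                s[a][b] += (r[a] - m[a]) * (r[b] - m[b])
    return [[v / (n - 1) for v in row] for row in s]


def hotelling_t2_paired(diffs):
    """T^2 = n * dbar' S_d^{-1} dbar for H_o: delta = 0, and its F form (p = 2)."""
    n, p = len(diffs), 2
    d_bar = mean_vector(diffs)
    s = covariance_2x2(diffs)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det <= 0.0:
        raise ValueError("S_d must be positive definite")
    # Quadratic form dbar' S^{-1} dbar using the explicit 2 x 2 inverse.
    quad = (s[1][1] * d_bar[0] ** 2
            - 2.0 * s[0][1] * d_bar[0] * d_bar[1]
            + s[0][0] * d_bar[1] ** 2) / det
    t2 = n * quad
    f_stat = (n - p) / ((n - 1) * p) * t2   # refer to F with (p, n - p) df
    return t2, f_stat


# Hypothetical difference vectors (d_j1, d_j2) for n = 5 pairs.
diffs = [(2.0, 1.0), (1.0, 0.0), (3.0, 1.0), (1.0, 1.0), (3.0, 2.0)]
t2, f_stat = hotelling_t2_paired(diffs)
print(t2, f_stat)
```

For these made-up differences, $\bar{\mathbf{d}} = (2, 1)'$ and $\mathbf{S}_d$ has entries 1, 0.5, 0.5, 0.5, so the quadratic form equals 4 and $T^2 = 5 \times 4 = 20$.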

If you Reject $H_o: \boldsymbol{\delta} = \mathbf{0}$

Confidence region:
\[ n(\bar{\mathbf{d}} - \boldsymbol{\delta})'\,\mathbf{S}_d^{-1}\,(\bar{\mathbf{d}} - \boldsymbol{\delta}) \le \frac{(n-1)p}{n-p}\,F_{p,\,n-p}(\alpha). \]

Simultaneous $T^2$ intervals for individual differences of component means:
\[ \delta_i: \quad \bar{d}_i \pm \sqrt{\frac{(n-1)p}{n-p}\,F_{p,\,n-p}(\alpha)}\;\sqrt{\frac{s_{d_i}^2}{n}}, \]
where $\bar{d}_i$ is the mean difference of the $i$th variable and $s_{d_i}^2$ is the $i$th diagonal element of $\mathbf{S}_d$.

Bonferroni $100(1-\alpha)\%$ confidence intervals:
\[ \delta_i: \quad \bar{d}_i \pm t_{n-1}\!\left(\frac{\alpha}{2m}\right)\sqrt{\frac{s_{d_i}^2}{n}}, \]
where $m$ = the number of confidence intervals.

Large Samples

For large $(n - p)$ (i.e., $\mathbf{D}_j$ need not be multivariate normal),
\[ \frac{(n-1)p}{n-p}\,F_{p,\,n-p}(\alpha) \approx \chi^2_p(\alpha). \]
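The approximation can be checked numerically for $p = 2$, where both quantiles happen to have closed forms: the upper-$\alpha$ point of $F_{2,\nu}$ is $(\nu/2)(\alpha^{-2/\nu} - 1)$ and the upper-$\alpha$ point of $\chi^2_2$ is $-2\ln\alpha$. The Python sketch below relies on those two identities, so it applies only to $p = 2$; the function names and the sample sizes tried are arbitrary choices for illustration.

```python
import math


def f_upper_quantile_2(alpha, df2):
    """Upper-alpha point of F(2, df2); closed form valid only for numerator df = 2."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


def t2_critical_p2(n, alpha):
    """Critical value (n - 1)p / (n - p) * F_{p, n-p}(alpha) with p = 2."""
    p = 2
    return (n - 1) * p / (n - p) * f_upper_quantile_2(alpha, n - p)


def chi2_upper_quantile_2(alpha):
    """Upper-alpha point of chi-square with 2 df."""
    return -2.0 * math.log(alpha)


alpha = 0.05
crit = {n: t2_critical_p2(n, alpha) for n in (15, 50, 500, 5000)}
chi2_crit = chi2_upper_quantile_2(alpha)
for n, value in crit.items():
    print(n, round(value, 3))
print("chi-square:", round(chi2_crit, 3))
```

The scaled $F$ critical value decreases toward the $\chi^2_2$ value as $n$ grows, which is why the $\chi^2$ cutoff is too liberal for small samples such as $n = 15$.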

Example: The data

Data from Table 5.9 of Rencher (2007):

"Each of 15 students wrote an informal and a formal essay (Kramer, 1972, p. 100). The variables recorded were the number of words and the number of verbs."

- $y_1$ = words in informal essay
- $y_2$ = verbs in informal essay
- $y_3$ = words in formal essay
- $y_4$ = verbs in formal essay

These are count data. Does the CLT kick in? $n = 15$ is smallish.

Sample statistics for the differences, $d$ = words [verbs] informal $-$ words [verbs] formal: the mean difference vector is $\bar{\mathbf{d}} = (32.80, 3.53)'$ (words, verbs), with $2 \times 2$ sample covariance matrix $\mathbf{S}_d$.

[Figure: Plot of the Data]

[Figure: Plot of the Data: Cases Connected]

[Figure: Plot of the Differences]

Example: Test $H_o: \boldsymbol{\delta} = \mathbf{0}$ versus $H_A: \boldsymbol{\delta} \neq \mathbf{0}$

(i.e., $H_o$ states that the numbers of words and verbs in informal and formal essays are the same).

\[ T^2 = 15\,(32.80,\ 3.53)\,\mathbf{S}_d^{-1}\begin{pmatrix} 32.80 \\ 3.53 \end{pmatrix} \approx 15.19 \; > \; \frac{14(2)}{13}\,F_{2,13}(.05) = 8.20. \]

Alternatively, $\frac{13}{(14)2}\,T^2 = 7.053$, which is distributed as $F_{2,13}$ and has a p-value of .008.

Conclusion: Reject $H_o$. The data support the conclusion that the numbers of words and verbs in informal essays are not equal to the numbers in formal ones.
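The conversion between $T^2$ and $F$, and the quoted p-value, can be reproduced from the reported $F = 7.053$ with $n = 15$ and $p = 2$. This Python sketch uses the fact that for numerator degrees of freedom equal to 2 the $F$ tail probability has the closed form $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$; that shortcut is an assumption specific to $p = 2$, and the function names are hypothetical.

```python
def f_to_t2(f_stat, n, p):
    """Invert F = (n - p) / ((n - 1) p) * T^2."""
    return f_stat * (n - 1) * p / (n - p)


def f_tail_prob_2(f_stat, df2):
    """P(F(2, df2) > f_stat); closed form valid only for numerator df = 2."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


n, p = 15, 2
f_stat = 7.053                      # value reported in the example
t2 = f_to_t2(f_stat, n, p)          # implied Hotelling T^2
p_value = f_tail_prob_2(f_stat, n - p)
print(round(t2, 2), round(p_value, 4))
```

The implied $T^2$ exceeds the critical value 8.20 and the tail probability rounds to .008, in line with the conclusion to reject $H_o$.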

95% Confidence Region for $\boldsymbol{\delta}$

From SAS > Solutions > Interactive Data Analysis, choose Analyze > Multivariate (scatter plot, curves).

[Figure: 95% Confidence Region for $\boldsymbol{\delta}$]

[Figure: 95% Confidence Region for the Mean]

SAS for the Last Figure

    proc sgscatter data=essay;
      compare y=dverbs x=dwords / ellipse=(type=mean);
      title '95% Confidence Region for the mean Difference';
    run;

Confidence Region, $T^2$ & Bonferroni Intervals

[Figure: confidence region for the mean difference with the simultaneous $T^2$ and Bonferroni intervals; horizontal axis Words, vertical axis Verbs; marked points $\bar{\mathbf{d}} = (32.80, 3.53)$ and $\boldsymbol{\delta}_o = (0, 0)$]

Another way to calculate $T^2$ for paired observations

So far we've "divided the sample"; that is, $\mathbf{D} = \mathbf{X}_1 - \mathbf{X}_2$.

Now we'll consider a "Full Sample" method that treats every case as a pair, with $p$ measures on each member of the pair.

    Condition   Pair or case number
                1             2             ...   j             ...   n
    (a)         p variables   p variables         p variables         p variables
    (b)         p variables   p variables         p variables         p variables

So we have $2p$ variables measured for each case (pair).

In an experimental situation, the conditions are assumed to have been randomly assigned to members of the pairs.

Full Data Method for paired observations

Full data matrix:
\[
\mathbf{X}_{(n \times 2p)} =
\begin{pmatrix}
X_{111} & X_{112} & \cdots & X_{11p} & X_{121} & X_{122} & \cdots & X_{12p} \\
X_{211} & X_{212} & \cdots & X_{21p} & X_{221} & X_{222} & \cdots & X_{22p} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
X_{n11} & X_{n12} & \cdots & X_{n1p} & X_{n21} & X_{n22} & \cdots & X_{n2p}
\end{pmatrix}
= \bigl(\underbrace{\mathbf{X}_1}_{n \times p} \;\big|\; \underbrace{\mathbf{X}_2}_{n \times p}\bigr)
\]

Full sample mean vector:
\[ \bar{\mathbf{x}}' = (\bar{x}_{11}, \bar{x}_{12}, \ldots, \bar{x}_{1p}, \bar{x}_{21}, \ldots, \bar{x}_{2p}) = (\bar{\mathbf{x}}_1', \bar{\mathbf{x}}_2'). \]

Full Data Method for paired observations

Full data sample covariance matrix:
\[ \mathbf{S}_{(2p \times 2p)} = \begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{pmatrix}, \]
where
- $\mathbf{S}_{11}$ is the $(p \times p)$ covariance matrix for $\mathbf{X}_1$,
- $\mathbf{S}_{22}$ is the $(p \times p)$ covariance matrix for $\mathbf{X}_2$,
- $\mathbf{S}_{12} = \mathbf{S}_{21}'$ is the $(p \times p)$ covariance matrix between $\mathbf{X}_1$ and $\mathbf{X}_2$.

Define a contrast matrix:
\[ \mathbf{C}_{(p \times 2p)} = \bigl(\mathbf{I}_{p \times p} \;\big|\; -\mathbf{I}_{p \times p}\bigr). \]

What condition do you need to have a contrast matrix?

Computations for Full Data

Let $\mathbf{x}_{j,(2p \times 1)}$ = the $j$th row of $\mathbf{X}_{(n \times 2p)}$ written as a column vector. Then
\[ \mathbf{d}_j = \mathbf{C}\mathbf{x}_j, \qquad \bar{\mathbf{d}} = \mathbf{C}\bar{\mathbf{x}} = \mathbf{C}\Bigl(\frac{1}{n}\sum_{j=1}^{n}\mathbf{x}_j\Bigr), \qquad \mathbf{S}_d = \mathbf{C}\mathbf{S}\mathbf{C}'. \]

Putting all of this together yields
\[ T^2 = n(\mathbf{C}\bar{\mathbf{x}})'(\mathbf{C}\mathbf{S}\mathbf{C}')^{-1}(\mathbf{C}\bar{\mathbf{x}}) = n\,\bar{\mathbf{x}}'\mathbf{C}'(\mathbf{C}\mathbf{S}\mathbf{C}')^{-1}\mathbf{C}\bar{\mathbf{x}}. \]

With this method, we don't have to split the data set and compute the differences. We'll see more uses of contrast matrices relatively soon.

SAS/IML code for essay example.
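A short Python sketch shows how the contrast matrix $\mathbf{C} = (\mathbf{I} \mid -\mathbf{I})$ turns one full-data row into its difference vector. The helper names and the single case below are hypothetical; the layout (condition 1's $p$ variables followed by condition 2's) follows the full data matrix described in these slides.

```python
def paired_contrast(p):
    """Build the p x 2p contrast matrix C = (I_p | -I_p)."""
    return [[1 if c == r else (-1 if c == r + p else 0) for c in range(2 * p)]
            for r in range(p)]


def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


C = paired_contrast(2)

# Hypothetical case laid out as (words informal, verbs informal,
# words formal, verbs formal); the numbers are made up.
x_j = [148, 20, 137, 15]
d_j = matvec(C, x_j)    # d_j = C x_j = condition 1 minus condition 2

print(C)
print(d_j)
```

Each row of `C` sums to zero, which is the property that makes it a contrast; applying `C` to the mean vector in the same way gives $\bar{\mathbf{d}} = \mathbf{C}\bar{\mathbf{x}}$.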

Repeated Measures

Repeated measures designs for comparing conditions (treatments, etc.). This is another generalization of the univariate paired $t$ test.

Situation: $q$ conditions are compared with respect to one response variable. Each case receives each treatment once over successive periods of time. The order of the treatments should be randomized (and counterbalanced if possible).

Example from Cochran & Cox (1957) (I got this from Timm 1980): There are four calculator designs, and each person does specified computations. Their speed is recorded for each of the four calculators. The order of calculator use was randomly assigned.

This is repeated measures because each case (person) gets each treatment (calculator): we have repeated observations or measurements on each case.

Repeated Measures

Let the $j$th observation equal
\[ \mathbf{x}_j = (x_{j1}, x_{j2}, \ldots, x_{jq})', \qquad j = 1,\ldots,n, \]
where $x_{ji}$ = response or measurement of the $i$th treatment on the $j$th case.

Question (hypothesis): Is there a treatment effect?
\[ H_o: \mu_1 = \mu_2 = \cdots = \mu_q \qquad \text{versus} \qquad H_A: \text{not } H_o. \]

This is the same hypothesis tested in univariate repeated measures ANOVA.

Repeated Measures as a Multivariate Test

To test this as a hypothesis about a multivariate mean vector, we need to use contrasts of the components of $\boldsymbol{\mu}$. Assume $\mathbf{X}_j \sim N_q(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where
\[ \boldsymbol{\mu} = E(\mathbf{x}_j) = (\mu_1, \mu_2, \ldots, \mu_q)'. \]

Set up a contrast:
\[
\underbrace{\begin{pmatrix} \mu_1 - \mu_2 \\ \mu_1 - \mu_3 \\ \vdots \\ \mu_1 - \mu_q \end{pmatrix}}_{(q-1) \times 1}
=
\underbrace{\begin{pmatrix}
1 & -1 & 0 & \cdots & 0 \\
1 & 0 & -1 & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
1 & 0 & 0 & \cdots & -1
\end{pmatrix}}_{(q-1) \times q}
\underbrace{\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_q \end{pmatrix}}_{q \times 1}
= \mathbf{C}_1\boldsymbol{\mu}
\]

So $H_o: \mathbf{C}_1\boldsymbol{\mu} = \mathbf{0}$ (no treatment effect).

Contrast Matrices

Any contrast matrix of size $(q-1) \times q$ will do. For example,
\[
\mathbf{C}_2\boldsymbol{\mu} =
\underbrace{\begin{pmatrix}
1 & -1 & 0 & \cdots & 0 & 0 \\
0 & 1 & -1 & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 & -1
\end{pmatrix}}_{(q-1) \times q}
\underbrace{\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_q \end{pmatrix}}_{q \times 1}
=
\begin{pmatrix} \mu_1 - \mu_2 \\ \mu_2 - \mu_3 \\ \vdots \\ \mu_{q-1} - \mu_q \end{pmatrix}
\]

To be a contrast matrix:
- The rows are linearly independent.
- Each row is a contrast vector (its elements sum to zero).

Hypothesis and Test for Repeated Measures

The hypothesis of no effects due to treatment in a repeated measures design,
\[ H_o: \mu_1 = \mu_2 = \cdots = \mu_q, \]
is the same as performing Hotelling's $T^2$ test of
\[ H_o: \mathbf{C}\boldsymbol{\mu} = \mathbf{0}, \]
where $\mathbf{C}$ is a $(q-1) \times q$ contrast matrix.

Given data $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ and a contrast matrix $\mathbf{C}$, the $T^2$ test statistic equals
\[ T^2 = n(\mathbf{C}\bar{\mathbf{x}})'(\mathbf{C}\mathbf{S}\mathbf{C}')^{-1}\mathbf{C}\bar{\mathbf{x}}. \]

Reject $H_o$ if
\[ T^2 > \frac{(n-1)(q-1)}{n-q+1}\,F_{(q-1),\,(n-q+1)}(\alpha). \]

Now for our example: plot the data and then SAS/IML.

[Figure: (Scatter) Plot of the Calculator Data]

Input 1 from SAS/IML

    proc iml;
    * A module that computes Hotellings T^2 for one-sample tests;
    start Tsq(X, muo, Ts, pvalue);
      n = nrow(X);
      one = j(n, 1);                        * n x 1 vector of ones;
      Xbar = X`*one/n;                      * p x 1 vector of means;
      XbarM = one*Xbar`;                    * n x p matrix of means;
      S = (X - XbarM)`*(X - XbarM)/(n-1);   * sample covariance matrix;
      Ts = n*(Xbar - muo)`*inv(S)*(Xbar - muo);
      p = ncol(X);
      dfden = n - p;                        * denominator df of the F reference;
      F = ((n-p)/((n-1)*p))*Ts;             * convert T^2 to F with (p, n-p) df;
      pvalue = 1 - cdf('F', F, p, dfden);
    finish Tsq;

Input continued

    X={ , , , , };
    C1={ , , };
    muo={0, 0, 0};
    X1 = X*C1`;
    run stats(X1, n1, xbar1, W1, S1);
    run Tsq(X1, muo, Tsq1, pvalue1);

Output 1 from SAS/IML

Data matrix (5 subjects x 4 variables) = X

Using C1: C1

Output 1 continued

X*C1` = X1

XBAR1: mean of C1*X1

TSQ1, PVALUE1: T^2 for C1*mu=0 with its p-value

Using Contrast Matrix 2

    C2={ , , };
    X2 = X*C2`;
    run stats(X2, n2, xbar2, W2, S2);
    run Tsq(X2, muo, Tsq2, pvalue2);

(Partial) output from this:

XBAR2: mean of C2*X2

TSQ2, PVALUE2: T^2 for C2*mu=0 with its p-value

With different contrast matrices, we get different $\mathbf{C}\bar{\mathbf{x}}$ vectors, but $T^2$, the p-value, and the conclusions are exactly the same.
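The invariance of $T^2$ to the choice of contrast matrix can be checked numerically. The Python sketch below is an illustration rather than a translation of the SAS/IML modules: the helper functions are hypothetical, the 5 x 4 data matrix is made up (the calculator data are not reproduced here), and the two contrast matrices are the "treatment 1 versus each other treatment" and "adjacent differences" forms for $q = 4$.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mean_and_cov(rows):
    """Sample mean vector and covariance matrix (divisor n - 1)."""
    n, q = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(q)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(q)] for a in range(q)]
    return mean, cov


def t2_for_contrast(rows, c):
    """T^2 = n (C xbar)' (C S C')^{-1} (C xbar)."""
    n = len(rows)
    mean, s = mean_and_cov(rows)
    c_xbar = [sum(ci * mi for ci, mi in zip(row, mean)) for row in c]
    csc = matmul(matmul(c, s), transpose(c))
    z = solve(csc, c_xbar)          # z = (C S C')^{-1} C xbar
    return n * sum(u * v for u, v in zip(c_xbar, z))


# Hypothetical scores: 5 cases (rows) by q = 4 treatments (columns).
X = [[30, 21, 21, 14],
     [22, 13, 22, 5],
     [29, 13, 18, 17],
     [12, 7, 16, 14],
     [23, 24, 23, 8]]

C1 = [[1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1]]   # treatment 1 vs each other
C2 = [[1, -1, 0, 0], [0, 1, -1, 0], [0, 0, 1, -1]]   # adjacent differences

t2_c1 = t2_for_contrast(X, C1)
t2_c2 = t2_for_contrast(X, C2)
print(t2_c1, t2_c2)
```

The two values agree up to rounding error because $\mathbf{C}_2 = \mathbf{A}\mathbf{C}_1$ for a nonsingular $\mathbf{A}$, and $\mathbf{A}$ cancels inside the quadratic form.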

$T^2$ and Repeated Measures

As before, a $100(1-\alpha)\%$ confidence region consists of all $\mathbf{C}\boldsymbol{\mu}$ such that
\[ n(\mathbf{C}\bar{\mathbf{x}} - \mathbf{C}\boldsymbol{\mu})'(\mathbf{C}\mathbf{S}\mathbf{C}')^{-1}(\mathbf{C}\bar{\mathbf{x}} - \mathbf{C}\boldsymbol{\mu}) \le \frac{(n-1)(q-1)}{n-q+1}\,F_{(q-1),\,(n-q+1)}(\alpha). \]

Simultaneous $T^2$ intervals for a single contrast $\mathbf{c}_i'\boldsymbol{\mu}$, where $\mathbf{c}_i'$ is the $i$th row of the matrix $\mathbf{C}$:
\[ \mathbf{c}_i'\bar{\mathbf{x}} \pm \underbrace{\sqrt{\frac{(n-1)(q-1)}{n-q+1}\,F_{(q-1),\,(n-q+1)}(\alpha)}}\;\sqrt{\frac{\mathbf{c}_i'\mathbf{S}\mathbf{c}_i}{n}}. \]

For Bonferroni (or one-at-a-time) confidence intervals, replace the statistic above the brace by the appropriate value from the $t_{n-1}$ distribution.

For large $n$, one can use $\chi^2_{q-1}$.

Repeated Measures ANOVA vs multivariate $T^2$

The multivariate $T^2$ is appropriate for situations where we cannot assume that the covariance matrix for $\mathbf{X}$ has a particular structure.

With repeated measures ANOVA you must assume that $\boldsymbol{\Sigma}_X$ has a special structure, in particular spherical; for example,
\[ \boldsymbol{\Sigma}_X = \begin{pmatrix} \sigma^2 & \tau & \tau \\ \tau & \sigma^2 & \tau \\ \tau & \tau & \sigma^2 \end{pmatrix}. \]
Unlikely, but this works too: $\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$.

If the assumptions on the structure of $\boldsymbol{\Sigma}$ are met, then repeated measures ANOVA is more powerful than multivariate $T^2$, because repeated measures ANOVA takes the structure of $\boldsymbol{\Sigma}$ into account.

If the assumptions on $\boldsymbol{\Sigma}$ are not met, $T^2$ is still valid but repeated measures ANOVA is not.

Two Independent Samples

Situation: two samples, each having $p$ measurements, where we have a random sample of size $n_1$ from population 1 and a random sample of size $n_2$ from population 2.

Sample from population 1: $\mathbf{X}_{11}, \mathbf{X}_{12}, \ldots, \mathbf{X}_{1n_1}$
Sample from population 2: $\mathbf{X}_{21}, \mathbf{X}_{22}, \ldots, \mathbf{X}_{2n_2}$

Sample means:
\[ \bar{\mathbf{x}}_1 = \frac{1}{n_1}\sum_{j=1}^{n_1}\mathbf{x}_{1j}, \qquad \bar{\mathbf{x}}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2}\mathbf{x}_{2j}. \]

Sample covariance matrices:
\[ \mathbf{S}_1 = \frac{1}{n_1-1}\sum_{j=1}^{n_1}(\mathbf{x}_{1j}-\bar{\mathbf{x}}_1)(\mathbf{x}_{1j}-\bar{\mathbf{x}}_1)', \qquad \mathbf{S}_2 = \frac{1}{n_2-1}\sum_{j=1}^{n_2}(\mathbf{x}_{2j}-\bar{\mathbf{x}}_2)(\mathbf{x}_{2j}-\bar{\mathbf{x}}_2)'. \]

Hypothesis: $H_o: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$

Assumptions

1. The sample $\mathbf{X}_{11}, \mathbf{X}_{12}, \ldots, \mathbf{X}_{1n_1}$ is a random sample of size $n_1$ from a $p$-variate population with mean vector $\boldsymbol{\mu}_1$ and covariance matrix $\boldsymbol{\Sigma}_1$.
2. The sample $\mathbf{X}_{21}, \mathbf{X}_{22}, \ldots, \mathbf{X}_{2n_2}$ is a random sample of size $n_2$ from a $p$-variate population with mean vector $\boldsymbol{\mu}_2$ and covariance matrix $\boldsymbol{\Sigma}_2$.
3. The samples are (statistically) independent of each other.

These assumptions are required when we want to test
$H_o: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$, or equivalently $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$, versus
$H_A: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2$, or equivalently $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 \neq \mathbf{0}$.

If $n_1$ and/or $n_2$ are small, then we must make two additional assumptions:

4. Both populations are multivariate normal.
5. $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2$. This is a very strong assumption (stronger than in the univariate case).

Case 1: Known $\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$

To develop the test for independent populations, we'll start by supposing that we know $\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$ (i.e., we don't have to estimate them) and assume the first 4 assumptions made on the previous slide.

When $H_o$ is true, the test statistic would be
\[ (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'\Bigl(\frac{1}{n_1}\boldsymbol{\Sigma}_1 + \frac{1}{n_2}\boldsymbol{\Sigma}_2\Bigr)^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) \sim \chi^2_p \]
because
\[ (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) \sim N_p\Bigl((\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2),\ \frac{1}{n_1}\boldsymbol{\Sigma}_1 + \frac{1}{n_2}\boldsymbol{\Sigma}_2\Bigr). \]

Why is $(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$ multivariate normal?

When $H_o$ is true, then $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \mathbf{0}$ and the test statistic should be small.

Case 2: $\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$ Unknown

$\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$ must be estimated. For this more realistic case, we must also assume $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \boldsymbol{\Sigma}$.

Since $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \boldsymbol{\Sigma}$, we will estimate $\boldsymbol{\Sigma}$ by pooling the data from the two samples:
\[
\mathbf{S}_{pool} = \frac{(n_1-1)\mathbf{S}_1 + (n_2-1)\mathbf{S}_2}{n_1+n_2-2}
= \frac{\sum_{j=1}^{n_1}(\mathbf{x}_{1j}-\bar{\mathbf{x}}_1)(\mathbf{x}_{1j}-\bar{\mathbf{x}}_1)' + \sum_{j=1}^{n_2}(\mathbf{x}_{2j}-\bar{\mathbf{x}}_2)(\mathbf{x}_{2j}-\bar{\mathbf{x}}_2)'}{n_1+n_2-2}.
\]

$\mathbf{S}_{pool}$ is an estimator of $\boldsymbol{\Sigma}$ with $df = n_1 + n_2 - 2$.
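As a small illustration of the pooling formula, the Python sketch below weights two covariance matrices by their degrees of freedom. The two $2 \times 2$ matrices are hypothetical; the sample sizes $n_1 = 45$ and $n_2 = 55$ match the example used later in these slides, so the weights are $44/98$ and $54/98$.

```python
def pooled_covariance(s1, n1, s2, n2):
    """S_pool = ((n1 - 1) S1 + (n2 - 1) S2) / (n1 + n2 - 2), element by element."""
    df = n1 + n2 - 2
    return [[((n1 - 1) * a + (n2 - 1) * b) / df for a, b in zip(row1, row2)]
            for row1, row2 in zip(s1, s2)]


# Hypothetical sample covariance matrices (illustration only).
s1 = [[10.0, 4.0], [4.0, 20.0]]
s2 = [[6.0, 2.0], [2.0, 12.0]]

s_pool = pooled_covariance(s1, 45, s2, 55)
for row in s_pool:
    print([round(v, 4) for v in row])
```

Because $n_2 > n_1$, each pooled entry lies between the two sample values and slightly closer to the entry of `s2`.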

Distribution of Linear Combination

Consider the linear combination of two random vectors, $\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2$:
\[ E(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) = \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 \]
\[
\boldsymbol{\Sigma}_{\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2} = \operatorname{cov}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)
= \operatorname{cov}(\bar{\mathbf{x}}_1) + \operatorname{cov}(\bar{\mathbf{x}}_2) \quad \text{(independent samples)}
= \frac{1}{n_1}\boldsymbol{\Sigma} + \frac{1}{n_2}\boldsymbol{\Sigma}
= \Bigl(\frac{1}{n_1} + \frac{1}{n_2}\Bigr)\boldsymbol{\Sigma},
\]
which is estimated by $\bigl(\frac{1}{n_1} + \frac{1}{n_2}\bigr)\mathbf{S}_{pool}$.

When $\mathbf{x}_{11}, \ldots, \mathbf{x}_{1n_1}$ is a random sample of size $n_1$ from $N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ and $\mathbf{x}_{21}, \ldots, \mathbf{x}_{2n_2}$ is a random sample of size $n_2$ from $N(\boldsymbol{\mu}_2, \boldsymbol{\Sigma})$, the test statistic for $H_o: \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \boldsymbol{\delta}_o$ is
\[ T^2 = \bigl((\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) - \boldsymbol{\delta}_o\bigr)'\Bigl(\Bigl(\frac{1}{n_1} + \frac{1}{n_2}\Bigr)\mathbf{S}_{pool}\Bigr)^{-1}\bigl((\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) - \boldsymbol{\delta}_o\bigr). \]

Distribution of Test Statistic

The test statistic
\[ T^2 = \bigl((\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) - \boldsymbol{\delta}_o\bigr)'\Bigl(\Bigl(\frac{1}{n_1} + \frac{1}{n_2}\Bigr)\mathbf{S}_{pool}\Bigr)^{-1}\bigl((\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) - \boldsymbol{\delta}_o\bigr) \]
has a sampling distribution that is
\[ \frac{(n_1+n_2-2)p}{n_1+n_2-p-1}\,F_{p,\,(n_1+n_2-p-1)}, \]
or we could just refer
\[ \frac{n_1+n_2-p-1}{(n_1+n_2-2)p}\,T^2 \quad \text{to} \quad F_{p,\,(n_1+n_2-p-1)}. \]

Note:
\[ \Bigl(\Bigl(\frac{1}{n_1} + \frac{1}{n_2}\Bigr)\mathbf{S}_{pool}\Bigr)^{-1} = \Bigl(\frac{n_1+n_2}{n_1 n_2}\,\mathbf{S}_{pool}\Bigr)^{-1} = \frac{n_1 n_2}{n_1+n_2}\,\mathbf{S}_{pool}^{-1}. \]
So sometimes you'll see
\[ T^2 = \frac{n_1 n_2}{n_1+n_2}\bigl((\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) - \boldsymbol{\delta}_o\bigr)'\,\mathbf{S}_{pool}^{-1}\,\bigl((\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) - \boldsymbol{\delta}_o\bigr). \]
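The second form of the statistic is convenient to compute from summary statistics. Below is a minimal Python sketch for the bivariate case ($p = 2$), using the explicit $2 \times 2$ inverse; the function name and all input values are hypothetical and serve only to trace the arithmetic.

```python
def two_sample_t2(xbar1, xbar2, s_pool, n1, n2, delta0=(0.0, 0.0)):
    """Two-sample Hotelling T^2 and its F form for p = 2.

    T^2 = n1 n2 / (n1 + n2) * d' S_pool^{-1} d, with d = (xbar1 - xbar2) - delta0.
    """
    p = 2
    d = [a - b - c for a, b, c in zip(xbar1, xbar2, delta0)]
    det = s_pool[0][0] * s_pool[1][1] - s_pool[0][1] * s_pool[1][0]
    if det <= 0.0:
        raise ValueError("S_pool must be positive definite")
    # Quadratic form d' S_pool^{-1} d via the explicit 2 x 2 inverse.
    quad = (s_pool[1][1] * d[0] ** 2
            - 2.0 * s_pool[0][1] * d[0] * d[1]
            + s_pool[0][0] * d[1] ** 2) / det
    t2 = n1 * n2 / (n1 + n2) * quad
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2   # F with (p, n1+n2-p-1) df
    return t2, f_stat


# Hypothetical summary statistics (illustration only).
t2, f_stat = two_sample_t2(
    xbar1=(5.0, 3.0), xbar2=(3.0, 2.0),
    s_pool=[[4.0, 1.0], [1.0, 2.0]],
    n1=10, n2=15,
)
print(t2, f_stat)
```

Here $\mathbf{d} = (2, 1)'$, the quadratic form is $8/7$, and the factor $n_1 n_2/(n_1+n_2) = 6$ gives $T^2 = 48/7$.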

Example: Two Independent Samples $T^2$

From Johnson & Wichern: Wisconsin homeowners without air conditioning ($n_1 = 45$) and those with air conditioning ($n_2 = 55$).

$X_1$ = total on-peak consumption of electricity, July 1977 (in kilowatts)
$X_2$ = total off-peak consumption of electricity, July 1977 (in kilowatts)

Sample statistics: the covariance matrices $\mathbf{S}_1$ and $\mathbf{S}_2$, and
\[ \bar{\mathbf{x}}_1 = (204.4,\ 556.6)', \qquad \bar{\mathbf{x}}_2 = (130.0,\ 355.0)', \qquad (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) = (74.4,\ 201.6)', \]
\[ \mathbf{S}_{pool} = \frac{44\,\mathbf{S}_1 + 54\,\mathbf{S}_2}{98}. \]

Example continued

The estimated covariance matrix of $(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$ is
\[ \mathbf{S}_{\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2} = \Bigl(\frac{1}{n_1} + \frac{1}{n_2}\Bigr)\mathbf{S}_{pool} = \Bigl(\frac{1}{45} + \frac{1}{55}\Bigr)\mathbf{S}_{pool}. \]

To test $H_o: \boldsymbol{\delta} = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) = \mathbf{0}$, compute the test statistic
\[ T^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'\,\mathbf{S}_{\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2}^{-1}\,(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) = (74.4,\ 201.6)\,\mathbf{S}_{\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2}^{-1}\begin{pmatrix} 74.4 \\ 201.6 \end{pmatrix}. \]

For $\alpha = .05$, the critical value is $(98(2)/97)\,F_{2,97}(.05) = 2.02(3.1) \approx 6.26$.

Conclusion...

45 100(1 − α)% Confidence Region for $\mu_1 - \mu_2$
Is the set of all $\delta = \mu_1 - \mu_2$ such that
$$\frac{n_1 n_2}{n_1+n_2}\,\big((\bar{x}_1-\bar{x}_2)-\delta\big)'\,S_{pool}^{-1}\,\big((\bar{x}_1-\bar{x}_2)-\delta\big) \le c^2$$
where
$$c^2 = \frac{(n_1+n_2-2)\,p}{n_1+n_2-p-1}\,F_{p,\,n_1+n_2-p-1}(\alpha)$$
To study the ellipsoid, we can focus on the eigenvalues and eigenvectors of $S_{pool}$. The axes of the ellipsoid are
$$(\bar{x}_1-\bar{x}_2) \pm \sqrt{\lambda_i\left(\frac{1}{n_1}+\frac{1}{n_2}\right)c^2}\;e_i, \qquad i = 1,\ldots,p$$
where $\lambda_i$ and $e_i$ are the eigenvalues and eigenvectors of $S_{pool}$.

46 Example: Confidence Region
The 95% Confidence Region (Ellipse): the set of all possible $(\mu_1-\mu_2)$ that satisfy the following inequality:
$$\big((74.4-\delta_1),\,(201.6-\delta_2)\big)\,[\,\cdots\,]^{-1}\begin{pmatrix}74.4-\delta_1\\ 201.6-\delta_2\end{pmatrix} \le c^2$$
where $c^2 = (98(2)/97)\,F_{2,97}(.05) = 2.02(3.1) = 6.26$.
Eigenvalues and eigenvectors of $S_{pool}$ are
$\lambda_1 = [\,\cdots\,]$, $e_1 = [\,\cdots\,]$ and $\lambda_2 = [\,\cdots\,]$, $e_2 = [\,\cdots\,]$

47 Computing the Axes of the Ellipse
Major axis:
$$(\bar{x}_1-\bar{x}_2) \pm \sqrt{\lambda_1\left(\frac{1}{n_1}+\frac{1}{n_2}\right)c^2}\;e_1$$
Minor axis:
$$(\bar{x}_1-\bar{x}_2) \pm \sqrt{\lambda_2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)c^2}\;e_2$$
with $(\bar{x}_1-\bar{x}_2) = (74.4, 201.6)'$, $n_1 = 45$ and $n_2 = 55$.
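A minimal sketch of the axis computation, assuming a 2 × 2 pooled covariance matrix. The centre $(74.4, 201.6)$ and $c^2 = 6.26$ are the values quoted on the slides; the entries of `S_pool` are an assumption (textbook values for this example), and `eig_sym_2x2` is a hypothetical helper written for the 2 × 2 case only.

```python
import math

n1, n2 = 45, 55
d = (74.4, 201.6)   # xbar1 - xbar2, from the slides
c2 = 6.26           # critical value quoted on the slides
# ASSUMED pooled covariance matrix (textbook values, not in the slide text)
S_pool = ((10963.7, 21505.5), (21505.5, 63661.3))


def eig_sym_2x2(M):
    """Eigenvalues (largest first) and unit eigenvectors of a symmetric 2x2 matrix.

    Assumes the off-diagonal entry is non-zero.
    """
    a, b, c = M[0][0], M[0][1], M[1][1]
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        vx, vy = b, lam - a          # solves (M - lam I) v = 0 when b != 0
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


scale = 1 / n1 + 1 / n2
axes = []
for lam, e in eig_sym_2x2(S_pool):
    half = math.sqrt(lam * scale * c2)            # half-length of this axis
    lo = (d[0] - half * e[0], d[1] - half * e[1])  # one end point
    hi = (d[0] + half * e[0], d[1] + half * e[1])  # the other end point
    axes.append((lam, e, half, lo, hi))
    print(f"lambda = {lam:.1f}, e = ({e[0]:.3f}, {e[1]:.3f}), half-length = {half:.1f}")
```

The first entry of `axes` is the major axis and the second is the minor axis.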

48 Figure of 95% Confidence Region
[Figure: 95% confidence ellipse for $\mu_1 - \mu_2$, centered at $d = (74.4, 201.6)$, with $\delta_o = (0,0)$ marked. Horizontal axis: $\mu_{11} - \mu_{21}$ (on-peak); vertical axis: $\mu_{12} - \mu_{22}$ (off-peak).]

49 Simultaneous $T^2$ Intervals
Let
$$c^2 = \frac{(n_1+n_2-2)\,p}{n_1+n_2-p-1}\,F_{p,\,n_1+n_2-p-1}(\alpha)$$
With confidence 100(1 − α)%,
$$a'(\bar{x}_1-\bar{x}_2) \pm c\,\sqrt{a'\left(\frac{1}{n_1}+\frac{1}{n_2}\right)S_{pool}\,a}$$
will cover $a'(\mu_1-\mu_2)$ for all possible $a$.
By appropriate choices for $a$, we can get component intervals:
$a_1' = (1, 0, \ldots, 0)$, $a_2' = (0, 1, 0, \ldots, 0)$, ..., $a_p' = (0, \ldots, 0, 1)$

50 Simultaneous $T^2$ continued
So the component intervals are
$$(\bar{x}_{11}-\bar{x}_{21}) \pm c\,\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right)s_{pool,11}}$$
$$(\bar{x}_{12}-\bar{x}_{22}) \pm c\,\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right)s_{pool,22}}$$
...
$$(\bar{x}_{1p}-\bar{x}_{2p}) \pm c\,\sqrt{\left(\frac{1}{n_1}+\frac{1}{n_2}\right)s_{pool,pp}}$$
where
$$c = \sqrt{\frac{(n_1+n_2-2)\,p}{n_1+n_2-p-1}\,F_{p,\,n_1+n_2-p-1}(\alpha)}$$

51 Example: Simultaneous $T^2$ intervals
Consider the linear combination vectors:
$a_1' = (1, 0)$, so $a_1'\delta = a_1'(\mu_1-\mu_2) = \mu_{11}-\mu_{21} = \delta_1$
and
$a_2' = (0, 1)$, so $a_2'\delta = a_2'(\mu_1-\mu_2) = \mu_{12}-\mu_{22} = \delta_2$
Using these, we get the interval for on-peak ($\delta_1$)
$$74.4 \pm (2.502)\sqrt{\left(\frac{1}{45}+\frac{1}{55}\right)s_{pool,11}}$$
and for off-peak ($\delta_2$)
$$201.6 \pm (2.502)\sqrt{\left(\frac{1}{45}+\frac{1}{55}\right)s_{pool,22}}$$
Note: $c = \sqrt{c^2} = \sqrt{6.26} = 2.502$
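The component intervals can be sketched for any linear combination $a$, as below. The centre and $c^2 = 6.26$ are the slide values; the `S_pool` entries are an assumption (textbook values for this example), and `simultaneous_interval` is a hypothetical helper name.

```python
import math

n1, n2 = 45, 55
d = (74.4, 201.6)        # xbar1 - xbar2, from the slides
c = math.sqrt(6.26)      # c^2 = 6.26 as quoted on the slides
# ASSUMED pooled covariance matrix (textbook values, not in the slide text)
S_pool = ((10963.7, 21505.5), (21505.5, 63661.3))


def simultaneous_interval(a, d, S, scale, c):
    """Interval a'd +/- c * sqrt(scale * a'Sa) for the linear combination a."""
    k = len(a)
    center = sum(a[i] * d[i] for i in range(k))
    var = scale * sum(a[i] * S[i][j] * a[j] for i in range(k) for j in range(k))
    half = c * math.sqrt(var)
    return center - half, center + half


scale = 1 / n1 + 1 / n2
on_peak = simultaneous_interval((1, 0), d, S_pool, scale, c)
off_peak = simultaneous_interval((0, 1), d, S_pool, scale, c)
print("on-peak :", tuple(round(v, 1) for v in on_peak))
print("off-peak:", tuple(round(v, 1) for v in off_peak))
```

Passing a different `a`, such as `(1, -1)`, gives the interval for that contrast at the same simultaneous confidence level.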

52 Bonferroni and One-at-a-Time Intervals
For Bonferroni and One-at-a-Time (i.e., univariate method) intervals, you simply need to change the value of $c$.
Bonferroni: $c = t_{n_1+n_2-2}(\alpha/2m)$, where $m$ = number of intervals formed (probably $p$, but no more). These should be planned a priori.
One-at-a-Time: $c = t_{n_1+n_2-2}(\alpha/2)$

53 Example: Simultaneous $T^2$ and Bonferroni
[Figure: simultaneous $T^2$ and Bonferroni intervals for the example, with $d = (74.4, 201.6)$ and $\delta_o = (0,0)$ marked. Horizontal axis: $\mu_{11} - \mu_{21}$ (on-peak); vertical axis: $\mu_{12} - \mu_{22}$ (off-peak).]

54 Case 3: Large $n_1 - p$ and $n_2 - p$
If $n_1 - p$ and $n_2 - p$ are large, then we do NOT need to assume:
$\Sigma_1 = \Sigma_2$.
$x_{1j}$ multivariate normal.
$x_{2j}$ multivariate normal.
We do need to assume that
Observations between populations are independent.
$x_{11}, \ldots, x_{1,n_1}$ are a random sample from population 1 with $\mu_1$ and $\Sigma_1$.
$x_{21}, \ldots, x_{2,n_2}$ are a random sample from population 2 with $\mu_2$ and $\Sigma_2$.
If $n_1 - p$ and $n_2 - p$ are large, then an approximate sampling distribution for the test statistic $T^2$ is $\chi^2_p$.

55 Large Sample Case
To test, estimate the covariance matrix of the differences, $\Sigma_{\bar{x}_1-\bar{x}_2}$ ... remember case 1?
$$\Sigma_{\bar{x}_1-\bar{x}_2} = \Sigma_{\bar{x}_1}+\Sigma_{\bar{x}_2} = \frac{1}{n_1}\Sigma_1+\frac{1}{n_2}\Sigma_2$$
which we can estimate using
$$\frac{1}{n_1}S_1+\frac{1}{n_2}S_2$$
Test statistic for $H_o: \mu_1-\mu_2 = \delta_o$:
$$T^2 = \big((\bar{x}_1-\bar{x}_2)-\delta_o\big)'\left(\frac{1}{n_1}S_1+\frac{1}{n_2}S_2\right)^{-1}\big((\bar{x}_1-\bar{x}_2)-\delta_o\big) \approx \chi^2_p$$

56 Large Sample Case continued
A 100(1 − α)% confidence region (ellipsoid) for $\delta = \mu_1-\mu_2$ is the set of all $\delta$ that satisfy
$$\big((\bar{x}_1-\bar{x}_2)-\delta\big)'\left(\frac{1}{n_1}S_1+\frac{1}{n_2}S_2\right)^{-1}\big((\bar{x}_1-\bar{x}_2)-\delta\big) \le \chi^2_p(\alpha)$$
For 100(1 − α)% simultaneous $\chi^2$ intervals:
$$a'(\bar{x}_1-\bar{x}_2) \pm \sqrt{\chi^2_p(\alpha)}\,\sqrt{a'\left(\frac{1}{n_1}S_1+\frac{1}{n_2}S_2\right)a}$$
Let's try this for the air conditioner data...

57 Example using Large Sample
What if $\Sigma_1 \ne \Sigma_2$? $n_1$ and $n_2$ may be large enough to use the large sample theory.
$$\frac{1}{n_1}S_1+\frac{1}{n_2}S_2 = \frac{1}{45}[\,\cdots\,]+\frac{1}{55}[\,\cdots\,] = [\,\cdots\,]$$
$$\left[\frac{1}{n_1}S_1+\frac{1}{n_2}S_2\right]^{-1} = (10^{-4})\,[\,\cdots\,]$$

58 Example: Large Sample Test Statistic
Test $H_o: \delta = 0$. The test statistic is
$$(\bar{x}_1-\bar{x}_2)'\left[\frac{1}{n_1}S_1+\frac{1}{n_2}S_2\right]^{-1}(\bar{x}_1-\bar{x}_2) = (74.4, 201.6)\,(10^{-4})\,[\,\cdots\,]\,(74.4, 201.6)' = [\,\cdots\,]$$
which, for $\alpha = .05$, exceeds the critical value of 5.99 from $\chi^2_p$ with $p = 2$ (the p-value < .005).
Compare this with $T^2 = [\,\cdots\,]$ using $S_{pool}$ (where we assumed that $\Sigma_1 = \Sigma_2$).
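A minimal sketch of the large-sample statistic, which uses $\frac{1}{n_1}S_1+\frac{1}{n_2}S_2$ in place of the pooled matrix. The difference vector is from the slides; the entries of `S1` and `S2` are an assumption (textbook values for this example). The critical value uses the fact that, for 2 degrees of freedom only, the upper-$\alpha$ point of $\chi^2$ is $-2\ln\alpha$.

```python
import math

n1, n2 = 45, 55
d = (74.4, 201.6)   # xbar1 - xbar2, from the slides
# ASSUMED sample covariance matrices (textbook values, not in the slide text)
S1 = ((13825.3, 23823.4), (23823.4, 73107.4))
S2 = ((8632.0, 19616.7), (19616.7, 55964.5))

# Unpooled estimate of the covariance of (xbar1 - xbar2)
V = [[S1[i][j] / n1 + S2[i][j] / n2 for j in range(2)] for i in range(2)]

# d' V^{-1} d written out for the symmetric 2x2 case
det = V[0][0] * V[1][1] - V[0][1] ** 2
t2_large = (V[1][1] * d[0] ** 2 - 2 * V[0][1] * d[0] * d[1]
            + V[0][0] * d[1] ** 2) / det

chi2_crit = -2 * math.log(0.05)   # upper 5% point of chi-square with 2 df
print(f"T2 = {t2_large:.2f}, chi-square critical value = {chi2_crit:.2f}")
```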

59 Large Sample $\chi^2$ Intervals
Using the same linear combination vectors as above:
$a_1' = (1, 0)$, so $a_1'\delta = a_1'(\mu_1-\mu_2) = \mu_{11}-\mu_{21}$
and
$a_2' = (0, 1)$, so $a_2'\delta = a_2'(\mu_1-\mu_2) = \mu_{12}-\mu_{22}$
$$74.4 \pm \sqrt{\chi^2_2(.05)}\,\sqrt{a_1'\left(\frac{1}{n_1}S_1+\frac{1}{n_2}S_2\right)a_1} = (21.7,\ 127.1)$$
$$201.6 \pm \sqrt{\chi^2_2(.05)}\,\sqrt{a_2'\left(\frac{1}{n_1}S_1+\frac{1}{n_2}S_2\right)a_2} = (75.8,\ 327.4)$$
which are very similar to the $T^2$ intervals given previously.
Note: $\chi^2_2(.05) = 5.99$
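The two intervals can be checked numerically with the sketch below. The entries of `S1` and `S2` are an assumption (textbook values for this example); with them, the end points agree with the (21.7, 127.1) and (75.8, 327.4) reported on the slide. The function name `chi2_interval` is hypothetical.

```python
import math

n1, n2 = 45, 55
d = (74.4, 201.6)   # xbar1 - xbar2, from the slides
# ASSUMED sample covariance matrices (textbook values, not in the slide text)
S1 = ((13825.3, 23823.4), (23823.4, 73107.4))
S2 = ((8632.0, 19616.7), (19616.7, 55964.5))

V = [[S1[i][j] / n1 + S2[i][j] / n2 for j in range(2)] for i in range(2)]
chi2_crit = -2 * math.log(0.05)   # chi-square upper 5% point, valid for 2 df only


def chi2_interval(a):
    """Large-sample simultaneous interval for a'(mu1 - mu2)."""
    center = a[0] * d[0] + a[1] * d[1]
    var = sum(a[i] * V[i][j] * a[j] for i in range(2) for j in range(2))
    half = math.sqrt(chi2_crit) * math.sqrt(var)
    return center - half, center + half


on_peak = chi2_interval((1, 0))
off_peak = chi2_interval((0, 1))
print("on-peak :", tuple(round(v, 1) for v in on_peak))
print("off-peak:", tuple(round(v, 1) for v in off_peak))
```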

60 Samples with $n_1 = n_2$
We obtained similar results in our large and small sample procedures; one possible reason is that $n_1 \approx n_2$.
Note that when $n_1 = n_2 = n$,
$$\frac{n-1}{n+n-2} = \frac{1}{2}$$
so
$$\frac{1}{n}S_1+\frac{1}{n}S_2 = \frac{1}{n}(S_1+S_2) = \frac{2}{n}\cdot\frac{(n-1)S_1+(n-1)S_2}{n+n-2} = \left(\frac{1}{n}+\frac{1}{n}\right)S_{pool}$$
This implies that with equal sample sizes, the large sample procedure for computing an estimate of $\Sigma_{\bar{x}_1-\bar{x}_2}$ is essentially the same as the procedure based on the pooled covariance matrix.
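The equal-sample-size identity is easy to confirm numerically. In the sketch below the sample size and both covariance matrices are hypothetical values chosen only for illustration; they are not data from the slides.

```python
# Hypothetical inputs: any common n and any two covariance matrices will do.
n = 30
S1 = ((4.0, 1.0), (1.0, 9.0))
S2 = ((6.0, 2.0), (2.0, 5.0))

# Large-sample estimate: (1/n) S1 + (1/n) S2
large_sample = [[S1[i][j] / n + S2[i][j] / n for j in range(2)] for i in range(2)]

# Pooled estimate: (1/n + 1/n) * S_pool, with S_pool from n1 = n2 = n
S_pool = [[((n - 1) * S1[i][j] + (n - 1) * S2[i][j]) / (n + n - 2)
           for j in range(2)] for i in range(2)]
pooled = [[(1 / n + 1 / n) * S_pool[i][j] for j in range(2)] for i in range(2)]

# Largest entry-wise difference between the two estimates
max_gap = max(abs(large_sample[i][j] - pooled[i][j])
              for i in range(2) for j in range(2))
print(max_gap)
```

The gap is zero up to floating-point rounding, which is why the two procedures agree when the groups are the same size.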

61 Case 4: Small sample with $\Sigma_1 \ne \Sigma_2$
We should consider whether $\Sigma_1 = \Sigma_2$ is a reasonable assumption.
If $n_1 - p$ and $n_2 - p$ are small and $\Sigma_1 \ne \Sigma_2$, then there's no nice measure like $T^2$ whose distribution does not depend on $\Sigma_1$ and $\Sigma_2$.
Rule-of-Thumb for when to worry about $\Sigma_1 \ne \Sigma_2$: Don't worry if ratios $\sigma_{1,ik}/\sigma_{2,ik} \le 4$ (or $\sigma_{2,ik}/\sigma_{1,ik} \le 4$).
Our air conditioner example (ratios of corresponding entries of $S_1$ and $S_2$):
(1, 1): $s_{1,11}/s_{2,11} = 1.60$
(1, 2): $s_{1,12}/s_{2,12} = 1.21$
(2, 2): $s_{1,22}/s_{2,22} = 1.31$
all $\le 4$
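The rule-of-thumb check amounts to dividing corresponding entries of the two sample covariance matrices. A small sketch, where the entries of `S1` and `S2` are an assumption (textbook values for this example) that reproduce the ratios 1.60, 1.21 and 1.31 listed above:

```python
# ASSUMED sample covariance matrices (textbook values, not in the slide text)
S1 = ((13825.3, 23823.4), (23823.4, 73107.4))
S2 = ((8632.0, 19616.7), (19616.7, 55964.5))

ratios = {}
for i, k in ((0, 0), (0, 1), (1, 1)):
    r = S1[i][k] / S2[i][k]
    # Report the larger of r and 1/r so the check is symmetric in the two groups
    ratios[(i + 1, k + 1)] = max(r, 1 / r)

worry = any(r > 4 for r in ratios.values())
for idx, r in ratios.items():
    print(idx, round(r, 2))
print("worry about unequal covariance matrices:", worry)
```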

62 Testing whether $\Sigma_1 = \Sigma_2$
We could use Bartlett's test, but this assumes
Data are multivariate normal (not just that the means are multivariate normal).
$\Sigma_1 = \Sigma_2$.
So if you reject $H_o$ (significant test statistic), it could be because
$\Sigma_1 \ne \Sigma_2$.
Data are not normal.
Or both: $\Sigma_1 \ne \Sigma_2$ and data are not normal.
Additionally, for a valid test you need large samples, but if you have large samples you don't need to assume that $\Sigma_1 = \Sigma_2$ (or normality of the data).

63 Revisiting Examining Why
Our motivation for computing confidence intervals for components of the mean vector was to come to a conclusion about individual means. The simultaneous $T^2$ intervals hold for any $a$.
The $a$ that leads to the largest population difference is proportional to
$$S_{pool}^{-1}(\bar{x}_1-\bar{x}_2) = \hat{a}$$
If the null hypothesis using $T^2$ is rejected, then $\hat{a}'(\bar{x}_1-\bar{x}_2)$ has the largest possible statistic:
$$\hat{a}'(\bar{x}_1-\bar{x}_2) = (\bar{x}_1-\bar{x}_2)'\,S_{pool}^{-1}\,(\bar{x}_1-\bar{x}_2)$$
which is a multiple of $T^2$.
$\hat{a}$ is useful for interpreting and describing why $H_o$ was rejected.

64 Interpretation
For the air conditioner data (using large sample), $\hat{a}$ is proportional to
$$\left[\frac{1}{n_1}S_1+\frac{1}{n_2}S_2\right]^{-1}(\bar{x}_1-\bar{x}_2) = [\,\cdots\,]\begin{pmatrix}74.4\\ 201.6\end{pmatrix} = \begin{pmatrix}.041\\ .063\end{pmatrix}$$
So the difference in $X_2$ (off-peak consumption) contributes more (.063 > .041) to the rejection of $H_o: \mu_1-\mu_2 = 0$ via the $T^2$ test than $X_1$ (on-peak energy consumption).
Note: $\hat{a}'(\mu_1-\mu_2) = .041(\mu_{11}-\mu_{21}) + .063(\mu_{12}-\mu_{22})$
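A sketch of how the coefficient vector can be computed in the large-sample case. The entries of `S1` and `S2` are an assumption (textbook values for this example); with them the result rounds to the (.041, .063) shown above, and the combination $\hat{a}'(\bar{x}_1-\bar{x}_2)$ reproduces the large-sample test statistic.

```python
n1, n2 = 45, 55
d = (74.4, 201.6)   # xbar1 - xbar2, from the slides
# ASSUMED sample covariance matrices (textbook values, not in the slide text)
S1 = ((13825.3, 23823.4), (23823.4, 73107.4))
S2 = ((8632.0, 19616.7), (19616.7, 55964.5))

V = [[S1[i][j] / n1 + S2[i][j] / n2 for j in range(2)] for i in range(2)]
det = V[0][0] * V[1][1] - V[0][1] ** 2

# a_hat = V^{-1} d, using the explicit inverse of a symmetric 2x2 matrix
a_hat = ((V[1][1] * d[0] - V[0][1] * d[1]) / det,
         (-V[0][1] * d[0] + V[0][0] * d[1]) / det)

# a_hat' d equals d' V^{-1} d, i.e. the large-sample T^2 statistic
statistic = a_hat[0] * d[0] + a_hat[1] * d[1]
print(tuple(round(v, 3) for v in a_hat), round(statistic, 2))
```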

65 Summary regarding Inferences about µ
Four reasons for taking a multivariate approach to hypothesis testing:
Reason 1: If you do $p$ univariate ($t$) tests, you have an inflated type I error rate (i.e., the actual $\alpha$ is larger than you want it to be). With a multivariate test, the exact $\alpha$ level is under your control.
e.g., if $p = 5$ and you perform $p$ separate univariate tests, all at $\alpha = .05$, then
Prob{at least 1 false rejection} = Prob{at least 1 Type I error} > .05
In the extreme case where all the variables are independent, if $H_o$ is true,
Prob{at least 1 false rejection} = 1 − Prob{all $p$ retained} = $1 - (1-\alpha)^p$
Comparisons of Two Means Slide 65 of 68
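The independent-variables formula can be evaluated directly. A short sketch, with `familywise_error` as a hypothetical helper name:

```python
def familywise_error(alpha, p):
    """P(at least one false rejection) for p independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** p


for p in (5, 10):
    print(p, round(familywise_error(0.05, p), 3))
```

With correlated variables the true rate lies between the per-test $\alpha$ and this upper value.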

66 Error Rates & More Reasons
Overall error rates are somewhere between:
For $p = 5$: .05 and .23
For $p = 10$: .05 and .40
Reason 2: Univariate tests ignore (completely) the correlations between the variables. Multivariate tests make direct use of the covariance matrix.
Reason 3: Multivariate tests are more powerful (in most cases). Sometimes all $p$ univariate tests fail to reach significance, but the multivariate test is significant because small effects combine to jointly indicate significance.
Note: For a given sample size, there is a limit to the number of variables a multivariate test can handle without losing power.
Comparisons of Two Means Slide 66 of 68
