
1 Comparisons of Two Means
Edps/Soc 584 and Psych 594, Applied Multivariate Statistics
Carolyn J. Anderson, Department of Educational Psychology, University of Illinois at Urbana-Champaign
© Board of Trustees, University of Illinois
Comparisons of Two Means, Slide 1 of 68

2 Outline
- Paired comparisons: $p$ variables, 2 matched pairs (i.e., dependent samples): $H_o: \mu_1 - \mu_2 = \delta = 0$
- Repeated measures designs: 1 variable measured multiple times: $H_o: L\mu = 0$
- Two independent samples: four cases of $H_o: \mu_1 = \mu_2$
- Missing data: later in the semester
- Summary
Reading: Johnson & Wichern.

3 Paired Comparisons (dependent samples)
Paired observations arise in a number of different ways:
- Every subject (case) responds twice (e.g., pre/post test).
- Cases may be matched (on relevant variables) and then randomly assigned to one of two treatments.
- Naturally occurring pairs: husbands/wives, siblings, etc.
The plan: review the univariate case and then generalize to the multivariate situation.
For $j = 1,\ldots,n$ (number of pairs), let
$X_{j1}$ = measurement (response) of the $j$th case given treatment 1,
$X_{j2}$ = measurement (response) of the $j$th case given treatment 2.
We want to examine the differences $D_j = X_{j1} - X_{j2}$.

4 Univariate Case
$D_j = X_{j1} - X_{j2}$
If $D_j \sim N(\delta, \sigma_D^2)$, then the statistic
$$t = \frac{\bar{D} - \delta}{s_D/\sqrt{n}} \sim \text{Student's } t \text{ distribution with } n-1 \text{ degrees of freedom},$$
where
$$\bar{D} = \frac{1}{n}\sum_{j=1}^{n} D_j = \frac{1}{n}\sum_{j=1}^{n}(X_{j1} - X_{j2}), \qquad s_D^2 = \frac{1}{n-1}\sum_{j=1}^{n}(D_j - \bar{D})^2.$$
Test $H_o: \delta = 0$ versus $H_A: \delta \neq 0$ (or $H_o: \delta = \delta_o$ versus $H_A: \delta \neq \delta_o$).
A $100(1-\alpha)\%$ confidence interval (estimate) of $\delta$:
$$\bar{D} \pm t_{n-1}(\alpha/2)\,\frac{s_D}{\sqrt{n}}$$
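The univariate paired $t$ statistic can be sketched in a few lines of Python. This is a minimal illustration only: the function name `paired_t` and the pre/post scores are hypothetical and do not come from the slides.

```python
import math
import statistics

def paired_t(x1, x2, delta0=0.0):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    d = [a - b for a, b in zip(x1, x2)]   # D_j = X_j1 - X_j2
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)             # sample standard deviation, divisor n - 1
    t = (d_bar - delta0) / (s_d / math.sqrt(n))
    return d_bar, t, n - 1

# Hypothetical pre/post scores for n = 5 cases (illustrative only).
pre = [12.0, 15.0, 11.0, 14.0, 13.0]
post = [10.0, 14.0, 8.0, 13.0, 10.0]
d_bar, t, df = paired_t(pre, post)
print(d_bar, round(t, 4), df)
```

The resulting `t` would be compared with the $t_{n-1}(\alpha/2)$ critical value from a table or a statistics package.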

5 Advantage
The advantage of looking at differences using paired comparisons:
It eliminates effects of case-to-case variation, because the variance (standard deviation) of the differences is reduced to the extent that the scores/measurements are positively correlated:
$$\sigma_D^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 - 2\sigma_{X_1,X_2}$$
This result comes from what we know about linear combinations:
$$D = a'X = (1, -1)\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = X_1 - X_2,$$
so $\mu_D = a'\mu$ and $\mathrm{var}(D) = a'\Sigma a$, where $\mu$ ($2 \times 1$) is the mean vector for $X$ and $\Sigma$ ($2 \times 2$) is the covariance matrix for $X$.
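The variance identity can be checked numerically. The sketch below uses hypothetical paired scores (not data from the slides) and compares the variance of the differences computed directly with the value given by $\sigma_{X_1}^2 + \sigma_{X_2}^2 - 2\sigma_{X_1,X_2}$.

```python
import statistics

# Hypothetical paired scores (illustrative only).
x1 = [12.0, 15.0, 11.0, 14.0, 13.0]
x2 = [10.0, 14.0, 8.0, 13.0, 10.0]
d = [a - b for a, b in zip(x1, x2)]

var1 = statistics.variance(x1)            # divisor n - 1
var2 = statistics.variance(x2)
m1, m2 = statistics.mean(x1), statistics.mean(x2)
cov12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (len(x1) - 1)

var_d_direct = statistics.variance(d)     # variance of the differences
var_d_formula = var1 + var2 - 2 * cov12   # a' S a with a = (1, -1)
var_if_uncorrelated = var1 + var2         # what the sum would be with zero covariance

print(var_d_direct, var_d_formula, var_if_uncorrelated)
```

With these toy numbers the positive covariance shrinks the variance of the differences well below `var1 + var2`, which is the point of pairing.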

6 Multivariate Situation
Record $p$ variables for each treatment (condition) for each member of each pair. For case $j$, we have
$X_{1j1}$ = variable 1, treatment 1; $X_{2j1}$ = variable 1, treatment 2
$X_{1j2}$ = variable 2, treatment 1; $X_{2j2}$ = variable 2, treatment 2
...
$X_{1jp}$ = variable $p$, treatment 1; $X_{2jp}$ = variable $p$, treatment 2
where $j = 1,\ldots,n$ ($n$ = the number of pairs that we have).
We study the differences
$$D_{j1} = X_{1j1} - X_{2j1}, \quad D_{j2} = X_{1j2} - X_{2j2}, \quad \ldots, \quad D_{jp} = X_{1jp} - X_{2jp},$$
collected in the vector $D_j = (D_{j1}, D_{j2}, \ldots, D_{jp})'$.

7 Needed for Statistical Inference
Assume the $D_j \sim N_p(\delta, \Sigma_D)$ and i.i.d. for $j = 1,\ldots,n$, where $\delta = (\delta_1, \delta_2, \ldots, \delta_p)' = E(D_j)$.
If the differences $D_1, D_2, \ldots, D_n$ are a random sample from a $N_p(\delta, \Sigma_D)$ population, then
$$T^2 = n(\bar{D} - \delta)' S^{-1} (\bar{D} - \delta) \sim \frac{(n-1)p}{n-p}\, F_{p,\,n-p},$$
where $S$ is the sample covariance matrix of the differences.
Modification for large samples: if $n$ and $(n-p)$ are large, then $T^2$ is approximately distributed as a $\chi^2_p$ random variable regardless of the distribution of $D_j$ (i.e., $D_j$ may not be multivariate normal, but $\delta$ and $\Sigma_D^{-1}$ exist).

8 Statistical Inference
Suppose that we have observations $d_j = (d_{j1}, d_{j2}, \ldots, d_{jp})'$ for $j = 1,\ldots,n$.
Descriptive statistics:
$$\bar{d}_{(p \times 1)} = \frac{1}{n}\sum_{j=1}^{n} d_j \quad\text{and}\quad S_{d,(p \times p)} = \frac{1}{n-1}\sum_{j=1}^{n}(d_j - \bar{d})(d_j - \bar{d})'$$
Hypothesis test: $H_o: \delta = 0$ versus $H_A: \delta \neq 0$, assuming $D_j \sim N_p(\delta, \Sigma_D)$ and i.i.d.
Reject $H_o$ if
$$T^2 = n\,\bar{d}' S_d^{-1} \bar{d} > \frac{(n-1)p}{n-p}\, F_{p,\,n-p}(\alpha)$$
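A minimal sketch of this test for $p = 2$, written in plain Python so that the $2 \times 2$ inverse is explicit. The helper names and the difference vectors are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def covariance_2x2(rows):
    """Sample covariance matrix (divisor n - 1) for two-column data."""
    n = len(rows)
    m = mean_vector(rows)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        for a in range(2):
            for b in range(2):
                s[a][b] += (r[a] - m[a]) * (r[b] - m[b]) / (n - 1)
    return s

def hotelling_t2_paired(diffs):
    """T^2 = n * dbar' S_d^{-1} dbar for H0: delta = 0, with p = 2."""
    n, p = len(diffs), 2
    d = mean_vector(diffs)
    s = covariance_2x2(diffs)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    t2 = n * sum(d[a] * inv[a][b] * d[b] for a in range(2) for b in range(2))
    f_stat = (n - p) / ((n - 1) * p) * t2   # refer to F with (p, n - p) df
    return t2, f_stat

# Hypothetical differences (d_j1, d_j2) for n = 5 pairs (illustrative only).
diffs = [[2.0, 1.0], [1.0, 0.0], [3.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
t2, f_stat = hotelling_t2_paired(diffs)
print(t2, f_stat)
```

Here `f_stat` is the rescaled statistic that would be compared with $F_{p,\,n-p}(\alpha)$; obtaining that quantile needs a table or a statistics package, which this sketch deliberately leaves out.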

9 If you Reject $H_o: \delta = 0$
Confidence region: all $\delta$ such that
$$n(\bar{d} - \delta)' S_d^{-1} (\bar{d} - \delta) \le \frac{(n-1)p}{n-p}\, F_{p,\,n-p}(\alpha)$$
Simultaneous $T^2$ intervals for individual differences of component means:
$$\delta_i:\quad \bar{d}_i \pm \sqrt{\frac{(n-1)p}{n-p}\, F_{p,\,n-p}(\alpha)}\;\sqrt{\frac{s_{d_i}^2}{n}}$$
where $\bar{d}_i$ is the mean difference of the $i$th variable and $s_{d_i}^2$ is the $i$th diagonal element of $S_d$.
Bonferroni $100(1-\alpha)\%$ confidence intervals:
$$\delta_i:\quad \bar{d}_i \pm t_{n-1}\!\left(\frac{\alpha}{2m}\right)\sqrt{\frac{s_{d_i}^2}{n}}$$
where $m$ = the number of confidence intervals.

10 Large Samples
For large $(n - p)$ (i.e., $D_j$ need not be multivariate normal):
$$\frac{(n-1)p}{n-p}\, F_{p,\,n-p}(\alpha) \approx \chi^2_p(\alpha)$$

11 Example: The data
Data from Table 5.9 of Rencher (2007):
"Each of 15 students wrote an informal and a formal essay (Kramer, 1972, p100). The variables recorded were the number of words and number of verbs."
y1 = words in informal essay
y2 = verbs in informal essay
y3 = words in formal essay
y4 = verbs in formal essay
These are count data. Does the CLT kick in? $n = 15$ is smallish.
Sample statistics: the difference is $d$ = words [verbs] informal $-$ words [verbs] formal, with mean vector $\bar{d} = (32.80, 3.53)'$ (words, verbs) and $2 \times 2$ sample covariance matrix $S$.

12 Plot of the Data
[Figure: plot of the data]

13 Plot of the Data: Cases Connected
[Figure: plot of the data with the two observations from each case connected]

14 Plot of the Differences
[Figure: plot of the differences]

15 Example: Test $H_o: \delta = 0$ versus $H_A: \delta \neq 0$
(i.e., the number of words and verbs in informal and formal essays are the same).
$$T^2 = 15\,(32.80,\ 3.53)\, S^{-1} \begin{pmatrix} 32.80 \\ 3.53 \end{pmatrix} \approx 15.19$$
Critical value: $(14(2)/13)\,F_{2,13}(.05) = 8.20$.
Alternatively, $\big(13/((14)2)\big)\,T^2 = 7.053$, which is distributed as $F_{2,13}$ and has a p-value of $.008$.
Conclusion: Reject $H_o$. The data support the conclusion that the number of words and verbs in informal essays are not equal to the number in formal ones.
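The reported numbers can be cross-checked with a few lines of arithmetic. The sketch below only uses the values stated on the slide ($n = 15$, $p = 2$, $F = 7.053$, critical value $8.20$); the closed-form tail probability is the standard result for an $F$ distribution with 2 numerator degrees of freedom, used here so that no statistics library is needed.

```python
n, p = 15, 2
f_reported = 7.053       # (n - p) / ((n - 1) p) * T^2, as reported on the slide
crit_reported = 8.20     # ((n - 1) p / (n - p)) * F_{2,13}(.05), as reported

scale = (n - 1) * p / (n - p)       # 28 / 13
t2 = scale * f_reported             # recover T^2 from the F statistic
f_crit = crit_reported / scale      # implied F_{2,13}(.05)

# For numerator df = 2: P(F_{2,v} > f) = (1 + 2 f / v) ** (-v / 2).
v = n - p
p_value = (1 + 2 * f_reported / v) ** (-v / 2)
reject = t2 > crit_reported

print(round(t2, 2), round(f_crit, 2), round(p_value, 3), reject)
```

The closed form applies only when the numerator degrees of freedom equal 2, which is the case here because $p = 2$.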

16 95% Confidence Region for $\delta$
From SAS > Solutions > Interactive Data Analysis: Analyze > Multivariate (scatter plot, curves)
[Figure: 95% confidence region for $\delta$]

17 95% Confidence Region for the Mean
[Figure: 95% confidence region for the mean]

18 SAS for the Last Figure
    proc sgscatter data=essay;
      compare y=dverbs x=dwords / ellipse=(type=mean);
      title '95% Confidence Region for the mean Difference';
    run;

19 Confidence Region, $T^2$ & Bonferroni Intervals
[Figure: confidence region with $T^2$ and Bonferroni intervals; the axes are Words and Verbs, and the points $\bar{d} = (32.80, 3.53)$ and $\delta_o = (0, 0)$ are marked]

20 Another way to calculate $T^2$ for paired comparisons
So far we've divided the sample; that is, $D = X_1 - X_2$. Now we'll consider a Full Sample method that treats every case as a pair, with $p$ measures on each member of the pair.
Condition   Pair 1        Pair 2        ...   Pair j        ...   Pair n
(a)         p variables   p variables   ...   p variables   ...   p variables
(b)         p variables   p variables   ...   p variables   ...   p variables
So we have $2p$ variables measured for each case (pair).
In an experimental situation, the conditions are assumed to have been randomly assigned to members of the pairs.

21 Full Data Method for Paired Comparisons
Full data matrix:
$$X_{(n \times 2p)} = \begin{pmatrix} X_{111} & X_{112} & \cdots & X_{11p} & X_{121} & X_{122} & \cdots & X_{12p} \\ X_{211} & X_{212} & \cdots & X_{21p} & X_{221} & X_{222} & \cdots & X_{22p} \\ \vdots & & & \vdots & \vdots & & & \vdots \\ X_{n11} & X_{n12} & \cdots & X_{n1p} & X_{n21} & X_{n22} & \cdots & X_{n2p} \end{pmatrix} = (X_1 \mid X_2),$$
where $X_1$ and $X_2$ are each $n \times p$.
Full sample mean vector:
$$\bar{X}' = (\bar{X}_{11}, \bar{X}_{12}, \ldots, \bar{X}_{1p}, \bar{X}_{21}, \ldots, \bar{X}_{2p}) = (\bar{X}_1', \bar{X}_2')$$

22 Full Data Method for Paired Comparisons
Full data sample covariance matrix:
$$S_{(2p \times 2p)} = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}$$
where
$S_{11}$ is the $(p \times p)$ covariance matrix for $X_1$,
$S_{22}$ is the $(p \times p)$ covariance matrix for $X_2$,
$S_{12} = S_{21}'$ is the $(p \times p)$ covariance matrix between $X_1$ and $X_2$.
Define a contrast matrix:
$$C_{(p \times 2p)} = (I_{p \times p} \mid -I_{p \times p})$$
What condition do you need to have a contrast matrix?

23 Computations for Full Data
Let $x_{j,(2p \times 1)}$ = the $j$th row of $X_{(n \times 2p)}$ written as a column vector. Then
$$d_j = Cx_j, \qquad \bar{d} = C\bar{x} = C\Big(\frac{1}{n}\sum_{j=1}^{n} x_j\Big), \qquad S_d = CSC'.$$
Putting all of this together yields
$$T^2 = n(C\bar{x})'(CSC')^{-1}(C\bar{x}) = n\,\bar{x}'C'(CSC')^{-1}C\bar{x}$$
With this method, we don't have to split the data set and compute the differences.
We'll see more uses of contrast matrices relatively soon.
SAS/IML code for essay example.
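The equivalence of the two routes can be illustrated with a small pure-Python sketch for $p = 2$. The data matrix and helper names are hypothetical; the point is only that $n(C\bar{x})'(CSC')^{-1}(C\bar{x})$ agrees with the $T^2$ computed from explicitly formed differences.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def covariance(rows):
    """Sample covariance matrix with divisor n - 1."""
    n, k = len(rows), len(rows[0])
    m = mean_vector(rows)
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def t2_from(mean, cov, n):
    """n * mean' cov^{-1} mean for a 2 x 2 covariance matrix."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    return n * sum(mean[a] * inv[a][b] * mean[b]
                   for a in range(2) for b in range(2))

# Hypothetical full data: each row is (x_11, x_12, x_21, x_22) for one pair.
X = [[12.0, 5.0, 10.0, 4.0],
     [15.0, 6.0, 14.0, 6.0],
     [11.0, 7.0, 8.0, 5.0],
     [14.0, 4.0, 13.0, 3.0],
     [13.0, 6.0, 10.0, 5.0]]
n = len(X)
C = [[1.0, 0.0, -1.0, 0.0],    # C = (I | -I) with p = 2
     [0.0, 1.0, 0.0, -1.0]]

# Full-data route: C xbar and C S C'.
xbar = mean_vector(X)
c_xbar = [row[0] for row in matmul(C, [[v] for v in xbar])]
csc = matmul(matmul(C, covariance(X)), transpose(C))
t2_full = t2_from(c_xbar, csc, n)

# Difference route: form d_j = C x_j explicitly.
D = [[r[0] - r[2], r[1] - r[3]] for r in X]
t2_diff = t2_from(mean_vector(D), covariance(D), n)
print(t2_full, t2_diff)
```

Both routes give the same statistic up to floating-point rounding, since $C\bar{x} = \bar{d}$ and $CSC' = S_d$.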

24 Repeated Measures for comparing conditions (treatments, etc.)
This is another generalization of the univariate paired $t$ test.
Situation: $q$ conditions are compared with respect to one response variable. Each case receives each treatment once over successive periods of time. The order of the treatments should be randomized (and counterbalanced if possible).
Example from Cochran & Cox (1957) (I got this from Timm 1980): There are four calculator designs and each person does specified computations. Their speed is recorded for each of the four calculators. The order of calculator use was randomly assigned.
This is a repeated measures design because each case (person) gets each treatment (calculator): we have repeated observations or measurements on each case.

25 Repeated Measures
Let the $j$th observation equal
$$x_j = (x_{j1}, x_{j2}, \ldots, x_{jq})', \qquad j = 1,\ldots,n,$$
where $x_{ji}$ = response or measurement of the $i$th treatment on the $j$th case.
Question (hypothesis): Is there a treatment effect?
$$H_o: \mu_1 = \mu_2 = \cdots = \mu_q \quad\text{versus}\quad H_A: \text{Not } H_o$$
This is the same hypothesis tested in univariate, repeated measures ANOVA.

26 Repeated Measures as a Multivariate Test
To test this as a multivariate mean vector, we need to use contrasts of the components of $\mu = E(x_j) = (\mu_1, \mu_2, \ldots, \mu_q)'$.
Assume $X_j \sim N_q(\mu, \Sigma)$. Set up a contrast:
$$\underbrace{\begin{pmatrix} \mu_1 - \mu_2 \\ \mu_1 - \mu_3 \\ \vdots \\ \mu_1 - \mu_q \end{pmatrix}}_{(q-1) \times 1} = \underbrace{\begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}}_{(q-1) \times q} \underbrace{\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_q \end{pmatrix}}_{q \times 1} = C_1\mu$$
So $H_o: C_1\mu = 0$ (no treatment effect).

27 Contrast Matrices
Any contrast matrix of size $(q-1) \times q$ will do. For example,
$$C_2\mu = \underbrace{\begin{pmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -1 \end{pmatrix}}_{(q-1) \times q} \underbrace{\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_q \end{pmatrix}}_{q \times 1} = \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_2 - \mu_3 \\ \vdots \\ \mu_{q-1} - \mu_q \end{pmatrix}$$
To be a contrast matrix:
- The rows are linearly independent.
- Each row is a contrast vector (its elements sum to zero).

28 Hypothesis and Test for Repeated Measures
The hypothesis of no effects due to treatment in a repeated measures design,
$$H_o: \mu_1 = \mu_2 = \cdots = \mu_q,$$
is the same as performing Hotelling's $T^2$ test of $H_o: C\mu = 0$, where $C$ is a $(q-1) \times q$ contrast matrix.
Given data $x_1, x_2, \ldots, x_n$ and a contrast matrix $C$, the $T^2$ test statistic equals
$$T^2 = n(C\bar{x})'(CSC')^{-1}C\bar{x}$$
Reject $H_o$ if
$$T^2 > \frac{(n-1)(q-1)}{n-q+1}\, F_{(q-1),\,(n-q+1)}(\alpha)$$
Now for our example: plot the data and then SAS/IML.

29 (Scatter) Plot of the Calculator Data
[Figure: scatter plot of the calculator data]

30 Input 1 from SAS/IML
    proc iml;
    * A module that computes Hotelling's T^2 for one-sample tests;
    start Tsq(X,muo,Ts,pvalue);
      n=nrow(X);
      one=j(n,1);
      Xbar = X`*one/n;                       * column vector of means;
      XbarM = one*Xbar`;                     * n x p matrix of means;
      S=(X - XbarM)`*(X - XbarM)/(n-1);      * sample covariance matrix;
      Ts=n*(Xbar-muo)`*inv(S)*(Xbar-muo);
      p=ncol(X);
      dfden=n-p;                             * denominator df of the F reference distribution;
      F=((n-p)/((n-1)*p))*Ts;                * rescale T^2 so that F is F(p, n-p);
      pvalue = 1 - cdf('F',F,p,dfden);
    finish Tsq;

31 Input continued
    X={ , , , , };
    C1={ , , };
    muo={0, 0, 0};
    X1 = X*C1`;
    run stats(X1,n1,Xbar1,W1,S1);
    run Tsq(X1,muo,Tsq1,pvalue1);

32 Output 1 from SAS/IML
Data matrix (5 subjects x 4 variables) = X
Using C1: C1

33 Output 1 continued
X*C1` =
XBAR1: mean of C1*X1 =
TSQ1, PVALUE1: T^2 for C1*mu=0 ----> with p-value =

34 Using Contrast Matrix 2
    C2={ , , };
    X2 = X*C2`;
    run stats(X2,n2,Xbar2,W2,S2);
    run Tsq(X2,muo,Tsq2,pvalue2);
(Partial) output from this:
XBAR2: mean of C2*X2 =
TSQ2, PVALUE2: T^2 for C2*mu=0 ----> with p-value =
With different contrast matrices, we get different $C\bar{x}$ vectors, but $T^2$, the p-value, and the conclusions are exactly the same.
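The invariance of $T^2$ to the choice of contrast matrix can be demonstrated with a small sketch. To keep the inverse $2 \times 2$, this illustration assumes $q = 3$ conditions rather than the four calculators; the data and helper names are hypothetical, and `C1` and `C2` follow the "first versus each other" and "adjacent differences" patterns described on the earlier slides.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def covariance(rows):
    """Sample covariance matrix with divisor n - 1."""
    n, k = len(rows), len(rows[0])
    m = mean_vector(rows)
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def contrast_t2(X, C):
    """T^2 = n (C xbar)' (C S C')^{-1} (C xbar) for a 2-row contrast matrix C."""
    n = len(X)
    xbar = mean_vector(X)
    c_xbar = [sum(c * m for c, m in zip(row, xbar)) for row in C]
    c_t = [list(col) for col in zip(*C)]
    csc = matmul(matmul(C, covariance(X)), c_t)
    det = csc[0][0] * csc[1][1] - csc[0][1] * csc[1][0]
    inv = [[csc[1][1] / det, -csc[0][1] / det],
           [-csc[1][0] / det, csc[0][0] / det]]
    t2 = n * sum(c_xbar[a] * inv[a][b] * c_xbar[b]
                 for a in range(2) for b in range(2))
    return c_xbar, t2

# Hypothetical repeated measures data: 5 cases x 3 conditions (illustrative only).
X = [[10.0, 12.0, 15.0],
     [11.0, 14.0, 14.0],
     [9.0, 13.0, 17.0],
     [12.0, 15.0, 16.0],
     [10.0, 11.0, 18.0]]
C1 = [[1.0, -1.0, 0.0], [1.0, 0.0, -1.0]]   # mu1 - mu2, mu1 - mu3
C2 = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]   # mu1 - mu2, mu2 - mu3

c_xbar_a, t2_a = contrast_t2(X, C1)
c_xbar_b, t2_b = contrast_t2(X, C2)
print(c_xbar_a, c_xbar_b, t2_a, t2_b)
```

The two contrast mean vectors differ, while the two $T^2$ values agree up to rounding. This is expected because `C2` is an invertible linear transformation of `C1`, and the quadratic form is unchanged by such transformations.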

35 $T^2$ and Repeated Measures
As before, the $100(1-\alpha)\%$ confidence region consists of all $C\mu$ such that
$$n(C\bar{x} - C\mu)'(CSC')^{-1}(C\bar{x} - C\mu) \le \frac{(n-1)(q-1)}{n-q+1}\, F_{(q-1),\,(n-q+1)}(\alpha)$$
Simultaneous $T^2$ intervals for a single contrast $c_i'\mu$, where $c_i'$ is the $i$th row of the matrix $C$:
$$c_i'\bar{x} \pm \underbrace{\sqrt{\frac{(n-1)(q-1)}{n-q+1}\, F_{(q-1),\,(n-q+1)}(\alpha)}}_{\text{multiplier}}\ \sqrt{\frac{c_i'Sc_i}{n}}$$
For Bonferroni (or one-at-a-time) confidence intervals, replace the multiplier above the brace by the appropriate value from the $t_{n-1}$ distribution.
For large $n$, can use $\chi^2_{q-1}$.

36 Repeated Measures ANOVA vs multivariate $T^2$
The multivariate $T^2$ is appropriate for situations where we cannot assume that the covariance matrix for $X$ has a particular structure.
With repeated measures ANOVA you must assume that $\Sigma_X$ has a special structure, in particular spherical. The compound symmetric form below satisfies this:
$$\Sigma_X = \begin{pmatrix} \sigma^2 & \tau & \tau \\ \tau & \sigma^2 & \tau \\ \tau & \tau & \sigma^2 \end{pmatrix}$$
Unlikely, but this works too: $\Sigma = \sigma^2 I$.
If the assumptions on the structure of $\Sigma$ are met, then repeated measures ANOVA is more powerful than the multivariate $T^2$, because repeated measures ANOVA takes the structure of $\Sigma$ into account.
If the assumptions on $\Sigma$ are not met, $T^2$ is still valid but repeated measures ANOVA is not.

37 Two Independent Samples
Situation: two samples, each having $p$ measurements, where we have a random sample of size $n_1$ from population 1 and a random sample of size $n_2$ from population 2.
Sample from population 1: $X_{11}, X_{12}, \ldots, X_{1n_1}$
Sample from population 2: $X_{21}, X_{22}, \ldots, X_{2n_2}$
Sample means:
$$\bar{x}_1 = \frac{1}{n_1}\sum_{j=1}^{n_1} x_{1j}, \qquad \bar{x}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} x_{2j}$$
Sample covariance matrices:
$$S_1 = \frac{1}{n_1 - 1}\sum_{j=1}^{n_1}(x_{1j} - \bar{x}_1)(x_{1j} - \bar{x}_1)', \qquad S_2 = \frac{1}{n_2 - 1}\sum_{j=1}^{n_2}(x_{2j} - \bar{x}_2)(x_{2j} - \bar{x}_2)'$$
Hypothesis: $H_o: \mu_1 = \mu_2$

38 Assumptions
1. The sample $X_{11}, X_{12}, \ldots, X_{1n_1}$ is a random sample of size $n_1$ from a $p$-variate population with mean vector $\mu_1$ and covariance matrix $\Sigma_1$.
2. The sample $X_{21}, X_{22}, \ldots, X_{2n_2}$ is a random sample of size $n_2$ from a $p$-variate population with mean vector $\mu_2$ and covariance matrix $\Sigma_2$.
3. The samples are (statistically) independent of each other.
These assumptions are required when we want to test
$H_o: \mu_1 = \mu_2$, or equivalently $\mu_1 - \mu_2 = 0$, versus
$H_A: \mu_1 \neq \mu_2$, or equivalently $\mu_1 - \mu_2 \neq 0$.
If $n_1$ and/or $n_2$ are small, then we must make two additional assumptions:
4. Both populations are multivariate normal.
5. $\Sigma_1 = \Sigma_2$. This is a very strong assumption (stronger than in the univariate case).

39 Case 1: Known $\Sigma_1$ and $\Sigma_2$
To develop the test for independent populations, we'll start by supposing that we know $\Sigma_1$ and $\Sigma_2$ (i.e., we don't have to estimate them) and assume the first 4 assumptions made on the previous slide.
The test statistic would be
$$(\bar{x}_1 - \bar{x}_2)'\Big(\frac{1}{n_1}\Sigma_1 + \frac{1}{n_2}\Sigma_2\Big)^{-1}(\bar{x}_1 - \bar{x}_2) \sim \chi^2_p$$
because
$$(\bar{x}_1 - \bar{x}_2) \sim N_p\Big((\mu_1 - \mu_2),\ \frac{1}{n_1}\Sigma_1 + \frac{1}{n_2}\Sigma_2\Big).$$
Why is $(\bar{x}_1 - \bar{x}_2)$ multivariate normal?
When $H_o$ is true, $\mu_1 - \mu_2 = 0$ and the test statistic should be small.

40 Case 2: $\Sigma_1$ and $\Sigma_2$ Unknown
$\Sigma_1$ and $\Sigma_2$ must be estimated. For this more realistic case, we must also assume $\Sigma_1 = \Sigma_2 = \Sigma$.
Since $\Sigma_1 = \Sigma_2 = \Sigma$, we will estimate $\Sigma$ by pooling the data from the two samples:
$$S_{pool} = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2}{n_1 + n_2 - 2} = \frac{\sum_{j=1}^{n_1}(x_{1j} - \bar{x}_1)(x_{1j} - \bar{x}_1)' + \sum_{j=1}^{n_2}(x_{2j} - \bar{x}_2)(x_{2j} - \bar{x}_2)'}{n_1 + n_2 - 2}$$
$S_{pool}$ is an estimator of $\Sigma$ with $df = n_1 + n_2 - 2$.

41 Distribution of Linear Combination
Consider the linear combination of two random vectors, $\bar{x}_1 - \bar{x}_2$:
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$
$$\Sigma_{\bar{x}_1 - \bar{x}_2} = \mathrm{cov}(\bar{x}_1 - \bar{x}_2) = \mathrm{cov}(\bar{x}_1) + \mathrm{cov}(\bar{x}_2) \ \ (\text{independent samples}) = \frac{1}{n_1}\Sigma + \frac{1}{n_2}\Sigma = \Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)\Sigma,$$
which is estimated by $\big(\frac{1}{n_1} + \frac{1}{n_2}\big)S_{pool}$.
When $x_{11}, \ldots, x_{1n_1}$ is a random sample of size $n_1$ from $N(\mu_1, \Sigma)$ and $x_{21}, \ldots, x_{2n_2}$ is a random sample of size $n_2$ from $N(\mu_2, \Sigma)$, the test statistic for $H_o: \mu_1 - \mu_2 = \delta_o$ is
$$T^2 = ((\bar{x}_1 - \bar{x}_2) - \delta_o)'\Big(\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)S_{pool}\Big)^{-1}((\bar{x}_1 - \bar{x}_2) - \delta_o)$$

42 Distribution of Test Statistic
The test statistic
$$T^2 = ((\bar{x}_1 - \bar{x}_2) - \delta_o)'\Big(\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)S_{pool}\Big)^{-1}((\bar{x}_1 - \bar{x}_2) - \delta_o)$$
has a sampling distribution that is
$$\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}\, F_{p,\,(n_1 + n_2 - p - 1)},$$
or we could just refer
$$\frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)p}\, T^2 \quad\text{to}\quad F_{p,\,(n_1 + n_2 - p - 1)}.$$
Note:
$$\Big(\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)S_{pool}\Big)^{-1} = \Big(\frac{n_1 + n_2}{n_1 n_2}\, S_{pool}\Big)^{-1} = \frac{n_1 n_2}{n_1 + n_2}\,(S_{pool})^{-1}$$
So sometimes you'll see
$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\,((\bar{x}_1 - \bar{x}_2) - \delta_o)'\, S_{pool}^{-1}\, ((\bar{x}_1 - \bar{x}_2) - \delta_o)$$
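A minimal sketch of the pooled two-sample $T^2$ for $p = 2$, computing the statistic in both of the algebraically equivalent forms noted above. The two small samples and the helper names are hypothetical and chosen so the arithmetic can be checked by hand; they are not the electricity data.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def covariance(rows):
    """Sample covariance matrix with divisor n - 1."""
    n, k = len(rows), len(rows[0])
    m = mean_vector(rows)
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]

def quad_form_inv(m, v):
    """Return v' m^{-1} v for a 2 x 2 matrix m."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    return sum(v[a] * inv[a][b] * v[b] for a in range(2) for b in range(2))

def two_sample_t2(x1, x2, delta0=(0.0, 0.0)):
    n1, n2, p = len(x1), len(x2), 2
    s1, s2 = covariance(x1), covariance(x2)
    s_pool = [[((n1 - 1) * s1[a][b] + (n2 - 1) * s2[a][b]) / (n1 + n2 - 2)
               for b in range(p)] for a in range(p)]
    m1, m2 = mean_vector(x1), mean_vector(x2)
    diff = [m1[a] - m2[a] - delta0[a] for a in range(p)]
    scaled = [[(1 / n1 + 1 / n2) * s_pool[a][b] for b in range(p)]
              for a in range(p)]
    t2 = quad_form_inv(scaled, diff)                            # first form
    t2_alt = n1 * n2 / (n1 + n2) * quad_form_inv(s_pool, diff)  # second form
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2       # F(p, n1+n2-p-1)
    return t2, t2_alt, f_stat

# Hypothetical samples with p = 2 measurements (illustrative only).
sample1 = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [2.0, 2.0]]
sample2 = [[4.0, 3.0], [5.0, 5.0], [6.0, 4.0], [5.0, 4.0]]
t2, t2_alt, f_stat = two_sample_t2(sample1, sample2)
print(t2, t2_alt, f_stat)
```

The rescaled `f_stat` would be compared with $F_{p,\,n_1+n_2-p-1}(\alpha)$ from a table or a statistics package.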

43 Example: Two Independent Samples $T^2$
From Johnson & Wichern: Wisconsin homeowners without air conditioning ($n_1 = 45$) and those with air conditioning ($n_2 = 55$).
$X_1$ = total on-peak consumption of electricity, July 1977 (in kilowatts)
$X_2$ = total off-peak consumption of electricity, July 1977 (in kilowatts)
$$\bar{x}_1' = (204.4,\ 556.6), \qquad \bar{x}_2' = (130.0,\ 355.0), \qquad (\bar{x}_1 - \bar{x}_2)' = (74.4,\ 201.6)$$
With sample covariance matrices $S_1$ and $S_2$,
$$S_{pool} = \frac{44S_1 + 54S_2}{98}$$

44 Example continued
The estimated covariance matrix of $(\bar{x}_1 - \bar{x}_2)$ is
$$S_{\bar{x}_1 - \bar{x}_2} = \Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)S_{pool} = \Big(\frac{1}{45} + \frac{1}{55}\Big)S_{pool}$$
To test $H_o: \delta = (\mu_1 - \mu_2) = 0$, compute the test statistic
$$T^2 = (\bar{x}_1 - \bar{x}_2)'\, S_{\bar{x}_1 - \bar{x}_2}^{-1}\, (\bar{x}_1 - \bar{x}_2), \qquad (\bar{x}_1 - \bar{x}_2)' = (74.4,\ 201.6)$$
For $\alpha = .05$, the critical value is $(98(2)/97)\,F_{2,97}(.05) = 2.02(3.1) = 6.26$.
Conclusion...

45 100(1 − α)% Confidence Region for $\mu_1 - \mu_2$

Is the set of all $\delta = \mu_1 - \mu_2$ such that

$$\frac{n_1 n_2}{n_1+n_2}\, \big((\bar{x}_1 - \bar{x}_2) - \delta\big)'\, S_{pool}^{-1}\, \big((\bar{x}_1 - \bar{x}_2) - \delta\big) \le c^2$$

where

$$c^2 = \frac{(n_1+n_2-2)\,p}{n_1+n_2-p-1}\; F_{p,\,n_1+n_2-p-1}(\alpha)$$

To study the ellipsoid, we can focus on the eigenvalues and eigenvectors of $S_{pool}$. The axes of the ellipsoid are

$$(\bar{x}_1 - \bar{x}_2) \pm \sqrt{\lambda_i \left(\frac{1}{n_1} + \frac{1}{n_2}\right) c^2}\;\, e_i, \qquad i = 1, \ldots, p$$

where $\lambda_i$ and $e_i$ are the eigenvalues and eigenvectors of $S_{pool}$.

46 Example: Confidence Region

The 95% Confidence Region (Ellipse): the set of all possible $(\mu_1 - \mu_2)$ that satisfy the following inequality:

$$\big((74.4 - \delta_1),\ (201.6 - \delta_2)\big) \left[\left(\frac{1}{45} + \frac{1}{55}\right) S_{pool}\right]^{-1} \begin{pmatrix} 74.4 - \delta_1 \\ 201.6 - \delta_2 \end{pmatrix} \le c^2$$

where $c^2 = (98(2)/97)\, F_{2,97}(.05) = 2.02(3.1) = 6.26$.

The eigenvalues and eigenvectors of $S_{pool}$ are $(\lambda_1, e_1)$ and $(\lambda_2, e_2)$. (Their numeric values are missing from this transcription.)

47 Computing the Axes of the Ellipse

Major axis:

$$\begin{pmatrix} 74.4 \\ 201.6 \end{pmatrix} \pm \sqrt{\lambda_1 \left(\frac{1}{n_1} + \frac{1}{n_2}\right) c^2}\;\, e_1$$

Minor axis:

$$\begin{pmatrix} 74.4 \\ 201.6 \end{pmatrix} \pm \sqrt{\lambda_2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right) c^2}\;\, e_2$$

(The numeric endpoints of both axes are missing from this transcription.)
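The axis computation can be sketched for the bivariate case with the closed-form eigen-decomposition of a symmetric 2x2 matrix. The function name is illustrative. The entries of `S_POOL` are an assumption (pooled values recalled from Johnson & Wichern, not read off the slide), and `scale` stands for $(1/n_1 + 1/n_2)\,c^2$.

```python
import math


def ellipse_axes_2x2(s11, s12, s22, scale):
    """Return [(eigenvalue, unit eigenvector, half-length)] for a symmetric
    2x2 matrix, largest eigenvalue first. half-length = sqrt(eigenvalue * scale)."""
    if s12 == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = sorted([(s11, (1.0, 0.0)), (s22, (0.0, 1.0))], reverse=True)
    else:
        root = math.sqrt((s11 - s22) ** 2 + 4 * s12 ** 2)
        pairs = []
        for lam in ((s11 + s22 + root) / 2, (s11 + s22 - root) / 2):
            v = (s12, lam - s11)  # solves (S - lam I) v = 0 when s12 != 0
            norm = math.hypot(v[0], v[1])
            pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return [(lam, e, math.sqrt(lam * scale)) for lam, e in pairs]


# ASSUMED pooled covariance entries (recalled from Johnson & Wichern).
S_POOL = (10963.7, 21505.5, 63661.3)
C2 = 98 * 2 / 97 * 3.1              # c^2 with F_{2,97}(.05) = 3.1 from the slides
SCALE = (1 / 45 + 1 / 55) * C2

axes = ellipse_axes_2x2(*S_POOL, SCALE)
for lam, e, half in axes:
    print(round(lam, 1), tuple(round(x, 3) for x in e), round(half, 1))
```

Each axis then runs from $(74.4, 201.6)' - \text{half}\cdot e_i$ to $(74.4, 201.6)' + \text{half}\cdot e_i$. Under the assumed matrix the half-lengths are roughly 134 and 29, so the ellipse is long and thin.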

48 Figure of 95% Confidence Region

[Figure: 95% confidence ellipse for $\mu_1 - \mu_2$. Axes: $\mu_{11} - \mu_{21}$ (on-peak) and $\mu_{12} - \mu_{22}$ (off-peak). Marked points: $\delta_o = (0, 0)$ and $d = (74.4, 201.6)$.]

49 Simultaneous $T^2$ Intervals

Let

$$c^2 = \frac{(n_1+n_2-2)\,p}{n_1+n_2-p-1}\; F_{p,\,n_1+n_2-p-1}(\alpha)$$

With confidence 100(1 − α)%,

$$a'(\bar{x}_1 - \bar{x}_2) \pm c\, \sqrt{a' \left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pool}\; a}$$

will cover $a'(\mu_1 - \mu_2)$ for all possible $a$.

By appropriate choices for $a$, we can get component intervals:

$$a_1 = (1, 0, \ldots, 0)', \quad a_2 = (0, 1, 0, \ldots, 0)', \quad \ldots, \quad a_p = (0, \ldots, 0, 1)'$$

50 Simultaneous $T^2$ continued

So the component intervals are

$$(\bar{x}_{11} - \bar{x}_{21}) \pm c\, \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{pool,11}}$$

$$(\bar{x}_{12} - \bar{x}_{22}) \pm c\, \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{pool,22}}$$

$$\vdots$$

$$(\bar{x}_{1p} - \bar{x}_{2p}) \pm c\, \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{pool,pp}}$$

where

$$c = \sqrt{\frac{(n_1+n_2-2)\,p}{n_1+n_2-p-1}\; F_{p,\,n_1+n_2-p-1}(\alpha)}$$

51 Example: Simultaneous $T^2$ intervals

Consider the linear combination vectors:

$a_1' = (1, 0)$, so $a_1'\delta = a_1'(\mu_1 - \mu_2) = \mu_{11} - \mu_{21} = \delta_1$

and

$a_2' = (0, 1)$, so $a_2'\delta = a_2'(\mu_1 - \mu_2) = \mu_{12} - \mu_{22} = \delta_2$

Using these we get the intervals

for on-peak ($\delta_1$): $\quad 74.4 \pm (2.502)\sqrt{\left(\frac{1}{45} + \frac{1}{55}\right) s_{pool,11}}$

and for off-peak ($\delta_2$): $\quad 201.6 \pm (2.502)\sqrt{\left(\frac{1}{45} + \frac{1}{55}\right) s_{pool,22}}$

(The numeric endpoints are missing from this transcription.)

Note: $c = \sqrt{c^2} = \sqrt{6.26} = 2.502$
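Since the interval endpoints did not survive, here is a small sketch that evaluates the component intervals. The diagonal entries of $S_{pool}$ are an assumption (recalled from Johnson & Wichern, not taken from the slide). The F critical value 3.1, the sample sizes and the mean differences come from the slides.

```python
import math

N1, N2, P = 45, 55, 2
F_CRIT = 3.1                         # F_{2,97}(.05) as quoted on the slides
D = (74.4, 201.6)                    # xbar1 - xbar2
S_POOL_DIAG = (10963.7, 63661.3)     # ASSUMED: s_pool,11 and s_pool,22


def simultaneous_t2_intervals(d, s_diag, n1, n2, p, f_crit):
    """Component intervals d_i +/- c * sqrt((1/n1 + 1/n2) * s_pool,ii)."""
    c = math.sqrt((n1 + n2 - 2) * p / (n1 + n2 - p - 1) * f_crit)
    k = 1 / n1 + 1 / n2
    return [(di - c * math.sqrt(k * sii), di + c * math.sqrt(k * sii))
            for di, sii in zip(d, s_diag)]


intervals = simultaneous_t2_intervals(D, S_POOL_DIAG, N1, N2, P, F_CRIT)
for name, (lo, hi) in zip(("on-peak", "off-peak"), intervals):
    print(name, round(lo, 1), round(hi, 1))
```

Under these assumed variances neither interval contains zero, which agrees with the rejection of $H_o$ by the $T^2$ test.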

52 Bonferroni and One-at-a-Time Intervals

For Bonferroni and One-at-a-Time (i.e., univariate method) intervals, you simply need to change the value of $c$.

Bonferroni:

$$c = t_{n_1+n_2-2}\big(\alpha/(2m)\big)$$

where $m$ = number of intervals formed (probably $p$, but no more). These should be planned a priori.

One-at-a-Time:

$$c = t_{n_1+n_2-2}(\alpha/2)$$

53 Example: Simultaneous $T^2$ and Bonferroni

[Figure: 95% confidence region with simultaneous $T^2$ and Bonferroni intervals. Axes: $\mu_{11} - \mu_{21}$ (on-peak) and $\mu_{12} - \mu_{22}$ (off-peak). Marked points: $\delta_o = (0, 0)$ and $d = (74.4, 201.6)$.]

54 Case 3: Large $n_1 - p$ and $n_2 - p$

If $n_1 - p$ and $n_2 - p$ are large, then we do NOT need to assume:
  $\Sigma_1 = \Sigma_2$.
  $x_{1j}$ multivariate normal.
  $x_{2j}$ multivariate normal.

We do need to assume that
  Observations between populations are independent.
  $x_{11}, \ldots, x_{1,n_1}$ are a random sample from population 1 with $\mu_1$ and $\Sigma_1$.
  $x_{21}, \ldots, x_{2,n_2}$ are a random sample from population 2 with $\mu_2$ and $\Sigma_2$.

If $n_1 - p$ and $n_2 - p$ are large, then an approximate sampling distribution for the test statistic $T^2$ is $\chi^2_p$.

55 Large Sample Case

To test, first estimate the covariance matrix of the differences, $\Sigma_{\bar{x}_1 - \bar{x}_2}$ ... remember Case 1?

$$\Sigma_{\bar{x}_1 - \bar{x}_2} = \Sigma_{\bar{x}_1} + \Sigma_{\bar{x}_2} = \frac{1}{n_1}\Sigma_1 + \frac{1}{n_2}\Sigma_2$$

which we can estimate using

$$\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2$$

Test statistic for $H_o: \mu_1 - \mu_2 = \delta_o$:

$$T^2 = \big((\bar{x}_1 - \bar{x}_2) - \delta_o\big)' \left(\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2\right)^{-1} \big((\bar{x}_1 - \bar{x}_2) - \delta_o\big) \;\approx\; \chi^2_p$$

56 Large Sample Case continued

A 100(1 − α)% confidence region (ellipsoid) for $\delta = \mu_1 - \mu_2$ is the set of all $\delta$ that satisfy

$$\big((\bar{x}_1 - \bar{x}_2) - \delta\big)' \left(\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2\right)^{-1} \big((\bar{x}_1 - \bar{x}_2) - \delta\big) \le \chi^2_p(\alpha)$$

For 100(1 − α)% simultaneous $\chi^2$ intervals:

$$a'(\bar{x}_1 - \bar{x}_2) \pm \sqrt{\chi^2_p(\alpha)}\; \sqrt{a' \left(\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2\right) a}$$

Let's try this for the air conditioner data...

57 Example using Large Sample

What if $\Sigma_1 \ne \Sigma_2$? $n_1$ and $n_2$ may be large enough to use the large sample theory. The slide evaluates

$$\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2 \qquad \text{and} \qquad \left[\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2\right]^{-1},$$

the latter reported in units of $10^{-4}$. (The numeric entries of both matrices are missing from this transcription.)

58 Example: Large Sample Test Statistic

Test $H_o: \delta = 0$. The test statistic is

$$(\bar{x}_1 - \bar{x}_2)' \left[\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2\right]^{-1} (\bar{x}_1 - \bar{x}_2), \qquad \bar{x}_1 - \bar{x}_2 = \big((204.4 - 130.0),\ (556.6 - 355.0)\big)'$$

(The inverse matrix, given in units of $10^{-4}$, and the value of the statistic are missing from this transcription.)

For $\alpha = .05$, the critical value from $\chi^2_2$ is 5.99, and the p-value is < .005.

Compare this with the $T^2$ computed using $S_{pool}$ (where we assumed that $\Sigma_1 = \Sigma_2$).
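A sketch of the large-sample statistic follows, under the same caveat as before: the entries of `S1` and `S2` are assumed (recalled from Johnson & Wichern), because the slide's matrices were lost. The critical value 5.99 is the one quoted on the slide.

```python
# ASSUMED inputs: S1 and S2 are recalled from Johnson & Wichern, not from the slides.
N1, N2 = 45, 55
S1 = [[13825.3, 23823.4], [23823.4, 73107.4]]
S2 = [[8632.0, 19616.7], [19616.7, 55964.5]]
D = (204.4 - 130.0, 556.6 - 355.0)   # xbar1 - xbar2
CHI2_CRIT = 5.99                     # chi-square_2(.05), as quoted on the slide


def large_sample_t2(s1, s2, n1, n2, d):
    """T^2 = d' [S1/n1 + S2/n2]^{-1} d for p = 2, with no pooling."""
    m11 = s1[0][0] / n1 + s2[0][0] / n2
    m12 = s1[0][1] / n1 + s2[0][1] / n2
    m22 = s1[1][1] / n1 + s2[1][1] / n2
    det = m11 * m22 - m12 * m12
    return (d[0] ** 2 * m22 - 2 * d[0] * d[1] * m12 + d[1] ** 2 * m11) / det


t2_large = large_sample_t2(S1, S2, N1, N2, D)
print(round(t2_large, 2), t2_large > CHI2_CRIT)
```

With the assumed matrices the statistic is close to 15.7, only slightly below the pooled version, and it exceeds 5.99 by a wide margin.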

59 Large Sample $\chi^2$ Intervals

Using the same linear combination vectors as above:

$a_1' = (1, 0)$, so $a_1'\delta = a_1'(\mu_1 - \mu_2) = \mu_{11} - \mu_{21}$

and

$a_2' = (0, 1)$, so $a_2'\delta = a_2'(\mu_1 - \mu_2) = \mu_{12} - \mu_{22}$

$$(204.4 - 130.0) \pm \sqrt{5.99}\, \sqrt{a_1'\left(\tfrac{1}{n_1} S_1 + \tfrac{1}{n_2} S_2\right) a_1} \;\Rightarrow\; (21.7,\ 127.1)$$

$$(556.6 - 355.0) \pm \sqrt{5.99}\, \sqrt{a_2'\left(\tfrac{1}{n_1} S_1 + \tfrac{1}{n_2} S_2\right) a_2} \;\Rightarrow\; (75.8,\ 327.4)$$

which are very similar to the $T^2$ intervals given previously.

Note: $\chi^2_2(.05) = 5.99$
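The two intervals quoted above can be reproduced with a few lines. The sample variances below are an assumption (diagonal entries of $S_1$ and $S_2$ as recalled from Johnson & Wichern). Since they return the endpoints printed on the slide, they appear to be the values the slide used.

```python
import math

N1, N2 = 45, 55
CHI2_CRIT = 5.99                       # chi-square_2(.05) from the slide
D = (204.4 - 130.0, 556.6 - 355.0)     # xbar1 - xbar2
S1_DIAG = (13825.3, 73107.4)           # ASSUMED variances for sample 1
S2_DIAG = (8632.0, 55964.5)            # ASSUMED variances for sample 2


def large_sample_intervals(d, s1_diag, s2_diag, n1, n2, chi2_crit):
    """Component intervals d_i +/- sqrt(chi2) * sqrt(s1_ii/n1 + s2_ii/n2)."""
    root = math.sqrt(chi2_crit)
    out = []
    for di, v1, v2 in zip(d, s1_diag, s2_diag):
        half = root * math.sqrt(v1 / n1 + v2 / n2)
        out.append((di - half, di + half))
    return out


chi2_intervals = large_sample_intervals(D, S1_DIAG, S2_DIAG, N1, N2, CHI2_CRIT)
for lo, hi in chi2_intervals:
    print(round(lo, 1), round(hi, 1))
```

Only the diagonal entries are needed here because $a_1$ and $a_2$ each pick out a single variable. A general $a$ would need the full matrices.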

60 Large and Small Sample with $n_1 = n_2$

We obtained similar results from our large and small sample procedures; one possible reason is that $n_1 \approx n_2$. Note that when $n_1 = n_2 = n$,

$$\frac{n-1}{n+n-2} = \frac{1}{2}$$

so that

$$\frac{1}{n} S_1 + \frac{1}{n} S_2 = \frac{1}{n}(S_1 + S_2) = \frac{2}{n} \left(\frac{(n-1) S_1 + (n-1) S_2}{n+n-2}\right) = \left(\frac{1}{n} + \frac{1}{n}\right) S_{pool}$$

This implies that with equal sample sizes, the large sample procedure for computing an estimate of $\Sigma_{\bar{x}_1 - \bar{x}_2}$ is essentially the same as the procedure based on the pooled covariance matrix.
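The identity is easy to confirm numerically. In this sketch the two covariance matrices are made-up illustrative values, not data from the example. The point is only that the two estimates coincide when $n_1 = n_2$ and differ otherwise.

```python
def unpooled_estimate(s1, s2, n1, n2):
    """Large-sample estimate of Cov(xbar1 - xbar2): S1/n1 + S2/n2."""
    p = len(s1)
    return [[s1[i][j] / n1 + s2[i][j] / n2 for j in range(p)] for i in range(p)]


def pooled_estimate(s1, s2, n1, n2):
    """Pooled estimate of Cov(xbar1 - xbar2): (1/n1 + 1/n2) * S_pool."""
    p = len(s1)
    k = 1 / n1 + 1 / n2
    return [[k * ((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
             for j in range(p)] for i in range(p)]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical covariance matrices, chosen only for illustration.
S1 = [[4.0, 1.0], [1.0, 3.0]]
S2 = [[2.0, 0.5], [0.5, 6.0]]

equal_n = max_abs_diff(unpooled_estimate(S1, S2, 20, 20),
                       pooled_estimate(S1, S2, 20, 20))
unequal_n = max_abs_diff(unpooled_estimate(S1, S2, 10, 30),
                         pooled_estimate(S1, S2, 10, 30))
print(equal_n < 1e-12, unequal_n > 0.01)  # True True
```

With $n_1 = 45$ and $n_2 = 55$ the sizes are close but not equal, which is consistent with the two procedures giving similar, though not identical, answers.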

61 Case 4: Small sample with $\Sigma_1 \ne \Sigma_2$

We should consider whether $\Sigma_1 = \Sigma_2$ is a reasonable assumption.

If $n_1 - p$ and $n_2 - p$ are small and $\Sigma_1 \ne \Sigma_2$, then there's no nice measure like $T^2$ whose distribution does not depend on $\Sigma_1$ and $\Sigma_2$.

Rule-of-Thumb for when to worry about $\Sigma_1 \ne \Sigma_2$: Don't worry if the ratios $\sigma_{1,ik}/\sigma_{2,ik} \le 4$ (or $\sigma_{2,ik}/\sigma_{1,ik} \le 4$).

Our air conditioner example (ratios of corresponding elements of $S_1$ and $S_2$):
  (1, 1): 1.60
  (1, 2): 1.21
  (2, 2): 1.31
all $\le 4$.
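The rule of thumb can be sketched as an elementwise ratio check. The matrices below are an assumption (recalled from Johnson & Wichern). They reproduce the three ratios listed above, which suggests they match what the slide used. The helper assumes corresponding entries are nonzero and share a sign, so it is a sketch rather than a general-purpose check.

```python
# ASSUMED inputs: S1 and S2 are recalled from Johnson & Wichern, not from the slides.
S1 = [[13825.3, 23823.4], [23823.4, 73107.4]]
S2 = [[8632.0, 19616.7], [19616.7, 55964.5]]


def covariance_ratios(s1, s2):
    """Map (i, k) -> larger of s1_ik/s2_ik and s2_ik/s1_ik, for i <= k.
    Assumes corresponding entries are nonzero with the same sign."""
    p = len(s1)
    ratios = {}
    for i in range(p):
        for k in range(i, p):
            r = s1[i][k] / s2[i][k]
            ratios[(i + 1, k + 1)] = max(r, 1 / r)
    return ratios


ratios = covariance_ratios(S1, S2)
for idx, r in sorted(ratios.items()):
    print(idx, round(r, 2))
print(all(r <= 4 for r in ratios.values()))
```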

62 Testing whether $\Sigma_1 = \Sigma_2$

We could use Bartlett's test, but this assumes
  Data are multivariate normal (not just that the means are multivariate normal).
  $\Sigma_1 = \Sigma_2$.

So if you reject $H_o$ (a significant test statistic), it could be because
  $\Sigma_1 \ne \Sigma_2$,
  the data are not normal,
  or both: $\Sigma_1 \ne \Sigma_2$ and the data are not normal.

Additionally, for a valid test you need large samples, but if you have large samples you don't need to assume that $\Sigma_1 = \Sigma_2$ (or normality of the data).

63 Revisiting Examining Why

Our motivation for computing confidence intervals for components of the mean vector was to come to conclusions about individual means. The simultaneous $T^2$ intervals hold for any $a$.

The $a$ that leads to the largest population difference is proportional to

$$a = S_{pool}^{-1} (\bar{x}_1 - \bar{x}_2)$$

If the null hypothesis is rejected using $T^2$, then $a'(\bar{x}_1 - \bar{x}_2)$ has the largest possible statistic,

$$a'(\bar{x}_1 - \bar{x}_2) = (\bar{x}_1 - \bar{x}_2)'\, S_{pool}^{-1}\, (\bar{x}_1 - \bar{x}_2),$$

which is a multiple of $T^2$.

$a$ is useful for interpreting and describing why $H_o$ was rejected.

64 Interpretation

For the air conditioner data (using large sample), $a$ is proportional to

$$\left[\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2\right]^{-1} (\bar{x}_1 - \bar{x}_2) = \left[\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2\right]^{-1} \begin{pmatrix} 74.4 \\ 201.6 \end{pmatrix} = \begin{pmatrix} .041 \\ .063 \end{pmatrix}$$

So the difference in $X_2$ (off-peak consumption) contributes more (.063 > .041) to the rejection of $H_o: \mu_1 - \mu_2 = 0$ via the $T^2$ test than $X_1$ (on-peak energy consumption).

Note: $a'(\mu_1 - \mu_2) = .041(\mu_{11} - \mu_{21}) + .063(\mu_{12} - \mu_{22})$
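A sketch of how the coefficient vector can be obtained in the bivariate case, using the closed-form 2x2 inverse. As elsewhere, `S1` and `S2` are assumed values recalled from Johnson & Wichern. They give back the (.041, .063) quoted above, so they appear to match the slide.

```python
# ASSUMED inputs: S1 and S2 are recalled from Johnson & Wichern, not from the slides.
N1, N2 = 45, 55
S1 = [[13825.3, 23823.4], [23823.4, 73107.4]]
S2 = [[8632.0, 19616.7], [19616.7, 55964.5]]
D = (74.4, 201.6)   # xbar1 - xbar2


def most_discriminating_direction(s1, s2, n1, n2, d):
    """Return a = [S1/n1 + S2/n2]^{-1} d for p = 2."""
    m11 = s1[0][0] / n1 + s2[0][0] / n2
    m12 = s1[0][1] / n1 + s2[0][1] / n2
    m22 = s1[1][1] / n1 + s2[1][1] / n2
    det = m11 * m22 - m12 * m12
    # The inverse of [[m11, m12], [m12, m22]] is [[m22, -m12], [-m12, m11]] / det.
    return ((m22 * d[0] - m12 * d[1]) / det,
            (-m12 * d[0] + m11 * d[1]) / det)


a = most_discriminating_direction(S1, S2, N1, N2, D)
t2_from_a = a[0] * D[0] + a[1] * D[1]   # a'd equals the large-sample T^2
print(round(a[0], 3), round(a[1], 3), round(t2_from_a, 2))
```

Because the two variables are measured on the same scale here, comparing the raw coefficients is reasonable. With variables on different scales one would usually standardize before comparing them.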

65 Summary regarding Inferences about µ

Four reasons for taking a multivariate approach to hypothesis testing:

Reason 1: If you do $p$ univariate ($t$) tests, you have an inflated Type I error rate (i.e., the actual α is larger than you want it to be). With a multivariate test, the exact α level is under your control.

E.g., if $p = 5$ and you perform $p$ separate univariate tests, all at α = .05, then

Prob{at least 1 false rejection} = Prob{at least 1 Type I error} > .05

In the extreme case where all the variables are independent, if $H_o$ is true,

Prob{at least 1 false rejection} = 1 − Prob{all $p$ retained} = $1 - (1 - \alpha)^p$

66 Error Rates & More Reasons

Overall error rates are somewhere between
  For $p = 5$: .05 and .23
  For $p = 10$: .05 and .40

Reason 2: Univariate tests ignore (completely) the correlations between the variables. Multivariate tests make direct use of the covariance matrix.

Reason 3: Multivariate tests are more powerful (in most cases). Sometimes all $p$ univariate tests fail to reach significance, but the multivariate test is significant because small effects combine to jointly indicate significance.

Note: For a given sample size, there is a limit to the number of variables a multivariate test can handle without losing power.
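The upper bounds quoted for the overall error rate follow directly from $1 - (1 - \alpha)^p$, the independent-variables case. A minimal sketch (the function name is illustrative):

```python
def familywise_error_rate(alpha, p):
    """P(at least one false rejection) for p independent tests at level alpha."""
    return 1 - (1 - alpha) ** p


for p in (5, 10):
    print(p, round(familywise_error_rate(0.05, p), 2))
# 5 0.23
# 10 0.4
```

The lower bound of .05 corresponds to the opposite extreme, where the variables are perfectly dependent and the $p$ tests behave like a single test.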


More information

Serial Correlation. Edps/Psych/Stat 587. Carolyn J. Anderson. Fall Department of Educational Psychology

Serial Correlation. Edps/Psych/Stat 587. Carolyn J. Anderson. Fall Department of Educational Psychology Serial Correlation Edps/Psych/Stat 587 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 017 Model for Level 1 Residuals There are three sources

More information

1 Hypothesis testing for a single mean

1 Hypothesis testing for a single mean This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

Lecture 11. Multivariate Normal theory

Lecture 11. Multivariate Normal theory 10. Lecture 11. Multivariate Normal theory Lecture 11. Multivariate Normal theory 1 (1 1) 11. Multivariate Normal theory 11.1. Properties of means and covariances of vectors Properties of means and covariances

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

Distribution-Free Procedures (Devore Chapter Fifteen)

Distribution-Free Procedures (Devore Chapter Fifteen) Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Introduction to Statistical Inference Lecture 10: ANOVA, Kruskal-Wallis Test

Introduction to Statistical Inference Lecture 10: ANOVA, Kruskal-Wallis Test Introduction to Statistical Inference Lecture 10: ANOVA, Kruskal-Wallis Test la Contents The two sample t-test generalizes into Analysis of Variance. In analysis of variance ANOVA the population consists

More information

2 Hand-out 2. Dr. M. P. M. M. M c Loughlin Revised 2018

2 Hand-out 2. Dr. M. P. M. M. M c Loughlin Revised 2018 Math 403 - P. & S. III - Dr. McLoughlin - 1 2018 2 Hand-out 2 Dr. M. P. M. M. M c Loughlin Revised 2018 3. Fundamentals 3.1. Preliminaries. Suppose we can produce a random sample of weights of 10 year-olds

More information

One-way ANOVA (Single-Factor CRD)

One-way ANOVA (Single-Factor CRD) One-way ANOVA (Single-Factor CRD) STAT:5201 Week 3: Lecture 3 1 / 23 One-way ANOVA We have already described a completed randomized design (CRD) where treatments are randomly assigned to EUs. There is

More information

Class 24. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Class 24. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700 Class 4 Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science Copyright 013 by D.B. Rowe 1 Agenda: Recap Chapter 9. and 9.3 Lecture Chapter 10.1-10.3 Review Exam 6 Problem Solving

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

The Multivariate Normal Distribution 1

The Multivariate Normal Distribution 1 The Multivariate Normal Distribution 1 STA 302 Fall 2014 1 See last slide for copyright information. 1 / 37 Overview 1 Moment-generating Functions 2 Definition 3 Properties 4 χ 2 and t distributions 2

More information

The Random Effects Model Introduction

The Random Effects Model Introduction The Random Effects Model Introduction Sometimes, treatments included in experiment are randomly chosen from set of all possible treatments. Conclusions from such experiment can then be generalized to other

More information

STAT 501 Assignment 1 Name Spring 2005

STAT 501 Assignment 1 Name Spring 2005 STAT 50 Assignment Name Spring 005 Reading Assignment: Johnson and Wichern, Chapter, Sections.5 and.6, Chapter, and Chapter. Review matrix operations in Chapter and Supplement A. Written Assignment: Due

More information

8 Eigenvectors and the Anisotropic Multivariate Gaussian Distribution

8 Eigenvectors and the Anisotropic Multivariate Gaussian Distribution Eigenvectors and the Anisotropic Multivariate Gaussian Distribution Eigenvectors and the Anisotropic Multivariate Gaussian Distribution EIGENVECTORS [I don t know if you were properly taught about eigenvectors

More information

HYPOTHESIS TESTING. Hypothesis Testing

HYPOTHESIS TESTING. Hypothesis Testing MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

More information

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS STAT 512 MidTerm I (2/21/2013) Spring 2013 Name: Key INSTRUCTIONS 1. This exam is open book/open notes. All papers (but no electronic devices except for calculators) are allowed. 2. There are 5 pages in

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

More information

M(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1

M(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1 Math 66/566 - Midterm Solutions NOTE: These solutions are for both the 66 and 566 exam. The problems are the same until questions and 5. 1. The moment generating function of a random variable X is M(t)

More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Soc 589 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline Notation NELS88 data Fixed Effects ANOVA

More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Stat 587 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2017 Outline Notation NELS88 data Fixed Effects ANOVA

More information

M A N O V A. Multivariate ANOVA. Data

M A N O V A. Multivariate ANOVA. Data M A N O V A Multivariate ANOVA V. Čekanavičius, G. Murauskas 1 Data k groups; Each respondent has m measurements; Observations are from the multivariate normal distribution. No outliers. Covariance matrices

More information

MULTIVARIATE POPULATIONS

MULTIVARIATE POPULATIONS CHAPTER 5 MULTIVARIATE POPULATIONS 5. INTRODUCTION In the following chapters we will be dealing with a variety of problems concerning multivariate populations. The purpose of this chapter is to provide

More information

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j. Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

More information

Stat 206: Estimation and testing for a mean vector,

Stat 206: Estimation and testing for a mean vector, Stat 206: Estimation and testing for a mean vector, Part II James Johndrow 2016-12-03 Comparing components of the mean vector In the last part, we talked about testing the hypothesis H 0 : µ 1 = µ 2 where

More information

Bayesian Decision Theory

Bayesian Decision Theory Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Introduction to Business Statistics QM 220 Chapter 12

Introduction to Business Statistics QM 220 Chapter 12 Department of Quantitative Methods & Information Systems Introduction to Business Statistics QM 220 Chapter 12 Dr. Mohammad Zainal 12.1 The F distribution We already covered this topic in Ch. 10 QM-220,

More information

STA Module 10 Comparing Two Proportions

STA Module 10 Comparing Two Proportions STA 2023 Module 10 Comparing Two Proportions Learning Objectives Upon completing this module, you should be able to: 1. Perform large-sample inferences (hypothesis test and confidence intervals) to compare

More information

STAT 501 Assignment 1 Name Spring Written Assignment: Due Monday, January 22, in class. Please write your answers on this assignment

STAT 501 Assignment 1 Name Spring Written Assignment: Due Monday, January 22, in class. Please write your answers on this assignment STAT 5 Assignment Name Spring Reading Assignment: Johnson and Wichern, Chapter, Sections.5 and.6, Chapter, and Chapter. Review matrix operations in Chapter and Supplement A. Examine the matrix properties

More information

Independent Component (IC) Models: New Extensions of the Multinormal Model

Independent Component (IC) Models: New Extensions of the Multinormal Model Independent Component (IC) Models: New Extensions of the Multinormal Model Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen) School of Public Health, ULB, April 2008 My research

More information

Introduction to the Analysis of Variance (ANOVA)

Introduction to the Analysis of Variance (ANOVA) Introduction to the Analysis of Variance (ANOVA) The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique for testing for differences between the means of multiple (more

More information

Analysis of variance (ANOVA) Comparing the means of more than two groups

Analysis of variance (ANOVA) Comparing the means of more than two groups Analysis of variance (ANOVA) Comparing the means of more than two groups Example: Cost of mating in male fruit flies Drosophila Treatments: place males with and without unmated (virgin) females Five treatments

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 17 for Applied Multivariate Analysis Outline Multivariate Analysis of Variance 1 Multivariate Analysis of Variance The hypotheses:

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Rank-Based Methods. Lukas Meier

Rank-Based Methods. Lukas Meier Rank-Based Methods Lukas Meier 20.01.2014 Introduction Up to now we basically always used a parametric family, like the normal distribution N (µ, σ 2 ) for modeling random data. Based on observed data

More information

Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA)

Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA) BSTT523 Pagano & Gauvreau Chapter 13 1 Nonparametric Statistics Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA) In particular, data

More information

Multivariate analysis of variance and covariance

Multivariate analysis of variance and covariance Introduction Multivariate analysis of variance and covariance Univariate ANOVA: have observations from several groups, numerical dependent variable. Ask whether dependent variable has same mean for each

More information

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013 Modeling Sub-Visible Particle Data Product Held at Accelerated Stability Conditions José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013 Outline Sub-Visible Particle (SbVP) Poisson Negative Binomial

More information

Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing. Road Map Sampling Distributions, Confidence Intervals & Hypothesis Testing

Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing. Road Map Sampling Distributions, Confidence Intervals & Hypothesis Testing Topic 3: Sampling Distributions, Confidence Intervals & Hypothesis Testing ECO22Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit

More information

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information

Group comparison test for independent samples

Group comparison test for independent samples Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences between means. Supposing that: samples come from normal populations

More information

One-way ANOVA. Experimental Design. One-way ANOVA

One-way ANOVA. Experimental Design. One-way ANOVA Method to compare more than two samples simultaneously without inflating Type I Error rate (α) Simplicity Few assumptions Adequate for highly complex hypothesis testing 09/30/12 1 Outline of this class

More information