ANOVA: Contrasts and Multiple Comparisons

Outline: planned comparisons vs. unplanned comparisons; contrasts; confidence intervals; multiple comparisons (HSD).

Remark: alternate form of Model I:

  y_ij = µ + α_i + ɛ_ij,   Σ_{i=1}^a α_i = 0   (identifiability constraint)

Planned comparisons: single pairs of means, or contrasts, specified in advance.

Difference of means, e.g. µ_i − µ_j: like a two-sample test, but we have an ANOVA model and hence the pooled variance estimate s² for the common variance σ². The 100(1 − α)% CI is

  ȳ_i − ȳ_j ± t_{α/2}[ν] SE_{ȳ_i − ȳ_j},   ν = n − a,   SE_{ȳ_i − ȳ_j} = s √(1/n_i + 1/n_j)

Ex (pea section data): length of pea sections grown in tissue culture:

  Control  Glucose  Fructose  Gluc+Fruc  Sucrose
     75       57       58        58        62
     67       58       61        59        66
     70       60       56        58        65
     75       59       58        61        63
     65       62       57        57        64
     71       60       56        56        62
     67       60       61        58        65
     67       57       60        57        65
     76       59       57        57        62
     68       61       58        59        67

R session:

  peas = scan()
  pea.df = data.frame(peas, culture = as.factor(rep(1:5, 10)))
  culture = as.factor(rep(1:5, 10))
  culture
   [1] 1 2 3 4 5 1 2 3 4 5 ...
  [46] 1 2 3 4 5
  Levels: 1 2 3 4 5
  pea.lm = lm(peas ~ culture, data = pea.df)
  anova(pea.lm)
  Response: peas
            Df  Sum Sq Mean Sq F value    Pr(>F)
  culture    4 1077.32  269.33  49.368 6.737e-16 ***
  Residuals 45  245.50    5.46
  ---
  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1

  pea.resid = residuals(pea.lm)
  pea.fitted = fitted(pea.lm)
  matrix(round(pea.resid, 1), ncol = 5, byrow = T)
        [,1] [,2] [,3] [,4] [,5]
   [1,]  4.9 -2.3 -0.2    0 -2.1
   [2,] -3.1 -1.3  2.8    1  1.9
   [3,] -0.1  0.7 -2.2    0  0.9
   [4,]  4.9 -0.3 -0.2    3 -1.1
   [5,] -5.1  2.7 -1.2   -1 -0.1
   [6,]  0.9  0.7 -2.2   -2 -2.1
   [7,] -3.1  0.7  2.8    0  0.9
   [8,] -3.1 -2.3  1.8   -1  0.9
   [9,]  5.9 -0.3 -1.2   -1 -2.1
  [10,] -2.1  1.7 -0.2    1  2.9
  > sum(round(pea.resid, 1)^2)
  [1] 245.5
  matrix(round(pea.fitted, 1), ncol = 5, byrow = T)
        [,1] [,2] [,3] [,4] [,5]
   [1,] 70.1 59.3 58.2   58 64.1
    ...                         (all 10 rows are identical: the fitted values are the group means)

95% CI for µ_c − µ_g, the difference between the control group and the glucose group:

  ȳ_c − ȳ_g = 70.1 − 59.3 = 10.8
  s = √(MS_within) = √5.46 = 2.34
  SE_{ȳ_c − ȳ_g} = 2.34 √(1/10 + 1/10) = 1.046
  ν = n − a = 50 − 5 = 45;   t_{.975}[45] = 2.01
  C.I. = 10.8 ± (2.01)(1.046) = [8.69, 12.9]

Contrast: a linear combination of means whose coefficients sum to zero:

  Population: γ = Σ_{i=1}^a c_i µ_i,  with Σ_{i=1}^a c_i = 0.     Sample: Ĉ = Σ_i c_i ȳ_i.

Ex (sugars vs. control): γ = µ_c − (1/4)(µ_g + µ_f + µ_{g+f} + µ_s), with coefficients (c_i) = (1, −1/4, −1/4, −1/4, −1/4).

Contrasts are typically used to compare groups of means, or certain weighted combinations ("orthogonal contrasts") such as linear or quadratic effects.

Variance of a sample contrast (assuming the sample means are independent):

  Var(Ĉ) = Var(Σ_i c_i ȳ_i) = Σ_i c_i² Var(ȳ_i) = σ² Σ_i c_i²/n_i

Estimated standard error: SE_Ĉ = s √(Σ_i c_i²/n_i), and the 100(1 − α)% CI is

  Ĉ ± t_{α/2}[ν] SE_Ĉ
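The pairwise CI and the contrast CI above can be cross-checked numerically from the summary statistics alone. A minimal sketch in Python (not the lecture's R), using scipy only for the t quantile; the group means and MS_within are read off the ANOVA output above:

```python
import math
from scipy import stats

# Summary statistics from the pea-section ANOVA above
ybar = [70.1, 59.3, 58.2, 58.0, 64.1]   # control, glucose, fructose, gluc+fruc, sucrose
n = 10                                  # observations per group (balanced)
a = 5                                   # number of treatments
ms_within = 5.46                        # pooled variance estimate s^2
nu = a * n - a                          # error df = 45

s = math.sqrt(ms_within)
t_crit = stats.t.ppf(0.975, nu)         # ~2.01

# 95% CI for mu_control - mu_glucose
diff = ybar[0] - ybar[1]                         # 10.8
se_diff = s * math.sqrt(1/n + 1/n)               # ~1.045
ci_diff = (diff - t_crit * se_diff, diff + t_crit * se_diff)

# Contrast: control vs. average of the four sugars, c = (1, -1/4, -1/4, -1/4, -1/4)
c = [1, -0.25, -0.25, -0.25, -0.25]
C_hat = sum(ci * yi for ci, yi in zip(c, ybar))  # 10.2
se_C = s * math.sqrt(sum(ci**2 for ci in c) / n) # ~0.827
ci_C = (C_hat - t_crit * se_C, C_hat + t_crit * se_C)

print(ci_diff)   # close to [8.69, 12.9] as computed above
print(ci_C)      # close to [8.53, 11.87]
```

The small discrepancies against the hand calculation come only from rounding s to 2.34 in the lecture notes.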

Ex: Ĉ = 70.1 − (1/4)(59.3 + 58.2 + 58 + 64.1) = 10.2

  Σ_i c_i²/n_i = (1/10)[1 + 1/16 + 1/16 + 1/16 + 1/16] = (1/10)(5/4) = 1/8,   √(1/8) = 0.3536
  SE_Ĉ = (2.34)(0.3536) = 0.827
  CI for γ: 10.2 ± (2.01)(0.827) = [8.53, 11.87]

Hypothesis test for a contrast, e.g. H_0: γ = γ_0. Form a t-statistic

  t = (Ĉ − γ_0) / SE_Ĉ  ~  t[ν]   if H_0 is true.

Remark: differences between means are a special case of contrasts: e.g. µ_c − µ_g = Σ_{i=1}^a c_i µ_i with (c_i) = (1, −1, 0, 0, 0).

These investigations should be done on combinations of treatments that were determined in advance of observing the experimental results, or else the confidence levels are not as specified by the procedure. Also, doing several comparisons can change the overall confidence level. This can be avoided by carefully selecting the contrasts to investigate in advance and making sure that:
  - the number of such contrasts does not exceed the number of degrees of freedom between the treatments;
  - only orthogonal contrasts are chosen.
However, there are also several powerful multiple-comparison procedures we can use after observing the experimental results.

Unplanned comparisons

After looking at the data, we may wish to assess the significance of, or give CIs for, certain differences (e.g. µ_g − µ_f in the pea-length example) or contrasts (e.g. µ_s − (1/3)(µ_g + µ_f + µ_{gf})) that looked interesting a posteriori. Viewed a priori, however, there are many differences or contrasts that could potentially attract attention. We need to adjust our significance levels and P-values (larger) or our CIs (wider) to allow for this search over all possibilities. This is the subject of multiple comparisons; see books such as Miller, R.G., Simultaneous Statistical Inference.

All pairs of differences

With a treatments, there are C(a, 2) = a(a − 1)/2 possible comparisons of different means: µ_i − µ_j, 1 ≤ j < i ≤ a. If we used t-intervals, we would have many intervals of the form

  I_ij:  ȳ_i − ȳ_j ± t_{α/2}[ν] SE_{ȳ_i − ȳ_j}

but the chance that all intervals simultaneously cover all µ_i − µ_j satisfies

  P{ I_ij covers µ_i − µ_j, for all i < j } < 1 − α.

To obtain a simultaneous coverage property, make the intervals wider:

  I^TK_ij:  ȳ_i − ȳ_j ± (1/√2) Q_α[a, ν] SE_{ȳ_i − ȳ_j}      (TK = Tukey-Kramer)

where Q_α[a, ν] are percentage points of the studentized range distribution. Formal definition:

  Q[a, ν] = max_{i,j} |Z_i − Z_j| / S,   where Z_1, Z_2, ..., Z_a ~ N(0, 1), νS² ~ χ²(ν), all independent.

This gives wider intervals: (1/√2) Q_α[a, ν] > t_{α/2}[ν] (unless a = 2!).

Ex (pea lengths): Q_{.95}[5, 45] = 4.018, and 4.018/√2 = 2.84 (> 2.01 = t_{.975}[45]).
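The Tukey-Kramer multiplier can be checked outside R as well. A sketch in Python, assuming scipy >= 1.7 so that stats.studentized_range (the analogue of R's qtukey) is available:

```python
import math
from scipy import stats

# Tukey-Kramer multiplier for the pea example: a = 5 groups, error df = 45
a, nu, n = 5, 45, 10
q = stats.studentized_range.ppf(0.95, a, nu)   # ~4.018, matching qtukey(0.95, 5, 45)
tk_mult = q / math.sqrt(2)                     # ~2.84
t_mult = stats.t.ppf(0.975, nu)                # ~2.01, always smaller for a > 2

# Simultaneous 95% interval for mu_control - mu_glucose
s = math.sqrt(5.46)                            # sqrt(MS_within)
se = s * math.sqrt(1/n + 1/n)
diff = 70.1 - 59.3
ci = (diff - tk_mult * se, diff + tk_mult * se)

print(q, tk_mult, t_mult)
print(ci)   # close to [7.83, 13.8] as below
```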

  > qtukey(0.95, 5, 45)
  [1] 4.018

Simultaneous interval for µ_c − µ_g: 10.8 ± (2.84)(1.046) = [7.83, 13.8].

Simultaneous coverage property: if Model I holds and n_1 = n_2 = ... = n_a ("balanced"), then

  P( I^TK_ij covers µ_i − µ_j, for all i < j ) = 1 − α.

Remark: if the ANOVA is unbalanced (not all n_i equal), then these Tukey-Kramer intervals are conservative (coverage probability ≥ 1 − α).

From the R help page for TukeyHSD: when comparing the means for the levels of a factor in an analysis of variance, a simple comparison using t-tests will inflate the probability of declaring a significant difference when it is not in fact present. This is because the intervals are calculated with a given coverage probability for each interval, but the interpretation of the coverage is usually with respect to the entire family of intervals. John Tukey introduced intervals based on the range of the sample means rather than the individual differences. The intervals returned by this function are based on this studentized range statistic. Technically, the intervals constructed in this way would apply only to balanced designs, where the same number of observations is made at each level of the factor. The function incorporates an adjustment for sample size that produces sensible intervals for mildly unbalanced designs.

  > peas.aov <- aov(peas ~ gr)
  > TukeyHSD(peas.aov)
    Tukey multiple comparisons of means
      95% family-wise confidence level
  Fit: aov(formula = peas ~ gr)
  $gr
       diff       lwr       upr
  2-1 -10.8 -13.76807  -7.83193
  3-1 -11.9 -14.86807  -8.93193
  4-1 -12.1 -15.06807  -9.13193
  5-1  -6.0  -8.96807  -3.03193
  3-2  -1.1  -4.06807   1.86807
  4-2  -1.3  -4.26807   1.66807
  5-2   4.8   1.83193   7.76807
  4-3  -0.2  -3.16807   2.76807
  5-3   5.9   2.93193   8.86807
  5-4   6.1   3.13193   9.06807
  > peas.hsd <- TukeyHSD(peas.aov)
  > plot(peas.hsd)

[Figure: "95% family-wise confidence level" — horizontal confidence intervals for the differences in mean levels of gr, pairs 2-1 through 5-4.]

The term experimentwise error rate α arises because, if H_0 is true (all µ_i equal), then the chance of falsely declaring as significant any of the a(a − 1)/2 pairwise differences is (at most) α:

  P_{H_0}{ max_{i<i'} |ȳ_i − ȳ_{i'}| / SE_{ȳ_i − ȳ_{i'}} > (1/√2) Q_α[a, ν] } ≤ α   (= α if all n_i are equal)

Balanced case and the Honestly Significant Difference (HSD): if all n_i = n, then all SE_{ȳ_i − ȳ_{i'}} = s √(2/n), and the pair (i, i') is declared significant exactly when

  |ȳ_i − ȳ_{i'}| > Q_α[a, ν] s/√n ≡ HSD

so we just find those pairs (ȳ_i, ȳ_{i'}) separated by more than the HSD.
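The ANOVA table driving all of these intervals can be reproduced from the raw pea data without R. A minimal cross-check in Python using scipy.stats.f_oneway, with the data entered column by column from the table above:

```python
from scipy import stats

# Pea-section data, one list per treatment, as tabulated above
control   = [75, 67, 70, 75, 65, 71, 67, 67, 76, 68]
glucose   = [57, 58, 60, 59, 62, 60, 60, 57, 59, 61]
fructose  = [58, 61, 56, 58, 57, 56, 61, 60, 57, 58]
gluc_fruc = [58, 59, 58, 61, 57, 56, 58, 57, 57, 59]
sucrose   = [62, 66, 65, 63, 64, 62, 65, 65, 62, 67]
groups = [control, glucose, fructose, gluc_fruc, sucrose]

# One-way ANOVA: should reproduce F = 49.368, p = 6.7e-16
F, p = stats.f_oneway(*groups)

# Pooled within-group mean square, matching MS_within = 5.46 in the table
ss_within = sum(sum((y - sum(g) / len(g))**2 for y in g) for g in groups)
ms_within = ss_within / (50 - 5)

print(F, p, ms_within)
```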

Overlapping-intervals picture: the intervals ȳ_i ± HSD/2 overlap if and only if |ȳ_i − ȳ_{i'}| ≤ HSD, so means (µ_i, µ_{i'}) whose HSD intervals don't overlap are significantly different at experimentwise error rate α. (Warning! ȳ_i ± HSD/2 is NOT a 100(1 − α)% confidence interval!)

All contrasts: the Scheffé intervals

  I_S:  Ĉ ± √((a − 1) F_α[a − 1, ν]) SE_Ĉ

have the simultaneous coverage property (for balanced or unbalanced cases):

  P{ I_S covers γ, for all contrasts γ } = 1 − α.

Since contrasts are more general than differences, expect Scheffé intervals to be even wider than Tukey-Kramer.

Ex: γ = µ_s − (1/3)(µ_g + µ_f + µ_{gf}),  c = (0, −1/3, −1/3, −1/3, 1)

  Σ_i c_i²/n_i = (1/10)(1/9 + 1/9 + 1/9 + 1) = (1/10)(4/3) = 2/15,   √(2/15) = 0.365
  SE_Ĉ = (2.34)(0.365) = 0.8544
  √((a − 1) F_α[a − 1, ν]) = √(4 F_{.95}[4, 45]) = √(4 × 2.58) = 3.21

The 95% Scheffé interval for Ĉ = ȳ_s − (1/3)(ȳ_g + ȳ_f + ȳ_{gf}) = 64.1 − 58.5 = 5.6 has margin of error (3.21)(0.8544) = 2.74:

  CI = [5.6 − 2.74, 5.6 + 2.74] = [2.86, 8.34]

(Note that the Scheffé multiplier 3.21 > 2.84 = (1/√2) Q_α[a, ν], the Tukey-Kramer multiplier.)

Remark: there is a version of the Tukey-Kramer intervals for contrasts; these can be better (shorter) than the Scheffé method if a is larger and relatively few of the c_i are non-zero.

  contr.peas = matrix(c(4, -1, -1, -1, -1, 0, -1, -1, 3, -1), ncol = 2)
  contr.peas
       [,1] [,2]
  [1,]    4    0
  [2,]   -1   -1
  [3,]   -1   -1
  [4,]   -1    3
  [5,]   -1   -1
  contrasts(culture) = contr.peas
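The Scheffé computation above can likewise be verified numerically. A Python sketch using scipy's F quantile, with the summary numbers from the worked example:

```python
import math
from scipy import stats

# Scheffé 95% interval for the contrast mu_s - (1/3)(mu_g + mu_f + mu_gf)
a, n, nu = 5, 10, 45
s = math.sqrt(5.46)                       # sqrt(MS_within)

c = [0, -1/3, -1/3, -1/3, 1]              # contrast coefficients (sum to zero)
ybar = [70.1, 59.3, 58.2, 58.0, 64.1]
C_hat = sum(ci * yi for ci, yi in zip(c, ybar))      # 5.6
se_C = s * math.sqrt(sum(ci**2 for ci in c) / n)     # ~0.854

# Scheffé multiplier sqrt((a-1) * F_{.95}[a-1, nu]) ~ 3.21
scheffe = math.sqrt((a - 1) * stats.f.ppf(0.95, a - 1, nu))
ci = (C_hat - scheffe * se_C, C_hat + scheffe * se_C)

print(scheffe, ci)   # close to [2.86, 8.34] as computed above
```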