UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 11 Analysis of Variance - ANOVA. Instructor: Ivo Dinov,

UCLA STAT 13
Introduction to Statistical Methods for the Life and Health Sciences
Instructor: Ivo Dinov, Asst. Prof. of Statistics and Neurology
Chapter 11: Analysis of Variance - ANOVA
Teaching Assistants: Fred Phoa, Anwer Khan, Ming Zheng & Matilda Hsieh
University of California, Los Angeles, Fall 2005
http://www.stat.ucla.edu/~dinov/courses_students.html

Comparing the Means of I Independent Samples

- In Chapter 7 we considered the comparison of two independent group means using the independent t test.
- We need to expand our thinking to compare I independent samples.
- The procedure we will use is called Analysis of Variance (ANOVA).

Example: 5 varieties of peas are currently being tested by a large agribusiness cooperative to determine which is best suited for production. A field was divided into 20 plots, with each variety of peas planted in four plots. The yields (in bushels of peas) produced from each plot are shown in the table below:

          Variety of Pea
   A      B      C      D      E
  26.2   29.2   29.1   21.3   20.1
  24.3   28.1   30.8   22.4   19.3
  21.8   27.3   33.9   24.3   19.9
  28.1   31.2   32.8   21.8   22.1

Issues in ANOVA

In applying ANOVA, the data are regarded as random samples from k populations.

Notation (let sub-indices 1 = A, 2 = B, etc.):
- Population means: $\mu_1, \mu_2, \mu_3, \mu_4, \mu_5$
- Population standard deviations: $\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5$

We have five group means to compare. Why not just carry out a bunch of t tests?

Repeated t tests would mean:
$H_0: \mu_1 = \mu_2$, $H_0: \mu_2 = \mu_3$, $H_0: \mu_3 = \mu_4$, $H_0: \mu_4 = \mu_5$, $H_0: \mu_1 = \mu_4$, $H_0: \mu_1 = \mu_5$, etc.

We would have to make $\binom{5}{2} = 10$ comparisons. What is so bad about that?
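One way to see what is so bad about it is to count the comparisons and estimate how the risk of at least one false rejection accumulates. The sketch below is only an illustration: it assumes the 10 tests are independent, which pairwise t tests on shared groups are not, so the number is a rough guide rather than an exact error rate. The variable names are hypothetical.

```python
from itertools import combinations

varieties = ["A", "B", "C", "D", "E"]
alpha = 0.05  # per-test significance level used on the slides

# Every unordered pair of varieties is one candidate t test.
pairs = list(combinations(varieties, 2))
m = len(pairs)

# ASSUMPTION (illustration only): the m tests are independent.
# Pairwise t tests share data, so treat this as a rough guide.
familywise_error = 1 - (1 - alpha) ** m

print(m)                           # number of pairwise comparisons
print(round(familywise_error, 3))  # chance of at least one false rejection
```

Under that independence assumption the chance of at least one type I error is roughly 40%, far above the nominal 5%.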

Issues in ANOVA

- Each test is carried out at α = 0.05, so the risk of a type I error is 5% for each.
- The overall risk of a type I error is larger than 0.05 and gets larger as the number of groups (I) gets larger.
- SOLUTION: Need to make multiple comparisons with an overall error of α = 0.05 (or whichever level is specified).

There are other positive aspects of using ANOVA:
- Can see if there is a trend within the I groups; low to high
- Estimation of the standard deviation
- Global sharing of information of all data yields precision in the analysis

The main idea behind ANOVA is that we need to know how much inherent variability there is in the data before we can judge whether there is a difference in the sample means.

To make an inference about means we compare two types of variability:
- variability between sample means
- variability within each group

It is very important that we keep these two types of variability in mind as we work through the following formulas. It is our goal to come up with a numeric quantity that describes each of these variabilities.

[Figure: Individual Value Plot of yield vs variety, with the within-group and between-group variability marked]

The Basic ANOVA

Because we now have I groups, each with its own observations, we need to modify our notation.

Notation: $y_{ij}$ = group $i$, observation $j$

For the pea example: $y_{11} = 26.2$, $y_{12} = 24.3$, $y_{21} = 29.2$, $y_{54} = 22.1$
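The double-index notation can be made concrete with a small lookup over the pea yields. This is a minimal sketch; the helper name `y` and the dictionary layout are assumptions made for illustration, with group 1 = A through group 5 = E and 1-based indices as on the slides.

```python
# Pea yields (bushels) by variety; group 1 = A, ..., group 5 = E.
yields = {
    "A": [26.2, 24.3, 21.8, 28.1],
    "B": [29.2, 28.1, 27.3, 31.2],
    "C": [29.1, 30.8, 33.9, 32.8],
    "D": [21.3, 22.4, 24.3, 21.8],
    "E": [20.1, 19.3, 19.9, 22.1],
}
group_names = list(yields)  # dicts keep insertion order: A, B, C, D, E


def y(i, j):
    """Return y_ij using the slides' 1-based group and observation indices."""
    return yields[group_names[i - 1]][j - 1]


print(y(1, 1), y(1, 2), y(2, 1), y(5, 4))
```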

The Basic ANOVA

More notation:
- $I$ = number of groups
- $n_i$ = number of observations in group $i$
- $n^*$ = total number of observations $= n_1 + n_2 + \dots + n_I$

Formulae:

The group mean for group $i$ is: $\bar{y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$

The grand mean is: $\bar{y}_{\cdot\cdot} = \frac{1}{n^*}\sum_{i=1}^{I}\sum_{j=1}^{n_i} y_{ij}$

To compute the difference between the means we will compare each group mean to the grand mean.

Variation Between Groups

Goal #1 is to describe the variation between the group means.

RECALL: For the independent t test we described the difference between two group means as $\bar{y}_1 - \bar{y}_2$.

In ANOVA we describe the difference between I means as sums of squares between:

$SS(between) = \sum_{i=1}^{I} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$

This can be thought of as the difference between each group mean and the grand mean; look at the formula.

Like the other measures of variation we have used in the past, SS(between) also has degrees of freedom: $df(between) = I - 1$

Finally, our measure of between-group variability is mean square between:

$MS(between) = \frac{SS(between)}{df(between)}$

This measures variability between the sample means.

Variation Within Groups

Goal #2 is to describe the variation within the groups.

RECALL: To measure the variability within a single sample we used:

$s = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1}}$

In ANOVA, to describe the combined variation within I groups we use sums of squares within:

$SS(within) = \sum_{i=1}^{I}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2$

This can be thought of as the combination of variation within the I groups.

SS(within) also has degrees of freedom: $df(within) = n^* - I$

Finally, our measure of within variability is mean square within:

$MS(within) = \frac{SS(within)}{df(within)}$

This is a measure of variability within the groups.
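The formulas above can be applied directly to the pea yields. The following is a minimal sketch in plain Python (no statistics package assumed); the variable names are illustrative and simply mirror the slide notation.

```python
# Pea yields (bushels) by variety.
yields = {
    "A": [26.2, 24.3, 21.8, 28.1],
    "B": [29.2, 28.1, 27.3, 31.2],
    "C": [29.1, 30.8, 33.9, 32.8],
    "D": [21.3, 22.4, 24.3, 21.8],
    "E": [20.1, 19.3, 19.9, 22.1],
}


def mean(values):
    return sum(values) / len(values)


I = len(yields)                                 # number of groups
n_star = sum(len(g) for g in yields.values())   # total number of observations
grand_mean = sum(sum(g) for g in yields.values()) / n_star

# SS(between): each group mean compared with the grand mean, weighted by n_i.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in yields.values())
# SS(within): each observation compared with its own group mean.
ss_within = sum((v - mean(g)) ** 2 for g in yields.values() for v in g)

df_between = I - 1
df_within = n_star - I
ms_between = ss_between / df_between
ms_within = ss_within / df_within

print(round(ss_between, 2), df_between, round(ms_between, 2))
print(round(ss_within, 2), df_within, round(ms_within, 2))
```

These values should agree with the variety and Error rows of the software output for the pea example.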

More on MS(within)

The quantity MS(within) is a measure of variability within the groups. If there were only one group with n observations, then

$SS(within) = \sum_{j=1}^{n}(y_j - \bar{y})^2$, $df(within) = n^* - 1 = n - 1$

$MS(within) = \frac{\sum_{j=1}^{n}(y_j - \bar{y})^2}{n - 1}$

This is the sample variance $s^2$ from an earlier chapter!

ANOVA deals with several groups simultaneously. MS(within) is a combination of the variances of the I groups; it pools together measurements of variability from the different groups.

With similar logic, MS(within) for two groups can be transformed into the pooled standard deviation (remember our talk in Chapter 7 about the pooled and unpooled methods?):

$s_{pooled} = \sqrt{MS(within)}$

A Fundamental Relationship of ANOVA

The last formula-based discussion we need to have is regarding the total variability in the data:

$(y_{ij} - \bar{y}_{\cdot\cdot}) = (y_{ij} - \bar{y}_{i\cdot}) + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})$

The left side is the deviation of an observation from the grand mean (total variability); the first term on the right is the within part and the second is the between part.

This also corresponds to the sums of squares:

$\sum_{i=1}^{I}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{I}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2 + \sum_{i=1}^{I} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$

This means SS(total) = SS(within) + SS(between).

SS(total) measures the variability among all $n^*$ observations in the I groups.

$df(total) = df(within) + df(between) = (n^* - I) + (I - 1) = n^* - 1$

ANOVA Calculations

You've probably noticed that we haven't crunched any of these numbers yet. The calculations are fairly intense. Computers are going to rescue us: SOCR.

Example: peas (cont.)

NOTE: between and within variances may be referred to as SST (treatment = variety) and SSE (Error, within).

One-way ANOVA: yield versus variety

Source    DF      SS     MS      F      P
variety    4  342.04  85.51  23.97  0.000
Error     15   53.52   3.57
Total     19  395.56

S = 1.889   R-Sq = 86.47%   R-Sq(adj) = 82.86%

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean  StDev
A      4  25.100  2.692
B      4  28.950  1.690
C      4  31.650  2.130
D      4  22.450  1.313
E      4  20.350  1.215

Pooled StDev = 1.889

[Interval plot: individual 95% CIs for each mean, axis from 20.0 to 32.0]
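The entries of the software output are tied together by the relationships above, and that can be checked with a few lines of arithmetic. This sketch copies the numbers from the pea ANOVA table; it also uses the standard definition R-Sq = SS(between)/SS(total), which the slides do not state explicitly.

```python
import math

# Values copied from the one-way ANOVA table for the pea example.
ss_between, df_between = 342.04, 4
ss_within, df_within = 53.52, 15
ss_total, df_total = 395.56, 19

# Fundamental relationship: the sums of squares and the df both add up.
assert math.isclose(ss_between + ss_within, ss_total, abs_tol=0.01)
assert df_between + df_within == df_total

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within      # ratio of the two mean squares
pooled_sd = math.sqrt(ms_within)     # "S" / "Pooled StDev" in the output
r_squared = ss_between / ss_total    # share of total variability between groups

print(round(f_stat, 2), round(pooled_sd, 3), round(100 * r_squared, 2))
```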

The ANOVA table

Standard for all ANOVAs; it also helps keep you organized.

Source    df          SS                                                                  MS
Between   $I - 1$     $\sum_{i=1}^{I} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$              SS(between)/df(between)
Within    $n^* - I$   $\sum_{i=1}^{I}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2$               SS(within)/df(within)
Total     $n^* - 1$   $\sum_{i=1}^{I}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{\cdot\cdot})^2$

This is our hypothesis test for ANOVA.

#1 General form of the hypotheses:
$H_0: \mu_1 = \mu_2 = \dots = \mu_I$
$H_a$: at least two of the $\mu_k$'s are different

$H_0$ is compound when $I > 2$, so rejecting $H_0$ doesn't tell us which $\mu_k$'s are not equal, only that at least two are different.

#2 The test statistic:

$F_s = \frac{MS(between)}{MS(within)}$

$F_s$ will be large if there is a lot of between variation when compared to the within variation, that is, if discrepancies in the means are large relative to the variability within the groups.

#3 The p-value is based on the F distribution (named after Fisher). It depends on the numerator df and the denominator df. See Table 10, pp. 687-696 (or the SOCR resource).

#4 The conclusion (TBD)

Large values of $F_s$ provide evidence against $H_0$.

Example

Is there a significant difference between these 3 groups at α = 0.05?

Group  Observations  N  S (sum)  Mean
A      0, 1          2  1        0.5
B      0, 1, 2       3  3        1
C      4, 5          2  9        4.5

$H_0: \mu_1 = \mu_2 = \dots = \mu_I$
$H_a$: some $\mu_i \neq \mu_j$

$\bar{y}_{\cdot\cdot} = \frac{13}{7} = 1.86$

$SST = SS(between) = \sum_{i} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = 2(1.36)^2 + 3(0.86)^2 + 2(2.64)^2 = 19.86$

$SSE = SS(within) = \sum_{i}\sum_{j}(y_{ij} - \bar{y}_{i\cdot})^2 = [0.5^2 + 0.5^2] + [1^2 + 0^2 + 1^2] + [0.5^2 + 0.5^2] = 3$

$F = \frac{MS(between)}{MS(within)} = \frac{MSST}{MSSE} = \frac{SST/df}{SSE/df} \sim F(2, 4)$

$F = \frac{19.86/2}{3/4} = 13.24$; p-value = 0.017 (SOCR)

Reject $H_0$!

http://socr.stat.ucla.edu/test/socr_analyses.html
http://socr.stat.ucla.edu/applets.dir/normal_t_chi2_f_tables.htm

Example: Peas (cont.)

Do the data provide evidence to suggest that there is a difference in yield among the five varieties of peas? Test using α = 0.05.

$H_0: \mu_1 = \mu_2 = \dots = \mu_I$, where 1 = A, 2 = B, etc.
$H_a$: at least two of the $\mu_k$'s are different
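The small three-group example is easy to reproduce by hand or in code. The sketch below recomputes SST, SSE, and F from the raw observations. For the p-value it relies on a closed form that holds only when the numerator df equals 2, $P(F > f) = (1 + 2f/df_{within})^{-df_{within}/2}$; this shortcut is an addition here, not the method used on the slides (which point to SOCR or the F table).

```python
# Observations for the three-group example.
groups = {"A": [0, 1], "B": [0, 1, 2], "C": [4, 5]}


def mean(values):
    return sum(values) / len(values)


all_values = [v for g in groups.values() for v in g]
grand_mean = mean(all_values)  # 13 / 7

ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

df_between = len(groups) - 1               # I - 1 = 2
df_within = len(all_values) - len(groups)  # n* - I = 4

f_stat = (ss_between / df_between) / (ss_within / df_within)

# Closed form valid ONLY because the numerator df is 2 (assumption checked here).
assert df_between == 2
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(round(ss_between, 2), round(ss_within, 2), round(f_stat, 2), round(p_value, 3))
```

For other numerator df values a statistics package or the F table is needed for the p-value.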

One-way ANOVA: yield versus variety

Source    DF      SS     MS      F      P
variety    4  342.04  85.51  23.97  0.000
Error     15   53.52   3.57
Total     19  395.56

S = 1.889   R-Sq = 86.47%   R-Sq(adj) = 82.86%

$F = \frac{85.51}{3.57} = 23.97$, p = 0.000

Because 0.000 < 0.05 we will reject $H_0$.

CONCLUSION: The data show that at least two of the true mean yields of the five varieties of peas are statistically significantly different (p = 0.000).

Notice we can only say that at least two of the means are different:
- not which two are different!
- not that all means are different!

Example: Peas (cont.)

Suppose we need to get the p-value using the table. Back to bracketing!

numerator df = 4, denominator df = 15

p < 0.0001, so we will again reject $H_0$.

Don't need to worry about doubling!

Practice

Example: Parents are frequently concerned when their child seems slow to begin walking. In 1972 Science reported on an experiment in which the effects of several different treatments on the age at which a child first walks were compared. Children in the first group were given special walking exercises for 12 minutes daily, beginning at the age of 1 week and lasting 7 weeks. The second group of children received daily exercises, but not the walking exercises administered to the first group. The third and fourth groups received no special treatment and differed only in that the third group's progress was checked weekly and the fourth was checked only at the end of the study. Observations on age (months) when the child began to walk are shown below.

Grp_1   Grp_2   Grp_3   Grp_4
 9      11      11.5    13.25
 9.5    10      12      11.5
 9.75   10       9      12
10      11.75   11.5    13.5
13      10.5    13.25   11.5
 9.5    15      13

Suppose $\sum_{i} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = 14.78$ and $\sum_{i}\sum_{j}(y_{ij} - \bar{y}_{i\cdot})^2 = 43.69$.

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: at least two of the $\mu_k$'s are different

Source    df             SS      MS                 F
Between   4 - 1 = 3      14.78   14.78/3 = 4.93     4.93/2.30 = 2.14
Within    23 - 4 = 19    43.69   43.69/19 = 2.30
Total     23 - 1 = 22    58.47
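The practice table can be filled in from the raw walking ages, including the unequal group sizes. The function below is a minimal sketch; `one_way_anova` is a hypothetical helper name, not part of any package, and it returns only the quantities that appear in the table.

```python
def one_way_anova(groups):
    """Return df, SS, MS and F for a one-way ANOVA on a list of samples."""
    means = [sum(g) / len(g) for g in groups]
    n_star = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_star

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_star - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within, n_star - 1),
        "SS": (ss_between, ss_within, ss_between + ss_within),
        "MS": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


# Age (months) at first walking, one list per treatment group.
walking = [
    [9, 9.5, 9.75, 10, 13, 9.5],
    [11, 10, 10, 11.75, 10.5, 15],
    [11.5, 12, 9, 11.5, 13.25, 13],
    [13.25, 11.5, 12, 13.5, 11.5],
]

table = one_way_anova(walking)
print(table["df"])
print([round(x, 2) for x in table["SS"]])
print([round(x, 2) for x in table["MS"]], round(table["F"], 2))
```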

Practice

With 3 numerator df and 19 denominator df, 0.1 < p < 0.2, so we fail to reject $H_0$.

CONCLUSION: These data do not provide sufficient evidence that a child's true mean walking age differs among the four treatment groups (0.1 < p < 0.2).

Now you try to replicate these results using the computer and the file walking.mtw.

One-way ANOVA: age versus treatment

Source      DF     SS    MS     F      P
treatment    3  14.78  4.93  2.14  0.129
Error       19  43.69  2.30
Total       22  58.47

S = 1.516   R-Sq = 25.28%   R-Sq(adj) = 13.48%

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean  StDev
1      6  10.125  1.447
2      6  11.375  1.896
3      6  11.708  1.520
4      5  12.350  0.962

Pooled StDev = 1.516

[Interval plot: individual 95% CIs for each mean, axis from 9.0 to 13.5]

Applicability of Methods

Standard Conditions. ANOVA is valid if:

1. Design conditions:
   a. It is reasonable that the groups of observations are random samples from their respective populations. Observations within each group must be independent of one another.
   b. The I samples must be independent.
2. Population conditions:
   - The I population distributions must be approximately normal with equal standard deviations, $\sigma_1 = \sigma_2 = \dots = \sigma_I$.
   * Normality is less crucial if the sample sizes are large.

Verification of Conditions:
- Look for bias, hierarchy, and dependence.
- Normality: normal probability plot of each group.
- Standard deviations are approximately equal if (RULE OF THUMB): largest sd / smallest sd < 2.
- If not, we cannot be confident in our p-value from the F distribution.

Once we reject $H_0$ for the ANOVA, we know that at least two of the $\mu_k$'s are different. We need to find which group means are different, but we shouldn't use a bunch of independent t tests. We discussed earlier in this chapter that an independent t test for each two-group combination can inflate the overall risk of a type I error.

A naïve approach would be to calculate one-sample CIs for the mean using the pooled standard deviation (on the assumption that the SDs were approximately equal) and look for overlap in the CIs. The problem is that these are still 95% CIs, each with alpha = 0.05.

Individual 95% CIs For Mean Based on Pooled StDev

Level  N    Mean  StDev
A      4  25.100  2.692
B      4  28.950  1.690
C      4  31.650  2.130
D      4  22.450  1.313
E      4  20.350  1.215

[Interval plot: individual 95% CIs for each mean, axis from 20.0 to 32.0]
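The equal-standard-deviation rule of thumb from the verification of conditions above is simple to check once the group SDs are known. This is a minimal sketch using the StDev columns from the two software outputs; the function names are hypothetical.

```python
def sd_ratio(sds):
    """Largest sample SD divided by the smallest."""
    return max(sds) / min(sds)


def equal_sd_plausible(sds, limit=2.0):
    """Rule of thumb: SDs are 'approximately equal' if the ratio is below 2."""
    return sd_ratio(sds) < limit


# StDev columns copied from the software output for each example.
walking_sds = [1.447, 1.896, 1.520, 0.962]
pea_sds = [2.692, 1.690, 2.130, 1.313, 1.215]

for name, sds in [("walking", walking_sds), ("peas", pea_sds)]:
    print(name, round(sd_ratio(sds), 2), equal_sd_plausible(sds))
```

The walking data pass narrowly. The pea SDs give a ratio a little above 2, so by this rule of thumb alone the pea p-value deserves some extra caution, although with a p-value this small the conclusion is unlikely to hinge on it.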

A better solution is to compare each group with an overall α of 0.05. For this we use a technique called a multiple comparison (MC) procedure. The idea is to compare means two at a time at a reduced significance level, to ensure the desired "overall" α.

There are many different MC procedures:
- Bonferroni: simple and conservative. Each CI is calculated with (overall error rate)/(# of comparisons).
- Newman-Keuls: less conservative/more powerful, but complicated.
- Tukey procedure: easy to use with MTB.

We will focus on the Tukey method:
- Uses confidence intervals for the difference in means.
- The confidence intervals are similar to those in Chapter 7 for the difference of two means, using an adjusted α.
- RECALL: The zero rule.
- We will rely on the computer to calculate these intervals.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of variety

Individual confidence level = 99.25%

variety = A subtracted from:

variety    Lower   Center    Upper
B         -0.277    3.850    7.977
C          2.423    6.550   10.677
D         -6.777   -2.650    1.477
E         -8.877   -4.750   -0.623

variety = B subtracted from:

variety    Lower   Center    Upper
C         -1.427    2.700    6.827
D        -10.627   -6.500   -2.373
E        -12.727   -8.600   -4.473

variety = C subtracted from:

variety    Lower   Center    Upper
D        -13.327   -9.200   -5.073
E        -15.427  -11.300   -7.173

variety = D subtracted from:

variety    Lower   Center    Upper
E         -6.227   -2.100    2.027

[Interval plots: each interval drawn on a common axis from -8.0 to 16.0]

What does this mean?

The best way to summarize would be to think of the means in order from large to small and cite the differences (it is not necessary to repeat a pair):
- The true mean yield for variety A is statistically significantly different from the true means of varieties C and E.
- The true mean yield for variety B is statistically significantly different from the true means of varieties D and E.
- The true mean yield for variety C is statistically significantly different from the true means of varieties A, D, and E (however, C vs. A was previously mentioned).

[Figure: Individual Value Plot of yield vs variety]

Example: Walking age (cont.)

We do not need to carry out Tukey's test for this data. Why?

F = 2.14, p = 0.129
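The zero rule used to read the Tukey output for the peas can be sketched in a few lines: a pair of varieties is declared different when its simultaneous interval excludes zero. The interval limits are copied from the Tukey output; the dictionary layout and function name are assumptions for illustration. The last lines show the per-comparison level a Bonferroni adjustment would use for the same 10 comparisons.

```python
# (lower, upper) limits from the Tukey output; key (x, y) means "y minus x".
tukey_intervals = {
    ("A", "B"): (-0.277, 7.977),
    ("A", "C"): (2.423, 10.677),
    ("A", "D"): (-6.777, 1.477),
    ("A", "E"): (-8.877, -0.623),
    ("B", "C"): (-1.427, 6.827),
    ("B", "D"): (-10.627, -2.373),
    ("B", "E"): (-12.727, -4.473),
    ("C", "D"): (-13.327, -5.073),
    ("C", "E"): (-15.427, -7.173),
    ("D", "E"): (-6.227, 2.027),
}


def excludes_zero(interval):
    """Zero rule: the two means differ if 0 lies outside the interval."""
    lower, upper = interval
    return not (lower <= 0 <= upper)


significant = sorted(pair for pair, ci in tukey_intervals.items() if excludes_zero(ci))
print(significant)

# Bonferroni alternative: overall error rate divided by the number of comparisons.
overall_alpha = 0.05
bonferroni_alpha = overall_alpha / len(tukey_intervals)
print(bonferroni_alpha)
```

The pairs flagged here match the written summary: A differs from C and E, B from D and E, and C from D and E.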

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of treatment

Individual confidence level = 98.89%

treatment = 1 subtracted from:

treatment   Lower   Center   Upper
2          -1.214    1.250   3.714
3          -0.881    1.583   4.047
4          -0.359    2.225   4.809

treatment = 2 subtracted from:

treatment   Lower   Center   Upper
3          -2.131    0.333   2.797
4          -1.609    0.975   3.559

treatment = 3 subtracted from:

treatment   Lower   Center   Upper
4          -1.942    0.642   3.226

[Interval plots: each interval drawn on a common axis from -2.5 to 5.0]