Importance of being uncertain


© 2013 Nature America, Inc. All rights reserved.

Points of Significance

Importance of being uncertain

Statistics does not tell us whether we are right. It tells us the chances of being wrong.

When an experiment is reproduced we almost never obtain exactly the same results. Instead, repeated measurements span a range of values because of biological variability and precision limits of measuring equipment. But if results are different each time, how do we determine whether a measurement is compatible with our hypothesis? In "the great tragedy of Science, the slaying of a beautiful hypothesis by an ugly fact"¹, how is 'ugliness' measured?

Statistics helps us answer this question. It gives us a way to quantitatively model the role of chance in our experiments and to represent data not as precise measurements but as estimates with error. It also tells us how error in input values propagates through calculations. The practical application of this theoretical framework is to associate uncertainty with the outcome of experiments and to assign confidence levels to statements that generalize beyond observations.

Although many fundamental concepts in statistics can be understood intuitively, as natural pattern-seekers we must recognize the limits of our intuition when thinking about chance and probability. The Monty Hall problem is a classic example of how the wrong answer can appear far too quickly and too credibly before our eyes. A contestant is given a choice of three doors, only one leading to a prize. After selecting a door (e.g., door 1), the host opens one of the other two doors that does not lead to a prize (e.g., door 2) and gives the contestant the option to switch their pick of doors (e.g., door 3). The vexing question is whether it is in the contestant's best interest to switch. The answer is yes, but you would be in good company if you thought otherwise. When a solution was published in Parade magazine, thousands of readers (many with PhDs) wrote in that the answer was wrong². Comments varied from "You made a mistake, but look at the positive side.
If all those PhDs were wrong, the country would be in some very serious trouble" to "I must admit I doubted you until my fifth grade math class proved you right."

The Points of Significance column will help you move beyond an intuitive understanding of fundamental statistics relevant to your work. Its aim will be to address the observation that "approximately half the articles published in medical journals that use statistical methods use them incorrectly"³. Our presentation will be practical and cogent, with focus on foundational concepts, practical tips and common misconceptions⁴. A spreadsheet will often accompany each column to demonstrate the calculations (Supplementary Table 1). We will not exhaust you with mathematics.

Statistics can be broadly divided into two categories: descriptive and inferential. The first summarizes the main features of a data set with measures such as the mean and standard deviation (s.d.). The second generalizes from observed data to the world at large. Underpinning both are the concepts of sampling and estimation, which address the process of collecting data and quantifying the uncertainty in these generalizations.

Figure 1. The mean and s.d. are commonly used to characterize the location and spread of a distribution. When referring to a population, these measures are denoted by the symbols μ and σ.

To discuss sampling, we need to introduce the concept of a population, which is the set of entities about which we make inferences. The frequency histogram of all possible values of an experimental variable is called the population distribution (Fig. 1). We are typically interested in inferring the mean (μ) and the s.d. (σ) of a population, two measures that characterize its location and spread (Fig. 1). The mean is calculated as the arithmetic average of values and can be unduly influenced by extreme values. The median is a more robust measure of location and more suitable for distributions that are skewed or otherwise irregularly shaped. The s.d. is calculated based on the square of the distance of each value from the mean.
It often appears as the variance (σ²) because its properties are mathematically easier to formulate. The s.d. is not an intuitive measure, and rules of thumb help us in its interpretation. For example, for a normal distribution, 39%, 68%, 95% and 99.7% of values fall within ±0.5σ, ±1σ, ±2σ and ±3σ. These cutoffs do not apply to populations that are not approximately normal, whose spread is easier to interpret using the interquartile range.

Fiscal and practical constraints limit our access to the population: we cannot directly measure its mean (μ) and s.d. (σ). The best we can do is estimate them using our collected data through the process of sampling (Fig. 2). Even if the population is limited to a narrow range of values, such as between 0 and 30 (Fig. 2a), the random nature of sampling will impart uncertainty to our estimate of its shape.

Figure 2. Population parameters are estimated by sampling. (a) Frequency histogram of the values in a population. (b) Three representative samples taken from the population in a, with their sample means (for example, X₁ = [1, 9, 17, 20, 26] with X̄₁ = 14.6 and X₂ = [8, 11, 16, 24, 25] with X̄₂ = 16.8). (c) Frequency histogram of means of all possible samples of size n = 5 taken from the population in a.

Samples are sets of data drawn from the population (Fig. 2b), characterized by the number of data points n, usually denoted by X and indexed by a numerical subscript (X₁). Larger samples approximate the population better.

To maintain validity, the sample must be representative of the population. One way of achieving this is with a simple random sample, where all values in the population have an equal chance of being selected at each stage of the sampling process. Representative does not mean that the sample is a miniature replica of the population. In general, a sample will not resemble the population unless n is very large. When constructing a sample, it is not always obvious whether it is free from bias. For example, surveys sample only individuals who agreed to participate and do not capture information about those who refused. These two groups may be meaningfully different.

Samples are our windows to the population, and their statistics are used to estimate those of the population. The sample mean and s.d. are denoted by X̄ and s. The distinction between sample and population variables is emphasized by the use of Roman letters for samples and Greek letters for population (s versus σ).

Sample parameters such as X̄ have their own distribution, called the sampling distribution (Fig. 2c), which is constructed by considering all possible samples of a given size. Sample distribution parameters are marked with a subscript of the associated sample variable (for example, μ_X̄ and σ_X̄ are the mean and s.d. of the sample means of all samples). Just like the population, the sampling distribution is not directly measurable because we do not have access to all possible samples. However, it turns out to be an extremely useful concept in the process of estimating population statistics.

Notice that the distribution of sample means in Figure 2c looks quite different from the population in Figure 2a. In fact, it appears similar in shape to a normal distribution. Also notice that its spread, σ_X̄, is quite a bit smaller than that of the population, σ. Despite these differences, the population and sampling distributions are intimately related. This relationship is captured by one of the most important and fundamental statements in statistics, the central limit theorem (CLT).

The CLT tells us that the distribution of sample means (Fig. 2c) will become increasingly close to a normal distribution as the sample size increases, regardless of the shape of the population distribution (Fig. 3), as long as the frequency of extreme values drops off quickly. The CLT also relates population and sample distribution parameters by μ_X̄ = μ and σ_X̄ = σ/√n. The terms in the second relationship are often confused: σ_X̄ is the spread of sample means, and σ is the spread of the underlying population. As we increase n, σ_X̄ will decrease (our samples will have more similar means) but σ will not change (sampling has no effect on the population). The measured spread of sample means is also known as the standard error of the mean (s.e.m., SE_X̄) and is used to estimate σ_X̄.

Figure 3. The distribution of sample means from most distributions will be approximately normally distributed. Shown are sampling distributions of sample means for 10,000 samples for indicated sample sizes (n = 3, 5, 10 and 20) drawn from four different distributions (normal, skewed, uniform and irregular). Mean and s.d. are indicated as in Figure 1.

A demonstration of the CLT for different population distributions (Fig. 3) qualitatively shows the increase in precision of our estimate of the population mean with increase in sample size. Notice that it is still possible for a sample mean to fall far from the population mean, especially for small n. For example, in ten iterations of drawing 10,000 samples of size n = 3 from the irregular distribution, the fraction of sample means that fell outside μ ± σ (indicated by vertical dotted lines in Fig. 3) ranged from 7.6% to 8.6%. Thus, use caution when interpreting means of small samples.

Figure 4. The mean (X̄), s.d. (s) and s.e.m. of three samples of increasing size drawn from the distribution in Figure 2a. As n is increased, X̄ and s more closely approximate μ and σ. The s.e.m. (s/√n) is an estimate of σ_X̄ and measures how well the sample mean approximates the population mean.

Always keep in mind that your measurements are estimates, which you should not endow with "an aura of exactitude and finality"⁵. The omnipresence of variability will ensure that each sample will be different. Moreover, as a consequence of the 1/√n proportionality factor in the CLT, the precision increase of a sample's estimate of the population is much slower than the rate of data collection.
In Figure 4 we illustrate this variability and convergence for three samples drawn from the distribution in Figure 2a, as their size is progressively increased from n = 1 to n = 100. Be mindful of both effects and their role in diminishing the impact of additional measurements: to double your precision, you must collect four times more data.

Next month we will continue with the theme of estimation and discuss how uncertainty can be bounded with confidence intervals and visualized with error bars.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper (doi:10.1038/nmeth.2613).

Competing Financial Interests
The authors declare no competing financial interests.

Martin Krzywinski & Naomi Altman

1. Huxley, T.H. in Collected Essays 8, 229 (Macmillan, 1894).
2. vos Savant, M. Game show problem. (accessed 29 July 2013).
3. Glantz, S.A. Circulation 61, 1-7 (1980).
4. Huck, S.W. Statistical Misconceptions (Routledge, 2009).
5. Abelson, R.P. Statistics as Principled Argument 27 (Psychology Press, 1995).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

nature methods | VOL.10 NO.9 | SEPTEMBER 2013 | 809-810
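The relationship σ_X̄ = σ/√n described in this column can be checked empirically. The following is a minimal sketch, not the authors' spreadsheet: it assumes a skewed exponential population with mean 1 and s.d. 1 as a stand-in for the populations in the figures, and the function name, trial count and seed are illustrative choices.

```python
import random
import statistics


def spread_of_sample_means(n, trials=20000, seed=1):
    """Return the s.d. of the means of `trials` samples of size n.

    Assumption: the population is exponential with rate 1 (mean 1, s.d. 1),
    a skewed distribution chosen only for illustration.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    sigma = 1.0  # s.d. of the assumed exponential population
    for n in (4, 16, 64):
        observed = spread_of_sample_means(n)
        predicted = sigma / n ** 0.5  # CLT: sigma / sqrt(n)
        print(f"n = {n:3d}  observed = {observed:.3f}  predicted = {predicted:.3f}")
```

Each fourfold increase in n should roughly halve the observed spread of sample means, which is the "four times more data to double your precision" rule stated above.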

Points of Significance

Error bars

The meaning of error bars is often misinterpreted, as is the statistical significance of their overlap.

Last month in Points of Significance, we showed how samples are used to estimate population statistics. We emphasized that, because of chance, our estimates had an uncertainty. This month we focus on how uncertainty is represented in scientific publications and reveal several ways in which it is frequently misinterpreted.

The uncertainty in estimates is customarily represented using error bars. Although most researchers have seen and used error bars, misconceptions persist about how error bars relate to statistical significance. When asked to estimate the required separation between two points with error bars for a difference at significance P = 0.05, only 22% of respondents were within a factor of 2 (ref. 1). In light of the fact that error bars are meant to help us assess the significance of the difference between two values, this observation is disheartening and worrisome.

Here we illustrate error bar differences with examples based on a simplified situation in which the values are means of independent (unrelated) samples of the same size and drawn from normal populations with the same spread. We calculate the significance of the difference in the sample means using the two-sample t-test and report it as the familiar P value. Although reporting the exact P value is preferred, conventionally, significance is often assessed at a P = 0.05 threshold. We will discuss P values and the t-test in more detail in a subsequent column.

The importance of distinguishing the error bar type is illustrated in Figure 1, in which the three common types of error bars, standard deviation (s.d.), standard error of the mean (s.e.m.) and confidence interval (CI), show the spread in values of two samples of size n = 10 together with the P value of the difference in sample means. In Figure 1a, we simulated the samples so that each error bar type has the same length, chosen to make them exactly abut.
Although these three data pairs and their error bars are visually identical, each represents a different data scenario with a different P value. In Figure 1b, we fixed the P value to P = 0.05 and show the length of each type of bar for this level of significance. In this latter scenario, each of the three pairs of points represents the same pair of samples, but the bars have different lengths because they indicate different statistical properties of the same data. And because each bar is a different length, you are likely to interpret each one quite differently. In general, a gap between bars does not ensure significance, nor does overlap rule it out; it depends on the type of bar. Chances are you were surprised to learn this unintuitive result.

Figure 1. Error bar width and interpretation of spacing depends on the error bar type. (a,b) Example graphs are based on sample means of 0 and 1 (n = 10). (a) When bars are scaled to the same size and abut, P values span a wide range. When s.e.m. bars touch, P is large (P = 0.17). (b) Bar size and relative position vary greatly at the conventional P value significance cutoff of 0.05, at which bars may overlap or have a gap.

Figure 2. The size and position of confidence intervals depend on the sample. On average, CI% of intervals are expected to span the mean: about 19 in 20 times for 95% CI. (a) Means and 95% CIs of 20 samples (n = 10) drawn from a normal population with mean μ and s.d. σ. By chance, two of the intervals (red) do not capture the mean. (b) Relationship between s.e.m. and 95% CI error bars with increasing n.

The first step in avoiding misinterpretation is to be clear about which measure of uncertainty is being represented by the error bar. In 2012, error bars appeared in Nature Methods in about two-thirds of the figure panels in which they could be expected (scatter and bar plots). The type of error bars was nearly evenly split between s.d. and s.e.m. bars (45% versus 49%, respectively). In 5% of cases the error bar type was not specified in the legend. Only one figure² used bars based on the 95% CI.
CIs are a more intuitive measure of uncertainty and are popular in the medical literature.

Error bars based on s.d. inform us about the spread of the population and are therefore useful as predictors of the range of new samples. They can also be used to draw attention to very large or small population spreads. Because s.d. bars only indirectly support visual assessment of differences in values, if you use them, be ready to help your reader understand that the s.d. bars reflect the variation of the data and not the error in your measurement. What should a reader conclude from the very large and overlapping s.d. error bars for P = 0.05 in Figure 1b? That although the means differ, and this can be detected with a sufficiently large sample size, there is considerable overlap in the data from the two populations.

Unlike s.d. bars, error bars based on the s.e.m. reflect the uncertainty in the mean and its dependency on the sample size, n (s.e.m. = s.d./√n). Intuitively, s.e.m. bars shrink as we perform more measurements. Unfortunately, the commonly held view that "if the s.e.m. bars do not overlap, the difference between the values is statistically significant" is incorrect. For example, when n = 10 and s.e.m. bars just touch, P = 0.17 (Fig. 1a). Conversely, to reach P = 0.05, s.e.m. bars for these data need to be about 0.86 arm lengths apart (Fig. 1b). We cannot overstate the importance of recognizing the difference between s.d. and s.e.m.

The third type of error bar you are likely to encounter is that based on the CI. This is an interval estimate that indicates the reliability of a measurement³. When scaled to a specific confidence level (CI%), the 95% CI being common, the bar captures the population mean CI% of the time (Fig. 2a). The size of the s.e.m. is compared to the 95% CI in Figure 2b. The two are related by the t-statistic, and in large samples the s.e.m. bar can be interpreted as a CI with a confidence level of 67%. The size of the CI depends on n; two useful approximations for the CI are 95% CI ≈ 4 × s.e.m. (n = 3) and 95% CI ≈ 2 × s.e.m. (n > 15).
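The statement that a 95% CI captures the population mean about 19 times in 20 can be checked by simulation. The sketch below is an illustration under stated assumptions rather than the calculation behind Figure 2: it draws samples of size n = 10 from a normal population whose mean and s.d. (10 and 1) are arbitrary choices, and it hard-codes the two-sided 95% critical value of t for 9 degrees of freedom (2.262, taken from a standard t table). The function and constant names are hypothetical.

```python
import random
import statistics

# Two-sided 95% critical value of Student's t for d.f. = 9 (from a t table).
T_CRIT_DF9 = 2.262


def ci_coverage(mu=10.0, sigma=1.0, n=10, trials=10000, seed=7):
    """Fraction of simulated 95% CIs (mean +/- t* x s.e.m.) that contain mu.

    n must stay at 10 because the critical value above is specific to d.f. = 9.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        sem = statistics.stdev(sample) / n ** 0.5
        if abs(mean - mu) <= T_CRIT_DF9 * sem:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"coverage of the 95% CI: {ci_coverage():.3f}")
```

Replacing `T_CRIT_DF9` with 1 turns the interval into an s.e.m. bar, whose coverage at n = 10 falls below the large-sample 67% figure because the t distribution has heavier tails at small n.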

Figure 3. Size and position of s.e.m. and 95% CI error bars for common P values. Examples are based on sample means of 0 and 1 (n = 10).

A common misconception about CIs is an expectation that a CI captures the mean of a second sample drawn from the same population with a CI% chance. Because CI position and size vary with each sample, this chance is actually lower.

This variety in bars can be overwhelming, and visually relating their relative position to a measure of significance is challenging. We provide a reference of error bar spacing for common P values in Figure 3. Notice that P = 0.05 is not reached until s.e.m. bars are separated by about 1 s.e.m., whereas 95% CI bars are more generous and can overlap by as much as 50% and still indicate a significant difference. If 95% CI bars just touch, the result is highly significant (P = 0.005). All the figures can be reproduced using the spreadsheet available in Supplementary Table 1, with which you can explore the relationship between error bar size, gap and P value.

Be wary of error bars for small sample sizes: they are not robust, as illustrated by the sharp decrease in size of CI bars in that regime (Fig. 2b). In these cases (e.g., n = 3), it is better to show individual data values. Furthermore, when dealing with samples that are related (e.g., paired, such as before and after treatment), other types of error bars are needed, which we will discuss in a future column.

It would seem, therefore, that none of the error bar types is intuitive. An alternative is to select a value of CI% for which the bars touch at a desired P value (e.g., 83% CI bars touch at P = 0.05). Unfortunately, owing to the weight of existing convention, all three types of bars will continue to be used. With our tips, we hope you'll be more confident in interpreting them.

Martin Krzywinski & Naomi Altman

Note: Any Supplementary Information and Source Data files are available in the online version of the paper (doi:10.1038/nmeth.2659).

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Belia, S.F., Fidler, F., Williams, J. & Cumming, G. Psychol. Methods 10 (2005).
2. Frøkjær-Jensen, C., Davis, M.W., Ailion, M. & Jorgensen, E.M. Nat. Methods 9 (2012).
3. Cumming, G., Fidler, F. & Vaux, D.L. J. Cell. Biol. 177, 7-11 (2007).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

nature methods | VOL.10 NO.10 | OCTOBER 2013 | 921-922
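The link between touching bars and the P values quoted in this column can be traced with a short derivation. It is a sketch under the column's simplified setting (two independent samples of equal size n = 10 with equal s.d. s); the symbol g for the distance between the two sample means and the rounded critical value t* ≈ 2.26 (two-sided 95%, d.f. = n − 1 = 9) are introduced here for illustration.

```latex
\begin{align*}
\text{s.e.m.} &= \frac{s}{\sqrt{n}}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{2/n}}
  = \frac{g}{\sqrt{2}\,\text{s.e.m.}}, \qquad
\text{d.f.} = 2n - 2 = 18 \\
\text{s.e.m. bars just touch:}\quad g &= 2\,\text{s.e.m.}
  \;\Rightarrow\; t = \sqrt{2} \approx 1.41 \\
\text{95\% CI bars just touch:}\quad g &= 2\,t^{*}\,\text{s.e.m.}
  \;\Rightarrow\; t = \sqrt{2}\,t^{*} \approx \sqrt{2} \times 2.26 \approx 3.20
\end{align*}
```

With 18 degrees of freedom, t ≈ 1.41 corresponds to a two-sided P of about 0.17 and t ≈ 3.20 to a P of about 0.005, which agrees with the values given for touching s.e.m. bars and touching 95% CI bars.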

Points of Significance

Significance, P values and t-tests

The P value reported by tests is a probabilistic significance, not a biological one.

Bench scientists often perform statistical tests to determine whether an observation is statistically significant. Many tests report the P value to measure the strength of the evidence that a result is not just a likely chance occurrence. To make informed judgments about the observations in a biological context, we must understand what the P value is telling us and how to interpret it. This month we will develop the concept of statistical significance and tests by introducing the one-sample t-test.

To help you understand how statistical testing works, consider the experimental scenario depicted in Figure 1 of measuring protein expression level in a cell line with a western blot. Suppose we measure an expression value of x = 12 and have good reason to believe (for example, from past measurements) that the reference level is μ = 10 (Fig. 1a). What can we say about whether this difference is due to random chance? Statistical testing can answer this question. But first, we need to mathematically frame our intuitive understanding of the biological and technical factors that disperse our measurements across a range of values.

We begin with the assumption that the random fluctuations in the experiment can be characterized by a distribution (Fig. 1b). This distribution is called the null distribution, and it embodies the null hypothesis (H₀) that our observation is a sample from the pool of all possible instances of measuring the reference. We can think of constructing this distribution by making a large number of independent measurements of a protein whose mean expression is known to equal the reference value. This distribution represents the probability of observing a given expression level for a protein that is being expressed at the reference level. The mean of this distribution, μ, is the reference expression, and its spread is determined by reproducibility factors inherent to our experiment.
The purpose of a statistical test is to locate our observation on this distribution to identify the extent to which it is an outlier.

Statistics quantifies the outlier status of an observation by the probability of sampling another observation from the null distribution that is as far or farther away from μ. In our example, this corresponds to measuring an expression value further from the reference than x. This probability is the P value, which is the output of common statistical tests. It is calculated from the area under the distribution curve in the shaded regions (Fig. 1c). In some situations we may care only if x is too big (or too small), in which case we would compute the area of only the dark (light) shaded region of Figure 1c.

Figure 1. The mechanism of statistical testing. (a-c) The significance of the difference between observed (x) and reference (μ) values (a) is calculated by assuming that observations are sampled from a distribution H₀ with mean μ (b). The statistical significance of the observation x is the probability of sampling a value from the distribution that is at least as far from the reference, given by the shaded areas under the distribution curve (c). This is the P value.

Unfortunately, the P value is often misinterpreted as the probability that the null hypothesis (H₀) is true. This mistake is called the 'prosecutor's fallacy', which appeals to our intuition and was so coined because of its frequent use in courtroom arguments. In the process of calculating the P value, we assumed that H₀ was true and that x was drawn from H₀. Thus, a small P value (for example, P = 0.05) merely tells us that an improbable event has occurred in the context of this assumption.
The degree of improbability is evidence against H₀ and supports the alternative hypothesis that the sample actually comes from a population whose mean is different from μ. Statistical significance suggests but does not imply biological significance.

At this point you may ask how we arrive at our assumptions about the null distribution in Figure 1b. After all, in order to calculate P, we need to know its precise shape. Because experimentally determining it is not practical, we need to make an informed guess. For the purposes of this column, we will assume that it is normal. We will discuss robustness of tests to this assumption of normality in another column. To complete our model of H₀, we still need to estimate its spread. To do this we return to the concept of sampling.

To estimate the spread of H₀, we repeat the measurement of our protein's expression. For example, we might make four additional independent measurements to make up a sample with n = 5 (Fig. 2a). We use the mean of expression values (x̄ = 10.85) as a measure of our protein's expression. Next, we make the key assumption that the s.d. of our sample (s_x = 0.96) is a suitable estimate of the s.d. of the null distribution (Fig. 2b). In other words, regardless of whether the sample mean is representative of the null distribution, we assume that its spread is. This assumption of equal variances is common, and we will be returning to it in future columns.

Figure 2. Repeated independent observations are used to estimate the s.d. of the null distribution and derive a more robust P value. (a) A sample of n = 5 observations is taken and characterized by the mean x̄, with error bars showing s.d. (s_x) and s.e.m. (s_x/√n). (b) The null distribution is assumed to be normal, and its s.d. is estimated by s_x. As in Figure 1b, the population mean is assumed to be μ. (c) The average expression is located on the sampling distribution of sample means, whose spread is estimated by the s.e.m. and whose mean is also μ. The P value of x̄ is the shaded area under this curve.

From our discussion about sampling¹, we know that given that H₀ is normal, the sampling distribution of means will also be normal, and we can use s_x/√n to estimate its s.d. (Fig. 2c). We localize the mean expression on this distribution to calculate the P value, analogously to what was done with the single value in Figure 1c. To avoid the nuisance of dealing with a sampling distribution of means for each combination of population parameters, we can transform the mean x̄ to a value determined by the difference of the sample and population means D = x̄ − μ divided by the s.e.m. (s_x/√n). This is called the test statistic.

It turns out, however, that the shape of this sampling distribution is close to, but not exactly, normal. The extent to which it departs from normal is known and given by the Student's t distribution (Fig. 3a), first described by William Gosset, who published under the pseudonym 'Student' (to avoid difficulties with his employer, Guinness) in his work on optimizing barley yields. The test statistic described above is compared to this distribution and is thus called the t statistic. The test illustrated in Figure 2 is called the one-sample t-test.

Figure 3. The t and normal distributions. (a) The t distribution has higher tails that take into account that most samples will underestimate the variability in a population. The distribution is used to evaluate the significance of a t statistic derived from a sample of size n and is characterized by the degrees of freedom, d.f. = n − 1. (b) When n is small, P values derived from the t distribution vary greatly as n changes.

This departure in distribution shape is due to the fact that for most samples, the sample variance, s_x², is an underestimate of the variance of the null distribution. The distribution of sample variances turns out to be skewed. The asymmetry is more evident for small n, where it is more likely that we observe a variance smaller than that of the population. The t distribution accounts for this underestimation by having higher tails than the normal distribution (Fig. 3a). As n grows, the t distribution looks very much like the normal, reflecting that the sample's variance becomes a more accurate estimate.

As a result, if we do not correct for this (if we use the normal distribution in the calculation depicted in Figure 2c), we will be using a distribution that is too narrow and will overestimate the significance of our finding.
For example, using the n = 5 sample in Figure 2a for which t = 1.98, the t distribution gives us P = 0.119. Without the correction built into this distribution, we would underestimate P using the normal distribution as P = 0.048 (Fig. 3b). When n is large, the required correction is smaller: the same t = 1.98 for n = 50 gives P = 0.054, which is now much closer to the value obtained from the normal distribution.

The relationship between t and P is shown in Figure 3b and can be used to express P as a function of the quantities on which t depends (D, s_x, n). For example, if our sample in Figure 2 had a size of at least n = 8, the observed expression difference D = 0.85 would be significant at P < 0.05, assuming we still measured s_x = 0.96 (t = 2.50, P = 0.041). A more general type of calculation can identify conditions for which a test can reliably detect whether a sample comes from a distribution with a different mean. This speaks to the test's power, which we will discuss in the next column.

Another way of thinking about reaching significance is to consider what population means would yield P < 0.05. For our example, these would be μ < 9.66 and μ > 12.04 and define the range of standard expression values (9.66-12.04) that are compatible with our sample. In other words, if the null distribution had a mean within this interval, we would not be able to reject H₀ at P = 0.05 on the basis of our sample. This is the 95% confidence interval introduced last month, given by μ = x̄ ± t* × s.e.m. (a rearranged form of the one-sample t-test equation), where t* is the critical value of the t statistic for a given n and P. In our example, n = 5, P = 0.05 and t* = 2.78. We encourage readers to explore these concepts for themselves using the interactive graphs in Supplementary Table 1.

The one-sample t-test is used to determine whether our samples could come from a distribution with a given mean (for example, to compare the sample mean to a putative fixed value μ) and for constructing confidence intervals for the mean. It appears in many contexts, such as measuring protein expression, the quantity of drug delivered by a medication or the weight of cereal in your cereal box.
The concepts underlying this test are an important foundation for future columns in which we will discuss the comparisons across samples that are ubiquitous in the scientific literature.

Martin Krzywinski & Naomi Altman

Note: Any Supplementary Information and Source Data files are available in the online version of the paper (doi:10.1038/nmeth.2698).

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Krzywinski, M. & Altman, N. Nat. Methods 10 (2013).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

nature methods | VOL.10 NO.11 | NOVEMBER 2013 | 1042
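The one-sample t-test worked through in this column can be reproduced from the summary values it quotes (x̄ = 10.85, s_x = 0.96, n = 5, μ = 10). The sketch below is one possible implementation using only the Python standard library, not the authors' spreadsheet: the tail area of the t distribution is obtained by numerically integrating its density with Simpson's rule, the critical value 2.776 for d.f. = 4 is taken from a standard t table, and all function names are illustrative.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * area * h / 3


def one_sample_t(mean, sd, n, mu0):
    """Return (t, two-sided P) for a sample summary tested against mu0."""
    sem = sd / math.sqrt(n)
    t = (mean - mu0) / sem
    return t, two_sided_p(t, n - 1)


T_CRIT_DF4 = 2.776  # two-sided 95% critical value for d.f. = 4 (from a t table)

t, p = one_sample_t(mean=10.85, sd=0.96, n=5, mu0=10.0)
# The same statistic evaluated against the normal distribution, for comparison.
p_normal = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
sem = 0.96 / math.sqrt(5)
ci = (10.85 - T_CRIT_DF4 * sem, 10.85 + T_CRIT_DF4 * sem)

print(f"t = {t:.2f}, P (t, d.f. = 4) = {p:.3f}, P (normal) = {p_normal:.3f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

The printed values should come out close to those in the text: t near 1.98, P near 0.119 from the t distribution against roughly 0.048 from the normal, and a 95% CI of about 9.66 to 12.04. Calling `one_sample_t(10.85, 0.96, 8, 10.0)` gives the n = 8 scenario.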

Points of Significance

Power and sample size

The ability to detect experimental effects is undermined in studies that lack power.

Statistical testing provides a paradigm for deciding whether the data are or are not typical of the values expected when the hypothesis is true. Because our objective is usually to detect a departure from the null hypothesis, it is useful to define an alternative hypothesis that expresses the distribution of observations when the null is false. The difference between the distributions captures the experimental effect, and the probability of detecting the effect is the statistical power.

Statistical power is critically relevant but often overlooked. When power is low, important effects may not be detected, and in experiments with many conditions and outcomes, such as 'omics' studies, a large percentage of the significant results may be wrong. Figure 1 illustrates this by showing the proportion of inference outcomes in two sets of experiments. In the first set, we optimistically assume that hypotheses have been screened, and 50% have a chance for an effect (Fig. 1a). If they are tested at a power of 0.2, identified as the median in a recent review of neuroscience literature¹, then 80% of true positive results will be missed, and 20% of positive results will be wrong (positive predictive value, PPV = 0.80), assuming testing was done at the 5% level (Fig. 1b).

In experiments with multiple outcomes (e.g., gene expression studies), it is not unusual for fewer than 10% of the outcomes to have an a priori chance of an effect. If 90% of hypotheses are null (Fig. 1a), the situation at the 0.2 power level is bleak: over two-thirds of the positive results are wrong (PPV = 0.31; Fig. 1b). Even at the conventionally acceptable minimum power of 0.8, more than one-third of positive results are wrong (PPV = 0.64) because although we detect a greater fraction of the true effects (8 out of 10), we declare a larger absolute number of false positives (4.5 out of 90 nulls).
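The PPV figures in the two scenarios above follow from simple bookkeeping of true and false positives. The sketch below reproduces that arithmetic; the function name and argument names are illustrative, and the default significance level of 0.05 matches the 5% level assumed in the text.

```python
def positive_predictive_value(prior, power, alpha=0.05):
    """Fraction of positive results that reflect a real effect.

    prior: fraction of tested hypotheses for which an effect is present
    power: probability of detecting an effect that is present
    alpha: probability of rejecting a null hypothesis that is true
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    for prior, power in [(0.5, 0.2), (0.1, 0.2), (0.1, 0.8)]:
        ppv = positive_predictive_value(prior, power)
        print(f"effect chance = {prior:.0%}, power = {power}: PPV = {ppv:.2f}")
```

For example, with 100 hypotheses of which 10 carry an effect, a power of 0.8 yields 8 true positives and testing the 90 nulls at the 5% level yields 4.5 false positives, so PPV = 8 / 12.5 = 0.64.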
Figure 1. When unlikely hypotheses are tested, most positive results of underpowered studies can be wrong. (a) Two sets of experiments in which 50% and 10% of hypotheses correspond to a real effect (blue), with the rest being null (green). (b) Proportion of each inference type within the null and effect groups encoded by areas of colored regions, assuming 5% of nulls are rejected as false positives. The fraction of positive results that are correct is the positive predictive value, PPV, which decreases with a lower effect chance.

Figure 2. Inference errors and statistical power. (a) Observations are assumed to be from the null distribution (H₀) with mean μ₀. We reject H₀ for values larger than x* with an error rate α (red area). (b) The alternative hypothesis (H_A) is the competing scenario with a different mean μ_A. Values sampled from H_A smaller than x* do not trigger rejection of H₀ and occur at a rate β. Power (sensitivity) is 1 − β (blue area). (c) Relationship of inference errors to x*. The color key is the same as in Figure 1.

Fiscal constraints on experimental design, together with a commonplace lack of statistical rigor, contribute to many underpowered studies with spurious reports of both false positive and false negative effects. The consequences of low power are particularly dire in the search for high-impact results, when the researcher may be willing to pursue low-likelihood hypotheses for a groundbreaking discovery (Fig. 1). One analysis of the medical research literature found that only 36% of the experiments examined that had a negative result could detect a 50% relative difference at least 80% of the time.
More recent reviews of the literature (refs. 1,3) also report that most studies are underpowered. Reduced power and an increased number of false negatives is particularly common in omics studies, which test at very small significance levels to reduce the large number of false positives.

Studies with inadequate power are a waste of research resources and arguably unethical when subjects are exposed to potentially harmful or inferior experimental conditions. Addressing this shortcoming is a priority: the Nature Publishing Group checklist for statistics and methods ( checklist.pdf) includes as the first question: "How was the sample size chosen to ensure adequate power to detect a pre-specified effect size?" Here we discuss inference errors and power to help you answer this question. We'll focus on how the sensitivity and specificity of an experiment can be balanced (and kept high) and how increasing sample size can help achieve sufficient power.

Let's use the example from last month of measuring a protein's expression level x against an assumed reference level μ0. We developed the idea of a null distribution, H0, and said that x was statistically significantly larger than the reference if it exceeded some critical value x* (Fig. 2a). If such a value is observed, we reject H0 as the candidate model. Because H0 extends beyond x*, it is possible to falsely reject H0, with a probability of α (Fig. 2a). This is a type I error and corresponds to a false positive, that is, inferring an effect when there is actually none. In good experimental design, α is controlled and set low, traditionally at α = 0.05, to maintain a high specificity (1 − α), which is the chance of a true negative, that is, correctly inferring that no effect exists.

Let's suppose that x > x*, leading us to reject H0. We may have found something interesting. If x is not drawn from H0, what distribution does it come from? We can postulate an alternative hypothesis that characterizes an alternative distribution, HA, for the observation. For example, if we expect expression values to be larger by 20%, HA would have the same shape as H0 but a mean of μA = 12 instead of μ0 = 10 (Fig. 2b).
Intuitively, if both of these distributions have similar means, we anticipate that it will be more difficult to reliably distinguish between them. This difference between the distributions is typically expressed by the difference in their means, in units of their s.d., σ. This measure, given by

d = (μA − μ0)/σ, is called the effect size. Sometimes effect size is combined with sample size as the noncentrality parameter, d√n.

In the context of these distributions, power (sensitivity) is defined as the chance of appropriately rejecting H0 if the data are drawn from HA. It is calculated from the area of HA in the H0 rejection region (Fig. 2b). Power is related by 1 − β to the type II error rate, β, which is the chance of a false negative (not rejecting H0 when data are drawn from HA).

A test should ideally be both sensitive (low false negative rate, β) and specific (low false positive rate, α). The α and β rates are inversely related: decreasing α increases β and reduces power (Fig. 2c). Typically, α < β because the consequences of false positive inference (in an extreme case, a retracted paper) are more serious than those of false negative inference (a missed opportunity to publish). But the balance between α and β depends on the objectives: if false positives are subject to another round of testing but false negatives are discarded, β should be kept low.

Let's return to our protein expression example and see how the magnitudes of these two errors are related. If we set α = 0.05 and assume a normal H0 with σ = 1, then we reject H0 when x > 11.64 (Fig. 3a). The fraction of HA beyond this cutoff region is the power (0.64). We can increase power by decreasing specificity. Increasing α to 0.12 lowers the cutoff to x > 11.17, and power is now 0.80. This 25% increase in power has come at a cost: we are now more than twice as likely to make a false positive claim (α = 0.12 vs. 0.05).

Figure 3b shows the relationship between α and power for our single expression measurement as a function of the position of
the H0 rejection cutoff, x*. The S-shape of the power curve reflects the rate of change of the area under HA beyond x*. The close coupling between α and power suggests that for μA = 12 the highest power we can achieve for α ≤ 0.05 is 0.64. How can we improve our chance to detect increased expression from HA (increase power) without compromising α (increasing false positives)?

Figure 3 | Decreasing specificity increases power. H0 and HA are assumed normal with σ = 1. (a) Lowering specificity decreases the H0 rejection cutoff x*, capturing a greater fraction of HA beyond x*, and increases the power from 0.64 to 0.80. (b) The relationship between specificity and power as a function of x*. The open circles correspond to the scenarios in a.

Figure 4 | Impact of sample (n) and effect size (d) on power. H0 and HA are assumed normal with σ = 1. (a) Increasing n decreases the spread of the distribution of sample averages in proportion to 1/√n. Shown are scenarios at n = 1, 3 and 7 for d = 1 and α = 0.05. Right, power as a function of n at four different α values for d = 1. The circles correspond to the three scenarios. (b) Power increases with d, making it easier to detect larger effects. The distributions show effect sizes d = 1, 1.5 and 2 for n = 3 and α = 0.05. Right, power as a function of d at four different α values for n = 3.

If the distributions in Figure 3a were narrower, their overlap would be reduced, a greater fraction of HA would lie beyond the x* cutoff and power would be improved. We can't do much about σ, although we could attempt to lower it by reducing measurement error. A more direct way, however, is to take multiple samples. Now, instead of using single expression values, we formulate null and alternative distributions using the average expression value from a sample x̄ that has spread σ/√n (ref. 4). Figure 4a shows the effect of sample size on power using distributions of the sample mean under H0 and HA.
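The trade-off between α and power for a single measurement can be checked numerically. The following is a minimal sketch under the assumptions stated in the text (normal H0 and HA with μ0 = 10, μA = 12, σ = 1, and a one-sided test that rejects for large x); the helper name `cutoff_and_power` is ours.

```python
from statistics import NormalDist


def cutoff_and_power(mu0, mu_a, sigma, alpha):
    """One-sided test on a single observation: reject H0 when x > x*.

    Returns the cutoff x* and the power (area of HA beyond x*).
    """
    h0 = NormalDist(mu0, sigma)
    h_a = NormalDist(mu_a, sigma)
    x_star = h0.inv_cdf(1.0 - alpha)   # leaves area alpha in the upper tail of H0
    power = 1.0 - h_a.cdf(x_star)      # fraction of HA that lies beyond the cutoff
    return x_star, power


for alpha in (0.05, 0.12):
    x_star, power = cutoff_and_power(mu0=10, mu_a=12, sigma=1, alpha=alpha)
    print(f"alpha = {alpha}: reject H0 when x > {x_star:.2f}, power = {power:.2f}")
```

Moving the cutoff is the only lever available here, which is why gaining power costs specificity until the distributions themselves are narrowed.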
As n is increased, the H0 rejection cutoff is decreased in proportion with the s.e.m., reducing the overlap between the distributions. Sample size substantially affects power in our example. If we average seven measurements (n = 7), we are able to detect a 10% increase in expression levels (μA = 11, d = 1) 84% of the time with α = 0.05. By varying n we can achieve a desired combination of power and α for a given effect size, d. For example, for d = 1, a sample size of n = 22 achieves a power of 0.99 for α = 0.01.

Another way to increase power is to increase the size of the effect we want to reliably detect. We might be able to induce a larger effect size with a more extreme experimental treatment. As d is increased, so is power because the overlap between the two distributions is decreased (Fig. 4b). For example, for α = 0.05 and n = 3, we can detect μA = 11, 11.5 and 12 (10%, 15% and 20% relative increase; d = 1, 1.5 and 2) with a power of 0.53, 0.83 and 0.97, respectively. These calculations are idealized because the exact shapes of H0 and HA were assumed known. In practice, because we estimate population σ from the samples, power is decreased and we need a slightly larger sample size to achieve the desired power.

Balancing sample size, effect size and power is critical to good study design. We begin by setting the values of type I error (α) and power (1 − β) to be statistically adequate: traditionally 0.05 and 0.80, respectively. We then determine n on the basis of the smallest effect we wish to measure. If the required sample size is too large, we may need to reassess our objectives or more tightly control the experimental conditions to reduce the variance. Use the interactive graphs in Supplementary Table 1 to explore power calculations.

When the power is low, only large effects can be detected, and negative results cannot be reliably interpreted. Ensuring that sample sizes are large enough to detect the effects of interest is an essential part of study design.

Martin Krzywinski & Naomi Altman

Note: Any Supplementary Information and Source Data files are available in the online version of the paper (doi:10.1038/nmeth.2738).
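The dependence of power on n and d described above can be sketched for the idealized case in which σ is known, so that the sample mean is normal with spread σ/√n. This is our own illustration (the names `power_z` and `required_n` are hypothetical) and it assumes a one-sided test; as the text notes, estimating σ from the sample lowers power somewhat, so real designs need slightly larger n.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and s.d. 1


def power_z(d, n, alpha=0.05):
    """Power of a one-sided z-test for effect size d = (muA - mu0)/sigma."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)        # cutoff in units of the s.e.m.
    return STD_NORMAL.cdf(d * sqrt(n) - z_crit)     # d*sqrt(n) is the noncentrality


def required_n(d, power=0.8, alpha=0.05):
    """Smallest n that reaches the requested power for effect size d."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / d) ** 2)


print(f"d = 1, n = 7,  alpha = 0.05: power = {power_z(1, 7):.2f}")
print(f"d = 1, n = 22, alpha = 0.01: power = {power_z(1, 22, alpha=0.01):.2f}")
for d in (1, 1.5, 2):
    print(f"d = {d}, n = 3, alpha = 0.05: power = {power_z(d, 3):.2f}")
print("n needed for d = 1, power 0.80, alpha 0.05:", required_n(1))
```

Solving for n rather than for power mirrors the design workflow in the text: fix α and power first, then ask how many measurements the smallest effect of interest demands.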
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Corrected after print 26 November 2013.

1. Button, K.S. et al. Nat. Rev. Neurosci. 14 (2013).
2. Moher, D., Dulberg, C.S. & Wells, G.A. J. Am. Med. Assoc. 272, 122–124 (1994).
3. Breau, R.H., Carnat, T.A. & Gaboury, I. J. Urol. 176 (2006).
4. Krzywinski, M.I. & Altman, N. Nat. Methods 10 (2013).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

nature methods | VOL.10 NO.12 | DECEMBER 2013

ERRATA

Erratum: Power and sample size

Martin Krzywinski & Naomi Altman

Nat. Methods 10 (2013); published online 26 November 2013; corrected after print 26 November 2013

In the print version of this article initially published, the symbol µ was represented incorrectly in the equation for effect size, d = (µA − µ0)/σ. The error has been corrected in the HTML and PDF versions of the article.

© 2014 Nature America, Inc. All rights reserved.

Points of Significance

Visualizing samples with box plots

Use box plots to illustrate the spread and differences of samples.

Visualization methods enhance our understanding of sample data and help us make comparisons across samples. Box plots are a simple but powerful graphing tool that can be used in place of histograms to address both goals. Whereas histograms require a sample size of at least 30 to be useful, box plots require a sample size of only 5, provide more detail in the tails of the distribution and are more readily compared across three or more samples. Several enhancements to the basic box plot can render it even more informative.

Figure 1 | The construction of a box plot. (a) The median (m = −0.19, solid vertical line) and interquartile range (IQR = 1.38, gray shading) are ideal for characterizing asymmetric or irregularly shaped distributions. A skewed normal distribution is shown with mean μ = 0 (dark dotted line) and s.d. σ = 1 (light dotted lines). (b) Box plots for an n = 20 sample from a. The box bounds the IQR divided by the median, and Tukey-style whiskers extend to a maximum of 1.5 × IQR beyond the box. The box width may be scaled by √n, and a notch may be added approximating a 95% confidence interval (CI) for the median. Open circles are sample data points. Dotted lines indicate the lengths or widths of annotated features.

Box plots characterize a sample using the 25th, 50th and 75th percentiles, also known as the lower quartile (Q1), median (m or Q2) and upper quartile (Q3), and the interquartile range (IQR = Q3 − Q1), which covers the central 50% of the data. Quartiles are insensitive to outliers and preserve information about the center and spread. Consequently, they are preferred over the mean and s.d. for population distributions that are asymmetric or irregularly shaped and for samples with extreme outliers.
In such cases these measures may be difficult to intuitively interpret: the mean may be far from the bulk of the data, and conventional rules for interpreting the s.d. will likely not apply.

The core element that gives the box plot its name is a box whose length is the IQR and whose width is arbitrary (Fig. 1b). A line inside the box shows the median, which is not necessarily central. The plot may be oriented vertically or horizontally; we use here (with one exception) horizontal boxes to maintain consistent orientation with corresponding sample distributions. Whiskers are conventionally extended to the most extreme data point that is no more than 1.5 × IQR from the edge of the box (Tukey style) or all the way to minimum and maximum of the data values (Spear style). The use of quartiles for box plots is a well-established convention: boxes or whiskers should never be used to show the mean, s.d. or s.e.m. As with the division of the box by the median, the whiskers are not necessarily symmetrical (Fig. 1b). The 1.5 multiplier corresponds to approximately ±2.7σ (where σ is s.d.) and 99.3% coverage of the data for a normal distribution. Outliers beyond the whiskers may be individually plotted. Box plot construction requires a sample of at least n = 5 (preferably larger), although some software does not check for this. For n < 5 we recommend showing the individual data points.

Figure 2 | Box plots reflect sample variability and should be avoided for very small samples (n < 5), with notches shown only when they appear within the IQR. Tukey-style box plots for five samples with sample size n = 5, 10, 20 and 50 drawn from the distribution in Figure 1a are shown; notch width is as in Figure 1b. Vertical dotted lines show Q1 (−0.78), median (−0.19), Q3 (0.60) and Q3 + 1.5 × IQR (2.67) values for the distribution.

Sample size differences can be assessed by scaling the box plot width in proportion to √n (Fig. 1b), the factor by which the precision of the sample's estimate of population statistics improves as sample size is increased.
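The elements of a Tukey-style box plot described above can be computed directly from a sample. The sketch below is illustrative only: the function name `box_plot_stats` and the data are hypothetical, and the quartiles use the linear-interpolation ("inclusive") rule of Python's `statistics.quantiles`, which is one of several conventions and may differ slightly from what a given plotting package draws.

```python
from statistics import quantiles


def box_plot_stats(data):
    """Quartiles, IQR, Tukey whisker ends and outliers for one sample."""
    xs = sorted(data)
    if len(xs) < 5:
        raise ValueError("box plots need n >= 5; show the individual points instead")
    q1, median, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme data points within 1.5 x IQR of the box.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data with one extreme value
for name, value in box_plot_stats(sample).items():
    print(f"{name}: {value}")
```

Note that the whiskers end at actual data values, not at the fences themselves, which is why they need not be symmetrical about the box.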
To assist in judging differences between sample medians, a notch (Fig. 1b) can be used to show the 95% confidence interval (CI) for the median, given by m ± 1.58 × IQR/√n (ref. 1). This is an approximation based on the normal distribution and is accurate in large samples for other distributions. If you suspect the population distribution is not close to normal and your sample size is small, avoid interpreting the interval analytically in the way we have described for CI error bars (ref. 2). In general, when notches do not overlap, the medians can be judged to differ significantly, but overlap does not rule out a significant difference. For small samples the notch may span a larger interval than the box (Fig. 2).

Figure 3 | Quartiles are more intuitive than the mean and s.d. for samples from skewed distributions. Four distributions (uniform, normal, skew normal with slight right skew and skew normal with strong right skew) with the same mean (μ = 0, dark dotted line) and s.d. (σ = 1, light dotted lines) but significantly different medians (m) and IQRs are shown with corresponding Tukey-style box plots for n = 10,000 samples.

The exact position of box boundaries will be software dependent. First, there is no universally agreed-upon method to calculate quartile values, which may be based on simple averaging or linear interpolation. Second, some applications, such as R, use hinges instead of quartiles for box boundaries. The lower and upper hinges are the median of the

lower and upper half of the data, respectively, including the median if it is part of the data. Boxes based on hinges will be slightly different in some circumstances than those based on quartiles.

Figure 4 | Box plots are a more communicative way to show sample data. Data are shown for three n = 20 samples from normal distributions with s.d. σ = 1 and mean μ = 1 (A,B) or 3 (C). (a) Showing sample mean and s.e.m. using bar plots is not recommended. Note how the change of baseline or cutting the y axis affects the comparative heights of the bars. (b) When sample size is sufficiently large (n > 3), scatter plots with s.e.m. or 95% confidence interval (CI) error bars are suitable for comparing central tendency. (c) Box plots may be combined with sample mean and 95% CI error bars to communicate more information about samples in roughly the same amount of space.

Aspects of the box plot such as width, whisker position, notch size and outlier display are subject to tuning; it is therefore important to clearly label how your box plot was constructed. Fewer than 20% of box plot figures in 2013 Nature Methods papers specified both sample size and whisker type in their legends; we encourage authors to be more specific.

The box plot is based on sample statistics, which are estimates of the corresponding population values. Sample variability will be reflected in the variation of all aspects of the box plot (Fig. 2). Modest sample sizes (n = 5–10) from the same population can yield very different box plots whose notches are likely to extend beyond the IQR. Even for large samples (n = 50), whisker positions can vary greatly. We recommend always indicating the sample size and avoiding notches unless they fall entirely within the IQR.

Although the mean and s.d. can always be calculated for any sample, they do not intuitively communicate the distribution of values (Fig. 3).
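Two of the software-dependent details discussed above, hinges versus quartiles and whether a notch fits inside the box, can be illustrated with a short sketch. The function names and data are hypothetical; the hinge rule follows the description in the text (median of each half, with the median included in both halves when n is odd), and the notch uses the m ± 1.58 × IQR/√n approximation with interpolated quartiles.

```python
from math import sqrt
from statistics import median, quantiles


def hinges(data):
    """Lower and upper hinges: medians of the lower and upper halves of the data."""
    xs = sorted(data)
    half = (len(xs) + 1) // 2          # each half includes the median when n is odd
    return median(xs[:half]), median(xs[-half:])


def notch(data):
    """Approximate 95% CI for the median and whether it lies within the IQR box."""
    xs = sorted(data)
    q1, m, q3 = quantiles(xs, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / sqrt(len(xs))
    low, high = m - half_width, m + half_width
    return low, high, (q1 <= low and high <= q3)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]          # hypothetical data
print("hinges:   ", hinges(sample))
print("quartiles:", quantiles(sorted(sample), n=4, method="inclusive"))

low, high, fits = notch([1, 2, 3, 4, 5])          # a very small hypothetical sample
print(f"notch for n = 5: ({low:.2f}, {high:.2f}); inside the box: {fits}")
```

Reporting which rule was used, along with n, lets a reader reproduce the box boundaries regardless of the plotting software.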
Highly skewed distributions appear in box plot form with a markedly shorter whisker-and-box region and an absence of outliers on the side opposite the skew. Keep in mind that for small sample sizes, which do not necessarily represent the distribution well, these features may appear by chance.

We strongly discourage using bar plots with error bars (Fig. 4a), which are best used for counts or proportions (ref. 3). These charts continue to be prevalent (we counted 100 figures that used them in 2013 Nature Methods papers, compared to only 20 that used box plots). They typically show only one arm of the error bar, making overlap comparisons difficult. More importantly, the bar itself encourages the perception that the mean is related to its height rather than the position of its top. As a result, the choice of baseline can interfere with assessing relative sizes of means and their error bars. The addition of axis breaks and log scaling makes visual comparisons even more difficult.

The traditional mean-and-error scatter plot with s.e.m. or 95% CI error bars (Fig. 4b) can be incorporated into box plots (Fig. 4c), thus combining details about the sample with an estimate of the population mean. For small samples, the s.e.m. bar may extend beyond the box. If data are normally distributed, >95% of s.e.m. bars will be within the IQR for n ≥ 14. For 95% CI bars, the cutoff is n ≥ 28.

Because they are based on statistics that do not require us to assume anything about the shape of the distribution, box plots robustly provide more information about samples than conventional error bars. We encourage their wider use and direct the reader to ref. 4 for a convenient online tool to create box plots that implements all the options described here.

Martin Krzywinski & Naomi Altman

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. McGill, R., Tukey, J.W. & Larsen, W.A. Am. Stat. 32, 12–16 (1978).
2. Krzywinski, M. & Altman, N. Nat. Methods 10, 921–922 (2013).
3. Streit, M. & Gehlenborg, N. Nat. Methods 11, 117 (2014).
4. Spitzer, M. et al. Nat. Methods 11, 121–122 (2014).
Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

nature methods | VOL.11 NO.2 | FEBRUARY 2014

Points of Significance

Comparing samples – part I

Robustly comparing pairs of independent or related samples requires different approaches to the t-test.

Among the most common types of experiments are comparative studies that contrast outcomes under different conditions such as male versus female, placebo versus drug, or before versus after treatment. The analysis of these experiments calls for methods to quantitatively compare samples to judge whether differences in data support the existence of an effect in the populations they represent. This analysis is straightforward and robust when independent samples are compared; but researchers must often compare related samples, and this requires a different approach. We discuss both situations.

We'll begin with the simple scenario of comparing two conditions. This case is important to understand because it serves as a foundation for more complex designs with multiple simultaneous comparisons. For example, we may wish to contrast several treatments, track the evolution of an effect over time or consider combinations of treatments and subjects (such as different drugs on different genotypes).

We will want to assess the size of observed differences relative to the uncertainty in the samples. By uncertainty, we mean the spread as measured by the s.d., written as σ and s when referring to the population and sample estimate, respectively. It is more convenient to model uncertainty using variance, which is the square of the s.d. and denoted by Var() (or σ²) and s² for the population and sample, respectively. Using this notation, the relationship between the uncertainty in the population of sample means and that of the population is Var(X̄) = Var(X)/n for samples
of size n. The equivalent statement for sample data is s_X̄² = s_X²/n, where s_X̄ is the s.e.m. and s_X is the sample s.d.

Figure 1 | The uncertainty in a sum or difference of random variables is the sum of the variables' individual uncertainties, as measured by the variance. Numerical values reflect sample estimates from Figure 2. Horizontal error bars show s.d., which is √Var. (a) Comparing a sample to a reference value involves only one measure of uncertainty: the variance of the sample's underlying population, Var(X). The variance of the sample mean is reduced in proportion to the sample size as Var(X)/n, which is also the uncertainty in the estimate of the difference between sample and reference. (b) When the reference is replaced by sample Y of size m, the variance of Y contributes to the uncertainty in the difference of means.

Figure 2 | In the two-sample test, both samples contribute to the uncertainty in the difference of means. (a) The difference between a sample (n = 5, X̄ = 11.1, s_X = 0.84) and a reference value (μ = 10) can be assessed with a one-sample t-test. (b) When the reference value is itself a sample (Ȳ = 10, s_Y = 0.85), the two-sample version of the test is used, in which the t-statistic is based on a combined spread of X and Y, which is estimated using the pooled variance, s_p².

Recall our example of the one-sample t-test in which the expression of a protein was compared to a reference value (ref. 1). Our goal will be to extend this approach, in which only one quantity had uncertainty, to accommodate a comparison of two samples, in which both quantities now have uncertainty. Figure 1a encapsulates the relevant distributions for the one-sample scenario. We assumed that our sample X was drawn from a population, and we used the sample mean to estimate the population mean.
We defined the t-statistic (t) as the difference between the sample mean and the reference value, μ, in units of uncertainty in the mean, given by the s.e.m., and showed that t follows the Student's t-distribution (ref. 1) when the reference value is the mean of the population. We computed the probability that the difference between the sample and reference was due to the uncertainty in the sample mean. When this probability was less than a fixed type I error level, α, we concluded that the population mean differed from μ.

Let's now replace the reference with a sample Y of size m (Fig. 1b). Because the sample means are an estimate of the population means, the difference X̄ − Ȳ serves as our estimate of the difference in the mean of the populations. Of course, populations can vary not only in their means, but for now we'll focus on this parameter. Just as in the one-sample case, we want to evaluate the difference in units of its uncertainty. The additional uncertainty introduced by replacing the reference with Y will need to be taken into account. To estimate the uncertainty in X̄ − Ȳ, we can turn to a useful result in probability theory.

For any two uncorrelated random quantities, X and Y, we have the following relationship: Var(X − Y) = Var(X) + Var(Y). In other words, the expected uncertainty in a difference of values is the sum of individual uncertainties. If we have reason to believe that the variances of the two populations are about the same, it is customary to use the average of sample variances as an estimate of both population variances. This is called the pooled variance, s_p². If the sample sizes are equal, it is computed by a simple average, s_p² = (s_X² + s_Y²)/2. If not, it is an average weighted by n − 1 and m − 1, respectively. Using the pooled variance and applying the addition of variances rule to the variance of sample means gives Var(X̄ − Ȳ) = s_p²/n + s_p²/m. The uncertainty in X̄ − Ȳ is given by its s.d., which is the square root of this quantity.

To illustrate with a concrete example, we have reproduced the protein expression one-sample t-test example (ref. 1) in Figure 2a and contrast it to its two-sample equivalent in Figure 2b.
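The one- and two-sample t-statistics defined above can be computed from summary statistics alone. The sketch below uses the values given in the Figure 2 legend (n = 5, X̄ = 11.1, s_X = 0.84, μ = 10, Ȳ = 10, s_Y = 0.85) and assumes, as the figure suggests, that Y also has five observations. The function names are ours, and converting t to a P value (which needs the t-distribution) is left out to keep the example to the standard library.

```python
from math import sqrt


def one_sample_t(mean, sd, n, mu):
    """t-statistic and d.f. for a sample mean compared against a reference mu."""
    sem = sd / sqrt(n)                      # uncertainty in the sample mean
    return (mean - mu) / sem, n - 1


def pooled_variance(sd_x, n, sd_y, m):
    """Average of the sample variances, weighted by n - 1 and m - 1."""
    return ((n - 1) * sd_x**2 + (m - 1) * sd_y**2) / (n + m - 2)


def two_sample_t(mean_x, sd_x, n, mean_y, sd_y, m):
    """Pooled-variance t-statistic and d.f. for a difference of two sample means."""
    sp2 = pooled_variance(sd_x, n, sd_y, m)
    se_diff = sqrt(sp2 / n + sp2 / m)       # s.d. of the difference of sample means
    return (mean_x - mean_y) / se_diff, n + m - 2


t1, df1 = one_sample_t(mean=11.1, sd=0.84, n=5, mu=10)
t2, df2 = two_sample_t(mean_x=11.1, sd_x=0.84, n=5, mean_y=10, sd_y=0.85, m=5)
print(f"one-sample: t = {t1:.2f}, d.f. = {df1}")
print(f"two-sample: t = {t2:.2f}, d.f. = {df2}")
```

The same difference in means yields a smaller t in the two-sample case because the denominator now carries the uncertainty of both samples.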
We have adjusted sample values slightly to better illustrate the difference between these two tests. For the one-sample case, we find t = 2.93 and a corresponding P value of 0.04. At a type I error cutoff of α = 0.05, we can conclude that the protein expression is significantly elevated relative to the reference. For the two-sample case, t = 2.06 and P = 0.073. Now, when the reference is replaced with a sample, the additional uncertainty in our difference estimate has resulted in a smaller t value that is no longer significant at the same level. In the lookup between t and P for a two-sample test, we use d.f. = n + m − 2 degrees of freedom, which is the sum of d.f. values for each sample.

Figure 3 | The paired t-test is appropriate for matched-sample experiments. (a) When samples are independent, within-sample variability makes differences between sample means difficult to discern, and we cannot say that X and Y are different at α = 0.05. (b) If X and Y represent paired measurements, such as before and after treatment, differences between value pairs can be tested, thereby removing within-sample variability from consideration. (c) In a paired test, differences between values are used to construct a new sample, to which the one-sample test is applied (D̄ = 1.1, s_D = 0.65).

Our inability to reject the null hypothesis in the case of two samples is a direct result of the fact that the uncertainty in X̄ − Ȳ is larger than in X̄ − μ (Fig. 1b) because now Var(Ȳ) is a contributing factor. To reach significance, we would need to collect additional measurements. Assuming the sample means and s.d. do not change, one additional measurement would be sufficient: it would decrease Var(X̄ − Ȳ) and increase the d.f. The latter has the effect of reducing the width of the t-distribution and lowering the P value for a given t.

This reduction in sensitivity is accompanied by a reduction in power (ref. 2). The two-sample test has a lower power than the one-sample equivalent, for the same variance and number of observations per group. Our one-sample example with a sample size of 5 has a power of 52% for an expression change of 1.0. The corresponding power for the two-sample test with five observations per sample is 38%.
If the sample variance remained constant, to reach the 52% power, the two-sample test would require larger samples (n = m = 7).

When assumptions are met, the two-sample t-test is the optimal procedure for comparing means. The robustness of the test is of interest because these assumptions may be violated in empirical data. One way departure from optimal performance is reported is by the difference between the type I error rate we think we are testing at (α) and the actual type I error rate (τ). If all assumptions are satisfied, α = τ, and our chance of committing a type I error is indeed equal to α. However, failing to satisfy assumptions can result in τ > α, causing us to commit a type I error more often than we think. In other words, our rate of false positives will be larger than planned for. Let's examine the assumptions of the t-test in the context of robustness.

First, the t-test assumes that samples are drawn from populations that are normal in shape. This assumption is the least burdensome. Systematic simulations of a wide range of practical distributions find that the type I error rate is stable within 0.03 < τ < 0.06 for α = 0.05 for n ≥ 5 (ref. 3). Next, sample populations are required to have the same variance (Fig. 1b). Fortunately, the test is also extremely robust with respect to this requirement, more so than most people realize (ref. 3). For example, when the sample sizes are equal, testing at α = 0.05 (or α = 0.01) gives τ < 0.06 (τ < 0.015) for n ≥ 15, regardless of the difference in population variances. If these sample sizes are impractical, then we can fall back on the result that τ < 0.064 when testing at α = 0.01 regardless of n or difference in variance. When sample sizes are unequal, the impact of a variance difference is much larger, and τ can depart from α substantially. In these cases, the Welch's variant of the t-test is recommended, which uses actual sample variances, s_X²/n + s_Y²/m, in place of the pooled estimate. The test statistic is computed as usual, but the d.f. for the reference distribution depends on the estimated variances.

The final, and arguably most important, requirement is that the samples be uncorrelated.
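To make the Welch variant concrete, the sketch below computes the test statistic from the unpooled variances and estimates the d.f. with the Welch-Satterthwaite approximation, which is the standard choice but is not spelled out in the text, so treat it as our assumption. The function name and the unequal-variance numbers are hypothetical.

```python
from math import sqrt


def welch_t(mean_x, var_x, n, mean_y, var_y, m):
    """Welch's t-statistic and approximate (Welch-Satterthwaite) d.f."""
    vx, vy = var_x / n, var_y / m           # variances of the two sample means
    t = (mean_x - mean_y) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx**2 / (n - 1) + vy**2 / (m - 1))
    return t, df


# Equal sample sizes and variances: d.f. reduces to n + m - 2, as in the pooled test.
print(welch_t(11.1, 0.7, 5, 10.0, 0.7, 5))

# Hypothetical unequal case: small low-variance sample vs. large high-variance sample.
print(welch_t(11.1, 1.0, 5, 10.0, 4.0, 20))
```

The d.f. is generally not an integer and always lies between the smaller of n − 1 and m − 1 and the pooled value n + m − 2.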
This requirement is often phrased in terms of independence, though the two terms have different technical definitions. What is important is that their Pearson correlation coefficient (ρ) be 0, or close to it. Correlation between samples can arise when data are obtained from matched samples or repeated measurements. If samples are positively correlated (larger values in first sample are associated with larger values in second sample), then the test performs more conservatively (τ < α) (ref. 4), whereas negative correlations increase the real type I error (τ > α). Even a small amount of correlation can make the test difficult to interpret: testing at α = 0.05 gives τ < 0.03 for ρ > 0.1 and τ > 0.08 for ρ < −0.1.

If values can be paired across samples, such as measurements of the expression of the same set of proteins before and after experimental intervention, we can frame the analysis as a one-sample problem to increase the sensitivity of the test.

Consider the two samples in Figure 3a, which use the same values as in Figure 2b. If samples X and Y each measure different sets of proteins, then we have already seen that we cannot confidently conclude that the samples are different. This is because the spread within each sample is large relative to the differences in sample means. However, if Y measures the expression of the same proteins as X, but after some intervention, the situation is different (Fig. 3b): now we are concerned not with the spread of expression values within a sample but with the change of expression of a protein from one sample to another. By constructing a sample of differences in expression (D; Fig. 3c), we reduce the test to a one-sample t-test in which the sole source of uncertainty is the spread in differences. The spread within X and Y has been factored out of the analysis, making the test of expression difference more sensitive. For our example, we can conclude that expression has changed between X and Y at P = 0.02 (t = 3.77) by testing against the null hypothesis that μ = 0. This method is sometimes called the paired t-test.
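The gain in sensitivity from pairing can be seen by running both analyses on the same numbers. The data below are hypothetical (they are not the values plotted in Figure 3) and the function names are ours; the sketch only computes t-statistics and d.f., not P values.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(x, y):
    """One-sample t-test on the pairwise differences, against a mean of 0."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1


def pooled_two_sample_t(x, y):
    """Pooled-variance two-sample t-statistic, treating x and y as independent."""
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 / n + sp2 / m), n + m - 2


# Hypothetical expression of five proteins before (y) and after (x) an intervention.
x = [10.2, 11.0, 11.5, 12.1, 10.7]
y = [9.4, 10.1, 10.0, 10.9, 9.6]

print("independent samples: t = %.2f, d.f. = %d" % pooled_two_sample_t(x, y))
print("paired samples:      t = %.2f, d.f. = %d" % paired_t(x, y))
```

Pairing halves the d.f. but removes the within-sample spread from the denominator, so it pays off when the paired values are strongly positively correlated, as they are in this made-up example.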
We will continue our discussion of sample comparison next month, when we will discuss how to approach carrying out and reporting multiple comparisons. In the meantime, Supplementary Table 1 can be used to interactively explore two-sample comparisons.

Martin Krzywinski & Naomi Altman

Note: Any Supplementary Information and Source Data files are available in the online version of the paper (doi:10.1038/nmeth.2858).

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Krzywinski, M. & Altman, N. Nat. Methods 10 (2013).
2. Krzywinski, M. & Altman, N. Nat. Methods 10 (2013).
3. Ramsey, P.H. J. Educ. Stat. 5 (1980).
4. Wiedermann, W. & von Eye, A. Psychol. Test Assess. Model. 55 (2013).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

nature methods | VOL.11 NO.3 | MARCH 2014

Points of Significance

Comparing samples – part II

When a large number of tests are performed, P values must be interpreted differently.

It is surprising when your best friend wins the lottery but not when a random person in New York City wins. When we are monitoring a large number of experimental results, whether it is expression of all the features in an omics experiment or the outcomes of all the experiments done in the lifetime of a project, we expect to see rare outcomes that occur by chance. The use of P values, which assign a measure of rarity to a single experimental outcome, is misleading when many experiments are considered. Consequently, these values need to be adjusted and reinterpreted. The methods that achieve this are called multiple-testing corrections. We discuss the basic principles of this analysis and illustrate several approaches.

Recall the interpretation of the P value obtained from a single two-sample t-test: the probability that the test would produce a statistic at least as extreme, assuming that the null hypothesis is true. Significance is assigned when P ≤ α, where α is the type I error rate set to control false positives. Applying the conventional α = 0.05, we expect a 5% chance of making a false positive inference. This is the per-comparison error rate (PCER).

When we now perform N tests, this relatively small PCER can result in a large number of false positive inferences, αN. For example, if N = 10,000, as is common in analyses that examine large gene sets, we expect 500 genes to be incorrectly associated with an effect for α = 0.05. If the effect chance is 10% and test power is 80%, we'll conclude that 1,250 genes show an effect, and we will be wrong 450 out of 1,250 times. In other words, roughly 1 out of 3 discoveries will be false. For cases in which the effect chance is even lower, our list of significant genes will be over-run with false positives: for a 1% effect chance, 6 out of 7 (495 of 575) discoveries are false. The role of multiple-testing correction methods is to mitigate these issues, a large
Gene N Simultion gene expression smples Effect Expression smples, n = 5 Control Tretment Expression Simultion gene groups 1% effect chnce 5% effect chnce True negtive Flse positive True positive Flse negtive FPR = + FDR = + FNR = + Power = +... d = P vlue Figure 1 The experimentl design of our gene expression simultion. () A gene s expression ws simulted y control nd tretment smple (n = 5 ech) of normlly distriuted vlues (m =, s = 1). For frction of genes, n effect size d = (8% power) ws simulted y setting m =. () Gene dt sets were generted for 1% nd 5% effect chnces. P vlues were tested t =.5, nd inferences were ctegorized s shown y the color scheme. For ech dt set nd correction method, flse positive rte (FPR), flse detection rte (FDR) nd power were clculted. FNR is the flse negtive rte. P 1 P P 3... P N Tests 1 1 1, 1, 5% % FPR FDR Power FPR FDR this month Influence of multiple-test correction methods on proportion of inferences 1% effect chnce Method 5% effect chnce 1% Bonferroni BH Storey 33% 36% 36% Bonferroni BH Storey Bonferroni BH Storey Bonferroni BH Storey 5% % Power Figure Fmily-wise error rte (FWER) methods such s Bonferroni s negtively ffect sttisticl power in comprisons cross mny tests. Flse discovery rte (FDR)-sed methods such s Benjmini-Hocherg (BH) nd Storey s re more sensitive. Brs show flse positive rte (FPR), FDR nd power for ech comintion of effect chnce nd N on the sis of inference counts using P vlues from the gene expression simultion (Fig. 1) djusted with different methods (undjusted ( ), Bonferroni, BH nd Storey). Storey s method did not provide consistent results for N = 1 ecuse lrger numer of tests is needed. numer of flse positives nd lrge frction of flse discoveries while idelly keeping power high. There re mny djustment methods; we will discuss common ones tht djust the P vlue. 
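The counts quoted above follow from simple expected-value arithmetic. The sketch below reproduces them; the function name and the returned dictionary keys are illustrative choices, not part of the original column.

```python
def expected_inferences(n_tests, effect_chance, power, alpha=0.05):
    """Expected inference counts when every test is run at level alpha.

    Hypothetical helper: effect_chance is the fraction of tests with a
    real effect and power is the chance of detecting such an effect.
    """
    n_effect = n_tests * effect_chance      # tests with a real effect
    n_null = n_tests - n_effect             # tests with no effect
    true_pos = n_effect * power             # real effects detected
    false_pos = n_null * alpha              # nulls declared significant
    discoveries = true_pos + false_pos
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "discoveries": discoveries,
        "fdr": false_pos / discoveries,
    }


ten_percent = expected_inferences(10_000, 0.10, 0.80)
one_percent = expected_inferences(10_000, 0.01, 0.80)
```

With a 10% effect chance this gives 450 false positives among 1,250 discoveries (FDR of 36%); with a 1% effect chance it gives 495 among 575.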
To illustrate their effect, we performed a simulation of a typical 'omics expression experiment in which N genes are tested for an effect between control and treatment (Fig. 1). Some genes were simulated to have differential expression with an effect size d = 2, which corresponded to a test power of 80% at α = 0.05. The P value for the difference in expression between control and treatment samples was computed with a two-sample t-test. We created data sets with N = 10, 100, 1,000 and 100,000 genes and an effect chance (percentage of genes having a nonzero effect) of 10% and 50% (Fig. 1). We performed the simulation 100 times for each combination of N and effect chance to reduce the variability in the results and better illustrate trends, which are shown in Figure 2.

Figure 1 defines useful measures of the performance of the multiple-comparison experiment. Depending on the correction method, one or more of these measures are prioritized. The false positive rate (FPR) is the chance of inferring an effect when no effect is present. Without P value adjustment, we expect FPR to be close to α. The false discovery rate (FDR) is the fraction of positive inferences that are false. Technically, this term is reserved for the expected value of this fraction over all samples; for any given sample, the term false discovery percentage (FDP) is used, but either can be used if there is no ambiguity. Analogously to the FDR, the false nondiscovery rate (FNR) measures the error rate in terms of false negatives. Together the FDR and FNR are the multiple-test equivalents of type I and type II error levels. Finally, power is the fraction of real effects that are detected [1]. The performance of popular correction methods is illustrated using FPR, FDR and power in Figure 2.

The simplest correction method is Bonferroni's, which adjusts the P values by multiplying them by the number of tests, P′ = PN, up to a maximum value of P′ = 1. As a result, a P value may lose its significance in the context of multiple tests. For example, for N = 10,000 tests, an observed P = 0.00001 is adjusted to P′ = 0.1. The effect of this correction is to control the probability of committing even one type I error across all tests. The chance of this is called the family-wise error rate (FWER), and Bonferroni's correction ensures that FWER < α.

FWER methods such as Bonferroni's are extremely conservative and greatly reduce the test's power in order to control the number of false positives, particularly as the number of tests increases (Fig. 2). For N = 10 comparisons, our simulation shows a reduction in power for Bonferroni from 80% to ~33% for both 10% and 50% effect chance. These values drop to ~8% for N = 100, and by the time we are testing a large data set with N = 100,000, our power is ~0.2%. In other words, for a 10% effect chance, out of the 10,000 genes that have an effect, we expect to find only 20! Unless the cost of a false positive greatly outweighs the cost of a false negative, applying Bonferroni correction makes for an inefficient experiment.

Figure 3 | The shape of the distribution of unadjusted P values can be used to infer the fraction of hypotheses that are null and the false discovery rate (FDR). (a) P values from the null are expected to be distributed uniformly, whereas those for which the null is false will have more small values. Shown are distributions from the simulation for N = 1,000 (effect absent, effect present and both), for 10% and 50% effect chance. (b) Inference types using the color scheme of Figure 1 on the P value histogram. The FDR is the fraction of P < α that correspond to false positives. (c) Storey's method first estimates the fraction of comparisons for which the null is true, π0, by counting the number of P values larger than a cutoff λ (such as 0.5) relative to (1 − λ)N (such as N/2), the count expected when the distribution is uniform. If R discoveries are observed, about αNπ0 are expected to be false positives, and FDR can be estimated by αNπ0/R.
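As a minimal sketch of the Bonferroni adjustment described above (P′ = PN, capped at 1), the following may help. The function names are illustrative, and the uncorrected FWER formula assumes independent tests, which is an added assumption rather than something stated in the column.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def fwer_uncorrected(alpha, n_tests):
    """Chance of at least one false positive across n_tests true nulls.

    Assumes the tests are independent (an assumption made for this sketch).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# One small P value among 10,000 tests; the rest are placeholders set to 0.5.
p_values = [0.00001] + [0.5] * 9_999
adjusted = bonferroni(p_values)
```

Here `adjusted[0]` is approximately 0.1, matching the example in the text, while every placeholder value is capped at 1. Under the independence assumption, `fwer_uncorrected(0.05, 10)` is already about 0.40.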
There are other FWER methods (such as Holm's and Hochberg's) that are designed to increase power by applying a less stringent adjustment to the P values. The benefits of these variants are realized when the number of comparisons is small (for example, <20) and the effect rate is high, but neither method will rescue the power of the test for a large number of comparisons.

In most situations, we are willing to accept a certain number of false positives, measured by FPR, as long as the ratio of false positives to true positives is low, measured by FDR. Methods that control FDR, such as Benjamini-Hochberg (BH), which scales P values in inverse proportion to their rank when ordered, provide better power characteristics than FWER methods. Our simulation shows that their power does not decrease as quickly as Bonferroni's with N for a small effect chance (for example, 10%) and actually increases with N when the effect chance is high (Fig. 2). At N = 1,000, whereas Bonferroni correction has a power of <2%, BH maintains 12% and 56% power at 10% and 50% effect rate while keeping FDR at 4.4% and 2.2%, respectively. Now, instead of identifying two genes at N = 1,000 and effect rate 10% with Bonferroni, we find 88 and are wrong only four times.

The final method shown in Figure 2 is Storey's, which introduces two useful measures: π0 and the q value. This approach is based on the observation that if the requirements of the t-test are met, the distribution of its P values for comparisons for which the null is true is expected to be uniform (by definition of the P value). In contrast, comparisons corresponding to an effect will have more P values close to 0 (Fig. 3a). In a real-world experiment we do not know which comparisons truly correspond to an effect, so all we see is the aggregate distribution, shown as the third histogram in Figure 3a. If the effect rate is low, most of our P values will come from cases in which the null is true, and the peak near 0 will be less pronounced than for a high effect chance. The peak will also be attenuated when the power of the test is low.
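The rank-based scaling used by BH can be sketched as follows. This is the commonly used step-up form of the adjustment (each P value is multiplied by N/rank, then made monotone from the largest rank downward); the function name and the example values are illustrative and not taken from the column.

```python
def benjamini_hochberg(p_values):
    """BH-adjusted P values, returned in the original order.

    Each P value is scaled by n_tests / rank (rank 1 = smallest P), and a
    running minimum from the largest rank down keeps the adjusted values
    monotone and no greater than 1.
    """
    n_tests = len(p_values)
    order = sorted(range(n_tests), key=lambda i: p_values[i])
    adjusted = [0.0] * n_tests
    running_min = 1.0
    for rank in range(n_tests, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n_tests / rank)
        adjusted[index] = running_min
    return adjusted


example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

For these four made-up values the adjusted results are approximately 0.02, 0.04, 0.04 and 0.02, whereas Bonferroni would give 0.04, 0.16, 0.12 and 0.02.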
When we perform the comparison P ≤ α on unadjusted P values, any values from the null will result in a false positive (Fig. 3). This results in a very large FDR: for the unadjusted test, FDR = 36% for N = 1,000 and 10% effect chance. Storey's method adjusts P values with a rank scheme similar to that of BH but incorporates the estimate of the fraction of tests for which the null is true, π0. Conceptually, this fraction corresponds to the part of the distribution below the optimal boundary that splits it into uniform (P under true null) and skewed components (P under false null) (Fig. 3). Two common estimates of π0 are twice the average of all P values (Pounds and Cheng's method) and 2/N times the number of P values greater than 0.5 (Storey's method). The latter is a specific case of a generalized estimate in which a different cutoff, λ, is chosen (Fig. 3c). Although π0 is used in Storey's method in adjusting P values, it can be estimated and used independently. Storey's method performs very well, as long as there are enough comparisons to robustly estimate π0. For all simulation scenarios, power is better than BH, and FDR is more tightly controlled at 5%. Use the interactive graphs in Supplementary Table 1 to run the simulation and explore adjusted P-value distributions.

The consequences of misinterpreting the P value are repeatedly raised [2,3]. The appropriate measure to report in multiple-testing scenarios is the q value, which is the FDR equivalent of the P value. Adjusted P values obtained from methods such as BH and Storey's are actually q values. A test's q value is the minimum FDR at which the test would be declared significant. This FDR value is a collective measure calculated across all tests with FDR ≤ q. For example, if we consider a comparison with q = 0.01 significant, then we accept an FDR of at most 0.01 among the set of comparisons with q ≤ 0.01. This FDR should not be confused with the probability that any given test is a false positive, which is given by the local FDR.
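The two π0 estimates and the FDR estimate αNπ0/R from Figure 3c can be written out as a short sketch. The function names, the capping of estimates at 1 and the example P values are assumptions made for illustration; this is not the full q-value procedure.

```python
def pi0_storey(p_values, lam=0.5):
    """Fraction of true nulls: count of P > lam relative to (1 - lam) * N."""
    n_tests = len(p_values)
    n_above = sum(p > lam for p in p_values)
    return min(1.0, n_above / ((1.0 - lam) * n_tests))


def pi0_pounds_cheng(p_values):
    """Fraction of true nulls estimated as twice the mean P value."""
    return min(1.0, 2.0 * sum(p_values) / len(p_values))


def estimated_fdr(p_values, alpha=0.05, lam=0.5):
    """Estimated FDR = alpha * N * pi0 / R, where R counts P <= alpha."""
    discoveries = sum(p <= alpha for p in p_values)
    if discoveries == 0:
        return 0.0
    pi0 = pi0_storey(p_values, lam)
    return min(1.0, alpha * len(p_values) * pi0 / discoveries)


# Made-up P values: four small ones and six spread over the rest of [0, 1].
p_values = [0.001, 0.002, 0.01, 0.03, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9]
```

For this toy list, four values exceed 0.5, so `pi0_storey` gives 4/5 = 0.8, and with R = 4 discoveries the estimated FDR is 0.05 × 10 × 0.8 / 4 = 0.1. With so few tests the estimate is unstable, which is consistent with the remark that Storey's method needs enough comparisons.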
The q value has a more direct meaning to laboratory activities than the P value because it relates the proportion of errors to the quantity of interest: the number of discoveries. The choice of correction method depends on your tolerance for false positives and the number of comparisons. FDR methods are more sensitive, especially when there are many comparisons, whereas FWER methods sacrifice sensitivity to control false positives. When the assumptions of the t-test are not met, the distribution of P values may be unusual and these methods lose their applicability; we recommend always performing a quick visual check of the distribution of P values from your experiment before applying any of these methods.

Martin Krzywinski & Naomi Altman

Note: Any Supplementary Information and Source Data files are available in the online version of the paper (doi:10.1038/nmeth.2900).

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Krzywinski, M. & Altman, N. Nat. Methods 10, (2013).
2. Nuzzo, R. Nature 506, (2014).
3. Anonymous. Trouble at the lab. Economist 26–30 (19 October 2013).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

Nature Methods, Vol. 11, No. 4, April 2014

Points of Significance

Nonparametric tests

Nonparametric tests robustly compare skewed or ranked data.

We have seen that the t-test is robust with respect to assumptions about normality and equivariance [1] and thus is widely applicable. There is another class of methods, nonparametric tests, more suitable for data that come from skewed distributions or have a discrete or ordinal scale. Nonparametric tests such as the sign and Wilcoxon rank-sum tests relax distribution assumptions and are therefore easier to justify, but they come at the cost of lower sensitivity owing to less information inherent in their assumptions. For small samples, the performance of these tests is also constrained because their P values are only coarsely sampled and may have a large minimum. Both issues are mitigated by using larger samples.

These tests work analogously to their parametric counterparts: a test statistic and its distribution under the null are used to assign significance to observations. We compare in Figure 1 the one-sample t-test to a nonparametric equivalent, the sign test (though more sensitive and sophisticated variants exist), using a putative sample X whose source distribution we cannot readily identify (Fig. 1a). The null hypothesis of the sign test is that the sample median m_X is equal to the proposed median, M = 0.4. The test uses the number of sample values larger than M as its test statistic, W; under the null we expect to see as many values below the median as above, with the exact probability given by the binomial distribution (Fig. 1c).

The median is a more useful descriptor than the mean for asymmetric and otherwise irregular distributions. The sign test makes no assumptions about the distribution, only that sample values be independent. If we propose that the population median is M = 0.4 and we observe X, we find W = 5 (Fig. 1b). The chance of observing a value of W under the null that is at least as extreme (W ≤ 1 or W ≥ 5) is P = 0.22, using both tails of the binomial distribution (Fig. 1c).
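The binomial calculation behind the sign test is short enough to write out. The sketch below assumes no sample value equals M exactly (ties are not handled), and the function names are illustrative.

```python
from math import comb


def binom_upper_tail(w, n):
    """P(W >= w) when W ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(w, n + 1)) / 2 ** n


def sign_test_p(w, n, two_tailed=True):
    """Sign-test P value for w of n sample values larger than the proposed median.

    The one-tailed value tests for a bias towards larger values only.
    """
    if not two_tailed:
        return binom_upper_tail(w, n)
    extreme = max(w, n - w)          # fold onto the upper tail by symmetry
    return min(1.0, 2 * binom_upper_tail(extreme, n))


p_two = sign_test_p(5, 6)                    # W = 5 of n = 6
p_one = sign_test_p(5, 6, two_tailed=False)
p_all = sign_test_p(6, 6)                    # all six values above M
```

For W = 5 and n = 6 this gives 14/64 ≈ 0.22 (two-tailed) and 7/64 ≈ 0.11 (one-tailed); for W = 6 the two-tailed value is 2/64 ≈ 0.03.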
To limit the test to whether the median of X was biased towards values larger than M, we would consider only the area for W ≥ 5 in the right tail to find P = 0.11. The P value of 0.22 from the sign test is much higher than that from the t-test (P = 0.04), reflecting that the sign test is less sensitive. This is because it is not influenced by the actual distance between the sample values and M; it measures only how many instead of how much. Consequently, it needs larger sample sizes or more supporting evidence than the t-test. For the example of X, to obtain P < 0.05 we would need to have all values larger than M (W = 6). Its large P values and straightforward application make the sign test a useful diagnostic. Take, for example, a hypothetical situation slightly different from that in Figure 1, where P > 0.05 is reported for the case where a treatment has lowered blood pressure in 6 out of 6 subjects. You may think this P seems implausibly large, and you'd be right because the equivalent scenario for the sign test (W = 6, n = 6) gives a two-tailed P = 0.03.

To compare two samples, the Wilcoxon rank-sum test is widely used and is sometimes referred to as the Mann-Whitney or Mann-Whitney-Wilcoxon test. It tests whether the samples come from distributions with the same median. It doesn't assume normality, but as a test of equality of medians, it requires both samples to come from distributions with the same shape. The Wilcoxon test is one of many methods that reduce the dynamic range of values by converting them to their ranks in the list of ordered values pooled from both samples (Fig. 2a). The test statistic, W, is the degree to which the sum of ranks is larger than the lowest possible in the sample with the lower ranks (Fig. 2b). We expect that a sample from a population with a smaller median will be converted to a set of smaller ranks.

Because there is a finite number (210) of combinations of rank-ordering for X (n_X = 6) and Y (n_Y = 4), we can enumerate all outcomes of the test and explicitly construct the distribution of W (Fig. 2c) to assign a P value to W. The smallest value of W = 0 occurs when all values in one sample are smaller than those in the other. When they are all larger, the statistic reaches a maximum, W = n_X n_Y = 24. For X versus Y, W = 3, and there are 14 of 210 test outcomes with W ≤ 3 or W ≥ 21. Thus, P_XY = 14/210 = 0.067. For X versus Z, W = 2, and P_XZ = 8/210 = 0.038. For cases in which both samples are larger than 10, W is approximately normal, and we can obtain the P value from a z-test of (W − μ_W)/σ_W, where μ_W = n_1(n_1 + n_2 + 1)/2 and σ_W = √(μ_W n_2/6) (with W taken here as the rank sum of the sample of size n_1).

Figure 1 | A sample can be easily tested against a reference value using the sign test without any assumptions about the population distribution. (a) Sample X (n = 6) is tested against a reference M = 0.4. Sample mean is shown with s.d. (s_X) and s.e.m. error bars. m_X is the sample median. (b) The t-statistic compares the sample mean to M in units of s.e.m. The sign test's W is the number of sample values larger than M. (c) Under the null, t follows Student's t-distribution with five degrees of freedom, whereas W is described by the binomial with 6 trials and P = 0.5. Two-tailed P values are shown.

Figure 2 | Many nonparametric tests are based on ranks. (a) Sample comparisons of X vs. Y and X vs. Z start with ranking pooled values and identifying the ranks in the smaller-sized sample (e.g., 1, 3, 4, 5 for Y; 1, 2, 3, 6 for Z). Error bars show sample mean and s.d., and sample medians are shown by vertical dotted lines. (b) The Wilcoxon rank-sum test statistic W is the difference between the sum of ranks and the smallest possible observed sum (for Y, R = 1 + 3 + 4 + 5 = 13 and W = R − n_Y(n_Y + 1)/2 = 13 − 10 = 3). (c) For small sample sizes the exact distribution of W can be calculated. For samples of size (6, 4), there are only 210 different rank combinations corresponding to 25 distinct values of W.
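The enumeration described above for samples of size (6, 4) can be reproduced by brute force. This is a sketch that assumes no tied values; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def rank_sum_distribution(n_small, n_large):
    """Exact null distribution of W for the smaller sample.

    W is the rank sum of the smaller sample minus its lowest possible
    value, n_small * (n_small + 1) / 2. Returns {W value: number of rank
    combinations}.
    """
    total = n_small + n_large
    min_sum = n_small * (n_small + 1) // 2
    return Counter(
        sum(ranks) - min_sum
        for ranks in combinations(range(1, total + 1), n_small)
    )


def wilcoxon_two_tailed_p(w, n_small, n_large):
    """Two-tailed exact P value: outcomes at least as extreme as w in either tail."""
    dist = rank_sum_distribution(n_small, n_large)
    w_max = n_small * n_large
    low = min(w, w_max - w)
    extreme = sum(c for value, c in dist.items()
                  if value <= low or value >= w_max - low)
    return extreme / sum(dist.values())


dist = rank_sum_distribution(4, 6)
p_xy = wilcoxon_two_tailed_p(3, 4, 6)   # ranks 1, 3, 4, 5 give W = 3
p_xz = wilcoxon_two_tailed_p(2, 4, 6)   # ranks 1, 2, 3, 6 give W = 2
```

The distribution has 210 combinations over 25 distinct values of W, and the two P values come out as 14/210 ≈ 0.067 and 8/210 ≈ 0.038.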
The ability to enumerate all outcomes of the test statistic makes calculating the P value straightforward (Figs. 1c and 2c), but there is an important consequence: there will be a minimum P value, P_min. Depending on the size of samples, P_min can be relatively large. For comparisons of samples of size n_X = 6 and n_Y = 4 (Fig. 2), P_min = 1/210 = 0.005 for a one-tailed test, or 0.01 for a two-tailed test, corresponding to W = 0. Moreover, because there are only 25 distinct values of W (Fig. 2c), only two other two-tailed P values are <0.05: P = 0.02 (W = 1) and P = 0.038 (W = 2). The next-largest P value (W = 3) is P = 0.07. Because there is no P with value 0.05, the test cannot be set to reject the null at a type I rate of 5%. Even if we test at α = 0.05, we will be rejecting the null at the next lower P for an effective type I error of 3.8%. We will see how this affects test performance for small samples further on. In fact, it may even be impossible to reach significance at α = 0.05 because there is a limited number of ways in which small samples can vary in the context of ranks, and no outcome of the test happens less than 5% of the time. For example, samples of size 4 and 3 offer only 35 arrangements of ranks and a two-tailed P_min = 2/35 = 0.057. Contrast this to the t-test, which can produce any P value because the test statistic can take on an infinite number of values.

This has serious implications in multiple-testing scenarios discussed in the previous column [3]. Recall that when N tests are performed, multiple-testing corrections will scale the smallest P value to NP. In the same way as a test may never yield a significant result (P_min > α), applying multiple-testing correction may also preclude it (NP_min > α). For example, making N = 6 comparisons on samples such as X and Y shown in Figure 2 (n_X = 6, n_Y = 4) will never yield an adjusted P value lower than α = 0.05 because P_min = 0.01 > α/N.

Figure 3 | The Wilcoxon rank-sum test can outperform the t-test in the presence of discrete sampling or skew. Data were sampled from three common analytical distributions (normal, exponential and uniform) with μ = 1 (dotted lines) and σ = 1 (gray bars, μ ± σ). Discrete sampling was simulated by rounding values to the nearest integer. Shown are the FPR, FDR and power of Wilcoxon tests (black lines) and t-tests (colored bars) over simulated sample pairs for each combination of sample size (n = 5 and 25), effect chance (0 and 10%) and sampling method. In the absence of an effect, both sample values were drawn from a given distribution type with μ = 1. With effect, the distribution for the second sample was shifted by d (d = 1.4 for n = 5; d = 0.57 for n = 25). The effect size was chosen to yield 50% power for the t-test in the normal noise scenario. Two-tailed P at α = 0.05.
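The minimum attainable P value depends only on the number of rank combinations, so the limits discussed above can be checked with a few lines. The sketch assumes no ties and a Bonferroni-style threshold of α/N; the function names are illustrative.

```python
from math import comb


def rank_sum_p_min(n_x, n_y, two_tailed=True):
    """Smallest P value the exact rank-sum test can produce for these sample sizes."""
    combinations_count = comb(n_x + n_y, n_x)
    tails = 2 if two_tailed else 1
    return tails / combinations_count


def can_reach_significance(n_x, n_y, n_tests=1, alpha=0.05):
    """True if the two-tailed P_min can fall at or below alpha / n_tests."""
    return rank_sum_p_min(n_x, n_y) <= alpha / n_tests


p_min_6_4 = rank_sum_p_min(6, 4)    # 2/210, about 0.01
p_min_4_3 = rank_sum_p_min(4, 3)    # 2/35, about 0.057
```

For sizes (6, 4), six tests already put significance out of reach because 2/210 exceeds 0.05/6, whereas a single comparison of sizes (4, 3) cannot reach α = 0.05 at all.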
To achieve two-tailed significance at α = 0.05 across N = 10, 100 or 1,000 tests, we require sample sizes that produce at least 400, 4,000 or 40,000 distinct rank combinations. This is achieved for sample pairs of size (5, 6), (7, 8) and (9, 9), respectively.

The P values from the Wilcoxon test (P_XY = 0.07, P_XZ = 0.04) in Figure 2 appear to be in conflict with those obtained from the t-test (P_XY = 0.04, P_XZ = 0.06). The two methods tell us contradictory information, or do they? As mentioned, the Wilcoxon test concerns the median, whereas the t-test concerns the mean. For asymmetric distributions, these values can be quite different, and it is conceivable that the medians are the same but the means are different. The t-test does not identify the difference in means of X and Z as significant because the standard deviation, s_Z, is relatively large owing to the influence of the sample's largest value (0.81). Because the t-test reacts to any change in any sample value, the presence of outliers can easily influence its outcome when samples are small. For example, simply increasing the largest value in X (1.0) by 0.3 will increase s_X from 0.28 to 0.35 and result in a P_XY value that is no longer significant at α = 0.05. This change does not alter the Wilcoxon P value because the rank scheme remains unaltered. This insensitivity to changes in the data (outliers and typical effects alike) reduces the sensitivity of rank methods.

The fact that the output of a rank test is driven by the probability that a value drawn from distribution A will be smaller (or larger) than one drawn from B, without regard to their absolute difference, has an interesting consequence: we cannot use this probability (pairwise preferences, in general) to impose an order on distributions. Consider a case of three equally prevalent diseases for which treatment A has cure times of 2, 2 and 5 days for the three diseases, and treatment B has 1, 4 and 4. Without treatment, each disease requires 3 days to cure; let's call this control C. Treatment A is better than C for the first two diseases but not the third, and treatment B is better only for the first.
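The pairwise preferences in this three-option example can be tabulated directly. The sketch treats each option's cure time as an independent draw from the three equally likely diseases, which is the reading assumed here; the dictionary and function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Cure times in days for the three equally prevalent diseases.
CURE_DAYS = {"A": (2, 2, 5), "B": (1, 4, 4), "C": (3, 3, 3)}


def prob_shorter(first, second):
    """Probability that a cure time drawn from `first` is shorter than one from `second`."""
    pairs = list(product(CURE_DAYS[first], CURE_DAYS[second]))
    wins = sum(x < y for x, y in pairs)
    return Fraction(wins, len(pairs))


a_beats_c = prob_shorter("A", "C")
c_beats_b = prob_shorter("C", "B")
b_beats_a = prob_shorter("B", "A")
```

Each of the three probabilities exceeds one half (2/3, 2/3 and 5/9), so A is preferred to C, C to B, and B to A: the preferences form a cycle rather than an ordering.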
Can we determine which of the three options (A, B, C) is better? If we try to answer this using the probability of observing a shorter time to cure, we find P(A < C) = 67% and P(C < B) = 67% but also that P(B < A) = 56%: a rock-paper-scissors scenario.

The question about which test to use does not have an unqualified answer; both have limitations. To illustrate how the t- and Wilcoxon tests might perform in a practical setting, we compared their false positive rate (FPR), false discovery rate (FDR) and power at α = 0.05 for different sampling distributions and sample sizes (n = 5 and 25) in the presence and absence of an effect (Fig. 3). At n = 5, Wilcoxon FPR = 0.032 < α because this is the largest P value it can produce smaller than α, not because the test inherently performs better. We can always reach this FPR with the t-test by setting α = 0.032, where we'll find that it will still have slightly higher power than a Wilcoxon test that rejects at this rate. At n = 5, Wilcoxon performs better for discrete sampling: the power (0.43) is essentially the same as the t-test's (0.46), but the FDR is lower. When both tests are applied at α = 0.032, Wilcoxon power (0.43) is slightly higher than t-test power (0.39).

The differences between the tests for n = 25 diminish because the number of arrangements of ranks is extremely large and the normal approximation to sample means is more accurate. However, one case stands out: in the presence of skew (e.g., exponential distribution), Wilcoxon power is much higher than that of the t-test, particularly for continuous sampling. This is because the majority of values are tightly spaced and ranks are more sensitive to small shifts. Skew affects t-test FPR and power in a complex way, depending on whether one- or two-tailed tests are performed and the direction of the skew relative to the direction of the population shift that is being studied [4].

Nonparametric methods represent a more cautious approach and remove the burden of assumptions about the distribution. They apply naturally to data that are already in the form of ranks or degree of preference, for which numerical differences cannot be interpreted.
Their power is generally lower, especially in multiple-testing scenarios. However, when data are very skewed, rank methods reach higher power and are a better choice than the t-test.

Corrected after print 23 May 2014.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Martin Krzywinski & Naomi Altman

1. Krzywinski, M. & Altman, N. Nat. Methods 11, (2014).
2. Krzywinski, M. & Altman, N. Nat. Methods 10, (2013).
3. Krzywinski, M. & Altman, N. Nat. Methods 11, (2014).
4. Reineke, D.M., Baggett, J. & Elfessi, A. J. Stat. Educ. 11 (2003).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

Nature Methods, Vol. 11, No. 5, May 2014

Corrigendum: Nonparametric tests

Martin Krzywinski & Naomi Altman

Nat. Methods 11, (2014); published online 29 April 2014; corrected after print 23 May 2014

In the version of this article initially published, the expression X (n_X = 6) was incorrectly written as X (n_Y = 6). The error has been corrected in the HTML and PDF versions of the article.

Points of View

Designing comparative experiments

Good experimental designs limit the impact of variability and reduce sample-size requirements.

In a typical experiment, the effect of different conditions on a biological system is compared. Experimental design is used to identify data-collection schemes that achieve sensitivity and specificity requirements despite biological and technical variability, while keeping time and resource costs low. In the next series of columns we will use statistical concepts introduced so far and discuss design, analysis and reporting in common experimental scenarios.

In experimental design, the researcher-controlled independent variables whose effects are being studied (e.g., growth medium, drug and exposure to light) are called factors. A level is a subdivision of the factor and measures the type (if categorical) or amount (if continuous) of the factor. The goal of the design is to determine the effect and interplay of the factors on the response variable (e.g., cell size). An experiment that considers all combinations of N factors, each with n_i levels, is a factorial design of type n_1 × n_2 × … × n_N. For example, a 3 × 4 design has two factors with three and four levels each and examines all 12 combinations of factor levels. We will review statistical methods in the context of a simple experiment to introduce concepts that apply to more complex designs.

Suppose that we wish to measure the cellular response to two different treatments, A and B, measured by fluorescence of an aliquot of cells. This is a single-factor (treatment) design with three levels (untreated, A and B). We will assume that the fluorescence (in arbitrary units) of an aliquot of untreated cells has a normal distribution with μ = 10 and that real effect sizes of treatments A and B are d_A = 0.6 and d_B = 1 (A increases response by 6% to 10.6 and B by 10% to 11). To simulate variability owing to biological variation and measurement uncertainty (e.g., in the number of cells in an aliquot), we will use σ = 1 for the distributions. For all tests and calculations we use α = 0.05.
We start by assigning samples of cell aliquots to each level (Fig. 1a). To improve the precision (and power) in measuring the mean of the response, more than one aliquot is needed [1]. One sample will be a control (considered a level) to establish the baseline response and capture biological and technical variability. The other two samples will be used to measure response to each treatment. Before we can carry out the experiment, we need to decide on the sample size.

We can fall back to our discussion about power [1] to suggest n. How large an effect size (d) do we wish to detect and at what sensitivity? Arbitrarily small effects can be detected with a large enough sample size, but this makes for a very expensive experiment. We will need to balance our decision based on what we consider to be a biologically meaningful response and the resources at our disposal. If we are satisfied with an 80% chance (the lowest power we should accept) of detecting a 10% change in response, which corresponds to the real effect of treatment B (d_B = 1), the two-sample t-test requires n = 17. At this n value, the power to detect d_A = 0.6 is 40%. Power calculations are easily computed with software; typically inputs are the difference in means (Δμ), standard deviation estimate (σ), α and the number of tails (we recommend always using two-tailed calculations).

Figure 1 | Design and reporting of a single-factor experiment with three levels using a two-sample t-test. (a) Two treated samples (A and B) with n = 17 are compared to a control (C) with n = 17 and to each other using two-sample t-tests. (b) Simulated means and P values for samples in a. Values are drawn from normal populations with σ = 1 and mean response of 10 (C), 10.6 (A) and 11 (B). (c) The preferred reporting method of results shown in b, illustrating difference in means with CIs, P values and effect size, d. All error bars show 95% CI.
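A rough version of the sample-size and power calculation can be done with the normal approximation, using only the standard library. This is a sketch under that approximation, not the t-based calculation used in the column, so it slightly underestimates n (16 rather than 17 for d = 1) and slightly overestimates power; the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate per-group n for a two-tailed, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d)^2.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def approx_power(d, n, alpha=0.05):
    """Approximate two-tailed power for effect size d with n per group."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n / 2)                  # expected value of the z statistic
    return _STD_NORMAL.cdf(shift - z_alpha) + _STD_NORMAL.cdf(-shift - z_alpha)


n_for_b = approx_n_per_group(1.0)
power_for_a = approx_power(0.6, 17)
```

Under this approximation the power to detect d = 0.6 with n = 17 comes out near 0.42, close to the 40% quoted for the t-test.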
Based on the design in Figure 1a, we show the simulated sample means and their 95% confidence interval (CI) in Figure 1b. The 95% CI captures the mean of the population 95% of the time; we recommend using it to report precision. Our results show a significant difference between B and control (referred to as B/C, P = 0.009) but not for A/C (P = 0.18). Paradoxically, testing B/A does not return a significant outcome (P = 0.15). Whenever we perform more than one test we should adjust the P values. As we only have three tests, the adjusted B/C P value is still significant, P′ = 3P = 0.028. Although commonly used, the format used in Figure 1b is inappropriate for reporting our results: sample means, their uncertainty and P values alone do not present the full picture.

A more complete presentation of the results (Fig. 1c) combines the magnitude with uncertainty (as CI) in the difference in means. The effect size, d, defined as the difference in means in units of pooled standard deviation, expresses this combination of measurement and precision in a single value. Data in Figure 1c also explain better that the difference between a significant result (B/C, P = 0.009) and a nonsignificant result (A/C, P = 0.18) is not always significant (B/A, P = 0.15) [3]. Significance itself is a hard boundary at P = α, and two arbitrarily close results may straddle it. Thus, neither significance itself nor differences in significance status should ever be used to conclude anything about the magnitude of the underlying differences, which may be very small and not biologically relevant.

CIs explicitly show how close we are to making a positive inference and help assess the benefit of collecting more data. For example, the CIs of A/C and B/C closely overlap, which suggests that at our sample size we cannot reliably distinguish between the response to A and B (Fig. 1c). Furthermore, given that the CI of A/C just barely crosses zero, it is possible that A has a real effect that our test failed to detect.
More information about our ability to detect an effect can be obtained from a post hoc power analysis, which assumes that the observed effect is the same as the real effect (normally unknown), and uses the observed difference in means and pooled variance. For A/C, the difference in means is 0.48 and the pooled s.d. (s_p) = 1.03, which yields a post hoc power of 27%; we have little power to detect this difference. Other than increasing sample size, how could we improve our chances of detecting the effect of A?

Our ability to detect the effect of A is limited by variability in the difference between A and C, which has two random components. If we measure the same aliquot twice, we expect variability owing to technical variation inherent in our laboratory equipment and variability of the sample over time (Fig. 2a). This is called within-subject variation, σ_wit. If we measure two different aliquots with the same factor level, we also expect biological variation, called between-subject variation, σ_bet, in addition to the technical variation (Fig. 2b). Typically there is more biological than technical variability (σ_bet > σ_wit). In an unpaired design, the use of different aliquots adds both σ_wit and σ_bet to the measured difference (Fig. 2c). In a paired design, which uses the paired t-test [4], the same aliquot is used and the impact of biological variation (σ_bet) is mitigated (Fig. 2c). If differences in aliquots (σ_bet) are appreciable, variance is markedly reduced (to within-subject variation) and the paired test has higher power.

The link between σ_bet and σ_wit can be illustrated by an experiment to evaluate a weight-loss diet in which a control group eats normally and a treatment group follows the diet. A comparison of the mean weight after a month is confounded by the initial weights of the subjects in each group.

Figure 2 | Sources of variability, conceptualized as circles with measurements (x_i, y_i) from different aliquots (x, y) randomly sampled within them. (a) Limits of measurement and technical precision contribute to σ_wit (gray circle) observed when the same aliquot is measured more than once. This variability is assumed to be the same in the untreated and treated condition, with effect d on aliquot x and y. (b) Biological variation gives rise to σ_bet (green circle). (c) Paired design uses the same aliquot for both measurements, mitigating between-subject variation.
If instead we focus on the change in weight, we remove much of the subject variability owing to the initial weight. If we write the total variance as σ² = σ²_wit + σ²_bet, then the variance of the observed quantity in Figure 2c is 2σ² for the unpaired design but 2σ²(1 − ρ) for the paired design, where ρ = σ²_bet/σ² is the correlation coefficient (intraclass correlation). The relative difference is captured by ρ of two measurements on the same aliquot, which must be included because the measurements are no longer independent. If we ignore ρ in our analysis, we will overestimate the variance and obtain overly conservative P values and CIs. In the case where there is no additional variation between aliquots, there is no benefit to using the same aliquot: measurements on the same aliquot are uncorrelated (ρ = 0) and variance of the paired test is the same as the variance of the unpaired. In contrast, if there is no variation in measurements on the same aliquot except for the treatment effect (σ_wit = 0), we have perfect correlation (ρ = 1). Now, the difference measurement derived from the same aliquot removes all the noise; in fact, a single pair of aliquots suffices for an exact inference.

Figure 3 | Design and reporting for a paired, single-factor experiment. (a) The same n = 17 sample is used to measure the difference between treatment and background (ΔA = A_after − A_before, ΔB = B_after − B_before), analyzed with the paired t-test. Two-sample t-test is used to compare the difference between responses (ΔB versus ΔA). (b) Simulated sample means and P values for measurements and comparisons in a. (c) Mean difference, CIs and P values for two variance scenarios, σ²_bet/σ²_wit of 1 and 4, corresponding to ρ of 0.5 and 0.8. Total variance was fixed: σ²_bet + σ²_wit = 1. All error bars show 95% CI.
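The relationship between the intraclass correlation and the variance of a paired difference can be checked numerically. The following is a minimal Python sketch, not the authors' simulation: the function names are hypothetical, and it assumes the observed quantity is the difference of two measurements, so each variance carries a factor of 2 (the paired-to-unpaired ratio, 1 − ρ, does not depend on that factor).

```python
def intraclass_correlation(var_bet, var_wit):
    """Return rho = var_bet / (var_bet + var_wit)."""
    return var_bet / (var_bet + var_wit)


def difference_variance(var_bet, var_wit, paired):
    """Variance of the difference between two measurements.

    Assumption of this sketch: the observed quantity is y - x, so the
    unpaired variance is 2 * sigma^2 and the paired variance is
    2 * sigma^2 * (1 - rho), which simplifies to 2 * var_wit.
    """
    total = var_bet + var_wit
    if not paired:
        return 2 * total
    rho = intraclass_correlation(var_bet, var_wit)
    return 2 * total * (1 - rho)


# Two scenarios with total variance fixed at 1: var_bet / var_wit = 1 and 4.
scenarios = {}
for ratio in (1, 4):
    var_wit = 1 / (1 + ratio)
    var_bet = ratio * var_wit
    scenarios[ratio] = (
        intraclass_correlation(var_bet, var_wit),
        difference_variance(var_bet, var_wit, paired=False),
        difference_variance(var_bet, var_wit, paired=True),
    )
    print(ratio, scenarios[ratio])
```

With the total variance held fixed, a larger between-subject share raises ρ and shrinks the variance of the paired difference, while the unpaired variance is unchanged.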
Practically, both sources of variation are present, and it is their relative size, reflected in ρ, that determines the benefit of using the paired t-test. We can see the improved sensitivity of the paired design (Fig. 3a) in decreased P values for the effects of A and B (Fig. 3b versus Fig. 1b). With the between-subject variance mitigated, we now detect an effect for A (P = 0.013) and an even lower P value for B (P = …) (Fig. 3b). Testing the difference between ΔA and ΔB requires the two-sample t-test because we are testing different aliquots, and this still does not produce a significant result (P = 0.18). When reporting paired-test results, sample means (Fig. 3b) should never be shown; instead, the mean difference and confidence interval should be shown (Fig. 3c). The reason for this comes from our discussion above: the benefit of pairing comes from reduced variance because ρ > 0, something that cannot be gleaned from Figure 3b. We illustrate this in Figure 3c with two different sample simulations with the same sample mean and variance but different correlation, achieved by changing the relative amount of σ²_bet and σ²_wit. When the component of biological variance is increased, ρ is increased from 0.5 to 0.8, total variance in the difference in means drops and the test becomes more sensitive, reflected by the narrower CIs. We are now more certain that A has a real effect and have more reason to believe that the effects of A and B are different, evidenced by the lower P value for ΔB/ΔA from the two-sample t-test (0.06 versus 0.18; Fig. 3c). As before, P values should be adjusted with multiple-test correction.

The paired design is a more efficient experiment. Fewer aliquots are needed: 34 instead of 51, although now 68 fluorescence measurements need to be taken instead of 51. If we assume σ_wit = σ_bet (ρ = 0.5; Fig. 3c), we can expect the paired design to have a power of 97%. This power increase is highly contingent on the value of ρ. If σ_wit is appreciably larger than σ_bet (i.e., ρ is small), the power of the paired test can be lower than for the two-sample variant.
This is because total variance remains relatively unchanged (2σ²(1 − ρ) ≈ 2σ²) while the critical value of the test statistic can be markedly larger (particularly for small samples) because the number of degrees of freedom is now n − 1 instead of 2(n − 1). If the ratio of σ²_bet to σ²_wit is 1:4 (ρ = 0.2), the paired test power drops from 97% to 86%.

To analyze experimental designs that have more than two levels, or additional factors, a method called analysis of variance is used. This generalizes the t-test for comparing three or more levels while maintaining better power than comparing all sets of two levels. Experiments with two or more levels will be our next topic.

Competing Financial Interests
The authors declare no competing financial interests.

Martin Krzywinski & Naomi Altman

1. Krzywinski, M.I. & Altman, N. Nat. Methods 10 (2013).
2. Krzywinski, M.I. & Altman, N. Nat. Methods 11 (2014).
3. Gelman, A. & Stern, H. Am. Stat. 60 (2006).
4. Krzywinski, M.I. & Altman, N. Nat. Methods 11 (2014).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

Nature Methods, Vol. 11, No. 6, June 2014, p. 598

Points of Significance

Analysis of variance and blocking

Good experimental designs mitigate experimental error and the impact of factors not under study.

Reproducible measurement of treatment effects requires studies that can reliably distinguish between systematic treatment effects and noise resulting from biological variation and measurement error. Estimation and testing of the effects of multiple treatments, usually including appropriate replication, can be done using analysis of variance (ANOVA). ANOVA is used to assess statistical significance of differences among observed treatment means based on whether their variance is larger than expected because of random variation; if so, systematic treatment effects are inferred. We introduce ANOVA with an experiment in which three treatments are compared and show how sensitivity can be increased by isolating biological variability through blocking.

Last month, we discussed a one-factor three-level experimental design that limited interference from biological variation by using the same sample to establish both baseline and treatment values (ref. 1). There we used the t-test, which is not suitable when the number of factors or levels increases, in large part due to its loss of power as a result of multiple-testing correction. The two-sample t-test is a specific case of ANOVA, but the latter can achieve better power and naturally account for sources of error. ANOVA has the same requirements as the t-test: independent and randomly selected samples from approximately normal distributions with equal variance that is not under the influence of the treatments.

Here we continue with the three-treatment example (ref. 1) and analyze it with one-way (single-factor) ANOVA. As before, we simulated samples for k = 3 treatments each with n = 6 values (Fig. 1a). The ANOVA null hypothesis is that all samples are from the same distribution and have equal means. Under this null, between-group variation of sample means and within-group variation of sample
values are predictably related. Their ratio can be used as a test statistic, F, which will be larger than expected in the presence of treatment effects. Although it appears that we are testing equality of variances, we are actually testing whether all the treatment effects are zero.

Figure 1 | ANOVA is used to determine significance using the ratio of variance estimates from sample means and sample values. (a) Between- and within-group variance is calculated from SS_B, the between treatment sum of squares, and SS_W, the within treatment sum of squares. Deviations are shown as horizontal lines extending from grand and sample means. The test statistic, F, is the ratio of mean squares MS_B and MS_W, which are SS_B and SS_W weighted by d.f.: F = [SS_B/(k − 1)]/[SS_W/(N − k)] = MS_B/MS_W, with d.f._B = k − 1 and d.f._W = N − k. (b) Distribution of F, which becomes approximately normal as k and N increase, shown for k = 3, 5 and 10 samples each of size n = 6. N = kn is the total number of sample values. (c) ANOVA analysis of sample sets with decreasing within-group variance (σ²_W = 6, 2, 1). MS_B = 6 in each case. Error bars, s.d.

ANOVA calculations are summarized in an ANOVA table, which we provide for Figures 1, 3 and 4 (Supplementary Tables 1–3) along with an interactive spreadsheet (Supplementary Table 4). The sums of squares (SS) column shows sums of squared deviations of various quantities from their means. This sum is performed over each data point: each sample mean deviation (Fig. 1a) contributes to SS_B six times. The degrees of freedom (d.f.) column shows the number of independent deviations in the sums of squares; the deviations are not all independent because deviations of a quantity from its own mean must sum to zero. The mean square (MS) is SS/d.f.
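The ANOVA table entries described above can be computed in a few lines. This is a minimal Python sketch using hypothetical data (three groups of six values with means 9, 10 and 11), not the simulated samples from the figures. The P value uses a closed form of the F distribution that holds only when the numerator has 2 d.f. (k = 3); for other k a statistics library would be needed.

```python
def one_way_anova(groups):
    """Return SS, d.f., MS and F for a one-way (single-factor) ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Each sample-mean deviation contributes once per data point in its group.
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"ss_b": ss_b, "ss_w": ss_w, "df_b": df_b, "df_w": df_w,
            "ms_b": ms_b, "ms_w": ms_w, "F": ms_b / ms_w}


def p_value_two_numerator_df(f_stat, df_w):
    """Upper-tail P for F with (2, df_w) d.f.; valid only for k = 3 groups."""
    return (1 + 2 * f_stat / df_w) ** (-df_w / 2)


# Hypothetical samples, chosen so the arithmetic is easy to follow.
groups = [
    [7, 8, 9, 9, 10, 11],     # treatment A, mean 9
    [8, 9, 10, 10, 11, 12],   # treatment B, mean 10
    [9, 10, 11, 11, 12, 13],  # treatment C, mean 11
]
table = one_way_anova(groups)
p = p_value_two_numerator_df(table["F"], table["df_w"])
print(table, round(p, 3))
```

Here SS_B = 12 on 2 d.f. and SS_W = 30 on 15 d.f., so MS_B = 6, MS_W = 2 and F = 3.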
The F statistic, F = MS_B/MS_W, is used to test for systematic differences among treatment means. Under the null, F is distributed according to the F distribution for k − 1 and N − k d.f. (Fig. 1b). When we reject the null, we conclude that not all sample means are the same; additional tests are required to identify which treatment means are different. The ratio η² = SS_B/(SS_B + SS_W) is the coefficient of determination (also called R²) and measures the fraction of the total variation resulting from differences among treatment means.

We previously introduced the idea that variance can be partitioned: within-group variance, s²_wit, was interpreted as experimental error and between-group variance, s²_bet, as biological variation (ref. 1). In one-way ANOVA, the relevant quantities are MS_W and MS_B. MS_W corresponds to variance in the sample after other sources of variation have been accounted for and represents experimental error (σ²_wit). If some sources of error are not accounted for (e.g., biological variation), MS_W will be inflated. MS_B is another estimate for MS_W, additionally inflated by average squared deviation of treatment means from the

Figure 2 | Blocking improves sensitivity by isolating variation in samples that is independent from treatment effects. (a) Measurements from treatment aliquots derived from different cell cultures are differentially offset (e.g., 1, 0.5, 0.5) because of differences in cultures. (b) When aliquots are derived from the same culture, measurements are uniformly offset (e.g., 0.5). (c) Incorporating blocking in data collection schemes. Repeats within blocks are considered technical replicates. In an incomplete block design, a block cannot accommodate all treatments.

Figure 3 | Application of one-factor ANOVA to comparison of three samples. Panel annotations: SS_B = 12.4, MS_B = 6.2; SS_W = 30.0, MS_W = 2.0; F = 3.1, P = 0.08, η² = 0.29. (a) Three samples drawn from normal distributions with s²_wit = 2 and treatment means m_A = 9, m_B = 10 and m_C = 11.
(b) Depiction of deviations with corresponding SS and MS values. (c) Sample means and their differences. P values for paired sample comparison are adjusted for multiple comparison using Tukey's method. Error bars, 95% CI.

grand mean, θ², times sample size if the null hypothesis is not true (σ²_wit + nθ²). Thus, the noisier the data (σ²_wit), the more difficult it is to tease out σ²_treat and detect real effects, just like in the t-test, the power of which could be increased by decreasing sample variance. To demonstrate this, we simulated three different sample sets in Figure 1c with MS_B = 6 and different MS_W values, for a scenario with fixed treatment effects (σ²_treat = 1), but progressively reduced experimental error (σ²_wit = 6, 2, 1). As noise within samples drops, a larger fraction of the variation is allocated to MS_B, and the power of the test improves.

Figure 4 | Including blocking isolates biological variation from the estimate of within-group variance and improves power. Panel annotations: SS_B = 12.4, MS_B = 6.2; SS_BLK = 19.0, MS_BLK = 3.8; SS_W = 11.0, MS_W = 1.1; treatment F = 5.5, P = 0.024, η² = 0.29; block F = 3.4, P = 0.048. (a) Blocking is simulated by augmenting each sample (s²_wit = 1) with a fixed random component (µ_blk = 0, s²_blk = 1). (b) Variance is partitioned to treatment, block (black lines) and within-group. (c) Summary statistics for treatment and block effects in the same format as Figure 3c. In the presence of a sufficiently large blocking effect, MS_W is lowered and the treatment test statistic F = MS_B/MS_W is increased. Smaller error bars on sample mean differences reflect reduced MS_W.

This suggests that it is beneficial to decrease MS_W. We can do this through a process called blocking to identify and isolate likely sources of sample variability. Suppose that our samples in Figure 1a were generated by measuring the response to treatment of an aliquot of cells (a fixed volume of cells from a culture; Fig. 2a). Assume that it is not possible to derive all required aliquots from a single culture or that it is necessary to use multiple cultures to ensure that the results generalize.
It is likely that aliquots from different cultures will respond differently owing to variation in cell concentration, growth rates and medium composition, among others. These so-called nuisance variables confound the real treatment effects: the baseline for each measurement unpredictably varies (Fig. 2a). We can mitigate this by using the same cell culture to create three aliquots, one for each treatment, to propagate these differences equally among measurements (Fig. 2b). Although measurements between cultures still would be shifted, the relative differences between treatments within the same culture remain the same. This process is called blocking, and its purpose is to remove as much variability as possible to make differences between treatments more evident. For example, the paired t-test implements blocking by using the same subject or biological sample.

Without blocking, cultures, aliquots and treatments are not matched (a completely randomized design; Fig. 2c), which makes differences in cultures impossible to isolate. For blocking, we systematically assign treatments to cultures, such as in a randomized complete block design, in which each culture provides a replicate of each treatment (Fig. 2c). Each block is subjected to each of the treatments exactly once, and we can optionally collect technical repeats (repeating data collection from the measurement apparatus or multiple aliquots from the same culture) to minimize the impact of fluctuations in our measuring apparatus; these values would be averaged. In the case where a block cannot support all treatments (e.g., a culture yields only two aliquots), we would use combinations of treatment pairs with the requirement that each pair is measured equally often (a balanced incomplete block design).

Let us look at how blocking can increase ANOVA sensitivity using the scenario from Figure 1. We will start with three samples (n = 6) (Fig. 3a) that measure the effects of treatments A, B and C on aliquots of cells in a completely randomized scheme. We simulated the samples with σ²_wit = 2 to represent experimental error. Using ANOVA, we partition the variation (Fig.
3b) and find the mean squares for the components (MS_B = 6.2, MS_W = 2.0; Supplementary Table 2). MS_W reflects the value σ²_wit = 2 in the sample simulation, and it turns out that this variance is too high to yield a significant F; we find F = 3.1 (P = 0.08; Fig. 3c). Because we did not find a significant difference using ANOVA, we do not expect to obtain significant P values from two-sample t-tests applied pairwise to the samples. Indeed, when adjusted for multiple-test correction these P_adj values are all greater than 0.05 (Fig. 3c).

To illustrate blocking, we simulate samples to have the same values as in Figure 3a but with half of the variance due to differences in cultures. These differences in cultures (the block effect) are simulated as normal with mean μ_blk = 0 and variance σ²_blk = 1 (Fig. 4a), and are added to each of the sample values using the complete randomized block design (Fig. 2c). The variance within a sample is thus evenly split between the block effect and the remaining experimental error, which we presumably cannot partition further. The contribution of the block effect to the deviations is shown in Figure 4b, now a substantial component of the variance in each sample, unlike in Figure 3b, where blocking was not accounted for.

Having isolated variation owing to cell-culture differences, we increased sensitivity in detecting a treatment effect because our estimate of within-group variance is lower. Now MS_W = 1.1 and F = 5.5, which is significant at P = 0.024 and allows us to conclude that the treatment means are not all the same (Fig. 4c). By doing a post hoc pairwise comparison with the two-sample t-test, we can conclude that treatments A and C are different at an adjusted P = … (95% confidence interval (CI), …) (Fig. 4c). We can calculate the F statistic for the blocking variable using F = MS_blk/MS_W = 3.4 to determine whether blocking had a significant effect. Mathematically, the blocking variable has the same role in the analysis as an experimental factor.
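How blocking moves variation out of MS_W can be sketched in Python. The data below are hypothetical (six cultures, three treatments, constructed from treatment means, culture offsets and residuals so that the sums of squares are round numbers); they are not the simulated values behind the figures, and the function name is an assumption of this sketch.

```python
def blocked_anova(data):
    """ANOVA for a randomized complete block design.

    data[i][j] is the single response for block (culture) i, treatment j.
    """
    b, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (b * k)
    treat_means = [sum(row[j] for row in data) / b for j in range(k)]
    block_means = [sum(row) / k for row in data]
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = k * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_w = ss_total - ss_treat - ss_block
    df_treat, df_block, df_w = k - 1, b - 1, (k - 1) * (b - 1)
    ms_treat, ms_block, ms_w = ss_treat / df_treat, ss_block / df_block, ss_w / df_w
    # Ignoring blocks pools the block variation back into the within-group term.
    ms_w_unblocked = (ss_block + ss_w) / (df_block + df_w)
    return {"ms_w": ms_w, "df_w": df_w,
            "F_treat": ms_treat / ms_w, "F_block": ms_block / ms_w,
            "F_treat_unblocked": ms_treat / ms_w_unblocked}


# Hypothetical construction: treatment mean + culture offset + residual.
treatment_means = [9, 10, 11]
culture_offsets = [-2, -1, 0, 0, 1, 2]
residuals = [[1, -1, 0], [-1, 1, 0], [0, 1, -1],
             [0, -1, 1], [1, 0, -1], [-1, 0, 1]]
data = [[treatment_means[j] + culture_offsets[i] + residuals[i][j]
         for j in range(3)] for i in range(6)]

result = blocked_anova(data)
print(result)
```

In this example the within-group d.f. drop from 15 to 10, but MS_W falls from 2.8 to 1.2, so the treatment F rises from about 2.1 to 5.0. If the culture offsets were small, the loss of d.f. could outweigh the reduction in MS_W.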
Note that just because the blocking variable soaks up some of the variation we are not guaranteed greater sensitivity; in fact, because we estimate the block effect as well as the treatment effect, the within-group d.f. in the analysis is lower (e.g., changes from 15 to 10 in our case); our test may lose power if the blocks do not account for sufficient sample-to-sample variation.

Blocking increased the efficiency of our experiment. Without it, we would need nearly twice as large samples (n = 11) to reach the same power. The benefits of blocking should be weighed against any increase in associated costs and the decrease in d.f.: in some cases it may be more sensible to simply collect more data.

Note: Supplementary information is available in the online version of the paper (doi:10.1038/nmeth.3005).

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Martin Krzywinski & Naomi Altman

1. Krzywinski, M. & Altman, N. Nat. Methods 11 (2014).
2. Krzywinski, M. & Altman, N. Nat. Methods 11 (2014).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

Nature Methods, Vol. 11, No. 7, July 2014

POINTS OF SIGNIFICANCE

Replication

Quality is often more important than quantity.

Table 1 | Replicate hierarchy in a hypothetical mouse single-cell gene expression RNA sequencing experiment

 | Replicate type | Replicate category
Animal study subjects | Colonies | B
Animal study subjects | Strains | B
Animal study subjects | Cohoused groups | B
Animal study subjects | Gender | B
Animal study subjects | Individuals | B
Sample preparation | Organs from sacrificed animals | B
Sample preparation | Methods for dissociating cells from tissue | T
Sample preparation | Dissociation runs from given tissue sample | T
Sample preparation | Individual cells | B
Sample preparation | RNA-seq library construction | T
Sequencing | Runs from the library of a given cell | T
Sequencing | Reads from different transcript molecules | V
Sequencing | Reads with unique molecular identifier (UMI) from a given transcript molecule | T

Replicates are categorized as biological (B), technical (T) or of variable type (V). Sequence reads serve diverse purposes depending on the application and how reads are used in analysis.

Figure 1 | Replicates do not contribute equally and independently to the measured variability, which can often underestimate the total variability in the system. Panel annotations: σ²_A = 1, σ²_C = 2, σ²_B = 3, σ²_M = 0.5, σ²_TOT = 3.5. (a) Three levels of replication (two biological, one technical) with animal, cell and measurement replicates normally distributed with a mean across animals of 10 and ratio of variances 1:2:0.5. Solid green (biological) and blue (technical) dots show how a measurement of the expression, X, samples from all three sources of variation. Distribution s.d. is shown as horizontal lines. (b) Expression variance, Var(X), and variance of expression mean, Var(X̄), computed across 10,000 simulations of n_A n_C n_M = 48 measurements for unique combinations of the number of animals (n_A = 1 to 48), cells per animal (n_C = 1 to 48) and technical replicate measurements per cell (n_M = 1 and 3). The ratio of Var(X) and Var(X̄) is the effective sample size, n, which corresponds to the equivalent number of statistically independent measurements.
Horizontal dashed lines correspond to biological and total variation. Error bars on Var(X) show s.d. from the 10,000 simulated samples (n_M = 1).

Science relies heavily on replicate measurements. Additional replicates generally yield more accurate and reliable summary statistics in experimental work. But the straightforward question, "how many and what kind of replicates should I run?" belies a deep set of distinctions and tradeoffs that affect statistical testing. We illustrate different types of replication in multilevel ("nested") experimental designs and clarify basic concepts of efficient allocation of replicates.

Replicates can be used to assess and isolate sources of variation in measurements and limit the effect of spurious variation on hypothesis testing and parameter estimation. Biological replicates are parallel measurements of biologically distinct samples that capture random biological variation, which may itself be a subject of study or a noise source. Technical replicates are repeated measurements of the same sample that represent independent measures of the random noise associated with protocols or equipment. For biologically distinct conditions, averaging technical replicates can limit the impact of measurement error, but taking additional biological replicates is often preferable for improving the efficiency of statistical testing.

Nested study designs can be quite complex and include many levels of biological and technical replication (Table 1). The distinction between biological and technical replicates depends on which sources of variation are being studied or, alternatively, viewed as noise sources. An illustrative example is genome sequencing, where base calls (a statistical estimate of the most likely base at a given sequence position) are made from multiple DNA reads of the same genetic locus. These reads are technical replicates that sample the uncertainty in the sequencer readout but will never reveal errors present in the library itself. Errors in library construction can be mitigated by constructing technical replicate libraries from the same sample.
If additional resources are available, one could potentially return to the source tissue and collect multiple samples to repeat the entire sequencing workflow. Such replicates would be technical if the samples were considered to be from the same aliquot or biological if considered to be from different aliquots of biologically distinct material (ref. 1). Owing to historically high costs per assay, the field of genome sequencing has not demanded such replication. As the need for accuracy increases and the cost of sequencing falls, this is likely to change.

How does one determine the types, levels and number of replicates to include in a study, and the extent to which they contribute information about important sources of variation? We illustrate the approach to answering these questions with a single-cell sequencing scenario in which we measure the expression of a specific gene in liver cells in mice. We simulated three levels of replication: animals, cells and measurements (Fig. 1a). Each level has a different variance, with animals (σ²_A = 1) and cells (σ²_C = 2) contributing to a total biological variance of σ²_B = 3. When technical variance from the assay (σ²_M = 0.5) is included, these distributions compound the uncertainty in the measurement for a total variance of σ²_TOT = 3.5. We next simulated 48 measurements, allocated variously between biological replicates (the number of animals, n_A, and number of cells sampled per animal, n_C) and technical replicates (number of measurements taken per cell, n_M) for a total number of measurements n_A n_C n_M = 48. Although we will always make 48 measurements, the effective sample size, n, will vary from about 2 to 48, depending on how the measurements are allocated. Let us look at how this comes about.

Our ability to make accurate inferences will depend on our estimate of the variance in the system, Var(X). Different choices of n_A, n_C and n_M impact this value differently. If we sample n_C = 48 cells from a single animal (n_A = 1) and measure each n_M = 1 times, our estimate of the total variance σ²_TOT will be Var(X) = 2.5 (Fig. 1b).
This reflects cell and measurement variances (σ²_C + σ²_M) but not animal variation; with only one animal sampled we have no way of knowing what the animal variance is. Thus Var(X) certainly underestimates σ²_TOT, but we would not know by

how much. Moreover, the uncertainty in Var(X) (error bar at n_A = 1; Fig. 1b) is the error in σ²_C + σ²_M and not σ²_TOT. At another extreme, if all our measurements are technical replicates (n_A = n_C = 1, n_M = 48) we would find Var(X) = 0.5 (not represented in Fig. 1). This is only the technical variance; if we misinterpreted this as biological variation and used it for biological inference, we would have an excess of false positives. Be on the lookout: unusually small error bars on biological measurements may merely reflect measurement error, not biological variation. To obtain the best estimate of σ²_TOT we should sample n_C = 1 cells from n_A = 48 animals because each of the 48 measurements will independently sample each of the distributions in Figure 1a.

Our choice of the number of replicates also influences Var(X̄), the precision in the expression mean. The optimal way to minimize this value is to collect data from as many animals as possible (n_A = 48, n_C = n_M = 1), regardless of the ratios of variances in the system. This comes from the fact that n_A contributes to decreasing each contribution to Var(X̄), which is given by σ²_A/n_A + σ²_C/(n_A n_C) + σ²_M/(n_A n_C n_M). Although technical replicates allow us to determine σ²_M, unless this is a quantity of interest, we should omit technical replicates and maximize n_A. Of course, good blocking practice suggests that samples from the different animals and cells should be mixed across the sequencing runs to minimize the effect of any systematic run-to-run variability (not present in simulated data here).

The value in additional measurements can be estimated by the prospective improvement in effective sample size. We have seen before that the variance in the mean of a random variable is related to its variance by Var(X) = n Var(X̄). The ratio of Var(X) to Var(X̄) can therefore be used as a measure of the equivalent number of independent samples. From Figure 1b, we can see that n = 48 only for n_A = 48 and drops to n = 25 for n_A, n_C = 12, 4 and is as low as about 2 for n_A = 1.
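The effective sample size for a given allocation can be estimated without simulation. The sketch below is a Python illustration under stated assumptions: the variance of the mean follows the three-term formula in the text, and the expected sample variance of N correlated measurements with a common mean is taken as N(σ²_TOT − Var(X̄))/(N − 1), a standard identity that is an addition of this sketch rather than something stated in the article. Function names and default variances (1, 2, 0.5) are chosen for illustration.

```python
def var_of_mean(n_a, n_c, n_m, var_a=1.0, var_c=2.0, var_m=0.5):
    """Var(X-bar) = var_a/n_a + var_c/(n_a n_c) + var_m/(n_a n_c n_m)."""
    return var_a / n_a + var_c / (n_a * n_c) + var_m / (n_a * n_c * n_m)


def expected_sample_variance(n_a, n_c, n_m, var_a=1.0, var_c=2.0, var_m=0.5):
    """Expected value of the ordinary sample variance Var(X) over all
    n_a * n_c * n_m measurements (assumes a common mean; needs N > 1)."""
    n_total = n_a * n_c * n_m
    total = var_a + var_c + var_m
    vm = var_of_mean(n_a, n_c, n_m, var_a, var_c, var_m)
    return n_total * (total - vm) / (n_total - 1)


def effective_sample_size(n_a, n_c, n_m, **variances):
    """Ratio Var(X) / Var(X-bar): the equivalent number of independent values."""
    return (expected_sample_variance(n_a, n_c, n_m, **variances)
            / var_of_mean(n_a, n_c, n_m, **variances))


# A few allocations of 48 measurements: (animals, cells per animal, repeats).
for allocation in [(48, 1, 1), (12, 4, 1), (1, 48, 1), (1, 1, 48)]:
    print(allocation,
          round(expected_sample_variance(*allocation), 2),
          round(effective_sample_size(*allocation), 1))
```

Under these assumptions the sketch gives Var(X) = 2.5 for a single animal with 48 cells and 0.5 when all 48 measurements are technical repeats of one cell. The effective sample size is 48 when every measurement comes from a different animal, and roughly 25 and 2 for the 12-animal and 1-animal allocations.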
In other words, even though we may be collecting additional measurements they do not all contribute equally to an increase in the precision of the mean. This is because additional cell and technical replicates do not correspond to statistically independent values: technical replicates are derived from the same cell and the cell replicates from the same animal. If it is necessary to summarize expression variability at the level of the animals, then cells from a given animal are pseudoreplicates, statistically correlated in a way that is unique to that animal and not representative of the population under study. Not all replicates yield statistically independent measures, and treating them as if they do can erroneously lower the apparent uncertainty of a result.

The number of replicates has a practical effect on inference errors in analysis of differences of means or variances. We illustrate this by enumerating inference errors in 10,000 simulated drug-treatment experiments in which we vary the number of animals and cells (Fig. 2). We assume a 10% effect chance for two scenarios: a twofold increase in variance, σ²_C, or a 10% increase in mean, μ_A, using the same values for other variances and 48 total measurements as in Figure 1. Applying the t-test, we show false discovery rate (FDR) and power for detecting these differences (Fig. 2). If we want to detect a difference in variation across cells, it is best to choose n_A ≈ n_C in our range. On the other hand, when we are interested in changes in mean expression across mice, it is better to sample as many mice as possible. In either case, increasing the number of measurements from 48 to 144 by taking three technical replicates (n_M = 3) improves inference only slightly. Biological replicates are preferable to technical replicates for inference about the mean and variance of a biological population

Figure 2 | The number of replicates affects FDR and power of inferences on the difference in variances and means.
Shown are power and FDR profiles of a test of difference in cell variances (left) and animal means (right) for 48 (n_M = 1) or 144 (n_M = 3) measurements using different combinations of n_A and n_C. Vertical arrows indicate change in FDR and power when technical replicates are replaced by biological replicates, as shown by n_A,n_C,n_M, for the same number of measurements (144). Values generated from 10,000 simulations of a 10% chance of a treatment effect that increases cell variance to 2σ²_C or animal mean to 1.1µ_A. Samples were tested with a two-sample t-test (sample size n_A) at two-tailed α = 0.05.

(Fig. 2). For example, changing n_A,n_C,n_M from 8,6,3 (where power is highest) to 12,12,1 doubles the power (0.43 to 0.88) in detecting a twofold change in variance. In the case of detecting a 10% difference in means, changing n_A,n_C,n_M from 24,2,3 to 72,2,1 increases power by about 50% from 0.66 to 0.98.

Practically, the cost difference between biological and technical replicates should be considered; this will affect the cost-benefit tradeoff of collecting additional replicates of one type versus the other. For example, if the cost units of animals to cells to measurements is 10:1:0.1 (biological replicates are likely more expensive than technical ones) then an experiment with n_A,n_C,n_M of 12,12,1 is about twice as expensive as that with 8,6,3 (278 versus 142 cost units). However, power in detecting a change in variance is doubled as well, so the cost increase is commensurate with the increase in efficiency. In the case of detecting differences in means, 72,2,1 is about three times as expensive as 24,2,3 (878 versus 302 cost units) but increases power only by 50%, making this a lower-value proposition.

Typically, biological variability is substantially greater than technical variability, so it is to our advantage to commit resources to sampling biologically relevant variables unless measures of technical variability are themselves of interest, in which case increasing the number of measurements per cell, n_M, is valuable.

Good experimental design practice includes planning for replication. First, identify the questions the experiment aims to answer.
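The cost comparison above can be reproduced with a simple cost model. This Python sketch assumes that cost is charged per animal, per cell (n_A n_C cells in total) and per measurement (n_A n_C n_M in total) at 10, 1 and 0.1 units respectively; that reading of the 10:1:0.1 ratio is an interpretation, and the function name is hypothetical. The power values are the ones quoted in the text.

```python
def experiment_cost(n_a, n_c, n_m, unit_costs=(10.0, 1.0, 0.1)):
    """Total cost for n_a animals, n_a*n_c cells and n_a*n_c*n_m measurements."""
    animal_cost, cell_cost, measurement_cost = unit_costs
    return (animal_cost * n_a
            + cell_cost * n_a * n_c
            + measurement_cost * n_a * n_c * n_m)


# (allocation, power quoted in the text) for the two comparisons.
variance_designs = [((8, 6, 3), 0.43), ((12, 12, 1), 0.88)]
mean_designs = [((24, 2, 3), 0.66), ((72, 2, 1), 0.98)]

for label, designs in (("variance", variance_designs), ("mean", mean_designs)):
    for allocation, power in designs:
        cost = experiment_cost(*allocation)
        # Power per 100 cost units is one crude way to compare value.
        print(label, allocation, round(cost, 1), round(100 * power / cost, 3))
```

Under this model the four designs cost about 142, 278, 302 and 878 units. Power per unit cost is roughly preserved in the variance comparison but falls by about half in the means comparison.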
Next, determine the proportion of variability induced by each step to distribute the capacity for replication of the experiment across steps. Be aware of the potential for pseudoreplication and aim to design statistically independent replicates.

As our capacity for higher-throughput assays increases, we should not be misled into thinking that more is always better. Clear thinking about experimental questions and sources of variability is still crucial to produce efficient study designs and valid statistical analyses.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Paul Blainey, Martin Krzywinski & Naomi Altman

1. Robasky, K., Lewis, N.E. & Church, G.M. Nat. Rev. Genet. 15, 56–62 (2014).

Paul Blainey is an Assistant Professor of Biological Engineering at MIT and Core Member of the Broad Institute. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

Nature Methods, Vol. 11, No. 9, September 2014

POINTS OF SIGNIFICANCE

Nested designs

For studies with hierarchical noise sources, use a nested analysis of variance approach.

Many studies are affected by random-noise sources that naturally fall into a hierarchy, such as the biological variation among animals, tissues and cells, or technical variation such as measurement error. With a nested approach, the variation introduced at each hierarchy layer is assessed relative to the layer below it. We can use the relative noise contribution of each layer to optimally allocate experimental resources using nested analysis of variance (ANOVA), which generally addresses replication and blocking, previously discussed ad hoc (refs. 1,2).

Recall that factors are independent variables whose values we control and wish to study (ref. 3) and which have systematic effects on the response. Noise limits our ability to detect effects, but known noise sources (e.g., cell culture) can be mitigated if used as blocking factors. We can model the contribution of each blocking factor to the overall variability, isolate it and increase power. Statisticians distinguish between fixed factors, typically treatments, and random factors, such as blocks.

The impact of fixed and random factors in the presence of experimental error is shown in Figure 1. For a fixed factor (Fig. 1a), each of its levels (for example, a specific drug) has the same effect in all experiments and an unmodeled uncertainty due to experimental error. The levels of a fixed factor can be exactly duplicated (level A1 in Fig. 1a is identical for each experiment) and are of specific interest, usually the effect on the population mean. In contrast, when we repeat an experiment, the levels of a random factor are sampled from a population of all possible levels of the factor (replicates) and are different across all the experiments, emphasized by unique level labels (B1–B9; Fig. 1b). Because the levels cannot be exactly duplicated, their effect is random and they are not of specific interest. Instead, we use the sample of levels to model the uncertainty added by the random factor (for example, all mice).
Fixed and random factors may be crossed or nested (Fig. 2). When crossed, all combinations of factors are used to study the main effects and interactions of two or more factors (Fig. 2a). In contrast, nested designs apply a hierarchy: some level combinations are not studied because the levels cannot be duplicated or reused (Fig. 2b). Random factors (for example, mouse and cell) are nested within the fixed factor (drug) to measure noise due to individual mice and cells and to generalize the effects of the fixed factor on all mice and cells. If mice can be reused, we can cross them with the drug and use them as a random blocking factor (Fig. 2c).

[Figure 1: box plots of expression by factor level across three experiments. Panels: fixed factor effect (levels A1–A3); random factor effect (levels B1–B9).]

Figure 1 | Inferences about fixed factors are different than those about random factors, as shown by box-plots of n = 10 samples across three independent experiments. Circles indicate sample medians. Box-plot height reflects simulated measurement error (σ_ε² = 0.5). (a) Fixed factor levels are identical across experiments and have a systematic effect on the mean. (b) Random factor levels are samples from a population, have a random effect on the mean and contribute noise to the system (σ_B² = 1).

[Figure 2: design diagrams. Panels: crossed; nested; crossed and nested; nested and crossed. Factors shown: drug, dose, diet, tissue (fixed); mouse, cell (random).]

Figure 2 | Factors may be crossed or nested. (a) A crossed design examines every combination of levels for each fixed factor. (b) Nested design can progressively subreplicate a fixed factor with nested levels of a random factor that are unique to the level within which they are nested. (c) If a random factor can be reused for different levels of the treatment, it can be crossed with the treatment and modeled as a block. (d) A split plot design in which the fixed effects (tissue, drug) are crossed (each combination of tissue and drug is tested) but themselves nested within replicates.
We will use the design in Figure 2b to illustrate the analysis of nested fixed and random factors using nested ANOVA, similar to the ANOVA discussed previously. Now nesting is taken into account and the calculations have different interpretations because some of the factors are random. The fixed factor may have an effect on the mean, and the two random factors will add uncertainty. We will be able to estimate the amount of variance for each random factor and use it to better plan our replication strategy. We can maximize power (for example, within cost constraints) to detect a difference in means due to the top-level fixed factor or to detect variability due to random factors. The latter is biologically interesting when increased variance in cell response may be due to increased heterogeneity in the genotypes and implicated in drug resistance.

We will simulate the nested design in Figure 2b using three factors: A (a = 2 levels: control and treatment), B (mice, b = 5 levels, σ_B² = 1), C (cells, c = 5 levels, σ_C² = 2). Expression for each cell will be measured using three technical replicates (σ_ε² = 0.5, n = 3). The raw sample data of the simulation are shown in Figure 3a. Nested ANOVA calculations begin with the sum of squared deviations (SS) to partition the variance among the factors, exactly as in regular ANOVA. For example, the first blue arrow in Figure 3a represents the difference between the averages of all points from mouse B4 (X̄_14..) and all points from the control (X̄_1...). Factor C has the largest deviations (Fig. 3b) because it was modeled to be the largest source of noise (σ_C² = 2).

The distinction between regular and nested ANOVA is how the mean squares (MS) enter into the calculation of the F-ratio for each factor. The F-ratio is a ratio of MS values, and the denominator corresponds to the MS of the next nested factor (for example, MS_B/MS_C) and not MS_E (see Supplementary Table 1 for nested ANOVA formulas and calculated values; see Supplementary Table 2 for expected values of MS).
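The partitioning and nested F-ratios described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `simulate_nested` and `nested_anova`, the random seed and the use of Python's standard `random` and `statistics` modules are our choices and not part of the original analysis, so the simulated values will not reproduce the figures in the column. The variance-component estimates use the usual balanced-design method of moments.

```python
import random
from statistics import mean


def simulate_nested(a=2, b=5, c=5, n=3, mu=(10.0, 11.0),
                    var_b=1.0, var_c=2.0, var_e=0.5, seed=1):
    """Simulate data[i][j][k][l]: level i of A, mouse j, cell k, replicate l."""
    rng = random.Random(seed)  # arbitrary seed, chosen only for repeatability
    data = []
    for i in range(a):
        mice = []
        for _ in range(b):
            mouse_effect = rng.gauss(0.0, var_b ** 0.5)
            cells = []
            for _ in range(c):
                cell_effect = rng.gauss(0.0, var_c ** 0.5)
                cells.append([mu[i] + mouse_effect + cell_effect
                              + rng.gauss(0.0, var_e ** 0.5)
                              for _ in range(n)])
            mice.append(cells)
        data.append(mice)
    return data


def nested_anova(data):
    """Balanced nested ANOVA: A fixed, B nested in A, C nested in B."""
    a, b, c, n = len(data), len(data[0]), len(data[0][0]), len(data[0][0][0])
    cell_avg = [[[mean(cell) for cell in mouse] for mouse in level]
                for level in data]
    mouse_avg = [[mean(mouse) for mouse in level] for level in cell_avg]
    level_avg = [mean(level) for level in mouse_avg]
    grand = mean(level_avg)
    idx = [(i, j, k) for i in range(a) for j in range(b) for k in range(c)]
    # Each deviation is taken relative to the average one layer up.
    ss = {
        "A": b * c * n * sum((x - grand) ** 2 for x in level_avg),
        "B": c * n * sum((mouse_avg[i][j] - level_avg[i]) ** 2
                         for i in range(a) for j in range(b)),
        "C": n * sum((cell_avg[i][j][k] - mouse_avg[i][j]) ** 2
                     for i, j, k in idx),
        "E": sum((x - cell_avg[i][j][k]) ** 2
                 for i, j, k in idx for x in data[i][j][k]),
    }
    df = {"A": a - 1, "B": a * (b - 1),
          "C": a * b * (c - 1), "E": a * b * c * (n - 1)}
    ms = {f: ss[f] / df[f] for f in ss}
    # The denominator is the MS of the next nested factor, not MS_E.
    f_ratio = {"A": ms["A"] / ms["B"],
               "B": ms["B"] / ms["C"],
               "C": ms["C"] / ms["E"]}
    # Method-of-moments estimates of the variance contributions.
    var_est = {"B": (ms["B"] - ms["C"]) / (c * n),
               "C": (ms["C"] - ms["E"]) / n,
               "E": ms["E"]}
    return {"ss": ss, "df": df, "ms": ms, "F": f_ratio, "var": var_est}


data = simulate_nested()
result = nested_anova(data)
```

With the default arguments, `result["df"]` gives 1, 8, 40 and 100 degrees of freedom for A, B, C and the residual, and the four sums of squares add up to the total sum of squares around the grand mean.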
The F-test uses the ratio of between-group sample variance (estimate of population variance from sample means) and within-group variance (estimate of population variance from sample variances) to test whether group means differ (for fixed factors). In the case of random factors, the interpretation is whether the factor contributes noise in addition to the noise due to the factor nested within it (for example, is there more mouse-to-mouse variability than would be expected from cell-to-cell variability?). At the bottom of the nested hierarchy (n = 3 technical replicates per cell), we find MS_E = 0.55, which is an estimate of σ_ε² = 0.5 in our simulation. We find statistically significant (at α = 0.05) contributions to noise from both mice (factor B) and cells (factor C) with estimated variance contributions of 0.84 and 2.1, respectively, which matches

NATURE METHODS VOL.11 NO.10 OCTOBER 2014

our inputs σ_B² = 1 and σ_C² = 2. Because the top-layer factor is fixed and not considered a source of noise, its variance component is not a useful quantity; of interest is its effect on the mean. Unfortunately, we were unable to detect a difference in means for A (P = 0.25) because of poor power due to our allocation of replicates.

It is useful to relate the F-test for factor A to a two-sample t-test to understand the statistical quantities involved and calculate power. The F-test for the top-layer factor A (F = MS_A/MS_B) tests the difference between the variances of treatment and mouse means. Any treatment effect on the mean will show up as additional variance, which we stand a chance to detect.

[Figure 3: (a) sample values and factor averages for control and treatment; (b) histograms of deviations for factors A, B, C and the residual, with the nested ANOVA table (k, SS, d.f., MS, F, P and estimated variance).]

Figure 3 | Data and analysis for a simulated three-factor nested experiment. (a) Simulated expression levels, X_ijkl, measured for a = 2 levels of factor A (control and treatment, i), b = 5 of factor B (mice, j), c = 5 of factor C (cells, k) and n = 3 technical replicates (l). Averages across factor levels are shown as horizontal lines and denoted by dots in the subscript for the factor's index. Blue arrows illustrate deviations used for calculation of sum of squares (SS). Data are simulated with μ_c = 10 for control and μ_t = 11 for treatment and σ_B² = 1, σ_C² = 2, σ_ε² = 0.5 for noise at mouse, cell and technical replicate levels, respectively. Values below the figure show factor levels and averages at levels of A (X̄_i...) and B (X̄_ij..). Labels for the levels of B and C are reused but represent distinct individual mice and cells. (b) Histogram of deviations (d) for each factor. Three deviations illustrated in a are identified by the same blue arrows. Nested ANOVA calculations show the number of times (k) each deviation (d) contributes to SS, degrees of freedom (d.f.), mean squares (MS), F-ratio, P value and the estimated variance contribution of each factor.
Because we have only two levels of factor A, the F-test, which has degrees of freedom (d.f.) of a − 1 = 1 and a(b − 1) = 8, is equivalent to the two-sample t-test for samples of size b, 2(b − 1) d.f. and with t² = F. This t-test is applied to the control and treatment samples formed using b = 5 averages X̄_ij.. (Fig. 3a) whose expected variance is E[Var(X̄_ij..)] = σ_B² + σ_C²/c + σ_ε²/(cn) = 1.43 (ref. 1). This quantity is estimated by MS_B/(cn) = 1.28, which is exactly the average of the two sample variances 1.73 and 0.83 (Supplementary Table 3). These samples yield the control and treatment means of 10.1 and 11.0 (X̄_i...; Fig. 3a) and a t-statistic of 0.9/√(2MS_B/(bcn)) = 1.24, which yields the same P value of 0.25 as from the F-test.

We can now calculate the t-test power for our scenario. For a difference in means of d = 1, the power using samples of size b = 5 is 0.21, using the expected variance. In practice, we might run a trial experiment to determine this value using MS_B/(cn). Clearly, our initial choice of b, c and n was an inadequate design; we should aim for a power of at least 0.8. If variance is kept at 1.43 (c = 5, n = 3), this power can be achieved for a sample size b = 24. With 24 mice, the expected variance of the average across mice would be E[Var(X̄_i...)] = 1.43/24. Dividing this into the total variance due to replication (σ_B² + σ_C² + σ_ε² = 3.5), we can calculate the effective sample size, 57 (ref. 1). As we've previously seen, this can be achieved with the fewest number of measurements if we have b = 57 mice and c = n = 1. If we assume the cost of mice, cells and technical replicates to be 100, 10 and 1, respectively, these designs would cost 3,960 (b = 24, c = 5, n = 3) and 6,327 (b = 57, c = 1, n = 1). Let's see if we can use fewer mice and increase replication to obtain the same power at a lower cost.

The nested analysis provides a general framework for these cost and power calculations. The optimum number of replicates at each level can be calculated on the basis of the cost of replication and the variance at the level of the factor. We want to minimize Var(X̄_i...)
= σ_B²/b + σ_C²/(bc) + σ_ε²/(bcn) within the cost constraint K = bC_B + bcC_C + bcnC_N (C_X is cost per replicate at factor X) with the goal of finding values of b, c and n that provide the largest decrease in the variance per unit cost. The optimum number of technical replicates is n² = C_C/C_N × σ_ε²/σ_C². In other words, subreplicates are preferred to replicates when they are cheaper and their factor is a source of greater noise. With the costs as given above (C_C/C_N = 10) we find n² = 10 × 0.5/2 = 2.5 and n = 2. We can apply the same equation for the number of cells, c² = C_B/C_C × σ_C²/σ_B², where C_B is the cost of a mouse. Using the same tenfold cost ratio, c² = 10 × 2/1 = 20 and c = 5. For c = 5 and n = 2, Var(X̄_ij..) is 1.45, and we would reach a power of 0.80 if we had b = 24 mice. This experiment is slightly cheaper than the one with n = 3 (3,840 vs. 3,960).

Two components affect power in detecting differences in means. Subreplication at the cell and technical layer helps increase power by decreasing the variance of mouse averages, Var(X̄_ij..), used for t-test samples. The number of mice also increases power because it decreases the standard error of X̄_ij.. (the precision of mouse averages) because sample size is increased. To obtain the largest power to detect a treatment effect with the fewest number of measurements, it is always best to pick as many mice as possible: effective sample size is largest and variance of sample averages is lowest.

The number of replicates also affects our ability to detect the noise contribution from each random factor. If detecting and estimating variability in mice and cells is of interest, we should aim to increase the power of the associated F-tests (Supplementary Table 1). For example, under the alternative hypothesis of a nonzero contribution of cells to noise (σ_C² ≠ 0), the F-statistic will be distributed as a multiple of the null hypothesis F-statistic, F_u,v × (nσ_C² + σ_ε²)/σ_ε². The multiplication factor is the ratio of expected MS values (Supplementary Table 2). For our simulation values, the multiple is 13 and the d.f. are u = 40 and v = 100.
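The allocation rules above reduce to a few lines of arithmetic, sketched here in Python. The helper names (`optimal_subreplicates`, `var_mouse_average`, `design_cost`) are hypothetical, and the sketch assumes per-unit costs of 100, 10 and 1 for mice, cells and technical replicates with variances of 1, 2 and 0.5. It returns the unrounded optimum; rounding to a whole number of replicates is left to the reader.

```python
import math


def optimal_subreplicates(cost_upper, cost_lower, var_upper, var_lower):
    """Unrounded optimum number of lower-layer replicates per upper-layer unit.

    Implements n^2 = (cost_upper / cost_lower) * (var_lower / var_upper).
    """
    return math.sqrt(cost_upper / cost_lower * var_lower / var_upper)


def var_mouse_average(c, n, var_b, var_c, var_e):
    """Expected variance of a mouse average over c cells and n replicates."""
    return var_b + var_c / c + var_e / (c * n)


def design_cost(b, c, n, cost_b, cost_c, cost_n):
    """Cost of b mice, c cells per mouse and n technical replicates per cell."""
    return b * cost_b + b * c * cost_c + b * c * n * cost_n


# Assumed inputs: costs (mouse, cell, replicate) and variances (B, C, error).
COST_B, COST_C, COST_N = 100, 10, 1
VAR_B, VAR_C, VAR_E = 1.0, 2.0, 0.5

n_opt = optimal_subreplicates(COST_C, COST_N, VAR_C, VAR_E)  # sqrt(2.5)
c_opt = optimal_subreplicates(COST_B, COST_C, VAR_B, VAR_C)  # sqrt(20)

print(round(n_opt, 2), round(c_opt, 2))
print(var_mouse_average(5, 2, VAR_B, VAR_C, VAR_E))
print(design_cost(24, 5, 3, COST_B, COST_C, COST_N),
      design_cost(24, 5, 2, COST_B, COST_C, COST_N),
      design_cost(57, 1, 1, COST_B, COST_C, COST_N))
```

Because the formula depends only on cost and variance ratios, the same function serves for both the technical-replicate and the cell layer.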
The critical F-value is 1.5, and our power is the P value for 1.5/13, which is essentially 1 (this is why the P value for factor C in Fig. 3b is very low). For level B we have u = 8, v = 40, a multiple of 3.3 (21.5/6.5) and a power of 0.7. The power of our design to detect noise within mice and cells was much higher than that for detecting an effect of the treatment on the means.

Nested designs are useful for understanding sources of variability in the hierarchy of the subsamples and can reduce the cost of the experiment when costs vary across the hierarchy. Statistical conclusions can be made only about the layers actually replicated; technical replication cannot replace biological replication for biological inference.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper (doi:10.1038/nmeth.3137).

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Martin Krzywinski, Naomi Altman & Paul Blainey

1. Blainey, P., Krzywinski, M. & Altman, N. Nat. Methods 11, (2014).
2. Krzywinski, M. & Altman, N. Nat. Methods 11, (2014).
3. Krzywinski, M. & Altman, N. Nat. Methods 11, (2014).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University. Paul Blainey is an Assistant Professor of Biological Engineering at MIT and Core Member of the Broad Institute.

978 VOL.11 NO.10 OCTOBER 2014 NATURE METHODS

POINTS OF SIGNIFICANCE

Two-factor designs

When multiple factors can affect a system, allowing for interaction can increase sensitivity.

When probing complex biological systems, multiple experimental factors may interact in producing effects on the response. For example, in studying the effects of two drugs that can be administered simultaneously, observing all the pairwise level combinations in a single experiment is more revealing than varying the levels of one drug at a fixed level of the other. If we study the drugs independently we may miss biologically relevant insight about synergies or antisynergies and sacrifice sensitivity in detecting the drugs' effects.

The simplest design that can illustrate these concepts is the 2 × 2 design, which has two factors (A and B), each with two levels (a/A and b/B). Specific combinations of factors (a/b, A/b, a/B, A/B) are called treatments. When every combination of levels is observed, the design is said to be a complete factorial or completely crossed design. So this is a complete 2 × 2 factorial design with four treatments.

Our previous discussion about experimental designs was limited to the study of a single factor for which the treatments are the factor levels. We used ANOVA (ref. 1) to determine whether a factor had an effect on the observed variable and followed up with pairwise t-tests to isolate the significant effects of individual levels. We now extend the ANOVA idea to factorial designs. Following the ANOVA analysis, pairwise t-tests can still be done, but often analysis focuses on a different set of comparisons: main effects and interactions.

Figure 1 illustrates some possible outcomes in a 2 × 2 factorial experiment (values in Table 1). Suppose that both factors correspond to drugs and the observed variable is liver glucose level. In Figure 1a, drugs A and B increase glucose levels by 1 unit. Because neither drug influences the effect of the other we say there is no interaction and that the effects are additive.
In Figure 1b, the effect of A in the presence of B is larger than the sum of their effects when they are administered separately (3 vs. 0.5 + 1 = 1.5). When the effect of the levels of a factor depends on the levels of other factors, we say that there is an interaction between the factors. In this case, we need to be careful about defining the effects of each factor.

The main effect of factor A is defined as the difference in the means of the two levels of A averaged over all the levels of B. For Figure 1b, the average for level a is τ = (0 + 1)/2 = 0.5 and for level A is τ = (0.5 + 3)/2 = 1.75, giving a main effect of 1.75 − 0.5 = 1.25 (Table 1). Similarly, the main effect of B is 2 − 0.25 = 1.75. The interaction compares the differences in the mean of A at the two levels of B (2 − 0.5 = 1.5; in the Δ row) or, equivalently, the differences in the mean of B at the two levels of A (2.5 − 1 = 1.5).

Interaction plots are useful to evaluate effects when the number of factors is small (line plots, Fig. 1). The x axis represents levels of one factor and lines correspond to levels of other factors. Parallel lines indicate no interaction. The more the lines diverge, or cross, the greater the interaction.

Figure 1c shows an interaction effect with no main effect. This can happen if one factor increases the response at one level of the other factor but decreases it at the other. Both factors have the same average value for each of their levels, τ = 0.5. However, the two factors do interact because the effect of one drug is different depending on the presence of the other.

[Figure 1: interaction plots for factors A and B. Panels: main effect; main and interaction effects; interaction effect. Annotated with τ and Δ values.]

Figure 1 | When studying multiple factors, main and interaction effects can be observed, shown here for two factors (A, blue; B, red) with two levels each. (a) The main effect is the difference between τ values (light gray), which is the response for a given level of a factor averaged over the levels of other factors. (b) The interaction effect is the difference between effects of A at the different levels of B or vice versa (dark gray, Δ). (c) Interaction effects may mask main effects.
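The definitions of main and interaction effects above can be checked with a short Python sketch. The function name `two_by_two_effects` and its argument names are hypothetical; the inputs are the four treatment means of the main-and-interaction example (0 for a/b, 0.5 for A/b, 1 for a/B and 3 for A/B).

```python
def two_by_two_effects(m_ab, m_Ab, m_aB, m_AB):
    """Main and interaction effects from the four treatment means of a 2 x 2 design."""
    # Level averages (tau): each level of one factor averaged over the other factor.
    tau_a = (m_ab + m_aB) / 2
    tau_A = (m_Ab + m_AB) / 2
    tau_b = (m_ab + m_Ab) / 2
    tau_B = (m_aB + m_AB) / 2
    return {
        "main_A": tau_A - tau_a,
        "main_B": tau_B - tau_b,
        # Effect of A at level B minus effect of A at level b.
        "interaction": (m_AB - m_aB) - (m_Ab - m_ab),
    }


effects = two_by_two_effects(m_ab=0.0, m_Ab=0.5, m_aB=1.0, m_AB=3.0)
print(effects)
```

Swapping the roles of A and B in the interaction line gives the same value, which is why the interaction can be read from either set of differences.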
There are various ways in which effects can combine; their clear and concise reporting is important. For a 2 × 2 design with two levels per factor, effects can be estimated directly from treatment means. In this case, effects should be summarized with their estimated value and confidence interval (CI) and graphically reported as a plot of means with error bars. Optionally, a two-sample t-test can be used to provide a P value for the null hypothesis that the two treatments have the same effect, that is, zero difference in their means.

For example, with levels a/A and b/B we have four treatment means μ_ab, μ_Ab, μ_aB and μ_AB. The effect of A at level b is μ_Ab − μ_ab, which is estimated by substituting the observed sample means. The standard error of this estimate is s.e. = s√(1/n_Ab + 1/n_ab), where s is the estimate of the population standard deviation, estimated by √MS_E, where MS_E is the residual mean square from the ANOVA, and n_ij is the observed sample size for treatment A = i and B = j. If the design is balanced, n_Ab = n_ab = n and s.e. = √(2MS_E/n). The t-statistic is t = (x̄_Ab − x̄_ab)/s.e. The CI can be constructed using x̄_Ab − x̄_ab ± t* × s.e., where t* is the critical value for the t-statistic at the desired α. Note, however, that the degrees of freedom (d.f.) are the error d.f. from the ANOVA, not 2(n − 1) as in the usual two-sample t-test, because the MS_E rather than the sample variances is used in the s.e. computation.

When there are more factors or more levels, the main effects and interactions are summarized over many comparisons as sums of squares (SS) and usually only the test statistic (F-test), its d.f. and the P value are reported. If there are statistically significant interactions, pairwise comparisons of different levels of one factor for fixed levels of the other factors (sometimes called simple main effects) are often computed in the manner described above. If the interactions are not significant, we typically compute differences between levels of one factor averaged over the levels of the other factor. Again, these are pairwise comparisons between means that are handled as just described, except that the sample sizes are also summed over the levels.
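As a rough sketch of the standard error and CI calculation described above, the following Python function works for a balanced design. The name `effect_ci` and the numbers in the usage lines are illustrative assumptions: an estimated effect of 0.9, MS_E = 0.5 with 4 error d.f., and t* = 2.776, the two-tailed 5% critical value of Student's t for 4 d.f. The critical value is passed in so that no statistics library is required.

```python
import math


def effect_ci(estimate, ms_e, n_per_mean, t_crit, n_means=2):
    """CI for a contrast of n_means treatment means in a balanced design.

    Each mean is an average of n_per_mean observations, and the contrast is a
    sum or difference of the means with unit weights, so its variance is
    n_means * MS_E / n_per_mean. t_crit must use the ANOVA error d.f.
    """
    se = math.sqrt(n_means * ms_e / n_per_mean)
    return estimate - t_crit * se, estimate + t_crit * se, se


# Difference of two level averages, each over 4 observations (assumed values).
low, high, se = effect_ci(estimate=0.9, ms_e=0.5, n_per_mean=4, t_crit=2.776)
print(se, (low, high))

# An interaction contrast combines four treatment means, here of 2 observations each.
_, _, se_interaction = effect_ci(estimate=3.0, ms_e=0.5, n_per_mean=2,
                                 t_crit=2.776, n_means=4)
print(se_interaction)
```

An interval that contains zero, as in the first usage line, corresponds to an effect that is not detected at the chosen α.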
To illustrate the two-factor design analysis, we'll use a simulated data set in which the effect of levels of the drug and diet were tested in two different designs, with 8 mice and 8 observations (Fig. 2a). We'll assume an experimental protocol in which a mouse liver tissue sample is tested for glucose levels using two-way ANOVA. Our simulated simple effects are shown in Figure 1b: the increase in the response variable is 0.5 (A/b), 1 (a/B) and 3 (A/B). The two drugs are synergistic: A is 4× as potent in the presence of B, as can be seen by (μ_AB − μ_aB)/(μ_Ab − μ_ab) = Δ_B/Δ_b = 2/0.5 = 4 (Table 1). We'll assume the same variation due to mice and measurement error, σ² = 0.25.

NATURE METHODS VOL.11 NO.12 DECEMBER 2014

Table 1 | Quantities used to determine main and interaction effects from data in Figure 1

[Table body not recoverable. Its three panels were headed "Main effect", "Main and interaction effects" and "Interaction effect"; each listed treatment means for levels of A and B with their τ and Δ values.]

Treatment values shown are means for a/b, a/B, A/b and A/B level combinations. A main effect is observed if the difference between τ values (e.g., a difference of 1) is nonzero. An interaction effect is observed if Δ, the difference between the mean levels of A, varies across levels of B or vice versa.

We'll use a completely randomized design with each of the 8 mice randomly assigned to one of the four treatments in a balanced fashion, each providing a single liver sample (Fig. 2a). First, let's test the effect of the two factors separately using one-way ANOVA, averaging over the values of the other factor. If we consider only A, the effects of B are considered part of the residual error and we do not detect any effect (P = 0.48, Fig. 2b). If we consider only B, we can detect an effect (P = 0.04) because B has a larger main effect (2.0 − 0.25 = 1.75) than A (1.25).

When we test for multiple factors, the ANOVA calculation partitions the total sum of squares, SS_T, into components that correspond to A (SS_A), B (SS_B) and the residual (SS_E) (Fig. 2b). The additive two-factor model assumes that there is no interaction between A and B: the effect of a given level of A does not depend on a level of B. In this case, the interaction component is assumed to be part of the error. If this assumption is relaxed, we can partition the total variance into four components, now accounting for how the response of A varies with B. In our example, the SS_A and SS_B terms remain the same, but SS_E is reduced by the amount of SS_AB (4.6), to 2.0 from 6.6. The resulting reduction in MS_E (0.5 vs. 1.3) corresponds to the variance explained by the interaction between the two factors. When interaction is accounted for, the sensitivity of detecting an effect of A and B is increased because the F-ratio, which is inversely proportional to MS_E, is larger.
To calculate the effect and interaction CIs, as described above, we start with the treatment means x̄_ab = 0.27, x̄_Ab = −0.39, x̄_aB = 0.86 and x̄_AB = 3.23, each calculated from two values. To calculate the main effects of A and B, we average over four measurements to find x̄_a = 0.57, x̄_A = 1.42, x̄_b = −0.06 and x̄_B = 2.05. The residual error MS_E = 0.5 is used to calculate the s.e. of main effects: √(2MS_E/n) = √(2 × 0.5/4) = 0.5. The critical t-value at α = 0.05 and d.f. = 4 is 2.78, giving a 95% CI for the main effect of A to be 0.9 ± 1.4 (F_1,4 = 2.9), where d.f. = (1,4), and of B to be 2.1 ± 1.4 (F_1,4 = 17.6). The CIs reflect that we detected the main effect of B but not of A. For the interaction, we find (x̄_AB − x̄_aB) − (x̄_Ab − x̄_ab) = 3.0 with s.e. = 1 and a CI of 3.0 ± 2.8 (F_1,4 = 9.1).

To improve the sensitivity of detecting an effect of A, we can mitigate biological variability in mice by using a randomized complete block approach (ref. 1; Fig. 2a). If the mice share some characteristic, such as litter or weight, which contributes to response variability, we could control for some of the variation by assigning one complete replicate to each batch of similar mice. The total number of observations will still be 8, and we will track the mouse batch across measurements and use the batch as a random blocking factor.

[Figure 2: (a) two-factor designs, completely randomized (CR) and randomized complete block (RCB), showing mouse, tissue sample, factors A and B, and block; (b) partitioning of sum of squares (SS_T, SS_A, SS_B, SS_AB, SS_M, SS_E) and P values for A, B, AB and M in the CR and RCB designs.]

Figure 2 | In two-factor experiments, variance is partitioned between each factor and all combinations of interactions of the factors. (a) Two common two-factor designs with 8 measurements each. In the CR scenario, each mouse is randomly assigned a single treatment. Variability among mice can be mitigated by grouping mice by similar characteristics (e.g., litter or weight). The group becomes a block. Each block is subject to all treatments. (b) Partitioning of the total sum of squares (SS_T; CR, 16.9; RCB, 6.4) and P values for the CR and RCB designs in a. M represents the blocking factor. Vertical axis is relative to the SS_T. The total d.f. in both cases = 7; all other d.f. = 1.
Now, in addition to the effect of interaction, we can further reduce the MS_E by the amount of variance explained by the block (Fig. 2b). The sum-of-squares partitioning and P values for the blocking scenario are shown in Figure 2b. In each case, the SS_E value is proportionately lower than in the completely randomized design, which makes the tests more sensitive. Once we incorporate blocking and interaction, we are able to detect both main and interaction effects and account for nearly all of the variance due to sources other than measurement error (SS_E = 0.8, MS_E = 0.25). The interpretation of P = .1 for the blocking factor M is that the biological variation due to the blocking factor has nonzero variance. Effects and CIs are calculated just as for the completely randomized design: although the means have two sources of variance (block effect and MS_E), their difference has only one (MS_E) because the block effect cancels.

With two factors, more complicated designs are also possible. For example, we might expose the whole mouse to a drug (factor A) in vivo and then expose two liver samples to different in vitro treatments (factor B). In this case, the two liver samples from the same mouse form a block that is nested in mouse.

We might also consider factorial designs with more levels per factor or more factors. If the response to our two drugs depends on genotype, we might consider using three genotypes in a 2 × 2 × 3 factorial design with 12 treatments. This design allows for the possibility of interactions among pairs of factors and also among all three factors. The smallest factorial design with k factors has two levels for each factor, leading to 2^k treatments. Another set of designs, called fractional factorial designs, used frequently in manufacturing, allows for a large number of factors with a smaller number of samples by using a carefully selected subset of treatments.

Complete factorial designs are the simplest designs that allow us to determine synergies among factors. The added complexity in visualization, summary and analysis is rewarded by an enhanced ability to understand the effects of multiple factors acting in unison.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Martin Krzywinski & Naomi Altman

1. Krzywinski, M. & Altman, N. Nat. Methods 11, (2014).
2. Krzywinski, M. & Altman, N. Nat. Methods 11, (2014).
3. Montgomery, D.C. Design and Analysis of Experiments 8th edn. (Wiley, 2012).

Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

VOL.11 NO.12 DECEMBER 2014 NATURE METHODS

POINTS OF SIGNIFICANCE

Sources of variation

To generalize conclusions to a population, we must sample its variation.

Variability is inevitable in experiments owing to both biological and technical effects. Whereas technical variability should be tightly controlled to enhance the internal validity of the results, some types of biological variability need to be maintained to allow generalization of the results to the population of interest. Experimental control, randomization, blocking and replication are the tools that allow replicable and meaningful results to be obtained in the face of variability.

In previous columns we have given examples of how variation limits our ability to detect effects by reducing the power of tests. This month we go into more detail about variability and how it affects our ability to replicate the experimental results (internal validity) and generalize from our experiment to the population (external validity).

Let's start with an idealized experiment, which we will then expand upon. Suppose that we are able to culture a single murine cell under tightly controlled conditions so that the response of different aliquots of the culture is identical. Also, suppose that our measuring device is so accurate that the difference between measurements of an aliquot is below the detection limit. If measurement does not disrupt the cell culture, we require only a single aliquot: we measure the baseline response, apply the treatment and measure the treatment response. No replication is needed because differences between the measurements can only be due to the treatment. This idealized system has perfect internal validity: the response variable solely reflects the treatment effect, and repeating the experiment on another aliquot from the same cell culture will give identical results. However, the system lacks external validity: it tells us about only a specific cell from a specific mouse.
We know that cells vary within a single tissue, and that tissues vary from mouse to mouse, but we cannot use this ideal system to make inferences about other cell cultures or other mice because we have no way of determining how much variability to expect. To do so requires that we sample the biological variation across relevant experimental variables (Fig. 1).

A well-designed experiment is a compromise between internal and external validity. Our goal is to observe a reproducible effect that can be due only to the treatment (avoiding confounding and bias) while simultaneously measuring the variability required to estimate how much we expect the effect to differ if the measurements are repeated with similar but not identical samples (replicates).

[Figure 1: treatment effect (placebo vs. treatment) for samples drawn from a subject population. Panels: precise but not representative; not precise but representative; precise and representative (by block).]

Figure 1 | Internal and external validity relate respectively to how precise and representative the results are of the population of interest. (a) Sampling only part of the population may create precise measurements, but generalizing to the rest of the population can result in bias. (b) Better representation can be achieved by sampling across the population, but this can result in highly variable measurements. (c) Identifying blocks of similar subjects within the population increases the precision (within block) and captures population variability (between blocks).

When administering the treatment in vivo, we can never control the many sources of biological variability in the mice sufficiently to achieve identical measurements for different animals. However, with careful design, we can reduce the impact of this variability on our measurements by controlling some of these factors.

Genotype and gender are examples of sources of variability that are under complete experimental control. We can eliminate the source entirely by selecting a single level or select several levels so that the effects can be determined. For gender we can observe all the possible levels, so we can treat gender as a fixed factor in our experiment.
Genotype can be a fixed effect (specific genotypes of interest, such as a mutant and its background wild type) or a random (noise) effect (several wild-type strains representing the wild-type population). Only by deliberately introducing variability can we make general statements about a treatment effect, and then only across factors that were varied. Other sources of variability, such as diet, temperature and other housing effects, are under partial experimental control. Noise factors that cannot be controlled, or are unknown, can be handled by random assignment (ref. 1; to avoid bias), replication (ref. 2; to increase precision) and blocking (ref. 3; to isolate noise).

When dealing with variation, two principles apply: the precision with which we can characterize a sample (e.g., s.e.m.) and the manner in which variances from different sources combine together (ref. 4). The s.e.m. of a random sample is σ/√n, where σ is the s.d. of the population (also written as Var(X̄) = Var(X)/n). With sufficient replication (large n), our precision in measuring the mean as measured by the s.e.m. can be made arbitrarily small (Fig. 2a). When multiple independent sources of variation are present, the variance of the measurement is the sum of individual variances.

These two principles can be combined to obtain the variation of the mean in a nested replication scenario (Supplementary Fig. 1). Suppose that variances due to mouse, cell and measurement are M, C and e (Var() is omitted for brevity). The variance of the measurement of a single cell will be M + C + e, the sum of the individual variances. If we measure the same cell n_e times, the variance of the average measurement will be M + C + e/n_e. If we measure n_C cells, each n_e times, the variance will be M + C/n_C + e/(n_C n_e). Finally, if we repeat the procedure for n_M mice, the variance will be reduced to M/n_M + C/(n_M n_C) + e/(n_M n_C n_e). In general, the variance of each source is divided by the number of times that source is independently sampled. This is illustrated in Figure 2b for M = 1, C = 4 and e = 0.5.
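The rule that each variance is divided by the number of times its source is independently sampled can be written as a one-line function. The Python sketch below uses a hypothetical helper name, `var_of_mean`, and assumes the example variances M = 1, C = 4 and e = 0.5.

```python
def var_of_mean(M, C, e, n_M=1, n_C=1, n_e=1):
    """Variance of the mean for n_M mice, n_C cells per mouse, n_e measurements per cell."""
    return M / n_M + C / (n_M * n_C) + e / (n_M * n_C * n_e)


M, C, e = 1.0, 4.0, 0.5  # assumed mouse, cell and measurement variances

single = var_of_mean(M, C, e)                 # one measurement of one cell
four_mice = var_of_mean(M, C, e, n_M=4)       # four mice, one cell each
many_tech = var_of_mean(M, C, e, n_e=1000)    # technical replicates only

print(single, four_mice, four_mice / single, many_tech)
```

The last value stays just above M + C however large `n_e` becomes, which is the sense in which technical replication alone cannot remove biological variance.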
As we have already seen, the number of replicates at each layer (n_M, n_C, n_e) can be controlled to optimally reduce variation (increase power) within practical constraints (cost). For example, to reduce the total variance to 25% of the total M + C + e, we can sample using n_M = 4, n_C = 1 or n_M = n_C = 3 (Fig. 2b). Sampling a single mouse allows us to reduce variance only to M, but it would not allow us to estimate the variation at the mouse layer and therefore would not allow for inference about the population of mice. For our example, technical variation is much smaller than biological variation, and technical replicates are of little value: variance is reduced by only 5% for n_M = n_C = 1 and n_e = 1 (Fig. 2b, gray trace) and can be reduced only to M + C.

When measurements themselves are an average of a large number of contributing factors, biological variability of the components can be underestimated. For example, measuring two samples from the

NATURE METHODS VOL.12 NO.1 JANUARY 2015 5

same homogenized tissue gives us the average of all cells. There is essentially no biological variation in these measurements because n in the s.e.m. term is very large; the only variability that we are likely to find is due to measurement error. We must not confuse the reproducibility of the tissue average with the response of individual cells, which can be quite variable.

Blocking (ref. 3) on a noise variable allows us to remove a noise effect by taking the difference of two measurements that share the same value of the noise (e.g., the same sample before and after treatment). Blocking enhances external validity: within the block, variability is controlled as tightly as possible for internal validity. The blocks themselves are chosen to cover the range of variability needed to estimate the response variability in the population of interest (Fig. 1c). This is the approach taken by the paired t-test, in which the block is a subject. For another example, heterogeneous tissue could not be homogenized and a block would be defined by a spatial boundary between different cells.

[Figure 2: (a) precision of the sample mean (s.e.m. and 95% CI) versus sample size, n; (b) effect of nested replication on the variance of the sample mean versus number of replicates per level, with traces for technical, cell, mouse, and mouse + cell replication.]

Figure 2 | In the presence of variability, the precision in the sample mean can be improved by increasing the sample size, or the number of replicates in a nested design. (a) Increasing the sample size, n, improves the precision in the mean by 1/√n as measured by the s.e.m. The 95% CI is a more intuitive measure of precision: the range of values that are not significantly different at α = 0.05 from the observed mean. The 95% confidence interval (CI) shrinks as t*/√n, where t* is the critical value of the Student's t-distribution at two-tailed α = 0.05 and n − 1 degrees of freedom. t* decreases from 4.3 (n = 3) to 2.0 (n = 50). Dotted lines represent constant multiples of the s.e.m. (b) For a nested design with mouse, cell and technical variances of M = 1, C = 4, e = 0.5 (σ²_TOT = 5.5), the variance of the mean decreases with the number of replicates at each layer.
Neglecting to account for this would disregard the block boundaries in Figure 1c and would reduce sensitivity.

There can also be multiple sources of technical variability, such as reagents, measurement platforms and personnel. The same principles apply. As for biological inference, measures of technical variability are seldom of interest; the usual objective is to minimize it. Blocking may still be used to eliminate known sources of noise. For example, collaborating labs may each do one complete replicate of an experiment to provide sufficient replication while eliminating any variability due to lab effects in the treatment comparisons.

Consider an experiment that assesses the effect of a drug on the livers of male mice of a specific genotype, at both the animal and cell layers. If the drug is administered in vivo, the animal is euthanized and the response measured on many cells, animals exposed to the drug cannot be their own controls. So, we expect variability at both the mouse layer and the cell (within mouse) layer. As well, we expect variability due to cell culture and maternal effects.

In the simplest experiment, we have a nested design, with mice selected at random for the treatment and the control. After dissection, cells are sampled from each liver, and their response to the drug is measured. The total variation of the measurement is the sum of variances of each effect, weighted by the number of times the effect was independently sampled (Fig. 2). Using the same variances as above and (n_M, n_C, n_ε) = (10, 5, 3) we find Var(X̄) = 1/10 + 4/50 + 0.5/150 = 0.18. The variance of the difference in the means of two measurements (e.g., reference and drug) will be twice this, 0.36, and our power to detect an effect of d = 1.5 is 0.65 (Supplementary Note).

Suppose that we discover that the mouse variation, M = 1, has significant components from maternal and cell culture effects, given by variances M_MAT and M_CELL. In this context, we can partition M = M_MAT + M_CELL + M₀, where M₀ is the unique variance not attributable to maternal or cell culture effects.
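The variance of the sample mean in the nested design can be checked with a few lines of code. This is a minimal sketch under the stated variances (M = 1, C = 4, ε = 0.5) and replicate counts (10, 5, 3); the function name `variance_of_mean` is an illustrative assumption.

```python
def variance_of_mean(variances, replicates):
    """Variance of the sample mean in a nested design.

    variances:  per-layer variances, top layer first, e.g. (M, C, e)
    replicates: replicates per layer, top layer first, e.g. (n_M, n_C, n_e)
    Each layer's variance is divided by the number of times that layer
    was independently sampled.
    """
    total = 0.0
    samples = 1  # independent samples of the current layer
    for var, n in zip(variances, replicates):
        samples *= n
        total += var / samples
    return total


var_mean = variance_of_mean((1.0, 4.0, 0.5), (10, 5, 3))  # 1/10 + 4/50 + 0.5/150
var_diff = 2 * var_mean  # difference of two independent means
print(f"Var(mean)       = {var_mean:.3f}")
print(f"Var(difference) = {var_diff:.3f}")
```

The unrounded value of Var(X̄) is about 0.183, which the text reports as 0.18.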
We can attempt to control maternal effects by using sibling pairs (a block) and subjecting one mouse from each pair to the drug and one to the control. As the pairs have the same mother, the maternal effects cancel. Similarly, variance due to cell culture effects can be minimized by concurrently euthanizing each sibling pair (another block) and jointly preparing the cell cultures. Having blocked these two effects, although M_MAT and M_CELL still contribute to the variance for both control and drug, we have effectively removed them from the variance of the difference in means. If these effects account for half of the mouse variance, M_MAT + M_CELL = M/2 = 0.5 (using M = 1 as above), blocking reduces the variance in the difference by 2(M_MAT + M_CELL)/10 from 0.36 to 0.26 and increases our power to 0.79 (Supplementary Note).

We can use the concept of effective sample size, n = Var(X)/Var(X̄), to demonstrate the effect of this blocking. In the nested replication design, n is typically smaller than the total number of measurements (n_M n_C n_ε) because we do not independently sample each source of variation in each measurement (it is largest for n_C = n_ε = 1). As a result, replication at the cell and technical layers decreases Var(X̄) proportionally more slowly than replication at the topmost mouse layer. When both maternal and cell culture effects are included, Var(X) = M + C + ε = 5.5 and the effective sample size is n = 5.5/0.36 = 15. When maternal and cell effects are blocked, Var(X) remains the same, but now Var(X̄) is reduced to 0.26 and n = 5.5/0.26 = 20.

Given the choice, we should always block at the top layer because the noise in this layer is independently sampled the fewest times. We can use the effective sample size n to illustrate this. Blocking at the mouse layer decreased M from 1 to 0.5 (by 50%) and increased n from 15 to 20 (power from 0.65 to 0.79). In contrast, a proportional reduction in C from 4 to 2 increases n to 19 (power to 0.76), whereas a reduction in ε has essentially no effect on n.
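The comparison of blocking at different layers can be reproduced numerically. This is a minimal sketch, assuming that the reference variance Var(X) = 5.5 is held fixed across scenarios and that the effective sample size is computed against the variance of the difference in means, as in the worked numbers; the helper name `var_diff` and the scenario labels are illustrative.

```python
REPLICATES = (10, 5, 3)    # (n_M, n_C, n_e) from the nested design
VAR_X = 1.0 + 4.0 + 0.5    # M + C + e, kept fixed as the reference variance


def var_diff(m, c, e, blocked=0.0):
    """Variance of the difference of two means in the nested design.

    blocked is the part of the mouse-layer variance that cancels in the
    difference because treatment and control share a block.
    """
    n_m, n_c, n_e = REPLICATES
    var_mean = m / n_m + c / (n_m * n_c) + e / (n_m * n_c * n_e)
    return 2 * var_mean - 2 * blocked / n_m


scenarios = {
    "nested, no blocking": var_diff(1.0, 4.0, 0.5),
    "mouse layer blocked (0.5 removed)": var_diff(1.0, 4.0, 0.5, blocked=0.5),
    "cell variance halved (C = 2)": var_diff(1.0, 2.0, 0.5),
    "technical variance halved (e = 0.25)": var_diff(1.0, 4.0, 0.25),
}
effective_n = {name: VAR_X / v for name, v in scenarios.items()}
for name, n_eff in effective_n.items():
    print(f"{name:40s} Var(diff) = {scenarios[name]:.3f}  effective n = {n_eff:.1f}")
```

With unrounded variances the effective sample sizes come out at roughly 15.0, 20.6, 19.2 and 15.1, so halving the technical variance barely moves n while the same proportional change at the mouse layer has the largest effect.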
We need to distinguish sources of variation that are nuisance factors in our goal to measure mean biological effects from those that are required to assess how much effects vary in the population. Whereas the former should be minimized to optimize the power of the experiment, the latter need to be sampled and quantified so that we can both generalize our conclusions and robustly determine the uncertainty in our estimates.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper (doi:10.1038/nmeth.3224).

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Naomi Altman & Martin Krzywinski

1. Krzywinski, M. & Altman, N. Nat. Methods 11 (2014).
2. Blainey, P., Krzywinski, M. & Altman, N. Nat. Methods 11 (2014).
3. Krzywinski, M. & Altman, N. Nat. Methods 11 (2014).
4. Krzywinski, M. & Altman, N. Nat. Methods 10 (2013).

Naomi Altman is a Professor of Statistics at The Pennsylvania State University. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre.

Nature Methods, vol. 12, no. 1, January 2015

Points of Significance

Split plot design

When some factors are harder to vary than others, a split plot design can be efficient.

We have already seen that varying two factors simultaneously provides an effective experimental design for exploring the main (average) effects and interactions of the factors¹. However, in practice, some factors may be more difficult to vary than others at the level of experimental units. For example, drugs given orally are difficult to administer to individual tissues, but observations on different tissues may be done by biopsy or autopsy. When the factors can be nested, it is more efficient to apply a difficult-to-change factor to the units at the top of the hierarchy and then apply the easier-to-change factor to a nested unit. This is called a split plot design.

The term "split plot" derives from agriculture, where fields may be split into plots and subplots. It is instructive to review completely randomized design (CRD) and randomized complete block design (RCBD) and show how these relate to split plot design. Suppose we are studying the effect of irrigation amount and fertilizer type on crop yield. We have access to eight fields, which can be treated independently and without proximity effects (Fig. 1a). If applying irrigation and fertilizer is equally easy, we can use a complete factorial design and assign levels of both factors randomly to fields in a balanced way (each combination of factor levels is equally represented).

If our land is divided into two large fields that may differ in some way, we can use the field as a blocking factor (Fig. 1b). Within each block, we again perform a complete factorial design: irrigation and fertilizer are assigned to each of the four smaller fields within the large field, leading to an RCBD with field as the block. Each combination of irrigation and fertilizer is balanced within the large field.

So far, we have not considered whether managing levels of irrigation and fertilizer require the same effort.
If varying irrigation on a small scale is difficult, it makes more sense to irrigate larger areas of land than in Figure 1a and then vary the fertilizer accordingly to maintain a balanced design. If our land is divided into four fields (whole plots), each of which can be split into two subplots (Fig. 1c), we would assign irrigation to whole plots using CRD. Within a whole plot, fertilizer would be distributed across subplots using RCBD, randomly and balanced within whole plots with a given irrigation level. Irrigation is the whole plot factor and fertilizer is the subplot factor. It is important to note that all split plot experiments include at least one RCBD subexperiment, with the whole plot factor acting as a block.

[Figure 1: panels (a) CRD, (b) RCBD, (c) split plot + CRD, (d) split plot + RCBD. Legend: irrigation, fertilizer (A, B), block, whole plot, subplot.]

Figure 1 | Split plot design examples from agriculture. (a) In CRD, levels of irrigation and fertilizer are assigned to plots of land (experimental units) in a random and balanced fashion. (b) In RCBD, similar experimental units are grouped (for example, by field) into blocks and treatments are distributed in a CRD fashion within the block. (c) If irrigation is more difficult to vary on a small scale and fields are large enough to be split, a split plot design becomes appropriate. Irrigation levels are assigned to whole plots by CRD and fertilizer is assigned to subplots using RCBD (irrigation is the block). (d) If the fields are large enough, they can be used as blocks for two levels of irrigation. Each field is composed of two whole plots, each composed of two subplots. Irrigation is assigned to whole plots using RCBD (blocked by field) and fertilizer assigned to subplots using RCBD (blocked by irrigation).

[Figure 2: panels (a) split plot + CRD, (b) split plot + RCBD, (c) split-split plot + CRD/RCBD. Legend: mouse, drug, tissue, housing unit, temperature.]

Figure 2 | In biological experiments using split plot designs, whole plot experimental units can be individual animals or groups. (a) A two-factor, split plot animal experiment design. The whole plot is represented by a mouse assigned to drug, and tissues represent subplots. (b) Biological variability coming from nuisance factors, such as weight, can be addressed by blocking the whole plot factor, whose levels are now sampled using RCBD. (c) With three factors, the design is split-split plot. The housing unit is the whole plot experimental unit, each subject to a different temperature. Temperature is assigned to housing using CRD. Within each whole plot, the design shown in b is performed. Drug and tissue are subplot and sub-subplot units. Replication is done by increasing the number of housing units.

Assigning levels of irrigation to fields at random neglects any heterogeneity among the fields. For example, if the land is divided into two large fields (Fig. 1b), it is best to consider each as a block. Within each block, we consider half of the field as a whole plot and irrigate using RCBD (Fig. 1d). As before, the fertilizer is assigned to subplots using RCBD. The designs in Figure 1c and Figure 1d vary only in how the whole plot factor levels are assigned: by CRD or RCBD.

Because split plot designs are based on RCBD, the two can be easily confused. For example, why is Figure 1b not considered a split plot design with field index being the whole plot factor? The answer involves whether we are interested in specific levels of the factor or are using it for blocking purposes. In Figure 1b, the field is a blocking factor because it is used to control the variability of the plots, not as a systematic effect. We use these two fields to generalize to all fields. In Figure 1c, irrigation is a whole plot factor and not a blocking factor because we are studying the specific levels of irrigation.

The terms "whole plot" and "subplot" translate naturally from an agricultural to a biological context, where split plot designs are common. Many factors, such as diet or housing conditions, are more easily applied to large groups of experimental subjects, making them suitable at the whole plot level. In other experiments, factors that are sampled hierarchically or from the same individual (tissue, cell or time points) can act as subplot factors.
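The two-stage randomization of the agricultural split plot design (CRD for whole plots, RCBD for subplots) can be sketched as follows. This is an illustrative sketch only: the function name `split_plot_crd`, the irrigation level names "low" and "high" and the seed are assumptions, while the fertilizer labels A and B follow the figure.

```python
import random


def split_plot_crd(n_whole_plots=4, whole_levels=("low", "high"),
                   sub_levels=("A", "B"), seed=0):
    """Randomize a split plot design with CRD at the whole plot level.

    Returns a list of (whole_plot_number, irrigation_level, fertilizer_order).
    """
    if n_whole_plots % len(whole_levels) != 0:
        raise ValueError("whole plots must divide evenly among whole plot levels")
    rng = random.Random(seed)

    # CRD: each irrigation level is used equally often, in random order.
    whole = list(whole_levels) * (n_whole_plots // len(whole_levels))
    rng.shuffle(whole)

    layout = []
    for plot, irrigation in enumerate(whole, start=1):
        # RCBD: the whole plot acts as a block, so every fertilizer level
        # appears exactly once per whole plot, in random order.
        subs = list(sub_levels)
        rng.shuffle(subs)
        layout.append((plot, irrigation, subs))
    return layout


layout = split_plot_crd()
for plot, irrigation, subs in layout:
    print(f"whole plot {plot}: irrigation={irrigation}, subplots={subs}")
```

Blocking the whole plots by field, as in Figure 1d, would only change the first step: the shuffle of irrigation levels would be done separately within each field.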
Figure 2 illustrates split plot designs in a biological context. Suppose that we wish to determine the in vivo effect of a drug on gene expression in two tissues. We assign mice to one of two drug treatments using CRD. The mouse is the whole plot experimental unit and the drug is the whole plot factor. Both tissues are sampled from each mouse. The tissue is the subplot factor and each mouse acts as a block for the tissue subplot factor; this is the RCBD component (Fig. 2a). The mouse itself can be considered a random factor used to sample biological variability and increase the external validity of the experiment. If we suspect environmental variability, we can group the mice by their housing unit (Fig. 2b), just as we did whole plots by field (Fig. 1d). The housing unit is now a blocking factor for the drug, which is applied to mice using RCBD. Other ways to group mice might be by weight, familial relationship or genotype.

Sensitivity in detecting effects of the subplot factor as well as interactions is generally greater than for a corresponding completely randomized factorial design in which only one tissue is measured in each mouse. This is because tissue comparisons are within mouse. However, because comparing the whole plot factor (drug) is done between subjects, the sensitivity for the whole plot factor is similar to that of a completely randomized design. Applying blocking at the whole plot level, such as housing (Fig. 2b), can improve sensitivity for the whole plot factor similarly to using an RCBD. Compared to a split plot design, the completely randomized design is both more expensive (twice as many mice are required) and less efficient (mouse variability will not cancel, and thus the tissue and interaction effects will include mouse-to-mouse variability).

The experimental unit at the whole plot level does not have to correspond to an individual. It can be one level above the individual in the hierarchy, such as a group or enclosure. For example, suppose we are interested in adding temperature as one of the factors to the study in Figure 2a. Since it is more practical to control the temperature of the housing unit than of individual mice, we use cage as the whole plot (Fig. 2c). Temperature is the whole plot factor and cage is the experimental unit at the whole plot level. As in Figure 2a, we use CRD to assign the whole plot factor (temperature) levels to whole plots (cages).

[Figure 3: panels (a) two-factor time course, (b) split plot + CRD repeated measures, (c) split plot + RCBD repeated measures, (d) split-split plot repeated measures. Horizontal axis: time (t₁, t₂, t₃). Legend: mouse, drug, tissue, housing unit.]

Figure 3 | The split plot design with CRD is commonly applied to a repeated measures time course design. (a) Basic time course design, in which time is one of the factors. Each measurement uses a different mouse. (b) In a repeated measures design, mice are followed longitudinally. Drug is assigned to mice using CRD. Time is the subplot factor. (c) Drug is blocked by housing. (d) A three-factor, repeated measures split-split plot design, now including tissue. Tissue is subplot and time is sub-subplot.
Mice are now experimental units at the subplot level and the drug is now a subplot factor. Because we have three layers in the hierarchy of factors, tissue is at the sub-subplot level and the design is split-split plot. In Figure 2b, the cage is a block used to control variability because the effects of housing are not of specific interest to us. By contrast, in Figure 2c, specific levels of the temperature factor are of interest so it is part of the plot factor hierarchy.

Care must be taken to not mistake a split plot design for CRD. For example, an inadvertent split plot³ can result if some factor levels are not changed between experiments. If the analysis treats all experiments as independent, then we can expect mistakes in conclusions about the significance of effects.

With two factors, more complicated designs are also possible. For example, we might expose the whole mouse to a drug (factor A) in vivo and then expose two liver samples to different in vitro treatments (factor B). In this case, the two liver samples from the same mouse form a block, which is nested in mouse⁴.

The split plot CRD design (Fig. 2a) is commonly used as the basis for a repeated measures design, which is a type of time course design. The most basic time course includes time as one of the factors in a two-factor design. In a completely randomized time course experiment, different mice are used at each of the measurement times t₁, t₂ and t₃ after initial treatment (Fig. 3a). If the same mouse is used at each time and the mice are assigned at random to the levels of a (time-invariant) factor, the design becomes a repeated measures design (Fig. 3b) because the measurements are nested within mouse. The time of measurement is the subplot factor.

The corresponding repeated measures of the design that uses housing as a block in Figure 2b is shown in Figure 3c. As before, housing is the block and drug is the whole plot factor, but now time is the subplot factor. If we include tissue type, the design becomes split-split plot, with tissue being subplot and time sub-subplot (Fig. 3d).

Table 1 | Split plot ANOVA table for two-factor split plot designs

              CRD                                   RCBD
Source        d.f.      MS        F-ratio           d.f.      MS        F-ratio
Block, bl                                           n′        MS_bl     MS_bl/MS_wp
A             a′        MS_A      MS_A/MS_wp        a′        MS_A      MS_A/MS_wp
Error wp      an′       MS_wp                       a′n′      MS_wp
B             b′        MS_B      MS_B/MS_sp        b′        MS_B      MS_B/MS_sp
A × B         a′b′      MS_A×B    MS_A×B/MS_sp      a′b′      MS_A×B    MS_A×B/MS_sp
Error sp      ab′n′     MS_sp                       ab′n′     MS_sp
Total         abn − 1                               abn − 1

Split plot ANOVA table for two-factor split plot designs using CRD (Fig. 1c) and RCBD (Fig. 1d) with a levels of whole plot factor A and b levels of subplot factor B. For CRD, n is the number of measurements per subplot and for RCBD, n is the number of blocks. Whole plot and subplot errors are indicated by wp and sp subscripts, respectively. For RCBD, interaction between blocking factor bl and B is usually included in the subplot error term. a′ = a − 1, b′ = b − 1, n′ = n − 1. d.f., degrees of freedom; F-ratio, test statistic for F test.

Split plot designs are analyzed using ANOVA. Because comparisons at the whole plot level have different variability than those at the subplot level, the ANOVA table contains two sources of error, MS_wp and MS_sp, the mean squares associated with whole plots and subplots, respectively (Table 1). This difference occurs because the subplot factor is always compared within a block, while the whole plot factor is compared between the whole plots. For example, in Figure 2a, variation between mice cancels out when comparing tissues but not when comparing drugs.

Analogously to a two-factor ANOVA¹, we calculate the sums of squares and mean squares in a split plot ANOVA. For example, in a split plot with RCBD, given n blocks of blocking factor bl (Table 1) at the whole plot level and a and b levels of whole plot factor A and subplot factor B, MS_bl = SS_bl/(n − 1), where SS_bl is the sum of squared deviations of the average across each block relative to the grand mean times the number of measurements contributing to each average (a × b). Similarly, SS_A uses the average across levels of A and the multiple is b × n. The analysis at the whole plot level is essentially the same as in a one-way ANOVA with blocking: the subplot values are considered subsamples. The associated MS_sp is usually lower than in a factorial design, which improves the sensitivity in detecting A × B interactions.
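The sums of squares for a split plot with RCBD at the whole plot level can be computed directly from the cell averages. This is a minimal sketch using the standard balanced split plot decomposition with one observation per block, A level and B level; the function name `split_plot_rcbd_anova` and the example data are hypothetical and are not taken from the column.

```python
from statistics import fmean


def split_plot_rcbd_anova(y):
    """Sums of squares for a balanced split plot with blocked whole plots.

    y[k][i][j] is the response in block k, level i of whole plot factor A
    and level j of subplot factor B. Returns (df, ss, ms) dictionaries.
    """
    n, a, b = len(y), len(y[0]), len(y[0][0])
    grand = fmean(v for blk in y for wp in blk for v in wp)
    block_mean = [fmean(v for wp in blk for v in wp) for blk in y]
    a_mean = [fmean(y[k][i][j] for k in range(n) for j in range(b)) for i in range(a)]
    b_mean = [fmean(y[k][i][j] for k in range(n) for i in range(a)) for j in range(b)]
    wp_mean = [[fmean(y[k][i]) for i in range(a)] for k in range(n)]
    ab_mean = [[fmean(y[k][i][j] for k in range(n)) for j in range(b)] for i in range(a)]

    ss = {}
    ss["block"] = a * b * sum((m - grand) ** 2 for m in block_mean)
    ss["A"] = b * n * sum((m - grand) ** 2 for m in a_mean)
    # Variation among whole plot averages; what blocks and A do not explain
    # is the whole plot error.
    ss_whole = b * sum((wp_mean[k][i] - grand) ** 2 for k in range(n) for i in range(a))
    ss["error_wp"] = ss_whole - ss["block"] - ss["A"]
    ss["B"] = a * n * sum((m - grand) ** 2 for m in b_mean)
    ss["AxB"] = n * sum((ab_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                        for i in range(a) for j in range(b))
    ss_total = sum((v - grand) ** 2 for blk in y for wp in blk for v in wp)
    ss["error_sp"] = ss_total - ss_whole - ss["B"] - ss["AxB"]

    df = {"block": n - 1, "A": a - 1, "error_wp": (a - 1) * (n - 1),
          "B": b - 1, "AxB": (a - 1) * (b - 1), "error_sp": a * (b - 1) * (n - 1)}
    ms = {term: ss[term] / df[term] for term in ss}
    return df, ss, ms


# Hypothetical data: 3 blocks, 2 levels of A, 2 levels of B.
y = [
    [[10.1, 12.0], [13.2, 15.9]],
    [[9.4, 11.7], [12.1, 15.2]],
    [[11.0, 12.6], [14.3, 16.4]],
]
df, ss, ms = split_plot_rcbd_anova(y)
print("F(A)   =", ms["A"] / ms["error_wp"])    # whole plot factor vs whole plot error
print("F(B)   =", ms["B"] / ms["error_sp"])    # subplot factor vs subplot error
print("F(AxB) =", ms["AxB"] / ms["error_sp"])  # interaction vs subplot error
```

The whole plot factor is tested against the whole plot error and the subplot terms against the subplot error, which mirrors the two error rows of Table 1.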
Split plot designs are helpful when it is difficult to vary all factors simultaneously, and, if factors that require more time or resources can be identified, split plot designs can offer cost savings. This type of design is also useful for cases when the investigator wishes to expand the scope of the experiment: a factor can be added at the whole plot level without sacrificing sensitivity in the subplot factor.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Naomi Altman & Martin Krzywinski

1. Krzywinski, M., Altman, N. & Blainey, P. Nat. Methods 11 (2014).
2. Krzywinski, M. & Altman, N. Nat. Methods 11 (2014).
3. Ganju, J. & Lucas, J.M. J. Stat. Plan. Infer. 81 (1999).
4. Krzywinski, M., Altman, N. & Blainey, P. Nat. Methods 11 (2014).

Naomi Altman is a Professor of Statistics at The Pennsylvania State University. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre.

Nature Methods, vol. 12, no. 3, March 2015

Points of Significance

Bayes' theorem

Incorporate new evidence to update prior information.

Observing, gathering knowledge and making predictions are the foundations of the scientific process. The accuracy of our predictions depends on the quality of our present knowledge and the accuracy of our observations. Weather forecasts are a familiar example: the more we know about how weather works, the better we can use current observations and seasonal records to predict whether it will rain tomorrow, and any disagreement between prediction and observation can be used to refine the weather model. Bayesian statistics embodies this cycle of applying previous theoretical and empirical knowledge to formulate hypotheses, rank them on the basis of observed data and update prior probability estimates and hypotheses using observed data¹. This will be our first of a series of columns about Bayesian statistics. This month, we'll introduce the topic using one of its key concepts, Bayes' theorem, and expand to include topics such as Bayesian inference and networks in future columns.

Bayesian statistics is often contrasted with classical (frequentist) statistics, which assumes that observed phenomena are generated by an unknown but fixed process. Importantly, classical statistics assumes that population parameters are unknown constants, given that complete and exact knowledge about the sample space is not available. For estimation of population characteristics, the concept of probability is used to describe the outcomes of measurements.

In contrast, Bayesian statistics assumes that population parameters, though unknown, are quantifiable random variables and that our uncertainty about them can be described by probability distributions. We make subjective probability statements, or priors, about these parameters based on our experience and reasoning about the population. Probability is understood from this perspective as a degree of belief about the values of the parameter under study.
Once we collect data, we combine them with the prior to create a distribution called the posterior that represents our updated information about the parameters, as a probability assessment about the possible values of the parameter.

[Figure 1: marginal (individual), conditional and joint probabilities for (a) independent events and (b) dependent events, shown as coin (C, Cʹ or C, C_b) by toss outcome (H, T). Independent events: P(C, H) = P(C) × P(H) = 0.25. Dependent events: P(C_b, H) = P(C_b|H) × P(H) = P(H|C_b) × P(C_b) = 0.375.]

Figure 1 | Marginal, joint and conditional probabilities for independent and dependent events. Probabilities are shown by plots³, where columns correspond to coins and stacked bars within a column to coin toss outcomes, and are given by the ratio of the blue area to the area of the red outline. The choice of one of two fair coins (C, Cʹ) and the outcome of a toss are independent events. For independent events, marginal and conditional probabilities are the same and joint probabilities are calculated using the product of probabilities. If one of the coins, C_b, is biased (yields heads (H) 75% of the time), the events are dependent, and joint probability is calculated using conditional probabilities.

[Figure 2: (a) Bayes' theorem, P(C_b|H) = P(H|C_b) × P(C_b)/P(H) and P(H|C_b) = P(C_b|H) × P(H)/P(C_b), with prior and posterior labeled. (b) Updating priors and iterative estimation of probabilities.]

Figure 2 | Graphical interpretation of Bayes' theorem and its application to iterative estimation of probabilities. (a) Relationship between conditional probabilities given by Bayes' theorem relating the probability of a hypothesis that the coin is biased, P(C_b), to its probability once the data have been observed, P(C_b|H). (b) The probability of the identity of the chosen coin can be inferred from the toss outcome. Observing a head increases the chances that the coin is biased from P(C_b) = 0.5 to 0.6, and further to 0.69 if a second head is observed.
Given that experience, knowledge and reasoning process vary among individuals, so do their priors, making specification of the prior one of the most controversial topics in Bayesian statistics. However, the influence of the prior is usually diminished as we gather knowledge and make observations.

At the core of Bayesian statistics is Bayes' theorem, which describes the outcome probabilities of related (dependent) events using the concept of conditional probability. To illustrate these concepts, we'll start with independent events: tossing one of two fair coins, C and Cʹ. The toss outcome probability does not depend on the choice of coin; the probability of heads is always the same, P(H) = 0.5 (Fig. 1a). The joint probability of choosing a given coin (e.g., C) and toss outcome (e.g., H) is simply the product of their individual probabilities, P(C, H) = P(C) × P(H).

But if we were to replace one of the coins with a biased coin, C_b, that yields heads 75% of the time, the choice of coin would affect the toss outcome probability, making the events dependent. We express this using conditional probabilities by P(H|C) = 0.5 and P(H|C_b) = 0.75, where "|" means "given" or "conditional upon" (Fig. 1b).

If P(H|C_b) is the probability of observing heads given the biased coin, how can we calculate P(C_b|H), the probability that the coin is biased having observed heads? These two conditional probabilities are generally not the same; failing to distinguish them is known as the prosecutor's fallacy. P(H|C_b) is a property of the biased coin and, unlike P(C_b|H), is unaffected by the chance of the coin being biased.

We can relate these conditional probabilities by first writing the joint probability of selecting C_b and observing H: P(C_b, H) = P(C_b|H) × P(H) (Fig. 1b). The fact that this is symmetric, P(C_b|H) × P(H) = P(H|C_b) × P(C_b), leads us to Bayes' theorem, which is a rearrangement of this equality: P(C_b|H) = P(H|C_b) × P(C_b)/P(H) (Fig. 2a). P(C_b) is our guess of the coin being biased before data are collected (the prior), and P(C_b|H) is our guess once we have observed heads (the posterior).

If both coins are equally likely to be picked, P(C_b) = P(C) = 0.5.
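The relationship between joint, marginal and conditional probabilities for the two-coin example can be tabulated in a few lines. This is a minimal sketch using the probabilities stated in the text; the function name `joint_table` and the dictionary keys "C" and "Cb" are illustrative assumptions.

```python
def joint_table(p_coin, p_heads_given_coin):
    """Return {(coin, outcome): joint probability} from P(coin) and P(H | coin)."""
    table = {}
    for coin, p_c in p_coin.items():
        p_h = p_heads_given_coin[coin]
        table[(coin, "H")] = p_c * p_h
        table[(coin, "T")] = p_c * (1 - p_h)
    return table


# Independent events: two fair coins, so P(C, H) = P(C) * P(H).
fair = joint_table({"C": 0.5, "C'": 0.5}, {"C": 0.5, "C'": 0.5})

# Dependent events: fair coin C and biased coin Cb with P(H | Cb) = 0.75.
p_coin = {"C": 0.5, "Cb": 0.5}
p_heads_given = {"C": 0.5, "Cb": 0.75}
biased = joint_table(p_coin, p_heads_given)

# Marginal probability of heads: sum the joint probabilities over coins.
p_heads = biased[("C", "H")] + biased[("Cb", "H")]
# Conditional probability of the biased coin given heads: joint / marginal.
p_cb_given_h = biased[("Cb", "H")] / p_heads

print("P(C, H) for fair coins:", fair[("C", "H")])
print("P(Cb, H):", biased[("Cb", "H")])
print("P(H):", p_heads)
print("P(Cb | H):", p_cb_given_h)
```

The joint probability P(C_b, H) can be reached from either direction, P(C_b|H) × P(H) or P(H|C_b) × P(C_b), which is the symmetry that Bayes' theorem rearranges.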
We also know that P(H|C_b) = 0.75, which is a property of the biased coin. To apply Bayes' theorem, we need to calculate P(H), which is the probability of all the ways of observing heads: picking the fair coin and observing heads, and picking the biased coin and observing heads. This is P(H) = P(H|C) × P(C) + P(H|C_b) × P(C_b) = 0.5 × 0.5 + 0.75 × 0.5 = 0.625. By substituting these values in Bayes' theorem, we can compute the probability that the coin is biased.

Nature Methods, vol. 12, no. 4, April 2015
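The substitution into Bayes' theorem, and the iterative updating described in the Figure 2 caption, can be sketched as follows. This is an illustrative sketch: the function name `update` and its parameter names are assumptions, while the probabilities are those given in the text.

```python
def update(prior_biased, outcome="H", p_heads_biased=0.75, p_heads_fair=0.5):
    """Return the posterior probability that the coin is biased after one toss."""
    # Likelihood of the observed outcome under each hypothesis.
    like_biased = p_heads_biased if outcome == "H" else 1 - p_heads_biased
    like_fair = p_heads_fair if outcome == "H" else 1 - p_heads_fair
    # Probability of the outcome over all the ways it can occur.
    evidence = like_biased * prior_biased + like_fair * (1 - prior_biased)
    # Bayes' theorem: posterior = likelihood * prior / evidence.
    return like_biased * prior_biased / evidence


belief = 0.5  # both coins equally likely to be picked
for toss in "HH":
    belief = update(belief, outcome=toss)  # the posterior becomes the next prior
    print(round(belief, 2))
# prints 0.6, then 0.69
```

Each posterior serves as the prior for the next toss, so the second head is evaluated against P(C_b) = 0.6 rather than 0.5.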


More information

Section 5.1 #7, 10, 16, 21, 25; Section 5.2 #8, 9, 15, 20, 27, 30; Section 5.3 #4, 6, 9, 13, 16, 28, 31; Section 5.4 #7, 18, 21, 23, 25, 29, 40

Section 5.1 #7, 10, 16, 21, 25; Section 5.2 #8, 9, 15, 20, 27, 30; Section 5.3 #4, 6, 9, 13, 16, 28, 31; Section 5.4 #7, 18, 21, 23, 25, 29, 40 Mth B Prof. Audrey Terrs HW # Solutions by Alex Eustis Due Tuesdy, Oct. 9 Section 5. #7,, 6,, 5; Section 5. #8, 9, 5,, 7, 3; Section 5.3 #4, 6, 9, 3, 6, 8, 3; Section 5.4 #7, 8,, 3, 5, 9, 4 5..7 Since

More information

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite

More information

List all of the possible rational roots of each equation. Then find all solutions (both real and imaginary) of the equation. 1.

List all of the possible rational roots of each equation. Then find all solutions (both real and imaginary) of the equation. 1. Mth Anlysis CP WS 4.X- Section 4.-4.4 Review Complete ech question without the use of grphing clcultor.. Compre the mening of the words: roots, zeros nd fctors.. Determine whether - is root of 0. Show

More information

( ) as a fraction. Determine location of the highest

( ) as a fraction. Determine location of the highest AB Clculus Exm Review Sheet - Solutions A. Preclculus Type prolems A1 A2 A3 A4 A5 A6 A7 This is wht you think of doing Find the zeros of f ( x). Set function equl to 0. Fctor or use qudrtic eqution if

More information
