Primer on Statistical Analysis (Level 2)


Anne Robbins

Table of Contents

1. Introduction
2. Nonparametric Statistics
   2.1 Sign Test for Median
   2.2 Wilcoxon Rank Sum Test
   2.3 Wilcoxon Signed Rank Test
   2.4 Kruskal-Wallis H-Test
3. Basics of Design of Experiments
4. Analysis of Variance
   4.1 One-Way Classification
   4.2 Pairwise Comparison of Means (with Extrapolation Example)
   4.3 Two-Way Classification
   4.4 Two-Way with Replications
   4.5 Latin Square
   4.6 Failure of Assumptions
5. Categorical Data
   5.1 One-Way Tables
   5.2 Two-Way Tables
6. Linear Regression
   6.1 The Basics
   6.2 Least Squares
   6.3 Statistical Significance
   6.4 Prediction
   6.5 Assumptions
   6.6 Failure of Assumptions
   6.7 Nonlinear Linear Regression
   6.8 Multiple Linear Regression

1. Introduction

Level 1 of this statistics primer reviewed many of the basic concepts covered in any college-level statistics course. Hopefully, if this primer accomplished its purpose, the topics were easier to understand than having to read them in a book. Having a firm grasp of the basic terms and procedures in Level 1 is essential for any type of analysis. Granted, the material itself is a little dry and theoretical, but those concepts form the building blocks of the more practical statistics that will be covered in the remainder of the primer.

Using confidence intervals is fine for answering simple questions without dedicating a lot of resources. In the real world, however, few things are ever simple. The hypothesis tests discussed in Level 1 are also simple techniques, but they are more general and will arise over and over again in advanced statistical procedures. Some of those procedures discussed in Level 2 include nonparametric statistics, analysis of variance, categorical data, and regression. Most software packages (including spreadsheets) can perform many of these techniques.

As with Level 1, this primer is intended for your reference. Don't try to memorize this stuff; people will think you're weird if you do.

2. Nonparametric Statistics

An unfortunate drawback of many of the procedures from Level 1 is the assumption that the random variable of interest comes from a normal population. In many cases, the normality assumption is either invalid (not enough data points) or incorrect (data comes from a different distribution). In other situations, the data may not be measurable, quantitative results. In such instances, as long as the data can be ranked in some order, the analyst can use statistical tests that don't rely on any assumptions about the underlying distribution. Such tests are logically called distribution-free tests. They make fewer assumptions than the tests discussed in Level 1 so they are not as powerful, but they are more applicable. Nonparametric statistics is a branch of inferential statistics devoted to distribution-free tests. This section will cover a few common nonparametric techniques.
If these techniques are not suitable, consult a statistics book for other tests.

2.1 Sign Test for Median

The simplest of nonparametric tests, the sign test, is specifically designed for testing hypotheses about the median of any continuous population. The test is based on taking the difference between each observation and the desired median, $M_0$. Two special values are defined to account for the differences: $S^+$ is equal to the number of positive differences (i.e., observations greater than $M_0$) and $S^-$ is equal to the number of negative differences (i.e., observations less than $M_0$). Note that nothing is done about observations that are equal to $M_0$. Under the null hypothesis, $S^+$ and $S^-$ come from a binomial distribution with parameters $p = 0.5$ and $n = S^+ + S^-$. The following summarizes the test:

  $H_0$           $H_a$           Test Statistic            Rejection Region
  $M \ge M_0$     $M < M_0$       $S = S^-$                 p-value $= P[\mathrm{Bin}(n, 0.5) \ge S] < \alpha$
  $M \le M_0$     $M > M_0$       $S = S^+$                 p-value $= P[\mathrm{Bin}(n, 0.5) \ge S] < \alpha$
  $M = M_0$       $M \ne M_0$     $S = \max(S^+, S^-)$      p-value $= P[\mathrm{Bin}(n, 0.5) \ge S] < \alpha/2$

For large samples ($n \ge$ ?5), the sign test is simplified by taking advantage of the fact that the binomial distribution can be approximated by a normal. For such cases, the test statistic is

$$z = \frac{S - E(S)}{\sqrt{V(S)}} = \frac{S - 0.5n}{\sqrt{(0.5)(0.5)n}} = \frac{S - 0.5n}{0.5\sqrt{n}}$$

The rejection region is $z > z_{\alpha}$ for one-tailed tests and $z > z_{\alpha/2}$ for two-tailed tests. This approximation is especially handy when binomial tables (like those in Appendix A) are not available.

The sign test can also be used to test two paired populations by using the median of the difference between the observations. In other words, the null hypothesis will be something like $H_0: M(X - Y) = M_0$ (where $M(X - Y)$ is the median of the difference of two random variables). Note that if $M_0 = 0$, the test is equivalent to $H_0: X \sim Y$ (i.e., X and Y come from the same distribution).

Example 1. The F-3 (Section 4.3 in Level 1) is undergoing bomb range tests. Because of the tight budget, only 10 flights were performed. For each mission, the numbers of bombs required to penetrate a hardened bunker were: 3.5, ?.5, 3.75, 6.0, ?.5, 4.0, 5.0, 3.25, 7.0, 5.0 (units are ?,000-pound bomb equivalents). As the lead analyst on the range, you are asked to determine if 4 bombs will be sufficient for real bombing missions. Since you do not want to assume normality and 10 data points are insufficient to invoke the Central Limit Theorem, you perform the sign test as follows:

$H_0: M \le 4$
$H_a: M > 4$
(Remember, you want to put the result you want to prove as $H_a$.)

The differences are: -0.5, -?.5, -0.25, 2.0, -?.5, 0.0, 1.0, -0.75, 3.0, 1.0

Test Statistic: $S^+ = 4$ and $S^- = 5$ ($n = 4 + 5 = 9$)

$P[\mathrm{Bin}(9, 0.5) \ge S^+ = 4] = 1 - P[\mathrm{Bin}(9, 0.5) < 4] = 1 - 0.254 = 0.746$

Since 0.746 is greater than any respectable $\alpha$, you fail to reject the null hypothesis and conclude an F-3 carrying four bombs will take out a hardened bunker at least 50 percent of the time.
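If binomial tables are not handy, the exact p-value for the sign test is easy to compute. The following is only a minimal sketch in Python (standard library only); the function names `sign_counts` and `sign_test_p_value` are made up for illustration and do not come from the primer or from any statistics package.

```python
from math import comb


def sign_counts(data, m0):
    """Count observations above (S+) and below (S-) the hypothesized median.

    Observations equal to m0 are ignored, as the sign test prescribes.
    """
    s_plus = sum(1 for x in data if x > m0)
    s_minus = sum(1 for x in data if x < m0)
    return s_plus, s_minus


def sign_test_p_value(s, n):
    """Upper-tail probability P[Bin(n, 0.5) >= s]."""
    return sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n


# Counts taken from Example 1: S+ = 4 positive differences out of n = 9.
p_value = sign_test_p_value(4, 9)
print(f"p-value = {p_value:.3f}")
```

For a two-tailed test, the same function would be called with `max(s_plus, s_minus)` and the result compared against $\alpha/2$, as in the summary table.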
You normally wouldn't make a claim that strong by failing to reject the null hypothesis, but with a p-value of 0.746 it is a pretty safe bet.

2.2 Wilcoxon Rank Sum Test

When independent samples are taken to compare two populations which cannot be assumed to be normal, the Wilcoxon rank sum test is usually used. The test is also called the Mann-Whitney rank sum test by some computer programs. The first step in conducting the rank sum test is to rank the data from 1 to n with ties getting the average rank (e.g., if 4 observations tie for the 7th smallest observation, each would be given the rank of (7 + 8 + 9 + 10)/4 = 8.5 and the next observation is given a rank of 11). The rankings include observations from both samples.

For convenience, define population 1 to be the one with fewer observations (i.e., $n_1 \le n_2$). Also, let $D_1$ and $D_2$ represent the relative frequency distributions for populations 1 and 2, respectively. The rank sum statistics are $T_1$ and $T_2$ which, as the name implies, are merely the sums of the ranks for the observations in samples 1 and 2. The test can be summarized as follows with <, =, and > referring to the shapes of $D_1$ and $D_2$ (e.g., $D_1 > D_2$ is read "population 1 is shifted to the right of population 2"):

  $H_0$             $H_a$             Test Statistic    Rejection Region
  $D_1 \ge D_2$     $D_1 < D_2$       $T = T_1$         $T \le T_L$
  $D_1 \le D_2$     $D_1 > D_2$       $T = T_1$         $T \ge T_U$
  $D_1 = D_2$       $D_1 \ne D_2$     $T = T_1$         $T \le T_L$ or $T \ge T_U$

where $T_L$ and $T_U$ come from Table 4 of Appendix A. Some statistics books will go one step further and introduce a U-statistic, but it is basically the same test (no sense complicating it further).

For large sample sizes ($n_1$ and $n_2 \ge$ ?0), the rank sum test can take advantage of a normal approximation similar to the one discussed in the previous section. This is useful when the $T_L$ and $T_U$ you need are not on the table or you don't have the table at all. The new test statistic is:

$$z = \frac{T_1 - E(T_1)}{\sqrt{V(T_1)}} = \frac{T_1 - \dfrac{n_1 n_2 + n_1(n_1 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$

The rejection regions for the three cases given in the table above are $z < -z_{\alpha}$, $z > z_{\alpha}$, and $|z| > z_{\alpha/2}$.

Example 2. The PM for the F-3 comes to you wondering which of two techniques is best for patching defects in the paint. According to range tests, both techniques are equally effective, so you are looking for the quicker one. The times (in minutes) for method one are 35, 50, 5, 55, ?0, 30, ?0. For method two the times are 45, 50, 40, 35, 46, 45, 3. At first glance you guess the first technique is going to be faster so you set up your test to prove that.

$H_0: D_1 \ge D_2$
$H_a: D_1 < D_2$

The ranks for the observations are:

        Technique 1                 Technique 2
  Time (min)     Rank         Time (min)     Rank
     ...          ...            ...          ...
  $n_1 = 7$    $T_1 = 45$     $n_2 = 7$    $T_2 = 60$

Test Statistic: $T = T_1 = 45$
Rejection Region: Using $\alpha = 0.05$, $T_L = 39$ and $T_U = 66$

Since 45 > 39 you cannot reject the null hypothesis and conclude that there is insufficient evidence to suggest the first technique is faster than the second. The decision must be postponed until more data is collected or must be based on other factors (e.g., cost, ease of setup, hazardous materials, etc.).

2.3 Wilcoxon Signed Rank Test (Matched Pairs)

The Wilcoxon signed rank test is similar to the rank sum test discussed above except that it is used for matched pairs instead of independent samples. In technical terms, a matched pairs design is a randomized block design with k = 2 treatments (see Section 3). In English, that means the data for the two populations considered are collected in pairs. For example, a taste test looking for differences between Coke and Pepsi will have a judge submit a score for each drink (i.e., pairs of data).

The first step in the signed rank test is to get the differences between the matched pairs. Differences equal to zero are eliminated and the number of pairs n is reduced accordingly. The differences are then ranked by absolute value with ties getting the average rank. As in the sign test, special values are defined for the ranks: $T^-$ is the sum of the ranks for the negative differences and $T^+$ is the sum of the ranks for the positive differences. Using the same notation as Section 2.2 ($D_1 > D_2$ is read "population 1 is shifted to the right of population 2"), the test can be summarized as follows:

  $H_0$             $H_a$             Test Statistic           Rejection Region
  $D_1 \ge D_2$     $D_1 < D_2$       $T = T^+$                $T \le T_0$
  $D_1 \le D_2$     $D_1 > D_2$       $T = T^-$                $T \le T_0$
  $D_1 = D_2$       $D_1 \ne D_2$     $T = \min(T^-, T^+)$     $T \le T_0$

where $T_0$ comes from Table 5 in Appendix A.

Just like the sign test in Section 2.1 was adapted for pairs, the signed rank test can be adapted for a single population median. The only adjustment is to make the differences discussed above be the differences between the observations and the theorized median $M_0$.
Note that the signed rank test does not make the assumption of a continuous distribution as the sign test did.
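The ranking step is where most hand calculations go wrong, so here is a rough Python sketch of it. The helper names (`average_ranks`, `signed_rank_sums`) and the paired scores are hypothetical and chosen only to show the mechanics: zero differences are dropped, absolute differences are ranked with ties sharing the average rank, and the ranks are summed by sign.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + 1 + end + 1) / 2  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def signed_rank_sums(x, y):
    """Return (n, T+, T-) for matched pairs, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return len(diffs), t_plus, t_minus


# Hypothetical paired scores (not data from the primer).
first = [12, 15, 9, 20, 14]
second = [14, 15, 13, 17, 15]
n, t_plus, t_minus = signed_rank_sums(first, second)
print(n, t_plus, t_minus)
```

A handy check is that $T^+ + T^- = n(n+1)/2$. The same `average_ranks` helper would also serve for the rank sum test, where the two samples are pooled before ranking.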

Just like the other nonparametric tests, the signed rank test can also take advantage of the normal approximation for large sample sizes ($n \ge$ ?5). In such cases, the test statistic is:

$$z = \frac{T - E(T)}{\sqrt{V(T)}} = \frac{T - \dfrac{n(n + 1)}{4}}{\sqrt{\dfrac{n(n + 1)(2n + 1)}{24}}}$$

The rejection region is $z < -z_{\alpha}$ for one-tailed tests and $z < -z_{\alpha/2}$ for two-tailed tests.

Example 3. Review the example in Section 2.2. Assume that the data was collected in a controlled environment so each patch was done on the same type and size of defect (i.e., collected in pairs). Using the signed rank test results in:

$H_0: D_1 \ge D_2$
$H_a: D_1 < D_2$

The ranks for the observations are:

  Patch     Technique 1     Technique 2     Difference     Rank
   ...          ...             ...             ...         ...
  $n = 6$                                                $T^+ = 6$

(The pair with a zero difference gets no rank, shown as N/A, which is why n drops from 7 to 6.)

Test Statistic: $T = T^+ = 6$
Rejection Region: Using $\alpha = 0.05$, the value for $T_0$ is 2.

Since 6 > 2 you cannot reject the null hypothesis. As in Section 2.2, you conclude that there is insufficient evidence to suggest the first technique is faster than the second.

2.4 Kruskal-Wallis H-Test

The previous tests are only good for comparing two populations at a time. The Kruskal-Wallis H-test, however, is designed to compare k populations. The test is the nonparametric equivalent of the analysis of variance (ANOVA) F-test (see Section 4).

It is important to look at the assumptions of the Kruskal-Wallis H-test before trying to use it. The first assumption is a completely randomized design. This simply means that the data used comes from independent random samples of $n_1, n_2, \ldots, n_k$ observations from the k populations. Other assumptions are that each sample has at least five measurements and the observations can be ranked.

Just like the Wilcoxon rank sum test, the $n = n_1 + n_2 + \cdots + n_k$ observations must be sorted according to rank (with ties getting the average rank). Also like the rank sum test, the ranks for each sample are added to form the rank sums $T_j$. If the assumptions stated above hold, the test statistic H will be approximated by a chi-square distribution with (k - 1) degrees of freedom. Here are the specifics of the test:

$H_0$: The k population probability distributions are identical
$H_a$: At least two of the k population probability distributions differ in location

Test statistic: $$H = \frac{12}{n(n + 1)} \sum_{j=1}^{k} \frac{T_j^2}{n_j} - 3(n + 1)$$

Rejection Region: $H > \chi^2_{\alpha,(k-1)}$

Example 4. Going back to the example in Section 2.2 again, assume there are actually four techniques and you want to know if any of them is significantly better (i.e., faster) than the others. The data for the first two are the same. For method three you have 9 observations: 5, 35, 30, 45, ?0, 40, 3, 34, 8. There are 8 observations for technique four: 55, 40, 46, 35, 54, 50, 3, 4. You conduct an H-test by creating the following table:

  Tech. 1    Rank     Tech. 2    Rank       Tech. 3    Rank     Tech. 4    Rank
    ...       ...       ...       ...         ...       ...       ...       ...
       $T_1 = 95$        $T_2 =$ ?30.5         $T_3 = 88$        $T_4 = 8.5$

From the table you can calculate H with $n = 7 + 7 + 9 + 8 = 31$:

$$H = \frac{12}{31(32)} \left( \frac{T_1^2}{7} + \frac{T_2^2}{7} + \frac{T_3^2}{9} + \frac{T_4^2}{8} \right) - 3(32)$$

Also, from Table 7 in Appendix A, you know that a $\chi^2$ with $\alpha = 0.05$ and 3 df is 7.8147. Since H exceeds 7.8147, you reject $H_0$ and conclude that at least one of the four techniques is different than the others. From here you can go back and perform some of the two-population tests to get more information.

3. Basics of Design of Experiments

Up to this point in the primer there has been mention of completely randomized designs and other data collection techniques. The focus has been more on what to do with the data, rather than how to collect it. That is where design of experiments (DOE) comes in. Basically DOE is a procedure for selecting sample data. If done correctly, DOE can save time and resources by obtaining more information from smaller samples. Here are a few definitions that should be enough to get through this level of the primer (Level 3 will go more in depth on DOE):

Block - relatively homogeneous (similar) group of experimental units; observing treatments within blocks is a method of eliminating known sources of data variation (see Section 4.3)

Experimental Design - method used to assign treatments to experimental units; 4 steps:
  1. Select Factors
  2. Decide How Much Information You Want
  3. Choose Treatments & Number of Observations
  4. Choose Experimental Design

Experimental Unit - object upon which measurements are made

Factors - independent variables related to the response variable(s); factors are correlated with the response(s), hence their importance, but they do not necessarily have direct influence on the response(s) (see Section 6)

Level - different levels or settings of a factor; also called the factor's intensity

Replication - number of observations per treatment

Treatment - particular combination of levels for the factors involved in an experiment

Setting up an experimental design requires four steps as mentioned above. The first step involves selecting the factors. This means identifying the parameters that are the object of the study and investigating what factors have an influence on them. Usually, the target parameters are the population means associated with the factor-level combinations.

Once you know what you are looking for, the next step is to decide how much you want to know about it. That is, decide on the magnitude of the standard error(s) that you desire. (The standard error of a statistic is the standard deviation of its probability distribution; e.g., for the sample mean $\bar{x}$, the standard error is $s/\sqrt{n}$.)

The third step in an experimental design is to choose the factor-level combinations (i.e., treatments). Usually, each factor is only tested at two levels if its effect on the response(s) can be assumed to be linear. If the assumption cannot be made, the factor is set at three levels.
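To make the definitions of treatment and standard error concrete, here is a small illustrative Python sketch. The factor names and levels are invented for the example (two factors assumed linear at two levels each, one at three levels), and `standard_error_of_mean` is just a hypothetical helper wrapping the $s/\sqrt{n}$ formula.

```python
from itertools import product
from math import sqrt
from statistics import stdev

# Hypothetical factors and levels, made up for illustration only.
factors = {
    "release_altitude": ["low", "high"],
    "airspeed": ["slow", "fast"],
    "fuze_setting": ["A", "B", "C"],
}

# A treatment is one particular combination of levels, one level per factor.
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(treatments))  # 2 * 2 * 3 = 12 treatments


def standard_error_of_mean(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


# Hypothetical responses for one treatment.
print(standard_error_of_mean([2, 4, 4, 4, 5, 5, 7, 9]))
```

Listing the treatments this way shows how quickly the count grows as factors and levels are added, which is why the number of levels is kept small.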
Occasionally, factors may be assigned more than three levels, but it may complicate the design. Once all the treatments are decided for each factor, they are put into a design which will accomplish the desired objectives. Level 3 goes more into detail about specific designs.

4. Analysis of Variance

Once data for a designed experiment has been collected, it must be analyzed. The usual technique is some form of analysis of variance (ANOVA). The basic idea behind an ANOVA is to see whether two (or more) treatment means differ based on the means of the independent random samples. Figure 1 shows the plots for two cases with five measurements for each sample. The open circles on the left side are from the first sample and the solid circles on the right are from the second sample. Horizontal lines pass through the means for the two samples, $\bar{y}_1$ and $\bar{y}_2$. For Case A, it seems a fair statement to say that the sample means differ. It seems right because the distance (variation) between the sample means is greater than the variation within the y values for each of the two samples. The opposite is true in Case B, which suggests the sample means do not differ.

[Figure 1. Plots of Data for Two Cases (panels A and B, each plotting y for Sample 1 and Sample 2)]

4.1 One-Way Classification

As explained above, the basic idea behind an ANOVA is pretty simple. Unfortunately, when it comes time to actually do one, some math is required. Luckily, most software packages do all the calculations for you so you only need to worry about understanding the concepts in the remainder of this section.

The simplest type of ANOVA is the one-way classification of a completely randomized design. Basically, that means there are a possible treatments to which experimental units are assigned randomly (with the same probability as the other treatments). Each treatment i has $n_i$ observations $x_{i1}, x_{i2}, \ldots, x_{in_i}$. The population mean for treatment i is represented by $\mu_i$ and the population variance by $\sigma^2$ (note there is no subscript because the variance is assumed to be constant for each treatment). Therefore, the overall population mean is given by:

$$\mu = \frac{\sum_{i=1}^{a} n_i \mu_i}{\sum_{i=1}^{a} n_i}$$

where $n = \sum_{i=1}^{a} n_i$ is the total number of observations. In order to confuse you with the typical Mathenese you will find in a text book, here is what the sample mean and variance look like for treatment i:

$$\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij} \qquad s_i^2 = \frac{\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{n_i - 1}$$

Those equations look pretty complicated, especially with the little i everywhere (it is used for two-way classifications). They are basically the same equations for sample mean and variance given in Level 1, except you only use the responses that pertain to treatment i. If you understand that, ANOVA will be no problem for you. Luckily, if you don't understand it, you can let a computer do all the number crunching so it doesn't matter.

All the fancy equations are nice, but what do you do with them? You use them in other fancy equations! Before listing those, however, it is probably best to take a step back and look at where they fit into the ANOVA. A one-way classification has two basic assumptions:

1. All $x_{ij} \sim N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, n_i$
2. Model: $x_{ij} = \mu + \tau_i + \varepsilon_{ij}$ where $x_{ij} = \mu_i + \varepsilon_{ij}$, $\varepsilon_{ij} \sim N(0, \sigma^2)$, $\tau_i = \mu_i - \mu$, $\sum_{i=1}^{a} n_i \tau_i = 0$

Talk about some fancy equations to impress your friends! Basically, the first assumption says that each observation comes from a normal distribution with a specific mean for the respective treatment ($\mu_i$) and a constant variance ($\sigma^2$). The second assumption states that the basic model for each observation is the overall mean ($\mu$) plus the deviation from the mean for the respective treatment ($\tau_i$) plus some random error term ($\varepsilon_{ij}$). All ANOVAs use some form of these two assumptions (unfortunately, they only get more complicated). The ANOVA for a one-way classification looks something like the following:

  Source       SS        df       MS                F
  Treatment    $SS_T$    a - 1    $SS_T/(a - 1)$    $MS_T/MS_E$
  Error        $SS_E$    n - a    $SS_E/(n - a)$
  Total        $SS$      n - 1

  Figure 2. ANOVA for One-Way Classification

The SS column is for the sums of squares, which represent the variability in the data caused by the source (treatment or error). If the treatment sum of squares ($SS_T$) is very small relative to the random error ($SS_E$), you would conclude that the treatment is not significant to the response variable. To put that quantitatively, there is the F statistic computed in the last column. It is the ratio of the mean square for treatment ($MS_T$) to the mean square error ($MS_E$).
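For readers who want to see how the table in Figure 2 gets filled in, the following Python sketch builds it from raw data. It assumes the usual definitions of the sums of squares (between-treatment variation weighted by $n_i$, and within-treatment variation around each treatment mean); the function name `one_way_anova` and the three small groups are hypothetical, not data from the primer.

```python
def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table for a list of samples."""
    a = len(groups)                      # number of treatments
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-treatment and within-treatment sums of squares.
    ss_t = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_t = ss_t / (a - 1)
    ms_e = ss_e / (n - a)
    return {
        "SS_T": ss_t, "SS_E": ss_e, "SS": ss_t + ss_e,
        "df_T": a - 1, "df_E": n - a,
        "MS_T": ms_t, "MS_E": ms_e, "F": ms_t / ms_e,
    }


# Hypothetical responses for three treatments.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in table.items():
    print(name, round(value, 4))
```

The resulting F would then be compared with the critical value for (a - 1, n - a) degrees of freedom from an F table or a statistics package.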
Before moving on, it is important to note that the $MS_E$ is an estimate of the population variance $\sigma^2$. Also, you should be warned that statisticians are notorious for developing their own special notation, especially when it comes to regression and ANOVA. Some classical statisticians will even use something called the correction for the mean which changes SS to $SS_{total(corrected)}$ and adds $SS_{mean}$ and $SS_{total}$. The terms and symbols used in this primer may not be exactly what you see in a text book or computer output, but the basic concepts are the same.

Here is the formal hypothesis test for the F statistic:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$ (i.e., the treatments have no effect on the response)
$H_a: \mu_i \ne \mu_j$ for some $i \ne j$
(This is equivalent to: $H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$ and $H_a$: some $\tau_i \ne 0$)

Test Statistic: $F = MS_T/MS_E$
Rejection Region: $F > F_{\alpha,(a-1,\,n-a)}$

If for some unfortunate reason, you do not have access to a computer and you have to compute the ANOVA by hand, here are the equations you will need:

$$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = \sum_{i=1}^{a} (n_i - 1) s_i^2$$

$$SS_T = \sum_{i=1}^{a} n_i (\bar{x}_i - \bar{x})^2$$

$$SS = SS_T + SS_E = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n_i} x_{ij}^2 - n\bar{x}^2$$

Example 5. Refer to the F-3 description in Section 4.3 of Level 1. After new range testing, the PM brings you data which he wants analyzed. The contractor has been experimenting with different painting techniques to get a better sortie rate. The three techniques take an equal amount of time to paint the F-3, but they seem to differ in how long they last before defects form during flight. The following table lists the hours of flight time before the paint needs to be fixed or reapplied:

  Painting Technique      1         2         3
                         ...       ...       ...
  Totals                 ,96      3,099      4,78

The PM wants to know if there is any significant difference between the techniques. Being an analyst on a big program like the F-3, you're very happy that you have access to computers so you don't have to do the tedious calculations by hand. First you set up the formal hypothesis test:

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least two of the three means differ

Then you let the computer crunch some numbers and get:

  Source       SS     df     MS     F     $F_{.05,(2,27)}$     p-value
  Treatment    ...    ...    ...    ...         ...              ...
  Error        ...    ...    ...
  Total        ...    ...

From here you conclude that you must reject $H_0$ because $F > F_{\alpha,(a-1,\,n-a)}$ (the same conclusion is drawn by noting that the p-value is less than $\alpha = 0.05$). Therefore, with a 5 percent ($\alpha$) chance of being wrong, you tell the PM that at least two of the three techniques differ in duration. The next section will expand on what can be done from here.

4.2 Pairwise Comparison of Means (with Extrapolation Example)

If the null hypothesis of a one-way classification is rejected, there are several other tests that can be done to gain additional information about the treatments. The most common of these is the pairwise comparison of means. Basically, the comparison checks all or some of the possible $\binom{a}{2}$ pairs of treatment means to see which ones are not equal. There are three methods normally used:

Least Significant Difference (LSD):
  $H_0: \mu_i = \mu_j$ (repeat as desired for all $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, a$ with $i \ne j$)
  $H_a: \mu_i \ne \mu_j$
  Test Statistic: $t = \dfrac{\bar{x}_i - \bar{x}_j}{\sqrt{MS_E \left( \dfrac{1}{n_i} + \dfrac{1}{n_j} \right)}}$
  Rejection Region: $|t| > t_{\alpha/2,(n-a)}$
  Recall that a hypothesis test can also be performed as a confidence interval:
  $(\bar{x}_i - \bar{x}_j) \pm t_{\alpha/2,(n-a)} \sqrt{MS_E \left( \dfrac{1}{n_i} + \dfrac{1}{n_j} \right)}$

Simultaneous Bonferroni CIs:
  $H_0: \mu_i = \mu_j$ (test any m linear combinations up to $\binom{a}{2}$)
  $H_a: \mu_i \ne \mu_j$
  Test Statistic: $t = \dfrac{\bar{x}_i - \bar{x}_j}{\sqrt{MS_E \left( \dfrac{1}{n_i} + \dfrac{1}{n_j} \right)}}$ (same as LSD)
  Rejection Region: $|t| > t_{(\alpha/2)/m,(n-a)}$ (note change in level of confidence)

Studentized Range (Tukey):
  $H_0: \mu_i = \mu_j$ (tests all $\binom{a}{2}$ pairwise combinations)
  $H_a: \mu_i \ne \mu_j$
  Test Statistic: $t = \dfrac{|\bar{x}_i - \bar{x}_j|}{\sqrt{\dfrac{MS_E}{2} \left( \dfrac{1}{n_i} + \dfrac{1}{n_j} \right)}}$
  Rejection Region: $t > q_{\alpha}(a, n-a)$
  Requires $n_1 = n_2 = \cdots = n_a$ to be an exact test
  The percentage points of the studentized range, q(p, v), can be found in the studentized range tables of Appendix A for $\alpha$ = 0.05 and 0.01

There are other tests, such as the contrast of means test, but they are rarely used in practice. Some computer packages and text books may cover them, but it is no big loss if they don't.

Example 6. After performing the ANOVA in the previous section, you wisely stop yourself before going to the PM. You know at least two of the means differ, but you figure you should know which ones before presenting your findings. Since there are 10 observations for each treatment, the Tukey method for a pairwise comparison of means will be exact. You also realize that the computer software you originally used to do the ANOVA can also do the Tukey comparisons (if you tell it to). You run back to the computer and get the following results:

  Tukey Comparisons
  Comparison      Estimate ($\bar{x}_i - \bar{x}_j$)     95% CI              Significant
  Techs 1 & 2     ...                                    [..., ?07.05]       No
  Techs 1 & 3     ...                                    [..., ...]          Yes
  Techs 2 & 3     ...                                    [..., ...]          No

The results from the software show that only one confidence interval does not contain zero. Therefore, techniques 1 and 3 differ significantly (technique 3 lasts longer than 1 since all points in the CI are negative). Notice, however, that there is no significant difference between 1 and 2 or between 2 and 3.

For the benefit of those underprivileged officers who may not have access to high tech computers, you decide to repeat the comparison of techniques 1 and 3 by hand:

$H_0: \mu_1 = \mu_3$
$H_a: \mu_1 \ne \mu_3$

Test Statistic: $t = 3.7$
Rejection Region: From the studentized range table of Appendix A, you extrapolate $q_{.05}(3, 27)$ from the neighboring table entries and get 3.5.

Since 3.7 > 3.5, you reject $H_0$ and conclude there is a significant difference between the mean duration of the paint applied by techniques 1 and 3. Note that you really did not need to extrapolate since 3.7 is also larger than the more conservative tabled value. The extrapolation was done as a demonstration.

4.3 Two-Way Classification

A natural extension of the one-way classification is to add a second factor. Statisticians have creatively called this a two-way classification. An important application of the second factor is to account for subject variability, which will be driven home with an example.

Now, just to confuse the readers, most statistics books change notation from one-way to two-way classifications. In order to avoid upsetting the statistics world too much, this section will use the most common notation. For a two-way classification there are r treatments and c blocks with one observation per block per treatment (replications will be considered later). Similar to the one-way classification, each treatment i has a population mean represented by $\mu_i$ and the population variance by $\sigma^2$ (note there is no subscript because the variance is assumed to be constant for each treatment). Also, each block j has a population mean $\mu_j$ and variance $\sigma^2$. Block and treatment sample means and variances are computed as they are for the one-way classification. The population mean for each treatment-block pair is $\mu_{ij}$, which really can't be estimated since there are no replications. A two-way classification has two basic assumptions:

1. All $x_{ij} \sim N(\mu_{ij}, \sigma^2)$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, c$
2. Model: $x_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$ where $x_{ij} = \mu_{ij} + \varepsilon_{ij}$, $\varepsilon_{ij} \sim N(0, \sigma^2)$, $\tau_i = \mu_i - \mu$, $\sum_{i=1}^{r} \tau_i = 0$, $\beta_j = \mu_j - \mu$, $\sum_{j=1}^{c} \beta_j = 0$

The interpretation of the assumptions is similar to that of a one-way classification, but a little more complicated. The two-way ANOVA looks something like the following:

  Source       SS        df                MS                        F
  Treatment    $SS_T$    r - 1             $SS_T/(r - 1)$            $MS_T/MS_E$
  Block        $SS_B$    c - 1             $SS_B/(c - 1)$            $MS_B/MS_E$
  Error        $SS_E$    (r - 1)(c - 1)    $SS_E/[(r - 1)(c - 1)]$
  Total        $SS$      rc - 1

  Figure 3. ANOVA for Two-Way Classification

The equations given in Section 4.1 for $SS_T$ and $SS$ carry over (after adjusting the ranges of the summations). The total now has one more component, $SS = SS_T + SS_B + SS_E$, so $SS_E$ is whatever is left after $SS_T$ and $SS_B$ are subtracted from $SS$.
The equation for the new term is:

$$SS_B = r \sum_{j=1}^{c} (\bar{x}_j - \bar{x})^2$$

Here are the formal hypothesis tests for the two F statistics:

$H_0: \tau_1 = \tau_2 = \cdots = \tau_r = 0$ (i.e., the treatments have no effect on the response)
$H_a$: some $\tau_i \ne 0$
Test Statistic: $F = MS_T/MS_E$
Rejection Region: $F > F_{\alpha,(r-1,\,(r-1)(c-1))}$

$H_0: \beta_1 = \beta_2 = \cdots = \beta_c = 0$ (i.e., the blocks have no effect on the response)
$H_a$: some $\beta_j \ne 0$
Test Statistic: $F = MS_B/MS_E$
Rejection Region: $F > F_{\alpha,(c-1,\,(r-1)(c-1))}$

You may be wondering why someone would go through the extra trouble of doing a two-way classification. There are more equations, but by blocking the data, you can remove known (or suspected) sources of variation. That means you can get the same sensitivity ($MS_E$) as a one-way classification using less data. If collecting data consumes a lot of resources, this is a good thing. The relative efficiency R of a two-way classification tells how many times as many observations you would need to obtain the same sensitivity with a one-way versus a two-way classification. The value can be found by:

$$R = \frac{\sigma^2_{\text{one-way}}}{\sigma^2_{\text{two-way}}} = \frac{(c - 1) MS_B + c(r - 1) MS_E}{(rc - 1) MS_E}$$

Example 7. Suppose you are busy reviewing bomb run data from the F-3. You notice that there are three different methods used to deploy the bomb in question. While organizing the data, you also notice that there are four different pilots during the test flights. The data for the bomb miss distances in meters is listed here:

                        Pilot
  Method      1      2      3      4      Totals     Means
    1        ...    ...    ...    ...      ...        ...
    2        ...    ...    ...    ...      ...        ...
    3        ...    ...    ...    ...      ...        ...
  Totals     ...    ...    ...    ...      ...
  Means      ...    ...    ...    ...

You decide to perform a two-way classification ANOVA to determine if either the bombing method or the pilots have a significant impact on the bomb miss distances. As shown in the table above, the bombing method is the treatment and the pilots are the blocks.

  Source     SS     df     MS     F     $F_{.05}$     p-value
  Method     ...    ...    ...    ...      ...          ...
  Pilot      ...    ...    ...    ...      ...          ...
  Error      ...    ...    ...
  Total      ...    ...

According to the computer output, the bombing method itself does not appear to have a significant impact on the bomb miss distance (p-value > $\alpha$ = 0.05). On the other hand, the pilots have enough variation between them that it does matter which pilot is flying the mission as to what the bomb miss distance is. The critical F statistics in the table are computed with (2, 6) df and (3, 6) df for the bombing methods and pilots, respectively (in case you really feel the urge to verify the table by hand; Table 9 of Appendix A).

Without a two-way classification, the impact of the pilots would have been mixed in with the bombing methods. In other words, it may have appeared that the methods themselves were significantly different, when in fact, they aren't.

4.4 Two-Way with Replications

An easy way to impress your friends and complicate the notation is to add replications to a two-way classification. Having replications is actually a good thing because you gain more information about the population. A two-way classification with m observations per cell can have the same information described above in addition to a term for interactions (a cell is a treatment-block pair). In the bomb miss distance example just discussed, the interaction term can tell whether there is a significant relationship between the pilots and the bombing methods. For example, two of the pilots might be best with one method while another pilot is best with method 3. In such a situation, there would be some interaction between the pilots and the bombing methods. If all pilots had similar standings among the methods, the interaction would not be significant.

The notation is complicated by adding a third subscript to denote the replication. Therefore, $x_{ijk}$ is the kth observation of treatment i and block j. The equations given thus far can be modified to use the new subscript by just adding over all k. Also, the equation for SS now includes the $SS_I$ (Sum of Squares for Interaction) term as part of the sum. The changes begin with the two basic assumptions:

1. All $x_{ijk} \sim N(\mu_{ij}, \sigma^2)$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, c$; $k = 1, 2, \ldots, m$ (Note it is $\mu_{ij}$ because the observation k does not affect the mean.)
2. Model: $x_{ijk} = \mu + \tau_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$ where $x_{ijk} = \mu_{ij} + \varepsilon_{ijk}$, $\varepsilon_{ijk} \sim N(0, \sigma^2)$, $\tau_i = \mu_i - \mu$, $\sum_{i=1}^{r} \tau_i = 0$, $\beta_j = \mu_j - \mu$, $\sum_{j=1}^{c} \beta_j = 0$, $\sum_{i=1}^{r} \gamma_{ij} = 0$ for $j = 1, 2, \ldots, c$, and $\sum_{j=1}^{c} \gamma_{ij} = 0$ for $i = 1, 2, \ldots, r$

Again, the interpretations of the assumptions are similar to before (but much more complicated). The important parts of the assumptions will be discussed in Section 4.6 so you don't have to worry if you can't recite these in your sleep. The revised ANOVA looks something like the following:

17 Source SS df MS F Treatment SS T r - SS T /(r - ) MS T /MS E Block SS B c - SS B /(c - ) MS B /MS E Interacton SS I (r-)(c-) SS I /(r-)(c-) MS I /MS E Error SS E rc(m-) SS E /rc(m-) Total SS rcm - The equaton for the new term s: Fgure 4. ANOVA for Two-Way wth Replcatons r c ( j j ) SS I = m x x x + x Here are the formal hypothess tests for the three F statstcs: = j= H o : τ = τ =... = τ r = 0 (.e., the treatments have no affect on the response) H a : some τ 0 Test Statstc: F = MS T /MS E Rejecton Regon: F > F α, (r-,rc(m-)) H o : β = β =... = β c = 0 (.e., the blocks have no affect on the response) H a : some β j 0 Test Statstc: F = MS B /MS E Rejecton Regon: F > F α, (c-, rc(m-)) H o : γ j = 0 =,,..., r, j =,,..., c (.e., there s no nteracton) H a : some γ j 0 Test Statstc: F = MS I /MS E Rejecton Regon: F > F α, ((r-)(c-), rc(m-)) If you are attemptng to do a two-way classfcaton wth replcatons, you had better have a computer or you ll spend all your tme performng calculatons and you ll never get to the analyss part. Hopefully by now you understand how to nterpret the output from an ANOVA so there s really no need for another example on two-way classfcatons (t also saves tme, paper, and nk). 4.5 Latn Square If you understand one and two-way classfcatons, t s tme to step nto somethng a lttle further out there (there beng defned as anywhere you don t want to be). A latn square s smlar to a one-way classfcaton n that there s one factor wth a possble treatments. In addton to that, there are two other factors wth a treatments each (that s a total of three factors for the mathematcally challenged). The extra factors set up the data n such a way that t s possble to reduce the MS E wthout ncreasng the number of observatons. It gets even better these Stat Prmer -8

factors don't necessarily have to be under your control, as will be demonstrated shortly. Figure 5 shows a typical Latin square design with a = 4.

[Figure 5. Latin Square with a = 4; axes: Factor A, Factor B]

To read the table, select a cell. The number in the cell indicates the level at which to set the main factor. The row and column indicate the levels for the additional two factors. The basic things to remember in setting up a Latin square are that there will be $a^2$ observations and that each treatment occurs only once in each row and once in each column. As you may suspect, collecting so little data ($a^2$ observations) when there are so many possible combinations of factors ($a^3$) means that gaining additional information like interactions is not possible. On the other hand, a Latin square by definition will give you an orthogonal design, which is highly desirable and will be discussed in more detail in Level 3. Some textbooks may show designs for several values of a, but there is no unique way to set up a Latin square.

A classic example for Latin squares is a field on a farm. In order to test several different techniques, the farmer must spread them evenly over the field to reduce the variation caused by the conditions in the field. (Parts of the field may receive more or less water, or more or less sunlight, or have different nutrients in the soil, etc.) In this example, the main factor could be the type or amount of fertilizer or the type of seeds used. The secondary factors would be the grid coordinates of the physical location in the field. Notice that the farmer does not directly control the secondary factors, but uses them to his advantage anyway. Hopefully you understand the basic setup and purpose of a Latin square, because it is time to hit the math again.
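The setup rule described above (each treatment once per row and once per column) can be sketched in Python. This is only an illustration: the cyclic construction and the helper names `cyclic_latin_square` and `is_latin_square` are assumptions made here, not something the primer prescribes, and the cyclic layout is just one of many valid Latin squares.

```python
def cyclic_latin_square(a):
    """Build one possible a x a Latin square by shifting each row by one.

    Cell (i, j) holds the level (1..a) of the main factor; the row and
    column indices stand for the levels of the two secondary factors.
    """
    return [[(i + j) % a + 1 for j in range(a)] for i in range(a)]


def is_latin_square(square):
    """Check that every level occurs exactly once in each row and column."""
    a = len(square)
    levels = set(range(1, a + 1))
    rows_ok = all(len(row) == a and set(row) == levels for row in square)
    cols_ok = all({square[i][j] for i in range(a)} == levels for j in range(a))
    return rows_ok and cols_ok


square = cyclic_latin_square(4)
for row in square:
    print(row)
print("valid Latin square:", is_latin_square(square))
```

In practice the rows, columns, and treatment labels of such a square are usually shuffled at random before the experiment, which keeps the once-per-row, once-per-column property intact.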
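Just like the previous types of ANOVA discussed, Latin squares have certain assumptions:

All $x_{ijk} \sim N(\mu_{ijk}, \sigma^2)$, $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, a$; $k = 1, 2, \ldots, a$ (not all are observed)

Model: $x_{ijk} = \mu + \tau_i + \beta_j + \delta_k + \varepsilon_{ijk}$

where

$$x_{ijk} = \mu_{ijk} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2),$$
$$\tau_i = \mu_{i\cdot\cdot} - \mu, \qquad \sum_{i=1}^{a} \tau_i = 0,$$
$$\beta_j = \mu_{\cdot j\cdot} - \mu, \qquad \sum_{j=1}^{a} \beta_j = 0,$$
$$\delta_k = \mu_{\cdot\cdot k} - \mu, \qquad \sum_{k=1}^{a} \delta_k = 0.$$

The revised ANOVA looks something like the following: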

Source        SS      df            MS                      F
Rows          SS_R    a - 1         SS_R/(a - 1)            MS_R/MS_E
Columns       SS_C    a - 1         SS_C/(a - 1)            MS_C/MS_E
Treatment     SS_T    a - 1         SS_T/(a - 1)            MS_T/MS_E
Error         SS_E    (a-1)(a-2)    SS_E/[(a-1)(a-2)]
Total         SS      a^2 - 1

Figure 6. ANOVA for Latin Square

Here are the formal hypothesis tests for the three F statistics:

H_0: $\tau_1 = \tau_2 = \cdots = \tau_a = 0$ (i.e., the row treatments have no effect on the response)
H_a: some $\tau_i \neq 0$
Test Statistic: $F = MS_R / MS_E$
Rejection Region: $F > F_{\alpha,\,(a-1,\; (a-1)(a-2))}$

H_0: $\beta_1 = \beta_2 = \cdots = \beta_a = 0$ (i.e., the column treatments have no effect on the response)
H_a: some $\beta_j \neq 0$
Test Statistic: $F = MS_C / MS_E$
Rejection Region: $F > F_{\alpha,\,(a-1,\; (a-1)(a-2))}$

H_0: $\delta_1 = \delta_2 = \cdots = \delta_a = 0$ (i.e., the main treatment has no effect on the response)
H_a: some $\delta_k \neq 0$
Test Statistic: $F = MS_T / MS_E$
Rejection Region: $F > F_{\alpha,\,(a-1,\; (a-1)(a-2))}$

Even though no one in their right mind would try to do stuff like this by hand, there is the occasional masochist, so here are the necessary equations:

$$SS_R = a \sum_{i=1}^{a} \left( \bar{x}_{i\cdot\cdot} - \bar{x} \right)^2$$
$$SS_C = a \sum_{j=1}^{a} \left( \bar{x}_{\cdot j\cdot} - \bar{x} \right)^2$$
$$SS_T = a \sum_{k=1}^{a} \left( \bar{x}_{\cdot\cdot k} - \bar{x} \right)^2$$
$$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{a} \left( x_{ijk} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot k} + 2\bar{x} \right)^2$$
$$SS = \sum_{i=1}^{a} \sum_{j=1}^{a} \left( x_{ijk} - \bar{x} \right)^2 = SS_R + SS_C + SS_T + SS_E$$

Don't those look like fun? Unfortunately, many software programs still do not incorporate Latin squares, so you may have to find out just how much fun it really is. If you have access to software that can perform two-way ANOVAs, though, you might be able to persuade it to

perform a Latin square if you ask it nicely. The way to do that is to duplicate and rearrange the data so the rows correspond to the main treatment. In other words, reorganize the data so all the values in the first row correspond to the first treatment, all the data points in the second row correspond to the second, etc. Now run a two-way using the original data (i.e., rows and columns are the treatment and block for the two-way). From here you can extract the values for SS, SS_R, and SS_C. Next you have to run another two-way on the rearranged data. The new term will be SS_T (if you're quick, you'll notice the other numbers are the same ones you got in the previous ANOVA). The final step is to compute SS_E = SS - SS_R - SS_C - SS_T. With all these values it is pretty easy to fill in the rest of the table. The example below shows how to do this trick in Microsoft Excel.

Oh, things can get more complicated if you wish. Just like the one-way ANOVA discussed in Section 4.1, once you determine a treatment is significant, you probably want more information. Back then there was something called pairwise comparisons. That is exactly what you will do with the data from the Latin square. See, some things in the statistics world are simple. The way statisticians hold their job security is by not telling you what the subtle differences are. For example, you use MS_E as the estimated variance for all the comparisons, which means you now use [a, (a-1)(a-2)] degrees of freedom instead of what is written in Section 4.2.

Example. You are probably thinking that all this stuff doesn't sound easy, but when you have the computing power, a Latin square is actually your friend. It was already shown that a Latin square can be used for physical areas (farm example), but it can also account for time. Continuing the F-3 example, you are approached by the program director, who is concerned about the safety of the workers in the paint shop.
They wear protective gear, but it is only designed to protect up to certain tolerances and, regardless of the suits, it is always best to keep levels of dangerous chemicals as low as possible (so as not to offend the environmentalists). There are five distinct processes that involve the use of a certain hazardous chemical. You need to do some preliminary analysis before tackling such a large problem, so you look over historical data of recorded amounts of the chemical in the air (in parts per million, ppm) and come up with a Latin square design with a = 5:

Day
Week   M   T   W   Th   F   Mean
8 (D)   7 (C)   4 (A)   (B)   7 (E)   x =
(C)   34 (B)   (E)   6 (A)   5 (D)   x =
(A)   9 (D)   3 (B)   7 (E)   3 (C)   x 3 =
(E)   3 (A)   4 (C)   3 (D)   5 (B)   x 4 =.0
5 (B)   6 (E)   6 (D)   3 (C)   7 (A)   x 5 =.
Mean   x = 5.   x = 3.8   x 3 = 3.4   x 4 = 5.   x 5 = 5.4   x = 0.6

The designations A through E label the different processes. The various sample means are included for those sick people who like to try things by hand. In addition to those, you will need to go through the table and get sample means for the process treatments. Those are:

x =.4   x 4 = 3.8
x = 6.6   x 5 =.6
x 3 = 9.6

Now you can go through all those equations, or you can skillfully demonstrate your prowess on the computer and come up with the following results (see Appendix B for the Excel calculations):

Source      SS   df   MS   F   F_.05,(4,12)   P-value
Rows
Columns
Treatment
Error
Total

The information above indicates that both the processes (treatments) and the days of the week (columns) cause significant variation in the data. If a simple one-way classification had been done instead, the fact that days of the week are significant would not have been noticed. There are several explanations why the days may be important. A simple one would be that the workers aren't as productive on Mondays and Fridays. Another is that the chemical may build up during the week. These situations would have to be investigated further. Knowing that there is also variation caused by the processes, you can perform more analyses to determine which processes are releasing the largest amounts of the hazardous chemical. The Tukey comparisons discussed in Section 4.2 can provide that information. Remember that the degrees of freedom will be different (5 and 12 in this case).

4.6 Failure of Assumptions

After the mentions in the preceding sections, the many assumptions made for each ANOVA should be dreadfully obvious. Luckily, those assumptions are valid in most cases. If they do not hold, however, the tests may not be accurate. This section will review the assumptions of normality and constant variance for the error terms (ε). Of course, working with the actual error terms is impossible because you need to know the actual population parameters to compute ε. As you may already suspect, we have to estimate the error terms. Those estimates are called residuals (e) and are defined to be the difference between the observed values and the fitted values. An observed value is the data that is collected, while fitted values are those computed by the model developed from the data.

The normality assumption is used to derive the F-tests for an ANOVA.
The easiest way to verify the assumption is to compute the residuals and then calculate standardized residuals ($e/\sqrt{MS_E}$), which should be distributed as a standard normal. From here, there are three techniques. The first is to check for standardized residuals greater than 3 in absolute value (recall that over 99 percent of them should fall within the interval from -3 to 3). The second test involves forming a histogram and examining the shape. The width of each bin is extremely important because it has a serious impact on the shape of the histogram (some software programs can determine widths

for you). The most difficult and most accurate test is to construct a Q-Q plot as described in Section 5.3 of Level 1. Luckily, this option requires just as many keystrokes on a computer as the other ones.

If the checks for normality indicate that the residuals are not normal, there are two options. The first is to ignore the problem. Before you get too excited, this option is only available if it is only a moderate departure from normality. The reason for ignoring the problem is that the test statistics will only differ slightly from what they would be if the assumption were valid. If there is a serious departure from normality, the second option requires the nonparametric F-test. The first step in performing this test is to rank all the observations in increasing order, with ties getting the average rank (see Section 2). Then repeat the ANOVA using the ranked data.

If the normality assumption turns out to be valid based on the checks discussed above, you're not out of hot water yet because you still have to check for constant variance. That assumption is used to prove that the MS_E is an unbiased estimate of the population variance for the error terms. One way to test the assumption is to perform Bartlett's Test, which extends a simple comparison of two variances (see Section C.4 in Level 1) to a variances:

H_0: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_a^2$
H_a: some $\sigma_i^2 \neq \sigma_j^2$
Test Statistic: $X^2 = M/C$
Rejection Region: $X^2 > \chi^2_{\alpha,\,(a-1)}$

$$M = \left[ \sum_{i=1}^{a} (n_i - 1) \right] \ln(MS_E) - \sum_{i=1}^{a} (n_i - 1) \ln(s_i^2)$$
$$C = 1 + \frac{1}{3(a-1)} \left[ \sum_{i=1}^{a} \frac{1}{n_i - 1} - \frac{1}{\sum_{i=1}^{a} (n_i - 1)} \right]$$

Unfortunately, Bartlett's Test only tells whether the constant variance assumption is valid or not. It does not suggest any ways to remedy the situation. Bartlett's Test is also unthinkable without a computer, so a graphical method is sometimes employed to verify the assumption. The graphical technique basically tries to find a pattern in the plot of the residuals versus the fitted values. Figure 7 summarizes the most common departures and corrections for the constant variance assumption. There are many other departures as well as many other tests, including some that get nasty enough to use derivatives.
The material in this section should be enough for most cases. If you're unfortunate enough to encounter a case where this is not enough, consult a statistics textbook and request divine intervention.
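Since Bartlett's Test is not something to attempt by hand, here is a minimal Python sketch of the M and C formulas from this section. It assumes a one-way layout in which MS_E is the pooled variance of the a treatment samples; the function names and the sample data are hypothetical and only serve to exercise the formulas.

```python
import math


def sample_variance(values):
    """Unbiased sample variance s^2 of one treatment's observations."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def bartlett_statistic(groups):
    """Return X^2 = M / C for a list of samples (one list per treatment)."""
    a = len(groups)
    dfs = [len(g) - 1 for g in groups]               # n_i - 1
    variances = [sample_variance(g) for g in groups]  # s_i^2
    total_df = sum(dfs)
    # Assumption: MS_E is the pooled variance from a one-way ANOVA.
    ms_e = sum(df * s2 for df, s2 in zip(dfs, variances)) / total_df
    m = total_df * math.log(ms_e) - sum(
        df * math.log(s2) for df, s2 in zip(dfs, variances)
    )
    c = 1 + (sum(1 / df for df in dfs) - 1 / total_df) / (3 * (a - 1))
    return m / c


# HYPOTHETICAL data: three treatments with visibly different spreads.
groups = [
    [10, 12, 11, 13, 9],
    [20, 24, 18, 22, 26],
    [5.0, 5.5, 4.5, 5.2, 4.8],
]
x2 = bartlett_statistic(groups)
print("X^2 =", round(x2, 3))
```

The result is compared with the chi-square critical value on a - 1 degrees of freedom (5.991 for a = 3 at α = 0.05); a larger X² rejects the hypothesis of constant variance.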

Type of Residuals                     Correction
e ~ N(0, σ^2)                         None needed (it is the assumption)
σ increases with fitted values        ln transform data; redo ANOVA
σ decreases with fitted values        ln transform data; redo ANOVA
Poisson                               square-root transform data; redo ANOVA
Binomial                              sin^-1 transform; redo ANOVA
                                      (there is no set plot for binomial residuals)

[Plot column: residuals e versus fitted values for each type]

Figure 7. Graphical Method for Non-Constant Variance

5. Categorical Data

The ANOVA just discussed is a very powerful technique, but there is a large class of data for which it is invalid. Because of the normality assumption, ANOVA technically cannot be performed on discrete data, like counts for surveys. The most common type of categorical data comes from a multinomial distribution. That is a fancy name for a generic finite discrete probability distribution with k possible outcomes (e.g., binomial has k = 2). The population parameters of interest are $p_1, p_2, \ldots, p_k$, where $p_i$ is the probability of the ith outcome. As you

would expect, $p_1 + p_2 + \cdots + p_k = 1$ (see Level 1, Section 4). A multinomial variable may also be called a qualitative variable because the only information it provides is which of the k bins it belongs to. The bin itself can provide further information (e.g., Pepsi drinker, Coke drinker, etc.).

5.1 One-Way Tables

If there is only one qualitative variable for an experiment, the data is arranged in a one-way table as shown in Figure 8.

Category      1             2             ...   k             Total
Count         n_1           n_2           ...   n_k           n
Proportion    \hat{p}_1     \hat{p}_2     ...   \hat{p}_k     1

Figure 8. One-Way Table of Category Counts

The values $n_1, n_2, \ldots, n_k$ represent the category counts and $n = n_1 + n_2 + \cdots + n_k$ is the total number of observations. It is simple to estimate the population probabilities discussed above because a multinomial experiment can always be reduced to a binomial experiment by isolating one category. For example, the estimate for the ith category is

$$\hat{p}_i = \frac{n_i}{n}$$

Similar to a binomial distribution, when n is large, $\hat{p}_i$ will be approximately normally distributed with

$$E(\hat{p}_i) = p_i \qquad \text{and} \qquad V(\hat{p}_i) = \frac{p_i (1 - p_i)}{n}$$

You may recall from Level 1 that you can do simple confidence intervals (or hypothesis tests) for individual population proportions as well as for differences between any pair of proportions. As a reminder, if n is large, the $(1 - \alpha)100\%$ confidence interval for $p_i$ is

$$\hat{p}_i \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_i (1 - \hat{p}_i)}{n}}$$

which is exactly the same as the CI for a binomial proportion given in Section C.3 of Level 1. A difference of proportions is a little more difficult because the estimates are no longer independent. It can be shown that $\mathrm{Cov}(n_i, n_j) = -n p_i p_j$ and $\mathrm{Cov}(\hat{p}_i, \hat{p}_j) = -p_i p_j / n$. These come in handy in calculating the variance of the difference between $\hat{p}_i$ and $\hat{p}_j$:

$$V(\hat{p}_i - \hat{p}_j) = V(\hat{p}_i) + V(\hat{p}_j) - 2\,\mathrm{Cov}(\hat{p}_i, \hat{p}_j)$$
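As a worked step, the variance of the difference can be written out by substituting the variances and covariance quoted above into the general identity $V(X - Y) = V(X) + V(Y) - 2\,\mathrm{Cov}(X, Y)$. This is only a sketch of the algebra; the sign of the last term flips because the covariance is negative.

$$
\begin{aligned}
V(\hat{p}_i - \hat{p}_j)
  &= \frac{p_i(1 - p_i)}{n} + \frac{p_j(1 - p_j)}{n} - 2\left(-\frac{p_i p_j}{n}\right) \\
  &= \frac{p_i(1 - p_i) + p_j(1 - p_j) + 2 p_i p_j}{n}
\end{aligned}
$$

Replacing each unknown $p$ with its estimate $\hat{p}$ gives the estimated variance whose square root appears in the confidence interval for $p_i - p_j$.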

Putting in the values and applying your knowledge from Level 1 (don't panic), you can derive the CI for $(p_i - p_j)$:

$$(\hat{p}_i - \hat{p}_j) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_i (1 - \hat{p}_i) + \hat{p}_j (1 - \hat{p}_j) + 2 \hat{p}_i \hat{p}_j}{n}}$$

The previous two confidence intervals are useful, but it can be pretty tedious to calculate one for all k population proportions and all $\binom{k}{2}$ pairs of proportions. That is where some nasty math comes into play with a weighted sum of squared deviations between observed and expected cell counts (a good topic for conversation at parties). It sounds impressive, but it is really not that difficult (especially if a computer is doing all the number crunching). Here is a summary of the hypothesis test for all population proportions:

H_0: $p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$ ($p_{i,0}$ is the hypothesized value for category i)
H_a: some $p_i \neq p_{i,0}$
Test Statistic: $X^2 = \sum_{i=1}^{k} \dfrac{[n_i - E(n_i)]^2}{E(n_i)}$
Rejection Region: $X^2 > \chi^2_{\alpha,\,(k-1)}$, where $E(n_i) = n p_{i,0}$

The only assumption this test makes is that $E(n_i) \geq 5$ for all i. That is not asking too much, is it? Again, it may look complicated (all things with cool formulas do), but it is pretty simple. The following example will prove it.

Example. Continuing with the F-3 example, you are looking at the Viper's susceptibility to detection. The plane was designed to have a typical four-spike signature as shown here:

[Figure: four-spike signature with numbered bearing markers]

That is, the strongest reflected signal is at 45 degrees from the F-3's heading (marker 2). For simplicity you only consider eight bearings for the tracking radar sites: 1 is the F-3's heading and each bearing is 45 degrees from the next. The contractor claimed that 90 percent of all detections would come from the even bearings (45 degrees off). You want to verify that claim because it will make mission planning easier (you'll know how to fly the missions to avoid detections). Assuming a uniform spread of detections between the four large spikes (and four smaller spikes), the expected proportions of detections are:

Bearing      1       2       3       4       5       6       7       8
P(Detect)    0.025   0.225   0.025   0.225   0.025   0.225   0.025   0.225
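The mechanics of the test summarized above can be sketched in a few lines of Python. The hypothesized proportions follow from the contractor's claim (90 percent spread evenly over the four even bearings, 10 percent over the four odd ones); the detection counts are HYPOTHETICAL stand-ins chosen for illustration and are not the flight-range data, and the function name `chi_square_statistic` is an assumption of this sketch.

```python
# Hypothesized proportions p_{i,0} for bearings 1..8 from the contractor's claim.
p0 = [0.025 if bearing % 2 == 1 else 0.225 for bearing in range(1, 9)]

# HYPOTHETICAL detection counts per bearing (illustration only).
counts = [18, 85, 14, 92, 20, 80, 16, 75]


def chi_square_statistic(observed, proportions):
    """Return X^2 = sum of [n_i - E(n_i)]^2 / E(n_i), with E(n_i) = n * p_{i,0}."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    if min(expected) < 5:
        raise ValueError("the test assumes E(n_i) >= 5 for every category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


x2 = chi_square_statistic(counts, p0)
critical = 14.067  # chi-square critical value for alpha = 0.05 and 7 df
print("X^2 =", round(x2, 3), "reject H_0:", x2 > critical)
```

The check on the expected counts mirrors the one assumption of the test; with too few observations the function refuses to compute a statistic rather than return a misleading one.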

You have data from the flight range that tells you how many detections there were from each bearing:

[Table: # Detects by Bearing, with Total]

You set up the hypothesis test as discussed in this section:

H_0: $p_1 = 0.025,\ p_2 = 0.225,\ \ldots,\ p_8 = 0.225$
H_a: some $p_i \neq p_{i,0}$
Rejection Region: $X^2 > \chi^2_{.05,\ 7\,\mathrm{df}} = 14.067$ (Table 7, Appendix A)

$$X^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$$

Since the computed $X^2$ exceeds 14.067, you reject the null hypothesis. Having access to those high-tech computers, you'd probably just look at the p-value (i.e., $P(\chi^2_7 \geq X^2)$), which in this case is below 0.05. Granted, this is only a test based on 98 detections at the range under operational conditions, so it does not necessarily prove the contractor failed to meet the specifications (if it didn't meet them you wouldn't have the plane in OT&E).

5.2 Two-Way Tables

Occasionally, you may have more than one type of category. A classic example is an election poll where the data is collected based on the political party of the individuals and the candidate they plan to vote for. Another involves breaking out survey results based on demographic data. The generic layout of such a two-way or contingency table is shown in Figure 9.

                          Column
Row              1       2       ...   c       Row Totals
1                n_11    n_12    ...   n_1c    R_1
2                n_21    n_22    ...   n_2c    R_2
...
r                n_r1    n_r2    ...   n_rc    R_r
Column Totals    C_1     C_2     ...   C_c     n

Figure 9. Two-Way Table of Category Counts

where
$n_{ij}$ is the number of observed counts for row i and column j,
$C_j = n_{1j} + n_{2j} + \cdots + n_{rj}$,
$R_i = n_{i1} + n_{i2} + \cdots + n_{ic}$,
$n = C_1 + C_2 + \cdots + C_c = R_1 + R_2 + \cdots + R_r = \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij}$.
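The marginal totals defined for the two-way table are easy to compute mechanically. The short Python sketch below uses a HYPOTHETICAL 2 x 3 table (say, two parties by three candidates); the numbers are made up purely to show how $R_i$, $C_j$, and $n$ relate.

```python
# HYPOTHETICAL counts n_ij: rows are parties, columns are candidates.
table = [
    [30, 15, 5],
    [10, 25, 15],
]

row_totals = [sum(row) for row in table]        # R_i = n_i1 + ... + n_ic
col_totals = [sum(col) for col in zip(*table)]  # C_j = n_1j + ... + n_rj
n = sum(row_totals)                             # grand total of all counts

# The grand total is the same whether rows or columns are summed.
assert n == sum(col_totals)
print("R =", row_totals, "C =", col_totals, "n =", n)
```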
