Chapter 3. Extensions


EXTENSION 3E1: TESTS OF REPLICATION

In Chapter 3, we assumed that the only comparison of interest in the two-group case is that between a cell mean model and a grand mean model. That is, we have compared the full model of

$$Y_{ij} = \mu_j + \varepsilon_{ij} \tag{3.7, repeated}$$

with the model obtained when we impose the restriction that $\mu_1 = \mu_2 = \mu$. However, this is certainly not the only restriction on the means that would be possible. Occasionally, you can make a more specific statement of the results you expect to obtain. This is most often true when your study is replicating previous research that provided detailed information about the phenomena under investigation. As long as you can express your expectation as a restriction on the values of a linear combination of the parameters of the full model, the same general form of our F test allows you to carry out a comparison of the resulting models.

For example, you may wish to impose a restriction similar to that used in the one-group case in which you specify the exact value of one or both of the population means present in the full model. To extend the numerical example involving the hyperactive-children data presented in Table 3.1 of the text, we might hypothesize that a population of hyperactive children and a population of nonhyperactive children would both have a mean IQ of 98; that is,

$$\mu_1 = \mu_2 = 98. \tag{3E1.1}$$

In this case, our restricted model would simply be

$$Y_{ij} = 98 + \varepsilon_{ij}. \tag{3E1.2}$$

Thus no parameters need to be estimated, and hence the degrees of freedom associated with the model would be $n_1 + n_2$.

As a second example, you may wish to specify numerical values for the population means in your restriction but allow them to differ between the two groups. This also would arise in situations in which you are replicating previous research. Perhaps you carried out an extensive study of hyperactive children in one school year and found the mean IQ of all identified hyperactive children was 106, whereas that of the remaining children was 98. If two years later you wondered whether the values had remained the same and wanted to make a judgment on the basis of a sample of the cases, you could specify these exact values as your null hypothesis or restriction. That is, your restricted model would be

$$Y_{i1} = 106 + \varepsilon_{i1}, \qquad Y_{i2} = 98 + \varepsilon_{i2}. \tag{3E1.3}$$

Once again, no parameters must be estimated, and so $df_R = n_1 + n_2$. As with any model, the sum of squared deviations from the specified parameter values could be used as a measure of the adequacy of this model and be compared with that associated with the full model.

In general, if we let $c_j$ stand for the constant specified in such a restriction, we could write our restricted model as

$$Y_{ij} = c_j + \varepsilon_{ij},$$

or equivalently,

$$Y_{i1} = c_1 + \varepsilon_{i1}, \qquad Y_{i2} = c_2 + \varepsilon_{i2}.$$

The error term used as a measure of the adequacy of such a model would then be

$$E_R = \sum_j \sum_i e_{ijR}^2 = \sum_j \sum_i (Y_{ij} - c_j)^2. \tag{3E1.4}$$

As a third example, you may wish to specify only that the difference between groups is equal to some specified value. Thus if the hyperactive-group mean had been estimated at 106 and the normal-group mean at 98, you might test the hypothesis with a new sample that the hyperactive mean would be 8 points higher than the normal mean. This would allow for the operation of factors such as changing demographic characteristics of the population being sampled, which might cause the IQ scores to generally increase or decrease. The null hypothesis could still be stated easily as $\mu_1 - \mu_2 = 8$. It is a bit awkward to state the restricted model in this case, but thinking through the formulation of the model illustrates again the flexibility of the model-comparison approach. In this case, we do not wish to place any constraints on the grand mean, yet we wish to specify the magnitude of the between-group difference at 8 points.
We can accomplish this by specifying that the hyperactive-group mean will be 4 points above the grand mean and that the normal-group mean will be 4 points below the grand mean; that is,

$$Y_{i1} = \mu + 4 + \varepsilon_{i1}, \qquad Y_{i2} = \mu - 4 + \varepsilon_{i2}. \tag{3E1.5}$$

Arriving at a least-squares estimate of $\mu$ in this context is a slightly different problem than encountered previously. However, we can solve the problem by translating it into a form we considered in the one-group case. By subtracting 4 from both sides of the equation for the $Y_{i1}$ scores and adding 4 to both sides of the equation for the $Y_{i2}$ scores in Equation 3E1.5, we obtain

$$Y_{i1} - 4 = \mu + \varepsilon_{i1}, \qquad Y_{i2} + 4 = \mu + \varepsilon_{i2}. \tag{3E1.6}$$

This is now essentially the same estimation problem that we used to introduce the least-squares criterion in the one-sample case. There we showed that the least-squares estimate of $\mu$ is the mean of all scores on the left side of the equations, which here would imply taking the mean of a set of transformed scores, with the scores from Group 1 being 4 less than those observed and the scores in Group 2 being 4 greater than those observed. In the equal-n case, these transformations cancel each other, and the estimate of $\mu$ would be the same as in a conventional restricted model. In the unequal-n case, the procedure described would generally result in a somewhat different estimate of the grand mean, with the effect that the predictions for the larger group are closer to the mean for that group than is the case for the smaller group. In any event, the errors of prediction are generally different for this restricted model than for a conventional model. In this case, we have

$$E_R = \sum_i (Y_{i1} - 4 - \hat{\mu})^2 + \sum_i (Y_{i2} + 4 - \hat{\mu})^2, \tag{3E1.7}$$

where $\hat{\mu}$ is the mean of the transformed scores, as described previously.

This test, like the others considered in this chapter, assumes that the population variances of the different groups are equal. We discuss this assumption in more detail in the section of Chapter 3 entitled "Statistical Assumptions" and present procedures there for testing the assumption. In the case in which it is concluded that the variances are heterogeneous, refer to Wilcox (1985) for an alternative procedure for determining whether two group means differ by more than a specified constant. Additional techniques for imposing constraints on combinations of parameter values are considered in the following chapters.
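The restricted models of this extension can be sketched in a few lines of Python. This is a minimal illustration rather than the text's own procedure: the function names and the IQ scores are invented here, and the F ratio uses the general model-comparison form, $F = [(E_R - E_F)/(df_R - df_F)] / (E_F/df_F)$, with the full model estimating one mean per group.

```python
from statistics import mean


def sum_sq(scores, center):
    """Sum of squared deviations of the scores from a given center."""
    return sum((y - center) ** 2 for y in scores)


def model_comparison_f(e_r, df_r, e_f, df_f):
    """General model-comparison F ratio."""
    return ((e_r - e_f) / (df_r - df_f)) / (e_f / df_f)


def f_fixed_means(group1, group2, c1, c2):
    """Test the restriction mu_1 = c1 and mu_2 = c2 (Equations 3E1.3, 3E1.4)."""
    n_total = len(group1) + len(group2)
    e_f = sum_sq(group1, mean(group1)) + sum_sq(group2, mean(group2))
    e_r = sum_sq(group1, c1) + sum_sq(group2, c2)
    # No parameters are estimated under the restriction, so df_R = n1 + n2.
    return model_comparison_f(e_r, n_total, e_f, n_total - 2)


def f_fixed_difference(group1, group2, d):
    """Test the restriction mu_1 - mu_2 = d (Equations 3E1.5 to 3E1.7)."""
    n_total = len(group1) + len(group2)
    half = d / 2
    # mu-hat is the mean of the transformed scores (Equation 3E1.6).
    mu_hat = mean([y - half for y in group1] + [y + half for y in group2])
    e_f = sum_sq(group1, mean(group1)) + sum_sq(group2, mean(group2))
    e_r = sum_sq(group1, mu_hat + half) + sum_sq(group2, mu_hat - half)
    # Only mu is estimated under the restriction, so df_R = N - 1.
    return model_comparison_f(e_r, n_total - 1, e_f, n_total - 2)


# Hypothetical IQ scores, invented purely for illustration.
hyperactive = [104, 110, 98, 112, 106]
normal = [96, 101, 94, 99, 100]

print(f_fixed_means(hyperactive, normal, 106, 98))   # restriction of 3E1.3
print(f_fixed_difference(hyperactive, normal, 8))    # mu_1 - mu_2 = 8
```

Each observed F would be referred to an F distribution with $df_R - df_F$ and $df_F$ degrees of freedom, that is, 2 and $N - 2$ for the fixed-means restriction and 1 and $N - 2$ for the fixed-difference restriction.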
To try to prevent any misunderstanding that might be suggested by the label "test of replication," we should stress that the tests we introduce in this section follow the strategy of identifying the constraint on the parameters with the restricted model or the null hypothesis being tested. This allows one to detect if the data depart significantly from what would be expected under this null hypothesis. A significant result then would mean a failure to replicate. Note that the identification here of the theoretical expectation with the null hypothesis is different from the usual situation in psychology and instead approximates that in certain physical sciences. As mentioned in an earlier chapter, Meehl (1967) calls attention to how theory testing in psychology is usually different from theory testing in physics. In physics, one typically proceeds by making a specific point prediction and assessing whether the data depart significantly from that theoretical prediction, whereas in psychology, one typically lends support to a theoretical hypothesis by rejecting a null hypothesis of no difference. On the one hand, the typical situation in psychology is less precise than that in physics in that the theoretical prediction is often just that the groups will differ rather than specifying by how much. On the other hand, the identification of the theoretical prediction with the null hypothesis raises a different set of problems in that the presumption in hypothesis testing is in favor of the null hypothesis. Among the potential disadvantages to such an approach, which applies to the tests of replication introduced here, is that one could be more likely to confirm one's theoretical expectations by running fewer subjects or doing other things to lower power. It is possible to both have the advantages of a theoretical point prediction and give the presumption to a hypothesis that is different from such theoretical expectations, but doing so requires use of novel methods beyond what we introduce here. For a provocative discussion of a method of carrying out a test in which the null hypothesis is that data depart by a prespecified amount or more from expectations so that a rejection would mean significant support for a theoretical point prediction, see Serlin and Lapsley (1985). Alternatively, one might use confidence intervals around a mean difference to demonstrate the equivalence of two groups (see discussion in Chapter 4 of Equations 40 and 41; cf. Maxwell, Lau, & Howard, 2015).

EXTENSION 3E2: ROBUST METHODS FOR ONE-WAY BETWEEN-SUBJECT DESIGNS: BROWN-FORSYTHE, WELCH, AND KRUSKAL-WALLIS TESTS

In Chapter 3, we state that ANOVA is predicated on three assumptions: normality, homogeneity of variance, and independence of observations. When these conditions are met, ANOVA is a uniformly most powerful procedure. In essence, this means that the F test is the best possible test when one is interested uniformly (i.e., equally) in all possible alternatives to the null hypothesis. Thus, in the absence of planned comparisons, ANOVA is the optimal technique to use for hypothesis testing whenever its assumptions hold. In practice, the three assumptions are often met at least closely enough so that the use of ANOVA is still optimal.

Recall from our discussion of statistical assumptions in Chapter 3 that ANOVA is generally robust to violations of normality and homogeneity of variance, although robustness to the latter occurs only with equal n (more on this later). Robustness means that the actual rate of Type I errors committed is close to the nominal rate (typically .05) even when the assumptions fail to hold. In addition, ANOVA procedures generally appear to be robust with respect to Type II errors as well, although less research has been conducted on the Type II error rate.

The general robustness of ANOVA was taken for granted by most behavioral researchers during the 1970s, based on findings documented in the excellent literature review by Glass, Peckham, and Sanders (1972).
Because both Type I and Type II error rates were only very slightly affected by violations of normality or homogeneity (with equal n), there seemed to be little need to consider alternative methods of hypothesis testing.

However, the 1980s saw renewed interest in possible alternatives to ANOVA. Although part of the impetus behind this movement stemmed from further investigation of robustness with regard to the Type I error rate, the major focus was on the Type II error rate; that is, on issues of power. As Blair (1981) points out, robustness implies that the power of ANOVA is relatively unaffected by violations of assumptions. However, the user of statistics is interested not in whether ANOVA power is unaffected, but in whether ANOVA is the most powerful test available for a particular problem. Even when ANOVA is robust, it may not provide the most powerful test available when its assumptions have been violated.

Statisticians are developing possible alternatives to ANOVA. Our purpose in this extension is to provide a brief introduction to a few of these possible alternatives. We warn you that our coverage is far from exhaustive; we simply could not cover in a brief review the wide range of possibilities already developed. Instead, our purpose is to make you aware that the field of statistics is dynamic and ever changing, just like other scientific fields of inquiry. Techniques (or theories) that are favored today may be in disfavor tomorrow, replaced by superior alternatives.

Another reason we make no attempt to be exhaustive here is that further research yet needs to be done to compare the techniques we describe with the usual ANOVA methods. At this time, it is unclear which, if any, of these methods will be judged most useful. Although we provide evaluative comments where possible, we forewarn you that this area is full of complexity and controversy. The assumption that distributions are normal and variances are homogeneous simplifies the world enormously. A moment's reflection should convince you that "nonnormal" and "heterogeneous" lack the precision of "normal" and "homogeneous." Data can be nonnormal in an infinite number of ways, rapidly making it very difficult for statisticians to find an optimal technique for analyzing "nonnormal" data. What is good for one form of nonnormality may be bad for another form. Also, what kinds of distributions occur in real data? A theoretical statistician may be interested in comparing data-analysis techniques for data from a specific nonnormal distribution, but if that particular distribution never underlies behavioral data, the comparison may have no practical import to behavioral researchers. How far do actual data depart from normality and homogeneity? There is no simple answer, which partially explains why comparing alternatives to ANOVA is complicated and controversial.

The presentation of methods in this extension is not regularly paralleled by similar extensions on robust methods later in the book because many of the alternatives to ANOVA in the single-factor, between-subjects design have not been generalized to more complex designs.

Two possible types of alternatives to the usual ANOVA in between-subjects designs have received considerable attention in recent years. The first type is a parametric modification of the F test that does not assume homogeneity of variance. The second type is a nonparametric approach that does not assume normality. Because the third ANOVA assumption is independence, you might expect there to be a third type of alternative that does not assume independence. However, as we stated earlier, independence is largely a matter of design, so modifications would likely involve changes in the design instead of changes in data analysis (see Kenny & Judd, 1986). Besides these two broad types of alternatives, several other possible approaches are being investigated. We look at two of these after we examine the parametric modifications and the nonparametric approaches.

Parametric Modifications

As stated earlier, one assumption underlying the usual ANOVA F test is homogeneity of variance.
Statisticians have known for many years that the F test can be either very conservative (too few Type I errors and hence decreased power) or very liberal (too many Type I errors) when variances are heterogeneous and sample sizes are unequal. In general, the F test is conservative when large sample sizes are paired with large variances. The F is liberal when large sample sizes are paired with small variances. Extension 3E4, the final extension for Chapter 3, shows why the nature of the pairing causes the F sometimes to be conservative and other times to be liberal. Obviously, either occurrence is problematic, especially because the population variances are unknown parameters. As a consequence, we can never know with complete certainty whether the assumption has been satisfied in the population. However, statistical tests of the assumption are available (see Chapter 3), so one strategy might be to use the standard F test to test mean differences only if the homogeneity of variance hypothesis cannot be rejected. Unfortunately, this strategy seems to offer almost no advantage (Wilcox, Charlin, & Thompson, 1986; Zimmerman, 2004). The failure of this strategy has led some statisticians (e.g., Tomarken & Serlin, 1986; Wilcox et al., 1986; Zimmerman, 2004) to recommend that the usual F test routinely be replaced by one of the more robust alternatives we present here, especially with unequal n.

Although these problems with unequal n provide the primary motivation for developing alternatives, several studies have shown that the F test is not as robust as had previously been thought when sample sizes are equal. Clinch and Keselman (1982), Rogan and Keselman (1977), Tomarken and Serlin (1986), and Wilcox et al. (1986) show that the F test can become somewhat liberal with equal n when variances are heterogeneous. When variances are very different from each other, the actual Type I error rate may reach .10 or so (with a nominal rate of .05), even with equal n. Of course, when variances are less different, the actual error rate is closer to .05. In summary, there seems to be sufficient motivation for considering alternatives to the F test when variances are heterogeneous, particularly when sample sizes are unequal.

We consider two alternatives: Brown and Forsythe (1974) developed the first test, which has a rather intuitive rationale. Welch (1951) developed the second test. Both are available in SPSS (one-way ANOVA procedure), so in our discussion, we downplay computational details.

The test statistic developed by Brown and Forsythe (1974) is based on the between-group sum of squares calculated in exactly the same manner as in the usual F test:

$$SS_B = \sum_{j=1}^{a} n_j (\bar{Y}_j - \bar{Y})^2, \tag{3E2.1}$$

where $\bar{Y} = \sum_{j=1}^{a} n_j \bar{Y}_j / N$. However, the denominator is calculated differently from the denominator of the usual F test. The Brown-Forsythe denominator is chosen to have the same expected value as the numerator if the null hypothesis is true, even if variances are heterogeneous. (The rationale for finding a denominator with the same expected value as the numerator if the null hypothesis is true is discussed in Chapter 10.) After some tedious algebra, it can be shown that the expected value of $SS_B$ under the null hypothesis is given by

$$E(SS_B) = \sum_{j=1}^{a} (1 - n_j/N)\,\sigma_j^2. \tag{3E2.2}$$

Notice that if we were willing to assume homogeneity of variance, Equation 3E2.2 would simplify to

$$E(SS_B) = \sum_{j=1}^{a} (1 - n_j/N)\,\sigma^2 = \sigma^2 \sum_{j=1}^{a} (1 - n_j/N) = (a - 1)\,\sigma^2,$$

where $\sigma^2$ denotes the common variance. With homogeneity, $E(MS_W) = \sigma^2$, so the usual F is obtained by taking the ratio of $MS_B$ (which is $SS_B$ divided by $a - 1$) and $MS_W$. Under homogeneity, $MS_B$ and $MS_W$ have the same expected value under the null hypothesis, so their ratio provides an appropriate test statistic.

When we are unwilling to assume homogeneity, it is preferable to estimate the population variance of each group (i.e., $\sigma_j^2$) separately. This is easily accomplished by using $s_j^2$ as an unbiased estimate of $\sigma_j^2$. A suitable denominator can be obtained by substituting $s_j^2$ for $\sigma_j^2$ in Equation 3E2.2, yielding

$$\sum_{j=1}^{a} (1 - n_j/N)\, s_j^2. \tag{3E2.3}$$

The expected value of this expression equals the expected value of $SS_B$ under the null hypothesis, even if homogeneity fails to hold. Thus taking the ratio of $SS_B$ and the expression in Equation 3E2.3 yields an appropriate test statistic:

$$F^* = \frac{\sum_{j=1}^{a} n_j (\bar{Y}_j - \bar{Y})^2}{\sum_{j=1}^{a} (1 - n_j/N)\, s_j^2}. \tag{3E2.4}$$

The statistic is written as F* instead of F because it does not have an exact F distribution. However, Brown and Forsythe show that the distribution of F* can be approximated by an F distribution with $a - 1$ numerator degrees of freedom and $f$ denominator degrees of freedom. Unfortunately, the denominator degrees of freedom are tedious to calculate and are best left to a computer program. Nevertheless, we present the formula for denominator degrees of freedom as follows:

$$f = \frac{1}{\sum_{j=1}^{a} g_j^2/(n_j - 1)}, \tag{3E2.5}$$

where

$$g_j = \frac{(1 - n_j/N)\, s_j^2}{\sum_{j=1}^{a} (1 - n_j/N)\, s_j^2}.$$

It is important to realize that, in general, F* differs from F in two ways. First, the denominator degrees of freedom for the two approaches are different. Second, the observed values of the test statistics are typically different as well. In particular, F* may be either systematically smaller or larger than F. If large samples are paired with small variances, F* tends to be smaller than F; however, this reflects an advantage for F*, because F tends to be liberal in this situation. Conversely, if large samples are paired with large variances, F* tends to be larger than F; once again, this reflects an advantage for F*, because F tends to be conservative in this situation.

What if sample sizes are equal? With equal n, Equation 3E2.4 can be rewritten as

$$F^* = \frac{\sum_{j=1}^{a} n (\bar{Y}_j - \bar{Y})^2}{\sum_{j=1}^{a} [1 - (1/a)]\, s_j^2} = \frac{n \sum_{j=1}^{a} (\bar{Y}_j - \bar{Y})^2}{[(a - 1)/a] \sum_{j=1}^{a} s_j^2} = \frac{n \sum_{j=1}^{a} (\bar{Y}_j - \bar{Y})^2/(a - 1)}{\sum_{j=1}^{a} s_j^2 / a} = \frac{MS_B}{MS_W} = F.$$

Thus, with equal n, the observed values of F* and F are identical. However, the denominator degrees of freedom are still different. It can be shown that with equal n, Equation 3E2.5 for the denominator degrees of freedom associated with F* becomes

$$f = \frac{(n - 1)\left(\sum_{j=1}^{a} s_j^2\right)^2}{\sum_{j=1}^{a} s_j^4}. \tag{3E2.6}$$

Although it may not be immediately apparent, $f$ is an index of how different sample variances are from each other. If all sample variances were identical to each other, $f$ would equal $a(n - 1)$, the denominator degrees of freedom for the usual F test. At the other extreme, as one variance becomes infinitely larger than all others, $f$ approaches a value of $n - 1$. In general, then, $f$ ranges from $n - 1$ to $a(n - 1)$ and attains higher values for more similar variances.

We can summarize the relationship between F* and F with equal n as follows. To the extent that the sample variances are similar, F* is similar to F; however, when sample variances are different from each other, F* is more conservative than F because the lower denominator degrees of freedom for F* imply a higher critical value for F* than for F. As a consequence, with equal n, F* rejects the null hypothesis less often than does F. If the homogeneity of variance assumption is valid, the implication is that F* is less powerful than F. However, Monte Carlo studies by Clinch and Keselman (1982) and Tomarken and Serlin (1986) suggest that the power advantage of F over F* rarely exceeds .03 with equal n. On the other hand, if the homogeneity assumption is violated, F* tends to maintain $\alpha$ at .05, whereas F becomes somewhat liberal. However, the usual F test tends to remain robust as long as the population variances are not widely different from each other. As a result, in practice, any advantage that F* might offer over F with equal n is typically slight, except when variances are extremely discrepant from each other.

However, with unequal n, F* and F may be very different from one another. If it so happens that large samples are paired with small variances, F* maintains $\alpha$ near .05 (assuming that .05 is the nominal value), whereas the actual $\alpha$ level for the F test can reach .15 or even .20 (Clinch & Keselman, 1982; Tomarken & Serlin, 1986), if population variances are substantially different from each other. Conversely, if large samples happen to be paired with large variances, F* provides a more powerful test than does the F test. The advantage for F* can be as great as .15 or .20 (Tomarken & Serlin, 1986), depending on how different the population variances are and on how the variances are related to the sample sizes. Thus F* is not necessarily more conservative than F.
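The Brown-Forsythe computations can be sketched directly from the between-group sum of squares, the variance-weighted denominator, and the formula for $f$. The following is a minimal standard-library illustration, not the SPSS implementation; the function name and the sample data are assumptions made for this example, and the p value (which requires an F distribution routine) is deliberately left out.

```python
from statistics import mean, variance


def brown_forsythe(groups):
    """Return (F*, numerator df, denominator df f) for a list of groups."""
    a = len(groups)
    n = [len(g) for g in groups]
    big_n = sum(n)
    means = [mean(g) for g in groups]
    s2 = [variance(g) for g in groups]            # unbiased sample variances
    grand = sum(nj * m for nj, m in zip(n, means)) / big_n

    # Between-group sum of squares, exactly as in the usual F test.
    ss_b = sum(nj * (m - grand) ** 2 for nj, m in zip(n, means))
    # Denominator: sum of (1 - n_j/N) * s_j^2.
    terms = [(1 - nj / big_n) * v for nj, v in zip(n, s2)]
    denom = sum(terms)
    f_star = ss_b / denom

    # Denominator degrees of freedom: f = 1 / sum(g_j^2 / (n_j - 1)).
    g = [t / denom for t in terms]
    f = 1 / sum(gj ** 2 / (nj - 1) for gj, nj in zip(g, n))
    return f_star, a - 1, f


# Hypothetical data, invented for illustration: equal n, identical variances.
equal_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(brown_forsythe(equal_groups))

# Hypothetical data with unequal n and unequal variances.
unequal_groups = [[1, 2, 3], [2, 4, 6, 8]]
print(brown_forsythe(unequal_groups))
```

With the equal-n data, the three sample variances are identical, so the sketch should reproduce the usual F and give $f = a(n - 1)$, which is a convenient check on the formulas.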
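Welch (1951) also derived an alternative to the F test that does not require the homogeneity of variance assumption. Unlike the Brown and Forsythe alternative, which was based on the between-group sum of squares of the usual F test, Welch's test uses a different weighting of the sum of squares in the numerator. Welch's statistic is defined as

$$W = \frac{\sum_{j=1}^{a} w_j (\bar{Y}_j - \tilde{Y})^2 / (a - 1)}{1 + \frac{2}{3}(a - 2)\Lambda},$$

where

$$w_j = n_j / s_j^2, \qquad \tilde{Y} = \frac{\sum_{j=1}^{a} w_j \bar{Y}_j}{\sum_{j=1}^{a} w_j}, \qquad \Lambda = \frac{3 \sum_{j=1}^{a} \left[ \left(1 - w_j \big/ \sum_{j=1}^{a} w_j \right)^2 \big/ (n_j - 1) \right]}{a^2 - 1}.$$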
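The definitions of $w_j$, $\tilde{Y}$, and $\Lambda$ translate into a short computation. The sketch below is an illustration under stated assumptions, not the SPSS routine: the function name and data are invented for this example, and the third returned value, $1/\Lambda$, is the denominator degrees of freedom used in Welch's F approximation.

```python
from statistics import mean, variance


def welch_w(groups):
    """Return (W, numerator df, denominator df = 1 / Lambda)."""
    a = len(groups)
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # w_j = n_j / s_j^2, the reciprocal of the estimated variance of a mean.
    w = [nj / variance(g) for nj, g in zip(n, groups)]
    w_sum = sum(w)
    # Variance-weighted grand mean.
    y_tilde = sum(wj * m for wj, m in zip(w, means)) / w_sum
    lam = (3 * sum((1 - wj / w_sum) ** 2 / (nj - 1) for wj, nj in zip(w, n))
           / (a ** 2 - 1))
    numerator = sum(wj * (m - y_tilde) ** 2 for wj, m in zip(w, means)) / (a - 1)
    w_stat = numerator / (1 + (2 / 3) * (a - 2) * lam)
    return w_stat, a - 1, 1 / lam


# Hypothetical data, invented for illustration.
print(welch_w([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
print(welch_w([[1, 2, 3], [2, 4, 6, 8]]))
```

With only two groups, the correction term in the denominator vanishes (because $a - 2 = 0$), and W reduces to the square of the familiar unequal-variance t statistic, which gives a simple way to check the sketch.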

When the null hypothesis is true, W is approximately distributed as an F variable with $a - 1$ numerator and $1/\Lambda$ denominator degrees of freedom. (Notice that $\Lambda$ is used to represent the value of Wilks's lambda in Chapter 14. Its meaning here is entirely different and reflects the unfortunate tradition among statisticians to use the same symbol for different expressions. In any event, the meaning here should be clear from the context.) It might alleviate some concern to remind you at this point that the SPSS program for one-way ANOVA calculates W as well as its degrees of freedom and associated p value.

The basic difference between the rationales behind F* and W involves the weight associated with a group's deviation from the grand mean; that is, $\bar{Y}_j - \bar{Y}$. As Equation 3E2.1 shows, F* weights each group according to its sample size. Larger groups receive more weight because their sample mean is likely to be a better estimate of their population mean. W, however, weights each group according to $n_j / s_j^2$, which is the reciprocal of the estimated variance of the mean. Less variable group means thus receive more weight, whether the lesser variability results from a larger sample size or a smaller variance. This difference in weighting causes W to be different from F*, even though neither assumes homogeneity of variance. As an aside, notice also that the grand mean is defined differently in Welch's approach than for either F or F*; although it is still a weighted average of the group means, the weights depend on the sample variances as well as the sample sizes.

Welch's W statistic compares to the usual F test in a generally similar manner as F* compares to F. When large samples are paired with large variances, W is less conservative than F. When large samples are paired with small variances, W is less liberal than F. Interestingly, when sample sizes are equal, W differs more from F than does F*. Whereas F and F* have the same observed value with equal n, in general, the observed value of W is different. The reason is that, as seen earlier, W gives more weight to groups with smaller sample variances.
When homogeneity holds in the population, this differential weighting is simply based on chance, because in this situation, sample variances differ from one another as a result of sampling error only. As a result, tests based on W are somewhat less powerful than tests based on F. Based on Tomarken and Serlin's (1986) findings, the difference in power is usually .03 or less and would rarely exceed .06 unless sample sizes are very small. However, when homogeneity fails to hold, W can be appreciably more powerful than the usual F test, even with equal n. The power advantage of W was often as large as .10 and even reached .34 in one condition in Tomarken and Serlin's simulations. This advantage stems from W giving more weight to the more stable sample means, which F does not do (nor does F*). It must be added, however, that W can also have less power than F with equal n. If the group that differs most from the grand mean has a large population variance, W attaches a relatively small weight to the group because of its large variance. In this particular case, W tends to be less powerful than F because the most discrepant group receives the least weight. Nevertheless, Tomarken and Serlin found that W is generally more powerful than F for most patterns of means when heterogeneity occurs with equal n.

The choice between F* and W when heterogeneity is suspected is difficult given the current state of knowledge. On the one hand, Tomarken and Serlin (1986) found that W is more powerful than F* across most configurations of population means. On the other hand, Clinch and Keselman (1982) found that W becomes somewhat liberal when underlying population distributions are skewed instead of normal. They found that F* generally maintains $\alpha$ close to a nominal value of .05 even for skewed distributions. In addition, Wilcox et al. (1986) found that W maintained an appropriate Type I error rate better than F* when sample sizes are equal, but that F* was better than W when unequal sample sizes are paired with equal variances. Choosing between F* and W is obviously far from clear-cut, given the complex nature of findings. Further research is needed to clarify their relative strengths. Although the choice between F* and W is unsettled, it is clear that both are preferable to F when population variances are heterogeneous and sample sizes are unequal.

Table 3E2.1 summarizes the properties of F, F*, and W as a function of population variances and sample sizes. Again, from a practical standpoint, the primary point of the table is that F* or W should be considered seriously as a replacement for the usual F test when sample sizes are unequal and heterogeneity of variance is suspected.

TABLE 3E2.1 PROPERTIES OF F, F*, AND W AS A FUNCTION OF SAMPLE SIZES AND POPULATION VARIANCES

| Condition | F | F* | W |
|---|---|---|---|
| Equal sample sizes, equal variances | Appropriate | Slightly conservative | Robust |
| Equal sample sizes, unequal variances | Robust, except can become liberal for very large differences in variances | Robust, except can become liberal for extremely large differences in variances | Robust |
| Unequal sample sizes, equal variances | Appropriate | Robust | Robust, except can become slightly liberal for very large differences in sample sizes |
| Unequal sample sizes, large samples paired with large variances | Conservative | Robust, except can become slightly liberal when differences in sample sizes and in variances are both very large | Robust, except can become slightly liberal when differences in sample sizes and in variances are both very large |
| Unequal sample sizes, large samples paired with small variances | Liberal | Robust, except can become slightly liberal when differences in sample sizes and in variances are both very large | Robust, except can become slightly liberal when differences in sample sizes and in variances are both very large |

Nonparametric Approaches

The parametric modifications of the previous section were developed for analyzing data with unequal population variances. The nonparametric approaches of this section were developed for analyzing data whose population distributions are nonnormal.
As we discuss in some detail later, another motivating factor for the development of nonparametric techniques in the behavioral sciences has been the belief held by some researchers that they require less stringent measurement properties of the dependent variable. The organizational structure of this section consists of, first, presenting a particular nonparametric technique, and, second, discussing its merits relative to parametric techniques.

There are several nonparametric alternatives to ANOVA for the single-factor, between-subjects design. We present only one of these, the Kruskal-Wallis test, which is the most frequently used nonparametric test for this design. For information on other nonparametric methods, consult such nonparametric textbooks as Bradley (1968), Cliff (1996), Gibbons (1971), Marascuilo and McSweeney (1977), Noether (1976), and Siegel (1956).

The Kruskal-Wallis test is often called an "ANOVA by Ranks" because a fundamental distinction between the usual ANOVA and the Kruskal-Wallis test is that the original scores are replaced by their ranks in the Kruskal-Wallis test. Specifically, the first step in the test is to rank order all observations from low to high (actually, high to low yields exactly the same result) in the entire set of N subjects. Be certain to notice that this ranking is performed across all a groups, independently of group membership. When scores are tied, each observation is assigned the average (i.e., mean) rank of the scores in the tied set. For example, if three scores are tied for 6th, 7th, and 8th place in order, all three scores are assigned a rank of 7. Once the scores have been ranked, the test statistic is given by

$$H = \frac{12}{N(N + 1)} \sum_{j=1}^{a} n_j \left\{ \bar{R}_j - [(N + 1)/2] \right\}^2, \tag{3E2.7}$$

where $\bar{R}_j$ is the mean rank for group j. Although Equation 3E2.7 may look very different from the usual ANOVA F statistic, in fact, there is an underlying similarity. For example, $(N + 1)/2$ is simply the grand mean of the ranks, which we know must have values of 1, 2, 3, ..., N. Thus the term $\sum_{j=1}^{a} n_j \{ \bar{R}_j - [(N + 1)/2] \}^2$ is a weighted sum of squared deviations of group means from the grand mean, as in the parametric F test. It also proves to be unnecessary to estimate $\sigma^2$, the population error variance, because the test statistic is based on a finite population of size N (cf. Marascuilo & McSweeney, 1977, for more on this point). The important point for our purposes is that the Kruskal-Wallis test is very much like an ANOVA on ranks. When the null hypothesis is true, H is approximately distributed as $\chi^2$ with $a - 1$ degrees of freedom.
The $\chi^2$ approximation is accurate unless sample sizes within some groups are quite small, in which case tables of the exact distribution of H should be consulted in such sources as Siegel (1956) or Iman, Quade, and Alexander (1975). When ties occur in the data, a correction factor T should be applied:

$$T = 1 - \frac{\sum_{i=1}^{G} (t_i^3 - t_i)}{N^3 - N},$$

where $t_i$ is the number of observations tied at a particular value and G is the number of distinct values for which there are ties. A corrected test statistic $H'$ is obtained by dividing H by T: $H' = H/T$. The correction has little effect (i.e., $H'$ differs very little from H) unless sample sizes are very small or there are many ties in the data relative to sample size. Most major statistical packages (e.g., SAS and SPSS) have a program for computing H (or $H'$) and its associated p value. Also, it should be pointed out that when there are only two groups to be compared (i.e., $a = 2$), the Kruskal-Wallis test is equivalent to the Wilcoxon Rank Sum test, which is also equivalent to the Mann-Whitney U.
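The ranking step, the H statistic, and the tie correction can be sketched as follows. This is a minimal standard-library illustration with invented function names and data; it computes H, T, and $H' = H/T$ but not the p value, and it assumes the observations are not all tied at a single value (in which case T would be zero).

```python
from collections import Counter


def average_ranks(values):
    """Map each distinct value to its average (mean) rank, low to high."""
    counts = Counter(values)
    ranks = {}
    position = 0                      # number of observations ranked so far
    for value in sorted(counts):
        t = counts[value]
        # Tied scores occupy ranks position+1 ... position+t; use their mean.
        ranks[value] = position + (t + 1) / 2
        position += t
    return ranks


def kruskal_wallis(groups):
    """Return (H, T, H') where H' = H / T is the tie-corrected statistic."""
    pooled = [y for g in groups for y in g]   # rank across all groups at once
    big_n = len(pooled)
    ranks = average_ranks(pooled)
    grand_mean_rank = (big_n + 1) / 2
    weighted_ss = sum(
        len(g) * (sum(ranks[y] for y in g) / len(g) - grand_mean_rank) ** 2
        for g in groups
    )
    h = 12 / (big_n * (big_n + 1)) * weighted_ss
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (big_n ** 3 - big_n)
    return h, correction, h / correction


# Three scores tied for 6th, 7th, and 8th place all receive a rank of 7.
print(average_ranks([1, 2, 3, 4, 5, 6, 6, 6, 9])[6])

# Hypothetical data, invented for illustration.
print(kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Values that occur only once contribute $t^3 - t = 0$ to the tie sum, so summing over every distinct value gives the same T as summing over the G tied values alone.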

12 Chpter 3 Choosing Between Prmetric nd Nonprmetric Tests Sttisticins hve debted the reltive merits of prmetric versus nonprmetric tests ever since the inception of nonprmetric pproches. As consequence, ll too often behviorl reserchers re told either tht prmetric procedures should lwys be used (becuse they re robust nd more powerful) or tht nonprmetric methods should lwys be used (becuse they mke fewer ssumptions). Not surprisingly, both of these extreme positions re oversimplifictions. We provide brief overview of the dvntges ech pproch possesses in certin situtions. Our discussion is limited to comprison of the F, F*, nd W prmetric tests nd the Kruskl Wllis nonprmetric test. Nevertheless, even with this limittion, do not expect our comprison of the methods to provide definitive nswer s to which pproch is best. The choice between pproches is too complicted for such simple nswer. There re certin occsions where prmetric tests re preferble, nd there re others where nonprmetric tests re better. A wise dt nlyst crefully weighs the dvntges in his or her sitution nd mkes n informed choice ccordingly. A primry reson the comprison of prmetric nd nonprmetric pproches is so difficult is tht they do not lwys test the sme null hypothesis. To see why they do not, we must consider the ssumptions ssocited with ech pproch. As stted erlier, we consider specificlly the F test nd Kruskl Wllis test for one-wy between-subects designs. As discussed in Chpter 3, the prmetric ANOVA cn be conceptulized in terms of full model of the form ANOVA tests null hypothesis Y i = μ + α + ε i. H 0 : α = α = = α = 0, where it is ssumed tht popultion distributions re norml nd hve equl vrinces. In other words, under the null hypothesis, ll popultion distributions re identicl norml distributions if ANOVA ssumptions hold. If the null hypothesis is flse, one or more distributions re shifted either to the left or to the right of the other distributions. Figure 3E. 
illustrates such an occurrence for the case of three groups. The three distributions are identical except for their means, $\mu_1$, $\mu_2$, and $\mu_3 = 35$. When the normality and homogeneity assumptions are met, the distributions still have the same shape, but they have different locations when the null hypothesis is false. For this reason, ANOVA is sometimes referred to as a test of location or as testing a shift hypothesis.

Under certain conditions, the Kruskal-Wallis test can also be conceptualized as testing a shift hypothesis. However, although it may seem surprising given that it has been many years since Kruskal and Wallis (1952) introduced their test, there has been a fair amount of confusion and variation in textbook descriptions even decades later about what assumptions are required and what hypothesis is tested by the Kruskal-Wallis test (a summary is provided by Vargha & Delaney, 1998). As usual, what one can conclude is driven largely by what one is willing to assume. If one is willing to assume that the distributions being compared are identical except possibly for their location, then the Kruskal-Wallis test can lead to a similar conclusion as an ANOVA. Like other authors who adopt these restrictive shift model

assumptions, Hollander and Wolfe (1973) argue that the Kruskal-Wallis test can be thought of in terms of a full model of the form

$$Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}$$

and that the null hypothesis being tested can be represented by

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0,$$

just as in the parametric ANOVA. From this perspective, the only difference concerns the assumptions involving the distribution of errors ($\varepsilon_{ij}$). Whereas the parametric ANOVA assumes both normality and homogeneity, the Kruskal-Wallis test assumes only that the population of error scores has an identical continuous distribution for every group. As a consequence, in the Kruskal-Wallis model, homogeneity of variance is still assumed, but normality is not. The important point for our purposes is that, under these assumptions, the Kruskal-Wallis test is testing a shift hypothesis, as is the parametric ANOVA, when its assumptions are met.

FIGURE 3E2.1 Shifted distributions under ANOVA assumptions

FIGURE 3E2.2 Shifted distributions under Kruskal-Wallis assumptions

Figure 3E2.2 illustrates such an occurrence for the case of three groups. As in Figure 3E2.1, the three distributions of Figure 3E2.2 are identical to each other except for their location on the X axis. Notice, however, that the distributions in Figure 3E2.2 are skewed, unlike the

distributions in Figure 3E2.1 that are required to be normal by the ANOVA model. Under these conditions, both approaches are testing the same null hypothesis, because the $\alpha_j$ parameters in the models are identical. For example, the difference $\alpha_2 - \alpha_1$ represents the extent to which the distribution of Group 2 is shifted either to the right or to the left of Group 1. Not only does $\alpha_2 - \alpha_1$ equal the difference between the population means, but as Figure 3E2.3 shows, it also equals the difference between the medians, the 25th percentile, the 75th percentile, or any other percentile. Indeed, it is fairly common to regard the Kruskal-Wallis test as a way of deciding if the population medians differ (cf. Wilcox, 1996, p. 365). This is legitimate when the assumption that all distributions have the same shape is met. In this situation, the only difference between the two approaches is that the parametric ANOVA makes the additional assumption that this common shape is that of a normal distribution. Of course, this difference implies different properties for the tests, which we discuss momentarily.

FIGURE 3E2.3 Meaning of $\alpha_2 - \alpha_1$ for two groups when shift hypothesis holds

To summarize, when one adopts the assumptions of the shift model (namely, identical distributions for all groups, except for a possible shift under the alternative hypothesis), the Kruskal-Wallis test and the parametric ANOVA are testing the same null hypothesis. In this circumstance, it is possible to compare the two approaches and state conditions under which each approach is advantageous.⁵

However, there are good reasons for viewing the Kruskal-Wallis test differently. Although it is widely understood that the null hypothesis being tested is that the groups have identical population distributions, the most appropriate alternative hypothesis and required assumptions are not widely understood.
One important point to understand is that the Kruskal-Wallis test possesses the desirable mathematical property of consistency only with respect to an alternative hypothesis stated in terms of whether individual scores are greater in one group than another. This is seen most clearly when two groups are being compared. Let p be defined as the probability that a randomly sampled observation from one group on a continuous dependent variable Y is greater than a randomly sampled observation from the second group (e.g., Wilcox, 1996, p. 365). That is, $p = \Pr(Y_1 > Y_2)$.
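A sample analogue of p may help make the definition concrete. The sketch below simply counts the proportion of all (Group 1, Group 2) pairs in which the Group 1 score is larger. The function name `prob_superiority` and the data are hypothetical, and counting a tied pair as one half is an added assumption, since the definition above presumes a continuous variable with no ties.

```python
def prob_superiority(sample1, sample2):
    """Estimate p = Pr(Y1 > Y2) as the proportion of pairs with y1 > y2.

    A tied pair counts as one half (an assumption made for this sketch;
    ties cannot occur for a truly continuous variable).
    """
    wins = 0.0
    for y1 in sample1:
        for y2 in sample2:
            if y1 > y2:
                wins += 1.0
            elif y1 == y2:
                wins += 0.5
    return wins / (len(sample1) * len(sample2))


def mean(values):
    return sum(values) / len(values)


# Hypothetical scores: the two samples have the same mean (6.0),
# yet a score from group1 rarely exceeds a score from group2.
group1 = [1, 2, 3, 4, 20]
group2 = [4, 5, 6, 7, 8]

print(mean(group1), mean(group2))        # prints 6.0 6.0
print(prob_superiority(group1, group2))  # prints 0.22
```

In these made-up samples the means agree exactly while the estimate of p is far from ½, so a comparison of means and a comparison of individual scores need not tell the same story.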

If the two population distributions are identical, p = ½, and so one could state the null hypothesis the Kruskal-Wallis is testing in this case as $H_0$: p = ½. The mathematical reason doing so makes most sense is that the test is consistent against an alternative hypothesis if and only if it implies $H_1$: p ≠ ½ (Kendall & Stuart, 1973, p. 53). Cases in which p = ½ have been termed cases of stochastic equality (cf. Mann & Whitney, 1947; Delaney & Vargha, 2002), and the consistency property means that if the two populations are stochastically unequal (p ≠ ½), then the probability of rejecting the null hypothesis approaches 1 as the sample sizes get larger.

When the populations have the same shape, ANOVA and Kruskal-Wallis are testing the same hypothesis regarding location, although it is common to regard the ANOVA as a test of differences between population means, but regard the Kruskal-Wallis test as a test of differences between population medians.⁶ However, when distributions have different asymmetric shapes, it is possible for the population means to all be equal and yet the population medians all be different, or vice versa. Similarly, with regard to the hypothesis that the Kruskal-Wallis test is most appropriate for, namely, stochastic equality with different asymmetric distributions, the population means might all be equal, yet the distributions be stochastically unequal, or vice versa. The point is that, in such a case, the parametric ANOVA may be testing a true null hypothesis, whereas the nonparametric approach is testing a false null hypothesis. In such a circumstance, the probabilities of rejecting the null hypothesis for the two approaches cannot be compared meaningfully because they are answering different questions.

In summary, when distributions have different shapes, the parametric and nonparametric approaches are generally testing different hypotheses. Different shapes occur fairly often, in part because of floor-and-ceiling effects such as occur with Likert-scale dependent variables.
In such conditions, the basis for choosing between the approaches should probably involve consideration of whether the research question is best formulated in terms of population means or in terms of comparisons of individual scores. When these differ, we would argue that more often than not, the scientist is interested in the comparison of individuals. If one is interested in comparing two methods of therapy for reducing depression, or clients' daily alcohol consumption, one is likely more interested in which method would help the greater number of people rather than which method produces the greater mean level of change, if these are different. In such situations, stochastic comparison might be preferred to the comparison of means.

Suppose that, in fact, population distributions are identical. Which approach is better, parametric or nonparametric? Although the question seems relatively straightforward, the answer is not. Under some conditions, such as normal distributions, the parametric approach is better. However, under other conditions, such as certain long-tailed distributions (in which extreme scores are more likely than in the normal distribution), the nonparametric approach is better. As usual, the choice involves a consideration of Type I error rate and power.

If population distributions are identical and normal, both the F test and the Kruskal-Wallis test maintain the actual α level at the nominal value, because the assumptions of both tests have been met (assuming, in addition, as we do throughout this discussion, that observations are independent of one another). On the other hand, if distributions are identical but nonnormal, only the assumptions of the Kruskal-Wallis test are met. Nevertheless, the extensive survey conducted by Glass and colleagues (1972) suggests that the F test is robust with respect to Type I errors to all but extreme violations of normality.⁷ Thus, with regard to Type I error rates, there is little practical reason to prefer either test over the other if all population distributions have identical shapes.
While on the topic of the Type I error rate, it is important to dispel a myth concerning nonparametric tests. Many researchers apparently believe that the Kruskal-Wallis test should be used instead of the F test when variances are unequal, because the Kruskal-Wallis test does not assume homogeneity of variance. However, we can see that this belief is misguided. Under the

shift model, the Kruskal-Wallis test assumes that population distributions are identical under the null hypothesis, and identical distributions obviously have equal variances. Even when the Kruskal-Wallis is treated as a test of stochastic equality, the test assumes that the ranks of the scores are equally variable across groups (Vargha & Delaney, 1998), so homogeneity of variance in some form is, in fact, an assumption of the Kruskal-Wallis test. Furthermore, the Kruskal-Wallis test is not robust to violations of this assumption with unequal n. Keselman, Rogan, and Feir-Walsh (1977), as well as Tomarken and Serlin (1986), found that the actual Type I error rate of the Kruskal-Wallis test could be as large as twice the nominal level when large samples are paired with small variances (cf. Oshima & Algina, 1992). It should be added that the usual F test was even less robust than the Kruskal-Wallis test. However, the important practical point is that neither test is robust. In contrast, Tomarken and Serlin (1986) found both F* and W to maintain acceptable α levels even for various patterns of unequal sample sizes and unequal variances.⁸ Thus the practical implication is that F* and W are better alternatives to the usual F test than is the standard Kruskal-Wallis test when heterogeneity of variance is suspected, especially with unequal n. Robust forms of the Kruskal-Wallis test have now been proposed, but are not considered here (see Delaney & Vargha, 2002).

A second common myth surrounding nonparametric tests is that they are always less powerful than parametric tests. It is true that if the population distributions for all groups are normal with equal variances, then the F test is more powerful than the Kruskal-Wallis test. The size of the difference in power varies as a function of the sample sizes and the means, so it is impossible to state a single number to represent how much more powerful the F test is. However, it is possible to determine mathematically that as sample sizes increase toward infinity, the efficiency of the Kruskal-Wallis test relative to the F test is .955 under normality.
⁹ In practical terms, this means that for large samples, the F test can achieve the same power as the Kruskal-Wallis test and yet require only 95.5% as many subjects as would the Kruskal-Wallis test. It can also be shown that for large samples, the Kruskal-Wallis test is at least 86.4% as efficient as the F test for distributions of any shape, as long as all distributions have the same shape. Thus, at its absolute worst, for large samples, using the Kruskal-Wallis instead of the F test is analogous to failing to use 13.6% of the subjects one has observed. We must add, however, that the previous statement assumes that all population distributions are identical. If they are not, the Kruskal-Wallis test in some circumstances has little or no power for detecting true mean differences, because it is testing a different hypothesis, namely, stochastic equality.

So far, we have done little to dispel the myth that parametric tests are always more powerful than nonparametric tests. However, for certain nonnormal distributions, the Kruskal-Wallis test is, in fact, considerably more powerful than the parametric F test. Generally speaking, the Kruskal-Wallis test is more powerful than the F test when the underlying population distributions are symmetric but heavy-tailed, which means that extreme scores (i.e., outliers) are more frequent than in the normal distribution. The size of the power advantage of the Kruskal-Wallis test depends on the particular shape of the nonnormal distribution, sample sizes, and magnitude of separation between the groups. However, the size of this advantage can easily be large enough to be of practical importance in some situations. It should also be added that the Kruskal-Wallis test is frequently more powerful than the F test when distributions are identical but skewed.

As mentioned earlier, another argument that has been made for using nonparametric procedures is that they require less stringent measurement properties of the data.
In fact, there has been a heated controversy ever since Stevens (1946, 1951) introduced the concept of levels of measurement (i.e., nominal, ordinal, interval, and ratio scales) with his views of their implications for statistics. Stevens argues that the use of parametric statistics requires that the observed dependent variable be measured on an interval or ratio scale. However, many behavioral variables fail to meet this criterion, which has been taken by some psychologists to imply that most behavioral

data should be analyzed with nonparametric techniques. Others (e.g., Gaito, 1980; Lord, 1953) argue that the use of parametric procedures is entirely appropriate for behavioral data.

We cannot possibly do justice in this discussion to the complexities of all viewpoints. Instead, we attempt to describe briefly a few themes and recommend additional reading. Gardner (1975) provides an excellent review of both sides of the controversy through the mid-1970s. Three points raised in his review deserve special mention here. First, parametric statistical tests do not make any statistical assumptions about the level of measurement. As we stated previously, the assumptions of the F test are normality, homogeneity of variance, and independence of observations. A correct numerical statement concerning population mean differences does not require interval measurement. Second, although a parametric test can be performed on ordinal data without violating any assumptions of the test, the meaning of the test could be damaged. In essence, this can be thought of as a potential construct validity problem (see Chapter ). Although the test is correct as a statement of mean group differences on the observed variable, these differences might not reflect true differences on the underlying construct. Third, Gardner cites two empirical studies (Baker, Hardyck, & Petrinovich, 1966; Labovitz, 1967) that showed that, although in theory construct validity might be problematic, in reality, parametric tests produced meaningful results for constructs even when the level of measurement was only ordinal.

Recent work demonstrates that the earlier empirical studies conducted prior to 1980 were correct as far as they went, but it has become clear that these earlier studies were limited in an important way. In effect, the earlier studies assumed that the underlying population distributions on the construct not only had the same mean but also were literally identical to each other.
However, a number of later studies (e.g., Maxwell & Delaney, 1985; Spencer, 1983) show that when the population distributions on the construct have the same mean but different variances, parametric techniques on ordinal data can result in very misleading conclusions. Thus, in some practical situations, nonparametric techniques may indeed be more appropriate than parametric approaches. Many interesting articles continue to be written on this topic. Articles deserving attention are Davison and Sharma (1988), Marcus-Roberts and Roberts (1987), Michell (1986), and Townsend and Ashby (1984).

In summary, the choice between a parametric test (F, F*, or W) and the Kruskal-Wallis test involves consideration of a number of factors. First, the Kruskal-Wallis test does not always test the same hypothesis as the parametric tests. As a result, in general, it is important to consider whether the research question of interest is most appropriately formulated in terms of comparisons of individual scores or comparisons of means. Second, neither the usual F test nor the Kruskal-Wallis test is robust to violations of homogeneity of variance with unequal n. Either F* or W, or robust forms of the Kruskal-Wallis test, are preferable in this situation. Third, for some distributions, the F test is more powerful than the Kruskal-Wallis test, whereas for other distributions, the reverse is true. Thus neither approach is always better than the other. Fourth, level of measurement continues to be controversial as a factor that might or might not influence the choice between parametric and nonparametric approaches.

EXTENSION 3E3: TWO OTHER APPROACHES: RANK TRANSFORMATIONS AND M ESTIMATORS

As if the choice between parametric and nonparametric were not already complicated, there are yet other possible techniques for data analysis, even in the relatively simple one-way, between-subjects design. As we stated at the beginning of Extension 3E, statisticians are constantly inventing new methods of data analysis. In this section, we take a brief glimpse at two methods that are still in the experimental stages of development.
Because the advantages and disadvantages

of these methods are largely unexplored, we would not recommend as of this writing that you use these approaches as your sole data-analysis technique without first seeking expert advice. Nevertheless, we believe that it is important to expose you to these methods because they represent the types of innovations currently being studied. As such, they may become preferred methods of data analysis during the careers of those of you who are reading this book as students.

The first innovation, called a rank transformation approach, has been described as a bridge between parametric and nonparametric statistics by its primary developers, Conover and Iman (1981). The rank transformation approach consists of simply replacing the observed data with their ranks and then applying the usual parametric test. Conover and Iman (1981) discuss how this approach can be applied to such diverse problems as multiple regression, discriminant analysis, and cluster analysis. In the case of the one-way, between-subjects design, the parametric F computed on ranks (denoted $F_R$) is closely related to the Kruskal-Wallis test. Conover and Iman show that $F_R$ is related to the Kruskal-Wallis H with the formula

$$F_R = \frac{H/(a - 1)}{(N - 1 - H)/(N - a)}.$$

The rank transformation test compares $F_R$ to a critical F value, whereas the Kruskal-Wallis test compares H to a critical $\chi^2$ value. Both methods are large-sample approximations to the true critical value. Iman and Davenport (1976) found the F approximation to be superior to the $\chi^2$ approximation in the majority of cases they investigated (see Delaney & Vargha, 2002, for discussion of a situation where using rank transformations did not work well).

A second innovation involves a method of parameter estimation other than least squares. Least squares forms the basis for comparing models in all parametric techniques we discuss in this book. In one form or another, we generally end up finding a parameter estimate $\hat{\mu}$ to minimize an expression of the form $\sum (Y - \hat{\mu})^2$. Such an approach proves to be optimal when distributions are normal with equal variances.
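The relation between $F_R$ and H stated earlier in this section can be checked numerically. The following is a minimal sketch with made-up scores that contain no ties (so that H needs no tie correction); all function names are hypothetical. It computes the ordinary one-way F on the ranks and compares it with the value obtained from H through the Conover and Iman formula.

```python
def rank_groups(groups):
    """Replace each score by its rank (1..N) in the pooled data (no ties assumed)."""
    pooled = [y for g in groups for y in g]
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0] * len(pooled)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    ranked, start = [], 0
    for g in groups:
        ranked.append(ranks[start:start + len(g)])
        start += len(g)
    return ranked


def one_way_f(groups):
    """Usual one-way ANOVA F = MS_B / MS_W."""
    a = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_between / (a - 1)) / (ss_within / (n_total - a))


def kruskal_wallis_h(ranked):
    """H from rank sums: 12 / (N (N + 1)) * sum_j R_j^2 / n_j - 3 (N + 1)."""
    n_total = sum(len(g) for g in ranked)
    total = sum(sum(g) ** 2 / len(g) for g in ranked)
    return 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)


def f_from_h(h, a, n_total):
    """F_R = [H / (a - 1)] / [(N - 1 - H) / (N - a)]."""
    return (h / (a - 1)) / ((n_total - 1 - h) / (n_total - a))


# Hypothetical scores for a = 3 groups of 4 (no tied values).
data = [[12.1, 9.4, 15.3, 11.0],
        [14.2, 18.5, 16.7, 13.9],
        [20.3, 17.8, 22.4, 19.6]]

ranked = rank_groups(data)
h = kruskal_wallis_h(ranked)
print(round(one_way_f(ranked), 4), round(f_from_h(h, 3, 12), 4))
```

The two printed values agree because, for ranks without ties, the total sum of squares is a fixed function of N, so H and the between-group sum of squares of the ranks carry exactly the same information.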
However, as we have seen, optimality is lost when these conditions do not hold. In particular, least squares tends to perform poorly in the presence of outliers (i.e., extreme scores) because the squaring function is very sensitive to extreme scores. For example, consider the following five scores: 5, 10, 15, 20, 75. If we regard these five observations as a random sample, we could use least squares to estimate the population mean. It is easily verified that $\hat{\mu} = 25$ minimizes $\sum (Y - \hat{\mu})^2$ for these data. As we know, the sample mean, which here equals 25, is the least-squares estimate. However, only one of the five scores is this large. The sample mean has been greatly influenced by the single extreme score of 75. If we are willing to assume that the population distribution is symmetric, we could also use the sample median as an unbiased estimator of the population mean.¹⁰ It is obvious that the median of our sample is 15, but how does this relate to least squares? It can be shown that the median is the estimate that minimizes the sum of the absolute value of errors: $\sum |Y - \hat{\mu}|$. Thus the sample mean minimizes the sum of squared errors, whereas the sample median minimizes the sum of absolute errors. The median is less sensitive than the mean to outliers; for some distributions, this is an advantage, but for others, it is a disadvantage. In particular, for heavy-tailed distributions, the median's insensitivity to outliers makes it superior to the mean. However, in a normal distribution, the median is a much less efficient estimator than is the mean. The fact that neither the median nor the mean is uniformly best has prompted the search for alternative estimators.

Statisticians developed a class of estimators called M estimators that in many respects represent a compromise between the mean and the median. For example, one member of this class (the Huber M estimator) is described as acting like the mean for centrally located observations and like the median for observations far removed from the bulk of the data (Wu, 1985, p. 339). As a consequence, these robust estimators represent another bridge between parametric

and nonparametric approaches. These robust estimators are obtained once again by minimizing a term involving the sum of errors. However, M estimators constitute an entire class of estimators defined by minimizing the sum of some general function of the errors. The form of the function determines the specific estimator in the general class. For example, if the function is the square of the error, the specific estimation technique is least squares. Thus least-squares estimators are members of the broad class of M estimators. The median is also a member of the class because it involves minimizing the sum of a function of the errors, with the particular function being the absolute value function.

Although quite a few robust estimators have been developed, we describe only an estimator developed by Huber because of its relative simplicity. Huber's estimator requires that a robust estimator of scale (i.e., dispersion or variability) has been calculated prior to determining the robust estimate of location (i.e., population mean). Note that the scale estimate need not actually be based on a robust estimator; however, using a robust estimator of scale is sensible, if one believes that a robust estimator of location is needed in a particular situation. Although a number of robust estimators of scale are available, we present only one: the median absolute deviation (MAD) from the median. MAD is defined as

$$\text{MAD} = \text{median}\{|Y_i - \text{Mdn}|\},$$

where Mdn is the sample median. Although at first reading, the definition of MAD may resemble double-talk, its calculation is actually very straightforward. For example, consider again our hypothetical example of five scores: 5, 10, 15, 20, and 75. As we have seen, the median of these scores is 15, so we can write Mdn = 15. Then the absolute deviations are given by |5 − 15| = 10, |10 − 15| = 5, |15 − 15| = 0, |20 − 15| = 5, and |75 − 15| = 60. MAD is defined to be the median of these five absolute deviations, which is 5 in our example. MAD can be thought of as a robust type of standard deviation. However, the expected value of MAD is considerably less than s for a normal distribution.
For this reason, MAD is often divided by 0.6745, which puts it on the same scale as s for a normal distribution. We let S denote this robust estimate of scale, so we have S = MAD/0.6745.

With this background, we can now consider Huber's M estimator of location. To simplify our notation, we define $u_i$ to be $(Y_i - \hat{\mu})/S$, where S is the robust estimate of scale (hence we already know its value) and $\hat{\mu}$ is the robust estimate of location whose value we are seeking. Then Huber's M estimator minimizes the sum of a function of the errors, $\sum_{i=1}^{n} f(u_i)$, where the function f is defined as follows:

$$f(u_i) = \begin{cases} \tfrac{1}{2} u_i^2 & \text{if } |u_i| \le 1 \\ |u_i| - \tfrac{1}{2} & \text{if } |u_i| > 1. \end{cases}$$

Notice that function f involves minimizing sums of squared errors for errors that are close to the center of the distribution but involves minimizing the sum of absolute errors for errors that are far from the center. Thus, as our earlier quote from Wu indicated, Huber's estimate really does behave like the mean for observations near the center of the distribution but like the median for those farther away.

At this point, you may be wondering how the $\hat{\mu}$ that minimizes the sum of Huber's function is determined. It turns out that the value must be determined through an iterative procedure. As a first step, a starting value for $\hat{\mu}$ is chosen; a simple choice for the starting value would be the sample median. We might denote this value $\hat{\mu}_0$, the zero subscript indicating that this value is the optimal value after zero iterations. Then a new estimate is computed that minimizes the function $\sum_{i=1}^{n} f(u_i)$, where $u_i = (Y_i - \hat{\mu}_0)/S$. This yields a new estimate $\hat{\mu}_1$, where the subscript indicates that one iteration has been completed. The process continues until it converges, meaning that further iterations would make no practical difference in the value.¹³
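The calculations for the five scores 5, 10, 15, 20, and 75 can be traced in code. The sketch below computes MAD and S as defined above and then iterates toward Huber's estimate with the tuning constant fixed at 1. The update step is an assumption: it uses iteratively reweighted means (weight 1 for $|u_i| \le 1$ and $1/|u_i|$ otherwise), which is one common way to carry out such an iteration and is not necessarily the exact scheme the text has in mind. The function names are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    mdn = median(values)
    return median([abs(y - mdn) for y in values])


def huber_f(u):
    """Huber's function with tuning constant 1: quadratic near 0, linear in the tails."""
    return 0.5 * u * u if abs(u) <= 1.0 else abs(u) - 0.5


def huber_location(values, start=None, tol=1e-10, max_iter=500):
    """Iterate toward the value of mu that minimizes sum f((Y_i - mu) / S).

    Assumes MAD > 0. Each step replaces mu by a weighted mean in which
    observations with |u_i| > 1 are down-weighted by 1 / |u_i|
    (iteratively reweighted means, an assumed update rule).
    """
    scale = mad(values) / 0.6745           # S = MAD / 0.6745
    mu = median(values) if start is None else start
    for _ in range(max_iter):
        weights = []
        for y in values:
            u = abs(y - mu) / scale
            weights.append(1.0 if u <= 1.0 else 1.0 / u)
        new_mu = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


scores = [5, 10, 15, 20, 75]
print(median(scores), mad(scores))                    # prints 15 5
print(round(huber_location(scores), 6))               # start at the median
print(round(huber_location(scores, start=25.0), 6))   # start at the mean
```

For these particular scores the iteration started at the median stops immediately, because 15 already satisfies the minimizing condition: the scores 5 and 75 both lie more than one S from 15 and their capped contributions cancel, while 10, 15, and 20 balance around 15. Starting from the sample mean of 25 reaches the same value after a number of steps, so the single extreme score of 75 has little pull on the Huber estimate.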

Not only does M estimation produce robust estimates, but it also provides a methodology for hypothesis testing. Schrader and Hettmansperger (1980) show how full and restricted models based on M estimates can be compared to arrive at an F test using the same basic logic that underlies the F test with least squares. Li (1985) and Wu (1985) describe how M estimation can be applied to robust tests in regression analysis.

In summary, we have seen two possible bridges between the parametric and nonparametric approaches. It remains to be seen whether either of these bridges will eventually span the gap that has historically existed between proponents of parametrics and proponents of nonparametrics.

EXTENSION 3E4: WHY DOES THE USUAL F TEST FALTER WITH UNEQUAL NS WHEN POPULATION VARIANCES ARE UNEQUAL?

Why is the F test conservative when large sample sizes are paired with large variances, yet liberal when large sample sizes are paired with small variances? The answer can be seen by comparing the expected values of $MS_W$ and $MS_B$ when the null hypothesis is true, but variances are possibly unequal. In this situation, the expected values of both $MS_B$ and $MS_W$ are weighted averages of the population variances. However, sample sizes play different roles in the two weighting schemes.

Specifically, it can be shown that if the null hypothesis is true, $MS_B$ has an expected value given by

$$E(MS_B) = \frac{\sum_{j=1}^{a} w_j \sigma_j^2}{\sum_{j=1}^{a} w_j}, \qquad (3E4.1)$$

where $w_j = N - n_j$. Thus the weight a population variance receives in $MS_B$ is inversely related to its sample size. Although this may seem counterintuitive, it helps to realize that $MS_B$ is based on $\bar{Y}_j - \bar{Y}$, and larger groups contribute proportionally more to $\bar{Y}$.

Similarly, it can be shown that $MS_W$ has an expected value equal to

$$E(MS_W) = \frac{\sum_{j=1}^{a} w_j^* \sigma_j^2}{\sum_{j=1}^{a} w_j^*}, \qquad (3E4.2)$$

where $w_j^* = n_j - 1$. Thus the weight a population variance receives in $MS_W$ is directly related to its sample size.

What are the implications of Equations 3E4.1 and 3E4.2? Let's consider some special cases.

Case I. Homogeneity of Variance

If all $\sigma_j^2$ are equal to each other, Equations 3E4.1 and 3E4.2
simplify to $E(MS_B) = \sigma^2$ and $E(MS_W) = \sigma^2$, because the weights are irrelevant when all the numbers to be averaged are identical. In this case, the F ratio of $MS_B$ to $MS_W$ works appropriately, regardless of whether the sample sizes are equal or unequal.
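Equations 3E4.1 and 3E4.2 are simple enough to evaluate directly, which can make the two weighting schemes easier to see. The sketch below uses hypothetical sample sizes and population variances (they are not taken from the text or its exercises), and the function names are likewise made up.

```python
def expected_ms_between(ns, variances):
    """E(MS_B) under a true null: weights w_j = N - n_j (Equation 3E4.1)."""
    n_total = sum(ns)
    weights = [n_total - n for n in ns]
    return sum(w * v for w, v in zip(weights, variances)) / sum(weights)


def expected_ms_within(ns, variances):
    """E(MS_W): weights w*_j = n_j - 1 (Equation 3E4.2)."""
    weights = [n - 1 for n in ns]
    return sum(w * v for w, v in zip(weights, variances)) / sum(weights)


ns = [10, 10, 40]  # hypothetical sample sizes

# Equal variances: both expectations equal the common variance.
print(expected_ms_between(ns, [5, 5, 5]), expected_ms_within(ns, [5, 5, 5]))
# prints 5.0 5.0

# Largest sample paired with the largest variance: E(MS_B) < E(MS_W).
print(expected_ms_between(ns, [4, 4, 16]),
      round(expected_ms_within(ns, [4, 4, 16]), 3))
# prints 6.0 12.211

# Largest sample paired with the smallest variance: E(MS_B) > E(MS_W).
print(expected_ms_between(ns, [16, 16, 4]),
      round(expected_ms_within(ns, [16, 16, 4]), 3))
# prints 14.0 7.789
```

With these made-up values, the numerator of the F ratio is expected to be about half the denominator in the second configuration and nearly twice the denominator in the third, even though the null hypothesis is true in both.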

Case II. Unequal Variances but Equal n

If all $n_j$ are equal to each other, Equations 3E4.1 and 3E4.2 simplify to $E(MS_B) = \sum_{j=1}^{a} \sigma_j^2 / a$ and $E(MS_W) = \sum_{j=1}^{a} \sigma_j^2 / a$. Because the weights are equal to one another, in both cases, the weighted averages become identical to simple unweighted averages. Thus $MS_B$ and $MS_W$ are equal to one another in the long run. Although the ANOVA assumption has been violated, the F test is typically only slightly affected here.

Case III. Unequal Variances: Large Samples Paired with Small Variances

In this situation, we can see from Equation 3E4.1 that $E(MS_B)$ receives more weight from the smaller samples, which have larger variances. Thus the weighted average used to calculate $E(MS_B)$ is larger than the unweighted average of the $\sigma_j^2$ terms. However, $E(MS_W)$ receives more weight from the larger samples, which have smaller variances. Thus the weighted average used to calculate $E(MS_W)$ is smaller than the unweighted average of the $\sigma_j^2$ terms. As a consequence, $E(MS_B) > E(MS_W)$, even when the null hypothesis is true. F values tend to be too large, resulting in too many rejections of the null hypothesis when it is true. Thus the Type I error rate is too high.

Case IV. Unequal Variances: Large Samples Paired with Large Variances

This situation is just the opposite of Case III. Now $E(MS_B)$ gives more weight to the groups with small variances because they are smaller in size. In contrast, $E(MS_W)$ gives more weight to the groups with large variances because they are larger in size. As a result, $E(MS_B) < E(MS_W)$ when the null hypothesis is true. The F test is conservative and rejects the null hypothesis too infrequently. Thus power suffers.

EXERCISES

*1. True or False: Although the parametric modification F* is more robust than the usual F test to violations of homogeneity of variance in between-subjects designs, the F* test is always at least slightly less powerful than the F test.

2. True or False: The parametric test based on Welch's W statistic can be either more or less powerful than the usual F test in equal-n designs.

3.
True or False: When sample sizes are unequal and heterogeneity of variance is suspected in one-way between-subjects designs, either F* or W should seriously be considered as a replacement for the usual F test.

4. True or False: If one is willing to assume distributions have identical shapes, the Kruskal-Wallis test can be regarded as testing a shift hypothesis in location without requiring an assumption that scores are distributed normally.

5. True or False: The nonparametric Kruskal-Wallis test and the parametric F test always test the same hypothesis, but they require different distributional assumptions.

6. True or False: Although the F test is more powerful than the Kruskal-Wallis test when the normality and homogeneity of variance assumptions are met, the Kruskal-Wallis test can be more powerful than the F test when these assumptions are not met.

*7. True or False: When sample sizes are unequal and heterogeneity of variance is suspected in one-way between-subjects designs, the nonparametric Kruskal-Wallis test should be considered seriously as a replacement for the usual F test.

*8. How do the values of F, F*, and W compare to each other when samples are of different sizes and variances are considerably different from one another? Consider the following summary statistics:

Group 1   Group 2   Group 3
n = 0 n = 0 n 3 = 50 Y = 0 s = 0 Y = s = 0 Y 3 = 4 s 3 = 50

a. Calculate an observed F value for these data.
b. Calculate the F* value for these data (however, you need not compute the denominator degrees of freedom).
c. Calculate the W value for these data.
d. Are your answers to parts a-c consistent with the assertion made in Table 3E. that when large samples are paired with large variances the F is conservative, whereas F* and W are more robust?

9. Suppose that, as in Exercise 8, samples are of different sizes, and variances are considerably different from each other. Now, however, the large variance is paired with a small sample size:

Group 1   Group 2   Group 3
n = 0 n = 0 n 3 = 50 Y = 0 s = 50 Y = s = 0 Y 3 = 4 s 3 = 0

a. Calculate an observed F value for these data.
b. Calculate the F* value for these data (however, you need not compute the denominator degrees of freedom).
c. Calculate the W value for these data (however, you need not compute the denominator degrees of freedom).
d. Are your answers to parts a-c consistent with the assertion made in Table 3E. that when large samples are paired with small variances, the F is liberal, whereas F* and W are more robust?
e. Are the F, F*, and W values of this exercise higher or lower than the corresponding F, F*, and W values of Exercise 8? Is the direction of change consistent with Table 3E.?

*10. Assume the following data are from a one-way between-subjects design (these data are also used in Exercise 6 at the end of Chapter 5):

Group Group Group

Perform a nonparametric test of the difference among these three groups.

NOTES

1. Wilcox et al.'s (1986) results suggest that Type I error rates are more likely to be excessive as the number of groups increases. For example, with equal ns as small as , the Type I error rate of the t test remains close to .05, even when the population standard deviations have a 4:1 ratio. However, the Type I error rate for a four-group ANOVA with equal ns of was .09 when the population standard deviation of one group was four times larger than the standard deviation of the other groups. Even for equal ns of 50, the Type I error rate for ANOVA was .088 in this situation. Thus, for more than two groups, wide disparities in population standard deviations can make the usual ANOVA excessively liberal, even with equal n.

2. SPSS provides the value of the test statistic, degrees of freedom, and p value for both tests.

3. Strictly speaking, $MS_B$ and $MS_W$ are both unbiased estimators of the same population variance if homogeneity holds and the null hypothesis is true. The further assumptions of normality and independence guarantee that the ratio of $MS_B$ and $MS_W$ follows an F distribution.

4. Monte Carlo studies by necessity investigate power differences only under a limited set of conditions. Nevertheless, the value of .03 would seem to be a reasonable figure for most practical situations. The single exception is likely to be where n is very small, in which case F might enjoy a larger advantage over F*.

5. Cleveland (1985, pp ) presents two graphical techniques that are especially appropriate for judging whether the data conform to a shift hypothesis when comparing the distributions of two groups. The percentile comparison graph is obtained by plotting the percentiles of one distribution against the corresponding percentiles of the other distribution. If a shift of location describes the difference between the groups, the resultant plot should resemble a straight line. The Tukey sum-difference graph plots sums of corresponding percentiles against differences of corresponding percentiles and should resemble a flat straight line when the shift hypothesis holds.
Clevelnd rgues tht compring mens my be misleding when the percentile comprison grph is curved (or the Tukey sum-difference grph is not flt). Indeed, in such sitution, ny single vlue (e.g., men or medin) my hide importnt chrcteristics of the difference between the two distributions. Drlington s (973) ordinl dominnce curve methodology provides n interesting lterntive in this sitution. 6. When popultion distributions hve different shpes, lterntive methods hve been developed for testing differences between popultion medins. For further informtion, see Fligner nd Rust (98) nd Wilcox nd Chrlin (986). 7. Although the generl consensus mong sttisticins is tht the F test is robust to violtions of nonnormlity, there re some dissenters to this view. For n exmple, the interested reder should consult Brdley (978), who provides very redble set of rguments for why he believes tht the robustness of prmetric tests hs been oversold. 8. As stted erlier, Tomrken nd Serlin only smpled from norml popultions. Clinch nd Keselmn (98) found F* to be somewht more robust thn W when smpling from nonnorml popultions. 9. Reltive efficiency s smple sizes pproch infinity is referred to s symptotic reltive efficiency, which is often bbrevited ARE. Although ARE is useful concept, the reltive efficiency of two tests in smll smples my differ considerbly from the ARE. In prticulr, one limittion of the Kruskl Wllis test is tht it is typiclly impossible to estblish criticl vlue tht will set α =.05, even when ll ssumptions hve been met. Especilly with smll smple sizes, α my hve to be set considerbly below.05, which inevitbly results in loss of power. In such sitution, the reltive efficiency of the nonprmetric test suffers reltive to the prmetric test. 0. The smple medin is lwys medin-unbised estimtor of the popultion medin for rndom smpling. When the popultion distribution is symmetric, its men nd medin re identicl. 
Although the smple men nd smple medin re generlly different, both re unbised estimtors of the popultion men of symmetric distribution.. To simplify even further, we show Huber s estimtor with fixed tuning constnt set equl to.0. See Hoglin, Mosteller, nd Tukey (983); Huber (98); nd Wu (985) for more detils.

12. The median can be defined in more than one way when some scores are tied. We have chosen the simplest definition here, which simply ignores the presence of ties and defines the median to equal the value of the middle observation.

13. Hoaglin et al. (1983) show that M estimators can be thought of as weighted averages of the observations. Specific members of the class differ in terms of how they weight the observations. For example, the mean weights each observation equally, whereas Huber's M estimator weights observations near the center of the data more heavily than observations at the extremes.

REFERENCES: CHAPTER 3 EXTENSIONS

Baker, B. O., Hardyck, C. D., & Petrinovich, L. F. (1966). Weak measurements vs. strong statistics: An empirical critique of S. S. Stevens' proscriptions on statistics. Educational and Psychological Measurement, 26.
Blair, R. C. (1981). A reaction to "Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance." Review of Educational Research, 51.
Bradley, J. V. (1968). Distribution-free statistical tests. Englewood Cliffs, NJ: Prentice Hall.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31.
Brown, M. B., & Forsythe, A. B. (1974). The ANOVA and multiple comparisons for data with heterogeneous variances. Biometrics, 30.
Cleveland, W. S. (1985). The elements of graphing data. Belmont, CA: Wadsworth.
Cliff, N. (1996). Ordinal methods for behavioral data analysis. Mahwah, NJ: Lawrence Erlbaum Associates.
Clinch, J. J., & Keselman, H. J. (1982). Parametric alternatives to the analysis of variance. Journal of Educational Statistics, 7.
Conover, W. J., & Iman, R. L. (1981). Rank transformations as a bridge between parametric and nonparametric statistics. The American Statistician, 35, 124–129.
Darlington, R. B. (1973). Comparing two groups by simple graphs. Psychological Bulletin, 79, 110–116.
Davison, M. L., & Sharma, A. R. (1988). Parametric statistics and levels of measurement. Psychological Bulletin, 104.
Delaney, H. D., & Vargha, A. (2002). Comparing several robust tests of stochastic equality with ordinally scaled variables and small to moderate sized samples. Psychological Methods, 7.
Fligner, M. A., & Rust, S. W. (1982). A modification of Mood's median test for the generalized Behrens-Fisher problem. Biometrika, 69, 221–226.
Gaito, J. (1980). Measurement scales and statistics: Resurgence of an old misconception. Psychological Bulletin, 87.
Gardner, P. L. (1975). Scales and statistics. Review of Educational Research, 45.
Gibbons, J. D. (1971). Nonparametric statistical inference. New York: McGraw-Hill.
Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the analysis of variance and covariance. Review of Educational Research, 42.
Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (1983). Introduction to more refined estimators. In D. C. Hoaglin, F. Mosteller, & J. W. Tukey (Eds.), Understanding robust and exploratory data analysis. New York: Wiley.
Hollander, M., & Wolfe, D. A. (1973). Nonparametric statistical methods. New York: Wiley.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Iman, R. L., & Davenport, J. M. (1976). New approximations to the exact distribution of the Kruskal-Wallis test statistic. Communications in Statistics, Series A, 5.
Iman, R. L., Quade, D., & Alexander, D. (1975). Exact probability levels for the Kruskal-Wallis test. In H. L. Harter & D. B. Owen (Eds.), Selected tables in mathematical statistics. Providence, RI: American Mathematical Society.
Kendall, M. G., & Stuart, A. (1973). The advanced theory of statistics: Vol. 2. Inference and relationship (3rd ed.). London: Griffin.
Kenny, D. A., & Judd, C. M. (1986). Consequences of violating the independence assumption in the analysis of variance. Psychological Bulletin, 99.
Keselman, H. J., Rogan, J. C., & Feir-Walsh, B. J. (1977). An evaluation of some nonparametric and parametric tests for location equality. British Journal of Mathematical and Statistical Psychology, 30, 213–221.
Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks. Journal of the American Statistical Association, 47.
Labovitz, S. (1967). Some observations on measurement and statistics. Social Forces, 46.
Li, G. (1985). Robust regression. In D. C. Hoaglin, F. Mosteller, & J. W. Tukey (Eds.), Exploring data tables, trends, and shapes. New York: Wiley.
Lord, F. M. (1953). On the statistical treatment of football numbers. American Psychologist, 8.
Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18.
Marascuilo, L. A., & McSweeney, M. (1977). Nonparametric and distribution-free methods for the social sciences. Monterey, CA: Brooks/Cole.
Marcus-Roberts, H. M., & Roberts, F. S. (1987). Meaningless statistics. Journal of Educational Statistics, 12.
Maxwell, S. E., & Delaney, H. D. (1985). Measurement and statistics: An examination of construct validity. Psychological Bulletin, 97.
Maxwell, S. E., Lau, M. Y., & Howard, G. S. (2015). Is psychology suffering from a replication crisis? American Psychologist, 70.
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34.
Michell, J. (1986). Measurement scales and statistics: A clash of paradigms. Psychological Bulletin, 100.
Noether, G. E. (1976). Introduction to statistics: A nonparametric approach. Boston: Houghton Mifflin.
Oshima, T. C., & Algina, J. (1992). Type I error rates for James's second order test and Wilcox's H_m test under heteroscedasticity and nonnormality. British Journal of Mathematical and Statistical Psychology, 45.
Rogan, J. C., & Keselman, H. J. (1977). Is the ANOVA F-test robust to variance heterogeneity when sample sizes are equal? An investigation via a coefficient of variation. American Educational Research Journal, 14.
Schrader, R. M., & Hettmansperger, T. P. (1980). Robust analysis of variance based upon a likelihood ratio criterion. Biometrika, 67.
Serlin, R. C., & Lapsley, D. K. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40.
Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.
Spencer, B. D. (1983). Test scores as social statistics: Comparing distributions. Journal of Educational Statistics, 8.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103.
Stevens, S. S. (1951). Mathematics, measurement and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology (pp. 1–49). New York: Wiley.
Tomarken, A. J., & Serlin, R. C. (1986). Comparison of ANOVA alternatives under variance heterogeneity and specific noncentrality structures. Psychological Bulletin, 99.
Townsend, J. T., & Ashby, F. G. (1984). Measurement scales and statistics: The misconception misconceived. Psychological Bulletin, 96.
Vargha, A., & Delaney, H. D. (1998). The Kruskal-Wallis test and stochastic homogeneity. Journal of Educational and Behavioral Statistics, 23.
Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38.
Wilcox, R. R. (1985). On comparing treatment effects to a standard when the variances are unknown and unequal. Journal of Educational Statistics, 10.
Wilcox, R. R. (1996). Statistics for the social sciences. San Diego, CA: Academic Press.
Wilcox, R. R., & Charlin, V. L. (1986). Comparing medians: A Monte Carlo study. Journal of Educational Statistics, 11.
Wilcox, R. R., Charlin, V. L., & Thompson, K. L. (1986). New Monte Carlo results on the robustness of the ANOVA F, W, and F* statistics. Communications in Statistics-Simulation and Computation, 15.
Wu, L. L. (1985). Robust M-estimation of location and regression. In N. B. Tuma (Ed.), Sociological methodology. San Francisco: Jossey-Bass.
Zimmerman, D. W. (2004). A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology, 57, 173–181.

Chapter 4 Extensions

EXTENSION 4E.1: DERIVATION OF PARAMETER ESTIMATES AND SUM OF SQUARED ERRORS

We now show algebraically that the intuitive reasoning about using the average of the first two sample means as the estimate of $\mu^*$ (see Equation 4.0 in the text) is correct, and we also develop a more general formula that can be used when sample sizes are unequal. The goal in estimating $\mu^*$ is to choose as an estimate whatever value minimizes the following expression:

$$\sum_{i=1}^{n_1} (Y_{i1} - \hat{\mu}^*)^2 + \sum_{i=1}^{n_2} (Y_{i2} - \hat{\mu}^*)^2, \tag{4E.1}$$

which is the sum of squared errors for subjects in the first and second groups. However, this expression is equivalent to

$$\sum_{j=1}^{2} \sum_{i=1}^{n_j} (Y_{ij} - \hat{\mu}^*)^2. \tag{4E.2}$$

Notice that in this expression we are summing over $n_1 + n_2$ individual scores. Although in fact these scores come from two distinct groups, the sum would be the same if we had a single group of $n_1 + n_2$ scores. We saw previously that the sample mean of a group provides the best (in a least-squares sense) estimate in this case. Thus to minimize Equation 4E.2, we should choose $\hat{\mu}^*$ equal to the sample mean of the $n_1 + n_2$ scores in the first and second groups. Symbolically,

$$\hat{\mu}^* = \sum_{j=1}^{2} \sum_{i=1}^{n_j} Y_{ij} \Big/ (n_1 + n_2). \tag{4E.3}$$

Equivalently, it can be shown that the estimate $\hat{\mu}^*$ is a weighted mean of $\bar{Y}_1$ and $\bar{Y}_2$:

$$\hat{\mu}^* = (n_1 \bar{Y}_1 + n_2 \bar{Y}_2)/(n_1 + n_2), \tag{4E.4}$$

which, in the special case of equal sample sizes ($n_1 = n_2$), simplifies to

$$\hat{\mu}^* = (\bar{Y}_1 + \bar{Y}_2)/2. \tag{4E.5}$$

As stated earlier, this expression for estimating $\mu^*$ should make sense intuitively because according to the restricted model, $\mu_1 = \mu_2 = \mu^*$. If this is actually true, $\bar{Y}_1$ and $\bar{Y}_2$ differ from one another only because of sampling error, and the best estimate of the single population mean is obtained by averaging $\bar{Y}_1$ and $\bar{Y}_2$.

To test the null hypothesis that $\mu_1 = \mu_2$, it is necessary to find $E_R$. This turns out to be easy conceptually now that we know the least-squares parameter estimates for the model of Equation 4.7 (or, equivalently, Equation 4.8). That it is also easy computationally becomes apparent shortly. If we let $\bar{Y}^*$ represent our estimate $\hat{\mu}^*$, we have

$$E_R = \sum_{j=1}^{2} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}^*)^2 + \sum_{j=3}^{a} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2. \tag{4E.6}$$

Recall that our real interest is in the increase in error brought about by the restricted model, $E_R - E_F$. To help make it easier to see what this difference equals, we can rewrite Equation 4.9 as

$$E_F = \sum_{j=1}^{2} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 + \sum_{j=3}^{a} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2. \tag{4E.7}$$

Now, by subtracting the terms in Equation 4E.7 from those in Equation 4E.6, we see that the difference $E_R - E_F$ equals

$$E_R - E_F = \sum_{j=1}^{2} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}^*)^2 - \sum_{j=1}^{2} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2. \tag{4E.8}$$

After some straightforward but tedious algebra, Equation 4E.8 simplifies to

$$E_R - E_F = \frac{(\bar{Y}_1 - \bar{Y}_2)^2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = \frac{n_1 n_2}{n_1 + n_2} (\bar{Y}_1 - \bar{Y}_2)^2. \tag{4E.9}$$

EXTENSION 4E.2: EXAMPLE OF CORRELATION BETWEEN NONORTHOGONAL CONTRASTS

We explore this idea more fully with an example using the contrasts of Equation 4.49 in the text. Suppose that unbeknownst to us, $\mu_1 = \mu_2 = \mu_3 = 10$. In this case, it follows that $\psi_1 = \psi_2 = \psi_3 = 0$. Although the population means are equal for the three groups, the sample means, of course, vary from group to group and from replication to replication. According to our assumptions, the $\bar{Y}_j$ values are normally distributed across replications. For simplicity in this example, as shown in Table 4E.1, we assume that $\bar{Y}_j$ can take on only three values:

$\mu - 2$, $\mu$, and $\mu + 2$. In effect, we are assuming that the error for a group mean is either $-2$, 0, or 2 in any sample. We also assume that these three values are equally likely. (Although this assumption for the error term is unrealistic, it makes the implications of orthogonality much easier to show than does the normality assumption.) According to our simple model, then, each $\bar{Y}_j$ is 8, 10, or 12, and these three values occur equally often.

What is the relationship between $\bar{Y}_1$, $\bar{Y}_2$, and $\bar{Y}_3$? They are independent of one another because the three groups of subjects are independent. This means, for example, that knowing $\bar{Y}_1 = 8$ says nothing about whether $\bar{Y}_2$ is 8, 10, or 12. The first three columns of Table 4E.1 show the 27 possible combinations of $\bar{Y}_1$, $\bar{Y}_2$, and $\bar{Y}_3$ that can occur, given our assumptions. As a result of the independence between groups, each of these 27 combinations is equally likely to occur; that is, each has a probability of 1/27. The next three columns show for each combination of $\bar{Y}$ values the resulting values for $\hat{\psi}_1$, $\hat{\psi}_2$, and $\hat{\psi}_3$, where

$$\hat{\psi}_1 = \bar{Y}_1 - \bar{Y}_2, \qquad \hat{\psi}_2 = \bar{Y}_1 - \bar{Y}_3, \qquad \hat{\psi}_3 = \tfrac{1}{2}(\bar{Y}_1 + \bar{Y}_2) - \bar{Y}_3.$$

TABLE 4E.1 ORTHOGONALITY (columns: $\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$, $\hat{\psi}_1$, $\hat{\psi}_2$, $\hat{\psi}_3$)
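The rows described for Table 4E.1 follow mechanically from the assumptions stated above, so they can be regenerated rather than copied. The following is a minimal sketch, not the authors' own procedure: it assumes each sample mean takes the values 8, 10, or 12, and the names `MEAN_VALUES`, `contrast_estimates`, and `build_table` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption taken from the text: every sample mean is 8, 10, or 12,
# each value being equally likely and independent across groups.
MEAN_VALUES = (8, 10, 12)


def contrast_estimates(y1, y2, y3):
    """Return (psi1_hat, psi2_hat, psi3_hat) for one triple of sample means."""
    psi1 = Fraction(y1 - y2)              # Y1 - Y2
    psi2 = Fraction(y1 - y3)              # Y1 - Y3
    psi3 = Fraction(y1 + y2, 2) - y3      # (Y1 + Y2)/2 - Y3
    return psi1, psi2, psi3


def build_table():
    """List every combination of sample means with its contrast estimates."""
    return [(ys, contrast_estimates(*ys))
            for ys in product(MEAN_VALUES, repeat=3)]


if __name__ == "__main__":
    print("Y1 Y2 Y3 psi1 psi2 psi3")
    for ys, psis in build_table():
        print(*ys, *psis)
```

Each of the rows produced this way carries probability 1/27 under the equal-likelihood assumption, which is why simple counts are enough in the discussion that follows.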

The primary purpose for obtaining the values in Table 4E.1 is to investigate the relationships among the different contrasts. Earlier, we argued intuitively that $\psi_1$ and $\psi_2$ were related to one another. Specifically, it would seem reasonable that if $\hat{\psi}_1$ is large, then $\hat{\psi}_2$ would be large also because both involve comparing $\bar{Y}_1$ with another group. This possibility can be explored systematically by forming a contingency table relating the $\hat{\psi}_1$ and $\hat{\psi}_2$ values of Table 4E.1. The top half of Table 4E.2 is such a contingency table.

TABLE 4E.2 CONTINGENCY TABLES ILLUSTRATING RELATIONSHIP OF $\hat{\psi}_1$ TO $\hat{\psi}_2$ AND $\hat{\psi}_1$ TO $\hat{\psi}_3$

Each entry in Table 4E.2 equals the number of times that particular combination of $\hat{\psi}_1$ and $\hat{\psi}_2$ values occurs in Table 4E.1. For example, the combination $\hat{\psi}_1 = 4$ and $\hat{\psi}_2 = 4$ occurs once in Table 4E.1, whereas $\hat{\psi}_1 = 0$ and $\hat{\psi}_2 = 0$ occurs three times. The combination $\hat{\psi}_1 = -4$ and $\hat{\psi}_2 = 4$ never occurs. If we were to divide each entry in the contingency table by 27, the result would be a bivariate probability distribution, but this degree of formality is unnecessary for our purposes. Instead, the important point here is simply that $\hat{\psi}_1$ and $\hat{\psi}_2$ are correlated. Specifically, they are positively correlated because higher values of $\hat{\psi}_1$ tend to be associated with higher values of $\hat{\psi}_2$. Thus, samples in which $\hat{\psi}_1$ exceeds zero have a systematic tendency to yield $\hat{\psi}_2$ values that are in excess of zero.

Is this also true of $\hat{\psi}_1$ and $\hat{\psi}_3$? We saw earlier that according to the definition of orthogonality, $\psi_1$ and $\psi_3$ are orthogonal. The bottom half of Table 4E.2 displays the contingency table for $\hat{\psi}_1$ and $\hat{\psi}_3$. Are $\hat{\psi}_1$ and $\hat{\psi}_3$ correlated? Can we predict $\hat{\psi}_3$ from $\hat{\psi}_1$ (or vice versa)? Suppose that $\hat{\psi}_1 = 4$. When $\hat{\psi}_1 = 4$, the best guess concerning $\hat{\psi}_3$ is zero, because zero is the mean value of $\hat{\psi}_3$ when $\hat{\psi}_1 = 4$. Suppose that $\hat{\psi}_1 = 2$. The best guess for $\hat{\psi}_3$ is still zero. In fact, for any given value of $\hat{\psi}_1$, the best guess for $\hat{\psi}_3$ is zero. Knowledge of $\hat{\psi}_1$ does not improve prediction of $\hat{\psi}_3$. Thus $\hat{\psi}_1$ and $\hat{\psi}_3$ are uncorrelated.

In this example, $\hat{\psi}_1$ and $\hat{\psi}_3$ are not statistically independent because the errors were distributed as $-2$, 0, and 2 instead of normally. With normally distributed errors, $\hat{\psi}_1$ and $\hat{\psi}_3$ would have been statistically independent as well as uncorrelated. Thus orthogonal contrasts possess the beneficial property of being uncorrelated with one another.
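The counting argument above can be checked directly by enumeration. The sketch below is illustrative only and rests on the same assumption as the example (sample means of 8, 10, or 12, all 27 combinations equally likely); `contingency`, `correlation`, and `conditional_values` are hypothetical helper names, not functions from any statistics package.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Assumption from the example: each sample mean is 8, 10, or 12.
MEAN_VALUES = (8, 10, 12)


def contrasts(y1, y2, y3):
    """Estimates of psi1, psi2, psi3 for one triple of sample means."""
    return (y1 - y2, y1 - y3, (y1 + y2) / 2 - y3)


# One row of contrast estimates per equally likely combination of means.
ROWS = [contrasts(*ys) for ys in product(MEAN_VALUES, repeat=3)]


def contingency(i, j):
    """Count how often each (psi_i, psi_j) pair occurs among the 27 rows."""
    return Counter((row[i], row[j]) for row in ROWS)


def correlation(i, j):
    """Pearson correlation of psi_i and psi_j over the 27 rows."""
    n = len(ROWS)
    xs = [row[i] for row in ROWS]
    ys = [row[j] for row in ROWS]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / sqrt(vx * vy)


def conditional_values(i, value, j):
    """Set of psi_j values that occur when psi_i equals the given value."""
    return {row[j] for row in ROWS if row[i] == value}


if __name__ == "__main__":
    print("corr(psi1, psi2) =", correlation(0, 1))
    print("corr(psi1, psi3) =", correlation(0, 2))
    print("psi3 values when psi1 = 4:", sorted(conditional_values(0, 4, 2)))
```

Comparing `conditional_values(0, 4, 2)` with `conditional_values(0, 0, 2)` shows why zero correlation does not imply independence here: the range of $\hat{\psi}_3$ values depends on $\hat{\psi}_1$ even though its conditional mean is always zero.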

Another Look at Nonorthogonal Contrasts: Venn Diagrams

Another property of orthogonal contrasts can best be illustrated by example. Consider the data for three groups in Table 4E.3. It can easily be shown that the sum of squares for the test of the omnibus null hypothesis is given by $SS_B = 190$ for these data. Let's reconsider our three contrasts, which are shown in Equation 4.49 of the text as follows:

$$\psi_1 = \mu_1 - \mu_2, \qquad \psi_2 = \mu_1 - \mu_3, \qquad \psi_3 = \tfrac{1}{2}(\mu_1 + \mu_2) - \mu_3.$$

TABLE 4E.3 HYPOTHETICAL DATA FOR THREE GROUPS (rows: individual scores; $\bar{Y}_j$ = 11, 10, 3; $\bar{Y}$ = 8; $\sum_{i=1}^{n}(Y_{ij} - \bar{Y}_j)^2$; $\sum_{i=1}^{n}(Y_{ij} - \bar{Y})^2$)

We can test each contrast in turn by forming an appropriate restricted model and comparing its error sum of squares to the error sum of squares of the full model. After some computation, it turns out that

$$SS(\psi_1) = 2.5, \qquad SS(\psi_2) = 160.0, \qquad SS(\psi_3) = 187.5.$$

Interestingly enough, the sum $SS(\psi_1) + SS(\psi_3) = 190$, which was the between-group sum of squares. As you might suspect, this occurrence is not accidental. Given three groups, two orthogonal contrasts partition the sum of squares between groups; that is, the sum of the sums of squares for the contrasts equals $SS_B$. More generally, for $a$ groups, $a - 1$ orthogonal contrasts partition the between-group sum of squares. This fact provides another perspective on the unique information provided by each member of a set of orthogonal contrasts. If we decide to test $\psi_1$ and $\psi_3$ as given here, then we have completely accounted for all differences between the three groups. In this sense, $\psi_1$ and $\psi_3$ together extract all available information concerning group differences. Venn diagrams are sometimes used to depict this situation visually. Figure 4E.1 shows how $\psi_1$ and $\psi_3$ together account for $SS_B$, which is represented by the entire circle.
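The partition property can be illustrated numerically. The sketch below is a hedged illustration rather than a reproduction of Table 4E.3: it assumes five scores per group and group means of 11, 10, and 3 (values chosen for illustration), and it uses the standard expression $SS(\psi) = \hat{\psi}^2 / \sum_j (c_j^2 / n_j)$ for a contrast sum of squares. The function names are hypothetical.

```python
# Assumed inputs for illustration only: group means and equal group sizes.
MEANS = (11.0, 10.0, 3.0)
NS = (5, 5, 5)

# Contrast coefficients for psi1, psi2, psi3 as defined in the text.
PSI1 = (1.0, -1.0, 0.0)
PSI2 = (1.0, 0.0, -1.0)
PSI3 = (0.5, 0.5, -1.0)


def contrast_ss(means, ns, coeffs):
    """Sum of squares for a contrast: psi_hat^2 / sum(c_j^2 / n_j)."""
    psi_hat = sum(c * m for c, m in zip(coeffs, means))
    return psi_hat ** 2 / sum(c * c / n for c, n in zip(coeffs, ns))


def between_ss(means, ns):
    """Between-group sum of squares: sum of n_j * (mean_j - grand mean)^2."""
    grand = sum(n * m for n, m in zip(ns, means)) / sum(ns)
    return sum(n * (m - grand) ** 2 for n, m in zip(ns, means))


def is_orthogonal(c1, c2, ns):
    """Check orthogonality via sum(c1_j * c2_j / n_j) == 0."""
    return abs(sum(a * b / n for a, b, n in zip(c1, c2, ns))) < 1e-12


if __name__ == "__main__":
    ss_b = between_ss(MEANS, NS)
    for name, c in (("psi1", PSI1), ("psi2", PSI2), ("psi3", PSI3)):
        print(name, contrast_ss(MEANS, NS, c))
    print("SS_B =", ss_b)
    print("psi1 and psi3 orthogonal:", is_orthogonal(PSI1, PSI3, NS))
```

Under these assumptions only the orthogonal pair ($\psi_1$, $\psi_3$) has sums of squares that add up to the between-group sum of squares; the nonorthogonal pairs fall short of it or exceed it.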

FIGURE 4E.1 Venn diagram of relationship between $SS(\hat{\psi}_1)$, $SS(\hat{\psi}_3)$, and $SS_B$

FIGURE 4E.2 Venn diagram of relationship between $SS(\hat{\psi}_1)$, $SS(\hat{\psi}_2)$, and $SS_B$

However, suppose we test $\psi_1$ and $\psi_2$. The sum of $SS(\psi_1)$ and $SS(\psi_2)$ fails to account for all of the between-group sum of squares because these two contrasts are nonorthogonal. Figure 4E.2 shows that $\psi_1$ and $\psi_2$ overlap. At this point, you might think that the combination of $\psi_1$ and $\psi_2$ is inferior to $\psi_1$ and $\psi_3$ because $SS(\psi_1) + SS(\psi_2)$ is less than $SS(\psi_1) + SS(\psi_3)$, which equals $SS_B$. Consider, however, the possibility of testing $\psi_2$ and $\psi_3$ together. It would seem that these two contrasts, which are nonorthogonal, somehow account for more of a difference between the groups than actually exists. That this is not true can be seen from Figure 4E.3. Because $\psi_2$ and $\psi_3$ are nonorthogonal, there is substantial overlap in the areas they represent. Thus they do not account for more between-group variability than exists. This illustrates an important principle: The sums of squares of nonorthogonal contrasts are not additive; for example, the sum $SS(\psi_2) + SS(\psi_3)$ has no meaning here. However, the sums of squares of orthogonal contrasts can be added to determine the magnitude of the sum of squares they jointly account for.

FIGURE 4E.3 Venn diagram of relationship between $SS(\hat{\psi}_2)$, $SS(\hat{\psi}_3)$, and $SS_B$

One additional point concerning orthogonality is of interest. Why is a contrast defined to have the restriction that the sum of its coefficients must equal zero, that is, $\sum_{j=1}^{a} c_j = 0$? The reason for this restriction is that it guarantees that the contrast is orthogonal to the grand mean $\mu$. Notice that $\mu$ is like a contrast in the sense that it is a linear combination of the population means. With equal $n$ for $a$ groups,

$$\mu = \sum_{j=1}^{a} \mu_j / a = \sum_{j=1}^{a} (1/a)\,\mu_j.$$

Consider a general contrast of the form

$$\psi = \sum_{j=1}^{a} c_j \mu_j.$$

Is $\psi$ orthogonal to $\mu$? Applying Equation 4.50 yields

$$(1/a)(c_1) + (1/a)(c_2) + \cdots + (1/a)(c_a)$$

as the sum of the products. Because $1/a$ is a common term, it can be factored out, resulting in

$$(1/a)(c_1 + c_2 + \cdots + c_a),$$

which equals $(1/a)\sum_{j=1}^{a} c_j$. This must equal zero for $\psi$ to be orthogonal to $\mu$, but we know $\sum_{j=1}^{a} c_j$ does equal zero, given the definition of a contrast. The $\sum_{j=1}^{a} c_j = 0$ condition also can be shown to apply for unequal $n$, given the more general definition of nonorthogonality. If we allowed contrasts in which $\sum_{j=1}^{a} c_j$ was nonzero, such a contrast would not be orthogonal to $\mu$.

Why should a contrast be orthogonal to $\mu$? Contrasts should represent differences between the groups and should thus be insensitive to the mean score averaged over all groups. By requiring that $\sum_{j=1}^{a} c_j = 0$, the information obtained from $\psi$ is independent of the grand mean and hence reflects pure differences between the groups. If a contrast in which $\sum_{j=1}^{a} c_j \neq 0$ is allowed, the information then reflects some combination of group differences and the size of the grand mean. For example, consider a four-group problem in which the experimenter decides to test a linear combination
