Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

¹Steele, M., ²Chaseling, J. and ³Hurst, C.

¹School of Mathematical and Physical Sciences, James Cook University, ²Australian School of Environmental Studies, Griffith University, ³School of Information Technology and Mathematical Sciences, University of Ballarat. E-Mail: Mike.Steele@jcu.edu.au

Keywords: Goodness-of-fit; Power; Empirical distribution function.

EXTENDED ABSTRACT

The use of goodness-of-fit test statistics for discrete or categorical data is widespread throughout the research community, with the Chi-Square being the most popular when a researcher aims to determine if observed categorical data differ from a hypothesized multinomial distribution. Even for ordinal categorical data, the use of empirical distribution function (EDF) test statistics such as the Kolmogorov-Smirnov, the three Cramér-von Mises (A², W² and U² as defined below) and various modifications of these is limited in the literature. Power studies of the EDF type test statistics are even more limited.

This paper compares the simulated power of the three Cramér-von Mises test statistics with that of the Chi-Square test statistic for a uniform null hypothesis against a variety of alternative distributions, which are summarized in Figure 1. Recommendations are made on which is the most powerful test statistic for the predefined alternative distributions.

The results of the simulated power studies in this paper lead to the following general recommendations:

For trend type alternatives A² and W² appear much more powerful than U² and χ². (See Figure 2 for a uniform null against a decreasing trend alternative distribution.)

For all the other investigated alternative distributions U² and χ² appear much more powerful than A² and W². (See Figure 3 for a uniform null against a leptokurtic type alternative distribution.)

Figure 1. Type of alternative distributions used in the power studies (panels: Decreasing trend, Step, Triangular, Platykurtic, Leptokurtic, Bimodal).

Figure 2. Powers for a uniform null and a decreasing alternative distribution.

Figure 3. Powers for a uniform null and a leptokurtic alternative distribution.

1. INTRODUCTION

Although designed for ordinal categorical data, the empirical distribution function (EDF) type goodness-of-fit test statistics Cramér-von Mises (W²), Anderson-Darling (A²) and Watson (U²) as defined by Choulakian et al. (1994) are not widely used in the applied literature. These authors have used simulation studies to show that A² and W² are relatively more powerful than the Chi-Square (χ²) test statistic (Pearson 1900) when the null distribution is uniform and the alternative distribution follows a trend. The test statistics are specified in Table 1.

Table 1. Test statistics used in the power study.

Test Statistic          Equation
Cramér-von Mises        $W^2 = N^{-1} \sum_{j=1}^{k} Z_j^2 p_j$                          (1)
Anderson-Darling        $A^2 = N^{-1} \sum_{j=1}^{k} \frac{Z_j^2 p_j}{H_j (1 - H_j)}$     (2)
Watson                  $U^2 = N^{-1} \sum_{j=1}^{k} (Z_j - \bar{Z})^2 p_j$               (3)
Pearson's Chi-Square    $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$               (4)

where $k$ is the number of cells, $N$ is the sample size, $p_i$ is the probability of an event occurring in cell $i$, $E_i$ is the expected frequency in cell $i$, $O_i$ is the observed frequency in cell $i$, $Z_j = \sum_{i=1}^{j} (O_i - E_i)$, $H_j = N^{-1} \sum_{i=1}^{j} E_i$ and $\bar{Z} = \sum_{j=1}^{k} Z_j p_j$.

There have been limited investigations of the powers of these particular EDF type test statistics. This paper uses simulated powers to extend the studies of Choulakian et al. (1994) and From (1996) by comparing the powers of the three Cramér-von Mises type test statistics with the χ² test statistic for a uniform null distribution (A₀) against the fully specified alternative distributions summarized in Figure 1 (Decreasing A₁, Step A₂, Triangular or bath-tub type A₃, Platykurtic A₄, Leptokurtic A₅ and Bimodal A₆) and fully defined in Table 2. The uniform null distribution was used because most similar published power studies of discrete goodness-of-fit tests have used such a null distribution; however, further work on non-uniform null distributions has been undertaken by Steele (2002).

For a small number of categories some of the alternative distributions do not clearly exhibit the shapes illustrated in Figure 1. The distributions also become quite similar for a small number of categories. For this reason a larger number of categories (k = 10) was used.

Table 2. Distributions used in the power study.

      Cell Probabilities
      1     2     3     4     5     6     7     8     9     10
A₀    0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10
A₁    0.23  0.23  0.10  0.08  0.07  0.07  0.06  0.06  0.05  0.05
A₂    0.15  0.15  0.15  0.15  0.15  0.05  0.05  0.05  0.05  0.05
A₃    0.17  0.13  0.10  0.07  0.03  0.03  0.07  0.10  0.13  0.17
A₄    .4.....4
A₅    0.05  0.05  0.05  0.05  0.30  0.30  0.05  0.05  0.05  0.05
A₆    .5..7..6.6..7..5

In Section 2 the simulation and linear interpolation techniques used to approximate power are discussed, together with sample size considerations. The results of the power studies are presented in Section 3 and a summary of the most powerful test statistic for each alternative distribution is presented in the concluding Section 4.

2. CALCULATION OF THE SIMULATED POWER

For a uniform null distribution over ten cells against the alternative distributions defined in Table 2, the powers of the test statistics are approximated for sample sizes of 10, 20, 30, 50, 100 and 200. These sample sizes represent expected frequencies of 1, 2, 3, 5, 10 and 20 per cell under the uniform null distribution. By selecting these expected frequencies, researchers who use goodness-of-fit tests with a minimum requirement of 5 observations per cell can make power comparisons for different minimum numbers of observations per cell. The results also show that, in most of the situations discussed below, the larger per-cell sample sizes produce power approximations very close to 1.

The powers are estimated using simulated random samples. The simulated null distribution of each test statistic is discrete, which means that a critical value and corresponding power at a significance level of exactly 5% may not be attainable. To enable meaningful comparisons of the powers of each test statistic, the powers are obtained for critical values either side of the 5% level and linearly interpolated to produce the approximate power for the 5% level.
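As a rough illustration of the procedure described in this section, the following Python sketch simulates multinomial samples under the null and under an alternative, computes the four statistics of Table 1, and linearly interpolates the power between the attainable significance levels either side of 5%. It is a minimal sketch under stated assumptions, not the code used for the study: the function names, the NumPy random generator, the replicate count `reps`, the rounding used to stabilise ties, and the step-type probabilities in the usage lines are all assumptions made for the example. The last term of A², where Z_k = 0 and H_k = 1, is taken as zero.

```python
import numpy as np

def gof_statistics(counts, p):
    """Return (A2, W2, U2, chi2) for cell counts (last axis = cells) and null probabilities p."""
    counts = np.asarray(counts, dtype=float)
    p = np.asarray(p, dtype=float)
    n = counts.sum(axis=-1, keepdims=True)        # sample size N
    expected = n * p                              # E_i = N p_i
    z = np.cumsum(counts - expected, axis=-1)     # Z_j
    h = np.cumsum(p)                              # H_j
    n_flat = n[..., 0]
    w2 = (z**2 * p).sum(axis=-1) / n_flat
    # Last term of A2 is 0/0 (Z_k = 0, H_k = 1) and is taken as zero, so it is skipped.
    a2 = (z[..., :-1]**2 * p[:-1] / (h[:-1] * (1.0 - h[:-1]))).sum(axis=-1) / n_flat
    zbar = (z * p).sum(axis=-1, keepdims=True)
    u2 = ((z - zbar)**2 * p).sum(axis=-1) / n_flat
    chi2 = ((counts - expected)**2 / expected).sum(axis=-1)
    return a2, w2, u2, chi2

def interpolated_power(null_stats, alt_stats, alpha=0.05):
    """Power at level alpha, interpolated between attainable sizes of 'reject if T >= c'."""
    # Rounding is an assumption of this sketch: it merges values that differ only by
    # floating-point noise so that the discrete null distribution keeps its ties.
    null_sorted = np.sort(np.round(null_stats, 10))
    alt_sorted = np.sort(np.round(alt_stats, 10))
    crit = np.unique(null_sorted)                 # candidate critical values, ascending
    size = 1.0 - np.searchsorted(null_sorted, crit, side="left") / null_sorted.size
    power = 1.0 - np.searchsorted(alt_sorted, crit, side="left") / alt_sorted.size
    # Reverse so that size increases, and add the "never reject" point (0, 0).
    size_inc = np.concatenate(([0.0], size[::-1]))
    power_inc = np.concatenate(([0.0], power[::-1]))
    return float(np.interp(alpha, size_inc, power_inc))

def simulated_power(p_null, p_alt, n, reps=10_000, alpha=0.05, seed=0):
    """Simulated power of each statistic; reps and seed are illustrative choices."""
    rng = np.random.default_rng(seed)
    null_counts = rng.multinomial(n, p_null, size=reps)
    alt_counts = rng.multinomial(n, p_alt, size=reps)
    null_stats = gof_statistics(null_counts, p_null)
    alt_stats = gof_statistics(alt_counts, p_null)   # alternative data, null hypothesis
    names = ("A2", "W2", "U2", "chi2")
    return {name: interpolated_power(t0, t1, alpha)
            for name, t0, t1 in zip(names, null_stats, alt_stats)}

if __name__ == "__main__":
    p_null = np.full(10, 0.1)
    p_step = np.array([0.15] * 5 + [0.05] * 5)    # assumed step-type alternative
    print(simulated_power(p_null, p_step, n=50))
```

Interpolating on the attainable (size, power) pairs rather than picking the nearest critical value keeps the comparison between statistics at a common nominal level, which is the purpose stated above.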

3. POWER STUDY RESULTS

3.1. Uniform Null with a Decreasing (A₁) Alternative

For small sample sizes Figure 4 shows that A² and W² have powers greater than χ² and U². The largest cumulative difference between the uniform null and the decreasing alternative distribution occurs at the second cell, and because A² and W² are sensitive to large cumulative differences at the earlier cells, this is one reason why they have larger power under these circumstances. Also, χ² generally has higher power than U². For sample sizes of at least 5 per cell (i.e. N ≥ 50 in this example), the powers of all the test statistics are very high.

Figure 4. Powers for uniform null and decreasing (A₁) alternative.

3.2. Uniform Null with a Step Type (A₂) Alternative

For the step type distribution the cumulative difference between a uniform null and the step type A₂ distribution increases up to the fifth cell. Because they are more able to detect larger cumulative differences in the earlier cells, the test statistics A² and W² are shown in Figure 5 to be more powerful. It should be noted that the power of U² is almost as good as that of A² and W², while the power of χ² is noticeably less than that of the three Cramér-von Mises type test statistics. For larger sample sizes of ten or more per cell (i.e. N ≥ 100 in this situation) the powers of all four test statistics are very high and approximately the same.

Figure 5. Powers for a uniform null and step type (A₂) alternative.

3.3. Uniform Null with a Triangular (A₃) Alternative

The major cumulative differences between a uniform distribution and the A₃ triangular alternative distribution do not occur in the earlier cells, as was the case in Sections 3.1 and 3.2. For this reason it is expected that A² and W² are less likely to detect a difference and hence have lower power. The U² statistic is circular in that, although it can be used on ordinal type data, calculation of the test statistic does not depend on which cell is defined as the first. This circular test statistic is shown in Figure 6 to be much more powerful than the other three test statistics. However, for larger sample sizes the powers of all the test statistics are approximately the same and high. This result also corresponds to that for a similar triangular type alternative distribution studied by Choulakian et al. (1994).

3.4. Uniform Null with a Platykurtic (A₄) Alternative

As the cumulative differences between a uniform null and the A₄ platykurtic alternative distribution are not large, the A² and W² test statistics are expected to have lower power. The power of W² is shown in Figure 7 to be very poor for all sample sizes; however, for the smaller sample sizes of up to five per cell (that is, N ≤ 50) under the uniform null, all the test statistics have poor power. For larger sample sizes χ² and U² are shown to have much higher power.
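The circular property of U² described in Section 3.3 can be checked numerically. The sketch below is an illustration rather than part of the original study: it uses a hypothetical set of observed counts (N = 50 over ten cells) and changes which cell is treated as the first by cyclically rotating the cells. The counts and the helper name `w2_and_u2` are assumptions made for the example.

```python
import numpy as np

def w2_and_u2(counts, p):
    """Return (W2, U2) as defined in Table 1 for one vector of cell counts."""
    counts = np.asarray(counts, dtype=float)
    p = np.asarray(p, dtype=float)
    n = counts.sum()
    z = np.cumsum(counts - n * p)            # Z_j: cumulative observed minus expected
    zbar = np.sum(z * p)                     # weighted mean of the Z_j
    w2 = np.sum(z**2 * p) / n
    u2 = np.sum((z - zbar)**2 * p) / n
    return w2, u2

# Hypothetical observed counts, chosen only for illustration (they sum to 50).
counts = np.array([9, 7, 6, 5, 5, 4, 4, 4, 3, 3])
p_null = np.full(10, 0.1)                    # uniform null over ten cells

# Rotate the cells so that a different cell is labelled "first" each time.
results = [w2_and_u2(np.roll(counts, s), np.roll(p_null, s)) for s in range(10)]
for shift, (w2, u2) in enumerate(results):
    print(shift, round(w2, 4), round(u2, 4))
```

Under these definitions U² is unchanged by the rotation because moving the starting cell shifts every Z_j by the same constant, which the centring by the weighted mean removes, whereas W² is not centred and so varies with the starting cell.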

Figure 6. Powers for uniform null and triangular (A₃) alternative.

Figure 7. Powers for uniform null and platykurtic (A₄) alternative.

3.5. Uniform Null with a Leptokurtic (A₅) Alternative

As was also the case in Section 3.4, the cumulative differences between the uniform null and the leptokurtic A₅ alternative are quite small for earlier cells, and the low powers of A² and W² in Figure 8 show this to be true for smaller sample sizes. The powers of U² and χ² are shown to be approximately equal for all sample sizes. It appears that, due to its circular nature, U² is more able to detect the large cumulative differences which occur at the middle cells.

Figure 8. Powers for uniform null and leptokurtic (A₅) alternative.

3.6. Uniform Null with a Bimodal (A₆) Alternative

The powers of the test statistics are shown in Figure 9 to be quite diverse. The power of χ² is shown to be approximately double those of the other test statistics for smaller sample sizes. Although the power of U² is quite low, it is still much larger than the very weak powers of A² and W².

Figure 9. Powers for uniform null and bimodal (A₆) alternative.
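Several of the explanations in this section rest on where the cumulative differences between the null and alternative cell probabilities are largest. As a small illustration, the sketch below computes those cumulative differences for a uniform null over ten cells and a step-type alternative. The step probabilities (0.15 for the first five cells, 0.05 for the remaining five) and the function name are assumptions made for the example, chosen to match the shape described in Section 3.2.

```python
import numpy as np

def cumulative_differences(p_alt, p_null):
    """Cumulative sums of (alternative - null) cell probabilities."""
    p_alt = np.asarray(p_alt, dtype=float)
    p_null = np.asarray(p_null, dtype=float)
    return np.cumsum(p_alt - p_null)

p_null = np.full(10, 0.1)                      # uniform null over ten cells
p_step = np.array([0.15] * 5 + [0.05] * 5)     # assumed step-type alternative

diffs = cumulative_differences(p_step, p_null)
peak_cell = int(np.argmax(np.abs(diffs))) + 1  # 1-based index of the largest |difference|

print(np.round(diffs, 2))
print("largest cumulative difference at cell", peak_cell)
```

For these assumed probabilities the cumulative difference grows over the first five cells and then shrinks back to zero at the last cell, which is the pattern the text associates with higher power for A² and W².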

4. CONCLUSIONS

Although it is not possible to recommend one of these test statistics as being the most powerful for all situations, a very broad summary of the simulated powers in this paper suggests that, particularly for smaller sample sizes:

For trend type alternatives A² and W² appear much more powerful than U² and χ².

For all the other investigated alternative distributions U² and χ² appear much more powerful than A² and W².

Importantly, when considering the power of the test statistic, the simulated results presented in this and other papers suggest that the applied researcher should not blindly use one particular test statistic. However, the broad summary above may assist an applied researcher to at least consider alternatives to the χ² test when testing whether their observed ordinal data differ from those expected under a multinomial null distribution.

5. REFERENCES

Choulakian, V., Lockhart, R.A. and Stephens, M.A. (1994), Cramér-von Mises statistics for discrete distributions, The Canadian Journal of Statistics, 22(1994) 125-137.

From, S.G. (1996), A new goodness of fit test for the equality of multinomial cell probabilities versus trend alternatives, Communications in Statistics-Theory and Methods, 25(1996) 3167-3183.

Pearson, K. (1900), On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, Philosophical Magazine Series 5, 50(1900) 157-175.

Steele, M., Chaseling, J. and Hurst, C. (2005), A power study of goodness-of-fit tests for categorical data, 55th Session of the International Statistical Institute, Proceedings, Sydney, Australia.

Steele, M. (2002), The power of categorical goodness-of-fit test statistics, PhD thesis, Griffith University, Brisbane, Australia.