Nonparametric Goodness-of-Fit Tests for Discrete, Grouped or Censored Data

Boris Yu. Lemeshko, Ekaterina V. Chimitova and Stepan S. Kolesnikov

Novosibirsk State Technical University
Department of Applied Mathematics
Karl Marx 20, 630092 Novosibirsk, Russia
(e-mail: headrd@fpm.ami.nstu.ru)

Abstract. This paper considers the application of the nonparametric Kolmogorov, Cramer-von Mises-Smirnov and Anderson-Darling goodness-of-fit tests to discrete, grouped and censored data. The use of these tests for grouped and censored data, as well as for samples of discrete random variables, is based on the Smirnov transformation. The convergence of the statistic distributions to the corresponding limiting distribution laws under a true null hypothesis, and the test power against close competing hypotheses, have been investigated by means of statistical simulation. For discrete and grouped data the tests have been compared in power with the Pearson chi-square test. For censored samples they have also been compared in power with the modified nonparametric tests.

Keywords: Goodness-of-fit tests; discrete, grouped, censored data; Smirnov transformation; Kolmogorov test, Cramer-von Mises-Smirnov test, Anderson-Darling test.

1 Introduction

In the case of discrete or grouped data, simple hypotheses about the goodness-of-fit of an empirical distribution to a theoretical law can be tested without evident problems only if $\chi^2$ goodness-of-fit tests are used. Direct application of the Kolmogorov, $\omega^2$ Cramer-von Mises-Smirnov or $\Omega^2$ Anderson-Darling tests is impossible, because the limiting distributions of these statistics are obtained under the assumption that the random variable is continuous.

For testing simple goodness-of-fit hypotheses from right and/or left censored samples one can use the Renyi test [Renyi, 1953] or the modified Kolmogorov-Smirnov [Barr and Davidson, 1973], $\omega^2$ Cramer-von Mises-Smirnov or $\Omega^2$ Anderson-Darling [Pettitt and Stephens, 1976] tests. However, in the case of censored data these tests have a number of disadvantages that hinder their application in practice.
This research was supported by the Russian Foundation for Basic Research, project no. 06-01-00059.

In particular, the distribution of the Renyi statistic converges to the limiting law very slowly, especially for a high or, on the contrary, a low censoring degree [Lemeshko and Chimitova, 2004]. The distributions of the modified Kolmogorov-Smirnov, Cramer-von Mises-Smirnov and Anderson-Darling statistics converge rather quickly
to the corresponding limiting laws for a small censoring degree [Lemeshko and Chimitova, 2004]. Tests for censored data have been implemented in almost none of the statistical analysis software systems known to us, and hence they are hardly available to a large number of specialists.

M. Nikulin drew our attention to the possibility of effectively applying nonparametric goodness-of-fit tests to grouped and censored data and to samples of discrete random variables by means of the Smirnov transformation and randomization, which make it possible to move from a stepwise, discontinuous distribution function to a continuous one [Greenwood and Nikulin, 1996]. The advantage of this approach is evident: the problem reduces to testing the goodness-of-fit of the empirical distribution obtained after the transformation to the continuous (uniform) distribution law.

The Smirnov transformation is used rather often in statistical analysis. Suppose we test whether the random sample $X_1, X_2, \dots, X_n$ corresponds to the law with distribution function $F(x)$. The transformation $U_i = F(X_i)$, $i = 1, \dots, n$, converts the observed sample $X_1, X_2, \dots, X_n$ into a sample of values uniformly distributed on the interval $[0, 1]$. The hypothesis that $U_1, U_2, \dots, U_n$ belong to the uniform law can then be tested, for example, using the Kolmogorov test with the statistic

$$D_n = \sup_{0 \le u \le 1} |F_n(u) - u|, \tag{1}$$

where $F_n(u)$ is the empirical distribution function.

Randomization, as a technique for converting grouped and censored data and observations of discrete variables into observations of a continuous variable, is practical only in computer analysis.

The purpose of the paper is to investigate some practical aspects of applying classical goodness-of-fit tests to grouped and censored data and to observations of discrete variables when the Smirnov transformation with randomization is used. We study the convergence of the statistic distributions to the corresponding limiting laws, as well as the power of the considered tests against close competing hypotheses.
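As a minimal sketch of the transformation and the Kolmogorov distance described above, the following Python fragment maps a sample to $[0, 1]$ through a hypothesised distribution function and computes $D_n$. The exponential law, the sample size and the function names (`smirnov_transform`, `kolmogorov_distance`, `exp_cdf`) are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def smirnov_transform(sample, cdf):
    """Map each observation X_i to U_i = F(X_i)."""
    return [cdf(x) for x in sample]


def kolmogorov_distance(u):
    """D_n = sup |F_n(u) - u| for values u in [0, 1].

    The supremum is attained at a jump of the empirical distribution
    function, so it is enough to check both sides of every order statistic.
    """
    v = sorted(u)
    n = len(v)
    d_plus = max((i + 1) / n - x for i, x in enumerate(v))
    d_minus = max(x - i / n for i, x in enumerate(v))
    return max(d_plus, d_minus)


def exp_cdf(x, rate=1.0):
    """Distribution function of the exponential law (assumed hypothesis)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(200)]  # data drawn under H0
u = smirnov_transform(sample, exp_cdf)
d_n = kolmogorov_distance(u)
```

When the hypothesis is true, the values in `u` behave like a uniform sample, so `d_n` is small and shrinks roughly as $1/\sqrt{n}$.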
2 Grouped and discrete data

Suppose we test a simple hypothesis about the goodness-of-fit of a grouped sample to the theoretical distribution law $F(x)$. A grouped sample of size $n$ is given by the boundary points of the intervals $x_0 < x_1 < \dots < x_{k-1} < x_k$, where $k$ is the number of intervals, $x_0$ and $x_k$ are the left and right boundaries of the random variable domain, respectively, and by the numbers of observations $n_i$ that fall into the $i$-th
interval, $\sum_{i=1}^{k} n_i = n$. Assume that $Y_{ij}$ ($i = 1, \dots, k$, $j = 1, \dots, n_i$) are independent realizations of a random variable uniformly distributed on $[0, 1]$. Then the random variables obtained by randomization on the grouping intervals $(x_{i-1}, x_i]$,

$$U_{ij} = F(x_{i-1}) + Y_{ij}\,[F(x_i) - F(x_{i-1})], \quad i = 1, \dots, k, \quad j = 1, \dots, n_i, \tag{2}$$

are independent and uniformly distributed on $[0, 1]$. Statement (2) makes it possible [Greenwood and Nikulin, 1996] to move from a grouped sample to a complete sample of individual observations uniformly distributed on $[0, 1]$. After that one can test the simple hypothesis about the goodness-of-fit of the empirical distribution, built from the sample of values $U_{ij}$, $i = 1, \dots, k$, $j = 1, \dots, n_i$, to the uniform distribution using any nonparametric goodness-of-fit test.

A sample of observations $X_1, X_2, \dots, X_n$ of some discrete random variable can, similarly to the grouped case, be transformed into a sample of uniformly distributed observations

$$U_i = F(X_i - 0) + Y_i\,[F(X_i) - F(X_i - 0)], \quad i = 1, \dots, n, \tag{3}$$

where $F(x - 0) = \lim_{z \to +0} F(x - z)$ and $Y_1, Y_2, \dots, Y_n$ are independent realizations of a random variable uniformly distributed on $[0, 1]$. In the randomization, the values $Y_{ij}$ and $Y_i$ in (2) and (3) have to be simulated in accordance with the uniform distribution on $[0, 1]$.

In [Lemeshko and Postovalov, 2001] it was shown that, for continuous distribution laws and complete samples, the distributions of nonparametric goodness-of-fit test statistics converge to the corresponding limiting laws very quickly. The limiting laws can already be used for sample sizes $n$ [...] without risk of making a large error.

The distributions of nonparametric goodness-of-fit test statistics have been investigated for discrete random variables and for grouped samples of continuous values using the approach considered. It has been shown that the empirical distributions of the nonparametric test statistics also converge very fast to the corresponding limiting laws as the sample size grows. For example, Figure 1 shows the limiting Kolmogorov law $K(S)$ and the empirical distribution $G(K_n | H_0)$ of the Kolmogorov test statistic obtained by simulation.
The true hypothesis $H_0$ under test is about goodness-of-fit to the normal law. The empirical distribution is built from $N$ = [...] grouped samples of size $n$ = [...] with $k$ = [...] grouping intervals, using the asymptotically optimal grouping method.
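The randomization in (2) and (3) and a simulation of this kind can be sketched in Python as follows. The grouping boundaries, the binomial example, the sample size, the number of replications and all function names are assumptions chosen for illustration. The check uses the plain statistic $\sqrt{n} D_n$ with the asymptotic 5% point 1.3581 of the Kolmogorov law, not the exact setup of the simulation reported here.

```python
import bisect
import math
import random


def normal_cdf(x):
    """Standard normal distribution function (hypothesised F for grouped data)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binom_cdf(k, m=10, p=0.3):
    """Binomial(m, p) distribution function (hypothesised F for discrete data)."""
    if k < 0:
        return 0.0
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j)
               for j in range(min(k, m) + 1))


def group_counts(sample, inner):
    """Counts n_i of observations in (x_{i-1}, x_i]; inner = [x_1, ..., x_{k-1}]."""
    counts = [0] * (len(inner) + 1)
    for x in sample:
        counts[bisect.bisect_left(inner, x)] += 1
    return counts


def randomize_grouped(boundaries, counts, cdf, rng):
    """Eq. (2): U_ij = F(x_{i-1}) + Y_ij * [F(x_i) - F(x_{i-1})]."""
    p = [cdf(x) for x in boundaries]  # F(x_0), ..., F(x_k)
    u = []
    for i, n_i in enumerate(counts, start=1):
        lo, hi = p[i - 1], p[i]
        u.extend(lo + rng.random() * (hi - lo) for _ in range(n_i))
    return u


def randomize_discrete(sample, cdf, rng):
    """Eq. (3) for integer-valued data, where F(x - 0) = F(x - 1)."""
    return [cdf(x - 1) + rng.random() * (cdf(x) - cdf(x - 1)) for x in sample]


def scaled_kolmogorov(u):
    """sqrt(n) * D_n for values in [0, 1]."""
    v = sorted(u)
    n = len(v)
    d_n = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(v))
    return math.sqrt(n) * d_n


rng = random.Random(2024)
inner = [-1.5, -0.5, 0.5, 1.5]                 # hypothetical grouping, k = 5
boundaries = [-math.inf] + inner + [math.inf]  # x_0 and x_k bound the domain
n, n_rep = 100, 500                            # hypothetical sizes

rejections = 0
for _ in range(n_rep):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true
    counts = group_counts(sample, inner)              # only counts are kept
    u = randomize_grouped(boundaries, counts, normal_cdf, rng)
    rejections += scaled_kolmogorov(u) > 1.3581       # asymptotic 5% point
rejection_rate = rejections / n_rep

# Discrete case: binomial observations randomized through eq. (3).
discrete_sample = [sum(rng.random() < 0.3 for _ in range(10)) for _ in range(50)]
u_discrete = randomize_discrete(discrete_sample, binom_cdf, rng)
```

Because the randomized values are exactly uniform under a true hypothesis, `rejection_rate` should stay near the nominal 0.05 whatever grouping is chosen.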
As the Kolmogorov test statistic we have used the statistic with Bolshev's correction

$$K_n = \frac{6 n D_n + 1}{6 \sqrt{n}}, \tag{4}$$

where $D_n = \max\{D_n^+, D_n^-\}$, $D_n^+ = \max_{1 \le i \le n} \left\{ \frac{i}{n} - F(X_{(i)}) \right\}$ and $D_n^- = \max_{1 \le i \le n} \left\{ F(X_{(i)}) - \frac{i - 1}{n} \right\}$.

Fig. 1. The empirical distribution function of statistic (4) and the limiting Kolmogorov distribution law

The empirical distribution of the Kolmogorov statistic fits the Kolmogorov law $K(S)$ very closely even for $n$ = [...]. This is also confirmed by the achieved significance levels $P\{S > S^*\}$ obtained when testing the hypothesis that the sample of values of statistic (4) fits the Kolmogorov distribution $K(S)$ with the $\chi^2$ Pearson, $\omega^2$ Cramer-von Mises-Smirnov, $\Omega^2$ Anderson-Darling and Kolmogorov tests, where $S^*$ is the value of the corresponding goodness-of-fit test statistic.

Similar results on the convergence of statistic distributions to the limiting laws for grouped data and discrete random variables have been obtained for the Cramer-von Mises-Smirnov and Anderson-Darling tests. It has also been
shown that the rate of convergence of $G(S | H_0)$ to the limiting law of the statistic $S$ does not depend on the grouping method or on the number of grouping intervals $k$.

3 Censored data

Let $X_1, X_2, \dots, X_n$ be a sample of independent identically distributed random variables. A set of values $X_{(1)} \le X_{(2)} \le \dots \le X_{(r)}$ ($X_{(n-r+1)} \le X_{(n-r+2)} \le \dots \le X_{(n)}$) is called a right (left) censored sample, where $r < n$ is the number of complete observations and the remaining $n - r$ observations are censored.

Modifications of the nonparametric Kolmogorov-Smirnov, Cramer-von Mises-Smirnov and Anderson-Darling tests for testing goodness-of-fit from censored samples are introduced in [Barr and Davidson, 1973] and [Pettitt and Stephens, 1976]. In particular, the Kolmogorov statistic for censored data is defined by

$$K_n^a = \sqrt{n}\, \sup_{x \in M} |F(x) - F_n(x)|,$$

where $M = \{x : F(x) \ge a\}$ for left censoring and $M = \{x : F(x) \le 1 - a\}$ for right censoring, and $a \in (0, 1)$ is the censoring degree. The limiting distribution of the Kolmogorov statistic $K_n^a$ for censored data is given in [Barr and Davidson, 1973] as

$$\lim_{n \to \infty} P\{K_n^a < S\} = \sum_{i=-\infty}^{+\infty} (-1)^i \exp(-2 i^2 S^2)\, P\left\{ \left| X - 2 i S \sqrt{\frac{a}{1-a}} \right| < \frac{S}{\sqrt{a(1-a)}} \right\} = K_a(S),$$

where $X$ is a standard normal random variable. When $a = 0$ the limiting distribution of the statistic $K_n^a$ coincides with the Kolmogorov distribution $K(S)$.

As before, it is possible to move from a censored sample to a sample of random variables $U_1, U_2, \dots, U_n$ uniformly distributed on $[0, 1]$. In the case of right censoring we have $U_1 = F(X_{(1)})$, $U_2 = F(X_{(2)})$, ..., $U_r = F(X_{(r)})$, and the values $U_{r+1}, \dots, U_n$ are simulated uniformly on the interval $[F(x_c), 1]$, where $x_c$ is the censoring point. With type I censoring the point $x_c$ is fixed and the number of complete observations $r$ is random. With type II censoring the last (first) observed value in the sample is taken as $x_c$. The classical Kolmogorov, Cramer-von Mises-Smirnov and Anderson-Darling tests can then be applied to the transformed sample.

The empirical distributions of statistic (4) and of the modified Kolmogorov statistic for the censored sample are presented in Figure 2. The corresponding limiting distributions are given in the figure for comparison. The statistic values are
calculated from right-censored samples from the exponential distribution with sample size $n$ = [...] and censoring degree 80% (the right part of the random variable domain is inaccessible to observation, and the probability of falling into it is $a = 0.8$). The empirical distribution of the Kolmogorov statistic $K_n$, calculated from the transformed samples, agrees very closely with the limiting law $K(S)$ already for $n$ = [...]. At the same time the empirical distribution of the modified Kolmogorov statistic $K_n^a$, applied directly to censored samples of the same size, can differ essentially from the limiting law $K_a(S)$.

Fig. 2. The distributions of the Kolmogorov test statistic in testing goodness-of-fit to the exponential law in the case of $n$ = [...] and censoring degree 80%

The distributions of the statistic $K_n$ (with the Smirnov transformation and randomization) have been investigated for different types and degrees of censoring. It has been shown that the rate of convergence of the empirical distributions $G(K_n | H_0)$ to $K(S)$ does not depend on the type or degree of censoring. Similar results have been obtained for the $\omega^2$ Cramer-von Mises-Smirnov and $\Omega^2$ Anderson-Darling tests.

The empirical distributions $G(K_n^a | H_0)$ agree with the limiting law $K_a(S)$ rather well beginning with $n$ = 3 only when the censoring degree is less than 50% ($a < 0.5$). If the censoring degree increases up to 95%, sufficient closeness of
$G(K_n^a | H_0)$ to $K_a(S)$ takes place if $n$ [...] 5 [Lemeshko and Chimitova, 2004].

4 Some remarks on the test power

There is no doubt that the conclusions obtained in [Lemeshko et al., 2007] concerning the comparative analysis of test power against close competing hypotheses also hold for grouped samples. For censored data it is worth comparing the power of the classical tests applied to the transformed data with the power of the tests modified for censored samples [Barr and Davidson, 1973], [Pettitt and Stephens, 1976]. For example, the power of the modified Kolmogorov test depends essentially on the censoring degree. By means of statistical modeling we have shown that the higher the censoring degree, the more the modified Kolmogorov test exceeds in power the Kolmogorov test with the Smirnov transformation and randomization. For small censoring degrees (approximately up to 3%) these tests are close in power.

Fig. 3. The distributions of the Kolmogorov statistic under the true hypotheses $H_0$ and $H_1$

Figure 3 shows two cases of the distribution of the modified Kolmogorov statistic applied to the censored sample and two cases of the distribution of the test statistic calculated from the transformed sample $U_1, U_2, \dots, U_n$. In the first case the hypothesis $H_0$, the Weibull distribution with shape parameter 3, is true; in the second case the competing hypothesis $H_1$, the Weibull
distribution with shape parameter 3.5, is true. The sample size is $n$ = 3, with type II right censoring and censoring degree $a = 0.5$.

5 Conclusion

The results of the investigation show that the approach considered (the Smirnov transformation with randomization) makes it possible to apply classical nonparametric goodness-of-fit tests correctly to grouped and censored data and to samples of discrete random variables. In the case of simple hypothesis testing, the distributions of the nonparametric statistics converge to their limiting distributions very quickly. For sample sizes $n$ [...] one can use the limiting laws without risk of making a large error.

The influence of grouping methods on the power of nonparametric goodness-of-fit tests should be investigated in more detail.

The Smirnov transformation with randomization is straightforward to implement in statistical analysis software. It extends the applicability of the classical nonparametric goodness-of-fit tests to grouped data and discrete random variables.

References

[Barr and Davidson, 1973] Barr D.M., Davidson T. A Kolmogorov-Smirnov test for censored samples // Technometrics. 1973. Vol. 15. No. 4.
[Greenwood and Nikulin, 1996] Greenwood P.E., Nikulin M.S. A Guide to Chi-Squared Testing. John Wiley & Sons, Inc. 1996. 280 p.
[Lemeshko and Chimitova, 2004] Lemeshko B.Yu., Chimitova E.V. Investigation of the estimates properties and goodness-of-fit test statistics from censored samples with computer modeling technique // Proceedings of the Seventh International Conference "Computer Data Analysis and Modeling: Robustness and Computer Intensive Methods", September 6-10, 2004, Minsk. Vol. 1. P. 143-146.
[Pettitt and Stephens, 1976] Pettitt A.N., Stephens M.A. Modified Cramer-von Mises statistics for censored data // Biometrika. 1976. Vol. 63. No. 2.
[Renyi, 1953] Renyi A. On the theory of order statistics // Acta Mathem. Acad. Sci. Hung. 1953. Vol. 4. P. 191-231.
[Lemeshko and Postovalov, 2001] Lemeshko B.Yu., Postovalov S.N. On the dependence of nonparametric test statistic distributions and the test power on parameter estimation method // Zavodskaya Laboratoriya.
Diagnostika materialov. 2001. Vol. 67. No. 7. P. 62-71. (in Russian)
[Lemeshko et al., 2007] Lemeshko B.Yu., Lemeshko S.B., Postovalov S.N. Power goodness-of-fit tests at close alternatives // Izmeritelnaya Technika. 2007. No. 2. P. 22-27. (in Russian)