An Algorithm of a Longest of Runs Test for Very Long Sequences of Bernoulli Trials

Alexander I. KOZYNCHENKO
Faculty of Science, Technology, and Media, Mid Sweden University, SE-851 70, Sundsvall, Sweden
alexander_kozynchenko@yahoo.se

Abstract

A new algorithm for computing the statistics of a longest of runs test is proposed for the case of equal-probability Bernoulli trials processes. The algorithm is founded on the analysis of the event tree diagram, which has shown the role of Fibonacci numbers of higher orders in counting the number of outcomes of interest in the sample space. The proof by induction is given. Compared to the classical combinatorial formulas, the proposed algorithm provides error-free exact probabilities and makes possible the processing of very long binomial data sets, up to $n = …3$, on contemporary computers.

Keywords: runs tests, longest run, Bernoulli trials, Fibonacci numbers, computing algorithms

Mathematics Subject Classifications: 62G10; 62-04; 11B39.

1. Introduction

Distribution-free tests for randomness of sample data play an important role in much statistical inference and are relevant to many applications in sociology, biology, psychology, engineering etc., including such particular problems as regression and
curve fitting. There is a great body of literature on the subject, of which the books by Siegel and Castellan, [1], and Sprent [2] are worthy of mention. This paper is concerned with the computational aspects of an important distribution-free runs test, namely, the longest of runs test of randomness applied to long patterns of binomial trials. Amongst a number of publications on the investigation of runs tests, it is worth mentioning the paper by Mood, [3], and the monograph by Bradley, [4], who gave a detailed treatment of runs tests, as well as the appropriate survey of the works done in the 1940s-60s by Olmstead, Mosteller, Grant, Burr and Cane, et al.

The existing runs tests are based on either numbers of runs or lengths of runs. The total-number-of-runs test provides both exact combinatorial formulas and asymptotic ones under the assumption of normally distributed statistics for large samples (see, e.g., [4], p. …6). However, for the longest of runs tests there is no such asymptotic theory, and we have to use the exact combinatorial formulas. So, the question arises as to whether those formulas are applicable for computing the statistics on contemporary computers in the case of large samples, or whether it is necessary to derive a more adequate theory applicable to processing long samples.

2. Analysis of the classical combinatorial formulas for the longest of runs test

The conventional approach to deriving a general formula for the probability $P_{?,s}$ of obtaining at least one run of length $s$ or greater among either the 1s or the 0s had been described in [3]. It is to be noted that "either" includes the possibility
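of both. The approach is based on the formula for the probability of a sum of compatible random events (formula 1), which expresses $P_{?,s}$ through the probabilities $P_{1,s}$, $P_{0,s}$ and $P_{1\,\mathrm{and}\,0,s}$, where the subscripts 1, 0 and ? refer to runs of 1s, of 0s, and of an unspecified type of element containing the run, respectively; $P_{1,s}$ is the probability of obtaining at least one run of length $s$ or greater among the 1s but not among the 0s; $P_{0,s}$ is the probability of obtaining at least one such run among the 0s, but not among the 1s; $P_{1\,\mathrm{and}\,0,s}$ is the probability of obtaining at least one such run among both the 1s and the 0s.

Suppose that a sequence of $n$ trials contains $n_1$ 1s and $n_0$ 0s. In this case, the probabilities can be computed from the following combinatorial formulas:

[Formulas 1-4 (not legible in this copy): formula 1 for $P_{?,s}$, and the combinatorial expressions 2, 3 and 4 for $P_{1,s}$, $P_{0,s}$ and $P_{1\,\mathrm{and}\,0,s}$ in terms of $n_1$, $n_0$ and $s$; see [3] and the code in Appendix A.]

These run formulas take $n_1$ and $n_0$ as given. But in the case of Bernoulli trials, when 1 and 0 are mutually exclusive outcomes with probabilities $p$ and $q$ respectively of occurrence on a single trial, it would be convenient to eliminate the parameters $n_1$ and $n_0$. The extension where $n_1$ and $n_0$ are not fixed, so that the probability is completely irrespective of $n_1$ and $n_0$ and depends only on $n$ and $p$, is described in [3]. The compound probability is obtained by taking the product of the binomial probability $\binom{n}{n_1} p^{n_1} q^{n-n_1}$ and the probability computed by formulas 1-4. The sum of that product over all possible values of $n_1$ gives the sought-for probability:

$$P_{?,s}(n, p) = \sum_{n_1=0}^{n} \binom{n}{n_1}\, p^{n_1} q^{\,n-n_1}\, P_{?,s}(n_1,\, n-n_1). \qquad (5)$$

Evidently, it is worthwhile developing a computing algorithm in order to check the correctness and to evaluate the performance of this formula. The author has created a C++ program that computes the probability $P_{?,s}(n, p)$ using the formulas 1-5. The code is placed in Appendix A. A number of computing tests has been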
accomplished, and the analysis of the results has revealed two drawbacks of the formula 5.

First of all, it gives a systematic error that manifests itself in the expression 4. Let us consider, for instance, the case of $n = 8$, $s = 4$, $p = 0.5$. The computations by the formula 5 give the probability $P_{?,s}(n, p) = 0.375$ and the number of outcomes of interest $N = 96$ (the number of all possible outcomes equals $2^8 = 256$). However, the correct values are 0.3671875 and 94, respectively. The reason is the formula 4, which gives an erroneous zero value for the probability $P_{1\,\mathrm{and}\,0,s}$, whereas the correct value is $2^{-7}$, which corresponds to the two outcomes of interest available: 11110000 and 00001111.

In order to check the classical formulas further, the author developed a brute-force algorithm based on the breadth-first search technique. It gives the exact solutions, but has the exponential computation time $t = O(2^n)$ and therefore cannot be applied to samples of length $n > …4$. The comparison of the results obtained by this algorithm with those of the classical formulas discloses that the classical formulas give a regular positive error when $s \le 0.5n$. This evidently confirms the fallacy of the formula 4, since it is exactly the paths with two or more runs of length $s$ to which the formula 4 is applied.

Secondly, the computing tests have revealed an upper limit on the length of binomial sequences that can be processed on contemporary PCs equipped, e.g., with the AMD Athlon 64 X2 Dual Core processor 4600+. This limit amounts to $n = …8$ for $s < …$ and $p = 0.5$. Such a restriction does not actually allow processing very long binomial sequences where the adequate power of runs tests could be attained.
3. Description of the proposed algorithm using the Fibonacci numbers, its proof, and performance

The longest of runs test of a Bernoulli trials process can be analysed using a binary tree diagram as the standard technique of representing the sample space and counting probabilities, [5]. The paths with outcomes of interest contain at least one run of length $s$ or greater. They all are indicated in Fig. 1, where solid and dotted lines mean success (or, say, 1) and failure (or 0) of a chance experiment, correspondingly.

[Fig. 1. The half of an event tree diagram showing the number of the runs of length $s = 4$ in the sequence of Bernoulli trials of length $n = 8$. Horizontal axis: number of a trial, $j$; labelled nodes: A, B, C, D, E, F; the levels $j \ge 4$ are annotated with the Fibonacci numbers of order $s - 1$: 1, 1, 2, 4, 7.]

Some paths containing a run of length $s$ at the end are shown completely from the root to a leaf (e.g., ABDF), whereas the others are pooled into clusters of paths having both an initial common explicit part that ends with a run of length $s$ and a subsequent arbitrary sub-tree (see, e.g., clusters ABC or ABDE).
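We will consider the particular and most important case of a Bernoulli trials process with equal probabilities of successes and failures of a chance experiment, $p = q = 0.5$. Here, the probabilities of all outcomes are the same, being equal to $0.5^n$ for $n$ Bernoulli trials. Hence, in order to compute the probability $P_{?,s}$ of obtaining at least one run of length $s$ or greater among either the 1s or the 0s we need to calculate the number of outcomes of interest. In the case $n = 8$, $s = 4$ depicted in Fig. 1, this number can be estimated as follows:

$$N_{?,4}(8,\ p = 0.5) = 2\left(2^4\cdot 1 + 2^3\cdot 1 + 2^2\cdot 2 + 2^1\cdot 4 + 7\right) = 94, \qquad (6)$$

where the first term, $2^4\cdot 1$, corresponds to the cluster ABC, the second one, $2^3\cdot 1$, corresponds to ABDE, and so on until the term 7 that relates to the individual paths not including sub-trees, as ABDF. As we can see, the factors 1, 1, 2, 4, 7 form a part without zeros of the sequence of Fibonacci numbers of 3rd order.

Having analysed the tree diagrams for other $n$, $s$ cases, the general formula for arbitrary $n$, $s < n$ is derived:

$$N_{?,s}(n,\ p = 0.5) = 2\left(2^{n-s}F_{1,s-1} + 2^{n-s-1}F_{2,s-1} + 2^{n-s-2}F_{3,s-1} + \dots + 2^{n-s-i}F_{i+1,s-1} + \dots + F_{n-s+1,s-1}\right), \qquad (7)$$

where $F_{i+1,s-1}$ is the $(i+1)$th Fibonacci number of order $s-1$, counted from the first nonzero term of the sequence.

So, the formula for the sought-for probability is derived from 7 by dividing it by the number of all possible outcomes $2^n$:

$$P_{?,s}(n,\ p = 0.5) = \sum_{i=0}^{n-s} \frac{F_{i+1,s-1}}{2^{\,s+i-1}}. \qquad (8)$$

The formula 7 can be proved by mathematical induction: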
1. The basis: the formula 7 holds when $n = s$. Indeed, in this case 7 gives us the correct number, two, of paths of length $s$ that contain a run of length $s$:

$$N_{?,s}(s,\ p = 0.5) = 2F_{1,s-1} = 2.$$

2. The inductive step: suppose that the formula 7 holds for some $n$. We need to prove that the formula 7 also holds when $n+1$ is substituted for $n$. Let us write down the formula 7 for $n+1$:

$$N_{?,s}(n+1,\ p = 0.5) = 2\,N_{?,s}(n,\ p = 0.5) + 2F_{n-s+2,\,s-1}.$$

The first summand of this expression gives the number of those outcomes of interest for $n+1$ binary trials which are generated from the ends of all paths existing at the $n$th level of a tree diagram. These outcomes are represented by clusters of paths at the $(n+1)$st level (see Fig. 1). The second summand gives the number of the single outcomes of interest appearing at the $(n+1)$st level. These new outcomes belong to the paths having only one run of the length $s$, which is situated at the end of the path. These terminating runs originate at the paths having the same feature (only one run of the length $s$, at the end of the path) at the preceding levels $n, n-1, \dots$ The total number of these generating paths equals the sum of the numbers at levels $n, n-1, \dots, n-s+2$. As we can see from the formula 7 for the $n$th level written using the Horner scheme,

$$N_{?,s}(n,\ p = 0.5) = 2\Big(F_{n-s+1,s-1} + 2\big(F_{n-s,s-1} + 2\,(F_{n-s-1,s-1} + \dots + 2\,(F_{3,s-1} + 2\,(F_{2,s-1} + 2F_{1,s-1}))\dots)\big)\Big),$$

the abovementioned numbers equal the Fibonacci numbers $F_{n-s+2-j,\,s-1}$, $j = 1, \dots, s-1$. This means that the factor $F_{n-s+2,\,s-1}$ of the second summand equals the sum of Fibonacci numbers $\sum_{j=1}^{s-1} F_{n-s+2-j,\,s-1}$ and, therefore, is indeed a Fibonacci number of order $s-1$ by definition. That is, the inductive step is proven.

The analysis of the computation performance of the formula 8 has been carried out using the C++ code given in Appendix B. First of all, the program calculates the related Fibonacci numbers, which are placed into the array fib. In so doing the computing algorithm takes into account the inner structure of a Fibonacci numbers sequence, which contains an initial sub-sequence of numbers being powers of 2. Several Fibonacci numbers sequences are listed in Table A; in the original table the initial sub-sequences are marked by a grey background colour.

Table A. Fibonacci numbers $F_{i+1,s-1}$ of order $s-1$

    s \ i    0   1   2   3   4    5    6    7    8    9   10
    3        1   1   2   3   5    8   13   21   34   55   89
    4        1   1   2   4   7   13   24   44   81  149  274
    5        1   1   2   4   8   15   29   56  108  208  401
    6        1   1   2   4   8   16   31   61  120  236  464
    7        1   1   2   4   8   16   32   63  125  248  492
    8        1   1   2   4   8   16   32   64  127  253  504
    9        1   1   2   4   8   16   32   64  128  255  509
The second part of the program computes the probability $P_{?,s}(n,\ p = 0.5)$ by the formula 8. The algorithm and code are able to make calculations on contemporary PCs for very long Bernoulli trials sequences, up to $n = …3$. The results obtained are depicted in Fig. 2 and can be used to test for randomness of a pattern of Bernoulli trials with $p = q = 0.5$ (null hypothesis).

[Fig. 2. Probability $P_{?,s}(n,\ p = 0.5)$ of obtaining at least one run of length $s$ or greater among either the 1s or the 0s, plotted against the size $n$ of the sequence of Bernoulli trials.]

For example, let us consider a sequence of 1s and 0s of length $n = …6$ containing a run of length $s = …4$, and test the null hypothesis under the significance level $\alpha = …5$. Consulting Fig. 2, one can find that the null hypothesis should be rejected. If we increase the number of trials up to $n = …$, the chance probability that a sequence of Bernoulli trials with $p = q = 0.5$ would contain a run of $…4$ or more consecutive either
1s or 0s is about $…6$, so the null hypothesis cannot be rejected at the given significance level.

The Fibonacci numbers of higher orders are used in other Bernoulli trials related problems, such as coin tossing (see, e.g., [6]), where the probability that no runs of $k$ consecutive tails will occur in $n$ coin tosses is given by $F^{(k)}_{n+2}/2^n$, where $F^{(k)}_l$ is a Fibonacci $k$-step number ($k$th order).

4. Summary

In the paper, a new powerful approach to the longest of runs test is described, which can effectively replace the classical combinatorial formulas in the particular, but important, case of equal-probability Bernoulli trials processes. This approach is based on a thorough analysis of the event tree diagram, which suggested deriving a concise formula for the probability of obtaining at least one run of length $s$ or greater among either the 1s or the 0s. The derived formula extensively uses the Fibonacci numbers of higher orders. The formula proves to be capable of processing very long dichotomous sequences, up to $n = …3$, as compared to $n = …8$ for the classical combinatorial approach. The correctness of the results obtained was checked by a breadth-first search algorithm, and complete coincidence has been shown. The side result of the paper lies in revealing a regular error inherent in the classical combinatorial algorithm in some cases.

5. Acknowledgements

The author would like to thank Prof. Wej-M Huag for his comments and suggestions that led to improvements in the paper.
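Appendix A

    // The C++ code developed for computing the statistics of the
    // classical longest of runs test:
    #include <iostream>
    #include <cmath>
    #include <iomanip>
    using namespace std;

    // n!
    double Factorial(int n)
    {
        double t = 1;
        for (int i = 1; i <= n; ++i) t *= i;
        return t;
    }

    // Binomial coefficient C(n, r); zero when n < r
    double C(int n, int r)
    {
        if (n < r) return 0;
        double t = 1;
        for (int i = n; i >= n - r + 1; --i) t *= i;
        return t / Factorial(r);
    }

    // Formula 5: sum over n1 (the number of 1s) of the run counts
    // weighted by p^n1 * q^(n - n1)
    double Prob(int n, int s, double p = 0.5, double q = 0.5)
    {
        double prob = 0;
        int i;
        for (int n1 = 0; n1 <= n; ++n1) {
            double prob1 = 0, prob2 = 0, prob12 = 0;
            // runs of length >= s among the 1s
            for (i = 1; i <= n1 / s; ++i)
                prob1 += pow(-1, i + 1) * C(n - n1 + 1, i) * C(n - i * s, n - n1);
            // runs of length >= s among the 0s
            for (i = 1; i <= (n - n1) / s; ++i)
                prob2 += pow(-1, i + 1) * C(n1 + 1, i) * C(n - i * s, n1);
            // runs among both the 1s and the 0s (formula 4)
            for (int r = 1; r <= n1 - s + 1; ++r) {
                double a1 = 0, a2 = 0, a3 = 0, a4 = 0;
                // ... four alternating sums a1..a4 of products of binomial
                // coefficients; their loop bounds and arguments are not
                // legible in this copy of the listing ...
                prob12 += a1 * (a2 + 2 * a3 + a4);
            }
            prob += (prob1 + prob2 - prob12) * pow(p, n1) * pow(q, n - n1);
        }
        return prob;
    }

    int main()
    {
        int n = 8, s = 4;
        double prob = Prob(n, s);
        cout.setf(ios::fixed);
        // field widths and precisions are partly illegible in the source;
        // the values used here are assumed
        cout << "n = " << n << " " << "s = " << s << endl
             << "count = " << setw(16) << setprecision(0) << prob * pow(2, n) << endl
             << "prob = " << setw(24) << setprecision(14) << prob << endl;
        return 0;
    }

Appendix B

    // The C++ code for the proposed longest of runs test algorithm
    // using the Fibonacci numbers:
    #include <algorithm>
    #include <cmath>
    #include <iomanip>
    #include <iostream>
    #include <vector>
    using namespace std;

    double RunsFib(int n, int s)
    {
        // The formula 7 needs 2 <= s <= n; check before sizing the array
        if (s < 2 || s > n) {
            cout << "error" << endl;
            return -1;
        }
        // fib[i] holds F_{i+1,s-1}, i = 0..n-s
        vector<double> fib(n - s + 1, 0.0);
        double p = 0;
        fib[0] = 1;
        if (n - s >= 1) fib[1] = 1;
        // initial sub-sequence: powers of 2
        for (int i = 2; i < s - 1 && i < n - s + 1; ++i)
            fib[i] = pow(2.0, i - 1);
        // remaining terms: sum of the s-1 preceding numbers
        for (int i = max(s - 1, 2); i <= n - s; ++i)
            for (int j = 0; j < s - 1; ++j)
                fib[i] += fib[i - j - 1];
        // formula 8
        for (int i = 0; i <= n - s; ++i)
            p += fib[i] * pow(0.5, s + i);
        p *= 2;
        cout.setf(ios::fixed);
        cout << "n = " << n << " s = " << s << endl;
        // precision partly illegible in the source; 14 digits assumed
        cout << "prob = " << setprecision(14) << p << endl;
        return p;
    }

    int main()
    {
        int n = 8, s = 4;
        RunsFib(n, s);
        return 0;
    }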
References

[1] Siegel, S., Castellan, N.J., Jr., 1988, Nonparametric Statistics for the Behavioral Sciences, 2nd ed. New York: McGraw-Hill.
[2] Sprent, P., 1993, Applied Nonparametric Statistical Methods, 2nd ed. London: Chapman & Hall.
[3] Mood, A.M., 1940, The Distribution Theory of Runs, Annals of Mathematical Statistics, 11, 367-392.
[4] Bradley, J.V., 1968, Distribution-Free Statistical Tests. Englewood Cliffs, New Jersey: Prentice Hall.
[5] Grinstead, C.M., Snell, J.L., 1997, Introduction to Probability, 2nd rev. ed. American Mathematical Society.
[6] http://mathworld.wolfram.com/Fibonaccin-StepNumber.html