The exact confidence limits for unknown probability in Bernoulli models

R.I. Andrushkiw
Department of Mathematical Sciences and Center for Applied Mathematics and Statistics, New Jersey Institute of Technology, Newark, NJ 07102, USA

D.A. Klyushin, Yu.I. Petunin
Department of Cybernetics, Kyiv National Taras Shevchenko University, Kyiv, Ukraine

M.Yu. Savkina
Institute of Mathematics, National Academy of Sciences of Ukraine, Kyiv, Ukraine

CAMS Report 45-5, Spring 2005
Center for Applied Mathematics and Statistics

Abstract

The application of mathematical-statistical models in medical diagnostics often requires the construction of an "exact" confidence interval for the unknown probability $p$ of success in Bernoulli models (the so-called binomial proportion, or proportion of a population). This problem has been considered in a number of papers (see, for example, [1-5] and the references cited there), and the BioMed Central website lists numerous citations devoted to this theme. The purpose of our paper is to construct an "exact" confidence interval for the unknown probability $p$ of success in the classical and generalized Bernoulli models.

Keywords: probability, exact confidence interval, Bernoulli models

The Setting

Consider the following test of homogeneity for two populations. Let $G_1$ and $G_2$ be general populations with unknown continuous distribution functions $F_1(u)$ and $F_2(u)$, respectively. Let $x = (x_1, \dots, x_n)$ be a sample from $G_1$ and $y = (y_1, \dots, y_m)$ be a sample from $G_2$. We want to test whether the unknown distribution functions $F_1(u)$ and $F_2(u)$ are the same (hypothesis $H_0$) or not (hypothesis $H_1$). If the hypothesis $H_0$ is true, we have a homogeneous composite sample $x_1, \dots, x_n, y_1, \dots, y_m$; otherwise the composite sample is heterogeneous.

For this purpose, introduce the variational series $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, where $x_{(0)} = -\infty$ and $x_{(n+1)} = +\infty$, and consider a random interval $I_{iq} = (x_{(i)}, x_{(i+q)})$ with $i$ and $q$ fixed numbers, $i \ge 0$, $q \ge 1$, $i + q \le n + 1$. The scheme of trials is formulated in the following way: at the $k$th step ($k = 1, \dots, m$) we test whether the sample value $y_k$ belongs to the interval $I_{iq}$ and obtain a set of events $A_k = \{y_k \in I_{iq}\}$, $k = 1, \dots, m$, where every event can occur with a certain probability $p_k = P(A_k)$, $k = 1, \dots, m$. Let us introduce a random variable $\kappa$ equal to the number of events $A_k$ that occur in the $m$ trials. If the hypothesis $H_0$ is true, then all the probabilities $p_k$ are the same and equal to

\[ p_q = P(A_k \mid H_0) = \frac{q}{n+1}. \tag{1} \]

This scheme is called the generalized Bernoulli model. In the papers [6, 7] the probability distribution of the random variable $\kappa$ was determined:

\[ P(\kappa = l \mid H_0) = \frac{C_{l+q-1}^{\,l}\; C_{n+m-l-q}^{\,m-l}}{C_{n+m}^{\,m}}, \qquad l = 0, 1, \dots, m, \tag{2} \]

where $n$ and $m$ are the sizes of the samples $x$ and $y$, respectively, $q$ is a fixed number equal to the number of order statistics in the interval $I_{iq}$, and $C_r^s$ is the number of combinations of $r$ elements taken $s$ at a time.

The purpose of our paper is to construct the exact confidence interval $I(\kappa)$ containing the probability (1) on the basis of the value of the random variable $\kappa$. The word "exact" means that the significance level of this confidence interval does not exceed a given number $\beta$ (as a rule, $\beta = 0.05$). With the help of this interval it is possible to propose the following test of the hypotheses $H_0$ and $H_1$:

1) On the basis of the sample $x$, construct the variational series and take a random interval $I_{iq}$, where $i$ and $q$ are fixed numbers.
2) Calculate the statistic $\kappa$, which is equal to the number of elements of the sample $y$ that fall into the interval $I_{iq}$.
3) On the basis of the statistic $\kappa$, construct the confidence interval $I(\kappa)$ with the significance level $\beta$.
4) If the interval $I(\kappa)$ does not cover the probability $p_q$, the hypothesis $H_0$ is rejected; otherwise this hypothesis is not rejected.

Construction of Exact Confidence Interval

Let us construct the interval $I(\kappa)$. Let $\xi$ be an arbitrary integer discrete random variable with the distribution $p_k = p(\xi = k)$, $k = 0, 1, \dots, n$. This random variable generates the function $\varphi(k)$ defined on the set $M = \{0, 1, \dots, n\}$ by the formula $\varphi(k) = p_k$. Consider an arbitrary segment $I = \{k : k_1 \le k \le k_2\} \subset M$. We call the segment $I$ a domain of monotonicity of the function $\varphi(k)$ if $\varphi$ is monotone on $I$, that is, the condition $i \le k$ ($i, k \in I$) implies $\varphi(i) \le \varphi(k)$, or it implies $\varphi(i) \ge \varphi(k)$. The integer random variable $\xi$ is called unimodal if its range $M$ can be represented as a union of one or two domains of monotonicity of the function $\varphi(k)$. For example, a random variable with the binomial distribution and a random variable with the distribution (2) are unimodal.

Remark. The concept of a unimodal integer discrete random variable, when $n > 1$, differs from the concept of a unimodal random variable proposed by A. Khinchin. Indeed, according to Khinchin, the distribution function $F(u)$ of a unimodal random variable $\xi$ is convex on the ray $(-\infty, \nu)$, where $\nu$ is a mode of $\xi$, and concave on the ray $(\nu, +\infty)$. Therefore $F(u)$ can have no more than one break point, whereas a discrete random variable has a staircase distribution function; hence, if $n > 1$, the number of break points is more than two.

In addition to the integer discrete random variable $\xi$ with the distribution $p_k = p(\xi = k)$, $k = 0, 1, \dots, n$, let us consider a continuous random variable $\tilde\xi$ with the following probability density:

\[ f(u) = \begin{cases} 0, & u < 0, \\ p_k, & k \le u < k + 1, \quad k = 0, 1, \dots, n, \\ 0, & u \ge n + 1. \end{cases} \]

We shall call $\tilde\xi$ the inducing continuous random variable. The random variable $\tilde\xi$ induces the random variable $\xi$ by means of the function $\xi = \operatorname{Int}(\tilde\xi)$, where $\operatorname{Int}$ maps every value $u \in \mathbb{R}$ to its integer part $\operatorname{Int}(u) = [u]$.

The equality in the above formula has the following meaning. Denote by $E_\xi$ a random trial producing the random variable $\xi$ with the given probability function $p(k)$, and denote by $E_z$ an independent random trial producing a random variable $z$ uniformly distributed on $[0, 1]$. In the compound random trial $E = E_\xi \times E_z$, the random variable $\tilde\xi = \xi + z$ has the density function $f(u)$, and the random variable $\operatorname{Int}(\tilde\xi)$ takes the value $k$ if and only if $\xi = k$ as a result of the trial $E_\xi$. Therefore we can consider that the compound random trial $E$ produces random values $\xi$ and $\tilde\xi$ such that not only the distributions of $\xi$ and $\operatorname{Int}(\tilde\xi)$ are the same, but also the values of $\xi$ and $\operatorname{Int}(\tilde\xi)$ are the same.

Let $\xi$ be a random variable with the binomial distribution and $\tilde\xi$ be the continuous random variable inducing $\xi$. In the Bernoulli model, the mathematical expectation $m(\tilde\xi)$ and the variance $\sigma^2(\tilde\xi)$ are as follows:

\[ m(\tilde\xi) = \int_{-\infty}^{+\infty} u f(u)\,du = \int_{0}^{n+1} u f(u)\,du = \sum_{k=0}^{n} p_k \int_{k}^{k+1} u\,du = \sum_{k=0}^{n} p_k\,\frac{(k+1)^2 - k^2}{2} = \sum_{k=0}^{n} k\,p_k + \frac{1}{2}\sum_{k=0}^{n} p_k = np + \frac{1}{2}; \]
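The definitions above can be checked numerically. The following Python sketch uses exact rational arithmetic to confirm, for sample parameters, that the binomial distribution and the distribution (2) are unimodal in the sense defined here, and that the inducing variable has mean $np + 1/2$. The function names (`binomial_pmf`, `generalized_pmf`, `is_unimodal`, `induced_mean`, `int_part`) and the parameter values are illustrative assumptions, not part of the paper, and the unimodality check covers only the "non-decreasing, then non-increasing" case.

```python
from fractions import Fraction
from math import comb, floor


def binomial_pmf(n, p):
    """p_k = C(n, k) p^k (1 - p)^(n - k) for k = 0..n (exact if p is a Fraction)."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]


def generalized_pmf(n, m, q):
    """Distribution (2): P(kappa = l | H0) for l = 0..m; assumes 1 <= q <= n."""
    total = comb(n + m, m)
    return [Fraction(comb(l + q - 1, l) * comb(n + m - l - q, m - l), total)
            for l in range(m + 1)]


def is_unimodal(pmf):
    """True if pmf is non-decreasing up to some index and non-increasing after it."""
    k = 0
    while k + 1 < len(pmf) and pmf[k] <= pmf[k + 1]:
        k += 1
    while k + 1 < len(pmf) and pmf[k] >= pmf[k + 1]:
        k += 1
    return k == len(pmf) - 1


def induced_mean(pmf):
    """Mean of the inducing variable: sum of p_k * integral of u over [k, k + 1]."""
    return sum(pk * (k + Fraction(1, 2)) for k, pk in enumerate(pmf))


def int_part(u):
    """Int(u): integer part of a non-negative real u."""
    return floor(u)


if __name__ == "__main__":
    n, p = 12, Fraction(3, 10)            # illustrative values only
    b = binomial_pmf(n, p)
    g = generalized_pmf(n=10, m=8, q=3)   # illustrative values only
    print(sum(b) == 1, is_unimodal(b), induced_mean(b) == n * p + Fraction(1, 2))
    print(sum(g) == 1, is_unimodal(g))
    # xi~ = xi + z with 0 <= z < 1 always has integer part xi
    print(all(int_part(k + z) == k for k in range(n + 1) for z in (0.0, 0.5, 0.99)))
```

Rational arithmetic is used so that the identities hold exactly rather than up to rounding error.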

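Similarly,

\[ m(\tilde\xi^2) = \int_{0}^{n+1} u^2 f(u)\,du = \sum_{k=0}^{n} p_k \int_{k}^{k+1} u^2\,du = \sum_{k=0}^{n} p_k \left(k^2 + k + \frac{1}{3}\right) = m(\xi^2) + m(\xi) + \frac{1}{3} = \left(\sigma^2(\xi) + (m(\xi))^2\right) + np + \frac{1}{3} = npq + (np)^2 + np + \frac{1}{3}, \]

\[ \sigma^2(\tilde\xi) = m(\tilde\xi^2) - (m(\tilde\xi))^2 = npq + \frac{1}{12}, \]

where $q = 1 - p$.

In the generalized Bernoulli model we have (here $m(\cdot)$ denotes the mathematical expectation and $m$ the size of the sample $y$)

\[ m(\kappa) = m p_q, \qquad \sigma^2(\kappa) = \frac{m(m+n+1)}{n+2}\, p_q (1 - p_q). \]

Therefore, for the continuous random variable $\tilde\kappa$ inducing $\kappa$ in the generalized Bernoulli model,

\[ m(\tilde\kappa) = m p_q + \frac{1}{2}, \qquad \sigma^2(\tilde\kappa) = \frac{m(m+n+1)}{n+2}\, p_q (1 - p_q) + \frac{1}{12}. \]

Consider an arbitrary fixed confidence interval $[a, b]$ containing the bulk of the distribution of $\tilde\xi$ with significance level $\alpha$, that is, $p\{\tilde\xi \notin [a, b]\} \le \alpha$. Since $\operatorname{Int}(u)$ is a non-decreasing function, the random event $\tilde A = \{\tilde\xi \in [a, b]\} = \{a \le \tilde\xi \le b\}$ implies the random event $A = \{\operatorname{Int}(a) \le \xi = \operatorname{Int}(\tilde\xi) \le \operatorname{Int}(b)\}$. Therefore the significance level of the closed confidence interval $[\operatorname{Int}(a), \operatorname{Int}(b)]$ for the bulk of the distribution of $\xi$ does not exceed $\alpha$. Moreover, since $a - 1 < \operatorname{Int}(a)$ and $\operatorname{Int}(b) \le b$, we have $[\operatorname{Int}(a), \operatorname{Int}(b)] \subset [a - 1, b]$, and hence $p\{\xi \in [\operatorname{Int}(a), \operatorname{Int}(b)]\} \le p\{\xi \in [a - 1, b]\}$. Therefore the significance level of the confidence interval $[a - 1, b]$ for the bulk of the distribution of $\xi$ also does not exceed $\alpha$:

\[ p\{\xi \notin [a - 1, b]\} \le \alpha. \]

It is easy to see that the integer discrete random variable $\xi$ is unimodal if and only if the inducing continuous random variable $\tilde\xi$ is unimodal in the sense of Khinchin. For such random variables the Gauss-Vysochanskij-Petunin inequality holds [8]:

\[ p\{|\tilde\xi - m| \ge \lambda\sigma\} \le \frac{4}{9\lambda^2}, \qquad \lambda > \sqrt{\frac{8}{3}}, \]

where $m = m(\tilde\xi)$ and $\sigma = \sigma(\tilde\xi)$. Therefore the significance level of the confidence interval $[m - \lambda\sigma,\; m + \lambda\sigma]$ covering the bulk of the distribution of $\tilde\xi$ does not exceed $\alpha = \dfrac{4}{9\lambda^2}$. In particular, when $\lambda = 3$ we have $\alpha = \dfrac{4}{81} < 0.05$.

In the case of the classical Bernoulli model, put

\[ a = m - \lambda\sigma = np + \frac{1}{2} - \lambda\sqrt{npq + \frac{1}{12}}, \qquad b = m + \lambda\sigma = np + \frac{1}{2} + \lambda\sqrt{npq + \frac{1}{12}}. \]

By the previous reasoning, the confidence interval

\[ I = [a - 1,\; b] = \left[\, np - \frac{1}{2} - \lambda\sqrt{npq + \frac{1}{12}},\;\; np + \frac{1}{2} + \lambda\sqrt{npq + \frac{1}{12}} \,\right] \]

covers the bulk of the random variable $\xi$ with the binomial distribution, i.e. it has a significance level which does not exceed $\alpha = \dfrac{4}{9\lambda^2}$ when $\lambda > \sqrt{8/3}$. The random event $\{\xi \in I\}$ can be rewritten in the following form:

\[ |\xi - np| \le \lambda\sqrt{npq + \frac{1}{12}} + \frac{1}{2}. \]

Thus, in the Bernoulli model, with $h = \xi / n$ the observed proportion of successes,

\[ p\left\{ |h - p| \le \frac{\lambda}{n}\sqrt{npq + \frac{1}{12}} + \frac{1}{2n} \right\} \ge 1 - \alpha. \]

To construct the confidence interval for the unknown probability $p$ on the basis of the proportion $h$ in the Bernoulli model consisting of $n$ trials, consider two functions depending on $p \in [0, 1]$:

\[ \varphi(p) = |h - p| \qquad \text{and} \qquad \psi(p) = \frac{\lambda}{n}\sqrt{np(1 - p) + \frac{1}{12}} + \frac{1}{2n}. \]

Let $\psi_0(p) = \sqrt{np(1 - p) + \frac{1}{12}}$. It is easy to see that the graph of the function $\psi_0(p)$ is the upper half of the ellipse $E$ given by $n\left(p - \frac{1}{2}\right)^2 + y^2 = \frac{n}{4} + \frac{1}{12}$, which passes through the points

\[ A = \left(\frac{1 + \sqrt{1 + \frac{1}{3n}}}{2},\, 0\right), \quad B = \left(\frac{1}{2},\, \sqrt{\frac{n}{4} + \frac{1}{12}}\right), \quad C = \left(\frac{1 - \sqrt{1 + \frac{1}{3n}}}{2},\, 0\right), \quad D = \left(\frac{1}{2},\, -\sqrt{\frac{n}{4} + \frac{1}{12}}\right) \]

and has its center at the point $\left(\frac{1}{2}, 0\right)$. The graph of $\psi(p)$ is obtained from the restriction of the graph of $\psi_0(p)$ to the segment $[0, 1]$ by stretching or compressing it by the factor $\frac{\lambda}{n}$ and shifting it by $\frac{1}{2n}$. Therefore the graph of the function $\psi(p)$, which does not depend on $h$, is an arc $\Gamma_\psi$ of an ellipse passing through the points $(0, \psi(0))$ and $(1, \psi(1))$; the function $\psi(p)$ achieves its maximum at the point $p = \frac{1}{2}$ and is symmetrical with respect to that point.

The lower confidence limit $p_1$ is a root of the quadratic equation

\[ \left(1 + \frac{\lambda^2}{n}\right) p^2 - \left(\frac{\lambda^2}{n} + 2h - \frac{1}{n}\right) p + h^2 - \frac{h}{n} + \frac{1}{4n^2} - \frac{\lambda^2}{12 n^2} = 0. \tag{3} \]

If $h > \psi(0) = \dfrac{\lambda}{2\sqrt{3}\,n} + \dfrac{1}{2n}$, then the lower confidence limit $p_1$ is the least root of (3). If $h \le \psi(0)$, then $p_1 = 0$. Similarly, the upper confidence limit $p_2$ is a root of the equation

\[ \left(1 + \frac{\lambda^2}{n}\right) p^2 - \left(\frac{\lambda^2}{n} + 2h + \frac{1}{n}\right) p + h^2 + \frac{h}{n} + \frac{1}{4n^2} - \frac{\lambda^2}{12 n^2} = 0. \tag{4} \]

If $1 - h > \psi(1)$, then the upper confidence limit $p_2$ is the largest root of (4). If $1 - h \le \psi(1)$, then $p_2 = 1$.

Remark. Note that $p_1 \le h \le p_2$, so that the proportion of successes always lies in the confidence interval $[p_1, p_2]$.

For the generalized Bernoulli model, similar reasoning with $h = \kappa / m$ gives the following quadratic equation for the lower confidence limit:

\[ \left(1 + \frac{(m+n+1)\lambda^2}{m(n+2)}\right) p^2 - \left(\frac{(m+n+1)\lambda^2}{m(n+2)} + 2h - \frac{1}{m}\right) p + h^2 - \frac{h}{m} + \frac{1}{4m^2} - \frac{\lambda^2}{12 m^2} = 0. \tag{5} \]

If $h > \dfrac{\lambda}{2\sqrt{3}\,m} + \dfrac{1}{2m} = \gamma$, then the lower confidence limit $p_1$ for the generalized Bernoulli model is the least root of (5). If $h \le \gamma$, then $p_1 = 0$. Similarly, the upper confidence limit $p_2$ for the generalized Bernoulli model is a root of the quadratic equation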

\[ \left(1 + \frac{(m+n+1)\lambda^2}{m(n+2)}\right) p^2 - \left(\frac{(m+n+1)\lambda^2}{m(n+2)} + 2h + \frac{1}{m}\right) p + h^2 + \frac{h}{m} + \frac{1}{4m^2} - \frac{\lambda^2}{12 m^2} = 0. \tag{6} \]

If $1 - h > \gamma$, then the upper confidence limit $p_2$ is the largest root of (6). If $1 - h \le \gamma$, then $p_2 = 1$. By virtue of the previous results, the significance level of the confidence interval $[p_1, p_2]$ does not exceed $\dfrac{4}{9\lambda^2}$ (in particular, $0.05$ for $\lambda = 3$).

References

[1] Petunin Yu.I., Klyushin D.A., Andrushkiw R.I., Ganina K.P., Boroday N.V. Computer-Aided Differential Diagnosis of Breast Cancer and Fibroadenomatosis Based on Malignancy Associated Changes in Buccal Epithelium. Automedica ; 9(-4): 5-64.
[2] Brown L.D., Cai T.T., DasGupta A. Interval Estimation for a Binomial Proportion. Stat Sci 2001; 16(2): 101-133.
[3] Petunin Yu.I., Klyushin D.A., Andrushkiw R.I., Ganina K.P., Boroday N.V. Analysis of Malignancy-Associated DNA Changes in the Nuclei of Buccal Epithelium in the Pathology of the Thyroid and Mammary Glands. Annals of the New York Academy of Sciences ; 98: -.
[4] Yoon S., David H. Revisiting Clopper-Pearson. Technical Report -5, Department of Statistics and Statistical Laboratory, Iowa University.
[5] Andrushkiw R.I., Klyushin D.A., Petunin Yu.I., Lysyuk V., Boroday N.V. Diagnosis of Breast Cancer by the Modified Nearest Neighbor Recognition Method. In: F. Valafar, editor. Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences; Jun 4-7; Las Vegas, Nevada, USA; p. 76-89.
[6] Matveychuk S.A., Petunin Yu.I. Generalized Bernoulli schemes in variance statistics. Part I. Ukr Mat Journal 99; 4(4): 58-58.
[7] Matveychuk S.A., Petunin Yu.I. Generalized Bernoulli schemes in variance statistics. Part II. Ukr Mat Journal 99; 4(6): 779-785.
[8] Vysochanskij D.F., Petunin Yu.I. Justification of the 3σ rule for unimodal distributions. Theory Probab and Math Stat 989; : 5-6.
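The confidence limits defined by equations (3)-(6) can be computed directly. The following Python sketch is one possible way to do so, assuming the form $\psi(p) = \frac{\lambda}{N}\sqrt{N s\, p(1-p) + \frac{1}{12}} + \frac{1}{2N}$, where $N$ is the number of trials and $s = 1$ in the classical model or $s = \frac{m+n+1}{n+2}$ (with $N = m$) in the generalized model. The function names `confidence_limits` and `reject_homogeneity`, the parameter names `trials` and `scale`, and the numerical inputs are hypothetical and not part of the paper.

```python
import math


def confidence_limits(h, trials, lam=3.0, scale=1.0):
    """Return (p1, p2) for an observed proportion h out of `trials` trials.

    scale = 1 gives equations (3)-(4); scale = (m + n + 1) / (n + 2) with
    trials = m gives equations (5)-(6).
    """
    if lam <= math.sqrt(8.0 / 3.0):
        raise ValueError("the bound 4 / (9 lam^2) requires lam > sqrt(8/3)")
    mu = lam ** 2 * scale / trials
    # gamma = psi(0) = psi(1): threshold below which a limit sticks to 0 or 1
    gamma = lam / (2.0 * math.sqrt(3.0) * trials) + 1.0 / (2.0 * trials)

    def root(c, largest):
        # Roots of (1 + mu) p^2 - (mu + 2c) p + c^2 - lam^2 / (12 trials^2) = 0,
        # with c = h - 1/(2 trials) (lower limit) or c = h + 1/(2 trials) (upper).
        a = 1.0 + mu
        b = -(mu + 2.0 * c)
        c0 = c * c - lam ** 2 / (12.0 * trials ** 2)
        r = math.sqrt(b * b - 4.0 * a * c0)
        return (-b + r) / (2.0 * a) if largest else (-b - r) / (2.0 * a)

    p1 = root(h - 0.5 / trials, largest=False) if h > gamma else 0.0
    p2 = root(h + 0.5 / trials, largest=True) if 1.0 - h > gamma else 1.0
    return p1, p2


def reject_homogeneity(kappa, n, m, q, lam=3.0):
    """Step 4 of the test: True if p_q = q / (n + 1) lies outside [p1, p2]."""
    h = kappa / m
    scale = (m + n + 1) / (n + 2)
    p1, p2 = confidence_limits(h, m, lam, scale)
    p_q = q / (n + 1)
    return not (p1 <= p_q <= p2)


if __name__ == "__main__":
    # Classical model: 30 successes in 100 trials (illustrative numbers)
    print(confidence_limits(0.3, 100))
    # Generalized model: n = m = 50, q = 25, kappa = 25 observed hits
    print(reject_homogeneity(25, 50, 50, 25))
```

Solving the squared equations and then picking the least root for $p_1$ and the largest root for $p_2$ mirrors the selection rule stated with equations (3)-(6); the sketch does not attempt to validate its inputs beyond the condition on $\lambda$.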