British Journal of Mathematics & Computer Science 10(2): 1-7, 2015, Article no.bjmcs.19191
ISSN: 2231-0851
SCIENCEDOMAIN international
www.sciencedomain.org

Estimation of Binomial Distribution in the Light of Future Data

Kunio Takezawa (1*)

(1) Agroinformatics Division, Agricultural Research Center, National Agriculture and Food Research Organization, Kannondai 3-1-1, Tsukuba, Ibaraki 305-8666, Japan.

Article Information
DOI: 10.9734/BJMCS/2015/19191
Editors: (1) H. M. Srivastava, Department of Mathematics and Statistics, University of Victoria, Canada.
Reviewers: (1) P. E. Oguntunde, Department of Mathematics, Covenant University, Nigeria. (2) José Alejandro González Campos, Department of Mathematics and Statistics, Playa Ancha University, Chile.
Complete Peer review History: http://sciencedomain.org/review-history/10077

Short Research Article
Received: 29 May 2015
Accepted: 15 June 2015
Published: 07 July 2015

Abstract

A predictive estimator for estimating the parameter of the binomial distribution is suggested. This estimator aims to maximize the expectation of the expected log-likelihood. The results given by this estimator are superior to those given by the maximum likelihood estimator in terms of prediction when a little prior knowledge about the parameter is available.

Keywords: expected log-likelihood; binomial distribution; maximum likelihood estimator; optimization

2010 Mathematics Subject Classification: 60G25; 62F10; 62M20

1 Introduction

The maximum likelihood estimator and the unbiased estimator are common tools for estimating the parameters of a probability density function from data. However, the maximum likelihood estimator is known to give unacceptable estimates in some situations (e.g., page 343 in [1]). The unbiased estimator does not exist under certain conditions and, even if it exists, it may not be invariant (page 415 in [1]). In this paper, I propose a predictive estimator for estimating the parameter of the binomial distribution. The estimator aims to maximize the expectation of the expected log-likelihood, and hereafter is called the predictive estimator.
Because nothing is known about the form of the predictive estimator, I assume a very simple form. Therefore, predictive estimators with other forms may yield better results.

*Corresponding author: E-mail: nonpr@gmil.com

2 Predictive Estimator for Binomial Distribution

The probability density function of the binomial distribution is written as

$$ f(\xi) = \binom{n}{\xi} p^{\xi} (1-p)^{n-\xi} \quad (\xi = 0, 1, 2, \ldots, n), \tag{2.1} $$

where $p$ is the true value of the parameter and $\xi$ is the variable. The expectation of Eq. (2.1) can be written as

$$ \sum_{\xi=0}^{n} \xi f(\xi) = n p. \tag{2.2} $$

Assuming that $X$ is a random variable that obeys the probability density function $f(\xi)$ and its realization (i.e., data) is called $x$, then the log-likelihood $l(p|x)$ of the data is given as

$$ l(p|x) = \log\binom{n}{x} + x\log(p) + (n-x)\log(1-p). \tag{2.3} $$

To derive the $p$ that maximizes the above value, the value is differentiated with respect to $p$ and set equal to 0 as follows:

$$ \frac{x}{p} - \frac{n-x}{1-p} = 0. \tag{2.4} $$

Therefore, the maximum likelihood estimator $\hat{p}$ is obtained from the data at hand ($x$) as

$$ \hat{p} = \frac{x}{n}. \tag{2.5} $$

Next, future data are named $\{x_i^*\}$ ($1 \le i \le m$); future data come from another sampling than that for the data at hand ($x$). The log-likelihood $l(\hat{p}|\{x_i^*\})$ of $\hat{p}$ in the light of these data is represented as

$$ \frac{l(\hat{p}|\{x_i^*\})}{m} = \frac{1}{m}\sum_{i=1}^{m}\log\binom{n}{x_i^*} + \frac{1}{m}\sum_{i=1}^{m} x_i^* \log(\hat{p}) + \frac{1}{m}\sum_{i=1}^{m} (n - x_i^*) \log(1-\hat{p}). \tag{2.6} $$

The $\hat{p}$ that maximizes this value is termed $\hat{p}^*$ and is depicted as

$$ \hat{p}^* = \frac{\sum_{i=1}^{m} x_i^*}{mn}. \tag{2.7} $$

Because the number of future data is infinite, $m$ is set to be infinite. In this situation, the value of $\hat{p}^*$ is called $\hat{p}^{**}$ and Eq. (2.2) leads to

$$ \hat{p}^{**} = \lim_{m\to\infty} \frac{\sum_{i=1}^{m} x_i^*}{mn} = p. \tag{2.8} $$

Here, the maximum likelihood estimator for an infinite number of future data is the true value of the parameter (i.e., $p$). Substitution of Eq. (2.8) into Eq. (2.6) yields

$$ \lim_{m\to\infty} \frac{l(\hat{p}|\{x_i^*\})}{m} = \lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m}\log\binom{n}{x_i^*} + n p \log(\hat{p}) + n(1-p)\log(1-\hat{p}). \tag{2.9} $$

When the above equation is taken into account and the log-likelihood of $\hat{p}$ for an infinite number of future data is named $l^*(\hat{p})$, we have

$$ l^*(\hat{p}) = \lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m}\log\binom{n}{x_i^*} + n p \log(\hat{p}) + n(1-p)\log(1-\hat{p}). \tag{2.10} $$
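As a rough numerical check of Eqs. (2.5) and (2.8), the following Python sketch maximizes the log-likelihood of Eq. (2.3) on a grid and simulates the future-data estimate of Eq. (2.7) for growing m. The function names, the grid resolution, the seed, and the sample sizes are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def log_likelihood(p, x, n):
    """Binomial log-likelihood of Eq. (2.3) for data x out of n trials."""
    return (math.log(math.comb(n, x))
            + x * math.log(p)
            + (n - x) * math.log(1.0 - p))


def grid_mle(x, n, steps=1000):
    """Maximize Eq. (2.3) over the grid p = 1/steps, ..., (steps-1)/steps."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, x, n))


def future_data_estimate(p_true, n, m, seed=0):
    """Eq. (2.7): sum of m simulated future observations divided by m*n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(m):
        # One binomial draw, built from n Bernoulli trials.
        total += sum(1 for _ in range(n) if rng.random() < p_true)
    return total / (m * n)


if __name__ == "__main__":
    # Grid maximizer, to be compared with x / n from Eq. (2.5).
    print(grid_mle(x=3, n=10))
    # The simulated estimate should approach p_true as m grows (Eq. (2.8)).
    for m in (10, 1000, 20000):
        print(m, future_data_estimate(p_true=0.3, n=10, m=m))
```

The grid search is used only to make the maximization visible; in practice Eq. (2.5) gives the maximizer in closed form.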
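Because the first term on the right-hand side of Eq. (2.10) does not affect the derivation of $\hat{p}$, this term can be omitted and the log-likelihood, named $l^{**}(\hat{p})$, can be written as

$$ l^{**}(\hat{p}) = n p \log(\hat{p}) + n(1-p)\log(1-\hat{p}). \tag{2.11} $$

The $\hat{p}$ given by Eq. (2.5) may not be the optimal estimator in the light of future data; therefore, the optimal estimator for future data (i.e., the predictive estimator), here called $\hat{p}^+$, can be represented as

$$ \hat{p}^+ = \frac{x}{n+\alpha}, \tag{2.12} $$

where $\alpha$ is a constant. Hence, the log-likelihood of $\hat{p}^+$ for an infinite number of future data is

$$ l^{**}(\hat{p}^+) = n p \log\left(\frac{x}{n+\alpha}\right) + n(1-p)\log\left(1-\frac{x}{n+\alpha}\right). \tag{2.13} $$

By differentiating this equation with respect to $\alpha$ and setting it equal to 0, we obtain

$$ \alpha = \frac{x}{p} - n. \tag{2.14} $$

The resulting equation can be written as

$$ \hat{p}^+ = p, \tag{2.15} $$

which is an obvious relationship.

Next, let us consider the averaged value of $l^{**}(\hat{p}^+)$ (i.e., the expectation of $l^{**}(\hat{p}^+)$), which is given by sampling $x$ infinite times. This expectation is represented as

$$
\begin{aligned}
E_x\left[l^{**}(\hat{p}^+)\right] &= E_x\left[ n p \log\left(\frac{x}{n+\alpha}\right) + n(1-p)\log\left(1-\frac{x}{n+\alpha}\right)\right] \\
&= \frac{\displaystyle\sum_{\xi=1}^{n-1}\left( n p \log\left(\frac{\xi}{n+\alpha}\right) + n(1-p)\log\left(1-\frac{\xi}{n+\alpha}\right)\right)\binom{n}{\xi}p^{\xi}(1-p)^{n-\xi}}{\displaystyle\sum_{\xi=1}^{n-1}\binom{n}{\xi}p^{\xi}(1-p)^{n-\xi}}.
\end{aligned}
\tag{2.16}
$$

The range of summation on the right-hand side of the second line of this equation is from $\xi = 1$ through $\xi = n-1$ instead of $\xi = 0$ through $\xi = n$. This is because the possibility that $x$ takes the value of 0 or $n$ has to be excluded: $\log(x/(n+\alpha))$ diverges at $x = 0$, and $\log(1 - x/(n+\alpha))$ diverges at $x = n$ when $\alpha = 0$. As a result of this measure, the expectation is standardized by the constant (i.e., the denominator). Moreover, if $\alpha$ is set to be negative, the argument of $\log\left(1-\frac{x}{n+\alpha}\right)$ can be negative; hence, $\alpha$ has to be 0 or positive. It should be noted that the type of expectation used in Eq. (2.16) is found in the derivation of AIC (Akaike's information criterion). For example, $E_{G(x_n)}$ on page 55 in [2] is a similar expectation.

When $\alpha = 0$ is assumed in Eq. (2.16), it yields the expectation of expected log-likelihood of the maximum likelihood estimator. Therefore, if the value of Eq. (2.16) for $\alpha \neq 0$ is larger than its value for $\alpha = 0$, an $\alpha$ satisfying $\alpha \neq 0$ will give a better estimator. Then, $\Delta$ is defined as

$$
\begin{aligned}
\Delta &= E_x\left[l^{**}(\hat{p}^+)\right] - E_x\left[l^{**}(\hat{p})\right] \\
&= E_x\left[ n p \log\left(\frac{x}{n+\alpha}\right) + n(1-p)\log\left(1-\frac{x}{n+\alpha}\right)\right] - E_x\left[ n p \log\left(\frac{x}{n}\right) + n(1-p)\log\left(1-\frac{x}{n}\right)\right].
\end{aligned}
\tag{2.17}
$$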
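The expectation in Eq. (2.16) and the difference in Eq. (2.17) are finite sums, so they can be evaluated exactly. The following Python sketch is one possible way to do this; the names `expected_loglik`, `delta`, and `alpha` (the constant added to n in Eq. (2.12)) are labels chosen for this illustration, and the code is not the author's implementation.

```python
import math


def expected_loglik(alpha, n, p):
    """Eq. (2.16): average of the log-likelihood of x / (n + alpha)
    for infinite future data, taken over x = 1, ..., n - 1.

    The weights are binomial probabilities renormalized over 1..n-1,
    because x = 0 and x = n are excluded from the sum.
    """
    if alpha < 0:
        raise ValueError("alpha must be 0 or positive")
    numerator = 0.0
    denominator = 0.0
    for xi in range(1, n):
        weight = math.comb(n, xi) * p**xi * (1.0 - p) ** (n - xi)
        estimate = xi / (n + alpha)  # predictive estimator of Eq. (2.12)
        value = (n * p * math.log(estimate)
                 + n * (1.0 - p) * math.log(1.0 - estimate))
        numerator += value * weight
        denominator += weight
    return numerator / denominator


def delta(alpha, n, p):
    """Eq. (2.17): gain over the maximum likelihood estimator (alpha = 0)."""
    return expected_loglik(alpha, n, p) - expected_loglik(0.0, n, p)


if __name__ == "__main__":
    # Assumed example values: n = 5 and true parameter p = 0.5.
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, delta(alpha, n=5, p=0.5))
```

For n = 5 and p = 0.5, this sketch gives a gain of roughly 0.056 at alpha = 0.5, so the predictive estimator has the larger expectation of expected log-likelihood in that particular setting. The true parameter p enters the calculation directly, which is why some prior knowledge about p is needed to choose alpha.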

Figure 1: $\Delta$ given by Eq. (2.17) when n = 5. p = 0.1 (top left); p = 0.3 (top right); p = 0.5 (bottom left); and p = 0.7 (bottom right). This is calculated using Eq. (2.16).

When $\Delta$ is calculated assuming $n = 5$, $p = \{0.1, 0.3, 0.5, 0.7\}$, and $\alpha = \{0.1, 0.2, 0.3, \ldots, 2.0\}$, the results are as shown in Figure 1. When $n = 10$ is assumed and the other parameters remain the same as those used in Figure 1, the results are as shown in Figure 2. For example, if we know from the phenomenon generating the data that $n = 10$ and $0.1 \le p \le 0.5$, $\alpha$ should be set between 0 and 1.5. In this situation, setting $\alpha$ at 0 gives the maximum likelihood estimator. Hence, when $\alpha = 0.5$ or $\alpha = 1$ is used, for example, the averaged expectation of expected log-likelihood yielded by repeated samplings is larger than the averaged expectation given by the maximum likelihood estimator (Figure 2), which indicates that this predictive estimator is a more desirable estimator. To optimize the value of $\alpha$ exactly, detailed prior knowledge about the parameters or settings of the data is required.

Next, we consider the situation in which the number of successes obeys the binomial distribution and the number of failures also obeys the binomial distribution. Assume that $p_s$ is the probability of success, $p_f$ is the probability of failure, and $p_s + p_f = 1$. If we know that $n = 10$ and $0.1 \le p_s \le 0.5$, then setting $\alpha$ in Eq. (2.12) at 0.5 or 1 gives a larger value of the expectation of expected log-likelihood than is given by the maximum likelihood estimator. By contrast, because we know $0.5 \le p_f \le 0.9$, setting $\alpha$ in Eq. (2.12) at 0 (i.e., use of the maximum likelihood estimator) is a reasonable choice, and $\hat{p}_s^+ + \hat{p}_f^+ \neq 1$. Conventionally, we believe that when $p_s + p_f = 1$ holds, $\hat{p}_s + \hat{p}_f = 1$ is satisfied.
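The inequality for the success and failure estimates can be illustrated with a few lines of Python. This is a minimal sketch under assumed values (3 successes in n = 10 trials, a constant of 0.5 for the success side and 0 for the failure side); the function names are hypothetical and only mirror Eq. (2.12).

```python
def predictive_estimate(x, n, alpha):
    """Eq. (2.12): x / (n + alpha); alpha = 0 gives the MLE x / n."""
    if alpha < 0:
        raise ValueError("alpha must be 0 or positive")
    return x / (n + alpha)


def success_failure_estimates(successes, n, alpha_s, alpha_f):
    """Estimate p_s and p_f separately, each with its own constant."""
    failures = n - successes
    p_s = predictive_estimate(successes, n, alpha_s)
    p_f = predictive_estimate(failures, n, alpha_f)
    return p_s, p_f


if __name__ == "__main__":
    # Assumed data: 3 successes and 7 failures in n = 10 trials.
    p_s, p_f = success_failure_estimates(successes=3, n=10,
                                         alpha_s=0.5, alpha_f=0.0)
    print(p_s, p_f, p_s + p_f)
```

With these assumed values the two estimates are 3/10.5 and 7/10, whose sum is slightly below 1, whereas using a constant of 0 on both sides recovers a sum of 1.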

Figure 2: $\Delta$ given by Eq. (2.17) when n = 10. p = 0.1 (top left); p = 0.3 (top right); p = 0.5 (bottom left); and p = 0.7 (bottom right).

This inference is not obviously valid. When the purpose of the estimation is maximization of the expectation of expected log-likelihood, this relationship is no longer satisfied.

3 Conclusions

The Bayes estimator [3-8] is conventionally the most common tool for using prior knowledge to estimate the parameters of a probability density function. The predictive estimator method proposed here uses prior knowledge such as $0.1 \le p_s \le 0.5$ to make the expectation of expected log-likelihood larger than that given by the maximum likelihood estimator. In the case of the predictive estimator for the exponential distribution, the estimator given by multiplying the maximum likelihood estimator by $\left(1 - \frac{1}{n}\right)$ ($n$ is the number of data) yields a larger value for the expected log-likelihood than simply using the maximum likelihood estimator [9]. Therefore, even if no prior information about the true parameter is available, a predictive estimator that performs better than the maximum likelihood estimator is available. By contrast, our predictive estimator of the binomial distribution requires some prior knowledge about the parameters. However, a predictive estimator that does not need any prior knowledge of the parameters is still possible. Furthermore, if we develop a method of constructing an optimal predictive estimator depending on the prior knowledge of the parameters, we expect that such a predictive estimator will replace the maximum likelihood estimator. The same is true for the estimation of parameters of distributions other than the binomial distribution.

The MSE (mean squared error) is defined as below (page 330 in [1]):

$$ \mathrm{MSE} = E_{\theta}(W - \theta)^2, \tag{3.1} $$

where $\theta$ is the true parameter, $W$ is the estimator of the parameter, and $E_{\theta}$ is the expectation given by averaging the results of repeated samplings. As noted on page 332 of [1], MSE depends upon the true parameter; hence, some prior knowledge about the parameters is needed to estimate the parameters by minimizing MSE. In this regard, minimizing MSE is similar to maximizing the expectation of expected log-likelihood. Estimation of the variance of data by minimizing MSE, however, does not need such prior knowledge. This situation should be handled as an exception in the same way that the exponential distribution has an exceptional predictive estimator and the normal distribution has a third variance [10-11].

Moreover, methodologies based on predictive estimators have something in common with smoothing splines (e.g., sections 2 and 3 of [12], and section 3 of [13]). Currently, no definitive theory on the desirable form of the roughness penalty in smoothing splines is available; hence, we cannot confirm that the most commonly used roughness penalty is the optimal one. Diverse theories and numerical simulations, however, indicate that our common roughness penalty is valid from various perspectives. Further, we have not confirmed that the use of Eq. (2.12) as a predictive estimator is appropriate or optimal. However, Fig. 1 and Fig. 2 show that when $\alpha$ is set in a valid range, the predictive estimator defined in Eq. (2.12) outperforms the maximum likelihood estimator. Therefore, although we cannot deny the possibility that better predictive estimators exist, tentatively Eq. (2.12) with an appropriate value of $\alpha$ can be used. That is, setting an appropriate value of $\alpha$ corresponds to setting appropriate values for the smoothing parameters in smoothing splines.
Thus, the use of a predictive estimator such as Eq. (2.12) can be justified in the same sense that the use of smoothing splines is justified, even though it has not been proven that smoothing splines lead to better results than those obtained by all the other estimation methods. The characteristics of the predictive estimator should now be evaluated for various estimations; in particular, the predictive estimator should be compared with the maximum likelihood estimator and the unbiased estimator. Such studies will help establish major techniques for using data more efficiently.

Acknowledgements

The author is very grateful to the referees for carefully reading the paper and for their comments and suggestions, which have improved the paper.

Competing Interests

The author declares that no competing interests exist.

References

[1] Casella G, Berger RL. Statistical inference. 2nd ed. Pacific Grove (CA): Duxbury Press; 2001.
[2] Konishi S, Kitagawa G. Information criteria and statistical modeling. New York: Springer; 2008.
[3] Robert CP. The Bayesian choice: From decision-theoretic foundations to computational implementation. 2nd ed. Springer; 2007.
[4] Carlin BP, Louis TA. Bayesian methods for data analysis. 3rd ed. Chapman & Hall/CRC; 2008.
[5] O'Hagan A, Forster J. Kendall's advanced theory of statistics. Bayesian inference. 1st ed. Wiley. 2010;2B.
[6] Lee PM. Bayesian statistics: An introduction. Wiley; 2012.
[7] Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian data analysis. 3rd ed. Chapman & Hall/CRC; 2013.
[8] Bessiere P, Mazer E, Ahuactzin JM, Mekhnacha K. Bayesian programming. Chapman & Hall/CRC; 2013.
[9] Takezawa K. Estimation of the exponential distribution in the light of future data. British Journal of Mathematics & Computer Science. 2015;5(1):128-132.
[10] Takezawa K. A revision of AIC for normal error models. Open Journal of Statistics. 2012;2(3):309-312.
[11] Takezawa K. Learning regression analysis by simulation. Springer; 2013.
[12] Green PJ, Silverman BW. Nonparametric regression and generalized linear models: A roughness penalty approach. Chapman & Hall/CRC; 1993.
[13] Takezawa K. Introduction to nonparametric regression. Wiley; 2005.

© 2015 Takezawa; This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Peer-review history: The peer review history for this paper can be accessed here: http://sciencedomain.org/review-history/10077