A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION


C.C. Ishiekwene, S.M. Ogbonwan and J.E. Osemwenkhae
Department of Mathematics, University of Benin, Benin City, Nigeria

ABSTRACT
This paper proposes a new algorithm for boosting in kernel density estimation (KDE). The algorithm enjoys the bias reduction property of other existing boosting algorithms while requiring fewer function evaluations than other boosting schemes. Numerical examples are used for comparison with an existing algorithm, and the findings are comparatively interesting.

Keywords: Boosting, kernel density estimation, bias reduction, boosting algorithm.

INTRODUCTION
Kernel density estimation (KDE) is the process of constructing a density function from a given set of data. The process depends on the choice of the smoothing parameter, which is not easily found (Silverman, 1986; Jones and Signorini, 1997; Wand and Jones, 1995). It is as a result of this problem of finding a smoothing parameter that the idea of boosting in KDE was born. The area of boosting is relatively new in kernel density estimation. It was first proposed by Schapire (1990); other contributors are Freund (1995), Freund and Schapire (1996), and Schapire and Singer (1999). Boosting is a means of improving the performance of a "weak learner" in the sense that, given sufficient data, it is guaranteed to produce an error rate better than "random guessing". Marzio and Taylor (2004) proposed an algorithm in which a kernel density classifier is boosted by suitably re-weighting the data. The weight placed on the kernel estimator is a log ratio whose denominator is a leave-one-out estimate of the density function. Marzio and Taylor's (2004) work also gave a theoretical explanation and showed how boosting is able to reduce the bias in the asymptotic mean integrated squared error (AMISE) of Silverman (1986). In section 2, we shall see how the Marzio and Taylor (2004) algorithm acts as a bias reduction technique. Section 3 presents the proposed meshsize algorithm and its limitations. Numerical examples are used to justify this algorithm in section 4, and the results are compared with those of Marzio and Taylor (2004).
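As a point of reference for the algorithms that follow, the short sketch below implements the conventional (order-two) kernel density estimator that serves as the weak learner here, and illustrates how strongly the estimate depends on the smoothing parameter h. This is a minimal illustration only, assuming a Gaussian kernel; the function names and the simulated data are ours, not the paper's.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density; the kernel assumed throughout this sketch.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(x_grid, data, h):
    # Conventional kernel density estimate: (1/(n*h)) * sum_i K((x - x_i)/h).
    u = (x_grid[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (len(data) * h)

# Illustration: the shape of the estimate changes markedly with h,
# which is exactly the smoothing-parameter problem described above.
data = np.random.normal(loc=3.0, scale=1.0, size=40)   # stand-in sample, n = 40
x_grid = np.linspace(data.min() - 2.0, data.max() + 2.0, 200)
estimates = {h: kde(x_grid, data, h) for h in (0.2, 0.5, 1.0)}
```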

Algorithm on boosting kernel density estimates and bias reduction
Throughout this paper, we shall assume our data to be univariate. The algorithm of Marzio and Taylor (2004) is briefly summarized in Algorithm 1.

Algorithm 1
Step 1: Given {x_i, i = 1, ..., n}, initialize w_1(i) = 1/n.
Step 2: Select h (the smoothing parameter).
Step 3: For m = 1, 2, ..., M, obtain a weighted kernel estimate

    \hat{f}_m(x) = \sum_{i=1}^{n} \frac{w_m(i)}{h} K\left(\frac{x - x_i}{h}\right)    (2.1)

where x can be any value within the range of the x_i's, K is the kernel function and w is a weight function; then update the weights according to

    w_{m+1}(i) = w_m(i) + \log\left(\frac{\hat{f}_m(x_i)}{\hat{f}_m^{(-i)}(x_i)}\right)    (2.2)

where \hat{f}_m^{(-i)} denotes the leave-one-out estimate computed without the i-th observation.
Step 4: Provide output as \prod_{m=1}^{M} \hat{f}_m(x), renormalized to integrate to unity.

For a full implementation of this algorithm, see Marzio and Taylor (2004).

Boosting as a Bias Reduction in Kernel Density Estimation
Suppose we want to estimate f(x) by a multiplicative estimate. We also suppose that we use only "weak" estimates, for which h does not tend to zero as n grows. Let us use a population version instead of a sample version, in which our weak learner, for h > 0, is given by

    \hat{f}_1(x) = \int \frac{w_1(y)}{h} K\left(\frac{x - y}{h}\right) f(y)\,dy    (2.3)

where w_1(y) is taken to be 1. We shall take our kernel function to be Gaussian (since all distributions tend to be normal as the sample size n becomes large, by the central limit theorem (Towers, 2002)). The first approximation in the Taylor series, valid for h < 1 provided that the derivatives of f(x) are properly behaved, is

    \hat{f}_1(x) = f(x) + \frac{h^2}{2} f''(x) + O(h^4),

and so we observe the usual bias of order O(h^2) of Wand and Jones (1995). If we now let

    w_2(x) = \frac{f(x)}{\hat{f}_1(x)} = 1 - \frac{h^2}{2}\,\frac{f''(x)}{f(x)} + O(h^4),

then smoothing this weight at the second step gives

    \hat{f}_2(x) = \int K(z)\, w_2(x + zh)\,dz = \int K(z)\,\frac{f(x + zh)}{\hat{f}_1(x + zh)}\,dz = 1 - \frac{h^2}{2}\,\frac{f''(x)}{f(x)} + O(h^4).

This gives an overall estimator at the second step of

    \hat{f}_1(x)\,\hat{f}_2(x) = \left\{ f(x) + \frac{h^2}{2} f''(x) + O(h^4) \right\}\left\{ 1 - \frac{h^2}{2}\,\frac{f''(x)}{f(x)} + O(h^4) \right\}    (2.4)
                               = f(x) + O(h^4),    (2.5)

which is clearly of order four, so we see a bias reduction from order two to order four.
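Before turning to the proposed meshsize scheme, here is a minimal Python sketch of the leave-one-out boosting scheme summarized in Algorithm 1. It assumes a Gaussian kernel, and the renormalization of the weights at each step is a practical convention of ours; treat it as an illustrative reading of Algorithm 1, not the authors' BASIC implementation.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def weighted_kde(x, data, weights, h):
    # Equation (2.1): f_hat_m(x) = sum_i (w_m(i)/h) * K((x - x_i)/h).
    u = (np.asarray(x, dtype=float)[..., None] - data) / h
    return (gaussian_kernel(u) * weights).sum(axis=-1) / h

def boost_kde_leave_one_out(data, h, M=3):
    # Algorithm 1 (leave-one-out boosted KDE), as summarized above.
    n = len(data)
    w = np.full(n, 1.0 / n)                      # Step 1: w_1(i) = 1/n
    factors = []
    for _ in range(M):                           # Step 3
        f_at_data = weighted_kde(data, data, w, h)
        # Leave-one-out estimate at each x_i: (n - 1) kernel evaluations per point,
        # hence roughly (n + (n - 1)) * n density evaluations per boosting step.
        f_loo = np.empty(n)
        for i in range(n):
            keep = np.arange(n) != i
            w_keep = w[keep] / w[keep].sum()     # renormalize the remaining weights
            f_loo[i] = weighted_kde(data[i], data[keep], w_keep, h)
        factors.append(w.copy())
        w = w + np.log(f_at_data / f_loo)        # Equation (2.2)
        w = w / w.sum()                          # keep the weights summing to one
    def boosted(x_grid):
        # Step 4: multiply the M estimates and renormalize to integrate to unity.
        est = np.ones_like(x_grid, dtype=float)
        for w_m in factors:
            est = est * weighted_kde(x_grid, data, w_m, h)
        return est / np.trapz(est, x_grid)
    return boosted
```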

Proposed meshsize algorithm in boosting
In this section, we show how the leave-one-out estimator in the weight function of (2.2) can be replaced by a meshsize, owing to the time complexity of the leave-one-out estimator: it requires (n + (n - 1))·n function evaluations of the density for each boosting step. Thus, we use a meshsize in its place. The only limitation of this meshsize algorithm is that we must first determine the quantity 1/nh, so as to know what meshsize should be placed on the weight function of (2.2). The need to use a meshsize in place of the leave-one-out estimator rests on the fact that boosting is like the steepest-descent algorithm in unconstrained optimization, so a meshsize is a good substitute that approximates the leave-one-out estimate of the function (Duffy and Helmbold, 2000; Taha, 1971; Ratsch et al., 2000; Mannor et al., 2001). The new meshsize algorithm is stated as:

Algorithm 2
Step 1: Given {x_i, i = 1, ..., n}, initialize w_1(i) = 1/n and initialize the meshsize.
Step 2: Select h (the smoothing parameter).
Step 3: For m = 1, 2, ..., M,
   i) obtain

      \hat{f}_m(x) = \sum_{i=1}^{n} \frac{w_m(i)}{h} K\left(\frac{x - x_i}{h}\right),

      where x can be any value within the range of the x_i's, K is the kernel function and w is a weight function;
   ii) update the weights according to w_{m+1}(i) = w_m(i) + mesh.
Step 4: Provide output as \prod_{m=1}^{M} \hat{f}_m(x) and normalize to integrate to unity.

We can see that the weight function uses a meshsize instead of the leave-one-out log ratio function of Marzio and Taylor (2004); a small illustrative sketch of this update appears after Table 4.1. The numerical verification of this algorithm is given in section 4.

NUMERICAL RESULTS
In this section, we use three sets of data (data 1 is the lifespan of a car battery in years, data 2 is the number of words written without mistakes in every 100 words by a set of students, and data 3 is the scar length of randomly sampled patients; see Ishiekwene and Osemwenkhae, 2006), all suspected to be normal, to illustrate Algorithms 1 and 2 and compare their results. The BASIC programming language was used to generate the results for a specified h (Ishiekwene and Osemwenkhae, 2006), and the corresponding graphs are shown in Figures 1A-1C for data 1 (n = 40), Figures 2A-2C for data 2 (n = 64) and Figures 3A-3C for data 3 (n = 110). Table 4.1 shows the bias reduction discussed in section 2 for the three sets of data used. We can clearly see the bias reduction between the normal kernel estimator and the boosted kernel estimator. We also compared our boosted kernel estimator with the fourth-order kernel estimator of Jones and Signorini (1997), which is arguably the best smoothing parameter choice and an equivalent of Ishiekwene and Osemwenkhae's (2006) higher-order choice at order four.

Table 4.1: Bias reduction for the three data sets

NORMAL (conventional order two)
         Bias         Var          AMISE
Data 1   0.00576637   0.019811685  0.508835
Data 2   0.00093946   0.00113040   0.00144348
Data 3   0.004697604  0.01663515   0.0131119

H4 (as in Jones and Signorini, 1997 and Ishiekwene and Osemwenkhae, 2006)
         Bias         Var          AMISE
Data 1   0.00071803   0.01478945   0.016861048
Data 2   0.000108591  0.000809307  0.000917898
Data 3   0.001767617  0.0113416    0.013109779

BOOSTED (proposed scheme)
         Bias         Var          AMISE
Data 1   0.00078009   0.015168456  0.01746655
Data 2   0.000108715  0.0008153    0.000930868
Data 3   0.00176818   0.01144619   0.01314437
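The sketch below gives the corresponding meshsize variant (Algorithm 2). It is a literal reading of the update in step 3(ii), with the mesh taken as a constant of the order of 1/(nh) as described above, and the final normalization following Step 4; treat it as an illustrative sketch rather than the authors' implementation. The main point is that each boosting step now avoids the (n + (n - 1))·n leave-one-out density evaluations of Algorithm 1.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def weighted_kde(x, data, weights, h):
    # Weighted kernel estimate as in equation (2.1).
    u = (np.asarray(x, dtype=float)[..., None] - data) / h
    return (gaussian_kernel(u) * weights).sum(axis=-1) / h

def boost_kde_meshsize(data, h, M=3, mesh=None):
    # Algorithm 2 (meshsize boosted KDE): the leave-one-out log ratio of (2.2)
    # is replaced by a fixed meshsize, so no leave-one-out evaluations are needed.
    n = len(data)
    if mesh is None:
        mesh = 1.0 / (n * h)       # the quantity 1/nh the paper says must be fixed first
    w = np.full(n, 1.0 / n)        # Step 1: w_1(i) = 1/n
    factors = []
    for _ in range(M):             # Step 3
        factors.append(w.copy())   # step 3(i): f_hat_m is defined by these weights
        w = w + mesh               # step 3(ii): w_{m+1}(i) = w_m(i) + mesh
    def boosted(x_grid):
        # Step 4: product of the M estimates, normalized to integrate to unity.
        est = np.ones_like(x_grid, dtype=float)
        for w_m in factors:
            est = est * weighted_kde(x_grid, data, w_m, h)
        return est / np.trapz(est, x_grid)
    return boosted
```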

[Figures 1A-1C (data 1, n = 40): estimated PDFs. Fig. 1A: leave-one-out algorithm; Fig. 1B: meshsize algorithm; Fig. 1C: comparison of both algorithms (legend: LEAVE-ONE-OUT, MESHSIZE).]

[Figures 2A-2C (data 2, n = 64): estimated PDFs. Fig. 2A: leave-one-out algorithm; Fig. 2B: meshsize algorithm; Fig. 2C: comparison of both algorithms (legend: LEAVE-ONE-OUT, MESHSIZE).]

[Figures 3A-3C (data 3, n = 110): estimated PDFs. Fig. 3A: leave-one-out algorithm; Fig. 3B: meshsize algorithm; Fig. 3C: comparison of both algorithms (legend: LEAVE-ONE-OUT, MESHSIZE).]

CONCLUSION
From the graphs in Figures 1A-3C for the three sets of data, we can clearly see that our proposed algorithm compares favorably with, if not better than, that of Marzio and Taylor (2004). The results of Table 4.1 also reveal the bias reduction targeted by both algorithms. Since our algorithm does the same task with fewer function evaluations, and is in fact better in terms of time complexity, we recommend it for use in boosting in KDE.

REFERENCES
Duffy, N. and Helmbold, D. (2000). Potential boosters? Advances in Neural Info. Proc. Sys. 12: 258-264.

Freund, Y. (1995). Boosting a weak learning algorithm by majority. Info. Comp. 121: 256-285.

Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the 13th Intl. Conf., Ed. L. Saitta, 148-156.

Ishiekwene, C.C. and Osemwenkhae, J.E. (2006). A comparison of fourth order window width selectors in KDE (A univariate case). ABACUS 33(A): 14-20.

Jones, M.C. and Signorini, D.F. (1997). A comparison of higher-order bias kernel density estimators. JASA 92: 1063-1072.

Mannor, S., Meir, R. and Mendelson, S. (2001). On consistency of boosting algorithms. (Unpublished manuscript).

Marzio, D.M. and Taylor, C.C. (2004). Boosting kernel density estimates: A bias reduction technique? Biometrika 91: 226-233.

Ratsch, G., Mika, S., Scholkopf, B. and Muller, K.R. (2000). Constructing boosting algorithms from SVMs: An application to one-class classification. GMD Tech. Report No. 119.

Schapire, R. (1990). The strength of weak learnability. Machine Learn. 5: 197-227.

Schapire, R. and Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learn. 37: 297-336.

Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London; 34-74.

Taha, H.A. (1971). Operations Research: An Introduction. Prentice Hall, New Jersey; 781-814.

Towers, S. (2002). Kernel probability density estimation methods. In: M. Whalley and L. Lyons (Eds), Advanced Statistical Techniques in Particle Physics, Durham, England, 112-115.

Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. Chapman and Hall/CRC, London; 10-88.

Wenxin, J. (2001). Some theoretical aspects of boosting in the presence of noisy data. Neural Computation 14(10): 2415-2437.