A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION


C.C. Ishiekwene, S.M. Ogbonwan and J.E. Osemwenkhae
Department of Mathematics, University of Benin, Benin City, Nigeria

ABSTRACT
This paper proposes a new algorithm for boosting in kernel density estimation (KDE). The algorithm enjoys the bias reduction property of other existing boosting algorithms while requiring fewer function evaluations than other boosting schemes. Numerical examples are used for comparison with an existing algorithm, and the findings are comparatively interesting.

Keywords: Boosting, kernel density estimation, bias reduction, boosting algorithm.

INTRODUCTION
Kernel density estimation (KDE) is the process of constructing a density function from a given set of data. The process depends on the choice of the smoothing parameter, which is not easily found (Silverman, 1986; Jones and Signorini, 1997; Wand and Jones, 1995). It is as a result of this problem of finding a smoothing parameter that the idea of boosting in KDE was born. The area of boosting is relatively new in kernel density estimation. It was first proposed by Schapire (1990); other contributors are Freund (1995), Freund and Schapire (1996), and Schapire and Singer (1999). Boosting is a means of improving the performance of a "weak learner" in the sense that, given sufficient data, it is guaranteed to produce an error rate better than "random guessing". Marzio and Taylor (2004) proposed an algorithm in which a kernel density classifier is boosted by suitably re-weighting the data. The weight placed on the kernel estimator is a log ratio whose denominator is a leave-one-out estimate of the density function. Marzio and Taylor's (2004) work also gave a theoretical explanation and showed how boosting is able to reduce the bias in the asymptotic mean integrated squared error (AMISE) of Silverman (1986). In section 2, we shall see how the Marzio and Taylor (2004) algorithm acts as a bias reduction technique. Section 3 presents the proposed meshsize algorithm and its limitations. Numerical examples are used to justify this algorithm in section 4, and the results are compared with those of Marzio and Taylor (2004).
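As a point of reference for the algorithms that follow, the short sketch below implements the conventional (order-two) kernel density estimator that serves as the weak learner here, and illustrates how strongly the estimate depends on the smoothing parameter h. This is a minimal illustration only, assuming a Gaussian kernel; the function names and the simulated data are ours, not the paper's.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density; the kernel assumed throughout this sketch.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(x_grid, data, h):
    # Conventional kernel density estimate: (1/(n*h)) * sum_i K((x - x_i)/h).
    u = (x_grid[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (len(data) * h)

# Illustration: the shape of the estimate changes markedly with h,
# which is exactly the smoothing-parameter problem described above.
data = np.random.normal(loc=3.0, scale=1.0, size=40)   # stand-in sample, n = 40
x_grid = np.linspace(data.min() - 2.0, data.max() + 2.0, 200)
estimates = {h: kde(x_grid, data, h) for h in (0.2, 0.5, 1.0)}
```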

Algorithm on boosting kernel density estimates and bias reduction
Throughout this paper, we shall assume our data to be univariate. The algorithm of Marzio and Taylor (2004) is briefly summarized in Algorithm 1.

Algorithm 1
Step 1: Given {x_i, i = 1, ..., n}, initialize w_1(i) = 1/n.
Step 2: Select h (the smoothing parameter).
Step 3: For m = 1, 2, ..., M, obtain a weighted kernel estimate

    \hat{f}_m(x) = \sum_{i=1}^{n} \frac{w_m(i)}{h} K\left(\frac{x - x_i}{h}\right)    (2.1)

where x can be any value within the range of the x_i's, K is the kernel function and w is a weight function; then update the weights according to

    w_{m+1}(i) = w_m(i) + \log\left(\frac{\hat{f}_m(x_i)}{\hat{f}_m^{(-i)}(x_i)}\right)    (2.2)

where \hat{f}_m^{(-i)} denotes the leave-one-out estimate computed without the i-th observation.
Step 4: Provide output as \prod_{m=1}^{M} \hat{f}_m(x), renormalized to integrate to unity.

For a full implementation of this algorithm, see Marzio and Taylor (2004).

Boosting as a Bias Reduction in Kernel Density Estimation
Suppose we want to estimate f(x) by a multiplicative estimate. We also suppose that we use only "weak" estimates, for which h does not tend to zero as n grows. Let us use a population version instead of a sample version, in which our weak learner, for h > 0, is given by

    \hat{f}_1(x) = \int \frac{w_1(y)}{h} K\left(\frac{x - y}{h}\right) f(y)\,dy    (2.3)

where w_1(y) is taken to be 1. We shall take our kernel function to be Gaussian (since all distributions tend to be normal as the sample size n becomes large, by the central limit theorem (Towers, 2002)). The first approximation in the Taylor series, valid for h < 1 provided that the derivatives of f(x) are properly behaved, is

    \hat{f}_1(x) = f(x) + \frac{h^2}{2} f''(x) + O(h^4),

and so we observe the usual bias of order O(h^2) of Wand and Jones (1995). If we now let

    w_2(x) = \frac{f(x)}{\hat{f}_1(x)} = 1 - \frac{h^2}{2}\,\frac{f''(x)}{f(x)} + O(h^4),

then smoothing this weight at the second step gives

    \hat{f}_2(x) = \int K(z)\, w_2(x + zh)\,dz = \int K(z)\,\frac{f(x + zh)}{\hat{f}_1(x + zh)}\,dz = 1 - \frac{h^2}{2}\,\frac{f''(x)}{f(x)} + O(h^4).

This gives an overall estimator at the second step of

    \hat{f}_1(x)\,\hat{f}_2(x) = \left\{ f(x) + \frac{h^2}{2} f''(x) + O(h^4) \right\}\left\{ 1 - \frac{h^2}{2}\,\frac{f''(x)}{f(x)} + O(h^4) \right\}    (2.4)
                               = f(x) + O(h^4),    (2.5)

which is clearly of order four, so we see a bias reduction from order two to order four.
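Before turning to the proposed meshsize scheme, here is a minimal Python sketch of the leave-one-out boosting scheme summarized in Algorithm 1. It assumes a Gaussian kernel, and the renormalization of the weights at each step is a practical convention of ours; treat it as an illustrative reading of Algorithm 1, not the authors' BASIC implementation.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def weighted_kde(x, data, weights, h):
    # Equation (2.1): f_hat_m(x) = sum_i (w_m(i)/h) * K((x - x_i)/h).
    u = (np.asarray(x, dtype=float)[..., None] - data) / h
    return (gaussian_kernel(u) * weights).sum(axis=-1) / h

def boost_kde_leave_one_out(data, h, M=3):
    # Algorithm 1 (leave-one-out boosted KDE), as summarized above.
    n = len(data)
    w = np.full(n, 1.0 / n)                      # Step 1: w_1(i) = 1/n
    factors = []
    for _ in range(M):                           # Step 3
        f_at_data = weighted_kde(data, data, w, h)
        # Leave-one-out estimate at each x_i: (n - 1) kernel evaluations per point,
        # hence roughly (n + (n - 1)) * n density evaluations per boosting step.
        f_loo = np.empty(n)
        for i in range(n):
            keep = np.arange(n) != i
            w_keep = w[keep] / w[keep].sum()     # renormalize the remaining weights
            f_loo[i] = weighted_kde(data[i], data[keep], w_keep, h)
        factors.append(w.copy())
        w = w + np.log(f_at_data / f_loo)        # Equation (2.2)
        w = w / w.sum()                          # keep the weights summing to one
    def boosted(x_grid):
        # Step 4: multiply the M estimates and renormalize to integrate to unity.
        est = np.ones_like(x_grid, dtype=float)
        for w_m in factors:
            est = est * weighted_kde(x_grid, data, w_m, h)
        return est / np.trapz(est, x_grid)
    return boosted
```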

Proposed meshsize algorithm in boosting
In this section, we show how the leave-one-out estimator in the weight function of (2.2) can be replaced by a meshsize, owing to the time complexity of the leave-one-out estimator: it requires (n + (n - 1))·n function evaluations of the density for each boosting step. Thus, we use a meshsize in its place. The only limitation of this meshsize algorithm is that we must first determine the quantity 1/nh, so as to know what meshsize should be placed on the weight function of (2.2). The need to use a meshsize in place of the leave-one-out estimator rests on the fact that boosting is like the steepest-descent algorithm in unconstrained optimization, so a meshsize is a good substitute that approximates the leave-one-out estimate of the function (Duffy and Helmbold, 2000; Taha, 1971; Ratsch et al., 2000; Mannor et al., 2001). The new meshsize algorithm is stated as:

Algorithm 2
Step 1: Given {x_i, i = 1, ..., n}, initialize w_1(i) = 1/n and initialize the meshsize.
Step 2: Select h (the smoothing parameter).
Step 3: For m = 1, 2, ..., M,
   i) obtain

      \hat{f}_m(x) = \sum_{i=1}^{n} \frac{w_m(i)}{h} K\left(\frac{x - x_i}{h}\right),

      where x can be any value within the range of the x_i's, K is the kernel function and w is a weight function;
   ii) update the weights according to w_{m+1}(i) = w_m(i) + mesh.
Step 4: Provide output as \prod_{m=1}^{M} \hat{f}_m(x) and normalize to integrate to unity.

We can see that the weight function uses a meshsize instead of the leave-one-out log ratio function of Marzio and Taylor (2004); a small illustrative sketch of this update appears after Table 4.1. The numerical verification of this algorithm is given in section 4.

NUMERICAL RESULTS
In this section, we use three sets of data (data 1 is the lifespan of a car battery in years, data 2 is the number of words written without mistakes in every 100 words by a set of students, and data 3 is the scar length of randomly sampled patients; see Ishiekwene and Osemwenkhae, 2006), all suspected to be normal, to illustrate Algorithms 1 and 2 and compare their results. The BASIC programming language was used to generate the results for a specified h (Ishiekwene and Osemwenkhae, 2006), and the corresponding graphs are shown in Figures 1A-1C for data 1 (n = 40), Figures 2A-2C for data 2 (n = 64) and Figures 3A-3C for data 3 (n = 110). Table 4.1 shows the bias reduction discussed in section 2 for the three sets of data used. We can clearly see the bias reduction between the normal kernel estimator and the boosted kernel estimator. We also compared our boosted kernel estimator with the fourth-order kernel estimator of Jones and Signorini (1997), which is arguably the best smoothing parameter choice and an equivalent of Ishiekwene and Osemwenkhae's (2006) higher-order choice at order four.

Table 4.1: Bias reduction for the three data sets

NORMAL (conventional order two)
         Bias         Var          AMISE
Data 1   0.00576637   0.019811685  0.508835
Data 2   0.00093946   0.00113040   0.00144348
Data 3   0.004697604  0.01663515   0.0131119

H4 (as in Jones and Signorini, 1997 and Ishiekwene and Osemwenkhae, 2006)
         Bias         Var          AMISE
Data 1   0.00071803   0.01478945   0.016861048
Data 2   0.000108591  0.000809307  0.000917898
Data 3   0.001767617  0.0113416    0.013109779

BOOSTED (proposed scheme)
         Bias         Var          AMISE
Data 1   0.00078009   0.015168456  0.01746655
Data 2   0.000108715  0.0008153    0.000930868
Data 3   0.00176818   0.01144619   0.01314437
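The sketch below gives the corresponding meshsize variant (Algorithm 2). It is a literal reading of the update in step 3(ii), with the mesh taken as a constant of the order of 1/(nh) as described above, and the final normalization following Step 4; treat it as an illustrative sketch rather than the authors' implementation. The main point is that each boosting step now avoids the (n + (n - 1))·n leave-one-out density evaluations of Algorithm 1.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def weighted_kde(x, data, weights, h):
    # Weighted kernel estimate as in equation (2.1).
    u = (np.asarray(x, dtype=float)[..., None] - data) / h
    return (gaussian_kernel(u) * weights).sum(axis=-1) / h

def boost_kde_meshsize(data, h, M=3, mesh=None):
    # Algorithm 2 (meshsize boosted KDE): the leave-one-out log ratio of (2.2)
    # is replaced by a fixed meshsize, so no leave-one-out evaluations are needed.
    n = len(data)
    if mesh is None:
        mesh = 1.0 / (n * h)       # the quantity 1/nh the paper says must be fixed first
    w = np.full(n, 1.0 / n)        # Step 1: w_1(i) = 1/n
    factors = []
    for _ in range(M):             # Step 3
        factors.append(w.copy())   # step 3(i): f_hat_m is defined by these weights
        w = w + mesh               # step 3(ii): w_{m+1}(i) = w_m(i) + mesh
    def boosted(x_grid):
        # Step 4: product of the M estimates, normalized to integrate to unity.
        est = np.ones_like(x_grid, dtype=float)
        for w_m in factors:
            est = est * weighted_kde(x_grid, data, w_m, h)
        return est / np.trapz(est, x_grid)
    return boosted
```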

[Figures 1A-1C (data 1, n = 40): estimated PDFs. Fig. 1A: leave-one-out algorithm; Fig. 1B: meshsize algorithm; Fig. 1C: comparison of both algorithms (legend: LEAVE-ONE-OUT, MESHSIZE).]

[Figures 2A-2C (data 2, n = 64): estimated PDFs. Fig. 2A: leave-one-out algorithm; Fig. 2B: meshsize algorithm; Fig. 2C: comparison of both algorithms (legend: LEAVE-ONE-OUT, MESHSIZE).]

[Figures 3A-3C (data 3, n = 110): estimated PDFs. Fig. 3A: leave-one-out algorithm; Fig. 3B: meshsize algorithm; Fig. 3C: comparison of both algorithms (legend: LEAVE-ONE-OUT, MESHSIZE).]

CONCLUSION
From the graphs in Figures 1A-3C for the three sets of data, we can clearly see that our proposed algorithm compares favorably with, if not better than, that of Marzio and Taylor (2004). The results of Table 4.1 also reveal the bias reduction targeted by both algorithms. Since our algorithm does the same task with fewer function evaluations, and is in fact better in terms of time complexity, we recommend it for use in boosting in KDE.

REFERENCES
Duffy, N. and Helmbold, D. (2000). Potential boosters? Advances in Neural Info. Proc. Sys. 12: 258-264.

Freund, Y. (1995). Boosting a weak learning algorithm by majority. Info. Comp. 121: 256-285.

Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the 13th Intl. Conf., Ed. L. Saitta, 148-156.

Ishiekwene, C.C. and Osemwenkhae, J.E. (2006). A comparison of fourth order window width selectors in KDE (A univariate case). ABACUS 33(A): 14-20.

Jones, M.C. and Signorini, D.F. (1997). A comparison of higher-order bias kernel density estimators. JASA 92: 1063-1072.

Mannor, S., Meir, R. and Mendelson, S. (2001). On consistency of boosting algorithms. (Unpublished manuscript).

Marzio, D.M. and Taylor, C.C. (2004). Boosting kernel density estimates: A bias reduction technique? Biometrika 91: 226-233.

Ratsch, G., Mika, S., Scholkopf, B. and Muller, K.R. (2000). Constructing boosting algorithms from SVMs: An application to one-class classification. GMD Tech. Report No. 119.

Schapire, R. (1990). The strength of weak learnability. Machine Learn. 5: 197-227.

Schapire, R. and Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learn. 37: 297-336.

Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London; 34-74.

Taha, H.A. (1971). Operations Research: An Introduction. Prentice Hall, New Jersey; 781-814.

Towers, S. (2002). Kernel probability density estimation methods. In: M. Whalley and L. Lyons (Eds), Advanced Statistical Techniques in Particle Physics, Durham, England, 112-115.

Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. Chapman and Hall/CRC, London; 10-88.

Wenxin, J. (2001). Some theoretical aspects of boosting in the presence of noisy data. Neural Computation 14(10): 2415-2437.