Ensemble Confidence Estimates Posterior Probability


Michael Muhlbaier, Apostolos Topalis, and Robi Polikar
Electrical and Computer Engineering, Rowan University, Mullica Hill Rd., Glassboro, NJ 08028, USA
{muhlba6, topali5}@students.rowan.edu, polikar@rowan.edu

Abstract. We have previously introduced the Learn++ algorithm, which provides surprisingly promising performance for incremental learning as well as data fusion applications. In this contribution we show that the algorithm can also be used to estimate the posterior probability, or the confidence of its decision, on each test instance. On three increasingly difficult tests that are specifically designed to compare the posterior probability estimates of the algorithm to those of the optimal Bayes classifier, we have observed that the estimated posterior probability approaches that of the Bayes classifier as the number of classifiers in the ensemble increases. This satisfying and intuitively expected outcome shows that ensemble systems can also be used to estimate the confidence of their output.

1 Introduction

Ensemble / multiple classifier systems have enjoyed increasing attention and popularity over the last decade due to their favorable performances and/or other advantages over single-classifier based systems. In particular, ensemble based systems have been shown, among other things, to successfully generate strong classifiers from weak classifiers, to resist over-fitting problems [1, 2], and to provide an intuitive structure for data fusion [2-4] as well as for incremental learning problems [5]. One area that has received somewhat less attention, however, is the confidence estimation potential of such systems. By their very character of generating multiple classifiers for a given database, ensemble systems provide a natural setting for estimating the confidence of the classification system in its generalization performance.

In this contribution, we show how our previously introduced algorithm Learn++ [5], inspired by AdaBoost but specifically modified for incremental learning applications, can also be used to determine its own confidence on any given test data instance. We estimate the posterior probability of the class chosen by the ensemble using a weighted softmax approach, and use that estimate as the confidence measure. We empirically show on three increasingly difficult datasets that, as additional classifiers are added to the ensemble, the posterior probability of the class chosen by the ensemble approaches that of the optimal Bayes classifier. It is important to note that the method of ensemble confidence estimation being proposed is not specific to Learn++, but can be applied to any ensemble based system.

N.C. Oza et al. (Eds.): MCS 2005, LNCS 3541, pp. 326-335, 2005. Springer-Verlag Berlin Heidelberg 2005

2 Learn++

In ensemble approaches using a voting mechanism to combine classifier outputs, the individual classifiers vote on the class they predict. The final classification is then determined as the class that receives the highest total vote from all classifiers. Learn++ uses weighted majority voting, a rather non-democratic voting scheme, where each classifier receives a voting weight based on its training performance. One novelty of the Learn++ algorithm is its ability to incrementally learn from newly introduced data. For brevity, this feature of the algorithm is not discussed here, and interested readers are referred to [4, 5]. Instead, we briefly explain the algorithm and discuss how it can be used to determine its confidence as an estimate of the posterior probability on classifying test data.

For each dataset D_k that consecutively becomes available to Learn++, the inputs to the algorithm are (i) a sequence of m_k training data instances x_{k,i} along with their correct labels y_i, (ii) a classification algorithm BaseClassifier, and (iii) an integer T_k specifying the maximum number of classifiers to be generated using that database. If the algorithm is seeing its first database (k = 1), a data distribution D_1, from which training instances will be drawn, is initialized to be uniform, making the probability of any instance being selected equal. If k > 1, a distribution initialization sequence initializes the data distribution. The algorithm then adds T_k classifiers to the ensemble, starting at t = eT_k + 1, where eT_k denotes the number of classifiers that currently exist in the ensemble. The pseudocode of the algorithm is given in Fig. 1.

For each iteration t, the instance weights w_t from the previous iteration are first normalized (Step 1) to create a weight distribution D_t. A hypothesis h_t is generated using a subset of D_k drawn from D_t (Step 2). The error ε_t of h_t is calculated: if ε_t > ½, the algorithm deems the current classifier h_t too weak, discards it, and returns to Step 2; otherwise, it computes the normalized error β_t (Step 3). The weighted majority voting algorithm is called to obtain the composite hypothesis H_t of the ensemble (Step 4). H_t represents the ensemble decision of the first t hypotheses generated thus far. The error E_t of H_t is then computed and normalized (Step 5). The instance weights w_t are finally updated according to the performance of H_t (Step 6), such that the weights of instances correctly classified by H_t are reduced and those that are misclassified are effectively increased. This ensures that the ensemble focuses on those regions of the feature space that are yet to be learned. We note that H_t allows Learn++ to make its distribution update based on the ensemble decision, as opposed to AdaBoost, which makes its update based on the current hypothesis h_t.

Input: For each dataset D_k, k = 1, 2, ..., K
  - A sequence of i = 1, ..., m_k instances x_{k,i} with labels y_i ∈ Y = {1, ..., c}
  - Weak learning algorithm BaseClassifier
  - Integer T_k, specifying the number of iterations

Do for k = 1, 2, ..., K:
  If k = 1, initialize w_1(i) = D_1(i) = 1/m_1 for all i, and eT_1 = 0.
  Else, go to Step 5 to evaluate the current ensemble on the new dataset D_k, update the weights, and recall the current number of classifiers eT_k = Σ_{j=1}^{k-1} T_j.

  Do for t = eT_k + 1, eT_k + 2, ..., eT_k + T_k:
    1. Set D_t = w_t / Σ_i w_t(i) so that D_t is a distribution.
    2. Call BaseClassifier with a subset of D_k randomly chosen using D_t.
    3. Obtain h_t : X → Y, and calculate its error ε_t = Σ_{i: h_t(x_i) ≠ y_i} D_t(i).
       If ε_t > ½, discard h_t and go to Step 2. Otherwise, compute the normalized error β_t = ε_t / (1 - ε_t).
    4. Call weighted majority voting to obtain the composite hypothesis
       H_t = arg max_{y ∈ Y} Σ_{t: h_t(x_i) = y} log(1/β_t)
    5. Compute the error of the composite hypothesis E_t = Σ_{i: H_t(x_i) ≠ y_i} D_t(i).
    6. Set B_t = E_t / (1 - E_t), 0 < B_t < 1, and update the instance weights:
       w_{t+1}(i) = w_t(i) × B_t if H_t(x_i) = y_i, and w_{t+1}(i) = w_t(i) otherwise.

Call weighted majority voting to obtain the final hypothesis
  H_final = arg max_{y ∈ Y} Σ_{k=1}^{K} Σ_{t: h_t(x_i) = y} log(1/β_t)

Fig. 1. Learn++ Algorithm
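To make the steps above concrete, here is a minimal Python sketch of the training loop in Fig. 1, restricted to the single-database case (k = 1). It assumes integer class labels 0, ..., c-1 and a scikit-learn MLP as the BaseClassifier; names such as train_learnpp and subset_frac, and the choice of drawing half of the data for each hypothesis, are illustrative assumptions rather than the paper's prescriptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def weighted_vote(classifiers, log_weights, X, n_classes):
    """Composite hypothesis H_t: weighted majority vote with weights log(1/beta_t)."""
    votes = np.zeros((len(X), n_classes))
    for clf, w in zip(classifiers, log_weights):
        votes[np.arange(len(X)), clf.predict(X)] += w
    return votes.argmax(axis=1)

def train_learnpp(X, y, n_classifiers=30, subset_frac=0.5, seed=0):
    """Sketch of the Learn++ loop of Fig. 1 for a single dataset (k = 1)."""
    rng = np.random.default_rng(seed)
    m, n_classes = len(X), int(y.max()) + 1          # assumes labels 0..c-1
    w = np.full(m, 1.0 / m)                          # uniform initial weights D_1
    classifiers, log_weights = [], []
    while len(classifiers) < n_classifiers:
        D = w / w.sum()                              # Step 1: normalize to a distribution
        idx = rng.choice(m, size=int(subset_frac * m), replace=True, p=D)   # Step 2
        h = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500).fit(X[idx], y[idx])
        eps = D[h.predict(X) != y].sum()             # Step 3: weighted error of h_t
        if eps > 0.5:                                # too weak: discard and retry
            continue
        beta = max(eps / (1.0 - eps), 1e-10)
        classifiers.append(h)
        log_weights.append(np.log(1.0 / beta))
        H = weighted_vote(classifiers, log_weights, X, n_classes)   # Step 4
        E = D[H != y].sum()                          # Step 5: composite error
        B = E / (1.0 - E) if E < 1.0 else 1.0
        w = np.where(H == y, w * B, w)               # Step 6: shrink weights of correct instances
    return classifiers, log_weights
```

The returned (classifiers, log_weights) pair carries everything needed for the weighted majority vote H_final and, as discussed in Section 3, for the ensemble confidence estimate.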

3 Confidence as an Estimate of Posterior Probability

In applications where the data distribution is known, an optimal Bayes classifier can be used, for which the posterior probability of the chosen class can be calculated; a quantity which can then be interpreted as a measure of confidence [6]. The posterior probability of class ω_j given instance x is classically defined using the Bayes rule as

  P(ω_j | x) = P(x | ω_j) P(ω_j) / Σ_{k=1}^{N} P(x | ω_k) P(ω_k)    (1)

Since class distributions are rarely known in practice, posterior probabilities must be estimated. While there are several techniques for density estimation [7], such techniques are difficult to apply to large dimensional problems.

A method that can estimate the Bayesian posterior probability would therefore prove to be a most valuable tool in evaluating classifier performance. Several methods have been proposed for this purpose [6-9]. One example is the softmax model [8], commonly used with classifiers whose outputs are binary encoded, as such outputs can be mapped into an estimate of the posterior class probability using

  C_j(x) = e^{A_j(x)} / Σ_{k=1}^{N} e^{A_k(x)} ≈ P(ω_j | x)    (2)

where A_j(x) represents the output for class j, and N is the number of classes. C_j(x) is then the confidence of the classifier in predicting class ω_j for instance x, which is an estimate of the posterior probability P(ω_j | x). The softmax function essentially takes the exponential of the output and normalizes it to the [0, 1] range by summing over the exponentials of all outputs. This model is generally believed to provide good estimates if the classifier is well trained using sufficiently dense training data.

In an effort to generate a measure of confidence for an ensemble of classifiers in general, and for Learn++ in particular, we expand the softmax concept by using the individual classifier weights in place of a single expert's output. The ensemble confidence, estimating the posterior probability, can therefore be calculated as

  C_j(x) = e^{F_j(x)} / Σ_{k=1}^{N} e^{F_k(x)} ≈ P(ω_j | x)    (3)

where

  F_j(x) = Σ_t log(1/β_t) if h_t(x) = ω_j, and 0 otherwise    (4)

The confidence C_j(x) associated with class ω_j for instance x is therefore the exponential of the sum of the weights of the classifiers that selected class ω_j, divided by the sum of the aforementioned exponentials corresponding to each class. The significance of this confidence estimation scheme is in its consideration of the diversity in the classifier decisions: in calculating the confidence of class ω_j, the confidence will increase if the classifiers that did not choose class ω_j have varying decisions, as opposed to a common decision, that is, if the evidence against class ω_j is not strong. On the other hand, the confidence will decrease if the classifiers that did not choose class ω_j have a common decision, that is, if there is strong evidence against class ω_j.
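Assuming the (classifiers, log_weights) pair produced by a Learn++-style training loop such as the sketch in Section 2, Eqs. (3) and (4) translate into a few lines of Python: F_j(x) accumulates the voting weights log(1/β_t) of the classifiers that chose class j, and the confidences are the softmax of these sums. The function name ensemble_confidence is ours.

```python
import numpy as np

def ensemble_confidence(classifiers, log_weights, X, n_classes):
    """Return an (n_samples, n_classes) array of confidences C_j(x), Eqs. (3)-(4)."""
    F = np.zeros((len(X), n_classes))
    for clf, w in zip(classifiers, log_weights):
        F[np.arange(len(X)), clf.predict(X)] += w    # accumulate evidence F_j(x)
    F = F - F.max(axis=1, keepdims=True)             # stabilize the exponentials
    expF = np.exp(F)
    return expF / expF.sum(axis=1, keepdims=True)    # softmax over classes, Eq. (3)
```

Because F_j(x) is exactly the total vote for class j, the ensemble's decision is arg max_j F_j(x), and its confidence is the corresponding C_j(x); subtracting the row maximum before exponentiating leaves the softmax unchanged but keeps it numerically stable.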

4 Simulation Results

In order to find out if and how well the Learn++ ensemble confidence approximates the Bayesian posterior probability, the modified softmax approach was analyzed on three increasingly difficult problems. In order to calculate the theoretical Bayesian posterior probabilities, and hence compare the Learn++ confidences to the Bayesian probabilities, experimental data were generated from Gaussian distributions. For training, random instances were selected from each class distribution, using which an ensemble of 30 MLP classifiers was generated with Learn++. The data and classifier generation process was then repeated and averaged, with randomly selected data, to ensure generality. For each simulation, we also benchmark the results by calculating the mean square error between the Learn++ and Bayes confidences over the entire grid of the feature space, with each classifier added to the ensemble.

4.1 Experiment 1

A two-feature, three-class problem, where each class has a known Gaussian distribution, is shown in Fig. 2. In this experiment classes 1, 2, and 3 have a variance of 0.5 and are centered at [-, ], [, ], and [, -], respectively. Since the distribution is known (and is Gaussian), the actual posterior probability can be calculated from Equation 1, given the known likelihood P(x | ω_j), which can be calculated as

  P(x | ω_j) = 1 / ((2π)^{d/2} |Σ_j|^{1/2}) · exp( -½ (x - μ_j)^T Σ_j^{-1} (x - μ_j) )    (5)

where d is the dimensionality, and μ_j and Σ_j are the mean and the covariance matrix of the distribution from which the j-th class data are generated. Each class was equally likely, hence P(ω_j) = 1/3. For each instance over the entire grid of the feature space shown in Fig. 2, we calculated the posterior probability of the class chosen by the Bayes classifier and plotted it as a confidence surface, as shown in Fig. 3a. Calculating the confidences of the Learn++ decisions on the same feature space provided the plot in Fig. 3b, indicating that the ensemble confidence surface closely approximates that of the Bayes classifier.

Fig. 2. Data distributions used in Experiment 1
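Since the class densities are Gaussian with known parameters, the Bayes confidence surface of Fig. 3a follows directly from Eqs. (1) and (5). The sketch below does this on a regular grid; the class centers are placeholders rather than the exact values used in the experiment, and equal priors are assumed as in the text.

```python
import numpy as np
from scipy.stats import multivariate_normal

means = [np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([1.0, -1.0])]  # placeholder centers
cov = 0.5 * np.eye(2)                                   # variance 0.5 on each feature, as in Experiment 1

xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
grid = np.column_stack([xx.ravel(), yy.ravel()])

# Eq. (5): Gaussian likelihoods; Eq. (1): posteriors (the equal priors P(w_j) = 1/3 cancel out)
likelihoods = np.column_stack([multivariate_normal(mean=m, cov=cov).pdf(grid) for m in means])
posteriors = likelihoods / likelihoods.sum(axis=1, keepdims=True)
bayes_confidence = posteriors.max(axis=1)               # posterior of the class the Bayes classifier picks
bayes_surface = bayes_confidence.reshape(xx.shape)      # the surface plotted in Fig. 3a
```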

Fig. 3. (a) Bayesian and (b) Learn++ confidence surface for Experiment 1

It is interesting to note that the confidences in both cases plummet around the decision boundaries and approach 1 away from the decision boundaries, an outcome that makes intuitive sense. To quantitatively determine how closely the Learn++ confidence approximates that of the Bayes classifier, and how this approximation changes with each additional classifier, the mean squared error (MSE) was calculated between the ideal Bayesian confidence surface and the Learn++ confidence over the entire grid of the feature space, for each additional classifier added to the ensemble. As seen in Fig. 4, the MSE between the two decreases as new classifiers are added to the ensemble, an expected but nevertheless immensely satisfying outcome. Furthermore, the decrease in the error is exponential and rather monotonic, and does not appear to indicate any over-fitting, at least for as many as 30 classifiers added to the ensemble.

The ensemble confidence was then compared to that of a single MLP classifier, where the confidence was calculated using the MLP's raw output values. The mean squared error was calculated between the resulting confidence and the Bayesian confidence, and has been plotted as a dotted line in Fig. 4 in comparison to the Learn++ confidence. The single MLP differs from classifiers generated using the Learn++ algorithm on two accounts. First, the single MLP is trained using all of the training data, whereas each classifier in the Learn++ ensemble is trained on only a subset of the training data. Also, the Learn++ confidence is based on the discrete decision of each classifier. If there were only one classifier in the ensemble, all classifiers would agree, resulting in a confidence of 1. Therefore, the confidence of a single MLP can only be calculated based on the (softmax normalized) actual output values, unlike Learn++, which uses a weighted vote of the discrete output labels.
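The benchmark behind Fig. 4 (and Figs. 7 and 10) can be sketched by re-evaluating the ensemble confidence after each classifier is added and comparing it with the Bayes surface. The snippet below reuses the hypothetical ensemble_confidence function and the grid and bayes_confidence arrays from the earlier sketches.

```python
import numpy as np

def mse_vs_ensemble_size(classifiers, log_weights, grid, bayes_confidence, n_classes):
    """MSE between the Learn++ and Bayes confidence surfaces, after each added classifier."""
    errors = []
    for t in range(1, len(classifiers) + 1):
        conf = ensemble_confidence(classifiers[:t], log_weights[:t], grid, n_classes)
        ensemble_surface = conf.max(axis=1)          # confidence of the class the ensemble picks
        errors.append(np.mean((ensemble_surface - bayes_confidence) ** 2))
    return np.array(errors)                          # expected to decay as in Figs. 4, 7 and 10
```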

4.2 Experiment 2

To further characterize the behavior of this confidence estimation scheme, Experiment 1 was repeated by increasing the variances of the class distributions from 0.5 to 0.75, resulting in a more overlapping distribution (Fig. 5) and a tougher classification problem. Learn++ was trained with data generated from this distribution, and its confidence was calculated over the entire grid of the feature space and plotted in comparison to that of the Bayes classifier in Fig. 6. We note that the low-confidence valleys around the decision boundaries are wider in this case, an expected outcome of the increased variance.

Fig. 4. Mean square error as a function of number of classifiers - Experiment 1

Fig. 5. Data distributions used in Experiment 2

Fig. 6. (a) Bayesian and (b) Learn++ confidence surface for Experiment 2

Fig. 7. Mean square error as a function of number of classifiers - Experiment 2

Fig. 7 shows that the MSE between the Bayes and Learn++ confidences is once again decreasing as new classifiers are added to the ensemble. Fig. 7 also compares the Learn++ performance to that of a single MLP, shown as the dotted line, as described above.

4.3 Experiment 3

Finally, an additional class was added to the distribution from Experiment 2, with a variance of 0.5 and mean at [ ] (Fig. 8), making it an even more challenging classification problem due to the additional overlap between classes. Similar to the previous two experiments, an ensemble of 30 classifiers was generated by Learn++ and trained with data drawn from the above distribution. The confidence of the ensemble over the entire feature space was calculated and plotted in comparison with the posterior probability based confidence of the Bayes classifier over the same feature space. Fig. 9 shows these confidence plots, where the Learn++ based ensemble confidence (Fig. 9b) closely approximates that of Bayes (Fig. 9a).

Fig. 8. Data distributions used in Experiment 3

Fig. 9. (a) Bayesian and (b) Learn++ confidence surface for Experiment 3

Fig. 9 indicates that Learn++ assigns a larger peak confidence to the middle class than the Bayes classifier. Since the Learn++ confidence is based on the discrete decision of each classifier, when a test instance is presented from this portion of the space, most classifiers agree on the middle class, resulting in a high confidence. However, the Bayesian confidence is based on the distribution of the particular class and the distribution overlap of the surrounding classes, thus lowering the confidence. Finally, the MSE between the Learn++ confidence and the Bayesian confidence, plotted in Fig. 10 as a function of ensemble population, shows the now-familiar characteristic of decreasing error with each new classifier added to the ensemble. For comparison, a single MLP was also trained on the same data, and its mean squared error with respect to the Bayesian confidence is shown by a dotted line.

Fig. 10. Mean square error as a function of number of classifiers - Experiment 3

5 Conclusions and Discussions

In this contribution we have shown that the confidence of an ensemble based classification algorithm in its own decision can easily be calculated as an exponentially normalized ratio of the weights. Furthermore, we have shown, on three experiments of increasingly difficult Gaussian distributions, that the confidence calculated in this way approximates the posterior probability of the class chosen by the optimal Bayes classifier. In each case, we have observed that the confidences calculated by Learn++ approximated the Bayes posterior probabilities rather well. However, in order to quantitatively assess exactly how close the approximation was, we have also computed the mean square error between the two over the entire grid of the feature space on which the two classifiers were evaluated. We have plotted this error as a function of the number of classifiers in the ensemble, and noticed that the error decreased exponentially and monotonically as the number of classifiers increased; an intuitive, yet quite satisfying outcome. No over-fitting effects were observed after as many as 30 classifiers, and the final confidences estimated by Learn++ were typically within a few percent of the posterior probabilities calculated for the Bayes classifier. While these results were obtained by using Learn++ as the ensemble algorithm, they should generalize well to other ensemble and/or boosting based algorithms.

Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant No. ECS-0239090, "CAREER: An Ensemble of Classifiers Approach for Incremental Learning."

References

1. Kuncheva L.I., Combining Pattern Classifiers: Methods and Algorithms, Hoboken, NJ: Wiley Interscience, 2004.
2. Freund Y. and Schapire R., A decision theoretic generalization of on-line learning and an application to boosting, Computer and System Sciences, vol. 57, no. 1, pp. 119-139, 1997.
3. Kuncheva L.I., A theoretical study on six classifier fusion strategies, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 281-286, 2002.
4. Lewitt M. and Polikar R., An ensemble approach for data fusion with Learn++, Proc. 4th Int. Workshop on Multiple Classifier Systems (Windeatt T. and Roli F., eds.), LNCS vol. 2709, pp. 176-186, Berlin: Springer, 2003.
5. Polikar R., Udpa L., Udpa S., and Honavar V., Learn++: An incremental learning algorithm for supervised neural networks, IEEE Trans. on Systems, Man and Cybernetics (C), vol. 31, no. 4, pp. 497-508, 2001.
6. Duin R.P., Tax M., Classifier conditional posterior probabilities, Lecture Notes in Computer Science, LNCS vol. 1451, pp. 611-619, Berlin: Springer, 1998.
7. Duda R., Hart P., Stork D., Pattern Classification, 2/e, Chap. 3 & 4, New York, NY: Wiley Interscience, 2001.
8. Alpaydin E. and Jordan M., Local linear perceptrons for classification, IEEE Transactions on Neural Networks, vol. 7, no. 3, pp. 788-792, 1996.
9. Wilson D., Martinez T., Combining cross-validation and confidence to measure fitness, Proc. IEEE Int. Joint Conf. on Neural Networks, pp. 1409-1414, 1999.