ESTIMATION OF MISCLASSIFICATION ERROR USING BAYESIAN CLASSIFIERS


Production Systems and Information Engineering, Volume 5 (2009), pp. 41-50.

PÉTER BARABÁS
University of Miskolc, Hungary
Department of Information Technology
barabas@iit.uni-miskolc.hu

LÁSZLÓ KOVÁCS
University of Miskolc, Hungary
Department of Information Technology
kovacs@iit.uni-miskolc.hu

[Received January 2009 and accepted April 2009]

Abstract. Bayesian classifiers provide relatively good performance compared with other, more complex algorithms. The misclassification ratio is very low for trained samples, but in the case of outliers the misclassification error may increase significantly. The use of the summation hack method in a Bayesian classification algorithm can reduce the misclassification rate for untrained samples. The goal of this paper is to analyze the applicability of the summation hack in Bayesian classifiers in general.

Keywords: Bayesian classifier, summation hack, polynomial distribution, misclassification error.

1. Introduction

The Bayesian classification method is a generative statistical classifier. Studies comparing classification algorithms have found that the simple or Naive Bayesian classifier provides relatively good performance compared with other, more complex algorithms. Accuracy of classification is a very important property of a classifier, and its measure can be separated into two parts: a measure of accuracy in the case of trained samples and a measure of accuracy in the case of untrained samples. Naive Bayesian classification is generally very accurate in the first case, since all testing samples have been trained on before and contain no outliers; in the second case the efficiency is worse due to outliers. In [1], the role of outliers in classification methods is examined: Naive Bayesian classification is sensitive to outliers, and they can cause misclassification. Use of the summation hack can reduce the effect of outliers. The goal of our research is to analyze the generalization capability of Bayesian classification using the summation hack. In the second part a short summary of Naive Bayesian classification is given. In the third part the concept of the summation hack is introduced and examined. In the fourth part the classification methods are

analyzed considering the misclassification error. Finally, the test results and conclusions are summarized in the last section.

It is assumed that the objects to be classified are described by n-dimensional pattern vectors x = (x_1, ..., x_n) ∈ R^n. The dimensions correspond to the attributes of the objects. Every pattern vector is associated with a class label c_i, where the total number of classes is m. The class label c_i denotes that the object belongs to the i-th class. Thus, a classifier can be regarded as a function

    g(x) : R^n → {c_1, ..., c_m}.    (1.1)

The optimal classification function aims at minimizing the misclassification risk [2]. The risk value R depends on the probability of the different classes and on the misclassification cost of the classes:

    R(g(x) | x) = Σ_{c_i} b(g(x) | c_i) P(c_i | x),    (1.2)

where P(c_i | x) denotes the conditional probability of c_i for the pattern vector x, and b(c_i | c_j) denotes the cost value of deciding in favor of c_i instead of the correct class c_j. The cost function b usually has the following simplified form:

    b(c_i | c_j) = 0 if c_i = c_j, 1 if c_i ≠ c_j.    (1.3)

Using this kind of function b, the misclassification error value can be given by

    R(g(x) | x) = Σ_{c_i ≠ g(x)} P(c_i | x).    (1.4)

The optimal classification function minimizes the value R(g(x) | x). As

    Σ_{c_i} P(c_i | x) = 1,    (1.5)

it follows that if

    P(g(x) | x) → max,    (1.6)

then R(g(x) | x) has a minimal value. The decision rule which minimizes the average risk is the Bayes rule, which assigns the pattern vector x to the class that has the greatest probability for x [3].
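The relationship between the 0-1 cost function and the argmax decision rule can be sketched in a few lines of Python. This is an illustration only; the posterior values below are made-up numbers, not data from the paper.

```python
# With the 0-1 cost function, the risk of deciding g(x) is the total posterior
# mass of all other classes, i.e. 1 - P(g(x)|x); hence the class with the
# maximal posterior minimizes the risk (the Bayes rule).

def risk(decision, posteriors):
    """Expected 0-1 misclassification risk of a decision."""
    return sum(p for c, p in posteriors.items() if c != decision)

def bayes_rule(posteriors):
    """Bayes decision rule: choose the class with maximal posterior."""
    return max(posteriors, key=posteriors.get)

# Hypothetical posteriors P(c_i | x) for one pattern vector x; they sum to 1.
posteriors = {"c1": 0.2, "c2": 0.5, "c3": 0.3}

best = bayes_rule(posteriors)
print(best)                    # c2
print(risk(best, posteriors))  # 0.5 = 1 - P(c2|x), the minimal risk

# No other decision achieves a lower risk:
for c in posteriors:
    assert risk(best, posteriors) <= risk(c, posteriors)
```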

2. Bayes classification

A Bayesian classifier is based on Bayes' theorem, which relates the conditional and marginal probabilities of two random events. Let A and B denote events. The conditional probability P(A | B) is the probability of event A given the occurrence of event B. The marginal probability is the unconditional probability P(A) of event A, regardless of whether event B does or does not occur. The simplified version of Bayes' theorem can be written for events A and B as follows:

    P(A | B) = P(B | A) P(A) / P(B).    (2.1)

If A' is the complementary event of A, called "not A", the denominator can be expanded as P(B) = P(B | A) P(A) + P(B | A') P(A'). Let A_1, A_2, A_3, ... be a partition of the event space. The general form of the theorem is given as:

    P(A_i | B) = P(B | A_i) P(A_i) / Σ_j P(B | A_j) P(A_j).    (2.2)

Let C = {c_i} denote the set of classes. The observable properties of the objects are described by the vector x. An object with properties x has to be classified into the class for which the probability P(c_i | x) is maximal. On the basis of Bayes' theorem:

    P(c_i | x) = P(x | c_i) P(c_i) / P(x).    (2.3)

Since P(x) is the same for all i, we only have to maximize the expression P(x | c_i) P(c_i). The value P(c_i) is given a priori or can be estimated with relative frequencies from the samples. According to the assumption of Naive Bayes classification, the attributes in a given class are conditionally independent of every other attribute. So the joint probability model can be expressed as

    P(c_i, x_1, ..., x_n) = P(c_i) Π_{k=1}^{n} P(x_k | c_i).    (2.4)

Using the above equation, the probability of class c_i for an object featured by the vector x is equal to

    P(c_i | x) = P(c_i) Π_{k=1}^{n} P(x_k | c_i) / P(x).    (2.5)

The class label c* for which P(c* | x) is maximal [5] is:

    c* = argmax_{c ∈ C} P(c | x) = argmax_{c ∈ C} P(c) Π_{k=1}^{n} P(x_k | c).    (2.6)

If a given class and feature value never occur together in the training set, then the relative frequency will be zero, and thus the total probability is also set to zero. One of the simplest solutions to this problem is to add 1 to all occurrences of the given attribute. In the case of a large number of samples the distortion of the probabilities is marginal, and the information loss caused by the zero tag can be eliminated successfully. This technique is called Laplace estimation [4]. A more refined solution is to add p_k instead of 1 to the relative frequencies, where p_k is the relative frequency of the k-th attribute value in the global training set, not only in the set belonging to class c_i.

3. Summation hack

Outliers in the classification can indicate faulty data which cause misclassification. The use of the summation hack is an optional method to reduce the misclassification error. The summation hack is an ad-hoc replacement of a product by a sum in a probabilistic expression [1]. This hack is usually explained as a device to cope with outliers, with no formal derivation. The note [1] shows that the hack does make sense probabilistically, and can best be thought of as replacing an outlier-sensitive likelihood with an outlier-tolerant one.

Let us define a vector x with components x_1, x_2, ..., x_n and a class c. In Bayes classification, where the vector values are conditionally independent:

    P(x | c) = Π_{k=1}^{n} P(x_k | c).    (3.1)

In this case the probability is sensitive to outliers in individual dimensions, so if any P(x_k | c) value is equal to 0, the whole product will be zero. Using the summation hack we get the following:

    P(x | c) ≈ Σ_{k=1}^{n} P(x_k | c).    (3.2)

In this case the result will be zero if and only if all P(x_k | c) values are equal to 0. Using (2.6) and (3.2), the computation of the winner class is based on the following formula:

    c* = argmax_{c ∈ C} P(c | x) = argmax_{c ∈ C} P(c) Σ_{k=1}^{n} P(x_k | c).    (3.3)

Applying the summation hack, the error of classification can be reduced. In every equation above, the probabilities are replaced with their approximated values, the relative frequencies, where

    P(e) = lim_{t → ∞} k_e / t,    (3.4)

t is the total number of trials, and k_e is the number of trials in which event e occurred. If the number of test events approaches infinity, the relative frequency value converges to the probability value. In many classification tasks a small number of samples is given [6]; the number of tests is low, so a larger approximation error arises in the calculations. We can write the probability as follows:

    P(x_k = v | c) = P'(x_k = v | c) + Δ_k,    (3.5)

where P' denotes the relative-frequency approximation and Δ_k the error of the approximation. The cumulated classification error in the case of the summation hack can be computed by the summation of the error elements. This error value differs from the classification error for the product of probabilities, as the latter is calculated in the following form:

    Δ = Π_{k=1}^{n} (P(x_k = v_k | c) + Δ_k) − Π_{k=1}^{n} P(x_k = v_k | c).    (3.6)
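The contrast between the product rule (2.6) and the summation hack (3.3) can be demonstrated on a tiny example. The toy training set and attribute names below are invented for illustration; the point is only that an attribute value never seen with any class zeroes out every product score, while the summation-hack scores still separate the classes.

```python
from collections import Counter, defaultdict

# Invented training set: (attribute tuple, class) pairs.
train = [
    (("red", "round"), "apple"),
    (("red", "round"), "apple"),
    (("yellow", "long"), "banana"),
    (("yellow", "long"), "banana"),
]

prior = Counter(c for _, c in train)          # class counts
cond = defaultdict(Counter)                   # cond[c][(position, value)] counts
for attrs, c in train:
    for k, v in enumerate(attrs):
        cond[c][(k, v)] += 1

def p(c):                                     # relative-frequency P(c)
    return prior[c] / len(train)

def p_cond(k, v, c):                          # relative-frequency P(x_k | c)
    return cond[c][(k, v)] / prior[c]

def score_product(attrs, c):                  # P(c) * prod_k P(x_k|c), eq. (2.6)
    s = p(c)
    for k, v in enumerate(attrs):
        s *= p_cond(k, v, c)
    return s

def score_sum(attrs, c):                      # P(c) * sum_k P(x_k|c), eq. (3.3)
    return p(c) * sum(p_cond(k, v, c) for k, v in enumerate(attrs))

x = ("yellow", "oval")                        # "oval" never occurs in training
for c in ("apple", "banana"):
    print(c, score_product(x, c), score_sum(x, c))
# apple 0.0 0.0
# banana 0.0 0.5
# The unseen value zeroes both product scores, but the summation-hack
# score still prefers "banana" (which matches on "yellow").
```

Laplace estimation, mentioned above, would instead repair the product rule by adding 1 to each count in `p_cond`; the summation hack leaves the estimates untouched and changes the combination rule.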

4. Analysis of the approximation error

The main cause of misclassification is the error of the approximated probability values shown in formula (3.4). To calculate the error value, the following model is applied. Let {c_i} be the set of classes, and {a_i} the set of attributes, where an attribute may be of vector value. A test case is described by an (a, c) pair, where c denotes the class related to the attribute a. The unknown probability that a belongs to c_i is denoted by p_i. The relative frequency of the event that a belongs to c_i is denoted by g_i. In the calculations, p_i is approximated by g_i. The classification of the attribute can be regarded as a stochastic event, where P(p_i, g_i) denotes the probability that g_i will be used in the calculations instead of p_i. Let X(x_1, ..., x_n) be an n-dimensional stochastic variable, where x_i denotes the number of attributes classified as c_i. X has a polynomial (multinomial) distribution:

    P(x_1 = k_1, x_2 = k_2, ..., x_n = k_n) = (N! / (k_1! k_2! ... k_n!)) P_1^{k_1} P_2^{k_2} ... P_n^{k_n},    (4.1)

where

    Σ_i k_i = N,    Σ_i P_i = 1.    (4.2)

A given frequency vector g(k_1, k_2, ..., k_n) has different P probabilities for the different probability tuples p(p_1, p_2, ..., p_n). The tuple p with the maximal P value is assumed to be the real probability value tuple. As the maximum likelihood approximation of the probability is the frequency value, the relative frequencies are the best approximations of the real probabilities:

    P_i = k_i / N.    (4.3)

The probability of other p vectors can also be calculated with this formula. For the case n = 2 the resulting P distribution function is shown in Fig. 1.

In the next step, the approximation error of the product P is calculated. It is clear that the larger the difference between p and g, the higher the error value is. On the other hand, the lower the difference between p and g, the higher the probability of this pair is. In the investigation, the average error value is calculated in the following way:

    ε(g) = Σ_p P(p, g) ε(p, g),    (4.4)

where ε(p, g) denotes the error value of matching p with g, P(p, g) the probability of matching p with g, and ε(g) the average error related to the frequency vector g.

Figure 1. Probability function for the case n = 2

In the test case, the error formula for p can be computed as follows:

    ε(p, g) = |p (1 − p) − (k/N)(1 − k/N)|.    (4.5)

Fig. 2 shows the error function for the tested binomial case. The number of attempts is N = 100, and the number of attributes belonging to class c_1 is k = 30. As can be seen in the figure, the minimum error occurs at p = 0.3. Since the function is symmetric, another minimum point can be found at p = 0.7.
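Reading the binomial error formula (4.5) as ε(p, g) = |p(1 − p) − (k/N)(1 − k/N)| (a reconstruction from the garbled source, consistent with the symmetry claim above), a short numerical sketch confirms the two minima at p = 0.3 and p = 0.7 for N = 100, k = 30:

```python
# Numerical check of the reconstructed error formula (4.5): eps vanishes at
# p = k/N = 0.3 and, since p(1-p) is symmetric about p = 1/2, also at p = 0.7.
N, k = 100, 30
g = k / N

def eps(p):
    return abs(p * (1 - p) - g * (1 - g))

grid = [i / 1000 for i in range(1001)]
minima = [p for p in grid if eps(p) < 1e-12]
print(minima)   # [0.3, 0.7]
```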

Figure 2. Error function for the case p = 0.3

In the Naive Bayesian classifier the accuracy depends strongly on the number of attempts. The larger the test pool, the better the accuracy is. In Fig. 3 the mean value of the error function can be seen for different N values. The results show that for small N values the use of the summation hack can improve the accuracy, but for a larger test pool the Naive Bayesian classifier is the dominant one.

5. Test results

In the first tests [7] the reference points were generated with a uniform distribution in space. The winner was the Naive Bayesian classifier in the teaching and testing phases equally. The teaching accuracy had values from 80% to 100%, depending on the environment parameters. Using the summation hack, this accuracy decreased by about 10%. The testing accuracy is far lower: it is between 40 and 70 percent in the case of the Naive Bayesian classifier, and lower using the summation hack. The relatively large range of result values can be explained by the overtraining of the model, which can be controlled by the correct choice of environment parameters.

In later tests the reference points were generated sparsely, so the space has a small region with a relatively large number of reference points, and outside this region there are only a few reference points.

Figure 3. Mean value of the error function for different (k, N) values

In the case of this distribution, the accuracy of the classifiers changed. The teaching accuracy of the Naive Bayesian classifier remained very similar to the other cases, and the use of the summation hack brought its accuracy up to that of the Naive Bayesian. In the testing phase, experience shows that in some cases the summation hack solution can improve the efficiency of classification, and in many cases it exceeds the Naive Bayesian. This confirms the assumption that the use of the summation hack in Bayesian classification can increase accuracy when the samples contain a great number of untrained attribute values. The accuracy of classification depends on many parameters of the environment. One of the most important factors is the maximum attribute value parameter. Fig. 4 shows the accuracy functions for the following maximum attribute parameter values: 20 (NB20, SH20), 100 (NB100, SH100) and 500 (NB500, SH500). The notation NB stands for the Naive Bayesian algorithm and SH for the modified Bayesian algorithm. The accuracy of both algorithms increased with an increasing size of the training set.

Figure 4. Relative accuracy of the algorithms according to the number of teaching samples

6. Conclusions

The summation hack is an alternative to the Naive Bayesian classifier in settings with larger probability approximation errors. Taking a decision tree as a reference classifier, we have compared the Naive Bayesian classifier with the Bayesian classifier using the summation hack. The test results show that, in the case of large training sets, both methods can yield the same accuracy as the decision tree method.

REFERENCES

[1] THOMAS P. MINKA: The summation hack as an outlier model, technical note, August 22, 2003.
[2] HOLMSTRÖM L., KOISTINEN P., LAAKSONEN J., OJA E.: Neural and Statistical Classifiers - Taxonomy and Two Case Studies, IEEE Trans. on Neural Networks, Vol. 8, No. 1, 1997.
[3] KOVÁCS L., TERSTYÁNSZKY G.: Improved Classification Algorithm for the Counter Propagation Network, Proceedings of IJCNN 2000, Como, Italy.
[4] JOAQUIM P. MARQUES DE SÁ: Applied Statistics Using SPSS, Statistica, Matlab and R, Springer, 2007, pp. 223-268.
[5] FUCHUN PENG, DALE SCHUURMANS, SHAOJUN WANG: Augmenting Naive Bayes Classifiers with Statistical Language Models, Information Retrieval, 7, Kluwer Academic Publishers, 2004, Netherlands.
[6] ROBERT P. W. DUIN: Small sample size generalization, 9th Scandinavian Conference on Image Analysis, June 6-9, 1995, Uppsala, Sweden.
[7] BARABÁS P., KOVÁCS L.: Usability of summation hack in Bayes Classification, 9th International Symposium of Hungarian Researchers on Computational Intelligence and Informatics, November 6-8, 2008, Budapest, Hungary.