Classification of Incomplete Patterns Based on the Fusion of Belief Functions
18th International Conference on Information Fusion, Washington, DC - July 6-9, 2015

Zhun-ga Liu, Quan Pan, Jean Dezert, Arnaud Martin and Grégoire Mercier

Northwestern Polytechnical University, Xi'an, China. Email: liuzhunga@nwpu.edu.cn, quanpan@nwpu.edu.cn
ONERA - The French Aerospace Lab, Palaiseau, France. Email: jean.dezert@onera.fr
IRISA, University of Rennes 1, Lannion, France. Email: Arnaud.Martin@univ-rennes1.fr
Telecom Bretagne, CNRS UMR 6285 Lab-STICC/CID, Brest, France. Email: Gregoire.Mercier@telecom-bretagne.eu

Abstract: The influence of the missing values in the classification of incomplete patterns mainly depends on the context. In this paper, we present a fast classification method for incomplete patterns based on the fusion of belief functions, where the missing values are selectively (adaptively) estimated. At first, it is assumed that the missing information is not crucial for the classification, and the object (incomplete pattern) is classified based only on the available attribute values. However, if the object cannot be clearly classified, it implies that the missing values play an important role in obtaining an accurate classification. In this case, the missing values are imputed based on the K-nearest neighbor (K-NN) and self-organizing map (SOM) techniques, and the edited pattern with the imputation is then classified. The (original or edited) pattern is respectively classified according to each training class, and the classification results, represented by basic belief assignments (BBAs), are fused with proper combination rules for making the credal classification. The object is allowed to belong, with different masses of belief, to the specific classes and to meta-classes (i.e. disjunctions of several single classes). This credal classification captures well the uncertainty and imprecision of the classification, and effectively reduces the rate of misclassifications thanks to the introduction of meta-classes.
The effectiveness of the proposed method with respect to other classical methods is demonstrated based on several experiments using artificial and real data sets.

Keywords: information fusion, combination rule, belief functions, classification, incomplete pattern.

I. INTRODUCTION

In many practical classification problems, some attributes of an object can be missing for various reasons (e.g. the failure of sensors). So it is crucial to develop efficient techniques to classify as well as possible the objects with missing attribute values (incomplete patterns), and the search for a solution to this problem remains an important research topic in the community [1], [2]. Many classification approaches have been proposed to deal with incomplete patterns [1]. The simplest method consists in directly removing (ignoring) the patterns with missing values, and designing the classifier only for the complete patterns. This method is acceptable when the incomplete data set is only a very small subset (e.g. less than 5%) of the whole data set. A widely adopted method is to fill the missing values with proper estimations [3], and then to classify the edited patterns. Different works have been devoted to the imputation (estimation) of missing data. For example, the imputation can be done either by statistical methods, e.g. mean imputation [4] and regression imputation [2], or by machine learning methods, e.g. K-nearest neighbors (K-NN) imputation [5] and fuzzy c-means (FCM) imputation [6], [7]. Some model-based techniques have also been developed for dealing with incomplete patterns [8]. The probability density function (PDF) of the training data (complete and incomplete cases) is estimated first, and then the object is classified using Bayesian reasoning. Other classifiers [9] have also been proposed to directly handle incomplete patterns without imputing the missing values. All these methods attempt to classify the object into a particular class with maximal probability or likelihood measure.
However, the estimation of missing values is in general quite uncertain, and different imputations of the missing values can yield very different classification results, which prevents us from confidently committing the object to a particular class. Belief function theory (BFT), also called Dempster-Shafer theory (DST) [10], and its extensions [11], [12] offer a mathematical framework for modeling uncertain and imprecise information [13]. BFT has already been applied successfully to object classification [14], [15], [17]-[19], clustering [20]-[23] and multi-source information fusion [24]. Some classifiers for complete patterns based on DST have been developed by Denœux and his collaborators, such as the evidential K-nearest neighbors [14] and the evidential neural network [19]. The extra ignorance element, represented by the disjunction of all the elements in the whole frame of discernment, is introduced in these classifiers to capture totally ignorant information. However, partial imprecision, which is very important in classification, is not well characterized. That is why we have proposed new credal classifiers in [15]-[17], [22]. Our new classifiers take into account all the possible meta-classes (i.e. the particular disjunctions of several singleton classes) to model partially imprecise information thanks to belief functions. The credal classification allows the objects to belong (with different masses of belief) not only to the
singleton classes, but also to any set of classes corresponding to the meta-classes. In our recent research works, a prototype-based credal classification (PCC) [25] method for incomplete patterns has been introduced to capture well the imprecision of classification caused by the missing values. The objects that are hard to classify correctly are committed to a suitable meta-class by PCC, which captures well the imprecision of classification caused by the missing values and also reduces the misclassification errors. In PCC, the missing values in all the incomplete patterns are imputed using the prototype of each class, and the edited pattern with each imputation is respectively classified by a standard classifier (used for the classification of complete patterns). With PCC, one obtains c pieces of classification results for one incomplete pattern in a c-class problem, and the global fusion of the c results is used for the credal classification. Unfortunately, the PCC classifier is computationally greedy and time-consuming, and the imputation of the missing values based on the prototype of each class is not very precise or accurate. That is why we propose a new, innovative and more effective method for the credal classification of incomplete patterns with adaptive imputation of missing values, called Credal Classification with Adaptive Imputation (CCAI) for short.

The pattern to classify usually consists of multiple attributes. Sometimes, the class of the pattern can be precisely determined using only a part (a subset) of the available attributes, which means that the other attributes are redundant and in fact unnecessary for the classification. In the classification of an incomplete pattern with missing values, one can attempt at first to classify the object using only the known attribute values. If a specific classification result is obtained, it very likely means that the missing values are not necessary for the classification, and we directly take the decision on the class of the object based on this result.
However, if the object cannot be clearly classified with the available information, it means that the information included in the missing attribute values is probably crucial for making the classification. In this case, we propose a more sophisticated classification strategy for the edited pattern with a proper imputation of the missing values obtained using the K-NN and self-organizing map (SOM) techniques [26]. The information fusion technique is adopted in the classification of the original incomplete pattern (without imputation of missing values) or of the edited pattern (with imputation of missing values) to obtain good results. One respectively gets a simple classification result, represented by a simple basic belief assignment (BBA), according to each training class. The global fusion (ensemble) of these multiple BBAs with a proper combination rule, i.e. the Dempster-Shafer (DS) rule or a new rule inspired by the Dubois-Prade (DP) rule depending on the actual case, is then used to determine the class of the object.

This paper is organized as follows. The basics of belief function theory are briefly recalled in Section II. The new credal classification method for incomplete patterns is presented in Section III, and the proposed method is then tested and evaluated in Section IV in comparison with several other classical methods. Conclusions are given in the final section.

II. BASICS OF BELIEF FUNCTION THEORY

Belief Function Theory (BFT) is also known as Dempster-Shafer Theory (DST), or the Mathematical Theory of Evidence [10]-[12]. Let us consider a frame of discernment consisting of c exclusive and exhaustive hypotheses (classes) denoted by Ω = {ω_i, i = 1, 2, ..., c}. The power-set of Ω, denoted 2^Ω, is the set of all the subsets of Ω, empty set included. In the classification problem, a singleton element (e.g. ω_i) represents a specific class. In this work, the disjunction (union) of several singleton elements is called a meta-class, which characterizes the partial ignorance of the classification. In BFT, the basic belief assignment (BBA) is a function m(.) from 2^Ω to [0, 1] satisfying m(∅) = 0 and the normalization condition Σ_{A ∈ 2^Ω} m(A) = 1.
The subsets A of Ω such that m(A) > 0 are called the focal elements of the belief mass m(.). The credal classification (or partitioning) [20], [21] is defined as an n-tuple M = (m_1, ..., m_n) of BBAs, where m_i is the basic belief assignment of the object x_i ∈ X, i = 1, ..., n, associated with the different elements of the power-set 2^Ω. The credal classification can model imprecise and uncertain information well thanks to the introduction of meta-classes.

For combining multiple sources of evidence represented by a set of BBAs, the well-known Dempster's rule [10] is still widely used. We denote it by DS (standing for Dempster-Shafer) because Dempster's rule has been widely promoted by Shafer in [10]. The combination of two BBAs m_1(.) and m_2(.) over 2^Ω is done with the DS rule of combination, defined by m_DS(∅) = 0 and, for A, B, C ∈ 2^Ω, by

  m_DS(A) = [ Σ_{B∩C=A} m_1(B) m_2(C) ] / [ 1 − Σ_{B∩C=∅} m_1(B) m_2(C) ]   (1)

The DS rule is commutative and associative, and makes a compromise between specificity and complexity for the combination of BBAs. However, the DS rule produces unreasonable results in highly conflicting cases, as well as in some special low-conflict cases [27]. Many alternative rules have been proposed to overcome the limitations of the DS rule, e.g. the Dubois-Prade (DP) rule [28] and the Proportional Conflict Redistribution (PCR) rules [29]. Our method is inspired by the DP rule [28], defined by m_DP(∅) = 0 and, for A, B, C ∈ 2^Ω, by

  m_DP(A) = Σ_{B∩C=A} m_1(B) m_2(C) + Σ_{B∩C=∅, B∪C=A} m_1(B) m_2(C)   (2)

In the DP rule, the partial conflicting beliefs are all transferred to the union of the elements (i.e. the meta-class) involved in the partial conflict.

III. CREDAL CLASSIFICATION OF INCOMPLETE PATTERNS

Our new method consists of two main steps. In the first step, the object (incomplete pattern) is directly classified according
to the known attribute values only, and the missing values are ignored. If one can get a specific classification result, the classification procedure is done, because the available attribute information is sufficient for making the classification. But if the class of the object cannot be clearly identified in the first step, it means that the unavailable information included in the missing values is likely crucial for the classification. In this case, one has to enter the second step of the method to classify the object with a proper imputation of the missing values. In the classification procedure, the original or edited pattern is respectively classified according to each class of training data. The global fusion of these classification results, which can be considered as multiple sources of evidence represented by BBAs, is then used for the credal classification of the object. The new method is referred to as Credal Classification with Adaptive Imputation of missing values, denoted CCAI for conciseness.

A. Step 1: Direct classification of the incomplete pattern

Let us consider a set of test patterns (samples) X = {x_1, ..., x_n} to be classified based on a set of labeled training patterns Y = {y_1, ..., y_s} over the frame of discernment Ω = {ω_1, ..., ω_c}. In this work, we focus on the classification of incomplete patterns in which some attribute values are absent. So we consider that all the test patterns (e.g. x_i, i = 1, ..., n) have several missing values. The training data set Y may also contain incomplete patterns in some applications. However, if the incomplete patterns represent a very small proportion, say less than 5%, of the training data set, they can be ignored in the classification. If the percentage of incomplete patterns is big, the missing values must usually be estimated first, and the classifier is trained using the edited (complete) patterns. In real applications, one can also choose only the complete labeled patterns for inclusion in the training data set when the training information is sufficient. So, for simplicity and convenience, we consider that the labeled samples (e.g.
y_j, j = 1, ..., s) of the training set Y are all complete patterns in the sequel.

In the first step of the classification, the incomplete pattern, say x_i, is respectively classified according to each training class by a normal classifier (for dealing with complete patterns), and all the missing values are ignored here. In this work, we adopt a very simple classification method for the convenience of computation, and x_i is directly classified based on the distance to the prototype of each class. The prototypes {o_1, ..., o_c} corresponding to {ω_1, ..., ω_c} are given by the arithmetic average vector of the training patterns of the same class. Mathematically, the prototype is computed for g = 1, ..., c by

  o_g = (1/N_g) Σ_{y_j ∈ ω_g} y_j   (3)

where N_g is the number of training samples in the class ω_g. In a c-class problem, one can get c pieces of simple classification results for x_i, one per class of training data, and each result is represented by a simple BBA including two focal elements, i.e. the singleton class and the ignorant class (Ω) that characterizes full ignorance. The belief of x_i belonging to class ω_g is computed based on the distance between x_i and the corresponding prototype o_g. The Mahalanobis distance is adopted here to deal with anisotropic classes, and the missing values are ignored in the calculation of this distance. The other mass of belief is assigned to the ignorant class Ω. Therefore, the BBA construction is done by

  m_{o_g}(ω_g) = e^(−η d_g)
  m_{o_g}(Ω) = 1 − e^(−η d_g)   (4)

with

  d_g = (1/p) Σ_{j=1..p} ( (x_{ij} − o_{gj}) / δ_{gj} )^2   (5)

and

  δ_{gj} = sqrt( (1/N_g) Σ_{y_i ∈ ω_g} (y_{ij} − o_{gj})^2 )   (6)

where x_{ij} is the value of x_i in the j-th dimension, and y_{ij} is the value of y_i in the j-th dimension. p is the number of available attribute values in the object x_i. The coefficient 1/p is necessary to normalize the distance value because each test sample can have a different number of missing values. δ_{gj} is the average distance of all the training samples of class ω_g to the prototype o_g in the j-th dimension. N_g is the number of training samples in ω_g.
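The prototype and BBA construction of eqs. (3)-(6) can be sketched in a few lines of pure Python. This is an illustrative toy only: the helper names are ours, `None` is used to mark a missing attribute, and the data is made up.

```python
import math

def prototype(samples):
    # Eq. (3): arithmetic mean of the (complete) training samples of one class.
    n, p = len(samples), len(samples[0])
    return [sum(s[j] for s in samples) / n for j in range(p)]

def dispersion(samples, proto):
    # Eq. (6): per-dimension spread of the class around its prototype.
    n = len(samples)
    return [math.sqrt(sum((s[j] - proto[j]) ** 2 for s in samples) / n)
            for j in range(len(proto))]

def simple_bba(x, proto, delta, eta=0.7):
    # Eqs. (4)-(5): normalized distance over the available attributes only,
    # then a two-focal-element BBA {singleton class, ignorance}.
    avail = [j for j, v in enumerate(x) if v is not None]
    d = sum(((x[j] - proto[j]) / delta[j]) ** 2 for j in avail) / len(avail)
    belief = math.exp(-eta * d)
    return {"class": belief, "ignorance": 1.0 - belief}

# Toy usage: one 2-D class, one test pattern whose 2nd attribute is missing.
train = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]]
o = prototype(train)                      # [2.0, 6.0]
delta = dispersion(train, o)
bba = simple_bba([2.0, None], o, delta)   # distance 0 -> all mass on the class
```

The distance falls exactly on the prototype in the available dimension, so the whole mass of belief goes to the singleton class and none to Ω.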
η is a tuning parameter, and a bigger η generally yields a smaller mass of belief on the specific class ω_g. Obviously, the smaller the distance measure, the bigger the mass of belief on the singleton class. This particular structure of the BBA indicates that we can only confirm the degree of association of the object x_i with the specific class ω_g according to the training data in ω_g. The other mass of belief reflects the level of belief one has in full ignorance, and it is committed to the ignorant class Ω. Similarly, one calculates c independent BBAs m_{o_g}(ω_g), g = 1, ..., c, based on the different training classes. Before combining these c BBAs, we examine whether a specific classification result can be derived from them. This is done as follows: the class receiving the biggest mass of belief among the c BBAs, ω_1st = argmax_g(m_{o_g}(ω_g)), is the class to which the object is considered to very likely belong. The class with the second biggest mass of belief is denoted ω_2nd. The distinguishability degree χ ∈ (0, 1] of an object x_i associated with the different classes is defined by

  χ = m_{o_2nd}(ω_2nd) / m_{o_1st}(ω_1st)   (7)

Let ε be a chosen small positive distinguishability threshold value in (0, 1]. If the condition χ ≤ ε is satisfied, it means that all the classes involved in the computation of χ can be clearly distinguished for x_i. In this case, it is very likely that a specific classification result can be obtained from the fusion of the c BBAs. The condition χ ≤ ε also indicates that the available attribute information is sufficient for making the
classification of the object, and that the imputation of the missing values is not necessary. If the condition χ ≤ ε holds, the c BBAs are directly combined with the DS rule (1) to obtain the final classification result of the object, because the DS rule usually produces a specific combination result with an acceptable computation burden in the low-conflict case. In such a case, the meta-class is not included in the fusion result, because the different classes are considered distinguishable based on the distinguishability condition. Moreover, the mass of belief of the full ignorance class Ω, which represents the noisy data (outliers), can be proportionally redistributed to the other singleton classes for more specific results if one knows a priori that no noisy data is involved. If the distinguishability condition χ ≤ ε is not satisfied, it means that the classes ω_1st and ω_2nd cannot be clearly distinguished for the object with respect to the chosen threshold value ε, indicating that the missing attribute values almost surely play a crucial role in the classification. In this case, the missing values must be properly imputed to recover the unavailable attribute information before entering the classification procedure. This is Step 2 of our method, which is explained in the next subsection.

B. Step 2: Classification of the incomplete pattern with imputation of missing values

1) Multiple estimation of missing values: Various methods exist for the estimation of the missing attribute values. In particular, the K-NN imputation method generally provides good performance. However, the main drawback of the K-NN method is its big computational burden, since one needs to calculate the distances of the object to all the training samples. Inspired by [30], we propose to use the Self-Organizing Map (SOM) technique [26], [30] to reduce the computational complexity. SOM is applied in each class of training data, and M × N weighting vectors are obtained after the optimization procedure. These optimized weighting vectors characterize well the topological features of the whole class, and they are used to represent the corresponding data class.
The number of weighting vectors is usually small (e.g. 5 × 6). So the K nearest neighbors of the test pattern among these weighting vectors of the SOM can easily be found with a low computational complexity¹. The selected weighting vector no. k in the class ω_g, g = 1, ..., c, is denoted σ_k^{ω_g}, for k = 1, ..., K. In each class, the K selected close weighting vectors provide different contributions (weights) to the estimation of the missing values. The weight p_k^{ω_g} of each vector is defined based on the distance between the object x_i and the weighting vector σ_k^{ω_g} as follows:

  p_k^{ω_g} = e^(−λ d_k^{ω_g})   (8)

with

  λ = cMN(cMN − 1) / (2 Σ_{i<j} d(σ_i, σ_j))   (9)

where d_k^{ω_g} is the Euclidean distance between x_i and the neighbor σ_k^{ω_g} ignoring the missing values, and 1/λ is the average distance between each pair of weighting vectors produced by SOM in all the classes; c is the number of classes; M × N is the number of weighting vectors obtained by SOM in each class; and d(σ_i, σ_j) is the Euclidean distance between any two weighting vectors σ_i and σ_j.

¹ The training of the SOM using the labeled patterns becomes time-consuming when the number of labeled patterns is big, but fortunately it can be done offline. In our experiments, the reported running times do not include the computational time spent on the off-line procedures.

The weighted mean ŷ^{ω_g} of the K selected weighting vectors of the training class ω_g is used for the imputation of the missing values. It is calculated by

  ŷ^{ω_g} = ( Σ_{k=1..K} p_k^{ω_g} σ_k^{ω_g} ) / ( Σ_{k=1..K} p_k^{ω_g} )   (10)

The missing values of x_i are filled by the values of ŷ^{ω_g} in the same dimensions. By doing this, we get the edited pattern x_i^{ω_g} according to the training class ω_g. Then x_i^{ω_g} is simply classified based only on the training data in ω_g, similarly to the direct classification of the incomplete pattern using eq. (4) of Step 1, for convenience². The classification of x_i with the estimation of the missing values is also respectively done based on the other training classes according to this procedure.
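The estimation steps of eqs. (8)-(10) can be sketched as follows. This is a toy illustration under our own naming: the SOM weighting vectors are assumed to be already trained (here they are hard-coded), and `None` marks the missing attribute.

```python
import math

def euclid_known(x, v):
    # Euclidean distance restricted to the attributes available in x.
    return math.sqrt(sum((x[j] - v[j]) ** 2
                         for j in range(len(x)) if x[j] is not None))

def lam(all_vectors):
    # Eq. (9): 1/lambda is the mean pairwise distance between all the
    # weighting vectors (over every class).
    n = len(all_vectors)
    total = sum(math.dist(all_vectors[i], all_vectors[j])
                for i in range(n) for j in range(i + 1, n))
    return (n * (n - 1) / 2) / total

def impute(x, knn_vectors, lam_value):
    # Eqs. (8) and (10): distance-decaying weights, then the weighted mean
    # of the K selected weighting vectors fills the missing dimensions of x.
    w = [math.exp(-lam_value * euclid_known(x, v)) for v in knn_vectors]
    filled = list(x)
    for j, v in enumerate(x):
        if v is None:
            filled[j] = sum(wk * vec[j] for wk, vec in zip(w, knn_vectors)) / sum(w)
    return filled, sum(w)   # sum(w) is the reliability factor rho of eq. (11)

# Toy usage: the two nearest weighting vectors of one class; y is missing.
vectors = [[1.0, 2.0], [1.0, 4.0], [5.0, 0.0], [5.0, 2.0]]
l = lam(vectors)
edited, rho = impute([1.0, None], vectors[:2], l)   # y filled by 3.0 here
```

Both selected vectors lie at distance 0 from the object in the available dimension, so they get equal weights and the missing y is replaced by the plain mean of their y components.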
For a c-class problem, there are c training classes, and therefore one can get c pieces of classification results with respect to one object.

2) Ensemble classifier for credal classification: These c pieces of results, obtained from each class of training data in a c-class problem, are considered with different weights, since the estimations of the missing values according to the different classes have different reliabilities. The weighting factor of the classification result associated with the class ω_g can be defined by the sum of the weights of the K selected SOM weighting vectors contributing to the imputation of the missing values in ω_g, which is given by

  ρ^{ω_g} = Σ_{k=1..K} p_k^{ω_g}   (11)

The result with the biggest weighting factor ρ^{ω_max} is considered the most reliable, because one assumes that the object must belong to one of the labeled classes (i.e. ω_g, g = 1, ..., c). So the biggest weighting factor is normalized to one. The other relative weighting factors are defined by:

  α̂^{ω_g} = ρ^{ω_g} / ρ^{ω_max}   (12)

If the condition³ α̂^{ω_g} < ε is satisfied, the corresponding estimation of the missing values and the classification result

² Of course, some other sophisticated classifiers can also be applied here according to the selection of the user, but the choice of classifier is not the main purpose of this work.
³ The threshold ε is the same as in Section III-A, because it is also used to measure the distinguishability degree here.
are not very reliable. Very likely, the object does not belong to this class. It is implicitly assumed that the object can belong to only one class in reality. If this result, whose relative weighting factor is very small (w.r.t. ε), were still considered useful, it would be (more or less) harmful for the final classification of the object. So, if the condition α̂^{ω_g} < ε holds, the relative weighting factor is set to zero. More precisely, we take

  α^{ω_g} = 0, if α̂^{ω_g} < ε;  α^{ω_g} = ρ^{ω_g} / ρ^{ω_max}, otherwise.   (13)

After the estimation of the weighting (discounting) factors α^{ω_g}, the c classification results (the BBAs m_{o_g}(.)) are classically discounted [10] by

  m̂_{o_g}(ω_g) = α^{ω_g} m_{o_g}(ω_g)
  m̂_{o_g}(Ω) = 1 − α^{ω_g} + α^{ω_g} m_{o_g}(Ω)   (14)

These discounted BBAs are globally combined to get the credal classification result. If α^{ω_g} = 0, one gets m̂_{o_g}(Ω) = 1, and this fully ignorant (vacuous) BBA plays a neutral role in the global fusion process for the final classification of the object. Although we have done our best to estimate the missing values, the estimation can be quite imprecise when the estimations obtained from different classes have similar weighting factors, and the different estimations probably lead to distinct classification results. In such a case, we prefer to cautiously keep (rather than ignore) the uncertainty, and maintain the uncertainty in the classification result. Such uncertainty can be well reflected by the conflict between these classification results represented by the BBAs. The DS rule is not suitable here, because all the conflicting beliefs are redistributed to the other focal elements. A particular combination rule inspired by the DP rule is introduced here to fuse these BBAs according to the current context. In our new rule, the partial conflicting beliefs are prudently transferred to the proper meta-class to reveal the imprecision degree of the classification caused by the missing values. This new rule of combination is defined by:

  m(ω_g) = m̂_{o_g}(ω_g) Π_{j≠g} m̂_{o_j}(Ω)
  m(A) = Π_{j: ω_j ⊆ A} m̂_{o_j}(ω_j) Π_{k: ω_k ⊄ A} m̂_{o_k}(Ω),  for a meta-class A = ∪_{j∈J} ω_j   (15)

The global fusion formula (15) consists of two parts.
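The discounting of eq. (14) and the DP-inspired global fusion of eq. (15) can be sketched as below. This is a minimal pure-Python illustration (our own naming, toy masses): each simple BBA is a pair (mass on its own singleton class, mass on Ω), and the fusion enumerates, for every source, the choice between its singleton and Ω, sending conflicting joint choices to the union, i.e. a meta-class.

```python
from itertools import product

def discount(bba, alpha):
    # Eq. (14): classical discounting by the reliability factor alpha.
    g, omega = bba
    return (alpha * g, 1.0 - alpha + alpha * omega)

def fuse(bbas):
    # Eq. (15): each source contributes either its singleton class or Omega.
    # One chosen singleton gives a specific class; several jointly chosen
    # (conflicting) singletons are transferred to their union (meta-class).
    c = len(bbas)
    result = {}
    for choice in product(*[((frozenset([g]), bbas[g][0]),
                             (None, bbas[g][1])) for g in range(c)]):
        mass = 1.0
        chosen = []
        for focal, m in choice:
            mass *= m
            if focal is not None:
                chosen.append(focal)
        if not chosen:                       # every source said Omega
            key = frozenset(range(c))
        else:                                # singleton or meta-class
            key = frozenset().union(*chosen)
        result[key] = result.get(key, 0.0) + mass
    return result

# Toy usage with 2 classes: class 0 fully reliable, class 1 discounted.
m0 = discount((0.8, 0.2), alpha=1.0)
m1 = discount((0.6, 0.4), alpha=0.5)
fused = fuse([m0, m1])
decision = max(fused, key=fused.get)   # hard credal classification
```

In this toy case the singleton class 0 keeps the largest fused mass (0.8 × 0.7 = 0.56), while the conflicting choice of both singletons (0.8 × 0.3 = 0.24) lands on the meta-class {0, 1}; the fused masses still sum to one.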
In the first part, we use the conjunctive combination to commit the mass of belief to the specific (singleton) class, whereas the disjunctive combination is used in the second part to transfer the conflicting beliefs to the proper meta-class. The test pattern can be classified according to the fusion results, and the object is considered as belonging to the class (singleton class or meta-class) with the maximum mass of belief. This is called hard credal classification. If an object is classified into a particular singleton class, it means that this object has been correctly classified with the proper imputation of the missing values. If an object is committed to a meta-class (e.g. A ∪ B), it means that we just know that this object belongs to one of the specific classes (e.g. A or B) included in the meta-class, but we cannot specify which one. This case can happen when the missing values are essential for the accurate classification of this object, but they cannot be estimated very well according to the context, and different estimations induce the classification of the object into distinct classes (e.g. A or B).

With traditional classifiers, the missing values in each object are usually estimated before making the classification. In our CCAI approach, many objects can be directly classified based on the distances to each class prototype, and the imputation of the missing values is skipped according to the context. So the computational complexity of CCAI is generally relatively low with respect to other methods like KNNI, PCC, etc.

Guideline for the tuning of the parameters ε and η: η in eq. (4) is associated with the calculation of the mass of belief on the specific class, and a bigger η value leads to a smaller mass of belief committed to the specific class. We advise taking η ∈ [0.5, 0.8], and the value η = 0.7 can be taken as the default value. The parameter ε is the threshold for changing the classification strategy. It is also used in eq. (13) for the calculation of the discounting factor.
A bigger ε makes fewer objects committed to the meta-classes (corresponding to a low imprecision of classification), but it increases the risk of misclassification errors. ε should be tuned according to the compromise one can accept between the misclassification error and the imprecision.

IV. EXPERIMENTS

Two experiments with artificial and real data sets have been used to test the performance of this new CCAI method compared with the K-NN imputation (KNNI) method [5], the FCM imputation (FCMI) method [6], [7] and our previous credal classification method PCC [25]. The evidential neural network classifier (ENN) [19] is adopted here to classify the edited patterns with the estimated values in PCC, KNNI and FCMI, since ENN generally produces good classification results. The parameters of ENN can be automatically optimized as explained in [19]. In the applications of PCC, the tuning parameter ε can be tuned according to the imprecision rate one can accept. In CCAI, a small number of nodes in the 2-dimensional SOM grid is given by M × N = 3 × 4, and we take the value K = N = 4 in the K-NN for the imputation of the missing values. This seems to provide good performance in the following experiments. In order to show the ability of CCAI and PCC to deal with the meta-classes, hard credal classification is applied, and the class of each object is decided according to the criterion of the maximal mass of belief. In our simulations, a misclassification is declared (counted) for an object truly originating from ω_i if it is classified into A with ω_i ∩ A = ∅. If ω_i ∩ A ≠ ∅ and A ≠ ω_i, then it is considered as an imprecise classification. The error rate, denoted Re, is calculated by Re = N_e / T, where N_e is the number of misclassification errors, and T is the number
of objects under test. The imprecision rate, denoted R_j, is calculated by R_j = N_j / T, where N_j is the number of objects committed to the meta-classes with cardinality value j. In our experiments, the classification of an object is generally uncertain (imprecise) among a very small number (e.g. 2) of classes, and we only report R_2 here, since no object is committed to a meta-class including three or more specific classes.

A. Experiment 1 (artificial data set)

In the first experiment, we show the interest of the credal classification based on belief functions with respect to the traditional classification working in the probability framework. A 3-class data set Ω = {ω_1, ω_2, ω_3} obtained from three 2-D uniform distributions is considered here. Each class has 200 training samples and 200 test samples, so there are 600 training samples and 600 test samples in total, as shown in Fig. 1. The uniform distributions of the three classes are characterized by the following interval bounds:

  class   x interval   y interval
  ω_1     (5, 65)      (5, 25)
  ω_2     (95, 155)    (5, 25)
  ω_3     (50, 110)    (50, 70)

The values in the second dimension, corresponding to the y-coordinate of the test samples, are all missing. So the test samples are classified according to the only available value, in the first dimension corresponding to the x-coordinate. A particular value of K = 9 is selected in the K-NN imputation method⁴. The classification results of the test objects by the different methods are given in Fig. 2 (a)-(c). For notational conciseness, we have denoted ω_i^te ≜ ω_i^test, ω_i^tr ≜ ω_i^training and ω_{i,...,k} ≜ ω_i ∪ ... ∪ ω_k. The error rate (in %) and imprecision rate (in %) are specified in the caption of each subfigure.

⁴ In fact, the choice of K ranging from 7 to 15 does not seriously affect the results.

Figure 1. Training data and test data.
Figure 2. Classification results of a 3-class artificial data set by different methods: (a) FCMI (Re = 14.67); (b) KNNI (Re = 14.17); (c) CCAI (Re = 5.83, R_2 = 16.83).
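The Experiment-1 setup described above (three 2-D uniform classes, with the y value of every test sample removed) can be reproduced with a short script. This is a sketch under our own naming; the interval bounds come from the table above.

```python
import random

BOUNDS = {                        # (x interval, y interval) per class
    "w1": ((5, 65),   (5, 25)),
    "w2": ((95, 155), (5, 25)),
    "w3": ((50, 110), (50, 70)),
}

def draw(cls, n, rng):
    # n samples of the given class, uniform inside its rectangle.
    (x_lo, x_hi), (y_lo, y_hi) = BOUNDS[cls]
    return [[rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)]
            for _ in range(n)]

rng = random.Random(0)
train = {c: draw(c, 200, rng) for c in BOUNDS}
# Test samples: same distributions, but the y value is missing everywhere.
test = {c: [[x, None] for x, _ in draw(c, 200, rng)] for c in BOUNDS}
```

A test sample of ω_3 with x in (50, 65) overlaps ω_1, and one with x in (95, 110) overlaps ω_2: these are exactly the cases that CCAI sends to a meta-class when only the x value is known.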
Because the y value of each test sample is missing, the class ω_3 appears partially overlapped with the classes ω_1 and ω_2 on their margins according to the value of the x-coordinate, as shown in Fig. 1. The missing value of the samples in the overlapped parts can be filled by quite different estimations obtained from the different classes with almost the same reliabilities. For example, the estimation of the missing values of the objects in the right margin of ω_1 and the left margin of ω_3 can be obtained according to the training class ω_1 or ω_3. The edited pattern with the estimation from ω_1 will be classified into
7 class, whereas t wll be commtted to class f the estmaton s drawn from. It s smlar to the test samples n the left margn of and the rght margn of. Ths ndcates that the mssng value play a crucal rule n the classfcaton of these objects, but unfortunately the estmaton of these nvolved mssng values are qute uncertan accordng to context. So these objects are prudently classfed nto the proper meta-class (e.g. and ) by CCAI. The CCAI results ndcate that these objects belong to one of the specfc classes ncluded n the meta-classes, but these specfc classes cannot be clearly dstngushed by the object based only on the avalable values. If one wants to get more precse and accurate classfcaton results, one needs to request for addtonal resources for gatherng more useful nformaton. The other objects n the left margn of, rght margn of and mddle of can be correctly classfed based on the only known value n x-coordnate, and t s not necessary to estmate the mssng value for the classfcaton of these objects n CCAI. However, all the test samples are classfed nto specfc classes by the tradtonal methods KNNI and FCMI, and ths causes many errors due to the lmtaton of probablty framework. Thus, CCAI produces less error rate than KNNI and FCMI thanks to the use of meta-classes. Meanwhle, the computatonal tme of CCAI s smlar to that of FCMI, and s much shorter than KNNI because of the ntroducton of SOM technque n the estmaton of mssng values. It shows that the computatonal complexty of CCAI s relatvely low. Ths smple example shows the nterest and the potental of the credal classfcaton obtaned wth CCAI method. B. Experment 2 (real data set) Four well known real data sets (Breast cancer, Irs, Seeds and Wne data sets) avalable from UCI Machne Learnng Repostory [32] are used n ths experment to evaluate the performance of CCAI wth respect to KNNI, FCMI and PCC. ENN s also used here as standard classfer. The basc nformaton of these four real data sets s gven n Table I. 
The cross-validation is performed on all the data sets, and we use the simplest 2-fold cross-validation⁵ here, since it has the advantage that the training and test sets are both large, and each sample is used for both training and testing on each fold. Each test sample has n missing (unknown) values, and they are missing completely at random in every dimension. The average error rate Re and imprecision rate R (for PCC and CCAI) of the different methods are given in Table II. In particular, the reported classification result of KNNI is the average with the K value ranging from 5 to 15.

⁵ More precisely, the samples in each class are randomly assigned to two sets S_1 and S_2 having equal size. Then we train on S_1 and test on S_2, and reciprocally.

Table I. BASIC INFORMATION OF THE USED DATA SETS.

  name        classes  attributes  instances
  Breast (B)
  Iris (I)
  Seeds (S)
  Wine (W)

Table II. CLASSIFICATION RESULTS FOR DIFFERENT REAL DATA SETS (IN %).

  data  n  FCMI Re  KNNI Re  PCC {Re, R_2}   CCAI {Re, R_2}
  B                          {3.81, 2.34}    {3.66, 0}
  B                          {5.42, 1.32}    {4.83, 1.61}
  B                          {10.10, 2.64}   {9.00, 0.66}
  I                          {5.33, 2.67}    {4.00, 1.33}
  I                          {8.67, 4.00}    {8.00, 4.67}
  I                          {12.67, 9.33}   {11.33, 12}
  S                          {9.52, 4.76}    {9.52, 0}
  S                          {10.48, 4.29}   {10.00, 0.48}
  S                          {16.19, 14.76}  {16.19, 13.81}
  W                          {26.97, 1.69}   {6.74, 1.12}
  W                          {29.78, 2.25}   {7.30, 3.93}
  W                          {30.34, 2.81}   {12.36, 3.93}

One can see that the credal classifications of PCC and CCAI always produce a lower error rate than the traditional FCMI and KNNI methods, since some objects that cannot be correctly classified using only the available attribute values have been properly committed to the meta-classes, which reveals well the imprecision of the classification. In CCAI, some objects are still classified into a meta-class after the imputation of the missing values. This indicates that these missing values play a crucial role in the classification, but that their estimation is not very good. In other words, the missing values can be filled with similar reliabilities by different estimated data, which lead to distinct classification results.
So we have to cautiously assign them to the meta-class to reduce the risk of misclassification. Compared with our previous method PCC, the new method CCAI generally provides better performance, with lower error and imprecision rates, mainly because a more accurate estimation method (i.e. SOM + KNN) for the missing values is adopted in CCAI. This experiment using real data sets from different applications shows the effectiveness and interest of the new CCAI method with respect to the other methods.

V. CONCLUSION

A fast credal classification method with adaptive imputation of missing values (called CCAI) for dealing with incomplete patterns has been presented. In step 1 of the CCAI method, some objects (incomplete patterns) are directly classified ignoring the missing values whenever a specific classification result can be obtained, which effectively reduces the computational complexity because it avoids the imputation of the missing values. However, if the available information is not sufficient to achieve a specific classification of the object, we estimate (recover) the missing values before entering the classification procedure in the second step. The SOM and K-NN approaches are applied to estimate the missing attributes with a good compromise between estimation accuracy and computational burden. An information fusion technique is employed to combine the multiple simple classification results obtained from each training class into the final credal classification of the object. The credal classification in this work allows the object to belong to different singleton classes and meta-classes with different masses of belief. Once the object is committed to a meta-class (e.g. A∪B), it means that the missing values cannot be accurately recovered from the context and that the estimation is not very good: different estimations would lead the object to distinct classes (e.g. A or B) involved in the meta-class. So other sources of information will be required to achieve a more precise classification of the object if necessary. Two experiments, with artificial and real data sets, have been carried out to test the performance of the CCAI method. The results show that the credal classification captures well the imprecision of the classification and effectively reduces the misclassification errors as well.

Acknowledgements

This work has been partially supported by the National Natural Science Foundation of China, the Fundamental Research Funds for the Central Universities (No. JCQ01067) and the Natural Science Foundation of Shaanxi (No. 2015JQ6265).

REFERENCES

[1] P. Garcia-Laencina, J. Sancho-Gomez, A. Figueiras-Vidal, Pattern classification with missing data: a review, Neural Comput. Appl., Vol. 19.
[2] R.J. Little, D.B. Rubin, Statistical Analysis with Missing Data, John Wiley & Sons, New York, 1987 (second edition published in 2002).
[3] A. Farhangfar, L. Kurgan, J. Dy, Impact of imputation of missing values on classification error for discrete data, Pattern Recognition, Vol. 41.
[4] D.J. Mundfrom, A. Whitcomb, Imputing missing values: the effect on the accuracy of classification, Multiple Linear Regression Viewpoints, Vol. 25(1):13-19.
[5] G. Batista, M.C. Monard, A study of K-nearest neighbour as an imputation method, in Proc. of Second International Conference on Hybrid Intelligent Systems (IOS Press, v. 87).
[6] J. Luengo, J.A. Saez, F. Herrera, Missing data imputation for fuzzy rule-based classification systems, Soft Computing, Vol. 16(5).
[7] D. Li, J. Deogun, W. Spaulding, B. Shuart, Towards missing data imputation: a study of fuzzy k-means clustering method, in: 4th International Conference on Rough Sets and Current Trends in Computing (RSCTC'04).
[8] Z. Ghahramani, M.I. Jordan, Supervised learning from incomplete data via an EM approach, in: J.D. Cowan et al. (Eds.), Adv. Neural Inf. Process., Morgan Kaufmann Publishers Inc., Vol. 6.
[9] K. Pelckmans, J.D. Brabanter, J.A.K. Suykens, B.D. Moor, Handling missing values in support vector machine classifiers, Neural Networks, Vol. 18(5-6).
[10] G. Shafer, A Mathematical Theory of Evidence, Princeton Univ. Press.
[11] F. Smarandache, J. Dezert (Editors), Advances and Applications of DSmT for Information Fusion, American Research Press, Rehoboth, Vol. 1-4.
[12] P. Smets, The combination of evidence in the transferable belief model, IEEE Trans. on Pattern Anal. and Mach. Intell., Vol. 12(5), 1990.
[13] A.-L. Jousselme, C. Liu, D. Grenier, E. Bossé, Measuring ambiguity in the evidence theory, IEEE Trans. on SMC, Part A, Vol. 36(5).
[14] T. Denœux, A k-nearest neighbor classification rule based on Dempster-Shafer theory, IEEE Trans. on Systems, Man and Cybernetics, Vol. 25(5).
[15] Z.-g. Liu, Q. Pan, J. Dezert, A new belief-based K-nearest neighbor classification method, Pattern Recognition, Vol. 46(3).
[16] Z.-g. Liu, Q. Pan, J. Dezert, G. Mercier, Fuzzy-belief K-nearest neighbor classifier for uncertain data, in Proc. of 17th Int. Conference on Information Fusion (Fusion 2014), Spain, July 7-10.
[17] Z.-g. Liu, Q. Pan, J. Dezert, G. Mercier, Credal classification rule for uncertain data based on belief functions, Pattern Recognition, Vol. 47(7), 2014.
[18] T. Denœux, P. Smets, Classification using belief functions: relationship between case-based and model-based approaches, IEEE Trans. on Systems, Man and Cybernetics, Part B, Vol. 36(6).
[19] T. Denœux, A neural network classifier based on Dempster-Shafer theory, IEEE Trans. on Systems, Man and Cybernetics, Part A, Vol. 30(2).
[20] M.-H. Masson, T. Denœux, ECM: an evidential version of the fuzzy c-means algorithm, Pattern Recognition, Vol. 41(4).
[21] T. Denœux, M.-H. Masson, EVCLUS: EVidential CLUStering of proximity data, IEEE Trans. on Systems, Man and Cybernetics, Part B, Vol. 34(1):95-109.
[22] Z.-g. Liu, Q. Pan, J. Dezert, G. Mercier, Credal c-means clustering method based on belief functions, Knowledge-Based Systems, Vol. 74.
[23] T. Denœux, Maximum likelihood estimation from uncertain data in the belief function framework, IEEE Transactions on Knowledge and Data Engineering, Vol. 25(1).
[24] Z.-g. Liu, J. Dezert, Q. Pan, G. Mercier, Combination of sources of evidence with different discounting factors based on a new dissimilarity measure, Decision Support Systems, Vol. 52.
[25] Z.-g. Liu, Q. Pan, G. Mercier, J. Dezert, A new incomplete pattern classification method based on evidential reasoning, IEEE Trans. Cybernetics, Vol. 45(4).
[26] T. Kohonen, The Self-Organizing Map, Proceedings of the IEEE, Vol. 78(9).
[27] J. Dezert, A. Tchamova, On the validity of Dempster's fusion rule and its interpretation as a generalization of Bayesian fusion rule, International Journal of Intelligent Systems, Vol. 29(3).
[28] D. Dubois, H. Prade, Representation and combination of uncertainty with belief functions and possibility measures, Computational Intelligence, Vol. 4(4).
[29] F. Smarandache, J. Dezert, On the consistency of PCR6 with the averaging rule and its application to probability estimation, in Proc. of 16th Int. Conference on Information Fusion (Fusion 2013), Istanbul, Turkey, July 9-12.
[30] I. Hammami, J. Dezert, G. Mercier, A. Hamouda, On the estimation of mass functions using Self Organizing Maps, in Proc. of Belief 2014 Conf., Oxford, UK.
[31] S. Geisser, Predictive Inference: An Introduction, Chapman and Hall, New York, NY.
[32] A. Frank, A. Asuncion, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA.
More informationDesign and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm
Desgn and Optmzaton of Fuzzy Controller for Inverse Pendulum System Usng Genetc Algorthm H. Mehraban A. Ashoor Unversty of Tehran Unversty of Tehran h.mehraban@ece.ut.ac.r a.ashoor@ece.ut.ac.r Abstract:
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationA new construction of 3-separable matrices via an improved decoding of Macula s construction
Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula
More informationLOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin
Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence
More information