Adaptive imputation of missing values for incomplete pattern classification


Adaptive imputation of missing values for incomplete pattern classification

Zhun-ga Liu 1, Quan Pan 1, Jean Dezert 2, Arnaud Martin 3
1. School of Automation, Northwestern Polytechnical University, Xi'an, China. Email: liuzhunga@nwpu.edu.cn
2. ONERA - The French Aerospace Lab, Palaiseau, France. Email: jean.dezert@onera.fr
3. IRISA, University of Rennes 1, Rue E. Branly, Lannion, France. Email: Arnaud.Martin@univ-rennes1.fr

Abstract: In the classification of incomplete patterns, the missing values can either play a crucial role in the class determination, or have only little influence (or eventually none) on the classification results, according to the context. We propose a credal classification method for incomplete patterns with adaptive imputation of missing values based on belief function theory. At first, we try to classify the object (incomplete pattern) based only on the available attribute values. As underlying principle, we assume that the missing information is not crucial for the classification if a specific class can be found for the object using only the available information. In this case, the object is committed to this particular class. However, if the object cannot be classified without ambiguity, it means that the missing values play a main role in achieving an accurate classification. In this case, the missing values are imputed based on the K-nearest neighbor (K-NN) and self-organizing map (SOM) techniques, and the edited pattern with the imputation is then classified. The (original or edited) pattern is respectively classified according to each training class, and the classification results, represented by basic belief assignments, are fused with proper combination rules for making the credal classification. The object is allowed to belong, with different masses of belief, to the specific classes and to meta-classes (which are particular disjunctions of several single classes). The credal classification captures well the uncertainty and imprecision of the classification, and effectively reduces the rate of misclassifications thanks to the introduction of meta-classes. The effectiveness of the proposed method with respect to other classical methods is demonstrated on several experiments using artificial and real data sets.

Keywords: belief function, classification, missing values, SOM, K-NN.

I. INTRODUCTION

In many practical classification problems, the available information for making the object classification is partial (incomplete), because some attribute values can be missing for various reasons (e.g., the failure or dysfunction of the sensors providing the information, or the partial observation of the object of interest because of some occultation phenomenon, etc.). So it is crucial to develop efficient techniques to classify as well as possible the objects with missing attribute values (incomplete patterns), and the search for a solution to this problem remains an important research topic in the pattern classification field [1], [2]. More details about pattern classification can be found in [3], [4]. Many approaches have been developed for classifying incomplete patterns [1], and they can be broadly grouped into four types. The first (simplest) one is to directly remove the patterns with missing values, and to design the classifier only for the complete patterns. This method is acceptable when the incomplete data set is only a very small subset (e.g., less than 5%) of the whole data set, but it cannot effectively classify the patterns with missing values. The second type is the model-based techniques [5]. The probability density function (PDF) of the input data (complete and incomplete cases) is estimated at first by means of some procedure, and then the object is classified using Bayesian reasoning. For instance, the expectation-maximization (EM) algorithm has been applied to many problems involving missing data for training Gaussian mixture models [5]. Model-based methods must make assumptions about the joint distribution of all the variables of the model, but suitable distributions are sometimes hard to obtain. The third type of classifiers is designed to directly handle incomplete patterns without imputing the missing values, such as neural network ensemble methods [6], decision trees [7], fuzzy approaches [8] and support vector machine classifiers [9]. The last type is the often used imputation (estimation) approach: the missing values are first filled with proper estimations [10], and the edited patterns are then classified using a normal classifier (for complete patterns). The missing-value estimation and the pattern classification are treated separately in these methods. Many works have been devoted to the imputation of missing data. The imputation can be done either by statistical methods, e.g., mean imputation [11], regression imputation [2], etc., or by machine learning methods, e.g., K-nearest neighbors imputation (KNNI) [12], Fuzzy c-means (FCM) imputation (FCMI) [13], [14], Self-organizing map imputation (SOMI) [15], etc. In KNNI, the missing values are estimated using the K nearest neighbors of the object in the training data space. In FCMI, the missing values are imputed according to the clustering centers of FCM, taking into account the distances of the object to these centers [13], [14]. In SOMI [15], the best match node (unit) of the incomplete pattern is found ignoring the missing values, and the imputation of the missing values is computed based on the weights of the activation group of nodes, which includes the best match node and its close neighbors.

These existing methods usually attempt to classify the object into a particular class with maximal probability or likelihood measure. However, the estimation of missing values is in general quite uncertain, and different imputations of the missing values can yield very different classification results, which prevents us from correctly committing the object to a particular class. Belief function theory (BFT), also called Dempster-Shafer theory (DST) [16], and its extensions [17], [18] offer a mathematical framework for modeling uncertain and imprecise information [19]. BFT has already been applied successfully to object classification [20]-[28], clustering [29]-[33], multi-source information fusion [34]-[37], etc. Some classifiers for complete patterns based on DST have been developed by Denœux and his collaborators, leading to the evidential K-nearest neighbors (EK-NN) [21], the evidential neural network (ENN) [27], etc. An extra ignorance element, represented by the disjunction of all the elements of the whole frame of discernment, is introduced in these classifiers to capture the totally ignorant information. However, the partial imprecision, which is very important in classification, is not well characterized. We have proposed credal classifiers [23], [24] for complete patterns that consider all the possible meta-classes (i.e., the particular disjunctions of several singleton classes) to model the partially imprecise information. The credal classification allows the objects to belong (with different masses of belief) not only to the singleton classes, but also to any set of classes corresponding to the meta-classes. In [23], a belief-based K-nearest neighbor classifier (BK-NN) has been presented; the credal classification of the object is done according to the distances between the object and its K nearest neighbors, together with two given (acceptance and rejection) distance thresholds. K-NN classifiers generally carry a big computation burden, which is not convenient for real applications. Thus, a simple credal classification rule (CCR) [24] has been further developed: the belief of the object in the different classes (i.e., the singleton classes and selected meta-classes) is directly calculated from the distance to the center of the corresponding class and from the distinguishability degree (w.r.t. the object) of the singleton classes involved in the meta-class. The center of a meta-class in CCR is located at the same (similar) distance from the centers of all the involved singleton classes. Moreover, for the cases where no training data is available, we have also proposed several credal clustering methods [30]-[32]. Nevertheless, these previous credal classification methods mainly deal with complete patterns, without taking the missing values into account. In our recent work, a prototype-based credal classification (PCC) [25] method for incomplete patterns has been introduced to capture the imprecise information caused by the missing values. The objects hard to classify correctly are committed by PCC to a suitable meta-class, which well characterizes the imprecision of the classification due to the absence of part of the attributes, and also reduces the misclassification errors.

In PCC, the missing values of each incomplete pattern are imputed using the prototype of every class, and the edited pattern obtained with each imputation is respectively classified by a standard classifier (for complete patterns). With PCC, one obtains c pieces of classification results for each incomplete pattern in a c-class problem, and the global fusion of the c results yields the credal classification. Unfortunately, the PCC classifier is computationally greedy and time-consuming, and the imputation of the missing values based on class prototypes is not very precise. In order to overcome the limitations of PCC, we propose a new credal classification method for incomplete patterns with adaptive imputation of the missing values, called Credal Classification with Adaptive Imputation (CCAI) for short. The pattern to classify usually consists of multiple attributes. Sometimes, the class of the pattern can be precisely determined using only a part (a subset) of the available attributes, which implies that the other attributes are redundant and in fact unnecessary for the classification. This motivates the adaptive imputation strategy of CCAI. In CCAI, we attempt at first to classify the object using only the known attribute values. If a specific classification result is obtained, it very likely means that the missing values are not necessary for the classification, and we directly take the decision about the class of the object based on this result. However, if the object cannot be clearly classified with the available information, it indicates that the information carried by the missing attribute values is probably crucial for making the classification. In this case, we use a more sophisticated classification strategy based on the edition of the pattern with a proper imputation of the missing values. K-nearest neighbors-based imputation usually provides pretty good performance for the estimation of missing values, but its main drawback is its big computational burden. To reduce this burden, a Self-Organizing Map (SOM) [38] is applied in each class, and the optimized weighting vectors are used to represent the corresponding class. Then, the K nearest weighting vectors of the object in each class are respectively employed to estimate the missing values. For the classification of the original incomplete pattern (without imputation of the missing values) or of the edited pattern (with imputation of the missing values), we adopt an ensemble classifier approach. One respectively gets a simple classification result according to each training class, and each classification result is represented by a simple basic belief assignment (BBA) with only two focal elements (i.e., a singleton class and the ignorant class). The belief of the object belonging to each class is calculated based on the distance to the corresponding prototype, and the remaining belief is committed to the ignorant element.

The fusion (ensemble) of these multiple BBAs is then used to determine the class of the object. If the object is directly classified using only the known values, the Dempster-Shafer (DS) fusion rule [16] is applied, because of the simplicity of this rule and also because the BBAs to fuse are usually in low conflict in this case. (Although the rule was originally proposed by Arthur Dempster, we prefer to call it the Dempster-Shafer rule because it has been widely promoted by Shafer in [16].) In this case, a specific result is obtained with the DS rule. Otherwise, a new fusion rule inspired by the Dubois-Prade (DP) rule [39] is used to classify the edited pattern with the proper imputation of its missing values. Because the estimation of the missing values can be quite uncertain, it naturally induces an imprecise classification. So, in this new rule, the partial conflicting beliefs are kept and committed to the associated meta-classes, to reasonably reveal the potential imprecision of the classification result.

In this paper, we present a credal classification method with adaptive imputation of missing values based on belief function theory for dealing with incomplete patterns. The paper is organized as follows. The basics of belief function theory and of the Self-Organizing Map are briefly recalled in Section II. The new credal classification method for incomplete patterns is presented in Section III, and the proposed method is then tested and evaluated in Section IV, in comparison with several other classical methods. The paper is concluded in the final section.

II. BACKGROUND KNOWLEDGE

Belief function theory (BFT) can well characterize uncertain and imprecise information, and it is used in this work for the classification of the patterns. The SOM technique is employed to find the optimized weighting vectors used to represent each class, which reduces the computation burden of the K-NN-based estimation of the missing values. So the basic knowledge about BFT and SOM is briefly recalled here.

A. Basics of belief function theory

The Belief Function Theory (BFT) introduced by Glenn Shafer is also known as Dempster-Shafer Theory (DST), or the Mathematical Theory of Evidence [16]-[18]. Let us consider a frame of discernment consisting of c exclusive and exhaustive hypotheses (classes) denoted by Ω = {ω_i, i = 1, 2, ..., c}. The power-set of Ω, denoted 2^Ω, is the set of all the subsets of Ω, empty set included. For example, if Ω = {ω_1, ω_2, ω_3}, then 2^Ω = {∅, ω_1, ω_2, ω_3, ω_1 ∪ ω_2, ω_1 ∪ ω_3, ω_2 ∪ ω_3, Ω}. In the classification problem, a singleton element (e.g., ω_i) represents a specific class. In this work, the disjunction (union) of several singleton elements is called a meta-class, and it characterizes the partial ignorance of the classification.

Examples of meta-classes are ω_i ∪ ω_j, or ω_i ∪ ω_j ∪ ω_k. In BFT, an object can be associated with the different singleton elements as well as with sets of elements, according to a basic belief assignment (BBA), which is a function m(.) from 2^Ω to [0, 1] satisfying m(∅) = 0 and the normalization condition \sum_{A \in 2^\Omega} m(A) = 1. The subsets A of Ω such that m(A) > 0 are called the focal elements of the belief mass m(.). A credal classification (or partitioning) [29] is defined as an n-tuple M = (m_1, ..., m_n) of BBAs, where m_i is the basic belief assignment of the object x_i ∈ X, i = 1, ..., n, associated with the different elements of the power-set 2^Ω. The credal classification allows the objects to belong to the specific classes and to the sets of classes corresponding to meta-classes, with different masses of belief; it can thus well model imprecise and uncertain information thanks to the introduction of the meta-classes. For combining multiple sources of evidence represented by a set of BBAs, the well-known Dempster's rule [16] is still widely used, even if its justification is an open debate and is questioned in the community [40], [41]. The combination of two BBAs m_1(.) and m_2(.) over 2^Ω is done with the DS rule of combination, defined by m_{DS}(∅) = 0 and, for A ≠ ∅, A, B, C ∈ 2^Ω, by

m_{DS}(A) = \frac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)}    (1)

The DS rule is commutative and associative, and it makes a compromise between specificity and complexity for the combination of the BBAs. With this rule, all the conflicting beliefs m_1(B) m_2(C), B ∩ C = ∅, are proportionally redistributed back to the focal elements through a classical normalization step. However, this redistribution can yield unreasonable results in highly conflicting cases [40], as well as in some special low-conflict cases [41]. That is why different rules of combination have emerged to overcome these limitations. Among the possible alternatives to the DS rule, we find Smets' conjunctive rule (used in his transferable belief model (TBM) [18]), the Dubois-Prade (DP) rule [39], and, more recently, the more complex Proportional Conflict Redistribution (PCR) rules [42]. Unfortunately, the DP and PCR rules are less appealing from an implementation standpoint, since they are not associative and become complex to use when more than two BBAs have to be combined altogether.
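For concreteness, here is a minimal sketch of the DS combination of eq. (1), with BBAs represented as dictionaries mapping focal elements (frozensets of class labels) to masses; the representation and the function name ds_combine are our own illustrative choices, not notation from the paper.

```python
from itertools import product

def ds_combine(m1, m2):
    """Dempster-Shafer combination of two BBAs, eq. (1).

    m1, m2: dicts mapping focal elements (frozensets of class labels)
    to masses summing to 1. The mass of empty intersections (conflict)
    is redistributed through the normalization step.
    """
    fused, conflict = {}, 0.0
    for (B, mB), (C, mC) in product(m1.items(), m2.items()):
        A = B & C
        if A:
            fused[A] = fused.get(A, 0.0) + mB * mC
        else:
            conflict += mB * mC
    if conflict >= 1.0:
        raise ValueError("total conflict: DS rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in fused.items()}

# two-focal-element BBAs over Omega = {w1, w2, w3}, as used later in the paper
Omega = frozenset({"w1", "w2", "w3"})
m1 = {frozenset({"w1"}): 0.7, Omega: 0.3}
m2 = {frozenset({"w2"}): 0.4, Omega: 0.6}
print(ds_combine(m1, m2))
```

Being associative, the rule can be folded over any number of BBAs, which is how the c class-conditional BBAs of Section III can be fused when they are in low conflict.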

B. Overview of the Self-Organizing Map

The Self-Organizing Map (SOM, also called Kohonen map) [38], introduced by Teuvo Kohonen, is a type of artificial neural network (ANN) trained by an unsupervised learning method. SOM defines a mapping from the input space onto a low-dimensional (typically two-dimensional) grid of M × N nodes. It thus approximates the feature space (e.g., of real input vectors x ∈ R^p) by a projected 2-D space, while preserving the topological properties of the input space by means of a neighborhood function. Hence, SOM is very useful for visualizing low-dimensional views of high-dimensional data through a nonlinear projection. The node at position (i, j), i = 1, ..., M, j = 1, ..., N, corresponds to a weighting vector denoted σ(i, j) ∈ R^p. An input vector x ∈ R^p is compared to each σ(i, j), and the neuron whose weighting vector is the closest (most similar) to x according to a given metric is called the best matching unit (BMU); it is defined as the output of the SOM with respect to x. In real applications, the Euclidean distance is usually used to compare x and σ(i, j). The input pattern x is mapped onto the SOM at the location (i, j) where σ(i, j) has the minimal distance to x. The SOM thereby achieves a non-uniform quantization that transforms x to σ_x by minimizing the given metric (e.g., a distance measure) [43]. SOM relies on competitive learning, and its training algorithm is iterative. The initial values of the weighting vectors σ may be set randomly, but they converge to stable values at the end of the training process. When an input vector is fed to the network, its Euclidean distance to all the weighting vectors is computed. Then the BMU, whose weighting vector is the most similar to the input vector, is found, and the weights of the BMU and of the neurons close to it in the SOM grid are adjusted towards the input vector. The magnitude of the change decreases with time and with the distance (within the grid) from the BMU. Detailed information about SOM can be found in [38]. In this work, SOM is applied in each training class to obtain the optimized weighting vectors used to represent the corresponding class. The number of weighting vectors is much smaller than the number of original samples in the associated training class. We utilize these weighting vectors, rather than the original samples, to estimate the missing values of the object (incomplete pattern), and this effectively reduces the computation burden.
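The following sketch shows a classical online SOM of the kind just described (BMU search plus neighborhood-weighted updates), which can be trained once per class to produce the M × N weighting vectors; the grid size, learning schedule and function name are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def train_som(X, M=3, N=4, iters=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal online SOM: returns M*N weighting vectors fitted to X (n, p)."""
    rng = np.random.default_rng(seed)
    W = X[rng.integers(0, len(X), M * N)].astype(float)       # init from data
    grid = np.array([(i, j) for i in range(M) for j in range(N)], dtype=float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))           # best matching unit
        frac = t / iters                                      # decaying schedules
        lr, sigma = lr0 * (1.0 - frac), sigma0 * (1.0 - frac) + 1e-3
        # Gaussian neighborhood on the 2-D grid, centered at the BMU
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                        # pull towards x
    return W

# one small SOM per training class, e.g.: W_g = train_som(Y_g, M=3, N=4)
```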

III. CREDAL CLASSIFICATION OF INCOMPLETE PATTERNS

Our new method consists of two main steps. In the first step, the object (incomplete pattern) is directly classified according to the known attribute values only, and the missing values are ignored. If one obtains a specific classification result, the classification procedure is done, because the available attribute information is sufficient for making the classification. But if the class of the object cannot be clearly identified in this first step, it means that the unavailable information included in the missing values is likely crucial for the classification. In this case, one has to enter the second step of the method and classify the object with a proper imputation of the missing values. In the classification procedure, the original or edited pattern is respectively classified according to each class of training data. The global fusion of these classification results, which can be considered as multiple sources of evidence represented by BBAs, is then used for the credal classification of the object. Our new method for the credal classification of incomplete patterns with adaptive imputation of the missing values is referred to as Credal Classification with Adaptive Imputation, or just CCAI for conciseness. CCAI is based on belief function theory, which can well manage the uncertain and imprecise information caused by the missing values in the classification.

A. First step: direct classification of the incomplete pattern using the available data

Let us consider a set of test patterns (samples) X = {x_1, ..., x_n} to be classified based on a set of labeled training patterns Y = {y_1, ..., y_s} over the frame of discernment Ω = {ω_1, ..., ω_c}. In this work, we focus on the classification of incomplete patterns in which some attribute values are absent, so we consider that all the test patterns (e.g., x_i, i = 1, ..., n) have several missing values. The training data set Y may also contain incomplete patterns in some applications. However, if the incomplete patterns represent a very small amount, say less than 5%, of the training data set, they can be ignored in the classification. If the percentage of incomplete patterns is big, the missing values must usually be estimated at first, and the classifier is trained using the edited (complete) patterns. In real applications, one can also simply restrict the training data set to the complete labeled patterns when the training information is sufficient. So, for simplicity and convenience, we consider in the sequel that the labeled samples (e.g., y_j, j = 1, ..., s) of the training set Y are all complete patterns. In the first step of the classification, the incomplete pattern, say x, is respectively classified according to each training class by a normal classifier (one devoted to complete patterns), and all the missing values are ignored here. In this work, we adopt a very simple classification method for the convenience of computation, and x is directly classified based on its distance to the prototype of each class. (Many other normal classifiers, e.g., K-NN, could be selected here depending on the preference of the user; we propose this simple method because of its low computational complexity.) The prototype of each class, {o_1, ..., o_c} corresponding to {ω_1, ..., ω_c}, is given by the arithmetic average vector of the training patterns of the same class.

Mathematically, the prototype is computed for g = 1, ..., c by

o_g = \frac{1}{N_g} \sum_{y_j \in \omega_g} y_j    (2)

where N_g is the number of training samples in the class ω_g. In a c-class problem, one gets c pieces of simple classification results for x, one according to each class of training data, and each result is represented by a simple BBA with two focal elements, i.e., a singleton class and the ignorant class Ω characterizing the full ignorance. The belief of x belonging to the class ω_g is computed from the distance between x and the corresponding prototype o_g. The normalized Euclidean distance of eq. (4) is adopted here to deal with anisotropic classes, and the missing values are ignored in the calculation of this distance. The remaining mass of belief is assigned to the ignorant class Ω. Therefore, the BBA construction is done by

m_{o_g}(\omega_g) = e^{-\eta d_g}
m_{o_g}(\Omega) = 1 - e^{-\eta d_g}    (3)

with

d_g = \frac{1}{p} \sum_{j=1}^{p} \left( \frac{x_j - o_{gj}}{\delta_{gj}} \right)^2    (4)

and

\delta_{gj} = \frac{1}{N_g} \sum_{y_i \in \omega_g} (y_{ij} - o_{gj})^2    (5)

where x_j is the value of x in the j-th dimension, y_ij is the value of y_i in the j-th dimension, and p is the number of available attribute values in the object x. The coefficient 1/p is necessary to normalize the distance value, because each test sample can have a different number of missing values. δ_gj measures the average (squared) deviation of the training samples of class ω_g from the prototype o_g in the j-th dimension, and N_g is the number of training samples in ω_g. η is a tuning parameter; a bigger η generally yields a smaller mass of belief on the specific class ω_g. It is usually recommended to take η ∈ [0.5, 0.8] according to our various tests, and η = 0.7 can be considered as the default value. Obviously, the smaller the distance measure, the bigger the mass of belief on the singleton class. This particular structure of the BBA indicates that, from the training data of ω_g alone, we can only assess the degree to which the object x is associated with the specific class ω_g; the remaining mass of belief reflects the level of full ignorance, and it is committed to the ignorant class Ω.
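A minimal sketch of this BBA construction, assuming complete training arrays and NaN-encoded missing attributes; the function name class_bba and the NaN convention are our own, and eqs. (4)-(5) are implemented as reconstructed above.

```python
import numpy as np

def class_bba(x, Y_g, eta=0.7):
    """Simple two-focal-element BBA of pattern x w.r.t. one class, eqs. (2)-(5).

    x:   length-p vector with np.nan at the missing attributes.
    Y_g: (N_g, p) array of complete training samples of class omega_g.
    Returns (mass on the singleton class, mass on the ignorant class).
    """
    o_g = Y_g.mean(axis=0)                        # class prototype, eq. (2)
    delta = ((Y_g - o_g) ** 2).mean(axis=0)       # per-dimension spread, eq. (5)
    avail = ~np.isnan(x)                          # missing values are ignored
    d_g = np.mean(((x[avail] - o_g[avail]) / delta[avail]) ** 2)   # eq. (4)
    m_class = np.exp(-eta * d_g)                  # eq. (3)
    return m_class, 1.0 - m_class
```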

Similarly, one calculates the c independent BBAs m_{o_g}(.), g = 1, ..., c, based on the different training classes. Before combining these c BBAs, we examine whether a specific classification result can be derived from them. This is done as follows. Let ω_1st be the class receiving the biggest mass of belief among the c BBAs, i.e., m_{o_1st}(ω_1st) = max_g m_{o_g}(ω_g); the object is then considered as very likely belonging to the class ω_1st. Let ω_2nd be the class with the second biggest mass of belief. The distinguishability degree χ ∈ (0, 1] of an object x with respect to the different classes is defined by:

\chi = \frac{m_{o_{2nd}}(\omega_{2nd})}{m_{o_{1st}}(\omega_{1st})}    (6)

Let ε be a chosen small positive distinguishability threshold in (0, 1]. If the condition χ ≤ ε is satisfied, it means that the classes involved in the computation of χ can be clearly distinguished for x. In this case, it is very likely that a specific classification result can be obtained from the fusion of the c BBAs. The condition χ ≤ ε also indicates that the available attribute information is sufficient for making the classification of the object, and that the imputation of the missing values is not necessary. If the condition χ ≤ ε holds, the c BBAs are directly combined with the DS rule to obtain the final classification result of the object, because the DS rule usually produces specific combination results with an acceptable computation burden in low-conflict cases. In such a case, no meta-class is included in the fusion result, because the different classes are considered distinguishable according to the distinguishability condition. Moreover, the mass of belief of the full ignorance class Ω, which represents the noisy data (outliers), can be proportionally redistributed to the singleton classes for more specific results, if one knows a priori that no noisy data is involved. If the distinguishability condition χ ≤ ε is not satisfied, it means that the classes ω_1st and ω_2nd cannot be clearly distinguished for the object with respect to the chosen threshold ε, indicating that the missing attribute values almost surely play a crucial role in the classification. In this case, the missing values must be properly imputed to recover the unavailable attribute information before entering the classification procedure. This is the Step 2 of our method, explained in the next subsection.
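The first step can then be sketched as below, reusing the class_bba and ds_combine sketches given earlier; the function name, the dict-of-arrays interface and the default ε = 0.3 are illustrative assumptions (the paper tunes ε according to the accepted error/imprecision compromise).

```python
def step1_direct_classification(x, classes, eta=0.7, eps=0.3):
    """First step of CCAI (sketch): classify from the available values only.

    classes: dict mapping each class label to its training array Y_g.
    Returns the fused BBA if the two best classes are distinguishable
    (chi <= eps, eq. (6)); returns None to signal that the second step
    (imputation of the missing values) is required.
    """
    Omega = frozenset(classes)
    bbas = {}
    for g, Y_g in classes.items():
        m_g, m_ign = class_bba(x, Y_g, eta)
        bbas[g] = {frozenset({g}): m_g, Omega: m_ign}
    ranked = sorted(classes, key=lambda g: bbas[g][frozenset({g})], reverse=True)
    first, second = ranked[0], ranked[1]
    chi = bbas[second][frozenset({second})] / bbas[first][frozenset({first})]
    if chi > eps:                       # ambiguous: missing values are crucial
        return None
    fused = bbas[first]                 # low conflict: DS rule, eq. (1)
    for g in ranked[1:]:
        fused = ds_combine(fused, bbas[g])
    return fused
```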

B. Second step: classification of the incomplete pattern with imputation of the missing values

1) Multiple estimation of the missing values: Various methods exist for the estimation of the missing attribute values. In particular, the K-NN imputation method generally provides good performance, but its main drawback is its big computational burden, since one needs to calculate the distances between the object and all the training samples. Inspired by [43], we propose to use the Self-Organizing Map (SOM) technique [38] to reduce this computational complexity. SOM is applied in each class of training data, and M × N weighting vectors are obtained after the optimization procedure. These optimized weighting vectors characterize well the topological features of the whole class, and they are used to represent the corresponding data class. The number of weighting vectors is usually small (e.g., 5 × 6), so the K nearest neighbors of the test pattern among the weighting vectors of the SOM can be found with a low computational complexity. (The training of the SOM using the labeled patterns becomes time-consuming when the number of labeled patterns is big, but fortunately it can be done off-line. In our experiments, the reported running times do not include the computation time spent on the off-line procedures.) The selected weighting vector no. k of the class ω_g, g = 1, ..., c, is denoted σ_k^{ω_g}, for k = 1, ..., K. In each class, the K selected close weighting vectors provide different contributions (weights) to the estimation of the missing values, and the weight p_k^{ω_g} of each vector is defined based on the distance between the object x and the weighting vector σ_k^{ω_g}:

p_k^{\omega_g} = e^{-\lambda d_k^{\omega_g}}    (7)

with

\lambda = \frac{cMN(cMN - 1)}{2 \sum_{i,j} d(\sigma_i, \sigma_j)}    (8)

where d_k^{ω_g} is the Euclidean distance between x and the weighting vector σ_k^{ω_g}, ignoring the missing values, and 1/λ is the average distance between the pairs of weighting vectors produced by SOM over all the classes; c is the number of classes; M × N is the number of weighting vectors obtained by SOM in each class; and d(σ_i, σ_j) is the Euclidean distance between any two weighting vectors σ_i and σ_j. The weighted mean value ŷ^{ω_g} of the K selected weighting vectors of the training class ω_g is used for the imputation of the missing values. It is calculated by

\hat{y}^{\omega_g} = \frac{\sum_{k=1}^{K} p_k^{\omega_g} \sigma_k^{\omega_g}}{\sum_{k=1}^{K} p_k^{\omega_g}}    (9)

The missing values of x are filled with the values of ŷ^{ω_g} in the same dimensions. By doing this, we get the edited pattern x^{ω_g} according to the training class ω_g. Then x^{ω_g} is simply classified based only on the training data of ω_g, similarly to the direct classification of the incomplete pattern using eq. (3) in Step 1, for convenience. (Of course, other more sophisticated classifiers could also be applied here according to the choice of the user, but the choice of the classifier is not the main purpose of this work.)
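Here is a sketch of this class-conditional imputation, eqs. (7)-(9), with W_g standing for the SOM weighting vectors of one class (e.g., as returned by the train_som sketch above); the function name and the precomputed lam argument (the λ of eq. (8)) are our own conventions.

```python
import numpy as np

def impute_from_class(x, W_g, K=4, lam=0.1):
    """Estimate the missing values of x from one class's SOM vectors, eqs. (7)-(9).

    x:   length-p vector with np.nan at the missing attributes.
    W_g: (M*N, p) SOM weighting vectors of class omega_g.
    lam: lambda of eq. (8), precomputed from the average pairwise distance
         between all SOM vectors (the 0.1 default is purely illustrative).
    Returns (edited pattern, rho_g), rho_g being the sum of the K weights,
    reused later as the reliability factor of eq. (10).
    """
    avail = ~np.isnan(x)
    d = np.sqrt(((W_g[:, avail] - x[avail]) ** 2).sum(axis=1))   # ignore missing
    nearest = np.argsort(d)[:K]                                  # K nearest vectors
    p_k = np.exp(-lam * d[nearest])                              # weights, eq. (7)
    y_hat = (p_k[:, None] * W_g[nearest]).sum(axis=0) / p_k.sum()  # eq. (9)
    return np.where(avail, x, y_hat), p_k.sum()                  # fill gaps only
```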

The classification of x with the estimated missing values is also respectively done based on each of the other training classes according to this procedure. For a c-class problem, there are c training classes, and therefore one gets c pieces of classification results for one object.

2) Ensemble classifier for the credal classification: These c pieces of results, obtained from the different classes of training data, are considered with different weights, since the estimations of the missing values according to the different classes have different reliabilities. The weighting factor of the classification result associated with the class ω_g is defined by the sum of the weights of the K SOM weighting vectors selected for the imputation of the missing values in ω_g:

\rho^{\omega_g} = \sum_{k=1}^{K} p_k^{\omega_g}    (10)

The result with the biggest weighting factor ρ^{ω_max} is considered as the most reliable, because one assumes that the object must belong to one of the labeled classes (i.e., ω_g, g = 1, ..., c). So the biggest weighting factor is normalized to one, and the other relative weighting factors are defined by:

\hat{\alpha}^{\omega_g} = \frac{\rho^{\omega_g}}{\rho^{\omega_{max}}}    (11)

If the condition α̂^{ω_g} < ε is satisfied (the threshold ε is the same as in Section III-A, because it is also used here to measure a degree of distinguishability), the corresponding estimation of the missing values and the associated classification result are not very reliable: very likely, the object does not belong to this class (it is implicitly assumed that the object can belong to only one class in reality). If such a result, whose relative weighting factor is very small with respect to ε, were still considered useful, it would be (more or less) harmful for the final classification of the object. So, if the condition α̂^{ω_g} < ε holds, the relative weighting factor is set to zero. More precisely, we take

\alpha^{\omega_g} =
\begin{cases}
0, & \text{if } \hat{\alpha}^{\omega_g} < \varepsilon \\
\rho^{\omega_g} / \rho^{\omega_{max}}, & \text{otherwise}
\end{cases}    (12)

After the estimation of the weighting (discounting) factors α^{ω_g}, the c classification results (the BBAs m_{o_g}(.)) are classically discounted [16] by

\hat{m}_{o_g}(\omega_g) = \alpha^{\omega_g} m_{o_g}(\omega_g)
\hat{m}_{o_g}(\Omega) = 1 - \alpha^{\omega_g} + \alpha^{\omega_g} m_{o_g}(\Omega)    (13)

These discounted BBAs are then globally combined to get the credal classification result. If α^{ω_g} = 0, one gets m̂_{o_g}(Ω) = 1, and this fully ignorant (vacuous) BBA plays a neutral role in the global fusion process for the final classification of the object.
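A small sketch of this reliability-based discounting, eqs. (10)-(13); the dict interface and the function name discount_bbas are assumptions of ours, and rho_g is the weight sum returned by the impute_from_class sketch above.

```python
def discount_bbas(bbas, rhos, eps=0.3):
    """Discount the c class-conditional BBAs by their reliability, eqs. (10)-(13).

    bbas: dict g -> (m_singleton, m_Omega), computed on the edited patterns.
    rhos: dict g -> rho_g, the sum of the K imputation weights, eq. (10).
    Relative factors below eps are forced to zero (vacuous BBA), eq. (12).
    """
    rho_max = max(rhos.values())
    discounted = {}
    for g, (m_g, m_ign) in bbas.items():
        alpha = rhos[g] / rho_max                    # eq. (11)
        if alpha < eps:                              # unreliable estimation
            alpha = 0.0                              # eq. (12)
        discounted[g] = (alpha * m_g,                # eq. (13)
                         1.0 - alpha + alpha * m_ign)
    return discounted
```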

Although we do our best to estimate the missing values, the estimation can remain quite imprecise when the estimations obtained from different classes have similar weighting factors, and the different estimations may then lead to distinct classification results. In such a case, we prefer to cautiously keep (rather than ignore) the uncertainty, and to maintain it in the classification result. This uncertainty is well reflected by the conflict between the classification results represented by the BBAs. The DS rule is not suitable here, because it redistributes all the conflicting beliefs to the other focal elements. A particular combination rule inspired by the DP rule is therefore introduced to fuse these BBAs in the present context. In our new rule, the partial conflicting beliefs are prudently transferred to the proper meta-classes, to reveal the degree of imprecision of the classification caused by the missing values. This new rule of combination is defined by:

m(\omega_g) = \hat{m}_{o_g}(\omega_g) \prod_{j \neq g} \hat{m}_{o_j}(\Omega)
m(A) = \prod_{\omega_j \subseteq A} \hat{m}_{o_j}(\omega_j) \prod_{\omega_k \not\subseteq A} \hat{m}_{o_k}(\Omega), \quad A = \bigcup_j \omega_j    (14)

The test pattern is then classified according to the fusion result, and the object is considered as belonging to the class (singleton class or meta-class) with the maximum mass of belief. This is called hard credal classification. If an object is classified into a particular singleton class, it means that this object has been correctly handled with the proper imputation of its missing values. If an object is committed to a meta-class (e.g., A ∪ B), it means that we just know that this object belongs to one of the specific classes (e.g., A or B) included in the meta-class, but we cannot specify which one. This happens when the missing values are essential for the accurate classification of the object but cannot be estimated very well according to the context, so that different estimations drive the classification of the object into distinct classes (e.g., A or B).
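A sketch of this DP-inspired fusion for the c two-focal-element discounted BBAs: each source contributes either its singleton or Ω, and the union of the contributed singletons receives the product mass, so partial conflicts land in meta-classes instead of being renormalized away as in the DS rule. The exhaustive 2^c enumeration and the function name are our own illustrative choices (fine for small c).

```python
from itertools import product

def dp_like_fusion(discounted):
    """Fuse the discounted two-focal BBAs, keeping conflict in meta-classes (eq. (14)).

    discounted: dict g -> (m_singleton, m_Omega), one BBA per class.
    Returns a BBA over frozensets: singletons, meta-classes and Omega.
    """
    labels = list(discounted)
    Omega = frozenset(labels)
    fused = {}
    # enumerate every way the c sources contribute singleton (True) or Omega
    for choice in product([True, False], repeat=len(labels)):
        mass, singletons = 1.0, set()
        for g, takes_singleton in zip(labels, choice):
            m_g, m_ign = discounted[g]
            mass *= m_g if takes_singleton else m_ign
            if takes_singleton:
                singletons.add(g)
        A = frozenset(singletons) if singletons else Omega
        fused[A] = fused.get(A, 0.0) + mass
    return fused

# hard credal classification: pick the focal element with the maximal mass;
# a winning meta-class reveals that the imputation could not separate its classes:
# decision = max(fused, key=fused.get)
```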

For convenience, Fig. 1 shows the functional flowchart of this new CCAI method.

Figure 1. Flowchart of the proposed CCAI method.

Guideline for the tuning of the parameters ε and η: The tuning of the parameters η and ε is very important for the application of CCAI. η, in eq. (3), is associated with the calculation of the mass of belief on the specific class, and a bigger η value leads to a smaller mass of belief committed to the specific class. Based on our various tests, we advise to take η ∈ [0.5, 0.8], and the value η = 0.7 can be taken as the default value. The parameter ε is the threshold that triggers the change of classification strategy; it is also used in eq. (12) for the calculation of the discounting factors. A bigger ε makes fewer objects go through the sophisticated classification procedure with imputation of the missing values, and it also forces more discounting factors to zero according to eq. (12), which implies that fewer of the simple classification results obtained from each class remain useful in the global fusion step. So a bigger ε commits fewer objects to the meta-classes (corresponding to a lower imprecision of the classification), but it increases the risk of misclassification errors. ε should therefore be tuned according to the compromise one can accept between the misclassification error and the imprecision (non-specificity) of the classification decision. One can also apply cross validation [44] (e.g., the leave-one-out method) in the training data space to find a suitable threshold, with the missing values of the test samples randomly distributed over all the dimensions.

IV. EXPERIMENTS

Three experiments with artificial and real data sets are used to test the performance of the new CCAI method, compared with the K-NN imputation (KNNI) method [12], the FCM imputation (FCMI) method [13], [14], the SOM imputation (SOMI) method [15] and our previous credal classification method PCC [25]. The SOM technique is also employed in the second step of CCAI, but CCAI differs from the previous SOMI method. In SOMI, a SOM is fitted to the whole training data set, and the missing values are precisely estimated based on an activation group composed of the best match node (unit) of the input pattern and its close neighbors; the edited pattern with the imputed missing values is then classified by a standard classifier. By contrast, SOM is not involved in the first step of CCAI, where the object is directly classified ignoring the missing values; in the second step of CCAI, a SOM is respectively fitted to each training class, and multiple estimations of the missing values are obtained based on the input pattern's K nearest weighting vectors (corresponding to nodes of the SOM) in each class. Different classification results are then produced according to the different estimations, and these results are globally fused for the final classification. The conflicting information committed to the meta-classes is kept in the fusion to characterize the imprecision of the classification in CCAI, which cannot be done in SOMI. These different methods have been programmed and tested with MATLAB software. The evidential neural network classifier (ENN) [27] is adopted in the following experiments to classify the edited patterns with the estimated values in PCC, KNNI and FCMI, since ENN generally produces good classification results. (Other traditional classifiers for complete patterns could also be selected here, according to the actual application.) The evidential K-nearest neighbor (EK-NN) method [21] is also used to classify the edited patterns in Experiment 3, with real data, for comparison. The parameters of ENN and EK-NN can be automatically optimized as explained in [27] and [22]. In SOMI, we use M × N = 6 × 8 nodes for mapping the whole input data set, consisting of all the training classes, onto the 2-dimensional grid, which gives good performance. In the applications of PCC, the tuning parameter ε can be adjusted according to the imprecision rate one can accept. In CCAI, a small number of nodes is used for the 2-dimensional SOM grid of each class, M × N = 3 × 4, and we take K = N = 4 in the K-NN imputation of the missing values; this provides good results in the following experiments. In order to show the ability of CCAI and PCC to deal with the meta-classes, the hard credal classification is applied, and the class of each object is decided according to the criterion of the maximal mass of belief.

In our simulations, a misclassification is declared (counted) for an object truly originating from ω_i if it is classified into A with ω_i ∩ A = ∅. If ω_i ∩ A ≠ ∅ and A ≠ ω_i, it is considered as an imprecise classification. The error rate, denoted Re, is calculated by Re = Ne/T, where Ne is the number of misclassification errors and T is the number of objects under test. The imprecision rate, denoted Rj, is calculated by Rj = Nj/T, where Nj is the number of objects committed to meta-classes of cardinality j. In our experiments, the classification of an object is generally uncertain (imprecise) only among a very small number (e.g., 2) of classes, and we only report R2 here, since no object is committed to a meta-class including three or more specific classes.

A. Experiment 1 (artificial data set)

In this first experiment, we show the interest of the credal classification based on belief functions with respect to the traditional classification working within the probability framework. A 3-class data set over Ω = {ω_1, ω_2, ω_3}, drawn from three 2-D uniform distributions and shown in Fig. 2, is considered here. Each class has 200 training samples and 200 test samples, so there are 600 training samples and 600 test samples in total. The uniform distributions of the three classes are characterized by the following interval bounds:

       x interval    y interval
ω_1    (5, 65)       (5, 25)
ω_2    (95, 155)     (5, 25)
ω_3    (50, 110)     (50, 70)

The values of the second dimension (the y-coordinate) of the test samples are all missing, so the test samples are classified according to the single available value of the first dimension (the x-coordinate). Several different methods, namely FCMI, KNNI and SOMI, are applied here for comparison with CCAI, as shown in Fig. 3-(a) to 3-(f). In particular, the classification results obtained using a single (first or second) step of CCAI, denoted SCCAI, are also given, in Fig. 3-(d) and 3-(e). With only the first step of CCAI, the direct classification is done without imputation of the missing value, whereas with only the second step of CCAI the objects are classified after imputation of the missing values in all the incomplete patterns. A particular value K = 9 is selected in the K-NN imputation method (in fact, choices of K ranging from 7 to 15 do not seriously affect the results). For notation conciseness, we denote ω_i^te ≜ ω_i^test, ω_i^tr ≜ ω_i^training, and ω_{i,...,k} ≜ ω_i ∪ ... ∪ ω_k. The error rate (in %), the imprecision rate (in %) and the computation time (sec.) are specified in the caption of each subfigure.
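The error and imprecision scores defined at the beginning of this section translate directly into code; this small helper, with our own naming, assumes hard credal decisions represented as frozensets (singletons or pairs), as in the fusion sketches above.

```python
def credal_scores(decisions, truth):
    """Error rate Re and imprecision rate R2 for hard credal decisions.

    decisions: list of frozensets (a singleton class or a meta-class).
    truth:     list of the true class labels.
    """
    T = len(truth)
    Ne = sum(1 for A, w in zip(decisions, truth) if w not in A)           # errors
    N2 = sum(1 for A, w in zip(decisions, truth) if w in A and len(A) == 2)
    return Ne / T, N2 / T
```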

Figure 2. Training data and test data (training and test samples of ω_1, ω_2 and ω_3).

Because the y value of each test sample is missing, the class ω_3 appears partially overlapped with the classes ω_1 and ω_2 on their margins according to the value of the x-coordinate, as shown in Fig. 3-(a). The missing value of the samples lying in the overlapped parts can be filled with quite different estimations obtained from different classes having almost the same reliabilities. For example, the estimation of the missing value of the objects in the right margin of ω_1 and the left margin of ω_3 can be obtained according to the training class ω_1 or ω_3. The edited pattern with the estimation drawn from ω_1 will be classified into the class ω_1, whereas it will be committed to the class ω_3 if the estimation is drawn from ω_3. The situation is similar for the test samples in the left margin of ω_2 and the right margin of ω_3. This indicates that the missing value plays a crucial role in the classification of these objects, but that, unfortunately, the estimation of these missing values is quite uncertain according to the context.

Figure 3. Classification results of the 3-class artificial data set by different methods: (a) FCMI (Re = 14.67); (b) KNNI (Re = 14.17); (c) SOMI (Re = 14.33); (d) first step of SCCAI only (Re = 14.83); (e) second step of SCCAI only (Re = 4.83, R2 = 19.33); (f) CCAI (Re = 5.83, R2 = 16.83).

So these objects are prudently classified into the proper meta-classes (e.g., ω_1 ∪ ω_3 and ω_2 ∪ ω_3) by CCAI. The CCAI results indicate that these objects belong to one of the specific classes included in the meta-classes, but that these specific classes cannot be clearly distinguished based only on the available values.

If one wants more precise and accurate classification results, one needs to request additional sources for gathering more useful information. The other objects, in the left margin of ω_1, the right margin of ω_2 and the middle of ω_3, can be correctly classified based on the single known value of the x-coordinate, and it is not necessary to estimate the missing value for the classification of these objects in CCAI. However, all the test samples are classified into specific classes by the traditional methods KNNI and FCMI, and this causes many errors due to the limitation of the probability framework. If we just apply the first step of SCCAI, without imputation of the missing value, and directly classify all the objects using the only known value (i.e., the x-coordinate), a bigger error rate than with the other methods is produced, which shows that the imputation procedure is important for improving the accuracy of the classification. If only the second step of SCCAI is applied, with imputation of the missing values in all the incomplete patterns, a high imprecision rate is obtained, which is not an efficient solution, and the computation time is much longer than with CCAI. CCAI, with its adaptive imputation strategy, can well balance the error rate, the imprecision rate and the computation burden. CCAI, consisting of the two steps, generally produces a smaller error rate than KNNI, FCMI and SOMI, thanks to the use of the meta-classes. Meanwhile, the computation time of CCAI is similar to that of FCMI, and much shorter than that of KNNI, because of the introduction of the SOM technique in the estimation of the missing values; this shows that the computational complexity of CCAI is relatively low. This simple example illustrates the interest and the potential of the credal classification obtained with the CCAI method.

B. Experiment 2 (artificial data set)

In this second experiment, we evaluate the performance of the CCAI method using a 4-D data set with 3 classes ω_1, ω_2 and ω_3. The artificial data are generated from three 4-D Gaussian distributions characterized by the following mean vectors and covariance matrices (I denotes the 4 × 4 identity matrix):

µ_1 = (10, 50, 100, 100)^T, Σ_1 = 10 I
µ_2 = (30, 40, 50, 90)^T,  Σ_2 = 15 I
µ_3 = (20, 80, 90, 130)^T, Σ_3 = 12 I

We use g training samples and g test samples (for g = 500 and g = 1000) in each class, so there are in total N = 3g training samples and N = 3g test samples. Each test sample has n missing values (for n = 1, 2, 3), randomly distributed over the dimensions. The other methods, KNNI, FCMI, SOMI and PCC, are also applied here for the performance comparison.

For each pair (N, n), the reported error rates, imprecision rates and running times (sec.) are averages over 10 trials performed with 10 independent random generations of the data sets. For KNNI, values of K ranging from 5 to 20 neighbors have been tested, and the mean error rate over K ∈ [5, 20] is given in Table I. In the PCC method, the parameter ε has been optimized to obtain an acceptable compromise between the error rate and the imprecision degree. ENN is adopted to classify the edited patterns with imputed missing values in FCMI, KNNI, SOMI and PCC.

Table I. Classification results for the 3-class data set by different methods (in %).

(N, n)       FCMI Re   KNNI Re   SOMI Re   PCC {Re, R2}    CCAI {Re, R2}
(1500, 1)    6.73      7.42      7.22      {6.20, 2.33}    {4.64, 3.87}
(1500, 2)    14.38     15.68     15.43     {13.47, 5.93}   {9.76, 9.79}
(1500, 3)    36.84     40.11     40.10     {34.57, 7.97}   {29.71, 15.6}
(3000, 1)    6.75      7.54      7.14      {6.17, 1.63}    {4.73, 3.83}
(3000, 2)    14.73     15.80     15.20     {14.00, 1.60}   {9.90, 10.33}
(3000, 3)    36.43     40.48     40.05     {33.94, 8.13}   {29.52, 16.83}

The classification results of the applied methods (i.e., FCMI, KNNI, SOMI, PCC and CCAI) are shown in Table I. Our proposed CCAI method produces the lowest error rate, since the objects that are hard to classify correctly because of the missing values are committed to the proper meta-classes. Meanwhile, CCAI takes the shortest computation time among the compared methods. This is because some incomplete patterns are directly classified ignoring the missing values, considered unimportant for the classification, whereas the missing values of every pattern are imputed by the other methods, which requires more computations and thus increases the computation time. Moreover, one can see that KNNI takes the longest time, which is the main drawback of K-NN based methods. The K-NN strategy is also adopted in CCAI, but we use a few optimized weighting vectors acquired by the SOM technique to represent each whole training class. Thus, we just need to calculate the distances between the object and these weighting vectors, rather than to all the training samples, which greatly reduces the computation burden.

C. Experiment 3 (real data sets)

Nine well-known real data sets available from the UCI Machine Learning Repository [45] are used in this experiment to evaluate the performance of CCAI with respect to KNNI, FCMI, SOMI and PCC. (We select seven classes of the Yeast data set, because the last three classes, i.e., VAC, POX and ERL, contain quite few samples.) Both ENN and EK-NN are employed here as standard classifiers to classify the edited patterns. Moreover, the single (first or second) step procedure of CCAI (SCCAI) is also applied here for comparison.

In the first step of SCCAI, the object is directly classified using only the available attributes, without any imputation procedure, whereas all the missing values are imputed before the classification in the second step of SCCAI. The basic information about the real data sets used here is given in Table II. In the Hepatitis data set, many patterns already contain missing values; the patterns with missing values are taken as test samples, and the others are used as training samples. There are no missing values in the other seven original data sets, and it is assumed that n values are missing completely at random over all the dimensions of each test sample. Cross validation is performed for these seven data sets, and we use the simplest 2-fold cross validation here, since it has the advantage that the training and test sets are both large, and that each sample is used for both training and testing on each fold. (More precisely, the samples of each class are randomly assigned to two sets S_1 and S_2 of equal size; we then train on S_1 and test on S_2, and reciprocally.) The 2-fold cross validation has been repeated 10 times, and the average error rate Re and imprecision rate R2 (for PCC and CCAI) of the different methods are given in Table III. The reported classification result of KNNI is the average over K ranging from 5 to 15. For notation conciseness, the selected classifier (SC) is denoted by A = EK-NN and B = ENN in Table III; for the single-step versions of CCAI (SCCAI), A represents the first step and B the second step.

Table II. Basic information (number of classes, of attributes and of instances) of the data sets used: Breast, Hepatitis, Statlog (Heart), Iris, Seeds, Wine, Knowledge, Vehicle, Yeast.

One can see in Table III that the credal classifications of PCC and CCAI always produce a lower error rate than the traditional FCMI, KNNI and SOMI methods, since the objects that cannot be correctly classified using only the available attribute values are properly committed to meta-classes, which well reveals the imprecision of the classification. The classifiers selected for the classification of the edited patterns in FCMI, KNNI, SOMI and PCC (i.e., EK-NN and ENN) usually show similar performance in many cases (EK-NN sometimes outperforms ENN, and ENN can be better in some other cases), but it is known that K-NN based methods generally carry a big computation burden; the choice between EK-NN and ENN should be made according to the actual conditions of the real application.

In CCAI, some objects are still classified into a meta-class even after the imputation of their missing values. This indicates that these missing values play a crucial role in the classification, but that their estimation is not very good. In other words, the missing values can be filled, with similar reliabilities, by different estimates, which lead to distinct classification results. So we have to cautiously assign these objects to the meta-class to reduce the risk of misclassification. Compared with our previous method PCC, the new method CCAI generally provides better performance, with lower error and imprecision rates, mainly because a more accurate estimation method for the missing values (i.e., SOM+KNN) is adopted in CCAI. However, if only the first step of SCCAI is applied, more misclassification errors are produced than with the other methods, due to the absence of imputation of the missing data; conversely, the imprecision rate becomes quite high if only the second step of SCCAI is adopted, because all the conflicting beliefs arising in the combination procedure are transferred to the meta-classes. So CCAI, with its adaptive imputation of the missing values, provides a good compromise between errors and imprecision. This third experiment, using real data sets, shows the effectiveness and the interest of the new CCAI method with respect to the other methods.

V. CONCLUSION

A new credal classification method with adaptive imputation of missing values (called CCAI) for dealing with incomplete patterns has been presented, based on belief function theory. In the first step of the CCAI method, some objects (incomplete patterns) are directly classified ignoring the missing values, whenever a specific classification result can be obtained; this effectively reduces the computational complexity, because it avoids the imputation of the missing values. However, if the available information is not sufficient to achieve a specific classification of the object in the first step, we estimate (recover) the missing values before entering the classification procedure, in a second step. The SOM and K-NN techniques are applied to estimate the missing attributes with a good compromise between the estimation accuracy and the computation burden. The credal classification developed in this work allows the object to belong to the different singleton classes and meta-classes (i.e., disjunctions of several classes) with different masses of belief. Once an object is committed to a meta-class (e.g., A ∪ B), it means that its missing values cannot be accurately recovered according to the context, and that the estimation is not very good: different estimations lead the object to the distinct classes (e.g., A or B) involved in the meta-class. Some other sources of information will then be required if a more precise classification of the object is necessary. The credal classification is


More information

Statistical pattern recognition

Statistical pattern recognition Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Sampling Theory MODULE VII LECTURE - 23 VARYING PROBABILITY SAMPLING

Sampling Theory MODULE VII LECTURE - 23 VARYING PROBABILITY SAMPLING Samplng heory MODULE VII LECURE - 3 VARYIG PROBABILIY SAMPLIG DR. SHALABH DEPARME OF MAHEMAICS AD SAISICS IDIA ISIUE OF ECHOLOGY KAPUR he smple random samplng scheme provdes a random sample where every

More information

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018 INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Linear Feature Engineering 11

Linear Feature Engineering 11 Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Finite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin

Finite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of

More information

Lecture Notes on Linear Regression

Lecture Notes on Linear Regression Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

Regularized Discriminant Analysis for Face Recognition

Regularized Discriminant Analysis for Face Recognition 1 Regularzed Dscrmnant Analyss for Face Recognton Itz Pma, Mayer Aladem Department of Electrcal and Computer Engneerng, Ben-Guron Unversty of the Negev P.O.Box 653, Beer-Sheva, 845, Israel. Abstract Ths

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

Which Separator? Spring 1

Which Separator? Spring 1 Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal

More information

SDMML HT MSc Problem Sheet 4

SDMML HT MSc Problem Sheet 4 SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be

More information

Report on Image warping

Report on Image warping Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.

More information

Hongyi Miao, College of Science, Nanjing Forestry University, Nanjing ,China. (Received 20 June 2013, accepted 11 March 2014) I)ϕ (k)

Hongyi Miao, College of Science, Nanjing Forestry University, Nanjing ,China. (Received 20 June 2013, accepted 11 March 2014) I)ϕ (k) ISSN 1749-3889 (prnt), 1749-3897 (onlne) Internatonal Journal of Nonlnear Scence Vol.17(2014) No.2,pp.188-192 Modfed Block Jacob-Davdson Method for Solvng Large Sparse Egenproblems Hongy Mao, College of

More information

Affine transformations and convexity

Affine transformations and convexity Affne transformatons and convexty The purpose of ths document s to prove some basc propertes of affne transformatons nvolvng convex sets. Here are a few onlne references for background nformaton: http://math.ucr.edu/

More information

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper Games of Threats Elon Kohlberg Abraham Neyman Workng Paper 18-023 Games of Threats Elon Kohlberg Harvard Busness School Abraham Neyman The Hebrew Unversty of Jerusalem Workng Paper 18-023 Copyrght 2017

More information

Support Vector Machines CS434

Support Vector Machines CS434 Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? Intuton of Margn Consder ponts A, B, and C We

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

CHAPTER III Neural Networks as Associative Memory

CHAPTER III Neural Networks as Associative Memory CHAPTER III Neural Networs as Assocatve Memory Introducton One of the prmary functons of the bran s assocatve memory. We assocate the faces wth names, letters wth sounds, or we can recognze the people

More information

LECTURE 9 CANONICAL CORRELATION ANALYSIS

LECTURE 9 CANONICAL CORRELATION ANALYSIS LECURE 9 CANONICAL CORRELAION ANALYSIS Introducton he concept of canoncal correlaton arses when we want to quantfy the assocatons between two sets of varables. For example, suppose that the frst set of

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

Learning from Data 1 Naive Bayes

Learning from Data 1 Naive Bayes Learnng from Data 1 Nave Bayes Davd Barber dbarber@anc.ed.ac.uk course page : http://anc.ed.ac.uk/ dbarber/lfd1/lfd1.html c Davd Barber 2001, 2002 1 Learnng from Data 1 : c Davd Barber 2001,2002 2 1 Why

More information

Uncertainty as the Overlap of Alternate Conditional Distributions

Uncertainty as the Overlap of Alternate Conditional Distributions Uncertanty as the Overlap of Alternate Condtonal Dstrbutons Olena Babak and Clayton V. Deutsch Centre for Computatonal Geostatstcs Department of Cvl & Envronmental Engneerng Unversty of Alberta An mportant

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

Evaluation for sets of classes

Evaluation for sets of classes Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton

More information

Uncertainty and auto-correlation in. Measurement

Uncertainty and auto-correlation in. Measurement Uncertanty and auto-correlaton n arxv:1707.03276v2 [physcs.data-an] 30 Dec 2017 Measurement Markus Schebl Federal Offce of Metrology and Surveyng (BEV), 1160 Venna, Austra E-mal: markus.schebl@bev.gv.at

More information

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method

More information

3.1 ML and Empirical Distribution

3.1 ML and Empirical Distribution 67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Lecture 4: November 17, Part 1 Single Buffer Management

Lecture 4: November 17, Part 1 Single Buffer Management Lecturer: Ad Rosén Algorthms for the anagement of Networs Fall 2003-2004 Lecture 4: November 7, 2003 Scrbe: Guy Grebla Part Sngle Buffer anagement In the prevous lecture we taled about the Combned Input

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Natural Language Processing and Information Retrieval

Natural Language Processing and Information Retrieval Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support

More information

Spatial Statistics and Analysis Methods (for GEOG 104 class).

Spatial Statistics and Analysis Methods (for GEOG 104 class). Spatal Statstcs and Analyss Methods (for GEOG 104 class). Provded by Dr. An L, San Dego State Unversty. 1 Ponts Types of spatal data Pont pattern analyss (PPA; such as nearest neghbor dstance, quadrat

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm Desgn and Optmzaton of Fuzzy Controller for Inverse Pendulum System Usng Genetc Algorthm H. Mehraban A. Ashoor Unversty of Tehran Unversty of Tehran h.mehraban@ece.ut.ac.r a.ashoor@ece.ut.ac.r Abstract:

More information

Primer on High-Order Moment Estimators

Primer on High-Order Moment Estimators Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc

More information

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013 ISSN: 2277-375 Constructon of Trend Free Run Orders for Orthogonal rrays Usng Codes bstract: Sometmes when the expermental runs are carred out n a tme order sequence, the response can depend on the run

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Hidden Markov Models

Hidden Markov Models CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte

More information

Methods of Combining Multiple Classifiers with Different Features and Their Applications to Text-Independent Speaker Identification

Methods of Combining Multiple Classifiers with Different Features and Their Applications to Text-Independent Speaker Identification Internatonal Journal of Pattern Recognton and Artfcal Intellgence, Vol. 11, No. 3, pp. 417-445, 1997. World Scentfc Publsher Co. Methods of Combnng Multple Classfers wth Dfferent Features and Ther Applcatons

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Why Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one)

Why Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one) Why Bayesan? 3. Bayes and Normal Models Alex M. Martnez alex@ece.osu.edu Handouts Handoutsfor forece ECE874 874Sp Sp007 If all our research (n PR was to dsappear and you could only save one theory, whch

More information

Clustering & Unsupervised Learning

Clustering & Unsupervised Learning Clusterng & Unsupervsed Learnng Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 2012 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y

More information

Lecture 7: Boltzmann distribution & Thermodynamics of mixing

Lecture 7: Boltzmann distribution & Thermodynamics of mixing Prof. Tbbtt Lecture 7 etworks & Gels Lecture 7: Boltzmann dstrbuton & Thermodynamcs of mxng 1 Suggested readng Prof. Mark W. Tbbtt ETH Zürch 13 März 018 Molecular Drvng Forces Dll and Bromberg: Chapters

More information

Outline. Multivariate Parametric Methods. Multivariate Data. Basic Multivariate Statistics. Steven J Zeil

Outline. Multivariate Parametric Methods. Multivariate Data. Basic Multivariate Statistics. Steven J Zeil Outlne Multvarate Parametrc Methods Steven J Zel Old Domnon Unv. Fall 2010 1 Multvarate Data 2 Multvarate ormal Dstrbuton 3 Multvarate Classfcaton Dscrmnants Tunng Complexty Dscrete Features 4 Multvarate

More information

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z ) C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z

More information

Department of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING

Department of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING MACHINE LEANING Vasant Honavar Bonformatcs and Computatonal Bology rogram Center for Computatonal Intellgence, Learnng, & Dscovery Iowa State Unversty honavar@cs.astate.edu www.cs.astate.edu/~honavar/

More information

Multigradient for Neural Networks for Equalizers 1

Multigradient for Neural Networks for Equalizers 1 Multgradent for Neural Netorks for Equalzers 1 Chulhee ee, Jnook Go and Heeyoung Km Department of Electrcal and Electronc Engneerng Yonse Unversty 134 Shnchon-Dong, Seodaemun-Ku, Seoul 1-749, Korea ABSTRACT

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

I529: Machine Learning in Bioinformatics (Spring 2017) Markov Models

I529: Machine Learning in Bioinformatics (Spring 2017) Markov Models I529: Machne Learnng n Bonformatcs (Sprng 217) Markov Models Yuzhen Ye School of Informatcs and Computng Indana Unversty, Bloomngton Sprng 217 Outlne Smple model (frequency & profle) revew Markov chan

More information

Tracking with Kalman Filter

Tracking with Kalman Filter Trackng wth Kalman Flter Scott T. Acton Vrgna Image and Vdeo Analyss (VIVA), Charles L. Brown Department of Electrcal and Computer Engneerng Department of Bomedcal Engneerng Unversty of Vrgna, Charlottesvlle,

More information

Ph 219a/CS 219a. Exercises Due: Wednesday 23 October 2013

Ph 219a/CS 219a. Exercises Due: Wednesday 23 October 2013 1 Ph 219a/CS 219a Exercses Due: Wednesday 23 October 2013 1.1 How far apart are two quantum states? Consder two quantum states descrbed by densty operators ρ and ρ n an N-dmensonal Hlbert space, and consder

More information