An evidence-theoretic k-NN rule with parameter optimization (1)

L. M. Zouhal (†) and T. Denoeux
Université de Technologie de Compiègne - U.M.R. CNRS 6599 Heudiasyc
BP 20529 - F-60205 Compiègne cedex - France
Thierry.Denoeux@hds.utc.fr

(†) Centre International des Techniques Informatiques, Lyonnaise des Eaux

(1) Technical Report Heudiasyc 97/46. To appear in IEEE Transactions on Systems, Man and Cybernetics.

Abstract

This paper presents a learning procedure for optimizing the parameters in the evidence-theoretic k-nearest neighbor rule, a pattern classification method based on the Dempster-Shafer theory of belief functions. In this approach, each neighbor of a pattern to be classified is considered as an item of evidence supporting certain hypotheses concerning the class membership of that pattern. Based on this evidence, basic belief masses are assigned to each subset of the set of classes. Such masses are obtained for each of the k nearest neighbors of the pattern under consideration and aggregated using Dempster's rule of combination. In many situations, this method was found experimentally to yield lower error rates than other methods using the same information. However, the problem of tuning the parameters of the classification rule was so far unresolved. In this paper, we propose to determine optimal or near-optimal parameter values from the data by minimizing an error function. This refinement of the original method is shown experimentally to result in a substantial improvement of classification accuracy.

1 Introduction

In the classical approach to statistical pattern recognition, the entities to be classified are assumed to be selected by some form of random experiment. The feature vector describing each entity is then a random vector with a well-defined (though unknown) probability density function depending on the pattern category. Based on these densities and on the prior probability of each class, posterior probabilities can be defined, and the optimal Bayes decision rule can then theoretically be used for classifying an arbitrary pattern with minimal expected risk. Since the class-conditional densities and prior probabilities are usually unknown, they need to be estimated from the data. A lot of methods have been proposed for building consistent estimators of the posterior probabilities under various assumptions. However, for finite sample size, the resulting estimates generally do not provide a faithful representation of the fundamental uncertainty pertaining to the class of a pattern to be classified. For example, if only a relatively small number of training vectors is available and a new pattern is encountered that is very dissimilar from all previous patterns, the uncertainty is quite high; this situation of near-ignorance is not reflected by the outputs of a conventional parametric or nonparametric statistical classifier, whose principle fundamentally relies on asymptotic assumptions. This problem is particularly acute in situations in which decisions need to be made based on weak information, as commonly encountered in system diagnosis applications, for example.

As an attempt to provide an answer to the above problem, it was recently suggested to re-formulate the pattern classification problem by considering the following question: given a training set of finite size containing feature vectors with known (or partly known) classification, and a suitable distance measure, how can the uncertainty pertaining to the class of a new pattern be characterized? In a recent paper [2], an answer to this question was proposed based on the Dempster-Shafer theory of evidence [12]. The approach consists in considering each neighbor of a pattern to be classified as an item of evidence supporting certain hypotheses concerning the class membership of that pattern. Based on this evidence, basic belief masses are assigned to each subset of the set of classes. Such masses are obtained for each of the k nearest neighbors of the pattern under consideration and aggregated using Dempster's rule of combination. Given a finite set of actions and losses associated to each action and each class, decisions can then be made by using some generalization of the Bayes decision theory. In many situations, this method was found experimentally to yield lower error rates than other methods based on the same information. However, the problem of optimizing the parameters involved in the classification rule was so far unresolved. In this paper, we propose to determine optimal or near-optimal parameter values from the data by minimizing a certain error function. This refinement of the original method is shown experimentally to result in a substantial improvement of classification accuracy.

The paper is organized as follows. The evidence-theoretic k-NN rule is first recalled in Section 2. The basic concepts of the Dempster-Shafer theory are assumed to be known to the reader, who is invited to refer to [12] and [14] for detailed presentations, and to [2] for a short introduction. Section 3 describes the learning procedure, as well as an approximation to it allowing the error function to be near-optimized very efficiently. Simulation results are then presented in Section 4, and Section 5 concludes the paper.

2 The evidence-theoretic k-NN rule

We consider the problem of classifying entities into M categories or classes. The set of classes is denoted by $\Omega = \{\omega_1, \ldots, \omega_M\}$. The available information is assumed to consist in a training set $T = \{(x^{(1)}, \omega^{(1)}), \ldots, (x^{(N)}, \omega^{(N)})\}$ of $N$ $n$-dimensional patterns $x^{(i)}$, $i = 1, \ldots, N$, and their corresponding class labels¹ $\omega^{(i)}$, $i = 1, \ldots, N$, taking values in $\Omega$. The similarity between patterns is assumed to be correctly measured by a certain distance function $d(\cdot, \cdot)$.

Let $x$ be a new vector to be classified on the basis of the information contained in $T$. Each pair $(x^{(i)}, \omega^{(i)})$ constitutes a distinct item of evidence regarding the class membership of $x$. If $x$ is "close" to $x^{(i)}$ according to the relevant metric $d$, then one will be inclined to believe that both vectors belong to the same class. On the contrary, if $d(x, x^{(i)})$ is very large, then the consideration of $x^{(i)}$ will leave us in a situation of almost complete ignorance concerning the class of $x$. Consequently, this item of evidence may be postulated to induce a basic belief assignment (BBA) $m(\cdot|x^{(i)})$ over $\Omega$ defined by:

$m(\{\omega_q\}|x^{(i)}) = \alpha \, \phi_q(d^{(i)})$   (1)
$m(\Omega|x^{(i)}) = 1 - \alpha \, \phi_q(d^{(i)})$   (2)
$m(A|x^{(i)}) = 0 \quad \forall A \in 2^\Omega \setminus \{\Omega, \{\omega_q\}\}$   (3)

where $d^{(i)} = d(x, x^{(i)})$, $\omega_q$ is the class of $x^{(i)}$ ($\omega^{(i)} = \omega_q$), $\alpha$ is a parameter such that $0 < \alpha < 1$, and $\phi_q$ is a decreasing function verifying $\phi_q(0) = 1$ and $\lim_{d \to \infty} \phi_q(d) = 0$. Note that $m(\cdot|x^{(i)})$ reduces to the vacuous belief function ($m(\Omega|x^{(i)}) = 1$) when the distance between $x$ and $x^{(i)}$ tends to infinity, reflecting a state of total ignorance. When $d$ denotes the Euclidean distance, a rational choice for $\phi_q$ was shown in [4] to be:

$\phi_q(d) = \exp(-\gamma_q d^2)$   (4)

$\gamma_q$ being a positive parameter associated to class $\omega_q$.

As a result of considering each training pattern in turn, we obtain $N$ BBAs that can be combined using Dempster's rule of combination to form a resulting BBA $m$ synthesizing one's final belief regarding the class of $x$:

$m = m(\cdot|x^{(1)}) \oplus \ldots \oplus m(\cdot|x^{(N)})$   (5)

¹ In this paper, we assume for simplicity the class of each training vector to be known with certainty. The more general situation in which the training set is only imperfectly labeled has been introduced in [2]. However, the problem of optimizing the parameters in the general case is not completely solved yet (see Section 5).
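
To make Equations (1)-(4) concrete, here is a minimal Python sketch (illustrative code of our own, not from the report; the array layout and function names are assumptions) computing the BBA induced by a single neighbor, assuming the Euclidean distance and the exponential choice of $\phi_q$:

```python
import numpy as np

def neighbor_bba(x, x_i, label_i, alpha, gamma, M):
    """BBA induced by one training pattern (Eqs. 1-4).

    Returns a vector of length M+1: the first M entries are the masses
    m({omega_q} | x_i) (all zero except for the neighbor's own class), and
    the last entry is the mass m(Omega | x_i) assigned to the whole frame.
    """
    d2 = np.sum((np.asarray(x) - np.asarray(x_i)) ** 2)   # squared Euclidean distance
    phi = np.exp(-gamma[label_i] * d2)                     # phi_q(d) = exp(-gamma_q d^2)
    m = np.zeros(M + 1)
    m[label_i] = alpha * phi        # mass on the singleton {omega_q}
    m[M] = 1.0 - alpha * phi        # remaining mass on Omega (ignorance)
    return m

# Example: one neighbor of class 0, with alpha = 0.95 and gamma = (0.5, 0.5)
m = neighbor_bba([0.0, 0.0], [1.0, 1.0], 0, 0.95, np.array([0.5, 0.5]), M=2)
print(m)   # approximately [0.349, 0., 0.651]
```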

Since those training patterns situated far from $x$ actually provide very little information, it is sufficient to consider the $k$ nearest neighbors of $x$ in this sum. An alternative definition of $m$ is therefore:

$m = m(\cdot|x^{(i_1)}) \oplus \ldots \oplus m(\cdot|x^{(i_k)})$   (6)

where $I_k = \{i_1, \ldots, i_k\}$ contains the indices of the $k$ nearest neighbors of $x$ in $T$. Adopting this latter definition, $m$ can be shown [2] to have the following expression:

$m(\{\omega_q\}) = \frac{1}{K} \left( 1 - \prod_{i \in I_{k,q}} (1 - \alpha \phi_q(d^{(i)})) \right) \prod_{r \neq q} \prod_{i \in I_{k,r}} (1 - \alpha \phi_r(d^{(i)}))$   (7)

$m(\Omega) = \frac{1}{K} \prod_{r=1}^{M} \prod_{i \in I_{k,r}} (1 - \alpha \phi_r(d^{(i)}))$   (8)

for all $q \in \{1, \ldots, M\}$, where $I_{k,q}$ is the subset of $I_k$ corresponding to those neighbors of $x$ belonging to class $\omega_q$, and $K$ is a normalizing factor. Hence, the focal elements of $m$ are singletons and the whole frame $\Omega$. Consequently, the credibility and the plausibility of each class $\omega_q$ are respectively equal to:

$\mathrm{bel}(\{\omega_q\}) = m(\{\omega_q\})$   (9)
$\mathrm{pl}(\{\omega_q\}) = m(\{\omega_q\}) + m(\Omega)$   (10)

The pignistic probability distribution as defined by Smets [13] is given by:

$\mathrm{BetP}(\{\omega_q\}) = \sum_{\{A \,|\, \omega_q \in A\}} \frac{m(A)}{|A|} = m(\{\omega_q\}) + \frac{m(\Omega)}{M}$   (11)

for $q = 1, \ldots, M$.

Let us now assume that, based on this evidential corpus, a decision has to be made regarding the assignment of $x$ to a class, and let us denote by $a_q$ the action of assigning $x$ to class $\omega_q$. Let us further assume that the loss incurred in case of a wrong classification is equal to 1, while the loss corresponding to a correct classification is equal to 0. Then, the lower and the upper expected losses [3] associated to action $a_q$ are respectively equal to:

$R_*(a_q|x) = 1 - \mathrm{pl}(\{\omega_q\})$   (12)
$R^*(a_q|x) = 1 - \mathrm{bel}(\{\omega_q\})$   (13)

The expected loss relative to the pignistic distribution is:

$R_{\mathrm{bet}}(a_q|x) = 1 - \mathrm{BetP}(\{\omega_q\})$   (14)

Given the particular form of $m$, the three strategies consisting in minimizing $R_*$, $R^*$ and $R_{\mathrm{bet}}$ lead to the same decision in that case: the pattern is assigned to the class with maximum belief assignment. Other decision strategies, including the possibility of pattern rejection as well as the existence of unknown classes, are studied in [3, 5].
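
The combined mass function of Equations (7)-(8) and the pignistic probabilities of Equation (11) are straightforward to compute once the k nearest neighbors are known. The self-contained sketch below (our own illustrative code; variable names are not from the report) classifies one pattern by maximum belief assignment:

```python
import numpy as np

def etknn_classify(x, X_train, y_train, alpha, gamma, k):
    """Evidence-theoretic k-NN rule: combine the k neighbor BBAs (Eqs. 7-8),
    compute pignistic probabilities (Eq. 11) and return the predicted class."""
    X_train = np.asarray(X_train, float)
    M = len(gamma)
    d2 = np.sum((X_train - np.asarray(x, float)) ** 2, axis=1)   # squared distances
    nn = np.argsort(d2)[:k]                                      # k nearest neighbors

    # Product form of Eqs. (7)-(8): prod_q = prod over neighbors of class q
    # of (1 - alpha * phi_q(d)).
    prod_q = np.ones(M)
    for i in nn:
        q = y_train[i]
        prod_q[q] *= 1.0 - alpha * np.exp(-gamma[q] * d2[i])
    m_omega = np.prod(prod_q)                       # unnormalized m(Omega)
    m_single = (1.0 - prod_q) * m_omega / prod_q    # unnormalized m({omega_q})
    K = m_single.sum() + m_omega                    # normalizing factor
    m_single, m_omega = m_single / K, m_omega / K

    betp = m_single + m_omega / M                   # pignistic probabilities (Eq. 11)
    return int(np.argmax(m_single)), betp

# Toy usage with two classes in the plane
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [0, 0, 0, 1, 1, 1]
label, betp = etknn_classify([0.5, 0.5], X, y, alpha=0.95, gamma=np.ones(2), k=3)
print(label, betp)
```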

3 Parameter optimization

3.1 The approach

In the above description of the evidence-theoretic k-NN rule, we left open the question of the choice of the parameters $\alpha$ and $\gamma = (\gamma_1, \ldots, \gamma_M)^t$ appearing in Equations (1) and (4). Whereas the value of $\alpha$ proves in practice not to be too critical, the tuning of the other parameters was found experimentally to have a significant influence on classification accuracy. In [2], it was proposed to set $\alpha = 0.95$ and $\gamma_q$ to the inverse of the mean distance between training patterns belonging to class $\omega_q$. Although this heuristic yields good results on average, the efficiency of the classification procedure can be improved if these parameters are determined as the values optimizing a performance criterion. Such a criterion can be defined as follows.

Let us consider a training pattern $x^{(\ell)}$ belonging to class $\omega_q$. The class membership of $x^{(\ell)}$ can be encoded as a vector $t^{(\ell)} = (t^{(\ell)}_1, \ldots, t^{(\ell)}_M)^t$ of $M$ binary indicator variables $t^{(\ell)}_j$ defined by $t^{(\ell)}_j = 1$ if $j = q$ and $t^{(\ell)}_j = 0$ otherwise. By considering the $k$ nearest neighbors of $x^{(\ell)}$ in the training set, one obtains a "leave-one-out" BBA $m^{(\ell)}$ characterizing one's belief concerning the class of $x^{(\ell)}$ if this pattern were to be classified using the other training patterns. Based on $m^{(\ell)}$, an output vector $P^{(\ell)} = (\mathrm{BetP}^{(\ell)}(\{\omega_1\}), \ldots, \mathrm{BetP}^{(\ell)}(\{\omega_M\}))^t$ of pignistic probabilities can be computed, $\mathrm{BetP}^{(\ell)}$ being the pignistic probability distribution associated to $m^{(\ell)}$. Ideally, vector $P^{(\ell)}$ should be as "close" as possible to vector $t^{(\ell)}$, closeness being defined, for example, according to the squared error $E(x^{(\ell)})$:

$E(x^{(\ell)}) = (P^{(\ell)} - t^{(\ell)})^t (P^{(\ell)} - t^{(\ell)}) = \sum_{q=1}^{M} (P^{(\ell)}_q - t^{(\ell)}_q)^2$   (15)

The mean squared error over the whole training set $T$ of size $N$ is finally equal to:

$E = \frac{1}{N} \sum_{\ell=1}^{N} E(x^{(\ell)})$   (16)

Function $E$ can be used as a cost function for tuning the parameter vector $\gamma$. The analytical expression for the gradient of $E(x^{(\ell)})$ with respect to $\gamma$ can be calculated, allowing the parameters to be determined iteratively by a gradient search procedure (see Appendix A). Alternatively, the minimum of function $E$ can be approximated in one step for large $N$ using the approach described in the sequel.

3.2 One-step procedure

For an arbitrary training pattern $x^{(\ell)}$ and fixed parameters, vector $P^{(\ell)}$ can be regarded as a function of two vectors:

1. a vector $d^2 = (d^{(i_1)2}, \ldots, d^{(i_k)2})^t$ of squared distances between $x^{(\ell)}$ and its $k$ nearest neighbors, and

2. a vector $\omega$ containing the class labels of these neighbors.

For small $k$ and large $N$, $d^2$ can be assumed to be close to zero², allowing each component $P^{(\ell)}_q$ to be approximated by a Taylor series expansion around 0 up to the first order:

$P^{(\ell)}_q \approx P^{(\ell)}_q(d^2) = P^{(\ell)}_q(0) + \left( \nabla_{d^2} P^{(\ell)}_q \Big|_{d^2=0} \right)^t d^2$   (17)

The first term in this expression can be readily obtained from Equations (7) and (8) by setting $d^{(i)}$ to 0 for all $i$, which leads to:

$P^{(\ell)}_q(0) = \frac{(1-\alpha)^k}{K} \left( (1-\alpha)^{-k_q} - 1 + \frac{1}{M} \right)$   (18)

where $k_q$ is the number of neighbors of $x^{(\ell)}$ in class $\omega_q$ and

$K = (1-\alpha)^k \left( \sum_{q=1}^{M} (1-\alpha)^{-k_q} - M + 1 \right)$   (19)

The computation of the first-order term

$\left( \nabla_{d^2} P^{(\ell)}_q \Big|_{d^2=0} \right)^t d^2 = \sum_{i \in I_k} \left. \frac{\partial P^{(\ell)}_q}{\partial d^{(i)2}} \right|_{d^2=0} d^{(i)2}$   (20)

is more involved (see Appendix B). This term can be shown to be of the form $A^{(\ell)} \gamma$, where $A^{(\ell)}$ is a square matrix of size $M$. As a consequence, both $E(x^{(\ell)})$ and $E$ can be approximated by quadratic forms of $\gamma$, which allows the minimum to be approached directly by solving a system of linear equations.

Figures 1 and 2 show the quality of this approximation in the case of two Gaussian classes with mean vectors $\mu_1 = (2, 0)^t$ and $\mu_2 = (-2, 0)^t$, respectively, and covariance matrices $\Sigma_1 = \Sigma_2 = I$. The data set contained 500 samples of each class. Displayed are the mean squared error as a function of $(\gamma_1, \gamma_2)$ (Figure 1) and its quadratic approximation (Figure 2), for $k = 5$. The minima of the two functions differ only very slightly, which proves the relevance of the approximation in that case. Note that the quality of the approximation depends on both $k$ and $N$.

² This assumption is justified by the following result [1]: regarding the training set as a sample drawn from some probability distribution, the k-th nearest neighbor of $x^{(\ell)}$ converges to $x^{(\ell)}$ with probability one as the sample size increases with $k$ fixed.
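
As an illustration of the criterion in Equations (15)-(16), the following sketch (again our own illustrative code, not the authors' implementation) evaluates the leave-one-out mean squared error on the pignistic outputs and tunes gamma with a general-purpose derivative-free minimizer as a stand-in for the gradient procedure of Appendix A; the nonnegativity of the rates is simply enforced by taking absolute values:

```python
import numpy as np
from scipy.optimize import minimize

def betp_loo(X, y, alpha, gamma, k):
    """Leave-one-out pignistic probabilities P^(l) for every training pattern
    (Eqs. 7, 8 and 11, with the pattern itself excluded from its neighbors)."""
    X, y = np.asarray(X, float), np.asarray(y)
    N, M = len(y), len(gamma)
    P = np.empty((N, M))
    for l in range(N):
        d2 = np.sum((X - X[l]) ** 2, axis=1)
        d2[l] = np.inf                                  # exclude the pattern itself
        nn = np.argsort(d2)[:k]
        prod = np.ones(M)
        for i in nn:
            prod[y[i]] *= 1.0 - alpha * np.exp(-gamma[y[i]] * d2[i])
        m_omega = np.prod(prod)
        m_single = (1.0 - prod) * m_omega / prod
        K = m_single.sum() + m_omega
        P[l] = (m_single + m_omega / M) / K
    return P

def mse(gamma, X, y, alpha, k, M):
    """Mean squared error E of Eq. (16) between pignistic outputs and 0/1 targets."""
    T = np.eye(M)[np.asarray(y)]                        # indicator vectors t^(l)
    P = betp_loo(X, y, alpha, np.abs(gamma), k)         # |gamma| keeps the rates nonnegative
    return np.mean(np.sum((P - T) ** 2, axis=1))

# Toy usage: tune gamma on a small two-class sample (initialized at 1)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
res = minimize(mse, x0=np.ones(2), args=(X, y, 0.95, 5, 2), method="Nelder-Mead")
print("tuned gamma:", np.abs(res.x), "final error:", res.fun)
```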

4 Numerical experiments

The performances of the above methods were compared to those of the voting k-NN rule with randomly resolved ties, the distance-weighted k-NN rule [6], the fuzzy k-NN rule proposed by Keller [9], and the evidence-theoretic rule without parameter optimization [2]. Experiments were carried out on a set of standard artificial and real-world benchmark classification tasks. The main characteristics of the data sets used are summarized in Table 1.

Data sets B1 and B2 were generated using a method proposed in [7]. The data consist of three Gaussian classes in 10 dimensions, with diagonal covariance matrices $D_1$, $D_2$ and $D_3$. The $i$-th diagonal element $D_{qi}$ of $D_q$ is defined as a function of two parameters $a$ and $b$:

$D_{1i}(a, b) = a + (b - a)\,\frac{i - 1}{n - 1}$
$D_{2i}(a, b) = a + (b - a)\,\frac{n - i}{n - 1}$
$D_{3i}(a, b) = a + (b - a)\,\frac{\min(i, n - i)}{n/2}$

where $n = 10$ is the input dimension. The mean vectors and covariance matrices (with $n = 10$) for the three classes were:

$\mu_1 = (0, \ldots, 0) \qquad \mu_2 = (1, \ldots, 1) \qquad \mu_3 = (1, -1, \ldots, 1, -1)$
$D_1(1, 10) \qquad D_2(10, 1) \qquad D_3(1, 10)$

The ionosphere data set (Ion) was collected by a radar system consisting of a phased array of 16 high-frequency antennas with a total transmitted power of the order of 6.4 kilowatts [11]. The targets were free electrons in the ionosphere. "Good" radar returns are those showing evidence of some type of structure in the ionosphere; "bad" returns are those that do not.

The vehicle data set (Veh) was collected from silhouettes by the HIPS (Hierarchical Image Processing System) extension BINATTS. Four model vehicles were used for the experiment: a bus, a Chevrolet van, a Saab 9000 and an Opel Manta 400. The data were used to distinguish 3D objects within a 2D silhouette of the objects [11].

The sonar data (Son) were used by Gorman and Sejnowski in a study of the classification of sonar signals using a neural network [8]. The task is to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock.
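
For concreteness, here is a small sketch (our own code, with assumed helper names) of how the diagonal covariance structure described above for B1 and B2 can be built and sampled:

```python
import numpy as np

def diag_cov(kind, a, b, n=10):
    """Diagonal elements of D_1, D_2 or D_3 as defined above (kind in {1, 2, 3})."""
    i = np.arange(1, n + 1)
    if kind == 1:
        d = a + (b - a) * (i - 1) / (n - 1)
    elif kind == 2:
        d = a + (b - a) * (n - i) / (n - 1)
    else:
        d = a + (b - a) * np.minimum(i, n - i) / (n / 2)
    return np.diag(d)

def sample_b_data(n_per_class, rng):
    """Draw one synthetic data set with the three Gaussian classes of B1/B2."""
    n = 10
    means = [np.zeros(n), np.ones(n), np.tile([1.0, -1.0], n // 2)]
    covs = [diag_cov(1, 1, 10), diag_cov(2, 10, 1), diag_cov(3, 1, 10)]
    X = np.vstack([rng.multivariate_normal(m, c, n_per_class) for m, c in zip(means, covs)])
    y = np.repeat([0, 1, 2], n_per_class)
    return X, y

X, y = sample_b_data(60, np.random.default_rng(0))
print(X.shape, np.bincount(y))   # (180, 10) [60 60 60]
```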

Test error rates are represented as a function of $k$ in Figures 3 to 12. The results for synthetic data are averages over 6 independent training sets. Test error rates obtained for the values of $k$ yielding the best classification of training vectors are presented in Table 2. As can be seen from these results, the evidence-theoretic rule with optimized $\gamma$ presented in this paper always performed as well as or better than the four other rules tested, and significantly improved at the 90 % confidence level over the evidence-theoretic rule with fixed $\gamma$ on data sets B1 and B2 when considering the best results obtained for $1 \le k \le 40$. However, the most distinctive feature of this rule seems to be its robustness with respect to the number $k$ of neighbors taken into consideration (Figures 3 to 12). By optimizing $\gamma$, the method learns to discard those neighbors whose distance to the pattern under consideration is too high. Practically, this property is of great importance since it relieves the designer of the system from the burden of searching for the optimal value of $k$. When the number of training patterns is large, the amount of computation may be further reduced by adopting the approximate one-step procedure for optimizing $\gamma$, which gives reasonably good results for small $k$ (Figures 3, 5, 7, 9 and 11). However, the use of the exact procedure should be preferred for small and medium-sized training sets.

5 Concluding remarks

A technique for optimizing the parameters in the evidence-theoretic k-NN rule has been presented. The classification rule obtained by this method has proved superior to the voting, distance-weighted and fuzzy rules on a number of benchmark problems. A remarkable property achieved with this approach is the relative insensitivity of the results to the choice of $k$.

The method can be generalized in several ways. First of all, one can assume a more general metric than the Euclidean one considered so far, and apply the principles described in this paper to search for the optimal metric [15]. For instance, let $\Lambda_q$ be a positive definite diagonal matrix with diagonal elements $\lambda_{q1}, \ldots, \lambda_{qn}$. The distance between an input vector $x$ and a learning vector $x^{(i)}$ belonging to class $\omega_q$ can be defined as:

$d(x, x^{(i)}) = (x - x^{(i)})^t \Lambda_q (x - x^{(i)}) = \sum_{j=1}^{n} \lambda_{qj} (x_j - x^{(i)}_j)^2$

The parameters $\lambda_{qj}$ for $1 \le q \le M$ and $1 \le j \le n$ can then be optimized using exactly the same approach as described in this paper, which may in some cases result in further improvement of classification results. A more general form could even be assumed for $\Lambda_q$, with however the risk of a dramatic increase in the number of parameters for large input dimensions.

More fundamentally, the method can also be extended to handle the more general situation in which the class membership of training patterns is itself affected by uncertainty. For example, let us assume that the class of each training pattern $x^{(i)}$ is only known to lie in a subset $A^{(i)}$ of $\Omega$ (such a situation may typically arise, e.g., in medical diagnosis problems in which some records in a database are related to patients for whom only a partial or uncertain diagnosis is available). A natural extension of Equations (1)-(3) is then:

$m(A^{(i)}|x^{(i)}) = \alpha \, \phi(d^{(i)})$
$m(\Omega|x^{(i)}) = 1 - \alpha \, \phi(d^{(i)})$
$m(B|x^{(i)}) = 0 \quad \forall B \in 2^\Omega \setminus \{\Omega, A^{(i)}\}$

with $\phi(d) = \exp(-\gamma d^2)$, $\gamma$ being a positive parameter (note that we cannot define a separate parameter for each class in this case, since the class of $x^{(i)}$ is only partially known). The BBAs defined in that way correspond to simple belief functions and can be combined in linear time with respect to the number of classes. For optimizing $\gamma$, the error criterion defined in Equation (15) has to be generalized in some way. With the same notations as in Section 3.1, a possible expression for the error concerning pattern $x^{(\ell)}$ is:

$E(x^{(\ell)}) = \left( \mathrm{BetP}^{(\ell)}(A^{(\ell)}) - 1 \right)^2$

which reflects the fact that the pignistic probability of $x^{(\ell)}$ belonging to $A^{(\ell)}$, given the other training patterns, should be as high as possible. The value of $\gamma$ minimizing the mean error may then be determined using an iterative search procedure. Experiments with this approach are currently under way and will be reported in future publications.
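
As a small illustration of the class-dependent weighted metric sketched above (our own code; the feature weights are hypothetical values, not taken from the report):

```python
import numpy as np

def weighted_sq_dist(x, x_i, lam_q):
    """Squared distance (x - x_i)^t Lambda_q (x - x_i) for a diagonal,
    class-dependent weight matrix Lambda_q = diag(lam_q)."""
    diff = np.asarray(x, float) - np.asarray(x_i, float)
    return float(np.sum(lam_q * diff ** 2))

# Illustrative weights only (two classes, three features): class 0 stresses
# the first feature, class 1 the last one.
lam = np.array([[2.0, 1.0, 0.5],
                [0.5, 1.0, 2.0]])
x, x_i = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
print(weighted_sq_dist(x, x_i, lam[0]))   # 2.0
print(weighted_sq_dist(x, x_i, lam[1]))   # 0.5

# The per-neighbor mass of Eq. (1) then becomes alpha * exp(-weighted_sq_dist(...)),
# the rate gamma_q being absorbed into the weights lambda_q.
```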

A Computation of the derivatives of E w.r.t. γ

Let $x^{(\ell)}$ be a training pattern and $m^{(\ell)}$ the BBA obtained by classifying $x^{(\ell)}$ using its $k$ nearest neighbors in the training set. Function $m^{(\ell)}$ is computed according to Equations (7) and (8):

$m^{(\ell)}(\{\omega_q\}) = \frac{1}{K} \left( 1 - \prod_{i \in I^{(\ell)}_{k,q}} (1 - \alpha \phi_q(d^{(\ell,i)})) \right) \prod_{r \neq q} \prod_{i \in I^{(\ell)}_{k,r}} (1 - \alpha \phi_r(d^{(\ell,i)}))$   (21)

$m^{(\ell)}(\Omega) = \frac{1}{K} \prod_{r=1}^{M} \prod_{i \in I^{(\ell)}_{k,r}} (1 - \alpha \phi_r(d^{(\ell,i)}))$   (22)

for all $q \in \{1, \ldots, M\}$, where $I^{(\ell)}_{k,r}$ denotes the set of indices of the $k$ nearest neighbors of pattern $x^{(\ell)}$ in class $\omega_r$, $d^{(\ell,i)}$ is the distance between $x^{(\ell)}$ and $x^{(i)}$, and $K$ is a normalizing factor. In the following, we shall assume that:

$\phi_q(d^{(\ell,i)}) = \exp(-\gamma_q \, d^{(\ell,i)2})$   (23)

which will simply be denoted by $\phi^{(\ell,i)}_q$. To simplify the calculations, we further introduce the unnormalized orthogonal sum [13] $m^{(\ell)\prime}$ defined as $m^{(\ell)\prime}(A) = K m^{(\ell)}(A)$ for all $A$. We also denote as $\bar m^{(\ell,i)}$ the unnormalized orthogonal sum of the $m(\cdot|x^{(j)})$ for all $j \in I^{(\ell)}_k$, $j \neq i$, so that $m^{(\ell)\prime} = \bar m^{(\ell,i)} \,\bar\oplus\, m(\cdot|x^{(i)})$, where $\bar\oplus$ denotes the unnormalized orthogonal sum operator. More precisely, we have:

$m^{(\ell)\prime}(\{\omega_q\}) = m(\{\omega_q\}|x^{(i)}) \left( \bar m^{(\ell,i)}(\{\omega_q\}) + \bar m^{(\ell,i)}(\Omega) \right) + m(\Omega|x^{(i)}) \, \bar m^{(\ell,i)}(\{\omega_q\})$   (24)

for all $q \in \{1, \ldots, M\}$, and

$m^{(\ell)\prime}(\Omega) = \bar m^{(\ell,i)}(\Omega) \, m(\Omega|x^{(i)})$   (25)

The error for pattern $x^{(\ell)}$ is:

$E(x^{(\ell)}) = \sum_{q=1}^{M} (P^{(\ell)}_q - t^{(\ell)}_q)^2$   (26)

where $t^{(\ell)}$ is the class membership vector for pattern $x^{(\ell)}$ and $P^{(\ell)}_q$ is the pignistic probability of class $\omega_q$ computed from $m^{(\ell)}$ as $P^{(\ell)}_q = m^{(\ell)}(\{\omega_q\}) + m^{(\ell)}(\Omega)/M$.

The derivative of $E(x^{(\ell)})$ with respect to each parameter $\gamma_q$ can be computed as:

$\frac{\partial E(x^{(\ell)})}{\partial \gamma_q} = \sum_{i \in I^{(\ell)}_{k,q}} \frac{\partial E(x^{(\ell)})}{\partial \phi^{(\ell,i)}_q} \, \frac{\partial \phi^{(\ell,i)}_q}{\partial \gamma_q}$   (27)

with

$\frac{\partial E(x^{(\ell)})}{\partial \phi^{(\ell,i)}_q} = \sum_{r=1}^{M} \frac{\partial E(x^{(\ell)})}{\partial P^{(\ell)}_r} \, \frac{\partial P^{(\ell)}_r}{\partial \phi^{(\ell,i)}_q}$   (28)

$= \sum_{r=1}^{M} 2 (P^{(\ell)}_r - t^{(\ell)}_r) \left[ \frac{\partial m^{(\ell)}(\{\omega_r\})}{\partial \phi^{(\ell,i)}_q} + \frac{1}{M} \frac{\partial m^{(\ell)}(\Omega)}{\partial \phi^{(\ell,i)}_q} \right]$   (29)

and

$\frac{\partial \phi^{(\ell,i)}_q}{\partial \gamma_q} = -d^{(\ell,i)2} \, \phi^{(\ell,i)}_q$   (30)

The derivatives in Equation (29) can be computed as:

$\frac{\partial m^{(\ell)}(\{\omega_r\})}{\partial \phi^{(\ell,i)}_q} = \frac{1}{K^2} \left[ \frac{\partial m^{(\ell)\prime}(\{\omega_r\})}{\partial \phi^{(\ell,i)}_q} \, K - m^{(\ell)\prime}(\{\omega_r\}) \, \frac{\partial K}{\partial \phi^{(\ell,i)}_q} \right]$   (31)

$\frac{\partial m^{(\ell)}(\Omega)}{\partial \phi^{(\ell,i)}_q} = \frac{1}{K^2} \left[ \frac{\partial m^{(\ell)\prime}(\Omega)}{\partial \phi^{(\ell,i)}_q} \, K - m^{(\ell)\prime}(\Omega) \, \frac{\partial K}{\partial \phi^{(\ell,i)}_q} \right]$   (32)

with

$\frac{\partial K}{\partial \phi^{(\ell,i)}_q} = \sum_{r=1}^{M} \frac{\partial m^{(\ell)\prime}(\{\omega_r\})}{\partial \phi^{(\ell,i)}_q} + \frac{\partial m^{(\ell)\prime}(\Omega)}{\partial \phi^{(\ell,i)}_q}$   (33)

For $i \in I^{(\ell)}_{k,q}$, Equations (24) and (25) yield:

$\frac{\partial m^{(\ell)\prime}(\{\omega_r\})}{\partial \phi^{(\ell,i)}_q} = \alpha \left( \bar m^{(\ell,i)}(\{\omega_r\}) + \bar m^{(\ell,i)}(\Omega) \right) \delta_{qr} - \alpha \, \bar m^{(\ell,i)}(\{\omega_r\})$   (35)

where $\delta$ is the Kronecker symbol, and

$\frac{\partial m^{(\ell)\prime}(\Omega)}{\partial \phi^{(\ell,i)}_q} = -\alpha \, \bar m^{(\ell,i)}(\Omega)$   (36)

which completes the calculation of the gradient of $E(x^{(\ell)})$ w.r.t. $\gamma$. To account for the constraint $\gamma_q \ge 0$, we introduce new parameters $\xi_q$ ($q = 1, \ldots, M$) such that:

$\gamma_q = (\xi_q)^2$   (37)

and we compute the derivatives with respect to these new parameters by the chain rule:

$\frac{\partial E(x^{(\ell)})}{\partial \xi_q} = 2 \, \xi_q \, \frac{\partial E(x^{(\ell)})}{\partial \gamma_q}$   (38)
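
The change of variable in Equations (37)-(38) is easy to use with any gradient-based update. A tiny sketch of our own follows; here grad_E_gamma stands for whatever routine evaluates the analytical gradient of Appendix A (not reproduced in code), and a simple quadratic is used as a stand-in for testing:

```python
import numpy as np

def gradient_step_xi(xi, grad_E_gamma, lr=0.1):
    """One gradient-descent step on xi, where gamma = xi**2 (Eqs. 37-38).
    grad_E_gamma(gamma) must return dE/dgamma as an array of length M."""
    gamma = xi ** 2                           # Eq. (37): enforces gamma >= 0
    g_xi = 2.0 * xi * grad_E_gamma(gamma)     # Eq. (38): chain rule
    return xi - lr * g_xi

# Toy usage with a quadratic stand-in for E(gamma) = ||gamma - 0.5||^2
fake_grad = lambda gamma: 2.0 * (gamma - 0.5)
xi = np.ones(2)
for _ in range(200):
    xi = gradient_step_xi(xi, fake_grad)
print(xi ** 2)   # converges towards [0.5, 0.5]
```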

B Linearization

We consider the expansion of $P^{(\ell)}_q$ around $d^2 = 0$ by a Taylor series up to the first order:

$P^{(\ell)}_q(d^2) = P^{(\ell)}_q(0) + \left( \nabla_{d^2} P^{(\ell)}_q(0) \right)^t d^2$   (39)

where $P^{(\ell)}_q(0)$ is given by Equation (18). In the following, we shall compute the first-order term in the above equation, and deduce from that result a method for determining an approximation to the optimal parameter vector. To simplify the notation, the superscript $(\ell)$ will be omitted from the following calculations.

As a result of the definition of the pignistic probability:

$\frac{\partial P_q}{\partial d^{(i)2}} = \frac{\partial m(\{\omega_q\})}{\partial d^{(i)2}} + \frac{1}{M} \frac{\partial m(\Omega)}{\partial d^{(i)2}}$   (40)

The derivatives of $m(\{\omega_q\})$ and $m(\Omega)$ can be more conveniently expressed as a function of the unnormalized BBA $m'$:

$\frac{\partial m(\{\omega_q\})}{\partial d^{(i)2}} = \frac{1}{K^2} \left[ \frac{\partial m'(\{\omega_q\})}{\partial d^{(i)2}} \, K - m'(\{\omega_q\}) \, \frac{\partial K}{\partial d^{(i)2}} \right]$   (41)

$\frac{\partial m(\Omega)}{\partial d^{(i)2}} = \frac{1}{K^2} \left[ \frac{\partial m'(\Omega)}{\partial d^{(i)2}} \, K - m'(\Omega) \, \frac{\partial K}{\partial d^{(i)2}} \right]$   (42)

To compute $\partial m'(\{\omega_q\})/\partial d^{(i)2}$, we need to distinguish two cases.

Case 1: $i \in I_{k,q}$. We then have:

$\frac{\partial m'(\{\omega_q\})}{\partial d^{(i)2}} = -\alpha \gamma_q \exp(-\gamma_q d^{(i)2}) \prod_{j \in I_{k,q}, j \neq i} (1 - \alpha \exp(-\gamma_q d^{(j)2})) \prod_{r \neq q} \prod_{j \in I_{k,r}} (1 - \alpha \exp(-\gamma_r d^{(j)2}))$   (43)

$\frac{\partial m'(\Omega)}{\partial d^{(i)2}} = \alpha \gamma_q \exp(-\gamma_q d^{(i)2}) \prod_{r=1}^{M} \prod_{j \in I_{k,r}, j \neq i} (1 - \alpha \exp(-\gamma_r d^{(j)2}))$   (44)

Setting all distances to 0 in the above equations, we obtain:

$\left. \frac{\partial m'(\{\omega_q\})}{\partial d^{(i)2}} \right|_{d^2=0} = -\alpha \gamma_q (1-\alpha)^{k-1}$   (45)

$\left. \frac{\partial m'(\Omega)}{\partial d^{(i)2}} \right|_{d^2=0} = \alpha \gamma_q (1-\alpha)^{k-1}$   (46)

Case 2: $i \in I_{k,l}$, $l \neq q$. We have:

$\frac{\partial m'(\{\omega_q\})}{\partial d^{(i)2}} = \alpha \gamma_l \exp(-\gamma_l d^{(i)2}) \left( 1 - \prod_{j \in I_{k,q}} (1 - \alpha \exp(-\gamma_q d^{(j)2})) \right) \prod_{r \neq q} \prod_{j \in I_{k,r}, j \neq i} (1 - \alpha \exp(-\gamma_r d^{(j)2}))$   (47)

$\frac{\partial m'(\Omega)}{\partial d^{(i)2}} = \alpha \gamma_l \exp(-\gamma_l d^{(i)2}) \prod_{r=1}^{M} \prod_{j \in I_{k,r}, j \neq i} (1 - \alpha \exp(-\gamma_r d^{(j)2}))$   (48)

Setting the distances to zero in the above equations:

$\left. \frac{\partial m'(\{\omega_q\})}{\partial d^{(i)2}} \right|_{d^2=0} = \alpha \gamma_l \left( 1 - (1-\alpha)^{k_q} \right) (1-\alpha)^{k - k_q - 1}$   (49)

$\left. \frac{\partial m'(\Omega)}{\partial d^{(i)2}} \right|_{d^2=0} = \alpha \gamma_l (1-\alpha)^{k-1}$   (50)

where $k_q = |I_{k,q}|$. The derivatives of $K$ are simply obtained as follows:

$\frac{\partial K}{\partial d^{(i)2}} = \sum_{q=1}^{M} \frac{\partial m'(\{\omega_q\})}{\partial d^{(i)2}} + \frac{\partial m'(\Omega)}{\partial d^{(i)2}}$   (51)

which, at $d^2 = 0$ and for $i \in I_{k,r}$, reduces to:

$\left. \frac{\partial K}{\partial d^{(i)2}} \right|_{d^2=0} = \alpha \gamma_r \sum_{q \neq r} \left( 1 - (1-\alpha)^{k_q} \right) (1-\alpha)^{k - k_q - 1}$   (52)

It follows from the preceding calculations that, for $i \in I_{k,r}$, the derivatives of $m'(\{\omega_q\})$, $m'(\Omega)$ and $K$ at $d^2 = 0$ are proportional to $\gamma_r$. Since $m'(\{\omega_q\})$, $m'(\Omega)$ and $K$ do not themselves depend on $\gamma$ at $d^2 = 0$, the derivative of $P_q$ is also proportional to $\gamma_r$. Hence, we have:

$\sum_{i \in I_{k,r}} \left. \frac{\partial P_q}{\partial d^{(i)2}} \right|_{d^2=0} d^{(i)2} = A_{qr} \, \gamma_r$   (53)

for all $r \in \{1, \ldots, M\}$, $A_{qr}$ being some constant not depending on $\gamma$. Consequently, we can write:

$P_q = P_q(0) + \sum_{r=1}^{M} A_{qr} \, \gamma_r$   (54)

and, expressing this result in matrix form:

$P = P(0) + A \gamma$   (55)

where $A = (A_{qr})$ is a square matrix of size $M$. The above calculations have been performed for an arbitrary training pattern $x$. Reintroducing the pattern index $\ell$, we have:

$P^{(\ell)} = P^{(\ell)}(0) + A^{(\ell)} \gamma$   (56)

Introducing these terms into the mean squared error, we have:

$E = \frac{1}{N} \sum_{\ell=1}^{N} (P^{(\ell)} - t^{(\ell)})^t (P^{(\ell)} - t^{(\ell)})$

$= \frac{1}{N} \sum_{\ell=1}^{N} \left( P^{(\ell)}(0) - t^{(\ell)} + A^{(\ell)} \gamma \right)^t \left( P^{(\ell)}(0) - t^{(\ell)} + A^{(\ell)} \gamma \right)$   (57)

$= \frac{1}{N} \sum_{\ell=1}^{N} \left[ (P^{(\ell)}(0) - t^{(\ell)})^t (P^{(\ell)}(0) - t^{(\ell)}) + 2 \gamma^t A^{(\ell)t} (P^{(\ell)}(0) - t^{(\ell)}) + \gamma^t A^{(\ell)t} A^{(\ell)} \gamma \right]$   (58)

The gradient of $E$ with respect to $\gamma$ is therefore given by:

$\nabla_\gamma E = \frac{2}{N} \left[ \sum_{\ell=1}^{N} A^{(\ell)t} (P^{(\ell)}(0) - t^{(\ell)}) + \left( \sum_{\ell=1}^{N} A^{(\ell)t} A^{(\ell)} \right) \gamma \right]$   (59)

Minimizing $E$ under the constraint $\gamma \ge 0$ is a nonnegative least squares problem that may be solved efficiently using, for instance, the algorithm described in [10, page 161].
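
Once the matrices $A^{(\ell)}$ and the vectors $P^{(\ell)}(0)$ are available, the quadratic problem above can be stacked into a single nonnegative least squares problem. A sketch of our own, using SciPy's NNLS solver in place of the algorithm of [10] (the toy inputs below are arbitrary placeholders):

```python
import numpy as np
from scipy.optimize import nnls

def one_step_gamma(A_list, P0_list, t_list):
    """Approximate minimizer of E(gamma) = (1/N) sum_l ||P0_l + A_l gamma - t_l||^2
    subject to gamma >= 0, obtained by stacking the N small systems into one
    NNLS problem:  minimize ||C gamma - b||^2  with C = [A_1; ...; A_N] and
    b = [t_1 - P0_1; ...; t_N - P0_N]."""
    C = np.vstack(A_list)                                            # (N*M, M)
    b = np.concatenate([t - p0 for t, p0 in zip(t_list, P0_list)])   # (N*M,)
    gamma, residual = nnls(C, b)
    return gamma

# Toy usage with placeholder A^(l), P^(l)(0) and targets (M = 2, N = 3)
rng = np.random.default_rng(0)
A_list = [rng.random((2, 2)) for _ in range(3)]
P0_list = [np.full(2, 0.5) for _ in range(3)]
t_list = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]
print(one_step_gamma(A_list, P0_list, t_list))
```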

References

[1] T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, IT-13(1):21-27, 1967.

[2] T. Denoeux. A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Transactions on Systems, Man and Cybernetics, 25(5):804-813, 1995.

[3] T. Denoeux. Analysis of evidence-theoretic decision rules for pattern classification. Pattern Recognition, 30(7):1095-1107, 1997.

[4] T. Denoeux. Application du modèle des croyances transférables en reconnaissance de formes. Traitement du Signal (in press).

[5] T. Denoeux and G. Govaert. Combined supervised and unsupervised learning for system diagnosis using Dempster-Shafer theory. In P. Borne et al., editor, CESA'96 IMACS Multiconference. Symposium on Control, Optimization and Supervision, volume 1, pages 104-109, Lille, July 1996.

[6] S. A. Dudani. The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics, SMC-6:325-327, 1976.

[7] J. H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84:165-175, 1989.

[8] R. P. Gorman and T. J. Sejnowski. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks, 1:75-89, 1988.

[9] J. M. Keller, M. R. Gray, and J. A. Givens. A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man and Cybernetics, SMC-15(4):580-585, 1985.

[10] C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. Prentice-Hall, 1974.

[11] P. M. Murphy and D. W. Aha. UCI Repository of machine learning databases [machine-readable data repository]. University of California, Department of Information and Computer Science, Irvine, CA.

[12] G. Shafer. A mathematical theory of evidence. Princeton University Press, Princeton, N.J., 1976.

[13] P. Smets. The combination of evidence in the Transferable Belief Model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5):447-458, 1990.

[14] P. Smets and R. Kennes. The Transferable Belief Model. Artificial Intelligence, 66:191-243, 1994.

[15] L. M. Zouhal. Contribution à l'application de la théorie des fonctions de croyance en reconnaissance des formes. PhD thesis, Université de Technologie de Compiègne, 1997.

Biographies

Lalla Meriem Zouhal received an M.S. in electronics from the Faculté des Sciences Hassan II, Casablanca, Morocco, in 1990, a DEA (Diplôme d'Études Approfondies) in system control from the Université de Technologie de Compiègne, France, in 1992, and a PhD from the same institution. Her research interests concern pattern classification, Dempster-Shafer theory and fuzzy logic.

Thierry Denoeux graduated in 1985 as an engineer from the École Nationale des Ponts et Chaussées in Paris, and earned a PhD from the same institution. He obtained the "Habilitation à Diriger des Recherches" from the Institut National Polytechnique de Lorraine. From 1989 to 1992, he was employed by the Lyonnaise des Eaux water company, where he was in charge of research projects concerning the application of neural networks to forecasting and diagnosis. Dr. Denoeux joined the Université de Technologie de Compiègne as an assistant professor. His research interests include artificial neural networks, statistical pattern recognition, uncertainty modeling and data fusion.

List of Tables

1  Main characteristics of data sets: number of classes (M), training set size (N), test set size (N_t) and input dimension (n).
2  Test error rates obtained with the voting, distance-weighted, fuzzy and evidence-theoretic classification rules for the best value of k (in brackets), with 90 % confidence intervals. ETF: evidence-theoretic classifier with fixed γ; ETO: evidence-theoretic classifier with optimized γ.

List of Figures

1  Contour lines of the error function for different values of γ, using the gradient-descent method. The optimal value is γ = (0.84, 0.46)^t.
2  Contour lines of the error function for different values of γ, using the linearization method for pignistic probability vectors. The optimal value is γ = (0.76, 0.6)^t.
3  Test error rates on data set B1 as a function of k, for the ETF (x.) and ETO k-NN rules: (:) gradient method and (- -) linearization method.
4  Test error rates on data set B1 as a function of k, for the voting (-), ETO (:), fuzzy (- -) and distance-weighted (-.) k-NN rules.
5  Test error rates on data set B2 as a function of k, for the ETF (x.) and ETO k-NN rules: (:) gradient method and (- -) linearization method.
6  Test error rates on data set B2 as a function of k, for the voting (-), ETO (:), fuzzy (- -) and distance-weighted (-.) k-NN rules.
7  Test error rates on ionosphere data as a function of k, for the ETF (x.) and ETO k-NN rules: (:) gradient method and (- -) linearization method.
8  Test error rates on ionosphere data as a function of k, for the voting (-), ETO (:), fuzzy (- -) and distance-weighted (-.) k-NN rules.
9  Test error rates on vehicle data as a function of k, for the ETF (x.) and ETO k-NN rules: (:) gradient method and (- -) linearization method.
10 Test error rates on vehicle data as a function of k, for the voting (-), ETO (:), fuzzy (- -) and distance-weighted (-.) k-NN rules.
11 Test error rates on sonar data as a function of k, for the ETF (x.) and ETO k-NN rules: (:) gradient method and (- -) linearization method.
12 Test error rates on sonar data as a function of k, for the voting (-), ETO (:), fuzzy (- -) and distance-weighted (-.) k-NN rules.

Table 1: Main characteristics of data sets: number of classes (M), training set size (N), test set size (N_t) and input dimension (n). [Entries for data sets B1, B2, Ion, Veh and Son not reproduced.]

Table 2: Test error rates obtained with the voting, distance-weighted, fuzzy and evidence-theoretic classification rules for the best value of k (in brackets), with 90 % confidence intervals. ETF: evidence-theoretic classifier with fixed γ; ETO: evidence-theoretic classifier with optimized γ. [Error-rate entries not reproduced.]

Figure 1: Contour lines of the error function for different values of γ, using the gradient-descent method. The optimal value is γ = (0.84, 0.46)^t.

Figure 2: Contour lines of the error function for different values of γ, using the linearization method for pignistic probability vectors. The optimal value is γ = (0.76, 0.6)^t.

Figure 3: Test error rates on data set B1 as a function of k, for the ETF (x.) and ETO k-NN rules: (:) gradient method and (- -) linearization method.

Figure 4: Test error rates on data set B1 as a function of k, for the voting (-), ETO (:), fuzzy (- -) and distance-weighted (-.) k-NN rules.

Figure 5: Test error rates on data set B2 as a function of k, for the ETF (x.) and ETO k-NN rules: (:) gradient method and (- -) linearization method.

Figure 6: Test error rates on data set B2 as a function of k, for the voting (-), ETO (:), fuzzy (- -) and distance-weighted (-.) k-NN rules.

Figure 7: Test error rates on ionosphere data as a function of k, for the ETF (x.) and ETO k-NN rules: (:) gradient method and (- -) linearization method.

Figure 8: Test error rates on ionosphere data as a function of k, for the voting (-), ETO (:), fuzzy (- -) and distance-weighted (-.) k-NN rules.

Figure 9: Test error rates on vehicle data as a function of k, for the ETF (x.) and ETO k-NN rules: (:) gradient method and (- -) linearization method.

Figure 10: Test error rates on vehicle data as a function of k, for the voting (-), ETO (:), fuzzy (- -) and distance-weighted (-.) k-NN rules.

Figure 11: Test error rates on sonar data as a function of k, for the ETF (x.) and ETO k-NN rules: (:) gradient method and (- -) linearization method.

Figure 12: Test error rates on sonar data as a function of k, for the voting (-), ETO (:), fuzzy (- -) and distance-weighted (-.) k-NN rules.


More information

Sequential adaptive combination of unreliable sources of evidence

Sequential adaptive combination of unreliable sources of evidence Sequential adaptive combination of unreliable sources of evidence Zhun-ga Liu, Quan Pan, Yong-mei Cheng School of Automation Northwestern Polytechnical University Xi an, China Email: liuzhunga@gmail.com

More information

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor

More information

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G.

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G. In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and J. Alspector, (Eds.). Morgan Kaufmann Publishers, San Fancisco, CA. 1994. Convergence of Indirect Adaptive Asynchronous

More information

Managing Uncertainty

Managing Uncertainty Managing Uncertainty Bayesian Linear Regression and Kalman Filter December 4, 2017 Objectives The goal of this lab is multiple: 1. First it is a reminder of some central elementary notions of Bayesian

More information

Neural Networks DWML, /25

Neural Networks DWML, /25 DWML, 2007 /25 Neural networks: Biological and artificial Consider humans: Neuron switching time 0.00 second Number of neurons 0 0 Connections per neuron 0 4-0 5 Scene recognition time 0. sec 00 inference

More information

Supervised locally linear embedding

Supervised locally linear embedding Supervised locally linear embedding Dick de Ridder 1, Olga Kouropteva 2, Oleg Okun 2, Matti Pietikäinen 2 and Robert P.W. Duin 1 1 Pattern Recognition Group, Department of Imaging Science and Technology,

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

Influence of weight initialization on multilayer perceptron performance

Influence of weight initialization on multilayer perceptron performance Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -

More information

arxiv:cs/ v1 [cs.ai] 6 Sep 2004

arxiv:cs/ v1 [cs.ai] 6 Sep 2004 The Generalized Pignistic Transformation Jean Dezert Florentin Smarandache Milan Daniel ONERA Dpt.of Mathematics Institute of Computer Science 9 Av. de la Div. Leclerc Univ. of New Mexico Academy of Sciences

More information

Necessary Corrections in Intransitive Likelihood-Ratio Classifiers

Necessary Corrections in Intransitive Likelihood-Ratio Classifiers Necessary Corrections in Intransitive Likelihood-Ratio Classifiers Gang Ji and Jeff Bilmes SSLI-Lab, Department of Electrical Engineering University of Washington Seattle, WA 9895-500 {gang,bilmes}@ee.washington.edu

More information

A Static Evidential Network for Context Reasoning in Home-Based Care Hyun Lee, Member, IEEE, Jae Sung Choi, and Ramez Elmasri, Member, IEEE

A Static Evidential Network for Context Reasoning in Home-Based Care Hyun Lee, Member, IEEE, Jae Sung Choi, and Ramez Elmasri, Member, IEEE 1232 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART A: SYSTEMS AND HUMANS, VOL. 40, NO. 6, NOVEMBER 2010 A Static Evidential Network for Context Reasoning in Home-Based Care Hyun Lee, Member,

More information

The internal conflict of a belief function

The internal conflict of a belief function The internal conflict of a belief function Johan Schubert 1 Abstract In this paper we define and derive an internal conflict of a belief function We decompose the belief function in question into a set

More information

Multi-hypothesis nearest-neighbor classifier based on class-conditional weighted distance metric

Multi-hypothesis nearest-neighbor classifier based on class-conditional weighted distance metric Multi-hypothesis nearest-neighbor classifier based on class-conditional weighted distance metric Lianmeng Jiao, Quan Pan, Xiaoxue Feng To cite this version: Lianmeng Jiao, Quan Pan, Xiaoxue Feng. Multi-hypothesis

More information

Any Reasonable Cost Function Can Be Used for a Posteriori Probability Approximation

Any Reasonable Cost Function Can Be Used for a Posteriori Probability Approximation Any Reasonable Cost Function Can Be Used for a Posteriori Probability Approximation Marco Saerens, Patrice Latinne & Christine Decaestecker Université Catholique de Louvain and Université Libre de Bruxelles

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Iterative Laplacian Score for Feature Selection

Iterative Laplacian Score for Feature Selection Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,

More information

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata Principles of Pattern Recognition C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata e-mail: murthy@isical.ac.in Pattern Recognition Measurement Space > Feature Space >Decision

More information

LECTURE NOTE #3 PROF. ALAN YUILLE

LECTURE NOTE #3 PROF. ALAN YUILLE LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.

More information

Uncertainty and Rules

Uncertainty and Rules Uncertainty and Rules We have already seen that expert systems can operate within the realm of uncertainty. There are several sources of uncertainty in rules: Uncertainty related to individual rules Uncertainty

More information

Small sample size generalization

Small sample size generalization 9th Scandinavian Conference on Image Analysis, June 6-9, 1995, Uppsala, Sweden, Preprint Small sample size generalization Robert P.W. Duin Pattern Recognition Group, Faculty of Applied Physics Delft University

More information

Selection of Classifiers based on Multiple Classifier Behaviour

Selection of Classifiers based on Multiple Classifier Behaviour Selection of Classifiers based on Multiple Classifier Behaviour Giorgio Giacinto, Fabio Roli, and Giorgio Fumera Dept. of Electrical and Electronic Eng. - University of Cagliari Piazza d Armi, 09123 Cagliari,

More information

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori

More information

Math for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han

Math for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han Math for Machine Learning Open Doors to Data Science and Artificial Intelligence Richard Han Copyright 05 Richard Han All rights reserved. CONTENTS PREFACE... - INTRODUCTION... LINEAR REGRESSION... 4 LINEAR

More information

COM336: Neural Computing

COM336: Neural Computing COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

More information

Reasoning with Uncertainty

Reasoning with Uncertainty Reasoning with Uncertainty Representing Uncertainty Manfred Huber 2005 1 Reasoning with Uncertainty The goal of reasoning is usually to: Determine the state of the world Determine what actions to take

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information