Proceedings of International Joint Conference on Neural Networks, Dallas, Texas, USA, August 4-9, 2013

Self-Adaptive Probability Estimation for Naive Bayes Classification

Jia Wu, Zhihua Cai, and Xingquan Zhu

Jia Wu is with the Quantum Computation & Intelligent Systems Research Centre, Faculty of Engineering & Information Technology, University of Technology Sydney, Australia, and is affiliated with the Department of Computer Science, China University of Geosciences Wuhan, China (email: jia.wu@student.uts.edu.au). Xingquan Zhu is with the Quantum Computation & Intelligent Systems Research Centre, Faculty of Engineering & Information Technology, University of Technology Sydney, Australia (email: xingquan.zhu@uts.edu.au). Zhihua Cai is with the Department of Computer Science, China University of Geosciences Wuhan, China (email: zhcai@cug.edu.cn). This work is supported, in part, by an Australian Research Council Discovery Project under Grant No. DP, and by the National Natural Science Foundation of China.

Abstract: Probability estimation from a given set of training examples is crucial for learning Naive Bayes (NB) classifiers. When the number of training examples is insufficient, the estimation suffers from the zero-frequency problem, which prevents NB classifiers from classifying instances whose conditional probabilities are zero. Laplace-estimate and M-estimate are two common methods that alleviate the zero-frequency problem by adding fixed terms to the probability estimation so that no conditional probability becomes zero. A major issue with this type of design is that the fixed terms are pre-specified without considering the uniqueness of the underlying training data. In this paper, we propose an Artificial Immune System (AIS) based self-adaptive probability estimation method, namely AISENB, which uses AIS to automatically and self-adaptively select the optimal terms and values for probability estimation. The immune system based evolutionary computation process, including initialization, clone, mutation, and crossover, ensures that AISENB can adjust itself to the data without explicit specification of functional or distributional forms for the underlying model. Experimental results and comparisons on 36 benchmark datasets demonstrate that AISENB significantly outperforms traditional probability estimation based Naive Bayes classification approaches.

I. INTRODUCTION

Bayesian network is a popular learning tool for decision making [1]. Naive Bayes (NB) [2], as a special case of a Bayesian network, has been widely used in many real-world learning tasks, especially for high dimensional data such as text classification [3] and web mining [4].

Given a training set D = {x_1, ..., x_N} with N instances, each instance contains n attribute values and a class label. We use x_i = {x_{i,1}, ..., x_{i,j}, ..., x_{i,n}, y_i} to denote the ith instance in the dataset D, with x_{i,j} denoting the jth attribute value and y_i denoting the class label of the instance. The class space Y = {c_1, ..., c_k, ..., c_L} denotes the set of labels to which each instance may belong, and c_k denotes the kth label of the class space. The attribute set of the dataset is denoted by A = {a_1, ..., a_j, ..., a_n}, where a_j denotes the jth attribute. Each attribute can be a discrete random variable (with a number of discrete values) or a continuous random variable. In this paper, we focus only on categorical (or nominal) attributes, and for any attribute a_j we use a_j^τ, τ = 1, ..., |a_j|, to denote the τth attribute value of a_j, where |a_j| denotes the total number of distinct values of a_j.
For each instance x_i, its value satisfies x_{i,j} ∈ a_j. For ease of understanding, we also use (x_i, y_i) as a shorthand for an instance and its class label, use x_i as a shorthand for the instance, and use a_j as a shorthand for the jth attribute. For an instance (x_i, y_i) in the training set D, its class label satisfies y_i ∈ Y, whereas a test instance x_t only contains attribute values and its class label y_t needs to be predicted by the learning model. Using the conditional independence assumption, NB classifies an instance according to the rule defined in Eq. (1):

  c(x_t) = \arg\max_{c_k \in Y} P(c_k) \prod_{j=1}^{n} P(x_{t,j} \mid c_k)    (1)

Building an NB classifier is not difficult, because one only needs to calculate the class probability P(c_k) and the conditional probability values P(x_{t,j} | c_k) from the training examples in D. When calculating a conditional probability value P(x_{t,j} | c_k), a necessary step is to observe the distribution of the random variable x_{t,j} conditioned on the given class label c_k. Depending on whether attribute a_j is a discrete or a continuous random variable, the conditional probabilities are modeled either by some continuous probability distribution over the range of the attribute's values or by converting numeric attribute values into a discrete space using discretization approaches. Because a continuous probability distribution requires a predefined distribution model (such as a Gaussian distribution) to be employed in the learning process, and adopting the wrong distribution model can severely deteriorate the learning model, discretization is the most common solution for handling continuous attributes in NB.

The limitation of using discretization is that the resulting model cannot classify instances for which the conditional probability of an attribute value is zero [5]. This is because the product in Eq. (1) becomes zero, so no class can be selected to classify instance x_t. Many reasons can result in a zero conditional probability. The most common one is zero frequency, which means that the value x_{t,j} exists in attribute a_j's domain (i.e., x_{t,j} ∈ a_j) but never appears in the training data D. In practice, this frequently happens when the size of the training dataset is small or the domain of an attribute a_j is large [6]. In order to solve this problem, Laplace-estimate [7] and M-estimate [8] are usually applied for probability estimation by adding a very small value to each conditional probability value. Jiang et al. [9] conducted an extensive empirical study on the performance of several commonly used Laplace-estimate and M-estimate settings when different Bayesian classifiers (NB [2], TAN [10], AODE [11], and HNB [1]) are used as the base classifier, respectively.
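To make the decision rule in Eq. (1) concrete, the following minimal Python sketch applies it to one test instance using pre-computed class priors and per-class conditional probability tables; the names nb_classify, class_prior, and cond_prob are illustrative and not part of the paper.

```python
import math

def nb_classify(x_t, class_prior, cond_prob):
    """Classify one test instance with the Naive Bayes rule of Eq. (1).

    class_prior: dict mapping class label c_k -> P(c_k)
    cond_prob:   dict mapping (j, value, c_k) -> P(x_{t,j} = value | c_k)
    x_t:         list of n attribute values
    Log-probabilities are used to avoid numerical underflow; the arg-max is unchanged.
    """
    best_label, best_score = None, float("-inf")
    for c_k, prior in class_prior.items():
        score = math.log(prior)
        for j, value in enumerate(x_t):
            p = cond_prob.get((j, value, c_k), 0.0)
            if p == 0.0:                  # zero frequency: the whole product collapses
                score = float("-inf")
                break
            score += math.log(p)
        if best_label is None or score > best_score:
            best_label, best_score = c_k, score
    return best_label
```

The explicit zero check makes the zero-frequency problem visible: a single unseen attribute value collapses the whole product for that class, which is exactly what Laplace-estimate and M-estimate are designed to avoid.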

Lowd and Domingos [12] proposed a Naive Bayes model with M-estimate as an alternative to Bayesian networks for general probability estimation tasks. While M-estimate based NB (MENB) has been shown to outperform Laplace-estimate based NB (LENB) in some cases, MENB has two important parameters, m and p, which have to be carefully defined and have a significant impact on the performance of the NB classifier. The two parameters are correlated, with m, a positive integer called the equivalent sample size, controlling the shift towards p. Choosing suitable parameter values for m and p is, unfortunately, a problem dependent task that requires users to have good experience. Despite their important roles, there is no consistent methodology to help identify optimal m and p values for probability estimation, and most of the time those values are arbitrarily set within some predefined ranges. For example, the m value is often empirically set to 1, and p, the base rate or prior estimate of the probability, is typically set by hypothesizing a uniform distribution [13]. In [9], Jiang et al. proposed a nested M-estimate method. Zadrozny and Elkan [14] set p to a constant 1/10. These parameter setting methods for M-estimate have achieved good accuracy on some specific datasets. However, in many real-world applications, the assumption behind each fixed setting is often violated.

Motivated by the above observations, in this paper we propose to use the Artificial Immune System (AIS) [15] mechanism for self-adaptive probability estimation in Naive Bayes classification. Our method uses AIS principles to design an automated search strategy that finds optimal m and p parameters for each dataset. The unique immune system computation process, including initialization, clone, mutation, and crossover, ensures that our method can adjust itself to the data without any explicit specification of functional or distributional form for the underlying model. The experiments and comparisons on 36 UCI benchmark datasets [16], which are commonly used to validate classification algorithms [17], demonstrate that AIS based probability Estimation for Naive Bayes (AISENB) classification can successfully find optimal parameter combinations for probability estimation, and that AISENB consistently outperforms other state-of-the-art NB algorithms.

The remainder of the paper is structured as follows. Section II reviews related work on probability estimation in NB and briefly describes artificial immune systems and their connection to parameter search. In Section III, we propose a new estimation structure for Naive Bayes, including the calculation of the best values for conditional probabilities using an artificial immune system algorithm. Section IV reports experimental setups and comparisons, and we conclude the paper in Section V.

II. RELATED WORK

A. Probability Estimation for Naive Bayes

The Naive Bayes classifier can handle both nominal and numeric attributes. A numeric attribute a_j is normally discretized into a number of intervals over the range of the attribute values, so that the whole attribute a_j is treated as nominal (or categorical) for probability estimation.
A basic approach to estimating these probabilities can be defined as follows:

  n_k = \sum_{x_i \in D;\ y_i = c_k} 1, \qquad p(c_k) = \frac{n_k}{N}    (2)

  p(a_j^\tau \mid c_k) = \frac{n_k^{(j,\tau)}}{n_k}    (3)

  n_k^{(j,\tau)} = \sum_{x_i \in D;\ y_i = c_k;\ x_{i,j} = a_j^\tau} 1    (4)

where N is the total number of instances in the training set D, n_k denotes the number of instances in D whose class label equals c_k, and n_k^{(j,τ)} denotes the number of instances in D whose class label equals c_k and whose jth attribute value equals a_j^τ, as defined in Eq. (4).

1) Laplace-estimate: A limited amount of training data, or the use of discretization, can give rise to the zero-frequency problem, so that instances with zero conditional probabilities cannot be classified. One way to solve the zero-frequency problem is Laplace-estimate. This estimation method introduces a prior term for each attribute a_j such that no attribute has a zero conditional probability value, as given in Eq. (5) and Eq. (6):

  p(c_k) = \frac{n_k + 1}{N + L}    (5)

  p(a_j^\tau \mid c_k) = \frac{n_k^{(j,\tau)} + 1}{n_k + |a_j|}    (6)

where |a_j| is the number of distinct values of attribute a_j and L is the number of classes in D.

2) M-estimate: In Laplace-estimate, the fixed terms are added to p(c_k) and p(a_j^τ | c_k) without taking the sizes of the different classes into consideration. For example, consider a binary classification problem (L = 2) in which the majority class has 1000 instances and the minority class has only 2 instances. The ratio between the class probabilities estimated without and with Laplace-estimate is about 0.999 for the majority class, a trivial change, whereas for the minority class the ratio is about 0.5, a very significant change. In other words, Laplace-estimate has a much larger impact on minority classes than on majority classes [12]. To solve this problem, M-estimate introduces two parameters, m and p, to adjust the impact of the extra terms in the probability estimation as follows:

  p(c_k) = \frac{n_k + m \cdot p}{N + m}    (7)

  p(a_j^\tau \mid c_k) = \frac{n_k^{(j,\tau)} + m \cdot p}{n_k + m}    (8)

where p is the base rate, or prior estimate, of the probability, and m (also called the equivalent sample size) is a positive integer controlling the shift towards p.
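As a concrete illustration of Eqs. (2)-(8), the following Python sketch computes the class prior and one conditional probability from raw counts under plain frequency, Laplace-estimate, and M-estimate; the function names and argument names are illustrative, not from the paper.

```python
def class_prior(n_k, N, L, method="mle", m=1, p=None):
    """Estimate p(c_k) from counts.

    n_k: number of training instances with class c_k
    N:   total number of training instances
    L:   number of classes
    """
    if method == "mle":        # Eq. (2): plain frequency, may be zero
        return n_k / N
    if method == "laplace":    # Eq. (5)
        return (n_k + 1) / (N + L)
    if method == "m":          # Eq. (7): base rate p, equivalent sample size m
        p = 1.0 / L if p is None else p
        return (n_k + m * p) / (N + m)
    raise ValueError(method)


def cond_prob(n_jk, n_k, v_j, method="mle", m=1, p=None):
    """Estimate p(a_j^tau | c_k) from counts.

    n_jk: number of class-c_k instances whose jth attribute value equals a_j^tau
    n_k:  number of class-c_k instances
    v_j:  number of distinct values of attribute a_j, i.e. |a_j|
    """
    if method == "mle":        # Eq. (3)
        return n_jk / n_k
    if method == "laplace":    # Eq. (6)
        return (n_jk + 1) / (n_k + v_j)
    if method == "m":          # Eq. (8)
        p = 1.0 / v_j if p is None else p
        return (n_jk + m * p) / (n_k + m)
    raise ValueError(method)
```

For an attribute value never observed with class c_k (n_jk = 0), plain frequency returns 0, Laplace-estimate returns 1/(n_k + |a_j|), and M-estimate returns m·p/(n_k + m), so the zero-frequency problem disappears as soon as m·p > 0.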

To further take the class size into consideration, Jiang et al. [13] set p to a uniform distribution: for p(c_k), m = 1 and p = 1/L, and for p(a_j^τ | c_k), m = 1 and p = 1/|a_j|. Eq. (7) and Eq. (8) can then be rewritten as follows:

  p(c_k) = \frac{n_k + 1/L}{N + 1}    (9)

  p(a_j^\tau \mid c_k) = \frac{n_k^{(j,\tau)} + 1/|a_j|}{n_k + 1}    (10)

Zadrozny and Elkan [14] propose to set m and p to constants, namely m = 1 and p = 1/10:

  p(c_k) = \frac{n_k + 1/10}{N + 1}    (11)

  p(a_j^\tau \mid c_k) = \frac{n_k^{(j,\tau)} + 1/10}{n_k + 1}    (12)

Jiang et al. [9] propose a nested parameter setting method, in which p(c_k) is estimated as in Eq. (9), but for p(a_j^τ | c_k) the parameter p is set to p(a_j^τ), which is in turn estimated by M-estimate:

  p(a_j^\tau) = \frac{n^{(j,\tau)} + m \cdot p}{N + m}, \qquad n^{(j,\tau)} = \sum_{x_i \in D;\ x_{i,j} = a_j^\tau} 1    (13)

where m = 1 and p = 1/|a_j|, and n^{(j,τ)} denotes the number of instances in D whose jth attribute value equals a_j^τ. Eq. (8) can then be rewritten to estimate p(a_j^τ | c_k) as follows:

  p(a_j^\tau \mid c_k) = \frac{n_k^{(j,\tau)} + (n^{(j,\tau)} + 1/|a_j|)/(N + 1)}{n_k + 1}    (14)

B. AIS: Artificial Immune Systems

The human immune system contains two major parts: (1) humoral immunity, which deals with infectious agents in the blood and body tissues, and (2) cell-mediated immunity, which deals with body cells that have been infected. In general, the humoral system is managed by B-cells (with help from T-cells), and the cell-mediated system is managed by T-cells [18]. In this paper, we only consider the humoral part of the natural immune system, and the action of T-cells is not modeled. When pathogens invade the body, antibodies produced by B-cells respond to the detection of a foreign protein, or antigen [19]. This response process can be explained by clonal selection theory [20], illustrated in Figure 1.

Fig. 1. A simple view of the immune response: when a B-cell (the middle rings on the left) recognizes an antigen (lozenge) with a certain affinity, the system responds with proliferation, differentiation, and variation of the B-cell, which then secretes antibodies. Antibodies with high affinity become memory cells; the others become effector cells.

The clonal selection performed by the B-cells of the human immune system is the fundamental mechanism on which AIS is modeled. When AIS is used for classification, the shape-space representation, which aims at quantitatively describing the interactions among immune cells, is commonly used for modeling antibodies and antigens [21]. AIS has been widely used in various areas of research, including pattern recognition [22], clustering [23], optimization [24], and remote sensing [25]. However, few applications have been reported for Bayesian networks. In this paper, we propose a new AIS based probability estimation method that achieves high classification accuracy for M-estimate based NB.

III. AISENB: ARTIFICIAL IMMUNE SYSTEM BASED PROBABILITY ESTIMATION FOR NB

A. Problem Definition

In this paper, we focus on the calculation of the conditional probability p(x_{t,j} | c_k) and the class probability p(c_k) by using M-estimate with optimal values for the parameters m and p. While existing M-estimate based approaches define the m and p values without considering the uniqueness of the underlying training data, we treat the selection of optimal m and p values as an optimization process. From Eq. (7) and Eq. (8), in order to obtain the class label of a test instance x_t in the test set T, each conditional probability p(x_{t,j} | c_k), j = 1, ..., n, k = 1, ..., L (where n denotes the number of attributes and L the number of class labels), and the class probability p(c_k) need to be calculated.
Assume that the calculation of each conditional probability value p(x_{t,j} | c_k) has its own optimal pair of values <m_j, p_j>. Then n pairs <m_j, p_j>, j = 1, ..., n, are needed to finish the classification process, while one additional pair is needed for the class probability p(c_k). For ease of understanding, we use a single vector <m, p>, with m = <m_1, ..., m_{n+1}> and p = <p_1, ..., p_{n+1}>, to denote the n + 1 pairs <m_j, p_j>, j = 1, ..., n + 1. As a result, M-estimate based NB classification can be translated into the following optimization problem:

  c(x_t) = \arg\max_{c_k \in Y} P(c_k) \prod_{j=1}^{n} P(x_{t,j} \mid c_k)
  s.t.  p(x_{t,j} \mid c_k) = \frac{n_k^{(j,t)} + m_j p_j}{n_k + m_j},  1 \le j \le n,
        p(c_k) = \frac{n_k + m_j p_j}{N + m_j},  j = n + 1,
        m_j \in \{1, \ldots, +\infty\},  0 < p_j \le 1    (15)

where the tuple <m_j, p_j> denotes the jth element of <m, p>, and n_k^{(j,t)} denotes the number of training instances with class label c_k whose jth attribute value equals x_{t,j}.

B. AISENB

In order to use AIS to improve the performance of the M-estimate method for NB, we regard the <m, p> vector used for probability estimation as the antibody in an immune response process, and use AIS to obtain the optimal m and p vector.
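The sketch below, under the same count-based representation as the earlier examples, shows how a single antibody, i.e., one pair of vectors m and p of length n + 1, would be applied when classifying a test instance according to Eq. (15); mapping the last entry (index n) to the class prior and the dictionary layout of the counts are assumptions of this sketch.

```python
from math import log

def aisenb_classify(x_t, counts, n_k, N, m, p):
    """Classify x_t with per-attribute M-estimate parameters, as in Eq. (15).

    counts[(j, value, c_k)]: number of class-c_k instances whose jth attribute equals value
    n_k[c_k]:                number of class-c_k instances
    N:                       total number of training instances
    m, p:                    lists of length n+1; entry j < n is used for attribute j,
                             entry n is used for the class prior
    """
    n = len(x_t)
    best_label, best_score = None, float("-inf")
    for c_k in n_k:
        prior = (n_k[c_k] + m[n] * p[n]) / (N + m[n])
        score = log(prior)
        for j in range(n):
            n_jt = counts.get((j, x_t[j], c_k), 0)
            score += log((n_jt + m[j] * p[j]) / (n_k[c_k] + m[j]))
        if best_label is None or score > best_score:
            best_label, best_score = c_k, score
    return best_label
```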

TABLE I
MAPPING BETWEEN THE IMMUNE SYSTEM AND AISENB

  Immune Response      AISENB
  Antigens             Training instances in the datasets
  Antibody             Parameter vector <m, p>
  Shape-space          Possible values of the data vectors
  Affinity             The classification accuracy obtained using the vector <m, p> on the test data
  Clonal Expansion     Reproduction of parameter vectors that are well matched with antigens
  Affinity Maturation  Specific mutation and crossover of the <m, p> vector and removal of the lowest stimulated parameter vectors
  Immune Memory        Memory set of mutated and crossed parameter vectors
  Metadynamics         Continual removal and creation of parameter vectors

Antigens in AISENB are simulated as feature attribute vectors that are presented to the system during the training and testing process. In particular, AISENB has its own specific representation for Naive Bayes classification. The antibodies, i.e., candidate <m, p> vectors with good affinity (classification accuracy), experience a form of clonal expansion after being presented with input data (analogous to antigens). After antibodies are cloned, they go through a mutation process with a specifically designed mutation function. In order to increase the diversity of the population during evolution and to ensure that the algorithm can search for the global optimum, a crossover operation, designed specifically for our probability estimation problem, is also adopted in AISENB. The evolving optimization process of the AIS system helps discover the candidate <m, p> vector with the best classification accuracy for NB classification. Table I summarizes the mapping between the immune system and AISENB.

Like the Laplace-estimate and M-estimate methods, AISENB estimates probabilities through additive terms, but it adjusts the parameters m and p adaptively. Before introducing the detailed algorithm design, we briefly define the following notation. Let W = {w_1, ..., w_H} represent the set of antibodies, where H is the number of antibodies and w_h is a single antibody. We use w_h = {w_{h,1}, ..., w_{h,n+1}} to denote the hth antibody in the antibody set W, with w_{h,j} denoting the jth value of w_h. Let w_c represent the memory cell, i.e., the antibody with the best affinity. In AISENB, W represents the set of <m, p> vectors, <M, P> = {<m_1, p_1>, ..., <m_H, p_H>}; w_h corresponds to <m_h, p_h> = {<m_{h,1}, p_{h,1}>, ..., <m_{h,n+1}, p_{h,n+1}>}, the hth antibody in <M, P>, with <m_{h,j}, p_{h,j}> (analogous to w_{h,j}) denoting its jth value; and w_c corresponds to the <m_c, p_c> with the best classification accuracy. The training set D^a = {x^a_1, ..., x^a_{N^a}} represents the set of antigens, with N^a antigens, in which x^a_i represents the ith antigen.

Fig. 2. The AISENB classification system: the training data are preprocessed and the antibody set W is initialized; the antibody population then evolves through the clone process, a memory cell w_c is developed to complete the training on the antigens of generation N, and when the stopping condition is met the optimal memory cell w_c is obtained and used to classify the test data.

We use the AIS method to learn the optimal w_c in AISENB, with no prior assumption or information about the parameters (we expect AIS to find the optimal parameters automatically). After obtaining the best individual w_c, we build the AISENB classifier to classify the test data.
The detailed process of our new algorithm, AISENB, is described as follows:

1) Initialization: For the individuals in W, we first determine the antibody population size H and generate every individual w_h, h = 1, 2, ..., H, in the antibody population through a random mechanism: the p_{h,j} value of each individual w_h is set to a random number in (0, 1], while the m_{h,j} value is set to a random positive integer. In addition, in the experiments a certain portion (e.g., 80%) of the training instances D is used as the antigen set D^a to learn w_c, and the remaining instances form the affinity test set D^b.

2) Clone of AISENB:

Calculation of the affinity function: The affinity of the hth individual of the tth generation, (w_h)^t, is the classification accuracy obtained by AISENB when using (w_h)^t to carry out the probability estimation. The affinity function is calculated as

  f[(w_h)^t] = \frac{1}{N^b} \sum_{i=1}^{N^b} \delta[c(x^b_i), y^b_i]    (16)

where c(x^b_i) is the classification result for the ith instance of the affinity test set D^b (which contains N^b instances) produced by the AISENB classifier based on individual (w_h)^t, y^b_i is the actual class label of the ith instance, and δ[c(x^b_i), y^b_i] is one if c(x^b_i) = y^b_i and zero otherwise.

Antibody selection: We sort the individuals in the initial antibody population according to the affinity of each individual, and then choose the individual (w_c)^t with the best affinity in the tth generation as the memory antibody.
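A minimal sketch of the initialization step and of the affinity function in Eq. (16), assuming each antibody is stored as a pair of lists (m, p); the helper names and the upper bound m_max used for the random positive integers are assumptions of the sketch, since the paper only requires m to be a positive integer.

```python
import random

def init_population(H, n, m_max=8):
    """Step 1 (initialization): H antibodies, each an <m, p> pair of length n + 1.

    p values are drawn from (0, 1]; m values are random positive integers.
    """
    population = []
    for _ in range(H):
        m = [random.randint(1, m_max) for _ in range(n + 1)]
        p = [random.uniform(1e-6, 1.0) for _ in range(n + 1)]  # approximates (0, 1]
        population.append((m, p))
    return population

def affinity(classify_fn, D_b):
    """Step 2 (Eq. 16): accuracy of one antibody's classifier on the affinity test set D_b.

    classify_fn: callable mapping an attribute-value list to a predicted class label,
                 built from one antibody's <m, p> vectors
    D_b:         list of (x, y) pairs held out from the training data
    """
    correct = sum(1 for x, y in D_b if classify_fn(x) == y)
    return correct / len(D_b)
```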

Algorithm 1 AISENB (probability estimation by AIS)
Input: clone factor c; threshold T; crossover factor CR; maximum evolution generation MaxGen; affinity test set D^b; antibody population W; antigen population D^a
Output: the target class label c(x_t) of a test instance x_t
 1: W ← for each individual w_h, set each p_{h,j} to a random number in (0, 1] and each m_{h,j} to a random positive integer
 2: while t ≤ MaxGen and f[(w_c)^{t+1}] − f[(w_c)^t] ≥ T do
 3:   f[(w_h)^t] ← apply the antigen population D^a and the affinity test set D^b to antibody (w_h)^t and calculate its affinity
 4:   (w_c)^t ← rank the whole antibody population (W)^t by f[(w_h)^t] and find the (w_c)^t with the best affinity
 5:   (W^r)^t ← select the c antibodies with the lowest affinity to obtain the temporary antibody set
 6:   (W^c)^t ← clone (w_c)^t with clone factor c to obtain the clone antibody set
 7:   (W)^t ← ((W)^t \ (W^r)^t) ∪ (W^c)^t
 8:   for each (w_h)^t in (W)^t do
 9:     (v_h)^{t+1} ← apply two randomly selected antibodies (w_{r1})^t and (w_{r2})^t to (w_h)^t and obtain the mutation individual
10:     (u_h)^{t+1} ← apply CR and (v_h)^{t+1} to (w_h)^t and obtain the crossover individual
11:     (w_h)^{t+1} ← apply (u_h)^{t+1} to (w_h)^t and obtain the new individual of the (t+1)th generation
12:   end for
13: end while
14: c(x_t) ← apply w_c to the test instance x_t and predict its class label

Antibody clone: To ensure that the population size of every generation is fixed, the best individual (w_c)^t is cloned according to the clone factor c. After that, the individuals with the lowest affinity are replaced by the clone set, at the same rate c, to preserve the population size.

3) Evolution of AISENB:

Antibody mutation: The mutation operation is applied to the individuals of the tth generation, (W)^t: an intermediate generation is formed from new variant individuals derived from the parent generation. For any individual (w_h)^t of the tth generation, the new variant individual (v_h)^{t+1} is generated as

  (v_h)^{t+1} = (w_h)^t + F \, [(w_{r1})^t - (w_{r2})^t]    (17)

where r1 and r2 are randomly selected indices that differ from h. F, the variation factor during the evolution process, can be obtained adaptively for different clones [25]:

  F = 1 - f[(w_h)^t]    (18)

where f[(w_h)^t] denotes the affinity of the hth individual of the tth generation. In the mutation process, it is necessary to keep the perturbation zero-mean, because this reduces the possibility of producing infeasible solutions.

Antibody crossover: To avoid the loss of diversity among individuals in the population and to enhance the convergence speed of AISENB, the hth individual of the tth generation is crossed with its corresponding variant vector obtained by the mutation strategy. To ensure that the trial individual obtained by the crossover operation carries the evolved features, at least one dimension of (m_{h,j})^{t+1} and (p_{h,j})^{t+1} in the trial vector (u_h)^{t+1} must be provided by the mutation vector (v_h)^{t+1}. We use the crossover probability factor CR to decide which dimensions are provided by the mutation vector and which by the target vector. The crossover is defined as

  (u_{h,j})^{t+1} = \begin{cases} (v_{h,j})^{t+1}, & \text{if } rand(j) \le CR \text{ or } j = rand(h) \\ (w_{h,j})^t, & \text{if } rand(j) > CR \text{ and } j \ne rand(h) \end{cases}    (19)

where rand(j) is a uniformly distributed random number in [0, 1], 0 ≤ j < n + 1, n is the number of attributes, and rand(h) is a random integer in [1, n + 1], which ensures that at least one dimension of the m and p variables of the trial vector is provided by the variant vector (v_h)^{t+1}; otherwise no new individual could be generated, and the target vector (w_h)^t and the vector obtained by the crossover operation would be identical.
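The following Python sketch illustrates the mutation and crossover operators of Eqs. (17)-(19) on one antibody, treating the <m, p> pair as two parallel lists; clipping the mutated values back to their valid ranges is an assumption of the sketch, added because Eq. (15) requires m_j to be a positive integer and 0 < p_j ≤ 1.

```python
import random

def mutate(w_h, w_r1, w_r2, affinity_h):
    """Eqs. (17)-(18): differential-style mutation with adaptive factor F = 1 - affinity."""
    m_h, p_h = w_h
    m_r1, p_r1 = w_r1
    m_r2, p_r2 = w_r2
    F = 1.0 - affinity_h
    # Clip back to the feasible region of Eq. (15): m_j >= 1 (integer), 0 < p_j <= 1.
    m_v = [max(1, round(m_h[j] + F * (m_r1[j] - m_r2[j]))) for j in range(len(m_h))]
    p_v = [min(1.0, max(1e-6, p_h[j] + F * (p_r1[j] - p_r2[j]))) for j in range(len(p_h))]
    return m_v, p_v

def crossover(w_h, v_h, CR):
    """Eq. (19): each dimension comes from the mutant with probability CR;
    one forced dimension (rand(h)) guarantees the trial vector differs from the target."""
    m_h, p_h = w_h
    m_v, p_v = v_h
    dims = len(m_h)
    forced = random.randrange(dims)   # 0-indexed counterpart of rand(h) in [1, n+1]
    m_u, p_u = [], []
    for j in range(dims):
        if random.random() <= CR or j == forced:
            m_u.append(m_v[j]); p_u.append(p_v[j])
        else:
            m_u.append(m_h[j]); p_u.append(p_h[j])
    return m_u, p_u
```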
4) Antibody Population Update: To determine whether the crossover individual (u_h)^{t+1} can replace the target individual vector (w_h)^t as the new individual (w_h)^{t+1} of the (t+1)th generation, the AIS algorithm adopts a greedy search strategy. More specifically, the crossover individual (u_h)^{t+1} is chosen as the offspring if its affinity is better than that of the target individual (w_h)^t; otherwise, the individual (w_h)^t is kept in the (t+1)th generation. The system then chooses the individual (w_c)^{t+1} with the best affinity in the (t+1)th generation as the new memory antibody.

The evolutionary process for the population consists of steps 2 to 4 above. AIS repeats this process until the evolution exceeds the pre-set maximum number of generations MaxGen, or the same optimal result is obtained continuously for more than a given threshold number T of generations.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

A. Experimental Conditions

We conduct our experiments using WEKA [26] and validate the algorithm performance on 36 UCI benchmark datasets [16], which represent a wide range of domains and data characteristics and are described in Table II. In our experiments, we replace all missing attribute values using the unsupervised attribute filter ReplaceMissingValues in WEKA, and apply the unsupervised filter Discretize in WEKA to discretize numeric attributes into nominal attributes.

We now introduce the baseline algorithms and their abbreviations used in our experiments. MENB with the symbol Po denotes MENB using an (m, p) setting identified by our experimental analysis.

1. MENB-Eq9,10: NB with M-estimate (MENB) using the setting of m and p from the literature [13], i.e., Eq. (9) and Eq. (10).
2. MENB-Eq9,14: MENB using the setting of m and p from the literature [9], i.e., Eq. (9) and Eq. (14).

3. MENB-Eq11,12: MENB using the setting of m and p from the literature [14], i.e., Eq. (11) and Eq. (12).
4. MENB-Po[1,0.0001]: MENB with the setting (m = 1, p = 0.0001).
5. MENB-Po[1,0.05]: MENB with the setting (m = 1, p = 0.05).
6. MENB-Po[7,0.1]: MENB with the setting (m = 7, p = 0.1).
7. MENB-Po[8,0.1]: MENB with the setting (m = 8, p = 0.1).
8. LENB: NB with Laplace-estimate.
9. AISENB: Artificial Immune System based self-adaptive probability estimation (the proposed method).

In our experiments, the maximum evolution generation MaxGen is set to 100 and the size of the antibody population is set to 50. The clone rate c is set to 5%, and the crossover factor CR in the crossover process is set to 0.6. The threshold T is set to .

TABLE II
DETAILED INFORMATION OF EXPERIMENTAL DATA: DATASET, MISSING VALUES (Y/N), NUMERIC ATTRIBUTES (Y/N)

  anneal          Y  Y      credit-g        N  Y      kr-vs-kp        N  N      sick            Y  Y
  anneal.orig     Y  Y      diabetes        N  Y      labor           Y  Y      sonar           N  Y
  audiology       Y  N      glass           N  Y      letter          N  Y      soybean         Y  N
  autos           Y  Y      heart-c         Y  Y      lymph           N  Y      splice          N  N
  balance-scale   N  Y      heart-h         Y  Y      mushroom        Y  N      vehicle         N  Y
  breast-cancer   Y  N      heart-statlog   N  Y      primary-tumor   Y  N      vote            Y  N
  breast-w        Y  N      hepatitis       Y  Y      segment         N  Y      vowel           N  Y
  colic           Y  Y      hypothyroid     Y  Y                                waveform        N  Y
  colic.orig      Y  Y      ionosphere      N  Y                                zoo             N  Y
  credit-a        Y  Y      iris            N  Y

B. Cross-test for Parameter Setting

This part of the experiment evaluates the effect of using different m and p values (the parameter selection in M-estimate) on Naive Bayes classification. We compare algorithm performance based on Laplace-estimate and on M-estimate using Eq. (7) and Eq. (8), and carry out a cross-test over m values (m = 1, 2, ..., 8) and p values (ranging from 0.0001 to 0.5) via 10 runs of 10-fold cross validation. Figure 3 reports the average classification accuracy of MENB (with m and p values from the above ranges) over the entire set of 36 datasets. The results show that the best parameter setting over all 36 datasets is <m = 1, p = 0.05> (average accuracy 82.74%).

Fig. 3. The average accuracy (%) of LENB (82.16%) and of MENB with different m and p values on the 36 UCI benchmark datasets (m = 1, 2, ..., 8; p ranging from 0.0001 to 0.5).

While the above analysis shows that m = 1 and p = 0.05 result in good overall performance, this parameter setting may not guarantee the best performance on every individual dataset. In order to examine this hypothesis, we take <m = 1, p = 0.05> as one possible optimal parameter combination and carry out a further experimental analysis to find other optimal parameter combinations. We analyze the frequency with which each parameter value m, p, and each combination of the two, is optimal on each dataset; the result is shown in Table III. The frequency values in Table III show, for example, that the combination <m = 1, p = 0.0001> is optimal on four datasets. This result indicates that attribute dependencies and distributions differ as the dataset varies. According to Table III, the single parameter value with the maximum frequency is m = 1 (frequency 17) among the m values, and p = 0.1 (frequency 18) among the p values, so <m = 1, p = 0.1> is considered a second optimal combination. This experimentally determined setting coincides with the M-estimate setting used in [14]. <m = 1, p = 0.0001>, <m = 7, p = 0.1>, and <m = 8, p = 0.1> are regarded as further possible optimal parameter combinations, each with a frequency of 4.
On the basis of the discussion above, we identify four possible optimal parameter combinations: <m = 1, p = 0.0001>, <m = 7, p = 0.1>, <m = 8, p = 0.1>, and <m = 1, p = 0.05>.

TABLE III
FREQUENCY OF EACH OPTIMAL PARAMETER (m, p) COMBINATION OVER THE 36 UCI DATASETS
(rows: p values; columns: m = 1, ..., 8; the last column and last row give the marginal frequencies of each p and m value)

TABLE IV
EXPERIMENTAL RESULTS OF AISENB VERSUS NAIVE BAYES WITH LAPLACE-ESTIMATE (LENB), M-ESTIMATE USING THE (m, p) SETTINGS FROM THE LITERATURE [13] (MENB-Eq9,10), [9] (MENB-Eq9,14), [14] (MENB-Eq11,12), AND M-ESTIMATE USING THE (m, p) SETTINGS FROM OUR EXPERIMENTAL ANALYSIS (MENB-Po[1,0.05], MENB-Po[1,0.0001], MENB-Po[7,0.1], MENB-Po[8,0.1]): CLASSIFICATION ACCURACY AND STANDARD DEVIATION ON THE 36 DATASETS
(* indicates a statistically significant degradation relative to AISENB via 10 runs of 10-fold cross validation with a 95% confidence level using a t-test; the summary w/t/l counts against AISENB are 0/26/10 for LENB, 0/28/8 for MENB-Eq9,10, 0/30/6 for MENB-Eq9,14, 0/29/7 for MENB-Eq11,12, 0/28/8 for MENB-Po[1,0.05], 0/29/7 for MENB-Po[1,0.0001], 0/24/12 for MENB-Po[7,0.1], and 0/24/12 for MENB-Po[8,0.1])

C. Accuracy Analysis for MENB in Connection with LENB

The experimental results in Figure 3 show that different parameter settings in M-estimate can affect classifier performance. NB with M-estimate (MENB) overall performs better than NB with Laplace-estimate (LENB). Compared with LENB (accuracy 82.16%), MENB with (m = 1, p ≤ 0.1), (m = 2, p ≤ 0.1), (m = 3, p ≤ 0.1), and (m = 4, 0.01 ≤ p ≤ 0.1) all achieve higher average classification accuracy, and MENB with (m = 1, p = 0.05), at 82.74%, performs clearly better than the other parameter settings. The classification accuracy of MENB decreases as the m value grows: over the 36 datasets, for any given p value, the larger the m value, the lower the classification accuracy. In other words, MENB with m = 1 is optimal.
This is consistent with previous studies, where the m values in the literature [13], [9], and [14] are all set to 1.

D. The Accuracy of AISENB

In this section, we compare our AISENB based probability estimation method with the M-estimate scheme (using the m and p settings from the literature [13], [9], and [14]), M-estimate with the possible optimal parameter settings identified by our experimental analysis, and the Laplace-estimate scheme. Table IV reports the accuracies of AISENB, MENB, and LENB. For MENB and LENB, we use the m and p values reported in the literature. The average accuracy and standard deviation are summarized at the bottom of the table. A * symbol in Table IV indicates that the classification performance of the corresponding algorithm is statistically significantly lower than that of AISENB (at the 95% confidence level under a t-test). The w/t/l value in Table IV reports the number of times that the corresponding algorithm wins, ties, and loses over all 36 benchmark datasets, compared to AISENB.

The detailed results in Table IV show that the proposed AISENB method is highly competitive with the Naive Bayes classifier using Laplace-estimate, M-estimate with settings from the literature, and M-estimate with the optimal parameter settings from our experimental analysis. Several major findings can be highlighted as follows.

1. AISENB significantly outperforms LENB, with 10 wins and 0 losses. The average classification accuracy over the 36 datasets for AISENB (83.24%) is higher than that of LENB (82.16%).

2. AISENB outperforms MENB-Eq9,10 with 8 wins and 0 losses, MENB-Eq9,14 with 6 wins and 0 losses, and MENB-Eq11,12 with 7 wins and 0 losses. The average classification accuracy of AISENB (83.24%) is higher than that of MENB-Eq9,10 (82.60%), MENB-Eq9,14 (82.63%), and MENB-Eq11,12 (82.68%).

3. AISENB significantly outperforms MENB-Po[7,0.1] (12 wins and 0 losses), MENB-Po[8,0.1] (12 wins and 0 losses), and MENB-Po[1,0.05] (8 wins and 0 losses), and outperforms MENB-Po[1,0.0001] in accuracy (7 wins and 0 losses). The average classification accuracy of AISENB (83.24%) is higher than that of MENB-Po[7,0.1] (81.40%), MENB-Po[8,0.1] (81.22%), MENB-Po[1,0.05] (82.74%), and MENB-Po[1,0.0001] (82.61%).

Considering that AISENB has an adaptive probability estimation mechanism whereas MENB requires a number of parameter settings, AISENB is overall more effective and more stable than existing NB methods. It is also worth mentioning that existing research [28] has found strong attribute dependencies in the kr-vs-kp dataset; on this dataset our method AISENB achieves 91.10% accuracy, whereas the accuracy of the other methods is about 87.80%.

V. CONCLUSION

In this paper, we studied two typical probability estimation methods, Laplace-estimate (LENB) and M-estimate (MENB), for Naive Bayes classification. Our analysis shows that MENB performs better than LENB, but its accuracy is unstable with respect to different m and p parameter values. This unstable performance under different parameter settings motivated us to design a self-adaptive parameter selection algorithm for probability estimation in NB classification. To address this challenge, we proposed an artificial immune system (AIS) based method to adaptively estimate the probabilities, and validated the proposed design on 36 benchmark datasets. The experimental results and comparisons demonstrated that our method (AISENB) outperforms the state-of-the-art probability estimation algorithms and can indeed achieve optimal parameter selection for different datasets.

REFERENCES

[1] L. Jiang, H. Zhang and Z. Cai, A Novel Bayes Model: Hidden Naive Bayes, IEEE Transactions on Knowledge and Data Engineering, vol. 21.
[2] H.H. Shan and A. Banerjee, Mixed-membership naive Bayes models, Data Mining and Knowledge Discovery, vol. 23, pp. 1-62.
[3] S.B. Kim, K.S. Han, H.C. Rim and K.U Seoul, Some Effective Techniques for Naive Bayes Text Classification, IEEE Transactions on Knowledge and Data Engineering, vol. 18.
[4] C. Zhang, G.R. Xue, Y. Yu and H.Y. Zha, Web-scale classification with naive bayes, in Proceedings of the 18th International Conference on World Wide Web, WWW 09, ACM Press, USA, 2009.
[5] G.I. Webb, J.R. Boughton, F. Zheng, K.M. Ting and H. Salem, Learning by extrapolation from marginal to full-multivariate probability distributions: decreasingly naive Bayesian classification, Machine Learning, vol. 86.
[6] J. Duan, Z. Lin, W. Yi and M. Lu, Scaling Up the Accuracy of Bayesian Classifier Based on Frequent Itemsets by M-estimate, in Proceedings of Artificial Intelligence and Computational Intelligence, AICI 10, Springer Press, China, 2010.
[7] B. Cestnik, Estimating Probabilities: A Crucial Task in Machine Learning, in Proceedings of the 9th European Conference on Artificial Intelligence, ECAI 90, IOS Press, Sweden, 1990.
[8] T.M. Mitchell, Machine Learning, McGraw-Hill Publishers.
[9] L. Jiang, D. Wang and Z.
Cai, Scaling Up the Accuracy of Bayesian Network Classifiers by M-Estimate, in Proceedings of the 3rd International Conference on Intelligent Computing, ICIC 07, Springer Press, China, 2007, pp [10] N. Friedman, D. Geiger and M. Goldszmidt, Bayesian Network Classifiers, Machine Learning, vol. 29, pp , [11] G.I. Webb, J. Boughton and Z. Wang, Not So Naive Bayes: Aggregating One-Dependence Estimators, Machine Learning, vol. 58 pp. 5-24, [12] D. Lowd, P. Domingos, Naive Bayes models for probability estimation, in Proceeding of 22nd International Conference on Machine Learning, ICML 05, ACM Press, Bonn, Germany, 2005, pp [13] L. Jiang, H. Zhang, Z.H. Cai and D. Wang, Weighted Averaged One-Dependence Estimators, Journal of experimental and Theoretical Artificial Intelligence, vol. 24, pp , [14] B. Zadrozny and C. Elkan, Learning and making decisions when costs and probabilities are both unknown, in Proceedings of 7th ACM SIGKDD international conference on Knowledge discovery and data mining, SIGKDD 01, ACM Press, USA, 2001, pp [15] L.N. De Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach, Springer Verlag, [16] C. Merz, P. Murphy, and D. Aha, UCI repository of machine learning databases. in Dept of ICS, University of California, Irvine, [17] G.B. Huang, X.J. Ding and H.M. Zhou, Optimization method based extreme learning machine for classification, Neurocomputing, vol. 74, pp , [18] A. Watkins, J. Timmis, Artificial Immune Recognition System (AIRS): Revisions and Refinements, in: Proceedings of 1st International Conference on Artificial Immune Systems, ICARIS 20, Canterbury, England, 2002, pp [19] J. Yang, X.J. Liu, T. Li, G. Liang and S.J. Liu, Distributed agents model for intrusion detection based on AIS, Knowledge-Based Systems, vol. 22, pp , [20] R.H. Shang, L.C. Jiao, F. Liu and W.P. Ma, A Novel Immune Clonal Algorithm for MO Problems, IEEE Transactions on Evolutionary Computation, vol. 16, pp , [21] S. Ozsen and S. Gunes, Attribute weighting via genetic algorithms for attribute weighted artificial immune system (AWAIS) and its application to heart disease and liver disorders problems, Expert Systems with Applications, vol. 36, pp , [22] J.S. Yuan, L.W. Zhang, C.Z. Zhao, Z. Li and Y.H. Zhang, An Improved Self-organization Antibody Network for Pattern Recognition and Its Performance Study, Pattern Recognition, vol. 321, pp , [23] L. de Mello Honorio, A.M.L. da Silva and D.A. Barbosa, A Cluster and Gradient-Based Artificial Immune System Applied in Optimization Scenarios, IEEE Transactions on Evolutionary Computation, vol. 16, pp , [24] K.M. Woldemariam, Vaccine-Enhanced Artificial Immune System for Multimodal Function Optimization, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 40, pp , [25] Y.F. Zhong and L.P. Zhang, An Adaptive Artificial Immune Network for Supervised Classification of Multi-/Hyperspectral Remote Sensing Imagery, IEEE Transactions on Geoscience and remote sensing, vol. 50, pp , [26] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques (2nd ed.), San Francisco, CA: Morgan Kaufmann, [27] C. Nadeau and Y. Bengio, Inference for the generalization error, Machine Learning, vol. 52, pp , [28] R. Kohavi, Scaling Up the Accuracy of Naive-Bayes Classifiers:A Decision-Tree Hybrid, in Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining, KDD 96, AAAI Press, USA, 1996, pp


More information

Keywords: Multimode process monitoring, Joint probability, Weighted probabilistic PCA, Coefficient of variation.

Keywords: Multimode process monitoring, Joint probability, Weighted probabilistic PCA, Coefficient of variation. 2016 International Conference on rtificial Intelligence: Techniques and pplications (IT 2016) ISBN: 978-1-60595-389-2 Joint Probability Density and Weighted Probabilistic PC Based on Coefficient of Variation

More information

Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees

Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University Grudzi adzka 5, 87-100 Toruń, Poland

More information

Feature Selection with Fuzzy Decision Reducts

Feature Selection with Fuzzy Decision Reducts Feature Selection with Fuzzy Decision Reducts Chris Cornelis 1, Germán Hurtado Martín 1,2, Richard Jensen 3, and Dominik Ślȩzak4 1 Dept. of Mathematics and Computer Science, Ghent University, Gent, Belgium

More information

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data

More information

Neural Network Construction using Grammatical Evolution

Neural Network Construction using Grammatical Evolution 2005 IEEE International Symposium on Signal Processing and Information Technology Neural Network Construction using Grammatical Evolution Ioannis G. Tsoulos (1), Dimitris Gavrilis (2), Euripidis Glavas

More information

Using HDDT to avoid instances propagation in unbalanced and evolving data streams

Using HDDT to avoid instances propagation in unbalanced and evolving data streams Using HDDT to avoid instances propagation in unbalanced and evolving data streams IJCNN 2014 Andrea Dal Pozzolo, Reid Johnson, Olivier Caelen, Serge Waterschoot, Nitesh V Chawla and Gianluca Bontempi 07/07/2014

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

Pairwise Naive Bayes Classifier

Pairwise Naive Bayes Classifier LWA 2006 Pairwise Naive Bayes Classifier Jan-Nikolas Sulzmann Technische Universität Darmstadt D-64289, Darmstadt, Germany sulzmann@ke.informatik.tu-darmstadt.de Abstract Class binarizations are effective

More information

Top-k Parametrized Boost

Top-k Parametrized Boost Top-k Parametrized Boost Turki Turki 1,4, Muhammad Amimul Ihsan 2, Nouf Turki 3, Jie Zhang 4, Usman Roshan 4 1 King Abdulaziz University P.O. Box 80221, Jeddah 21589, Saudi Arabia tturki@kau.edu.sa 2 Department

More information

Machine Learning Lecture 5

Machine Learning Lecture 5 Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

More information

Bayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington

Bayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington Bayesian Classifiers and Probability Estimation Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington 1 Data Space Suppose that we have a classification problem The

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Data Mining Classification: Basic Concepts and Techniques. Lecture Notes for Chapter 3. Introduction to Data Mining, 2nd Edition

Data Mining Classification: Basic Concepts and Techniques. Lecture Notes for Chapter 3. Introduction to Data Mining, 2nd Edition Data Mining Classification: Basic Concepts and Techniques Lecture Notes for Chapter 3 by Tan, Steinbach, Karpatne, Kumar 1 Classification: Definition Given a collection of records (training set ) Each

More information

A Mixed Strategy for Evolutionary Programming Based on Local Fitness Landscape

A Mixed Strategy for Evolutionary Programming Based on Local Fitness Landscape WCCI 200 IEEE World Congress on Computational Intelligence July, 8-23, 200 - CCIB, Barcelona, Spain CEC IEEE A Mixed Strategy for Evolutionary Programming Based on Local Fitness Landscape Liang Shen and

More information

Online Estimation of Discrete Densities using Classifier Chains

Online Estimation of Discrete Densities using Classifier Chains Online Estimation of Discrete Densities using Classifier Chains Michael Geilke 1 and Eibe Frank 2 and Stefan Kramer 1 1 Johannes Gutenberg-Universtität Mainz, Germany {geilke,kramer}@informatik.uni-mainz.de

More information

Weight Initialization Methods for Multilayer Feedforward. 1

Weight Initialization Methods for Multilayer Feedforward. 1 Weight Initialization Methods for Multilayer Feedforward. 1 Mercedes Fernández-Redondo - Carlos Hernández-Espinosa. Universidad Jaume I, Campus de Riu Sec, Edificio TI, Departamento de Informática, 12080

More information

OVERLAPPING ANIMAL SOUND CLASSIFICATION USING SPARSE REPRESENTATION

OVERLAPPING ANIMAL SOUND CLASSIFICATION USING SPARSE REPRESENTATION OVERLAPPING ANIMAL SOUND CLASSIFICATION USING SPARSE REPRESENTATION Na Lin, Haixin Sun Xiamen University Key Laboratory of Underwater Acoustic Communication and Marine Information Technology, Ministry

More information

Uwe Aickelin and Qi Chen, School of Computer Science and IT, University of Nottingham, NG8 1BB, UK {uxa,

Uwe Aickelin and Qi Chen, School of Computer Science and IT, University of Nottingham, NG8 1BB, UK {uxa, On Affinity Measures for Artificial Immune System Movie Recommenders Proceedings RASC-2004, The 5th International Conference on: Recent Advances in Soft Computing, Nottingham, UK, 2004. Uwe Aickelin and

More information

Blind Source Separation Using Artificial immune system

Blind Source Separation Using Artificial immune system American Journal of Engineering Research (AJER) e-issn : 2320-0847 p-issn : 2320-0936 Volume-03, Issue-02, pp-240-247 www.ajer.org Research Paper Open Access Blind Source Separation Using Artificial immune

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Support Vector Machine via Nonlinear Rescaling Method

Support Vector Machine via Nonlinear Rescaling Method Manuscript Click here to download Manuscript: svm-nrm_3.tex Support Vector Machine via Nonlinear Rescaling Method Roman Polyak Department of SEOR and Department of Mathematical Sciences George Mason University

More information

P leiades: Subspace Clustering and Evaluation

P leiades: Subspace Clustering and Evaluation P leiades: Subspace Clustering and Evaluation Ira Assent, Emmanuel Müller, Ralph Krieger, Timm Jansen, and Thomas Seidl Data management and exploration group, RWTH Aachen University, Germany {assent,mueller,krieger,jansen,seidl}@cs.rwth-aachen.de

More information

Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm

Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm Zhenqiu Liu, Dechang Chen 2 Department of Computer Science Wayne State University, Market Street, Frederick, MD 273,

More information

Ensemble Pruning via Individual Contribution Ordering

Ensemble Pruning via Individual Contribution Ordering Ensemble Pruning via Individual Contribution Ordering Zhenyu Lu, Xindong Wu +, Xingquan Zhu @, Josh Bongard Department of Computer Science, University of Vermont, Burlington, VT 05401, USA + School of

More information

Class Prior Estimation from Positive and Unlabeled Data

Class Prior Estimation from Positive and Unlabeled Data IEICE Transactions on Information and Systems, vol.e97-d, no.5, pp.1358 1362, 2014. 1 Class Prior Estimation from Positive and Unlabeled Data Marthinus Christoffel du Plessis Tokyo Institute of Technology,

More information

Machine Learning 2010

Machine Learning 2010 Machine Learning 2010 Concept Learning: The Logical Approach Michael M Richter Email: mrichter@ucalgary.ca 1 - Part 1 Basic Concepts and Representation Languages 2 - Why Concept Learning? Concepts describe

More information

Exact model averaging with naive Bayesian classifiers

Exact model averaging with naive Bayesian classifiers Exact model averaging with naive Bayesian classifiers Denver Dash ddash@sispittedu Decision Systems Laboratory, Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15213 USA Gregory F

More information

Ensemble determination using the TOPSIS decision support system in multi-objective evolutionary neural network classifiers

Ensemble determination using the TOPSIS decision support system in multi-objective evolutionary neural network classifiers Ensemble determination using the TOPSIS decision support system in multi-obective evolutionary neural network classifiers M. Cruz-Ramírez, J.C. Fernández, J. Sánchez-Monedero, F. Fernández-Navarro, C.

More information

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

More information

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Sánchez 18th September 2013 Detecting Anom and Exc Behaviour on

More information

On Improving the k-means Algorithm to Classify Unclassified Patterns

On Improving the k-means Algorithm to Classify Unclassified Patterns On Improving the k-means Algorithm to Classify Unclassified Patterns Mohamed M. Rizk 1, Safar Mohamed Safar Alghamdi 2 1 Mathematics & Statistics Department, Faculty of Science, Taif University, Taif,

More information

Necessary Intransitive Likelihood-Ratio Classifiers. Gang Ji, Jeff Bilmes

Necessary Intransitive Likelihood-Ratio Classifiers. Gang Ji, Jeff Bilmes Necessary Intransitive Likelihood-Ratio Classifiers Gang Ji, Jeff Bilmes {gang,bilmes}@ee.washington.edu Dept of EE, University of Washington Seattle WA, 9895-2500 UW Electrical Engineering UWEE Technical

More information

Microarray Data Analysis: Discovery

Microarray Data Analysis: Discovery Microarray Data Analysis: Discovery Lecture 5 Classification Classification vs. Clustering Classification: Goal: Placing objects (e.g. genes) into meaningful classes Supervised Clustering: Goal: Discover

More information

Supervised locally linear embedding

Supervised locally linear embedding Supervised locally linear embedding Dick de Ridder 1, Olga Kouropteva 2, Oleg Okun 2, Matti Pietikäinen 2 and Robert P.W. Duin 1 1 Pattern Recognition Group, Department of Imaging Science and Technology,

More information

Pattern-Based Decision Tree Construction

Pattern-Based Decision Tree Construction Pattern-Based Decision Tree Construction Dominique Gay, Nazha Selmaoui ERIM - University of New Caledonia BP R4 F-98851 Nouméa cedex, France {dominique.gay, nazha.selmaoui}@univ-nc.nc Jean-François Boulicaut

More information

Classification Based on Logical Concept Analysis

Classification Based on Logical Concept Analysis Classification Based on Logical Concept Analysis Yan Zhao and Yiyu Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 E-mail: {yanzhao, yyao}@cs.uregina.ca Abstract.

More information

Application of a GA/Bayesian Filter-Wrapper Feature Selection Method to Classification of Clinical Depression from Speech Data

Application of a GA/Bayesian Filter-Wrapper Feature Selection Method to Classification of Clinical Depression from Speech Data Application of a GA/Bayesian Filter-Wrapper Feature Selection Method to Classification of Clinical Depression from Speech Data Juan Torres 1, Ashraf Saad 2, Elliot Moore 1 1 School of Electrical and Computer

More information

Fast heterogeneous boosting

Fast heterogeneous boosting Fast heterogeneous boosting Norbert Jankowski Department of Informatics, Nicolaus Copernicus University, Poland norbert@is.umk.pl Abstract he main goal of this paper is introduction of fast heterogeneous

More information

An Improved 1-norm SVM for Simultaneous Classification and Variable Selection

An Improved 1-norm SVM for Simultaneous Classification and Variable Selection An Improved 1-norm SVM for Simultaneous Classification and Variable Selection Hui Zou School of Statistics University of Minnesota Minneapolis, MN 55455 hzou@stat.umn.edu Abstract We propose a novel extension

More information

Stephen Scott.

Stephen Scott. 1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training

More information

Cluster Kernels for Semi-Supervised Learning

Cluster Kernels for Semi-Supervised Learning Cluster Kernels for Semi-Supervised Learning Olivier Chapelle, Jason Weston, Bernhard Scholkopf Max Planck Institute for Biological Cybernetics, 72076 Tiibingen, Germany {first. last} @tuebingen.mpg.de

More information

A Posteriori Corrections to Classification Methods.

A Posteriori Corrections to Classification Methods. A Posteriori Corrections to Classification Methods. Włodzisław Duch and Łukasz Itert Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland; http://www.phys.uni.torun.pl/kmk

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information

Sparse Support Vector Machines by Kernel Discriminant Analysis

Sparse Support Vector Machines by Kernel Discriminant Analysis Sparse Support Vector Machines by Kernel Discriminant Analysis Kazuki Iwamura and Shigeo Abe Kobe University - Graduate School of Engineering Kobe, Japan Abstract. We discuss sparse support vector machines

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

A FUZZY NEURAL NETWORK MODEL FOR FORECASTING STOCK PRICE

A FUZZY NEURAL NETWORK MODEL FOR FORECASTING STOCK PRICE A FUZZY NEURAL NETWORK MODEL FOR FORECASTING STOCK PRICE Li Sheng Institute of intelligent information engineering Zheiang University Hangzhou, 3007, P. R. China ABSTRACT In this paper, a neural network-driven

More information

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems c World Scientific Publishing Company

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems c World Scientific Publishing Company International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems c World Scientific Publishing Company UNSUPERVISED LEARNING OF BAYESIAN NETWORKS VIA ESTIMATION OF DISTRIBUTION ALGORITHMS: AN

More information

Data Mining Part 4. Prediction

Data Mining Part 4. Prediction Data Mining Part 4. Prediction 4.3. Fall 2009 Instructor: Dr. Masoud Yaghini Outline Introduction Bayes Theorem Naïve References Introduction Bayesian classifiers A statistical classifiers Introduction

More information

DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA

DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA 315 C. K. Lowe-Ma, A. E. Chen, D. Scholl Physical & Environmental Sciences, Research and Advanced Engineering Ford Motor Company, Dearborn, Michigan, USA

More information

Multivariate statistical methods and data mining in particle physics

Multivariate statistical methods and data mining in particle physics Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general

More information