Target Classification Using Knowledge-Based Probabilistic Model

14th International Conference on Information Fusion, Chicago, Illinois, USA, July 5-8, 2011

Target Classification Using Knowledge-Based Probabilistic Model

Wenyin Tang 1, K. Z. Mao 1,*, Lee Onn Mak 2, Gee Wah Ng 2, Zhaoyang Sun 1, Ji Hua Ang 2 and Godfrey Lim 2
1 School of EEE, Nanyang Technological University, Singapore
2 DSO National Laboratories, Singapore
* Corresponding author, ekzmao@ntu.edu.sg

Abstract: In past decades, pattern classification has been intensively explored in machine learning. With the in-depth exploration of machine learning in various applications, new challenges arise, which require researchers to move from data-driven to domain-driven models by integrating domain knowledge, and from static to dynamic models that adapt to a changing environment. This paper proposes an intelligent classification system with the following features to address these challenges. Firstly, the system integrates both data association and classification modules: the contextual information extracted from input data is saved as learnt knowledge, which is then combined with the given expert knowledge in classification. The experimental study shows that this learning process helps to reduce the ambiguity of classification. Secondly, the proposed classifier, i.e. the knowledge-based naive Bayes, classifies incoming data based on both expert knowledge and learnt knowledge. Thirdly, a soft-decision mechanism is adopted in the classification algorithm, which can effectively handle overlapping data.

Keywords: target classification, data association, soft decision.

I. INTRODUCTION

As a highly active area of machine learning, pattern classification has been extensively explored [1], [2]. Successful classification algorithms such as naive Bayes, k-nearest neighbor, artificial neural networks, support vector machines, etc., have been applied in various fields including biomedical engineering, image processing, data mining and defense technology. Although classification algorithms more or less emulate the way humans classify perceptual objects or patterns, most of them focus on modeling single functions of human intelligence under certain assumptions. Moreover, most classifiers are designed in a data-driven manner, and the classification performance largely relies on the quality of the given training data. This is a non-trivial problem, because in many applications a good training data set is not readily available.

With the in-depth exploration of machine learning methods in various applications, new challenges arise, which require researchers to move (i) from data-driven to domain-driven models by understanding and integrating domain knowledge [3]; and (ii) from static to dynamic models capable of adapting to a changing environment (an issue referred to as situation awareness in information fusion [4]). These new challenges motivate us to design a target classification system that can make good use of domain knowledge and learn from the environment at the same time.

The proposed target classification system is an intelligent system that integrates both data association and classification modules. The input data is first transferred to data association, where the data is associated with the historical clusters in the entries list. Novel data that is excluded from all existing clusters (based on a predefined threshold) is recorded and input into the classification module, while non-novel data is associated with its nearest cluster. From the entries list, contextual information is extracted from recent historical data and saved as learnt knowledge for further use.
From the experimental study, we find that integrating this contextual information with the given expert knowledge helps to reduce the ambiguity of classification and therefore improves the overall classification performance.

The proposed classification algorithm is knowledge-driven, starting from the prior knowledge stored in the known target library (KTL). The KTL contains expert knowledge of target characteristics, obtained either from a long period of data collection or provided by manufacturers [5]. This kind of model is extremely useful when prior expert knowledge, rather than training data, is available. Classical models include prototype models, rule/boundary models, and hybrid models (see, for example, chapter 2 of [6]). In this paper, we propose a probabilistic model, called the knowledge-based naive Bayes model, which is capable of integrating both prior knowledge and knowledge learnt from incoming data.

Our system employs a soft decision mechanism. The principle behind it, based on studies in cognitive psychology, is that identifying alternative hypotheses improves human problem-solving skills more than fixing on a single hypothesis [7]. In classification problems, our previous work [5] found that soft decision is also important for dealing with overlapping data, which is pervasive across different applications.

The rest of this paper is organized as follows. Section II introduces the proposed knowledge-based target classification system. The experimental study is presented in Section III. Section IV concludes the paper.

II. KNOWLEDGE-BASED TARGET CLASSIFICATION SYSTEM

A. System Overview

Figure 1 shows the flowchart of the proposed knowledge-based target classification system, where the raw data collected from multiple sensors is aligned to form a track, which is then transferred to the data association module.

Figure 1. Knowledge-based target classification system: sensors provide raw data, which is aligned into tracks, passed to data association (maintaining the entries list, EL) and then to classification using the known target library (KTL) and learnt knowledge (LK), producing the output data and predictions.

The data association module recruits new samples into the entries list (EL) incrementally: a novel track that has not been seen before is recorded, and its predictive result is retrieved based on the knowledge stored in the known target library (KTL) and the learnt knowledge (LK); a non-novel track is instead used to update the nearest existing entry. The KTL contains target/class characteristics obtained either from a long period of data collection or provided by manufacturers [5]. The characteristics are represented over a common set of attributes, and each class is described by a conjunction of specific attribute values or value ranges. In the feature space, these characteristics define a specific region for each of the known target classes. Figure 2 shows a snapshot of a KTL with two classes represented by two continuous attributes {X_1, X_2} (left panel), and the corresponding class regions in the feature space spanned by X_1 and X_2 (right panel).

Figure 2. Snapshot of a known target library (left) and the corresponding classes defined in the feature space (right).

Figure 2 shows a simple example of a known target library. A practical known target library is much more complicated, containing a large number of classes and attributes, as well as many overlaps between different classes. In the experimental study, the KTL contains 166 target classes (as will be shown in Figure 8). In the entries list, the newly generated inputs are used to update the learnt knowledge (LK) as contextual information for further classification; this point is detailed in Section II-C1. The overall performance of the system mainly depends on the accuracy of the classification models embedded in it. Next, we focus on knowledge-based classification models.

B. Related Classification Models

1) Production Rule-based Classifier: The production rule-based classifier uses a simple representation in which each rule consists of a conjunction of constraints on the attributes. For the example in Figure 2, two rules can be generated from the known target library:

Rule 1: IF 0.1 ≤ X_1 ≤ ... AND 0.4 ≤ X_2 ≤ ... THEN Class ω_1
Rule 2: IF 0.4 ≤ X_1 ≤ ... AND 0.6 ≤ X_2 ≤ ... THEN Class ω_2

If an incoming data sample x_i satisfies all the constraints of Rule k, it is classified to the corresponding class ω_k. In the feature space shown in Figure 2, each rule defines a region bounded by a hyper-rectangle, or box. Any data located inside a box can be identified directly. Production rules interpret the concepts of target classes clearly, but the method has several intrinsic limitations. Firstly, production rules are too strict to tolerate noisy data: data that floats out of its defined box due to noise may go unrecognized, even though it may still be very close to the box. Secondly, for data located in the overlapping region of two or more classes, production rules are unable to tell which class is more likely to be the true target class.
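To make the rule-matching step concrete, the following minimal sketch (not the authors' implementation) encodes a KTL as per-class attribute value ranges and returns every class whose box contains an input; the KTL layout, function names and numeric ranges are illustrative assumptions only.

```python
# Illustrative sketch: a production rule-based classifier built from a known
# target library (KTL). Assumed KTL format: each class maps to one (low, high)
# value range per attribute.

from typing import Dict, List, Tuple

KTL = Dict[str, List[Tuple[float, float]]]  # class name -> value range per attribute

def rule_based_classify(x: List[float], ktl: KTL) -> List[str]:
    """Return every class whose hyper-rectangle (box) contains x.

    The result may be empty (x falls outside all boxes, e.g. due to noise) or
    contain several classes (x lies in an overlapping region).
    """
    matches = []
    for cls, ranges in ktl.items():
        if all(lo <= xj <= hi for xj, (lo, hi) in zip(x, ranges)):
            matches.append(cls)
    return matches

# Hypothetical two-class KTL over attributes X1, X2 (values for illustration only)
ktl = {
    "target1": [(0.1, 0.5), (0.4, 0.7)],
    "target2": [(0.4, 0.8), (0.6, 0.9)],
}
print(rule_based_classify([0.45, 0.65], ktl))  # both classes: point lies in the overlap
print(rule_based_classify([0.05, 0.50], ktl))  # empty list: slightly outside every box
```

An empty result corresponds to the first limitation above (noisy data falling outside every box), and a multi-class result to the second (an overlap that the rules cannot resolve).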
2) Distance-based Classifier: A more flexible approach is to classify a data sample based on its distances to the different classes. The data is assigned to the nearest class, or to several classes if they are at the same distance from the data. In this way a data sample is assigned to its most likely class even if it lies outside the defined box. There are two options for calculating distances: i) distance to the class centroid (prototype model):

$$d_c(x_i,\omega_k) = \sqrt{\sum_{j=1}^{m} (x_{ij} - \mu_{kj})^2} \qquad (1)$$

where x_ij is the value of the j-th attribute of input x_i and μ_kj is the center point of the value range V_jk of class ω_k on attribute X_j; and ii) distance to the class boundary (boundary model):

$$d_b(x_i,\omega_k) = \begin{cases} 0, & \text{if } x_i \text{ is inside class } \omega_k \\ \sqrt{\sum_{j=1}^{m} \left[\min(x_{ij} - V_{kj})\right]^2}, & \text{otherwise} \end{cases} \qquad (2)$$

where min(x_ij − V_kj) denotes the minimum distance from x_ij to the value range V_kj.
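The two distance measures can be sketched as follows; this is an illustrative reading of Eqns (1) and (2) under the same assumed KTL layout as the previous sketch, not the authors' code.

```python
# Illustrative sketch of the two distance-based classifiers of Section II-B2.
# KTL layout (per-class (low, high) ranges per attribute) is an assumption.

import math
from typing import Dict, List, Tuple

KTL = Dict[str, List[Tuple[float, float]]]

def centroid_distance(x: List[float], ranges: List[Tuple[float, float]]) -> float:
    """Eqn (1): Euclidean distance from x to the centre of the class box."""
    return math.sqrt(sum((xj - (lo + hi) / 2.0) ** 2
                         for xj, (lo, hi) in zip(x, ranges)))

def boundary_distance(x: List[float], ranges: List[Tuple[float, float]]) -> float:
    """Eqn (2): zero inside the box, otherwise distance to the nearest boundary."""
    sq = 0.0
    for xj, (lo, hi) in zip(x, ranges):
        if xj < lo:
            sq += (lo - xj) ** 2
        elif xj > hi:
            sq += (xj - hi) ** 2
        # contribution is zero when xj lies inside [lo, hi]
    return math.sqrt(sq)

def nearest_class(x: List[float], ktl: KTL, dist) -> List[str]:
    """Assign x to the class(es) at minimum distance (ties are all kept)."""
    dists = {cls: dist(x, ranges) for cls, ranges in ktl.items()}
    dmin = min(dists.values())
    return [cls for cls, d in dists.items() if d == dmin]

# Usage with either measure, e.g. nearest_class(x, ktl, boundary_distance)
```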

Figure 3. Centroid distance (left) and boundary distance (right) to target classes.

In the centroid distance-based classifier, a class is represented by the center point (shown as + in Figure 3) of a hyper-rectangle, or box, in the feature space. As a result, the boundary information is lost. Some classes, say class 1, may have a bigger spread on one attribute than other classes, say class 2. Data that belongs to class 1 may then be misclassified to a nearby class, as shown in Figure 4.

Figure 4. The problem of centroid distance: the data will be misclassified to class 2 because it is nearer to the centroid of class 2.

The boundary distance-based classifier will not misclassify data lying inside the class boundary as in Figure 4; however, it is more sensitive to irrelevant attributes. Consider, for example, data represented by two relevant attributes X_1 and X_2 and an irrelevant attribute X_3. When the unknown data perfectly satisfies the constraints of class k on X_1 and X_2, i.e. has zero boundary distance on these two dimensions, the final boundary distance of the data to class ω_k depends entirely on the boundary distance on attribute X_3, which may be highly inaccurate. Secondly, the boundary distance-based method cannot differentiate unknown data lying in the overlapping region of two or more classes, as shown in Figure 5.

Figure 5. The problem of boundary distance: which class is more likely to be the true target class if the boundary distances are the same?

C. A Knowledge-Based Probabilistic Model for Classification

A probabilistic model for classification involves approximating an unknown target function f: X → Y by estimating the posterior probability P(Y|X), where Y is a random variable representing the class label and X is a vector containing n features or attributes, i.e. X = [X_1, X_2, ..., X_n]. Based on Bayes rule, the posterior probability P(Y = ω_k | X = x_i), or P(ω_k | x_i), can be written as:

$$P(\omega_k \mid x_i) = \frac{P(x_i \mid \omega_k)\,P(\omega_k)}{\sum_{k'} P(x_i \mid \omega_{k'})\,P(\omega_{k'})} \qquad (3)$$

For the input x_i, applying MAP (maximum a posteriori) estimation yields:

$$y_i = \omega_k \quad \text{if } P(\omega_k \mid x_i) > P(\omega_l \mid x_i) \text{ for all } l \neq k. \qquad (4)$$

Bayes rule is the basis for various Bayes models; the difference among them lies in how P(X|ω_k) (the conditional probabilities) and P(ω_k) (the priors) are estimated. It is well acknowledged that directly estimating P(X|ω_k) in the n-dimensional feature space is not always practical, even for a small n, because accurate probability estimation requires a large number of training samples to maintain statistical significance [1]. Naive Bayes [8], [9] simplifies the estimation of P(X|Y) by making a conditional independence assumption, which reduces the number of parameters from the original O(2^n) to O(n) [1]. Under the assumption of conditional attribute independence, naive Bayes has:

$$P(X \mid \omega_k) = P(X_1, X_2, \ldots, X_n \mid \omega_k) = \prod_{j=1}^{n} P(X_j \mid \omega_k) \qquad (5)$$

Substituting Eq. (5) into Eqs. (3) and (4) yields the following NB classifier:

$$y_i = \arg\max_{\omega_k} P(\omega_k) \prod_{j=1}^{n} P(x_{ij} \mid \omega_k) \qquad (6)$$

In the literature, the choice of probability estimation method usually depends on the attribute types, i.e. whether the attributes are discrete, continuous or mixed. For discrete-valued training data, the probabilities are estimated by maximum likelihood estimation (MLE), i.e. by counting. In the case of continuous inputs, either the attributes are discretized [10], or P(x_ij | ω_k) is estimated on the continuous feature X_j directly.
The parametric approach assumes that, for each possible class ω_k, the distribution of each continuous attribute X_j is Gaussian, and estimates the parameters of each of these Gaussians from the training data. Nonparametric methods such as the Parzen window can also be used to estimate the conditional probability of the input X given Y [2]. Due to the independence assumption, the NB classifier is simple and can be trained very efficiently. Despite its simplicity, the naive Bayes classifier can often outperform more sophisticated classification methods [9], [11], [12]. The aforementioned probability estimation methods work in a supervised setting where a set of training data is provided.
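For concreteness, a minimal sketch of such a supervised Gaussian naive Bayes baseline (Eqns (3)-(6) with Gaussian class-conditionals fitted by maximum likelihood) is given below. It is a generic textbook construction, not part of the proposed system, and the class and method names are our own.

```python
# Minimal sketch of a standard supervised Gaussian naive Bayes classifier:
# each P(X_j | w_k) is a Gaussian whose mean and variance are fitted from
# labelled training data; the MAP decision of Eqn (6) is taken in log space.

import numpy as np

class GaussianNB:
    def fit(self, X: np.ndarray, y: np.ndarray) -> "GaussianNB":
        self.classes_ = np.unique(y)
        self.mu_, self.var_, self.prior_ = {}, {}, {}
        for k in self.classes_:
            Xk = X[y == k]
            self.mu_[k] = Xk.mean(axis=0)
            self.var_[k] = Xk.var(axis=0) + 1e-9   # avoid zero variance
            self.prior_[k] = len(Xk) / len(X)      # MLE prior: class frequency
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # MAP decision (Eqn 6), computed in log space for numerical stability
        scores = []
        for k in self.classes_:
            log_lik = -0.5 * (np.log(2 * np.pi * self.var_[k])
                              + (X - self.mu_[k]) ** 2 / self.var_[k]).sum(axis=1)
            scores.append(np.log(self.prior_[k]) + log_lik)
        return self.classes_[np.argmax(np.stack(scores, axis=1), axis=1)]
```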

In some applications, expert knowledge of target characteristics may be available instead of training data. Next, a new knowledge-based naive Bayes classifier using the known target library is introduced.

1) Knowledge-Based Naive Bayes (NB) Classifier: The knowledge-based NB classifier is concerned with estimating the conditional probabilities P(X_j | ω_k) and the prior probabilities P(ω_k) from the given expert knowledge. First and foremost, we need to represent the knowledge as probability distributions. Given the class descriptions in the KTL, the conditional probability P(X_j | ω_k) can be defined as:

$$P(X_j \mid \omega_k) = \eta_{jk}\, f(X_j \mid \omega_k) \qquad (7)$$

where η_jk is a normalization factor ensuring $\int_{-\infty}^{+\infty} P(X_j \mid \omega_k)\,dX_j = 1$ and f(X_j | ω_k) is a representation function, which can be defined as in Eqn (8):

$$f(X_j \mid \omega_k) = \begin{cases} 1, & X_j \in V_{jk} \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$

where V_jk is the value range of class k on attribute X_j. The conditional probability of X given ω_k is then estimated by:

$$\hat{P}(X \mid \omega_k) = \prod_{j=1}^{n} \eta_{jk}\, f(X_j \mid \omega_k) \qquad (9)$$

If the function f(·) is defined as in Eqn (8), the knowledge-based NB classifier is equivalent to the production rule-based classifier of Section II-B1, and the problems of the rule-based model are inherited by the probabilistic model. To deal with these problems, an ideal function f(·) should generalize well and provide a probability measure for any unknown input. Therefore, we modify the 0-1 function of Eqn (8) by attaching symmetrical right and left tails to it. The modified function is given in Eqn (10) and shown in Figure 6(b):

$$f(X_j \mid \omega_k) = \begin{cases} 1, & X_j \in V_{jk} \\ \exp\!\left(-\dfrac{(X_j - \min(V_{jk}))^2}{2(\alpha\, r_{jk})^2}\right), & X_j < \min(V_{jk}) \\ \exp\!\left(-\dfrac{(X_j - \max(V_{jk}))^2}{2(\alpha\, r_{jk})^2}\right), & X_j > \max(V_{jk}) \end{cases} \qquad (10)$$

where max(V_jk) and min(V_jk) are the maximum and minimum values of V_jk, r_jk is the radius of V_jk (i.e. r_jk = (max(V_jk) − min(V_jk))/2), and α is a parameter controlling the decreasing rate of the tails. The setting of α depends on the degree of matching between the real distribution and the given region of the target class. In our experimental study, we set α = 0.1.

Figure 6. Two knowledge representation functions on an attribute value range from 0.4 to 0.6: (a) the 0-1 function of Eqn (8), equivalent to the rule-based classifier; (b) the improved function of Eqn (10).
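A minimal sketch of the representation function of Eqn (10) and the resulting per-class conditional score of Eqn (9) is given below; the normalization factors η_jk of Eqn (7) are omitted for brevity, the value ranges are assumed non-degenerate, and all names are our own rather than the authors' code.

```python
# Sketch of the knowledge representation function of Eqn (10): a flat top over
# the KTL value range V_jk with symmetric Gaussian tails controlled by alpha
# (the paper uses alpha = 0.1).

import math

def f_knowledge(xj: float, v_lo: float, v_hi: float, alpha: float = 0.1) -> float:
    """Unnormalised 'probability' of attribute value xj given class k.

    Assumes a non-degenerate range (v_hi > v_lo).
    """
    r = (v_hi - v_lo) / 2.0            # radius r_jk of the value range V_jk
    if v_lo <= xj <= v_hi:             # inside the expert-defined range
        return 1.0
    if xj < v_lo:                      # left tail
        return math.exp(-((xj - v_lo) ** 2) / (2.0 * (alpha * r) ** 2))
    return math.exp(-((xj - v_hi) ** 2) / (2.0 * (alpha * r) ** 2))  # right tail

def conditional_score(x, ranges, alpha: float = 0.1) -> float:
    """Eqn (9) up to the normalising factors eta_jk: product over attributes."""
    score = 1.0
    for xj, (lo, hi) in zip(x, ranges):
        score *= f_knowledge(xj, lo, hi, alpha)
    return score
```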

For the prior probability estimation, traditional NB classifiers usually calculate P(ω_k) from the training set D by maximum likelihood estimation, i.e. $\hat{P}(\omega_k) = \frac{|D\{Y=\omega_k\}|}{|D|}$, where |·| counts the number of samples in a set. For the knowledge-based NB classifier, where a training set is unavailable, we estimate the prior probability from the recent historical testing set H, using its predicted class labels as pseudo-feedback. The prior probability can therefore be defined as the percentage of the historical testing data classified to ω_k, i.e.

$$\hat{P}(\omega_k, x_i \mid H = \{x_{i-h}, \ldots, x_{i-1}\}) = \frac{|H\{Y=\omega_k\}|}{|H|} \qquad (11)$$

The idea is illustrated in Figure 7: if the recent input data are continuously predicted as class 1, for example, then a current input located in the overlapping region of class 1 and class 2 should be more likely to belong to class 1.

Figure 7. Considering the prior knowledge, the unknown data is more likely to belong to class 1.

To avoid domination by the class that appears first (for example, if the first test data is classified to ω_1, then P̂(ω_1) = 1 and P̂(ω_2) = P̂(ω_3) = ... = P̂(ω_K) = 0, and all future unknown data would be classified to ω_1 because the prior probabilities of the other classes are zero), we use a smoothed estimate based on a Dirichlet prior over the prior probability, assuming equal priors on each class ω_k. The prior probability for the i-th unknown data x_i is then calculated from the h most recent historical data, as below:

$$\hat{P}(\omega_k, x_i \mid H = \{x_{i-h}, \ldots, x_{i-1}\}) = \frac{|H\{Y=\omega_k\}| + l}{|H| + lK} \qquad (12)$$

where K is the number of classes and l is a parameter determining the strength of the equal-prior assumption relative to the recent history H. In the experimental study, we find that the length of the recent history H used for prior probability estimation should not be too large, say hundreds. We suggest setting h < 100, as further discussed in Section III-B, and l = 1.

2) Soft Decision Mechanism: In cognitive psychology, one tip for problem solving is to identify alternative hypotheses rather than fixing on one hypothesis [7]. Our system implements this idea by plugging in a soft decision mechanism. Soft decision is very important for data with overlaps between different classes, which is common in practice. When classifying an input that falls into the overlap region of different classes, a traditional classifier making a crisp decision will inevitably cause a high rate of misclassification due to the large uncertainty. In this case, a soft decision is believed to be better, as it lets experts evaluate reasonable alternatives. In our previous work [5], we explored soft decision in the context of neural networks and showed it to be effective in dealing with overlapping data.

The algorithm for soft decision is quite simple. For any input x_i, the output classes are sorted in descending order of their posterior probabilities, i.e. ω_(1), ω_(2), ..., ω_(K) with P̂(ω_(1)|x_i) ≥ P̂(ω_(2)|x_i) ≥ ... ≥ P̂(ω_(K)|x_i), where ω_(k) denotes the k-th ranked class. We select the top k classes by setting a cutoff point between ω_(k) and ω_(k+1) if:

$$\hat{P}(\omega_{(k)} \mid x_i) - \hat{P}(\omega_{(k+1)} \mid x_i) \geq \varepsilon \qquad (13)$$

where ε is a small positive value that controls the softness of the decision. The smaller the ε, the softer the decision.
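The prior of Eqn (12) and the cutoff of Eqn (13) can be sketched as below; this follows a literal reading of the reconstructed equations (cutting the ranked list at the first posterior gap of at least ε), and the history container, parameter defaults and example numbers are assumptions for illustration.

```python
# Sketch of the smoothed prior of Eqn (12), estimated from the h most recent
# predicted labels (pseudo-feedback), and the soft-decision cutoff of Eqn (13).

from collections import Counter, deque

def smoothed_priors(history, classes, l: float = 1.0):
    """Eqn (12): P(w_k) = (|H{Y=w_k}| + l) / (|H| + l*K) over the recent history H."""
    counts = Counter(history)
    K = len(classes)
    return {k: (counts.get(k, 0) + l) / (len(history) + l * K) for k in classes}

def soft_decision(posteriors, eps: float = 0.05):
    """Eqn (13): keep top-ranked classes until the posterior gap reaches eps."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected = [ranked[0][0]]
    for (_, pk), (cn, pn) in zip(ranked, ranked[1:]):
        if pk - pn >= eps:            # gap large enough: cut the ranked list here
            break
        selected.append(cn)
    return selected

# Usage: keep at most h recent predictions, e.g. h = 20 (the paper suggests h < 100)
history = deque(maxlen=20)
priors = smoothed_priors(history, classes=["target1", "target2"])
labels = soft_decision({"t1": 0.42, "t2": 0.40, "t3": 0.18})
print(priors)   # equal priors (0.5 each) while the history is still empty
print(labels)   # ['t1', 't2']: their posteriors differ by less than eps
```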
III. EXPERIMENTAL STUDY

A. Data and Evaluation Methods

1) Data: To test the target classification system developed in this paper, a simulated data set called KTL166 and three real-world data sets from the UCI machine learning repository [13] are used in the experiments. The simulated data includes two parts: 1) a KTL with 166 classes represented by 2 continuous attributes, and 2) sequentially incoming test data, as summarized in Table I. Figure 8 shows the KTL of the simulated data with 166 classes in the feature space spanned by the two attributes X_1 and X_2. From this figure, we can observe that the class distributions defined in a KTL in an application domain can be very complicated, with different classes overlapping severely.

Table I. Simulated data summarization: (1) the given known target library (KTL), described by the number of classes, number of features and type of features (continuous); (2) the sequential test inputs, described by the number of inputs, number of features and number of classes.

Figure 8. Known target library of the DSO artificial data in the feature space spanned by attributes X_1 and X_2.

For the three UCI data sets summarized in Table II, a KTL is not available beforehand; the method that artificially generates a KTL for each data set is introduced in detail in Section III-C. Compared with the simulated data, the number of attributes in these data sets is higher.

Table II. Statistics of the three real-world data sets from the UCI machine learning repository (ecoli, segment and vowel): number of classes, number of samples and number of attributes per data set.

2) Evaluation of Soft Decision: The classification accuracy of soft decision is evaluated from two aspects, i.e. the error rate and the average number of output class labels. As shown in Table III, for example, if the ground-truth class of the input is ω_1, there are three levels of prediction quality:

excellent - only one prediction class is output and it hits the target.
good - more than one class is output and the true class is among them.

poor - a wrong decision is made.

Table III. Excellent, good and poor predictions under the soft decision framework (ground-truth class: ω_1):

Prediction | Error rate | No. of outputs | Evaluation
ω_1 | 0 | 1 | excellent
ω_1, ω_2 | 0 | 2 | good
ω_2 or ω_2, ω_3 | 1 | 1 or 2 | poor

Obviously, this evaluation criterion implies that it is better to output multiple predictions than a single but incorrect one.

B. Results on Simulated Data

The overall system, with both the data association and classification modules, has been tested on the simulated data summarized in Table I using the knowledge-based NB, the production rule-based classifier and the distance-based classifiers. The experimental results are summarized in Figure 9, where the x-axis is the average number of predicted class labels assigned to each input and the y-axis is the error rate of online association and classification. Each curve or point shows the results of one method under different parameter settings.

Figure 9. Comparison of rule-based, distance-based and probabilistic methods for association and classification.

For the knowledge-based NB, we vary the length of history used to estimate the prior probability over h = 0 (i.e. equal priors), 2, 5, 10, 20, 50 and 100. The error curve shows that for h = 2, 5, 10 the performance of the knowledge-based NB improves compared with equal priors (h = 0), with a smaller number of predictions and similar error rates. This implies that introducing prior probabilities estimated from incoming data helps to reduce the ambiguity of classification. However, as h continues to increase, the error rate increases as a trade-off for the lower number of predictions. Compared with the other methods, the knowledge-based NB shows the best performance.

C. Results on UCI Data

In this section, our method is further tested using the three real-world data sets from the UCI repository [13] summarized in Table II. We generate a KTL for each data set artificially. For each data set D, we first calculate the attribute value range of each class, i.e. V_jk = [a, b], where a = min{X_j | Y = ω_k} and b = max{X_j | Y = ω_k}, and generate the perfect KTL = {V_jk | j = 1,...,n; k = 1,...,K}. Then we generate a KTL with incomplete coverage of the data, i.e. KTL^λ, where λ represents the degree of knowledge completeness. One reason for this setting is that there may be a gap between the given expert knowledge and the incoming data due to noise. Another important reason is that class descriptions, being highly abstracted, tend to describe the typical samples of a class. A reasonable setting is therefore to give each class value ranges that cover only the central part of the data from that class. As a result, KTL^λ contains a set of value ranges shrunk toward their mean values. Specifically, KTL^λ = {V^λ_jk | j = 1,...,n; k = 1,...,K}, where

$$V^{\lambda}_{jk} = \left[\,a + \tfrac{1-\lambda}{2}(b-a),\; b - \tfrac{1-\lambda}{2}(b-a)\,\right], \quad 0 < \lambda \leq 1,$$

so that λ = 1 recovers the perfect KTL.

In classification, KTL^λ is used to construct the knowledge-based NB, the rule-based classifier and the distance-based classifiers, and the whole sample set is used as test data. The experimental results are shown in Tables IV, V and VI. From these results, we observe the following:

When the KTL completely and correctly covers the class distributions of the true data, all classifiers except the centroid distance-based classifier perform very well, with a zero error rate and a small number of predictions. Notice that the centroid distance-based classifier is not affected by the degree of knowledge completeness λ.
But the error rate of the centroid distance-based classifier is clearly higher than that of the other methods, even when λ = 1.

With decreasing completeness of knowledge, i.e. decreasing λ, the performance of the production rule-based classifier degrades dramatically, especially when the dimensionality of the feature space (the number of attributes) is high. As is well known, samples distributed in a high-dimensional space lie close to the class boundary [14]. Therefore, when the value ranges in the KTL shrink toward the centers, most samples fall outside the class region boxes and cannot be recognized by the production rule-based classifier. Our proposed knowledge-based NB has reasoning capacity for data located outside the class region box, which is why it keeps relatively lower error rates as λ decreases, compared with the other three methods.

When the knowledge completeness is too low, the knowledge-based NB outputs a larger number of predicted class labels. This indicates that, with less and less knowledge, the knowledge-based NB has lower confidence in its decision making.
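A minimal sketch of the KTL^λ construction described in Section III-C, under the convention used above that λ = 1 keeps the full observed range, is given below; the function and variable names are our own.

```python
# Sketch of the KTL-lambda construction: per class and attribute, take the
# observed value range [a, b] and shrink it symmetrically towards its centre.
# lambda = 1 keeps the full range (complete knowledge); smaller lambda gives
# narrower ranges.

import numpy as np

def build_ktl(X: np.ndarray, y: np.ndarray, lam: float = 1.0):
    """Return {class: [(low, high) per attribute]} shrunk by completeness lam."""
    assert 0.0 < lam <= 1.0
    ktl = {}
    shrink = (1.0 - lam) / 2.0
    for k in np.unique(y):
        Xk = X[y == k]
        a, b = Xk.min(axis=0), Xk.max(axis=0)   # perfect per-attribute ranges
        lo = a + shrink * (b - a)               # move both ends towards the centre
        hi = b - shrink * (b - a)
        ktl[k] = list(zip(lo.tolist(), hi.tolist()))
    return ktl
```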

Table IV. Experimental results of the knowledge-based NB (K-NB), production rule-based classifier, boundary distance-based classifier and centroid distance-based classifier on the ecoli data: error rate and number of predictions per method for different values of the knowledge completeness parameter λ.

Table V. Experimental results of the same four classifiers on the segment data: error rate and number of predictions per method for different values of λ.

Table VI. Experimental results of the same four classifiers on the vowel data: error rate and number of predictions per method for different values of λ.

IV. CONCLUSIONS

This paper proposed a knowledge-based target classification system. Three features differentiate our method from traditional methods. Firstly, our system automatically learns contextual information from the input data; this learnt knowledge is saved and then combined with the given expert knowledge in classification. Secondly, the classification algorithm proposed here is knowledge-driven, which is extremely useful in applications where expert knowledge is available instead of training data. Thirdly, the classification algorithm embeds a soft-decision mechanism, which is able to deal with overlapping data. The experimental results show that our system with the proposed knowledge-based naive Bayes outperforms other classical classification models, including production rule, boundary and prototype models, and that it performs better than these classical models when the provided knowledge is incomplete with respect to the true distributions.

REFERENCES

[1] T. M. Mitchell, Machine Learning. McGraw-Hill, Inc.
[2] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. John Wiley & Sons, Inc.
[3] T. G. Dietterich and P. Langley, Machine Learning for Cognitive Networks: Technology Assessment and Research Challenges (book chapter in Cognitive Networks). John Wiley & Sons, Ltd.
[4] E. Bosse, J. Roy, and S. Wark, Concepts, Models, and Tools for Information Fusion. Norwood, MA, USA: Artech House, Inc.
[5] W. Tang, K. Z. Mao, L. O. Mak, and G. W. Ng, Classification for overlapping classes using optimized overlapping region detection and soft decision, in Proceedings of the 13th International Conference on Information Fusion, Edinburgh, United Kingdom, July 2010.
[6] R. Sun, The Cambridge Handbook of Computational Psychology. New York, NY, USA: Cambridge University Press.
[7] R. R. Hunt and H. C. Ellis, Fundamentals of Cognitive Psychology, Seventh Edition. New York, NY, USA: McGraw-Hill, Inc.
[8] P. Langley, W. Iba, and K. Thompson, An analysis of Bayesian classifiers, in Proceedings of the Tenth National Conference on Artificial Intelligence. MIT Press, 1992.
[9] P. Domingos and M. Pazzani, On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learning, vol. 29.
[10] Y. Yang and G. I. Webb, A comparative study of discretization methods for naive-Bayes classifiers, in Proceedings of PKAW 2002, The 2002 Pacific Rim Knowledge Acquisition Workshop, Tokyo, Japan, 2002.
[11] N. Friedman, D. Geiger, and M. Goldszmidt, Bayesian network classifiers, Machine Learning, vol. 29.

[12] F. Zheng and G. I. Webb, A comparative study of semi-naive Bayes methods in classification learning, in Proceedings of the 4th Australasian Data Mining Conference (AusDM05), 2005.
[13] P. M. Murphy and D. W. Aha, UCI repository of machine learning databases, Irvine, CA: University of California.
[14] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer.


More information

ECE 5984: Introduction to Machine Learning

ECE 5984: Introduction to Machine Learning ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm

More information

Algorithms for Classification: The Basic Methods

Algorithms for Classification: The Basic Methods Algorithms for Classification: The Basic Methods Outline Simplicity first: 1R Naïve Bayes 2 Classification Task: Given a set of pre-classified examples, build a model or classifier to classify new cases.

More information

Logistic Regression and Boosting for Labeled Bags of Instances

Logistic Regression and Boosting for Labeled Bags of Instances Logistic Regression and Boosting for Labeled Bags of Instances Xin Xu and Eibe Frank Department of Computer Science University of Waikato Hamilton, New Zealand {xx5, eibe}@cs.waikato.ac.nz Abstract. In

More information

ECE662: Pattern Recognition and Decision Making Processes: HW TWO

ECE662: Pattern Recognition and Decision Making Processes: HW TWO ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are

More information

Methods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Methods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }

More information

Perception: objects in the environment

Perception: objects in the environment Zsolt Vizi, Ph.D. 2018 Self-driving cars Sensor fusion: one categorization Type 1: low-level/raw data fusion combining several sources of raw data to produce new data that is expected to be more informative

More information

Data Mining Classification: Basic Concepts and Techniques. Lecture Notes for Chapter 3. Introduction to Data Mining, 2nd Edition

Data Mining Classification: Basic Concepts and Techniques. Lecture Notes for Chapter 3. Introduction to Data Mining, 2nd Edition Data Mining Classification: Basic Concepts and Techniques Lecture Notes for Chapter 3 by Tan, Steinbach, Karpatne, Kumar 1 Classification: Definition Given a collection of records (training set ) Each

More information

Bayesian Learning. Bayesian Learning Criteria

Bayesian Learning. Bayesian Learning Criteria Bayesian Learning In Bayesian learning, we are interested in the probability of a hypothesis h given the dataset D. By Bayes theorem: P (h D) = P (D h)p (h) P (D) Other useful formulas to remember are:

More information

CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning

CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning CS4375 --- Fall 2018 Bayesian a Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell 1 Uncertainty Most real-world problems deal with

More information