Applied Logic, Lecture 4 part 2: Bayesian inductive reasoning. Marcin Szczuka, Institute of Informatics, The University of Warsaw
1 Applied Logic, Lecture 4 part 2: Bayesian inductive reasoning

Marcin Szczuka (MIMUW), Institute of Informatics, The University of Warsaw
Monographic lecture, Spring semester 2017/2018 (34 slides)
2 Those illiterate in general probability theory still keep asking why, above all, Trurl probabilized the dragon rather than an elf or a dwarf. They do so out of ignorance, for they do not know that a dragon is simply more probable than a dwarf...

Stanisław Lem, The Cyberiad, "Fable three, or dragons of probability"
3 Lecture plan

1. Introduction
2. Bayesian reasoning
3. Bayesian prediction and decision support
   - Classification problems
   - Selecting hypotheses: MAP and ML
   - Bayesian Optimal Classifier
   - Naïve Bayes classifier
4. Hypothesis selection: general issues
4 Measure of truth/possibility

Recall that from an inductive (quasi-)formal system that we dare to call an inductive logic we expect a measure of support. This measure gives us the level of influence of the truthfulness of premises on the truthfulness of conclusions. We require:
1. Fulfillment of the Criterion of Adequacy (CoA).
2. Assurance that the degree of confidence in the inferred conclusion is no greater than the confidence in the premises and inference rules.
3. The ability to clearly discern proper conclusions (hypotheses) from nonsensical ones.
4. An intuitive interpretation.
5 Probabilistic inference

From the earliest days researchers tried to match the inductive reasoning paradigm with probability and/or statistics. Over time probability-based reasoning, in particular Bayesian reasoning, has established itself as a central focal point for philosophers and logicians working on the formalisation of inductive systems (inductive logics).

Elements of probabilistic reasoning can be found in the works of Pascal, Fermat, and others. A modern, formal approach to inductive logic based on the notions of similarity and probability was proposed by John Maynard Keynes in A Treatise on Probability (1921). Rudolf Carnap developed these ideas further in his Logical Foundations of Probability (1950) and other works, which are now considered a cornerstone of probabilistic logic. After the mathematical theory of probability was axiomatised by Kolmogorov, probabilistic reasoning gained more traction as a proper, formal theory.
6 Probabilistic inductive logic

In the case of inductive logics, in particular those based on probability, there is very little point in considering the strict formal consequence relation ⊢ and its relationship with the entailment relation ⊨. Instead of exact logical consequence we usually consider a support (probability) mapping.

Support mapping (function)
A function P : L × L → [0, 1], where L is a set of statements (a language) and P(A|B) is read as the support of A given B, is called a support function if for statements A, B, C in L the following holds:
1. There exists at least one pair of statements D, E ∈ L for which P(D|E) < 1.
2. If B ⊨ A, then P(A|B) = 1.
3. If ⊨ (B ≡ C), then P(A|B) = P(A|C).
4. If C ⊨ ¬(A ∧ B), then either P((A ∨ B)|C) = P(A|C) + P(B|C), or for every D ∈ L, P(D|C) = 1.
5. P((A ∧ B)|C) = P(A|(B ∧ C)) · P(B|C).
7 Probabilistic inductive logic

It is easy to see that the conditions for the support function P are a re-formulation of the axioms for a probability measure. In the definition of P the operator | corresponds to logical entailment, i.e., the basic step in reasoning. It is also easy to see that the mapping P is not uniquely defined.

The conditions for P are essentially the same as for (unconditional) probability. It suffices to set P(A) = P(A|(D ∨ ¬D)) for some sentence (event) D. However, these conditions also allow for establishing the value P(A|C) when the probability of the event C is 0 (P(C) = P(C|(D ∨ ¬D)) = 0).

Condition 1 (non-triviality) in the definition of P can also be expressed as: there exists A ∈ L such that P((A ∧ ¬A)|(A ∨ ¬A)) < 1.
9 Probability

At this point we need to introduce the (simplified) axioms for a probability measure that we will use further on. In order to clearly discern from the previous notation, we will write Pr for the probability measure.

Axioms for discrete probability (Kolmogorov)
1. For each event A ⊆ Ω the value Pr(A) ∈ [0, 1].
2. Unit measure: Pr(Ω) = 1.
3. Additivity: if A_1, ..., A_n are mutually exclusive events, then Pr(A_1 ∪ ... ∪ A_n) = Σ_{i=1}^{n} Pr(A_i). In particular, if also Σ_{i=1}^{n} Pr(A_i) = 1, then for any event B: Pr(B) = Σ_{i=1}^{n} Pr(B|A_i) · Pr(A_i).

Axiom 2 (unit measure) may be a source of some concern for us.
10 Properties of probability

Pr(A ∧ B) = Pr(B) · Pr(A|B) = Pr(A) · Pr(B|A)
Pr(A ∨ B) = Pr(A) + Pr(B) - Pr(A ∧ B)
Pr(A|B) - the (conditional) probability of A given B:
Pr(A|B) = Pr(A ∧ B) / Pr(B)

Bayes rule
Pr(A|B) = Pr(B|A) · Pr(A) / Pr(B)
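A small numeric check of the total-probability and Bayes formulas above. This is not from the lecture: the event names and numbers (a rare condition A and a test result B) are made up purely for illustration; exact rational arithmetic makes the result easy to verify by hand.

```python
from fractions import Fraction as F

# Hypothetical numbers, chosen only to illustrate the formulas.
pr_A = F(1, 100)             # prior Pr(A)
pr_B_given_A = F(9, 10)      # Pr(B | A)
pr_B_given_notA = F(5, 100)  # Pr(B | not A)

# Total probability: Pr(B) = Pr(B|A)*Pr(A) + Pr(B|not A)*Pr(not A)
pr_B = pr_B_given_A * pr_A + pr_B_given_notA * (1 - pr_A)

# Bayes rule: Pr(A|B) = Pr(B|A)*Pr(A) / Pr(B)
pr_A_given_B = pr_B_given_A * pr_A / pr_B
print(pr_A_given_B)  # 2/13, i.e. about 0.154
```

Note how the posterior stays small despite the high value of Pr(B|A): the low prior Pr(A) dominates, which is exactly the interplay the Bayes rule captures.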
11 Bayesian inference

For reasons that will become clear in the next part of the lecture, we will use the following notation:
- T ⊆ X - the set of premises (evidence set), coming from a (huge) universe X.
- h ∈ H - a conclusion (hypothesis), coming from some (huge) set of hypotheses H.
- VS_{H,T} - the version space, i.e., the subset of H containing the hypotheses that are consistent with T.

Inference rule (Bayes')
For a hypothesis h ∈ H and an evidence set T ⊆ X:
Pr(h|T) = Pr(T|h) · Pr(h) / Pr(T)

The probability (level of support) of the conclusion (hypothesis) h is established on the basis of the support of the premises (evidence) and the degree to which the hypothesis justifies the existence of the evidence (premises).
12 Remarks

- Pr(h|T) - the a posteriori (posterior) probability of hypothesis h given the premises (evidence data) T. That is what we are looking for.
- Pr(T) - the probability of the premises (evidence data) T. Fortunately, we do not have to know it if we are only interested in comparing the posterior probabilities of hypotheses. If, for some reason, we need to calculate it directly, then we may have a problem.
- We need to calculate Pr(h) and Pr(T|h). For the moment we assume that we can do that and that H is known.
- Pr(T|h) determines the degree to which h justifies the appearance (truthfulness) of the premises in T.
15 Decision support tasks

The real usefulness of the Bayesian approach is visible in its practical applications. The most popular of these is decision support (classification). Decision support (classification) is an example of using inductive inference methods such as prediction, argument by analogy, and eliminative induction.

We are going to construct Bayesian classifiers, i.e., algorithms (procedures) that learn the probability of a decision value (classification) for new cases on the basis of cases observed previously (a training sample). By restricting the reasoning task to prediction of a decision value we can produce a computationally viable, automated tool.
16 Classifiers - basic notions

The domain (space, universe) is a set X from which we draw examples. An element x ∈ X is addressed as an example (instance, case, record, entity, vector, object, row).

An attribute (feature, variable, measurement) is a function a : X → A. The set A is called the attribute value set or attribute domain. We assume that each example x ∈ X is completely represented by the vector ⟨a_1(x), ..., a_n(x)⟩, where a_i : X → A_i for i = 1, ..., n. The number n is sometimes called the size (length) of the example.

For our purposes we usually distinguish a special decision attribute (decision, class), traditionally denoted dec or d.
17 Tabular data

Outlook   Temp  Humid   Wind   EnjoySpt
sunny     hot   high    FALSE  no
sunny     hot   high    TRUE   no
overcast  hot   high    FALSE  yes
rainy     mild  high    FALSE  yes
rainy     cool  normal  FALSE  yes
rainy     cool  normal  TRUE   no
overcast  cool  normal  TRUE   yes
sunny     mild  high    FALSE  no
...       ...   ...     ...    ...
rainy     mild  high    TRUE   no
18 Classifier

- The training set (training sample) T ⊆ X corresponds to the set of premises.
- T_d - the subset of training data with decision d, which corresponds to the set of premises supporting a particular hypothesis.
- T_{d,a_i=v} - the subset of training data with attribute a_i equal to v and decision d. This corresponds to the set of premises of a particular type supporting a particular hypothesis.
- The hypothesis space H is now limited to the set of possible decision values, i.e., conditions (dec = d), where d ∈ V_dec.

Classification task
Given a training sample T, determine the best (most probable) value of dec(x) for a previously unseen case x ∈ X (x ∉ T).

Question: how to choose the best value of the decision?
20 Hypothesis selection - MAP

In Bayesian classification we want to find the most probable decision value for a new example x given the collection of previously seen (training) examples and the attribute values for x. So, using the Bayes formula, we need to find a hypothesis h (decision value) that maximises support (empirical probability).

MAP - Maximum A Posteriori hypothesis
Given a training set T, we attempt to classify an example x ∈ X using the hypothesis h_MAP ∈ H by assigning to x the decision value given by:
h_MAP = argmax_{h∈H} Pr(h|T) = argmax_{h∈H} Pr(T|h) · Pr(h)

In MAP we choose the hypothesis that is the most probable.
21 Hypothesis selection - ML

ML - Maximum Likelihood hypothesis
Given a training set T, we attempt to classify an example x ∈ X using the hypothesis h_ML ∈ H by assigning to x the decision value given by:
h_ML = argmax_{h∈H} Pr(T|h)

In the ML approach we choose the hypothesis that best explains (makes most likely) the existence of our training sample. Note that the hypothesis h_ML may itself have low probability, yet be very well adjusted to our particular data.
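To make the MAP/ML distinction concrete, here is a minimal sketch. The hypothesis names, priors Pr(h), and likelihoods Pr(T|h) are entirely made up for illustration; the point is only that the two rules can pick different hypotheses when a likely-looking hypothesis has a small prior.

```python
# Made-up priors and likelihoods for three hypothetical hypotheses.
prior = {'h1': 0.6, 'h2': 0.3, 'h3': 0.1}        # Pr(h)
likelihood = {'h1': 0.1, 'h2': 0.3, 'h3': 0.8}   # Pr(T | h)

# MAP maximises Pr(T|h) * Pr(h); ML maximises Pr(T|h) alone.
h_map = max(prior, key=lambda h: likelihood[h] * prior[h])
h_ml = max(prior, key=lambda h: likelihood[h])

print(h_map)  # 'h2': 0.3*0.3 = 0.09 beats 0.08 and 0.06
print(h_ml)   # 'h3': the best fit to the data, despite its low prior
```

Here h3 explains the data best (ML picks it), but its prior is so low that h2 wins once the prior is taken into account (MAP picks it).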
22 Discussion of ML and MAP

- Both methods require knowledge of Pr(T|h). In the case of MAP we also need Pr(h) to be able to use the Bayes formula.
- MAP is quite natural, but has major drawbacks. In particular, it promotes the dominating decision value.
- Both methods assume that the training set is error-free and that the hypothesis we look for is in H.
- ML is close to the intuitive understanding of inductive learning. In the process of selecting a hypothesis we go for the one that gives the best reason for the existence of the particular training set we have.
- The MAP rule selects the most probable hypothesis, while we are rather interested in selecting the most probable decision value for an example. Consider V_dec = {0, 1} and H = {h_MAP, h_1, ..., h_m} with h_i(x) = 0 for all 1 ≤ i ≤ m, h_MAP(x) = 1, and Pr(h_MAP|T) ≤ Σ_{i=1}^{m} Pr(h_i|T). Then the decision value 0 is collectively more probable for x, although MAP returns 1.
23 Finding probabilities

Pr(h) - the easier part. We may either be given a probability (by the learning method) or treat all hypotheses equally. In the latter case:
Pr(h) = 1/|H|
The problem is the size of H. It may be a HUGE space. Also, in reality, we may not even know the whole H.

Pr(T|h) - the harder part. Notice that we are in fact only interested in decision making. We want to know the probability that a sample T will be consistent (will have the same decision) with hypothesis h. This yields:
Pr(T|h) = 1 if h ∈ VS_{H,T}, and 0 if h ∉ VS_{H,T}
Unfortunately, the problem with the size of H is still present.
24 ML and MAP in practice

MAP and/or ML, despite serious practical limitations, can still be used in some special cases, given that:
- The hypothesis space is very restricted (and reasonably small).
- We use MAP and/or ML to score a few competing hypotheses constructed by other means. This relates to the topics of stacking, coupled classifiers, and layered learning.
26 Bayesian Optimal Classifier

The Bayesian Optimal Classifier (BOC) always returns the most probable decision value for an example. In this respect it cannot be beaten by any other algorithm in terms of true (global) error. Sadly, the BOC isn't very useful from a practical point of view, since it uses the entire hypothesis space.

Let c(·) be the desired decision (target concept) and T the training sample. Then:
h_BOC(x) = argmax_{d∈V_dec} Pr(c(x) = d | T)
where:
Pr(c(x) = d | T) = Σ_{h∈H} Pr(c(x) = d | h) · Pr(h|T)
Pr(c(x) = d | h) = 1 if h(x) = d, and 0 if h(x) ≠ d

Note that the hypothesis returned by BOC may not belong to H.
28 Naïve Bayes classifier

Let x* be a new example that we need to classify. We should select a hypothesis h such that:
h(x*) = argmax_{d∈V_dec} Pr(c(x) = d | a_1(x) = a_1(x*), ..., a_n(x) = a_n(x*))
Hence, from the Bayes formula:
argmax_{d∈V_dec} Pr(c(x) = d) · Pr(⋀_{i=1}^{n} a_i(x) = a_i(x*) | c(x) = d)
If we (naïvely) assume that the attributes are independent as random variables, then:
argmax_{d∈V_dec} Pr(c(x) = d) · ∏_{i=1}^{n} Pr(a_i(x) = a_i(x*) | c(x) = d)
All that is left to do is to estimate Pr(c(x) = d) and Pr(a_i(x) = v | c(x) = d) from data.
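The product formula above can be sketched directly on the eight complete rows of the weather table from slide 17. This is a bare frequency-counting version (no smoothing), written only to illustrate the formula; the function name and the chosen test example are not from the lecture.

```python
from collections import Counter

# Training sample: the eight complete rows of the weather table (slide 17).
data = [
    ('sunny',    'hot',  'high',   'FALSE', 'no'),
    ('sunny',    'hot',  'high',   'TRUE',  'no'),
    ('overcast', 'hot',  'high',   'FALSE', 'yes'),
    ('rainy',    'mild', 'high',   'FALSE', 'yes'),
    ('rainy',    'cool', 'normal', 'FALSE', 'yes'),
    ('rainy',    'cool', 'normal', 'TRUE',  'no'),
    ('overcast', 'cool', 'normal', 'TRUE',  'yes'),
    ('sunny',    'mild', 'high',   'FALSE', 'no'),
]

def naive_bayes(x_new):
    """Return argmax_d Pr(c=d) * prod_i Pr(a_i = v_i | c=d), by raw counts."""
    class_counts = Counter(row[-1] for row in data)
    best_d, best_score = None, -1.0
    for d, count_d in class_counts.items():
        score = count_d / len(data)          # Pr(c(x) = d)
        rows_d = [row for row in data if row[-1] == d]
        for i, v in enumerate(x_new):
            # Pr(a_i(x) = v | c(x) = d) by relative frequency within class d
            score *= sum(1 for row in rows_d if row[i] == v) / len(rows_d)
        if score > best_score:
            best_d, best_score = d, score
    return best_d

print(naive_bayes(('sunny', 'cool', 'high', 'TRUE')))  # 'no'
```

Note that 'sunny' never occurs among the 'yes' rows, so the whole product for class 'yes' collapses to 0. This fragility of raw counts is exactly what the m-estimate on the next slide is designed to mitigate.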
29 NBC - technical details

Usually we employ an m-estimate:
Pr(a_i(x) = v | c(x) = d) = (|T_{d,a_i=v}| + m·p) / (|T_d| + m)
where m is an integer parameter and p is a prior estimate of the probability being estimated. Usually, if no background knowledge is given, we set m = |A_i| and p = 1/|A_i|, where A_i is the (finite) set of values of attribute a_i.

Complexity of NBC
For each example we have to modify the counts for the decision class and for particular attribute values. That is, in total, O(n·|T|) basic computational steps. This is the lowest reasonable estimate for any classification algorithm without prior knowledge, since the data has to be read at least once. Also, each step in NBC is fast and cheap, hence the method is computationally efficient.
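A minimal sketch of the m-estimate formula above, using the slide's counts |T_{d,a_i=v}| and |T_d| as plain integers (the function name and the numeric example are illustrative, not from the lecture):

```python
def m_estimate(count_dv, count_d, n_values, m=None):
    """Pr(a_i(x)=v | c(x)=d) ~ (|T_{d,a_i=v}| + m*p) / (|T_d| + m)."""
    if m is None:
        m = n_values          # default from the slide: m = |A_i|
    p = 1.0 / n_values        # uniform prior over the attribute values
    return (count_dv + m * p) / (count_d + m)

# Example: Outlook has 3 values; 'sunny' never occurs with decision 'yes'
# among 4 'yes' rows, yet the estimate stays strictly positive:
print(m_estimate(0, 4, 3))  # (0 + 3*(1/3)) / (4 + 3) = 1/7
```

With zero counts the estimate falls back to the prior p rather than to 0, so a single unseen attribute value can no longer zero out the whole Naïve Bayes product.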
31 Requirements for hypotheses

On a higher level of abstraction we can demand that a hypothesis not only be the best (most probable) explanation, but also the simplest one. This may be seen as a special application of lex parsimoniae (Occam's razor). We prefer the simplest explanation, i.e., the hypothesis that requires, according to William of Occam, the least amount of assumptions. In practice, lex parsimoniae is frequently replaced by the simpler Minimum Description Length (MDL) principle.

MDL - Minimum Description Length
MDL recommends the hypothesis that yields the simplest method of re-encoding the data, i.e., the hypothesis that gives the best compression. Choosing this particular hypothesis produces the shortest algorithm for reproducing the data. In classification this usually means the shortest hypothesis.
32 MDL in Bayesian classification

Bayesian classifiers are considered one of the best methods for producing MDL-compliant hypotheses. For the purposes of comparing description lengths below, we define the length as the negative binary logarithm of probability.

Taking the logarithm of the Bayes formula, we get:
log Pr(h|T) = log Pr(h) + log Pr(T|h) - log Pr(T)
Substituting L(.) for -log Pr(.) we obtain:
L(h|T) = L(h) + L(T|h) - L(T)
where L(h) and L(T|h) represent the length of the hypothesis h and the length of the data T (given h). In both cases we assume that the encoding is known and optimal.
33 MDL in Bayesian classification

Ultimately, we select the hypothesis that is best w.r.t. MDL:
h_MDL = argmin_{h∈H} L_{Enc_H}(h) + L_{Enc_D}(T|h)
Assuming that Enc_H and Enc_D are optimal encodings of, respectively, the hypotheses and the data, we get: h_MDL = h_MAP.

Intuitively, MDL helps to find the right balance between the quality and the simplicity of a hypothesis. The MDL principle is frequently used for scoring candidate hypotheses constructed by other means. It is also applicable to the task of simplifying existing hypotheses, for example in the filtering of decision rule sets and in decision tree pruning. It also provides an effective stop criterion for many practical algorithms.
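The equality h_MDL = h_MAP under optimal encodings can be checked numerically: with L(.) = -log2 Pr(.), minimising L(h) + L(T|h) is the same as maximising Pr(h) · Pr(T|h). The priors and likelihoods below are made up for illustration.

```python
import math

# Illustrative only: made-up priors and likelihoods for three hypotheses.
prior = {'h1': 0.5, 'h2': 0.3, 'h3': 0.2}        # Pr(h)
likelihood = {'h1': 0.02, 'h2': 0.10, 'h3': 0.12}  # Pr(T | h)

def length_bits(h):
    # L(h) + L(T|h), with L(.) = -log2 Pr(.) assuming optimal encodings
    return -math.log2(prior[h]) - math.log2(likelihood[h])

h_mdl = min(prior, key=length_bits)
h_map = max(prior, key=lambda h: prior[h] * likelihood[h])
print(h_mdl == h_map)  # True: minimising code length maximises the posterior
```

The agreement is no accident: -log2 is strictly decreasing, so the argmin of the summed code lengths is always the argmax of the product Pr(h) · Pr(T|h).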
34 Kolmogorov complexity

MDL is also connected with the more general notion of Kolmogorov complexity (descriptive complexity, Kolmogorov-Chaitin complexity, algorithmic entropy). The Kolmogorov complexity of a finite or infinite sequence of symbols (stream of data) is defined as the length of the simplest (shortest) algorithm that generates this data.

Naturally, the notion of algorithm length is quite complicated and requires a formal definition. Such a definition is usually given with the use of formal languages and Turing machines. In most non-trivial cases the task of calculating the Kolmogorov complexity of a sequence is very hard, frequently practically impossible (undecidable).

Consider two finite sequences of numbers. A prefix of the decimal expansion of π has a very low Kolmogorov complexity, since there exists a very simple algorithm that generates it. A randomly generated sequence of the same length has a potentially very high Kolmogorov complexity.
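Kolmogorov complexity itself is uncomputable, but the contrast on this slide can be illustrated with an ordinary compressor: the compressed size is a crude upper bound on the description length. This is only a proxy, not the actual Kolmogorov complexity; the particular sequences below are made up for the demonstration.

```python
import random
import zlib

# A highly regular sequence: generated by a trivially short rule.
regular = ('0123456789' * 100).encode()   # 1000 bytes

# A (pseudo-)random sequence of the same length: no short description known.
random.seed(0)
noise = bytes(random.randrange(256) for _ in range(1000))

# Compressed size as a rough stand-in for description length.
print(len(zlib.compress(regular)))  # a few dozen bytes: the rule is short
print(len(zlib.compress(noise)))    # about 1000 bytes: barely compressible
```

The regular sequence shrinks to a tiny fraction of its length, while the random one stays essentially incompressible, mirroring the difference between the digits of π (short generating program) and random digits (no program much shorter than the data itself).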
More informationLecture 24: Other (Non-linear) Classifiers: Decision Tree Learning, Boosting, and Support Vector Classification Instructor: Prof. Ganesh Ramakrishnan
Lecture 24: Other (Non-linear) Classifiers: Decision Tree Learning, Boosting, and Support Vector Classification Instructor: Prof Ganesh Ramakrishnan October 20, 2016 1 / 25 Decision Trees: Cascade of step
More informationConcept Learning.
. Machine Learning Concept Learning Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg Martin.Riedmiller@uos.de
More informationIntroduction. Decision Tree Learning. Outline. Decision Tree 9/7/2017. Decision Tree Definition
Introduction Decision Tree Learning Practical methods for inductive inference Approximating discrete-valued functions Robust to noisy data and capable of learning disjunctive expression ID3 earch a completely
More informationQuestion of the Day. Machine Learning 2D1431. Decision Tree for PlayTennis. Outline. Lecture 4: Decision Tree Learning
Question of the Day Machine Learning 2D1431 How can you make the following equation true by drawing only one straight line? 5 + 5 + 5 = 550 Lecture 4: Decision Tree Learning Outline Decision Tree for PlayTennis
More informationSome Concepts of Probability (Review) Volker Tresp Summer 2018
Some Concepts of Probability (Review) Volker Tresp Summer 2018 1 Definition There are different way to define what a probability stands for Mathematically, the most rigorous definition is based on Kolmogorov
More informationDiscrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14
CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 Introduction One of the key properties of coin flips is independence: if you flip a fair coin ten times and get ten
More informationGenerative Techniques: Bayes Rule and the Axioms of Probability
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2016/2017 Lesson 8 3 March 2017 Generative Techniques: Bayes Rule and the Axioms of Probability Generative
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Intelligent Data Analysis. Decision Trees
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Intelligent Data Analysis Decision Trees Paul Prasse, Niels Landwehr, Tobias Scheffer Decision Trees One of many applications:
More informationEECS 349:Machine Learning Bryan Pardo
EECS 349:Machine Learning Bryan Pardo Topic 2: Decision Trees (Includes content provided by: Russel & Norvig, D. Downie, P. Domingos) 1 General Learning Task There is a set of possible examples Each example
More informationImagine we ve got a set of data containing several types, or classes. E.g. information about customers, and class=whether or not they buy anything.
Decision Trees Defining the Task Imagine we ve got a set of data containing several types, or classes. E.g. information about customers, and class=whether or not they buy anything. Can we predict, i.e
More informationData Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Part I Introduction to Data Mining by Tan, Steinbach, Kumar Adapted by Qiang Yang (2010) Tan,Steinbach,
More informationCOMP 328: Machine Learning
COMP 328: Machine Learning Lecture 2: Naive Bayes Classifiers Nevin L. Zhang Department of Computer Science and Engineering The Hong Kong University of Science and Technology Spring 2010 Nevin L. Zhang
More informationThe Bayesian Learning
The Bayesian Learning Rodrigo Fernandes de Mello Invited Professor at Télécom ParisTech Associate Professor at Universidade de São Paulo, ICMC, Brazil http://www.icmc.usp.br/~mello mello@icmc.usp.br First
More informationText Categorization CSE 454. (Based on slides by Dan Weld, Tom Mitchell, and others)
Text Categorization CSE 454 (Based on slides by Dan Weld, Tom Mitchell, and others) 1 Given: Categorization A description of an instance, x X, where X is the instance language or instance space. A fixed
More informationCMPT Machine Learning. Bayesian Learning Lecture Scribe for Week 4 Jan 30th & Feb 4th
CMPT 882 - Machine Learning Bayesian Learning Lecture Scribe for Week 4 Jan 30th & Feb 4th Stephen Fagan sfagan@sfu.ca Overview: Introduction - Who was Bayes? - Bayesian Statistics Versus Classical Statistics
More informationDecision Trees. Gavin Brown
Decision Trees Gavin Brown Every Learning Method has Limitations Linear model? KNN? SVM? Explain your decisions Sometimes we need interpretable results from our techniques. How do you explain the above
More informationApplied Logic. Lecture 4 part 1 Inductive reasoning. Marcin Szczuka. Institute of Informatics, The University of Warsaw
Applied Logic Lecture 4 part 1 Inductive reasoning Marcin Szczuka Institute of Informatics, The University of Warsaw Monographic lecture, Spring semester 2016/2017 Marcin Szczuka (MIMUW) Applied Logic
More informationData classification (II)
Lecture 4: Data classification (II) Data Mining - Lecture 4 (2016) 1 Outline Decision trees Choice of the splitting attribute ID3 C4.5 Classification rules Covering algorithms Naïve Bayes Classification
More informationConcept Learning Mitchell, Chapter 2. CptS 570 Machine Learning School of EECS Washington State University
Concept Learning Mitchell, Chapter 2 CptS 570 Machine Learning School of EECS Washington State University Outline Definition General-to-specific ordering over hypotheses Version spaces and the candidate
More informationOutline. Introduction. Bayesian Probability Theory Bayes rule Bayes rule applied to learning Bayesian learning and the MDL principle
Outline Introduction Bayesian Probability Theory Bayes rule Bayes rule applied to learning Bayesian learning and the MDL principle Sequence Prediction and Data Compression Bayesian Networks Copyright 2015
More informationM chi h n i e n L e L arni n n i g Decision Trees Mac a h c i h n i e n e L e L a e r a ni n ng
1 Decision Trees 2 Instances Describable by Attribute-Value Pairs Target Function Is Discrete Valued Disjunctive Hypothesis May Be Required Possibly Noisy Training Data Examples Equipment or medical diagnosis
More informationDecision Tree Learning - ID3
Decision Tree Learning - ID3 n Decision tree examples n ID3 algorithm n Occam Razor n Top-Down Induction in Decision Trees n Information Theory n gain from property 1 Training Examples Day Outlook Temp.
More informationQuestion of the Day? Machine Learning 2D1431. Training Examples for Concept Enjoy Sport. Outline. Lecture 3: Concept Learning
Question of the Day? Machine Learning 2D43 Lecture 3: Concept Learning What row of numbers comes next in this series? 2 2 22 322 3222 Outline Training Examples for Concept Enjoy Sport Learning from examples
More informationJialiang Bao, Joseph Boyd, James Forkey, Shengwen Han, Trevor Hodde, Yumou Wang 10/01/2013
Simple Classifiers Jialiang Bao, Joseph Boyd, James Forkey, Shengwen Han, Trevor Hodde, Yumou Wang 1 Overview Pruning 2 Section 3.1: Simplicity First Pruning Always start simple! Accuracy can be misleading.
More informationThe Solution to Assignment 6
The Solution to Assignment 6 Problem 1: Use the 2-fold cross-validation to evaluate the Decision Tree Model for trees up to 2 levels deep (that is, the maximum path length from the root to the leaves is
More informationModern Information Retrieval
Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction
More informationGeneralization bounds
Advanced Course in Machine Learning pring 200 Generalization bounds Handouts are jointly prepared by hie Mannor and hai halev-hwartz he problem of characterizing learnability is the most basic question
More informationLearning with Probabilities
Learning with Probabilities CS194-10 Fall 2011 Lecture 15 CS194-10 Fall 2011 Lecture 15 1 Outline Bayesian learning eliminates arbitrary loss functions and regularizers facilitates incorporation of prior
More informationDecision Support. Dr. Johan Hagelbäck.
Decision Support Dr. Johan Hagelbäck johan.hagelback@lnu.se http://aiguy.org Decision Support One of the earliest AI problems was decision support The first solution to this problem was expert systems
More informationDecision Tree Learning
0. Decision Tree Learning Based on Machine Learning, T. Mitchell, McGRAW Hill, 1997, ch. 3 Acknowledgement: The present slides are an adaptation of slides drawn by T. Mitchell PLAN 1. Concept learning:
More informationClassification: Rule Induction Information Retrieval and Data Mining. Prof. Matteo Matteucci
Classification: Rule Induction Information Retrieval and Data Mining Prof. Matteo Matteucci What is Rule Induction? The Weather Dataset 3 Outlook Temp Humidity Windy Play Sunny Hot High False No Sunny
More informationMachine Learning 2nd Edi7on
Lecture Slides for INTRODUCTION TO Machine Learning 2nd Edi7on CHAPTER 9: Decision Trees ETHEM ALPAYDIN The MIT Press, 2010 Edited and expanded for CS 4641 by Chris Simpkins alpaydin@boun.edu.tr h1p://www.cmpe.boun.edu.tr/~ethem/i2ml2e
More informationI. Induction, Probability and Confirmation: Introduction
I. Induction, Probability and Confirmation: Introduction 1. Basic Definitions and Distinctions Singular statements vs. universal statements Observational terms vs. theoretical terms Observational statement
More informationChapter 3: Decision Tree Learning (part 2)
Chapter 3: Decision Tree Learning (part 2) CS 536: Machine Learning Littman (Wu, TA) Administration Books? Two on reserve in the math library. icml-03: instructional Conference on Machine Learning mailing
More informationLecture 2: Foundations of Concept Learning
Lecture 2: Foundations of Concept Learning Cognitive Systems II - Machine Learning WS 2005/2006 Part I: Basic Approaches to Concept Learning Version Space, Candidate Elimination, Inductive Bias Lecture
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationIntroduction and Models
CSE522, Winter 2011, Learning Theory Lecture 1 and 2-01/04/2011, 01/06/2011 Lecturer: Ofer Dekel Introduction and Models Scribe: Jessica Chang Machine learning algorithms have emerged as the dominant and
More informationClassification and Prediction
Classification Classification and Prediction Classification: predict categorical class labels Build a model for a set of classes/concepts Classify loan applications (approve/decline) Prediction: model
More informationDecision trees. Special Course in Computer and Information Science II. Adam Gyenge Helsinki University of Technology
Decision trees Special Course in Computer and Information Science II Adam Gyenge Helsinki University of Technology 6.2.2008 Introduction Outline: Definition of decision trees ID3 Pruning methods Bibliography:
More informationMachine Learning. Computational Learning Theory. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Computational Learning Theory Le Song Lecture 11, September 20, 2012 Based on Slides from Eric Xing, CMU Reading: Chap. 7 T.M book 1 Complexity of Learning
More informationDecision Trees. Data Science: Jordan Boyd-Graber University of Maryland MARCH 11, Data Science: Jordan Boyd-Graber UMD Decision Trees 1 / 1
Decision Trees Data Science: Jordan Boyd-Graber University of Maryland MARCH 11, 2018 Data Science: Jordan Boyd-Graber UMD Decision Trees 1 / 1 Roadmap Classification: machines labeling data for us Last
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationIntroduction to machine learning. Concept learning. Design of a learning system. Designing a learning system
Introduction to machine learning Concept learning Maria Simi, 2011/2012 Machine Learning, Tom Mitchell Mc Graw-Hill International Editions, 1997 (Cap 1, 2). Introduction to machine learning When appropriate
More informationData Mining and Machine Learning
Data Mining and Machine Learning Concept Learning and Version Spaces Introduction Concept Learning Generality Relations Refinement Operators Structured Hypothesis Spaces Simple algorithms Find-S Find-G
More information