Applied Logic. Lecture 4 part 2 Bayesian inductive reasoning. Marcin Szczuka. Institute of Informatics, The University of Warsaw

1 Applied Logic Lecture 4 part 2 Bayesian inductive reasoning Marcin Szczuka Institute of Informatics, The University of Warsaw Monographic lecture, Spring semester 2017/2018

2 Those illiterate in general probability theory still keep asking why Trurl probabilized a dragon rather than an elf or a dwarf. They do so out of ignorance, for they do not know that a dragon is simply more probable than a dwarf...
Stanisław Lem, The Cyberiad, "Fable Three, or the Dragons of Probability"

3 Lecture plan
1 Introduction
2 Bayesian reasoning
3 Bayesian prediction and decision support
   Classification problems
   Selecting hypothesis - MAP and ML
   Bayesian Optimal Classifier
   Naïve Bayes classifier
4 Hypothesis selection - general issues

4 Measure of truth/possibility
Recall that we expect an inductive (quasi-)formal system that we dare to call an inductive logic to provide a measure of support. This measure gives us the degree to which the truthfulness of the premises influences the truthfulness of the conclusions. We require:
1 Fulfillment of the Criterion of Adequacy (CoA).
2 Ensuring that the degree of confidence in the inferred conclusion is no greater than the confidence in the premises and inference rules.
3 Ability to clearly discern proper conclusions (hypotheses) from nonsensical ones.
4 Intuitive interpretation.

5 Probabilistic inference
From the very beginning researchers tried to match the inductive reasoning paradigm with probability and/or statistics. Over time probability-based reasoning, in particular Bayesian reasoning, has established itself as a central focal point for philosophers and logicians working on the formalisation of inductive systems (inductive logics).
Elements of probabilistic reasoning can be found in the works of Pascal, Fermat, and others.
A modern, formal approach to inductive logic based on the notions of similarity and probability was proposed by John Maynard Keynes in A Treatise on Probability (1921). Rudolf Carnap developed these ideas further in his Logical Foundations of Probability (1950) and other works, which are now considered a cornerstone of probabilistic logic.
After the mathematical theory of probability was put in order by Kolmogorov, probabilistic reasoning gained more traction as a proper, formal theory.

6 Probabilistic inductive logic
In the case of inductive logics, in particular those based on probability, there is very little point in considering the strict formal consequence relation ⊢ and its relationship with the relation ⊨. For the relation ⊨ we usually consider a support (probability) mapping rather than exact logical consequence.
Support mapping (function)
A function P : L × L → [0, 1], where L is a set of statements (a language), is called a support function if for all statements A, B, C in L the following hold:
1 There exists at least one pair of statements D, E ∈ L for which P(D | E) < 1.
2 If B ⊨ A then P(A | B) = 1.
3 If ⊨ (B ≡ C) then P(A | B) = P(A | C).
4 If C ⊨ ¬(A ∧ B) then either P(A ∨ B | C) = P(A | C) + P(B | C) or, for every D ∈ L, P(D | C) = 1.
5 P((A ∧ B) | C) = P(A | (B ∧ C)) · P(B | C)

7 Probabilistic inductive logic
It is easy to see that the conditions for the support function P are a re-formulation of the axioms for a probability measure. In the definition of P the conditioning operator | corresponds to logical entailment, i.e., the basic step in reasoning. It is easy to see that the mapping P is not uniquely defined.
The conditions for P are essentially the same as for (unconditional) probability. It suffices to set P(A) = P(A | (D ∨ ¬D)) for some sentence (event) D. However, these conditions also allow for establishing the value P(A | C) when the probability of the event C is 0 (P(C) = P(C | (D ∨ ¬D)) = 0).
Condition 1 (non-triviality) in the definition of P can also be expressed as: there exists A ∈ L such that P((A ∧ ¬A) | (A ∨ ¬A)) < 1.

8 Lecture plan
1 Introduction
2 Bayesian reasoning
3 Bayesian prediction and decision support
   Classification problems
   Selecting hypothesis - MAP and ML
   Bayesian Optimal Classifier
   Naïve Bayes classifier
4 Hypothesis selection - general issues

9 Probability
At this point we need to introduce the (simplified) axioms for a probability measure that we will use further on. In order to clearly distinguish it from the previous notation we will use Pr to denote the probability measure.
Axioms for discrete probability (Kolmogorov)
1 For each event A ⊆ Ω the value Pr(A) ∈ [0, 1].
2 Unit measure: Pr(Ω) = 1.
3 Additivity: if A_1, ..., A_n are mutually exclusive events covering Ω (a partition of Ω), then
Σ_{i=1..n} Pr(A_i) = 1 and Pr(B) = Σ_{i=1..n} Pr(B | A_i) · Pr(A_i).
Axiom 2 (unit measure) may be a source of some concern for us.

10 Properties of probability
Pr(A ∧ B) = Pr(B) · Pr(A | B) = Pr(A) · Pr(B | A)
Pr(A ∨ B) = Pr(A) + Pr(B) - Pr(A ∧ B)
Pr(A | B) - (conditional) probability of A given B:
Pr(A | B) = Pr(A ∧ B) / Pr(B)
Bayes rule
Pr(A | B) = Pr(B | A) · Pr(A) / Pr(B)

11 Bayesian inference
For reasons that will become clear in the next part of the lecture, we will use the following notation.
T ⊆ X - set of premises (evidence set) coming from a (huge) universe X.
h ∈ H - conclusion (hypothesis) coming from some (huge) set of hypotheses H.
VS_{H,T} - version space, i.e., the subset of H containing the hypotheses that are consistent with T.
Inference rule (Bayes')
For a hypothesis h ∈ H and evidence set T ⊆ X:
Pr(h | T) = Pr(T | h) · Pr(h) / Pr(T)
The probability (level of support) of the conclusion (hypothesis) h is established on the basis of the support of the premises (evidence) and the degree to which the hypothesis justifies the existence of the evidence (premises).
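
The rule is easy to check numerically. Below is a minimal sketch (in Python) with made-up priors Pr(h) and likelihoods Pr(T | h) for three hypothetical hypotheses; Pr(T) is obtained by total probability and the resulting posteriors Pr(h | T) sum to 1.

```python
# Minimal sketch of Bayes' rule over a small hypothesis set.
# The priors and likelihoods are illustrative numbers, not taken from any data.

priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}        # Pr(h)
likelihoods = {"h1": 0.2, "h2": 0.5, "h3": 0.9}   # Pr(T | h)

# Pr(T) by total probability over the hypotheses
evidence = sum(priors[h] * likelihoods[h] for h in priors)

# Pr(h | T) = Pr(T | h) * Pr(h) / Pr(T)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}

print({h: round(p, 3) for h, p in posteriors.items()})
```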

12 Remarks
Pr(h | T) - a posteriori (posterior) probability of hypothesis h given the premises (evidence data) T. That is what we are looking for.
Pr(T) - probability of the premises (evidence data) T. Fortunately, we do not have to know it if we are only interested in comparing posterior probabilities of hypotheses. If, for some reason, we need to calculate it directly, then we may have a problem.
We need to calculate Pr(h) and Pr(T | h). For the moment we assume that we can do that and that H is known.
Pr(T | h) determines the degree to which h justifies the appearance (truthfulness) of the premises in T.

13 Lecture plan
1 Introduction
2 Bayesian reasoning
3 Bayesian prediction and decision support
   Classification problems
   Selecting hypothesis - MAP and ML
   Bayesian Optimal Classifier
   Naïve Bayes classifier
4 Hypothesis selection - general issues

14 Lecture plan
1 Introduction
2 Bayesian reasoning
3 Bayesian prediction and decision support
   Classification problems
   Selecting hypothesis - MAP and ML
   Bayesian Optimal Classifier
   Naïve Bayes classifier
4 Hypothesis selection - general issues

15 Decision support tasks
The real usefulness of the Bayesian approach is visible in its practical applications. The most popular of these is decision support (classification).
Decision support (classification) is an example of using inductive inference methods such as prediction, argument by analogy and eliminative induction.
We are going to construct Bayesian classifiers, i.e., algorithms (procedures) that learn the probability of a decision value (classification) for new cases on the basis of cases observed previously (a training sample).
By restricting the reasoning task to prediction of the decision value we can produce a computationally viable, automated tool.

16 Classifiers - basic notions
The domain (space, universe) is a set X from which we draw examples. An element x ∈ X is referred to as an example (instance, case, record, entity, vector, object, row).
An attribute (feature, variable, measurement) is a function a : X → A. The set A is called the attribute value set or attribute domain.
We assume that each example x ∈ X is completely represented by the vector (a_1(x), ..., a_n(x)), where a_i : X → A_i for i = 1, ..., n. n is sometimes called the size (length) of the example.
For our purposes we usually distinguish a special decision attribute (decision, class), traditionally denoted by dec or d.

17 Tabular data
Outlook   Temp  Humid   Wind   EnjoySpt
sunny     hot   high    FALSE  no
sunny     hot   high    TRUE   no
overcast  hot   high    FALSE  yes
rainy     mild  high    FALSE  yes
rainy     cool  normal  FALSE  yes
rainy     cool  normal  TRUE   no
overcast  cool  normal  TRUE   yes
sunny     mild  high    FALSE  no
...       ...   ...     ...    ...
rainy     mild  high    TRUE   no

18 Classifier
The training set (training sample) T ⊆ X corresponds to the set of premises.
T_d - the subset of training data with decision d, which corresponds to the set of premises supporting a particular hypothesis.
T_{d, a_i=v} - the subset of training data with attribute a_i equal to v and decision d. This corresponds to the set of premises of a particular type supporting a particular hypothesis.
The hypothesis space H is now limited to the set of possible decision values, i.e., conditions (dec = d), where d ∈ V_dec.
Classification task
Given a training sample T, determine the best (most probable) value of dec(x) for a previously unseen case x ∈ X (x ∉ T).
Question: How to choose the best value of the decision?

19 Lecture plan
1 Introduction
2 Bayesian reasoning
3 Bayesian prediction and decision support
   Classification problems
   Selecting hypothesis - MAP and ML
   Bayesian Optimal Classifier
   Naïve Bayes classifier
4 Hypothesis selection - general issues

20 Hypothesis selection - MAP
In Bayesian classification we want to find the most probable decision value for a new example x, given the collection of previously seen (training) examples and the attribute values for x. So, using the Bayes formula, we need to find a hypothesis h (decision value) that maximises the support (empirical probability).
MAP - Maximum A Posteriori hypothesis
Given a training set T we attempt to classify an example x ∈ X using the hypothesis h_MAP ∈ H by assigning to the object x the decision value given by:
h_MAP = argmax_{h∈H} Pr(h | T) = argmax_{h∈H} Pr(T | h) · Pr(h)
In MAP we choose the hypothesis that is the most probable.

21 Hypothesis selection - ML
ML - Maximum Likelihood hypothesis
Given a training set T we attempt to classify an example x ∈ X using the hypothesis h_ML ∈ H by assigning to the object x the decision value given by:
h_ML = argmax_{h∈H} Pr(T | h)
In the ML approach we choose the hypothesis that best explains (makes most likely) the existence of our training sample. Note that the hypothesis h_ML may itself have low probability, but be very well adjusted to our particular data.
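
A minimal sketch contrasting the two selection rules on hypothetical numbers: with a strong prior on h1 but a high likelihood for h3, MAP and ML pick different hypotheses.

```python
# Sketch of MAP vs. ML hypothesis selection (illustrative numbers only).

priors = {"h1": 0.70, "h2": 0.25, "h3": 0.05}       # Pr(h)
likelihoods = {"h1": 0.10, "h2": 0.40, "h3": 0.90}  # Pr(T | h)

# h_MAP maximises Pr(T | h) * Pr(h); Pr(T) is a common factor and can be dropped
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# h_ML maximises the likelihood Pr(T | h) alone
h_ml = max(likelihoods, key=lambda h: likelihoods[h])

print("MAP:", h_map)  # h2: 0.40 * 0.25 = 0.100 beats h1 (0.070) and h3 (0.045)
print("ML: ", h_ml)   # h3: highest likelihood despite its very low prior
```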

22 Discussion of ML and MAP
Both methods require knowledge of Pr(T | h). In the case of MAP we also need Pr(h) to be able to use the Bayes formula.
MAP is quite natural, but has major drawbacks. In particular, it promotes the dominating decision value.
Both methods assume that the training set is error-free and that the hypothesis we look for is in H.
ML is close to the intuitive understanding of inductive learning. In the process of selecting a hypothesis we go for the one that gives the best reason for the existence of the particular training set we have.
The MAP rule selects the most probable hypothesis, while we are rather interested in selecting the most probable decision value for an example. Consider V_dec = {0, 1} and H = {h_MAP, h_1, ..., h_m} with h_i(x) = 0 for all 1 ≤ i ≤ m, h_MAP(x) = 1, and Pr(h_MAP | T) ≤ Σ_{i=1..m} Pr(h_i | T): MAP predicts 1, although decision 0 carries more total posterior weight.

23 Finding probabilities
Pr(h) - the easier part. We may either be given a probability (by the learning method) or treat all hypotheses equally. In the latter case:
Pr(h) = 1 / |H|
The problem is the size of H. It may be a HUGE space. Also, in reality, we may not even know the whole of H.
Pr(T | h) - the harder part. Notice that we are in fact only interested in decision making. We want to know the probability that a sample T will be consistent (will have the same decision) with hypothesis h. This yields:
Pr(T | h) = 1 if h ∈ VS_{H,T}, 0 if h ∉ VS_{H,T}
Unfortunately, the problem with the size of H is still present.
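
Under these two assumptions (uniform prior over H and the 0/1 likelihood above) the posterior of every consistent hypothesis is simply 1/|VS_{H,T}|. A small sketch, with a toy hypothesis space of threshold rules made up for illustration:

```python
# Sketch: brute-force posteriors with uniform prior Pr(h) = 1/|H| and
# Pr(T | h) = 1 iff h is consistent with T (i.e. h belongs to VS_{H,T}).
# Assumes at least one hypothesis is consistent with T.

def posteriors(hypotheses, training_set):
    """hypotheses: dict name -> function x -> decision; training_set: list of (x, d) pairs."""
    prior = 1.0 / len(hypotheses)
    version_space = [name for name, h in hypotheses.items()
                     if all(h(x) == d for x, d in training_set)]
    evidence = prior * len(version_space)           # Pr(T)
    return {name: (prior / evidence if name in version_space else 0.0)
            for name in hypotheses}

# Toy hypothesis space: threshold rules on a single numeric attribute.
H = {"x>1": lambda x: int(x > 1), "x>2": lambda x: int(x > 2), "x>3": lambda x: int(x > 3)}
T = [(1.5, 1), (4.0, 1), (0.5, 0)]
print(posteriors(H, T))   # only "x>1" is consistent, so it gets all the mass
```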

24 ML and MAP in practice
MAP and/or ML, despite serious practical limitations, can still be used in some special cases, given that:
The hypothesis space is very restricted (and reasonably small).
We use MAP and/or ML only to score a few competing hypotheses constructed by other means. This relates to the topics of stacking, coupled classifiers and layered learning.

25 Lecture plan
1 Introduction
2 Bayesian reasoning
3 Bayesian prediction and decision support
   Classification problems
   Selecting hypothesis - MAP and ML
   Bayesian Optimal Classifier
   Naïve Bayes classifier
4 Hypothesis selection - general issues

26 Bayesian Optimal Classifier
The Bayesian Optimal Classifier (BOC) always returns the most probable decision value for an example. In this respect it cannot be beaten by any other algorithm in terms of true (global) error. Sadly, the BOC is not very useful from a practical point of view, since it uses the entire hypothesis space.
Let c(·) be the desired decision (target concept) and T the training sample. Then:
h_BOC(x) = argmax_{d∈V_dec} Pr(c(x) = d | T)
where:
Pr(c(x) = d | T) = Σ_{h∈H} Pr(c(x) = d | h) · Pr(h | T)
Pr(c(x) = d | h) = 1 if h(x) = d, 0 if h(x) ≠ d
The hypothesis returned by BOC may not belong to H.
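
A sketch of the weighted vote behind the BOC, using the same kind of situation as in the earlier remark on MAP: the single most probable hypothesis predicts 1, but decision 0 has more total posterior weight, so the BOC returns 0. The hypotheses and posterior values are made up for illustration.

```python
# Sketch of the Bayes Optimal Classifier: choose the decision value whose
# total posterior weight over all hypotheses is the largest (illustrative numbers).

def boc_decide(x, hypotheses, posterior, decisions):
    def weight(d):
        # Pr(c(x) = d | T) = sum over h of [h(x) = d] * Pr(h | T)
        return sum(p for name, p in posterior.items() if hypotheses[name](x) == d)
    return max(decisions, key=weight)

H = {"h_map": lambda x: 1, "h1": lambda x: 0, "h2": lambda x: 0}
post = {"h_map": 0.4, "h1": 0.3, "h2": 0.3}   # h_map is the single most probable hypothesis
print(boc_decide("new example", H, post, decisions=[0, 1]))   # prints 0, since 0.6 > 0.4
```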

27 Lecture plan
1 Introduction
2 Bayesian reasoning
3 Bayesian prediction and decision support
   Classification problems
   Selecting hypothesis - MAP and ML
   Bayesian Optimal Classifier
   Naïve Bayes classifier
4 Hypothesis selection - general issues

28 Naïve Bayes classifier
Let x* be a new example that we need to classify. We should select a hypothesis h such that:
h(x*) = argmax_{d∈V_dec} Pr(c(x) = d | ∧_{i=1..n} a_i(x) = a_i(x*))
Hence, from the Bayes formula:
argmax_{d∈V_dec} Pr(c(x) = d) · Pr(∧_{i=1..n} a_i(x) = a_i(x*) | c(x) = d)
If we (naïvely) assume that the attributes are independent as random variables, then:
argmax_{d∈V_dec} Pr(c(x) = d) · Π_{i=1..n} Pr(a_i(x) = a_i(x*) | c(x) = d)
All that is left to do is to estimate Pr(c(x) = d) and Pr(a_i(x) = v | c(x) = d) from data.

29 NBC - technical details
Usually, we employ an m-estimate to get:
Pr(a_i(x) = v | c(x) = d) = (|T_{d, a_i=v}| + m·p) / (|T_d| + m)
where m is an integer parameter and p is a prior estimate of the probability being estimated. Usually, if no background knowledge is given, we set m = |A_i| and p = 1/|A_i|, where A_i is the (finite) set of values of attribute a_i.
Complexity of NBC
For each example we have to modify the counts for the decision class and for the particular attribute values. That is, in total, O(n·|T|) basic computational steps. This is the lowest reasonable estimate for any classification algorithm without prior knowledge. Also, each step in NBC is fast and cheap, hence the method is computationally efficient.
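
A compact sketch of the whole procedure on the rows shown in the weather table earlier (the elided rows are omitted), with the m-estimate and the default choices m = |A_i|, p = 1/|A_i|; the attribute domains are taken to be the values observed in the sample.

```python
from collections import Counter

# Minimal Naïve Bayes classifier with m-estimate smoothing, sketched on the
# visible rows of the weather table; illustrative, not a full implementation.

DATA = [  # (Outlook, Temp, Humid, Wind) -> EnjoySpt
    (("sunny", "hot", "high", "FALSE"), "no"),
    (("sunny", "hot", "high", "TRUE"), "no"),
    (("overcast", "hot", "high", "FALSE"), "yes"),
    (("rainy", "mild", "high", "FALSE"), "yes"),
    (("rainy", "cool", "normal", "FALSE"), "yes"),
    (("rainy", "cool", "normal", "TRUE"), "no"),
    (("overcast", "cool", "normal", "TRUE"), "yes"),
    (("sunny", "mild", "high", "FALSE"), "no"),
    (("rainy", "mild", "high", "TRUE"), "no"),
]

def train(data):
    class_counts = Counter(d for _, d in data)      # |T_d|
    value_counts = Counter()                        # (i, v, d) -> |T_{d, a_i=v}|
    domains = [set() for _ in data[0][0]]           # observed value set A_i per attribute
    for x, d in data:
        for i, v in enumerate(x):
            value_counts[(i, v, d)] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains

def classify(x, class_counts, value_counts, domains):
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for d, nd in class_counts.items():
        score = nd / total                          # Pr(c(x) = d) estimated from data
        for i, v in enumerate(x):
            m = len(domains[i])                     # default m = |A_i|
            p = 1.0 / m                             # default p = 1/|A_i|
            score *= (value_counts[(i, v, d)] + m * p) / (nd + m)   # m-estimate
        if score > best_score:
            best, best_score = d, score
    return best

model = train(DATA)
print(classify(("sunny", "cool", "high", "TRUE"), *model))
```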

30 Lecture plan
1 Introduction
2 Bayesian reasoning
3 Bayesian prediction and decision support
   Classification problems
   Selecting hypothesis - MAP and ML
   Bayesian Optimal Classifier
   Naïve Bayes classifier
4 Hypothesis selection - general issues

31 Requirements for hypotheses
On a higher level of abstraction we can demand that a hypothesis not only be the best (most probable) explanation, but also the simplest one. This may be seen as a special application of lex parsimoniæ (Occam's razor). We prefer the simplest explanation, i.e., the hypothesis that - according to William of Occam - requires the least amount of assumptions.
In practice, lex parsimoniæ is frequently replaced by the simpler Minimum Description Length (MDL) principle.
MDL - Minimum Description Length
MDL recommends the hypothesis that gives the simplest re-encoding of the data, i.e., the hypothesis that gives the best compression. Choosing this particular hypothesis produces the shortest algorithm for reproducing the data. In classification, this usually means the shortest hypothesis.

32 MDL in Bayesian classification
Bayesian classifiers are considered one of the best methods for producing MDL-compliant hypotheses. For the purposes of comparing description lengths, in the example below we define the length as the negative (binary) logarithm of the probability. Taking the logarithm of the Bayes formula, we get:
log Pr(h | T) = log Pr(h) + log Pr(T | h) - log Pr(T)
Substituting L(·) for -log Pr(·) we obtain:
L(h | T) = L(h) + L(T | h) - L(T)
where L(h), L(T | h) represent the length of the hypothesis h and the length of the data T (given h). In both cases we assume that the encoding is known and optimal.
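
Read as code lengths, the formula is just arithmetic on negative binary logarithms. A tiny sketch with illustrative probabilities (not derived from any real encoding):

```python
from math import log2

# Sketch: the MDL reading of the Bayes formula with L(.) = -log2 Pr(.), in bits.

pr_h, pr_T_given_h, pr_T = 0.25, 0.125, 0.5   # made-up probabilities

L_h = -log2(pr_h)                  # length of the hypothesis description: 2 bits
L_T_given_h = -log2(pr_T_given_h)  # length of the data re-encoded using h: 3 bits
L_T = -log2(pr_T)                  # length of the data on its own: 1 bit

# L(h | T) = L(h) + L(T | h) - L(T)
print(L_h + L_T_given_h - L_T)     # 2 + 3 - 1 = 4 bits
```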

33 MDL in Bayesian classification
Ultimately, we select the hypothesis that is the best w.r.t. MDL:
h_MDL = argmin_{h∈H} L_{Enc_H}(h) + L_{Enc_D}(T | h)
Assuming that Enc_H and Enc_D are optimal encodings of, respectively, hypotheses and data, we get: h_MDL = h_MAP.
Intuitively, MDL helps to find the right balance between the quality and the simplicity of a hypothesis.
The MDL principle is frequently used for scoring candidate hypotheses constructed by other means. It is also applicable to the task of simplifying existing hypotheses, for example in the filtering of decision rule sets and in decision tree pruning. It also provides an effective stop criterion for many practical algorithms.

34 Kolmogorov complexity
MDL is also connected with the more general notion of Kolmogorov complexity (descriptive complexity, Kolmogorov-Chaitin complexity, algorithmic entropy).
The Kolmogorov complexity of a finite or infinite sequence of symbols (stream of data) is defined as the length of the simplest (shortest) algorithm that generates this data. Naturally, the notion of algorithm length is quite complicated and requires a formal definition. Such a definition is usually given with the use of formal languages and Turing machines.
In most non-trivial cases the task of calculating the Kolmogorov complexity of a sequence is very hard, frequently practically impossible (undecidable).
Consider two finite sequences of digits: the initial segment of the decimal expansion of π has very low Kolmogorov complexity, since there exists a very simple algorithm that generates it; a random sequence of digits, by contrast, has potentially very high Kolmogorov complexity.
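
Although Kolmogorov complexity itself is uncomputable, a general-purpose compressor gives a crude, computable upper bound on description length and illustrates the contrast between the two kinds of sequences. A sketch using the standard-library zlib:

```python
import random
import zlib

# Crude illustration only: compressed size as a stand-in upper bound on
# description length (Kolmogorov complexity itself cannot be computed).

regular = ("0123456789" * 100).encode()                    # highly regular sequence
random.seed(0)
noisy = bytes(random.randrange(256) for _ in range(1000))  # (pseudo)random bytes

print(len(regular), "->", len(zlib.compress(regular)))     # shrinks to a few dozen bytes
print(len(noisy), "->", len(zlib.compress(noisy)))         # stays close to 1000 bytes
```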
