Insect ID. Antennae Length. Insect Class. Abdomen Length


1 We have seen that we can do machine learning on data that is in the nice flat file format: rows are objects, columns are features. Taking a real problem and massaging it into this format is domain dependent, but often the most fun part of machine learning. Let's see just one example. [Table: Insect ID, Abdomen Length, Antennae Length, Insect Class; ten insects labeled Grasshopper or Katydid. The numeric values did not survive transcription.]

2 Western Pipistrelle (Parastrellus hesperus). Photo by Michael Durham.

3 Western pipistrelle calls. [Figure: a spectrogram of a bat call.]

4 We can easily measure two features of bat calls: their characteristic frequency and their call duration. [Table: Bat ID, Characteristic frequency, Call duration (ms), Bat Species; the example rows are Western pipistrelle calls whose numeric values did not survive transcription.]

5-8 [image-only slides]

9 Classification. We have seen two classification techniques: the simple linear classifier and nearest neighbor. Let us see two more techniques: the decision tree and naïve Bayes. There are other techniques (neural networks, support vector machines, ...) that we will not consider.

10 I have a box of apples. If Pr(X = good) = p, then Pr(X = bad) = 1 - p. The entropy of X is given by the binary entropy function H(X) = -p log2(p) - (1 - p) log2(1 - p), which attains its maximum value when p = 1/2.

11 Decision Tree Classifier (Ross Quinlan). [Scatterplot: Antenna Length versus Abdomen Length.] The tree: Abdomen Length > 7.1? yes -> Katydid; no -> Antenna Length > 6.0? yes -> Katydid; no -> Grasshopper.

12 Decision trees predate computers. A classic dichotomous key: Antennae shorter than body? Yes -> Grasshopper; No -> 3 Tarsi? Yes -> Cricket; No -> Foretibia has ears? Yes -> Katydid; No -> Camel Cricket.

13 Decision Tree Classification. A decision tree is a flow-chart-like tree structure: an internal node denotes a test on an attribute, a branch represents an outcome of the test, and leaf nodes represent class labels or class distributions. Decision tree generation consists of two phases: tree construction (at the start, all the training examples are at the root; partition the examples recursively based on selected attributes) and tree pruning (identify and remove branches that reflect noise or outliers). Use of a decision tree: classify an unknown sample by testing its attribute values against the tree.

14 How do we construct the decision tree? The basic algorithm is greedy: the tree is constructed in a top-down, recursive, divide-and-conquer manner. At the start, all the training examples are at the root. Attributes are categorical (if continuous-valued, they can be discretized in advance). Examples are partitioned recursively based on selected attributes; test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain). Partitioning stops when: all samples for a given node belong to the same class; there are no remaining attributes for further partitioning (majority voting is employed to label the leaf); or there are no samples left. A minimal sketch follows.
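
To make the greedy recursion concrete, here is a minimal Python sketch (not the lecture's own code). It assumes each row is a dict of categorical feature values plus a class label, and it leans on a gain(rows, feature, label) helper implementing the splitting measure defined on the next slide; all names here are illustrative.

```python
def build_tree(rows, features, label):
    """Greedy top-down, divide-and-conquer tree construction (ID3-style sketch).
    Assumes a gain(rows, feature, label) helper (see the next slide)."""
    labels = [r[label] for r in rows]
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no attributes left -- label the leaf by majority vote.
    if not features:
        return max(set(labels), key=labels.count)
    # Greedy step: pick the attribute with the best splitting measure.
    best = max(features, key=lambda f: gain(rows, f, label))
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        rest = [f for f in features if f != best]
        tree[best][value] = build_tree(subset, rest, label)
    return tree
```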

15 Information Gain as a Splitting Criterion. Select the attribute with the highest information gain (information gain is the expected reduction in entropy). Assume there are two classes, P and N, and let the set of examples S contain p elements of class P and n elements of class N. The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as E(S) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n)), where 0 log2(0) is defined as 0.
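
As a sanity check, this two-class entropy is a few lines of Python (a sketch, not part of the slides):

```python
import math

def entropy(p, n):
    """E(S) for a set with p examples of class P and n of class N.
    The convention 0 * log2(0) = 0 is applied."""
    e = 0.0
    for count in (p, n):
        frac = count / (p + n)
        if frac > 0:
            e -= frac * math.log2(frac)
    return e

print(entropy(4, 5))  # 0.9911..., the 4F/5M value used on the next slides
```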

16 Information Gain in Decision Tree Induction. Assume that using attribute A, the current set will be partitioned into some number of child sets. The encoding information that would be gained by branching on A is Gain(A) = E(current set) - E(all child sets), where each child set's entropy is weighted by the fraction of examples it contains. Note: entropy is at its minimum when the collection of objects is completely uniform, i.e., all of one class.
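
Using the entropy helper sketched above, the weighted gain is a one-liner (the (p, n) count-pair encoding is my assumption):

```python
def information_gain(parent, children):
    """Gain(A) = E(current set) - weighted sum of E(child set).
    parent and each child are (p, n) class-count pairs."""
    total = sum(p + n for p, n in children)
    weighted = sum(((p + n) / total) * entropy(p, n) for p, n in children)
    return entropy(*parent) - weighted
```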

17 [Table: Person, Hair Length, Weight, Age, Class. Nine labeled people: Homer (M), Marge (F), Bart (M), Lisa (F), Maggie (F), Abe (M), Selma (F), Otto (M), Krusty (M), plus one unlabeled person, Comic (?). The numeric attribute values did not survive transcription.]

18 Let us try splitting on Hair Length. Entropy(S) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n)), so Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911. Gain(A) = E(current set) - E(all child sets): Gain(Hair Length <= 5) = 0.9911 - (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911.

19 Let us try splitting on Weight. Entropy(4F,5M) = 0.9911, as before. Gain(Weight <= 160) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900.

20 Let us try splitting on Age. Entropy(4F,5M) = 0.9911, as before. Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183.
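
Continuing the sketch above, the three candidate splits can be checked numerically. The branch weights (4/9 and 5/9, 5/9 and 4/9, 6/9 and 3/9) come from the slides; the per-branch (female, male) counts below are reconstructed to be consistent with those weights and entropies, so treat them as illustrative.

```python
splits = {
    "Hair Length <= 5": [(1, 3), (3, 2)],  # (F, M) counts on the two branches
    "Weight <= 160":    [(4, 1), (0, 4)],
    "Age <= 40":        [(3, 3), (1, 2)],
}
for test, children in splits.items():
    print(test, round(information_gain((4, 5), children), 4))
# Hair Length <= 5 -> 0.0911, Weight <= 160 -> 0.59, Age <= 40 -> 0.0183
```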

21 Of the three features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified, so we simply recurse! This time we find that we can split on Hair Length, and we are done: Weight <= 160? no -> Male; yes -> Hair Length <= 2?

22 We don't need to keep the data around, just the test conditions: Weight <= 160? no -> Male; yes -> Hair Length <= 2? yes -> Male; no -> Female. How would these people be classified?

23 It is trivial to convert decision trees to rules. The tree Weight <= 160? (no -> Male; yes -> Hair Length <= 2? yes -> Male; no -> Female) becomes: Rules to classify Males/Females: if Weight greater than 160, classify as Male; else if Hair Length less than or equal to 2, classify as Male; else classify as Female.
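
As a small sketch, the converted rules are just nested conditionals (thresholds taken from the slide; the function name is mine):

```python
def classify(weight, hair_length):
    """The learned tree from the slide, written as if/elif/else rules."""
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"

print(classify(weight=150, hair_length=8))  # -> Female
```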

24 Once we have learned the decision tree, we don't even need a computer! This decision tree is attached to a medical machine, and is designed to help nurses make decisions about what type of doctor to call. [Figure: decision tree for a typical shared-care setting, applying the system for the diagnosis of prostatic obstructions.]

25 PSA = serum prostate-specific antigen levels; PSAD = PSA density; TRUS = transrectal ultrasound. Garzotto M et al. JCO 2005;23:

26 The worked examples we have seen were performed on small datasets. However, with small datasets there is a great danger of overfitting the data. When you have few datapoints, there are many possible splitting rules that perfectly classify the data but will not generalize to future datasets. [Example tree: Wears green? Yes -> Female; No -> Male.] For example, the rule "Wears green?" perfectly classifies the data, but so does "Mother's name is Jacqueline?", and so does "Has blue shoes?".

27 Avoid Overfitting in Classification. The generated tree may overfit the training data: too many branches, some of which reflect anomalies due to noise or outliers, resulting in poor accuracy for unseen samples. Two approaches to avoid overfitting: Prepruning: halt tree construction early; do not split a node if doing so would drop the goodness measure below a threshold (it is difficult to choose an appropriate threshold). Postpruning: remove branches from a fully grown tree to get a sequence of progressively pruned trees, then use a set of data different from the training data to decide which is the best pruned tree.

28 Which of the Pigeon Problems can be solved by a decision tree? 1) Deep bushy tree. 2) Useless. 3) Deep bushy tree. The decision tree has a hard time with correlated attributes.

29 Advantages/Disadvantages of Decision Trees. Advantages: easy to understand (doctors love them!); easy to generate rules. Disadvantages: may suffer from overfitting; classifies by rectangular partitioning (so does not handle correlated features very well); can be quite large, so pruning is necessary; does not handle streaming data easily.

30 [image-only slide]

31 How would we go about building a classifier for projectile points?

32 The eight-attribute key: I. Location of maximum blade width: 1. Proximal quarter; 2. Secondmost proximal quarter; 3. Secondmost distal quarter; 4. Distal quarter. II. Base shape: 1. Arc-shaped; 2. Normal curve; 3. Triangular; 4. Folsomoid. III. Basal indentation ratio: 1. No basal indentation; (shallow); (deep). IV. Constriction ratio. V. Outer tang angle: <50. VI. Tang-tip shape: 1. Pointed; 2. Round; 3. Blunt. VII. Fluting: 1. Absent; 2. Present. VIII. Length/width ratio. Example measurements: length = 3.10, width = 1.45, length/width ratio = 2.13.

33 [Decision tree over the eight attributes above; its tests include Fluting = TRUE?, Base Shape = 4?, and Length/width ratio = 2, with leaves Late Archaic and Mississippian.]

34 We could also use the Nearest Neighbor Algorithm. [Examples: Late Archaic, Transitional Paleo, Transitional Paleo, Late Archaic.]

35 It might be better to use the shape directly in the decision tree. Decision Tree for Arrowheads: Lexiang Ye and Eamonn Keogh (2009), Time Series Shapelets: A New Primitive for Data Mining, SIGKDD 2009. [Figure: training data (subset) of Clovis and Avonlea arrowheads; a shapelet dictionary (I, II); the arrowhead decision tree.] The shapelet decision tree classifier achieves an accuracy of 80.0%, while the accuracy of the rotation-invariant one-nearest-neighbor classifier is 68.0%.

36 Naïve Bayes Classifier (Thomas Bayes). We will start off with a visual intuition, before looking at the math.

37 Remember this example? Let's get lots more data. [Scatterplot: Antenna Length versus Abdomen Length for Grasshoppers and Katydids.]

38 With a lot of data, we can build a histogram. Let us just build one for Antenna Length for now. [Histogram of Antenna Length for Katydids and Grasshoppers.]

39 We can leave the histograms as they are, or we can summarize them with two normal distributions. Let us use two normal distributions for ease of visualization in the following slides.

40 We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it? We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid? There is a formal way to discuss the most probable classification: p(c_j | d) = probability of class c_j, given that we have observed d. Here, the antennae length is 3.

41 p(c_j | d) = probability of class c_j, given that we have observed d. Antennae length is 3: P(Grasshopper | 3) = 10 / (10 + 2) = 0.833; P(Katydid | 3) = 2 / (10 + 2) = 0.167.

42 p(c_j | d) = probability of class c_j, given that we have observed d. Antennae length is 7: P(Grasshopper | 7) = 3 / (3 + 9) = 0.250; P(Katydid | 7) = 9 / (3 + 9) = 0.750.

43 p(c_j | d) = probability of class c_j, given that we have observed d. Antennae length is 5: P(Grasshopper | 5) = 6 / (6 + 6) = 0.500; P(Katydid | 5) = 6 / (6 + 6) = 0.500.

44 Bayes Classifiers. That was a visual intuition for a simple case of the Bayes classifier, also called: Idiot Bayes, Naïve Bayes, Simple Bayes. We are about to see some of the mathematical formalisms and more examples, but keep in mind the basic idea: find the probability of the previously unseen instance belonging to each class, then simply pick the most probable class.

45 Bayes Classifiers. Bayesian classifiers use Bayes' theorem, which says p(c_j | d) = p(d | c_j) p(c_j) / p(d). Here p(c_j | d) is the probability of instance d being in class c_j; this is what we are trying to compute. p(d | c_j) is the probability of generating instance d given class c_j; we can imagine that being in class c_j causes you to have feature d with some probability. p(c_j) is the probability of occurrence of class c_j; this is just how frequent the class c_j is in our database. p(d) is the probability of instance d occurring; this can actually be ignored, since it is the same for all classes.
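
A minimal sketch of the theorem in code. Since p(d) is the same for every class, we can drop it and renormalize at the end; the dictionary encoding is my assumption:

```python
def posterior(priors, likelihoods):
    """p(c | d) is proportional to p(d | c) * p(c); normalizing over the
    classes plays the role of dividing by p(d)."""
    unnorm = {c: priors[c] * likelihoods[c] for c in priors}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}
```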

46 Assume that we have two classes: c_1 = male and c_2 = female. We have a person whose sex we do not know, say "drew" or d. Classifying drew as male or female is equivalent to asking which is more probable, p(male | drew) or p(female | drew). (Note: Drew can be a male or a female name.) [Photos: Drew Barrymore, Drew Carey.] By Bayes' rule, p(male | drew) = p(drew | male) p(male) / p(drew), where p(drew | male) is the probability of being called drew given that you are a male, p(male) is the probability of being a male, and p(drew) is the probability of being named drew (actually irrelevant, since it is the same for all classes).

47 This is Officer Drew (who arrested me in 1997). Is Officer Drew a male or a female? Luckily, we have a small database with names and sex, and we can use it to apply Bayes' rule: p(c_j | d) = p(d | c_j) p(c_j) / p(d). The database: Drew (Male), Claudia (Female), Drew (Female), Drew (Female), Alberto (Male), Karin (Female), Nina (Female), Sergio (Male).

48 For Officer Drew, using the table from the previous slide and p(c_j | d) = p(d | c_j) p(c_j) / p(d): p(male | drew) = (1/3 * 3/8) / (3/8) = 0.125 / (3/8); p(female | drew) = (2/5 * 5/8) / (3/8) = 0.250 / (3/8). Officer Drew is more likely to be a Female.
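
Reusing the posterior helper sketched above, with the counts read off the name/sex table:

```python
priors = {"Male": 3 / 8, "Female": 5 / 8}        # class frequencies
likelihoods = {"Male": 1 / 3, "Female": 2 / 5}   # p(name = drew | sex)

print(posterior(priors, likelihoods))
# {'Male': 0.333..., 'Female': 0.666...} -> more likely Female
```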

49 Officer Drew IS a female! p(male | drew) = (1/3 * 3/8) / (3/8) = 0.125 / (3/8); p(female | drew) = (2/5 * 5/8) / (3/8) = 0.250 / (3/8).

50 So far we have only considered Bayes classification when we have one attribute (the antennae length, or the name). But we may have many features. How do we use all the features? p(c_j | d) = p(d | c_j) p(c_j) / p(d). The database (Name, Over 170cm, Eye, Hair length, Sex):
Drew, No, Blue, Short, Male
Claudia, Yes, Brown, Long, Female
Drew, No, Blue, Long, Female
Drew, No, Blue, Long, Female
Alberto, Yes, Brown, Short, Male
Karin, No, Blue, Long, Female
Nina, Yes, Brown, Short, Female
Sergio, Yes, Blue, Long, Male

51 To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate p(d | c_j) = p(d_1 | c_j) * p(d_2 | c_j) * ... * p(d_n | c_j). That is, the probability of class c_j generating instance d equals the probability of class c_j generating the observed value for feature 1, multiplied by the probability of class c_j generating the observed value for feature 2, and so on.

52 To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate p(d | c_j) = p(d_1 | c_j) * p(d_2 | c_j) * ... * p(d_n | c_j). Officer Drew is blue-eyed, over 170 cm tall, and has long hair, so p(officer drew | c_j) = p(over_170cm = yes | c_j) * p(eye = blue | c_j) * ... This gives p(officer drew | Female) = 2/5 * 3/5 * ... and p(officer drew | Male) = 2/3 * 2/3 * ...
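
A quick numeric check of the naïve product. The slide shows only the first two factors; the long-hair factors (4/5 for females, 1/3 for males) are computed here from the same eight-row table on slide 50:

```python
from math import prod

# p(over 170cm = yes | sex), p(eye = blue | sex), p(hair = long | sex)
female_factors = [2 / 5, 3 / 5, 4 / 5]
male_factors   = [2 / 3, 2 / 3, 1 / 3]

print("p(drew's features | Female) =", prod(female_factors))  # 0.192
print("p(drew's features | Male)   =", prod(male_factors))    # 0.148...
```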

53 The naïve Bayes classifier is often represented as this type of graph: the class node c_j at the root, with an arrow to each feature term p(d_1 | c_j), p(d_2 | c_j), ..., p(d_n | c_j). Note the direction of the arrows, which state that each class causes certain features with a certain probability.

54 Naïve Bayes is fast and space efficient. We can look up all the probabilities with a single scan of the database and store them in a (small) table, one per (feature, class) pair.
Sex vs. Over 190cm: Male: Yes 0.15, No 0.85; Female: Yes 0.01, No 0.99.
Sex vs. Long Hair: Male: Yes 0.05, No 0.95; Female: Yes 0.70, No 0.30.
(A third table holds the class priors p(Male) and p(Female); its values did not survive transcription.)
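
A sketch of that single scan in Python: one pass over the rows accumulates the class counts and every per-(feature, class) value table. Function and variable names are illustrative.

```python
from collections import Counter, defaultdict

def learn_tables(rows, features, label):
    """One scan over the data; returns class priors and p(value | class)
    lookup tables, one per (feature, class) pair."""
    class_counts = Counter()
    cond = defaultdict(Counter)
    for row in rows:
        c = row[label]
        class_counts[c] += 1
        for f in features:
            cond[(f, c)][row[f]] += 1
    n = sum(class_counts.values())
    priors = {c: k / n for c, k in class_counts.items()}
    tables = {fc: {v: k / sum(cnt.values()) for v, k in cnt.items()}
              for fc, cnt in cond.items()}
    return priors, tables
```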

55 Naïve Bayes is NOT sensitive to irrelevant features. Suppose we are trying to classify a person's sex based on several features, including eye color. (Of course, eye color is completely irrelevant to a person's gender.) p(jessica | c_j) = p(eye = brown | c_j) * p(wears_dress = yes | c_j) * ... So p(jessica | Female) = 9,000/10,000 * 9,975/10,000 * ... and p(jessica | Male) = 9,001/10,000 * 2/10,000 * ... The eye-color factors are almost the same! However, this assumes that we have good enough estimates of the probabilities, so the more data the better.

56 An obvious point: I have used a simple two-class problem, with two possible values for each feature, in my previous examples. However, we can have an arbitrary number of classes, or feature values.
Animal vs. Mass > 10kg: Cat: Yes 0.15, No 0.85; Dog: Yes 0.91, No 0.09; Pig: Yes 0.99, No 0.01.
Animal vs. Color: Cat: Black 0.33, White 0.23, Brown 0.44; Dog: Black 0.97, White 0.03, Brown 0.90; Pig: Black 0.04, White 0.01.
(A third table holds the class priors p(Cat), p(Dog), p(Pig); its values did not survive transcription.)

57 Problem! Naïve Bayes assumes independence of features, so p(d | c_j) factors as p(d_1 | c_j) * p(d_2 | c_j) * ... * p(d_n | c_j).
Sex vs. Over 6 foot: Male: Yes 0.15, No 0.85; Female: Yes 0.01, No 0.99.
Sex vs. Over 200 pounds: Male: Yes 0.11, No 0.80; Female: Yes 0.05, No 0.95.
But height and weight are clearly not independent.

58 Solution: consider the relationships between attributes.
Sex vs. Over 6 foot: Male: Yes 0.15, No 0.85; Female: Yes 0.01, No 0.99.
Sex vs. Over 200 pounds, conditioned on height: Male: Yes and over 6 foot 0.11; No and over 6 foot 0.59; Yes and NOT over 6 foot 0.05; No and NOT over 6 foot 0.35.

59 Solution: consider the relationships between attributes. But how do we find the set of connecting arcs?

60 The naïve Bayesian classifier has a piecewise quadratic decision boundary. [Figure: decision regions for Katydids, Grasshoppers, and Ants. Adapted from a slide by Ricardo Gutierrez-Osuna.]

61 Which of the Pigeon Problems can be solved by a decision tree?

62 Advantages/Disadvantages of Naïve Bayes. Advantages: fast to train (a single scan); fast to classify; not sensitive to irrelevant features; handles real and discrete data; handles streaming data well. Disadvantages: assumes independence of features.

63 Summary. We have seen the four most common algorithms used for classification. We have seen that there is no one best algorithm. We have seen that issues like normalizing, cleaning, and converting the data can make a huge difference. We have only scratched the surface! How do we learn with no class labels? (clustering) How do we learn with expensive class labels? (active learning) How do we spot outliers? (anomaly detection) How do we ...? Popular science book: The Master Algorithm by Pedro Domingos. Textbook: Data Mining by Charu C. Aggarwal.

64 [image-only slide]

65 Malaria. Malaria afflicts about 4% of all humans, killing one million of them each year.

66 [image-only slide]

67 Malaria Deaths (2003)

68 There are interventions to mitigate the problem. A recent meta-review of randomized controlled trials of Insecticide Treated Nets (ITNs) found that ITNs can reduce malaria-related deaths in children by one fifth, and episodes of malaria by half. Mosquito nets work!

69 How do we know where to do the interventions, given that we have finite resources?

70 One second of audio from our sensor. The Common Eastern Bumble Bee (Bombus impatiens) takes about one tenth of a second to pass the laser. [Waveform: background noise; bee begins to cross the laser; bee has passed through the laser.]

71 [image-only slide]

72 One second of audio from the laser sensor; only Bombus impatiens (Common Eastern Bumble Bee) is in the insectary. [Figure: the waveform, showing background noise and the bee beginning to cross the laser, and the single-sided amplitude spectrum of Y(t) versus frequency (Hz), showing an interference spike, a peak at 197 Hz, and its harmonics.]

73 [Figure: amplitude spectrum Y(f) versus frequency (Hz).]

74 [Figure: amplitude spectrum Y(f) versus frequency (Hz).]

75 [Figure: histogram of wing beat frequency (Hz).]

76 Anopheles stephensi is a primary mosquito vector of malaria. The yellow fever mosquito (Aedes aegypti) can spread the dengue fever, chikungunya, and yellow fever viruses. [Figure: wing beat frequency (Hz) distributions for the two species.]

77 Anopheles stephensi: female mean = 475, std = 30. Aedes aegypti: female mean = 567, std = 43. If I see an insect with a wingbeat frequency of 500, what is it? P(wingbeat = 500 | Anopheles) = (1 / (sqrt(2*pi) * 30)) * e^(-(500 - 475)^2 / (2 * 30^2)).
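
The Gaussian can be evaluated directly. A small sketch using the stated means and standard deviations:

```python
import math

def density(x, mean, std):
    """Normal class-conditional density p(wingbeat = x | species)."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

p_anopheles = density(500, 475, 30)  # ~0.0094
p_aedes     = density(500, 567, 43)  # ~0.0028
print("Anopheles stephensi" if p_anopheles > p_aedes else "Aedes aegypti")
```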

78 What is the error rate? 12.2% of the area under the pink curve; 8.02% of the area under the red curve. Can we get more features?

79 Circadian Features. Aedes aegypti (yellow fever mosquito). [Figure: activity level from midnight to midnight, with dawn and dusk marked.]

80 Suppose I observe an insect with a wingbeat frequency of 420 Hz. What is it?

81 Suppose I observe an insect with a wingbeat frequency of 420 Hz at 11:00am. What is it? [Figure: circadian activity from midnight to midnight.]

82 Suppose I observe an insect with a wingbeat frequency of 420 Hz at 11:00am. What is it? P(Culex | [420Hz, 11:00am]) = (6 / (6 + 6 + 0)) * (2 / (2 + 4 + 3)) = 0.111. P(Anopheles | [420Hz, 11:00am]) = (6 / (6 + 6 + 0)) * (4 / (2 + 4 + 3)) = 0.222. P(Aedes | [420Hz, 11:00am]) = (0 / (6 + 6 + 0)) * (3 / (2 + 4 + 3)) = 0.000.
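
The same computation as a sketch, with the histogram counts from the slide. Each factor is normalized by the total count of insects observed at that feature value, matching the pattern of slides 41-43:

```python
wingbeat_420 = {"Culex": 6, "Anopheles": 6, "Aedes": 0}
time_11am    = {"Culex": 2, "Anopheles": 4, "Aedes": 3}

wb_total = sum(wingbeat_420.values())  # 12
t_total = sum(time_11am.values())      # 9
for species in wingbeat_420:
    score = (wingbeat_420[species] / wb_total) * (time_11am[species] / t_total)
    print(species, round(score, 3))
# Culex 0.111, Anopheles 0.222, Aedes 0.0 -> classify as Anopheles
```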

83 Blue Sky Ideas. Once you have a classifier working, you begin to see new uses for it. Let us see some examples.

84 Capturing or killing individually targeted insects. Most efforts to capture or kill insects are shotgun approaches: many non-targeted insects (including beneficial ones) are killed or captured. In some cases the ratios are 1,000 to 1 (i.e., 1,000 non-targeted insects are affected for each one that was targeted). We believe our sensors allow an ultra-precise approach, with a ratio approaching 1 to 1. This has obvious implications for SIT/metagenomics.

85 Kill. It seems obvious you could kill a mosquito with a powerful enough laser and with enough time, but we need to do it fast, with as little power as possible. We have gotten this down to 1/20th of a second, and just 1 watt (and falling). The mosquitoes may survive the laser strike, but they cannot fly away (as was the case in the photo shown at right). We are building a SIT "Hotel California" for female mosquitoes (you can check out anytime you like, but you can never leave). [Photos: Culex tarsalis; zoom-in after removing the wing.] Collaboration with UCR mechanical engineers Amir Rose and Dr. Guillermo Aguilar.

86 Capture. We envision building robotic traps that can be left in the field and programmed with different sampling missions. Such traps could be placed and retrieved by drones. Capturing live insects is important if you want to do metagenomics. Some examples of sampling missions: Capture examples of gravid{aedes aegypti}. Capture insects marked{Cripple(left-C right-s)}. Capture examples of insects that are NOT Anopheles AND have a wingbeat frequency > 400 (to exclude bees, etc.). Capture examples of any insects with a wingbeat frequency > 500, encountered between 4:00am and 4:10am. Capture examples of fed{anopheles gambiae} OR fed{anopheles quadriannulatus} OR fed{anopheles melas}.

87 Capture. About 10% of the insects captured by Venus fly traps are flying insects. We believe that we can build inexpensive mechanical traps that can capture sex- and species-targeted insects, using the same example sampling missions as on the previous slide.

88 Classification Problem: Fourth Amendment Cases before the Supreme Court II. The Supreme Court's search and seizure decisions, by term. Keogh vs. State of California = {0,1,1,0,0,0,1,0}. U = Unreasonable, R = Reasonable.

89 We can also learn decision trees for individual Supreme Court members. Using similar decision trees for the other eight justices, these models correctly predicted the majority opinion in 75 percent of the cases, substantially outperforming the experts' 59 percent. [Figure: decision tree for Supreme Court Justice Sandra Day O'Connor.]
