Lecture Notes for Chapter 5. Introduction to Data Mining


1 Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

2 Rule-Based Classifier Classify records by using a collection of if...then... rules Rule: (Condition) → y, where Condition is a conjunction of attribute tests and y is the class label LHS: rule antecedent or condition RHS: rule consequent Condition = (A1 op v1) ∧ (A2 op v2) ∧ ... ∧ (Ak op vk) Examples of classification rules: (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds (Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2

3 Rule-based Classifier (Example) Name Blood Type Give Birth Can Fly Live in Water Class human warm yes no no mammals python cold no no no reptiles salmon cold no no yes fishes whale warm yes no yes mammals frog cold no no sometimes amphibians komodo cold no no no reptiles bat warm yes yes no mammals pigeon warm no yes no birds cat warm yes no no mammals leopard shark cold yes no yes fishes turtle cold no no sometimes reptiles penguin warm no no sometimes birds porcupine warm yes no no mammals eel cold no no yes fishes salamander cold no no sometimes amphibians gila monster cold no no no reptiles platypus warm no no no mammals owl warm no yes no birds dolphin warm yes no yes mammals eagle warm no yes no birds R1: (Give Birth = no) (Can Fly = yes) Birds R2: (Give Birth = no) (Live in Water = yes) Fishes R3: (Give Birth = yes) (Blood Type = warm) Mammals R4: (Give Birth = no) (Can Fly = no) Reptiles R5: (Live in Water = sometimes) Amphibians Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 3

4 Application of Rule-Based Classifier A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule R1: (Give Birth = no) (Can Fly = yes) Birds R2: (Give Birth = no) (Live in Water = yes) Fishes R3: (Give Birth = yes) (Blood Type = warm) Mammals R4: (Give Birth = no) (Can Fly = no) Reptiles R5: (Live in Water = sometimes) Amphibians Name Blood Type Give Birth Can Fly Live in Water Class hawk warm no yes no? grizzly bear warm yes no no? The rule R1 covers a hawk => Bird The rule R3 covers the grizzly bear => Mammal Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 4

5 10 Rule Coverage and Accuracy Coverage of a rule: Fraction of records that satisfy the antecedent of a rule Accuracy of a rule: Fraction of records that satisfy both the antecedent and consequent of a rule Tid Refund Marital Status Taxable Income Class 1 Yes Single 125K No 2 No Married 100K No 3 No Single 70K No 4 Yes Married 120K No 5 No Divorced 95K Yes 6 No Married 60K No 7 Yes Divorced 220K No 8 No Single 85K Yes 9 No Married 75K No 10 No Single 90K Yes Ex: (Status=Single) No Coverage = 40%, Accuracy = 50% Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 5

6 How does Rule-based Classifier Work? R1: (Give Birth = no) (Can Fly = yes) Birds R2: (Give Birth = no) (Live in Water = yes) Fishes R3: (Give Birth = yes) (Blood Type = warm) Mammals R4: (Give Birth = no) (Can Fly = no) Reptiles R5: (Live in Water = sometimes) Amphibians Name Blood Type Give Birth Can Fly Live in Water Class lemur warm yes no no? turtle cold no no sometimes? dogfish shark cold yes no yes? A lemur triggers rule R3, so it is classified as a mammal A turtle triggers both R4 and R5 A dogfish shark triggers none of the rules Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 6

7 Characteristics of Rule-Based Classifier Mutually exclusive rules Classifier contains mutually exclusive rules if the rules are independent of each other Every record is covered by at most one rule Exhaustive rules Classifier has exhaustive coverage if it accounts for every possible combination of attribute values Each record is covered by at least one rule Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 7

8 From Decision Trees To Rules Classification Rules Yes NO Refund {Single, Divorced} Taxable Income No Marital Status < 80K > 80K {Married} NO (Refund=Yes) ==> No (Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No (Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes (Refund=No, Marital Status={Married}) ==> No NO YES Rules are mutually exclusive and exhaustive Rule set contains as much information as the tree Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 8

9 10 Rules Can Be Simplified Yes NO NO Refund {Single, Divorced} Taxable Income No Marital Status < 80K > 80K YES {Married} NO Tid Refund Marital Status Taxable Income 1 Yes Single 125K No 2 No Married 100K No 3 No Single 70K No 4 Yes Married 120K No Cheat 5 No Divorced 95K Yes 6 No Married 60K No 7 Yes Divorced 220K No 8 No Single 85K Yes 9 No Married 75K No 10 No Single 90K Yes Initial Rule: (Refund=No) (Status=Married) No Simplified Rule: (Status=Married) No Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 9

10 Effect of Rule Simplification Rules are no longer mutually exclusive A record may trigger more than one rule Solution? Ordered rule set Unordered rule set use voting schemes Rules are no longer exhaustive A record may not trigger any rules Solution? Use a default class Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

11 Ordered Rule Set Rules are rank ordered according to their priority An ordered rule set is known as a decision list When a test record is presented to the classifier It is assigned to the class label of the highest ranked rule it has triggered If none of the rules fired, it is assigned to the default class R1: (Give Birth = no) (Can Fly = yes) Birds R2: (Give Birth = no) (Live in Water = yes) Fishes R3: (Give Birth = yes) (Blood Type = warm) Mammals R4: (Give Birth = no) (Can Fly = no) Reptiles R5: (Live in Water = sometimes) Amphibians Name Blood Type Give Birth Can Fly Live in Water Class turtle cold no no sometimes? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

12 Rule Ordering Schemes Rule-based ordering Individual rules are ranked based on their quality Class-based ordering Rules that belong to the same class appear together Rule-based Ordering (Refund=Yes) ==> No (Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No (Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes (Refund=No, Marital Status={Married}) ==> No Class-based Ordering (Refund=Yes) ==> No (Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No (Refund=No, Marital Status={Married}) ==> No (Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

13 Building Classification Rules Direct Method: Extract rules directly from data, e.g. RIPPER, CN2, Holte's 1R. Holte's 1R Algorithm: for each attribute, for each value of that attribute, find the class distribution for attribute=value; conc = most frequent class; make rule: attribute=value -> conc; calculate the error rate of the rule set; select the rule set with the lowest error rate. Indirect Method: Extract rules from other classification models (e.g. decision trees, neural networks), e.g. C4.5rules Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
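
A minimal Python sketch of Holte's 1R as described in the pseudocode above; the function name, the dictionary-of-records data layout, and the toy animal data are illustrative assumptions rather than part of the original algorithm description.

    from collections import Counter, defaultdict

    def one_r(records, class_key):
        """Holte's 1R: build one rule set per attribute, keep the one with the lowest error rate."""
        best_rules, best_errors = None, float("inf")
        attributes = [a for a in records[0] if a != class_key]
        for attr in attributes:
            dist = defaultdict(Counter)          # class distribution for each value of this attribute
            for r in records:
                dist[r[attr]][r[class_key]] += 1
            # rule: attribute=value -> most frequent class; errors = covered records not in that class
            rules = {v: counts.most_common(1)[0][0] for v, counts in dist.items()}
            errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in dist.values())
            if errors < best_errors:
                best_rules, best_errors = (attr, rules), errors
        return best_rules, best_errors / len(records)

    # toy example in the spirit of the vertebrate data
    animals = [
        {"Blood Type": "warm", "Give Birth": "yes", "Class": "mammals"},
        {"Blood Type": "cold", "Give Birth": "no", "Class": "reptiles"},
        {"Blood Type": "warm", "Give Birth": "no", "Class": "birds"},
    ]
    print(one_r(animals, "Class"))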

14 Direct Method: Sequential Covering 1. Start from an empty rule 2. Grow a rule using the Learn-One-Rule function 3. Remove training records covered by the rule 4. Repeat Step (2) and (3) until stopping criterion is met Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
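
A sketch of this loop in Python; learn_one_rule is left abstract (any Learn-One-Rule strategy, such as the general-to-specific growing shown later), and the covers helper, the rule representation, and the minimum-coverage stopping criterion are illustrative assumptions.

    def covers(rule, record):
        """A rule's antecedent is assumed to be a dict of attribute tests, e.g. {"Give Birth": "no"}."""
        return all(record.get(a) == v for a, v in rule["antecedent"].items())

    def sequential_covering(records, target_class, learn_one_rule, min_coverage=1):
        """Greedily learn rules for target_class, removing covered records after each rule."""
        rules, remaining = [], list(records)
        while any(r["Class"] == target_class for r in remaining):
            rule = learn_one_rule(remaining, target_class)            # step 2: grow a rule
            covered = [r for r in remaining if covers(rule, r)]       # records satisfying the antecedent
            if len(covered) < min_coverage:                           # assumed stopping criterion
                break
            rules.append(rule)
            remaining = [r for r in remaining if not covers(rule, r)] # step 3: remove covered records
        return rules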

15 Example of Sequential Covering (i) Original Data (ii) Step 1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

16 Example of Sequential Covering R1 R1 R2 (iii) Step 2 (iv) Step 3 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

17 Aspects of Sequential Covering Rule Growing Instance Elimination Rule Evaluation Stopping Criterion Rule Pruning Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

18 Rule Growing Two common strategies { } Yes: 3 No: 4 Refund=No, Status=Single, Income=85K (Class=Yes) Refund=No, Status=Single, Income=90K (Class=Yes) Refund= No Yes: 3 No: 4 Status = Single Yes: 2 No: 1 Status = Divorced Yes: 1 No: 0 Status = Married Yes: 0 No: 3... Income > 80K Yes: 3 No: 1 Refund=No, Status = Single (Class = Yes) (a) General-to-specific (b) Specific-to-general Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

19 Rule Growing (Examples) CN2 Algorithm: Start from an empty conjunct: {} Add the conjunct that minimizes the entropy measure: {A}, {A,B}, ... Determine the rule consequent by taking the majority class of instances covered by the rule RIPPER Algorithm: Start from an empty rule: {} => class Add the conjunct that maximizes FOIL's information gain measure: R0: {} => class (initial rule) R1: {A} => class (rule after adding conjunct) Gain(R0, R1) = t × [ log2( p1/(p1+n1) ) − log2( p0/(p0+n0) ) ] where t: number of positive instances covered by both R0 and R1 p0: number of positive instances covered by R0 n0: number of negative instances covered by R0 p1: number of positive instances covered by R1 n1: number of negative instances covered by R1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
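
FOIL's information gain written out as a small Python helper; the function name is illustrative. Using the counts from the rule-growing figure on the previous slide (Refund=No covers 3 positives and 4 negatives, adding Status=Single leaves 2 positives and 1 negative), the gain is about 1.27 bits.

    from math import log2

    def foil_gain(p0, n0, p1, n1):
        """Gain from extending rule R0 (p0 pos / n0 neg covered) to R1 (p1 pos / n1 neg covered).
        t, the positives covered by both rules, equals p1 when R1 specializes R0."""
        t = p1
        return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

    print(round(foil_gain(p0=3, n0=4, p1=2, n1=1), 2))  # ~1.27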

20 Aspects of Sequential Covering Rule Growing Instance Elimination Rule Evaluation Stopping Criterion Rule Pruning Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

21 Instance Elimination Why do we need to eliminate instances? Otherwise, the next rule is identical to previous rule Why do we remove positive instances? Ensure that the next rule is different Why do we remove negative instances? Prevent underestimating accuracy of rule Compare rules R2 and R3 in the diagram class = + class = - R1 R3 R Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

22 Aspects of Sequential Covering Rule Growing Instance Elimination Rule Evaluation Stopping Criterion Rule Pruning Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

23 Rule Evaluation Metrics: Accuracy = nc / n Laplace = (nc + 1) / (n + k) M-estimate = (nc + k·p) / (n + k) FOIL's gain = pos' × [ log2( pos'/(pos'+neg') ) − log2( pos/(pos+neg) ) ] pos/neg are the # of positive/negative tuples covered by the rule R; pos'/neg' are those covered by the extended rule R' n : number of instances covered by rule nc : number of instances covered by rule that belong to the rule's class k : number of classes p : prior probability Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

24 Aspects of Sequential Covering Rule Growing Instance Elimination Rule Evaluation Stopping Criterion Rule Pruning Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

25 Stopping Criterion and Rule Pruning Stopping criterion: Compute the gain; if the gain is not significant, discard the new rule Rule Pruning: Similar to post-pruning of decision trees Metric: FOIL_Prune(R) = (pos − neg) / (pos + neg) Reduced Error Pruning: Remove one of the conjuncts in the rule Compare the error rate on the validation set before and after pruning If the error improves, prune the conjunct Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

26 Summary of Direct Method Grow a single rule Remove Instances from rule Prune the rule (if necessary) Add rule to Current Rule Set Repeat Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

27 Direct Method: RIPPER For 2-class problem, choose one of the classes as positive class, and the other as negative class Learn rules for positive class Negative class will be default class For multi-class problem Order the classes according to increasing class prevalence (fraction of instances that belong to a particular class) Learn the rule set for smallest class first, treat the rest as negative class Repeat with next smallest class as positive class Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

28 Direct Method: RIPPER Growing a rule: Start from empty rule Add conjuncts as long as they improve FOIL s information gain Stop when rule no longer covers negative examples Prune the rule immediately using incremental reduced error pruning Measure for pruning: v = (p-n)/(p+n) p: number of positive examples covered by the rule in the validation set n: number of negative examples covered by the rule in the validation set Pruning method: delete any final sequence of conditions that maximizes v Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

29 Direct Method: RIPPER Building a Rule Set: Use sequential covering algorithm Finds the best rule that covers the current set of positive examples Eliminate both positive and negative examples covered by the rule Each time a rule is added to the rule set, compute the new description length stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

30 Direct Method: RIPPER Optimize the rule set: For each rule r in the rule set R Consider 2 alternative rules: Replacement rule (r*): grow new rule from scratch Revised rule(r ): add conjuncts to extend the rule r Compare the rule set for r against the rule set for r* and r Choose rule set that minimizes MDL principle Repeat rule generation and rule optimization for the remaining positive examples Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

31 Indirect Methods P Q No Yes R Rule Set No Yes No Yes Q No Yes - + r1: (P=No,Q=No) ==> - r2: (P=No,Q=Yes) ==> + r3: (P=Yes,R=No) ==> + r4: (P=Yes,R=Yes,Q=No) ==> - r5: (P=Yes,R=Yes,Q=Yes) ==> + Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

32 Indirect Method: C4.5rules Extract rules from an unpruned decision tree For each rule r: A → y, consider an alternative rule r': A' → y, where A' is obtained by removing one of the conjuncts in A Compare the pessimistic error rate of r against that of each r' Prune if one of the r' has a lower pessimistic error rate Repeat until we can no longer improve the generalization error Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

33 Indirect Method: C4.5rules Instead of ordering the rules, order subsets of rules (class ordering) Each subset is a collection of rules with the same rule consequent (class) Compute description length of each subset Description length = L(error) + g L(model) g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

34 Example Name Give Birth Lay Eggs Can Fly Live in Water Have Legs Class human yes no no no yes mammals python no yes no no no reptiles salmon no yes no yes no fishes whale yes no no yes no mammals frog no yes no sometimes yes amphibians komodo no yes no no yes reptiles bat yes no yes no yes mammals pigeon no yes yes no yes birds cat yes no no no yes mammals leopard shark yes no no yes no fishes turtle no yes no sometimes yes reptiles penguin no yes no sometimes yes birds porcupine yes no no no yes mammals eel no yes no yes no fishes salamander no yes no sometimes yes amphibians gila monster no yes no no yes reptiles platypus no yes no no yes mammals owl no yes yes no yes birds dolphin yes no no yes no mammals eagle no yes yes no yes birds Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

35 C4.5 versus C4.5rules versus RIPPER Give Birth? C4.5rules: (Give Birth=No, Can Fly=Yes) Birds Yes No (Give Birth=No, Live in Water=Yes) Fishes (Give Birth=Yes) Mammals Mammals Live In Water? (Give Birth=No, Can Fly=No, Live in Water=No) Reptiles ( ) Amphibians Yes No RIPPER: (Live in Water=Yes) Fishes Sometimes (Have Legs=No) Reptiles Fishes Amphibians Can Fly? (Give Birth=No, Can Fly=No, Live In Water=No) Reptiles (Can Fly=Yes,Give Birth=No) Birds Yes No () Mammals Birds Reptiles Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

36 C4.5 versus C4.5rules versus RIPPER [Figure: two confusion matrices, one for C4.5/C4.5rules and one for RIPPER, with actual class in the rows and predicted class in the columns over Amphibians, Fishes, Reptiles, Birds, Mammals] Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

37 Advantages of Rule-Based Classifiers As highly expressive as decision trees Easy to interpret Easy to generate Can classify new instances rapidly Performance comparable to decision trees Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

38 Instance-Based Classifiers Set of Stored Cases Atr1... AtrN Class A B B C A C B Store the training records Use training records to predict the class label of unseen cases Unseen Case Atr1... AtrN Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

39 Instance Based Classifiers Examples: Rote-learner Memorizes entire training data and performs classification only if attributes of record match one of the training examples exactly Nearest neighbor Uses k closest points (nearest neighbors) for performing classification Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

40 Nearest Neighbor Classifiers Basic idea: If it walks like a duck, quacks like a duck, then it s probably a duck Compute Distance Test Record Training Records Choose k of the nearest records Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

41 Nearest-Neighbor Classifiers Unknown record Requires three things The set of stored records Distance Metric to compute distance between records The value of k, the number of nearest neighbors to retrieve To classify an unknown record: Compute distance to other training records Identify k nearest neighbors Use class labels of nearest neighbors to determine the class label of unknown record (e.g., by taking majority vote) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

42 Definition of Nearest Neighbor X X X (a) 1-nearest neighbor (b) 2-nearest neighbor (c) 3-nearest neighbor K-nearest neighbors of a record x are data points that have the k smallest distance to x Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

43 1 nearest-neighbor Voronoi Diagram Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

44 Nearest Neighbor Classification Compute distance between two points, e.g. Euclidean distance: d(p, q) = sqrt( Σ_i (p_i − q_i)² ) Determine the class from the nearest neighbor list: take the majority vote of class labels among the k-nearest neighbors Weigh the vote according to distance, e.g. weight factor w = 1/d² Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
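
A compact k-nearest-neighbor sketch using the Euclidean distance and the distance-weighted vote w = 1/d² described above; plain Python, with the tiny 2-D dataset an illustrative assumption.

    from collections import defaultdict
    from math import sqrt

    def knn_predict(train, query, k=3):
        """train: list of (feature_vector, label); distance-weighted majority vote of the k closest."""
        dists = sorted((sqrt(sum((p - q) ** 2 for p, q in zip(x, query))), y) for x, y in train)
        votes = defaultdict(float)
        for d, label in dists[:k]:
            votes[label] += 1.0 / (d ** 2 + 1e-12)   # w = 1/d^2; tiny epsilon avoids division by zero
        return max(votes, key=votes.get)

    train = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"), ((4.0, 4.2), "-"), ((4.5, 3.8), "-")]
    print(knn_predict(train, (1.1, 1.0), k=3))  # "+"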

45 Nearest Neighbor Classification Choosing the value of k: If k is too small, sensitive to noise points If k is too large, neighborhood may include points from other classes X Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

46 Nearest Neighbor Classification Scaling issues Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes Example: height of a person may vary from 1.5m to 1.8m weight of a person may vary from 90lb to 300lb income of a person may vary from $10K to $1M Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

47 Nearest Neighbor Classification Problem with Euclidean measure: High dimensional data: curse of dimensionality Can produce counter-intuitive results Solution: Normalize the vectors to unit length Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

48 Nearest neighbor Classification k-NN classifiers are lazy learners They do not build models explicitly Unlike eager learners such as decision tree induction and rule-based systems Classifying unknown records is relatively expensive Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

49 Example: PEBLS PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg) Works with both continuous and nominal features For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM) Each record is assigned a weight factor Number of nearest neighbors: k = 1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

50 Example: PEBLS Tid Refund Marital Status Taxable Income Cheat 1 Yes Single 125K No 2 No Married 100K No 3 No Single 70K No 4 Yes Married 120K No 5 No Divorced 95K Yes 6 No Married 60K No 7 Yes Divorced 220K No 8 No Single 85K Yes 9 No Married 75K No 10 No Single 90K Yes Class counts per nominal value (Yes / No): Marital Status: Single 2/2, Married 0/4, Divorced 1/1; Refund: Yes 0/3, No 3/4 Distance between nominal attribute values: d(V1, V2) = Σ_i | n1i/n1 − n2i/n2 | d(Single, Married) = |2/4 − 0/4| + |2/4 − 4/4| = 1 d(Single, Divorced) = |2/4 − 1/2| + |2/4 − 1/2| = 0 d(Married, Divorced) = |0/4 − 1/2| + |4/4 − 1/2| = 1 d(Refund=Yes, Refund=No) = |0/3 − 3/7| + |3/3 − 4/7| = 6/7 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
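
A sketch of the modified value difference metric computed from class counts, reproducing the d(Single, Married) style calculations above; the dictionary layout of the contingency counts is an illustrative assumption.

    def mvdm(counts_v1, counts_v2):
        """counts_vX: dict class -> number of records with that attribute value and class.
        d(V1, V2) = sum_i | n1i/n1 - n2i/n2 |"""
        n1, n2 = sum(counts_v1.values()), sum(counts_v2.values())
        classes = set(counts_v1) | set(counts_v2)
        return sum(abs(counts_v1.get(c, 0) / n1 - counts_v2.get(c, 0) / n2) for c in classes)

    # Marital Status class counts (Yes / No) from the 10-record table
    single, married, divorced = {"Yes": 2, "No": 2}, {"Yes": 0, "No": 4}, {"Yes": 1, "No": 1}
    print(mvdm(single, married))    # 1.0
    print(mvdm(single, divorced))   # 0.0
    print(mvdm(married, divorced))  # 1.0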

51 Example: PEBLS Tid Refund Marital Status Taxable Income Cheat X Yes Single 125K No Y No Married 100K No Distance between record X and record Y: Δ(X, Y) = w_X · w_Y · Σ_{i=1}^{d} d(X_i, Y_i)² where w_X = (number of times X is used for prediction) / (number of times X predicts correctly) w_X ≈ 1 if X makes accurate predictions most of the time w_X > 1 if X is not reliable for making predictions Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

52 Naïve Bayes Classifier Thomas Bayes Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

53 Antenna Length Grasshoppers Katydids Abdomen Length Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

54 Antenna Length With a lot of data, we can build a histogram. Let us just build one for Antenna Length for now Katydids Grasshoppers Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

55 We can leave the histograms as they are, or we can summarize them with two normal distributions. Let us use two normal distributions for ease of visualization in the following slides Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

56 P(c_j | d) = probability of class c_j, given that we have observed d P(Grasshopper | 3) = 10 / (10 + 2) = 0.833 P(Katydid | 3) = 2 / (10 + 2) = 0.167 Antennae length is 3 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

57 P(c_j | d) = probability of class c_j, given that we have observed d P(Grasshopper | 7) = 3 / (3 + 9) = 0.25 P(Katydid | 7) = 9 / (3 + 9) = 0.75 Antennae length is 7 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

58 P(c_j | d) = probability of class c_j, given that we have observed d P(Grasshopper | 5) = 6 / (6 + 6) = 0.5 P(Katydid | 5) = 6 / (6 + 6) = 0.5 Antennae length is 5? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

59 Bayesian Theorem Given training data X, the posterior probability of a hypothesis H, P(H|X), follows the Bayes theorem: P(H|X) = P(X|H) P(H) / P(X) Informally, this can be written as posterior = likelihood × prior / evidence Predict that X belongs to C_i iff the probability P(C_i|X) is the highest among all the P(C_k|X) for all the k classes Practical difficulty: requires initial knowledge of many probabilities => significant computational cost Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

60 Example Name/Sex pairs: Chris Male, Claudia Female, Chris Female, Chris Female, Alberto Male, Karin Female, Nina Female, Sergio Male, Chris Female Is Chris a Male or Female? P(H|X) = P(X|H) P(H) / P(X) P(Male | Chris) = (1/3 × 3/9) / (4/9) = 0.25 P(Female | Chris) = (3/6 × 6/9) / (4/9) = 0.75 => Chris is a Female Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ /03/26 60

61 Bayes Classifier A probabilistic framework for solving classification problems Conditional Probability: P(C|A) = P(A,C) / P(A), P(A|C) = P(A,C) / P(C) Bayes theorem: P(C|A) = P(A|C) P(C) / P(A) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

62 Bayesian Classifiers Consider each attribute and class label as random variables Given a record with attributes (A 1, A 2,,A n ) Goal is to predict class C Specifically, we want to find the value of C that maximizes P(C A 1, A 2,,A n ) Can we estimate P(C A 1, A 2,,A n ) directly from data? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

63 Bayesian Classifiers Approach: compute the posterior probability P(C | A1, A2, ..., An) for all values of C using the Bayes theorem: P(C | A1, A2, ..., An) = P(A1, A2, ..., An | C) P(C) / P(A1, A2, ..., An) Choose the value of C that maximizes P(C | A1, A2, ..., An) Equivalent to choosing the value of C that maximizes P(A1, A2, ..., An | C) P(C) How to estimate P(A1, A2, ..., An | C)? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

64 Naïve Bayes Classifier Assume independence among attributes Ai when the class is given: P(A1, A2, ..., An | Cj) = P(A1 | Cj) P(A2 | Cj) ... P(An | Cj) Estimating the joint probability of a full attribute combination, e.g. A1 (Height=170) and A2 (Weight=50), directly is difficult; estimating P(Ai | Cj) for each Ai and Cj separately is much easier A new point is classified to Cj if P(Cj) Π_i P(Ai | Cj) is maximal, since P(Cj | A1, A2, ..., An) = P(A1, A2, ..., An | Cj) P(Cj) / P(A1, A2, ..., An) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

65 How to Estimate Probabilities from Data? Tid Refund (categorical) Marital Status (categorical) Taxable Income (continuous) Evade (class) 1 Yes Single 125K No 2 No Married 100K No 3 No Single 70K No 4 Yes Married 120K No 5 No Divorced 95K Yes 6 No Married 60K No 7 Yes Divorced 220K No 8 No Single 85K Yes 9 No Married 75K No 10 No Single 90K Yes Class prior: P(C) = Nc / N, e.g., P(No) = 7/10, P(Yes) = 3/10 For discrete attributes: P(Ai | Ck) = |Aik| / Nc, where |Aik| is the number of instances having attribute value Ai and belonging to class Ck Examples: P(Status=Married | No) = 4/7, P(Refund=Yes | Yes) = 0 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
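
A sketch of estimating P(C) and P(Ai | Ck) for discrete attributes by counting, as described above; the dictionary-of-records layout is an illustrative assumption, and no smoothing is applied yet, so unseen value/class pairs get probability 0 (the Laplace and m-estimate corrections a few slides later address this).

    from collections import Counter, defaultdict

    def fit_discrete_nb(records, class_key):
        """Return class priors P(C) = Nc/N and a lookup for conditionals P(Ai=v | Ck) = |Aik|/Nc."""
        n = len(records)
        class_counts = Counter(r[class_key] for r in records)
        cond_counts = defaultdict(Counter)              # (attribute, class) -> Counter over values
        for r in records:
            for attr, value in r.items():
                if attr != class_key:
                    cond_counts[(attr, r[class_key])][value] += 1
        priors = {c: nc / n for c, nc in class_counts.items()}
        def conditional(attr, value, c):
            return cond_counts[(attr, c)][value] / class_counts[c]
        return priors, conditional

    # On the 10-record tax table this reproduces P(No) = 7/10 and P(Status=Married | No) = 4/7.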

66 Training dataset Class: C1:buys_computer= yes C2:buys_computer= no Data sample: X (age<=30, Income=medium, Student=yes, Credit_rating=Fair) age income student credit_rating buys_computer <=30 high no fair no <=30 high no excellent no high no fair yes >40 medium no fair yes >40 low yes fair yes >40 low yes excellent no low yes excellent yes <=30 medium no fair no <=30 low yes fair yes >40 medium yes fair yes <=30 medium yes excellent yes medium no excellent yes high yes fair yes >40 medium no excellent no Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

67 Naïve Bayesian Classifier: Example X = (age<=30, income=medium, student=yes, credit_rating=fair) P(buys_computer=yes) = 9/14, P(buys_computer=no) = 5/14 Compute P(X|Ci) for each class: P(age=<=30 | buys_computer=yes) = 2/9 = 0.222 P(income=medium | buys_computer=yes) = 4/9 = 0.444 P(student=yes | buys_computer=yes) = 6/9 = 0.667 P(credit_rating=fair | buys_computer=yes) = 6/9 = 0.667 => P(X | buys_computer=yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044 P(age=<=30 | buys_computer=no) = 3/5 = 0.6 P(income=medium | buys_computer=no) = 2/5 = 0.4 P(student=yes | buys_computer=no) = 1/5 = 0.2 P(credit_rating=fair | buys_computer=no) = 2/5 = 0.4 => P(X | buys_computer=no) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019 P(X|Ci) * P(Ci): P(X | buys_computer=yes) * P(buys_computer=yes) = 0.044 x 9/14 = 0.028 P(X | buys_computer=no) * P(buys_computer=no) = 0.019 x 5/14 = 0.007 => X belongs to class buys_computer=yes Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

68 Example of Naïve Bayes Classifier Given a Test Record: X = (Refund=No, Married, Income=120K) Naive Bayes Classifier: P(Refund=Yes | No) = 3/7 P(Refund=No | No) = 4/7 P(Refund=Yes | Yes) = 0 P(Refund=No | Yes) = 1 P(Marital Status=Single | No) = 2/7 P(Marital Status=Divorced | No) = 1/7 P(Marital Status=Married | No) = 4/7 P(Marital Status=Single | Yes) = 2/3 P(Marital Status=Divorced | Yes) = 1/3 P(Marital Status=Married | Yes) = 0 For taxable income: If class=No: sample mean=110, sample variance=2975 If class=Yes: sample mean=90, sample variance=25 P(X | Class=No) = P(Refund=No | Class=No) x P(Married | Class=No) x P(Income=120K | Class=No) = 4/7 x 4/7 x 0.0072 = 0.0024 P(X | Class=Yes) = P(Refund=No | Class=Yes) x P(Married | Class=Yes) x P(Income=120K | Class=Yes) = 1 x 0 x P(Income=120K | Class=Yes) = 0 Since P(X|No)P(No) > P(X|Yes)P(Yes), therefore P(No|X) > P(Yes|X) => Class = No Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

69 Example of Naïve Bayes Classifier Name Give Birth Can Fly Live in Water Have Legs Class human yes no no yes mammals python no no no no non-mammals salmon no no yes no non-mammals whale yes no yes no mammals frog no no sometimes yes non-mammals komodo no no no yes non-mammals bat yes yes no yes mammals pigeon no yes no yes non-mammals cat yes no no yes mammals leopard shark yes no yes no non-mammals turtle no no sometimes yes non-mammals penguin no no sometimes yes non-mammals porcupine yes no no yes mammals eel no no yes no non-mammals salamander no no sometimes yes non-mammals gila monster no no no yes non-mammals platypus no no no yes mammals owl no yes no yes non-mammals dolphin yes no yes no mammals eagle no yes no yes non-mammals A: attributes, M: mammals, N: non-mammals Test record A: Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no, Class=? P(A|M) = 6/7 x 6/7 x 2/7 x 2/7 = 0.06 P(A|N) = 1/13 x 10/13 x 3/13 x 4/13 = 0.0042 P(A|M) P(M) = 0.06 x 7/20 = 0.021 P(A|N) P(N) = 0.0042 x 13/20 = 0.0027 P(A|M)P(M) > P(A|N)P(N) => Mammals Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

70 How to Estimate Probabilities from Data? For continuous attributes: Discretize the range into bins one ordinal attribute per bin violates independence assumption Two-way split: (A < v) or (A > v) choose only one of the two splits as new attribute Probability density estimation: Assume attribute follows a normal distribution Use data to estimate parameters of distribution (e.g., mean and standard deviation) Once probability distribution is known, can use it to estimate the conditional probability P(A i c) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

71 How to Estimate Probabilities from Data? Tid Refund (categorical) Marital Status (categorical) Taxable Income (continuous) Evade (class) 1 Yes Single 125K No 2 No Married 100K No 3 No Single 70K No 4 Yes Married 120K No 5 No Divorced 95K Yes 6 No Married 60K No 7 Yes Divorced 220K No 8 No Single 85K Yes 9 No Married 75K No 10 No Single 90K Yes Normal distribution, one for each (Ai, ci) pair: P(Ai | cj) = 1 / sqrt(2 π σij²) × exp( −(Ai − μij)² / (2 σij²) ) For (Income, Class=No): if Class=No, sample mean = 110, sample variance = 2975 P(Income=120 | No) = 1 / (sqrt(2 π) × 54.54) × exp( −(120 − 110)² / (2 × 2975) ) = 0.0072 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
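
The Gaussian estimate above as a short Python function, reproducing P(Income=120 | No) from the sample mean 110 and sample variance 2975 given in the slide.

    from math import exp, pi, sqrt

    def gaussian_conditional(x, mean, variance):
        """P(Ai = x | c) under a normal distribution fitted for one (attribute, class) pair."""
        return exp(-(x - mean) ** 2 / (2 * variance)) / sqrt(2 * pi * variance)

    print(round(gaussian_conditional(120, mean=110, variance=2975), 4))  # ~0.0072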

72 Naïve Bayes Classifier If one of the conditional probabilities is zero, then the entire expression becomes zero Probability estimation: Original: P(Ai | C) = Nic / Nc Laplace: P(Ai | C) = (Nic + 1) / (Nc + c) m-estimate: P(Ai | C) = (Nic + m·p) / (Nc + m) c: number of classes p: prior probability m: parameter Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
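
The three estimators above in one small helper; c is the number of classes for the Laplace correction and m, p parameterize the m-estimate, as in the slide. The example shows how Laplace smoothing turns the troublesome P(Refund=Yes | Yes) = 0/3 into a non-zero value.

    def conditional_estimate(n_ic, n_c, method="original", c=2, m=0, p=0.0):
        """Original: n_ic/n_c; Laplace: (n_ic+1)/(n_c+c); m-estimate: (n_ic+m*p)/(n_c+m)."""
        if method == "laplace":
            return (n_ic + 1) / (n_c + c)
        if method == "m-estimate":
            return (n_ic + m * p) / (n_c + m)
        return n_ic / n_c

    print(conditional_estimate(0, 3))                  # 0.0
    print(conditional_estimate(0, 3, "laplace", c=2))  # 0.2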

73 Naïve Bayes (Summary) Robust to isolated noise points Handle missing values by ignoring the instance during probability estimate calculations Robust to irrelevant attributes Independence assumption may not hold for some attributes Use other techniques such as Bayesian Belief Networks (BBN) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

74 Naïve Bayesian Classifier: Comments Advantages : Easy to implement Good results obtained in most of the cases Disadvantages Assumption: class conditional independence Practically, dependencies exist among variables E.g., hospitals: patients: Profile: age, family history etc Symptoms: fever, cough etc., Disease: lung cancer, diabetes etc Dependencies among these cannot be modeled by Naïve Bayesian Classifier How to deal with these dependencies? Bayesian Belief Networks Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

75 Bayesian Networks A Bayesian belief network allows a subset of the variables to be conditionally independent A graphical model of causal relationships Represents dependency among the variables Gives a specification of the joint probability distribution [Example graph with nodes X, Y, Z, P] Nodes: random variables Links: dependency X and Y are the parents of Z, and Y is the parent of P No dependency between Z and P Has no loops or cycles Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ /03/26 76

76 Bayesian Belief Network: An Example One conditional probability table (CPT) for each variable [Figure: network with nodes Family History, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea] The CPT for the variable LungCancer shows the conditional probability of LC and ~LC for each possible combination of its parents: (FH, S), (FH, ~S), (~FH, S), (~FH, ~S) Derivation of the probability of a particular combination of values of X from the CPTs: P(x1, ..., xn) = Π_{i=1}^{n} P(xi | Parents(Yi)) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

77 Classification: A Mathematical Mapping Classification: predicts categorical class labels E.g., personal homepage classification: xi = (x1, x2, x3, ...), yi = +1 or −1 x1: # of occurrences of the word homepage x2: # of occurrences of the word welcome x3: ... Mathematically, x ∈ X = ℜ^n, y ∈ Y = {+1, −1} We want a function f: X → Y Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ /03/26 78

78 Linear Classification Binary classification problem [Figure: 2-D scatter of x and o points separated by a red line] The data above the red line belong to class x The data below the red line belong to class o Examples: Perceptron, SVM, Probabilistic Classifiers Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

79 Neural Networks Analogy to Biological Systems (Indeed a great example of a good learning system) Massive Parallelism allowing for computational efficiency The first learning algorithm came in 1959 (Rosenblatt) If a target output value is provided for a single neuron with fixed inputs, one can incrementally change weights to learn to produce these outputs using the perceptron learning rule Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

80 Neural Networks [Figure: a biological neuron (dendrites, axon, terminal branches of axon) alongside an artificial neuron with inputs x1, x2, x3, ..., xn, weights w1, w2, w3, ..., wn and a summation unit S] Binary classifier functions Threshold activation function Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

81 Activation Function Step / threshold activation function Sigmoid: y_hid(u) = 1 / (1 + e^(−u)) Linear: y = u = Σ_{n=1}^{N} w_n x_n Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

82 Neural Networks The general artificial neuron model has five components, shown in the following list 1. A set of inputs, x i. 2. A set of weights, w i. 3. A bias, u. A Neuron 4. An activation function, f. x 0 w 0 - k 5. Neuron output, y x 1 x n w 1 w n f output y Input vector x weight vector w weighted sum Activation function Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

83 Artificial Neural Networks (ANN) X 1 X 2 X 3 Y Input X 1 X 2 X 3 Black box Output Y Output Y is 1 if at least two of the three inputs are equal to 1. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

84 Artificial Neural Networks (ANN) [Figure: input nodes X1, X2, X3 connected with weight 0.3 each to an output node Y with threshold t = 0.4] ŷ = 1 if 0.3 x1 + 0.3 x2 + 0.3 x3 − 0.4 > 0, and ŷ = −1 otherwise Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

85 Artificial Neural Networks (ANN) Model is an assembly of inter-connected nodes and weighted links [Figure: input nodes X1, X2, X3 with weights w1, w2, w3 feeding an output node Y] The output node sums up its input values according to the weights of its links and compares the result against some threshold t Perceptron Model: Y = I( Σ_i w_i X_i − t > 0 ) or Y = sign( Σ_i w_i X_i − t ) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

86 General Structure of ANN x 1 x 2 x 3 x 4 x 5 Input Layer Input Neuron i Output Hidden Layer I 1 I 2 I 3 w i1 w i2 w i3 S i Activation function g(s i ) O i O i threshold, t Output Layer Training ANN means learning the weights of the neurons y Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

87 Layered Networks Inputs. Hidden Neurons Output Neurons Outputs x y 1 h1 (u) y o1 (u) y 1 x y 2 h2 (u) y o2 (u) y 2 x n y hn (u) y om (u) y m Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

88 Perceptron Learning Algorithm 1. Let D = {(xi, yi) | i=1,2,...,n} be the training instances 2. Initialize the weight vector with random values w(0) 3. repeat 4. for each instance (xi, yi) in D 5. Compute the predicted output ŷi(k) 6. for each weight wj do 7. wj(k+1) = wj(k) + λ (yi − ŷi(k)) xij 8. end for 9. end for 10. until stopping condition is met (λ is the learning rate; yi − ŷi(k) is the prediction error) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
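
A sketch of the perceptron learning rule above in plain Python; the sign activation, the bias handled as a weight on a constant extra input, and the toy target ("output 1 if at least two of the three binary inputs are 1", taken from the earlier ANN slide, with 0 replaced by -1) are illustrative assumptions.

    from itertools import product

    def sign(u):
        return 1 if u > 0 else -1

    def train_perceptron(data, lam=0.1, epochs=100):
        """data: list of (inputs, label in {+1,-1}); returns weights, the last entry being the bias weight."""
        n = len(data[0][0])
        w = [0.0] * (n + 1)                           # w[-1] multiplies a constant input of 1
        for _ in range(epochs):
            updated = False
            for x, y in data:
                xb = list(x) + [1.0]
                y_hat = sign(sum(wj * xj for wj, xj in zip(w, xb)))
                if y_hat != y:                        # w_j <- w_j + lambda * (y - y_hat) * x_j
                    w = [wj + lam * (y - y_hat) * xj for wj, xj in zip(w, xb)]
                    updated = True
            if not updated:                           # stopping condition: a full pass with no errors
                break
        return w

    data = [(x, 1 if sum(x) >= 2 else -1) for x in product([0, 1], repeat=3)]
    w = train_perceptron(data)
    print(all(sign(sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))) == y for x, y in data))  # True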

89 Algorithm for learning ANN Initialize the weights (w0, w1, ..., wk) Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples Objective function (sum of squared errors): E = Σ_{i=1}^{N} ( yi − ŷi )² Find the weights wi that minimize the above objective function, e.g. with the backpropagation algorithm Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

90 Error Back-Propagation (BP) [Figure: input x feeding hidden neurons N1...Nn with weights w_hid,1...w_hid,n and activations y_hid(u_hid,i); their outputs feed an output neuron through weights w_out,1...w_out,n, producing y_out, which is compared with the training target y_train] Training error term: e = ½ ( y − ŷ )² Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

91 Support Vector Machines Find a linear hyperplane (decision boundary) that will separate the data Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

92 Support Vector Machines B 1 One Possible Solution Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

93 Support Vector Machines B 2 Another possible solution Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

94 Support Vector Machines B 2 Other possible solutions Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

95 Support Vector Machines B 1 B 2 Which one is better? B1 or B2? How do you define better? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

96 Support Vector Machines B 1 B 2 b 21 b 22 margin b 11 Find hyperplane maximizes the margin => B1 is better than B2 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ b 12

97 Support Vector Machines Decision boundary B1: w · x + b = 0, with parallel hyperplanes b11: w · x + b = 1 and b12: w · x + b = −1 f(x) = 1 if w · x + b ≥ 1, and f(x) = −1 if w · x + b ≤ −1 Margin = 2 / ||w|| Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

98 Support Vector Machines We want to maximize the margin 2 / ||w||, which is equivalent to minimizing L(w) = ||w||² / 2 Subject to the constraints: f(xi) = 1 if w · xi + b ≥ 1, and f(xi) = −1 if w · xi + b ≤ −1 (i.e., yi (w · xi + b) ≥ 1) This is a constrained optimization problem Numerical approaches solve it (e.g., quadratic programming) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
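
A small numeric check of the quantities above: given a candidate (w, b), the code verifies the constraints y_i (w · x_i + b) >= 1 and reports the margin 2/||w||. The toy points and the hand-picked (w, b) are illustrative assumptions, not the solution of the quadratic program.

    from math import sqrt

    def margin_and_feasible(w, b, points):
        """points: list of (x, y) with y in {+1, -1}; check y*(w.x + b) >= 1 and return 2/||w||."""
        feasible = all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1 for x, y in points)
        return 2 / sqrt(sum(wi * wi for wi in w)), feasible

    points = [((3.0, 3.0), 1), ((4.0, 4.0), 1), ((1.0, 1.0), -1), ((0.0, 1.0), -1)]
    w, b = (0.5, 0.5), -2.0                   # a hand-picked separating hyperplane, not the optimum
    print(margin_and_feasible(w, b, points))  # (2.828..., True)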

99 Support Vector Machines What if the problem is not linearly separable? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

100 Support Vector Machines What if the problem is not linearly separable? Introduce slack variables ξi Need to minimize: L(w) = ||w||² / 2 + C Σ_{i=1}^{N} ξi^k Subject to: f(xi) = 1 if w · xi + b ≥ 1 − ξi, and f(xi) = −1 if w · xi + b ≤ −1 + ξi Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

101 Nonlinear Support Vector Machines What if decision boundary is not linear? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

102 Nonlinear Support Vector Machines Transform data into higher dimensional space Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

103 Ensemble Methods Construct a set of classifiers from the training data Predict class label of previously unseen records by aggregating predictions made by multiple classifiers Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

104 General Idea D Original Training data Step 1: Create Multiple Data Sets... D 1 D 2 D t-1 D t Step 2: Build Multiple Classifiers C 1 C 2 C t -1 C t Step 3: Combine Classifiers C * Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

105 Why does it work? Suppose there are 25 base classifiers Each classifier has error rate ε = 0.35 Assume the classifiers are independent Probability that the ensemble classifier makes a wrong prediction: Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^(25−i) = 0.06 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
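
The 0.06 figure is just the binomial tail above; a two-line check in Python:

    from math import comb

    eps = 0.35
    p_wrong = sum(comb(25, i) * eps**i * (1 - eps)**(25 - i) for i in range(13, 26))
    print(round(p_wrong, 2))  # 0.06 -- the majority vote errs only if 13 or more of the 25 base classifiers err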

106 Examples of Ensemble Methods How to generate an ensemble of classifiers? Method By manipulating the training set Bagging Boosting By manipulating the input features By manipulating the learning algorithm By manipulating the class label Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

107 Bagging Sampling with replacement: each round draws a bootstrap sample of the same size as the original data [Table: original data and the records drawn in Bagging Rounds 1-3] Build a classifier on each bootstrap sample Each record has probability 1 − (1 − 1/n)^n (about 0.632 for large n) of being selected in a given bootstrap sample Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
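
A bagging sketch in Python: draw bootstrap samples with replacement, fit one base classifier per sample, and predict by majority vote. The base_learner argument is abstract (anything with fit/predict methods), so this is a skeleton under that assumption.

    import random
    from collections import Counter

    def bagging_fit(records, labels, base_learner, rounds=10, seed=0):
        """Each round trains base_learner() on a bootstrap sample of size n drawn with replacement."""
        rng = random.Random(seed)
        n, models = len(records), []
        for _ in range(rounds):
            idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample indices
            model = base_learner()
            model.fit([records[i] for i in idx], [labels[i] for i in idx])
            models.append(model)
        return models

    def bagging_predict(models, x):
        """Majority vote over the base classifiers."""
        return Counter(m.predict(x) for m in models).most_common(1)[0][0]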

108 Boosting An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records Initially, all N records are assigned equal weights Unlike bagging, weights may change at the end of boosting round Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

109 Boosting Records that are wrongly classified will have their weights increased Records that are classified correctly will have their weights decreased Original Data Boosting (Round 1) Boosting (Round 2) Boosting (Round 3) Example 4 is hard to classify Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

110 Example: AdaBoost Base classifiers: C1, C2, ..., CT Error rate of classifier Ci: εi = (1/N) Σ_{j=1}^{N} wj δ( Ci(xj) ≠ yj ) Importance of a classifier: αi = ½ ln( (1 − εi) / εi ) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

111 Example: AdaBoost Weight update: wi(j+1) = ( wi(j) / Zj ) × exp(−αj) if Cj(xi) = yi, and ( wi(j) / Zj ) × exp(αj) if Cj(xi) ≠ yi, where Zj is the normalization factor If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated Classification: C*(x) = argmax_y Σ_{j=1}^{T} αj δ( Cj(x) = y ) Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
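
A sketch of the AdaBoost updates above using weighted decision stumps on one-dimensional data; the stump learner, the toy dataset, and the simplified weight reset when a round's error exceeds 50% are illustrative assumptions.

    import math

    def stump_learner(xs, ys, weights):
        """Weighted 1-D decision stump: predict s if x > t else -s, minimizing weighted error."""
        best = None
        for t in sorted(set(xs)):
            for s in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights) if (s if x > t else -s) != y)
                if best is None or err < best[0]:
                    best = (err, t, s)
        _, t, s = best
        return lambda x, t=t, s=s: s if x > t else -s

    def adaboost(xs, ys, rounds=3):
        n = len(xs)
        w = [1.0 / n] * n
        classifiers, alphas = [], []
        for _ in range(rounds):
            C = stump_learner(xs, ys, w)
            eps = sum(wi for xi, yi, wi in zip(xs, ys, w) if C(xi) != yi)
            if eps >= 0.5:                        # revert to uniform weights, as on the slide
                w = [1.0 / n] * n
                continue
            alpha = 0.5 * math.log((1 - eps) / max(eps, 1e-12))
            w = [wi * math.exp(-alpha if C(xi) == yi else alpha) for xi, yi, wi in zip(xs, ys, w)]
            Z = sum(w)
            w = [wi / Z for wi in w]              # normalize so the weights remain a distribution
            classifiers.append(C)
            alphas.append(alpha)
        def predict(x):                           # C*(x) = argmax_y sum_j alpha_j [C_j(x) = y]
            return 1 if sum(a * C(x) for a, C in zip(alphas, classifiers)) >= 0 else -1
        return predict

    xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
    model = adaboost(xs, ys, rounds=3)
    print([model(x) for x in xs] == ys)  # True after three boosting rounds on this toy set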

112 Illustrating AdaBoost [Figure: the original data points with their initial weights 1/N and the sample drawn for Boosting Round 1 with the first base classifier B1] Tan,Steinbach, Kumar Introduction to Data Mining 4/18/

113 Illustrating AdaBoost [Figure: the re-weighted samples and base classifiers B1, B2, B3 chosen in Boosting Rounds 1-3, combined into the overall classifier] Tan,Steinbach, Kumar Introduction to Data Mining 4/18/


More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques Data Mining: Concepts and Techniques 1 Classification and Prediction What is classification? What is prediction? Issues regarding classification and prediction Classification by decision tree induction

More information

1 Handling of Continuous Attributes in C4.5. Algorithm

1 Handling of Continuous Attributes in C4.5. Algorithm .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Potpourri Contents 1. C4.5. and continuous attributes: incorporating continuous

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

Decision Tree Learning

Decision Tree Learning Decision Tree Learning Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Machine Learning, Chapter 3 2. Data Mining: Concepts, Models,

More information

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run

More information

Uncertainty. Yeni Herdiyeni Departemen Ilmu Komputer IPB. The World is very Uncertain Place

Uncertainty. Yeni Herdiyeni Departemen Ilmu Komputer IPB. The World is very Uncertain Place Uncertainty Yeni Herdiyeni Departemen Ilmu Komputer IPB The World is very Uncertain Place 1 Ketidakpastian Presiden Indonesia tahun 2014 adalah Perempuan Tahun depan saya akan lulus Jumlah penduduk Indonesia

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

Data classification (II)

Data classification (II) Lecture 4: Data classification (II) Data Mining - Lecture 4 (2016) 1 Outline Decision trees Choice of the splitting attribute ID3 C4.5 Classification rules Covering algorithms Naïve Bayes Classification

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Decision Trees: Overfitting

Decision Trees: Overfitting Decision Trees: Overfitting Emily Fox University of Washington January 30, 2017 Decision tree recap Loan status: Root 22 18 poor 4 14 Credit? Income? excellent 9 0 3 years 0 4 Fair 9 4 Term? 5 years 9

More information

CIS519: Applied Machine Learning Fall Homework 5. Due: December 10 th, 2018, 11:59 PM

CIS519: Applied Machine Learning Fall Homework 5. Due: December 10 th, 2018, 11:59 PM CIS59: Applied Machine Learning Fall 208 Homework 5 Handed Out: December 5 th, 208 Due: December 0 th, 208, :59 PM Feel free to talk to other members of the class in doing the homework. I am more concerned

More information

1 Handling of Continuous Attributes in C4.5. Algorithm

1 Handling of Continuous Attributes in C4.5. Algorithm .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Potpourri Contents 1. C4.5. and continuous attributes: incorporating continuous

More information

Data Mining vs Statistics

Data Mining vs Statistics Data Mining Data mining emerged in the 1980 s when the amount of data generated and stored became overwhelming. Data mining is strongly influenced by other disciplines such as mathematics, statistics,

More information

Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees!

Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees! Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees! Summary! Input Knowledge representation! Preparing data for learning! Input: Concept, Instances, Attributes"

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

CSE 151 Machine Learning. Instructor: Kamalika Chaudhuri

CSE 151 Machine Learning. Instructor: Kamalika Chaudhuri CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Ensemble Learning How to combine multiple classifiers into a single one Works well if the classifiers are complementary This class: two types of

More information

Support Vector Machines. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar

Support Vector Machines. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Data Mining Support Vector Machines Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 Support Vector Machines Find a linear hyperplane

More information

FINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE

FINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE FINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE You are allowed a two-page cheat sheet. You are also allowed to use a calculator. Answer the questions in the spaces provided on the question sheets.

More information

Chapter 14 Combining Models

Chapter 14 Combining Models Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients

More information

CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning

CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we

More information

Decision Tree And Random Forest

Decision Tree And Random Forest Decision Tree And Random Forest Dr. Ammar Mohammed Associate Professor of Computer Science ISSR, Cairo University PhD of CS ( Uni. Koblenz-Landau, Germany) Spring 2019 Contact: mailto: Ammar@cu.edu.eg

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

CS7267 MACHINE LEARNING

CS7267 MACHINE LEARNING CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning

More information

Brief Introduction to Machine Learning

Brief Introduction to Machine Learning Brief Introduction to Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU August 29, 2016 1 / 49 1 Introduction 2 Binary Classification 3 Support Vector

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

Machine Learning Lecture 5

Machine Learning Lecture 5 Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning CS4375 --- Fall 2018 Bayesian a Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell 1 Uncertainty Most real-world problems deal with

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the

More information

Data Mining algorithms

Data Mining algorithms Data Mining algorithms 2017-2018 spring 02.07-09.2018 Overview Classification vs. Regression Evaluation I Basics Bálint Daróczy daroczyb@ilab.sztaki.hu Basic reachability: MTA SZTAKI, Lágymányosi str.

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Final Exam, Machine Learning, Spring 2009

Final Exam, Machine Learning, Spring 2009 Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Hypothesis Space variable size deterministic continuous parameters Learning Algorithm linear and quadratic programming eager batch SVMs combine three important ideas Apply optimization

More information

LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning

LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary

More information

Introduction to Machine Learning

Introduction to Machine Learning Uncertainty Introduction to Machine Learning CS4375 --- Fall 2018 a Bayesian Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell Most real-world problems deal with

More information

Machine Learning, Midterm Exam: Spring 2009 SOLUTION

Machine Learning, Midterm Exam: Spring 2009 SOLUTION 10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of

More information

1 Machine Learning Concepts (16 points)

1 Machine Learning Concepts (16 points) CSCI 567 Fall 2018 Midterm Exam DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO PLEASE TURN OFF ALL CELL PHONES Problem 1 2 3 4 5 6 Total Max 16 10 16 42 24 12 120 Points Please read the following instructions

More information

A Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007

A Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007 Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized

More information

Notes on Machine Learning for and

Notes on Machine Learning for and Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Learning = improving with experience Improve over task T (e.g, Classification, control tasks) with respect

More information

Lecture 9: Bayesian Learning

Lecture 9: Bayesian Learning Lecture 9: Bayesian Learning Cognitive Systems II - Machine Learning Part II: Special Aspects of Concept Learning Bayes Theorem, MAL / ML hypotheses, Brute-force MAP LEARNING, MDL principle, Bayes Optimal

More information

Information Theory & Decision Trees

Information Theory & Decision Trees Information Theory & Decision Trees Jihoon ang Sogang University Email: yangjh@sogang.ac.kr Decision tree classifiers Decision tree representation for modeling dependencies among input variables using

More information

Nonlinear Classification

Nonlinear Classification Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions

More information