COMP 328: Machine Learning
Transcription
1 COMP 328: Machine Learning
Lecture 2: Naive Bayes Classifiers
Nevin L. Zhang
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Spring 2010
2 Two different types of classifiers
- Decision tree classifiers: data -> decision rules; classify unseen examples using the rules.
- Naive Bayes classifiers: data -> probabilistic model of the relationship between the class and the attributes; classify unseen examples via inference based on the model.
3 Outline (current section: Probabilistic Models and Classification)
1 Probabilistic Models and Classification
2 Probabilistic Independence
3 Naive Bayes Model Classifiers
4 Issues
5 Learning to Classify Text
4 Probabilistic Models and Classification: Joint Distribution
- The most general way to describe relationships among random variables.
- Example: P(Temperature, Wind, PlayTennis)

    Temperature  Wind    PlayTennis  Probability
    Hot          Weak    No          0.1
    Hot          Weak    Yes         0
    Hot          Strong  No          0.2
    Hot          Strong  Yes         0.3
    Cool         Weak    No          0.1
    Cool         Weak    Yes         0.2
    Cool         Strong  No          0.1
    Cool         Strong  Yes         0

- One probability value for each combination of the states of the variables.
- All the probability values must sum to 1.
5 Probabilistic Models and Classification: Joint Distribution and Classification
- Suppose we have a joint distribution. How do we classify? Calculate the probabilities and then classify.
- Play tennis on a Hot day with Strong wind?
    P(Play=y | T=h, W=s) = P(Play=y, T=h, W=s) / P(T=h, W=s) = 0.3 / 0.5 = 0.6
    P(Play=n | T=h, W=s) = 1 - P(Play=y | T=h, W=s) = 0.4
- Answer: yes (if we must answer yes or no).
6 Probabilistic Models and Classification: Joint Distribution and Classification
- We can perform classification even with a subset of the attributes.
- Play tennis when Wind is weak?
    P(Play=y | W=w) = P(Play=y, W=w) / P(W=w) = 0.67
    P(Play=n | W=w) = 0.33
- Answer: yes.
7 Probabilistic Models and Classification: The General Case
- Suppose we have a joint distribution P(A1, A2, ..., An, C) over attributes A1, A2, ..., An and the class variable C.
- For a new example with attribute values a1, a2, ..., an: for each possible value vj of the class variable C, compute the posterior probability P(C = vj | A1 = a1, A2 = a2, ..., An = an).
- Assign the example to the class with the highest posterior probability:
    c = argmax_{vj ∈ class labels} P(C = vj | A1 = a1, A2 = a2, ..., An = an)
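To make the joint-table recipe concrete, here is a minimal Python sketch that classifies with the slide-4 distribution by normalizing the joint table (the variable and function names are illustrative, not from the lecture):

```python
# Joint distribution P(Temperature, Wind, PlayTennis) from slide 4.
joint = {
    ('Hot',  'Weak',   'No'):  0.1, ('Hot',  'Weak',   'Yes'): 0.0,
    ('Hot',  'Strong', 'No'):  0.2, ('Hot',  'Strong', 'Yes'): 0.3,
    ('Cool', 'Weak',   'No'):  0.1, ('Cool', 'Weak',   'Yes'): 0.2,
    ('Cool', 'Strong', 'No'):  0.1, ('Cool', 'Strong', 'Yes'): 0.0,
}

def posterior(temp, wind):
    """P(PlayTennis | Temperature=temp, Wind=wind) by normalizing the joint."""
    evidence = sum(p for (t, w, _), p in joint.items() if t == temp and w == wind)
    return {c: joint[(temp, wind, c)] / evidence for c in ('Yes', 'No')}

post = posterior('Hot', 'Strong')
print(post)                        # {'Yes': 0.6, 'No': 0.4}, as on slide 5
print(max(post, key=post.get))     # 'Yes' -- the argmax rule of slide 7
```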
8 Probabilistic Models and Classification: A Difficulty
- 100 binary attributes and a binary class variable: how many entries in the joint probability table? (2^101.)
- Too many to handle, and too many to estimate from data; leads to overfitting.
- The Naive Bayes model reduces the number of model parameters by making an independence assumption.
9 Outline (current section: Probabilistic Independence)
1 Probabilistic Models and Classification
2 Probabilistic Independence
3 Naive Bayes Model Classifiers
4 Issues
5 Learning to Classify Text
10 Probabilistic Independence: Marginal independence
- Two random variables X and Y are marginally independent, written X ⊥ Y, if for any state x of X and any state y of Y,
    P(X=x | Y=y) = P(X=x), whenever P(Y=y) > 0.
- Meaning: learning the value of Y does not give me any information about X, and vice versa.
- Equivalent definition: P(X=x, Y=y) = P(X=x) P(Y=y).
- Shorthand for the equations: P(X | Y) = P(X), P(X, Y) = P(X) P(Y).
11 Probabilistic Independence: Marginal independence
- Examples: X: result of tossing a fair coin for the first time; Y: result of the second toss of the same coin. X: result of the US election; Y: your grade in this course.
- Counterexample: X: midterm exam grade; Y: final exam grade.
12 Probabilistic Independence: Conditional independence
- Two random variables X and Y are conditionally independent given a third variable Z, written X ⊥ Y | Z, if
    P(X=x | Y=y, Z=z) = P(X=x | Z=z), whenever P(Y=y, Z=z) > 0.
- Meaning: if I already know the state of Z, then learning the state of Y gives me no additional information about X. Y might contain some information about X; however, all the information about X contained in Y is also contained in Z.
- Shorthand for the equation: P(X | Y, Z) = P(X | Z).
- Equivalent definition: P(X, Y | Z) = P(X | Z) P(Y | Z).
13 Probabilistic Independence: Example of Conditional Independence
- There is a bag of 100 coins. 10 coins were made by a malfunctioning machine and are biased toward heads: tossing such a coin results in heads 80% of the time. The other coins are fair.
- Randomly draw a coin from the bag and toss it a few times. Xi: result of the i-th toss; Y: whether the coin was produced by the malfunctioning machine.
- The Xi's are not marginally independent of each other: if I get 9 heads in the first 10 tosses, the coin is probably a biased coin, so the next toss is more likely to result in heads than tails. Learning the value of Xi gives me some information about whether the coin is biased, which in turn gives me some information about Xj.
14 Probabilistic Independence: Example of Conditional Independence
- However, the Xi's are conditionally independent given Y:
    - If the coin is not biased, the probability of getting heads on one toss is 1/2 regardless of the results of the other tosses.
    - If the coin is biased, the probability of getting heads on one toss is 80% regardless of the results of the other tosses.
- If I already know whether the coin is biased, learning the value of Xi gives me no additional information about Xj.
- Here is how the variables are related pictorially (we will return to this picture later):
    Coin Type -> Toss 1 Result, Toss 2 Result, ..., Toss n Result
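A small numerical check of the coin example (a sketch; the mixture weights and head probabilities follow the slide):

```python
# Two-coin mixture from slide 13: P(biased) = 0.1, P(heads | biased) = 0.8,
# P(heads | fair) = 0.5. Check marginal dependence between two tosses.
p_y = {'biased': 0.1, 'fair': 0.9}
p_head = {'biased': 0.8, 'fair': 0.5}

# P(X2 = h): marginal probability of heads on any single toss.
p_x2 = sum(p_y[y] * p_head[y] for y in p_y)                    # 0.53

# P(X2 = h | X1 = h): update the belief about Y, then predict X2.
p_y_given_h = {y: p_y[y] * p_head[y] / p_x2 for y in p_y}
p_x2_given_x1 = sum(p_y_given_h[y] * p_head[y] for y in p_y)   # ~0.545

print(p_x2, p_x2_given_x1)  # differ => X1 and X2 are NOT marginally independent
# Given Y, each toss has probability p_head[y] no matter what the other tosses
# show, so the tosses are conditionally independent given Y.
```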
15 Outline (current section: Naive Bayes Model Classifiers)
1 Probabilistic Models and Classification
2 Probabilistic Independence
3 Naive Bayes Model Classifiers
4 Issues
5 Learning to Classify Text
16 Naive Bayes Model Classifiers: The Naive Bayes Model
- It assumes that the attributes are mutually independent given the class variable.
- Graphically depicted as the class variable C with an arrow to each attribute Ai (as in the coin picture on slide 14).
- Joint distribution given by
    P(C, A1, A2, ..., An) = P(C) ∏_{i=1}^{n} P(Ai | C)
17 Naive Bayes Model Classifiers: Learning the Naive Bayes Model
- Learning amounts to estimating P(C), P(A1 | C), ..., P(An | C) from data.
- Straightforward to compute:
    ˆP(C = vj) = (# of examples with C = vj) / (total # of examples)
    ˆP(Ai = ai | C = vj) = (# of examples with C = vj and Ai = ai) / (# of examples with C = vj)
- Although simple, these are the maximum likelihood estimates (MLE) of the parameters, which have nice properties.
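A sketch of these counting estimates, assuming training examples arrive as (attribute-dict, class-label) pairs (that data layout is an assumption for illustration):

```python
from collections import Counter, defaultdict

def fit_naive_bayes(examples):
    """Maximum likelihood estimates by counting.
    examples: list of (attribute_dict, class_label) pairs."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)   # (attribute, class) -> value counts
    for attrs, label in examples:
        for attr, value in attrs.items():
            value_counts[(attr, label)][value] += 1

    n = len(examples)
    prior = {c: k / n for c, k in class_counts.items()}    # estimate of P(C = vj)
    cond = {key: {v: k / class_counts[key[1]] for v, k in counts.items()}
            for key, counts in value_counts.items()}       # P(Ai = ai | C = vj)
    return prior, cond
```

Feeding in the 14 PlayTennis examples of slide 19 would reproduce the estimates listed on slide 20.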
18 Naive Bayes Model Classifiers: Classification with the Naive Bayes Model
- For a new example with attribute values a1, a2, ..., an, assign it to the class
    v_NB = argmax_{vj ∈ V} ˆP(C = vj) ∏_{i=1}^{n} ˆP(Ai = ai | C = vj)
19 Naive Bayes Model Classifiers: Example: PlayTennis

    Day  Outlook   Temperature  Humidity  Wind    PlayTennis
    D1   Sunny     Hot          High      Weak    No
    D2   Sunny     Hot          High      Strong  No
    D3   Overcast  Hot          High      Weak    Yes
    D4   Rain      Mild         High      Weak    Yes
    D5   Rain      Cool         Normal    Weak    Yes
    D6   Rain      Cool         Normal    Strong  No
    D7   Overcast  Cool         Normal    Strong  Yes
    D8   Sunny     Mild         High      Weak    No
    D9   Sunny     Cool         Normal    Weak    Yes
    D10  Rain      Mild         Normal    Weak    Yes
    D11  Sunny     Mild         Normal    Strong  Yes
    D12  Overcast  Mild         High      Strong  Yes
    D13  Overcast  Hot          Normal    Weak    Yes
    D14  Rain      Mild         High      Strong  No
20 Naive Bayes Model Classifiers: Example: Estimate parameters

    P(PlayTennis = y) = 9/14             P(PlayTennis = n) = 5/14
    P(Outlook = sunny | y) = 2/9         P(Outlook = sunny | n) = 3/5
    P(Outlook = overcast | y) = 4/9      P(Outlook = overcast | n) = 0/5
    P(Outlook = rain | y) = 3/9          P(Outlook = rain | n) = 2/5
    P(Temp = hot | y) = 2/9              P(Temp = hot | n) = 2/5
    P(Temp = mild | y) = 4/9             P(Temp = mild | n) = 2/5
    P(Temp = cool | y) = 3/9             P(Temp = cool | n) = 1/5
    P(Humidity = high | y) = 3/9         P(Humidity = high | n) = 4/5
    P(Humidity = normal | y) = 6/9       P(Humidity = normal | n) = 1/5
    P(Wind = strong | y) = 3/9           P(Wind = strong | n) = 3/5
    P(Wind = weak | y) = 6/9             P(Wind = weak | n) = 2/5
21 Naive Bayes Model Classifiers: Example: Classification
- New case: (Sunny, Cool, High, Strong). PlayTennis?
- Inference:
    P(y) P(sunny|y) P(cool|y) P(high|y) P(strong|y) ≈ .005
    P(n) P(sunny|n) P(cool|n) P(high|n) P(strong|n) ≈ .021
- Conclusion: v_NB = n. No, don't play.
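The same inference as a runnable sketch, with the slide-20 estimates hard-coded and restricted to the attribute values of the new case:

```python
from math import prod

# Parameters from slide 20, for the attribute values of the new case only.
prior = {'y': 9/14, 'n': 5/14}
lik = {'y': {'sunny': 2/9, 'cool': 3/9, 'high': 3/9, 'strong': 3/9},
       'n': {'sunny': 3/5, 'cool': 1/5, 'high': 4/5, 'strong': 3/5}}

new_case = ('sunny', 'cool', 'high', 'strong')
score = {c: prior[c] * prod(lik[c][v] for v in new_case) for c in prior}
print(score)                      # {'y': ~0.0053, 'n': ~0.0206}
print(max(score, key=score.get))  # 'n' -> don't play
```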
22 Outline (current section: Issues)
1 Probabilistic Models and Classification
2 Probabilistic Independence
3 Naive Bayes Model Classifiers
4 Issues
5 Learning to Classify Text
23 Issues: Zero counts
- What if none of the training instances with target value vj have attribute value ai? Then ˆP(ai | vj) = 0, and hence ˆP(vj) ∏_i ˆP(ai | vj) = 0.
- A future example with ai then has no chance of being classified as vj, even if all the other attribute values suggest vj.
- Smoothing: add a virtual count of 1 to each case (Laplace smoothing/correction):
    ˆP(Ai = ai | C = vj) = ((# of examples with C = vj and Ai = ai) + 1) / ((# of examples with C = vj) + |Ai|)
  where |Ai| is the number of possible values of attribute Ai.
24 Issues: Weka does just that
- P(Outlook | yes) is {3/12, 5/12, 4/12} instead of {2/9, 4/9, 3/9} as on slide 20.
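A sketch of the smoothed estimate, checked against the Weka numbers above (counts taken from slide 20):

```python
def laplace(count_val_and_class, count_class, n_values):
    """(count + 1) / (class count + number of attribute values)."""
    return (count_val_and_class + 1) / (count_class + n_values)

# Outlook given PlayTennis = yes: counts (sunny, overcast, rain) = (2, 4, 3),
# 9 'yes' examples, and 3 possible Outlook values.
print([laplace(k, 9, 3) for k in (2, 4, 3)])  # [0.25, 0.4167, 0.3333] = {3/12, 5/12, 4/12}
```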
25 Issues: Continuous attributes
- What about continuous attributes? Two options:
    - Discretize them: equal intervals; Weka has an MDL method.
    - Use a parametric form (Gaussian) for P(Ai | C).
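For the Gaussian option, one would fit a per-class mean and variance for the attribute and use the normal density as P(Ai | C). A sketch with made-up temperature values (not Weka's exact implementation):

```python
from math import exp, pi, sqrt

def gaussian_likelihood(x, values_in_class):
    """P(Ai = x | C): normal density with class-conditional MLE mean/variance."""
    n = len(values_in_class)
    mu = sum(values_in_class) / n
    var = sum((v - mu) ** 2 for v in values_in_class) / n
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

# e.g. temperatures (in degrees) of the 'yes' days -- hypothetical numbers:
print(gaussian_likelihood(21.0, [20.5, 22.0, 23.5, 19.0, 21.5]))
```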
26 Issues: Conditional independence assumption
- Assumption: the Ai's are mutually independent of each other given C. Often violated.
- Might lead to double counting. To see this, suppose we duplicate A1, so we have two copies of A1 in the data: A1 and A1'. The information in the data remains the same, but the classification might change:
    argmax_{vj ∈ V} ˆP(vj) ˆP(A1 | vj) ˆP(A1' | vj) ˆP(A2 | vj) ...
  The evidence on A1 is counted twice (see the numeric illustration below).
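A quick numeric illustration using the rounded posterior scores from slide 21 (the duplicated attribute is hypothetical):

```python
# Posterior scores for (Sunny, Cool, High, Strong) from slide 21: ~.005 vs ~.021.
odds = 0.005 / 0.021                 # odds of 'yes' vs 'no', ~0.24
# Duplicating Outlook multiplies in P(sunny|y) = 2/9 and P(sunny|n) = 3/5 again:
odds_dup = odds * (2 / 9) / (3 / 5)  # ~0.088: the 'sunny' evidence counts twice
print(odds, odds_dup)
```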
27 Issues: Conditional independence assumption
- The naive Bayes classifier works surprisingly well anyway!
- Reason: although ˆP(vj) ∏_i ˆP(ai | vj) might be a poor estimate of P(vj, a1, ..., an), we might still have
    argmax_{vj ∈ V} ˆP(vj) ∏_i ˆP(ai | vj) = argmax_{vj ∈ V} P(vj, a1, ..., an)
28 Issues: Conditional independence assumption
- Note: Bayesian (belief) network classifiers relax this assumption.
29 Issues: Overfitting
- Overfitting is not an issue for the naive Bayes classifier because its complexity is fixed. However, it is an issue for Bayesian networks.
30 Outline (current section: Learning to Classify Text)
1 Probabilistic Models and Classification
2 Probabilistic Independence
3 Naive Bayes Model Classifiers
4 Issues
5 Learning to Classify Text
31 Learning to Classify Text: Spam and non-spam emails
- Classify emails into spam and non-spam according to content.
- Classes: S and ¬S.
- Attributes: w1, w2, ..., wn, a list of words from the training set.
    - Stop-word removal: e.g., a, the.
    - Stemming: e.g., engineering, engineered, engineer -> engineer.
- Parameters:
    - P(wi | S): probability that word wi appears in a spam email.
    - P(wi | ¬S): probability that word wi appears in a non-spam email.
    - P(S) and P(¬S): probability of an email being spam or non-spam.
- All of these can be obtained from a training set by counting.
32 Learning to Classify Text: Spam and non-spam emails
- New document D with words X1, X2, ..., Xm (a subset of w1, w2, ..., wn).
    P(D | S) = ∏_{i=1}^{m} P(Xi | S)
    P(D | ¬S) = ∏_{i=1}^{m} P(Xi | ¬S)
- The document is classified as spam if
    P(S) ∏_i P(Xi | S) > P(¬S) ∏_i P(Xi | ¬S)
- Can we do this with decision trees?
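A minimal end-to-end sketch of this text classifier; the training documents are made up, and log-probabilities plus Laplace smoothing are added as practical touches beyond the slide:

```python
from collections import Counter
from math import log

def train(docs, labels):
    """docs: list of word lists; labels: 'spam'/'ham'. Laplace-smoothed counts."""
    vocab = {w for d in docs for w in d}
    word_counts = {c: Counter() for c in ('spam', 'ham')}
    doc_counts = Counter(labels)
    for d, c in zip(docs, labels):
        word_counts[c].update(d)

    def log_p(w, c):  # log P(w | class), smoothed to avoid zero counts
        total = sum(word_counts[c].values())
        return log((word_counts[c][w] + 1) / (total + len(vocab)))

    prior = {c: doc_counts[c] / len(docs) for c in doc_counts}
    return prior, log_p, vocab

def classify(doc, prior, log_p, vocab):
    # Sum of logs instead of product of probabilities, to avoid underflow.
    score = {c: log(prior[c]) + sum(log_p(w, c) for w in doc if w in vocab)
             for c in prior}
    return max(score, key=score.get)

docs = [['win', 'cash', 'now'], ['meeting', 'at', 'noon'],
        ['cash', 'prize', 'now'], ['lecture', 'notes', 'posted']]
labels = ['spam', 'ham', 'spam', 'ham']
prior, log_p, vocab = train(docs, labels)
print(classify(['cash', 'now'], prior, log_p, vocab))  # 'spam'
```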
33 Learning to Classify Text (figure slide; no text content transcribed)
34 Learning to Classify Text: Final Remark
- When to use naive Bayes classifiers?
    - A moderate or large training set is available.
    - The attributes that describe the instances are conditionally independent given the classification.
- Successful applications: diagnosis; classifying text documents.