III. Naïve Bayes (pp. 70-72)

This is a short section in our text, but we are presenting more material in these notes.

Probability review

Definition of probability: The probability of an event E is the ratio of the number of cases where E occurs to the total number of cases. Note well: this applies only when all the cases are equally likely!

Example: We roll a pair of (fair) dice. Find the probability of the event E = "The sum is five."

In the experiment above, calculate the probabilities of the events A = "The sum is odd" and B = "The sum is prime."

Union (or) and intersection (and) of events

Cast a die and define these events:
A = "The number of dots is 1"
B = "The number of dots is odd"
Calculate P(A), P(B), P(A∩B), P(A∪B).

In the two-dice experiment, calculate the probabilities of these events:
A = "The sum is <10 and the second die is >4"
B = "The sum is <10 or the second die is >4"
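As a quick check of the definition, here is a minimal Python sketch (an illustration, not part of the textbook) that enumerates the 36 equally likely outcomes of two fair dice and counts the cases for the example event E = "The sum is five."

```python
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

# Event E: the sum is five
favorable = [(d1, d2) for d1, d2 in outcomes if d1 + d2 == 5]

# Probability = (cases where E occurs) / (total cases)
print(len(favorable), "/", len(outcomes), "=", len(favorable) / len(outcomes))
# 4 / 36 = 0.111...
```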

Law of conditional probability:

P(A|B) = P(A∩B)/P(B), or P(A given B) = P(A and B)/P(B)    (*)

Interpretation: since we know that B occurred, we renormalize by dividing by its probability.

Example: Assuming the dots in the figure are equally likely, calculate P(A|B) and P(B|A). Do it two ways: from scratch, with the definition of probability as a ratio, and by applying the formula of conditional probability.

In the two-dice experiment, calculate the probability of the event C = "The sum is <10, given that the second die is >4."

Equation (*) can be used to calculate any of the three probabilities involved, knowing the other two. In particular, this form is very useful:

P(A∩B) = P(A|B) P(B) = P(B|A) P(A)    (**)
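The renormalization in (*) is easy to see in code. The sketch below is an illustration of the formula (the helper names prob and cond_prob are not from the notes); it reuses the single-die events A = "one dot" and B = "odd number of dots" from the previous quiz.

```python
def prob(event, space):
    """Probability as a ratio of counts, assuming equally likely outcomes."""
    return sum(1 for x in space if event(x)) / len(space)

def cond_prob(event_a, event_b, space):
    """P(A|B) = P(A and B) / P(B), per equation (*)."""
    return prob(lambda x: event_a(x) and event_b(x), space) / prob(event_b, space)

die = range(1, 7)
A = lambda d: d == 1        # one dot
B = lambda d: d % 2 == 1    # odd number of dots

print(cond_prob(A, B, die))   # 0.333... = 1/3
print(cond_prob(B, A, die))   # 1.0
```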

Bayes' Theorem:

P(A|B) = P(B|A) P(A) / P(B)    (***)

Prove Bayes' Theorem, using Eq. (**).

There is a big party on campus to which all CS and Business majors are invited. One in ten Business majors is shy, and six in ten CS majors are shy. We meet a student who is shy. Is it more likely for their major to be CS or Business? Technically, what is the probability that their major is CS? (Please ponder for a minute...)
Hint: We are missing an important piece of information: there are about 100 CS majors and 1000 Business majors in this university!

Solve the previous problem by using Bayes' Theorem (***) directly.
Hint: First define the events A and B!

Solution:
A = "The student we meet is a CS major."
B = "The student we meet is shy."
A|B = ????
B|A = ????
Can we calculate the probabilities of all the events on the RHS of Bayes' Theorem?

P(B|A) = P(shy, given CS major) = 6/10 = 0.6
P(A) = P(CS major) = 100/(100 + 1000) = 100/1100 ≈ 0.091
P(B) = P(shy) = ????...

Let us think back on what we did in the first solution: where do the 100 and the 60 come from?

P(shy) = P(shy and CS) + P(shy and Business)

Now we apply conditional probability (**) to each term:

= P(shy, given CS) P(CS) + P(shy, given Business) P(Business) = 0.6 × 100 + 0.1 × 1000

Note: Since all probabilities involved have a denominator of 1,100, we only wrote the numerators!

The addition of probabilities performed above is so useful that it was enshrined as another theorem or law of Probability Theory:

Law of Total Probability (LTP): P(B) = P(B|A) P(A) + P(B|not A) P(not A)

Note: P(Business) = P(not CS) = P(not A).

When using the LTP in the denominator of Bayes' Theorem, we get this more detailed form of Bayes' Theorem:

P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|not A) P(not A)]    (****)

P(A) is called the prior, and P(A|B) the posterior probability of A. B is called the evidence. The ratio P(B|A)/P(B) is the support B offers A.
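Putting the numbers worked out above into code (a sketch of the calculation, not part of the notes), the posterior P(CS | shy) follows directly from the detailed form (****):

```python
# Party problem: P(CS | shy), using the detailed form (****) of Bayes' Theorem
p_cs = 100 / (100 + 1000)          # prior P(A)
p_not_cs = 1 - p_cs                # P(not A) = P(Business)
p_shy_given_cs = 0.6               # P(B|A)
p_shy_given_business = 0.1         # P(B|not A)

# Law of Total Probability for the denominator
p_shy = p_shy_given_cs * p_cs + p_shy_given_business * p_not_cs

posterior = p_shy_given_cs * p_cs / p_shy
print(posterior)   # 0.375 -> Business is still the more likely major
```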

Two cab companies serve a city: the Green company operates 85% of the cabs and the Blue company operates 15% of the cabs. One of the cabs is involved in a hit-and-run accident at night, and a witness identifies the hit-and-run cab as a Blue cab. When the court tests the reliability of the witness under circumstances similar to those on the night of the accident, (s)he correctly identifies the color of a cab 80% of the time and misidentifies it the other 20% of the time. What is the probability that the cab involved in the accident was Blue, as stated by the witness?
Hint: Use Bayes' Theorem. Define the events A and B.

For more practice: Wikipedia's page for Bayes' Theorem has three nice examples - study them all:
- Drug tests
- Reliability of factory machines
- Identification of beetles

Application: TEXT LEARNING

We have a number of messages written by several authors [1]. For simplicity, let us call the authors A1 and A2. Also for simplicity, the classification will not be based on all the words present, but on a (relatively small) subset of them [2]. In this example, let us consider only three magic words: foo, bar, and baz.

For each author, we calculate the probability of each word appearing in a message:

P(foo|A1) = 0.2   P(bar|A1) = 0.3   P(baz|A1) = 0.4
P(foo|A2) = 0.3   P(bar|A2) = 0.1   P(baz|A2) = 0.3

We also need to know the distribution of the messages between A1 and A2. Ideally, they are evenly distributed: P(A1) = P(A2) = 0.5.

We now have a new message, whose author is unknown, either A1 or A2. We find which of the magic words are present in the message, for example foo and bar. We call {foo, bar} a bag of words, because the order and proximity of the words are not considered.

We would like to calculate the probabilities P(A1|foo,bar) and P(A2|foo,bar), because then we would predict that the author with the higher probability is the author of the message. Mnemonic: The formulas are easy to remember with A for author and B for bag of words: P(A|B).

We apply Bayes' Theorem (***) to find:

P(A1|foo,bar) = P(foo,bar|A1) P(A1) / P(foo,bar), and similarly for A2.

Note: In Bayes' Theorem we only have one piece of evidence, but here we have two: foo and bar. What to do? Here is where the "naive" in Naive Bayes comes into play: we assume that the words occur independently [3], so we can factorize the intersections:

P(foo,bar|A1) = P(foo|A1) P(bar|A1), and similarly for A2.

Combining the last two equations, we have:

P(A1|foo,bar) = P(foo|A1) P(bar|A1) P(A1) / P(foo,bar), and similarly for A2.

Since A1 and A2 have the same denominator, we don't need it in order to establish which probability is greater, so we further simplify the formulas to:

P(A1|foo,bar) ~ P(foo|A1) P(bar|A1) P(A1)
P(A2|foo,bar) ~ P(foo|A2) P(bar|A2) P(A2)    (o)

With the numerical values shown at the beginning of this section, find out which author is more likely for the message (a code sketch of this calculation follows the footnotes below).

[1] For example, the 85 Federalist Papers were written by Alexander Hamilton, James Madison, and John Jay.
[2] Several studies on the disputed Federalist Papers are based on a set of 70 so-called function words.
[3] In principle, if we had a large enough corpus of messages from an author, we could estimate joint distributions for each combination of words, but in practice the combinatorial explosion prevents us from doing so.
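Here is a minimal Python sketch of the products in (o), using the word probabilities and priors given at the top of this section (the helper name score is just for illustration; the same numbers are worked out by hand on the Solutions page):

```python
# Word probabilities per author, as given above
p_word = {
    "A1": {"foo": 0.2, "bar": 0.3, "baz": 0.4},
    "A2": {"foo": 0.3, "bar": 0.1, "baz": 0.3},
}
prior = {"A1": 0.5, "A2": 0.5}

def score(author, bag):
    """Unnormalized posterior (o): product of P(word|author) over the bag, times the prior."""
    s = prior[author]
    for word in bag:
        s *= p_word[author][word]
    return s

bag = {"foo", "bar"}
print(score("A1", bag), score("A2", bag))   # 0.03 vs 0.015 -> A1 is more likely
```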

How about baz, or in general any magic word that is missing from the message? Its absence may also count as information. How do the formulas (o) change to take this into account?

Recalculate the probabilities from the problem above, taking into account that the word baz is missing. Which author is more likely now?

Note: The BernoulliNB classifier from scikit-learn does take into account the probabilities of the missing features!
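The adjustment applied on the Solutions page below is to multiply by 1 - P(word|author) for every magic word that is absent, which is exactly the Bernoulli likelihood. A sketch under that assumption (the helper name bernoulli_score is just for illustration):

```python
# Same word probabilities and priors as above
p_word = {
    "A1": {"foo": 0.2, "bar": 0.3, "baz": 0.4},
    "A2": {"foo": 0.3, "bar": 0.1, "baz": 0.3},
}
prior = {"A1": 0.5, "A2": 0.5}

def bernoulli_score(author, bag, vocabulary=("foo", "bar", "baz")):
    """Present words contribute P(word|author); absent magic words contribute 1 - P(word|author)."""
    s = prior[author]
    for word in vocabulary:
        p = p_word[author][word]
        s *= p if word in bag else (1 - p)
    return s

print(bernoulli_score("A1", {"foo", "bar"}))   # 0.5 * 0.2 * 0.3 * (1 - 0.4) = 0.018
print(bernoulli_score("A2", {"foo", "bar"}))   # 0.5 * 0.3 * 0.1 * (1 - 0.3) = 0.0105
```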

Solutions:

Prove Bayes' Theorem, using Eq. (**).
Multiply both sides of (***) by P(B), to get P(A|B) P(B) = P(B|A) P(A). According to (**), both sides are P(A∩B).

Two cab companies serve a city: the Green company operates 85% of the cabs and the Blue company operates 15% of the cabs. One of the cabs is involved in a hit-and-run accident at night, and a witness identifies the hit-and-run cab as a Blue cab. When the court tests the reliability of the witness under circumstances similar to those on the night of the accident, he correctly identifies the color of a cab 80% of the time and misidentifies it the other 20% of the time. What is the probability that the cab involved in the accident was Blue, as stated by the witness?
Hint: Use Bayes' Theorem. Define the events A and B.

With the numerical values shown at the beginning of this section, find out which author is more likely for the message.
P(A1|foo,bar) ~ P(foo|A1) P(bar|A1) P(A1) = 0.2*0.3*0.5 = 0.03
P(A2|foo,bar) ~ P(foo|A2) P(bar|A2) P(A2) = 0.3*0.1*0.5 = 0.015
A1 is more likely.

Recalculate the probabilities from the problem above, taking into account that the word baz is missing. Which author is more likely now?
The 1st probability above is further multiplied by 1 - P(baz|A1): 0.03*0.6 = 0.018
The 2nd probability above is further multiplied by 1 - P(baz|A2): 0.015*0.7 = 0.0105
A1 is still more likely.
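For the cab problem, a sketch of the calculation following the hint (my own code, with A = "the cab is Blue" and B = "the witness says the cab is Blue"):

```python
# Cab problem: A = "the cab is Blue", B = "the witness identifies the cab as Blue"
p_blue = 0.15
p_green = 0.85
p_says_blue_given_blue = 0.80    # witness identifies correctly
p_says_blue_given_green = 0.20   # witness misidentifies a Green cab as Blue

# Law of Total Probability for P(B), then Bayes' Theorem (***)
p_says_blue = p_says_blue_given_blue * p_blue + p_says_blue_given_green * p_green
posterior = p_says_blue_given_blue * p_blue / p_says_blue
print(posterior)   # ≈ 0.41: despite the witness, a Green cab is still more likely
```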

In our textbook, we are shown how to represent four messages (each message is a row) and their words (each feature/column is a word) in vectorized form. Let us use W0, W1, W2, and W3 for the words, for clarity. The targets/classes are represented in the array y; let us use A0 and A1 for the classes, for clarity.

We write code to count the number of occurrences of each word in each class, by summing each column (axis=0), as sketched in the code below. The function np.unique returns the sorted unique elements of a numpy array.

Now we can convert the counts above to the probabilities we need for the Naive Bayes algorithm!

Finish calculating the missing probabilities above!

For easy reference, place all the probabilities obtained above in a table.
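A sketch of what that counting code can look like. The arrays X and y below are placeholders (the textbook's actual values are not reproduced here); the point is the counting pattern with a boolean mask and sum(axis=0).

```python
import numpy as np

# Hypothetical vectorized messages: rows = messages, columns = words W0..W3
X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])   # classes A0 and A1

for c in np.unique(y):                   # sorted unique class labels
    counts = X[y == c].sum(axis=0)       # occurrences of each word in class c
    n_msgs = (y == c).sum()              # number of messages in class c
    print(f"A{c}: counts = {counts}, P(Wi|A{c}) = {counts / n_msgs}")
```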

Since we are going to use these probabilities for multiplication, we can avoid the zeroes by adding one to all denominators and numerators. This is called Laplace smoothing.

We have a new message: [1, 1, 0, 0]. Calculate the products for the Naive Bayes algorithm and decide which author is more likely. Do it by using only the positive occurrences.

For more practice: Use both positive and negative occurrences.

For more practice: Programming Naive Bayes classification from scratch. Write a Python function that takes an array of four binary values (the message) as argument, and returns the prediction. Hint: For this problem, it is sufficient to hard-code the probability table as a two-dimensional list-of-lists or numpy array (a sketch with placeholder probabilities follows below).
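A minimal sketch of such a function. The probability table below is a placeholder (the real one comes from the counts computed on the previous page); rows are the classes A0 and A1, columns are the words W0..W3, and both positive and negative occurrences are used.

```python
import numpy as np

# Placeholder smoothed probabilities P(Wi | class); replace with the table computed above.
prob_table = np.array([[0.5, 0.7, 0.3, 0.6],    # class A0
                       [0.6, 0.2, 0.8, 0.4]])   # class A1
priors = np.array([0.5, 0.5])

def predict(message):
    """Return 0 (A0) or 1 (A1) for a length-4 binary message."""
    message = np.asarray(message)
    # Bernoulli likelihood: p where the word is present, (1 - p) where it is absent
    likelihoods = np.where(message == 1, prob_table, 1 - prob_table).prod(axis=1)
    return int(np.argmax(priors * likelihoods))

print(predict([1, 1, 0, 0]))
```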

Create an array X for the four-word example, with 20 rows and 4 columns. Place 10 messages from A0 first, followed by 10 from A1. The vector y has 10 zeros (A0), followed by 10 ones (A1, or not A0).

Solution: Below is a CSV (comma-separated values) file, visualized with a spreadsheet editor (left) and with a plain-text editor (right). The name of the file is messages.csv. The first 4 columns have the data for the array X, and the 5th has the data for y (targets). Because values are missing, we import the data into a numpy array using genfromtxt.

Now we create and train a Naive Bayes classifier that implements the algorithm described above. It is called Bernoulli Naive Bayes (a code sketch of these steps follows below).

The (smoothing) parameter alpha has a meaning in NB classification that is slightly different from regression, but similar in that it controls the complexity of the model: If all the words appear in each class of the training set (as was the case in the example above), then no smoothing is necessary. If, however, one word, e.g. W3, is missing from a class in the training set, e.g. A1, then the estimated conditional probability is zero: P(W3|A1) = 0. All future messages from A1 that happen to contain W3 will be given a probability of zero, irrespective of any other words they contain! This is effectively noise: due to the accidental content of our sample, we are under the wrong impression that W3 never occurs in A1's messages. A model that attempts to model this accident is too complex, so alpha reduces this complexity.

To avoid the case described above, a constant alpha is added to all the counts. By default, alpha = 1 (Laplace smoothing). The NB classifiers in scikit-learn do not allow alpha = 0: even if we give alpha a value of zero, it will be automatically set to a very small value that is practically equal to zero.
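A minimal sketch of the import-and-train steps just described, assuming a messages.csv laid out as above (four indicator columns followed by the target column):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Import the CSV into a numpy array; genfromtxt turns missing entries into NaN
data = np.genfromtxt("messages.csv", delimiter=",")
X, y = data[:, :4], data[:, 4]

# Bernoulli Naive Bayes with Laplace smoothing (alpha=1 is the default)
clf = BernoulliNB(alpha=1.0)
clf.fit(X, y)
```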

The feature counts we calculated manually above are available as an attribute of the classifier, the number of datapoints in each class is also tallied automatically, and the prior probabilities are stored in the classifier as well.

Due to the multiplicative nature of the calculations in the Naive Bayes algorithm, the probabilities are stored in logarithmic form - this way they can be added rather than multiplied. In our example, note that the result is -0.693..., which is simply the natural logarithm of 0.5, since both authors are equally represented.

As with all classifiers and regressors, a member function allows us to calculate the score for an array of points. Since we used the entire dataset for training, let us find the training score (see the sketch below).

What conclusion do we draw from the score above? Underfitting (because the data set is too small!)

SKIP MultinomialNB and GaussianNB
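Continuing the sketch above (it assumes clf, X, and y as defined there), the stored quantities and the training score mentioned in this discussion can be inspected like this:

```python
# Continuing the training sketch above (clf, X, y already defined)
print(clf.feature_count_)     # per-class occurrence counts of each word, tallied during fit
print(clf.class_count_)       # number of training messages in each class
print(clf.class_log_prior_)   # log of the prior probabilities; ln(0.5) ≈ -0.693 for balanced classes
print(clf.score(X, y))        # training score (mean accuracy on the training set)
```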

Conclusions on Naive Bayes (BernoulliNB):

Strengths:
- alpha is not as important as in regression, but it can still fine-tune the model.
- Works well (efficiently) with large, sparse matrices X (more in the lab!).
- Like linear models: fast to train and predict, easy to understand. On very large datasets, it is even faster to train than a linear model!

Weaknesses/limitations:
- Is used only when the features are binary (0 or 1), and specifically for classifying text.
- Assumes independence of features, which may not be the case in real life, e.g. the feature overcast (Y/N) is probably correlated with temperature (High/Low).
- Data scarcity... can be mitigated using smoothing (alpha).

Solutions:

We have a new message: [1, 1, 0, 0]. Calculate the products for the Naive Bayes algorithm and decide which author is more likely. Do it by using only the positive occurrences.
Conclusion: not A0, a.k.a. A1, is more likely.

For more practice: Use both positive and negative occurrences.
Hint: The two authors turn out to be equally likely!
