COMP61011 : Machine Learning. Probabilistic Models + Bayes Theorem

1 COMP61011 : Machine Learning Probabilistic Models + Bayes Theorem

2 Probabilistic Models
- one of the most active areas of ML research in the last 15 years
- foundation of numerous new technologies
- enables decision-making under uncertainty
- Tough. Don't expect to get this immediately. It takes time.

3 I have four snooker balls in a bag: 2 black, 2 white. I reach in with my eyes closed. What is the probability of picking a black ball? I give this variable a name, A.
p(A = black) = 1/2

4 Picking a black ball, then replacing, then picking black again?
p(A = black, B = black) = 1/4
Why? p(A = black) p(B = black) = 1/2 × 1/2 = 1/4

5 Picking two black balls in sequence (i.e. no replacing)?
p(A = black, B = black) = ?
p(A = black) p(B = black | A = black) = 1/2 × 1/3 = 1/6
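
The two answers above (1/4 with replacement, 1/6 without) can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original slides; the helper name prob_both_black is made up for illustration, and it assumes every ordered pair of draws is equally likely.

```python
from fractions import Fraction
from itertools import permutations, product

# The bag from slide 3: 2 black balls and 2 white balls.
bag = ["black", "black", "white", "white"]


def prob_both_black(ordered_draws):
    """Fraction of equally likely ordered draws in which both balls are black.

    Hypothetical helper, introduced only for this illustration.
    """
    draws = list(ordered_draws)
    hits = sum(1 for first, second in draws if first == "black" and second == "black")
    return Fraction(hits, len(draws))


# With replacement: the same ball may be drawn twice, giving 4 * 4 ordered pairs.
with_replacement = prob_both_black(product(bag, repeat=2))

# Without replacement: the second ball is a different ball, giving 4 * 3 ordered pairs.
without_replacement = prob_both_black(permutations(bag, 2))

print(with_replacement)     # 1/4
print(without_replacement)  # 1/6
```

Enumeration is only practical for tiny examples like this one, but it makes the difference between p(B = black) and p(B = black | A = black) concrete.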

6 Probabilities and Conditional Probabilities
p(A = black)
p(B = black | A = black)
Events: A, B, C, etc. are random variables, e.g. A is the random event of picking the first ball. B is the random event of picking the second ball.
p(A = 1)
p(B = 1 | A = 1)
where 1 means the ball was black.

7 Rules of Probability Theory
p(A = 1, B = 1) = p(A = 1) p(B = 1 | A = 1)
Probability that both balls are black = Probability that the first is black × Probability that the second is black, given that the first was black

8 Shorthand notation
p(A = 1, B = 1) = p(A = 1) p(B = 1 | A = 1)
p(A, B) = p(A) p(B | A)
Means that the rule holds for all possible assignments of values to A and B.

9 If two events A, B are dependent: p(A, B) = p(B | A) p(A), e.g. the black/white balls example.
If two events A, B are independent: p(A, B) = p(A) p(B), e.g. two consecutive rolls of a die.

10
Day  Outlook   Temperature  Humidity  Wind    Tennis?
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
p(wind = strong) = ?
The chances of the wind being strong, among all days.

11 (Same 14-day table as slide 10.)
p(wind = strong) = 6/14 ≈ 0.43
The chances of the wind being strong, among all days.

12 (Same 14-day table as slide 10.)
p(wind = strong | tennis = yes) = ?
The chances of a strong wind day, given that the person enjoyed tennis.

13 Only the days with tennis = yes:
Day  Outlook   Temperature  Humidity  Wind    Tennis?
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D7   Overcast  Cool         Normal    Strong  Yes
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
p(wind = strong | tennis = yes) = ?
The chances of a strong wind day, given that the person enjoyed tennis.

14 (Same nine tennis = yes days as slide 13.)
p(wind = strong | tennis = yes) = 3/9 ≈ 0.33
The chances of a strong wind day, given that the person enjoyed tennis.

15 (Same 14-day table as slide 10.)
p(tennis = yes | wind = strong) = ?
The chances of the person enjoying tennis, given that it is a strong wind day.

16 Only the days with wind = strong:
Day  Outlook   Temperature  Humidity  Wind    Tennis?
D2   Sunny     Hot          High      Strong  No
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D14  Rain      Mild         High      Strong  No
p(tennis = yes | wind = strong) = 0.5
The chances of the person enjoying tennis, given that it is a strong wind day.

17 (Same 14-day table as slide 10.)
p(temp = hot | tennis = yes) = ?
p(tennis = yes | temp = hot) = ?
p(tennis = yes | temp = hot, humidity = high) = ?
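
All of the table queries on slides 10 to 17 are "filter the rows, then count". The sketch below is not from the original slides: it hard-codes the 14 days from the table and uses a hypothetical helper p(event, given) that estimates a conditional probability by counting. It assumes the condition matches at least one row (otherwise it would divide by zero).

```python
from fractions import Fraction

FIELDS = ("outlook", "temp", "humidity", "wind", "tennis")

# Days D1..D14, copied from the table on slide 10.
DAYS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
rows = [dict(zip(FIELDS, day)) for day in DAYS]


def p(event, given=None):
    """Estimate p(event | given) by counting rows (hypothetical helper)."""
    given = given or {}
    # Keep only the rows that satisfy the condition ...
    pool = [r for r in rows if all(r[k] == v for k, v in given.items())]
    # ... then count how many of those also satisfy the event.
    hits = [r for r in pool if all(r[k] == v for k, v in event.items())]
    return Fraction(len(hits), len(pool))


print(p({"wind": "Strong"}))                                     # 3/7 (= 6/14)
print(p({"wind": "Strong"}, given={"tennis": "Yes"}))            # 1/3 (= 3/9)
print(p({"tennis": "Yes"}, given={"wind": "Strong"}))            # 1/2
print(p({"temp": "Hot"}, given={"tennis": "Yes"}))               # 2/9
print(p({"tennis": "Yes"}, given={"temp": "Hot"}))               # 1/2
print(p({"tennis": "Yes"}, given={"temp": "Hot", "humidity": "High"}))  # 1/3
```

Note that p(temp = hot | tennis = yes) and p(tennis = yes | temp = hot) come out different: swapping what is conditioned on changes which rows are kept before counting.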

18 What's the use of all this?
- We can calculate these numbers on data
- Leads to an elegant theorem we can make use of

19 A problem to solve: 1% of the population get cancer 80% of people with cancer get a positive test 9.6% of people without cancer also get a positive test The question: A person has a test for cancer that comes back positive. What is the probability that they actually have cancer? Quick guess: a) less than 1% b) somewhere between 1% and 70% c) between 70% and 80% d) more than 80%

20 Write down the probabilities of everything
Define variables: C: 1 = presence of cancer, 0 = no cancer; E: 1 = positive test, 0 = negative test.
The prior probability of cancer in the population is 1%, so p(C = 1) = 0.01
The probability of a positive test given there is cancer: p(E = 1 | C = 1) = 0.8
If there is no cancer, we still have p(E = 1 | C = 0) = 0.096
The question is: what is p(C = 1 | E = 1)?

21 Working with Concrete Numbers
10,000 patients
p(C = 1) = 0.01: 100 cancer
p(C = 0) = 0.99: 9,900 no cancer
p(E = 1 | C = 1) = 0.8: 80 cancer, positive test; 20 cancer, negative test
p(E = 1 | C = 0) = 0.096: 950.4 no cancer, positive test; 8,949.6 no cancer, negative test
p(C = 1 | E = 1) = ?
How many people from 10,000 get E = 1? How many of those get C = 1?

22 Working with Concrete Numbers
10,000 patients
p(C = 1) = 0.01: 100 cancer
p(C = 0) = 0.99: 9,900 no cancer
p(E = 1 | C = 1) = 0.8: 80 cancer, positive test; 20 cancer, negative test
p(E = 1 | C = 0) = 0.096: 950.4 no cancer, positive test; 8,949.6 no cancer, negative test
p(C = 1 | E = 1) = 80 / (80 + 950.4) ≈ 7.76%
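
The head-count argument can be written out directly. This is a small sketch rather than anything from the slides; the variable names are invented, and the counts are expected values, which is why the number of false positives is not a whole number.

```python
population = 10_000
p_cancer = 0.01             # p(C = 1), from slide 19
p_pos_given_cancer = 0.8    # p(E = 1 | C = 1)
p_pos_given_healthy = 0.096 # p(E = 1 | C = 0)

# Split the population by cancer status, then by test result.
cancer = population * p_cancer
healthy = population - cancer
cancer_pos = cancer * p_pos_given_cancer      # true positives
healthy_pos = healthy * p_pos_given_healthy   # false positives

# Among everyone with a positive test, what share actually has cancer?
share_with_cancer = cancer_pos / (cancer_pos + healthy_pos)
print(f"{share_with_cancer:.2%}")  # 7.76%
```

The false positives outnumber the true positives by more than ten to one because the healthy group is 99 times larger than the cancer group.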

23 Surprising result! Do you trust your Doctor? Although the probability of a positive test given cancer is 80%, the probability of cancer given a positive test is only about 7.8%. 8/10 doctors would have said: c) between 70% and 80%. WRONG!! Common mistake: the probability that a person with positive test has cancer is not the same as the probability that a person with cancer has a positive test. One must also consider : the background chances (prior) of having cancer, the chances of receiving a false alarm in the test.

24 Solving the same problem, via Bayes Theorem
p(E = 1 | C = 1) = 0.8
p(C = 1) = 0.01
p(E = 1, C = 1) = p(E = 1 | C = 1) p(C = 1)
The general statement is: p(E, C) = p(E | C) p(C)
And since the statement "E and C" is equivalent to "C and E": p(E, C) = p(C | E) p(E)

25 Solving the same problem, via Bayes Theorem
p(E, C) = p(E | C) p(C) = p(C | E) p(E)
Now rearrange:
p(C | E) p(E) = p(E | C) p(C)
p(C | E) = p(E | C) p(C) / p(E)

26 p(C | E) = p(E | C) p(C) / p(E)
Rev. Thomas Bayes
Bayes Theorem forms the backbone of the past 20 years of ML research into probabilistic models. Think of E as "effect" and C as "cause". But... warning: sometimes thinking this way will be very non-intuitive.

27 p(C = 1 | E = 1) = p(E = 1 | C = 1) p(C = 1) / p(E = 1)
- p(C = 1 | E = 1): we want this
- p(E = 1 | C = 1): we know this
- p(C = 1): we know this
- p(E = 1): we can calculate this
Another rule of probability theory, marginalizing:
p(E = 1) = Σ_{c ∈ C} p(E = 1 | C = c) p(C = c)
Think of this as: given all possible things that can happen with C, what is the probability of E = 1?

28 p(C = 1 | E = 1) = p(E = 1 | C = 1) p(C = 1) / p(E = 1)
= p(E = 1 | C = 1) p(C = 1) / [ p(E = 1 | C = 1) p(C = 1) + p(E = 1 | C = 0) p(C = 0) ]
Notice the denominator now contains the same term as the numerator. We only need to know two terms here: p(E = 1 | C = 1) p(C = 1) and p(E = 1 | C = 0) p(C = 0)

29 p(E = 1 | C = 1) p(C = 1) = 0.8 × 0.01 = 0.008
p(E = 1 | C = 0) p(C = 0) = 0.096 × 0.99 = 0.09504
p(C = 1 | E = 1) = 0.008 / (0.008 + 0.09504) ≈ 0.0776 = 7.76%
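
The same calculation as a reusable function, as a minimal sketch. The function name and parameter names are made up here; it assumes a binary cause (C is 0 or 1), so the marginal p(E = 1) has exactly two terms.

```python
def posterior(prior, likelihood, false_alarm):
    """Return p(C = 1 | E = 1) for a binary cause C.

    prior       -- p(C = 1)
    likelihood  -- p(E = 1 | C = 1)
    false_alarm -- p(E = 1 | C = 0)
    """
    # Marginalize over C to get the denominator p(E = 1).
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence


cancer_given_positive = posterior(prior=0.01, likelihood=0.8, false_alarm=0.096)
print(round(cancer_given_positive, 4))  # 0.0776
```

Trying a larger prior, for example posterior(0.5, 0.8, 0.096), shows how strongly the answer depends on the background rate and not only on the test's accuracy.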

30 Bayes theorem. Talk to your neighbours 5 mins or so.

31 Another Example: what year is it? You jump in a time machine. It takes you somewhere. But you don't know to what year it has taken you. You know it is one of 1885, 1955, 1985, or 2015.

32 What year is it? You look out the window and see a STEAM train. What are the chances of seeing this in the year 2015? Let's guess

33 What year is it? In other years? And remember

34 What year is it? Bayes Theorem to the rescue. We can calculate the denominator as

35 What year is it? Bayes Theorem to the rescue.

36 What year is it? Bayes Theorem to the rescue. For other years.

37 What year is it? Then you look out the window. And see someone wearing Nike branded trainers.

38 What year is it? But now our belief over what year it is has changed, because of the train. Bayes Theorem can just use this: plug the updated belief back into the same equation as the new prior.

39 What year is it?

40 What year is it?

41 What year is it? Prior belief Observation Observation We believe we are in 1985, with p = 0.945
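
The "prior, observation, observation" chain can be sketched as repeated application of one update step, where each posterior becomes the next prior. The numbers on the original slides did not survive transcription, so every probability below is a HYPOTHETICAL placeholder (including the uniform starting prior), and the result is not expected to reproduce the p = 0.945 quoted above. The function name update is also made up.

```python
YEARS = [1885, 1955, 1985, 2015]

# ASSUMPTION: with no other information, each year is equally likely.
prior = {year: 0.25 for year in YEARS}

# HYPOTHETICAL likelihoods, invented for illustration only:
# p(see steam train | year) and p(see Nike trainers | year).
p_steam_train = {1885: 0.9, 1955: 0.5, 1985: 0.1, 2015: 0.05}
p_nike_trainers = {1885: 0.0, 1955: 0.0, 1985: 0.4, 2015: 0.6}


def update(belief, likelihood):
    """One Bayes update: posterior(year) is proportional to likelihood * belief."""
    unnormalized = {year: likelihood[year] * belief[year] for year in belief}
    evidence = sum(unnormalized.values())  # the denominator p(observation)
    return {year: value / evidence for year, value in unnormalized.items()}


after_train = update(prior, p_steam_train)          # belief after the first observation
after_nike = update(after_train, p_nike_trainers)   # reuse it as the prior for the second

for year in YEARS:
    print(year, round(after_nike[year], 3))
```

With these placeholder numbers the steam train pulls belief towards the early years and the trainers then rule those years out entirely, which is the qualitative pattern the slides describe.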

42 Bayes theorem, done. Take a 15 minute break.

43 More Problems Solved with Probabilities
Your car is making a noise. What are the chances that the tank is empty?
The chances of the car making noise, if the tank really is empty: p(noisy = 1 | empty = 1) = 0.9
The chances of the car making noise, if the tank is not empty: p(noisy = 1 | empty = 0) = 0.2
The chances of the tank being empty, regardless of anything else: p(empty = 1) = 0.5
p(empty = 1 | noisy = 1) = ?

44 Bayes Theorem
p(empty = 1 | noisy = 1) = p(noisy = 1 | empty = 1) p(empty = 1) / [ p(noisy = 1 | empty = 1) p(empty = 1) + p(noisy = 1 | empty = 0) p(empty = 0) ]
= (0.9 × 0.5) / ((0.9 × 0.5) + (0.2 × 0.5)) ≈ 0.818

45 Another Problem to Solve
A person tests positive for a certain medical disease. What are the chances that they really do have the disease?
The chances of the test being positive, if the person really is ill: p(test = 1 | disease = 1) =
The chances of the test being positive, if the person is in fact well: p(test = 1 | disease = 0) =
The chances of the condition, in the general population: p(disease = 1) = 0.05
p(disease = 1 | test = 1) = ?

46 Bayes Theorem
p(disease = 1 | test = 1) = p(test = 1 | disease = 1) p(disease = 1) / [ p(test = 1 | disease = 1) p(disease = 1) + p(test = 1 | disease = 0) p(disease = 0) ]

47 Another Problem to Solve
Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time. When it doesn't rain, he incorrectly forecasts rain 20% of the time. What are the chances it will rain on the day of Marie's wedding?
The chances of the forecast saying rain, if it really does rain: p(forecastrain = 1 | rain = 1) = 0.9
The chances of the forecast saying rain, if it will be fine: p(forecastrain = 1 | rain = 0) = 0.2
The chances of rain, in the general case: p(rain = 1) = 5/365 ≈ 0.0137
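
The car problem and the wedding problem have the same shape, so one small function covers both. This is a sketch under stated assumptions rather than the lecturer's worked solution: the function name bayes is invented, the car numbers (0.9, 0.2, 0.5) are the ones that appear in the slide 44 calculation, and the wedding numbers are read off the problem statement. Exact fractions are used so no rounding is involved. The disease problem on slide 45 is left out because its two test probabilities are missing from this transcription.

```python
from fractions import Fraction


def bayes(prior, p_obs_if_true, p_obs_if_false):
    """p(hypothesis | observation) for a yes/no hypothesis (hypothetical helper)."""
    numerator = p_obs_if_true * prior
    # Denominator: total probability of the observation under both cases.
    return numerator / (numerator + p_obs_if_false * (1 - prior))


# Car: p(empty = 1 | noisy = 1)
tank_empty = bayes(Fraction(1, 2), Fraction(9, 10), Fraction(2, 10))

# Wedding: p(rain = 1 | forecastrain = 1), with prior 5 rainy days out of 365
rain_tomorrow = bayes(Fraction(5, 365), Fraction(9, 10), Fraction(2, 10))

print(tank_empty, round(float(tank_empty), 3))        # 9/11 0.818
print(rain_tomorrow, round(float(rain_tomorrow), 3))  # 1/17 0.059
```

Both problems use the same likelihoods (0.9 and 0.2), so the large gap between the two answers comes entirely from the prior: 0.5 for the empty tank versus 5/365 for rain in the desert.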

COMP61011! Probabilistic Classifiers! Part 1, Bayes Theorem!

COMP61011! Probabilistic Classifiers! Part 1, Bayes Theorem! COMP61011 Probabilistic Classifiers Part 1, Bayes Theorem Reverend Thomas Bayes, 1702-1761 p ( T W ) W T ) T ) W ) Bayes Theorem forms the backbone of the past 20 years of ML research into probabilistic

More information

Bayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction

Bayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction 15-0: Learning vs. Deduction Artificial Intelligence Programming Bayesian Learning Chris Brooks Department of Computer Science University of San Francisco So far, we ve seen two types of reasoning: Deductive

More information

Decision Trees. Gavin Brown

Decision Trees. Gavin Brown Decision Trees Gavin Brown Every Learning Method has Limitations Linear model? KNN? SVM? Explain your decisions Sometimes we need interpretable results from our techniques. How do you explain the above

More information

Discrete Probability and State Estimation

Discrete Probability and State Estimation 6.01, Spring Semester, 2008 Week 12 Course Notes 1 MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.01 Introduction to EECS I Spring Semester, 2008 Week

More information

Discrete Probability and State Estimation

Discrete Probability and State Estimation 6.01, Fall Semester, 2007 Lecture 12 Notes 1 MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.01 Introduction to EECS I Fall Semester, 2007 Lecture 12 Notes

More information

Learning Decision Trees

Learning Decision Trees Learning Decision Trees Machine Learning Spring 2018 1 This lecture: Learning Decision Trees 1. Representation: What are decision trees? 2. Algorithm: Learning decision trees The ID3 algorithm: A greedy

More information

Bayesian Learning Features of Bayesian learning methods:

Bayesian Learning Features of Bayesian learning methods: Bayesian Learning Features of Bayesian learning methods: Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct. This provides a more

More information

Lecture 2. Conditional Probability

Lecture 2. Conditional Probability Math 408 - Mathematical Statistics Lecture 2. Conditional Probability January 18, 2013 Konstantin Zuev (USC) Math 408, Lecture 2 January 18, 2013 1 / 9 Agenda Motivation and Definition Properties of Conditional

More information

Bayesian Classification. Bayesian Classification: Why?

Bayesian Classification. Bayesian Classification: Why? Bayesian Classification http://css.engineering.uiowa.edu/~comp/ Bayesian Classification: Why? Probabilistic learning: Computation of explicit probabilities for hypothesis, among the most practical approaches

More information

Decision Trees. Danushka Bollegala

Decision Trees. Danushka Bollegala Decision Trees Danushka Bollegala Rule-based Classifiers In rule-based learning, the idea is to learn a rule from train data in the form IF X THEN Y (or a combination of nested conditions) that explains

More information

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,

More information

Artificial Intelligence Programming Probability

Artificial Intelligence Programming Probability Artificial Intelligence Programming Probability Chris Brooks Department of Computer Science University of San Francisco Department of Computer Science University of San Francisco p.1/?? 13-0: Uncertainty

More information

Learning Decision Trees

Learning Decision Trees Learning Decision Trees Machine Learning Fall 2018 Some slides from Tom Mitchell, Dan Roth and others 1 Key issues in machine learning Modeling How to formulate your problem as a machine learning problem?

More information

Machine Learning. Yuh-Jye Lee. March 1, Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU

Machine Learning. Yuh-Jye Lee. March 1, Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU March 1, 2017 1 / 13 Bayes Rule Bayes Rule Assume that {B 1, B 2,..., B k } is a partition of S

More information

Uncertain Knowledge and Bayes Rule. George Konidaris

Uncertain Knowledge and Bayes Rule. George Konidaris Uncertain Knowledge and Bayes Rule George Konidaris gdk@cs.brown.edu Fall 2018 Knowledge Logic Logical representations are based on: Facts about the world. Either true or false. We may not know which.

More information

Introduction. Decision Tree Learning. Outline. Decision Tree 9/7/2017. Decision Tree Definition

Introduction. Decision Tree Learning. Outline. Decision Tree 9/7/2017. Decision Tree Definition Introduction Decision Tree Learning Practical methods for inductive inference Approximating discrete-valued functions Robust to noisy data and capable of learning disjunctive expression ID3 earch a completely

More information

MAE 493G, CpE 493M, Mobile Robotics. 6. Basic Probability

MAE 493G, CpE 493M, Mobile Robotics. 6. Basic Probability MAE 493G, CpE 493M, Mobile Robotics 6. Basic Probability Instructor: Yu Gu, Fall 2013 Uncertainties in Robotics Robot environments are inherently unpredictable; Sensors and data acquisition systems are

More information

COMP 328: Machine Learning

COMP 328: Machine Learning COMP 328: Machine Learning Lecture 2: Naive Bayes Classifiers Nevin L. Zhang Department of Computer Science and Engineering The Hong Kong University of Science and Technology Spring 2010 Nevin L. Zhang

More information

Ensemble Methods. Charles Sutton Data Mining and Exploration Spring Friday, 27 January 12

Ensemble Methods. Charles Sutton Data Mining and Exploration Spring Friday, 27 January 12 Ensemble Methods Charles Sutton Data Mining and Exploration Spring 2012 Bias and Variance Consider a regression problem Y = f(x)+ N(0, 2 ) With an estimate regression function ˆf, e.g., ˆf(x) =w > x Suppose

More information

Algorithms for Classification: The Basic Methods

Algorithms for Classification: The Basic Methods Algorithms for Classification: The Basic Methods Outline Simplicity first: 1R Naïve Bayes 2 Classification Task: Given a set of pre-classified examples, build a model or classifier to classify new cases.

More information

A.I. in health informatics lecture 3 clinical reasoning & probabilistic inference, II *

A.I. in health informatics lecture 3 clinical reasoning & probabilistic inference, II * A.I. in health informatics lecture 3 clinical reasoning & probabilistic inference, II * kevin small & byron wallace * Slides borrow heavily from Andrew Moore, Weng- Keen Wong and Longin Jan Latecki today

More information

Decision Tree Learning and Inductive Inference

Decision Tree Learning and Inductive Inference Decision Tree Learning and Inductive Inference 1 Widely used method for inductive inference Inductive Inference Hypothesis: Any hypothesis found to approximate the target function well over a sufficiently

More information

Basics of Probability

Basics of Probability Basics of Probability Lecture 1 Doug Downey, Northwestern EECS 474 Events Event space E.g. for dice, = {1, 2, 3, 4, 5, 6} Set of measurable events S 2 E.g., = event we roll an even number = {2, 4, 6} S

More information

Machine Learning Recitation 8 Oct 21, Oznur Tastan

Machine Learning Recitation 8 Oct 21, Oznur Tastan Machine Learning 10601 Recitation 8 Oct 21, 2009 Oznur Tastan Outline Tree representation Brief information theory Learning decision trees Bagging Random forests Decision trees Non linear classifier Easy

More information

Basic Probability and Statistics

Basic Probability and Statistics Basic Probability and Statistics Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Jerry Zhu, Mark Craven] slide 1 Reasoning with Uncertainty

More information

Decision Tree Learning - ID3

Decision Tree Learning - ID3 Decision Tree Learning - ID3 n Decision tree examples n ID3 algorithm n Occam Razor n Top-Down Induction in Decision Trees n Information Theory n gain from property 1 Training Examples Day Outlook Temp.

More information

Soft Computing. Lecture Notes on Machine Learning. Matteo Mattecci.

Soft Computing. Lecture Notes on Machine Learning. Matteo Mattecci. Soft Computing Lecture Notes on Machine Learning Matteo Mattecci matteucci@elet.polimi.it Department of Electronics and Information Politecnico di Milano Matteo Matteucci c Lecture Notes on Machine Learning

More information

Probability Review Lecturer: Ji Liu Thank Jerry Zhu for sharing his slides

Probability Review Lecturer: Ji Liu Thank Jerry Zhu for sharing his slides Probability Review Lecturer: Ji Liu Thank Jerry Zhu for sharing his slides slide 1 Inference with Bayes rule: Example In a bag there are two envelopes one has a red ball (worth $100) and a black ball one

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 10

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 10 EECS 70 Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 10 Introduction to Basic Discrete Probability In the last note we considered the probabilistic experiment where we flipped

More information

Computer Science CPSC 322. Lecture 18 Marginalization, Conditioning

Computer Science CPSC 322. Lecture 18 Marginalization, Conditioning Computer Science CPSC 322 Lecture 18 Marginalization, Conditioning Lecture Overview Recap Lecture 17 Joint Probability Distribution, Marginalization Conditioning Inference by Enumeration Bayes Rule, Chain

More information

Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees!

Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees! Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees! Summary! Input Knowledge representation! Preparing data for learning! Input: Concept, Instances, Attributes"

More information

Uncertainty. Russell & Norvig Chapter 13.

Uncertainty. Russell & Norvig Chapter 13. Uncertainty Russell & Norvig Chapter 13 http://toonut.com/wp-content/uploads/2011/12/69wp.jpg Uncertainty Let A t be the action of leaving for the airport t minutes before your flight Will A t get you

More information

Learning Classification Trees. Sargur Srihari

Learning Classification Trees. Sargur Srihari Learning Classification Trees Sargur srihari@cedar.buffalo.edu 1 Topics in CART CART as an adaptive basis function model Classification and Regression Tree Basics Growing a Tree 2 A Classification Tree

More information

Lecture 9: Naive Bayes, SVM, Kernels. Saravanan Thirumuruganathan

Lecture 9: Naive Bayes, SVM, Kernels. Saravanan Thirumuruganathan Lecture 9: Naive Bayes, SVM, Kernels Instructor: Outline 1 Probability basics 2 Probabilistic Interpretation of Classification 3 Bayesian Classifiers, Naive Bayes 4 Support Vector Machines Probability

More information

Intermediate Math Circles November 15, 2017 Probability III

Intermediate Math Circles November 15, 2017 Probability III Intermediate Math Circles November 5, 07 Probability III Example : You have bins in which there are coloured balls. The balls are identical except for their colours. The contents of the containers are:

More information

Classification and Regression Trees

Classification and Regression Trees Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity

More information

Mean, Median and Mode. Lecture 3 - Axioms of Probability. Where do they come from? Graphically. We start with a set of 21 numbers, Sta102 / BME102

Mean, Median and Mode. Lecture 3 - Axioms of Probability. Where do they come from? Graphically. We start with a set of 21 numbers, Sta102 / BME102 Mean, Median and Mode Lecture 3 - Axioms of Probability Sta102 / BME102 Colin Rundel September 1, 2014 We start with a set of 21 numbers, ## [1] -2.2-1.6-1.0-0.5-0.4-0.3-0.2 0.1 0.1 0.2 0.4 ## [12] 0.4

More information

Joint, Conditional, & Marginal Probabilities

Joint, Conditional, & Marginal Probabilities Joint, Conditional, & Marginal Probabilities The three axioms for probability don t discuss how to create probabilities for combined events such as P [A B] or for the likelihood of an event A given that

More information

Econ 325: Introduction to Empirical Economics

Econ 325: Introduction to Empirical Economics Econ 325: Introduction to Empirical Economics Lecture 2 Probability Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 3-1 3.1 Definition Random Experiment a process leading to an uncertain

More information

Chapter 15. Probability Rules! Copyright 2012, 2008, 2005 Pearson Education, Inc.

Chapter 15. Probability Rules! Copyright 2012, 2008, 2005 Pearson Education, Inc. Chapter 15 Probability Rules! Copyright 2012, 2008, 2005 Pearson Education, Inc. The General Addition Rule When two events A and B are disjoint, we can use the addition rule for disjoint events from Chapter

More information

Probability Based Learning

Probability Based Learning Probability Based Learning Lecture 7, DD2431 Machine Learning J. Sullivan, A. Maki September 2013 Advantages of Probability Based Methods Work with sparse training data. More powerful than deterministic

More information

Bayesian Learning. Reading: Tom Mitchell, Generative and discriminative classifiers: Naive Bayes and logistic regression, Sections 1-2.

Bayesian Learning. Reading: Tom Mitchell, Generative and discriminative classifiers: Naive Bayes and logistic regression, Sections 1-2. Bayesian Learning Reading: Tom Mitchell, Generative and discriminative classifiers: Naive Bayes and logistic regression, Sections 1-2. (Linked from class website) Conditional Probability Probability of

More information

In today s lecture. Conditional probability and independence. COSC343: Artificial Intelligence. Curse of dimensionality.

In today s lecture. Conditional probability and independence. COSC343: Artificial Intelligence. Curse of dimensionality. In today s lecture COSC343: Artificial Intelligence Lecture 5: Bayesian Reasoning Conditional probability independence Curse of dimensionality Lech Szymanski Dept. of Computer Science, University of Otago

More information

Event A: at least one tail observed A:

Event A: at least one tail observed A: Chapter 3 Probability 3.1 Events, sample space, and probability Basic definitions: An is an act of observation that leads to a single outcome that cannot be predicted with certainty. A (or simple event)

More information

Decision Trees Part 1. Rao Vemuri University of California, Davis

Decision Trees Part 1. Rao Vemuri University of California, Davis Decision Trees Part 1 Rao Vemuri University of California, Davis Overview What is a Decision Tree Sample Decision Trees How to Construct a Decision Tree Problems with Decision Trees Classification Vs Regression

More information

Inteligência Artificial (SI 214) Aula 15 Algoritmo 1R e Classificador Bayesiano

Inteligência Artificial (SI 214) Aula 15 Algoritmo 1R e Classificador Bayesiano Inteligência Artificial (SI 214) Aula 15 Algoritmo 1R e Classificador Bayesiano Prof. Josenildo Silva jcsilva@ifma.edu.br 2015 2012-2015 Josenildo Silva (jcsilva@ifma.edu.br) Este material é derivado dos

More information

Machine Learning. CS Spring 2015 a Bayesian Learning (I) Uncertainty

Machine Learning. CS Spring 2015 a Bayesian Learning (I) Uncertainty Machine Learning CS6375 --- Spring 2015 a Bayesian Learning (I) 1 Uncertainty Most real-world problems deal with uncertain information Diagnosis: Likely disease given observed symptoms Equipment repair:

More information

Lecture 3 - Axioms of Probability

Lecture 3 - Axioms of Probability Lecture 3 - Axioms of Probability Sta102 / BME102 January 25, 2016 Colin Rundel Axioms of Probability What does it mean to say that: The probability of flipping a coin and getting heads is 1/2? 3 What

More information

Linear Classifiers and the Perceptron

Linear Classifiers and the Perceptron Linear Classifiers and the Perceptron William Cohen February 4, 2008 1 Linear classifiers Let s assume that every instance is an n-dimensional vector of real numbers x R n, and there are only two possible

More information

Building Bayesian Networks. Lecture3: Building BN p.1

Building Bayesian Networks. Lecture3: Building BN p.1 Building Bayesian Networks Lecture3: Building BN p.1 The focus today... Problem solving by Bayesian networks Designing Bayesian networks Qualitative part (structure) Quantitative part (probability assessment)

More information

10-701/ Machine Learning: Assignment 1

10-701/ Machine Learning: Assignment 1 10-701/15-781 Machine Learning: Assignment 1 The assignment is due September 27, 2005 at the beginning of class. Write your name in the top right-hand corner of each page submitted. No paperclips, folders,

More information

Decision Trees.

Decision Trees. . Machine Learning Decision Trees Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de

More information

Axioms of Probability? Notation. Bayesian Networks. Bayesian Networks. Today we ll introduce Bayesian Networks.
