COMP61011 : Machine Learning. Probabilistic Models + Bayes Theorem

Probabilistic Models - one of the most active areas of ML research in the last 15 years - the foundation of numerous new technologies - enables decision-making under uncertainty - Tough. Don't expect to get this immediately. It takes time.

I have four snooker balls in a bag: 2 black, 2 white. I reach in with my eyes closed. What is the probability of picking a black ball? I give this variable a name, A. p(A = black) = 1/2

Picking a black ball, then replacing, then picking black again? p(A = black, B = black) = 1/4. Why? p(A = black) p(B = black) = 1/2 × 1/2 = 1/4

Picking two black balls in sequence (i.e. no replacing)? p(A = black, B = black) = ? It is p(A = black) p(B = black | A = black) = 1/2 × 1/3 = 1/6

Probabilities and Conditional Probabilities: p(A = black), p(B = black | A = black). Events A, B, C, etc. are random variables, e.g. A is the random event of picking the first ball, B is the random event of picking the second ball. We write p(A = 1), p(B = 1 | A = 1), where 1 means the ball was black.

Rules of Probability Theory: p(A = 1, B = 1) = p(A = 1) p(B = 1 | A = 1). Probability that both balls are black = probability that the first is black × probability that the second is black, given that the first was black.

Shorthand notation: p(A = 1, B = 1) = p(A = 1) p(B = 1 | A = 1) is written p(A, B) = p(A) p(B | A). This means that the rule holds for all possible assignments of values to A and B.

If two events A, B are dependent: p(A, B) = p(B | A) p(A), e.g. the black/white balls example. If two events A, B are independent: p(A, B) = p(A) p(B), e.g. two consecutive rolls of a die.
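As a quick sanity check (my own sketch, not from the slides), both rules can be verified by brute-force enumeration of the snooker-ball example in Python:

from itertools import permutations
from fractions import Fraction

# Every ordering of the four balls is equally likely.
balls = ['black', 'black', 'white', 'white']
orderings = list(permutations(balls))    # 4! = 24 equally likely sequences

# Without replacement (dependent events):
# p(A=black, B=black) = p(A=black) * p(B=black | A=black) = 1/2 * 1/3 = 1/6
both_black = sum(1 for o in orderings if o[0] == 'black' and o[1] == 'black')
print(Fraction(both_black, len(orderings)))     # 1/6

# With replacement (independent events):
# p(A=black, B=black) = p(A=black) * p(B=black) = 1/2 * 1/2 = 1/4
print(Fraction(1, 2) * Fraction(1, 2))          # 1/4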

      Outlook    Temperature  Humidity  Wind    Tennis?
D1    Sunny      Hot          High      Weak    No
D2    Sunny      Hot          High      Strong  No
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D8    Sunny      Mild         High      Weak    No
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D13   Overcast   Hot          Normal    Weak    Yes
D14   Rain       Mild         High      Strong  No

p(wind = strong) = 6/14 = 0.4286. The chances of the wind being strong, among all days.

p(wind = strong | tennis = yes) = ? The chances of a strong wind day, given that the person enjoyed tennis. Restrict the table to the nine days on which tennis = Yes:

      Outlook    Temperature  Humidity  Wind    Tennis?
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D7    Overcast   Cool         Normal    Strong  Yes
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D13   Overcast   Hot          Normal    Weak    Yes

p(wind = strong | tennis = yes) = 3/9 = 0.333

p(tennis = yes | wind = strong) = ? The chances of the person enjoying tennis, given that it is a strong wind day. Restrict the table to the six days on which wind = Strong:

      Outlook    Temperature  Humidity  Wind    Tennis?
D2    Sunny      Hot          High      Strong  No
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D14   Rain       Mild         High      Strong  No

p(tennis = yes | wind = strong) = 3/6 = 0.5
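A minimal sketch (my own helper code, not part of the lecture) that reproduces these numbers by counting rows of the table; the prob function and column names are just illustrative choices:

# (outlook, temperature, humidity, wind, tennis) for days D1..D14
data = [
    ('Sunny','Hot','High','Weak','No'),         ('Sunny','Hot','High','Strong','No'),
    ('Overcast','Hot','High','Weak','Yes'),     ('Rain','Mild','High','Weak','Yes'),
    ('Rain','Cool','Normal','Weak','Yes'),      ('Rain','Cool','Normal','Strong','No'),
    ('Overcast','Cool','Normal','Strong','Yes'),('Sunny','Mild','High','Weak','No'),
    ('Sunny','Cool','Normal','Weak','Yes'),     ('Rain','Mild','Normal','Weak','Yes'),
    ('Sunny','Mild','Normal','Strong','Yes'),   ('Overcast','Mild','High','Strong','Yes'),
    ('Overcast','Hot','Normal','Weak','Yes'),   ('Rain','Mild','High','Strong','No'),
]
COLS = {'outlook': 0, 'temperature': 1, 'humidity': 2, 'wind': 3, 'tennis': 4}

def prob(event, given=None):
    """Estimate p(event | given) by counting matching rows of the table."""
    rows = data if given is None else [r for r in data
                                       if all(r[COLS[c]] == v for c, v in given.items())]
    hits = [r for r in rows if all(r[COLS[c]] == v for c, v in event.items())]
    return len(hits) / len(rows)

print(prob({'wind': 'Strong'}))                             # 6/14 = 0.4286...
print(prob({'wind': 'Strong'}, given={'tennis': 'Yes'}))    # 3/9  = 0.3333...
print(prob({'tennis': 'Yes'}, given={'wind': 'Strong'}))    # 3/6  = 0.5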

Using the same table:
p(temp = hot | tennis = yes) = ?
p(tennis = yes | temp = hot) = ?
p(tennis = yes | temp = hot, humidity = high) = ?

What's the use of all this? We can calculate these numbers from data, and this leads to an elegant theorem we can make use of.

A problem to solve: 1% of the population get cancer 80% of people with cancer get a positive test 9.6% of people without cancer also get a positive test The question: A person has a test for cancer that comes back positive. What is the probability that they actually have cancer? Quick guess: a) less than 1% b) somewhere between 1% and 70% c) between 70% and 80% d) more than 80%

Write down the probabilities of everything. Define variables: C: 1 = presence of cancer, 0 = no cancer; E: 1 = positive test, 0 = negative test. The prior probability of cancer in the population is 1%, so p(C = 1) = 0.01. The probability of a positive test given there is cancer: p(E = 1 | C = 1) = 0.8. If there is no cancer, we still have p(E = 1 | C = 0) = 0.096. The question is: what is p(C = 1 | E = 1)?

Working with Concrete Numbers: 10,000 patients.
p(C = 1) = 0.01  →  100 with cancer
p(C = 0) = 0.99  →  9,900 without cancer
p(E = 1 | C = 1) = 0.8    →  80 cancer, positive test; 20 cancer, negative test
p(E = 1 | C = 0) = 0.096  →  950.4 no cancer, positive test; 8,949.6 no cancer, negative test
How many people from the 10,000 get E = 1? How many of those have C = 1?
p(C = 1 | E = 1) = 80 / (80 + 950.4) = 0.0776, i.e. 7.76%
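The same frequency-tree argument can be checked with a few lines of arithmetic; this is a minimal sketch using the notional population of 10,000 patients from above:

population = 10_000
cancer = population * 0.01                  # 100 people with cancer
no_cancer = population - cancer             # 9,900 people without cancer

cancer_positive = cancer * 0.8              # 80 true positives
no_cancer_positive = no_cancer * 0.096      # 950.4 false positives

# Of everyone who tests positive, what fraction actually has cancer?
print(cancer_positive / (cancer_positive + no_cancer_positive))   # 0.0776... ~ 7.76%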

Surprising result! Do you trust your doctor? Although the probability of a positive test given cancer is 80%, the probability of cancer given a positive test is only about 7.8%. 8 out of 10 doctors would have said c) between 70% and 80%. WRONG!! Common mistake: the probability that a person with a positive test has cancer is not the same as the probability that a person with cancer has a positive test. One must also consider the background chance (prior) of having cancer, and the chance of receiving a false alarm from the test.

Solving the same problem, via Bayes Theorem. p(E = 1 | C = 1) = 0.8, p(C = 1) = 0.01, and p(E = 1, C = 1) = p(E = 1 | C = 1) p(C = 1). The general statement is: p(E, C) = p(E | C) p(C). And since the statement "E and C" is equivalent to "C and E": p(E, C) = p(C | E) p(E).

Solving the same problem, via Bayes Theorem. p(E, C) = p(E | C) p(C) = p(C | E) p(E). Now rearrange: p(C | E) p(E) = p(E | C) p(C), so p(C | E) = p(E | C) p(C) / p(E).

p(C | E) = p(E | C) p(C) / p(E). Rev. Thomas Bayes, 1702-1761. Bayes Theorem forms the backbone of the past 20 years of ML research into probabilistic models. Think of E as "effect" and C as "cause". But... warning: sometimes thinking this way will be very non-intuitive.

p(C = 1 | E = 1) = p(E = 1 | C = 1) p(C = 1) / p(E = 1). We want the left-hand side; we know p(E = 1 | C = 1) and p(C = 1); the denominator p(E = 1) we can calculate. Another rule of probability theory, marginalizing: p(E = 1) = Σ_c p(E = 1 | C = c) p(C = c), summing over all values c that C can take. Think of this as: given all possible things that can happen with C, what is the probability of E = 1?

p(C = 1 | E = 1) = p(E = 1 | C = 1) p(C = 1) / p(E = 1)
                 = p(E = 1 | C = 1) p(C = 1) / [ p(E = 1 | C = 1) p(C = 1) + p(E = 1 | C = 0) p(C = 0) ]
Notice the denominator now contains the same term as the numerator. We only need to know two terms here: p(E = 1 | C = 1) p(C = 1) and p(E = 1 | C = 0) p(C = 0).

p(E = 1 | C = 1) p(C = 1) = 0.8 × 0.01 = 0.008
p(E = 1 | C = 0) p(C = 0) = 0.096 × 0.99 = 0.09504
p(C = 1 | E = 1) = 0.008 / (0.008 + 0.09504) = 0.0776 = 7.76%
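The same calculation can be written as a small reusable function; this is a minimal sketch for a binary cause C and binary evidence E (the function name posterior is just an illustrative choice):

def posterior(p_e_given_c, p_e_given_not_c, prior_c):
    """Bayes Theorem for binary C and E: returns p(C=1 | E=1)
    given p(E=1 | C=1), p(E=1 | C=0) and the prior p(C=1)."""
    numerator = p_e_given_c * prior_c
    # Marginalization: p(E=1) = p(E=1|C=1)p(C=1) + p(E=1|C=0)p(C=0)
    evidence = numerator + p_e_given_not_c * (1 - prior_c)
    return numerator / evidence

# The cancer example from above:
print(posterior(0.8, 0.096, 0.01))    # 0.0776... ~ 7.76%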

Bayes theorem. Talk to your neighbours 5 mins or so.

Another Example: what year is it? You jump in a time machine. It takes you somewhere, but you don't know what year it has taken you to. You know it is one of 1885, 1955, 1985, or 2015.

What year is it? You look out the window and see a STEAM train. What are the chances of seeing this in the year 2015? Let's guess...

What year is it? In other years? And remember...

What year is it? Bayes Theorem to the rescue. We can calculate the denominator by marginalizing over the four possible years, and do the same for the other years.

What year is it? Then you look out the window and see someone wearing Nike branded trainers.

What year is it? But now our belief over what year it is has changed, because of the train. Bayes Theorem can just use this, plugging it back into the same equation.

What year is it? Prior belief, then one observation, then another: we believe we are in 1985, with p = 0.945.
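A minimal sketch of the time-machine reasoning in Python. The likelihood values below are placeholders made up purely for illustration (the lecture's actual numbers are on the slides); what matters is the mechanism: the posterior after the steam-train observation becomes the prior for the trainers observation.

years = [1885, 1955, 1985, 2015]
prior = {y: 0.25 for y in years}                # uniform prior over the four years

# Placeholder likelihoods, NOT the lecture's numbers: p(observation | year)
p_steam_train   = {1885: 0.90, 1955: 0.30, 1985: 0.05, 2015: 0.01}
p_nike_trainers = {1885: 0.00, 1955: 0.00, 1985: 0.70, 2015: 0.90}

def update(belief, likelihood):
    """One application of Bayes Theorem: posterior proportional to likelihood * prior."""
    unnormalised = {y: likelihood[y] * belief[y] for y in belief}
    evidence = sum(unnormalised.values())       # the marginal p(observation)
    return {y: v / evidence for y, v in unnormalised.items()}

belief = update(prior, p_steam_train)       # belief after seeing the steam train
belief = update(belief, p_nike_trainers)    # the previous posterior is the new prior
print(belief)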

Bayes theorem, done. Take a 15 minute break.

More Problems Solved with Probabilities. Your car is making a noise. What are the chances that the tank is empty?
The chances of the car making noise, if the tank really is empty: p(noisy = 1 | empty = 1) = 0.9
The chances of the car making noise, if the tank is not empty: p(noisy = 1 | empty = 0) = 0.2
The chances of the tank being empty, regardless of anything else: p(empty = 1) = 0.5
p(empty = 1 | noisy = 1) = ?

Bayes Theorem
p(noisy = 1 | empty = 1) p(empty = 1) = 0.9 × 0.5
p(noisy = 1 | empty = 0) p(empty = 0) = 0.2 × 0.5
p(empty = 1 | noisy = 1) = (0.9 × 0.5) / ((0.9 × 0.5) + (0.2 × 0.5)) = 0.8182
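For a quick check, the same two-line calculation in Python (a sketch, reusing the numbers above):

# p(empty=1 | noisy=1) for the noisy-car example
numerator = 0.9 * 0.5                  # p(noisy=1 | empty=1) p(empty=1)
evidence  = numerator + 0.2 * 0.5      # + p(noisy=1 | empty=0) p(empty=0)
print(numerator / evidence)            # 0.8181... ~ 0.8182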

Another Problem to Solve. A person tests positive for a certain medical disease. What are the chances that they really do have the disease?
The chances of the test being positive, if the person really is ill: p(test = 1 | disease = 1) = 0.9
The chances of the test being positive, if the person is in fact well: p(test = 1 | disease = 0) = 0.01
The chances of the condition, in the general population: p(disease = 1) = 0.05
p(disease = 1 | test = 1) = ?

Bayes Theorem
p(test = 1 | disease = 1) p(disease = 1) = 0.9 × 0.05 = 0.045
p(test = 1 | disease = 0) p(disease = 0) = 0.01 × 0.95 = 0.0095
p(disease = 1 | test = 1) = 0.045 / (0.045 + 0.0095) = 0.8257

Another Problem to Solve. Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time. When it doesn't rain, he incorrectly forecasts rain 20% of the time. What are the chances it will rain on the day of Marie's wedding?
The chances of the forecast saying rain, if it really does rain: p(forecastrain = 1 | rain = 1) = 0.9
The chances of the forecast saying rain, if it will be fine: p(forecastrain = 1 | rain = 0) = 0.2
The chances of rain, in the general case: p(rain = 1) = 5/365 = 0.0137
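As a sketch of the working, plugging these three numbers into the same Bayes Theorem calculation:

# p(rain=1 | forecastrain=1) for Marie's wedding
numerator = 0.9 * (5 / 365)                 # p(forecastrain=1 | rain=1) p(rain=1)
evidence  = numerator + 0.2 * (360 / 365)   # + p(forecastrain=1 | rain=0) p(rain=0)
print(numerator / evidence)                 # 0.0588... : despite the forecast, rain is unlikely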