COMP61011 Probabilistic Classifiers, Part 1: Bayes Theorem


Reverend Thomas Bayes, 1702-1761

p(T | W) = p(W | T) p(T) / p(W)

Bayes Theorem forms the backbone of the past 20 years of ML research into probabilistic models. It is used everywhere: e.g. finding sunken shipwrecks, your last Google search, determining the guilt of defendants in a trial, assessing the outcome of a breast cancer screening.

Thinking in Probabilities

Day  Outlook   Temperature  Humidity  Wind    Tennis?
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

P(wind = strong): the chance of the wind being strong, among all days.

Thinking in Probabilities

With the same 14-day table, count the strong-wind days among all days:

P(wind = strong) = 6/14 ≈ 0.4286

The chance of the wind being strong, among all days.

Thinking in Probabilities (same table)

P(wind = strong | tennis = yes): the chance of a strong-wind day, given that the person enjoyed tennis.

Thinking in Probabilities

Restrict the table to the 9 days where tennis = yes:

Day  Outlook   Temperature  Humidity  Wind    Tennis?
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D7   Overcast  Cool         Normal    Strong  Yes
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes

P(wind = strong | tennis = yes): the chance of a strong-wind day, given that the person enjoyed tennis.

Thinking in Probabilities

Of the 9 days where tennis = yes, 3 have a strong wind (D7, D11, D12):

P(wind = strong | tennis = yes) = 3/9 ≈ 0.333

The chance of a strong-wind day, given that the person enjoyed tennis.

Thinking in Probabilities

Back to the full table of 14 days:

P(tennis = yes | wind = strong): the chance of the person enjoying tennis, given that it is a strong-wind day.

Thinking in Probabilities

Restrict the table to the 6 days where wind = strong:

Day  Outlook   Temperature  Humidity  Wind    Tennis?
D2   Sunny     Hot          High      Strong  No
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D14  Rain      Mild         High      Strong  No

P(tennis = yes | wind = strong) = 3/6 = 0.5

The chance of the person enjoying tennis, given that it is a strong-wind day.

Thinking in Probabilities

Exercise, using the full 14-day table, compute:

P(temp = hot | tennis = yes)
P(tennis = yes | temp = hot)
P(tennis = yes | temp = hot, humidity = high)
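The counting arguments on these slides are easy to check in a few lines of Python. The sketch below is not part of the original slides; the `p` helper and the column layout are my own, but the data is exactly the 14-day table:

```python
# The 14-day PlayTennis data as (outlook, temp, humidity, wind, tennis) tuples.
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
COLS = {"outlook": 0, "temp": 1, "humidity": 2, "wind": 3, "tennis": 4}

def p(event, given=None, rows=data):
    """Empirical P(event | given); both are dicts like {'wind': 'Strong'}."""
    if given:
        # Conditioning = restricting the table to the matching rows.
        rows = [r for r in rows if all(r[COLS[k]] == v for k, v in given.items())]
    hits = [r for r in rows if all(r[COLS[k]] == v for k, v in event.items())]
    return len(hits) / len(rows)

print(p({"wind": "Strong"}))                                      # 6/14 ≈ 0.4286
print(p({"wind": "Strong"}, {"tennis": "Yes"}))                   # 3/9 ≈ 0.333
print(p({"tennis": "Yes"}, {"wind": "Strong"}))                   # 3/6 = 0.5
print(p({"tennis": "Yes"}, {"temp": "Hot", "humidity": "High"}))  # 1/3
```

The `given` argument does exactly what the slides do by hand: it throws away every row that does not match the condition, then counts within what is left.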

A Problem to Solve

The facts:
1% of the female population have breast cancer
80% of women with breast cancer get a positive mammography
9.6% of women without breast cancer also get a positive mammography

The question: a woman has a positive mammography. What is the probability that she has breast cancer?

Quick guess:
a) less than 1%
b) somewhere between 1% and 70%
c) between 70% and 80%
d) more than 80%

Write Down the Probabilities of Everything

Define variables:
C: 1 = presence of cancer, 0 = no cancer
M: 1 = positive mammography, 0 = negative mammography

The prior probability of cancer in the population is 1%, so P(C=1) = 0.01.
The probability of a positive test given there is cancer: P(M=1 | C=1) = 0.8.
If there is no cancer, we still have P(M=1 | C=0) = 0.096.

The question is: what is P(C=1 | M=1)?

Working with Concrete Numbers

Imagine 10,000 patients.

P(C=1) = 0.01  →  ? cancer
P(C=0) = 0.99  →  ? no cancer
P(M=1 | C=1) = 0.8    →  ? cancer, positive test;  ? cancer, negative test
P(M=1 | C=0) = 0.096  →  ? no cancer, positive test;  ? no cancer, negative test

Working with Concrete Numbers

10,000 patients:

P(C=1) = 0.01  →  100 cancer
P(C=0) = 0.99  →  9900 no cancer
P(M=1 | C=1) = 0.8    →  80 cancer, positive test;  20 cancer, negative test
P(M=1 | C=0) = 0.096  →  950.4 no cancer, positive test;  8949.6 no cancer, negative test

P(C=1 | M=1) = ?
How many people out of 10,000 get M=1?
How many people out of 10,000 get both C=1 and M=1?

Working with Concrete Numbers

Of the 10,000 patients, 80 + 950.4 = 1030.4 test positive, and only 80 of those actually have cancer, so

P(C=1 | M=1) = 80 / (80 + 950.4) ≈ 7.76%
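The frequency argument above can be reproduced directly in code. This is a minimal sketch of the slide's 10,000-patient cohort (the variable names are mine, not the course's):

```python
N = 10_000                   # imaginary cohort size from the slide
p_c = 0.01                   # P(C=1): prior probability of cancer
p_m_given_c = 0.8            # P(M=1 | C=1): positive test given cancer
p_m_given_not_c = 0.096      # P(M=1 | C=0): false-positive rate

cancer = N * p_c                           # 100 people with cancer
no_cancer = N * (1 - p_c)                  # 9900 people without
true_pos = cancer * p_m_given_c            # 80: cancer, positive test
false_pos = no_cancer * p_m_given_not_c    # 950.4: no cancer, positive test

# Among everyone who tests positive, what fraction has cancer?
posterior = true_pos / (true_pos + false_pos)
print(f"P(C=1 | M=1) = {posterior:.4f}")   # ≈ 0.0776, i.e. 7.76%
```

Note that the answer is the same whatever cohort size N we pick; the counts are just a concrete way of carrying the probabilities.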

Surprising Result: Do You Trust Your Doctor?

Although the probability of a positive mammography given cancer is 80%, the probability of cancer given a positive mammography is only about 7.8%. Eight out of ten doctors would have said c) between 70% and 80%. WRONG.

Common mistake: the probability that a woman with a positive mammography has cancer is not the same as the probability that a woman with cancer has a positive mammography. One must also consider the background chance (prior) of having breast cancer, and the chance of receiving a false alarm from the test.

A Return to Tennis, and... an Interesting Symmetry

W: 1 = strong, 0 = weak
T: 1 = yes, 0 = no

p(W=1 | T=1) p(T=1) = p(T=1 | W=1) p(W=1)

Try it again for a different assignment, e.g. T=1, W=0.

Bayes Rule

From the previous slide, we know that

p(W | T) p(T) = p(T | W) p(W)

(i.e. this is true for any assignment of values to the variables). This leads to what is known as Bayes Rule:

p(T | W) = p(W | T) p(T) / p(W)

Solving the Medical Problem with Bayes Rule

P(C=1 | M=1) = P(M=1 | C=1) P(C=1) / P(M=1)

We want the left-hand side. We know P(M=1 | C=1) and we know P(C=1), but we don't know P(M=1).

10,000 patients:

P(C=1) = 0.01  →  100 cancer
P(C=0) = 0.99  →  9900 no cancer
P(M=1 | C=1) = 0.8    →  80 cancer, positive test;  20 cancer, negative test
P(M=1 | C=0) = 0.096  →  950.4 no cancer, positive test;  8949.6 no cancer, negative test

P(M=1) = 80/10,000 + 950.4/10,000 = (0.01 × 0.8) + (0.99 × 0.096)

To get P(M=1), just multiply the probabilities along the branches and add.

Solving the Medical Problem with Bayes Rule

P(C=1 | M=1) = P(M=1 | C=1) P(C=1) / [ P(M=1 | C=1) P(C=1) + P(M=1 | C=0) P(C=0) ]

Notice the denominator now contains the same term as the numerator. We only need to know two terms here: P(M=1 | C=1) P(C=1) and P(M=1 | C=0) P(C=0).

Solving the Medical Problem with Bayes Rule

P(M=1 | C=1) P(C=1) = 0.8 × 0.01 = 0.008
P(M=1 | C=0) P(C=0) = 0.096 × 0.99 = 0.09504

P(C=1 | M=1) = 0.008 / (0.008 + 0.09504) ≈ 0.0776 = 7.76%
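The same calculation, written as a short Python sketch (the function name `bayes_posterior` and its argument names are mine; the numbers are the slide's):

```python
def bayes_posterior(prior, like_pos, like_neg):
    """P(H=1 | E=1) via Bayes Rule, for binary hypothesis H and evidence E.

    prior    = P(H=1)
    like_pos = P(E=1 | H=1)
    like_neg = P(E=1 | H=0)
    """
    num = like_pos * prior                    # P(E=1 | H=1) P(H=1)
    den = num + like_neg * (1 - prior)        # ... + P(E=1 | H=0) P(H=0)
    return num / den

# The mammography example: prior 1%, sensitivity 80%, false alarm 9.6%.
print(bayes_posterior(prior=0.01, like_pos=0.8, like_neg=0.096))  # ≈ 0.0776
```

The denominator is exactly the two-branch sum from the previous slide, so no separate P(M=1) needs to be supplied.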

Another Problem to Solve with Probabilities

Your car is making a noise. What are the chances that the tank is empty?

The chance of the car making noise, if the tank really is empty: P(noisy=1 | empty=1) = 0.9
The chance of the car making noise, if the tank is not empty: P(noisy=1 | empty=0) = 0.2
The chance of the tank being empty, regardless of anything else: P(empty=1) = 0.5

P(empty=1 | noisy=1) = ?

Bayes Rule

P(noisy=1 | empty=1) P(empty=1) = 0.9 × 0.5 = 0.45
P(noisy=1 | empty=0) P(empty=0) = 0.2 × 0.5 = 0.1

P(empty=1 | noisy=1) = 0.45 / (0.45 + 0.1) ≈ 0.8182

Another Problem to Solve

A person tests positive for a certain medical disease. What are the chances that they really do have the disease?

The chance of the test being positive, if the person really is ill: P(test=1 | disease=1) = 0.9
The chance of the test being positive, if the person is in fact well: P(test=1 | disease=0) = 0.01
The chance of the condition in the general population: P(disease=1) = 0.05

P(disease=1 | test=1) = ?

Bayes Rule

P(test=1 | disease=1) P(disease=1) = 0.9 × 0.05 = 0.045
P(test=1 | disease=0) P(disease=0) = 0.01 × 0.95 = 0.0095

P(disease=1 | test=1) = 0.045 / (0.045 + 0.0095) ≈ 0.8257

Another Problem to Solve

Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time. When it doesn't rain, he incorrectly forecasts rain 20% of the time. What are the chances it will rain on the day of Marie's wedding?

The chance of the forecast saying rain, if it really does rain: P(forecastrain=1 | rain=1) = 0.9
The chance of the forecast saying rain, if it will be fine: P(forecastrain=1 | rain=0) = 0.2
The chance of rain, in the general case: P(rain=1) = 5/365 ≈ 0.0137

Bayes Rule

P(forecastrain=1 | rain=1) P(rain=1) = 0.9 × 0.0137 ≈ 0.0123
P(forecastrain=1 | rain=0) P(rain=0) = 0.2 × 0.9863 ≈ 0.1973

P(rain=1 | forecastrain=1) = 0.0123 / (0.0123 + 0.1973) ≈ 0.0587

Only about a 5.9% chance of rain.
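All three of these small problems follow the same two-hypothesis pattern, so one tiny function covers them. This is a sketch of my own, not course code; `bayes` is my name for it:

```python
def bayes(prior, p_evidence_if_true, p_evidence_if_false):
    """P(hypothesis | evidence) for a binary hypothesis, via Bayes Rule."""
    num = p_evidence_if_true * prior
    return num / (num + p_evidence_if_false * (1 - prior))

print(bayes(0.5, 0.9, 0.2))      # empty tank given noise:    ≈ 0.8182
print(bayes(0.05, 0.9, 0.01))    # disease given positive:    ≈ 0.8257
print(bayes(5 / 365, 0.9, 0.2))  # rain given rain forecast:  ≈ 0.0588
```

The structure never changes; only the three input probabilities do. Notice how the tiny prior in the rain problem drags the posterior down despite the 90% forecast accuracy.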

P(W | T) P(T) as a Network

The factorization p(T, W) = p(W | T) p(T) can be drawn as a two-node network: an arrow from T to W.

But what about the other features in the 14-day table (Outlook, Temperature, Humidity)?

This? We'll Run Out of Data...

We could point one arrow from T at the whole feature block: model p(Wind, Temp, Humid, Outlook | T) and p(T) directly. But that joint table has 2 × 3 × 2 × 3 = 36 feature combinations per class, and we only have 14 days of data. We'll run out of data.

Bayes Rule with several features:

p(T | X) = p(X | T) p(T) / p(X),   where X = {wind, temp, humidity, outlook}, X1 = wind, X2 = temp, etc.

Let's assume that all the variables are INDEPENDENT given T:

p(T | X) = p(T) ∏(i=1..4) p(Xi | T) / p(X)

This is the NAÏVE BAYES assumption.

X = {wind, temp, humidity, outlook}, X1 = wind, X2 = temp, etc.

p(T | X) = p(T) ∏(i=1..4) p(Xi | T) / p(X)

As a network: T with an arrow to each of Wind, Temp, Humid, Outlook.

Naïve Bayes is a special case of a BAYESIAN NETWORK:
- we can add more links to specify dependencies
- e.g. temperature affects humidity
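The Naïve Bayes factorization above can be sketched in plain Python on the 14-day table. This is my own minimal implementation (no smoothing, frequency estimates only), not the course's code:

```python
# 14-day PlayTennis data: ((outlook, temp, humidity, wind), tennis).
days = [
    (("Sunny", "Hot", "High", "Weak"), "No"),
    (("Sunny", "Hot", "High", "Strong"), "No"),
    (("Overcast", "Hot", "High", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Strong"), "No"),
    (("Overcast", "Cool", "Normal", "Strong"), "Yes"),
    (("Sunny", "Mild", "High", "Weak"), "No"),
    (("Sunny", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Mild", "Normal", "Weak"), "Yes"),
    (("Sunny", "Mild", "Normal", "Strong"), "Yes"),
    (("Overcast", "Mild", "High", "Strong"), "Yes"),
    (("Overcast", "Hot", "Normal", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Strong"), "No"),
]

def naive_bayes(x):
    """Return P(T = label | X = x) under the naive Bayes assumption."""
    scores = {}
    for label in ("Yes", "No"):
        rows = [f for f, t in days if t == label]
        score = len(rows) / len(days)              # p(T)
        for i, value in enumerate(x):              # times each p(Xi | T)
            score *= sum(f[i] == value for f in rows) / len(rows)
        scores[label] = score
    z = sum(scores.values())                       # p(X), by normalising
    return {label: s / z for label, s in scores.items()}

# Classify an unseen day: sunny, cool, high humidity, strong wind.
print(naive_bayes(("Sunny", "Cool", "High", "Strong")))
```

For this query the "No" class wins (posterior around 0.80), even though the exact feature combination never appears in the table; that is exactly what the independence assumption buys us.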

The network: T with an arrow to each of Wind, Temp, Humid, Outlook.

We can also learn the linkage itself; this is called structure learning, but it is NP-hard. Calculating the final probability P(T | X1, ..., Xn) is called inference, and it is computationally intensive when we have very complicated graphs. In spite of this, Bayesian networks are a very flexible way of learning, a subclass of a wider class of probabilistic modelling algorithms, and state of the art in modern Machine Learning.