CISC 4631 Data Mining Lecture 06: Bayes Theorem. These slides are based on slides by Tan, Steinbach, and Kumar (textbook authors), Eamonn Keogh (UC Riverside), and Andrew Moore (CMU/Google). 1

Naïve Bayes Classifier. Thomas Bayes, 1702-1761. We will start off with a visual intuition, before looking at the math. 2

[Scatter plot: Antenna Length (1-10) vs. Abdomen Length (1-10) for Grasshoppers and Katydids] Remember this example? Let's get lots more data. 3

With a lot of data, we can build a histogram. Let us just build one for Antenna Length for now. [Histogram of Antenna Length (1-10) for Katydids and Grasshoppers] 4

We can leave the histograms as they are, or we can summarize them with two normal distributions. Let us use two normal distributions for ease of visualization in the following slides. 5

We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it? We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid? There is a formal way to discuss the most probable classification: p(c_j | d) = probability of class c_j, given that we have observed d. [Figure: the two distributions, with antennae length 3 marked] 6
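A minimal sketch of this comparison in Python, assuming made-up means and standard deviations for the two normal distributions (the real parameters would be fitted to the histograms above):

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * sqrt(2 * pi))

# Hypothetical parameters summarizing the two antenna-length histograms.
classes = {
    "Grasshopper": {"mean": 4.0, "std": 1.5},
    "Katydid":     {"mean": 7.0, "std": 1.5},
}

antenna_length = 3.0
for name, params in classes.items():
    density = gaussian_pdf(antenna_length, params["mean"], params["std"])
    print(f"p(antenna = {antenna_length} | {name}) = {density:.4f}")
```

With equal priors, comparing the class-conditional densities p(d | c_j) ranks the classes the same way as the posteriors p(c_j | d), so the class with the larger density wins.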

Bayes Classifier. A probabilistic framework for classification problems. Often appropriate because the world is noisy and also some relationships are probabilistic in nature. Is predicting who will win a baseball game probabilistic in nature? Before getting to the heart of the matter, we will go over some basic probability. We will review the concept of reasoning with uncertainty, also known as probability. This is a fundamental building block for understanding how Bayesian classifiers work. It's really going to be worth it. You may find a few of these basic probability questions on your exam. Stop me if you have questions!!!! 7

Discrete Random Variables. A is a Boolean-valued random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs. Examples: A = The next patient you examine is suffering from inhalational anthrax. A = The next patient you examine has a cough. A = There is an active terrorist cell in your city. 8

Probabilities. We write P(A) as the fraction of possible worlds in which A is true. We could at this point spend 2 hours on the philosophy of this. But we won't. 9

Visualizing A. [Diagram: the event space of all possible worlds, with total area 1; a reddish oval marks the worlds in which A is true, the rest are worlds in which A is false. P(A) = area of the reddish oval.] 10

The Axioms Of Probability. 0 <= P(A) <= 1. P(True) = 1. P(False) = 0. P(A or B) = P(A) + P(B) - P(A and B). The area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true. 11

Interpreting the axioms. 0 <= P(A) <= 1. P(True) = 1. P(False) = 0. P(A or B) = P(A) + P(B) - P(A and B). The area of A can't get any bigger than 1, and an area of 1 would mean all worlds will have A true. 12

Interpreting the axioms. 0 <= P(A) <= 1. P(True) = 1. P(False) = 0. P(A or B) = P(A) + P(B) - P(A and B). [Venn diagram: regions A and B] 13

Interpreting the axioms. 0 <= P(A) <= 1. P(True) = 1. P(False) = 0. P(A or B) = P(A) + P(B) - P(A and B). [Venn diagram: A, B, their union P(A or B), and their intersection P(A and B)] Simple addition and subtraction. 14

Another important theorem. 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B). From these we can prove: P(A) = P(A and B) + P(A and not B). [Venn diagram: A split into its overlap with B and its overlap with not B] 15

Conditional Probability. P(A|B) = fraction of worlds in which B is true that also have A true. H = Have a headache. F = Coming down with Flu. P(H) = 1/10. P(F) = 1/40. P(H|F) = 1/2. [Venn diagram: regions F and H] Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache. 16

Conditional Probability. P(H|F) = fraction of flu-inflicted worlds in which you have a headache. H = Have a headache. F = Coming down with Flu. P(H) = 1/10. P(F) = 1/40. P(H|F) = 1/2. P(H|F) = (#worlds with flu and headache) / (#worlds with flu) = (area of the H-and-F region) / (area of the F region) = P(H and F) / P(F). 17

Definition of Conditional Probability. P(A|B) = P(A and B) / P(B). Corollary (the Chain Rule): P(A and B) = P(A|B) P(B). 18
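As a quick numeric illustration (not from the slides), here is the definition and the chain rule checked on a handful of made-up possible worlds:

```python
# Hypothetical (A, B) truth values for six possible worlds.
worlds = [
    (True, True), (True, False), (False, True),
    (False, False), (True, True), (False, True),
]

n = len(worlds)
p_b = sum(1 for a, b in worlds if b) / n              # P(B)
p_a_and_b = sum(1 for a, b in worlds if a and b) / n  # P(A and B)
p_a_given_b = p_a_and_b / p_b                         # definition of P(A|B)

# Chain rule: P(A and B) = P(A|B) * P(B)
assert abs(p_a_and_b - p_a_given_b * p_b) < 1e-12
print(p_a_given_b)  # 0.5: half of the B-worlds also have A
```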

Probabilistic Inference. H = Have a headache. F = Coming down with Flu. P(H) = 1/10. P(F) = 1/40. P(H|F) = 1/2. One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with flu." Is this reasoning good? 19

Probabilistic Inference. H = Have a headache. F = Coming down with Flu. P(H) = 1/10. P(F) = 1/40. P(H|F) = 1/2. P(F and H) = ? P(F|H) = ? 20

Probabilistic Inference. H = Have a headache. F = Coming down with Flu. P(H) = 1/10. P(F) = 1/40. P(H|F) = 1/2. P(F and H) = P(H|F) P(F) = 1/2 × 1/40 = 1/80. P(F|H) = P(F and H) / P(H) = (1/80) / (1/10) = 1/8. 21
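The same computation as a small sketch using exact fractions (the variable names are mine; the numbers are from the slide):

```python
from fractions import Fraction

p_h = Fraction(1, 10)         # P(H)
p_f = Fraction(1, 40)         # P(F)
p_h_given_f = Fraction(1, 2)  # P(H|F)

p_f_and_h = p_h_given_f * p_f  # chain rule: 1/80
p_f_given_h = p_f_and_h / p_h  # definition of conditional probability
print(p_f_given_h)             # 1/8, not 1/2
```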

What we just did: P(B|A) = P(A and B) / P(A) = P(A|B) P(B) / P(A). This is Bayes Rule. Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418. 22
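Bayes Rule packaged as a reusable helper, a minimal sketch assuming all three inputs are already known:

```python
from fractions import Fraction

def bayes(p_a_given_b, p_b, p_a):
    """P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# The headache/flu inference again, now in one call:
print(bayes(Fraction(1, 2), Fraction(1, 40), Fraction(1, 10)))  # 1/8
```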

Some more terminology. The Prior Probability is the probability assuming no specific information. Thus we would refer to P(A) as the prior probability of event A occurring. We would not say that P(A|C) is the prior probability of A occurring. The Posterior Probability is the probability given that we know something. We would say that P(A|C) is the posterior probability of A (given that C occurs). 23

Example of Bayes Theorem. Given: a doctor knows that meningitis causes stiff neck 50% of the time. The prior probability of any patient having meningitis is 1/50,000. The prior probability of any patient having a stiff neck is 1/20. If a patient has a stiff neck, what's the probability he/she has meningitis? P(M|S) = P(S|M) P(M) / P(S) = (0.5 × 1/50,000) / (1/20) = 0.0002. 24
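A short sketch to check the slide's arithmetic with exact fractions (the names are illustrative):

```python
from fractions import Fraction

p_s_given_m = Fraction(1, 2)  # P(S|M): stiff neck given meningitis
p_m = Fraction(1, 50_000)     # P(M): prior on meningitis
p_s = Fraction(1, 20)         # P(S): prior on stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s, float(p_m_given_s))  # 1/5000 = 0.0002
```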

Another Example of Bayes Theorem. [Figure: menus from restaurants with Bad Hygiene and Good Hygiene] You are a health official, deciding whether to investigate a restaurant. You lose a dollar if you get it wrong; you win a dollar if you get it right. Half of all restaurants have bad hygiene. In a bad restaurant, 3/4 of the menus are smudged. In a good restaurant, 1/3 of the menus are smudged. You are allowed to see a randomly chosen menu. 25

Let B = the restaurant has bad hygiene, S = the menu is smudged. P(B|S) = P(B and S) / P(S) = P(B and S) / (P(B and S) + P(not B and S)) = P(S|B) P(B) / (P(S|B) P(B) + P(S|not B) P(not B)) = (3/4 × 1/2) / (3/4 × 1/2 + 1/3 × 1/2) = 9/13. 26
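The same posterior computed via the law of total probability, as a short sketch with exact fractions:

```python
from fractions import Fraction

p_bad = Fraction(1, 2)                # P(B): half of restaurants are bad
p_smudge_given_bad = Fraction(3, 4)   # P(S|B)
p_smudge_given_good = Fraction(1, 3)  # P(S|not B)

# Total probability: P(S) = P(S|B) P(B) + P(S|not B) P(not B)
p_smudge = p_smudge_given_bad * p_bad + p_smudge_given_good * (1 - p_bad)
p_bad_given_smudge = p_smudge_given_bad * p_bad / p_smudge
print(p_bad_given_smudge)  # 9/13
```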

[Figure: sixteen menus] 27

Bayesian Diagnosis (slides 28-34 build up this table of buzzwords one row at a time, each with its meaning and its value in our example):
True State: the true state of the world, which you would like to know. In our example: is the restaurant bad?
Prior: Prob(true state = x). In our example: P(Bad) = 1/2.
Evidence: some symptom, or other thing you can observe. In our example: Smudge.
Conditional: probability of seeing the evidence if you did know the true state. In our example: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3.
Posterior: the Prob(true state = x | some evidence). In our example: P(Bad|Smudge) = 9/13.
Inference, Diagnosis, Bayesian Reasoning: getting the posterior from the prior and the evidence.
Decision theory: combining the posterior with known costs in order to decide what to do.

Why Bayes Theorem at all? P(C|A) = P(A|C) P(C) / P(A). Why model P(C|A) via P(A|C)? Why not model P(C|A) directly? The P(A|C) P(C) decomposition allows us to be sloppy: P(C) and P(A|C) can be trained independently. 35
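A minimal sketch of what "trained independently" can look like: the prior P(C) and the conditional P(A|C) are each estimated from counts, then combined only at prediction time. The data and names here are hypothetical.

```python
from collections import Counter

# Hypothetical labeled observations: (evidence, class) pairs.
data = [("smudged", "bad"), ("clean", "bad"), ("smudged", "bad"),
        ("smudged", "good"), ("clean", "good"), ("clean", "good")]

n = len(data)
class_counts = Counter(c for _, c in data)
joint_counts = Counter(data)

prior = {c: k / n for c, k in class_counts.items()}  # P(C)
conditional = {(a, c): k / class_counts[c]           # P(A|C)
               for (a, c), k in joint_counts.items()}

# Combine the two independently estimated pieces via Bayes (unnormalized):
evidence = "smudged"
score = {c: conditional.get((evidence, c), 0) * prior[c] for c in prior}
print(max(score, key=score.get))  # "bad"
```

Because the two pieces are separate, the prior could come from one data source (say, past inspection records) and the conditional from another, and either can be re-estimated without retraining the other.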

Crime Scene Analogy. A is a crime scene. C is a person who may have committed the crime. P(C|A): look at the scene - who did it? P(C): who had a motive? (Profiler) P(A|C): could they have done it? (CSI - transportation, access to weapons, alibi) 36