Bayes Classification

- Uncertainty & Probability
- Bayes' rule
- Choosing hypotheses: Maximum a posteriori
- Maximum Likelihood: Bayes' concept learning
- Maximum Likelihood of a real-valued function
- Bayes optimal classifier
- Joint distributions
- Naive Bayes classifier

Uncertainty

- Our main tool is probability theory, which assigns to each sentence a numerical degree of belief between 0 and 1.
- It provides a way of summarizing uncertainty.

Variables

- Boolean random variables: cavity might be true or false.
- Discrete random variables: weather might be sunny, rainy, cloudy, or snow.
  - Weather = sunny
  - Weather = rainy
  - Weather = cloudy
  - Weather = snow
- Continuous random variables: the temperature takes continuous values.

Where do probabilities come from?

- Frequentist: from experiments. From any finite sample we can estimate the true fraction, and also calculate how accurate our estimate is likely to be.
- Subjectivist: an agent's degree of belief.
- Objectivist: probabilities are part of the true nature of the universe, e.g. the probability 0.5 of a coin landing heads is a property of the coin itself.

- Before the evidence is obtained: prior probability.
  - P(a) is the prior probability that the proposition a is true.
  - P(cavity) = 0.1
- After the evidence is obtained: posterior probability.
  - P(a|b) is the probability of a given that all we know is b.
  - P(cavity|toothache) = 0.8

Axioms of Probability (Kolmogorov's axioms, first published in German in 1933)

- All probabilities are between 0 and 1: for any proposition a, 0 <= P(a) <= 1.
- P(true) = 1, P(false) = 0.
- The probability of a disjunction is given by: P(a ∨ b) = P(a) + P(b) - P(a ∧ b).
- Product rule: P(a ∧ b) = P(a|b) P(b) = P(b|a) P(a).

Theorem of total probability

If events A_1, ..., A_n are mutually exclusive with Σ_i P(A_i) = 1, then:

  P(B) = Σ_i P(B|A_i) P(A_i)

Bayes' rule

- Reverend Thomas Bayes (1702-1761) set down his findings on probability in "Essay Towards Solving a Problem in the Doctrine of Chances" (1763), published posthumously in the Philosophical Transactions of the Royal Society of London.

  P(b|a) = P(a|b) P(b) / P(a)
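As a quick numerical check of these two formulas, here is a minimal Python sketch; the priors and likelihoods below are hypothetical, not from the slides.

  # Total probability and Bayes' rule on a two-event partition (hypothetical numbers).
  def total_probability(priors, likelihoods):
      """P(B) = sum_i P(B|A_i) P(A_i) for mutually exclusive, exhaustive A_i."""
      return sum(p * l for p, l in zip(priors, likelihoods))

  def bayes(prior_a, likelihood_b_given_a, prob_b):
      """P(A|B) = P(B|A) P(A) / P(B)."""
      return likelihood_b_given_a * prior_a / prob_b

  priors = [0.3, 0.7]          # P(A1), P(A2)
  likelihoods = [0.9, 0.2]     # P(B|A1), P(B|A2)
  p_b = total_probability(priors, likelihoods)       # 0.3*0.9 + 0.7*0.2 = 0.41
  print(p_b, bayes(priors[0], likelihoods[0], p_b))  # P(A1|B) ≈ 0.659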

How can we arrive at Bayes' rule?

- For any finite sample, we can estimate the true fraction and also calculate how accurate our estimate is likely to be.
- By using samples, as in most physical measurements, we estimate the values.
- This approach is called frequentist: we approach the true value by counting the frequency of an event.

Conditional probability

- If Ω is the set of all possible events, then P(Ω) = 1 and every event a is a subset of Ω.
- The cardinality of a set is its number of elements: card(Ω) is the number of elements of the set Ω, card(a) is the number of elements of the set a.
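A minimal sketch of this frequentist view, estimating P(cavity | toothache) as card(cavity ∧ toothache) / card(toothache) from a small sample; the sample itself is hypothetical, only to illustrate the counting.

  # Frequentist estimate of a conditional probability from a finite sample.
  sample = [
      {"cavity": True,  "toothache": True},
      {"cavity": True,  "toothache": False},
      {"cavity": False, "toothache": True},
      {"cavity": False, "toothache": False},
      {"cavity": False, "toothache": False},
  ]
  card_b  = sum(1 for e in sample if e["toothache"])                   # card(toothache)
  card_ab = sum(1 for e in sample if e["toothache"] and e["cavity"])   # card(cavity ∧ toothache)
  print(card_ab / card_b)   # estimate of P(cavity | toothache) = 1/2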

Bayes' rule

Diagnosis

- What is the probability of meningitis in a patient with a stiff neck?
- A doctor knows that meningitis causes a stiff neck 50% of the time: P(s|m) = 0.5.
- Prior probabilities:
  - The probability that a patient has meningitis is 1/50,000: P(m) = 0.00002.
  - The probability that a patient has a stiff neck is 1/20: P(s) = 0.05.

  P(m|s) = P(s|m) P(m) / P(s) = 0.5 * 0.00002 / 0.05 = 0.0002

Normalization

- The posterior distribution can also be computed by normalizing so that the entries sum to 1:

  P(Y|x) = α P(x|Y) P(Y)

- For example, α <0.12, 0.08> = <0.6, 0.4>, where α = 1 / (0.12 + 0.08).
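The meningitis calculation and the normalization step can be reproduced directly; the <0.12, 0.08> vector is the one shown on the slide.

  # Meningitis example: P(m|s) = P(s|m) P(m) / P(s), plus the normalization trick.
  p_s_given_m = 0.5        # P(s|m): stiff neck given meningitis
  p_m = 1 / 50000          # P(m)
  p_s = 1 / 20             # P(s)
  print(p_s_given_m * p_m / p_s)                 # 0.0002

  # Normalization: P(Y|x) = alpha * <P(x|y) P(y), P(x|~y) P(~y)>
  unnormalised = [0.12, 0.08]                    # numbers from the slide
  alpha = 1 / sum(unnormalised)
  print([alpha * u for u in unnormalised])       # [0.6, 0.4]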

Bayes Theorem

  P(h|D) = P(D|h) P(h) / P(D)

- P(h) = prior probability of hypothesis h
- P(D) = prior probability of training data D
- P(h|D) = probability of h given D
- P(D|h) = probability of D given h

Choosing Hypotheses

- Generally we want the most probable hypothesis given the training data, the maximum a posteriori hypothesis h_MAP:

  h_MAP = argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h) P(h) / P(D) = argmax_{h ∈ H} P(D|h) P(h)

- If we assume P(h_i) = P(h_j) for all h_i and h_j, we can simplify further and choose the maximum likelihood (ML) hypothesis:

  h_ML = argmax_{h ∈ H} P(D|h)
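A small sketch contrasting the MAP and ML choices over a three-hypothesis space; the priors and likelihoods are made up for illustration.

  # MAP and ML hypothesis selection over a tiny hypothetical hypothesis space.
  priors      = {"h1": 0.7, "h2": 0.2, "h3": 0.1}   # P(h)   (hypothetical)
  likelihoods = {"h1": 0.1, "h2": 0.8, "h3": 0.9}   # P(D|h) (hypothetical)

  h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])   # argmax P(D|h) P(h)
  h_ml  = max(priors, key=lambda h: likelihoods[h])               # argmax P(D|h)
  print(h_map, h_ml)   # h2 is MAP (0.16), h3 is ML (0.9)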

Example

- Does the patient have cancer or not? A patient takes a lab test and the result comes back positive. The test returns a correct positive result (+) in only 98% of the cases in which the disease is actually present, and a correct negative result (-) in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer.
- Suppose a positive result (+) is returned...

Normalization

  P(cancer|+) = 0.0078 / (0.0078 + 0.0298) = 0.20745
  P(¬cancer|+) = 0.0298 / (0.0078 + 0.0298) = 0.79255

  (where 0.0078 = P(+|cancer) P(cancer) = 0.98 * 0.008 and 0.0298 = P(+|¬cancer) P(¬cancer) = 0.03 * 0.992)

- The result of Bayesian inference depends strongly on the prior probabilities, which must be available in order to apply the method.

Brute-Force Bayes Concept Learning

- For each hypothesis h in H, calculate the posterior probability P(h|D) = P(D|h) P(h) / P(D).
- Output the hypothesis h_MAP with the highest posterior probability.
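The cancer example can be checked with the same normalization idea; note that the slide rounds the two joint terms to 0.0078 and 0.0298 before normalizing, so the printed values differ slightly in the last digits.

  # Cancer test example: posterior after a positive result, via normalization.
  p_cancer = 0.008
  p_pos_given_cancer = 0.98
  p_pos_given_no_cancer = 1 - 0.97       # false-positive rate

  joint_cancer    = p_pos_given_cancer * p_cancer            # ≈ 0.0078
  joint_no_cancer = p_pos_given_no_cancer * (1 - p_cancer)   # ≈ 0.0298
  alpha = 1 / (joint_cancer + joint_no_cancer)
  print(alpha * joint_cancer, alpha * joint_no_cancer)       # ≈ 0.21, ≈ 0.79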

- Given no prior knowledge that one hypothesis is more likely than another, what values should we specify for P(h)?
- What choice shall we make for P(D|h)? The hypothesis generates the data...
- Choose P(h) to be the uniform distribution: P(h) = 1/|H| for all h in H.
- P(D|h) = 1 if h is consistent with D, and P(D|h) = 0 otherwise.

- The version space VS_{H,D} is the subset of hypotheses from H that are consistent with the training examples in D.
- With the uniform prior and the "hypothesis generates this data" likelihood, the evidence is:

  P(D) = Σ_{h_i ∈ H} P(D|h_i) P(h_i)
       = Σ_{h_i ∈ VS_{H,D}} 1 · (1/|H|) + Σ_{h_i ∉ VS_{H,D}} 0 · (1/|H|)
       = |VS_{H,D}| / |H|

- Therefore:

  P(h|D) = 0, if h is inconsistent with D
  P(h|D) = (1 · (1/|H|)) / (|VS_{H,D}| / |H|) = 1 / |VS_{H,D}|, if h is consistent with D
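A minimal sketch of this result, using a hypothetical space of threshold hypotheses: with the uniform prior, every hypothesis in the version space receives posterior 1/|VS_{H,D}| and all others receive 0.

  # Brute-force Bayes concept learning with a uniform prior over a toy hypothesis space.
  # Hypotheses are threshold classifiers h_t(x) = (x >= t); the space is hypothetical.
  H = [lambda x, t=t: x >= t for t in (1, 2, 3, 4, 5)]
  D = [(2, False), (4, True)]              # (x, label) training examples

  consistent = [h for h in H if all(h(x) == y for x, y in D)]   # version space VS_H,D
  posterior = [(1 / len(consistent)) if h in consistent else 0.0 for h in H]
  print(len(consistent), posterior)        # each consistent h gets 1/|VS_H,D|, others 0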

The hypothesis generates this data...

Maximum Likelihood of a real-valued function

- See EM clustering.

- Maximize the natural log of this instead...

Bayes optimal Classifier

- A weighted majority classifier.
- What is the most probable classification of the new instance given the training data?
- The most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities.
- If the classification of the new example can take any value v_j from some set V, then the probability P(v_j|D) that the correct classification for the new instance is v_j is:

  P(v_j|D) = Σ_{h_i ∈ H} P(v_j|h_i) P(h_i|D)

  (remember, we have to calculate P(h_i|D) for every hypothesis)

Bayes optimal classification:

  v_OB = argmax_{v_j ∈ V} Σ_{h_i ∈ H} P(v_j|h_i) P(h_i|D)

Gibbs Algorithm

- The Bayes optimal classifier provides the best result, but can be expensive if there are many hypotheses: we have to compute the posterior probability for every hypothesis.
- Gibbs algorithm:
  - Choose one hypothesis at random, according to P(h|D).
  - Use this to classify the new instance.
- Surprisingly, its expected error is no worse than twice that of the Bayes optimal classifier (Haussler et al. 1994).
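A short sketch of both classifiers over a hypothetical posterior P(h|D); note that the Bayes optimal classification can disagree with the single MAP hypothesis.

  import random

  # Bayes optimal classification vs. the Gibbs algorithm (hypothetical posteriors).
  posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}         # P(h|D)
  predictions = {"h1": "+", "h2": "-", "h3": "-"}        # each h's classification of x

  # Bayes optimal: weight every hypothesis' vote by its posterior.
  votes = {}
  for h, p in posteriors.items():
      votes[predictions[h]] = votes.get(predictions[h], 0.0) + p
  print(max(votes, key=votes.get))                       # "-" wins with weight 0.6, though h1 (MAP) says "+"

  # Gibbs: sample one hypothesis according to P(h|D) and use it alone.
  h = random.choices(list(posteriors), weights=posteriors.values())[0]
  print(predictions[h])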

Joint distribution

- A joint distribution for toothache, cavity, catch (the dentist's probe catches in my tooth).
- We need to know the conditional probabilities of the conjunction of toothache and cavity.
- What can a dentist conclude if the probe catches in the aching tooth?

  P(cavity | toothache ∧ catch) = P(toothache ∧ catch | cavity) P(cavity) / P(toothache ∧ catch)

- For n possible variables there are 2^n possible combinations.

Conditional Independence

- Once we know that the patient has a cavity, we do not expect the probability of the probe catching to depend on the presence of a toothache:

  P(catch | cavity ∧ toothache) = P(catch | cavity)
  P(toothache | cavity ∧ catch) = P(toothache | cavity)

- Independence between a and b:

  P(a|b) = P(a),  P(b|a) = P(b)

  P(a ∧ b) = P(a) P(b)

  P(toothache, catch, cavity, Weather = cloudy) = P(Weather = cloudy) P(toothache, catch, cavity)

- The decomposition of large probabilistic domains into weakly connected subsets via conditional independence is one of the most important developments in the recent history of AI.
- This can work well even when the assumption is not true!
- A single cause directly influences a number of effects, all of which are conditionally independent:

  P(cause, effect_1, effect_2, ..., effect_n) = P(cause) Π_{i=1}^{n} P(effect_i | cause)
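A one-line illustration of this factorization; the numbers below are hypothetical, only to show the shape of the computation.

  from math import prod

  # Naive-Bayes factorization: P(cause, e1..en) = P(cause) * prod_i P(e_i | cause).
  p_cause = 0.2
  p_effect_given_cause = [0.9, 0.7, 0.6]          # P(e_i | cause) for three effects
  print(p_cause * prod(p_effect_given_cause))     # 0.2 * 0.9 * 0.7 * 0.6 = 0.0756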

Naive Bayes Classifier

- Along with decision trees, neural networks, and nearest neighbor, one of the most practical learning methods.
- When to use:
  - A moderate or large training set is available.
  - The attributes that describe instances are conditionally independent given the classification.
- Successful applications:
  - Diagnosis
  - Classifying text documents

Naive Bayes Classifier

- Assume the target function is f: X → V, where each instance x is described by attributes a_1, a_2, ..., a_n.
- The most probable value of f(x) is:

  v_MAP = argmax_{v_j ∈ V} P(v_j | a_1, a_2, ..., a_n) = argmax_{v_j ∈ V} P(a_1, a_2, ..., a_n | v_j) P(v_j)

- The Naive Bayes assumption:

  P(a_1, a_2, ..., a_n | v_j) = Π_i P(a_i | v_j)

- which gives the Naive Bayes classifier:

  v_NB = argmax_{v_j ∈ V} P(v_j) Π_i P(a_i | v_j)

Naive Bayes Algorithm

- For each target value v_j:
  - P̂(v_j) ← estimate P(v_j)
  - For each attribute value a_i of each attribute a:
    - P̂(a_i|v_j) ← estimate P(a_i|v_j)
- Classify a new instance x by returning v_NB computed with these estimates.
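A minimal Python sketch of this algorithm, using simple frequency estimates with no smoothing; the function and variable names are mine, not from the slides. It can be fed the training table on the next slide.

  from collections import Counter, defaultdict

  # Naive Bayes: estimate P(v_j) and P(a_i | v_j) from counts, then classify by
  # argmax_v P(v) * prod_i P(a_i | v).
  def naive_bayes_learn(examples):
      """examples: list of (attribute_dict, target_value) pairs."""
      class_counts = Counter(v for _, v in examples)
      cond_counts = defaultdict(Counter)            # counts of (attribute, class) -> value
      for attrs, v in examples:
          for a, val in attrs.items():
              cond_counts[(a, v)][val] += 1
      priors = {v: c / len(examples) for v, c in class_counts.items()}
      return priors, cond_counts, class_counts

  def naive_bayes_classify(model, x):
      priors, cond_counts, class_counts = model
      def score(v):
          p = priors[v]
          for a, val in x.items():
              p *= cond_counts[(a, v)][val] / class_counts[v]   # estimate of P(a_i|v_j)
          return p
      return max(priors, key=score)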

Training dataset

Classes: C1: buys_computer = yes; C2: buys_computer = no
Data sample: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

  age     income   student   credit_rating   buys_computer
  <=30    high     no        fair            no
  <=30    high     no        excellent       no
  31..40  high     no        fair            yes
  >40     medium   no        fair            yes
  >40     low      yes       fair            yes
  >40     low      yes       excellent       no
  31..40  low      yes       excellent       yes
  <=30    medium   no        fair            no
  <=30    low      yes       fair            yes
  >40     medium   yes       fair            yes
  <=30    medium   yes       excellent       yes
  31..40  medium   no        excellent       yes
  31..40  high     yes       fair            yes
  >40     medium   no        excellent       no

Naive Bayesian Classifier: Example

- Compute P(X|C_i) for each class:

  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
  P(buys_computer = "yes") = 9/14
  P(buys_computer = "no") = 5/14

- For X = (age <= 30, income = medium, student = yes, credit_rating = fair):

  P(X | buys_computer = "yes") = 0.222 * 0.444 * 0.667 * 0.667 = 0.044
  P(X | buys_computer = "no") = 0.6 * 0.4 * 0.2 * 0.4 = 0.019

  P(X | buys_computer = "yes") * P(buys_computer = "yes") = 0.028
  P(X | buys_computer = "no") * P(buys_computer = "no") = 0.007

- X belongs to class buys_computer = "yes".
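The arithmetic of this worked example can be verified directly from the rounded per-attribute estimates above (or by feeding the table into the learner sketched after the previous slide).

  # Reproducing the worked example: P(X|C_i) and P(X|C_i) P(C_i) for the sample X.
  p_x_given_yes = 0.222 * 0.444 * 0.667 * 0.667       # ≈ 0.044
  p_x_given_no  = 0.6 * 0.4 * 0.2 * 0.4               # ≈ 0.019
  p_yes, p_no = 9 / 14, 5 / 14

  print(p_x_given_yes * p_yes, p_x_given_no * p_no)   # ≈ 0.028 vs ≈ 0.007
  print("yes" if p_x_given_yes * p_yes > p_x_given_no * p_no else "no")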

- The conditional independence assumption is often violated...
- ...but it works surprisingly well anyway.

Estimating Probabilities

- We have estimated probabilities by the fraction n_c / n: the number of times the event is observed to occur (n_c) over the total number of opportunities (n).
- This provides poor estimates when n_c is very small.
- What if none of the training instances with target value v_j have attribute value a_i? Then n_c is 0, and the estimate of P(a_i|v_j) becomes 0, wiping out the whole product.

- When n_c is very small, use the m-estimate of probability:

  P̂(a_i|v_j) = (n_c + m·p) / (n + m)

  - n is the number of training examples for which v = v_j
  - n_c is the number of examples for which v = v_j and a = a_i
  - p is a prior estimate of the probability
  - m is the weight given to the prior, i.e. the number of "virtual" examples

Naive Bayesian Classifier: Comments

- Advantages:
  - Easy to implement.
  - Good results obtained in most cases.
- Disadvantages:
  - Assumption of class-conditional independence, therefore loss of accuracy.
  - In practice, dependencies exist among variables. E.g., in hospital data on patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.). Dependencies among these cannot be modeled by the Naive Bayesian classifier.
- How to deal with these dependencies? Bayesian Belief Networks.
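A one-function sketch of the m-estimate; the numbers in the usage line are hypothetical.

  # m-estimate of P(a_i | v_j): (n_c + m * p) / (n + m).
  def m_estimate(n_c, n, p, m):
      return (n_c + m * p) / (n + m)

  # Zero observed examples (n_c = 0) no longer forces the estimate to zero:
  print(m_estimate(n_c=0, n=5, p=1/3, m=3))   # 0.125 instead of 0.0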

- Uncertainty & Probability
- Bayes' rule
- Choosing hypotheses: Maximum a posteriori
- Maximum Likelihood: Bayes' concept learning
- Maximum Likelihood of a real-valued function
- Bayes optimal classifier
- Joint distributions
- Naive Bayes classifier

Bayesian Belief Networks

  Burglary:   P(B) = 0.001
  Earthquake: P(E) = 0.002

  Alarm:      Burg.  Earth.  P(A)
              t      t       0.95
              t      f       0.94
              f      t       0.29
              f      f       0.001

  JohnCalls:  A      P(J)
              t      0.90
              f      0.05

  MaryCalls:  A      P(M)
              t      0.70
              f      0.01
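As a closing sketch: the network's conditional probability tables determine any joint probability via P(B, E, A, J, M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A). The query below (John and Mary call, the alarm sounds, no burglary, no earthquake) is only an illustrative example, not taken from the slide.

  # Joint probability from the burglary network's CPTs.
  P_B, P_E = 0.001, 0.002
  P_A = {(True, True): 0.95, (True, False): 0.94,
         (False, True): 0.29, (False, False): 0.001}       # P(A=t | B, E)
  P_J = {True: 0.90, False: 0.05}                           # P(J=t | A)
  P_M = {True: 0.70, False: 0.01}                           # P(M=t | A)

  def joint(b, e, a, j, m):
      p  = P_B if b else 1 - P_B
      p *= P_E if e else 1 - P_E
      p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
      p *= P_J[a] if j else 1 - P_J[a]
      p *= P_M[a] if m else 1 - P_M[a]
      return p

  # P(j, m, a, ¬b, ¬e) = 0.90 * 0.70 * 0.001 * 0.999 * 0.998 ≈ 0.000628
  print(joint(b=False, e=False, a=True, j=True, m=True))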