Probabilistic and Bayesian Analytics


This version of the PowerPoint presentation has been labeled to let you know which bits will be covered in the main class (Tue/Thu morning lectures) and which parts will be covered in the review sessions.

Probabilistic and Bayesian Analytics. Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received. Andrew W. Moore, Associate Professor, School of Computer Science, Carnegie Mellon University, awm@cs.cmu.edu. Copyright 2001, Andrew W. Moore.

Probability. The world is a very uncertain place. Thirty years of Artificial Intelligence and Database research danced around this fact. And then a few AI researchers decided to use some ideas from the eighteenth century.

What we're going to do. We will review the fundamentals of probability. It's really going to be worth it. In this lecture, you'll see an example of probabilistic analytics in action: Bayes Classifiers.

Discrete Random Variables (Review Session). A is a Boolean-valued random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs. Examples: A = The US president in 2023 will be male. A = You wake up tomorrow with a headache. A = You have Ebola.

Probabilities (Review Session). We write P(A) as the fraction of possible worlds in which A is true. We could at this point spend 2 hours on the philosophy of this. But we won't.

Visualizing A (Review Session). Event space of all possible worlds; its area is 1. Worlds in which A is true: P(A) = area of the reddish oval. The rest are worlds in which A is false.

The Axioms of Probability (Review Session). 0 <= P(A) <= 1. P(True) = 1. P(False) = 0. P(A or B) = P(A) + P(B) - P(A and B). Where do these axioms come from? Were they discovered? Answers coming up later.

Interpreting the axioms (Review Session). The area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.

Interpreting the axioms (Review Session). The area of A can't get any bigger than 1, and an area of 1 would mean all worlds would have A true.

Interpreting the axioms (Review Session). A diagram of two overlapping events A and B illustrates P(A or B) = P(A) + P(B) - P(A and B).

Interpreting the axioms (Review Session). P(A or B) = P(A) + P(B) - P(A and B): the region where both A and B hold would otherwise be counted twice, so the rule is simple addition and subtraction.

These Axioms are Not to be Trifled With (Review Session). There have been attempts to do different methodologies for uncertainty: Fuzzy Logic, Three-valued logic, Dempster-Shafer, Non-monotonic reasoning. But the axioms of probability are the only system with this property: if you gamble using them you can't be unfairly exploited by an opponent using some other system [di Finetti 1931].

Theorems from the Axioms. From 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, and P(A or B) = P(A) + P(B) - P(A and B) we can prove: P(not A) = P(~A) = 1 - P(A). How?

Side Note (Review Session). I am inflicting these proofs on you for two reasons: 1. These kinds of manipulations will need to be second nature to you if you use probabilistic analytics in depth. 2. Suffering is good for you.

Another important theorem. From the same axioms we can prove: P(A) = P(A ^ B) + P(A ^ ~B). How?

Multivalued Random Variables. Suppose A can take on more than 2 values. A is a random variable with arity k if it can take on exactly one value out of {v1, v2, ..., vk}. Thus P(A = vi and A = vj) = 0 if i != j, and P(A = v1 or A = v2 or ... or A = vk) = 1.

An easy fact about Multivalued Random Variables (Review Session). Using the axioms of probability, and assuming that A obeys P(A = vi and A = vj) = 0 if i != j and P(A = v1 or ... or A = vk) = 1, it's easy to prove that P(A = v1 or A = v2 or ... or A = vi) = sum_{j=1..i} P(A = vj).

An easy fact about Multivalued Random Variables (Review Session). And thus we can prove: sum_{j=1..k} P(A = vj) = 1.

Another fact about Multivalued Random Variables (Review Session). Using the axioms and the same two assumptions about A, it's easy to prove that P(B and [A = v1 or A = v2 or ... or A = vi]) = sum_{j=1..i} P(B and A = vj).

Another fact about Multivalued Random Variables (Review Session). And thus we can prove: P(B) = sum_{j=1..k} P(B and A = vj).

Elementary Probability in Pictures (Review Session). P(~A) + P(A) = 1.

Elementary Probability in Pictures (Review Session). P(B) = P(B ^ A) + P(B ^ ~A).

Elementary Probability in Pictures (Review Session). sum_{j=1..k} P(A = vj) = 1.

Elementary Probability in Pictures (Review Session). P(B) = sum_{j=1..k} P(B ^ A = vj).

Conditional Probability. P(A|B) = the fraction of worlds in which B is true that also have A true. H = "Have a headache", F = "Coming down with flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache.

Conditional Probability. P(H|F) = fraction of flu-inflicted worlds in which you have a headache = (# worlds with flu and headache) / (# worlds with flu) = (area of the "H and F" region) / (area of the "F" region) = P(H ^ F) / P(F).

Definition of Conditional Probability. P(A|B) = P(A ^ B) / P(B). Corollary, the Chain Rule: P(A ^ B) = P(A|B) P(B).

Probabilistic Inference. H = "Have a headache", F = "Coming down with flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches so I must have a 50-50 chance of coming down with flu." Is this reasoning good?

Probabilistic Inference. With P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2: P(F ^ H) = P(H|F) P(F) = 1/80, and P(F|H) = P(F ^ H) / P(H) = (1/80) / (1/10) = 1/8, so the "50-50" reasoning was not good.

What we just did. P(A|B) = P(A ^ B) / P(B) = P(B|A) P(A) / P(B). This is Bayes Rule. Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
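The headache/flu arithmetic above is small enough to check directly. A minimal sketch in Python using the numbers quoted on the slide (the variable names are mine):

```python
# Headache/flu example: apply the chain rule and then Bayes rule.
p_h = 1 / 10         # P(H): probability of a headache
p_f = 1 / 40         # P(F): probability of flu
p_h_given_f = 1 / 2  # P(H|F): probability of a headache given flu

# Chain rule: P(F ^ H) = P(H|F) * P(F)
p_f_and_h = p_h_given_f * p_f

# Bayes rule: P(F|H) = P(F ^ H) / P(H)
p_f_given_h = p_f_and_h / p_h

print(p_f_and_h)    # 0.0125 (= 1/80)
print(p_f_given_h)  # 0.125  (= 1/8), far from the intuitive "50-50"
```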

Using Bayes Rule to Gamble. The "Win" envelope has a dollar and four beads in it ($1.00); the "Lose" envelope has three beads and no money. Trivial question: someone draws an envelope at random and offers to sell it to you. How much should you pay?

Using Bayes Rule to Gamble. Interesting question: before deciding, you are allowed to see one bead drawn from the envelope. Suppose it's black: how much should you pay? Suppose it's red: how much should you pay?

Calculation. The slide works the envelope question through with Bayes Rule.

More General Forms of Bayes Rule. P(A|B) = P(B|A) P(A) / P(B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|~A) P(~A)]. Conditioning on additional evidence X: P(A | B ^ X) = P(B | A ^ X) P(A | X) / P(B | X).

More General Forms of Bayes Rule (Review Session). For a multivalued A with arity nA: P(A = vi | B) = P(B | A = vi) P(A = vi) / sum_{k=1..nA} P(B | A = vk) P(A = vk).

Useful Easy-to-prove facts (Review Session). P(A|B) + P(~A|B) = 1, and sum_{k=1..nA} P(A = vk | B) = 1.
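A small sketch of the multivalued form above: given the likelihoods P(B | A = vk) and the priors P(A = vk), the denominator is just the sum of their products. The function name and the numbers in the call are illustrative, not from the slides:

```python
def bayes_posterior(likelihoods, priors):
    """Return P(A = v_i | B) for every i.

    likelihoods[i] = P(B | A = v_i), priors[i] = P(A = v_i).
    """
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(B ^ A = v_i)
    p_b = sum(joint)                                      # P(B), by summing over all values of A
    return [j / p_b for j in joint]

# Boolean special case: P(A|B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|~A)P(~A)), with made-up numbers.
print(bayes_posterior([0.5, 0.1], [0.025, 0.975]))  # roughly [0.11, 0.89]
```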

The Joint Distribution. Recipe for making a joint distribution of M variables. Example: Boolean variables A, B, C.

The Joint Distribution. Step 1: Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows). Example: the eight rows over A, B, C.

The Joint Distribution. Step 2: For each combination of values, say how probable it is, giving a Prob column alongside A, B, C.

The Joint Distribution. Step 3: If you subscribe to the axioms of probability, those numbers must sum to 1. The slide shows an example assignment of probabilities (values such as 0.05, 0.10 and 0.30) over the eight A, B, C rows, drawn as an area diagram.

Using the Joint. Once you have the JD you can ask for the probability of any logical expression involving your attributes: P(E) = sum over rows matching E of P(row).

Using the Joint. Example: P(Poor and Male) = 0.4654, computed as the sum of P(row) over the rows matching "Poor and Male".

Using the Joint. Example: P(Poor) = 0.7604, again the sum of P(row) over the matching rows.

Inference with the Joint. P(E1 | E2) = P(E1 ^ E2) / P(E2) = [sum over rows matching E1 and E2 of P(row)] / [sum over rows matching E2 of P(row)].
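A sketch of both query types, using a made-up two-attribute joint rather than the census numbers from the slides; the attribute names and probabilities are illustrative:

```python
# A made-up joint over two Boolean attributes (male, rich); the four probabilities sum to 1.
joint = {
    (True,  True):  0.10,
    (True,  False): 0.40,
    (False, True):  0.15,
    (False, False): 0.35,
}

def prob(event):
    """P(E): sum of P(row) over the rows matching the predicate `event`."""
    return sum(p for row, p in joint.items() if event(row))

def cond_prob(e1, e2):
    """P(E1 | E2) = P(E1 ^ E2) / P(E2)."""
    return prob(lambda r: e1(r) and e2(r)) / prob(e2)

is_poor = lambda r: not r[1]
is_male = lambda r: r[0]

print(prob(is_poor))                # P(Poor)        -> 0.75 here
print(cond_prob(is_male, is_poor))  # P(Male | Poor) -> 0.40 / 0.75
```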

Inference with the Joint. Example: P(Male | Poor) = 0.4654 / 0.7604 = 0.612.

Inference is a big deal. "I've got this evidence. What's the chance that this conclusion is true?" "I've got a sore neck: how likely am I to have meningitis?" "I see my lights are out and it's 9pm. What's the chance my spouse is already asleep?"

Inference is a big deal (continued). There's a thriving set of industries growing up based around Bayesian Inference. Highlights are: Medicine, Pharma, Help Desk Support, Engine Fault Diagnosis.

Where do Joint Distributions come from? Idea One: Expert Humans. Idea Two: Simpler probabilistic facts and some algebra. Example: suppose you knew P(A) = 0.7, P(B|A) = 0.2, P(B|~A) = 0.1, P(C|A^B) = 0.1, P(C|A^~B) = 0.8, P(C|~A^B) = 0.3, P(C|~A^~B) = 0.1. Then you can automatically compute the JD using the chain rule: P(A=x ^ B=y ^ C=z) = P(C=z | A=x ^ B=y) P(B=y | A=x) P(A=x). In another lecture: Bayes Nets, a systematic way to do this.

Where do Joint Distributions come from? Idea Three: Learn them from data! Prepare to see one of the most impressive learning algorithms you'll come across in the entire course.
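A minimal sketch of the chain-rule computation above, using the seven probabilities quoted in the example (the function and variable names are mine):

```python
from itertools import product

# The example probabilities quoted above.
p_a = 0.7
p_b_given_a = {True: 0.2, False: 0.1}                    # P(B | A), P(B | ~A)
p_c_given_ab = {(True, True): 0.1, (True, False): 0.8,
                (False, True): 0.3, (False, False): 0.1}  # P(C | A, B)

def joint_row(a, b, c):
    """P(A=a ^ B=b ^ C=c) = P(C=c | A=a, B=b) * P(B=b | A=a) * P(A=a)."""
    pa = p_a if a else 1.0 - p_a
    pb = p_b_given_a[a] if b else 1.0 - p_b_given_a[a]
    pc = p_c_given_ab[(a, b)] if c else 1.0 - p_c_given_ab[(a, b)]
    return pc * pb * pa

# The eight rows of the implied joint distribution sum to 1 (up to float rounding).
print(sum(joint_row(a, b, c) for a, b, c in product([True, False], repeat=3)))
```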

Learning a joint distribution. Build a JD table for your attributes in which the probabilities are unspecified, then fill in each row with P̂(row) = (# records matching row) / (total number of records). For example, one entry is the fraction of all records in which A and B are True but C is False.

Example of Learning a Joint. This joint was obtained by learning from three attributes in the UCI "Adult" Census Database [Kohavi 1995].
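The "most impressive learning algorithm" is just counting. A sketch, assuming each record is a tuple of attribute values (the tiny dataset here is made up for illustration):

```python
from collections import Counter

def learn_joint(records):
    """Learn a joint distribution table: P_hat(row) = (# records matching row) / (# records)."""
    counts = Counter(records)
    n = len(records)
    return {row: c / n for row, c in counts.items()}

# Tiny illustrative dataset over Boolean attributes (A, B, C).
data = [(1, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 1)]
jd = learn_joint(data)
print(jd[(1, 1, 0)])  # 2 of 6 records match this row -> about 0.333
```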

Where are we? We have recalled the fundamentals of probability. We have become content with what JDs are and how to use them. And we even know how to learn JDs from data.

Density Estimation. Our Joint Distribution learner is our first example of something called Density Estimation. A Density Estimator learns a mapping from a set of Input Attributes to a Probability.

Density Estimation. Compare it against the two other major kinds of models: a Classifier maps Input Attributes to a prediction of a categorical output; a Density Estimator maps Input Attributes to a Probability; a Regressor maps Input Attributes to a prediction of a real-valued output.

Evaluating Density Estimation. Test-set criterion for estimating performance on future data (see the Decision Tree or Cross Validation lecture for more detail). For a Classifier the criterion is test-set Accuracy, and likewise for a Regressor; for a Density Estimator, what should it be?

Evaluating a density estimator. Given a record x, a density estimator M can tell you how likely the record is: P̂(x|M). Given a dataset with R records, a density estimator can tell you how likely the dataset is (under the assumption that all records were independently generated from the Density Estimator's JD): P̂(dataset|M) = P̂(x1 ^ x2 ^ ... ^ xR | M) = product over k=1..R of P̂(xk|M).

A small dataset: Miles Per Gallon. 192 Training Set Records with attributes mpg, modelyear and maker; sample rows include "good 75to78 asia", "bad 70to74 america", "bad 75to78 europe", and so on. From the UCI repository (thanks to Ross Quinlan).

A small dataset: Miles Per Gallon. The same 192 training records, shown again.

A small dataset: Miles Per Gallon. For this training set, P̂(dataset|M) = product over k=1..R of P̂(xk|M) = (in this case) 3.4 x 10^-203.

Log Probabilities. Since probabilities of datasets get so small we usually use log probabilities: log P̂(dataset|M) = log product over k=1..R of P̂(xk|M) = sum over k=1..R of log P̂(xk|M).

A small dataset: Miles Per Gallon. For the 192 MPG training records, log P̂(dataset|M) = sum over k=1..R of log P̂(xk|M), which for the value above is log(3.4 x 10^-203), roughly -466 in natural logs.
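A sketch of the log-likelihood computation, with a tiny made-up density table; the function name is mine. A test row the estimator has never seen gets probability zero, which is exactly the overfitting problem discussed on the next slides:

```python
import math

def log_likelihood(records, density):
    """log P_hat(dataset | M) = sum over k of log P_hat(x_k | M)."""
    total = 0.0
    for x in records:
        p = density.get(x, 0.0)   # a row the estimator never saw gets probability 0 ...
        if p == 0.0:
            return float("-inf")  # ... which drives the whole log-likelihood to -infinity
        total += math.log(p)
    return total

# Illustrative joint over two Boolean attributes and two tiny "test sets".
density = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25}
print(log_likelihood([(0, 0), (0, 1)], density))  # about -2.08
print(log_likelihood([(1, 1)], density))          # -inf: a combination the model calls impossible
```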

Summary: The Good News. We have a way to learn a Density Estimator from data. Density estimators can do many good things: sort the records by probability, and thus spot weird records (anomaly detection); do inference, P(E1|E2), e.g. an Automatic Doctor or Help Desk; serve as an ingredient for Bayes Classifiers (see later).

Summary: The Bad News. Density estimation by directly learning the joint is trivial, mindless and dangerous.

Using a test set. An independent test set with 196 cars has a much worse log likelihood (actually it's a billion quintillion quintillion quintillion quintillion times less likely). Density estimators can overfit, and the full joint density estimator is the overfittiest of them all!

Overfitting Density Estimators. If this ever happens, it means there are certain combinations that we learn are impossible: log P̂(testset|M) = sum over k=1..R of log P̂(xk|M) = -infinity if P̂(xk|M) = 0 for any k.

Using a test set. The only reason that our test set didn't score -infinity is that my code is hard-wired to always predict a probability of at least one in 10^20. We need Density Estimators that are less prone to overfitting.

Naïve Density Estimation. The problem with the Joint Estimator is that it just mirrors the training data. We need something which generalizes more usefully. The naïve model generalizes strongly: assume that each attribute is distributed independently of any of the other attributes.
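A sketch of the "hard-wired floor" mentioned above: clamping every predicted probability to a tiny minimum keeps the test-set log-likelihood finite but does not fix the underlying overfitting. The 10^-20 figure follows the slide; the helper name and numbers are mine:

```python
import math

FLOOR = 1e-20  # minimum probability the estimator is allowed to predict

def floored_log_prob(density, x):
    """log of max(P_hat(x | M), FLOOR), so an unseen row costs log(1e-20) instead of -infinity."""
    return math.log(max(density.get(x, 0.0), FLOOR))

density = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25}
print(floored_log_prob(density, (1, 1)))  # about -46.05 rather than -inf
```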

Independently Distributed Data. Let x[i] denote the i-th field of record x. The independently distributed assumption says that for any i, v, u1, u2, ..., u_{i-1}, u_{i+1}, ..., u_M: P(x[i] = v | x[1] = u1, x[2] = u2, ..., x[i-1] = u_{i-1}, x[i+1] = u_{i+1}, ..., x[M] = u_M) = P(x[i] = v). Or in other words, x[i] is independent of {x[1], x[2], ..., x[i-1], x[i+1], ..., x[M]}. This is often written as x[i] ⊥ {x[1], x[2], ..., x[i-1], x[i+1], ..., x[M]}.

A note about independence. Assume A and B are Boolean Random Variables. Then A and B are independent if and only if P(A|B) = P(A). "A and B are independent" is often notated as A ⊥ B.

Independence Theorems (Review Session). Assume P(A|B) = P(A). Then P(A ^ B) = P(A) P(B). Assume P(A|B) = P(A). Then P(B|A) = P(B).

Independence Theorems (Review Session). Assume P(A|B) = P(A). Then P(~A|B) = P(~A). Assume P(A|B) = P(A). Then P(A|~B) = P(A).

Multivalued Independence. For multivalued Random Variables A and B, A ⊥ B if and only if: for all u, v, P(A = u | B = v) = P(A = u). From this you can then prove things like: for all u, v, P(A = u ^ B = v) = P(A = u) P(B = v), and for all u, v, P(B = v | A = u) = P(B = v).

Back to Naïve Density Estimation. Let x[i] denote the i-th field of record x. Naïve DE assumes x[i] is independent of {x[1], x[2], ..., x[i-1], x[i+1], ..., x[M]}. Example: suppose that each record is generated by randomly shaking a green dice and a red dice. Dataset 1: A = red value, B = green value. Dataset 2: A = red value, B = sum of values. Dataset 3: A = sum of values, B = difference of values. Which of these datasets violates the naïve assumption?

Using the Naïve Distribution. Once you have a Naïve Distribution you can easily compute any row of the joint distribution. Suppose A, B, C and D are independently distributed. What is P(A ^ ~B ^ C ^ ~D)?

Using the Naïve Distribution. P(A ^ ~B ^ C ^ ~D) = P(A | ~B ^ C ^ ~D) P(~B ^ C ^ ~D) = P(A) P(~B ^ C ^ ~D) = P(A) P(~B | C ^ ~D) P(C ^ ~D) = P(A) P(~B) P(C ^ ~D) = P(A) P(~B) P(C | ~D) P(~D) = P(A) P(~B) P(C) P(~D).

Naïve Distribution General Case. Suppose x[1], x[2], ..., x[M] are independently distributed. Then P(x[1] = u1, x[2] = u2, ..., x[M] = uM) = product over k=1..M of P(x[k] = uk). So if we have a Naïve Distribution we can construct any row of the implied Joint Distribution on demand, and thus we can do any inference. But how do we learn a Naïve Density Estimator?

Learning a Naïve Density Estimator. P̂(x[i] = u) = (# records in which x[i] = u) / (total number of records). Another trivial learning algorithm!
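A sketch of both steps: learn one marginal per attribute by counting, then multiply the marginals to get any row of the implied joint on demand. It assumes records are equal-length tuples; the function names and the tiny dataset are illustrative:

```python
from collections import Counter

def learn_naive(records):
    """P_hat(x[i] = u) = (# records with x[i] = u) / (# records), one table per attribute."""
    n = len(records)
    m = len(records[0])
    return [{u: c / n for u, c in Counter(r[i] for r in records).items()} for i in range(m)]

def naive_joint_row(marginals, row):
    """P(x[1]=u1 ^ ... ^ x[M]=uM) = product over k of P(x[k] = uk), under the naive assumption."""
    p = 1.0
    for table, u in zip(marginals, row):
        p *= table.get(u, 0.0)
    return p

data = [(1, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 1)]
marginals = learn_naive(data)
print(naive_joint_row(marginals, (1, 0, 1)))  # nonzero even though this exact row never occurred
```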

Joint DE vs. Naïve DE contrast. Joint DE: can model anything; no problem to model "C is a noisy copy of A"; but given 100 records and more than 6 Boolean attributes it will screw up badly. Naïve DE: can model only very boring distributions; "C is a noisy copy of A" is outside Naïve's scope; but given 100 records and 10,000 multivalued attributes it will be fine.

Empirical Results: "Hopeless" (Review Session). The "hopeless" dataset consists of 40,000 records and 21 Boolean attributes called a, b, c, ..., u. Each attribute in each record is generated 50-50 randomly as 0 or 1. The chart shows average test set log probability during 10 folds of k-fold cross-validation (described in a future Andrew lecture). Despite the vast amount of data, "Joint" overfits hopelessly and does much worse.

Empirical Results: "Logical" (Review Session). The "logical" dataset consists of 40,000 records and 4 Boolean attributes called a, b, c, d where a, b, c are generated 50-50 randomly as 0 or 1, and D = A ^ ~C, except that in 10% of records it is flipped. Two figures compare the DE learned by "Joint" with the DE learned by "Naive".

Empirical Results: "MPG" (Review Session). The "MPG" dataset consists of 392 records and 8 attributes. Figures show a tiny part of the DE learned by "Joint" and the DE learned by "Naive".

Empirical Results: "Weight vs. MPG" (Review Session). Suppose we train only from the "Weight" and "MPG" attributes. Figures show the DE learned by "Joint" and the DE learned by "Naive".

"Weight vs. MPG": the best that Naïve can do (Review Session). Figures again compare the DE learned by "Joint" with the DE learned by "Naive".

Reminder: The Good News. We now have two ways to learn a Density Estimator from data. (In other lectures we'll see vastly more impressive Density Estimators: Mixture Models, Bayesian Networks, Density Trees, Kernel Densities and many more.) Density estimators can do many good things: anomaly detection; inference, P(E1|E2), e.g. an Automatic Doctor or Help Desk; an ingredient for Bayes Classifiers.

Bayes Classifiers. A formidable and sworn enemy of decision trees. A Bayes Classifier (BC), like a Decision Tree (DT), maps Input Attributes to a prediction of a categorical output.

How to build a Bayes Classifier. Assume you want to predict an output Y which has arity nY and values v1, v2, ..., v_nY. Assume there are m input attributes called X1, X2, ..., Xm. Break the dataset into nY smaller datasets called DS1, DS2, ..., DS_nY, where DSi = the records in which Y = vi. For each DSi, learn a Density Estimator Mi to model the input distribution among the Y = vi records.

How to build a Bayes Classifier. As before: break the dataset into the DSi and learn a Density Estimator Mi for each. Mi estimates P(X1, X2, ..., Xm | Y = vi).

How to build a Bayes Classifier. Idea: when a new set of input values (X1 = u1, X2 = u2, ..., Xm = um) comes along to be evaluated, predict the value of Y that makes P(X1, ..., Xm | Y = vi) most likely: Ypredict = argmax_v P(X1 = u1, ..., Xm = um | Y = v). Is this a good idea?

How to build a Bayes Classifier. The rule Ypredict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) is a Maximum Likelihood classifier. It can get silly if some Ys are very unlikely.

How to build a Bayes Classifier. Much Better Idea: predict the value of Y that makes P(Y = vi | X1, ..., Xm) most likely: Ypredict = argmax_v P(Y = v | X1 = u1, ..., Xm = um). Is this a good idea?

Terminology. MLE (Maximum Likelihood Estimator): Ypredict = argmax_v P(X1 = u1, ..., Xm = um | Y = v). MAP (Maximum A-Posteriori Estimator): Ypredict = argmax_v P(Y = v | X1 = u1, ..., Xm = um).

Getting what we need. Ypredict = argmax_v P(Y = v | X1 = u1, ..., Xm = um).

Getting a posterior probability. P(Y = v | X1 = u1, ..., Xm = um) = P(X1 = u1, ..., Xm = um | Y = v) P(Y = v) / P(X1 = u1, ..., Xm = um) = P(X1 = u1, ..., Xm = um | Y = v) P(Y = v) / sum_{j=1..nY} P(X1 = u1, ..., Xm = um | Y = vj) P(Y = vj).

Bayes Classifiers in a nutshell. 1. Learn the distribution over inputs for each value of Y. 2. This gives P(X1, X2, ..., Xm | Y = vi). 3. Estimate P(Y = vi) as the fraction of records with Y = vi. 4. For a new prediction: Ypredict = argmax_v P(Y = v | X1 = u1, ..., Xm = um) = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v).

Bayes Classifiers in a nutshell. Steps 1-4 as above, with Ypredict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v). We can use our favorite Density Estimator in step 1; right now we have two options: the Joint Density Estimator and the Naïve Density Estimator.

Joint Density Bayes Classifier. Ypredict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v). In the case of the joint Bayes Classifier this degenerates to a very simple rule: Ypredict = the most common value of Y among records in which X1 = u1, X2 = u2, ..., Xm = um. Note that if no records have the exact set of inputs X1 = u1, ..., Xm = um, then P(X1, ..., Xm | Y = vi) = 0 for all values of Y. In that case we just have to guess Y's value.
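A sketch of that degenerate rule: among training records whose inputs exactly match the query, take the most common Y; if nothing matches, fall back to a guess (here the overall most common Y, which is one reasonable choice rather than something the slide prescribes). Names and data are illustrative:

```python
from collections import Counter

def joint_bc_predict(records, query):
    """records: list of (inputs, y) pairs. Predict the most common y among exact input matches."""
    matches = [y for x, y in records if x == query]
    if not matches:                        # no record has this exact input combination:
        matches = [y for _, y in records]  # we just have to guess; here, guess the overall mode
    return Counter(matches).most_common(1)[0][0]

train = [((1, 0), "good"), ((1, 0), "bad"), ((1, 0), "good"),
         ((0, 1), "bad"), ((0, 1), "bad")]
print(joint_bc_predict(train, (1, 0)))  # "good": 2 of the 3 matching records say good
print(joint_bc_predict(train, (1, 1)))  # unseen inputs, so fall back to the overall mode: "bad"
```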

Joint BC Results: "Logical". The "logical" dataset consists of 40,000 records and 4 Boolean attributes called a, b, c, d where a, b, c are generated 50-50 randomly as 0 or 1, and D = A ^ ~C, except that in 10% of records it is flipped. A figure shows the Classifier learned by "Joint BC".

Joint BC Results: "All Irrelevant". The "all irrelevant" dataset consists of 40,000 records and 15 Boolean attributes called a, b, c, d, ..., o where a, b, c are generated 50-50 randomly as 0 or 1. The output v is 1 with probability 0.75, 0 with probability 0.25.

Naïve Bayes Classifier. Ypredict = argmax_v P(Y = v) P(X1 = u1, ..., Xm = um | Y = v). In the case of the naive Bayes Classifier this can be simplified to: Ypredict = argmax_v P(Y = v) * product over j=1..m of P(Xj = uj | Y = v).

Naïve Bayes Classifier. Technical Hint: if you have 10,000 input attributes, that product will underflow in floating point math. You should use logs: Ypredict = argmax_v [ log P(Y = v) + sum over j=1..m of log P(Xj = uj | Y = v) ].
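A sketch of the naive Bayes prediction rule with the log-sum trick from the hint above, built on per-class attribute marginals learned by counting. There is no smoothing here, so zero counts are handled with a crude floor; the Dirichlet-prior fix mentioned later is a better answer. Names and data are illustrative:

```python
import math
from collections import Counter, defaultdict

def train_naive_bc(records):
    """records: list of (inputs, y). Learn P_hat(Y = v) and P_hat(X_j = u | Y = v) by counting."""
    by_class = defaultdict(list)
    for x, y in records:
        by_class[y].append(x)
    n = len(records)
    priors = {y: len(xs) / n for y, xs in by_class.items()}
    cond = {}
    for y, xs in by_class.items():
        m = len(xs[0])
        cond[y] = [{u: c / len(xs) for u, c in Counter(x[j] for x in xs).items()} for j in range(m)]
    return priors, cond

def predict(priors, cond, query):
    """argmax_v [ log P(Y = v) + sum_j log P(X_j = u_j | Y = v) ]."""
    best, best_score = None, float("-inf")
    for y, prior in priors.items():
        score = math.log(prior)
        for table, u in zip(cond[y], query):
            score += math.log(table.get(u, 1e-20))  # crude floor instead of log(0)
        if score > best_score:
            best, best_score = y, score
    return best

train = [((1, 0), "good"), ((1, 1), "good"), ((0, 1), "bad"), ((0, 0), "bad"), ((1, 0), "good")]
priors, cond = train_naive_bc(train)
print(predict(priors, cond, (1, 0)))  # "good"
```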

BC Results: "XOR". The "XOR" dataset consists of 40,000 records and 2 Boolean inputs called a and b, generated 50-50 randomly as 0 or 1, with output c = a XOR b. Figures compare the Classifier learned by "Joint BC" with the Classifier learned by "Naive BC".

Naive BC Results: "Logical" (Review Session). The "logical" dataset consists of 40,000 records and 4 Boolean attributes called a, b, c, d where a, b, c are generated 50-50 randomly as 0 or 1, and D = A ^ ~C, except that in 10% of records it is flipped. A figure shows the Classifier learned by "Naive BC".

Naive BC Results: "Logical" (Review Session). Same "logical" dataset as above. This result surprised Andrew until he had thought about it a little. A figure shows the Classifier learned by "Joint BC" for comparison.

Naïve BC Results: "All Irrelevant" (Review Session). The "all irrelevant" dataset consists of 40,000 records and 15 Boolean attributes called a, b, c, d, ..., o where a, b, c are generated 50-50 randomly as 0 or 1; the output v is 1 with probability 0.75, 0 with probability 0.25. A figure shows the Classifier learned by "Naive BC".

BC Results: "MPG", 392 records (Review Session). A figure shows the Classifier learned by "Naive BC".

BC Results: "MPG", 40 records (Review Session).

More Facts About Bayes Classifiers. Many other density estimators can be slotted in*. Density estimation can be performed with real-valued inputs*. Bayes Classifiers can be built with real-valued inputs*. Rather Technical Complaint: Bayes Classifiers don't try to be maximally discriminative; they merely try to honestly model what's going on*. Zero probabilities are painful for Joint and Naïve; a hack (justifiable with the magic words "Dirichlet Prior") can help*. Naïve Bayes is wonderfully cheap, and survives 10,000 attributes cheerfully! (*See future Andrew Lectures.)

What you should know. Probability: the fundamentals of Probability and Bayes Rule; what a Joint Distribution is; how to do inference (i.e. P(E1|E2)) once you have a JD. Density Estimation: what DE is and what it is good for; how to learn a Joint DE; how to learn a naïve DE.

What you should know. Bayes Classifiers: how to build one; how to predict with a BC; the contrast between naïve and joint BCs.

Interesting Questions. Suppose you were evaluating NaiveBC, JointBC, and Decision Trees. Invent a problem where only NaiveBC would do well. Invent a problem where only Dtree would do well. Invent a problem where only JointBC would do well. Invent a problem where only NaiveBC would do poorly. Invent a problem where only Dtree would do poorly. Invent a problem where only JointBC would do poorly.


More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

The Transactional Nature of Quantum Information

The Transactional Nature of Quantum Information The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.

More information

Basic Probability and Statistics

Basic Probability and Statistics Basic Probability and Statistics Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Jerry Zhu, Mark Craven] slide 1 Reasoning with Uncertainty

More information

Birthday Paradox Calculations and Approximation

Birthday Paradox Calculations and Approximation Birthday Paradox Calculations and Approxiation Joshua E. Hill InfoGard Laboratories -March- v. Birthday Proble In the birthday proble, we have a group of n randoly selected people. If we assue that birthdays

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

Dealing with Uncertainty. CS 331: Artificial Intelligence Probability I. Outline. Random Variables. Random Variables.

Dealing with Uncertainty. CS 331: Artificial Intelligence Probability I. Outline. Random Variables. Random Variables. Dealing with Uncertainty CS 331: Artificial Intelligence Probability I We want to get to the point where we can reason with uncertainty This will require using probability e.g. probability that it will

More information

Will Monroe August 9, with materials by Mehran Sahami and Chris Piech. image: Arito. Parameter learning

Will Monroe August 9, with materials by Mehran Sahami and Chris Piech. image: Arito. Parameter learning Will Monroe August 9, 07 with aterials by Mehran Sahai and Chris Piech iage: Arito Paraeter learning Announceent: Proble Set #6 Goes out tonight. Due the last day of class, Wednesday, August 6 (before

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

Midterm, Fall 2003

Midterm, Fall 2003 5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are

More information

Multilayer Neural Networks

Multilayer Neural Networks Multilayer Neural Networs Brain s. Coputer Designed to sole logic and arithetic probles Can sole a gazillion arithetic and logic probles in an hour absolute precision Usually one ery fast procesor high

More information

Standard & Canonical Forms

Standard & Canonical Forms Standard & Canonical Fors CHAPTER OBJECTIVES Learn Binary Logic and BOOLEAN AlgebraLearn How to ap a Boolean Expression into Logic Circuit Ipleentation Learn How To anipulate Boolean Expressions and Siplify

More information

Kinetic Molecular Theory of Ideal Gases

Kinetic Molecular Theory of Ideal Gases Lecture -3. Kinetic Molecular Theory of Ideal Gases Last Lecture. IGL is a purely epirical law - solely the consequence of experiental obserations Explains the behaior of gases oer a liited range of conditions.

More information

SF Chemical Kinetics.

SF Chemical Kinetics. SF Cheical Kinetics. Lecture 5. Microscopic theory of cheical reaction inetics. Microscopic theories of cheical reaction inetics. basic ai is to calculate the rate constant for a cheical reaction fro first

More information

Introduction to Optimization Techniques. Nonlinear Programming

Introduction to Optimization Techniques. Nonlinear Programming Introduction to Optiization echniques Nonlinear Prograing Optial Solutions Consider the optiization proble in f ( x) where F R n xf Definition : x F is optial (global iniu) for this proble, if f( x ) f(

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

EE5900 Spring Lecture 4 IC interconnect modeling methods Zhuo Feng

EE5900 Spring Lecture 4 IC interconnect modeling methods Zhuo Feng EE59 Spring Parallel LSI AD Algoriths Lecture I interconnect odeling ethods Zhuo Feng. Z. Feng MTU EE59 So far we ve considered only tie doain analyses We ll soon see that it is soeties preferable to odel

More information