Probabilistic and Bayesian Learning

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.

Ronald J. Williams, CSG220, Spring 2007
Containing many slides adapted from the Andrew Moore tutorial "Probabilistic and Bayesian Analytics"
Originals © 2001, Andrew W. Moore; modifications © 2003, Ronald J. Williams

Probability

The world is a very uncertain place. 30 years of Artificial Intelligence and Database research danced around this fact. And then a few AI researchers decided to use some ideas from the eighteenth century.

What we're going to do

We will review the fundamentals of probability. It's really going to be worth it. In this lecture, you'll see an example of probabilistic analysis in action: Bayes Classifiers.

Discrete Random Variables

E is a Boolean-valued random variable if E denotes an event, and there is some degree of uncertainty as to whether E occurs.

Examples:
E = The US president in 2023 will be male
E = You wake up tomorrow with a headache
E = You have Ebola
E = (Outlook = sunny) and (Wind = strong)

Probabilities

We write P(E) as the fraction of possible worlds in which E is true. We could at this point spend 2 hours on the philosophy of this. But we won't.

Visualizing E

[Figure: the event space of all possible worlds, drawn as a box of area 1. Worlds in which E is true form a circle inside the box; P(E) is the area of that circle. Worlds in which E is false lie outside it.]

The Axioms of Probability

0 <= P(E) <= 1
P(True) = 1
P(False) = 0
P(E1 or E2) = P(E1) + P(E2) - P(E1 and E2)

These Axioms are Not to be Trifled With

There have been attempts to do different methodologies for uncertainty: Fuzzy Logic, Three-valued logic, Dempster-Shafer, Non-monotonic reasoning. But the axioms of probability are the only system with this property: if you gamble using them, you can't be unfairly exploited by an opponent using some other system. [de Finetti 1931]

Theorems from the Axioms

Easy consequences of the axioms:

P(~E) = 1 - P(E)
P(E1) = P(E1 ^ E2) + P(E1 ^ ~E2)

Multivalued Random Variables

Suppose A can take on any of several values. A is a random variable with arity k if it can take on exactly one value out of {v1, v2, ..., vk}. Thus

P(A = vi ^ A = vj) = 0 if i ≠ j
P(A = v1 or A = v2 or ... or A = vk) = 1

Conditional Probability

P(E1 | E2) = the fraction of worlds in which E2 is true that also have E1 true.

Example: H = "Have a headache", F = "Coming down with flu".
P(H) = 1/10, P(F) = 1/40, P(H | F) = 1/2.
Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache.

P(H | F) = fraction of flu-inflicted worlds in which you have a headache
= (# worlds with flu and headache) / (# worlds with flu)
= (area of "H and F" region) / (area of F region)
= P(H ^ F) / P(F)

Definition of Conditional Probability

P(E1 | E2) = P(E1 ^ E2) / P(E2)

Corollary, the Chain Rule:

P(E1 ^ E2) = P(E1 | E2) P(E2)

Probabilistic Inference

H = "Have a headache", F = "Coming down with flu".
P(H) = 1/10, P(F) = 1/40, P(H | F) = 1/2.

One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with flu." Is this reasoning good?

Probabilistic Inference

H = "Have a headache", F = "Coming down with flu".
P(H) = 1/10, P(F) = 1/40, P(H | F) = 1/2.

P(F ^ H) = ?
P(F | H) = ?

P(F ^ H) = P(H ^ F) = P(H | F) P(F) = (1/2) × (1/40) = 1/80

P(F | H) = P(F ^ H) / P(H) = (1/80) / (1/10) = 1/8

So the reasoning was not good: a headache gives only a 1-in-8 chance of flu, not 50-50.

What we just did:

P(E2 | E1) = P(E1 ^ E2) / P(E1) = P(E1 | E2) P(E2) / P(E1)

This is Bayes Rule.

Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
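
To make the arithmetic concrete, here is a minimal Python sketch of this inference (Python and the name `bayes_rule` are ours, not part of the original slides):

```python
def bayes_rule(p_e1_given_e2, p_e2, p_e1):
    """P(E2 | E1) = P(E1 | E2) * P(E2) / P(E1)."""
    return p_e1_given_e2 * p_e2 / p_e1

p_h = 1 / 10          # P(Headache)
p_f = 1 / 40          # P(Flu)
p_h_given_f = 1 / 2   # P(Headache | Flu)

# P(Flu | Headache) = P(H | F) P(F) / P(H) = (1/80) / (1/10) = 1/8
print(bayes_rule(p_h_given_f, p_f, p_h))  # 0.125
```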

More General Forms of Bayes Rule

P(E | F) = P(F | E) P(E) / [ P(F | E) P(E) + P(F | ~E) P(~E) ]

P(E | F ^ G) = P(F | E ^ G) P(E | G) / P(F | G)

For a multivalued random variable A with arity nA:

P(A = vi | F) = P(F | A = vi) P(A = vi) / Σ_{k=1}^{nA} P(F | A = vk) P(A = vk)

Useful Easy-to-prove Facts

P(E | F) + P(~E | F) = 1

Σ_{k=1}^{nA} P(A = vk | F) = 1

The Joint Distribution

If A1, A2, ..., An are multivalued random variables, P(A1, A2, ..., An) means the function assigning to any values v1, v2, ..., vn the probability P(A1 = v1 ^ A2 = v2 ^ ... ^ An = vn).

Conditional Distributions

Suppose we have a joint distribution over the n + m multivalued random variables A1, A2, ..., An, B1, B2, ..., Bm.

Then P(A1, A2, ..., An | B1, B2, ..., Bm) means the function assigning to any values v1, v2, ..., vn, u1, u2, ..., um the conditional probability P(A1 = v1 ^ ... ^ An = vn | B1 = u1 ^ ... ^ Bm = um).

Bayesian Hypothesis Learning

D = training data
H = hypothesis (treated as a random variable)
P(H) = prior distribution over hypotheses; formalizes the inductive bias
P(H | D) = posterior distribution after seeing the training data

Then

P(H | D) = P(D | H) P(H) / P(D)

where P(D | H) is the likelihood of the data.

In P(H | D) = P(D | H) P(H) / P(D), the denominator P(D) is fixed for any given set of training data, so we can ignore it and treat it as a normalizing constant.

Given data d, we want a hypothesis h. Use

P(H = h | D = d) = P(D = d | H = h) P(H = h) / P(D = d)

Maximum a posteriori (MAP) hypothesis: the h maximizing P(H = h | D = d).
Maximum likelihood (ML) hypothesis: the h maximizing P(D = d | H = h).
If P(H) is uniform (a "flat" prior), they're the same.
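
As an illustration of the MAP/ML distinction, here is a hedged sketch using numpy and a toy setup not in the slides: hypotheses are candidate biases of a coin and the data are 7 heads in 10 flips.

```python
import numpy as np

hypotheses = np.linspace(0.0, 1.0, 101)   # candidate values of P(heads)
heads, flips = 7, 10                      # toy training data D

likelihood = hypotheses**heads * (1 - hypotheses)**(flips - heads)  # P(D | H=h)

prior = np.ones_like(hypotheses) / len(hypotheses)  # uniform ("flat") prior P(H)
posterior = likelihood * prior
posterior /= posterior.sum()              # P(D) enters only as a normalizer

h_ml = hypotheses[np.argmax(likelihood)]  # maximum likelihood hypothesis
h_map = hypotheses[np.argmax(posterior)]  # maximum a posteriori hypothesis
print(h_ml, h_map)                        # equal here, because the prior is flat
```

A non-flat prior (say, one peaked at 0.5) would pull the MAP hypothesis toward 0.5 while leaving the ML hypothesis at 0.7.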

Bayesian Hypothesis Learning

a priori distribution P(H): before seeing the data
a posteriori distribution P(H | D): after seeing the data

[Figure: sketch of a uniform prior P(H) and the posterior P(H | D); the peak of the posterior is the MAP hypothesis.]

The Joint Distribution

Recipe for making a joint distribution of M variables. Example: Boolean variables A, B, C.

The Joint Distribution

Recipe for making a joint distribution of M variables:

1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).
2. For each combination of values, say how probable it is.

Example for Boolean variables A, B, C:

A B C Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10

3. If you subscribe to the axioms of probability, those numbers must sum to 1.

[Figure: Venn diagram of the A, B, C regions labeled with the probabilities from the table above.]

Using the Joint

Once you have the JD you can ask for the probability of any logical expression involving your attributes:

P(E) = Σ_{rows matching E} P(row)
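
A joint this small can be held as a lookup table, and P(E) becomes a sum over matching rows. A minimal sketch (the helper name `prob` is ours; the numbers are the example table above):

```python
# Joint distribution over Boolean A, B, C: (a, b, c) -> probability.
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

def prob(event, joint):
    """P(E) = sum of P(row) over the rows matching the event predicate."""
    return sum(p for row, p in joint.items() if event(row))

# Example: P(A ^ ~B), summing the rows where A is true and B is false.
print(prob(lambda r: r[0] == 1 and r[1] == 0, joint))  # 0.05 + 0.10 = 0.15
```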

Using the Joint

Summing the matching rows of a joint distribution learned from census data (the example below):

P(Poor ^ Male) = 0.4654
P(Poor) = 0.7604

P(E) = Σ_{rows matching E} P(row)

Inference with the Joint

P(E1 | E2) = P(E1 ^ E2) / P(E2) = Σ_{rows matching E1 and E2} P(row) / Σ_{rows matching E2} P(row)

Example: P(Male | Poor) = 0.4654 / 0.7604 = 0.612
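
Conditioning is just two such sums. A self-contained sketch reusing the same example joint:

```python
def prob(event, joint):
    """P(E): sum of P(row) over rows matching the event predicate."""
    return sum(p for row, p in joint.items() if event(row))

def conditional(e1, e2, joint):
    """P(E1 | E2) = P(E1 ^ E2) / P(E2), both computed from the joint."""
    return prob(lambda r: e1(r) and e2(r), joint) / prob(e2, joint)

joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}
# In the spirit of P(Male | Poor): P(A | C) = 0.20 / 0.30
print(conditional(lambda r: r[0] == 1, lambda r: r[2] == 1, joint))  # 0.666...
```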

Inference is a big deal

I've got this evidence. What's the chance that this conclusion is true?
- I've got a sore neck: how likely am I to have meningitis?
- I see my lights are out and it's 9pm. What's the chance my spouse is already asleep?

There's a thriving set of industries growing up around Bayesian Inference. Highlights are: Medicine, Pharma, Help Desk Support, Engine Fault Diagnosis.

Where do Joint Distributions come from?

Idea One: Expert Humans.

Idea Two: Simpler probabilistic facts and some algebra. Example: suppose you knew

P(A) = 0.7
P(B | A) = 0.2
P(B | ~A) = 0.1
P(C | A ^ B) = 0.1
P(C | A ^ ~B) = 0.8
P(C | ~A ^ B) = 0.3
P(C | ~A ^ ~B) = 0.1

Then you can automatically compute the JD using the chain rule:

P(A = x ^ B = y ^ C = z) = P(C = z | A = x ^ B = y) P(B = y | A = x) P(A = x)

This is the essential idea behind inference in Bayesian networks.

Idea Three: Learn them from data! Prepare to see one of the most impressive learning algorithms you'll come across in the entire course.
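
The chain-rule computation is mechanical enough to write out. A sketch using exactly the seven facts above (the dictionary layout is ours):

```python
p_a = 0.7
p_b_given_a = {1: 0.2, 0: 0.1}               # P(B=1 | A=a)
p_c_given_ab = {(1, 1): 0.1, (1, 0): 0.8,
                (0, 1): 0.3, (0, 0): 0.1}    # P(C=1 | A=a, B=b)

# P(A=x ^ B=y ^ C=z) = P(C=z | A=x, B=y) P(B=y | A=x) P(A=x)
joint = {}
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            pa = p_a if a else 1 - p_a
            pb = p_b_given_a[a] if b else 1 - p_b_given_a[a]
            pc = p_c_given_ab[a, b] if c else 1 - p_c_given_ab[a, b]
            joint[a, b, c] = pc * pb * pa

assert abs(sum(joint.values()) - 1.0) < 1e-12  # the axioms still hold
```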

Learning a joint distribution

Build a JD table for your attributes in which the probabilities are unspecified, then fill in each row with

P̂(row) = (# records matching row) / (total number of records)

For example, the row for (A = 1, B = 1, C = 0) gets the fraction of all records in which A and B are true but C is false.

Example of Learning a Joint

This joint was obtained by learning from three attributes in the UCI Adult Census Database [Kohavi 1995].
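
The "learning" is just counting. A minimal sketch with made-up toy records; note how rows never seen in the data silently get probability 0, a weakness the slides return to below:

```python
from collections import Counter

def learn_joint(records):
    """P_hat(row) = (# records matching row) / (total number of records)."""
    counts = Counter(records)
    n = len(records)
    return {row: c / n for row, c in counts.items()}

records = [(1, 1, 0), (1, 1, 0), (0, 0, 0), (1, 0, 1)]  # toy data over (A, B, C)
print(learn_joint(records))
# {(1, 1, 0): 0.5, (0, 0, 0): 0.25, (1, 0, 1): 0.25} -- all other rows get 0
```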

Where are we?

We have recalled the fundamentals of probability. We have become content with what JDs are and how to use them. And we even know how to learn JDs from data.

Density Estimation

Our Joint Distribution learner is our first example of something called Density Estimation. A Density Estimator learns a mapping from a set of attributes to a probability:

Input attributes -> Density Estimator -> Probability

Density Estimation

Compare it against the two other major kinds of models:

Classifier: input attributes -> prediction of a categorical output
Regressor: input attributes -> prediction of a real-valued output
Density Estimator: input attributes -> probability

Summary: The Good News

We have a way to learn a Density Estimator from data. Density estimators can do many good things:
- Sort the records by probability, and thus spot weird records (anomaly detection)
- Do inference: P(E1 | E2). Automatic Doctor / Help Desk, etc.
- Serve as an ingredient for Bayes Classifiers (see later)

Summary: The Bad News

Density estimation by directly learning the joint is trivial and mindless, and requires an amount of training data exponential in the number of attributes. Fortunately there are alternatives.

PlayTennis Example

We want the joint P(O, T, H, W, PT), where
Outlook values are {sunny, overcast, rain}
Temperature values are {hot, mild, cool}
Humidity values are {high, normal}
Wind values are {weak, strong}
PlayTennis values are {yes, no}

PlayTennis Example: Directly Learning the Joint

Need a total of 3 × 3 × 2 × 2 × 2 = 72 probabilities (71 independent numbers, since they sum to 1). We have 14 training examples. Simple-minded estimation of the joint would assign probability 1/14 to each training example and probability 0 to the remaining 58 possible combinations.

Naïve Density Estimation

The problem with the Joint Estimator is that it just mirrors the training data. It has no possibility of generalizing reasonably to unseen data. The naïve model generalizes strongly: assume that each attribute is distributed independently of any of the other attributes.

Independent Events

Let E1 and E2 be events. Then E1 and E2 are independent if and only if

P(E1 | E2) = P(E1)

This means knowing that E2 is true has no effect on the probability that E1 is true. "E1 and E2 are independent" is often denoted E1 ⊥ E2.

Independence Theorems

Assume E1 and E2 are independent. Then:

P(E1 ^ E2) = P(E1) P(E2)
P(E2 | E1) = P(E2)
P(~E1 | E2) = P(~E1)
P(E1 | ~E2) = P(E1)

Multivalued Independence

For multivalued random variables A1, ..., An, B1, ..., Bm,

{A1, ..., An} ⊥ {B1, ..., Bm}

if and only if

P(A1 = v1, ..., An = vn | B1 = u1, ..., Bm = um) = P(A1 = v1, ..., An = vn)

for all values v1, ..., vn, u1, ..., um.

Definition: Mutual Independence

A set of random variables {A1, ..., An} is mutually independent if for every i,

Ai ⊥ {A1, ..., A_{i-1}, A_{i+1}, ..., An}

In this case, the joint satisfies

P(A1 = v1, ..., An = vn) = ∏_{i=1}^{n} P(Ai = vi)

Back to Naïve Density Estimation

Let x[i] denote the i-th field of record x. The Naïve DE assumes x[i] is independent of {x[1], x[2], ..., x[i-1], x[i+1], ..., x[M]}.

Example: suppose each record is generated by randomly rolling a green die and a red die.
Dataset 1: A = red value, B = green value
Dataset 2: A = red value, B = sum of values
Dataset 3: A = sum of values, B = difference of values
Which of these datasets violates the naïve assumption?

Using the Naïve Distribution

Once you have a Naïve Distribution you can easily compute any row of the joint distribution. Suppose A, B, C and D are mutually independently distributed. What is P(A ^ ~B ^ C ^ ~D)?

Using the Naïve Distribution

P(A ^ ~B ^ C ^ ~D)
= P(A | ~B ^ C ^ ~D) P(~B ^ C ^ ~D)
= P(A) P(~B ^ C ^ ~D)
= P(A) P(~B | C ^ ~D) P(C ^ ~D)
= P(A) P(~B) P(C ^ ~D)
= P(A) P(~B) P(C | ~D) P(~D)
= P(A) P(~B) P(C) P(~D)

Naïve Distribution General Case

Suppose x[1], x[2], ..., x[M] are independently distributed. Then

P(x[1] = u1, x[2] = u2, ..., x[M] = uM) = ∏_{k=1}^{M} P(x[k] = uk)

So if we have a Naïve Distribution we can construct any row of the implied Joint Distribution on demand, and hence do any inference. But how do we learn a Naïve Density Estimator?

Learning a Naïve Density Estimator

P̂(x[i] = u) = (# records in which x[i] = u) / (total number of records)

Another trivial learning algorithm!

Contrast

Direct Joint DE: can model anything, but given 100 records and more than 6 Boolean attributes it will screw up badly.
Naïve DE: can model only very boring distributions, but given 100 records and 10,000 multivalued attributes it will be fine.
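
Both the learning rule and the on-demand joint row are a few lines each. A sketch with toy records of our own:

```python
from collections import Counter

def learn_naive(records, num_fields):
    """Estimate each marginal P(x[i] = u) by counting, per the formula above."""
    n = len(records)
    return [{u: c / n for u, c in Counter(r[i] for r in records).items()}
            for i in range(num_fields)]

def naive_row_prob(marginals, row):
    """A row of the implied joint: the product of the per-field marginals."""
    p = 1.0
    for marginal, u in zip(marginals, row):
        p *= marginal.get(u, 0.0)
    return p

records = [(1, 1, 0), (1, 1, 0), (0, 0, 0), (1, 0, 1)]  # toy data over (A, B, C)
marginals = learn_naive(records, 3)
print(naive_row_prob(marginals, (1, 1, 1)))  # 0.75 * 0.5 * 0.25 = 0.09375,
                                             # nonzero even though never seen
```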

Reminder: The Good News

We now have two ways to learn a Density Estimator from data. There are many other vastly more impressive Density Estimators (Mixture Models, Bayesian Networks, Density Trees, Kernel Densities and many more). Density estimators can do many good things: anomaly detection, inference P(E1 | E2), Automatic Doctor / Help Desk, and serving as an ingredient for Bayes Classifiers.

Bayes Classifiers

Let Y be the class (a random variable) and X a random vector of input attributes. If we estimate the joint P(X, Y) from training data, then given a vector of values x we can classify x by selecting the value y maximizing P(Y = y | X = x). This is all there is to a Bayes classifier. Any way of estimating the joint gives rise to a corresponding Bayes classifier.

Bayes Classifiers: Ways of estimating the joint

1. Directly from data: gives rise to a useless classifier unless we have lots of data. Really just memorization of the data with no real generalization.
2. Make the naïve assumption for P(X, Y): no good either, because then P(Y | X) = P(Y), so the result does not depend on the input attributes.
3. Assume conditional independence of the attributes given the class. We'll examine this in a moment. This yields the naïve Bayes classifier.

How to build a Bayes Classifier

Assume you want to predict an output Y which has arity nY and values v1, v2, ..., v_nY. Assume there are m input attributes called X1, X2, ..., Xm. Break the dataset into nY smaller datasets called DS1, DS2, ..., DS_nY, where DSi = the records in which Y = vi. For each DSi, learn a Density Estimator Mi to model the input distribution among the Y = vi records.

Mi estimates P(X1, X2, ..., Xm | Y = vi).

How to use a Bayes Classifier

When a new set of input values (X1 = u1, X2 = u2, ..., Xm = um) comes along to be evaluated, predict the value of Y that makes P(Y = vi | X1 = u1, ..., Xm = um) largest:

Y_predict = argmax_v P(Y = v | X1 = u1, ..., Xm = um)
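
Putting the recipe into code: a sketch of the build step, with the naive learner restated so the block stands alone (any density estimator could be slotted in for `fit`). The matching prediction step is sketched after the log trick below.

```python
from collections import Counter

def fit_naive(records, num_fields):
    """Naive DE: per-field value frequencies, as sketched earlier."""
    n = len(records)
    return [{u: c / n for u, c in Counter(r[i] for r in records).items()}
            for i in range(num_fields)]

def build_bayes_classifier(records, labels, num_fields, fit=fit_naive):
    """Split the data by class value and fit one density estimator per class."""
    priors, models = {}, {}
    for v in set(labels):
        subset = [r for r, y in zip(records, labels) if y == v]  # DS_i
        priors[v] = len(subset) / len(records)                   # P(Y = v_i)
        models[v] = fit(subset, num_fields)                      # M_i
    return priors, models
```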

Bayes Classifier

P(Y = v | X1 = u1, ..., Xm = um)
= P(X1 = u1, ..., Xm = um | Y = v) P(Y = v) / P(X1 = u1, ..., Xm = um)
= P(X1 = u1, ..., Xm = um | Y = v) P(Y = v) / Σ_{j=1}^{nY} P(X1 = u1, ..., Xm = um | Y = vj) P(Y = vj)

Since the denominator is the same for every v:

Y_predict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v)

Bayes Classifiers in a nutshell

1. Learn the distribution over inputs for each value of Y.
2. This gives P(X1, X2, ..., Xm | Y = vi).
3. Estimate P(Y = vi) as the fraction of records with Y = vi.
4. For a new prediction: Y_predict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v)

How should we estimate these conditional densities?

Conditional Independence

Let E1, E2, and E3 be events. Then E1 and E2 are conditionally independent given E3 if and only if

P(E1 | E2 ^ E3) = P(E1 | E3)

This means that when E3 is known to be true, knowing that E2 is also true has no effect on the probability that E1 is true.

Naïve Bayes Classifier

General Bayes classifier:

Y_predict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v)

Make the naïve assumption that the attributes are mutually conditionally independent given the class. This leads to the following drastic simplification:

Y_predict = argmax_v P(Y = v) ∏_{j=1}^{m} P(Xj = uj | Y = v)

Naïve Bayes Classifier

Y_predict = argmax_v P(Y = v) ∏_{j=1}^{m} P(Xj = uj | Y = v)

Technical hint: if you have 10,000 input attributes, that product will underflow in floating point math. You should use logs:

Y_predict = argmax_v [ log P(Y = v) + Σ_{j=1}^{m} log P(Xj = uj | Y = v) ]

PlayTennis Example

We have the joint P(O, T, H, W, PT), where
Outlook values are {sunny, overcast, rain}
Temperature values are {hot, mild, cool}
Humidity values are {high, normal}
Wind values are {weak, strong}
PlayTennis values are {yes, no}
Total of 72 probabilities involved (71 free parameters).
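
A sketch of the log-space prediction rule, consuming the `priors`/`models` structures from the build sketch above; the `1e-12` floor for unseen values is a crude stand-in for the Dirichlet-prior fix mentioned later:

```python
import math

def naive_bayes_predict(x, priors, models):
    """argmax_v [ log P(Y=v) + sum_j log P(X_j = u_j | Y = v) ],
    where models[v][j][u] estimates P(X_j = u | Y = v)."""
    best_v, best_score = None, -math.inf
    for v, prior in priors.items():
        score = math.log(prior) + sum(
            math.log(models[v][j].get(u, 1e-12))  # floor for unseen values
            for j, u in enumerate(x))
        if score > best_score:
            best_v, best_score = v, score
    return best_v
```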

PlayTennis example: naïve Bayes

Just need 4 pairwise conditional densities:
P(Outlook | PlayTennis) [4 free parameters]
P(Temperature | PlayTennis) [4 free parameters]
P(Humidity | PlayTennis) [2 free parameters]
P(Wind | PlayTennis) [2 free parameters]
plus the prior P(PlayTennis) [1 free parameter]

Total of only 13 free parameters (22 probability values involved).

PlayTennis example: estimating the required conditional probabilities

For example:

P(O = sunny | PT = yes) = (# of data with O = sunny ^ PT = yes) / (# of data with PT = yes) = 2/9

In all, we need to determine 22 probability estimates.
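
The counting can be checked against the standard 14-example PlayTennis dataset (Mitchell, 1997), which these slides appear to draw on:

```python
# (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]

def cond_prob(field, value, pt_value):
    """P(X_field = value | PT = pt_value), estimated by counting."""
    in_class = [r for r in data if r[4] == pt_value]
    return sum(r[field] == value for r in in_class) / len(in_class)

print(cond_prob(0, "sunny", "yes"))  # 2/9 = 0.222..., matching the slide
```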

More Facts About Bayes Classifiers

- Many other density estimators can be slotted in.*
- Density estimation can be performed with real-valued inputs.*
- Bayes Classifiers can be built with real-valued inputs.*
- Rather technical complaint: Bayes Classifiers don't try to be maximally discriminative; they merely try to honestly model what's going on.*
- Zero probabilities are painful for Joint and Naïve estimators. A hack (justifiable with the magic words "Dirichlet Prior") can help.*
- Naïve Bayes is wonderfully cheap. And survives 10,000 attributes cheerfully!

*See future Andrew lectures.

Coming attraction: we'll learn all about Dirichlet priors and how they help us avoid zero probability estimates in the next set of slides.

What you should know

Probability:
- Fundamentals of Probability and Bayes Rule
- What a Joint Distribution is
- How to do inference (i.e., compute P(E1 | E2)) once you have a JD
- Bayesian Hypothesis Learning; MAP hypotheses

Density Estimation:
- What DE is and what it is good for
- How to learn a Joint DE
- How to learn a naïve DE

Bayes Classifiers:
- How to build one
- How to predict with a BC
- The contrast between naïve and joint BCs
