Probabilistic and Bayesian Learning

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.

Ronald J. Williams, CSG220, Spring 2007
Containing many slides adapted from the Andrew Moore tutorial "Probabilistic and Bayesian Analytics"
Originals © 2001, Andrew W. Moore; modifications © 2003, Ronald J. Williams

Probability

The world is a very uncertain place. 30 years of Artificial Intelligence and Database research danced around this fact. And then a few AI researchers decided to use some ideas from the eighteenth century.

What we're going to do

We will review the fundamentals of probability. It's really going to be worth it. In this lecture, you'll see an example of probabilistic analysis in action: Bayes Classifiers.

Discrete Random Variables

E is a Boolean-valued random variable if E denotes an event, and there is some degree of uncertainty as to whether E occurs.

Examples:
E = The US president in 2023 will be male
E = You wake up tomorrow with a headache
E = You have Ebola
E = (Outlook = sunny) and (Wind = strong)

Probabilities

We write P(E) as the fraction of possible worlds in which E is true. We could at this point spend 2 hours on the philosophy of this. But we won't.

Visualizing E

[Figure: the event space of all possible worlds, drawn as a box of area 1. Worlds in which E is true form a circle inside the box; P(E) is the area of that circle. Worlds in which E is false lie outside it.]

The Axioms of Probability

0 <= P(E) <= 1
P(True) = 1
P(False) = 0
P(E1 or E2) = P(E1) + P(E2) - P(E1 and E2)

These Axioms are Not to be Trifled With

There have been attempts to do different methodologies for uncertainty: Fuzzy Logic, Three-valued logic, Dempster-Shafer, Non-monotonic reasoning. But the axioms of probability are the only system with this property: if you gamble using them, you can't be unfairly exploited by an opponent using some other system. [de Finetti 1931]

Theorems from the Axioms

Easy consequences of the axioms:

P(~E) = 1 - P(E)
P(E1) = P(E1 ^ E2) + P(E1 ^ ~E2)

Multivalued Random Variables

Suppose A can take on any of several values. A is a random variable with arity k if it can take on exactly one value out of {v1, v2, ..., vk}. Thus

P(A = vi ^ A = vj) = 0 if i ≠ j
P(A = v1 or A = v2 or ... or A = vk) = 1

Conditional Probability

P(E1 | E2) = the fraction of worlds in which E2 is true that also have E1 true.

Example: H = "Have a headache", F = "Coming down with flu".
P(H) = 1/10, P(F) = 1/40, P(H | F) = 1/2.
Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache.

P(H | F) = fraction of flu-inflicted worlds in which you have a headache
= (# worlds with flu and headache) / (# worlds with flu)
= (area of "H and F" region) / (area of F region)
= P(H ^ F) / P(F)

Definition of Conditional Probability

P(E1 | E2) = P(E1 ^ E2) / P(E2)

Corollary, the Chain Rule:

P(E1 ^ E2) = P(E1 | E2) P(E2)

Probabilistic Inference

H = "Have a headache", F = "Coming down with flu".
P(H) = 1/10, P(F) = 1/40, P(H | F) = 1/2.

One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with flu." Is this reasoning good?

Probabilistic Inference

H = "Have a headache", F = "Coming down with flu".
P(H) = 1/10, P(F) = 1/40, P(H | F) = 1/2.

P(F ^ H) = ?
P(F | H) = ?

P(F ^ H) = P(H ^ F) = P(H | F) P(F) = (1/2) × (1/40) = 1/80

P(F | H) = P(F ^ H) / P(H) = (1/80) / (1/10) = 1/8

So the reasoning was not good: a headache gives only a 1-in-8 chance of flu, not 50-50.

What we just did:

P(E2 | E1) = P(E1 ^ E2) / P(E1) = P(E1 | E2) P(E2) / P(E1)

This is Bayes Rule.

Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
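
To make the arithmetic concrete, here is a minimal Python sketch of this inference (Python and the name `bayes_rule` are ours, not part of the original slides):

```python
def bayes_rule(p_e1_given_e2, p_e2, p_e1):
    """P(E2 | E1) = P(E1 | E2) * P(E2) / P(E1)."""
    return p_e1_given_e2 * p_e2 / p_e1

p_h = 1 / 10          # P(Headache)
p_f = 1 / 40          # P(Flu)
p_h_given_f = 1 / 2   # P(Headache | Flu)

# P(Flu | Headache) = P(H | F) P(F) / P(H) = (1/80) / (1/10) = 1/8
print(bayes_rule(p_h_given_f, p_f, p_h))  # 0.125
```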

More General Forms of Bayes Rule

P(E | F) = P(F | E) P(E) / [ P(F | E) P(E) + P(F | ~E) P(~E) ]

P(E | F ^ G) = P(F | E ^ G) P(E | G) / P(F | G)

For a multivalued random variable A with arity nA:

P(A = vi | F) = P(F | A = vi) P(A = vi) / Σ_{k=1}^{nA} P(F | A = vk) P(A = vk)

Useful Easy-to-prove Facts

P(E | F) + P(~E | F) = 1

Σ_{k=1}^{nA} P(A = vk | F) = 1

The Joint Distribution

If A1, A2, ..., An are multivalued random variables, P(A1, A2, ..., An) means the function assigning to any values v1, v2, ..., vn the probability P(A1 = v1 ^ A2 = v2 ^ ... ^ An = vn).

Conditional Distributions

Suppose we have a joint distribution over the n + m multivalued random variables A1, A2, ..., An, B1, B2, ..., Bm.

Then P(A1, A2, ..., An | B1, B2, ..., Bm) means the function assigning to any values v1, v2, ..., vn, u1, u2, ..., um the conditional probability P(A1 = v1 ^ ... ^ An = vn | B1 = u1 ^ ... ^ Bm = um).

Bayesian Hypothesis Learning

D = training data
H = hypothesis (treated as a random variable)
P(H) = prior distribution over hypotheses; formalizes the inductive bias
P(H | D) = posterior distribution after seeing the training data

Then

P(H | D) = P(D | H) P(H) / P(D)

where P(D | H) is the likelihood of the data.

In P(H | D) = P(D | H) P(H) / P(D), the denominator P(D) is fixed for any given set of training data, so we can ignore it and treat it as a normalizing constant.

Given data d, we want a hypothesis h. Use

P(H = h | D = d) = P(D = d | H = h) P(H = h) / P(D = d)

Maximum a posteriori (MAP) hypothesis: the h maximizing P(H = h | D = d).
Maximum likelihood (ML) hypothesis: the h maximizing P(D = d | H = h).
If P(H) is uniform (a "flat" prior), they're the same.
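
As an illustration of the MAP/ML distinction, here is a hedged sketch using numpy and a toy setup not in the slides: hypotheses are candidate biases of a coin and the data are 7 heads in 10 flips.

```python
import numpy as np

hypotheses = np.linspace(0.0, 1.0, 101)   # candidate values of P(heads)
heads, flips = 7, 10                      # toy training data D

likelihood = hypotheses**heads * (1 - hypotheses)**(flips - heads)  # P(D | H=h)

prior = np.ones_like(hypotheses) / len(hypotheses)  # uniform ("flat") prior P(H)
posterior = likelihood * prior
posterior /= posterior.sum()              # P(D) enters only as a normalizer

h_ml = hypotheses[np.argmax(likelihood)]  # maximum likelihood hypothesis
h_map = hypotheses[np.argmax(posterior)]  # maximum a posteriori hypothesis
print(h_ml, h_map)                        # equal here, because the prior is flat
```

A non-flat prior (say, one peaked at 0.5) would pull the MAP hypothesis toward 0.5 while leaving the ML hypothesis at 0.7.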

Bayesian Hypothesis Learning

a priori distribution P(H): before seeing the data
a posteriori distribution P(H | D): after seeing the data

[Figure: sketch of a uniform prior P(H) and the posterior P(H | D); the peak of the posterior is the MAP hypothesis.]

The Joint Distribution

Recipe for making a joint distribution of M variables. Example: Boolean variables A, B, C.

The Joint Distribution

Recipe for making a joint distribution of M variables:

1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).
2. For each combination of values, say how probable it is.

Example for Boolean variables A, B, C:

A B C Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10

3. If you subscribe to the axioms of probability, those numbers must sum to 1.

[Figure: Venn diagram of the A, B, C regions labeled with the probabilities from the table above.]

Using the Joint

Once you have the JD you can ask for the probability of any logical expression involving your attributes:

P(E) = Σ_{rows matching E} P(row)
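
A joint this small can be held as a lookup table, and P(E) becomes a sum over matching rows. A minimal sketch (the helper name `prob` is ours; the numbers are the example table above):

```python
# Joint distribution over Boolean A, B, C: (a, b, c) -> probability.
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

def prob(event, joint):
    """P(E) = sum of P(row) over the rows matching the event predicate."""
    return sum(p for row, p in joint.items() if event(row))

# Example: P(A ^ ~B), summing the rows where A is true and B is false.
print(prob(lambda r: r[0] == 1 and r[1] == 0, joint))  # 0.05 + 0.10 = 0.15
```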

Using the Joint

Summing the matching rows of a joint distribution learned from census data (the example below):

P(Poor ^ Male) = 0.4654
P(Poor) = 0.7604

P(E) = Σ_{rows matching E} P(row)

Inference with the Joint

P(E1 | E2) = P(E1 ^ E2) / P(E2) = Σ_{rows matching E1 and E2} P(row) / Σ_{rows matching E2} P(row)

Example: P(Male | Poor) = 0.4654 / 0.7604 = 0.612
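
Conditioning is just two such sums. A self-contained sketch reusing the same example joint:

```python
def prob(event, joint):
    """P(E): sum of P(row) over rows matching the event predicate."""
    return sum(p for row, p in joint.items() if event(row))

def conditional(e1, e2, joint):
    """P(E1 | E2) = P(E1 ^ E2) / P(E2), both computed from the joint."""
    return prob(lambda r: e1(r) and e2(r), joint) / prob(e2, joint)

joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}
# In the spirit of P(Male | Poor): P(A | C) = 0.20 / 0.30
print(conditional(lambda r: r[0] == 1, lambda r: r[2] == 1, joint))  # 0.666...
```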

Inference is a big deal

I've got this evidence. What's the chance that this conclusion is true?
- I've got a sore neck: how likely am I to have meningitis?
- I see my lights are out and it's 9pm. What's the chance my spouse is already asleep?

There's a thriving set of industries growing up around Bayesian Inference. Highlights are: Medicine, Pharma, Help Desk Support, Engine Fault Diagnosis.

Where do Joint Distributions come from?

Idea One: Expert Humans.

Idea Two: Simpler probabilistic facts and some algebra. Example: suppose you knew

P(A) = 0.7
P(B | A) = 0.2
P(B | ~A) = 0.1
P(C | A ^ B) = 0.1
P(C | A ^ ~B) = 0.8
P(C | ~A ^ B) = 0.3
P(C | ~A ^ ~B) = 0.1

Then you can automatically compute the JD using the chain rule:

P(A = x ^ B = y ^ C = z) = P(C = z | A = x ^ B = y) P(B = y | A = x) P(A = x)

This is the essential idea behind inference in Bayesian networks.

Idea Three: Learn them from data! Prepare to see one of the most impressive learning algorithms you'll come across in the entire course.
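
The chain-rule computation is mechanical enough to write out. A sketch using exactly the seven facts above (the dictionary layout is ours):

```python
p_a = 0.7
p_b_given_a = {1: 0.2, 0: 0.1}               # P(B=1 | A=a)
p_c_given_ab = {(1, 1): 0.1, (1, 0): 0.8,
                (0, 1): 0.3, (0, 0): 0.1}    # P(C=1 | A=a, B=b)

# P(A=x ^ B=y ^ C=z) = P(C=z | A=x, B=y) P(B=y | A=x) P(A=x)
joint = {}
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            pa = p_a if a else 1 - p_a
            pb = p_b_given_a[a] if b else 1 - p_b_given_a[a]
            pc = p_c_given_ab[a, b] if c else 1 - p_c_given_ab[a, b]
            joint[a, b, c] = pc * pb * pa

assert abs(sum(joint.values()) - 1.0) < 1e-12  # the axioms still hold
```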

Learning a joint distribution

Build a JD table for your attributes in which the probabilities are unspecified, then fill in each row with

P̂(row) = (# records matching row) / (total number of records)

For example, the row for (A = 1, B = 1, C = 0) gets the fraction of all records in which A and B are true but C is false.

Example of Learning a Joint

This joint was obtained by learning from three attributes in the UCI Adult Census Database [Kohavi 1995].
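
The "learning" is just counting. A minimal sketch with made-up toy records; note how rows never seen in the data silently get probability 0, a weakness the slides return to below:

```python
from collections import Counter

def learn_joint(records):
    """P_hat(row) = (# records matching row) / (total number of records)."""
    counts = Counter(records)
    n = len(records)
    return {row: c / n for row, c in counts.items()}

records = [(1, 1, 0), (1, 1, 0), (0, 0, 0), (1, 0, 1)]  # toy data over (A, B, C)
print(learn_joint(records))
# {(1, 1, 0): 0.5, (0, 0, 0): 0.25, (1, 0, 1): 0.25} -- all other rows get 0
```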

Where are we?

We have recalled the fundamentals of probability. We have become content with what JDs are and how to use them. And we even know how to learn JDs from data.

Density Estimation

Our Joint Distribution learner is our first example of something called Density Estimation. A Density Estimator learns a mapping from a set of attributes to a probability:

Input attributes -> Density Estimator -> Probability

Density Estimation

Compare it against the two other major kinds of models:

Classifier: input attributes -> prediction of a categorical output
Regressor: input attributes -> prediction of a real-valued output
Density Estimator: input attributes -> probability

Summary: The Good News

We have a way to learn a Density Estimator from data. Density estimators can do many good things:
- Sort the records by probability, and thus spot weird records (anomaly detection)
- Do inference: P(E1 | E2). Automatic Doctor / Help Desk, etc.
- Serve as an ingredient for Bayes Classifiers (see later)

Summary: The Bad News

Density estimation by directly learning the joint is trivial and mindless, and requires an amount of training data exponential in the number of attributes. Fortunately there are alternatives.

PlayTennis Example

We want the joint P(O, T, H, W, PT), where
Outlook values are {sunny, overcast, rain}
Temperature values are {hot, mild, cool}
Humidity values are {high, normal}
Wind values are {weak, strong}
PlayTennis values are {yes, no}

PlayTennis Example: Directly Learning the Joint

Need a total of 3 × 3 × 2 × 2 × 2 = 72 probabilities (71 independent numbers, since they sum to 1). We have 14 training examples. Simple-minded estimation of the joint would assign probability 1/14 to each training example and probability 0 to the remaining 58 possible combinations.

Naïve Density Estimation

The problem with the Joint Estimator is that it just mirrors the training data. It has no possibility of generalizing reasonably to unseen data. The naïve model generalizes strongly: assume that each attribute is distributed independently of any of the other attributes.

Independent Events

Let E1 and E2 be events. Then E1 and E2 are independent if and only if

P(E1 | E2) = P(E1)

This means knowing that E2 is true has no effect on the probability that E1 is true. "E1 and E2 are independent" is often denoted E1 ⊥ E2.

Independence Theorems

Assume E1 and E2 are independent. Then:

P(E1 ^ E2) = P(E1) P(E2)
P(E2 | E1) = P(E2)
P(~E1 | E2) = P(~E1)
P(E1 | ~E2) = P(E1)

Multivalued Independence

For multivalued random variables A1, ..., An, B1, ..., Bm,

{A1, ..., An} ⊥ {B1, ..., Bm}

if and only if

P(A1 = v1, ..., An = vn | B1 = u1, ..., Bm = um) = P(A1 = v1, ..., An = vn)

for all values v1, ..., vn, u1, ..., um.

Definition: Mutual Independence

A set of random variables {A1, ..., An} is mutually independent if for every i,

Ai ⊥ {A1, ..., A_{i-1}, A_{i+1}, ..., An}

In this case, the joint satisfies

P(A1 = v1, ..., An = vn) = ∏_{i=1}^{n} P(Ai = vi)

Back to Naïve Density Estimation

Let x[i] denote the i-th field of record x. The Naïve DE assumes x[i] is independent of {x[1], x[2], ..., x[i-1], x[i+1], ..., x[M]}.

Example: suppose each record is generated by randomly rolling a green die and a red die.
Dataset 1: A = red value, B = green value
Dataset 2: A = red value, B = sum of values
Dataset 3: A = sum of values, B = difference of values
Which of these datasets violates the naïve assumption?

Using the Naïve Distribution

Once you have a Naïve Distribution you can easily compute any row of the joint distribution. Suppose A, B, C and D are mutually independently distributed. What is P(A ^ ~B ^ C ^ ~D)?

Using the Naïve Distribution

P(A ^ ~B ^ C ^ ~D)
= P(A | ~B ^ C ^ ~D) P(~B ^ C ^ ~D)
= P(A) P(~B ^ C ^ ~D)
= P(A) P(~B | C ^ ~D) P(C ^ ~D)
= P(A) P(~B) P(C ^ ~D)
= P(A) P(~B) P(C | ~D) P(~D)
= P(A) P(~B) P(C) P(~D)

Naïve Distribution General Case

Suppose x[1], x[2], ..., x[M] are independently distributed. Then

P(x[1] = u1, x[2] = u2, ..., x[M] = uM) = ∏_{k=1}^{M} P(x[k] = uk)

So if we have a Naïve Distribution we can construct any row of the implied Joint Distribution on demand, and hence do any inference. But how do we learn a Naïve Density Estimator?

Learning a Naïve Density Estimator

P̂(x[i] = u) = (# records in which x[i] = u) / (total number of records)

Another trivial learning algorithm!

Contrast

Direct Joint DE: can model anything, but given 100 records and more than 6 Boolean attributes it will screw up badly.
Naïve DE: can model only very boring distributions, but given 100 records and 10,000 multivalued attributes it will be fine.
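
Both the learning rule and the on-demand joint row are a few lines each. A sketch with toy records of our own:

```python
from collections import Counter

def learn_naive(records, num_fields):
    """Estimate each marginal P(x[i] = u) by counting, per the formula above."""
    n = len(records)
    return [{u: c / n for u, c in Counter(r[i] for r in records).items()}
            for i in range(num_fields)]

def naive_row_prob(marginals, row):
    """A row of the implied joint: the product of the per-field marginals."""
    p = 1.0
    for marginal, u in zip(marginals, row):
        p *= marginal.get(u, 0.0)
    return p

records = [(1, 1, 0), (1, 1, 0), (0, 0, 0), (1, 0, 1)]  # toy data over (A, B, C)
marginals = learn_naive(records, 3)
print(naive_row_prob(marginals, (1, 1, 1)))  # 0.75 * 0.5 * 0.25 = 0.09375,
                                             # nonzero even though never seen
```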

Reminder: The Good News

We now have two ways to learn a Density Estimator from data. There are many other vastly more impressive Density Estimators (Mixture Models, Bayesian Networks, Density Trees, Kernel Densities and many more). Density estimators can do many good things: anomaly detection, inference P(E1 | E2), Automatic Doctor / Help Desk, and serving as an ingredient for Bayes Classifiers.

Bayes Classifiers

Let Y be the class (a random variable) and X a random vector of input attributes. If we estimate the joint P(X, Y) from training data, then given a vector of values x we can classify x by selecting the value y maximizing P(Y = y | X = x). This is all there is to a Bayes classifier. Any way of estimating the joint gives rise to a corresponding Bayes classifier.

Bayes Classifiers: Ways of estimating the joint

1. Directly from data: gives rise to a useless classifier unless we have lots of data. Really just memorization of the data with no real generalization.
2. Make the naïve assumption for P(X, Y): no good either, because then P(Y | X) = P(Y), so the result does not depend on the input attributes.
3. Assume conditional independence of the attributes given the class. We'll examine this in a moment. This yields the naïve Bayes classifier.

How to build a Bayes Classifier

Assume you want to predict an output Y which has arity nY and values v1, v2, ..., v_nY. Assume there are m input attributes called X1, X2, ..., Xm. Break the dataset into nY smaller datasets called DS1, DS2, ..., DS_nY, where DSi = the records in which Y = vi. For each DSi, learn a Density Estimator Mi to model the input distribution among the Y = vi records.

Mi estimates P(X1, X2, ..., Xm | Y = vi).

How to use a Bayes Classifier

When a new set of input values (X1 = u1, X2 = u2, ..., Xm = um) comes along to be evaluated, predict the value of Y that makes P(Y = vi | X1 = u1, ..., Xm = um) largest:

Y_predict = argmax_v P(Y = v | X1 = u1, ..., Xm = um)
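
Putting the recipe into code: a sketch of the build step, with the naive learner restated so the block stands alone (any density estimator could be slotted in for `fit`). The matching prediction step is sketched after the log trick below.

```python
from collections import Counter

def fit_naive(records, num_fields):
    """Naive DE: per-field value frequencies, as sketched earlier."""
    n = len(records)
    return [{u: c / n for u, c in Counter(r[i] for r in records).items()}
            for i in range(num_fields)]

def build_bayes_classifier(records, labels, num_fields, fit=fit_naive):
    """Split the data by class value and fit one density estimator per class."""
    priors, models = {}, {}
    for v in set(labels):
        subset = [r for r, y in zip(records, labels) if y == v]  # DS_i
        priors[v] = len(subset) / len(records)                   # P(Y = v_i)
        models[v] = fit(subset, num_fields)                      # M_i
    return priors, models
```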

Bayes Classifier

P(Y = v | X1 = u1, ..., Xm = um)
= P(X1 = u1, ..., Xm = um | Y = v) P(Y = v) / P(X1 = u1, ..., Xm = um)
= P(X1 = u1, ..., Xm = um | Y = v) P(Y = v) / Σ_{j=1}^{nY} P(X1 = u1, ..., Xm = um | Y = vj) P(Y = vj)

Since the denominator is the same for every v:

Y_predict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v)

Bayes Classifiers in a nutshell

1. Learn the distribution over inputs for each value of Y.
2. This gives P(X1, X2, ..., Xm | Y = vi).
3. Estimate P(Y = vi) as the fraction of records with Y = vi.
4. For a new prediction: Y_predict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v)

How should we estimate these conditional densities?

Conditional Independence

Let E1, E2, and E3 be events. Then E1 and E2 are conditionally independent given E3 if and only if

P(E1 | E2 ^ E3) = P(E1 | E3)

This means that when E3 is known to be true, knowing that E2 is also true has no effect on the probability that E1 is true.

Naïve Bayes Classifier

General Bayes classifier:

Y_predict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v)

Make the naïve assumption that the attributes are mutually conditionally independent given the class. This leads to the following drastic simplification:

Y_predict = argmax_v P(Y = v) ∏_{j=1}^{m} P(Xj = uj | Y = v)

Naïve Bayes Classifier

Y_predict = argmax_v P(Y = v) ∏_{j=1}^{m} P(Xj = uj | Y = v)

Technical hint: if you have 10,000 input attributes, that product will underflow in floating point math. You should use logs:

Y_predict = argmax_v [ log P(Y = v) + Σ_{j=1}^{m} log P(Xj = uj | Y = v) ]

PlayTennis Example

We have the joint P(O, T, H, W, PT), where
Outlook values are {sunny, overcast, rain}
Temperature values are {hot, mild, cool}
Humidity values are {high, normal}
Wind values are {weak, strong}
PlayTennis values are {yes, no}
Total of 72 probabilities involved (71 free parameters).
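
A sketch of the log-space prediction rule, consuming the `priors`/`models` structures from the build sketch above; the `1e-12` floor for unseen values is a crude stand-in for the Dirichlet-prior fix mentioned later:

```python
import math

def naive_bayes_predict(x, priors, models):
    """argmax_v [ log P(Y=v) + sum_j log P(X_j = u_j | Y = v) ],
    where models[v][j][u] estimates P(X_j = u | Y = v)."""
    best_v, best_score = None, -math.inf
    for v, prior in priors.items():
        score = math.log(prior) + sum(
            math.log(models[v][j].get(u, 1e-12))  # floor for unseen values
            for j, u in enumerate(x))
        if score > best_score:
            best_v, best_score = v, score
    return best_v
```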

PlayTennis example: naïve Bayes

Just need 4 pairwise conditional densities:
P(Outlook | PlayTennis) [4 free parameters]
P(Temperature | PlayTennis) [4 free parameters]
P(Humidity | PlayTennis) [2 free parameters]
P(Wind | PlayTennis) [2 free parameters]
plus the prior P(PlayTennis) [1 free parameter]

Total of only 13 free parameters (22 probability values involved).

PlayTennis example: estimating the required conditional probabilities

For example:

P(O = sunny | PT = yes) = (# of data with O = sunny ^ PT = yes) / (# of data with PT = yes) = 2/9

In all, we need to determine 22 probability estimates.
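
The counting can be checked against the standard 14-example PlayTennis dataset (Mitchell, 1997), which these slides appear to draw on:

```python
# (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]

def cond_prob(field, value, pt_value):
    """P(X_field = value | PT = pt_value), estimated by counting."""
    in_class = [r for r in data if r[4] == pt_value]
    return sum(r[field] == value for r in in_class) / len(in_class)

print(cond_prob(0, "sunny", "yes"))  # 2/9 = 0.222..., matching the slide
```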

More Facts About Bayes Classifiers

- Many other density estimators can be slotted in.*
- Density estimation can be performed with real-valued inputs.*
- Bayes Classifiers can be built with real-valued inputs.*
- Rather technical complaint: Bayes Classifiers don't try to be maximally discriminative; they merely try to honestly model what's going on.*
- Zero probabilities are painful for Joint and Naïve estimators. A hack (justifiable with the magic words "Dirichlet Prior") can help.*
- Naïve Bayes is wonderfully cheap. And survives 10,000 attributes cheerfully!

*See future Andrew lectures.

Coming attraction: we'll learn all about Dirichlet priors and how they help us avoid zero probability estimates in the next set of slides.

What you should know

Probability:
- Fundamentals of Probability and Bayes Rule
- What a Joint Distribution is
- How to do inference (i.e., compute P(E1 | E2)) once you have a JD
- Bayesian Hypothesis Learning; MAP hypotheses

Density Estimation:
- What DE is and what it is good for
- How to learn a Joint DE
- How to learn a naïve DE

Bayes Classifiers:
- How to build one
- How to predict with a BC
- The contrast between naïve and joint BCs
