Probabilistic and Bayesian Analytics


This version of the PowerPoint presentation has been labeled to let you know which bits will be covered in the main class (Tue/Thu morning lectures) and which parts will be covered in the review sessions.

Probabilistic and Bayesian Analytics. Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received. Andrew W. Moore, Associate Professor, School of Computer Science, Carnegie Mellon University, awm@cs.cmu.edu. Copyright 2001, Andrew W. Moore.

Probability. The world is a very uncertain place. Thirty years of Artificial Intelligence and Database research danced around this fact. And then a few AI researchers decided to use some ideas from the eighteenth century.

What we're going to do. We will review the fundamentals of probability. It's really going to be worth it. In this lecture, you'll see an example of probabilistic analytics in action: Bayes Classifiers.

Discrete Random Variables (Review Session). A is a Boolean-valued random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs. Examples: A = The US president in 2023 will be male. A = You wake up tomorrow with a headache. A = You have Ebola.

Probabilities (Review Session). We write P(A) as the fraction of possible worlds in which A is true. We could at this point spend 2 hours on the philosophy of this. But we won't.

Visualizing A (Review Session). Event space of all possible worlds; its area is 1. Worlds in which A is true: P(A) = area of the reddish oval. The rest are worlds in which A is false.

The Axioms of Probability (Review Session). 0 <= P(A) <= 1. P(True) = 1. P(False) = 0. P(A or B) = P(A) + P(B) - P(A and B). Where do these axioms come from? Were they discovered? Answers coming up later.

Interpreting the axioms (Review Session). The area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.

Interpreting the axioms (Review Session). The area of A can't get any bigger than 1, and an area of 1 would mean all worlds would have A true.

Interpreting the axioms (Review Session). A diagram of two overlapping events A and B illustrates P(A or B) = P(A) + P(B) - P(A and B).

Interpreting the axioms (Review Session). P(A or B) = P(A) + P(B) - P(A and B): the region where both A and B hold would otherwise be counted twice, so the rule is simple addition and subtraction.

These Axioms are Not to be Trifled With (Review Session). There have been attempts to do different methodologies for uncertainty: Fuzzy Logic, Three-valued logic, Dempster-Shafer, Non-monotonic reasoning. But the axioms of probability are the only system with this property: if you gamble using them you can't be unfairly exploited by an opponent using some other system [di Finetti 1931].

Theorems from the Axioms. From 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, and P(A or B) = P(A) + P(B) - P(A and B) we can prove: P(not A) = P(~A) = 1 - P(A). How?

Side Note (Review Session). I am inflicting these proofs on you for two reasons: 1. These kinds of manipulations will need to be second nature to you if you use probabilistic analytics in depth. 2. Suffering is good for you.

Another important theorem. From the same axioms we can prove: P(A) = P(A ^ B) + P(A ^ ~B). How?

Multivalued Random Variables. Suppose A can take on more than 2 values. A is a random variable with arity k if it can take on exactly one value out of {v1, v2, ..., vk}. Thus P(A = vi and A = vj) = 0 if i != j, and P(A = v1 or A = v2 or ... or A = vk) = 1.

An easy fact about Multivalued Random Variables (Review Session). Using the axioms of probability, and assuming that A obeys P(A = vi and A = vj) = 0 if i != j and P(A = v1 or ... or A = vk) = 1, it's easy to prove that P(A = v1 or A = v2 or ... or A = vi) = sum_{j=1..i} P(A = vj).

An easy fact about Multivalued Random Variables (Review Session). And thus we can prove: sum_{j=1..k} P(A = vj) = 1.

Another fact about Multivalued Random Variables (Review Session). Using the axioms and the same two assumptions about A, it's easy to prove that P(B and [A = v1 or A = v2 or ... or A = vi]) = sum_{j=1..i} P(B and A = vj).

Another fact about Multivalued Random Variables (Review Session). And thus we can prove: P(B) = sum_{j=1..k} P(B and A = vj).

Elementary Probability in Pictures (Review Session). P(~A) + P(A) = 1.

Elementary Probability in Pictures (Review Session). P(B) = P(B ^ A) + P(B ^ ~A).

Elementary Probability in Pictures (Review Session). sum_{j=1..k} P(A = vj) = 1.

Elementary Probability in Pictures (Review Session). P(B) = sum_{j=1..k} P(B ^ A = vj).

Conditional Probability. P(A|B) = the fraction of worlds in which B is true that also have A true. H = "Have a headache", F = "Coming down with flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache.

Conditional Probability. P(H|F) = fraction of flu-inflicted worlds in which you have a headache = (# worlds with flu and headache) / (# worlds with flu) = (area of the "H and F" region) / (area of the "F" region) = P(H ^ F) / P(F).

Definition of Conditional Probability. P(A|B) = P(A ^ B) / P(B). Corollary, the Chain Rule: P(A ^ B) = P(A|B) P(B).

Probabilistic Inference. H = "Have a headache", F = "Coming down with flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches so I must have a 50-50 chance of coming down with flu." Is this reasoning good?

Probabilistic Inference. With P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2: P(F ^ H) = P(H|F) P(F) = 1/80, and P(F|H) = P(F ^ H) / P(H) = (1/80) / (1/10) = 1/8, so the "50-50" reasoning was not good.

What we just did. P(A|B) = P(A ^ B) / P(B) = P(B|A) P(A) / P(B). This is Bayes Rule. Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
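The headache/flu arithmetic above is small enough to check directly. A minimal sketch in Python using the numbers quoted on the slide (the variable names are mine):

```python
# Headache/flu example: apply the chain rule and then Bayes rule.
p_h = 1 / 10         # P(H): probability of a headache
p_f = 1 / 40         # P(F): probability of flu
p_h_given_f = 1 / 2  # P(H|F): probability of a headache given flu

# Chain rule: P(F ^ H) = P(H|F) * P(F)
p_f_and_h = p_h_given_f * p_f

# Bayes rule: P(F|H) = P(F ^ H) / P(H)
p_f_given_h = p_f_and_h / p_h

print(p_f_and_h)    # 0.0125 (= 1/80)
print(p_f_given_h)  # 0.125  (= 1/8), far from the intuitive "50-50"
```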

Using Bayes Rule to Gamble. The "Win" envelope has a dollar and four beads in it ($1.00); the "Lose" envelope has three beads and no money. Trivial question: someone draws an envelope at random and offers to sell it to you. How much should you pay?

Using Bayes Rule to Gamble. Interesting question: before deciding, you are allowed to see one bead drawn from the envelope. Suppose it's black: how much should you pay? Suppose it's red: how much should you pay?

Calculation. The slide works the envelope question through with Bayes Rule.

More General Forms of Bayes Rule. P(A|B) = P(B|A) P(A) / P(B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|~A) P(~A)]. Conditioning on additional evidence X: P(A | B ^ X) = P(B | A ^ X) P(A | X) / P(B | X).

More General Forms of Bayes Rule (Review Session). For a multivalued A with arity nA: P(A = vi | B) = P(B | A = vi) P(A = vi) / sum_{k=1..nA} P(B | A = vk) P(A = vk).

Useful Easy-to-prove facts (Review Session). P(A|B) + P(~A|B) = 1, and sum_{k=1..nA} P(A = vk | B) = 1.
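A small sketch of the multivalued form above: given the likelihoods P(B | A = vk) and the priors P(A = vk), the denominator is just the sum of their products. The function name and the numbers in the call are illustrative, not from the slides:

```python
def bayes_posterior(likelihoods, priors):
    """Return P(A = v_i | B) for every i.

    likelihoods[i] = P(B | A = v_i), priors[i] = P(A = v_i).
    """
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(B ^ A = v_i)
    p_b = sum(joint)                                      # P(B), by summing over all values of A
    return [j / p_b for j in joint]

# Boolean special case: P(A|B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|~A)P(~A)), with made-up numbers.
print(bayes_posterior([0.5, 0.1], [0.025, 0.975]))  # roughly [0.11, 0.89]
```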

The Joint Distribution. Recipe for making a joint distribution of M variables. Example: Boolean variables A, B, C.

The Joint Distribution. Step 1: Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows). Example: the eight rows over A, B, C.

The Joint Distribution. Step 2: For each combination of values, say how probable it is, giving a Prob column alongside A, B, C.

The Joint Distribution. Step 3: If you subscribe to the axioms of probability, those numbers must sum to 1. The slide shows an example assignment of probabilities (values such as 0.05, 0.10 and 0.30) over the eight A, B, C rows, drawn as an area diagram.

Using the Joint. Once you have the JD you can ask for the probability of any logical expression involving your attributes: P(E) = sum over rows matching E of P(row).

Using the Joint. Example: P(Poor and Male) = 0.4654, computed as the sum of P(row) over the rows matching "Poor and Male".

Using the Joint. Example: P(Poor) = 0.7604, again the sum of P(row) over the matching rows.

Inference with the Joint. P(E1 | E2) = P(E1 ^ E2) / P(E2) = [sum over rows matching E1 and E2 of P(row)] / [sum over rows matching E2 of P(row)].
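A sketch of both query types, using a made-up two-attribute joint rather than the census numbers from the slides; the attribute names and probabilities are illustrative:

```python
# A made-up joint over two Boolean attributes (male, rich); the four probabilities sum to 1.
joint = {
    (True,  True):  0.10,
    (True,  False): 0.40,
    (False, True):  0.15,
    (False, False): 0.35,
}

def prob(event):
    """P(E): sum of P(row) over the rows matching the predicate `event`."""
    return sum(p for row, p in joint.items() if event(row))

def cond_prob(e1, e2):
    """P(E1 | E2) = P(E1 ^ E2) / P(E2)."""
    return prob(lambda r: e1(r) and e2(r)) / prob(e2)

is_poor = lambda r: not r[1]
is_male = lambda r: r[0]

print(prob(is_poor))                # P(Poor)        -> 0.75 here
print(cond_prob(is_male, is_poor))  # P(Male | Poor) -> 0.40 / 0.75
```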

Inference with the Joint. Example: P(Male | Poor) = 0.4654 / 0.7604 = 0.612.

Inference is a big deal. "I've got this evidence. What's the chance that this conclusion is true?" "I've got a sore neck: how likely am I to have meningitis?" "I see my lights are out and it's 9pm. What's the chance my spouse is already asleep?"

Inference is a big deal (continued). There's a thriving set of industries growing up based around Bayesian Inference. Highlights are: Medicine, Pharma, Help Desk Support, Engine Fault Diagnosis.

Where do Joint Distributions come from? Idea One: Expert Humans. Idea Two: Simpler probabilistic facts and some algebra. Example: suppose you knew P(A) = 0.7, P(B|A) = 0.2, P(B|~A) = 0.1, P(C|A^B) = 0.1, P(C|A^~B) = 0.8, P(C|~A^B) = 0.3, P(C|~A^~B) = 0.1. Then you can automatically compute the JD using the chain rule: P(A=x ^ B=y ^ C=z) = P(C=z | A=x ^ B=y) P(B=y | A=x) P(A=x). In another lecture: Bayes Nets, a systematic way to do this.

Where do Joint Distributions come from? Idea Three: Learn them from data! Prepare to see one of the most impressive learning algorithms you'll come across in the entire course.
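A minimal sketch of the chain-rule computation above, using the seven probabilities quoted in the example (the function and variable names are mine):

```python
from itertools import product

# The example probabilities quoted above.
p_a = 0.7
p_b_given_a = {True: 0.2, False: 0.1}                    # P(B | A), P(B | ~A)
p_c_given_ab = {(True, True): 0.1, (True, False): 0.8,
                (False, True): 0.3, (False, False): 0.1}  # P(C | A, B)

def joint_row(a, b, c):
    """P(A=a ^ B=b ^ C=c) = P(C=c | A=a, B=b) * P(B=b | A=a) * P(A=a)."""
    pa = p_a if a else 1.0 - p_a
    pb = p_b_given_a[a] if b else 1.0 - p_b_given_a[a]
    pc = p_c_given_ab[(a, b)] if c else 1.0 - p_c_given_ab[(a, b)]
    return pc * pb * pa

# The eight rows of the implied joint distribution sum to 1 (up to float rounding).
print(sum(joint_row(a, b, c) for a, b, c in product([True, False], repeat=3)))
```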

Learning a joint distribution. Build a JD table for your attributes in which the probabilities are unspecified, then fill in each row with P̂(row) = (# records matching row) / (total number of records). For example, one entry is the fraction of all records in which A and B are True but C is False.

Example of Learning a Joint. This joint was obtained by learning from three attributes in the UCI "Adult" Census Database [Kohavi 1995].
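The "most impressive learning algorithm" is just counting. A sketch, assuming each record is a tuple of attribute values (the tiny dataset here is made up for illustration):

```python
from collections import Counter

def learn_joint(records):
    """Learn a joint distribution table: P_hat(row) = (# records matching row) / (# records)."""
    counts = Counter(records)
    n = len(records)
    return {row: c / n for row, c in counts.items()}

# Tiny illustrative dataset over Boolean attributes (A, B, C).
data = [(1, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 1)]
jd = learn_joint(data)
print(jd[(1, 1, 0)])  # 2 of 6 records match this row -> about 0.333
```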

Where are we? We have recalled the fundamentals of probability. We have become content with what JDs are and how to use them. And we even know how to learn JDs from data.

Density Estimation. Our Joint Distribution learner is our first example of something called Density Estimation. A Density Estimator learns a mapping from a set of Input Attributes to a Probability.

Density Estimation. Compare it against the two other major kinds of models: a Classifier maps Input Attributes to a prediction of a categorical output; a Density Estimator maps Input Attributes to a Probability; a Regressor maps Input Attributes to a prediction of a real-valued output.

Evaluating Density Estimation. Test-set criterion for estimating performance on future data (see the Decision Tree or Cross Validation lecture for more detail). For a Classifier the criterion is test-set Accuracy, and likewise for a Regressor; for a Density Estimator, what should it be?

Evaluating a density estimator. Given a record x, a density estimator M can tell you how likely the record is: P̂(x|M). Given a dataset with R records, a density estimator can tell you how likely the dataset is (under the assumption that all records were independently generated from the Density Estimator's JD): P̂(dataset|M) = P̂(x1 ^ x2 ^ ... ^ xR | M) = product over k=1..R of P̂(xk|M).

A small dataset: Miles Per Gallon. 192 Training Set Records with attributes mpg, modelyear and maker; sample rows include "good 75to78 asia", "bad 70to74 america", "bad 75to78 europe", and so on. From the UCI repository (thanks to Ross Quinlan).

A small dataset: Miles Per Gallon. The same 192 training records, shown again.

A small dataset: Miles Per Gallon. For this training set, P̂(dataset|M) = product over k=1..R of P̂(xk|M) = (in this case) 3.4 x 10^-203.

Log Probabilities. Since probabilities of datasets get so small we usually use log probabilities: log P̂(dataset|M) = log product over k=1..R of P̂(xk|M) = sum over k=1..R of log P̂(xk|M).

A small dataset: Miles Per Gallon. For the 192 MPG training records, log P̂(dataset|M) = sum over k=1..R of log P̂(xk|M), which for the value above is log(3.4 x 10^-203), roughly -466 in natural logs.
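A sketch of the log-likelihood computation, with a tiny made-up density table; the function name is mine. A test row the estimator has never seen gets probability zero, which is exactly the overfitting problem discussed on the next slides:

```python
import math

def log_likelihood(records, density):
    """log P_hat(dataset | M) = sum over k of log P_hat(x_k | M)."""
    total = 0.0
    for x in records:
        p = density.get(x, 0.0)   # a row the estimator never saw gets probability 0 ...
        if p == 0.0:
            return float("-inf")  # ... which drives the whole log-likelihood to -infinity
        total += math.log(p)
    return total

# Illustrative joint over two Boolean attributes and two tiny "test sets".
density = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25}
print(log_likelihood([(0, 0), (0, 1)], density))  # about -2.08
print(log_likelihood([(1, 1)], density))          # -inf: a combination the model calls impossible
```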

Summary: The Good News. We have a way to learn a Density Estimator from data. Density estimators can do many good things: sort the records by probability, and thus spot weird records (anomaly detection); do inference, P(E1|E2), e.g. an Automatic Doctor or Help Desk; serve as an ingredient for Bayes Classifiers (see later).

Summary: The Bad News. Density estimation by directly learning the joint is trivial, mindless and dangerous.

Using a test set. An independent test set with 196 cars has a much worse log likelihood (actually it's a billion quintillion quintillion quintillion quintillion times less likely). Density estimators can overfit, and the full joint density estimator is the overfittiest of them all!

Overfitting Density Estimators. If this ever happens, it means there are certain combinations that we learn are impossible: log P̂(testset|M) = sum over k=1..R of log P̂(xk|M) = -infinity if P̂(xk|M) = 0 for any k.

Using a test set. The only reason that our test set didn't score -infinity is that my code is hard-wired to always predict a probability of at least one in 10^20. We need Density Estimators that are less prone to overfitting.

Naïve Density Estimation. The problem with the Joint Estimator is that it just mirrors the training data. We need something which generalizes more usefully. The naïve model generalizes strongly: assume that each attribute is distributed independently of any of the other attributes.
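A sketch of the "hard-wired floor" mentioned above: clamping every predicted probability to a tiny minimum keeps the test-set log-likelihood finite but does not fix the underlying overfitting. The 10^-20 figure follows the slide; the helper name and numbers are mine:

```python
import math

FLOOR = 1e-20  # minimum probability the estimator is allowed to predict

def floored_log_prob(density, x):
    """log of max(P_hat(x | M), FLOOR), so an unseen row costs log(1e-20) instead of -infinity."""
    return math.log(max(density.get(x, 0.0), FLOOR))

density = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25}
print(floored_log_prob(density, (1, 1)))  # about -46.05 rather than -inf
```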

Independently Distributed Data. Let x[i] denote the i-th field of record x. The independently distributed assumption says that for any i, v, u1, u2, ..., u_{i-1}, u_{i+1}, ..., u_M: P(x[i] = v | x[1] = u1, x[2] = u2, ..., x[i-1] = u_{i-1}, x[i+1] = u_{i+1}, ..., x[M] = u_M) = P(x[i] = v). Or in other words, x[i] is independent of {x[1], x[2], ..., x[i-1], x[i+1], ..., x[M]}. This is often written as x[i] ⊥ {x[1], x[2], ..., x[i-1], x[i+1], ..., x[M]}.

A note about independence. Assume A and B are Boolean Random Variables. Then A and B are independent if and only if P(A|B) = P(A). "A and B are independent" is often notated as A ⊥ B.

Independence Theorems (Review Session). Assume P(A|B) = P(A). Then P(A ^ B) = P(A) P(B). Assume P(A|B) = P(A). Then P(B|A) = P(B).

Independence Theorems (Review Session). Assume P(A|B) = P(A). Then P(~A|B) = P(~A). Assume P(A|B) = P(A). Then P(A|~B) = P(A).

Multivalued Independence. For multivalued Random Variables A and B, A ⊥ B if and only if: for all u, v, P(A = u | B = v) = P(A = u). From this you can then prove things like: for all u, v, P(A = u ^ B = v) = P(A = u) P(B = v), and for all u, v, P(B = v | A = u) = P(B = v).

Back to Naïve Density Estimation. Let x[i] denote the i-th field of record x. Naïve DE assumes x[i] is independent of {x[1], x[2], ..., x[i-1], x[i+1], ..., x[M]}. Example: suppose that each record is generated by randomly shaking a green dice and a red dice. Dataset 1: A = red value, B = green value. Dataset 2: A = red value, B = sum of values. Dataset 3: A = sum of values, B = difference of values. Which of these datasets violates the naïve assumption?

Using the Naïve Distribution. Once you have a Naïve Distribution you can easily compute any row of the joint distribution. Suppose A, B, C and D are independently distributed. What is P(A ^ ~B ^ C ^ ~D)?

Using the Naïve Distribution. P(A ^ ~B ^ C ^ ~D) = P(A | ~B ^ C ^ ~D) P(~B ^ C ^ ~D) = P(A) P(~B ^ C ^ ~D) = P(A) P(~B | C ^ ~D) P(C ^ ~D) = P(A) P(~B) P(C ^ ~D) = P(A) P(~B) P(C | ~D) P(~D) = P(A) P(~B) P(C) P(~D).

Naïve Distribution General Case. Suppose x[1], x[2], ..., x[M] are independently distributed. Then P(x[1] = u1, x[2] = u2, ..., x[M] = uM) = product over k=1..M of P(x[k] = uk). So if we have a Naïve Distribution we can construct any row of the implied Joint Distribution on demand, and thus we can do any inference. But how do we learn a Naïve Density Estimator?

Learning a Naïve Density Estimator. P̂(x[i] = u) = (# records in which x[i] = u) / (total number of records). Another trivial learning algorithm!
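A sketch of both steps: learn one marginal per attribute by counting, then multiply the marginals to get any row of the implied joint on demand. It assumes records are equal-length tuples; the function names and the tiny dataset are illustrative:

```python
from collections import Counter

def learn_naive(records):
    """P_hat(x[i] = u) = (# records with x[i] = u) / (# records), one table per attribute."""
    n = len(records)
    m = len(records[0])
    return [{u: c / n for u, c in Counter(r[i] for r in records).items()} for i in range(m)]

def naive_joint_row(marginals, row):
    """P(x[1]=u1 ^ ... ^ x[M]=uM) = product over k of P(x[k] = uk), under the naive assumption."""
    p = 1.0
    for table, u in zip(marginals, row):
        p *= table.get(u, 0.0)
    return p

data = [(1, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 1)]
marginals = learn_naive(data)
print(naive_joint_row(marginals, (1, 0, 1)))  # nonzero even though this exact row never occurred
```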

Joint DE vs. Naïve DE contrast. Joint DE: can model anything; no problem to model "C is a noisy copy of A"; but given 100 records and more than 6 Boolean attributes it will screw up badly. Naïve DE: can model only very boring distributions; "C is a noisy copy of A" is outside Naïve's scope; but given 100 records and 10,000 multivalued attributes it will be fine.

Empirical Results: "Hopeless" (Review Session). The "hopeless" dataset consists of 40,000 records and 21 Boolean attributes called a, b, c, ..., u. Each attribute in each record is generated 50-50 randomly as 0 or 1. The chart shows average test set log probability during 10 folds of k-fold cross-validation (described in a future Andrew lecture). Despite the vast amount of data, "Joint" overfits hopelessly and does much worse.

Empirical Results: "Logical" (Review Session). The "logical" dataset consists of 40,000 records and 4 Boolean attributes called a, b, c, d where a, b, c are generated 50-50 randomly as 0 or 1, and D = A ^ ~C, except that in 10% of records it is flipped. Two figures compare the DE learned by "Joint" with the DE learned by "Naive".

Empirical Results: "MPG" (Review Session). The "MPG" dataset consists of 392 records and 8 attributes. Figures show a tiny part of the DE learned by "Joint" and the DE learned by "Naive".

Empirical Results: "Weight vs. MPG" (Review Session). Suppose we train only from the "Weight" and "MPG" attributes. Figures show the DE learned by "Joint" and the DE learned by "Naive".

"Weight vs. MPG": the best that Naïve can do (Review Session). Figures again compare the DE learned by "Joint" with the DE learned by "Naive".

Reminder: The Good News. We now have two ways to learn a Density Estimator from data. (In other lectures we'll see vastly more impressive Density Estimators: Mixture Models, Bayesian Networks, Density Trees, Kernel Densities and many more.) Density estimators can do many good things: anomaly detection; inference, P(E1|E2), e.g. an Automatic Doctor or Help Desk; an ingredient for Bayes Classifiers.

Bayes Classifiers. A formidable and sworn enemy of decision trees. A Bayes Classifier (BC), like a Decision Tree (DT), maps Input Attributes to a prediction of a categorical output.

How to build a Bayes Classifier. Assume you want to predict an output Y which has arity nY and values v1, v2, ..., v_nY. Assume there are m input attributes called X1, X2, ..., Xm. Break the dataset into nY smaller datasets called DS1, DS2, ..., DS_nY, where DSi = the records in which Y = vi. For each DSi, learn a Density Estimator Mi to model the input distribution among the Y = vi records.

How to build a Bayes Classifier. As before: break the dataset into the DSi and learn a Density Estimator Mi for each. Mi estimates P(X1, X2, ..., Xm | Y = vi).

How to build a Bayes Classifier. Idea: when a new set of input values (X1 = u1, X2 = u2, ..., Xm = um) comes along to be evaluated, predict the value of Y that makes P(X1, ..., Xm | Y = vi) most likely: Ypredict = argmax_v P(X1 = u1, ..., Xm = um | Y = v). Is this a good idea?

How to build a Bayes Classifier. The rule Ypredict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) is a Maximum Likelihood classifier. It can get silly if some Ys are very unlikely.

How to build a Bayes Classifier. Much Better Idea: predict the value of Y that makes P(Y = vi | X1, ..., Xm) most likely: Ypredict = argmax_v P(Y = v | X1 = u1, ..., Xm = um). Is this a good idea?

Terminology. MLE (Maximum Likelihood Estimator): Ypredict = argmax_v P(X1 = u1, ..., Xm = um | Y = v). MAP (Maximum A-Posteriori Estimator): Ypredict = argmax_v P(Y = v | X1 = u1, ..., Xm = um).

Getting what we need. Ypredict = argmax_v P(Y = v | X1 = u1, ..., Xm = um).

Getting a posterior probability. P(Y = v | X1 = u1, ..., Xm = um) = P(X1 = u1, ..., Xm = um | Y = v) P(Y = v) / P(X1 = u1, ..., Xm = um) = P(X1 = u1, ..., Xm = um | Y = v) P(Y = v) / sum_{j=1..nY} P(X1 = u1, ..., Xm = um | Y = vj) P(Y = vj).

Bayes Classifiers in a nutshell. 1. Learn the distribution over inputs for each value of Y. 2. This gives P(X1, X2, ..., Xm | Y = vi). 3. Estimate P(Y = vi) as the fraction of records with Y = vi. 4. For a new prediction: Ypredict = argmax_v P(Y = v | X1 = u1, ..., Xm = um) = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v).

Bayes Classifiers in a nutshell. Steps 1-4 as above, with Ypredict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v). We can use our favorite Density Estimator in step 1; right now we have two options: the Joint Density Estimator and the Naïve Density Estimator.

Joint Density Bayes Classifier. Ypredict = argmax_v P(X1 = u1, ..., Xm = um | Y = v) P(Y = v). In the case of the joint Bayes Classifier this degenerates to a very simple rule: Ypredict = the most common value of Y among records in which X1 = u1, X2 = u2, ..., Xm = um. Note that if no records have the exact set of inputs X1 = u1, ..., Xm = um, then P(X1, ..., Xm | Y = vi) = 0 for all values of Y. In that case we just have to guess Y's value.
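A sketch of that degenerate rule: among training records whose inputs exactly match the query, take the most common Y; if nothing matches, fall back to a guess (here the overall most common Y, which is one reasonable choice rather than something the slide prescribes). Names and data are illustrative:

```python
from collections import Counter

def joint_bc_predict(records, query):
    """records: list of (inputs, y) pairs. Predict the most common y among exact input matches."""
    matches = [y for x, y in records if x == query]
    if not matches:                        # no record has this exact input combination:
        matches = [y for _, y in records]  # we just have to guess; here, guess the overall mode
    return Counter(matches).most_common(1)[0][0]

train = [((1, 0), "good"), ((1, 0), "bad"), ((1, 0), "good"),
         ((0, 1), "bad"), ((0, 1), "bad")]
print(joint_bc_predict(train, (1, 0)))  # "good": 2 of the 3 matching records say good
print(joint_bc_predict(train, (1, 1)))  # unseen inputs, so fall back to the overall mode: "bad"
```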

Joint BC Results: "Logical". The "logical" dataset consists of 40,000 records and 4 Boolean attributes called a, b, c, d where a, b, c are generated 50-50 randomly as 0 or 1, and D = A ^ ~C, except that in 10% of records it is flipped. A figure shows the Classifier learned by "Joint BC".

Joint BC Results: "All Irrelevant". The "all irrelevant" dataset consists of 40,000 records and 15 Boolean attributes called a, b, c, d, ..., o where a, b, c are generated 50-50 randomly as 0 or 1. The output v is 1 with probability 0.75, 0 with probability 0.25.

Naïve Bayes Classifier. Ypredict = argmax_v P(Y = v) P(X1 = u1, ..., Xm = um | Y = v). In the case of the naive Bayes Classifier this can be simplified to: Ypredict = argmax_v P(Y = v) * product over j=1..m of P(Xj = uj | Y = v).

Naïve Bayes Classifier. Technical Hint: if you have 10,000 input attributes, that product will underflow in floating point math. You should use logs: Ypredict = argmax_v [ log P(Y = v) + sum over j=1..m of log P(Xj = uj | Y = v) ].
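A sketch of the naive Bayes prediction rule with the log-sum trick from the hint above, built on per-class attribute marginals learned by counting. There is no smoothing here, so zero counts are handled with a crude floor; the Dirichlet-prior fix mentioned later is a better answer. Names and data are illustrative:

```python
import math
from collections import Counter, defaultdict

def train_naive_bc(records):
    """records: list of (inputs, y). Learn P_hat(Y = v) and P_hat(X_j = u | Y = v) by counting."""
    by_class = defaultdict(list)
    for x, y in records:
        by_class[y].append(x)
    n = len(records)
    priors = {y: len(xs) / n for y, xs in by_class.items()}
    cond = {}
    for y, xs in by_class.items():
        m = len(xs[0])
        cond[y] = [{u: c / len(xs) for u, c in Counter(x[j] for x in xs).items()} for j in range(m)]
    return priors, cond

def predict(priors, cond, query):
    """argmax_v [ log P(Y = v) + sum_j log P(X_j = u_j | Y = v) ]."""
    best, best_score = None, float("-inf")
    for y, prior in priors.items():
        score = math.log(prior)
        for table, u in zip(cond[y], query):
            score += math.log(table.get(u, 1e-20))  # crude floor instead of log(0)
        if score > best_score:
            best, best_score = y, score
    return best

train = [((1, 0), "good"), ((1, 1), "good"), ((0, 1), "bad"), ((0, 0), "bad"), ((1, 0), "good")]
priors, cond = train_naive_bc(train)
print(predict(priors, cond, (1, 0)))  # "good"
```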

BC Results: "XOR". The "XOR" dataset consists of 40,000 records and 2 Boolean inputs called a and b, generated 50-50 randomly as 0 or 1, with output c = a XOR b. Figures compare the Classifier learned by "Joint BC" with the Classifier learned by "Naive BC".

Naive BC Results: "Logical" (Review Session). The "logical" dataset consists of 40,000 records and 4 Boolean attributes called a, b, c, d where a, b, c are generated 50-50 randomly as 0 or 1, and D = A ^ ~C, except that in 10% of records it is flipped. A figure shows the Classifier learned by "Naive BC".

Naive BC Results: "Logical" (Review Session). Same "logical" dataset as above. This result surprised Andrew until he had thought about it a little. A figure shows the Classifier learned by "Joint BC" for comparison.

Naïve BC Results: "All Irrelevant" (Review Session). The "all irrelevant" dataset consists of 40,000 records and 15 Boolean attributes called a, b, c, d, ..., o where a, b, c are generated 50-50 randomly as 0 or 1; the output v is 1 with probability 0.75, 0 with probability 0.25. A figure shows the Classifier learned by "Naive BC".

BC Results: "MPG", 392 records (Review Session). A figure shows the Classifier learned by "Naive BC".

BC Results: "MPG", 40 records (Review Session).

More Facts About Bayes Classifiers. Many other density estimators can be slotted in*. Density estimation can be performed with real-valued inputs*. Bayes Classifiers can be built with real-valued inputs*. Rather Technical Complaint: Bayes Classifiers don't try to be maximally discriminative; they merely try to honestly model what's going on*. Zero probabilities are painful for Joint and Naïve; a hack (justifiable with the magic words "Dirichlet Prior") can help*. Naïve Bayes is wonderfully cheap, and survives 10,000 attributes cheerfully! (*See future Andrew Lectures.)

What you should know. Probability: the fundamentals of Probability and Bayes Rule; what a Joint Distribution is; how to do inference (i.e. P(E1|E2)) once you have a JD. Density Estimation: what DE is and what it is good for; how to learn a Joint DE; how to learn a naïve DE.

What you should know. Bayes Classifiers: how to build one; how to predict with a BC; the contrast between naïve and joint BCs.

Interesting Questions. Suppose you were evaluating NaiveBC, JointBC, and Decision Trees. Invent a problem where only NaiveBC would do well. Invent a problem where only Dtree would do well. Invent a problem where only JointBC would do well. Invent a problem where only NaiveBC would do poorly. Invent a problem where only Dtree would do poorly. Invent a problem where only JointBC would do poorly.


More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

The Transactional Nature of Quantum Information

The Transactional Nature of Quantum Information The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.

More information

Basic Probability and Statistics

Basic Probability and Statistics Basic Probability and Statistics Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Jerry Zhu, Mark Craven] slide 1 Reasoning with Uncertainty

More information

Birthday Paradox Calculations and Approximation

Birthday Paradox Calculations and Approximation Birthday Paradox Calculations and Approxiation Joshua E. Hill InfoGard Laboratories -March- v. Birthday Proble In the birthday proble, we have a group of n randoly selected people. If we assue that birthdays

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

Dealing with Uncertainty. CS 331: Artificial Intelligence Probability I. Outline. Random Variables. Random Variables.

Dealing with Uncertainty. CS 331: Artificial Intelligence Probability I. Outline. Random Variables. Random Variables. Dealing with Uncertainty CS 331: Artificial Intelligence Probability I We want to get to the point where we can reason with uncertainty This will require using probability e.g. probability that it will

More information

Will Monroe August 9, with materials by Mehran Sahami and Chris Piech. image: Arito. Parameter learning

Will Monroe August 9, with materials by Mehran Sahami and Chris Piech. image: Arito. Parameter learning Will Monroe August 9, 07 with aterials by Mehran Sahai and Chris Piech iage: Arito Paraeter learning Announceent: Proble Set #6 Goes out tonight. Due the last day of class, Wednesday, August 6 (before

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

Midterm, Fall 2003

Midterm, Fall 2003 5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are

More information

Multilayer Neural Networks

Multilayer Neural Networks Multilayer Neural Networs Brain s. Coputer Designed to sole logic and arithetic probles Can sole a gazillion arithetic and logic probles in an hour absolute precision Usually one ery fast procesor high

More information

Standard & Canonical Forms

Standard & Canonical Forms Standard & Canonical Fors CHAPTER OBJECTIVES Learn Binary Logic and BOOLEAN AlgebraLearn How to ap a Boolean Expression into Logic Circuit Ipleentation Learn How To anipulate Boolean Expressions and Siplify

More information

Kinetic Molecular Theory of Ideal Gases

Kinetic Molecular Theory of Ideal Gases Lecture -3. Kinetic Molecular Theory of Ideal Gases Last Lecture. IGL is a purely epirical law - solely the consequence of experiental obserations Explains the behaior of gases oer a liited range of conditions.

More information

SF Chemical Kinetics.

SF Chemical Kinetics. SF Cheical Kinetics. Lecture 5. Microscopic theory of cheical reaction inetics. Microscopic theories of cheical reaction inetics. basic ai is to calculate the rate constant for a cheical reaction fro first

More information

Introduction to Optimization Techniques. Nonlinear Programming

Introduction to Optimization Techniques. Nonlinear Programming Introduction to Optiization echniques Nonlinear Prograing Optial Solutions Consider the optiization proble in f ( x) where F R n xf Definition : x F is optial (global iniu) for this proble, if f( x ) f(

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

EE5900 Spring Lecture 4 IC interconnect modeling methods Zhuo Feng

EE5900 Spring Lecture 4 IC interconnect modeling methods Zhuo Feng EE59 Spring Parallel LSI AD Algoriths Lecture I interconnect odeling ethods Zhuo Feng. Z. Feng MTU EE59 So far we ve considered only tie doain analyses We ll soon see that it is soeties preferable to odel

More information