Probabilistic and Bayesian Analytics Based on a Tutorial by Andrew W. Moore, Carnegie Mellon University


1 Probabilistic and Bayesian Analytics. Based on a tutorial by Andrew W. Moore, Carnegie Mellon University.

2 Discrete Random Variables. A is a Boolean-valued random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs. Examples: A = The US president in 2023 will be male. A = You wake up tomorrow with a headache. A = You have Ebola.

3 Probabilities. We write P(A) as the fraction of possible worlds in which A is true. We could at this point spend 2 hours on the philosophy of this. But we won't.

4 Visualizing A. Picture: the event space of all possible worlds; its total area is 1. The worlds in which A is true form an oval whose area is P(A); the remaining worlds are those in which A is false.

5 The Axioms of Probability: 0 <= P(A) <= 1; P(True) = 1; P(False) = 0; P(A or B) = P(A) + P(B) - P(A and B). Where do these axioms come from? Were they discovered? Answers coming up later.

6 Interpreting the axioms (0 <= P(A) <= 1): the area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.

7 Interpreting the axioms (0 <= P(A) <= 1): the area of A can't get any bigger than 1, and an area of 1 would mean all worlds will have A true.

8 Interpreting the axioms (P(A or B) = P(A) + P(B) - P(A and B)). Picture: two overlapping events A and B in the event space.

9 Interpreting the axioms. P(A or B) = P(A) + P(B) - P(A and B): the area of "A or B" is the area of A plus the area of B, minus the doubly counted overlap "A and B". Simple addition and subtraction.

10 These Axioms are Not to be Trifled With. There have been attempts to do different methodologies for uncertainty: Fuzzy Logic, Three-valued logic, Dempster-Shafer, Non-monotonic reasoning. But the axioms of probability are the only system with this property: If you gamble using them you can't be unfairly exploited by an opponent using some other system. [de Finetti 1931]

11 Theorems from the Axioms. Using 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, and P(A or B) = P(A) + P(B) - P(A and B), we can prove: P(not A) = P(~A) = 1 - P(A). How?

12 Another important theorem. From the same axioms we can prove: P(A) = P(A ^ B) + P(A ^ ~B). How?

13 Multivalued Random Variables. Suppose A can take on more than 2 values. A is a random variable with arity k if it can take on exactly one value out of {v_1, v_2, ..., v_k}. Thus: P(A=v_i ^ A=v_j) = 0 if i ≠ j, and P(A=v_1 or A=v_2 or ... or A=v_k) = 1.

14 An easy fact about Multivalued Random Variables. Using the axioms of probability, and assuming that A obeys P(A=v_i ^ A=v_j) = 0 if i ≠ j and P(A=v_1 or ... or A=v_k) = 1, it's easy to prove that:
P(A=v_1 or A=v_2 or ... or A=v_i) = Σ_{j=1}^{i} P(A=v_j)

15 An easy fact about Multivalued Random Variables (continued). And thus we can prove:
Σ_{j=1}^{k} P(A=v_j) = 1

16 Another fact about Multivalued Random Variables. Using the axioms and the same two assumptions about A, it's easy to prove that:
P(B ^ [A=v_1 or A=v_2 or ... or A=v_i]) = Σ_{j=1}^{i} P(B ^ A=v_j)

17 Another fact about Multivalued Random Variables (continued). And thus we can prove:
P(B) = Σ_{j=1}^{k} P(B ^ A=v_j)

18 Elementary Probability in Pictures. P(~A) + P(A) = 1

19 Elementary Probability in Pictures. P(B) = P(B ^ A) + P(B ^ ~A)

20 Elementary Probability in Pictures. Σ_{j=1}^{k} P(A=v_j) = 1

21 Elementary Probability in Pictures. P(B) = Σ_{j=1}^{k} P(B ^ A=v_j)

22 Definition of Conditional Probability: P(A|B) = P(A ^ B) / P(B). Corollary, The Chain Rule: P(A ^ B) = P(A|B) P(B).

23 Probabilistic Inference. H = "Have a headache". F = "Coming down with flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with flu." Is this reasoning good?

24 Bayes Rule. P(A ^ B) = P(A|B) P(B) = P(B|A) P(A), so:
P(B|A) = P(A|B) P(B) / P(A)
This is Bayes Rule. Bayes, Thomas (1763), "An essay towards solving a problem in the doctrine of chances", Philosophical Transactions of the Royal Society of London, 53:370-418.
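Applying Bayes Rule to the headache/flu example on the previous slide answers its question:
P(F|H) = P(H|F) P(F) / P(H) = (1/2 × 1/40) / (1/10) = 1/8
So the reasoning was not good: given a headache, the chance of flu is 1/8, not 50-50.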

25 More General Forms of Bayes Rule:
P(A|B) = P(B|A) P(A) / ( P(B|A) P(A) + P(B|~A) P(~A) )
P(A | B ^ X) = P(B | A ^ X) P(A | X) / P(B | X)

26 More General Forms of Bayes Rule:
P(A=v_i | B) = P(B | A=v_i) P(A=v_i) / Σ_{k=1}^{n_A} P(B | A=v_k) P(A=v_k)

27 Useful Easy-to-prove facts: P(A|B) + P(~A|B) = 1, and Σ_{k=1}^{n_A} P(A=v_k | B) = 1.

28 The Joint Distribution. Recipe for making a joint distribution of M variables. Example: Boolean variables A, B, C.

29 The Joint Distribution. Recipe for making a joint distribution of M variables: 1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows). Example: Boolean variables A, B, C give the eight rows 000 through 111.

30 The Joint Distribution (continued). 2. For each combination of values, say how probable it is (for the A, B, C example, values such as 0.30, 0.05, 0.10, ...).

31 The Joint Distribution (continued). 3. If you subscribe to the axioms of probability, those numbers must sum to 1.

A B C  Prob
0 0 0  0.30
0 0 1  0.05
0 1 0  0.10
0 1 1  0.05
1 0 0  0.05
1 0 1  0.10
1 1 0  0.25
1 1 1  0.10
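As an illustrative sketch (not part of the original slides), the three-step recipe fits in a few lines of Python; the variable name `joint` is mine and the probabilities are the example values above:

```python
from itertools import product

# Step 1: a truth table of all 2^M combinations (here M = 3 Booleans: A, B, C).
# Step 2: assign a probability to each combination (example values from the slide).
joint = dict(zip(product([0, 1], repeat=3),
                 [0.30, 0.05, 0.10, 0.05, 0.05, 0.10, 0.25, 0.10]))

# Step 3: the axioms require the eight numbers to sum to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-9
```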

32 Using the Joint. Once you have the JD you can ask for the probability of any logical expression involving your attributes:
P(E) = Σ_{rows matching E} P(row)

33 Using the Joint. P(Poor ^ Male) = 0.4654: sum P(row) over the rows matching "Poor and Male".

34 Using the Joint. P(Poor) = 0.7604: sum P(row) over the rows matching "Poor".

35 Inference with the Joint.
P(E1 | E2) = P(E1 ^ E2) / P(E2) = Σ_{rows matching E1 and E2} P(row) / Σ_{rows matching E2} P(row)

36 Inference with the Joint (continued). For example: P(Male | Poor) = 0.4654 / 0.7604 = 0.612.
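A minimal sketch of both queries, assuming a joint table like the `joint` dict built above (tuples of attribute values mapped to probabilities); the helper names are mine:

```python
def prob(joint, matches):
    """P(E): add up P(row) over the rows matching expression E."""
    return sum(p for row, p in joint.items() if matches(row))

def cond_prob(joint, e1, e2):
    """P(E1 | E2) = P(E1 ^ E2) / P(E2)."""
    return prob(joint, lambda r: e1(r) and e2(r)) / prob(joint, e2)

# Example with rows indexed as (a, b, c): P(A | B)
p_a_given_b = cond_prob(joint, lambda r: r[0] == 1, lambda r: r[1] == 1)
```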

37 Inference is a big deal. "I've got this evidence. What's the chance that this conclusion is true?" I've got a sore neck: how likely am I to have meningitis? I see my lights are out and it's 9pm. What's the chance my spouse is already asleep?

39 Inference is a big deal (continued). There's a thriving set of industries growing based around Bayesian Inference. Highlights are: Medicine, Pharma, Help Desk Support, Engine Fault Diagnosis.

40 Where do Joint Distributions come from? Idea One: Expert Humans. Idea Two: Simpler probabilistic facts and some algebra. Example: suppose you knew
P(A) = 0.7
P(B|A) = 0.2, P(B|~A) = 0.1
P(C|A^B) = 0.1, P(C|A^~B) = 0.8, P(C|~A^B) = 0.3, P(C|~A^~B) = 0.1
Then you can automatically compute the JD using the chain rule:
P(A=x ^ B=y ^ C=z) = P(C=z | A=x ^ B=y) P(B=y | A=x) P(A=x)
In another lecture: Bayes Nets, a systematic way to do this.
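A sketch of this chain-rule computation under the numbers above (the dictionary layout and names are my own):

```python
# Given facts (from the slide): P(A), P(B|A), P(B|~A), and P(C | each A,B combination).
p_a = 0.7
p_b_given_a = {True: 0.2, False: 0.1}
p_c_given_ab = {(True, True): 0.1, (True, False): 0.8,
                (False, True): 0.3, (False, False): 0.1}

def joint_row(a, b, c):
    """P(A=a ^ B=b ^ C=c) = P(C=c | A=a ^ B=b) * P(B=b | A=a) * P(A=a)."""
    pa = p_a if a else 1.0 - p_a
    pb = p_b_given_a[a] if b else 1.0 - p_b_given_a[a]
    pc = p_c_given_ab[(a, b)] if c else 1.0 - p_c_given_ab[(a, b)]
    return pc * pb * pa

# The eight rows of the implied JD sum to 1, as the axioms demand.
bools = (False, True)
assert abs(sum(joint_row(a, b, c)
               for a in bools for b in bools for c in bools) - 1.0) < 1e-9
```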

41 Where do Joint Distributions come from? Idea Three: Learn them from data! Prepare to see one of the most impressive learning algorithms you'll come across in the entire course.

42 Learning a joint distribution. Build a JD table for your attributes in which the probabilities are unspecified. Then fill in each row with
P^(row) = (# records matching row) / (total number of records)

A B C  Prob         A B C  Prob
0 0 0  ?            0 0 0  0.30
0 0 1  ?            0 0 1  0.05
0 1 0  ?            0 1 0  0.10
0 1 1  ?     ->     0 1 1  0.05
1 0 0  ?            1 0 0  0.05
1 0 1  ?            1 0 1  0.10
1 1 0  ?            1 1 0  0.25
1 1 1  ?            1 1 1  0.10

(For example, 0.25 is the fraction of all records in which A and B are True but C is False.)
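The learning step itself is one counting pass; a sketch, assuming records are tuples of attribute values (the function name is mine):

```python
from collections import Counter

def learn_joint(records):
    """P^(row) = (# records matching row) / (total number of records)."""
    counts = Counter(tuple(r) for r in records)
    total = len(records)
    return {row: c / total for row, c in counts.items()}
```

Note that any row that never occurs in the data simply gets probability zero, which is exactly the overfitting problem discussed later.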

43 Example of Learning a Joint. This Joint was obtained by learning from three attributes in the UCI "Adult" Census Database [Kohavi 1995].

44 Where are we? We have recalled the fundamentals of probability. We have become content with what JDs are and how to use them. And we even know how to learn JDs from data.

45 Density Estimation. Our Joint Distribution learner is our first example of something called Density Estimation. A Density Estimator learns a mapping from a set of attributes to a Probability: Input Attributes -> Density Estimator -> Probability.

46 Density Estimation. Compare it against the two other major kinds of models:
Classifier: Input Attributes -> Prediction of categorical output
Density Estimator: Input Attributes -> Probability
Regressor: Input Attributes -> Prediction of real-valued output

47 Evaluating Density Estimation. Test-set criterion for estimating performance on future data* (*see the Decision Tree or Cross Validation lecture for more detail):
Classifier: Input Attributes -> Prediction of categorical output; evaluated by test set accuracy.
Density Estimator: Input Attributes -> Probability; evaluated by... ?
Regressor: Input Attributes -> Prediction of real-valued output; evaluated by test set accuracy.

48 Evaluating a density estimator. Given a record x, a density estimator M can tell you how likely the record is: P^(x | M). Given a dataset with R records, a density estimator can tell you how likely the dataset is, under the assumption that all records were independently generated from the Density Estimator's JD:
P^(dataset | M) = P^(x_1 ^ x_2 ^ ... ^ x_R | M) = Π_{k=1}^{R} P^(x_k | M)

49 A small dataset: Miles Per Gallon. 192 Training Set Records, from the UCI repository (thanks to Ross Quinlan).

mpg   modelyear  maker
good  75to78     asia
bad   70to74     america
bad   75to78     europe
bad   70to74     america
bad   70to74     america
bad   70to74     asia
bad   70to74     asia
bad   75to78     america
:     :          :
bad   70to74     america
good  79to83     america
bad   75to78     america
good  79to83     america
bad   75to78     america
good  79to83     america
good  79to83     america
bad   70to74     america
good  75to78     europe
bad   75to78     europe

51 A small dataset: Miles Per Gallon. For the 192 training records above:
P^(dataset | M) = P^(x_1 ^ x_2 ^ ... ^ x_R | M) = Π_{k=1}^{R} P^(x_k | M) = 3.4 × 10^-203 in this case

52 Log Probabilities. Since probabilities of datasets get so small we usually use log probabilities:
log P^(dataset | M) = log Π_{k=1}^{R} P^(x_k | M) = Σ_{k=1}^{R} log P^(x_k | M)
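A sketch of this evaluation, assuming `estimator` is any callable that returns P^(x | M) for a record x (the names are mine):

```python
import math

def log_likelihood(records, estimator):
    """log P^(dataset | M) = sum over records of log P^(x_k | M).
    Returns -inf if any record gets probability zero."""
    total = 0.0
    for x in records:
        p = estimator(x)
        if p == 0:
            return float("-inf")
        total += math.log(p)
    return total
```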

53 A small dataset: Miles Per Gallon. For the 192 training records:
log P^(dataset | M) = log Π_{k=1}^{R} P^(x_k | M) = Σ_{k=1}^{R} log P^(x_k | M) ≈ -466.2 in this case (the natural log of 3.4 × 10^-203)

54 Summary: The Good News. We have a way to learn a Density Estimator from data. Density estimators can do many good things: Can sort the records by probability, and thus spot weird records (anomaly detection). Can do inference: P(E1|E2) (Automatic Doctor / Help Desk etc.). Ingredient for Bayes Classifiers (see later).

55 Summary: The Bad News. Density estimation by directly learning the joint is trivial, mindless and dangerous.

56 Using a test set. An independent test set with 196 cars has a worse log likelihood (actually it's a billion quintillion quintillion quintillion quintillion times less likely). Density estimators can overfit. And the full joint density estimator is the overfittiest of them all!

57 Overfitting Density Estimators. If this ever happens, it means there are certain combinations that we learn are impossible:
log P^(testset | M) = Σ_{k=1}^{R} log P^(x_k | M) = -∞ if P^(x_k | M) = 0 for any k

58 Using a test set. The only reason that our test set didn't score -infinity is that my code is hard-wired to always predict a probability of at least one in 10^20. We need Density Estimators that are less prone to overfitting.

59 Naïve Density Estimation. The problem with the Joint Estimator is that it just mirrors the training data. We need something which generalizes more usefully. The naïve model generalizes strongly: assume that each attribute is distributed independently of any of the other attributes.

60 Independently Distributed Data. Let x[i] denote the i-th field of record x. The independently distributed assumption says that for any i, v, u_1, ..., u_M:
P(x[i]=v | x[1]=u_1, x[2]=u_2, ..., x[i-1]=u_{i-1}, x[i+1]=u_{i+1}, ..., x[M]=u_M) = P(x[i]=v)
Or in other words, x[i] is independent of {x[1], x[2], ..., x[i-1], x[i+1], ..., x[M]}. This is often written as:
x[i] ⊥ {x[1], x[2], ..., x[i-1], x[i+1], ..., x[M]}

61 A note about independence. Assume A and B are Boolean Random Variables. Then A and B are independent if and only if P(A|B) = P(A). "A and B are independent" is often notated as A ⊥ B.

62 Independence Theorems. Assume P(A|B) = P(A). Then P(A ^ B) = P(A) P(B). Assume P(A|B) = P(A). Then P(B|A) = P(B).
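As a quick numeric sanity check (an illustrative sketch, not from the slides), the equivalent product form P(A ^ B) = P(A) P(B) can be tested directly on a joint table like the one built earlier; the function and parameter names are mine:

```python
def independent(joint, i, j, tol=1e-9):
    """Check the product form of independence, P(A ^ B) == P(A) * P(B),
    for Boolean fields i and j of a joint table (rows are 0/1 tuples)."""
    pa = sum(p for row, p in joint.items() if row[i])
    pb = sum(p for row, p in joint.items() if row[j])
    pab = sum(p for row, p in joint.items() if row[i] and row[j])
    return abs(pab - pa * pb) < tol
```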

63 Independence Theorems. Assume P(A|B) = P(A). Then P(~A|B) = P(~A). Assume P(A|B) = P(A). Then P(A|~B) = P(A).

64 Multivalued Independence. For multivalued Random Variables A and B, A ⊥ B if and only if: for all u, v, P(A=u | B=v) = P(A=u). From which you can then prove things like: for all u, v, P(A=u ^ B=v) = P(A=u) P(B=v), and for all u, v, P(B=v | A=u) = P(B=v).

65 Using the Naïve Distribution. Once you have a Naïve Distribution you can easily compute any row of the joint distribution. Suppose A, B, C and D are independently distributed. What is P(A ^ ~B ^ C ^ ~D)?

66 Using the Naïve Distribution (continued).
P(A ^ ~B ^ C ^ ~D)
= P(A | ~B ^ C ^ ~D) P(~B ^ C ^ ~D)
= P(A) P(~B ^ C ^ ~D)
= P(A) P(~B | C ^ ~D) P(C ^ ~D)
= P(A) P(~B) P(C ^ ~D)
= P(A) P(~B) P(C | ~D) P(~D)
= P(A) P(~B) P(C) P(~D)

67 Naïve Distribution General Case. Suppose x[1], x[2], ..., x[M] are independently distributed. Then:
P(x[1]=u_1, x[2]=u_2, ..., x[M]=u_M) = Π_{k=1}^{M} P(x[k]=u_k)
So if we have a Naïve Distribution we can construct any row of the implied Joint Distribution on demand. So we can do any inference. But how do we learn a Naïve Density Estimator?

68 Learning a Naïve Density Estimator.
P^(x[i]=u) = (# records in which x[i]=u) / (total number of records)
Another trivial learning algorithm!
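Both the learner and the row-probability computation of the previous slide fit in a few lines; a sketch, assuming records are tuples of discrete values (the names are mine):

```python
from collections import Counter

def learn_naive(records):
    """For each field i: P^(x[i]=u) = (# records with x[i]=u) / (total records)."""
    total, m = len(records), len(records[0])
    return [{u: c / total for u, c in Counter(r[i] for r in records).items()}
            for i in range(m)]

def naive_row_prob(marginals, row):
    """P^(x[1]=u_1 ^ ... ^ x[M]=u_M) = product of the per-field marginals."""
    p = 1.0
    for i, u in enumerate(row):
        p *= marginals[i].get(u, 0.0)
    return p
```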

69 Contrast.
Joint DE: Can model anything. No problem to model "C is a noisy copy of A". Given 100 records and more than 6 Boolean attributes, it will screw up badly.
Naïve DE: Can model only very boring distributions. "C is a noisy copy of A" is outside Naïve's scope. Given 100 records and 10,000 multivalued attributes, it will be fine.

70 Reminder: The Good News. We have two ways to learn a Density Estimator from data. (*In other lectures we'll see vastly more impressive Density Estimators: Mixture Models, Bayesian Networks, Density Trees, Kernel Densities and many more.) Density estimators can do many good things: anomaly detection; inference, P(E1|E2) (Automatic Doctor / Help Desk etc.); ingredient for Bayes Classifiers.

71 How to build a Bayes Classifier. Assume you want to predict output Y, which has arity n_Y and values v_1, v_2, ..., v_{n_Y}. Assume there are m input attributes called X_1, X_2, ..., X_m. Break the dataset into n_Y smaller datasets called DS_1, DS_2, ..., DS_{n_Y}, where DS_i = the records in which Y = v_i. For each DS_i, learn a Density Estimator M_i to model the input distribution among the Y = v_i records.

72 How to build a Bayes Classifier (continued). M_i estimates P(X_1, X_2, ..., X_m | Y = v_i).

73 How to build a Bayes Classifier (continued). Idea: when a new set of input values (X_1 = u_1, X_2 = u_2, ..., X_m = u_m) comes along to be evaluated, predict the value of Y that makes P(X_1, ..., X_m | Y = v) most likely:
Y^predict = argmax_v P(X_1=u_1, ..., X_m=u_m | Y=v)
Is this a good idea?

74 This is a Maximum Likelihood classifier. It can get silly if some values of Y are very unlikely.

75 Much Better Idea: predict the value of Y that makes P(Y = v | X_1, ..., X_m) most likely:
Y^predict = argmax_v P(Y=v | X_1=u_1, ..., X_m=u_m)
Is this a good idea?

76 Terminology.
MLE (Maximum Likelihood Estimator): Y^predict = argmax_v P(X_1=u_1, ..., X_m=u_m | Y=v)
MAP (Maximum A-Posteriori Estimator): Y^predict = argmax_v P(Y=v | X_1=u_1, ..., X_m=u_m)

77 Getting what we need: Y^predict = argmax_v P(Y=v | X_1=u_1, ..., X_m=u_m)

78 Getting a posterior probability.
P(Y=v | X_1=u_1, ..., X_m=u_m)
= P(X_1=u_1, ..., X_m=u_m | Y=v) P(Y=v) / P(X_1=u_1, ..., X_m=u_m)
= P(X_1=u_1, ..., X_m=u_m | Y=v) P(Y=v) / Σ_{j=1}^{n_Y} P(X_1=u_1, ..., X_m=u_m | Y=v_j) P(Y=v_j)

79 Bayes Classifiers in a nutshell.
1. Learn the distribution over inputs for each value v of Y.
2. This gives P(X_1, X_2, ..., X_m | Y=v_i).
3. Estimate P(Y=v_i) as the fraction of records with Y=v_i.
4. For a new prediction:
Y^predict = argmax_v P(Y=v | X_1=u_1, ..., X_m=u_m) = argmax_v P(X_1=u_1, ..., X_m=u_m | Y=v) P(Y=v)

80 Bayes Classifiers in a nutshell (continued). We can use our favorite Density Estimator for step 1. Right now we have two options: the Joint Density Estimator and the Naïve Density Estimator.
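A sketch of the step-4 decision rule, assuming the per-class priors and density estimators M_i have already been learned (e.g. with either learner above); the names are mine, and the argmax ignores the shared denominator P(X_1=u_1, ..., X_m=u_m) since it does not affect the winner:

```python
def bayes_classify(inputs, priors, estimators):
    """Y^predict = argmax_v P(X=inputs | Y=v) * P(Y=v).
    priors:     {v: P(Y=v)}, estimated as fractions of the training records.
    estimators: {v: callable returning P^(X=inputs | Y=v)} -- one density
                estimator M_i per class, joint or naive."""
    return max(priors, key=lambda v: estimators[v](inputs) * priors[v])
```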

81 Joint Density Bayes Classifier.
Y^predict = argmax_v P(X_1=u_1, ..., X_m=u_m | Y=v) P(Y=v)
In the case of the joint Bayes Classifier this degenerates to a very simple rule: Y^predict = the most common value of Y among records in which X_1=u_1, X_2=u_2, ..., X_m=u_m. Note that if no records have the exact set of inputs X_1=u_1, ..., X_m=u_m, then P(X_1, ..., X_m | Y=v_i) = 0 for all values of Y. In that case we just have to guess Y's value.

82 Naïve Bayes Classifier.
Y^predict = argmax_v P(Y=v) P(X_1=u_1, ..., X_m=u_m | Y=v)
In the case of the naïve Bayes Classifier this can be simplified to:
Y^predict = argmax_v P(Y=v) Π_{j=1}^{m} P(X_j=u_j | Y=v)

83 Naïve Bayes Classifier (continued). Technical Hint: if you have 10,000 input attributes, that product will underflow in floating point math. You should use logs:
Y^predict = argmax_v [ log P(Y=v) + Σ_{j=1}^{m} log P(X_j=u_j | Y=v) ]
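A log-space sketch of this hint, assuming `marginals[v]` is the per-attribute distribution list learned (as in `learn_naive` above) from the records with Y = v; the names are mine:

```python
import math

def naive_bayes_predict(inputs, priors, marginals):
    """argmax_v [ log P(Y=v) + sum_j log P(X_j=u_j | Y=v) ], computed in
    log space so that thousands of small factors cannot underflow."""
    def score(v):
        s = math.log(priors[v])
        for j, u in enumerate(inputs):
            p = marginals[v][j].get(u, 0.0)
            if p == 0:
                return float("-inf")
            s += math.log(p)
        return s
    return max(priors, key=score)
```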

84 BC Results: "XOR". The "XOR" dataset consists of 40,000 records and 2 Boolean inputs called a and b, generated 50-50 randomly as 0 or 1. c (output) = a XOR b. The Classifier learned by "Joint BC". The Classifier learned by "Naïve BC".

85 Naïve BC Results: "All Irrelevant". The "all irrelevant" dataset consists of 40,000 records and 15 Boolean attributes called a, b, c, d, ..., o, where a, b, c, ... are generated 50-50 randomly as 0 or 1. output = 1 with probability 0.75, 0 with probability 0.25. The Classifier learned by "Naïve BC".

86 More Facts About Bayes Classifiers. Many other density estimators can be slotted in*. Density estimation can be performed with real-valued inputs*. Bayes Classifiers can be built with real-valued inputs*. Rather Technical Complaint: Bayes Classifiers don't try to be maximally discriminative---they merely try to honestly model what's going on*. Zero probabilities are painful for Joint and Naïve. A hack (justifiable with the magic words "Dirichlet Prior") can help*. Naïve Bayes is wonderfully cheap. And survives 10,000 attributes cheerfully! (*See future Andrew Lectures.)

87 What you should know. Probability: fundamentals of Probability and Bayes Rule; what a Joint Distribution is; how to do inference (i.e. P(E1|E2)) once you have a JD. Density Estimation: what DE is and what it is good for; how to learn a Joint DE; how to learn a naïve DE.

88 What you should know (continued). Bayes Classifiers: how to build one; how to predict with a BC; the contrast between naïve and joint BCs.
