Artificial Intelligence: Reasoning Under Uncertainty/Bayes Nets

1 Artificial Intelligence: Reasoning Under Uncertainty/Bayes Nets

2

3

4 Bayesian Learning

5 Conditional Probability
Probability of an event given the occurrence of some other event:
P(X | Y) = P(X ∧ Y) / P(Y) = P(X, Y) / P(Y)

6 Example You've been keeping track of the last 1000 emails you received. You find that 100 of them are spam. You also find that 200 of them were put in your junk folder, of which 90 were spam.

7 Example You've been keeping track of the last 1000 emails you received. You find that 100 of them are spam. You also find that 200 of them were put in your junk folder, of which 90 were spam. What is the probability an email you receive is spam?

8 Example You've been keeping track of the last 1000 emails you received. You find that 100 of them are spam. You also find that 200 of them were put in your junk folder, of which 90 were spam. What is the probability an email you receive is spam? P(X) = 100 / 1000 = .1

9 Example You've been keeping track of the last 1000 emails you received. You find that 100 of them are spam. You also find that 200 of them were put in your junk folder, of which 90 were spam. What is the probability an email you receive is spam? P(X) = 100 / 1000 = .1 What is the probability an email you receive is put in your junk folder?

10 Example You've been keeping track of the last 1000 emails you received. You find that 100 of them are spam. You also find that 200 of them were put in your junk folder, of which 90 were spam. What is the probability an email you receive is spam? P(X) = 100 / 1000 = .1 What is the probability an email you receive is put in your junk folder? P(Y) = 200 / 1000 = .2

11 Example You've been keeping track of the last 1000 emails you received. You find that 100 of them are spam. You also find that 200 of them were put in your junk folder, of which 90 were spam. What is the probability an email you receive is spam? P(X) = 100 / 1000 = .1 What is the probability an email you receive is put in your junk folder? P(Y) = 200 / 1000 = .2 Given that an email is in your junk folder, what is the probability it is spam?

12 Example You've been keeping track of the last 1000 emails you received. You find that 100 of them are spam. You also find that 200 of them were put in your junk folder, of which 90 were spam. What is the probability an email you receive is spam? P(X) = 100 / 1000 = .1 What is the probability an email you receive is put in your junk folder? P(Y) = 200 / 1000 = .2 Given that an email is in your junk folder, what is the probability it is spam? P(X | Y) = P(X ∩ Y) / P(Y) = .09 / .2 = .45

13 Example You've been keeping track of the last 1000 emails you received. You find that 100 of them are spam. You also find that 200 of them were put in your junk folder, of which 90 were spam. What is the probability an email you receive is spam? P(X) = 100 / 1000 = .1 What is the probability an email you receive is put in your junk folder? P(Y) = 200 / 1000 = .2 Given that an email is in your junk folder, what is the probability it is spam? P(X | Y) = P(X ∩ Y) / P(Y) = .09 / .2 = .45 Given that an email is spam, what is the probability it is in your junk folder?

14 Example You've been keeping track of the last 1000 emails you received. You find that 100 of them are spam. You also find that 200 of them were put in your junk folder, of which 90 were spam. What is the probability an email you receive is spam? P(X) = 100 / 1000 = .1 What is the probability an email you receive is put in your junk folder? P(Y) = 200 / 1000 = .2 Given that an email is in your junk folder, what is the probability it is spam? P(X | Y) = P(X ∩ Y) / P(Y) = .09 / .2 = .45 Given that an email is spam, what is the probability it is in your junk folder? P(Y | X) = P(X ∩ Y) / P(X) = .09 / .1 = .9
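
Below is a minimal Python sketch of the arithmetic in this example (the counts 1000, 100, 200, and 90 are taken from the slides; the variable names are just illustrative):

    # Conditional probabilities from counts in the spam / junk-folder example.
    total = 1000
    n_spam = 100          # X: email is spam
    n_junk = 200          # Y: email is in the junk folder
    n_spam_and_junk = 90  # X and Y

    p_x = n_spam / total              # P(X) = 0.1
    p_y = n_junk / total              # P(Y) = 0.2
    p_xy = n_spam_and_junk / total    # P(X and Y) = 0.09

    p_x_given_y = p_xy / p_y          # P(X | Y) = 0.45
    p_y_given_x = p_xy / p_x          # P(Y | X) = 0.9
    print(p_x, p_y, p_x_given_y, p_y_given_x)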

15 Deriving Bayes Rule
P(X | Y) = P(X ∩ Y) / P(Y)
P(Y | X) = P(X ∩ Y) / P(X)
Bayes rule: P(X | Y) = P(Y | X) P(X) / P(Y)

16 General Application to Data Models
In machine learning we have a space H of hypotheses: h1, h2, ..., hn (possibly infinite). We also have a set D of data. We want to calculate P(h | D).
Bayes rule gives us: P(h | D) = P(D | h) P(h) / P(D)

17 Terminology
Prior probability of h, P(h): probability that hypothesis h is true given our prior knowledge. If we have no prior knowledge, all h ∈ H are equally probable.
Posterior probability of h, P(h | D): probability that hypothesis h is true, given the data D.
Likelihood of D, P(D | h): probability that we will see data D, given that hypothesis h is true.
Marginal likelihood of D: P(D) = Σ_h P(D | h) P(h)

18 A Bayesian Approach to the Monty Hall Problem You are a contestant on a game show. There are 3 doors, A, B, and C. There is a new car behind one of them and goats behind the other two. Monty Hall, the host, knows what is behind the doors. He asks you to pick a door, any door. You pick door A. Monty tells you he will open a door, different from A, that has a goat behind it. He opens door B: behind it there is a goat. Monty now gives you a choice: Stick with your original choice A or switch to C.

19 Bayesian probability formulation
Hypothesis space H: h1 = car is behind door A; h2 = car is behind door B; h3 = car is behind door C
Data D: after you picked door A, Monty opened B to show a goat
Prior probability: P(h1) = 1/3, P(h2) = 1/3, P(h3) = 1/3
Likelihood: P(D | h1) = 1/2, P(D | h2) = 0, P(D | h3) = 1
What is P(h1 | D)? What is P(h2 | D)? What is P(h3 | D)?
Marginal likelihood: P(D) = P(D | h1)P(h1) + P(D | h2)P(h2) + P(D | h3)P(h3) = 1/6 + 0 + 1/3 = 1/2

20 By Bayes rule:
P(h1 | D) = P(D | h1)P(h1) / P(D) = (1/2)(1/3)(2) = 1/3
P(h2 | D) = P(D | h2)P(h2) / P(D) = (0)(1/3)(2) = 0
P(h3 | D) = P(D | h3)P(h3) / P(D) = (1)(1/3)(2) = 2/3
So you should switch!
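
A short sketch of the same Bayes-rule computation in Python (the dictionary names are illustrative; the priors and likelihoods are the ones listed on slide 19):

    # Monty Hall posterior via Bayes rule.
    priors = {'A': 1/3, 'B': 1/3, 'C': 1/3}           # P(h): car behind each door
    likelihoods = {'A': 1/2, 'B': 0.0, 'C': 1.0}      # P(D | h): Monty opens B after we pick A

    p_d = sum(likelihoods[h] * priors[h] for h in priors)          # marginal likelihood = 1/2
    posteriors = {h: likelihoods[h] * priors[h] / p_d for h in priors}
    print(posteriors)   # A: 1/3, B: 0, C: 2/3 -> switching to C doubles the chance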

21 MAP ("maximum a posteriori") Learning
Bayes rule: P(h | D) = P(D | h) P(h) / P(D)
Goal of learning: find the maximum a posteriori hypothesis h_MAP:
h_MAP = argmax_{h ∈ H} P(h | D) = argmax_{h ∈ H} P(D | h) P(h) / P(D) = argmax_{h ∈ H} P(D | h) P(h)
because P(D) is a constant independent of h.

22 Note: If every h ∈ H is equally probable, then h_MAP = argmax_{h ∈ H} P(D | h). In this case h_MAP is called the maximum likelihood hypothesis.

23 A Medical Example
Toby takes a test for leukemia. The test has two outcomes: positive and negative. It is known that if the patient has leukemia, the test is positive 98% of the time. If the patient does not have leukemia, the test is positive 3% of the time. It is also known that 0.8% of the population has leukemia. Toby's test is positive. Which is more likely: Toby has leukemia or Toby does not have leukemia?

24 Hypothesis space: h1 = Toby has leukemia; h2 = Toby does not have leukemia
Prior: 0.8% of the population has leukemia. Thus P(h1) = 0.008, P(h2) = 0.992
Likelihood: P(+ | h1) = 0.98, P(− | h1) = 0.02; P(+ | h2) = 0.03, P(− | h2) = 0.97
Posterior knowledge: the blood test is + for this patient.

25 In summary:
P(h1) = 0.008, P(h2) = 0.992
P(+ | h1) = 0.98, P(− | h1) = 0.02
P(+ | h2) = 0.03, P(− | h2) = 0.97
Thus: h_MAP = argmax_{h ∈ H} P(D | h) P(h)
P(+ | leukemia) P(leukemia) = (0.98)(0.008) = 0.0078
P(+ | ¬leukemia) P(¬leukemia) = (0.03)(0.992) = 0.0298
h_MAP = ¬leukemia

26 What is P(leukemia | +)?
P(h | D) = P(D | h) P(h) / P(D)
So, P(leukemia | +) = 0.0078 / (0.0078 + 0.0298) = 0.21
P(¬leukemia | +) = 0.0298 / (0.0078 + 0.0298) = 0.79
These are called the posterior probabilities.
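
A small sketch of the MAP and posterior computation for this example, using the numbers from slides 24-26 (the variable names are illustrative):

    # MAP hypothesis and posteriors for the leukemia test example.
    p_h1, p_h2 = 0.008, 0.992            # prior: leukemia / no leukemia
    p_pos_h1, p_pos_h2 = 0.98, 0.03      # likelihood of a positive test

    joint_h1 = p_pos_h1 * p_h1           # ~0.0078
    joint_h2 = p_pos_h2 * p_h2           # ~0.0298
    h_map = 'leukemia' if joint_h1 > joint_h2 else 'no leukemia'   # MAP hypothesis

    p_d = joint_h1 + joint_h2            # P(+)
    print(h_map, joint_h1 / p_d, joint_h2 / p_d)   # no leukemia, ~0.21, ~0.79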

27 Bayesianism vs. Frequentism
Classical probability (frequentists): the probability of a particular event is defined relative to its frequency in a sample space of events. E.g., the probability that the coin will come up heads on the next trial is defined relative to the frequency of heads in a sample space of coin tosses.
Bayesian probability: combine the measure of prior belief you have in a proposition with your subsequent observations of events.
Example: a Bayesian can assign a probability to the statement "There was life on Mars a billion years ago," but a frequentist cannot.

28 Independence and Conditional Independence
Recall that two random variables, X and Y, are independent if P(X, Y) = P(X) P(Y).
Two random variables, X and Y, are conditionally independent given C if P(X, Y | C) = P(X | C) P(Y | C).

29 Naive Bayes Classifier
Let f(x) be a target function for classification: f(x) ∈ {+1, −1}. Let x = (x1, x2, ..., xn).
We want to find the most probable class value, class_MAP, given the data x:
class_MAP = argmax_{class ∈ {+1,−1}} P(class | D) = argmax_{class ∈ {+1,−1}} P(class | x1, x2, ..., xn)

30 By Bayes theorem:
class_MAP = argmax_{class ∈ {+1,−1}} P(x1, x2, ..., xn | class) P(class) / P(x1, x2, ..., xn)
          = argmax_{class ∈ {+1,−1}} P(x1, x2, ..., xn | class) P(class)
P(class) can be estimated from the training data. How?
However, in general it is not practical to use the training data to estimate P(x1, x2, ..., xn | class). Why not?

31 Naive Bayes classifier: Assume
P(x1, x2, ..., xn | class) = P(x1 | class) P(x2 | class) ... P(xn | class)
Is this a good assumption?
Given this assumption, here's how to classify an instance x = (x1, x2, ..., xn):
Naive Bayes classifier: class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class)
To train: estimate the values of these various probabilities over the training set.

32 Training data:
Day   Outlook   Temp  Humidity  Wind    PlayTennis
D1    Sunny     Hot   High      Weak    No
D2    Sunny     Hot   High      Strong  No
D3    Overcast  Hot   High      Weak    Yes
D4    Rain      Mild  High      Weak    Yes
D5    Rain      Cool  Normal    Weak    Yes
D6    Rain      Cool  Normal    Strong  No
D7    Overcast  Cool  Normal    Strong  Yes
D8    Sunny     Mild  High      Weak    No
D9    Sunny     Cool  Normal    Weak    Yes
D10   Rain      Mild  Normal    Weak    Yes
D11   Sunny     Mild  Normal    Strong  Yes
D12   Overcast  Mild  High      Strong  Yes
D13   Overcast  Hot   Normal    Weak    Yes
D14   Rain      Mild  High      Strong  No
Test data:
D15   Sunny     Cool  High      Strong  ?

33 Use training data to compute a probabilistic model:
P(Outlook = Sunny | Yes) = 2/9        P(Outlook = Sunny | No) = 3/5
P(Outlook = Overcast | Yes) = 4/9     P(Outlook = Overcast | No) = 0
P(Outlook = Rain | Yes) = 3/9         P(Outlook = Rain | No) = 2/5
P(Temperature = Hot | Yes) = 2/9      P(Temperature = Hot | No) = 2/5
P(Temperature = Mild | Yes) = 4/9     P(Temperature = Mild | No) = 2/5
P(Temperature = Cool | Yes) = 3/9     P(Temperature = Cool | No) = 1/5
P(Humidity = High | Yes) = 3/9        P(Humidity = High | No) = 4/5
P(Humidity = Normal | Yes) = 6/9      P(Humidity = Normal | No) = 1/5
P(Wind = Strong | Yes) = 3/9          P(Wind = Strong | No) = 3/5
P(Wind = Weak | Yes) = 6/9            P(Wind = Weak | No) = 2/5

34 Use the training data to compute a probabilistic model (the probabilities above), then classify the test instance:
Day   Outlook  Temp  Humidity  Wind    PlayTennis
D15   Sunny    Cool  High      Strong  ?

35 Use the training data to compute a probabilistic model (the probabilities above), then classify the test instance:
Day   Outlook  Temp  Humidity  Wind    PlayTennis
D15   Sunny    Cool  High      Strong  ?
class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class)
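
A short sketch of how the classification of D15 could be computed from the estimates above; the class priors 9/14 and 5/14 are counted from the training table on slide 32, and the code itself is illustrative rather than part of the slides:

    # Naive Bayes classification of D15 = (Sunny, Cool, High, Strong).
    p_yes, p_no = 9/14, 5/14    # class priors from the 14 training days

    score_yes = p_yes * (2/9) * (3/9) * (3/9) * (3/9)   # Sunny, Cool, High, Strong | Yes
    score_no  = p_no  * (3/5) * (1/5) * (4/5) * (3/5)   # Sunny, Cool, High, Strong | No

    print(score_yes, score_no)                        # ~0.0053 vs ~0.0206
    print('Yes' if score_yes > score_no else 'No')    # -> predict PlayTennis = No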

36 Estimating probabilities / Smoothing
Recap: In the previous example, we had a training set and a new example, (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong). We asked: what classification is given by a naive Bayes classifier?
Let n_c be the number of training instances with class c.
Let n_{c, x_i=a_k} be the number of training instances with attribute value x_i = a_k and class c.
Then: P(x_i = a_k | c) = n_{c, x_i=a_k} / n_c

37 Problem with this method: if n_c is very small, it gives a poor estimate. E.g., P(Outlook = Overcast | No) = 0.

38 Now suppose we want to classify a new instance: (Outlook=Overcast, Temperature=Cool, Humidity=High, Wind=Strong). Then:
P(No) ∏_i P(x_i | No) = 0
This incorrectly gives us zero probability due to the small sample.

39 One solution: Laplace smoothing (also called "add-one" smoothing)
For each class c and attribute x_i with value a_k, add one virtual instance. That is, for each class c, recalculate:
P(x_i = a_k | c) = (n_{c, x_i=a_k} + 1) / (n_c + K)
where K is the number of possible values of attribute x_i.

40 Training data: (the same table as slide 32)
Laplace smoothing: add the following virtual instances for Outlook:
Outlook=Sunny: Yes    Outlook=Overcast: Yes    Outlook=Rain: Yes
Outlook=Sunny: No     Outlook=Overcast: No     Outlook=Rain: No
P(Outlook = Overcast | No) = 0/5 → (n_{c, x_i=a_k} + 1) / (n_c + K) = (0 + 1) / (5 + 3) = 1/8
P(Outlook = Overcast | Yes) = 4/9 → (n_{c, x_i=a_k} + 1) / (n_c + K) = (4 + 1) / (9 + 3) = 5/12

41
P(Outlook = Sunny | Yes) = 2/9 → 3/12        P(Outlook = Sunny | No) = 3/5 → 4/8
P(Outlook = Overcast | Yes) = 4/9 → 5/12     P(Outlook = Overcast | No) = 0/5 → 1/8
P(Outlook = Rain | Yes) = 3/9 → 4/12         P(Outlook = Rain | No) = 2/5 → 3/8
Etc.
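
A small illustrative sketch of the add-one recalculation for the Outlook attribute, assuming the counts read off the training table (9 Yes and 5 No instances, K = 3 Outlook values):

    # Laplace (add-one) smoothing for P(Outlook = value | class).
    from collections import Counter

    counts = {'Yes': Counter({'Sunny': 2, 'Overcast': 4, 'Rain': 3}),
              'No':  Counter({'Sunny': 3, 'Overcast': 0, 'Rain': 2})}
    n_class = {'Yes': 9, 'No': 5}
    K = 3   # number of possible Outlook values

    for c in counts:
        for value in ['Sunny', 'Overcast', 'Rain']:
            p = (counts[c][value] + 1) / (n_class[c] + K)
            print(f'P(Outlook={value} | {c}) = {p:.3f}')   # e.g. Overcast|No: 1/8 = 0.125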

42 In-class exercise
1. Recall the naive Bayes classifier: class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class). Consider this training set, in which each instance has four binary features and a binary class:
Instance  x1  x2  x3  x4  Class
x                         POS
x                         POS
x                         POS
x                         POS
x                         NEG
x                         NEG
x                         NEG
(a) Create a probabilistic model that you could use to classify new instances. That is, calculate P(class) and P(x_i | class) for each class. No smoothing is needed (yet).
(b) Use your probabilistic model to determine class_NB for the following new instance:
Instance  x1  x2  x3  x4  Class
x                         ?
2. Recall the formula for Laplace smoothing: P(x_i = a_k | c) = (n_{c, x_i=a_k} + 1) / (n_c + K), where K is the number of possible values of the attribute.
(a) Apply Laplace smoothing to all the probabilities from the training set in question 1.
(b) Use the smoothed probabilities to determine class_NB for the following new instances:
Instance  x1  x2  x3  x4  Class
x                         ?
x                         ?

43 Naive Bayes on continuous-valued attributes
How do we deal with continuous-valued attributes? Two possible solutions:
Discretize the values.
Assume a particular probability distribution of the values within each class (and estimate its parameters from the training data).

44 Discretization: Equal-Width Binning For each attribute x i, create k equal-width bins in interval from min(x i ) to max(x i ). The discrete attribute values are now the bins. Questions: What should k be? What if some bins have very few instances? Problem with balance between discretization bias and variance. The more bins, the lower the bias, but the higher the variance, due to small sample size.

45 Discretization: Equal-Frequency Binning For each attribute x i, create k bins so that each bin contains an equal number of values. Also has problems: What should k be? Hides outliers. Can group together instances that are far apart.

46 Gaussian Naïve Bayes
Assume that within each class c, the values of each numeric feature x_i are normally distributed:
p(x_i | c) = N(x_i; μ_{i,c}, σ_{i,c}) = (1 / (σ_{i,c} √(2π))) exp(−(x_i − μ_{i,c})² / (2 σ_{i,c}²))
where μ_{i,c} is the mean of feature i given the class c, and σ_{i,c} is the standard deviation of feature i given the class c.
We estimate μ_{i,c} and σ_{i,c} from training data.

47 Example x 1 x 2 Class POS POS POS NEG NEG NEG

48 Example x 1 x 2 Class POS POS POS NEG NEG NEG P(POS) = 0.5 P(NEG) = 0.5

49 N 1,POS = N(x; 4.8, 1.8) N 2,POS = N(x; 7.1, 2.0) N 1,NEG = N(x; 4.7, 2.5) N 2,NEG = N(x; 4.2, 3.7)

50 Now, suppose you have a new example x, with x1 = 5.2, x2 = 6.3. What is class_NB(x)?

51 Now, suppose you have a new example x, with x1 = 5.2, x2 = 6.3. What is class_NB(x)?
class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class)
Note: N is a probability density function, but it can be used analogously to a probability in naive Bayes calculations.

52 Now, suppose you have a new example x, with x1 = 5.2, x2 = 6.3. What is class_NB(x)?
class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class)

53
Positive: P(POS) P(x1 | POS) P(x2 | POS) = (.5)(.22)(.18) = .02
Negative: P(NEG) P(x1 | NEG) P(x2 | NEG) = (.5)(.16)(.09) = .0072
class_NB(x) = POS
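
A minimal sketch of this Gaussian naive Bayes calculation, assuming the per-class normal parameters from slide 49; the helper function name is illustrative:

    # Gaussian naive Bayes scoring of x = (5.2, 6.3) with class priors of 0.5 each.
    import math

    def normal_pdf(x, mu, sigma):
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    x1, x2 = 5.2, 6.3
    score_pos = 0.5 * normal_pdf(x1, 4.8, 1.8) * normal_pdf(x2, 7.1, 2.0)   # ~0.02
    score_neg = 0.5 * normal_pdf(x1, 4.7, 2.5) * normal_pdf(x2, 4.2, 3.7)   # ~0.0072
    print('POS' if score_pos > score_neg else 'NEG')                        # -> POS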

54 Use logarithms to avoid underflow:
class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class)
            = argmax_{class ∈ {+1,−1}} log( P(class) ∏_i P(x_i | class) )
            = argmax_{class ∈ {+1,−1}} ( log P(class) + Σ_i log P(x_i | class) )
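
The same POS/NEG comparison can be done in log space; a tiny sketch reusing the density values quoted on slide 53:

    # Products of many small probabilities underflow; sums of their logs do not.
    import math

    log_score_pos = math.log(0.5) + math.log(0.22) + math.log(0.18)   # ~ -3.93
    log_score_neg = math.log(0.5) + math.log(0.16) + math.log(0.09)   # ~ -4.93
    print('POS' if log_score_pos > log_score_neg else 'NEG')          # same answer as before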

55 Bayes Nets

56 Another example
A patient comes into a doctor's office with a bad cough and a high fever.
Hypothesis space H: h1 = patient has flu; h2 = patient does not have flu
Data D: coughing = true, fever = true
Prior probabilities: P(h1) = .1, P(h2) = .9
Likelihoods: P(D | h1) = .8, P(D | h2) = .4
Probability of the data: P(D) = ?
Posterior probabilities: P(h1 | D) = ?, P(h2 | D) = ?

57 Let s say we have the following random variables: cough fever flu smokes

58 Full joint probability distribution
smokes:
           cough, fever   cough, ¬fever   ¬cough, fever   ¬cough, ¬fever
  flu      p1             p2              p3              p4
  ¬flu     p5             p6              p7              p8
¬smokes:
           cough, fever   cough, ¬fever   ¬cough, fever   ¬cough, ¬fever
  flu      p9             p10             p11             p12
  ¬flu     p13            p14             p15             p16
The sum of all boxes is 1.
In principle, the full joint distribution can be used to answer any question about probabilities of these combined parameters. However, the size of the full joint distribution scales exponentially with the number of parameters, so it is expensive to store and to compute with.

59 Bayesian networks (graphical models)
The idea is to represent the dependencies (or causal relations) among all the variables so that space and computation-time requirements are minimized.
(Network: smokes and flu are parents of cough; flu is the parent of fever.)

60 Conditional probability tables for each node:
smoke: P(smoke = true) = 0.2, P(smoke = false) = 0.8
flu: P(flu = true) = 0.01, P(flu = false) = 0.99
cough: a table of P(cough | flu, smoke), with one row for each combination of flu ∈ {true, false} and smoke ∈ {true, false}
fever: a table of P(fever | flu), with one row for flu = true and one for flu = false

61 Semantics of Bayesian networks
If the network is correct, we can calculate the full joint probability distribution from the network:
P((X1 = x1) ∧ (X2 = x2) ∧ ... ∧ (Xn = xn)) = ∏_{i=1}^{n} P(Xi = xi | parents(Xi))
where parents(Xi) denotes the specific values of the parents of Xi.

62 Example: Calculate P[(cough = t) ∧ (fever = f) ∧ (flu = f) ∧ (smoke = f)]

63 Example: Calculate P[(cough = t) ∧ (fever = f) ∧ (flu = f) ∧ (smoke = f)]
= P(cough = t | flu = f, smoke = f) P(fever = f | flu = f) P(flu = f) P(smoke = f)
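
A minimal sketch of how the factorization on slide 61 answers this query. The smoke and flu priors come from slide 60; the two CPT entries marked with * are illustrative placeholders, since their values are not shown on the slides:

    # Joint probability P(cough=t, fever=f, flu=f, smoke=f) as a product of CPT entries.
    p_smoke_f = 0.8                  # P(smoke = f)
    p_flu_f = 0.99                   # P(flu = f)
    p_cough_t_given = 0.05           # * assumed P(cough = t | flu = f, smoke = f)
    p_fever_t_given = 0.2            # * assumed P(fever = t | flu = f)

    joint = p_cough_t_given * (1 - p_fever_t_given) * p_flu_f * p_smoke_f
    print(joint)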

64 Different types of inference in Bayesian networks
Causal inference: the evidence is a cause, the inference is the probability of an effect.
Example: Instantiate the evidence flu = true. What is P(fever | flu)?
P(fever | flu) = .9 (up from P(fever) = .207)

65 Diagnostic inference: the evidence is an effect, the inference is the probability of a cause.
Example: Instantiate the evidence fever = true. What is P(flu | fever)?
P(flu | fever) = P(fever | flu) P(flu) / P(fever) = (.9)(.01) / .207 = .043 (up from P(flu) = .01)

66 Example: What is P(flu | cough)?
P(flu | cough) = P(cough | flu) P(flu) / P(cough)
  = [P(cough | flu, smoke) P(smoke) + P(cough | flu, ¬smoke) P(¬smoke)] P(flu) / P(cough)
  = [(.95)(.2) + (.8)(.8)](.01) / P(cough)

67 Inter-causal inference: "explaining away" different possible causes of an effect.
Example: What is P(flu | cough, smoke)?
P(flu | cough, smoke) = P(flu, cough, smoke) / P(cough, smoke)
  = P(cough | flu, smoke) P(flu) P(smoke) / [P(cough | flu, smoke) P(flu) P(smoke) + P(cough | ¬flu, smoke) P(¬flu) P(smoke)]
  = (.95)(.01)(.2) / [(.95)(.01)(.2) + (.6)(.99)(.2)]
  ≈ .016
Why is P(flu | cough, smoke) < P(flu | cough)?

68 Complexity of Bayesian Networks
For n random Boolean variables:
Full joint probability distribution: 2^n entries
Bayesian network with at most k parents per node:
  Each conditional probability table: at most 2^k entries
  Entire network: n·2^k entries

69 What are the advantages of Bayesian networks? Intuitive, concise representation of joint probability distribution (i.e., conditional dependencies) of a set of random variables. Represents beliefs and knowledge about a particular class of situations. Efficient (?) (approximate) inference algorithms Efficient, effective learning algorithms

70 Issues in Bayesian Networks Building / learning network topology Assigning / learning conditional probability tables Approximate inference via sampling

71 Real-World Example: The Lumière Project at Microsoft Research Bayesian network approach to answering user queries about Microsoft Office. At the time we initiated our project in Bayesian information retrieval, managers in the Office division were finding that users were having difficulty finding assistance efficiently. As an example, users working with the Excel spreadsheet might have required assistance with formatting a graph. Unfortunately, Excel has no knowledge about the common term, graph, and only considered in its keyword indexing the term chart.

72

73 Networks were developed by experts from user modeling studies.

74 An offspring of the project was the Office Assistant in Office 97.

75 (Network: flu is the parent of smoke, fever, cough, headache, and nausea.)
P(flu, smoke, fever, cough, headache, nausea) = P(flu) P(smoke | flu) P(fever | flu) P(cough | flu) P(headache | flu) P(nausea | flu)

76 Naive Bayes (the same network as above)
More generally, for classification:
P(C = c_j | X_1 = x_1, ..., X_n = x_n) ∝ P(C = c_j) ∏_i P(X_i = x_i | C = c_j)

77 Learning network topology
Many different approaches, including:
Heuristic search, with evaluation based on information-theory measures
Genetic algorithms
Using meta Bayesian networks!

78 Learning conditional probabilities
In general, random variables are not binary, but real-valued.
Conditional probability tables → conditional probability distributions
Estimate the parameters of these distributions from data.

79 Approximate inference via sampling
Recall: we can calculate the full joint probability distribution from the network:
P(X_1, ..., X_d) = ∏_{i=1}^{d} P(X_i | parents(X_i))
where parents(X_i) denotes the specific values of the parents of X_i.
We can do diagnostic, causal, and inter-causal inference. But if there are a lot of nodes in the network, this can be very slow! We need efficient algorithms to do approximate calculations.

80 A Précis of Sampling Algorithms: Gibbs Sampling
Suppose that we want to sample from p(x | θ), x ∈ R^D.
Basic idea: sample sequentially from the full conditionals.
Initialize x_i for i = 1, ..., D.
For t = 1, ..., T:
  sample x_1^(t+1) ~ p(x_1 | x_(−1)^(t))
  sample x_2^(t+1) ~ p(x_2 | x_(−2)^(t))
  ...
  sample x_D^(t+1) ~ p(x_D | x_(−D)^(t))
where x_(−d) denotes all components of x except x_d (using the most recently sampled values).
Complications: (i) how to order the samples (the order can be random, but be careful); (ii) we need the full conditionals (an approximation can be used).
Under nice assumptions (ergodicity), we get samples x ~ p(x).

81 A Précis of Sampling Algorithms: Sampling Algorithms More Generally
How do we perform posterior inference more generally? Oftentimes we rely on strong parametric assumptions (e.g., conjugacy, exponential-family structures). Monte Carlo approximation/inference can get around this.
Basic idea: (1) Draw samples x^(s) ~ p(x | θ). (2) Compute the quantity of interest, e.g., marginals:
E[x_1] ≈ (1/S) Σ_{s=1}^{S} x_1^(s), etc.
In general: E[f(x)] = ∫ f(x) p(x) dx ≈ (1/S) Σ_{s=1}^{S} f(x^(s))
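
A small generic Monte Carlo sketch of the last formula, estimating E[f(x)] for an assumed f and sampling distribution (not ones from the slides):

    # Monte Carlo estimate of E[f(x)] = (1/S) * sum_s f(x_s), here for f(x) = x**2
    # with x drawn from a standard normal (true value is 1.0).
    import random

    S = 100_000
    samples = (random.gauss(0.0, 1.0) for _ in range(S))
    estimate = sum(x * x for x in samples) / S
    print(estimate)   # close to 1.0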

82 A Précis of Sampling Algorithms: the CDF Method (a Monte Carlo technique)
Steps: (1) sample u ~ U(0, 1); (2) compute x = F⁻¹(u); then x ~ F.

83 A Précis of Sampling Algorithms: Rejection Sampling (a Monte Carlo technique)
One can show that the accepted samples satisfy x ~ p(x).
Issues: we need a good proposal q(x) and constant c, and the rejection rate can grow astronomically!
Pros of MC sampling: the samples are independent. Cons: it is very inefficient in high dimensions. Alternatively, one can use MCMC methods.

84 Markov Chain Monte Carlo Sampling One of most common methods used in real applications. Recall that: By construction of Bayesian network, a node is conditionally independent of its non-descendants, given its parents. Also recall that: a node can be conditionally dependent on its children and on the other parents of its children. (Why?) Definition: The Markov blanket of a variable X i is X i s parents, children, and children s other parents.

85 Example: What is the Markov blanket of cough? Of flu? (Refer to the network over smokes, flu, cough, fever.)

86 Theorem: A node X i is conditionally independent of all other nodes in the network, given its Markov blanket.

87 Markov Chain Monte Carlo (MCMC) Sampling Start with random sample from variables: (x 1,..., x n ). This is the current state of the algorithm. Next state: Randomly sample value for one non-evidence variable X i, conditioned on current values in Markov Blanket of X i.

88 Example
Query: What is P(cough | smoke)?
MCMC: start with a random sample, with the evidence variables fixed:
flu = true, smoke = true, fever = false, cough = true
Repeat:
1. Sample flu probabilistically, given the current values of its Markov blanket: smoke = true, fever = false, cough = true. Suppose the result is false. New state:
flu = false, smoke = true, fever = false, cough = true

89
2. Sample cough, given the current values of its Markov blanket: smoke = true, flu = false. Suppose the result is true. New state:
flu = false, smoke = true, fever = false, cough = true
3. Sample fever, given the current values of its Markov blanket: flu = false. Suppose the result is true. New state:
flu = false, smoke = true, fever = true, cough = true

90 Each sample contributes to the estimate for the query P(cough | smoke).
Suppose we perform 100 such samples, 20 with cough = true and 80 with cough = false. Then the answer to the query is P(cough | smoke) = .20.
Theorem: MCMC settles into a behavior in which each state is sampled exactly according to its posterior probability, given the evidence.
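
A sketch of this Gibbs-sampling procedure on the flu network, in Python. The CPT entries marked * are assumed for illustration (they are not given on the slides); the others appear on the inference slides above:

    # Gibbs sampling estimate of P(cough = true | smoke = true) on the flu network.
    import random

    P_FLU = 0.01
    P_FEVER = {True: 0.9, False: 0.2}                      # P(fever=t | flu); False entry assumed *
    P_COUGH = {(True, True): 0.95, (True, False): 0.8,     # P(cough=t | flu, smoke)
               (False, True): 0.6, (False, False): 0.1}    # (False, False) entry assumed *

    def sample_flu(fever, cough, smoke):
        # P(flu | Markov blanket) is proportional to P(flu) P(fever | flu) P(cough | flu, smoke)
        w = {}
        for flu in (True, False):
            p_flu = P_FLU if flu else 1 - P_FLU
            p_fev = P_FEVER[flu] if fever else 1 - P_FEVER[flu]
            p_cgh = P_COUGH[(flu, smoke)] if cough else 1 - P_COUGH[(flu, smoke)]
            w[flu] = p_flu * p_fev * p_cgh
        return random.random() < w[True] / (w[True] + w[False])

    smoke = True                              # evidence, held fixed
    flu, fever, cough = True, False, True     # arbitrary initial state
    count_cough = 0
    N = 50_000
    for _ in range(N):
        flu = sample_flu(fever, cough, smoke)
        fever = random.random() < P_FEVER[flu]             # fever's blanket is just flu
        cough = random.random() < P_COUGH[(flu, smoke)]    # cough's blanket is its parents
        count_cough += cough

    print(count_cough / N)   # estimate of P(cough = true | smoke = true)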

91 Applying Bayesian Reasoning to Speech Recognition Task: Identify sequence of words uttered by speaker, given acoustic signal. Uncertainty introduced by noise, speaker error, variation in pronunciation, homonyms, etc. Thus speech recognition is viewed as problem of probabilistic inference.

92 So far, we ve looked at probabilistic reasoning in static environments. Speech: Time sequence of static environments. Let X be the state variables (i.e., set of non-evidence variables) describing the environment (e.g., Words said during time step t) Let E be the set of evidence variables (e.g., S = features of acoustic signal).

93 The joint probability distribution over X and the E values changes over time:
t1: X1, e1
t2: X2, e2
etc.

94 At each t, we want to compute P(Words | S). We know from Bayes rule:
P(Words | S) ∝ P(S | Words) P(Words)
P(S | Words), for all words, is a previously learned acoustic model. E.g., for each word, a probability distribution over phones, and for each phone, a probability distribution over acoustic signals (which can vary in pitch, speed, volume).
P(Words), for all words, is the language model, which specifies the prior probability of each utterance. E.g., a "bigram model": the probability of each word following each other word.

95 Speech recognition typically makes three assumptions:
1. The process underlying the change is itself stationary, i.e., the state transition probabilities don't change.
2. The current state X depends on only a finite history of previous states (the "Markov assumption"). Markov process of order n: the current state depends only on the n previous states.
3. The values e_t of the evidence variables depend only on the current state X_t (the "sensor model").

96

97

98 Hidden Markov Models
Markov model: given state X_t, what is the probability of transitioning to the next state X_{t+1}?
E.g., word bigram probabilities give P(word_{t+1} | word_t).
Hidden Markov model: there are observable states (e.g., the signal S) and hidden states (e.g., the Words). An HMM represents the probabilities of the hidden states given the observable states.

99

100

101 Example: "I'm firsty, um, can I have something to dwink?"

102

103 Graphical Models and Computer Vision
