Artificial Intelligence: Reasoning Under Uncertainty/Bayes Nets
- Hortense Julianna Lynch
- 5 years ago
1 Artificial Intelligence: Reasoning Under Uncertainty/Bayes Nets
2
3
4 Bayesian Learning
5 Conditional Probability Probability of an event given the occurrence of some other event. P(X | Y) = P(X ∧ Y) / P(Y) = P(X, Y) / P(Y)
6 Example You've been keeping track of the last 1000 emails you received. You find that 100 of them are spam. You also find that 200 of them were put in your junk folder, of which 90 were spam.
What is the probability an email you receive is spam? P(X) = 100/1000 = .1
What is the probability an email you receive is put in your junk folder? P(Y) = 200/1000 = .2
Given that an email is in your junk folder, what is the probability it is spam? P(X | Y) = P(X ∩ Y) / P(Y) = .09 / .2 = .45
Given that an email is spam, what is the probability it is in your junk folder? P(Y | X) = P(X ∩ Y) / P(X) = .09 / .1 = .9
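As a quick check, the four probabilities above can be computed directly from the counts (a Python sketch; the variable names are ours):

```python
# Counts from the email example on the slides.
total = 1000          # emails tracked
spam = 100            # spam emails
junk = 200            # emails in the junk folder
spam_and_junk = 90    # junk emails that are spam

p_spam = spam / total                     # P(X) = 0.1
p_junk = junk / total                     # P(Y) = 0.2
p_spam_and_junk = spam_and_junk / total   # P(X ∩ Y) = 0.09

p_spam_given_junk = p_spam_and_junk / p_junk  # P(X|Y) = 0.45
p_junk_given_spam = p_spam_and_junk / p_spam  # P(Y|X) = 0.9

print(p_spam_given_junk, p_junk_given_spam)
```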
15 Deriving Bayes' Rule
P(X | Y) = P(X ∩ Y) / P(Y)    P(Y | X) = P(X ∩ Y) / P(X)
Bayes' rule: P(X | Y) = P(Y | X) P(X) / P(Y)
16 General Application to Data Models In machine learning we have a space H of hypotheses: h_1, h_2, ..., h_n (possibly infinite). We also have a set D of data. We want to calculate P(h | D). Bayes' rule gives us: P(h | D) = P(D | h) P(h) / P(D)
17 Terminology
Prior probability of h: P(h): probability that hypothesis h is true given our prior knowledge. If there is no prior knowledge, all h ∈ H are equally probable.
Posterior probability of h: P(h | D): probability that hypothesis h is true, given the data D.
Likelihood of D: P(D | h): probability that we will see data D, given that hypothesis h is true.
Marginal likelihood of D: P(D) = Σ_h P(D | h) P(h)
18 A Bayesian Approach to the Monty Hall Problem You are a contestant on a game show. There are 3 doors, A, B, and C. There is a new car behind one of them and goats behind the other two. Monty Hall, the host, knows what is behind the doors. He asks you to pick a door, any door. You pick door A. Monty tells you he will open a door, different from A, that has a goat behind it. He opens door B: behind it there is a goat. Monty now gives you a choice: Stick with your original choice A or switch to C.
19 Bayesian probability formulation
Hypothesis space H: h_1 = car is behind door A; h_2 = car is behind door B; h_3 = car is behind door C
Data D: after you picked door A, Monty opened B to show a goat
Prior probability: P(h_1) = 1/3, P(h_2) = 1/3, P(h_3) = 1/3
Likelihood: P(D | h_1) = 1/2, P(D | h_2) = 0, P(D | h_3) = 1
What is P(h_1 | D)? What is P(h_2 | D)? What is P(h_3 | D)?
Marginal likelihood: P(D) = P(D | h_1) P(h_1) + P(D | h_2) P(h_2) + P(D | h_3) P(h_3) = 1/6 + 0 + 1/3 = 1/2
20 By Bayes' rule:
P(h_1 | D) = P(D | h_1) P(h_1) / P(D) = (1/2)(1/3)(2) = 1/3
P(h_2 | D) = P(D | h_2) P(h_2) / P(D) = (0)(1/3)(2) = 0
P(h_3 | D) = P(D | h_3) P(h_3) / P(D) = (1)(1/3)(2) = 2/3
So you should switch!
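The update above can be reproduced in a few lines of Python, using the priors and likelihoods from the slides (you picked A; Monty opened B):

```python
# Bayesian update for the Monty Hall problem.
priors = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}   # P(h): car behind each door
likelihood = {"A": 1 / 2, "B": 0.0, "C": 1.0}   # P(D | h): Monty opens B after you pick A

marginal = sum(likelihood[h] * priors[h] for h in priors)  # P(D) = 1/2
posterior = {h: likelihood[h] * priors[h] / marginal for h in priors}

print(posterior)  # A: 1/3, B: 0, C: 2/3 -- switching to C doubles your chance
```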
21 MAP (maximum a posteriori) Learning
Bayes' rule: P(h | D) = P(D | h) P(h) / P(D)
Goal of learning: find the maximum a posteriori hypothesis h_MAP:
h_MAP = argmax_{h ∈ H} P(h | D) = argmax_{h ∈ H} P(D | h) P(h) / P(D) = argmax_{h ∈ H} P(D | h) P(h)
because P(D) is a constant independent of h.
22 Note: If every h ∈ H is equally probable, then h_MAP = argmax_{h ∈ H} P(D | h). In that case h_MAP is called the maximum likelihood hypothesis.
23 A Medical Example Toby takes a test for leukemia. The test has two outcomes: positive and negative. It is known that if the patient has leukemia, the test is positive 98% of the time. If the patient does not have leukemia, the test is positive 3% of the time. It is also known that 0.8% of the population has leukemia. Toby's test is positive. Which is more likely: Toby has leukemia or Toby does not have leukemia?
24 Hypothesis space: h_1 = T. has leukemia; h_2 = T. does not have leukemia
Prior: 0.8% of the population has leukemia. Thus P(h_1) = 0.008, P(h_2) = 0.992
Likelihood: P(+ | h_1) = 0.98, P(− | h_1) = 0.02; P(+ | h_2) = 0.03, P(− | h_2) = 0.97
Posterior knowledge: the blood test is + for this patient.
25 In summary: P(h_1) = 0.008, P(h_2) = 0.992; P(+ | h_1) = 0.98, P(− | h_1) = 0.02; P(+ | h_2) = 0.03, P(− | h_2) = 0.97
Thus: h_MAP = argmax_{h ∈ H} P(D | h) P(h)
P(+ | leukemia) P(leukemia) = (0.98)(0.008) = 0.00784
P(+ | ¬leukemia) P(¬leukemia) = (0.03)(0.992) = 0.02976
h_MAP = ¬leukemia
26 What is P(leukemia | +)? P(h | D) = P(D | h) P(h) / P(D). So:
P(leukemia | +) = 0.00784 / (0.00784 + 0.02976) ≈ 0.21
P(¬leukemia | +) = 0.02976 / (0.00784 + 0.02976) ≈ 0.79
These are called the posterior probabilities.
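The same posterior computation in Python (numbers from the slides; variable names are ours):

```python
# Posterior probabilities for the leukemia-test example.
p_h1, p_h2 = 0.008, 0.992     # P(leukemia), P(no leukemia)
p_pos_h1, p_pos_h2 = 0.98, 0.03  # P(+ | h1), P(+ | h2)

p_pos = p_pos_h1 * p_h1 + p_pos_h2 * p_h2   # P(+) = 0.0376
post_leukemia = p_pos_h1 * p_h1 / p_pos     # ≈ 0.21
post_no_leukemia = p_pos_h2 * p_h2 / p_pos  # ≈ 0.79

print(round(post_leukemia, 2), round(post_no_leukemia, 2))
```

Note that even with a positive test, the MAP hypothesis is "no leukemia", because the prior P(leukemia) is so small.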
27 Bayesianism vs. Frequentism
Classical probability (frequentists): the probability of a particular event is defined relative to its frequency in a sample space of events. E.g., the probability that the coin will come up heads on the next trial is defined relative to the frequency of heads in a sample space of coin tosses.
Bayesian probability: combine the measure of prior belief you have in a proposition with your subsequent observations of events.
Example: a Bayesian can assign a probability to the statement "There was life on Mars a billion years ago" but a frequentist cannot.
28 Independence and Conditional Independence
Recall that two random variables, X and Y, are independent if P(X, Y) = P(X) P(Y).
Two random variables, X and Y, are conditionally independent given C if P(X, Y | C) = P(X | C) P(Y | C).
29 Naive Bayes Classifier
Let f(x) be a target function for classification: f(x) ∈ {+1, −1}. Let x = (x_1, x_2, ..., x_n).
We want to find the most probable class value, class_MAP, given the data x:
class_MAP = argmax_{class ∈ {+1,−1}} P(class | D) = argmax_{class ∈ {+1,−1}} P(class | x_1, x_2, ..., x_n)
30 By Bayes' theorem:
class_MAP = argmax_{class ∈ {+1,−1}} P(x_1, x_2, ..., x_n | class) P(class) / P(x_1, x_2, ..., x_n)
= argmax_{class ∈ {+1,−1}} P(x_1, x_2, ..., x_n | class) P(class)
P(class) can be estimated from the training data. How?
However, in general it is not practical to use the training data to estimate P(x_1, x_2, ..., x_n | class). Why not?
31 Naive Bayes classifier: assume
P(x_1, x_2, ..., x_n | class) = P(x_1 | class) P(x_2 | class) ··· P(x_n | class)
Is this a good assumption?
Given this assumption, here's how to classify an instance x = (x_1, x_2, ..., x_n):
Naive Bayes classifier: class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class)
To train: estimate the values of these various probabilities over the training set.
32 Training data:
Day Outlook Temp Humidity Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Test data:
D15 Sunny Cool High Strong ?
33 Use the training data to compute a probabilistic model:
P(Outlook = Sunny | Yes) = 2/9        P(Outlook = Sunny | No) = 3/5
P(Outlook = Overcast | Yes) = 4/9     P(Outlook = Overcast | No) = 0
P(Outlook = Rain | Yes) = 3/9         P(Outlook = Rain | No) = 2/5
P(Temperature = Hot | Yes) = 2/9      P(Temperature = Hot | No) = 2/5
P(Temperature = Mild | Yes) = 4/9     P(Temperature = Mild | No) = 2/5
P(Temperature = Cool | Yes) = 3/9     P(Temperature = Cool | No) = 1/5
P(Humidity = High | Yes) = 3/9        P(Humidity = High | No) = 4/5
P(Humidity = Normal | Yes) = 6/9      P(Humidity = Normal | No) = 1/5
P(Wind = Strong | Yes) = 3/9          P(Wind = Strong | No) = 3/5
P(Wind = Weak | Yes) = 6/9            P(Wind = Weak | No) = 2/5
Classify the test instance:
Day Outlook Temp Humidity Wind PlayTennis
D15 Sunny Cool High Strong ?
class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class)
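Plugging the table entries into the naive Bayes formula for D15 gives the classification directly (a quick Python check; `Fraction` is used only to keep the arithmetic exact):

```python
from fractions import Fraction as F

# Naive Bayes scores for D15 = (Sunny, Cool, High, Strong),
# using the conditional probabilities estimated from the 14 training days.
p_yes, p_no = F(9, 14), F(5, 14)   # class priors: 9 Yes days, 5 No days

score_yes = p_yes * F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9)  # Sunny, Cool, High, Strong | Yes
score_no = p_no * F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5)    # Sunny, Cool, High, Strong | No

class_nb = "Yes" if score_yes > score_no else "No"
print(float(score_yes), float(score_no), class_nb)  # No wins
```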
36 Estimating probabilities / Smoothing
Recap: In the previous example, we had a training set and a new example, (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong). We asked: What classification is given by a naive Bayes classifier?
Let n_c be the number of training instances with class c. Let n_{c, x_i=a_k} be the number of training instances with attribute value x_i = a_k and class c. Then:
P(x_i = a_k | c) = n_{c, x_i=a_k} / n_c
37 Problem with this method: if n_c is very small, it gives a poor estimate. E.g., P(Outlook = Overcast | No) = 0.
38 Now suppose we want to classify a new instance: (Outlook=Overcast, Temperature=Cool, Humidity=High, Wind=Strong). Then:
P(No) ∏_i P(x_i | No) = 0
This incorrectly gives us zero probability due to the small sample.
39 One solution: Laplace smoothing (also called add-one smoothing). For each class c and attribute x_i with value a_k, add one virtual instance. That is, for each class c, recalculate:
P(x_i = a_k | c) = (n_{c, x_i=a_k} + 1) / (n_c + K)
where K is the number of possible values of attribute x_i.
40 Training data:
Day Outlook Temp Humidity Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Laplace smoothing: add the following virtual instances for Outlook:
Outlook=Sunny: Yes, Outlook=Overcast: Yes, Outlook=Rain: Yes, Outlook=Sunny: No, Outlook=Overcast: No, Outlook=Rain: No
P(Outlook = Overcast | No): 0/5 → (n_{c, x_i=a_k} + 1) / (n_c + K) = (0 + 1) / (5 + 3) = 1/8
P(Outlook = Overcast | Yes): 4/9 → (4 + 1) / (9 + 3) = 5/12
41 P(Outlook = Sunny | Yes) = 2/9 → 3/12      P(Outlook = Sunny | No) = 3/5 → 4/8
P(Outlook = Overcast | Yes) = 4/9 → 5/12    P(Outlook = Overcast | No) = 0/5 → 1/8
P(Outlook = Rain | Yes) = 3/9 → 4/12        P(Outlook = Rain | No) = 2/5 → 3/8
Etc.
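The smoothed estimates above can be checked with a small helper (the `laplace` function name is ours, just for this example):

```python
def laplace(n_match, n_class, k):
    """Add-one estimate (n_match + 1) / (n_class + k), where k is the
    number of possible values of the attribute."""
    return (n_match + 1) / (n_class + k)

# Outlook has K = 3 values; counts come from the PlayTennis training set.
print(laplace(0, 5, 3))  # P(Overcast | No):  1/8  = 0.125
print(laplace(4, 9, 3))  # P(Overcast | Yes): 5/12 ≈ 0.417
print(laplace(3, 5, 3))  # P(Sunny | No):     4/8  = 0.5
```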
42 In-class exercise
1. Recall the naive Bayes classifier:
class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class)
Consider this training set, in which each instance has four binary features and a binary class:
[training table: four POS instances and three NEG instances, each with binary values for x1–x4]
(a) Create a probabilistic model that you could use to classify new instances. That is, calculate P(class) and P(x_i | class) for each class. No smoothing is needed (yet).
(b) Use your probabilistic model to determine class_NB for the following new instance:
[one unlabeled instance with values for x1–x4]
2. Recall the formula for Laplace smoothing:
P(x_i = a_k | c) = (n_{c, x_i=a_k} + 1) / (n_c + K)
where K is the number of possible values of attribute x_i.
(a) Apply Laplace smoothing to all the probabilities from the training set in question 1.
(b) Use the smoothed probabilities to determine class_NB for the following new instances:
[two unlabeled instances with values for x1–x4]
43 Naive Bayes on continuous-valued attributes How to deal with continuous-valued attributes? Two possible solutions: Discretize. Assume a particular probability distribution of classes over values (estimate its parameters from training data).
44 Discretization: Equal-Width Binning For each attribute x_i, create k equal-width bins in the interval from min(x_i) to max(x_i). The discrete attribute values are now the bins. Questions: What should k be? What if some bins have very few instances? There is a tradeoff between discretization bias and variance: the more bins, the lower the bias, but the higher the variance, due to small sample size.
45 Discretization: Equal-Frequency Binning For each attribute x_i, create k bins so that each bin contains an equal number of values. This also has problems: What should k be? It hides outliers. It can group together instances that are far apart.
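Both discretization schemes can be sketched in a few lines of Python (the helper names and the toy values are ours, for illustration):

```python
def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Clamp so the maximum lands in the last bin rather than bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding (roughly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    per_bin = len(values) / k
    for rank, i in enumerate(order):
        bins[i] = min(int(rank / per_bin), k - 1)
    return bins

vals = [1.0, 2.0, 2.5, 3.0, 9.0, 10.0]
print(equal_width_bins(vals, 3))      # [0, 0, 0, 0, 2, 2]: the outliers dominate the width
print(equal_frequency_bins(vals, 3))  # [0, 0, 1, 1, 2, 2]: each bin gets two values
```

The toy data shows both failure modes from the slides: equal-width binning leaves the middle bin empty because of the outliers, while equal-frequency binning groups 3.0 with 2.5 and hides how far apart 3.0 and 9.0 are.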
46 Gaussian Naïve Bayes Assume that within each class c, the values of each numeric feature x_i are normally distributed:
P(x_i | c) = N(x_i; μ_{i,c}, σ_{i,c}) = (1 / (σ_{i,c} √(2π))) exp(−(x_i − μ_{i,c})² / (2σ_{i,c}²))
where μ_{i,c} is the mean of feature i given the class c, and σ_{i,c} is the standard deviation of feature i given the class c. We estimate μ_{i,c} and σ_{i,c} from training data.
47 Example x 1 x 2 Class POS POS POS NEG NEG NEG
48 Example x 1 x 2 Class POS POS POS NEG NEG NEG P(POS) = 0.5 P(NEG) = 0.5
49 N_{1,POS} = N(x; μ=4.8, σ=1.8)   N_{2,POS} = N(x; μ=7.1, σ=2.0)
N_{1,NEG} = N(x; μ=4.7, σ=2.5)   N_{2,NEG} = N(x; μ=4.2, σ=3.7)
50 Now, suppose you have a new example x, with x_1 = 5.2, x_2 = 6.3. What is class_NB(x)?
class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class)
Note: N is a probability density function, not a probability, but it can be used analogously to a probability in naive Bayes calculations.
53 Positive: P(POS) P(x_1 | POS) P(x_2 | POS) = (.5)(.22)(.18) ≈ .02
Negative: P(NEG) P(x_1 | NEG) P(x_2 | NEG) = (.5)(.16)(.09) ≈ .0072
class_NB(x) = POS
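The densities .22, .18, .16, and .09 and the two scores can be verified by evaluating the normal pdf directly (a Python sketch; the helper name is ours):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Class-conditional densities estimated above; new example x = (5.2, 6.3).
x1, x2 = 5.2, 6.3
score_pos = 0.5 * normal_pdf(x1, 4.8, 1.8) * normal_pdf(x2, 7.1, 2.0)
score_neg = 0.5 * normal_pdf(x1, 4.7, 2.5) * normal_pdf(x2, 4.2, 3.7)

print(round(score_pos, 3), round(score_neg, 4))  # ≈ 0.02 vs ≈ 0.0072
print("POS" if score_pos > score_neg else "NEG")
```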
54 Use logarithms to avoid underflow:
class_NB(x) = argmax_{class ∈ {+1,−1}} P(class) ∏_i P(x_i | class)
= argmax_{class ∈ {+1,−1}} log(P(class) ∏_i P(x_i | class))
= argmax_{class ∈ {+1,−1}} [log P(class) + Σ_i log P(x_i | class)]
55 Bayes Nets
56 Another example A patient comes into a doctor's office with a bad cough and a high fever.
Hypothesis space H: h_1: patient has flu; h_2: patient does not have flu
Data D: coughing = true, fever = true
Prior probabilities: P(h_1) = .1, P(h_2) = .9
Likelihoods: P(D | h_1) = .8, P(D | h_2) = .4
Probability of the data: P(D) = P(D | h_1) P(h_1) + P(D | h_2) P(h_2) = (.8)(.1) + (.4)(.9) = .44
Posterior probabilities: P(h_1 | D) = .08/.44 ≈ .18, P(h_2 | D) = .36/.44 ≈ .82
57 Let's say we have the following random variables: cough, fever, flu, smokes
58 Full joint probability distribution (the sum of all 16 entries is 1):
smokes:          cough            ¬cough
                 fever   ¬fever   fever   ¬fever
         flu     p1      p2       p3      p4
         ¬flu    p5      p6       p7      p8
¬smokes:         cough            ¬cough
                 fever   ¬fever   fever   ¬fever
         flu     p9      p10      p11     p12
         ¬flu    p13     p14      p15     p16
In principle, the full joint distribution can be used to answer any question about probabilities of these combined variables. However, the size of the full joint distribution scales exponentially with the number of variables, so it is expensive to store and to compute with.
59 Bayesian networks The idea is to represent the dependencies (or causal relations) among the variables so that space and computation-time requirements are minimized. [Graph: smokes → cough ← flu → fever] Also known as graphical models.
60 Conditional probability tables for each node:
P(smoke): true 0.2, false 0.8
P(flu): true 0.01, false 0.99
P(cough | flu, smoke):
flu=true,  smoke=true  → cough=true .95, cough=false .05
flu=true,  smoke=false → cough=true .8,  cough=false .2
flu=false, smoke=true  → cough=true .6,  cough=false .4
flu=false, smoke=false → cough=true –,   cough=false –
P(fever | flu):
flu=true  → fever=true .9, fever=false .1
flu=false → fever=true .2, fever=false .8
61 Semantics of Bayesian networks If the network is correct, we can calculate the full joint probability distribution from the network:
P((X_1 = x_1) ∧ (X_2 = x_2) ∧ ... ∧ (X_n = x_n)) = ∏_{i=1}^{n} P(X_i = x_i | parents(X_i))
where parents(X_i) denotes the specific values of the parents of X_i.
62 Example Calculate P[(cough = t) ∧ (fever = f) ∧ (flu = f) ∧ (smoke = f)]
64 Different types of inference in Bayesian networks
Causal inference: the evidence is a cause; the inference is the probability of an effect.
Example: Instantiate the evidence flu = true. What is P(fever | flu)?
P(fever | flu) = .9 (up from .207)
65 Diagnostic inference: the evidence is an effect; the inference is the probability of a cause.
Example: Instantiate the evidence fever = true. What is P(flu | fever)?
P(flu | fever) = P(fever | flu) P(flu) / P(fever) = (.9)(.01) / .207 ≈ .043 (up from .01)
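The diagnostic computation above takes only a couple of lines of Python (numbers from the slide; variable names are ours):

```python
# Diagnostic inference in the flu network: P(flu | fever) via Bayes' rule.
p_flu = 0.01
p_fever_given_flu = 0.9
p_fever = 0.207  # marginal probability of fever in this network

p_flu_given_fever = p_fever_given_flu * p_flu / p_fever
print(round(p_flu_given_fever, 3))  # ≈ 0.043, up from the prior 0.01
```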
66 Example: What is P(flu | cough)?
P(flu | cough) = P(cough | flu) P(flu) / P(cough)
= [P(cough | flu, smoke) P(smoke) + P(cough | flu, ¬smoke) P(¬smoke)] P(flu) / P(cough)
= [(.95)(.2) + (.8)(.8)](.01) / P(cough)
67 Inter-causal inference: explaining away different possible causes of an effect.
Example: What is P(flu | cough, smoke)?
P(flu | cough, smoke) = P(flu ∧ cough ∧ smoke) / P(cough ∧ smoke)
= P(cough | flu, smoke) P(flu) P(smoke) / [P(cough | flu, smoke) P(flu) P(smoke) + P(cough | ¬flu, smoke) P(¬flu) P(smoke)]
= (.95)(.01)(.2) / [(.95)(.01)(.2) + (.6)(.99)(.2)] ≈ .016
Why is P(flu | cough, smoke) < P(flu | cough)?
68 Complexity of Bayesian Networks For n random Boolean variables: the full joint probability distribution has 2^n entries. In a Bayesian network with at most k parents per node, each conditional probability table has at most 2^k entries, so the entire network has at most n·2^k entries.
69 What are the advantages of Bayesian networks? Intuitive, concise representation of joint probability distribution (i.e., conditional dependencies) of a set of random variables. Represents beliefs and knowledge about a particular class of situations. Efficient (?) (approximate) inference algorithms Efficient, effective learning algorithms
70 Issues in Bayesian Networks Building / learning network topology Assigning / learning conditional probability tables Approximate inference via sampling
71 Real-World Example: The Lumière Project at Microsoft Research A Bayesian network approach to answering user queries about Microsoft Office. "At the time we initiated our project in Bayesian information retrieval, managers in the Office division were finding that users were having difficulty finding assistance efficiently. As an example, users working with the Excel spreadsheet might have required assistance with formatting a graph. Unfortunately, Excel has no knowledge about the common term, 'graph,' and only considered in its keyword indexing the term 'chart.'"
72
73 Networks were developed by experts from user modeling studies.
74 Offspring of project was Office Assistant in Office 97.
75 [Naive Bayes model: flu → smoke, fever, cough, headache, nausea]
P(flu ∧ smoke ∧ fever ∧ cough ∧ nausea ∧ headache) = P(flu) P(smoke | flu) P(fever | flu) P(cough | flu) P(nausea | flu) P(headache | flu)
76 [Naive Bayes model: flu → smoke, fever, cough, headache, nausea]
Naive Bayes, more generally, for classification:
P(C = c_j | X_1 = x_1, ..., X_n = x_n) ∝ P(C = c_j) ∏_i P(X_i = x_i | C = c_j)
77 Learning network topology Many different approaches, including: Heuristic search, with evaluation based on information theory measures Genetic algorithms Using meta Bayesian networks!
78 Learning conditional probabilities In general, random variables are not binary but real-valued, so conditional probability tables become conditional probability distributions. Estimate the parameters of these distributions from data.
79 Approximate inference via sampling Recall: we can calculate the full joint probability distribution from the network:
P(X_1, ..., X_d) = ∏_{i=1}^{d} P(X_i | parents(X_i))
where parents(X_i) denotes the specific values of the parents of X_i. We can do diagnostic, causal, and inter-causal inference. But if there are a lot of nodes in the network, this can be very slow! We need efficient algorithms to do approximate calculations.
80 A Précis of Sampling Algorithms: Gibbs Sampling
Suppose that we want to sample from p(x | θ), x ∈ R^D.
Basic idea: sample sequentially from the full conditionals.
Initialize: x_i, i = 1, ..., D
For t = 1, ..., T:
  sample x_1^{(t+1)} ~ p(x_1 | x_{(−1)}^{(t)})
  sample x_2^{(t+1)} ~ p(x_2 | x_{(−2)}^{(t+1)})
  ...
  sample x_D^{(t+1)} ~ p(x_D | x_{(−D)}^{(t+1)})
Complications: (i) how to order the updates (can be random, but be careful); (ii) we need the full conditionals (can use an approximation).
Under nice assumptions (ergodicity), we get samples x ~ p(x).
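The loop above can be sketched in Python for a toy target, a bivariate normal with correlation ρ, whose full conditionals are themselves normal; the target distribution and the `gibbs` helper are illustrative choices, not from the slides:

```python
import random

def gibbs(rho, n_samples, burn_in=500):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each full conditional x_i | x_j is N(rho * x_j, 1 - rho^2)."""
    x1, x2 = 0.0, 0.0
    sd = (1 - rho ** 2) ** 0.5
    samples = []
    for t in range(burn_in + n_samples):
        x1 = random.gauss(rho * x2, sd)  # sample x1 | x2
        x2 = random.gauss(rho * x1, sd)  # sample x2 | x1
        if t >= burn_in:                 # discard the burn-in portion
            samples.append((x1, x2))
    return samples

random.seed(0)
s = gibbs(rho=0.8, n_samples=20000)
mean_x1 = sum(a for a, _ in s) / len(s)
print(round(mean_x1, 2))  # close to the true mean, 0
```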
81 A Précis of Sampling Algorithms
Sampling algorithms more generally: How do we perform posterior inference more generally? Oftentimes we rely on strong parametric assumptions (e.g. conjugacy, exponential-family structures). Monte Carlo approximation/inference can get around this.
Basic idea: (1) Draw samples x^s ~ p(x | θ). (2) Compute the quantity of interest, e.g. (marginals): E[x_1] ≈ (1/S) Σ_{s=1}^{S} x_1^{(s)}, etc.
In general: E[f(x)] = ∫ f(x) p(x) dx ≈ (1/S) Σ_{s=1}^{S} f(x^s)
82 A Précis of Sampling Algorithms
Sampling algorithms more generally: the CDF method (an MC technique)
Steps: (1) sample u ~ U(0, 1); (2) return x = F⁻¹(u), which is distributed according to F.
83 A Précis of Sampling Algorithms
Sampling algorithms more generally: rejection sampling (an MC technique)
One can show that the accepted draws satisfy x ~ p(x).
Issues: you need a good proposal q(x) and constant c, and the rejection rate can grow astronomically!
Pros of MC sampling: samples are independent. Cons: very inefficient in high dimensions. Alternatively, one can use MCMC methods.
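A minimal sketch of rejection sampling in Python, using an illustrative target (a standard normal truncated to [0, 4]) and a uniform proposal; the target, proposal q, and envelope constant c are assumptions for this example, not from the slides:

```python
import math
import random

def target(x):
    """Unnormalized density of a standard normal truncated to [0, 4]."""
    return math.exp(-x * x / 2)

C = 4.0  # envelope constant: target(x) <= 1 = C * q(x), with q(x) = 1/4 on [0, 4]

def rejection_sample():
    """Draw one sample from the target by accept/reject against C * q."""
    while True:
        x = random.uniform(0, 4)          # draw from the proposal q
        u = random.uniform(0, 1)
        if u <= target(x) / (C * 0.25):   # accept with probability p(x) / (C q(x))
            return x

random.seed(1)
draws = [rejection_sample() for _ in range(5000)]
print(round(sum(draws) / len(draws), 2))  # near sqrt(2/pi) ≈ 0.8, the half-normal mean
```

Note the cost: with this envelope, roughly 1 − √(π/2)/4 ≈ 69% of proposals are rejected, which is exactly the inefficiency the slide warns about.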
84 Markov Chain Monte Carlo Sampling One of the most common methods used in real applications. Recall that, by construction of a Bayesian network, a node is conditionally independent of its non-descendants, given its parents. Also recall that a node can be conditionally dependent on its children and on the other parents of its children. (Why?) Definition: The Markov blanket of a variable X_i is X_i's parents, children, and children's other parents.
85 Example What is the Markov blanket of cough? of flu? smokes flu cough fever
86 Theorem: A node X i is conditionally independent of all other nodes in the network, given its Markov blanket.
87 Markov Chain Monte Carlo (MCMC) Sampling Start with random sample from variables: (x 1,..., x n ). This is the current state of the algorithm. Next state: Randomly sample value for one non-evidence variable X i, conditioned on current values in Markov Blanket of X i.
88 Example Query: What is P(cough | smoke)?
MCMC: start with a random sample, with the evidence variables fixed:
flu = true, smoke = true, fever = false, cough = true
Repeat:
1. Sample flu probabilistically, given the current values of its Markov blanket: smoke = true, fever = false, cough = true. Suppose the result is false. New state:
flu = false, smoke = true, fever = false, cough = true
89 2. Sample cough, given the current values of its Markov blanket: smoke = true, flu = false. Suppose the result is true. New state:
flu = false, smoke = true, fever = false, cough = true
3. Sample fever, given the current values of its Markov blanket: flu = false. Suppose the result is true. New state:
flu = false, smoke = true, fever = true, cough = true
90 Each sample contributes to the estimate for the query P(cough | smoke). Suppose we perform 100 such samples, 20 with cough = true and 80 with cough = false. Then the answer to the query is P(cough | smoke) = .20. Theorem: MCMC settles into a behavior in which each state is sampled exactly according to its posterior probability, given the evidence.
91 Applying Bayesian Reasoning to Speech Recognition Task: Identify sequence of words uttered by speaker, given acoustic signal. Uncertainty introduced by noise, speaker error, variation in pronunciation, homonyms, etc. Thus speech recognition is viewed as problem of probabilistic inference.
92 So far, we've looked at probabilistic reasoning in static environments. Speech: a time sequence of static environments. Let X be the state variables (i.e., the set of non-evidence variables) describing the environment (e.g., the words said during time step t). Let E be the set of evidence variables (e.g., S = features of the acoustic signal).
93 The E values and the joint probability distribution over X change over time: t_1: X_1, e_1; t_2: X_2, e_2; etc.
94 At each t, we want to compute P(Words | S). We know from Bayes' rule: P(Words | S) ∝ P(S | Words) P(Words). P(S | Words), for all words, is a previously learned acoustic model: e.g., for each word, a probability distribution over phones, and for each phone, a probability distribution over acoustic signals (which can vary in pitch, speed, and volume). P(Words), for all words, is the language model, which specifies the prior probability of each utterance: e.g., a bigram model gives the probability of each word following each other word.
95 Speech recognition typically makes three assumptions: 1. The process underlying change is itself stationary, i.e., the state transition probabilities don't change. 2. The current state X depends on only a finite history of previous states (the Markov assumption). A Markov process of order n: the current state depends only on the n previous states. 3. The values e_t of the evidence variables depend only on the current state X_t (the sensor model).
96
97
98 Hidden Markov Models Markov model: given state X_t, what is the probability of transitioning to the next state X_{t+1}? E.g., word bigram probabilities give P(word_{t+1} | word_t). Hidden Markov model: there are observable states (e.g., the signal S) and hidden states (e.g., Words). An HMM represents the probabilities of the hidden states given the observable states.
99
100
101 Example: "I'm firsty, um, can I have something to dwink?"
102
103 Graphical Models and Computer Vision
More informationUVA CS / Introduc8on to Machine Learning and Data Mining
UVA CS 4501-001 / 6501 007 Introduc8on to Machine Learning and Data Mining Lecture 13: Probability and Sta3s3cs Review (cont.) + Naïve Bayes Classifier Yanjun Qi / Jane, PhD University of Virginia Department
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationIntroduction to Bayes Nets. CS 486/686: Introduction to Artificial Intelligence Fall 2013
Introduction to Bayes Nets CS 486/686: Introduction to Artificial Intelligence Fall 2013 1 Introduction Review probabilistic inference, independence and conditional independence Bayesian Networks - - What
More informationBayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction
15-0: Learning vs. Deduction Artificial Intelligence Programming Bayesian Learning Chris Brooks Department of Computer Science University of San Francisco So far, we ve seen two types of reasoning: Deductive
More informationProbabilistic Machine Learning
Probabilistic Machine Learning Bayesian Nets, MCMC, and more Marek Petrik 4/18/2017 Based on: P. Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. Chapter 10. Conditional Independence Independent
More informationLecture 10: Introduction to reasoning under uncertainty. Uncertainty
Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationBuilding Bayesian Networks. Lecture3: Building BN p.1
Building Bayesian Networks Lecture3: Building BN p.1 The focus today... Problem solving by Bayesian networks Designing Bayesian networks Qualitative part (structure) Quantitative part (probability assessment)
More informationBayesian Learning. Examples. Conditional Probability. Two Roles for Bayesian Methods. Prior Probability and Random Variables. The Chain Rule P (B)
Examples My mood can take 2 possible values: happy, sad. The weather can take 3 possible vales: sunny, rainy, cloudy My friends know me pretty well and say that: P(Mood=happy Weather=rainy) = 0.25 P(Mood=happy
More informationConsider an experiment that may have different outcomes. We are interested to know what is the probability of a particular set of outcomes.
CMSC 310 Artificial Intelligence Probabilistic Reasoning and Bayesian Belief Networks Probabilities, Random Variables, Probability Distribution, Conditional Probability, Joint Distributions, Bayes Theorem
More informationBayes Networks. CS540 Bryan R Gibson University of Wisconsin-Madison. Slides adapted from those used by Prof. Jerry Zhu, CS540-1
Bayes Networks CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 59 Outline Joint Probability: great for inference, terrible to obtain
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationIntroduction to Artificial Intelligence. Unit # 11
Introduction to Artificial Intelligence Unit # 11 1 Course Outline Overview of Artificial Intelligence State Space Representation Search Techniques Machine Learning Logic Probabilistic Reasoning/Bayesian
More informationA.I. in health informatics lecture 3 clinical reasoning & probabilistic inference, II *
A.I. in health informatics lecture 3 clinical reasoning & probabilistic inference, II * kevin small & byron wallace * Slides borrow heavily from Andrew Moore, Weng- Keen Wong and Longin Jan Latecki today
More informationCS 188: Artificial Intelligence. Bayes Nets
CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationan introduction to bayesian inference
with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena
More informationIntroduction to Artificial Intelligence (AI)
Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 9 Oct, 11, 2011 Slide credit Approx. Inference : S. Thrun, P, Norvig, D. Klein CPSC 502, Lecture 9 Slide 1 Today Oct 11 Bayesian
More informationMachine Learning for Data Science (CS4786) Lecture 24
Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each
More informationCOS402- Artificial Intelligence Fall Lecture 10: Bayesian Networks & Exact Inference
COS402- Artificial Intelligence Fall 2015 Lecture 10: Bayesian Networks & Exact Inference Outline Logical inference and probabilistic inference Independence and conditional independence Bayes Nets Semantics
More informationOutline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012
CSE 573: Artificial Intelligence Autumn 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 Outline
More informationProbability. CS 3793/5233 Artificial Intelligence Probability 1
CS 3793/5233 Artificial Intelligence 1 Motivation Motivation Random Variables Semantics Dice Example Joint Dist. Ex. Axioms Agents don t have complete knowledge about the world. Agents need to make decisions
More informationLecture 8: Bayesian Networks
Lecture 8: Bayesian Networks Bayesian Networks Inference in Bayesian Networks COMP-652 and ECSE 608, Lecture 8 - January 31, 2017 1 Bayes nets P(E) E=1 E=0 0.005 0.995 E B P(B) B=1 B=0 0.01 0.99 E=0 E=1
More informationSoft Computing. Lecture Notes on Machine Learning. Matteo Mattecci.
Soft Computing Lecture Notes on Machine Learning Matteo Mattecci matteucci@elet.polimi.it Department of Electronics and Information Politecnico di Milano Matteo Matteucci c Lecture Notes on Machine Learning
More informationSampling from Bayes Nets
from Bayes Nets http://www.youtube.com/watch?v=mvrtaljp8dm http://www.youtube.com/watch?v=geqip_0vjec Paper reviews Should be useful feedback for the authors A critique of the paper No paper is perfect!
More informationImplementing Machine Reasoning using Bayesian Network in Big Data Analytics
Implementing Machine Reasoning using Bayesian Network in Big Data Analytics Steve Cheng, Ph.D. Guest Speaker for EECS 6893 Big Data Analytics Columbia University October 26, 2017 Outline Introduction Probability
More informationMining Classification Knowledge
Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology COST Doctoral School, Troina 2008 Outline 1. Bayesian classification
More informationDirected and Undirected Graphical Models
Directed and Undirected Graphical Models Adrian Weller MLSALT4 Lecture Feb 26, 2016 With thanks to David Sontag (NYU) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,
More informationProbabilistic Classification
Bayesian Networks Probabilistic Classification Goal: Gather Labeled Training Data Build/Learn a Probability Model Use the model to infer class labels for unlabeled data points Example: Spam Filtering...
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,
More informationStochastic inference in Bayesian networks, Markov chain Monte Carlo methods
Stochastic inference in Bayesian networks, Markov chain Monte Carlo methods AI: Stochastic inference in BNs AI: Stochastic inference in BNs 1 Outline ypes of inference in (causal) BNs Hardness of exact
More informationCourse Introduction. Probabilistic Modelling and Reasoning. Relationships between courses. Dealing with Uncertainty. Chris Williams.
Course Introduction Probabilistic Modelling and Reasoning Chris Williams School of Informatics, University of Edinburgh September 2008 Welcome Administration Handout Books Assignments Tutorials Course
More informationArtificial Intelligence. Topic
Artificial Intelligence Topic What is decision tree? A tree where each branching node represents a choice between two or more alternatives, with every branching node being part of a path to a leaf node
More informationBayesian Learning. Two Roles for Bayesian Methods. Bayes Theorem. Choosing Hypotheses
Bayesian Learning Two Roles for Bayesian Methods Probabilistic approach to inference. Quantities of interest are governed by prob. dist. and optimal decisions can be made by reasoning about these prob.
More informationProbability Based Learning
Probability Based Learning Lecture 7, DD2431 Machine Learning J. Sullivan, A. Maki September 2013 Advantages of Probability Based Methods Work with sparse training data. More powerful than deterministic
More informationCS 188: Artificial Intelligence. Our Status in CS188
CS 188: Artificial Intelligence Probability Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. 1 Our Status in CS188 We re done with Part I Search and Planning! Part II: Probabilistic Reasoning
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Classification: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu September 21, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationHidden Markov Models. Vibhav Gogate The University of Texas at Dallas
Hidden Markov Models Vibhav Gogate The University of Texas at Dallas Intro to AI (CS 4365) Many slides over the course adapted from either Dan Klein, Luke Zettlemoyer, Stuart Russell or Andrew Moore 1
More informationProbabilistic Reasoning. (Mostly using Bayesian Networks)
Probabilistic Reasoning (Mostly using Bayesian Networks) Introduction: Why probabilistic reasoning? The world is not deterministic. (Usually because information is limited.) Ways of coping with uncertainty
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationBayesian Networks. Motivation
Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations
More information10-701/ Machine Learning: Assignment 1
10-701/15-781 Machine Learning: Assignment 1 The assignment is due September 27, 2005 at the beginning of class. Write your name in the top right-hand corner of each page submitted. No paperclips, folders,
More informationDiscrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 10
EECS 70 Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 10 Introduction to Basic Discrete Probability In the last note we considered the probabilistic experiment where we flipped
More informationAnnouncements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic
CS 188: Artificial Intelligence Fall 2008 Lecture 16: Bayes Nets III 10/23/2008 Announcements Midterms graded, up on glookup, back Tuesday W4 also graded, back in sections / box Past homeworks in return
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More information27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling
10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel
More informationConfusion matrix. a = true positives b = false negatives c = false positives d = true negatives 1. F-measure combines Recall and Precision:
Confusion matrix classifier-determined positive label classifier-determined negative label true positive a b label true negative c d label Accuracy = (a+d)/(a+b+c+d) a = true positives b = false negatives
More informationHMM part 1. Dr Philip Jackson
Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. HMM part 1 Dr Philip Jackson Probability fundamentals Markov models State topology diagrams Hidden Markov models -
More informationCS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning
CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationLogistics. Naïve Bayes & Expectation Maximization. 573 Schedule. Coming Soon. Estimation Models. Topics
Logistics Naïve Bayes & Expectation Maximization CSE 7 eam Meetings Midterm Open book, notes Studying See AIMA exercises Daniel S. Weld Daniel S. Weld 7 Schedule Selected opics Coming Soon Selected opics
More informationCSEP 573: Artificial Intelligence
CSEP 573: Artificial Intelligence Hidden Markov Models Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart Russell, Andrew Moore, Ali Farhadi, or Dan Weld 1 Outline Probabilistic
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11 Oct, 3, 2016 CPSC 422, Lecture 11 Slide 1 422 big picture: Where are we? Query Planning Deterministic Logics First Order Logics Ontologies
More informationUndirected Graphical Models
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional
More informationBayes Nets: Sampling
Bayes Nets: Sampling [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.] Approximate Inference:
More informationAST 418/518 Instrumentation and Statistics
AST 418/518 Instrumentation and Statistics Class Website: http://ircamera.as.arizona.edu/astr_518 Class Texts: Practical Statistics for Astronomers, J.V. Wall, and C.R. Jenkins Measuring the Universe,
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More informationCSE 473: Artificial Intelligence Autumn Topics
CSE 473: Artificial Intelligence Autumn 2014 Bayesian Networks Learning II Dan Weld Slides adapted from Jack Breese, Dan Klein, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 473 Topics
More informationNaïve Bayes Classifiers and Logistic Regression. Doug Downey Northwestern EECS 349 Winter 2014
Naïve Bayes Classifiers and Logistic Regression Doug Downey Northwestern EECS 349 Winter 2014 Naïve Bayes Classifiers Combines all ideas we ve covered Conditional Independence Bayes Rule Statistical Estimation
More information1 Probabilities. 1.1 Basics 1 PROBABILITIES
1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability
More informationCSE 473: Artificial Intelligence Probability Review à Markov Models. Outline
CSE 473: Artificial Intelligence Probability Review à Markov Models Daniel Weld University of Washington [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationT Machine Learning: Basic Principles
Machine Learning: Basic Principles Bayesian Networks Laboratory of Computer and Information Science (CIS) Department of Computer Science and Engineering Helsinki University of Technology (TKK) Autumn 2007
More informationTopics. Bayesian Learning. What is Bayesian Learning? Objectives for Bayesian Learning
Topics Bayesian Learning Sattiraju Prabhakar CS898O: ML Wichita State University Objectives for Bayesian Learning Bayes Theorem and MAP Bayes Optimal Classifier Naïve Bayes Classifier An Example Classifying
More informationAnswers and expectations
Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E
More informationDirected Graphical Models or Bayesian Networks
Directed Graphical Models or Bayesian Networks Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Bayesian Networks One of the most exciting recent advancements in statistical AI Compact
More informationComputer Vision Group Prof. Daniel Cremers. 14. Sampling Methods
Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationCOMP61011! Probabilistic Classifiers! Part 1, Bayes Theorem!
COMP61011 Probabilistic Classifiers Part 1, Bayes Theorem Reverend Thomas Bayes, 1702-1761 p ( T W ) W T ) T ) W ) Bayes Theorem forms the backbone of the past 20 years of ML research into probabilistic
More informationPROBABILITY AND INFERENCE
PROBABILITY AND INFERENCE Progress Report We ve finished Part I: Problem Solving! Part II: Reasoning with uncertainty Part III: Machine Learning 1 Today Random variables and probabilities Joint, marginal,
More informationEE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS
EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 16, 6/1/2005 University of Washington, Department of Electrical Engineering Spring 2005 Instructor: Professor Jeff A. Bilmes Uncertainty & Bayesian Networks
More information