# CS 188: Artificial Intelligence. Bayes Nets

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Representation Bayes Nets Conditional Independences Probabilistic Inference Enumeration (exact, exponential complexity) Variable elimination (exact, worst-case exponential complexity, often better) Probabilistic inference is NP-complete Sampling (approximate) Learning Bayes Nets from Data 2 1

2 Inference Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: B A E Most likely explanation: J M 4 Inference by Enumeration Given unlimited time, inference in BNs is easy Recipe: State the marginal probabilities you need Figure out ALL the atomic probabilities you need Calculate and combine them Example: B E A J M 5 2

3 Example: Enumeration In this simple method, we only need the BN to synthesize the joint entries 6 Inference by Enumeration? 7 3

4 Variable Elimination Why is inference by enumeration so slow? You join up the whole joint distribution before you sum out the hidden variables Idea: interleave joining and marginalizing! Called Variable Elimination Still NP-hard, but usually much faster than inference by enumeration We ll need some new notation to define VE 8 Factor Zoo I Joint distribution: P(X,Y) Entries P(x,y) for all x, y Sums to 1 T W P hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 Selected joint: P(x,Y) A slice of the joint distribution Entries P(x,y) for fixed x, all y Sums to P(x) T W P cold sun 0.2 cold rain 0.3 Number of capitals = dimensionality of the table 9 4

5 Factor Zoo II Family of conditionals: P(X Y) Multiple conditionals Entries P(x y) for all x, y Sums to Y T W P hot sun 0.8 hot rain 0.2 cold sun 0.4 cold rain 0.6 Single conditional: P(Y x) Entries P(y x) for fixed x, all y Sums to 1 T W P cold sun 0.4 cold rain Factor Zoo III Specified family: P(y X) Entries P(y x) for fixed y, but for all x Sums to who knows! T W P hot rain 0.2 cold rain 0.6 In general, when we write P(Y 1 Y N X 1 X M ) It is a factor, a multi-dimensional array Its values are all P(y 1 y N x 1 x M ) Any assigned X or Y is a dimension missing (selected) from the array 11 5

6 Example: Traffic Domain Random Variables R: Raining T: Traffic L: Late for class! R +r r 0.9 T L +r +t 0.8 +r - t r +t r - t 0.9 +t +l 0.3 +t - l t +l t - l Variable Elimination Outline Track objects called factors Initial factors are local CPTs (one per node) +r r 0.9 Any known values are selected E.g. if we know +r +t 0.8 +r - t r +t r - t 0.9 +t +l 0.3 +t - l t +l t - l 0.9, the initial factors are +r r 0.9 +r +t 0.8 +r - t r +t r - t 0.9 +t +l t +l 0.1 VE: Alternately join factors and eliminate variables 13 6

7 Operation 1: Join Factors First basic operation: joining factors Combining factors: Just like a database join Get all factors over the joining variable Build a new factor over the union of the variables involved Example: Join on R R T +r r 0.9 +r +t 0.8 +r - t r +t r - t 0.9 +r +t r - t r +t r - t 0.81 R,T Computation for each entry: pointwise products 14 Example: Multiple Joins R T L +r r 0.9 +r +t 0.8 +r - t r +t r - t 0.9 Join R +r +t r - t r +t r - t 0.81 R, T L +t +l 0.3 +t - l t +l t - l 0.9 +t +l 0.3 +t - l t +l t - l

8 Example: Multiple Joins R, T, L R, T L +r +t r - t r +t r - t t +l 0.3 +t - l t +l t - l 0.9 Join T +r +t +l r +t - l r - t +l r - t - l r +t +l r +t - l r - t +l r - t - l Operation 2: Eliminate Second basic operation: marginalization Take a factor and sum out a variable Shrinks a factor to a smaller one A projection operation Example: +r +t r - t r +t r - t t t

9 Multiple Elimination R, T, L T, L L +r +t +l r +t - l r - t +l r - t - l r +t +l r +t - l r - t +l r - t - l Sum out R +t +l t - l t +l t - l Sum out T +l l P(L) : Marginalizing Early! +r r 0.9 Join R Sum out R R T +r +t 0.8 +r - t r +t r - t 0.9 +r +t r - t r +t r - t 0.81 R, T +t t 0.83 T L +t +l 0.3 +t - l t +l t - l 0.9 +t +l 0.3 +t - l t +l t - l 0.9 L +t +l 0.3 +t - l t +l t - l 0.9 L 20 9

10 Marginalizing Early (aka VE*) T L Join T T, L L Sum out T +t t t +l 0.3 +t - l t +l t - l 0.9 +t +l t - l t +l t - l l l * VE is variable elimination Evidence If evidence, start with factors that select that evidence No evidence uses these initial factors: +r r 0.9 +r +t 0.8 +r - t r +t r - t 0.9 +t +l 0.3 +t - l t +l t - l 0.9 Computing, the initial factors become: +r 0.1 +r +t 0.8 +r - t 0.2 +t +l 0.3 +t - l t +l t - l 0.9 We eliminate all vars other than query + evidence 22 10

11 Evidence II Result will be a selected joint of query and evidence E.g. for P(L +r), we d end up with: +r +l r - l Normalize +l l 0.74 To get our answer, just normalize this! That s it! 23 General Variable Elimination Query: Start with initial factors: Local CPTs (but instantiated by evidence) While there are still hidden variables (not Q or evidence): Pick a hidden variable H Join all factors mentioning H Eliminate (sum out) H Join all remaining factors and normalize 24 11

12 Example Choose A 25 Example Choose E Finish with B Normalize 26 12

13 Same Example in Equations marginal can be obtained from joint by summing out use Bayes net joint distribution expression use x*(y+z) = xy + xz joining on a, and then summing out gives f_1 x*(y+z) = xy + xz joining on e, and then summing out gives f_2 28 All we are doing is exploiting xy + xz = x(y+z) to improve computational efficiency! Another (bit more abstractly worked out) Variable Elimination Example Computational complexity critically depends on the largest factor being generated in this process. Size of factor = number of entries in table. In example above (assuming binary) all factors generated are of size as they all only have one variable (Z, Z, and X3 respectively)

14 Variable Elimination Ordering For the query P(X n y 1,,y n ) work through the following two different orderings as done in previous slide: Z, X 1,, X n-1 and X 1,, X n-1, Z. What is the size of the maximum factor generated for each of the orderings? Answer: 2 n versus 2 (assuming binary) In general: the ordering can greatly affect efficiency. 30 Computational and Space Complexity of Variable Elimination The computational and space complexity of variable elimination is determined by the largest factor The elimination ordering can greatly affect the size of the largest factor. E.g., previous slide s example 2 n vs. 2 Does there always exist an ordering that only results in small factors? No! 31 14

15 Worst Case Complexity? Consider the 3-SAT clause: which can be encoded by the following Bayes net: If we can answer P(z) equal to zero or not, we answered whether the 3-SAT problem has a solution. Subtlety: why the cascaded version of the AND rather than feeding all OR clauses into a single AND? Answer: a single AND would have an exponentially large CPT, whereas with representation above the Bayes net has small CPTs only. Hence inference in Bayes nets is NP-hard. No known efficient probabilistic inference in general. 32 Polytrees A polytree is a directed graph with no undirected cycles For poly-trees you can always find an ordering that is efficient Try it!! Cut-set conditioning for Bayes net inference Choose set of variables such that if removed only a polytree remains 33 Think about how the specifics would work out! 15

16 Representation Bayes Nets Conditional Independences Probabilistic Inference Enumeration (exact, exponential complexity) Variable elimination (exact, worst-case exponential complexity, often better) Probabilistic inference is NP-complete Sampling (approximate) Learning Bayes Nets from Data 36 Sampling Simulation has a name: sampling (e.g., predicting the weather, basketball games, ) Basic idea: Draw N samples from a sampling distribution S Compute an approximate posterior probability Show this converges to the true probability P Why sample? Learning: get samples from a distribution you don t know Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination) 37 16

17 Sampling How do you sample? Simplest way is to use a random number generator to get a continuous value uniformly distributed between 0 and 1 (e.g. random() in Python) Assign each value in the domain of your random variable a sub-interval of [0,1] with a size equal to its probability The sub-intervals cannot overlap 38 Sampling Example Each value in the domain of W has a subinterval of [0,1] with a size equal to its probability W P(W) Sun 0.6 Rain 0.1 Fog 0.3 Meteor 0.0 u is a uniform random value in [0, 1] if 0.0 u<0.6,w = sun if 0.6 u<0.7,w = rain if 0.7 u<1.0,w = fog e.g. if random() returns u = 0.83, then our sample is w = fog 39 17

18 Sampling in Bayes Nets Prior Sampling Rejection Sampling Likelihood Weighting Gibbs Sampling 40 Prior Sampling +c c 0.5 Cloudy +c +s s c +s s 0.5 Sprinkler Rain +c +r r c +r r 0.8 +s +r +w w r +w w s +r +w w r +w w 0.99 WetGrass Samples: +c, -s, +r, +w -c, +s, -r, +w 41 18

19 Prior Sampling To generate one sample from a Bayes net with n variables. Assume variables are named such that ordering X 1, X 2,, X n is consistent with the DAG. For i=1, 2,, n Sample x i from P(X i Parents(X i )) End For Return (x 1, x 2,, x n ) 42 Prior Sampling This process generates samples with probability: i.e. the BN s joint probability Let the number of samples of an event be Then I.e., the sampling procedure is consistent 43 19

20 Example We ll get a bunch of samples from the BN: +c, -s, +r, +w +c, +s, +r, +w -c, +s, +r, -w S +c, -s, +r, +w -c, -s, -r, +w If we want to know P(W) Sprinkler Cloudy C WetGrass W Rain R We have counts <+w:4, -w:1> Normalize to get P(W) = <+w:0.8, -w:0.2> This will get closer to the true distribution with more samples Can estimate anything else, too What about P(C +w)? P(C +r, +w)? P(C -r, -w)? Fast: can use fewer samples if less time (what s the drawback?) 44 Rejection Sampling Let s say we want P(C) No point keeping all samples around Just tally counts of C as we go Let s say we want P(C +s) Same thing: tally C outcomes, but ignore (reject) samples which don t have S=+s This is called rejection sampling It is also consistent for conditional probabilities (i.e., correct in the limit) Sprinkler S Cloudy C WetGrass W Rain R +c, -s, +r, +w +c, +s, +r, +w -c, +s, +r, -w +c, -s, +r, +w -c, -s, -r, +w 45 20

21 Rejection Sampling For i=1, 2,, n Sample x i from P(X i Parents(X i )) If x i not consistent with the evidence in the query, exit this for-loop and no sample is generated End For Return (x 1, x 2,, x n ) 46 Likelihood Weighting Problem with rejection sampling: If evidence is unlikely, you reject a lot of samples You don t exploit your evidence as you sample Consider P(B +a) Burglary Alarm -b, -a -b, -a -b, -a -b, -a +b, +a Idea: fix evidence variables and sample the rest -b +a -b, +a Burglary Alarm -b, +a -b, +a Problem: sample distribution not consistent! +b, +a Solution: weight by probability of evidence given parents 48 21

22 Likelihood Weighting +c c 0.5 +c +s s c +s s 0.5 Sprinkler Cloudy Rain +c +r r c +r r 0.8 +s +r +w w r +w w s +r +w w r +w w 0.99 WetGrass Samples: +c, +s, +r, +w 49 Likelihood Weighting Set w = 1.0 For i=1, 2,, n If X i is an evidence variable Set X i = observation x i for X i Set w = w * P(x i Parents(X i )) Else Sample x i from P(X i Parents(X i )) End For Return (x 1, x 2,, x n ), w 50 22

23 Likelihood Weighting Sampling distribution if z sampled and e fixed evidence Cloudy C Now, samples have weights S R W Together, weighted sampling distribution is consistent 51 Likelihood Weighting Likelihood weighting is good We have taken evidence into account as we generate the sample E.g. here, W s value will get picked based on the evidence values of S, R More of our samples will reflect the state of the world suggested by the evidence Cloudy C Likelihood weighting doesn t solve all our problems Evidence influences the choice of downstream variables, but not upstream ones (C isn t more likely to get a value matching the evidence) S W Rain R We would like to consider evidence when we sample every variable à Gibbs sampling 52 23

24 Gibbs Sampling Procedure: keep track of a full instantiation x 1, x 2,, x n. Start with an arbitrary instantation consistent with the evidence. Sample one variable at a time, conditioned on all the rest, but keep evidence fixed. Keep repeating this for a long time. Property: in the limit of repeating this infinitely many times the resulting sample is coming from the correct distribution What s the point: both upstream and downstream variables condition on evidence. In contrast: likelihood weighting only conditions on upstream evidence, and hence weights obtained in likelihood weighting can sometimes be very small. Sum of weights over all samples is indicative of how many effective samples were obtained, so want high weight. 53 Gibbs Sampling Say we want to sample P(S R = +r) Step 1: Initialize Set evidence (R = +r) Set all other variables (S, C, W) to random values (e.g. by prior sampling or just uniformly sampling; say S = s, W = +w, C = -c) Steps 2+: Repeat the following for some number of iterations Choose a non-evidence variable (S, W, or C in this case) Sample this variable conditioned on nothing else changing The first time through, if we pick S, we sample from P(S R = +r, W = +w, C =-c) The new sample can only be different in a single variable 54 24

25 Gibbs Sampling Example Want to sample from P(R +s,-c,-w) Shorthand for P(R S=+s,C=-c,W=-w) P (R + s, c, w) = = = = P (R, +s, c, w) P (+s, c, w) r r r P (R, +s, c, w) P (R = r, +s, c, w) P ( c)p (+s c)p (R c)p ( w + s, R) P ( c)p (+s c)p (R = r c)p ( w + s, R = r) P (R c)p ( w + s, R) P (R = r c)p ( w + s, R = r) Many things cancel out -- just a join on R 56 Further Reading* Gibbs sampling is a special case of more general methods called Markov chain Monte Carlo (MCMC) methods Metropolis-Hastings is one of the more famous MCMC methods (in fact, Gibbs sampling is a special case of Metropolis- Hastings) You may read about Monte Carlo methods they re just sampling 57 25

### CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring 2011 Lecture 16: Bayes Nets IV Inference 3/28/2011 Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Announcements

### Announcements. CS 188: Artificial Intelligence Spring Bayes Net Semantics. Probabilities in BNs. All Conditional Independences

CS 188: Artificial Intelligence Spring 2011 Announcements Assignments W4 out today --- this is your last written!! Any assignments you have not picked up yet In bin in 283 Soda [same room as for submission

### Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

### CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring 2011 Lecture 18: HMMs and Particle Filtering 4/4/2011 Pieter Abbeel --- UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore

### Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo. Sampling Methods. Oliver Schulte - CMPT 419/726. Bishop PRML Ch.

Sampling Methods Oliver Schulte - CMP 419/726 Bishop PRML Ch. 11 Recall Inference or General Graphs Junction tree algorithm is an exact inference method for arbitrary graphs A particular tree structure

### Bayesian Network. Outline. Bayesian Network. Syntax Semantics Exact inference by enumeration Exact inference by variable elimination

Outline Syntax Semantics Exact inference by enumeration Exact inference by variable elimination s A simple, graphical notation for conditional independence assertions and hence for compact specication

### Bayesian networks. Chapter Chapter

Bayesian networks Chapter 14.1 3 Chapter 14.1 3 1 Outline Syntax Semantics Parameterized distributions Chapter 14.1 3 2 Bayesian networks A simple, graphical notation for conditional independence assertions

### CS 188: Artificial Intelligence Fall 2008

CS 188: Artificial Intelligence Fall 2008 Lecture 14: Bayes Nets 10/14/2008 Dan Klein UC Berkeley 1 1 Announcements Midterm 10/21! One page note sheet Review sessions Friday and Sunday (similar) OHs on

### Inference in Bayesian networks

Inference in Bayesian networks AIMA2e hapter 14.4 5 1 Outline Exact inference by enumeration Exact inference by variable elimination Approximate inference by stochastic simulation Approximate inference

### Markov Networks.

Markov Networks www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts Markov network syntax Markov network semantics Potential functions Partition function

### Artificial Intelligence Bayesian Networks

Artificial Intelligence Bayesian Networks Stephan Dreiseitl FH Hagenberg Software Engineering & Interactive Media Stephan Dreiseitl (Hagenberg/SE/IM) Lecture 11: Bayesian Networks Artificial Intelligence

### Bayes Networks 6.872/HST.950

Bayes Networks 6.872/HST.950 What Probabilistic Models Should We Use? Full joint distribution Completely expressive Hugely data-hungry Exponential computational complexity Naive Bayes (full conditional

### Probability Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 27 Mar 2012

1 Hal Daumé III (me@hal3.name) Probability 101++ Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 27 Mar 2012 Many slides courtesy of Dan

### Sampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo. Sampling Methods. Machine Learning. Torsten Möller.

Sampling Methods Machine Learning orsten Möller Möller/Mori 1 Recall Inference or General Graphs Junction tree algorithm is an exact inference method for arbitrary graphs A particular tree structure defined

### Quantifying uncertainty & Bayesian networks

Quantifying uncertainty & Bayesian networks CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2016 Soleymani Artificial Intelligence: A Modern Approach, 3 rd Edition,

### Hidden Markov Models. Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 19 Apr 2012

Hidden Markov Models Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 19 Apr 2012 Many slides courtesy of Dan Klein, Stuart Russell, or

### Bayesian networks. Chapter 14, Sections 1 4

Bayesian networks Chapter 14, Sections 1 4 Artificial Intelligence, spring 2013, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 14, Sections 1 4 1 Bayesian networks

### A Brief Introduction to Graphical Models. Presenter: Yijuan Lu November 12,2004

A Brief Introduction to Graphical Models Presenter: Yijuan Lu November 12,2004 References Introduction to Graphical Models, Kevin Murphy, Technical Report, May 2001 Learning in Graphical Models, Michael

### Sampling Algorithms for Probabilistic Graphical models

Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir

### CS 188: Artificial Intelligence Spring 2009

CS 188: Artificial Intelligence Spring 2009 Lecture 21: Hidden Markov Models 4/7/2009 John DeNero UC Berkeley Slides adapted from Dan Klein Announcements Written 3 deadline extended! Posted last Friday

### Lecture 8: Bayesian Networks

Lecture 8: Bayesian Networks Bayesian Networks Inference in Bayesian Networks COMP-652 and ECSE 608, Lecture 8 - January 31, 2017 1 Bayes nets P(E) E=1 E=0 0.005 0.995 E B P(B) B=1 B=0 0.01 0.99 E=0 E=1

### Bayesian Networks. Philipp Koehn. 6 April 2017

Bayesian Networks Philipp Koehn 6 April 2017 Outline 1 Bayesian Networks Parameterized distributions Exact inference Approximate inference 2 bayesian networks Bayesian Networks 3 A simple, graphical notation

### Introduction to Artificial Intelligence. Unit # 11

Introduction to Artificial Intelligence Unit # 11 1 Course Outline Overview of Artificial Intelligence State Space Representation Search Techniques Machine Learning Logic Probabilistic Reasoning/Bayesian

### Bayesian networks: approximate inference

Bayesian networks: approximate inference Machine Intelligence Thomas D. Nielsen September 2008 Approximative inference September 2008 1 / 25 Motivation Because of the (worst-case) intractability of exact

### Bayesian Networks. Philipp Koehn. 29 October 2015

Bayesian Networks Philipp Koehn 29 October 2015 Outline 1 Bayesian Networks Parameterized distributions Exact inference Approximate inference 2 bayesian networks Bayesian Networks 3 A simple, graphical

### Implementing Machine Reasoning using Bayesian Network in Big Data Analytics

Implementing Machine Reasoning using Bayesian Network in Big Data Analytics Steve Cheng, Ph.D. Guest Speaker for EECS 6893 Big Data Analytics Columbia University October 26, 2017 Outline Introduction Probability

### CS 343: Artificial Intelligence

CS 343: Artificial Intelligence Particle Filters and Applications of HMMs Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro

### Bayesian Networks. Machine Learning, Fall Slides based on material from the Russell and Norvig AI Book, Ch. 14

Bayesian Networks Machine Learning, Fall 2010 Slides based on material from the Russell and Norvig AI Book, Ch. 14 1 Administrativia Bayesian networks The inference problem: given a BN, how to make predictions

### Axioms of Probability? Notation. Bayesian Networks. Bayesian Networks. Today we ll introduce Bayesian Networks.

Bayesian Networks Today we ll introduce Bayesian Networks. This material is covered in chapters 13 and 14. Chapter 13 gives basic background on probability and Chapter 14 talks about Bayesian Networks.

### Soft Computing. Lecture Notes on Machine Learning. Matteo Matteucci.

Soft Computing Lecture Notes on Machine Learning Matteo Matteucci matteucci@elet.polimi.it Department of Electronics and Information Politecnico di Milano Matteo Matteucci c Lecture Notes on Machine Learning

### The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

### Inference in Bayesian Networks

Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)

### Introduction to Bayesian Learning

Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline

### Informatics 2D Reasoning and Agents Semester 2,

Informatics 2D Reasoning and Agents Semester 2, 2017 2018 Alex Lascarides alex@inf.ed.ac.uk Lecture 23 Probabilistic Reasoning with Bayesian Networks 15th March 2018 Informatics UoE Informatics 2D 1 Where

### CS 188: Artificial Intelligence

CS 188: Artificial Intelligence Hidden Markov Models Instructor: Anca Dragan --- University of California, Berkeley [These slides were created by Dan Klein, Pieter Abbeel, and Anca. http://ai.berkeley.edu.]

### Lecture 10: Bayesian Networks and Inference

Lecture 10: Bayesian Networks and Inference CS 580 (001) - Spring 2016 Amarda Shehu Department of Computer Science George Mason University, Fairfax, VA, USA Apr 13-20, 2016 Amarda Shehu (580) 1 1 Outline

### Probability, Markov models and HMMs. Vibhav Gogate The University of Texas at Dallas

Probability, Markov models and HMMs Vibhav Gogate The University of Texas at Dallas CS 6364 Many slides over the course adapted from either Dan Klein, Luke Zettlemoyer, Stuart Russell and Andrew Moore

### Expectation Maximization Algorithm

Expectation Maximization Algorithm Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein, Luke Zettlemoyer and Dan Weld The Evils of Hard Assignments? Clusters

### DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

### Variable Elimination: Algorithm

Variable Elimination: Algorithm Sargur srihari@cedar.buffalo.edu 1 Topics 1. Types of Inference Algorithms 2. Variable Elimination: the Basic ideas 3. Variable Elimination Sum-Product VE Algorithm Sum-Product

### CS221 / Autumn 2017 / Liang & Ermon. Lecture 15: Bayesian networks III

CS221 / Autumn 2017 / Liang & Ermon Lecture 15: Bayesian networks III cs221.stanford.edu/q Question Which is computationally more expensive for Bayesian networks? probabilistic inference given the parameters

### CS221 / Autumn 2017 / Liang & Ermon. Lecture 13: Bayesian networks I

CS221 / Autumn 2017 / Liang & Ermon Lecture 13: Bayesian networks I Review: definition X 1 X 2 X 3 f 1 f 2 f 3 f 4 Definition: factor graph Variables: X = (X 1,..., X n ), where X i Domain i Factors: f

### Outline. Spring It Introduction Representation. Markov Random Field. Conclusion. Conditional Independence Inference: Variable elimination

Probabilistic Graphical Models COMP 790-90 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline It Introduction ti Representation Bayesian network Conditional Independence Inference:

### Bayesian Machine Learning

Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September

### Probabilistic Representation and Reasoning

Probabilistic Representation and Reasoning Alessandro Panella Department of Computer Science University of Illinois at Chicago May 4, 2010 Alessandro Panella (CS Dept. - UIC) Probabilistic Representation

### MCMC notes by Mark Holder

MCMC notes by Mark Holder Bayesian inference Ultimately, we want to make probability statements about true values of parameters, given our data. For example P(α 0 < α 1 X). According to Bayes theorem:

### Introduction to Bayesian Networks

Introduction to Bayesian Networks Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/23 Outline Basic Concepts Bayesian

### Probabilistic Robotics

Probabilistic Robotics Overview of probability, Representing uncertainty Propagation of uncertainty, Bayes Rule Application to Localization and Mapping Slides from Autonomous Robots (Siegwart and Nourbaksh),

### CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES Chapter 8&9: Classification: Part 3 Instructor: Yizhou Sun yzsun@ccs.neu.edu March 12, 2013 Midterm Report Grade Distribution 90-100 10 80-89 16 70-79 8 60-69 4

### Variable Elimination: Algorithm

Variable Elimination: Algorithm Sargur srihari@cedar.buffalo.edu 1 Topics 1. Types of Inference Algorithms 2. Variable Elimination: the Basic ideas 3. Variable Elimination Sum-Product VE Algorithm Sum-Product

### Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

### 4 : Exact Inference: Variable Elimination

10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference

### Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

### CMPSCI 240: Reasoning about Uncertainty

CMPSCI 240: Reasoning about Uncertainty Lecture 17: Representing Joint PMFs and Bayesian Networks Andrew McGregor University of Massachusetts Last Compiled: April 7, 2017 Warm Up: Joint distributions Recall

### Monte Carlo Methods. Geoff Gordon February 9, 2006

Monte Carlo Methods Geoff Gordon ggordon@cs.cmu.edu February 9, 2006 Numerical integration problem 5 4 3 f(x,y) 2 1 1 0 0.5 0 X 0.5 1 1 0.8 0.6 0.4 Y 0.2 0 0.2 0.4 0.6 0.8 1 x X f(x)dx Used for: function

### CPSC 540: Machine Learning

CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

### Reasoning Under Uncertainty

Reasoning Under Uncertainty Introduction Representing uncertain knowledge: logic and probability (a reminder!) Probabilistic inference using the joint probability distribution Bayesian networks (theory

### CPSC 540: Machine Learning

CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

### Stephen Scott.

1 / 28 ian ian Optimal (Adapted from Ethem Alpaydin and Tom Mitchell) Naïve Nets sscott@cse.unl.edu 2 / 28 ian Optimal Naïve Nets Might have reasons (domain information) to favor some hypotheses/predictions

### Introduction to Probabilistic Reasoning. Image credit: NASA. Assignment

Introduction to Probabilistic Reasoning Brian C. Williams 16.410/16.413 November 17 th, 2010 11/17/10 copyright Brian Williams, 2005-10 1 Brian C. Williams, copyright 2000-09 Image credit: NASA. Assignment

### Uncertainty. 22c:145 Artificial Intelligence. Problem of Logic Agents. Foundations of Probability. Axioms of Probability

Problem of Logic Agents 22c:145 Artificial Intelligence Uncertainty Reading: Ch 13. Russell & Norvig Logic-agents almost never have access to the whole truth about their environments. A rational agent

### Statistical Approaches to Learning and Discovery

Statistical Approaches to Learning and Discovery Graphical Models Zoubin Ghahramani & Teddy Seidenfeld zoubin@cs.cmu.edu & teddy@stat.cmu.edu CALD / CS / Statistics / Philosophy Carnegie Mellon University

### Markov Chain Monte Carlo (MCMC)

School of Computer Science 10-708 Probabilistic Graphical Models Markov Chain Monte Carlo (MCMC) Readings: MacKay Ch. 29 Jordan Ch. 21 Matt Gormley Lecture 16 March 14, 2016 1 Homework 2 Housekeeping Due

### Introduction to Artificial Intelligence Belief networks

Introduction to Artificial Intelligence Belief networks Chapter 15.1 2 Dieter Fox Based on AIMA Slides c S. Russell and P. Norvig, 1998 Chapter 15.1 2 0-0 Outline Bayesian networks: syntax and semantics

### Lecture 10: Introduction to reasoning under uncertainty. Uncertainty

Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,

### CS 188, Fall 2005 Solutions for Assignment 4

CS 188, Fall 005 Solutions for Assignment 4 1. 13.1[5pts] Show from first principles that P (A B A) = 1 The first principles needed here are the definition of conditional probability, P (X Y ) = P (X Y

### Foundations of Artificial Intelligence

Foundations of Artificial Intelligence 12. Making Simple Decisions under Uncertainty Probability Theory, Bayesian Networks, Other Approaches Wolfram Burgard, Maren Bennewitz, and Marco Ragni Albert-Ludwigs-Universität

### Bayesian Networks: Independencies and Inference

Bayesian Networks: Independencies and Inference Scott Davies and Andrew Moore Note to other teachers and users of these slides. Andrew and Scott would be delighted if you found this source material useful

### Particle-Based Approximate Inference on Graphical Model

article-based Approimate Inference on Graphical Model Reference: robabilistic Graphical Model Ch. 2 Koller & Friedman CMU, 0-708, Fall 2009 robabilistic Graphical Models Lectures 8,9 Eric ing attern Recognition

### Belief Networks for Probabilistic Inference

1 Belief Networks for Probabilistic Inference Liliana Mamani Sanchez lmamanis@tcd.ie October 27, 2015 Background Last lecture we saw that using joint distributions for probabilistic inference presented

### CS 5522: Artificial Intelligence II

CS 5522: Artificial Intelligence II Particle Filters and Applications of HMMs Instructor: Alan Ritter Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley. All materials

### What Causality Is (stats for mathematicians)

What Causality Is (stats for mathematicians) Andrew Critch UC Berkeley August 31, 2011 Introduction Foreword: The value of examples With any hard question, it helps to start with simple, concrete versions

### CS 188: Artificial Intelligence. Machine Learning

CS 188: Artificial Intelligence Review of Machine Learning (ML) DISCLAIMER: It is insufficient to simply study these slides, they are merely meant as a quick refresher of the high-level ideas covered.

### A brief introduction to Conditional Random Fields

A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood

### PROBABILISTIC REASONING Outline

PROBABILISTIC REASONING Outline Danica Kragic Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule September 23, 2005 Uncertainty - Let action A t = leave for airport t minutes

### Based on slides by Richard Zemel

CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

### Bayesian Networks. Exact Inference by Variable Elimination. Emma Rollon and Javier Larrosa Q

Bayesian Networks Exact Inference by Variable Elimination Emma Rollon and Javier Larrosa Q1-2015-2016 Emma Rollon and Javier Larrosa Bayesian Networks Q1-2015-2016 1 / 25 Recall the most usual queries

### Reasoning Under Uncertainty: Conditional Probability

Reasoning Under Uncertainty: Conditional Probability CPSC 322 Uncertainty 2 Textbook 6.1 Reasoning Under Uncertainty: Conditional Probability CPSC 322 Uncertainty 2, Slide 1 Lecture Overview 1 Recap 2

### Learning With Bayesian Networks. Markus Kalisch ETH Zürich

Learning With Bayesian Networks Markus Kalisch ETH Zürich Inference in BNs - Review P(Burglary JohnCalls=TRUE, MaryCalls=TRUE) Exact Inference: P(b j,m) = c Sum e Sum a P(b)P(e)P(a b,e)p(j a)p(m a) Deal

### Probabilistic Graphical Models

2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

### Propositional Logic. Logic. Propositional Logic Syntax. Propositional Logic

Propositional Logic Reading: Chapter 7.1, 7.3 7.5 [ased on slides from Jerry Zhu, Louis Oliphant and ndrew Moore] Logic If the rules of the world are presented formally, then a decision maker can use logical

### Bayesian belief networks. Inference.

Lecture 13 Bayesian belief networks. Inference. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Midterm exam Monday, March 17, 2003 In class Closed book Material covered by Wednesday, March 12 Last

### 6.867 Machine learning, lecture 23 (Jaakkola)

Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context

### CS 331: Artificial Intelligence Probability I. Dealing with Uncertainty

CS 331: Artificial Intelligence Probability I Thanks to Andrew Moore for some course material 1 Dealing with Uncertainty We want to get to the point where we can reason with uncertainty This will require

### Lecture 13: Bayesian networks I. Review: definition. Review: person tracking. Course plan X 1 X 2 X 3. Definition: factor graph. Variables.

Lecture 13: ayesian networks I Review: definition X 1 X 2 X 3 f 1 f 2 f 3 f 4 Definition: factor graph Variables: X = (X 1,..., X n ), where X i Domain i Factors: f 1,..., f m, with each f j (X) 0 Weight(x)

### Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

### Bayesian Networks: Representation, Variable Elimination

Bayesian Networks: Representation, Variable Elimination CS 6375: Machine Learning Class Notes Instructor: Vibhav Gogate The University of Texas at Dallas We can view a Bayesian network as a compact representation

### A.I. in health informatics lecture 3 clinical reasoning & probabilistic inference, II *

A.I. in health informatics lecture 3 clinical reasoning & probabilistic inference, II * kevin small & byron wallace * Slides borrow heavily from Andrew Moore, Weng- Keen Wong and Longin Jan Latecki today

### Inference methods in discrete Bayesian networks

University of Amsterdam MSc Mathematics Master Thesis Inference methods in discrete Bayesian networks Author: R. Weikamp Supervisors: prof. dr. R. Núñez Queija (UvA) dr. F. Phillipson (TNO) October 30,

### Learning in Bayesian Networks

Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

### CSE 473: Artificial Intelligence Spring 2014

CSE 473: Artificial Intelligence Spring 2014 Hanna Hajishirzi Problem Spaces and Search slides from Dan Klein, Stuart Russell, Andrew Moore, Dan Weld, Pieter Abbeel, Luke Zettelmoyer Outline Agents that

### Directed and Undirected Graphical Models

Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed

### Artificial Intelligence Uncertainty

Artificial Intelligence Uncertainty Ch. 13 Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me there on time? A 25, A 60, A 3600 Uncertainty: partial observability (road

### Brief Intro. to Bayesian Networks. Extracted and modified from four lectures in Intro to AI, Spring 2008 Paul S. Rosenbloom

Brief Intro. to Bayesian Networks Extracted and modified from four lectures in Intro to AI, Spring 2008 Paul S. Rosenbloom Factor Graph Review Tame combinatorics in many calculations Decoding codes (origin

### Probability & statistics for linguists Class 2: more probability. D. Lassiter (h/t: R. Levy)

Probability & statistics for linguists Class 2: more probability D. Lassiter (h/t: R. Levy) conditional probability P (A B) = when in doubt about meaning: draw pictures. P (A \ B) P (B) keep B- consistent

### Quantifying Uncertainty & Probabilistic Reasoning. Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari

Quantifying Uncertainty & Probabilistic Reasoning Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari Outline Previous Implementations What is Uncertainty? Acting Under Uncertainty Rational Decisions Basic

### Approximate Inference using MCMC

Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov

### Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the

### Review: Bayesian learning and inference

Review: Bayesian learning and inference Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E Inference problem:

### Bayesian Methods in Artificial Intelligence

WDS'10 Proceedings of Contributed Papers, Part I, 25 30, 2010. ISBN 978-80-7378-139-2 MATFYZPRESS Bayesian Methods in Artificial Intelligence M. Kukačka Charles University, Faculty of Mathematics and Physics,