# CMPSCI 240: Reasoning about Uncertainty

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 CMPSCI 240: Reasoning about Uncertainty Lecture 17: Representing Joint PMFs and Bayesian Networks Andrew McGregor University of Massachusetts Last Compiled: April 7, 2017

2 Warm Up: Joint distributions Recall Question 5 from Homework 2: Suppose you toss four fair coins (C 1, C 2, C 3, C 4 ). The outcome is the result of the four tosses. Let X correspond to the number of heads in the outcome. Let Y be 1 if there are an even number of heads in the outcome and 0 otherwise. Consider the joint distribution of P(C 1, C 2, C 3, C 4, X, Y ) Clicker Question: How many rows are in the table of the joint distribution? A 15 B 16 C 64 D 159 E 160

3 Warm Up: Joint distributions Consider a sample table: C 1 C 2 C 3 C 4 X Y P H H H H H H H H H H H H Naively, the joint distribution is over all possible values of C 1, C 2, C 3, C 4, X, and Y : = 160 Intuitively though, if we know C 1, C 2, C 3, C 4, the rest are deterministic (so 16...?)...

4 Outline 1 The Problem with Joint PMFs 2 Approximating Joint PMFs 3 Bayesian Networks 4 Using Bayesian Networks 5 Learning Bayesian Networks from data

5 Many Random Variables So far we have been discussing estimation problems assuming that we measure just one or two random variables on each repetition of an experiment. e.g., Spam filtering in last lecture: D 1 = contains cash D 2 = contains pharmacy P(H = spam D 1, D 2 ) In practice, it is much more common to encounter real-world problems that involve measuring multiple random variables X 1,..., X d for each repetition of the experiment. These random variables X 1,..., X d may have complex relationships among themselves.

6 Example: ICU Monitoring (d 10) Heart rate, blood pressure, temperature...

7 Example: Handwritten Digit Recognition (d 1000) Important to look at the digits together...easy to get files with 1000 characters!

8 The Problem with Joint PMFs Approximating Joint PMFs Bayesian Networks Using Bayesian Networks Learning BNs from data Example: Movie Recommendation (d 10, 000) A complex decision process. Needs to look at ratings and viewing patterns of a large number of subscribers.

9 Joint PMFs for Many Random Variables Before we can think about inference or estimation problems with many random variables, we need to think about the implications of representing joint PMFs over many random variables. Suppose we have an experiment where we obtain the values of d random variables X 1,..., X d per repetition. For now, assume all the random variables are binary. Rhetorical Question: If we have d binary random variables, how many numbers does it take to write down a joint distribution for them?

10 Joint PMFs for Many Random Variables Rhetorical Question: If we have d binary random variables, how many numbers does it take to write down a joint distribution for them? Answer: We need to define a probability for each d-bit sequence: P(X 1 = 0, X 2 = 0,..., X d = 0) P(X 1 = 1, X 2 = 0,..., X d = 0). P(X 1 = 1, X 2 = 1,..., X d = 1) The number of d-bit sequences is 2 d. Because we know that the probabilities have to add up to 1, we only need to write down 2 d 1 numbers to specify the full joint PMF on d binary variables.

11 How Fast is Exponential Growth? 2 d 1 grows extremely fast as d increases: d 2 d ,267,650,600,228,229,401,496,703,205,375.. Storing the full joint PMF for 100 binary variables would take about real numbers or about terabytes of storage! Joint PMFs grow in size so rapidly, we have no hope whatsoever of storing them explicitly for problems with more than about 30 random variables.

12 Outline 1 The Problem with Joint PMFs 2 Approximating Joint PMFs 3 Bayesian Networks 4 Using Bayesian Networks 5 Learning Bayesian Networks from data

13 Factorizing Joint Distributions We start by factorizing the joint distribution, i.e., re-writing the joint distribution as a product of conditional PMFs over single variables (called factors). We obtain this form by iteratively applying the multiplication rule and conditional multiplication rule for random variables. P(A, B) = P(A B)P(B) = P(B A)P(A) P(A, B C) = P(A B, C)P(B C) = P(B A, C)P(A C) Recall: Multiplication rules derived from definition of conditional probability

14 Factorizing Joint Distributions Let s start by specifying an ordering on the variables. Consider the random variables X, Y, Z. Use the order X 1, Y 2, Z 3 and pick a variable to condition on: P(X 1 =x, Y 2 =y, Z 3 =z) = P(Y 2 =y, Z 3 =z X 1 =x)p(x 1 =x) = P(Y 2 =y X 1 =x, Z 3 =z)p(z 3 =z X 1 =x)p(x 1 =x) So what? The representation is exact and has exactly the same storage requirements as the full joint PMF.

15 Conditional Independence: Simplification 1 To get a savings in terms of storage, we need to assume some additional conditional independencies. Suppose we assume that: P(Z 3 =z X 1 =x) = P(Z 3 =z) for all x, z P(Y 2 =y X 1 =x, Z 3 =z) = P(Y 2 =y) for all x, y, z. This gives the Marginal independence model P(X 1 = x, Y 2 = y, Z 3 = z) = P(Y 2 =y X 1 =x, Z 3 =z)p(z 3 =z X 1 =x)p(x 1 =x) = P(Y 2 =y)p(z 3 =z)p(x 1 =x)

16 Clicker Question P(X 1 = x, Y 2 = y, Z 3 = z) = P(Y 2 =y X 1 =x, Z 3 =z)p(z 3 =zx 1 =x)p(x 1 =x) = P(Y 2 =y)p(z 3 =z)p(x 1 =x) Clicker Question: How many numbers do we need to store for three binary random variables in this case (Recall: X, Y, and Z are all binary)? (A) 2 (B) 3 (C) 4 (D) 5 (E) 6 Answer: 3 (as opposed to = 7 if we encoded the full joint)

17 Conditional Independence: Simplification 2 Suppose we instead only assume that: P(Y 2 =y X 1 =x, Z 3 =z) = P(Y 2 =y X 1 =x) for all x, y, z. Note how this differ from the previous assumptions: Before: marginally independent; Now: conditionally independent. Mechanically, dropped all conditions before; Only dropping one now. This gives the conditional independence model : Y 2 is conditionally independent of Z 3 given X 1 P(X 1 = x, Y 2 = y, Z 3 = z) = P(Y 2 =y X 1 =x, Z 3 =z)p(z 3 =z X 1 =x)p(x 1 =x) = P(Y 2 =y X 1 =a 1 )P(Z 3 =z X 1 =x)p(x 1 =x)

18 Clicker Question P(X 1 = x, Y 2 = y, Z 3 = z) = P(Y 2 =y X 1 =x, Z 3 =z)p(z 3 =z X 1 =x)p(x 1 =x) = P(Y 2 =y X 1 =a 1 )P(Z 3 =z X 1 =x)p(x 1 =x) Clicker Question: How many numbers do we need to store for three binary random variables in this case? (A) 2 (B) 3 (C) 4 (D) 5 (E) 6 Answer: = 5 (as opposed to = 7 if we encoded the full joint)

19 Conditional Independence: Simplification 3 Suppose we instead only assume that: P(Y 2 =y X 1 =x, Z 3 =z) = P(Y 2 =y Z 3 =z) for all x, y, z. P(X 1 = x, Y 2 = y, Z 3 = z) = P(X 1 =x)p(y 2 =y X 1 =x)p(z 3 =z X 1 =x, Y 2 =y) Stuck! Factorization by itself is not a panacea

20 Example Toothache: boolean variable indicating whether the patient has a toothache Cavity: boolean variable indicating whether the patient has a cavity Catch: whether the dentists probe catches in the cavity If the patient has a cavity, the probability that the probe catches in it doesn t depend on whether he/she has a toothache P(Catch Toothache, Cavity) = P(Catch Cavity) Therefore, Catch is conditionally independent of Toothache given Cavity Likewise, Toothache is conditionally independent of Catch given Cavity P(Toothache Catch, Cavity) = P(Toothache Cavity) Equivalent statement: P(Toothache, Catch Cavity) = P(Toothache Cavity)P(Catch Cavity) Rhetorical Question: What is the space requirement to represent the Joint Distribution P(Toothache, Catch, Cavity)? Answer: 5

21 Outline 1 The Problem with Joint PMFs 2 Approximating Joint PMFs 3 Bayesian Networks 4 Using Bayesian Networks 5 Learning Bayesian Networks from data

22 Bayesian Networks Keeping track of all the conditional independence assumptions gets tedious when there are a lot of variables. Consider the following situation: You live in quiet neighborhood in the suburbs of LA. There are two reasons the alarm system in your house will go off: your house is broken into or there is an earthquake. If your alarm goes off you might get a call from the police department. You might also get a call from your neighbor. To get around this problem, we use Bayesian Networks to express the conditional independence structure of these models.

23 Bayesian Networks A Bayesian network uses conditional independence assumptions to more compactly represent a joint PMF of many random variables. We use a directed acyclic graph to encode conditional independence assumptions. Remember how we ordered X 1, Y 2, Z 3? Now let X i be an ordered node in graph G. Nodes X i in the graph G represent random variables. A directed edge X j X i means X j is a parent of X i. This means X i directly depends on X j The set of variables that are parents of X i is denoted Pa i. Each variable is conditionally independent of all its nondescendants in the graph given the value of all its parents. The factor associated with variable X i is P(X i Pa i ).

24 Example: Bayesian Network X 1 X 3 X 2 Pa 1 = {}, Pa 3 = {X 1 }, Pa 2 = {X 3 } P(X 1 = a 1, X 2 = a 2, X 3 = a 3 ) = P(X 1 =a 1 )P(X 3 =a 3 X 1 =a 1 )P(X 2 =a 2 X 3 =a 3 )

25 From Graphs to Factorizations and Back If we have a valid graph, we can infer the parent sets and the factors. If we have a valid set of factors, we can infer the parent sets and the graph.

26 Example: Graph to Factorization X 1 X 2 X 3 Pa 1 = {}, Pa 3 = {}, Pa 2 = {} P(X 1, X 2, X 3 ) = P(X 1 )P(X 3 )P(X 2 )

27 Example: Graph to Factorization X 1 X 3 X 2 Pa 1 = {}, Pa 3 = {}, Pa 2 = {X 1, X 3 } P(X 1, X 2, X 3 ) = P(X 1 )P(X 3 )P(X 2 X 1, X 3 )

28 Example: Graph to Factorization X 1 X 2 X 3 Pa 1 = {}, Pa 3 = {X 1 }, Pa 2 = {X 1 } P(X 1 = a 1, X 2 = a 2, X 3 = a 3 ) = P(X 1 =a 1 )P(X 3 =a 3 X 1 =a 1 )P(X 2 =a 2 X 1 =a 1 )

29 The Bayesian Network Theorem Definition: A joint PMF P(X 1,..., X d ) is a Bayesian network with respect to a directed acyclic graph G with parent sets {Pa 1,..., Pa d } if and only if: P(X 1,..., X d ) = d i=1 P(X i Pa i ) In other words, to be a valid Bayesian network for a given graph G, the joint PMF must factorize according to G.

30 Outline 1 The Problem with Joint PMFs 2 Approximating Joint PMFs 3 Bayesian Networks 4 Using Bayesian Networks 5 Learning Bayesian Networks from data

31 The Alarm Network: Random Variable Consider the following situation: You live in quiet neighborhood in the suburbs of LA. There are two reasons the alarm system in your house will go off: your house is broken into or there is an earthquake. If your alarm goes off you might get a call from the police department. You might also get a call from your neighbor. Rhetorical Question: What random variables can we use to describe this problem? Answer: Break-in (B), Earthquake (E), Alarm (A), Police Department calls (PD), Neighbor calls (N).

32 The Alarm Network: Factorization Rhetorical Question What direct dependencies might exist between the random variables B, E, A, PD, N? Rhetorical Question: What is the factorization implied by the graph? Answer: P(B, E, A, PD, N) = P(B)P(E)P(A B, E)P(PD A)P(N A)

33 The Alarm Network: Factor Tables

34 The Alarm Network: Joint Query Question: What is the probability that there is a break-in, but no earthquake, the alarm goes off, the police call, but your neighbor does not call? P(B =1, E =0, A=1, PD =1, N =0) = P(B =1)P(E =0)P(A=1 B =1, E =0)P(PD =1 A=1)P(N =0 A=1) = ( ) (1 0.75)

35 The Alarm Network: Marginal Query Question: What is the probability that there was a break-in, but no earthquake, the police call, but your neighbor does not call? P(B =1, E =0, PD =1, N =0) = P(B =1, E =0, A=0, PD =1, N =0) + P(B =1, E =0, A=1, PD =1, N =0) = P(B =1)P(E =0)P(A=1 B =1, E =0)P(PD =1 A=1)P(N =0 A=1) + P(B =1)P(E =0)P(A=0 B =1, E =0)P(PD =1 A=0)P(N =0 A=0) = ( ) (1 0.75) ( ) (1 0.94) (1 0.1) =

36 The Alarm Network: Conditional Query Question: What is the probability that the alarm went off given that there was a break-in, but no earthquake, the police call, but your neighbor does not call? P(A = 1 B =1, E =0, PD =1, N =0) P(B =1, E =0, A=1, PD =1, N =0) = 1 a=0 P(B =1, E =0, A=a, PD =1, N =0) P(B =1)P(E =0)P(A=1 B =1, E =0)P(PD =1 A=1)P(N =0 A=1) = 1 a=0 P(B =1)P(E =0)P(A=a B =1, E =0)P(PD =1 A=a)P(N =0 A=a)

37 Answering Probabilistic Queries Joint Query: To compute the probability of an assignment to all of the variables we simply express the joint probability as a product over the individual factors. We then look up the correct entries in the factor tables and multiply them together. Marginal Query: To compute the probability of an observed subset of the variables in the Bayesian network, we sum the joint probability of all the variables over the possible configurations of the unobserved variables. Conditional Query: To compute the probability of one subset of the variables given another subset, we first apply the conditional probability formula and then compute the ratio of the resulting marginal probabilities.

38 Outline 1 The Problem with Joint PMFs 2 Approximating Joint PMFs 3 Bayesian Networks 4 Using Bayesian Networks 5 Learning Bayesian Networks from data

39 Estimating Bayesian Networks from Data Just as with simpler models like the biased coin, we can estimate the unknown model parameters from data. If we have data consisting of n simultaneous observations of all of the variables in the network, we can easily estimate the entries of each conditional probability table using separate estimators.

40 Estimating Bayesian Networks: Counting No Parents: For a variable X with no parents, the estimate of P(X = x) is just the number of times that the variable X takes the value x in the data, divided by the total number of data cases n. Some Parents: For a variable X with parents Y 1,..., Y p, the estimate of P(X = x Y 1 = y 1,..., Y p = y p ) is just the number of times that the variable X takes the value x when the parent variables Y 1,..., Y p take the values y 1,..., y p, divided by the total number of times that the parent variables take the values y 1,..., y p.

41 Computing the Factor Tables from Observations Suppose we have a sample of data as shown below. Each row i is a joint configuration of all of the random variables in the network. E B A PD N In the alarm network, consider the factor P(E). We need to estimate P(E = 0) and P(E = 1). Given our data sample, we get the answers P(E = 0) = 4/5 and P(E = 1) = 1/5.

42 Computing the Factor Tables from Observations In the alarm network, consider the factor P(N A). We need to estimate P(N = 0 A = 0),P(N = 1 A = 0), P(N = 0 A = 1), P(N = 1 A = 1). How can we do this? E B A PD N P(N = 0 A = 0) = 1 2, P(N = 1 A = 0) = 1 2 P(N = 0 A = 1) = 2 3, P(N = 1 A = 1) = 1 3

43 Learning the structure of a Bayesian Network from data What if you have a dataset, but you do not know the dependencies that exist between the random variables? In other words, you do not know what is the graph of your Bayesian network. You can estimate the structure of the graph from data!

### Quantifying Uncertainty & Probabilistic Reasoning. Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari

Quantifying Uncertainty & Probabilistic Reasoning Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari Outline Previous Implementations What is Uncertainty? Acting Under Uncertainty Rational Decisions Basic

### Cartesian-product sample spaces and independence

CS 70 Discrete Mathematics for CS Fall 003 Wagner Lecture 4 The final two lectures on probability will cover some basic methods for answering questions about probability spaces. We will apply them to the

### Artificial Intelligence Uncertainty

Artificial Intelligence Uncertainty Ch. 13 Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me there on time? A 25, A 60, A 3600 Uncertainty: partial observability (road

### Probabilistic Reasoning

Course 16 :198 :520 : Introduction To Artificial Intelligence Lecture 7 Probabilistic Reasoning Abdeslam Boularias Monday, September 28, 2015 1 / 17 Outline We show how to reason and act under uncertainty.

### Informatics 2D Reasoning and Agents Semester 2,

Informatics 2D Reasoning and Agents Semester 2, 2017 2018 Alex Lascarides alex@inf.ed.ac.uk Lecture 23 Probabilistic Reasoning with Bayesian Networks 15th March 2018 Informatics UoE Informatics 2D 1 Where

### Introduction to Bayesian Learning

Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline

### Probabilistic Reasoning

Probabilistic Reasoning Philipp Koehn 4 April 2017 Outline 1 Uncertainty Probability Inference Independence and Bayes Rule 2 uncertainty Uncertainty 3 Let action A t = leave for airport t minutes before

### Bayesian networks. Chapter Chapter

Bayesian networks Chapter 14.1 3 Chapter 14.1 3 1 Outline Syntax Semantics Parameterized distributions Chapter 14.1 3 2 Bayesian networks A simple, graphical notation for conditional independence assertions

### Bayesian Network. Outline. Bayesian Network. Syntax Semantics Exact inference by enumeration Exact inference by variable elimination

Outline Syntax Semantics Exact inference by enumeration Exact inference by variable elimination s A simple, graphical notation for conditional independence assertions and hence for compact specication

### Bayesian Networks. Material used

Bayesian Networks Material used Halpern: Reasoning about Uncertainty. Chapter 4 Stuart Russell and Peter Norvig: Artificial Intelligence: A Modern Approach 1 Random variables 2 Probabilistic independence

### Uncertainty and Belief Networks. Introduction to Artificial Intelligence CS 151 Lecture 1 continued Ok, Lecture 2!

Uncertainty and Belief Networks Introduction to Artificial Intelligence CS 151 Lecture 1 continued Ok, Lecture 2! This lecture Conditional Independence Bayesian (Belief) Networks: Syntax and semantics

### The Naïve Bayes Classifier. Machine Learning Fall 2017

The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning

### This lecture. Reading. Conditional Independence Bayesian (Belief) Networks: Syntax and semantics. Chapter CS151, Spring 2004

This lecture Conditional Independence Bayesian (Belief) Networks: Syntax and semantics Reading Chapter 14.1-14.2 Propositions and Random Variables Letting A refer to a proposition which may either be true

### Directed and Undirected Graphical Models

Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed

### An AI-ish view of Probability, Conditional Probability & Bayes Theorem

An AI-ish view of Probability, Conditional Probability & Bayes Theorem Review: Uncertainty and Truth Values: a mismatch Let action A t = leave for airport t minutes before flight. Will A 15 get me there

### Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

### 13.4 INDEPENDENCE. 494 Chapter 13. Quantifying Uncertainty

494 Chapter 13. Quantifying Uncertainty table. In a realistic problem we could easily have n>100, makingo(2 n ) impractical. The full joint distribution in tabular form is just not a practical tool for

### CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring 2011 Lecture 16: Bayes Nets IV Inference 3/28/2011 Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Announcements

### Axioms of Probability? Notation. Bayesian Networks. Bayesian Networks. Today we ll introduce Bayesian Networks.

Bayesian Networks Today we ll introduce Bayesian Networks. This material is covered in chapters 13 and 14. Chapter 13 gives basic background on probability and Chapter 14 talks about Bayesian Networks.

### Uncertainty. 22c:145 Artificial Intelligence. Problem of Logic Agents. Foundations of Probability. Axioms of Probability

Problem of Logic Agents 22c:145 Artificial Intelligence Uncertainty Reading: Ch 13. Russell & Norvig Logic-agents almost never have access to the whole truth about their environments. A rational agent

### Uncertainty. Chapter 13

Uncertainty Chapter 13 Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me there on time? Problems: 1. partial observability (road state, other drivers' plans, noisy

### Probability Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 27 Mar 2012

1 Hal Daumé III (me@hal3.name) Probability 101++ Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 27 Mar 2012 Many slides courtesy of Dan

### CS 188: Artificial Intelligence. Bayes Nets

CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew

### Machine Learning Lecture 14

Many slides adapted from B. Schiele, S. Roth, Z. Gharahmani Machine Learning Lecture 14 Undirected Graphical Models & Inference 23.06.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de

### Lecture 10: Introduction to reasoning under uncertainty. Uncertainty

Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,

### Lecture 4 October 18th

Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations

### CS Lecture 4. Markov Random Fields

CS 6347 Lecture 4 Markov Random Fields Recap Announcements First homework is available on elearning Reminder: Office hours Tuesday from 10am-11am Last Time Bayesian networks Today Markov random fields

### Reasoning Under Uncertainty

Reasoning Under Uncertainty Introduction Representing uncertain knowledge: logic and probability (a reminder!) Probabilistic inference using the joint probability distribution Bayesian networks (theory

### Artificial Intelligence Bayesian Networks

Artificial Intelligence Bayesian Networks Stephan Dreiseitl FH Hagenberg Software Engineering & Interactive Media Stephan Dreiseitl (Hagenberg/SE/IM) Lecture 11: Bayesian Networks Artificial Intelligence

### Uncertain Knowledge and Reasoning

Uncertainty Part IV Uncertain Knowledge and Reasoning et action A t = leave for airport t minutes before flight Will A t get me there on time? Problems: 1) partial observability (road state, other drivers

### Chapter 13 Quantifying Uncertainty

Chapter 13 Quantifying Uncertainty CS5811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline Probability basics Syntax and semantics Inference

### A Brief Introduction to Graphical Models. Presenter: Yijuan Lu November 12,2004

A Brief Introduction to Graphical Models Presenter: Yijuan Lu November 12,2004 References Introduction to Graphical Models, Kevin Murphy, Technical Report, May 2001 Learning in Graphical Models, Michael

### CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES Chapter 8&9: Classification: Part 3 Instructor: Yizhou Sun yzsun@ccs.neu.edu March 12, 2013 Midterm Report Grade Distribution 90-100 10 80-89 16 70-79 8 60-69 4

### Brief Intro. to Bayesian Networks. Extracted and modified from four lectures in Intro to AI, Spring 2008 Paul S. Rosenbloom

Brief Intro. to Bayesian Networks Extracted and modified from four lectures in Intro to AI, Spring 2008 Paul S. Rosenbloom Factor Graph Review Tame combinatorics in many calculations Decoding codes (origin

### Probabilistic Representation and Reasoning

Probabilistic Representation and Reasoning Alessandro Panella Department of Computer Science University of Illinois at Chicago May 4, 2010 Alessandro Panella (CS Dept. - UIC) Probabilistic Representation

### Based on slides by Richard Zemel

CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

### PROBABILISTIC REASONING Outline

PROBABILISTIC REASONING Outline Danica Kragic Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule September 23, 2005 Uncertainty - Let action A t = leave for airport t minutes

### Introduction to Artificial Intelligence. Unit # 11

Introduction to Artificial Intelligence Unit # 11 1 Course Outline Overview of Artificial Intelligence State Space Representation Search Techniques Machine Learning Logic Probabilistic Reasoning/Bayesian

### Introduction to Discrete Probability Theory and Bayesian Networks

Introduction to Discrete Probability Theory and Bayesian Networks Dr Michael Ashcroft October 10, 2012 This document remains the property of Inatas. Reproduction in whole or in part without the written

### Implementing Machine Reasoning using Bayesian Network in Big Data Analytics

Implementing Machine Reasoning using Bayesian Network in Big Data Analytics Steve Cheng, Ph.D. Guest Speaker for EECS 6893 Big Data Analytics Columbia University October 26, 2017 Outline Introduction Probability

### Uncertainty. Chapter 13, Sections 1 6

Uncertainty Chapter 13, Sections 1 6 Artificial Intelligence, spring 2013, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 13, Sections 1 6 1 Outline Uncertainty Probability

### Bayesian networks: Modeling

Bayesian networks: Modeling CS194-10 Fall 2011 Lecture 21 CS194-10 Fall 2011 Lecture 21 1 Outline Overview of Bayes nets Syntax and semantics Examples Compact conditional distributions CS194-10 Fall 2011

### 4.1 Notation and probability review

Directed and undirected graphical models Fall 2015 Lecture 4 October 21st Lecturer: Simon Lacoste-Julien Scribe: Jaime Roquero, JieYing Wu 4.1 Notation and probability review 4.1.1 Notations Let us recall

### Probability, Markov models and HMMs. Vibhav Gogate The University of Texas at Dallas

Probability, Markov models and HMMs Vibhav Gogate The University of Texas at Dallas CS 6364 Many slides over the course adapted from either Dan Klein, Luke Zettlemoyer, Stuart Russell and Andrew Moore

### Conditional Independence and Factorization

Conditional Independence and Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

### Probability and Uncertainty. Bayesian Networks

Probability and Uncertainty Bayesian Networks First Lecture Today (Tue 28 Jun) Review Chapters 8.1-8.5, 9.1-9.2 (optional 9.5) Second Lecture Today (Tue 28 Jun) Read Chapters 13, & 14.1-14.5 Next Lecture

### Bayesian Machine Learning

Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September

### CS221 / Autumn 2017 / Liang & Ermon. Lecture 13: Bayesian networks I

CS221 / Autumn 2017 / Liang & Ermon Lecture 13: Bayesian networks I Review: definition X 1 X 2 X 3 f 1 f 2 f 3 f 4 Definition: factor graph Variables: X = (X 1,..., X n ), where X i Domain i Factors: f

### Reasoning Under Uncertainty: Belief Network Inference

Reasoning Under Uncertainty: Belief Network Inference CPSC 322 Uncertainty 5 Textbook 10.4 Reasoning Under Uncertainty: Belief Network Inference CPSC 322 Uncertainty 5, Slide 1 Lecture Overview 1 Recap

### Introduction to Artificial Intelligence Belief networks

Introduction to Artificial Intelligence Belief networks Chapter 15.1 2 Dieter Fox Based on AIMA Slides c S. Russell and P. Norvig, 1998 Chapter 15.1 2 0-0 Outline Bayesian networks: syntax and semantics

### Foundations of Artificial Intelligence

Foundations of Artificial Intelligence 12. Making Simple Decisions under Uncertainty Probability Theory, Bayesian Networks, Other Approaches Wolfram Burgard, Maren Bennewitz, and Marco Ragni Albert-Ludwigs-Universität

### Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

### Outline. Spring It Introduction Representation. Markov Random Field. Conclusion. Conditional Independence Inference: Variable elimination

Probabilistic Graphical Models COMP 790-90 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline It Introduction ti Representation Bayesian network Conditional Independence Inference:

### Part I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

Part I C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Probabilistic Graphical Models Graphical representation of a probabilistic model Each variable corresponds to a

### Machine learning: lecture 20. Tommi S. Jaakkola MIT CSAIL

Machine learning: lecture 20 ommi. Jaakkola MI AI tommi@csail.mit.edu Bayesian networks examples, specification graphs and independence associated distribution Outline ommi Jaakkola, MI AI 2 Bayesian networks

### COMP538: Introduction to Bayesian Networks

COMP538: Introduction to Bayesian Networks Lecture 2: Bayesian Networks Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering Hong Kong University of Science and Technology Fall

### Announcements. CS 188: Artificial Intelligence Spring Bayes Net Semantics. Probabilities in BNs. All Conditional Independences

CS 188: Artificial Intelligence Spring 2011 Announcements Assignments W4 out today --- this is your last written!! Any assignments you have not picked up yet In bin in 283 Soda [same room as for submission

### CIS 2033 Lecture 5, Fall

CIS 2033 Lecture 5, Fall 2016 1 Instructor: David Dobor September 13, 2016 1 Supplemental reading from Dekking s textbook: Chapter2, 3. We mentioned at the beginning of this class that calculus was a prerequisite

### Naïve Bayes. Vibhav Gogate The University of Texas at Dallas

Naïve Bayes Vibhav Gogate The University of Texas at Dallas Supervised Learning of Classifiers Find f Given: Training set {(x i, y i ) i = 1 n} Find: A good approximation to f : X Y Examples: what are

### Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2017 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

### Reasoning Under Uncertainty: Bayesian networks intro

Reasoning Under Uncertainty: Bayesian networks intro Alan Mackworth UBC CS 322 Uncertainty 4 March 18, 2013 Textbook 6.3, 6.3.1, 6.5, 6.5.1, 6.5.2 Lecture Overview Recap: marginal and conditional independence

### Lecture Overview. Introduction to Artificial Intelligence COMP 3501 / COMP Lecture 11: Uncertainty. Uncertainty.

Lecture Overview COMP 3501 / COMP 4704-4 Lecture 11: Uncertainty Return HW 1/Midterm Short HW 2 discussion Uncertainty / Probability Prof. JGH 318 Uncertainty Previous approaches dealt with relatively

### Conditional Independence

Conditional Independence Sargur Srihari srihari@cedar.buffalo.edu 1 Conditional Independence Topics 1. What is Conditional Independence? Factorization of probability distribution into marginals 2. Why

### Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

### Bayesian Methods in Artificial Intelligence

WDS'10 Proceedings of Contributed Papers, Part I, 25 30, 2010. ISBN 978-80-7378-139-2 MATFYZPRESS Bayesian Methods in Artificial Intelligence M. Kukačka Charles University, Faculty of Mathematics and Physics,

### An Embedded Implementation of Bayesian Network Robot Programming Methods

1 An Embedded Implementation of Bayesian Network Robot Programming Methods By Mark A. Post Lecturer, Space Mechatronic Systems Technology Laboratory. Department of Design, Manufacture and Engineering Management.

### Belief Networks for Probabilistic Inference

1 Belief Networks for Probabilistic Inference Liliana Mamani Sanchez lmamanis@tcd.ie October 27, 2015 Background Last lecture we saw that using joint distributions for probabilistic inference presented

### Bayesian networks. Chapter Chapter

Bayesian networks Chapter 14.1 3 Chapter 14.1 3 1 Outline Syntax Semantics Parameterized distributions Chapter 14.1 3 2 Bayesian networks A simple, graphical notation for conditional independence assertions

### Bayesian Methods: Naïve Bayes

Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

### Probability and Independence Terri Bittner, Ph.D.

Probability and Independence Terri Bittner, Ph.D. The concept of independence is often confusing for students. This brief paper will cover the basics, and will explain the difference between independent

### Introduction to Bayesian Networks

Introduction to Bayesian Networks Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/23 Outline Basic Concepts Bayesian

### Bayesian Networks: Representation, Variable Elimination

Bayesian Networks: Representation, Variable Elimination CS 6375: Machine Learning Class Notes Instructor: Vibhav Gogate The University of Texas at Dallas We can view a Bayesian network as a compact representation

### Bayes and Naïve Bayes Classifiers CS434

Bayes and Naïve Bayes Classifiers CS434 In this lectre 1. Review some basic probability concepts 2. Introdce a sefl probabilistic rle - Bayes rle 3. Introdce the learning algorithm based on Bayes rle (ths

### Probability Based Learning

Probability Based Learning Lecture 7, DD2431 Machine Learning J. Sullivan, A. Maki September 2013 Advantages of Probability Based Methods Work with sparse training data. More powerful than deterministic

### Lecture 13: Bayesian networks I. Review: definition. Review: person tracking. Course plan X 1 X 2 X 3. Definition: factor graph. Variables.

Lecture 13: ayesian networks I Review: definition X 1 X 2 X 3 f 1 f 2 f 3 f 4 Definition: factor graph Variables: X = (X 1,..., X n ), where X i Domain i Factors: f 1,..., f m, with each f j (X) 0 Weight(x)

### Introduction to Probabilistic Reasoning. Image credit: NASA. Assignment

Introduction to Probabilistic Reasoning Brian C. Williams 16.410/16.413 November 17 th, 2010 11/17/10 copyright Brian Williams, 2005-10 1 Brian C. Williams, copyright 2000-09 Image credit: NASA. Assignment

### Statistical learning. Chapter 20, Sections 1 4 1

Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

### Bayesian Reasoning. Adapted from slides by Tim Finin and Marie desjardins.

Bayesian Reasoning Adapted from slides by Tim Finin and Marie desjardins. 1 Outline Probability theory Bayesian inference From the joint distribution Using independence/factoring From sources of evidence

### LEARNING WITH BAYESIAN NETWORKS

LEARNING WITH BAYESIAN NETWORKS Author: David Heckerman Presented by: Dilan Kiley Adapted from slides by: Yan Zhang - 2006, Jeremy Gould 2013, Chip Galusha -2014 Jeremy Gould 2013Chip Galus May 6th, 2016

### Probabilistic Partial Evaluation: Exploiting rule structure in probabilistic inference

Probabilistic Partial Evaluation: Exploiting rule structure in probabilistic inference David Poole University of British Columbia 1 Overview Belief Networks Variable Elimination Algorithm Parent Contexts

### Bayesian Networks and Decision Graphs

Bayesian Networks and Decision Graphs A 3-week course at Reykjavik University Finn V. Jensen & Uffe Kjærulff ({fvj,uk}@cs.aau.dk) Group of Machine Intelligence Department of Computer Science, Aalborg University

### Machine Learning: Probability Theory

Machine Learning: Probability Theory Prof. Dr. Martin Riedmiller Albert-Ludwigs-University Freiburg AG Maschinelles Lernen Theories p.1/28 Probabilities probabilistic statements subsume different effects

### 6.867 Machine learning, lecture 23 (Jaakkola)

Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context

### What is a random variable

OKAN UNIVERSITY FACULTY OF ENGINEERING AND ARCHITECTURE MATH 256 Probability and Random Processes 04 Random Variables Fall 20 Yrd. Doç. Dr. Didem Kivanc Tureli didemk@ieee.org didem.kivanc@okan.edu.tr

### Uncertainty (Chapter 13, Russell & Norvig)

Uncertainty (Chapter 13, Russell & Norvig) Introduction to Artificial Intelligence CS 150 Administration Midterm next Tuesday!!! I will try to find an old one to post. The MT will cover chapters 1-6, with

### CS 188, Fall 2005 Solutions for Assignment 4

CS 188, Fall 005 Solutions for Assignment 4 1. 13.1[5pts] Show from first principles that P (A B A) = 1 The first principles needed here are the definition of conditional probability, P (X Y ) = P (X Y

### Uncertainty. CmpE 540 Principles of Artificial Intelligence Pınar Yolum Uncertainty. Sources of Uncertainty

CmpE 540 Principles of Artificial Intelligence Pınar Yolum pinar.yolum@boun.edu.tr Department of Computer Engineering Boğaziçi University Uncertainty (Based mostly on the course slides from http://aima.cs.berkeley.edu/

### 2.4. Conditional Probability

2.4. Conditional Probability Objectives. Definition of conditional probability and multiplication rule Total probability Bayes Theorem Example 2.4.1. (#46 p.80 textbook) Suppose an individual is randomly

### Probabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov

Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly

### 4 : Exact Inference: Variable Elimination

10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference

### Today s Outline. Biostatistics Statistical Inference Lecture 01 Introduction to BIOSTAT602 Principles of Data Reduction

Today s Outline Biostatistics 602 - Statistical Inference Lecture 01 Introduction to Principles of Hyun Min Kang Course Overview of January 10th, 2013 Hyun Min Kang Biostatistics 602 - Lecture 01 January

### Independencies. Undirected Graphical Models 2: Independencies. Independencies (Markov networks) Independencies (Bayesian Networks)

(Bayesian Networks) Undirected Graphical Models 2: Use d-separation to read off independencies in a Bayesian network Takes a bit of effort! 1 2 (Markov networks) Use separation to determine independencies

### Bayes Networks 6.872/HST.950

Bayes Networks 6.872/HST.950 What Probabilistic Models Should We Use? Full joint distribution Completely expressive Hugely data-hungry Exponential computational complexity Naive Bayes (full conditional

### Lecture 6: Entropy Rate

Lecture 6: Entropy Rate Entropy rate H(X) Random walk on graph Dr. Yao Xie, ECE587, Information Theory, Duke University Coin tossing versus poker Toss a fair coin and see and sequence Head, Tail, Tail,

### Brief Review of Probability

Brief Review of Probability Nuno Vasconcelos (Ken Kreutz-Delgado) ECE Department, UCSD Probability Probability theory is a mathematical language to deal with processes or experiments that are non-deterministic

### MAT Mathematics in Today's World

MAT 1000 Mathematics in Today's World Last Time We discussed the four rules that govern probabilities: 1. Probabilities are numbers between 0 and 1 2. The probability an event does not occur is 1 minus

### STAT 598L Probabilistic Graphical Models. Instructor: Sergey Kirshner. Bayesian Networks

STAT 598L Probabilistic Graphical Models Instructor: Sergey Kirshner Bayesian Networks Representing Joint Probability Distributions 2 n -1 free parameters Reducing Number of Parameters: Conditional Independence

### Probability Theory. Prof. Dr. Martin Riedmiller AG Maschinelles Lernen Albert-Ludwigs-Universität Freiburg.

Probability Theory Prof. Dr. Martin Riedmiller AG Maschinelles Lernen Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de Prof. Dr. Martin Riedmiller Machine Learning Lab, University