Introduction to Bayesian Networks

Outline: Directed Acyclic Graphs; Bayesian Networks Overview; Building Models; Modeling Tricks.


Huizhen Yu (janey.yu@cs.helsinki.fi), Dept. of Computer Science, Univ. of Helsinki. Probabilistic Models, Spring 2010.

Acknowledgment: The illustrative examples in this lecture are mostly from Finn Jensen's book, An Introduction to Bayesian Networks, 1996.

Directed Acyclic Graphs

Notation and definitions for a directed graph G = (V, E):
- V: the set of vertices.
- E: the set of directed edges; (α, β) ∈ E means there is an edge from vertex α to vertex β.
- If (α, β) ∈ E, we say α is a parent of β and β is a child of α. We denote the set of parents of a vertex α by pa(α) and the set of its children by ch(α).
- A cycle is a path that starts and ends at the same vertex. A directed acyclic graph (DAG) is a directed graph that has no cycles.

Let G be a DAG and X = {X_v, v ∈ V} discrete random variables associated with V. We say P(X) factorizes recursively according to G if

    p(x) = ∏_{v ∈ V} p(x_v | x_pa(v)).

Examples we have seen: Markov chains, HMMs.
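To make the recursive factorization concrete, here is a minimal Python sketch for a toy chain A → B → C; the variable names and all numbers are made up purely for illustration.

```python
from itertools import product

# Recursive factorization p(x) = prod_{v in V} p(x_v | x_pa(v)) for a toy DAG A -> B -> C.
# All variables are binary ("y"/"n"); every number below is made up for illustration.
parents = {"A": [], "B": ["A"], "C": ["B"]}

# cpt[v] maps (tuple of parent values, value of v) -> conditional probability.
cpt = {
    "A": {((), "y"): 0.3, ((), "n"): 0.7},
    "B": {(("y",), "y"): 0.9, (("y",), "n"): 0.1,
          (("n",), "y"): 0.2, (("n",), "n"): 0.8},
    "C": {(("y",), "y"): 0.6, (("y",), "n"): 0.4,
          (("n",), "y"): 0.1, (("n",), "n"): 0.9},
}

def joint(assignment):
    """p(x) as the product of p(x_v | x_pa(v)) over all vertices v."""
    p = 1.0
    for v, pa in parents.items():
        pa_vals = tuple(assignment[u] for u in pa)
        p *= cpt[v][(pa_vals, assignment[v])]
    return p

# The factorization defines a proper joint distribution: the eight terms sum to 1.
print(joint({"A": "y", "B": "y", "C": "n"}))   # 0.3 * 0.9 * 0.4 = 0.108
print(sum(joint(dict(zip("ABC", v))) for v in product("yn", repeat=3)))  # 1 (up to rounding)
```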

Comparison between DAG and MRF

- DAG: the dependence structure is specified hierarchically; edges represent direct or causal influence.
- Undirected graphs / MRF: the neighborhood relationship is symmetric; edges represent interaction/association.
- Graphs having both directed and undirected edges are called chain graphs; they can represent both kinds of dependence and are more general than DAGs or MRFs. (We will not study them, however.)

Building and Using Models

Three phases of developing a Bayesian network:
(i) Building the model: specify the random variables, specify the structural dependence between the variables, and assign conditional probabilities to the components of the model.
(ii) Constructing the inference engine: compile the model, i.e., create internally the representations convenient for computation (answering queries).
(iii) Using the inference engine for case analysis.
Parameter learning/adaptation based on data links (iii) partially back to (i). Structural learning/adaptation is an active research topic. In this lecture, we illustrate model building with simple examples.

Model Elements

Random variables:
- Hypothesis variables: their values are unobservable, but of interest in our problem.
- Information variables: their values are observed and informative about the hypothesis variables.
- Mediating variables: related to the underlying physical process, or introduced just for convenience.
Dependence structure: a DAG.
Conditional probabilities {p(x_v | x_pa(v))} for the model components.

Specifying the variables in the model is the first step of model building and is very important in practice, although in studying the theory we have taken it for granted.

Example I: Family Out?

Description: When I go home at night, I want to know if my family is at home before I try the door. Often when the family leaves, an outdoor light is turned on. However, sometimes the light is turned on when the family is expecting a guest. Also, we have a dog. When nobody is at home, the dog is put in the back yard. The same is true if the dog has bowel trouble. Finally, if the dog is in the back yard, I will probably hear her barking, but sometimes I can be confused by other dogs barking.

Hypothesis variable: Family out? (A)
Information variables: Light on? (B); Hear dog barking? (C)

1st model of the causal structure: A is a parent of both B and C.

We now specify (subjective) conditional probabilities for the model.

For P(A): one out of five week days my family is out, so I set
    P(A = y) = 0.2, P(A = n) = 0.8.

For P(B | A): the family rarely leaves home without turning the light on, but sometimes they may forget. So I set
    P(B = y | A = y) = 0.99, P(B = n | A = y) = 0.01.
How to specify the rest of the probabilities, P(B | A = n) and P(C | A)?

Introduce a mediating variable for assessing P(B | A = n): Expect guest? (Exp-g). We have guests three times a month, so P(Exp-g = y | A = n) = 0.1 and P(B = y | Exp-g = y, A = n) = 1, and I set P(B = y | A = n) = 0.1. This gives the 2nd model (with Exp-g as an extra parent of B) and, going back to the 1st model, the completed table
    P(B = y | A = y) = 0.99, P(B = n | A = y) = 0.01,
    P(B = y | A = n) = 0.10, P(B = n | A = n) = 0.90.

Introduce mediating variables for assessing P(C | A): Dog out? (D-out) and Bowel problem? (BP). This gives the 3rd model: A and BP are the parents of D-out, and D-out is the parent of C.

For P(BP): I set P(BP = y) = 0.05, P(BP = n) = 0.95.

For P(D-out | A, BP): sometimes the dog is out just because she wants to be out, so P(D-out = y | A = n, BP = n) = 0.2. After some reasoning (the noisy-or construction described later in this lecture), I set
    P(D-out = y | A = y, BP = y) = 0.994, P(D-out = y | A = y, BP = n) = 0.88,
    P(D-out = y | A = n, BP = y) = 0.96,  P(D-out = y | A = n, BP = n) = 0.20.

For P(C | D-out): sometimes I can confuse the barking of the neighbor's dog with that of mine. Without introducing another mediating variable, I take this into account in the probability assessment by setting
    P(C = y | D-out = y) = 0.6, P(C = n | D-out = y) = 0.4,
    P(C = y | D-out = n) = 0.2, P(C = n | D-out = n) = 0.8.
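With all the tables in place, the model can already be queried by brute-force enumeration over the joint distribution. The sketch below is only an illustration, not the inference engine discussed earlier; it uses the variable labels from the tables above and an example query P(A = y | B = y, C = y) that is not posed in the slides.

```python
from itertools import product

# Family-out model: A = Family out?, B = Light on?, BP = Bowel problem?,
# D = Dog out?, C = Hear dog barking?  Tables as specified above.
P_A   = {"y": 0.2, "n": 0.8}
P_B_A = {("y", "y"): 0.99, ("n", "y"): 0.01,     # P(B | A = y); keys are (b, a)
         ("y", "n"): 0.10, ("n", "n"): 0.90}     # P(B | A = n)
P_BP  = {"y": 0.05, "n": 0.95}
P_Dy  = {("y", "y"): 0.994, ("y", "n"): 0.88,    # P(D = y | A, BP); keys are (a, bp)
         ("n", "y"): 0.960, ("n", "n"): 0.20}
P_C_D = {("y", "y"): 0.6, ("n", "y"): 0.4,       # P(C | D); keys are (c, d)
         ("y", "n"): 0.2, ("n", "n"): 0.8}

def joint(a, b, bp, d, c):
    p_d = P_Dy[(a, bp)] if d == "y" else 1.0 - P_Dy[(a, bp)]
    return P_A[a] * P_B_A[(b, a)] * P_BP[bp] * p_d * P_C_D[(c, d)]

def p_family_out(b_obs, c_obs):
    """P(A = y | B = b_obs, C = c_obs), summing the joint over BP and D."""
    w = {a: sum(joint(a, b_obs, bp, d, c_obs) for bp, d in product("yn", repeat=2))
         for a in "yn"}
    return w["y"] / (w["y"] + w["n"])

print(p_family_out("y", "y"))   # light on and barking heard: roughly 0.82
```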

In this example, if the mediating variables will never be observed, we can eliminate (marginalize out) them and get back to the 1st model with the corresponding probabilities, e.g.
    P(C | A) = Σ_{D-out, BP} P(C | D-out) P(D-out | A, BP) P(BP).
We can verify this directly. Alternatively, we can use the global Markov property on undirected graphs: the joint distribution P of the larger model factorizes according to the corresponding undirected (moral) graph, in which the vertex set {A} separates {B} from {C}; hence B ⊥ C | A, which is exactly the conditional independence expressed by the 1st model.

Example II: Insemination

Description: Six weeks after insemination of a cow there are three tests for the result: a blood test (BT), a urine test (UT), and scanning (Sc). The results of the blood test and the urine test are mediated through the hormonal state (Ho), which is affected by a possible pregnancy (Pr). For both the blood test and the urine test there is a risk that a pregnancy does not show after six weeks, because the change in the hormonal state may still be too weak.

Hypothesis variable: Pregnant? (Pr)
Information variables: Blood test result (BT); Urine test result (UT); Scanning result (Sc)
Mediating variable: Hormonal state (Ho)

1st (naive Bayes) model, without the variable Ho: BT, UT, and Sc are all children of Pr, with
    P(Pr = y) = 0.87, P(Pr = n) = 0.13,
    P(BT = y | Pr = y) = 0.64,  P(BT = n | Pr = y) = 0.36,
    P(BT = y | Pr = n) = 0.106, P(BT = n | Pr = n) = 0.894,
    P(UT = y | Pr = y) = 0.73,  P(UT = n | Pr = y) = 0.27,
    P(UT = y | Pr = n) = 0.107, P(UT = n | Pr = n) = 0.893,
    P(Sc = y | Pr = y) = 0.9,   P(Sc = n | Pr = y) = 0.1,
    P(Sc = y | Pr = n) = 0.01,  P(Sc = n | Pr = n) = 0.99.

Over-confidence of the naive Bayes model: the BT and UT results are counted as two independent pieces of evidence given Pr:
    P(BT = n | Pr = n) P(UT = n | Pr = n) P(Pr = n) = 0.894 × 0.893 × 0.13,
    P(BT = n | Pr = y) P(UT = n | Pr = y) P(Pr = y) = 0.36 × 0.27 × 0.87,
and
    P(Pr = n | BT = n, UT = n) = (0.894 × 0.893 × 0.13) / (0.894 × 0.893 × 0.13 + 0.36 × 0.27 × 0.87) ≈ 0.55.

2nd model, with the mediating variable Ho: Pr is the parent of Ho and Sc, and Ho is the parent of BT and UT, with
    P(Ho = y | Pr = y) = 0.9,  P(Ho = n | Pr = y) = 0.1,
    P(Ho = y | Pr = n) = 0.01, P(Ho = n | Pr = n) = 0.99,
    P(BT = y | Ho = y) = 0.7, P(BT = n | Ho = y) = 0.3,
    P(BT = y | Ho = n) = 0.1, P(BT = n | Ho = n) = 0.9,
    P(UT = y | Ho = y) = 0.8, P(UT = n | Ho = y) = 0.2,
    P(UT = y | Ho = n) = 0.1, P(UT = n | Ho = n) = 0.9.

We calculate P(Pr = n | BT = n, UT = n) in the 2nd model:
    P(Pr = n, BT = n, UT = n) = Σ_{x ∈ {y,n}} P(BT = n | Ho = x) P(UT = n | Ho = x) P(Ho = x | Pr = n) P(Pr = n)
                              = 0.3 × 0.2 × 0.01 × 0.13 + 0.9 × 0.9 × 0.99 × 0.13 ≈ 0.1043,
    P(Pr = y, BT = n, UT = n) = Σ_{x ∈ {y,n}} P(BT = n | Ho = x) P(UT = n | Ho = x) P(Ho = x | Pr = y) P(Pr = y)
                              = 0.3 × 0.2 × 0.9 × 0.87 + 0.9 × 0.9 × 0.1 × 0.87 ≈ 0.1175,
and
    P(Pr = n | BT = n, UT = n) ≈ 0.1043 / (0.1043 + 0.1175) ≈ 0.47
(compared with the naive Bayes prediction of 0.55).
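The two posteriors can be reproduced in a few lines (a minimal Python sketch using only the tables above; the dictionary names are shortened for readability):

```python
# Insemination example: P(Pr = n | BT = n, UT = n) under the naive Bayes model
# and under the 2nd model with the mediating hormonal state Ho.
P_Pr     = {"y": 0.87, "n": 0.13}
P_Ho_Pr  = {("y", "y"): 0.90, ("n", "y"): 0.10,   # P(Ho | Pr); keys are (ho, pr)
            ("y", "n"): 0.01, ("n", "n"): 0.99}
P_BTn_Ho = {"y": 0.3, "n": 0.9}                   # P(BT = n | Ho)
P_UTn_Ho = {"y": 0.2, "n": 0.9}                   # P(UT = n | Ho)
P_BTn_Pr = {"y": 0.36, "n": 0.894}                # naive Bayes: P(BT = n | Pr)
P_UTn_Pr = {"y": 0.27, "n": 0.893}                # naive Bayes: P(UT = n | Pr)

# Naive Bayes: the two negative tests are counted as independent evidence given Pr.
naive = {pr: P_BTn_Pr[pr] * P_UTn_Pr[pr] * P_Pr[pr] for pr in "yn"}
print("naive Bayes:", naive["n"] / (naive["y"] + naive["n"]))    # about 0.55

# 2nd model: sum out the hormonal state Ho.
with_ho = {pr: sum(P_BTn_Ho[ho] * P_UTn_Ho[ho] * P_Ho_Pr[(ho, pr)] * P_Pr[pr]
                   for ho in "yn")
           for pr in "yn"}
print("with Ho:", with_ho["n"] / (with_ho["y"] + with_ho["n"]))  # about 0.47
```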

Example II: Insemination (continued)

A mediating variable can play another important role, as shown here. There is dependence between the results of the blood test (BT) and the urine test (UT), but there is no causal direction in the dependence, and we are also unwilling to introduce a directed edge between the two variables. By introducing a mediating variable as their common parent, we create an association, like an undirected edge, between BT and UT whenever the mediating variable is not observed.

Example III: Stud Farm

Description: A stud farm has 10 horses. Their genealogical structure is shown below. Ann is the mother of both Fred and Gwenn, but their fathers are unrelated and unknown. Every horse may have one of three genotypes: it may be sick (aa), a carrier (Aa), or pure (AA). None of the horses are sick except for John, who has been born recently. As the disease is so serious, the farm wants to find out the probabilities of the remaining horses being carriers of the unwanted gene.

Hypothesis variables: the genotypes of all the horses except John.
Information variable: John's genotype.
Additional information: none of the other horses are sick.
Mediating variables: the genotypes of the two unknown fathers (L and K).

[Genealogy figure over the horses Ann, Brian, Cecily, Dorothy, Eric, Fred, Gwenn, Henry, Irene, and John.]

Introduce the mediating variables L and K (the unknown fathers of Fred and Gwenn) into the graph.

We can also add evidence variables to represent the additional information that the other horses are not sick: for each such horse X, an evidence variable e-X with the value {not aa} (e.g. e-F for Fred, e-D for Dorothy, e-H for Henry; the graph is only partially shown in the slides).

The probabilities P(child | father, mother) of a child's genetic inheritance, with each entry giving the probabilities of the child being (aa, Aa, AA):

    father \ mother   aa               Aa                  AA
    aa                (1, 0, 0)        (0.5, 0.5, 0)       (0, 1, 0)
    Aa                (0.5, 0.5, 0)    (0.25, 0.5, 0.25)   (0, 0.5, 0.5)
    AA                (0, 1, 0)        (0, 0.5, 0.5)       (0, 0, 1)

Prior probability of a horse being a carrier or pure: P(Aa) = 0.01, P(AA) = 0.99.

Now we have specified the model. (How do we compute the probability of a horse being a carrier given all the evidence?)
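As a small illustration of how the inheritance table and the prior combine, here is a minimal sketch; the particular query (the genotype of a foal whose parents are both horses with unknown pedigree, before and after observing that the foal is not sick) is made up and is not one of the queries posed in the example.

```python
# Genetic inheritance table from the Stud Farm example: P(child | father, mother),
# entries ordered as (aa, Aa, AA).  The query at the end is only an illustration.
genotypes = ("aa", "Aa", "AA")
inherit = {
    ("aa", "aa"): (1.00, 0.00, 0.00), ("aa", "Aa"): (0.50, 0.50, 0.00), ("aa", "AA"): (0.00, 1.00, 0.00),
    ("Aa", "aa"): (0.50, 0.50, 0.00), ("Aa", "Aa"): (0.25, 0.50, 0.25), ("Aa", "AA"): (0.00, 0.50, 0.50),
    ("AA", "aa"): (0.00, 1.00, 0.00), ("AA", "Aa"): (0.00, 0.50, 0.50), ("AA", "AA"): (0.00, 0.00, 1.00),
}
prior = {"aa": 0.0, "Aa": 0.01, "AA": 0.99}   # a horse with unknown parents (e.g. L or K)

# Genotype distribution of a foal whose parents both follow the prior distribution.
foal = {g: 0.0 for g in genotypes}
for f in genotypes:
    for m in genotypes:
        for g, p in zip(genotypes, inherit[(f, m)]):
            foal[g] += prior[f] * prior[m] * p
print(foal)   # P(foal = aa) = 0.25 * 0.01**2 = 2.5e-05

# Conditioning on the evidence "foal is not sick" (the e-variables in the slides):
not_sick = foal["Aa"] + foal["AA"]
print({g: (foal[g] / not_sick if g != "aa" else 0.0) for g in genotypes})
```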

Modeling Tricks: Outline of the List

- For handling undirected relations and constraints: mediating variables.
- For reducing the number of parameters in the model: noisy-or and its variants; divorcing (grouping) parents.
- For handling expert disagreement and model adaptation.

Handling Undirected Relations and Constraints

Suppose A, B, C are variables that do not have parents. They are marginally dependent with PMF r(a, b, c), but it is undesirable to link them with directed edges.

Introduce a variable D with parents A, B, C, and define
    P(D = y | A = a, B = b, C = c) = r(a, b, c),
    P(D = n | A = a, B = b, C = c) = 1 − r(a, b, c).
Let P(A), P(B), P(C) be uniform distributions. When using the network, we always enter the evidence D = y. Then
    P(A = a, B = b, C = c | D = y) = r(a, b, c).

Constraints can be handled similarly; in this case A, B, C can also have parents. If they have to satisfy f(A, B, C) = 0, we let
    P(D = y | A = a, B = b, C = c) = 1 if f(a, b, c) = 0, and 0 otherwise,
and we always set D = y. (For an example, see the washed-socks example in reference [1].)

Noisy-Or Gate

A typical network for modeling causes and consequences, e.g.:
- {X_i}: the presences of n possible diseases,
- Y: the symptom,
with X_1, X_2, ..., X_n all parents of Y.

Difficulty in building/using the model: the number of parameters grows exponentially with the number of parents. Even if the variables are all binary, the size of the conditional probability table p(y | x_1, ..., x_n) is 2^(n+1), with 2^n free parameters.

The noisy-or trick is useful for reducing the number of parameters when each cause is thought to act independently.
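Before turning to the noisy-or gate in detail, here is a small numerical sketch of the undirected-relation trick above; the three binary variables and the PMF r are made up for illustration, and conditioning on D = y recovers exactly r.

```python
from itertools import product

# A, B, C are binary with uniform priors; D is a dummy child whose CPT encodes r.
vals = "yn"
r = {abc: 0.0 for abc in product(vals, repeat=3)}        # a made-up PMF r(a, b, c)
r[("y", "y", "y")] = 0.4
r[("y", "n", "n")] = 0.3
r[("n", "y", "n")] = 0.2
r[("n", "n", "y")] = 0.1                                  # sums to 1 over all 8 cells

prior = (1 / 2) ** 3                                      # uniform P(A) P(B) P(C)
p_evidence = sum(prior * r[abc] for abc in product(vals, repeat=3))   # P(D = y) = 1/8
posterior = {abc: prior * r[abc] / p_evidence for abc in product(vals, repeat=3)}

print(posterior[("y", "y", "y")], posterior[("n", "n", "y")])  # 0.4 and 0.1, i.e. r again
```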

Noisy-Or Gate (continued)

Noisy-or assumption (binary case): each event X_i = 1 can cause Y = 1 unless an inhibitor prevents it; the inhibition probability is q_i, and the inhibitors are independent. The conditional probabilities of the events y_0 (Y = 0) and y_1 (Y = 1) are then
    p(y_0 | x_1, ..., x_n) = ∏_{i: x_i = 1} q_i,
    p(y_1 | x_1, ..., x_n) = 1 − ∏_{i: x_i = 1} q_i.
The number of parameters is now linear in the number of parents.

The corresponding graphical model: each X_i has a child Y_i, which equals 1 with probability 1 − q_i when X_i = 1, and Y = OR(Y_1, Y_2, ..., Y_n).

Generalizations: noisy-and, noisy-max, noisy functional dependence.

Noisy-Or: Family-Out? Example

In the Family-out? example, to assign the probabilities P(D-out | A, BP), I reason that there are three causes for the dog to be out, and if any one of them is present, the dog is out:
- the background event that the dog simply wants to be out: probability 0.2;
- A = y, which causes the dog to be out with probability 0.85;
- BP = y, which causes the dog to be out with probability 0.95.
Then P(D-out = y | A, BP) is given by
    A = y, BP = y: 1 − 0.8 × 0.15 × 0.05    A = y, BP = n: 1 − 0.8 × 0.15
    A = n, BP = y: 1 − 0.8 × 0.05           A = n, BP = n: 1 − 0.8
which gives
    A = y, BP = y: 0.994    A = y, BP = n: 0.88
    A = n, BP = y: 0.96     A = n, BP = n: 0.2

Noisy Functional Dependence: Headache Example

Headache (Ha) may be caused by fever (Fe), hangover (Ho), fibrositis (Fb), brain tumor (Bt), and other causes (Ot). Ha has the states no, mild, moderate, and severe. The various causes support each other in the effect; we still feel, however, that the impacts of the causes are independent: if the headache is at some level and we add an extra cause for headache, then we expect the resulting level to depend only on the current level and the new cause, independently of how the initial state has been caused. We want to combine the effects of the various causes: each cause Z ∈ {Ot, Fe, Ho, Fb, Bt} has a child Ha-Z representing its individual contribution, and the node Ha adds up these numbers.

Mediating Variables for Divorcing Parents

[Figure: a variable with parents 1, 2, 3, 4, before and after the parents are divorced into groups via mediating variables.]

Example: Bank Loan. To help the bank decide when a customer applies for a mortgage, the customer is asked to fill in a form giving information on: type of job, yearly income, other financial commitments, number and type of cars in the family, size and age of the house, price of the house, number of previous addresses during the last five years, number of children in the family, number of divorces, and number of children not living in the family. We can partition these 10 variables into groups feeding three mediating variables describing: economic potential, stability, and security of the mortgage.
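Returning to the table P(D-out | A, BP) above: it can be generated directly from the three activation probabilities (a minimal sketch of the noisy-or computation; the always-present background cause is treated simply as one more independent cause).

```python
# Noisy-or: P(D-out = y | causes) = 1 - product of the inhibitor probabilities q_i
# of the active causes.  Activation probabilities (1 - q_i) from the Family-out example:
#   background (always active): 0.2,   A = y: 0.85,   BP = y: 0.95.
q = {"background": 1 - 0.20, "A": 1 - 0.85, "BP": 1 - 0.95}

def p_dog_out(a, bp):
    active = ["background"] + (["A"] if a == "y" else []) + (["BP"] if bp == "y" else [])
    inhibited = 1.0
    for cause in active:
        inhibited *= q[cause]        # the independent inhibitors must all act
    return 1.0 - inhibited

for a in "yn":
    for bp in "yn":
        print(f"P(D-out = y | A = {a}, BP = {bp}) = {p_dog_out(a, bp):.3f}")
# Reproduces the table above: 0.994, 0.880, 0.960, 0.200.
```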

Headache Example with Divorcing

[Figure: the contribution nodes Ha-Ot, Ha-Fe, Ha-Ho, Ha-Fb, and Ha-Bt are combined in stages through intermediate nodes (Ha-x, Ha-y, Ha-z) before reaching Ha.]

Expert Disagreement and Model Adaptation

Suppose two experts agree on the model structure over A, B, C, D, but disagree on the probabilities P(A) and P(D | B, C). We can add a node S representing the experts, with states s ∈ {1, 2}, as an extra parent of the disputed nodes, and express our confidence in the experts via P(S).

Similarly, we can prepare for adaptation of the model parameters based on data by introducing a type variable T and copying the other variables for each case (Case 1, Case 2, ...), with T a common parent of the copies.

Further Readings

For an introduction to building models:
1. Finn V. Jensen. An Introduction to Bayesian Networks. UCL Press, 1996. Chap. 3.
2. Finn V. Jensen and Thomas D. Nielsen. Bayesian Networks and Decision Graphs. Springer, 2007. Chap. 3.
Reference [2] describes more advanced models such as object-oriented Bayesian networks.