Rapid Introduction to Machine Learning / Deep Learning
Hyeong In Choi, Seoul National University
Lecture 5a: Bayesian Network
April 14, 2016
Table of contents
1. Objectives of Lecture 5a
2. Bayesian network (BN)
   2.1. Definition of Bayesian network
   2.2. Basics of BN
   2.3. Basic building blocks of BN
   2.4. D-separation
   2.5. Preview of Deep Belief Network (DBN)
   2.6. Markov chain as BN
   2.7. HMM as BN
1. Objectives of Lecture 5a
Objective 1: Learn the minimal Bayesian network formalism necessary for understanding the deep belief network
Objective 2: Learn how dependence and independence are encoded in the directed acyclic graph structure
Objective 3: Learn the D-separation theorem
Objective 4: Understand how to view the Markov chain and the hidden Markov model in the BN framework
Objective 5: Present a very simple example as a preview of how the deep belief network works
2. Bayesian network (BN)
2.1. Definition of Bayesian network
A Bayesian network consists of a DAG (directed acyclic graph) G in which each node represents a random variable, together with a joint probability distribution that factorizes as
P(a_1, …, a_n) = ∏_{i=1}^n P(a_i | pa(a_i)),
where pa(a_i) denotes the parent nodes of a_i. A BN models probabilistic causal structure.
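The factorization above can be evaluated directly from the CPTs. Below is a minimal sketch (the data layout and the toy network a → b with its numbers are assumptions, not from the slides):

```python
# A Bayesian network as parent lists plus CPTs. Each CPT maps
# (node value, tuple of parent values) -> probability.

def joint_probability(parents, cpts, assignment):
    """Evaluate P(a_1,...,a_n) = prod_i P(a_i | pa(a_i))."""
    p = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p *= cpts[node][(assignment[node], pa_values)]
    return p

# Toy two-node network a -> b with binary variables.
parents = {"a": (), "b": ("a",)}
cpts = {
    "a": {(0, ()): 0.5, (1, ()): 0.5},
    "b": {(0, (0,)): 0.9, (1, (0,)): 0.1,
          (0, (1,)): 0.2, (1, (1,)): 0.8},
}
print(joint_probability(parents, cpts, {"a": 1, "b": 1}))  # 0.5 * 0.8 = 0.4
```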
2.2. Basics of BN
Example: A, B, C are random variables with values a, b, c respectively, and P(a, b) is shorthand for P(A = a, B = b).
Chain rule:
P(a, b) = P(a)P(b | a)
P(a, b) = P(b)P(a | b)
P(a, b, c) = P(A = a, B = b, C = c) = P(a)P(b | a)P(c | a, b)
P(a, b, c) = P(c)P(b | c)P(a | c, b)
Probabilistic causality (inference): P(a, b) = P(a)P(b | a)
[Tables of the marginal probability P(a) and the conditional probability table (CPT) of P(b | a) are not reproduced here.]
Evidence = conditioning
If a = 1, knowledge of a gives better knowledge of b, i.e. P(b = 1 | a = 1) = 0.8.
[CPT of P(a | b) not reproduced here.]
Knowledge of b also gives better knowledge of a, i.e. P(a = 0 | b = 0) = 0.75.
Thus information flows both ways, even though the arrow is directed from a to b.
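This two-way flow is just Bayes' rule. A small sketch: the value P(b=1 | a=1) = 0.8 comes from the slide, but the prior P(a=1) = 0.5 and P(b=1 | a=0) = 0.3 are assumed stand-ins for the missing table:

```python
def posterior_a_given_b(p_a1, p_b1_given_a, b):
    """Return P(a=1 | B=b) for binary a, b via Bayes' rule."""
    p_a = {1: p_a1, 0: 1.0 - p_a1}
    p_b_given = {a: (p_b1_given_a[a] if b == 1 else 1.0 - p_b1_given_a[a])
                 for a in (0, 1)}
    evidence = sum(p_b_given[a] * p_a[a] for a in (0, 1))
    return p_b_given[1] * p_a[1] / evidence

# P(b=1|a=1) = 0.8 (slide); P(b=1|a=0) = 0.3 and P(a=1) = 0.5 are assumptions.
print(posterior_a_given_b(0.5, {1: 0.8, 0: 0.3}, b=1))  # ≈ 0.727 > 0.5
```

Observing b = 1 raises the belief in a = 1 above its prior 0.5: evidence on the child informs the parent.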
2.3. Basic building blocks of BN
Example: head-head node (c)
P(a, b, c) = P(a)P(b)P(c | a, b)
P(a, b) = Σ_c P(a, b, c) = P(a)P(b) Σ_c P(c | a, b) = P(a)P(b)
Thus A and B are independent (denoted A ⊥ B). But in general P(a, b | c) ≠ P(a | c)P(b | c), i.e. A ⊥̸ B | C.
Case: c is unknown. Since A ⊥ B, there is no information flow between a and b.
Case: c is known. Since A ⊥̸ B | C, information flows between a and b.
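This "explaining away" effect can be checked numerically. A sketch on a head-head network a → c ← b; all CPT numbers here are illustrative assumptions:

```python
from itertools import product

p_a = {0: 0.5, 1: 0.5}
p_b = {0: 0.5, 1: 0.5}
p_c1 = {  # P(c=1 | a, b): c tends to fire when either parent does
    (0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99,
}

def joint(a, b, c):
    pc = p_c1[(a, b)] if c == 1 else 1.0 - p_c1[(a, b)]
    return p_a[a] * p_b[b] * pc

def prob(pred):
    return sum(joint(a, b, c)
               for a, b, c in product((0, 1), repeat=3) if pred(a, b, c))

# Marginally independent: P(a=1, b=1) = P(a=1) P(b=1) = 0.25
print(prob(lambda a, b, c: a == 1 and b == 1))  # ≈ 0.25
# Dependent given c: conditioning on b=1 "explains away" c=1, lowering P(a=1)
p_a1_c1 = prob(lambda a, b, c: a == 1 and c == 1) / prob(lambda a, b, c: c == 1)
p_a1_c1b1 = (prob(lambda a, b, c: a == 1 and b == 1 and c == 1)
             / prob(lambda a, b, c: b == 1 and c == 1))
print(p_a1_c1, p_a1_c1b1)  # the second is strictly smaller
```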
Case: d is known, where d is a descendant of c. Knowledge of d creates an information flow from d to c, and hence some information flows between a and b. Thus A ⊥̸ B | D, and likewise A ⊥̸ B | C.
Example: tail-tail node (c)
P(a, b, c) = P(a | c)P(b | c)P(c)
P(a, b | c) = P(a, b, c) / P(c) = P(a | c)P(b | c), i.e. A ⊥ B | C.
But in general P(a, b) ≠ P(a)P(b), i.e. A ⊥̸ B.
Case: c is unknown. Information flows from a to c to b, and also from b to c to a.
Case: c is known. Since A ⊥ B | C, knowledge of a does not give any further knowledge of b, and vice versa.
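The tail-tail claims can also be verified numerically. A sketch with a common cause c of a and b; the CPT numbers are illustrative assumptions:

```python
from itertools import product

p_c = {0: 0.5, 1: 0.5}
p_a1_c = {0: 0.2, 1: 0.9}   # P(a=1 | c)
p_b1_c = {0: 0.3, 1: 0.8}   # P(b=1 | c)

def joint(a, b, c):
    pa = p_a1_c[c] if a == 1 else 1 - p_a1_c[c]
    pb = p_b1_c[c] if b == 1 else 1 - p_b1_c[c]
    return p_c[c] * pa * pb

def prob(pred):
    return sum(joint(a, b, c)
               for a, b, c in product((0, 1), repeat=3) if pred(a, b, c))

# Marginally dependent: P(a=1, b=1) != P(a=1) P(b=1)
print(prob(lambda a, b, c: a == 1 and b == 1),
      prob(lambda a, b, c: a == 1) * prob(lambda a, b, c: b == 1))
# Conditionally independent given c=1:
# P(a=1, b=1 | c=1) == P(a=1 | c=1) P(b=1 | c=1)
pc1 = prob(lambda a, b, c: c == 1)
print(prob(lambda a, b, c: a == 1 and b == 1 and c == 1) / pc1,
      (prob(lambda a, b, c: a == 1 and c == 1) / pc1)
      * (prob(lambda a, b, c: b == 1 and c == 1) / pc1))
```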
Example: head-tail node (b)
P(a, b, c) = P(a)P(b | a)P(c | b)
P(a, c | b) = P(a, b, c) / P(b) = P(a)P(b | a)P(c | b) / P(b) = P(a, b)P(c | b) / P(b) = P(a | b)P(c | b),
i.e. A ⊥ C | B. But in general P(a, c) ≠ P(a)P(c), i.e. A ⊥̸ C.
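The same kind of numeric check works for the chain a → b → c. The CPT numbers below are illustrative assumptions:

```python
from itertools import product

p_a = {0: 0.4, 1: 0.6}
p_b1_a = {0: 0.1, 1: 0.8}   # P(b=1 | a)
p_c1_b = {0: 0.2, 1: 0.7}   # P(c=1 | b)

def joint(a, b, c):
    pb = p_b1_a[a] if b == 1 else 1 - p_b1_a[a]
    pc = p_c1_b[b] if c == 1 else 1 - p_c1_b[b]
    return p_a[a] * pb * pc

def prob(pred):
    return sum(joint(a, b, c)
               for a, b, c in product((0, 1), repeat=3) if pred(a, b, c))

pb1 = prob(lambda a, b, c: b == 1)
# Given b=1: P(a=1, c=1 | b=1) == P(a=1 | b=1) P(c=1 | b=1)
print(prob(lambda a, b, c: a == 1 and b == 1 and c == 1) / pb1,
      (prob(lambda a, b, c: a == 1 and b == 1) / pb1)
      * (prob(lambda a, b, c: c == 1 and b == 1) / pb1))
# Marginally: P(a=1, c=1) != P(a=1) P(c=1)
print(prob(lambda a, b, c: a == 1 and c == 1),
      prob(lambda a, b, c: a == 1) * prob(lambda a, b, c: c == 1))
```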
2.4. D-separation
A node along a path is of head-tail, tail-tail, or head-head type, according to the directions of the two edges of the path meeting at it.
A path from one node to another is said to be blocked by a set A of nodes if one of the following holds:
(i) the path contains a node t ∈ A that is of head-tail or tail-tail type along the path;
(ii) the path contains a head-head node such that neither the node itself nor any of its descendants is in A.
D-separation Theorem. Let A, B, C be mutually disjoint sets of nodes. Suppose every path from a node of A to a node of B is blocked by C. Then A ⊥ B | C, i.e. the random variables represented by A are conditionally independent of the random variables represented by B, given the random variables represented by C.
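The blocking rules translate directly into code. A small sketch that enumerates every undirected path between two nodes (fine for small DAGs) and applies rules (i) and (ii); the graph representation is an assumption:

```python
def descendants(children, node):
    """All strict descendants of node in the DAG."""
    out, stack = set(), [node]
    while stack:
        for ch in children.get(stack.pop(), ()):
            if ch not in out:
                out.add(ch)
                stack.append(ch)
    return out

def d_separated(edges, x, y, given):
    """True iff every path from x to y is blocked by the set `given`."""
    children, neighbors = {}, {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    directed = set(edges)

    def blocked(path):
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            head_head = (prev, node) in directed and (nxt, node) in directed
            if head_head:
                # rule (ii): head-head node with no conditioned descendant
                if node not in given and not (descendants(children, node) & given):
                    return True
            elif node in given:
                # rule (i): head-tail or tail-tail node in the conditioning set
                return True
        return False

    def all_paths(cur, path):
        if cur == y:
            yield path
            return
        for nb in neighbors.get(cur, ()):
            if nb not in path:
                yield from all_paths(nb, path + [nb])

    return all(blocked(p) for p in all_paths(x, [x]))

# Head-head example a -> c <- b: a, b are d-separated by {} but not by {c}.
edges = [("a", "c"), ("b", "c")]
print(d_separated(edges, "a", "b", set()))   # True
print(d_separated(edges, "a", "b", {"c"}))   # False
```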
2.5. Preview of Deep Belief Network (DBN)
Example: P(x_1, x_2, h) = P(h)P(x_1 | h)P(x_2 | h)
Scenario (output/image reconstruction):
When h = 1, it is very likely that (x_1, x_2) = (1, 0).
When h = 0, x_1 and x_2 are random.
[CPT of P(x_1 | h) not reproduced here.]
[CPT of P(x_2 | h) and table of P(h) not reproduced here.]
By d-separation, P(x_1, x_2 | h) = P(x_1 | h)P(x_2 | h).
[Table of P(x_1, x_2 | h) not reproduced here.]
[Table of P(x_1, x_2, h) not reproduced here.]
[Tables of P(x_1, x_2) and P(h | x_1, x_2) not reproduced here.]
When (x_1, x_2) = (1, 0), h = 1 with probability 0.74.
When (x_1, x_2) = (0, 1), h = 0 with probability 0.93.
When (x_1, x_2) = (1, 1), h = 0 with slightly higher probability.
When (x_1, x_2) = (0, 0), h = 0 with probability 0.76.
It is more likely that (x_1, x_2) = (1, 0) produces h = 1, and every other output produces h = 0.
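The posterior P(h | x_1, x_2) follows from the factorization by Bayes' rule. Since the slide's CPTs are not reproduced above, the numbers below are assumed stand-ins chosen to match the scenario (h = 1 makes (x_1, x_2) = (1, 0) likely; under h = 0 the outputs are random), so the posteriors only qualitatively match the values quoted:

```python
p_h1 = 0.5
p_x1_h = {1: 0.9, 0: 0.5}   # P(x1=1 | h), assumed
p_x2_h = {1: 0.1, 0: 0.5}   # P(x2=1 | h), assumed

def posterior_h(x1, x2):
    """P(h=1 | x1, x2) via P(h) P(x1|h) P(x2|h)."""
    def lik(h):
        px1 = p_x1_h[h] if x1 == 1 else 1 - p_x1_h[h]
        px2 = p_x2_h[h] if x2 == 1 else 1 - p_x2_h[h]
        return (p_h1 if h == 1 else 1 - p_h1) * px1 * px2
    return lik(1) / (lik(1) + lik(0))

for x1, x2 in [(1, 0), (0, 1), (1, 1), (0, 0)]:
    print((x1, x2), round(posterior_h(x1, x2), 3))
```

Only (x_1, x_2) = (1, 0) yields a posterior above 0.5, reproducing the qualitative conclusion of the slide.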
2.6. Markov chain as BN
Example 1: Markov chain
Markov property: P(x_t | x_1, …, x_{t-1}) = P(x_t | x_{t-1})
Joint probability:
P(x_1, …, x_n) = P(x_1)P(x_2 | x_1)P(x_3 | x_1, x_2) ⋯ P(x_n | x_1, …, x_{n-1})
             = P(x_1)P(x_2 | x_1)P(x_3 | x_2) ⋯ P(x_n | x_{n-1})
A Markov chain is thus an example of a Bayesian network.
Example 2: Markov chain of order 2
P(x_t | x_1, …, x_{t-1}) = P(x_t | x_{t-1}, x_{t-2})
Thus P(x_1, …, x_n) = P(x_1)P(x_2 | x_1)P(x_3 | x_1, x_2)P(x_4 | x_2, x_3) ⋯ P(x_n | x_{n-1}, x_{n-2}).
A Markov chain of order 2 is also an example of a Bayesian network.
Similarly, one can define a Markov chain of order k for any k.
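The first-order factorization can be sketched directly in code; the states and transition probabilities below are hypothetical:

```python
import random

states = ["rain", "sun"]
p_init = {"rain": 0.5, "sun": 0.5}
p_trans = {"rain": {"rain": 0.7, "sun": 0.3},
           "sun":  {"rain": 0.2, "sun": 0.8}}

def chain_probability(path):
    """P(x_1,...,x_n) = P(x_1) prod_t P(x_t | x_{t-1})."""
    p = p_init[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= p_trans[prev][cur]
    return p

def sample_chain(n, seed=0):
    """Draw a length-n trajectory by sampling each node given its parent."""
    rng = random.Random(seed)
    x = rng.choices(states, weights=[p_init[s] for s in states])[0]
    path = [x]
    for _ in range(n - 1):
        x = rng.choices(states, weights=[p_trans[x][s] for s in states])[0]
        path.append(x)
    return path

print(chain_probability(["rain", "rain", "sun"]))  # ≈ 0.5 * 0.7 * 0.3 = 0.105
```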
2.7. HMM as BN
Example 3: In a Hidden Markov Model (HMM), the hidden states z_t form a Markov chain, and the observation (output) x_t depends on z_t.
Joint probability:
P(z_1, …, z_n, x_1, …, x_n) = P(z_1)P(z_2 | z_1) ⋯ P(z_n | z_{n-1}) · P(x_1 | z_1)P(x_2 | z_2) ⋯ P(x_n | z_n)
An HMM is an example of a Bayesian network. One can also define HMMs of any order.
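The HMM joint probability above can be sketched as follows; the transition and emission tables are hypothetical:

```python
p_z1 = {0: 0.6, 1: 0.4}                                        # P(z_1)
p_trans = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}           # P(z_t | z_{t-1})
p_emit = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.2, "b": 0.8}}    # P(x_t | z_t)

def hmm_joint(zs, xs):
    """P(z_1..z_n, x_1..x_n) = P(z_1) prod P(z_t|z_{t-1}) prod P(x_t|z_t)."""
    p = p_z1[zs[0]]
    for prev, cur in zip(zs, zs[1:]):
        p *= p_trans[prev][cur]
    for z, x in zip(zs, xs):
        p *= p_emit[z][x]
    return p

print(hmm_joint([0, 1], ["a", "b"]))  # 0.6 * 0.3 * 0.9 * 0.8
```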