Rapid Introduction to Machine Learning/ Deep Learning


Hyeong In Choi, Seoul National University

Lecture 5a: Bayesian Networks (April 14, 2016)

Table of contents
1. Objectives of Lecture 5a
2. Bayesian network (BN)
   2.1. Definition of Bayesian network
   2.2. Basics of BN
   2.3. Basic building blocks of BN
   2.4. D-separation
   2.5. Preview of Deep Belief Network (DBN)
   2.6. Markov chain as BN
   2.7. HMM as BN

1. Objectives of Lecture 5a
Objective 1: Learn the minimal Bayesian network formalism necessary for understanding the deep belief network.
Objective 2: Learn how dependence and independence are encoded in the directed acyclic graph structure.
Objective 3: Learn the D-separation theorem.

Objective 4: Understand how to view Markov chains and hidden Markov models in the BN framework.
Objective 5: Work through a very simple example as a preview of how a deep belief network works.

2. Bayesian network (BN)

2.1. Definition of Bayesian network
A Bayesian network is given by a DAG (directed acyclic graph) G in which each node represents a random variable. The joint probability distribution factorizes as

    P(a_1, …, a_n) = ∏_{i=1}^n P(a_i | pa(a_i)),

where pa(a_i) denotes the parent nodes of a_i. A BN models probabilistic causal structure.
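The factorization can be made concrete with a tiny sketch. The three-node DAG A → B, A → C and all CPT numbers below are hypothetical, chosen only to show how the joint probability is assembled from the per-node factors P(a_i | pa(a_i)).

```python
# Sketch of the BN factorization P(a1,...,an) = prod_i P(ai | pa(ai))
# on a hypothetical DAG A -> B, A -> C with binary variables.

P_A = {0: 0.4, 1: 0.6}                      # P(a)
P_B_given_A = {(0, 0): 0.9, (1, 0): 0.1,    # key: (b, a), value: P(b | a)
               (0, 1): 0.3, (1, 1): 0.7}
P_C_given_A = {(0, 0): 0.5, (1, 0): 0.5,    # key: (c, a), value: P(c | a)
               (0, 1): 0.2, (1, 1): 0.8}

def joint(a, b, c):
    """P(a, b, c) = P(a) * P(b | a) * P(c | a): B and C each have parent A."""
    return P_A[a] * P_B_given_A[(b, a)] * P_C_given_A[(c, a)]

# Sanity check: the factorized joint sums to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)
```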

2.2. Basics of BN
Example: A, B, C are random variables with values a, b, and c respectively, and P(a, b) is shorthand for P(A = a, B = b).
Chain rule:
    P(a, b) = P(a) P(b | a)
    P(a, b) = P(b) P(a | b)

    P(a, b, c) = P(A = a, B = b, C = c) = P(a) P(b | a) P(c | a, b)
    P(a, b, c) = P(c) P(b | c) P(a | c, b)

Probabilistic causality (inference): P(a, b) = P(a) P(b | a)

[Slide shows the marginal probability table and the Conditional Probability Table (CPT) of P(b | a).]

Evidence = conditioning. If a = 1, then P(b = 1 | a = 1) = 0.8: knowledge of a gives better knowledge of b. Conversely, from the CPT of P(a | b), knowledge of b also gives better knowledge of a, e.g. P(a = 0 | b = 0) = 0.75.

Thus information flows both ways, even though the arrow points from a to b.

2.3. Basic building blocks of BN
Example: head-head node (c), i.e. a → c ← b.
    P(a, b, c) = P(a) P(b) P(c | a, b)
    P(a, b) = Σ_c P(a, b, c) = P(a) P(b) Σ_c P(c | a, b) = P(a) P(b)

Thus A and B are independent (denoted A ⊥ B). But in general P(a, b | c) ≠ P(a | c) P(b | c), i.e. A ⊥̸ B | C.
Case: c is unknown. Since A ⊥ B, there is no information flow between a and b.
Case: c is known. Since A ⊥̸ B | C, information flows between a and b.
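The head-head (collider) behavior above can be verified numerically. The CPT values below are made up for illustration (c is a noisy XOR of a and b); the point is only the two checks at the end: A and B are marginally independent, but become dependent once c is observed.

```python
# Head-head (collider) network a -> c <- b with hypothetical binary CPTs.
P_a = {0: 0.5, 1: 0.5}
P_b = {0: 0.5, 1: 0.5}
# P_c[(c, a, b)] = P(c | a, b); c is close to a XOR b (made-up numbers).
P_c = {(1, a, b): (0.99 if a != b else 0.01) for a in (0, 1) for b in (0, 1)}
P_c.update({(0, a, b): 1 - P_c[(1, a, b)] for a in (0, 1) for b in (0, 1)})

def joint(a, b, c):
    """P(a, b, c) = P(a) P(b) P(c | a, b) for the collider a -> c <- b."""
    return P_a[a] * P_b[b] * P_c[(c, a, b)]

# Marginally: P(a, b) = sum_c P(a, b, c) = P(a) P(b), i.e. A is independent of B.
pab = sum(joint(1, 1, c) for c in (0, 1))
print(abs(pab - P_a[1] * P_b[1]) < 1e-12)  # True

# Conditioning on c couples a and b ("explaining away"): A not indep. of B given C.
pc1 = sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))
p_ab_c = joint(1, 1, 1) / pc1
p_a_c = sum(joint(1, b, 1) for b in (0, 1)) / pc1
p_b_c = sum(joint(a, 1, 1) for a in (0, 1)) / pc1
print(abs(p_ab_c - p_a_c * p_b_c) > 1e-3)  # True
```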

Case: d (a descendant of c) is known. Knowledge of d creates an information flow from d to c, and thus some information flows between a and b. Hence A ⊥̸ B | D.

Example: tail-tail node (c), i.e. a ← c → b.
    P(a, b, c) = P(a | c) P(b | c) P(c)
    P(a, b | c) = P(a, b, c) / P(c) = P(a | c) P(b | c), i.e. A ⊥ B | C.
But in general P(a, b) ≠ P(a) P(b), i.e. A ⊥̸ B.

Case: c is unknown. Information flows from a to c to b, and also from b to c to a.
Case: c is known. Since A ⊥ B | C, knowledge of a does not give any further knowledge of b, and vice versa.

Example: head-tail node (b), i.e. a → b → c.
    P(a, b, c) = P(a) P(b | a) P(c | b)
    P(a, c | b) = P(a, b, c) / P(b) = P(a) P(b | a) P(c | b) / P(b)
                = P(a, b) P(c | b) / P(b) = P(a | b) P(c | b),
i.e. A ⊥ C | B. But in general P(a, c) ≠ P(a) P(c), i.e. A ⊥̸ C.

2.4. D-separation
Blocked path: a path from one node to another is said to be blocked by a set A of nodes if one of the following holds:
(i) the path contains a head-tail or tail-tail node t with t ∈ A;
(ii) the path contains a head-head node such that neither the node itself nor any of its descendants is in A.

D-separation Theorem. Let A, B, C be mutually disjoint sets of nodes. Suppose every path from a node of A to a node of B is blocked by C. Then A ⊥ B | C, i.e. the random variables represented by A are conditionally independent of those represented by B, given the random variables represented by C.
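The blocking definition translates directly into code. The checker below is a sketch (the function name and representation are mine, not from the lecture): it enumerates all undirected paths, which is fine for small DAGs, and tests each interior node against clauses (i) and (ii) above.

```python
# A small d-separation checker following the blocking definition above.
# parents: dict node -> set of parent nodes (this defines the DAG).

def d_separated(parents, A, B, C):
    """True iff every path from a node of A to a node of B is blocked by C."""
    nodes = set(parents)
    children = {n: {m for m in nodes if n in parents[m]} for n in nodes}

    def descendants(n):
        out, stack = set(), [n]
        while stack:
            for c in children[stack.pop()]:
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def paths(u, goal, visited):
        if u == goal:
            yield [u]
            return
        for v in parents[u] | children[u]:       # move along edges either way
            if v not in visited:
                for rest in paths(v, goal, visited | {v}):
                    yield [u] + rest

    def blocked(path):
        for i in range(1, len(path) - 1):
            prev, t, nxt = path[i - 1], path[i], path[i + 1]
            if prev in parents[t] and nxt in parents[t]:
                # (ii) head-head: blocks unless t or a descendant of t is in C
                if t not in C and not (descendants(t) & C):
                    return True
            elif t in C:
                # (i) head-tail or tail-tail node that lies in C
                return True
        return False

    return all(blocked(p) for a in A for b in B for p in paths(a, b, {a}))

# Tail-tail example a <- c -> b: conditioning on c blocks the only path.
par = {'a': {'c'}, 'b': {'c'}, 'c': set()}
print(d_separated(par, {'a'}, {'b'}, {'c'}))   # True
print(d_separated(par, {'a'}, {'b'}, set()))   # False
```

On the head-head example a → c ← b the verdicts flip: the empty set d-separates a and b, while conditioning on c connects them.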

2.5. Preview of Deep Belief Network (DBN)
Example: P(x_1, x_2, h) = P(h) P(x_1 | h) P(x_2 | h)

Scenario [output (image) reconstruction]: when h = 1, it is very likely that (x_1, x_2) = (1, 0); when h = 0, x_1 and x_2 are random. [Slide shows the CPT of P(x_1 | h).]

[Slide shows the CPT of P(x_2 | h) and the probability table of P(h).]

By d-separation, P(x_1, x_2 | h) = P(x_1 | h) P(x_2 | h). [Slide shows the table of P(x_1, x_2 | h).]

[Slide shows the table of P(x_1, x_2, h).]

[Slide shows the tables of P(x_1, x_2) and P(h | x_1, x_2).]

When (x_1, x_2) = (1, 0), h = 1 with probability 0.74;
when (x_1, x_2) = (0, 1), h = 0 with probability 0.93;
when (x_1, x_2) = (1, 1), h = 0 with slightly higher probability;
when (x_1, x_2) = (0, 0), h = 0 with probability 0.76.
So it is most likely that (x_1, x_2) = (1, 0) produces h = 1, and everything else produces h = 0.
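The posterior P(h | x_1, x_2) is obtained by enumerating the joint and normalizing. The slide's actual CPT numbers are not recoverable from this transcript, so the values below are made up to match the scenario (h = 1 makes (x_1, x_2) = (1, 0) likely); the posteriors they produce will therefore differ from the 0.74 / 0.93 / 0.76 figures above.

```python
# Posterior P(h | x1, x2) via P(x1, x2, h) = P(h) P(x1 | h) P(x2 | h).
# All CPT numbers are hypothetical stand-ins for the slide's lost tables.
P_h = {0: 0.6, 1: 0.4}
P_x1 = {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.5, (0, 0): 0.5}  # key: (x1, h)
P_x2 = {(1, 1): 0.1, (0, 1): 0.9, (1, 0): 0.5, (0, 0): 0.5}  # key: (x2, h)

def posterior_h(x1, x2):
    """Bayes rule by enumeration: normalize P(h) P(x1|h) P(x2|h) over h."""
    unnorm = {h: P_h[h] * P_x1[(x1, h)] * P_x2[(x2, h)] for h in (0, 1)}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

print(posterior_h(1, 0))  # h = 1 is the most probable cause of (1, 0)
print(posterior_h(0, 1))  # h = 0 dominates for (0, 1)
```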

2.6. Markov chain as BN
Example 1: Markov chain.
Markov property: P(x_t | x_1, …, x_{t-1}) = P(x_t | x_{t-1})
Joint probability:
    P(x_1, …, x_n) = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2) ⋯ P(x_n | x_1, …, x_{n-1})
                   = P(x_1) P(x_2 | x_1) P(x_3 | x_2) ⋯ P(x_n | x_{n-1})
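The simplified joint above is just the BN factorization with pa(x_t) = {x_{t-1}}. A minimal sketch, with a hypothetical two-state initial law and transition matrix:

```python
# Joint probability of a Markov chain path:
# P(x1,...,xn) = P(x1) * prod_t P(x_t | x_{t-1}).
init = [0.5, 0.5]            # P(x1); made-up two-state chain
trans = [[0.9, 0.1],         # trans[i][j] = P(x_t = j | x_{t-1} = i)
         [0.2, 0.8]]

def chain_prob(states):
    """Probability of an entire state sequence under the chain."""
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

print(chain_prob([0, 0, 1, 1]))  # 0.5 * 0.9 * 0.1 * 0.8
```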

A Markov chain is an example of a Bayesian network.
Example 2: Markov chain of order 2.
    P(x_t | x_1, …, x_{t-1}) = P(x_t | x_{t-1}, x_{t-2})
Thus
    P(x_1, …, x_n) = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2) P(x_4 | x_2, x_3) ⋯ P(x_n | x_{n-1}, x_{n-2})
A Markov chain of order 2 is also an example of a Bayesian network.

Similarly, one can define a Markov chain of order k for any k.

2.7. HMM as BN
Example 3: Hidden Markov Model (HMM). The hidden states z_t form a Markov chain, and the observation (output) x_t depends on z_t.

Joint probability:
    P(z_1, …, z_n, x_1, …, x_n) = P(z_1) P(z_2 | z_1) ⋯ P(z_n | z_{n-1}) · P(x_1 | z_1) P(x_2 | z_2) ⋯ P(x_n | z_n)
An HMM is an example of a Bayesian network. One can also define an HMM of any order.
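This factorization, too, can be sketched directly. The two-state HMM below (initial law, transition matrix, and emission matrix) uses made-up numbers; it evaluates the joint probability of a hidden path together with an observation sequence exactly as the formula above prescribes.

```python
# HMM joint probability:
# P(z_1..z_n, x_1..x_n) = P(z1) * prod_t P(z_t | z_{t-1}) * prod_t P(x_t | z_t).
init = [0.6, 0.4]                    # P(z1); hypothetical two-state HMM
trans = [[0.7, 0.3],                 # trans[i][j] = P(z_t = j | z_{t-1} = i)
         [0.4, 0.6]]
emit = [[0.9, 0.1],                  # emit[i][k] = P(x_t = k | z_t = i)
        [0.2, 0.8]]

def hmm_joint(z, x):
    """Joint probability of hidden path z and observations x (equal length)."""
    p = init[z[0]] * emit[z[0]][x[0]]
    for t in range(1, len(z)):
        p *= trans[z[t - 1]][z[t]] * emit[z[t]][x[t]]
    return p

print(hmm_joint([0, 1], [0, 1]))  # 0.6 * 0.9 * 0.3 * 0.8
```

Summing hmm_joint over all hidden paths z gives the observation likelihood P(x_1, …, x_n), which is what the forward algorithm computes efficiently.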