Bayesian Networks Inference with Probabilistic Graphical Models


4190.408 Artificial Intelligence (2016 Spring): Bayesian Networks Inference with Probabilistic Graphical Models. Byoung-Tak Zhang, Biointelligence Lab, Seoul National University

Machine Learning? A learning system is a system that autonomously improves its performance (P) by automatically forming a model (M) from experiential data (D) obtained through interaction with its environment (E). The same idea appears under different names: self-improving systems (the AI perspective), knowledge discovery (data mining), data-driven software design (software engineering), and automatic programming (computer engineering).

Machine Learning as Automatic Programming. In traditional programming, data and a program go into the computer and produce output; in machine learning, data and the desired output go into the computer and produce a program.

Machine Learning (ML): Three Tasks
Supervised learning: estimate an unknown mapping from known input-output pairs. Learn f_w from a training set D = {(x, y)} such that f_w(x) = y ≈ f(x). Classification: y is discrete; regression: y is continuous.
Unsupervised learning: only input values are provided. Learn f_w from D = {(x)} such that f_w(x) ≈ x. Density estimation and compression; clustering and dimensionality reduction.
Sequential (reinforcement) learning: no targets, but rewards (critiques) are provided sequentially. Learn an evaluation (heuristic) function f_w(s_t, a_t, r_t) from D_t = {(s_t, a_t, r_t) | t = 1, 2, ...}, judged with respect to the future, not just the past. Sequential decision-making; action selection and policy learning.

Machine Learning Models
Supervised learning: neural nets, decision trees, k-nearest neighbors, support vector machines
Unsupervised learning: self-organizing maps, clustering algorithms, manifold learning, evolutionary learning
Probabilistic graphical models: Bayesian networks, Markov networks, hidden Markov models, hypernetworks
Dynamic systems: Kalman filters, sequential Monte Carlo (particle filters)
Reinforcement learning

Outline
Bayesian inference: Monte Carlo, importance sampling, MCMC
Probabilistic graphical models: Bayesian networks, Markov random fields, hypernetworks (architecture and algorithms; application examples)
Discussion

Bayes Theorem
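The slide's equation is an image and did not survive transcription; for reference, the standard statement of Bayes' theorem for a hypothesis h and data D is:

```latex
P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}, \qquad
P(D) = \sum_{h' \in H} P(D \mid h')\, P(h')
```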

MAP vs. ML: What is the most probable hypothesis given the data? From Bayes' theorem we get two standard estimators: MAP (maximum a posteriori) and ML (maximum likelihood).
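The corresponding definitions, added here for completeness since the slide's formulas are images, are:

```latex
h_{\text{MAP}} = \arg\max_{h \in H} P(h \mid D)
              = \arg\max_{h \in H} P(D \mid h)\, P(h), \qquad
h_{\text{ML}} = \arg\max_{h \in H} P(D \mid h)
```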

Bayesian Inference

Prof. Schrater's Lecture Notes (Univ. of Minnesota)

Monte Carlo (MC) Approximation
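The slide's derivation is not in the transcription. As a minimal sketch of the idea, an expectation under a distribution is approximated by an average over samples; the target density and test function below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo approximation: E_p[f(x)] ~= (1/N) * sum_n f(x_n), with x_n ~ p.
# Illustrative choice (not from the slides): p = N(0, 1), f(x) = x**2,
# so the exact expectation is 1.
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)
estimate = np.mean(samples ** 2)
print(f"MC estimate of E[x^2] under N(0,1): {estimate:.4f}  (exact: 1.0)")
```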

Markov chain Monte Carlo
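The slide's content was graphical and the transcription does not name a specific algorithm; as an assumed example, here is a short random-walk Metropolis-Hastings sketch (one standard MCMC method) for drawing samples from an unnormalized density:

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_steps, step_size, rng):
    """Random-walk Metropolis-Hastings targeting exp(log_p), known up to a constant."""
    x = x0
    samples = []
    for _ in range(n_steps):
        proposal = x + step_size * rng.normal()
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < log_p(proposal) - log_p(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

rng = np.random.default_rng(0)
# Illustrative target: an unnormalized mixture of two Gaussians centered at -2 and +2.
log_p = lambda x: np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)
chain = metropolis_hastings(log_p, x0=0.0, n_steps=50_000, step_size=1.0, rng=rng)
print("posterior mean estimate:", chain[10_000:].mean())  # discard burn-in
```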

MC with Importance Sampling
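A minimal sketch of the self-normalized importance-sampling estimator, with an invented target p, proposal q, and test function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E_p[f(x)] using samples from a proposal q:
#   E_p[f] ~= sum_n w_n f(x_n) / sum_n w_n,  with weights w_n = p(x_n) / q(x_n).
# Illustrative choice: p = N(0, 1), q = N(0, 2), f(x) = x**2 (exact answer: 1).
x = rng.normal(loc=0.0, scale=2.0, size=100_000)                 # samples from q
p = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)                   # target density
q = np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))   # proposal density
w = p / q                                                        # importance weights
estimate = np.sum(w * x ** 2) / np.sum(w)                        # self-normalized estimator
print(f"importance-sampling estimate of E[x^2]: {estimate:.4f}  (exact: 1.0)")
```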

Graphical Models: a taxonomy (figure). Graphical models (GMs) are related to causal models, chain graphs, dependency networks, and other semantics, and divide into directed GMs and undirected GMs. Directed GMs include Bayesian networks, mixture models, decision trees, simple models, FSTs, HMMs (factorial HMMs, mixed-memory Markov models, BMMs, segment models), DBNs, Kalman filters, PCA, and LDA. Undirected GMs include Markov random fields / Markov networks and Gibbs/Boltzmann distributions.

BAYESIAN NETWORKS

Bayesian Networks. A Bayesian network is a DAG (directed acyclic graph) that expresses dependence relations between variables and can incorporate prior knowledge about the data (parameters). The joint distribution factorizes as P(X) = ∏_{i=1}^{n} P(X_i | pa(X_i)), where pa(X_i) are the parents of X_i. For the five-node example network over A, B, C, D, E: P(A,B,C,D,E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|B,C,D).
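To make the factorization concrete, here is a small Python sketch; only the structure P(A)P(B|A)P(C|B)P(D|A,B)P(E|B,C,D) comes from the slide, while every numeric CPT value below is invented for illustration:

```python
from itertools import product

# Structure from the slide: P(A,B,C,D,E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|B,C,D).
# All numeric values are hypothetical; only the parent sets come from the slide.

def bern(p_true, value):
    """P(value) for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def p_a(a):          return bern(0.3, a)
def p_b(b, a):       return bern(0.8 if a else 0.1, b)
def p_c(c, b):       return bern(0.6 if b else 0.2, c)
def p_d(d, a, b):    return bern(0.9 if (a and b) else 0.3, d)
def p_e(e, b, c, d): return bern(0.7 if (b or (c and d)) else 0.05, e)

def joint(a, b, c, d, e):
    return p_a(a) * p_b(b, a) * p_c(c, b) * p_d(d, a, b) * p_e(e, b, c, d)

# Sanity check: the joint sums to 1, and queries can be answered by brute-force
# marginalization, e.g. P(E = True).
total = sum(joint(*vals) for vals in product([True, False], repeat=5))
p_e_true = sum(joint(a, b, c, d, True)
               for a, b, c, d in product([True, False], repeat=4))
print(f"sum of joint = {total:.6f},  P(E=True) = {p_e_true:.4f}")
```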

Representing Probability Distributions. A probability distribution assigns a probability to each combination of values of the attributes. Example: hospital patients described by background (age, gender, history of diseases, ...), symptoms (fever, blood pressure, headache, ...), and diseases (pneumonia, heart attack, ...). Naive representations such as full tables run into trouble: 20 binary attributes already require more than 2^20 ≈ 10^6 parameters, and real applications usually involve hundreds of attributes.

Bayesian Networks - Key Idea: exploit regularities by utilizing conditional independence. A Bayesian network is a graphical representation of conditional independencies and, respectively, causal dependencies.

Bayesian Networks: 1. a finite, directed acyclic graph; 2. nodes are (discrete) random variables; 3. edges are direct influences; 4. associated with each node is a table representing a conditional probability distribution (CPD), quantifying the effect the parents have on the node. (The slide illustrates this with a small network over the nodes E, J, A, B, M.)

Bayesian Networks (example figure). The slide appears to show a small network over nodes X1, X2, X3, with probability tables attached to the nodes: marginal tables (0.2, 0.8) and (0.6, 0.4) for the parents, and a conditional table for the child indexed by parent configurations: (true, 1) -> (0.2, 0.8), (true, 2) -> (0.5, 0.5), (false, 1) -> (0.23, 0.77), (false, 2) -> (0.53, 0.47).

Example: use a DAG to model causality. The example network has nodes Martin Oversleep, Train Strike, Norman Oversleep, Martin Late, Norman Late, Norman Untidy, Boss Failure-in-Love, Project Delay, Office Dirty, and Boss Angry, with edges drawn from causes to their effects.

Example: attach prior probabilities to all root nodes.
Martin Oversleep:      P(T) = 0.01, P(F) = 0.99
Train Strike:          P(T) = 0.1,  P(F) = 0.9
Norman Oversleep:      P(T) = 0.2,  P(F) = 0.8
Boss Failure-in-Love:  P(T) = 0.01, P(F) = 0.99

Example: attach conditional probabilities to non-root nodes (each column sums to 1).
P(Martin Late | Train Strike, Martin Oversleep):
  Train Strike = T, Martin Oversleep = T:  P(T) = 0.95, P(F) = 0.05
  Train Strike = T, Martin Oversleep = F:  P(T) = 0.8,  P(F) = 0.2
  Train Strike = F, Martin Oversleep = T:  P(T) = 0.7,  P(F) = 0.3
  Train Strike = F, Martin Oversleep = F:  P(T) = 0.05, P(F) = 0.95
P(Norman Untidy | Norman Oversleep):
  Norman Oversleep = T:  P(T) = 0.6, P(F) = 0.4
  Norman Oversleep = F:  P(T) = 0.2, P(F) = 0.8
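As an added worked example (not on the slide) of how these tables combine with the root priors, marginalizing over the parents gives the probability that Martin is late:

```latex
\begin{aligned}
P(\text{ML}=T) &= \sum_{ts,\,mo} P(\text{ML}=T \mid ts, mo)\, P(ts)\, P(mo) \\
&= 0.95 \cdot 0.1 \cdot 0.01 + 0.8 \cdot 0.1 \cdot 0.99
 + 0.7 \cdot 0.9 \cdot 0.01 + 0.05 \cdot 0.9 \cdot 0.99 \\
&\approx 0.131
\end{aligned}
```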

Example: attach conditional probabilities to non-root nodes, continued (each column sums to 1). P(Boss Angry | Boss Failure-in-Love, Project Delay, Office Dirty), with parent configurations ordered (Failure-in-Love, Project Delay, Office Dirty):
Boss Angry   (T,T,T)  (T,T,F)  (T,F,T)  (T,F,F)  (F,T,T)  (F,T,F)  (F,F,T)  (F,F,F)
very          0.98     0.85     0.6      0.5      0.3      0.2      0        0.01
mid           0.02     0.15     0.3      0.25     0.5      0.5      0.2      0.02
little        0        0        0.1      0.25     0.2      0.3      0.7      0.07
no            0        0        0        0        0        0        0.1      0.9

Inference

MARKOV RANDOM FIELDS (MARKOV NETWORKS)

Graphical Models: directed graphs (e.g., Bayesian networks) and undirected graphs (e.g., Markov random fields).

Bayesian Image Analysis. An original image passes through a noisy transmission (degradation) process to produce the degraded (observed) image. By Bayes' rule, Pr(original image | degraded image) = Pr(degraded image | original image) Pr(original image) / Pr(degraded image): the a posteriori probability equals the degradation process (likelihood) times the a priori probability, divided by the marginal likelihood.

Image Analysis. We can thus represent both the observed image X and the true image Y as Markov random fields, and invoke the Bayesian framework to find P(Y | X).

Details. Remember P(Y | X) = P(X | Y) P(Y) / P(X) ∝ P(X | Y) P(Y), i.e. P(Y | X) is proportional to P(X | Y) P(Y). Here P(X | Y) is the data model and P(Y) models the label interaction. Next we need to compute the prior P(Y = y) and the likelihood P(X | Y).

Back to Image Analysis. The likelihood can be modeled as a mixture of Gaussians. The potential is modeled to capture domain knowledge; one common model is the Ising model, with pairwise potentials of the form β y_i y_j.
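Written out more fully (an added expansion; the slide shows only the pairwise term, and sign conventions vary), the Ising prior assigns each labeling y an energy summed over neighboring pairs, with the corresponding Gibbs distribution:

```latex
U(y) = -\beta \sum_{(i,j) \in \mathcal{N}} y_i\, y_j, \qquad
P(Y = y) = \frac{1}{Z} \exp\!\big(-U(y)\big), \qquad y_i \in \{-1, +1\}
```

For β > 0 this favors labelings in which neighboring pixels agree, i.e. a smoothness prior.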

Bayesian Image Analysis. Let X = {x_1, x_2, ..., x_mn} be the observed image and Y = {y_1, y_2, ..., y_mn} the true image. Goal: find Y = y* = {y_1*, y_2*, ...} such that P(Y = y* | X) is maximum. This is a labeling problem with a search space of size |L|^{mn}, where L is the set of labels and there are m*n observations.

Unfortunately... (the slide shows a figure comparing an observed image with the labelings produced by an SVM and by an MRF).

Markov Random Fields (MRFs). Introduced in the 1960s, MRFs are a principled approach for incorporating context information and domain knowledge, and they work within the Bayesian framework. They were widely studied in the 1970s, faded during the 1980s, and finally made a big comeback in the late 1990s.

Markov Random Field. Random field: let F = {F_1, F_2, ..., F_M} be a family of random variables defined on the set S, in which each random variable F_i takes a value f_i in a label set L. The family F is called a random field. Markov random field: F is said to be a Markov random field on S with respect to a neighborhood system N if and only if the following two conditions are satisfied. Positivity: P(f) > 0 for all f ∈ F. Markovianity: P(f_i | f_{S \ {i}}) = P(f_i | f_{N_i}).

Inference: finding the optimal y* such that P(Y = y* | X) is maximum. The search space is exponential. Exponential-time algorithm: simulated annealing (SA). Greedy algorithm: iterated conditional modes (ICM). There are other, more advanced graph-cut-based strategies.

Sampling and Simulated Annealing. Sampling: a way to generate random samples from a (potentially very complicated) probability distribution, e.g. Gibbs or Metropolis sampling. Simulated annealing: a schedule for modifying the probability distribution so that, at zero temperature, you draw samples only from the MAP solution. If you can find the right cooling schedule, the algorithm will converge to a global MAP solution. The flip side: it is slow, and finding the correct schedule is non-trivial.
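A compact sketch of what such a sampler can look like for image denoising, assuming an Ising prior, a Gaussian observation model, and a geometric cooling schedule; none of these model choices are specified on the slides:

```python
import numpy as np

def gibbs_ising_annealed(obs, beta, sigma, temps, rng):
    """Annealed Gibbs sampling for MAP denoising of a {-1,+1} image under an
    Ising prior and a Gaussian observation model (illustrative assumptions)."""
    y = np.where(obs >= 0, 1, -1)        # initial labeling from the observations
    H, W = y.shape
    for T in temps:                      # cooling schedule: high -> low temperature
        for i in range(H):
            for j in range(W):
                nbr = sum(y[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                          if 0 <= a < H and 0 <= b < W)
                # Local energy of each candidate label: Ising prior + Gaussian data term.
                e = {s: -beta * s * nbr + (obs[i, j] - s) ** 2 / (2 * sigma ** 2)
                     for s in (-1, 1)}
                # Gibbs update at temperature T: sample the label from its local conditional.
                p_plus = 1.0 / (1.0 + np.exp((e[1] - e[-1]) / T))
                y[i, j] = 1 if rng.uniform() < p_plus else -1
    return y

# Toy usage with hypothetical data: a square on a flat background plus Gaussian noise.
rng = np.random.default_rng(0)
clean = np.ones((16, 16), dtype=int); clean[4:12, 4:12] = -1
obs = clean + 0.8 * rng.normal(size=clean.shape)
temps = np.geomspace(5.0, 0.05, 30)                  # geometric cooling schedule
denoised = gibbs_ising_annealed(obs, beta=1.0, sigma=0.8, temps=temps, rng=rng)
print("fraction of pixels recovered:", np.mean(denoised == clean))
```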

Iterated Conditional Modes (ICM). A greedy strategy with fast convergence: the idea is to iteratively maximize the local conditional probabilities, given an initial solution. It is equivalent to simulated annealing with T = 0.
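The same toy model as in the previous sketch, with the sampling step replaced by a greedy argmax, gives ICM; again this is a sketch under assumed model choices:

```python
import numpy as np

def icm_ising(obs, beta, sigma, n_sweeps=10):
    """Iterated conditional modes for an illustrative Ising + Gaussian model: at each
    pixel, greedily pick the label minimizing the local energy (T = 0 limit of
    annealed Gibbs sampling)."""
    y = np.where(obs >= 0, 1, -1)
    H, W = y.shape
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                nbr = sum(y[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                          if 0 <= a < H and 0 <= b < W)
                # Local energy = Ising prior term + Gaussian data term.
                energy = lambda s: -beta * s * nbr + (obs[i, j] - s) ** 2 / (2 * sigma ** 2)
                y[i, j] = min((-1, 1), key=energy)   # greedy local update
    return y

# Usage with the same kind of hypothetical observations as before.
rng = np.random.default_rng(0)
clean = np.ones((16, 16), dtype=int); clean[4:12, 4:12] = -1
obs = clean + 0.8 * rng.normal(size=clean.shape)
print("fraction recovered by ICM:", np.mean(icm_ising(obs, beta=1.0, sigma=0.8) == clean))
```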

Parameter Learning. Supervised learning is the easiest case. Maximum likelihood: θ* = argmax_θ P(f | θ). For an MRF, P(f | θ) = (1/Z(θ)) exp(-U(f)/T), so evaluating the likelihood requires the partition function Z(θ), which makes direct maximum likelihood expensive.

So we approximate Pseudo Likelihood PL( f ) P( f f ) = i X U( f ) Ui( fi, fn i ) i U ( f, f ) i i N i i N U ( f, f ) i j j N j Large lattice theorem: in the large lattice limit M, PL converges to ML estimate. Turns out that a local learning method like pseudo-likelihood when combined with a local inference method such as ICM does quite well. Close to optimal results. f e j L e 4190.408 Artificial (2016-Spring) 41