Variable Elimination: Basic Ideas

Sargur Srihari, srihari@cedar.buffalo.edu

Topics
1. Types of Inference Algorithms
2. Variable Elimination: the Basic Ideas
3. Variable Elimination: Sum-Product VE Algorithm; Sum-Product VE for Conditional Probabilities
4. Variable Ordering for VE

Inference Algorithms
Types of inference algorithms:
1. Exact
   1. Variable Elimination
   2. Clique trees (Belief Propagation)
2. Approximate
   1. Optimization
      1. Propagation with approximate messages
      2. Variational (analytical approximations)
   2. Particle-based (sampling)

Variable Elimination: Basic Ideas
We begin with the principles underlying exact inference in PGMs. As we show, the same BN structure that allows compaction of complex distributions also helps support inference. In particular, we can use dynamic programming techniques to perform inference even for large and complex networks in reasonable time.

Intuition for Variable Elimination
Consider inference in a very simple BN, the chain A → B → C → D. For example, a sequence of words whose CPDs are first-order word probabilities: probabilities of the four words The, quick, brown, fox. We consider a phased computation, using the results of a previous phase in the computation of the next phase. We then reformulate this process in terms of a global computation on the joint distribution.

Exact Inference: Variable Elimination
To compute P(B), i.e., the distribution over the values b of B, we have
P(B) = Σ_a P(a, B) = Σ_a P(a)P(B | a)
The required P(a) and P(b | a) are available in the BN. If A has k values and B has m values, then for each b we need k multiplications and k-1 additions. Since there are m values of B, the process is repeated for each value of b, so this computation is O(k × m).

Moving Down the BN
Assume we want to compute P(C). Using the same analysis,
P(C) = Σ_b P(b, C) = Σ_b P(b)P(C | b)
P(c | b) is given in a CPD, but P(B) is not given as a network parameter. It can be computed using
P(B) = Σ_a P(a, B) = Σ_a P(a)P(B | a)
If A, B and C have k values each, the complexity is O(k²).
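A minimal Python sketch of these two phases, assuming binary variables and made-up CPD numbers (P_A, P_B_given_A, P_C_given_B are illustrative, not taken from the slides):

    import numpy as np

    # Hypothetical CPDs for the chain A -> B -> C (all variables binary).
    P_A = np.array([0.6, 0.4])            # P(A)
    P_B_given_A = np.array([[0.7, 0.3],   # row a: P(B | a)
                            [0.2, 0.8]])
    P_C_given_B = np.array([[0.9, 0.1],   # row b: P(C | b)
                            [0.5, 0.5]])

    # Phase 1: P(B) = sum_a P(a) P(B | a)
    # (k multiplications and k-1 additions per value of B, O(k x m) overall).
    P_B = P_A @ P_B_given_A

    # Phase 2: P(C) = sum_b P(b) P(C | b), reusing P(B) from phase 1.
    P_C = P_B @ P_C_given_B

    print(P_B, P_C)   # each marginal sums to 1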

Computation Depends on Structure
1. The structure of the BN is critical for the computation. If A had also been a parent of C, then P(C) = Σ_b P(b)P(C | b) would not have sufficed.
2. The algorithm does not compute single values but sets of values at a time: P(B) over all possible values of B is used to compute P(C).

Complexity of a General Chain
In general, if we have a chain X_1 → X_2 → ... → X_n and each X_i has k values, the total cost is O(nk²). Naïve evaluation, which generates the entire joint and sums it out, would generate k^n probabilities for the events x_1, ..., x_n. So in this example, despite the exponential size of the joint distribution, we can do inference in linear time.
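A sketch contrasting the linear-time chain computation with naïve joint enumeration, assuming a hypothetical chain of n = 6 ternary variables with randomly generated CPDs (illustrative only):

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 6, 3                                 # chain X1 -> ... -> Xn, k states each

    P_X1 = rng.dirichlet(np.ones(k))            # P(X1)
    cpds = [rng.dirichlet(np.ones(k), size=k)   # cpds[i][x_i, x_next] = P(x_next | x_i)
            for _ in range(n - 1)]

    # Variable elimination: n-1 vector-matrix products, O(n k^2) total.
    marginal = P_X1
    for cpd in cpds:
        marginal = marginal @ cpd               # sum out the previous variable
    print("VE marginal of Xn:", marginal)

    # Naive evaluation: enumerate all k^n joint entries, then sum them out.
    joint_sum = np.zeros(k)
    for assignment in itertools.product(range(k), repeat=n):
        p = P_X1[assignment[0]]
        for i in range(n - 1):
            p *= cpds[i][assignment[i], assignment[i + 1]]
        joint_sum[assignment[-1]] += p
    print("Brute-force marginal of Xn:", joint_sum)   # matches up to round-off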

Insight that Avoids Exponentiality
The joint probability decomposes as
P(A, B, C, D) = P(A)P(B | A)P(C | B)P(D | C)
To compute P(D) we need to sum together all entries where D = d¹, and separately all entries where D = d². The exact computation for P(D) is
P(D) = Σ_a Σ_b Σ_c P(a)P(b | a)P(c | b)P(D | c)
Examine the summation (expanded below): in the first two terms, the 3rd and 4th factors, P(c¹ | b¹)P(d¹ | c¹), are identical. Modify the computation to first compute P(a¹)P(b¹ | a¹) + P(a²)P(b¹ | a²) and then multiply by the common term.
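Written out under the assumption that all four variables are binary (as the d¹/d² notation suggests; the slide's own display is not reproduced in this transcription), the sum for D = d¹ has eight terms:

P(d¹) = P(a¹)P(b¹|a¹)P(c¹|b¹)P(d¹|c¹) + P(a²)P(b¹|a²)P(c¹|b¹)P(d¹|c¹)
      + P(a¹)P(b²|a¹)P(c¹|b²)P(d¹|c¹) + P(a²)P(b²|a²)P(c¹|b²)P(d¹|c¹)
      + P(a¹)P(b¹|a¹)P(c²|b¹)P(d¹|c²) + P(a²)P(b¹|a²)P(c²|b¹)P(d¹|c²)
      + P(a¹)P(b²|a¹)P(c²|b²)P(d¹|c²) + P(a²)P(b²|a²)P(c²|b²)P(d¹|c²)

The first two terms share the factor P(c¹|b¹)P(d¹|c¹), the next two share P(c¹|b²)P(d¹|c¹), and so on; this is the sharing the next transformation exploits.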

First Transformation of the Sum
The same structure is repeated throughout the table. Performing the same transformation on each pair, the summation for D = d¹ becomes
[P(a¹)P(b¹|a¹) + P(a²)P(b¹|a²)] P(c¹|b¹)P(d¹|c¹)
+ [P(a¹)P(b²|a¹) + P(a²)P(b²|a²)] P(c¹|b²)P(d¹|c¹)
+ [P(a¹)P(b¹|a¹) + P(a²)P(b¹|a²)] P(c²|b¹)P(d¹|c²)
+ [P(a¹)P(b²|a¹) + P(a²)P(b²|a²)] P(c²|b²)P(d¹|c²)
and similarly for d². Observe that certain terms are repeated several times in this expression: P(a¹)P(b¹|a¹) + P(a²)P(b¹|a²) and P(a¹)P(b²|a¹) + P(a²)P(b²|a²) are each repeated four times (twice for each value of D).

2 nd & 3 rd transformation on the sum Defining τ 1 : Val() àr where τ 1 (b 1 ) and τ 1 (b 2 ) are the two expressions, we get Can reverse the order of a sum and product sum first, product next 12

Fourth Transformation of the Sum
Again we notice shared expressions that are better computed once and used multiple times. We define τ₂ : Val(C) → R:
τ₂(c¹) = τ₁(b¹)P(c¹ | b¹) + τ₁(b²)P(c¹ | b²)
τ₂(c²) = τ₁(b¹)P(c² | b¹) + τ₁(b²)P(c² | b²)
so that P(D) = Σ_c P(D | c) τ₂(c).
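A short numeric sketch of this sequence of transformations, reusing the made-up binary CPDs from the earlier sketch and adding an illustrative CPD for D; it builds τ₁ and τ₂ exactly as defined above and checks the result against brute-force summation of the joint:

    import numpy as np

    # Hypothetical CPDs for A -> B -> C -> D, all binary.
    P_A = np.array([0.6, 0.4])
    P_B_A = np.array([[0.7, 0.3], [0.2, 0.8]])   # P_B_A[a, b] = P(b | a)
    P_C_B = np.array([[0.9, 0.1], [0.5, 0.5]])   # P_C_B[b, c] = P(c | b)
    P_D_C = np.array([[0.4, 0.6], [0.1, 0.9]])   # P_D_C[c, d] = P(d | c)

    # tau1(b) = P(a1)P(b|a1) + P(a2)P(b|a2)      -- 4 multiplications, 2 additions
    tau1 = P_A @ P_B_A

    # tau2(c) = tau1(b1)P(c|b1) + tau1(b2)P(c|b2) -- again 4 mults, 2 adds
    tau2 = tau1 @ P_C_B

    # P(D) from tau2 at the same cost; 18 operations in total for the three phases.
    P_D = tau2 @ P_D_C

    # Brute force over the 16-entry joint for comparison (48 mults, 14 adds).
    joint = np.einsum('a,ab,bc,cd->abcd', P_A, P_B_A, P_C_B, P_D_C)
    assert np.allclose(P_D, joint.sum(axis=(0, 1, 2)))
    print(P_D)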

Summary of the Computation
We begin by computing τ₁(B), which requires 4 multiplications and 2 additions. Using it, we compute τ₂(C), which also requires 4 multiplications and 2 additions. Finally we compute P(D) at the same cost. The total number of operations is 18. Computing P(D) from the full joint distribution would instead require 16 × 3 = 48 multiplications and 14 additions.

Computation Summary
The transformation we have performed has the following steps. We start from
P(D) = Σ_A Σ_B Σ_C P(A)P(B | A)P(C | B)P(D | C)
and push in the summations, resulting in
P(D) = Σ_C P(D | C) Σ_B P(C | B) Σ_A P(A)P(B | A)
We compute the product ψ₁(A, B) = P(A)P(B | A) and sum out A to obtain the function τ₁(B) = Σ_A ψ₁(A, B); for each value of b, τ₁(b) = Σ_A ψ₁(A, b) = Σ_A P(A)P(b | A). We then continue with
ψ₂(B, C) = τ₁(B)P(C | B),   τ₂(C) = Σ_B ψ₂(B, C)
and the resulting τ₂(C) is used to compute P(D).
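The ψ/τ recursion can be phrased as a generic eliminate-along-the-chain loop. The sketch below (the function name eliminate_chain is mine, not from the slides, and it is not the full Sum-Product-VE algorithm of the next section) shows one way to do it for the same hypothetical CPDs used earlier:

    import numpy as np

    def eliminate_chain(prior, cpds):
        # Sum-product elimination along a chain: at each step,
        # psi = tau_prev * CPD, then tau = sum over the previous variable of psi.
        tau = prior
        for cpd in cpds:                    # cpd[x_prev, x_next] = P(x_next | x_prev)
            psi = tau[:, None] * cpd        # psi(X_prev, X_next) = tau(X_prev) P(X_next | X_prev)
            tau = psi.sum(axis=0)           # sum out X_prev
        return tau                          # marginal of the last variable

    P_A = np.array([0.6, 0.4])
    cpds = [np.array([[0.7, 0.3], [0.2, 0.8]]),
            np.array([[0.9, 0.1], [0.5, 0.5]]),
            np.array([[0.4, 0.6], [0.1, 0.9]])]
    print(eliminate_chain(P_A, cpds))       # P(D)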

The Computation is Dynamic Programming
The naïve way would have us compute P(b) = Σ_a P(a)P(b | a) many times, once for every value of C and D; for a chain of length n, such a quantity would be computed exponentially many times. Dynamic programming inverts the order of computation, performing it inside out rather than outside in: we first compute τ₁(B) once for all values of B, which then allows us to compute τ₂(C) once and for all, and so on.
P(D) = Σ_C P(D | C) Σ_B P(C | B) Σ_A P(A)P(B | A)

Ideas that Prevented Exponential Blowup
Because of the structure of the BN, some subexpressions depend only on a small number of variables. By computing and caching these results, we can avoid generating them an exponential number of times.