Stat 521A Lecture 18

Outline
- Cts and discrete variables (14.1)
- Gaussian networks (14.2)
- Conditional Gaussian networks (14.3)
- Non-linear Gaussian networks (14.4)
- Sampling (14.5)

Hybrid networks
A hybrid GM contains both discrete and continuous (cts) variables. Unless the variables are all discrete or all Gaussian, exact inference is rarely possible. The reason is that the basic operations of multiplication, marginalization and conditioning are not closed except for tables and MVNs.

Gaussian networks
We can always convert a Gaussian DGM or UGM to an MVN and do exact inference in O(d^2) space and O(d^3) time. However, d can be large (e.g. a 1000x1000 image). We seek methods that exploit the graph structure and take O(d w^2) space and O(d w^3) time, where w is the treewidth. In cases where w is too large, we can use loopy belief propagation, which takes O(1) space per message and O(d) time per iteration.
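
As a rough illustration of the first point, here is a minimal numpy sketch (mine, not from the slides) that assembles the joint MVN of a linear-Gaussian DGM x_i = w_i^T x_pa(i) + b_i + N(0, sigma_i^2); it assumes the regression weights are collected in a strictly lower-triangular matrix W, with nodes in topological order.

import numpy as np

def linear_gaussian_dgm_to_mvn(W, b, sigma2):
    """W: (d,d) strictly lower-triangular regression weights (topological order),
    b: (d,) offsets, sigma2: (d,) conditional variances. Returns (mu, Sigma)."""
    d = len(b)
    A = np.eye(d) - W                     # x = W x + b + eps  <=>  A x = b + eps
    mu = np.linalg.solve(A, b)            # joint mean
    J = A.T @ np.diag(1.0 / sigma2) @ A   # precision (information) matrix
    Sigma = np.linalg.inv(J)              # covariance: the O(d^3) step
    return mu, Sigma

Note that the precision matrix J is sparse according to the moralized graph, which is exactly the structure that treewidth-based methods and Gaussian LBP exploit.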

Canonical potentials
When performing VarElim or ClqTree propagation, we have to represent factors \phi(x). These may not be (normalizable) Gaussians, but they can always be represented as exponentials of quadratics.
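
The formula itself did not survive the transcription; the canonical form used in Koller & Friedman, which these slides appear to follow, is

C(x; K, h, g) = \exp\left( -\tfrac{1}{2} x^\top K x + h^\top x + g \right)

When K is positive definite this is proportional to a Gaussian with \Sigma = K^{-1} and \mu = K^{-1} h, but the form is well defined for any symmetric K, which is what makes it suitable for intermediate factors.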

Operations on canonical potentials
Multiplication and division.
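
The rules shown on the slide were lost; in canonical form they are presumably the standard ones, which simply add or subtract parameters (after extending both factors to a common scope):

C(K_1, h_1, g_1) \cdot C(K_2, h_2, g_2) = C(K_1 + K_2,\; h_1 + h_2,\; g_1 + g_2)
C(K_1, h_1, g_1) \,/\, C(K_2, h_2, g_2) = C(K_1 - K_2,\; h_1 - h_2,\; g_1 - g_2)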

Operations on canonical potentials
Marginalization (requires K_{YY} to be positive definite) and conditioning on Y = y.
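
The formulas were again lost in transcription; for a potential over (X, Y) with blocks K = \begin{pmatrix} K_{XX} & K_{XY} \\ K_{YX} & K_{YY} \end{pmatrix} and h = (h_X, h_Y), the standard results are:

Marginalizing out Y (valid when K_{YY} \succ 0), with m = \dim(Y):
K' = K_{XX} - K_{XY} K_{YY}^{-1} K_{YX}, \quad
h' = h_X - K_{XY} K_{YY}^{-1} h_Y, \quad
g' = g + \tfrac{1}{2}\left( m \log(2\pi) - \log|K_{YY}| + h_Y^\top K_{YY}^{-1} h_Y \right)

Conditioning on Y = y:
K' = K_{XX}, \quad
h' = h_X - K_{XY}\, y, \quad
g' = g + h_Y^\top y - \tfrac{1}{2} y^\top K_{YY}\, y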

Kalman filter / smoother
If you apply the forwards-backwards (FB) algorithm with these new operators, you get the same results as the RTS (Rauch-Tung-Striebel) smoother.
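
To make the claim concrete, here is a minimal numpy sketch of the two passes for a linear-Gaussian state-space model x_t = A x_{t-1} + N(0, Q), y_t = C x_t + N(0, R); the variable names and interface are my own, not the lecture's.

import numpy as np

def kalman_filter(y, A, C, Q, R, mu0, V0):
    """Forward pass; returns filtered and one-step-ahead predicted moments."""
    mu_f, V_f, mu_p, V_p = [], [], [], []
    for t in range(len(y)):
        # predict p(x_t | y_1:t-1)
        m = A @ mu_f[-1] if t > 0 else mu0
        P = A @ V_f[-1] @ A.T + Q if t > 0 else V0
        mu_p.append(m); V_p.append(P)
        # update with y_t
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)            # Kalman gain
        mu_f.append(m + K @ (y[t] - C @ m))
        V_f.append(P - K @ C @ P)
    return mu_f, V_f, mu_p, V_p

def rts_smoother(A, mu_f, V_f, mu_p, V_p):
    """Backward (RTS) pass; FB with Gaussian potentials gives the same answer."""
    T = len(mu_f)
    mu_s, V_s = [None] * T, [None] * T
    mu_s[-1], V_s[-1] = mu_f[-1], V_f[-1]
    for t in range(T - 2, -1, -1):
        J = V_f[t] @ A.T @ np.linalg.inv(V_p[t + 1])    # smoother gain
        mu_s[t] = mu_f[t] + J @ (mu_s[t + 1] - mu_p[t + 1])
        V_s[t] = V_f[t] + J @ (V_s[t + 1] - V_p[t + 1]) @ J.T
    return mu_s, V_s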

Gaussian LBP
If the treewidth is too large, we can pass messages on the original (pairwise) graph. We just apply the regular BP rules with the new operators. One can show this is equivalent to the following updates on the information-form parameters.
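
The update equations on the original slide were lost; the following numpy sketch implements the standard scalar Gaussian BP updates for a pairwise model written in information form p(x) ∝ exp(-0.5 x^T J x + h^T x). The function name and interface are mine.

import numpy as np

def gaussian_lbp(J, h, n_iter=50):
    """Loopy Gaussian BP with scalar nodes; returns approximate means and variances.
    No damping or convergence test, just the raw parallel updates."""
    d = J.shape[0]
    edges = [(i, j) for i in range(d) for j in range(d) if i != j and J[i, j] != 0]
    P = {e: 0.0 for e in edges}   # precision of message i -> j
    m = {e: 0.0 for e in edges}   # potential (h-part) of message i -> j
    for _ in range(n_iter):
        newP, newm = {}, {}
        for (i, j) in edges:
            # cavity terms at node i, excluding what came from j
            Pc = J[i, i] + sum(P[(k, t)] for (k, t) in edges if t == i and k != j)
            hc = h[i] + sum(m[(k, t)] for (k, t) in edges if t == i and k != j)
            newP[(i, j)] = -J[j, i] * J[i, j] / Pc
            newm[(i, j)] = -J[j, i] * hc / Pc
        P, m = newP, newm
    prec = np.array([J[i, i] + sum(P[(k, t)] for (k, t) in edges if t == i) for i in range(d)])
    pot = np.array([h[i] + sum(m[(k, t)] for (k, t) in edges if t == i) for i in range(d)])
    return pot / prec, 1.0 / prec   # approximate means and (over-confident) variances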

Gaussian LBP
Thm 14.2.4. If LBP converges, then the means are exact, but the variances are too small (overconfident).
Thm. A sufficient condition for convergence is that the potentials are pairwise normalizable.
Any attractive model (all positive correlations) is pairwise normalizable.
Computing the means this way amounts to an iterative method for solving the linear system J \mu = h.

Pairwise normalizable
Def 7.3.3. A pairwise MRF with energies of the form
\epsilon_i(x_i) = d_0^i + d_1^i x_i + d_2^i x_i^2
\epsilon_{ij}(x_i, x_j) = a_{00}^{ij} + a_{01}^{ij} x_i + a_{10}^{ij} x_j + a_{11}^{ij} x_i x_j + a_{02}^{ij} x_i^2 + a_{20}^{ij} x_j^2
is called pairwise normalizable if d_2^i > 0 for all i, and
\begin{pmatrix} a_{02}^{ij} & a_{11}^{ij}/2 \\ a_{11}^{ij}/2 & a_{20}^{ij} \end{pmatrix}
is psd for all i, j.
Thm 7.3.4. If the MRF is pairwise normalizable, then it defines a valid Gaussian.
The condition is sufficient but not necessary; e.g. the precision matrix
\begin{pmatrix} 1 & 0.6 & 0.6 \\ 0.6 & 1 & 0.6 \\ 0.6 & 0.6 & 1 \end{pmatrix}
defines a valid Gaussian that is not pairwise normalizable.
We may be able to reparameterize the node/edge potentials to make the model pairwise normalizable.
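
A small numpy sketch of the stated test (it checks a given parameterization only, and does not search over reparameterizations); the function name and data layout are my own.

import numpy as np

def is_pairwise_normalizable(d2, edge_coeffs):
    """d2: dict node -> d_2^i.
    edge_coeffs: dict (i, j) -> (a02, a11, a20) for the edge energy.
    Returns True iff the Def 7.3.3 condition holds for this parameterization."""
    if any(v <= 0 for v in d2.values()):
        return False
    for (a02, a11, a20) in edge_coeffs.values():
        M = np.array([[a02, a11 / 2.0], [a11 / 2.0, a20]])
        if np.linalg.eigvalsh(M).min() < 0:      # psd test
            return False
    return True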

Conditional linear Gaussian networks
Suppose all discrete nodes have only discrete parents, and all cts nodes have discrete parents, cts parents, or no parents. Further, assume all cts CPDs have the form
p(X = x | C = c, D = k) = N(x | w_k^T c, σ_k^2)
This is called a CLG network. It is equivalent to a mixture of MVNs, where the distribution over the discrete indicators has structure, as does each covariance matrix. We create a canonical factor for each discrete setting of the variables in a clique.

Inference in CLG networks
Thm 14.3.1. Inference in CLG networks is NP-hard, even if they are polytrees.
Pf (sketch). Consider a chain in which each cts node X_i has cts parent X_{i-1} and its own binary discrete parent D_i. When we sum out D_1, p(x_1) is a mixture of 2 Gaussians. In general, p(x_i) is a mixture of 2^i Gaussians.

Weak marginalization
To prevent the blowup in the number of mixture components, we can project back onto the class of single Gaussians at each step, as in EP.
Prop 14.3.6. argmin_q KL(p || q), where q is a Gaussian, has the moment-matched parameters (M-projection).
Prop 14.3.7. argmin_q KL(p || q), where p is a mixture of Gaussians and q a single Gaussian, is the Gaussian with the mixture's overall mean and covariance.
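
The parameter formulas were cut off in the transcription; the standard moment-matching results are presumably what was shown:

\mu = \mathbb{E}_p[X], \qquad \Sigma = \mathrm{Cov}_p[X]

and, for a mixture p(x) = \sum_k \pi_k \, N(x \mid \mu_k, \Sigma_k),

\mu = \sum_k \pi_k \mu_k, \qquad
\Sigma = \sum_k \pi_k \left( \Sigma_k + (\mu_k - \mu)(\mu_k - \mu)^\top \right)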

Canonical vs moment form
Weak marginalization is defined in terms of moment form. To convert a canonical factor to moment form, we require that it represent a valid joint density. This typically requires that we pass messages from parents to children; once we have initialized all factors, they can be converted to moment form. However, division in the backwards pass may cause some variances to become negative! (See Ex 14.3.13.) EP is hairy!
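
For reference (a standard fact, not stated on the slide): when K is positive definite, the conversion between the two forms is

\Sigma = K^{-1}, \quad \mu = K^{-1} h
\qquad\text{and conversely}\qquad
K = \Sigma^{-1}, \quad h = \Sigma^{-1} \mu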

Strong marginalization
By using a constrained elimination order, in which we integrate out the cts variables before summing out the discrete ones, we can ensure that the upwards pass never needs to perform weak marginalization. Furthermore, one can show that the downwards pass gives exact results for the discrete variables and exact 1st and 2nd moments for the cts variables (Lauritzen's strong jtree algorithm). However, the constrained elimination order usually results in large discrete cliques, which often makes this impractical.

Non-linear dependencies
In a linear Gaussian network, the mean of each node is a linear function of its parents. Now assume X_i = f(u_i, Z_i), where u_i are the values of X_i's parents and Z_i ~ N(0, I).

Taylor series approximation
We can linearize f around the current mean and then fit a Gaussian (this is the basis of the EKF algorithm). This can be bad if f is not approximately linear near \mu.
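
A minimal numpy sketch of the idea (names and interface are mine): push the mean through f and propagate the covariance through the Jacobian.

import numpy as np

def linearized_gaussian(f, jac, mu, Sigma):
    """Approximate y = f(x), x ~ N(mu, Sigma), by a Gaussian via first-order
    Taylor expansion of f at mu (EKF-style). 'jac' returns the Jacobian of f."""
    A = jac(mu)                 # df/dx evaluated at the mean
    mu_y = f(mu)                # mean of the linearized model
    Sigma_y = A @ Sigma @ A.T   # covariance under the linearization
    return mu_y, Sigma_y

For example, f = lambda x: np.array([np.sin(x[0])]) with jac = lambda x: np.array([[np.cos(x[0])]]) gives a poor approximation when Sigma is large relative to the curvature of sin, which is exactly the failure mode mentioned above.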

M-projection using quadrature
The best Gaussian approximation is the one that matches the mean and covariance of f under the incoming Gaussian; these moments are integrals with a Gaussian weight. Gaussian quadrature computes such integrals for any weight function W(z) > 0 (here, a Gaussian).
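
A 1-d numpy sketch using Gauss-Hermite quadrature (the multivariate case uses tensor products or cubature rules); the function name is mine.

import numpy as np

def gauss_hermite_moments(f, mu, sigma, n=20):
    """Mean and variance of f(Z) for Z ~ N(mu, sigma^2), via n-point Gauss-Hermite."""
    x, w = np.polynomial.hermite.hermgauss(n)   # nodes/weights for weight exp(-x^2)
    z = mu + np.sqrt(2.0) * sigma * x           # change of variables to N(mu, sigma^2)
    w = w / np.sqrt(np.pi)                      # normalized weights (sum to 1)
    fz = np.array([f(zi) for zi in z])
    mean = np.sum(w * fz)
    var = np.sum(w * (fz - mean) ** 2)
    return mean, var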

Unscented transform
Pass the mean, and the mean plus/minus one scaled standard deviation in each dimension (the sigma points), through the transform, and then fit a Gaussian to the transformed points.
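
A minimal numpy sketch of the basic (unscaled) unscented transform with parameter kappa; the alpha/beta refinements of the scaled version are omitted, and the interface is my own.

import numpy as np

def unscented_transform(f, mu, Sigma, kappa=1.0):
    """Approximate y = f(x), x ~ N(mu, Sigma), by a Gaussian fitted to 2d+1 sigma points.
    f must return a 1-d array."""
    d = len(mu)
    L = np.linalg.cholesky((d + kappa) * Sigma)        # columns are scaled "std devs"
    pts = [mu] + [mu + L[:, i] for i in range(d)] + [mu - L[:, i] for i in range(d)]
    w = np.full(2 * d + 1, 1.0 / (2.0 * (d + kappa)))
    w[0] = kappa / (d + kappa)
    ys = np.array([np.atleast_1d(f(p)) for p in pts])
    mu_y = np.sum(w[:, None] * ys, axis=0)
    diff = ys - mu_y
    Sigma_y = sum(wi * np.outer(di, di) for wi, di in zip(w, diff))
    return mu_y, Sigma_y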

Nonlinear GMs
We handle nonlinear factors by approximating them with Gaussians. The above methods require a joint Gaussian factor, not a canonical factor, so we have to pass messages in topological order and introduce variables one at a time in order to use the above tricks. Linearization is done relative to the current \mu. In EP, we iterate and re-approximate each factor in the context of its incoming messages, which gives a better approximation to the posterior. Pretty hairy.

Discrete children, cts parents
Cts-to-discrete arcs are useful, e.g. a thermostat turns on/off depending on the temperature. We can approximate the product of a Gaussian and a logistic function by a Gaussian (a variational approximation). We can then combine these Gaussian factors with the other factors as usual.

Sampling
Sampling is the easiest way to handle cts and mixed variables.
Collapsed particles (Rao-Blackwellisation): sample the discrete variables and integrate out the cts ones analytically. Each particle then has a value for D and a Gaussian over C. Good for particle filtering (PF) or MCMC.

Non-parametric BP
We can combine sampling and message passing: we approximate factors/messages by sets of samples. Factors are lower-dimensional than the full joint. E.g. hand-pose tracking.

Adaptive discretization
We can discretize all the cts variables and then use a method for discrete variables. To increase accuracy, we expand the grid resolution for variables whose posterior entropy is high. Such approximations can also be used as proposal distributions for MH.