Expectation propagation

Lloyd Elliott
May 17, 2011

Suppose $p(x)$ is a pdf and we have a factorization

$$p(x) = \frac{1}{Z} \prod_{i=1}^n f_i(x). \qquad (1)$$

Expectation propagation is an inference algorithm designed to approximate the factors $f_i$. In doing so, we may recover approximations of the marginals and joints of $p$, or we may find the normalizing constant for $p$. EP involves parameterising an approximation $\tilde f_i$ of each factor $f_i$ and iteratively including each factor into the approximation by minimising a KL-divergence.

For each factor $f_i$, fix an approximating family of distributions $\Omega_i$. Given (1) and $\Omega_i$, the EP algorithm is as follows:

initialize approximations $\tilde f_i$
repeat
    for $i = 1, \dots, n$ do
        $$\tilde f_i \leftarrow \operatorname*{argmin}_{\hat f_i \in \Omega_i} \mathrm{KL}\left( \frac{1}{B} f_i \prod_{j \ne i} \tilde f_j \,\middle\|\, \frac{1}{C} \hat f_i \prod_{j \ne i} \tilde f_j \right) \qquad (2)$$
    end for
until stopping condition reached

Here, $B$ and $C$ are normalising constants.
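To make the loop in (2) concrete, here is a minimal Python sketch (all names and values are my own, not from the talk) in which every factor $f_i$ and every site approximation $\tilde f_i$ is an unnormalised 1-D Gaussian stored in natural parameters (precision and precision-times-mean). Because a product of Gaussians is again Gaussian, the KL projection is exact here and the loop converges immediately; the point is only to show the cavity/tilted/project mechanics of the update.

# Minimal EP loop sketch (hypothetical example, not from the talk).
# Each factor f_i and site f~_i is an unnormalised 1-D Gaussian in
# natural parameters: r = 1/variance, rm = mean/variance.

factors = [(1.0, 0.0), (0.25, 1.0), (0.5, -2.0)]   # true factors f_i as (r, rm)
sites = [(0.0, 0.0) for _ in factors]              # site approximations, start flat

for sweep in range(5):
    for i, (r_f, rm_f) in enumerate(factors):
        # cavity p_{-i}: product of the other sites (natural parameters add)
        r_cav = sum(s[0] for j, s in enumerate(sites) if j != i)
        rm_cav = sum(s[1] for j, s in enumerate(sites) if j != i)
        # tilted distribution f_i * p_{-i}
        r_tilt, rm_tilt = r_cav + r_f, rm_cav + rm_f
        # project onto the Gaussian family (exact here), divide out the cavity
        sites[i] = (r_tilt - r_cav, rm_tilt - rm_cav)

r, rm = sum(s[0] for s in sites), sum(s[1] for s in sites)
print("approximate posterior mean and variance:", rm / r, 1.0 / r)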

Writing $p_{-i} = \prod_{j \ne i} \tilde f_j$, we see that the update in the EP algorithm sets $\tilde f_i$ to:

$$\operatorname*{argmin}_{\hat f_i \in \Omega_i} \frac{1}{B} \int (f_i p_{-i})(x) \log \frac{C f_i(x)}{B \hat f_i(x)} \, dx, \quad \text{such that} \quad \int (\hat f_i p_{-i})(x) \, dx = C. \qquad (3)$$

From this equation, we see that if $\hat f_i$ were unconstrained (i.e. if $\Omega_i$ were all functions on the range of $x$), then $\hat f_i = \frac{C}{B} f_i$ would be a solution. Unfortunately, the computation of $B$ and $C$ is often intractable. Therefore, to make progress in EP, we must place constraints on $\tilde f_i$ so that minimising (3) is tractable.

There are two main sorts of constraints on $\tilde f_i$ that we will examine: 1. Exponential family constraints, 2. Fully factorised constraints. In what follows we will see the general implication of these assumptions in detail, making reference to the formulation of EP updates as minimising (2). Other constraints are possible: any choice of $\Omega_i$ for which the computation of (3) is tractable leads to an EP algorithm.

Exponential family constraints

Suppose $f(x) = h(x) \exp(\eta^T u(x) - A(\eta))$ and $p(x)$ is any distribution. We want to find the natural parameter $\eta$ that minimises the following KL-divergence:

$$\mathrm{KL}(p \,\|\, f) = \int p(x) \log \frac{p(x)}{f(x)} \, dx = \mathbb{E}_p[\log p(x)] - \mathbb{E}_p[\log h(x)] + A(\eta) - \eta^T \mathbb{E}_p[u(x)].$$

We proceed by equating the derivative with respect to $\eta$ to zero:

$$\nabla_\eta A(\eta) = \mathbb{E}_p[u(x)]. \qquad (4)$$

But, because $f$ is from an exponential family, $\nabla_\eta A(\eta) = \mathbb{E}_f[u(x)]$. Thus, the KL-divergence is minimised when $\mathbb{E}_f[u(x)] = \mathbb{E}_p[u(x)]$. This is why EP is sometimes called moment matching.
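A quick numerical illustration of (4), as a sketch with assumed example values: take $p$ to be a two-component Gaussian mixture, compute $\mathbb{E}_p[x]$ and $\mathrm{Var}_p(x)$ by quadrature, and check that the moment-matched Gaussian attains a smaller $\mathrm{KL}(p \,\|\, f)$ than nearby Gaussians.

import numpy as np

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

# target p: a two-component Gaussian mixture (assumed example values)
p = 0.3 * gauss(x, -2.0, 1.0) + 0.7 * gauss(x, 1.5, 0.5)

# moment matching: E_p[u(x)] for u(x) = (x, x^2)
m = np.sum(x * p) * dx
v = np.sum((x - m) ** 2 * p) * dx

def kl(p, q):
    return np.sum(p * np.log(p / q)) * dx

print(kl(p, gauss(x, m, v)))           # moment-matched Gaussian
print(kl(p, gauss(x, m + 0.3, v)))     # perturbed mean: strictly larger
print(kl(p, gauss(x, m, 1.5 * v)))     # perturbed variance: strictly larger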

Returning to the situation of EP, suppose we restrict $\tilde f_i$ to be proportional to a distribution in a given exponential family:

$$\Omega_i = \{ f(x) : f(x) \propto h_i(x) \exp(\eta^T u(x) - A_i(\eta)) \text{ for some } \eta \}.$$

Without loss of generality, we have assumed the same form of the sufficient statistics $u(x)$ for each approximating distribution. Suppose $\tilde f_j \propto \exp(\tilde\eta_j^T u(x) - A_j(\tilde\eta_j))$ are the current site approximations (proportionality in $\tilde\eta_j$). The EP minimisation step (2) for $\tilde f_i$ is:

$$\tilde f_i \leftarrow \operatorname*{argmin}_{\hat f_i \in \Omega_i} \mathrm{KL}\left( \frac{1}{B} f_i p_{-i} \,\middle\|\, \frac{1}{C} \hat f_i p_{-i} \right).$$

Collecting terms in the exponent, the second argument in the KL-divergence is exponential family with (proportionality in $\hat\eta_i$):

$$\hat f_i p_{-i} \propto \exp\Big( \big(\hat\eta_i + \sum_{j \ne i} \tilde\eta_j\big)^T u(x) - A_i(\hat\eta_i) - \sum_{j \ne i} A_j(\tilde\eta_j) \Big). \qquad (5)$$

Suppose $\tilde\eta_j$ are given for all $j \ne i$. We will use (5) to write $\mathbb{E}_{\hat f_i p_{-i}}[u(x)]$ as a function of $\hat\eta_i$: suppose $\Phi_i(\hat\eta_i) = \mathbb{E}_{\hat f_i p_{-i}}[u(x)]$. To proceed, we must be able to compute $\mathbb{E}_{f_i p_{-i}}[u(x)]$ for the fixed $\tilde\eta_j$. In this case, the update (2) is given by the following:

$$\hat\eta_i \leftarrow \Phi_i^{-1}\big( \mathbb{E}_{f_i p_{-i}}[u(x)] \big). \qquad (6)$$
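As a concrete $\Phi_i$ and $\Phi_i^{-1}$ (my own sketch, with hypothetical helper names): for a 1-D Gaussian with $u(x) = (x, x^2)$, the natural parameters are $\eta = (m/v, -1/(2v))$, and $\Phi$ maps $\eta$ to the mean parameters $(\mathbb{E}[x], \mathbb{E}[x^2])$. Both directions have closed forms, which is what makes the update (6) practical.

def phi(eta1, eta2):
    # natural parameters -> mean parameters (E[x], E[x^2]) for a 1-D Gaussian
    v = -1.0 / (2.0 * eta2)
    m = eta1 * v
    return m, v + m ** 2

def phi_inv(ex, ex2):
    # mean parameters -> natural parameters; the inverse of phi
    v = ex2 - ex ** 2
    return ex / v, -1.0 / (2.0 * v)

print(phi_inv(*phi(0.5, -0.25)))   # round trip recovers (0.5, -0.25)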

Fully factorised constraints

Suppose $x = (x_1, \dots, x_k)$ and

$$p(x) = \frac{1}{B} \prod_{i=1}^n f_i(C_i),$$

where $C_1, \dots, C_n$ are subsets of $x$. (N.b. the $C_i$ might overlap.) This model has the same expressive power as factor graphs: if $G$ is a factor graph then the terms $f_i(C_i)$ correspond to the factors of $G$. In particular, if $G$ is an undirected graphical model, then we can choose $C_1, \dots, C_n$ so that $C_i$ is the pair of vertices connected by the $i$-th edge of $G$.

The fully factorised constraint on $\tilde f_i(C_i)$ is:

$$\tilde f_i(C_i) = \prod_{x_l \in C_i} \tilde f_{il}(x_l).$$

We will also assume that the $\tilde f_{il}(x_l)$ are restricted to functions proportional to exponential families with base measures, natural parameters, and partition functions $h_{il}$, $\tilde\eta_{il}$, $A_{il}$ respectively. As above:

$$\tilde f_{il}(x_l) \propto \exp\big( \tilde\eta_{il}^T u_l(x_l) - A_{il}(\tilde\eta_{il}) \big).$$

Note that as $\tilde f_i$ splits, we write separate sufficient statistics $u_l$ for each component of $x$. We have constrained $\Omega_i$ to be an exponential family that splits over the random variables contained in $C_i$.

Under these constraints, we find the factors in the KL-divergence (3) that depend on $\hat f_i$ for a fixed $i$:

$$\mathrm{KL}\left( \frac{1}{B} f_i p_{-i} \,\middle\|\, \frac{1}{C} \hat f_i p_{-i} \right) = \frac{1}{B} \int (f_i p_{-i})(x) \log (f_i/\hat f_i)(x) \, dx$$

$$= \frac{1}{B} \int f_i(C_i) \prod_{j \ne i} \prod_{x_l \in C_j} \tilde f_{jl}(x_l) \, \log (f_i/\hat f_i)(x) \, dx$$

$$= \frac{1}{B} \underbrace{\int_{x \setminus C_i} \prod_{j \ne i} \prod_{x_l \in C_j \setminus C_i} \tilde f_{jl}(x_l)}_{\text{no } \hat\eta_i \text{ dependence}} \int_{C_i} f_i(C_i) \prod_{j \ne i} \prod_{x_l \in C_j \cap C_i} \tilde f_{jl}(x_l) \, \log (f_i/\hat f_i)(x) \, dx$$

$$= \mathrm{KL}\left( \frac{1}{B} f_i p_{C_i} \,\middle\|\, \frac{1}{C} \hat f_i p_{C_i} \right) \quad \text{(up to terms with no } \hat\eta_i \text{ dependence)},$$

where $p_{C_i} = \prod_{j \ne i,\ x_l : x_l \in C_j \cap C_i} \tilde f_{jl}(x_l)$. Expectations with respect to the first argument of this KL are integrals over $C_i$, which are tractable.

In particular, $\hat f_i = \prod_{x_l \in C_i} \hat f_{il}(x_l)$, and so the above KL is optimised when the following KL-divergences are minimised for each $l$:

$$\mathrm{KL}\left( \frac{1}{B} f_i p_{C_i} \,\middle\|\, \frac{1}{D} \hat f_{il} p_{C_i} \right).$$

By the exponential family derivation above,

$$(\hat f_{il} p_{C_i})(x_l) \propto \exp\Big( \big(\hat\eta_{il} + \sum_{j \ne i : x_l \in C_j} \tilde\eta_{jl}\big)^T u_l(x_l) - A_{il}(\hat\eta_{il}) - \sum_{j \ne i : x_l \in C_j} A_{jl}(\tilde\eta_{jl}) \Big). \qquad (7)$$

So the EP update for $\hat f_{il}$ is found as follows:

1. Use equation (7) above to write $\mathbb{E}_{\hat f_{il} p_{C_i}}[u_l(x_l)]$ as a function of $\hat\eta_{il}$: suppose the function is $\Phi_{il}(\hat\eta_{il}) = \mathbb{E}_{\hat f_{il} p_{C_i}}[u_l(x_l)]$.
2. Compute $\mathbb{E}_{f_i p_{C_i}}[u_l(x_l)]$.
3. Set $\hat\eta_{il} \leftarrow \Phi_{il}^{-1}\big( \mathbb{E}_{f_i p_{C_i}}[u_l(x_l)] \big)$.

The first two steps involve integration over $C_i$, which is tractable if the sizes of the $C_i$ are small. Every named exponential family admits an analytic form for $\Phi^{-1}$.
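For a single binary variable with a Bernoulli-style site $\exp(\eta x)$, the three steps collapse to a logit inversion. A tiny sketch under assumed numbers (the tilted distribution is just a two-point table):

import math

# step 1: for a site exp(eta * x) on x in {0, 1}, Phi(eta) = E[x] = sigmoid(eta)
# step 2: expected sufficient statistic under an (assumed) unnormalised tilted table
tilted = {0: 0.2, 1: 0.8}                 # values of f_i * p_{C_i} at x = 0, 1
ex = tilted[1] / (tilted[0] + tilted[1])  # E[x] under the tilted distribution
# step 3: Phi^{-1} is the logit, giving the new natural parameter
eta_new = math.log(ex / (1.0 - ex))
print(eta_new)                            # log(0.8 / 0.2) ~= 1.386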

Example: Graphical models on binary variables

Suppose $G$ is an undirected graphical model on binary random variables $V(G) = \{x_1, \dots, x_n\}$:

$$p(x_1, \dots, x_n) = \frac{1}{Z} \prod_{xy \in E(G)} f_{xy}(x, y). \qquad (8)$$

Here, $E(G)$ are the edges of $G$. We have absorbed the factors involving just one variable into the factors on the edges. We can write $f_{xy}$ as the following exponential family with sufficient statistics $x$, $y$, $xy$:

$$f_{xy}(x, y) = \mu_{xy;00}^{(1-x)(1-y)} \, \mu_{xy;10}^{x(1-y)} \, \mu_{xy;01}^{(1-x)y} \, \mu_{xy;11}^{xy} = \exp(\sigma_x x + \sigma_y y + \sigma_{xy} xy + b_{xy}). \qquad (9)$$

In (9), the natural parameters for $f_{xy}$ are:

$$\sigma_x = \log(\mu_{xy;10}/\mu_{xy;00}), \quad \sigma_y = \log(\mu_{xy;01}/\mu_{xy;00}), \quad \sigma_{xy} = \log \frac{\mu_{xy;11}\,\mu_{xy;00}}{\mu_{xy;10}\,\mu_{xy;01}},$$

and the constant term is $b_{xy} = \log \mu_{xy;00}$. We will apply the fully factorized constraint to the approximate site potentials:

$$\tilde f_{xy}(x, y) = \tilde f_{xy;x}(x) \, \tilde f_{xy;y}(y) \propto \exp(\delta_{xy;x} x) \exp(\delta_{xy;y} y). \qquad (10)$$

The sufficient statistics of this approximation are $x$ and $y$.
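A short numeric check of (9) under assumed table values: build $\sigma_x$, $\sigma_y$, $\sigma_{xy}$, $b_{xy}$ from an arbitrary positive table $\mu$ and confirm that $\exp(\sigma_x x + \sigma_y y + \sigma_{xy} xy + b_{xy})$ reproduces the table.

import math

mu = {(0, 0): 0.5, (1, 0): 2.0, (0, 1): 0.25, (1, 1): 4.0}   # assumed values
sx = math.log(mu[1, 0] / mu[0, 0])
sy = math.log(mu[0, 1] / mu[0, 0])
sxy = math.log(mu[1, 1] * mu[0, 0] / (mu[1, 0] * mu[0, 1]))
b = math.log(mu[0, 0])

for x in (0, 1):
    for y in (0, 1):
        assert abs(math.exp(sx * x + sy * y + sxy * x * y + b) - mu[x, y]) < 1e-12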

We derive the update (6) for $\hat f_{xy}$, assuming that the $\tilde f_{x'y'}$ are given for all edges $x'y' \ne xy$. We must find the expected values of the sufficient statistics of $f_{xy} p_{\{x,y\}}$. As in (7), with $C_i = \{x, y\}$:

$$(f_{xy} p_{\{x,y\}})(x, y) \propto \exp\Big( \sigma_x x + \sigma_y y + \sigma_{xy} xy + b_{xy} + \sum_{y' \in N(x) \setminus y} \delta_{xy';x} \, x + \sum_{x' \in N(y) \setminus x} \delta_{x'y;y} \, y \Big). \qquad (11)$$

We compute the expected value of $x$ under (11):

$$\mathbb{E}_{f_{xy} p_{\{x,y\}}}[x] = \frac{ \exp\big(\sigma_x + \sum_{y' \in N(x) \setminus y} \delta_{xy';x}\big) \Big( 1 + \exp\big(\sigma_y + \sigma_{xy} + \sum_{x' \in N(y) \setminus x} \delta_{x'y;y}\big) \Big) }{ 1 + \exp\big(\sigma_x + \sum_{y' \in N(x) \setminus y} \delta_{xy';x}\big) + \exp\big(\sigma_y + \sum_{x' \in N(y) \setminus x} \delta_{x'y;y}\big) + \exp\big(\sigma_x + \sigma_y + \sigma_{xy} + \sum_{y' \in N(x) \setminus y} \delta_{xy';x} + \sum_{x' \in N(y) \setminus x} \delta_{x'y;y}\big) } =: \rho_x. \qquad (12)$$

The expression (12) on the previous slide can be calculated directly from (11) by expanding $\mathbb{E}_{f_{xy} p_{\{x,y\}}}[x]$ as:

$$\frac{ 0 \cdot \big( (f_{xy} p_{\{x,y\}})(0, 0) + (f_{xy} p_{\{x,y\}})(0, 1) \big) + 1 \cdot \big( (f_{xy} p_{\{x,y\}})(1, 0) + (f_{xy} p_{\{x,y\}})(1, 1) \big) }{ (f_{xy} p_{\{x,y\}})(0, 0) + (f_{xy} p_{\{x,y\}})(0, 1) + (f_{xy} p_{\{x,y\}})(1, 0) + (f_{xy} p_{\{x,y\}})(1, 1) }.$$

Next,

$$\mathbb{E}_{\tilde f_{xy}}[x] = \frac{ 0 \cdot \big( \exp(0 \cdot \delta_{xy;x} + 0 \cdot \delta_{xy;y}) + \exp(0 \cdot \delta_{xy;x} + 1 \cdot \delta_{xy;y}) \big) + 1 \cdot \big( \exp(1 \cdot \delta_{xy;x} + 0 \cdot \delta_{xy;y}) + \exp(1 \cdot \delta_{xy;x} + 1 \cdot \delta_{xy;y}) \big) }{ \exp(0 \cdot \delta_{xy;x} + 0 \cdot \delta_{xy;y}) + \exp(1 \cdot \delta_{xy;x} + 0 \cdot \delta_{xy;y}) + \exp(0 \cdot \delta_{xy;x} + 1 \cdot \delta_{xy;y}) + \exp(1 \cdot \delta_{xy;x} + 1 \cdot \delta_{xy;y}) } = \frac{\exp(\delta_{xy;x})}{1 + \exp(\delta_{xy;x})}. \qquad (13)$$

Equating (12) and (13) yields the update for $\delta_{xy;x}$:

$$\mathbb{E}_{\tilde f_{xy}}[x] = \mathbb{E}_{f_{xy} p_{\{x,y\}}}[x], \qquad \frac{\exp(\delta_{xy;x})}{1 + \exp(\delta_{xy;x})} = \rho_x.$$

Thus, the update for $\delta_{xy;x}$ is:

$$\delta_{xy;x} \leftarrow \log \frac{\rho_x}{1 - \rho_x}, \qquad (14)$$

and the update for $\delta_{xy;y}$ follows by symmetry. This completes the EP algorithm for arbitrary undirected graphs of binary random variables. Note that (14) is found by inverting the expected value as a function of the natural parameter. This is the $\Phi^{-1}$ function from (6).
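Putting the example together, here is a minimal Python sketch of the resulting updates on a small binary graph (my own code, naming, and parameter values, not from the talk). One caveat: following the general recipe (6)-(7), $\Phi$ for a site parameter includes the incoming field from the other edges, so this sketch subtracts that field when inverting the logit; applying (14) verbatim corresponds to dropping that subtraction.

import math
from collections import defaultdict

# 3-cycle on binary variables; sigma[e] = (sigma_x, sigma_y, sigma_xy) for e = (x, y)
edges = [(0, 1), (1, 2), (0, 2)]
sigma = {(0, 1): (0.2, -0.1, 1.0), (1, 2): (0.0, 0.3, -0.5), (0, 2): (0.4, 0.0, 0.8)}
delta = defaultdict(float)   # delta[(e, v)] = site parameter delta_{e;v}

def incoming(v, excl):
    # sum of site parameters from the other edges touching v, as in (11)
    return sum(delta[(e, v)] for e in edges if v in e and e != excl)

for sweep in range(100):
    for e in edges:
        x, y = e
        sx, sy, sxy = sigma[e]
        a = sx + incoming(x, e)   # total field on x in the tilted distribution (11)
        b = sy + incoming(y, e)   # total field on y
        den = 1.0 + math.exp(a) + math.exp(b) + math.exp(a + b + sxy)
        rho_x = math.exp(a) * (1.0 + math.exp(b + sxy)) / den   # E[x], as in (12)
        rho_y = math.exp(b) * (1.0 + math.exp(a + sxy)) / den   # E[y], by symmetry
        # invert Phi: logit of the matched mean, minus the incoming cavity field
        delta[(e, x)] = math.log(rho_x / (1.0 - rho_x)) - incoming(x, e)
        delta[(e, y)] = math.log(rho_y / (1.0 - rho_y)) - incoming(y, e)

for v in range(3):
    field = sum(delta[(e, v)] for e in edges if v in e)
    print("approx P(x_%d = 1):" % v, 1.0 / (1.0 + math.exp(-field)))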