Mean Field / Variational Approximations


Mean Field / Variational Approximations
Presented by Jose Nuñez, 0/24/05

Outline: Introduction; Mean Field Approximation; Structured Mean Field; Weighted Mean Field; Variational Methods

Introduction

Problem: we have a distribution P, but inference in P is hard to compute.
Previous solutions: approximate the energy functional (Bethe, Kikuchi).
New idea: directly optimize the energy functional by introducing a distribution Q, defined on the same domain of variables as P, which incorporates some constraints.
Objective: we want to find the Q which is the best approximation of P and use Q to make inferences; that is, find the Q that minimizes F[P, Q].

Mean Field Approximation

Assumptions: Q is our mean field approximation, and the variables in the distribution Q are independent. In the standard mean field approach, Q is completely factorized:

    Q(x) = ∏_i Q(x_i)

What happens when we apply mean field?

Mean Field Approximation

With Q factorized, the free energy decomposes into local expectation and entropy terms:

    F[P, Q] = −E_Q[ln P(X)] − H_Q(X),    H_Q(X) = ∑_i H_Q(X_i)

since the entropy of a fully factorized Q is the sum of the entropies of its marginals.

Task: find the marginals Q(x_i) minimizing F[P, Q] such that ∑_{x_i} Q(x_i) = 1 for each i.
Solving: build a Lagrangian, differentiate, and set to 0!

Mean Field Approximation

The distribution Q(x_i) is a locally optimal solution given Q(x_1), …, Q(x_{i−1}), Q(x_{i+1}), …, Q(x_n) if:

    Q(x_i) = (1/Z_i) exp{ E_Q[ ln P(X) | x_i ] }    (MF equation)

where Z_i is a local normalizing constant and E_Q[ln P(X) | x_i] is the conditional expectation given the value x_i.

Mean Field Approximation

Locality: only local operations are needed for iteration of the MF equations; in other words, only neighboring variables are needed.

    Q(x_i) = (1/Z_i) exp{ ∑_{φ : X_i ∈ Scope[φ]} E_Q[ ln φ(U_φ) | x_i ] }    (MF equation, simplified)

where U_φ = Scope[φ]. The calculation of Q(x_i) depends only on the clusters X_i belongs to.
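The simplified MF equation can be computed directly for discrete variables. Below is a minimal sketch (not from the slides) of one MF update for a single pairwise factor φ(x_i, x_j); the function name and setup are illustrative assumptions.

```python
import numpy as np

def mf_update(log_phi, q_j):
    """One mean field update for X_i in a single pairwise factor phi(x_i, x_j):
    Q(x_i) = (1/Z_i) exp{ E_{Q(x_j)}[ ln phi(x_i, x_j) ] }.

    log_phi: matrix with log_phi[xi, xj] = ln phi(xi, xj)
    q_j:     current marginal Q(x_j)
    """
    expected_log = log_phi @ q_j                      # E_Q[ln phi | x_i] for each x_i
    q_i = np.exp(expected_log - expected_log.max())   # subtract max for stability
    return q_i / q_i.sum()                            # local normalization Z_i
```

For example, with φ = [[2, 1], [1, 2]] and Q(x_j) concentrated on its first value, the update gives Q(x_i) ∝ [2, 1].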

Mean Field Approximation

Solution: iterate the mean field equations until they converge to a fixed point.
Problem: convergence to a local optimum.

Mean Field Approximation

Haft et al. paper: optimize the KL divergence D(Q ‖ P) instead of the free energy, decomposing it with respect to a single marginal Q(x_i). Assume Q(x) = ∏_i Q(x_i).
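Iterating the MF equations to a fixed point can be illustrated on a small Ising-style model. This is a sketch under assumed conventions (x_i ∈ {−1, +1}, couplings J, fields h), not from the slides; for this model the local update Q(x_i) ∝ exp(x_i · (h_i + Σ_j J_ij m_j)) reduces to the familiar tanh fixed-point equation for the means.

```python
import numpy as np

def mean_field_ising(J, h, iters=200):
    """Mean field for an Ising model p(x) ∝ exp(Σ_{i<j} J_ij x_i x_j + Σ_i h_i x_i),
    with x_i ∈ {−1, +1}. J must be symmetric with zero diagonal.
    Returns the converged mean-field means m_i = E_Q[x_i]."""
    m = np.zeros(len(h))                    # current means E_Q[x_i]
    for _ in range(iters):
        for i in range(len(h)):             # sequential (coordinate) updates
            m[i] = np.tanh(h[i] + J[i] @ m)
    return m
```

Two positively coupled spins under a positive field settle at a common positive fixed point, as expected; with a less favorable start the same iteration can land in a different local optimum, which is exactly the convergence caveat above.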

Mean Field Approximation

Haft et al. paper (continued): the decomposition splits into a part that does not depend on Q(x_i) and a part that does, subject to ∑_{x_i} Q(x_i) = 1. Optimizing the dependent part yields:

    Q(x_i) ∝ exp{ E_Q[ ln P(x_i | X_{−i}) ] }    (MF equation)

Locality:

    Q(x_i) ∝ exp{ E_Q[ ln P(x_i | M_i) ] }    (MF equation, simplified)

where M_i is the Markov boundary of X_i.

Mean Field Approximation

Algorithm: converges to one of typically many local minima. Easy to compute, but sometimes the approximation is not good enough: a fully factorized Q cannot describe complex posteriors (e.g., XOR). We must use a richer class of distributions.

Structured Mean Field: Exploiting Substructures

If we use a distribution Q that can capture some of the dependencies in P, we can get a better approximation. [Figure: a 4×4 grid of variables A_{1,1}, …, A_{4,4} for distribution P, and two possible substructures for Q: distribution Q_1 with independent chains, and distribution Q_2.]

Structured Mean Field: Exploiting Substructures

    Q(x) = (1/Z) ∏_j ψ_j(C_j)

where ψ_j is a factor with Scope[ψ_j] = C_j, and we assume we have the set of potential scopes {C_j ⊆ χ : j = 1, …, J}.

Structured Mean Field: Exploiting Substructures

Given Q(x) = (1/Z) ∏_j ψ_j(C_j) and the normalization restriction on each ψ_j(c_j), the potential ψ_j is locally optimal when:

    ψ_j(c_j) = exp{ ∑_{φ ∈ F} E_Q[ ln φ | c_j ] − ∑_{k ≠ j} E_Q[ ln ψ_k | c_j ] }

Structured Mean Field: Exploiting Substructures

Locality, as in standard mean field:

    ψ_j(c_j) = exp{ ∑_{φ ∈ A_j} E_Q[ ln φ | c_j ] − ∑_{ψ_k ∈ B_j} E_Q[ ln ψ_k | c_j ] }

where A_j = { φ ∈ F : U_φ is not independent of C_j } and B_j = { ψ_k : C_k is not independent of C_j }.

Structured Mean Field

Updating: the calculation of ψ_j depends on the clusters X_i belongs to, on the clusters of P overlapping C_j, and also on the scopes C_k dependent on C_j. [Figure: 4×4 grids for distribution P and distribution Q.]

Structured Mean Field

In other words, suppose we want to compute Q(A_{1,1}), with C_1 = {A_{1,1}, A_{1,2}}, C_2 = {A_{1,2}, A_{1,3}}, C_3 = {A_{2,1}, A_{2,2}}, ….
A_j: the clusters X_i belongs to, as in standard mean field, i.e., {A_{1,1}, A_{1,2}} and {A_{1,1}, A_{2,1}}, plus the clusters from F overlapping C_1. For example, A_{1,2} in C_1 overlaps in F, so we also need to consider {A_{1,2}, A_{1,3}} and {A_{1,2}, A_{2,2}}; the same occurs with A_{1,3} and A_{1,4}.
B_j: the clusters in Q dependent on C_1. In this example every C_j is independent of the others, therefore B_j is empty. [Figure: 4×4 grids for distribution P and distribution Q.]

Structured Mean Field

Again we want to compute Q(A_{1,1}), but assume a new substructure in Q: now we choose C_1 = {A_{1,1}, A_{1,2}, A_{1,3}, A_{1,4}} and C_2 = {A_{2,1}, A_{2,2}, A_{2,3}, A_{2,4}}.
A_j: we consider the same clusters as before, but now we add those overlapping with C_1, i.e., the vertical clusters such as {A_{1,2}, A_{2,2}}.
B_j: the clusters in Q dependent on C_1. Now we have variables in C_1 overlapping with variables in C_2; we need to subtract those terms, since we already used them in A_j. [Figure: 4×4 grids for distribution P and distribution Q, with C_1 and C_2 marked.]

Structured Mean Field

Another example: we want to compute Q(a, b). Now we choose C_1 = {A, B} and C_2 = {C, D}.
A_j = { {A, B}, {A, D}, {B, C} }.
B_j: empty, since C_1 and C_2 do not overlap. [Figure: a four-variable loop A–B–C–D for distribution P, and distribution Q split into {A, B} and {C, D}.]

Structured Mean Field: Exploiting Substructures

Updates are relatively costly due to the consideration of structure. Two approaches for updates:
Sequential: choose a factor and update it, then perform inference. It will converge.
Parallel: update all factors, then perform inference. It doesn't guarantee convergence.

Structured Mean Field

Example: structure (b) can be exploited:

    P(A, B, C, D) = (1/Z) ψ_1(A, B) ψ_2(C, D)
    Q(A, B, C, D) = Q_{AB}(A, B) Q_{CD}(C, D)

Structure (c) cannot be exploited (it is redundant):

    Q(A, B, C, D) = (1/Z') ψ'_1(A) ψ''_1(B) ψ'_2(C) ψ''_2(D)

Structured Mean Field: Refinement

Theorem: refine an initial approximating network by factorizing its factors into a product of factors from Q and potentials from F. Each ψ_k can be written as the product of two sets of factors: those in F whose scopes are subsets of the scope of ψ_k, and the factors in F only partially covered by the scope of ψ_k together with the other factors in Q.

Weighted Mean Field: General Mixture Weights

Idea: instead of selecting one particular MF solution, we form a weighted average (a mixture) of several solutions. Enumerate the different MF solutions Q(x | a) by a hidden variable a, and assign mixture weights Q(a):

    Q(x) = ∑_a Q(a) Q(x | a)

Weighted Mean Field

Given the solutions Q(x | a), under the constraint ∑_a Q(a) = 1, determine Q(a) such that D(Q ‖ P) is minimized:

    Q(a) = exp{ −D[Q(x | a) ‖ P] } / ∑_{a'} exp{ −D[Q(x | a') ‖ P] }

Weighted Mean Field: General Mixture Weights

The previous formula means that the different solutions Q(x | a) contribute to the global distribution Q according to their distance to P. Note, however, that we are not optimizing Q(a) and the Q(x | a) simultaneously.
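The mixture-weight formula can be checked directly for discrete distributions. A small sketch (the names `kl` and `mixture_weights` are mine, not from the slides):

```python
import numpy as np

def kl(q, p):
    """D(q || p) for discrete distributions given as arrays summing to 1."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def mixture_weights(solutions, p):
    """Weight each MF solution q_a by exp(-D[q_a || p]), then normalize,
    so solutions closer to p contribute more to the mixture."""
    w = np.array([np.exp(-kl(q, p)) for q in solutions])
    return w / w.sum()
```

A solution equal to p has KL divergence 0 and therefore receives the largest weight, matching the intuition that solutions contribute according to their distance to P.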

Weighted Mean Field

Example: noisy-OR.

Variational Methods

Idea: introduce auxiliary variational parameters that help in simplifying a complex objective function. For any x > 0 and λ > 0:

    ln x ≤ λx − ln λ − 1

This upper bound allows us to approximate ln x by a term that is linear in x.
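The bound is easy to verify numerically; a quick sketch (the function name is illustrative):

```python
import numpy as np

def log_upper_bound(x, lam):
    """Variational upper bound on ln x: for x, lam > 0, ln x <= lam*x - ln(lam) - 1.
    The bound is tight at lam = 1/x, where the right-hand side equals exactly ln x."""
    return lam * x - np.log(lam) - 1.0
```

Minimizing the right-hand side over λ recovers ln x exactly, which is why optimizing the variational parameter makes the linear approximation as tight as possible.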

Thank you!

Mean Field Approximation example from Wiegerinck; noisy-OR example from Haft et al.