Tom Heskes and Onno Zoeter. Presented by Mark Buller


Dynamic Bayesian Networks: directed graphical models of stochastic processes. They represent hidden and observed variables with different dependencies, and generalize Hidden Markov Models (HMMs).

Goal is Inference. Far left: a coupled HMM with 5 chains. Left: a DBN to monitor a waste water treatment plant (Murphy and Weiss 2001). We will generally like to perform inference: P(x_t | y_{1:T}). Why not discretize and use the Forward-Backward algorithm for exact inference? This very quickly becomes untenable.

Approximate Inference. Sampling: particle filters. Variational: switching linear dynamical systems (Ghahramani and Hinton 1998), factorial hidden Markov models (Ghahramani and Jordan 1997). A subset of the variational approaches are greedy projection algorithms, where the projection provides a simpler approximate belief; Expectation Propagation is one of these.

Problem Setup. x_t is a super node that contains all latent variables at a time point. y_{1:T} is fixed and is included in the definition of the potentials: ψ_t(x_{t-1}, x_t) ≡ ψ_t(x_{t-1}, x_t, y_t).

Goal: infer P(x_t | y_{1:T}), i.e. find the marginal beliefs, the probability distributions of the latent variables at a given time given all the evidence. Pearl's Belief Propagation (1988) is a specific case of the sum-product rule in factor graphs (Kschischang et al., 2001). Note: in chain factor graphs, variable nodes simply pass received messages on to the next function node.

Message Propagation. 1. Compute an estimate of the distribution at the local function node. 2. Integrate out all variables except x_t (the node to which the message is sent) to get the current estimate of the belief, and project this belief onto a distribution in the exponential family. 3. Conditionalize, i.e. divide by the message from x_t to ψ.
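
To make the three steps concrete, the sketch below runs them for a single chain with the latent variable discretized on an evenly spaced grid and the exponential family taken to be the Gaussians. The grid discretization and the names psi, alpha0 and beta are illustrative assumptions, not notation from the slides.

```python
import numpy as np

def project_to_gaussian(grid, p):
    """Moment matching: return the Gaussian density on `grid` with the same
    mean and variance as the (unnormalized) distribution p."""
    dx = grid[1] - grid[0]
    p = p / (p.sum() * dx)
    mean = (grid * p).sum() * dx
    var = ((grid - mean) ** 2 * p).sum() * dx
    return np.exp(-0.5 * (grid - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def ep_forward_pass(grid, psi, alpha0, beta):
    """psi[t]: pairwise potential psi(x_t, x_{t+1}) evaluated on the grid, with
    the evidence y_{t+1} folded in; beta[t]: current backward message at time t."""
    dx = grid[1] - grid[0]
    alphas = [alpha0]
    for t in range(len(psi)):
        # Step 1: estimate of the distribution at the local function node.
        joint = alphas[-1][:, None] * psi[t] * beta[t + 1][None, :]
        # Step 2: integrate out x_t and project the belief onto a Gaussian.
        belief = project_to_gaussian(grid, joint.sum(axis=0) * dx)
        # Step 3: conditionalize, i.e. divide by the backward message.
        alpha = belief / np.maximum(beta[t + 1], 1e-300)
        alphas.append(alpha / (alpha.sum() * dx))
    return alphas
```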

Belief Approximation. The projected belief takes an exponential family form, q_t(x_t) ∝ exp(γ_t · f(x_t)), where γ_t are the canonical parameters and f(x_t) the sufficient statistics. If the forward and backward messages α_t(x_t) and β_t(x_t) are initialized in the same exponential form, then the canonical parameters α_t and β_t fully specify the messages. Thus the belief can be specified as a combination of the messages, γ_t = α_t + β_t.

Moment Matching. To project the belief, the best exponential family approximation is found when the Kullback-Leibler (KL) divergence KL(P || q) is minimized; the minimum is found when the moments of P(x) and q(x) are matched. (Figure from Bishop 2006 contrasting KL(p || q) with KL(q || p).) The function g converts from canonical form to moments.
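
A minimal sketch of the moment-matching operation that appears repeatedly below, here for collapsing a weighted mixture of one-dimensional Gaussians onto the single Gaussian minimizing KL(p || q); the function name and signature are illustrative.

```python
import numpy as np

def collapse_mixture(weights, means, variances):
    """Return (mean, variance) of the Gaussian with the mixture's first two moments."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = np.sum(w * means)
    # Law of total variance: within-component plus between-component spread.
    var = np.sum(w * (variances + (means - mean) ** 2))
    return mean, var

# Example: an equal-weight mixture at -1 and +1, each with unit variance,
# collapses to mean 0.0 and variance 2.0.
print(collapse_mixture([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))
```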

Computing Forward and Backward Messages. Compute α_t such that the belief α_t(x_t) β_t(x_t) matches the projected one-step-ahead belief, with β_t kept fixed. Similarly, compute β_{t-1} such that the belief at t-1 matches, with α_{t-1} kept fixed. Note: without the projection onto the exponential family this is basically the standard forward-backward algorithm. The order of message updating is free.

Example: Switching Linear Dynamical System. The potentials are those of the switching linear dynamical system; the messages are taken to be conditional Gaussian potentials.

Example, Step 1. Compute an estimate of the distribution at the local function node. Messages are combinations of M Gaussian potentials, one for each switch state i. Transform to a representation with moments.

Example, Step 2. Integrate and sum out the components z_{t-1} and s_{t-1}. Integration over z_{t-1} can be done directly; summation over s_{t-1} yields a mixture of Gaussians and must be approximated using moment matching.

Example, Step 3. The forward message is found by dividing the approximate belief by the backward message, and converting back to canonical form.
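
For a one-dimensional Gaussian message, the conversion between moment form and canonical form used in this step might look as follows; this is a generic sketch, not the paper's exact parameterization.

```python
def moments_to_canonical(mean, var):
    """(mean, variance) -> (precision-weighted mean, precision)."""
    precision = 1.0 / var
    return precision * mean, precision

def canonical_to_moments(eta, precision):
    """(precision-weighted mean, precision) -> (mean, variance); this is the
    direction of the function g mentioned on the moment-matching slide."""
    var = 1.0 / precision
    return eta * var, var
```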

Observations. The backward pass is symmetric to the forward pass. The forward filtering pass is equivalent to a popular inference algorithm for switching linear dynamical systems (GPB2, Bar-Shalom and Li 1993). The backward smoothing pass improves upon current algorithms because no additional approximations are required. Forward and backward passes can be iterated until convergence. Expectation propagation can be used to iteratively improve other methods for inference in DBNs (e.g. Murphy and Weiss 2001). But this algorithm does not always converge.

Bethe Free Energy. Fixed points of expectation propagation correspond to fixed points of the Bethe free energy (Minka, 2001), subject to expectation constraints (overlapping beliefs must agree on their expected sufficient statistics). Under these constraints the free energy function may not be convex, i.e. it can have local fixed points.

Double-Loop Algorithm. Linearly bound the concave part. At each outer-loop step, reset the bound; in the inner loop, solve a convex constrained minimization problem, guaranteeing that the (bounded) free energy does not increase.

Inner Loop. Change to a constrained maximization problem over the Lagrange multipliers δ_t. With log q_t(x_t) ∝ γ_t · f(x_t) and substituting, δ_t can be interpreted as the difference between the forward and backward messages, and γ_t as their sum.

Inner-Loop Maximization. In terms of the messages, take the gradient with respect to δ_t, set it to 0, and damp the update. The outer loop can then be re-written as an update of the same form.

Damped Expectation Propagation. Minimization of the free energy under the expectation constraints is equivalent to a saddle-point problem. The double-loop algorithm solves this problem, but full completion of the inner loop is required to guarantee convergence. Gradient descent-ascent behavior can be achieved by damping the full updates in EP. Stable fixed points of damped EP must be at least local minima of the Bethe free energy.
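
A sketch of what damping the full EP update looks like on the canonical parameters of a message, assuming a step size eps in (0, 1], where eps = 1 recovers the undamped update; the parameter names are illustrative.

```python
import numpy as np

def damped_step(old_params, proposed_params, eps=0.5):
    """Move only a fraction eps of the way from the old canonical parameters
    towards the full (undamped) EP update."""
    old_params = np.asarray(old_params, dtype=float)
    proposed_params = np.asarray(proposed_params, dtype=float)
    return eps * proposed_params + (1.0 - eps) * old_params
```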

Simulations. Randomly generated switching linear dynamical systems; T varied between 2 and 5, the number of switches between 2 and 4. Exact beliefs were calculated using an algorithm by Lauritzen (1992) based on a strong junction tree. The approximate algorithms' beliefs were compared to the exact beliefs using the KL divergence.
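
For reference, a sketch of the KL divergence score, shown here only for the discrete (switch-state) part of a belief; the continuous part is ignored in this illustration, and the small epsilon merely guards against log(0).

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete belief vectors over the switch states."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```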

Simulation Results: Undamped EP. One forward pass yields acceptable results; the KL divergence drops after 1 to 2 more passes. Double-loop and damped EP converge to the same point.

Simulation Results: A Difficult Instance. Undamped EP gets stuck in a limit cycle (solid line). Damped EP (ε = 0.5) allows stable convergence. The double-loop algorithm converges but usually takes longer.

Non-Convergence. One instance where damped EP did not converge. Does it make sense to force convergence using the double loop? Compared the KL divergence after a single forward pass and after convergence, for easy instances (damped EP) and difficult instances (double-loop). Conclusions: it makes sense to search for the minimum of the free energy using more exhaustive means, and convergence of undamped belief propagation is an indication of the quality of an approximation.

Conclusion. Introduced a belief propagation algorithm for DBNs that is symmetric for both forward and backward messages. Project beliefs and derive messages from approximate beliefs, rather than approximating the messages themselves. Derived a double-loop algorithm guaranteed to converge, and derived damped EP as a single-loop version with the property that, when it converges, the result must be a minimum of the Bethe free energy and thus a minimum of the KL divergence for the approximation. Undamped EP works well in many cases; when it fails, this could be due to the need for damping or the need for the more tedious double-loop algorithm.

Kevin Murphy and Yair Weiss. Presented by Mark Buller

Dynamic Bayesian Networks: directed graphical models of stochastic processes. They represent hidden and observed variables with different dependencies, and generalize Hidden Markov Models (HMMs).

Goal is Inference. Far left: a coupled HMM with 5 chains. Left: a DBN to monitor a waste water treatment plant (Murphy and Weiss 2001). We will generally like to perform inference: P(x_t | y_{1:T}). Why not discretize and use the Forward-Backward algorithm? It costs O(T S^2), where S is the number of states.

Forwards-Backwards Algorithm. Define α_t(i) := P(X_t = i | y_{1:t}), β_t(i) := P(y_{t+1:T} | X_t = i), and γ_t(i) := P(X_t = i | y_{1:T}) ∝ α_t(i) β_t(i). Let W_t be the diagonal evidence matrix, W_t(i,i) := P(y_t | X_t = i), and M the transition matrix, M(i,j) := P(X_t = j | X_{t-1} = i). Then α_t ∝ W_t M^T α_{t-1} and β_t ∝ M W_{t+1} β_{t+1}.
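
A sketch of these recursions for a single S-state chain, normalizing the messages at each step; this is the O(T S^2) baseline referred to above, with lik[t] standing for the diagonal of W_t (names are illustrative).

```python
import numpy as np

def forward_backward(M, lik, prior):
    """M: (S, S) transition matrix; lik: (T, S) likelihoods P(y_t | X_t = i);
    prior: (S,) distribution over X_1. Returns the (T, S) smoothed marginals."""
    T, S = lik.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = prior * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = lik[t] * (M.T @ alpha[t - 1])   # alpha_t ∝ W_t M^T alpha_{t-1}
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = M @ (lik[t + 1] * beta[t + 1])   # beta_t ∝ M W_{t+1} beta_{t+1}
        beta[t] /= beta[t].sum()
    gamma = alpha * beta                           # gamma_t(i) ∝ alpha_t(i) beta_t(i)
    return gamma / gamma.sum(axis=1, keepdims=True)
```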

Frontier Algorithm. A method to compute the α's and β's without the need to form the Q^N x Q^N transition matrix (N = number of hidden nodes, Q = number of possible states of a node). Sweep a Markov blanket forwards, then backwards, across the DBN. The Markov blanket of a node A is the set of nodes composed of A's parents, children, and children's other parents; every other node is conditionally independent of A when conditioned on A's Markov blanket (Wikipedia).

Frontier Algorithm. F = frontier set = the nodes in the Markov blanket; nodes to the left = L; nodes to the right = R. At every step F d-separates L and R. A joint distribution over the nodes in F is maintained.

Frontier Algorithm. A node is added from R to F as soon as all of its parents are in F; to add a node, multiply the frontier distribution by its conditional probability table (CPT). A node is moved from F to L as soon as all of its children are in F; to remove a node, marginalize the frontier distribution over the removed node.

Frontier Algorithm (illustration): add X(1)_t, add X(2)_t, remove X(1)_{t-1}; the frontier then carries the forward message.

Frontier Algorithm (Observations). Exact inference takes O(T N Q^{N+2}) time and space (N = number of hidden nodes, Q = number of possible states of a node), exponential in the size of the largest frontier. Optimal ordering of additions and removals to minimize F is NP-hard. For regular DBNs, when unrolled, the frontier algorithm is equivalent to the junction tree algorithm; frontier sets correspond to maximal cliques in the moralized, triangulated graph.

Factored Frontier Algorithm. Approximate the belief state with a product of marginals: P(X_t | y_{1:t}) ≈ ∏_{i=1..N} P(X_t^i | y_{1:t}). When a node is added, its CPT is multiplied by the product of factors corresponding to its parents, giving a joint distribution for the family; the parent nodes are immediately marginalized out. This can be done for any node in any order as long as its parents are added first, so the joint distribution over frontier nodes is maintained in factored form. Takes O(T N Q^{F+1}), where F is the maximum number of parents (fan-in) of a node.
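
A sketch of one forward factored-frontier step under the simplifying assumption that each hidden node X_t^i has a single parent X_{t-1}^i; with several parents, their marginals would all be multiplied into the node's CPT before being marginalized out. Names and signature are illustrative.

```python
import numpy as np

def ff_forward_step(marginals, cpts, lik):
    """marginals[i]: length-Q marginal of X_{t-1}^i given y_{1:t-1}.
    cpts[i][a, b] = P(X_t^i = b | X_{t-1}^i = a); lik[i][b] = P(y_t^i | X_t^i = b)."""
    new_marginals = []
    for m, A, e in zip(marginals, cpts, lik):
        pred = m @ A            # multiply in the CPT, marginalize the parent immediately
        post = pred * e         # weight by the local evidence
        new_marginals.append(post / post.sum())
    return new_marginals
```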

Boyen-Koller Algorithm. Approximate the belief state with a product of marginals over C clusters: P(X_t | y_{1:t}) ≈ ∏_{c=1..C} P(X_t^c | y_{1:t}), where X_t^c is a subset of the variables {X_t^i}. Accuracy depends on the size of the clusters used to approximate the belief state. Exact inference corresponds to using a single cluster with all hidden variables at a time slice; the most aggressive approximation uses N clusters, one per variable, which is very similar to FF.

BK and FF as Special Cases of Loopy Belief Propagation. Pearl's belief propagation algorithm computes exact marginal posterior probabilities in graphs without cycles, generalizing the forward-backward algorithm to trees. It assumes that the messages coming into a node are independent; FF makes the same assumption. Both algorithms are equivalent if the order of messages in LBP is specified. Normally in LBP every node computes λ and π messages in parallel and then sends them out to all of its neighbors; however, the messages can be computed in a forwards-backwards fashion: first send π (α) messages from left to right, then send λ (β) messages from right to left. FF and BK are equivalent to one iteration of LBP, thus they can be improved by iterating more than once.

Experiments. Used a coupled HMM (CHMM) with 10 chains trained with real highway data. Define the L1 error as Σ_{i=1..N} Σ_{s=1..Q} |P(X_t^i = s | y_{1:T}) - P̂(X_t^i = s | y_{1:T})|.
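
A sketch of this L1 error for arrays of exact and approximate singleton marginals; the array layout is an assumption for illustration.

```python
import numpy as np

def l1_error(exact, approx):
    """exact, approx: (N, Q) arrays whose rows hold P(X_t^i = s | y_{1:T})
    and its approximation for each hidden node i."""
    return float(np.abs(np.asarray(exact) - np.asarray(approx)).sum())
```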

Results. Damping was necessary with LBP. Iterating with damped LBP improves on just a single run of BK.

Results: Water Network.

Results: Speed. BK and FF/LBP have a running time linear in N. BK is slower because of repeated marginalizations; when N < 11, BK is slower than exact inference.

Conclusions. Described a simple approximate inference algorithm for DBNs and showed its equivalence to LBP. Showed a connection between BK and LBP. Showed empirically that LBP can improve FF and BK.
