Tom Heskes and Onno Zoeter. Presented by Mark Buller


Dynamic Bayesian Networks: directed graphical models of stochastic processes. They represent hidden and observed variables with different dependencies, and generalize Hidden Markov Models (HMMs).

Goal is Inference. Far left: a coupled HMM with 5 chains. Left: a DBN to monitor a waste water treatment plant (Murphy and Weiss 2001). We will generally like to perform inference: P(x_t | y_{1:T}). Why not discretize and use the Forward-Backward algorithm for exact inference? This very quickly becomes untenable.

Approximate Inference. Sampling: particle filters. Variational: switching linear dynamical systems (Ghahramani and Hinton 1998), factorial hidden Markov models (Ghahramani and Jordan 1997). A subset of the variational approaches are greedy projection algorithms, where the projection provides a simpler approximate belief; Expectation Propagation is one of these.

Problem Setup. x_t is a super node that contains all latent variables at a time point. y_{1:T} is fixed and is included in the definition of the potentials: ψ_t(x_{t-1}, x_t) ≡ ψ_t(x_{t-1}, x_t, y_t).

Goal: infer P(x_t | y_{1:T}), i.e. find the marginal beliefs, the probability distributions of the latent variables at a given time given all the evidence. Pearl's Belief Propagation (1988) is a specific case of the sum-product rule in factor graphs (Kschischang et al., 2001). Note: in chain factor graphs, variable nodes simply pass received messages on to the next function node.

Message Propagation. 1. Compute an estimate of the distribution at the local function node. 2. Integrate out all variables except x_t (the node to which the message is sent) to get the current estimate of the belief, and project this belief onto a distribution in the exponential family. 3. Conditionalize, i.e. divide by the message from x_t to ψ.
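
To make the three steps concrete, the sketch below runs them for a single chain with the latent variable discretized on an evenly spaced grid and the exponential family taken to be the Gaussians. The grid discretization and the names psi, alpha0 and beta are illustrative assumptions, not notation from the slides.

```python
import numpy as np

def project_to_gaussian(grid, p):
    """Moment matching: return the Gaussian density on `grid` with the same
    mean and variance as the (unnormalized) distribution p."""
    dx = grid[1] - grid[0]
    p = p / (p.sum() * dx)
    mean = (grid * p).sum() * dx
    var = ((grid - mean) ** 2 * p).sum() * dx
    return np.exp(-0.5 * (grid - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def ep_forward_pass(grid, psi, alpha0, beta):
    """psi[t]: pairwise potential psi(x_t, x_{t+1}) evaluated on the grid, with
    the evidence y_{t+1} folded in; beta[t]: current backward message at time t."""
    dx = grid[1] - grid[0]
    alphas = [alpha0]
    for t in range(len(psi)):
        # Step 1: estimate of the distribution at the local function node.
        joint = alphas[-1][:, None] * psi[t] * beta[t + 1][None, :]
        # Step 2: integrate out x_t and project the belief onto a Gaussian.
        belief = project_to_gaussian(grid, joint.sum(axis=0) * dx)
        # Step 3: conditionalize, i.e. divide by the backward message.
        alpha = belief / np.maximum(beta[t + 1], 1e-300)
        alphas.append(alpha / (alpha.sum() * dx))
    return alphas
```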

Belief Approximation. The projected belief takes an exponential family form, q_t(x_t) ∝ exp(γ_t · f(x_t)), where γ_t are the canonical parameters and f(x_t) the sufficient statistics. If the forward and backward messages α_t(x_t) and β_t(x_t) are initialized in the same exponential form, then the canonical parameters α_t and β_t fully specify the messages. Thus the belief can be specified as a combination of the messages, γ_t = α_t + β_t.

Moment Matching. To project the belief, the best exponential family approximation is found when the Kullback-Leibler (KL) divergence KL(P || q) is minimized; the minimum is found when the moments of P(x) and q(x) are matched. (Figure from Bishop 2006 contrasting KL(p || q) with KL(q || p).) The function g converts from canonical form to moments.
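
A minimal sketch of the moment-matching operation that appears repeatedly below, here for collapsing a weighted mixture of one-dimensional Gaussians onto the single Gaussian minimizing KL(p || q); the function name and signature are illustrative.

```python
import numpy as np

def collapse_mixture(weights, means, variances):
    """Return (mean, variance) of the Gaussian with the mixture's first two moments."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = np.sum(w * means)
    # Law of total variance: within-component plus between-component spread.
    var = np.sum(w * (variances + (means - mean) ** 2))
    return mean, var

# Example: an equal-weight mixture at -1 and +1, each with unit variance,
# collapses to mean 0.0 and variance 2.0.
print(collapse_mixture([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))
```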

Computing Forward and Backward Messages. Compute α_t such that the belief α_t(x_t) β_t(x_t) matches the projected one-step-ahead belief, with β_t kept fixed. Similarly, compute β_{t-1} such that the belief at t-1 matches, with α_{t-1} kept fixed. Note: without the projection onto the exponential family this is basically the standard forward-backward algorithm. The order of message updating is free.

Example: Switching Linear Dynamical System. The potentials are those of the switching linear dynamical system; the messages are taken to be conditional Gaussian potentials.

Example, Step 1. Compute an estimate of the distribution at the local function node. Messages are combinations of M Gaussian potentials, one for each switch state i. Transform to a representation with moments.

Example, Step 2. Integrate and sum out the components z_{t-1} and s_{t-1}. Integration over z_{t-1} can be done directly; summation over s_{t-1} yields a mixture of Gaussians and must be approximated using moment matching.

Example, Step 3. The forward message is found by dividing the approximate belief by the backward message, and converting back to canonical form.
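
For a one-dimensional Gaussian message, the conversion between moment form and canonical form used in this step might look as follows; this is a generic sketch, not the paper's exact parameterization.

```python
def moments_to_canonical(mean, var):
    """(mean, variance) -> (precision-weighted mean, precision)."""
    precision = 1.0 / var
    return precision * mean, precision

def canonical_to_moments(eta, precision):
    """(precision-weighted mean, precision) -> (mean, variance); this is the
    direction of the function g mentioned on the moment-matching slide."""
    var = 1.0 / precision
    return eta * var, var
```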

Observations. The backward pass is symmetric to the forward pass. The forward filtering pass is equivalent to a popular inference algorithm for switching linear dynamical systems (GPB2, Bar-Shalom and Li 1993). The backward smoothing pass improves upon current algorithms because no additional approximations are required. Forward and backward passes can be iterated until convergence. Expectation propagation can be used to iteratively improve other methods for inference in DBNs (e.g. Murphy and Weiss 2001). But this algorithm does not always converge.

Bethe Free Energy. Fixed points of expectation propagation correspond to fixed points of the Bethe free energy (Minka, 2001), subject to expectation constraints (overlapping beliefs must agree on their expected sufficient statistics). Under these constraints the free energy function may not be convex, i.e. it can have local fixed points.

Double-Loop Algorithm. Linearly bound the concave part. At each outer-loop step, reset the bound; in the inner loop, solve a convex constrained minimization problem, guaranteeing that the (bounded) free energy does not increase.

Inner Loop. Change to a constrained maximization problem over the Lagrange multipliers δ_t. With log q_t(x_t) ∝ γ_t · f(x_t) and substituting, δ_t can be interpreted as the difference between the forward and backward messages, and γ_t as their sum.

Inner-Loop Maximization. In terms of the messages, take the gradient with respect to δ_t, set it to 0, and damp the update. The outer loop can then be re-written as an update of the same form.

Damped Expectation Propagation. Minimization of the free energy under the expectation constraints is equivalent to a saddle-point problem. The double-loop algorithm solves this problem, but full completion of the inner loop is required to guarantee convergence. Gradient descent-ascent behavior can be achieved by damping the full updates in EP. Stable fixed points of damped EP must be at least local minima of the Bethe free energy.
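
A sketch of what damping the full EP update looks like on the canonical parameters of a message, assuming a step size eps in (0, 1], where eps = 1 recovers the undamped update; the parameter names are illustrative.

```python
import numpy as np

def damped_step(old_params, proposed_params, eps=0.5):
    """Move only a fraction eps of the way from the old canonical parameters
    towards the full (undamped) EP update."""
    old_params = np.asarray(old_params, dtype=float)
    proposed_params = np.asarray(proposed_params, dtype=float)
    return eps * proposed_params + (1.0 - eps) * old_params
```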

Simulations. Randomly generated switching linear dynamical systems; T varied between 2 and 5, the number of switches between 2 and 4. Exact beliefs were calculated using an algorithm by Lauritzen (1992) based on a strong junction tree. The approximate algorithms' beliefs were compared to the exact beliefs using the KL divergence.
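
For reference, a sketch of the KL divergence score, shown here only for the discrete (switch-state) part of a belief; the continuous part is ignored in this illustration, and the small epsilon merely guards against log(0).

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete belief vectors over the switch states."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```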

Simulation Results: Undamped EP. One forward pass yields acceptable results; the KL divergence drops after 1 to 2 more passes. Double-loop and damped EP converge to the same point.

Simulation Results: A Difficult Instance. Undamped EP gets stuck in a limit cycle (solid line). Damped EP (ε = 0.5) allows stable convergence. The double-loop algorithm converges but usually takes longer.

Non-Convergence. One instance where damped EP did not converge. Does it make sense to force convergence using the double loop? Compared the KL divergence after a single forward pass and after convergence, for easy instances (damped EP) and difficult instances (double-loop). Conclusions: it makes sense to search for the minimum of the free energy using more exhaustive means, and convergence of undamped belief propagation is an indication of the quality of an approximation.

Conclusion. Introduced a belief propagation algorithm for DBNs that is symmetric for both forward and backward messages. Project beliefs and derive messages from approximate beliefs, rather than approximating the messages themselves. Derived a double-loop algorithm guaranteed to converge, and derived damped EP as a single-loop version with the property that, when it converges, the result must be a minimum of the Bethe free energy and thus a minimum of the KL divergence for the approximation. Undamped EP works well in many cases; when it fails, this could be due to the need for damping or the need for the more tedious double-loop algorithm.

Kevin Murphy and Yair Weiss. Presented by Mark Buller

Dynamic Bayesian Networks: directed graphical models of stochastic processes. They represent hidden and observed variables with different dependencies, and generalize Hidden Markov Models (HMMs).

Goal is Inference. Far left: a coupled HMM with 5 chains. Left: a DBN to monitor a waste water treatment plant (Murphy and Weiss 2001). We will generally like to perform inference: P(x_t | y_{1:T}). Why not discretize and use the Forward-Backward algorithm? It costs O(T S^2), where S is the number of states.

Forwards-Backwards Algorithm. Define α_t(i) := P(X_t = i | y_{1:t}), β_t(i) := P(y_{t+1:T} | X_t = i), and γ_t(i) := P(X_t = i | y_{1:T}) ∝ α_t(i) β_t(i). Let W_t be the diagonal evidence matrix, W_t(i,i) := P(y_t | X_t = i), and M the transition matrix, M(i,j) := P(X_t = j | X_{t-1} = i). Then α_t ∝ W_t M^T α_{t-1} and β_t ∝ M W_{t+1} β_{t+1}.
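
A sketch of these recursions for a single S-state chain, normalizing the messages at each step; this is the O(T S^2) baseline referred to above, with lik[t] standing for the diagonal of W_t (names are illustrative).

```python
import numpy as np

def forward_backward(M, lik, prior):
    """M: (S, S) transition matrix; lik: (T, S) likelihoods P(y_t | X_t = i);
    prior: (S,) distribution over X_1. Returns the (T, S) smoothed marginals."""
    T, S = lik.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = prior * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = lik[t] * (M.T @ alpha[t - 1])   # alpha_t ∝ W_t M^T alpha_{t-1}
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = M @ (lik[t + 1] * beta[t + 1])   # beta_t ∝ M W_{t+1} beta_{t+1}
        beta[t] /= beta[t].sum()
    gamma = alpha * beta                           # gamma_t(i) ∝ alpha_t(i) beta_t(i)
    return gamma / gamma.sum(axis=1, keepdims=True)
```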

Frontier Algorithm. A method to compute the α's and β's without the need to form the Q^N x Q^N transition matrix (N = number of hidden nodes, Q = number of possible states of a node). Sweep a Markov blanket forwards, then backwards, across the DBN. The Markov blanket of a node A is the set of nodes composed of A's parents, children, and children's other parents; every other node is conditionally independent of A when conditioned on A's Markov blanket (Wikipedia).

Frontier Algorithm. F = frontier set = the nodes in the Markov blanket; nodes to the left = L; nodes to the right = R. At every step F d-separates L and R. A joint distribution over the nodes in F is maintained.

Frontier Algorithm. A node is added from R to F as soon as all of its parents are in F; to add a node, multiply the frontier distribution by its conditional probability table (CPT). A node is moved from F to L as soon as all of its children are in F; to remove a node, marginalize the frontier distribution over the removed node.

Frontier Algorithm (illustration): add X(1)_t, add X(2)_t, remove X(1)_{t-1}; the frontier then carries the forward message.

Frontier Algorithm (Observations). Exact inference takes O(T N Q^{N+2}) time and space (N = number of hidden nodes, Q = number of possible states of a node), exponential in the size of the largest frontier. Optimal ordering of additions and removals to minimize F is NP-hard. For regular DBNs, when unrolled, the frontier algorithm is equivalent to the junction tree algorithm; frontier sets correspond to maximal cliques in the moralized, triangulated graph.

Factored Frontier Algorithm. Approximate the belief state with a product of marginals: P(X_t | y_{1:t}) ≈ ∏_{i=1..N} P(X_t^i | y_{1:t}). When a node is added, its CPT is multiplied by the product of factors corresponding to its parents, giving a joint distribution for the family; the parent nodes are immediately marginalized out. This can be done for any node in any order as long as its parents are added first, so the joint distribution over frontier nodes is maintained in factored form. Takes O(T N Q^{F+1}), where F is the maximum number of parents (fan-in) of a node.
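
A sketch of one forward factored-frontier step under the simplifying assumption that each hidden node X_t^i has a single parent X_{t-1}^i; with several parents, their marginals would all be multiplied into the node's CPT before being marginalized out. Names and signature are illustrative.

```python
import numpy as np

def ff_forward_step(marginals, cpts, lik):
    """marginals[i]: length-Q marginal of X_{t-1}^i given y_{1:t-1}.
    cpts[i][a, b] = P(X_t^i = b | X_{t-1}^i = a); lik[i][b] = P(y_t^i | X_t^i = b)."""
    new_marginals = []
    for m, A, e in zip(marginals, cpts, lik):
        pred = m @ A            # multiply in the CPT, marginalize the parent immediately
        post = pred * e         # weight by the local evidence
        new_marginals.append(post / post.sum())
    return new_marginals
```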

Boyen-Koller Algorithm. Approximate the belief state with a product of marginals over C clusters: P(X_t | y_{1:t}) ≈ ∏_{c=1..C} P(X_t^c | y_{1:t}), where X_t^c is a subset of the variables {X_t^i}. Accuracy depends on the size of the clusters used to approximate the belief state. Exact inference corresponds to using a single cluster with all hidden variables at a time slice; the most aggressive approximation uses N clusters, one per variable, which is very similar to FF.

BK and FF as Special Cases of Loopy Belief Propagation. Pearl's belief propagation algorithm computes exact marginal posterior probabilities in graphs without cycles, generalizing the forward-backward algorithm to trees. It assumes that the messages coming into a node are independent; FF makes the same assumption. Both algorithms are equivalent if the order of messages in LBP is specified. Normally in LBP every node computes λ and π messages in parallel and then sends them out to all of its neighbors; however, the messages can be computed in a forwards-backwards fashion: first send π (α) messages from left to right, then send λ (β) messages from right to left. FF and BK are equivalent to one iteration of LBP, thus they can be improved by iterating more than once.

Experiments. Used a coupled HMM (CHMM) with 10 chains trained with real highway data. Define the L1 error as Σ_{i=1..N} Σ_{s=1..Q} |P(X_t^i = s | y_{1:T}) - P̂(X_t^i = s | y_{1:T})|.
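
A sketch of this L1 error for arrays of exact and approximate singleton marginals; the array layout is an assumption for illustration.

```python
import numpy as np

def l1_error(exact, approx):
    """exact, approx: (N, Q) arrays whose rows hold P(X_t^i = s | y_{1:T})
    and its approximation for each hidden node i."""
    return float(np.abs(np.asarray(exact) - np.asarray(approx)).sum())
```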

Results. Damping was necessary with LBP. Iterating with damped LBP improves on just a single run of BK.

Results: Water Network.

Results: Speed. BK and FF/LBP have a running time linear in N. BK is slower because of repeated marginalizations; when N < 11, BK is slower than exact inference.

Conclusions. Described a simple approximate inference algorithm for DBNs and showed its equivalence to LBP. Showed a connection between BK and LBP. Showed empirically that LBP can improve FF and BK.
