Expectation propagation

Lloyd Elliott
May 17, 2011

Suppose $p(x)$ is a pdf and we have a factorization

$$p(x) = \frac{1}{Z} \prod_{i=1}^n f_i(x). \qquad (1)$$

Expectation propagation is an inference algorithm designed to approximate the factors $f_i$. In doing so, we may recover approximations of the marginals and joints of $p$, or we may find the normalizing constant for $p$. EP involves parameterising an approximation $\tilde f_i$ of each factor $f_i$ and iteratively including each factor into the approximation by minimising a KL-divergence.

For each factor $f_i$, fix an approximating family of distributions $\Omega_i$. Given (1) and $\Omega_i$, the EP algorithm is as follows:

initialize approximations $\tilde f_i$
repeat
    for $i = 1, \dots, n$ do
        $$\tilde f_i \leftarrow \operatorname*{argmin}_{\hat f_i \in \Omega_i} \mathrm{KL}\left( \frac{1}{B} f_i \prod_{j \ne i} \tilde f_j \,\middle\|\, \frac{1}{C} \hat f_i \prod_{j \ne i} \tilde f_j \right) \qquad (2)$$
    end for
until stopping condition reached

Here, $B$ and $C$ are normalising constants.
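To make the loop in (2) concrete, here is a minimal Python sketch (all names and values are my own, not from the talk) in which every factor $f_i$ and every site approximation $\tilde f_i$ is an unnormalised 1-D Gaussian stored in natural parameters (precision and precision-times-mean). Because a product of Gaussians is again Gaussian, the KL projection is exact here and the loop converges immediately; the point is only to show the cavity/tilted/project mechanics of the update.

# Minimal EP loop sketch (hypothetical example, not from the talk).
# Each factor f_i and site f~_i is an unnormalised 1-D Gaussian in
# natural parameters: r = 1/variance, rm = mean/variance.

factors = [(1.0, 0.0), (0.25, 1.0), (0.5, -2.0)]   # true factors f_i as (r, rm)
sites = [(0.0, 0.0) for _ in factors]              # site approximations, start flat

for sweep in range(5):
    for i, (r_f, rm_f) in enumerate(factors):
        # cavity p_{-i}: product of the other sites (natural parameters add)
        r_cav = sum(s[0] for j, s in enumerate(sites) if j != i)
        rm_cav = sum(s[1] for j, s in enumerate(sites) if j != i)
        # tilted distribution f_i * p_{-i}
        r_tilt, rm_tilt = r_cav + r_f, rm_cav + rm_f
        # project onto the Gaussian family (exact here), divide out the cavity
        sites[i] = (r_tilt - r_cav, rm_tilt - rm_cav)

r, rm = sum(s[0] for s in sites), sum(s[1] for s in sites)
print("approximate posterior mean and variance:", rm / r, 1.0 / r)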

Writing $p_{-i} = \prod_{j \ne i} \tilde f_j$, we see that the update in the EP algorithm sets $\tilde f_i$ to:

$$\operatorname*{argmin}_{\hat f_i \in \Omega_i} \frac{1}{B} \int (f_i p_{-i})(x) \log \frac{C f_i(x)}{B \hat f_i(x)} \, dx, \quad \text{such that} \quad \int (\hat f_i p_{-i})(x) \, dx = C. \qquad (3)$$

From this equation, we see that if $\hat f_i$ were unconstrained (i.e. if $\Omega_i$ were all functions on the range of $x$), then $\hat f_i = \frac{C}{B} f_i$ would be a solution. Unfortunately, the computation of $B$ and $C$ is often intractable. Therefore, to make progress in EP, we must place constraints on $\tilde f_i$ so that minimising (3) is tractable.

There are two main sorts of constraints on $\tilde f_i$ that we will examine: 1. Exponential family constraints, 2. Fully factorised constraints. In what follows we will see the general implication of these assumptions in detail, making reference to the formulation of EP updates as minimising (2). Other constraints are possible: any choice of $\Omega_i$ for which the computation of (3) is tractable leads to an EP algorithm.

Exponential family constraints

Suppose $f(x) = h(x) \exp(\eta^T u(x) - A(\eta))$ and $p(x)$ is any distribution. We want to find the natural parameter $\eta$ that minimises the following KL-divergence:

$$\mathrm{KL}(p \,\|\, f) = \int p(x) \log \frac{p(x)}{f(x)} \, dx = \mathbb{E}_p[\log p(x)] - \mathbb{E}_p[\log h(x)] + A(\eta) - \eta^T \mathbb{E}_p[u(x)].$$

We proceed by equating the derivative with respect to $\eta$ to zero:

$$\nabla_\eta A(\eta) = \mathbb{E}_p[u(x)]. \qquad (4)$$

But, because $f$ is from an exponential family, $\nabla_\eta A(\eta) = \mathbb{E}_f[u(x)]$. Thus, the KL-divergence is minimised when $\mathbb{E}_f[u(x)] = \mathbb{E}_p[u(x)]$. This is why EP is sometimes called moment matching.
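A quick numerical illustration of (4), as a sketch with assumed example values: take $p$ to be a two-component Gaussian mixture, compute $\mathbb{E}_p[x]$ and $\mathrm{Var}_p(x)$ by quadrature, and check that the moment-matched Gaussian attains a smaller $\mathrm{KL}(p \,\|\, f)$ than nearby Gaussians.

import numpy as np

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

# target p: a two-component Gaussian mixture (assumed example values)
p = 0.3 * gauss(x, -2.0, 1.0) + 0.7 * gauss(x, 1.5, 0.5)

# moment matching: E_p[u(x)] for u(x) = (x, x^2)
m = np.sum(x * p) * dx
v = np.sum((x - m) ** 2 * p) * dx

def kl(p, q):
    return np.sum(p * np.log(p / q)) * dx

print(kl(p, gauss(x, m, v)))           # moment-matched Gaussian
print(kl(p, gauss(x, m + 0.3, v)))     # perturbed mean: strictly larger
print(kl(p, gauss(x, m, 1.5 * v)))     # perturbed variance: strictly larger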

Returning to the situation of EP, suppose we restrict $\tilde f_i$ to be proportional to a distribution in a given exponential family:

$$\Omega_i = \{ f(x) : f(x) \propto h_i(x) \exp(\eta^T u(x) - A_i(\eta)) \text{ for some } \eta \}.$$

Without loss of generality, we have assumed the same form of the sufficient statistics $u(x)$ for each approximating distribution. Suppose $\tilde f_j \propto \exp(\tilde\eta_j^T u(x) - A_j(\tilde\eta_j))$ are the current site approximations (proportionality in $\tilde\eta_j$). The EP minimisation step (2) for $\tilde f_i$ is:

$$\tilde f_i \leftarrow \operatorname*{argmin}_{\hat f_i \in \Omega_i} \mathrm{KL}\left( \frac{1}{B} f_i p_{-i} \,\middle\|\, \frac{1}{C} \hat f_i p_{-i} \right).$$

Collecting terms in the exponent, the second argument in the KL-divergence is exponential family with (proportionality in $\hat\eta_i$):

$$\hat f_i p_{-i} \propto \exp\Big( \big(\hat\eta_i + \sum_{j \ne i} \tilde\eta_j\big)^T u(x) - A_i(\hat\eta_i) - \sum_{j \ne i} A_j(\tilde\eta_j) \Big). \qquad (5)$$

Suppose $\tilde\eta_j$ are given for all $j \ne i$. We will use (5) to write $\mathbb{E}_{\hat f_i p_{-i}}[u(x)]$ as a function of $\hat\eta_i$: suppose $\Phi_i(\hat\eta_i) = \mathbb{E}_{\hat f_i p_{-i}}[u(x)]$. To proceed, we must be able to compute $\mathbb{E}_{f_i p_{-i}}[u(x)]$ for the fixed $\tilde\eta_j$. In this case, the update (2) is given by the following:

$$\hat\eta_i \leftarrow \Phi_i^{-1}\big( \mathbb{E}_{f_i p_{-i}}[u(x)] \big). \qquad (6)$$
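As a concrete $\Phi_i$ and $\Phi_i^{-1}$ (my own sketch, with hypothetical helper names): for a 1-D Gaussian with $u(x) = (x, x^2)$, the natural parameters are $\eta = (m/v, -1/(2v))$, and $\Phi$ maps $\eta$ to the mean parameters $(\mathbb{E}[x], \mathbb{E}[x^2])$. Both directions have closed forms, which is what makes the update (6) practical.

def phi(eta1, eta2):
    # natural parameters -> mean parameters (E[x], E[x^2]) for a 1-D Gaussian
    v = -1.0 / (2.0 * eta2)
    m = eta1 * v
    return m, v + m ** 2

def phi_inv(ex, ex2):
    # mean parameters -> natural parameters; the inverse of phi
    v = ex2 - ex ** 2
    return ex / v, -1.0 / (2.0 * v)

print(phi_inv(*phi(0.5, -0.25)))   # round trip recovers (0.5, -0.25)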

Fully factorised constraints

Suppose $x = (x_1, \dots, x_k)$ and

$$p(x) = \frac{1}{B} \prod_{i=1}^n f_i(C_i),$$

where $C_1, \dots, C_n$ are subsets of $x$. (N.b. the $C_i$ might overlap.) This model has the same expressive power as factor graphs: if $G$ is a factor graph then the terms $f_i(C_i)$ correspond to the factors of $G$. In particular, if $G$ is an undirected graphical model, then we can choose $C_1, \dots, C_n$ so that $C_i$ is the pair of vertices connected by the $i$-th edge of $G$.

The fully factorised constraint on $\tilde f_i(C_i)$ is:

$$\tilde f_i(C_i) = \prod_{x_l \in C_i} \tilde f_{il}(x_l).$$

We will also assume that the $\tilde f_{il}(x_l)$ are restricted to functions proportional to exponential families with base measures, natural parameters, and partition functions $h_{il}$, $\tilde\eta_{il}$, $A_{il}$ respectively. As above:

$$\tilde f_{il}(x_l) \propto \exp\big( \tilde\eta_{il}^T u_l(x_l) - A_{il}(\tilde\eta_{il}) \big).$$

Note that as $\tilde f_i$ splits, we write separate sufficient statistics $u_l$ for each component of $x$. We have constrained $\Omega_i$ to be an exponential family that splits over the random variables contained in $C_i$.

Under these constraints, we find the factors in the KL-divergence (3) that depend on $\hat f_i$ for a fixed $i$:

$$\mathrm{KL}\left( \frac{1}{B} f_i p_{-i} \,\middle\|\, \frac{1}{C} \hat f_i p_{-i} \right) = \frac{1}{B} \int (f_i p_{-i})(x) \log (f_i/\hat f_i)(x) \, dx$$

$$= \frac{1}{B} \int f_i(C_i) \prod_{j \ne i} \prod_{x_l \in C_j} \tilde f_{jl}(x_l) \, \log (f_i/\hat f_i)(x) \, dx$$

$$= \frac{1}{B} \underbrace{\int_{x \setminus C_i} \prod_{j \ne i} \prod_{x_l \in C_j \setminus C_i} \tilde f_{jl}(x_l)}_{\text{no } \hat\eta_i \text{ dependence}} \int_{C_i} f_i(C_i) \prod_{j \ne i} \prod_{x_l \in C_j \cap C_i} \tilde f_{jl}(x_l) \, \log (f_i/\hat f_i)(x) \, dx$$

$$= \mathrm{KL}\left( \frac{1}{B} f_i p_{C_i} \,\middle\|\, \frac{1}{C} \hat f_i p_{C_i} \right) \quad \text{(up to terms with no } \hat\eta_i \text{ dependence)},$$

where $p_{C_i} = \prod_{j \ne i,\ x_l : x_l \in C_j \cap C_i} \tilde f_{jl}(x_l)$. Expectations with respect to the first argument of this KL are integrals over $C_i$, which are tractable.

In particular, $\hat f_i = \prod_{x_l \in C_i} \hat f_{il}(x_l)$, and so the above KL is optimised when the following KL-divergences are minimised for each $l$:

$$\mathrm{KL}\left( \frac{1}{B} f_i p_{C_i} \,\middle\|\, \frac{1}{D} \hat f_{il} p_{C_i} \right).$$

By the exponential family derivation above,

$$(\hat f_{il} p_{C_i})(x_l) \propto \exp\Big( \big(\hat\eta_{il} + \sum_{j \ne i : x_l \in C_j} \tilde\eta_{jl}\big)^T u_l(x_l) - A_{il}(\hat\eta_{il}) - \sum_{j \ne i : x_l \in C_j} A_{jl}(\tilde\eta_{jl}) \Big). \qquad (7)$$

So the EP update for $\hat f_{il}$ is found as follows:

1. Use equation (7) above to write $\mathbb{E}_{\hat f_{il} p_{C_i}}[u_l(x_l)]$ as a function of $\hat\eta_{il}$: suppose the function is $\Phi_{il}(\hat\eta_{il}) = \mathbb{E}_{\hat f_{il} p_{C_i}}[u_l(x_l)]$.
2. Compute $\mathbb{E}_{f_i p_{C_i}}[u_l(x_l)]$.
3. Set $\hat\eta_{il} \leftarrow \Phi_{il}^{-1}\big( \mathbb{E}_{f_i p_{C_i}}[u_l(x_l)] \big)$.

The first two steps involve integration over $C_i$, which is tractable if the sizes of the $C_i$ are small. Every named exponential family admits an analytic form for $\Phi^{-1}$.
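For a single binary variable with a Bernoulli-style site $\exp(\eta x)$, the three steps collapse to a logit inversion. A tiny sketch under assumed numbers (the tilted distribution is just a two-point table):

import math

# step 1: for a site exp(eta * x) on x in {0, 1}, Phi(eta) = E[x] = sigmoid(eta)
# step 2: expected sufficient statistic under an (assumed) unnormalised tilted table
tilted = {0: 0.2, 1: 0.8}                 # values of f_i * p_{C_i} at x = 0, 1
ex = tilted[1] / (tilted[0] + tilted[1])  # E[x] under the tilted distribution
# step 3: Phi^{-1} is the logit, giving the new natural parameter
eta_new = math.log(ex / (1.0 - ex))
print(eta_new)                            # log(0.8 / 0.2) ~= 1.386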

Example: Graphical models on binary variables

Suppose $G$ is an undirected graphical model on binary random variables $V(G) = \{x_1, \dots, x_n\}$:

$$p(x_1, \dots, x_n) = \frac{1}{Z} \prod_{xy \in E(G)} f_{xy}(x, y). \qquad (8)$$

Here, $E(G)$ are the edges of $G$. We have absorbed the factors involving just one variable into the factors on the edges. We can write $f_{xy}$ as the following exponential family with sufficient statistics $x$, $y$, $xy$:

$$f_{xy}(x, y) = \mu_{xy;00}^{(1-x)(1-y)} \, \mu_{xy;10}^{x(1-y)} \, \mu_{xy;01}^{(1-x)y} \, \mu_{xy;11}^{xy} = \exp(\sigma_x x + \sigma_y y + \sigma_{xy} xy + b_{xy}). \qquad (9)$$

In (9), the natural parameters for $f_{xy}$ are:

$$\sigma_x = \log(\mu_{xy;10}/\mu_{xy;00}), \quad \sigma_y = \log(\mu_{xy;01}/\mu_{xy;00}), \quad \sigma_{xy} = \log \frac{\mu_{xy;11}\,\mu_{xy;00}}{\mu_{xy;10}\,\mu_{xy;01}},$$

and the constant term is $b_{xy} = \log \mu_{xy;00}$. We will apply the fully factorized constraint to the approximate site potentials:

$$\tilde f_{xy}(x, y) = \tilde f_{xy;x}(x) \, \tilde f_{xy;y}(y) \propto \exp(\delta_{xy;x} x) \exp(\delta_{xy;y} y). \qquad (10)$$

The sufficient statistics of this approximation are $x$ and $y$.
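A short numeric check of (9) under assumed table values: build $\sigma_x$, $\sigma_y$, $\sigma_{xy}$, $b_{xy}$ from an arbitrary positive table $\mu$ and confirm that $\exp(\sigma_x x + \sigma_y y + \sigma_{xy} xy + b_{xy})$ reproduces the table.

import math

mu = {(0, 0): 0.5, (1, 0): 2.0, (0, 1): 0.25, (1, 1): 4.0}   # assumed values
sx = math.log(mu[1, 0] / mu[0, 0])
sy = math.log(mu[0, 1] / mu[0, 0])
sxy = math.log(mu[1, 1] * mu[0, 0] / (mu[1, 0] * mu[0, 1]))
b = math.log(mu[0, 0])

for x in (0, 1):
    for y in (0, 1):
        assert abs(math.exp(sx * x + sy * y + sxy * x * y + b) - mu[x, y]) < 1e-12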

We derive the update (6) for $\hat f_{xy}$, assuming that the $\tilde f_{x'y'}$ are given for all edges $x'y' \ne xy$. We must find the expected values of the sufficient statistics of $f_{xy} p_{\{x,y\}}$. As in (7), with $C_i = \{x, y\}$:

$$(f_{xy} p_{\{x,y\}})(x, y) \propto \exp\Big( \sigma_x x + \sigma_y y + \sigma_{xy} xy + b_{xy} + \sum_{y' \in N(x) \setminus y} \delta_{xy';x} \, x + \sum_{x' \in N(y) \setminus x} \delta_{x'y;y} \, y \Big). \qquad (11)$$

We compute the expected value of $x$ under (11):

$$\mathbb{E}_{f_{xy} p_{\{x,y\}}}[x] = \frac{ \exp\big(\sigma_x + \sum_{y' \in N(x) \setminus y} \delta_{xy';x}\big) \Big( 1 + \exp\big(\sigma_y + \sigma_{xy} + \sum_{x' \in N(y) \setminus x} \delta_{x'y;y}\big) \Big) }{ 1 + \exp\big(\sigma_x + \sum_{y' \in N(x) \setminus y} \delta_{xy';x}\big) + \exp\big(\sigma_y + \sum_{x' \in N(y) \setminus x} \delta_{x'y;y}\big) + \exp\big(\sigma_x + \sigma_y + \sigma_{xy} + \sum_{y' \in N(x) \setminus y} \delta_{xy';x} + \sum_{x' \in N(y) \setminus x} \delta_{x'y;y}\big) } =: \rho_x. \qquad (12)$$

The expression (12) on the previous slide can be calculated directly from (11) by expanding $\mathbb{E}_{f_{xy} p_{\{x,y\}}}[x]$ as:

$$\frac{ 0 \cdot \big( (f_{xy} p_{\{x,y\}})(0, 0) + (f_{xy} p_{\{x,y\}})(0, 1) \big) + 1 \cdot \big( (f_{xy} p_{\{x,y\}})(1, 0) + (f_{xy} p_{\{x,y\}})(1, 1) \big) }{ (f_{xy} p_{\{x,y\}})(0, 0) + (f_{xy} p_{\{x,y\}})(0, 1) + (f_{xy} p_{\{x,y\}})(1, 0) + (f_{xy} p_{\{x,y\}})(1, 1) }.$$

Next,

$$\mathbb{E}_{\tilde f_{xy}}[x] = \frac{ 0 \cdot \big( \exp(0 \cdot \delta_{xy;x} + 0 \cdot \delta_{xy;y}) + \exp(0 \cdot \delta_{xy;x} + 1 \cdot \delta_{xy;y}) \big) + 1 \cdot \big( \exp(1 \cdot \delta_{xy;x} + 0 \cdot \delta_{xy;y}) + \exp(1 \cdot \delta_{xy;x} + 1 \cdot \delta_{xy;y}) \big) }{ \exp(0 \cdot \delta_{xy;x} + 0 \cdot \delta_{xy;y}) + \exp(1 \cdot \delta_{xy;x} + 0 \cdot \delta_{xy;y}) + \exp(0 \cdot \delta_{xy;x} + 1 \cdot \delta_{xy;y}) + \exp(1 \cdot \delta_{xy;x} + 1 \cdot \delta_{xy;y}) } = \frac{\exp(\delta_{xy;x})}{1 + \exp(\delta_{xy;x})}. \qquad (13)$$

Equating (12) and (13) yields the update for $\delta_{xy;x}$:

$$\mathbb{E}_{\tilde f_{xy}}[x] = \mathbb{E}_{f_{xy} p_{\{x,y\}}}[x], \qquad \frac{\exp(\delta_{xy;x})}{1 + \exp(\delta_{xy;x})} = \rho_x.$$

Thus, the update for $\delta_{xy;x}$ is:

$$\delta_{xy;x} \leftarrow \log \frac{\rho_x}{1 - \rho_x}, \qquad (14)$$

and the update for $\delta_{xy;y}$ follows by symmetry. This completes the EP algorithm for arbitrary undirected graphs of binary random variables. Note that (14) is found by inverting the expected value as a function of the natural parameter. This is the $\Phi^{-1}$ function from (6).
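Putting the example together, here is a minimal Python sketch of the resulting updates on a small binary graph (my own code, naming, and parameter values, not from the talk). One caveat: following the general recipe (6)-(7), $\Phi$ for a site parameter includes the incoming field from the other edges, so this sketch subtracts that field when inverting the logit; applying (14) verbatim corresponds to dropping that subtraction.

import math
from collections import defaultdict

# 3-cycle on binary variables; sigma[e] = (sigma_x, sigma_y, sigma_xy) for e = (x, y)
edges = [(0, 1), (1, 2), (0, 2)]
sigma = {(0, 1): (0.2, -0.1, 1.0), (1, 2): (0.0, 0.3, -0.5), (0, 2): (0.4, 0.0, 0.8)}
delta = defaultdict(float)   # delta[(e, v)] = site parameter delta_{e;v}

def incoming(v, excl):
    # sum of site parameters from the other edges touching v, as in (11)
    return sum(delta[(e, v)] for e in edges if v in e and e != excl)

for sweep in range(100):
    for e in edges:
        x, y = e
        sx, sy, sxy = sigma[e]
        a = sx + incoming(x, e)   # total field on x in the tilted distribution (11)
        b = sy + incoming(y, e)   # total field on y
        den = 1.0 + math.exp(a) + math.exp(b) + math.exp(a + b + sxy)
        rho_x = math.exp(a) * (1.0 + math.exp(b + sxy)) / den   # E[x], as in (12)
        rho_y = math.exp(b) * (1.0 + math.exp(a + sxy)) / den   # E[y], by symmetry
        # invert Phi: logit of the matched mean, minus the incoming cavity field
        delta[(e, x)] = math.log(rho_x / (1.0 - rho_x)) - incoming(x, e)
        delta[(e, y)] = math.log(rho_y / (1.0 - rho_y)) - incoming(y, e)

for v in range(3):
    field = sum(delta[(e, v)] for e in edges if v in e)
    print("approx P(x_%d = 1):" % v, 1.0 / (1.0 + math.exp(-field)))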