Global Gaussian approximations in latent Gaussian models
Botond Cseke

April 9, 2010

Abstract

A review of global approximation methods in latent Gaussian models.

1 Latent Gaussian models

In this section we introduce notation and define the model under consideration. Let $p(y \mid x, \theta_l)$ be the conditional probability of the observations $y = (y_1, \ldots, y_n)$ given the latent variables $x = (x_1, \ldots, x_n)$ and the hyper-parameters $\theta_l$. We assume that the likelihood $p(y \mid x, \theta_l)$ factorizes over the latent variables as

    $p(y \mid x, \theta_l) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta_l)$.

The prior $p(x \mid \theta_p)$ over the latent variables is taken to be Gaussian with canonical parameters $h(\theta_p)$ and $Q(\theta_p)$, that is,

    $p(x \mid \theta_p) \propto \exp\left( x^T h(\theta_p) - \tfrac{1}{2} x^T Q(\theta_p) x \right)$.

Examples for $p(x \mid \theta_p)$ include Gaussian process models, where $Q^{-1}(\theta_p)$ is the covariance function evaluated at the input locations, and Gaussian Markov random fields, where the elements of $Q(\theta_p)$ are the interaction strengths $Q_{ij}(\theta_p)$ between the latent variables $x_i$ and $x_j$. The prior $p(\theta_l, \theta_p)$ over the hyper-parameters is typically taken to be non-informative (uniform for location variables and log-uniform for scale variables) and factorizes w.r.t. the parameters of the likelihood and the parameters of the prior. In order to simplify notation we use a single proxy $\theta = (\theta_l, \theta_p)$ to denote the hyper-parameters of the model. The joint distribution of the variables in the model we study is

    $p(y, x, \theta) \propto \prod_{i=1}^{n} p(y_i \mid x_i, \theta) \exp\left( x^T h(\theta) - \tfrac{1}{2} x^T Q(\theta) x \right) p(\theta)$.

We take $y$ fixed and we consider the problem of computing accurate approximations of the posterior marginal densities of the latent variables $p(x_i \mid y, \theta)$, given a fixed hyper-parameter value. Then we integrate these marginals over the approximation of the hyper-parameters' posterior $p(\theta \mid y)$. The exact quantities are given by the formulas

    $p(x_i \mid y, \theta) = \frac{1}{p(y \mid \theta)} \, p(y_i \mid x_i, \theta) \int dx_{\setminus i} \prod_{j \neq i} p(y_j \mid x_j, \theta) \, p(x \mid \theta)$    (1)

    $p(\theta \mid y) \propto p(\theta) \, p(y \mid \theta)$.    (2)

We use the word evidence for $p(y \mid \theta) = \int dx \, p(y, x \mid \theta)$. In the following we omit the dependence of $p(y_i \mid x_i, \theta)$ and $p(x \mid \theta)$ on $\theta$ whenever it is not relevant, and use $t_i(x_i)$ as an alias of $p(y_i \mid x_i, \theta)$ and $p_0(x)$ as an alias of $p(x \mid \theta)$. We write $p(x) = Z_p^{-1} \prod_i t_i(x_i) \, p_0(x)$, that is, $Z_p \equiv p(y \mid \theta)$. A Gaussian approximation of $p$ will be denoted by $q$ and $Z_q$ will denote its normalization constant.
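For concreteness, the following minimal sketch instantiates the model above; the chain-structured GMRF precision, the Bernoulli-logistic likelihood, and all names (n, Q, h, y, log_tilde_p) are illustrative assumptions of the example, not part of the original text.

import numpy as np
from scipy.sparse import diags

# Hypothetical instance: a 1-D chain GMRF prior (tridiagonal Q, h = 0) and a
# Bernoulli likelihood t_i(x_i) = sigmoid((2 y_i - 1) x_i) with y_i in {0, 1}.
n = 50
rng = np.random.default_rng(0)
Q = diags([-1.0, 2.1, -1.0], [-1, 0, 1], shape=(n, n)).toarray()
h = np.zeros(n)
y = rng.integers(0, 2, size=n).astype(float)

def log_tilde_p(x):
    # Unnormalized log posterior: sum_i log t_i(x_i) + x'h - x'Qx / 2,
    # whose normalizer over x is the evidence Z_p = p(y | theta).
    log_lik = -np.logaddexp(0.0, -(2.0 * y - 1.0) * x).sum()
    return log_lik + x @ h - 0.5 * x @ Q @ x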
2 Global Gaussian approximations

2.1 The Laplace method

The Laplace method (known in statistics as the Gaussian approximation; e.g., Sivia, 1996) computes an approximating Gaussian that is characterized by the local properties of the distribution at its mode $x^* = \arg\max_x \log p(x)$. The mean $m$ is defined as $m = x^*$, while the inverse of the covariance matrix is the negative Hessian of $\log p$ at the mode $x^*$. The idea behind the method is the following. Let $f = \log p$. Expanding $f$ to second order at an arbitrary value $\bar{x}$ we get

    $f(x) = f(\bar{x}) + (x - \bar{x})^T \nabla_x f(\bar{x}) + \tfrac{1}{2} (x - \bar{x})^T \nabla^2_{xx} f(\bar{x}) (x - \bar{x}) + R_2[f](x; \bar{x})$,    (3)

where $R_2[f](x; \bar{x})$ is the residual term of the expansion at $\bar{x}$, with $R_2[f](\bar{x}; \bar{x}) = 0$. By using the change of variables $s = x - \bar{x}$, we have

    $\log \int dx \, e^{f(x)} = f(\bar{x}) - \tfrac{1}{2} \nabla_x f(\bar{x})^T \left[ \nabla^2_{xx} f(\bar{x}) \right]^{-1} \nabla_x f(\bar{x}) - \tfrac{1}{2} \log \left| -\nabla^2_{xx} f(\bar{x}) \right| + \tfrac{n}{2} \log 2\pi + \log E_s\!\left[ e^{R_2[f](s + \bar{x}; \bar{x})} \right]$,    (4)

where $|\cdot|$ denotes the determinant and the expectation w.r.t. $s$ is taken over a Gaussian distribution with canonical parameters $\nabla_x f(\bar{x})$ and $-\nabla^2_{xx} f(\bar{x})$. A closer look at (3) and (4) suggests that choosing $\bar{x} = x^*$ and using the approximation $R_2[\log p](x; x^*) \approx 0$ yields an approximation of the log evidence (here $p$ stands for the unnormalized posterior $\prod_i t_i(x_i) \, p_0(x)$, so that $\int dx \, p(x) = Z_p$):

    $\log \int dx \, p(x) \approx \log p(x^*) + \tfrac{n}{2} \log 2\pi - \tfrac{1}{2} \log \left| -\nabla^2_{xx} \log p(x^*) \right|$.    (5)

Meanwhile, $p$ can be approximated by the Gaussian

    $q(x) = N\!\left( x; \, x^*, \left[ -\nabla^2_{xx} \log p(x^*) \right]^{-1} \right)$.    (6)

Note that any reasonably good approximation of $E_s[e^{R_2[f](s + \bar{x}; \bar{x})}]$ can improve the accuracy of the approximation in (5).

The Laplace method requires the second order differentiability of $\log p$ at $x^*$; thus a necessary condition for the applicability of this approximation scheme is the second order differentiability of $\log p$. An example of a distribution $p$ for which the method fails to give any meaningful information about the variances is the Laplace likelihood $p(y_i \mid x_i) = \tfrac{\lambda}{2} \exp(-\lambda |y_i - x_i|)$: in this case the Hessian of $\log p$ at an arbitrary point $x$ is either equal to the negative prior precision or it is undefined. Since the Laplace approximation captures the characteristics of the modal configuration, it often gives poor estimates of the normalization constant (e.g., Kuss and Rasmussen, 2005). However, compared to other methods the main advantage of the Laplace method is its speed: the optimization of $\log p$ w.r.t. $x$ requires only a few Newton steps, as sketched below.
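The following sketch runs the Newton iteration for the illustrative logistic-likelihood model instantiated in Section 1; the fixed iteration count and the absence of a line search are simplifying assumptions.

import numpy as np

def laplace_approximation(Q, h, y, iters=20):
    # Newton iteration on log p(x) for the logistic likelihood: the gradient
    # of sum_i log t_i(x_i) is y - sigmoid(x), and its negative Hessian is the
    # diagonal W with W_ii = sigmoid(x_i)(1 - sigmoid(x_i)).
    n = len(h)
    x = np.zeros(n)
    for _ in range(iters):
        s = 1.0 / (1.0 + np.exp(-x))
        grad = (y - s) + h - Q @ x                  # gradient of log p
        W = np.diag(s * (1.0 - s))
        x = x + np.linalg.solve(Q + W, grad)        # Newton step
    s = 1.0 / (1.0 + np.exp(-x))
    A = Q + np.diag(s * (1.0 - s))                  # -Hessian of log p at x*
    log_p_mode = -np.logaddexp(0.0, -(2.0 * y - 1.0) * x).sum() \
                 + x @ h - 0.5 * x @ Q @ x
    log_Z = log_p_mode + 0.5 * n * np.log(2.0 * np.pi) \
            - 0.5 * np.linalg.slogdet(A)[1]         # evidence estimate (5)
    return x, np.linalg.inv(A), log_Z               # mean and covariance of q, cf. (6)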
2.2 Variational approximation

An approximation scheme that goes beyond local characteristics is the so-called variational approximation. The minimization of $D[p \,\|\, q]$ over Gaussians $q$ leads to the moment matching Gaussian approximation, but the computations are intractable. An alternative approach is to minimize $D[q \,\|\, p]$. As shown in Opper and Archambeau (2009), this approach leads to a tractable optimization problem.

Expressing $D[q \,\|\, p]$ in terms of the moment parameters $m$ and $V$ of $q$, one gets

    $D[q \,\|\, p] = \int dx \, q(x) \log \frac{q(x)}{\prod_i t_i(x_i) \, p_0(x)} + \log Z_p = F(m, V) + \log p(y \mid \theta)$,

where $F$ is the variational free energy (e.g., Opper and Archambeau, 2009)

    $F(m, V) \equiv -\tfrac{1}{2} \log |V| + \tfrac{1}{2} \mathrm{tr}(QV) + \tfrac{1}{2} m^T Q m - m^T h - \sum_i E_q[\log t_i(x_i)] + C$,

where $C$ is an irrelevant constant. The optimality conditions for $F$ are

    $\frac{\partial}{\partial m} E_q[\log p(x)] = 0$  and  $V^{-1} = -\frac{\partial^2}{\partial m \, \partial m^T} E_q[\log p(x)]$,    (7)

that is,

    $V^{-1} = Q - \mathrm{diag}\!\left( 2 \, \frac{\partial}{\partial V_{ii}} E_q[\log t_i(x_i)] \right)$  and  $m = Q^{-1} \left( h + \frac{\partial}{\partial m} \sum_i E_q[\log t_i(x_i)] \right)$.    (8)

As pointed out by Opper and Archambeau (2009), due to the properties of Gaussian integrals (see Section 3.1) these are equivalent to

    $E_q\!\left[ \nabla_x \log p(x) \right] = 0$  and  $V^{-1} = -E_q\!\left[ \nabla^2_{xx} \log p(x) \right]$,    (9)

that is, the stationary conditions for $D[q \,\|\, p]$ w.r.t. $m$ and $V$ resemble those of the Laplace approximation. Loosely speaking, the stationary conditions for the variational free energy $F$ w.r.t. the average configuration $m$ are similar to the stationary conditions of the energy function w.r.t. the model parameters $x$. Intuitively, in contrast to the Laplace approximation, the optimality conditions for the variational free energy hold on average.

Since $D[q \,\|\, p] = 0$ if and only if $q$ is equal to the posterior, the variational free energy $F$ is an upper bound on $-\log Z_p$, and one can approximate $\log Z_p$ by the negative of the minimum of $F$ (e.g., Neal and Hinton, 1998), that is, $\log Z_p \approx -\min_{m,V} F(m, V)$. If $t_i$ depends only on $x_i$, then a sufficient condition for the convexity of $F$ in $(m, V)$ is the convexity of $I(m_i, v_i) = -\int \log t_i(x) \, N(x; m_i, v_i) \, dx$ in $(m_i, v_i)$.

As pointed out by many authors (e.g., Kuss and Rasmussen, 2005; Minka, 2005), the variational approximation tends to share a hallmark of the Laplace approximation, namely the underestimation of the posterior marginal variances. This can be explained by the fact that the variational approximation is a limit case of expectation propagation when using local $\alpha$-divergences with $\alpha \to 0$.
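As a sketch of how $F$ can be evaluated in practice, the one dimensional expectations $E_q[\log t_i(x_i)]$ below are computed by Gauss-Hermite quadrature; the logistic likelihood and the node count are assumptions carried over from the earlier examples, and the constant $C$ is dropped.

import numpy as np

gh_x, gh_w = np.polynomial.hermite_e.hermegauss(32)   # nodes/weights for exp(-x^2/2)
gh_w = gh_w / np.sqrt(2.0 * np.pi)                    # now integrates against N(0, 1)

def free_energy(m, V, Q, h, y):
    # F(m, V) up to the constant C; the first four terms are closed form,
    # each E_q[log t_i] is a one dimensional Gaussian expectation.
    F = -0.5 * np.linalg.slogdet(V)[1] + 0.5 * np.trace(Q @ V) \
        + 0.5 * m @ Q @ m - m @ h
    v = np.diag(V)
    for i in range(len(m)):
        z = m[i] + np.sqrt(v[i]) * gh_x                       # x_i ~ N(m_i, V_ii)
        F -= gh_w @ (-np.logaddexp(0.0, -(2.0 * y[i] - 1.0) * z))
    return F          # minimizing F over (m, V) approximates -log Z_p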
2.3 Expectation propagation

Expectation propagation (EP) approximates the integral for the evidence in the following way. Let us assume that $q$ is a Gaussian approximation of $p$ constrained to have the form $q(x) = Z_q^{-1} \prod_j \tilde{t}_j(x_j) \, p_0(x)$. Then the evidence can be approximated as

    $Z_p = \int dx \, p_0(x) \prod_j t_j(x_j) = Z_q \int dx \, q(x) \prod_j \frac{t_j(x_j)}{\tilde{t}_j(x_j)} \approx Z_q \prod_j \int dx_j \, q(x_j) \frac{t_j(x_j)}{\tilde{t}_j(x_j)}$,    (10)

and we are left with choosing the appropriate $\tilde{t}_j(x_j)$ that yield both a good approximation of the integral and of $p(x)$. EP computes the terms $\tilde{t}_j(x_j)$ by iterating

    $\tilde{t}_j^{new}(x_j) \propto \tilde{t}_j(x_j) \, \frac{\mathrm{Collapse}\!\left[ t_j(x_j) \, \tilde{t}_j(x_j)^{-1} q(x) \right]}{q(x)}$,  for all $j \in \{1, \ldots, n\}$,    (11)

where $\mathrm{Collapse}(r)$ is the Kullback-Leibler (KL) projection of the distribution $r$ onto the family of Gaussian distributions; in other words, it is the Gaussian distribution that matches the first two moments of $r$. Using the properties of the KL divergence, one can check that when the terms $t_j$ depend only on $x_j$ then

    $\mathrm{Collapse}\!\left[ t_j(x_j) \tilde{t}_j(x_j)^{-1} q(x) \right] / q(x) = \mathrm{Collapse}\!\left[ t_j(x_j) \tilde{t}_j(x_j)^{-1} q(x_j) \right] / q(x_j)$;

therefore, the iteration in (11) is well defined. At any fixed point of this iteration we have a set of $\tilde{t}_j(x_j)$ terms such that $\mathrm{Collapse}[\, t_j(x_j) \tilde{t}_j(x_j)^{-1} q(x) \,] = q(x)$ for any $j \in \{1, \ldots, n\}$. By defining the cavity distribution $q^{\setminus j}(x) \propto \tilde{t}_j(x_j)^{-1} q(x)$ and scaling the terms $\tilde{t}_j$, the above stationarity condition can be rewritten as

    $\int dx_j \, \{1, x_j, x_j^2\} \, t_j(x_j) \, q^{\setminus j}(x_j) = \int dx_j \, \{1, x_j, x_j^2\} \, \tilde{t}_j(x_j) \, q^{\setminus j}(x_j)$,  for all $j \in \{1, \ldots, n\}$,

and so the approximation for $Z_p$ has the form $Z_p \approx \int dx \, p_0(x) \prod_j \tilde{t}_j(x_j)$.

The updates (11) can be viewed as an iterative application of assumed density filtering (ADF). It turns out that the Gaussian terms $\tilde{t}_i$ depend on the same subset (or linear transformations) of the parameters as $t_i$, and the projection step in Equation (11) boils down to computing low dimensional integrals (see Section 3). In practice these integrals are typically one or two dimensional and are tractable or can be accurately approximated using numerical quadrature rules. Expectation propagation, as proposed in Minka (2001), can be viewed as a generalization of loopy belief propagation (e.g., Murphy et al., 1999) to probabilistic models with continuous variables, and also as an iterative application of the ADF procedure (e.g., Csató and Opper, 2001). As we can see from Equations (12), (14), and (15) in Section 3, the convexity of $-\log \int t_i(x) \, N(x; m, V) \, dx$ w.r.t. $m$, or the concavity of $\log t_i(x_i)$ (Seeger, 2008), is a sufficient condition for $\tilde{t}_i$ to be normalizable and thus for the existence of $q^{new}$. However, this alone does not guarantee convergence. To our knowledge, the issue of EP's convergence for the models we study in this paper is still an open question.

The iteration in (11) can also be derived by using variational free energies, and it can be relaxed such that the projections are taken on $t_j(x_j)^{\alpha} \, \tilde{t}_j(x_j)^{-\alpha} \, q(x)$, with $\alpha \in (0, 1]$. The limit $\alpha \to 0$ corresponds to the variational approximation of Opper and Archambeau (2009).

3 Details of EP

3.1 Gaussian formulas

The first and second moments of a distribution $p(x) = \frac{1}{Z} f(x) \, q(x)$, where $q(x) = N(x; m, V)$ is a Gaussian and $Z = \int dx \, f(x) \, q(x)$, are given by

    $E_p[x] = m + V \, \nabla_m \log Z$  and  $V_p[x] = V + V \, \nabla^2_{mm} \log Z \, V$.    (12)

Using integration by parts one can show that the moments of $p$ can also be written in the form

    $E_p[x] = m + \frac{1}{Z} V \, E_q[\nabla_x f]$  and  $V_p[x] = V + \frac{1}{Z^2} V \left[ Z \, E_q[\nabla^2_{xx} f] - E_q[\nabla_x f] \, E_q[\nabla_x f]^T \right] V$,    (13)

provided that $f(x) \, e^{-x^T x}$ and $\nabla_x f(x) \, e^{-x^T x}$ vanish at infinity and the required integrals exist. A small numerical check of (12) is sketched below.
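The check below verifies (12) in one dimension, assuming a probit style site $f(x) = \Phi(x)$ so that $Z(m) = \Phi(m / \sqrt{1 + v})$ is available in closed form; this choice of $f$ and the finite difference step are assumptions of the example.

import numpy as np
from scipy.stats import norm

m, v = 0.3, 2.0
logZ = lambda m: norm.logcdf(m / np.sqrt(1.0 + v))   # log Z(m) for f = Phi
eps = 1e-5
d1 = (logZ(m + eps) - logZ(m - eps)) / (2 * eps)             # d/dm log Z
d2 = (logZ(m + eps) - 2 * logZ(m) + logZ(m - eps)) / eps**2  # d^2/dm^2 log Z
mean_p = m + v * d1          # E_p[x] by (12)
var_p = v + v**2 * d2        # V_p[x] by (12)

# Compare against direct numerical integration of p(x) ∝ Phi(x) N(x; m, v):
x = np.linspace(-15.0, 15.0, 200001); dx = x[1] - x[0]
w = norm.cdf(x) * norm.pdf(x, m, np.sqrt(v))
w /= w.sum() * dx
mq = (x * w).sum() * dx
vq = ((x - mq) ** 2 * w).sum() * dx
print(mean_p - mq, var_p - vq)   # both differences are tiny (finite-difference accuracy)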
3.2 Details of the expectation propagation for Gaussian models

Assume the distribution has the form $p(x) \propto p_0(x) \prod_i t_i(U_i x)$, where the $U_i$ are linear transformations. This formulation includes both the representation where the $t_i$ depend only on a subset of parameters, that is, $t_i(x) = t_i(x_{I_i})$ with $U_i$ the corresponding rows of the identity matrix, and the representation used in logistic regression, where $U_i$ is the $i$-th row of the design matrix.

Computing $\tilde{t}_i^{new}$. First we compute the form of the term approximations and show that $\tilde{t}_i$ has a low rank representation. Let $q(x) = N(x; m, V)$ and let $h = V^{-1} m$, $Q = V^{-1}$ be the canonical parameters of $q$. We use $q^{\setminus i}(x) = N(x; m^{\setminus i}, V^{\setminus i})$ to denote the cavity distribution $q^{\setminus i} \propto q / \tilde{t}_i^{\alpha}$. After some calculus one can show that the moment matching Gaussian $q^{new}(x) = N(x; m^{new}, V^{new})$ of $t_i^{\alpha}(U_i x) \, q^{\setminus i}(x)$ is given by

    $m^{new} = m^{\setminus i} + V^{\setminus i} U_i^T \left( U_i V^{\setminus i} U_i^T \right)^{-1} \left( E[z_i] - U_i m^{\setminus i} \right)$

    $V^{new} = V^{\setminus i} + V^{\setminus i} U_i^T \left( U_i V^{\setminus i} U_i^T \right)^{-1} \left( V[z_i] - U_i V^{\setminus i} U_i^T \right) \left( U_i V^{\setminus i} U_i^T \right)^{-1} U_i V^{\setminus i}$,

where $z_i$ is a random variable distributed according to $z_i \sim t_i^{\alpha}(z) \, N(z; U_i m^{\setminus i}, U_i V^{\setminus i} U_i^T)$. The update for the term approximation $\tilde{t}_i$ is given by $\tilde{t}_i^{new} \propto q^{new} / q^{\setminus i}$. The latter division yields

    $\left( V^{new} \right)^{-1} - \left( V^{\setminus i} \right)^{-1} = U_i^T \left( V[z_i]^{-1} - \left( U_i V^{\setminus i} U_i^T \right)^{-1} \right) U_i$    (14)

    $\left( V^{new} \right)^{-1} m^{new} - \left( V^{\setminus i} \right)^{-1} m^{\setminus i} = U_i^T \left( V[z_i]^{-1} E[z_i] - \left( U_i V^{\setminus i} U_i^T \right)^{-1} U_i m^{\setminus i} \right)$,    (15)

leading to

    $\tilde{t}_i^{new}(x) = \exp\!\left( (U_i x)^T \tilde{h}_i - \tfrac{1}{2} (U_i x)^T \tilde{K}_i (U_i x) \right)$,

where $\tilde{h}_i$ and $\tilde{K}_i$ are given by the corresponding quantities in (14) and (15). The approximating distribution $q$ is defined by the canonical parameters

    $Q = Q_0 + \sum_j U_j^T \tilde{K}_j U_j$,    $h = h_0 + \sum_j U_j^T \tilde{h}_j$,

that is, the sum of the parameters $(\tilde{h}_j, \tilde{K}_j)$ of the terms $\tilde{t}_j$ and the parameters of the prior $p_0(x) \propto \exp\left( x^T h_0 - \tfrac{1}{2} x^T Q_0 x \right)$.

Computing $q^{\setminus i}$. Now we turn our attention to the computation of the distribution $q^{\setminus i}$. The quantities we are interested in are $U_i m^{\setminus i}$ and $U_i V^{\setminus i} U_i^T$. After some calculus, one can show that these are given by

    $U_i V^{\setminus i} U_i^T = U_i \left( Q - \alpha U_i^T \tilde{K}_i U_i \right)^{-1} U_i^T = U_i V U_i^T \left( I - \alpha \tilde{K}_i \, U_i V U_i^T \right)^{-1}$

    $U_i m^{\setminus i} = U_i \left( Q - \alpha U_i^T \tilde{K}_i U_i \right)^{-1} \left( h - \alpha U_i^T \tilde{h}_i \right) = \left( I - \alpha \, U_i V U_i^T \tilde{K}_i \right)^{-1} \left( U_i m - \alpha \, U_i V U_i^T \tilde{h}_i \right)$.

Therefore, the computational bottleneck of EP reduces to the computation of the quantities $U_i m$ and $U_i V U_i^T$. These can be computed from the canonical representation of $q$ as $U_i Q^{-1} h$ and $U_i Q^{-1} U_i^T$.
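The following sketch implements one site update for the scalar case $z_i = u_i^T x$, assuming a probit site $t_i(z) = \Phi(z)$ and $\alpha = 1$ so that the cavity and the update simplify; the closed-form tilted moments are the standard probit expressions, and the function name and interface are illustrative.

import numpy as np
from scipy.stats import norm

def ep_site_update(m, V, u, K_site, h_site):
    # Cavity marginal along u (alpha = 1): remove the site from q.
    mu, s = u @ m, u @ V @ u                       # U_i m and U_i V U_i'
    denom = 1.0 - K_site * s
    s_cav = s / denom
    mu_cav = (mu - s * h_site) / denom
    # Moments of the tilted distribution z ~ Phi(z) N(z; mu_cav, s_cav):
    t = mu_cav / np.sqrt(1.0 + s_cav)
    r = norm.pdf(t) / norm.cdf(t)
    Ez = mu_cav + s_cav * r / np.sqrt(1.0 + s_cav)
    Vz = s_cav - s_cav**2 * r * (r + t) / (1.0 + s_cav)
    # New site parameters from (14) and (15):
    K_new = 1.0 / Vz - 1.0 / s_cav
    h_new = Ez / Vz - mu_cav / s_cav
    return K_new, h_new

A full EP pass would loop over the sites, refresh $(m, V)$ from the canonical parameters after each update, and iterate to convergence.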
Computing the marginal likelihood approximation. Let us define

    $\log Z(m, V) \equiv \tfrac{1}{2} m^T V^{-1} m + \tfrac{1}{2} \log \det V + \tfrac{n}{2} \log 2\pi$

and

    $\log \tilde{Z}_i(m, V) \equiv \log \int t_i^{\alpha}(U_i x) \, N(x; m, V) \, dx$.

Expectation propagation approximates the marginal likelihood $p(y \mid \theta)$ by $Z_{ep} = Z^{1 - n/\alpha} \prod_i \left( \tilde{Z}_i Z^{\setminus i} \right)^{1/\alpha}$, where $Z = Z(m, V)$ and $Z^{\setminus i} = Z(m^{\setminus i}, V^{\setminus i})$. Using the above introduced notation this can be written as

    $\log Z_{ep} = \sum_j \frac{1}{\alpha} \left[ \log \tilde{Z}_j(m^{\setminus j}, V^{\setminus j}) + \log Z(m^{\setminus j}, V^{\setminus j}) - \log Z(m, V) \right] + \log Z(m, V)$,    (16)

which, in the case when $t_i$ depends on $U_i x$, boils down to

    $\log Z_{ep} = \sum_j \left\{ \frac{1}{\alpha} \log \tilde{Z}_j(U_j m^{\setminus j}, U_j V^{\setminus j} U_j^T) + \frac{1}{\alpha} \left[ \log Z(U_j m^{\setminus j}, U_j V^{\setminus j} U_j^T) - \log Z(U_j m, U_j V U_j^T) \right] \right\} + \log Z(m, V)$.

One can see that $\log Z_{ep}$ can be written as the sum of the approximate leave-one-out errors $\log \tilde{Z}_j(U_j m^{\setminus j}, U_j V^{\setminus j} U_j^T)$ (note that these are not approximations of the leave-one-out density, since $\tilde{t}_j$ does depend on $t_j$) and a term depending on the approximating density.

3.3 Solving the Takahashi equations

The Takahashi equations (Takahashi et al., 1973) aim to compute certain elements of the inverse of a positive definite matrix from its Cholesky factor. The derivation of the equations and the algorithm can be found in many papers (e.g., Erisman and Tinney, 1975; Rue et al., 2009). In the following we present the line of argument in Rue et al. (2009).

Let $Q = L L^T$, $z \sim N(0, I)$ and $L^T x = z$. Then, using the notation $V = Q^{-1}$, we find that $x \sim N(0, V)$. The equations $L^T x = z$ can be rewritten as

    $x_i = \frac{z_i}{L_{ii}} - \frac{1}{L_{ii}} \sum_{k=i+1}^{n} L_{ki} x_k$.

Multiplying both sides by $x_j$, $j \geq i$, using $z = L^T x$ and taking expectations, we arrive at the Takahashi equations

    $V_{ij} = \frac{\delta_{ij}}{L_{ii}^2} - \frac{1}{L_{ii}} \sum_{k=i+1}^{n} L_{ki} V_{kj}$.

Since we only want to compute the diagonal of $V$ (or the elements $V_{ij}$ for which $L_{ij} \neq 0$), the algorithm can be written in the following Matlab friendly form:

1: function V = SolveTakahashi(L)
2: for i = n : -1 : 1
3:     I = {j : L_ji ≠ 0, j > i}
4:     V(I, i) = -V(I, I) * L(I, i) / L(i, i)
5:     V(i, I) = V(I, i)'
6:     V(i, i) = 1 / L(i, i)^2 - V(i, I) * L(I, i) / L(i, i)
7: end

The complexity of this algorithm scales with $\mathrm{nonzeros}(Q)^2 / n$.
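A direct Python transcription of the listing above, using a dense factor for clarity; a sparse implementation would store only the entries matching the fill-in pattern of L, which is what makes the method attractive for GMRFs.

import numpy as np

def solve_takahashi(L):
    # L is the lower triangular Cholesky factor of Q; for a dense L this
    # returns the full inverse V = Q^{-1}, filled in from the last row up.
    n = L.shape[0]
    V = np.zeros((n, n))
    for i in range(n - 1, -1, -1):
        I = np.nonzero(L[i + 1:, i])[0] + i + 1      # {j : L_ji != 0, j > i}
        V[I, i] = -V[np.ix_(I, I)] @ L[I, i] / L[i, i]
        V[i, I] = V[I, i]
        V[i, i] = 1.0 / L[i, i]**2 - V[i, I] @ L[I, i] / L[i, i]
    return V

# Example check against the direct inverse:
# L = np.linalg.cholesky(Q); np.allclose(solve_takahashi(L), np.linalg.inv(Q))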
References

Csató, L. and Opper, M. (2001). Sparse representation for Gaussian process models. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13. MIT Press, Cambridge, MA, USA.

Erisman, A. M. and Tinney, W. F. (1975). On computing certain elements of the inverse of a sparse matrix. Communications of the ACM, 18(3).

Kuss, M. and Rasmussen, C. E. (2005). Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6.

Minka, T. P. (2001). A family of algorithms for approximate Bayesian inference. Ph.D. thesis, MIT.

Minka, T. P. (2005). Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research Ltd., Cambridge, UK.

Murphy, K., Weiss, Y., and Jordan, M. I. (1999). Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, San Francisco, USA.

Neal, R. and Hinton, G. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models. Kluwer Academic Publishers.

Opper, M. and Archambeau, C. (2009). The variational Gaussian approximation revisited. Neural Computation, 21(3).

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society Series B, 71(2).

Seeger, M. W. (2008). Bayesian inference and optimal design for the sparse linear model. Journal of Machine Learning Research, 9.

Sivia, D. S. (1996). Data Analysis: A Bayesian Tutorial. Clarendon Press / Oxford University Press, Oxford.

Takahashi, K., Fagan, J., and Chin, M.-S. (1973). Formation of a sparse impedance matrix and its application to short circuit study. In Proceedings of the 8th PICA Conference.