Gaussian process classification: a message-passing viewpoint


Filipe Rodrigues
fmpr@dei.uc.pt
November 2014

Abstract

The goal of this short paper is to provide a message-passing viewpoint of the Expectation Propagation (EP) algorithm commonly used for Gaussian process (GP) classification with probit likelihood. Presenting this EP algorithm as message passing in the factor graph gives the reader a different, more unified perspective on what the algorithm is doing, and facilitates the use of GP classification and regression as a building block for larger factor graphs.

1 Problem setup and factor graph

Figure 1 shows the factor graph for Gaussian process classification with probit likelihood.

[Figure 1: Factor graph for GP classification, with prior factor $g(\mathbf{f}) = \mathcal{GP}(0, K)$, latent variables $f_i$, likelihood factors $h_i(f_i, y_i) = \Phi(f_i y_i)$, observations $y_i$, and a plate over the $N$ data points.]

Given a dataset of $N$ observations $\mathcal{D} = \{X, \mathbf{y}\}$, where $X = \{\mathbf{x}_i\}_{i=1}^{N}$ and $\mathbf{y} = \{y_i\}_{i=1}^{N}$, our goal is to estimate the posterior on $\mathbf{f}$:

$$ p(\mathbf{f} \mid X, \mathbf{y}) = \frac{1}{Z}\, p(\mathbf{f} \mid X) \prod_{i=1}^{N} p(y_i \mid f_i). \tag{1} $$
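To make the setup concrete, the following is a minimal sketch of the two factors in the graph and of the unnormalised log posterior in eq. (1). The squared-exponential kernel and the helper names (`rbf_kernel`, `std_normal_cdf`, `log_unnormalised_posterior`) are assumptions introduced for illustration; the paper itself leaves the covariance function $K$ unspecified.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix (an assumed choice for K)."""
    X = np.asarray(X, dtype=float)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def log_unnormalised_posterior(f, K, y):
    """log p(f | X) + sum_i log Phi(y_i f_i): eq. (1) without the -log Z term."""
    f = np.asarray(f, dtype=float)
    K = np.asarray(K, dtype=float)
    N = f.shape[0]
    _, logdet = np.linalg.slogdet(K)
    # Zero-mean Gaussian prior N(f | 0, K), the factor g(f).
    log_prior = -0.5 * (N * math.log(2.0 * math.pi) + logdet
                        + f @ np.linalg.solve(K, f))
    # Probit likelihood factors h_i(f_i, y_i) = Phi(f_i y_i), with y_i in {-1, +1}.
    log_lik = sum(math.log(std_normal_cdf(yi * fi)) for yi, fi in zip(y, f))
    return log_prior + log_lik
```

The normaliser $Z$ is the quantity that has no closed form here, which is why the remainder of the paper replaces each probit factor with a Gaussian.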

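We will approximate the posterior on $\mathbf{f}$ by making use of the Expectation Propagation algorithm [3], which approximates each likelihood term $p(y_i \mid f_i)$ in turn with an unnormalised Gaussian on $f_i$:

$$ p(y_i \mid f_i) \approx \tilde{Z}_i\, \mathcal{N}(f_i \mid \tilde{\mu}_i, \tilde{\sigma}_i^2). \tag{2} $$

The approximate posterior on $\mathbf{f}$ is then given by

$$ q(\mathbf{f} \mid X, \mathbf{y}) = \frac{1}{Z_{EP}}\, p(\mathbf{f} \mid X) \prod_{i=1}^{N} \tilde{Z}_i\, \mathcal{N}(f_i \mid \tilde{\mu}_i, \tilde{\sigma}_i^2) = \mathcal{N}(\mathbf{f} \mid \boldsymbol{\mu}, \Sigma) \tag{3} $$

with

$$ \boldsymbol{\mu} = \Sigma \tilde{\Sigma}^{-1} \tilde{\boldsymbol{\mu}} \tag{4} $$

$$ \Sigma = \left(K^{-1} + \tilde{\Sigma}^{-1}\right)^{-1} \tag{5} $$

where $\tilde{\boldsymbol{\mu}}$ is the vector of the $\tilde{\mu}_i$ and $\tilde{\Sigma}$ is diagonal with $\tilde{\Sigma}_{ii} = \tilde{\sigma}_i^2$.

2 Message-passing for classification

We will now present a message-passing viewpoint of the typical EP algorithm presented, for example, in [1] and [2]. The main advantage of this viewpoint is that it allows us to easily generalize GP classification and regression to be part of larger factor graphs. The EP algorithm, as a message-passing algorithm in the factor graph of Figure 1, comprises the following steps:

Step 1: Compute message from the factor $g(\mathbf{f})$ to the $f_i$ variables

$$ m_{g(\mathbf{f}) \to f_i}(f_i) = \int g(\mathbf{f}) \prod_{j \neq i} m_{f_j \to g(\mathbf{f})}(f_j)\, df_j \tag{6} $$

$$ \propto \int p(\mathbf{f} \mid X) \prod_{j \neq i} \mathcal{N}(f_j \mid \tilde{\mu}_j, \tilde{\sigma}_j^2)\, df_j \tag{7} $$

Conceptually, one can think of the combination of the prior $g(\mathbf{f})$ and the $N - 1$ approximate likelihoods in eq. 7 in two ways: either by explicitly multiplying out the terms or, equivalently, by removing approximate likelihood $i$ from the approximate posterior in eq. 3. Here we will follow the latter approach. The marginal for $f_i$ from $q(\mathbf{f} \mid X, \mathbf{y})$ is given by

$$ q(f_i \mid X, \mathbf{y}) = \mathcal{N}(f_i \mid \mu_i, \sigma_i^2) \tag{8} $$

where $\sigma_i^2 = \Sigma_{ii}$. The message from the factor $g(\mathbf{f})$ to the $f_i$ variables is then given by

$$ m_{g(\mathbf{f}) \to f_i}(f_i) \propto \frac{q(f_i \mid X, \mathbf{y})}{m_{f_i \to g(\mathbf{f})}(f_i)} \propto \frac{\mathcal{N}(f_i \mid \mu_i, \sigma_i^2)}{\mathcal{N}(f_i \mid \tilde{\mu}_i, \tilde{\sigma}_i^2)} \propto \mathcal{N}(f_i \mid \mu_{-i}, \sigma_{-i}^2) \tag{9} $$

$$ \mu_{-i} = \sigma_{-i}^2 \left(\sigma_i^{-2} \mu_i - \tilde{\sigma}_i^{-2} \tilde{\mu}_i\right) \tag{10} $$

$$ \sigma_{-i}^2 = \left(\sigma_i^{-2} - \tilde{\sigma}_i^{-2}\right)^{-1} \tag{11} $$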
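Step 1 can be sketched in a few lines, under the assumption that the site parameters are stored as NumPy vectors. The function names are hypothetical. `posterior_from_sites` evaluates eqs. (4) and (5), using the algebraically equivalent form $\Sigma = K - K (K + \tilde{\Sigma})^{-1} K$ to avoid inverting $K$, and `cavity` evaluates eqs. (10) and (11) for a single index.

```python
import numpy as np

def posterior_from_sites(K, mu_tilde, sigma2_tilde):
    """Approximate posterior N(f | mu, Sigma) from the site parameters, eqs. (4)-(5)."""
    K = np.asarray(K, dtype=float)
    mu_tilde = np.asarray(mu_tilde, dtype=float)
    sigma2_tilde = np.asarray(sigma2_tilde, dtype=float)
    # (K^-1 + Sigma_tilde^-1)^-1 rewritten as K - K (K + Sigma_tilde)^-1 K.
    A = K + np.diag(sigma2_tilde)
    Sigma = K - K @ np.linalg.solve(A, K)
    # mu = Sigma Sigma_tilde^-1 mu_tilde, with Sigma_tilde diagonal.
    mu = Sigma @ (mu_tilde / sigma2_tilde)
    return mu, Sigma

def cavity(mu_i, sigma2_i, mu_tilde_i, sigma2_tilde_i):
    """Cavity parameters (mu_{-i}, sigma_{-i}^2) for one site, eqs. (10)-(11)."""
    sigma2_cav = 1.0 / (1.0 / sigma2_i - 1.0 / sigma2_tilde_i)
    mu_cav = sigma2_cav * (mu_i / sigma2_i - mu_tilde_i / sigma2_tilde_i)
    return mu_cav, sigma2_cav
```

A useful sanity check is the single-point case: with $N = 1$, removing the only site from the posterior must return the prior, so the cavity mean is 0 and the cavity variance is $K_{11}$.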

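Here we made use of the division property in A.1. This step is therefore equivalent to the computation of the cavity distribution as presented in [1] and [2]. However, note that we need to make use of equations 4 and 5 in order to compute $\boldsymbol{\mu}$ and $\Sigma$, from which we get the values of $\mu_i$ and $\sigma_i^2$.

Step 2: Compute approximate posterior on $f_i$

$$ \hat{q}(f_i) \propto m_{g(\mathbf{f}) \to f_i}(f_i)\, m_{h_i \to f_i}(f_i) \tag{12} $$

$$ = \mathcal{N}(f_i \mid \mu_{-i}, \sigma_{-i}^2)\, \Phi(f_i y_i) \tag{13} $$

$$ \approx \hat{Z}_i\, \mathcal{N}(f_i \mid \hat{\mu}_i, \hat{\sigma}_i^2) \tag{14} $$

where the approximation is done by matching the moments of the two distributions [3]. The moments are then given by [1, 4]:

$$ \hat{Z}_i = \Phi(z_i) \tag{15} $$

$$ \hat{\mu}_i = \mu_{-i} + \frac{y_i\, \sigma_{-i}^2\, \mathcal{N}(z_i)}{\Phi(z_i) \sqrt{1 + \sigma_{-i}^2}} \tag{16} $$

$$ \hat{\sigma}_i^2 = \sigma_{-i}^2 - \frac{\sigma_{-i}^4\, \mathcal{N}(z_i)}{(1 + \sigma_{-i}^2)\, \Phi(z_i)} \left(z_i + \frac{\mathcal{N}(z_i)}{\Phi(z_i)}\right) \tag{17} $$

where we defined

$$ z_i = \frac{y_i\, \mu_{-i}}{\sqrt{1 + \sigma_{-i}^2}} \tag{18} $$

See [1] for the derivation of these moments. Notice that we are combining the cavity distribution with the exact likelihood $p(y_i \mid f_i)$ to get the desired (non-Gaussian) marginal, which we then approximate with a Gaussian by moment matching. Hence, this step is equivalent to the computation of the site parameters as presented in [1] and [2].

Step 3: Compute message from $f_i$ to the factor $g(\mathbf{f})$

$$ m_{f_i \to g(\mathbf{f})}(f_i) = \frac{\hat{q}(f_i)}{m_{g(\mathbf{f}) \to f_i}(f_i)} = \frac{\hat{Z}_i\, \mathcal{N}(f_i \mid \hat{\mu}_i, \hat{\sigma}_i^2)}{\mathcal{N}(f_i \mid \mu_{-i}, \sigma_{-i}^2)} = \tilde{Z}_i\, \mathcal{N}(f_i \mid \tilde{\mu}_i, \tilde{\sigma}_i^2) \tag{19} $$

with

$$ \tilde{\mu}_i = \tilde{\sigma}_i^2 \left(\hat{\sigma}_i^{-2} \hat{\mu}_i - \sigma_{-i}^{-2} \mu_{-i}\right) \tag{20} $$

$$ \tilde{\sigma}_i^2 = \left(\hat{\sigma}_i^{-2} - \sigma_{-i}^{-2}\right)^{-1} \tag{21} $$

$$ \tilde{Z}_i = \hat{Z}_i \sqrt{2\pi} \sqrt{\sigma_{-i}^2 + \tilde{\sigma}_i^2}\, \exp\left(\frac{1}{2} \frac{(\mu_{-i} - \tilde{\mu}_i)^2}{\sigma_{-i}^2 + \tilde{\sigma}_i^2}\right) \tag{22} $$

where we made use of the property in A.1.

Notice that this step corresponds to the final step of EP as presented in [1] and [2], in which we compute the parameters of the approximation $\tilde{Z}_i\, \mathcal{N}(f_i \mid \tilde{\mu}_i, \tilde{\sigma}_i^2)$ that achieves a match with the desired moments. Figure 2 provides an overview of the message-passing algorithm on the factor graph.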
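Steps 2 and 3 reduce to scalar arithmetic per site. The sketch below, with hypothetical function names and using only the Python standard library, evaluates the probit moments of eqs. (15) to (18) and then the outgoing message parameters of eqs. (20) to (22). It assumes labels $y_i \in \{-1, +1\}$ and makes no attempt at numerical safeguards for extreme values of $z_i$.

```python
import math

def normal_pdf(z):
    """Standard normal density N(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_moments(y, mu_cav, sigma2_cav):
    """Step 2: moments of N(f | mu_cav, sigma2_cav) * Phi(f y), eqs. (15)-(18)."""
    z = y * mu_cav / math.sqrt(1.0 + sigma2_cav)
    Z_hat = normal_cdf(z)
    ratio = normal_pdf(z) / Z_hat
    mu_hat = mu_cav + y * sigma2_cav * ratio / math.sqrt(1.0 + sigma2_cav)
    sigma2_hat = sigma2_cav - sigma2_cav ** 2 * ratio / (1.0 + sigma2_cav) * (z + ratio)
    return Z_hat, mu_hat, sigma2_hat

def site_update(Z_hat, mu_hat, sigma2_hat, mu_cav, sigma2_cav):
    """Step 3: divide the matched Gaussian by the cavity, eqs. (20)-(22)."""
    sigma2_tilde = 1.0 / (1.0 / sigma2_hat - 1.0 / sigma2_cav)
    mu_tilde = sigma2_tilde * (mu_hat / sigma2_hat - mu_cav / sigma2_cav)
    s = sigma2_cav + sigma2_tilde
    Z_tilde = (Z_hat * math.sqrt(2.0 * math.pi) * math.sqrt(s)
               * math.exp(0.5 * (mu_cav - mu_tilde) ** 2 / s))
    return Z_tilde, mu_tilde, sigma2_tilde
```

For a cavity $\mathcal{N}(f_i \mid 0, 1)$ and $y_i = +1$, the formulas give $\hat{Z}_i = 1/2$, $\hat{\mu}_i = 1/\sqrt{\pi}$ and $\hat{\sigma}_i^2 = 1 - 1/\pi$, which is a convenient hand check for an implementation.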

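[Figure 2: Overview of the message-passing algorithm.]

3 Marginal likelihood

The EP approximation to the marginal likelihood $Z_{EP}$ is given by

$$ Z_{EP} = \int p(\mathbf{f} \mid X) \prod_{i=1}^{N} \tilde{Z}_i\, \mathcal{N}(f_i \mid \tilde{\mu}_i, \tilde{\sigma}_i^2)\, d\mathbf{f} \tag{23} $$

$$ = \int \mathcal{N}(\mathbf{f} \mid \mathbf{0}, K)\, \mathcal{N}(\mathbf{f} \mid \tilde{\boldsymbol{\mu}}, \tilde{\Sigma})\, d\mathbf{f} \prod_{i=1}^{N} \tilde{Z}_i \tag{24} $$

Making use of the results for the product of two Gaussians in A.1, we get

$$ Z_{EP} = (2\pi)^{-D/2}\, |K + \tilde{\Sigma}|^{-1/2} \exp\left(-\frac{1}{2} \tilde{\boldsymbol{\mu}}^T (K + \tilde{\Sigma})^{-1} \tilde{\boldsymbol{\mu}}\right) \prod_{i=1}^{N} \tilde{Z}_i \tag{25} $$

$$ = (2\pi)^{-D/2}\, |K + \tilde{\Sigma}|^{-1/2} \exp\left(-\frac{1}{2} \tilde{\boldsymbol{\mu}}^T (K + \tilde{\Sigma})^{-1} \tilde{\boldsymbol{\mu}}\right) \prod_{i=1}^{N} \Phi\left(\frac{y_i\, \mu_{-i}}{\sqrt{1 + \sigma_{-i}^2}}\right) \sqrt{2\pi} \sqrt{\sigma_{-i}^2 + \tilde{\sigma}_i^2}\, \exp\left(\frac{1}{2} \frac{(\mu_{-i} - \tilde{\mu}_i)^2}{\sigma_{-i}^2 + \tilde{\sigma}_i^2}\right) \tag{26} $$

Taking the logarithm gives

$$
\begin{aligned}
\log Z_{EP} ={} & -\frac{D}{2} \log(2\pi) - \frac{1}{2} \log |K + \tilde{\Sigma}| - \frac{1}{2} \tilde{\boldsymbol{\mu}}^T (K + \tilde{\Sigma})^{-1} \tilde{\boldsymbol{\mu}} + \sum_{i=1}^{N} \log \Phi\left(\frac{y_i\, \mu_{-i}}{\sqrt{1 + \sigma_{-i}^2}}\right) \\
& + \frac{N}{2} \log(2\pi) + \frac{1}{2} \sum_{i=1}^{N} \log(\sigma_{-i}^2 + \tilde{\sigma}_i^2) + \frac{1}{2} \sum_{i=1}^{N} \frac{(\mu_{-i} - \tilde{\mu}_i)^2}{\sigma_{-i}^2 + \tilde{\sigma}_i^2}
\end{aligned}
\tag{27}
$$

$$
\begin{aligned}
={} & -\frac{1}{2} \log |K + \tilde{\Sigma}| - \frac{1}{2} \tilde{\boldsymbol{\mu}}^T (K + \tilde{\Sigma})^{-1} \tilde{\boldsymbol{\mu}} + \frac{1}{2} \sum_{i=1}^{N} \log(\sigma_{-i}^2 + \tilde{\sigma}_i^2) \\
& + \sum_{i=1}^{N} \log \Phi\left(\frac{y_i\, \mu_{-i}}{\sqrt{1 + \sigma_{-i}^2}}\right) + \frac{1}{2} \sum_{i=1}^{N} \frac{(\mu_{-i} - \tilde{\mu}_i)^2}{\sigma_{-i}^2 + \tilde{\sigma}_i^2} + \text{const.}
\end{aligned}
\tag{28}
$$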
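The log marginal likelihood can be evaluated directly from the converged site and cavity parameters. The following is a minimal sketch with a hypothetical function name; it assumes that $D = N$ (the dimension of $\mathbf{f}$), that all inputs are length-$N$ vectors, and that $\Phi(z_i)$ does not underflow to zero.

```python
import math
import numpy as np

def log_normal_cdf(z):
    """log Phi(z); this naive form fails for very negative z."""
    return math.log(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def ep_log_marginal_likelihood(K, y, mu_tilde, sigma2_tilde, mu_cav, sigma2_cav):
    """log Z_EP as the sum of the Gaussian term and the log Z_tilde_i terms, eq. (27)."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_tilde = np.asarray(mu_tilde, dtype=float)
    sigma2_tilde = np.asarray(sigma2_tilde, dtype=float)
    mu_cav = np.asarray(mu_cav, dtype=float)
    sigma2_cav = np.asarray(sigma2_cav, dtype=float)
    N = y.shape[0]

    # Gaussian integral: log N(mu_tilde | 0, K + Sigma_tilde).
    B = K + np.diag(sigma2_tilde)
    _, logdet = np.linalg.slogdet(B)
    quad = mu_tilde @ np.linalg.solve(B, mu_tilde)
    gauss_term = -0.5 * N * math.log(2.0 * math.pi) - 0.5 * logdet - 0.5 * quad

    # Sum of log Z_tilde_i, using eq. (22) with Z_hat_i = Phi(z_i).
    z = y * mu_cav / np.sqrt(1.0 + sigma2_cav)
    s = sigma2_cav + sigma2_tilde
    site_term = (sum(log_normal_cdf(float(zi)) for zi in z)
                 + 0.5 * N * math.log(2.0 * math.pi)
                 + 0.5 * np.sum(np.log(s))
                 + 0.5 * np.sum((mu_cav - mu_tilde) ** 2 / s))
    return float(gauss_term + site_term)
```

With a single data point the EP approximation is exact, so for $K = 1$ and $y_1 = +1$ the function should return $\log \Phi(0) = \log 0.5$ when given the site parameters $\tilde{\mu}_1 = \sqrt{\pi}$, $\tilde{\sigma}_1^2 = \pi - 1$ and the cavity $\mathcal{N}(0, 1)$.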

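4 Message-passing for regression

We will now take a quick look at the simpler case of GP regression with a Gaussian likelihood. This change corresponds to replacing the factor $h_i(f_i, y_i) = \Phi(f_i y_i)$ with $h_i(f_i, y_i) = \mathcal{N}(y_i \mid f_i, \sigma^2)$. Hence, we now have to revise step 2 of the EP algorithm presented in the previous section, which computed the approximate posterior on $f_i$. Now this posterior is exact, and it is given by

$$ \hat{q}(f_i) \propto m_{g(\mathbf{f}) \to f_i}(f_i)\, m_{h_i \to f_i}(f_i) \tag{29} $$

$$ = \mathcal{N}(f_i \mid \mu_{-i}, \sigma_{-i}^2)\, \mathcal{N}(y_i \mid f_i, \sigma^2) \tag{30} $$

$$ \propto \mathcal{N}(f_i \mid \hat{\mu}_i, \hat{\sigma}_i^2) \tag{31} $$

with

$$ \hat{\mu}_i = \hat{\sigma}_i^2 \left(\sigma^{-2} y_i + \sigma_{-i}^{-2} \mu_{-i}\right) \tag{32} $$

$$ \hat{\sigma}_i^2 = \left(\sigma^{-2} + \sigma_{-i}^{-2}\right)^{-1} \tag{33} $$

where we made use of the properties in A.2 to get the posterior on $f_i$. An alternative way to arrive at these equations is by taking derivatives of the log partition function $\log \hat{Z}_i$ in order to compute the moments of the distribution, making use of the ADF updates provided by Minka in [4].

The marginal likelihood for the regression case is simply given by

$$ Z_{EP} = \int p(\mathbf{f} \mid X) \prod_{i=1}^{N} \mathcal{N}(y_i \mid f_i, \sigma^2)\, d\mathbf{f} \tag{34} $$

$$ = \int \mathcal{N}(\mathbf{f} \mid \mathbf{0}, K)\, \mathcal{N}(\mathbf{y} \mid \mathbf{f}, \sigma^2 I)\, d\mathbf{f} \tag{35} $$

Notice that, in this case, the likelihood terms are already Gaussian. Hence, we can proceed by making use of the Gaussian identities in A.1 to evaluate the integral, and get

$$ Z_{EP} = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, K + \sigma^2 I) \tag{36} $$

Taking the logarithm gives

$$ \log Z_{EP} = -\frac{D}{2} \log(2\pi) - \frac{1}{2} \log |K + \sigma^2 I| - \frac{1}{2} \mathbf{y}^T (K + \sigma^2 I)^{-1} \mathbf{y} \tag{39} $$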
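Both regression results are short enough to sketch directly. In the code below the function names are hypothetical, `noise_var` stands for the likelihood variance $\sigma^2$, and $D = N$ is assumed. The log marginal likelihood of eq. (39) is evaluated through a Cholesky factor of $K + \sigma^2 I$, which is a common implementation choice rather than something prescribed by the paper.

```python
import math
import numpy as np

def gaussian_site_posterior(y_i, noise_var, mu_cav, sigma2_cav):
    """Exact local posterior on f_i for a Gaussian likelihood, eqs. (32)-(33)."""
    sigma2_hat = 1.0 / (1.0 / noise_var + 1.0 / sigma2_cav)
    mu_hat = sigma2_hat * (y_i / noise_var + mu_cav / sigma2_cav)
    return mu_hat, sigma2_hat

def gp_regression_log_marginal(K, y, noise_var):
    """log N(y | 0, K + noise_var * I), eq. (39), via a Cholesky factorisation."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    N = y.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(N))
    alpha = np.linalg.solve(L, y)            # L^-1 y, so alpha @ alpha = y^T C^-1 y
    half_logdet = np.sum(np.log(np.diag(L)))  # equals 0.5 * log|C|
    return float(-0.5 * N * math.log(2.0 * math.pi) - half_logdet - 0.5 * alpha @ alpha)
```

For example, a cavity $\mathcal{N}(0, 1)$ combined with an observation $y_i = 1$ under unit noise variance gives $\hat{\mu}_i = 0.5$ and $\hat{\sigma}_i^2 = 0.5$.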

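A Operations with Gaussians

A.1 Product and division

Given two multivariate Gaussian distributions $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_1, \Sigma_1)$ and $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_2, \Sigma_2)$, the product is given by

$$ \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_1, \Sigma_1)\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_2, \Sigma_2) = Z^{-1}\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) \tag{40} $$

$$ \boldsymbol{\mu} = \Sigma \left(\Sigma_1^{-1} \boldsymbol{\mu}_1 + \Sigma_2^{-1} \boldsymbol{\mu}_2\right) \tag{41} $$

$$ \Sigma = \left(\Sigma_1^{-1} + \Sigma_2^{-1}\right)^{-1} \tag{42} $$

The normalization constant is given by

$$ Z^{-1} = (2\pi)^{-D/2}\, |\Sigma_1 + \Sigma_2|^{-1/2} \exp\left(-\frac{1}{2} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T (\Sigma_1 + \Sigma_2)^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\right) \tag{43} $$

$$ = \sqrt{\frac{|\Sigma|}{(2\pi)^D\, |\Sigma_1|\, |\Sigma_2|}}\, \exp\left(-\frac{1}{2} \left(\boldsymbol{\mu}_1^T \Sigma_1^{-1} \boldsymbol{\mu}_1 + \boldsymbol{\mu}_2^T \Sigma_2^{-1} \boldsymbol{\mu}_2 - \boldsymbol{\mu}^T \Sigma^{-1} \boldsymbol{\mu}\right)\right) \tag{44} $$

Similarly, for division we have

$$ \frac{\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_1, \Sigma_1)}{\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_2, \Sigma_2)} = Z^{-1}\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) \tag{45} $$

$$ \boldsymbol{\mu} = \Sigma \left(\Sigma_1^{-1} \boldsymbol{\mu}_1 - \Sigma_2^{-1} \boldsymbol{\mu}_2\right) \tag{46} $$

$$ \Sigma = \left(\Sigma_1^{-1} - \Sigma_2^{-1}\right)^{-1} \tag{47} $$

The normalization constant is given by

$$ Z^{-1} = \sqrt{\frac{|\Sigma|\, |\Sigma_2|\, (2\pi)^D}{|\Sigma_1|}}\, \exp\left(-\frac{1}{2} \left(\boldsymbol{\mu}_1^T \Sigma_1^{-1} \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2^T \Sigma_2^{-1} \boldsymbol{\mu}_2 - \boldsymbol{\mu}^T \Sigma^{-1} \boldsymbol{\mu}\right)\right) \tag{48} $$

A.2 Bayes rule

Given a marginal Gaussian distribution for $\mathbf{x}$ and a conditional Gaussian distribution for $\mathbf{y}$ given $\mathbf{x}$ in the form

$$ p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Lambda^{-1}) \tag{49} $$

$$ p(\mathbf{y} \mid \mathbf{x}) = \mathcal{N}(\mathbf{y} \mid A\mathbf{x} + \mathbf{b}, L^{-1}) \tag{50} $$

the marginal distribution of $\mathbf{y}$ and the conditional distribution of $\mathbf{x}$ given $\mathbf{y}$ are given by

$$ p(\mathbf{y}) = \mathcal{N}(\mathbf{y} \mid A\boldsymbol{\mu} + \mathbf{b},\, A \Lambda^{-1} A^T + L^{-1}) \tag{51} $$

$$ p(\mathbf{x} \mid \mathbf{y}) = \mathcal{N}(\mathbf{x} \mid S\{A^T L (\mathbf{y} - \mathbf{b}) + \Lambda \boldsymbol{\mu}\},\, S) \tag{52} $$

where

$$ S = \left(\Lambda + A^T L A\right)^{-1} $$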
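The product and division identities of A.1 are easy to check numerically in one dimension, where determinants reduce to variances and $D = 1$. The sketch below uses hypothetical function names and only the Python standard library; the division is a proper Gaussian only when the denominator has the larger variance, which is assumed here.

```python
import math

def gauss_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gaussian_product(m1, v1, m2, v2):
    """Return (Z^-1, mu, var) with N(x|m1,v1) N(x|m2,v2) = Z^-1 N(x|mu,var), eqs. (40)-(43)."""
    var = 1.0 / (1.0 / v1 + 1.0 / v2)
    mu = var * (m1 / v1 + m2 / v2)
    z_inv = gauss_pdf(m1, m2, v1 + v2)
    return z_inv, mu, var

def gaussian_division(m1, v1, m2, v2):
    """Return (Z^-1, mu, var) with N(x|m1,v1) / N(x|m2,v2) = Z^-1 N(x|mu,var), eqs. (45)-(48).

    Assumes v2 > v1 so that the resulting variance is positive.
    """
    var = 1.0 / (1.0 / v1 - 1.0 / v2)
    mu = var * (m1 / v1 - m2 / v2)
    z_inv = (math.sqrt(2.0 * math.pi * var * v2 / v1)
             * math.exp(-0.5 * (m1 ** 2 / v1 - m2 ** 2 / v2 - mu ** 2 / var)))
    return z_inv, mu, var
```

Evaluating both sides of eq. (40) or eq. (45) at any test point $x$ should agree to floating-point precision, which makes these helpers a quick regression test for the cavity and site updates in Section 2.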

References

[1] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.

[2] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.

[3] Thomas Minka. A Family of Algorithms for Approximate Bayesian Inference. MIT Press.

[4] Thomas Minka. EP: A quick reference.
