Approximating the Sum Operation for Marginal-MAP Inference

Qiang Cheng, Feng Chen, Jianwu Dong, Wenli Xu
Tsinghua National Laboratory for Information Science and Technology
Department of Automation, Tsinghua University, Beijing, China
{cheng-q@mail, chenfeng@mail, djw@mail, xuwl@mail}.tsinghua.edu.cn

Alexander Ihler
Information and Computer Science, University of California, Irvine, Irvine, CA
ihler@ics.uci.edu

Abstract

We study the marginal-MAP problem on graphical models, and present a novel approximation method based on direct approximation of the sum operation. A primary difficulty of marginal-MAP problems lies in the non-commutativity of the sum and max operations, so that even in highly structured models, marginalization may produce a densely connected graph over the variables to be maximized, resulting in an intractable potential function with exponential size. We propose a chain decomposition approach for summing over the marginalized variables, in which we produce a structured approximation to the MAP component of the problem consisting of only pairwise potentials. We show that this approach is equivalent to the maximization of a specific variational free energy, and that it provides an upper bound of the optimal probability. Finally, experimental results demonstrate that our method performs favorably compared to previous methods.

Introduction

Graphical models provide an explicit and compact representation for probability distributions that exhibit factorization structure. They are powerful tools for modeling uncertainty in the fields of artificial intelligence, computer vision, bioinformatics, signal processing, and many others. Many such applications can be reduced to basic probabilistic inference tasks; typical tasks include computing marginal probabilities (sum-inference), finding the maximum a posteriori (MAP) estimate (max-inference), and marginal-MAP inference (max-sum-inference). The marginal-MAP problem first marginalizes over a subset of the variables (sum operation), and then seeks the MAP estimate for the rest of the model variables (max operation). Marginal-MAP inference is $NP^{PP}$-complete, and harder than either max-inference or sum-inference (Park and Darwiche).
Part of the difficulty of marginal-MAP inference lies in the non-commutativity of the sum and the max operations, which can prevent efficient elimination orders; even for tree-structured graphical models, it can be computationally intractable (Park and Darwiche; Koller and Friedman).

Copyright ©, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

There has been relatively little work on approximating the marginal-MAP problem until recently. State-of-the-art methods include sampling methods, search methods and message passing methods. Doucet, Godsill, and Robert propose a simple Markov chain Monte Carlo (MCMC) strategy for marginal-MAP estimates. Johansen, Doucet, and Davy sample from a sequence of artificial distributions using a sequential Monte Carlo approach. de Campos, Gámez, and Moral present a genetic algorithm to perform marginal-MAP inference. Park and Darwiche investigate belief propagation for the approximate sum-inference, and use local search for the approximate max-inference. Huang, Chavira, and Darwiche (2006) propose a branch-and-bound search method for exact marginal-MAP inference by computing the bounds on a compiled arithmetic circuit representation. Dechter and Rish propose a mini-bucket scheme for the marginal-MAP problem by partitioning the potentials into groups during elimination, and Meek and Wexler propose a related approximate variable elimination scheme that directly approximates the result of each elimination with a product of functions, bounding the error between the correct and approximate potentials.

Recently, researchers have also studied marginal-MAP inference from the perspective of free energy maximization, and proposed message passing approximation algorithms. For example, Jiang, Rai, and Daumé III propose a hybrid message passing algorithm motivated by a Bethe-like free energy. Liu and Ihler (b) provide a general variational framework for marginal MAP, and derive several approximate inference algorithms based on the Bethe and tree-reweighted approximations; the tree-reweighted approximation provides an upper bound of the optimal energy.
In this paper, we explore a two-step approximation method for marginal-MAP inference, in which we construct an explicit factorized approximation of the marginalized distribution using a form of approximate variable elimination, producing a structured MAP problem that can be solved using a variety of existing methods, such as dual decomposition (Sontag, Globerson, and Jaakkola). We use a novel chain decomposition approach to construct the approximate marginalization, and apply a Hölder inequality (Liu and Ihler a) to obtain bounds on the exact marginalization. This also allows us to interpret our method in terms of an upper bounding variational free energy (Liu and Ihler b). We show in experiments that our approach provides better bounds than recently proposed message passing approximations, with estimated solutions of similar quality.

Figure 1: Illustration of the marginal-MAP problem. (a) The original graph, with a sum node (shaded) and the max nodes (unshaded). (b) The complete graph after summing over the sum node.

Overview of Marginal-MAP Inference

In this section, we briefly review the marginal-MAP problem on graphical models. We consider only pairwise Markov random fields (MRFs) in this paper, so that a probability distribution $p$ defined on a graph $G$ can be written as

$$p(x) = \frac{1}{Z(\psi)} \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j),$$

where $E \subseteq V \times V$ is the set of edges, and $V = \{1, 2, \ldots, N\}$ is the set of nodes. It is often useful to express $p(x)$ in the overcomplete exponential family form, by defining $\psi_{ij}(x_i, x_j) = \exp[\theta_{ij}(x_i, x_j)]$, so that

$$p(x) = \exp\Big[\sum_{(i,j) \in E} \theta_{ij}(x_i, x_j) - A(\theta)\Big],$$

where $\theta = \{\theta_{ij} : (i,j) \in E\}$, and $A(\theta) = \log Z(\psi) = \log \sum_x \exp[\theta(x)]$. As is common, we abuse notation slightly to refer to $\theta$ and $\theta_{ij}$ both as functions of $x$ and as vectors defined by the values of those functions.

Marginal-MAP inference seeks the MAP estimate for a subset of the variables (max variables) by marginalizing over the rest of the model variables (sum variables). The nodes $V$ of the graphical model are thus partitioned into two sets: the sum nodes $V_s$ and the max nodes $V_m$, with $V = \{V_s, V_m\}$. The edges can be divided into three types: sum-sum (denoted $E_{ss}$), max-sum ($E_{ms}$) and max-max ($E_{mm}$). The marginal-MAP problem is represented as

$$p^* = \max_{x_m} \sum_{x_s} p(x_s, x_m),$$

where $x_s$, $x_m$ are the variables corresponding to $V_s$, $V_m$.

Much of the difficulty of marginal-MAP inference lies in the non-commutativity of the sum and the max operations. That is to say, we must first sum over the variables $x_s$, and then seek the MAP estimate for the variables $x_m$. For many models, such as the simple tree in Figure 1, the summation operator induces a dense, perhaps even complete graph over the max nodes, which requires exponential complexity in the number of max nodes to express.

Figure 2: Illustration of the bigraph view of marginal-MAP inference. (a) The original graph, with the sum nodes shaded and the max nodes unshaded.
(b) The bipartite graph of the sum nodes (right) and the max nodes (left). (c) The graph after summing over the sum nodes.

Figure 3: An example of the subgraph $G^s$ (left).

Bigraph View of Marginal-MAP

In this section, we represent the graphical model for marginal-MAP inference by a bipartite graph, which will be helpful during the subsequent exposition. A bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint sets $U$ and $V$ such that every edge connects a vertex in $U$ to one in $V$ (Bondy and Murty). Let the max-sum edges $E_{ms}$ be the edges of the bigraph, and let the node set $U = V_m$. We then construct factors corresponding to the sum nodes, with each factor representing a set of connected nodes in $V_s$; if there is an edge between $i \in V_m$ and $j \in V_s$ in graph $G$, then in the bigraph there is an edge between $i \in U$ and the factor $s$ with $j \in s$. In essence, this structure represents the factor graph that would be induced by the elimination of the sum nodes in $G$, and we refer to these factors as sum factors. Figure 2 gives an illustration of the bipartite factor graph for a model with three disconnected subgraphs of sum nodes, along with the Markov random field induced by eliminating the sum nodes.

Our approximation algorithm operates on each of the sum factors independently; thus, without loss of generality, in the following we consider only a subgraph $G^s$ consisting of the sum nodes $V^s_s$ in a single factor $s$, the max nodes $V^s_m$ connected to $s$ in the bigraph, and the edges in $E^s_{ss}$ and $E^s_{ms}$. This means that, when the nodes $V^s_s$ are eliminated, we will induce a fully connected graph over the remaining max nodes $V^s_m$. Figure 3 gives an example of a graph $G^s$ corresponding to a subgraph of Figure 2(a). The potential on graph $G^s$ is

$$\psi(x^s) = \psi(x_s, x_m) = \prod_{(i,j) \in E^s_{ss} \cup E^s_{ms}} \psi_{ij}(x_i, x_j).$$

Chain Decomposition of Sum Factors

In this section, we introduce a transformation of the original model $G^s$ that will be used to construct our approximate
marginalization, and to control its computational complexity. To define our transformation, we use the common semantics of variable splitting (introducing copies of variables that are constrained to take on equal values) and re-parameterization (allocating the functions defined on those copies such that the overall distribution remains invariant).

Figure 4: The chain decomposition of graph $G^s$, panels (a) to (c).

Consider $\psi(x^s)$, which is a product of the pairwise potentials on the edges of $G^s$. We represent $\psi(x^s)$ as a product of chain potentials, each of which is defined on a chain between two nodes in $V^s_m$. This re-representation is

$$\psi(x_s, x_m) = \prod_{(i,j) \in E^s_{mm}} \psi_{ij}(x_i, x_s, x_j),$$

where $E^s_{mm}$ denotes the set of edges between pairs of nodes $\{i, j\}$ in $V^s_m$, and $\psi_{ij}(x_i, x_s, x_j)$ denotes the potential defined on the chain between nodes $i$ and $j$. The edge-wise product and the chain-wise product represent the same potential using different factorization forms. To achieve this transformation, we first decompose graph $G^s$ into a set of chains whose two ends are max nodes. These chains should be a covering (Bondy and Murty) of the graph $G^s$. Then, we distribute the original potentials of $G^s$ to the potentials on the chains. Finally, we combine all the max nodes with the same label into one node. Thus, we re-represent $\psi(x^s)$ as the product of chain potentials. The following example illustrates our representation.

Example: Consider the graph shown in Figure 4(a), where the shaded and unshaded nodes denote the sum and max nodes respectively, so that there is a single sum node and three max nodes. Then $\psi(x_s, x_m)$ is the product of the three pairwise potentials that join the sum node to each max node. We define three chain potentials, one per pair of max nodes, each built from the two pairwise potentials lying on the chain that joins that pair through a copy of the sum node, and allocated so that the product of the three chain potentials again equals $\psi(x_s, x_m)$. The graph representation of this product of chain potentials is shown in Figure 4(c).

An immediate question for this representation is how many chains are needed to cover graph $G^s$. Since the chain-wise product involves one term per pair of nodes in $V^s_m$, it is reasonable to expect this many chains. However, this is not always the case; for some graphs fewer chains are sufficient, while for others more are required. Figure 5(b) shows an example in which a single pair of max nodes requires more than one chain to cover the graph.

Figure 5: An example in which two chains are needed to cover the original graph.

However, without loss of generality, we will assume one chain per pair of max nodes $(i, j)$,
and associate the chain with edge $(i, j)$ in the marginalized model. The advantage of the chain-based representation is that each sum node copy now has at most two neighbors. By relaxing the constraint on equality among copies, we can obtain an upper bound, while each sum node copy can be eliminated efficiently.

An Upper Bound of Complete Potential

Within the subgraph $G^s$, the marginalization operator will produce a fully connected (complete) graph $K_m$ over the max nodes $V^s_m$, resulting in a computationally intractable function referred to as the complete potential. In this section, we approximate the complete potential with a product of pairwise potentials that are defined on the edges of the complete graph. Furthermore, we design this approximation so that it provides an upper bound of the complete potential.

Using the chain-structured covering designed in the previous section, each sum node copy is associated with some chain with two max node endpoints, say $i$ and $j$. We assign a weight $\omega_{ij}$ to this copy, with $\omega_{ij} > 0$ and $\sum_{(i,j) \in E^s_{mm}} \omega_{ij} = 1$. We can then approximate the complete potential using an approximate elimination based on Hölder's inequality (see Liu and Ihler a), so that

$$\psi(x_m) = \sum_{x_s} \psi(x_s, x_m) \approx \prod_{(i,j) \in E^s_{mm}} \tilde{\psi}_{ij}(x_i, x_j),$$

where $\tilde{\psi}_{ij}(x_i, x_j)$ is defined as

$$\tilde{\psi}_{ij}(x_i, x_j) = \Big[\sum_{x_s} \psi_{ij}(x_i, x_s, x_j)^{1/\omega_{ij}}\Big]^{\omega_{ij}}.$$

Because $\psi_{ij}(x_i, x_s, x_j)$ is chain-structured, $\tilde{\psi}_{ij}$ can be computed efficiently, in time linear in the number $N$ of variables on the chain and polynomial in the number $d$ of states of each variable. The overall cost of computing the pairwise potentials of the complete graph $K_m$ is no more than that of $|E^s_{mm}|$ such chain eliminations, each over at most $|V^s_s|$ sum variables. This approximation also provides an upper bound on the true complete potential:

Theorem 1. The product of the pairwise potentials yields an upper bound of the true complete potential, that is,

$$\psi(x_m) \le \prod_{(i,j) \in E^s_{mm}} \tilde{\psi}_{ij}(x_i, x_j), \qquad (6)$$

where $\tilde{\psi}_{ij}(x_i, x_j)$ is defined as above. Equality holds if, for all $x_s \in X_s$ and $x_m \in X_m$,
$$\frac{\psi_{ij}(x_i, x_s, x_j)^{1/\omega_{ij}}}{\sum_{x_s} \psi_{ij}(x_i, x_s, x_j)^{1/\omega_{ij}}} = \text{const.}, \qquad (7)$$

where $(i, j) \in E^s_{mm}$, and const. denotes the same constant for every chain.

Proof. The result follows directly from Hölder's inequality. Given nonnegative functions $f_i(x)$ and weights $\omega_i > 0$, $i \in \{1, 2, \ldots, n\}$, with $\sum_{i=1}^{n} \omega_i = 1$, Hölder's inequality (Hardy, Littlewood, and Pólya) states that

$$\sum_{x} \prod_{i=1}^{n} f_i(x)^{\omega_i} \le \prod_{i=1}^{n} \Big[\sum_{x} f_i(x)\Big]^{\omega_i}.$$

Taking $f_i(x) = \psi_{ij}(x_i, x_s, x_j)^{1/\omega_{ij}}$ and using the definition of the complete potential, the left-hand side becomes $\psi(x_m)$ and the right-hand side becomes the product of the $\tilde{\psi}_{ij}$, which is the r.h.s. of Eq. (6). The condition in Eq. (7) can be derived from the equality condition of Hölder's inequality.

In effect, this replaces the complete graph induced by eliminating the connected component $G^s$ with a pairwise graphical model that upper bounds the original.

Dual Decomposition for MAP

By summing over all the sum variables using the approximate elimination above, we obtain a graph with only the max variables. The next step is to estimate the maximum a posteriori (MAP) configuration of these variables. The MAP problem can be solved efficiently using the technique of dual decomposition (Sontag, Globerson, and Jaakkola). For easy implementation, we can use tree-decomposed block coordinate descent algorithms, such as the max-sum diffusion (MSD) algorithm (Werner 2007), the max-product linear programming (MPLP) algorithm (Globerson and Jaakkola) or the sequential tree-reweighted message passing (TRW-S) algorithm (Kolmogorov 2006). To obtain tighter bounds, we can use algorithms with high-order constraints, such as the generalized MPLP (GMPLP) algorithm (Sontag et al.) or outer-planar decomposition (Batra et al.).

Under the framework of dual decomposition, the above algorithms yield an upper bound on the value of the MAP assignment. Recall that the chain decomposition approach returns an upper bound for the complete potential; thus we conclude that the approximation approach based on chain decomposition and dual decomposition yields an upper bound of the marginal-MAP value $p^*$. We give a sketch of our algorithm for solving the marginal-MAP problem in Algorithm 1.

Variational Representation

Our algorithm can also be interpreted in a variational framework, using the connection between Hölder's inequality and weighted entropy decomposition (Liu and Ihler a). Liu and Ihler (b) provide a variational framework for addressing the marginal-MAP problem.
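
The upper bound of Theorem 1 can be checked numerically on a small model. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes a hypothetical star model with one sum node and three max nodes, three states per variable, randomly drawn potentials, uniform weights, and a square-root allocation of each edge potential to the two chains that contain it. All names (`edge`, `chain`, `omega`, `holder_pairwise`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3  # number of states per variable (hypothetical choice)

# Hypothetical star model: one sum node s linked to max nodes 1, 2, 3.
# edge[i][x_i, x_s] is the pairwise potential between max node i and s.
edge = {i: np.exp(rng.normal(size=(d, d))) for i in (1, 2, 3)}

# One chain per pair of max nodes. Each edge potential lies on two chains,
# so each chain receives its square root (one valid re-parameterization).
pairs = [(1, 2), (1, 3), (2, 3)]
chain = {
    (i, j): np.sqrt(edge[i])[:, :, None] * np.sqrt(edge[j]).T[None, :, :]
    for (i, j) in pairs
}  # chain[(i, j)] is indexed as [x_i, x_s, x_j]

# Positive weights that sum to one (uniform here, an assumption).
omega = {e: 1.0 / len(pairs) for e in pairs}


def holder_pairwise(chain_pot, w):
    """Eliminate the sum-node copy: (sum_{x_s} psi^(1/w))^w, indexed [x_i, x_j]."""
    return (chain_pot ** (1.0 / w)).sum(axis=1) ** w


approx = {e: holder_pairwise(chain[e], omega[e]) for e in pairs}

# Exact complete potential: psi(x_1, x_2, x_3) = sum_{x_s} prod_i edge_i(x_i, x_s).
exact = np.einsum("as,bs,cs->abc", edge[1], edge[2], edge[3])
# Product of the pairwise approximations on the complete graph.
bound = np.einsum("ab,ac,bc->abc", approx[(1, 2)], approx[(1, 3)], approx[(2, 3)])

print("bound holds everywhere:", bool(np.all(exact <= bound * (1 + 1e-12))))
print("exact max:", exact.max(), " upper bound on max:", bound.max())
```

Because the bound holds entry-wise, maximizing the pairwise product over the max variables also upper bounds the maximum of the exact complete potential; other weights or allocations would give different, possibly tighter, bounds.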
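Considering only the graph $G^s$, the variational representation $\Phi(\theta)$ on $G^s$ is

$$\Phi(\theta) = \max_{\mu \in M} \big\{ \langle \theta, \mu \rangle + H(x_s \mid x_m; \mu) \big\},$$

where $M$ is the marginal polytope, and $H(x_s \mid x_m; \mu) = -\sum_{x} q_\mu(x) \log q_\mu(x_s \mid x_m)$ is the conditional entropy, with $q_\mu(x)$ being the maximum entropy distribution corresponding to $\mu$.

Algorithm 1 The Chain Decomposition Algorithm
Input: A graphical model $G$ for marginal-MAP inference.
Output: An upper bound of the marginal-MAP value $p^*$.
1: Represent the potentials of the sum nodes with the potentials on a set of chains, using the chain decomposition.
2: Sum over the sum variables, using the Hölder-based approximate elimination.
3: Use a dual decomposition technique for MAP inference.
4: Return the upper bound and the MAP estimate.

The variational representation is an equivalent transformation of the original marginal-MAP problem, with $\Phi(\theta) = \log p^*$, where $p^*$ is the marginal-MAP value defined on graph $G^s$. However, this dual representation does not reduce the computational cost. For our purposes, it is more convenient to express the variational form on the sum nodes alone, keeping the optimization over $x_m$ in its combinatorial form:

$$\Phi(\theta) = \max_{x_m \in X_m} \big\{ \theta(x_m, x_m) + \Phi(\theta_{s \mid x_m}) \big\},$$

where $\theta(x_m, x_m)$ collects the terms of $\theta$ that involve only max variables and, for $x_m \in X_m$, $\Phi(\theta_{s \mid x_m})$ is defined as

$$\Phi(\theta_{s \mid x_m}) = \max_{\mu \in M_{x_s}} \big\{ \langle \theta_{s \mid x_m}, \mu \rangle + H(x_s; \mu) \big\}.$$

The two representations of $\Phi(\theta)$ are equivalent at their optimal values.

Our approximation decomposes $G^s$ into a set of chains, with the two ends of each chain being max nodes. Let $C_G$ be the set of chains, and $C$ be a chain in $C_G$. First, we decompose the parameters $\theta_{s \mid x_m}$ on $G^s$ into a combination of the parameters on a set of chains,

$$\theta_{s \mid x_m} = \sum_{C \in C_G} w_C \, \theta^C_{s \mid x_m},$$

where the weights $w_C$ are nonnegative with $\sum_{C \in C_G} w_C = 1$, and $\theta^C_{s \mid x_m}$ is the parameter on chain $C$. Since $\Phi(\theta_{s \mid x_m})$ is a convex function of the parameters $\theta_{s \mid x_m}$ (Wainwright, Jaakkola, and Willsky), we can apply Jensen's inequality to a convex combination of the parameters and obtain an upper bound:

$$\Phi(\theta_{s \mid x_m}) = \Phi\Big(\sum_{C \in C_G} w_C \, \theta^C_{s \mid x_m}\Big) \le \sum_{C \in C_G} w_C \, \Phi(\theta^C_{s \mid x_m}).$$

Then, $\Phi(\theta)$ can be approximated as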
$$\Phi(\theta) \le \max_{x_m \in X_m} \Big\{ \theta(x_m, x_m) + \sum_{C \in C_G} w_C \, \Phi(\theta^C_{s \mid x_m}) \Big\}.$$

The max operation over $x_m$ in this expression requires solving an integer programming problem. This problem can be further approximated using the techniques of linear programming relaxation or dual decomposition. Algorithm 1 provides a direct implementation of this expression, then applies the dual decomposition technique for the MAP estimate component.

Relation with A-B Tree Decomposition

Based on the variational framework, Liu and Ihler (b) introduce a tree-reweighted free energy by decomposing the original graph into a combination of A-B trees. An A-B tree is a tree such that no two edges in $E_{ms}$ are connected by nodes in $V_s$. In the following, we analyze the relation between our chain-based decomposition method and the A-B tree-based decomposition method. Both methods use reweighted free energy approximations to provide upper bounds on the optimal marginal-MAP value. However, the primary differences are:

I. If tree-decomposed block coordinate descent algorithms are used for the MAP estimate in Algorithm 1, our chain-based method provides a form of hypertree decomposition on the sum nodes, since the elimination of each sum node is allowed to involve two adjacent nodes (a chain).
II. Our method does not require any particular choice of optimization for the max nodes, since an explicit pairwise model is produced. In practice we use dual decomposition, but other methods are easily applied.
III. Our framework explicitly selects a fixed allocation of the sum node parameters $\psi_{ij}(x_i, x_s, x_j)$ to each chain, whereas the message-passing process in the A-B tree method is able to tighten its bound during the iterative process.

Item I suggests that, if the optimal values of the weights $\omega_{ij}$ and of the re-parameterization into chains $\psi_{ij}(x_i, x_s, x_j)$ are used, the chain decomposition bound will be tighter than that of a tree-reweighted collection of A-B trees.

Experiments

In this section, we conduct experiments to show the effectiveness of the chain decomposition algorithm. We test the chain decomposition algorithm on three types of graphs: star models, chain models, and grid models, as shown in Figure 6. The distributions on these models are defined as

$$p(x) \propto \exp\Big[\sum_{i \in V} \theta_i(x_i) + \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j)\Big].$$

We fix the diagonal entries $\theta_{ij}(k, k)$ and randomly generate $\theta_i(k)$ and the off-diagonal entries $\theta_{ij}(k, l)$, $k \ne l$, from Gaussian distributions, the latter with scale parameter $\sigma$, where $\sigma$ is the coupling strength and ranges over a grid of values. For the star model and the chain model, each variable has three states, and for the grid model, each variable has two states. The results are averaged over repeated trials.

Figure 6: The three models used in the experiments, with shaded sum nodes and unshaded max nodes. (a) Star model, (b) chain model, (c) grid model.

Figure 7: Results on the star model of Figure 6(a). (a) The upper bounds obtained by the tree-reweighted mixed message passing and the chain decomposition algorithms; (b) the relative energy errors of different algorithms.

We test the Bethe mixed message passing (Mix-Bethe) (Liu and Ihler b), the tree-reweighted mixed message passing (Mix-TRW) (Liu and Ihler b), the hybrid message passing (HMP) (Jiang, Rai, and Daumé III), the Bethe sum-product (SP-Bethe), the Bethe max-product (MP-Bethe), the tree-reweighted sum-product (SP-TRW), the tree-reweighted max-product (MP-TRW), and the chain decomposition (Chain-Dec) algorithms on these graphical models. To implement the tree-reweighted algorithms on the grid model, we decompose it into a combination of four spanning A-B trees. We compute the relative energy errors of the different algorithms. The relative energy error is defined as $|\log \hat{p} - \log p^*| / |\log p^*|$, where $\log p^*$ is the maximal energy and $\log \hat{p}$ is the approximate energy obtained by that algorithm. Here, $\hat{p} = \sum_{x_s} p(x_s, \hat{x}_m)$, where $\hat{x}_m$ is the solution estimated by the algorithm. For the Mix-TRW and Chain-Dec algorithms, we also compute the upper bound of the maximal energy.

The results are shown in Figures 7, 8, and 9. Figures 7(a), 8(a), and 9(a) show that the upper bounds obtained by the chain decomposition algorithm are tighter than the upper bounds obtained by the tree-reweighted mixed message passing algorithm. Figures 7(b), 8(b), and 9(b) show that the Bethe mixed message passing algorithm and the hybrid message passing algorithm perform much better than the other algorithms.
Although the chain decomposition algorithm does not give the best solutions, its performance is comparable to that of the Bethe mixed message passing algorithm and the hybrid message passing algorithm. Moreover, the chain decomposition algorithm always gives smaller relative errors than the tree-reweighted mixed message passing algorithm.
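
The relative energy error used in these comparisons can be evaluated by brute force on models small enough to enumerate. The following is a minimal sketch under assumptions that are not taken from the paper: it works with unnormalized log-scores (normalizing by the log partition function would change the numerical values), and the toy model, its random parameters, and the helper names `log_marginal` and `relative_energy_error` are hypothetical.

```python
import itertools
import numpy as np


def log_marginal(log_joint, x_m, n_sum, d):
    """log sum_{x_s} exp(log_joint(x_s, x_m)), by enumerating all x_s."""
    vals = [log_joint(x_s, x_m)
            for x_s in itertools.product(range(d), repeat=n_sum)]
    return float(np.logaddexp.reduce(vals))


def relative_energy_error(log_joint, x_m_hat, n_sum, n_max, d):
    """|log p_hat - log p_star| / |log p_star| for a candidate x_m_hat."""
    log_p_star = max(log_marginal(log_joint, x_m, n_sum, d)
                     for x_m in itertools.product(range(d), repeat=n_max))
    log_p_hat = log_marginal(log_joint, x_m_hat, n_sum, d)
    return abs(log_p_hat - log_p_star) / abs(log_p_star)


# Hypothetical toy model: one sum node linked to two max nodes.
rng = np.random.default_rng(1)
d = 3
theta = rng.normal(size=(2, d, d))  # theta[k][x_mk, x_s], unnormalized


def log_joint(x_s, x_m):
    """Unnormalized log-score of a joint configuration."""
    return theta[0][x_m[0], x_s[0]] + theta[1][x_m[1], x_s[0]]


# Error of an arbitrary candidate assignment of the max variables.
print(relative_energy_error(log_joint, (0, 0), n_sum=1, n_max=2, d=d))
```

Enumeration costs on the order of d^(n_sum + n_max) evaluations, so this only serves as a reference on very small models.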

Figure 8: Results on the chain model of Figure 6(b). (a) The upper bounds obtained by Mix-TRW and Chain-Dec; (b) the relative energy errors of different algorithms.

Figure 9: Results on the grid model of Figure 6(c). (a) The upper bounds obtained by Mix-TRW and Chain-Dec; (b) the relative energy errors of different algorithms.

Conclusion

This paper presents a novel method to efficiently approximate the marginalization step in marginal-MAP problems on graphical models. The sum operation results in a complete potential over the connected neighborhood, with exponential size. We propose a chain decomposition approach to approximate this complete potential with a product of pairwise potentials. This technique can be interpreted as a reweighted variational approach, with a corresponding free energy approximation, and returns an upper bound of the maximal energy. Experimental results show that our method gives good upper bounds when compared to existing techniques, and performs comparably to state-of-the-art methods on solution quality.

Acknowledgements

This work was supported by the National Natural Science Foundation of China, the Beijing Natural Science Foundation, the National Key Basic Research and Development Program of China, and United Technologies Research Center (UTRC).

References

Batra, D.; Gallagher, A.; Parikh, D.; and Chen, T. Beyond trees: MRF inference via outer-planar decomposition. In CVPR.
Bondy, J., and Murty, U. Graph Theory. Springer Berlin.
de Campos, L.; Gámez, J.; and Moral, S. Partial abductive inference in Bayesian belief networks using a genetic algorithm. Pattern Recogn. Lett.
Dechter, R., and Rish, I. Mini-buckets: A general scheme for bounded inference. J. ACM.
Doucet, A.; Godsill, S.; and Robert, C. Marginal maximum a posteriori estimation using Markov chain Monte Carlo. Stat. Comput.
Globerson, A., and Jaakkola, T. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIPS.
Hardy, G.; Littlewood, J.; and Pólya, G. Inequalities. Cambridge University Press.
Huang, J.; Chavira, M.; and Darwiche, A. 2006. Solving MAP exactly by searching on compiled arithmetic circuits. In AAAI.
Jiang, J.; Rai, P.; and Daumé III, H. Message-passing for approximate MAP inference with latent variables. In NIPS.
Johansen, A.; Doucet, A.; and Davy, M. Particle methods for maximum likelihood estimation in latent variable models. Stat. Comput.
Koller, D., and Friedman, N. Probabilistic Graphical Models. MIT Press.
Kolmogorov, V. 2006. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. PAMI.
Liu, Q., and Ihler, A. a. Bounding the partition function using Hölder's inequality. In ICML.
Liu, Q., and Ihler, A. b. Variational algorithms for marginal MAP. In UAI.
Meek, C., and Wexler, Y. Improved approximate sum-product inference using multiplicative error bounds. In Bayesian Statistics. Oxford University Press.
Park, J., and Darwiche, A. Complexity results and approximation strategies for MAP explanations. J. Artif. Intell. Res.
Sontag, D.; Meltzer, T.; Globerson, A.; Jaakkola, T.; and Weiss, Y. Tightening LP relaxations for MAP using message passing. In UAI.
Sontag, D.; Globerson, A.; and Jaakkola, T. Introduction to dual decomposition for inference. In Optimization for Machine Learning. MIT Press.
Wainwright, M.; Jaakkola, T.; and Willsky, A. A new class of upper bounds on the log partition function. IEEE Trans. Inf. Theory.
Werner, T. 2007. A linear programming approach to max-sum problem: A review. IEEE Trans. PAMI.