An Alternating Direction Method for Dual MAP LP Relaxation

Ofer Meshi and Amir Globerson

The School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel

Abstract. Maximum a-posteriori (MAP) estimation is an important task in many applications of probabilistic graphical models. Although finding an exact solution is generally intractable, approximations based on linear programming (LP) relaxation often provide good approximate solutions. In this paper we present an algorithm for solving the LP relaxation optimization problem. In order to overcome the lack of strict convexity, we apply an augmented Lagrangian method to the dual LP. The algorithm, based on the alternating direction method of multipliers (ADMM), is guaranteed to converge to the global optimum of the LP relaxation objective. Our experimental results show that this algorithm is competitive with other state-of-the-art algorithms for approximate MAP estimation.

Keywords: Graphical Models, Maximum a-posteriori, Approximate Inference, LP Relaxation, Augmented Lagrangian Methods

1 Introduction

Graphical models are widely used to describe multivariate statistics for discrete variables, and have found widespread applications in numerous domains. One of the basic inference tasks in such models is to find the maximum a-posteriori (MAP) assignment. Unfortunately, this is typically a hard computational problem which cannot be solved exactly for many problems of interest. It has turned out that linear programming (LP) relaxations provide effective approximations to the MAP problem in many cases (e.g., see [15, 21, 24]).

Despite the theoretical computational tractability of MAP-LP relaxations, solving them in practice is a challenge for real world problems. Using off-the-shelf LP solvers is typically inadequate for large models since the resulting LPs have too many constraints and variables [29]. This has led researchers to seek optimization algorithms that are tailored to the specific structure of the MAP-LP [7, 13, 14, 16, 20, 28]. The advantage of such methods is that they work with very simple local updates and are therefore easy to implement in the large scale setting.

The suggested algorithms fall into several classes, depending on their approach to the problem. The TRW-S [14], MSD [28] and MPLP [7] algorithms

employ coordinate descent in the dual of the LP. While these methods typically show good empirical behavior, they are not guaranteed to reach the global optimum of the LP relaxation. This is a result of the non-strict convexity of the dual LP and the fact that block coordinate descent might get stuck in suboptimal points under these conditions. One way to avoid this problem is to use a soft-max function, which is smooth and strictly convex; this results in globally convergent algorithms [6, 10, 12]. Another class of algorithms [13, 16] uses the same dual objective, but employs variants of subgradient descent on it. While these methods are guaranteed to converge globally, they are typically slower in practice than the coordinate descent ones (e.g., see [13] for a comparison). Finally, there are also algorithms that optimize the primal LP directly. One example is the proximal point method of Ravikumar et al. [20]. While also globally convergent, it has the disadvantage of using a double loop scheme where every update involves an iterative algorithm for projecting onto the local polytope.

More recently, Martins et al. [17] proposed a globally convergent algorithm for MAP-LP based on the alternating direction method of multipliers (ADMM) [8, 5, 4, 2]. This method proceeds by iteratively updating primal and dual variables in order to find a saddle point of an augmented Lagrangian for the problem. They suggest to use an augmented Lagrangian of the primal MAP-LP problem. However, their formulation is restricted to binary pairwise factors and several specific global factors.

In this work, we propose an algorithm that is based on the same key idea of ADMM; however, it stems from augmenting the Lagrangian of the dual MAP-LP problem instead. An important advantage of our approach is that the resulting algorithm can be applied to models with general local factors (non-pairwise, non-binary). We also show that in practice our algorithm converges much faster than the primal ADMM algorithm and that it compares favorably with other state-of-the-art methods for MAP-LP optimization.

2 MAP and LP Relaxation

Markov Random Fields (MRFs) are probabilistic graphical models that encode the joint distribution of a set of discrete random variables $X = \{X_1, \ldots, X_n\}$. The joint probability is defined by combining a set $C$ of local functions $\theta_c(x_c)$, termed factors. The factors depend only on (small) subsets of the variables ($X_c \subseteq X$) and model the direct interactions between them (to simplify notation we drop the variable name in $X_c = x_c$; see [27]). The joint distribution is then given by: $P(x) \propto \exp\left(\sum_i \theta_i(x_i) + \sum_{c \in C} \theta_c(x_c)\right)$, where we have also included singleton factors over individual variables [27].

In many applications of MRFs we are interested in finding the maximum probability assignment (MAP assignment). This yields the optimization problem:

$$\arg\max_x \; \sum_i \theta_i(x_i) + \sum_{c \in C} \theta_c(x_c)$$

Due to its combinatorial nature, this problem is NP-hard for general graphical models, and tractable only in isolated cases such as tree structured graphs. This has motivated research on approximation algorithms.
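To make the combinatorial objective concrete, here is a minimal brute-force sketch (an illustration of ours, not part of the paper's implementation) that enumerates all assignments of a toy 3-variable model and scores each by $\sum_i \theta_i(x_i) + \sum_c \theta_c(x_c)$. Exhaustive enumeration is exponential in the number of variables, which is exactly why the relaxations discussed next are needed.

```python
import itertools
import numpy as np

# Toy model: 3 binary variables with singleton scores theta_i and two
# pairwise factors theta_c over (X0, X1) and (X1, X2).
theta_i = [np.array([0.1, 0.5]), np.array([0.0, 0.2]), np.array([0.3, 0.0])]
theta_c = {(0, 1): np.random.randn(2, 2), (1, 2): np.random.randn(2, 2)}

def map_score(x):
    """Evaluate sum_i theta_i(x_i) + sum_c theta_c(x_c) for assignment x."""
    s = sum(th[x[i]] for i, th in enumerate(theta_i))
    s += sum(th[tuple(x[i] for i in c)] for c, th in theta_c.items())
    return s

# Exhaustive search over all 2^3 assignments; exponential in general.
best = max(itertools.product(range(2), repeat=3), key=map_score)
print(best, map_score(best))
```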

One of the most successful approximation schemes has been to use LP relaxations of the MAP problem. In this approach the original combinatorial problem is posed as an LP and then some of the constraints are relaxed to obtain a tractable LP problem that approximates the original one. In our case, the resulting MAP-LP relaxation problem is:

$$\max_{\mu \in L(G)} \; \sum_i \sum_{x_i} \mu_i(x_i)\theta_i(x_i) + \sum_c \sum_{x_c} \mu_c(x_c)\theta_c(x_c) \qquad (1)$$

where $\mu$ are auxiliary variables that correspond to (pseudo) marginal distributions, and $L(G)$ is the reduced set of constraints called the local polytope [27], defined by:

$$L(G) = \left\{ \mu \ge 0 \;\middle|\; \begin{array}{ll} \sum_{x_{c \setminus i}} \mu_c(x_{c \setminus i}, x_i) = \mu_i(x_i) & \forall c,\; i : i \in c,\; x_i \\ \sum_{x_i} \mu_i(x_i) = 1 & \forall i \end{array} \right\}$$

In this paper we use the dual problem of Eq. (1), which takes the form:

$$\min_\delta \; \sum_i \max_{x_i} \left( \theta_i(x_i) + \sum_{c : i \in c} \delta_{ci}(x_i) \right) + \sum_c \max_{x_c} \left( \theta_c(x_c) - \sum_{i : i \in c} \delta_{ci}(x_i) \right) \qquad (2)$$

where $\delta$ are dual variables corresponding to the marginalization constraints in $L(G)$ (see [22, 28, 23]).¹ This formulation offers several advantages. First, it minimizes an upper bound on the true MAP value. Second, it provides an optimality certificate through the duality gap w.r.t. a decoded primal solution [23]. Third, the resulting problem is unconstrained, which facilitates its optimization.

Indeed, several algorithms have been proposed for optimizing this dual problem. The two main approaches are block coordinate descent [14, 28, 7] and subgradient descent [16], each with its advantages and disadvantages. In particular, coordinate descent algorithms are typically much faster at minimizing the dual, while the subgradient method is guaranteed to converge to the global optimum (see [23] for an in-depth discussion). Recently, Jojic et al. [13] presented an accelerated dual decomposition algorithm which stems from adding strongly convex smoothing terms to the subproblems in the dual function of Eq. (2). Their method achieves a better convergence rate than the standard subgradient method ($O(1/\epsilon)$ vs. $O(1/\epsilon^2)$). An alternative approach, which is also globally convergent, has been recently suggested by Martins et al. [17]. Their approach is based on an augmented Lagrangian method, which we discuss next.

¹ An equivalent optimization problem can be derived via a dual decomposition approach [23].
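Before moving on, here is a minimal sketch (ours, under the same toy representation as above: singleton score vectors and factor score arrays) of evaluating the upper bound of Eq. (2) for a given set of messages $\delta$. With $\delta = 0$ it reduces to the trivial bound $\sum_i \max_{x_i}\theta_i(x_i) + \sum_c \max_{x_c}\theta_c(x_c)$.

```python
import numpy as np

def dual_bound(theta_i, theta_c, delta):
    """Evaluate the objective of Eq. (2), an upper bound on the MAP value.

    theta_i: list of singleton score vectors.
    theta_c: dict mapping a factor's variable tuple c to its score array.
    delta:   dict mapping (c, i) to the message vector delta_ci over x_i.
    """
    bound = 0.0
    # Singleton terms: max_{x_i} ( theta_i(x_i) + sum_{c: i in c} delta_ci(x_i) )
    for i, th in enumerate(theta_i):
        reparam = th + sum(delta[(c, i)] for c in theta_c if i in c)
        bound += reparam.max()
    # Factor terms: max_{x_c} ( theta_c(x_c) - sum_{i in c} delta_ci(x_i) )
    for c, th in theta_c.items():
        rep = th.copy()
        for axis, i in enumerate(c):
            shape = [1] * th.ndim
            shape[axis] = -1
            rep -= delta[(c, i)].reshape(shape)  # broadcast over x_{c \ i}
        bound += rep.max()
    return bound

# Zero messages give the trivial bound:
# delta = {(c, i): np.zeros(len(theta_i[i])) for c in theta_c for i in c}
```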

3 The Alternating Direction Method of Multipliers

We now briefly review ADMM for convex optimization [8, 5, 4, 2]. Consider the following optimization problem:

$$\text{minimize } f(x) + g(z) \quad \text{s.t. } Ax = z \qquad (3)$$

where $f$ and $g$ are convex functions. The ADMM approach begins by adding the function $\frac{\rho}{2}\|Ax - z\|^2$ to the above objective, where $\rho > 0$ is a penalty parameter. This results in the optimization problem:

$$\text{minimize } f(x) + g(z) + \frac{\rho}{2}\|Ax - z\|^2 \quad \text{s.t. } Ax = z \qquad (4)$$

Clearly the above has the same optimum as Eq. (3), since when the constraints $Ax = z$ are satisfied, the added quadratic term equals zero. The Lagrangian of the augmented problem of Eq. (4) is given by:

$$L_\rho(x, z, \nu) = f(x) + g(z) + \nu^\top(Ax - z) + \frac{\rho}{2}\|Ax - z\|^2 \qquad (5)$$

where $\nu$ is a vector of Lagrange multipliers. The solution to the problem of Eq. (4) is given by $\max_\nu \min_{x,z} L_\rho(x, z, \nu)$. The ADMM method provides an elegant algorithm for finding this saddle point. The idea is to combine gradient ascent over $\nu$ with coordinate descent over the $x$ and $z$ variables. The method applies the following iterations:

$$\begin{aligned} x^{t+1} &= \arg\min_x \; L_\rho(x, z^t, \nu^t) \\ z^{t+1} &= \arg\min_z \; L_\rho(x^{t+1}, z, \nu^t) \\ \nu^{t+1} &= \nu^t + \rho\left(Ax^{t+1} - z^{t+1}\right) \end{aligned} \qquad (6)$$

The algorithm consists of primal and dual updates, where the primal update is executed sequentially, minimizing first over $x$ and then over $z$. This split retains the decomposability of the objective, which would otherwise be lost due to the addition of the quadratic term.

The algorithm is run either until the number of iterations exceeds a predefined limit, or until some termination criterion is met. A commonly used stopping criterion is: $\|Ax - z\| \le \epsilon$ and $\|z^{t+1} - z^t\| \le \epsilon$. These two conditions can serve to bound the suboptimality of the solution.

The ADMM algorithm is guaranteed to converge to the global optimum of Eq. (3) under rather mild conditions [2]. However, in terms of convergence rate, the worst case complexity of ADMM is $O(1/\epsilon^2)$. Despite this potential caveat, ADMM has been shown to work well in practice (e.g., [1, 26]). Recently, accelerated variants of the basic alternating direction method have been proposed [9]. These faster algorithms are based on linearization and come with an improved convergence rate of $O(1/\epsilon)$, achieving the theoretical lower bound for first-order methods [19]. In this paper we focus on the basic ADMM formulation and leave derivation of accelerated variants to future work.
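For concreteness, here is a minimal sketch of the iterations in Eq. (6) on a standard toy instance, the lasso, with $f(x) = \frac{1}{2}\|Cx - d\|^2$, $g(z) = \tau\|z\|_1$ and the constraint $x = z$ (so $A = I$). This is a textbook example, not the algorithm of this paper, and all names in it are ours.

```python
import numpy as np

def admm_lasso(C, d, tau, rho=1.0, iters=200):
    """ADMM iterations of Eq. (6) for min 0.5*||Cx - d||^2 + tau*||z||_1
    subject to x = z (i.e., A = I)."""
    n = C.shape[1]
    z = np.zeros(n)
    nu = np.zeros(n)                       # Lagrange multipliers
    H = C.T @ C + rho * np.eye(n)          # x-update system matrix (fixed)
    for _ in range(iters):
        # x-update: minimize f(x) + nu^T (x - z) + (rho/2)||x - z||^2
        x = np.linalg.solve(H, C.T @ d - nu + rho * z)
        # z-update: prox of tau*||.||_1 at (x + nu/rho), i.e. soft-thresholding
        w = x + nu / rho
        z = np.sign(w) * np.maximum(np.abs(w) - tau / rho, 0.0)
        # multiplier update: nu <- nu + rho*(x - z)
        nu += rho * (x - z)
    return x, z

# Example: sparse recovery from a random design.
rng = np.random.default_rng(0)
C = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 0.5]
x, z = admm_lasso(C, C @ x_true, tau=0.1)
```

Note how each step is cheap because the quadratic penalty couples $x$ and $z$ only through their difference; the same principle drives the algorithm of the next section.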

4 The Augmented Dual LP Algorithm

In this section we derive our algorithm by applying ADMM to the dual MAP-LP problem of Eq. (2). The challenge is to design the constraints in a way that facilitates efficient closed-form solutions for all updates. To this end, we duplicate the dual variables $\delta$ and denote the second copy by $\bar{\delta}$. We then introduce additional variables $\lambda_c$ corresponding to the summation of the $\bar{\delta}$'s pertaining to factor $c$. These agreement constraints are enforced through $\bar{\delta}$, and thus we have a constraint $\delta_{ci}(x_i) = \bar{\delta}_{ci}(x_i)$ for all $c$, $i : i \in c$, $x_i$, and $\lambda_c(x_c) = \sum_{i : i \in c} \bar{\delta}_{ci}(x_i)$ for all $c$, $x_c$. Following the ADMM framework, we add quadratic terms and obtain the augmented Lagrangian for the dual MAP-LP problem of Eq. (2):

$$\begin{aligned} L_\rho(\delta, \lambda, \bar{\delta}, \gamma, \mu) ={}& \sum_i \max_{x_i} \left( \theta_i(x_i) + \sum_{c : i \in c} \delta_{ci}(x_i) \right) + \sum_c \max_{x_c} \left( \theta_c(x_c) - \lambda_c(x_c) \right) \\ &+ \sum_c \sum_{i : i \in c} \sum_{x_i} \gamma_{ci}(x_i) \left( \delta_{ci}(x_i) - \bar{\delta}_{ci}(x_i) \right) + \frac{\rho}{2} \sum_c \sum_{i : i \in c} \sum_{x_i} \left( \delta_{ci}(x_i) - \bar{\delta}_{ci}(x_i) \right)^2 \\ &+ \sum_c \sum_{x_c} \mu_c(x_c) \left( \lambda_c(x_c) - \sum_{i : i \in c} \bar{\delta}_{ci}(x_i) \right) + \frac{\rho}{2} \sum_c \sum_{x_c} \left( \lambda_c(x_c) - \sum_{i : i \in c} \bar{\delta}_{ci}(x_i) \right)^2 \end{aligned}$$

To see the relation of this formulation to Eq. (5), notice that $(\delta, \lambda)$ subsume the role of $x$, $\bar{\delta}$ subsumes the role of $z$ (with $g(z) = 0$), and the multipliers $(\gamma, \mu)$ correspond to $\nu$.

The updates of our algorithm, which stem from Eq. (6), are summarized in Alg. 1 (a detailed derivation appears in Appendix A). In Alg. 1 we define $N(i) = \{c : i \in c\}$, and the subroutine $w = \mathrm{TRIM}(v, d)$ that serves to clip the values in the vector $v$ at some threshold $t$ (i.e., $w = \min\{v, t\}$) such that the sum of the removed parts equals $d > 0$ (i.e., $\sum(v - w) = d$). This can be carried out efficiently in linear time (in expectation) by partitioning [3].

Notice that all updates can be computed efficiently, so the cost of each iteration is similar to that of message passing algorithms like MPLP [7] or MSD [28], and to that of dual decomposition [13, 16]. Furthermore, a significant speedup is attained by caching some results for future iterations. In particular, the threshold in the TRIM subroutine (the new maximum) can serve as a good initial guess at the next iteration, especially at later iterations where the change in variable values is quite small. Finally, many of the updates can be executed in parallel. In particular, the $\delta$ update can be carried out simultaneously for all variables $i$, and likewise all factors $c$ can be updated simultaneously in the $\lambda$ and $\bar{\delta}$ updates. In addition, $\delta$ and $\lambda$ can be optimized independently, since they appear in different parts of the objective. This may result in a considerable reduction in runtime when executed on a parallel architecture. In our experiments we used sequential updates.
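The TRIM subroutine admits a simple sort-based implementation. The sketch below is our own illustration (the expected-linear-time partitioning variant of [3] is omitted for clarity): it finds the threshold $t$ such that clipping $v$ at $t$ removes exactly $d$ units of mass.

```python
import numpy as np

def trim(v, d):
    """w = TRIM(v, d): clip v at a threshold t (w = min(v, t)) such that the
    removed mass sums to d, i.e. (v - w).sum() == d. Assumes d > 0.
    Sort-based O(n log n); [3] describes an expected-linear-time variant."""
    u = np.sort(v)[::-1]                   # values in descending order
    csum = np.cumsum(u)
    for k in range(1, len(u) + 1):
        t = (csum[k - 1] - d) / k          # threshold if exactly k entries are clipped
        next_val = u[k] if k < len(u) else -np.inf
        if next_val <= t <= u[k - 1]:
            return np.minimum(v, t)
    raise ValueError("no valid threshold found (need d > 0)")

# Example: remove total mass 1.5 from the largest coordinates of v.
v = np.array([3.0, 1.0, 2.0])
w = trim(v, 1.5)                           # -> [1.75, 1.0, 1.75]
assert np.isclose((v - w).sum(), 1.5)
```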

Algorithm 1 The Augmented Dual LP Algorithm (ADLP)

for t = 1 to T do
  Update δ: for all i = 1, ..., n
    Set $\bar{\theta}_i = \theta_i + \sum_{c : i \in c} \left( \bar{\delta}_{ci} - \frac{1}{\rho}\gamma_{ci} \right)$
    $\hat{\theta}_i = \mathrm{TRIM}\left(\bar{\theta}_i, |N(i)|/\rho\right)$
    $q_i = \left(\bar{\theta}_i - \hat{\theta}_i\right)/|N(i)|$
    Update $\delta_{ci} = \bar{\delta}_{ci} - \frac{1}{\rho}\gamma_{ci} - q_i$ for all $c : i \in c$
  Update λ: for all c ∈ C
    Set $\bar{\theta}_c = \theta_c - \sum_{i : i \in c} \bar{\delta}_{ci} + \frac{1}{\rho}\mu_c$
    $\hat{\theta}_c = \mathrm{TRIM}\left(\bar{\theta}_c, 1/\rho\right)$
    Update $\lambda_c = \theta_c - \hat{\theta}_c$
  Update δ̄: for all c ∈ C, i : i ∈ c, $x_i$
    Set $v_{ci}(x_i) = \delta_{ci}(x_i) + \frac{1}{\rho}\gamma_{ci}(x_i) + \sum_{x_{c\setminus i}} \lambda_c(x_{c\setminus i}, x_i) + \frac{1}{\rho}\sum_{x_{c\setminus i}} \mu_c(x_{c\setminus i}, x_i)$
    $\bar{v}_c = \frac{1}{1 + \sum_{k : k \in c} |X_{c\setminus k}|} \sum_{k : k \in c} |X_{c\setminus k}| \sum_{x_k} v_{ck}(x_k)$
    Update $\bar{\delta}_{ci}(x_i) = \frac{1}{1 + |X_{c\setminus i}|} \left[ v_{ci}(x_i) - \sum_{j : j \in c, j \neq i} |X_{c\setminus\{i,j\}}| \left( \sum_{x_j} v_{cj}(x_j) - \bar{v}_c \right) \right]$
  Update the multipliers:
    $\gamma_{ci}(x_i) \leftarrow \gamma_{ci}(x_i) + \rho\left(\delta_{ci}(x_i) - \bar{\delta}_{ci}(x_i)\right)$ for all $c \in C$, $i : i \in c$, $x_i$
    $\mu_c(x_c) \leftarrow \mu_c(x_c) + \rho\left(\lambda_c(x_c) - \sum_{i : i \in c} \bar{\delta}_{ci}(x_i)\right)$ for all $c \in C$, $x_c$
end for

5 Experimental Results

To evaluate our augmented dual LP (ADLP) algorithm (Alg. 1) we compare it to two other algorithms for finding an approximate MAP solution. The first is MPLP of Globerson and Jaakkola [7], which minimizes the dual LP of Eq. (2) via block coordinate descent steps (cast as message passing). The second is the accelerated dual decomposition (ADD) algorithm of Jojic et al. [13].³

We conduct experiments on protein design problems from the dataset of Yanover et al. [29]. In these problems we are given a 3D structure and the goal is to find a sequence of amino-acids that is the most stable for that structure. The problems are modeled by singleton and pairwise factors and can be posed as finding a MAP assignment for the given model. This is a demanding setting in which each problem may have hundreds of variables with 100 possible states on average [29, 24].

Figure 1 shows two typical examples of protein design problems. It plots the objective of Eq. (2) (computed using the δ variables only) as a function of the execution time for all algorithms. First, in Figure 1 (left) we observe that the coordinate descent algorithm (MPLP) converges faster than the other algorithms,

³ For both algorithms we used the same C++ implementation used by Jojic et al. [13]. Our own algorithm was implemented as an extension of their package.

however it tends to stop prematurely and yield suboptimal solutions. In contrast, ADD and ADLP take longer to converge but achieve the globally optimal solution of the approximate objective. Second, it can be seen that the convergence times of ADD and ADLP are very close, with a slight advantage to ADD.

[Figure 1: two panels (problems labeled jo8 and ycc) plotting the dual objective against runtime (secs); curves: MPLP, ADD (ε=1), ADD (ε=10), ADLP (ρ=0.01), ADLP (ρ=0.05).]

Fig. 1. Comparison of three algorithms for approximate MAP estimation: our augmented dual LP algorithm (ADLP), the accelerated dual decomposition (ADD) algorithm of Jojic et al. [13], and the dual coordinate descent MPLP algorithm [7]. The figure shows two examples of protein design problems; for each, the dual objective of Eq. (2) is plotted as a function of execution time. Dashed lines denote the value of the best decoded primal solution.

The dashed lines in Figure 1 show the value of the decoded primal solution (assignment) [23]. We see that there is generally a correlation between the quality of the dual objective and the decoded primal solution, namely the decoded primal solution improves as the dual solution approaches optimality. Nevertheless, we note that there is no dominant algorithm in terms of decoding (here we show examples where our decoding is superior). In many cases MPLP yields better decoded solutions despite being suboptimal in terms of the dual objective (not shown; this is also noted in [13]).

We also conduct experiments to study the effect of the penalty parameter ρ. Our algorithm is guaranteed to converge globally for all ρ > 0, but its choice affects the actual rate of convergence. In Figure 1 (right) we compare two values of the penalty parameter, ρ = 0.01 and ρ = 0.05. It shows that setting ρ = 0.01 results in somewhat slower convergence to the optimum; however, in this case the final primal solution (dashed line) is better than that of the other algorithms. In practice, in order to choose an appropriate ρ, one can run a few iterations of ADLP with several values and see which one achieves the best objective [17].

We mention in passing that ADD employs an accuracy parameter ε which determines the desired suboptimality of the final solution [13]. Setting ε to a large value results in faster convergence to a lower accuracy solution. On the one hand, this trade-off can be viewed as a merit of ADD, which allows one to obtain coarser approximations at reduced cost. On the other hand, an advantage of our method is that the choice of the penalty ρ affects only the rate of convergence and does not impose an additional reduction in solution accuracy over that of the LP relaxation. In Figure 1 (left) we use ε = 1, as in Jojic et al., while in Figure 1 (right) we compare two values, ε = 1 and ε = 10, to demonstrate the effect of this accuracy parameter.

[Figure 2: two panels (labeled a8 and jo8) plotting the dual objective against runtime (secs); left curves: MPLP, ADD (ε=1), ADLP (ρ=0.05); right curves: ADLP, APLP.]

Fig. 2. (Left) Comparison for a side chain prediction problem, similar to Figure 1 (left). (Right) Comparison of our augmented dual LP algorithm (ADLP) and a generalized variant (APLP) of the ADMM algorithm by Martins et al. [17] on a protein design problem. The dual objective of Eq. (2) is plotted as a function of execution time. Dashed lines denote the value of the best decoded primal solution.

We next compare performance of the algorithms on a side chain prediction problem [29]. This problem is the inverse of the protein design problem, and involves finding the 3D configuration of rotamers given the backbone structure of a protein. Figure 2 (left) shows a comparison of MPLP, ADD and ADLP on one of the largest proteins in the dataset (812 variables with 12 states on average). As in the protein design problems, MPLP converges fast to a suboptimal solution. We observe that here ADLP converges somewhat faster than ADD, possibly because the smaller state space results in faster ADLP updates.

As noted earlier, Martins et al. [17] recently presented an approach that applies ADMM to the primal LP (i.e., Eq. (1)). Although their method is limited to binary pairwise factors (and several global factors), it can be modified to handle non-binary higher-order factors, as the derivation in Appendix B shows. We denote this variant by APLP. As in ADLP, in the APLP algorithm all updates are computed analytically and executed efficiently. Figure 2 (right) shows a comparison of ADLP and APLP on a protein design problem. It illustrates that ADLP converges significantly faster than APLP (similar results, not shown here, are obtained for the other proteins).

6 Discussion

Approximate MAP inference methods based on LP relaxation have drawn much attention lately due to their practical success and attractive properties. In this paper we presented a novel globally convergent algorithm for approximate MAP estimation via LP relaxation. Our algorithm is based on the augmented Lagrangian method for convex optimization, which overcomes the lack of strict convexity by adding a quadratic term to smooth the objective. Importantly, our algorithm proceeds by applying simple-to-implement closed-form updates, and

it is highly scalable and parallelizable. We have shown empirically that our algorithm compares favorably with other state-of-the-art algorithms for approximate MAP estimation in terms of accuracy and convergence time.

Several existing globally convergent algorithms for MAP-LP relaxation rely on adding local entropy terms in order to smooth the objective [6, 10, 12, 13]. Those methods must specify a temperature control parameter which affects the quality of the solution. Specifically, solving the optimization subproblems at high temperature reduces solution accuracy, while solving them at low temperature might raise numerical issues. In contrast, our algorithm is quite insensitive to the choice of such control parameters. In fact, the penalty parameter ρ affects the rate of convergence but not the accuracy or numerical stability of the algorithm. Moreover, despite the lack of fast convergence rate guarantees, in practice the algorithm has similar or better convergence times compared to other globally convergent methods in various settings. Note that [17] also show an advantage of their primal-based ADMM method over several baselines.

Several improvements over our basic algorithm can be considered. One such improvement is to use smart initialization of the variables. For example, since MPLP achieves a larger decrease in objective at early iterations, it is possible to run it for a limited number of steps and then take the resulting variables δ for the initialization of ADLP. Notice, however, that for this scheme to work well, the Lagrange multipliers γ and µ should also be initialized accordingly. Another potential improvement is to use an adaptive penalty parameter $\rho_t$ (e.g., [11]). This may improve convergence in practice, as well as reduce sensitivity to the initial choice of ρ. On the downside, the theoretical convergence guarantees of ADMM no longer hold in this case.

Martins et al. [17] show that the ADMM framework is also suitable for handling certain types of global factors, which include a large number of variables in their scope (e.g., an XOR factor). Using an appropriate formulation, it is possible to incorporate such factors in our dual LP framework as well.⁴ Finally, it is likely that our method can be further improved by using recently introduced accelerated variants of ADMM [9]. Since these variants achieve an asymptotically better convergence rate, the application of such methods to MAP-LP, similar to the one we presented here, will likely result in faster algorithms for approximate MAP estimation.

In this paper, we assumed that the model parameters were given. However, in many cases one wishes to learn these from data, for example by minimizing a prediction loss (e.g., the hinge loss [25]). We have recently shown how to incorporate dual relaxation algorithms into such learning problems [18]. It will be interesting to apply our ADMM approach in this setting to yield an efficient learning algorithm for structured prediction problems.

Acknowledgments. We thank Ami Wiesel and Elad Eban for useful discussions and comments on this manuscript. We thank Stephen Gould for his SVL code. Ofer Meshi is a recipient of the Google European Fellowship in Machine Learning, and this research is supported in part by this Google Fellowship.

⁴ The auxiliary variables $\lambda_c$ are not used in this case.

A Derivation of the Augmented Dual LP Algorithm

In this section we derive the ADMM updates for the augmented Lagrangian of the dual MAP-LP, which we restate here for convenience:

$$\begin{aligned} L_\rho(\delta, \lambda, \bar{\delta}, \gamma, \mu) ={}& \sum_i \max_{x_i} \left( \theta_i(x_i) + \sum_{c : i \in c} \delta_{ci}(x_i) \right) + \sum_c \max_{x_c} \left( \theta_c(x_c) - \lambda_c(x_c) \right) \\ &+ \sum_c \sum_{i : i \in c} \sum_{x_i} \gamma_{ci}(x_i) \left( \delta_{ci}(x_i) - \bar{\delta}_{ci}(x_i) \right) + \frac{\rho}{2} \sum_c \sum_{i : i \in c} \sum_{x_i} \left( \delta_{ci}(x_i) - \bar{\delta}_{ci}(x_i) \right)^2 \\ &+ \sum_c \sum_{x_c} \mu_c(x_c) \left( \lambda_c(x_c) - \sum_{i : i \in c} \bar{\delta}_{ci}(x_i) \right) + \frac{\rho}{2} \sum_c \sum_{x_c} \left( \lambda_c(x_c) - \sum_{i : i \in c} \bar{\delta}_{ci}(x_i) \right)^2 \end{aligned}$$

Updates:

The δ update: For each variable i = 1, ..., n consider a block $\delta_i$ which consists of $\delta_{ci}$ for all $c : i \in c$. For this block we need to minimize the following function:

$$\max_{x_i}\left(\theta_i(x_i) + \sum_{c:i\in c}\delta_{ci}(x_i)\right) + \sum_{c:i\in c}\sum_{x_i}\gamma_{ci}(x_i)\delta_{ci}(x_i) + \frac{\rho}{2}\sum_{c:i\in c}\sum_{x_i}\left(\delta_{ci}(x_i) - \bar{\delta}_{ci}(x_i)\right)^2$$

Equivalently, this can be written more compactly in vector notation as:

$$\min_{\delta_i} \; \frac{1}{2}\left\|\delta_i - \left(\bar{\delta}_i - \frac{1}{\rho}\gamma_i\right)\right\|^2 + \frac{1}{\rho}\max_{x_i}\left(\theta_i(x_i) + \sum_{c:i\in c}\delta_{ci}(x_i)\right)$$

where $\bar{\delta}_i$ and $\gamma_i$ are defined analogously to $\delta_i$. The closed-form solution to this QP is given by the update in Alg. 1. It is obtained by inspecting the KKT conditions and exploiting the structure of the summation inside the max (for a similar derivation see [23]).

The λ update: For each factor c ∈ C we seek to minimize the function:

$$\max_{x_c}\left(\theta_c(x_c) - \lambda_c(x_c)\right) + \sum_{x_c}\mu_c(x_c)\lambda_c(x_c) + \frac{\rho}{2}\sum_{x_c}\left(\lambda_c(x_c) - \sum_{i:i\in c}\bar{\delta}_{ci}(x_i)\right)^2$$

In equivalent vector notation we have the problem:

$$\min_{\lambda_c} \; \frac{1}{2}\left\|\lambda_c - \left(\sum_{i:i\in c}\bar{\delta}_{ci} - \frac{1}{\rho}\mu_c\right)\right\|^2 + \frac{1}{\rho}\max_{x_c}\left(\theta_c(x_c) - \lambda_c(x_c)\right)$$

This QP is very similar to that of the δ update and can be solved using the same technique. The resulting closed-form update is given in Alg. 1.

The δ̄ update: For each c ∈ C we consider a block which consists of $\bar{\delta}_{ci}$ for all $i : i \in c$. We seek a minimizer of the function:

$$-\sum_{i:i\in c}\sum_{x_i}\gamma_{ci}(x_i)\bar{\delta}_{ci}(x_i) + \frac{\rho}{2}\sum_{i:i\in c}\sum_{x_i}\left(\delta_{ci}(x_i) - \bar{\delta}_{ci}(x_i)\right)^2 - \sum_{x_c}\mu_c(x_c)\sum_{i:i\in c}\bar{\delta}_{ci}(x_i) + \frac{\rho}{2}\sum_{x_c}\left(\lambda_c(x_c) - \sum_{i:i\in c}\bar{\delta}_{ci}(x_i)\right)^2$$

Taking the partial derivative w.r.t. $\bar{\delta}_{ci}(x_i)$ and setting it to 0 yields:

$$\bar{\delta}_{ci}(x_i) = \frac{1}{1+|X_{c\setminus i}|}\left( v_{ci}(x_i) - \sum_{j:j\in c,\, j\neq i} |X_{c\setminus\{i,j\}}| \sum_{x_j} \bar{\delta}_{cj}(x_j) \right)$$

where:

$$v_{ci}(x_i) = \delta_{ci}(x_i) + \frac{1}{\rho}\gamma_{ci}(x_i) + \sum_{x_{c\setminus i}}\lambda_c(x_{c\setminus i}, x_i) + \frac{1}{\rho}\sum_{x_{c\setminus i}}\mu_c(x_{c\setminus i}, x_i)$$

Summing this over $x_i$ and $i : i \in c$ and plugging back in, we get the update in Alg. 1. Finally, the multipliers update is straightforward.

B Derivation of the Augmented Primal LP Algorithm

We next derive the algorithm for optimizing Eq. (1) with general local factors. Consider the following formulation, which is equivalent to the primal MAP-LP problem of Eq. (1). Define:

$$f_i(\mu_i) = \begin{cases} \sum_{x_i} \mu_i(x_i)\theta_i(x_i) & \mu_i \ge 0 \text{ and } \sum_{x_i}\mu_i(x_i) = 1 \\ -\infty & \text{otherwise} \end{cases}$$

$$f_c(\mu_c) = \begin{cases} \sum_{x_c} \mu_c(x_c)\theta_c(x_c) & \mu_c \ge 0 \text{ and } \sum_{x_c}\mu_c(x_c) = 1 \\ -\infty & \text{otherwise} \end{cases}$$

Here $f$ accounts for the non-negativity and normalization constraints in $L(G)$. We add the marginalization constraints via copies of $\mu_c$, one for each $i \in c$, denoted by $\bar{\mu}_{ci}$. Thus we get the augmented Lagrangian:

$$\begin{aligned} L_\rho(\mu, \bar{\mu}, \delta, \beta) ={}& \sum_i f_i(\mu_i) + \sum_c f_c(\mu_c) \\ &- \sum_c \sum_{i:i\in c}\sum_{x_i} \delta_{ci}(x_i)\left(\bar{\mu}_{ci}(x_i) - \mu_i(x_i)\right) - \frac{\rho}{2}\sum_c\sum_{i:i\in c}\sum_{x_i}\left(\bar{\mu}_{ci}(x_i) - \mu_i(x_i)\right)^2 \\ &- \sum_c \sum_{i:i\in c}\sum_{x_c} \beta_{ci}(x_c)\left(\bar{\mu}_{ci}(x_c) - \mu_c(x_c)\right) - \frac{\rho}{2}\sum_c\sum_{i:i\in c}\sum_{x_c}\left(\bar{\mu}_{ci}(x_c) - \mu_c(x_c)\right)^2 \end{aligned}$$

where $\bar{\mu}_{ci}(x_i) = \sum_{x_{c\setminus i}} \bar{\mu}_{ci}(x_{c\setminus i}, x_i)$. To draw the connection with Eq. (5), in this formulation $\mu$ subsumes the role of $x$, $\bar{\mu}$ subsumes the role of $z$ (with $g(z) = 0$), and the multipliers $(\delta, \beta)$ correspond to $\nu$. We next show the updates which result from applying Eq. (6) to this formulation.

Update $\mu_i$ for all i = 1, ..., n:

$$\mu_i \leftarrow \arg\max_{\mu_i} \; \mu_i^\top\left(\theta_i + \sum_{c:i\in c}\left(\delta_{ci} + \rho M_i\bar{\mu}_{ci}\right)\right) - \frac{1}{2}\mu_i^\top\left(\rho|N(i)| \cdot I\right)\mu_i$$

where $M_i\bar{\mu}_{ci} = \sum_{x_{c\setminus i}}\bar{\mu}_{ci}(x_{c\setminus i}, \cdot)$. We have to maximize this QP under simplex constraints on $\mu_i$. Notice that the objective matrix is diagonal, so this can be solved in closed form by shifting the target vector and then truncating at 0 such that the sum of the positive elements equals 1 (see [3]). The solution can be computed in linear time (in expectation) by partitioning [3].

Update $\mu_c$ for all c ∈ C:

$$\mu_c \leftarrow \arg\max_{\mu_c} \; \mu_c^\top\left(\theta_c + \sum_{i:i\in c}\left(\beta_{ci} + \rho\bar{\mu}_{ci}\right)\right) - \frac{1}{2}\mu_c^\top\left(\rho|N(c)| \cdot I\right)\mu_c$$

where $N(c) = \{i : i \in c\}$. Again we have a projection onto the simplex with a diagonal objective matrix, which can be done efficiently.

Update $\bar{\mu}_{ci}$ for all c ∈ C, i : i ∈ c:

$$\bar{\mu}_{ci} \leftarrow \arg\max_{\bar{\mu}_{ci}} \; \bar{\mu}_{ci}^\top\left(M_i^\top\left(\rho\mu_i - \delta_{ci}\right) - \beta_{ci} + \rho\mu_c\right) - \frac{\rho}{2}\bar{\mu}_{ci}^\top\left(M_i^\top M_i + I\right)\bar{\mu}_{ci}$$

Here we have an unconstrained QP, so the solution is obtained by $H^{-1}v$. Further notice that the inverse $H^{-1}$ can be computed in closed form. To see how, observe that $M_i^\top M_i$ is a block-diagonal matrix with all-ones blocks of size $|X_{c\setminus i}|$. Therefore, $H = \rho\left(M_i^\top M_i + I\right)$ is also block-diagonal. It follows that the inverse $H^{-1}$ is a block-diagonal matrix where each block is the inverse of the corresponding block in $H$. Finally, it is easy to verify that the inverse of a block $\rho\left(\mathbf{1}_{|X_{c\setminus i}|} + I_{|X_{c\setminus i}|}\right)$ is given by $\frac{1}{\rho}\left(I_{|X_{c\setminus i}|} - \frac{1}{|X_{c\setminus i}|+1}\mathbf{1}_{|X_{c\setminus i}|}\right)$, where $\mathbf{1}_n$ denotes the $n \times n$ all-ones matrix.

Update the multipliers:

$$\delta_{ci}(x_i) \leftarrow \delta_{ci}(x_i) + \rho\left(\bar{\mu}_{ci}(x_i) - \mu_i(x_i)\right) \quad \text{for all } c \in C,\; i : i \in c,\; x_i$$

$$\beta_{ci}(x_c) \leftarrow \beta_{ci}(x_c) + \rho\left(\bar{\mu}_{ci}(x_c) - \mu_c(x_c)\right) \quad \text{for all } c \in C,\; i : i \in c,\; x_c$$
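The $\mu_i$ and $\mu_c$ updates above are diagonal QPs over the simplex, i.e., Euclidean projections of a shifted target vector onto the simplex. A sort-based sketch of this projection follows (our own illustration in the style of [3]; the paper's expected-linear-time partitioning variant is omitted).

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto {p : p >= 0, sum p = 1}: shift by a
    scalar t and truncate at zero so the positive part sums to one (cf. [3])."""
    u = np.sort(y)[::-1]
    csum = np.cumsum(u) - 1.0
    ks = np.arange(1, y.size + 1)
    k = ks[u - csum / ks > 0][-1]          # number of positive entries in the result
    t = csum[k - 1] / k
    return np.maximum(y - t, 0.0)

# E.g. the mu_i update reduces to project_simplex(a / (rho * n_i)), where a is
# the shifted target vector theta_i + sum_{c: i in c} (delta_ci + rho*M_i mu_bar_ci)
# and n_i = |N(i)|, since the quadratic term's matrix is diagonal.
```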

Bibliography

[1] M. Afonso, J. Bioucas-Dias, and M. Figueiredo. Fast image recovery using variable splitting and constrained optimization. IEEE Transactions on Image Processing, 19(9):2345-2356, Sept. 2010.
[2] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2003.
[3] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272-279, 2008.
[4] J. Eckstein and D. P. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55:293-318, June 1992.
[5] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Computers and Mathematics with Applications, 2:17-40, 1976.
[6] K. Gimpel and N. A. Smith. Softmax-margin CRFs: training log-linear models with cost functions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 733-736, 2010.
[7] A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Advances in Neural Information Processing Systems, pages 553-560, 2008.
[8] R. Glowinski and A. Marrocco. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Revue Française d'Automatique, Informatique, et Recherche Opérationnelle, 9(R-2):41-76, 1975.
[9] D. Goldfarb, S. Ma, and K. Scheinberg. Fast alternating linearization methods for minimizing the sum of two convex functions. Technical report, UCLA CAM, 2010.
[10] T. Hazan and A. Shashua. Norm-product belief propagation: Primal-dual message-passing for approximate inference. IEEE Transactions on Information Theory, 56(12):6294-6316, Dec. 2010.
[11] B. S. He, H. Yang, and S. L. Wang. Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. Journal of Optimization Theory and Applications, 106:337-356, 2000.
[12] J. Johnson. Convex Relaxation Methods for Graphical Models: Lagrangian and Maximum Entropy Approaches. PhD thesis, EECS, MIT, 2008.
[13] V. Jojic, S. Gould, and D. Koller. Fast and smooth: Accelerated dual decomposition for MAP inference. In Proceedings of the International Conference on Machine Learning, 2010.
[14] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1568-1583, 2006.

[15] N. Komodakis and N. Paragios. Beyond loose LP-relaxations: Optimizing MRFs by repairing cycles. In 10th European Conference on Computer Vision, pages 806-820, 2008.
[16] N. Komodakis, N. Paragios, and G. Tziritas. MRF energy minimization and beyond via dual decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3):531-552, March 2011.
[17] A. F. T. Martins, M. A. T. Figueiredo, P. M. Q. Aguiar, N. A. Smith, and E. P. Xing. An augmented Lagrangian approach to constrained MAP inference. In International Conference on Machine Learning, June 2011.
[18] O. Meshi, D. Sontag, T. Jaakkola, and A. Globerson. Learning efficiently with approximate inference via dual losses. In Proceedings of the 27th International Conference on Machine Learning, pages 783-790, 2010.
[19] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103:127-152, 2005.
[20] P. Ravikumar, A. Agarwal, and M. Wainwright. Message-passing for graph-structured linear programs: proximal projections, convergence and rounding schemes. In Proceedings of the 25th International Conference on Machine Learning, pages 800-807, 2008.
[21] A. M. Rush, D. Sontag, M. Collins, and T. Jaakkola. On dual decomposition and linear programming relaxations for natural language processing. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2010.
[22] M. I. Schlesinger. Syntactic analysis of two-dimensional visual signals in noisy conditions. Kibernetika, 4:113-130, 1976.
[23] D. Sontag, A. Globerson, and T. Jaakkola. Introduction to dual decomposition for inference. In S. Sra, S. Nowozin, and S. J. Wright, editors, Optimization for Machine Learning. MIT Press, 2011.
[24] D. Sontag, T. Meltzer, A. Globerson, T. Jaakkola, and Y. Weiss. Tightening LP relaxations for MAP using message passing. In Proceedings of the 24th Annual Conference on Uncertainty in Artificial Intelligence, pages 503-510, 2008.
[25] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16, pages 25-32. MIT Press, Cambridge, MA, 2004.
[26] S. Tosserams, L. Etman, P. Papalambros, and J. Rooda. An augmented Lagrangian relaxation for analytical target cascading using the alternating direction method of multipliers. Structural and Multidisciplinary Optimization, 31(3):176-189, 2006.
[27] M. J. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
[28] T. Werner. A linear programming approach to max-sum problem: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7):1165-1179, 2007.
[29] C. Yanover, T. Meltzer, and Y. Weiss. Linear programming relaxations and belief propagation - an empirical study. Journal of Machine Learning Research, 7:1887-1907, 2006.


MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.

More information

Lecture 17: Lee-Sidford Barrier

Lecture 17: Lee-Sidford Barrier CSE 599: Interplay between Convex Optmzaton and Geometry Wnter 2018 Lecturer: Yn Tat Lee Lecture 17: Lee-Sdford Barrer Dsclamer: Please tell me any mstake you notced. In ths lecture, we talk about the

More information

CS : Algorithms and Uncertainty Lecture 14 Date: October 17, 2016

CS : Algorithms and Uncertainty Lecture 14 Date: October 17, 2016 CS 294-128: Algorthms and Uncertanty Lecture 14 Date: October 17, 2016 Instructor: Nkhl Bansal Scrbe: Antares Chen 1 Introducton In ths lecture, we revew results regardng follow the regularzed leader (FTRL.

More information

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

Power law and dimension of the maximum value for belief distribution with the max Deng entropy Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Which Separator? Spring 1

Which Separator? Spring 1 Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal

More information

Report on Image warping

Report on Image warping Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms

More information

Appendix B: Resampling Algorithms

Appendix B: Resampling Algorithms 407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles

More information

Convergent Propagation Algorithms via Oriented Trees

Convergent Propagation Algorithms via Oriented Trees Convergent Propagaton Algorthms va Orented Trees Amr Globerson CSAIL Massachusetts Insttute of Technology Cambrdge, MA 02139 Tomm Jaakkola CSAIL Massachusetts Insttute of Technology Cambrdge, MA 02139

More information

Support Vector Machines CS434

Support Vector Machines CS434 Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? Intuton of Margn Consder ponts A, B, and C We

More information

Finite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin

Finite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of

More information

Suppose that there s a measured wndow of data fff k () ; :::; ff k g of a sze w, measured dscretely wth varable dscretzaton step. It s convenent to pl

Suppose that there s a measured wndow of data fff k () ; :::; ff k g of a sze w, measured dscretely wth varable dscretzaton step. It s convenent to pl RECURSIVE SPLINE INTERPOLATION METHOD FOR REAL TIME ENGINE CONTROL APPLICATIONS A. Stotsky Volvo Car Corporaton Engne Desgn and Development Dept. 97542, HA1N, SE- 405 31 Gothenburg Sweden. Emal: astotsky@volvocars.com

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Boostrapaggregating (Bagging)

Boostrapaggregating (Bagging) Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod

More information