An Alternating Direction Method for Dual MAP LP Relaxation
Ofer Meshi and Amir Globerson

The School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel

Abstract. Maximum a-posteriori (MAP) estimation is an important task in many applications of probabilistic graphical models. Although finding an exact solution is generally intractable, approximations based on linear programming (LP) relaxation often provide good approximate solutions. In this paper we present an algorithm for solving the LP relaxation optimization problem. In order to overcome the lack of strict convexity, we apply an augmented Lagrangian method to the dual LP. The algorithm, based on the alternating direction method of multipliers (ADMM), is guaranteed to converge to the global optimum of the LP relaxation objective. Our experimental results show that this algorithm is competitive with other state-of-the-art algorithms for approximate MAP estimation.

Keywords: Graphical Models, Maximum a-posteriori, Approximate Inference, LP Relaxation, Augmented Lagrangian Methods

1 Introduction

Graphical models are widely used to describe multivariate statistics for discrete variables, and have found widespread applications in numerous domains. One of the basic inference tasks in such models is to find the maximum a-posteriori (MAP) assignment. Unfortunately, this is typically a hard computational problem which cannot be solved exactly for many problems of interest. It has turned out that linear programming (LP) relaxations provide effective approximations to the MAP problem in many cases (e.g., see [15, 21, 24]). Despite the theoretical computational tractability of MAP-LP relaxations, solving them in practice is a challenge for real world problems. Using off-the-shelf LP solvers is typically inadequate for large models since the resulting LPs have too many constraints and variables [29]. This has led researchers to seek optimization algorithms that are tailored to the specific structure of the MAP-LP [7, 13, 14, 16, 20, 28].
The advantage of such methods is that they work with very simple local updates and are therefore easy to implement in the large scale setting. The suggested algorithms fall into several classes, depending on their approach to the problem. The TRW-S [14], MSD [28] and MPLP [7] algorithms
employ coordinate descent in the dual of the LP. While these methods typically show good empirical behavior, they are not guaranteed to reach the global optimum of the LP relaxation. This is a result of the non-strict convexity of the dual LP and the fact that block coordinate descent might get stuck in suboptimal points under these conditions. One way to avoid this problem is to use a soft-max function, which is smooth and strictly convex; this results in globally convergent algorithms [6, 10, 12]. Another class of algorithms [13, 16] uses the same dual objective, but applies variants of subgradient descent to it. While these methods are guaranteed to converge globally, they are typically slower in practice than the coordinate descent ones (e.g., see [13] for a comparison). Finally, there are also algorithms that optimize the primal LP directly. One example is the proximal point method of Ravikumar et al. [20]. While also globally convergent, it has the disadvantage of using a double loop scheme where every update involves an iterative algorithm for projecting onto the local polytope. More recently, Martins et al. [17] proposed a globally convergent algorithm for MAP-LP based on the alternating direction method of multipliers (ADMM) [8, 5, 4, 2]. This method proceeds by iteratively updating primal and dual variables in order to find a saddle point of an augmented Lagrangian for the problem. They suggest using an augmented Lagrangian of the primal MAP-LP problem. However, their formulation is restricted to binary pairwise factors and several specific global factors. In this work, we propose an algorithm that is based on the same key idea of ADMM; however, it stems from augmenting the Lagrangian of the dual MAP-LP problem instead. An important advantage of our approach is that the resulting algorithm can be applied to models with general local factors (non-pairwise, non-binary).
We also show that in practice our algorithm converges much faster than the primal ADMM algorithm and that it compares favorably with other state-of-the-art methods for MAP-LP optimization.

2 MAP and LP relaxation

Markov Random Fields (MRFs) are probabilistic graphical models that encode the joint distribution of a set of discrete random variables X = {X_1, ..., X_n}. The joint probability is defined by combining a set C of local functions θ_c(x_c), termed factors. The factors depend only on (small) subsets of the variables (X_c ⊆ X) and model the direct interactions between them (to simplify notation we drop the variable name in X_c = x_c; see [27]). The joint distribution is then given by: P(x) ∝ exp(Σ_i θ_i(x_i) + Σ_{c∈C} θ_c(x_c)), where we have also included singleton factors over individual variables [27]. In many applications of MRFs we are interested in finding the maximum probability assignment (MAP assignment). This yields the optimization problem:

arg max_x Σ_i θ_i(x_i) + Σ_{c∈C} θ_c(x_c)

Due to its combinatorial nature, this problem is NP-hard for general graphical models, and tractable only in isolated cases such as tree structured graphs. This has motivated research on approximation algorithms.
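To make the combinatorial nature of the problem concrete, the following is a minimal sketch that finds a MAP assignment by exhaustive enumeration for a hypothetical three-variable binary model (the factor tables are illustrative, not from the paper). Enumeration is feasible only for tiny models, which is precisely why relaxations are needed.

```python
import itertools

# Hypothetical toy MRF: three binary variables, singleton factors theta_i and
# pairwise factors theta_c, scoring assignments by
#   sum_i theta_i(x_i) + sum_c theta_c(x_c)
theta_i = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [0.0, 0.2]}
theta_c = {(0, 1): {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0},
           (1, 2): {(0, 0): 0.8, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.8}}

def score(x):
    """Unnormalized log-probability of a joint assignment x."""
    s = sum(th[x[i]] for i, th in theta_i.items())
    s += sum(t[tuple(x[i] for i in c)] for c, t in theta_c.items())
    return s

# Exhaustive search over all 2^3 assignments (exponential in general).
x_map = max(itertools.product([0, 1], repeat=3), key=score)
```

Here the agreeing assignment (1, 1, 1) wins with score 3.0. For models with hundreds of variables and large state spaces, such as the protein design problems of Section 5, this enumeration is hopeless, and the LP relaxation below takes its place.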
One of the most successful approximation schemes has been to use LP relaxations of the MAP problem. In this approach the original combinatorial problem is posed as an LP and then some of the constraints are relaxed to obtain a tractable LP problem that approximates the original one. In our case, the resulting MAP-LP relaxation problem is:

max_{μ ∈ L(G)}  Σ_i Σ_{x_i} μ_i(x_i)θ_i(x_i) + Σ_c Σ_{x_c} μ_c(x_c)θ_c(x_c)    (1)

where μ are auxiliary variables that correspond to (pseudo) marginal distributions, and L(G) is the reduced set of constraints called the local polytope [27], defined by:

L(G) = { μ ≥ 0 :  Σ_{x_{c\i}} μ_c(x_{c\i}, x_i) = μ_i(x_i)  ∀c, i : i ∈ c, x_i ;  Σ_{x_i} μ_i(x_i) = 1  ∀i }

In this paper we use the dual problem of Eq. (1), which takes the form:

min_δ  Σ_i max_{x_i} ( θ_i(x_i) + Σ_{c : i ∈ c} δ_ci(x_i) ) + Σ_c max_{x_c} ( θ_c(x_c) − Σ_{i : i ∈ c} δ_ci(x_i) )    (2)

where δ are dual variables corresponding to the marginalization constraints in L(G) (see [22, 28, 23]).¹ This formulation offers several advantages. First, it minimizes an upper bound on the true MAP value. Second, it provides an optimality certificate through the duality gap w.r.t. a decoded primal solution [23]. Third, the resulting problem is unconstrained, which facilitates its optimization. Indeed, several algorithms have been proposed for optimizing this dual problem. The two main approaches are block coordinate descent [14, 28, 7] and subgradient descent [16], each with its advantages and disadvantages. In particular, coordinate descent algorithms are typically much faster at minimizing the dual, while the subgradient method is guaranteed to converge to the global optimum (see [23] for an in-depth discussion). Recently, Jojic et al. [13] presented an accelerated dual decomposition algorithm which stems from adding strongly convex smoothing terms to the subproblems in the dual function of Eq. (2). Their method achieves a better convergence rate than the standard subgradient method (O(1/ε) vs. O(1/ε²)). An alternative approach, that is also globally convergent, has been recently suggested by Martins et al. [17].
Their approach is based on an augmented Lagrangian method, which we next discuss.

3 The Alternating Direction Method of Multipliers

We now briefly review ADMM for convex optimization [8, 5, 4, 2].

¹ An equivalent optimization problem can be derived via a dual decomposition approach [23].
Consider the following optimization problem:

minimize f(x) + g(z)   s.t. Ax = z    (3)

where f and g are convex functions. The ADMM approach begins by adding the term (ρ/2)‖Ax − z‖² to the above objective, where ρ > 0 is a penalty parameter. This results in the optimization problem:

minimize f(x) + g(z) + (ρ/2)‖Ax − z‖²   s.t. Ax = z    (4)

Clearly the above has the same optimum as Eq. (3), since when the constraints Ax = z are satisfied, the added quadratic term equals zero. The Lagrangian of the augmented problem of Eq. (4) is given by:

L_ρ(x, z, ν) = f(x) + g(z) + ν⊤(Ax − z) + (ρ/2)‖Ax − z‖²    (5)

where ν is a vector of Lagrange multipliers. The solution to the problem of Eq. (4) is given by max_ν min_{x,z} L_ρ(x, z, ν). The ADMM method provides an elegant algorithm for finding this saddle point. The idea is to combine subgradient descent over ν with coordinate descent over the x and z variables. The method applies the following iterations:

x^{t+1} = arg min_x L_ρ(x, z^t, ν^t)
z^{t+1} = arg min_z L_ρ(x^{t+1}, z, ν^t)
ν^{t+1} = ν^t + ρ(Ax^{t+1} − z^{t+1})    (6)

The algorithm consists of primal and dual updates, where the primal update is executed sequentially, minimizing first over x and then over z. This split retains the decomposition of the objective that has been lost due to the addition of the quadratic term. The algorithm is run either until the number of iterations exceeds a predefined limit, or until some termination criterion is met. A commonly used stopping criterion is: ‖Ax − z‖ ≤ ε and ‖z^{t+1} − z^t‖ ≤ ε. These two conditions can serve to bound the suboptimality of the solution. The ADMM algorithm is guaranteed to converge to the global optimum of Eq. (3) under rather mild conditions [2]. However, in terms of convergence rate, the worst case complexity of ADMM is O(1/ε). Despite this potential caveat, ADMM has been shown to work well in practice (e.g., [1, 26]). Recently, accelerated variants of the basic alternating direction method have been proposed [9]. These faster algorithms are based on linearization and come with an improved convergence rate of O(1/√ε), achieving the theoretical lower bound for first-order methods [19].
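As a concrete (if simplified) illustration of the iterations in Eq. (6), the sketch below applies ADMM with A = I to the toy problem f(x) = ½‖x − a‖², g(z) = α‖z‖₁, whose minimizer is the soft-thresholding of a. Both inner minimizations then have closed forms, mirroring the structure exploited later for the dual MAP-LP. All names and constants here are illustrative, not from the paper.

```python
# ADMM (Eq. 6) for: minimize 0.5*||x - a||^2 + alpha*||z||_1  s.t.  x = z.
a, alpha, rho = [3.0, -0.5, 1.0], 1.0, 1.0
n = len(a)
x, z, nu = [0.0] * n, [0.0] * n, [0.0] * n

def soft(u, t):
    """Soft-thresholding: the prox operator of t*|.|."""
    return max(u - t, 0.0) + min(u + t, 0.0)

for _ in range(2000):
    # x-step: argmin_x 0.5||x-a||^2 + nu'x + (rho/2)||x-z||^2  (quadratic, closed form)
    x = [(a[i] - nu[i] + rho * z[i]) / (1.0 + rho) for i in range(n)]
    # z-step: argmin_z alpha||z||_1 - nu'z + (rho/2)||x-z||^2  (soft-thresholding)
    z = [soft(x[i] + nu[i] / rho, alpha / rho) for i in range(n)]
    # dual step: nu <- nu + rho*(Ax - z)
    nu = [nu[i] + rho * (x[i] - z[i]) for i in range(n)]
```

At convergence z ≈ (2, 0, 0), the soft-thresholding of a at α. The split is what keeps each step in closed form even though the sum f + g is not smooth, which is exactly the property the dual MAP-LP algorithm below relies on.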
In this paper we focus on the basic ADMM formulation and leave the derivation of accelerated variants to future work.
4 The Augmented Dual LP Algorithm

In this section we derive our algorithm by applying ADMM to the dual MAP-LP problem of Eq. (2). The challenge is to design the constraints in a way that facilitates efficient closed-form solutions for all updates. To this end, we duplicate the dual variables δ and denote the second copy by δ̄. We then introduce additional variables λ_c corresponding to the summation of δ̄'s pertaining to factor c. These agreement constraints are enforced through δ̄, and thus we have a constraint δ_ci(x_i) = δ̄_ci(x_i) for all c, i : i ∈ c, x_i, and λ_c(x_c) = Σ_{i : i ∈ c} δ̄_ci(x_i) for all c, x_c. Following the ADMM framework, we add quadratic terms and obtain the augmented Lagrangian for the dual MAP-LP problem of Eq. (2):

L_ρ(δ, λ, δ̄, γ, μ) =
  Σ_i max_{x_i} ( θ_i(x_i) + Σ_{c : i ∈ c} δ_ci(x_i) ) + Σ_c max_{x_c} ( θ_c(x_c) − λ_c(x_c) )
  + Σ_c Σ_{i : i ∈ c} Σ_{x_i} γ_ci(x_i) ( δ_ci(x_i) − δ̄_ci(x_i) ) + (ρ/2) Σ_c Σ_{i : i ∈ c} Σ_{x_i} ( δ_ci(x_i) − δ̄_ci(x_i) )²
  + Σ_c Σ_{x_c} μ_c(x_c) ( λ_c(x_c) − Σ_{i : i ∈ c} δ̄_ci(x_i) ) + (ρ/2) Σ_c Σ_{x_c} ( λ_c(x_c) − Σ_{i : i ∈ c} δ̄_ci(x_i) )²

To see the relation of this formulation to Eq. (5), notice that (δ, λ) subsume the role of x, δ̄ subsumes the role of z (with g(z) = 0), and the multipliers (γ, μ) correspond to ν. The updates of our algorithm, which stem from Eq. (6), are summarized in Alg. 1 (a detailed derivation appears in Appendix A). In Alg. 1 we define N(i) = {c : i ∈ c}, and the subroutine w = TRIM(v, d), which serves to clip the values in the vector v at some threshold t (i.e., w = min{v, t}) such that the sum of the removed parts equals d > 0 (i.e., Σ(v − w) = d). This can be carried out efficiently in linear time (in expectation) by partitioning [3]. Notice that all updates can be computed efficiently, so the cost of each iteration is similar to that of message passing algorithms like MPLP [7] or MSD [28], and to that of dual decomposition [13, 16]. Furthermore, significant speedup is attained by caching some results for future iterations. In particular, the threshold in the TRIM subroutine (the new maximum) can serve as a good initial guess at the next iteration, especially at later iterations where the change in variable values is quite small.
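The TRIM subroutine admits a simple sort-based implementation. The sketch below is O(n log n) rather than the expected-linear-time partitioning scheme of [3] used in the paper, but it computes the same clipping threshold; the function name and code are illustrative.

```python
def trim(v, d):
    """Return w = min(v, t) elementwise, with the threshold t chosen so that
    the total clipped mass sum(v - w) equals d > 0."""
    u = sorted(v, reverse=True)
    cum = 0.0
    for k in range(len(u)):
        cum += u[k]
        # If t lay at the next sorted value, the mass clipped from the k+1
        # largest entries would be cum - (k+1)*u[k+1].
        nxt = u[k + 1] if k + 1 < len(u) else float("-inf")
        if cum - (k + 1) * nxt >= d:
            t = (cum - d) / (k + 1)   # exact threshold within this segment
            return [min(x, t) for x in v]
```

For example, trim([3.0, 1.0, 2.0], 1.0) clips only the largest entry and returns [2.0, 1.0, 2.0]. Caching the previous threshold as a warm start, as suggested above, can skip most of this search in later iterations.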
Finally, many of the updates can be executed in parallel. In particular, the δ update can be carried out simultaneously for all variables i, and likewise all factors c can be updated simultaneously in the λ and δ̄ updates. In addition, δ and λ can be optimized independently, since they appear in different parts of the objective. This may result in a considerable reduction in runtime when executed on a parallel architecture. In our experiments we used sequential updates.
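To illustrate how one of these local updates is applied, the sketch below performs the δ update of Alg. 1 for a single variable i with two neighboring factors, using an inline sort-based version of TRIM; all numbers are illustrative. The key property of the closed-form solution, checked at the end, is that after the update the reparameterized singleton θ_i + Σ_c δ_ci equals the trimmed vector θ̂_i.

```python
def trim(v, d):
    # Clip v at a threshold t so that sum(v - min(v, t)) = d (sort-based sketch).
    u = sorted(v, reverse=True)
    cum = 0.0
    for k in range(len(u)):
        cum += u[k]
        nxt = u[k + 1] if k + 1 < len(u) else float("-inf")
        if cum - (k + 1) * nxt >= d:
            t = (cum - d) / (k + 1)
            return [min(x, t) for x in v]

rho = 1.0
theta = [1.0, 3.0, 2.0]                        # singleton factor theta_i
# delta_bar and gamma for the two factors containing i (all zeros here).
delta_bar = {"c1": [0.0, 0.0, 0.0], "c2": [0.0, 0.0, 0.0]}
gamma = {"c1": [0.0, 0.0, 0.0], "c2": [0.0, 0.0, 0.0]}
N = len(delta_bar)                             # |N(i)|, number of factors with i in c

# theta_bar_i = theta_i + sum_c (delta_bar_ci - gamma_ci / rho)
theta_bar = [theta[x] + sum(delta_bar[c][x] - gamma[c][x] / rho for c in delta_bar)
             for x in range(len(theta))]
theta_hat = trim(theta_bar, N / rho)
q = [(b - h) / N for b, h in zip(theta_bar, theta_hat)]
# delta_ci = delta_bar_ci - gamma_ci / rho - q_i  for all c : i in c
delta = {c: [delta_bar[c][x] - gamma[c][x] / rho - q[x] for x in range(len(theta))]
         for c in delta_bar}
```

With these numbers θ̄_i = (1, 3, 2) and θ̂_i = TRIM(θ̄_i, 2) = (1, 1.5, 1.5), so each of the two δ_ci absorbs half of the clipped mass; analogous trims with threshold 1/ρ drive the λ update.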
Algorithm 1 The Augmented Dual LP Algorithm (ADLP)

for t = 1 to T do
  Update δ: for all i = 1, ..., n
    Set θ̄_i = θ_i + Σ_{c : i ∈ c} ( δ̄_ci − (1/ρ)γ_ci )
    θ̂_i = TRIM( θ̄_i, |N(i)|/ρ )
    q_i = ( θ̄_i − θ̂_i ) / |N(i)|
    Update δ_ci = δ̄_ci − (1/ρ)γ_ci − q_i  for all c : i ∈ c
  Update λ: for all c ∈ C
    Set θ̄_c = θ_c − Σ_{i : i ∈ c} δ̄_ci + (1/ρ)μ_c
    θ̂_c = TRIM( θ̄_c, 1/ρ )
    Update λ_c = θ_c − θ̂_c
  Update δ̄: for all c ∈ C, i : i ∈ c, x_i
    Set v_ci(x_i) = δ_ci(x_i) + (1/ρ)γ_ci(x_i) + Σ_{x_{c\i}} λ_c(x_{c\i}, x_i) + (1/ρ) Σ_{x_{c\i}} μ_c(x_{c\i}, x_i)
    v̄_c = ( 1 / (1 + Σ_{k : k ∈ c} |X_{c\k}|) ) Σ_{k : k ∈ c} |X_{c\k}| Σ_{x_k} v_ck(x_k)
    Update δ̄_ci(x_i) = ( 1 / (1 + |X_{c\i}|) ) [ v_ci(x_i) − Σ_{j : j ∈ c, j ≠ i} |X_{c\{i,j}}| ( Σ_{x_j} v_cj(x_j) − v̄_c ) ]
  Update the multipliers:
    γ_ci(x_i) ← γ_ci(x_i) + ρ( δ_ci(x_i) − δ̄_ci(x_i) )  for all c ∈ C, i : i ∈ c, x_i
    μ_c(x_c) ← μ_c(x_c) + ρ( λ_c(x_c) − Σ_{i : i ∈ c} δ̄_ci(x_i) )  for all c ∈ C, x_c
end for

5 Experimental Results

To evaluate our augmented dual LP (ADLP) algorithm (Alg. 1) we compare it to two other algorithms for finding an approximate MAP solution. The first is MPLP of Globerson and Jaakkola [7], which minimizes the dual LP of Eq. (2) via block coordinate descent steps (cast as message passing). The second is the accelerated dual decomposition (ADD) algorithm of Jojic et al. [13].³ We conduct experiments on protein design problems from the dataset of Yanover et al. [29]. In these problems we are given a 3D structure and the goal is to find a sequence of amino-acids that is the most stable for that structure. The problems are modeled by singleton and pairwise factors and can be posed as finding a MAP assignment for the given model. This is a demanding setting in which each problem may have hundreds of variables with 100 possible states on average [29, 24]. Figure 1 shows two typical examples of protein design problems. It plots the objective of Eq. (2) (computed using δ variables only) as a function of the execution time for all algorithms. First, in Figure 1 (left) we observe that the coordinate descent algorithm (MPLP) converges faster than the other algorithms,

³ For both algorithms we used the same C++ implementation used by Jojic et al. [13], available online. Our own algorithm was implemented as an extension of their package.
Fig. 1. Comparison of three algorithms for approximate MAP estimation: our augmented dual LP algorithm (ADLP), the accelerated dual decomposition (ADD) algorithm by Jojic et al. [13], and the dual coordinate descent MPLP algorithm [7]. The figure shows two examples of protein design problems; for each, the dual objective of Eq. (2) is plotted as a function of execution time (left: MPLP, ADD (ε=1), ADLP (ρ=0.05); right: MPLP, ADD (ε=1), ADD (ε=10), ADLP (ρ=0.01), ADLP (ρ=0.05)). Dashed lines denote the value of the best decoded primal solution.

however it tends to stop prematurely and yield suboptimal solutions. In contrast, ADD and ADLP take longer to converge but achieve the globally optimal solution of the approximate objective. Second, it can be seen that the convergence times of ADD and ADLP are very close, with a slight advantage to ADD. The dashed lines in Figure 1 show the value of the decoded primal solution (assignment) [23]. We see that there is generally a correlation between the quality of the dual objective and the decoded primal solution, namely the decoded primal solution improves as the dual solution approaches optimality. Nevertheless, we note that there is no dominant algorithm in terms of decoding (here we show examples where our decoding is superior). In many cases MPLP yields better decoded solutions despite being suboptimal in terms of the dual objective (not shown; this is also noted in [13]). We also conduct experiments to study the effect of the penalty parameter ρ. Our algorithm is guaranteed to converge globally for all ρ > 0, but its choice affects the actual rate of convergence. In Figure 1 (right) we compare two values of the penalty parameter, ρ = 0.01 and ρ = 0.05. It shows that setting ρ = 0.01 results in somewhat slower convergence to the optimum; however, in this case the final primal solution (dashed line) is better than that of the other algorithms.
In practice, in order to choose an appropriate ρ, one can run a few iterations of ADLP with several values and see which one achieves the best objective [17]. We mention in passing that ADD employs an accuracy parameter ε which determines the desired suboptimality of the final solution [13]. Setting ε to a large value results in faster convergence to a lower accuracy solution. On the one hand, this trade-off can be viewed as a merit of ADD, which allows one to obtain coarser approximations at reduced cost. On the other hand, an advantage of our method is that the choice of penalty ρ affects only the rate of convergence and does not impose an additional reduction in solution accuracy over that of the LP relaxation. In Figure 1 (left) we use ε = 1, as in Jojic et al., while in Figure 1
Fig. 2. (Left) Comparison for a side chain prediction problem, similar to Figure 1 (left) (MPLP, ADD (ε=1), ADLP (ρ=0.05)). (Right) Comparison of our augmented dual LP algorithm (ADLP) and a generalized variant (APLP) of the ADMM algorithm by Martins et al. [17] on a protein design problem. The dual objective of Eq. (2) is plotted as a function of execution time. Dashed lines denote the value of the best decoded primal solution.

(right) we compare two values, ε = 1 and ε = 10, to demonstrate the effect of this accuracy parameter. We next compare the performance of the algorithms on a side chain prediction problem [29]. This problem is the inverse of the protein design problem, and involves finding the 3D configuration of rotamers given the backbone structure of a protein. Figure 2 (left) shows a comparison of MPLP, ADD and ADLP on one of the largest proteins in the dataset (812 variables with 12 states on average). As in the protein design problems, MPLP converges fast to a suboptimal solution. We observe that here ADLP converges somewhat faster than ADD, possibly because the smaller state space results in faster ADLP updates. As noted earlier, Martins et al. [17] recently presented an approach that applies ADMM to the primal LP (i.e., Eq. (1)). Although their method is limited to binary pairwise factors (and several global factors), it can be modified to handle non-binary higher-order factors, as the derivation in Appendix B shows. We denote this variant by APLP. As in ADLP, in the APLP algorithm all updates are computed analytically and executed efficiently. Figure 2 (right) shows a comparison of ADLP and APLP on a protein design problem. It illustrates that ADLP converges significantly faster than APLP (similar results, not shown here, are obtained for the other proteins).

6 Discussion

Approximate MAP inference methods based on LP relaxation have drawn much attention lately due to their practical success and attractive properties.
In this paper we presented a novel globally convergent algorithm for approximate MAP estimation via LP relaxation. Our algorithm is based on the augmented Lagrangian method for convex optimization, which overcomes the lack of strict convexity by adding a quadratic term to smooth the objective. Importantly, our algorithm proceeds by applying simple-to-implement closed-form updates, and
it is highly scalable and parallelizable. We have shown empirically that our algorithm compares favorably with other state-of-the-art algorithms for approximate MAP estimation in terms of accuracy and convergence time. Several existing globally convergent algorithms for MAP-LP relaxation rely on adding local entropy terms in order to smooth the objective [6, 10, 12, 13]. Those methods must specify a temperature control parameter which affects the quality of the solution. Specifically, solving the optimization subproblems at high temperature reduces solution accuracy, while solving them at low temperature might raise numerical issues. In contrast, our algorithm is quite insensitive to the choice of such control parameters. In fact, the penalty parameter ρ affects the rate of convergence but not the accuracy or numerical stability of the algorithm. Moreover, despite the lack of fast convergence rate guarantees, in practice the algorithm has similar or better convergence times compared to other globally convergent methods in various settings. Note that [17] also show an advantage of their primal based ADMM method over several baselines. Several improvements over our basic algorithm can be considered. One such improvement is to use smart initialization of the variables. For example, since MPLP achieves a larger decrease in objective at early iterations, it is possible to run it for a limited number of steps and then take the resulting variables δ as the initialization of ADLP. Notice, however, that for this scheme to work well, the Lagrange multipliers γ and μ should also be initialized accordingly. Another potential improvement is to use an adaptive penalty parameter ρ_t (e.g., [11]). This may improve convergence in practice, as well as reduce sensitivity to the initial choice of ρ. On the downside, the theoretical convergence guarantees of ADMM no longer hold in this case. Martins et al.
[17] show that the ADMM framework is also suitable for handling certain types of global factors, which include a large number of variables in their scope (e.g., an XOR factor). Using an appropriate formulation, it is possible to incorporate such factors in our dual LP framework as well.⁴ Finally, it is likely that our method can be further improved by using recently introduced accelerated variants of ADMM [9]. Since these variants achieve an asymptotically better convergence rate, the application of such methods to MAP-LP, similar to the one we presented here, will likely result in faster algorithms for approximate MAP estimation. In this paper, we assumed that the model parameters were given. However, in many cases one wishes to learn these from data, for example by minimizing a prediction loss (e.g., hinge loss [25]). We have recently shown how to incorporate dual relaxation algorithms into such learning problems [18]. It will be interesting to apply our ADMM approach in this setting to yield an efficient learning algorithm for structured prediction problems.

Acknowledgments. We thank Ami Wiesel and Elad Eban for useful discussions and comments on this manuscript. We thank Stephen Gould for his SVL code. Ofer Meshi is a recipient of the Google European Fellowship in Machine Learning, and this research is supported in part by this Google Fellowship.

⁴ The auxiliary variables λ_c are not used in this case.
A Derivation of the Augmented Dual LP Algorithm

In this section we derive the ADMM updates for the augmented Lagrangian of the dual MAP-LP, which we restate here for convenience:

L_ρ(δ, λ, δ̄, γ, μ) =
  Σ_i max_{x_i} ( θ_i(x_i) + Σ_{c : i ∈ c} δ_ci(x_i) ) + Σ_c max_{x_c} ( θ_c(x_c) − λ_c(x_c) )
  + Σ_c Σ_{i : i ∈ c} Σ_{x_i} γ_ci(x_i) ( δ_ci(x_i) − δ̄_ci(x_i) ) + (ρ/2) Σ_c Σ_{i : i ∈ c} Σ_{x_i} ( δ_ci(x_i) − δ̄_ci(x_i) )²
  + Σ_c Σ_{x_c} μ_c(x_c) ( λ_c(x_c) − Σ_{i : i ∈ c} δ̄_ci(x_i) ) + (ρ/2) Σ_c Σ_{x_c} ( λ_c(x_c) − Σ_{i : i ∈ c} δ̄_ci(x_i) )²

Updates:

The δ update: For each variable i = 1, ..., n consider a block δ_i which consists of δ_ci for all c : i ∈ c. For this block we need to minimize the following function:

max_{x_i} ( θ_i(x_i) + Σ_{c : i ∈ c} δ_ci(x_i) ) + Σ_{c : i ∈ c} Σ_{x_i} γ_ci(x_i)δ_ci(x_i) + (ρ/2) Σ_{c : i ∈ c} Σ_{x_i} ( δ_ci(x_i) − δ̄_ci(x_i) )²

Equivalently, this can be written more compactly in vector notation as:

min_{δ_i}  (1/2) ‖ δ_i − ( δ̄_i − (1/ρ)γ_i ) ‖² + (1/ρ) max_{x_i} ( θ_i(x_i) + Σ_{c : i ∈ c} δ_ci(x_i) )

where δ̄_i and γ_i are defined analogously to δ_i. The closed-form solution to this QP is given by the update in Alg. 1. It is obtained by inspecting the KKT conditions and exploiting the structure of the summation inside the max (for a similar derivation see [3]).

The λ update: For each factor c ∈ C we seek to minimize the function:

max_{x_c} ( θ_c(x_c) − λ_c(x_c) ) + Σ_{x_c} μ_c(x_c)λ_c(x_c) + (ρ/2) Σ_{x_c} ( λ_c(x_c) − Σ_{i : i ∈ c} δ̄_ci(x_i) )²

In equivalent vector notation we have the problem:

min_{λ_c}  (1/2) ‖ λ_c − ( Σ_{i : i ∈ c} δ̄_ci − (1/ρ)μ_c ) ‖² + (1/ρ) max_{x_c} ( θ_c(x_c) − λ_c(x_c) )

This QP is very similar to that of the δ update and can be solved using the same technique. The resulting closed-form update is given in Alg. 1.
The δ̄ update: For each c ∈ C we consider a block which consists of δ̄_ci for all i : i ∈ c. We seek a minimizer of the function:

− Σ_{i : i ∈ c} Σ_{x_i} γ_ci(x_i)δ̄_ci(x_i) + (ρ/2) Σ_{i : i ∈ c} Σ_{x_i} ( δ_ci(x_i) − δ̄_ci(x_i) )²
− Σ_{x_c} μ_c(x_c) Σ_{i : i ∈ c} δ̄_ci(x_i) + (ρ/2) Σ_{x_c} ( λ_c(x_c) − Σ_{i : i ∈ c} δ̄_ci(x_i) )²

Taking the partial derivative w.r.t. δ̄_ci(x_i) and setting it to 0 yields:

δ̄_ci(x_i) = ( 1 / (1 + |X_{c\i}|) ) ( v_ci(x_i) − Σ_{j : j ∈ c, j ≠ i} |X_{c\{i,j}}| Σ_{x_j} δ̄_cj(x_j) )

where: v_ci(x_i) = δ_ci(x_i) + (1/ρ)γ_ci(x_i) + Σ_{x_{c\i}} λ_c(x_{c\i}, x_i) + (1/ρ) Σ_{x_{c\i}} μ_c(x_{c\i}, x_i).

Summing this over x_i and i : i ∈ c and plugging back in, we get the update in Alg. 1. Finally, the multipliers update is straightforward.

B Derivation of the Augmented Primal LP Algorithm

We next derive the algorithm for optimizing Eq. (1) with general local factors. Consider the following formulation, which is equivalent to the primal MAP-LP problem of Eq. (1). Define:

f_i(μ_i) = Σ_{x_i} μ_i(x_i)θ_i(x_i)  if μ_i(x_i) ≥ 0 and Σ_{x_i} μ_i(x_i) = 1;  −∞ otherwise
f_c(μ_c) = Σ_{x_c} μ_c(x_c)θ_c(x_c)  if μ_c(x_c) ≥ 0 and Σ_{x_c} μ_c(x_c) = 1;  −∞ otherwise

f accounts for the non-negativity and normalization constraints in L(G). We add the marginalization constraints via copies of μ_c for each i ∈ c, denoted by μ̄_ci. Thus we get the augmented Lagrangian:

L_ρ(μ, μ̄, δ, β) = Σ_i f_i(μ_i) + Σ_c f_c(μ_c)
− Σ_c Σ_{i : i ∈ c} Σ_{x_i} δ_ci(x_i) ( μ̄_ci(x_i) − μ_i(x_i) ) − (ρ/2) Σ_c Σ_{i : i ∈ c} Σ_{x_i} ( μ̄_ci(x_i) − μ_i(x_i) )²
− Σ_c Σ_{i : i ∈ c} Σ_{x_c} β_ci(x_c) ( μ̄_ci(x_c) − μ_c(x_c) ) − (ρ/2) Σ_c Σ_{i : i ∈ c} Σ_{x_c} ( μ̄_ci(x_c) − μ_c(x_c) )²
where μ̄_ci(x_i) = Σ_{x_{c\i}} μ̄_ci(x_{c\i}, x_i). To draw the connection with Eq. (5), in this formulation μ subsumes the role of x, μ̄ subsumes the role of z (with g(z) = 0), and the multipliers (δ, β) correspond to ν. We next show the updates which result from applying Eq. (6) to this formulation.

Update μ_i for all i = 1, ..., n:

μ_i ← arg max_{μ_i} [ μ_i⊤ ( θ_i + Σ_{c : i ∈ c} ( δ_ci + ρ M μ̄_ci ) ) − (1/2) μ_i⊤ ( ρ|N(i)| I ) μ_i ]

where M μ̄_ci = Σ_{x_{c\i}} μ̄_ci(x_{c\i}, ·). We have to maximize this QP under simplex constraints on μ_i. Notice that the objective matrix is diagonal, so this can be solved in closed form by shifting the target vector and then truncating at 0 such that the sum of the positive elements equals 1 (see [3]). The solution can be computed in linear time (in expectation) by partitioning [3].

Update μ_c for all c ∈ C:

μ_c ← arg max_{μ_c} [ μ_c⊤ ( θ_c + Σ_{i : i ∈ c} ( β_ci + ρ μ̄_ci ) ) − (1/2) μ_c⊤ ( ρ|N(c)| I ) μ_c ]

where N(c) = {i : i ∈ c}. Again we have a projection onto the simplex with a diagonal objective matrix, which can be done efficiently.

Update μ̄_ci for all c ∈ C, i : i ∈ c:

μ̄_ci ← arg max_{μ̄_ci} [ μ̄_ci⊤ ( M⊤( ρμ_i − δ_ci ) − β_ci + ρμ_c ) − (ρ/2) μ̄_ci⊤ ( M⊤M + I ) μ̄_ci ]

Here we have an unconstrained QP, so the solution is obtained as H⁻¹v. Further notice that the inverse H⁻¹ can be computed in closed form. To see how, M⊤M is a block-diagonal matrix with blocks of ones of size |X_i|. Therefore, H = ρ( M⊤M + I ) is also block-diagonal. It follows that the inverse H⁻¹ is a block-diagonal matrix where each block is the inverse of the corresponding block in H. Finally, it is easy to verify that the inverse of a block ρ( 1_{|X_i|} + I_{|X_i|} ) is given by (1/ρ)( I_{|X_i|} − (1/(|X_i|+1)) 1_{|X_i|} ).

Update the multipliers:

δ_ci(x_i) ← δ_ci(x_i) + ρ( μ̄_ci(x_i) − μ_i(x_i) )  for all c ∈ C, i : i ∈ c, x_i
β_ci(x_c) ← β_ci(x_c) + ρ( μ̄_ci(x_c) − μ_c(x_c) )  for all c ∈ C, i : i ∈ c, x_c
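Both the μ_i and μ_c updates above reduce to Euclidean projection onto the probability simplex (a QP with a diagonal objective matrix). The following sketch uses the sort-based projection associated with [3]; the paper's implementation uses the expected-linear-time partitioning variant instead, and the function name here is illustrative.

```python
def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cum += uk
        cand = (cum - 1.0) / k
        if uk - cand > 0:      # entries down to rank k remain positive
            theta = cand       # the largest valid k gives the final shift
    return [max(x - theta, 0.0) for x in v]
```

For instance, project_simplex([2.0, 0.0]) returns [1.0, 0.0]; shifting by θ and truncating at zero is exactly the "shift then truncate" step described for the μ updates.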
Bibliography

[1] M. Afonso, J. Bioucas-Dias, and M. Figueiredo. Fast image recovery using variable splitting and constrained optimization. IEEE Transactions on Image Processing, 19(9), Sept. 2010.
[2] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2003.
[3] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272–279, 2008.
[4] J. Eckstein and D. P. Bertsekas. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55:293–318, June 1992.
[5] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Computers and Mathematics with Applications, 2:17–40, 1976.
[6] K. Gimpel and N. A. Smith. Softmax-margin CRFs: training log-linear models with cost functions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010.
[7] A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Advances in Neural Information Processing Systems, 2008.
[8] R. Glowinski and A. Marrocco. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires. Revue Française d'Automatique, Informatique, et Recherche Opérationnelle, 9:41–76, 1975.
[9] D. Goldfarb, S. Ma, and K. Scheinberg. Fast alternating linearization methods for minimizing the sum of two convex functions. Technical report, UCLA CAM, 2010.
[10] T. Hazan and A. Shashua. Norm-product belief propagation: Primal-dual message-passing for approximate inference. IEEE Transactions on Information Theory, 56(12), Dec. 2010.
[11] B. S. He, H. Yang, and S. L. Wang. Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. Journal of Optimization Theory and Applications, 106, 2000.
[12] J. Johnson. Convex Relaxation Methods for Graphical Models: Lagrangian and Maximum Entropy Approaches. PhD thesis, EECS, MIT, 2008.
[13] V. Jojic, S. Gould, and D. Koller. Fast and smooth: Accelerated dual decomposition for MAP inference. In Proceedings of the International Conference on Machine Learning, 2010.
[14] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 2006.
[15] N. Komodakis and N. Paragios. Beyond loose LP-relaxations: Optimizing MRFs by repairing cycles. In 10th European Conference on Computer Vision, 2008.
[16] N. Komodakis, N. Paragios, and G. Tziritas. MRF energy minimization and beyond via dual decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33:531–552, March 2011.
[17] A. F. T. Martins, M. A. T. Figueiredo, P. M. Q. Aguiar, N. A. Smith, and E. P. Xing. An augmented Lagrangian approach to constrained MAP inference. In International Conference on Machine Learning, June 2011.
[18] O. Meshi, D. Sontag, T. Jaakkola, and A. Globerson. Learning efficiently with approximate inference via dual losses. In Proceedings of the 27th International Conference on Machine Learning, 2010.
[19] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103:127–152, 2005.
[20] P. Ravikumar, A. Agarwal, and M. Wainwright. Message-passing for graph-structured linear programs: proximal projections, convergence and rounding schemes. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[21] A. M. Rush, D. Sontag, M. Collins, and T. Jaakkola. On dual decomposition and linear programming relaxations for natural language processing. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2010.
[22] M. I. Schlesinger. Syntactic analysis of two-dimensional visual signals in noisy conditions. Kibernetika, 4, 1976.
[23] D. Sontag, A. Globerson, and T. Jaakkola. Introduction to dual decomposition for inference. In S. Sra, S. Nowozin, and S. J. Wright, editors, Optimization for Machine Learning. MIT Press, 2011.
[24] D. Sontag, T. Meltzer, A. Globerson, T. Jaakkola, and Y. Weiss. Tightening LP relaxations for MAP using message passing. In Proceedings of the 24th Annual Conference on Uncertainty in Artificial Intelligence, 2008.
[25] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16, pages 25–32. MIT Press, Cambridge, MA, 2004.
[26] S. Tosserams, L. Etman, P. Papalambros, and J. Rooda. An augmented Lagrangian relaxation for analytical target cascading using the alternating direction method of multipliers. Structural and Multidisciplinary Optimization, 31, 2006.
[27] M. J. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.
[28] T. Werner. A linear programming approach to max-sum problem: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7), 2007.
[29] C. Yanover, T. Meltzer, and Y. Weiss. Linear programming relaxations and belief propagation – an empirical study. Journal of Machine Learning Research, 7:1887–1907, 2006.
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationOn the Global Linear Convergence of the ADMM with Multi-Block Variables
On the Global Lnear Convergence of the ADMM wth Mult-Block Varables Tany Ln Shqan Ma Shuzhong Zhang May 31, 01 Abstract The alternatng drecton method of multplers ADMM has been wdely used for solvng structured
More informationThe Study of Teaching-learning-based Optimization Algorithm
Advanced Scence and Technology Letters Vol. (AST 06), pp.05- http://dx.do.org/0.57/astl.06. The Study of Teachng-learnng-based Optmzaton Algorthm u Sun, Yan fu, Lele Kong, Haolang Q,, Helongang Insttute
More informationNumerical Heat and Mass Transfer
Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and
More informationprinceton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg
prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there
More informationCOS 521: Advanced Algorithms Game Theory and Linear Programming
COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationSolutions to exam in SF1811 Optimization, Jan 14, 2015
Solutons to exam n SF8 Optmzaton, Jan 4, 25 3 3 O------O -4 \ / \ / The network: \/ where all lnks go from left to rght. /\ / \ / \ 6 O------O -5 2 4.(a) Let x = ( x 3, x 4, x 23, x 24 ) T, where the varable
More informationOn the Multicriteria Integer Network Flow Problem
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of
More informationCollege of Computer & Information Science Fall 2009 Northeastern University 20 October 2009
College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:
More informationEEE 241: Linear Systems
EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they
More informationGlobal Sensitivity. Tuesday 20 th February, 2018
Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values
More informationYong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 )
Kangweon-Kyungk Math. Jour. 4 1996), No. 1, pp. 7 16 AN ITERATIVE ROW-ACTION METHOD FOR MULTICOMMODITY TRANSPORTATION PROBLEMS Yong Joon Ryang Abstract. The optmzaton problems wth quadratc constrants often
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017
U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationAssortment Optimization under MNL
Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.
More informationTree Block Coordinate Descent for MAP in Graphical Models
ree Block Coordnate Descent for MAP n Graphcal Models Davd Sontag omm Jaakkola Computer Scence and Artfcal Intellgence Laboratory Massachusetts Insttute of echnology Cambrdge, MA 02139 Abstract A number
More informationAn Interactive Optimisation Tool for Allocation Problems
An Interactve Optmsaton ool for Allocaton Problems Fredr Bonäs, Joam Westerlund and apo Westerlund Process Desgn Laboratory, Faculty of echnology, Åbo Aadem Unversty, uru 20500, Fnland hs paper presents
More informationSupport Vector Machines. Vibhav Gogate The University of Texas at dallas
Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationA Hybrid Variational Iteration Method for Blasius Equation
Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 1 (June 2015), pp. 223-229 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) A Hybrd Varatonal Iteraton Method
More informationSingular Value Decomposition: Theory and Applications
Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real
More informationarxiv:cs.cv/ Jun 2000
Correlaton over Decomposed Sgnals: A Non-Lnear Approach to Fast and Effectve Sequences Comparson Lucano da Fontoura Costa arxv:cs.cv/0006040 28 Jun 2000 Cybernetc Vson Research Group IFSC Unversty of São
More informationHidden Markov Models & The Multivariate Gaussian (10/26/04)
CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models
More informationOn an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1
On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool
More informationSome Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)
Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998
More informationLinear Approximation with Regularization and Moving Least Squares
Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationCSC 411 / CSC D11 / CSC C11
18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t
More informationVQ widely used in coding speech, image, and video
at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng
More informationThe Second Anti-Mathima on Game Theory
The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16
STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus
More informationModule 9. Lecture 6. Duality in Assignment Problems
Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept
More informationComputing Correlated Equilibria in Multi-Player Games
Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,
More informationLagrange Multipliers Kernel Trick
Lagrange Multplers Kernel Trck Ncholas Ruozz Unversty of Texas at Dallas Based roughly on the sldes of Davd Sontag General Optmzaton A mathematcal detour, we ll come back to SVMs soon! subject to: f x
More informationSome modelling aspects for the Matlab implementation of MMA
Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton
More informationEfficient Methods for Learning and Inference in Structured Output Prediction
Effcent Methods for Learnng and Inference n Structured Output Predcton Thess submtted for the degree of Doctor of Phlosophy By Ofer Mesh Submtted to the Senate of the Hebrew Unversty of Jerusalem August
More informationAPPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14
APPROXIMAE PRICES OF BASKE AND ASIAN OPIONS DUPON OLIVIER Prema 14 Contents Introducton 1 1. Framewor 1 1.1. Baset optons 1.. Asan optons. Computng the prce 3. Lower bound 3.1. Closed formula for the prce
More informationLecture 10 Support Vector Machines. Oct
Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron
More informationNON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS
IJRRAS 8 (3 September 011 www.arpapress.com/volumes/vol8issue3/ijrras_8_3_08.pdf NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS H.O. Bakodah Dept. of Mathematc
More informationLecture 21: Numerical methods for pricing American type derivatives
Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)
More informationThe Geometry of Logit and Probit
The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.
More informationA PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS
HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,
More informationBayesian predictive Configural Frequency Analysis
Psychologcal Test and Assessment Modelng, Volume 54, 2012 (3), 285-292 Bayesan predctve Confgural Frequency Analyss Eduardo Gutérrez-Peña 1 Abstract Confgural Frequency Analyss s a method for cell-wse
More informationHidden Markov Models
Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,
More informationLecture 12: Classification
Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna
More informationLinear Feature Engineering 11
Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19
More informationWhy BP Works STAT 232B
Why BP Works STAT 232B Free Energes Helmholz & Gbbs Free Energes 1 Dstance between Probablstc Models - K-L dvergence b{ KL b{ p{ = b{ ln { } p{ Here, p{ s the eact ont prob. b{ s the appromaton, called
More informationLinear Approximation to ADMM for MAP inference
JMLR: Workshop and Conerence Proceedngs 9:48 6, ACML Lnear Approxmaton to ADMM or MAP nerence Sholeh Forouzan Alexander Ihler Department o Computer Scence Unversty o Calorna, Irvne Irvne, CA, 9697 SFOROUZA@ICS.UCI.EDU
More informationCombining Constraint Programming and Integer Programming
Combnng Constrant Programmng and Integer Programmng GLOBAL CONSTRAINT OPTIMIZATION COMPONENT Specal Purpose Algorthm mn c T x +(x- 0 ) x( + ()) =1 x( - ()) =1 FILTERING ALGORITHM COST-BASED FILTERING ALGORITHM
More informationIntroduction to Hidden Markov Models
Introducton to Hdden Markov Models Alperen Degrmenc Ths document contans dervatons and algorthms for mplementng Hdden Markov Models. The content presented here s a collecton of my notes and personal nsghts
More informationThe Minimum Universal Cost Flow in an Infeasible Flow Network
Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran
More informationOnline Classification: Perceptron and Winnow
E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng
More informationNeural networks. Nuno Vasconcelos ECE Department, UCSD
Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X
More informationEstimation: Part 2. Chapter GREG estimation
Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the
More informationOn a direct solver for linear least squares problems
ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear
More informationInner Product. Euclidean Space. Orthonormal Basis. Orthogonal
Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,
More informationConjugacy and the Exponential Family
CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the
More informationCHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE
CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationMaximal Margin Classifier
CS81B/Stat41B: Advanced Topcs n Learnng & Decson Makng Mamal Margn Classfer Lecturer: Mchael Jordan Scrbes: Jana van Greunen Corrected verson - /1/004 1 References/Recommended Readng 1.1 Webstes www.kernel-machnes.org
More informationProbability-Theoretic Junction Trees
Probablty-Theoretc Juncton Trees Payam Pakzad, (wth Venkat Anantharam, EECS Dept, U.C. Berkeley EPFL, ALGO/LMA Semnar 2/2/2004 Margnalzaton Problem Gven an arbtrary functon of many varables, fnd (some
More informationPHYS 705: Classical Mechanics. Calculus of Variations II
1 PHYS 705: Classcal Mechancs Calculus of Varatons II 2 Calculus of Varatons: Generalzaton (no constrant yet) Suppose now that F depends on several dependent varables : We need to fnd such that has a statonary
More informationSupport Vector Machines CS434
Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? + + + + + + + + + Intuton of Margn Consder ponts
More informationMarkov Chain Monte Carlo Lecture 6
where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways
More informationLeast squares cubic splines without B-splines S.K. Lucas
Least squares cubc splnes wthout B-splnes S.K. Lucas School of Mathematcs and Statstcs, Unversty of South Australa, Mawson Lakes SA 595 e-mal: stephen.lucas@unsa.edu.au Submtted to the Gazette of the Australan
More informationSection 8.3 Polar Form of Complex Numbers
80 Chapter 8 Secton 8 Polar Form of Complex Numbers From prevous classes, you may have encountered magnary numbers the square roots of negatve numbers and, more generally, complex numbers whch are the
More information8 : Learning in Fully Observed Markov Networks. 1 Why We Need to Learn Undirected Graphical Models. 2 Structural Learning for Completely Observed MRF
10-708: Probablstc Graphcal Models 10-708, Sprng 2014 8 : Learnng n Fully Observed Markov Networks Lecturer: Erc P. Xng Scrbes: Meng Song, L Zhou 1 Why We Need to Learn Undrected Graphcal Models In the
More informationEM and Structure Learning
EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder
More informationA New Evolutionary Computation Based Approach for Learning Bayesian Network
Avalable onlne at www.scencedrect.com Proceda Engneerng 15 (2011) 4026 4030 Advanced n Control Engneerng and Informaton Scence A New Evolutonary Computaton Based Approach for Learnng Bayesan Network Yungang
More informationLecture 4. Instructor: Haipeng Luo
Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors
Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.
More informationLecture 17: Lee-Sidford Barrier
CSE 599: Interplay between Convex Optmzaton and Geometry Wnter 2018 Lecturer: Yn Tat Lee Lecture 17: Lee-Sdford Barrer Dsclamer: Please tell me any mstake you notced. In ths lecture, we talk about the
More informationCS : Algorithms and Uncertainty Lecture 14 Date: October 17, 2016
CS 294-128: Algorthms and Uncertanty Lecture 14 Date: October 17, 2016 Instructor: Nkhl Bansal Scrbe: Antares Chen 1 Introducton In ths lecture, we revew results regardng follow the regularzed leader (FTRL.
More informationPower law and dimension of the maximum value for belief distribution with the max Deng entropy
Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng
More information4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA
4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected
More informationWhich Separator? Spring 1
Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal
More informationReport on Image warping
Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.
More information6.854J / J Advanced Algorithms Fall 2008
MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms
More informationAppendix B: Resampling Algorithms
407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles
More informationConvergent Propagation Algorithms via Oriented Trees
Convergent Propagaton Algorthms va Orented Trees Amr Globerson CSAIL Massachusetts Insttute of Technology Cambrdge, MA 02139 Tomm Jaakkola CSAIL Massachusetts Insttute of Technology Cambrdge, MA 02139
More informationSupport Vector Machines CS434
Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? Intuton of Margn Consder ponts A, B, and C We
More informationFinite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin
Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of
More informationSuppose that there s a measured wndow of data fff k () ; :::; ff k g of a sze w, measured dscretely wth varable dscretzaton step. It s convenent to pl
RECURSIVE SPLINE INTERPOLATION METHOD FOR REAL TIME ENGINE CONTROL APPLICATIONS A. Stotsky Volvo Car Corporaton Engne Desgn and Development Dept. 97542, HA1N, SE- 405 31 Gothenburg Sweden. Emal: astotsky@volvocars.com
More informationChapter 13: Multiple Regression
Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More information