Convergent Propagation Algorithms via Oriented Trees


Amir Globerson
CSAIL, Massachusetts Institute of Technology, Cambridge, MA

Tommi Jaakkola
CSAIL, Massachusetts Institute of Technology, Cambridge, MA

Abstract

Inference problems in graphical models are often approximated by casting them as constrained optimization problems. Message passing algorithms, such as belief propagation, have previously been suggested as methods for solving these optimization problems. However, there are few convergence guarantees for such algorithms, and the algorithms are therefore not guaranteed to solve the corresponding optimization problem. Here we present an oriented tree decomposition algorithm that is guaranteed to converge to the global optimum of the Tree-Reweighted (TRW) variational problem. Our algorithm performs local updates in the convex dual of the TRW problem, an unconstrained generalized geometric program. Primal updates, also local, correspond to oriented reparametrization operations that leave the distribution intact.

1 Introduction

The problem of probabilistic inference in graphical models refers to the task of calculating marginal distributions or the most likely assignment of variables. Both of these problems are generally NP hard, requiring approximate methods. Many approximate inference methods, including message passing algorithms, can be viewed as trying to solve a variational formulation of the inference problem. The idea in variational approaches is to cast approximate inference as a constrained minimization of a free energy function (see [14] for a recent review). Two key questions arise in this context. The first is how to choose the free energy, and the second is how to design efficient algorithms that minimize it. When the Bethe free energy is used, it has been shown [16] that fixed points of the belief propagation (BP) algorithm correspond to local minima of the free energy. However, BP is not generally guaranteed to converge to a fixed point. Although there do exist algorithms that are guaranteed to converge to a local minimum of the Bethe free energy [15, 17], its global minimization is still a hard non-convex problem for which no efficient algorithms are known.

The difficulties with the Bethe free energy derive from its non-convexity and the corresponding local minima problem. To avoid this difficulty, several authors have recently studied convex free energies [6, 7, 13]. The associated convex optimization problems can in principle be solved using generic convex optimization procedures [1] with guarantees of finding the global optimum in polynomial time. Although this presents a significant improvement over the non-convex case, the generic optimization route may be very costly in large practical problems. For example, when using a generic convex solver, every update of the variables has complexity O(n), where n is the number of variables. In contrast, the optimization using message passing algorithms can be reduced to local updates with O(1) operations. Interestingly, even in the convex setting, the convergence of these message passing algorithms is typically not guaranteed, and damping heuristics are required to ensure convergence in practice [13]. A prominent exception is [7], where the author provides a provably convergent message passing algorithm for free energies whose entropy term is a non-negative combination of joint entropies.

Here we provide a provably convergent message passing algorithm for a specific variational setup, namely the Tree-Reweighted (TRW) optimization problem of Wainwright et al. [13]. The algorithm we propose is guaranteed to converge to the global optimum of the free energy, and does not require additional parameters such as a damping ratio. A key step in obtaining the updates is deriving the convex dual of TRW, which we show to be an unconstrained instance of a generalized geometric program (GP) [3].

We derive a message passing algorithm, which we call TRW Geometric Programming (TRW-GP), that yields a monotone improvement of the dual GP. We demonstrate the utility of our TRW-GP algorithm by providing an example where the TRW message passing algorithm in [13] does not converge, but TRW-GP does.

2 The Tree-Reweighting Formulation

We consider pairwise Markov random fields (MRFs) over a set of variables x = x_1, ..., x_n. Given a graph G with n vertices V and a set of edges E, an MRF is a distribution over x defined by

p(x; θ) = (1/Z(θ)) e^{Σ_{ij∈E} θ_{ij}(x_i, x_j) + Σ_{i∈V} θ_i(x_i)}   (1)

where θ_{ij}(x_i, x_j) and θ_i(x_i) are parameters, θ denotes all the parameters, and Z(θ) is the partition function. Our focus here is on approximating singleton marginals of p(x; θ), namely p(x_i; θ). This problem is closely related to that of evaluating the partition function Z(θ). We focus on the TRW variational problem, which yields an upper bound on Z(θ) as well as a set of approximate marginals obtained from the minimizing solution. We begin by briefly reviewing the TRW formalism.

Consider a set of k spanning trees on G denoted by T_1, ..., T_k, and a distribution ρ over these trees where ρ_i ≥ 0 and Σ_i ρ_i = 1. To avoid overloading notation in the subsequent analysis, we assume here that the trees are directed, so that the same tree structure may appear multiple times with different edge orientations. This differs from the presentation in [13], though the distinction is immaterial in the remainder of this section. We also introduce the notion of pseudomarginals, defined as the singleton and pairwise marginals µ_i(x_i), µ_{ij}(x_i, x_j) associated with the nodes and edges of G. We use µ to denote the set of all these marginals and C(G) the set of µ's that are pairwise consistent:

Σ_{x_j} µ_{ij}(x_i, x_j) = µ_i(x_i),   Σ_{x_i} µ_{ij}(x_i, x_j) = µ_j(x_j),
Σ_{x_i} µ_i(x_i) = 1,   µ_{ij}(x_i, x_j) ≥ 0.

For a given tree T and µ ∈ C(G), define the entropy H(µ; T) to be the entropy of an MRF on the tree T with marginals given by µ. Note that only a subset of the pairwise distributions in µ will be used for each tree, namely the µ_{ij}(x_i, x_j) such that ij is an edge in T. The tree entropy may be written in closed form as (cf. [13])

H(µ; T) = Σ_i H(X_i) − Σ_{ij∈T} I(X_i; X_j)   (2)

where H(X_i) is the entropy of µ_i(x_i) and I(X_i; X_j) is the mutual information calculated from µ_{ij}(x_i, x_j). Note that this expression is independent of the direction of the edges in the tree. We will make use of the directed edges in the next section. Define the following variational free energy function F(µ; ρ, θ):

F(µ; ρ, θ) = −µ · θ − Σ_{i=1}^k ρ_i H(µ; T_i).   (3)

In [13] it is shown that minimizing F(µ; ρ, θ) results in an upper bound on the log-partition function:

log Z(θ) ≤ −min_{µ∈C(G)} F(µ; ρ, θ).   (4)

The minimization also results in an optimal (minimizing) µ, which is used to approximate the marginals of p(x; θ). Empirical results in [13] show that TRW usually performs as well as, and often better than, the standard Bethe free energy approximations, especially in regimes where BP fails to converge.

3 Conditional Entropies and Directed Edge Probabilities

Our goal is to use convex duality to obtain the dual problem of Eq. (4). To achieve this, we first seek a representation of F(µ; ρ, θ) that is a convex function of µ for all values of µ, and not just within the consistent set µ ∈ C(G). For example, the entropy term in Eq. (2) is concave only for µ ∈ C(G) but not for a general µ. We therefore seek an alternative expression for the tree entropy. Let r(T) be the root node of T (recall that the trees are directed). We write the entropy associated with the tree as

H(µ; T) = H(X_{r(T)}) + Σ_{ij∈T} H(X_i | X_j)   (5)

where ij ∈ T implies that there is a directed edge from vertex j to vertex i in the directed tree T. The conditional entropy H(X_i | X_j) is assumed to be calculated only on the basis of the joint marginal µ_{ij}(x_i, x_j), and does not involve µ_i(x_i). The entropy H(X_{r(T)}) is calculated via the singleton marginal µ_{r(T)}(x_{r(T)}). The expressions in Eq. (5) and Eq. (2) will agree whenever µ ∈ C(G). However, they will yield different results when µ ∉ C(G). The advantage of Eq. (5) is that H(µ; T) is now a concave function of the set of marginals µ. The concavity follows immediately from the concavity of H(X_i) as a function of µ_i(x_i) and the concavity of the conditional entropy H(X_i | X_j) as a function of µ_{ij}(x_i, x_j) [6].
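To make Eqs. (2) and (5) concrete, the following is a minimal numeric sketch (Python with NumPy; the toy joint marginal is a hypothetical example, not from the paper) checking that the two tree-entropy expressions agree for a pairwise-consistent µ:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a distribution given as a flat array."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Toy directed tree 1 -> 2 over binary variables, with a joint marginal
# mu_12 whose row/column sums define consistent singletons (mu in C(G)).
mu12 = np.array([[0.3, 0.2],
                 [0.1, 0.4]])          # mu_12(x_1, x_2)
mu1, mu2 = mu12.sum(axis=1), mu12.sum(axis=0)

# Eq. (2): sum of singleton entropies minus mutual information on tree edges.
I12 = entropy(mu1) + entropy(mu2) - entropy(mu12.ravel())
H_eq2 = entropy(mu1) + entropy(mu2) - I12

# Eq. (5): root entropy plus conditional entropies along directed edges.
H_2_given_1 = entropy(mu12.ravel()) - entropy(mu1)   # H(X_2 | X_1)
H_eq5 = entropy(mu1) + H_2_given_1

assert np.isclose(H_eq2, H_eq5)   # identical on consistent marginals
```

For an inconsistent µ (e.g., a µ_1 that is not the marginal of µ_12) the two values differ, which is precisely why Eq. (5) is the form that stays concave everywhere.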

The function F(µ; ρ, θ) involves a summation over a potentially large number of tree entropies. To express this compactly while maintaining directionality, we define ρ_{ij} as the probability that the directed edge ij is present in a tree drawn according to the distribution ρ over trees. Similarly, we define ρ_i as the probability that node i appears as a root. We note that it is possible to find such edge probabilities for distributions (e.g., uniform) over the set of all spanning trees by employing a variant of the matrix tree theorem for directed trees (see [12] p. 141 and [11]). The function F(µ; ρ, θ) can now be written as

F(µ; ρ, θ) = −µ · θ − Σ_{i∈V} ρ_i H(X_i) − Σ_{ij∈Ē} ρ_{ij} H(X_i | X_j)   (6)

where the edge set Ē contains edges in both directions. In other words, if ij ∈ Ē then ji is also in Ē. The new function F(µ; ρ, θ) is convex in µ without assuming consistency of the marginals.

4 The TRW Convex Dual

The TRW primal problem is given by

min_{µ∈C(G)} F(µ; ρ, θ).   (7)

Since the function F(µ; ρ, θ) is now convex for all µ and the set of constraints is linear, this optimization problem is convex and thus has an equivalent convex dual [1] (strict duality follows from Slater's conditions, which are satisfied in this case). However, it is not immediately clear how to derive this dual in closed form. The main difficulty is that two terms in the objective F(µ; ρ, θ) depend on µ_{ij}(x_i, x_j), namely H(X_i | X_j) and H(X_j | X_i). To get around this problem we introduce additional variables into the primal problem. Specifically, we replace µ_{ij}(x_i, x_j) by two copies, which we denote by µ_{ij}(x_i, x_j) and µ̄_{ij}(x_i, x_j), and require that these two copies be identical. The entropy H(X_i | X_j) is then evaluated via the copy µ_{ij}(x_i, x_j). We shall also find it convenient to replace the consistency constraints in C(G) by the following equivalent directed consistency constraints:

µ_{ij}(x_i, x_j) = µ̄_{ij}(x_i, x_j),
Σ_{x_j} µ_{ij}(x_i, x_j) = µ_i(x_i),   Σ_{x_i} µ̄_{ij}(x_i, x_j) = µ_j(x_j),
Σ_{x_i} µ_i(x_i) = 1,   µ_{ij}(x_i, x_j) ≥ 0,   µ̄_{ij}(x_i, x_j) ≥ 0.

For simplicity we will continue to denote the new extended variable set by µ (as we will be using it from now on) and refer to the consistency constraints by C(G). The TRW primal problem is then

PTRW:  min_{µ∈C(G)} F(µ; ρ, θ).   (8)

The convex dual of PTRW is derived in App. A, and is in fact a convex unconstrained minimization problem. In what follows we describe this dual. The dual variables will be denoted by β_{ij}(x_i, x_j) for ij ∈ E, and are not constrained (note that the β variables are not directed, i.e., there is one variable β_{ij} per edge). The dual objective is given by

F_D(β; ρ, θ) = Σ_{i∈V} ρ_i log Σ_{x_i} e^{ρ_i^{-1}(θ_i(x_i) + Σ_{k∈N(i)} λ_{ki}(x_i; β))}

where λ_{ji}(x_i; β) is a function of the β variables:

λ_{ji}(x_i; β) = ρ_{ji} log Σ_{x_j} e^{ρ_{ji}^{-1}(θ_{ij}(x_i, x_j) + δ_{ij} β_{ij}(x_i, x_j))}

and δ_{ij} is defined as δ_{ij} = 1 if ij ∈ E and δ_{ij} = −1 if ji ∈ E. The dual TRW optimization problem is then

DTRW:  min_β F_D(β; ρ, θ).   (9)

We re-emphasize the fact that DTRW is an unconstrained minimization of a function of β. The variables λ_{ji}(x_i; β) are introduced merely for the purpose of notational convenience. The mapping between dual and primal variables can be shown to be

µ_i(x_i) ∝ e^{ρ_i^{-1}(θ_i(x_i) + Σ_{k∈N(i)} λ_{ki}(x_i; β))}
µ_{ij}(x_j | x_i) ∝ e^{ρ_{ji}^{-1}(θ_{ij}(x_i, x_j) + δ_{ij} β_{ij}(x_i, x_j))}.   (10)

This relation maps the optimal β to the optimal µ, but we shall also use it for non-optimal values. The dual objective F_D(β; ρ, θ) is a convex function (see App. B) and therefore has no local minima.

5 Dual Gradient and Optimum

The DTRW problem presented above is unconstrained and can thus be solved using a variety of gradient based algorithms, such as conjugate gradient or BFGS [10]. The gradient of F_D(β; ρ, θ) w.r.t. β is

∂F_D(β; ρ, θ) / ∂β_{ij}(x_i, x_j) = µ_{ij}(x_i | x_j) µ_j(x_j) − µ_{ij}(x_j | x_i) µ_i(x_i)

where the distributions are given by the dual-to-primal mapping in Eq. (10). The gradient is thus a measure of the discrepancy between two ways of calculating the joint pairwise marginal, based on the two different orientations of the edge ij.
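As a sketch of how the mapping of Eq. (10) and the gradient can be computed locally, consider a hypothetical single-edge model: two binary variables, uniform root and directed-edge probabilities, and illustrative θ values. The helper names, the specific numbers, and the choice of which orientation gets δ = +1 are our assumptions, not the paper's:

```python
import numpy as np

theta1 = np.array([0.5, -0.5])          # theta_1(x_1)
theta2 = np.array([0.2, 0.1])           # theta_2(x_2)
theta12 = np.array([[1.0, -1.0],
                    [-1.0, 1.0]])       # theta_12(x_1, x_2)
rho = {"1": 0.5, "2": 0.5, "12": 0.5, "21": 0.5}
beta = np.zeros((2, 2))                 # one unconstrained beta per edge

def primal_from_dual(beta):
    """Eq. (10): singleton marginals and edge conditionals from (theta, beta)."""
    a = np.exp((theta12 + beta) / rho["21"])       # orientation with delta = +1
    b = np.exp((theta12 - beta) / rho["12"])       # orientation with delta = -1
    cond_2g1 = a / a.sum(axis=1, keepdims=True)    # mu(x_2 | x_1)
    cond_1g2 = b / b.sum(axis=0, keepdims=True)    # mu(x_1 | x_2)
    # lambda messages (log-normalizers of the edge terms) enter the singletons.
    lam1 = rho["21"] * np.log(a.sum(axis=1))
    lam2 = rho["12"] * np.log(b.sum(axis=0))
    m1 = np.exp((theta1 + lam1) / rho["1"]); m1 /= m1.sum()
    m2 = np.exp((theta2 + lam2) / rho["2"]); m2 /= m2.sum()
    return m1, m2, cond_1g2, cond_2g1

mu1, mu2, cond_1g2, cond_2g1 = primal_from_dual(beta)
# Gradient of F_D at beta: the discrepancy between the two orientation-based
# estimates of the joint pairwise marginal (sign fixed by the delta convention).
grad = cond_2g1 * mu1[:, None] - cond_1g2 * mu2[None, :]
```

Each entry of the gradient touches only the variables of one edge and its endpoints, which is what makes the O(1) local updates possible.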

To characterize the optimum of DTRW we set the gradient to zero, yielding the following simple dual optimality criterion:

µ_{ij}(x_i | x_j) µ_j(x_j) = µ_{ij}(x_j | x_i) µ_i(x_i).   (11)

Thus at the optimum the two alternative ways of estimating µ_{ij}(x_i, x_j) will yield the same result. Calculating the gradient w.r.t. a given β_{ij}(x_i, x_j) has complexity O(1), and relies only on the β variables for edges containing i or j. Thus the gradient can be calculated locally, and gradient descent algorithms can be implemented efficiently. One drawback of gradient based algorithms is their reliance on line-search modules for finding a step size that decreases the objective. In the next section we consider updates that are parameter-free.

6 Local Marginal Updates

The gradient updates described in the previous section use the difference between two joint distributions. We will now focus on updates relying on the ratio between these distributions. Consider

β^{t+1}_{ij}(x_i, x_j) = β^t_{ij}(x_i, x_j) + ε log [ µ^t_{ij}(x_j | x_i) µ^t_i(x_i) / ( µ^t_{ij}(x_i | x_j) µ^t_j(x_j) ) ]   (12)

where µ^t_{ij}(x_j | x_i) and µ^t_i(x_i) are functions of β as in Eq. (10), and ε is a step size whose value will be discussed in the next section. As a ratio of two expected values, the update is reminiscent of Generalized Iterative Scaling [5]. We shall assume for simplicity that only one edge is updated at each time step t. The update in Eq. (12) is performed on the β variables. An equivalent, and somewhat simpler, update may be derived in terms of the variables µ^t_{ij}(x_i | x_j) and µ^t_j(x_j). The resulting updates and algorithm are described in Figure 1. We call the resulting algorithm TRW-GP (TRW Geometric Programming).

6.1 Convergence Proof

To analyze the convergence of the update in the previous section, we need to consider the resulting change in the objective F_D(β; ρ, θ), namely F_D(β^t; ρ, θ) − F_D(β^{t+1}; ρ, θ). It can be shown (see App. D) that this difference depends only on the µ variables in the TRW-GP algorithm, and thus we denote it by Δ_D(µ^t). Since F_D(β^t; ρ, θ) should be minimized, this difference needs to be non-negative. This is indeed guaranteed by the following lemma (see App. D):

Lemma 6.1: For 0 < ε < min(ρ_i, ρ_j, ρ_{ij}, ρ_{ji}) the dual objective is decreased at every iteration, so that Δ_D(µ^t) ≥ 0 for all t. Furthermore, Δ_D(µ^t) = 0 holds if and only if the optimality condition of Eq. (11) is satisfied.

Any choice of ε smaller than min(ρ_i, ρ_j, ρ_{ij}, ρ_{ji}) will result in a monotone improvement of the objective. In the current implementation we use ε = ½ min(ρ_i, ρ_j, ρ_{ij}, ρ_{ji}). This value turns out to minimize a first order approximation of the improvement in the objective, and was found to work well in practice. Convergence to the global optimum now follows from Lemma 6.1.

Lemma 6.2: The updates in Eq. (12) with ε as in Lemma 6.1 converge to the joint optimum of PTRW and DTRW.

Proof: Denote the mapping from µ^t to µ^{t+1} by R(µ^t) = µ^{t+1}. The mapping is clearly continuous. By Lemma 6.1 the sequence F_D(β^t; ρ, θ) is monotonically decreasing. It is also bounded, since F_D(β; ρ, θ) is bounded, and thus the difference series Δ_D(µ^t) converges to zero. Taking t to infinity then implies that µ^t has a convergent subsequence that converges to some µ*. This µ* will then satisfy F_D(µ*; ρ, θ) = F_D(R(µ*); ρ, θ). We know from Lemma 6.1 that such a point necessarily satisfies the zero gradient condition in Eq. (11), and thus µ* (or, more precisely, the corresponding β) minimizes the dual objective. (Footnote: to carefully account for the possibility that some of the converging marginals would involve zero probabilities, the updates in the primal form, along with the objective, can be written in a form without any ratios.)
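Continuing the hypothetical single-edge sketch above, the parameter-free update of Eq. (12) moves β along the log-ratio of the two joint-marginal estimates, with the step size of Lemma 6.1; iterating it drives the optimality criterion of Eq. (11) toward equality:

```python
eps = 0.5 * min(rho.values())           # Lemma 6.1: eps below the smallest rho

for t in range(500):
    mu1, mu2, cond_1g2, cond_2g1 = primal_from_dual(beta)
    num = cond_1g2 * mu2[None, :]       # joint estimated via one orientation
    den = cond_2g1 * mu1[:, None]       # joint estimated via the other
    beta = beta + eps * np.log(num / den)   # Eq. (12), up to the delta convention

mu1, mu2, cond_1g2, cond_2g1 = primal_from_dual(beta)
gap = np.abs(cond_1g2 * mu2[None, :] - cond_2g1 * mu1[:, None]).max()
print(f"max violation of Eq. (11): {gap:.2e}")   # shrinks toward zero
```

Because the step size is fixed in advance by the ρ's, no line search is needed, which is the practical advantage over plain gradient descent noted above.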
7 Tree Re-parametrization View

The TRW problem can be interpreted in terms of iterating through different re-parametrizations of the distribution p(x; θ) [13]. Here we present a related view of our algorithm. We wish to show that the marginal variables obtained by the algorithm can always be used to recover the original distribution via

p(x; θ) = c_t ∏_i µ^t_i(x_i)^{ρ_i} ∏_{ij∈Ē} µ^t_{ij}(x_i | x_j)^{ρ_{ij}}.   (13)

For t = 0 this is clearly true. We proceed by induction. Assume that at iteration t we have a reparametrization with constant c_t. Substituting the update rule in Figure 1 and using simple algebra shows that we again have a reparametrization, only with

c_{t+1} = c_t e^{F_D(β^{t+1}; ρ, θ) − F_D(β^t; ρ, θ)} = c_t e^{−Δ_D(µ^t)}.

In other words, the multiplicative constant turns out to be related to the improvement in the dual function. This creates an interesting link between reparametrization and minimization, and may be used to study message passing algorithms for which a dual is more difficult to characterize.
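Figure 1 below expresses the algorithm directly on marginals; an equivalent way to implement TRW-GP on a general pairwise model is to sweep the β update of Eq. (12) over the edges, recomputing the local marginals through Eq. (10) as in the single-edge sketch. The following is a sketch under the same assumptions; the function signature, dictionary layout, and orientation convention are ours, not the paper's:

```python
import numpy as np

def trw_gp(theta_n, theta_e, rho_n, rho_d, n_sweeps=200):
    """Dual-form TRW-GP sketch.

    theta_n: {i: 1-D array}        singleton potentials theta_i(x_i)
    theta_e: {(i, j): 2-D array}   edge potentials theta_ij(x_i, x_j), i < j
    rho_n:   {i: float}            root-appearance probabilities
    rho_d:   {(i, j): float}       directed edge probabilities (i -> j)
    """
    beta = {e: np.zeros_like(t) for e, t in theta_e.items()}

    def edge_terms(e):
        # Eq. (10): conditionals of each endpoint given the other, plus the
        # lambda log-normalizers folded into the singleton marginals.
        i, j = e
        a = np.exp((theta_e[e] + beta[e]) / rho_d[(i, j)])   # delta = +1
        b = np.exp((theta_e[e] - beta[e]) / rho_d[(j, i)])   # delta = -1
        cond_j = a / a.sum(axis=1, keepdims=True)            # mu(x_j | x_i)
        cond_i = b / b.sum(axis=0, keepdims=True)            # mu(x_i | x_j)
        lam_i = rho_d[(i, j)] * np.log(a.sum(axis=1))        # message to i
        lam_j = rho_d[(j, i)] * np.log(b.sum(axis=0))        # message to j
        return cond_i, cond_j, lam_i, lam_j

    def singletons():
        s = {v: th.copy() for v, th in theta_n.items()}
        for e in theta_e:
            i, j = e
            _, _, lam_i, lam_j = edge_terms(e)
            s[i] += lam_i
            s[j] += lam_j
        mu = {}
        for v, sv in s.items():
            p = np.exp(sv / rho_n[v] - sv.max() / rho_n[v])  # stabilized
            mu[v] = p / p.sum()
        return mu

    for _ in range(n_sweeps):
        for e in theta_e:                                    # one edge at a time
            i, j = e
            eps = 0.5 * min(rho_n[i], rho_n[j], rho_d[(i, j)], rho_d[(j, i)])
            mu = singletons()
            cond_i, cond_j, _, _ = edge_terms(e)
            num = cond_i * mu[j][None, :]
            den = cond_j * mu[i][:, None]
            beta[e] += eps * np.log(num / den)               # Eq. (12)
    return singletons()
```

For a two-node model this reduces to the single-edge iteration above; for, say, a three-node chain one would pass theta_e = {(0, 1): ..., (1, 2): ...} together with directed-edge and root probabilities derived from a chosen distribution over rooted spanning trees.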

Inputs: A graph G = (V, E), a parameter vector θ on G, root probabilities ρ_i, and directed edge probabilities ρ_{ij}, ρ_{ji} for ij ∈ E.
Initialization: Set µ^0_i(x_i) ∝ e^{ρ_i^{-1} θ_i(x_i)} and µ^0_{ij}(x_i | x_j) ∝ e^{ρ_{ij}^{-1} θ_{ij}(x_i, x_j)}.
Algorithm: Iterate until the change in the marginals is small enough: set ε = ½ min(ρ_i, ρ_j, ρ_{ij}, ρ_{ji}), and update

µ^{t+1}_i(x_i) ∝ µ^t_i(x_i) [ Σ_{x_j} µ^t_{ij}(x_j | x_i) ( µ^t_{ij}(x_i | x_j) µ^t_j(x_j) / ( µ^t_{ij}(x_j | x_i) µ^t_i(x_i) ) )^{ε ρ_{ji}^{-1}} ]^{ρ_{ji} ρ_i^{-1}}

µ^{t+1}_{ij}(x_i | x_j) ∝ µ^t_{ij}(x_i | x_j)^{1 − ε ρ_{ij}^{-1}} ( µ^t_{ij}(x_j | x_i) µ^t_i(x_i) / µ^t_j(x_j) )^{ε ρ_{ij}^{-1}}

(and symmetrically for µ^{t+1}_j and µ^{t+1}_{ij}(x_j | x_i)).
Output: Final values of the marginals.

Figure 1: The TRW-GP algorithm expressed in terms of conditional and singleton marginals.

8 Relation to Previous Work

Heskes [7] recently presented a detailed study of convex free energies. When the entropy term is a positive combination of joint and singleton entropies (and is therefore concave), he provides a local update algorithm that is monotone in the convex dual and converges to the global optimum. He then discusses the application of the same algorithm to the case where the singleton entropies all have negative weight, and the overall entropy is convex over the set of constraints. (The discussion in [7] is in terms of general regions, not just pairs; we present his argument for the simpler pairwise case.) In this case the dual is generally not given in closed form, and it is not known whether the algorithm decreases it at every step. However, Heskes argues that with sufficient damping the algorithm can be shown to converge, although the exact form of damping is not given. Since the TRW entropy can be shown to decompose into positively weighted pairwise entropies and negatively weighted singleton entropies, it satisfies the above condition in Heskes' work.

Our analysis provides several advantages over the algorithm in [7]. First, we derive a closed form solution of the dual. Second, the dual is unconstrained, and thus allows unconstrained minimization methods to be applied. Third, unlike most belief propagation variants, our algorithm is shown to provide a monotone improvement of an objective function (as mentioned above, Heskes presents such an algorithm for positively weighted singleton and pairwise entropies; it is, however, not clear that such entropies are useful in practice), and thus diverges from the standard fixed point analysis used in message passing algorithms. Finally, another algorithm that is guaranteed to converge to a global minimum of convexified free energies is the double loop CCCP algorithm of Yuille [17]. The main disadvantage of CCCP is that each iteration requires solving an optimization problem. This usually results in slower convergence; furthermore, it is not clear what precision is required for the inner loop optimization, and how this affects the convergence guarantees. The algorithm we present here is essentially a single loop method, and is thus easier to analyze.

9 Empirical Demonstration

The original TRW message passing (TRW-MP) algorithm presented in [13] is not generally guaranteed to converge. However, we observed empirically that when damping of α = 0.5 is applied to the log-messages, convergence is always achieved. (This observation is in line with Heskes' argument that sufficiently damped messages will converge for the case of the TRW free energy.) To compare TRW-MP to TRW-GP, we use the pseudomarginals generated by TRW-MP (see Equations (58) and (59) in [13]) as marginals in the primal objective F(µ; ρ, θ) in Eq. (3).

[Figure 2 shows two panels, each plotting primal/dual objective value against iteration for TRW-GP, TRW-MP, and TRW-MP (damped).]

Figure 2: Illustration of the dual message passing algorithm for an Ising model. The TRW-GP curve shows the dual objective value F_D(β; ρ, θ) obtained by the TRW-GP algorithm. The TRW-MP curves show the primal objective values F(µ; ρ, θ) obtained by TRW message passing algorithms. The damped TRW-MP used a damping of 0.5 in the log domain. The MRF parameters were set as follows: α_F = 1, α_I = 9 for the left figure, and α_F = 1, α_I = 1 for the right figure.

This value is not expected to be an upper or lower bound on the optimum of F(µ; ρ, θ), since the TRW-MP pseudomarginals are not guaranteed to be pairwise consistent, except at the optimum. However, since the TRW-MP pseudomarginals converge to the optimal primal marginals, the value F(µ; ρ, θ) will converge to the primal optimum. The progress of TRW-GP may be monitored by evaluating F_D(β; ρ, θ) at every iteration. This value is guaranteed to decrease and converge to the optimum of F_D(β; ρ, θ), which is identical to the optimum of F(µ; ρ, θ). We can thus observe the rate at which the different algorithms converge to their joint optimum.

To study the convergence rate of the two algorithms, we used an Ising model on a grid with interaction parameters θ_{ij} drawn uniformly from [−α_I, α_I] and field parameters θ_i drawn uniformly from [−α_F, α_F]. The MRF is given by

p(x; θ) ∝ e^{Σ_{ij∈E} θ_{ij} x_i x_j + Σ_{i∈V} θ_i x_i}

where x_i ∈ {+1, −1}. We used a uniform distribution over directed spanning trees, calculated as in [11]. Figure 2 (left) shows an example run where the undamped TRW-MP algorithm does not converge, but TRW-GP and the damped TRW-MP do converge, and do so at roughly the same rate. Figure 2 (right) shows an example where both TRW-MP algorithms converge, and do so at a faster rate than TRW-GP. We experimented with various values of α_F and α_I and observed that at lower interaction levels (e.g., α_I ≤ 4 for α_F = 1) the TRW-MP algorithms outperform TRW-GP, whereas at higher interaction levels the undamped TRW-MP does not converge, but the damped version converges at roughly the same rate as TRW-GP. We also experimented with conjugate gradient minimization of F_D(β; ρ, θ), but this did not yield better rates than TRW-GP.
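A hypothetical version of this test setup can be written down compactly; the grid size, seed, and helper names below are our illustrative choices, with the exact log-partition (the quantity the TRW objective upper-bounds) computed by brute-force enumeration, which is feasible only at this toy scale:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, alpha_F, alpha_I = 3, 1.0, 9.0           # n x n grid (3x3 for brevity)
nodes = [(r, c) for r in range(n) for c in range(n)]
edges = [(u, v) for u in nodes for v in nodes
         if (v[0] - u[0], v[1] - u[1]) in [(0, 1), (1, 0)]]
field = {u: rng.uniform(-alpha_F, alpha_F) for u in nodes}   # theta_i
coup = {e: rng.uniform(-alpha_I, alpha_I) for e in edges}    # theta_ij

def log_score(x):
    """Unnormalized log p(x) for spins x[u] in {-1, +1}."""
    return (sum(coup[(u, v)] * x[u] * x[v] for (u, v) in edges)
            + sum(field[u] * x[u] for u in nodes))

# Exact log-partition by enumeration over all 2^9 spin configurations.
logZ = np.logaddexp.reduce([log_score(dict(zip(nodes, s)))
                            for s in itertools.product([-1, 1], repeat=len(nodes))])
```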
10 Conclusions

We presented a novel message passing algorithm whose updates yield a monotone improvement on the dual of the TRW free energy minimization problem. In order to obtain a closed form dual we used two tricks. The first was to decouple different entropies that depend on the same marginals by introducing multiple copies of these marginals. The second was to use uni-directional consistency constraints, so that every copy of a joint marginal appears in a single consistency constraint. Although we presented the method in the context of tree decompositions, the algorithm itself still applies as long as ρ_{ij} and ρ_i are non-negative (although the upper bound on the log partition function may not be guaranteed in this case).

The TRW-GP algorithm resolves the convergence problems of the undamped TRW-MP algorithm. However, we observed empirically that the damped TRW-MP algorithm always converges, and typically at a better rate than TRW-GP. Thus, the main contribution of the current paper is in introducing a dual framework for message passing algorithms, which could be used to analyze existing algorithms, and possibly to develop faster variants in the future.

Free energies may be defined using marginals of more than two variables [13, 16]. In a recent paper [6] we study the relation between such free energies and GP. It will be worthwhile to study generalizations of TRW-GP to this case. Another interesting extension is to the MAP problem, where the corresponding variational problem is a linear program. Global convergence results for MAP message passing algorithms such as max-product are also hard to obtain in the general case. It turns out that an approach similar to the one presented here may be used to obtain convergent algorithms that solve the MAP linear program. These algorithms will be presented elsewhere.

References

[1] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA.
[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge Univ. Press.
[3] M. Chiang. Geometric programming for communication systems. Foundations and Trends in Communications and Information Theory, 2(1):1-154.
[4] M. Chiang and S. Boyd. Geometric programming duals of channel capacity and rate distortion. IEEE Trans. on Information Theory, 50(2).
[5] J. N. Darroch and D. Ratcliff. Generalized iterative scaling for log-linear models. Ann. Math. Statist., 43(5), 1972.

[6] A. Globerson and T. Jaakkola. Approximate inference using conditional entropy decompositions. In AISTATS.
[7] T. Heskes. Convexity arguments for efficient minimization of the Bethe and Kikuchi free energies. Journal of Artificial Intelligence Research, 26.
[8] R. J. Duffin, E. L. Peterson, and C. Zener. Geometric programming. Wiley.
[9] S. Boyd, S. J. Kim, L. Vandenberghe, and A. Hassibi. A tutorial on geometric programming. Optimization and Engineering.
[10] F. Sha and F. Pereira. Shallow parsing with conditional random fields. In Proc. HLT-NAACL.
[11] T. Koo, A. Globerson, X. Carreras, and M. Collins. Structured prediction models via the matrix-tree theorem. In EMNLP.
[12] W. Tutte. Graph Theory. Addison-Wesley.
[13] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. A new class of upper bounds on the log partition function. IEEE Trans. on Information Theory, 51(7).
[14] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Technical report, UC Berkeley Dept. of Statistics.
[15] M. Welling and Y. W. Teh. Belief optimization for binary networks: A stable alternative to loopy belief propagation. In Uncertainty in Artificial Intelligence.
[16] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. on Information Theory, 51(7).
[17] A. L. Yuille. CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation. Neural Computation, 14(7).

A  Deriving the TRW Dual

Our goal is to show that the problems in Eq. (8) and Eq. (9) are convex duals of each other. First, we claim that the convex dual of the PTRW problem in Eq. (8) is given by

DTRWC:  min Σ_{i∈V} ρ_i log Σ_{x_i} e^{ρ_i^{-1}(θ_i(x_i) − Σ_{k∈N(i)} λ_{ki}(x_i))}
s.t.  Σ_{x_j} e^{ρ_{ji}^{-1}(θ_{ij}(x_i, x_j) + δ_{ij} β_{ij}(x_i, x_j) + λ_{ji}(x_i))} ≤ 1.

The variables in the above problem are λ_{ji}(x_i), λ_{ij}(x_j) and β_{ij}(x_i, x_j) for every edge ij ∈ E. The duality between PTRW and DTRWC results from the duality between conditional entropy maximization and geometric programs, and appears in several works in slightly different forms [3, 8]. A derivation of the duality result can be found in [2] (page 256) and in [4]. It is important to note that the dual can be found in this case because the objective is a sum of conditional entropies (and singleton entropies), as in Eq. (6). It is not clear how to derive a dual if the tree entropies are expressed via mutual information as in Eq. (2).

Due to the complementary slackness conditions, an inequality in the constraints of DTRWC will hold with equality at the optimum iff the optimal primal variables satisfy µ_i(x_i) > 0. In App. C we show that for the current objective this always happens, i.e., µ_i(x_i) > 0 for all i and x_i. We thus conclude that all the inequality constraints in DTRWC are satisfied as equalities at the optimum. We therefore lose nothing by replacing them with the equality constraints

Σ_{x_j} e^{ρ_{ji}^{-1}(θ_{ij}(x_i, x_j) + δ_{ij} β_{ij}(x_i, x_j) + λ_{ji}(x_i))} = 1.   (14)

Since each variable λ_{ji}(x_i) appears in only one constraint, we can eliminate it by solving the corresponding equality constraint, which expresses it through the β variables as λ_{ji}(x_i) = −λ_{ji}(x_i; β), with λ_{ji}(x_i; β) as defined in Section 4:

λ_{ji}(x_i; β) = ρ_{ji} log Σ_{x_j} e^{ρ_{ji}^{-1}(θ_{ij}(x_i, x_j) + δ_{ij} β_{ij}(x_i, x_j))}.

Since the λ_{ji}(x_i) variables have been eliminated and the equality constraints are satisfied, the optimization is now only over the β variables, yielding the DTRW problem of Eq. (9):

min_β Σ_{i∈V} ρ_i log Σ_{x_i} e^{ρ_i^{-1}(θ_i(x_i) + Σ_{k∈N(i)} λ_{ki}(x_i; β))}.
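The elimination step can be sanity-checked numerically: with λ chosen as minus the log-normalizer above, the constraint of Eq. (14) holds with equality identically in β. A minimal sketch; the edge dimensions and random values are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
rho_ji, delta = 0.5, 1.0
theta = rng.normal(size=(3, 4))    # theta_ij(x_i, x_j) on a hypothetical edge
beta = rng.normal(size=(3, 4))     # dual variables for that edge

# lambda_{ji}(x_i) = -rho * log sum_{x_j} exp((theta + delta*beta) / rho)
lam = -rho_ji * np.log(np.exp((theta + delta * beta) / rho_ji).sum(axis=1))

# Eq. (14): sum_{x_j} exp((theta + delta*beta + lam(x_i)) / rho) == 1
lhs = np.exp((theta + delta * beta + lam[:, None]) / rho_ji).sum(axis=1)
assert np.allclose(lhs, 1.0)
```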
B  Convexity of the Dual

Here we argue that the function F_D(β; ρ, θ) is a convex function of β. We first define the class of posynomial functions as functions of the form [9]

f(x_1, ..., x_n) = Σ_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} ⋯ x_n^{a_{nk}}   (15)

where c_k > 0. A function f(x_1, ..., x_n) is said to be a generalized posynomial if it is either a posynomial or can be formed from generalized posynomials using the operations of addition, multiplication, positive power, maximum, and composition. A key property of generalized posynomials is that they can be turned into convex functions by a simple change of variables: if f(x) is a generalized posynomial, then F(y) = log f(e^y) is a convex function of y [9].
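A quick numerical illustration of this change of variables on a toy posynomial; the function and test points are arbitrary choices of ours:

```python
import numpy as np

# f(x1, x2) = 2*x1**0.5*x2 + 3*x2**2/x1 is a posynomial (positive
# coefficients, arbitrary real exponents), so F(y) = log f(e^y) is convex.
def F(y):
    x1, x2 = np.exp(y)
    return np.log(2 * x1**0.5 * x2 + 3 * x2**2 / x1)

rng = np.random.default_rng(2)
for _ in range(1000):
    ya, yb = rng.normal(size=2), rng.normal(size=2)
    t = rng.uniform()
    # Convexity along random chords, up to floating-point slack.
    assert F(t * ya + (1 - t) * yb) <= t * F(ya) + (1 - t) * F(yb) + 1e-9
```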

It is easy to see that f_{ji}(x_i; e^β) = e^{λ_{ji}(x_i; β)} is a generalized posynomial in e^β (since it is a positive power of a posynomial). The function

g_i(e^β) = Σ_{x_i} e^{ρ_i^{-1}(θ_i(x_i) + Σ_{k∈N(i)} λ_{ki}(x_i; β))} = Σ_{x_i} e^{ρ_i^{-1} θ_i(x_i)} ∏_{k∈N(i)} f_{ki}(x_i; e^β)^{ρ_i^{-1}}

is then also a generalized posynomial. Therefore log g_i(e^β) is a convex function of β. Since F_D(β; ρ, θ) = Σ_i ρ_i log g_i(e^β), it follows that F_D(β; ρ, θ) is a convex function of β.

C  Strict Positivity of TRW Marginals

Here we want to show that the solution of the primal problem must satisfy µ_i(x_i) > 0. To do so, we employ an alternative formulation of TRW [13]. Assign a parameter vector θ^T to every tree in the set of trees, and denote by Z(θ^T) the partition function of an MRF on tree T with parameters θ^T. TRW can then be cast as

min Σ_T ρ_T log Z(θ^T)   s.t.   Σ_T ρ_T θ^T = θ.   (16)

At the optimum, all tree distributions p(x; θ^T) can be shown to have the same singleton marginals µ_i(x_i), and these correspond to the marginals that solve PTRW. The optimization above can be rewritten as

min Σ_T ρ_T D_KL[p(x; θ) ‖ p(x; θ^T)]   s.t.   Σ_T ρ_T θ^T = θ.   (17)

The objective in Eq. (16) can be obtained from that in Eq. (17) by expanding the D_KL and using the fact that the constraints hold. The two objectives then differ by a constant, log Z(θ).

Assume that at the optimum of PTRW there exist i and x_i such that µ_i(x_i) = 0. The above argument then implies that all tree distributions will have this zero marginal. In that case, there will be an assignment x such that p(x; θ^T) = 0. On the other hand, for any finite θ the true distribution p(x; θ) is strictly greater than zero. The D_KL will then be infinite, implying that the parameters are not optimal, resulting in a contradiction. (We note that the same argument can be applied to show that the optimal pairwise marginals µ_{ij}(x_i, x_j) are never zero.)

D  Monotonicity of the Updates

Assume we perform an update on the µ variables corresponding to an edge ij ∈ E (i.e., we update µ_{ij}(x_i, x_j), µ̄_{ij}(x_i, x_j), µ_i(x_i) and µ_j(x_j)), and that ε < min(ρ_i, ρ_j, ρ_{ij}, ρ_{ji}). The resulting difference in the objective value can be written as Δ_D(µ^t) = f_i + f_j where

f_i = −ρ_i log Σ_{x_i} µ^t_i(x_i) e^{−ρ_i^{-1}(λ^t_{ji}(x_i) − λ^{t+1}_{ji}(x_i))}.   (18)

The difference λ^t_{ji}(x_i) − λ^{t+1}_{ji}(x_i) can be written in terms of µ^t as

λ^t_{ji}(x_i) − λ^{t+1}_{ji}(x_i) = −ρ_{ji} log Σ_{x_j} µ^t_{ij}(x_j | x_i)^{1 − ε ρ_{ji}^{-1}} ( µ^t_{ij}(x_i | x_j) µ^t_j(x_j) / µ^t_i(x_i) )^{ε ρ_{ji}^{-1}}
 = −ρ_{ji} log Σ_{x_j} e^{(1 − ε ρ_{ji}^{-1}) log µ^t_{ij}(x_j | x_i) + ε ρ_{ji}^{-1} log ( µ^t_{ij}(x_i | x_j) µ^t_j(x_j) / µ^t_i(x_i) )}.

Since 0 < ε < ρ_{ji}, we can use the convexity of the log-sum-exp function and the fact that Σ_{x_j} µ^t_{ij}(x_j | x_i) = 1 to obtain

λ^t_{ji}(x_i) − λ^{t+1}_{ji}(x_i) ≥ −ε log Σ_{x_j} µ^t_{ij}(x_i | x_j) µ^t_j(x_j) / µ^t_i(x_i).

Note that since log-sum-exp is strictly convex, equality here is achieved if and only if µ^t_{ij}(x_j | x_i) = µ^t_{ij}(x_i | x_j) µ^t_j(x_j) / µ^t_i(x_i), which implies the optimality condition in Eq. (11) is satisfied. Substituting this in the expression for f_i in Eq. (18) and rearranging, we have

f_i ≥ −ρ_i log Σ_{x_i} µ^t_i(x_i)^{1 − ε/ρ_i} ( Σ_{x_j} µ^t_{ij}(x_i | x_j) µ^t_j(x_j) )^{ε/ρ_i}.

The above expression is of the form −log Σ_{x_i} p(x_i)^{1−η} q(x_i)^η where p(x_i), q(x_i) are distributions over x_i. Define a distribution r(x_i) ∝ p(x_i)^{1−η} q(x_i)^η. Simple algebra then yields

−log Σ_{x_i} p(x_i)^{1−η} q(x_i)^η = (1 − η) D_KL[r ‖ p] + η D_KL[r ‖ q]

where D_KL is the KL divergence, which is non-negative. Here η = ε/ρ_i, and thus 0 < η < 1 and the above weighted sum of the two D_KL divergences is always non-negative. It follows that f_i ≥ 0 with equality if and only if the condition in Eq. (11) is satisfied. A similar argument shows that f_j ≥ 0 with equality iff Eq. (11) is satisfied. The non-negativity of Δ_D(µ^t) then follows immediately.
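The key identity behind the lemma can be verified numerically; the distributions and η below are arbitrary test values:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
eta = 0.3

w = p**(1 - eta) * q**eta
r = w / w.sum()                      # r proportional to p^(1-eta) * q^eta
kl = lambda a, b: np.sum(a * np.log(a / b))

lhs = -np.log(w.sum())
rhs = (1 - eta) * kl(r, p) + eta * kl(r, q)
assert np.isclose(lhs, rhs) and lhs >= 0   # the weighted KL sum, non-negative
```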


Approximate Smallest Enclosing Balls Chapter 5 Approxmate Smallest Enclosng Balls 5. Boundng Volumes A boundng volume for a set S R d s a superset of S wth a smple shape, for example a box, a ball, or an ellpsod. Fgure 5.: Boundng boxes Q(P

More information

Online Classification: Perceptron and Winnow

Online Classification: Perceptron and Winnow E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng

More information

Gaussian Mixture Models

Gaussian Mixture Models Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous

More information

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14 APPROXIMAE PRICES OF BASKE AND ASIAN OPIONS DUPON OLIVIER Prema 14 Contents Introducton 1 1. Framewor 1 1.1. Baset optons 1.. Asan optons. Computng the prce 3. Lower bound 3.1. Closed formula for the prce

More information

Finding Dense Subgraphs in G(n, 1/2)

Finding Dense Subgraphs in G(n, 1/2) Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng

More information

Temperature. Chapter Heat Engine

Temperature. Chapter Heat Engine Chapter 3 Temperature In prevous chapters of these notes we ntroduced the Prncple of Maxmum ntropy as a technque for estmatng probablty dstrbutons consstent wth constrants. In Chapter 9 we dscussed the

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

On a direct solver for linear least squares problems

On a direct solver for linear least squares problems ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

One-sided finite-difference approximations suitable for use with Richardson extrapolation

One-sided finite-difference approximations suitable for use with Richardson extrapolation Journal of Computatonal Physcs 219 (2006) 13 20 Short note One-sded fnte-dfference approxmatons sutable for use wth Rchardson extrapolaton Kumar Rahul, S.N. Bhattacharyya * Department of Mechancal Engneerng,

More information

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud Resource Allocaton wth a Budget Constrant for Computng Independent Tasks n the Cloud Wemng Sh and Bo Hong School of Electrcal and Computer Engneerng Georga Insttute of Technology, USA 2nd IEEE Internatonal

More information

1 Matrix representations of canonical matrices

1 Matrix representations of canonical matrices 1 Matrx representatons of canoncal matrces 2-d rotaton around the orgn: ( ) cos θ sn θ R 0 = sn θ cos θ 3-d rotaton around the x-axs: R x = 1 0 0 0 cos θ sn θ 0 sn θ cos θ 3-d rotaton around the y-axs:

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information