Duality and Auxiliary Functions for Bregman Distances

Size: px

Start display at page:

Download "Duality and Auxiliary Functions for Bregman Distances"

Douglas Lee
5 years ago
Views:

1 Dualty and Auxlary Functons for Bregman Dstances Stephen Della Petra, Vncent Della Petra, and John Lafferty October 8, 2001 CMU-CS School of Computer Scence Carnege Mellon Unversty Pttsburgh, PA Abstract We formulate and prove a convex dualty theorem for Bregman dstances and present a technque based on auxlary functons for dervng and provng convergence of teratve algorthms to mnmze Bregman dstance subject to lnear constrants. Ths research was partally supported by the Advanced Research and Development Actvty n Informaton Technology (ARDA), contract number MDA C-2106, and by the Natonal Scence Foundaton (NSF), grant CCR The vews and conclusons contaned n ths document are those of the authors and should not be nterpreted as representng the offcal polces, ether expressed or mpled, of ARDA, NSF, or the U.S. government.

2 Keywords: Bregman dstance, convex dualty, Legendre functons, auxlary functons

3 I. Introducton Convexty plays a central role n a wde varety of machne learnng and statstcal nference problems. A standard paradgm s to dstngush a preferred member from a set of canddates based upon a convex mpurty measure or loss functon talored to the specfc problem to be solved. Examples nclude least squares regresson, decson trees, boostng, onlne learnng, maxmum lkelhood for exponental models, logstc regresson, maxmum entropy, and support vector machnes. Such problems can often be naturally cast as convex optmzaton problems nvolvng a Bregman dstance, whch can lead to new algorthms, analytcal tools, and nsghts derved from the powerful methods of convex analyss. In ths paper we formulate and prove a convex dualty theorem for mnmzng a general class of Bregman dstances subject to lnear constrants. The dualty result s then used to derve teratve algorthms for solvng the assocated optmzaton problem. Our presentaton s motvated by the recent work of Collns, Schapre, and Snger (2001), who showed how certan boostng algorthms and maxmum lkelhood logstc regresson can be unfed wthn the framework of Bregman dstances. In partcular, specfc nstances of the results gven here are used by Collns et al. (2001) to show the convergence of a famly teratve algorthms for mnmzng the exponental or logstc loss. Whle nvokng methods from convex analyss can unfy and clarfy the relatonshp between dfferent methods, the hgher level of abstracton often comes at a prce, snce there can be consderable techncaltes. For example, n some treatments the assumptons on the convex functons that can be used to defne Bregman dstances are very techncal and dffcult to verfy. Here we trade off generalty for relatve smplcty by workng wth a restrcted class of Bregman dstances, whch however ncludes many of the examples that arse n machne learnng. Our treatment of dualty and auxlary functons for Bregman dstances closely parallels the results presented by Della Petra et al. (1997) for the Kullback-Lebler dvergence. In partcular, the statement and proof of the dualty theorem gven n (Della Petra et al., 1997) carres over wth only a few changes to the class of Bregman dstances we consder. Our approach dffers from much of the lterature n convex analyss n several ways. Frst, we work prmarly wth the argument at whch a convex conjugate takes on ts value, rather than the value of the functon tself. The reason for ths s that the argument corresponds to a statstcal model, whch s the man object of nterest n statstcal or machne learnng applcatons, whle the value corresponds to a lkelhood or loss functon. Second, whle Bregman dstances are typcally defned only on the nteror of the doman of the underlyng convex functon, we assume that there s a contnuous extenson to the entre doman. Ths makes t possble to formulate a very natural dualty theorem that also ncludes many cases requred n practce, when the desred model may le on the boundary of the doman. The followng secton recalls the standard defntons from convex analyss that wll be requred, and presents the techncal assumptons made on the class of Bregman dstances that we work wth. We also ntroduce some new termnology, usng the terms Legendre-Bregman conjugate and Legendre- Bregman projecton to extend the classcal noton of the Legendre conjugate and transform to Bregman dstances. Secton 3 contans the statement and proof of the dualty theorem that connects the prmal problem wth ts dual, showng that the soluton s characterzed n geometrcal terms 1

4 by a Pythagorean equalty. Secton 4 defnes the noton of an auxlary functon, whch s used to construct teratve algorthms for solvng constraned optmzaton problems. Ths secton shows how convexty can be used to derve an auxlary functon for Bregman dstances based on separable functons. The last secton summarzes the man results of the paper. II. Bregman Dstances and Legendre-Bregman Projectons In ths secton we begn by establshng our notaton and recallng the relevant notons from convex analyss that we requre; the classc text (Rockafellar, 1970) remans one of the best references for ths materal. We then defne Bregman dstances and ther assocated conjugate functons and projectons, and derve varous relatons between these that wll be mportant n provng the dualty theorem. Next we state our assumptons on the underlyng convex functon that enable us to derve these propertes for the contnous extenson of the Bregman dstance to the entre doman. A. Notaton and Basc Defntons We wll use notaton that s suggestve of our man applcatons: rather than φ(x) we wll wrte φ(p) or φ(q), havng n mnd probablty dstrbutons p or q. A convex functon φ : S R m [, + ] s proper f there s no q S wth φ(q) = and there s some q wth φ(q). The effectve doman of φ, denoted φ, s the set of ponts where φ s fnte: φ = {q S φ(q) < }; for brevty we usually refer to φ as smply the doman of φ. A proper convex functon s closed f t s lower sem-contnuous. The conjugate φ of φ s gven by φ (v) = sup ( q, v φ(q)) (2.1) q S A proper convex functon φ s sad to be essentally smooth or steep f t s dfferentable on the nteror of ts doman nt( φ ), and f lm n φ(q n ) = + whenever q n s a sequence n nt( φ ) convergng to a pont on the boundary of nt( φ ). The functon φ s sad to be coercve n case the level set {q S φ(q) c} s bounded for every c R. Defnton 2.1. Let φ be a closed, convex and proper functon defned on S R m, such that φ s dfferentable on nt( φ ). The Bregman dstance D φ : φ nt( φ ) [0, ) s defned by D φ (p, q) = φ(p) φ(q) φ(q), p q (2.2) Bregman dstance can be nterpreted as a measure of the convexty of φ. Ths s easy to vsualze n the one-dmensonal case: drawng a tangent lne to the graph of φ at q, the Bregman dstance D φ (p, q) s seen as the vertcal dstance between ths lne and the pont φ(p). Legendre functons are a very well behaved famly of convex functons that wll make workng wth Bregman dstances much easer. Defnton 2.2. A closed convex functon φ s Legendre, or a convex functon of Legendre type, n case nt( φ ) s convex and φ s essentally smooth and strctly convex on nt( φ ). 2

5 The prmary propertes that make workng wth Legendre functons convenent are summarzed n the followng results quoted from (Rockafellar, 1970). Proposton 2.3. (Rockafellar, 1970; Theorem 26.5) If φ s a convex functon of Legendre type then φ : nt( φ ) nt( φ ) s a bjecton, contnuous n both drectons, and φ = ( φ) 1. Proposton 2.4. Suppose that φ s Legendre, and ψ s a proper closed, convex functon that s also essentally smooth. Then φ + ψ s Legendre. In partcular, snce for fxed q nt( φ ) the mappng p φ(q), p q + φ(q) s affne lnear, the functon p D φ (p, q) s Legendre wth doman φ, and wth conjugate doman φ φ(q). Defnton 2.5. For φ a convex functon of Legendre type we defne the Legendre-Bregman conjugate l φ : nt( φ ) R m R { } as l φ (q, v) = sup p φ ( v, p D φ (p, q)) (2.3) We defne the Legendre-Bregman projecton L φ : nt( φ ) R m φ as whenever ths s well defned. L φ (q, v) = arg max p φ ( v, p D φ (p, q)) (2.4) Let us explan our choce of termnology n the above defnton, whch s nonstandard. When h s a convex functon of Legendre type, the Legendre conjugate, as defned n (Rockafellar 1970; Chapter 26), corresponds to the convex conjugate h. For fxed q, our defnton of the Legendre- Bregman conjugate s smply the classcal Legendre conjugate for the convex functon h(p) = D φ (p, q). Note that Rockafellar defnes the Legendre transform as the mappng from the orgnal convex functon (and doman) to ts Legendre conjugate (and assocated doman). The Legendre- Bregman projecton L φ (q, v) s the actual argument at whch the maxmum s attaned. As shown by the followng result, our use of the term projecton accords wth the standard termnology of Bregman projectons. Proposton 2.6. Let φ be Legendre. Then for q nt( φ ) and v nt( φ ) φ(q), the Legendre-Bregman projecton s gven explctly by Moreover, t can be wrtten as a Bregman projecton L φ (q, v) = ( φ ) ( φ(q) + v) (2.5) L φ (q, v) = arg mn D φ (p, q) (2.6) p φ H for the hyperplane H = {p R m p, v = b} wth b = L φ (q, v), v. 3

6 Proof. satsfy: To prove the frst statement, note that a statonary pont p of p, v D φ (p, q) must Snce φ s Legendre, for φ(q) + v nt( φ ), we then have that φ(p ) = φ(q) + v (2.7) p = ( φ) 1 ( φ(q) + v) (2.8) = ( φ )( φ(q) + v) (2.9) usng the fact that ( φ) 1 s well defned and equal to φ from Proposton 2.3. The second statement follows from, for example, the results n Secton 2.2 of Censor and Zenos (1997) on projectons onto hyperplanes, notng that every Legendre functon s zone consstent. In our formulaton of the dualty theorem, t s the Legendre-Bregman projecton L φ (q, v) rather than the conjugate functon l φ (q, v) that plays a central role, leadng to a natural and smple statement of convex dualty for Bregman dstances. Ths projecton corresponds to a statstcal model, whch s the prmary object of nterest for machne learnng problems. B. Basc Relatons We now derve some basc algebrac relatons between D φ, L φ, and l φ. These relatons wll be mportant n establshng the geometrcal aspects of the dualty theorem, as well as for dervng auxlary functons. In order to free us from havng to specfy the doman of l φ and L φ, we wll n the followng assume that φ = R m. Proposton 2.7. Let φ be Legendre, wth φ = R m. For fxed p φ, D φ (p, L φ (q, v)) s contnuous n q nt( φ ) and convex n v. Together, the Legendre-Bregman conjugate and projecton satsfy D φ (p, q) D φ (p, L φ (q, v)) = v, p l φ (q, v) (2.10) for all p φ, q nt( φ ) and v R m. = D(L φ (q, v), q) + v, p L φ (q, v) (2.11) Proof. Usng the defnton of Bregman dstance and Proposton 2.6, we have for q nt( φ ) that D φ (p, q) D φ (p, L φ (q, v)) = φ(l φ (q, v)) φ(q) + φ(l φ (q, v)), p L φ (q, v) φ(q), p q (2.12) = φ(l φ (q, v)) φ(q) + φ(q) + v, p L φ (q, v) φ(q), p q (2.13) = v, p v, L φ (q, v) + φ(l φ (q, v)) φ(q) φ(q), L φ (q, v) q (2.14) = v, p v, L φ (q, v) + D φ (L φ (q, v), q) (2.15) = v, p l φ (q, v) (2.16) 4

7 Therefore (2.10) holds for p φ and q nt( φ ). From the defnton of the Legendre-Bregman conjugate we have for q nt( φ ), that l φ (q, v) = v, L φ (q, v) D φ (L φ (q, v), q) (2.17) Equaton (2.24) results from combnng ths wth (2.10). The convexty of D φ (p, L φ (q, v)) n v follows from (2.10), whch expresses D φ (p, L φ (q, v)) as a sum of the convex functons l φ (q, v) and D φ (p, q) v, p. The next result shows how D φ (p, L φ (q, v)) vares wth v, and wll be an mportant ngredent n the dualty result of the next secton. Proposton 2.8. Let φ be Legendre, wth p φ and q nt( φ ). Then for v R m, the mappng t D φ (p, L φ (q, tv)) s dfferentable at t = 0, wth dervatve d D φ (p, L φ (q, tv)) = v, q v, p. (2.18) dt t=0 Proof. Snce φ s Legendre, f q nt( φ ) then L φ (q, tv) nt( φ ); see for example Theorem 3.12 of (Bauschke & Borwen, 1997). From Proposton 2.7 we have that d dt D φ (p, L φ (q, tv)) = d dt ( tv, L φ (q, tv) tv, p + D φ (L φ (q, tv), q)) (2.19) t=0 t=0 = v, q v, p + d d dt φ(l φ (q, tv)) φ(q), dt L φ (q, tv) (2.20) t=0 t=0 = v, q v, p (2.21) whch proves the result for p φ and q nt( φ ). C. The Contnuous Extenson The results above are gven n terms of the Bregman dstance usng ts standard defnton as a functon on φ nt( φ ). We now make assumptons that allow us to work wth D φ as an extended real-valued functon on φ φ. Ths enables us to formlate a very natural and general dualty result, presented n the followng secton. Informally, we assume that D φ extends contnously from φ nt( φ ) to φ φ, and that L φ extends contnously from nt( φ ) R m to φ R m. In addton, we requre a form of compactness to guarantee the exstence of certan mnmzers. As before, n order to smplfy the presentaton we assume that the range of φ s all of R m. 5

8 Thus, we make the followng assumptons on φ: A1. φ s of Legendre type; A2. φ = R m ; A3. D φ extends to a functon D φ : φ φ [0, ] such that D φ (p, q) s jontly contnuous n p and q, and satsfes D φ (p, q) = 0 f and only f p = q. A4. L φ extends to a functon L φ : φ R m φ such that L φ (q, v) s jontly contnuous n q and v, and satsfes L φ (q, 0) = q. A5. D φ (p, ) s coercve for every p φ \nt( φ ); Note that snce for a Legendre functon φ s contnous on nt( φ ) (Proposton 2.3), t follows from Defnton 2.1 that D φ (p, q) s jontly contnous on φ nt( φ ) and from Proposton 2.6 that L φ (q, v) s jontly contnous on nt( φ ) R m. We also note that snce we assume φ = R m, D φ (p, ) s automatcally coercve for p nt( φ ). Together, condtons A1 A5 mply that φ s a Bregman-Legendre functon as defned by Bauschke and Borwen (1997). Now, from the defnton of the Legendre-Bregman conjugate we have l φ (q, v) = v, L φ (q, v) D φ (L φ (q, v), q) (2.22) for q nt( φ ). Propertes A4 and A5 allow us to defne l φ : φ R m R as the contnuous extenson of l φ : nt( φ ) R m R, satsfyng the same dentty. Thus, the Legendre-Bregman conjugate l φ (q, v) s contnuous n q, contnuous and convex n v, and satsfes l φ (q, 0) = 0. Proposton 2.7 now generalzes to the contnuous extenson. Proposton 2.9. Let φ satsfy A1 A4. For fxed p φ, D φ (p, L φ (q, v)) s contnuous n q and convex n v. Together, the Legendre-Bregman conjugate and projecton satsfy D φ (p, q) D φ (p, L φ (q, v)) = v, p l φ (q, v) (2.23) = D(L φ (q, v), q) + v, p L φ (q, v) (2.24) for all p, q φ and v R m. Proof. Ths follows drectly from the jont contnuty of D φ, L φ, and l φ. The dfferental dentty n Proposton 2.8 also extends. Proposton Let φ satsfy A1 A4, and let p, q φ wth D φ (p, q) <. Then for v R m, the mappng t D φ (p, L φ (q, tv)) s dfferentable at t = 0, wth dervatve d D φ (p, L φ (q, tv)) = v, q v, p. (2.25) dt t=0 6

9 Proof. Let q nt( φ ). From Proposton 2.8 we know that d D φ (p, L φ (q, tv)) = v, q v, p (2.26) dt t=0 Thus, snce L φ (q, (t + s)v) = L φ (L φ (q, tv), sv), we also have that d dt D(p, L φ(q, tv)) = v, L φ (q, tv) v, p (2.27) To show that the result holds when q φ \nt( φ ), we ll use a fact from elementary analyss: f f n f, and f n s contnuous wth f n(t) g(t) unformly for t [a, b], then g s contnous and f (t) = g(t). Frst, let q nt( φ ) and p / nt( φ ). Suppose p n nt( φ ) wth p n p. Snce L φ (q, tv) nt( φ ), we have from the above calculaton that d dt D(p n, L φ (q, tv)) = v, L φ (q, tv) v, p n v, L φ (q, tv) v, p (2.28) where the convergence s unform on every nterval [a, b] around zero; property (2.25) follows. Now suppose that p φ, q φ \nt( φ ), and q n nt( φ ) wth q n q. Because L φ (q, v) s jontly contnous n q and v, t s unformly contnuous on every compact set of (q, v). In partcular, v, L φ (q n, tv) v, p converges unformly n t to v, L φ (q, tv) v, p on every nterval [a, b]. Thus property (2.25) holds for all p, q φ. Proposton 2.9 and 2.10 are the man computatons that we wll requre n the followng secton. III. Dualty In ths secton we derve the man dualty result. The setup s that we have a set of features f (j) R m, j = 1, 2,..., n and denote by F the m n matrx wth columns gven by the f (j). These features correspond to the weak learners n boostng, or to the suffcent statstcs n an exponental model. The prmal problem constrans the values p, f (j), and these constrants carry over to Lagrange multplers n a famly of Legendre-Bregman projectons L(q, Fλ) n the dual problem. Defnton 3.1. For a gven element p 0 φ, the feasble set for p 0 and F s defned by { } P(p 0, F ) = p φ p, f (j) = p 0, f (j), j = 1,... n (3.1) For a gven q 0 φ, the Legendre-Bregman projecton famly for q 0 and F s defned by Q(q 0, F ) = {q φ q = L φ (q 0, Fλ) for some λ R n } (3.2) Trvally, both sets are non-empty snce p 0 P(p 0, F ) and q 0 Q(q 0, F ). Snce p 0, q 0, and F wll be fxed, we wll use abbrevated notaton and refer to these sets as P and Q. We use Q to denote the closure of Q(q 0, F ) as a subset of R m. Dualty relates the projecton onto P to the projecton onto Q. 7

10 Proposton 3.2. Let φ satsfy A1 A5, and suppose that p 0, q 0 φ wth D φ (p 0, q 0 ) <. Then there exsts a unque q φ satsfyng the followng four propertes: (1) q P Q (2) D φ (p, q) = D φ (p, q ) + D φ (q, q) for any p P and q Q (3) q = arg mn p P (4) q = arg mn q Q D φ (p, q 0 ) D φ (p 0, q) Moreover, any one of these four propertes determnes q unquely. In order to prove Proposton 3.2, we frst prove two lemmas. The frst shows that there s at least one member n common between P and Q; the second shows that the Pythagorean equalty (2) holds for any such member. Lemma 3.3. If D φ (p 0, q 0 ) < then P Q s nonempty. Proof. Note that snce D φ (p 0, q 0 ) <, D φ (p 0, q) s not dentcally on Q. Also, the mappng λ D φ (p 0, L φ (q 0, Fλ)) s contnuous and convex. Let R be the level set R = {q φ D φ (p 0, q) D(p 0, q 0 )} (3.3) We know from Assumpton A5 that R s bounded. Thus D φ (p 0, q) attans ts mnmum at a (not necessarly unque) pont q Q R Q. We wll show that q s also n P. Let q Q, and let µ j R n be such that q = lm j L φ (q 0, F µ j ). Then by the contnuty of L φ (, ), L φ (q, Fλ) = lm j L φ (L φ (q 0, F µ j ), Fλ) (3.4) = lm j L φ (q 0, F (µ j + λ)) Q (3.5) Thus Q s closed under the mappng q L φ (q, Fλ) for λ R m, and L φ (q, Fλ) s n Q for any λ. By the defnton of q, t follows that λ = 0 s a mnmum of the functon λ D φ (p 0, L φ (q, Fλ)). Takng dervatves wth respect to λ and usng Proposton 2.10 we conclude that q, f = p 0, f ; thus q P. Lemma 3.4. If q P Q then the Pythagorean equalty D φ (p, q ) = D φ (p, q ) + D φ (q, q ) holds for any p P and q Q. Proof. that Suppose that p 1, p 2, q 1, q 2 φ wth q 2 = L φ (q 1, Fλ). From Proposton 2.9 we have D φ (p 1, q 1 ) D φ (p 1, q 2 ) = p 1, Fλ l φ (p 1, Fλ) (3.6) 8

11 and smlarly Therefore, D φ (p 2, q 1 ) D φ (p 2, q 2 ) = p 2, Fλ l φ (p 2, Fλ) (3.7) D φ (p 1, q 1 ) D φ (p 1, q 2 ) D φ (p 2, q 1 ) + D φ (p 2, q 2 ) = p 1, Fλ p 2, Fλ (3.8) n ) = λ j ( p 1, f (j) p 2, f (j) It follows from ths dentty and the contnuty of D φ that D φ (p 1, q 1 ) D φ (p 1, q 2 ) D φ (p 2, q 1 ) + D φ (p 2, q 2 ) = 0 (3.9) f p 1, p 2 P and q 1, q 2 Q. The lemma follows by takng p 1 = q 1 = q. Proof of Proposton 3.2. Choose q to be any pont n P Q. Such a q exsts by Lemma 3.3. It satsfes property (1) by defnton, and t satsfes property (2) by Lemma 3.4. As a consequence of property (2), t also satsfes propertes (3) and (4). To check property (3), note that f q s any pont n Q, then D φ (p 0, q ) = D φ (p 0, q )+D φ (q, q ) D φ (p 0, q ). Smlarly, property (4) must hold snce f p s any pont n P, then D φ (p, q 0 ) = D φ (p, q ) + D φ (q, q 0 ) D(q, q 0 ). It remans to prove that each of the four propertes (1) (4) determnes q unquely. In other words, we need to show that f m s a pont n φ satsfyng any of the four propertes (1) (4), then m = q. Suppose that m satsfes property (1). Then property (2) wth p = q = m mples that D φ (m, m) = D φ (m, q )+D φ (q, m). Snce D φ (m, m) = 0, t follows that D φ (m, q ) = 0 so m = q. If m satsfes property (2), then the same argument wth q and m reversed proves that m = q. Suppose that m satsfes property (3). Then D φ (p 0, q ) D φ (p 0, m) = D φ (p 0, q ) + D φ (q, m) (3.10) where the second equalty follows from property (2) for q. Thus D φ (q, m) 0 so m = q. If m satsfes property (4), then showng once agan that m = q. D φ (q, q 0 ) D φ (m, q 0 ) = D φ (m, q ) + D φ (q, q 0 ) (3.11) In the followng secton we outlne the auxlary functon method for buldng teratve algorthms to compute q, and show how to use convexty to derve an auxlary functon for separable Bregman dstances. IV. Auxlary Functons The auxlary functon approach s conceptually smple: bound the change n Bregman dstance from below usng a functon that s easy to compute and that decouples the constrants. Maxmzng 9

12 ths auxlary functon we obtan new parameters λ = λ + λ and a new model q λ+ λ gven by q λ+ λ = L φ (q λ, F λ) (4.1) = L φ (L φ (q 0, Fλ), F λ) (4.2) = L φ (q 0, F (λ + λ)) (4.3) We then use the dualty theorem to show that when λ = 0, we must have that q = q. The strategy s very smlar to EM. In an EM algorthm, the Q-functon Q(λ, λ) s computed as a lower bound to the change n log-lkelhood: p 0 (x) log q (x λ ) = h p 0 (x) log ) q (x λ) q (x λ) x x (4.4) = p 0 (x) log q (h x, λ) q (x, h λ ) q (x, h λ) x h (4.5) q (h x, λ) log q (x, h λ ) q (x, h λ) (4.6) h def = Q(λ, λ) (4.7) where (x, h) s the complete data, x s the ncomplete data, and the nequalty follows from the concavty of the logarthm. After computng Q n the E-step, t s then maxmzed over λ n the M-step. In the same way, for Bregman dstances the am s to derve an auxlary functon A(λ, λ) whose calculaton can be carred out effcently n somethng lke an E-step, and such that t can be easly maxmzed over λ n an M-step. However, just as for EM, ths s a general strategy more than t s a precse algorthm. A partcular Bregman dstance problem may requre some ngenuty n order to come up wth an approprate auxlary functon. Ths s the general motvaton behnd the followng two defntons. Defnton 4.1. A functon A : φ R n R s called an auxlary functon for p 0 and F n case 1. A(q, λ) s contnuous n q and A(q, 0) = 0 2. D φ (p 0, q ) D φ (p 0, L φ (q, Fλ)) A(q, λ) 3. If λ = 0 s a maxmum of A(q, λ), then q, f (j) = p 0, f (j) for j = 1,..., n. Defnton 4.2. Let A be an auxlary functon and q 0 φ. The update sequence for q 0 wth respect to A s defned by q (0) = q 0 and q (t+1) def = L φ (q (t), Fλ (t) ) where λ (t) = arg max λ A(q (t), λ) (4.8) The reason for defnng auxlary functons n ths way s the followng result, whch can be proved n a smlar way to Proposton 5 n (Della Petra et al., 1997) or Lemma 2 n (Collns et al., 2001). The compactness assumpton wll n general follow from coercvty n Assumpton A5. 10

13 Proposton 4.3. Suppose that the sequence q (t) les n a compact set. Then lm q (t) = arg mn D φ (p 0, q ) (4.9) t q Q As we now explan, auxlary functons can be convenently constructed by usng the relaton D φ (p, q ) D φ (p, L φ (q, v)) = p, v l φ (q, v) (4.10) from Proposton 2.9 and explotng the convexty of l φ (q, v). For smplcty, we wll assume that φ s separable, so that φ(p) = φ (p ) wth each φ : R R satsfyng propertes A1 A5 (wth m = 1). Auxlary functons n the general case can be derved usng smlar arguments. For the separable case, clearly D φ (p, q ) = l φ (q, v) = D φ (p, q ) (4.11) l φ (q, v ) (4.12) where l φ (q, v) = sup p φ (pv D φ (p, q)) s the Legendre-Bregman conjugate of φ. Proposton 4.4. sgn(f (j) ). Then A(q, λ) For each = 1,..., m, select N so that n (j) f N, and set s j = def = n λ j p 0, f (j) 1 n N f (j) l φ (q, s j N λ j ) (4.13) s an auxlary functon for p 0 and F, and the correspondng update scheme s gven by q (t+1) = L φ (q (t), F λ (t) ) (4.14) where λ (t) j satsfes f (j) L φ (q (t), s j N λ j ) = p 0, f (j) (4.15) The dea of factorng out the sgns s j s taken from Collns et al. (2001). If we take N = 1 for all we obtan the algorthms presented n that paper. If we take N = (j) j f then we obtan algorthms analogous to the IIS algorthm of Della Petra et al. (1997). Proof. We verfy that the functon defned n (4.13) satfes the three propertes of Defnton 4.1. Property (1) of the defnton holds snce l φ (q, 0) = 0. Property (2) follows from the convexty of l φ. In partcular, we have the nequalty l φ (q, F λ) = l φ (q, (F λ) ) (4.16) 11

14 = = l φ (q, n s j F j λ j ) (4.17) n F j l φ (q, s j N λ j ) + 1 N n n F j l φ (q, 0) (4.18) N F j N l φ (q, s j N λ j ) (4.19) whch together wth Proposton 2.9 says that D φ (p, q ) D φ (p, L φ (q, F λ)) n λ j p, f (j) n F j N l φ (q, s j N λ j ) (4.20) Now, usng Propostons 2.9 and 2.10, t can be shown that v l φ (q, v) = L φ (q, v) (4.21) whch shows that (4.15) s the correct update. Therefore, at a maxmum λ of A(q, λ) we have that p 0, f (j) = f (j) L φ (q, s j N λ j ) for each j (4.22) If λ = 0, then p 0, f (j) = f (j) L φ (q, 0) = q, f (j) (4.23) showng that Property (3) holds. Thus (4.13) defnes an auxlary functon. Whle the auxlary functon (4.13) looks a bt messy, both the E-step and M-step for ths type of auxlary functon are generally qute practcal and easy to mplement. V. Concluson Ths paper has presented a convex dualty theorem for constraned optmzaton usng Bregman dstances. The man result, Proposton 3.2, dffers from results presented n the convex analyss lterature n that the Bregman dstance s defned on the entre essental doman, rather than only on the nteror. Ths generalty s needed n many applcatons. Though the assumptons A1 A5 that we make on the underlyng convex functon are farly restrctve, t may well be possble to relax these assumptons to cover a broader class of examples. In partcular, assumpton A2, whch states that the conjugate doman s all of R m, may not be essental n our approach. The auxlary functon technque presented n Secton 4 s a general and practcal method for dervng algorthms for solvng the dual problem. Although the specfc auxlary functon we derve assumes the Bregman dstance s separable, smlar arguments can be used for non-separable 12

15 Bregman dstances. The analyss gven here makes clear the role of convexty, as the bounds are derved usng only the propertes of the underlyng Legendre-Bregman conjugate. Acknowledgments We are grateful to Rob Schapre for encouragng us to wrte ths paper, and for helpful questons and suggestons. We also thank Jonathan Borwen and Henz Bauschke for comments on the paper. References Bauschke, H., & Borwen, J. (1997). Legendre functons and the method of random Bregman projectons. Journal of Convex Analyss, 4, Censor, Y., & Zenos, S. A. (1997). Parallel optmzaton: Theory, algorthms and applcatons. Oxford Unversty Press. Collns, M., Schapre, R., & Snger, Y. (2001). Logstc regresson, AdaBoost, and Bregman dstances. Machne Learnng. In press. Della Petra, S., Della Petra, V., & Lafferty, J. (1997). Inducng features of random felds. IEEE Transactons on Pattern Analyss and Machne Intellgence, 19, Rockafellar, R. T. (1970). Convex analyss. Prnceton Unversty Press. 13

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed