Convex Formulation for Learning from Positive and Unlabeled Data


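A. Proofs

A.1. Proof of Theorem 1

If the composite loss $\tilde{\ell}(z)$ is convex, it is linear.

Proof: The composite loss is an odd function:
\[ \tilde{\ell}(-z) = \ell(-z) - \ell(z) = -\tilde{\ell}(z). \]
Therefore, $\frac{d^2}{dz^2}\tilde{\ell}(z) = -\frac{d^2}{dz^2}\tilde{\ell}(-z)$. If the composite loss $\tilde{\ell}(z)$ is convex, $\frac{d^2}{dz^2}\tilde{\ell}(z) \ge 0$ holds for all $z$. Since the convexity of $\tilde{\ell}(z)$ implies the convexity of $\tilde{\ell}(-z)$, $\frac{d^2}{dz^2}\tilde{\ell}(-z) \ge 0$ should also hold for all $z$. However, if $\frac{d^2}{dz^2}\tilde{\ell}(z) > 0$, then $\frac{d^2}{dz^2}\tilde{\ell}(-z) < 0$ holds, which contradicts the convexity of $\tilde{\ell}(-z)$. Therefore, $\frac{d^2}{dz^2}\tilde{\ell}(z) = 0$ should hold, which is satisfied only when $\tilde{\ell}(z)$ is linear.

A.2. Proof of Lemma 2

$J_S(\alpha)$ is strongly convex in $\alpha$ with parameter at least $\lambda$, and thus
\[ J_S(\alpha) \ge J_S(\alpha^*_S) + \nabla J_S(\alpha^*_S)^\top(\alpha - \alpha^*_S) + \frac{\lambda}{2}\|\alpha - \alpha^*_S\|_2^2 \ge J_S(\alpha^*_S) + \frac{\lambda}{2}\|\alpha - \alpha^*_S\|_2^2, \]
where we use the optimality condition $\nabla J_S(\alpha^*_S) = 0$. Similarly, we can prove the other two inequalities.

A.3. Proof of Lemma 3

The difference function can be written as
\[ J_S(\alpha, u) - J_S(\alpha) = \frac{1}{4}\alpha^\top U_1 \alpha + \frac{1}{2}u_2^\top\alpha - \pi u_3^\top\alpha, \]
with a partial gradient
\[ \nabla_\alpha\big(J_S(\alpha, u) - J_S(\alpha)\big) = \frac{1}{2}U_1\alpha + \frac{1}{2}u_2 - \pi u_3. \]
Given the $\delta$-ball of $\alpha^*_S$, i.e., $B_\delta(\alpha^*_S) = \{\alpha \mid \|\alpha - \alpha^*_S\|_2 \le \delta\}$, it is easy to see that for any $\alpha \in B_\delta(\alpha^*_S)$,
\[ \|\alpha\|_2 \le \|\alpha - \alpha^*_S\|_2 + \|\alpha^*_S\|_2 \le \delta + M_\alpha, \]
and then
\[ \big\|\nabla_\alpha\big(J_S(\alpha, u) - J_S(\alpha)\big)\big\|_2 \le \frac{1}{2}(\delta + M_\alpha)\|U_1\|_{\mathrm{Fro}} + \frac{1}{2}\|u_2\|_2 + \pi\|u_3\|_2. \]
This means that $J_S(\cdot, u) - J_S(\cdot)$ is Lipschitz continuous on $B_\delta(\alpha^*_S)$ with a Lipschitz constant of order $O(\|U_1\|_{\mathrm{Fro}} + \|u_2\|_2 + \|u_3\|_2)$.

A.4. Proof of Lemma 5

The difference function can be written as
\[ J_{\mathrm{LL}}(\alpha, u) - J_{\mathrm{LL}}(\alpha) = -\pi u_3^\top\alpha + u_4(\alpha). \]
Given $\alpha \in B_\delta(\alpha^*_{\mathrm{LL}})$, we already know from the proof of Lemma 3 that $-\pi u_3^\top\alpha$ is Lipschitz continuous with a Lipschitz constant of order $O(\|u_3\|_2)$. Consequently, $J_{\mathrm{LL}}(\cdot, u) - J_{\mathrm{LL}}(\cdot)$ is Lipschitz continuous on $B_\delta(\alpha^*_{\mathrm{LL}})$ with a Lipschitz constant of order $O(\|u_3\|_2 + \mathrm{Lip}(u_4))$.

A.5. Proof of Lemma 7

Same as the proof of Lemma 5.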
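As a numerical illustration of Theorem 1 (Sec. A.1), the sketch below evaluates the composite loss on a grid. It assumes the composite loss is $\ell(z) - \ell(-z)$, as the oddness identity in the proof suggests, the logistic loss $\ln(1+\exp(-z))$, and the double hinge loss $\max(-z, 0, \frac{1}{2} - \frac{1}{2}z)$. The plain hinge $\max(0, 1-z)$ is included only as a hypothetical comparison, and all function names are illustrative.

```python
import math

def logistic(z):
    # Logistic loss ln(1 + exp(-z)).
    return math.log1p(math.exp(-z))

def double_hinge(z):
    # Double hinge loss max(-z, max(0, 1/2 - z/2)) (assumed form).
    return max(-z, 0.0, 0.5 - 0.5 * z)

def hinge(z):
    # Plain hinge loss, used here only as a comparison (assumption).
    return max(0.0, 1.0 - z)

def composite(loss, z):
    # Composite loss l(z) - l(-z); odd in z by construction.
    return loss(z) - loss(-z)

grid = [k / 4.0 for k in range(-12, 13)]

# Logistic and double hinge: the composite loss equals -z, i.e. it is linear.
logistic_gap = max(abs(composite(logistic, z) + z) for z in grid)
double_hinge_gap = max(abs(composite(double_hinge, z) + z) for z in grid)

# Plain hinge: the composite loss is not linear, and it violates the midpoint
# convexity inequality between z = -3 and z = 1 (midpoint z = -1).
hinge_mid = composite(hinge, -1.0)
hinge_avg = 0.5 * (composite(hinge, -3.0) + composite(hinge, 1.0))
```

In this sketch the two losses whose composite is convex are exactly the ones whose composite is linear, while the plain hinge gives a non-convex composite, which is consistent with the statement of the theorem.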

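A.6. Proof of Theorem 4

Let $U_1$, $u_2$ and $u_3$ be defined as in Eq. 3. According to the central limit theorem,
\[ \|U_1\|_{\mathrm{Fro}} = O_p(1/\sqrt{n'}), \quad \|u_2\|_2 = O_p(1/\sqrt{n'}), \quad \|u_3\|_2 = O_p(1/\sqrt{n}), \]
as $n, n' \to \infty$. Thus, we have
\[ \|\hat{\alpha}_S - \alpha^*_S\|_2 \le 2\lambda^{-1}\omega(u) = O(\|U_1\|_{\mathrm{Fro}} + \|u_2\|_2 + \|u_3\|_2) = O_p(1/\sqrt{n} + 1/\sqrt{n'}) \]
by Lemma 2, Lemma 3, and Proposition 6.1 in Bonnans & Shapiro (1998, p. 9). On the other hand,
\[ |\hat{J}_S(\hat{\alpha}_S) - J_S(\alpha^*_S)| \le |\hat{J}_S(\hat{\alpha}_S) - \hat{J}_S(\alpha^*_S)| + |\hat{J}_S(\alpha^*_S) - J_S(\alpha^*_S)|, \]
in which
\[ \hat{J}_S(\hat{\alpha}_S) - \hat{J}_S(\alpha^*_S) = (\hat{\alpha}_S + \alpha^*_S)^\top\Big(\frac{1}{4n'}\sum_{i=1}^{n'}\varphi(x'_i)\varphi(x'_i)^\top + \frac{\lambda}{2}I_m\Big)(\hat{\alpha}_S - \alpha^*_S) + \frac{1}{2n'}\sum_{i=1}^{n'}\varphi(x'_i)^\top(\hat{\alpha}_S - \alpha^*_S) - \frac{\pi}{n}\sum_{i=1}^{n}\varphi(x_i)^\top(\hat{\alpha}_S - \alpha^*_S), \]
\[ \hat{J}_S(\alpha^*_S) - J_S(\alpha^*_S) = \frac{1}{4}\alpha_S^{*\top}U_1\alpha^*_S + \frac{1}{2}u_2^\top\alpha^*_S - \pi u_3^\top\alpha^*_S. \]
Since $0 \le \varphi_j(x) \le 1$, $\|\hat{\alpha}_S\|_2 \le M_\alpha$ and $\|\alpha^*_S\|_2 \le M_\alpha$,
\[ |\hat{J}_S(\hat{\alpha}_S) - J_S(\alpha^*_S)| \le O_p(\|\hat{\alpha}_S - \alpha^*_S\|_2) + O_p(\|U_1\|_{\mathrm{Fro}} + \|u_2\|_2 + \|u_3\|_2) = O_p(1/\sqrt{n} + 1/\sqrt{n'}), \]
which completes the proof.

A.7. Proof of Theorem 6

Let $u_3$ and $u_4(\alpha)$ be defined as in Eq. 4. The gradient of $u_4$ is given by
\[ \nabla u_4(\alpha) = \frac{1}{n'}\sum_{i=1}^{n'}\frac{\varphi(x'_i)}{1+\exp(-\varphi(x'_i)^\top\alpha)} - \int\frac{\varphi(x)}{1+\exp(-\varphi(x)^\top\alpha)}\,p(x)\,dx. \]
According to the central limit theorem,
\[ \|u_3\|_2 = O_p(1/\sqrt{n}), \quad \mathrm{Lip}(u_4) = O_p(1/\sqrt{n'}), \]
as $n, n' \to \infty$, since $\mathrm{Lip}(u_4) = \sup_\alpha\|\nabla u_4(\alpha)\|_2$ and
\[ \sup_{\alpha\in\mathbb{R}^m,\,x\in\mathbb{R}^d}\Big\|\frac{\varphi(x)}{1+\exp(-\varphi(x)^\top\alpha)}\Big\|_2 \le m^{1/2} < \infty. \]
Thus, we have
\[ \|\hat{\alpha}_{\mathrm{LL}} - \alpha^*_{\mathrm{LL}}\|_2 \le 2\lambda^{-1}\omega(u) = O(\|u_3\|_2 + \mathrm{Lip}(u_4)) = O_p(1/\sqrt{n} + 1/\sqrt{n'}) \]
by Lemma 2, Lemma 5, and Proposition 6.1 in Bonnans & Shapiro (1998, p. 9). On the other hand,
\[ |\hat{J}_{\mathrm{LL}}(\hat{\alpha}_{\mathrm{LL}}) - J_{\mathrm{LL}}(\alpha^*_{\mathrm{LL}})| \le |\hat{J}_{\mathrm{LL}}(\hat{\alpha}_{\mathrm{LL}}) - \hat{J}_{\mathrm{LL}}(\alpha^*_{\mathrm{LL}})| + |\hat{J}_{\mathrm{LL}}(\alpha^*_{\mathrm{LL}}) - J_{\mathrm{LL}}(\alpha^*_{\mathrm{LL}})|. \]
For the second term,
\[ |\hat{J}_{\mathrm{LL}}(\alpha^*_{\mathrm{LL}}) - J_{\mathrm{LL}}(\alpha^*_{\mathrm{LL}})| = |-\pi u_3^\top\alpha^*_{\mathrm{LL}} + u_4(\alpha^*_{\mathrm{LL}})| \le \pi M_\alpha\|u_3\|_2 + |u_4(\alpha^*_{\mathrm{LL}})| = O_p(1/\sqrt{n} + 1/\sqrt{n'}) \]
according to the central limit theorem. The first term is a bit more complex:
\[ |\hat{J}_{\mathrm{LL}}(\hat{\alpha}_{\mathrm{LL}}) - \hat{J}_{\mathrm{LL}}(\alpha^*_{\mathrm{LL}})| \le \frac{\lambda}{2}\big|(\hat{\alpha}_{\mathrm{LL}} + \alpha^*_{\mathrm{LL}})^\top(\hat{\alpha}_{\mathrm{LL}} - \alpha^*_{\mathrm{LL}})\big| + \frac{\pi}{n}\sum_{i=1}^{n}\big|\varphi(x_i)^\top(\hat{\alpha}_{\mathrm{LL}} - \alpha^*_{\mathrm{LL}})\big| + \frac{1}{n'}\sum_{i=1}^{n'}\big|\ln(1+\exp(\varphi(x'_i)^\top\hat{\alpha}_{\mathrm{LL}})) - \ln(1+\exp(\varphi(x'_i)^\top\alpha^*_{\mathrm{LL}}))\big|. \]
Let $f(z,t) = \ln(1+\exp(z+t))$, then $\lim_{t\to 0}f(z,t) = f(z,0)$ and
\[ \lim_{t\to 0}\frac{f(z,t) - f(z,0)}{t} = \lim_{t\to 0}\frac{\partial}{\partial t}f(z,t) = \lim_{t\to 0}\frac{1}{1+\exp(-z-t)} < \infty, \]
where we use L'Hôpital's rule. In other words, $f(z,t)$ approaches $f(z,0)$ in $O(t)$ as $t \to 0$. Subsequently, for any $x \in \mathbb{R}^d$, by $z = \varphi(x)^\top\alpha^*_{\mathrm{LL}}$ and $t = \varphi(x)^\top\hat{\alpha}_{\mathrm{LL}} - \varphi(x)^\top\alpha^*_{\mathrm{LL}}$ we can obtain
\[ \big|\ln(1+\exp(\varphi(x)^\top\hat{\alpha}_{\mathrm{LL}})) - \ln(1+\exp(\varphi(x)^\top\alpha^*_{\mathrm{LL}}))\big| = O(|\varphi(x)^\top\hat{\alpha}_{\mathrm{LL}} - \varphi(x)^\top\alpha^*_{\mathrm{LL}}|) = O(m^{1/2}\|\hat{\alpha}_{\mathrm{LL}} - \alpha^*_{\mathrm{LL}}\|_2), \]
which results in $|\hat{J}_{\mathrm{LL}}(\hat{\alpha}_{\mathrm{LL}}) - \hat{J}_{\mathrm{LL}}(\alpha^*_{\mathrm{LL}})| = O_p(1/\sqrt{n} + 1/\sqrt{n'})$.

A.8. Proof of Theorem 8

The proof goes along the same line as that of Theorem 6. Let $u_3$ and $u_5(\alpha)$ be defined as in Eq. 5. Note that the function $\max\{0, (1+z)/2, z\}$ is piecewise linear in $z$, differentiable almost everywhere, and
\[ 0 \le \frac{d}{dz}\max\{0, (1+z)/2, z\} \le 1. \]
As a result,
\[ \|u_3\|_2 = O_p(1/\sqrt{n}), \quad \mathrm{Lip}(u_5) = O_p(1/\sqrt{n'}), \]
as $n, n' \to \infty$, and
\[ \|\hat{\alpha}_{\mathrm{DH}} - \alpha^*_{\mathrm{DH}}\|_2 \le 2\lambda^{-1}\omega(u) = O(\|u_3\|_2 + \mathrm{Lip}(u_5)) = O_p(1/\sqrt{n} + 1/\sqrt{n'}) \]
by Lemma 2, Lemma 7, and Proposition 6.1 in Bonnans & Shapiro (1998, p. 9). On the other hand,
\[ |\hat{J}_{\mathrm{DH}}(\hat{\alpha}_{\mathrm{DH}}) - J_{\mathrm{DH}}(\alpha^*_{\mathrm{DH}})| \le |\hat{J}_{\mathrm{DH}}(\hat{\alpha}_{\mathrm{DH}}) - \hat{J}_{\mathrm{DH}}(\alpha^*_{\mathrm{DH}})| + |\hat{J}_{\mathrm{DH}}(\alpha^*_{\mathrm{DH}}) - J_{\mathrm{DH}}(\alpha^*_{\mathrm{DH}})| \]
\[ \le \frac{1}{n'}\sum_{i=1}^{n'}\big|\max\{0, (1+\varphi(x'_i)^\top\hat{\alpha}_{\mathrm{DH}})/2, \varphi(x'_i)^\top\hat{\alpha}_{\mathrm{DH}}\} - \max\{0, (1+\varphi(x'_i)^\top\alpha^*_{\mathrm{DH}})/2, \varphi(x'_i)^\top\alpha^*_{\mathrm{DH}}\}\big| + O_p(1/\sqrt{n} + 1/\sqrt{n'}). \]
Let $f(z,t) = \max\{0, (1+z+t)/2, z+t\}$, then $\lim_{t\to 0}f(z,t) = f(z,0)$ and, for $z \in \mathbb{R}\setminus\{-1, 1\}$,
\[ \lim_{t\to 0}\frac{f(z,t) - f(z,0)}{t} = \lim_{t\to 0}\frac{\partial}{\partial t}f(z,t) \in \Big\{0, \frac{1}{2}, 1\Big\}. \]
In other words, $f(z,t)$ approaches $f(z,0)$ in $O(t)$ as $t \to 0$ almost surely. Subsequently, for any $x \in \mathbb{R}^d$, by $z = \varphi(x)^\top\alpha^*_{\mathrm{DH}}$ and $t = \varphi(x)^\top\hat{\alpha}_{\mathrm{DH}} - \varphi(x)^\top\alpha^*_{\mathrm{DH}}$ we can obtain
\[ \big|\max\{0, (1+\varphi(x)^\top\hat{\alpha}_{\mathrm{DH}})/2, \varphi(x)^\top\hat{\alpha}_{\mathrm{DH}}\} - \max\{0, (1+\varphi(x)^\top\alpha^*_{\mathrm{DH}})/2, \varphi(x)^\top\alpha^*_{\mathrm{DH}}\}\big| = O(|\varphi(x)^\top\hat{\alpha}_{\mathrm{DH}} - \varphi(x)^\top\alpha^*_{\mathrm{DH}}|) = O(m^{1/2}\|\hat{\alpha}_{\mathrm{DH}} - \alpha^*_{\mathrm{DH}}\|_2) = O_p(1/\sqrt{n} + 1/\sqrt{n'}), \]
which completes the proof.

B. Optimization problems

In this section, we give the exact optimization problems for the optimization methods presented in the paper. The logistic regression and logistic loss methods are solved with a quasi-Newton method, and therefore we provide the derivatives in Sec. B.1. The hinge loss and double hinge loss result in quadratic problems. The ramp-loss is solved via a sequence of quadratic problems. All quadratic problems are expressed in the form
\[ \min_\alpha\ \frac{1}{2}\alpha^\top H\alpha + f^\top\alpha \quad \text{s.t.}\quad L\alpha \le k,\quad l \le \alpha. \]
This standard form can then be plugged directly into an off-the-shelf optimization package such as Gurobi, IBM CPLEX or MATLAB's internal quadprog function.

B.1. Logistic loss

The gradient for the objective function in Eq. 8 is
\[ \frac{\partial\hat{J}_{\mathrm{LL}}(\alpha, b)}{\partial\alpha} = -\frac{\pi}{n}\Phi_P^\top 1_n + \lambda\alpha - \frac{1}{n'}\sum_{j=1}^{n'}\ell'_{\mathrm{LL}}(-\alpha^\top\varphi(x'_j) - b)\,\varphi(x'_j), \]
where $\ell'_{\mathrm{LL}}(z)$ is the derivative of $\ell_{\mathrm{LL}}(z)$:
\[ \ell'_{\mathrm{LL}}(z) = -\frac{\exp(-z)}{1+\exp(-z)}. \]
The derivative with respect to the unregularized constant $b$ is
\[ \frac{\partial\hat{J}_{\mathrm{LL}}(\alpha, b)}{\partial b} = -\pi - \frac{1}{n'}\sum_{j=1}^{n'}\ell'_{\mathrm{LL}}(-\alpha^\top\varphi(x'_j) - b). \]

B.2. Double Hinge Loss - PU Learning

The objective function can be expressed as
\[ -\frac{\pi}{n}\sum_{i=1}^{n}g(x_i) + \frac{1}{n'}\sum_{j=1}^{n'}\max\Big(0, \max\Big(g(x'_j), \frac{1}{2} + \frac{1}{2}g(x'_j)\Big)\Big) + \frac{\lambda}{2}\|g\|^2 \]
\[ = -\frac{\pi}{n}\sum_{i=1}^{n}\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\Big) + \frac{1}{n'}\sum_{j=1}^{n'}\max\Big(0, \max\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x'_j) + b,\ \frac{1}{2} + \frac{1}{2}\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x'_j) + b\Big)\Big)\Big) + \frac{\lambda}{2}\alpha^\top\alpha. \]
The objective function can then be expressed as
\[ \min_{\alpha, b, \xi}\ -\frac{\pi}{n}1^\top\Phi_P\alpha - \pi b + \frac{1}{n'}1^\top\xi + \frac{\lambda}{2}\alpha^\top\alpha \quad \text{s.t.}\quad \xi \ge 0,\quad \xi \ge \frac{1}{2}1 + \frac{1}{2}\Phi_U\alpha + \frac{1}{2}b1,\quad \xi \ge \Phi_U\alpha + b1. \]
Let
\[ \gamma = \begin{bmatrix}\alpha^\top & b & \xi^\top\end{bmatrix}^\top. \]
Then $H$ is defined as
\[ H = \begin{bmatrix}\lambda I_m & 0_{m\times 1} & O_{m\times n'} \\ 0_{1\times m} & 0 & 0_{1\times n'} \\ O_{n'\times m} & 0_{n'\times 1} & O_{n'\times n'}\end{bmatrix}, \]
where $O_{n'\times m}$ is a zero matrix of $n'$ rows and $m$ columns. The linear part of the objective is
\[ f = \begin{bmatrix}-\frac{\pi}{n}\Phi_P^\top 1_n \\ -\pi \\ \frac{1}{n'}1_{n'}\end{bmatrix}. \]
The lower bound is
\[ l = \begin{bmatrix}-\infty\cdot 1_m \\ -\infty \\ 0_{n'}\end{bmatrix}. \]
The first linear constraint is
\[ \xi \ge \frac{1}{2}1 + \frac{1}{2}\Phi_U\alpha + \frac{1}{2}b1 \iff \frac{1}{2}\Phi_U\alpha + \frac{1}{2}b1 - \xi \le -\frac{1}{2}1 \iff \begin{bmatrix}\frac{1}{2}\Phi_U & \frac{1}{2}1 & -I_{n'}\end{bmatrix}\begin{bmatrix}\alpha \\ b \\ \xi\end{bmatrix} \le -\frac{1}{2}1. \]
The second linear constraint is
\[ \xi \ge \Phi_U\alpha + b1 \iff \Phi_U\alpha + b1 - \xi \le 0 \iff \begin{bmatrix}\Phi_U & 1 & -I_{n'}\end{bmatrix}\begin{bmatrix}\alpha \\ b \\ \xi\end{bmatrix} \le 0. \]
Combining the two sets of inequalities, we get
\[ L = \begin{bmatrix}\frac{1}{2}\Phi_U & \frac{1}{2}1 & -I_{n'} \\ \Phi_U & 1 & -I_{n'}\end{bmatrix} \quad\text{and}\quad k = \begin{bmatrix}-\frac{1}{2}1_{n'} \\ 0_{n'}\end{bmatrix}. \]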
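The double hinge QP of Sec. B.2 can be sanity-checked without a solver. The sketch below is a minimal illustration, assuming the objective $-\frac{\pi}{n}1^\top\Phi_P\alpha - \pi b + \frac{1}{n'}1^\top\xi + \frac{\lambda}{2}\alpha^\top\alpha$ with constraints $\xi \ge 0$, $\xi \ge \frac{1}{2}1 + \frac{1}{2}\Phi_U\alpha + \frac{1}{2}b1$ and $\xi \ge \Phi_U\alpha + b1$. It builds $H$, $f$, $L$, $k$ and the lower bound as nested lists, then checks that with the slack variables set to their tightest feasible values the QP value equals the directly evaluated objective. The function names and the toy data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def direct_objective(Phi_P, Phi_U, alpha, b, pi, lam):
    # Double hinge PU objective evaluated without slack variables.
    n, n_u = len(Phi_P), len(Phi_U)
    pos = sum(dot(row, alpha) + b for row in Phi_P)
    unl = sum(max(0.0, dot(row, alpha) + b, 0.5 + 0.5 * (dot(row, alpha) + b))
              for row in Phi_U)
    return -pi / n * pos + unl / n_u + 0.5 * lam * dot(alpha, alpha)

def build_qp(Phi_P, Phi_U, pi, lam):
    # Variable ordering: gamma = [alpha (m entries), b, xi (n_u entries)].
    n, n_u, m = len(Phi_P), len(Phi_U), len(Phi_P[0])
    d = m + 1 + n_u
    H = [[0.0] * d for _ in range(d)]
    for l in range(m):
        H[l][l] = lam                      # only alpha is regularized
    f = [-pi / n * sum(row[l] for row in Phi_P) for l in range(m)]
    f += [-pi] + [1.0 / n_u] * n_u
    L, k = [], []
    for j in range(n_u):                   # 0.5*g_j - xi_j <= -0.5
        e = [0.0] * n_u
        e[j] = -1.0
        L.append([0.5 * v for v in Phi_U[j]] + [0.5] + e)
        k.append(-0.5)
    for j in range(n_u):                   # g_j - xi_j <= 0
        e = [0.0] * n_u
        e[j] = -1.0
        L.append(list(Phi_U[j]) + [1.0] + e)
        k.append(0.0)
    lower = [float("-inf")] * (m + 1) + [0.0] * n_u
    return H, f, L, k, lower

def qp_value(H, f, gamma):
    return 0.5 * dot(gamma, [dot(row, gamma) for row in H]) + dot(f, gamma)

# Toy data (hypothetical): 2 positive and 3 unlabeled samples, m = 2 basis functions.
Phi_P = [[1.0, 0.5], [0.2, 0.9]]
Phi_U = [[0.3, 0.4], [0.8, 0.1], [0.6, 0.6]]
alpha, b, pi, lam = [0.7, -1.2], 0.1, 0.4, 0.5

H, f, L, k, lower = build_qp(Phi_P, Phi_U, pi, lam)
g_U = [dot(row, alpha) + b for row in Phi_U]
xi = [max(0.0, g, 0.5 + 0.5 * g) for g in g_U]   # tightest feasible slacks
gamma = alpha + [b] + xi

qp_obj = qp_value(H, f, gamma)
direct_obj = direct_objective(Phi_P, Phi_U, alpha, b, pi, lam)
feasible = (all(dot(row, gamma) <= kk + 1e-12 for row, kk in zip(L, k))
            and all(g >= lo for g, lo in zip(gamma, lower)))
```

The same `H, f, L, k, lower` lists correspond to the arguments a generic QP solver expects in the standard form of Sec. B, after conversion to the solver's own matrix type.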

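B.3. Weighted hinge loss classifier

We want a cost-sensitive classifier with a per-sample weighting. We use the model
\[ g(x) = \sum_{l=1}^{m}\alpha_l\varphi_l(x) + b, \]
with kernel basis functions $\varphi_l(x) = k(x, c_l)$ centered at $\{c_1, \dots, c_m\} := \{x_1, \dots, x_n\}$, and we wish to minimize
\[ J(g) = \sum_{i=1}^{n}w_i\,\ell_H\Big(y_i\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\Big)\Big) + \frac{\lambda}{2}\alpha^\top\alpha = \sum_{i=1}^{n}w_i\max\Big(0, 1 - y_i\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\Big)\Big) + \frac{\lambda}{2}\alpha^\top\alpha. \]
This gives a QP of
\[ \min_{\alpha, b, \xi}\ w^\top\xi + \frac{\lambda}{2}\alpha^\top R\alpha \quad\text{s.t.}\quad \xi_i \ge 0,\quad \xi_i \ge 1 - y_i\Big(b + \sum_{l=1}^{m}\alpha_l k(x_i, c_l)\Big),\quad i = 1, \dots, n, \]
where $R = I_m$ matches the $\lambda I_m$ block of $H$ given next. We then set
\[ \gamma = \begin{bmatrix}\alpha^\top & b & \xi^\top\end{bmatrix}^\top. \]
$H$ is then
\[ H = \begin{bmatrix}\lambda I_m & 0_{m\times 1} & O_{m\times n} \\ 0_{1\times m} & 0 & 0_{1\times n} \\ O_{n\times m} & 0_{n\times 1} & O_{n\times n}\end{bmatrix}. \]
The linear term is
\[ f = \begin{bmatrix}0_m \\ 0 \\ w\end{bmatrix}. \]
The lower bound is
\[ l = \begin{bmatrix}-\infty\cdot 1_m \\ -\infty \\ 0_n\end{bmatrix}. \]
Define $\tilde{\Phi}$ as
\[ \tilde{\Phi}_{il} = y_i\varphi_l(x_i), \quad i = 1, \dots, n. \]
The constraint can be written in matrix form as
\[ \xi \ge 1 - \tilde{\Phi}\alpha - by \iff -\tilde{\Phi}\alpha - by - \xi \le -1. \]
The matrix is then
\[ L = \begin{bmatrix}-\tilde{\Phi} & -y & -I_n\end{bmatrix}, \]
and $k$ is
\[ k = -1_n. \]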

Figure 6. Decomposition of the ramp-loss into convex and concave parts.

B.4. Weighted ramp-loss classifier (CCCP)

Classification with the ramp-loss is difficult due to the non-convexity of the loss function. One popular method to perform optimization is to split the non-convex function into a convex and a concave part. The concave part is then upper-bounded by a linear function, and optimization is iteratively performed: minimization of the upper bound, and tightening of the upper bound around the new minimum. We minimize the ramp-loss problem here using this approach. This is a straightforward application of the concave-convex procedure (CCCP) in Yuille & Rangarajan (2002) and is essentially the same as Collobert et al.

We wish to minimize the following non-convex objective function:
\[ J(\alpha, b) = \sum_{i=1}^{n}w_i\,\ell_R\Big(y_i\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\Big)\Big) + \frac{\lambda}{2}\alpha^\top\alpha, \qquad (6) \]
where the ramp loss $\ell_R(z)$ is defined as
\[ \ell_R(z) = \max\Big(0, \min\Big(1, \frac{1}{2} - \frac{1}{2}z\Big)\Big) = \frac{1}{2}\max(0, \min(2, 1 - z)). \]
By defining the following slightly more general hinge loss
\[ H_\epsilon(z) = \max(0, \epsilon - z), \]
the ramp loss $\ell_R(z)$ can be decomposed as
\[ \ell_R(z) = \frac{1}{2}H_1(z) - \frac{1}{2}H_{-1}(z). \]
This is illustrated in Fig. 6. The objective Eq. 6 can therefore be decomposed as
\[ J(\alpha, b) = J_{\mathrm{vex}}(\alpha, b) + J_{\mathrm{cave}}(\alpha, b), \]
\[ J_{\mathrm{vex}}(\alpha, b) = \frac{1}{2}\sum_{i=1}^{n}w_i H_1\Big(y_i\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\Big)\Big) + \frac{\lambda}{2}\alpha^\top\alpha, \]
\[ J_{\mathrm{cave}}(\alpha, b) = -\frac{1}{2}\sum_{i=1}^{n}w_i H_{-1}\Big(y_i\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\Big)\Big). \]
The following self-evident relation can be used to upper-bound the concave part:
\[ tz - f(z) \le \sup_{y\in\mathbb{R}}\big(yt - f(y)\big), \qquad f(z) \ge tz - f^*(t), \qquad (7) \]
where
\[ f^*(t) = \sup_{y\in\mathbb{R}}\big(yt - f(y)\big). \]
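The decomposition of the ramp loss into two hinge terms can be verified numerically. The sketch below is a minimal check, assuming the ramp loss $\ell_R(z) = \frac{1}{2}\max(0, \min(2, 1 - z))$ and the generalized hinge $H_\epsilon(z) = \max(0, \epsilon - z)$, so that the convex part is $\frac{1}{2}H_1$ and the concave part is $-\frac{1}{2}H_{-1}$. The function names are illustrative.

```python
def hinge_eps(eps, z):
    # Generalized hinge loss H_eps(z) = max(0, eps - z).
    return max(0.0, eps - z)

def ramp(z):
    # Ramp loss, assumed to be (1/2) * max(0, min(2, 1 - z)).
    return 0.5 * max(0.0, min(2.0, 1.0 - z))

def ramp_from_hinges(z):
    # Convex part (1/2) H_1(z) plus concave part -(1/2) H_{-1}(z).
    return 0.5 * hinge_eps(1.0, z) - 0.5 * hinge_eps(-1.0, z)

# Grid over [-4, 4] that contains both kinks z = -1 and z = 1.
grid = [k / 8.0 for k in range(-32, 33)]
max_gap = max(abs(ramp(z) - ramp_from_hinges(z)) for z in grid)
```

On this grid the two expressions agree, including at the kinks, which is what allows the objective to be split into $J_{\mathrm{vex}}$ and $J_{\mathrm{cave}}$.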

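The inequality in Eq. 7 is known as the Fenchel inequality and the function $f^*(t)$ is known as the Fenchel dual or convex conjugate. Applying the above inequality to $H_\epsilon(z)$, we can obtain a bound as
\[ H_\epsilon(z) \ge zt - H^*_\epsilon(t), \qquad -H_\epsilon(z) \le H^*_\epsilon(t) - zt, \]
where $H^*_\epsilon(t)$ is the Fenchel dual of $H_\epsilon(z)$. The Fenchel dual of $H_{-1}$ is (the full calculation is given in Appendix B.4.3)
\[ H^*_{-1}(t) = \begin{cases}-t, & -1 \le t \le 0, \\ \infty, & \text{otherwise.}\end{cases} \]
We can minimize the upper bound as
\[ \arg\min_t\ H^*_{-1}(t) - tz = \begin{cases}-1, & z < -1, \\ 0, & z > -1.\end{cases} \]
The concave part is then bounded, with the parameter $a$, as
\[ \bar{J}_{\mathrm{cave}}(\alpha, b, a) = \frac{1}{2}\sum_{i=1}^{n}w_i\Big[H^*_{-1}(a_i) - a_i y_i\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\Big)\Big], \]
where $J_{\mathrm{cave}}(\alpha, b) \le \bar{J}_{\mathrm{cave}}(\alpha, b, a)$ for any $a$.

B.4.1. TIGHTENING OF THE UPPER-BOUND

The upper bound is minimized (tightened) when
\[ a_i = \begin{cases}-1, & y_i\big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\big) < -1, \\ 0, & \text{otherwise.}\end{cases} \]

B.4.2. MINIMIZING THE OBJECTIVE

We wish to minimize the sum of the convex part and the upper bound, $J(\alpha, b, a) = J_{\mathrm{vex}}(\alpha, b) + \bar{J}_{\mathrm{cave}}(\alpha, b, a)$, with respect to $\alpha$ and $b$ for fixed $a$. Dropping the terms that do not depend on $\alpha$ and $b$, this gives an objective of
\[ J(\alpha, b, a) = \frac{1}{2}\sum_{i=1}^{n}w_i H_1\Big(y_i\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\Big)\Big) + \frac{\lambda}{2}\alpha^\top\alpha - \frac{1}{2}\sum_{i=1}^{n}w_i a_i y_i\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\Big). \]
We define the following matrices:
\[ \tilde{\Phi}_{i,l} = y_i k(x_i, c_l), \qquad \bar{\Phi}_{i,l} = w_i a_i y_i k(x_i, c_l). \]
The QP for this is then
\[ \min_{\alpha, b, \xi}\ \frac{1}{2}w^\top\xi + \frac{\lambda}{2}\alpha^\top\alpha - \frac{1}{2}1^\top\bar{\Phi}\alpha - \frac{1}{2}b\sum_{i=1}^{n}w_i a_i y_i \quad\text{s.t.}\quad \xi_i \ge 0,\quad \xi_i \ge 1 - y_i\Big(\sum_{l=1}^{m}\alpha_l\varphi_l(x_i) + b\Big),\quad i = 1, \dots, n. \]
We define again
\[ \gamma = \begin{bmatrix}\alpha^\top & b & \xi^\top\end{bmatrix}^\top. \]
The quadratic term is
\[ H = \begin{bmatrix}\lambda I_m & 0_{m\times 1} & O_{m\times n} \\ 0_{1\times m} & 0 & 0_{1\times n} \\ O_{n\times m} & 0_{n\times 1} & O_{n\times n}\end{bmatrix}. \]
The linear term is
\[ f = \begin{bmatrix}-\frac{1}{2}\bar{\Phi}^\top 1_n \\ -\frac{1}{2}\sum_{i=1}^{n}w_i a_i y_i \\ \frac{1}{2}w\end{bmatrix}. \]
The lower bound is
\[ l = \begin{bmatrix}-\infty\cdot 1_m \\ -\infty \\ 0_n\end{bmatrix}. \]
The linear constraint is
\[ -\tilde{\Phi}\alpha - by - \xi \le -1. \]
This gives a matrix of
\[ L = \begin{bmatrix}-\tilde{\Phi} & -y & -I_n\end{bmatrix}, \]
and $k$ is
\[ k = -1_n. \]

B.4.3. CALCULATION OF THE FENCHEL DUAL OF $H_\epsilon(z)$

In this section, we briefly give the derivation of the Fenchel dual of $H_\epsilon(z)$:
\[ H^*_\epsilon(t) = \sup_v\big(tv - H_\epsilon(v)\big) = \sup_v\big(tv - \max(0, \epsilon - v)\big). \]
To make the above easier, we split the domain of $v$:
\[ H^*_\epsilon(t) = \max\Big(\sup_{v\le\epsilon}\big(tv - \max(0, \epsilon - v)\big),\ \sup_{v>\epsilon}\big(tv - \max(0, \epsilon - v)\big)\Big) = \max\Big(\sup_{v\le\epsilon}\big(tv - \epsilon + v\big),\ \sup_{v>\epsilon}tv\Big). \]
For the first part:
\[ \sup_{v\le\epsilon}\big((t+1)v - \epsilon\big) = \begin{cases}\epsilon t, & t \ge -1, \\ \infty, & t < -1.\end{cases} \]
The second part is
\[ \sup_{v>\epsilon}tv = \begin{cases}\epsilon t, & t \le 0, \\ \infty, & t > 0.\end{cases} \]
Putting these two together gives:
\[ H^*_\epsilon(t) = \begin{cases}\epsilon t, & -1 \le t \le 0, \\ \infty, & \text{otherwise.}\end{cases} \]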
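The closed form of the Fenchel dual and the tightening rule of Sec. B.4.1 can both be checked numerically. The sketch below is a minimal illustration, assuming $H_\epsilon(z) = \max(0, \epsilon - z)$, the conjugate $H^*_\epsilon(t) = \epsilon t$ on $-1 \le t \le 0$ (infinite elsewhere), and the linear bound $-H_{-1}(z) \le H^*_{-1}(a) - az$. The supremum is approximated on a finite grid, and the function names and grid sizes are hypothetical.

```python
import math

def hinge_eps(eps, z):
    # Generalized hinge loss H_eps(z) = max(0, eps - z).
    return max(0.0, eps - z)

def conjugate_on_grid(eps, t, radius):
    # Grid approximation of sup_v (t*v - H_eps(v)); the grid contains v = eps.
    vs = [eps + 0.25 * k for k in range(-4 * radius, 4 * radius + 1)]
    return max(t * v - hinge_eps(eps, v) for v in vs)

def conjugate_closed_form(eps, t):
    # Closed form: eps * t on [-1, 0], +infinity otherwise.
    return eps * t if -1.0 <= t <= 0.0 else math.inf

def linear_bound(a, z):
    # Upper bound H*_{-1}(a) - a*z on the concave term -H_{-1}(z).
    return conjugate_closed_form(-1.0, a) - a * z

def tight_a(z):
    # Tightening rule: a = -1 if z < -1, and 0 otherwise.
    return -1.0 if z < -1.0 else 0.0

ts = [-1.0, -0.75, -0.5, -0.25, 0.0]
conjugate_gap = max(abs(conjugate_on_grid(eps, t, 10) - conjugate_closed_form(eps, t))
                    for eps in (1.0, -1.0) for t in ts)

# Outside [-1, 0] the grid supremum keeps growing as the grid widens.
grows_outside = conjugate_on_grid(1.0, 0.5, 20) > conjugate_on_grid(1.0, 0.5, 10)

zs = [k / 4.0 for k in range(-16, 17)]
# The bound holds for every admissible a, and the tightening rule attains it.
bound_valid = all(linear_bound(a, z) >= -hinge_eps(-1.0, z) - 1e-12
                  for a in ts for z in zs)
tight_gap = max(abs(linear_bound(tight_a(z), z) + hinge_eps(-1.0, z)) for z in zs)
```

In a CCCP iteration these two facts play different roles: the validity of the bound guarantees that each QP minimizes a majorizer of the ramp-loss objective, and the tightness at the chosen $a_i$ ensures that the majorizer touches the objective at the current iterate.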
