Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate

Mingrui Liu (1), Xiaoxuan Zhang (1), Zaiyi Chen (2), Xiaoyu Wang (3), Tianbao Yang (1)

(1) Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA. (2) University of Science and Technology of China. (3) Intellifusion. Correspondence to: Mingrui Liu <mingrui-liu@uiowa.edu>, Xiaoxuan Zhang <xiaoxuanzhang@uiowa.edu>, Zaiyi Chen <czy656@mail.ustc.edu.cn>, Xiaoyu Wang <fanghuaxue@gmail.com>, Tianbao Yang <tianbao-yang@uiowa.edu>.

1. Technical Lemmas

We introduce two concentration inequalities in Lemma 4, which are used frequently in the proofs.

Lemma 4. (i) (Randomized version of Hoeffding's inequality) Suppose $n$ is a random variable taking values in $\mathbb{N}_+$, and let $X_1,\ldots,X_n$ be independent random variables. Define $X = X_1 + \cdots + X_n$. If every $X_i$ is strictly bounded by the interval $[a_i, b_i]$, then with probability at least $1-\delta$,

$$X - \mathbb{E}[X] \le \sqrt{\frac{\ln(1/\delta)}{2}\sum_{i=1}^{n}(b_i-a_i)^2}. \qquad (7)$$

Similarly, with probability at least $1-\delta$,

$$\mathbb{E}[X] - X \le \sqrt{\frac{\ln(1/\delta)}{2}\sum_{i=1}^{n}(b_i-a_i)^2}. \qquad (8)$$

(ii) (Randomized version of vector concentration inequality) Suppose $n$ is a random variable taking values in $\mathbb{N}_+$, and let $X_1,\ldots,X_n \in \mathbb{R}^d$ be i.i.d. random variables. Let $\phi:\mathbb{R}^d \to H$, where $H$ is a Hilbert space endowed with a norm $\|\cdot\|$ (in fact we can take $H$ to be $\mathbb{R}^d$ endowed with the infinity norm), and suppose $B = \sup_{x\in\mathbb{R}^d}\|\phi(x)\| < \infty$. Then with probability at least $1-\delta$,

$$\Big\|\frac{1}{n}\sum_{i=1}^{n}\phi(X_i) - \mathbb{E}\,\phi(X)\Big\| \le \frac{B}{\sqrt{n}}\Big[2 + \sqrt{2\log(1/\delta)}\Big]. \qquad (9)$$

Proof. The proof is quite straightforward. For the randomized version of Hoeffding's inequality, note that

$$\Pr\bigg(X_1+\cdots+X_n - \mathbb{E}[X] \ge \sqrt{\frac{\ln(1/\delta)}{2}\sum_{i=1}^{n}(b_i-a_i)^2}\bigg) = \sum_{t=1}^{\infty}\Pr(n=t)\,\Pr\bigg(X - \mathbb{E}[X] \ge \sqrt{\frac{\ln(1/\delta)}{2}\sum_{i=1}^{t}(b_i-a_i)^2}\ \bigg|\ n=t\bigg) \le \sum_{t=1}^{\infty}\Pr(n=t)\,\delta = \delta,$$

where the inequality follows from the deterministic version of Hoeffding's inequality. It is easy to show the correctness of the randomized version of the vector concentration inequality by employing the same technique. The deterministic version can be derived via McDiarmid's inequality (McDiarmid, 1989); a standard proof can be found in Section 4 of (Shawe-Taylor & Cristianini, 2004). For completeness, we include the proof here. To derive the deterministic version, define $S = (X_1,\ldots,X_n)$ and $\tilde S = (\tilde X_1,\ldots,\tilde X_n)$ to be two collections of independent samples, let $S_i = (X_1,\ldots,X_{i-1},\tilde X_i,X_{i+1},\ldots,X_n)$, and set $f(S) = \big\|\frac{1}{n}\sum_{i=1}^{n}\phi(X_i) - \mathbb{E}\,\phi(X)\big\|$. Then $|f(S) - f(S_i)| \le 2B/n$, so by McDiarmid's inequality,

$$\Pr\big(f(S) - \mathbb{E}f(S) > \epsilon\big) \le \exp\Big(-\frac{n\epsilon^2}{2B^2}\Big). \qquad (10)$$

Define $\sigma = (\sigma_1,\ldots,\sigma_n)$ to be Rademacher variables, i.e., $\Pr(\sigma_i = 1) = \Pr(\sigma_i = -1) = 1/2$ with the $\sigma_i$'s i.i.d., and define $\hat\phi(S) = \frac{1}{n}\sum_{i=1}^{n}\phi(X_i)$. Then

$$\mathbb{E}f(S) = \mathbb{E}\big\|\hat\phi(S) - \mathbb{E}\,\hat\phi(\tilde S)\big\| \le \mathbb{E}\big\|\hat\phi(S) - \hat\phi(\tilde S)\big\| = \mathbb{E}\Big\|\frac{1}{n}\sum_{i=1}^{n}\sigma_i\big(\phi(X_i) - \phi(\tilde X_i)\big)\Big\| \le \frac{2}{n}\,\mathbb{E}\Big\|\sum_{i=1}^{n}\sigma_i\phi(X_i)\Big\|$$

and

$$\mathbb{E}\Big\|\sum_{i=1}^{n}\sigma_i\phi(X_i)\Big\| \le \sqrt{\mathbb{E}\Big[\sum_{i=1}^{n}\|\phi(X_i)\|^2 + \sum_{i\ne j}\sigma_i\sigma_j\big\langle\phi(X_i),\phi(X_j)\big\rangle\Big]} = \sqrt{\sum_{i=1}^{n}\mathbb{E}\|\phi(X_i)\|^2} \le B\sqrt{n},$$

so that $\mathbb{E}f(S) \le 2B/\sqrt{n}$. Combining this result with (10) and taking $\epsilon = B\sqrt{2\log(1/\delta)/n}$ suffices to get the result.
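As a quick illustration of inequality (7) (an addition for this writeup, not part of the original argument), the following Monte Carlo sketch draws a random sample size $n$ and checks how often the deviation exceeds the stated bound; the geometric law for $n$, the uniform $X_i$, and the constants are assumptions made only for the demo.

```python
# Monte Carlo check of the randomized Hoeffding inequality (7).
# Assumptions for illustration: n ~ Geometric(0.01), X_i ~ Uniform[0, 1],
# so [a_i, b_i] = [0, 1] and sum_i (b_i - a_i)^2 = n.
import numpy as np

rng = np.random.default_rng(0)
delta, trials = 0.05, 20_000
violations = 0
for _ in range(trials):
    n = rng.geometric(p=0.01)            # random sample size on N_+
    x = rng.uniform(0.0, 1.0, size=n)    # independent bounded X_i
    dev = x.sum() - 0.5 * n              # X - E[X], since E[X_i] = 1/2
    bound = np.sqrt(np.log(1.0 / delta) / 2.0 * n)
    violations += dev > bound
print(f"violation rate: {violations / trials:.4f} (Lemma 4 guarantees <= {delta})")
```

Since Hoeffding's inequality is conservative, the observed violation rate is typically far below $\delta$.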

2. Proof of Lemma 2

Proof. According to equation (6) in (Ying et al., 2016), we have

$$f(v,\alpha) = f(w,a,b,\alpha) = p(1-p)\bigg\{\int_x\Big[(w^\top x - a)^2 - 2(1+\alpha)\,w^\top x\Big]P(x|y=1)\,dx + \int_x\Big[(w^\top x - b)^2 + 2(1+\alpha)\,w^\top x\Big]P(x|y=-1)\,dx - \alpha^2\bigg\}$$

$$= p(1-p)\Big\{w^\top\big(\mathbb{E}[xx^\top|y=1] + \mathbb{E}[xx^\top|y=-1]\big)w - 2a\,w^\top\mathbb{E}[x|y=1] - 2b\,w^\top\mathbb{E}[x|y=-1] + a^2 + b^2 + 2(1+\alpha)\,w^\top\big(\mathbb{E}[x|y=-1] - \mathbb{E}[x|y=1]\big) - \alpha^2\Big\}.$$

When $\alpha = w^\top\big(\mathbb{E}[x|y=-1] - \mathbb{E}[x|y=1]\big)$, it is easy to see that $\alpha \in \Omega_2$ by employing the Cauchy-Schwarz inequality, i.e., $|\alpha| \le \|w\|\,\big\|\mathbb{E}[x|y=-1] - \mathbb{E}[x|y=1]\big\| \le 2R\kappa$, and $f(v,\alpha)$ achieves its maximum with respect to $\alpha$, so we get

$$f(v) = f\big(w,a,b,\,w^\top(\mathbb{E}[x|y=-1] - \mathbb{E}[x|y=1])\big) = p(1-p)\Big[w^\top\mathbb{E}[xx^\top|y=1]\,w - 2a\,w^\top\mathbb{E}[x|y=1] + a^2 + w^\top\mathbb{E}[xx^\top|y=-1]\,w - 2b\,w^\top\mathbb{E}[x|y=-1] + b^2 + 2\,w^\top\big(\mathbb{E}[x|y=-1] - \mathbb{E}[x|y=1]\big) + \big(w^\top(\mathbb{E}[x|y=-1] - \mathbb{E}[x|y=1])\big)^2\Big]$$

$$= p(1-p)\Big[(w^\top, a, b)\,M\,(w^\top, a, b)^\top + \text{affine function of } v\Big],$$

where $M = M_1 + M_2 + M_3$ with

$$M_1 = \begin{pmatrix}\mathbb{E}[xx^\top|y=1] & -\mathbb{E}[x|y=1] & 0\\ -\mathbb{E}[x|y=1]^\top & 1 & 0\\ 0 & 0 & 0\end{pmatrix}, \qquad M_2 = \begin{pmatrix}\mathbb{E}[xx^\top|y=-1] & 0 & -\mathbb{E}[x|y=-1]\\ 0 & 0 & 0\\ -\mathbb{E}[x|y=-1]^\top & 0 & 1\end{pmatrix},$$

$$M_3 = \begin{pmatrix}qq^\top & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{pmatrix}, \qquad q = \mathbb{E}[x|y=-1] - \mathbb{E}[x|y=1].$$

Note that $M_1$, $M_2$, $M_3$ are positive semidefinite matrices; for instance, $(w^\top,a,b)\,M_1\,(w^\top,a,b)^\top = \mathbb{E}\big[(w^\top x - a)^2\,\big|\,y=1\big] \ge 0$. Hence $M$ is positive semidefinite, so $f(v)$ is convex and piecewise quadratic. Since $\Omega_1$ is a polyhedron, according to Corollary 3 of (Li, 2013), we know that $f(v)$ restricted on $\Omega_1$ satisfies the quadratic growth condition.
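To make the block structure of $M_1$, $M_2$, $M_3$ concrete, here is a small numerical sketch (not from the paper; the Gaussian class-conditional moments and $d = 3$ are arbitrary choices) that assembles the three matrices and confirms each has a nonnegative spectrum, so that $M = M_1 + M_2 + M_3$ is positive semidefinite:

```python
# Numerical sanity check that M1, M2, M3 from the proof of Lemma 2 are PSD.
import numpy as np

rng = np.random.default_rng(1)
d = 3
mu_pos, mu_neg = rng.normal(size=d), rng.normal(size=d)
# Synthetic second moments E[xx^T | y] = Cov + mu mu^T (Gaussian assumption).
S_pos = np.eye(d) + np.outer(mu_pos, mu_pos)
S_neg = 2.0 * np.eye(d) + np.outer(mu_neg, mu_neg)

def embed(S, mu, slot):
    # (d+2)x(d+2) block matrix: S in the top-left corner, -mu in row/column
    # `slot` (index d for the variable a, index d+1 for b), 1 on that diagonal.
    M = np.zeros((d + 2, d + 2))
    M[:d, :d] = S
    M[:d, slot] = -mu
    M[slot, :d] = -mu
    M[slot, slot] = 1.0
    return M

M1 = embed(S_pos, mu_pos, d)      # quadratic form E[(w^T x - a)^2 | y = 1]
M2 = embed(S_neg, mu_neg, d + 1)  # quadratic form E[(w^T x - b)^2 | y = -1]
q = mu_neg - mu_pos
M3 = np.zeros((d + 2, d + 2))
M3[:d, :d] = np.outer(q, q)       # rank-one term (w^T q)^2

for name, M in [("M1", M1), ("M2", M2), ("M3", M3), ("M", M1 + M2 + M3)]:
    print(name, "min eigenvalue:", np.linalg.eigvalsh(M).min())
```

Each reported minimum eigenvalue is nonnegative up to floating-point error, matching the argument that every block encodes a conditional expectation of a square.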

3. Proof of Lemma 3

Proof. By applying inequality (9) in Lemma 4, the triangle inequality, and the union bound, we have with probability at least $1-\delta/3$,

$$\|\hat A - A\| \le \frac{2\kappa^2\big(2+\sqrt{2\ln(6/\delta)}\big)}{\sqrt{n_+}} + \frac{2\kappa^2\big(2+\sqrt{2\ln(6/\delta)}\big)}{\sqrt{n_-}}. \qquad (11)$$

Note that both $n_+$ and $n_-$ are sums of i.i.d. Bernoulli random variables, and denote $p = \Pr(y=1)$. By applying the deterministic version of Hoeffding's inequality in Lemma 4 (i.e., inequality (8)) to the indicator functions of the random variables $\mathbb{I}_{[y_i=1]}$ and $\mathbb{I}_{[y_i=-1]}$ respectively, and the union bound, with probability at least $1-\delta/3$ the following two inequalities hold simultaneously:

$$n_+ \ge np - \sqrt{\frac{n\ln(6/\delta)}{2}}, \qquad n_- \ge n(1-p) - \sqrt{\frac{n\ln(6/\delta)}{2}}. \qquad (12)$$

According to (11) and (12), by the union bound, we know that with probability at least $1-2\delta/3$,

$$\|\hat A - A\| \le \frac{2\kappa^2\big(2+\sqrt{2\ln(6/\delta)}\big)}{\sqrt{np - \sqrt{n\ln(6/\delta)/2}}} + \frac{2\kappa^2\big(2+\sqrt{2\ln(6/\delta)}\big)}{\sqrt{n(1-p) - \sqrt{n\ln(6/\delta)/2}}}. \qquad (13)$$

Note that

$$n \ge \frac{2\ln(6/\delta)}{\min(p,1-p)^2} \quad\text{is equivalent to}\quad \sqrt{\frac{n\ln(6/\delta)}{2}} \le \frac{n\min(p,1-p)}{2}. \qquad (14)$$

Utilizing (14) and plugging it into (13), we know that with probability at least $1-2\delta/3$,

$$\|\hat A - A\| \le \frac{2\kappa^2\big(2+\sqrt{2\ln(6/\delta)}\big)}{\sqrt{np/2}} + \frac{2\kappa^2\big(2+\sqrt{2\ln(6/\delta)}\big)}{\sqrt{n(1-p)/2}} \le \frac{4\kappa^2\big(2+\sqrt{2\ln(6/\delta)}\big)}{\sqrt{n\xi/2}},$$

where $\xi = \min(p, 1-p)$.

4. Proof of Theorem 2

Proof. Define $m = \big\lfloor\frac{1}{2}\log_2\frac{n}{\log_2 n}\big\rfloor$, $n_0 = \lfloor n/m \rfloor$, $\tilde\delta = \frac{\delta}{10\log_2 n}$, and

$$a_{n,\delta} = \frac{G\big(3\gamma_1+\gamma_2\sqrt{2\ln(6/\delta)}\big)}{\sqrt{n}}, \qquad \mu_0 = \frac{2\,a_{n_0,\tilde\delta}}{R_0}, \qquad \mu_k = 2^k\mu_0, \qquad R_k = R_0/2^k,$$

where $k = 1,\ldots,m$. Then we have $\mu_kR_k^2 = 2^{-k}\mu_0R_0^2$, while $\mu_kR_k = \mu_0R_0$ for every $k$.
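Before continuing, a small numeric sketch of this stage schedule (the values of $R_0$, $G$, $\gamma_1$, $\gamma_2$, $\delta$ below are placeholders, not the paper's): $\mu_k$ doubles while $R_k$ halves, so the per-stage gap target $\mu_kR_k^2$ is cut in half at every stage while $\mu_kR_k$ stays constant.

```python
# Illustrative computation of the stage schedule used in the proof of Theorem 2.
import math

n = 10_000
m = math.floor(0.5 * math.log2(n / math.log2(n)))  # number of stages, per the proof
n0 = n // m                                        # samples per stage
R0, G, gamma1, gamma2, delta = 1.0, 1.0, 1.0, 1.0, 0.01  # placeholder constants
delta_t = delta / (10 * math.log2(n))
a = G * (3 * gamma1 + gamma2 * math.sqrt(2 * math.log(6 / delta_t))) / math.sqrt(n0)
mu0 = 2 * a / R0

for k in range(m + 1):
    mu_k, R_k = 2**k * mu0, R0 / 2**k
    # mu_k * R_k**2 halves each stage; mu_k * R_k is invariant (= 2 * a).
    print(f"stage {k}: mu_k*R_k^2 = {mu_k * R_k**2:.3e}, mu_k*R_k = {mu_k * R_k:.3e}")
```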

By the definition of $m$, when $n \ge 100$,

$$0 < \frac{1}{2}\log_2\frac{n}{\log_2 n} - 1 \le m \le \frac{1}{2}\log_2 n, \qquad (15)$$

so we have

$$4^m \ge \frac{n}{4\log_2 n}. \qquad (16)$$

To employ the result of Lemma 1, at the $i$-th stage we need $n_0$ to satisfy

$$n_0 \ge \frac{R_{i-1}^2 + 4\kappa^2}{R_{i-1}^2} = 1 + \frac{4^i\kappa^2}{R_0^2},$$

which should hold for any $i \le m$ (recall $R_i^2 = 4^{-i}R_0^2$). So $n_0 \ge 4^m\big(1 + \kappa^2/R_0^2\big)$ suffices to achieve this requirement. Now we argue that this condition can be implied by $n \ge 100$: note that $n_0 = \lfloor n/m\rfloor \ge \frac{n}{m} - 1 \ge \frac{2n}{\log_2 n} - 1$ by (15), while $4^m \le \frac{n}{\log_2 n}$ by the definition of $m$, and an elementary comparison of logarithms (treating $\kappa^2/R_0^2$ as an absolute constant) closes the gap for $n \ge 100$.

According to Lemma 2, we know that $P(v)$ satisfies the quadratic growth condition, which implies that there exists some $c > 0$ such that $\|v - v^*\|^2 \le c\,\big(P(v) - P_*\big)$, where $v^*$ is the closest point to $v$ in the optimal set. We can assume $c \ge R_0/G$, i.e., $\frac{1}{c} \le \frac{G}{R_0}$; otherwise we can set $c$ to be $R_0/G$ such that the quadratic growth property in Lemma 2 still holds. When $n \ge 100$, we have

$$\mu_m = 2^m\mu_0 = \frac{2^{m+1}a_{n_0,\tilde\delta}}{R_0} = \frac{2^{m+1}G\big(3\gamma_1+\gamma_2\sqrt{2\ln(6/\tilde\delta)}\big)}{R_0\sqrt{n_0}} \ge \frac{3G}{R_0}\cdot\frac{2^{m+1}}{\sqrt{n/m}} \ge \frac{3G}{R_0}\sqrt{\frac{m}{\log_2 n}} \ge \frac{3G}{R_0}\sqrt{\frac{1}{2\log_2 n}\log_2\frac{n}{\log_2 n} - \frac{1}{\log_2 n}} \ge \frac{G}{R_0},$$

where the first inequality holds because $\gamma_1, \gamma_2 \ge 1$, $0 < \tilde\delta < 1$, and $n_0 \le n/m$; the second inequality holds because of (16); the third holds because of the lower bound of $m$ in (15); and the last holds since $n \ge 100$ and the function under the square root is monotonically increasing with respect to $n$. So $\frac{G}{R_0} \le \mu_m$. Recall that $\frac{1}{c} \le \frac{G}{R_0}$, and thus $\frac{1}{c} \le \mu_m$.

Given $v_k$, denote by $v_k^*$ the closest optimal solution to $v_k$. We consider two cases.

Case 1. If $\frac{1}{c} \ge \mu_0$, then $\mu_0 \le \frac{1}{c} \le \mu_m$, so there exists a $k^\dagger$ such that $\mu_{k^\dagger} \le \frac{1}{c} \le \mu_{k^\dagger+1}$, where $0 \le k^\dagger < m$. To utilize this fact, we have the following lemma.

Lemma 5. Let $k^\dagger$ satisfy $\mu_{k^\dagger} \le \frac{1}{c} \le \mu_{k^\dagger+1}$. Then for any $k \le k^\dagger$, there exists a Borel set $A_k$ of probability at least $1 - k\tilde\delta$ such that on $A_k$ the points $\{v_k\}_{k=1}^{m}$ generated by Algorithm 1 satisfy

$$\|v_{k-1} - v_{k-1}^*\| \le R_{k-1} = 2^{-k+1}R_0, \qquad (17)$$

$$P(v_k) - P_* \le \mu_kR_k^2 = 2^{-k}\mu_0R_0^2. \qquad (18)$$

Moreover, for $k > k^\dagger$ there is a Borel set $C_k$ of probability at least $1 - (k-k^\dagger)\tilde\delta$ such that on $C_k$ we have

$$P(v_k) - P(v_{k^\dagger}) \le \mu_{k^\dagger}R_{k^\dagger}^2. \qquad (19)$$

Proof. We prove (17) and (18) by induction. Note that (17) holds for $k = 1$. Assume it is true for some $k \ge 1$ on $A_{k-1}$. According to Lemma 1, there exists a Borel set $B_k$ with $\Pr(B_k) \ge 1 - \tilde\delta$ such that

$$P(v_k) - P_* \le R_{k-1}\,a_{n_0,\tilde\delta} = \mu_k\,2^{-k}R_0\cdot\frac{R_{k-1}}{2} = \mu_kR_k^2,$$

which is (18). By the inductive hypothesis, $\|v_{k-1} - v_{k-1}^*\| \le R_{k-1}$ on the set $A_{k-1}$. Define $A_k = A_{k-1}\cap B_k$. Note that $\Pr(A_k) \ge \Pr(A_{k-1}) + \Pr(B_k) - 1 \ge 1 - k\tilde\delta$, and on $A_k$, by the quadratic growth condition and the definition of $k^\dagger$ (so that $c\,\mu_k \le 1$ for $k \le k^\dagger$), we have

$$\|v_k - v_k^*\|^2 \le c\,\big(P(v_k) - P_*\big) \le c\,\mu_kR_k^2 \le R_k^2,$$

which is (17) for $k+1$. Now we prove (19). For $k > k^\dagger$, it is easy to show a conclusion similar to Lemma 1. (Remark: at the $k$-th stage with $k > k^\dagger$, one can use the same proof as for Lemma 1 by substituting $v_*$ with $v_{k^\dagger}$ throughout; the first term $I_1$ on the right-hand side then becomes zero, and hence one can get a tighter bound on $P(v_k) - P(v_{k-1})$, but we here relax the bound to $R_{k-1}\,a_{n_0,\tilde\delta}$.) That is, there exists a Borel set $B_k$ with $\Pr(B_k) \ge 1 - \tilde\delta$ such that

$$P(v_k) - P(v_{k-1}) \le R_{k-1}\,a_{n_0,\tilde\delta} = 2^{k^\dagger-k+1}R_{k^\dagger}\,a_{n_0,\tilde\delta} = 2^{k^\dagger-k}\mu_{k^\dagger}R_{k^\dagger}^2,$$

which implies that on $C_k = \bigcap_{j=k^\dagger+1}^{k}B_j$ we have

$$P(v_k) - P(v_{k^\dagger}) = \sum_{j=k^\dagger+1}^{k}\big(P(v_j) - P(v_{j-1})\big) \le \sum_{j=k^\dagger+1}^{k}2^{k^\dagger-j}\mu_{k^\dagger}R_{k^\dagger}^2 \le \mu_{k^\dagger}R_{k^\dagger}^2.$$

By the union bound, we have $\Pr(C_k) = \Pr\big(\bigcap_{j=k^\dagger+1}^{k}B_j\big) \ge 1 - (k-k^\dagger)\tilde\delta$. This completes the proof.

Now we proceed with the proof of the theorem. Note that $\mu_0 \le \frac{1}{c} \le \mu_m$. At the end of the $k^\dagger$-th stage, on the Borel set $A_{k^\dagger}$ of probability at least $1 - k^\dagger\tilde\delta$, we have $P(v_{k^\dagger}) - P_* \le \mu_{k^\dagger}R_{k^\dagger}^2$. Then on the Borel set $D_m = C_m\cap A_{k^\dagger} = \big(\bigcap_{j=k^\dagger+1}^{m}B_j\big)\cap A_{k^\dagger}$ with $\Pr(D_m) \ge 1 - m\tilde\delta$, we have

$$P(v_m) - P_* = \big(P(v_m) - P(v_{k^\dagger})\big) + \big(P(v_{k^\dagger}) - P_*\big) \le 2\mu_{k^\dagger}R_{k^\dagger}^2 \le 4c\,\mu_{k^\dagger}^2R_{k^\dagger}^2 = 4c\,(\mu_0R_0)^2 = 4c\,\big(2a_{n_0,\tilde\delta}\big)^2,$$

where the second inequality uses $1 \le 2c\mu_{k^\dagger}$ (which follows from $\frac{1}{c} \le \mu_{k^\dagger+1} = 2\mu_{k^\dagger}$) and the last two equalities use $\mu_kR_k = \mu_0R_0 = 2a_{n_0,\tilde\delta}$. By the definition of $m$ and $\tilde\delta$, and the fact that $m \le \log_2 n$, we have $m\tilde\delta \le \delta$. So $\Pr(D_m) \ge 1 - \delta$.
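The restarting mechanism analyzed in Lemma 5 is easy to see in a simulation. The toy run below is entirely illustrative (it uses averaged projected SGD on $f(x) = x^2$ rather than the paper's Algorithm 1): each stage starts from the previous stage's averaged iterate and searches inside a ball whose radius is halved between stages, so the optimality gap drops geometrically per stage until the statistical floor is reached.

```python
# Toy demonstration of the restarted, radius-halving scheme behind Lemma 5.
import numpy as np

rng = np.random.default_rng(2)
R0, n0, m = 4.0, 2_000, 6
center, R = 3.0, R0                       # start R0-close to the optimum x* = 0
for k in range(1, m + 1):
    x, avg = center, 0.0
    eta = R / np.sqrt(n0)                 # stage-wise constant step size
    for _ in range(n0):
        g = 2.0 * x + rng.normal()        # stochastic gradient of f(x) = x^2
        x = np.clip(x - eta * g, center - R, center + R)  # project onto the ball
        avg += x / n0
    center, R = avg, R / 2.0              # restart around the average, halve radius
    print(f"stage {k}: gap P(v_k) - P_* = {center**2:.2e}")
```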

Case 2. If $\frac{1}{c} < \mu_0$, then on $A_1 = B_1$,

$$P(v_1) - P_* \le R_0\,a_{n_0,\tilde\delta} = \frac{2a_{n_0,\tilde\delta}^2}{\mu_0} < 2c\,a_{n_0,\tilde\delta}^2.$$

Hence on $A_1\cap C_m$, by using Lemma 5 and a similar argument as in Case 1, we have

$$P(v_m) - P_* = \big(P(v_m) - P(v_1)\big) + \big(P(v_1) - P_*\big) \le 2R_0\,a_{n_0,\tilde\delta} < 4c\,a_{n_0,\tilde\delta}^2,$$

where $\Pr(A_1\cap C_m) \ge 1 - \delta$.

Combining the two cases, we have with probability at least $1 - \delta$,

$$P(v_m) - P_* \le 4c\,\big(2a_{n_0,\tilde\delta}\big)^2 = \frac{16c\,G^2\big(3\gamma_1 + \gamma_2\sqrt{2\ln(60\log_2 n/\delta)}\big)^2}{n_0} = \tilde O\Big(\frac{1}{n_0}\Big) = \tilde O\Big(\frac{1}{n}\Big),$$

since $n_0 = \lfloor n/m\rfloor \ge \frac{n}{\log_2 n}$.

References

Li, Guoyin. Global error bounds for piecewise convex polynomials. Mathematical Programming, 2013.

McDiarmid, Colin. On the method of bounded differences. Surveys in Combinatorics, 141:148-188, 1989.

Shawe-Taylor, John and Cristianini, Nello. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

Ying, Yiming, Wen, Longyin, and Lyu, Siwei. Stochastic online AUC maximization. In Advances in Neural Information Processing Systems, pp. 451-459, 2016.