Supplement for "SADAGRAD: Strongly Adaptive Stochastic Gradient Methods"

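Zaiyi Chen * 1, Yi Xu * 2, Enhong Chen 1, Tianbao Yang 2

1. Proof of Proposition 1

Proposition 1. Let $\epsilon > 0$ be fixed, $H_0 = \gamma I$, $\gamma \ge \max_t \|g_t\|_\infty$, $\mathrm{E}[F(w_1) - F(w_*)] \le \epsilon_0$, and let the iteration number $T$ satisfy
$$T \ge \frac{2}{\epsilon}\max\left\{\frac{\epsilon_0\,(\gamma + \max_i \|g_{1:T,i}\|_2)}{\lambda\eta},\ \eta\sum_{i=1}^d \|g_{1:T,i}\|_2\right\}.$$
Then Algorithm 1 gives a solution $\hat{w}_T$ such that $\mathrm{E}[F(\hat{w}_T) - F_*] \le \epsilon$.

Proof. Let $\psi_0(w) = 0$ and $\|x\|_H = \sqrt{x^\top H x}$. First, we can see that $\psi_{t+1}(w) \ge \psi_t(w)$ for any $t \ge 0$. Define $z_t = \sum_{\tau=1}^t g_\tau$ and $\Delta_t = (\partial F(w_t) - g_t)^\top (w_t - w)$. Let $\psi_t^*$ be defined by
$$\psi_t^*(g) = \sup_{x \in \Omega}\left\{ g^\top x - \frac{1}{\eta}\psi_t(x) \right\}.$$
Taking the summation of the objective gap over all iterations, we have
$$\sum_{t=1}^T (F(w_t) - F(w)) \le \sum_{t=1}^T \partial F(w_t)^\top (w_t - w) = \sum_{t=1}^T g_t^\top (w_t - w) + \sum_{t=1}^T \Delta_t$$
$$= \sum_{t=1}^T g_t^\top w_t - \sum_{t=1}^T g_t^\top w - \frac{1}{\eta}\psi_T(w) + \frac{1}{\eta}\psi_T(w) + \sum_{t=1}^T \Delta_t$$
$$\le \frac{1}{\eta}\psi_T(w) + \sum_{t=1}^T g_t^\top w_t + \sup_{x \in \Omega}\left\{ -\sum_{t=1}^T g_t^\top x - \frac{1}{\eta}\psi_T(x) \right\} + \sum_{t=1}^T \Delta_t$$
$$= \frac{1}{\eta}\psi_T(w) + \sum_{t=1}^T g_t^\top w_t + \psi_T^*(-z_T) + \sum_{t=1}^T \Delta_t.$$
Note that
$$\psi_T^*(-z_T) = -\sum_{t=1}^T g_t^\top w_{T+1} - \frac{1}{\eta}\psi_T(w_{T+1}) \le -\sum_{t=1}^T g_t^\top w_{T+1} - \frac{1}{\eta}\psi_{T-1}(w_{T+1})$$
$$\le \sup_{x \in \Omega}\left\{ -z_T^\top x - \frac{1}{\eta}\psi_{T-1}(x) \right\} = \psi_{T-1}^*(-z_T)$$
$$\le \psi_{T-1}^*(-z_{T-1}) - g_T^\top \nabla\psi_{T-1}^*(-z_{T-1}) + \frac{\eta}{2}\|g_T\|_{\psi_{T-1}^*}^2,$$
where the last inequality uses the fact that $\psi_t(w)$ is 1-strongly convex w.r.t. $\|\cdot\|_{\psi_t} = \|\cdot\|_{H_t}$ and consequently $\psi_t^*(w)$ is $\eta$-smooth w.r.t. $\|\cdot\|_{\psi_t^*} = \|\cdot\|_{H_t^{-1}}$. Thus, since $w_T = \nabla\psi_{T-1}^*(-z_{T-1})$, we have
$$\sum_{t=1}^T g_t^\top w_t + \psi_T^*(-z_T) \le \sum_{t=1}^T g_t^\top w_t + \psi_{T-1}^*(-z_{T-1}) - g_T^\top \nabla\psi_{T-1}^*(-z_{T-1}) + \frac{\eta}{2}\|g_T\|_{\psi_{T-1}^*}^2$$
$$= \sum_{t=1}^{T-1} g_t^\top w_t + \psi_{T-1}^*(-z_{T-1}) + \frac{\eta}{2}\|g_T\|_{\psi_{T-1}^*}^2.$$
By repeating this process, we have
$$\sum_{t=1}^T g_t^\top w_t + \psi_T^*(-z_T) \le \psi_0^*(-z_0) + \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_{\psi_{t-1}^*}^2 = \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_{\psi_{t-1}^*}^2.$$
Then
$$\sum_{t=1}^T (F(w_t) - F(w)) \le \frac{1}{\eta}\psi_T(w) + \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_{\psi_{t-1}^*}^2 + \sum_{t=1}^T \Delta_t. \tag{1}$$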

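Following the analysis in (Duchi et al., 2011), we have
$$\sum_{t=1}^T \|g_t\|_{\psi_{t-1}^*}^2 \le 2\sum_{i=1}^d \|g_{1:T,i}\|_2.$$
Thus
$$\sum_{t=1}^T (F(w_t) - F(w)) \le \frac{\gamma}{2\eta}\|w - w_1\|_2^2 + \frac{1}{2\eta}(w - w_1)^\top \mathrm{diag}(s_T)(w - w_1) + \eta\sum_{i=1}^d \|g_{1:T,i}\|_2 + \sum_{t=1}^T \Delta_t$$
$$\le \frac{\gamma + \max_i \|g_{1:T,i}\|_2}{2\eta}\|w - w_1\|_2^2 + \eta\sum_{i=1}^d \|g_{1:T,i}\|_2 + \sum_{t=1}^T \Delta_t.$$
Now by the value of $T \ge \frac{2}{\epsilon}\max\left\{\frac{\epsilon_0(\gamma + \max_i \|g_{1:T,i}\|_2)}{\lambda\eta},\ \eta\sum_{i=1}^d \|g_{1:T,i}\|_2\right\}$, we have
$$\frac{\gamma + \max_i \|g_{1:T,i}\|_2}{2\eta T} \le \frac{\lambda\epsilon}{4\epsilon_0}, \qquad \frac{\eta\sum_{i=1}^d \|g_{1:T,i}\|_2}{T} \le \frac{\epsilon}{2}.$$
Dividing by $T$ on both sides and setting $w = w_*$, the convexity of $F(w)$ gives
$$F(\hat{w}_T) - F_* \le \frac{\lambda\epsilon}{4\epsilon_0}\|w_* - w_1\|_2^2 + \frac{\epsilon}{2} + \frac{1}{T}\sum_{t=1}^T \Delta_t.$$
Let $\{\mathcal{F}_t\}_{t \in \mathbb{N}}$ be the filtration associated with Algorithm 1 in the paper. Noticing that $T$ is a random variable with respect to $\{\mathcal{F}_t\}$, we cannot get rid of the last term directly. Define the sequence $\{X_t\}_{t \in \mathbb{N}}$ as
$$X_t = \frac{1}{t}\sum_{i=1}^t \langle g_i - \mathrm{E}g_i,\ w_i - w_* \rangle, \tag{2}$$
where $\mathrm{E}g_i \in \partial F(w_i)$, so that $\frac{1}{T}\sum_{t=1}^T \Delta_t = -X_T$ when $w = w_*$. Since $\mathrm{E}[g_{t+1} - \mathrm{E}g_{t+1}] = 0$ and
$$w_{t+1} = \arg\min_{w \in \Omega}\ \eta\, w^\top\Big(\frac{1}{t}\sum_{\tau=1}^t g_\tau\Big) + \frac{1}{t}\psi_t(w),$$
which is measurable with respect to $g_1, \ldots, g_t$ and $w_1, \ldots, w_t$, it is easy to see that $\{\Delta_t\}_{t \in \mathbb{N}}$ is a martingale difference sequence with respect to $\{\mathcal{F}_t\}$, i.e., $\mathrm{E}[\Delta_t \mid \mathcal{F}_{t-1}] = 0$. On the other hand, since $\|g_t\|_2$ is upper bounded (e.g., by $G$), following the statement of $T$ in the theorem, $T \le N < \infty$ always holds for a deterministic $N$ determined by $\epsilon$, $\epsilon_0$, $\theta$, $d$ and $G$. Then following Lemma 1 below we have that $\mathrm{E}[X_T] = 0$. Now taking the expectation and using the property $\frac{\lambda}{2}\|w_1 - w_*\|_2^2 \le F(w_1) - F(w_*)$, we have that
$$\mathrm{E}[F(\hat{w}_T) - F_*] \le \frac{\lambda\epsilon}{4\epsilon_0}\mathrm{E}\|w_* - w_1\|_2^2 + \frac{\epsilon}{2} + \mathrm{E}\Big[\frac{1}{T}\sum_{t=1}^T \Delta_t\Big] \le \frac{\epsilon}{2\epsilon_0}\mathrm{E}[F(w_1) - F(w_*)] + \frac{\epsilon}{2} + 0 \le \epsilon.$$
Then we finish the proof.

Lemma 1. Let $\{\Delta_t\}_{t \in \mathbb{N}}$ be a martingale difference sequence w.r.t. the filtration $\{\mathcal{F}_t\}_{t \in \mathbb{N}}$, and let $T$ be a stopping time such that $\{T = t\} \in \mathcal{F}_t$ for all $t \in \mathbb{N}$. If $0 < T \le N < \infty$, then we have
1 E 1 E 0.

Proof.
E 1 I( ) E F N I( )E 1 F I( ) E 0 I( ) I( ) I( ) I( ) E F F F E F E E F F 1 E F 1
where $I(\cdot)$ is the indicator function. The first equation follows from the definition of conditional expectation and $T \le N$; the second equation follows from the fact that $\sum_{t=1}^N I(T = t) = 1$; the third and fifth equations follow from the definition of stopping time ($\{T = t\} \in \mathcal{F}_t$); the seventh and last equations follow from the definition of martingale difference sequence; and the eighth equation follows from Theorem 5.1.6 in (Durrett, 2010).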
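The role a bounded stopping time plays in an argument like Lemma 1 can be checked on a toy case. The sketch below is only an illustration and not the paper's construction: it assumes fair ±1 coin flips as the martingale difference sequence, and the two stopping rules (`first_plus_one`, `peek_ahead`) are hypothetical names introduced here. It enumerates every path exactly and confirms the classical optional-stopping identity $\mathrm{E}[\sum_{t=1}^T X_t] = 0$ for a bounded stopping time, while a rule that looks one step into the future (so that $\{T = t\} \notin \mathcal{F}_t$) breaks it.

```python
from fractions import Fraction
from itertools import product


def expected_stopped_sum(horizon, stop_rule):
    """Exact E[sum_{t<=T} X_t] over all fair +/-1 paths of length `horizon`."""
    weight = Fraction(1, 2 ** horizon)  # every path is equally likely
    total = Fraction(0)
    for path in product((-1, 1), repeat=horizon):
        stop = stop_rule(path)          # T for this path, with 1 <= T <= horizon
        total += weight * sum(path[:stop])
    return total


def first_plus_one(path):
    """Valid stopping time: stopping at t depends only on path[:t]."""
    for t, x in enumerate(path, start=1):
        if x == 1:
            return t
    return len(path)


def peek_ahead(path):
    """Not a stopping time: stops at t because the NEXT flip is -1."""
    for t in range(1, len(path)):
        if path[t] == -1:               # path[t] is flip number t + 1
            return t
    return len(path)


if __name__ == "__main__":
    print("stopping time:", expected_stopped_sum(4, first_plus_one))
    print("look-ahead   :", expected_stopped_sum(4, peek_ahead))
```

Exact rational arithmetic is used instead of Monte Carlo sampling so that the zero expectation is verified without sampling noise.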

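2. Proof of Theorem 1

Theorem 1. Consider SCO (1) with a property (2) and a given $\epsilon > 0$. Assume $H_0 = \gamma I$ in Algorithm 1 and $\gamma \ge \max_{k,\tau} \|g_\tau^k\|_\infty$, $F(w_0) - F_* \le \epsilon_0$, and $t_k$ is the minimum number such that
$$t_k \ge \frac{2}{\sqrt{\lambda\epsilon_k}}\max\left\{\frac{2(\gamma + \max_i \|g^k_{1:t_k,i}\|_2)}{\theta},\ \theta\sum_{i=1}^d \|g^k_{1:t_k,i}\|_2\right\}.$$
With $K = \lceil \log_2(\epsilon_0/\epsilon) \rceil$, we have $\mathrm{E}[F(w_K) - F_*] \le \epsilon$.

Proof of Theorem 1. We will show by induction that $\mathrm{E}[F(w_k) - F_*] \le \epsilon_k \triangleq \epsilon_0/2^k$ for $k = 0, 1, \ldots, K$, which leads to our conclusion when $K = \lceil \log_2(\epsilon_0/\epsilon) \rceil$. The inequality holds obviously for $k = 0$. Conditioned on $\mathrm{E}[F(w_{k-1}) - F_*] \le \epsilon_{k-1}$, we will show that $\mathrm{E}[F(w_k) - F_*] \le \epsilon_k$. We will modify Proposition 1, then apply it to the $k$-th epoch of Algorithm 2 conditioned on the randomness in previous epochs. Let $\mathrm{E}_k$ denote the expectation over all randomness before the last iteration of the $k$-th epoch and $\mathrm{E}_{k|1:k-1}$ denote the expectation over the randomness in the $k$-th epoch given the randomness before the $k$-th epoch. Given $w_{k-1}$, we let $w^*_{k-1}$ denote the optimal solution that is closest to $w_{k-1}$; we only assume the condition (2), which does not necessarily imply the uniqueness of the optimal solutions. According to the proof of Proposition 1, we have
$$\mathrm{E}_{k|1:k-1}[F(w_k) - F(w^*_{k-1})] \le \mathrm{E}_{k|1:k-1}\left[\frac{\gamma + \max_i \|g^k_{1:t_k,i}\|_2}{2\eta_k t_k}\|w_{k-1} - w^*_{k-1}\|_2^2 + \frac{\eta_k}{t_k}\sum_{i=1}^d \|g^k_{1:t_k,i}\|_2 + \frac{1}{t_k}\sum_{\tau=1}^{t_k} \langle \mathrm{E}g^k_\tau - g^k_\tau,\ w^k_\tau - w^*_{k-1} \rangle\right].$$
By the values of $\eta_k = \theta\sqrt{\epsilon_k/\lambda}$ and $t_k \ge \max\left\{\frac{4(\gamma + \max_i \|g^k_{1:t_k,i}\|_2)}{\theta\sqrt{\lambda\epsilon_k}},\ \frac{2\theta\sum_{i=1}^d \|g^k_{1:t_k,i}\|_2}{\sqrt{\lambda\epsilon_k}}\right\}$, we have
$$\frac{\gamma + \max_i \|g^k_{1:t_k,i}\|_2}{2\eta_k t_k} \le \frac{\lambda}{8}, \qquad \frac{\eta_k}{t_k}\sum_{i=1}^d \|g^k_{1:t_k,i}\|_2 \le \frac{\epsilon_k}{2}.$$
Thus
$$\mathrm{E}_{k|1:k-1}[F(w_k) - F(w^*_{k-1})] \le \mathrm{E}_{k|1:k-1}\left[\frac{\lambda}{8}\|w_{k-1} - w^*_{k-1}\|_2^2 + \frac{\epsilon_k}{2} + \frac{1}{t_k}\sum_{\tau=1}^{t_k} \langle \mathrm{E}g^k_\tau - g^k_\tau,\ w^k_\tau - w^*_{k-1} \rangle\right].$$
Then following similar arguments to those in Proposition 1, we have
$$\mathrm{E}_{k|1:k-1}[F(w_k) - F(w^*_{k-1})] \le \mathrm{E}_{k|1:k-1}\left[\frac{\lambda}{8}\|w_{k-1} - w^*_{k-1}\|_2^2\right] + \frac{\epsilon_k}{2}.$$
Taking expectation over the randomness in stages $1, \ldots, k-1$, we have
$$\mathrm{E}[F(w_k) - F(w^*_{k-1})] \le \mathrm{E}\left[\frac{\lambda}{8}\|w_{k-1} - w^*_{k-1}\|_2^2\right] + \frac{\epsilon_k}{2} \le \frac{1}{4}\mathrm{E}[F(w_{k-1}) - F_*] + \frac{\epsilon_k}{2} \le \frac{\epsilon_{k-1}}{4} + \frac{\epsilon_k}{2} = \epsilon_k.$$
Therefore by induction, we have $\mathrm{E}[F(w_K) - F_*] \le \epsilon_K \le \epsilon$.

3. Proof of Theorem 2

Lemma 2. Consider SCO (4) with the property (2). Let $H_0 = \gamma I$ in Algorithm 3 and $\gamma \ge \max_t \|g_t\|_\infty$. For any $w \in \Omega$ and its closest optimal solution $w_*$, we have
$$F(\hat{w}_T) - F(w) \le \frac{G\|w_1 - w_{T+1}\|_2}{T} + \frac{1}{T}\sum_{t=1}^T (\mathrm{E}g_t - g_t)^\top (w_t - w) + \frac{\eta}{T}\sum_{i=1}^d \|g_{1:T,i}\|_2 + \frac{\gamma + \max_i \|g_{1:T,i}\|_2}{2\eta T}\|w - w_1\|_2^2,$$
where $\hat{w}_T = \sum_{t=2}^{T+1} w_t / T$.

Proof. This proof is similar to the proof of Proposition 1, but we do not take expectation here. For completeness, we give the proof here. Throughout the whole proof, we let $g_t$ denote the stochastic gradient of $f(w_t)$, and as a result $\mathrm{E}g_t \in \partial f(w_t)$. Let $\psi_0(w) = 0$ and $\|x\|_H = \sqrt{x^\top H x}$. First, we can see that $\psi_{t+1}(w) \ge \psi_t(w)$ for any $t \ge 0$. Define $z_t = \sum_{\tau=1}^t g_\tau$ and $\Delta_t = (\partial f(w_t) - g_t)^\top (w_t - w)$. Let $\psi_t^*$ be defined by
$$\psi_t^*(g) = \sup_{x \in \Omega}\left\{ g^\top x - \frac{1}{\eta}\psi_t(x) - t\phi(x) \right\}.$$
Taking the summation of the objective gap over all iterations, we have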

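$$\sum_{t=1}^T \big(f(w_t) - f(w) + \phi(w_{t+1}) - \phi(w)\big) \le \sum_{t=1}^T \big(\partial f(w_t)^\top (w_t - w) + \phi(w_{t+1}) - \phi(w)\big)$$
$$= \sum_{t=1}^T g_t^\top (w_t - w) + \sum_{t=1}^T \big(\phi(w_{t+1}) - \phi(w)\big) + \sum_{t=1}^T \Delta_t$$
$$= \frac{1}{\eta}\psi_T(w) + \sum_{t=1}^T \big(g_t^\top w_t + \phi(w_{t+1})\big) - \Big(\sum_{t=1}^T g_t^\top w + \frac{1}{\eta}\psi_T(w) + T\phi(w)\Big) + \sum_{t=1}^T \Delta_t$$
$$\le \frac{1}{\eta}\psi_T(w) + \sum_{t=1}^T \big(g_t^\top w_t + \phi(w_{t+1})\big) + \sup_{x \in \Omega}\Big\{ -\sum_{t=1}^T g_t^\top x - \frac{1}{\eta}\psi_T(x) - T\phi(x) \Big\} + \sum_{t=1}^T \Delta_t$$
$$= \frac{1}{\eta}\psi_T(w) + \sum_{t=1}^T \big(g_t^\top w_t + \phi(w_{t+1})\big) + \psi_T^*(-z_T) + \sum_{t=1}^T \Delta_t. \tag{3}$$
Note that
$$\psi_T^*(-z_T) = -\sum_{t=1}^T g_t^\top w_{T+1} - \frac{1}{\eta}\psi_T(w_{T+1}) - T\phi(w_{T+1})$$
$$\le -\sum_{t=1}^T g_t^\top w_{T+1} - \frac{1}{\eta}\psi_{T-1}(w_{T+1}) - (T-1)\phi(w_{T+1}) - \phi(w_{T+1})$$
$$\le \sup_{x \in \Omega}\Big\{ -z_T^\top x - \frac{1}{\eta}\psi_{T-1}(x) - (T-1)\phi(x) \Big\} - \phi(w_{T+1}) = \psi_{T-1}^*(-z_T) - \phi(w_{T+1})$$
$$\le \psi_{T-1}^*(-z_{T-1}) - g_T^\top \nabla\psi_{T-1}^*(-z_{T-1}) + \frac{\eta}{2}\|g_T\|_{\psi_{T-1}^*}^2 - \phi(w_{T+1}),$$
where the last inequality uses the fact that $\psi_t(w)$ is 1-strongly convex w.r.t. $\|\cdot\|_{\psi_t} = \|\cdot\|_{H_t}$ and consequently $\psi_t^*(w)$ is $\eta$-smooth w.r.t. $\|\cdot\|_{\psi_t^*} = \|\cdot\|_{H_t^{-1}}$. Thus, since $w_T = \nabla\psi_{T-1}^*(-z_{T-1})$, we have
$$\sum_{t=1}^T g_t^\top w_t + \psi_T^*(-z_T) \le \sum_{t=1}^T g_t^\top w_t + \psi_{T-1}^*(-z_{T-1}) - g_T^\top \nabla\psi_{T-1}^*(-z_{T-1}) + \frac{\eta}{2}\|g_T\|_{\psi_{T-1}^*}^2 - \phi(w_{T+1})$$
$$= \sum_{t=1}^{T-1} g_t^\top w_t + \psi_{T-1}^*(-z_{T-1}) + \frac{\eta}{2}\|g_T\|_{\psi_{T-1}^*}^2 - \phi(w_{T+1}).$$
By repeating this process, we have
$$\sum_{t=1}^T g_t^\top w_t + \psi_T^*(-z_T) \le \psi_0^*(-z_0) + \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_{\psi_{t-1}^*}^2 - \sum_{t=1}^T \phi(w_{t+1}) = \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_{\psi_{t-1}^*}^2 - \sum_{t=1}^T \phi(w_{t+1}). \tag{4}$$
Plugging inequality (4) into inequality (3) and using $\sum_{t=1}^T \big(f(w_t) + \phi(w_{t+1})\big) = \sum_{t=1}^T F(w_t) - \phi(w_1) + \phi(w_{T+1})$, we have
$$\sum_{t=1}^T (F(w_t) - F(w)) \le \frac{1}{\eta}\psi_T(w) + \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_{\psi_{t-1}^*}^2 + \phi(w_1) - \phi(w_{T+1}) + \sum_{t=1}^T \Delta_t.$$
By adding $F(w_{T+1}) - F(w_1)$ to both sides of the above inequality and using the fact that $F(w) = f(w) + \phi(w)$, we get
$$\sum_{t=2}^{T+1} (F(w_t) - F(w)) \le \frac{1}{\eta}\psi_T(w) + \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_{\psi_{t-1}^*}^2 + f(w_{T+1}) - f(w_1) + \sum_{t=1}^T \Delta_t.$$
Following the analysis in (Duchi et al., 2011), we have
$$\sum_{t=1}^T \|g_t\|_{\psi_{t-1}^*}^2 \le 2\sum_{i=1}^d \|g_{1:T,i}\|_2.$$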

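Thus
$$\sum_{t=2}^{T+1} (F(w_t) - F(w)) \le \frac{\gamma}{2\eta}\|w - w_1\|_2^2 + \frac{1}{2\eta}(w - w_1)^\top \mathrm{diag}(s_T)(w - w_1) + \eta\sum_{i=1}^d \|g_{1:T,i}\|_2 + f(w_{T+1}) - f(w_1) + \sum_{t=1}^T \Delta_t$$
$$\le \frac{\gamma + \max_i \|g_{1:T,i}\|_2}{2\eta}\|w - w_1\|_2^2 + \eta\sum_{i=1}^d \|g_{1:T,i}\|_2 + (\partial f(w_{T+1}))^\top (w_{T+1} - w_1) + \sum_{t=1}^T \Delta_t$$
$$\le \frac{\gamma + \max_i \|g_{1:T,i}\|_2}{2\eta}\|w - w_1\|_2^2 + \eta\sum_{i=1}^d \|g_{1:T,i}\|_2 + G\|w_{T+1} - w_1\|_2 + \sum_{t=1}^T \Delta_t,$$
where the last inequality holds by the Cauchy-Schwarz inequality and the fact that $\|\partial f(w_{T+1})\|_2 \le G$. Dividing by $T$ on both sides, we finish the proof by using the convexity of $F(w)$.

Theorem 2. For a given $\epsilon > 0$, let $K = \lceil \log_2(\epsilon_0/\epsilon) \rceil$. Assume $H_0 = \gamma I$ and $\gamma \ge \max_{k,\tau} \|g^k_\tau\|_\infty$, $F(w_0) - F_* \le \epsilon_0$, and $t_k$ is the minimum number such that
$$t_k \ge 3\max\left\{\frac{A_k}{\sqrt{\lambda\epsilon_k}},\ \frac{G\|w^k_1 - w^k_{t_k+1}\|_2}{\epsilon_k}\right\}, \quad \text{where } A_k = \max\left\{\frac{2(\gamma + \max_i \|g^k_{1:t_k,i}\|_2)}{\theta},\ \theta\sum_{i=1}^d \|g^k_{1:t_k,i}\|_2\right\}.$$
Algorithm 4 guarantees that $\mathrm{E}[F(w_K) - F_*] \le \epsilon$.

Proof. This result is proved by revising Lemma 2 to hold for a bounded stopping time $T$ of the supermartingale sequence $X_t$ in (2). Taking the expectation of Lemma 2, we have that
$$\mathrm{E}[F(\hat{w}_T) - F(w)] \le \mathrm{E}\left[\frac{G\|w_1 - w_{T+1}\|_2}{T}\right] + \mathrm{E}\left[\frac{1}{T}\sum_{t=1}^T (\mathrm{E}g_t - g_t)^\top (w_t - w)\right] + \mathrm{E}\left[\frac{\eta}{T}\sum_{i=1}^d \|g_{1:T,i}\|_2 + \frac{\gamma + \max_i \|g_{1:T,i}\|_2}{2\eta T}\|w - w_1\|_2^2\right].$$
Then following the same arguments as in Proposition 1, we have that
$$\mathrm{E}\left[\frac{1}{T}\sum_{t=1}^T (\mathrm{E}g_t - g_t)^\top (w_t - w)\right] = 0.$$
Similar to the induction of Theorem 1, let $\eta_k = \theta\sqrt{\epsilon_k/\lambda}$ and the iteration number $t_k$ in the $k$-th epoch be the smallest number satisfying the following inequalities
$$\frac{\gamma + \max_i \|g^k_{1:t_k,i}\|_2}{2\eta_k t_k} \le \frac{\lambda}{12}, \qquad \frac{\eta_k}{t_k}\sum_{i=1}^d \|g^k_{1:t_k,i}\|_2 \le \frac{\epsilon_k}{3}, \qquad \frac{G\|w^k_1 - w^k_{t_k+1}\|_2}{t_k} \le \frac{\epsilon_k}{3}.$$
Thus conditioned on the $1, \ldots, (k-1)$-th epochs, we have that
$$\mathrm{E}_{k|1:k-1}[F(w_k) - F(w^*_{k-1})] \le \mathrm{E}_{k|1:k-1}\left[\frac{\lambda}{12}\|w_{k-1} - w^*_{k-1}\|_2^2\right] + \frac{2\epsilon_k}{3}.$$
Taking expectation over the randomness in stages $1, \ldots, k-1$, we have
$$\mathrm{E}[F(w_k) - F(w^*_{k-1})] \le \mathrm{E}\left[\frac{\lambda}{12}\|w_{k-1} - w^*_{k-1}\|_2^2\right] + \frac{2\epsilon_k}{3} \le \frac{1}{6}\mathrm{E}[F(w_{k-1}) - F_*] + \frac{2\epsilon_k}{3} \le \frac{\epsilon_{k-1}}{6} + \frac{2\epsilon_k}{3} = \epsilon_k.$$
Therefore by induction, we have $\mathrm{E}[F(w_K) - F_*] \le \epsilon_K \le \epsilon$.

4. Proof of Theorem 3

Theorem 3. Under the same assumptions as Theorem 1 and $F(w_0) - F_* \le \epsilon_0$, where $w_0$ is an initial solution, let $\lambda_1 \ge \lambda$, $\epsilon \le \epsilon_0/2$, $K = \lceil \log_2(\epsilon_0/\epsilon) \rceil$ and
$$t_k^{(s)} = \frac{2}{\sqrt{\lambda_s\epsilon_k}}\max\left\{\frac{2(\gamma + \max_i \|g_{1:t_k,i}\|_2)}{\theta},\ \theta\sum_{i=1}^d \|g_{1:t_k,i}\|_2\right\}.$$
Then with at most a total number of $S = \lceil \log_2(\lambda_1/\lambda) \rceil + 1$ calls of SADAGRAD and a worst-case iteration complexity of $O(1/(\lambda\epsilon))$, Algorithm 5 finds a solution $w^{(S)}$ such that $\mathrm{E}[F(w^{(S)}) - F_*] \le \epsilon$.

Proof. Since $\lambda_1/\lambda \ge 1$, then $F(w_0) - F_* \le (\lambda_1/\lambda)\epsilon_0$. Following the proof of Theorem 1, we can show that
$$\mathrm{E}[F(w^{(1)}) - F_*] \le \frac{\lambda_1}{\lambda}\,\epsilon_0\Big(\frac{1}{2}\Big)^K \le \frac{\lambda_1}{\lambda}\,\epsilon$$
with $K = \lceil \log_2(\epsilon_0/\epsilon) \rceil$ and $t_k^{(1)} = \frac{2}{\sqrt{\lambda_1\epsilon_k}}\max\left\{\frac{2(\gamma + \max_i \|g_{1:t_k,i}\|_2)}{\theta},\ \theta\sum_{i=1}^d \|g_{1:t_k,i}\|_2\right\}$,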

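$k = 1, \ldots, K$. Next, since $\epsilon \le \epsilon_0/2$, we have
$$\mathrm{E}[F(w^{(1)}) - F_*] \le \frac{\lambda_1}{\lambda}\cdot\frac{\epsilon_0}{2} = \frac{\lambda_2}{\lambda}\,\epsilon_0,$$
where $\lambda_{s+1} = \lambda_s/2$. By running SADAGRAD from $w^{(1)}$, Theorem 1 ensures that
$$\mathrm{E}[F(w^{(2)}) - F_*] \le \frac{\lambda_2}{\lambda}\,\epsilon_0\Big(\frac{1}{2}\Big)^K \le \frac{\lambda_2}{\lambda}\,\epsilon.$$
By continuing the process, with $S = \lceil \log_2(\lambda_1/\lambda) \rceil + 1$, we have
$$\mathrm{E}[F(w^{(S)}) - F_*] \le \frac{\lambda_S}{\lambda}\,\epsilon \le \epsilon. \tag{5}$$
The total number of iterations for the $S$ calls of SADAGRAD is upper bounded by
$$T_{\mathrm{total}} = \sum_{s=1}^S \sum_{k=1}^K t_k^{(s)} \le \sum_{s=1}^S \sum_{k=1}^K \frac{C}{\lambda_s\epsilon_k} = \frac{C}{\lambda_1\epsilon_0}\sum_{s=1}^S 2^{s-1}\sum_{k=1}^K 2^k \le \frac{C}{\lambda_1\epsilon_0}\,2^S\,2^{K+1} = O\Big(\frac{1}{\lambda\epsilon}\Big)$$
for some $C > 0$.

Acknowledgement

We thank Prof. Qihe Tang from University of Iowa for his help on the proof of Lemma 1.

References

Duchi, John, Hazan, Elad, and Singer, Yoram. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121-2159, 2011.

Durrett, Rick. Probability: theory and examples. Cambridge University Press, 2010.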