arxiv: v1 [cs.lg] 5 Nov 2018

Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations

Qianxiao Li (liqix@ihpc.a-star.edu.sg)
Institute of High Performance Computing, Agency for Science, Technology and Research, 1 Fusionopolis Way, Connexis North, Singapore

Cheng Tai (chengtai@pku.edu.cn)
Beijing Institute of Big Data Research and Peking University, Beijing, China, 100080

Weinan E (weinan@math.princeton.edu)
Princeton University, Princeton, NJ 08544, USA
Beijing Institute of Big Data Research and Peking University, Beijing, China

Editor:

Abstract

We develop the mathematical foundations of the stochastic modified equations (SME) framework for analyzing the dynamics of stochastic gradient algorithms, where the latter is approximated by a class of stochastic differential equations with small noise parameters. We prove that this approximation can be understood mathematically as a weak approximation, which leads to a number of precise and useful results on the approximations of stochastic gradient descent (SGD), momentum SGD and stochastic Nesterov's accelerated gradient method in the general setting of stochastic objectives. We also demonstrate through explicit calculations that this continuous-time approach can uncover important analytical insights into the stochastic gradient algorithms under consideration that may not be easy to obtain in a purely discrete-time setting.

Keywords: stochastic gradient algorithms, modified equations, stochastic differential equations, momentum, Nesterov's accelerated gradient

1. Introduction

Stochastic gradient algorithms (SGA) are often used to solve optimization problems of the form

$$\min_{x \in \mathbb{R}^d} f(x) := \mathbb{E} f_\gamma(x), \tag{1.1}$$

where $\{f_r : r \in \Gamma\}$ is a family of functions from $\mathbb{R}^d$ to $\mathbb{R}$ and $\gamma$ is a $\Gamma$-valued random variable, with respect to which the expectation is taken (these notions will be made precise in the following sections). For empirical loss minimization in supervised learning applications, $\gamma$ is usually a uniform random variable taking values in $\Gamma = \{1, 2, \dots, n\}$. In this case, $f$ is the total empirical loss function and $f_r$, $r \in \Gamma$, is the loss function due to the $r$-th training sample. In this paper, we shall consider the general situation of an expectation over arbitrary index sets and distributions.

Solving (1.1) using the standard gradient descent (GD) on $x$ gives the iteration scheme

$$x_{k+1} = x_k - \eta \nabla \mathbb{E} f_\gamma(x_k), \tag{1.2}$$

for $k \geq 0$, where $\eta$ is a small positive step-size known as the learning rate. Note that this requires the evaluation of the gradient of an expectation, which can be costly (in the empirical risk minimization case, this happens when $n$ is large). In its simplest form, the stochastic gradient descent (SGD) algorithm replaces the expectation of the gradient with a sampled gradient, i.e.

$$x_{k+1} = x_k - \eta \nabla f_{\gamma_k}(x_k), \tag{1.3}$$

where each $\gamma_k$ is an independent and identically distributed (i.i.d.) random variable with the same distribution as $\gamma$. Under mild conditions, we then have $\mathbb{E}[\nabla f_{\gamma_k}(x_k) \mid x_k] = \nabla \mathbb{E} f_\gamma(x_k)$. In other words, (1.3) is a sampled version of (1.2).

In the literature, many convergence results are available for SGD and its variants (Shamir and Zhang, 2013; Moulines and Bach, 2011; Needell et al., 2014; Xiao and Zhang, 2014; Shalev-Shwartz and Zhang, 2014; Bach and Moulines, 2013; Défossez and Bach, 2015). However, it is often the case that different analysis techniques must be adopted for different variants of the algorithms, and a systematic approach to studying their precise dynamical properties has generally been lacking. In Li et al. (2015), a general approach was introduced to address this problem, in which discrete-time stochastic gradient algorithms are approximated by continuous-time stochastic differential equations with the noise term depending on a small parameter (the learning rate). This can be viewed as a generalization of the method of modified equations (Hirt, 1968; Noh and Protter, 1960; Daly, 1963; Warming and Hyett, 1974) to the stochastic setting, and allows one to employ tools from stochastic calculus to systematically analyze the dynamics of stochastic gradient algorithms. The stochastic modified equations (SME) approach was further developed in Li et al. (2017), where a weak approximation result for the SGD was proved in a finite-sum-objective setting.
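To make the difference between iterations (1.2) and (1.3) concrete, the following minimal sketch runs both on a hypothetical one-dimensional finite-sum objective with $f_r(x) = \frac{1}{2}(x - a_r)^2$. The data values, step size and function names are illustrative assumptions and do not come from the paper.

```python
import random

def grad_f_r(x, a_r):
    """Gradient of the (assumed) sample loss f_r(x) = 0.5 * (x - a_r)**2."""
    return x - a_r

def gradient_descent(x0, data, eta, num_steps):
    """Iteration (1.2): step along the gradient of the full expectation."""
    x = x0
    for _ in range(num_steps):
        full_grad = sum(grad_f_r(x, a) for a in data) / len(data)
        x = x - eta * full_grad
    return x

def stochastic_gradient_descent(x0, data, eta, num_steps, rng):
    """Iteration (1.3): step along the gradient of one sampled f_r."""
    x = x0
    for _ in range(num_steps):
        a = rng.choice(data)  # gamma_k drawn uniformly from the index set
        x = x - eta * grad_f_r(x, a)
    return x

data = [-1.0, 0.0, 2.0, 3.0]  # hypothetical samples a_1..a_n, with mean 1.0
x_gd = gradient_descent(5.0, data, eta=0.1, num_steps=200)
x_sgd = stochastic_gradient_descent(5.0, data, eta=0.1, num_steps=200,
                                    rng=random.Random(0))
print(x_gd, x_sgd)
```

In this toy problem the GD iterate settles at the minimizer of $f$ (the data mean), while the SGD iterate keeps fluctuating around it with a spread controlled by $\eta$; this residual fluctuation is what the diffusion term of the SME is meant to capture.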
The present series of papers builds on the earlier work of Li et al. (2015, 2017) and aims to establish the framework of stochastic modified equations and their applications in greater generality and depth, and to highlight the advantages of this systematic framework for studying stochastic gradient algorithms using continuous-time methods. As the first in the series, this paper will focus on mathematical aspects, namely the main approximation theorems relating stochastic gradient algorithms to stochastic modified equations in the form of weak approximations. These generalize the approximation results in Li et al. (2017) in various aspects. In a subsequent paper in the series, we will discuss the application of this formalism to adaptive stochastic gradient algorithms and related problems.

The organization of this paper is as follows. We first discuss related work in Sec. 2, especially in the context of continuous-time approximations. Next, we motivate the SME approach and set up the precise mathematical framework in Sec. 3. We then prove in Sec. 4 a central result relating discrete stochastic algorithms and continuous stochastic processes, which allows us to derive SMEs for stochastic gradient descent and variants. In Sec. 5, the SME approach is used to analyze the dynamics of stochastic gradient algorithms when applied to optimize a simple yet non-trivial objective. Lastly, we conclude with some discussion of our results in Sec. 6. The longer proofs of the results used in the paper are organized in the appendix. These are essentially self-contained, but basic knowledge of stochastic calculus and probability theory is assumed. Unfamiliar readers may refer to standard introductory texts, such as Durrett (2010) and Oksendal (2013).

1.1 Notation

In this paper, we adhere wherever possible to the following notation. Dimensional indices are written as subscripts with a bracket to avoid confusion with other sequential indices (e.g. time, iteration number), which do not have brackets. When more than one index is present, we separate them with a comma, e.g. $x_{k,(i)}$ is the $i$-th coordinate of the vector $x_k$, the $k$-th member of a sequence. We adopt the Einstein summation convention, where repeated (spatial) indices are summed, i.e. $x_{(i)} x_{(i)} := \sum_{i=1}^{d} x_{(i)} x_{(i)}$. For a matrix $A$, we denote by $\lambda(A) = \{\lambda_1(A), \lambda_2(A), \dots\}$ the set of eigenvalues of $A$. If $A$ is Hermitian, then the eigenvalues are ordered so that $\lambda_1(A)$ denotes a maximum eigenvalue. We denote the usual Euclidean norm by $|\cdot|$, and for higher rank tensors we use the same notation to denote the flattened vector norm (e.g. for matrices it will be the Frobenius norm). The symbol $\wedge$ denotes the minimum operator, i.e. $a \wedge b := \min(a, b)$.

For a probability space (or generally, a measure space) $(\Omega, \mathcal{F}, \mathbb{P})$, the symbol $L^p(\Omega, \mathcal{F}, \mathbb{P})$, $p \in [1, \infty)$, denotes the usual Lebesgue spaces, i.e. $u \in L^p(\Omega, \mathcal{F}, \mathbb{P})$ if

$$\|u\|^p_{L^p(\Omega, \mathcal{F}, \mathbb{P})} := \int_\Omega |u(\omega)|^p \, d\mathbb{P}(\omega) \equiv \mathbb{E}|u|^p < \infty.$$

When the underlying probability space is obvious, we use the shorthand $L^p(\Omega) \equiv L^p(\Omega, \mathcal{F}, \mathbb{P})$. In addition, when $\Omega = \mathbb{R}^d$, we also write the local $L^p$ spaces as $L^p_{\mathrm{loc}}(\mathbb{R}^d)$, which contains $u$ for which $|u|^p$ is integrable on compact subsets of $\mathbb{R}^d$.

Finally, we note that in the proofs of various results, we typically use the letter $C$ (whose value may change across results) to denote a generic positive constant. This is usually independent of the learning rate $\eta$, but if not explicitly stated otherwise, it may depend on e.g.
Lipschitz constants, ambient dimensions, etc.

2. Related work

In this section, we discuss several related works on analyzing discrete-time algorithms using continuous-time approaches. The idea of approximating discrete-time stochastic algorithms by continuous equations dates back to the large body of work known as stochastic approximation theory (Kushner and Yin, 2003; Ljung et al., 2012). These typically establish law of large numbers type results where the limiting equation is an ODE, which can then be used to prove powerful convergence results for the stochastic algorithms under consideration. A notion of convergence in distribution, similar to a central limit theorem, was also studied for the purpose of estimating the rate of convergence of the ODE methods (Kushner, 1978; Kushner and Shwartz, 1984; Kushner and Clark, 2012), where connections between leading order perturbations and Ornstein-Uhlenbeck (OU) processes are established. However, these estimates are not used to systematically study the dynamics of stochastic gradient algorithms.

As far as the authors are aware, the first works on using stochastic differential equations to study the precise properties of stochastic gradient algorithms are the independent works of Li et al. (2015) and Mandt et al. (2015). In Li et al. (2015), a systematic framework of SDE approximation of SGD and SGD with momentum is derived and applied to study dynamical properties of the stochastic algorithms as well as adaptive parameter tuning schemes. These go beyond OU process approximations, and this distinction is important since the OU process is not always the appropriate stochastic approximation in general settings (see Sec. 4.2 of this paper). In Mandt et al. (2015), a similar procedure is employed to derive an SDE approximation for the SGD, from which issues such as the choice of learning rates are studied. Although the concrete analysis in Mandt et al. (2015) is on the restricted case of constant diffusion matrices leading to OU processes, the essential ideas on the general leading order approximation are also discussed. It is important to note that the approximation arguments in both Li et al. (2015) and Mandt et al. (2015) are heuristic from a mathematical point of view.

In Li et al. (2017), the SME approximation is rigorously proved in the finite-sum-objective case with strong regularity conditions, and further asymptotic analysis and tuning algorithms are studied. The SME approach has subsequently been utilized to study variants of stochastic gradient algorithms, including those in the distributed optimization setting (An et al., 2018). The work of Mandt et al. (2015) is further developed in Mandt et al. (2016, 2017), with applications such as the development of scalable MCMC algorithms.

The present paper builds on the earlier work of Li et al. (2015, 2017), but focuses on extending and solidifying the mathematical aspects. In particular, we present an entirely rigorous and self-contained mathematical formulation of the SME framework that applies to more general algorithms (including momentum SGD and stochastic Nesterov's accelerated gradient method) and more general objectives (expectation over random functions, instead of just a finite-sum).
Moreover, various regularity conditions in Li et al. (2017) have been relaxed. The main approximation procedure is inspired by the seminal works of Milstein (1986, 1975) in numerical analysis of stochastic differential equations, but lower regularity conditions are required in our case due to the presence of the small noise parameter, which allows for better truncation of Itô-Taylor expansions. The mathematical analysis of the SME-type approximations for the SGD was also performed in Feng et al. (2017); Hu et al. (2017) using semi-group approaches, although the smoothness requirements presented there are greater than those established using the current methods. Lastly, the Nesterov's accelerated gradient SME we derive in Sec. 4.4 can be viewed as a generalization of the ODE approach in Su et al. (2014) to stochastic gradients, and we show that the presence of noise gives additional features to the dynamics. Finally, we note that continuous-time approximations that establish links between optimization, calculus of variations and symplectic integration have been studied in Wibisono et al. (2016); Betancourt et al. (2018).

3. Stochastic modified equations

We now introduce the stochastic modified equations framework. The starting motivation is the observation that the GD iteration is an (Euler) discretization of the continuous-time, ordinary differential equation

$$\frac{dx}{dt} = -\nabla f(x), \tag{3.1}$$

and studying (3.1) can give us important insights into the dynamics of the discrete-time algorithm for small enough learning rates. The natural question when extending this to SGD is: what is the right continuous-time equation to consider? Below, we begin with some heuristic considerations.

3.1 Heuristic motivation

We rewrite the SGD iteration (1.3) as

$$x_{k+1} = x_k - \eta \nabla f(x_k) + \sqrt{\eta}\, V_k(x_k, \gamma_k), \tag{3.2}$$

where $V_k(x_k, \gamma_k) = \sqrt{\eta}\,(\nabla f(x_k) - \nabla f_{\gamma_k}(x_k))$ is a $d$-dimensional random vector. A straightforward calculation shows that

$$\mathbb{E}[V_k \mid x_k] = 0, \qquad \mathrm{cov}[V_k, V_k \mid x_k] = \eta \Sigma(x_k), \qquad \Sigma(x_k) := \mathbb{E}\big[(\nabla f_{\gamma_k}(x_k) - \nabla f(x_k))(\nabla f_{\gamma_k}(x_k) - \nabla f(x_k))^T \mid x_k\big], \tag{3.3}$$

i.e. conditional on $x_k$, $V_k(x_k)$ has mean $0$ and covariance $\eta \Sigma(x_k)$. Here, $\Sigma$ is simply the conditional covariance of the stochastic gradient approximation $\nabla f_\gamma$ of $\nabla f$.

Now, consider a time-homogeneous Itô stochastic differential equation (SDE) of the form

$$dX_t = b(X_t)\,dt + \sqrt{\eta}\,\sigma(X_t)\,dW_t, \tag{3.4}$$

where $X_t \in \mathbb{R}^d$ for $t \geq 0$ and $W_t$ is a standard $d$-dimensional Wiener process. The function $b : \mathbb{R}^d \to \mathbb{R}^d$ is known as the drift and $\sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d}$ is the diffusion matrix. The key observation is that if we apply the Euler discretization with step-size $\eta$ to (3.4), approximating $X_{k\eta}$ by $\hat{X}_k$, we obtain the following discrete iteration for the latter:

$$\hat{X}_{k+1} = \hat{X}_k + \eta\, b(\hat{X}_k) + \eta\, \sigma(\hat{X}_k) Z_k, \tag{3.5}$$

where $Z_k := \eta^{-1/2}(W_{(k+1)\eta} - W_{k\eta})$ are $d$-dimensional i.i.d. standard normal random variables. Comparing with (3.2), if we set $b = -\nabla f$, $\sigma(x) = \Sigma(x)^{1/2}$ and identify $t$ with $k\eta$, we then have matching first and second conditional moments. Hence, this motivates the approximating equation

$$dX_t = -\nabla f(X_t)\,dt + (\eta \Sigma(X_t))^{1/2}\,dW_t. \tag{3.6}$$

Note that as this heuristic argument shows, the presence of the small parameter $\sqrt{\eta}$ on the diffusion term is necessary to model the fact that when the learning rate decreases, the fluctuations of the SGA iterates must also decrease.

The immediate mathematical question is then: in what sense is an SDE like (3.6) an approximation of (1.3)? Let us now establish the precise mathematical framework in which we can answer this question.
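The moment-matching argument behind (3.5) and (3.6) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it uses a hypothetical scalar objective $f_\gamma(x) = \frac{1}{2}(x - \gamma)^2$ with $\gamma$ uniform on $\{-1, +1\}$, for which $\nabla f(x) = x$ and $\Sigma(x) = 1$, and estimates the conditional mean and variance of one SGD step and of one Euler step of the SDE by Monte Carlo.

```python
import random

ETA = 0.1  # learning rate, also the Euler step size (illustrative value)

def sgd_step(x, rng):
    """One step of (1.3) for the assumed f_gamma(x) = 0.5 * (x - gamma)**2."""
    gamma = rng.choice((-1.0, 1.0))
    return x - ETA * (x - gamma)

def euler_sme_step(x, rng):
    """One Euler step (3.5) with b = -grad f and sigma = Sigma^{1/2}."""
    z = rng.gauss(0.0, 1.0)  # standard normal Z_k
    drift = -x               # -grad f(x) for this toy objective
    sigma = 1.0              # Sigma(x)^{1/2} = 1 for this toy objective
    return x + ETA * drift + ETA * sigma * z

def one_step_moments(step, x, num_samples, seed):
    """Monte Carlo estimate of the conditional mean and variance of one step."""
    rng = random.Random(seed)
    samples = [step(x, rng) for _ in range(num_samples)]
    mean = sum(samples) / num_samples
    var = sum((s - mean) ** 2 for s in samples) / num_samples
    return mean, var

sgd_mean, sgd_var = one_step_moments(sgd_step, 2.0, 100_000, seed=1)
sme_mean, sme_var = one_step_moments(euler_sme_step, 2.0, 100_000, seed=2)
print(sgd_mean, sme_mean, sgd_var, sme_var)
```

Both estimated means should be close to $x - \eta \nabla f(x) = 1.8$ and both variances close to $\eta^2 \Sigma(x) = 0.01$, even though the SGD noise takes only two values while the Euler noise is Gaussian.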

3.2 The mathematical framework

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a sufficiently rich probability space and $(\Gamma, \mathcal{F}_\Gamma)$ be a measure space representing the index space for our random objectives. Let $\gamma : \Omega \to \Gamma$ be a random variable and $(r, x) \mapsto f_r(x)$ a measurable mapping from $\Gamma \times \mathbb{R}^d$ to $\mathbb{R}$. Hence, for each $x$, $f_\gamma(x)$ is a random variable. Throughout this paper, we assume the following facts about $f_\gamma(x)$:

Assumption 3.1 The random variable $f_\gamma(x)$ satisfies

(i) $f_\gamma(x) \in L^1(\Omega)$ for all $x \in \mathbb{R}^d$

(ii) $f_\gamma(x)$ is continuously differentiable in $x$ almost surely and for each $R > 0$, there exists a random variable $M_{R,\gamma}$ such that $\max_{|x| \leq R} |\nabla f_\gamma(x)| \leq M_{R,\gamma}$ almost surely, with $\mathbb{E}|M_{R,\gamma}| < \infty$

(iii) $\nabla f_\gamma(x) \in L^2(\Omega)$ for all $x \in \mathbb{R}^d$

Note that in the empirical risk minimization case where $\Gamma$ is finite, the conditions above are often trivially satisfied. Condition (i) in Assumption 3.1 allows us to define the total objective function we would like to minimize as the expectation

$$f(x) := \mathbb{E} f_\gamma(x) \equiv \int_\Omega f_{\gamma(\omega)}(x)\, d\mathbb{P}(\omega). \tag{3.7}$$

Moreover, Assumption 3.1 (ii) implies via the dominated convergence theorem that $\nabla \mathbb{E} f_\gamma = \mathbb{E} \nabla f_\gamma \equiv \nabla f$.

Now, let $\{\gamma_k : k = 0, 1, \dots\}$ be a sequence of i.i.d. $\Gamma$-valued random variables with the same distribution as $\gamma$. Let $x_0 \in \mathbb{R}^d$ be fixed and define the generalized stochastic gradient iteration as the stochastic process

$$x_{k+1} = x_k + \eta\, h(x_k, \gamma_k, \eta) \tag{3.8}$$

for $k \geq 0$, where $h : \mathbb{R}^d \times \Gamma \times \mathbb{R} \to \mathbb{R}^d$ is a measurable function and $\eta > 0$ is the learning rate. In the simple case of SGD, we have $h(x, r, \eta) = -\nabla f_r(x)$, but we shall consider the generalized version above so that modified equations for SGD variants can also be derived from our approximation theorems.

Next, let us define the class of approximating continuous stochastic processes, which we call stochastic modified equations. Consider the time-homogeneous Itô diffusion process $\{X_t : t \geq 0\}$ represented by the following stochastic differential equation (SDE)

$$dX_t = b(X_t, \eta)\,dt + \sqrt{\eta}\,\sigma(X_t, \eta)\,dW_t, \qquad X_0 = x_0, \tag{3.9}$$

where $\{W_t : t \geq 0\}$ is a standard $d$-dimensional Wiener process independent of $\{\gamma_k\}$, $b : \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}^d$ is the approximating drift vector and $\sigma : \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}^{d \times d}$ is the approximating diffusion matrix.
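As an illustration of the two objects just defined, the sketch below implements the generalized iteration (3.8) for a user-supplied $h$ and an Euler-Maruyama discretization of (3.9) in the scalar case $d = 1$. The function names, the choice of discretization and the toy objective $f_r(x) = \frac{1}{2}(x - r)^2$ are assumptions made for this example only.

```python
import math
import random

def run_sga(h, x0, eta, gammas):
    """Generalized iteration (3.8): x_{k+1} = x_k + eta * h(x_k, gamma_k, eta)."""
    path = [x0]
    for gamma in gammas:
        path.append(path[-1] + eta * h(path[-1], gamma, eta))
    return path

def euler_maruyama(b, sigma, x0, eta, dt, num_steps, rng):
    """Euler-Maruyama discretization of the scalar SDE (3.9) with time step dt."""
    path = [x0]
    for _ in range(num_steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment over dt
        path.append(x + b(x, eta) * dt + math.sqrt(eta) * sigma(x, eta) * dw)
    return path

def sgd_h(x, r, eta):
    """SGD as the special case h = -grad f_r, for the assumed f_r."""
    return -(x - r)

rng = random.Random(0)
gammas = [rng.choice((-1.0, 1.0)) for _ in range(50)]
sgd_path = run_sga(sgd_h, 2.0, 0.1, gammas)
sde_path = euler_maruyama(lambda x, eta: -x, lambda x, eta: 1.0,
                          2.0, 0.1, 0.01, 500, rng)
print(sgd_path[-1], sde_path[-1])
```

Here the SDE is integrated with a time step `dt` smaller than $\eta$, so that 500 steps cover the same time horizon $T = 5$ as 50 SGD steps; the two paths are driven by unrelated noise sources and are not expected to agree pathwise.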
In the following, we will need to pick $b, \sigma$ appropriately so that (3.8) is approximated by (3.9), the sense of which we now describe. First, notice that the stochastic process $\{x_k\}$ induces a probability measure on the product space $\mathbb{R}^d \times \mathbb{R}^d \times \cdots$, whereas $\{X_t\}$ induces a probability measure on $C^0([0, \infty), \mathbb{R}^d)$. Hence, we can only compare their values by sampling a discrete number of points from the latter. Second, the process $\{x_k\}$ is adapted to the filtration generated by $\{\gamma_k\}$ (e.g. in the case of SGD, this is the random sampling of functions in $\{f_r\}$), whereas the process $\{X_t\}$ is adapted to an independent, Wiener filtration. Hence, it is not appropriate to compare individual sample paths. Rather, we define below a sense of weak approximations by comparing the distributions of the two processes.

Definition 1 Let $G$ denote the set of continuous functions $\mathbb{R}^d \to \mathbb{R}$ of at most polynomial growth, i.e. $g \in G$ if there exist positive integers $\kappa_1, \kappa_2 > 0$ such that

$$|g(x)| \leq \kappa_1 (1 + |x|^{2\kappa_2}),$$

for all $x \in \mathbb{R}^d$. Moreover, for each integer $\alpha \geq 1$ we denote by $G^\alpha$ the set of $\alpha$-times continuously differentiable functions $\mathbb{R}^d \to \mathbb{R}$ which, together with their partial derivatives up to and including order $\alpha$, belong to $G$. Note that each $G^\alpha$ is a subspace of $C^\alpha$, the usual space of $\alpha$-times continuously differentiable functions. Moreover, if $g$ depends on additional parameters, we say $g \in G^\alpha$ if the constants $\kappa_1, \kappa_2$ are independent of these parameters, i.e. $g \in G^\alpha$ uniformly. Finally, the definition generalizes to vector-valued functions coordinate-wise in the co-domain.

Definition 2 Let $T > 0$, $\eta \in (0, 1 \wedge T)$, and $\alpha \geq 1$ be an integer. Set $N = \lfloor T/\eta \rfloor$. We say that a continuous-time stochastic process $\{X_t : t \in [0, T]\}$ is an order $\alpha$ weak approximation of a discrete stochastic process $\{x_k : k = 0, \dots, N\}$ if for every $g \in G^{\alpha+1}$, there exists a positive constant $C$, independent of $\eta$, such that

$$\max_{k=0,\dots,N} |\mathbb{E} g(x_k) - \mathbb{E} g(X_{k\eta})| \leq C \eta^\alpha. \tag{3.10}$$

Let us discuss briefly the notion of weak approximation as introduced above. These are approximations of the distribution of sample paths, instead of the sample paths themselves. This is enforced by requiring the expectations of the two processes $\{X_t\}$ and $\{x_k\}$ over a sufficiently large class of test functions to be close. In our definition, the test function class $G^{\alpha+1}$ is quite large, and in particular it includes all polynomials. Thus, Eq. (3.10) implies in particular that all moments of the two processes become close at the rate of $\eta^\alpha$, and hence so must their distributions. The notion of weak approximation must be contrasted with that of strong approximations, where one would for example require (in the case of mean-square approximations)

$$\big[\mathbb{E}|x_k - X_{k\eta}|^2\big]^{1/2} \leq C \eta^\alpha.$$
The above forces the actual sample-paths of the two processes to be close, per realization of the random process, which severely limits its application. In fact, one important advantage of weak approximations is that the approximating SDE process $X_t$ can approximate discrete stochastic processes whose step-wise driving noise is not Gaussian, which is exactly what we need to analyze general stochastic gradient iterations.

4. The approximation theorems

We now present the main approximation theorems. The derivation is based on the following two-step process:

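1. We establish a connection between one-step approximation and approximation on a finite time interval.

2. We construct a one-step approximation that is of order $\alpha + 1$, and so the approximation on a finite interval is of order $\alpha$.

4.1 Relating one-step to N-step approximations

Let us consider generally the question of the relationship between one-step approximations and approximations on a finite interval. Let $T > 0$, $\eta \in (0, 1 \wedge T)$ and $N = \lfloor T/\eta \rfloor$, and recall the general SGA iterations

$$x_{k+1} = x_k + \eta\, h(x_k, \gamma_k, \eta), \qquad x_0 \in \mathbb{R}^d, \quad k = 0, \dots, N, \tag{4.1}$$

and the general candidate family of approximating SDEs

$$dX^{\eta,\epsilon}_t = b(X^{\eta,\epsilon}_t, \eta, \epsilon)\,dt + \sqrt{\eta}\,\sigma(X^{\eta,\epsilon}_t, \eta, \epsilon)\,dW_t, \qquad X^{\eta,\epsilon}_0 = x_0, \quad t \in [0, T], \tag{4.2}$$

where $\epsilon \in (0, 1)$ is a mollification parameter, whose role will become apparent later. To reduce notational clutter and improve readability, unless some limiting procedure is considered, we shall not explicitly write the dependence of $X^{\eta,\epsilon}_t$ on $\eta, \epsilon$ and simply denote by $X_t$ the solution of the above SDE. Let us also denote for convenience $\tilde{X}_k := X_{k\eta}$. Further, let $\{X^{x,s}_t : t \geq s\}$ denote the stochastic process obeying the same equation (4.2), but with the initial condition $X^{x,s}_s = x$. We similarly write $\tilde{X}^{x,l}_k := X^{x,l\eta}_{k\eta}$ and denote by $\{x^{x,l}_k : k \geq l\}$ the stochastic process satisfying (4.1) but with $x_l = x$.

Throughout this section, we assume the following conditions:

Assumption 4.1 The functions $b : \mathbb{R}^d \times (0, 1 \wedge T) \times (0, 1) \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \times (0, 1 \wedge T) \times (0, 1) \to \mathbb{R}^{d \times d}$ satisfy:

1. Uniform linear growth condition

$$|b(x, \eta, \epsilon)|^2 + |\sigma(x, \eta, \epsilon)|^2 \leq L^2 (1 + |x|^2)$$

for all $x \in \mathbb{R}^d$, $\eta \in (0, 1 \wedge T)$, $\epsilon \in (0, 1)$.

2. Uniform Lipschitz condition

$$|b(x, \eta, \epsilon) - b(y, \eta, \epsilon)| + |\sigma(x, \eta, \epsilon) - \sigma(y, \eta, \epsilon)| \leq L |x - y|$$

for all $x, y \in \mathbb{R}^d$, $\eta \in (0, 1 \wedge T)$, $\epsilon \in (0, 1)$.

Note that 2 implies 1 if there is at least one $x$ where the supremum of $b, \sigma$ over $\eta, \epsilon$ is finite. In particular, these conditions imply via Thm. 18 that there exists a unique solution to Eq. (4.2).

Now, let us denote the one-step changes

$$\Delta(x) := x^{x,0}_1 - x, \qquad \tilde{\Delta}(x) := \tilde{X}^{x,0}_1 - x. \tag{4.3}$$

We prove the following result which relates one-step approximations with approximations on a finite time interval.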

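Theorem 3 Let $T > 0$, $\eta \in (0, 1 \wedge T)$, $\epsilon \in (0, 1)$ and $N = \lfloor T/\eta \rfloor$. Let $\alpha \geq 1$ be an integer. Suppose further that the following conditions hold:

(i) There exists a function $\rho : (0, 1) \to \mathbb{R}_+$ and $K_1 \in G$ independent of $\eta, \epsilon$ such that

$$\Big| \mathbb{E} \prod_{j=1}^{s} \Delta_{(i_j)}(x) - \mathbb{E} \prod_{j=1}^{s} \tilde{\Delta}_{(i_j)}(x) \Big| \leq K_1(x)\,(\eta \rho(\epsilon) + \eta^{\alpha+1}),$$

for $s = 1, 2, \dots, \alpha$, and

$$\mathbb{E} \prod_{j=1}^{\alpha+1} |\Delta_{(i_j)}(x)| \leq K_1(x)\, \eta^{\alpha+1},$$

where $i_j \in \{1, \dots, d\}$.

(ii) For each $m \geq 1$, the $2m$-moment of $x^{x,0}_k$ is uniformly bounded with respect to $k$ and $\eta$, i.e. there exists a $K_2 \in G$, independent of $\eta, k$, such that

$$\mathbb{E} |x^{x,0}_k|^{2m} \leq K_2(x),$$

for all $k = 0, \dots, N \equiv \lfloor T/\eta \rfloor$.

Then, for each $g \in G^{\alpha+1}$, there exists a constant $C > 0$, independent of $\eta, \epsilon$, such that

$$\max_{k=0,\dots,N} |\mathbb{E} g(x_k) - \mathbb{E} g(X_{k\eta})| \leq C (\eta^\alpha + \rho(\epsilon)).$$

The proof of Thm. 3 requires a number of technical results that we defer to the appendix. Below, we demonstrate the main ingredients of the proof and refer to the appendix where the proofs of the auxiliary results are fully presented.

Proof In this proof, since there are many conditionings on the initial condition, to prevent nested superscripts we shall introduce the alternative notation $\tilde{X}(x, s) \equiv \tilde{X}^{x,s}$, and similarly for $\tilde{X}_k$ and $x_k$. Fix $g \in G^{\alpha+1}$ and $1 \leq k \leq N$. We have

$$\mathbb{E} g(X_{k\eta}) = \mathbb{E} g(\tilde{X}_k) = \mathbb{E} g(\tilde{X}_k(\tilde{X}_1, 1)) - \mathbb{E} g(\tilde{X}_k(x_1, 1)) + \mathbb{E} g(\tilde{X}_k(x_1, 1)).$$

If $k > 1$, by noting that $\tilde{X}_k(x_1, 1) = \tilde{X}_k(\tilde{X}_2(x_1, 1), 2)$, we get

$$\mathbb{E} g(\tilde{X}_k(x_1, 1)) = \mathbb{E} g(\tilde{X}_k(\tilde{X}_2(x_1, 1), 2)) - \mathbb{E} g(\tilde{X}_k(x_2, 2)) + \mathbb{E} g(\tilde{X}_k(x_2, 2)).$$

Continuing this process, we then have

$$\mathbb{E} g(\tilde{X}_k) = \sum_{l=1}^{k-1} \Big[ \mathbb{E} g(\tilde{X}_k(\tilde{X}_l(x_{l-1}, l-1), l)) - \mathbb{E} g(\tilde{X}_k(x_l, l)) \Big] + \mathbb{E} g(\tilde{X}_k(x_{k-1}, k-1)),$$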

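and hence by subtracting $\mathbb{E} g(x_k) \equiv \mathbb{E} g(x_k(x_{k-1}, k-1))$ we get

$$\mathbb{E} g(\tilde{X}_k) - \mathbb{E} g(x_k) = \sum_{l=1}^{k-1} \Big[ \mathbb{E} g(\tilde{X}_k(\tilde{X}_l(x_{l-1}, l-1), l)) - \mathbb{E} g(\tilde{X}_k(x_l, l)) \Big] + \mathbb{E} g(\tilde{X}_k(x_{k-1}, k-1)) - \mathbb{E} g(x_k(x_{k-1}, k-1)),$$

and so

$$\mathbb{E} g(\tilde{X}_k) - \mathbb{E} g(x_k) = \sum_{l=1}^{k-1} \Big( \mathbb{E}\, \mathbb{E}\big[ g(\tilde{X}_k(\tilde{X}_l(x_{l-1}, l-1), l)) \,\big|\, \tilde{X}_l(x_{l-1}, l-1) \big] - \mathbb{E}\, \mathbb{E}\big[ g(\tilde{X}_k(x_l, l)) \,\big|\, x_l \big] \Big) + \mathbb{E} g(\tilde{X}_k(x_{k-1}, k-1)) - \mathbb{E} g(x_k(x_{k-1}, k-1)).$$

Now, let $u(x, s) = \mathbb{E} g(X_{k\eta}(x, s))$. Then, we have

$$|\mathbb{E} g(\tilde{X}_k) - \mathbb{E} g(x_k)| \leq \sum_{l=1}^{k-1} \big| \mathbb{E} u(\tilde{X}_l(x_{l-1}, l-1), l\eta) - \mathbb{E} u(x_l(x_{l-1}, l-1), l\eta) \big| + \big| \mathbb{E} g(\tilde{X}_k(x_{k-1}, k-1)) - \mathbb{E} g(x_k(x_{k-1}, k-1)) \big|$$

$$\leq \sum_{l=1}^{k-1} \mathbb{E} \big| \mathbb{E}[u(\tilde{X}_l(x_{l-1}, l-1), l\eta) \mid x_{l-1}] - \mathbb{E}[u(x_l(x_{l-1}, l-1), l\eta) \mid x_{l-1}] \big| + \mathbb{E} \big| \mathbb{E}[g(\tilde{X}_k(x_{k-1}, k-1)) \mid x_{k-1}] - \mathbb{E}[g(x_k(x_{k-1}, k-1)) \mid x_{k-1}] \big|.$$

Using Prop. 25, $u(\cdot, s) \in G^{\alpha+1}$ uniformly in $s$, $t$, $\eta$ and $\epsilon$. Thus, by Assumption (i) and Lem. 27,

$$|\mathbb{E} g(x_k) - \mathbb{E} g(\tilde{X}_k)| \leq (\eta \rho(\epsilon) + \eta^{\alpha+1}) \Big( \sum_{l=1}^{k-1} \mathbb{E} K_{l-1}(x_{l-1}) + \mathbb{E} K_{k-1}(x_{k-1}) \Big) \leq (\eta \rho(\epsilon) + \eta^{\alpha+1}) \sum_{l=0}^{N} \kappa_{l,1} \big( 1 + \mathbb{E} |x_l|^{2\kappa_{l,2}} \big),$$

where in the last line we used moment estimates from Thm. 19. Finally, using Assumption (ii) and the fact that $N \leq T/\eta$, we have

$$|\mathbb{E} g(x_k) - \mathbb{E} g(X_{k\eta})| = |\mathbb{E} g(x_k) - \mathbb{E} g(\tilde{X}_k)| \leq C (\rho(\epsilon) + \eta^\alpha).$$

4.2 SME for stochastic gradient descent

Thm. 3 allows us to prove the main approximation results for the current paper. In particular, in this section we derive a second-order accurate weak approximation for the simple SGD iterations (1.3), from which a simpler, first-order accurate approximation also follows. As seen in Thm. 3, we need only verify the conditions (i)-(ii) in order to prove the weak approximation results. These conditions mostly involve moment estimates, which we now perform.

To simplify presentation, we introduce the following shorthand. Whenever we write

$$\psi(x) = \psi_0(x) + \eta \psi_1(x) + O(r(\eta, \epsilon)),$$

for some remainder term $r(\eta, \epsilon)$, we mean: there exists $K \in G$ independent of $\eta, \epsilon$ such that

$$|\psi(x) - \psi_0(x) - \eta \psi_1(x)| \leq K(x)\, r(\eta, \epsilon).$$

Now, let us set in (4.2)

$$b(x, \eta, \epsilon) = b_0(x, \epsilon) + \eta\, b_1(x, \epsilon), \qquad \sigma(x, \eta, \epsilon) = \sigma_0(x, \epsilon),$$

where $b_0, b_1, \sigma_0$ are functions to be determined. We have the following moment estimate.

Lemma 4 Let $\tilde{\Delta}(x)$ be defined as in (4.3). Suppose further that $b_0, b_1, \sigma_0 \in G^3$. Then we have

(i) $\mathbb{E} \tilde{\Delta}_{(i)}(x) = b_0(x, \epsilon)_{(i)}\, \eta + \big[ \tfrac{1}{2} b_0(x, \epsilon)_{(j)} \partial_{(j)} b_0(x, \epsilon)_{(i)} + b_1(x, \epsilon)_{(i)} \big] \eta^2 + O(\eta^3)$,

(ii) $\mathbb{E} \tilde{\Delta}_{(i)}(x) \tilde{\Delta}_{(j)}(x) = \big[ b_0(x, \epsilon)_{(i)} b_0(x, \epsilon)_{(j)} + \sigma_0(x, \epsilon)_{(i,k)} \sigma_0(x, \epsilon)_{(j,k)} \big] \eta^2 + O(\eta^3)$,

(iii) $\mathbb{E} \prod_{j=1}^{3} |\tilde{\Delta}_{(i_j)}(x)| = O(\eta^3)$.

Proof To obtain (i)-(iii), we simply apply Lem. 28 with $\psi(z) = \prod_{j=1}^{s} (z_{(i_j)} - x_{(i_j)})$ for $s = 1, 2, 3$ respectively.

Next, we estimate the moments of the SGA iterations below.

Lemma 5 Let $\Delta(x)$ be defined as in (4.3) with the SGD iterations, i.e. $h(x, r, \eta) = -\nabla f_r(x)$. Suppose that for each $x \in \mathbb{R}^d$, $f \in G^1$. Then,

(i) $\mathbb{E} \Delta_{(i)}(x) = -\partial_{(i)} f(x)\, \eta$,

(ii) $\mathbb{E} \Delta_{(i)}(x) \Delta_{(j)}(x) = \partial_{(i)} f(x)\, \partial_{(j)} f(x)\, \eta^2 + \Sigma(x)_{(i,j)}\, \eta^2$,

(iii) $\mathbb{E} \prod_{j=1}^{3} |\Delta_{(i_j)}(x)| = O(\eta^3)$,

where $\Sigma(x) := \mathbb{E} (\nabla f_\gamma(x) - \nabla f(x))(\nabla f_\gamma(x) - \nabla f(x))^T$.

Proof We have $\Delta(x) = -\eta \nabla f_\gamma(x)$. Taking expectations, the results then follow.

We now prove the main approximation theorem for the simple SGD. Before presenting the statement and proof, we shall note a few technical issues that prevent the direct application of Thm. 3 with the moment estimates in Lem. 4 and 5. The latter suggest ignoring $\epsilon$ and setting

$$b_0(x, \epsilon) = -\nabla f(x), \qquad b_1(x, \epsilon) = -\tfrac{1}{4} \nabla |\nabla f(x)|^2, \qquad \sigma_0(x, \epsilon) = \Sigma(x)^{1/2}.$$

Indeed, with $b_0 = -\nabla f$ the $\eta^2$ coefficient in Lem. 4 (i) is $\tfrac{1}{2} \partial_{(j)} f\, \partial_{(j)} \partial_{(i)} f + b_{1,(i)} = \tfrac{1}{4} \partial_{(i)} |\nabla f|^2 + b_{1,(i)}$, which vanishes for this choice of $b_1$, as required by Lem. 5 (i). Then, we would see from Lem. 4 and 5 that the SGD and the SDE have matching moments up to $O(\eta^3)$.

The first issue with this approach is that even if $\Sigma(x)$ is sufficiently smooth (which may follow from the regularity of $\nabla f_\gamma$), the smoothness of $\Sigma(x)^{1/2}$ cannot be guaranteed unless $\Sigma(x)$ is positive-definite, which is often too strong an assumption in practice and excludes interesting cases where $\Sigma(x)$ is a singular diffusion matrix. However, the results in Sec. 4.1 require smoothness. Second, we would like to consider functions $f_\gamma$ that may not have the higher strong derivatives required by the Lemma, beyond those required to define the modified equation itself. To fix both of these issues, we will use a simple mollifying technique. This is the reason for the inclusion of the $\epsilon$ parameter in the results in Sec. 4.1.

Definition 6 Let us denote by $\nu : \mathbb{R}^d \to \mathbb{R}$, $\nu \in C_c^\infty(\mathbb{R}^d)$, the standard mollifier

$$\nu(x) := \begin{cases} C \exp\!\Big( -\dfrac{1}{1 - |x|^2} \Big) & |x| < 1, \\ 0 & |x| \geq 1, \end{cases}$$

where $C := \big( \int_{|y| < 1} \exp( -1/(1 - |y|^2) )\, dy \big)^{-1}$ is chosen so that the integral of $\nu$ is 1. Further, define $\nu^\epsilon(x) = \epsilon^{-d} \nu(x/\epsilon)$. Let $\psi \in L^1_{\mathrm{loc}}(\mathbb{R}^d)$ be locally integrable, then we may define its mollification by

$$\psi^\epsilon(x) := (\nu^\epsilon * \psi)(x) = \int_{\mathbb{R}^d} \nu^\epsilon(x - y) \psi(y)\, dy = \int_{B(0, \epsilon)} \nu^\epsilon(y) \psi(x - y)\, dy,$$

where $B(z, \epsilon)$ is the $d$-dimensional ball of radius $\epsilon$ centered at $z$. The mollifications of vector (or matrix) valued functions are defined element-wise.

The mollifier has very useful properties. In particular, we will use the following well-known facts (see e.g. Evans (2010) for proof):

(i) If $\psi \in L^1_{\mathrm{loc}}(\mathbb{R}^d)$, then $\psi^\epsilon \in C^\infty(\mathbb{R}^d)$

(ii) $\psi^\epsilon(x) \to \psi(x)$ as $\epsilon \to 0$ for almost every $x \in \mathbb{R}^d$ (with respect to the Lebesgue measure)

(iii) If $\psi$ is continuous, then $\psi^\epsilon(x) \to \psi(x)$ as $\epsilon \to 0$ uniformly on compact subsets of $\mathbb{R}^d$

Next, we make use of the idea of weak derivatives.

Definition 7 Let $\Psi \in L^1_{\mathrm{loc}}(\mathbb{R}^d)$ and $J$ be a multi-index of order $|J|$. Suppose that there exists a $\psi \in L^1_{\mathrm{loc}}(\mathbb{R}^d)$ such that

$$\int_{\mathbb{R}^d} \Psi(x)\, \nabla^J \phi(x)\, dx = (-1)^{|J|} \int_{\mathbb{R}^d} \psi(x) \phi(x)\, dx$$

for all $\phi \in C_c^\infty$. Then, we call $\psi$ the order $J$ weak derivative of $\Psi$ and write $D^J \Psi = \psi$.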
Note that when it exists, the weak derivative is unique almost everywhere and if $\Psi$ is differentiable, $\nabla^J \Psi = D^J \Psi$ almost everywhere (Evans, 2010).
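As a small numerical illustration of Definition 6, the sketch below mollifies the non-smooth function $\psi(x) = |x|$ in dimension $d = 1$, approximating the integrals with a midpoint rule. The quadrature rule, its resolution and all function names are assumptions made for this example.

```python
import math

def bump(x):
    """Unnormalized standard mollifier exp(-1 / (1 - x^2)) on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def midpoint_integral(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# Normalizing constant C of Definition 6 for d = 1 (numerical approximation).
NORMALIZER = 1.0 / midpoint_integral(bump, -1.0, 1.0)

def mollify(psi, x, eps):
    """psi^eps(x) = int_{B(0,eps)} nu^eps(y) psi(x - y) dy, nu^eps(y) = nu(y/eps)/eps."""
    def integrand(y):
        return NORMALIZER * bump(y / eps) / eps * psi(x - y)
    return midpoint_integral(integrand, -eps, eps)

for eps in (0.5, 0.1, 0.02):
    print(eps, mollify(abs, 0.0, eps), mollify(abs, 1.0, eps))
```

At the kink $x = 0$ the mollified value is proportional to $\epsilon$ and tends to $\psi(0) = 0$ as $\epsilon \to 0$, consistent with property (iii), while at $x = 1$ (where $|x|$ is linear on the whole ball $B(1, \epsilon)$ for $\epsilon < 1$) the symmetric mollifier leaves the value unchanged.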

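The introduction of weak derivatives motivates the definition of the weak version of the function spaces $G^\alpha$.

Definition 8 For $\alpha \geq 1$, we define the space $G^\alpha_w$ to be the subspace of $L^1_{\mathrm{loc}}(\mathbb{R}^d)$ such that if $g \in G^\alpha_w$, then $g$ has weak derivatives up to order $\alpha$ and for each multi-index $J$ with $|J| \leq \alpha$, there exist positive integers $\kappa_1, \kappa_2$ such that

$$|D^J g(x)| \leq \kappa_1 (1 + |x|^{2\kappa_2}) \quad \text{for a.e. } x \in \mathbb{R}^d.$$

As in Def. 1, if $g$ depends on additional parameters, we say that $g \in G^\alpha_w$ if the above constants do not depend on the additional parameters. Also, vector-valued $g$ are defined as above element-wise in the co-domain. Note that $G^\alpha_w$ is a subspace of the Sobolev space $W^{\alpha,1}_{\mathrm{loc}}$.

Theorem 9 Let $T > 0$, $\eta \in (0, 1 \wedge T)$ and set $N = \lfloor T/\eta \rfloor$. Let $\{x_k : k \geq 0\}$ be the SGD iterations defined in (1.3). Suppose the following conditions are met:

(i) $f \equiv \mathbb{E} f_\gamma$ is twice continuously differentiable, $\nabla |\nabla f|^2$ is Lipschitz, and $f \in G^4_w$.

(ii) $\nabla f_\gamma$ satisfies a Lipschitz condition:

$$|\nabla f_\gamma(x) - \nabla f_\gamma(y)| \leq L_\gamma |x - y| \quad \text{a.s.}$$

for all $x, y \in \mathbb{R}^d$, where $L_\gamma$ is a random variable which is positive a.s. and $\mathbb{E} L_\gamma^m < \infty$ for each $m \geq 1$.

Define $\{X_t : t \in [0, T]\}$ as the stochastic process satisfying the SDE

$$dX_t = -\nabla \big( f(X_t) + \tfrac{1}{4} \eta |\nabla f(X_t)|^2 \big)\, dt + \sqrt{\eta}\, \Sigma(X_t)^{1/2}\, dW_t, \qquad X_0 = x_0, \tag{4.4}$$

with $\Sigma(x) = \mathbb{E} (\nabla f_\gamma(x) - \nabla f(x))(\nabla f_\gamma(x) - \nabla f(x))^T$. Then, $\{X_t : t \in [0, T]\}$ is an order-2 weak approximation of the SGD, i.e. for each $g \in G^3$, there exists a constant $C > 0$ independent of $\eta$ such that

$$\max_{k=0,\dots,N} |\mathbb{E} g(x_k) - \mathbb{E} g(X_{k\eta})| \leq C \eta^2.$$

Proof First, we check that Eq. (4.4) admits a unique solution, which amounts to checking the conditions in Thm. 18. Note that the Lipschitz condition (ii) implies $\nabla f$ is Lipschitz with constant $\mathbb{E} L_\gamma$. To see that $\Sigma(x)^{1/2}$ is also Lipschitz, observe that $u(x) := \nabla f_\gamma(x) - \nabla f(x)$ is Lipschitz (in the sense of (ii), with constant at most $L_\gamma + \mathbb{E} L_\gamma$), and

$$|\Sigma(x)^{1/2} - \Sigma(y)^{1/2}| = \Big| \big\| [u(x) u(x)^T]^{1/2} \big\|_{L^2(\Omega)} - \big\| [u(y) u(y)^T]^{1/2} \big\|_{L^2(\Omega)} \Big| \leq \big\| [u(x) u(x)^T]^{1/2} - [u(y) u(y)^T]^{1/2} \big\|_{L^2(\Omega)}.$$

Moreover, observe that for vectors $u \in \mathbb{R}^d$ the mapping $u \mapsto (u u^T)^{1/2} = u u^T / |u|$ is Lipschitz, which implies

$$|\Sigma(x)^{1/2} - \Sigma(y)^{1/2}| \leq L \| u(x) - u(y) \|_{L^2(\Omega)} \leq L |x - y|.$$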

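The Lipschitz conditions on the drift and the diffusion matrix imply uniform linear growth, so by Thm. 18, Eq. (4.4) admits a unique solution.

For each $\epsilon \in (0, 1)$, define the mollified functions

$$b_0(x, \epsilon) = -\nu^\epsilon * \nabla f(x), \qquad b_1(x, \epsilon) = -\tfrac{1}{4}\, \nu^\epsilon * (\nabla |\nabla f(x)|^2), \qquad \sigma_0(x, \epsilon) = \nu^\epsilon * \Sigma(x)^{1/2}.$$

Observe that $b_0 + \eta b_1$ and $\sigma_0$ satisfy a Lipschitz condition in $x$ uniformly in $\eta, \epsilon$. To see this, note that for any Lipschitz function $\psi$ with constant $L$, we have

$$|\nu^\epsilon * \psi(x) - \nu^\epsilon * \psi(y)| \leq \int_{B(0, \epsilon)} \nu^\epsilon(z)\, |\psi(x - z) - \psi(y - z)|\, dz \leq L |x - y|,$$

which proves $b_0 + \eta b_1$ and $\sigma_0$ are uniformly Lipschitz. Similarly, the linear growth condition follows. Hence, we may define a family of stochastic processes $\{X^\epsilon_t : \epsilon \in (0, 1)\}$ satisfying

$$dX^\epsilon_t = \big[ b_0(X^\epsilon_t, \epsilon) + \eta\, b_1(X^\epsilon_t, \epsilon) \big]\, dt + \sqrt{\eta}\, \sigma_0(X^\epsilon_t, \epsilon)\, dW_t, \qquad X^\epsilon_0 = x_0,$$

each of which admits a unique solution by Thm. 18.

Now, we claim that $b_0(\cdot, \epsilon), b_1(\cdot, \epsilon), \sigma_0(\cdot, \epsilon) \in G^3$ uniformly in $\epsilon$. To see this, simply observe that mollifications are smooth, and moreover, the polynomial growth is satisfied since $\nu^\epsilon * D^J \psi = \nabla^J (\nu^\epsilon * \psi)$ and furthermore, if $\psi \in G$, then we have

$$|\psi^\epsilon(x)| \leq \int_{B(0, \epsilon)} \nu^\epsilon(y)\, |\psi(x - y)|\, dy \leq \kappa_1 \Big( 1 + 2^{2\kappa_2 - 1} |x|^{2\kappa_2} + 2^{2\kappa_2 - 1} \frac{C}{\epsilon^d} \int_{B(0, \epsilon)} |y|^{2\kappa_2}\, dy \Big).$$

But $\int_{B(0, \epsilon)} |y|^{2\kappa_2}\, dy \leq \mathrm{Vol}(B(0, \epsilon)) = C \epsilon^d$, where $C$ is independent of $\epsilon$. This shows that $\psi^\epsilon \in G$ uniformly in $\epsilon$. This immediately implies that $b_0(\cdot, \epsilon), b_1(\cdot, \epsilon), \sigma_0(\cdot, \epsilon) \in G^3$.

Now, since $b_0(x, \epsilon) \to b_0(x, 0)$ (and similarly for $b_1, \sigma_0$), and the limits are continuous, by Lem. 4, 5, 29, 30 all conditions of Thm. 3 are satisfied, and hence we conclude that for each $g \in G^3$, we have

$$\max_{k=0,\dots,N} |\mathbb{E} g(X^\epsilon_{k\eta}) - \mathbb{E} g(x_k)| \leq C (\eta^2 + \rho(\epsilon)),$$

where $C$ is independent of $\eta$ and $\epsilon$, and $\rho(\epsilon) \to 0$ as $\epsilon \to 0$. Moreover, since $b_0(x, \epsilon) \to b_0(x, 0)$ (and similarly for $b_1, \sigma_0$) uniformly on compact sets, we may apply Thm. 20 to conclude that

$$\sup_{t \in [0, T]} \mathbb{E} |X^\epsilon_t - X_t|^2 \to 0 \quad \text{as } \epsilon \to 0.$$

Thus, we have

$$|\mathbb{E} g(X_{k\eta}) - \mathbb{E} g(x_k)| \leq |\mathbb{E} g(X^\epsilon_{k\eta}) - \mathbb{E} g(x_k)| + |\mathbb{E} g(X^\epsilon_{k\eta}) - \mathbb{E} g(X_{k\eta})| \leq C (\eta^2 + \rho(\epsilon)) + \big( \mathbb{E} |X^\epsilon_{k\eta} - X_{k\eta}|^2 \big)^{1/2} \Big( \int_0^1 \mathbb{E} \big| \nabla g(\lambda X^\epsilon_{k\eta} + (1 - \lambda) X_{k\eta}) \big|^2\, d\lambda \Big)^{1/2}.$$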

Using Thm. 19 and the assumption that $|\nabla g|^2 \in G$, the last expectation is finite and hence taking the limit $\epsilon \to 0$ yields our result.

By going for a lower order approximation, we of course have the following:

Corollary 10 Assume the same conditions as in Thm. 9, except that we replace (i) with

(i') $f \equiv \mathbb{E} f_\gamma$ is continuously differentiable, and $f \in G^3_w$.

Define $\{X_t : t \in [0, T]\}$ as the stochastic process satisfying the SDE

$$dX_t = -\nabla f(X_t)\, dt + \sqrt{\eta}\, \Sigma(X_t)^{1/2}\, dW_t, \qquad X_0 = x_0, \tag{4.5}$$

with $\Sigma(x) = \mathbb{E} (\nabla f_\gamma(x) - \nabla f(x))(\nabla f_\gamma(x) - \nabla f(x))^T$. Then, $\{X_t : t \in [0, T]\}$ is an order-1 weak approximation of the SGD, i.e. for each $g \in G^2$, there exists a constant $C > 0$ independent of $\eta$ such that

$$\max_{k=0,\dots,N} |\mathbb{E} g(X_{k\eta}) - \mathbb{E} g(x_k)| \leq C \eta.$$

Remark 11 In the above results, the most restrictive condition is probably the Lipschitz condition on $\nabla f_\gamma$. Such Lipschitz conditions are important to ensure that the SMEs admit unique strong solutions and that the SGA has uniformly bounded moments. Note that following similar techniques in SDE analysis (e.g. Kloeden and Platen (2011)), these global conditions may be relaxed to their respective local versions if we assume in addition a uniform global linear growth condition on $\nabla f_\gamma$. Finally, for applications, typical loss functions have inward pointing gradients for all sufficiently large $x$, meaning that the SGD iterates will be uniformly bounded almost surely. Thus, we may simply modify the loss functions for large $x$ (without affecting the SGA iterates) to satisfy the conditions above.

Remark 12 The constant $C$ does not depend on $\eta$, but as evidenced in the proof of the theorem, it generally depends on $g$, $T$, $d$ and the various Lipschitz constants. For the fairly general situation we consider, we do not derive tight estimates of these dependencies.

4.3 SME for stochastic gradient descent with momentum

Let us discuss the corresponding SME for a popular variant of the SGD called the momentum SGD (MSGD). The momentum SGD augments the usual SGD iterations with a memory term. In the usual form, we have the iterations

$$\hat{v}_{k+1} = \hat{\mu} \hat{v}_k - \hat{\eta} \nabla f_{\gamma_k}(x_k), \qquad x_{k+1} = x_k + \hat{v}_{k+1},$$

where $\hat{\mu} \in (0, 1)$ (typically close to 1) is called the momentum parameter and $\hat{\eta}$ is the learning rate.
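The momentum iteration in its usual (unscaled) form can be sketched as follows. This is a minimal illustration on a hypothetical scalar objective $f_r(x) = \frac{1}{2}(x - a_r)^2$; the data, parameter values and function names are assumptions, not taken from the paper.

```python
import random

DATA = [-1.0, 0.0, 2.0, 3.0]  # hypothetical samples a_1..a_n

def grad_f_r(x, a_r):
    """Gradient of the (assumed) sample loss f_r(x) = 0.5 * (x - a_r)**2."""
    return x - a_r

def momentum_sgd(x0, data, eta_hat, mu_hat, num_steps, rng):
    """Usual-form MSGD: v <- mu_hat * v - eta_hat * grad f_gamma(x); x <- x + v."""
    x, v = x0, 0.0
    for _ in range(num_steps):
        a = rng.choice(data)  # gamma_k drawn uniformly from the index set
        v = mu_hat * v - eta_hat * grad_f_r(x, a)
        x = x + v
    return x

x_msgd = momentum_sgd(5.0, DATA, eta_hat=0.01, mu_hat=0.9,
                      num_steps=500, rng=random.Random(0))
print(x_msgd)
```

Setting `mu_hat = 0.0` removes the memory term and recovers the plain SGD iteration (1.3), which gives a quick consistency check for an implementation.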
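Let us consider a rescaled version of the above that is easier to analyze via continuous-time approximations. We redefine

$$\eta := \sqrt{\hat{\eta}}, \qquad v_k := \hat{v}_k / \sqrt{\hat{\eta}}, \qquad \mu := (1 - \hat{\mu}) / \sqrt{\hat{\eta}} \tag{4.6}$$

to obtain

$$v_{k+1} = v_k - \mu \eta v_k - \eta \nabla f_{\gamma_k}(x_k), \qquad x_{k+1} = x_k + \eta v_{k+1}. \tag{4.7}$$

In view of the rescaling, the range of momentum parameters we consider becomes $\mu \in (0, \hat{\eta}^{-1/2})$, i.e. $\mu \in (0, \eta^{-1})$, which we may replace by $(0, \infty)$ for simplicity.

Let us now derive the SME satisfied by the iterations (4.7). Observe that this is again a special case of (4.1) with $x$ now replaced by $(v, x)$ and

$$h(v, x, \gamma, \eta) = \big( -\mu v - \nabla f_\gamma(x),\; v - \eta \mu v - \eta \nabla f_\gamma(x) \big).$$

In view of Thm. 3 and the results in Sec. 4.2, in order to derive the SME we simply match moments up to order 3. As in Sec. 4.2, let us define the one step difference

$$\Delta(v, x) := \big( v^{v,x,0}_1 - v,\; x^{v,x,0}_1 - x \big). \tag{4.8}$$

The following moment expansions are immediate.

Lemma 13 Let $\Delta(v, x)$ be defined as in (4.8). We have

(i) $\mathbb{E} \Delta_{(i)}(v, x) = \eta \big( -\mu v_{(i)} - \partial_{(i)} f(x),\; v_{(i)} \big) + \eta^2 \big( 0,\; -\mu v_{(i)} - \partial_{(i)} f(x) \big)$,

(ii) $\mathbb{E} \Delta_{(i)}(v, x) \Delta_{(j)}(v, x) = \eta^2 \begin{pmatrix} \mu^2 v_{(i)} v_{(j)} + \mu v_{(i)} \partial_{(j)} f(x) + \mu v_{(j)} \partial_{(i)} f(x) + \Sigma(x)_{(i,j)} + \partial_{(i)} f(x) \partial_{(j)} f(x) & -\mu v_{(i)} v_{(j)} - v_{(j)} \partial_{(i)} f(x) \\ -\mu v_{(i)} v_{(j)} - v_{(i)} \partial_{(j)} f(x) & v_{(i)} v_{(j)} \end{pmatrix} + O(\eta^3)$,

(iii) $\mathbb{E} \prod_{j=1}^{3} |\Delta_{(i_j)}(v, x)| = O(\eta^3)$,

where $\Sigma(x) := \mathbb{E} (\nabla f_\gamma(x) - \nabla f(x))(\nabla f_\gamma(x) - \nabla f(x))^T$.

Proof The proof follows from direct calculation of the moments.

Hence, proceeding exactly as in Sec. 4.2 and using Lem. 4, 13, we see that we may set

$$b_0(v, x) = \big( -\mu v - \nabla f(x),\; v \big),$$

$$b_1(v, x) = -\tfrac{1}{2} \big( \mu [\mu v + \nabla f(x)] - \nabla^2 f(x) v,\; \mu v + \nabla f(x) \big),$$

$$\sigma_0(v, x) = \begin{pmatrix} \Sigma(x)^{1/2} & 0 \\ 0 & 0 \end{pmatrix}$$

in order to match the moments. By similar mollification and limiting arguments as in Thm. 9, we arrive at the following approximation theorem, where we can see that the SME for MSGD takes the form of a Langevin equation.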

Theorem 14 Assume the same conditions as in Thm. 9. Let $\mu > 0$ be fixed and define $\{V_t, X_t : t \in [0, T]\}$ as the stochastic process satisfying the SDE

$$ \begin{aligned} dV_t &= -\big[ (\mu I + \tfrac{1}{2}\eta[\mu^2 I - \nabla^2 f(X_t)]) V_t + (1 + \tfrac{1}{2}\eta\mu)\nabla f(X_t) \big] dt + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,dW_t, & V_0 &= v_0, \\ dX_t &= \big[ (1 - \tfrac{1}{2}\eta\mu) V_t - \tfrac{1}{2}\eta\nabla f(X_t) \big] dt, & X_0 &= x_0, \end{aligned} \tag{4.9} $$

with $\Sigma(x)$ as defined in Thm. 9. Then, $\{(V_t, X_t) : t \in [0, T]\}$ is an order-2 weak approximation of the MSGD. Moreover, if we relax the assumptions to those of Cor. 10, we have the order-1 weak approximation

$$ \begin{aligned} dV_t &= -[\mu V_t + \nabla f(X_t)]\,dt + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,dW_t, & V_0 &= v_0, \\ dX_t &= V_t\,dt, & X_0 &= x_0. \end{aligned} \tag{4.10} $$

Note that by inverting the scaling (4.6), the order-1 SME (4.10) is the formal equation derived in Li et al. (2015).

4.4 SME for a momentum variant: Nesterov accelerated gradient

It follows from the calculations above that we can also obtain the SME for the stochastic gradient version of the Nesterov accelerated gradient (NAG) method (Nesterov, 1983), which we refer to as SNAG. In the non-stochastic case, the NAG method has been analyzed using the ODE approach (Su et al., 2014). Therefore, the derivation in this section can be viewed as a stochastic parallel. The NAG method is sometimes used with stochastic gradients, and hence it is useful to analyze its properties in this setting and compare it to MSGD.

The unscaled NAG iterations are

$$ \hat{v}_{k+1} = \hat{\mu}_k \hat{v}_k - \hat{\eta}\nabla f_{\gamma_k}(x_k + \hat{\mu}_k \hat{v}_k), \qquad x_{k+1} = x_k + \hat{v}_{k+1}, $$

with $\hat{v}_0 = 0$, which differs from the momentum iterations in that the gradient is now evaluated at a predicted position $x_k + \hat{\mu}_k \hat{v}_k$, instead of the original position $x_k$. Moreover, the momentum parameter $\hat{\mu}_k$ is now allowed to vary as $k$ increases, and in fact, the usual choice is

$$ \hat{\mu}_k = \frac{k - 1}{k + 2}. \tag{4.11} $$

This has important links to stability and acceleration in the deterministic case (Nesterov, 1983; Su et al., 2014). In particular, it achieves an $O(1/k^2)$ convergence rate for general convex functions. On the other hand, a constant $\hat{\mu}_k$ is suggested for strongly convex functions (Nesterov, 2013). In the following, we shall first consider the case of a constant momentum parameter with $\hat{\mu}_k \equiv \hat{\mu}$, and then the choice (4.11) subsequently.

Constant momentum. Using the same rescaling in (4.6), we have

$$ v_{k+1} = v_k - \mu\eta v_k - \eta\nabla f_{\gamma_k}(x_k + \eta(1 - \mu\eta) v_k), \qquad x_{k+1} = x_k + \eta v_{k+1}, \tag{4.12} $$

which is again (4.1) with

$$ h(v, x, \gamma, \eta) = \big( -\mu v - \nabla f_\gamma(x + \eta(1 - \mu\eta)v),\; v - \eta\mu v - \eta\nabla f_\gamma(x + \eta(1 - \mu\eta)v) \big). $$

Hence, we have the following moment expansion.

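Lemma 15 Let $\Delta(v, x) := \big( v^{v,x,0}_1 - v,\; x^{v,x,0}_1 - x \big)$. We have

(i) $\mathbb{E}\,\Delta_{(i)}(v, x) = \eta\big( -\mu v_{(i)} - \partial_{(i)} f(x),\; v_{(i)} \big) + \eta^2 \big( -\sum_{j} \partial_{(i)}\partial_{(j)} f(x) v_{(j)},\; -\mu v_{(i)} - \partial_{(i)} f(x) \big) + O(\eta^3)$,

(ii) $\mathbb{E}\,\Delta_{(i)}(v, x)\Delta_{(j)}(v, x) = \eta^2 \begin{pmatrix} \mu^2 v_{(i)} v_{(j)} + \mu v_{(i)}\partial_{(j)} f(x) + \mu v_{(j)}\partial_{(i)} f(x) + \Sigma(x)_{(i,j)} + \partial_{(i)} f(x)\partial_{(j)} f(x) & -\mu v_{(i)} v_{(j)} - v_{(j)}\partial_{(i)} f(x) \\ -\mu v_{(i)} v_{(j)} - v_{(i)}\partial_{(j)} f(x) & v_{(i)} v_{(j)} \end{pmatrix} + O(\eta^3)$,

(iii) $\mathbb{E} \prod_{j=1}^{3} \Delta_{(i_j)}(v, x) = O(\eta^3)$,

where $\Sigma(x) := \mathbb{E}(\nabla f_\gamma(x) - \nabla f(x))(\nabla f_\gamma(x) - \nabla f(x))^T$. Here the gradient, Hessian and covariance are evaluated at $x$; the shift of the evaluation point by $\eta(1 - \mu\eta)v$ only contributes to the $O(\eta^3)$ remainders beyond the Hessian term shown in (i).

Proof The proof follows from direct calculation of the moments and Taylor expansions.

Hence, we may match moments by setting

$$ b_0(v, x) = \big( -\mu v - \nabla f(x),\; v \big), $$

$$ b_1(v, x) = -\tfrac{1}{2} \big( \mu[\mu v + \nabla f(x)] + \nabla^2 f(x) v,\; \mu v + \nabla f(x) \big), $$

$$ \sigma_0(v, x) = \begin{pmatrix} \Sigma(x)^{1/2} & 0 \\ 0 & 0 \end{pmatrix}, $$

from which we obtain the following approximation theorem for SNAG.

Theorem 16 Assume the same conditions as in Thm. 14. Define $\{V_t, X_t : t \in [0, T]\}$ as the stochastic process satisfying the SDE

$$ \begin{aligned} dV_t &= -\big[ (\mu I + \tfrac{1}{2}\eta[\mu^2 I + \nabla^2 f(X_t)]) V_t + (1 + \tfrac{1}{2}\eta\mu)\nabla f(X_t) \big] dt + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,dW_t, & V_0 &= v_0, \\ dX_t &= \big[ (1 - \tfrac{1}{2}\eta\mu) V_t - \tfrac{1}{2}\eta\nabla f(X_t) \big] dt, & X_0 &= x_0, \end{aligned} \tag{4.13} $$

with $\Sigma$ as defined in Thm. 14. Then, $\{(V_t, X_t) : t \in [0, T]\}$ is an order-2 weak approximation of SNAG. Moreover, the same order-1 weak approximation of MSGD in (4.10) holds for the SNAG.

The result above shows that for constant momentum parameters, the modified equations for MSGD and the SNAG are equivalent at leading order, but differ when we consider the second order modified equation. Let us now discuss the case where the momentum parameter is allowed to vary.

Varying momentum. Now let us take $\hat{\mu}_k$ as in (4.11). Then, using the same rescaling arguments, we arrive at

$$ v_{k+1} = v_k - \mu_k\eta v_k - \eta\nabla f_{\gamma_k}(x_k + \eta(1 - \mu_k\eta) v_k), \qquad x_{k+1} = x_k + \eta v_{k+1}, \tag{4.14} $$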

with $\mu_k = 3/(2\eta + k\eta)$. Now, in order to apply our theoretical results to deduce the SME, simply notice that we may introduce an auxiliary scalar variable

$$ z_{k+1} = z_k + \eta, \qquad z_0 = 0. $$

Then, $\mu_k = 3/(2\eta + z_k)$, and hence all terms are no longer explicitly $k$-dependent. Thus we may proceed formally as in the previous section to arrive at the order-1 SME for SNAG with varying momentum

$$ \begin{aligned} dV_t &= -\big[ \tfrac{3}{t} V_t + \nabla f(X_t) \big] dt + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,dW_t, & V_0 &= 0, \\ dX_t &= V_t\,dt, & X_0 &= x_0. \end{aligned} \tag{4.15} $$

This result is formal because the term $3/t$ does not satisfy our global Lipschitz conditions, unless we restrict our interval to some $[t_0, T]$ with $t_0 > 0$, in which case the above result becomes rigorous. Alternatively, some limiting arguments have to be used to establish well-posedness of the equation on $[0, T]$ individually. We shall omit these analyses in the current paper, and only consider (4.15) on some interval $[t_0, T]$, where the initial conditions are then replaced by $(v_{t_0}, x_{t_0})$. As a point of comparison, (4.15) reduces to the ODE derived in Su et al. (2014) if $\Sigma(x) \equiv 0$ (i.e. the gradients are non-stochastic).

5. Applications of the SMEs to the analysis of SGA

In this section, we apply the SME framework developed to analyze the dynamics of the three stochastic gradient algorithm variants discussed above, namely SGD, MSGD and SNAG. We shall focus on simple but non-trivial models where, to a large extent, analytical computations using SME are tractable, giving us key insights into the algorithms that are otherwise difficult to obtain without appealing to the continuous formalism presented in this paper. We consider primarily the following model:

Model: Let $H \in \mathbb{R}^{d \times d}$ be a symmetric, positive definite matrix. Define the sample objective

$$ f_\gamma(x) := \tfrac{1}{2}(x - \gamma)^T H (x - \gamma) - \tfrac{1}{2}\mathrm{Tr}(H), \qquad \gamma \sim N(0, I), \tag{5.1} $$

which gives the total objective $f(x) \equiv \mathbb{E} f_\gamma(x) = \tfrac{1}{2} x^T H x$.

5.1 SME analysis of SGD

We first derive the SME associated with (5.1). For simplicity, we will only consider the order-1 SME (4.5). A direct computation shows that $\Sigma(x) = H^2$ and so the SME for SGD applied to model (5.1) is

$$ dX_t = -H X_t\,dt + \sqrt{\eta}\,H\,dW_t. $$

This is a multi-dimensional Ornstein-Uhlenbeck (OU) process and admits the explicit solution

$$ X_t = e^{-tH}\Big( x_0 + \sqrt{\eta} \int_0^t e^{sH} H\,dW_s \Big). $$
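Because both the SGD iterates on model (5.1) and the OU solution above have second moments that can be written down exactly, the order-1 weak error for the test function $g = f$ can be checked without any sampling. The sketch below is an illustration under stated assumptions: $H$ is taken diagonal (that is, we work in its eigenbasis), the function names are hypothetical, and the SGD moments use the recursion $\mathbb{E}x_{k+1}^2 = (1 - \eta\lambda)^2\,\mathbb{E}x_k^2 + \eta^2\lambda^2$ that follows from $x_{k+1} = (1 - \eta\lambda)x_k + \eta\lambda\gamma_k$ in each coordinate.

```python
import math


def sgd_objective(lams, x0, eta, num_steps):
    """Exact E f(x_k) for SGD on the quadratic model, via the second-moment recursion."""
    m = [x * x for x in x0]  # per-coordinate second moments E[x_i^2]
    for _ in range(num_steps):
        m = [(1 - eta * lam) ** 2 * mi + (eta * lam) ** 2 for lam, mi in zip(lams, m)]
    return 0.5 * sum(lam * mi for lam, mi in zip(lams, m))


def sme_objective(lams, x0, eta, t):
    """E f(X_t) for the order-1 SME, using the OU second moment in each coordinate."""
    total = 0.0
    for lam, x in zip(lams, x0):
        decay = math.exp(-2 * lam * t)
        second_moment = x * x * decay + 0.5 * eta * lam * (1 - decay)
        total += 0.5 * lam * second_moment
    return total


def weak_error(lams, x0, eta, T):
    """|E f(x_k) - E f(X_{k*eta})| at the last step k with k*eta close to T."""
    k = round(T / eta)
    return abs(sgd_objective(lams, x0, eta, k) - sme_objective(lams, x0, eta, k * eta))


# Hypothetical eigenvalues and initial condition; halving eta should roughly halve the error.
lams, x0 = [1.0, 0.5], [1.0, 1.0]
ratio = weak_error(lams, x0, 0.01, 1.0) / weak_error(lams, x0, 0.005, 1.0)
```

A ratio close to 2 is what an order-1 weak approximation predicts for this pair of step sizes.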

Observe that for each $t$, the distribution of $X_t$ is Gaussian. Using Itô isometry, we then deduce the dynamics of the objective function

$$ \begin{aligned} \mathbb{E} f(X_t) &= \tfrac{1}{2} x_0^T H e^{-2tH} x_0 + \tfrac{1}{2}\eta \int_0^t \mathrm{Tr}\big( H^3 e^{-2(t-s)H} \big)\,ds \\ &= \tfrac{1}{2} x_0^T H e^{-2tH} x_0 + \tfrac{1}{4}\eta \sum_{i=1}^{d} \lambda_i^2(H)\big( 1 - e^{-2t\lambda_i(H)} \big). \end{aligned} \tag{5.2} $$

The first term decays linearly with asymptotic rate $2\lambda_d(H)$, and the second term is induced by noise, and its asymptotic value is proportional to the learning rate $\eta$. This is the well-known two-phase behavior of SGD under constant learning rates: an initial descent phase induced by the deterministic gradient flow and an eventual fluctuation phase dominated by the variance of the stochastic gradients. In this sense, the SME makes the same predictions, and in fact we can see that it approximates the SGD iterations well as $\eta$ decreases (Fig. 5.1(a)), according to the rates we derived in Thm. 9 and Cor. 10.

Figure 5.1: SME prediction vs SGD dynamics. (a) SME as a weak approximation of the SGD. We compute the weak error with test function $g$ equal to $f$ (see Thm. 9). As predicted by our analysis, the order-2 SME (4.4) (order-1 SME (4.5)) should give a slope = 2 (1) decrease in error as $\eta$ decreases (note that the x-axis is flipped). The SME solution is computed using an exact formula derived by the application of Itô isometry and the SGD expectation is averaged over 1e6 runs. We took $T = 2.0$. We see that the predictions of Thm. 9 and Cor. 10 hold. (b) Descent rate vs condition number. $H$ is generated with different condition numbers, and the resulting descent rate of SGD is approximately $\propto \kappa(H)^{-1}$, as predicted by the SME.

Moreover, notice that by the identification $t = k\eta$ ($k$ is the SGD iteration number), the SME analysis tells us that the asymptotic linear convergence rate (in $k$, i.e. rate $\sim -\log[\mathbb{E} f(x_k)]/k$) in the descent phase of the SGD is $2\lambda_d(H)\eta$. For numerical stability (even in the non-stochastic case), we usually require $\eta \propto 1/\lambda_1(H)$, thus the maximal descent rate is inversely proportional to the condition number $\kappa(H) = \lambda_1(H)/\lambda_d(H)$. We validate this observation by generating a collection of $H$ with varying condition numbers and applying

SGD with $\eta \propto 1/\lambda_1(H)$. In Fig 5.1(b), we plot the initial descent rate versus the condition number of $H$ and we observe that we indeed have rate $\propto \kappa(H)^{-1}$.

Alternate model. Now, we consider a slight variation of the model (5.1). The goal is to show that the dynamics of SGD (and the corresponding SME) is not always Gaussian-like and thus using the OU process to model the SGD is not always valid. Given the same positive-definite matrix $H$, we diagonalize it in the form $H = Q D Q^T$ where $Q$ is an orthogonal matrix and $D$ is a diagonal matrix of eigenvalues. We then define the sample objective

$$ f_\gamma(x) := \tfrac{1}{2}(Q^T x)^T [D + \mathrm{diag}(\gamma)] (Q^T x), \qquad \gamma \sim N(0, I), \tag{5.3} $$

which gives the same total objective $f(x) \equiv \mathbb{E} f_\gamma(x) = \tfrac{1}{2} x^T H x$. However, we have a different expression for $\Sigma(x)$,

$$ \Sigma(x) = Q\,\mathrm{diag}(Q^T x)^2\,Q^T, $$

which gives the SME

$$ \begin{aligned} dX_t &= -H X_t\,dt + \sqrt{\eta}\,Q\,|\mathrm{diag}(Q^T X_t)|\,Q^T\,dW_t \\ &= -H X_t\,dt + \sqrt{\eta}\,Q\,\mathrm{diag}(Q^T X_t)\,Q^T\,dW_t \quad \text{(in distribution)}. \end{aligned} $$

We can rewrite the above as

$$ dX_t = -H X_t\,dt + \sqrt{\eta} \sum_{l=1}^{d} Q^{(l)} X_t\,dW_t^{(l)}, $$

where $Q^{(l)} = Q\,\mathrm{diag}(Q_{(l,\cdot)})\,Q^T$ and $Q_{(l,\cdot)}$ denotes the $l$-th row of $Q$. By observing that every pair of $\{H, Q^{(1)}, \dots, Q^{(d)}\}$ commutes, we have the explicit solution

$$ X_t = e^{-\frac{1}{2}\eta t + \sqrt{\eta} \sum_{l=1}^{d} Q^{(l)} W_t^{(l)}}\, e^{-tH} x_0, $$

which is a multi-dimensional Black-Scholes (Black and Scholes, 1973) type of stochastic process. In particular, the distribution is not Gaussian for any $t > 0$. Nevertheless, we may take expectations to obtain

$$ \mathbb{E} f(X_t) = \tfrac{1}{2} e^{\eta t} x_0^T H e^{-2tH} x_0. $$

This immediately implies the following interesting behavior: if $\eta < 2\lambda_d(H)$, then $2H - \eta I$ is positive definite and so $\mathbb{E} f(X_t) \to 0$ exponentially at constant, non-zero $\eta$; otherwise, depending on the initial condition $x_0$, the objective may not converge to 0. In particular, if $\eta > 2\lambda_d(H)$ (which happens quite often if the condition number of $H$ is large) and $x_0$ is in general position, then we have asymptotic exponential divergence. This is a variance-induced divergence typically observed in Black-Scholes and geometric Brownian motion type of stochastic processes. The term variance-induced is important here since the deterministic part of the evolution equation is mean-reverting and in fact is identical to the stable OU process studied earlier. In Fig. 5.2(a), (b), we show the correspondence of the SME findings

with the actual dynamics of the SGD iterations. In particular, we see in Fig. 5.2(c) that for small $\eta$, we have exponential convergence of the SGD at constant learning rates, whereas for $\eta > 2\lambda_d(H)$, the SGD iterates start to oscillate wildly and their mean value is dominated by few large values and diverges approximately at the rate predicted by the SME. Note that this divergence is predicted to occur at a finite $\eta$, and from the theory developed so far we cannot conclude that the SME approximation always holds accurately in this regime (the approximation is only guaranteed for $\eta$ sufficiently small). Nevertheless, we observe at least in this model that the variance-induced divergence of the SGD happens as predicted by the SME.

Figure 5.2: SME prediction vs SGD dynamics for the model variant (5.3). (a) Order of convergence of the SME to the SGD. We use the same setup as in Fig. 5.1(a). Observe that our analysis again predicts the correct rate of weak error decay as $\eta$ decreases. (b) SGD paths vs order-1 SME prediction. Solid lines are SME exact solutions and dotted lines are means of SGD paths over 5 runs, and the percentiles are shaded. We observe convergence of $\mathbb{E} f$ at constant $\eta$, and that the sample mean is dominated by few large values, as observed by the deviation of the percentiles from the mean. (c) Variance-induced explosion. As predicted by the SME analysis, if $\eta > 2\lambda_d(H)$ (here, $\lambda_d(H) = .1$), variance-induced instability sets in.
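The threshold $\eta = 2\lambda_d(H)$ can also be compared with the exact second moments of the SGD iterates. The sketch below makes explicit assumptions: it works in the eigenbasis $y = Q^T x$, where the SGD update on model (5.3) reads $y_i \leftarrow (1 - \eta(d_i + \gamma_i))\,y_i$ with $d_i$ the $i$-th eigenvalue, so that $\mathbb{E}y_i^2$ is multiplied by $(1 - \eta d_i)^2 + \eta^2$ per step; the SME factor over one step of length $\eta$ is $e^{-(2d_i - \eta)\eta}$, read off from the expression for $\mathbb{E}f(X_t)$. The function names and the numerical values are hypothetical.

```python
import math


def sgd_growth_factor(d_i, eta):
    """Exact per-step multiplier of E[y_i^2] for SGD on the model variant."""
    return (1 - eta * d_i) ** 2 + eta ** 2


def sme_growth_factor(d_i, eta):
    """Multiplier of E[y_i^2] over a time interval eta under the order-1 SME."""
    return math.exp(-(2 * d_i - eta) * eta)


d_min = 0.1  # hypothetical smallest eigenvalue
below = (sgd_growth_factor(d_min, 0.05), sme_growth_factor(d_min, 0.05))  # eta < 2 * d_min
above = (sgd_growth_factor(d_min, 0.30), sme_growth_factor(d_min, 0.30))  # eta > 2 * d_min
```

In this sketch the exact SGD factor drops below 1 precisely when $\eta < 2d_i/(1 + d_i^2)$, which is close to the SME threshold $2d_i$ when $d_i$ is small.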

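5.2 SME analysis of MSGD

Let us now use the SME to analyze MSGD applied to model (5.1). We have shown earlier that $\Sigma(x) = H^2$. Thus, according to Thm. 14, the order-1 SME for MSGD is

$$ dV_t = -[\mu V_t + H X_t]\,dt + \sqrt{\eta}\,H\,dW_t, \qquad dX_t = V_t\,dt, \tag{5.4} $$

with $X_0 = x_0$ and $V_0 = 0$. If we set $Y_t := (V_t, X_t) \in \mathbb{R}^{2d}$, $U_t$ a $2d$-dimensional Brownian motion with first $d$ coordinates equal to $W_t$, and define the block matrices

$$ A := \begin{pmatrix} \mu I & H \\ -I & 0 \end{pmatrix}, \qquad B := \begin{pmatrix} H & 0 \\ 0 & 0 \end{pmatrix}, \tag{5.5} $$

we can then write (5.4) as

$$ dY_t = -A Y_t\,dt + \sqrt{\eta}\,B\,dU_t, \qquad Y_0 = (0, x_0), $$

which admits the explicit solution

$$ Y_t = e^{-tA}\Big( Y_0 + \sqrt{\eta} \int_0^t e^{sA} B\,dU_s \Big). $$

By Itô isometry, we have

$$ \mathbb{E} f(X_t) = \tfrac{1}{2}\Big[ \big| \mathrm{diag}(0, H)^{1/2} e^{-tA} Y_0 \big|^2 + \eta \int_0^t \big| \mathrm{diag}(0, H)^{1/2} e^{-(t-s)A} B \big|^2\,ds \Big]. \tag{5.6} $$

One can see immediately that a similar two-phase behavior is present, but the property of the descent phase now hinges on the spectral properties of the matrix $A$ (instead of $H$). Before proceeding, we first observe that the eigenvalues of $A$ can be written as

$$ \lambda(A) := \{\Lambda_+, \Lambda_-\}, \qquad \Lambda_{\pm,i} = \tfrac{1}{2}\Big( \mu \pm \sqrt{\mu^2 - 4\lambda_i(H)} \Big), \quad i = 1, 2, \dots, d. \tag{5.7} $$

In particular, $\Re\lambda_i(A) > 0$ for all $i$ as long as $\mu > 0$. We also need the following simple result concerning the decay of the norm of $e^{-tA}$ if all eigenvalues of $A$ have positive real part.

Lemma 17 Let $A$ be a real square matrix such that all eigenvalues have positive real part. Then,

(i) For each $\epsilon > 0$, there exists a constant $C_\epsilon > 0$, independent of $t$ but dependent on $\epsilon$, such that

$$ |e^{-tA}| \le C_\epsilon\, e^{-t(\min_i \Re\lambda_i(A) - \epsilon)}. $$

(ii) If in addition $A$ is diagonalizable, then there exists a constant $C > 0$ independent of $t$ such that

$$ |e^{-tA}| \le C e^{-t \min_i \Re\lambda_i(A)}. $$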

Proof See Appendix E.

With the above results, we can now characterize the decay of the objective under momentum SGD. From expression (5.7), we see that as long as $\mu^2 \ne 4\lambda_i(H)$ for any $i = 1, \dots, d$, $A$ has $2d$ distinct eigenvalues and is hence diagonalizable. We shall hereafter assume that $\mu$ is in general position such that this is the case. Using Lem. 17 and expression (5.6), we arrive at the estimate

$$ \mathbb{E} f(X_t) \le \tfrac{1}{2} C^2 |x_0|^2 \lambda_1(H)\, e^{-2t \min_i \Re\lambda_i(A)} + \frac{\eta C^2 \lambda_1(H)^3}{\min_i \Re\lambda_i(A)} \Big( 1 - e^{-2t \min_i \Re\lambda_i(A)} \Big). \tag{5.8} $$

This result tells us that the convergence rate of the descent phase is now controlled by the minimum real part of the eigenvalues of $A$, instead of the minimum eigenvalue of $H$. In particular, we achieve the best linear convergence rate by maximizing the smallest real part of the eigenvalues of $A$. This leads to the following optimization problem for the optimal convergence rate:

$$ \sup_{\mu \in (0, \infty)} \; \min_{i=1,\dots,d} \; \min_{s \in \{+1, -1\}} \; \Re\Big[ \mu + s\sqrt{\mu^2 - 4\lambda_i(H)} \Big]. $$

Since $H$ is positive definite, the supremum is attained at $\mu^* = 2\sqrt{\lambda_d(H)}$ with the rate also equal to $2\sqrt{\lambda_d(H)}$. However, note that if we take $\mu = \mu^*$ exactly, one can check that $A$ is no longer diagonalizable and, by Lem. 17, the rate is slightly diminished. Thus technically we can take $\mu$ as close to $\mu^*$ as we like (i.e. the rate is as close to $2\sqrt{\lambda_d(H)}$ as we like), but exact equality is not technically deducible from the current results. In Fig. 5.3(c), we demonstrate the optimal choice of $\mu$ and its effect on the convergence rate. Moreover, observe that as $\mu$ increases, the number of complex eigenvalues starts to decrease, and the magnitude of the imaginary part of the complex eigenvalues also decreases. This signifies that increasing $\mu$ causes oscillations to decrease in magnitude and frequency. This is again corroborated by numerical experiments (Fig. 5.3(c)).

Another interesting observation is that by the identification $t = \eta k$, the descent rate (in terms of $k$) is $2\sqrt{\lambda_d(H)}\,\eta$. As before, if we choose the maximal stable learning rate we would have $\hat{\eta} \propto 1/\lambda_1(H)$ ($\hat{\eta} = \eta^2$ according to the scaling introduced in (4.6)).
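The spectral facts used in this discussion can be checked numerically. The sketch below is an illustration, not the paper's code: it assumes $H$ has been diagonalized, so that $A$ decouples into $2 \times 2$ blocks $\begin{pmatrix} \mu & \lambda \\ -1 & 0 \end{pmatrix}$, one for each eigenvalue $\lambda$ of $H$; the function names, the eigenvalues in `lams`, and the grid over $\mu$ are hypothetical.

```python
import cmath


def eigs_A_mode(mu, lam):
    """Eigenvalues (5.7) of the 2x2 block [[mu, lam], [-1, 0]]."""
    root = cmath.sqrt(mu * mu - 4 * lam)
    return (mu + root) / 2, (mu - root) / 2


def char_poly(z, mu, lam):
    """det([[mu - z, lam], [-1, -z]]) = z**2 - mu*z + lam."""
    return (mu - z) * (-z) + lam


def descent_rate(mu, lams):
    """2 * min_i Re(lambda_i(A)), the exponent appearing in (5.8)."""
    return 2 * min(z.real for lam in lams for z in eigs_A_mode(mu, lam))


lams = [0.25, 1.0, 4.0]                      # hypothetical eigenvalues of H
mus = [0.01 * j for j in range(1, 401)]      # grid over the momentum parameter
best_mu = max(mus, key=lambda mu: descent_rate(mu, lams))
best_rate = descent_rate(best_mu, lams)
```

For these eigenvalues the grid search should land near $\mu^* = 2\sqrt{0.25} = 1$ with rate near 1, and the rate drops quickly once $\mu$ exceeds $\mu^*$ because the slowest mode becomes overdamped.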
Thus, for the MSGD iterates we have descent rate $\propto \kappa(H)^{-1/2}$, which is a huge improvement over SGD, whose rate is $\propto \kappa(H)^{-1}$, especially for badly conditioned matrices where $\kappa(H) \gg 1$. In Fig. 5.3(d), we plot the MSGD initial descent rates for varying condition numbers of $H$. Again, we observe that the SME analysis gives the correct characterization of the precise dynamics and recovers the square-root relationship with condition number.

Finally, let us discuss the effect of adding momentum on the asymptotic fluctuations due to noisy gradients. Note that it is not correct to conclude, using Eq. (5.8), that taking $\mu = \mu^*$ also gives the lowest fluctuations. This is because the constant $C$ depends on $\mu$ as well, as is evidenced in the proof of Lem. 17, which shows that $C$ depends on the conditioning of the eigenvector matrix of $A$. To proceed, we do not use the bound (5.8), and instead work with an exact expression.

Figure 5.3: SME prediction vs MSGD dynamics. (a) and (b) SME vs MSGD dynamics at $\mu = .1$ for different learning rates $\eta$. As before, the SME prediction gets better as $\eta$ decreases according to the predicted order. Notice also the presence of oscillations, due to the complex eigenvalues of $A$. (c) Optimal descent rate of the SGD is achieved by the SME prediction $\mu = \mu^*$, which is .95 in this case. Notice that exactly as predicted by the SME, increasing $\mu$ decreases the oscillation frequency and magnitude (due to having fewer complex eigenvalues and smaller imaginary parts), as well as the asymptotic fluctuations (due to formula (5.9)). (d) Descent rate vs condition number. $H$ is generated with different condition numbers, and the descent rate of MSGD is $\propto \kappa(H)^{-1/2}$, as predicted by the SME, which for badly conditioned $H$ gives a large improvement.

Explicitly diagonalizing $A$, after some computations we arrive at the exact expression for $\mathbb{E} f(X_t)$

\begin{align} \mathbb{E} f(X_t) ={}& \tfrac{1}{2} \big| \mathrm{diag}(0, H)^{1/2} e^{-tA} Y_0 \big|^2 \tag{5.9} \\ &+ \tfrac{1}{2}\eta \sum_{i=1}^{d} \frac{\lambda_i^3}{|\mu^2 - 4\lambda_i|} \left[ \frac{1 - e^{-2t\Re\Lambda_{+,i}}}{2\Re\Lambda_{+,i}} + \frac{1 - e^{-2t\Re\Lambda_{-,i}}}{2\Re\Lambda_{-,i}} - 2R(t, \mu, \lambda_i(H)) \right] \tag{5.10} \end{align}

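where

$$ R(t, \mu, \lambda) = \begin{cases} \dfrac{1 - e^{-\mu t}}{\mu}, & \mu \ge 2\sqrt{\lambda}, \\[2ex] \dfrac{\mu + \sqrt{4\lambda - \mu^2}\, e^{-\mu t} \sin\big( t\sqrt{4\lambda - \mu^2} \big) - \mu e^{-\mu t} \cos\big( t\sqrt{4\lambda - \mu^2} \big)}{4\lambda}, & \mu < 2\sqrt{\lambda}. \end{cases} \tag{5.11} $$

In particular, the asymptotic loss value induced by noise is

$$ \lim_{t \to \infty} \mathbb{E} f(X_t) = \tfrac{1}{2}\eta \sum_{i=1}^{d} \frac{\lambda_i(H)^3}{|\mu^2 - 4\lambda_i(H)|} \left[ \frac{1}{2\Re\Lambda_{+,i}} + \frac{1}{2\Re\Lambda_{-,i}} - 2\min\Big\{ \frac{\mu}{4\lambda_i(H)}, \frac{1}{\mu} \Big\} \right]. \tag{5.12} $$

Observe that this function (in fact, each term in the sum) is monotone-decreasing in $\mu$, and for $\mu \ll 1$ it scales like $\mu^{-1}$, and for $\mu \gg 1$ it scales like $\mu^{-3}$. Thus, increasing the momentum parameter decreases the asymptotic noise in the iterates, i.e. decreases the asymptotic value of $\mathbb{E} f$, which should be 0 in the absence of noise. This again agrees with the actual MSGD dynamics (Fig. 5.3(b)). Consequently, to obtain an optimal tradeoff between descent and noise, we would like a momentum schedule that equals $\mu^*$ in the descent phase and increases to infinity (in the original scaling this corresponds to $\hat{\mu} \to 0$) as we approach the optimum. Finding this optimal schedule can be cast as an optimal control problem (Li et al., 2017), and a rigorous investigation of these approaches will be considered in subsequent work.

5.3 SME analysis of SNAG

Finally, let us see what we can say, using the SME approach, about the difference between MSGD and SNAG in this stochastic setting. Let us first consider the case of constant momentum. From Thm. 16, we know that the order-1 SMEs are identical, so we must consider higher order SMEs. A straightforward computation yields the following order-2 SMEs for MSGD and SNAG (again we let $Y_t = (V_t, X_t)$)

$$ \begin{aligned} \text{MSGD:} \quad dY_t &= -A_1 Y_t\,dt + \sqrt{\eta}\,B\,dU_t, & Y_0 &= (0, x_0), \\ \text{SNAG:} \quad dY_t &= -A_2 Y_t\,dt + \sqrt{\eta}\,B\,dU_t, & Y_0 &= (0, x_0), \end{aligned} $$

where $A_i = A + \eta E_i$ with $A$, $B$ as defined in (5.5) and

$$ E_1 := \frac{1}{2}\begin{pmatrix} \mu^2 I - H & \mu H \\ \mu I & H \end{pmatrix}, \qquad E_2 := \frac{1}{2}\begin{pmatrix} \mu^2 I + H & \mu H \\ \mu I & H \end{pmatrix}. $$

From the analysis in Sec. 4.3, the descent rate is dominated by the minimal real part of the eigenvalues of $A_i$, which are respectively

$$ \lambda(A_1) = \Big\{ \tfrac{1}{4}\Big( \mu(\eta\mu + 2) \pm \sqrt{\mu^2(\eta\mu + 2)^2 + 4\eta^2\lambda_i(H)^2 - 8\lambda_i(H)(\eta\mu + 2)} \Big),\; i = 1, \dots, d \Big\}, $$

$$ \lambda(A_2) = \Big\{ \tfrac{1}{4}\Big( \mu(\eta\mu + 2) + 2\eta\lambda_i(H) \pm \sqrt{\eta\mu + 2}\,\sqrt{\mu^2(\eta\mu + 2) + 4\lambda_i(H)(\eta\mu - 2)} \Big),\; i = 1, \dots, d \Big\}. $$

We observe that for small $\mu$ (i.e. $\hat{\mu} \approx 1$ in the usual MSGD scaling), the terms in the square-roots are negative and hence for the same small $\mu$, the convergence rate of SNAG is $\frac{1}{2}\eta\lambda_d(H)$ larger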
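The eigenvalue expressions for $A_1$ and $A_2$ can be verified mode by mode. The sketch below is an illustration under stated assumptions: $H$ is diagonalized, so each eigenvalue $\lambda$ of $H$ gives a $2 \times 2$ block whose entries are read off from the drift terms in (4.9) and (4.13) with $\nabla f(x) = Hx$ and $\nabla^2 f = H$; the function names and the parameter values are hypothetical.

```python
import cmath


def order2_block(mu, lam, eta, sign):
    """Entries (a, b, c, d) of the 2x2 block of A_1 (sign=-1, MSGD) or A_2 (sign=+1, SNAG)."""
    a = mu + 0.5 * eta * (mu * mu + sign * lam)
    b = (1 + 0.5 * eta * mu) * lam
    c = -(1 - 0.5 * eta * mu)
    d = 0.5 * eta * lam
    return a, b, c, d


def claimed_eigs(mu, lam, eta, sign):
    """Closed-form eigenvalues as written in the text for A_1 (sign=-1) and A_2 (sign=+1)."""
    s = eta * mu + 2
    if sign < 0:
        trace_term = mu * s
        root = cmath.sqrt(mu ** 2 * s ** 2 + 4 * eta ** 2 * lam ** 2 - 8 * lam * s)
    else:
        trace_term = mu * s + 2 * eta * lam
        root = cmath.sqrt(s) * cmath.sqrt(mu ** 2 * s + 4 * lam * (eta * mu - 2))
    return (trace_term + root) / 4, (trace_term - root) / 4


def char_poly(z, block):
    """Characteristic polynomial det(M - z I) of the 2x2 block M = [[a, b], [c, d]]."""
    a, b, c, d = block
    return (a - z) * (d - z) - b * c


mu, lam, eta = 0.1, 1.0, 0.1  # hypothetical small-momentum example
msgd_real = claimed_eigs(mu, lam, eta, -1)[0].real
snag_real = claimed_eigs(mu, lam, eta, +1)[0].real
gap = snag_real - msgd_real  # compare with 0.5 * eta * lam
```

With these values both square-root arguments are negative, so the real parts are just the trace terms, and the gap between them is $\frac{1}{2}\eta\lambda$.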


CS 473G Lecture 15: Max-Flow Algorithms and Applications Fall 2005 CS 473G Lecure 1: Max-Flow Algorihm and Applicaion Fall 200 1 Max-Flow Algorihm and Applicaion (November 1) 1.1 Recap Fix a direced graph G = (V, E) ha doe no conain boh an edge u v and i reveral v u,

More information

Use of variance estimation in the multi-armed bandit problem

Use of variance estimation in the multi-armed bandit problem Ue of variance eimaion in he muli-armed bi problem Jean Yve Audiber CERTIS - Ecole de Pon 19, rue Alfred Nobel - Cié Decare 77455 Marne-la-Vallée - France audiber@cerienpcfr Rémi Muno INRIA Fuur, Grappa

More information

Exponential Sawtooth

Exponential Sawtooth ECPE 36 HOMEWORK 3: PROPERTIES OF THE FOURIER TRANSFORM SOLUTION. Exponenial Sawooh: The eaie way o do hi problem i o look a he Fourier ranform of a ingle exponenial funcion, () = exp( )u(). From he able

More information

NECESSARY AND SUFFICIENT CONDITIONS FOR LATENT SEPARABILITY

NECESSARY AND SUFFICIENT CONDITIONS FOR LATENT SEPARABILITY NECESSARY AND SUFFICIENT CONDITIONS FOR LATENT SEPARABILITY Ian Crawford THE INSTITUTE FOR FISCAL STUDIES DEPARTMENT OF ECONOMICS, UCL cemmap working paper CWP02/04 Neceary and Sufficien Condiion for Laen

More information

u(t) Figure 1. Open loop control system

u(t) Figure 1. Open loop control system Open loop conrol v cloed loop feedbac conrol The nex wo figure preen he rucure of open loop and feedbac conrol yem Figure how an open loop conrol yem whoe funcion i o caue he oupu y o follow he reference

More information

ON FRACTIONAL ORNSTEIN-UHLENBECK PROCESSES

ON FRACTIONAL ORNSTEIN-UHLENBECK PROCESSES Communicaion on Sochaic Analyi Vol. 5, No. 1 211 121-133 Serial Publicaion www.erialpublicaion.com ON FRACTIONAL ORNSTEIN-UHLENBECK PROCESSES TERHI KAARAKKA AND PAAVO SALMINEN Abrac. In hi paper we udy

More information

An introduction to the (local) martingale problem

An introduction to the (local) martingale problem An inroducion o he (local) maringale problem Chri Janjigian Ocober 14, 214 Abrac Thee are my preenaion noe for a alk in he Univeriy of Wiconin - Madion graduae probabiliy eminar. Thee noe are primarily

More information

13.1 Accelerating Objects

13.1 Accelerating Objects 13.1 Acceleraing Objec A you learned in Chaper 12, when you are ravelling a a conan peed in a raigh line, you have uniform moion. However, mo objec do no ravel a conan peed in a raigh line o hey do no

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

Algorithms and Data Structures 2011/12 Week 9 Solutions (Tues 15th - Fri 18th Nov)

Algorithms and Data Structures 2011/12 Week 9 Solutions (Tues 15th - Fri 18th Nov) Algorihm and Daa Srucure 2011/ Week Soluion (Tue 15h - Fri 18h No) 1. Queion: e are gien 11/16 / 15/20 8/13 0/ 1/ / 11/1 / / To queion: (a) Find a pair of ube X, Y V uch ha f(x, Y) = f(v X, Y). (b) Find

More information

Let. x y. denote a bivariate time series with zero mean.

Let. x y. denote a bivariate time series with zero mean. Linear Filer Le x y : T denoe a bivariae ime erie wih zero mean. Suppoe ha he ime erie {y : T} i conruced a follow: y a x The ime erie {y : T} i aid o be conruced from {x : T} by mean of a Linear Filer.

More information

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details! MAT 257, Handou 6: Ocober 7-2, 20. I. Assignmen. Finish reading Chaper 2 of Spiva, rereading earlier secions as necessary. handou and fill in some missing deails! II. Higher derivaives. Also, read his

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

Convergence of the gradient algorithm for linear regression models in the continuous and discrete time cases

Convergence of the gradient algorithm for linear regression models in the continuous and discrete time cases Convergence of he gradien algorihm for linear regreion model in he coninuou and dicree ime cae Lauren Praly To cie hi verion: Lauren Praly. Convergence of he gradien algorihm for linear regreion model

More information

FIXED POINTS AND STABILITY IN NEUTRAL DIFFERENTIAL EQUATIONS WITH VARIABLE DELAYS

FIXED POINTS AND STABILITY IN NEUTRAL DIFFERENTIAL EQUATIONS WITH VARIABLE DELAYS PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 136, Number 3, March 28, Page 99 918 S 2-9939(7)989-2 Aricle elecronically publihed on November 3, 27 FIXED POINTS AND STABILITY IN NEUTRAL DIFFERENTIAL

More information

CONTROL SYSTEMS. Chapter 10 : State Space Response

CONTROL SYSTEMS. Chapter 10 : State Space Response CONTROL SYSTEMS Chaper : Sae Space Repone GATE Objecive & Numerical Type Soluion Queion 5 [GATE EE 99 IIT-Bombay : Mark] Conider a econd order yem whoe ae pace repreenaion i of he form A Bu. If () (),

More information

MATH 128A, SUMMER 2009, FINAL EXAM SOLUTION

MATH 128A, SUMMER 2009, FINAL EXAM SOLUTION MATH 28A, SUMME 2009, FINAL EXAM SOLUTION BENJAMIN JOHNSON () (8 poins) [Lagrange Inerpolaion] (a) (4 poins) Le f be a funcion defined a some real numbers x 0,..., x n. Give a defining equaion for he Lagrange

More information

Main Reference: Sections in CLRS.

Main Reference: Sections in CLRS. Maximum Flow Reied 09/09/200 Main Reference: Secion 26.-26. in CLRS. Inroducion Definiion Muli-Source Muli-Sink The Ford-Fulkeron Mehod Reidual Nework Augmening Pah The Max-Flow Min-Cu Theorem The Edmond-Karp

More information

CHAPTER 7. Definition and Properties. of Laplace Transforms

CHAPTER 7. Definition and Properties. of Laplace Transforms SERIES OF CLSS NOTES FOR 5-6 TO INTRODUCE LINER ND NONLINER PROBLEMS TO ENGINEERS, SCIENTISTS, ND PPLIED MTHEMTICINS DE CLSS NOTES COLLECTION OF HNDOUTS ON SCLR LINER ORDINRY DIFFERENTIL EQUTIONS (ODE")

More information

6. Stochastic calculus with jump processes

6. Stochastic calculus with jump processes A) Trading sraegies (1/3) Marke wih d asses S = (S 1,, S d ) A rading sraegy can be modelled wih a vecor φ describing he quaniies invesed in each asse a each insan : φ = (φ 1,, φ d ) The value a of a porfolio

More information

CS4445/9544 Analysis of Algorithms II Solution for Assignment 1

CS4445/9544 Analysis of Algorithms II Solution for Assignment 1 Conider he following flow nework CS444/944 Analyi of Algorihm II Soluion for Aignmen (0 mark) In he following nework a minimum cu ha capaciy 0 Eiher prove ha hi aemen i rue, or how ha i i fale Uing he

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

U T,0. t = X t t T X T. (1)

U T,0. t = X t t T X T. (1) Gauian bridge Dario Gabarra 1, ommi Soinen 2, and Eko Valkeila 3 1 Deparmen of Mahemaic and Saiic, PO Box 68, 14 Univeriy of Helinki,Finland dariogabarra@rnihelinkifi 2 Deparmen of Mahemaic and Saiic,

More information

An Introduction to Malliavin calculus and its applications

An Introduction to Malliavin calculus and its applications An Inroducion o Malliavin calculus and is applicaions Lecure 5: Smoohness of he densiy and Hörmander s heorem David Nualar Deparmen of Mahemaics Kansas Universiy Universiy of Wyoming Summer School 214

More information

Graphs III - Network Flow

Graphs III - Network Flow Graph III - Nework Flow Flow nework eup graph G=(V,E) edge capaciy w(u,v) 0 - if edge doe no exi, hen w(u,v)=0 pecial verice: ource verex ; ink verex - no edge ino and no edge ou of Aume every verex v

More information

Selfish Routing. Tim Roughgarden Cornell University. Includes joint work with Éva Tardos

Selfish Routing. Tim Roughgarden Cornell University. Includes joint work with Éva Tardos Selfih Rouing Tim Roughgarden Cornell Univeriy Include join work wih Éva Tardo 1 Which roue would you chooe? Example: one uni of raffic (e.g., car) wan o go from o delay = 1 hour (no congeion effec) long

More information

Chapter 6. Systems of First Order Linear Differential Equations

Chapter 6. Systems of First Order Linear Differential Equations Chaper 6 Sysems of Firs Order Linear Differenial Equaions We will only discuss firs order sysems However higher order sysems may be made ino firs order sysems by a rick shown below We will have a sligh

More information

16 Max-Flow Algorithms and Applications

16 Max-Flow Algorithms and Applications Algorihm A proce canno be underood by opping i. Underanding mu move wih he flow of he proce, mu join i and flow wih i. The Fir Law of Mena, in Frank Herber Dune (196) There a difference beween knowing

More information

Pathwise description of dynamic pitchfork bifurcations with additive noise

Pathwise description of dynamic pitchfork bifurcations with additive noise Pahwie decripion of dynamic pichfork bifurcaion wih addiive noie Nil Berglund and Barbara Genz Abrac The low drif (wih peed ) of a parameer hrough a pichfork bifurcaion poin, known a he dynamic pichfork

More information

Chapter 3 Boundary Value Problem

Chapter 3 Boundary Value Problem Chaper 3 Boundary Value Problem A boundary value problem (BVP) is a problem, ypically an ODE or a PDE, which has values assigned on he physical boundary of he domain in which he problem is specified. Le

More information

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation:

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation: M ah 5 7 Fall 9 L ecure O c. 4, 9 ) Hamilon- J acobi Equaion: Weak S oluion We coninue he sudy of he Hamilon-Jacobi equaion: We have shown ha u + H D u) = R n, ) ; u = g R n { = }. ). In general we canno

More information

6.302 Feedback Systems Recitation : Phase-locked Loops Prof. Joel L. Dawson

6.302 Feedback Systems Recitation : Phase-locked Loops Prof. Joel L. Dawson 6.32 Feedback Syem Phae-locked loop are a foundaional building block for analog circui deign, paricularly for communicaion circui. They provide a good example yem for hi cla becaue hey are an excellen

More information

LECTURE 1: GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS

LECTURE 1: GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS LECTURE : GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS We will work wih a coninuous ime reversible Markov chain X on a finie conneced sae space, wih generaor Lf(x = y q x,yf(y. (Recall ha q

More information

The multisubset sum problem for finite abelian groups

The multisubset sum problem for finite abelian groups Alo available a hp://amc-journal.eu ISSN 1855-3966 (prined edn.), ISSN 1855-3974 (elecronic edn.) ARS MATHEMATICA CONTEMPORANEA 8 (2015) 417 423 The muliube um problem for finie abelian group Amela Muraović-Ribić

More information

Time Varying Multiserver Queues. W. A. Massey. Murray Hill, NJ Abstract

Time Varying Multiserver Queues. W. A. Massey. Murray Hill, NJ Abstract Waiing Time Aympoic for Time Varying Mulierver ueue wih Abonmen Rerial A. Melbaum Technion Iniue Haifa, 3 ISRAEL avim@x.echnion.ac.il M. I. Reiman Bell Lab, Lucen Technologie Murray Hill, NJ 7974 U.S.A.

More information

Parameter Estimation for Fractional Ornstein-Uhlenbeck Processes: Non-Ergodic Case

Parameter Estimation for Fractional Ornstein-Uhlenbeck Processes: Non-Ergodic Case Parameer Eimaion for Fracional Ornein-Uhlenbeck Procee: Non-Ergodic Cae R. Belfadli 1, K. E-Sebaiy and Y. Ouknine 3 1 Polydiciplinary Faculy of Taroudan, Univeriy Ibn Zohr, Taroudan, Morocco. Iniu de Mahémaique

More information

Systems of nonlinear ODEs with a time singularity in the right-hand side

Systems of nonlinear ODEs with a time singularity in the right-hand side Syem of nonlinear ODE wih a ime ingulariy in he righ-hand ide Jana Burkoová a,, Irena Rachůnková a, Svaolav Saněk a, Ewa B. Weinmüller b, Sefan Wurm b a Deparmen of Mahemaical Analyi and Applicaion of

More information

Chapter 6. Laplace Transforms

Chapter 6. Laplace Transforms Chaper 6. Laplace Tranform Kreyzig by YHLee;45; 6- An ODE i reduced o an algebraic problem by operaional calculu. The equaion i olved by algebraic manipulaion. The reul i ranformed back for he oluion of

More information

NEUTRON DIFFUSION THEORY

NEUTRON DIFFUSION THEORY NEUTRON DIFFUSION THEORY M. Ragheb 4//7. INTRODUCTION The diffuion heory model of neuron ranpor play a crucial role in reacor heory ince i i imple enough o allow cienific inigh, and i i ufficienly realiic

More information

EXERCISES FOR SECTION 1.5

EXERCISES FOR SECTION 1.5 1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler

More information

Physics 127b: Statistical Mechanics. Fokker-Planck Equation. Time Evolution

Physics 127b: Statistical Mechanics. Fokker-Planck Equation. Time Evolution Physics 7b: Saisical Mechanics Fokker-Planck Equaion The Langevin equaion approach o he evoluion of he velociy disribuion for he Brownian paricle migh leave you uncomforable. A more formal reamen of his

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

5.2 GRAPHICAL VELOCITY ANALYSIS Polygon Method

5.2 GRAPHICAL VELOCITY ANALYSIS Polygon Method ME 352 GRHICL VELCITY NLYSIS 52 GRHICL VELCITY NLYSIS olygon Mehod Velociy analyi form he hear of kinemaic and dynamic of mechanical yem Velociy analyi i uually performed following a poiion analyi; ie,

More information

1 CHAPTER 14 LAPLACE TRANSFORMS

1 CHAPTER 14 LAPLACE TRANSFORMS CHAPTER 4 LAPLACE TRANSFORMS 4 nroducion f x) i a funcion of x, where x lie in he range o, hen he funcion p), defined by p) px e x) dx, 4 i called he Laplace ranform of x) However, in hi chaper, where

More information

Admin MAX FLOW APPLICATIONS. Flow graph/networks. Flow constraints 4/30/13. CS lunch today Grading. in-flow = out-flow for every vertex (except s, t)

Admin MAX FLOW APPLICATIONS. Flow graph/networks. Flow constraints 4/30/13. CS lunch today Grading. in-flow = out-flow for every vertex (except s, t) /0/ dmin lunch oday rading MX LOW PPLIION 0, pring avid Kauchak low graph/nework low nework direced, weighed graph (V, ) poiive edge weigh indicaing he capaciy (generally, aume ineger) conain a ingle ource

More information

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality Marix Versions of Some Refinemens of he Arihmeic-Geomeric Mean Inequaliy Bao Qi Feng and Andrew Tonge Absrac. We esablish marix versions of refinemens due o Alzer ], Carwrigh and Field 4], and Mercer 5]

More information

Piecewise-Defined Functions and Periodic Functions

Piecewise-Defined Functions and Periodic Functions 28 Piecewie-Defined Funcion and Periodic Funcion A he ar of our udy of he Laplace ranform, i wa claimed ha he Laplace ranform i paricularly ueful when dealing wih nonhomogeneou equaion in which he forcing

More information

What is maximum Likelihood? History Features of ML method Tools used Advantages Disadvantages Evolutionary models

What is maximum Likelihood? History Features of ML method Tools used Advantages Disadvantages Evolutionary models Wha i maximum Likelihood? Hiory Feaure of ML mehod Tool ued Advanage Diadvanage Evoluionary model Maximum likelihood mehod creae all he poible ree conaining he e of organim conidered, and hen ue he aiic

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

Topic 3. Single factor ANOVA [ST&D Ch. 7]

Topic 3. Single factor ANOVA [ST&D Ch. 7] Topic 3. Single facor ANOVA [ST&D Ch. 7] "The analyi of variance i more han a echnique for aiical analyi. Once i i underood, ANOVA i a ool ha can provide an inigh ino he naure of variaion of naural even"

More information

Average Case Lower Bounds for Monotone Switching Networks

Average Case Lower Bounds for Monotone Switching Networks Average Cae Lower Bound for Monoone Swiching Nework Yuval Filmu, Toniann Piai, Rober Robere, Sephen Cook Deparmen of Compuer Science Univeriy of Torono Monoone Compuaion (Refreher) Monoone circui were

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

Buckling of a structure means failure due to excessive displacements (loss of structural stiffness), and/or

Buckling of a structure means failure due to excessive displacements (loss of structural stiffness), and/or Buckling Buckling of a rucure mean failure due o exceive diplacemen (lo of rucural iffne), and/or lo of abiliy of an equilibrium configuraion of he rucure The rule of humb i ha buckling i conidered a mode

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information