Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization


The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization

Zhouyuan Huo (1), Bin Gu (1), Ji Liu (2), Heng Huang (1)
(1) Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA
(2) Department of Computer Science, University of Rochester, Rochester, NY 14627, USA
zhouyuan.huo@pitt.edu, big0@pitt.edu, jliu@cs.rochester.edu, heng.huang@pitt.edu

To whom all correspondence should be addressed.
Copyright © 2018, Association for the Advancement of Artificial Intelligence. All rights reserved.

Abstract

Stochastic composition optimization has drawn much attention recently and has been successful in many emerging applications of machine learning, statistical analysis, and reinforcement learning. In this paper, we focus on the composition problem with a nonsmooth regularization penalty. Previous works either have slow convergence rates or do not provide a complete convergence analysis for the general problem. We tackle both issues by proposing a new stochastic composition optimization method for the composition problem with nonsmooth regularization penalty, in which a variance reduction technique is applied to accelerate convergence. To the best of our knowledge, our method admits the fastest convergence rate for stochastic composition optimization: for the strongly convex composition problem, our algorithm is proved to admit linear convergence; for the general composition problem, our algorithm significantly improves the state-of-the-art convergence rate from O(T^{-1/2}) to O((n_1 + n_2)^{2/3}/T). Finally, we apply the proposed algorithm to portfolio management and policy evaluation in reinforcement learning. Experimental results verify our theoretical analysis.

Introduction

Stochastic composition optimization has drawn much attention recently and has been successful in addressing emerging applications in different areas, such as reinforcement learning (Dai et al. 2016; Wang and Liu 2016), statistical learning (Wang, Fang, and Liu 2014), and risk management (Dentcheva, Penev, and Ruszczyński 2016). The authors in (Wang, Fang, and Liu 2014; Wang and Liu 2016) proposed the composition problem, which is the composition of two expected-value functions:

    \min_{x \in \mathbb{R}^N} \; \underbrace{\mathbb{E}_i F_i\big(\mathbb{E}_j G_j(x)\big)}_{f(x)} + h(x),    (1)

where G_j(x): \mathbb{R}^N \to \mathbb{R}^M are inner component functions and F_i(y): \mathbb{R}^M \to \mathbb{R} are outer component functions. The regularization penalty h(x) is a closed convex function, but not necessarily smooth. In practice, we usually solve the finite-sum version of the composition problem, which can be represented as

    \min_{x \in \mathbb{R}^N} H(x) = \min_{x \in \mathbb{R}^N} \; \underbrace{\frac{1}{n_1}\sum_{i=1}^{n_1} F_i\Big(\frac{1}{n_2}\sum_{j=1}^{n_2} G_j(x)\Big)}_{f(x)} + h(x),    (2)

where we define F(y) = \frac{1}{n_1}\sum_{i=1}^{n_1} F_i(y) and G(x) = \frac{1}{n_2}\sum_{j=1}^{n_2} G_j(x). Throughout this paper, we mainly focus on the case where F_i and G_j are smooth; however, we do not require F_i and G_j to be convex.

Minimizing the composition of expected-value functions (1) or finite-sum functions (2) is challenging. The classical stochastic gradient method (SGD) and its variants are well suited to minimizing traditional finite-sum functions (Bottou, Curtis, and Nocedal 2016), but they are not directly applicable to the composition problem. To apply SGD, we would need to compute the unbiased sampling gradient (\partial G_j(x))^\top \nabla F_i(G(x)) of problem (2), which is time-consuming when G(x) is unknown: evaluating G(x) requires traversing all inner component functions, which is unacceptable in each iteration when n_2 is large. In (Wang, Fang, and Liu 2014), the authors considered the problem with h(x) = 0 and proposed the stochastic compositional gradient descent algorithm (SCGD), the first stochastic method for the composition problem.
In their paper, they proved that the convergence rate of SCGD is O(T^{-2/3}) for the strongly convex composition problem and O(T^{-1/4}) for the general problem. They also proposed an accelerated SCGD using Nesterov's smoothing technique (Nesterov 1983), which is proved to admit a faster convergence rate. SCGD has constant query complexity per iteration; however, its convergence rate is far worse than that of the full gradient method because of the noise induced by sampling gradients. Recently, the variance reduction technique (Johnson and Zhang 2013) was applied to accelerate the convergence of stochastic composition optimization. Lian, Wang, and Liu (2016) first utilized variance reduction and proposed two variance-reduced stochastic compositional gradient descent methods, Compositional-SVRG-1 and Compositional-SVRG-2, both of which are proved to admit a linear convergence rate.

However, the methods proposed in (Wang, Fang, and Liu 2014) and (Lian, Wang, and Liu 2016) are not applicable to the composition problem with nonsmooth regularization penalty. The composition problem with nonsmooth regularization was then considered in (Wang and Liu 2016; Yu and Huang 2017). In (Wang and Liu 2016), the authors proposed the accelerated stochastic compositional proximal gradient algorithm (ASC-PG) and proved that its optimal convergence rates for the strongly convex problem and the general problem are O(1/T) and O(T^{-1/2}), respectively. However, ASC-PG suffers from slow convergence because of the noise of the sampling gradients. Yu and Huang (2017) proposed com-SVR-ADMM using variance reduction; although com-SVR-ADMM admits linear convergence for the strongly convex composition problem, it is not optimal, and they did not analyze the convergence for the general nonconvex composition problem. We review the convergence rates of stochastic composition optimization methods in Table 1.

Table 1: Comparison of SCGD, accelerated SCGD, ASC-PG, Compositional-SVRG-1, Compositional-SVRG-2, com-SVR-ADMM, and our VRSC-PG in terms of convergence. For a fair comparison, query complexity is taken into account in the convergence rate. One query of the Sampling Oracle (SO) has three cases: (1) given x ∈ R^N and j ∈ {1, 2, ..., n_2}, SO returns G_j(x) ∈ R^M; (2) given x ∈ R^N and j ∈ {1, 2, ..., n_2}, SO returns ∂G_j(x) ∈ R^{M×N}; (3) given y ∈ R^M and i ∈ {1, 2, ..., n_1}, SO returns ∇F_i(y) ∈ R^M. T denotes the total number of iterations, κ denotes the condition number, and 0 < ρ < 1.

  Algorithm | h(x) ≠ 0 | Strongly convex | General problem
  SCGD (Wang, Fang, and Liu 2014) | no | O(T^{-2/3}) | O(T^{-1/4})
  Accelerated SCGD (Wang, Fang, and Liu 2014) | no | O(T^{-4/5}) | O(T^{-2/7})
  Compositional-SVRG-1 (Lian, Wang, and Liu 2016) | no | O(ρ^{T/(n_1+n_2+κ^4)}) | -
  Compositional-SVRG-2 (Lian, Wang, and Liu 2016) | no | O(ρ^{T/(n_1+n_2+κ^3)}) | -
  ASC-PG (Wang and Liu 2016) | yes | O(T^{-4/5}) | O(T^{-4/9})
  ASC-PG, if the G_j(x) are linear (Wang and Liu 2016) | yes | O(1/T) | O(T^{-1/2})
  com-SVR-ADMM (Yu and Huang 2017) | yes | O(ρ^{T/(n_1+n_2+κ^4)}) | -
  VRSC-PG (ours) | yes | O(ρ^{T/(n_1+n_2+κ^3)}) | O((n_1+n_2)^{2/3}/T)

In this paper, we propose the variance reduced stochastic compositional proximal gradient method (VRSC-PG) for the composition problem with nonsmooth regularization penalty. Applying the variance reduction technique to the composition problem is nontrivial, because the optimization procedure and the convergence analysis are essentially different. We investigate the convergence rate of our method: for the strongly convex problem, we prove that VRSC-PG has a linear convergence rate O(ρ^{T/(n_1+n_2+κ^3)}), which is faster than com-SVR-ADMM;* for the general problem, possibly nonconvex, VRSC-PG significantly improves the state-of-the-art convergence rate of ASC-PG from O(T^{-1/2}) to O((n_1+n_2)^{2/3}/T). To the best of our knowledge, our results set a new benchmark for stochastic composition optimization. We further evaluate our method by applying it to portfolio management and reinforcement learning. Experimental results verify our theoretical analysis.

* In (Yu and Huang 2017), the stated result is O(ρ^{T/(n_1+n_2+Am)}). We prove that to obtain linear convergence, A and m must be proportional to κ^2, which is not included in their paper; see the remark in the supplementary material.

Preliminary

In this section, we briefly review stochastic composition optimization and the proximal stochastic variance reduced gradient method.

Stochastic Composition Optimization

The objective function of stochastic composition optimization is a composition of expected-value (1) or finite-sum (2) functions, which is much more complicated than the traditional finite-sum problem. By the chain rule, the full gradient of the composition problem is

    \nabla f(x) = \big(\partial G(x)\big)^\top \nabla F\big(G(x)\big).

Given x, the classical stochastic gradient descent method cannot compute the unbiased sampling gradient (\partial G_j(x))^\top \nabla F_i(G(x)) in a constant number of queries, because G(x) is unknown. In problem (2), evaluating G(x) is time-consuming and requires n_2 queries in each iteration. Therefore, classical SGD is not applicable to composition optimization.
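To make the structure of problem (2) concrete, here is a minimal numpy sketch (our own illustration, not from the paper) that evaluates H(x) and the full gradient \nabla f(x) = (\partial G(x))^\top \nabla F(G(x)) for illustrative component choices: linear inner maps G_j, quadratic outer functions F_i, and h(x) = λ‖x‖_1. One full-gradient evaluation touches every F_i and G_j, which is exactly the per-iteration cost that the stochastic methods below try to avoid.

```python
import numpy as np

# Illustrative component functions (assumed, not from the paper):
# inner maps G_j(x) = A_j x + c_j, outer functions F_i(y) = 0.5 * ||y - b_i||^2,
# nonsmooth penalty h(x) = lam * ||x||_1.
rng = np.random.default_rng(0)
n1, n2, N, M = 5, 8, 10, 6
A_mats = rng.normal(size=(n2, M, N))
c = rng.normal(size=(n2, M))
b = rng.normal(size=(n1, M))
lam = 1e-3

def G(x):            # G(x) = (1/n2) * sum_j G_j(x)
    return np.mean(A_mats @ x + c, axis=0)

def dG(x):           # Jacobian dG(x) = (1/n2) * sum_j dG_j(x); constant for linear G_j
    return np.mean(A_mats, axis=0)

def grad_F(y):       # grad F(y) = (1/n1) * sum_i grad F_i(y)
    return np.mean(y - b, axis=0)

def H(x):            # objective (2): F(G(x)) + h(x)
    y = G(x)
    return np.mean(0.5 * np.sum((y - b) ** 2, axis=1)) + lam * np.sum(np.abs(x))

def full_grad_f(x):  # chain rule: grad f(x) = dG(x)^T grad F(G(x))
    return dG(x).T @ grad_F(G(x))

x = rng.normal(size=N)
print(H(x), np.linalg.norm(full_grad_f(x)))
```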
In (Wang, Fang, and Liu 2014), the authors proposed the first stochastic compositional gradient descent method (SCGD) for minimizing the stochastic composition problem with h(x) = 0. They proposed to use an auxiliary variable y to approximate G(x); in each iteration t, both x_t and y_t are kept in memory. SCGD is briefly described in Algorithm 1. In the algorithm, α_t and β_t are learning rates, and both are decreased over iterations to guarantee convergence in spite of the noise induced by sampling gradients. In their paper, it is assumed that x ∈ X, and in each iteration x_{t+1} is projected onto X after the update (4). Furthermore, the authors proposed Accelerated SCGD by applying Nesterov smoothing (Nesterov 1983), which is proved to converge faster than the basic SCGD.

Algorithm 1 SCGD
1: Initialize x_0 ∈ R^N, y_0 ∈ R^M;
2: for t = 0, 1, 2, ..., T do
3:   Uniformly sample j from {1, 2, ..., n_2} with replacement and query G_j(x_t) and ∂G_j(x_t); (2 queries)
4:   Update y_{t+1}: y_{t+1} ← (1 − β_t) y_t + β_t G_j(x_t);   (3)
5:   Uniformly sample i from {1, 2, ..., n_1} with replacement and query ∇F_i(y_{t+1}); (1 query)
6:   Update x_{t+1}: x_{t+1} ← x_t − α_t (∂G_j(x_t))^⊤ ∇F_i(y_{t+1});   (4)
7: end for

Proximal Stochastic Variance Reduced Gradient

The stochastic variance reduced gradient method (SVRG) (Johnson and Zhang 2013) was proposed to minimize finite-sum functions

    \min_{x\in\mathbb{R}^N} \; \frac{1}{n}\sum_{i=1}^{n} f_i(x),    (5)

where the component functions f_i(x): \mathbb{R}^N \to \mathbb{R}. In large-scale optimization, SGD and its variants use the unbiased sampling gradient ∇f_i(x) as an approximation of the full gradient, which requires only one query per iteration. However, the variance induced by sampling gradients forces us to decrease the learning rate to make the algorithm converge: if x* is the optimal solution to problem (5), the full gradient (1/n) Σ_i ∇f_i(x*) = 0, while a sampling gradient ∇f_i(x*) ≠ 0 in general. Unless the learning rate decreases, convergence of the objective value cannot be guaranteed; at the same time, a decreasing learning rate makes SGD converge slowly. For example, if problem (5) is strongly convex, gradient descent (GD) converges at a linear rate, while SGD converges at a sublinear rate of O(1/T). Reducing the variance is therefore one of the most important ways to accelerate SGD, and it has been widely applied to large-scale optimization (Bottou, Curtis, and Nocedal 2016; Defazio, Bach, and Lacoste-Julien 2014; Gu, Huo, and Huang 2016b; Allen-Zhu and Yuan 2016; Huo and Huang 2017; Gu, Huo, and Huang 2016a).

In (Xiao and Zhang 2014), the authors considered the nonsmooth regularization penalty h(x) ≠ 0 and proposed the proximal stochastic variance reduced gradient method (Proximal SVRG), briefly described in Algorithm 2. In their paper, v_t is used as an approximation of the full gradient, with E[v_t] = (1/n) Σ_i ∇f_i(x_t), and the variance of v_t is proved to converge to zero:

    \lim_{t\to\infty} \mathbb{E}\Big\| v_t - \frac{1}{n}\sum_{i=1}^{n}\nabla f_i(x_t) \Big\|^2 = 0.

Therefore, the learning rate η can be kept constant throughout the procedure. In step 8 of Algorithm 2, Prox_{ηh(·)}(x) denotes the proximal operator, defined as

    \mathrm{Prox}_{\eta h(\cdot)}(x) = \arg\min_{u} \; h(u) + \frac{1}{2\eta}\|u - x\|^2.    (6)

Convergence analysis and experimental results confirm that Proximal SVRG admits linear convergence in expectation for strongly convex optimization. In (Reddi et al. 2016b), the authors proved that Proximal SVRG has a sublinear convergence rate of O(n^{2/3}/T) when f_i(x) is nonconvex.

Algorithm 2 Proximal SVRG
1: Initialize x̃^0 ∈ R^N;
2: for s = 0, 1, 2, ..., S do
3:   x_0^{s+1} ← x̃^s;
4:   f̃ ← (1/n) Σ_{i=1}^{n} ∇f_i(x̃^s); (n queries)
5:   for t = 0, 1, 2, ..., m − 1 do
6:     Uniformly sample i from {1, 2, ..., n} with replacement and query ∇f_i(x_t^{s+1}) and ∇f_i(x̃^s); (2 queries)
7:     Update v_t^{s+1}: v_t^{s+1} ← ∇f_i(x_t^{s+1}) − ∇f_i(x̃^s) + f̃;   (7)
8:     Update the model: x_{t+1}^{s+1} ← Prox_{ηh(·)}(x_t^{s+1} − η v_t^{s+1});   (8)
9:   end for
10:  x̃^{s+1} ← x_m^{s+1};
11: end for

Variance Reduced Stochastic Compositional Proximal Gradient

In this section, we propose the variance reduced stochastic compositional proximal gradient method (VRSC-PG) for solving the finite-sum composition problem with nonsmooth regularization penalty (2). The description of VRSC-PG is presented in Algorithm 3. Similar to the framework of Proximal SVRG (Xiao and Zhang 2014), VRSC-PG has two layers of loops. At the beginning of outer loop s, we keep a snapshot of the current model x̃^s in memory and compute the full gradient

    \nabla f(\tilde{x}^s) = \big(\partial G(\tilde{x}^s)\big)^\top \frac{1}{n_1}\sum_{i=1}^{n_1}\nabla F_i(\tilde{G}^s),    (9)

where \tilde{G}^s = \frac{1}{n_2}\sum_{j=1}^{n_2} G_j(\tilde{x}^s) denotes the value of the inner functions and \partial G(\tilde{x}^s) = \frac{1}{n_2}\sum_{j=1}^{n_2}\partial G_j(\tilde{x}^s) denotes the Jacobian of the inner functions. Computing the full gradient of f(x) in problem (2) requires n_1 + 2n_2 queries.
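The proximal step in (8), and later in (13) and (14), is inexpensive whenever Prox_{ηh} has a closed form. For the ℓ1 penalty h(x) = λ‖x‖_1 used in the experiments it reduces to soft thresholding; a minimal sketch (illustrative, not from the paper):

```python
import numpy as np

def prox_l1(x, eta, lam):
    """Proximal operator of h(x) = lam * ||x||_1 with step size eta, i.e.
    argmin_u  lam * ||u||_1 + (1 / (2 * eta)) * ||u - x||^2  (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - eta * lam, 0.0)

def prox_gradient_step(x, v, eta, lam):
    """One proximal-gradient update, as in step 8 of Algorithm 2 and in (13)/(14):
    x_next = Prox_{eta h}(x - eta * v), with v a (variance-reduced) gradient estimate."""
    return prox_l1(x - eta * v, eta, lam)
```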
To make the number of queries in each inner iteration independent of n_2, we keep \hat{G}_t^{s+1} and \partial\hat{G}_t^{s+1} in memory as estimates of G(x_t^{s+1}) and \partial G(x_t^{s+1}), respectively. In our algorithm, we query G_{A_t}(x_t^{s+1}) and G_{A_t}(\tilde{x}^s); then \hat{G}_t^{s+1} is evaluated as

    \hat{G}_t^{s+1} = \tilde{G}^s - \frac{1}{A}\sum_{j=1}^{A}\Big( G_{A_t[j]}(\tilde{x}^s) - G_{A_t[j]}(x_t^{s+1}) \Big),    (10)

where A_t[j] denotes element j of the set A_t and |A_t| = A. The elements of A_t are uniformly sampled from {1, 2, ..., n_2} with replacement. In (10), the variance of the mini-batch estimate of G(x_t^{s+1}) is reduced by using the snapshot quantities \tilde{G}^s and G_{A_t}(\tilde{x}^s). Similarly, we sample B_t of size B from {1, 2, ..., n_2} uniformly with replacement and query \partial G_{B_t}(x_t^{s+1}) and \partial G_{B_t}(\tilde{x}^s). The estimate of \partial G(x_t^{s+1}) is evaluated as

    \partial\hat{G}_t^{s+1} = \partial G(\tilde{x}^s) - \frac{1}{B}\sum_{j=1}^{B}\Big( \partial G_{B_t[j]}(\tilde{x}^s) - \partial G_{B_t[j]}(x_t^{s+1}) \Big),    (11)

where B_t[j] denotes element j of the set B_t and |B_t| = B. It is important to note that A_t and B_t are independent. Computing \hat{G}_t^{s+1} and \partial\hat{G}_t^{s+1} requires 2A + 2B queries in each inner iteration. We are now able to compute the estimate of \nabla f(x_t^{s+1}) in inner iteration t as

    v_t^{s+1} = \frac{1}{b}\sum_{i_t\in I_t}\Big( (\partial\hat{G}_t^{s+1})^\top \nabla F_{i_t}(\hat{G}_t^{s+1}) - (\partial G(\tilde{x}^s))^\top \nabla F_{i_t}(\tilde{G}^s) \Big) + \nabla f(\tilde{x}^s),    (12)

where I_t is a set of indexes uniformly sampled from {1, 2, ..., n_1} with |I_t| = b. As per (12), we need to query \nabla F_{I_t}(\hat{G}_t^{s+1}) and \nabla F_{I_t}(\tilde{G}^s), which requires 2b queries. Finally, we update the model with the proximal operator:

    x_{t+1}^{s+1} = \mathrm{Prox}_{\eta h}\big( x_t^{s+1} - \eta v_t^{s+1} \big),    (13)

where η is the learning rate.

Algorithm 3 VRSC-PG
Input: the number of inner iterations m, the number of outer iterations S, the mini-batch sizes A, B, and b, and the learning rate η.
1: Initialize x̃^0 ∈ R^N;
2: for s = 0, 1, 2, ..., S do
3:   x_0^{s+1} ← x̃^s;
4:   \tilde{G}^s ← (1/n_2) Σ_{j=1}^{n_2} G_j(x̃^s); (n_2 queries)
5:   ∂G(x̃^s) ← (1/n_2) Σ_{j=1}^{n_2} ∂G_j(x̃^s); (n_2 queries)
6:   Compute the full gradient ∇f(x̃^s) using (9); (n_1 queries)
7:   for t = 0, 1, 2, ..., m − 1 do
8:     Uniformly sample A_t from {1, 2, ..., n_2} with replacement, |A_t| = A;
9:     Update \hat{G}_t^{s+1} using (10); (2A queries)
10:    Uniformly sample B_t from {1, 2, ..., n_2} with replacement, |B_t| = B;
11:    Update ∂\hat{G}_t^{s+1} using (11); (2B queries)
12:    Uniformly sample I_t from {1, 2, ..., n_1} with replacement, |I_t| = b;
13:    Compute v_t^{s+1} using (12); (2b queries)
14:    Update the model: x_{t+1}^{s+1} ← Prox_{ηh}( x_t^{s+1} − η v_t^{s+1} );   (14)
15:   end for
16:   x̃^{s+1} ← x_m^{s+1};
17: end for

Convergence Analysis

In this section, we prove that (1) VRSC-PG admits a linear convergence rate for the strongly convex problem, and (2) VRSC-PG admits a sublinear convergence rate O((n_1+n_2)^{2/3}/T) for the general problem. To the best of our knowledge, both are the best results so far. The following assumptions are commonly used in stochastic composition optimization (Wang, Fang, and Liu 2014; Wang and Liu 2016; Lian, Wang, and Liu 2016).

Strong convexity: to analyze the convergence of VRSC-PG for the strongly convex composition problem, we assume that the function f is μ-strongly convex.

Assumption 1. The function f(x) is μ-strongly convex; therefore, for all x and y,

    \|\nabla f(x) - \nabla f(y)\| \ge \mu \|x - y\|.    (15)

Equivalently, μ-strong convexity can also be written as

    f(x) \ge f(y) + \langle \nabla f(y), x - y\rangle + \frac{\mu}{2}\|x - y\|^2.    (16)

Lipschitz gradients: we assume that there exist Lipschitz constants L_F, L_G, and L_f for ∇F_i(x), ∂G_j(x), and ∇f(x), respectively.

Assumption 2. There exist constants L_F, L_G, and L_f such that, for all x, y, i ∈ {1, ..., n_1}, and j ∈ {1, ..., n_2},

    \|\nabla F_i(x) - \nabla F_i(y)\| \le L_F \|x - y\|,    (17)
    \|\partial G_j(x) - \partial G_j(y)\| \le L_G \|x - y\|,    (18)
    \|(\partial G_j(x))^\top \nabla F_i(G(x)) - (\partial G_j(y))^\top \nabla F_i(G(y))\| \le L_f \|x - y\|.    (19)

As proved in (Lian, Wang, and Liu 2016), (19) implies

    \|\nabla f(x) - \nabla f(y)\| \le L_f \|x - y\|, \quad \forall x, y,    (20)

which can equivalently be written as

    f(x) \le f(y) + \langle \nabla f(y), x - y\rangle + \frac{L_f}{2}\|x - y\|^2.    (21)

Bounded gradients: we assume that the gradients ∇F_i(x) and ∂G_j(x) are upper bounded.

Assumption 3. The gradients ∇F_i(x) and ∂G_j(x) have upper bounds B_F and B_G, respectively:

    \|\nabla F_i(x)\| \le B_F, \quad \forall x, \; i \in \{1, ..., n_1\},    (22)
    \|\partial G_j(x)\| \le B_G, \quad \forall x, \; j \in \{1, ..., n_2\}.    (23)
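Before turning to the analysis, the sketch below spells out one inner iteration of Algorithm 3, i.e., equations (10)-(13), assuming the sampling oracles G_j, ∂G_j, ∇F_i and a proximal operator are supplied by the caller. The function and argument names are our own; this is an illustrative reading of the update, not the authors' implementation.

```python
import numpy as np

def vrsc_pg_inner_step(x, x_snap, G_snap, dG_snap, grad_f_snap,
                       G_j, dG_j, gradF_i, prox, n1, n2, A, B, b, eta, rng):
    """One inner iteration of Algorithm 3 (equations (10)-(13)).
    Oracles (supplied by the caller): G_j(j, x) -> R^M, dG_j(j, x) -> R^{M x N},
    gradF_i(i, y) -> R^M; prox(z, eta) applies Prox_{eta h}.
    Snapshot quantities: G_snap = G(x_snap), dG_snap = dG(x_snap), grad_f_snap = grad f(x_snap)."""
    # (10): mini-batch estimate of G(x), variance-reduced with the snapshot values.
    At = rng.integers(0, n2, size=A)
    G_hat = G_snap - np.mean([G_j(j, x_snap) - G_j(j, x) for j in At], axis=0)
    # (11): mini-batch estimate of the Jacobian dG(x), using an independent batch B_t.
    Bt = rng.integers(0, n2, size=B)
    dG_hat = dG_snap - np.mean([dG_j(j, x_snap) - dG_j(j, x) for j in Bt], axis=0)
    # (12): variance-reduced estimate of grad f(x).
    It = rng.integers(0, n1, size=b)
    v = grad_f_snap + np.mean(
        [dG_hat.T @ gradF_i(i, G_hat) - dG_snap.T @ gradF_i(i, G_snap) for i in It], axis=0)
    # (13): proximal update with constant learning rate eta.
    return prox(x - eta * v, eta)
```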

Note that we do not need the strong convexity assumption when we analyze the convergence of VRSC-PG for the general problem.

Strongly Convex Problem

In this section, we prove that VRSC-PG admits a linear convergence rate for the strongly convex finite-sum composition problem with nonsmooth regularization penalty (2). We need Assumptions 1, 2, and 3 in this section. Unlike Prox-SVRG in (Xiao and Zhang 2014), the estimate v_t^{s+1} is biased, i.e., E_{I_t, A_t, B_t}[v_t^{s+1}] ≠ ∇f(x_t^{s+1}), which makes the convergence analysis of VRSC-PG more challenging than the analysis in (Xiao and Zhang 2014). In spite of this, we can show that E‖v_t^{s+1} − ∇f(x_t^{s+1})‖² is still upper bounded.

Lemma 1. Let x* be the optimal solution to problem (2), x* = argmin_{x∈R^N} H(x), and define

    \gamma = 64\Big(\frac{B_F^2 L_G^2}{B} + \frac{B_G^4 L_F^2}{A}\Big) + 8 L_f.

Suppose Assumptions 1, 2, and 3 hold. Then, from the definition of v_t^{s+1} in (12),

    \mathbb{E}\big\|v_t^{s+1} - \nabla f(x_t^{s+1})\big\|^2 \le \gamma\big[ H(x_t^{s+1}) - H(x^*) + H(\tilde{x}^s) - H(x^*) \big].    (24)

Therefore, when x_t^{s+1} and \tilde{x}^s converge to x*, E‖v_t^{s+1} − ∇f(x_t^{s+1})‖² also converges to zero. Thus we can keep the learning rate constant and obtain faster convergence.

Theorem 1. Suppose Assumptions 1, 2, and 3 hold, and let x* = argmin_{x∈R^N} H(x). If m, A, B, and η are selected properly so that ρ < 1, where

    \rho = \frac{2}{\mu\eta\, m\,\big(\tfrac{7}{8} - 6\eta L_f - \rho_c m\big)} + \frac{2\eta\,(6 L_f + \rho_c m)}{\tfrac{7}{8} - 6\eta L_f - \rho_c m},    (25)
    \rho_c = \eta\Big(\frac{B_F^2 L_G^2}{B} + \frac{B_G^4 L_F^2}{A}\Big),    (26)

then VRSC-PG admits a linear convergence rate:

    \mathbb{E}\big[H(\tilde{x}^S) - H(x^*)\big] \le \rho^S\,\big[H(\tilde{x}^0) - H(x^*)\big].    (27)

As per Theorem 1, we need to choose η, m, A, and B properly to make ρ < 1. We provide an example of how to select these parameters.

Corollary 1. Following Theorem 1, if we set η, m, A, and B as

    \eta = \frac{1}{96 L_f},    (28)
    m = \frac{L_f}{\mu},    (29)
    A = \frac{2048\, B_G^4 L_F^2}{\mu^2},    (30)
    B = \frac{2048\, B_F^2 L_G^2}{\mu^2},    (31)

then VRSC-PG has the linear convergence rate

    \mathbb{E}\big[H(\tilde{x}^S) - H(x^*)\big] \le \Big(\frac{2}{3}\Big)^S\,\mathbb{E}\big[H(\tilde{x}^0) - H(x^*)\big].    (32)

Remark 1. According to Theorem 1, to obtain

    \mathbb{E}\big[H(\tilde{x}^s) - H(x^*)\big] \le \varepsilon,    (33)

the number of stages S is required to satisfy

    S \ge \log\frac{\mathbb{E}[H(\tilde{x}^0) - H(x^*)]}{\varepsilon} \Big/ \log\frac{1}{\rho}.    (34)

As per Algorithm 3 and the definition of the Sampling Oracle in (Wang and Liu 2016), to make the objective value gap E[H(x̃^s) − H(x*)] ≤ ε, the total query complexity is

    O\big( (n_1 + n_2 + m(A + B + b)) \log(1/\varepsilon) \big) = O\big( (n_1 + n_2 + \kappa^3) \log(1/\varepsilon) \big),

where we let κ = max{L_f/μ, L_F, L_G} and b can be smaller than or proportional to κ². This is better than com-SVR-ADMM (Yu and Huang 2017), whose total query complexity is O((n_1 + n_2 + κ^4) log(1/ε)).

General Problem

In this section, we prove that VRSC-PG admits a sublinear convergence rate O(1/T) for the general finite-sum composition problem with nonsmooth regularization penalty, which is much better than the state-of-the-art method ASC-PG (Wang and Liu 2016), whose optimal convergence rate is O(T^{-1/2}). In this section we only need Assumptions 2 and 3. The biased v_t^{s+1} makes our analysis nontrivial and much different from previous analyses for the finite-sum problem (Reddi et al. 2016a). In our proof, we define the gradient mapping

    \mathcal{G}_\eta(x) = \frac{1}{\eta}\Big( x - \mathrm{Prox}_{\eta h(\cdot)}\big(x - \eta \nabla f(x)\big) \Big).    (35)

Theorem 2. Suppose Assumptions 2 and 3 hold, and let x* = argmin_{x∈R^N} H(x) be the optimal solution to problem (2). If m, A, B, b, and η are selected properly such that

    \frac{4\eta m^2 L_f^2}{b} + \frac{2\eta m^2 B_G^4 L_F^2}{A} + \frac{2\eta m^2 B_F^2 L_G^2}{B} + \frac{L_f}{2} \le \frac{1}{2\eta},    (36)

then the following inequality holds:

    \mathbb{E}\big\|\mathcal{G}_\eta(x_a)\big\|^2 \le \frac{2\big[H(\tilde{x}^0) - H(x^*)\big]}{(1 - 2\eta L_f)\,\eta\, T},    (37)

where x_a is uniformly selected from \{\{x_t^{s+1}\}_{t=0}^{m-1}\}_{s=0}^{S-1} and T is a multiple of m.

As per Theorem 2, we need to choose m, A, B, b, and η appropriately to satisfy condition (36). We provide an example of how to select these parameters.

[Figure 1: Experimental results for mean-variance portfolio management on synthetic data; panels (a), (b): κ_cov = 2, panels (c), (d): κ_cov = 10. κ_cov is the condition number of the covariance matrix of the Gaussian distribution used to generate rewards. The x axis is time, which is proportional to the query complexity. On the y axis, the objective value gap is defined as H(x) − H(x*), where x* is obtained by running our method for enough iterations until convergence; ‖G(x)‖_2 denotes the ℓ2-norm of the full gradient, where G(x) = ∇f(x) + ∂h(x).]

Corollary 2. Following Theorem 2, let m = (n_1 + n_2)^{1/3}, η = 1/(4 L_f), b = (n_1 + n_2)^{2/3}, and let T be a multiple of m. If A and B are lower bounded as

    A \ge \frac{8 m^2 B_G^4 L_F^2}{L_f},    (38)
    B \ge \frac{8 m^2 B_F^2 L_G^2}{L_f},    (39)

then we obtain the sublinear convergence rate

    \mathbb{E}\big\|\mathcal{G}_\eta(x_a)\big\|^2 \le \frac{16 L_f \big[H(\tilde{x}^0) - H(x^*)\big]}{T}.    (40)

Remark 2. According to Theorem 2, to obtain

    \mathbb{E}\big\|\mathcal{G}_\eta(x_a)\big\|^2 \le \varepsilon,    (41)

the number of iterations T is required to satisfy

    T \ge \frac{16 L_f\, \mathbb{E}\big[H(\tilde{x}^0) - H(x^*)\big]}{\varepsilon}.    (42)

As per Algorithm 3 and the definition of the Sampling Oracle in (Wang and Liu 2016), to obtain an ε-accurate solution E‖G_η(x_a)‖² ≤ ε, the total query complexity is O((A + B + b)/ε) = O((n_1 + n_2)^{2/3}/ε), where A, B, and b are proportional to (n_1 + n_2)^{2/3}. Therefore, our method improves the state-of-the-art convergence rate of stochastic composition optimization for the general problem from O(T^{-1/2}) (the optimal convergence rate of ASC-PG) to O((n_1 + n_2)^{2/3}/T).

Experimental Results

We conduct two experiments to evaluate the proposed method: (1) application to portfolio management; (2) application to policy evaluation in reinforcement learning. In the experiments, three methods for stochastic composition optimization are compared:

(1) the accelerated stochastic compositional proximal gradient method (ASC-PG) (Wang and Liu 2016); (2) stochastic variance reduced ADMM for stochastic composition optimization (com-SVR-ADMM) (Yu and Huang 2017); (3) the variance reduced stochastic compositional proximal gradient method (VRSC-PG), our method.

In our experiments, the learning rate η is tuned over {1, 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}}. We keep the learning rate constant for com-SVR-ADMM and VRSC-PG during optimization. For ASC-PG, in order to guarantee convergence, the learning rate is decreased as η/(1 + t), where t denotes the number of iterations.

Application to Portfolio Management

Suppose there are N assets we can invest in, and r_t ∈ R^N denotes the rewards of the N assets at time t. Our goal is to maximize the return of the investment and at the same time minimize its risk. The portfolio management problem can be formulated as the following mean-variance optimization:

    \min_{x\in\mathbb{R}^N} \; -\frac{1}{n}\sum_{t=1}^{n}\langle r_t, x\rangle + \frac{1}{n}\sum_{t=1}^{n}\Big( \langle r_t, x\rangle - \frac{1}{n}\sum_{j=1}^{n}\langle r_j, x\rangle \Big)^2,    (43)

where x ∈ R^N denotes the investment quantity vector over the N assets. According to (Lian, Wang, and Liu 2016), problem (43) can be viewed as a composition problem of the form (2). In our experiment, we also add a nonsmooth regularization penalty h(x) = λ‖x‖_1 to the mean-variance objective (43). Similar to the experimental settings in (Lian, Wang, and Liu 2016), we let n = 2000 and N = 200. Rewards r_t are generated in two steps: (1) generate a Gaussian distribution on R^N, where κ_cov denotes the condition number of its covariance matrix; because κ_cov is proportional to κ, we control κ_cov to change the value of κ; (2) sample rewards r_t from the Gaussian distribution and make all elements positive, which guarantees that the problem has a solution. We compare the three methods on two synthetic datasets, generated from Gaussian distributions with κ_cov = 2 and κ_cov = 10 respectively. We set λ = 10^{-3} and A = B = b = 5; these values of A, B, and b were not tuned carefully, and better results could probably be obtained by tuning them.

Figure 1 shows the convergence of the compared methods with respect to time, which we take to be proportional to the query complexity. The objective value gap means H(x_t) − H(x*), where x* is the optimal solution of H(x); we compute H(x*) by running our method until convergence. First, by comparing the axes in Figure 1, we can see that when κ_cov = 10 all compared methods need more time to minimize problem (43), which is consistent with our analysis: increasing κ increases the total query complexity. Second, com-SVR-ADMM and VRSC-PG exhibit linear convergence, whereas ASC-PG runs faster at the beginning because of its low per-iteration query complexity but slows down once the learning rate becomes small. In all four plots, our VRSC-PG consistently performs best among the compared methods.
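For concreteness, the sketch below evaluates the ℓ1-regularized mean-variance objective (43) on synthetic rewards. The data generation here is a simplified stand-in (an isotropic Gaussian made positive) meant only to illustrate the objective, not to reproduce the paper's κ_cov-controlled construction.

```python
import numpy as np

def portfolio_objective(x, r, lam):
    """l1-regularized mean-variance objective (43): negative mean return
    plus the variance of the returns, plus h(x) = lam * ||x||_1.
    r has shape (n, N): r[t] is the reward vector of the N assets at time t."""
    returns = r @ x                      # <r_t, x> for every time step t
    mean_return = returns.mean()
    variance = np.mean((returns - mean_return) ** 2)
    return -mean_return + variance + lam * np.sum(np.abs(x))

# Synthetic data in the spirit of the experiment (sizes from the paper,
# the reward distribution is a simplified assumption).
rng = np.random.default_rng(0)
n, N, lam = 2000, 200, 1e-3
r = np.abs(rng.multivariate_normal(np.ones(N), np.eye(N), size=n))  # positive rewards
x0 = np.full(N, 1.0 / N)                 # equal-weight starting portfolio
print(portfolio_objective(x0, r, lam))
```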
Application to Reinforcement Learning

We then apply stochastic composition optimization to reinforcement learning and evaluate the three compared methods on the task of policy evaluation. In reinforcement learning, let V^π(s) be the value of state s under policy π. The value function V^π(s) can be evaluated through the Bellman equation

    V^\pi(s_1) = \mathbb{E}\big[ r_{s_1, s_2} + \gamma V^\pi(s_2) \mid s_1 \big],    (44)

for all s_1, s_2 ∈ {1, 2, ..., S}, where S represents the total number of states. According to (Wang and Liu 2016), the Bellman equation (44) can also be written as a composition problem. In our experiment, we also add a sparsity regularization h(x) = λ‖x‖_1 to the objective function. Following (Dann, Neumann, and Peters 2014), we generate a Markov decision process (MDP) with 400 states and 10 actions at each state. The transition probabilities are generated randomly from the uniform distribution on [0, 1], and we add 10^{-5} to each element of the transition matrix to ensure the ergodicity of the MDP. The rewards r(s, s') from state s to state s' are also sampled uniformly from [0, 1]. In this experiment we set λ = 10^{-3} and A = B = b = 5; again these values were not tuned carefully, and better results could be obtained by tuning them.

In Figure 2, we plot the convergence of the objective value and of ‖G(x)‖_2 with respect to time. We observe that VRSC-PG is much faster than ASC-PG, which is already reflected in the convergence rate analysis. It is also clear that VRSC-PG converges faster than com-SVR-ADMM. The experimental results on policy evaluation again verify our theoretical analysis.

Conclusion

In this paper, we propose the variance reduced stochastic compositional proximal gradient method (VRSC-PG) for the composition problem with nonsmooth regularization penalty. We analyze the convergence rate of our method: (1) for the strongly convex composition problem, VRSC-PG is proved to admit linear convergence; (2) for the general composition problem, VRSC-PG significantly improves the state-of-the-art convergence rate from O(T^{-1/2}) to O((n_1+n_2)^{2/3}/T). To the best of our knowledge, both theoretical results are the state of the art for stochastic composition optimization. Finally, we apply our method to two applications, portfolio management and reinforcement learning. Experimental results show that our method has the best performance in all cases and verify the conclusions of the theoretical analysis.

Acknowledgement

Z. H., B. G., and H. H. were partially supported by the following grants: NSF-IIS, NSF-IIS 34452, NSF-DBI, NSF-IIS 69308, NSF-IIS, NIH R01 AG. J. L. was partially supported by NSF-CCF.

[Figure 2: Experimental results of policy evaluation in reinforcement learning. We plot the convergence of the objective value and of the full gradient ‖G(x)‖_2 with respect to time, where G(x) = ∇f(x) + ∂h(x) and ‖G(x)‖_2 denotes its ℓ2-norm.]

References

Allen-Zhu, Z., and Yuan, Y. 2016. Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In International Conference on Machine Learning.
Bottou, L.; Curtis, F. E.; and Nocedal, J. 2016. Optimization methods for large-scale machine learning. arXiv preprint.
Dai, B.; He, N.; Pan, Y.; Boots, B.; and Song, L. 2016. Learning from conditional distributions via dual kernel embeddings. arXiv preprint.
Dann, C.; Neumann, G.; and Peters, J. 2014. Policy evaluation with temporal differences: a survey and comparison. Journal of Machine Learning Research 15.
Defazio, A.; Bach, F.; and Lacoste-Julien, S. 2014. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems.
Dentcheva, D.; Penev, S.; and Ruszczyński, A. 2016. Statistical estimation of composite risk functionals and risk optimization problems. Annals of the Institute of Statistical Mathematics.
Gu, B.; Huo, Z.; and Huang, H. 2016a. Asynchronous stochastic block coordinate descent with variance reduction. arXiv preprint.
Gu, B.; Huo, Z.; and Huang, H. 2016b. Zeroth-order asynchronous doubly stochastic algorithm with variance reduction. arXiv preprint.
Huo, Z., and Huang, H. 2017. Asynchronous mini-batch gradient descent with variance reduction for non-convex optimization. In AAAI.
Johnson, R., and Zhang, T. 2013. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems.
Lian, X.; Wang, M.; and Liu, J. 2016. Finite-sum composition optimization via variance reduced gradient descent. arXiv preprint.
Nesterov, Y. 1983. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). In Doklady AN SSSR, volume 269.
Reddi, S. J.; Sra, S.; Poczos, B.; and Smola, A. 2016a. Fast stochastic methods for nonsmooth nonconvex optimization. arXiv preprint.
Reddi, S. J.; Sra, S.; Poczos, B.; and Smola, A. J. 2016b. Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. In Advances in Neural Information Processing Systems.
Wang, M., and Liu, J. 2016. Accelerating stochastic composition optimization. In Advances in Neural Information Processing Systems.
Wang, M.; Fang, E. X.; and Liu, H. 2014. Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions. arXiv preprint.
Xiao, L., and Zhang, T. 2014. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization 24(4).
Yu, Y., and Huang, L. 2017. Fast stochastic variance reduced ADMM for stochastic composition optimization. arXiv preprint.
