Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization

Zhouyuan Huo,1 Bin Gu,1 Ji Liu,2 Heng Huang1
1 Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA
2 Department of Computer Science, University of Rochester, Rochester, NY 14627, USA
zhouyuan.huo@pitt.edu, big0@pitt.edu, jliu@cs.rochester.edu, heng.huang@pitt.edu

Abstract

Stochastic composition optimization has drawn much attention recently and has been successful in many emerging applications of machine learning, statistical analysis, and reinforcement learning. In this paper, we focus on the composition problem with a nonsmooth regularization penalty. Previous works either have slow convergence rates or do not provide a complete convergence analysis for the general problem. We tackle these two issues by proposing a new stochastic composition optimization method for the composition problem with a nonsmooth regularization penalty. In our method, we apply the variance reduction technique to accelerate convergence. To the best of our knowledge, our method admits the fastest convergence rate for stochastic composition optimization: for the strongly convex composition problem, our algorithm is proved to admit linear convergence; for the general composition problem, our algorithm significantly improves the state-of-the-art convergence rate from O(T^{-1/2}) to O((n_1+n_2)^{2/3}/T). Finally, we apply our proposed algorithm to portfolio management and policy evaluation in reinforcement learning. Experimental results verify our theoretical analysis.

Introduction

Stochastic composition optimization has drawn much attention recently and has been successful in addressing many emerging applications in different areas, such as reinforcement learning (Dai et al. 2016; Wang and Liu 2016), statistical learning (Wang, Fang, and Liu 2014), and risk management (Dentcheva, Penev, and Ruszczyński 2016).
The authors in (Wang, Fang, and Liu 2014; Wang and Liu 2016) proposed the composition problem, which is the composition of two expected-value functions:

    \min_{x \in \mathbb{R}^N} \underbrace{\mathbb{E}_i F_i(\mathbb{E}_j G_j(x))}_{f(x)} + h(x),    (1)

where G_j(x): R^N -> R^M are inner component functions and F_i(y): R^M -> R are outer component functions. The regularization penalty h(x) is a closed convex function, but not necessarily smooth. In practice, we usually solve the finite-sum scenario of the composition problem, which can be represented as follows:

    \min_{x \in \mathbb{R}^N} H(x) = \min_{x \in \mathbb{R}^N} \underbrace{\frac{1}{n_1}\sum_{i=1}^{n_1} F_i\Big(\frac{1}{n_2}\sum_{j=1}^{n_2} G_j(x)\Big)}_{f(x)} + h(x),    (2)

where we define F(y) = \frac{1}{n_1}\sum_{i=1}^{n_1} F_i(y) and G(x) = \frac{1}{n_2}\sum_{j=1}^{n_2} G_j(x). Throughout this paper, we mainly focus on the case where F_i and G_j are smooth. However, we do not require F_i and G_j to be convex.

[Footnote: To whom all correspondence should be addressed. Copyright (c) 2018, Association for the Advancement of Artificial Intelligence ( All rights reserved.]

Minimizing the composition of expected-value functions (1) or finite-sum functions (2) is challenging. The classical stochastic gradient method (SGD) and its variants are well suited to minimizing traditional finite-sum functions (Bottou, Curtis, and Nocedal 2016). However, they are not directly applicable to the composition problem. To apply SGD, we need to compute the unbiased sampling gradient (\partial G_j(x))^T \nabla F_i(G(x)) of problem (2), which is time-consuming when G(x) is unknown. Evaluating G(x) requires traversing all inner component functions, which is unacceptable to compute in each iteration if n_2 is a large number. In (Wang, Fang, and Liu 2014), the authors considered the problem with h(x) = 0 and proposed the stochastic compositional gradient descent algorithm (SCGD), which is the first stochastic method for the composition problem. In their paper, they proved that the convergence rate of SCGD is O(T^{-2/3}) for the strongly convex composition problem and O(T^{-1/4}) for the general problem. They also proposed accelerated SCGD by using the Nesterov smoothing technique (Nesterov 1983), which is proved to admit a faster convergence rate.
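The difficulty with plugging the chain rule into SGD can be seen numerically. The following sketch (ours, not from the paper; a toy scalar composition is an assumption for illustration) compares the naive single-sample estimator, which substitutes a sampled inner value G_j(x) for the unknown G(x), against the true gradient; even in expectation the two disagree when F is nonlinear:

```python
import numpy as np

# Toy composition with n1 = 1, n2 = 2, scalar x:
#   G_1(x) = x, G_2(x) = 2x  =>  G(x) = 1.5x;  F(y) = y^2  =>  f(x) = (1.5x)^2.
def true_grad(x):
    # Chain rule through the *averaged* inner function: f'(x) = 1.5 * 2 * (1.5x).
    return 1.5 * 2.0 * (1.5 * x)

def naive_estimator_mean(x):
    # Expectation over j of the single-sample estimator G_j'(x) * F'(G_j(x)).
    grads = [g * 2.0 * (g * x) for g in (1.0, 2.0)]  # g = G_j'(x)
    return float(np.mean(grads))

print(true_grad(1.0), naive_estimator_mean(1.0))  # 4.5 vs 5.0: biased
```

The gap (4.5 vs 5.0) is exactly why evaluating the full inner average G(x), or tracking it with an auxiliary variable as SCGD does, is unavoidable.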
SCGD has constant query complexity per iteration; however, its convergence rate is far worse than that of the full gradient method because of the noise induced by sampling gradients. Recently, the variance reduction technique (Johnson and Zhang 2013) was applied to accelerate the convergence of stochastic composition optimization. Lian, Wang, and Liu (2016) first utilized the variance reduction technique and proposed two variance-reduced stochastic compositional gradient descent methods, Compositional-SVRG-1 and Compositional-SVRG-2. Both methods are proved to admit a linear convergence rate. However, the methods proposed in (Wang, Fang, and Liu 2014)
Table 1: Comparison of SCGD, Accelerated SCGD, ASC-PG, Compositional-SVRG-1, Compositional-SVRG-2, com-SVR-ADMM, and our VRSC-PG in terms of convergence. For a fair comparison, we account for query complexity in the convergence rate. One query of the Sampling Oracle (SO) has three cases: (1) given x in R^N and j in {1, 2, ..., n_2}, SO returns G_j(x) in R^M; (2) given x in R^N and j in {1, 2, ..., n_2}, SO returns \partial G_j(x) in R^{M x N}; (3) given y in R^M and i in {1, 2, ..., n_1}, SO returns \nabla F_i(y) in R^M. T denotes the total number of iterations, \kappa denotes the condition number, and 0 < \rho < 1.

Algorithm | h(x) != 0 | Strongly Convex | General Problem
SCGD (Wang, Fang, and Liu 2014) | no | O(T^{-2/3}) | O(T^{-1/4})
Accelerated SCGD (Wang, Fang, and Liu 2014) | no | O(T^{-4/5}) | O(T^{-2/7})
Compositional-SVRG-1 (Lian, Wang, and Liu 2016) | no | O(\rho^{T/(n_1+n_2+\kappa^4)}) | -
Compositional-SVRG-2 (Lian, Wang, and Liu 2016) | no | O(\rho^{T/(n_1+n_2+\kappa^3)}) | -
ASC-PG (Wang and Liu 2016) | yes | O(T^{-4/5}) | O(T^{-4/9})
ASC-PG if G_j(x) are linear (Wang and Liu 2016) | yes | O(1/T) | O(T^{-1/2})
com-SVR-ADMM (Yu and Huang 2017) | yes | O(\rho^{T/(n_1+n_2+\kappa^4)}) | -
VRSC-PG (Ours) | yes | O(\rho^{T/(n_1+n_2+\kappa^3)}) | O((n_1+n_2)^{2/3}/T)

and (Lian, Wang, and Liu 2016) are not applicable to the composition problem with a nonsmooth regularization penalty. The composition problem with nonsmooth regularization was then considered in (Wang and Liu 2016; Yu and Huang 2017). In (Wang and Liu 2016), the authors proposed the accelerated stochastic compositional proximal gradient algorithm (ASC-PG). They proved that the optimal convergence rates of ASC-PG for the strongly convex problem and the general problem are O(1/T) and O(T^{-1/2}), respectively. However, ASC-PG suffers from slow convergence because of the noise of the sampling gradients. Yu and Huang (2017) proposed com-SVR-ADMM using variance reduction. Although com-SVR-ADMM admits linear convergence for the strongly convex composition problem, it is not optimal. Besides, they did not analyze the convergence for the general nonconvex composition problem either. We review the convergence rates of stochastic composition optimization methods in Table 1. In this paper, we propose the variance-reduced stochastic compositional proximal gradient method (VRSC-PG) for the composition problem with a nonsmooth regularization penalty.
Applying the variance reduction technique to the composition problem is nontrivial because the optimization procedure and the convergence analysis are essentially different. We investigate the convergence rate of our method: (1) for the strongly convex problem, we prove that VRSC-PG has a linear convergence rate O(\rho^{T/(n_1+n_2+\kappa^3)}), which is faster than com-SVR-ADMM; (2) for the general problem, sometimes nonconvex, VRSC-PG significantly improves the state-of-the-art convergence rate of ASC-PG from O(T^{-1/2}) to O((n_1+n_2)^{2/3}/T). To the best of our knowledge, our results are the new benchmark for stochastic composition optimization. We further evaluate our method by applying it to portfolio management and reinforcement learning. Experimental results verify our theoretical analysis.

[Footnote: In (Yu and Huang 2017), their result is O(\rho^{T/(n_1+n_2+Am)}). We prove that to obtain linear convergence, A and m must be proportional to \kappa^2, which is not included in their paper. Check Remark 1 in the supplementary material.]

Preliminary

In this section, we briefly review stochastic composition optimization and the proximal stochastic variance reduced gradient method.

Stochastic Composition Optimization

The objective function of stochastic composition optimization is the composition of expected-value (1) or finite-sum (2) functions, which is much more complicated than the traditional finite-sum problem. By the chain rule, the full gradient of the composition problem is

    \nabla f(x) = (\partial G(x))^\top \nabla F(G(x)).

Given x, applying the classical stochastic gradient descent method to compute the unbiased sampling gradient (\partial G_j(x))^\top \nabla F_i(G(x)) in a constant number of queries is not available when G(x) is unknown. In problem (2), evaluating G(x) is time-consuming and requires n_2 queries in each iteration. Therefore, classical SGD is not applicable to composition optimization. In (Wang, Fang, and Liu 2014), the authors proposed the first stochastic compositional gradient descent method (SCGD) for minimizing the stochastic composition problem with h(x) = 0. In their paper, they proposed to use an auxiliary variable y to approximate G(x). In each iteration t, we store x_t and y_t in memory. SCGD is briefly described in Algorithm 1. In the algorithm, \alpha_t and \beta_t are learning rates.
Both of them are decreasing to guarantee convergence because of the noise induced by sampling gradients. In their paper, they supposed that x \in X, and in each iteration x is projected onto X after step 4. Furthermore, the authors proposed Accelerated SCGD by applying Nesterov smoothing (Nesterov 1983), which is proved to converge faster than basic SCGD.
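The two coupled SCGD recursions (3) and (4) can be sketched as follows (a minimal NumPy sketch under our own toy oracle interface, not the authors' code: G(j, x) and dG(j, x) are assumed callables returning the sampled inner value and Jacobian, and dF(i, y) the sampled outer gradient):

```python
import numpy as np

def scgd(G, dG, dF, x0, y0, n1, n2, T, alpha, beta, rng):
    """Basic SCGD: y_t tracks G(x_t) via a running average (Eq. (3));
    x_t takes a stochastic quasi-gradient step through the chain rule (Eq. (4)).
    alpha and beta map the iteration counter t to (decreasing) step sizes."""
    x, y = x0.astype(float), y0.astype(float)
    for t in range(T):
        j = rng.integers(n2)                         # inner sample (2 queries)
        y = (1.0 - beta(t)) * y + beta(t) * G(j, x)  # Eq. (3)
        i = rng.integers(n1)                         # outer sample (1 query)
        x = x - alpha(t) * dG(j, x).T @ dF(i, y)     # Eq. (4)
    return x
```

On a trivial composition (every G_j the identity, every F_i(y) = ||y||^2 / 2), this reduces to gradient descent on ||x||^2 / 2 filtered through the auxiliary variable, and x is driven toward 0.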
Algorithm 1 SCGD
1: Initialize x_0 \in R^N, y_0 \in R^M;
2: for t = 0, 1, 2, ..., T-1 do
3:   Uniformly sample j from {1, 2, ..., n_2} with replacement and query G_j(x_t) and \partial G_j(x_t); (2 queries)
4:   Update y_{t+1} using:
       y_{t+1} <- (1 - \beta_t) y_t + \beta_t G_j(x_t);    (3)
5:   Uniformly sample i from {1, 2, ..., n_1} with replacement and query \nabla F_i(y_{t+1}); (1 query)
6:   Update x_{t+1} using:
       x_{t+1} <- x_t - \alpha_t (\partial G_j(x_t))^\top \nabla F_i(y_{t+1});    (4)
7: end for

Proximal Stochastic Variance Reduced Gradient

Stochastic variance reduced gradient (SVRG) (Johnson and Zhang 2013) was proposed to minimize finite-sum functions:

    \min_{x \in \mathbb{R}^N} \frac{1}{n} \sum_{i=1}^{n} f_i(x),    (5)

where the component functions f_i(x): R^N -> R. In large-scale optimization, SGD and its variants use the unbiased sampling gradient \nabla f_i(x) as an approximation of the full gradient, which only requires one query in each iteration. However, the variance induced by sampling gradients forces us to decrease the learning rate to make the algorithm converge. Suppose x^* is the optimal solution to problem (5): the full gradient \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(x^*) = 0, while a sampling gradient \nabla f_i(x^*) \neq 0 in general. We must decrease the learning rate, otherwise convergence of the objective value cannot be guaranteed. However, the decreasing learning rate also makes SGD converge very slowly. For example, if problem (5) is strongly convex, the gradient descent method (GD) converges at a linear rate, while SGD converges at a rate of O(1/T). Reducing the variance is one of the most important ways to accelerate SGD, and it has been widely applied to large-scale optimization (Bottou, Curtis, and Nocedal 2016; Defazio, Bach, and Lacoste-Julien 2014; Gu, Huo, and Huang 2016b; Allen-Zhu and Yuan 2016; Huo and Huang 2017; Gu, Huo, and Huang 2016a). In (Xiao and Zhang 2014), the authors considered the nonsmooth regularization penalty h(x) \neq 0 and proposed the proximal stochastic variance reduced gradient method (Proximal SVRG), briefly described in Algorithm 2. In their paper, they used v_t as an approximation of the full gradient, where the estimate is unbiased. It was also proved that the variance of v_t converges to zero:

    \lim_{t \to \infty} \mathbb{E}\Big\| v_t - \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(x_t) \Big\|^2 = 0.

Therefore, we can keep the learning rate \eta constant in the procedure. In step 7, Prox_{\eta h(\cdot)}
denotes the proximal operator. With the definition of the proximal mapping, we have:

    \mathrm{Prox}_{\eta h(\cdot)}(x) = \arg\min_{y} \; h(y) + \frac{1}{2\eta}\|y - x\|^2.    (6)

Convergence analysis and experimental results confirmed that Proximal SVRG admits linear convergence in expectation for strongly convex optimization. In (Reddi et al. 2016b), the authors proved that Proximal SVRG has a sublinear convergence rate of O(n^{2/3}/T) when the f_i(x) are nonconvex.

Algorithm 2 Proximal SVRG
1: Initialize \tilde{x}^0 \in R^N;
2: for s = 0, 1, 2, ..., S-1 do
3:   x^{s+1}_0 <- \tilde{x}^s;
4:   \tilde{f} <- \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(\tilde{x}^s); (n queries)
5:   for t = 0, 1, 2, ..., m-1 do
6:     Uniformly sample i from {1, 2, ..., n} with replacement and query \nabla f_i(x^{s+1}_t) and \nabla f_i(\tilde{x}^s); (2 queries)
7:     Update v^{s+1}_t using:
         v^{s+1}_t <- \nabla f_i(x^{s+1}_t) - \nabla f_i(\tilde{x}^s) + \tilde{f};    (7)
8:     Update the model x^{s+1}_{t+1} using:
         x^{s+1}_{t+1} <- \mathrm{Prox}_{\eta h(\cdot)}(x^{s+1}_t - \eta v^{s+1}_t);    (8)
9:   end for
10:  \tilde{x}^{s+1} <- x^{s+1}_m;
11: end for

Variance Reduced Stochastic Compositional Proximal Gradient

In this section, we propose the variance-reduced stochastic compositional proximal gradient method (VRSC-PG) for solving the finite-sum composition problem with a nonsmooth regularization penalty (2). VRSC-PG is presented in Algorithm 3. Similar to the framework of Proximal SVRG (Xiao and Zhang 2014), our VRSC-PG also has two nested loops. At the beginning of outer loop s, we keep a snapshot of the current model \tilde{x}^s in memory and compute the full gradient:

    \nabla f(\tilde{x}^s) = \Big( \frac{1}{n_2}\sum_{j=1}^{n_2} \partial G_j(\tilde{x}^s) \Big)^\top \frac{1}{n_1}\sum_{i=1}^{n_1} \nabla F_i(\tilde{G}^s),    (9)

where \tilde{G}^s = \frac{1}{n_2}\sum_{j=1}^{n_2} G_j(\tilde{x}^s) denotes the value of the inner functions and \partial G(\tilde{x}^s) = \frac{1}{n_2}\sum_{j=1}^{n_2} \partial G_j(\tilde{x}^s) denotes the Jacobian of the inner functions. Computing the full gradient of f(x) in problem (2) requires n_1 + 2n_2 queries. To make the number of queries in each inner iteration independent of n_2, we keep \hat{G}^{s+1}_t and \partial\hat{G}^{s+1}_t in memory as estimates of G(x^{s+1}_t) and \partial G(x^{s+1}_t), respectively. In our algorithm, we query G_{A_t}(x^{s+1}_t) and G_{A_t}(\tilde{x}^s),
then \hat{G}^{s+1}_t is evaluated as follows:

    \hat{G}^{s+1}_t = \tilde{G}^s - \frac{1}{A}\sum_{j=1}^{A}\Big( G_{A_t[j]}(\tilde{x}^s) - G_{A_t[j]}(x^{s+1}_t) \Big),    (10)

where A_t[j] denotes element j of the set A_t and |A_t| = A. The elements of A_t are uniformly sampled from {1, 2, ..., n_2} with replacement. In (10), we reduce the variance of the estimate of G(x^{s+1}_t) by using \tilde{G}^s and G_{A_t}(\tilde{x}^s). Similarly, we sample B_t of size B from {1, 2, ..., n_2} uniformly with replacement, and query \partial G_{B_t}(x^{s+1}_t) and \partial G_{B_t}(\tilde{x}^s). The estimate of \partial G(x^{s+1}_t) is evaluated as follows:

    \partial\hat{G}^{s+1}_t = \partial G(\tilde{x}^s) - \frac{1}{B}\sum_{j=1}^{B}\Big( \partial G_{B_t[j]}(\tilde{x}^s) - \partial G_{B_t[j]}(x^{s+1}_t) \Big),    (11)

where B_t[j] denotes element j of the set B_t and |B_t| = B. It is important to note that A_t and B_t are independent. Computing \hat{G}^{s+1}_t and \partial\hat{G}^{s+1}_t requires 2A + 2B queries in each inner iteration. Now we can compute the estimate of \nabla f(x^{s+1}_t) in inner iteration t as follows:

    v^{s+1}_t = \frac{1}{b}\sum_{i_t \in I_t}\Big( (\partial\hat{G}^{s+1}_t)^\top \nabla F_{i_t}(\hat{G}^{s+1}_t) - (\partial G(\tilde{x}^s))^\top \nabla F_{i_t}(\tilde{G}^s) \Big) + \nabla f(\tilde{x}^s),    (12)

where I_t is a set of indexes uniformly sampled from {1, 2, ..., n_1} and |I_t| = b. As per (12), we need to query \nabla F_{I_t}(\hat{G}^{s+1}_t) and \nabla F_{I_t}(\tilde{G}^s), which requires 2b queries. Finally, we update the model with the proximal operator:

    x^{s+1}_{t+1} = \mathrm{Prox}_{\eta h}(x^{s+1}_t - \eta v^{s+1}_t),    (13)

where \eta is the learning rate.

Convergence Analysis

In this section, we prove that (1) VRSC-PG admits a linear convergence rate for the strongly convex problem; (2) VRSC-PG admits a sublinear convergence rate O((n_1+n_2)^{2/3}/T) for the general problem. To the best of our knowledge, both are the best results so far. The following assumptions are commonly used for stochastic composition optimization (Wang, Fang, and Liu 2014; Wang and Liu 2016; Lian, Wang, and Liu 2016).

Strong convexity: To analyze the convergence of VRSC-PG for the strongly convex composition problem, we assume that the function f is \mu-strongly convex.

Assumption 1 The function f(x) is \mu-strongly convex. Therefore, for all x and y, we have:

    \|\nabla f(x) - \nabla f(y)\| \ge \mu \|x - y\|.    (15)

Equivalently, \mu-strong convexity can also be written as follows:

    f(x) \ge f(y) + \langle \nabla f(y), x - y \rangle + \frac{\mu}{2}\|x - y\|^2.    (16)

Algorithm 3 VRSC-PG
Input: The total number of iterations in the inner loop m, the total number of iterations in the outer loop S, the sizes of the mini-batch sets A, B and b, learning rate \eta.
1: Initialize \tilde{x}^0 \in R^N;
2: for s = 0, 1, 2, ..., S-1 do
3:   x^{s+1}_0 <- \tilde{x}^s;
4:   \tilde{G}^s <- \frac{1}{n_2}\sum_{j=1}^{n_2} G_j(\tilde{x}^s); (n_2 queries)
5:   \partial G(\tilde{x}^s) <- \frac{1}{n_2}\sum_{j=1}^{n_2} \partial G_j(\tilde{x}^s); (n_2 queries)
6:   Compute the full gradient \nabla f(\tilde{x}^s) using (9); (n_1 queries)
7:   for t = 0, 1, 2, ..., m-1 do
8:     Uniformly sample A_t from {1, 2, ..., n_2} with replacement, |A_t| = A;
9:     Update \hat{G}^{s+1}_t using (10); (2A queries)
10:    Uniformly sample B_t from {1, 2, ..., n_2} with replacement, |B_t| = B;
11:    Update \partial\hat{G}^{s+1}_t using (11); (2B queries)
12:    Uniformly sample I_t from {1, 2, ..., n_1} with replacement;
13:    Compute v^{s+1}_t using (12); (2b queries)
14:    Update the model x^{s+1}_{t+1} using:
         x^{s+1}_{t+1} <- \mathrm{Prox}_{\eta h}(x^{s+1}_t - \eta v^{s+1}_t);    (14)
15:  end for
16:  \tilde{x}^{s+1} <- x^{s+1}_m;
17: end for

Lipschitz gradients: We assume that there exist Lipschitz constants L_F, L_G and L_f for \nabla F_i(x), \partial G_j(x) and \nabla f(x), respectively.

Assumption 2 There exist constants L_F, L_G and L_f such that for all x, y, all i \in {1, ..., n_1} and all j \in {1, ..., n_2}:

    \|\nabla F_i(x) - \nabla F_i(y)\| \le L_F \|x - y\|,    (17)
    \|\partial G_j(x) - \partial G_j(y)\| \le L_G \|x - y\|,    (18)
    \|(\partial G_j(x))^\top \nabla F_i(G(x)) - (\partial G_j(y))^\top \nabla F_i(G(y))\| \le L_f \|x - y\|.    (19)

As proved in (Lian, Wang, and Liu 2016), according to (19) we have:

    \|\nabla f(x) - \nabla f(y)\| \le L_f \|x - y\|, \quad \forall x, y.    (20)

Equivalently, (20) can also be written as follows: for all x, y,

    f(x) \le f(y) + \langle \nabla f(y), x - y \rangle + \frac{L_f}{2}\|x - y\|^2.    (21)

Bounded gradients: We assume that the gradients \nabla F_i(x) and \partial G_j(x) are upper bounded.

Assumption 3 The gradients \nabla F_i(x) and \partial G_j(x) have upper bounds B_F and B_G, respectively:

    \|\nabla F_i(x)\| \le B_F, \quad \forall x, \; i \in \{1, ..., n_1\},    (22)
    \|\partial G_j(x)\| \le B_G, \quad \forall x, \; j \in \{1, ..., n_2\}.    (23)
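One outer loop of Algorithm 3 can be sketched as follows (our own minimal NumPy sketch, not the authors' implementation; the oracle callables G(j, x), dG(j, x), dF(i, y), and the choice of h(x) = lam * ||x||_1 with its closed-form soft-thresholding proximal operator per Eq. (6), are our assumptions):

```python
import numpy as np

def prox_l1(z, lam, eta):
    """Prox_{eta*h}(z) for h(x) = lam * ||x||_1 (Eq. (6)): soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)

def vrsc_pg_epoch(G, dG, dF, x_tilde, n1, n2, m, A, B, b, eta, lam, rng):
    """One outer iteration s of Algorithm 3 (VRSC-PG)."""
    # Snapshot: inner value, inner Jacobian, and full gradient, Eq. (9).
    G_snap = np.mean([G(j, x_tilde) for j in range(n2)], axis=0)
    dG_snap = np.mean([dG(j, x_tilde) for j in range(n2)], axis=0)
    grad_snap = dG_snap.T @ np.mean([dF(i, G_snap) for i in range(n1)], axis=0)
    x = x_tilde.copy()
    for _ in range(m):
        At = rng.integers(n2, size=A)             # Eq. (10): estimate G(x)
        G_hat = G_snap - np.mean([G(j, x_tilde) - G(j, x) for j in At], axis=0)
        Bt = rng.integers(n2, size=B)             # Eq. (11): estimate dG(x)
        dG_hat = dG_snap - np.mean([dG(j, x_tilde) - dG(j, x) for j in Bt], axis=0)
        It = rng.integers(n1, size=b)             # Eq. (12): variance-reduced gradient
        v = grad_snap + np.mean(
            [dG_hat.T @ dF(i, G_hat) - dG_snap.T @ dF(i, G_snap) for i in It], axis=0)
        x = prox_l1(x - eta * v, lam, eta)        # Eqs. (13)/(14)
    return x
```

On a trivial composition (G_j the identity, F_i(y) = ||y - c_i||^2 / 2 with the c_i averaging to zero, lam = 0), the variance-reduced estimate v is exact and one epoch contracts x toward the minimizer 0.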
Note that we do not need the strong convexity assumption when we analyze the convergence of VRSC-PG for the general problem.

Strongly Convex Problem

In this section, we prove that our VRSC-PG admits a linear convergence rate for the strongly convex finite-sum composition problem with nonsmooth regularization penalty (2). We need Assumptions 1, 2 and 3 in this section. Unlike Prox-SVRG in (Xiao and Zhang 2014), the estimate v^{s+1}_t is biased, i.e., \mathbb{E}_{I_t,A_t,B_t}[v^{s+1}_t] \neq \nabla f(x^{s+1}_t). This makes proving the convergence rate of VRSC-PG more challenging than the analysis in (Xiao and Zhang 2014). In spite of this, we can demonstrate that \mathbb{E}\|v^{s+1}_t - \nabla f(x^{s+1}_t)\|^2 is upper bounded as well.

Lemma 1 Let x^* be the optimal solution to problem (2), such that x^* = \arg\min_{x \in \mathbb{R}^N} H(x), and define

    \gamma = 64\Big( \frac{B_F^2 L_G^2}{B} + \frac{B_G^4 L_F^2}{A} \Big) + 8 L_f.

Suppose Assumptions 1, 2 and 3 hold. From the definition of v^{s+1}_t in (12), the following inequality holds:

    \mathbb{E}\|v^{s+1}_t - \nabla f(x^{s+1}_t)\|^2 \le \gamma \big[ H(x^{s+1}_t) - H(x^*) + H(\tilde{x}^s) - H(x^*) \big].    (24)

Therefore, when x^{s+1}_t and \tilde{x}^s converge to x^*, \mathbb{E}\|v^{s+1}_t - \nabla f(x^{s+1}_t)\|^2 also converges to zero. Thus, we can keep the learning rate constant and obtain faster convergence.

Theorem 1 Suppose Assumptions 1, 2 and 3 hold, and let x^* = \arg\min_{x \in \mathbb{R}^N} H(x). If m, A, B and \eta are selected properly so that \rho < 1, where \rho is defined as

    \rho = \frac{2}{\mu\eta(1 - 6\eta L_f - \rho_c)m} + \frac{2\eta(6 L_f + \rho_c/\eta)}{1 - 6\eta L_f - \rho_c},    (25)
    \rho_c = 16 m \eta \Big( \frac{B_F^2 L_G^2}{B} + \frac{B_G^4 L_F^2}{A} \Big),    (26)

then our VRSC-PG admits a linear convergence rate:

    \mathbb{E}\big[H(\tilde{x}^S) - H(x^*)\big] \le \rho^S \, \mathbb{E}\big[H(\tilde{x}^0) - H(x^*)\big].    (27)

As per Theorem 1, we need to choose \eta, m, A and B properly to make \rho < 1. We provide an example to show how to select these parameters.
Corollary 1 According to Theorem 1, if we set \eta, m, A and B as follows:

    \eta = \frac{1}{96 L_f},    (28)
    m = \frac{L_f}{\mu},    (29)
    A = \frac{2048 B_G^4 L_F^2}{\mu^2},    (30)
    B = \frac{2048 B_F^2 L_G^2}{\mu^2},    (31)

then we have the following linear convergence rate for VRSC-PG:

    \mathbb{E}\big[H(\tilde{x}^S) - H(x^*)\big] \le \Big(\frac{2}{3}\Big)^S \, \mathbb{E}\big[H(\tilde{x}^0) - H(x^*)\big].    (32)

Remark 1 According to Theorem 1, to obtain

    \mathbb{E}\big[H(\tilde{x}^s) - H(x^*)\big] \le \varepsilon,    (33)

the number of stages S is required to satisfy:

    S \ge \log\Big( \frac{\mathbb{E}[H(\tilde{x}^0) - H(x^*)]}{\varepsilon} \Big) \Big/ \log\frac{1}{\rho}.    (34)

As per Algorithm 3 and the definition of the Sampling Oracle in (Wang and Liu 2016), to make the objective value gap \mathbb{E}[H(\tilde{x}^s) - H(x^*)] \le \varepsilon, the total query complexity we need is

    O\big( (n_1 + n_2 + m(A + B + b)) \log(1/\varepsilon) \big) = O\big( (n_1 + n_2 + \kappa^3) \log(1/\varepsilon) \big),

where we let \kappa = \max\{L_f/\mu, L_F, L_G\} and b can be smaller than or proportional to \kappa^2. This is better than com-SVR-ADMM (Yu and Huang 2017), whose total query complexity is O((n_1 + n_2 + \kappa^4) \log(1/\varepsilon)).

General Problem

In this section, we prove that VRSC-PG admits a sublinear convergence rate O((n_1+n_2)^{2/3}/T) for the general finite-sum composition problem with a nonsmooth regularization penalty. This is much better than the state-of-the-art method ASC-PG (Wang and Liu 2016), whose optimal convergence rate is O(T^{-1/2}). In this section, we only need Assumptions 2 and 3. The biased v^{s+1}_t makes our analysis nontrivial, and it is much different from previous analyses of the finite-sum problem (Reddi et al. 2016a). In our proof, we define the proximal gradient:

    G_\eta(x) = \frac{1}{\eta}\Big( x - \mathrm{Prox}_{\eta h(\cdot)}\big(x - \eta \nabla f(x)\big) \Big).    (35)

Theorem 2 Suppose Assumptions 2 and 3 hold. Let x^* be the optimal solution to problem (2), i.e., x^* = \arg\min_{x \in \mathbb{R}^N} H(x). If m, A, B, b and \eta are selected properly such that

    \frac{4\eta m^2 L_f^2}{b} + \frac{2\eta m^2 B_G^4 L_F^2}{A} + \frac{2\eta m^2 B_F^2 L_G^2}{B} + \frac{L_f}{2} \le \frac{1}{2\eta},    (36)

then the following inequality holds:

    \mathbb{E}\|G_\eta(x_a)\|^2 \le \frac{2\big( H(\tilde{x}^0) - H(x^*) \big)}{(2\eta - L_f\eta^2)\, T},    (37)

where x_a is uniformly selected from \{\{x^{s+1}_t\}_{t=0}^{m}\}_{s=0}^{S} and T is a multiple of m. As per Theorem 2, we need to choose m, A, B, b and \eta appropriately to satisfy condition (36). We provide an example to show how to select these parameters.
Figure 1: Experimental results for mean-variance portfolio management on synthetic data: (a), (b) \kappa_{cov} = 2; (c), (d) \kappa_{cov} = 10. \kappa_{cov} is the condition number of the covariance matrix of the corresponding Gaussian distribution used to generate the rewards. The x axis is time, which is proportional to the query complexity. On the y axis, the objective value gap is defined as H(x) - H(x^*), where x^* is obtained by running our method for enough iterations until convergence; \|\nabla G(x)\|_2 denotes the \ell_2-norm of the full gradient, where \nabla G(x) = \nabla f(x) + \partial h(x).

Corollary 2 According to Theorem 2, if we let m = (n_1+n_2)^{1/3}, \eta = \frac{1}{4 L_f}, b = 1, and T be a multiple of m, it is easy to see that if A and B are lower bounded as

    A \ge \frac{8 m^2 B_G^4 L_F^2}{L_f},    (38)
    B \ge \frac{8 m^2 B_F^2 L_G^2}{L_f},    (39)

then we obtain the sublinear convergence rate for VRSC-PG:

    \mathbb{E}\|G_\eta(x_a)\|^2 \le \frac{16 L_f \big( H(\tilde{x}^0) - H(x^*) \big)}{T}.    (40)

Remark 2 According to Theorem 2, to obtain

    \mathbb{E}\|G_\eta(x_a)\|^2 \le \varepsilon,    (41)

the number of iterations T is required to satisfy:

    T \ge \frac{16 L_f \, \mathbb{E}[H(\tilde{x}^0) - H(x^*)]}{\varepsilon}.    (42)

As per Algorithm 3 and the definition of the Sampling Oracle in (Wang and Liu 2016), to obtain an \varepsilon-accurate solution, \mathbb{E}\|G_\eta(x_a)\|^2 \le \varepsilon, the total query complexity we need is O((A + B + b)/\varepsilon) = O((n_1+n_2)^{2/3}/\varepsilon), where A, B and b are proportional to (n_1+n_2)^{2/3}. Therefore, our method improves the state-of-the-art convergence rate of stochastic composition optimization for the general problem from O(T^{-1/2}) (the optimal convergence rate of ASC-PG) to O((n_1+n_2)^{2/3}/T).

Experimental Results

We conduct two experiments to evaluate our proposed method: (1) application to portfolio management; (2) application to policy evaluation in reinforcement learning. In the experiments, three methods for stochastic composition optimization are compared:
(1) accelerated stochastic compositional proximal gradient (ASC-PG) (Wang and Liu 2016); (2) stochastic variance reduced ADMM for stochastic composition optimization (com-SVR-ADMM) (Yu and Huang 2017); (3) variance reduced stochastic compositional proximal gradient (VRSC-PG), our method.

In our experiments, the learning rate \eta is tuned over {1, 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}}. We keep the learning rate constant for com-SVR-ADMM and VRSC-PG during optimization. For ASC-PG, in order to guarantee convergence, the learning rate decreases as \eta/(1+t), where t denotes the number of iterations.

Application to Portfolio Management

Suppose there are N assets in which we can invest, and let r_t \in R^N denote the rewards of the N assets at time t. Our goal is to maximize the return of the investment and to minimize the risk of the investment at the same time. The portfolio management problem can be formulated as the following mean-variance optimization:

    \min_{x \in \mathbb{R}^N} \; -\frac{1}{n}\sum_{t=1}^{n} \langle r_t, x \rangle + \frac{1}{n}\sum_{t=1}^{n} \Big( \langle r_t, x \rangle - \frac{1}{n}\sum_{j=1}^{n} \langle r_j, x \rangle \Big)^2,    (43)

where x \in R^N denotes the investment quantity vector for the N assets. According to (Lian, Wang, and Liu 2016), problem (43) can also be viewed as a composition problem of the form (2). In our experiment, we also add a nonsmooth regularization penalty h(x) = \lambda\|x\|_1 to the mean-variance optimization problem (43). Similar to the experimental settings in (Lian, Wang, and Liu 2016), we let n = 2000 and N = 200. Rewards r_t are generated in two steps: (1) generate a Gaussian distribution on R^N, where we define the condition number of its covariance matrix as \kappa_{cov}; because \kappa_{cov} is proportional to \kappa, in our experiment we control \kappa_{cov} to change the value of \kappa; (2) sample rewards r_t from the Gaussian distribution and make all elements positive to guarantee that this problem has a solution. In the experiment, we compare the three methods on two synthetic datasets, generated from Gaussian distributions with \kappa_{cov} = 2 and \kappa_{cov} = 10 separately. We set \lambda = 10^{-3} and A = B = b = 5. We selected the values of A, B, b casually; it is probable that better results can be obtained by tuning them carefully. Figure 1 shows the convergence of the compared methods with respect to time.
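The reduction of (43) to the composition form (2) can be checked numerically. One standard construction (following Lian, Wang, and Liu 2016; the concrete maps below are our own illustration) takes the inner function G(x) = (x, (1/n) \sum_j <r_j, x>) and outer functions F_i(x, m) = -<r_i, x> + (<r_i, x> - m)^2:

```python
import numpy as np

def mean_variance_direct(R, x):
    """Objective (43) evaluated directly; rows of R are the reward vectors r_t."""
    ret = R @ x
    return -ret.mean() + ((ret - ret.mean()) ** 2).mean()

def mean_variance_composed(R, x):
    """Same objective as (1/n1) sum_i F_i(G(x)), with inner average
    G(x) = (x, mean_j <r_j, x>) and outer F_i(x, m) = -<r_i, x> + (<r_i, x> - m)^2."""
    m = (R @ x).mean()  # inner expectation: second block of (1/n2) sum_j G_j(x)
    return float(np.mean([-R[i] @ x + (R[i] @ x - m) ** 2 for i in range(len(R))]))
```

The two evaluations agree for any data, which is all the reduction claims; the composition form is what lets the algorithms above query single (i, j) samples.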
We suppose that the elapsed time is proportional to the query complexity. The objective value gap means H(x_t) - H(x^*), where x^* is the optimal solution of H(x); we compute H(x^*) by running our method until convergence. First, by observing the x and y axes in Figure 1, we can see that when \kappa_{cov} = 10, all compared methods need more time to minimize problem (43), which is consistent with our analysis: increasing \kappa increases the total query complexity. Second, we can also see that com-SVR-ADMM and VRSC-PG admit linear convergence rates. ASC-PG runs faster at the beginning because of its low query complexity in each iteration; however, its convergence slows down when the learning rate gets small. In all four figures, our VRSC-PG always has the best performance among the compared methods.

Application to Reinforcement Learning

We then apply stochastic composition optimization to reinforcement learning and evaluate the three compared methods on the task of policy evaluation. In reinforcement learning, let V^\pi(s) be the value of state s under policy \pi. The value function V^\pi(s) can be evaluated through the Bellman equation:

    V^\pi(s_1) = \mathbb{E}\big[ r_{s_1,s_2} + \gamma V^\pi(s_2) \mid s_1 \big],    (44)

for all s_1, s_2 \in {1, 2, ..., S}, where S represents the total number of states. According to (Wang and Liu 2016), the Bellman equation (44) can also be written as a composition problem. In our experiment, we also add the sparsity regularization h(x) = \lambda\|x\|_1 to the objective function. Following (Dann, Neumann, and Peters 2014), we generate a Markov decision process (MDP). There are 400 states, and 10 actions at each state. The transition probabilities are generated randomly from the uniform distribution on [0, 1]; we then add 10^{-5} to each element of the transition matrix to ensure the ergodicity of our MDP. The rewards r(s, s') for transitioning from state s to state s' are also sampled uniformly from [0, 1]. In our experiment, we set \lambda = 10^{-3} and A = B = b = 5. We again selected these values casually; better results can be obtained by tuning them carefully. In Figure 2, we plot the convergence of the objective value and \|\nabla G(x)\|_2 with respect to time.
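The MDP construction described above can be sketched as follows (our own sketch; the row normalization after adding 10^{-5} is an assumption needed to keep each row of the transition matrix a probability distribution):

```python
import numpy as np

def make_mdp(n_states, rng):
    """Random MDP for policy evaluation: uniform transition weights in [0, 1]
    plus 1e-5 for ergodicity (then row-normalized), and uniform rewards
    r(s, s') in [0, 1]."""
    P = rng.uniform(size=(n_states, n_states)) + 1e-5
    P /= P.sum(axis=1, keepdims=True)  # each row sums to 1
    r = rng.uniform(size=(n_states, n_states))
    return P, r
```

With every entry of P strictly positive, the chain is ergodic, so the Bellman equation (44) has a unique solution for \gamma < 1.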
We can observe that VRSC-PG is much faster than ASC-PG, which is already reflected in the convergence rate analysis. It is also clear that our VRSC-PG converges faster than com-SVR-ADMM. The experimental results on policy evaluation also verify our theoretical analysis.

Conclusion

In this paper, we propose the variance-reduced stochastic compositional proximal gradient method (VRSC-PG) for the composition problem with a nonsmooth regularization penalty. We also analyze the convergence rate of our method: (1) for the strongly convex composition problem, VRSC-PG is proved to admit linear convergence; (2) for the general composition problem, VRSC-PG significantly improves the state-of-the-art convergence rate from O(T^{-1/2}) to O((n_1+n_2)^{2/3}/T). To the best of our knowledge, both of our theoretical results are the state-of-the-art for stochastic composition optimization. Finally, we apply our method to two different applications, portfolio management and reinforcement learning. Experimental results show that our method always has the best performance in different cases and verify the conclusions of our theoretical analysis.

Acknowledgement

Z. H., B. G., H. H. were partially supported by the following grants: NSF-IIS, NSF-IIS 34452, NSF-DBI, NSF-IIS 69308, NSF-IIS, NIH R0 AG. J. L. was partially supported by NSF-CCF.
Figure 2: Experimental results of policy evaluation in reinforcement learning. We plot the convergence of the objective value and the full gradient \|\nabla G(x)\|_2 with respect to time, where \|\nabla G(x)\|_2 denotes the \ell_2-norm of the full gradient \nabla G(x) = \nabla f(x) + \partial h(x).

References

Allen-Zhu, Z., and Yuan, Y. 2016. Improved SVRG for non-strongly-convex or sum-of-non-convex objectives. In International Conference on Machine Learning.
Bottou, L.; Curtis, F. E.; and Nocedal, J. 2016. Optimization methods for large-scale machine learning. arXiv preprint.
Dai, B.; He, N.; Pan, Y.; Boots, B.; and Song, L. 2016. Learning from conditional distributions via dual kernel embeddings. arXiv preprint.
Dann, C.; Neumann, G.; and Peters, J. 2014. Policy evaluation with temporal differences: a survey and comparison. Journal of Machine Learning Research 15.
Defazio, A.; Bach, F.; and Lacoste-Julien, S. 2014. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems.
Dentcheva, D.; Penev, S.; and Ruszczyński, A. 2016. Statistical estimation of composite risk functionals and risk optimization problems. Annals of the Institute of Statistical Mathematics.
Gu, B.; Huo, Z.; and Huang, H. 2016a. Asynchronous stochastic block coordinate descent with variance reduction. arXiv preprint.
Gu, B.; Huo, Z.; and Huang, H. 2016b. Zeroth-order asynchronous doubly stochastic algorithm with variance reduction. arXiv preprint.
Huo, Z., and Huang, H. 2017. Asynchronous mini-batch gradient descent with variance reduction for non-convex optimization. In AAAI.
Johnson, R., and Zhang, T. 2013. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems.
Lian, X.; Wang, M.; and Liu, J. 2016. Finite-sum composition optimization via variance reduced gradient descent. arXiv preprint.
Nesterov, Y. 1983. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). In Doklady AN SSSR, volume 269.
Reddi, S. J.; Sra, S.; Poczos, B.; and Smola, A. 2016a. Fast stochastic methods for nonsmooth nonconvex optimization.
arXiv preprint.
Reddi, S. J.; Sra, S.; Poczos, B.; and Smola, A. J. 2016b. Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. In Advances in Neural Information Processing Systems.
Wang, M., and Liu, J. 2016. Accelerating stochastic composition optimization. In Advances in Neural Information Processing Systems.
Wang, M.; Fang, E. X.; and Liu, H. 2014. Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions. arXiv preprint.
Xiao, L., and Zhang, T. 2014. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization 24(4).
Yu, Y., and Huang, L. 2017. Fast stochastic variance reduced ADMM for stochastic composition optimization. arXiv preprint.
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationPrecise Rates in Complete Moment Convergence for Negatively Associated Sequences
Commuicatios of the Korea Statistical Society 29, Vol. 16, No. 5, 841 849 Precise Rates i Complete Momet Covergece for Negatively Associated Sequeces Dae-Hee Ryu 1,a a Departmet of Computer Sciece, ChugWoo
More informationElement sampling: Part 2
Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig
More informationON POINTWISE BINOMIAL APPROXIMATION
Iteratioal Joural of Pure ad Applied Mathematics Volume 71 No. 1 2011, 57-66 ON POINTWISE BINOMIAL APPROXIMATION BY w-functions K. Teerapabolar 1, P. Wogkasem 2 Departmet of Mathematics Faculty of Sciece
More informationRegression with an Evaporating Logarithmic Trend
Regressio with a Evaporatig Logarithmic Tred Peter C. B. Phillips Cowles Foudatio, Yale Uiversity, Uiversity of Aucklad & Uiversity of York ad Yixiao Su Departmet of Ecoomics Yale Uiversity October 5,
More informationMonte Carlo Optimization to Solve a Two-Dimensional Inverse Heat Conduction Problem
Australia Joural of Basic Applied Scieces, 5(): 097-05, 0 ISSN 99-878 Mote Carlo Optimizatio to Solve a Two-Dimesioal Iverse Heat Coductio Problem M Ebrahimi Departmet of Mathematics, Karaj Brach, Islamic
More informationSummary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector
Summary ad Discussio o Simultaeous Aalysis of Lasso ad Datzig Selector STAT732, Sprig 28 Duzhe Wag May 4, 28 Abstract This is a discussio o the work i Bickel, Ritov ad Tsybakov (29). We begi with a short
More informationMODEL CHANGE DETECTION WITH APPLICATION TO MACHINE LEARNING. University of Illinois at Urbana-Champaign
MODEL CHANGE DETECTION WITH APPLICATION TO MACHINE LEARNING Yuheg Bu Jiaxu Lu Veugopal V. Veeravalli Uiversity of Illiois at Urbaa-Champaig Tsighua Uiversity Email: bu3@illiois.edu, lujx4@mails.tsighua.edu.c,
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationIntroduction to Machine Learning DIS10
CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig
More informationSieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes
More informationMulti parameter proximal point algorithms
Multi parameter proximal poit algorithms Ogaeditse A. Boikayo a,b,, Gheorghe Moroşau a a Departmet of Mathematics ad its Applicatios Cetral Europea Uiversity Nador u. 9, H-1051 Budapest, Hugary b Departmet
More informationQuantile regression with multilayer perceptrons.
Quatile regressio with multilayer perceptros. S.-F. Dimby ad J. Rykiewicz Uiversite Paris 1 - SAMM 90 Rue de Tolbiac, 75013 Paris - Frace Abstract. We cosider oliear quatile regressio ivolvig multilayer
More informationSupplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We
More informationFastest mixing Markov chain on a path
Fastest mixig Markov chai o a path Stephe Boyd Persi Diacois Ju Su Li Xiao Revised July 2004 Abstract We ider the problem of assigig trasitio probabilities to the edges of a path, so the resultig Markov
More informationarxiv: v1 [stat.ml] 11 Dec 2016
Lock-Free Optimizatio for No-Covex Problems She-Yi Zhao, Gog-Duo Zhag ad Wu-Ju Li Natioal Key Laboratory for Novel Software Techology Departmet of Computer Sciece ad Techology, Najig Uiversity, Chia {zhaosy,
More informationRegression with quadratic loss
Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,
More informationRandom Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.
Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)
More informationPAijpam.eu ON TENSOR PRODUCT DECOMPOSITION
Iteratioal Joural of Pure ad Applied Mathematics Volume 103 No 3 2015, 537-545 ISSN: 1311-8080 (prited versio); ISSN: 1314-3395 (o-lie versio) url: http://wwwijpameu doi: http://dxdoiorg/1012732/ijpamv103i314
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More informationAn Alternative Scaling Factor In Broyden s Class Methods for Unconstrained Optimization
Joural of Mathematics ad Statistics 6 (): 63-67, 00 ISSN 549-3644 00 Sciece Publicatios A Alterative Scalig Factor I Broyde s Class Methods for Ucostraied Optimizatio Muhammad Fauzi bi Embog, Mustafa bi
More informationBull. Korean Math. Soc. 36 (1999), No. 3, pp. 451{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Seung Hoe Choi and Hae Kyung
Bull. Korea Math. Soc. 36 (999), No. 3, pp. 45{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Abstract. This paper provides suciet coditios which esure the strog cosistecy of regressio
More informationA new iterative algorithm for reconstructing a signal from its dyadic wavelet transform modulus maxima
ol 46 No 6 SCIENCE IN CHINA (Series F) December 3 A ew iterative algorithm for recostructig a sigal from its dyadic wavelet trasform modulus maxima ZHANG Zhuosheg ( u ), LIU Guizhog ( q) & LIU Feg ( )
More informationREGRESSION WITH QUADRATIC LOSS
REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d
More informationFast Rates for Regularized Objectives
Fast Rates for Regularized Objectives Karthik Sridhara, Natha Srebro, Shai Shalev-Shwartz Toyota Techological Istitute Chicago Abstract We study covergece properties of empirical miimizatio of a stochastic
More informationA Hadamard-type lower bound for symmetric diagonally dominant positive matrices
A Hadamard-type lower boud for symmetric diagoally domiat positive matrices Christopher J. Hillar, Adre Wibisoo Uiversity of Califoria, Berkeley Jauary 7, 205 Abstract We prove a ew lower-boud form of
More informationOn Weak and Strong Convergence Theorems for a Finite Family of Nonself I-asymptotically Nonexpansive Mappings
Mathematica Moravica Vol. 19-2 2015, 49 64 O Weak ad Strog Covergece Theorems for a Fiite Family of Noself I-asymptotically Noexpasive Mappigs Birol Güdüz ad Sezgi Akbulut Abstract. We prove the weak ad
More informationVector Quantization: a Limiting Case of EM
. Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z
More information18.657: Mathematics of Machine Learning
18.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 15 Scribe: Zach Izzo Oct. 27, 2015 Part III Olie Learig It is ofte the case that we will be asked to make a sequece of predictios,
More informationSelf-normalized deviation inequalities with application to t-statistic
Self-ormalized deviatio iequalities with applicatio to t-statistic Xiequa Fa Ceter for Applied Mathematics, Tiaji Uiversity, 30007 Tiaji, Chia Abstract Let ξ i i 1 be a sequece of idepedet ad symmetric
More informationResearch Article Approximate Riesz Algebra-Valued Derivations
Abstract ad Applied Aalysis Volume 2012, Article ID 240258, 5 pages doi:10.1155/2012/240258 Research Article Approximate Riesz Algebra-Valued Derivatios Faruk Polat Departmet of Mathematics, Faculty of
More informationProblem Set 4 Due Oct, 12
EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios
More informationOn Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities
O Equivalece of Martigale Tail Bouds ad Determiistic Regret Iequalities Sasha Rakhli Departmet of Statistics, The Wharto School Uiversity of Pesylvaia Dec 16, 2015 Joit work with K. Sridhara arxiv:1510.03925
More informationLecture 7: October 18, 2017
Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem
More informationPreponderantly increasing/decreasing data in regression analysis
Croatia Operatioal Research Review 269 CRORR 7(2016), 269 276 Prepoderatly icreasig/decreasig data i regressio aalysis Darija Marković 1, 1 Departmet of Mathematics, J. J. Strossmayer Uiversity of Osijek,
More informationA New Solution Method for the Finite-Horizon Discrete-Time EOQ Problem
This is the Pre-Published Versio. A New Solutio Method for the Fiite-Horizo Discrete-Time EOQ Problem Chug-Lu Li Departmet of Logistics The Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog Phoe: +852-2766-7410
More informationEmpirical Process Theory and Oracle Inequalities
Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationLecture 3: August 31
36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More information1 Introduction to reducing variance in Monte Carlo simulations
Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by
More informationSupplement to Graph Sparsification Approaches for Laplacian Smoothing
Supplemet to Graph Sparsificatio Approaches for Laplacia Smoothig Veerajaeyulu Sadhaala Yu-Xiag Wag Rya J. Tibshirai Machie Learig Departmet Caregie Mello Uiversity Pittsburgh PA 53 Departmet of Statistics
More informationMAT1026 Calculus II Basic Convergence Tests for Series
MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real
More informationMixed Optimization for Smooth Functions
Mixed Optimizatio for Smooth Fuctios Mehrdad Mahdavi Liju Zhag Rog Ji Departmet of Computer Sciece ad Egieerig, Michiga State Uiversity, MI, USA {mahdavim,zhaglij,rogji}@msu.edu Abstract It is well ow
More information62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +
62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of
More informationDouble Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution
Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.
More informationOutput Analysis and Run-Length Control
IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%
More informationQuestions and answers, kernel part
Questios ad aswers, kerel part October 8, 205 Questios. Questio : properties of kerels, PCA, represeter theorem. [2 poits] Let F be a RK defied o some domai X, with feature map φ(x) x X ad reproducig kerel
More informationOutline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression
REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques
More informationComputational Tutorial of Steepest Descent Method and Its Implementation in Digital Image Processing
Computatioal utorial of Steepest Descet Method ad Its Implemetatio i Digital Image Processig Vorapoj Pataavijit Departmet of Electrical ad Electroic Egieerig, Faculty of Egieerig Assumptio Uiversity, Bagkok,
More informationA collocation method for singular integral equations with cosecant kernel via Semi-trigonometric interpolation
Iteratioal Joural of Mathematics Research. ISSN 0976-5840 Volume 9 Number 1 (017) pp. 45-51 Iteratioal Research Publicatio House http://www.irphouse.com A collocatio method for sigular itegral equatios
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECTURES 5 AND 6. THE EXPERTS SETTING. EXPONENTIAL WEIGHTS All the algorithms preseted so far halluciate the future values as radom draws ad the perform
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 0 Scribe: Ade Forrow Oct. 3, 05 Recall the followig defiitios from last time: Defiitio: A fuctio K : X X R is called a positive symmetric
More informationJanuary 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS
Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we
More informationSequences. Notation. Convergence of a Sequence
Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it
More informationDynamic Policy Programming with Function Approximation: Supplementary Material
Dyamic Policy Programmig with Fuctio pproximatio: Supplemetary Material Mohammad Gheshlaghi zar Radboud Uiversity Nijmege Geert Grooteplei Noord 21 6525 EZ Nijmege Netherlads m.azar@sciece.ru.l Viceç Gómez
More informationBasics of Probability Theory (for Theory of Computation courses)
Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.
More information1 Review and Overview
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationOn the convergence rates of Gladyshev s Hurst index estimator
Noliear Aalysis: Modellig ad Cotrol, 2010, Vol 15, No 4, 445 450 O the covergece rates of Gladyshev s Hurst idex estimator K Kubilius 1, D Melichov 2 1 Istitute of Mathematics ad Iformatics, Vilius Uiversity
More informationECE-S352 Introduction to Digital Signal Processing Lecture 3A Direct Solution of Difference Equations
ECE-S352 Itroductio to Digital Sigal Processig Lecture 3A Direct Solutio of Differece Equatios Discrete Time Systems Described by Differece Equatios Uit impulse (sample) respose h() of a DT system allows
More informationComparison Study of Series Approximation. and Convergence between Chebyshev. and Legendre Series
Applied Mathematical Scieces, Vol. 7, 03, o. 6, 3-337 HIKARI Ltd, www.m-hikari.com http://d.doi.org/0.988/ams.03.3430 Compariso Study of Series Approimatio ad Covergece betwee Chebyshev ad Legedre Series
More informationOn Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
O Variace Reductio i Stochastic Gradiet Descet ad its Asychroous Variats Sashak J. Reddi Caregie Mello Uiversity sjakkamr@cs.cmu.edu Ahmed Hefy Caregie Mello Uiversity ahefy@cs.cmu.edu Barabás Póczos Caregie
More informationThe Perturbation Bound for the Perron Vector of a Transition Probability Tensor
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Liear Algebra Appl. ; : 6 Published olie i Wiley IterSciece www.itersciece.wiley.com. DOI:./la The Perturbatio Boud for the Perro Vector of a Trasitio
More informationLecture 19: Convergence
Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may
More informationMaximum Likelihood Estimation and Complexity Regularization
ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio
More informationNumerical Method for Blasius Equation on an infinite Interval
Numerical Method for Blasius Equatio o a ifiite Iterval Alexader I. Zadori Omsk departmet of Sobolev Mathematics Istitute of Siberia Brach of Russia Academy of Scieces, Russia zadori@iitam.omsk.et.ru 1
More informationProbabilistic and Average Linear Widths in L -Norm with Respect to r-fold Wiener Measure
joural of approximatio theory 84, 3140 (1996) Article No. 0003 Probabilistic ad Average Liear Widths i L -Norm with Respect to r-fold Wieer Measure V. E. Maiorov Departmet of Mathematics, Techio, Haifa,
More informationDetailed proofs of Propositions 3.1 and 3.2
Detailed proofs of Propositios 3. ad 3. Proof of Propositio 3. NB: itegratio sets are geerally omitted for itegrals defied over a uit hypercube [0, s with ay s d. We first give four lemmas. The proof of
More informationNumerical Solution of the Two Point Boundary Value Problems By Using Wavelet Bases of Hermite Cubic Spline Wavelets
Australia Joural of Basic ad Applied Scieces, 5(): 98-5, ISSN 99-878 Numerical Solutio of the Two Poit Boudary Value Problems By Usig Wavelet Bases of Hermite Cubic Splie Wavelets Mehdi Yousefi, Hesam-Aldie
More informationAppendix to: Hypothesis Testing for Multiple Mean and Correlation Curves with Functional Data
Appedix to: Hypothesis Testig for Multiple Mea ad Correlatio Curves with Fuctioal Data Ao Yua 1, Hog-Bi Fag 1, Haiou Li 1, Coli O. Wu, Mig T. Ta 1, 1 Departmet of Biostatistics, Bioiformatics ad Biomathematics,
More informationImproved Class of Ratio -Cum- Product Estimators of Finite Population Mean in two Phase Sampling
Global Joural of Sciece Frotier Research: F Mathematics ad Decisio Scieces Volume 4 Issue 2 Versio.0 Year 204 Type : Double Blid Peer Reviewed Iteratioal Research Joural Publisher: Global Jourals Ic. (USA
More informationApproximating the ruin probability of finite-time surplus process with Adaptive Moving Total Exponential Least Square
WSEAS TRANSACTONS o BUSNESS ad ECONOMCS S. Khotama, S. Boothiem, W. Klogdee Approimatig the rui probability of fiite-time surplus process with Adaptive Movig Total Epoetial Least Square S. KHOTAMA, S.
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 11
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple
More informationApplication to Random Graphs
A Applicatio to Radom Graphs Brachig processes have a umber of iterestig ad importat applicatios. We shall cosider oe of the most famous of them, the Erdős-Réyi radom graph theory. 1 Defiitio A.1. Let
More informationA New Multivariate Markov Chain Model with Applications to Sales Demand Forecasting
Iteratioal Coferece o Idustrial Egieerig ad Systems Maagemet IESM 2007 May 30 - Jue 2 BEIJING - CHINA A New Multivariate Markov Chai Model with Applicatios to Sales Demad Forecastig Wai-Ki CHING a, Li-Mi
More informationInformation-based Feature Selection
Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with
More informationExpectation-Maximization Algorithm.
Expectatio-Maximizatio Algorithm. Petr Pošík Czech Techical Uiversity i Prague Faculty of Electrical Egieerig Dept. of Cyberetics MLE 2 Likelihood.........................................................................................................
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More information