On the Linear Convergence of a Cyclic Incremental Aggregated Gradient Method


Aryan Mokhtari                                          aryanm@seas.upenn.edu
Department of Electrical and Systems Engineering
University of Pennsylvania, Philadelphia, PA 19104, USA

Mert Gürbüzbalaban                                      mgurbuzbalaban@business.rutgers.edu
Department of Management Science and Information Systems
Rutgers University, Piscataway, NJ 08854, USA

Alejandro Ribeiro                                       aribeiro@seas.upenn.edu
Department of Electrical and Systems Engineering
University of Pennsylvania, Philadelphia, PA 19104, USA

Abstract

This paper considers the problem of minimizing the average of a finite set of strongly convex functions. We introduce a cyclic incremental aggregated gradient method that at each iteration computes the gradient of only one function, chosen according to a cyclic scheme, and uses it to update the aggregated average gradient of all the functions. We prove not only that the proposed method converges linearly to the optimal argument, but also that its linear convergence factor justifies the advantage of incremental methods over full batch gradient descent. In particular, we show theoretically and empirically that one pass of the proposed method is more efficient than one iteration of gradient descent. In addition, we propose an accelerated version of the introduced cyclic incremental aggregated gradient method that softens the dependency of the convergence rate on the condition number of the problem.

Keywords: Incremental gradient methods, finite sum minimization, large-scale optimization, linear convergence rate, accelerated methods

1. Introduction

We consider optimization problems in which the objective function can be written as the average of a set of strongly convex functions. In particular, let x ∈ R^p be the optimization variable and f_i : R^p → R the i-th available function. We aim to find the minimizer of the average function f(x) = (1/n) Σ_{i=1}^n f_i(x), i.e.,

    x* = argmin_{x ∈ R^p} f(x) := argmin_{x ∈ R^p} (1/n) Σ_{i=1}^n f_i(x).    (1)

We call the f_i the instantaneous functions and the average function f the global objective function. This class of optimization problems arises in machine learning (Bottou and Le Cun (2005)), estimation, wireless systems, and sensor networks. In this work, we consider the case in which the objective functions f_i are smooth and strongly convex.

The gradient descent (GD) method is one of the first methods used for solving the problem in (1). However, gradient descent is impractical when the number of functions n is extremely large, since it requires the computation of n gradients at each iteration.

Stochastic gradient descent (SGD) and mini-batch gradient descent (MGD), which use one or a subset of the gradients, respectively, to approximate the full gradient, are more popular for large-scale problems (Robbins and Monro (1951); Bottou (2010)). Although these methods reduce the computational complexity of GD, they cannot achieve a linear convergence rate as GD does. The last decade has seen fundamental progress in developing alternatives with faster convergence. A partial list of this consequential literature includes stochastic averaging gradient (Roux et al. (2012); Defazio et al. (2014)), variance reduction methods (Johnson and Zhang (2013); Xiao and Zhang (2014)), dual coordinate methods (Shalev-Shwartz and Zhang (2013, 2016)), hybrid algorithms (Zhang et al. (2013); Konečný and Richtárik (2013)), and majorization-minimization algorithms (Mairal (2015)). All these stochastic algorithms succeed in achieving a linear convergence rate in expectation.

The other class of first-order alternatives to GD are incremental methods (Blatt et al. (2007); Tseng and Yun (2014)). This class of algorithms differs from stochastic methods in the way functions are chosen for the gradient approximation. To be more precise, in stochastic methods a function is chosen uniformly at random from the set of n functions, while in incremental methods the functions are chosen in a cyclic order. Although incremental methods perform as well as stochastic methods in practice, their convergence results are limited relative to stochastic methods. In particular, as in the case of SGD, the cyclic incremental gradient method exhibits sublinear convergence. This limitation motivated the development of the incremental aggregated gradient (IAG) method, which achieves a linear convergence rate (Gürbüzbalaban et al. (2015)). To explain our contribution, we must emphasize that the convergence constant of IAG can be smaller than the convergence constant of GD (Section 3). Thus, even though IAG is designed to improve upon GD, the available analyses still make it impossible to assert that IAG outperforms GD under all circumstances. In fact, the question of whether it is possible at all to design a cyclic method that is guaranteed to always outperform GD remains open.

In this paper, we propose a novel incremental first-order method called the Double Incremental Aggregated Gradient method (DIAG). The DIAG update uses the average of both delayed variables and delayed gradients, as opposed to the classic incremental methods in Blatt et al. (2007); Tseng and Yun (2014); Gürbüzbalaban et al. (2015), which only use the average of delayed gradients. This major difference comes from the fact that DIAG uses an approximation of the global function f at each iteration which is different from the one used by IAG. We show that this critical difference leads to an incremental algorithm whose linear convergence factor improves on the convergence factor of GD under all circumstances. To the best of our knowledge, this is the first incremental method that is guaranteed to improve on the performance of GD.

We start the paper by studying existing methods and their convergence guarantees (Section 2). Then, we present the proposed incremental method (Section 3) and suggest an efficient mechanism for implementing it (Section 3.1). Further, we provide the convergence analysis of the DIAG method (Section 4). We show that if the functions f_i are strongly convex and their gradients ∇f_i are Lipschitz continuous, then the sequence of variables x^k generated by DIAG converges linearly to the optimal argument x* (Proposition 3).
Moreover, we show that the function decrement achieved by the proposed method after each pass over the dataset is strictly smaller than the function decrement of gradient descent after one iteration (Theorem 5 and Theorem 8). We compare the performance of DIAG with its stochastic variant (MISO) and the IAG method (Section 7). Finally, we close the paper with concluding remarks.

2. Related Works and Preliminaries

Since the objective function in (1) is convex, descent methods can be used to find the optimal argument x*. In this paper, we are interested in studying methods that converge to the optimal argument of f(x) at a linear rate. It is customary for the linear convergence analysis of first-order methods to assume that the functions are smooth and strongly convex. We formalize these conditions in the following assumption.

Assumption 1 The functions f_i are differentiable and strongly convex with constant µ > 0, i.e.,

    (∇f_i(x) − ∇f_i(y))^T (x − y) ≥ µ ‖x − y‖^2.    (2)

Moreover, the gradients ∇f_i are Lipschitz continuous with constant L < ∞, i.e.,

    ‖∇f_i(x) − ∇f_i(y)‖ ≤ L ‖x − y‖.    (3)

The strong convexity of the functions f_i with constant µ implies that the global objective function f is also strongly convex with constant µ. Likewise, the Lipschitz continuity of the gradients ∇f_i with constant L yields Lipschitz continuity of the global objective function gradient ∇f with constant L. Note that the conditions in Assumption 1 are mild and hold for most large-scale applications such as linear regression, logistic regression, least squares, and support vector machines.

The optimization problem in (1) can be solved using the gradient descent (GD) method (Nesterov (2004)). The idea of GD is to update the current iterate x^k by descending along the negative direction of the current gradient ∇f(x^k) with a proper stepsize. In other words, the update of GD at step k is defined as

    x^{k+1} = x^k − ε_k ∇f(x^k) = x^k − (ε_k / n) Σ_{i=1}^n ∇f_i(x^k),    (4)

where ε_k is a positive stepsize (learning rate). Convergence analysis of GD shows that the sequence of iterates x^k converges linearly to the optimal argument for constant stepsizes satisfying ε_k = ε < 2/L (Nesterov (2004)). The fastest convergence rate is achieved by the stepsize 2/(µ + L), which leads to the linear convergence factor (κ − 1)/(κ + 1), i.e.,

    ‖x^k − x*‖ ≤ ((κ − 1)/(κ + 1))^k ‖x^0 − x*‖,    (5)

where κ := L/µ is the condition number of the global objective function. Although GD has a fast linear convergence rate, it is not computationally affordable in large-scale applications because of its high computational complexity. To comprehend this limitation, note that each iteration of GD requires n gradient evaluations, which is not affordable in large-scale applications with massive values of n.

Stochastic gradient descent (SGD) arises as a natural solution in large-scale settings. SGD modifies the update of GD by approximating the gradient of the global objective function f by the average of a small number of instantaneous gradients chosen uniformly at random from the set of n gradients. To be more precise, the update of SGD at step k is defined as

    x^{k+1} = x^k − (ε_k / b) Σ_{i ∈ S_b^k} ∇f_i(x^k),    (6)

where S_b^k has cardinality |S_b^k| = b << n and its components are chosen uniformly at random from the set {1, 2, ..., n}. Note that the stochastic gradient (1/b) Σ_{i ∈ S_b^k} ∇f_i(x^k) is an unbiased estimator of the gradient ∇f(x^k) = (1/n) Σ_{i=1}^n ∇f_i(x^k). Thus, the sequence of iterates generated by SGD converges to the optimal argument in expectation. It has been shown that the convergence rate of SGD is sublinear and can be characterized as

    E[‖x^k − x*‖^2] ≤ O(1/k),    (7)

when the sequence of diminishing stepsizes ε_k is of the order O(1/k). Note that the expectation in (7) is taken with respect to the indices of the functions chosen at random up to step k.
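As an aside, the updates in (4) and (6) are straightforward to prototype. The following is a minimal NumPy sketch (not part of the original paper); the toy quadratic instance, the function names gd_step and sgd_step, and the sampled constants are our own illustrative choices.

```python
import numpy as np

def gd_step(x, grads, eps):
    """One GD iteration, cf. (4): average all n instantaneous gradients."""
    return x - eps * np.mean([g(x) for g in grads], axis=0)

def sgd_step(x, grads, eps, b, rng):
    """One SGD iteration, cf. (6): average b gradients sampled uniformly at random."""
    idx = rng.integers(0, len(grads), size=b)
    return x - eps * np.mean([grads[i](x) for i in idx], axis=0)

# Toy instance: f_i(x) = 0.5 * a_i * ||x - c_i||^2, so grad f_i(x) = a_i (x - c_i).
rng = np.random.default_rng(0)
a = rng.uniform(1.0, 10.0, size=50)
c = rng.normal(size=(50, 3))
grads = [lambda x, ai=a[i], ci=c[i]: ai * (x - ci) for i in range(50)]
mu, L = a.min(), a.max()

x = np.zeros(3)
for _ in range(100):
    x = gd_step(x, grads, eps=2.0 / (mu + L))   # stepsize achieving the factor in (5)
```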

One may use a cyclic order instead of the stochastic selection of functions in SGD, which leads to the update of the incremental gradient method (IG) as in Blatt et al. (2007); Tseng and Yun (2014). Similar to the case of SGD, the sequence of iterates generated by the IG method converges to the optimal argument at a sublinear rate of O(1/k) when the stepsize is diminishing. SGD and IG reduce the computational complexity of GD by requiring only one (or a subset of) gradient evaluations per iteration; however, they both suffer from slow (sublinear) convergence rates.

The sublinear convergence rate of SGD has been improved recently. The first successful attempt at achieving a linear convergence rate with the computational complexity of SGD was the stochastic average gradient method (SAG), which updates only one gradient per iteration and uses the average of the most recent versions of all gradients as an approximation of the full gradient (Roux et al. (2012)). In particular, define y_i^k as the copy of the decision variable x corresponding to the last time the gradient of the function f_i was updated. At each iteration, an index i_k is chosen uniformly at random, the gradient of the corresponding function is evaluated at the current iterate, and the stored gradient ∇f_{i_k}(y_{i_k}^k) is replaced by ∇f_{i_k}(x^k). The variable x^k is then updated as

    x^{k+1} = x^k − (ε / n) Σ_{i=1}^n ∇f_i(y_i^k).    (8)

The sequence of iterates generated by SAG converges linearly to x* in expectation with respect to the choices of random indices, i.e.,

    E[‖x^k − x*‖] ≤ (1 − 1/(8κ))^k C_0,    (9)

where C_0 is a constant independent of n and κ. This result justifies the advantage of SAG over GD: the error ‖x^k − x*‖ decays by a factor of (1 − 1/(8κ))^n after a pass over the dataset, which improves on GD, whose error decays by the factor (κ − 1)/(κ + 1) per iteration. Similar advantages have been reported for other recent stochastic methods such as SAGA, SVRG, SDCA, and MISO.

The other alternative for solving the optimization problem in (1) is the incremental aggregated gradient (IAG) method, which is a middle ground between GD and IG. The IAG method requires one gradient evaluation per iteration, as in IG, while it approximates the gradient of the global objective function ∇f(x) by the average of the most recent gradients of all instantaneous functions (Blatt et al. (2007)), and it has a linear convergence rate, as in GD. In the IAG method, the functions are chosen in a cyclic order, and it takes n iterations to complete a pass over all the available functions. To introduce the update of IAG, recall the definition of y_i^k as the copy of the decision variable x corresponding to the last time the gradient of f_i was updated before step k. Then, the update of IAG is given by

    x^{k+1} = x^k − (ε / n) Σ_{i=1}^n ∇f_i(y_i^k).    (10)

Therefore, the update of IAG is identical to the update of SAG in (8); the only difference is the scheme by which the index i_k is chosen.
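A minimal sketch of the aggregated-gradient update in (8) and (10), with a stored table of gradients refreshed one entry per iteration, is given below. This is our own illustration (not the authors' code); the function name aggregated_gradient_run and the cyclic/random switch are assumptions made for the example.

```python
import numpy as np

def aggregated_gradient_run(grads, x0, eps, iters, cyclic=True, seed=0):
    """Run x^{k+1} = x^k - (eps/n) * sum_i grad f_i(y_i^k), cf. (8) and (10).

    A table stores one gradient per function; each iteration refreshes a single
    entry, chosen cyclically (IAG) or uniformly at random (SAG-style).
    """
    n, rng = len(grads), np.random.default_rng(seed)
    x = x0.copy()
    table = np.array([g(x0) for g in grads])   # gradients at the copies y_i^0 = x^0
    grad_sum = table.sum(axis=0)               # running sum avoids O(n) work per step
    for k in range(iters):
        i = k % n if cyclic else rng.integers(n)
        x = x - (eps / n) * grad_sum
        new_grad = grads[i](x)                 # one gradient evaluation per iteration
        grad_sum += new_grad - table[i]
        table[i] = new_grad
    return x
```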

The convergence results in Tseng and Yun (2014) establish global convergence and local linear convergence of IAG in a more general setting where each component function satisfies a local Lipschitzian error condition. More recently, a new convergence analysis of IAG was presented in Gürbüzbalaban et al. (2015), which proves global linear convergence of IAG for strongly convex functions with Lipschitz continuous gradients. In particular, it has been shown that the sequence of iterates x^k generated by IAG satisfies the inequality

    ‖x^k − x*‖ ≤ (1 − c / (n (κ + 1)^2))^k ‖x^0 − x*‖,    (11)

for a constant c > 0 independent of n and κ. Notice that the convergence rate of IAG is linear, and eventually the error of IAG will be smaller than the errors of SGD and IG, which diminish at a sublinear rate of O(1/k).

To compare the performances of GD and IAG, it is fair to compare one iteration of GD with n iterations of IAG. This is reasonable since one iteration of GD requires n gradient evaluations, while IAG uses n gradient evaluations over n iterations. Comparing the decrement factor of GD in (5) with that of IAG after n gradient evaluations in (11) does not guarantee the advantage of IAG relative to GD for all choices of the condition number κ and the number of functions n, since we could face the scenario that

    (κ − 1)/(κ + 1) < (1 − c / (n (κ + 1)^2))^n.    (12)

Note that the bound for GD in (5) is tight, and we can design a sequence that attains equality in (5). However, the bound in (11) is not necessarily tight, which could be the reason that the comparison in (12) does not justify the use of IAG instead of GD. Our goal in this paper is to come up with a first-order incremental method that has a guaranteed upper bound which is better than the one for GD in (5). We propose this algorithm in the following section.

3. Algorithm Definition

Recall that y_i^k is the copy of the decision variable x corresponding to the function f_i. The update of IAG in (10) can be interpreted as the solution of the optimization program

    x^{k+1} = argmin_{x ∈ R^p} { (1/n) Σ_{i=1}^n f_i(x^k) + (1/n) Σ_{i=1}^n ∇f_i(y_i^k)^T (x − x^k) + (1/(2ε)) ‖x − x^k‖^2 }.    (13)

This interpretation shows that in the update of IAG each instantaneous function f_i(x) is approximated by

    f_i(x) ≈ f_i(x^k) + ∇f_i(y_i^k)^T (x − x^k) + (1/(2ε)) ‖x − x^k‖^2.    (14)

Notice that the first two terms f_i(x^k) + ∇f_i(y_i^k)^T (x − x^k) correspond to a first-order approximation of the function f_i in which the gradient is evaluated at the delayed iterate y_i^k. The last term, (1/(2ε)) ‖x − x^k‖^2, is a proximal term added to the first-order approximation. This approximation is different from the classic approximation used in first-order methods, since the gradient is evaluated at a point y_i^k which is different from the iterate x^k used in the remaining terms. This observation suggests that the IAG algorithm performs well when the delayed variables y_i^k are close to the current iterate x^k, which is the case when the stepsize ε is very small or when the iterates are all close to the optimal solution.

We resolve this issue by introducing a different approach for approximating each component function f_i. In particular, we use the approximation

    f_i(x) ≈ f_i(y_i^k) + ∇f_i(y_i^k)^T (x − y_i^k) + (1/(2ε)) ‖x − y_i^k‖^2.    (15)

As we observe, the approximation in (15) is more consistent with classic first-order methods than the one used by IAG in (14), since the first-order approximation and the proximal term in (15) are evaluated at the same point y_i^k.

Indeed, the approximation in (15) implies that the global objective function f(x) can be approximated by

    f(x) ≈ (1/n) Σ_{i=1}^n f_i(y_i^k) + (1/n) Σ_{i=1}^n ∇f_i(y_i^k)^T (x − y_i^k) + (1/(2εn)) Σ_{i=1}^n ‖x − y_i^k‖^2.    (16)

We can approximate the optimal argument of the global objective function f by minimizing its approximation in (16). Thus, the updated iterate x^{k+1} can be computed as the minimizer of the approximated global objective function in (16), i.e.,

    x^{k+1} = argmin_{x ∈ R^p} { (1/n) Σ_{i=1}^n f_i(y_i^k) + (1/n) Σ_{i=1}^n ∇f_i(y_i^k)^T (x − y_i^k) + (1/(2εn)) Σ_{i=1}^n ‖x − y_i^k‖^2 }.    (17)

Considering the convex program in (17), we can derive a closed-form solution for the variable x^{k+1}, which is given by

    x^{k+1} = (1/n) Σ_{i=1}^n y_i^k − (ε/n) Σ_{i=1}^n ∇f_i(y_i^k).    (18)

We call the proposed method with the update in (18) the Double Incremental Aggregated Gradient method (DIAG). This appellation is justified considering that the update of DIAG requires the incremental aggregates of both variables and gradients, and only uses gradient (first-order) information.
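For completeness, the closed form in (18) follows directly from the first-order optimality condition of the quadratic program in (17); a short check (not spelled out in the original text):

```latex
% Setting the gradient of the objective in (17) with respect to x to zero:
\frac{1}{n}\sum_{i=1}^{n} \nabla f_i(y_i^k) \;+\; \frac{1}{\epsilon n}\sum_{i=1}^{n}\bigl(x^{k+1}-y_i^k\bigr) \;=\; 0
\;\;\Longrightarrow\;\;
x^{k+1} \;=\; \frac{1}{n}\sum_{i=1}^{n} y_i^k \;-\; \frac{\epsilon}{n}\sum_{i=1}^{n} \nabla f_i(y_i^k),
% which is exactly the DIAG update in (18).
```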

Notice that since we use a cyclic scheme, the set of variables {y_1^k, y_2^k, ..., y_n^k} is equal to the set {x^k, x^{k−1}, ..., x^{k−n+1}}. Hence, the update of the proposed cyclic incremental aggregated gradient method, for the cycle with order f_1, f_2, ..., f_n, can be written as

    x^{k+1} = (1/n) Σ_{i=1}^n x^{k−n+i} − (ε/n) Σ_{i=1}^n ∇f_{j_i}(x^{k−n+i}),   where j_i = (k + i) mod n.    (19)

The update in (19) shows that we use first-order approximations of the functions f_i around the last n iterates to evaluate the new iterate x^{k+1}. In other words, x^{k+1} is a function of the last n iterates {x^k, x^{k−1}, ..., x^{k−n+1}}. This observation is fundamental in the analysis of the proposed DIAG method, as we will see in Section 4.

Remark 1 One may consider the proposed DIAG method as a cyclic version of the stochastic MISO algorithm in Mairal (2015). This is a valid interpretation; however, the convergence analysis of MISO cannot guarantee that it outperforms GD for all choices of n and κ, while we establish theoretical results in Section 4 which guarantee the advantage of DIAG over GD for any n and κ. Moreover, the proposed DIAG method is designed based on the new interpretation in (15), which leads to a novel proof technique; see Lemma 2. This new analysis is different from the analysis of MISO in Mairal (2015) and provides stronger convergence results.

3.1 Implementation Details

A naive implementation of the update in (18) requires computing sums of n vectors per iteration, which is computationally costly. This unnecessary computational complexity can be avoided by tracking the sums over time. To be more precise, the first sum in (18), which is the sum of the variables, can be updated as

    Σ_{i=1}^n y_i^{k+1} = Σ_{i=1}^n y_i^k + x^{k+1} − y_{i_k}^k,    (20)

where i_k is the index of the function that is chosen at step k. Likewise, the sum of gradients in (18) can be updated as

    Σ_{i=1}^n ∇f_i(y_i^{k+1}) = Σ_{i=1}^n ∇f_i(y_i^k) + ∇f_{i_k}(x^{k+1}) − ∇f_{i_k}(y_{i_k}^k).    (21)

Algorithm 1 Double Incremental Aggregated Gradient method (DIAG)

1: Require: initial variables y_1^0 = ... = y_n^0 = x^0 and gradients ∇f_1(y_1^0), ..., ∇f_n(y_n^0)
2: for k = 0, 1, ... do
3:   Compute the function index i_k = mod(k, n) + 1.
4:   Compute x^{k+1} = (1/n) Σ_{i=1}^n y_i^k − (ε/n) Σ_{i=1}^n ∇f_i(y_i^k).
5:   Update the sum of variables: Σ_{i=1}^n y_i^{k+1} = Σ_{i=1}^n y_i^k + x^{k+1} − y_{i_k}^k.
6:   Compute ∇f_{i_k}(x^{k+1}) and update the sum of gradients: Σ_{i=1}^n ∇f_i(y_i^{k+1}) = Σ_{i=1}^n ∇f_i(y_i^k) + ∇f_{i_k}(x^{k+1}) − ∇f_{i_k}(y_{i_k}^k).
7:   Replace y_{i_k}^k and ∇f_{i_k}(y_{i_k}^k) in the table by x^{k+1} and ∇f_{i_k}(x^{k+1}), respectively. The other components remain unchanged, i.e., y_i^{k+1} = y_i^k and ∇f_i(y_i^{k+1}) = ∇f_i(y_i^k) for i ≠ i_k.
8: end for

The proposed double incremental aggregated gradient (DIAG) method is summarized in Algorithm 1. All copies of the vector x are initialized with the initial iterate, i.e., y_1^0 = ... = y_n^0 = x^0, and their corresponding gradients are stored in memory. At each iteration k, the updated variable x^{k+1} is computed in Step 4 using the update in (18). The sums of variables and gradients are updated in Steps 5 and 6, respectively, following the recursions in (20) and (21). In Step 7, the old variable and gradient of the updated function f_{i_k} are replaced with their new versions, while the other components of the variable and gradient tables remain unchanged. In Step 3, the index i_k is updated in a cyclic manner.
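The following is a compact Python sketch of Algorithm 1, given only for illustration; the function name diag_run and the data layout (one NumPy row per copy y_i) are our own choices, and the stepsize would be set to ε = 2/(µ + L) as in the analysis below.

```python
import numpy as np

def diag_run(grads, x0, eps, num_passes):
    """Minimal sketch of Algorithm 1 (DIAG): running sums of variables and gradients,
    one gradient evaluation per iteration, cyclic index selection."""
    n = len(grads)
    y = np.tile(x0, (n, 1))                      # table of copies y_i
    g = np.array([gr(x0) for gr in grads])       # table of gradients at the copies
    y_sum, g_sum = y.sum(axis=0), g.sum(axis=0)  # running sums used in (18)
    x = x0.copy()
    for k in range(num_passes * n):
        i = k % n                                # Step 3 (0-based index)
        x = y_sum / n - (eps / n) * g_sum        # Step 4, update (18)
        new_g = grads[i](x)                      # Step 6: one new gradient
        y_sum += x - y[i]                        # Step 5, recursion (20)
        g_sum += new_g - g[i]                    # Step 6, recursion (21)
        y[i], g[i] = x, new_g                    # Step 7: overwrite table entries
    return x
```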

4. Convergence Analysis

In this section, we study the convergence properties of the proposed double incremental aggregated gradient method and justify its advantages over the gradient descent method. The following lemma characterizes an upper bound on the optimality error ‖x^{k+1} − x*‖ in terms of the optimality errors of the last n iterations.

Lemma 2 Consider the proposed double incremental aggregated gradient (DIAG) method in (18). If the conditions in Assumption 1 hold, and the stepsize ε is chosen as ε = 2/(µ + L), then the sequence of iterates x^k generated by DIAG satisfies the inequality

    ‖x^{k+1} − x*‖ ≤ ((κ − 1)/(κ + 1)) (1/n) ( ‖x^k − x*‖ + ... + ‖x^{k−n+1} − x*‖ ),    (22)

where κ = L/µ is the condition number of the objective function.

Proof See Appendix A.

The result in Lemma 2 plays a significant role in the analysis of the proposed method: it shows that the error at step k + 1 is smaller than the average of the last n errors. In particular, the ratio (κ − 1)/(κ + 1) is strictly smaller than 1, which shows that the error at each iteration is strictly smaller than the average error of the last n steps. The cyclic scheme is critical for proving the result in (22), since it allows us to replace the sum Σ_i ‖y_i^k − x*‖ by the sum of the errors of the last n steps, ‖x^k − x*‖ + ... + ‖x^{k−n+1} − x*‖. Note that if we pick functions uniformly at random, as in MISO, it is not possible to write the expression in (22), even in expectation. We also cannot write an inequality similar to (22) for IAG, even though it uses a cyclic scheme. This comes from the fact that IAG only uses the average of the gradients, while DIAG uses both the variable and gradient averages. Thus, this special property distinguishes DIAG from IAG and MISO.

In the following proposition, we use the result in Lemma 2 to show that the sequence of errors ‖x^k − x*‖ is convergent.

Proposition 3 Consider the proposed double incremental aggregated gradient (DIAG) method in (18). If the conditions in Assumption 1 hold, and the stepsize ε is chosen as ε = 2/(µ + L), then the sequence of iterates x^k generated by the proposed DIAG method satisfies the inequality

    ‖x^k − x*‖ ≤ ρ^{⌊(k−1)/n⌋+1} ( 1 − (k − 1 − n⌊(k−1)/n⌋)(1 − ρ)/n ) ‖x^0 − x*‖,    (23)

where ρ := (κ − 1)/(κ + 1) and ⌊a⌋ denotes the floor of a.

Proof See Appendix B.

The first outcome of the result in Proposition 3 is the convergence of the sequence ‖x^k − x*‖ to zero as k approaches infinity. The second outcome, formalized in the following corollary, shows that the sequence of errors converges linearly after each pass over the dataset.

Corollary 4 If the conditions in Proposition 3 are satisfied, the error of the proposed DIAG method after m passes over the functions f_i, i.e., after k = mn iterations, is bounded above by

    ‖x^{mn} − x*‖ ≤ ρ^m ( 1 − (n − 1)(1 − ρ)/n ) ‖x^0 − x*‖.    (24)

Proof Set k = mn in (23) and the claim follows.

The result in Corollary 4 shows linear convergence of the subsequence of iterates sampled after each pass over the set of functions. Moreover, the result in Corollary 4 verifies the advantage of the DIAG method over the full gradient descent method. The result in (24) shows that the error of DIAG after m passes over the dataset is bounded above by ρ^m (1 − (1 − ρ)(n − 1)/n) ‖x^0 − x*‖, which is strictly smaller than the upper bound ρ^m ‖x^0 − x*‖ on the error of GD after m iterations. Therefore, the DIAG method outperforms GD for any choice of κ and n > 1. Notice that the upper bound ρ^m ‖x^0 − x*‖ on the error of GD after m iterations is tight, and there exists an optimization problem such that the error of GD satisfies the relation ‖x^m − x*‖ = ρ^m ‖x^0 − x*‖.
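As a quick numerical illustration of this comparison (our own sanity check, with arbitrary sample values of κ and n), the per-pass factor of the DIAG bound in (24) can be tabulated against the per-iteration GD factor ρ:

```python
import numpy as np

# Illustrative check of Corollary 4: the DIAG per-pass factor rho*(1-(n-1)*(1-rho)/n)
# versus the GD per-iteration factor rho = (kappa-1)/(kappa+1), for sample values.
for kappa, n in [(10, 100), (100, 100), (1000, 20)]:
    rho = (kappa - 1) / (kappa + 1)
    diag_factor = rho * (1 - (n - 1) * (1 - rho) / n)
    print(f"kappa={kappa:5d}, n={n:4d}:  GD factor={rho:.4f},  DIAG pass factor={diag_factor:.4f}")
```

In every case the DIAG per-pass factor is strictly below ρ, in line with the corollary.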

Although the result in Corollary 4 implies that the DIAG method is preferable to GD and shows linear convergence of a subsequence of iterates, it is not enough to prove linear convergence of the whole sequence of iterates generated by DIAG. To be more precise, the result in Corollary 4 shows that the subsequence of errors {‖x^{mn} − x*‖}_{m=0}^∞, associated with the variables at the end of each pass over the set of functions, is linearly convergent. However, we aim to show that the whole sequence {‖x^k − x*‖}_{k=0}^∞ is linearly convergent, i.e., that the sequence of DIAG iterates satisfies ‖x^k − x*‖ ≤ a γ^k ‖x^0 − x*‖ for a constant a > 0 and a coefficient 0 ≤ γ < 1. In the following theorem, we show that this condition is satisfied for the DIAG method.

Theorem 5 Consider the proposed double incremental aggregated gradient (DIAG) method in (18). If the conditions in Assumption 1 hold, and the stepsize ε is chosen as ε = 2/(µ + L), then the inequality

    ‖x^k − x*‖ ≤ a γ^k ‖x^0 − x*‖    (25)

holds if the constants a > 0 and 0 ≤ γ < 1 satisfy the conditions

    ρ ( 1 − (k − 1)(1 − ρ)/n ) ≤ a γ^k,   for k = 1, ..., n,    (26)

    γ^{n+1} − (1 + ρ/n) γ^n + ρ/n ≤ 0,    (27)

where the second condition guarantees the induction step for k > n.

Proof See Appendix C.

The result in Theorem 5 provides conditions on the constants a and γ such that the linear convergence inequality ‖x^k − x*‖ ≤ a γ^k ‖x^0 − x*‖ holds. However, it does not guarantee that the set of constants {a, γ} satisfying the required conditions in (26) and (27) is non-empty. In the following proposition we show that such constants exist.

Proposition 6 There exist constants a > 0 and 0 ≤ γ < 1 that satisfy the inequalities in (26) and (27). In other words, the set of feasible solutions for the system of inequalities in (26) and (27) is non-empty.

Proof See Appendix D.

The result in Proposition 6, in conjunction with the result in Theorem 5, guarantees linear convergence of the iterates generated by the DIAG method. Although there are different pairs {a, γ} that satisfy the conditions in (26) and (27) and lead to the linear convergence result in (25), we are interested in the pair {a, γ} with the smallest linear convergence factor γ, i.e., the pair that guarantees the fastest linear convergence. To find the smallest γ we should pick the smallest γ that satisfies the inequality γ^{n+1} − (1 + ρ/n) γ^n + ρ/n ≤ 0, and then choose the smallest constant a that satisfies the conditions in (26) for that γ. To do so, we first study the properties of the function h(γ) := γ^{n+1} − (1 + ρ/n) γ^n + ρ/n in the following lemma.

Lemma 7 Consider the function h(γ) := γ^{n+1} − (1 + ρ/n) γ^n + ρ/n for γ ∈ (0, 1). The function h has only one root γ_0 in the interval (0, 1). Moreover, γ_0 is the smallest choice of γ that satisfies the condition in (27).

Proof The derivative of the function h is given by

    dh/dγ = (n + 1) γ^n − (n + ρ) γ^{n−1}.    (28)

Therefore, the only critical point of the function h in the interval (0, 1) is γ* = (n + ρ)/(n + 1). The point γ* is a local minimum of h, since the second derivative of h is positive at γ*. Notice that the function value h(γ*) < 0 is negative. Moreover, we know that h(0) > 0 and h(1) = 0. This observation shows that the function h has a root γ_0 between 0 and γ*, and that this is the only root of h in the interval (0, 1). Thus, γ_0 is the smallest value of γ in the interval (0, 1) that satisfies the condition in (27).

The result in Lemma 7 shows that the unique root of the function h(γ) := γ^{n+1} − (1 + ρ/n) γ^n + ρ/n in the interval (0, 1) is the smallest γ that satisfies the condition in (27). We use this result to formalize the pair {a, γ} with the smallest choice of γ satisfying the conditions in (26) and (27).

Theorem 8 Consider the proposed double incremental aggregated gradient (DIAG) method in (18). Let the conditions in Assumption 1 hold, and set the stepsize as ε = 2/(µ + L). Then, the sequence of iterates generated by DIAG converges linearly,

    ‖x^k − x*‖ ≤ a_0 γ_0^k ‖x^0 − x*‖,    (29)

where γ_0 is the unique root of the equation

    γ^{n+1} − (1 + ρ/n) γ^n + ρ/n = 0    (30)

in the interval (0, 1), and a_0 is given by

    a_0 = max_{i ∈ {1,...,n}} ρ ( 1 − (i − 1)(1 − ρ)/n ) γ_0^{−i}.    (31)

Proof It follows from the results in Theorem 5 and Lemma 7.
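In practice γ_0 is easy to compute numerically. The sketch below (our own; the function name diag_rate_factor and the tolerance are illustrative assumptions) finds the root of (30) by bisection on (0, γ*), using the sign pattern established in Lemma 7:

```python
import numpy as np

def diag_rate_factor(n, kappa, tol=1e-12):
    """Illustrative computation of gamma_0, the unique root of (30) in (0, 1),
    by bisection on (0, gamma_star) where gamma_star = (n + rho)/(n + 1)."""
    rho = (kappa - 1.0) / (kappa + 1.0)
    h = lambda g: g ** (n + 1) - (1.0 + rho / n) * g ** n + rho / n
    lo, hi = 0.0, (n + rho) / (n + 1)   # h(0) > 0 and h(gamma_star) < 0 (Lemma 7)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# gamma_0^n should lie below the GD factor rho, in line with Corollary 4 and Theorem 10.
n, kappa = 100, 10
gamma0 = diag_rate_factor(n, kappa)
print(gamma0 ** n, (kappa - 1) / (kappa + 1))
```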

The result in Theorem 8 shows R-linear convergence of the DIAG iterates with the smallest guaranteed linear convergence factor γ_0. In the following section, we show that a sequence which upper bounds the errors ‖x^k − x*‖ converges Q-linearly to zero with the linear convergence constant in (30).

5. Worst-case asymptotic rate of DIAG

In this section, our aim is to provide an upper bound on the quantity ‖x^k − x*‖, the distance to optimality after k steps. It follows directly from (22) that the sequence d^k defined by the recursion

    d^{k+1} = (ρ/n) ( d^k + d^{k−1} + ... + d^{k−n+1} ),

where ρ = (κ − 1)/(κ + 1) and d^j := ‖x^j − x*‖ for j = 0, 1, ..., n − 1, provides an upper bound, i.e., we have ‖x^k − x*‖ ≤ d^k for all k ≥ 0. Defining the column vector

    D^k := [ d^k, d^{k−1}, ..., d^{k−n+1} ]^T,

note that this recursion can be rewritten as the matrix iteration D^{k+1} = M_ρ D^k, where

    M_ρ := [ ρ/n  ρ/n  ρ/n  ...  ρ/n
              1    0    0   ...   0
              0    1    0   ...   0
              ...
              0    0   ...   1    0  ].

We observe that M_ρ is a non-negative matrix whose eigenvalues determine the asymptotic growth rate of the sequence D^k, and hence of d^k. It is straightforward to check that the characteristic polynomial of M_ρ is

    T(λ) = λ^n − (ρ/n) λ^{n−1} − (ρ/n) λ^{n−2} − ... − ρ/n = ( λ^{n+1} − (1 + ρ/n) λ^n + ρ/n ) / (λ − 1),

whose roots are the eigenvalues of M_ρ. In the remainder of this section, we infer information about the eigenvalues of M_ρ using Perron-Frobenius (PF) theory. This theory is well developed for positive matrices, where all the entries are strictly positive, but M_ρ has zero entries and is therefore not positive. Nevertheless, PF theory has been successfully extended to certain non-negative matrices called irreducible matrices. A square matrix A is called irreducible if for every i and j there exists an r such that A^r(i, j) > 0. In the next lemma, we prove that the matrix M_ρ is irreducible, which justifies our use of the PF theory developed for irreducible matrices.

Lemma 9 The matrix M_ρ is irreducible for any ρ > 0.

Proof By the definition of irreducibility, we need to show that for every i and j there exists an r such that M_ρ^r(i, j) > 0. We will show that we can choose r = n for all i and j. Let e_1, e_2, ..., e_n be the standard basis of R^n, and consider the initialization

    D^n = [ d^n, d^{n−1}, ..., d^1 ]^T = e_j,   so that   M_ρ^n e_j = M_ρ^n D^n = D^{2n} = [ d^{2n}, d^{2n−1}, ..., d^{n+1} ]^T.

Using the definition of M_ρ, it is easy to check that such an initialization of D^n leads to d^{n+1} = ρ/n > 0, d^{n+2} > 0, ..., d^{2n} > 0. Therefore, for every i and j, we have

    M_ρ^n(i, j) = e_i^T M_ρ^n e_j = d^{2n−i+1} > 0,

which completes the proof.

Theorem 10 Let ρ ∈ (0, 1) and let λ^(ρ) be the spectral radius of M_ρ. Then,
  i) λ^(ρ) is the largest real root of the characteristic polynomial T(λ). Furthermore, it is a simple root.
  ii) We have the limit lim_{k→∞} d^{k+1}/d^k = λ^(ρ).
  iii) We have the bounds ρ ≤ λ^(ρ) < ρ^{1/n}.

Proof
i) Part i) is a direct consequence of the Perron-Frobenius theorem for irreducible non-negative matrices.

ii) By (Horn and Johnson, 1991, Theorem 8.5.1), we also have

    lim_{k→∞} M_ρ^k / (λ^(ρ))^k = u v^T,

where u and v are the right and left eigenvectors of M_ρ corresponding to the eigenvalue λ^(ρ), normalized to satisfy v^T u = 1. Note also that d^k = e_1^T D^k = e_1^T M_ρ^{k−n+1} D^{n−1}. Therefore,

    lim_{k→∞} d^{k+1}/d^k = lim_{k→∞} ( e_1^T M_ρ^{k−n+2} D^{n−1} ) / ( e_1^T M_ρ^{k−n+1} D^{n−1} )    (32)
                          = lim_{k→∞} λ^(ρ) ( e_1^T M_ρ^{k−n+2} D^{n−1} / (λ^(ρ))^{k−n+2} ) / ( e_1^T M_ρ^{k−n+1} D^{n−1} / (λ^(ρ))^{k−n+1} )    (33)
                          = λ^(ρ) ( e_1^T u v^T D^{n−1} ) / ( e_1^T u v^T D^{n−1} ) = λ^(ρ).    (34)

iii) The row-sum bounds for irreducible non-negative matrices in Horn and Johnson (1991) directly imply that λ^(ρ) ≥ ρ, which proves the lower bound on λ^(ρ). To get the upper bound, let 1 = (1, ..., 1)^T be the vector of ones. We will show that

    M_ρ^{2n} 1 < ρ^2 1,    (35)

where < denotes the componentwise inequality for vectors. By the corresponding corollary in Horn and Johnson (1991), this implies (λ^(ρ))^{2n} < ρ^2, which is equivalent to the desired upper bound. It is a straightforward computation to show that if we set D^n = 1, then after a simple induction argument we obtain d^{n+1} = ρ and d^{n+2} < ρ, d^{n+3} < ρ, ..., d^{2n} < ρ, i.e.,

    D^{2n} = [ d^{2n}, d^{2n−1}, ..., d^{n+1} ]^T = M_ρ^n D^n = M_ρ^n 1 = ρ v

for some vector v = (v_1, v_2, ..., v_n)^T, where v_i < 1 if i < n and v_n = 1. Similarly,

    D^{3n} = [ d^{3n}, d^{3n−1}, ..., d^{2n+1} ]^T = M_ρ^{2n} D^n = M_ρ^n ( M_ρ^n 1 ) = ρ M_ρ^n v,

and a straightforward computation shows that M_ρ^n v < ρ 1. Combining this inequality with the previous equation proves (35) and concludes the proof.
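The bounds in part iii) are easy to confirm numerically. The snippet below (our own illustration; the function name worst_case_matrix and the sample values of n and κ are assumptions) builds M_ρ and compares its spectral radius with ρ and ρ^{1/n}:

```python
import numpy as np

def worst_case_matrix(n, rho):
    """Build the n-by-n matrix M_rho from Section 5: first row rho/n, shifted identity below."""
    M = np.zeros((n, n))
    M[0, :] = rho / n
    M[1:, :-1] = np.eye(n - 1)
    return M

# Numerically check Theorem 10(iii): rho <= spectral_radius(M_rho) < rho**(1/n).
n, kappa = 50, 10
rho = (kappa - 1) / (kappa + 1)
lam = np.max(np.abs(np.linalg.eigvals(worst_case_matrix(n, rho))))
print(rho, lam, rho ** (1 / n))   # expected ordering: rho <= lam < rho**(1/n)
```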

6. Acceleration

Lin et al. (2015) describe a generic method, called the Catalyst algorithm, for accelerating a linearly convergent algorithm. This method is directly applicable to Algorithm 1; it only requires tuning a parameter κ_c which adjusts the resulting convergence rate. Lemma 2 shows that Algorithm 1 is linearly convergent, and part iii) of Theorem 10 shows that the resulting rate is smaller (better) than that of the standard gradient descent method. Lin et al. propose to choose κ_c = L − µ to accelerate the gradient descent method; the resulting rate for the accelerated method, when L/µ is large, is

    O( n √(L/µ) log(1/ε) ).    (36)

As Algorithm 1 is faster than the gradient descent method, its accelerated version obtained by choosing the same tuning parameter κ_c = L − µ will have a complexity that is no worse than (36).

Figure 1: Convergence paths of GD, IAG, and DIAG for the quadratic program with n = 200 and κ = 10 (normalized error ‖x^k − x*‖ / ‖x^0 − x*‖ versus number of gradient evaluations).

7. Numerical experiments

In this section, we compare the performance of GD, IAG, and DIAG. First, we apply these methods to the quadratic program

    min_{x ∈ R^p} f(x) := (1/n) Σ_{i=1}^n ( (1/2) x^T A_i x + b_i^T x ),    (37)

where A_i ∈ R^{p×p} is a diagonal matrix and b_i ∈ R^p is a random vector chosen from the box [0, 1]^p. To control the problem condition number, the first p/2 diagonal elements of A_i are chosen uniformly at random from the set {1, 10^{−1}, ..., 10^{−η/2}} and the last p/2 elements from the set {1, 10^1, ..., 10^{η/2}}. This selection results in the sum (1/n) Σ_i A_i having eigenvalues in the range [10^{−η/2}, 10^{η/2}]. In our simulations, we fix the variable dimension as p = 20 and the number of functions as n = 200. Moreover, the stepsizes of GD and DIAG are set to their best theoretical values, ε_GD = 2/(µ + L) and ε_DIAG = 2/(µ + L), respectively. Note that the stepsize suggested in Gürbüzbalaban et al. (2015) for IAG is ε_IAG = 0.32/(nL(L + µ)); however, this choice of stepsize is very slow in practice. Thus, we use the stepsize ε_IAG = 2/(nL), which performs better than the one suggested in Gürbüzbalaban et al. (2015). To have a fair comparison, we compare the algorithms in terms of the total number of gradient evaluations. Note that comparing these methods in terms of the total number of iterations would not be fair, since each iteration of GD requires n gradient evaluations, while IAG and DIAG only require one gradient computation per iteration.
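A sketch of a generator for such synthetic instances is given below, purely as an illustration of the setup just described: the function name make_quadratic_problem is our own, and we sample the diagonal exponents from a continuous range rather than the discrete grid used in the paper, which is a simplifying assumption.

```python
import numpy as np

def make_quadratic_problem(n=200, p=20, eta=1, seed=0):
    """Illustrative generator for the quadratic program (37): diagonal A_i with a
    controlled eigenvalue spread and b_i drawn from the box [0, 1]^p."""
    rng = np.random.default_rng(seed)
    small = 10.0 ** (-rng.uniform(0, eta / 2, size=(n, p // 2)))      # spectrum below 1
    large = 10.0 ** (rng.uniform(0, eta / 2, size=(n, p - p // 2)))   # spectrum above 1
    A_diag = np.hstack([small, large])            # diagonal of each A_i
    b = rng.uniform(0.0, 1.0, size=(n, p))
    avg = A_diag.mean(axis=0)                     # diagonal of (1/n) sum_i A_i
    mu, L = avg.min(), avg.max()                  # strong convexity / smoothness of f
    grads = [lambda x, d=A_diag[i], bi=b[i]: d * x + bi for i in range(n)]
    return grads, mu, L

grads, mu, L = make_quadratic_problem()
print("condition number kappa =", L / mu)
```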

We first consider the case η = 1 and use a realization with condition number κ = 10, i.e., a relatively small condition number. Fig. 1 shows the convergence paths of the normalized error ‖x^k − x*‖ / ‖x^0 − x*‖ for IAG, DIAG, and GD when n = 200 and κ = 10. As we observe, IAG performs better than GD, while the best performance belongs to DIAG.

In the second experiment, we increase the problem condition number by setting η = 2 and using a realization with condition number κ = 117. Fig. 2 illustrates the performance of these methods for n = 200 and κ = 117. We observe that the convergence path of IAG is almost identical to the one for GD. In this experiment, we also observe that DIAG has the best performance among the three methods. Note that the relative performance of IAG and GD changes for problems with different condition numbers. On the other hand, the relative convergence paths of DIAG and GD do not change across settings, and DIAG consistently outperforms GD.

Figure 2: Convergence paths of GD, IAG, and DIAG for the quadratic program with n = 200 and κ = 117 (normalized error ‖x^k − x*‖ / ‖x^0 − x*‖ versus number of gradient evaluations).

We also compare the performance of GD, IAG, and DIAG on a binary classification problem. Consider the logistic regression problem where samples {u_i} and their corresponding labels {l_i} are given. The dimension of the samples is p, i.e., u_i ∈ R^p, and the labels l_i are either 1 or −1. The goal is to find the optimal classifier x ∈ R^p that minimizes the regularized logistic loss, given by

    min_{x ∈ R^p} f(x) := (1/n) Σ_{i=1}^n log(1 + exp(−l_i x^T u_i)) + (λ/2) ‖x‖^2.    (38)

The objective function f in (38) is strongly convex with constant µ = λ, and its gradients are Lipschitz continuous with constant L = λ + ζ/4, where ζ = max_i u_i^T u_i. Note that the functions f_i in this case can be defined as f_i(x) = log(1 + exp(−l_i x^T u_i)) + (λ/2) ‖x‖^2. It is easy to verify that the instantaneous functions f_i are also strongly convex with constant µ = λ, and their gradients are Lipschitz continuous with constant L = λ + ζ/4.

We apply GD, IAG, and DIAG to the logistic regression problem in (38) on the MNIST dataset (LeCun et al. (1998)). We assign label l_i = 1 to the samples corresponding to digit 8 and label l_i = −1 to those associated with digit 0. We obtain a total of n = 11,774 training examples, each of dimension p = 784. The objective function error f(x^k) − f(x*) of the GD, IAG, and DIAG methods versus the number of passes over the dataset is shown in Fig. 3 for the stepsizes ε_GD = 2/(µ + L), ε_IAG = 2/(nL), and ε_DIAG = 2/(µ + L). Moreover, we report the convergence paths of these algorithms for their best choices of stepsize in practice. The results verify the advantage of the proposed DIAG method relative to IAG and GD in both scenarios.
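The per-sample gradients and the constants µ and L quoted above are straightforward to set up in code. The sketch below is our own illustration: the function name logistic_gradients is an assumption, and the randomly generated matrix U stands in for the MNIST samples used in the paper.

```python
import numpy as np

def logistic_gradients(U, labels, lam):
    """Per-sample gradients of the regularized logistic loss in (38):
    f_i(x) = log(1 + exp(-l_i x^T u_i)) + (lam/2)||x||^2, so
    grad f_i(x) = -l_i u_i / (1 + exp(l_i x^T u_i)) + lam x."""
    def grad_i(i):
        u, l = U[i], labels[i]
        return lambda x: -l * u / (1.0 + np.exp(l * np.dot(u, x))) + lam * x
    grads = [grad_i(i) for i in range(U.shape[0])]
    zeta = np.max(np.sum(U * U, axis=1))          # zeta = max_i u_i^T u_i
    mu, L = lam, lam + zeta / 4.0                 # constants quoted in the text
    return grads, mu, L

# Synthetic stand-in for the MNIST digits used in the paper.
rng = np.random.default_rng(0)
U = rng.normal(size=(500, 20))
labels = np.sign(rng.normal(size=500))
grads, mu, L = logistic_gradients(U, labels, lam=1e-2)
```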

Figure 3: Convergence paths of GD, IAG, and DIAG for the binary classification application (objective function error f(x^k) − f(x*) versus number of passes over the dataset, for the theoretical stepsizes ε_GD = 2/(µ + L), ε_IAG = 2/(nL), ε_DIAG = 2/(µ + L), and for the best stepsizes in practice).

Appendix A. Proof of Lemma 2

Consider the update in (18). Subtract the optimal argument x* from both sides of the equality to obtain

    x^{k+1} − x* = (1/n) Σ_{i=1}^n ( y_i^k − x* ) − (ε/n) Σ_{i=1}^n ∇f_i(y_i^k).    (39)

Note that the gradient of the global objective function at the optimal point is null, i.e., (1/n) Σ_{i=1}^n ∇f_i(x*) = 0. This observation, in conjunction with the expression in (39), leads to

    x^{k+1} − x* = (1/n) Σ_{i=1}^n ( y_i^k − x* ) − (ε/n) Σ_{i=1}^n ( ∇f_i(y_i^k) − ∇f_i(x*) )
                 = (1/n) Σ_{i=1}^n [ y_i^k − x* − ε ( ∇f_i(y_i^k) − ∇f_i(x*) ) ].    (40)

Computing the norm of both sides of (40) and using the triangle inequality, we obtain

    ‖x^{k+1} − x*‖ ≤ (1/n) Σ_{i=1}^n ‖ y_i^k − x* − ε ( ∇f_i(y_i^k) − ∇f_i(x*) ) ‖.    (41)

Now we proceed to derive an upper bound for each summand in (41). According to the result in Ryu and Boyd (2016, page 13), if the functions are µ-strongly convex and their gradients are L-Lipschitz continuous, then

    ‖ y_i^k − x* − ε ( ∇f_i(y_i^k) − ∇f_i(x*) ) ‖ ≤ max{ |1 − εµ|, |1 − εL| } ‖ y_i^k − x* ‖.    (42)

By setting the stepsize in (42) as ε = 2/(µ + L), we can write

    ‖ y_i^k − x* − ε ( ∇f_i(y_i^k) − ∇f_i(x*) ) ‖ ≤ ((κ − 1)/(κ + 1)) ‖ y_i^k − x* ‖,    (43)

where κ = L/µ is the condition number of the functions f_i. By replacing the summands on the right hand side of (41) with their upper bounds ((κ − 1)/(κ + 1)) ‖ y_i^k − x* ‖, as shown in (43), the residual ‖x^{k+1} − x*‖ is bounded above as

    ‖x^{k+1} − x*‖ ≤ ((κ − 1)/(κ + 1)) (1/n) Σ_{i=1}^n ‖ y_i^k − x* ‖.    (44)

Note that in the DIAG method we use a cyclic scheme to update the variables. Thus, the set of variables {y_1^k, ..., y_n^k} is identical to the set of the last n iterates before x^{k+1}, which is {x^k, ..., x^{k−n+1}}. We can therefore replace the sum Σ_{i=1}^n ‖y_i^k − x*‖ in (44) by the sum Σ_{i=1}^n ‖x^{k−i+1} − x*‖, which yields the claim in (22).
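As a quick check of the contraction factor used in (43) (a computation implicit in the proof above rather than stated in it):

```latex
% With \epsilon = 2/(\mu+L):
1-\epsilon\mu = \frac{L-\mu}{L+\mu}, \qquad 1-\epsilon L = -\frac{L-\mu}{L+\mu},
\qquad\text{so}\qquad
\max\{|1-\epsilon\mu|,\,|1-\epsilon L|\} = \frac{L-\mu}{L+\mu} = \frac{\kappa-1}{\kappa+1},
% which is the factor appearing in (43).
```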

Appendix B. Proof of Proposition 3

Consider the definition of the constant ρ := (κ − 1)/(κ + 1), where κ = L/µ is the condition number of the objective function. If all the copies y_i are initialized at x^0, the result in Lemma 2 implies that

    ‖x^1 − x*‖ ≤ ρ ‖x^0 − x*‖.    (45)

We can use the same inequality for the second iterate to obtain

    ‖x^2 − x*‖ ≤ (ρ/n) ‖x^1 − x*‖ + (ρ(n − 1)/n) ‖x^0 − x*‖
              ≤ (ρ^2/n) ‖x^0 − x*‖ + (ρ(n − 1)/n) ‖x^0 − x*‖
              = ρ ( 1 − (1 − ρ)/n ) ‖x^0 − x*‖,    (46)

where the second inequality follows by replacing ‖x^1 − x*‖ by its upper bound in (45), and the equality holds by regrouping the terms. We repeat the same process for the third residual ‖x^3 − x*‖ to obtain

    ‖x^3 − x*‖ ≤ (ρ/n) ‖x^2 − x*‖ + (ρ/n) ‖x^1 − x*‖ + (ρ(n − 2)/n) ‖x^0 − x*‖
              ≤ ρ ( 1 − 2(1 − ρ)/n − ρ(1 − ρ)/n^2 ) ‖x^0 − x*‖,    (47)

where in the second inequality we use the bounds in (45) and (46). Since the term −ρ(1 − ρ)/n^2 is negative, we can drop it and show that the residual ‖x^3 − x*‖ is upper bounded by

    ‖x^3 − x*‖ ≤ ρ ( 1 − 2(1 − ρ)/n ) ‖x^0 − x*‖.    (48)

Following the same logic, we can show that for the first n residuals {‖x^k − x*‖}_{k=1}^n the following inequality holds:

    ‖x^k − x*‖ ≤ ρ ( 1 − (k − 1)(1 − ρ)/n ) ‖x^0 − x*‖,   for k = 1, ..., n.    (49)

Now we proceed to prove the claim in (23) by induction. Assume that the following condition holds for some j ≥ 0:

    ‖x^k − x*‖ ≤ ρ^{j+1} ( 1 − (k − jn − 1)(1 − ρ)/n ) ‖x^0 − x*‖,   for k = jn + 1, ..., jn + n.    (50)

Then, our goal is to show that the same inequalities hold for j + 1, i.e.,

    ‖x^k − x*‖ ≤ ρ^{j+2} ( 1 − (k − (j + 1)n − 1)(1 − ρ)/n ) ‖x^0 − x*‖,   for k = (j + 1)n + 1, ..., (j + 1)n + n.    (51)

To do so, we start with the time index k = (j + 1)n + 1. According to the result in Lemma 2, we can bound the residual ‖x^{(j+1)n+1} − x*‖ by

    ‖x^{(j+1)n+1} − x*‖ ≤ (ρ/n) ‖x^{jn+1} − x*‖ + ... + (ρ/n) ‖x^{jn+n} − x*‖.    (52)

Considering the inequalities in (50), the terms ‖x^k − x*‖ for k = jn + 1, ..., jn + n are bounded above by ρ^{j+1} ‖x^0 − x*‖. Replacing the terms in (52) by this upper bound yields

    ‖x^{(j+1)n+1} − x*‖ ≤ ρ^{j+2} ‖x^0 − x*‖.    (53)

We use the result in Lemma 2 for k + 1 = (j + 1)n + 2 this time to obtain

    ‖x^{(j+1)n+2} − x*‖ ≤ (ρ/n) ‖x^{(j+1)n+1} − x*‖ + (ρ/n) ‖x^{jn+2} − x*‖ + ... + (ρ/n) ‖x^{jn+n} − x*‖.    (54)

By replacing the first summand on the right hand side of (54) by its upper bound in (53) and the remaining summands by the upper bound ρ^{j+1} ‖x^0 − x*‖, we can write

    ‖x^{(j+1)n+2} − x*‖ ≤ (ρ^{j+3}/n) ‖x^0 − x*‖ + (ρ^{j+2}(n − 1)/n) ‖x^0 − x*‖ = ρ^{j+2} ( 1 − (1 − ρ)/n ) ‖x^0 − x*‖.    (55)

Following the same steps for the next residual ‖x^{(j+1)n+3} − x*‖ yields

    ‖x^{(j+1)n+3} − x*‖ ≤ (ρ/n) ‖x^{(j+1)n+2} − x*‖ + (ρ/n) ‖x^{(j+1)n+1} − x*‖ + (ρ/n) ‖x^{jn+3} − x*‖ + ... + (ρ/n) ‖x^{jn+n} − x*‖
                       ≤ (2ρ^{j+3}/n) ‖x^0 − x*‖ + (ρ^{j+2}(n − 2)/n) ‖x^0 − x*‖
                       = ρ^{j+2} ( 1 − 2(1 − ρ)/n ) ‖x^0 − x*‖.    (56)

Note that in the second inequality we have replaced ‖x^{(j+1)n+2} − x*‖ by the upper bound ρ^{j+2} ‖x^0 − x*‖, which is looser than the upper bound in (55). By repeating the same scheme we can show that

    ‖x^{(j+1)n+u} − x*‖ ≤ ρ^{j+2} ( 1 − (u − 1)(1 − ρ)/n ) ‖x^0 − x*‖,    (57)

for u = 1, ..., n. Note that this result is identical to the claim in (51). Thus, if the set of inequalities in (50) holds for j, then the corresponding set of inequalities holds for j + 1. Therefore, the proof is complete by induction and the claim in (23) follows.

Appendix C. Proof of Theorem 5

According to the proof in Appendix B, for k = 1, ..., n the following inequality holds:

    ‖x^k − x*‖ ≤ ρ ( 1 − (k − 1)(1 − ρ)/n ) ‖x^0 − x*‖.    (58)

Combining this result with the condition on the constant a in (26), we obtain

    ‖x^k − x*‖ ≤ a γ^k ‖x^0 − x*‖,   for k = 1, ..., n.    (59)

Thus, the inequality in (25) holds for steps k = 1, ..., n.

Now we proceed to show that the claim in (25) also holds for k > n. To do so, we use an induction argument. Assume that we aim to show that the inequality in (25) holds for k = j, while it holds for the previous n iterates k = j − 1, ..., j − n. According to the result in Lemma 2, we can write

    ‖x^j − x*‖ ≤ (ρ/n) ( ‖x^{j−1} − x*‖ + ... + ‖x^{j−n} − x*‖ ).    (60)

Based on the induction hypothesis, the result in (25) holds for steps k = j − 1, ..., j − n. Thus, we can replace the terms on the right hand side of (60) by the upper bounds from (25). This substitution implies

    ‖x^j − x*‖ ≤ (ρa/n) ( γ^{j−1} + ... + γ^{j−n} ) ‖x^0 − x*‖ = ( ρ a γ^{j−n} (1 − γ^n) / (n(1 − γ)) ) ‖x^0 − x*‖.    (61)

Rearranging the terms in (27) shows that ρ(1 − γ^n)/(n(1 − γ)) is bounded above by γ^n. This is true since

    γ^{n+1} − (1 + ρ/n) γ^n + ρ/n ≤ 0   ⟺   ρ(1 − γ^n) − n γ^n (1 − γ) ≤ 0   ⟺   ρ(1 − γ^n)/(n(1 − γ)) ≤ γ^n.    (62)

Therefore, we can replace the term ρ(1 − γ^n)/(n(1 − γ)) in (61) by its upper bound γ^n to obtain

    ‖x^j − x*‖ ≤ a γ^j ‖x^0 − x*‖.    (63)

The result in (63) completes the proof. Thus, by induction the claim in (25) holds for all k ≥ 1 whenever the conditions in (26) and (27) are satisfied.

Appendix D. Proof of Proposition 6

To prove the claim in Proposition 6 we first derive the following lemma.

Lemma 11 For all n ≥ 1 and 0 ≤ φ ≤ 1 we have

    ( 1 − φ/n )^n ≤ ( 1 − φ/(n + 1) )^{n+1}.    (64)

Proof Consider the function h(x) = (1 − φ/x)^x for x > 1. The natural logarithm of h(x) is given by ln h(x) = x ln(1 − φ/x). Computing the derivative of both sides with respect to x we obtain

    (1/h(x)) dh/dx = ln(1 − φ/x) + (φ/x) / (1 − φ/x).    (65)

Multiplying both sides by h(x) and replacing h(x) by the expression (1 − φ/x)^x, we obtain that the derivative of h is given by

    dh/dx = (1 − φ/x)^x [ ln(1 − φ/x) + (φ/x) / (1 − φ/x) ].    (66)

Note that the sum ln(1 − u) + u/(1 − u) is always positive for 0 < u < 1. By setting u := φ/x, we conclude that the term on the right hand side of (66) is positive for x > 1. Therefore, the derivative dh/dx is always positive for x > 1. Thus, the function h(x) is increasing for x > 1 and we can write

    ( 1 − φ/n )^n ≤ ( 1 − φ/(n + 1) )^{n+1},    (67)

for n > 1.

It remains to show that the claim is also valid for n = 1, which is equivalent to the inequality

    1 − φ ≤ ( 1 − φ/2 )^2.    (68)

This inequality is trivial, and, therefore, the claim in (64) holds for all n ≥ 1.

Now we proceed to prove the claim in Proposition 6 using the result in Lemma 11. To prove that the feasible set of the condition in (27) is non-empty, we show that γ = ρ^{1/n} satisfies the inequality in (27). In other words,

    ρ^{(n+1)/n} − (1 + ρ/n) ρ + ρ/n ≤ 0.    (69)

Dividing both sides of (69) by ρ, regrouping the terms, and raising both sides to the power n, we obtain the inequality

    ρ ≤ ( 1 − (1 − ρ)/n )^n,    (70)

which is equivalent to (69). In other words, the inequality in (70) is a necessary and sufficient condition for the condition in (69). Recall the result in Lemma 11. By setting φ = 1 − ρ we obtain that

    ρ = ( 1 − (1 − ρ)/1 )^1 ≤ ( 1 − (1 − ρ)/2 )^2 ≤ ... ≤ ( 1 − (1 − ρ)/n )^n,    (71)

for n ≥ 1. Thus, the inequality in (70) holds and, consequently, the inequality in (69) is valid. Therefore, γ = ρ^{1/n} satisfies the inequality in (27). Then, we can define a as the smallest constant that satisfies (26) for the choice γ = ρ^{1/n}, which is given by

    a = max_{k=1,...,n} ρ ( 1 − (k − 1)(1 − ρ)/n ) ρ^{−k/n}.    (72)

Therefore, γ = ρ^{1/n} and the constant a in (72) satisfy the conditions in (26) and (27), and the claim in Proposition 6 follows.

References

Doron Blatt, Alfred O. Hero, and Hillel Gauchman. A convergent incremental gradient method with a constant step size. SIAM Journal on Optimization, 18(1):29–51, 2007.

Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT 2010. Springer, 2010.

Léon Bottou and Yann Le Cun. On-line learning for very large data sets. Applied Stochastic Models in Business and Industry, 21(2), 2005.

Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems, 2014.

Mert Gürbüzbalaban, Asuman Ozdaglar, and Pablo Parrilo. On the convergence rate of incremental aggregated gradient algorithms. arXiv preprint, 2015.

Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.

Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, 2013.

Jakub Konečný and Peter Richtárik. Semi-stochastic gradient descent methods. arXiv preprint, 2013.

Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits, 1998.

Hongzhou Lin, Julien Mairal, and Zaid Harchaoui. A universal catalyst for first-order optimization. In Advances in Neural Information Processing Systems, 2015.

Julien Mairal. Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM Journal on Optimization, 25(2), 2015.

Yurii Nesterov. Introductory Lectures on Convex Optimization, volume 87. Springer Science & Business Media, 2004.

Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 1951.

Nicolas L. Roux, Mark Schmidt, and Francis R. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In Advances in Neural Information Processing Systems, 2012.

Ernest K. Ryu and Stephen Boyd. Primer on monotone operator methods. Appl. Comput. Math., 15(1):3–43, 2016.

Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss. The Journal of Machine Learning Research, 14(1), 2013.

Shai Shalev-Shwartz and Tong Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. Mathematical Programming, 2016.

Paul Tseng and Sangwoo Yun. Incrementally updated gradient methods for constrained and regularized optimization. Journal of Optimization Theory and Applications, 160(3), 2014.

Lin Xiao and Tong Zhang. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4), 2014.

Lijun Zhang, Mehrdad Mahdavi, and Rong Jin. Linear convergence with condition number independent access of full gradients. In Advances in Neural Information Processing Systems, 2013.


More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Math 312 Lecture Notes One Dimensional Maps

Math 312 Lecture Notes One Dimensional Maps Math 312 Lecture Notes Oe Dimesioal Maps Warre Weckesser Departmet of Mathematics Colgate Uiversity 21-23 February 25 A Example We begi with the simplest model of populatio growth. Suppose, for example,

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

A Risk Comparison of Ordinary Least Squares vs Ridge Regression

A Risk Comparison of Ordinary Least Squares vs Ridge Regression Joural of Machie Learig Research 14 (2013) 1505-1511 Submitted 5/12; Revised 3/13; Published 6/13 A Risk Compariso of Ordiary Least Squares vs Ridge Regressio Paramveer S. Dhillo Departmet of Computer

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Axioms of Measure Theory

Axioms of Measure Theory MATH 532 Axioms of Measure Theory Dr. Neal, WKU I. The Space Throughout the course, we shall let X deote a geeric o-empty set. I geeral, we shall ot assume that ay algebraic structure exists o X so that

More information

Complex Analysis Spring 2001 Homework I Solution

Complex Analysis Spring 2001 Homework I Solution Complex Aalysis Sprig 2001 Homework I Solutio 1. Coway, Chapter 1, sectio 3, problem 3. Describe the set of poits satisfyig the equatio z a z + a = 2c, where c > 0 ad a R. To begi, we see from the triagle

More information

Are adaptive Mann iterations really adaptive?

Are adaptive Mann iterations really adaptive? MATHEMATICAL COMMUNICATIONS 399 Math. Commu., Vol. 4, No. 2, pp. 399-42 (2009) Are adaptive Ma iteratios really adaptive? Kamil S. Kazimierski, Departmet of Mathematics ad Computer Sciece, Uiversity of

More information

Lecture 15: Learning Theory: Concentration Inequalities

Lecture 15: Learning Theory: Concentration Inequalities STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that

More information

Fastest mixing Markov chain on a path

Fastest mixing Markov chain on a path Fastest mixig Markov chai o a path Stephe Boyd Persi Diacois Ju Su Li Xiao Revised July 2004 Abstract We ider the problem of assigig trasitio probabilities to the edges of a path, so the resultig Markov

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

The log-behavior of n p(n) and n p(n)/n

The log-behavior of n p(n) and n p(n)/n Ramauja J. 44 017, 81-99 The log-behavior of p ad p/ William Y.C. Che 1 ad Ke Y. Zheg 1 Ceter for Applied Mathematics Tiaji Uiversity Tiaji 0007, P. R. Chia Ceter for Combiatorics, LPMC Nakai Uivercity

More information

Math 113 Exam 4 Practice

Math 113 Exam 4 Practice Math Exam 4 Practice Exam 4 will cover.-.. This sheet has three sectios. The first sectio will remid you about techiques ad formulas that you should kow. The secod gives a umber of practice questios for

More information

Sieve Estimators: Consistency and Rates of Convergence

Sieve Estimators: Consistency and Rates of Convergence EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes

More information

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients. Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,

More information

An Alternative Scaling Factor In Broyden s Class Methods for Unconstrained Optimization

An Alternative Scaling Factor In Broyden s Class Methods for Unconstrained Optimization Joural of Mathematics ad Statistics 6 (): 63-67, 00 ISSN 549-3644 00 Sciece Publicatios A Alterative Scalig Factor I Broyde s Class Methods for Ucostraied Optimizatio Muhammad Fauzi bi Embog, Mustafa bi

More information

PAPER : IIT-JAM 2010

PAPER : IIT-JAM 2010 MATHEMATICS-MA (CODE A) Q.-Q.5: Oly oe optio is correct for each questio. Each questio carries (+6) marks for correct aswer ad ( ) marks for icorrect aswer.. Which of the followig coditios does NOT esure

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

Seunghee Ye Ma 8: Week 5 Oct 28

Seunghee Ye Ma 8: Week 5 Oct 28 Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

Read carefully the instructions on the answer book and make sure that the particulars required are entered on each answer book.

Read carefully the instructions on the answer book and make sure that the particulars required are entered on each answer book. THE UNIVERSITY OF WARWICK FIRST YEAR EXAMINATION: Jauary 2009 Aalysis I Time Allowed:.5 hours Read carefully the istructios o the aswer book ad make sure that the particulars required are etered o each

More information

Math 113 Exam 3 Practice

Math 113 Exam 3 Practice Math Exam Practice Exam will cover.-.9. This sheet has three sectios. The first sectio will remid you about techiques ad formulas that you should kow. The secod gives a umber of practice questios for you

More information

A collocation method for singular integral equations with cosecant kernel via Semi-trigonometric interpolation

A collocation method for singular integral equations with cosecant kernel via Semi-trigonometric interpolation Iteratioal Joural of Mathematics Research. ISSN 0976-5840 Volume 9 Number 1 (017) pp. 45-51 Iteratioal Research Publicatio House http://www.irphouse.com A collocatio method for sigular itegral equatios

More information

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio

More information

Strong Convergence Theorems According. to a New Iterative Scheme with Errors for. Mapping Nonself I-Asymptotically. Quasi-Nonexpansive Types

Strong Convergence Theorems According. to a New Iterative Scheme with Errors for. Mapping Nonself I-Asymptotically. Quasi-Nonexpansive Types It. Joural of Math. Aalysis, Vol. 4, 00, o. 5, 37-45 Strog Covergece Theorems Accordig to a New Iterative Scheme with Errors for Mappig Noself I-Asymptotically Quasi-Noexpasive Types Narogrit Puturog Mathematics

More information

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We

More information

Lecture 7: October 18, 2017

Lecture 7: October 18, 2017 Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem

More information

Kinetics of Complex Reactions

Kinetics of Complex Reactions Kietics of Complex Reactios by Flick Colema Departmet of Chemistry Wellesley College Wellesley MA 28 wcolema@wellesley.edu Copyright Flick Colema 996. All rights reserved. You are welcome to use this documet

More information

A constructive analysis of convex-valued demand correspondence for weakly uniformly rotund and monotonic preference

A constructive analysis of convex-valued demand correspondence for weakly uniformly rotund and monotonic preference MPRA Muich Persoal RePEc Archive A costructive aalysis of covex-valued demad correspodece for weakly uiformly rotud ad mootoic preferece Yasuhito Taaka ad Atsuhiro Satoh. May 04 Olie at http://mpra.ub.ui-mueche.de/55889/

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

Math 113 Exam 3 Practice

Math 113 Exam 3 Practice Math Exam Practice Exam 4 will cover.-., 0. ad 0.. Note that eve though. was tested i exam, questios from that sectios may also be o this exam. For practice problems o., refer to the last review. This

More information

There is no straightforward approach for choosing the warmup period l.

There is no straightforward approach for choosing the warmup period l. B. Maddah INDE 504 Discrete-Evet Simulatio Output Aalysis () Statistical Aalysis for Steady-State Parameters I a otermiatig simulatio, the iterest is i estimatig the log ru steady state measures of performace.

More information

ON THE LEHMER CONSTANT OF FINITE CYCLIC GROUPS

ON THE LEHMER CONSTANT OF FINITE CYCLIC GROUPS ON THE LEHMER CONSTANT OF FINITE CYCLIC GROUPS NORBERT KAIBLINGER Abstract. Results of Lid o Lehmer s problem iclude the value of the Lehmer costat of the fiite cyclic group Z/Z, for 5 ad all odd. By complemetary

More information

Uniform Strict Practical Stability Criteria for Impulsive Functional Differential Equations

Uniform Strict Practical Stability Criteria for Impulsive Functional Differential Equations Global Joural of Sciece Frotier Research Mathematics ad Decisio Scieces Volume 3 Issue Versio 0 Year 03 Type : Double Blid Peer Reviewed Iteratioal Research Joural Publisher: Global Jourals Ic (USA Olie

More information

Polynomial identity testing and global minimum cut

Polynomial identity testing and global minimum cut CHAPTER 6 Polyomial idetity testig ad global miimum cut I this lecture we will cosider two further problems that ca be solved usig probabilistic algorithms. I the first half, we will cosider the problem

More information

Math 61CM - Solutions to homework 3

Math 61CM - Solutions to homework 3 Math 6CM - Solutios to homework 3 Cédric De Groote October 2 th, 208 Problem : Let F be a field, m 0 a fixed oegative iteger ad let V = {a 0 + a x + + a m x m a 0,, a m F} be the vector space cosistig

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

Computation of Error Bounds for P-matrix Linear Complementarity Problems

Computation of Error Bounds for P-matrix Linear Complementarity Problems Mathematical Programmig mauscript No. (will be iserted by the editor) Xiaoju Che Shuhuag Xiag Computatio of Error Bouds for P-matrix Liear Complemetarity Problems Received: date / Accepted: date Abstract

More information

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula Joural of Multivariate Aalysis 102 (2011) 1315 1319 Cotets lists available at ScieceDirect Joural of Multivariate Aalysis joural homepage: www.elsevier.com/locate/jmva Superefficiet estimatio of the margials

More information

Binary classification, Part 1

Binary classification, Part 1 Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete

More information

A class of spectral bounds for Max k-cut

A class of spectral bounds for Max k-cut A class of spectral bouds for Max k-cut Miguel F. Ajos, José Neto December 07 Abstract Let G be a udirected ad edge-weighted simple graph. I this paper we itroduce a class of bouds for the maximum k-cut

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

Math 341 Lecture #31 6.5: Power Series

Math 341 Lecture #31 6.5: Power Series Math 341 Lecture #31 6.5: Power Series We ow tur our attetio to a particular kid of series of fuctios, amely, power series, f(x = a x = a 0 + a 1 x + a 2 x 2 + where a R for all N. I terms of a series

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

Multi parameter proximal point algorithms

Multi parameter proximal point algorithms Multi parameter proximal poit algorithms Ogaeditse A. Boikayo a,b,, Gheorghe Moroşau a a Departmet of Mathematics ad its Applicatios Cetral Europea Uiversity Nador u. 9, H-1051 Budapest, Hugary b Departmet

More information

CS537. Numerical Analysis and Computing

CS537. Numerical Analysis and Computing CS57 Numerical Aalysis ad Computig Lecture Locatig Roots o Equatios Proessor Ju Zhag Departmet o Computer Sciece Uiversity o Ketucky Leigto KY 456-6 Jauary 9 9 What is the Root May physical system ca be

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso

More information

Mathematical Methods for Physics and Engineering

Mathematical Methods for Physics and Engineering Mathematical Methods for Physics ad Egieerig Lecture otes Sergei V. Shabaov Departmet of Mathematics, Uiversity of Florida, Gaiesville, FL 326 USA CHAPTER The theory of covergece. Numerical sequeces..

More information

Chapter 7 Isoperimetric problem

Chapter 7 Isoperimetric problem Chapter 7 Isoperimetric problem Recall that the isoperimetric problem (see the itroductio its coectio with ido s proble) is oe of the most classical problem of a shape optimizatio. It ca be formulated

More information