Stochastic Variance-Reduced Cubic Regularized Newton Method
Stochastic Variance-Reduced Cubic Regularized Newton Method

Dongruo Zhou and Pan Xu and Quanquan Gu

arXiv preprint, v1 [cs.LG], 13 Feb 2018

Abstract

We propose a stochastic variance-reduced cubic regularized Newton method for non-convex optimization. At the core of our algorithm is a novel semi-stochastic gradient, along with a semi-stochastic Hessian, that are specifically designed for the cubic regularization method. We show that our algorithm is guaranteed to converge to an $(\epsilon, \sqrt{\epsilon})$-approximate local minimum within $\tilde{O}(n^{4/5}/\epsilon^{3/2})$ second-order oracle calls, which outperforms the state-of-the-art cubic regularization algorithms including subsampled cubic regularization. Our work also sheds light on the application of variance reduction techniques to high-order non-convex optimization methods. Thorough experiments on various non-convex optimization problems support our theory.

1 Introduction

We study the following finite-sum optimization problem:

    \min_{x \in \mathbb{R}^d} F(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),    (1.1)

where F(x) and each f_i(x) can be non-convex. Such problems are common in machine learning, where each f_i(x) is a loss function on a training example (LeCun et al., 2015). Since F(x) is non-convex, finding its global minimum is generally NP-Hard (Hillar and Lim, 2013). As a result, one possible goal is to find an approximate first-order stationary point ($\epsilon$-stationary point):

    \|\nabla F(x)\|_2 \le \epsilon,

for some given $\epsilon > 0$. A lot of studies have been devoted to this problem, including gradient descent (GD), stochastic gradient descent (SGD) (Robbins and Monro, 1951), and their extensions

Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA 22904, USA; dz4bd@virginia.edu
Department of Computer Science, University of Virginia, Charlottesville, VA 22904, USA; px3ds@virginia.edu
Department of Computer Science, University of Virginia, Charlottesville, VA 22904, USA; qg5w@virginia.edu
(Ghadimi and Lan, 2013; Reddi et al., 2016a; Allen-Zhu and Hazan, 2016; Ghadimi and Lan, 2016). Nevertheless, first-order stationary points can be non-degenerate saddle points or even local maxima in non-convex optimization, which are undesirable. Therefore, a more reasonable objective is to find an approximate second-order stationary point (Nesterov and Polyak, 2006), which is also known as an $(\epsilon_g, \epsilon_h)$-approximate local minimum of F(x):

    \|\nabla F(x)\|_2 \le \epsilon_g,\quad \lambda_{\min}(\nabla^2 F(x)) \ge -\epsilon_h,    (1.2)

for some given constants $\epsilon_g, \epsilon_h > 0$. In fact, in some machine learning problems like matrix completion (Ge et al., 2016), one finds that every local minimum is a global minimum, suggesting that finding an approximate local minimum is a better choice than a stationary point, and is good enough in many applications. One of the most popular methods to achieve this goal is perhaps the cubic regularized Newton method, which was introduced by Nesterov and Polyak (2006), and solves the following kind of subproblem in each iteration:

    h(x) = \argmin_{h \in \mathbb{R}^d} m(h, x) = \langle \nabla F(x), h \rangle + \frac{1}{2} \langle \nabla^2 F(x) h, h \rangle + \frac{\theta}{6} \|h\|_2^3,    (1.3)

where $\theta > 0$ is a regularization parameter. Nesterov and Polyak (2006) proved that, fixing a starting point $x_0$ and performing the update rule $x_t = x_{t-1} + h(x_{t-1})$, the algorithm outputs a sequence $\{x_t\}$ that converges to a local minimum provided that the function is Hessian Lipschitz. However, it can be seen that to solve the subproblem (1.3), one needs to calculate the full gradient $\nabla F(x)$ and Hessian $\nabla^2 F(x)$, which is a big overhead in large-scale machine learning problems because n is often very large. Some recent studies presented various algorithms to avoid the calculation of the full gradient and Hessian in cubic regularization. Kohler and Lucchi (2017) used a subsampling technique to get an approximate gradient and Hessian instead of exact ones, and Xu et al. (2017b) also used a subsampled Hessian. Both of them can reduce the computational complexity in some circumstances. However, just like other sampling-based algorithms such as subsampled Newton methods Erdogdu and Montanari (2015); Xu et al.
(2016); Roosta-Khorasani and Mahoney (2016a,b); Ye et al. (2017), their convergence rates are worse than that of the Newton method, especially when one needs a high-accuracy solution (i.e., the optimization error $\epsilon$ is small). This is because the subsampling size one needs to achieve a certain accuracy may be even larger than the full sample size n. Therefore, a natural question arises as follows:

When we need a high-accuracy local minimum, is there an algorithm that can output an approximate local minimum with better second-order oracle complexity than the cubic regularized Newton method?

In this paper, we give an affirmative answer to the above question. We propose a novel cubic regularization algorithm named Stochastic Variance-Reduced Cubic regularization (SVRC), which incorporates variance reduction techniques (Johnson and Zhang, 2013; Xiao and Zhang, 2014; Allen-Zhu and Hazan, 2016; Reddi et al., 2016a) into the cubic-regularized Newton method. The key component in our algorithm is a novel semi-stochastic gradient, together with a semi-stochastic Hessian, that are specifically designed for cubic regularization. Furthermore, we prove
that, for Hessian Lipschitz functions, to attain an approximate $(\epsilon, \sqrt{\rho\epsilon})$-local minimum, our proposed algorithm requires $O(n + n^{4/5}/\epsilon^{3/2})$ Second-order Oracle (SO) calls and $O(1/\epsilon^{3/2})$ Cubic Subproblem Oracle (CSO) calls. Here an SO call represents an evaluation of the triple $(f_i(x), \nabla f_i(x), \nabla^2 f_i(x))$, and a CSO call denotes an evaluation of the exact (or inexact) solution of the cubic subproblem (1.3). Compared with the original cubic regularization algorithm (Nesterov and Polyak, 2006), which requires $O(n/\epsilon^{3/2})$ SO calls and $O(1/\epsilon^{3/2})$ CSO calls, our proposed algorithm reduces the SO calls by a factor of $\Omega(n^{1/5})$. We also carry out experiments on real data to demonstrate the superior performance of our algorithm. Our major contributions are summarized as follows:

- We present a novel cubic regularization method with improved oracle complexity. To the best of our knowledge, this is the first algorithm that outperforms cubic regularization without any loss in convergence rate. This is in sharp contrast to subsampled cubic regularization methods (Kohler and Lucchi, 2017; Xu et al., 2017a), which suffer from worse convergence rates than cubic regularization.

- We also extend our algorithm to the case with inexact solutions to the cubic regularization subproblem. Similar to previous work (Cartis et al., 2011; Xu et al., 2017a), we lay out a set of sufficient conditions, under which the output of the inexact algorithm is still guaranteed to have the same convergence rate and oracle complexity as the exact algorithm. This further sheds light on the practical implementation of our algorithm.

- As far as we know, ours is the first work that rigorously demonstrates the advantage of variance reduction for second-order optimization algorithms. Although there exist a few studies (Lucchi et al., 2015; Moritz et al., 2016; Rodomanov and Kropotov, 2016) using variance reduction to accelerate Newton methods, none of them can deliver faster rates of convergence than the standard Newton method.

Notation We use [n] to denote the index set {1, 2, ..., n}. We use $\|v\|_2$ to denote the vector Euclidean norm.
For a symmetric matrix $H \in \mathbb{R}^{d \times d}$, we denote its eigenvalues by $\lambda_1(H) \le \ldots \le \lambda_d(H)$, its spectral norm by $\|H\|_2 = \max\{|\lambda_1(H)|, |\lambda_d(H)|\}$, and the Schatten r-norm by $\|H\|_{S_r} = (\sum_{i=1}^d |\lambda_i(H)|^r)^{1/r}$ for $r \ge 1$. We denote $A \succeq B$ if $\lambda_1(A - B) \ge 0$ for symmetric matrices $A, B \in \mathbb{R}^{d \times d}$. We also note that $\|A - B\|_2 \le C$ if and only if $-C \cdot I \preceq A - B \preceq C \cdot I$, for $C > 0$. We call $\xi$ a Rademacher random variable if $P(\xi = 1) = P(\xi = -1) = 1/2$. We use $f_n = O(g_n)$ to denote that $f_n \le C g_n$ for some constant $C > 0$, and use $f_n = \tilde{O}(g_n)$ to hide the logarithmic terms of $g_n$.

2 Related Work

In this section, we briefly review the relevant work in the literature. The most related work to ours is the cubic regularized Newton method, which was originally proposed in Nesterov and Polyak (2006). Cartis et al. (2011) presented an adaptive framework for cubic regularization, which uses an adaptive estimation of the local Lipschitz constant and an approximate solution to the cubic subproblem. To overcome the computational burden of gradient and Hessian matrix evaluations, Kohler and Lucchi (2017); Xu et al. (2017b,a) proposed to use a subsampled gradient and Hessian in cubic regularization. On the other hand, in order to solve the
cubic subproblem (1.3) more efficiently, Carmon and Duchi (2016) proposed to use gradient descent, while Agarwal et al. (2017) proposed a sophisticated algorithm based on approximate matrix inverse and approximate PCA. Tripuraneni et al. (2017) proposed a refined stochastic cubic regularization algorithm based on the above subproblem solvers. However, none of the aforementioned variants of cubic regularization outperforms the original cubic regularization method in terms of oracle complexity.

Another important line of related research is the variance reduction method, which has been extensively studied for large-scale finite-sum optimization problems. Variance reduction was first proposed in convex finite-sum optimization (Roux et al., 2012; Johnson and Zhang, 2013; Xiao and Zhang, 2014; Defazio et al., 2014), which uses a semi-stochastic gradient to reduce the variance of the stochastic gradient and improves the gradient complexity of both stochastic gradient descent (SGD) and gradient descent (GD). Representative algorithms include Stochastic Average Gradient (SAG) (Roux et al., 2012), Stochastic Variance Reduced Gradient (SVRG) (Johnson and Zhang, 2013) and SAGA (Defazio et al., 2014), to mention a few. For non-convex finite-sum optimization, Garber and Hazan (2015); Shalev-Shwartz (2016) proposed algorithms for the setting where each individual function may be non-convex, but their sum is still convex. Later on, Reddi et al. (2016a) and Allen-Zhu and Hazan (2016) extended the SVRG algorithm to general non-convex finite-sum optimization, which outperforms SGD and GD in terms of gradient complexity as well. However, to the best of our knowledge, it is still an open problem whether variance reduction can also improve the oracle complexity of second-order optimization algorithms.

Last but not least is the line of research which aims to escape from non-degenerate saddle points by finding the negative curvature direction.
There is a vast literature which focuses on algorithms escaping from saddle points by using information about the gradient and negative curvature instead of considering the subproblem (1.3). Ge et al. (2015) and Jin et al. (2017a) showed that simple (stochastic) gradient descent with perturbation can escape from saddle points. Carmon et al. (2016); Royer and Wright (2017); Allen-Zhu (2017) showed that by calculating the negative curvature using Hessian information, one can find an $(\epsilon, \sqrt{\epsilon})$-local minimum faster than the first-order methods. Recent work (Allen-Zhu and Li, 2017; Jin et al., 2017b; Xu et al., 2017c) proposed first-order algorithms that can escape from saddle points without using Hessian information.

For a better comparison of our algorithm with the most related algorithms in terms of SO and CSO oracle complexities, we summarize the results in Table 1. It can be seen from Table 1 that our algorithm (SVRC) achieves the lowest (SO and CSO) oracle complexity compared with the original cubic regularization method (Nesterov and Polyak, 2006), which employs full gradient and Hessian evaluations, and the subsampled cubic method (Kohler and Lucchi, 2017; Xu et al., 2017b). In particular, our algorithm reduces the SO oracle complexity of cubic regularization by a factor of $n^{1/5}$ for finding an $(\epsilon, \sqrt{\epsilon})$-local minimum. We will provide a more detailed discussion in the main theory section.

1 It is the refined rate proved by Xu et al. (2017b) for the subsampled cubic regularization algorithm proposed in Kohler and Lucchi (2017).
Table 1: Comparisons between different methods to find an $(\epsilon, \sqrt{\epsilon})$-local minimum, in terms of the second-order oracle (SO) complexity and the cubic subproblem oracle (CSO) complexity.

Algorithm | SO calls | CSO calls | Gradient Lipschitz | Hessian Lipschitz
Cubic regularization (Nesterov and Polyak, 2006) | $O(n/\epsilon^{3/2})$ | $O(1/\epsilon^{3/2})$ | no | yes
Subsampled cubic regularization (Kohler and Lucchi, 2017; Xu et al., 2017b) | $\tilde{O}(n/\epsilon^{3/2} + 1/\epsilon^{5/2})$ 1 | $O(1/\epsilon^{3/2})$ | yes | yes
SVRC (this paper) | $\tilde{O}(n + n^{4/5}/\epsilon^{3/2})$ | $O(1/\epsilon^{3/2})$ | no | yes

3 The Proposed Algorithm

In this section, we present a novel algorithm, which utilizes stochastic variance reduction techniques to improve the cubic regularization method. To reduce the computational burden of gradient and Hessian matrix evaluations in the cubic regularization updates in (1.3), subsampled gradient and Hessian matrices have been used in subsampled cubic regularization (Kohler and Lucchi, 2017; Xu et al., 2017b) and stochastic cubic regularization (Tripuraneni et al., 2017). Nevertheless, the stochastic gradient and Hessian matrix have large variances, which undermine the convergence performance. Inspired by SVRG (Johnson and Zhang, 2013), we propose to use a semi-stochastic version of the gradient and Hessian matrix, which can control the variances automatically. Specifically, our algorithm has two loops. At the beginning of the s-th iteration of the outer loop, we denote $\tilde{x}^s = x_0^{s+1}$. We first calculate the full gradient $g^s = \nabla F(\tilde{x}^s)$ and Hessian matrix $H^s = \nabla^2 F(\tilde{x}^s)$, which are stored for further reference in the inner loop. At the t-th iteration of the inner loop, we calculate the following semi-stochastic gradient and Hessian matrix:

    v_t^{s+1} = \frac{1}{b_g} \sum_{i \in I_g} \big( \nabla f_i(x_t^{s+1}) - \nabla f_i(\tilde{x}^s) \big) + g^s - \Big( \frac{1}{b_g} \sum_{i \in I_g} \nabla^2 f_i(\tilde{x}^s) - H^s \Big) (x_t^{s+1} - \tilde{x}^s),    (3.1)

    U_t^{s+1} = \frac{1}{b_h} \sum_{j \in I_h} \big( \nabla^2 f_j(x_t^{s+1}) - \nabla^2 f_j(\tilde{x}^s) \big) + H^s,    (3.2)

where $I_g$ and $I_h$ are batch index sets, and the batch sizes will be decided later.
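The estimators (3.1) and (3.2) can be sketched directly in NumPy. The quadratic components f_i below are toy stand-ins of ours, not from the paper; with full batches the estimators collapse to the exact gradient and Hessian, which the final checks confirm:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 4
# Illustrative components: f_i(x) = 0.5 x^T A_i x + b_i^T x with symmetric A_i,
# so grad f_i(x) = A_i x + b_i and hess f_i(x) = A_i.
As = rng.normal(size=(n, d, d))
As = (As + As.transpose(0, 2, 1)) / 2
bs = rng.normal(size=(n, d))

grad_i = lambda x, i: As[i] @ x + bs[i]
hess_i = lambda x, i: As[i]

def svrc_estimators(x, x_snap, g_snap, H_snap, idx_g, idx_h):
    bg, bh = len(idx_g), len(idx_h)
    # Semi-stochastic gradient (3.1): batch gradient difference, corrected by
    # the stored full gradient and a Hessian-based linear term.
    v = (sum(grad_i(x, i) - grad_i(x_snap, i) for i in idx_g) / bg
         + g_snap
         - (sum(hess_i(x_snap, i) for i in idx_g) / bg - H_snap) @ (x - x_snap))
    # Semi-stochastic Hessian (3.2).
    U = sum(hess_i(x, j) - hess_i(x_snap, j) for j in idx_h) / bh + H_snap
    return v, U

x_snap = rng.normal(size=d)
x = rng.normal(size=d)
g_snap = np.mean([grad_i(x_snap, i) for i in range(n)], axis=0)
H_snap = np.mean([hess_i(x_snap, i) for i in range(n)], axis=0)

# Sanity check: with full batches, the estimators are exact.
v, U = svrc_estimators(x, x_snap, g_snap, H_snap, range(n), range(n))
assert np.allclose(v, np.mean([grad_i(x, i) for i in range(n)], axis=0))
assert np.allclose(U, np.mean([hess_i(x, i) for i in range(n)], axis=0))
```

With small batches the estimators are no longer exact, but the correction terms anchored at the snapshot $\tilde{x}^s$ keep their variance small near the snapshot, which is the same mechanism SVRG uses for gradients.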
In each inner iteration, we solve the following cubic regularization subproblem:

    h_t^{s+1} = \argmin_h m_t^{s+1}(h) = \langle v_t^{s+1}, h \rangle + \frac{1}{2} \langle U_t^{s+1} h, h \rangle + \frac{M_{s,t+1}}{6} \|h\|_2^3.    (3.3)

Then we perform the update $x_{t+1}^{s+1} = x_t^{s+1} + h_t^{s+1}$ in the t-th iteration of the inner loop. The proposed algorithm is displayed in Algorithm 1.
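Putting the pieces together, one runnable sketch of the two-loop scheme on a toy finite sum is below. The toy f_i, the constants, and the gradient-descent subproblem solver are illustrative choices of ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 40, 3
# Toy finite sum: f_i(x) = 0.5 * ||x - a_i||^2, minimized at the mean of a_i.
A = rng.normal(size=(n, d))
grad_i = lambda x, i: x - A[i]
hess_i = lambda x, i: np.eye(d)
F = lambda x: 0.5 * np.mean(np.sum((x - A) ** 2, axis=1))

def cubic_step(v, U, M, eta=0.05, iters=1000):
    # Inexact cubic subproblem solve: gradient descent on
    # <v,h> + 0.5<Uh,h> + (M/6)||h||^3.
    h = np.zeros_like(v)
    for _ in range(iters):
        h -= eta * (v + U @ h + 0.5 * M * np.linalg.norm(h) * h)
    return h

def svrc(x0, S=3, T=5, bg=10, bh=10, M=5.0):
    x_snap = x0
    for s in range(S):
        # Outer loop: snapshot full gradient and Hessian.
        g_snap = np.mean([grad_i(x_snap, i) for i in range(n)], axis=0)
        H_snap = np.mean([hess_i(x_snap, i) for i in range(n)], axis=0)
        x = x_snap
        for t in range(T):
            Ig = rng.choice(n, bg, replace=False)
            Ih = rng.choice(n, bh, replace=False)
            # Semi-stochastic gradient (3.1) and Hessian (3.2).
            v = (np.mean([grad_i(x, i) - grad_i(x_snap, i) for i in Ig], axis=0)
                 + g_snap
                 - (np.mean([hess_i(x_snap, i) for i in Ig], axis=0) - H_snap)
                   @ (x - x_snap))
            U = np.mean([hess_i(x, j) - hess_i(x_snap, j) for j in Ih],
                        axis=0) + H_snap
            x = x + cubic_step(v, U, M)
        x_snap = x
    return x_snap

x0 = np.full(d, 5.0)
assert F(svrc(x0)) < F(x0)  # the objective decreases on this toy problem
```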
Algorithm 1 Stochastic Variance Reduced Cubic Regularization (SVRC)
1: Input: batch sizes $b_g, b_h$, penalty parameters $M_{s,t}$, $s = 1, \ldots, S$, $t = 0, \ldots, T$, starting point $\tilde{x}^1$.
2: Initialization
3: for $s = 1, \ldots, S$ do
4:   $x_0^{s+1} = \tilde{x}^s$
5:   $g^s = \nabla F(\tilde{x}^s) = \frac{1}{n} \sum_{i=1}^n \nabla f_i(\tilde{x}^s)$, $H^s = \frac{1}{n} \sum_{i=1}^n \nabla^2 f_i(\tilde{x}^s)$
6:   for $t = 0, \ldots, T-1$ do
7:     Sample index sets $I_g, I_h$ with $|I_g| = b_g$, $|I_h| = b_h$;
8:     $v_t^{s+1} = \frac{1}{b_g} \sum_{i \in I_g} \big( \nabla f_i(x_t^{s+1}) - \nabla f_i(\tilde{x}^s) \big) + g^s - \big( \frac{1}{b_g} \sum_{i \in I_g} \nabla^2 f_i(\tilde{x}^s) - H^s \big) (x_t^{s+1} - \tilde{x}^s)$
9:     $U_t^{s+1} = \frac{1}{b_h} \sum_{j \in I_h} \big( \nabla^2 f_j(x_t^{s+1}) - \nabla^2 f_j(\tilde{x}^s) \big) + H^s$
10:    $h_t^{s+1} = \argmin_h \langle v_t^{s+1}, h \rangle + \frac{1}{2} \langle U_t^{s+1} h, h \rangle + \frac{M_{s,t+1}}{6} \|h\|_2^3$
11:    $x_{t+1}^{s+1} = x_t^{s+1} + h_t^{s+1}$
12:   end for
13:   $\tilde{x}^{s+1} = x_T^{s+1}$
14: end for
15: Output: choose one $x_t^{s+1}$ uniformly at random, for $t = 0, \ldots, T$ and $s = 1, \ldots, S$.

There are two notable features of our estimators of the full gradient and Hessian in each inner loop, compared with those used in SVRG (Johnson and Zhang, 2013). The first is that our gradient and Hessian estimators consist of mini-batches of stochastic gradients and Hessians. The second is that we use second-order information when we construct the gradient estimator $v_t^{s+1}$, while classical SVRG only uses first-order information to build it. Intuitively speaking, both features are used to make a more accurate estimation of the true gradient and Hessian with affordable oracle calls. Note that similar approximations of the gradient and Hessian matrix appeared in recent work by Gower et al. (2017) and Wai et al. (2017), where they used this new kind of estimator for traditional SVRG in the convex setting, which radically differs from our setting.

4 Main Theory

We first lay down the following Hessian Lipschitz assumption, which is necessary for our analysis and is widely used in the literature (Nesterov and Polyak, 2006; Xu et al., 2016; Kohler and Lucchi, 2017).

Assumption 4.1 (Hessian Lipschitz). There exists a constant $\rho > 0$ such that for all $x, y$ and $i \in [n]$,

    \|\nabla^2 f_i(x) - \nabla^2 f_i(y)\|_2 \le \rho \|x - y\|_2.

In fact, this is the only assumption we need to prove our theoretical results.
The Hessian Lipschitz assumption plays a central role in controlling the rate of change of second-order information. It is obvious that Assumption 4.1 implies the Hessian Lipschitz assumption on F, which, according to Nesterov and Polyak (2006), is also equivalent to the following lemma.
Lemma 4.2. Let the function $F: \mathbb{R}^d \to \mathbb{R}$ satisfy the $\rho$-Hessian Lipschitz assumption; then for any $x, y, h \in \mathbb{R}^d$, it holds that

    \|\nabla^2 F(x) - \nabla^2 F(y)\|_2 \le \rho \|x - y\|_2,
    F(x + h) \le F(x) + \langle \nabla F(x), h \rangle + \frac{1}{2} \langle \nabla^2 F(x) h, h \rangle + \frac{\rho}{6} \|h\|_2^3,
    \|\nabla F(x + h) - \nabla F(x) - \nabla^2 F(x) h\|_2 \le \frac{\rho}{2} \|h\|_2^2.

We then define the following optimal function gap between the initial point $x_0$ and the global minimum of F.

Definition 4.3 (Optimal Gap). For the function $F(\cdot)$ and the initial point $x_0$, let $\Delta_F$ be

    \Delta_F = \inf\{\Delta \in \mathbb{R} : F(x_0) - F^* \le \Delta\},

where $F^* = \inf_{x \in \mathbb{R}^d} F(x)$. Without loss of generality, we assume $\Delta_F < +\infty$ throughout this paper.

Before we present the non-asymptotic convergence results for Algorithm 1, we define

    \mu(x_t^{s+1}) = \max\Big\{ \|\nabla F(x_t^{s+1})\|_2^{3/2},\; \frac{-\lambda_{\min}^3(\nabla^2 F(x_t^{s+1}))}{[M_{s,t+1}]^{3/2}} \Big\}.    (4.1)

By the definition in (4.1), $\mu(x_t^{s+1}) < \epsilon^{3/2}$ holds if and only if

    \|\nabla F(x_t^{s+1})\|_2 < \epsilon,\quad \lambda_{\min}(\nabla^2 F(x_t^{s+1})) > -\sqrt{M_{s,t+1}\,\epsilon}.    (4.2)

Therefore, in order to find an $(\epsilon, \sqrt{\rho\epsilon})$-local minimum of the non-convex function F, it suffices to find a point $x_t^{s+1}$ which satisfies $\mu(x_t^{s+1}) < \epsilon^{3/2}$ and $M_{s,t+1} = O(\rho)$ for all $s, t$. Next we define our oracles formally:

Definition 4.4 (Second-order Oracle). Given an index $i$ and a point $x$, one second-order oracle (SO) call returns the triple

    [f_i(x), \nabla f_i(x), \nabla^2 f_i(x)].    (4.3)

Definition 4.5 (Cubic Subproblem Oracle). Given a vector $g \in \mathbb{R}^d$, a Hessian matrix $H$ and a positive constant $\theta$, one Cubic Subproblem Oracle (CSO) call returns $h_{sol}$, where $h_{sol}$ can be solved exactly as follows:

    h_{sol} = \argmin_{h \in \mathbb{R}^d} \langle g, h \rangle + \frac{1}{2} \langle h, H h \rangle + \frac{\theta}{6} \|h\|_2^3.

Remark 4.6. The second-order oracle is a special form of the Information Oracle introduced by Nesterov, which returns the gradient, Hessian and all higher-order derivatives of the objective function F(x). Here, our second-order oracle only returns first- and second-order information at some point of a single component $f_i$ instead of F. We argue that it is a reasonable adaptation because in this paper we focus on finite-sum objective functions. The Cubic Subproblem Oracle returns an exact or inexact solution of (3.3), which plays an important role in both theory and practice.
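A CSO call as in Definition 4.5 can be approximated by running gradient descent on the cubic model, in the spirit of the Carmon and Duchi (2016) solver cited earlier. The step size and iteration budget below are illustrative choices of ours:

```python
import numpy as np

def cubic_model_grad(h, g, H, theta):
    # Gradient of m(h) = <g,h> + 0.5*<Hh,h> + (theta/6)*||h||^3:
    #   grad m(h) = g + H h + (theta/2) * ||h|| * h.
    return g + H @ h + 0.5 * theta * np.linalg.norm(h) * h

def solve_cubic_gd(g, H, theta, eta=0.1, iters=1000):
    # Gradient descent on the cubic model, started from h = 0.
    h = np.zeros_like(g)
    for _ in range(iters):
        h -= eta * cubic_model_grad(h, g, H, theta)
    return h

# Sanity check on a 2-d instance with H positive definite (convex model):
# with g = e_1, H = I, theta = 1, stationarity along e_1 reads
# 1 + h1 - 0.5*h1^2 = 0, whose negative root is h1 = 1 - sqrt(3).
g = np.array([1.0, 0.0])
h = solve_cubic_gd(g, np.eye(2), theta=1.0)
assert abs(h[0] - (1 - np.sqrt(3))) < 1e-5
assert np.linalg.norm(cubic_model_grad(h, g, np.eye(2), 1.0)) < 1e-6
```

When H has negative eigenvalues the model is non-convex and a practical solver needs more care (e.g. the Lanczos-type method used in the experiments section), but the first-order stationarity check above stays the same.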
Now we are ready to give a general convergence result for Algorithm 1:

Theorem 4.7. Let $A, B, \alpha$ and $\beta$ be arbitrary positive constants, and choose $M_{s,t} = M$ for each $s, t$. Define parameter sequences $\{\Theta_t\}_{t=0}^T$ and $\{c_t\}_{t=0}^T$ as follows:

    c_t = c_{t+1}(1 + 2\alpha + \beta) + \Big( \frac{1}{M^{3/2}} + 1 \Big) \Big( \frac{\rho^{3/2}}{A^{1/2} b_g^{3/4}} + \frac{M}{B} \Big) + \frac{C \rho^3 (\log d)^{3/2}}{b_h^{3/2}} \Big( \frac{c_{t+1}}{\alpha} + \frac{1}{\beta^{1/2}} \Big),
    \Theta_t = \frac{3M - 2\rho - 4A - 4B}{12},    (4.4)
    c_T = 0,

where $\rho$ is the Hessian Lipschitz constant, $M$ is the regularization parameter of Algorithm 1, and $C$ is an absolute constant. If we set the batch size $b_h > 25 \log d$, $M = O(\rho)$, and $\Theta_t > 0$ for all $t$, then the output of Algorithm 1 satisfies

    E[\mu(x_{out})] \le \frac{E[F(x_0) - F^*]}{\gamma_n S T},    (4.5)

where $\gamma_n = \min_t \Theta_t / (15 M^{3/2})$.

Remark 4.8. To ensure that $x_{out}$ is an $(\epsilon, \sqrt{\rho\epsilon})$-local minimum, we can set the right-hand side of (4.5) to be less than $\epsilon^{3/2}$. This immediately implies that the total iteration complexity of Algorithm 1 is $ST = O(E[F(x_0) - F^*] / (\gamma_n \epsilon^{3/2}))$, which matches the iteration complexity of cubic regularization (Nesterov and Polyak, 2006).

Remark 4.9. Note that there is a $\log d$ term in the expression of the parameter $c_t$, and it is only related to the Hessian batch size $b_h$. The $\log d$ term comes from matrix concentration inequalities, and is believed to be unavoidable (Tropp et al., 2015). In other words, the batch size of the Hessian matrix $b_h$ has an inevitable relation to the dimension d, unlike the batch size of the gradient $b_g$.

The iteration complexity result in Theorem 4.7 depends on a series of parameters defined as in (4.4). In the following corollary, we show how to choose these parameters in practice to achieve a better oracle complexity.

Corollary 4.10. Let the batch sizes $b_g$ and $b_h$ satisfy $b_g = b_h / \log d = 1400 n^{2/5}$. Set the parameters in Theorem 4.7 as follows:

    A = B = 125\rho,\quad \alpha = 2 n^{1/10},\quad \beta = 4 n^{2/5},

with $\Theta_t$ and $c_t$ defined as in (4.4). Let the cubic regularization parameter be $M = 2000\rho$, and the epoch length be $T = n^{1/5}$. Then Algorithm 1 converges to an $(\epsilon, \sqrt{\rho\epsilon})$-local minimum with

    O\Big( n + \frac{\Delta_F \rho n^{4/5}}{\epsilon^{3/2}} \Big) \text{ SO calls and } O\Big( \frac{\Delta_F \rho}{\epsilon^{3/2}} \Big) \text{ CSO calls.}    (4.6)
Remark 4.11. Corollary 4.10 states that we can reduce the SO calls by setting the batch sizes $b_g, b_h$ relative to n. In contrast, in order to achieve an $(\epsilon, \sqrt{\rho\epsilon})$-local minimum, the original cubic regularization method in Nesterov and Polyak (2006) needs $O(n/\epsilon^{3/2})$ second-order oracle calls, which is worse than ours by a factor of $n^{1/5}$. And subsampled cubic regularization (Kohler and Lucchi, 2017; Xu et al., 2017b) requires $\tilde{O}(n/\epsilon^{3/2} + 1/\epsilon^{5/2})$ SO calls, which is even worse.
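The approximate-local-minimum criterion $\mu$ from (4.1)-(4.2) can be evaluated directly whenever the full gradient and Hessian are computable. A minimal sketch, assuming that reading of (4.1) (the helper name `mu` is ours):

```python
import numpy as np

def mu(grad, hess, M):
    # mu(x) = max{ ||grad F(x)||_2^{3/2}, -lambda_min^3(hess F(x)) / M^{3/2} },
    # as in (4.1); mu(x) < eps^{3/2} certifies an approximate local minimum.
    lam_min = np.linalg.eigvalsh(hess)[0]  # eigvalsh sorts ascending
    return max(np.linalg.norm(grad) ** 1.5,
               max(0.0, -lam_min) ** 3 / M ** 1.5)

# At a point with zero gradient and positive-definite Hessian, mu vanishes:
assert mu(np.zeros(2), np.eye(2), M=1.0) == 0.0
# A strongly negative eigenvalue drives mu up: lambda_min = -2 gives 8.
assert mu(np.zeros(2), np.diag([1.0, -2.0]), M=1.0) == 8.0
```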
5 Practical Algorithm with Inexact Oracle

In practice, the exact solution to the cubic subproblem (3.3) cannot always be obtained. Instead, one can only get an approximate solution from some inexact solver. Thus we replace the CSO oracle in Definition 4.5 with the following inexact CSO oracle:

    h_{sol} \approx \argmin_{h \in \mathbb{R}^d} \langle g, h \rangle + \frac{1}{2} \langle h, H h \rangle + \frac{\theta}{6} \|h\|_2^3.

To analyze the performance of Algorithm 1 with an inexact cubic subproblem solver, we replace the exact solver in Line 10 of Algorithm 1 with

    \tilde{h}_t^{s+1} \approx \argmin_h m_t^{s+1}(h).    (5.1)

In order to characterize the above inexact solution, we present the following sufficient condition, under which the inexact solution ensures the same oracle complexity as the exact solution:

Condition 5.1 (Inexact condition). For each $s, t$ and a given $\delta > 0$, we say $\tilde{h}_t^{s+1}$ satisfies the $\delta$-inexact condition if

    m_t^{s+1}(\tilde{h}_t^{s+1}) - m_t^{s+1}(h_t^{s+1}) \le \delta^{3/2},
    m_t^{s+1}(\tilde{h}_t^{s+1}) \le -\frac{M_{s,t+1}}{12} \|\tilde{h}_t^{s+1}\|_2^3 + \delta,
    \big| \|\tilde{h}_t^{s+1}\|_2 - \|h_t^{s+1}\|_2 \big| \le \delta.

Remark 5.2. Similar inexact conditions have been studied in the literature on cubic regularization. For instance, Nesterov and Polyak (2006) presented a practical way to solve the cubic subproblem without a termination condition. Cartis et al. (2011); Kohler and Lucchi (2017) presented termination criteria for approximate solutions to the cubic subproblem, which are slightly different from our condition. In general, the termination criteria in Cartis et al. (2011); Kohler and Lucchi (2017) contain a non-linear equation, which is hard to verify and less practical. In contrast, our inexact condition only contains inequalities, which are easy to verify in practice.

Next we give the convergence result with the inexact CSO oracle:

Theorem 5.3. Let $\tilde{h}_t^{s+1}$ be the output in each inner loop of Algorithm 1 which satisfies Condition 5.1. Let $A, B, \alpha, \beta > 0$ be arbitrary constants. Let $M_{s,t} = M$ for each $s, t$, and let $\Theta_t$ and $c_t$ be defined as in (4.4), where $1 \le t \le T$. If we choose batch size $b_h > 25 \log d$ and $M = O(\rho)$, and $\Theta_t > 0$ for all $t$, then the output of Algorithm 1 with the inexact subproblem solver satisfies

    E[\mu(x_{out})] \le \frac{E[F(x_0) - F^*]}{\gamma_n S T} + \delta',

where

    \delta' = \delta \Big( \Theta + \frac{\Theta}{2 M^{3/2}} + 1 \Big),\quad \gamma_n = \min_t \Theta_t / (15 M^{3/2}).
Remark 5.4. By the definition of $\mu(x)$ in (4.1) and (4.2), in order to attain an $(\epsilon, \sqrt{\epsilon})$-local minimum, we require $E[\mu(x_{out})] \le \epsilon^{3/2}$ and thus $\delta' < \epsilon^{3/2}$, which implies that $\delta$ in Condition 5.1 should satisfy $\delta < \epsilon^{3/2} / (\Theta + \Theta/(2M^{3/2}) + 1)$. Thus the total iteration complexity of Algorithm 1 is $O(\Delta_F / (\gamma_n \epsilon^{3/2}))$. With the same choice of parameters, Algorithm 1 with the inexact oracle can achieve the same reduction in SO calls.

Corollary 5.5. Under Condition 5.1, and under the same conditions as in Corollary 4.10, the output of Algorithm 1 with the inexact subproblem solver satisfies $E[\mu(x_{out})] \le \epsilon^{3/2} + \delta_f$ within

    O\Big( n + \frac{\Delta_F \rho n^{4/5}}{\epsilon^{3/2}} \Big) \text{ SO calls and } O\Big( \frac{\Delta_F \rho}{\epsilon^{3/2}} \Big) \text{ CSO calls,}

where $\delta_f = O(\rho \delta)$.

Figure 1: Logarithmic function value gap for non-convex regularized logistic regression on different datasets (a9a, covtype, ijcnn1). (a), (b) and (c) present the oracle complexity comparison; (d), (e) and (f) present the runtime comparison.

6 Experiments

In this section, we present numerical experiments on different non-convex Empirical Risk Minimization (ERM) problems and on different datasets to validate the advantage of our proposed algorithm in finding approximate local minima.
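For concreteness, the three non-convex ERM objectives studied in this section can be written in NumPy as below. The data is synthetic and the finite-difference gradient check is ours; this is an illustrative sketch, not the authors' experiment code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_nonconvex(w, X, y, lam=10.0):
    # Binary logistic loss plus the non-convex regularizer
    # sum_j lam * w_j^2 / (1 + w_j^2); lam = 10 as in the experiments.
    p = sigmoid(X @ w)
    nll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return nll + np.sum(lam * w ** 2 / (1 + w ** 2))

def nonlinear_least_squares(w, X, y):
    # (1/n) * sum_i (y_i - sigmoid(x_i^T w))^2.
    return np.mean((y - sigmoid(X @ w)) ** 2)

def robust_regression(w, X, y):
    # Robust loss eta(r) = log(r^2/2 + 1) applied to linear residuals.
    r = y - X @ w
    return np.mean(np.log(r ** 2 / 2.0 + 1.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = rng.integers(0, 2, size=30).astype(float)
w = 0.1 * rng.normal(size=4)

# Finite-difference gradient of the regularized logistic loss, checked
# against the analytic gradient X^T(p - y)/n + 2*lam*w/(1 + w^2)^2.
eps = 1e-6
g_fd = np.array([(logistic_nonconvex(w + eps * e, X, y)
                  - logistic_nonconvex(w - eps * e, X, y)) / (2 * eps)
                 for e in np.eye(4)])
p = sigmoid(X @ w)
g = X.T @ (p - y) / len(y) + 2 * 10.0 * w / (1 + w ** 2) ** 2
assert np.allclose(g, g_fd, atol=1e-4)
```

All three objectives are smooth with Lipschitz Hessians on bounded domains, which is the setting Assumption 4.1 asks for.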
Figure 2: Logarithmic function value gap for nonlinear least squares on different datasets (a9a, covtype, ijcnn1). (a), (b) and (c) present the oracle complexity comparison; (d), (e) and (f) present the runtime comparison.

Baselines: We compare our algorithm with Adaptive Cubic Regularization (Cartis et al., 2011), Subsampled Cubic Regularization (Kohler and Lucchi, 2017), Stochastic Cubic Regularization (Tripuraneni et al., 2017) and Gradient Cubic Regularization (Carmon and Duchi, 2016). All of the above algorithms are carefully tuned for a fair comparison.

The calculation of SO calls: Here we list the SO calls each algorithm needs per loop. For Subsampled Cubic Regularization, each loop costs $(B_g + B_h)$ SO calls, where $B_g$ and $B_h$ denote the subsampling sizes of the gradient and Hessian. For Stochastic Cubic Regularization, each loop costs $(n_g + n_h)$ SO calls, where we use $n_g$ and $n_h$ to denote the subsampling sizes of the gradient and the Hessian-vector operator. Gradient Cubic Regularization and Adaptive Cubic Regularization cost n SO calls in each loop. Finally, we define the number of epochs as the number of SO calls divided by n.

Parameter tuning and subproblem solver: For each algorithm and each dataset, we choose different $b_g, b_h, T$ for the best performance. Meanwhile, we also use two different strategies for choosing $M_{s,t}$: the first is to fix $M_{s,t} = M$ in each iteration, which is proved to enjoy good convergence performance; the other is to choose $M_{s,t} = \alpha / (1 + \beta)^{(s + t/T)}$ with $\alpha, \beta > 0$ for each iteration. This choice of parameter is similar to the choice of the penalty parameter in Subsampled Cubic Regularization and related methods, which sometimes makes some algorithms behave better in our experiments. As for the solver for the subproblem (3.3) in each loop, we choose to use the Lanczos-type method introduced in Cartis et al. (2011).

Datasets: The datasets we use are a9a, covtype and ijcnn1, which are common datasets used in ERM problems. The detailed information about these datasets is in Table 2.

Non-convex regularized logistic regression: The first non-convex problem we choose is
Figure 3: Logarithmic function value gap for robust linear regression on different datasets (a9a, covtype, ijcnn1). (a), (b) and (c) present the oracle complexity comparison; (d), (e) and (f) present the runtime comparison.

Table 2: Overview of the datasets used in our experiments

Dataset | sample size n | dimension d
a9a | 32,561 | 123
covtype | 581,012 | 54
ijcnn1 | 35,000 | 22

a binary logistic regression problem with a non-convex regularizer $\sum_{i=1}^d \lambda w_{(i)}^2 / (1 + w_{(i)}^2)$ (Reddi et al., 2016b). More specifically, suppose we are given training data $\{x_i, y_i\}_{i=1}^n$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$ are the feature vector and label corresponding to the i-th data point. The minimization problem is as follows:

    \min_{w \in \mathbb{R}^d} -\frac{1}{n} \sum_{i=1}^n \Big( y_i \log \phi(x_i^\top w) + (1 - y_i) \log[1 - \phi(x_i^\top w)] \Big) + \sum_{i=1}^d \lambda w_{(i)}^2 / (1 + w_{(i)}^2),

where $\phi(x) = 1 / (1 + \exp(-x))$ is the sigmoid function. We fix $\lambda = 10$ in our experiments. The experimental results are shown in Figure 1.

Nonlinear least squares: The second problem is a nonlinear least squares problem which focuses on the task of binary linear classification (Xu et al., 2017a). We are given training data $\{x_i, y_i\}_{i=1}^n$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$ are the feature vector and label corresponding to the i-th data point.
The minimization problem is

    \min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n [y_i - \phi(x_i^\top w)]^2,

where $\phi(x) = 1 / (1 + \exp(-x))$ is the sigmoid function. The experimental results are shown in Figure 2.

Robust linear regression: The third problem is a robust linear regression problem, where we use a non-convex robust loss function $\log(x^2/2 + 1)$ (Barron, 2017) instead of the square loss in least squares regression. Given a training sample $\{x_i, y_i\}_{i=1}^n$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$ are the feature vector and label corresponding to the i-th data point, the minimization problem is

    \min_{w \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n \eta(y_i - x_i^\top w),

where $\eta(x) = \log(x^2/2 + 1)$. The experimental results are shown in Figure 3.

From Figures 1, 2 and 3, we can see that our algorithm outperforms all the other baseline algorithms on all the datasets. The only exceptions happen in the nonlinear least squares problem and the robust linear regression problem on the covtype dataset, where our algorithm behaves a little worse in the high-accuracy regime in terms of epoch counts. However, even in this setting, our algorithm still outperforms the other baselines in terms of CPU time.

7 Conclusions

In this paper, we propose a novel second-order algorithm for non-convex optimization called SVRC. Our algorithm is the first algorithm which improves the oracle complexity of cubic regularization and its subsampled variants, in a certain regime, using variance reduction techniques. We also show that a similar oracle complexity holds with the inexact oracle. Under both settings our algorithm outperforms the state of the art.

References

Agarwal, N., Allen-Zhu, Z., Bullins, B., Hazan, E. and Ma, T. (2017). Finding approximate local minima for nonconvex optimization in linear time.

Allen-Zhu, Z. (2017). Natasha 2: Faster non-convex optimization than SGD. arXiv preprint.

Allen-Zhu, Z. and Hazan, E. (2016). Variance reduction for faster non-convex optimization. In International Conference on Machine Learning.

Allen-Zhu, Z. and Li, Y. (2017). Neon2: Finding local minima via first-order oracles.
arXiv e-prints.

Barron, J. T. (2017). A more general robust loss function. arXiv preprint.
Carmon, Y. and Duchi, J. C. (2016). Gradient descent efficiently finds the cubic-regularized non-convex Newton step.

Carmon, Y., Duchi, J. C., Hinder, O. and Sidford, A. (2016). Accelerated methods for non-convex optimization.

Cartis, C., Gould, N. I. and Toint, P. L. (2011). Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming.

Defazio, A., Bach, F. and Lacoste-Julien, S. (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems.

Erdogdu, M. A. and Montanari, A. (2015). Convergence rates of sub-sampled Newton methods. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 2. MIT Press.

Garber, D. and Hazan, E. (2015). Fast and simple PCA via convex optimization. arXiv preprint.

Ge, R., Huang, F., Jin, C. and Yuan, Y. (2015). Escaping from saddle points: online stochastic gradient for tensor decomposition. In Conference on Learning Theory.

Ge, R., Lee, J. D. and Ma, T. (2016). Matrix completion has no spurious local minimum. In Advances in Neural Information Processing Systems.

Ghadimi, S. and Lan, G. (2013). Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization.

Ghadimi, S. and Lan, G. (2016). Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming.

Gower, R. M., Roux, N. L. and Bach, F. (2017). Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods. arXiv preprint.

Hillar, C. J. and Lim, L.-H. (2013). Most tensor problems are NP-hard. Journal of the ACM (JACM).

Jin, C., Ge, R., Netrapalli, P., Kakade, S. M. and Jordan, M. I. (2017a). How to escape saddle points efficiently.

Jin, C., Netrapalli, P. and Jordan, M. I. (2017b). Accelerated gradient descent escapes saddle points faster than gradient descent. arXiv preprint.

Johnson, R. and Zhang, T. (2013).
Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems.

Kohler, J. M. and Lucchi, A. (2017). Sub-sampled cubic regularization for non-convex optimization. arXiv preprint.
LeCun, Y., Bengio, Y. and Hinton, G. (2015). Deep learning. Nature.

Lucchi, A., McWilliams, B. and Hofmann, T. (2015). A variance reduced stochastic Newton method. arXiv preprint.

Moritz, P., Nishihara, R. and Jordan, M. (2016). A linearly-convergent stochastic L-BFGS algorithm. In Artificial Intelligence and Statistics.

Nesterov, Y. and Polyak, B. T. (2006). Cubic regularization of Newton method and its global performance. Mathematical Programming.

Reddi, S. J., Hefny, A., Sra, S., Poczos, B. and Smola, A. (2016a). Stochastic variance reduction for nonconvex optimization.

Reddi, S. J., Sra, S., Póczos, B. and Smola, A. (2016b). Fast incremental method for smooth nonconvex optimization. In Decision and Control (CDC), 2016 IEEE 55th Conference on. IEEE.

Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics.

Rodomanov, A. and Kropotov, D. (2016). A superlinearly-convergent proximal Newton-type method for the optimization of finite sums. In International Conference on Machine Learning.

Roosta-Khorasani, F. and Mahoney, M. W. (2016a). Sub-sampled Newton methods I: Globally convergent algorithms.

Roosta-Khorasani, F. and Mahoney, M. W. (2016b). Sub-sampled Newton methods II: Local convergence rates.

Roux, N. L., Schmidt, M. and Bach, F. R. (2012). A stochastic gradient method with an exponential convergence rate for finite training sets. In Advances in Neural Information Processing Systems.

Royer, C. W. and Wright, S. J. (2017). Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization. arXiv preprint.

Shalev-Shwartz, S. (2016). SDCA without duality, regularization, and individual convexity. In International Conference on Machine Learning.

Tripuraneni, N., Stern, M., Jin, C., Regier, J. and Jordan, M. I. (2017). Stochastic cubic regularization for fast nonconvex optimization. arXiv preprint.

Tropp, J. A. et al. (2015). An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning.

Wai, H.-T., Shi, W., Nedic, A. and Scaglione, A. (2017).
aggregaed gradien mehod. arxiv preprin arxiv: Curvaure-aided incremenal Xiao, L. and Zhang, T. (2014). A proximal sochasic gradien mehod wih progressive variance reducion. SIAM Journal on Opimizaion
16 Xu, P., Roosa-Khorasan, F. and Mahoney, M. W. (2017a). Second-order opimizaion for non-convex machine learning: An empirical sudy. arxiv preprin arxiv: Xu, P., Roosa-Khorasani, F. and Mahoney, M. W. (2017b). Newon-ype mehods for non-convex opimizaion under inexac hessian informaion. arxiv preprin arxiv: Xu, P., Yang, J., Roosa-Khorasani, F., R, C. and Mahoney, M. W. (2016). Sub-sampled newon mehods wih non-uniform sampling. Xu, Y., Jin, R. and Yang, T. (2017c). Neon+: Acceleraed gradien mehods for exracing negaive curvaure for non-convex opimizaion. arxiv preprin arxiv: Ye, H., Luo, L. and Zhang, Z. (2017). Approximae newon mehods and heir local convergence. In Inernaional Conference on Machine Learning. 16