For any y 2C, any sub-gradient, v, of h at prox(x), i.e., v and by optimality of prox(x) in (3), we have

Size: px
Start display at page:

Download "For any y 2C, any sub-gradient, v, of h at prox(x), i.e., v and by optimality of prox(x) in (3), we have"

Transcription

1 A Proofs We now give the details for the proof of our main results, i.e., heorems and 2. Below, we outline the steps for the proof of FAG s heorem. he proof of heorem 2 for FARE follows the same line of reasoning. Also, we note that, in what follows, lemmas/corollaries required for the proof of heorem 2, are given immediately after those of FAG.. FAG is essentially a combination of mirror descent and proximal gradient descent steps (emmas and 4). 2. in Algorithm plays the role of an e ective gradient ipschitz constant in each iteration. he convergence rate of FAG ultimately depends on P P g S g. (emma 8 and Corollary 3) 3. By picing S adaptively lie in AdaGrad, we achieve a non-trivial upper bound for P. (emma 5) 4. FAG relies on picing an x at each iteration that satisfies an inequality involving (Corollary ). However, because is not nown prior to picing x, we must choose an x to roughly satisfy the inequality for all possible values of. We do this by picing x using binary search. (emmas 2 and 3 and Corollary ) 5. Finally, we need to pic the right stepsize for each iteration. Our scheme is very similar to the one used in [], but generalized to handle a di erent each iteration. (emmas 6 and 8 as well as Corollary 3). 6. heorem 3 combines items, 2 and 4, above. Finally, to prove heorem, we combine heorem 3 with items 3 and 5 above. A. Proof of heorem and heorem 2 First, we obtain the following ey result (similar to [4, emma 2.3]) regarding the vector p (prox(x) x), as in Step 3 of FAG, which is nown as the Gradient Mapping of F on C. emma (Gradient Mapping) For any x, y 2C, we have F (prox(x)) F (y)h(prox(x) x), y xi 2 x prox(x)2 2, where prox(x) is defined as in (3). In particular, F (prox(x)) F (x) 2 x prox(x)2 2. Proof of emma his result is the same as emma 2.3 in [4]. We bring its proof here for completeness. For any y 2C, any sub-gradient, v, of h at prox(x), i.e., v and by optimality of prox(x) in (3), we have 0 hrf(x)v (prox(x) x), y prox(x)i hrf(x)v (prox(x) x), y xi hrf(x) and so v (prox(x) x), x prox(x)i, hrf(x), prox(x) xi hrf(x)v (prox(x) x), y xi hv, x prox(x)i x prox(x) 2 2, Now from -ipschitz continuity of rf as well as convexity of f and h, we get F (prox(x)) f(prox(x)) h(prox(x)) f(x)hrf(x), prox(x) xi 2 prox(x) x2 2 h(prox(x)) f(x)hrf(x)v (prox(x) x), y xi hv, x prox(x)i 2 x prox(x)2 2 h(prox(x)) f(y)hv (prox(x) x), y xi hv, x prox(x)i 2 x prox(x)2 2 h(prox(x)) f(y)h(prox(x) x), y xi hv, y prox(x)i 2 x prox(x)2 2 h(prox(x)) F (y)h(prox(x) x), y xi 2 x prox(x)2 2. he following lemma establishes the ipschitz continuity of the prox operator. emma 2 (Prox Operator Continuity) prox : R d! R d is a 2-ipschitz continuous, that is, for any x, y 2C, we have prox(x) prox(y) 2 2x y 2. 2

2 Proof of emma 2 By Definition (3), for any x, y, z, z 0 2 C, v and w we have hv, z hw, z 0 prox(x)i hrf(x)(prox(x) x), z prox(x)i, prox(y)i hrf(y)(prox(y) y), z 0 prox(y)i. In particular, for z prox(y) and z 0 prox(z), we get emma 3 (Binary Search emma) et x BinarySearch(z, y, ) defined as in Algorithm 2. hen one of 3 cases happen: (i) x y and hprox(x) x, x zi 0, (ii) x z and hprox(x) x, y xi 0, or (iii) x ty ( t)z for some t 2 (0, ) and hprox(x) x, y zi 3y z 2 2. hv, prox(y) hw, prox(y) prox(x)i hrf(x)(prox(x) x), prox(y) prox(x)i, prox(x)i hrf(y)(prox(y) y), prox(x) prox(y)i. Proof of emma 3 Items (i) and (ii), are simply Steps 2 and 5, respectively. For item (iii), wehave x w 2 ty ( t)z t y ( t )z 2 By monotonicity of sub-gradient, we get hv, prox(y) prox(x)i hw, prox(y) prox(x)i. So hrf(x)(prox(x) x), prox(x) prox(y)i hrf(y)(prox(y) y), prox(x) prox(y)i, and as a result hrf(x)(prox(x) x), prox(x) prox(y)i hrf(x) (prox(x) prox(y)prox(y) x), prox(x) prox(y)i prox(x) prox(y) 2 2 hrf(x)(prox(y) x), prox(x) prox(y)i hrf(y)(prox(y) y), prox(x) prox(y)i, which gives prox(x) prox(y) 2 2 hrf(y) rf(x)(x y), prox(x) prox(y)i (rf(y) rf(x) 2 x y 2 ) prox(x) prox(y) 2 2x y 2 prox(x) prox(y) 2, (t t )y (t t )z 2 y z 2. Now it follows that hprox(x) x, y zi hprox(x) x, y zi hprox(w) w, y zi hprox(x) prox(w), y zi 2 hx w, y zi prox(x) prox(w) 2 y z 2 x w 2 y z 2 2x w 2 y z 2 x w 2 y z 2 3x w 2 y z 2 3 y z 2 2. Where the third inequality follows by emma 2 Using the above result, we can prove the following: Corollary et x, y, z and be defined as in Algorithm and. hen for all, hp, x z i( )hp, y x i D. and the result follows. Using prox operator continuity emma 2, we can conclude that given any y, z 2C,ifhprox(y) y, y zi < 0 and hprox(z) z, y zi > 0, then there must be a t 2 (0, ) for which w t y ( t )z gives hprox(w) w, y zi 0. Algorithm 2 finds an approximation to w in O(log / ) iterations. Proof of Corollary Note that by Step 3 of Algorithm ), p (prox(x ) x ). For, since x y z, the inequality is trivially true. For 2, we consider the three cases of emma 3: (i) if x y, the right hand side is / 0 and the left hand side is hp, x z i h (prox(x ) x ), x z i0, (ii) if x z, the left hand side 3

3 is 0 and hp, y x i h (prox(x ) x ), y x i 0, so the inequality holds trivially, and (iii) in this last case, for some t 2 (0, ), we have hp, x z i h (prox(x ) x ),ty ( t)z z i th(prox(x ) x ), y z i, and hp, y x i h (prox(x ) x ), y ty ( t)z i ( t)h(prox(x ) x ), (y z )i. Hence hp, x z i ( )hp, y x i hp, x z i ( )hp, y x i ( t ( )( t)) h(prox(x ) x ), (y z )i Next, we state a result regarding the mirror descent step. Similar results can be found in most texts on online optimization, e.g. []. emma 4 (Mirror Descent Inequality) et z arg min z2c h p, z z i 2 z z 2 S and D : sup x,y2c x y 2 be the diameter of C measured by infinity norm. hen for any u 2C, we have h p, z ui 2 2 p 2 S D 2 s 3 ( t ( )( t)) y z ( t) y z 2 2 3( )y z y z 2 2 6D y z 2 2 D D, 6d Proof of emma 4 For any u 2Cand by optimality of z,wehaveh p, z uihs (z z ), u z i. Hence, using (5) and (4), it follows that where in the last line we used the fact that y z 2 2 Dd Similar to for Algorithm, the following emma proves an analogous result for Algorithm 3. Corollary 2 et x, y, z and be defined as in Algorithm 3 and. hen for all, hp, x z i( )hp, y x i D. Proof of Corollary 2 We consider two cases:. If x is generated through Algorithm 5, then x BinarySearch(y, z, ) and,sothe statement follows from Corollary. 2. If x is generated through Algorithm 4, thenx y z, and so satisfies hp, x z i ( )hp, y x i h p, z ui h p, z z i h p, z ui h p, z z i hs (z z ), z ui h p, z z i 2 z z 2 S 2 z u 2 S 2 u z 2 S h p, zi sup z2r d 2 z2 S 2 z u 2 S 2 u z 2 S 2 2 p 2 S 2 u z 2 S 2 u z 2 S. Now recalling from Steps 5-7 of Algorithm that S diag(s ) I and s s,wesumover to 4

4 get h p, z ui 2 2 p 2 S 2 u z 2 S u z 2 S 2 u z 2 S 2 2 p 2 S 2 u z 2 S h(s S )(u z ), u z i p 2 S 2 u z 2 hs, i u z 2 hs s, i p 2 S D 2 hs, i D p 2 S D 2 s hs s, i 2 Finally, we state a similar result to that of [7] that captures the benefits of using S in FAG. emma 5 (AdaGrad Inequalities) Define q : P d i G (i, :) 2, where G is as in Step 5 of Algorithm. We have (i) P g S g 2q, (ii) q 2 min S2S P g S g, where S : {S 2 R d d S is diagonal, S ii > 0, trace(s) }, and (iii) p q p d. Proof of emma 5 o prove part (i), we use the following inequality introduced in the proof of emma 4 in [7]: for any arbitrary real-valued sequence of {a i } i and its vector representation as a : [a,a 2,...,a ], we have a 2 a : 2 2a : 2. 5 So it follows that g S g i i i 2q, g 2 (i) s 2 (i) g 2 (i) s (i) g 2 (i) G (i, :) 2 where the last equality follows from the definition of s in Step 6 of Algorithm. For the rest of the proof, one can easily see that g S g i g 2 (i) s(i) i a(i) s(i), where a(i) : P g2 (i) and s diag(s). Now the agrangian for 0 and 0, can be written as! a(i) (s,, ) s(i) s(i) h, si. i i Since the strong duality holds, for any primal-dual optimal solutions, S, and, it follows from complementary slacness that 0 (since s > 0). Now requiring )/@s(i) 0 gives s (i) p a i > 0, which since s (i) > 0, implies that > 0. As a result, by using complementary slacness again, we must have P d i s (i). Now simple algebraic calculations gives s (i) p ai /( P d i p ai ) and part (ii) follows. For part (iii), recall that g 2. Now, since min(s 0 ), one has g S g, and so q. One the other hand, consider the optimization problem v ux max G (i, :) 2 t gi 2() i i s.t. g 2 2,, 2,...,. he agrangian can be written as v ux ({g }, { } ) t gi 2() i! gi 2 (). i

5 By KK necessary condition, we require }, { } )/@g i() 0, which implies that /(2 q P g2 i ()), i, 2,...,d. Hence, P d P i g2 i () d/(4 2 ), and so 2 p d/, which gives q p d. We can now prove the central theorems of which is used to obtain FAG s main result. heorem 3 et D : sup x,y2c x y 2. For any u 2C, after iterations of Algorithm, we get n o 2 2 F (y ) F (u) 2 F (y ) D 2 D 2 s. Proof of heorem 3 Noting that p (y x ) is the gradient mapping of F 6 on C, it follows that (F (y ) (F (prox(x )) hp, x ui hp, (z u)i 2 2 p 2 S ( ) 2 2 p 2 2 hp, x D 2 s z i hp, x p 2 2 D 2 s 2 p 2 2 z i hp, x ( ) p 2 2 D 2 2 s ( )hp, y x i D 2 D 2 D 2 s ( ) (F (y ) F (y )). (emma ) Where the first inequality is by emma, the second inequality is by emma 4, the third equality is by Step 8 of Algorithm, and the second last inequality is by Corollary. Now we have 2 p 2 2 z i (F (y ) ( ) (F (y ) F (y )) F (y ) F (u) ( )F (y ) ( )F (y ) 2 F (y ) F (u) ( )F (y ) 2 F (y ) 2 F (y ) F (u) ( )F (y ) 2 F (y ) 2 2 F (y ) F (u),

6 and the result follows. Proof of heorem 4 Parts of this proof which di er from the proof of heorem 3 are bolded. Noting that p (y x ) is the gradient mapping of F on C, it follows that Once again, we present the analog of heorem 3 for Algorithm 3. heorem 4 et D : sup x,y2c x y 2. For any u 2C, after iterations of Algorithm, we get n 2 2 F (y ) 2 F (y ) D 2 D 2 s. o F (u) (F (y ) (F (prox(x )) hp, x ui hp, (z u)i 2 p p 2 S 2 p 2 2 hp, x D 2 s z i hp, x 2 p 2 2 ( ) p 2 2 D 2 2 s hp, x z i ( ) p 2 2 D 2 2 s ( )hp, y x i D 2 D 2 D 2 s ( ) (F (y ) F (y )). z i! Where the first inequality follows from emma, the second inequality follows from emma 4, the last equality follows from Steps 9 and of Alg 4, Steps8 and 9 of Alg 5, and the second last inequality follows from Corollary 2, and the last equality follows from emma. 7

7 Now we have (F (y ) ( ) (F (y ) F (y )) F (y ) F (u) ( )F (y ) ( )F (y ) 2 F (y ) F (u) ( )F (y ) 2 F (y ) 2 F (y ) F (u) ( )F (y ) 2 F (y ) 2 2 F (y ) and the result follows. F (u), We now set out to put the final piece of the proof in place: choosing the stepsize for the mirror descent step. emma 7 For the choice of in Algorithm 3 and, we have (i) 2 P i i, (ii) 2 2 0,and (iii). Proof of emma 7 Completely identical to proof of emma 6. Corollary 3 et D : sup x,y2c x y 2. For any u 2C, after iterations of Algorithm, we get F (y ) F (u) D 2 Ds 2 P. Proof of corollary 3 he result follows from heorem 3 and emma 6 as well as noting that 2 P i i P i i 2. he FARE analog: emma 6 For the choice of in Algorithm and, we have (i) 2 P i i, (ii) 2 2 0,and (iii). Corollary 4 et D : sup x,y2c x y 2. For any u 2C, after iterations of Algorithm 3, we get F (y ) F (u) D 2 Ds 2 P. Proof We prove (i) by induction. For, is is easy to verify that /, and so 2 and the base case follows trivially. Now suppose 2 P i i. Re-arranging (i) for gives X 0 2 i 2 2. i Now, it is easy to verify that the choice of in Algorithm is a solution of the above quadratic equation. he rest of the items follow immediately from part (i). Proof of corollary 4 he result follows from heorem 4 and emma 7 as well as noting that 2 P i i P i i 2. Finally, it only remains to lower bound P, which is done in the following emma. emma 8 For the choice of in Algorithm, we have 000 P Once again, the FARE analog of emma 6 is 8

8 Proof of emma 8 We prove by induction on. For, we have /, and the base case holds trivially. Suppose the desired relation holds for. We have ( ) P 2 s ( ) P 000 s ( ) P ( ) 3 P 000 s ( ) P 8000 P Where the first inequality is by the induction hypothesis on. Now if ( ) P 000 P, then we are done. Otherwise denoting : P, we must have that Hence, we get ( ) (3 2 3 ) 4 P. ( ) P ( ) P 000 P. v u 4 t P P Remar: We note here that we made little e ort to minimize constants, and that we used rather sloppy bounds such as /2. As a result, the constant appearing above is very conservative and a mere by product of our proof technique.. 2 emma 9 For the choice of in Algorithm 3, we have 000 P Proof of emma 9 Once again, exactly identical to the proof of emma 8, wehave 000 P Finally, using the guarantee that from Step of Algorithm 4 and Step 9 from Algorithm 5, we get the conclusion. he proof of FAG s main result, heorem, follows rather immediately. Proof of heorem he result follows immediately P from emma 8 and Corollary 3 and noting that P g S g 2q by emma 5 and s q by Step 6 of Algorithm and definition of q in emma 5. his gives F (y ) F (u) D 2 q2 000D 2 q2 00D 2. Now from emma 5, we see that : q 2 / 2 [,d]. Finally, the run-time per iteration follows from having to do log 2 (/ ) calls to bisection, each taing O( prox )time. he proof of FARE s main result, heorem 2, is obtained similarly to that of heorem. Proof of heorem 2 he result follows immediately P from emma 9 and Corollary 4 and noting that P g S g 2q by emma 5 and s q by Step 6 of Algorithm 4 and Step 5 of Algorithm 5 and definition of q in emma 5. his gives F (y ) F (u) D 2 q2 q2 00 D D 2 : q 2 / 2 [,d]. Now from emma 5, we see that Finally, we try to guess a suitable for log(d/ ) times, and resort to BinarySearch after. If we resort 9

9 to algorithm 5 (essentially BinarySeaerch), we mae log(/ ) calls to bisection, so overall the number of inner iterations per outer iteration is same as Algorithm. Each inner iteration taes O( prox )timein the worst case (if we have to resort to algorithm 5 each time). 20

Lecture 16: FTRL and Online Mirror Descent

Lecture 16: FTRL and Online Mirror Descent Lecture 6: FTRL and Online Mirror Descent Akshay Krishnamurthy akshay@cs.umass.edu November, 07 Recap Last time we saw two online learning algorithms. First we saw the Weighted Majority algorithm, which

More information

Duality (Continued) min f ( x), X R R. Recall, the general primal problem is. The Lagrangian is a function. defined by

Duality (Continued) min f ( x), X R R. Recall, the general primal problem is. The Lagrangian is a function. defined by Duality (Continued) Recall, the general primal problem is min f ( x), xx g( x) 0 n m where X R, f : X R, g : XR ( X). he Lagrangian is a function L: XR R m defined by L( xλ, ) f ( x) λ g( x) Duality (Continued)

More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information

Nesterov s Optimal Gradient Methods

Nesterov s Optimal Gradient Methods Yurii Nesterov http://www.core.ucl.ac.be/~nesterov Nesterov s Optimal Gradient Methods Xinhua Zhang Australian National University NICTA 1 Outline The problem from machine learning perspective Preliminaries

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machine Learning Lecturer: Philippe Rigollet Lecture 3 Scribe: Mina Karzand Oct., 05 Previously, we analyzed the convergence of the projected gradient descent algorithm. We proved

More information

Lecture 19: Follow The Regulerized Leader

Lecture 19: Follow The Regulerized Leader COS-511: Learning heory Spring 2017 Lecturer: Roi Livni Lecture 19: Follow he Regulerized Leader Disclaimer: hese notes have not been subjected to the usual scrutiny reserved for formal publications. hey

More information

Coordinate descent methods

Coordinate descent methods Coordinate descent methods Master Mathematics for data science and big data Olivier Fercoq November 3, 05 Contents Exact coordinate descent Coordinate gradient descent 3 3 Proximal coordinate descent 5

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES

NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES JONATHAN LUK These notes discuss theorems on the existence, uniqueness and extension of solutions for ODEs. None of these results are original. The proofs

More information

Optimization methods

Optimization methods Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

More information

An E cient A ne-scaling Algorithm for Hyperbolic Programming

An E cient A ne-scaling Algorithm for Hyperbolic Programming An E cient A ne-scaling Algorithm for Hyperbolic Programming Jim Renegar joint work with Mutiara Sondjaja 1 Euclidean space A homogeneous polynomial p : E!R is hyperbolic if there is a vector e 2E such

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

Convergent Iterative Algorithms in the 2-inner Product Space R n

Convergent Iterative Algorithms in the 2-inner Product Space R n Int. J. Open Problems Compt. Math., Vol. 6, No. 4, December 2013 ISSN 1998-6262; Copyright c ICSRS Publication, 2013 www.i-csrs.org Convergent Iterative Algorithms in the 2-inner Product Space R n Iqbal

More information

Lecture 25: Subgradient Method and Bundle Methods April 24

Lecture 25: Subgradient Method and Bundle Methods April 24 IE 51: Convex Optimization Spring 017, UIUC Lecture 5: Subgradient Method and Bundle Methods April 4 Instructor: Niao He Scribe: Shuanglong Wang Courtesy warning: hese notes do not necessarily cover everything

More information

arxiv: v3 [math.oc] 8 Jan 2019

arxiv: v3 [math.oc] 8 Jan 2019 Why Random Reshuffling Beats Stochastic Gradient Descent Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo Parrilo arxiv:1510.08560v3 [math.oc] 8 Jan 2019 January 9, 2019 Abstract We analyze the convergence rate

More information

Notes on Some Methods for Solving Linear Systems

Notes on Some Methods for Solving Linear Systems Notes on Some Methods for Solving Linear Systems Dianne P. O Leary, 1983 and 1999 and 2007 September 25, 2007 When the matrix A is symmetric and positive definite, we have a whole new class of algorithms

More information

The Proximal Gradient Method

The Proximal Gradient Method Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,

More information

Convex Optimization Conjugate, Subdifferential, Proximation

Convex Optimization Conjugate, Subdifferential, Proximation 1 Lecture Notes, HCI, 3.11.211 Chapter 6 Convex Optimization Conjugate, Subdifferential, Proximation Bastian Goldlücke Computer Vision Group Technical University of Munich 2 Bastian Goldlücke Overview

More information

Package spcov. R topics documented: February 20, 2015

Package spcov. R topics documented: February 20, 2015 Package spcov February 20, 2015 Type Package Title Sparse Estimation of a Covariance Matrix Version 1.01 Date 2012-03-04 Author Jacob Bien and Rob Tibshirani Maintainer Jacob Bien Description

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

WHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ????

WHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ???? DUALITY WHY DUALITY? No constraints f(x) Non-differentiable f(x) Gradient descent Newton s method Quasi-newton Conjugate gradients etc???? Constrained problems? f(x) subject to g(x) apple 0???? h(x) =0

More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

A Greedy Framework for First-Order Optimization

A Greedy Framework for First-Order Optimization A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts

More information

Proximal and First-Order Methods for Convex Optimization

Proximal and First-Order Methods for Convex Optimization Proximal and First-Order Methods for Convex Optimization John C Duchi Yoram Singer January, 03 Abstract We describe the proximal method for minimization of convex functions We review classical results,

More information

Lecture 16: Introduction to Neural Networks

Lecture 16: Introduction to Neural Networks Lecture 16: Introduction to Neural Networs Instructor: Aditya Bhasara Scribe: Philippe David CS 5966/6966: Theory of Machine Learning March 20 th, 2017 Abstract In this lecture, we consider Bacpropagation,

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

Convex Optimization Algorithms for Machine Learning in 10 Slides

Convex Optimization Algorithms for Machine Learning in 10 Slides Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.18

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.18 XVIII - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No18 Linearized alternating direction method with Gaussian back substitution for separable convex optimization

More information

Math 273a: Optimization Subgradient Methods

Math 273a: Optimization Subgradient Methods Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R

More information

The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

More information

Math 413/513 Chapter 6 (from Friedberg, Insel, & Spence)

Math 413/513 Chapter 6 (from Friedberg, Insel, & Spence) Math 413/513 Chapter 6 (from Friedberg, Insel, & Spence) David Glickenstein December 7, 2015 1 Inner product spaces In this chapter, we will only consider the elds R and C. De nition 1 Let V be a vector

More information

Pavel Dvurechensky Alexander Gasnikov Alexander Tiurin. July 26, 2017

Pavel Dvurechensky Alexander Gasnikov Alexander Tiurin. July 26, 2017 Randomized Similar Triangles Method: A Unifying Framework for Accelerated Randomized Optimization Methods Coordinate Descent, Directional Search, Derivative-Free Method) Pavel Dvurechensky Alexander Gasnikov

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Kiddie Talk - The Diamond Lemma and its applications

Kiddie Talk - The Diamond Lemma and its applications Kiddie Tal - The Diamond Lemma and its applications April 22, 2013 1 Intro Start with an example: consider the following game, a solitaire. Tae a finite graph G with n vertices and a function e : V (G)

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization

This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization x = prox_f(x)+prox_{f^*}(x) use to get prox of norms! PROXIMAL METHODS WHY PROXIMAL METHODS Smooth

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Proof Pearl: Bounding Least Common Multiples with Triangles

Proof Pearl: Bounding Least Common Multiples with Triangles Proof Pearl: Bounding Least Common Multiples with Triangles Hing-Lun Chan and Michael Norrish 2 joseph.chan@anu.edu.au Australian National University 2 Michael.Norrish@data6.csiro.au Canberra Research

More information

Contents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016

Contents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 LECTURERS: NAMAN AGARWAL AND BRIAN BULLINS SCRIBE: KIRAN VODRAHALLI Contents 1 Introduction 1 1.1 History of Optimization.....................................

More information

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big

More information

DISCUSSION PAPER 2011/70. Stochastic first order methods in smooth convex optimization. Olivier Devolder

DISCUSSION PAPER 2011/70. Stochastic first order methods in smooth convex optimization. Olivier Devolder 011/70 Stochastic first order methods in smooth convex optimization Olivier Devolder DISCUSSION PAPER Center for Operations Research and Econometrics Voie du Roman Pays, 34 B-1348 Louvain-la-Neuve Belgium

More information

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ

More information

Nesterov s Acceleration

Nesterov s Acceleration Nesterov s Acceleration Nesterov Accelerated Gradient min X f(x)+ (X) f -smooth. Set s 1 = 1 and = 1. Set y 0. Iterate by increasing t: g t 2 @f(y t ) s t+1 = 1+p 1+4s 2 t 2 y t = x t + s t 1 s t+1 (x

More information

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:

1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by: Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

1 Sparsity and l 1 relaxation

1 Sparsity and l 1 relaxation 6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the

More information

Duality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Duality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

Lecture 24 November 27

Lecture 24 November 27 EE 381V: Large Scale Optimization Fall 01 Lecture 4 November 7 Lecturer: Caramanis & Sanghavi Scribe: Jahshan Bhatti and Ken Pesyna 4.1 Mirror Descent Earlier, we motivated mirror descent as a way to improve

More information

Absolute Value Programming

Absolute Value Programming O. L. Mangasarian Absolute Value Programming Abstract. We investigate equations, inequalities and mathematical programs involving absolute values of variables such as the equation Ax + B x = b, where A

More information

Maximization of Submodular Set Functions

Maximization of Submodular Set Functions Northeastern University Department of Electrical and Computer Engineering Maximization of Submodular Set Functions Biomedical Signal Processing, Imaging, Reasoning, and Learning BSPIRAL) Group Author:

More information

Machine Learning Brett Bernstein. Recitation 1: Gradients and Directional Derivatives

Machine Learning Brett Bernstein. Recitation 1: Gradients and Directional Derivatives Machine Learning Brett Bernstein Recitation 1: Gradients and Directional Derivatives Intro Question 1 We are given the data set (x 1, y 1 ),, (x n, y n ) where x i R d and y i R We want to fit a linear

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017 Support Vector Machines: Training with Stochastic Gradient Descent Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem

More information

Lecture 3: Huge-scale optimization problems

Lecture 3: Huge-scale optimization problems Liege University: Francqui Chair 2011-2012 Lecture 3: Huge-scale optimization problems Yurii Nesterov, CORE/INMA (UCL) March 9, 2012 Yu. Nesterov () Huge-scale optimization problems 1/32March 9, 2012 1

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon Measures

Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon Measures 2 1 Borel Regular Measures We now state and prove an important regularity property of Borel regular outer measures: Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon

More information

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

More information

(1) is an invertible sheaf on X, which is generated by the global sections

(1) is an invertible sheaf on X, which is generated by the global sections 7. Linear systems First a word about the base scheme. We would lie to wor in enough generality to cover the general case. On the other hand, it taes some wor to state properly the general results if one

More information

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization 5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

A class of Smoothing Method for Linear Second-Order Cone Programming

A class of Smoothing Method for Linear Second-Order Cone Programming Columbia International Publishing Journal of Advanced Computing (13) 1: 9-4 doi:1776/jac1313 Research Article A class of Smoothing Method for Linear Second-Order Cone Programming Zhuqing Gui *, Zhibin

More information

Supplemental Material for Monte Carlo Sampling for Regret Minimization in Extensive Games

Supplemental Material for Monte Carlo Sampling for Regret Minimization in Extensive Games Supplemental Material for Monte Carlo Sampling for Regret Minimization in Extensive Games Marc Lanctot Department of Computing Science University of Alberta Edmonton, Alberta, Canada 6G E8 lanctot@ualberta.ca

More information

Proximal splitting methods on convex problems with a quadratic term: Relax!

Proximal splitting methods on convex problems with a quadratic term: Relax! Proximal splitting methods on convex problems with a quadratic term: Relax! The slides I presented with added comments Laurent Condat GIPSA-lab, Univ. Grenoble Alpes, France Workshop BASP Frontiers, Jan.

More information

Lecture: Duality of LP, SOCP and SDP

Lecture: Duality of LP, SOCP and SDP 1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:

More information

WAITING FOR A BAT TO FLY BY (IN POLYNOMIAL TIME)

WAITING FOR A BAT TO FLY BY (IN POLYNOMIAL TIME) WAITING FOR A BAT TO FLY BY (IN POLYNOMIAL TIME ITAI BENJAMINI, GADY KOZMA, LÁSZLÓ LOVÁSZ, DAN ROMIK, AND GÁBOR TARDOS Abstract. We observe returns of a simple random wal on a finite graph to a fixed node,

More information

USA Mathematical Talent Search Round 2 Solutions Year 27 Academic Year

USA Mathematical Talent Search Round 2 Solutions Year 27 Academic Year 1/2/27. In the grid to the right, the shortest path through unit squares between the pair of 2 s has length 2. Fill in some of the unit squares in the grid so that (i) exactly half of the squares in each

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

Matrix Secant Methods

Matrix Secant Methods Equation Solving g(x) = 0 Newton-Lie Iterations: x +1 := x J g(x ), where J g (x ). Newton-Lie Iterations: x +1 := x J g(x ), where J g (x ). 3700 years ago the Babylonians used the secant method in 1D:

More information

For those who want to skip this chapter and carry on, that s fine, all you really need to know is that for the scalar expression: 2 H

For those who want to skip this chapter and carry on, that s fine, all you really need to know is that for the scalar expression: 2 H 1 Matrices are rectangular arrays of numbers. hey are usually written in terms of a capital bold letter, for example A. In previous chapters we ve looed at matrix algebra and matrix arithmetic. Where things

More information

Continuity. Chapter 4

Continuity. Chapter 4 Chapter 4 Continuity Throughout this chapter D is a nonempty subset of the real numbers. We recall the definition of a function. Definition 4.1. A function from D into R, denoted f : D R, is a subset of

More information

Exponentiated Gradient Descent

Exponentiated Gradient Descent CSE599s, Spring 01, Online Learning Lecture 10-04/6/01 Lecturer: Ofer Dekel Exponentiated Gradient Descent Scribe: Albert Yu 1 Introduction In this lecture we review norms, dual norms, strong convexity,

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.85J / 8.5J Advanced Algorithms Fall 008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 8.5/6.85 Advanced Algorithms

More information

A Proof of the Converse for the Capacity of Gaussian MIMO Broadcast Channels

A Proof of the Converse for the Capacity of Gaussian MIMO Broadcast Channels A Proof of the Converse for the Capacity of Gaussian MIMO Broadcast Channels Mehdi Mohseni Department of Electrical Engineering Stanford University Stanford, CA 94305, USA Email: mmohseni@stanford.edu

More information

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan

More information

15-780: LinearProgramming

15-780: LinearProgramming 15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Optimization, Learning, and Games with Predictable Sequences

Optimization, Learning, and Games with Predictable Sequences Optimization, Learning, and Games with Predictable Sequences Alexander Rakhlin University of Pennsylvania Karthik Sridharan University of Pennsylvania Abstract We provide several applications of Optimistic

More information

Orthogonal Projection and Least Squares Prof. Philip Pennance 1 -Version: December 12, 2016

Orthogonal Projection and Least Squares Prof. Philip Pennance 1 -Version: December 12, 2016 Orthogonal Projection and Least Squares Prof. Philip Pennance 1 -Version: December 12, 2016 1. Let V be a vector space. A linear transformation P : V V is called a projection if it is idempotent. That

More information

Machine Learning. Support Vector Machines. Fabio Vandin November 20, 2017

Machine Learning. Support Vector Machines. Fabio Vandin November 20, 2017 Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 12: Weak Learnability and the l 1 margin Converse to Scale-Sensitive Learning Stability Convex-Lipschitz-Bounded Problems

More information

Fast proximal gradient methods

Fast proximal gradient methods L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient

More information

Optimal Newton-type methods for nonconvex smooth optimization problems

Optimal Newton-type methods for nonconvex smooth optimization problems Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations

More information

Size-Depth Tradeoffs for Boolean Formulae

Size-Depth Tradeoffs for Boolean Formulae Size-Depth Tradeoffs for Boolean Formulae Maria Luisa Bonet Department of Mathematics Univ. of Pennsylvania, Philadelphia Samuel R. Buss Department of Mathematics Univ. of California, San Diego July 3,

More information

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

More information

Noisy Streaming PCA. Noting g t = x t x t, rearranging and dividing both sides by 2η we get

Noisy Streaming PCA. Noting g t = x t x t, rearranging and dividing both sides by 2η we get Supplementary Material A. Auxillary Lemmas Lemma A. Lemma. Shalev-Shwartz & Ben-David,. Any update of the form P t+ = Π C P t ηg t, 3 for an arbitrary sequence of matrices g, g,..., g, projection Π C onto

More information

Radial Subgradient Descent

Radial Subgradient Descent Radial Subgradient Descent Benja Grimmer Abstract We present a subgradient method for imizing non-smooth, non-lipschitz convex optimization problems. The only structure assumed is that a strictly feasible

More information

GRADIENT = STEEPEST DESCENT

GRADIENT = STEEPEST DESCENT GRADIENT METHODS GRADIENT = STEEPEST DESCENT Convex Function Iso-contours gradient 0.5 0.4 4 2 0 8 0.3 0.2 0. 0 0. negative gradient 6 0.2 4 0.3 2.5 0.5 0 0.5 0.5 0 0.5 0.4 0.5.5 0.5 0 0.5 GRADIENT DESCENT

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information