Accelerated gradient methods
1 ELE 538B: Large-Scale Optimization for Data Science — Accelerated gradient methods. Yuxin Chen, Princeton University, Spring 2018
2 Outline
- Heavy-ball methods
- Nesterov's accelerated gradient methods
- Accelerated proximal gradient methods (FISTA)
- Convergence analysis
- Lower bounds
3 (Proximal) gradient methods
Iteration complexities of (proximal) gradient methods:
- strongly convex and smooth problems: O(κ log(1/ε))
- convex and smooth problems: O(1/ε)
Can one still hope to further accelerate convergence?
Accelerated GD 7-3
4 Issues and possible solutions
Issues:
- GD focuses on improving the cost per iteration, which might be too short-sighted
- GD might sometimes zigzag / experience abrupt changes
Solutions:
- exploit information from history / past iterates
- add a buffer (like momentum) to yield a smoother trajectory
5 Heavy-ball methods (Polyak '64)
6 Heavy-ball method
minimize_{x∈R^n} f(x)
x^{t+1} = x^t − η_t ∇f(x^t) + θ_t (x^t − x^{t−1})
where θ_t (x^t − x^{t−1}) is the momentum term: add inertia to the ball (i.e. include a momentum term) to mitigate zigzagging (B. Polyak)
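As a concrete illustration (not from the slides), here is a minimal numerical sketch of the heavy-ball iteration on a simple diagonal quadratic; the function and the parameter values are illustrative choices, with the stepsize and momentum set as in Theorem 7.1 below.

```python
import numpy as np

def heavy_ball(grad, x0, eta, theta, num_iters=500):
    """Heavy-ball iteration: x^{t+1} = x^t - eta*grad(x^t) + theta*(x^t - x^{t-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(num_iters):
        x, x_prev = x - eta * grad(x) + theta * (x - x_prev), x
    return x

# illustrative quadratic f(x) = 0.5 * x^T diag(mu, L) x, minimized at 0
mu, L = 1.0, 100.0
d = np.array([mu, L])
kappa = L / mu
eta = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2                 # stepsize of Theorem 7.1
theta = ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2  # momentum coefficient
x_hb = heavy_ball(lambda x: d * x, np.array([1.0, 1.0]), eta, theta)
```

With these parameters the iterates contract at rate roughly (√κ−1)/(√κ+1) per step, so after a few hundred iterations the iterate is essentially at the minimizer.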
7 Heavy-ball method
(figure: iterates converging to x* under gradient descent vs. the heavy-ball method; the heavy-ball trajectory is visibly smoother)
8 State-space models
One can understand heavy-ball methods through the dynamics of the following dynamical system:
[x^{t+1}; x^t] = [(1+θ_t)I, −θ_t I; I, 0] [x^t; x^{t−1}] − [η_t ∇f(x^t); 0]
or equivalently,
[x^{t+1}−x*; x^t−x*] = [(1+θ_t)I, −θ_t I; I, 0] [x^t−x*; x^{t−1}−x*] − [η_t ∇f(x^t); 0]
  = [(1+θ_t)I − η_t ∫₀¹ ∇²f(x_τ) dτ, −θ_t I; I, 0] [x^t−x*; x^{t−1}−x*]
with x_τ := x^t + τ(x* − x^t); here [x^t−x*; x^{t−1}−x*] is the state, the block matrix is the system matrix, and the last line comes from the fundamental theorem of calculus
9 System matrix
[x^{t+1}−x*; x^t−x*] = H_t [x^t−x*; x^{t−1}−x*],  where H_t := [(1+θ_t)I − η_t ∫₀¹ ∇²f(x_τ) dτ, −θ_t I; I, 0]   (7.1)
- implication: convergence of heavy-ball methods depends on the spectrum of the system matrix H_t
- key idea: find an appropriate stepsize η_t and momentum coefficient θ_t to control the spectrum of H_t
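For a quadratic f(x) = ½ xᵀAx the integral in (7.1) is just A and H_t is constant, so the spectrum claim can be checked numerically. A small sketch with illustrative values (not from the slides):

```python
import numpy as np

mu, L = 1.0, 100.0
A = np.diag([mu, L])                  # Hessian of f(x) = 0.5 * x^T A x
n = A.shape[0]
kappa = L / mu
eta = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
theta = ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2

I, Z = np.eye(n), np.zeros((n, n))
H = np.block([[(1 + theta) * I - eta * A, -theta * I],
              [I, Z]])                # system matrix of (7.1) for a quadratic
rho = max(abs(np.linalg.eigvals(H)))  # spectral radius
target = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
```

With the stepsize and momentum of Theorem 7.1, the spectral radius matches (√κ−1)/(√κ+1) up to numerical error.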
10 Convergence for strongly convex problems
Theorem 7.1 (Convergence of heavy-ball methods for strongly convex functions)
Suppose f is µ-strongly convex and L-smooth. Set κ = L/µ, η_t ≡ 4/(√L+√µ)², and θ_t ≡ max{|1−√(η_t L)|, |1−√(η_t µ)|}². Then
‖[x^{t+1}−x*; x^t−x*]‖ ≲ ((√κ−1)/(√κ+1))^t ‖[x^1−x*; x^0−x*]‖
- iteration complexity: O(√κ log(1/ε))
- significant improvement over GD: O(√κ log(1/ε)) vs. O(κ log(1/ε))
- relies on knowledge of both L and µ
11 Proof of Theorem 7.1
In view of (7.1), it suffices to control the spectrum of H_t. Let λ_i denote the i-th eigenvalue of ∫₀¹ ∇²f(x_τ) dτ and set Λ := diag(λ_1, …, λ_n); then
‖H_t‖ = ‖[(1+θ_t)I − η_t Λ, −θ_t I; I, 0]‖ ≤ max_{1≤i≤n} ‖[1+θ_t−η_t λ_i, −θ_t; 1, 0]‖
To finish the proof, it suffices to show
max_i ‖[1+θ_t−η_t λ_i, −θ_t; 1, 0]‖ ≤ (√κ−1)/(√κ+1)   (7.2)
12 Proof of Theorem 7.1 (cont.)
To show (7.2), note that the two eigenvalues of [1+θ_t−η_t λ_i, −θ_t; 1, 0] are the roots of
z² − (1+θ_t−η_t λ_i) z + θ_t = 0
If (1+θ_t−η_t λ_i)² ≤ 4θ_t, then the roots of this equation have the same magnitude √θ_t (as they are either a complex conjugate pair or a repeated real root). In addition, one can easily check that (1+θ_t−η_t λ_i)² ≤ 4θ_t is satisfied if
θ_t ∈ [(1−√(η_t λ_i))², (1+√(η_t λ_i))²]   (7.3)
which would hold if one picks θ_t = max{(1−√(η_t L))², (1−√(η_t µ))²}
13 Proof of Theorem 7.1 (cont.)
With this choice of θ_t, the spectral radius of H_t is at most √θ_t. Finally, setting η_t = 4/(√L+√µ)² ensures 1−√(η_t L) = −(1−√(η_t µ)), which yields
θ_t = max{(1 − 2√L/(√L+√µ))², (1 − 2√µ/(√L+√µ))²} = ((√κ−1)/(√κ+1))²
This in turn establishes
ρ(H_t) ≤ √θ_t = (√κ−1)/(√κ+1)
14 Nesterov's accelerated gradient methods
15 Convex case
minimize_{x∈R^n} f(x)
For strongly convex f, including momentum terms allows one to improve the iteration complexity from O(κ log(1/ε)) to O(√κ log(1/ε)).
Can we obtain improvement for the convex case (without strong convexity) as well?
16 Nesterov's idea (Nesterov '83)
x^{t+1} = y^t − η_t ∇f(y^t)
y^{t+1} = x^{t+1} + (t/(t+3)) (x^{t+1} − x^t)
- alternates between gradient updates and proper extrapolation (Y. Nesterov)
- each iteration takes nearly the same cost as GD
- not a descent method (i.e. we may not have f(x^{t+1}) ≤ f(x^t))
- one of the most beautiful and mysterious results in optimization …
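The two-line update above can be sketched directly; the least-squares test problem below is an illustrative choice, not from the slides.

```python
import numpy as np

def nesterov_agd(grad, x0, eta, num_iters=1000):
    """Nesterov '83: x^{t+1} = y^t - eta*grad(y^t);
    y^{t+1} = x^{t+1} + t/(t+3) * (x^{t+1} - x^t)."""
    x, y = x0.copy(), x0.copy()
    for t in range(num_iters):
        x_next = y - eta * grad(y)
        y = x_next + (t / (t + 3)) * (x_next - x)
        x = x_next
    return x

# illustrative smooth problem: f(x) = 0.5 * ||Ax - b||^2
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
b = rng.standard_normal(40)
L = np.linalg.norm(A, 2) ** 2          # smoothness constant of f
x_hat = nesterov_agd(lambda x: A.T @ (A @ x - b), np.zeros(10), 1.0 / L)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
```

Note the extrapolation step makes the method non-monotone; the iterates can overshoot before settling near the least-squares solution.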
17 Convergence of Nesterov's accelerated gradient method
Suppose f is convex and L-smooth. If η_t ≡ η = 1/L, then
f(x^t) − f^opt ≤ 2L‖x⁰ − x*‖² / (t+1)²
- iteration complexity: O(1/√ε)
- much faster than gradient methods
- we'll provide the proof for the (more general) proximal version later
18 Interpretation using differential equations
Nesterov's momentum coefficient t/(t+3) = 1 − 3/t + o(1/t) is particularly mysterious
19 Interpretation using differential equations (cont.)
To develop insights into why Nesterov's method works so well, it is helpful to look at its continuous limit (η_t → 0), which is given by a second-order ordinary differential equation (ODE):
Ẍ(τ) + (3/τ) Ẋ(τ) + ∇f(X(τ)) = 0
where (3/τ)Ẋ(τ) plays the role of a time-varying damping force and ∇f(X(τ)) that of a potential (spring) force
(Su, Boyd, Candès '14)
20 Heuristic derivation of ODE
To begin with, Nesterov's update rule is equivalent to
(x^{t+1} − x^t)/√η = ((t−1)/(t+2)) (x^t − x^{t−1})/√η − √η ∇f(y^t)   (7.4)
Let τ = t√η. Set X(τ) ≈ x^{τ/√η} = x^t and X(τ+√η) ≈ x^{t+1}. Then Taylor expansion gives
(x^{t+1} − x^t)/√η ≈ Ẋ(τ) + ½ Ẍ(τ) √η,   (x^t − x^{t−1})/√η ≈ Ẋ(τ) − ½ Ẍ(τ) √η
which combined with (7.4) yields
Ẋ(τ) + ½ Ẍ(τ) √η ≈ (1 − 3√η/τ)(Ẋ(τ) − ½ Ẍ(τ) √η) − √η ∇f(X(τ))
⟹ Ẍ(τ) + (3/τ) Ẋ(τ) + ∇f(X(τ)) ≈ 0
21 Convergence rate of ODE
Ẍ + (3/τ) Ẋ + ∇f(X) = 0   (7.5)
It is not hard to show that this ODE obeys
f(X(τ)) − f^opt ≤ O(1/τ²)   (7.6)
which somehow explains Nesterov's O(1/t²) convergence
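One can check (7.6) numerically by integrating the ODE with a simple semi-implicit Euler scheme; the quadratic, the initial condition, and the integration horizon below are illustrative choices, not from the slides.

```python
import numpy as np

# f(x) = 0.5 * x^T diag(1, 10) x, so f_opt = 0 and X* = 0
d = np.array([1.0, 10.0])
f = lambda x: 0.5 * np.sum(d * x * x)
grad = lambda x: d * x

dt = 1e-3
tau = dt
X = np.array([1.0, 1.0])   # X(0)
V = np.zeros(2)            # X'(0) = 0
while tau < 20.0:
    acc = -(3.0 / tau) * V - grad(X)   # X'' from the ODE (7.5)
    V = V + dt * acc
    X = X + dt * V                     # semi-implicit Euler step
    tau += dt
# theory: f(X(tau)) - f_opt <= E(0)/tau^2 = 2*||X(0)||^2 / tau^2 = 0.01 at tau = 20
```

The final objective value should sit well below its starting value and near the O(1/τ²) envelope.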
22 Proof of (7.6)
Define the Lyapunov (energy) function
E(τ) = τ² (f(X) − f^opt) + 2 ‖X + (τ/2)Ẋ − X*‖²
This obeys
Ė = 2τ (f(X) − f^opt) + τ² ⟨∇f(X), Ẋ⟩ + 4 ⟨X + (τ/2)Ẋ − X*, (3/2)Ẋ + (τ/2)Ẍ⟩
 (i)= 2τ (f(X) − f^opt) − 2τ ⟨X − X*, ∇f(X)⟩
 ≤ 0   (by convexity)
where (i) follows by replacing (τ/2)Ẍ + (3/2)Ẋ with −(τ/2)∇f(X) (rearranging the ODE). This means E is non-increasing in τ, and hence (by the definition of E)
f(X(τ)) − f^opt ≤ E(τ)/τ² ≤ E(0)/τ² = O(1/τ²)
23 Magic number 3
Ẍ + (3/τ) Ẋ + ∇f(X) = 0
- 3 is the smallest constant that guarantees O(1/τ²) convergence; it can be replaced by any other α ≥ 3
- in some sense, 3 minimizes the pre-constant in the convergence bound O(1/τ²) (see Su, Boyd, Candès '14)
24 Numerical example
Example taken from UCLA EE236C:
minimize_x log Σ_{i=1}^m exp(a_i^T x + b_i)
with randomly generated problem data (m = 2000, n = 1000) and a fixed step size for the gradient method and FISTA
(figure: relative error (f(x^(k)) − f*)/f* vs. iteration k; FISTA converges markedly faster than the gradient method)
25 Extension to composite models
minimize_{x∈R^n} F(x) := f(x) + h(x)
- f: convex and smooth
- h: convex (may not be differentiable)
- let F^opt := min_x F(x) be the optimal cost
26 FISTA (Beck & Teboulle '09)
Fast iterative shrinkage-thresholding algorithm:
x^{t+1} = prox_{η_t h}(y^t − η_t ∇f(y^t))
y^{t+1} = x^{t+1} + ((θ_t − 1)/θ_{t+1}) (x^{t+1} − x^t)
where y⁰ = x⁰, θ₀ = 1, and θ_{t+1} = (1 + √(1 + 4θ_t²))/2
- momentum coefficient originally proposed by Nesterov '83
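As an illustration, here is a compact FISTA sketch for the lasso, where prox_{ηh} with h = λ‖·‖₁ is soft-thresholding; the synthetic problem data are illustrative choices, not from the slides.

```python
import numpy as np

def soft_threshold(v, lam):
    """prox of lam * ||.||_1"""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def fista_lasso(A, b, lam, num_iters=500):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1 (Beck & Teboulle '09)."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    y, theta = x.copy(), 1.0
    for _ in range(num_iters):
        x_next = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)
        theta_next = (1 + np.sqrt(1 + 4 * theta ** 2)) / 2
        y = x_next + ((theta - 1) / theta_next) * (x_next - x)
        x, theta = x_next, theta_next
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
lam = 0.1
x_hat = fista_lasso(A, b, lam)
obj = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))
```

The prox here costs O(n), which is exactly the "fast if prox can be efficiently implemented" point made later in the convergence discussion.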
27 Momentum coefficient
θ_{t+1} = (1 + √(1 + 4θ_t²))/2 with θ₀ = 1
coefficient: (θ_t − 1)/θ_{t+1} = 1 − 3/t + o(1/t) (homework); asymptotically equivalent to t/(t+3)
28 Momentum coefficient (cont.)
θ_{t+1} = (1 + √(1 + 4θ_t²))/2 with θ₀ = 1
Fact 7.2: For all t ≥ 1, one has θ_t ≥ (t+2)/2 (homework)
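Both Fact 7.2 and the asymptotic equivalence with t/(t+3) are easy to check numerically for the first couple of thousand iterations; a quick sketch:

```python
import math

def theta_sequence(T):
    """theta_0 = 1; theta_{t+1} = (1 + sqrt(1 + 4*theta_t^2)) / 2"""
    thetas = [1.0]
    for _ in range(T):
        th = thetas[-1]
        thetas.append((1 + math.sqrt(1 + 4 * th * th)) / 2)
    return thetas

thetas = theta_sequence(2000)
```

Since √(1+4θ²) ≥ 2θ, each step grows θ by at least 1/2, which is the substance of Fact 7.2.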
29 Convergence analysis
30 Convergence for convex problems
Theorem 7.3 (Convergence of accelerated proximal gradient methods for convex problems)
Suppose f is convex and L-smooth. If η_t ≡ 1/L, then
F(x^t) − F^opt ≤ 2L‖x⁰ − x*‖² / (t+1)²
- improved iteration complexity (i.e. O(1/√ε)) compared with the proximal gradient method (i.e. O(1/ε))
- fast if prox can be efficiently implemented
31 Recap: fundamental inequality for proximal method
Recall the following fundamental inequality shown in the last lecture:
Lemma 7.4: Let y⁺ = prox_{(1/L)h}(y − (1/L)∇f(y)). Then
F(y⁺) − F(x) ≤ (L/2)‖x − y‖² − (L/2)‖x − y⁺‖²
32 Proof of Theorem 7.3
- build a discrete-time version of the Lyapunov function
- magic happens: the Lyapunov function is non-increasing when Nesterov's momentum coefficient is adopted
33 Proof of Theorem 7.3 (cont.)
Key lemma: monotonicity of a certain Lyapunov function
Lemma 7.5: Let u^t := θ_{t−1}x^t − (x* + (θ_{t−1}−1)x^{t−1}) = θ_{t−1}(x^t − x*) − (θ_{t−1}−1)(x^{t−1} − x*). Then
‖u^{t+1}‖² + (2/L) θ_t² (F(x^{t+1}) − F^opt) ≤ ‖u^t‖² + (2/L) θ_{t−1}² (F(x^t) − F^opt)
quite similar to the Lyapunov function 2‖X + (τ/2)Ẋ − X*‖² + τ²(f(X) − f^opt) discussed before (think of θ_t ≈ t/2)
34 Proof of Theorem 7.3 (cont.)
With Lemma 7.5 in place, one has
(2/L) θ_{t−1}² (F(x^t) − F^opt) ≤ ‖u¹‖² + (2/L) θ₀² (F(x¹) − F^opt) = ‖x¹ − x*‖² + (2/L)(F(x¹) − F^opt)
To bound the RHS of this inequality, we use Lemma 7.4 and y⁰ = x⁰ to get
(2/L)(F(x¹) − F^opt) ≤ ‖y⁰ − x*‖² − ‖x¹ − x*‖² = ‖x⁰ − x*‖² − ‖x¹ − x*‖²
⟹ ‖x¹ − x*‖² + (2/L)(F(x¹) − F^opt) ≤ ‖x⁰ − x*‖²
As a result,
(2/L) θ_{t−1}² (F(x^t) − F^opt) ≤ ‖x⁰ − x*‖²
⟹ F(x^t) − F^opt ≤ L‖x⁰ − x*‖² / (2θ_{t−1}²) ≤ 2L‖x⁰ − x*‖² / (t+1)²   (by Fact 7.2)
35 Proof of Lemma 7.5
Take x = (1/θ_t)x* + (1 − 1/θ_t)x^t and y = y^t in Lemma 7.4 to get
F(x^{t+1}) − F(θ_t⁻¹x* + (1 − θ_t⁻¹)x^t)   (7.7)
 ≤ (L/2)‖θ_t⁻¹x* + (1 − θ_t⁻¹)x^t − y^t‖² − (L/2)‖θ_t⁻¹x* + (1 − θ_t⁻¹)x^t − x^{t+1}‖²
 = (L/(2θ_t²))‖x* + (θ_t−1)x^t − θ_t y^t‖² − (L/(2θ_t²))‖x* + (θ_t−1)x^t − θ_t x^{t+1}‖²  (the latter bracket equals −u^{t+1})
 (i)= (L/(2θ_t²)) (‖u^t‖² − ‖u^{t+1}‖²)   (7.8)
where (i) follows from the definition of u^t and y^t = x^t + ((θ_{t−1}−1)/θ_t)(x^t − x^{t−1})
36 Proof of Lemma 7.5 (cont.)
We will also lower bound (7.7). By convexity of F,
F(θ_t⁻¹x* + (1 − θ_t⁻¹)x^t) ≤ θ_t⁻¹F(x*) + (1 − θ_t⁻¹)F(x^t) = θ_t⁻¹F^opt + (1 − θ_t⁻¹)F(x^t)
⟹ F(x^{t+1}) − F(θ_t⁻¹x* + (1 − θ_t⁻¹)x^t) ≥ (F(x^{t+1}) − F^opt) − (1 − θ_t⁻¹)(F(x^t) − F^opt)
Combining this with (7.8) and θ_t² − θ_t = θ_{t−1}² yields
(L/2)(‖u^t‖² − ‖u^{t+1}‖²) ≥ θ_t²(F(x^{t+1}) − F^opt) − (θ_t² − θ_t)(F(x^t) − F^opt)
 = θ_t²(F(x^{t+1}) − F^opt) − θ_{t−1}²(F(x^t) − F^opt)
thus finishing the proof
37 Convergence for strongly convex case
x^{t+1} = prox_{η_t h}(y^t − η_t ∇f(y^t))
y^{t+1} = x^{t+1} + ((√κ−1)/(√κ+1)) (x^{t+1} − x^t)
Theorem 7.6 (Convergence of accelerated proximal gradient methods for strongly convex case)
Suppose f is µ-strongly convex and L-smooth. If η_t ≡ 1/L, then
F(x^t) − F^opt ≤ (1 − 1/√κ)^t (F(x⁰) − F^opt + (µ/2)‖x⁰ − x*‖²)
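A minimal sketch of this strongly convex variant with its constant momentum (√κ−1)/(√κ+1); the smooth test problem below (h = 0, so the prox is the identity) is an illustrative choice, not from the slides.

```python
import numpy as np

def accel_prox_grad_sc(grad_f, prox_h, x0, L, mu, num_iters=300):
    """x^{t+1} = prox_{h/L}(y^t - grad_f(y^t)/L);
    y^{t+1} = x^{t+1} + (sqrt(k)-1)/(sqrt(k)+1) * (x^{t+1} - x^t)."""
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # constant momentum
    x, y = x0.copy(), x0.copy()
    for _ in range(num_iters):
        x_next = prox_h(y - grad_f(y) / L, 1.0 / L)
        y = x_next + beta * (x_next - x)
        x = x_next
    return x

mu_, L_ = 1.0, 50.0
d = np.array([mu_, L_])                 # f(x) = 0.5 * x^T diag(d) x, minimized at 0
x_hat = accel_prox_grad_sc(lambda x: d * x, lambda v, s: v, np.ones(2), L_, mu_)
```

By Theorem 7.6 the objective contracts by (1 − 1/√κ) per iteration, so 300 iterations on this κ = 50 problem drive the iterate to essentially zero.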
38 A practical issue
- fast convergence requires knowledge of κ = L/µ; in practice, estimating µ is typically very challenging
- a common observation: ripples / bumps in the traces of cost values
39 Rippling behavior
Numerical example: take y^{t+1} = x^{t+1} + ((1−√q̂)/(1+√q̂)) (x^{t+1} − x^t), where q := 1/κ and q̂ is an estimate of q
(Figure 1 of O'Donoghue, Candès '12: convergence of this scheme for various estimates q̂ of q, ranging from q̂ = 0 to q̂ = 1)
- the optimal momentum depends on the condition number: higher momentum is needed when the condition number is larger
- the period of the ripples is often proportional to √(L/µ)
- when q̂ > q: we underestimate the required momentum, which results in slower convergence
- when q̂ < q: we overestimate the required momentum, which yields overshooting / rippling behavior (high momentum causes the trajectory to overshoot the minimum and oscillate around it)
40 Adaptive restart (O'Donoghue, Candès '12)
When a certain criterion is met, restart FISTA with
x⁰ ← x^t,  y⁰ ← x^t,  θ₀ = 1
- take the current iterate as the new starting point
- erase all memory of previous iterates and reset the momentum back to zero
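A sketch of the function restart scheme wrapped around FISTA (smooth case shown, h = 0, so the prox step is plain gradient descent); the restart bookkeeping follows the recipe above, and the quadratic test problem is an illustrative choice.

```python
import numpy as np

def fista_function_restart(grad_f, F, x0, L, num_iters=1000):
    """FISTA with function-scheme adaptive restart:
    whenever F(x^{t+1}) > F(x^t), take the current iterate as the new
    starting point and reset the momentum (theta <- 1, y <- x)."""
    x, y, theta = x0.copy(), x0.copy(), 1.0
    for _ in range(num_iters):
        x_next = y - grad_f(y) / L            # prox-gradient step with h = 0
        theta_next = (1 + np.sqrt(1 + 4 * theta ** 2)) / 2
        if F(x_next) > F(x):                  # restart criterion (function scheme)
            y, theta = x_next.copy(), 1.0
        else:
            y = x_next + ((theta - 1) / theta_next) * (x_next - x)
            theta = theta_next
        x = x_next
    return x

d = np.array([1.0, 100.0])                    # ill-conditioned quadratic, kappa = 100
F = lambda x: 0.5 * np.sum(d * x * x)
x_hat = fista_function_restart(lambda x: d * x, F, np.ones(2), d.max())
```

The restart kills the rippling: each detected objective increase resets the momentum, so the method behaves like a freshly started FISTA on each segment.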
41 Numerical comparisons of adaptive restart schemes
(Figure 3 of O'Donoghue, Candès '12: comparison of gradient descent, no restart, various fixed restart intervals, restart with the optimal interval, restart with known q = µ/L, and the two adaptive schemes)
- function scheme: restart when f(x^t) > f(x^{t−1})
- gradient scheme: restart when ⟨∇f(y^{t−1}), x^t − x^{t−1}⟩ > 0 (momentum leads us in a bad direction)
42 Illustration
(figure: trajectories when minimizing a positive definite quadratic under q̂ = q, q̂ = 0, and adaptive restart)
- with overestimated momentum (e.g. q̂ = 0), one sees a spiralling trajectory
- adaptive restart helps mitigate this issue
43 Lower bounds
44 Optimality of Nesterov's method
Interestingly, no first-order method can improve upon Nesterov's result in general.
More precisely, there exists a convex and L-smooth function f s.t.
f(x^t) − f^opt ≥ 3L‖x⁰ − x*‖² / (32(t+1)²)
as long as x^k ∈ x⁰ + span{∇f(x⁰), …, ∇f(x^{k−1})} for all 1 ≤ k ≤ t (definition of first-order methods)
(Nemirovski, Yudin '83)
45 Example
minimize_{x∈R^{2n+1}}  f(x) = (L/4)(½ x^T A x − e₁^T x)
where A ∈ R^{(2n+1)×(2n+1)} is the tridiagonal matrix with 2 on the diagonal and −1 on the first off-diagonals
- f is convex and L-smooth, since 0 ⪯ ∇²f = (L/4)A ⪯ L I
- the optimizer x* is given by x*_i = 1 − i/(2n+2) (1 ≤ i ≤ 2n+1), obeying ‖x*‖² ≤ (2n+2)/3
46 Example (cont.)
f(x) = (L/4)(½ x^T A x − e₁^T x),  ∇f(x) = (L/4)(A x − e₁)
Since A is tridiagonal, if x⁰ = 0 then
span{∇f(x⁰), …, ∇f(x^{k−1})} = span{e₁, …, e_k} =: K_k
i.e. every iteration of a first-order method expands the search space by at most one dimension
47 Example (cont.)
If we start with x⁰ = 0, then
f(x^n) ≥ inf_{x∈K_n} f(x) = (L/8)(−1 + 1/(n+1))
Since f^opt = (L/8)(−1 + 1/(2n+2)) and ‖x⁰ − x*‖² = ‖x*‖² ≤ (2n+2)/3,
(f(x^n) − f^opt)/‖x⁰ − x*‖² ≥ (L/8)(1/(n+1) − 1/(2n+2)) / ((2n+2)/3) = 3L / (32(n+1)²)
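The key structural facts of this construction (x* is stationary, its norm bound, and the one-dimension-per-iteration growth of the gradient span) can be checked numerically; a sketch for n = 10:

```python
import numpy as np

def worst_case(n, L=1.0):
    """Nesterov's worst-case function f(x) = (L/4)*(0.5 x^T A x - e1^T x) on R^(2n+1)."""
    dim = 2 * n + 1
    A = 2 * np.eye(dim) - np.eye(dim, k=1) - np.eye(dim, k=-1)  # tridiagonal
    e1 = np.zeros(dim)
    e1[0] = 1.0
    grad = lambda x: (L / 4) * (A @ x - e1)
    x_star = 1.0 - np.arange(1, dim + 1) / (dim + 1)            # x*_i = 1 - i/(2n+2)
    return grad, x_star, dim

grad, x_star, dim = worst_case(10)

# a gradient evaluated at a point supported on the first k coordinates is
# supported on at most the first k+1 coordinates (tridiagonal structure)
x = np.zeros(dim)
x[:3] = 1.0
g = grad(x)
```

This confirms that starting from x⁰ = 0, each gradient call can expose at most one new coordinate direction.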
48 Summary: accelerated proximal gradient

                           stepsize rule   convergence rate    iteration complexity
convex & smooth            η_t = 1/L       O(1/t²)             O(1/√ε)
strongly convex & smooth   η_t = 1/L       O((1 − 1/√κ)^t)     O(√κ log(1/ε))
49 Reference
[1] "Some methods of speeding up the convergence of iteration methods," B. Polyak, USSR Computational Mathematics and Mathematical Physics, 1964.
[2] "A method of solving a convex programming problem with convergence rate O(1/k²)," Y. Nesterov, Soviet Mathematics Doklady, 1983.
[3] "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," A. Beck and M. Teboulle, SIAM Journal on Imaging Sciences, 2009.
[4] "First-order methods in optimization," A. Beck, SIAM, 2017.
[5] "Mathematical optimization," MATH 301 lecture notes, E. Candès, Stanford.
[6] "Gradient methods for minimizing composite functions," Y. Nesterov, Technical Report, 2007.
50 Reference (cont.)
[7] "Large-scale numerical optimization," MS&E 318 lecture notes, M. Saunders, Stanford.
[8] "Proximal algorithms," N. Parikh and S. Boyd, Foundations and Trends in Optimization, 2013.
[9] "Convex optimization: algorithms and complexity," S. Bubeck, Foundations and Trends in Machine Learning, 2015.
[10] "Optimization methods for large-scale systems," EE236C lecture notes, L. Vandenberghe, UCLA.
[11] "Problem complexity and method efficiency in optimization," A. Nemirovski and D. Yudin, Wiley, 1983.
[12] "Introductory lectures on convex optimization: a basic course," Y. Nesterov, 2004.
[13] "A differential equation for modeling Nesterov's accelerated gradient method," W. Su, S. Boyd, and E. Candès, NIPS, 2014.
51 Reference (cont.)
[14] "On accelerated proximal gradient methods for convex-concave optimization," P. Tseng, 2008.
[15] "Adaptive restart for accelerated gradient schemes," B. O'Donoghue and E. Candès, Foundations of Computational Mathematics, 2015.
L. Vandenberghe EE236C (Spring 2016) 5. Subgradient method subgradient method convergence analysis optimal step size when f is known alternating projections optimality 5-1 Subgradient method to minimize
More informationProximal and First-Order Methods for Convex Optimization
Proximal and First-Order Methods for Convex Optimization John C Duchi Yoram Singer January, 03 Abstract We describe the proximal method for minimization of convex functions We review classical results,
More informationNon-negative Matrix Factorization via accelerated Projected Gradient Descent
Non-negative Matrix Factorization via accelerated Projected Gradient Descent Andersen Ang Mathématique et recherche opérationnelle UMONS, Belgium Email: manshun.ang@umons.ac.be Homepage: angms.science
More informationOn Nesterov s Random Coordinate Descent Algorithms - Continued
On Nesterov s Random Coordinate Descent Algorithms - Continued Zheng Xu University of Texas At Arlington February 20, 2015 1 Revisit Random Coordinate Descent The Random Coordinate Descent Upper and Lower
More informationCS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi. Gradient Descent
CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi Gradient Descent This lecture introduces Gradient Descent a meta-algorithm for unconstrained minimization. Under convexity of the
More informationLecture 5: September 15
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS
More informationLecture 9: September 28
0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationConvex Optimization Lecture 16
Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean
More informationDescent methods. min x. f(x)
Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationLecture 15 Newton Method and Self-Concordance. October 23, 2008
Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions
More informationThis can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization
This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization x = prox_f(x)+prox_{f^*}(x) use to get prox of norms! PROXIMAL METHODS WHY PROXIMAL METHODS Smooth
More informationUnconstrained minimization
CSCI5254: Convex Optimization & Its Applications Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions 1 Unconstrained
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationRandomized Coordinate Descent with Arbitrary Sampling: Algorithms and Complexity
Randomized Coordinate Descent with Arbitrary Sampling: Algorithms and Complexity Zheng Qu University of Hong Kong CAM, 23-26 Aug 2016 Hong Kong based on joint work with Peter Richtarik and Dominique Cisba(University
More informationCoordinate Update Algorithm Short Course Subgradients and Subgradient Methods
Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n
More informationA Brief Overview of Practical Optimization Algorithms in the Context of Relaxation
A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:
More informationGeometric Descent Method for Convex Composite Minimization
Geometric Descent Method for Convex Composite Minimization Shixiang Chen 1, Shiqian Ma 1, and Wei Liu 2 1 Department of SEEM, The Chinese University of Hong Kong, Hong Kong 2 Tencent AI Lab, Shenzhen,
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More informationAccelerate Subgradient Methods
Accelerate Subgradient Methods Tianbao Yang Department of Computer Science The University of Iowa Contributors: students Yi Xu, Yan Yan and colleague Qihang Lin Yang (CS@Uiowa) Accelerate Subgradient Methods
More informationIntegration Methods and Optimization Algorithms
Integration Methods and Optimization Algorithms Damien Scieur INRIA, ENS, PSL Research University, Paris France damien.scieur@inria.fr Francis Bach INRIA, ENS, PSL Research University, Paris France francis.bach@inria.fr
More informationConvex Optimization. 9. Unconstrained minimization. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University
Convex Optimization 9. Unconstrained minimization Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2017 Autumn Semester SJTU Ying Cui 1 / 40 Outline Unconstrained minimization
More informationFAST FIRST-ORDER METHODS FOR COMPOSITE CONVEX OPTIMIZATION WITH BACKTRACKING
FAST FIRST-ORDER METHODS FOR COMPOSITE CONVEX OPTIMIZATION WITH BACKTRACKING KATYA SCHEINBERG, DONALD GOLDFARB, AND XI BAI Abstract. We propose new versions of accelerated first order methods for convex
More informationA SIMPLE PARALLEL ALGORITHM WITH AN O(1/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS
A SIMPLE PARALLEL ALGORITHM WITH AN O(/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS HAO YU AND MICHAEL J. NEELY Abstract. This paper considers convex programs with a general (possibly non-differentiable)
More informationGradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725
Gradient Descent Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 10-725/36-725 Based on slides from Vandenberghe, Tibshirani Gradient Descent Consider unconstrained, smooth convex
More informationOn the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,
Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,
More informationNon-convex optimization. Issam Laradji
Non-convex optimization Issam Laradji Strongly Convex Objective function f(x) x Strongly Convex Objective function Assumptions Gradient Lipschitz continuous f(x) Strongly convex x Strongly Convex Objective
More informationA DELAYED PROXIMAL GRADIENT METHOD WITH LINEAR CONVERGENCE RATE. Hamid Reza Feyzmahdavian, Arda Aytekin, and Mikael Johansson
204 IEEE INTERNATIONAL WORKSHOP ON ACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 2 24, 204, REIS, FRANCE A DELAYED PROXIAL GRADIENT ETHOD WITH LINEAR CONVERGENCE RATE Hamid Reza Feyzmahdavian, Arda Aytekin,
More informationDesign and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016
Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)
More informationProximal Minimization by Incremental Surrogate Optimization (MISO)
Proximal Minimization by Incremental Surrogate Optimization (MISO) (and a few variants) Julien Mairal Inria, Grenoble ICCOPT, Tokyo, 2016 Julien Mairal, Inria MISO 1/26 Motivation: large-scale machine
More informationStochastic Optimization Algorithms Beyond SG
Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationUnconstrained minimization: assumptions
Unconstrained minimization I terminology and assumptions I gradient descent method I steepest descent method I Newton s method I self-concordant functions I implementation IOE 611: Nonlinear Programming,
More informationSubgradient Method. Guest Lecturer: Fatma Kilinc-Karzan. Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization /36-725
Subgradient Method Guest Lecturer: Fatma Kilinc-Karzan Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization 10-725/36-725 Adapted from slides from Ryan Tibshirani Consider the problem Recall:
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationThe Proximal Gradient Method
Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,
More informationExpanding the reach of optimal methods
Expanding the reach of optimal methods Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with C. Kempton (UW), M. Fazel (UW), A.S. Lewis (Cornell), and S. Roy (UW) BURKAPALOOZA! WCOM
More informationLecture 5: September 12
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS
More informationDual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)
More informationConvex Optimization. Ofer Meshi. Lecture 6: Lower Bounds Constrained Optimization
Convex Optimization Ofer Meshi Lecture 6: Lower Bounds Constrained Optimization Lower Bounds Some upper bounds: #iter μ 2 M #iter 2 M #iter L L μ 2 Oracle/ops GD κ log 1/ε M x # ε L # x # L # ε # με f
More informationHamiltonian Descent Methods
Hamiltonian Descent Methods Chris J. Maddison 1,2 with Daniel Paulin 1, Yee Whye Teh 1,2, Brendan O Donoghue 2, Arnaud Doucet 1 Department of Statistics University of Oxford 1 DeepMind London, UK 2 The
More informationarxiv: v1 [math.oc] 7 Dec 2018
arxiv:1812.02878v1 [math.oc] 7 Dec 2018 Solving Non-Convex Non-Concave Min-Max Games Under Polyak- Lojasiewicz Condition Maziar Sanjabi, Meisam Razaviyayn, Jason D. Lee University of Southern California
More information
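The heavy-ball update on slide 6, x_{t+1} = x_t − η_t ∇f(x_t) + θ_t (x_t − x_{t−1}), can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the slides: the function names, the quadratic test problem, and the use of Polyak's constant-stepsize tuning for quadratics (η = 4/(√L + √μ)², θ = ((√κ − 1)/(√κ + 1))², with κ = L/μ) are assumptions added for the example.

```python
import numpy as np

def heavy_ball(grad, x0, eta, theta, num_iters=100):
    """Run the heavy-ball iteration:
       x_{t+1} = x_t - eta * grad(x_t) + theta * (x_t - x_prev),
    where theta * (x_t - x_prev) is the momentum term."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(num_iters):
        x_next = x - eta * grad(x) + theta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Illustrative test problem: f(x) = 0.5 x^T A x with an ill-conditioned A,
# so plain gradient descent would zigzag (kappa = L / mu = 100).
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x
L, mu = 100.0, 1.0

# Polyak's tuning for strongly convex quadratics (assumed here):
eta = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
theta = ((np.sqrt(L / mu) - 1.0) / (np.sqrt(L / mu) + 1.0)) ** 2

x_star = heavy_ball(grad, np.array([1.0, 1.0]), eta, theta, num_iters=200)
```

With this tuning the iterates contract at roughly the (√κ − 1)/(√κ + 1) rate per step, so after 200 iterations `x_star` is numerically at the minimizer 0, consistent with the O(√κ log(1/ε)) complexity the lecture builds toward.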