GRADIENT = STEEPEST DESCENT
Transcription
1 GRADIENT METHODS
2 GRADIENT = STEEPEST DESCENT [Figure: iso-contours of a convex function, with the gradient and negative-gradient directions marked]
3 GRADIENT DESCENT METHOD Gradient descent: $x^{k+1} = x^k - \alpha \nabla f(x^k)$, i.e. step along the negative gradient. [Figure: iso-contours with a gradient-descent step of size 0.5]
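A minimal Python sketch of this update (not from the slides; `grad_f`, `alpha`, and the toy quadratic are illustrative):

```python
import numpy as np

# Plain gradient descent: x_{k+1} = x_k - alpha * grad_f(x_k).
def gradient_descent(grad_f, x0, alpha, iters=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * grad_f(x)   # step along the negative gradient
    return x

# Example: f(x) = 0.5*||x||^2, so grad_f(x) = x and the minimizer is 0.
print(gradient_descent(lambda x: x, x0=[2.0, -1.0], alpha=0.5))   # close to [0, 0]
```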
4 STEPSIZE RESTRICTION Example: $f(x) = \tfrac{1}{2}x^2$, so $\nabla f(x) = x$ and the update is $x^{k+1} = x^k - \alpha x^k = (1-\alpha)x^k$. The iterates shrink only if $|1-\alpha| < 1$, so the restriction is $\alpha < 2$. [Figure: iterates for a stepsize such as $\alpha = 0.9$]
5 RECALL Lipschitz constant for the gradient: $\|\nabla f(x) - \nabla f(y)\| \le M\|x - y\|$, which gives $f(y) \le f(x) + (y-x)^T\nabla f(x) + \tfrac{M}{2}\|y-x\|^2$. When the Hessian exists: $M \ge \|\nabla^2 f(x)\|$.
6 STABILITY RESTRICTION If you know the Lipschitz constant $M$ for the gradient, then $f(y) \le f(x) + (y-x)^T\nabla f(x) + \tfrac{M}{2}\|y-x\|^2$. Plug in a gradient step $y = x - \alpha\nabla f(x)$: $f(x - \alpha\nabla f(x)) \le f(x) - \alpha\nabla f(x)^T\nabla f(x) + \tfrac{M\alpha^2}{2}\|\nabla f(x)\|^2 = f(x) + \big(\tfrac{M\alpha^2}{2} - \alpha\big)\|\nabla f(x)\|^2$. The objective decreases when $\tfrac{M\alpha^2}{2} - \alpha < 0$, i.e. $\alpha < \tfrac{2}{M}$. This proves convergence in terms of the gradient.
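As a small numeric sanity check (not from the slides), here is the restriction $\alpha < 2/M$ on the scalar quadratic $f(x) = \tfrac{M}{2}x^2$, whose gradient $Mx$ has Lipschitz constant $M$; the specific values are illustrative:

```python
# f(x) = 0.5*M*x^2, gradient M*x, so each step multiplies x by (1 - alpha*M).
M = 2.0

def run(x, alpha, iters=50):
    for _ in range(iters):
        x = x - alpha * (M * x)
    return x

print(run(1.0, alpha=0.9 * 2 / M))   # alpha < 2/M: iterates shrink toward 0
print(run(1.0, alpha=1.1 * 2 / M))   # alpha > 2/M: iterates blow up
```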
7 LINE SEARCH METHODS Choose a search direction $d$. SEARCH for a stepsize $\alpha$ that satisfies SOME inequality. Update the iterate: $x^{k+1} = x^k + \alpha d$. Armijo condition: $f(x^k + \alpha d) \le f(x^k) + c\,\alpha\, d^T\nabla f(x^k)$, with $c < 1$. [Figure: at $x^k$, the lines $f(x^k) + \alpha d^T\nabla f(x^k)$ and $f(x^k) + c\,\alpha\, d^T\nabla f(x^k)$ bounding the acceptable stepsizes]
8 WOLFE CONDITIONS Choose search direction: $d = -\nabla f(x^k)$. Find a stepsize $\alpha$ satisfying the Wolfe conditions: the Armijo condition ("don't go too far"), $f(x^k + \alpha d) \le f(x^k) + c_1\,\alpha\, d^T\nabla f(x^k)$, $c_1 < 1$, and the curvature condition ("don't stop too near"), $d^T\nabla f(x^k + \alpha d) \ge c_2\, d^T\nabla f(x^k)$, $c_1 < c_2 < 1$. Update the iterate: $x^{k+1} = x^k + \alpha d$. Theorem: for a convex function, if the search directions satisfy the uniform bound $-\dfrac{\nabla f(x^k)^T d}{\|\nabla f(x^k)\|\,\|d\|} \ge c > 0$, then $\lim_{k\to\infty}\nabla f(x^k) = 0$.
9 LINE SEARCH IN REAL LIFE Backtracking/Armijo line search: choose search direction $d = -\nabla f(x^k)$; while $f(x^k + \alpha d) > f(x^k) + c\,\alpha\, d^T\nabla f(x^k)$, set $\alpha \leftarrow \alpha/2$; update the iterate $x^{k+1} = x^k + \alpha d$. Exact line search: choose search direction $d = -\nabla f(x^k)$; set $\alpha = \arg\min_\alpha f(x^k + \alpha d)$; update the iterate $x^{k+1} = x^k + \alpha d$.
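A short Python sketch of the backtracking/Armijo variant described above (not from the slides; the Armijo parameter `c` and the poorly scaled quadratic example are illustrative):

```python
import numpy as np

# Gradient descent with backtracking/Armijo line search: halve the stepsize
# until the sufficient-decrease condition holds.
def backtracking_gd(f, grad_f, x0, alpha0=1.0, c=1e-4, iters=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_f(x)
        d = -g                                   # search direction: negative gradient
        alpha = alpha0
        while f(x + alpha * d) > f(x) + c * alpha * (d @ g):
            alpha /= 2                           # backtrack
        x = x + alpha * d
    return x

# Example: f(x) = 0.5*(x1^2 + 10*x2^2).
D = np.array([1.0, 10.0])
f = lambda x: 0.5 * x @ (D * x)
grad_f = lambda x: D * x
print(backtracking_gd(f, grad_f, [3.0, -2.0]))   # close to [0, 0]
```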
10 ADAPTIVE STEPSIZE METHOD Consider this simple model problem: $f(x) = \tfrac{a}{2}x^Tx$, with update $x^{k+1} = x^k - \tau\nabla f(x^k) = x^k - \tau(a\,x^k)$. Best choice: $\tau = 1/a$. Barzilai-Borwein idea: use the model $f(x) \approx \tfrac{a}{2}x^Tx$ and figure out the best approximation to $a$.
11 BB METHOD Barzilai-Borwein model: $f(x) \approx \tfrac{a}{2}x^Tx$, which implies $\nabla f(y) - \nabla f(x) \approx a(y - x)$. On each iteration, define $g = \nabla f(x^{k+1}) - \nabla f(x^k)$ and $\Delta x = x^{k+1} - x^k$. The model tells us $g \approx a\,\Delta x$, so choose $a$ by $\min_a \|a\,\Delta x - g\|_2^2$.
12 BB METHOD Solving $\min_a \|a\,\Delta x - g\|_2^2$: setting the derivative to zero gives $a\,\Delta x^T\Delta x - \Delta x^Tg = 0$, so $a = \dfrac{\Delta x^Tg}{\Delta x^T\Delta x}$ and the stepsize is $\tau = \dfrac{1}{a} = \dfrac{\Delta x^T\Delta x}{\Delta x^Tg}$.
13 BB METHOD Choose search direction: $d = -\nabla f(x^k)$. Compute the BB step: $\tau_k = \dfrac{\Delta x^T\Delta x}{\Delta x^Tg}$. While $f(x^k + \tau_k d) > f(x^k) + c\,\tau_k\, d^T\nabla f(x^k)$, set $\tau_k \leftarrow \tau_k/2$. Update the iterate: $x^{k+1} = x^k + \tau_k d$. For some smooth problems, the BB method is superlinear, i.e. for large enough $k$, $f(x^{k+1}) - f^\star \le C\,(f(x^k) - f^\star)^p$ for some $C > 0$ and $p > 1$. This rate is asymptotic, and not global.
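A Python sketch of the BB stepsize with a backtracking safeguard, along the lines of the slide (not from the slides verbatim; the initial stepsize and the test problem are illustrative):

```python
import numpy as np

# Gradient descent with the Barzilai-Borwein stepsize tau = (dx^T dx)/(dx^T dg),
# safeguarded by Armijo backtracking.
def bb_gradient(f, grad_f, x0, iters=50, c=1e-4):
    x = np.asarray(x0, dtype=float)
    x_old, g_old = x.copy(), grad_f(x)
    tau = 1e-2                                    # initial stepsize before BB kicks in
    for _ in range(iters):
        g = grad_f(x)
        dx, dg = x - x_old, g - g_old
        if dx @ dg > 0:
            tau = (dx @ dx) / (dx @ dg)           # BB step from the quadratic model
        while f(x - tau * g) > f(x) - c * tau * (g @ g):
            tau /= 2                              # Armijo safeguard
        x_old, g_old = x.copy(), g
        x = x - tau * g
    return x

A = np.diag([1.0, 5.0, 25.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
print(bb_gradient(f, grad_f, [1.0, 1.0, 1.0]))    # close to the minimizer at 0
```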
14 CONVERGENCE RATE: EXACT LINE SEARCH Strong convexity: $f(y) \ge f(x) + (y-x)^T\nabla f(x) + \tfrac{m}{2}\|y-x\|^2$. Lipschitz bound: $f(y) \le f(x) + (y-x)^T\nabla f(x) + \tfrac{M}{2}\|y-x\|^2$. Set $y = x^{k+1}$, $x = x^k$, so $y - x = x^{k+1} - x^k = -\alpha\nabla f(x^k)$: $f(x^{k+1}) \le \min_\alpha\, f(x^k) - \alpha\nabla f(x^k)^T\nabla f(x^k) + \tfrac{M\alpha^2}{2}\|\nabla f(x^k)\|^2$.
15 CONVERGENCE RATE: EXACT LINE SEARCH $f(x^{k+1}) \le \min_\alpha\, f(x^k) - \alpha\nabla f(x^k)^T\nabla f(x^k) + \tfrac{M\alpha^2}{2}\|\nabla f(x^k)\|^2$. The minimum is at $\alpha = 1/M$, giving $f(x^{k+1}) \le f(x^k) - \tfrac{1}{2M}\|\nabla f(x^k)\|^2$, and hence $f(x^{k+1}) - f^\star \le f(x^k) - f^\star - \tfrac{1}{2M}\|\nabla f(x^k)\|^2$. Write it in terms of the optimality gap so we get the relative change.
16 STRONG CONVEXITY BOUND We have $f(x^{k+1}) - f^\star \le f(x^k) - f^\star - \tfrac{1}{2M}\|\nabla f(x^k)\|^2$. Now use the strong convexity constant $m$: $f(y) \ge f(x) + (y-x)^T\nabla f(x) + \tfrac{m}{2}\|y-x\|^2$, so $f(x^\star) \ge \min_y\, f(x^k) + (y - x^k)^T\nabla f(x^k) + \tfrac{m}{2}\|y - x^k\|^2$. Optimality: the minimum is at $y - x^k = -\tfrac{1}{m}\nabla f(x^k)$, giving $f(x^\star) \ge f(x^k) - \tfrac{1}{m}\|\nabla f(x^k)\|^2 + \tfrac{1}{2m}\|\nabla f(x^k)\|^2$, i.e. $2m\,(f(x^k) - f(x^\star)) \le \|\nabla f(x^k)\|^2$.
17 STRONG CONVEXITY BOUND Combine the two bounds: $f(x^{k+1}) - f^\star \le f(x^k) - f^\star - \tfrac{1}{2M}\|\nabla f(x^k)\|^2$ together with $2m\,(f(x^k) - f(x^\star)) \le \|\nabla f(x^k)\|^2$ gives $f(x^{k+1}) - f^\star \le f(x^k) - f^\star - \tfrac{m}{M}(f(x^k) - f^\star) = \big(1 - \tfrac{m}{M}\big)(f(x^k) - f^\star)$. What's this factor?
18 STRONG CONVEXITY BOUND $f(x^{k+1}) - f^\star \le \big(1 - \tfrac{m}{M}\big)(f(x^k) - f^\star)$. By induction: $f(x^k) - f^\star \le \big(1 - \tfrac{m}{M}\big)^k(f(x^0) - f^\star)$. Theorem: gradient descent with exact line search satisfies linear convergence when the objective is strongly convex: $f(x^k) - f^\star \le \big(1 - \tfrac{1}{\kappa}\big)^k(f(x^0) - f^\star)$, where $\kappa = M/m$ is the condition number.
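An illustrative Python check of this linear rate on a quadratic (not from the slides; for $f(x) = \tfrac{1}{2}x^TAx$ the exact line-search stepsize has the closed form used below, and $m$, $M$ are the extreme eigenvalues of $A$):

```python
import numpy as np

# Exact line search on f(x) = 0.5*x^T A x; the gap should shrink by at least
# a factor (1 - m/M) per iteration.
A = np.diag([1.0, 10.0])          # m = 1, M = 10, so kappa = 10
f = lambda x: 0.5 * x @ A @ x
x = np.array([1.0, 1.0])
for k in range(5):
    g = A @ x
    alpha = (g @ g) / (g @ A @ g)       # exact line search for a quadratic
    x = x - alpha * g
    print(k, f(x))                      # decreases at least as fast as 0.9^k
```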
19 GRADIENT DESCENT USES CONTOURS Gradients are perpendicular to the iso-contours. [Figure: iso-contours of a well-conditioned problem with gradient directions]
20 POOR CONDITIONING: CONTOUR INTERPRETATION [Figure: iso-contours of a problem with large condition number $\kappa$] The contours are (almost) parallel!
21 GRADIENT DESCENT: WEAKLY CONVEX PROBLEMS All rates so far assume strong convexity. For strongly convex problems: $f(x^k) - f^\star \le \big(1 - \tfrac{m}{M}\big)^k(f(x^0) - f^\star)$, where $m$ is the strong convexity constant. For weakly convex problems: $f(x^{k+1}) - f^\star \le \dfrac{M\|x^0 - x^\star\|^2}{k+1}$. Slow (worst case).
22 WHAT'S THE BEST WE CAN DO? Goal: put a worst-case bound on the convergence of first-order methods. Definition: a method is first-order if $x^k \in \operatorname{span}\{x^0, \nabla f(x^0), \nabla f(x^1), \dots, \nabla f(x^{k-1})\}$. Theorem: for any first-order method, there is a smooth function with gradient Lipschitz constant $M$ such that $f(x^k) - f(x^\star) \ge \dfrac{3M\|x^0 - x^\star\|^2}{32(k+1)^2}$. (Nemirovsky and Yudin, '82)
23 PROOF The worst function in the world: $f_n(x) = \dfrac{L}{4}\Big(\tfrac{1}{2}x_1^2 + \tfrac{1}{2}\sum_{i=1}^{n-1}(x_i - x_{i+1})^2 + \tfrac{1}{2}x_n^2 - x_1\Big)$, with minimizer $x^\star_i = 1 - \tfrac{i}{n+1}$ and optimal value $f_n(x^\star) = \dfrac{L}{8}\Big(\tfrac{1}{n+1} - 1\Big)$.
24 PROOF Starting from $x^0 = 0$, after iteration $k$ a first-order method applied to $f_n$ has $x_i = 0$ for $i > k$. Consider the function with $n = 2k+1$ variables: $f_{2k+1}(x^k) = f_k(x^k) \ge \dfrac{L}{8}\Big(\tfrac{1}{k+1} - 1\Big)$, while $f_{2k+1}(x^\star) = \dfrac{L}{8}\Big(\tfrac{1}{2k+2} - 1\Big)$.
25 PROOF With $n = 2k+1$ variables, the lowest possible energy at step $k$ is $f_{2k+1}(x^k) \ge \dfrac{L}{8}\Big(\tfrac{1}{k+1} - 1\Big)$, while the true solution has $f_{2k+1}(x^\star) = \dfrac{L}{8}\Big(\tfrac{1}{2k+2} - 1\Big)$. Starting at $x^0 = 0$, $\|x^0 - x^\star\|^2 \le \tfrac{1}{3}(2k+2)$, so $\dfrac{f_{2k+1}(x^k) - f_{2k+1}(x^\star)}{\|x^0 - x^\star\|^2} \ge \dfrac{3L}{32(k+1)^2}$.
26 NESTEROV'S METHOD ACHIEVES THE BOUND (Nemirovski and Yudin '83) [Figure: gradient iterates $x^0, x^1, x^2, \dots$ vs. Nesterov iterates] Gradient: $f(x^{k+1}) - f^\star \lesssim \dfrac{\|x^0 - x^\star\|^2}{k+1}$. Nesterov (optimal): $f(x^{k+1}) - f^\star \lesssim \dfrac{2\|x^0 - x^\star\|^2}{(k+1)^2}$.
27 NESTEROV'S METHOD With stepsize $\alpha \le 1/L$ ($L$ the Lipschitz constant of $\nabla f$) and $\delta_0 = 1$: prediction step $x^{k+1} = y^k - \alpha\nabla f(y^k)$; momentum parameter $\delta_{k+1} = \dfrac{1 + \sqrt{1 + 4\delta_k^2}}{2}$; momentum term $y^{k+1} = x^{k+1} + \dfrac{\delta_k - 1}{\delta_{k+1}}(x^{k+1} - x^k)$.
28 OPTIMAL CONVERGENCE Theorem: the objective error of the $k$th iterate of Nesterov's method is bounded by $f(x^k) - f(x^\star) \le \dfrac{2L\|x^0 - x^\star\|^2}{(k+1)^2}$. This worst case doesn't mean much in practice; adaptive convergence is faster.
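A Python sketch of the iteration above (not from the slides; the stepsize choice $\alpha = 1/L$ and the quadratic test problem are illustrative):

```python
import numpy as np

# Nesterov's accelerated gradient: prediction step plus momentum extrapolation.
def nesterov(grad_f, x0, L, iters=100):
    x = np.asarray(x0, dtype=float)
    y, delta = x.copy(), 1.0
    for _ in range(iters):
        x_next = y - (1.0 / L) * grad_f(y)                      # prediction (gradient) step
        delta_next = (1 + np.sqrt(1 + 4 * delta**2)) / 2        # momentum parameter update
        y = x_next + (delta - 1) / delta_next * (x_next - x)    # momentum term
        x, delta = x_next, delta_next
    return x

A = np.diag([1.0, 100.0])                                  # ill-conditioned quadratic, L = 100
print(nesterov(lambda x: A @ x, [1.0, 1.0], L=100.0))      # close to [0, 0]
```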
29 POOR CONDITIONING: FUNCTION INTERPRETATION $f(x) = \sum_i \tfrac{d_i}{2}x_i^2 = \tfrac{1}{2}x^TDx$. [Figure: the one-dimensional quadratics $\tfrac{d_i}{2}x_i^2$ for a large $d_i$ and a small $d_i$]
30 POOR CONDITIONING: FUNCTION INTERPRETATION For $f(x) = \sum_i \tfrac{d_i}{2}x_i^2 = \tfrac{1}{2}x^TDx$, the gradient step acts coordinate-wise: $x_i^{k+1} = x_i^k - \alpha\, d_i\, x_i^k$. One stepsize to rule them all: the coordinate with large $d_i$ wants a small step, while the coordinate with small $d_i$ wants a big step. [Figure: big step vs. small step on the two quadratics]
31 THIS IS ALWAYS HAPPENING! minimize $\tfrac{1}{2}x^THx + g^Tx$. Take the EVD $H = UDU^T$, so we minimize $\tfrac{1}{2}x^TUDU^Tx + g^Tx$. Substitute $y = U^Tx$: minimize $\tfrac{1}{2}y^TDy + (U^Tg)^Ty = \sum_i \tfrac{d_i}{2}y_i^2 + (U^Tg)_i\, y_i$.
32 PRECONDITIONERS Change variables $x = Py$: minimize $f(x)$ becomes minimize $f(Py)$, and the Hessian transforms as $H \to P^THP$.
33 THE BEST PRECONDITIONER Change variables $x = Py$: minimize $f(x)$ becomes minimize $f(Py)$ with Hessian $P^THP$. Choosing $P = H^{-1/2}$ gives $P^THP = H^{-1/2}HH^{-1/2} = I$.
34 SVD INTERPRETATION $f(x) = \tfrac{1}{2}x^THx = \tfrac{1}{2}x^TUDU^Tx = \tfrac{1}{2}x^TUD^{1/2}D^{1/2}U^Tx = \tfrac{1}{2}y^Ty$, where $y = D^{1/2}U^Tx$ and $x = UD^{-1/2}y = Py$. The preconditioner $P$ is a rotation followed by a stretch.
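An illustrative Python check (not from the slides; the SPD matrix below is arbitrary) that the ideal preconditioner $P = H^{-1/2}$ turns the Hessian into the identity:

```python
import numpy as np

# Verify P^T H P = I for P = H^{-1/2}, built from the eigendecomposition of H.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])                 # an SPD "Hessian"
w, U = np.linalg.eigh(H)                   # H = U diag(w) U^T
P = U @ np.diag(w**-0.5) @ U.T             # P = H^{-1/2}
print(np.round(P.T @ H @ P, 6))            # identity matrix (up to rounding)
```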
35 NEWTON'S METHOD Change variables $x = H^{-1/2}y$: minimize $f(x)$ becomes minimize $f(H^{-1/2}y)$. Gradient step in $y$: $y^{k+1} = y^k - H^{-1/2}\nabla f(H^{-1/2}y^k)$. Change variables back to $x$: $H^{-1/2}y^{k+1} = H^{-1/2}y^k - H^{-1/2}H^{-1/2}\nabla f(H^{-1/2}y^k)$, i.e. $x^{k+1} = x^k - H^{-1}\nabla f(x^k)$: the Newton direction. There are many interpretations.
36 GEOMETRIC INTERPRETATION minimize the quadratic model $f(x^k) + (x - x^k)^Tg + \tfrac{1}{2}(x - x^k)^TH(x - x^k) \approx f(x)$. Setting the derivative to zero, $g + H(x - x^k) = 0$, gives $x^{k+1} = x^k - H^{-1}g$. [Figure: the quadratic model touching $f(x)$ at $x^k$, with its minimizer at $x^{k+1}$]
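A Python sketch of the pure Newton iteration $x^{k+1} = x^k - H^{-1}\nabla f(x^k)$ (not from the slides; the example objective and function names are illustrative):

```python
import numpy as np

# Pure Newton's method for optimization, with hand-supplied gradient and Hessian.
def newton(grad_f, hess_f, x0, iters=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(hess_f(x), grad_f(x))   # step along the Newton direction
    return x

# Example: f(x) = sum(exp(x_i) - x_i), minimized at x = 0.
grad_f = lambda x: np.exp(x) - 1
hess_f = lambda x: np.diag(np.exp(x))
print(newton(grad_f, hess_f, [1.0, -0.5]))   # close to [0, 0]
```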
37 CLASSICAL NEWTON METHOD Newton's method = an algorithm for root finding, $g(x) = 0$: $x^{k+1} = x^k - \dfrac{g(x^k)}{g'(x^k)}$. For a nonlinear system of equations, linearize $g(x) \approx g(x^k) + \nabla g(x^k)(x - x^k)$, set it to zero, and solve: $x^{k+1} = x^k - \nabla g(x^k)^{-1}g(x^k)$. [Figure: tangent-line construction of $x^{k+1}$ from $x^k$]
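A minimal one-dimensional sketch of this root-finding iteration (not from the slides; the equation $x^2 - 2 = 0$ is just an example):

```python
# Classical 1-D Newton root finding: x_{k+1} = x_k - g(x_k)/g'(x_k).
def newton_root(g, g_prime, x0, iters=10):
    x = x0
    for _ in range(iters):
        x = x - g(x) / g_prime(x)
    return x

print(newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0))   # ~1.41421356 = sqrt(2)
```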
38 NEWTON METHOD FOR OPTIMIZATION We want $\nabla f(x) = 0$. Taylor's theorem: $f(x) \approx f(x^k) + (x - x^k)^T\nabla f(x^k) + \tfrac{1}{2}(x - x^k)^T\nabla^2 f(x^k)(x - x^k)$. Differentiating gives a linear approximation of the gradient: $\nabla f(x) \approx \nabla f(x^k) + \nabla^2 f(x^k)(x - x^k) = g + H(x - x^k) = 0$, so $x^{k+1} = x^k - H^{-1}g$.
39 SIMPLE RATE ANALYSIS Bound the error $e^k = x^k - x^\star$. Assume $f \in C^2$, $e^k$ is sufficiently small, and strong convexity. Then $0 = \nabla f(x^\star) = \nabla f(x^k - e^k) = \nabla f(x^k) - He^k + O(\|e^k\|^2)$. Multiply by the inverse Hessian: $0 = H^{-1}\nabla f(x^k) - e^k + O(\|e^k\|^2)$. Note that $e^{k+1} = e^k - H^{-1}\nabla f(x^k)$, so $e^{k+1} = O(\|e^k\|^2)$: quadratic convergence. (Fletcher, Practical Methods of Optimization)
40 DAMPED NEWTON METHOD Choose search direction: $d = -\nabla^2 f(x^k)^{-1}\nabla f(x^k)$. Find a stepsize $\alpha$ satisfying the Wolfe conditions, e.g. $f(x^k + \alpha d) \le f(x^k) + c\,\alpha\, d^T\nabla f(x^k)$, $c < 1$, where $d^T\nabla f(x^k)$ is the predicted objective change in the Newton direction. Update the iterate: $x^{k+1} = x^k + \alpha d$. $\alpha < 1$ is a damped step; $\alpha = 1$ is the full Newton step.
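A Python sketch of a damped Newton step with Armijo backtracking on the stepsize (not from the slides; the smooth test objective and parameter values are illustrative):

```python
import numpy as np

# Damped Newton: Newton direction plus backtracking; alpha = 1 is the full step.
def damped_newton(f, grad_f, hess_f, x0, c=1e-4, iters=30):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_f(x)
        d = -np.linalg.solve(hess_f(x), g)        # Newton direction
        alpha = 1.0                               # try the full Newton step first
        while f(x + alpha * d) > f(x) + c * alpha * (d @ g):
            alpha /= 2                            # damp until the Armijo test holds
        x = x + alpha * d
    return x

# Example: f(x) = sum(sqrt(1 + x_i^2)), minimized at 0.
f = lambda x: np.sum(np.sqrt(1 + x**2))
grad_f = lambda x: x / np.sqrt(1 + x**2)
hess_f = lambda x: np.diag((1 + x**2) ** -1.5)
print(damped_newton(f, grad_f, hess_f, [3.0, -2.0]))   # close to [0, 0]
```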
41 DAMPED NEWTON METHOD Theorem: suppose we have a Lipschitz constant for the Hessian, $\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L_H\|x - y\|$. When the gradient gets small enough, $\|\nabla f(x^k)\| \le 3(1 - 2c)\dfrac{m^2}{L_H}$, the unit stepsize ($\alpha = 1$) is an Armijo step and $\|\nabla f(x^{k+1})\| \le \dfrac{L_H}{2m^2}\|\nabla f(x^k)\|^2$. (See Boyd and Vandenberghe.)
42 NEWTON IN THE WILD In practice the Hessian might be indefinite. If you start at a point of negative curvature, the Newton step goes up! The (negated) search direction must form an acute angle with the gradient: with update $x^{k+1} = x^k - \alpha d$, we need $g^Td > 0$. We can use any SPD matrix $\hat H$ to approximate the Hessian: with $d = \hat H^{-1}g$, $g^Td = g^T\hat H^{-1}g > 0$.
43 MODIFIED HESSIAN We can use any SPD matrix to approximate the Hessian. Levenberg-Marquardt: $\hat H = H + \lambda I$; add a large enough multiple of the identity to the Hessian that it becomes SPD. Modified Cholesky: begin with a partial Cholesky factorization $H \to L\begin{bmatrix} I & 0 \\ 0 & B \end{bmatrix}L^T$ and replace $B$ with an SPD matrix. (See Fang & O'Leary, '06)
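A Python sketch of the Levenberg-Marquardt-style modification (not from the slides; the strategy of growing $\lambda$ until a Cholesky factorization succeeds, and the example matrices, are illustrative):

```python
import numpy as np

# Modified-Hessian direction: add lambda*I and increase lambda until H + lambda*I
# is SPD (Cholesky succeeds), then solve for the descent direction.
def modified_newton_direction(H, g, lam=1e-4):
    n = len(g)
    while True:
        try:
            L = np.linalg.cholesky(H + lam * np.eye(n))   # fails if not SPD
            break
        except np.linalg.LinAlgError:
            lam *= 10                                     # enlarge the shift
    # Solve (H + lam*I) d = -g using the Cholesky factor.
    return -np.linalg.solve(L.T, np.linalg.solve(L, g))

H = np.array([[1.0,  0.0],
              [0.0, -2.0]])                 # indefinite "Hessian"
g = np.array([1.0, 1.0])
d = modified_newton_direction(H, g)
print(d, g @ d)                              # g^T d < 0: a descent direction
```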