Numerical optimization

Numerical optimization, Lecture 4
Alexander & Michael Bronstein, tosca.cs.technion.ac.il/book
Numerical geometry of non-rigid shapes, Stanford University, Winter 2009

Longest, slowest, shortest, minimal, maximal, largest, smallest, fastest. Common denominator: optimization problems.

Optimization problems
Generic unconstrained minimization problem: minimize f(x) over x in the search space X, where X is a vector space and f: X → ℝ is a cost (or objective) function. A solution x* is the minimizer of f; the value f(x*) is the minimum.

Local vs. global minimum
Find the minimum by analyzing the local behavior of the cost function. [Plot showing a local minimum and the global minimum.]

Local vs. global in real life
False summit: 8,030 m. Main summit: 8,047 m. Broad Peak (K3), the 12th highest mountain on Earth.

Convex functions
A function f defined on a convex set C is called convex if f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y) for any x, y in C and λ in [0, 1]. For a convex function, a local minimum is also the global minimum. [Illustrations of a convex and a non-convex function.]

One-dimensional optimality conditions
A point x* is a local minimizer of a C²-function f if f′(x*) = 0 and f″(x*) > 0. Approximate the function around x* as a parabola using the Taylor expansion f(x) ≈ f(x*) + f′(x*)(x − x*) + ½ f″(x*)(x − x*)²: the condition f′(x*) = 0 guarantees the minimum is at x*, and f″(x*) > 0 guarantees the parabola is convex.

Gradient
In the multidimensional case, linearization of the function according to Taylor, f(x + dx) ≈ f(x) + ⟨∇f(x), dx⟩, gives a multidimensional analogy of the derivative. The function ∇f, denoted ∇f(x), is called the gradient of f. In the one-dimensional case it reduces to the standard definition of the derivative.

Gradient
In Euclidean space (ℝⁿ with the standard inner product), ∇f can be represented in the standard basis in the following way: the i-th component of ∇f(x) is the partial derivative ∂f/∂x_i, which gives ∇f(x) = (∂f/∂x_1, …, ∂f/∂x_n)ᵀ.

Example 1: gradient of a matrix function
Given the space of real n×m matrices with the standard inner product ⟨X, Y⟩ = tr(XᵀY), compute the gradient of the function f(X), where A is an n×m matrix. [The function and its gradient are given on the slide; for square matrices the expression simplifies further.]

Example 2: gradient of a matrix function
Compute the gradient of the function f(X), where A is an n×m matrix. [The function and its gradient are given on the slide.]

Hessian
Linearization of the gradient, ∇f(x + dx) ≈ ∇f(x) + ∇²f(x) dx, gives a multidimensional analogy of the second-order derivative. The function ∇²f, denoted ∇²f(x), is called the Hessian of f [named after Ludwig Otto Hesse (1811-1874)]. In the standard basis, the Hessian is the symmetric matrix of mixed second-order derivatives, (∇²f(x))_ij = ∂²f/∂x_i∂x_j.

Optimality conditions, bis
A point x* is a local minimizer of a C²-function f if ∇f(x*) = 0 and ⟨∇²f(x*)d, d⟩ > 0 for all d ≠ 0, i.e., the Hessian is a positive definite matrix (denoted ∇²f(x*) ≻ 0). Approximate the function around x* as a parabola using the Taylor expansion: ∇f(x*) = 0 guarantees the minimum is at x*, and ∇²f(x*) ≻ 0 guarantees the parabola is convex.
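
As an illustration (not part of the slides), the two conditions can be checked numerically for a concrete function; the helper name is_local_minimizer and the tolerance 1e-6 are arbitrary choices for this sketch.

```python
import numpy as np

def is_local_minimizer(grad, hess, x, tol=1e-6):
    """Check the second-order conditions: ||grad f(x)|| ~ 0 and Hessian positive definite."""
    g = grad(x)
    eigenvalues = np.linalg.eigvalsh(hess(x))   # Hessian is symmetric, use the symmetric eigensolver
    return np.linalg.norm(g) < tol and eigenvalues.min() > 0.0

# Example: f(x) = x1^2 + 2*x2^2 has its minimizer at the origin.
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
hess = lambda x: np.diag([2.0, 4.0])
print(is_local_minimizer(grad, hess, np.zeros(2)))   # True
```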

Optimization algorithms
An iterative optimization algorithm generates iterates x^(k+1) = x^(k) + α^(k) d^(k), determined by a descent direction d^(k) and a step size α^(k).

Generic optimization algorithm
Start with some initial point x^(0) and set the iteration counter k = 0. Repeat until convergence:
1. Determine a descent direction d^(k).
2. Choose a step size α^(k) such that f(x^(k) + α^(k) d^(k)) < f(x^(k)).
3. Update the iterate: x^(k+1) = x^(k) + α^(k) d^(k).
4. Increment the iteration counter: k ← k + 1.
When the stopping criterion is met, return the solution x* ≈ x^(k).

Stopping criteria
Near a local minimum the gradient vanishes, ∇f(x) ≈ 0 (or equivalently ‖∇f(x^(k))‖ ≈ 0). Stop when the gradient norm becomes small, when the step ‖x^(k+1) − x^(k)‖ becomes small, or when the relative objective change |f(x^(k+1)) − f(x^(k))| / |f(x^(k))| becomes small.

Line search
The optimal step size can be found by solving the one-dimensional optimization problem α^(k) = argmin over α ≥ 0 of f(x^(k) + α d^(k)). One-dimensional optimization algorithms for finding the optimal step size are generically called exact line search.

Armijo [ar-mi-xo] rule
The function sufficiently decreases if f(x + αd) ≤ f(x) + σα⟨∇f(x), d⟩ for a fixed σ in (0, 1). Armijo rule (Larry Armijo, 1966): start with some step size α and decrease it by multiplying by some β in (0, 1) until the function sufficiently decreases.
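
A minimal Python sketch of the Armijo backtracking rule (added for illustration); the defaults sigma = 1e-4 and beta = 0.5 are common textbook choices, not values from the slides.

```python
import numpy as np

def armijo_step(f, grad, x, d, alpha=1.0, beta=0.5, sigma=1e-4):
    """Shrink alpha until the sufficient-decrease condition
    f(x + alpha*d) <= f(x) + sigma*alpha*<grad f(x), d> holds."""
    fx = f(x)
    slope = grad(x).dot(d)              # directional derivative; negative for a descent direction
    while f(x + alpha * d) > fx + sigma * alpha * slope:
        alpha *= beta                   # multiply the step by beta < 1
    return alpha
```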

Descent direction
How to descend in the fastest way? Go in the direction in which the height lines are the densest. [Photo of Devil's Tower and its topographic map.]

Steepest descent
Directional derivative ⟨∇f(x), d⟩: how much f changes in the direction d (negative for a descent direction). Find a unit-length direction minimizing the directional derivative: d = argmin over ‖d‖ = 1 of ⟨∇f(x), d⟩.

Steepest descent
In the L2 norm, the minimizer is the normalized steepest descent direction d = −∇f(x)/‖∇f(x)‖₂. In the L1 norm, it is coordinate descent: move along the coordinate axis in which the descent is maximal.

Steepest descent algorithm
Start with some x^(0). Repeat until convergence: compute the steepest descent direction d^(k) = −∇f(x^(k)), choose the step size α^(k) using line search, update the iterate x^(k+1) = x^(k) + α^(k) d^(k), and increment the iteration counter.
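
A sketch of the full steepest descent loop, reusing the armijo_step helper from the previous snippet; the function names, tolerances, and the quadratic test problem are illustrative.

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=1000):
    """Steepest descent in the L2 norm with Armijo backtracking (armijo_step defined above)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:             # stopping criterion: small gradient norm
            break
        d = -g                                   # steepest descent direction
        x = x + armijo_step(f, grad, x, d) * d
    return x

# Example: an ill-conditioned quadratic f(x) = 0.5 * x^T Q x.
Q = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
print(steepest_descent(f, grad, [1.0, 1.0]))    # converges (slowly) towards the origin
```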

Condition number
The condition number κ = λ_max/λ_min is the ratio of the maximal and minimal eigenvalues of the Hessian ∇²f(x). [Contour plots of a well-conditioned and an ill-conditioned quadratic function.] A problem with a large condition number is called ill-conditioned. The steepest descent convergence rate is slow for ill-conditioned problems.
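
For the quadratic example above, the condition number of the Hessian can be computed directly from its eigenvalues (illustrative snippet):

```python
import numpy as np

H = np.diag([1.0, 100.0])                      # Hessian of f(x) = 0.5 * x^T Q x is Q itself
eigenvalues = np.linalg.eigvalsh(H)
kappa = eigenvalues.max() / eigenvalues.min()  # condition number: ratio of extreme eigenvalues
print(kappa)                                   # 100.0 -> ill-conditioned
```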

Q-norm
Steepest descent can also be defined with respect to the Q-norm ‖d‖_Q = (dᵀQd)^(1/2) (with Q positive definite), which corresponds to the change of coordinates x̃ = Q^(1/2) x. The function, its gradient, and the descent direction transform accordingly; the steepest descent direction in the Q-norm is d = −Q⁻¹∇f(x).

Preconditioning
Using the Q-norm for steepest descent can be regarded as a change of coordinates, called preconditioning. The preconditioner Q should be chosen to improve the condition number of the Hessian in the proximity of the solution. In the new system of coordinates, the Hessian at the solution ideally becomes the identity (a dream).
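
A one-line sketch of the preconditioned descent direction (illustrative; the helper name preconditioned_direction is not from the slides): with a preconditioner Q approximating the Hessian near the solution, the direction becomes d = −Q⁻¹∇f(x).

```python
import numpy as np

def preconditioned_direction(Q, g):
    """Steepest descent direction in the Q-norm: d = -Q^{-1} g, where g = grad f(x)."""
    return np.linalg.solve(Q, -g)    # solve the linear system instead of forming Q^{-1}
```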

Newton method as optimal preconditioner
The best theoretically possible preconditioner is Q = ∇²f(x*), giving the descent direction d = −(∇²f(x*))⁻¹∇f(x) and the ideal condition number κ = 1. Problem: the solution x* is unknown in advance. Newton direction: use the Hessian at the current iterate as a preconditioner, d^(k) = −(∇²f(x^(k)))⁻¹∇f(x^(k)), at each iteration.

Another derivation of the Newton method
Approximate the function as a quadratic using the second-order Taylor expansion, f(x^(k) + d) ≈ f(x^(k)) + ⟨∇f(x^(k)), d⟩ + ½⟨∇²f(x^(k))d, d⟩ (a quadratic function in d); minimizing it over d gives the Newton direction. Close to the solution the function looks like a quadratic function, so the Newton method converges fast.

Newton method
Start with some x^(0). Repeat until convergence: compute the Newton direction d^(k) = −(∇²f(x^(k)))⁻¹∇f(x^(k)), choose the step size α^(k) using line search, update the iterate x^(k+1) = x^(k) + α^(k) d^(k), and increment the iteration counter.
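
A damped Newton sketch in the same style (reusing armijo_step from above; names and tolerances are illustrative). On the quadratic example above it reaches the minimizer in a single step.

```python
import numpy as np

def newton_method(f, grad, hess, x0, tol=1e-8, max_iter=50):
    """Newton iteration: solve hess(x) d = -grad(x), then take an Armijo step along d."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)        # Newton direction
        x = x + armijo_step(f, grad, x, d) * d
    return x
```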

Frozen Hessian
Observation: close to the optimum, the Hessian does not change significantly. Reduce the number of Hessian inversions by keeping the Hessian from previous iterations and updating it only once in a few iterations. Such a method is called Newton with frozen Hessian.

Cholesky factorization
Decompose the Hessian as ∇²f(x^(k)) = LLᵀ, where L is a lower triangular matrix [André-Louis Cholesky (1875-1918)]. Solve the Newton system LLᵀd = −∇f(x^(k)) in two steps: forward substitution Ly = −∇f(x^(k)), then backward substitution Lᵀd = y. Complexity: about n³/3 operations for the factorization, better than straightforward matrix inversion.
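
A sketch of solving the Newton system via the Cholesky factorization with explicit forward and backward substitution, using scipy.linalg.solve_triangular (the helper name is illustrative):

```python
import numpy as np
from scipy.linalg import solve_triangular

def newton_direction_cholesky(H, g):
    """Solve H d = -g using the factorization H = L L^T (L lower triangular)."""
    L = np.linalg.cholesky(H)                  # factorization, roughly n^3/3 operations
    y = solve_triangular(L, -g, lower=True)    # forward substitution:  L y = -g
    d = solve_triangular(L.T, y, lower=False)  # backward substitution: L^T d = y
    return d
```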

Truncated Newton
Solve the Newton system approximately: a few iterations of conjugate gradients or another algorithm for the solution of linear systems can be used. Such a method is called truncated or inexact Newton.
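
A truncated Newton sketch using a few conjugate-gradient iterations from scipy; only Hessian-vector products are needed, never the Hessian matrix itself (names and the iteration cap are illustrative choices):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def truncated_newton_direction(hess_vec, g, cg_iters=10):
    """Approximately solve H d = -g with at most cg_iters conjugate-gradient iterations."""
    n = g.shape[0]
    H = LinearOperator((n, n), matvec=hess_vec)   # H is accessed only through products H @ v
    d, _ = cg(H, -g, maxiter=cg_iters)
    return d
```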

Non-convex optimization
Using convex optimization methods with non-convex functions does not guarantee global convergence! There is no theoretically guaranteed global optimization, just heuristics: good initialization and multiresolution. [Plot showing a local minimum and the global minimum.]

Iterative majorization
Construct a majorizing function g(x, x^(k)) satisfying the majorizing inequality g(x, x^(k)) ≥ f(x) for all x, with equality g(x^(k), x^(k)) = f(x^(k)). The majorizing function g is convex or otherwise easier to optimize with respect to x.

Iterative majorization
Start with some x^(0). Repeat until convergence: find x^(k+1) such that g(x^(k+1), x^(k)) ≤ g(x^(k), x^(k)) = f(x^(k)) (for example, x^(k+1) = argmin over x of g(x, x^(k))), update the iterate, and increment the iteration counter. The last iterate is the solution.
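
A generic majorize-minimize loop (illustrative sketch): minimize_surrogate(x) is assumed to return a minimizer of a majorizing function g(., x) that touches f at x.

```python
import numpy as np

def iterative_majorization(minimize_surrogate, x0, tol=1e-6, max_iter=100):
    """Majorize-minimize: each step minimizes a surrogate g(., x_k) with
    g(x, x_k) >= f(x) for all x and g(x_k, x_k) = f(x_k), so f decreases monotonically."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = minimize_surrogate(x)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```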

Constrained optimization
[Section divider slide showing a "MINEFIELD, CLOSED ZONE" warning sign.]

Constrained optimization problems
Generic constrained minimization problem: minimize f(x) subject to g_i(x) ≤ 0 (inequality constraints) and h_j(x) = 0 (equality constraints). The subset of the search space in which the constraints hold is called the feasible set; a point belonging to the feasible set is called a feasible solution. A minimizer of the objective alone may be infeasible!

An example
[Plot of a feasible set bounded by an equality constraint and an inequality constraint.] An inequality constraint g_i is active at a point x if g_i(x) = 0 and inactive otherwise. A point x is regular if the gradients of the equality constraints and of the active inequality constraints are linearly independent.

Lagrange multipliers
Main idea for solving constrained problems: arrange the objective and the constraints into a single function L(x, λ, μ) = f(x) + Σ_i λ_i g_i(x) + Σ_j μ_j h_j(x) and minimize it as an unconstrained problem. L is called the Lagrangian, and λ and μ are called Lagrange multipliers.

KKT conditions
If x* is a regular point and a local minimum, there exist Lagrange multipliers λ* and μ* such that ∇f(x*) + Σ_i λ_i* ∇g_i(x*) + Σ_j μ_j* ∇h_j(x*) = 0, with λ_i* ≥ 0 and λ_i* g_i(x*) = 0 for all i, so that λ_i* can be positive only for active constraints and is zero for inactive constraints. These are known as the Karush-Kuhn-Tucker conditions. They are necessary but not sufficient!

KKT conditions
Sufficient conditions: if the objective f is convex, the inequality constraints g_i are convex, and the equality constraints h_j are affine, the KKT conditions are also sufficient. In this case, x* is the solution of the constrained problem (the global constrained minimizer).

Geometric interpretation
Consider a simpler problem: minimize f(x) subject to a single equality constraint h(x) = 0. At the solution, the gradient of the objective and the gradient of the constraint must line up: ∇f(x*) = −μ ∇h(x*).
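
A small worked example (added for illustration, not from the slides): minimize f(x) = x_1 + x_2 on the unit circle h(x) = x_1^2 + x_2^2 - 1 = 0.

```latex
L(x,\mu) = x_1 + x_2 + \mu\,(x_1^2 + x_2^2 - 1), \qquad
\nabla_x L = \begin{pmatrix} 1 + 2\mu x_1 \\ 1 + 2\mu x_2 \end{pmatrix} = 0
\;\Rightarrow\; x_1 = x_2 = -\tfrac{1}{2\mu}.
```

Substituting into h(x) = 0 gives μ = ±1/√2; the minimizer is x* = (−1/√2, −1/√2) with μ* = 1/√2, and indeed ∇f(x*) = (1, 1) = −μ* ∇h(x*): the two gradients line up at the solution.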

Penalty methods
Define a penalty aggregate f(x) + Σ_i φ(g_i(x)) + Σ_j ψ(h_j(x)), where φ and ψ are parametric penalty functions for the inequality and the equality constraints. For larger values of the penalty parameter, the penalty on the constraint violation is stronger.

Penalty methods
[Plots of the inequality penalty function and the equality penalty function for increasing values of the penalty parameter.]

Penalty methods
Start with some x^(0) and an initial value of the penalty parameter. Repeat until convergence: find x^(k+1) by solving an unconstrained optimization problem on the penalty aggregate, initialized with x^(k); increase the penalty parameter; and increment the iteration counter. The last iterate is the solution.
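
A quadratic-penalty sketch using scipy.optimize.minimize for the unconstrained subproblems (added for illustration; the quadratic penalty functions, the multiplier gamma = 10, and the function names are assumptions, not taken from the slides):

```python
import numpy as np
from scipy.optimize import minimize

def penalty_method(f, eq_constraints, ineq_constraints, x0,
                   rho=1.0, gamma=10.0, outer_iters=8):
    """Minimize f(x) + rho * [sum_j h_j(x)^2 + sum_i max(0, g_i(x))^2] for increasing rho,
    warm-starting each unconstrained subproblem at the previous solution."""
    x = np.asarray(x0, dtype=float)
    for _ in range(outer_iters):
        def aggregate(z):
            eq = sum(h(z) ** 2 for h in eq_constraints)
            ineq = sum(max(0.0, g(z)) ** 2 for g in ineq_constraints)
            return f(z) + rho * (eq + ineq)
        x = minimize(aggregate, x, method="Nelder-Mead").x   # unconstrained inner solver
        rho *= gamma                                          # strengthen the penalty
    return x

# Example: minimize x1^2 + x2^2 subject to x1 + x2 = 1 (exact solution [0.5, 0.5]).
print(penalty_method(lambda x: x @ x, [lambda x: x[0] + x[1] - 1.0], [], [2.0, 0.0]))
```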