4TE3/6TE3: Algorithms for Continuous Optimization

4TE3/6TE3 Algorithms for Continuous Optimization
(Algorithms for Constrained Nonlinear Optimization Problems)
Tamás Terlaky
Computing and Software, McMaster University, Hamilton
November 2005
terlaky@mcmaster.ca, Tel: 27780

Algorithms for constrained optimization: Linear Equality Constraints

(LEC)  $\min f(x)$  s.t.  $Ax = b$.

Here $f$ is continuously differentiable, $A$ is an $m \times n$ matrix with $\mathrm{rank}(A) = m$, and $b \in \mathbb{R}^m$.

Given a basis $B$, from $Ax = Bx_B + Nx_N = b$ we have $x_B = B^{-1}b - B^{-1}Nx_N$, so (LEC) becomes

$\min f_N(x_N)$, where $f_N(x_N) = f(x) = f(B^{-1}b - B^{-1}Nx_N,\; x_N)$.

This is an unconstrained problem. Further, writing $\nabla f(x)^T = \left((\nabla_B f(x))^T, (\nabla_N f(x))^T\right)$, the reduced gradient can be expressed as

$\nabla f_N(x_N)^T = -(\nabla_B f(x))^T B^{-1}N + (\nabla_N f(x))^T = \left((\nabla_B f(x))^T, (\nabla_N f(x))^T\right) \begin{pmatrix} -B^{-1}N \\ I \end{pmatrix}$.

The reduced Hessian:

$\nabla^2 f_N(x_N) = \left(-(B^{-1}N)^T,\; I\right) \nabla^2 f(x) \begin{pmatrix} -B^{-1}N \\ I \end{pmatrix}$.

Linear Equality Constraints: Null-space method - I

(LEC)  $\min f(x)$  s.t.  $Ax = b$.

$f$ is continuously differentiable, $A$ is an $m \times n$ matrix with $\mathrm{rank}(A) = m$, and $b \in \mathbb{R}^m$.

Let $\bar{x}$ be feasible, i.e., $A\bar{x} = b$. Then for $x = \bar{x} + s$ we have $Ax = A(\bar{x} + s) = b$ iff $As = 0$, thus (LEC) is equivalent to

(LEC)  $\min f(\bar{x} + s)$  s.t.  $As = 0$.

The vector $s$ is from the null space of the matrix $A$. If the columns of $Z$ (an $n \times (n-m)$ matrix) give a basis of the null space of $A$, then $s = Zv$ with $v \in \mathbb{R}^{n-m}$. Then (LEC) can be given as

(LEC)  $\min h(v) = f(\bar{x} + Zv)$.

This is an unconstrained problem!

Linear Equality Constraints: Null-space method - II

The null space can easily be given. Let $B$ be a basis from the column space of $A$; then $A = (B, N)$. The range (row) space of $A$ can be given by the basis vectors

$R = (I,\; B^{-1}N)$.

The null space is

$Z^T = \left(-(B^{-1}N)^T,\; I\right)$.

Clearly $RZ = AZ = 0$.

The gradient of $h(v)$ can be expressed as $\nabla h(v) = Z^T \nabla f(\bar{x} + Zv)$.

The Hessian of $h(v)$ can be expressed as $\nabla^2 h(v) = Z^T \nabla^2 f(\bar{x} + Zv) Z$.
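The null-space construction above can be checked numerically. The sketch below (a minimal illustration with made-up data $A$, $b$ and a quadratic objective) builds $Z = \begin{pmatrix}-B^{-1}N \\ I\end{pmatrix}$, verifies $AZ = 0$, and forms the reduced gradient and Hessian of $h(v) = f(\bar{x} + Zv)$; since $f$ is quadratic here, one reduced Newton step gives the exact constrained minimizer.

```python
import numpy as np

# Illustrative data: m = 1 constraint, n = 3 variables.
A = np.array([[1.0, 1.0, 1.0]])          # full row rank
b = np.array([3.0])
m, n = A.shape

# Quadratic test objective f(x) = 1/2 x^T Q x - c^T x.
Q = np.diag([2.0, 1.0, 3.0])
c = np.array([1.0, 0.0, 1.0])
grad_f = lambda x: Q @ x - c
hess_f = lambda x: Q

# Split A = (B, N); take the first m columns as the basis B.
B, N = A[:, :m], A[:, m:]
BinvN = np.linalg.solve(B, N)

# Null-space basis Z = [-B^{-1}N; I], so that A Z = 0.
Z = np.vstack([-BinvN, np.eye(n - m)])

# A feasible point xbar (A xbar = b) and the reduced derivatives of
# h(v) = f(xbar + Z v):  grad h = Z^T grad f,  hess h = Z^T hess f Z.
xbar = np.array([3.0, 0.0, 0.0])
g_red = Z.T @ grad_f(xbar)
H_red = Z.T @ hess_f(xbar) @ Z

# One reduced Newton step solves the quadratic problem exactly.
v = np.linalg.solve(H_red, -g_red)
x_star = xbar + Z @ v
print(x_star)        # constrained minimizer, still satisfies A x = b
```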

The reduced gradient method: Linear (in)equality constraints

(LC)  $\min f(x)$  s.t.  $Ax = b,\ x \geq 0$.

$f$ is continuously differentiable, $A$ is an $m \times n$ matrix with $\mathrm{rank}(A) = m$, and $b \in \mathbb{R}^m$.

Given a basis $B$ and a feasible $x = (x_B, x_N)$ such that $x_B > 0$ ($x_N$ does not have to be zero!), we have

$Bx_B + Nx_N = b \quad \Rightarrow \quad x_B = B^{-1}b - B^{-1}Nx_N$,

so the problem becomes

$\min f_N(x_N)$  s.t.  $B^{-1}b - B^{-1}Nx_N \geq 0,\ x_N \geq 0$,

where $f_N(x_N) = f(x) = f(B^{-1}b - B^{-1}Nx_N,\; x_N)$ and $\nabla f(x)^T = \left((\nabla_B f(x))^T, (\nabla_N f(x))^T\right)$.

The reduced gradient can be expressed as

$r := \nabla f_N(x_N)^T = -(\nabla_B f(x))^T B^{-1}N + (\nabla_N f(x))^T$.

The reduced gradient method: Linear constraints

$x^k \in \mathbb{R}^n$ is the current iterate; the basis is nondegenerate.

Search direction: $s^T = (s_B^T, s_N^T) \in \mathcal{N}(A)$ with $s_B = -B^{-1}Ns_N$ and $s_N$ properly given. The feasibility of $x^k + \lambda s$ is guaranteed as long as $x^k + \lambda s \geq 0$, i.e.,

$\lambda \leq \bar{\lambda} = \min_{1 \leq i \leq n,\; s_i < 0} \left\{ \frac{x_i^k}{-s_i} \right\}$.

Further, $s_N$ should be a descent direction of $f$:

$s_j = 0$ if $x_j^k = 0$ and $\frac{\partial f_N(x_N^k)}{\partial x_j} \geq 0$; otherwise $s_j = -\frac{\partial f_N(x_N^k)}{\partial x_j}$, for $j \in N$.

Make a line search:

$x^{k+1} = \arg\min_{0 \leq \lambda \leq \bar{\lambda}} f(x^k + \lambda s)$.

If all the coordinates of $x_B^{k+1}$ stay strictly positive we keep the basis; otherwise a pivot is made to eliminate the zero variable from the basis and replace it by a positive but currently nonbasic coordinate.
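The maximum feasible step $\bar{\lambda}$ above is a simple ratio test over the negative components of $s$. A minimal sketch (data made up for illustration):

```python
import numpy as np

def max_feasible_step(x, s):
    """Largest lam with x + lam*s >= 0:
       lam_bar = min over {i : s_i < 0} of x_i / (-s_i)."""
    neg = s < 0
    if not np.any(neg):
        return np.inf   # direction never leaves the nonnegative orthant
    return np.min(x[neg] / (-s[neg]))

x = np.array([2.0, 1.0, 4.0])
s = np.array([-1.0, 0.5, -8.0])
lam_bar = max_feasible_step(x, s)
print(lam_bar)   # the blocking coordinate is the third: 4/8
```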

The reduced gradient method: Convergent variant

$x^k \in \mathbb{R}^n$ is the current iterate; the basis is nondegenerate.

Search direction: $s^T = (s_B^T, s_N^T) \in \mathcal{N}(A)$ with $s_B = -B^{-1}Ns_N$ and $s_N$ given by

$s_j = \begin{cases} -x_j^k r_j & \text{if } r_j \geq 0, \\ -r_j & \text{otherwise,} \end{cases} \qquad j \in N$.

The feasibility of $x^k + \lambda s$ is guaranteed as long as $x^k + \lambda s \geq 0$, i.e.,

$\lambda \leq \bar{\lambda} = \min_{1 \leq i \leq n,\; s_i < 0} \left\{ \frac{x_i^k}{-s_i} \right\}$.

Theorem 1. The search direction $s$ at $x^k$ is always a descent direction unless $s = 0$. If $s = 0$, then $x^k$ is a KKT point of problem (LC).

Theorem 2. Any accumulation point of the sequence $\{x^k\}$ is a KKT point.
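The convergent choice of $s_N$ and the descent property of Theorem 1 can be checked numerically: by construction $r^T s_N = -\sum_{r_j \geq 0} x_j r_j^2 - \sum_{r_j < 0} r_j^2 \leq 0$, with equality exactly when $s_N = 0$, i.e., at a KKT point. A sketch with an illustrative reduced gradient:

```python
import numpy as np

def rg_direction(xN, r):
    """Convergent reduced-gradient direction:
       s_j = -x_j * r_j if r_j >= 0, else s_j = -r_j."""
    return np.where(r >= 0, -xN * r, -r)

xN = np.array([0.0, 2.0, 1.0])   # nonbasic coordinates (first one at bound)
r = np.array([3.0, -1.0, 0.5])   # illustrative reduced gradient
sN = rg_direction(xN, r)
print(sN)        # s_N = (0, 1, -0.5): bound component stays put
print(r @ sN)    # negative, i.e., a descent direction in the reduced space
```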

Remarks on the reduced gradient method

$x^k$ is not necessarily a basic solution (superbasic variables). Degeneracy implies possible zero steps, thus anticycling techniques are needed.

Convex simplex method: only one coordinate $j$ of $s_N$ can be nonzero,

$s_j = -\frac{\partial f_N(x_N^k)}{\partial x_j} > 0$, the rest of $s_N$ is zero; $s_B = -B^{-1}Ns_N = -B^{-1}a_j s_j$.

The simplex method for linear optimization (programming) is a further specialization to the case when $f(x) = c^T x$ for some $c \in \mathbb{R}^n$ and for the initial basis $B$ we have $x_N = 0$. The property $x_N = 0$ is preserved throughout the process. As in the convex simplex method, only one component $j$ of $s_N$ is allowed to be nonzero. Due to linearity, the line search always goes to the boundary and makes a basic coordinate (a coordinate of $x_B$) zero.

The Generalized Reduced Gradient (GRG) method

(NC)  $\min f(x)$  s.t.  $h_j(x) = 0,\ j = 1, \ldots, m$,  $x \geq 0$.

$f, h_1, \ldots, h_m$ are continuously differentiable.

Assumption: the gradients of the constraints $h_j$ are linearly independent at every point $x \geq 0$.

A feasible $x^k \geq 0$ with $h_j(x^k) = 0$ for all $j$ is given. The Jacobian of $H(x) = (h_1(x), \ldots, h_m(x))^T$ has full rank at each $x \geq 0$; denote it at $x^k$ by $A = JH(x^k)$. A basis $B$ with $x_B^k > 0$ is given.

For the linearized constraints we have

$H(x^k) + JH(x^k)(x - x^k) = 0 + A(x - x^k) = 0$.

From this one has $Bx_B + Nx_N = Ax^k$ [or $Bx_B + Nx_N = Ax^k - H(x^k)$], and by introducing $b = Ax^k$ we have $x_B = B^{-1}b - B^{-1}Nx_N$:

$\min f_N(x_N)$  s.t.  $B^{-1}b - B^{-1}Nx_N \geq 0,\ x_N \geq 0$,

GRG cntd.

where $f_N(x_N) = f(x) = f(B^{-1}b - B^{-1}Nx_N,\; x_N)$.

Using the notation $\nabla f(x)^T = \left((\nabla_B f(x))^T, (\nabla_N f(x))^T\right)$, the reduced gradient can be expressed as

$\nabla f_N(x_N)^T = -(\nabla_B f(x))^T B^{-1}N + (\nabla_N f(x))^T$.

Getting the search direction $s$ goes exactly the same way as in the linearly constrained case. Due to the nonlinearity of the constraints, $H(x^{k+1}) = H(x^k + \lambda s) = 0$ will not hold! Something must be done to recover feasibility.
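The slide leaves the restoration step open; one common device (an illustrative sketch, not the method as stated on the slide) is to freeze the nonbasic variables and drive $H$ back to zero by Newton's method in the basic variables. A minimal one-constraint example with made-up data:

```python
# Illustrative constraint h(x) = x1^2 + x2 - 2 = 0; x1 basic, x2 nonbasic.
h = lambda x1, x2: x1**2 + x2 - 2.0
dh_dx1 = lambda x1: 2.0 * x1          # Jacobian w.r.t. the basic variable

def restore_feasibility(x1, x2, tol=1e-12, itmax=50):
    """Newton iteration on the basic variable with x2 frozen."""
    for _ in range(itmax):
        val = h(x1, x2)
        if abs(val) < tol:
            break
        x1 -= val / dh_dx1(x1)
    return x1

# After a line search along the linearized constraint, the point has
# drifted off the true constraint surface:
x1, x2 = 1.2, 1.0                      # h = 1.44 + 1 - 2 = 0.44 != 0
x1 = restore_feasibility(x1, x2)
print(x1, h(x1, x2))                   # basic variable pulled back, h ~ 0
```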

SQP-I: Sequential quadratic programming, equality constraints

(NC)  $\min f(x)$  s.t.  $h_j(x) = 0,\ j = 1, \ldots, m$.

The Lagrange function is

$L(x, y) = f(x) + \sum_{j=1}^m y_j h_j(x)$,

where $y_j \in \mathbb{R}$, $j = 1, \ldots, m$. Let us denote $H(x) = (h_1(x), \ldots, h_m(x))^T$. Then the KKT conditions are:

$\nabla_x L(x, y) = 0$,
$H(x) = 0$.

Let a candidate solution $(x^k, y^k)$ be given and apply Newton's method to solve this nonlinear equation system:

$\nabla_{xx}^2 L(x^k, y^k)\, \Delta x + (JH(x^k))^T \Delta y = -\nabla_x L(x^k, y^k)$,
$JH(x^k)\, \Delta x = -H(x^k)$.

SQP-II: Sequential quadratic programming, equality constraints

This equation system

$\nabla_{xx}^2 L(x^k, y^k)\, \Delta x + (JH(x^k))^T \Delta y = -\nabla_x L(x^k, y^k)$,
$JH(x^k)\, \Delta x = -H(x^k)$

is the KKT condition of the following linearly constrained quadratic optimization problem:

$\min\ \tfrac{1}{2} \Delta x^T \nabla_{xx}^2 L(x^k, y^k)\, \Delta x + \nabla_x L(x^k, y^k)^T \Delta x$  s.t.  $JH(x^k)\, \Delta x = -H(x^k)$.

One needs to: solve this quadratic problem at each iteration, make a line search where feasibility and optimality need to be considered, and repeat the process from the new point until the optimality condition is satisfied.
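One SQP iteration amounts to assembling and solving the block linear system above. A sketch on a small illustrative problem, $\min x_1^2 + x_2^2$ s.t. $x_1 + x_2 = 1$, whose solution is $(1/2, 1/2)$ with multiplier $y = -1$; since $f$ is quadratic and $h$ linear, the first Newton step already lands on the solution:

```python
import numpy as np

# min f(x) = x1^2 + x2^2   s.t.   h(x) = x1 + x2 - 1 = 0
grad_f = lambda x: 2.0 * x
hess_L = lambda x, y: 2.0 * np.eye(2)    # f quadratic, h linear
H = lambda x: np.array([x[0] + x[1] - 1.0])
JH = lambda x: np.array([[1.0, 1.0]])

x, y = np.array([2.0, -1.0]), np.array([0.0])
for _ in range(5):
    gL = grad_f(x) + JH(x).T @ y         # grad_x L(x, y)
    # Newton-KKT system:
    # [ hess_L   JH^T ] [dx]   [ -grad_x L ]
    # [ JH        0   ] [dy] = [ -H(x)     ]
    K = np.block([[hess_L(x, y), JH(x).T],
                  [JH(x), np.zeros((1, 1))]])
    rhs = -np.concatenate([gL, H(x)])
    step = np.linalg.solve(K, rhs)
    x, y = x + step[:2], y + step[2:]

print(x, y)   # x -> (0.5, 0.5), multiplier y -> -1
```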

Barrier functions

How can one replace the constraint $t \geq 0$ (i.e., $g_j(x) \leq 0$, with $t = -g_j(x)$) by a good barrier function?

Desired properties of a barrier function $B(t)$ for $t \geq 0$:
1. $B(t)$ is smooth (infinitely many times differentiable) and strictly convex.
2. When combined with an $f(t)$, the function $f(t) + B(t)$ should not assume its minimum at $t = 0$: the derivative of $B(t)$ goes to $-\infty$ as $t \to 0$.
3. $B(t)$ goes to infinity as $t \to 0$.

Functions satisfying all three properties are barrier functions; functions satisfying the first two properties are quasi-barriers.

Note: for barrier functions you need inequality constraints!

Quasi-barrier and barrier functions

Quasi-barrier functions:
- The entropy function $t \log t$.
- Let $0 < r < 1$. The function $-t^r$.

Barrier functions:
1. The logarithmic barrier function $-\log t$.
2. Let $r > 1$. The inverse barrier function $t^{-r}$.

Then the barrier function for the (CO) problem

(CO)  $\min f(x)$  s.t.  $g_j(x) \leq 0,\ j = 1, \ldots, m$

is given by

$f_\mu(x) = f(x) + \mu \sum_{j=1}^m B(-g_j(x)) \qquad (1)$

where $\mu > 0$. The original problem (CO) is solved by successively minimizing the function $f_\mu(x)$ for a series of $\mu$ values as $\mu \to 0$.
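To make (1) concrete with the logarithmic barrier, take the illustrative toy problem $\min x$ s.t. $g(x) = 1 - x \leq 0$. Then $f_\mu(x) = x - \mu \log(x - 1)$, whose minimizer is $x(\mu) = 1 + \mu$ (set $f_\mu'(x) = 1 - \mu/(x-1) = 0$), so $x(\mu) \to 1$, the constrained optimum, as $\mu \to 0$. A sketch locating $x(\mu)$ by bisection on the derivative:

```python
import math

def f_mu(x, mu):
    # f(x) + mu * B(-g(x)) with B(t) = -log t and g(x) = 1 - x
    return x - mu * math.log(x - 1.0)

def x_of_mu(mu):
    """Minimize f_mu by bisection on f_mu'(x) = 1 - mu/(x - 1)."""
    lo, hi = 1.0 + 1e-12, 11.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - mu / (mid - 1.0) < 0.0:
            lo = mid          # derivative negative: minimizer lies right
        else:
            hi = mid
    return 0.5 * (lo + hi)

for mu in [1.0, 0.1, 0.01]:
    print(mu, x_of_mu(mu))    # x(mu) = 1 + mu, tending to 1
```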

Generic IPM

Input:
- $\mu = \mu^0$, the barrier parameter value;
- $\theta$, the reduction parameter, $0 < \theta < 1$;
- $\epsilon > 0$, the accuracy parameter;
- $x^0$, a given interior feasible point.

Step 0: $x := x^0$, $\mu := \mu^0$;
Step 1: If $\mu < \epsilon$, STOP; $x(\mu)$ is returned as solution.
Step 2: Calculate (approximately) $x(\mu)$;
Step 3: $\mu := (1 - \theta)\mu$;
Step 4: GO TO Step 1.
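The generic scheme above can be sketched directly. Below, Step 2 is an inner Newton iteration on the log-barrier function of the illustrative problem $\min x^2$ s.t. $x \geq 1$, so $f_\mu(x) = x^2 - \mu \log(x - 1)$; as $\mu \downarrow 0$ the outer loop drives the iterate to the constrained minimizer $x^* = 1$. All problem data are made up for illustration:

```python
def barrier_step(x, mu, inner=20):
    """Step 2: approximately minimize f_mu(x) = x^2 - mu*log(x-1) by Newton."""
    for _ in range(inner):
        g = 2.0 * x - mu / (x - 1.0)        # f_mu'(x)
        h = 2.0 + mu / (x - 1.0) ** 2       # f_mu''(x) > 0
        step = -g / h
        while x + step <= 1.0:              # damp: stay strictly interior
            step *= 0.5
        x += step
    return x

mu, theta, eps = 1.0, 0.5, 1e-8             # Step 0 / input
x = 2.0                                     # interior feasible point
while mu >= eps:                            # Step 1
    x = barrier_step(x, mu)                 # Step 2: x ~ x(mu)
    mu *= 1.0 - theta                       # Step 3
print(x)                                    # close to the optimum x* = 1
```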

Remarks on IPMs

IPM for LO will be discussed. Newton's method is used to calculate an approximation of the new target point $x(\mu)$.

A smoothness condition, self-concordance, is needed to control Newton's method efficiently.

A proximity measure: the length of the Newton step in the Hessian norm. If the measure is large, a fixed reduction of the barrier value is realized. If the measure is small, Newton's method is quadratically convergent.

Depending on $\theta$ we have small-step or large-step methods. In small-step methods $\mu$ is reduced by a small value (e.g., multiplied by $(1 - \frac{\gamma}{\sqrt{n}})$ with some $\gamma > 0$) at each iteration, while in large-step methods it is significantly reduced (let us say divided by ten).

Log-barrier methods and Lagrange multiplier estimates

The log-barrier function for the (CO) problem

(CO)  $\min f(x)$  s.t.  $g_j(x) \leq 0,\ j = 1, \ldots, m$

for $\mu > 0$ is given by

$f_\mu(x) = f(x) - \mu \sum_{j=1}^m \log(-g_j(x))$.

The optimality condition when minimizing $f_\mu(x)$ is:

$\nabla f(x) + \sum_{j=1}^m \frac{\mu}{-g_j(x)} \nabla g_j(x) = 0. \qquad (*)$

On the other hand, the Lagrange function for (CO) is

$L(x, y) = f(x) + \sum_{j=1}^m y_j g_j(x)$.

In the Wolfe dual we get the constraint

$\nabla f(x) + \sum_{j=1}^m y_j \nabla g_j(x) = 0, \qquad (**)$

the optimality condition to minimize $L(x, y)$ in $x$. Comparing $(*)$ and $(**)$ we have that $\frac{\mu}{-g_j(x)}$ is an estimate of $y_j$, the Lagrange multiplier.
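The estimate can be checked on the illustrative toy problem $\min x$ s.t. $g(x) = 1 - x \leq 0$: the KKT condition $\nabla f + y \nabla g = 1 - y = 0$ gives the exact multiplier $y = 1$, and along the central path $x(\mu) = 1 + \mu$ the estimate $\mu / (-g(x(\mu)))$ reproduces it for every $\mu$. A sketch with this made-up problem:

```python
def multiplier_estimate(mu):
    # Central-path minimizer of x - mu*log(x - 1) is x(mu) = 1 + mu.
    x = 1.0 + mu
    g = 1.0 - x                      # g(x) < 0 in the interior; here g = -mu
    return mu / (-g)                 # barrier estimate of the multiplier y

for mu in [1.0, 0.1, 0.001]:
    print(mu, multiplier_estimate(mu))   # stays at the exact multiplier y = 1
```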

Penalty functions

How can one force the equality constraints $t = 0$ (i.e., $h_i(x) = 0$) and the inequality constraints $t \leq 0$ (i.e., $g_j(x) \leq 0$) by penalizing the non-satisfaction of these constraints? What are the desirable properties of a penalty function?

Desired properties of a penalty function $P(t)$:
1. $P(t)$ is nonnegative and strictly convex;
2. $P(t) = 0$ for feasible points;
3. $P(t)$ goes to infinity as infeasibility increases;
4. $P(t)$ increases sharply as infeasibility occurs;
5. $P(t)$ is smooth (infinitely many times differentiable).

Functions satisfying at least the first three properties are called penalty functions.

Penalty functions

For the equality constraints $t = 0$ (i.e., $h_i(x) = 0$):
- Quadratic penalty function: $P(t) = t^2$.
- Exact penalty function: $P(t) = |t|$.

For the inequality constraints $t \leq 0$ (i.e., $g_j(x) \leq 0$):
1. Quadratic penalty function: $P(t) = (\max\{0, t\})^2 = \begin{cases} 0 & \text{if } t \leq 0; \\ t^2 & \text{if } t > 0. \end{cases}$
2. Exact penalty function: $P(t) = \max\{0, t\}$.

(CO)  $\min f(x)$  s.t.  $g_j(x) \leq 0,\ j = 1, \ldots, m$,  $h_i(x) = 0,\ i = 1, \ldots, k$.

The quadratic penalty function for (CO) is:

$P(x) = f(x) + \vartheta \left( \sum_{j=1}^m (\max\{0, g_j(x)\})^2 + \sum_{i=1}^k (h_i(x))^2 \right)$.

The exact penalty function for (CO) is:

$P(x) = f(x) + \vartheta \left( \sum_{j=1}^m \max\{0, g_j(x)\} + \sum_{i=1}^k |h_i(x)| \right)$.
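The quadratic penalty approach can be illustrated on the made-up problem $\min x^2$ s.t. $h(x) = x - 1 = 0$: the penalized function $x^2 + \vartheta (x - 1)^2$ has the closed-form minimizer $x(\vartheta) = \vartheta / (1 + \vartheta)$ (set the derivative $2x + 2\vartheta(x - 1)$ to zero), which is infeasible for every finite $\vartheta$ and approaches the constrained optimum $x^* = 1$ only as $\vartheta \to \infty$:

```python
def penalty_minimizer(theta):
    # argmin of P(x) = x^2 + theta*(x - 1)^2; setting
    # P'(x) = 2x + 2*theta*(x - 1) = 0 gives x = theta/(1 + theta).
    return theta / (1.0 + theta)

for theta in [1.0, 10.0, 1000.0]:
    x = penalty_minimizer(theta)
    print(theta, x, x - 1.0)   # infeasibility h(x) = -1/(1+theta) shrinks
```

This is the usual trade-off of pure quadratic penalties: feasibility is only reached in the limit, which motivates exact penalties and augmented Lagrangians.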