Numerical Optimization of Partial Differential Equations


Numerical Optimization of Partial Differential Equations
Part I: Basic Optimization Concepts in R^N
Bartosz Protas
Department of Mathematics & Statistics, McMaster University, Hamilton, Ontario, Canada
URL: http://www.math.mcmaster.ca/bprotas
Rencontres Normandes sur les aspects théoriques et numériques des EDP, 5-9 November 2018, Rouen

Outline: Unconstrained Optimization Problems; Optimality Conditions; Gradient Flows; Lagrange Multipliers; Projected Gradients

A good reference for standard approaches

Suppose f : R^N → R, N ≥ 1, is a twice continuously differentiable objective function.

Unconstrained optimization problem: min_{x ∈ R^N} f(x) (for maximization problems we can consider min [−f(x)]).

A point x̄ is a global minimizer if f(x̄) ≤ f(x) for all x.
A point x̄ is a local minimizer if there exists a neighborhood N of x̄ such that f(x̄) ≤ f(x) for all x ∈ N.
A local minimizer is strict (or strong) if it is the unique minimizer in N.

Gradient of the objective function: ∇f(x) := [∂f/∂x_1, …, ∂f/∂x_N]^T
Hessian of the objective function: [∇²f(x)]_{i,j} := ∂²f/∂x_j ∂x_i, i, j = 1, …, N

Theorem (First-Order Necessary Condition). If x̄ is a local minimizer, then ∇f(x̄) = 0.

Theorem (Second-Order Sufficient Conditions). Suppose that ∇f(x̄) = 0 and ∇²f(x̄) is positive definite. Then x̄ is a strict local minimizer of f.

Unfortunately, an analogous characterization of global minimizers is not possible.

How to find a local minimizer x̄? Consider the following initial-value problem in R^N, known as the gradient flow:

(GF)  dx(τ)/dτ = −∇f(x(τ)), τ > 0,
      x(0) = x_0,

where τ is a pseudo-time (a parametrization) and x_0 is a suitable initial guess. Then lim_{τ→∞} x(τ) = x̄.

In principle, the gradient flow may converge to a saddle point x_s, where ∇f(x_s) = 0 but the Hessian ∇²f(x_s) is not positive definite; in actual computations, however, this is very unlikely.
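As a quick illustration (not from the slides), the gradient flow can be integrated numerically for a small quadratic objective, whose steady state is the known minimizer x̄ = A⁻¹b; the matrix A, vector b, initial guess, and integration horizon below are arbitrary choices:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Quadratic objective f(x) = 1/2 x^T A x - b^T x with an SPD matrix A;
# its unique minimizer satisfies A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad_f(x):
    return A @ x - b  # gradient of the quadratic form

# Gradient flow dx/dtau = -grad f(x), integrated to a large pseudo-time
sol = solve_ivp(lambda tau, x: -grad_f(x), (0.0, 20.0), [5.0, -5.0],
                rtol=1e-10, atol=1e-12)

x_inf = sol.y[:, -1]           # state at the final pseudo-time
x_bar = np.linalg.solve(A, b)  # exact minimizer, for comparison
```

Since the smallest eigenvalue of A is about 1.38, the flow contracts like e^(−1.38 τ), so by τ = 20 the state is at the minimizer to within the integrator tolerance.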

Discretize the gradient flow (GF) with Euler's explicit method to obtain the steepest-descent iteration:

(SD)  x^(n+1) = x^(n) − Δτ ∇f(x^(n)), n = 0, 1, 2, …,
      x^(0) = x_0,

where x^(n) := x(nΔτ), so that lim_{n→∞} x^(n) = x̄. Here Δτ is a fixed step size (since Euler's explicit scheme is only conditionally stable, Δτ must be sufficiently small).

In principle, the gradient flow (GF) can be discretized with higher-order schemes, including implicit approaches, but they are not easy to apply to PDE optimization problems and hence will not be considered here.

Algorithm 1 Steepest Descent (SD)
1: x^(0) ← x_0 (initial guess)
2: n ← 0
3: repeat
4:   compute the gradient ∇f(x^(n))
5:   update x^(n+1) = x^(n) − Δτ ∇f(x^(n))
6:   n ← n + 1
7: until |f(x^(n)) − f(x^(n−1))| / |f(x^(n−1))| < ε_f
Input: x_0 (initial guess), Δτ (fixed step size), ε_f (tolerance in the termination condition)
Output: an approximation of the minimizer x̄
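Algorithm 1 can be sketched in a few lines of Python; the quadratic test objective and the parameter values below are illustrative choices, not taken from the slides:

```python
import numpy as np

def steepest_descent(f, grad_f, x0, dtau, eps_f, max_iter=100000):
    """Fixed-step steepest descent (Algorithm 1, SD)."""
    x = np.asarray(x0, dtype=float)
    f_old = f(x)
    for _ in range(max_iter):
        x = x - dtau * grad_f(x)       # explicit Euler step on (GF)
        f_new = f(x)
        # relative-decrease termination condition
        if abs(f_new - f_old) < eps_f * abs(f_old):
            break
        f_old = f_new
    return x

# Illustrative quadratic: f(x) = 1/2 x^T A x - b^T x, minimizer A^{-1} b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b

x_min = steepest_descent(f, grad_f, [0.0, 0.0], dtau=0.1, eps_f=1e-14)
```

Note the stability restriction in action: Δτ = 0.1 is well below 2/λ_max ≈ 0.55 for this A; a larger step would make the iteration diverge.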

Computational Tests

Rosenbrock's "banana" function: f(x_1, x_2) = 100 (x_2 − x_1²)² + (1 − x_1)²
Global minimizer: x̄_1 = x̄_2 = 1, with f(1, 1) = 0
The function is known for its poor conditioning: the eigenvalues of the Hessian ∇²f at the minimum are λ_1 ≈ 0.4 and λ_2 ≈ 1001.6.
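The Rosenbrock function, its gradient, and the Hessian eigenvalues quoted above can be checked directly (a small verification script, not part of the slides):

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock's banana function."""
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def rosenbrock_grad(x):
    """Analytical gradient of the Rosenbrock function."""
    return np.array([
        -400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0]**2),
    ])

def rosenbrock_hess(x):
    """Analytical Hessian of the Rosenbrock function."""
    return np.array([
        [1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
        [-400.0 * x[0], 200.0],
    ])

# Eigenvalues of the Hessian at the minimizer (1, 1): the extreme ratio
# lambda_2 / lambda_1 ~ 2500 is what makes this a hard test problem.
lam = np.linalg.eigvalsh(rosenbrock_hess(np.array([1.0, 1.0])))
print(lam)  # approximately [0.4, 1001.6]
```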

Choice of the step size τ: steepest descent is not meant to approximate the gradient flow (GF) accurately, but to minimize f(x) rapidly.

Sufficient decrease (Armijo's condition):
f(x^(n) + τ p^(n)) ≤ f(x^(n)) + C τ ∇f(x^(n))^T p^(n),
where p^(n) is a (descent) search direction and C ∈ (0, 1). [Figure credit: Nocedal & Wright (1999)]

Wolfe's conditions: sufficient decrease and curvature.
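A standard backtracking line search enforcing Armijo's condition might look as follows (the starting step, reduction factor, and constant C are common but arbitrary choices, not from the slides):

```python
import numpy as np

def backtracking_armijo(f, grad_f, x, p, C=1e-4, rho=0.5, tau0=1.0):
    """Shrink tau until the sufficient-decrease (Armijo) condition holds:
    f(x + tau p) <= f(x) + C tau grad_f(x)^T p, with p a descent direction."""
    fx = f(x)
    slope = grad_f(x) @ p   # directional derivative; negative for descent
    tau = tau0
    while f(x + tau * p) > fx + C * tau * slope:
        tau *= rho          # backtrack
    return tau

# Illustrative use on Rosenbrock's function from the earlier slide
f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
g = lambda x: np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                        200.0 * (x[1] - x[0]**2)])

x = np.array([-1.2, 1.0])
p = -g(x)                   # steepest-descent direction
tau = backtracking_armijo(f, g, x, p)
```

Because p is a descent direction and f is smooth, the loop is guaranteed to terminate with a step satisfying the sufficient-decrease inequality.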

Alternatively, optimize the step size at every iteration by solving the line-minimization (line-search) problem
τ_n := argmin_{τ>0} f(x^(n) − τ ∇f(x^(n))).

Brent's method for line minimization: a combination of golden-section search with parabolic interpolation (derivative-free). [Figure credit: Numerical Recipes in C (1992)]

A robust implementation of Brent's method is available in Numerical Recipes in C (1992); see also the function fminbnd in MATLAB.

Algorithm 2 Steepest Descent with Line Search (SDLS)
1: x^(0) ← x_0 (initial guess)
2: n ← 0
3: repeat
4:   compute the gradient ∇f(x^(n))
5:   determine the optimal step size τ_n = argmin_{τ>0} f(x^(n) − τ ∇f(x^(n)))
6:   update x^(n+1) = x^(n) − τ_n ∇f(x^(n))
7:   n ← n + 1
8: until |f(x^(n)) − f(x^(n−1))| / |f(x^(n−1))| < ε_f
Input: x_0 (initial guess), ε_τ (tolerance in line search), ε_f (tolerance in the termination condition)
Output: an approximation of the minimizer x̄
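Algorithm 2 can be prototyped with SciPy's bounded scalar minimizer (a Brent-type method, analogous to MATLAB's fminbnd); the test problem, step-size bound, tolerances, and iteration budget are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sdls(f, grad_f, x0, eps_f=1e-12, tau_max=1.0, max_iter=10000):
    """Steepest descent with line search (Algorithm 2, SDLS)."""
    x = np.asarray(x0, dtype=float)
    f_old = f(x)
    for _ in range(max_iter):
        g = grad_f(x)
        # line search: tau_n = argmin over (0, tau_max] of f(x - tau g)
        res = minimize_scalar(lambda tau: f(x - tau * g),
                              bounds=(0.0, tau_max), method='bounded',
                              options={'xatol': 1e-12})
        x = x - res.x * g
        f_new = f(x)
        if abs(f_new - f_old) < eps_f * abs(f_old):
            break
        f_old = f_new
    return x

# Illustrative run on Rosenbrock's function: progress is monotone but slow,
# reflecting the zigzagging of steepest descent in an ill-conditioned valley.
f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
g = lambda x: np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                        200.0 * (x[1] - x[0]**2)])
x_min = sdls(f, g, [-1.2, 1.0])
```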

Consider, for now, minimization of the quadratic form f(x) = ½ x^T A x − b^T x, where A ∈ R^{N×N} is a symmetric positive-definite matrix and b ∈ R^N. Then ∇f(x) = Ax − b =: r, so minimizing f(x) is equivalent to solving the linear system Ax = b.

A set of nonzero vectors {p^(0), p^(1), …, p^(k)} is said to be conjugate with respect to the matrix A if
(p^(i))^T A p^(j) = 0, i, j = 0, …, k, i ≠ j
(conjugacy implies linear independence).

Conjugate Gradient (CG) method:

x^(n+1) = x^(n) + τ_n p^(n), n = 1, 2, …,
p^(n) = −r^(n) + β^(n) p^(n−1),   (r^(n) = ∇f(x^(n)) = A x^(n) − b),
β^(n) = (r^(n))^T A p^(n−1) / [(p^(n−1))^T A p^(n−1)]   ("momentum"),
τ_n = −(r^(n))^T p^(n) / [(p^(n))^T A p^(n)]   (exact formula for the optimal step size),
x^(0) = x_0, p^(0) = −r^(0).

The directions p^(0), p^(1), …, p^(n) generated by the CG method are conjugate with respect to the matrix A; this gives rise to a number of interesting and useful properties.
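A direct transcription of the (linear) CG recurrences, with a small SPD test system (the matrix, right-hand side, and tolerances are illustrative choices):

```python
import numpy as np

def conjugate_gradient(A, b, x0):
    """Linear CG: solves A x = b for SPD A in at most N iterations."""
    x = np.asarray(x0, dtype=float)
    r = A @ x - b                   # residual = gradient of the quadratic form
    p = -r                          # initial direction: steepest descent
    directions = []
    for _ in range(len(b)):
        if np.linalg.norm(r) < 1e-14:
            break
        Ap = A @ p
        tau = -(r @ p) / (p @ Ap)   # exact optimal step along p
        x = x + tau * p
        r = A @ x - b
        beta = (r @ Ap) / (p @ Ap)  # "momentum" coefficient
        directions.append(p)
        p = -r + beta * p
    return x, directions

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])    # symmetric positive definite
b = np.array([1.0, 2.0, 3.0])
x_cg, dirs = conjugate_gradient(A, b, np.zeros(3))
```

After N = 3 iterations the solution is exact (up to rounding), and the stored directions are mutually conjugate with respect to A.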

Theorem (properties of CG iterations). The iterates generated by the CG method have the following properties:
span{p^(0), p^(1), …, p^(n)} = span{r^(0), r^(1), …, r^(n)} = span{r^(0), A r^(0), …, A^n r^(0)} (the expanding-subspace property);
(r^(n))^T r^(k) = (r^(n))^T p^(k) = 0 for k = 0, …, n − 1;
x^(n) is the minimizer of f(x) = ½ x^T A x − b^T x over the set {x_0 + span{p^(0), p^(1), …, p^(n−1)}}.

Thus, in the CG method, minimization of f(x) = ½ x^T A x − b^T x is performed by solving (exactly) N = dim(x) line-minimization problems along the conjugate directions p^(0), p^(1), …, p^(N−1). As a result, convergence to x̄ is achieved in at most N iterations.

What happens when f(x) is a general convex function?

The (linear) CG method admits a generalization to the nonlinear setting by:
replacing the residual r^(n) with the gradient ∇f(x^(n));
computing the step size via line search, τ_n = argmin_{τ>0} f(x^(n) + τ p^(n));
using more general expressions for the momentum term β^(n) (so that the descent directions p^(0), p^(1), …, p^(n) will only be approximately conjugate).

Nonlinear Conjugate Gradient (NCG) method:

x^(n+1) = x^(n) + τ_n p^(n), n = 1, 2, …,
p^(n) = −∇f(x^(n)) + β^(n) p^(n−1),
β^(n) = [∇f(x^(n))]^T ∇f(x^(n)) / [∇f(x^(n−1))]^T ∇f(x^(n−1))   (Fletcher-Reeves),
β^(n) = [∇f(x^(n))]^T [∇f(x^(n)) − ∇f(x^(n−1))] / [∇f(x^(n−1))]^T ∇f(x^(n−1))   (Polak-Ribière),
τ_n = argmin_{τ>0} f(x^(n) + τ p^(n)),
x^(0) = x_0, p^(0) = −∇f(x^(0)).

For quadratic functions f(x), both the Fletcher-Reeves (FR) and the Polak-Ribière (PR) variants coincide with the linear CG method. In general, the descent directions p^(0), p^(1), …, p^(n) are now only approximately conjugate.

Algorithm 3 Polak-Ribière version of Conjugate Gradient (CG-PR)
1: x^(0) ← x_0 (initial guess)
2: n ← 0
3: repeat
4:   compute the gradient ∇f(x^(n))
5:   calculate β^(n) = [∇f(x^(n))]^T [∇f(x^(n)) − ∇f(x^(n−1))] / [∇f(x^(n−1))]^T ∇f(x^(n−1))
6:   determine the descent direction p^(n) = −∇f(x^(n)) + β^(n) p^(n−1)
7:   determine the optimal step size τ_n = argmin_{τ>0} f(x^(n) + τ p^(n))
8:   update x^(n+1) = x^(n) + τ_n p^(n)
9:   n ← n + 1
10: until |f(x^(n)) − f(x^(n−1))| / |f(x^(n−1))| < ε_f
Input: x_0 (initial guess), ε_τ (tolerance in line search), ε_f (tolerance in the termination condition)
Output: an approximation of the minimizer x̄
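A sketch of Algorithm 3 in Python, with the common "PR+" safeguard (clipping β at zero, which amounts to an occasional restart from steepest descent); the line-search bound and iteration count are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ncg_pr(f, grad_f, x0, n_iter=200, tau_max=1.0):
    """Nonlinear CG, Polak-Ribiere variant (Algorithm 3, CG-PR)."""
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    p = -g                          # first direction: steepest descent
    for _ in range(n_iter):
        res = minimize_scalar(lambda tau: f(x + tau * p),
                              bounds=(0.0, tau_max), method='bounded',
                              options={'xatol': 1e-12})
        x = x + res.x * p
        g_new = grad_f(x)
        # Polak-Ribiere momentum, clipped at zero ("PR+" safeguard,
        # equivalent to restarting from steepest descent when beta < 0)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        p = -g_new + beta * p
        g = g_new
    return x

f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
g = lambda x: np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                        200.0 * (x[1] - x[0]**2)])
x_min = ncg_pr(f, g, [-1.2, 1.0])
```

In contrast to plain steepest descent, the momentum term lets the iterates follow the curved valley of the Rosenbrock function far more efficiently.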

Convergence theory, quadratic case (I). Let f(x) = ½ x^T A x − b^T x, where the matrix A has eigenvalues 0 < λ_1 ≤ … ≤ λ_N, and let ‖x‖²_A := x^T A x.

Theorem (Linear Convergence of Steepest Descent). For the steepest-descent approach we have the following estimate:
‖x^(n+1) − x̄‖²_A ≤ [(λ_N − λ_1)/(λ_N + λ_1)]² ‖x^(n) − x̄‖²_A.

The rate of convergence is controlled by the spread of the eigenvalues of A.

Convergence theory, quadratic case (II).

Theorem (Convergence of Linear CG). For the linear CG approach we have the following estimate:
‖x^(n+1) − x̄‖²_A ≤ [(λ_{N−n} − λ_1)/(λ_{N−n} + λ_1)]² ‖x^(0) − x̄‖²_A.

The iterates "take out" one eigenvalue at a time, so clustering of the eigenvalues matters.

In the nonlinear setting it is advantageous to periodically reset β^(n) to zero (this is helpful in practice and simplifies some convergence proofs).

What about problems with equality constraints? Suppose c : R^N → R^M, where 1 ≤ M < N. Then we have the equality-constrained optimization problem
min_{x ∈ R^N} f(x) subject to c(x) = 0.

If the constraint equation can be solved, so that we can write x = (y, z) with y ∈ R^{N−M} and z = g(y) ∈ R^M, then the problem is reduced to an unconstrained one with a reduced objective function:
min_{y ∈ R^{N−M}} f(y, g(y)).

Consider the augmented objective function (Lagrangian) L : R^N × R^M → R,
L(x, λ) := f(x) − λ^T c(x),
where λ ∈ R^M is the Lagrange multiplier. Differentiating the augmented objective function with respect to x gives
∇_x L(x, λ) = ∇f(x) − [∇c(x)]^T λ.

Theorem (First-Order Necessary Condition). If x̄ is a local minimizer of an equality-constrained optimization problem, then there exists λ̄ ∈ R^M such that the following equations are satisfied:
∇f(x̄) − [∇c(x̄)]^T λ̄ = 0,  c(x̄) = 0.

For inequality-constrained problems, the first-order necessary conditions become more complicated: the Karush-Kuhn-Tucker (KKT) conditions.
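For a toy problem the first-order system can be solved directly. Take f(x) = x_1² + x_2² with the single constraint c(x) = x_1 + x_2 − 1 (an illustrative example, not from the slides); stationarity and feasibility then form a linear system in (x_1, x_2, λ):

```python
import numpy as np

# min x1^2 + x2^2  subject to  x1 + x2 - 1 = 0.
# Stationarity: grad f - lambda * grad c = 0  =>  2 x - lambda * (1, 1) = 0
# Feasibility:  x1 + x2 = 1
# The unknowns (x1, x2, lambda) satisfy the linear KKT system:
K = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(K, rhs)
# solution: x1 = x2 = 0.5, lambda = 1
```

The multiplier λ = 1 measures how fast the optimal value grows as the constraint level is relaxed.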

How to compute equality-constrained minimizers with a gradient method? At each x ∈ R^N, the constraint Jacobian ∇c(x) ∈ R^{M×N} defines the (kernel) subspace
S_x := {x' ∈ R^N : ∇c(x) x' = 0}
of dimension N − rank[∇c(x)]; this is the subspace tangent to the constraint manifold at x. We therefore need to project the gradient ∇f(x) onto S_x.

Assuming that rank[∇c(x)] = M, the projection operator P_{S_x} : R^N → S_x is given by
P_{S_x} := I − [∇c(x)]^T [∇c(x) (∇c(x))^T]^{−1} ∇c(x).

Replacing ∇f(x) with P_{S_x} ∇f(x) in the gradient method (SD or SDLS), the nonlinear constraints are satisfied with an error O((Δτ)²) or O(τ_n²).
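The projector can be assembled directly from the constraint Jacobian; the sketch below (with an arbitrary two-constraint Jacobian in R^4, chosen for illustration) checks its defining properties P² = P, P = P^T, and ∇c P v = 0:

```python
import numpy as np

def tangent_projector(Jc):
    """Orthogonal projector onto the kernel of the constraint Jacobian Jc
    (shape M x N, assumed full rank M): P = I - Jc^T (Jc Jc^T)^{-1} Jc."""
    M, N = Jc.shape
    return np.eye(N) - Jc.T @ np.linalg.solve(Jc @ Jc.T, Jc)

# Illustrative full-rank Jacobian of two constraints in R^4
Jc = np.array([[1.0, 2.0, 0.0, 1.0],
               [0.0, 1.0, 1.0, 3.0]])
P = tangent_projector(Jc)
v = np.array([1.0, -2.0, 0.5, 4.0])  # arbitrary vector to project
```

Any projected vector P v lies in the tangent subspace S_x, so a step along −P ∇f moves (to first order) along the constraint manifold.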

Algorithm 4 Projected Steepest Descent (PSD)
1: x^(0) ← x_0 (initial guess)
2: n ← 0
3: repeat
4:   compute the gradient ∇f(x^(n))
5:   compute the linearization of the constraint, ∇c(x^(n))
6:   determine the projector P_{S_{x^(n)}}
7:   determine the projected gradient P_{S_{x^(n)}} ∇f(x^(n))
8:   update x^(n+1) = x^(n) − Δτ P_{S_{x^(n)}} ∇f(x^(n))
9:   n ← n + 1
10: until |f(x^(n)) − f(x^(n−1))| / |f(x^(n−1))| < ε_f
Input: x_0 (initial guess), Δτ (fixed step size), ε_f (tolerance in the termination condition)
Output: an approximation of the minimizer x̄

Computational Tests

Rosenbrock's banana function as before, f(x_1, x_2) = 100 (x_2 − x_1²)² + (1 − x_1)², with global (unconstrained) minimizer x̄_1 = x̄_2 = 1, f(1, 1) = 0, and Hessian eigenvalues λ_1 ≈ 0.4, λ_2 ≈ 1001.6 at the minimum, now subject to the constraint
c(x_1, x_2) = 0.05 x_1⁴ x_2 + 2.651605 = 0.
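To keep the arithmetic transparent, the sketch below runs Algorithm 4 on the Rosenbrock function with a simpler illustrative constraint, the circle c(x) = x_1² + x_2² − 2 = 0 (chosen here because a feasible starting point is easy to write down and (1, 1) happens to lie on this circle; it is not the quartic constraint used in the slides):

```python
import numpy as np

f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
grad_f = lambda x: np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                             200.0 * (x[1] - x[0]**2)])

c = lambda x: x[0]**2 + x[1]**2 - 2.0        # illustrative constraint
grad_c = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

def psd(x0, dtau=1e-4, n_iter=5000):
    """Projected steepest descent (Algorithm 4, PSD) with a fixed step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        n = grad_c(x)                         # Jacobian (a single row here)
        P = np.eye(2) - np.outer(n, n) / (n @ n)
        x = x - dtau * (P @ grad_f(x))        # step along the projected gradient
    return x

x0 = np.array([0.0, np.sqrt(2.0)])            # feasible: c(x0) = 0
x_end = psd(x0)
```

Because (1, 1) lies on the circle, the iteration approaches the unconstrained minimizer while keeping the constraint satisfied only up to the accumulated O((Δτ)²) drift the slides mention; a smaller Δτ reduces that drift further.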