A Line Search Multigrid Method for Large-Scale Nonlinear Optimization

A Line Search Multigrid Method for Large-Scale Nonlinear Optimization
Zaiwen Wen, Donald Goldfarb
Department of Industrial Engineering and Operations Research, Columbia University
2008 SIAM Conference on Optimization

What's New? Multilevel/Multigrid Methods at the 2008 SIAM Conference on Optimization
One plenary talk: Multiscale Optimization
One dedicated session
Four sessions on PDE-based problems, which are mainly handled by multigrid methods
Almost 20 invited and contributed talks

Statement of Problem

Consider solving the problem min_{u ∈ V} F(u).
Infinite-dimensional problem: F is a functional.
F has closely related representations {f_h} on a hierarchy of discretization levels h.
Discretize-then-optimize scheme: min f_h.
Solutions of {min f_h} might have similar structures.
Figure: Solution structure of F(u) = ∫_Ω √(1 + ‖∇u(x)‖²) dx at (a) level 4, (b) level 5, (c) level 6 (MG solutions; surface plots omitted).

Sources of Problems

Applications in nonlinear PDEs and image processing:
    min_{u ∈ U} F(u) = ∫_Ω L(∇u, u, x) dx
PDE-constrained optimization: optimal control problems and inverse problems.
Example: finding a local volatility σ(t, x) such that the prices C(T, S) from the Black-Scholes PDE match the observed prices on the market:
    min J(C, σ) := Σ_{i ∈ I} |C(S_i, T_i) − z(S_i, T_i)|² + α J_r(σ)
    s.t. ∂_τ C − (σ² K²/2) ∂²_KK C + (r − q) K ∂_K C + q C = 0,
         C(0, K) = (S − K)_+,  K > 0,  τ ∈ (0, +∞),
where J_r(σ) is the regularization term.

Mesh-Refinement Method

[Diagram: two level hierarchies running from the coarsest level problem up to the finest level problem. The mesh-refinement method passes a solution one way, from coarse to fine, by prolongation; the multigrid approach moves between coarser and finer level problems repeatedly via prolongation and restriction.]

Linear Multigrid Methods

Multigrid method for solving A_h x_h = b_h, A_h ≻ 0.
Inter-grid operations: restriction R_h, prolongation P_h.
Smoothing: reduces high-frequency errors efficiently.
Coarse grid correction: let A_H = R_h A_h P_h.

Algorithm (multigrid cycle): x_{h,k+1} = MGCYCLE(h, A_h, b_h, x_{h,k})
- Pre-smoothing: compute x̄_{h,k} = x_{h,k} + B_h (b_h − A_h x_{h,k}).
- Coarse grid correction:
  Compute the residual r_{h,k} = b_h − A_h x̄_{h,k}.
  Restrict the residual: r_{H,k} = R_h r_{h,k}.
  Solve the coarse grid residual equation A_H e_{H,k} = r_{H,k}:
    IF h = N_0, solve e_{H,k} = A_H^{−1} r_{H,k}; ELSE call e_{H,k} = MGCYCLE(H, A_H, r_{H,k}, 0).
  Interpolate the correction: e_{h,k} = P_h e_{H,k}.
  Compute the new approximate solution: x_{h,k+1} = x̄_{h,k} + e_{h,k}.
[Diagram: fine, coarse, and coarsest grids connected by smoothing steps, restrictions R_h, R_H and prolongations P_H, P_h.]
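
A minimal sketch of one such cycle, assuming a 1-D Poisson-type system with 2^k − 1 interior points, a weighted-Jacobi smoother (B_h = ω D_h^{−1}), linear-interpolation prolongation and full-weighting restriction; these choices, and the helper prolongation(), are illustrative rather than taken from the slides.

    import numpy as np

    def prolongation(n_fine):
        """Linear interpolation from (n_fine - 1) // 2 coarse interior points (n_fine odd)."""
        n_coarse = (n_fine - 1) // 2
        P = np.zeros((n_fine, n_coarse))
        for j in range(n_coarse):
            i = 2 * j + 1
            P[i - 1, j], P[i, j], P[i + 1, j] = 0.5, 1.0, 0.5
        return P

    def mgcycle(A, b, x, level, coarsest_level, omega=2.0 / 3.0):
        """One V-cycle for A x = b with A symmetric positive definite."""
        # Pre-smoothing: x <- x + B (b - A x), B = omega * D^{-1} (weighted Jacobi).
        x = x + omega * (b - A @ x) / np.diag(A)
        # Coarse grid correction.
        r = b - A @ x                        # residual on the current level
        P = prolongation(A.shape[0])
        R = 0.5 * P.T                        # full-weighting restriction
        A_H = R @ A @ P                      # Galerkin coarse-grid operator
        r_H = R @ r
        if level == coarsest_level:
            e_H = np.linalg.solve(A_H, r_H)  # exact solve on the coarsest level
        else:
            e_H = mgcycle(A_H, r_H, np.zeros_like(r_H), level - 1, coarsest_level, omega)
        return x + P @ e_H                   # interpolate and apply the correction

    # Example: 1-D Poisson matrix with 2**6 - 1 interior points and 3 recursive levels.
    n = 2**6 - 1
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    x = np.zeros(n)
    for _ in range(10):
        x = mgcycle(A, b, x, level=3, coarsest_level=0)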

Methods for Solving Nonlinear PDEs

Global linearization method (Newton's method)
Local linearization method, such as the full approximation scheme (Brandt, Hackbusch)
Combination of global and local linearization methods (Irad Yavneh, Gregory Dardyk)
Projection-based multilevel method (Stephen McCormick, Thomas Manteuffel, Oliver Rohrle, John Ruge)
Multigrid methods for obstacle problems (Ralf Kornhuber, Carsten Graser)
...

Multigrid Methods for Optimization

Traditional optimization methods in which the systems of linear equations are solved by multigrid methods (A. Borzi, K. Kunisch, Volker Schulz, Thomas Dreyer, Bernd Maar, U. M. Ascher, E. Haber)
The multigrid optimization framework proposed by Nash and Lewis (extensions given by A. Borzi)
The recursive trust-region method for unconstrained and box-constrained optimization (S. Gratton, A. Sartenaer, P. Toint, M. Weber, D. Tomanos, M. Mouffe, M. Ulbrich, S. Ulbrich, B. Loesch)
Multigrid methods for image processing (Raymond Chan, Ke Chen, Xue-cheng Tai, Tony Chan, Elad Haber, Jan Modersitzki)
...

General Idea

Consider min_{x_h ∈ V_h} f_h(x_h) on a class of nested spaces V_{N_0} ⊂ · · · ⊂ V_{N−1} ⊂ V_N.
General idea of line search:
    x_{h,k+1} = x_{h,k} + α_{h,k} d_{h,k},   d_{h,k} ∈ V_h,
where d_{h,k} is the search direction and α_{h,k} is the step size.
General idea of using coarse grid information:
    d_{h,k} = arg min_{d_H ∈ V_H} f_h(x_{h,k} + d_H).

An illustration of the multigrid optimization framework: a direct search direction (Taylor iteration) is generated on the current level, while a recursive search direction is generated from the coarser levels. [Figure: level (0-5) versus iteration, showing recursive descents via R_h and returns to finer levels via P_h.]

Construction of the Recursive Search Direction

If the conditions (Gratton, Sartenaer, Toint)
    ‖R_h g_{h,k}‖ ≥ κ ‖g_{h,k}‖   and   ‖R_h g_{h,k}‖ ≥ ε_h
hold, we compute the recursive search direction
    d_{h,k} = P_h d_H = P_h (x_{H,i} − x_{H,0}) = P_h ( Σ_{j=0}^{i−1} α_{H,j} d_{H,j} ).
Two major difficulties:
1. The calculation of f_h(x_{h,k} + d_H) might be expensive.
2. The recursive direction might not be a descent direction.

Construction of the Coarse Level Model

Coarse level model (Nash, 2000):
    min_{x_H} ψ_H(x_H) := f_H(x_H) − (v_H)^T x_H,
where v_H = ∇f_{H,0} − R_h g_{h,k} and g_{h,k} = ∇ψ_h(x_{h,k}).
Define v_N = 0; then f_N(x_N) = ψ_N(x_N).
The scheme is compatible with the FAS scheme.
Second-order coherence with respect to the Hessian can also be enforced.
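
A small sketch of this coarse-level model, assuming function handles f_H and grad_f_H for the discretized objective on the coarser level, a restriction matrix R, and NumPy vectors; the names are illustrative.

    import numpy as np

    def make_coarse_model(f_H, grad_f_H, R, x_h_k, g_h_k):
        """Nash-style coarse model psi_H(x) = f_H(x) - v_H^T x with first-order coherence."""
        x_H_0 = R @ x_h_k                   # restricted starting point
        v_H = grad_f_H(x_H_0) - R @ g_h_k   # gradient correction term
        psi_H = lambda x_H: f_H(x_H) - v_H @ x_H
        grad_psi_H = lambda x_H: grad_f_H(x_H) - v_H
        # By construction grad_psi_H(x_H_0) = R @ g_h_k (first-order coherence).
        return psi_H, grad_psi_H, x_H_0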

Properties of the Recursive Search Direction d_{h,k}

Assumption: σ_h P_h^T = R_h.
First-order coherence:
    g_{H,0} = R_h g_{h,k},   (d_{h,k})^T g_{h,k} = (d_H)^T g_{H,0}.
If f(x) is convex, the direction d_{h,k} is a descent direction, (d_{h,k})^T g_{h,k} < 0, and the directional derivative (d_{h,k})^T g_{h,k} satisfies
    (d_{h,k})^T g_{h,k} ≤ −(ψ_{H,0} − ψ_{H,i}).
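
A short derivation of the two coherence identities from the coarse model on the previous slide, spelled out here since the slide only states them; the second identity takes the scaling σ_h = 1 (otherwise a factor 1/σ_h appears).

    g_{H,0} = ∇ψ_H(x_{H,0}) = ∇f_{H,0} − v_H = ∇f_{H,0} − (∇f_{H,0} − R_h g_{h,k}) = R_h g_{h,k},
    (d_{h,k})^T g_{h,k} = (P_h d_H)^T g_{h,k} = (d_H)^T P_h^T g_{h,k} = (1/σ_h)(d_H)^T R_h g_{h,k} = (1/σ_h)(d_H)^T g_{H,0}.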

Failure of the Recursive Search Direction

Consider the Rosenbrock function
    ϕ(x) = 100 (x_2 − x_1²)² + (1 − x_1)².
Local minimizer x* = (1, 1); initial point x_0 = (−0.5, 0.5).
Search along the direction x* − x_0?
    ∇ϕ(x_0)^T (x* − x_0) = (47, 50) · (1.5, 0.5) = 95.5 > 0,
so x* − x_0 is not a descent direction at x_0.
[Figure: contour plot of ϕ with x*, x_k and the gradient g_k marked.]
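
A quick numerical check of this directional derivative; the function and the points are exactly those on the slide, and the snippet is only an illustration.

    import numpy as np

    def rosenbrock_grad(x):
        """Gradient of phi(x) = 100*(x2 - x1^2)^2 + (1 - x1)^2."""
        return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                         200.0 * (x[1] - x[0]**2)])

    x_star = np.array([1.0, 1.0])
    x0 = np.array([-0.5, 0.5])
    d = x_star - x0                    # direction pointing at the minimizer
    print(rosenbrock_grad(x0))         # [47. 50.]
    print(rosenbrock_grad(x0) @ d)     # 95.5 > 0: not a descent direction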

Obtaining a Descent Recursive Direction

Convexify the function f(x) on the coarser level, for example by setting ψ̃_H = ψ_H + λ_H ‖x_H − x_{H,0}‖²_2.
Trust region method: min m_h(s_{h,k}) subject to s_{h,k} ∈ B_h.
Line search method, but checking the descent condition at each step.
Non-monotone line search.
Watchdog technique.

A New Line Search Scheme

Given the direction d_{h,k} = P_h d_{H,i} = P_h (x_{H,i} − x_{H,0}), one can check g_{h,k}^T d_{h,k} on the coarser levels because of the first-order coherence:
    g_{h,k}^T d_{h,k} = g_{H,0}^T d_{H,i}.
Since, for convex ψ_H, ψ_H(x_{H,i}) ≥ ψ_H(x_{H,0}) + g_{H,0}^T (x_{H,i} − x_{H,0}), we check instead
    ψ_H(x_{H,i}) > ψ_{H,0} + ρ_2 g_{H,0}^T d_{H,i} = ψ_{H,0} + ρ_2 g_{h,k}^T d_{h,k}.
Then
    g_{h,k}^T d_{h,k} < (1/ρ_2) (ψ_H(x_{H,i}) − ψ_{H,0}) ≤ 0.
Sufficient reduction of the function value (Armijo condition):
    ψ_h(x_{h,k} + α_{h,k} d_{h,k}) ≤ ψ_{h,k} + ρ_1 α_{h,k} (g_{h,k})^T d_{h,k}.

A New Line Search Scheme

Line search conditions on the coarser levels:
(A) ψ_h(x_{h,k} + α_{h,k} d_{h,k}) ≤ ψ_{h,k} + ρ_1 α_{h,k} (g_{h,k})^T d_{h,k}.
(B) ψ_h(x_{h,k} + α_{h,k} d_{h,k}) > ψ_{h,0} + ρ_2 g_{h,0}^T (x_{h,k} + α_{h,k} d_{h,k} − x_{h,0}).
Condition (B) is similar to the Goldstein rule if k = 0 and ρ_2 = ρ_1.

Algorithm: Backtracking Line Search
Step 1. Given α_ρ > 0, 0 < ρ_1 < 1/2 and ρ_1 ≤ ρ_2. Let α^(0) = α_ρ. Set l = 0.
Step 2. If (h = N and condition (A) is satisfied) or if (h < N and both conditions (A) and (B) are satisfied), RETURN α_{h,k} = α^(l).
Step 3. Set α^(l+1) = τ α^(l), where τ ∈ (0, 1). Set l = l + 1 and go to Step 2.
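
A compact sketch of this backtracking procedure; the objective handle psi, the NumPy vectors, and the bookkeeping of the first iterate on the level (x_h0, g_h0, psi_h0) are assumed to be supplied by the caller, and the default parameter values are illustrative.

    def backtracking_mg(psi, x_k, psi_k, g_k, d_k, x_h0, g_h0, psi_h0,
                        coarser_level, alpha0=1.0, rho1=1e-3, rho2=1e-3,
                        tau=0.5, max_backtracks=50):
        """Backtracking line search enforcing condition (A) on every level and
        condition (B) additionally on the coarser levels (h < N)."""
        alpha = alpha0
        gTd = g_k @ d_k
        for _ in range(max_backtracks):
            x_trial = x_k + alpha * d_k
            psi_trial = psi(x_trial)
            cond_A = psi_trial <= psi_k + rho1 * alpha * gTd              # Armijo decrease
            cond_B = psi_trial > psi_h0 + rho2 * (g_h0 @ (x_trial - x_h0))
            if cond_A and ((not coarser_level) or cond_B):
                return alpha
            alpha *= tau                                                  # shrink the step
        return alpha  # caller may exit the coarse-level loop if alpha is too small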

Existence of the Step Size

The first step size α_{h,0} always exists.
Suppose α_{h,i} for i = 0, 1, ..., k exist; then α_{h,k+1} also exists.
Define φ_{h,k} := ψ_{h,0} + ρ_2 g_{h,0}^T (x_{h,k} − x_{h,0}). From condition (B) at the previous step,
    ψ_{h,k} > ψ_{h,0} + ρ_2 g_{h,0}^T (x_{h,k−1} + α_{h,k−1} d_{h,k−1} − x_{h,0}) = φ_{h,k}.
Condition (B) for the next step requires
    ψ_h(x_{h,k} + α d_{h,k}) > φ_{h,k} + ρ_2 α g_{h,0}^T d_{h,k},
while condition (A) requires
    ψ_h(x_{h,k} + α d_{h,k}) ≤ ψ_{h,k} + ρ_1 α g_{h,k}^T d_{h,k}.
[Figure: the curve ψ_h(x_{h,k} + α d_{h,k}) together with the lines ψ_{h,k} + ρ_1 α g_{h,k}^T d_{h,k} and φ_{h,k} + ρ_2 α g_{h,0}^T d_{h,k}, and the trial steps τ, τ², τ³, τ⁴.]
Exit MG if the step size is too small.

Choice of the Direct Search Direction

We require that a direct search direction satisfy the direct search condition
    ‖d_{h,k}‖ ≤ β_T ‖g_{h,k}‖   and   (d_{h,k})^T g_{h,k} ≤ −η_T ‖g_{h,k}‖².
The following directions satisfy the condition (in the convex case):
Steepest descent direction d_{h,k} = −g_{h,k}.
Exact Newton direction d_{h,k} = −G_{h,k}^{−1} g_{h,k}.
Inexact Newton direction generated by the conjugate gradient method.
Other possible choices:
Inexact Newton direction generated by the linear multigrid method.
Limited-memory BFGS direction.
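
A small helper that checks this condition for a candidate direction, with the inequality signs as reconstructed above; the constants beta_T and eta_T are illustrative.

    import numpy as np

    def satisfies_direct_condition(d, g, beta_T=1e4, eta_T=1e-4):
        """Direct search condition: ||d|| <= beta_T*||g|| and d^T g <= -eta_T*||g||^2."""
        return (np.linalg.norm(d) <= beta_T * np.linalg.norm(g)
                and d @ g <= -eta_T * (g @ g))

    g = np.array([47.0, 50.0])
    print(satisfies_direct_condition(-g, g))   # steepest descent always qualifies: True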

The Algorithm

Algorithm: x_h = MGLS(h, x_{h,0}, g_{h,0})
Step 1. Given κ > 0, ε_h > 0, ξ > 0 and an integer K.
Step 2. IF h < N, compute v_h = ∇f_{h,0} − g_{h,0} and keep the input g_{h,0}; ELSE set v_h = 0 and compute g_{h,0} = ∇f_{h,0}.
Step 3. FOR k = 0, 1, 2, ...
3.1. IF ‖g_{h,k}‖ ≤ ε_h, or IF h < N and k ≥ K, RETURN the solution x_{h,k}.
3.2. IF h = N_0, or ‖R_h g_{h,k}‖ < κ ‖g_{h,k}‖, or ‖R_h g_{h,k}‖ < ε_h:
    Direct search direction computation: compute a descent search direction d_{h,k} on the current level.
ELSE:
    Recursive search direction computation: call x_{H,i} = MGLS(H, R_h x_{h,k}, R_h g_{h,k}) to return a solution (or approximate solution) x_{H,i} of "min_{x_H} ψ_H(x_H)"; compute d_{h,k} = P_h d_{H,i} = P_h (x_{H,i} − R_h x_{h,k}).
3.3. Call the backtracking line search algorithm to obtain a step size α_{h,k}.
3.4. Set x_{h,k+1} = x_{h,k} + α_{h,k} d_{h,k}. IF α_{h,k} ≤ ξ and h < N, RETURN the solution x_{h,k+1}.
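
A condensed sketch of the MGLS recursion under the notation above, reusing the backtracking_mg sketch given earlier. The containers lev (per-level objectives f, gradients grad_f, transfer operators R and P, level indices N0 and N) and params (kappa, eps, K, xi, max_iter) are assumptions of this sketch, and the direct direction is taken to be steepest descent for brevity; it follows the steps on the slide rather than the authors' implementation.

    import numpy as np

    def mgls(h, x0, g0_bar, lev, params):
        """Line search multigrid recursion (sketch); R[h] maps level h to h-1, P[h] maps h-1 to h."""
        f, grad_f = lev.f[h], lev.grad_f[h]
        # Step 2: build psi_h(x) = f_h(x) - v_h^T x (first-order coherent coarse model).
        v = grad_f(x0) - g0_bar if h < lev.N else np.zeros_like(x0)
        psi = lambda x: f(x) - v @ x
        grad_psi = lambda x: grad_f(x) - v
        x, g = x0.copy(), grad_psi(x0)
        psi0, g0 = psi(x0), g.copy()
        for k in range(params.max_iter):
            # Step 3.1: stopping tests.
            if np.linalg.norm(g) <= params.eps[h] or (h < lev.N and k >= params.K):
                return x
            Rg = lev.R[h] @ g
            # Step 3.2: direct or recursive search direction.
            if (h == lev.N0 or np.linalg.norm(Rg) < params.kappa * np.linalg.norm(g)
                    or np.linalg.norm(Rg) < params.eps[h]):
                d = -g                                  # direct direction (steepest descent here)
            else:
                xH0 = lev.R[h] @ x
                xH = mgls(h - 1, xH0, Rg, lev, params)  # minimize the coarse model
                d = lev.P[h] @ (xH - xH0)               # prolongate the coarse step
            # Step 3.3: backtracking line search with conditions (A) and, if h < N, (B).
            alpha = backtracking_mg(psi, x, psi(x), g, d, x0, g0, psi0,
                                    coarser_level=(h < lev.N))
            # Step 3.4: update; exit the coarse level if the step is too small.
            x = x + alpha * d
            g = grad_psi(x)
            if h < lev.N and alpha <= params.xi:
                return x
        return x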

Convergence for Convex Functions

Let ρ_2 = 1. Assume that f_h(x) is twice continuously differentiable and uniformly convex, and that the direct search condition is satisfied by all direct search steps. Then:
1. The step size α_{h,k} is bounded below by a constant.
2. −(d_{h,k})^T g_{h,k} ≥ η_h ‖g_{h,k}‖² and cos(θ_{h,k}) ≥ δ_h, where θ_{h,k} is the angle between d_{h,k} and −g_{h,k}.
3. {x_{N,k}} converges to the unique minimizer x*_N of f_N(x_N).
4. The rate of convergence is at least R-linear.
5. For any ε > 0, after at most
    τ = log((f_N(x_{N,0}) − f_N(x*_N))/ε) / log(1/c)
iterations, where 0 < c = 1 − χ α η_N² < 1, we have f_N(x_{N,k}) − f_N(x*_N) ≤ ε.

Convergence for General Nonconvex Functions

Let ρ_1 ≤ ρ_2 < 1. Assume that all direct search directions d_{h,k} satisfy the direct search condition, that the level set D_h = {x_h : ψ_h(x_h) ≤ ψ_h(x_{h,0})} is bounded, and that the objective function ψ_h is continuously differentiable with Lipschitz continuous gradient ∇ψ_h. Then:
1. The step size and the directional derivative of the first iteration of each minimization sequence on the coarser levels can be bounded from below by the norm of the gradient raised to some finite power.
2. At the uppermost level, lim_{k→∞} ‖∇f_N(x_{N,k})‖ = 0.

Implementation Issues

Discretization of the problems
Choices of the direct search direction
Prolongation and restriction operators
Full multigrid method
[Figure: level versus iteration history of the full multigrid method, with recursive (R_h) and direct (P_h) steps marked.]
Sharing information between different minimization sequences:
    P^(l):    min ψ_h^(l)(x_h) = f_h(x_h) − (v_h^(l))^T x_h,
    P^(l+1):  min ψ_h^(l+1)(x_h) = f_h(x_h) − (v_h^(l+1))^T x_h.
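
A sketch of a full multigrid driver in the notation of the MGLS sketch above: the coarsest problem is solved first and each solution is prolongated to initialize the next finer level. How the slides' implementation seeds the levels is not stated, so the details (the lev.dim sizes and the zero starting point on the coarsest level) are assumptions.

    import numpy as np

    def full_multigrid(lev, params):
        """Full multigrid driver: solve from the coarsest level up, prolongating solutions."""
        x = np.zeros(lev.dim[lev.N0])          # zero initial point on the coarsest level
        for h in range(lev.N0, lev.N + 1):
            g0 = lev.grad_f[h](x)              # passing the true gradient makes v_h = 0,
            x = mgls(h, x, g0, lev, params)    # so each level minimizes f_h itself
            if h < lev.N:
                x = lev.P[h + 1] @ x           # prolongate to initialize the next finer level
        return x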

Test Examples

We compared the following three algorithms:
1. The standard L-BFGS method, denoted by L-BFGS.
2. The mesh refinement technique, denoted by MRLS.
3. The full multigrid algorithm with one smoothing step, denoted by FMLS.
Implementation details: the problems are discretized by finite differences; the initial point in the FMLS algorithm is taken to be the zero vector; for the MGLS algorithm we set κ = 10⁻⁴, ε_h = 10⁻⁵/5^{N−h}, K = 100, ρ_1 = 10⁻³, ρ_2 = ρ_1; the number of gradient and step difference pairs stored by the L-BFGS method was set to 5.

Minimal Surface Problem

Consider the minimal surface problem
    min f(u) = ∫_Ω √(1 + ‖∇u(x)‖²) dx                     (1)
    s.t. u(x) ∈ K = {u ∈ H¹(Ω) : u(x) = u_Ω(x) for x ∈ ∂Ω},
where Ω = [0, 1] × [0, 1] and
    u_Ω(x) = { y(1 − y),  x = 0, 1;   x(1 − x),  y = 0, 1 }.
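
One way to discretize (1) by finite differences on a uniform grid (forward differences with cell-based quadrature); the slides only state that finite differences are used, so the details here are assumptions, and the gradient needed by MGLS or L-BFGS can then be obtained analytically or by automatic differentiation.

    import numpy as np

    def boundary(n):
        """Boundary values u_Omega on an (n+2) x (n+2) grid over [0,1]^2."""
        t = np.linspace(0.0, 1.0, n + 2)
        U = np.zeros((n + 2, n + 2))
        U[0, :] = U[-1, :] = t * (1.0 - t)     # x = 0, 1: y(1 - y)
        U[:, 0] = U[:, -1] = t * (1.0 - t)     # y = 0, 1: x(1 - x)
        return U

    def minimal_surface(u_int, n):
        """Finite-difference discretization of f(u) = int sqrt(1 + |grad u|^2) dx."""
        h = 1.0 / (n + 1)
        U = boundary(n)
        U[1:-1, 1:-1] = u_int.reshape(n, n)    # interior unknowns
        ux = (U[1:, :-1] - U[:-1, :-1]) / h    # forward differences in x
        uy = (U[:-1, 1:] - U[:-1, :-1]) / h    # forward differences in y
        return h * h * np.sum(np.sqrt(1.0 + ux**2 + uy**2))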

Minimal Surface Problem

Table: Summary of computational costs

L-BFGS:  h = 8;  nls 820;  nfe 837;  nge 82;  ‖g_N‖_2 = 7.6430e-06;  CPU 69.0924

FMLS and MRLS (values for FMLS levels h = 3-8 followed by MRLS levels h = 3-8):
    nls:  268 50 96 6 22 0 7 26 45 66 88 25
    nfe:  355 236 40 80 28 2 20 27 47 7 9 26
    nge:  324 8 2 67 24 8 27 46 67 89 26
    ‖g_N‖_2:  7.98984e-06 (FMLS),  9.03382e-06 (MRLS)
    CPU:  3.9322 (FMLS),  7.36342 (MRLS)

Here h indicates the level; "nls", "nfe" and "nge" denote the total number of line searches, function evaluations and gradient evaluations at that level, respectively. We also report the total CPU time, measured in seconds, and the accuracy attained, measured by the Euclidean norm ‖g_N‖_2 of the gradient at the final iteration.

Minimal Surface Problem

Figure: Iteration history of the FMLS method (level visited versus iteration).

Nonlinear PDE

Consider the nonlinear PDE
    −Δu + λ u e^u = f  in Ω,   u = 0  on ∂Ω,              (2)
where λ = 10, Ω = [0, 1] × [0, 1],
    f = ( (9π² + λ e^{(x² − x³) sin(3πy)}) (x² − x³) + 6x − 2 ) sin(3πy),
and the exact solution is u = (x² − x³) sin(3πy).
The corresponding variational problem is
    min F(u) = ∫_Ω ( ½ ‖∇u‖² + λ(u e^u − e^u) − f u ) dx.

Nonlinear PDE

Table: Summary of computational costs

L-BFGS:  h = 8;  nls 43;  nfe 463;  nge 432;  ‖g_N‖_2 = 8.858409e-06;  CPU 56.467

FMLS and MRLS (values for FMLS levels h = 3-8 followed by MRLS levels h = 3-8):
    nls:  82 0 50 28 3 8 6 27 49 58 69 59
    nfe:  234 44 73 4 8 20 3 53 6 75 60
    nge:  208 4 56 32 5 9 7 28 50 59 70 60
    ‖g_N‖_2:  5.90493e-06 (FMLS),  9.40884e-06 (MRLS)
    CPU:  3.05293 (FMLS),  2.225726 (MRLS)

Here h indicates the level; "nls", "nfe" and "nge" denote the total number of line searches, function evaluations and gradient evaluations at that level, respectively. We also report the total CPU time, measured in seconds, and the accuracy attained, measured by the Euclidean norm ‖g_N‖_2 of the gradient at the final iteration.

Nonlinear PDE

Figure: Iteration history of the FMLS method (level visited versus iteration).

Summary

Interpretation of multigrid in optimization.
A new globally convergent line search algorithm, obtained by imposing a new condition on a modified backtracking line search procedure.

Outlook
More efficient direct search directions?
Multigrid for constrained optimization?

Thanks
S. Nash and M. Lewis for the multigrid framework.
P. Toint and his group for the recursive TR method.
Y. Yuan for the subspace minimization.