A Line Search Multigrid Method for Large-Scale Nonlinear Optimization
Zaiwen Wen, Donald Goldfarb
Department of Industrial Engineering and Operations Research, Columbia University
2008 SIAM Conference on Optimization
What's New? Multilevel/Multigrid Methods at the 2008 SIAM Conference on Optimization
One plenary talk: Multiscale Optimization
One dedicated session
Four sessions on PDE-based problems, which are mainly handled by multigrid methods
Almost 20 invited and contributed talks
Statement of Problem

Consider solving the problem
min_{u ∈ V} F(u).
Infinite-dimensional problem: F is a functional.
F has closely related representations {f_h} on a hierarchy of discretization levels h.
Discretize-then-optimize scheme: min f_h.
Solutions of {min f_h} might have similar structures.

Figure: Solution structure of F(u) = ∫_Ω sqrt(1 + |∇u(x)|²) dx, computed on (a) level 4, (b) level 5, (c) level 6. [MG solution surfaces; axis data omitted]
Sources of Problems

Applications in nonlinear PDEs and image processing:
min_{u ∈ U} F(u) = ∫_Ω L(∇u, u, x) dx.
PDE-constrained optimization: optimal control problems and inverse problems.
Example: find a local volatility σ(t, x) such that the prices C(T, S) from the Black-Scholes PDE match the observed prices on the market:
min J(C, σ) := Σ_{i ∈ I} |C(S_i, T_i) − z(S_i, T_i)|² + α J_r(σ)
s.t. ∂_τ C − (σ²K²/2) ∂_KK C + (r − q) K ∂_K C + qC = 0,
C(0, K) = (S − K)⁺, K > 0, τ ∈ (0, +∞),
where J_r(σ) is the regularization term.
Mesh-refinement Method

Figure: Two mesh-refinement hierarchies, from the coarsest-level problem up through finer levels to the finest-level problem, with neighboring levels connected by prolongation and restriction operators.
Linear Multigrid Methods

Multigrid method for solving A_h x_h = b_h, A_h ≻ 0.
Inter-grid operations: restriction R_h, prolongation P_h.
Smoothing: reduces high-frequency errors efficiently.
Coarse-grid correction: let A_H = R_h A_h P_h.

Algorithm Multigrid cycle: x_{h,k+1} = MGCYCLE(h, A_h, b_h, x_{h,k})
- PRE-SMOOTHING: Compute x̄_{h,k} = x_{h,k} + B_h (b_h − A_h x_{h,k}).
- COARSE-GRID CORRECTION:
  Compute the residual r_{h,k} = b_h − A_h x̄_{h,k}.
  Restrict the residual: r_{H,k} = R_h r_{h,k}.
  Solve the coarse-grid residual equation A_H e_{H,k} = r_{H,k}:
  IF H = N_0, solve e_{H,k} = A_H^{−1} r_{H,k}; ELSE call e_{H,k} = MGCYCLE(H, A_H, r_{H,k}, 0).
  Interpolate the correction: e_{h,k} = P_h e_{H,k}.
  Compute the new approximate solution: x_{h,k+1} = x̄_{h,k} + e_{h,k}.

Figure: V-cycle from the fine grid to the coarsest grid, with smoothing on each level, restrictions R_h going down and prolongations P_h coming back up.
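The cycle above can be sketched in Python for a 1-D model problem. This is an illustrative sketch, not the talk's implementation: weighted Jacobi stands in for the smoother B_h, the prolongation is linear interpolation with full-weighting restriction R_h = 0.5 P_h^T, and the coarse operator is the Galerkin product A_H = R_h A_h P_h.

```python
import numpy as np

def laplacian(n):
    """1-D discrete Laplacian on n interior points (model A_h > 0)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def interpolation(nc):
    """Linear interpolation P_h from nc coarse points to 2*nc+1 fine points."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j] += 0.5       # fine point left of coarse node j
        P[2 * j + 1, j] = 1.0    # fine point coinciding with coarse node j
        P[2 * j + 2, j] += 0.5   # fine point right of coarse node j
    return P

def mgcycle(A, b, x, nu=3, n_coarsest=3):
    """One V-cycle for A x = b: pre-smoothing, coarse-grid correction, post-smoothing."""
    n = len(b)
    if n <= n_coarsest:                        # coarsest level: solve exactly
        return np.linalg.solve(A, b)
    d = np.diag(A)
    for _ in range(nu):                        # pre-smoothing (weighted Jacobi as B_h)
        x = x + (2.0 / 3.0) * (b - A @ x) / d
    P = interpolation((n - 1) // 2)
    R = 0.5 * P.T                              # full-weighting restriction R_h
    AH = R @ A @ P                             # Galerkin coarse operator A_H = R_h A_h P_h
    e = mgcycle(AH, R @ (b - A @ x), np.zeros((n - 1) // 2), nu, n_coarsest)
    x = x + P @ e                              # interpolate and apply the correction
    for _ in range(nu):                        # post-smoothing
        x = x + (2.0 / 3.0) * (b - A @ x) / d
    return x
```

On a 15-point fine grid the recursion visits grids of sizes 15, 7 and 3, and a few cycles reduce the residual by many orders of magnitude.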
Methods for Solving Nonlinear PDEs
Global linearization method (Newton's method)
Local linearization method, such as the full approximation scheme (Brandt, Hackbusch)
Combination of global and local linearization methods (Irad Yavneh, Gregory Dardyk)
Projection-based multilevel method (Stephen McCormick, Thomas Manteuffel, Oliver Röhrle, John Ruge)
Multigrid methods for obstacle problems (Ralf Kornhuber, Carsten Gräser)
...
Multigrid Methods for Optimization
Traditional optimization methods in which the systems of linear equations are solved by multigrid (A. Borzi, K. Kunisch, Volker Schulz, Thomas Dreyer, Bernd Maar, U. M. Ascher, E. Haber)
The multigrid optimization framework proposed by Nash and Lewis (extensions given by A. Borzi)
The recursive trust-region method for unconstrained and box-constrained optimization (S. Gratton, A. Sartenaer, P. Toint, M. Weber, D. Tomanos, M. Mouffe, M. Ulbrich, S. Ulbrich, B. Loesch)
Multigrid methods for image processing (Raymond Chan, Ke Chen, Xue-Cheng Tai, Tony Chan, Eldad Haber, Jan Modersitzki)
...
General Idea

Consider min_{x_h ∈ V_h} f_h(x_h) on a class of nested spaces V_{N_0} ⊂ · · · ⊂ V_{N−1} ⊂ V_N.
General idea of line search:
x_{h,k+1} = x_{h,k} + α_{h,k} d_{h,k},  d_{h,k} ∈ V_h,
where d_{h,k} is the search direction and α_{h,k} is the step size.
General idea of using coarse-grid information:
d_{h,k} = arg min_{d_H ∈ V_H} f_h(x_{h,k} + d_H).
An illustration of the multigrid optimization framework. A direct search direction (Taylor iteration) is generated on the current level; a recursive search direction is generated from the coarser levels.

Figure: Level (1-5) versus iteration number, showing direct steps on the current level and recursive excursions to coarser levels via R_h, returning via P_h. [axis data omitted]
Construction of the Recursive Search Direction

If the conditions (Gratton, Sartenaer, Toint)
‖R_h g_{h,k}‖ ≥ κ ‖g_{h,k}‖ and ‖R_h g_{h,k}‖ ≥ ɛ_{h−1}
hold, we compute the recursive search direction
d_{h,k} = P_h d_{H,i} = P_h (x_{H,i} − x_{H,0}) = P_h ( Σ_{j=0}^{i−1} α_{H,j} d_{H,j} ).
Two major difficulties:
1. The calculation of f_h(x_{h,k} + d_H) might be expensive.
2. The recursive direction might not be a descent direction.
Construction of the Coarse-Level Model

Coarse-level model (Nash, 2000):
min_{x_H} { ψ_H(x_H) ≜ f_H(x_H) − (v_H)^T x_H },
where v_H = ∇f_{H,0} − R_h g_{h,k} and g_{h,k} = ∇ψ_h(x_{h,k}).
Define v_N = 0; then f_N(x_N) = ψ_N(x_N).
The scheme is compatible with the FAS scheme.
Second-order coherence for the Hessian can also be enforced.
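The gradient correction v_H forces the coarse model to reproduce the restricted fine gradient at the coarse starting point, which can be checked numerically. The quadratic objectives, the Galerkin coarse operator, and the transfer operators below are illustrative assumptions, not the talk's test problems.

```python
import numpy as np

rng = np.random.default_rng(0)
n_h, n_H = 7, 3

# Linear interpolation P_h and full-weighting restriction R_h = 0.5 P_h^T
P = np.zeros((n_h, n_H))
for j in range(n_H):
    P[2 * j, j] += 0.5
    P[2 * j + 1, j] = 1.0
    P[2 * j + 2, j] += 0.5
R = 0.5 * P.T

# Fine objective f_h(x) = 0.5 x^T A_h x - b^T x (illustrative quadratic)
A_h = 2.0 * np.eye(n_h) - np.eye(n_h, k=1) - np.eye(n_h, k=-1)
b_h = rng.standard_normal(n_h)
x_hk = rng.standard_normal(n_h)
g_hk = A_h @ x_hk - b_h                 # g_{h,k} = grad psi_h(x_{h,k})

# Coarse representation f_H (Galerkin) and the gradient correction v_H
A_H = R @ A_h @ P
x_H0 = R @ x_hk
v_H = A_H @ x_H0 - R @ g_hk             # v_H = grad f_{H,0} - R_h g_{h,k}

# Gradient of psi_H(x_H) = f_H(x_H) - v_H^T x_H at the coarse initial point:
g_H0 = A_H @ x_H0 - v_H
assert np.allclose(g_H0, R @ g_hk)      # first-order coherence g_{H,0} = R_h g_{h,k}
```

The final assertion is exactly the first-order coherence property stated on the next slide.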
Properties of the Recursive Search Direction d_{h,k}

Assumption: σ_h P_h^T = R_h.
First-order coherence:
g_{H,0} = R_h g_{h,k},  (d_{h,k})^T g_{h,k} = (d_{H,i})^T g_{H,0}.
If f(x) is convex, the direction d_{h,k} is a descent direction, (d_{h,k})^T g_{h,k} < 0, and the directional derivative satisfies
(d_{h,k})^T g_{h,k} ≤ −(ψ_{H,0} − ψ_{H,i}).
Failure of the Recursive Search Direction

Consider the Rosenbrock function
ϕ(x) = 100 (x₂ − x₁²)² + (1 − x₁)².
Local minimizer x* = (1, 1), initial point x₀ = (−0.5, 0.5).
Search along the direction x* − x₀?
∇ϕ(x₀)^T (x* − x₀) = (47, 50) · (1.5, 0.5) = 95.5 > 0.

Figure: Contour plot of ϕ with x*, x_k and g_k marked. [contour levels omitted]
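The failure is easy to reproduce. A short sketch, with the gradient obtained by differentiating ϕ directly:

```python
def rosenbrock_grad(x1, x2):
    """Gradient of phi(x) = 100*(x2 - x1**2)**2 + (1 - x1)**2."""
    return (-400.0 * x1 * (x2 - x1**2) - 2.0 * (1.0 - x1),
            200.0 * (x2 - x1**2))

x0 = (-0.5, 0.5)                       # initial point
xs = (1.0, 1.0)                        # local minimizer
g = rosenbrock_grad(*x0)               # (47.0, 50.0)
d = (xs[0] - x0[0], xs[1] - x0[1])     # direction x* - x0 = (1.5, 0.5)
slope = g[0] * d[0] + g[1] * d[1]      # 95.5 > 0: an ascent direction
```

Even though x* − x₀ points at the minimizer, the directional derivative is positive, so no step along it can satisfy a sufficient-decrease condition.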
Obtaining a Descent Recursive Direction

Convexify the function f(x) on the coarser level, for example by setting ψ̃_H = ψ_H + λ_H ‖x_H − x_{H,0}‖₂².
Trust-region method: min m_h(s_{h,k}) subject to s_{h,k} ∈ B_h.
Line search method, but checking the descent condition at each step:
Non-monotone line search
Watch-dog technique
A New Line Search Scheme

Given the direction d_{h,k} = P_h d_{H,i} = P_h (x_{H,i} − x_{H,0}), one can check g_{h,k}^T d_{h,k} on the coarser levels thanks to the first-order coherence:
g_{h,k}^T d_{h,k} = g_{H,0}^T d_{H,i}.
Since, by convexity, ψ_H(x_{H,i}) ≥ ψ_H(x_{H,0}) + g_{H,0}^T (x_{H,i} − x_{H,0}), we check
ψ_H(x_{H,i}) > ψ_{H,0} + ρ₂ g_{H,0}^T d_{H,i} = ψ_{H,0} + ρ₂ g_{h,k}^T d_{h,k}.
Then
g_{h,k}^T d_{h,k} < (1/ρ₂) (ψ_H(x_{H,i}) − ψ_{H,0}) ≤ 0.
Sufficient reduction of the function value (Armijo condition):
ψ_h(x_{h,k} + α_{h,k} d_{h,k}) ≤ ψ_{h,k} + ρ₁ α_{h,k} (g_{h,k})^T d_{h,k}.
A New Line Search Scheme

Line search conditions on the coarser levels:
(A) ψ_h(x_{h,k} + α_{h,k} d_{h,k}) ≤ ψ_{h,k} + ρ₁ α_{h,k} (g_{h,k})^T d_{h,k}.
(B) ψ_h(x_{h,k} + α_{h,k} d_{h,k}) > ψ_{h,0} + ρ₂ g_{h,0}^T (x_{h,k} + α_{h,k} d_{h,k} − x_{h,0}).
Condition (B) is similar to the Goldstein rule if k = 0 and ρ₂ = 1 − ρ₁.

Algorithm Backtracking Line Search
Step 1. Given ᾱ > 0, 0 < ρ₁ < 1/2 and ρ₁ ≤ ρ₂ < 1. Let α^(0) = ᾱ. Set l = 0.
Step 2. If (h = N and condition (A) is satisfied) or (h < N and both conditions (A) and (B) are satisfied), return α_{h,k} = α^(l).
Step 3. Set α^(l+1) = τ α^(l), where τ ∈ (0, 1). Set l = l + 1 and go to Step 2.
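A minimal sketch of the backtracking loop on a one-dimensional coarse model, assuming the first iteration of a coarse-level minimization sequence (so x_{h,0} = x_{h,k}). The parameter values and the `max_iter` safeguard (mirroring the "exit MG if the step size is too small" rule) are illustrative assumptions.

```python
def backtrack(psi, x0, g0, xk, gk, d, coarse, rho1=1e-4, rho2=0.9,
              alpha0=1.0, tau=0.5, max_iter=30):
    """Backtracking line search enforcing the Armijo condition (A) and,
    on coarser levels, the lower-bound condition (B).  1-D sketch;
    returns None if the step becomes too small."""
    slope_k = gk * d                  # (g_{h,k})^T d_{h,k}
    alpha = alpha0
    for _ in range(max_iter):
        x_new = xk + alpha * d
        cond_A = psi(x_new) <= psi(xk) + rho1 * alpha * slope_k
        cond_B = psi(x_new) > psi(x0) + rho2 * g0 * (x_new - x0)
        if cond_A and (not coarse or cond_B):
            return alpha
        alpha *= tau                  # alpha^(l+1) = tau * alpha^(l)
    return None

# Coarse-level model psi(x) = 0.5 x^2, first iteration (x_{h,0} = x_{h,k})
psi = lambda x: 0.5 * x * x
x0 = 1.0
g0 = x0                               # grad psi(x0)
d = -g0                               # steepest descent direction
alpha = backtrack(psi, x0, g0, x0, g0, d, coarse=True)
```

In this example the unit step already satisfies both (A) and (B), so no backtracking occurs.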
Existence of the Step Size

The first step size α_{h,0} always exists.
Suppose α_{h,i} for i = 0, ..., k exist; then α_{h,k+1} also exists.
Define φ_{h,k} := ψ_{h,0} + ρ₂ g_{h,0}^T (x_{h,k} − x_{h,0}).
Condition (B) at step k reads
ψ_h(x_{h,k} + α_{h,k} d_{h,k}) > ψ_{h,0} + ρ₂ g_{h,0}^T (x_{h,k} + α_{h,k} d_{h,k} − x_{h,0}) = φ_{h,k} + ρ₂ α_{h,k} g_{h,0}^T d_{h,k}.

Figure: ψ_h(x_{h,k} + α d_{h,k}) plotted against α together with the Armijo line ψ_{h,k} + ρ₁ α g_{h,k}^T d_{h,k} and the lower line φ_{h,k} + ρ₂ α g_{h,0}^T d_{h,k}, with backtracking trial steps τ, τ², τ³, τ⁴.

Exit MG if the step size is too small.
Choice of the Direct Search Direction

We require that a direct search direction satisfy the
Direct Search Condition
‖d_{h,k}‖ ≤ β_T ‖g_{h,k}‖ and (d_{h,k})^T g_{h,k} ≤ −η_T ‖g_{h,k}‖².
The following directions satisfy the condition (in the convex case):
Steepest descent direction d_{h,k} = −g_{h,k}.
Exact Newton direction d_{h,k} = −G_{h,k}^{−1} g_{h,k}.
Inexact Newton direction generated by the conjugate gradient method.
Other possible choices:
Inexact Newton direction generated by the linear multigrid method.
Limited-memory BFGS direction.
The Algorithm

Algorithm x*_h = MGLS(h, x_{h,0}, ḡ_{h,0})
Step 1. Given κ > 0, ɛ_h > 0, ξ > 0 and an integer K.
Step 2. If h < N, compute v_h = ∇f_{h,0} − ḡ_{h,0} and set g_{h,0} = ḡ_{h,0}; else set v_h = 0 and compute g_{h,0} = ∇f_{h,0}.
Step 3. For k = 0, 1, 2, ...
3.1. If ‖g_{h,k}‖ ≤ ɛ_h, or if h < N and k ≥ K, return the solution x_{h,k}.
3.2. If h = N_0, or ‖R_h g_{h,k}‖ < κ ‖g_{h,k}‖, or ‖R_h g_{h,k}‖ < ɛ_{h−1}:
- Direct search direction computation: compute a descent search direction d_{h,k} on the current level.
Else:
- Recursive search direction computation: call x_{h−1,i} = MGLS(h−1, R_h x_{h,k}, R_h g_{h,k}) to return a solution (or approximate solution) x_{h−1,i} of min_{x_{h−1}} ψ_{h−1}(x_{h−1}); compute d_{h,k} = P_h d_{h−1,i} = P_h (x_{h−1,i} − R_h x_{h,k}).
3.3. Call the backtracking line search algorithm to obtain a step size α_{h,k}.
3.4. Set x_{h,k+1} = x_{h,k} + α_{h,k} d_{h,k}. If α_{h,k} ≤ ξ and h < N, return the solution x_{h,k+1}.
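A hedged two-level sketch of the scheme on a convex quadratic: the recursive call is replaced by an exact minimization of the coarse model ψ_H, the direct step is steepest descent, and only the Armijo condition (A) is enforced in the line search (the recursive direction is guarded by an explicit descent test, which the convexity property guarantees here). All problem data, transfer operators and parameter values are illustrative, not the talk's test setup.

```python
import numpy as np

def laplacian(n):
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def transfer(nc):
    """Linear interpolation P_h (and R_h = 0.5 P_h^T) onto 2*nc+1 fine points."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j] += 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] += 0.5
    return P, 0.5 * P.T

def armijo(f, x, g, d, rho1=1e-3, tau=0.5):
    """Backtracking satisfying condition (A); d must be a descent direction."""
    alpha, slope = 1.0, g @ d
    while f(x + alpha * d) > f(x) + rho1 * alpha * slope:
        alpha *= tau
    return alpha

# Fine-level convex quadratic f_h(x) = 0.5 x^T A x - b^T x
n_h, n_H = 7, 3
A, b = laplacian(n_h), np.ones(n_h)
f = lambda x: 0.5 * x @ A @ x - b @ x
P, R = transfer(n_H)
A_H = R @ A @ P                          # coarse representation f_H (Galerkin)

x = np.zeros(n_h)
for cycle in range(20):
    g = A @ x - b
    # Recursive direction: minimize psi_H(x_H) = f_H(x_H) - v_H^T x_H exactly
    x_H0 = R @ x
    v_H = A_H @ x_H0 - R @ g             # v_H = grad f_{H,0} - R_h g_{h,k}
    x_Hstar = np.linalg.solve(A_H, v_H)  # grad psi_H = A_H x_H - v_H = 0
    d = P @ (x_Hstar - x_H0)
    if d @ g < 0:                        # descent test (guaranteed: f convex)
        x = x + armijo(f, x, g, d) * d
    # Direct (Taylor) step: steepest descent with backtracking
    g = A @ x - b
    x = x + armijo(f, x, g, -g) * (-g)

gnorm = np.linalg.norm(A @ x - b)
```

Alternating the coarse correction with the direct smoothing step drives the gradient norm down far faster than either step alone.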
Convergence for Convex Functions

Let ρ₂ = 1. Assume f_h(x) is twice continuously differentiable and uniformly convex, and suppose the direct search condition is satisfied by all direct search steps. Then:
1. The step size α_{h,k} is bounded below by a constant.
2. −d_{h,k}^T g_{h,k} ≥ η_h ‖g_{h,k}‖² and cos(θ_{h,k}) ≥ δ_h, where θ_{h,k} is the angle between d_{h,k} and −g_{h,k}.
3. {x_{N,k}} converges to the unique minimizer x*_N of f_N(x_N).
4. The rate of convergence is at least R-linear.
5. For any ɛ > 0, after at most
τ = log((f_N(x_{N,0}) − f_N(x*_N))/ɛ) / log(1/c)
iterations, where 0 < c = 1 − χ ᾱ η_N² < 1, we have f_N(x_{N,k}) − f_N(x*_N) ≤ ɛ.
Convergence for General Nonconvex Functions

Let ρ₁ ≤ ρ₂ < 1. Assume all direct search directions d_{h,k} satisfy the direct search condition; the level set D_h = {x_h : ψ_h(x_h) ≤ ψ_h(x_{h,0})} is bounded; and the objective function ψ_h is continuously differentiable with Lipschitz continuous gradient ∇ψ_h. Then:
1. The step size and the directional derivative of the first iteration of each minimization sequence on the coarser levels can be bounded from below by the norm of the gradient raised to some finite power.
2. At the uppermost level, lim_{k→∞} ‖∇f_N(x_{N,k})‖ = 0.
Implementation Issues

Discretization of the problems
Choices of the direct search direction
Prolongation and restriction operators
Full multigrid method

Figure: Level-versus-iteration history of the full multigrid method, with direct steps on each level and recursive excursions connected by P_h and R_h. [axis data omitted]

Sharing information between different minimization sequences:
P^(l):   min ψ_h^(l)(x_h) = f_h(x_h) − (v_h^(l))^T x_h,
P^(l+1): min ψ_h^(l+1)(x_h) = f_h(x_h) − (v_h^(l+1))^T x_h.
Test Examples

We compared the following three algorithms:
1. The standard L-BFGS method, denoted by L-BFGS.
2. The mesh-refinement technique, denoted by MRLS.
3. The full multigrid algorithm with one smoothing step, denoted by FMLS.
Implementation details: the problems are discretized by finite differences; the initial point in the FMLS algorithm is taken to be the zero vector; for the MGLS algorithm we set κ = 10⁻⁴, ɛ_h = 10⁻⁵/5^{N−h}, K = 100, ρ₁ = 10⁻³, ρ₂ = 1 − ρ₁; the number of gradient and step difference pairs stored by the L-BFGS method was set to 5.
Minimal Surface Problem

Consider the minimal surface problem
min f(u) = ∫_Ω sqrt(1 + |∇u(x)|²) dx   (1)
s.t. u(x) ∈ K = {u ∈ H¹(Ω) : u(x) = u_Ω(x) for x ∈ ∂Ω},
where Ω = [0, 1] × [0, 1] and
u_Ω(x) = { y(1 − y) for x = 0, 1;  x(1 − x) for y = 0, 1 }.
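A sketch of evaluating the discretized objective (1) on a uniform grid, assuming forward differences averaged to cell centers; this particular quadrature and grid layout are illustrative, not necessarily the talk's discretization.

```python
import numpy as np

def surface_area(u):
    """Finite-difference approximation of the surface area
    integral of sqrt(1 + |grad u|^2) over the unit square;
    u holds grid values including the boundary."""
    n = u.shape[0] - 1
    h = 1.0 / n
    ux = (u[1:, :] - u[:-1, :]) / h         # d/dx, one value per x-edge
    uy = (u[:, 1:] - u[:, :-1]) / h         # d/dy, one value per y-edge
    uxc = 0.5 * (ux[:, 1:] + ux[:, :-1])    # averaged to cell centers
    uyc = 0.5 * (uy[1:, :] + uy[:-1, :])
    return h * h * np.sum(np.sqrt(1.0 + uxc**2 + uyc**2))

def boundary_grid(n):
    """Grid with the boundary data u_Omega: y(1-y) on x = 0,1
    and x(1-x) on y = 0,1; zero in the interior."""
    t = np.linspace(0.0, 1.0, n + 1)
    u = np.zeros((n + 1, n + 1))
    u[0, :] = u[-1, :] = t * (1.0 - t)      # edges x = 0 and x = 1
    u[:, 0] = u[:, -1] = t * (1.0 - t)      # edges y = 0 and y = 1
    return u
```

Sanity checks: a flat surface u ≡ 0 has area 1, and the tilted plane u(x, y) = x has area sqrt(2).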
Minimal Surface Problem

Table: Summary of computational costs

L-BFGS
  h   nls   nfe   nge   ‖g_N‖₂       CPU
  8   820   837   82    7.6430e-06   69.0924

FMLS
  h     3     4     5     6     7     8
  nls   268   50    96    6     22    0
  nfe   355   236   40    80    28    2
  nge   324   8     2     67    24
  ‖g_N‖₂ = 7.98984e-06, CPU = 3.9322

MRLS
  h     3     4     5     6     7     8
  nls   7     26    45    66    88    25
  nfe   20    27    47    7     9     26
  nge   8     27    46    67    89    26
  ‖g_N‖₂ = 9.03382e-06, CPU = 7.36342

h indicates the level; nls, nfe and nge denote the total number of line searches, function evaluations and gradient evaluations at that level, respectively. We also report the total CPU time in seconds and the accuracy attained, measured by the Euclidean norm ‖g_N‖₂ of the gradient at the final iteration.
Minimal Surface Problem

Figure: Iteration history of the FMLS method: level (1-9) versus iteration number. [axis data omitted]
Nonlinear PDE

Consider the nonlinear PDE
−Δu + λ u e^u = f in Ω,  u = 0 on ∂Ω,   (2)
where λ = 10, Ω = [0, 1] × [0, 1],
f = (9π² + λ e^{(x² − x³) sin(3πy)}) (x² − x³) sin(3πy) + (6x − 2) sin(3πy),
and the exact solution is u = (x² − x³) sin(3πy).
The corresponding variational problem is
min F(u) = ∫_Ω (1/2) |∇u|² + λ (u e^u − e^u) − f u dx.
Nonlinear PDE

Table: Summary of computational costs

L-BFGS
  h   nls   nfe   nge   ‖g_N‖₂         CPU
  8   43    463   432   8.858409e-06   56.467

FMLS
  h     3     4     5     6     7     8
  nls   82    0     50    28    3     8
  nfe   234   44    73    4     8     20
  nge   208   4     56    32    5     9
  ‖g_N‖₂ = 5.90493e-06, CPU = 3.05293

MRLS
  h     3     4     5     6     7     8
  nls   6     27    49    58    69    59
  nfe         3     53    6     75    60
  nge   7     28    50    59    70    60
  ‖g_N‖₂ = 9.40884e-06, CPU = 2.225726

h indicates the level; nls, nfe and nge denote the total number of line searches, function evaluations and gradient evaluations at that level, respectively. We also report the total CPU time in seconds and the accuracy attained, measured by the Euclidean norm ‖g_N‖₂ of the gradient at the final iteration.
Nonlinear PDE

Figure: Iteration history of the FMLS method: level (1-9) versus iteration number. [axis data omitted]
Summary

Interpretation of multigrid in optimization.
A new globally convergent line search algorithm, obtained by imposing a new condition on a modified backtracking line search procedure.

Outlook
More efficient direct search directions?
Multigrid for constrained optimization?

Thanks
S. Nash and M. Lewis for the multigrid framework.
P. Toint and his group for the recursive TR method.
Y. Yuan for the subspace minimization.