Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization

Denis Ridzal
Department of Computational and Applied Mathematics
Rice University, Houston, Texas
dridzal@caam.rice.edu

March 24, 2006
Rice University CAAM699 Seminar
University of Houston

D. Ridzal Inexact TR SQP for Large Scale Optimization 1

Outline

- Motivation
  - Large-Scale Problems in PDE-Constrained Optimization
  - Inexactness in linear system solves arising in an SQP algorithm
- Trust-Region SQP Algorithm with Inexact Linear System Solves
  - Existing Work on Inexactness in Optimization Algorithms
  - Review of the SQP Methodology
  - Mechanisms of Inexactness Control
- Numerical Results
- Conclusion
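
Motivation: A PDE-Constrained Optimization Problem

[Figures: velocity field $u$; computed concentration $c$]

\[
\text{minimize} \quad \frac{1}{2}\int_{\Omega} c^2 \, d\omega + \frac{\alpha}{2}\int_{\partial\Omega_c} \left( (\nabla v)^2 + v^2 \right) d\gamma
\]
subject to
\[
\begin{aligned}
\rho (u \cdot \nabla) u - \mu \nabla \cdot (\nabla u + \nabla u^T) + \nabla p &= 0 && \text{in } \Omega, \\
\nabla \cdot u &= 0 && \text{in } \Omega, \\
-p n + \mu (\nabla u + \nabla u^T) n &= 0 && \text{on } \partial\Omega_o, \\
u &= 0 && \text{on } \partial\Omega \setminus (\partial\Omega_c \cup \partial\Omega_o), \\
u &= v n && \text{on } \partial\Omega_c, \\
-\nabla \cdot (\epsilon \nabla c) + u \cdot \nabla c &= f && \text{in } \Omega, \\
c &= 0 && \text{on } \partial\Omega \setminus \partial\Omega_c, \\
n \cdot (\epsilon \nabla c) &= g && \text{on } \partial\Omega_c.
\end{aligned}
\]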

Large-Scale Optimization Problems: Common Features

Other applications: optimal design / shape optimization, parameter estimation, inverse problems.

Common features:
- can be solved as constrained nonlinear programming problems (NLPs) using all-at-once techniques
- the number of variables can easily be in the millions in 3D
- the discretized linear operators are often not available in matrix form
- even if available explicitly, the resulting linear systems usually require specialized solvers, such as multigrid or domain decomposition

Regardless of which optimization algorithm is used, linear systems must be solved iteratively!

Use of Sequential Quadratic Programming Methods

SQP methods have been used successfully for the solution of smooth NLPs in $\mathbb{R}^n$.

Most available SQP codes (NPSOL, SNOPT, KNITRO, LOQO) are based on direct (dense or sparse) linear algebra. They are
- impossible to apply to many large-scale optimization problems, in particular PDE-constrained optimization problems
- not suitable for parallel computing environments

Contribution: incorporated iterative linear algebra in an SQP framework.
- Iterative linear system solvers are inherently inexact!
- rigorous theoretical analysis of inexactness within an SQP algorithm
- practical approaches to inexactness control

Inexactness in Optimization Algorithms: Existing Work

- Early results for inexact Newton methods in optimization: e.g. Dembo, Eisenstat, Steihaug, Dennis, Walker (1980s)
- Connection with inexact SQP methods: Dembo and Tulowitzki (1985) and Fontecilla (1985); limited to local convergence analysis!
- Global results for inexact Newton methods for nonlinear equations: e.g. Brown and Saad (1990, 1994), Eisenstat and Walker (1994)
- Jäger and Sachs (1997)
  - line-search reduced-space SQP
  - first global convergence result
  - dependence on Lipschitz constants and derivative bounds
- Biros and Ghattas (2002)
  - quasi-Newton reduced-space SQP
  - dependence on derivative bounds
- Heinkenschloss and Vicente (2001)
  - reduced-space TRSQP
  - established a theoretical convergence framework that does not rely on Lipschitz constants or derivative bounds
  - limited to the reduced-space SQP approach

Review of Trust-Region SQP

Solve the NLP
\[
\min f(x) \quad \text{s.t.} \quad c(x) = 0,
\]
where $f : X \to \mathbb{R}$ and $c : X \to Y$, for some Hilbert spaces $X$ and $Y$, and $f$ and $c$ are twice continuously Fréchet differentiable.

- Define the Lagrangian functional $L : X \times Y \to \mathbb{R}$:
  \[ L(x, \lambda) = f(x) + \langle \lambda, c(x) \rangle_Y . \]
- If a regular point $x_*$ is a local solution of the NLP, then there exists a $\lambda_* \in Y$ satisfying the first-order necessary optimality conditions:
  \[
  \begin{aligned}
  \nabla_x f(x_*) + c_x(x_*)^* \lambda_* &= 0, \\
  c(x_*) &= 0.
  \end{aligned}
  \]

Newton's method applied to the first-order optimality conditions:
\[
\begin{pmatrix} \nabla_{xx} L(x_k, \lambda_k) & c_x(x_k)^* \\ c_x(x_k) & 0 \end{pmatrix}
\begin{pmatrix} s^x_k \\ s^\lambda_k \end{pmatrix}
= - \begin{pmatrix} \nabla_x f(x_k) + c_x(x_k)^* \lambda_k \\ c(x_k) \end{pmatrix}
\]

If $\nabla_{xx} L(x_k, \lambda_k)$ is positive definite on the null space of $c_x(x_k)$, the above KKT system is necessary and sufficient for the solution of the quadratic programming problem (QP):
\[
\begin{aligned}
\min \quad & \tfrac{1}{2} \langle \nabla_{xx} L(x_k, \lambda_k) s^x_k, s^x_k \rangle_X + \langle \nabla_x L(x_k, \lambda_k), s^x_k \rangle_X \\
\text{s.t.} \quad & c_x(x_k) s^x_k + c(x_k) = 0 .
\end{aligned}
\]

To globalize the convergence, we add a trust-region constraint:
\[
\begin{aligned}
\min \quad & \tfrac{1}{2} \langle H_k s^x_k, s^x_k \rangle_X + \langle \nabla_x L_k, s^x_k \rangle_X \\
\text{s.t.} \quad & c_x(x_k) s^x_k + c(x_k) = 0, \\
& \| s^x_k \|_X \le \Delta_k .
\end{aligned}
\]

Possible incompatibility of constraints: Composite Step Approach.

Composite Step Approach for the Solution of the Quadratic Subproblem

TR SQP step: $s_k = n_k + t_k$
- quasi-normal step $n_k$: moves toward feasibility
- tangential step $t_k$: moves toward optimality while staying in the null space of the linearized constraints

[Figure: quasi-normal step $n_k$ and tangential step $t_k$ within the trust region of radius $\Delta_k$ and the reduced region of radius $\zeta \Delta_k$, together with the sets $c_x(x_k) s^x + c(x_k) = 0$ and $c_x(x_k) t = 0$]

e.g. Omojokun [1989], Byrd, Hribar, Nocedal [1997], Dennis, El-Alem, Maciel [1997], Dennis, Heinkenschloss, Vicente [1998], Conn, Gould, Toint [2000]

Acceptance of the Step

Merit function:
\[
\phi(x, \lambda; \rho) = f(x) + \langle \lambda, c(x) \rangle_Y + \rho \| c(x) \|_Y^2 = L(x, \lambda) + \rho \| c(x) \|_Y^2 .
\]

Actual reduction at step $k$:
\[
\mathrm{ared}(s^x_k; \rho_k) = \phi(x_k, \lambda_k; \rho_k) - \phi(x_k + s_k, \lambda_{k+1}; \rho_k)
\]

Predicted reduction at step $k$:
\[
\begin{aligned}
\mathrm{pred}(s^x_k; \rho_k) = \phi(x_k, \lambda_k; \rho_k) - \Big[ & L(x_k, \lambda_k) + \langle g_k, s_k \rangle_X + \tfrac{1}{2} \langle H_k s^x_k, s^x_k \rangle_X \\
& + \langle \lambda_{k+1} - \lambda_k, c_x(x_k) s^x_k + c(x_k) \rangle_Y + \rho_k \| c_x(x_k) s^x_k + c(x_k) \|_Y^2 \Big] .
\end{aligned}
\]

Composite-Step Trust-Region SQP Algorithm

1. Compute quasi-normal step $n_k$.
   One linear system involving $c_x(x_k)$. Possible inexactness!
2. Compute tangential step $t_k$.
   Multiple linear systems involving $c_x(x_k)$. Possible inexactness!
   Depends on already (inexactly) computed quantities $n_k$ and $\lambda_k$.
3. Compute new Lagrange multiplier estimate $\lambda_{k+1}$.
   One linear system involving $c_x(x_k)$. Possible inexactness!
4. Update penalty parameter $\rho_k$.
   Need to modify penalty parameter update!
5. Compute $\mathrm{ared}_k$, $\mathrm{pred}_k$.
   The definition of $\mathrm{pred}_k$ must be modified!
6. Decide whether to accept the new iterate $x_{k+1} = x_k + n_k + t_k$, and update $\Delta_{k+1}$ from $\Delta_k$, based on $\mathrm{ared}_k / \mathrm{pred}_k$.
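
Step 6 of the algorithm can be illustrated with a generic ratio test. The sketch below is only one common way to do it: the thresholds `eta_accept` and `eta_good`, the factors `shrink` and `grow`, and the function name are all assumptions made for illustration, not values taken from the talk.

```python
def accept_and_update(ared, pred, delta, step_norm,
                      eta_accept=1e-8, eta_good=0.9,
                      shrink=0.5, grow=2.0, delta_max=1e8):
    """Return (accepted, new_delta) based on the ratio ared / pred.

    All thresholds and factors are illustrative assumptions.
    """
    if pred <= 0.0:
        raise ValueError("pred must be positive for a meaningful ratio")
    ratio = ared / pred
    if ratio < eta_accept:
        # Model and merit function disagree: reject the step, shrink the region.
        return False, shrink * step_norm
    if ratio >= eta_good:
        # Very good agreement: accept and allow a larger trust region.
        return True, min(max(grow * step_norm, delta), delta_max)
    # Acceptable agreement: accept the step and keep the radius.
    return True, delta
```

With inexact linear solves, the quantities `ared` and `pred` passed in would be the modified ones mentioned in steps 4 and 5.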

Balancing Inexactness in the Quasi-Normal and the Tangential Step

[Figure: the sets $c_x(x_k) s^x_k + c(x_k) = 0$ and $c_x(x_k) t_k = 0$, with the regions of radius $\zeta \Delta_k$ and $\Delta_k$]

Inexactness in TRSQP: Summary of My Contributions

Iterative linear system solves arise in the computation of:
(1) Lagrange multipliers,
(2) quasi-normal step,
(3) tangential step.

Global convergence theory for TR/SQP methods gives a rather generic treatment of the issue of inexactness. My work ties these generic requirements to inexactness specific to linear system solves, for each of the above.

The devised stopping criteria for iterative linear system solves
- are dynamically adjusted by the SQP algorithm, based on its current progress toward a KKT point,
- trade gains in feasibility for gains in optimality and vice versa,
- can be easily implemented and are sufficient to guarantee first-order global convergence of the algorithm,
- allow for a rigorous integration of preconditioners for KKT systems.

Tangential Step

The exact model requires that $t_k$ approximately solve the problem:
\[
\begin{aligned}
\min \quad & \tfrac{1}{2} \langle H_k (t + n_k), t + n_k \rangle_X + \langle \nabla_x L_k, t + n_k \rangle_X \\
\text{s.t.} \quad & c_x(x_k) t = 0, \\
& \| t + n_k \|_X \le \Delta_k .
\end{aligned}
\]

Assume that there exists a bounded linear operator $W_k : Z \to X$, where $Z$ is a Hilbert space, such that $\mathrm{Range}(W_k) = \mathrm{Null}(c_x(x_k))$. This covers all existing implementations for handling $c_x(x_k) t = 0$.

Drop the constant term from the QP, ignore $n_k$ in the trust-region constraint, set $g_k = H_k n_k + \nabla_x L_k$, and let $t = W_k w$. This gives the equivalent reduced QP
\[
\begin{aligned}
\min \quad & q_k(w) \equiv \tfrac{1}{2} \langle W_k^* H_k W_k w, w \rangle_Z + \langle W_k^* g_k, w \rangle_Z \\
\text{s.t.} \quad & \| W_k w \|_X \le \Delta_k .
\end{aligned}
\]

Tangential Step: Steihaug-Toint CG

0. Let $w_0 = 0 \in Z$. Let $r_0 = -W_k^* g_k$, $p_0 = r_0$.
1. For $i = 0, 1, 2, \ldots$
   1.1 If $\langle p_i, W_k^* H_k W_k p_i \rangle_Z \le 0$, extend $w_i$ to the boundary of the TR and stop.
   1.2 $\alpha_i = \langle r_i, r_i \rangle_Z / \langle p_i, W_k^* H_k W_k p_i \rangle_Z$
   1.3 $w_{i+1} = w_i + \alpha_i p_i$
   1.4 If $\| W_k w_{i+1} \| \ge \Delta_k$, extend $w_i$ to the boundary of the TR and stop.
   1.5 $r_{i+1} = r_i - \alpha_i W_k^* H_k W_k p_i$
   1.6 $\beta_i = \langle r_{i+1}, r_{i+1} \rangle_Z / \langle r_i, r_i \rangle_Z$
   1.7 $p_{i+1} = r_{i+1} + \beta_i p_i$
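
The steps above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the reduced Hessian $W_k^* H_k W_k$ is passed as a small dense matrix `A`, the reduced gradient $W_k^* g_k$ as `b`, and $W_k$ is assumed to have orthonormal columns so that $\| W_k w \| = \| w \|$. The residual tolerance `tol` and all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]

def to_boundary(w, p, delta):
    """Return w + tau * p with tau >= 0 such that ||w + tau * p|| = delta."""
    a, b, c = dot(p, p), 2.0 * dot(w, p), dot(w, w) - delta ** 2
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return axpy(tau, p, w)

def steihaug_toint_cg(A, b, delta, tol=1e-10, max_iter=100):
    """Approximately minimize 0.5 w'Aw + b'w subject to ||w|| <= delta."""
    w = [0.0] * len(b)
    r = [-bi for bi in b]              # step 0: r_0 is the negative gradient
    p = list(r)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            break                      # assumed stopping test on the residual
        Ap = matvec(A, p)
        curv = dot(p, Ap)
        if curv <= 0.0:                # step 1.1: nonpositive curvature
            return to_boundary(w, p, delta)
        alpha = dot(r, r) / curv       # step 1.2
        w_next = axpy(alpha, p, w)     # step 1.3
        if math.sqrt(dot(w_next, w_next)) >= delta:   # step 1.4
            return to_boundary(w, p, delta)
        r_next = axpy(-alpha, Ap, r)   # step 1.5
        beta = dot(r_next, r_next) / dot(r, r)        # step 1.6
        p = axpy(beta, p, r_next)      # step 1.7
        w, r = w_next, r_next
    return w

# Usage: interior solution, active trust region, negative curvature.
A_example = [[2.0, 0.0], [0.0, 1.0]]
b_example = [-2.0, -1.0]
w_interior = steihaug_toint_cg(A_example, b_example, delta=10.0)
w_boundary = steihaug_toint_cg(A_example, b_example, delta=1.0)
w_negcurv = steihaug_toint_cg([[-1.0, 0.0], [0.0, -1.0]], [1.0, 0.0], delta=2.0)
```

When the trust region is inactive the routine reduces to plain CG; otherwise it returns a point on the boundary.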

Tangential Step: Linear Systems

The application of $W_k$, $W_k^*$ requires linear system solves.

Example: $W_k$ is an orthogonal projector onto $\mathrm{Null}(c_x(x_k))$. Any computation $z = W_k p$ can be performed by solving the augmented system
\[
\begin{pmatrix} I & c_x(x_k)^* \\ c_x(x_k) & 0 \end{pmatrix}
\begin{pmatrix} z \\ y \end{pmatrix}
= \begin{pmatrix} p \\ 0 \end{pmatrix} .
\]

If $I$ is replaced by $G_k \approx H_k$, and $W_k^* G_k W_k$ is positive definite, this leads to the preconditioning of the reduced Hessian $W_k^* H_k W_k$ [Keller, Gould, Wathen 2000].

Attractive if we have a good preconditioner for KKT systems: [H., Nguyen 2004], [Bartlett, H., R., van Bloemen Waanders 2006].

We have the tools to efficiently solve large-scale KKT systems or the above augmented systems iteratively.
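
To make the projector example concrete, the sketch below forms the augmented system for a small dense constraint Jacobian `C` (standing in for $c_x(x_k)$) and solves it directly. The direct Gaussian elimination is an assumption made only to keep the example self-contained; in the setting of the talk this solve would be done iteratively. All names are hypothetical.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for row in range(col + 1, n):
            f = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def project_onto_nullspace(C, p):
    """Return z = W p, with W the orthogonal projector onto Null(C).

    Solves [[I, C^T], [C, 0]] [z; y] = [p; 0] and returns the z block.
    """
    n, m = len(p), len(C)
    K = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        K[i][i] = 1.0
    for i in range(m):
        for j in range(n):
            K[n + i][j] = C[i][j]      # lower-left block: C
            K[j][n + i] = C[i][j]      # upper-right block: C^T
    return solve(K, list(p) + [0.0] * m)[:n]

# Usage: one linear constraint z_1 + z_2 + z_3 = 0.
C_example = [[1.0, 1.0, 1.0]]
z_example = project_onto_nullspace(C_example, [1.0, 2.0, 3.0])
```

For this constraint the projection simply subtracts the mean of `p`, which gives an easy way to check the result.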

Tangential Step with Inexactness (Projector Case)

Issues:
- Augmented systems are solved iteratively.
- Every CG iteration uses a different $\widetilde{W}_k$.
- The CG operator $\widetilde{W}_k^* H_k \widetilde{W}_k$ is nonsymmetric.
- The CG operator $\widetilde{W}_k^* H_k \widetilde{W}_k$ is effectively nonlinear.
- Which quadratic functional are we minimizing?

Conventional proofs of global convergence for SQP methods require us to replace the reduced QP with the following inexact problem:
\[
\begin{aligned}
\min \quad & \tfrac{1}{2} \langle \widetilde{W_k^* H_k W_k} \, w, w \rangle_Z + \langle \widetilde{W_k^* g_k}, w \rangle_Z \\
\text{s.t.} \quad & \| \widetilde{W}_k w \|_X \le \Delta_k .
\end{aligned}
\]

$\widetilde{W_k^* H_k W_k} = \,?$ $\qquad \widetilde{W_k^* g_k} = \,?$

Tangential Step with Inexactness (Projector Case)

Outline of the solution:
- Use a full-space approach, in which the CG operator is $H_k$ (exact), and the inexactness is moved into a preconditioner $\widetilde{W}_k$ (inexact):
  \[
  \begin{aligned}
  \min \quad & \tfrac{1}{2} \langle H_k t, t \rangle_X + \langle g_k, t \rangle_X \\
  \text{s.t.} \quad & t \in \mathrm{Range}(W_k), \quad \| t \|_X \le \Delta_k .
  \end{aligned}
  \]
- Find a fixed (with respect to every CG iteration) linear representation $\widetilde{W}_k = W_k + E_k$ of the inexact null-space operator, so that
  \[
  \widetilde{W_k^* H_k W_k} = \widetilde{W}_k^* H_k \widetilde{W}_k, \qquad \widetilde{W_k^* g_k} = \widetilde{W}_k^* g_k .
  \]
- Establish bounds on $E_k$ that can be controlled in practice.

Tangential Step: Inexact CG with Full Orthogonalization

0. Let $t_0 = 0 \in X$. Let $r_0 = g_k$. Set $i_{\max}$, set $i = 0$.
1. While ($\widetilde{W}_k(r_i) \ne 0$ and $i < i_{\max}$)
   1.1 $z_i = \widetilde{W}_k(r_i)$
   1.2 $p_i = -z_i + \sum_{j=0}^{i-1} \dfrac{\langle z_i, H_k p_j \rangle_X}{\langle p_j, H_k p_j \rangle_X} \, p_j$
   1.3 If $\langle p_i, H_k p_i \rangle_X \le 0$, extend $t_i$ to the boundary of the TR and stop.
   1.4 $\alpha_i = -\langle r_i, p_i \rangle_X / \langle p_i, H_k p_i \rangle_X$
   1.5 $t_{i+1} = t_i + \alpha_i p_i$
   1.6 If $\| t_{i+1} \| \ge \Delta_k$, extend $t_i$ to the boundary of the TR and stop.
   1.7 $r_{i+1} = r_i + \alpha_i H_k p_i$
   1.8 $i \leftarrow i + 1$
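
The algorithm above can be sketched as follows. This is an illustrative, dependency-free version under explicit assumptions: $r_i$ is taken to be the gradient $H_k t_i + g_k$ of the quadratic, so that $p_i = -z_i + \sum_j (\ldots) p_j$ and $\alpha_i = -\langle r_i, p_i \rangle / \langle p_i, H_k p_i \rangle$; the null-space operator is a user-supplied callable `apply_W`; and the loop condition "$\widetilde{W}_k(r_i) \ne 0$" is replaced by a relative test with the hypothetical parameter `rel_tol`.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]

def to_boundary(t, p, delta):
    """Return t + tau * p with tau >= 0 such that ||t + tau * p|| = delta."""
    a, b, c = dot(p, p), 2.0 * dot(t, p), dot(t, t) - delta ** 2
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return axpy(tau, p, t)

def inexact_cg_full_orth(H, g, apply_W, delta, i_max=50, rel_tol=1e-8):
    """Full-space CG in which apply_W plays the role of the (inexact) projector."""
    t = [0.0] * len(g)
    r = list(g)                          # step 0: r_0 = g_k
    P, HP, curvs = [], [], []            # history for full orthogonalization
    z0_norm = None
    for _ in range(i_max):
        z = apply_W(r)                   # step 1.1
        if z0_norm is None:
            z0_norm = norm(z)
        if norm(z) <= rel_tol * z0_norm: # assumed relative stopping test
            break
        p = [-zi for zi in z]            # step 1.2: H-orthogonalize against all p_j
        for pj, Hpj, cj in zip(P, HP, curvs):
            p = axpy(dot(z, Hpj) / cj, pj, p)
        Hp = matvec(H, p)
        curv = dot(p, Hp)
        if curv <= 0.0:                  # step 1.3
            return to_boundary(t, p, delta)
        alpha = -dot(r, p) / curv        # step 1.4
        t_next = axpy(alpha, p, t)       # step 1.5
        if norm(t_next) >= delta:        # step 1.6
            return to_boundary(t, p, delta)
        r = axpy(alpha, Hp, r)           # step 1.7
        t = t_next
        P.append(p); HP.append(Hp); curvs.append(curv)
    return t

# Usage with an exact projector onto {t : t_1 + t_2 + t_3 = 0}.
def project(r):
    mean = sum(r) / len(r)
    return [ri - mean for ri in r]

H_example = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
g_example = [1.0, 0.0, -1.0]
t_free = inexact_cg_full_orth(H_example, g_example, project, delta=10.0)
t_bound = inexact_cg_full_orth(H_example, g_example, project, delta=0.5)
```

With the exact projector the iterates stay in the null space of the constraint and the method behaves like projected CG; an inexact `apply_W` would need a tolerance chosen along the lines of the stopping criteria discussed in the talk.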

Inexact CG with Full Orthogonalization: Theory

Theorem (1). If $\widetilde{W}_k = W_k$ is a fixed (exact) linear operator, then the inexact CG algorithm in the full space is equivalent to a traditional Steihaug-Toint CG algorithm applied to the tangential subproblem in the reduced space.

Proof. Straightforward.

If linear system solves can be performed with high accuracy, we recover the convergence properties of traditional CG.

Inexact CG with Full Orthogonalization: Theory

Theorem (2). There exists a fixed linear operator $\widetilde{W}_k$ such that the computed inexact application satisfies $\widetilde{W}_k(r_i) = \widetilde{W}_k r_i$ for every iteration $i$ of the inexact CG algorithm.

Proof. It can be shown that the residual vectors $r_i$, $i = 0, 1, \ldots, m$, are linearly independent, so the matrix $R_m = [r_0, r_1, \ldots, r_m]$ has full column rank. Introduce the matrices
\[
Y_m = [W_k r_0, W_k r_1, \ldots, W_k r_m], \qquad
\widetilde{Y}_m = [\widetilde{W}_k(r_0), \widetilde{W}_k(r_1), \ldots, \widetilde{W}_k(r_m)] .
\]
Inexact operator (one possible choice):
\[
\widetilde{W}_k = W_k + (\widetilde{Y}_m - Y_m)(R_m^* R_m)^{-1} R_m^*, \qquad \text{since } \widetilde{W}_k R_m = \widetilde{Y}_m .
\]

Inexact CG effectively solves the inexact tangential subproblem:
\[
\begin{aligned}
\min \quad & \tfrac{1}{2} \langle \widetilde{W}_k^* H_k \widetilde{W}_k w, w \rangle_Z + \langle \widetilde{W}_k^* g_k, w \rangle_Z \\
\text{s.t.} \quad & \| \widetilde{W}_k w \|_X \le \Delta_k .
\end{aligned}
\]
Use conventional theory for global convergence of SQP methods.

Remark: For analytical purposes, we use the inexact operator
\[
\widetilde{W}_k = W_k + E_k = W_k + (\widetilde{Y}_m - Y_m)(\widetilde{Y}_m^* R_m)^{-1} \widetilde{Y}_m^*
\]
(after establishing the conditions for the invertibility of $\widetilde{Y}_m^* R_m$).
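
As a short check of the construction, both choices of the fixed operator reproduce the computed vectors column by column. The derivation below only uses $Y_m = W_k R_m$, which follows from the definition of $Y_m$, and the invertibility of $R_m^* R_m$ and $\widetilde{Y}_m^* R_m$ stated on the slide.

```latex
\begin{align*}
\widetilde{W}_k R_m
  &= W_k R_m + (\widetilde{Y}_m - Y_m)(R_m^* R_m)^{-1} R_m^* R_m
   = Y_m + (\widetilde{Y}_m - Y_m) = \widetilde{Y}_m, \\
\widetilde{W}_k R_m
  &= W_k R_m + (\widetilde{Y}_m - Y_m)(\widetilde{Y}_m^* R_m)^{-1} \widetilde{Y}_m^* R_m
   = Y_m + (\widetilde{Y}_m - Y_m) = \widetilde{Y}_m .
\end{align*}
```

Reading the identity $\widetilde{W}_k R_m = \widetilde{Y}_m$ column by column gives $\widetilde{W}_k r_i = \widetilde{W}_k(r_i)$ for $i = 0, \ldots, m$, which is the statement of Theorem (2).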

Tangential Step: Global Convergence Requirements

\[
\begin{aligned}
\text{(C1)} \quad & \| \widetilde{W}_k^* g_k - W_k^* g_k \|_X \le \kappa_1 \min \left( \| \widetilde{W}_k^* g_k \|_X, \Delta_k \right), \\
\text{(C2)} \quad & \langle \widetilde{W}_k^* H_k \widetilde{W}_k w_k, w_k \rangle_X \le \kappa_2 \| w_k \|_X^2, \\
\text{(C3)} \quad & -\tfrac{1}{2} \langle \widetilde{W}_k^* H_k \widetilde{W}_k w_k, w_k \rangle_X - \langle \widetilde{W}_k^* g_k, w_k \rangle_X
  \ge \kappa_3 \| \widetilde{W}_k^* g_k \|_X \min \left\{ \kappa_4 \| \widetilde{W}_k^* g_k \|_X, \kappa_5 \Delta_k \right\},
\end{aligned}
\]
for positive constants $\kappa_1, \ldots, \kappa_5$ independent of $k$.

The true difficulty is in proving the global convergence condition (C1), related to the inexact reduced gradient.

Inexact CG with Full Orthogonalization: Theory

Theorem (3). If at every iteration $i$ of the inexact CG algorithm
\[
\| \widetilde{W}_k(r_i) - W_k r_i \| \le \xi \min \left\{ \frac{\| \widetilde{W}_k g_k \|}{\| g_k \|}, \frac{\Delta_k}{\| g_k \|}, \beta \right\} \| \widetilde{W}_k(r_i) \|, \qquad \xi > 0,
\]
and
\[
c_1 \| \widetilde{W}_k g_k \| \le \| W_k g_k \| \le c_2 \| \widetilde{W}_k g_k \|, \qquad c_1, c_2 > 0,
\]
then the convergence requirements (C1)-(C2) are satisfied.

Proof. Relies on a bound for the quantity $E_k = (\widetilde{Y}_m - Y_m)(\widetilde{Y}_m^* R_m)^{-1} \widetilde{Y}_m^*$.

Notes:
(1) Even though the inexact reduced gradient $\widetilde{W}_k g_k$ is computed in the very first CG iteration, in order to guarantee (C1) our theoretical framework puts restrictions on all subsequent applications of $\widetilde{W}_k$.
(2) The theorem gives a sufficient condition that works extremely well in practice.

Application of the Inexact Operator $\widetilde{W}_k$

Recall:
(i) At every iteration $k$ of the SQP algorithm, inexact CG is called.
(ii) At every CG iteration $i$, we compute iteratively an inexact projected residual $z_i = \widetilde{W}_k(r_i) = \widetilde{W}_k r_i$ such that
\[
\begin{pmatrix} I & c_x(x_k)^* \\ c_x(x_k) & 0 \end{pmatrix}
\begin{pmatrix} z_i \\ y \end{pmatrix}
= \begin{pmatrix} r_i \\ 0 \end{pmatrix} + \begin{pmatrix} e_i^1 \\ e_i^2 \end{pmatrix} .
\]

Control global SQP convergence by controlling $e_i$!

Theory: If at every iteration $i$ of the inexact CG algorithm
\[
\| \widetilde{W}_k(r_i) - W_k r_i \| \le \xi \min \left\{ \frac{\| \widetilde{W}_k g_k \|}{\| g_k \|}, \frac{\Delta_k}{\| g_k \|}, \beta \right\} \| \widetilde{W}_k(r_i) \|, \qquad \xi > 0,
\]
and $c_1 \| \widetilde{W}_k g_k \| \le \| W_k g_k \| \le c_2 \| \widetilde{W}_k g_k \|$, $c_1, c_2 > 0$, then the convergence requirements (C1)-(C2) are satisfied.

Practice: It is sufficient to require
\[
\| e_i \| \le \underbrace{\min \left\{ \frac{\| \widetilde{W}_k g_k \|}{\| g_k \|}, \frac{\Delta_k}{\| g_k \|}, \beta \right\}}_{\gamma} \| z_i \|,
\]
where $\beta = 10^{-3}$ (fixed small constant). Note $\widetilde{W}_k g_k = z_0$.

Implementation:
- First CG iteration: stop the linear system solver at iteration $m$ if $\| e_0^{(m)} \| \le \gamma \| z_0^{(m)} \|$.
- Subsequent CG iterations: heuristic; reuse the size of the iterate returned by the previous solve, $\| e_i \| \le \gamma \| z_{i-1} \|$.
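
The practical stopping rule can be written down in a few lines. The sketch below is only an illustration of the formulas on this slide; the function names and the way the reference vector is passed are assumptions, and `beta` defaults to the value $10^{-3}$ quoted above.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def gamma(z0, g, delta, beta=1e-3):
    """gamma = min(||z_0|| / ||g_k||, Delta_k / ||g_k||, beta), z_0 = W~_k g_k."""
    ng = norm(g)
    return min(norm(z0) / ng, delta / ng, beta)

def solver_may_stop(residual, z_ref, gam):
    """Stopping test ||e_i|| <= gamma * ||z_ref|| for the inner linear solver.

    z_ref is the current inner iterate z_0^(m) in the first CG iteration and
    the previously returned z_{i-1} in later CG iterations (heuristic).
    """
    return norm(residual) <= gam * norm(z_ref)

# Usage: far from a KKT point beta is active; near one, the gradient ratio is.
gam_far = gamma([3.0, 4.0], [6.0, 8.0], delta=1.0)
gam_near = gamma([3e-4, 4e-4], [6.0, 8.0], delta=1.0)
gam_small_tr = gamma([3.0, 4.0], [6.0, 8.0], delta=1e-3)
```

The three sample calls show which term of the minimum is active: the fixed constant `beta`, the relative size of the inexact reduced gradient, or the trust-region radius.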

Example 1: Burgers Equation in 1D

\[
\min \quad \frac{1}{2} \int_0^1 (y(x) - y_d(x))^2 \, dx + \frac{\alpha}{2} \int_0^1 u^2(x) \, dx
\]
subject to
\[
\begin{aligned}
-\nu y_{xx}(x) + y(x) y_x(x) &= f(x) + u(x), \qquad x \in (0, 1), \\
y(0) = 0, \quad y(1) &= 0 .
\end{aligned}
\]

- Finite element discretization with linear elements.
- $\nu = 10^{-2}$, $\alpha = 10^{-5}$, 100 equidistant subintervals.
- SQP stopping criteria: $\| c(x_k) \| < 10^{-6}$, $\| \nabla_x L(x_k, \lambda_k) \| < 10^{-6}$.
- For augmented system solves use GMRES with incomplete LU preconditioning.

Example 1: Inexactness Control in Tang. Step

[Figure: absolute inner solver stopping tolerance (log scale, $10^{-12}$ to $10^{-4}$) versus CG iterations (over all SQP iterations). Legend: controlled tolerance in the first CG iteration (one for every SQP iteration); *: controlled tolerance in all other CG iterations.]

- Controlled tolerance: total number of GMRES iterations: 2544. Runtime: 11 seconds.
- How do we pick a fixed tolerance for comparison? Pick the largest tolerance that recovers the same convergence profile (in terms of the number of SQP iterations and the quality of the solution).
- Fixed tolerance: $1 \cdot 10^{-11}$. Total number of GMRES iterations: 5652 (was 2544). Runtime: 33 seconds (was 11).

Example 1: Inexactness Control in Tang. Step

[Figure: relative inner solver stopping tolerance (log scale, $10^{-8}$ to $10^{-3}$) versus CG iterations (over all SQP iterations)]

The relative linear solver stopping tolerances never need to be tighter than the desired SQP stopping tolerances!

Example 2: Nonlinear Elliptic Problem in 2D

\[
\text{minimize} \quad \frac{1}{2} \int_{\Omega} (y(x) - y_0(x))^2 \, dx + \frac{1}{2} \int_{\partial\Omega} u^2(x) \, dx
\]
subject to
\[
\begin{aligned}
-\Delta y(x) + y^3(x) - y(x) &= 0 && \text{in } \Omega, \\
\frac{\partial y}{\partial n}(x) &= u(x) && \text{on } \partial\Omega .
\end{aligned}
\]

- The computational domain is the $[0, 1] \times [0, 1]$ square.
- Unstructured meshes generated by Triangle, partitioned using Metis.
- Mesh sizes: 32K, 64K, 128K, 256K ($\approx$ total number of variables).
- Partition sizes: 2, 4, 8, 16 (= number of processors).
- For augmented system solves use GMRES with DD (domain decomposition) preconditioning.
- Beowulf cluster (Mike Heroux, CSBSJU, MN and Sandia, NM): 16 Athlon 2.0 GHz nodes / 1 GB RAM / 100 Mbps Ethernet

Example 2: Inexactness Control in Tang. Step

[Figure: absolute inner solver stopping tolerance (log scale, $10^{-10}$ to $10^{-4}$) versus CG iterations, numbered within each SQP iteration. Legend: controlled tolerance in the first CG iteration (one for every SQP iteration); *: controlled tolerance in all other CG iterations.]

- How do we pick a fixed tolerance for comparison? Pick the largest tolerance that recovers the same convergence profile (in terms of the number of SQP iterations and the quality of the solution).
- Fixed tolerance: $5 \cdot 10^{-9}$.

Example 2: Inexactness Control in Tang. Step

Total Number of GMRES Iterations: Fixed Tol / Controlled Tol

Mesh \ Part.   2         4         8         16
32K            297/197   396/252   495/327   605/402
64K            254/166   318/190   432/273   526/337
128K           378/275   504/363   652/464   750/544
256K           425/283   564/401   730/521   906/665

Savings 30% (tangential step computation only)

Wall Time in Seconds: Fixed Tol / Controlled Tol

Mesh \ Part.   2         4         8         16
32K            51/46     41/34     44/40     60/75
64K            82/71     57/47     53/47     63/58
128K           268/243   185/167   144/126   140/130
256K           661/575   426/376   301/265   221/182

Savings 15%
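
As a quick arithmetic check of the GMRES table, the snippet below sums the iteration counts over all mesh and partition sizes and computes the aggregate relative saving. Aggregating over the whole table is our own choice for illustration; the slide does not say how its savings figure was computed.

```python
# GMRES iteration counts from the table, per mesh size, for 2, 4, 8, 16 partitions.
fixed = {
    "32K": [297, 396, 495, 605],
    "64K": [254, 318, 432, 526],
    "128K": [378, 504, 652, 750],
    "256K": [425, 564, 730, 906],
}
controlled = {
    "32K": [197, 252, 327, 402],
    "64K": [166, 190, 273, 337],
    "128K": [275, 363, 464, 544],
    "256K": [283, 401, 521, 665],
}

total_fixed = sum(sum(row) for row in fixed.values())
total_controlled = sum(sum(row) for row in controlled.values())

# Fraction of GMRES iterations saved by the controlled tolerance, in aggregate.
gmres_savings = 1.0 - total_controlled / total_fixed
print(f"{total_fixed} -> {total_controlled} iterations, savings {gmres_savings:.1%}")
```

The aggregate comes out slightly above 31%, in line with the roughly 30% quoted for the tangential step computation.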

Example 3: Navier-Stokes Problem in 2D

- Finite element discretization with the Taylor-Hood element pair.
- $\nu = 5 \cdot 10^{-3}$, $\alpha = 10^{-1}$, $\delta = 10^{-5}$.
- SQP stopping criteria: $\| c(x_k) \| < 10^{-6}$, $\| \nabla_x L(x_k, \lambda_k) \| < 10^{-6}$.
- For augmented system solves use GMRES with incomplete LU preconditioning (drop tolerance $5 \cdot 10^{-5}$).
- Use full reorthogonalization for all tangential step computations.

Example 3: Inexactness Control in Tang. Step

[Figure: absolute inner solver stopping tolerance (log scale, $10^{-14}$ to $10^{-2}$) versus CG iterations (over all SQP iterations). Legend: controlled tolerance in the first CG iteration (one for every SQP iteration); *: controlled tolerance in all other CG iterations.]

- Controlled tolerance: total number of GMRES iterations: 2672.
- How do we pick a fixed tolerance for comparison? Pick the largest tolerance, by trial and error, that recovers the same convergence profile (in terms of the number of SQP iterations and the quality of the solution).
- Fixed tolerance: $1 \cdot 10^{-10}$. Total number of GMRES iterations: 3404 (was 2672).

Example 3: Inexactness Control in Tang. Step, More Details

Stopping Tolerances for Linear System Solver

               inx. ctrl   1e-12   1e-11   1e-10   1e-9     1e-8
converges      YES         YES     YES     YES     NO       NO
GMRES iter's   2670        4020    3728    3404    >10000   >10000
CG iter's      162         142     142     142     >500     >500
SQP iter's     8           7       7       7       >50      >50

No theoretical justification.

Conclusion

- Integrated iterative linear solvers in a trust-region SQP algorithm.
- Global convergence of the SQP algorithm is guaranteed through a mechanism of inexpensive and easily implementable stopping conditions for iterative linear system solvers.
- Eliminated the need to guess fixed solver tolerances, at the expense of a few vector norm computations and a full reorthogonalization in the tangential step computation.
  - extra work < 1% of the cost of linear system solves (for a simple medium-scale problem)
- Numerical results indicate that the dynamic stopping conditions effectively reduce oversolves.
- Local convergence behavior of the algorithm must be investigated.