High Performance Nonlinear Solvers


Michael McCourt, Argonne National Laboratory
IIT Meshfree Seminar, September 19, 2011

What is a nonlinear system?

Every nonlinear system of equations can be described as F(u) = 0 for u ∈ R^N and F : R^N → R^N. F is often referred to as a residual function.

This includes:
- x + 2 = 3
- Ax = b
- x^3 = 3^x

This does not include:
- x + 2 < 3
- min_t ||Ax − b(t)||
- x^3 = 3^x with x restricted to the integers

What can become a nonlinear system?

Consider the problem

    F(u) = α − ∫_0^u e^{t²} dt = 0,  α ∈ R.

This is a nonlinear equation, but because e^{t²} has no elementary antiderivative, there is no way to evaluate F(u) exactly.

Solution: approximate the integral with, e.g., the trapezoid rule, Gauss quadrature, or Monte Carlo, call that discretization Ĩ, and define F̃(u) = α − Ĩ(u).

What can become a nonlinear system?

Consider the problem

    u_t(t) − f(u, t) = 0,  u(0) = u_0.

In trying to solve for u, what does it mean to apply d/dt?

Solution: among other possible options, we can discretize the solution on a grid and solve for u(t) at specific times (labeled u_{k+1}), with a finite difference approximation to u_t(t) built from u_k:

    (1/Δt)(u_{k+1} − u_k) − f(u_{k+1}, t_{k+1}) = 0,  k = 0, 1, ...
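To make the time-stepping idea concrete, here is a minimal Python sketch: one backward Euler step posed as a residual and handed to a generic nonlinear solver. The right-hand side, step size, and the use of scipy.optimize.fsolve are illustrative choices, not part of the talk.

```python
import numpy as np
from scipy.optimize import fsolve

def f(u, t):
    # Hypothetical right-hand side of u_t = f(u, t), chosen only for illustration.
    return -u**3 + np.sin(t)

dt = 0.1
t = 0.0
u_k = np.array([1.0])            # current state u_k

# One backward Euler step: find u_{k+1} satisfying
#   (1/dt) * (u_{k+1} - u_k) - f(u_{k+1}, t_{k+1}) = 0
def residual(u_next):
    return (u_next - u_k) / dt - f(u_next, t + dt)

u_next = fsolve(residual, u_k)   # nonlinear solve for the new state
print(u_next)
```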

What can become a nonlinear system?

Consider the problem

    min_{u ∈ Ω ⊂ R^N} G(u).

As mentioned earlier, optimization problems are not nonlinear systems because there is no residual function to evaluate.

Solution: a technique referred to as quasi-Newton leverages the fact that local minima are reached when ∇G(u) = 0. By discretizing the gradient we can define F(u) = ∇G(u).

How do we solve nonlinear systems?

Picard iteration: u_{k+1} = f(u_k), also called fixed-point iteration or nonlinear Richardson. (Pictured: Charles Émile Picard.)

Limitations of Picard include:
- We must be able to write F(u) = u − f(u) such that f is contractive (||f′|| < 1) near the solution.
- A good initial guess u_0 may be needed.
- Convergence may be slow.

How do we solve nonlinear systems?

Stochastic search: F(u) = 0 becomes min_u ||F(u)||. Reformulate the nonlinear system as an optimization problem and solve it with optimization techniques. (Pictured: Nicholas Metropolis.)

Limitations of stochastic search include:
- It produces a solution in distribution.
- It is computationally costly and may require extra memory.
- The mathematics is less rigorous (||F(u)|| may not have smooth derivatives).

How do we solve nonlinear systems?

Newton's method: u_{k+1} = u_k − J(F)(u_k)^{-1} F(u_k), a quadratically convergent algorithm from back in the day. (Pictured: Sir Isaac Newton.)

Limitations of Newton's method include:
- A good initial guess is needed.
- Knowledge of the Jacobian is required.
- A linear solve is required at each nonlinear iteration.
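As a concrete picture of the Newton iteration just described, here is a minimal NumPy sketch for a small made-up 2-by-2 system with an analytic Jacobian; the residual, initial guess, and tolerance are invented for illustration.

```python
import numpy as np

def F(u):
    # Hypothetical 2x2 residual used only for illustration.
    return np.array([u[0]**2 + u[1]**2 - 4.0,
                     u[0]*u[1] - 1.0])

def J(u):
    # Analytic Jacobian of F, needed by the basic (exact) Newton iteration.
    return np.array([[2.0*u[0], 2.0*u[1]],
                     [u[1],     u[0]]])

u = np.array([2.0, 0.5])           # initial guess u_0
for k in range(20):
    r = F(u)
    if np.linalg.norm(r) < 1e-10:  # stop when ||F(u_k)|| is small
        break
    du = np.linalg.solve(J(u), -r) # Newton step: J(F)(u_k) du = -F(u_k)
    u = u + du                     # u_{k+1} = u_k + du
print(u, np.linalg.norm(F(u)))
```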

Derivation of Newton's Method

Where does the iteration

    u_{k+1} = u_k − J(F)(u_k)^{-1} F(u_k),  k = 0, 1, ...

to solve F(u) = 0 come from? Taylor series. Assume you are at step u_k and the solution is u*, meaning Δu_k = u* − u_k. Then

    F(u_k + Δu_k) = F(u_k) + J(F)(u_k) Δu_k + O(||Δu_k||²),

where F(u_k + Δu_k) = F(u*) = 0 and the O(||Δu_k||²) term is treated as negligible, so

    0 ≈ F(u_k) + J(F)(u_k) Δu_k.

When the steps Δu_k get small enough, u_k → u*.

Making Newton's method practical

Quadratic convergence makes Newton's method the optimal choice, if we can circumvent its limitations. For Newton's method to be practical we need:
- Globalization - how bad can our initial guess be and still see convergence?
- Linear solvers - can we efficiently invert the Jacobian?
- Jacobian computation - how can we efficiently evaluate the Jacobian? Can we make do with a cheap approximation to the Jacobian?

Globalization

Why does a bad initial guess prevent convergence? Recall the Taylor expansion:

    F(u_k + Δu_k) = F(u_k) + J(F)(u_k) Δu_k + O(||Δu_k||²)
    −J(F)(u_k)^{-1} F(u_k) = Δu_k + O(||J(F)(u_k)^{-1}|| ||Δu_k||²)

Newton's method converges quadratically with a decent initial guess. If ||Δu_k|| is too large, the assumption that O(||Δu_k||²) is negligible is invalid. This means that the linear system solution −J(F)(u_k)^{-1} F(u_k) is a poor approximation to Δu_k.

Globalization

How do we implement Newton's method for a bad initial guess?

Line search - take a shorter step in the Newton direction and make sure to reduce the residual norm.

Why does that make sense? As long as we are reducing the norm, we will eventually get close enough for Newton's method to converge as it should.
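A minimal sketch of the line-search idea, assuming a residual F, a Jacobian J, and a current iterate u are already available as Python callables and arrays; the names and the simple norm-reduction test are illustrative, not the talk's implementation.

```python
import numpy as np

def newton_step_with_line_search(F, J, u, max_halvings=20):
    """One globalized Newton step: damp the Newton direction until ||F|| decreases."""
    r = F(u)
    du = np.linalg.solve(J(u), -r)       # full Newton direction
    lam = 1.0
    for _ in range(max_halvings):
        u_trial = u + lam * du
        if np.linalg.norm(F(u_trial)) < np.linalg.norm(r):
            return u_trial               # accept the damped step
        lam *= 0.5                       # shorten the step and try again
    return u + lam * du                  # fall back to the last (tiny) step
```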

Globalization

How do we implement Newton's method for a bad initial guess?

Trust region - prevent the iteration from entering a region with unacceptable values.

Why does that make sense? If you have physical knowledge about the system, use it to restrict the steps when possible. Example: pressure cannot be negative, so if the iteration produces a negative value, take a smaller step.

Globalization

How do we implement Newton's method for a bad initial guess?

Pseudotransient continuation - solve an equivalent time-dependent system whose steady state is the solution.

Why does that make sense? This one is a little more difficult to understand. In trying to solve F(u) = 0, we can find the steady-state solution to

    u_t(x, t) = F(u(x, t)),  u(x, 0) = u_0(x).

This time-dependent system at steady state is independent of the initial condition. It is also much better conditioned, although we're not interested in why here.

Linear Solvers

How do we find the Newton step J(F)(u_k) Δu_k = −F(u_k) efficiently?

Question: do we even need the exact inverse J(F)(u_k)^{-1} F(u_k)?

Actually, no. It turns out that inexact Newton, which only requires

    ||J(F)(u_k) Δu_k + F(u_k)|| < ε,

will also converge quadratically, provided the tolerance ε is tightened as the iteration approaches the solution. This means an iterative solver can be used. Furthermore, what's the point in exactly solving the linear system if a globalization technique (e.g., line search) is being used?

Linear Solvers

Now that we know an iterative solver can be used to find the Newton step, new opportunities are available: the Jacobian no longer needs to be computed, only the action J(F)(u)v. How can we take advantage of this?

Finite differences:

    F(u + hv) = F(u) + h J(F)(u)v + O(h²)
    J(F)(u)v ≈ (1/h)(F(u + hv) − F(u))

Jacobian-vector products can be approximated by finite differences at the cost of one function evaluation. This does not require computing the full Jacobian.
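A sketch of this matrix-free approach, assuming only a residual callable F and the current iterate u: the Jacobian action is approximated with one extra residual evaluation and wrapped in a SciPy LinearOperator so GMRES can solve the Newton system inexactly. The helper name and the step size h are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jacobian_free_newton_step(F, u, h=1e-7):
    """Approximate Newton step du solving J(F)(u) du = -F(u) without forming J."""
    r = F(u)
    n = u.size

    def jv(v):
        # Finite-difference Jacobian-vector product:
        #   J(F)(u) v ≈ (F(u + h v) - F(u)) / h
        return (F(u + h * v) - r) / h

    J_op = LinearOperator((n, n), matvec=jv)
    du, info = gmres(J_op, -r)           # an inexact linear solve is acceptable here
    return du
```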

Linear Solvers

Now that we know an iterative solver can be used to find the Newton step, new opportunities are available: the Jacobian no longer needs to be computed, only the action J(F)(u)v. How can we take advantage of this?

Complex derivatives. To avoid the cancellation inherent in finite differences,

    F(u + ihv) = F(u) + ih J(F)(u)v + O(h²)
    Re(F(u + ihv)) ≈ F(u)
    Im(F(u + ihv))/h ≈ J(F)(u)v

Function evaluations and Jacobian-vector products can be computed simultaneously, given a real function F that is overloaded to accept complex arguments.

Linear Solvers

After choosing a linear solver tolerance ε, the system

    ||J(F)(u_k) Δu_k + F(u_k)|| < ε

can be solved via GMRES or some other iterative method without ever computing the true Jacobian. This introduces the Krylov into Newton-Krylov-Schwarz.

Unfortunately, most problems of interest are rather ill-conditioned, meaning that an iterative solver will converge very slowly.

Preconditioning: to combat this, it is common to use a preconditioner. Unfortunately, since we don't have the true Jacobian, we have no idea what a good preconditioner looks like.

Jacobian Computation

Recall the iterative approach to solving linear systems: unpreconditioned methods for Ax = b build the Krylov space

    K_n = {b, Ab, ..., A^n b}.

We only have the ability to conduct matrix-vector products and do not have access to the true Jacobian. Since the Jacobian-vector products are being approximated via finite differences, the true Jacobian is not necessary.

Recall the structure of a right-preconditioned Krylov subspace for the problem (AM^{-1})(Mx) = b:

    K_n = {b, AM^{-1}b, ..., (AM^{-1})^n b}.

How can we approximate a Jacobian matrix with which to create a preconditioner? (Hint: it doesn't need to be perfect...)

Jacobian Computation

Approximating the Jacobian can be done via finite differences:

    J(F)(u)v ≈ (1/h)(F(u + hv) − F(u)).

If v is set to the k-th column of the identity matrix I_N, then J(F)(u)v approximates the k-th column of J(F)(u), since J(F)(u) I_N = J(F)(u). Approximating the Jacobian with this approach requires N function evaluations, which is unacceptably high.
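A sketch of the complex-step variant described above, under the assumption that the residual F is written with operations that also accept complex input (plain NumPy arithmetic does); the example residual is made up for illustration.

```python
import numpy as np

def complex_step_jv(F, u, v, h=1e-20):
    """Jacobian-vector product via the complex step: no subtractive cancellation."""
    w = F(u + 1j * h * v)     # F evaluated at a complex perturbation of u
    # Re(F(u + i h v)) ≈ F(u)   and   Im(F(u + i h v)) / h ≈ J(F)(u) v
    return np.imag(w) / h

# Made-up residual that happens to be complex-safe:
F = lambda u: np.array([u[0]**2 + u[1], np.sin(u[0]) * u[1]])
print(complex_step_jv(F, np.array([1.0, 2.0]), np.array([1.0, 0.0])))
```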

Jacobian Computation

Approximating the Jacobian via finite differences is practical when working with a sparse matrix. The nonzero structure of the matrix may produce columns that are structurally orthogonal (they share no nonzero rows); such columns can be computed together with a single function evaluation.

Jacobian Computation

Approximating the Jacobian can also be done via automatic differentiation (AD). This computes derivatives of functions without the loss of accuracy from cancellation or truncation that is present in finite differences. AD likely requires access to the source code, which may be unreasonable in some cases.

Where are we now?

We have the following steps to solve F(u) = 0:
1. Use Newton's method to iterate from an initial guess u_0 to the solution u*.
2. Find the next iterate by solving ||J(F)(u_k) Δu_k + F(u_k)|| < ε iteratively.
3. Precondition the iterative method using an approximate Jacobian.
4. Apply line search to the Newton iterate to improve convergence.

Preconditioners

Now that we have an approximate Jacobian via coloring, how can we precondition our system? There are literally thousands of preconditioners for solving linear systems; there is a cottage industry for every application where a specialized preconditioner could exist. The most common preconditioners are:
- LU - use the full inverse of M.
- ILU - cheaply approximate the full inverse while controlling memory costs.
- Multigrid - multilevel solvers are much more complicated, but helpful for many problems.
- Schwarz - domain decomposition techniques help reduce parallel communication and improve scalability.
- FFT - some systems respond well to transforms.
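Returning to the coloring idea from the sparse-Jacobian slide above, here is a sketch for a hypothetical tridiagonal Jacobian: columns whose nonzero rows do not overlap share a color, so the whole Jacobian is recovered in 3 residual evaluations instead of N. The residual below is invented for illustration.

```python
import numpy as np

def F(u):
    # Hypothetical 1D residual with tridiagonal coupling (only neighbors interact).
    r = 2.0 * u + u**3
    r[:-1] -= u[1:]
    r[1:] -= u[:-1]
    return r

def colored_jacobian(F, u, h=1e-7):
    """Estimate a tridiagonal Jacobian with 3 colored probes instead of N column probes."""
    n = u.size
    J = np.zeros((n, n))
    r0 = F(u)
    for color in range(3):
        v = np.zeros(n)
        v[color::3] = 1.0                    # perturb every 3rd column at once
        dr = (F(u + h * v) - r0) / h
        for col in range(color, n, 3):
            for row in range(max(0, col - 1), min(n, col + 2)):
                J[row, col] = dr[row]        # each row is touched by only one perturbed column
    return J

u = np.linspace(0.0, 1.0, 8)
print(np.round(colored_jacobian(F, u), 3))
```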

Preconditioners

What does "I'll use preconditioner [pick one]" mean? When we compute M via coloring we get a matrix M ≈ J(F)(u_k). This matrix is not necessarily the matrix that is inverted in AM^{-1}b. What's going on here? In order to make M^{-1} easier to compute, some values are often discarded from M before computing M^{-1}.

Components in preconditioned GMRES:
- J(F)(u_k)v products are approximated via finite differences.
- M is an approximate Jacobian computed via finite differences with coloring.
- M^{-1} is applied efficiently by dumping some values in M.
Note that (M^{-1})^{-1} ≠ M because some values are lost.

Preconditioners

For example, consider a simple Schwarz preconditioner called block Jacobi on 2 processors. Each processor retains only the M values it owns and ignores the rest; the blocks of M are inverted by LU:

    M = [ M1  M2 ]        M^{-1} ≈ [ M1^{-1}     0    ]
        [ M3  M4 ]                 [    0     M4^{-1} ]

Even though the full matrix M may have been computed, some terms were dumped to speed up the computation and application of M^{-1}.

Preconditioners

To allow for a speedy solve, the preconditioner has to be tailored to the physics of the system:
1. If the system is well-conditioned, ILU may be used in place of LU.
2. If the system is elliptic, multigrid will be effective.
3. If you need a large system solved, Schwarz methods will allow you to reduce communication between processors.
4. When the system is very ill-conditioned, sometimes all you can use is LU.
The more you know about the system, the better your preconditioner can be.

Example of preconditioning

The neutral terms make the system so ill-conditioned that the LU preconditioner needs to be used.
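A serial sketch of the two-block Jacobi idea from the example above, using SciPy: only the diagonal blocks of a stand-in matrix are LU-factored, and the result is applied as M^{-1} inside GMRES. The matrix here is random and purely illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, gmres

n = 8                                             # total size, split into two "processor" blocks
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) + n * np.eye(n)   # stand-in for the (approximate) Jacobian M
b = rng.standard_normal(n)

half = n // 2
lu1 = lu_factor(A[:half, :half])   # LU of block M1 (coupling blocks M2, M3 are dropped)
lu2 = lu_factor(A[half:, half:])   # LU of block M4

def apply_Minv(r):
    # Block Jacobi: apply M1^{-1} and M4^{-1} independently, ignoring the coupling blocks.
    z = np.empty_like(r)
    z[:half] = lu_solve(lu1, r[:half])
    z[half:] = lu_solve(lu2, r[half:])
    return z

M_op = LinearOperator((n, n), matvec=apply_Minv)
x, info = gmres(A, b, M=M_op)
print(info, np.linalg.norm(A @ x - b))
```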

Example of preconditioning

The LU preconditioner shows poor scalability. What can we do? What if we used a targeted approach: solve the ill-conditioned neutral velocity terms with LU, the elliptic neutral density terms with multigrid, and the well-conditioned plasma terms with a Schwarz method?

Example of preconditioning

By targeting the preconditioning, the solver can be sped up significantly because unnecessary work is removed from the process.

Conclusion

Today we have gone over techniques to make Newton's method a practical solver for nonlinear systems F(u) = 0:
- Line search is a common approach to allow for bad initial guesses.
- Iterative solvers may be used to find Newton directions.
- Jacobian-vector products can be approximated via finite differences.
- A preconditioning matrix can be computed with graph coloring.
- Targeting your preconditioner to your system can speed it up significantly.

Other cool stuff

There are other things which may be important in speeding up your nonlinear solver, including:
- Jacobian lagging - recompute the preconditioner less frequently, since the matrix-vector products are independent of the M matrix.
- Variable linear tolerance (the Eisenstat-Walker trick) - some of your linear solves can be crummy and you can still reach the solution.
- Nonlinear preconditioning - is there an operator G which you can apply as G(F(u)) = 0 to make your system easier to solve?
- High-order finite differences - will more accurate Jacobian-vector products speed up the solution?

Other bad stuff

There are problems I didn't talk about today:
- Jacobian coloring - how does your choice of coloring hurt the accuracy of the finite difference approximation?
- Line search - can this trap you in a local minimum?
- Preconditioning - how do I pick a good preconditioner? Note: this is the main impediment keeping people from using implicit methods.
- Storage - Newton-Krylov-Schwarz can demand a lot of memory that simpler nonlinear schemes don't.
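The variable linear tolerance idea can be sketched with one common Eisenstat-Walker style formula, which ties the tolerance of the k-th linear solve to how fast the nonlinear residual is shrinking; the constants and the simple safeguard below are illustrative choices, not the talk's.

```python
def forcing_term(norm_F_new, norm_F_old, eta_prev,
                 gamma=0.9, eta_max=0.9, eta_min=1e-6):
    """Eisenstat-Walker style tolerance eta_k for the next inexact Newton linear solve."""
    # Loose solves far from the solution, tight solves once the residual drops quickly.
    eta = gamma * (norm_F_new / norm_F_old) ** 2
    # Simple safeguard: do not let the tolerance change too abruptly between steps.
    eta = max(eta, gamma * eta_prev**2)
    return min(max(eta, eta_min), eta_max)
```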