On the Origins of Linear and Non-Linear Preconditioning
martin.gander@unige.ch, University of Geneva, June 2015
Gauss Invents an Iterative Method in a Letter (1823), in a letter to Gerling, in order to compute a least squares solution based on angle measurements between the locations Berger Warte, Johannisberg, Taufstein and Milseburg.
Gauss's Method
Solving the system directly in Matlab gives:
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 2.277381e-17.
xexact = 17.5190  163.3949  85.1300  -128.0000
Calculations Make Gauss Happy!
Gauss concludes his letter with the statement: "... You will in the future hardly eliminate directly, at least not when you have more than two unknowns. The indirect procedure can be done while one is half asleep, or is thinking about other things."
Jacobi also Invents an Iterative Method (1845): "Über eine neue Auflösungsart der bei der Methode der kleinsten Quadrate vorkommenden lineären Gleichungen" (On a new way of solving the linear equations arising in the method of least squares).
Jacobi's Method
Jacobi Points out a Problem of the Method: with a simple computation, the equations can be transformed into new ones, where the problem does not appear any more.
Jacobi Rotations: Next, Jacobi takes an example from Gauss's Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium (1809).
After preconditioning, it takes only three iterations to obtain three accurate digits!
Classical Stationary Iterative Methods
For a linear system Au = f, one needs a splitting A = M - N and then iterates
  M u^{n+1} = N u^n + f:
  Jacobi: M = diag(A)
  Gauss-Seidel: M = tril(A)
  Schwarz domain decomposition: M block diagonal
  Multigrid: M represents a V-cycle or W-cycle
The iterative method
  u^{n+1} = M^{-1} N u^n + M^{-1} f = (I - M^{-1}A) u^n + M^{-1} f
1. converges fast if ρ(I - M^{-1}A) is small,
2. and is cheap if systems with M can easily be solved.
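As a concrete sketch (not from the talk), the splitting iteration in correction form, u^{n+1} = u^n + M^{-1}(f - A u^n), with the Jacobi and Gauss-Seidel choices of M, can be written in a few lines of Python; the 3x3 system is made-up example data:

```python
import numpy as np

def stationary_solve(A, f, M, tol=1e-10, maxit=500):
    """Iterate M u^{n+1} = N u^n + f with N = M - A,
    written in correction form u^{n+1} = u^n + M^{-1}(f - A u^n)."""
    u = np.zeros_like(f)
    for n in range(maxit):
        r = f - A @ u                      # current residual
        if np.linalg.norm(r) < tol:
            return u, n
        u = u + np.linalg.solve(M, r)      # cheap solve with M
    return u, maxit

# small diagonally dominant example system (made-up data)
A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
f = np.array([1., 2., 3.])

u_jac, it_jac = stationary_solve(A, f, np.diag(np.diag(A)))  # Jacobi: M = diag(A)
u_gs,  it_gs  = stationary_solve(A, f, np.tril(A))           # Gauss-Seidel: M = tril(A)
```

For this symmetric tridiagonal example, Gauss-Seidel needs fewer iterations than Jacobi, consistent with its smaller spectral radius ρ(I - M^{-1}A).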
The Invention of the Conjugate Gradient Method
Stiefel and Rosser 1951: Presentations at a Symposium at the National Bureau of Standards (UCLA)
Hestenes 1951: Iterative methods for solving linear equations
Stiefel 1952: Über einige Methoden der Relaxationsrechnung (On some methods of relaxation calculus)
Hestenes and Stiefel 1952: Methods of Conjugate Gradients for Solving Linear Systems: "An iterative algorithm is given for solving a system Ax = k of n linear equations in n unknowns. The solution is given in n steps."
Lanczos 1952: Solution of systems of linear equations by minimized iterations (see also Lanczos 1950)
The Conjugate Gradient Method
To solve approximately Au = f, A spd, CG finds at step n, using the Krylov space
  K_n(A, r_0) := span{r_0, A r_0, ..., A^{n-1} r_0},  r_0 := f - A u_0,
an approximate solution u_n ∈ u_0 + K_n(A, r_0) which satisfies ||u - u_n||_A → min.
Theorem: With κ(A) the condition number of A,
  ||u - u_n||_A ≤ 2 ( (√κ(A) - 1) / (√κ(A) + 1) )^n ||u - u_0||_A.
The conjugate gradient method converges very fast if the condition number κ(A) is not large.
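A minimal CG implementation along these lines might look as follows (a sketch; the 1D Laplacian test matrix is an assumed example, not taken from the talk):

```python
import numpy as np

def conjugate_gradient(A, f, u0, tol=1e-10, maxit=1000):
    """Plain CG for A spd: at step n the iterate lies in u0 + K_n(A, r0)
    and minimizes the A-norm of the error over that space."""
    u = u0.copy()
    r = f - A @ u              # r0 = f - A u0
    p = r.copy()               # first search direction
    rs = r @ r
    for _ in range(maxit):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        u = u + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p   # A-conjugate direction update
        rs = rs_new
    return u

# spd model matrix: 1D Laplacian (assumed test problem, not from the talk)
n = 50
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
u = conjugate_gradient(A, f, np.zeros(n))
```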
Two Families of Krylov Methods
Minimum residual methods (MR): find u_n ∈ u_0 + K_n(A, r_0) such that ||f - A u_n||_2 → min.
  MINRES for symmetric problems
  GMRES for general problems
  QMR for general problems, only approximate minimization
Methods based on orthogonalization (OR): find u_n ∈ u_0 + K_n(A, r_0) such that f - A u_n ⊥ K_n(A, r_0).
  SymmLQ for symmetric problems
  FOM for general problems
  BiCGstab for general problems, bi-orthogonalization
All these methods converge well if the spectrum of the matrix A is clustered around 1.
Preconditioning the System
Find a matrix M such that the preconditioned system M^{-1}Au = M^{-1}f is easier to solve with a Krylov method. Two goals:
1. For CG: make κ(M^{-1}A) much smaller than κ(A). More generally: cluster the spectrum of M^{-1}A close to one.
2. It should be inexpensive to apply M^{-1}.
With very good intuition or experience, one can guess good preconditioners:
Additive Schwarz by Dryja and Widlund 1987
FETI by Farhat and Roux 1991 (see already Destuynder and Roux 1988) Balancing Domain Decomposition by Mandel and Brezina 1993
In preconditioning, we search for M such that
1. the spectrum of M^{-1}A is clustered around one,
2. it is inexpensive to apply M^{-1}.
For stationary iterative methods, we needed M such that
1. the spectral radius ρ(I - M^{-1}A) is small,
2. it is inexpensive to apply M^{-1}.
Note that ρ(I - M^{-1}A) small ⟺ spectrum of M^{-1}A close to one.
Idea: design a good M for a stationary iterative method, and then use it as a preconditioner for a Krylov method.
Theorem: Using a Krylov method with preconditioner M is always better than just using the stationary iteration.
Proof: The stationary iterative method computes
  u^n = (I - M^{-1}A) u^{n-1} + M^{-1} f = u^{n-1} + r^{n-1}_stat,
which means the residual r^n_stat := M^{-1}f - M^{-1}A u^n satisfies
  r^n_stat = M^{-1}f - M^{-1}A (u^{n-1} + r^{n-1}_stat) = (I - M^{-1}A) r^{n-1}_stat = (I - M^{-1}A)^n r^0.
The Krylov method will use the Krylov space
  K_n(M^{-1}A, r^0) := span{r^0, M^{-1}A r^0, ..., (M^{-1}A)^{n-1} r^0}
to search for u^n ∈ u^0 + K_n(M^{-1}A, r^0), i.e.
  u^n = u^0 + Σ_{i=1}^n α_i (M^{-1}A)^{i-1} r^0,
which means the residual r^n_kry satisfies
  r^n_kry = p^n(M^{-1}A) r^0,  p^n(0) = 1,
for some polynomial p^n of degree n. Since (1 - z)^n is one admissible polynomial, the stationary residual belongs to the set over which the Krylov method minimizes, so the Krylov residual cannot be larger.
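The proof can be checked numerically: the stationary residual (I - M^{-1}A)^n r^0 corresponds to one admissible polynomial, so the minimal-residual (GMRES-type) iterate over the same Krylov space can only do better. A small NumPy sketch, where the random test matrix and the Gauss-Seidel choice of M are assumed example data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
A = 4*np.eye(n) + 0.3*rng.standard_normal((n, n))  # made-up test matrix
f = rng.standard_normal(n)
M = np.tril(A)                       # Gauss-Seidel splitting as preconditioner
B = np.linalg.solve(M, A)            # B = M^{-1} A
g = np.linalg.solve(M, f)            # g = M^{-1} f

k = 5
r0 = g.copy()                        # with u^0 = 0, r^0 = M^{-1} f

# stationary residual after k steps: r_stat = (I - B)^k r0
r_stat = r0.copy()
for _ in range(k):
    r_stat = r_stat - B @ r_stat

# minimal residual Krylov iterate over u in span{r0, B r0, ..., B^{k-1} r0}
K = np.column_stack([np.linalg.matrix_power(B, i) @ r0 for i in range(k)])
c = np.linalg.lstsq(B @ K, g, rcond=None)[0]
r_kry = g - B @ (K @ c)
```

By construction ||r_kry|| ≤ ||r_stat||, which is exactly the statement of the theorem.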
An example of non-linear preconditioning
Additive Schwarz Preconditioned Inexact Newton (ASPIN): Cai, Keyes and Young DD13 (2001), Cai and Keyes SISC (2002)
"The nonlinear system is transformed into a new nonlinear system, which has the same solution as the original system. For certain applications the nonlinearities of the new function are more balanced and, as a result, the inexact Newton method converges more rapidly."
Instead of solving F(u) = 0, solve instead G(F(u)) = 0, with
  If G(v) = 0 then v = 0
  G ≈ F^{-1} in some sense
  G(F(v)) is easy to compute
  Applying Newton, the derivative of G(F(v)) applied to a vector w should also be easy to compute
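A toy scalar illustration of this idea (made-up, not from the talk): if the nonlinearity of F is strongly unbalanced, Newton on F makes slow progress far from the root, while Newton on G(F(u)), with G an approximate (here even exact) inverse of F, converges immediately:

```python
import math

# made-up toy problem: F has a strongly unbalanced nonlinearity,
# and G happens to be an exact inverse of F (so G(F(u)) = u)
def F(u):
    return math.exp(10*u) - 1.0          # root at u = 0

def G(v):
    return math.log(1.0 + v) / 10.0      # G = F^{-1}; note G(v) = 0 iff v = 0

def newton(fun, dfun, u, steps):
    for _ in range(steps):
        u = u - fun(u) / dfun(u)
    return u

# plain Newton on F: far from the root each step gains only about 0.1
u_plain = newton(F, lambda u: 10*math.exp(10*u), 2.0, 25)

# Newton on G(F(u)): the preconditioned function is exactly linear here,
# with derivative G'(F(u)) F'(u) = 1, so a single step suffices
u_prec = newton(lambda u: G(F(u)), lambda u: 1.0, 2.0, 1)
```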
ASPIN
For F: R^n → R^n, define (overlapping) subsets Ω_j of the indices {1,2,...,n}, such that ∪_j Ω_j = {1,2,...,n}, and corresponding restriction matrices R_j, e.g.
  Ω_1 = {1,2,3}  gives  R_1 = [ 1 0 0 0 ... 0
                                0 1 0 0 ... 0
                                0 0 1 0 ... 0 ].
Define the solution operator T_j: R^n → R^{Ω_j} such that
  R_j F(v - R_j^T T_j(v)) = 0.
Then ASPIN solves, using inexact Newton,
  G(F(u)) := Σ_{j=1}^J R_j^T T_j(u) = 0.
ASPIN may look a bit complicated... (Cai, Keyes 2002)
[Figure: Newton without preconditioning, driven cavity flow problem (Cai, Keyes 2002)]
[Figure: Newton with ASPIN preconditioning, driven cavity flow problem (Cai, Keyes 2002)]
Non-linear Preconditioners
Recall from the linear case: from a stationary iterative method for Au = f,
  u^{n+1} = (I - M^{-1}A) u^n + M^{-1} f,
with fast convergence, i.e. ρ(I - M^{-1}A) small, we obtain a good preconditioner M to solve
  M^{-1}A u = M^{-1} f
with a Krylov method.
Idea: For the non-linear problem F(u) = 0, construct a fixed point iteration u^{n+1} = G(u^n) and then solve
  ℱ(u) := G(u) - u = 0
using Newton's method.
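As a scalar sketch of this idea (the model F(u) = u^3 + u - 1 = 0 and the fixed point map G are assumed example data, not from the talk): G(u) = 1/(1 + u^2) is a convergent fixed point map whose fixed point is exactly the root of F, and Newton is applied to G(u) - u = 0:

```python
# made-up scalar model: F(u) = u^3 + u - 1 = 0 with the convergent
# fixed point map G(u) = 1/(1 + u^2), so G(u) = u exactly when F(u) = 0
def G(u):
    return 1.0 / (1.0 + u*u)

def dG(u):
    return -2.0*u / (1.0 + u*u)**2

# Newton on the preconditioned equation  G(u) - u = 0
u = 0.0
for _ in range(20):
    u = u - (G(u) - u) / (dG(u) - 1.0)
```

Newton converges quadratically to the fixed point, which is also the root of the original F.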
Using a Schwarz Method
One dimensional non-linear model problem:
  L(u) := -∂_x((1 + u^2) ∂_x u) = f in Ω := (0, L),  u(0) = 0, u(L) = 0.
Parallel Schwarz method with two subdomains Ω_1 := (0, β) and Ω_2 := (α, L), α < β:
  L(u_1^n) = f in Ω_1,  u_1^n(0) = 0,  u_1^n(β) = u_2^{n-1}(β),
  L(u_2^n) = f in Ω_2,  u_2^n(α) = u_1^{n-1}(α),  u_2^n(L) = 0.
This is a non-linear fixed point iteration. How can we apply Newton to solve for the fixed point?
One Level RASPEN
Glue the two subdomain solutions,
  u^n(x) := u_1^n(x) if 0 ≤ x < (α+β)/2,   u_2^n(x) if (α+β)/2 ≤ x ≤ L,
or, with zero extension operators P_i (often called R_i^T in RAS),
  u^n = P_1 u_1^n + P_2 u_2^n.
With the solution operators G_i for the non-linear subdomain problems,
  u_1^n = G_1(u^{n-1}),  u_2^n = G_2(u^{n-1}),
we obtain (for I subdomains)
  u^n = Σ_{i=1}^I P_i G_i(u^{n-1}) =: 𝒢_1(u^{n-1}).
RASPEN: Solve the fixed point equation with Newton,
  ℱ_1(u) := 𝒢_1(u) - u = Σ_{i=1}^I P_i G_i(u) - u = 0.
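A small algebraic sketch of one level RASPEN (all details here are assumptions for illustration: the nonlinear test system, the index subdomains, the tolerances; this is not the talk's model problem). The subdomain solvers G_i use an inner Newton iteration with the exact local Jacobian, and the outer Newton runs on Σ P_i G_i(u) - u = 0 with a finite-difference Jacobian:

```python
import numpy as np

n = 20
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

def F(u):
    # made-up nonlinear diffusion-type system: A u + u^3 = b componentwise
    return A @ u + u**3 - b

# overlapping subdomains (index sets) and a nonoverlapping RAS partition P_i
doms = [np.arange(0, 12), np.arange(8, 20)]
parts = [np.arange(0, 10), np.arange(10, 20)]

def subdomain_solve(u, idx):
    """G_i(u): solve F(v)_j = 0 for j in idx, v = u outside idx (inner Newton)."""
    v = u.copy()
    for _ in range(50):
        r = F(v)[idx]
        if np.linalg.norm(r) < 1e-12:
            break
        J = A[np.ix_(idx, idx)] + np.diag(3*v[idx]**2)  # exact local Jacobian
        v[idx] -= np.linalg.solve(J, r)
    return v

def F1(u):
    """RASPEN function: RAS combination of subdomain solutions minus u."""
    w = np.zeros_like(u)
    for idx, part in zip(doms, parts):
        w[part] = subdomain_solve(u, idx)[part]
    return w - u

# outer Newton on F1(u) = 0, Jacobian approximated by finite differences
u = np.zeros(n)
h = 1e-7
for _ in range(15):
    r = F1(u)
    if np.linalg.norm(r) < 1e-10:
        break
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (F1(u + e) - r) / h
    u -= np.linalg.solve(J, r)
```

The finite-difference outer Jacobian is only a convenience for this sketch; as noted later in the talk, RASPEN obtains the exact Jacobian from the inner non-linear solves.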
Forchheimer Equation
  ∂_x ( q( λ(x) ∂_x u(x) ) ) = f(x) in Ω,  u(0) = u_D^0,  u(L) = u_D^L,
with β > 0, 0 < λ_min ≤ λ(x) ≤ λ_max, and
  q(g) = sgn(g) (-1 + √(1 + 4β|g|)) / (2β).
[Figure: residual as a function of iterations for 8 subdomains, fixed point iteration and RASPEN]
Adding a Coarse Grid Correction
Use the Full Approximation Scheme (FAS) from multigrid: compute the correction c from a non-linear coarse problem,
  L_c(R_0 u^{n-1} + c) = L_c(R_0 u^{n-1}) + R_0 (f - L(u^{n-1})),
and add the correction c =: C_0(u^{n-1}) to the iterate,
  u^{n-1}_new = u^{n-1} + P_0 C_0(u^{n-1}).
This gives naturally the two level fixed point iteration
  u^n = Σ_{i=1}^I P_i G_i(u^{n-1} + P_0 C_0(u^{n-1})) =: 𝒢_2(u^{n-1}).
Two level RASPEN means solving with Newton
  ℱ_2(u) := 𝒢_2(u) - u = Σ_{i=1}^I P_i G_i(u + P_0 C_0(u)) - u = 0.
Comparison of ASPIN and RASPEN
One Level Variants:
  RASPEN: ℱ_1(u) := Σ_{i=1}^I P_i G_i(u) - u = 0
  ASPIN: ℱ_1^A(u) := Σ_{i=1}^I R_i^T G_i(u) - u = 0
Two Level Variants:
  RASPEN: ℱ_2(u) := Σ_{i=1}^I P_i G_i(u + P_0 C_0(u)) - u = 0
  ASPIN: ℱ_2^A(u) = P_0 C_0^A(u) + Σ_{i=1}^I R_i^T G_i(u) - u = 0, with C_0^A defined by F_0(C_0^A(u) + u_0) = R_0 F(u), F_0(u_0) = 0
The main differences are:
  RASPEN does not sum corrections in the overlap like ASPIN, since it is a convergent fixed point method.
  RASPEN uses the full approximation scheme for the coarse correction, whereas ASPIN does an additive ad hoc construction.
  RASPEN uses exact Newton, since one obtains the exact Jacobian from the inner non-linear solves.
[Figures: error as a function of the iteration n and of the number of subdomain solves, comparing Newton, one level AS, two level AS, one level ASPIN and two level ASPIN]
[Figures: error as a function of the iteration n and of the number of subdomain solves, comparing Newton, one level RAS, two level RAS, one level RASPEN and two level FAS RASPEN]
A Non-linear Diffusion Problem: One Level
  -∇·((1 + u^2) ∇u) = f,  Ω = [0,1]^2,  u = 1 on x = 1,  u = 0 otherwise.
Results for RASPEN (ASPIN in parentheses):

  N x N | n | ls_n^G  | ls_n^in | ls_n^min | LS_n 1 level
  2 x 2 | 1 | 15 (20) | 3 (3)   | 2 (2)    |
        | 2 | 17 (23) | 2 (2)   | 2 (2)    | 57 (75)
        | 3 | 18 (26) | 1 (1)   | 1 (1)    |
  4 x 4 | 1 | 32 (37) | 2 (2)   | 2 (2)    |
        | 2 | 35 (41) | 2 (2)   | 1 (1)    | 110 (129)
        | 3 | 38 (46) | 1 (1)   | 1 (1)    |
  6 x 6 | 1 | 46 (54) | 2 (2)   | 2 (2)    |
        | 2 | 51 (59) | 2 (2)   | 1 (1)    | 164 (183)
        | 3 | 57 (65) | 1 (1)   | 1 (1)    |

ls_n^G: GMRES steps per Newton step; ls_n^in: maximum and ls_n^min: minimum number of inner iterations to evaluate ℱ.
A Non-linear Diffusion Problem: Two Level
  -∇·((1 + u^2) ∇u) = f,  Ω = [0,1]^2,  u = 1 on x = 1,  u = 0 otherwise.
Results for Two Level RASPEN (Two Level ASPIN in parentheses):

  N x N | n | ls_n^G  | ls_n^in | ls_n^min | LS_n 2 level
  2 x 2 | 1 | 13 (23) | 3 (3)   | 2 (2)    |
        | 2 | 15 (26) | 2 (2)   | 2 (2)    | 51 (83)
        | 3 | 17 (28) | 1 (1)   | 1 (1)    |
  4 x 4 | 1 | 18 (33) | 2 (2)   | 2 (2)    |
        | 2 | 22 (39) | 2 (2)   | 1 (1)    | 71 (123)
        | 3 | 26 (46) | 1 (1)   | 1 (1)    |
  6 x 6 | 1 | 18 (35) | 2 (2)   | 2 (2)    |
        | 2 | 23 (42) | 2 (2)   | 1 (1)    | 73 (133)
        | 3 | 27 (51) | 1 (1)   | 1 (1)    |

ls_n^G: GMRES steps per Newton step; ls_n^in: maximum and ls_n^min: minimum number of inner iterations to evaluate ℱ.
Conclusions
Linear preconditioning means solving the preconditioned linear system M^{-1}Au = M^{-1}f using a Krylov method.
Non-linear preconditioning means solving the preconditioned non-linear system G(F(u)) = 0 using Newton's method.
Preconditioners can be defined using
1. intuition and experience,
2. a stationary fixed point iteration.
What should a linear or non-linear preconditioner really do?
Preprints are available at www.unige.ch/~gander