Solving Large Nonlinear Sparse Systems

Fred W. Wubs and Jonas Thies
Computational Mechanics & Numerical Mathematics, University of Groningen, the Netherlands (f.w.wubs@rug.nl)
Centre for Interdisciplinary Mathematics, Uppsala University, Sweden (jonas@math.uu.se)
Workshop: Tipping Points in Complex Flows, Leiden, November 2, 2011

Outline
1 Two-level ILU
2 They are everywhere
    Solution
    Example: eigenvalue problem
3 Improvements and generalizations
    Summary

Two-level ILU

Objective
- 3D CFD problems, geophysical applications
- Compute branches of steady states
- Identify bifurcation points
- Investigate stability
Key challenge: large sparse linear systems with the Jacobian matrix.

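Fully implicit approach
Incompressible Navier-Stokes equations:
$$u_t + N(u,u) + Lu + \nabla p = 0, \qquad \nabla \cdot u = 0$$
- Discretize (here: second-order symmetry-preserving finite differences on a C-grid).
- Linearize by Newton's method.
Structure of the resulting linear systems (saddle-point matrix):
$$\begin{pmatrix} L + N & \mathrm{Grad} \\ \mathrm{Div} & 0 \end{pmatrix} \begin{pmatrix} u \\ p \end{pmatrix} = \begin{pmatrix} f_u \\ f_p \end{pmatrix} \qquad (1)$$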

Direct vs. iterative linear solvers
Sparse direct:
- robust and easy to use;
- computational complexity $O(N^2)$ in 3D ($N$: number of unknowns);
- substantial fill-in, $O(N^{4/3})$.
Preconditioned iterative:
- usually not robust, depend on many parameters;
- can have optimal complexity $O(N)$;
- save memory and CPU time by avoiding fill-in.
Can we combine the best of both? Aim for an ILU that stays close to the exact LU and preserves the properties of the matrix.

Ingredients for an effective and robust incomplete factorization
- Fill-reducing ordering.
- Fourier-like transformation to improve diagonal dominance and get rid of unwanted couplings.
- Dropping by retaining principal submatrices: these submatrices are positive definite if the matrix is positive definite.
- For the incompressible Navier-Stokes equations, do not drop in the divergence and gradient parts; on a C-grid there is no increase of fill in these parts (not even in a direct method).

of the new algorithm: Stokes on a structured C-grid
- Step 1: domain decomposition.
- Step 2: identify separators.
- Step 3: elimination yields a 'geometric' Schur complement.
- Step 4: flux representation ('coarse grid').

F-matrices
A saddle point matrix has the following structure:
$$K = \begin{bmatrix} A & B \\ B^T & 0 \end{bmatrix}. \qquad (2)$$
Definition 1. A gradient-type matrix has at most two nonzero entries per row, and its row sum is zero.
Definition 2. A saddle point matrix (2) is called an F-matrix if $A$ is positive definite and $B$ is a gradient-type matrix.
The Jacobian of the Stokes equations ($Re \to 0$) on a C-grid is an F-matrix.
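A small check of Definitions 1 and 2 can make them concrete. The sketch below is a minimal illustration in Python with NumPy/SciPy (our choice; the slides name no implementation). The helpers `is_gradient_type`, `is_f_matrix` and `cgrid_gradient_1d` are hypothetical names, and the 1D C-grid gradient with a tridiagonal velocity block is an assumed toy example, not the 3D discretization of the talk. Positive definiteness of a possibly nonsymmetric $A$ is tested through its symmetric part.

```python
import numpy as np
import scipy.sparse as sp


def is_gradient_type(B, tol=1e-12):
    """Definition 1: at most two nonzeros per row and a zero row sum."""
    B = sp.csr_matrix(B, copy=True)
    B.eliminate_zeros()                          # ignore explicitly stored zeros
    nnz_per_row = np.diff(B.indptr)
    row_sums = np.asarray(B.sum(axis=1)).ravel()
    return bool(np.all(nnz_per_row <= 2) and np.all(np.abs(row_sums) <= tol))


def is_f_matrix(A, B):
    """Definition 2: A positive definite and B gradient-type.

    x^T A x > 0 for all x != 0 is checked via the smallest eigenvalue of the
    symmetric part of A (dense, so only meant for small examples).
    """
    A = sp.csr_matrix(A).toarray()
    sym_part = 0.5 * (A + A.T)
    return bool(np.linalg.eigvalsh(sym_part).min() > 0 and is_gradient_type(B))


def cgrid_gradient_1d(n, h=1.0):
    """Hypothetical 1D C-grid example: n pressure cells, n-1 interior velocity faces.

    Face i lies between cells i and i+1, so its row is (-1/h, +1/h).
    """
    rows = np.repeat(np.arange(n - 1), 2)
    cols = np.column_stack([np.arange(n - 1), np.arange(1, n)]).ravel()
    vals = np.tile([-1.0 / h, 1.0 / h], n - 1)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n - 1, n))


n = 6
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n - 1, n - 1))  # SPD velocity block
B = cgrid_gradient_1d(n, h=0.1)
print(is_gradient_type(B), is_f_matrix(A, B))  # True True
```

Adding a third nonzero to any row of `B`, or flipping the sign of `A`, makes the corresponding check fail.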

Computing an LU decomposition of an F-matrix
$$\begin{bmatrix} A & B \\ B^T & 0 \end{bmatrix} \begin{bmatrix} x_v \\ x_p \end{bmatrix} = \begin{bmatrix} f_v \\ f_p \end{bmatrix}$$
(V-nodes: velocity unknowns $x_v$; P-nodes: pressure unknowns $x_p$.)
Algorithm: LU decomposition of an F-matrix.
- Compute a fill-reducing ordering for the graph $F(A) \cup F(BB^T)$.
- During Gaussian elimination, insert the P-nodes so that they form $2 \times 2$ pivots whenever a coupling between a V-node and a P-node is encountered.
Theorem 1. In every step of the above algorithm, the resulting Schur complement is an F-matrix.

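How is fill generated in the direct approach?
$$\begin{bmatrix} \alpha & \beta & a^T & b^T \\ \beta & 0 & \hat b^T & 0 \\ a & \hat b & \hat A & \hat B \\ b & 0 & \hat B^T & O \end{bmatrix} \qquad (3)$$
Elimination step with the $2 \times 2$ pivot $\begin{bmatrix} \alpha & \beta \\ \beta & 0 \end{bmatrix}$:
- a multiple of $\hat b \hat b^T$ is added to $\hat A$;
- $\hat b$ becomes denser as P-nodes are eliminated;
- so dropping in $\hat A$ doesn't make sense.
Main problem: for an ILU we have to get rid of the couplings of velocities to interior pressures.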
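The elimination step can be checked numerically. This sketch (Python/NumPy, random data; all sizes and values are hypothetical) assembles a matrix with the block structure (3), eliminates the $2 \times 2$ pivot, and compares the result with a hand-derived closed form: the velocity block becomes $\hat A - (\hat b a^T + a \hat b^T)/\beta + (\alpha/\beta^2)\,\hat b \hat b^T$, the velocity-pressure block becomes $\hat B - \hat b b^T/\beta$, and the pressure-pressure block stays zero.

```python
import numpy as np

rng = np.random.default_rng(0)
m, q = 5, 3                       # remaining V-nodes and P-nodes (arbitrary sizes)
alpha, beta = 4.0, 2.0            # beta != 0 makes the 2x2 pivot nonsingular
a, b_hat = rng.standard_normal(m), rng.standard_normal(m)
b = rng.standard_normal(q)
M = rng.standard_normal((m, m))
A_hat = M @ M.T + m * np.eye(m)   # SPD velocity block
B_hat = rng.standard_normal((m, q))

# Assemble (3) with unknowns ordered [V-node 1, P-node 1, other V-nodes, other P-nodes]
n = 2 + m + q
iv, ip = slice(2, 2 + m), slice(2 + m, n)
K = np.zeros((n, n))
K[0, 0], K[0, 1], K[1, 0] = alpha, beta, beta
K[0, iv] = K[iv, 0] = a
K[0, ip] = K[ip, 0] = b
K[1, iv] = K[iv, 1] = b_hat
K[iv, iv] = A_hat
K[iv, ip] = B_hat
K[ip, iv] = B_hat.T

# Eliminate the 2x2 pivot [[alpha, beta], [beta, 0]]: S = K22 - K21 P^{-1} K12
P = K[:2, :2]
S = K[2:, 2:] - K[2:, :2] @ np.linalg.solve(P, K[:2, 2:])

# Closed form of the update, derived by hand from (3)
S_VV_pred = (A_hat - (np.outer(b_hat, a) + np.outer(a, b_hat)) / beta
             + (alpha / beta**2) * np.outer(b_hat, b_hat))
S_VP_pred = B_hat - np.outer(b_hat, b) / beta

print(np.allclose(S[:m, :m], S_VV_pred),   # multiple of b_hat b_hat^T added to A_hat
      np.allclose(S[:m, m:], S_VP_pred),   # V-P block picks up a rank-one term
      np.allclose(S[m:, m:], 0.0))         # P-P block stays zero
# True True True
```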

Domain decomposition
This ordering exposes parallelism in the matrix:
$$K = \begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix},$$
where $K_{11}$ is block-diagonal.
- Subdomains and 'separator groups';
- retain one pressure per subdomain.
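The ordering can be illustrated on a scalar problem. The sketch below assumes a 2D 5-point Laplacian (not the C-grid Stokes matrix of the talk) and square subdomains separated by one-node-wide grid lines; `laplace_2d` and `dd_ordering` are hypothetical helpers. After the symmetric permutation, the leading block $K_{11}$ couples only nodes of the same subdomain, so it is block diagonal. Retaining one pressure per subdomain has no counterpart in this scalar example.

```python
import numpy as np
import scipy.sparse as sp


def laplace_2d(nx):
    """5-point Laplacian on an nx-by-nx grid (Dirichlet boundaries); node = i*nx + j."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(nx, nx))
    I = sp.identity(nx)
    return (sp.kron(I, T) + sp.kron(T, I)).tocsr()


def dd_ordering(nx, s):
    """Subdomain interiors first (grouped per subdomain), separator nodes last.

    Square s-by-s subdomains are separated by one-node-wide grid lines;
    assumes nx = k*(s+1) - 1 for some integer k.
    """
    i, j = np.divmod(np.arange(nx * nx), nx)
    is_sep = ((i + 1) % (s + 1) == 0) | ((j + 1) % (s + 1) == 0)
    sub_id = (i // (s + 1)) * nx + (j // (s + 1))    # label of the enclosing subdomain
    interior = np.flatnonzero(~is_sep)
    interior = interior[np.argsort(sub_id[interior], kind="stable")]
    separator = np.flatnonzero(is_sep)
    return np.concatenate([interior, separator]), interior.size, sub_id


nx, s = 11, 3                      # 3 x 3 subdomains of 3 x 3 nodes each
K = laplace_2d(nx)
perm, n1, sub_id = dd_ordering(nx, s)
Kp = K[perm][:, perm]              # symmetric permutation P K P^T
K11 = Kp[:n1, :n1].tocoo()
labels = sub_id[perm[:n1]]
# K11 is block diagonal: every nonzero couples two nodes of the same subdomain
print(n1, nx * nx - n1, np.all(labels[K11.row] == labels[K11.col]))  # 81 40 True
```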

The Schur complement
- LU decomposition of the matrices on the subdomains: $K_{11} = L_{11} U_{11}$;
- Schur complement: $S = K_{22} - K_{21} K_{11}^{-1} K_{12}$;
- $S$ retains the structural and numerical properties of $K$;
- $S$ has only a few rather dense 'B' columns (with at most two entries per row);
- solve the system with $S$ by a preconditioned Krylov subspace method.
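Forming $S$ for a tiny case shows the idea. In this sketch (Python/SciPy; the helper `schur_complement` is hypothetical) $K_{11}$ is factored once with `scipy.sparse.linalg.splu` and $S$ is returned as a dense array, which is only sensible for very small examples; a parallel code would factor each subdomain block separately and keep $S$ sparse. The test matrix is an assumed 1D Laplacian with two separator nodes, so $K_{11}$ consists of three independent blocks.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def schur_complement(K, n1):
    """Hypothetical helper: S = K22 - K21 K11^{-1} K12 with K11 = K[:n1, :n1].

    S is returned as a dense array, which is only sensible for tiny examples.
    """
    K = sp.csc_matrix(K)
    K11, K12 = K[:n1, :n1], K[:n1, n1:]
    K21, K22 = K[n1:, :n1], K[n1:, n1:]
    lu = splu(sp.csc_matrix(K11))          # K11 = L11 U11 (block diagonal here)
    X = lu.solve(K12.toarray())            # K11^{-1} K12
    return K22.toarray() - K21 @ X


n = 11
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsr()
sep = [3, 7]                                      # assumed separator nodes
interior = [i for i in range(n) if i not in sep]  # subdomains {0,1,2}, {4,5,6}, {8,9,10}
Kp = K[interior + sep][:, interior + sep]         # interiors first, separators last
S = schur_complement(Kp, len(interior))
print(S)  # approximately [[0.5, -0.25], [-0.25, 0.5]]: symmetric positive definite, like K
```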

How can we maintain sparsity?
- The Schur complement is still an F-matrix;
- all V-nodes on a separator are now connected to the same two P-nodes;
- use an orthogonal similarity transformation to disconnect them (harmless: SPD remains SPD).
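One concrete choice for the orthogonal transformation is a Householder reflection; the slide does not specify the transformation, so this is an assumption of the sketch. Assuming every V-node of a separator group has the identical row $(-1, +1)$ coupling it to the same two P-nodes (a uniform grid), a reflection $H$ that maps the all-ones vector onto the first coordinate axis leaves only one transformed V-node (a flux-like unknown) coupled to the pressures, while $HAH$ keeps the eigenvalues of $A$. The helper name `householder_to_first_axis` is hypothetical; for proportional but unequal rows one would use the vector of row scalings instead of the all-ones vector.

```python
import numpy as np


def householder_to_first_axis(v):
    """Hypothetical helper: symmetric orthogonal H with H @ v = ||v|| * e_1."""
    u = v / np.linalg.norm(v)
    w = u.copy()
    w[0] -= 1.0                          # w = u - e_1
    if np.linalg.norm(w) < 1e-14:        # v already points along e_1
        return np.eye(v.size)
    return np.eye(v.size) - 2.0 * np.outer(w, w) / (w @ w)


k = 4                                    # V-nodes in one separator group (assumed)
B_sep = np.tile([-1.0, 1.0], (k, 1))     # each row couples to the same two P-nodes
A_sep = 2 * np.eye(k) - np.eye(k, k=1) - np.eye(k, k=-1)  # SPD velocity block (toy)

H = householder_to_first_axis(np.ones(k))
B_t = H @ B_sep        # first row ~ sqrt(k) * (-1, 1), all other rows ~ 0
A_t = H @ A_sep @ H    # orthogonal similarity (H = H^T = H^{-1}): still SPD
print(np.round(B_t, 12))
```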


Dropping
Use simple drop-by-position:
- drop all couplings between separator groups,
- and all couplings between $V_\Sigma$ and regular V-nodes.
Principal submatrices of an SPD matrix are SPD ⇒ block-diagonal preconditioner with a 'reduced matrix' $S_2$ in the lower right.
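Drop-by-position can be expressed as a mask on group labels. This sketch (Python/NumPy; `drop_by_position` and the labels are hypothetical) keeps only the couplings within each group, so the result is a symmetric permutation of a block-diagonal matrix whose blocks are principal submatrices of $S$; a Cholesky factorization therefore succeeds whenever $S$ is SPD. The random SPD matrix is a stand-in for the actual reduced matrix.

```python
import numpy as np


def drop_by_position(S, groups):
    """Hypothetical helper: keep S[i, j] only if nodes i and j share a group label."""
    groups = np.asarray(groups)
    same_group = groups[:, None] == groups[None, :]
    return np.where(same_group, S, 0.0)


rng = np.random.default_rng(1)
M = rng.standard_normal((7, 7))
S = M @ M.T + 1e-3 * np.eye(7)      # SPD stand-in, far from diagonally dominant
groups = [0, 0, 1, 1, 1, 2, 2]       # e.g. two separator groups plus the V_Sigma nodes
P = drop_by_position(S, groups)

# Each retained block is a principal submatrix of S, hence SPD, so this succeeds
L = np.linalg.cholesky(P)
print(np.allclose(L @ L.T, P))  # True
```

Dropping by magnitude instead of by position gives no such guarantee: an arbitrary symmetric sparsification of an SPD matrix can be indefinite.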

Why it works
- Orthogonal transformations:
    - eliminate most V-P couplings to avoid fill;
    - act as 'transfer operators' defining the coarse problem $S_2$;
    - improve diagonal dominance ⇒ grid-independent convergence.
- Coarse problem $S_2$: solve for the flux $V_\Sigma$ through each separator; still an F-matrix in the case of the Stokes equations.
- Constraint preconditioning: no approximations in the 'Grad' or 'Div' part ⇒ mass is conserved exactly throughout.
- Drop-by-position ⇒ original properties preserved (symmetry, positiveness); singular subsystems cannot occur ⇒ robust approach.
- No segregation of variables.
- Grid-independent convergence: the block size determines the rate of convergence.
- In the two-level variant, the amount of work is not independent of the problem size.

Example 1: 3D Laplace equation
Convergence criterion: $\|r\| / \|r_0\| < 10^{-8}$.

Example 1: 3D Laplace Equation (2)

Stokes equations: relative fill

Stokes equations: number of iterations

2D lid-driven cavity
- Incompressible Navier-Stokes;
- stretched structured grid (ratio 5);
- Newton's method;
- first Hopf bifurcation at $Re \approx 8375$ (Tiesinga & Wubs 2002).

Convergence behavior
Convergence criterion: $\|r\| / \|r_0\| < 10^{-6}$.

Achieving high accuracy
Driven cavity, $512 \times 512$ grid; subdomain size $8 \times 8$; convergence tolerance $10^{-10}$; preconditioned GMRES ⇒ some modes are not captured with this subdomain size.

Robust at high Reynolds numbers
- Can compute highly unstable steady states;
- moderate increase in the number of iterations;
- convergence tolerance $10^{-8}$ here.

Performance, 3D flow in a driven cavity
Trilinos implementation, MUMPS for the coarse solve; IBM P6, 32 cores/node (Huygens); subdomain size $8^3$. Speedups are listed as obtained/ideal.

grid   cores   setup time (s)   speedup   solve time (s)   speedup
32^3       1              757         -             59.8         -
32^3       8              113     6.7/8             9.30     6.4/8
32^3      16             64.1     12/16             5.24     11/16
64^3       8             1050         -             91.3         -
64^3      16              555     1.9/2             50.0     1.8/2
64^3      32              290     3.6/4             30.2     3.0/4
64^3      64              176     6.6/8             89.3     0.9/8

They are everywhere
Bordered systems of the form
$$\begin{bmatrix} A & V \\ W^T & C \end{bmatrix} \begin{bmatrix} x \\ s \end{bmatrix} = \begin{bmatrix} f_x \\ f_c \end{bmatrix},$$
where $A$ is a big sparse matrix and $V$ and $W$ contain a number of vectors. They occur in:
- continuation (the Jacobian $A$ becomes singular near a turning point);
- eigenvalue computation in the Jacobi-Davidson method;
- the DO method for stochastic PDEs using implicit methods.
In the last two methods one has to compute a correction in a space orthogonal to the current space.

Standard solution
Standard approach: make a block LU factorization
$$\begin{bmatrix} A & V \\ W^T & C \end{bmatrix} = \begin{bmatrix} A & 0 \\ W^T & I \end{bmatrix} \begin{bmatrix} I & A^{-1} V \\ 0 & C - W^T A^{-1} V \end{bmatrix}.$$
What if $A$ becomes singular?
ARPACK: targets 0 and 0.1.
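The block LU factorization translates directly into a solve with one sparse factorization of $A$ and a small dense Schur complement $C - W^T A^{-1} V$. The sketch below uses SciPy's `splu`; the function name `solve_bordered` and the test data are assumptions for illustration. It needs $A$ itself to be nonsingular, which is exactly what fails near a turning point even when the bordered matrix is nonsingular.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def solve_bordered(A, V, W, C, fx, fc):
    """Hypothetical helper: solve [[A, V], [W^T, C]] [x; s] = [fx; fc] by block LU.

    Requires a nonsingular A: splu fails (or is inaccurate) when A is (nearly)
    singular, e.g. near a turning point.
    """
    lu = splu(sp.csc_matrix(A))           # A = L U
    AinvV = lu.solve(V)                   # A^{-1} V   (n x k)
    Ainvf = lu.solve(fx)                  # A^{-1} f_x
    S = C - W.T @ AinvV                   # small k x k Schur complement
    s = np.linalg.solve(S, fc - W.T @ Ainvf)
    x = Ainvf - AinvV @ s                 # back substitution
    return x, s


rng = np.random.default_rng(2)
n, k = 30, 2
A = sp.diags([-1.0, 3.0, -1.0], [-1, 0, 1], shape=(n, n))
V, W = rng.standard_normal((n, k)), rng.standard_normal((n, k))
C = np.eye(k)
fx, fc = rng.standard_normal(n), rng.standard_normal(k)
x, s = solve_bordered(A, V, W, C, fx, fc)
```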

Incorporation in a multilevel approach
The multilevel ILU comes in very handy. Example in the two-level case:
$$\begin{bmatrix} A_{11} & A_{12} & V_1 \\ A_{21} & A_{22} & V_2 \\ W_1^T & W_2^T & C \end{bmatrix} = \begin{bmatrix} A_{11} & 0 & 0 \\ A_{21} & I & 0 \\ W_1^T & 0 & I \end{bmatrix} \begin{bmatrix} I & A_{11}^{-1} A_{12} & A_{11}^{-1} V_1 \\ 0 & A_{22} - A_{21} A_{11}^{-1} A_{12} & V_2 - A_{21} A_{11}^{-1} V_1 \\ 0 & W_2^T - W_1^T A_{11}^{-1} A_{12} & C - W_1^T A_{11}^{-1} V_1 \end{bmatrix}$$
On the last level we can apply a direct method with pivoting, which precludes instability. Possible indefiniteness is likely to occur in the low-frequency part of the solution, and this is pushed down to the coarsest grid. If the last block of the preconditioning matrix is indefinite, this signals a bifurcation in the original problem.
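The two-level variant eliminates only the $A_{11}$ unknowns and hands the remaining bordered system, the lower-right block of the $U$ factor, to a direct solver with pivoting. The dense NumPy sketch below (hypothetical function name, random test data; `numpy.linalg.solve` uses LU with partial pivoting) follows the factorization row by row. It is not the multilevel ILU itself, in which the coarse blocks would only be approximated.

```python
import numpy as np


def two_level_bordered_solve(A11, A12, A21, A22, V1, V2, W1, W2, C, f1, f2, fc):
    """Hypothetical helper: eliminate the A11 unknowns, then solve the coarse
    bordered system [[S22, V2 - A21 A11^{-1} V1], [W2^T - W1^T A11^{-1} A12,
    C - W1^T A11^{-1} V1]] directly, with S22 = A22 - A21 A11^{-1} A12."""
    X12 = np.linalg.solve(A11, A12)       # A11^{-1} A12
    Y1 = np.linalg.solve(A11, V1)         # A11^{-1} V1
    g1 = np.linalg.solve(A11, f1)         # A11^{-1} f1 (forward step with the L factor)
    coarse = np.block([[A22 - A21 @ X12, V2 - A21 @ Y1],
                       [W2.T - W1.T @ X12, C - W1.T @ Y1]])
    rhs = np.concatenate([f2 - A21 @ g1, fc - W1.T @ g1])
    # Last level: LU with partial pivoting handles an indefinite (but nonsingular) block
    z = np.linalg.solve(coarse, rhs)
    n2 = A22.shape[0]
    x2, s = z[:n2], z[n2:]
    x1 = g1 - X12 @ x2 - Y1 @ s           # back substitution for the first level
    return x1, x2, s
```

The result agrees with a direct solve of the full $3 \times 3$ block system.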

Eigenvalue problems
- Jacobi-Davidson QZ method (JDQZ): an accelerated inexact Newton method.
- The search space dimension ranges from 20 to 40.
- Use the two-level factorization as approximate Jacobian.
- Target 0; find the first 6 eigenvalues.
- First vector from $LUx = \mathrm{rand}$.
Problem: Jacobian matrices from the lid-driven cavity at Re = 100 and 1000 (vertical); grid refinement from 32×32 to 64×64 (horizontal). The plots show the residual against the iteration number (MVs); mind the x-axis scale (200 resp. 400).

[Figure: JDQZ convergence histories, $\log_{10}\|r\|$ against iteration step, four panels (x-axis up to 200 and 400); jmin = 20, jmax = 40, residual tolerance $10^{-8}$; test subspace computed as Av - tau*dv; correction equation solved with the augmented preconditioner.]

Improvements and generalizations

Generalizations
- Different coordinate systems: spherical coordinates, common in geophysics; flux formulation ⇒ F-matrix.
- Different discretizations: B-grid.
- Different physics: can solve Poisson, convection-diffusion and Stokes problems; adding heat transfer is easy; the Coriolis force is skew-symmetric, so it works; good scaling is essential.
- Rotate $v$ by 45° ⇒ F-matrix.

Possible improvements
- Multilevel extension: the reduced problem has the same structure as the original matrix; recursive application leads to linear computational complexity.
- Deflation to avoid 'plateaus' in GMRES.
- Adaptivity: any domain decomposition can be used; for inhomogeneous problems, short separators in regions of weak coupling.
- Unstructured grids: structure-preserving direct method?

Summary
Bifurcation analysis requires fast and robust linear algebra. We developed a solver that combines:
- ease of use: only one parameter;
- robustness: the factorization doesn't break down;
- usability as an approximate Jacobian;
- parallelism: exposed on every level;
- grid-independent convergence for an ILU;
- extendability to multi-physics problems.
Next steps:
- recursive solver with $O(N)$ complexity;
- implement deflation;
- do some nice CFD problems.

References
- A.C. de Niet and F.W. Wubs. Numerically stable LDL^T factorization of F-type saddle point matrices. IMA Journal of Numerical Analysis, 29(1):208-234.
- F.W. Wubs and J. Thies. A robust two-level incomplete factorization for (Navier-)Stokes saddle point matrices. To appear in SIAM J. Matrix Anal. Appl., 2011. Preprint available on arXiv:1006.1874v1.
- G.L.G. Sleijpen and F.W. Wubs. Exploiting multilevel preconditioning techniques in eigenvalue computations. SIAM Journal on Scientific Computing, 25(4):1249-1272, 2003.