Structure preserving preconditioner for the incompressible Navier-Stokes equations

Structure preserving preconditioner for the incompressible Navier-Stokes equations Fred W. Wubs and Jonas Thies Computational Mechanics & Numerical Mathematics University of Groningen, the Netherlands f.w.wubs@rug.nl Centre for Interdisciplinary Mathematics Uppsala University, Sweden jonas@math.uu.se Workshop: Recent developments in the solution of indenite systems Eindhoven, April 17, 2012

Outline The Challenge 1 The Challenge Workshops 2 Current Results A cartoon Computational experiments 3 They are everywhere Solution Example: eigenvalue problem 4 Summary

Workshops The Challenge

Past and future The Challenge Workshops 1995 Laplace Symphony How fast the Laplace equation was solved in 1995, E.F.F. Botta, K. Dekker, Y. Notay, A. van der Ploeg, C. Vuik, F.W. Wubs and P.M. de Zeeuw, Applied Numerical Mathematics, 1997, pp. 439-455. 2013 Minisymposium on Comparison of solvers in Large Scale Bifurcation Analysis Where: BIFD conference July 8-13, Haifa, Israel Organizers: Henk Dijkstra and Fred Wubs 2 Problems: 3D Lid Driven Cavity, Boussinesq on the globe

Benchmark problems The Challenge Workshops 1 3D Lid Driven Cavity Problem description and results in Oscillatory instability of a three-dimensional lid-driven ow in a cube by Yuri Feldman and Alexander Yu. Gelfgat, Phys. Fluids 22, 093602 (2010). They used FVM, 128 3-200 3 grid. Aim is to study transition from steady state to periodic solution. 2 Boussinesq on the Globe. Domain from 60 degrees N Lat. to 60 degrees S Lat. Continent modelled by one line going from the north pole to 50 degrees S Lat. Depth 4000m. Numerical ingredients: continuation of steady states and periodic solutions, nonlinear equations, eigenvalue problems. Key challenge: ecient solution of large sparse linear systems.

Current Results A cartoon Computational experiments

Stokes on HPC computers Current Results A cartoon Computational experiments Jacobian, rhs, construction and application of ILU factorization all in parallel. Solver is expressed in EPETRA package from Trilinos. Both cases 2 block eliminations 116 iterations for 8 digits. Opteron cluster 64 3 grid (1 million unknowns): one solve: 10 minutes on 32 cores from 3 nodes. IBM power 6 cluster (Huygens): 128 3 grid: one solve: 5 minutes on 256 cores from 8 nodes. Based on previous experience: one solve with Nav. Stokes Jacobian takes about the same amount.

Fully implicit approach Incompressible Navier-Stokes equations: Current Results A cartoon Computational experiments u t + N ( u, u) + L u + p = 0 u = 0 Discretize (here second order symmetry-preserving nite dierences on C-grid) Linearize by Newton's method Structure of resulting linear systems (Saddle-point matrix): ( ) ( ) ( ) L + N Grad u f u = (1) Div 0 p f p

Current Results A cartoon Computational experiments Ingredients for eective and robust incomplete factorization (Start from primitive equations) Fill reducing ordering Fourier-like transformation improves diagonal dominance to get rid of unwanted couplings Drop by retaining principal submatrices these submatrices will be positive denite if the matrix is positive denite For incompressible Navier Stokes equation, do not drop in divergence and gradient part There is no increase of ll in this part (even not in direct method) on C-grid

A cartoon of the new algorithm Stokes on a structured C-grid Current Results A cartoon Computational experiments

Current Results A cartoon Computational experiments A cartoon of the new algorithm, step 1 Domain decomposition

Current Results A cartoon Computational experiments A cartoon of the new algorithm, step 2 Identify separators

Current Results A cartoon Computational experiments A cartoon of the new algorithm, step 3 Elimination yields `geometric' Schur-complement NB: All horizontal (vertical) velocities on a vertical (horizontal) separator are coupled to the pressure inside Perform transformation and drop.

Current Results A cartoon Computational experiments A cartoon of the new algorithm, step 4 Flux representation (`coarse grid') Transformation is such that amount of mass owing through the separator remains the same.

Dropping The Challenge Current Results A cartoon Computational experiments Use simple drop-by-position: Drop all couplings between separator groups... and all couplings between VΣ and regular V-nodes. Principal submatrices of SPD matrix are SPD

Example 1: 3D Laplace Equation Current Results A cartoon Computational experiments Convergence criterion: r / r 0 < 10 8.

Current Results A cartoon Computational experiments Stokes equations: number of iterations

2D lid-driven cavity The Challenge Current Results A cartoon Computational experiments Incompressible Navier-Stokes; Stretched structured grid (ratio 5); Newton's method; First Hopf-bifurcation at Re 8375 (Tiesinga & Wubs 2002).

Convergence behavior Current Results A cartoon Computational experiments Convergence criterion: r / r 0 < 10 6

Robust at high Reynolds numbers Current Results A cartoon Computational experiments Can compute highly unstable steady states; Moderate increase in number of iterations; Conv. tol 10 8 here.

They are everywhere Solution Example: eigenvalue problem

They are everywhere The Challenge They are everywhere Solution Example: eigenvalue problem [ A V W T C ] [ x s ] = [ fx f c ] where A is a big sparse matrix and V and W contain a number of vectors. Occur in: Continuation (Jacobian A singular near turning point) Eigenvalue computation in Jacobi-Davidson method DO method for stochastic PDEs using implicit methods In both last methods one has to compute a correction on a space perpendicular to the current space.

Standard solution The Challenge They are everywhere Solution Example: eigenvalue problem Standard approach: Make block LU factorization [ ] [ A 0 I A 1 V W T I 0 C W T A 1 V ] What if A becomes singular. Arpack: targets 0 and 0.1

Incorporation in multilevel approach They are everywhere Solution Example: eigenvalue problem Multilevel ILU comes in very handy. Example in two-level case: A 11 A 12 V 1 A 21 A 22 V 2 = W1 T W2 T C A 11 0 0 A 21 I 0 I A 1 11 A 12 A 1 11 V 1 W1 T 0 I 0 A 22 A 21 A 1 11 A 12 V 2 A 21 A 1 11 V 1 0 W T 2 W T 1 A 1 11 A 12 C W T 1 A 1 11 V 1 On the last level we can apply a direct method with pivoting, which precludes instability. Possible indeniteness is likely to occur in low frequency part of solution. This is pushed downwards to coarsest grid. If the last block preconditioning matrix is indenite then the original is signaling of bifurcation.

Eigenvalue problems The Challenge They are everywhere Solution Example: eigenvalue problem Jacobi-Davidson QZ method: accelerated inexact Newton method. Space ranges from 20 till 40. Use two-level factorization as approximate Jacobian. Target 0, nd rst 6 eigenvalues First vector from LUx=rand Problem Jacobian matrices from lid driven cavity at Re=100 and 1000 (vertical). Grid renement from 32x32 to 64x64 (horizontal) Plots residual against iteration number (MVs). Mind scale x-axis (200 resp. 400)

Eigenvalue problems The Challenge They are everywhere Solution Example: eigenvalue problem 1 JDQZ with jmin=20, jmax=40, residual tolerance 1e 08. 1 JDQZ with jmin=20, jmax=40, residual tolerance 1e 08. log 10 r it step 2 log 10 r it step 2 2 2 Test subspace computed as Av tau*dv. 3 4 5 6 7 Test subspace computed as Av tau*dv. 3 4 5 6 7 8 8 9 0 20 40 60 80 100 120 140 160 180 200 Correction equation solved with Augm. Prec.. 9 0 20 40 60 80 100 120 140 160 180 200 Correction equation solved with Augm. Prec.. 1 JDQZ with jmin=20, jmax=40, residual tolerance 1e 08. 1 11 2011, 17:46:47 log 10 r it step 2 1 JDQZ with jmin=20, jmax=40, residual tolerance 1e 08. 1 11 2011, 17:55: 9 log 10 r it step 2 2 2 Test subspace computed as Av tau*dv. 3 4 5 6 7 Test subspace computed as Av tau*dv. 3 4 5 6 7 8 8 9 0 50 100 150 200 250 300 350 400 Correction equation solved with Augm. Prec.. 9 0 50 100 150 200 250 300 350 400 Correction equation solved with Augm. Prec.. 1 11 2011, 18: 7:44 1 11 2011, 18:13:29

Summary

Summary The Challenge Summary Bifurcation analysis requires fast and robust linear algebra We developed a solver that combines Ease of use: only one parameter; Robustness: factorization doesn't break down; Can be used as approximate Jacobian Parallelism: exposed on every level; Grid-independent convergence for ILU. Extendable to multi-physics problems Low global communication (one per iteration) Next steps Improvements on accuracy and performance. Do some nice (multiphysics) CFD problems (see Challenge). Look for generalizations.

Summary References A.C. de Niet and F.W. Wubs. Numerically stable LDL T factorization of F-type saddle point matrices. IMA Journal of Numerical Analysis, vol. 29, no 1, pp. 208-234. F.W.Wubs and J.Thies, A robust two-level incomplete factorization for (Navier-) Stokes saddle point matrices, SIAM J. Matrix Anal. Appl., 32:1475-1499, 2011. Jonas Thies and Fred Wubs, Design of a Parallel Hybrid Direct/Iterative Solver for CFD Problems. Proceedings 2011 Seventh IEEE International Conference on escience, 5-8 December 2011, Stockholm, Sweden, pages 387-394, 2011.. G.L.G. Sleijpen and F.W. Wubs. Exploiting Multilevel Preconditioning Techniques in Eigenvalue Computations. SIAM Journal on Scientic Computing, 25(4):1249-1272, 2003.

Summary Direct vs. Iterative Linear Solvers Sparse Direct robust and easy to use comput. complexity O(N 2 ) in 3D (N: number of unknowns) substantial ll-in O(N 4/3 ) Preconditioned Iterative usually not robust, depend on many parameters can have optimal complexity O(N) save memory + CPU time by avoiding ll-in Can we combine the best of both? ILU close to LU and preserve properties

F-matrices The Challenge Summary A saddle point matrix has the following structure: [ ] A B K =. (2) B T 0 Denition 1 A gradient-type matrix has at most two nonzero entries per row and its row sum is zero. Denition 2 A saddle point matrix (2) is called an F-matrix if A is positive denite and B is a gradient-type matrix. The Jacobian of the Stokes equations (Re 0) on a C-grid is an F-matrix.

Summary Computing an LU decomposition of an F-matrix [ A B B T 0 ] [ xv x p ] = [ fv f p ] V nodes P nodes Algorithm: LU decomposition of an F-matrix. Compute a ll-reducing ordering for the graph F (A) F (BB T ), during Gaussian elimination, insert the P-nodes to form 2 2 pivots whenever a coupling between a V-node and a P-node is encountered. Theorem 1 In every step of the above algorithm, the resulting Schur complement is an F-matrix.

Summary How is ll generated in the direct approach? α β a T b T β 0 ˆbT 0 a ˆb Â ˆB b 0 T ˆB O. (3) Elimination step: Multiple of ˆbˆb T is added to Â; ˆb becomes denser as P-nodes are eliminated; So dropping in Â doesn't make sense. Main problem: For ILU we have to get rid of couplings of velocities to inside pressure

Domain decomposition Summary This ordering exposes parallelism in the matrix: ( ) K11 K K = 12, K 21 K 22 where K 11 is block-diagonal. Subdomains and `separator groups'; Retain one pressure per subdomain.

The Schur complement Summary LU-decomposition of the matrices on the subdomains, K 11 = L 11 U 11 ; Schur-complement: S = K 22 K 21 K 1 11 K 12; S retains structural and numerical properties of K; S has only a few rather dense `B' columns (with at most two entries per row); Solve the system with S by a preconditioned Krylov subspace method. Schur-complement:

How can we maintain sparsity? Summary Still an F-matrix; All V-nodes on a separator are now connected to the same 2 P-nodes; Use orthogonal similarity transformation to disconnect them (harmless: SPD remains SPD)

Why it works The Challenge Summary Orthogonal transformations: Eliminate most V-P couplings to avoid ll; `Transfer operators' dening coarse problem S 2 ; Improves diagonal dom. grid indep. convergence; Coarse problem S 2 : solve for ux V Σ through each separator; Still an F-matrix in case of the Stokes equations; Constraint preconditioning: no approximations in `Grad' or `Div' part; mass is conserved exactly throughout. Drop-by-position original properties preserved (symmetry, positiveness); singular subsystems cannot occur robust approach. No segregation of variables Grid independent convergence: Block size determines rate of convergence In two level variant amount of work is not independent of problem size.

Summary Example 1: 3D Laplace Equation (2)

Stokes equations: relative ll Summary

Achieving high accuracy Summary Driven Cavity, 512 512 grid; Subdomain size: 8 8; Convergence tolerance 10 10 ; Preconditioned GMRES; = Some modes not captured using this subdomain size.

Generalizations The Challenge Dierent coordinate systems Summary spherical coordinates common in geophysics Flux-formulation = F-matrix Dierent discretizations: B-grid Dierent physics: can solve Poisson, Convection-Diusion, Stokes adding heat transfer is easy Coriolis force - skew-symmetric, so it works good scaling is essential rotate v by 45 = F-matrix

Possible improvements Summary Multi-level extension: Reduced problem has same structure as original matrix; Recursive application leads to linear comp. complexity. Deation to avoid `plateaus' in GMRES Adaptivity: Any domain decomposition can be used; Inhom. problems: short separators in regions of weak coupling. Unstructured grids: Structure-preserving direct method?