AMG for a Peta-scale Navier Stokes Code

AMG for a Peta-scale Navier Stokes Code James Lottes Argonne National Laboratory October 18, 2007

The Challenge Develop an AMG iterative method to solve Poisson 2 u = f discretized on highly irregular (stretched, deformed, curved) trilinear FEs. System Ax = b needs solving many many times (for different b s), allowing a very high constant for set-up time. Scalability requirements start at P > 32,000 as a good solution is in place below this level.

Outline Notation for iterative solvers and multigrid (3 slides) Analysis of two-level multigrid, illustrated on model problem AMG iteration design

Iterative Solvers Linear system to solve is Ax = b. Iteration defined through preconditioner B by x k = x k 1 + B(b Ax k 1 ) = (I BA)x k 1 + Bb. Error e k x x k behaves as e k = Ee k 1 = E k e 0, E I BA. Convergence characterized by ρ(e), and E A ρ(e).

Multigrid Iteration Multigrid iteration defined by E mg = I B mg A = (I PB c P t A)(I BA), where B c is a multigrid preconditioner for the coarse operator A c P t AP, defined in terms of the n n c prolongator P, and B is the smoother.

C-F Point Algebraic Multigrid Assume n c coarse variables are a subset of the n variables, so that the prolongation matrix takes the form [ ] W P I for some n f n c W, with n f + n c = n. Let A have the corresponding block form [ ] Aff A A = fc, and also let A cf A cc R f [ I O ], R c [ O I ].

Model Problem Poisson, bilinear FEs, Neumann BCs Mesh is 2-D slice from an application mesh A is SPD (but not an M-matrix) except A1 = 0

C-F Points Figure: C-pts in red, F-pts in blue

Prolongation The energy-minimizing prolongation of Wan, Chan, and Smith [8, 9] Find P, given its sparsity pattern, and with R c P = I, that minimizes tr(p T AP) subject to P1 nc = 1 n.

Smoothers Damped Jacobi insufficient, Gauss-Seidel not parallel Adams, Brezina, Hu, and Tuminaro [1] recommend Chebyshev polynomial smoothers over Gauss-Seidel Sparse Approximate Inverse: Tang & Wan [6] Find B with a given sparsity pattern that minimizes I BA F SAI-0: Diagonal B A ii B ii = n k=1 A (compare to Jacobi: B ii = ω ) ika ik A ii SAI-1: Sparsity pattern of A used for M Simple to compute, and parameter-free

Chebyshev Polynomial Smoothing Chebyshev semi-iterative method to accelerate I BA B 1 = a 1 B, B 2 = a 1 B + a 2 BAB, λ(i B 1 A) = 1 a 1 λ(ba) λ(i B 2 A) = 1 a 1 λ(ba) a 2 λ 2 (BA) 1 0.8 0.6 0.4 0.2 0-0.2-0.4 Choose coefficients using Chebyshev polynomials to damp error modes λ [λ min, λ max (BA)] λ min open parameter B: Jacobi, Diagonal SAI 0 0.2 0.4 0.6 0.8 1 1.2 1.4

Smoother Error Propagation Spectra 1.4 1.2 Gauss Seidel Damped Jacobi Diagonal SAI SAI, A s stencil Diagonal SAI, 1st Chebyshev Improvement 1 0.8 0.6 0.4 0.2 0 0 500 1000 1500 Figure: λ i (I BA) vs. i

Two-grid Error Propagation Spectra 1.4 1.2 Gauss Seidel Damped Jacobi Diagonal SAI SAI, A s stencil Diagonal SAI, 1st Chebyshev Improvement 1 0.8 0.6 0.4 0.2 0 0 500 1000 1500 Figure: λ i [(I P(P t AP) 1 P t A)(I BA)] vs. i

Change of Basis Given invertible T, defining a change of basis by, x = T ˆx, Transformed system is Âˆx = T t b, Â T t AT. Transformed iteration given by ˆB T 1 BT t, ˆP T 1 P, Ê mg (I ˆPB ˆPt c Â)(I ˆBÂ) = T 1 E mg T. Equivalent in that λ(êmg) = λ(e mg ), Êmg Â = E mg A.

On the C-F Point Assumption Analysis of a C-F point AMG method can apply to other, more general AMG methods, so long as one can find a T such that [ ] ˆP = T 1 W P =. I This T is related to the R and S of Falgout and Vassilevski s On generalizing the AMG framework [4] through [ ] T 1 S t =. R

Hierarchical Basis Two-level hierarchical basis [ ] I W T = [ R I f t P ], T 1 = Chosen so that ˆP = R t c [ ] I W I Transforms A into [ A Â = ff W t A ff + A cf ] A ff W + A fc A c

Discrete Fundamental Solutions The transformed A, [ A Â = ff W t A ff + A cf A ff W + A fc A c ], is block-diagonal when the coarse basis functions are the discrete fundamental solutions W = A 1 ff A fc. But W is not sparse, hence not a viable choice. Introduce F W W.

Introducing F In terms of F, and where [ ] Aff A Â = ff F F t, A ff A c A c = S c + F t A ff F, S c A cc A cf A 1 ff A fc is the Schur complement of A ff in both A and Â. Note v 2 A c = v 2 S c + F v 2 A ff.

Exact Compatible Relaxation Theorem When B c = A 1 c, non-zero eigenvalues of E mg are also eigenvalues of I ˆB ff S f, where ˆB ff = [ I W ] B [ I W ] t, is the ff -block of the transformed smoother, and S f = A ff A ff FA 1 c F t A ff is the Schur complement of A c in Â.

Exact Compatible Relaxation Spectra 1.2 Gauss Seidel Damped Jacobi Diagonal SAI SAI, A s stencil Diagonal SAI, 1st Chebyshev Improvement 1 0.8 0.6 0.4 0.2 0 0 100 200 300 400 500 600 700 800 900 Figure: λ i (I ˆB ff S f ) vs. i

Equivalent F-relaxation Corollary With an exact coarse grid correction, the spectrum of E mg is left unchanged when the smoother B is replaced by the F-relaxation B F-r = R t f ˆB ff R f, where again, ˆB ff = [ I W ] B [ I W ] t.

Coarse- and Smoother-space Energies Coarse-space operator A c = S c + F t A ff F. Smoother-space operator S f = A ff A ff FA 1 c F t A ff. Coarse-space and smoother-space energies v 2 A c = v 2 S c + F v 2 A ff, w 2 S f = w 2 A ff F t A ff w 2, Ac 1 are minimal and maximal for any vectors when F = O. S f is close to A ff when F is small.

Inexact Compatible Relaxation Spectra 1.2 Gauss Seidel Damped Jacobi Diagonal SAI SAI, A s stencil Diagonal SAI, 1st Chebyshev Improvement 1 0.8 0.6 0.4 0.2 0 0 100 200 300 400 500 600 700 800 900 Figure: λ i (I ˆB ff A ff ) vs. i

Measuring F Define γ as an energy norm of F, F v Aff γ sup. v 0 v Ac This quantity appears in, e.g., Falgout, Vassilevski, and Zikatanov [5] in the form γ 2 v 2 A = sup c v 2 S c v 0 v 2 < 1 A c as the square of the cosine of the abstract angle between the hierarchical component subspaces.

How Closely A ff Approximates S f Lemma The eigenvalues of A 1 ff S f are real and bounded by 0 < 1 γ 2 λ(a 1 ff S f ) 1.

A ff vs. S f : 1 γ 2 λ(a 1 ff S f ) 1 1.01 1 0.99 0.98 0.97 0.96 0.95 0.94 0 100 200 300 400 500 600 700 800 900 Figure: λ i (A 1 ff S f ) vs. i

An Inexact Compatible Relaxation Iteration Theorem If ˆB ff is symmetric and ρ(i ˆB ff A ff ) ρ f < 1, then ρ(e mg ) = ρ(i ˆB ff S f ) ρ f + γ 2 (1 ρ f ).

Symmetric Cycle Corollary Define the symmetrized smoother as and its transformed ff -block as If σ is given such that then B s B + B t BAB t, ˆB ff,s [ I W ] B s [ I W ] t. ρ(i ˆB ff,s A ff ) σ 2 < 1, E mg 2 A = ρ[(i Bt A)Q(I BA)] σ 2 + γ 2 (1 σ 2 ).

Symmetric Cycle with F-relaxation Corollary If the smoother is an F-relaxation, and σ is given such that then, again, B = R t f ˆB ff R f, I ˆB ff A ff Aff σ < 1, E mg 2 A = ρ[(i Bt A)Q(I BA)] σ 2 + γ 2 (1 σ 2 ). Equivalent to first half of Theorem 4.2 in Falgout, Vassilevski, and Zikatanov s On two-grid convergence estimates [5].

AMG Iteration Design Central goal: make γ small Coarsening heursistic: make the columns of A 1 ff decay quickly Prolongation: choose sparsity by cutting off A 1 ff A fc according to some tolerance Smoother: simple F-relaxation of A ff

Designer F-relaxations, Compatible Relaxation Prediction 1.2 Diagonal SAI Diagonal SAI, 1st Chebyshev Improvement Damped Diagonal SAI of A ff Damped SAI of A ff with A ff sparsity Diagonal SAI of A ff, 1st Chebyshev Improvement 1 0.8 0.6 0.4 0.2 0 0 100 200 300 400 500 600 700 800 900 Figure: λ i (I ˆB ff A ff ) vs. i

Designer F-relaxations, Two-level Performance 1.2 1 Diagonal SAI Diagonal SAI, 1st Chebyshev Improvement Damped Diagonal SAI of A ff Damped SAI of A ff with A ff sparsity Diagonal SAI of A ff, 1st Chebyshev Improvement A ff 1 0.8 0.6 0.4 0.2 0 0 100 200 300 400 500 600 700 800 900 Figure: λ i (I ˆB ff S f ) vs. i

M. Adams, M. Brezina, J. Hu, and R. Tuminaro. Parallel multigrid smoothing: polynomial versus Gauss-Seidel. Journal of Computational Physics, 188(2):593 610, 2003. J. Brannick and L. Zikatanov. Domain Decomposition Methods in Science and Engineering XVI, volume 55 of Lecture Notes in Computational Science and Engineering, chapter Algebraic multigrid methods based on compatible relaxation and energy minimization, pages 15 26. Springer Berlin Heidelberg, 2007. O. Bröker and M. J. Grote. Sparse approximate inverse smoothers for geometric and algebraic multigrid. Applied Numerical Mathematics, 41(1):61 80, 2002. R. D. Falgout and P. S. Vassilevski. On generalizing the algebraic multigrid framework. SIAM Journal on Numerical Analysis, 42(4):1669 1693, 2004.

R. D. Falgout, P. S. Vassilevski, and L. T. Zikatanov. On two-grid convergence estimates. Numerical Linear Algebra with Applications, 12(5 6):471 494, 2005. W.-P. Tang and W. L. Wan. Sparse approximate inverse smoother for multigrid. SIAM Journal on Matrix Analysis and Applications, 21(4):1236 1252, 2000. H. M. Tufo and P. F. Fischer. Fast parallel direct solvers for coarse grid problems. J. Par. & Dist. Comput., 61:151 177, 2001. W. L. Wan, T. F. Chan, and B. Smith. An energy-minimizing interpolation for robust multigrid methods. SIAM Journal on Scientific Computing, 21(4):1632 1649, 1999. J. Xu and L. Zikatanov.

On an energy minimizing basis for algebraic multigrid methods. Computing and Visualization in Science, 7(3 4):121 127, 2004.

Two-level Analysis Assume the coarse grid correction is exact, The coarse grid correction is a projection. ˆQ I B c A 1 c. Q I PA 1 c P t A, 1 ˆPA c ˆP t Â = I RcA t 1 c R c Â, The error propagation matrix spectrum is λ(e mg ) = λ[ ˆQ(I ˆBÂ)] = λ[(i ˆBÂ) ˆQ].

Proof of First Theorem One may calculate [ ˆQ = Â ˆQ = I O A 1 c F t A ff O [ Aff A ff FA 1 c F t ] A ff O [ ] ˆBff ˆBfc ˆB, ˆB cf ˆBcc ], [ Sf [ (I ˆBÂ) ˆQ I = ˆB ] ff S f O A 1 c F t A ff ˆB cf S f O S f is the Schur complement of A c in Â. O ],

More on γ One may alternatively define F v Aff β sup = Rf t v 0 v FR c A. Sc The two quantities are related through γ 2 = sup v 0 F v 2 A ff v 2 S c + F v 2 A ff F v 2 A = sup ff / v 2 S c v 0 1 + F v 2 A ff / v 2 = β2 S c 1 + β 2.

How Closely A ff Approximates S f Lemma The eigenvalues of A 1 ff S f are real and bounded by 0 < 1 γ 2 λ(a 1 ff S f ) 1. Proof. A 1 ff S f = I FA 1 c F t A ff. λ(a 1 ff S f ) = 1 λ(fa 1 c F t A ff ) = 1 [{0} λ(a 1 c F t A ff F )]. F v 2 A 0 inf ff v 0 v 2 A c λ(a 1 c F t F v 2 A A ff F ) sup ff v 0 v 2 = γ 2. A c