Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White

Introduction to Simulation - Lecture 2 Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy

Outline Reminder about FEM and F-D -D Example Finite Difference Matrices in, 2 and 3D Gaussian Elimination Costs Krylov Method Communication Lower bound Preconditioners based on improving communication

Heat Flow -D Example Normalized -D Equation Normalized Poisson Equation T( x) u( x) κ = h = s x x x 2 2 f ( x) u ( x) = f ( x) xx

FD Matrix properties -D Poisson u = xx Finite Differences f x0 x x2 x uˆ 2uˆ uˆ + + = x j j j xn x n + 2 f( x ) uˆ f( x) 2 0 0 L M M 2 O M M M 0 O O O 0 M = M M O O O M M 0 L 0 2 M M uˆ n f( xn) A 2 j

Using Basis Functions Residual Equation Partial Differential Equation form 2 u = 2 x Basis Function Representation f u(0) = 0 u() = 0 u( x) u ( ) ( ) h x ωi i x { ϕ Plug Basis Function Representation into the Equation d ϕ ( x) R x f x ( ) n = i= n 2 i = ωi + 2 i= dx Basis Functions ( )

Using Basis functions Basis Weights Galerkin Scheme Force the residual to be orthogonal to the basis functions ϕ ( ) ( ) l x R x dt = 0 Generates n equations in n unknowns 0 0 ϕ l ( ) d ( ) n 2 ϕi x ω ( ) i f x dx 2 i= dx x + = 0 l {,..., n}

Using Basis Functions Basis Weights Galerkin with integration by parts Only first derivatives of basis functions ( x) n d ωϕ ( x) i i i= dx ϕ ( x) f ( x) dx = i dx dϕ l dx 0 0 0 l {,..., n}

Structural Analysis of Automobiles Equations Force-displacement relationships for mechanical elements (plates, beams, shells) and sum of forces = 0. Partial Differential Equations of Continuum Mechanics

Drag Force Analysis of Aircraft Equations Navier-Stokes Partial Differential Equations.

Engine Thermal Analysis Equations The Poisson Partial Differential Equation.

FD Matrix properties m x x2 x m + m 2-D Discretized Problem xm x2m Discretized Poisson uj+ 2uj + uj uj+ m 2uj + uj m + = 2 2 442443 x y 442443 u xx u yy f( x ) j

FD Matrix properties 2-D Discretized Problem Matrix Nonzeros, 5x5 example

FD Matrix properties m m 3-D Discretization Discretized Poisson x j m x 2 j m x j x j m x j + x 2 j + m x j + m uˆ 2 2 2ˆ ˆ ˆ ˆ 2 ˆ ˆ 2ˆ ˆ u uj u j+ uj+ uj uj+ m uj+ u + j m j+ m j m + + = 2 2 2 ( x) ( y) ( z) 442443 442443 44424443 u u u xx yy zz f ( x ) j

FD Matrix properties 3-D Discretization Matrix nonzeros, m = 4 example

FD Matrix properties Summary Numerical Properties Matrix is Irreducibly Diagonally Dominant Matrix is symmetric positive definite Assuming uniform discretization, diagonal is Aii Aij j i Each row is strictly diagonally dominant, or path connected to a strictly diagonally dominant row D 2, 2 D 4, 3 D 6, 2 2 2

FD Matrix properties Summary Structural Properties Matrices in 3-D are LARGE 2 2 3 3 D m m, 2 D m m, 3 D m m 00x00x00 grid in 3-D = million x million matrix Matrices are very sparse Nonzeros per row D 3, 2 D 5, 3 D 7 Matrices are banded D A = 0 i j > 2 D A = 0 i j > m 3 D A = 0 i j > m ij ij ij 2

Basics of GE Triangularizing Picture A A A A 2 3 4 A0 A A A A22 23 2 22 23 24 A0 A A A A 32 0 A A24 A33 A 34 3 32 33 34 A0 A0 0A A 43 A 42 A A34 A A 44 A 44 4 42 43 44

GE Basics Triangularizing Algorithm For i = to n- { For each Row For j = i+ to n { For each Row below pivot For k = i+ to n { For each element beyond Pivot } } } A A A A ji jk jk ik Aii Multiplie r Pivot Form n- reciprocals (pivots) Form n i= Perform 2 n ( n i) = 2 n 2 ( n i) n 3 i= multipliers 2 3 Multiply-adds

Complexity of GE On = Om 3 3 D ( ) ( ) 00 pt grid On = Om 3 6 2 D ( ) ( ) 00 0 On = Om 0grid 6 (0 ) ops O( 2 0 ) ops 00 g O(0 ) ops 3 9 8 3 D ( ) ( ) 00 00 rid O For 2- D and 3-D problems Need a Faster Solver!

b Banded GE b NONZEROS 0 Triangularizing Algorithm For i = to n- { For j = i+ to n i+b- { { For k = i+ to n { A A A A i+b- { ji jk jk ik Aii 0 } } } n i= Perform (min(, )) ( ) 2 2 b n i O b n Multiply-adds

Complexity of Banded GE Obn = Om 2 D ( ) ( ) 00 pt grid Obn = Om 2 4 2 D ( ) ( ) 00 00 grid 3 D Obn = Om O(00) ops 2 7 ( ) ( ) 00 00 00 grid O 8 (0 ) ops O 4 (0 ) ops For 3 - D problems Need a Faster Solver!

The World According to Krylov Preconditioning Start with Ax = b, Form PAx = Pb 0 Determine the Krylov Subspace r = Pb PAx 0 0 k Krylov Subspace span r, PAr,..., PA r 0 { ( ) } Select Solution from the Krylov Subspace k k k k x + = x + y, y span r, PAr,..., PA r { ( ) } 0 0 0 0 k GCR picks a residual minimizi ng y.

Krylov Methods Preconditioning Diagonal Preconditioners Let A = D + A nd Apply GCR to ( ) ( ) nd D A x = I + D A x= D b The Inverse of a diagonal is cheap to compute Usually improves convergence

Krylov Methods Convergence Analysis Optimality of GCR poly r GCR Optimality Property th % ( PA) r where % is any k order k + 0 k+ k+ polynomial such that % 0 = Therefore Any polynomial which satisfies the constraints can be used to get an upper bound on k+ ( ) k r r + 0

Residual Poly Picture for Heat Conducting Bar Matrix No loss to air (n=0) ( ) Keep as small as possible: k λi Easier if eigenvalues are clustered!

The World According to Krylov Krylov Vectors for diagonal Preconditioned A x ( digit) 0.9-0.7 0.6-0.5 0.4-0.3 0. exact b 0 = r 0 0 0 0 0 0 D A D A -.5 0 0 0 0 0.25 - -.5 0 0 0 0 0.5 0 0 0.5 0 O 0 O O 0.5 0 0 0.5 444 42444443 D A 0 0.5 M 0 -D Discretized PDE =.25 0.5 M0.5 0

The World According to Krylov Krylov Vectors for Diagonal Preconditioned A Communication Lower Bound x ( digit) 0.9-0.7 0.6-0.5 0.4-0.3 0. exact b 0 = r 0 0 0 0 0 0 D A D A -.5 0 0 0 0 0.25 - -.5 0 0 0 0 Communication Lower Bound for m gridpoints ( ) k 0 th D A r is nonzero in m entry after k = m iters k ( k+ ) 0 j Need at least m iterations for x = x + α jr 0 m j= m

m The World According to Krylov m m 2 Krylov Vectors for Diagonal Preconditioned A Two Dimensional Case For an mxm Grid If r 0 0 = M 0 ( ) Takes 2m= O m iters ( x k + ) for 0 m 2

The World According to Krylov Convergence for GCR Eigenanalysis 0.5 0 0 0.5 0 O 0 O O 0.5 0 0 0.5 444 42444443 D A 0 0.5 M 0 =.25 0.5 M0.5 0 kπ D A = m + Recall Eigenvalues of cos

The World According to Krylov k r r 0 κ Convergence for GCR Eigenanalysis max For D A, cos cos λmin m # Iters for GCR to achieve convergence k λ mπ π = + m+ k 2 κ convergence γ κ + tolerance γ log = 2 κ m log κ + Om ( ) GCR achieves Communication lower bound O(m)!

The World According to Krylov Work for Banded Gaussian Elimination, Relaxation and GCR Dimension Banded GE Sparse GE GCR 2 3 ( ) ( ) ( 2 ) O m O m O m ( 4) ( 3) ( 3) O m O m O m ( 7) ( 6) ( 4) O m O m O m GCR faster than banded GE in 2 and 3 dimensions Could be faster, 3-D matrix only m 3 nonzeros. Must defeat the communication lower bound!

The World According to Krylov How to get Faster Converging GCR Preconditioning is the Only Hope GCR already achieves Communication Lower bound for a diagonally preconditioned A Preconditioner must accelerate communication Multiplying by PA must move values by more than one grid point.

Preconditioning Approaches u 0 (new) û (new) û (old) û 2 (new) û 2 (old) û 3 Gauss-Seidel Preconditioning Physical View -D Discretized PDE Gauss Seidel (new) uˆn (new) uˆn u ˆn+ Each Iteration of Gauss-Seidel Relaxation moves data across the grid

Preconditioning Approaches x ( digit) 0.9-0.7 0.6-0.5 0.4-0.3 0. exact b ( + ) ( + ) D L A D L A b ( + ) ( + ) D L A D L A 0 = r 0 0 0 0 0 0 x x x x x x x x x x x x x x 0 = r 0 0 0 0 0 0 0 0 0 0 0 x x 0 0 0 0 x x x Gauss-Seidel Preconditioning Krylov Vectors Gauss-Seidel communicates quickly in only one direction -D Discretized PDE X=nonzero

Preconditioning Approaches u 0 (new) û (new) û (old) û 2 (new) û 2 (old) û 3 Gauss-Seidel Preconditioning Symmetric Gauss-Seidel (new) uˆn (new) uˆn u ˆn+ (new) uˆn 2 (newer) uˆn ˆ ( new u ) n u (newer) û 0 (newer) û 2 This symmetric Gauss-Seidel Preconditioner Communicates both directions

Preconditioning Approaches Gauss-Seidel Preconditioning Symmetric Gauss-Seidel Derivation of the SGS Iteration Equations Forward Sweep half step : k D + L x + Ux = b ( ) ( ) k+ /2 Backward Sweep half step : x = + D+ U x + Lx = b ( ) ( ) k k+ /2 ( ) ( ) k D U L D + L Ux ( ) ( ) + D+ U D+ U L( + ) k+ ( ) ( ) b D L b k k k x + = x D+ U D D+ L Ax + D + U D D+ L b ( ) ( )

Preconditioning Approaches Grid Block Diagonal Preconditioners Line Schemes Matrix m 2 Tridiagonal Matrices factor quickly

Preconditioning Approaches Grid Block Diagonal Preconditioners Line Schemes Problem Lines preconditioners communicate rapidly in only one direction Solution m 2 Do lines first in x, then in y. The Preconditioner is now two Tridiagonal solves, with variable reordering in between.

Preconditioning Approaches Block Diagonal Preconditioners Domain Decomposition Approach Break the domain into small blocks each with the same # of grid points The trade-off m 2 Fewer blocks means faster convergence, but more costly iterates

Preconditioning Approaches m points 2 l Block index Block Diagonal Preconditioners Domain Decomposition m 2 l m l 2 points m ( 2 B lock cost: factoring l l grids, O m l ), sparse GE l m GCR iters: Communication bound gives O iterations. l 3 Suggests insensitivity t o l: Algorithm is O m. ( ) Do you have to refactor every GCR iteration?

Preconditioning Approaches Grid Siedelerized Block Diagonal Preconditioners Line Schemes Matrix m 2

Preconditioning Approaches Grid Overlapping Domain Preconditioners Line based Schemes Matrix m 2 Bigger systems to solve, but can have faster convergence on tougher problems (not just Poisson).

Preconditioning Approaches i Incomplete Factorization Schemes Outline Reminder about Gaussian Elimination Computational Steps Fill-in for Sparse Matrices Greatly increases factorization cost Fill-in in a 2-D grid Incomplete Factorization Idea

Sparse Matrices Matrix Non zero structure X X X Fill-In Example Matrix after one GE step X X X X X 0 X X X0 X 0 X X X0 X X= Non zero Fill-ins

Sparse Matrices Fill-In Second Example Fill-ins Propagate X X X X X X X0 X0 0 X X X0 0 X X0 X0 Fill-ins from Step result in Fill-ins in step 2

Sparse Matrices Fill-In Pattern of a Filled-in Matrix Very Sparse Very Sparse Dense

Sparse Matrices Fill-In Unfactored Random Matrix

Sparse Matrices Fill-In Factored Random Matrix

Factoring 2-D Finite-Difference matrices Generated Fill-in Makes Factorization Expensive

FD Matrix properties 3-D Discretization Matrix nonzeros, m = 4 example

Preconditioning Approaches i Incomplete Factorization Schemes Key idea THROW AWAY FILL-INS! Throw away all fill-ins Throw away only fill-ins with small values Throw away fill-ins produced by other fill-ins Throw away fill-ins produced by fill-ins of other fill-ins, etc.

Summary 3-D BVP Examples Aerodynamics, Continuum Mechanics, Heat-Flow Finite Difference Matrices in, 2 and 3D Gaussian Elimination Costs Krylov Method Communication Lower bound Preconditioners based on improving communication