
Scientific Computing: Computer simulations

Road map

Numerical simulations
- Physical phenomenon
- Mathematical model: a set of ODEs/PDEs, integral equations, something else
- Numerical discretization: finite differences, finite elements, finite volumes, boundary elements
- Solution method (numerical linear algebra): which method, matrix properties, data structure
- Computer implementation: sequential computing, parallel computing, data structures, programming languages

The position of Scientific Computing

Scientific computing covers the chain from the physical phenomenon through the mathematical model, the numerical discretization and the solution method to the computer implementation.

NLA - aims
- Robustness with respect to model (problem) parameters, discretization parameters and numerical method parameters
- Scalability issues on various computers
- Balance between numerical efficiency and computational efficiency

Grand challenge problem - Environment
- High-resolution weather forecasting: more accurate, faster and longer range predictions
- Pollution studies, including cross-pollutant interactions
- Global atmosphere-ocean-biosphere and long range climate modelling

Risø National Laboratory & Technical University of Denmark, Roskilde

The Danish Eulerian Model is a large air pollution model for studying different phenomena connected with the transport of air pollutants in the atmosphere. All important physical processes (advection, diffusion, deposition, emissions and chemical reactions) are represented in the mathematical formulation of the model. All important chemical species (sulphur pollutants, nitrogen pollutants, ammonia-ammonium, ozone, as well as many radicals and hydrocarbons) can be studied.

DEM - air pollution model

Danish Eulerian Model: to perform numerical simulations and to study the air pollution over Europe.
Task: establish reliable control strategies for the air pollution.
Basic scheme: pollutants (many of them) are emitted into the air from various sources (many of them) and
- are transported by the wind (diffusion and advection, horizontal and vertical),
- get deposited on the Earth's surface,
- transform due to chemical reactions.
Factors: winds, temperature, humidity, day/night, ...

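DEM - air pollution model

$$\frac{\partial c_s}{\partial t} = -\sum_{i=1}^{3} \frac{\partial (u_i c_s)}{\partial x_i} + \sum_{i=1}^{3} \frac{\partial}{\partial x_i}\left(K_{x_i} \frac{\partial c_s}{\partial x_i}\right) + E_s - k_s c_s + Q(c_1, \dots, c_q),$$

where $c_s$, $s = 1, \dots, q$, are the unknown concentrations of the $q$ species of air pollutants to be followed.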


DEM - air pollution model

Demands:
- Spatial domain: $(4800\ \text{km})^2$, $96 \times 96$ or $480 \times 480$ regular mesh (horizontal resolution)
- 10 vertical layers, 1 km each
- 35 pollutants
- 3D: $35 \times 10 \times 96^2 = 3\,225\,600$ unknowns
- 3D: $35 \times 10 \times 480^2 = 80\,640\,000$ unknowns
- about 36000 time-steps, only one month simulated
- Computers at DCAMM: 16 PEs on Newton: 60 000 sec = 1000 min $\approx$ 17 h

Is this much or not?
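The counts above can be reproduced with a few lines of arithmetic. This is only a sketch: the variable and function names are made up for illustration, and the slide does not state which of the two meshes the quoted run time belongs to, so the wall-clock figure is converted on its own.

```python
# Sizes quoted on the slide.
SPECIES = 35          # pollutants
LAYERS = 10           # vertical layers
WALL_CLOCK_S = 60_000 # quoted run time in seconds

def unknowns(n_horizontal):
    """Unknowns on an n x n horizontal mesh with LAYERS layers and SPECIES species."""
    return SPECIES * LAYERS * n_horizontal ** 2

coarse = unknowns(96)    # 96 x 96 mesh
fine = unknowns(480)     # 480 x 480 mesh
hours = WALL_CLOCK_S / 3600.0

print(coarse, fine, round(hours, 1))
```

Refining the horizontal mesh by a factor of 5 in each direction multiplies the number of unknowns by 25, which is why the fine mesh is so much more demanding.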

Numerical simulations - a cross-disciplinary issue

Conducting a numerical simulation:
- Modelling
- Discretization
- Choice of a solution method
- Computer implementation
- Postprocessing

Numerical simulations - a cross-disciplinary issue
- Modelling: modelling error
- Discretization: discretization error (space, time, stability, ...)
- Choice of a solution method: iteration error (robustness with respect to discretization parameters, ...)
- Implementation: computer platform, suitability (memory- or computation-bound), data structures, communication layout

Mutual coupling: example with a domain-decomposition approach, FEM, PETSc, deal.II

Course contents:
- Introduction, NLA, basic iterative schemes
- Projection methods
- Speeding up the convergence: preconditioning
- Multilevel/multigrid preconditioners
- Multilinear algebra: tensors
- Structured matrices, properties, preconditioning
- Matrix factorizations, LS, SVD
- Numerical solution methods for eigenvalue problems

Learning Goals

At the end of the course, the participant should
- be aware of, understand, and be able to argue about central issues regarding numerical solution methods for linear systems: numerical efficiency, computational efficiency (complexity of the numerical algorithm), robustness with respect to problem, discretization and method parameters, and possible parallelization;
- given a problem, have clear guiding criteria for choosing a suitable solution technique and be able to reason about what could be advantageous and disadvantageous;
- be able to demonstrate how the algorithms can be applied (on some test problems).

Course moments:
- Lectures: very highly recommended!
- Hands-on sessions: must do; come up with some comments and at least two questions, to be discussed at the beginning of the next lecture. Currently 7 are planned, but the number might be reduced.
- Projects: four of them! Written report as a scientific paper, preferably in English. Structure, language, derivations, algorithms, figures, conclusions, references.
- Take the chance to utilize the presence of the invited lecturers, attend the seminars, ask questions, ...

Assumed that you are familiar with/can:
- vector subspaces
- full (column/row) rank
- positive definite matrices
- matrix/vector norms and condition numbers
- (strictly) diagonally dominant matrix
- spectral radius
- Z-matrices, M-matrices
- Schur's lemma
- dense/sparse matrices
- direct solvers/sparse direct solvers/pivoting/complexity
- fill-in, ordering (minimum degree, Reverse Cuthill-McKee, nested dissection)
- derive the eigenvalues of the 1D and 2D discrete Laplacian :)

Schur's lemma:

Iterative Solution methods

Basic Iterative Solution methods

Introduction:
The idea of using iterative methods for solving linear systems of equations goes back to Gauss (1823), Liouville (1837) and Jacobi (1845). After deriving an iterative procedure, Gauss wrote in a letter in 1823: "... You will hardly eliminate directly anymore, at least not when you have more than two unknowns. The indirect method can be pursued while half asleep or while thinking about other things."

Introduction:
Before considering iterative solution methods for linear systems of equations, we recall how nonlinear problems are solved. Let $f(x) = 0$ have to be solved, where $f$ is a nonlinear function of $x$. The usual way to approach the problem is to define $F(x) \equiv x - f(x)$. If $x^*$ is the solution of $f(x) = 0$, then $x^*$ is a stationary point for
$$x = F(x). \tag{1}$$
Then we proceed with finding the stationary point for (1), and this is done iteratively, namely,
$$x^{(k+1)} = F(x^{(k)}), \quad k = 0, 1, \dots, \quad x^{(0)} \text{ given}. \tag{2}$$

Convergence of the fixed point iteration:
For any initial guess $x^{(0)}$, there exists a unique fixed point $x^*$ of $F$, with $x^* = \lim_{k \to \infty} x^{(k)}$, if $F$ is a contracting mapping, i.e.
$$\|F(x) - F(y)\| \le q \, \|x - y\| \quad \text{for some } q \in (0, 1).$$
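As a small illustration of iteration (2), the sketch below applies it to the scalar equation $f(x) = x - \cos x = 0$, so that $F(x) = x - f(x) = \cos x$. The test problem, the function name `fixed_point` and the stopping tolerance are assumptions chosen for the example; they are not taken from the slides. On $[0, 1]$ we have $|F'(x)| = |\sin x| \le \sin 1 < 1$, so $F$ is a contraction there.

```python
import math

def fixed_point(F, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = F(x_k) until two successive iterates differ by at most tol."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = F(x)
        if abs(x_new - x) <= tol:
            return x_new, k
        x = x_new
    raise RuntimeError("fixed point iteration did not converge")

# f(x) = x - cos(x) = 0  is rewritten as  x = F(x) = cos(x).
root, iterations = fixed_point(math.cos, x0=1.0)
print(root, iterations)
```

The number of iterations is governed by the contraction factor near the fixed point: the closer it is to 1, the slower the convergence.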

Fixed point for linear problems:
Let now $f(x) \equiv Ax - b$ be linear. We use the same framework:
$$F(x) = x - (Ax - b),$$
$$x^{(k+1)} = x^{(k)} - (Ax^{(k)} - b) = x^{(k)} + r^{(k)},$$
where $r^{(k)} = b - Ax^{(k)}$ is called the residual at iteration $k$. In this way we obtain the simplest possible iterative scheme to solve $Ax = b$, namely,
$$x^{(k+1)} = x^{(k)} - (Ax^{(k)} - b), \quad k = 0, 1, \dots, \quad x^{(0)} \text{ given}.$$

Fixed point for linear problems - when does it converge?
$$\|F(x) - F(y)\| = \|x - (Ax - b) - y + (Ay - b)\| = \|(I - A)(x - y)\| \le \|I - A\| \, \|x - y\|.$$
The simple iteration will not converge in general: $F$ is a contraction only when $\|I - A\| < 1$.

Simple iteration
For many reasons the latter form of the simple iteration is replaced by
$$x^{(k+1)} = x^{(k)} + \tau r^{(k)}, \tag{3}$$
where $\tau$ is some properly chosen method parameter. Relation (3) defines the so-called stationary basic iterative method of first kind.

Stationary iterative methods...
If we permit $\tau$ to change from one iteration to the next, we get
$$x^{(k+1)} = x^{(k)} + \tau_k r^{(k)}, \tag{4}$$
which defines the so-called non-stationary basic iterative method of first kind. So far $\tau$ and $\tau_k$ are scalars.
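The effect of the parameter $\tau$ in the stationary iteration with a scalar parameter can be seen on a tiny example. The sketch below is illustrative only: the $2 \times 2$ matrix, the right-hand side and the helper names `matvec` and `richardson` are assumptions, not part of the course material. The eigenvalues of the chosen $A$ are about 2.38 and 4.62, so $\tau = 1$ gives $\|I - \tau A\| > 1$ and divergence, while $\tau = 0.25$ gives a contraction.

```python
def matvec(A, x):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def richardson(A, b, tau, x0, num_iter):
    """Stationary iteration x_{k+1} = x_k + tau * (b - A x_k)."""
    x = list(x0)
    for _ in range(num_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        x = [xi + tau * ri for xi, ri in zip(x, r)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]                      # exact solution is (1/11, 7/11)

x_tau_1 = richardson(A, b, tau=1.0, x0=[0.0, 0.0], num_iter=20)      # diverges
x_tau_025 = richardson(A, b, tau=0.25, x0=[0.0, 0.0], num_iter=200)  # converges
print(x_tau_1, x_tau_025)
```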

Nothing prevents us from replacing the method parameter by some matrix, if this would improve the convergence of the iterative method. Thus, we can consider
$$x^{(k+1)} = x^{(k)} + C^{-1}(b - Ax^{(k)}) \quad \text{or} \quad x^{(k+1)} = x^{(k)} + C^{-1} r^{(k)}. \tag{5}$$
It is easy to see that we obtain (5) by replacing $Ax = b$ with $C^{-1}Ax = C^{-1}b$ and using the simple iteration framework. In this case the iterative scheme takes the form
$$C d^{(k)} = r^{(k)}, \qquad x^{(k+1)} = x^{(k)} + d^{(k)}. \tag{6}$$
The scheme (6) has in general a higher computational complexity than (4), since a system with the matrix $C$ must be solved at each iteration.

Concerns:
C1 Does the iteration process converge to the solution, i.e. does $x^{(k)} \to x$?
C2 If yes, how fast does it converge? The number of iterations $it$ needed for the iterative method to converge with respect to some convergence criterion is a function of the properties of $A$. For instance, $it = it(n)$, where $n$ is the size of $A$. If it turns out that $it = O(n^2)$, we haven't gained anything compared to the direct solution methods. The best one can hope for is $it \le Const$, where $Const$ is independent of $n$. Since the computational complexity of one iteration is in many cases proportional to $n$ (for sparse matrices, for instance), the complexity of the whole solution process will then be $O(n)$.
C3 Is the method robust with respect to the method parameters ($\tau$, $\tau_k$)?

Concerns (cont.):
C4 Is the method robust with respect to various problem parameters? $A = A(\rho, \nu, E, \dots)$
C5 When we are using the scheme $C^{-1}Ax = C^{-1}b$, it must be easy to solve systems with $C$.
C6 Is the method parallelizable? Parallelization aspects become more and more important since $n$ is XXL.

Concerns (cont.):
Suppose the method converges to the exact solution $x$. Then more questions arise:
C7 When do we stop the iterations? We want $\|x - x^{(k)}\| \le \varepsilon$, but $x$ is not known. What about checking on $r^{(k)}$? Is it enough to have $\|r^{(k)}\| \le \varepsilon$? Will the latter guarantee that $\|x - x^{(k)}\| \le \varepsilon$?
Denote $e^{(k)} = x - x^{(k)}$ (the error at iteration $k$). Then $r^{(k)} = b - Ax^{(k)} = A(x - x^{(k)}) = Ae^{(k)}$. In other words, $e^{(k)} = A^{-1} r^{(k)}$.
Scenario: Suppose $\|A^{-1}\| = 10^{8}$ and $\varepsilon = 10^{-4}$. Then $\|e^{(k)}\| \le \|A^{-1}\| \, \|r^{(k)}\| \le 10^{4}$, which is not very exciting.
Example: Discrete Laplace operator $\Delta_h^{5}$ (five-point stencil): $\|A^{-1}\| = \dfrac{1}{\lambda_{\min}} \approx \dfrac{1}{2(\pi h)^2} = O(h^{-2})$, where $h^{-2} = 10^{4}$ for $h = 10^{-2}$.
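The scenario in C7 can be made concrete with a diagonal matrix whose inverse has norm $10^{8}$. The numbers below, including the "iterate" `x_k`, are invented purely to illustrate the point; they do not come from any actual solver run.

```python
# A = diag(1, 1e-8), so ||A^{-1}||_inf = 1e8.
A = [[1.0, 0.0], [0.0, 1e-8]]
x_exact = [1.0, 1.0]
b = [1.0, 1e-8]                 # b = A * x_exact

# Hypothetical iterate: the second component is completely wrong.
x_k = [1.0, 0.0]

residual = [b[i] - sum(A[i][j] * x_k[j] for j in range(2)) for i in range(2)]
residual_norm = max(abs(r) for r in residual)                      # ||r||_inf
error_norm = max(abs(xe - xk) for xe, xk in zip(x_exact, x_k))     # ||e||_inf

print(residual_norm, error_norm)
```

The residual passes a test with $\varepsilon = 10^{-4}$ by four orders of magnitude, yet the error is of size 1, so a purely residual-based test says little without an estimate of $\|A^{-1}\|$.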

Concerns (cont.):
C8 How do we measure (estimate) the convergence rate?
C9 How do we find good method parameters ($\tau$, $\tau_k$, $C$), which will speed up the convergence?
We start our considerations with [C9].

Choosing C:
Intuitively, $C$ has to have something to do with $A$. Note that if $C = A$, then $C^{-1} = A^{-1}$ and we get convergence in one step! However, the computational effort to construct $A^{-1}$ is higher than that of a direct solution method. We try the following choice. Consider the following so-called splitting of $A$,
$$A = C - R,$$
where $C$ is nonsingular and $R$ can be seen as an error matrix.

Choosing C (cont.)
Then
$$C^{-1}A = C^{-1}(C - R) = I - C^{-1}R = I - B.$$
The matrix $B = C^{-1}R$ is referred to as the iteration matrix.
- $\|B^m\|$ is the convergence factor for $m$ steps.
- $\|B^m\|^{1/m}$ is called the average convergence factor.

Equivalent formulation using the splitting:
Using the splitting $A = C - R$ (so that $R = C - A$) we obtain the following equivalent form of the iterative procedure:
$$x^{(k+1)} = x^{(k)} + C^{-1}(b - Ax^{(k)}) = x^{(k)} + C^{-1}b - C^{-1}(C - R)x^{(k)} = C^{-1}b + C^{-1}Rx^{(k)},$$
$$Cx^{(k+1)} = Rx^{(k)} + b. \tag{7}$$
The matrix $C$ is called a preconditioner to $A$. Its general purpose is to improve the properties of $A$ in order to achieve a better (faster) convergence of the method.

A general convergence result:
Theorem 1 The sequence $\{x^{(k)}\}$ from $Cx^{(k+1)} = Rx^{(k)} + b$ converges to the solution $x$ of $Ax = b$ for any initial guess $x^{(0)}$ if and only if
$$\rho(B) \equiv \rho(C^{-1}R) < 1,$$
where $\rho(\cdot)$ denotes the spectral radius.
Proof Let $e^{(k)} = x - x^{(k)}$, $A = C - R$. Then
$$Cx = Rx + b, \qquad Cx^{(k)} = Rx^{(k-1)} + b \quad \Rightarrow \quad Ce^{(k)} = Re^{(k-1)},$$
$$e^{(k)} = Be^{(k-1)} = B^2 e^{(k-2)} = \dots = B^k e^{(0)}.$$
If $\rho(C^{-1}R) < 1$ then $\lim_{k \to \infty} B^k = 0$ and $e^{(k)} \to 0$.

If $\rho(C^{-1}R) \ge 1$: Let $\lambda_i$ be the eigenvalues of $B$ and let $\lambda_j$ be the eigenvalue of $B$ such that $\rho(B) = |\lambda_j|$. Let $v^j$ be the corresponding eigenvector. Then
$$B^m v^j = \lambda_j^m v^j \not\to 0.$$
Writing
$$e^{(0)} = \sum_{k=1}^{n} \beta_k v^k = \beta_j v^j + \dots, \qquad B^m e^{(0)} = \beta_j B^m v^j + \dots,$$
we see that at least one component of $e^{(m)}$ does not converge to zero.
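Theorem 1 can be tried out on $2 \times 2$ examples, where the spectral radius is available in closed form from the trace and determinant. The sketch below uses the splitting $C = D$ (the diagonal of $A$), $R = C - A$, so that $B = C^{-1}R = I - D^{-1}A$; the two test matrices and all function names are assumptions made for the illustration.

```python
import cmath

def spectral_radius_2x2(M):
    """Largest eigenvalue modulus of a 2x2 matrix, from trace and determinant."""
    (a, b), (c, d) = M
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + disc) / 2.0), abs((trace - disc) / 2.0))

def iteration_matrix_diag_splitting(A):
    """B = C^{-1} R for C = diag(A), R = C - A (2x2 case)."""
    return [[0.0, -A[0][1] / A[0][0]],
            [-A[1][0] / A[1][1], 0.0]]

def propagate_error(B, e0, k):
    """Return e_k = B^k e_0."""
    e = list(e0)
    for _ in range(k):
        e = [B[0][0] * e[0] + B[0][1] * e[1], B[1][0] * e[0] + B[1][1] * e[1]]
    return e

B_good = iteration_matrix_diag_splitting([[4.0, 1.0], [1.0, 3.0]])
B_bad = iteration_matrix_diag_splitting([[1.0, 2.0], [3.0, 1.0]])

rho_good = spectral_radius_2x2(B_good)   # sqrt(1/12), below 1
rho_bad = spectral_radius_2x2(B_bad)     # sqrt(6), above 1

e_good = propagate_error(B_good, [1.0, 1.0], 50)   # tends to zero
e_bad = propagate_error(B_bad, [1.0, 1.0], 20)     # grows without bound
print(rho_good, rho_bad)
```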

Remarks on the proof:
The basic argument in the latter proof is that if $\rho(B) < 1$ then $B^k \to 0$. This can be shown in the following way.
Lemma 1 Let $T$ be a nonsingular matrix and let $\|x\|_T = \|Tx\|_\infty$. Let
$$\|A\|_T = \sup_{x \ne 0} \frac{\|Ax\|_T}{\|x\|_T}$$
be the induced matrix norm. Then
(a) $\|A\|_T = \|TAT^{-1}\|_\infty$.
(b) For any $\varepsilon > 0$ and matrix $A$, there exists a nonsingular matrix $T$ such that $\|A\|_T \le \rho(A) + \varepsilon$.
In other words, there exist matrix norms which are arbitrarily close to the spectral radius of a given matrix.

Remarks on the proof:
Proof (a) $\|x\|_T$ is a vector norm and $T^{-1}$ exists. With $y = Tx$,
$$\|A\|_T = \sup_{x \ne 0} \frac{\|Ax\|_T}{\|x\|_T} = \sup_{x \ne 0} \frac{\|TAx\|_\infty}{\|Tx\|_\infty} = \sup_{y \ne 0} \frac{\|TAT^{-1}y\|_\infty}{\|y\|_\infty} = \|TAT^{-1}\|_\infty.$$

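Proof (cont.)
(b) We use Schur's lemma: There exists a unitary matrix $U$ such that
$$UAU^{-1} = W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ 0 & w_{22} & \cdots & w_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & w_{nn} \end{pmatrix},$$
where $w_{ii} = \lambda_i \in S(A)$; $S(A)$ denotes the spectrum of $A$. Let $\delta > 0$ and define
$$D = D(\delta) = \mathrm{diag}\{\delta^{-1}, \delta^{-2}, \dots, \delta^{-n}\}.$$
Then $DWD^{-1}$ is also upper triangular and
$$(DWD^{-1})_{ij} = \begin{cases} 0, & j < i, \\ w_{ii}, & j = i, \\ w_{ij}\,\delta^{\,j-i}, & j > i, \end{cases}$$
$$\|DWD^{-1}\|_\infty \le \max_i |w_{ii}| + n \max_{j > i} |w_{ij}\,\delta^{\,j-i}|.$$

Proof (cont.)
We see that for any given $\varepsilon > 0$ we can choose $\delta > 0$ small enough so that $n \max_{j > i} |w_{ij}\,\delta^{\,j-i}| < \varepsilon$. Hence, $\|DWD^{-1}\|_\infty \le \rho(A) + \varepsilon$ and
$$\|A\|_T = \|TAT^{-1}\|_\infty = \|DUAU^{-1}D^{-1}\|_\infty = \|DWD^{-1}\|_\infty \le \rho(A) + \varepsilon$$
(for $T = DU$, nonsingular).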
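A worked $2 \times 2$ instance may help to see how the scaling by $D$ pushes the norm towards the spectral radius. The matrix $W$ and the value of $\delta$ below are chosen only for illustration; $W$ is already upper triangular, so $U = I$ and $T = D$.

```latex
W = \begin{pmatrix} 0.5 & 10 \\ 0 & 0.5 \end{pmatrix}, \qquad
\rho(W) = 0.5, \qquad \|W\|_\infty = 10.5 .

D = \mathrm{diag}\{\delta^{-1}, \delta^{-2}\}, \qquad
D W D^{-1}
= \begin{pmatrix} \delta^{-1} & 0 \\ 0 & \delta^{-2} \end{pmatrix}
  \begin{pmatrix} 0.5 & 10 \\ 0 & 0.5 \end{pmatrix}
  \begin{pmatrix} \delta & 0 \\ 0 & \delta^{2} \end{pmatrix}
= \begin{pmatrix} 0.5 & 10\,\delta \\ 0 & 0.5 \end{pmatrix}.

\|D W D^{-1}\|_\infty = 0.5 + 10\,\delta ,
\qquad \text{e.g. } \delta = 0.01 \;\Rightarrow\; \|W\|_T = 0.6 = \rho(W) + 0.1 .
```

So although $\|W\|_\infty$ is far above 1, there is a norm $\|\cdot\|_T$ in which $W$ has norm below 1, which is what forces $W^k \to 0$.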

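Convergence (cont.)
Lemma 2 For any square matrix $A$ there holds
(a) $\lim_{k \to \infty} A^k = 0 \iff \rho(A) < 1$,
(b) if $\rho(A) < 1$ then $(I - A)^{-1} = I + A + A^2 + \dots$ is convergent.
Proof
(a) ($\Leftarrow$): If $\rho(A) < 1$ then choose $\varepsilon > 0$ such that $\rho(A) + \varepsilon < 1$. Then there exists a nonsingular $T$ (which depends on $A$) such that $\|A\|_T \le \rho(A) + \varepsilon < 1$. Hence $\|A^k\|_T \le \|A\|_T^k \to 0$, so $\lim_{k \to \infty} A^k = 0$.
(a) ($\Rightarrow$): If $\lim_{k \to \infty} A^k = 0$, let $\{\lambda, v\}$ be an eigensolution of $A$; then $\lambda^k v = A^k v \to 0$. This is true for all eigenvalues, thus $\rho(A) = \max |\lambda| < 1$.
(b) $(I - A)(I + A + A^2 + \dots + A^k) = I - A^{k+1}$. If $\rho(A) < 1$ then $A^k \to 0$ and (b) follows.

Rate of convergence
Theorem 1 shows both convergence and rate of convergence ($e^{(k)} = B^k e^{(0)}$). The latter is difficult to compute. Also the convergence may not be monotone.
Theorem 2 Consider $Cx^{(k+1)} = Rx^{(k)} + b$, $B = C^{-1}R$, and let $\rho(B) < 1$. Then, in any norm for which $\|B\| < 1$ (such a norm exists by Lemma 1),
$$\|x - x^{(k)}\| \le \frac{\|B\|}{1 - \|B\|} \, \|x^{(k)} - x^{(k-1)}\|.$$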
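The bound in Theorem 2 is computable during the iteration, since it only involves two successive iterates and $\|B\|$. The sketch below checks it numerically for the splitting $C = \mathrm{diag}(A)$ in the maximum norm; the $2 \times 2$ system and the function names are assumptions made for this illustration, and the small slack `1e-15` only guards against rounding.

```python
def diag_splitting_step(A, b, x):
    """One step of C x_new = R x + b with C = diag(A), R = C - A."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def norm_inf(v):
    return max(abs(t) for t in v)

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_exact = [1.0 / 11.0, 7.0 / 11.0]

# ||B||_inf for B = I - D^{-1} A is the largest off-diagonal row sum ratio.
n = len(b)
norm_B = max(sum(abs(A[i][j]) for j in range(n) if j != i) / abs(A[i][i])
             for i in range(n))
factor = norm_B / (1.0 - norm_B)

x_prev = [0.0, 0.0]
bound_holds = True
for k in range(1, 16):
    x = diag_splitting_step(A, b, x_prev)
    true_error = norm_inf([xe - xi for xe, xi in zip(x_exact, x)])
    estimate = factor * norm_inf([xi - xp for xi, xp in zip(x, x_prev)])
    bound_holds = bound_holds and (true_error <= estimate + 1e-15)
    x_prev = x

print(norm_B, factor, bound_holds)
```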

Rate of convergence, cont.
Proof $x^{(k+1)} - x^{(k)} = B(x^{(k)} - x^{(k-1)})$ and $x^{(k+m+1)} - x^{(k+m)} = B^{m+1}(x^{(k)} - x^{(k-1)})$. We have
$$x^{(k+s)} - x^{(k)} = \sum_{j=0}^{s-1} \left(x^{(k+j+1)} - x^{(k+j)}\right), \qquad \|x^{(k+s)} - x^{(k)}\| \le \|x^{(k+1)} - x^{(k)}\| + \|x^{(k+2)} - x^{(k+1)}\| + \dots$$
Therefore
$$\|x^{(k+m+1)} - x^{(k)}\| \le \sum_{j=0}^{m} \|B\|^{j+1} \, \|x^{(k)} - x^{(k-1)}\| = \|B\| \, \frac{1 - \|B\|^{m+1}}{1 - \|B\|} \, \|x^{(k)} - x^{(k-1)}\|.$$
We now let $m \to \infty$, i.e., $x^{(k+m)} \to x$ and $\|B\|^{m} \to 0$, and obtain
$$\|x - x^{(k)}\| \le \frac{\|B\|}{1 - \|B\|} \, \|x^{(k)} - x^{(k-1)}\|.$$


Stopping tests:
Theorem 2 can be used to get information on whether the iteration error $e^{(k)} = x - x^{(k)}$ is small enough. In practice, the most used stopping tests are:
(S1) $\|r^{(k)}\| \le \varepsilon$, residual based, absolute
(S2) $\|r^{(k)}\| \le \varepsilon \|r^{(0)}\|$, residual based, relative
(S3) $\|x^{(k)} - x^{(k-1)}\| \le \varepsilon$
(S4) $\|x - x^{(k)}\| \le \varepsilon_0 \|x - x^{(0)}\|$.
If the latter is wanted, then we must check on (S3) and choose $\varepsilon$ such that
$$\frac{\|B\|}{1 - \|B\|} \, \varepsilon \le \varepsilon_0 \|x - x^{(0)}\|.$$
An estimate of either $\|A^{-1}\|$ or $\|B\|$ is required.

Choices of the matrix C
Choice J Let $A = D - L - U$, where $D$ is diagonal, $U$ is strictly upper triangular and $L$ is strictly lower triangular. Let $C \equiv D$, $R = L + U$. The iterative scheme is known as Jacobi iteration:
$$Dx^{(k+1)} = (L + U)x^{(k)} + b.$$
Entry-wise:
$$x_i^{(k+1)} = \frac{1}{a_{ii}} \Big( b_i - \sum_{j \ne i} a_{ij} x_j^{(k)} \Big).$$
For the method to converge:
$$B = D^{-1}(L + U), \qquad \rho(B) \le \|D^{-1}(L + U)\|_\infty = \max_{1 \le i \le n} \sum_{j=1,\, j \ne i}^{n} \frac{|a_{ij}|}{|a_{ii}|}.$$
We want $\rho(B) < 1$. One class of matrices for which the Jacobi method converges is the strictly diagonally dominant matrices.
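The entry-wise Jacobi formula combined with the relative residual test (S2) can be sketched as follows. The function name `jacobi`, the tolerance and the small strictly diagonally dominant test system are assumptions for the example; dense lists of lists are used only to keep the sketch free of dependencies.

```python
def jacobi(A, b, eps=1e-10, max_iter=10_000):
    """Jacobi iteration with stopping test ||r_k||_inf <= eps * ||r_0||_inf."""
    n = len(b)
    x = [0.0] * n

    def residual_norm(v):
        return max(abs(b[i] - sum(A[i][j] * v[j] for j in range(n)))
                   for i in range(n))

    r0 = residual_norm(x)
    if r0 == 0.0:
        return x, 0
    for k in range(1, max_iter + 1):
        # Every new entry uses only values from the previous iterate.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        if residual_norm(x) <= eps * r0:
            return x, k
    raise RuntimeError("Jacobi iteration did not converge")

# Strictly diagonally dominant test matrix; exact solution is (1, 1, 1).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]

x_jacobi, num_iterations = jacobi(A, b)
print(x_jacobi, num_iterations)
```

Because all entries of the new iterate depend only on the old one, the update of different entries can be done independently, which is what makes Jacobi attractive with respect to parallelization.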


Choices of the matrix C
Choice GS-B Choose $C \equiv D - U$, $R = L$. Backward Gauss-Seidel:
$$(D - U)x^{(k+1)} = Lx^{(k)} + b.$$
Choice GS-F Choose $C \equiv D - L$, $R = U$. Forward Gauss-Seidel:
$$(D - L)x^{(k+1)} = Ux^{(k)} + b.$$
G-S is convergent for s.p.d. matrices.

To make it more fancy (overrelaxation): $A = D - L - U$. Then
$$\omega A = \omega D - \omega L - \omega U + D - D = (D - \omega L) - \big(\omega U + (1 - \omega)D\big).$$
Choose $C \equiv D - \omega L$, $R = \omega U + (1 - \omega)D$. SOR:
$$(D - \omega L)x^{(k+1)} = [\omega U + (1 - \omega)D]\,x^{(k)} + \omega b.$$

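SOR - back to 1940
One can see SOR as a generalization of G-S ($\omega = 1$). Rewrite
$$(D - \omega L)x^{(k+1)} = [\omega U + (1 - \omega)D]\,x^{(k)} + \omega b$$
as
$$\Big(\tfrac{1}{\omega}D - L\Big)x^{(k+1)} = \Big[\Big(\tfrac{1}{\omega} - 1\Big)D + U\Big]x^{(k)} + b.$$
For the iteration matrix
$$B_\omega = \Big(\tfrac{1}{\omega}D - L\Big)^{-1}\Big[\Big(\tfrac{1}{\omega} - 1\Big)D + U\Big]$$
one can show that $\rho(B_\omega) < 1$ for $0 < \omega < 2$ (for instance when $A$ is s.p.d.). Furthermore, there is an optimal value of $\omega$, for which $\rho(B_\omega)$ is minimized:
$$\omega_{opt} = \frac{2}{1 + \sqrt{1 - \rho(\tilde{B})^2}}, \qquad \tilde{B} = I - D^{-1}A.$$

SOR - cont.
Rate of convergence: Let $\lambda_i$ be the eigenvalues of $B_\omega$. Then
$$\prod_{i=1}^{n} \lambda_i = \det(B_\omega) = \det\big((1 - \omega)I + \omega D^{-1}U\big) = (1 - \omega)^n.$$
Hence at least one $|\lambda_i| \ge |1 - \omega|$, so $\rho(B_\omega) \ge |1 - \omega|$. We want $\rho(B_\omega) < 1$, i.e. $|1 - \omega| \le \rho(B_\omega) < 1$, which requires $0 < \omega < 2$.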
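To see what overrelaxation buys, the sketch below runs forward SOR on the 1D discrete Laplacian $\mathrm{tridiag}(-1, 2, -1)$, once with $\omega = 1$ (Gauss-Seidel) and once with $\omega_{opt}$ from the formula above. It relies on the standard fact that the Jacobi iteration matrix of this matrix has spectral radius $\cos(\pi h)$ with $h = 1/(n+1)$. The problem size, the tolerance and the function name `sor` are assumptions chosen for the illustration.

```python
import math

def sor(A, b, omega, eps=1e-8, max_iter=100_000):
    """Forward SOR sweeps for A x = b; omega = 1 gives forward Gauss-Seidel."""
    n = len(b)
    x = [0.0] * n

    def residual_norm(v):
        return max(abs(b[i] - sum(A[i][j] * v[j] for j in range(n)))
                   for i in range(n))

    r0 = residual_norm(x)
    if r0 == 0.0:
        return x, 0
    for k in range(1, max_iter + 1):
        for i in range(n):
            # x[j] for j < i already holds the new values (in-place sweep).
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]
        if residual_norm(x) <= eps * r0:
            return x, k
    raise RuntimeError("SOR did not converge")

n = 20
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in A]          # exact solution is the vector of ones

h = 1.0 / (n + 1)
rho_jacobi = math.cos(math.pi * h)   # spectral radius of I - D^{-1} A
omega_opt = 2.0 / (1.0 + math.sqrt(1.0 - rho_jacobi ** 2))

x_gs, it_gs = sor(A, b, omega=1.0)
x_sor, it_sor = sor(A, b, omega=omega_opt)
print(omega_opt, it_gs, it_sor)
```

On this problem the optimally relaxed sweep needs far fewer iterations than Gauss-Seidel for the same residual reduction, at the same cost per sweep.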

Splittings of A
Let $A, C, R \in \mathbb{R}^{n \times n}$ and consider $A = C - R$. A splitting of $A$ is called
- regular if $C$ is monotone and $R \ge 0$ (elementwise),
- weak regular if $C$ is monotone and $C^{-1}R \ge 0$,
- nonnegative if $C^{-1}$ exists and $C^{-1}R \ge 0$,
- convergent if $\rho(C^{-1}R) < 1$.
Recall: A matrix is called monotone if $Ax > 0$ implies $x > 0$.
Theorem: $A$ is monotone $\iff$ $A^{-1} \ge 0$.
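For the diagonal splitting $C = \mathrm{diag}(A)$, $R = C - A$, checking regularity is straightforward: a diagonal $C$ has $C^{-1} \ge 0$ (and is therefore monotone by the theorem above) exactly when all its diagonal entries are positive, and $R \ge 0$ is an elementwise test. The helper names and the two test matrices below are assumptions for the sketch.

```python
def diagonal_splitting(A):
    """Return (C, R) with C = diag(A) and R = C - A, so that A = C - R."""
    n = len(A)
    C = [[A[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]
    R = [[C[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    return C, R

def is_regular_diagonal_splitting(A):
    """Regular splitting test: C monotone (positive diagonal) and R >= 0."""
    C, R = diagonal_splitting(A)
    c_monotone = all(C[i][i] > 0.0 for i in range(len(A)))
    r_nonnegative = all(entry >= 0.0 for row in R for entry in row)
    return c_monotone and r_nonnegative

# Nonpositive off-diagonal entries: R = L + U >= 0, the splitting is regular.
A_regular = [[4.0, -1.0, 0.0],
             [-1.0, 4.0, -1.0],
             [0.0, -1.0, 4.0]]
# Positive off-diagonal entries make R negative, so the splitting is not regular.
A_not_regular = [[4.0, 1.0],
                 [1.0, 3.0]]

print(is_regular_diagonal_splitting(A_regular),
      is_regular_diagonal_splitting(A_not_regular))
```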