Parallel Numerics. Prof. Dr. Thomas Huckle. July 2, 2006. Technische Universität München, Institut für Informatik



Contents

1 Introduction
  1.1 Computer Science Aspects of Parallel Numerics
    1.1.1 Parallelism in CPU
    1.1.2 Memory Organization
    1.1.3 Parallel Processors
    1.1.4 Performance Analysis
    1.1.5 Further Keywords
  1.2 Numerical Problems
  1.3 Data Dependency Graphs
    1.3.1 Directed Graph G = (E,V)
    1.3.2 Dependency Graphs of Iterative Algorithms
    1.3.3 Dependency graph for solving a triangular linear system
2 Elementary Linear Algebra Problems
  2.1 BLAS: Basic Linear Algebra Subroutines (program package)
  2.2 Analysis of the Matrix-Vector product
    2.2.1 Vectorization
    2.2.2 Parallelization by building blocks
    2.2.3 c = Ab for banded matrix
  2.3 Analysis of the Matrix-Matrix product
3 Linear Equations with dense matrices
  3.1 Gaussian Elimination: Basic facts
  3.2 Vectorization of the Gaussian Elimination
  3.3 Gaussian Elimination in Parallel
    3.3.1 Crout method
    3.3.2 Left looking GE
    3.3.3 Right looking GE / standard Gaussian Elimination
  3.4 QR-Decomposition with Householder matrices
    3.4.1 QR-decomposition
    3.4.2 Householder method for QR
    3.4.3 Householder method in parallel
4 Linear Equations with sparse matrices
  4.1 General properties of sparse matrices
    4.1.1 Storage in coordinate form
    4.1.2 Compressed Sparse Row Format: CSR
    4.1.3 Improving CSR
    4.1.4 Diagonalwise storage
    4.1.5 Rectangular, rowwise storage scheme
    4.1.6 Jagged diagonal form
  4.2 Sparse Matrices and Graphs
    4.2.1 A = A^T > 0 (n x n matrix), symmetric
    4.2.2 A non-symmetric: directed graph
    4.2.3 Dissection form preserved during GE
  4.3 Reordering
    4.3.1 Smaller bandwidth by the Cuthill-McKee algorithm
    4.3.2 Dissection reordering
    4.3.3 Algebraic pivoting during GE
  4.4 Gaussian Elimination in the Graph
  4.5 Different direct solvers
5 Iterative methods for sparse matrices
  5.1 Stationary methods
    5.1.1 Richardson Iteration
    5.1.2 Better splitting of A
    5.1.3 Jacobi (Diagonal) Splitting
    5.1.4 Gauss-Seidel method by improving convergence
  5.2 Nonstationary Methods
    5.2.1 A symmetric positive definite: A = A^T > 0 (spd)
    5.2.2 Improving the gradient method: conjugate gradients
    5.2.3 GMRES for general matrix A, not spd
    5.2.4 Convergence of cg or GMRES
6 Collection of remaining problems
  6.1 Domain Decomposition Methods for Solving PDE
  6.2 Parallel Computation of the Discrete Fourier Transformation
  6.3 Parallel Computation of Eigenvalues

Literature: Dongarra, Duff, Sorensen, van der Vorst: Numerical Linear Algebra for High Performance Computers.

1 Introduction

1.1 Computer Science Aspects of Parallel Numerics

1.1.1 Parallelism in CPU

Elementary operations in the CPU are carried out in pipelines: divide a task into a sequence of smaller tasks; each small task is executed on a piece of hardware that operates concurrently with the other stages of the pipeline. Example: multiplication.
Advantage: once the pipeline is filled, one result comes out per clock cycle. All multiplications should be organized such that the pipeline is always filled! If the pipeline is (partly) empty, it is not efficient.
Special case: vector instruction: the same operation has to be executed for a whole set of data, e.g. α · (x_1, ..., x_n).
Cost: startup time + vector length · clock period

Chaining: combine pipelines directly. Advantage: total cost = (longer) startup time + vector length · clock period.
Problem: data dependency. Fibonacci: x_0 = 0, x_1 = 1, x_2 = x_1 + x_0, ..., x_i = x_{i-1} + x_{i-2}. Each pair of operands has to wait until the previous result (e.g. x_2) has left the pipeline, so the pipeline is (nearly) empty in each step.

1.1.2 Memory Organization

Cache idea: a small, fast buffer between the large, slow main memory and the CPU. By considering the flow of the last used data, we try to predict which data will be requested in the next step:
- keep the last used data in the cache for fast access
- keep also the neighbourhood of this data in the cache
Cache hit: the CPU looks for data and finds it in the cache. Cache miss: the data is not in the cache; look in main memory and copy the new page into the cache.

1.1.3 Parallel Processors

MIMD architecture: Multiple Instruction, Multiple Data (P = processors, M = memories). Either (global) shared memory: all processors P_1, ..., P_n access one global memory; or distributed memory: each processor P_j has its own local memory M_j holding its data; or virtual shared memory: physically distributed data, but organized as shared memory.
Topology of processors/memory (interconnection). Bus (shared memory): the processors P_1, ..., P_n, each with cache and local memory, are connected via an I/O bus to the global memory.

Mesh (distributed memory). Hypercube: data dependency/communication distance log(n).
Shared memory: communication between different processors by synchronization, e.g. a barrier: p_1, ..., p_n halt and continue only if all have completed.
MPI: Message Passing Interface: communication library for C, C++, FORTRAN.
Compiling: mpicc <options> prog.c
Start: mpirun -arch <architecture> -np <np> prog
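A minimal MPI program in C, as it could be compiled and started with the commands above; MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Bcast, MPI_Barrier and MPI_Finalize are standard MPI calls, the rest of the sketch is illustrative.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double data = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* own processor number  */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processors  */

        if (rank == 0) data = 3.14;             /* root provides the data */
        MPI_Bcast(&data, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); /* send to all */
        MPI_Barrier(MPI_COMM_WORLD);            /* synchronization point  */
        printf("process %d of %d received %f\n", rank, size, data);

        MPI_Finalize();
        return 0;
    }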

Commands: MPI_Send, MPI_Bcast, MPI_Recv, MPI_Gather, MPI_Barrier, ...

1.1.4 Performance Analysis

Computation speed: r = N/t Mflops for N floating point operations in t microseconds; or, with known speed, t = N/r.
Amdahl's law: the algorithm takes N flops; a fraction f is carried out with speed V Mflops (well-suited for parallel execution), the fraction 1-f with speed S Mflops (not well-suited). Total CPU time:
t = f·N/V + (1-f)·N/S = N (f/V + (1-f)/S) microseconds.
Overall speed (performance):
r = N/t = 1 / (f/V + (1-f)/S) Mflops   (Amdahl's law).
Interpretation: f must be close to 1 in order to benefit significantly from parallelism.
Speedup by using p parallel processors for a given job: t_j := wall clock time to execute the job on j parallel processors.
Speedup: S_p = t_1 / t_p (ideal: t_1 = p·t_p).
Efficiency: E_p = S_p / p, 0 <= E_p <= 1; E_p close to 1: very well parallelizable, t_p = t_1/p, the problem scales.
With the parallel fraction f:
t_p = f·t_1/p + (1-f)·t_1 = t_1 (f + (1-f)p) / p,
S_p = t_1 / t_p = p / (f + (1-f)p)   (Ware's law),
E_p = 1 / (f + (1-f)p),  and lim_{p→∞} E_p = 0.
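A small C illustration of these formulas: it tabulates S_p and E_p according to Ware's law for a fixed parallel fraction f (0.95 is an arbitrary example value; the function name is a choice of this sketch).

    #include <stdio.h>

    /* S_p = p / (f + (1-f)*p) for parallel fraction f and p processors */
    double ware_speedup(double f, int p)
    {
        return p / (f + (1.0 - f) * p);
    }

    int main(void)
    {
        for (int p = 1; p <= 1024; p *= 2) {
            double s = ware_speedup(0.95, p);
            printf("p = %4d   S_p = %7.2f   E_p = %5.3f\n", p, s, s / p);
        }
        return 0;
    }

For f = 0.95 the speedup saturates near 20, illustrating that f must be close to 1 to profit from many processors.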

Gustafson's law: assume that the given problem can be solved in 1 unit of time on a parallel machine with p processors. A uniprocessor would need (1-f) + f·p units. Speedup:
S_{p,f} = t_1 / t_p = ((1-f) + f·p) / 1 = p + (1-p)(1-f)   (Gustafson's law),
E_{p,f} = S_{p,f} / p = (1-f)/p + f → f for p → ∞   (only of theoretical use!).

1.1.5 Further Keywords

- An algorithm is scaling iff with p processors we can reduce the operation time by a factor of p, or a larger problem can be solved in the same time by using more processors (this means: speedup ≈ p, efficiency ≈ 1).
- Load balancing: the job has to be distributed over the processors such that all processors are busy: avoid idle processors.
- Deadlock: two or more processors are waiting indefinitely for an event that can be caused only by one of the waiting processors (each waiting for the result of the other one).
- Data dependency: compute (1) C = A + B, (2) Z = C·X + Y; the dependency graph has an edge 1 → 2.

(2) can be computed only after (1). Example loop:
for (i = 1; i <= n; i++) a[i] = b[i] + a[i-1] + c[i];
(strongly sequential)

1.2 Numerical Problems

Vectors x, y ∈ R^n:
dot product (inner product): x^T y = (x_1, ..., x_n)(y_1, ..., y_n)^T = Σ_{i=1}^n x_i y_i
sum of vectors: x + αy = (x_1 + αy_1, ..., x_n + αy_n)^T
outer product (for x ∈ R^n, y ∈ R^m): x y^T = (x_i y_j), the n x m matrix with entries x_i y_j
matrix product: A ∈ R^{n,k}, B ∈ R^{k,m}, C = A·B ∈ R^{n,m} with c_ij = Σ_{r=1}^k a_ir b_rj, i = 1, ..., n, j = 1, ..., m
Solving linear equations, e.g. triangular: an upper triangular matrix (a_ij = 0 for i > j) applied to x = (x_1, ..., x_n)^T equal to b = (b_1, ..., b_n)^T:

a_11 x_1 + ... + a_1n x_n = b_1
          a_22 x_2 + ... + a_2n x_n = b_2
                    ...
                              a_nn x_n = b_n
Solution: x_n = b_n / a_nn, x_{n-1} = (b_{n-1} - a_{n-1,n} x_n) / a_{n-1,n-1}, ...; general form:
x_j = (b_j - Σ_{k=j+1}^n a_jk x_k) / a_jj   for j = n, ..., 1.
Further numerical problems: Gaussian Elimination, LU-decomposition (Cholesky-decomposition); least squares problem (normal equations) min_x ||Ax - b||_2; QR-decomposition; differential equations (PDE); eigenvalues, singular values; FFT.

1.3 Data Dependency Graphs

1.3.1 Directed Graph G = (E,V)

with edges E and vertices/nodes V. Example: computation of (x_1 + x_2)(x_2 + x_3) = x_1 x_2 + x_2^2 + x_1 x_3 + x_2 x_3.

Input: x_1, x_2, x_3. Data flow: first the sums x_1 + x_2 and x_2 + x_3, then the product (x_1 + x_2)(x_2 + x_3). Sequentially this takes 3 time steps. In parallel, x_1 + x_2 and x_2 + x_3 can be computed independently, so the result is obtained in 2 time steps. The second, equivalent formula x_1 x_2 + x_2^2 + x_1 x_3 + x_2 x_3 can also be evaluated in parallel (all products in one time step, followed by the additions).

1.3.2 Dependency Graphs of Iterative Algorithms

Given: a function f and a start vector x^(0); iterate x^(k+1) = f(x^(k)). Notation: x^(k+1) corresponds to x(k+1). If x^(k) → x̄ for k → ∞, then x̄ = f(x̄) is a fixed point of f. Compare Newton's method for g(x̄) = 0: x_{k+1} = x_k - g(x_k)/g'(x_k).
In vector form: x^(k+1) = f(x^(k)), i.e. x_i(k+1) = f_i(x_1(k), ..., x_n(k)), i = 1, ..., n.
Example:
x_1(k+1) = f_1(x_1(k), x_3(k))
x_2(k+1) = f_2(x_1(k), x_2(k))
x_3(k+1) = f_3(x_2(k), x_3(k), x_4(k))
x_4(k+1) = f_4(x_2(k), x_4(k))
Edge i → j iff for x_i^(k+1) we need x_j^(k).

Parallel computation: dependency graph for the iteration: single-step or Jacobi iteration. Very nice in parallel; but the convergence x^(k) → x̄ is slow. Idea for accelerating the convergence: always use the newest available information:
x_1(k+1) = f_1(x_1(k), x_3(k))
x_2(k+1) = f_2(x_1(k+1), x_2(k))
x_3(k+1) = f_3(x_2(k+1), x_3(k), x_4(k))
x_4(k+1) = f_4(x_2(k+1), x_4(k))
This leads to much faster convergence.

Full-step or Gauss-Seidel method (drawback: loss of parallelism). In this form the iteration depends on the ordering of the variables x_1, ..., x_n:
x_1(k+1) = f_1(x_1(k), x_3(k))
x_3(k+1) = f_3(x_2(k), x_3(k), x_4(k))
x_4(k+1) = f_4(x_2(k), x_4(k))
x_2(k+1) = f_2(x_1(k+1), x_2(k))

Better parallelism, but slower convergence. Find an optimal ordering with fast convergence that is also good in parallel.
Colouring algorithms for dependency graphs:
- use k colours for the vertices of the graph
- vertices of the same colour can be computed in parallel
- optimal colouring: minimal k, but without cycles connecting vertices of the same colour
If in the subset of vertices of one colour there are no cycles, the subgraph is a tree; order it by starting with the leaves and ending with the root. Example:

x_3 does not depend on x_1; x_4 does not depend on x_3; x_2 uses the newly computed x_1, x_3, x_4 and needs one further time step. The computation that uses only old information runs in parallel in one time step.

Theorem 1: The following two statements are equivalent:
(a) There exists an ordering such that one Gauss-Seidel iteration step takes k (time) levels.
(b) There exists a colouring with k colours such that there is no cycle of edges of the same colour.
Proof: colouring with no cycles in each one-colour subgraph → each subgraph is a tree → ordering from leaves to root → no data dependency within the subgraph → each subgraph can be processed in parallel.

Graph of the discretization of physical problems with neighbour connections: k = 2 colours give the red-black Gauss-Seidel ordering for PDEs, 2 time steps (checkerboard colouring r/b of the grid points).

1.3.3 Dependency graph for solving a triangular linear system

a_11 x_1                                        = b_1
a_21 x_1 + a_22 x_2                             = b_2
a_31 x_1 + a_32 x_2 + a_33 x_3                  = b_3
a_41 x_1 + a_42 x_2 + a_43 x_3 + a_44 x_4       = b_4

or, in matrix form, the lower triangular system
( a_11   0     0     0   )   ( x_1 )   ( b_1 )
( a_21   a_22  0     0   ) · ( x_2 ) = ( b_2 )
( a_31   a_32  a_33  0   )   ( x_3 )   ( b_3 )
( a_41   a_42  a_43  a_44)   ( x_4 )   ( b_4 )
Solution:
x_1 = b_1 / a_11
x_2 = (b_2 - a_21 x_1) / a_22
x_3 = (b_3 - a_31 x_1 - a_32 x_2) / a_33
x_4 = (b_4 - a_41 x_1 - a_42 x_2 - a_43 x_3) / a_44
A strongly sequential problem. In general:
x_k = (b_k - Σ_{j=1}^{k-1} a_kj x_j) / a_kk   for k = 1, ..., n.
Dependency graph: assume a_jj = 1; the graph has 2n-1 time steps.
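A C sketch of this forward substitution (row-major storage of L, nonzero diagonal); the function name is illustrative. The inner loop over j is exactly the dot product with the already computed components, which makes the sequential dependency visible.

    /* Solve L x = b for a lower triangular n x n matrix L (row-major),
       following x_k = (b_k - sum_{j<k} a_kj x_j) / a_kk. */
    void forward_subst(int n, const double *L, const double *b, double *x)
    {
        for (int k = 0; k < n; k++) {
            double s = b[k];
            for (int j = 0; j < k; j++)
                s -= L[k*n + j] * x[j];     /* uses all previous x_j */
            x[k] = s / L[k*n + k];
        }
    }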

2 Elementary Linear Algebra Problems

(dense matrices, parallel/vectorized)

2.1 BLAS: Basic Linear Algebra Subroutines (program package)

Sum s = Σ_{i=1}^n a_i by a fan-in process:
a^(k) = (a_1^(k), ..., a_{2^(N-k)}^(k))   with   a_j^(k) = a_j^(k-1) + a_{j+2^(N-k)}^(k-1).
Grouping: a_1 + ... + a_8 = [(a_1 + a_5) + (a_3 + a_7)] + [(a_2 + a_6) + (a_4 + a_8)]
for (k = 1; k <= N; k++)
  for (j = 1; j <= 2^(N-k); j++)
    a_j = a_j + a_{j+2^(N-k)};
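A runnable C version of the fan-in loop above (n = 2^N values summed in place; the inner loop over j consists of independent additions and could run in parallel). The function name is an illustrative choice.

    /* Fan-in summation of n = 2^N numbers in place, N >= 1. */
    void fanin_sum(double *a, int N)
    {
        int stride = 1 << (N - 1);            /* 2^(N-k) for k = 1 */
        for (int k = 1; k <= N; k++) {
            for (int j = 0; j < stride; j++)  /* independent additions */
                a[j] += a[j + stride];
            stride /= 2;
        }
        /* afterwards a[0] = a_1 + ... + a_n */
    }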

The fan-in corresponds to a full binary tree with n = 2^N leaves; depth corresponds to time = log n = N (sequentially: O(n)).
Level 1 BLAS: Basic Linear Algebra Subroutines for O(n) problems (vectors only), e.g. the DOT product by fan-in: s = x^T y = Σ_{j=1}^n x_j y_j. Parallelization of the dot product: the products x_j y_j in parallel, then fan-in for the sum. The dot product is not very good in parallel or in vectorization.
Another way of computing the DOT product on a special architecture: distribute the data on a linear, one-dimensional processor array with r = n/k processors; break x_1, ..., x_n (and the second vector) into r small vectors of length k. Each processor computes a_j1 b_j1 + ... + a_jk b_jk.

Time for this parallel computation: k·(add + mult), i.e. k times the time for one addition/multiplication. After computing this part, processor P_1 / P_r sends its result to its right/left neighbour, which adds the new data to its own result and sends the new result on to its right/left neighbour, until P_{r/2} holds the final number.
Total time (depending on n and r):
f(r) = k·(add + mult) + (r/2)·send + (r/2)·add = (n/r)(a + m) + r·(a + s)/2.
Minimize the total time f(r):
0 = f'(r) = -(a + m)·n/r^2 + (a + s)/2   ⟹   r = sqrt(2(a + m)n/(a + s)) = O(sqrt(n)).
Optimal: with sqrt(n) processors the time is O(sqrt(n)): f(sqrt(n)) = sqrt(n)(a + m) + sqrt(n)(a + s)/2 = O(sqrt(n)).
With the fan-in tree and n processors the total time is O(log n).
Further level-1 BLAS problems: SAXPY (S single precision, A α, X x, P plus, Y y): y = αx + y, computed by pipelining; vectorization by chaining: while the multiply pipeline produces αx_i, the add pipeline already combines αx_j + y_j for earlier components.
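A sketch of the AXPY operation in C (double precision, i.e. DAXPY in BLAS naming; the function name here is illustrative). The single loop has no dependencies between iterations and therefore pipelines and vectorizes well.

    /* y = alpha*x + y for vectors of length n */
    void axpy(int n, double alpha, const double *x, double *y)
    {
        for (int i = 0; i < n; i++)
            y[i] += alpha * x[i];    /* independent per component */
    }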

Parallelization by partitioning: {1, 2, 3, ..., n} = I_1 ∪ I_2 ∪ ... ∪ I_R, x = (x_1; ...; x_R), y = (y_1; ...; y_R). Each processor p_j, j = 1, ..., R, gets x_j and y_j and computes αx_j + y_j: very well vectorizable and parallelizable. Similarly:
SCOPY: y = x
NORM: ||x||_2 = sqrt(Σ_{j=1}^n x_j^2), compare DOT.
Level-2 BLAS: matrix-vector operations, O(n^2) sequentially; e.g. SGEMV (S single precision, GE general matrix, MV matrix-vector): y = αAx + βy; or solving a triangular system Lx = b with L a lower triangular matrix.
Level-3 BLAS: matrix-matrix operations, O(n^3); e.g. SGEMM (S single precision, GE general matrix, MM matrix-matrix): C = αAB + βC.
Based on BLAS: LAPACK subroutines for solving linear equations, least squares problems, QR-decomposition, eigenvalues, eigenvectors.

2.2 Analysis of the Matrix-Vector product

A = (a_ij)_{i=1..n, j=1..m} ∈ R^{n,m}, b ∈ R^m, c ∈ R^n, c = Ab.

2.2.1 Vectorization

c = Ab can be written as c_i = Σ_{j=1}^m a_ij b_j (a collection of DOT products with the rows of A) or as c = Σ_{j=1}^m b_j · (j-th column of A) (a collection of SAXPYs with the columns of A).
(ij) form:
for i = 1, ..., n
  for j = 1, ..., m
    c_i = c_i + a_ij b_j
DOT products (the entries of c), where c_i = (i-th row of A) · b.
(ji) form:
for j = 1, ..., m
  for i = 1, ..., n
    c_i = c_i + a_ij b_j
SAXPY: c = c + b_j · (j-th column of A); GAXPY = a number of SAXPYs accumulating into the same vector c. Advantage of GAXPY: keep c in fast register memory. SAXPY/GAXPY are well vectorizable.
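The two loop orderings, sketched in C for a row-major matrix; for column-major (Fortran/BLAS) storage the roles are reversed and the (ji) form gets unit-stride access to A. Function names are illustrative.

    /* (ij) form: c_i as DOT product with row i of A (A row-major n x m) */
    void matvec_ij(int n, int m, const double *A, const double *b, double *c)
    {
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int j = 0; j < m; j++)
                s += A[i*m + j] * b[j];
            c[i] = s;
        }
    }

    /* (ji) form: c = c + b_j * (column j of A), a GAXPY into the vector c */
    void matvec_ji(int n, int m, const double *A, const double *b, double *c)
    {
        for (int i = 0; i < n; i++) c[i] = 0.0;
        for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++)
                c[i] += A[i*m + j] * b[j];
    }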

2.2.2 Parallelization by building blocks

Reduce the matrix-vector product to smaller matrix-vector products on the processors:
{1, ..., n} = I_1 ∪ I_2 ∪ ... ∪ I_R disjoint (I_j ∩ I_k = ∅), {1, ..., m} = J_1 ∪ ... ∪ J_S (J_j ∩ J_k = ∅ for j ≠ k).
Processor P_rs gets the block A_rs := A(I_r, J_s), b_s = b(J_s), c_r = c(I_r), so that
c_r = Σ_{s=1}^S A_rs b_s = Σ_{s=1}^S c_r^(s):
for r = 1, ..., R
  for s = 1, ..., S
    c_r^(s) = A_rs b_s
for r = 1, ..., R
  c_r = 0
  for s = 1, ..., S
    c_r = c_r + c_r^(s)
The first loop needs no communication and is totally parallel; the second is a blockwise collection and addition of vectors with rowwise communication (parallel).
Special case S = 1: c = (A_1; A_2; ...; A_R) b, i.e. c_r = A_r b: small, independent matrix-vector products, no communication between processors (processors independent of each other); compute each A_r b in vectorizable form by GAXPYs.
Special case R = 1: c = (A_1 A_2 ... A_S)(b_1; ...; b_S) = A_1 b_1 + A_2 b_2 + ...: the products A_i b_i are independent; afterwards the results of P_1, ..., P_S have to be collected and added (not so good in parallel).
Rule: 1) vectorization/pipelining (innermost loops), 2) cache, 3) parallelization (outermost loops).

26 223 c = Ab for banded matrix b 0 0 bandwidth b eg β = 1 : tridiagonal matrix (0: main diagonal, +/-1 first upper/lower) 0 0 A = 0 0 notation 0 0 ã 10 ã 11 ã 1β 0 0 ã 2, 1 ã 20 0 Ã = ãn β,β ã β+1, β ã n, β ã n0 0 0 ã 1,0 ã 1,β 0 ã β+1, β ã n β,β 0 ã n, β ã n, n = (2β + 1) O(n)

27 ã is = a i,i+s for row i = 1,, n 1 i + s n S [l i, r i ] = [max{ β, 1 i}, min{β, n i}] Therefore we get the inequality eg 1 i S n i, β S β, 1 S i n S row i = 1 : S [0, β] row i = β + 1 : S [ β, β] i = n β : S [ β, β] i = n : S [ β, 0] computation of matrix-vector-product C = A b on vector processor C i = A ij b = j a ij b j = r i a i,i+s b i + S S=l i }{{} j = r i S=l i ã i,s b i+s for i = 1,,n Algorithm: for s = - β : 1 : β for i = max{1-s, 1} : 1 : min {n-s,n} c i = c i + ã ij b i+s parallel computation: 1, n = for i I r c i = r i R r=1 general triade (no SAXPY) I r s=l i ã is b i+s Processor P r gets rows to index set I r := [m r, M r ] to compute its part of C What part of vector b is necessary to process P r? 26

28 b j for j = i + s m r + l mr = m r + max{ β, 1 m r } = max{m r β, 1} j = i + s M r + r Mr = M r + min{β, n M r } = min{m r + β, n} Hence processor P r ( I r ) needs b j for j [max{1, m r β}, min{n, M r + β}] 23 Analysis of the Matrix-Matrix-product A = (a ij ) i=1n j=1m B = (b ij ) i=1m j=1q C = A B = (c ij ) i=1n j=1q for i = 1n, c ij = for j=1q: m a ik b kj = k=1 a i1 a im b 1j b mj = c ij Algorithm 1 (ijk) - form: for i = 1:n for j = 1:q for k = 1:m c ij = c ij + a ik b kj DOT-product c ij = A i B j All entries c ij are fully computed, one after another Access to A rowwise, to B columnwise Algorithm 2 (jki) - form for j = 1:q k = 1:m for i = 1:n c ij = c ij + a ik b kj SAXPY c j = c j + a k b kj vector c j c computed columnwise; access to A columnwise GAXPY c j = k b kja k 27

29 Algorithm 3 (kji) - form for k = 1:m for j = 1:q for i = 1:n c ij = c ij + a ik b kj SAXPY NO GAXPY because different c j There are computed intermediate values c (k) ij Access to A columnwise ijk ikj kij jik jki kji Access to A row row column column Access to B column row row column Computation of c row row row column column column c ij direct delayed delayed direct delayed delayed vector operation DOT GAXPY SAXPY DOT GAXPY SAXPY with vector length m q q m m m usually GAXPY better; longer vector length better; choose the right access to A,B, deping on the storage Matrix-Matrix-product in parallel 1, n = 1, m = 1, q = R r=1 S s=1 T t=1 I r K s J t Distribute blocks relative to index sets I r, K s, J t to processor P rst : K s J t J t I r A rs K s B st = I r c (s) rt 28

30 1 process P rst : c (s) rt = A rs B st small matrix-matrix-product 2 sum: Special case S = 1: I r c rt = S s=1 J t all processors indepently c (s) rt fan-in in S = J t c rt I r Each process computes a block of c indepently without communication Each process needs full block of rows of A( I r ) and block of columns of B( J t ), to compute the block c rt with n q processor: each processor has to compute one DOT-product c rt = k a rk b kt in O(m) If we use more processors to compute all these DOT-products by fan-in, we can reduce the parallel complexity to O(log m) 3 Linear Equations with dense matrices 31 Gaussian Elimination: Basic facts Linear equations a 11 x a 1n x n = b 1 a n1 x a nn x n = b n a 11 a 1n x 1 b 1 = a n1 a nn x n b n }{{} Ax = b A = A (1) 29

Solving triangular systems is easy, so we try to transform the given system into triangular form. In the first step the multiples l_i1 = a_i1^(1)/a_11^(1) of row 1 are subtracted from the rows i = 2, ..., n, turning A = A^(1) into A^(2), which has zeros below a_11^(2) in the first column; the next step uses the multipliers a_i2^(2)/a_22^(2) and produces A^(3), and so on, until A^(n) is reached: the upper triangular form with a_11, ..., a_nn on the diagonal and zeros below.
No pivoting (we assume a_kk^(k) ≠ 0 for all k); we ignore the right-hand side b.
Algorithm:
for k = 1:n-1
  for i = k+1:n
    l_ik = a_ik / a_kk
  for i = k+1:n
    for j = k+1:n
      a_ij = a_ij - l_ik a_kj
Intermediate system: after k-1 steps, A^(k) is already upper triangular in its first k-1 columns, while the trailing block A^(k)(k:n, k:n) is still full and is updated in the following steps.
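A C sketch of the (kij) elimination above, assuming row-major storage and no pivoting; the function name is illustrative. The multipliers l_ik are stored in the strict lower triangle of A, so afterwards A holds L (below the diagonal, with unit diagonal implied) and U (on and above the diagonal).

    /* Gaussian elimination without pivoting, (kij) form; A is n x n, row-major */
    void ge_kij(int n, double *A)
    {
        for (int k = 0; k < n - 1; k++) {
            for (int i = k + 1; i < n; i++)
                A[i*n + k] /= A[k*n + k];            /* l_ik = a_ik / a_kk */
            for (int i = k + 1; i < n; i++)
                for (int j = k + 1; j < n; j++)
                    A[i*n + j] -= A[i*n + k] * A[k*n + j];
        }
    }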

32 Define matrix with entries l ik from above algorithm l 21 1 L = and L k = l k+1,k l n,1 l n,n 1 1 l n,k 0 0 Each Elimination step in Gaussian Elimination can be written in the form A (k+1) = (1 L k ) A (k) = A (k) L k A (k) (3) with A (1) = A and A (n) = U = upper triangular U = A (n) = (1 l n 1 )A (n 1) = = (1 l n 1 ) (1 l 1 ) A }{{} (1) = L A L with L := (1 l n 1 ) (1 l 1 ) 1 l j lower triangular L lower triangular L 1 lower triangular A = L 1 U with L 1 lower and U upper triangular Theorem 2 L 1 = L Proof: i j 0 = l i l j = 0 0 i 0 0 j 0 0 Therefore (1 + l j )(1 l j ) = 1 + l j l j lj 2 = I and (1 l j ) 1 = 1 + l j L 1 [(1 l n 1 ) (1 l 1 )] 1 = (1 l 1 ) 1 (1 l n 1 ) 1 = (1 + l 1 )(1 + l 2 ) (1 + l n 1 ) = 1 + l 1 + l l n 1 = L because eg (1 + l 1 )(1 + l 2 ) = 1 + l 1 + l 2 + l 1 l }{{} 2 = 1 + l 1 + l 2 =0 Total: A = L U with L lower and U upper triangular 31

33 32 Vectorization of the Gaussian Elimination (kij)-form (standard) for k = 1:n-1 for i = k+1:n l i,k = a ik a kk for i = k+1:n for j = k+1:n a ij = a ij l ik a kj Vector operation α x SAXPY in row a i and a k U computed rowwise, columnwise In the following, we want to interchange the kij-loops: No GAXPY already computed unchanged no more computed L U A (n) newly computed updated in every step right looking GE Necessary condition: (ikj)-form: for i = 2:n for k = 1:i-1 l ik = a ik a kk j = k+1:n a ij = a ij l ik a kj 1 k < i n 1 k < j n GAXPY in a ii compute l i1 by SAXPY combine the 1st row and the i-th row, then compute l 12, and so on L and U are computed rowwise 32

34 already computed not used L i li1 A U already computed used newly computed unchanged (ijk)-form: for i = 2:n for j = 2:i l i,j 1 = a i,j 1 a j 1,j 1 for k = 1:j-1 a ij = a ij l ik a kj for j = i+1:n for k = 1:i-1 a ij = a ij l ik a kj (jki)-form for j = 2:n for k = j:n l k,j 1 = a k,j 1 a j 1,j 1 for k = 1:j-1 a ij = a ij l ik a kj α x DOT (upper left) DOT (upper right) GAXPY in a ij 33

35 computed j U already computed used L A unchanged not used left looking GE newly computed kij kji ikj ijk jki jik Access to AU row column row column column column Access to L column row column row Computation of U row row row row column column Computation of L column column row row column column Vector Operation SAXPY SAXPY GAXPY DOT GAXPY DOT 2 Vector Length Vector length = average of occurring vector lengths 33 Gaussian Elimination in Parallel: Blockwise GE (better in environment) (i) solve triangular system L : U = A indepently columns of U (ii) A 22 LU updating blocks (easy parallelize) (iii) small LU-decomposition l u 11 u 12 u 13 l 21 l u 22 u 23 l 31 l 32 l u 33 = = l 11 u 11 l 11 u 12 l 11 u 13 l 21 u 11 l 21 u 12 + l 22 u 22 l 21 u 13 + l 22 u 23 l 31 u 11 l 31 u 12 + l 32 u 22 A 11 A 12 A 13 A 21 A 22 A 23 A 31 A 32 A 33 Different ways of computing L and U, deping on ordering: different algorithm 34

36 331 Crout method l u 11 u 12 u 13 l 21 l u 22 u 23 l 31 l In Bold: already computed In italics: has to be computed in this step ( ) ( ) ( ) l22 u 22 l 22 u 23! A22 l = 21 u 12 A 23 l 21 u 13 Â22 Â = 23 l 32 u 22 A 32 l 31 u 12 Â 32 ( ) ( ) l22 Â22 (1) U 22 = by small LU-decomposition gives l 22, l 32, and U 22 l 32 Â 32 (2) l 22 u 23 = Â23 by solving triangular system in l 22 l 11 U 11 U 12 U 13 in total: l 21 l 22 U 22 U 23 = A l 31 l 32 l 33 U 33 Put the computed parts in the first row/column blocks of L and U Split l 33 and U 33 in new parts l 22, l 32, U 22, U 23 and repeat 332 left looking GE: U L l l 21 l 22 0 l 31 l 32 l 33 u 11 u 12 u 13 u 22 u 23 u 33 = A In Bold: already computed In italics: has to be computed in this step equations: l 11 u 12 = A 12 can be solved by triangular gives u 12 ( ) ( ) ( ) Â22 A22 l21 Compute = U Â 32 A 32 l 12 by matrix multiplication 31 ( ) ( ) l22 Â22 and U 22 = small LU-decomposition l 22, l 32, U 22 l 32 Â 32 35

37 333 Right looking / Gaussian Elimination standard l 11 u 11 u 12 u 13 l 21 l 22 u 22 u 23 = A l 31 l 32 l 33 u 33 In Bold: already computed In italics: has to be computed in this step A 11 = l 11 u 11 (small LU-decomposition) with equations: l 21 u 11 = A 21 l 21 ; l 11 u 12 = A 12 u 12 triangular solve l 22 u 22 = A 22 l 21 u 12 = Â22 by LU-decomposition of Â22 In comparison, all variants have nearly the same efficiency in parallel, flops in Matrix-Matrix-Multiplication, triangular solve and LU-decomposition 34 QR-Decomposition with Householder matrices 341 QR-decomposition Similar to LU-decomposition (numerically not stable) by Gaussian-Elimination We are interested in A = QR with Q orthogonal and R upper triangular b = Ax = QRx Rx = Q T b for solving linear system QR has advantages for ill-conditioned A Application for overdetermined systems A x! = b Ax = b has no solution best approximate solution by solving min Ax b 2 2 = min(x T A T Ax 2x T A T b + b T b) x x gradient equal zero leads to A T Ax = A T b (normal equation) A T A has a larger condition number than A Advantages of QR-decomposition: ( ) R1 A = QR, R =, cond(r 0 1 ) = cond(a) A T Ax = A T b (QR) T QRx = (QR) T b R T Rx = R T Q T b }{{} ( R1 T 0 ) ( ) R 1 x = ( R1 0 T 0 ) ˆb R T 1 R 1 x = ( R1 T 0 ) (ˆb1 ) ˆb2 R T 1 R 1 x = R T 1 ˆb 1 R 1 x = ˆb 1 ˆb 36

38 342 Householder method for QR u vector R n with length 1, u 2 = 1 H := 1 2uu T is called Householder matrix (rank-1-perturbation of identity) H is orthogonal (H T H = 1) ; H = H T : H T H = H 2 = (1 2uu T )(1 2uu T ) = 1 2uu T 2uu T + 4u u T u u T = I }{{} 1 First step: use H to transform the first column of A in upper triangular form: H 1 A = (1 2u 1 u T 1 )(a 1 ) = (a 1 2(u T 1 a 1 )u 1 )! = Hence we have to find u 1 of length 1 with a 1 2(u T! 1 a 1 )u 1 = αe 1 α H 1 is orthogonal, therefore a 1 2 = 0 : = α 0 We can set α = a 1, therefore u 1 = a 1 a 1 2 e 1 2(u T 1 a 1) = a 1 a 1 2 e 1 a 1 a 1 2 e 1 2 H 1 A 1 = (1 2u 1 u T 1 )A = Apply the same procedure on A 2 : * H 2 A 2 = (1 2u 2 u T 2 )A 2 = 0 0 a 1 * * 0 A 2 0 A 3 α 0 : 0 V 1 := u 1 (1 2u 2 u T 2 ) dimension n-1 ( ) 0 ext u 2 to vector of length n : v 2 :=, u 2 0 Hence H 2 H 1 A = (1 2v 2 v2 T )(1 2v 1 v1 T )A = 0 A

39 Total: H n 1 H 2 H }{{} 1 A = R = upper triangular Q T A = QR with Q = (H n 1 H 2 H 1 ) T = (H 1 H n 1 ) 343 Householder method in parallel Idea: Compute u 1 u k, but application of H k H 1 A in blocked form for elimination of first k columns Question: What is the structure of H k H i =: V k {}}{ A = ( A 1 A 2 ) =? QR compute u 1, H 1 = I 2u 1 u T 1, H 1 A 1 compute u 2, H 2 = I 2u 2 u T 2, H 2 (H 1 A 1 ) usw vḳ T Theorem 3 H k H i = (1 2v k vk T ) (1 2v ivi T ) = I (v k v i )T i vi T with T i upper triangular matrix Proof [ by Induction: (1 2vk vk T ) (1 2v i vi T ) ] (1 2v i 1 v }{{} i 1) T Assumption vḳ T = I (v k v i )T i (1 2v i 1 vi 1) T vi T vḳ T vk T = I 2v i 1 vi 1 T v i 1 (v k v i )T i + 2(v k v i ) T i vi 1 T vi T vi T v i 1 }{{} y vḳ ( ) T Ti -2y = I (v k v i v i 1 ) 0 2 vi T vi 1 T ( R * Computation of H k H i A as (I Y T Y T )A = A Y T (Y T A) = 0 à with y = (u 1,, u k ) and then repeat with à ) 38
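For illustration, a compact C sketch of one Householder step as derived above: it computes the vector u for the first column of an m x n block (with α = ||a_1||_2) and applies I - 2uu^T columnwise. The function name, the row-major layout with leading dimension lda, and the caller-provided workspace u are assumptions of this sketch; in practice the sign of α is chosen to avoid cancellation.

    #include <math.h>

    /* One Householder step on the leading column of the m x n block A */
    void householder_step(double *A, int m, int n, int lda, double *u)
    {
        double alpha = 0.0;
        for (int i = 0; i < m; i++) alpha += A[i*lda] * A[i*lda];
        alpha = sqrt(alpha);                     /* alpha = ||a_1||_2      */
        for (int i = 0; i < m; i++) u[i] = A[i*lda];
        u[0] -= alpha;                           /* v = a_1 - alpha*e_1    */
        double nv = 0.0;
        for (int i = 0; i < m; i++) nv += u[i] * u[i];
        nv = sqrt(nv);
        if (nv == 0.0) return;                   /* column already alpha*e_1 */
        for (int i = 0; i < m; i++) u[i] /= nv;  /* ||u||_2 = 1            */
        for (int j = 0; j < n; j++) {            /* A := A - 2 u (u^T A)   */
            double s = 0.0;
            for (int i = 0; i < m; i++) s += u[i] * A[i*lda + j];
            for (int i = 0; i < m; i++) A[i*lda + j] -= 2.0 * s * u[i];
        }
    }

After this step the first column of the block equals (α, 0, ..., 0)^T, and the same routine can be applied to the remaining (m-1) x (n-1) subblock.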

40 4 Linear Equations with sparse matrices 41 General properties of sparse matrices Full n n matrix: O(n 2 )storage O(n 3 )solution } too costly Formulate the given problem such that the resulting linear system is sparse O(n) storage O(n) solution? example: tridiagonal: most of the entries are zero Example matrix: A = ; n = 5, nnz (number of nonzero entries) = Storage in coordinate form values AA row JR column JC Superfluous information: (storage nnz floating point numbers, 2nnz+2 integer numbers) Computation of C = Ab for j = 1 : nnz(a) C JR(j) = C JR(j) + AA (j) b JC(j); }{{} a JR(j),JC(j) indirect addressing (indexing) no c and b jumping in memory (Disadvantage) Advantage: does not prefer rows or columns 39

4.1.2 Compressed Sparse Row Format: CSR

AA: the nonzero values, stored row by row (row 1, row 2, ..., row n); JA: the corresponding column indices; IA: pointers to the beginning of each row in AA/JA (IA(n+1) points behind the last entry of the last row). Storage: nnz floating point numbers; about nnz + n + 1 integer numbers (JA plus the n+1 row pointers).
c = Ab:
for i = 1 : n
  for j = IA(i) : IA(i+1) - 1
    c(i) = c(i) + AA(j) b(JA(j))
Only indirect addressing and jumps in b. Analogously: Compressed Sparse Column format.

4.1.3 Improving CSR

by extracting the main diagonal entries: AA holds first the n diagonal values, then the non-diagonal entries rowwise as in CSR; JA holds first the pointers to the beginning of each row, then the column indices of the non-diagonal entries. Storage: about 2(nnz(A) + 1).
c = Ab:
for i = 1 : n
  c(i) = AA(i) b(i)
  for j = JA(i) : JA(i+1) - 1
    c(i) = c(i) + AA(j) b(JA(j))
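A C sketch of the CSR matrix-vector product above, using 0-based indices as usual in C; AA, JA and IA follow the storage scheme just described, and the function name is an illustrative choice.

    /* c = A*b with A in CSR format: AA (nnz values), JA (nnz column
       indices), IA (n+1 row pointers), all 0-based. */
    void csr_matvec(int n, const double *AA, const int *JA, const int *IA,
                    const double *b, double *c)
    {
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int j = IA[i]; j < IA[i+1]; j++)
                s += AA[j] * b[JA[j]];     /* indirect addressing into b */
            c[i] = s;
        }
    }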

42 414 Diagonalwise storage, eg for band matrices example: only efficient for band matrices 415 rectangular, rowwise storage scheme by compressing from the right gives COEF (values) = JCOEF (columnination) = storage: n }{{} =5 * nnz of longest row of A }{{} nl=3 41

43 C = Ab ; C = 0 for i = 1 : n for j = 1 : nl C(i) = C(i) + COEF F (i, j) b(jcoef F (i, j)) ELLPACK 416 Jagged diagonal form First step: Sort rows after their length: Storage for PA in the form: DJ values: 3} 6 1{{ 9 11} 4} 7 2{{ 10 12} 5 8 first jagged diagonal second Column indices: JDIAG: IDIAG: C = Ab ; C = 0 j = 1 : NDIAG for i = 1 : length of j jagged diagonals {}}{ IDIAG(j + 1) IDIAG(j) k {}}{{}}{ C(i) = C(i) + DJ( IDIAG(j) +i 1)b(JDIAG( IDIAG(j) + i 1)) }{{} similar to SAXP Y Operations on local block data! k 42

44 42 Sparse Matrices and Graphs 421 A = A T > 0 (n n - matrix) symmetric, define Graph G(A): Knots, vertices: e 1,, e n ; edges (e i, e j ) a ij 0 0 example: A = G(A) = undirected graph: Graph G(A) has adjacency matrix A(G(A)) = has exactly the structure of A Symmetric permutation P AP T, by permuting row and columns of A in the same way: renumbering of the knots (renumbering 3 4 means, that r 3 r 4 and c 3 c 4 ) 43

45 422 A non symmetric: directed graph good sparsity pattern: Block Diagonal: 0 0 Example: A = = e e e e Graph splits into two subgraphs; use permutation that groups together edges in the same subgraph: 2 3 new G(A): P AP T = = ( A1 0 0 A 2 ) block diagonal Reduce the solution of the large given matrix to the solution of small block parts The block pattern is not disturbed in Gauss-Elimination 44

46 ( ) 1 ( ) A1 0 A = 0 A 2 0 A 1 2 a 11 a 1p 0 0 a Band Matrix: A = q a nn Gaussian Elimination without pivoting maintains this pattern cols: O(n pq) and A = LU with l u 11 u 1p l L = q1 0 0 and U = l nn 0 0 u nn with pivoting u will have a larger bandwidth Similarly: A= structure is preserved by GE 423 Dissection form preserved during GE 0 0 (no fill-in GE) 45

47 Schur Complement for Block Matrices: Reduce to smaller matrices: ( ) ( ) ( ) ( B1 B 2 B 1 1 D I B B 3 B 4 0 S 1 = 1 D + B 2 S 1! I 0 B 3 B1 1 B 3 D + B 4 S 1 = I ) Therefore B 1 D + B 2 S 1 =! 0 = D = B1 1 B 2 S 1 and B 3 D + B 4 S 1 =! I = I = B 3 B1 1 B 2 S 1 + B 4 S 1 = I = (B 4 B 3 B 1 1 B 2 )S 1 = S = B 4 B 3 B1 1 B 2 (Schur Complement) ( ) ( ) ( ) B1 B 2 I 0 B1 B = B 3 B 4 B 3 B S 1 I Instead of solving LE in B, we have to solve small systems in B 1 and S Application in Dissection form: A 1 0 F 1 0 A 2 F 2 G 1 G 2 A 3 ( ) A1 0 Schur complement relative to : 0 A 2 S = A 3 ( ) ( A G 1 G 2 0 A 1 2 = A 3 G 1 A 1 1 F 1 G 2 A 1 2 F 2 ) ( F1 F 2 ) Linear Equation in Dissection form: A 1 0 F 1 x 1 A 1 x 1 + F 1 x 3 = b 1 0 A 2 F 2 x 2 = A 2 x 2 + F 2 x 3 = b 2 G 1 G 2 A 3 x 3 G 1 x 1 + G 2 x 2 + A 3 x 3 = b 3 = x 1 = A 1 1 b 1 A 1 1 F 1 x 3 x 2 = A 1 2 b 2 A 1 2 F 2 x 3 = (G 1 A 1 1 b 1 G 1 A 1 1 F 1 x 3 ) + (G 2 A 1 2 b 2 G 2 A 1 2 F 2 x 3 ) + A 3 x 3 = b 3 = (A 3 G 1 A 1 1 F 1 G 2 A 1 2 F 2 )x 3 = b 3 G 1 A 1 1 b 1 G 2 A 1 2 b 2 Sx 3 = ˆb 3 46

48 1 Compute S by using A 1 1 and A Solve Sx 3 = b 3 3 Compute x 1 and x 2 by using A 1 1 and A 1 2 Sometimes S is full or too expensive to compute Then use iterative method for Sx 3 = ˆb 3, that uses only s*vector, which can be computed easily with F 1, F 2, G 1, G 2 and A 1 1, A Reordering 431 Smaller Bandwidth by Cuthill Mckee-Algorithm Given sparse matrix A, G(A) Define level sets: S 1 = {1} S 2 = set of new edges connected to S 1 by vertex {2, 3, 4} S 3 = set of new edges connected to S 2 by vertex {5, 6, 7} S 4 = {8, 9, 10} S 5 = {11} Starting from one chosen vertex, according distance to the start knot First edge in S 1 gets number 1 In each level set we sort and order the knots such that the first group of entries in S i are the neighbours of the first entry in S i 1, and the second group of entries in S i+1 are the neighbours of the second entry in S i, and so on 47

49 often Cuthill McKee-Algorithm ordering is reversed: Reverse Cuthill McKee 48

50 432 Dissection Reordering A, G(A), eg leads to pattern = 0 A 1 0 F 1 0 A 2 F 2 G 1 G 2 A

51 433 Algebraic pivoting: during GE (Numerical pivoting: choose largest a ij a kk as pivot element) Algebraic pivoting: choose largest a ij from sparse row/column a kk as pivot element (small fill-in during GE-step) Minimum degree re-ordering for A = A T > 0 first step: define r j = # entries in row j (non-zero) in the G(A) = # edges connected with vertex j Repeat choose i such that r i = min r j j (nearly empty now) choose a ii pivot element i 1 by symmetric permutation Do the elimination step in GE reduce the matrix by one Generalization to nonsymmetric case: Markowitz-Criterion define r j = # entries in row j (non-zero) c k = # entries in column k (non-zero) minimizes: min j,k (r j 1)(c k 1) choose a jk as pivot element Apply permutation to put a j,k in diagonal position In practise: mixtures of algebraic and numerical pivoting: include a condition, that a i,s should be not too small! 0 * * * Example: GE 0 * * 0 * 0 * * * full G(A): n-1 n 50

52 Cuthill-Mckee with starting edge {1} : S 1 = {1}, S 1 = {1,, n} given no improvement with start {2} : S 1 = {1}, S 2 = {1}, S 3 = {2,, n} Permutation such that smallest bandwidth also not very helpful Matching: set of edges, for each row/column index there is exactly one edge Matching gives a permutation of the rows nonzero diagonal entries Example: ω(π) = log a ij i,j nonzero move here for example (1,3) to (3,3) and (2,1) to (1,1) Perfect matching, maximizing ω(π) heuristic methods to get approximal solutions For symmetric matrix we need a symmetric permutation P AP T low permutation perfect matching ( ) ( ) ( 1 3 ) ( 3 2 ) : ( ) ( ) 1 3 2, 3 1,

53 bandwidth n/2 } * * * * * * 0 0 * * Minimum Degree: is very good Choose edge 1 with degree n-1 Therefore replace 1 by 2 with degree GE 0 * * * * * * 0 0 * Next pivot 3, and so on ; works in O(n) Global reordering: be permutation 1 n ; GE in O(n) * * * Change the numbering such that indices in a 2 2 permutation have subsequent numbers: Apply symmetric permutation The large entries appear in 2 2 diagonal blocks 52

54 44 Gaussian Elimination in Graph A = A T > 0 symmetric positive definite No need for numerical pivoting Example G(A): Choose as pivot * * * * 7 * * * * Fill in: pattern of row J is added to non zero entries in column 7 = indices connected with row 7 give a dense submatrix here the submatrix to row/column 3, 6, 8 and 11 gets dense leads to fill in 53

55 New graph: one step GE in the graph consists in - remove edge 7 - add vertices such that all neighbours of 7 get fully connected Definition: A fully connected graph is called clique, eg or or or or In each elimination step the pivot knot is removed (pivot row and column are removed) and a subclique in the graph is generated Connecting all neighbours of the pivot entry Next step in GE: with pivot 6: neighbours: 2, 3, 5, 8, 10,

56 Gaussian elimination can be modelled without numerical computations only by computing the graphs algebraically Advantages: - algebraic prestep is cheap - gives information on the data structure (pattern) of resulting matrices - shows whether Gaussian Elimination makes sense - formulation in cliques, because in the course of GE, there will appear more and more cliques: cliques give short discretization of the graphs 45 Different direct solvers Frontal methods for band matrices b b b 0 frontal matrix of size ( b+1)x(2 b+1) treated as dense matrix 0 - Apply first GE step with column pivoting in dense frontal matrix - compute next row/column and move frontal matrix one step right+down Multifrontal method for general sparse matrices 0 example: A = d 11 first pivot element is related to first frontal matrix, that contain all numbers related to one step GE with a 11 : a i1 a 1j a 11 : in dense submatrix: a 11 a 13 a 14 a a 31 a 13 a 31 a a 11 a 11 a a 41 a 13 a 41 a a 11 a 11 55

57 Because a 12 = 0, wee can in parallel consider a 22 and the frontal matrix, related to the one step GE with a 22 : a 22 a 23 a 24 a a 32 a 23 a 32 a a 22 a 22 a a 42 a 23 a 42 a a 22 a 22 The computations a ij a ij a i1a 1j a 11 and a ij a ij a i2a 2j a 22 are indepent and can be done in parallel 56

58 5 Iterative methods for sparse matrices X 0 initial guess (eg X 0 = 0) Iteration function φ : x k+1 = φ(x k ) gives sequence x 0, x 1, x 2, x 3, x 4, k should converge x k x = A 1 b (fast convergence) Advantage: computation of φ(x) needs only matrix-vector products Do not change the pattern It is easy to parallelize Big question: fast convergence? 51 stationary methods 511 Richardson Iteration for Solving Ax = b : x := A 1 b b = (A I + I)x = (A I)x + x = x = b + (I A)x = b + Nx Fix point iteration x = φ(x) with φ(x) = b + Nx x 0 start x k+1 = φ(x k ) = b + Nx k if x k convergent, x k x, then x = b + N x A x = b ˆx = x other formulation: φ(x) = b + x Ax = x + (b Ax) = x + r(x) r - residual Convergence analysis via Neumann Series x k = b + Nx k 1 = b + N(b + Nx k 2 ) = b + Nb + Nx k 2 = b + Nb + N 2 b + Nx k 3 = = b + Nb + + N k 1 b + N k x 0 = ( k 1 i=0 N i )b + N k x 0 Special case: x 0 = 0 : x k = ( k 1 j=0 N j )b x k span(b, Nb, N 2 b,, N k 1 b) = span(b, Ab, A 2 b,, A k 1 b) = K k (A, b) = Krylov-row of dimension k to matrix A and vector b, assume N < 1: then k 1 j=0 N j convergence j=0 N j = (I N) 1 = A 1 ( q j = 1 ) 1 q j=0 x k ( N j )b = (I N) 1 b = (I (I A)) 1 b = A 1 b = x j=0 Richardson gives convergent sequence if A=I Error: e k := x k ˆx e k+1 = x k+1 ˆx = (b + Nx k ) (b + N ˆx) = N(x }{{}}{{} k ˆx) = Ne k φ(x k ) φ(ˆx) 57

59 e k N e k 1 N 2 e k 2 N k e 0 N < 1 N k k 0 e k k 0 ; ρ(n) = ρ(i A) < 1 largest absolute value of an eigenvalue < 1 define a norm with A < 1 Eigenvalues of A have to be in a circle around 1 with radius Better splitting of A A := M N Modifications of Richardson to get better convergence b = Ax = (M N)x = Mx Nx x = M 1 b + M 1 Nx new φ(x) = M 1 b + M 1 Nx = M 1 b + M 1 (M A)x = M 1 (b Ax) + x = x + M 1 r(x) M should be simple (easy to solve) x k+1 = M 1 b + M 1 Nx k = x k + M 1 (b Ax k ) = x k + M 1 r k is equivalent to Richardson applied on M 1 Ax = M 1 b Therefore convergent for ρ(m 1 N) = ρ(i M 1 A) < 1 M is also called a precondition, because M 1 A should be better conditioned than A itself: M 1 A I 513 Jacobi (Diagonal) - Splitting: A = M N = D (L + U) with L: lower triangular, U: upper triangular, D: diagonal part of A -U = -L D x k+1 = D 1 b + D 1 (L + U)x k = D 1 b + D 1 (D A)x k = x k + D 1 r k convergent if ρ(m 1 N) = ρ(i D 1 A) < 1 58

Elementwise:
x_j^(k+1) = (1/a_jj) (b_j - Σ_{m=1, m≠j}^n a_jm x_m^(k))
or
a_jj x_j^(k+1) = b_j - Σ_{m=1}^{j-1} a_jm x_m^(k) - Σ_{m=j+1}^n a_jm x_m^(k).
To improve convergence: x_{k+1} = x_k + D^{-1} r_k, with D^{-1} r_k as correction step; include damping with step length ω: damped Jacobi
x_{k+1} = x_k + ω D^{-1} r_k = x_k + ω D^{-1}(b - A x_k) = (I - ω D^{-1} A) x_k + ω D^{-1} b
        = (I - ω D^{-1}(D - L - U)) x_k + ω D^{-1} b = ω D^{-1} b + [(1-ω) I + ω D^{-1}(L + U)] x_k,
convergent if ρ((1-ω) I + ω D^{-1}(L + U)) < 1 (the iteration matrix tends to I for ω → 0). For ω = 1: plain Jacobi; otherwise look for the optimal ω.
The Jacobi method is easy to parallelize: only A times vector and D^{-1} times vector are needed. To improve convergence further: Block Jacobi, where D is taken as the block diagonal part of the splitting A = D - L - U.
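One damped Jacobi sweep, sketched in C for a dense matrix for clarity; ω = 1 gives the plain Jacobi step, and every component of x_new can be computed in parallel. Function and variable names are illustrative.

    /* x_new = x + omega * D^{-1} (b - A x), A dense n x n, row-major */
    void jacobi_sweep(int n, const double *A, const double *b,
                      const double *x, double *x_new, double omega)
    {
        for (int j = 0; j < n; j++) {
            double r = b[j];
            for (int m = 0; m < n; m++)
                r -= A[j*n + m] * x[m];          /* residual component r_j */
            x_new[j] = x[j] + omega * r / A[j*n + j];
        }
    }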

5.1.4 Gauss-Seidel method (improving convergence)

a_jj x_j^(k+1) = b_j - Σ_{m=1}^{j-1} a_jm x_m^(k+1) - Σ_{m=j+1}^n a_jm x_m^(k),   j = 1, 2, ..., n.
Try to use the newest available information in each step. Advantage: fast convergence.
In matrix form: D x_{k+1} = b + L x_{k+1} + U x_k, or (D - L) x_{k+1} = b + U x_k, which corresponds to the splitting A = M - N with M = D - L, N = U: the Gauss-Seidel method. In each step we have to solve a triangular linear system: a disadvantage in parallel!
Data dependency graphs for iteration methods → reorder A → colouring of the graph, e.g. red-black: a compromise between convergence and parallelism. Convergence depends on ρ(I - (D - L)^{-1} A) < 1, i.e. Richardson applied to (D - L)^{-1} A x = (D - L)^{-1} b. Damping: x_{k+1} = x_k + ω (D - L)^{-1} r_k.
In general, stationary methods can be written in the form x_{k+1} = c + B x_k with a constant vector c and iteration matrix B, or x_{k+1} = x_k + F r_k with a preconditioner F; ρ(B) < 1 gives convergence, and B = I - F A.

5.2 Nonstationary Methods

5.2.1 A symmetric positive definite: A = A^T > 0 (spd)

Consider the function φ(x) = (1/2) x^T A x - b^T x. Derivative: ∇φ(x) = Ax - b (gradient); the graph of φ is a paraboloid. The minimum of φ is unique, with ∇φ(x̄) = A x̄ - b = 0, i.e. A x̄ = b. Compute x̄, the solution of Ax = b, by approximating the minimum of φ iteratively.

62 x k last iterate: find x k+1 = x k + λv with search direction v and stepsize λ, such that φ(x k+1 ) < φ(x k ) and hence x k+1 is nearer to minimum Search direction v: d φ (x dλ k + λv) λ=0 = φ(x k )v (directional derivative) <! 0 Optimal search direction v = φ(x k ) = b Ax k = r K x k+1 = x k + λr k stepsize λ : finding min λ φ(x k + λr K ) is a simple 1D-problem d φ(x dλ k + λr k ) = d ( 1 dλ 2 (xt k + λrt k )A(x k + λr k ) b T (x k + λr k )) = d dλ ( 1 2 xt k Ax k + λr T k Ax k + λ2 2 rt k Ar k b T x k λb T r k ) = r T k Ax k b T r k + λr T k Ar k = r T k r k + λr T k Ar k! = 0 λ = rt k r k r T k Ar k v k = r k Algorithm: x k+1 = x k + rt k r k r T k Ar k r k with r k = b Ax k Gradient Method, steepest decent locally optimal search directions are not globally optimal, if paraboloid is very distorted very small and large eigenvalues if condition(a) = A 2 A 1 2 = λmax λ min >> 1 cond(a) >> 1 guaranteed convergence, but mostly slow! To analyse this slow convergence also theoretically, we introduce the following norm (so-called A-norm) x A := x T Ax Then it holds for the error x x with x = A 1 b 61

63 x x 2 A = x A 1 b 2 A = (x A 1 b) T A(x A 1 b) = = x T Ax 2b T x + b T A 1 b = 2φ(x) + b T A 1 b Hence minimizing φ is equivalent to minimizing the error in the A-norm with x j+1 := x j + λ j r j, r j = b Ax j and λ j = rt j r j r T j Ar j we get the following inequality between φ(x j+1 ) and φ(x j ): φ(x j+1 ) = 1 2 (xt j + λ j r T j )A(x j + λ j r j ) (x T j + λ j r T j )b = 1 2 xt j Ax j + λ j x T j Ar j + λ2 j 2 rt j Ar j x T j b λ j r T j b = φ(x j ) + λ j r T j (Ax j b) + λ2 j 2 rt j Ar j (r T j r j) 2 = φ(x j ) rt j r j rj T Ar rt j j r j + 1 r T 2 (rj T Ar j) 2 j Ar j = φ(x j ) 1 2 = φ(x j ) 1 2 (r T j r j) 2 (r T j Ar j) (r T j r j) 2 rj T Ar j rj T A 1 rj T A 1 r j r j }{{} ρ j = φ(x j ) 1 2 ρ j(b Ax j ) T A 1 (b Ax j ) = φ(x j ) ρ j 2 (bt A 1 b + x T j Ax j 2b T x j ) = φ(x j ) ρ j (φ(x j ) bt A 1 b) φ(x j+1 ) bt A 1 b = φ(x j ) bt A 1 b ρ j (φ(x j ) bt A 1 b) x j+1 x 2 A = x j x 2 A (1 ρ j) error in next step ρ j = (rj T r j) 2 1 rj T Ar jrj T A 1 r j λ max 1/λ min }{{} λmax(a 1 ) = 1 cond(a) (range(a) = { rt j Ar j r T j r j r j A} [λ min (A), λ max (A)]) x j+1 x 2 A = (1 1 cond(a) ) x j x 2 A Is therefore cond(a) >> 1, then the improvement in every iteration step is nearly nothing Therefore we have very slow convergence! 62

5.2.2 Improving the gradient method: conjugate gradients

Ansatz: x_{k+1} = x_k + α_k p_k (α_k step size, p_k search direction). As search direction we do not use the gradient itself, but a modification of the gradient: choose the new search direction such that p_k is A-conjugate to the previous directions p_j, i.e. p_k^T A p_j = 0. We choose the new search direction as the projection of the gradient onto the A-conjugate subspace relative to the previous p_k; α_k is derived by 1-dimensional minimization as before.
Algorithm (conjugate gradient method):
x_0 = 0, r_0 = b - A x_0
for k = 1, 2, ...:
  β_{k-1} = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}   (β_0 = 0)
  p_k = r_{k-1} + β_{k-1} p_{k-1}   (p_k A-conjugate to p_{k-1}, p_{k-2}, ...)
  α_k = r_{k-1}^T r_{k-1} / p_k^T A p_k
  x_k = x_{k-1} + α_k p_k   (1-dimensional minimization)
  r_k = r_{k-1} - α_k A p_k
  if ||r_k|| < ε: stop
Main properties of the computed vectors:
p_j^T A p_k = 0 = r_j^T r_k for j ≠ k,
span(p_1, ..., p_j) = span(r_0, ..., r_{j-1}) = span(r_0, A r_0, ..., A^{j-1} r_0) = K_j(A, r_0) (Krylov subspaces),
especially for x_0 = 0: span(b, Ab, ..., A^{j-1} b) = K_j(A, b).
Main property: x_k is the best approximate solution in the subspace K_k(A, b); for x_0 = 0, x_k ∈ span(b, Ab, ..., A^{k-1} b) and
||x_k - x̄||_A = min_{x ∈ K_k(A,b)} ||x - x̄||_A   with x̄ = A^{-1} b.
Choosing these special search directions, the 1D minimization gives the best solution relative to a k-dimensional subspace: in each step the optimal solution in larger and larger subspaces! Consequence: after n steps K_n(A, b) = R^n, hence x_n = x̄ in exact arithmetic, or min_{x_k ∈ K_n} ||x_k - x̄||_A = 0. Unfortunately, this is only true in exact arithmetic; also, convergence only after n steps would not be good enough.
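A compact C sketch of the cg algorithm above, with x_0 = 0 and a dense matrix-vector product for simplicity (for sparse A one would plug in, e.g., the CSR product from Section 4.1.2); the helper names matvec, dot and cg are illustrative.

    #include <math.h>
    #include <stdlib.h>
    #include <string.h>

    static void matvec(int n, const double *A, const double *x, double *y)
    {
        for (int i = 0; i < n; i++) {
            y[i] = 0.0;
            for (int j = 0; j < n; j++) y[i] += A[i*n + j] * x[j];
        }
    }

    static double dot(int n, const double *x, const double *y)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += x[i] * y[i];
        return s;
    }

    /* Conjugate gradients for A = A^T > 0, starting from x_0 = 0 */
    void cg(int n, const double *A, const double *b, double *x,
            double eps, int maxit)
    {
        double *r = malloc(n * sizeof *r), *p = malloc(n * sizeof *p);
        double *Ap = malloc(n * sizeof *Ap);
        memset(x, 0, n * sizeof *x);                /* x_0 = 0              */
        memcpy(r, b, n * sizeof *r);                /* r_0 = b - A x_0 = b  */
        memcpy(p, r, n * sizeof *p);                /* p_1 = r_0            */
        double rho = dot(n, r, r);
        for (int k = 1; k <= maxit && sqrt(rho) >= eps; k++) {
            matvec(n, A, p, Ap);
            double alpha = rho / dot(n, p, Ap);     /* 1D minimization      */
            for (int i = 0; i < n; i++) x[i] += alpha * p[i];
            for (int i = 0; i < n; i++) r[i] -= alpha * Ap[i];
            double rho_new = dot(n, r, r);
            double beta = rho_new / rho;            /* beta_k               */
            for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
            rho = rho_new;
        }
        free(r); free(p); free(Ap);
    }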

65 error estimation for x 0 = 0 : e k A = x k x k = min x K k (a,b) x x A = min α j k 1 j=0 α j (A j b) x A = min p k 1 (A)b x A = min p k 1 (A)A x x A p k 1 (x) p k 1 (x) = min q k (A) ( x x 0 ) q k (0)=1 }{{} A = e k A e 0 a spd A = UΛU T (Λ: diagonal matrix/eigenvalues, in U: eigenvectors) we can write: e 0 = n ξ j u j (u 1,, u n are ONB of eigenvectors of A) j=1 { e }} 0 { n e k A = min q k (0)=1 q k (A) ξ j u j A j=1 = min n ξ j q k (A)u j A q k (0)=1 j=1 = min q k (0)=1 n ξ j q k (λ j )u j A j=1 min [ q k (0)=1 n q k(λ j ) n j=1 max n j=1 = min [ max q k(λ j ) ] e 0 A q k (0)=1 j=1 ξ j u j ] by choosing any polynomial with q k (0) = 1 and degree k, we can derive estimates for the error e k eg: q k (x) := (1 2 λ max+λ min x) K leads to: e k A max n k(λ j ) e 0 A = max n 2 λ j K e 0 j=1 j=1 λ max + λ min = 2λ max (1 ) k e 0 A λ max + λ min = ( λ min λ max λ max + λ min ) k e 0 A = ( cond(a) 1 cond(a) + 1 )k e 0 A Better estimates by normalized Chebychev polynomials: T n (x) = cos(n arccos(x)) ( ) k 1 e k A 2 cond(a) 1 T k ( cond(a)+1 cond(a) 1 ) cond(a)+1 64

66 eg assume that A has only two eigenvalues λ 1 and λ 2 set q 2 (x) := (λ 1 x)(λ 2 x) λ 1 λ 2 q 2 (0) = 1 e 2 A max j=1,2 q 2(λ j ) e 0 A = 0 convergence of cg-method after 2 steps! Similar behaviour for eigenvalue clusters After 2 steps: small error 523 GMRES for General Matrix A, not spd Consider small subspaces U m and determine optimal approximate solutions in these subspaces for Ax = b in U m so we restrict x to the form x = U m y: min x U m Ax b 2 = min y A(U m y) b 2 could be solved by normal equations U T ma T AU m y = U T ma T b What subspace U m should we choose? (relative to Ax=b): U m := U m (A, b) = span(b, Ab,, A m 1 b) (bad basis for U m ) First step: provide Orthonormal basis for U m (A, b) : u 1 := b/ b 2 for j = 2 : m ũ j := Au j 1 j 1 (u T k Au j 1 ) u k=1 }{{} k ũ j u 1,, u j 1 h k,j 1 u j := ũ j / ũ j }{{} 2 h j,j 1 j 1 j 1 Au j 1 = (u T k Au j 1 )u k + ũ j = h k,j 1 u k + h j,j 1 u j = k=1 k=1 j h k,j 1 u k k=1 AU m = A(u 1,, u m ) = (u 1,, u m+1 ) H m+1,m = Ũm H m+1,m with H m+1,m = h 11 h 1m h 21 0 hm,m 0 0 h m+1,m Upper m+1 m Hessenberg form 65

67 Now we can solve the minimization problem: min Ax b 2 = min A(U m y) b 2 x U m y = min y U m H(m+1,m) y b u 1 2 = min y U m ( H (m+1,m) y b e 1 ) 2 = min y H (m+1,m) y b e 1 2 because U m is part of an orthogonal matrix (invariant) We can use Givens rotation to compute a QR-decomposition of the upper Hessenberg matrix H (m+1,m) 0 G 1, G 2 0,, G m 0 ( ) gives Q H (m+1m) = G m G 2 G 1 H(m+1m) = R = Rm = min Ax b 2 = min x U m = min y = min y = min y H (m+1,m) y b e 1 2 y ( Rm y 0 ) y b G m G }{{} 1 e 1 2 bm ( ) Rm y 0 b 2 m ( Rm y b ) 1 b 2 2 Solution: GMRES: R m y = b 1 Y - Compute H (m+1,m) by Arnoldi-orthogonalization - compute QR-factorization - solve least squares problem x k 66

68 Iterative: enlarge U m to A m U m new column in H (m+1,m) new Givens matrix update QR new column in R solve enlarged LS by updating x k Gets very costly after 50 steps Restarted version: GMRES(20) x A( x x) = b Ax = b A x = r Call GMRES(20) for Ax = r r m 2 := Ax m b 2 = min Ax b 2 = min x U m = min p m 1 Ap m 1 (A)b b 2 = V 2 V 1 2 b }{{} 2 [ max j=1 r j 2 n x j α j (A j b)] b 2 m 1 A[ j=0 min q m (A)b 2 = q m(0)=1 min q m(0)=1 524 Convergence of cg or GMRES min V q m (1)V 1 b 2 q m(0)=1 q m (λ j ) ] = cond V r j 2 min max q m(λ j ) q m(0)=1 j=1 }{{} like cg Convergence of cg/gmres deps strongly on the position of eigenvalues A = 1 0 n X X X X X X X X X X X X GMRES needs n steps! Preconditioning: Improve the eigenvalue location of A: P 1 Ax = P 1 b (implicit) (P A) replace the given Ax = b by or MAx = Mb (explicit) (M A 1 ) (P1 1 AP2 1 )(P 2 x) = P1 1 b à x = b 67

69 symmetric: (P1 1 AP1 T )(P1 T x) = (P 1 b) (Ã should have clustered eigenvalues) stationary methods: A = M N ; b = Ax = (M N)x = Mx Nx x k+1 = M 1 b + M 1 Nx k = M 1 b + (I M 1 A)x k convergent iff I M 1 A < 1 eigenvalues of M 1 A are clustered near 1 Good splitting good precondition Improve stationary methods by using the related splitting as preconditions in cg or GMRES: (i) Jacobi-splitting with D = diag(a) Jacobi preconditioner M := D (ii) Gauss-Seidel splitting M := L + D (iii) ILU = incomplete LU decomposition: Apply GE-algorithm, but reduced cg on the pattern of the sparse matrix A 0 A = to L =, U = 0 A = LU + R Modification: ILU(0) related pattern of A ; ILU(1) related to L(0)U(0) ILUT: Treshhold ILU: Apply standard GE, but in each step sparsification by deleting all entries less or equal MILU: Modified ILU: = Apply GE with sparsification Move all deleted entries to the diagonal IC (incomplete cholesky - symmetric form of ILU) implicite preconditioners disadvantage: in each step we have to solve sparse triangular system ILU L,U hard to parallelize Idea: explicit preconditioner M A 1 to minimize AM I? choose Frobenius norm B 2 F = n (B j ) 2 2 = n (B i ) 2 2 j=1 i=1 68

70 choose matrix class polynomial preconditioner: A 1 (A n +γ n 1 A n 1 + +γ 1 A+γ 0 ) 0 γ 0 A 1 = A n 1 γ n 1 A n 2 γ 1 min I p m (A)A : min p m P m max 1 p m (λ)λ λ 1,λ m Assume that the eigenvalues in interval: 0 < c λ d < for min max 1 p m (λ)λ solution: p m (x) = T m+1( d c) T d+c m+1( d+c 2x p m P m λ [c,d] xt( d+c (transformation from oscillation [ 1, 1] to [c, d]) cg, GMRES are optimal in Krylov spaces (b, Ab, A 2 b, ) p m (A)b is easy to parallelize, but not optimal choose M: sparse matrices, same sparcity as A min AM I 2 F = min n (AM I)e j 2 2 = n M P (A) M P (A) j=1 min j=1 M P (A) d c) AM j e }{{} j vectors n indepent minimization problems for computing M 1, M 2,, M n d c ) 2 2 A( :, I j )M(I j ) e j (I j are the non-zero entry indices of M j ) J j := indices of non-zero rows of A( :, I j ) min A(J j, I j )M(I j ) e j (J j ) Least squares problem can be solved by QR-method, Givens or Householder Solve n indepent small LS problems, to get M To apply this preconditioner in cg or GMRES: we only have to multiply sparse M times vector 69

71 6 Collection remaining problems 61 Domain Decomposition Methods for Solving PDE G W region Ω with boundary Γ Given PDE, eg Laplace equation: u = u xx + u yy = δ2 u + δ2 u! = f(x, y) δx 2 δy 2 in Ω and u Γ = q Dirichlet problem How to parallelize? W 1 and G ~ ~ G 1 2 W 2 overlapping W Γ 1 boundary of Ω 1 with Γ 1 unknown values Γ 2 boundary of Ω 2 with Γ 2 unknown values Idea: Solve PDE Ω 1 with some Solve PDE Ω 2 with some estimated boundary values for Γ 1 some estimated boundary values for Γ 2 Exchange boundary values on Γ 1 and Γ 2 overlapping Domain Decomposition 70

72 Second approach: Nonoverlapping DD Dissection method: W 2 W 1 ~ G A 1 0 F 1 0 A 2 F 2 G 1 G 2 A 3 u 1 û 2 û 3 = f 1 f 2 f 3 Solve by Schur complement or preconditioner (S = A 3 G 1 A 1 1 F 1 G 2 A 1 2 F 2 ): A 1 1 M = A 1 2 to solving PDE in Ω 1 and Ω Parallel Computation of the Discrete Fourier Transformation Definition: ω n = e 2πi n ω n ω n (n 1) Y = 1 ω n (n 1) ω n (n 1)(n 1) ; y = DF T (x) ; y k = n 1 x 0 x n 1 ω kj j=0 n x j (k = 0, 1,, n 1) f 1 x f 2 x = f n x collection of n indepent dot-products For a dot-product we can use fan-in algorithm: n-processors O(logn) steps In total: n n processors O(logn) steps for DFT Complexity in parallel for DFT is O(logn) Sequentially the complexity of the DFT is O(n logn) (FFT-method) What is a good sparsity pattern for M? ( ) 1 ( ) A B = B A A priori patterns for M: A, A 2, A 3, triangular ; A T, A 2T, A 3T, orthogonal, (A T A), (A T A) 2 Sparsification: Delete small entries in A, (ɛ), A ɛ, A 2 ɛ, A T ɛ, 71

A priori: a static pattern for computing M. Dynamic minimization, which finds a good pattern automatically: start with the diagonal pattern or with A_ε, for example, and solve the n related LS problems, giving M_j. How to find a better pattern?
min_{µ_k} ||A(M_j + µ_k e_k) - e_j||_2^2 = min_{µ_k} ||(A M_j - e_j) + µ_k A e_k||_2^2 = min_{µ_k} ||r_j + µ_k A e_k||_2^2
with r_j := A M_j - e_j; setting d/dµ_k ||r_j + µ_k A e_k||_2^2 = 0 gives µ_k = - r_j^T A e_k / ||A e_k||_2^2 (in most cases µ_k = 0).
Improvement: ||r_j + µ_k A e_k||_2^2 = ||r_j||_2^2 - (r_j^T A e_k)^2 / ||A e_k||_2^2.
Factorized Sparse Approximate Inverses: for spd A = A^T > 0, A = L_A^T L_A (Cholesky factorization); approximate A^{-1} ≈ L L^T by min_L ||L_A L - I||_F over a prescribed sparsity structure of L, columnwise min ||L_A(I, J) L_K(J) - e_K(I)||. Normal equations:
L_A^T(I, J) L_A(I, J) L_K(J) = L_A^T(I, J) e_K(I), i.e. A(J, J) L_K(J) = L_{A,KK} e_K(I) (diagonal scaling).
First we compute L under the assumption L_{A,KK} = 1 (diagonal entries); then set D = L^T A L and replace L by L D^{-1/2}.
Recursive inverse DFT (ω = e^{2πi/n}):
function (v_0, ..., v_{n-1}) = IDFT(c_0, ..., c_{n-1}, n)
  if n == 1 then v_0 = c_0;
  else
    m = n/2;
    z1 = IDFT(c_0, c_2, c_4, ..., c_{n-2}, m)
    z2 = IDFT(c_1, c_3, c_5, ..., c_{n-1}, m)
    for j = 0, ..., m-1
      v_j = z1_j + ω^j z2_j;
      v_{m+j} = z1_j - ω^j z2_j;
    end for
  end if
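A runnable C sketch of the recursive radix-2 IDFT above, assuming n is a power of two and using C99 complex arithmetic; the function name idft and the use of variable-length arrays for the temporaries are choices of this sketch (for large n one would allocate them on the heap).

    #include <complex.h>
    #include <math.h>

    /* v = IDFT(c) of length n, n a power of two, following the recursion above */
    void idft(const double complex *c, double complex *v, int n)
    {
        if (n == 1) { v[0] = c[0]; return; }

        int m = n / 2;
        double complex ce[m], co[m], z1[m], z2[m];   /* small n assumed */
        for (int j = 0; j < m; j++) {
            ce[j] = c[2*j];                          /* even coefficients */
            co[j] = c[2*j + 1];                      /* odd coefficients  */
        }
        idft(ce, z1, m);
        idft(co, z2, m);

        const double PI = 3.14159265358979323846;
        double complex omega = cexp(2.0 * PI * I / n);   /* e^{2 pi i / n} */
        double complex w = 1.0;
        for (int j = 0; j < m; j++) {
            v[j]     = z1[j] + w * z2[j];
            v[m + j] = z1[j] - w * z2[j];
            w *= omega;                              /* w = omega^j */
        }
    }

The two recursive calls are independent and can be computed in parallel, which is the basis of the parallel FFT with O(log n) depth discussed above.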


More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Lab 1: Iterative Methods for Solving Linear Systems

Lab 1: Iterative Methods for Solving Linear Systems Lab 1: Iterative Methods for Solving Linear Systems January 22, 2017 Introduction Many real world applications require the solution to very large and sparse linear systems where direct methods such as

More information

OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative methods ffl Krylov subspace methods ffl Preconditioning techniques: Iterative methods ILU

OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative methods ffl Krylov subspace methods ffl Preconditioning techniques: Iterative methods ILU Preconditioning Techniques for Solving Large Sparse Linear Systems Arnold Reusken Institut für Geometrie und Praktische Mathematik RWTH-Aachen OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative

More information

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1 Scientific Computing WS 2018/2019 Lecture 9 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de Lecture 9 Slide 1 Lecture 9 Slide 2 Simple iteration with preconditioning Idea: Aû = b iterative scheme û = û

More information

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008

More information

Solving linear systems (6 lectures)

Solving linear systems (6 lectures) Chapter 2 Solving linear systems (6 lectures) 2.1 Solving linear systems: LU factorization (1 lectures) Reference: [Trefethen, Bau III] Lecture 20, 21 How do you solve Ax = b? (2.1.1) In numerical linear

More information

Lecture 8: Fast Linear Solvers (Part 7)

Lecture 8: Fast Linear Solvers (Part 7) Lecture 8: Fast Linear Solvers (Part 7) 1 Modified Gram-Schmidt Process with Reorthogonalization Test Reorthogonalization If Av k 2 + δ v k+1 2 = Av k 2 to working precision. δ = 10 3 2 Householder Arnoldi

More information

LINEAR SYSTEMS (11) Intensive Computation

LINEAR SYSTEMS (11) Intensive Computation LINEAR SYSTEMS () Intensive Computation 27-8 prof. Annalisa Massini Viviana Arrigoni EXACT METHODS:. GAUSSIAN ELIMINATION. 2. CHOLESKY DECOMPOSITION. ITERATIVE METHODS:. JACOBI. 2. GAUSS-SEIDEL 2 CHOLESKY

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Solving Ax = b, an overview. Program

Solving Ax = b, an overview. Program Numerical Linear Algebra Improving iterative solvers: preconditioning, deflation, numerical software and parallelisation Gerard Sleijpen and Martin van Gijzen November 29, 27 Solving Ax = b, an overview

More information

Iterative methods for Linear System of Equations. Joint Advanced Student School (JASS-2009)

Iterative methods for Linear System of Equations. Joint Advanced Student School (JASS-2009) Iterative methods for Linear System of Equations Joint Advanced Student School (JASS-2009) Course #2: Numerical Simulation - from Models to Software Introduction In numerical simulation, Partial Differential

More information

Linear Algebra. Brigitte Bidégaray-Fesquet. MSIAM, September Univ. Grenoble Alpes, Laboratoire Jean Kuntzmann, Grenoble.

Linear Algebra. Brigitte Bidégaray-Fesquet. MSIAM, September Univ. Grenoble Alpes, Laboratoire Jean Kuntzmann, Grenoble. Brigitte Bidégaray-Fesquet Univ. Grenoble Alpes, Laboratoire Jean Kuntzmann, Grenoble MSIAM, 23 24 September 215 Overview 1 Elementary operations Gram Schmidt orthonormalization Matrix norm Conditioning

More information

9.1 Preconditioned Krylov Subspace Methods

9.1 Preconditioned Krylov Subspace Methods Chapter 9 PRECONDITIONING 9.1 Preconditioned Krylov Subspace Methods 9.2 Preconditioned Conjugate Gradient 9.3 Preconditioned Generalized Minimal Residual 9.4 Relaxation Method Preconditioners 9.5 Incomplete

More information

Math 577 Assignment 7

Math 577 Assignment 7 Math 577 Assignment 7 Thanks for Yu Cao 1. Solution. The linear system being solved is Ax = 0, where A is a (n 1 (n 1 matrix such that 2 1 1 2 1 A =......... 1 2 1 1 2 and x = (U 1, U 2,, U n 1. By the

More information

Iterative methods for Linear System

Iterative methods for Linear System Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and

More information

Lecture 18 Classical Iterative Methods

Lecture 18 Classical Iterative Methods Lecture 18 Classical Iterative Methods MIT 18.335J / 6.337J Introduction to Numerical Methods Per-Olof Persson November 14, 2006 1 Iterative Methods for Linear Systems Direct methods for solving Ax = b,

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725 Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 2 Systems of Linear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University Lecture Note 7: Iterative methods for solving linear systems Xiaoqun Zhang Shanghai Jiao Tong University Last updated: December 24, 2014 1.1 Review on linear algebra Norms of vectors and matrices vector

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i )

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i ) Direct Methods for Linear Systems Chapter Direct Methods for Solving Linear Systems Per-Olof Persson persson@berkeleyedu Department of Mathematics University of California, Berkeley Math 18A Numerical

More information

Notes on PCG for Sparse Linear Systems

Notes on PCG for Sparse Linear Systems Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/

More information

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294)

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294) Conjugate gradient method Descent method Hestenes, Stiefel 1952 For A N N SPD In exact arithmetic, solves in N steps In real arithmetic No guaranteed stopping Often converges in many fewer than N steps

More information

G1110 & 852G1 Numerical Linear Algebra

G1110 & 852G1 Numerical Linear Algebra The University of Sussex Department of Mathematics G & 85G Numerical Linear Algebra Lecture Notes Autumn Term Kerstin Hesse (w aw S w a w w (w aw H(wa = (w aw + w Figure : Geometric explanation of the

More information

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems Topics The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems What about non-spd systems? Methods requiring small history Methods requiring large history Summary of solvers 1 / 52 Conjugate

More information

Math 471 (Numerical methods) Chapter 3 (second half). System of equations

Math 471 (Numerical methods) Chapter 3 (second half). System of equations Math 47 (Numerical methods) Chapter 3 (second half). System of equations Overlap 3.5 3.8 of Bradie 3.5 LU factorization w/o pivoting. Motivation: ( ) A I Gaussian Elimination (U L ) where U is upper triangular

More information

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 2. Systems of Linear Equations

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 2. Systems of Linear Equations Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T. Heath Chapter 2 Systems of Linear Equations Copyright c 2001. Reproduction permitted only for noncommercial,

More information

FEM and sparse linear system solving

FEM and sparse linear system solving FEM & sparse linear system solving, Lecture 9, Nov 19, 2017 1/36 Lecture 9, Nov 17, 2017: Krylov space methods http://people.inf.ethz.ch/arbenz/fem17 Peter Arbenz Computer Science Department, ETH Zürich

More information

AM205: Assignment 2. i=1

AM205: Assignment 2. i=1 AM05: Assignment Question 1 [10 points] (a) [4 points] For p 1, the p-norm for a vector x R n is defined as: ( n ) 1/p x p x i p ( ) i=1 This definition is in fact meaningful for p < 1 as well, although

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A. AMSC/CMSC 661 Scientific Computing II Spring 2005 Solution of Sparse Linear Systems Part 2: Iterative methods Dianne P. O Leary c 2005 Solving Sparse Linear Systems: Iterative methods The plan: Iterative

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 0

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 0 CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 0 GENE H GOLUB 1 What is Numerical Analysis? In the 1973 edition of the Webster s New Collegiate Dictionary, numerical analysis is defined to be the

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Decompositions, numerical aspects Gerard Sleijpen and Martin van Gijzen September 27, 2017 1 Delft University of Technology Program Lecture 2 LU-decomposition Basic algorithm Cost

More information

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects Numerical Linear Algebra Decompositions, numerical aspects Program Lecture 2 LU-decomposition Basic algorithm Cost Stability Pivoting Cholesky decomposition Sparse matrices and reorderings Gerard Sleijpen

More information

Stabilization and Acceleration of Algebraic Multigrid Method

Stabilization and Acceleration of Algebraic Multigrid Method Stabilization and Acceleration of Algebraic Multigrid Method Recursive Projection Algorithm A. Jemcov J.P. Maruszewski Fluent Inc. October 24, 2006 Outline 1 Need for Algorithm Stabilization and Acceleration

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

COURSE Numerical methods for solving linear systems. Practical solving of many problems eventually leads to solving linear systems.

COURSE Numerical methods for solving linear systems. Practical solving of many problems eventually leads to solving linear systems. COURSE 9 4 Numerical methods for solving linear systems Practical solving of many problems eventually leads to solving linear systems Classification of the methods: - direct methods - with low number of

More information

APPLIED NUMERICAL LINEAR ALGEBRA

APPLIED NUMERICAL LINEAR ALGEBRA APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation

More information

Ax = b. Systems of Linear Equations. Lecture Notes to Accompany. Given m n matrix A and m-vector b, find unknown n-vector x satisfying

Ax = b. Systems of Linear Equations. Lecture Notes to Accompany. Given m n matrix A and m-vector b, find unknown n-vector x satisfying Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T Heath Chapter Systems of Linear Equations Systems of Linear Equations Given m n matrix A and m-vector

More information

Scientific Computing: Solving Linear Systems

Scientific Computing: Solving Linear Systems Scientific Computing: Solving Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 September 17th and 24th, 2015 A. Donev (Courant

More information

A hybrid reordered Arnoldi method to accelerate PageRank computations

A hybrid reordered Arnoldi method to accelerate PageRank computations A hybrid reordered Arnoldi method to accelerate PageRank computations Danielle Parker Final Presentation Background Modeling the Web The Web The Graph (A) Ranks of Web pages v = v 1... Dominant Eigenvector

More information

Solving Linear Systems of Equations

Solving Linear Systems of Equations 1 Solving Linear Systems of Equations Many practical problems could be reduced to solving a linear system of equations formulated as Ax = b This chapter studies the computational issues about directly

More information

Iterative Methods and Multigrid

Iterative Methods and Multigrid Iterative Methods and Multigrid Part 3: Preconditioning 2 Eric de Sturler Preconditioning The general idea behind preconditioning is that convergence of some method for the linear system Ax = b can be

More information

Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White

Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White Introduction to Simulation - Lecture 2 Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy Outline Reminder about

More information

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same.

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same. Introduction Matrix Operations Matrix: An m n matrix A is an m-by-n array of scalars from a field (for example real numbers) of the form a a a n a a a n A a m a m a mn The order (or size) of A is m n (read

More information

PRECONDITIONING IN THE PARALLEL BLOCK-JACOBI SVD ALGORITHM

PRECONDITIONING IN THE PARALLEL BLOCK-JACOBI SVD ALGORITHM Proceedings of ALGORITMY 25 pp. 22 211 PRECONDITIONING IN THE PARALLEL BLOCK-JACOBI SVD ALGORITHM GABRIEL OKŠA AND MARIÁN VAJTERŠIC Abstract. One way, how to speed up the computation of the singular value

More information

Iterative Methods for Linear Systems

Iterative Methods for Linear Systems Iterative Methods for Linear Systems 1. Introduction: Direct solvers versus iterative solvers In many applications we have to solve a linear system Ax = b with A R n n and b R n given. If n is large the

More information

Direct solution methods for sparse matrices. p. 1/49

Direct solution methods for sparse matrices. p. 1/49 Direct solution methods for sparse matrices p. 1/49 p. 2/49 Direct solution methods for sparse matrices Solve Ax = b, where A(n n). (1) Factorize A = LU, L lower-triangular, U upper-triangular. (2) Solve

More information

Poisson Solvers. William McLean. April 21, Return to Math3301/Math5315 Common Material.

Poisson Solvers. William McLean. April 21, Return to Math3301/Math5315 Common Material. Poisson Solvers William McLean April 21, 2004 Return to Math3301/Math5315 Common Material 1 Introduction Many problems in applied mathematics lead to a partial differential equation of the form a 2 u +

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y)

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y) 5.1 Banded Storage u = temperature u= u h temperature at gridpoints u h = 1 u= Laplace s equation u= h u = u h = grid size u=1 The five-point difference operator 1 u h =1 uh (x + h, y) 2u h (x, y)+u h

More information

Review of matrices. Let m, n IN. A rectangle of numbers written like A =

Review of matrices. Let m, n IN. A rectangle of numbers written like A = Review of matrices Let m, n IN. A rectangle of numbers written like a 11 a 12... a 1n a 21 a 22... a 2n A =...... a m1 a m2... a mn where each a ij IR is called a matrix with m rows and n columns or an

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 2: Direct Methods PD Dr.

More information

Jordan Journal of Mathematics and Statistics (JJMS) 5(3), 2012, pp A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS

Jordan Journal of Mathematics and Statistics (JJMS) 5(3), 2012, pp A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS Jordan Journal of Mathematics and Statistics JJMS) 53), 2012, pp.169-184 A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS ADEL H. AL-RABTAH Abstract. The Jacobi and Gauss-Seidel iterative

More information

Gaussian Elimination for Linear Systems

Gaussian Elimination for Linear Systems Gaussian Elimination for Linear Systems Tsung-Ming Huang Department of Mathematics National Taiwan Normal University October 3, 2011 1/56 Outline 1 Elementary matrices 2 LR-factorization 3 Gaussian elimination

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

Solving linear equations with Gaussian Elimination (I)

Solving linear equations with Gaussian Elimination (I) Term Projects Solving linear equations with Gaussian Elimination The QR Algorithm for Symmetric Eigenvalue Problem The QR Algorithm for The SVD Quasi-Newton Methods Solving linear equations with Gaussian

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Iteration basics Notes for 2016-11-07 An iterative solver for Ax = b is produces a sequence of approximations x (k) x. We always stop after finitely many steps, based on some convergence criterion, e.g.

More information

Numerical Methods I Solving Square Linear Systems: GEM and LU factorization

Numerical Methods I Solving Square Linear Systems: GEM and LU factorization Numerical Methods I Solving Square Linear Systems: GEM and LU factorization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 18th,

More information

The Solution of Linear Systems AX = B

The Solution of Linear Systems AX = B Chapter 2 The Solution of Linear Systems AX = B 21 Upper-triangular Linear Systems We will now develop the back-substitution algorithm, which is useful for solving a linear system of equations that has

More information

Lecture # 20 The Preconditioned Conjugate Gradient Method

Lecture # 20 The Preconditioned Conjugate Gradient Method Lecture # 20 The Preconditioned Conjugate Gradient Method We wish to solve Ax = b (1) A R n n is symmetric and positive definite (SPD). We then of n are being VERY LARGE, say, n = 10 6 or n = 10 7. Usually,

More information

Numerical Linear Algebra And Its Applications

Numerical Linear Algebra And Its Applications Numerical Linear Algebra And Its Applications Xiao-Qing JIN 1 Yi-Min WEI 2 August 29, 2008 1 Department of Mathematics, University of Macau, Macau, P. R. China. 2 Department of Mathematics, Fudan University,

More information

M.A. Botchev. September 5, 2014

M.A. Botchev. September 5, 2014 Rome-Moscow school of Matrix Methods and Applied Linear Algebra 2014 A short introduction to Krylov subspaces for linear systems, matrix functions and inexact Newton methods. Plan and exercises. M.A. Botchev

More information

Boundary Value Problems and Iterative Methods for Linear Systems

Boundary Value Problems and Iterative Methods for Linear Systems Boundary Value Problems and Iterative Methods for Linear Systems 1. Equilibrium Problems 1.1. Abstract setting We want to find a displacement u V. Here V is a complete vector space with a norm v V. In

More information

The Lanczos and conjugate gradient algorithms

The Lanczos and conjugate gradient algorithms The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008 1 The Lanczos algorithm 2 The Lanczos algorithm in finite precision 3 The nonsymmetric Lanczos algorithm 4 The Golub Kahan bidiagonalization

More information

CLASSICAL ITERATIVE METHODS

CLASSICAL ITERATIVE METHODS CLASSICAL ITERATIVE METHODS LONG CHEN In this notes we discuss classic iterative methods on solving the linear operator equation (1) Au = f, posed on a finite dimensional Hilbert space V = R N equipped

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 21: Sensitivity of Eigenvalues and Eigenvectors; Conjugate Gradient Method Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis

More information

Fundamentals of Numerical Linear Algebra

Fundamentals of Numerical Linear Algebra Fundamentals of Numerical Linear Algebra Seongjai Kim Department of Mathematics and Statistics Mississippi State University Mississippi State, MS 39762 USA Email: skim@math.msstate.edu Updated: November

More information

MAT 610: Numerical Linear Algebra. James V. Lambers

MAT 610: Numerical Linear Algebra. James V. Lambers MAT 610: Numerical Linear Algebra James V Lambers January 16, 2017 2 Contents 1 Matrix Multiplication Problems 7 11 Introduction 7 111 Systems of Linear Equations 7 112 The Eigenvalue Problem 8 12 Basic

More information

Scientific Computing: Dense Linear Systems

Scientific Computing: Dense Linear Systems Scientific Computing: Dense Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 February 9th, 2012 A. Donev (Courant Institute)

More information

14.2 QR Factorization with Column Pivoting

14.2 QR Factorization with Column Pivoting page 531 Chapter 14 Special Topics Background Material Needed Vector and Matrix Norms (Section 25) Rounding Errors in Basic Floating Point Operations (Section 33 37) Forward Elimination and Back Substitution

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra The two principal problems in linear algebra are: Linear system Given an n n matrix A and an n-vector b, determine x IR n such that A x = b Eigenvalue problem Given an n n matrix

More information

Classical iterative methods for linear systems

Classical iterative methods for linear systems Classical iterative methods for linear systems Ed Bueler MATH 615 Numerical Analysis of Differential Equations 27 February 1 March, 2017 Ed Bueler (MATH 615 NADEs) Classical iterative methods for linear

More information

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11 Matrix Computations: Direct Methods II May 5, 2014 ecture Summary You have seen an example of how a typical matrix operation (an important one) can be reduced to using lower level BS routines that would

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Classical Iterations We have a problem, We assume that the matrix comes from a discretization of a PDE. The best and most popular model problem is, The matrix will be as large

More information

DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix to upper-triangular

DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix to upper-triangular form) Given: matrix C = (c i,j ) n,m i,j=1 ODE and num math: Linear algebra (N) [lectures] c phabala 2016 DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix

More information

Jae Heon Yun and Yu Du Han

Jae Heon Yun and Yu Du Han Bull. Korean Math. Soc. 39 (2002), No. 3, pp. 495 509 MODIFIED INCOMPLETE CHOLESKY FACTORIZATION PRECONDITIONERS FOR A SYMMETRIC POSITIVE DEFINITE MATRIX Jae Heon Yun and Yu Du Han Abstract. We propose

More information