MS 28: Scalable Communication-Avoiding and -Hiding Krylov Subspace Methods I


1 MS 28: Scalable Communication-Avoiding and -Hiding Krylov Subspace Methods I
Organizers: Siegfried Cools, University of Antwerp, Belgium; Erin C. Carson, New York University, USA
10:50-11:10 High Performance Variants of Krylov Subspace Methods. Erin C. Carson, New York University, USA
11:15-11:35 About Parallel Variants of GMRES Algorithm. Jocelyne Erhel, Inria Rennes, France
11:40-12:00 Enlarged GMRES for Reducing Communication. Olivier Tissot, Inria, France

2 MS 40: Scalable Communication-Avoiding and -Hiding Krylov Subspace Methods II
Organizers: Siegfried Cools, University of Antwerp, Belgium; Erin C. Carson, New York University, USA
2:40-3:00 Impact of Noise Models on Pipelined Krylov Methods. Hannah Morgan, University of Chicago, USA
3:05-3:25 Scalable Krylov Methods for Spectral Graph Partitioning. Pieter Ghysels, Lawrence Berkeley National Laboratory, USA
3:30-3:50 Using Non-Blocking Communication to Achieve Scalability for Preconditioned Conjugate Gradient Methods. William D. Gropp, University of Illinois at Urbana-Champaign, USA
3:55-4:15 Performance of S-Step and Pipelined Krylov Methods. Piotr Luszczek, University of Tennessee, Knoxville, USA

3 High Performance Variants of Krylov Subspace Methods Erin Carson New York University SIAM PP18, Tokyo, Japan March 8, 2018

4 Collaborators: Emmanuel Agullo, Inria, France; Siegfried Cools, University of Antwerp, Belgium; James Demmel, University of California, Berkeley, USA; Pieter Ghysels, Lawrence Berkeley National Laboratory, USA; Luc Giraud, Inria, France; Miro Rozložník, Czech Academy of Sciences, Czech Republic; Zdeněk Strakoš, Charles University, Czech Republic; Petr Tichý, Czech Academy of Sciences, Czech Republic; Miroslav Tůma, Czech Academy of Sciences, Czech Republic; Wim Vanroose, University of Antwerp, Belgium; Emrullah Fatih Yetkin, Inria, France

5 Exascale System Projections
                          Today's Systems     Predicted Exascale Systems*    Factor Improvement
System Peak               ~10^16 flops/s      ~10^18 flops/s                 ~100
Node Memory Bandwidth     ~10^2 GB/s          ~10^3 GB/s                     ~10
Interconnect Bandwidth    ~10^1 GB/s          ~10^2 GB/s                     ~10
Memory Latency            ~10^-7 s            ~5x10^-8 s                     ~2
Interconnect Latency      ~10^-6 s            ~5x10^-7 s                     ~2
*Sources: P. Beckman (ANL), J. Shalf (LBL), and D. Unat (LBL)

Movement of data (communication) is much more expensive than floating point operations (computation), in terms of both time and energy. Reducing time spent moving data, or waiting for data, will be essential for applications at exascale: hence communication avoiding and communication hiding.

8 Krylov Subspace Methods
Krylov Subspace Method: projection process onto the Krylov subspace
    K_i(A, r_0) = span{r_0, A r_0, A^2 r_0, ..., A^(i-1) r_0},
where A is an N x N matrix and r_0 is a length-N vector.

In each iteration:
- Add a dimension to the Krylov subspace; this forms a nested sequence of Krylov subspaces K_1(A, r_0) ⊆ K_2(A, r_0) ⊆ ... ⊆ K_i(A, r_0)
- Orthogonalize against some constraint space C_i

Linear systems: select the approximate solution x_i in x_0 + K_i(A, r_0) using the constraint r_i = b - A x_i ⊥ C_i.

Conjugate gradient method: A is symmetric positive definite and C_i = K_i(A, r_0), so r_i ⊥ K_i(A, r_0) and
    ||x - x_i||_A = min_{z in x_0 + K_i(A, r_0)} ||x - z||_A,
with r_N = 0 (convergence in at most N steps) in exact arithmetic.
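
The A-norm optimality property above can be checked directly on a small example. The following NumPy sketch (an illustration, not from the talk) builds an explicit basis of K_i(A, r_0) for a random SPD matrix and solves the projected Galerkin system, which for SPD A is equivalent to minimizing the A-norm of the error over x_0 + K_i(A, r_0):

import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
A = Q @ np.diag(np.linspace(1.0, 100.0, 50)) @ Q.T      # small SPD test matrix
b = rng.standard_normal(50)
x_true = np.linalg.solve(A, b)

x0 = np.zeros(50)
r0 = b - A @ x0
i = 5
# Explicit (monomial) basis of K_i(A, r_0) = span{r_0, A r_0, ..., A^(i-1) r_0}
K = np.column_stack([np.linalg.matrix_power(A, k) @ r0 for k in range(i)])
# Galerkin condition r_i ⊥ K_i(A, r_0): solve (K^T A K) y = K^T r_0, then x_i = x_0 + K y
y = np.linalg.solve(K.T @ A @ K, K.T @ r0)
x_i = x0 + K @ y
err_A = np.sqrt((x_true - x_i) @ A @ (x_true - x_i))    # A-norm of the error after i steps
print(err_A)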

11 Conjugate Gradient on the World's Fastest Computer
(current #1 on the TOP500 list)
Linpack benchmark (dense Ax = b, direct): 74% efficiency
HPCG benchmark (sparse Ax = b, iterative): 0.4% efficiency

15 The Conjugate Gradient (CG) Method
r_0 = b - A x_0, p_0 = r_0
for i = 1:nmax                                                    (iteration loop)
    alpha_{i-1} = (r_{i-1}^T r_{i-1}) / (p_{i-1}^T A p_{i-1})     (sparse matrix-vector product + inner products)
    x_i = x_{i-1} + alpha_{i-1} p_{i-1}                           (vector update)
    r_i = r_{i-1} - alpha_{i-1} A p_{i-1}                         (vector update)
    beta_i = (r_i^T r_i) / (r_{i-1}^T r_{i-1})                    (inner products)
    p_i = r_i + beta_i p_{i-1}                                    (vector update)
end                                                               (end loop)
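
For reference, here is a minimal NumPy sketch of the recurrences above (not code from the talk; a dense A is used purely for brevity, and the parallel communication pattern of each line is noted in comments):

import numpy as np

def cg(A, b, x0, nmax, tol=1e-12):
    """Minimal (unpreconditioned) CG following the recurrences above."""
    x = x0.copy()
    r = b - A @ x                       # r_0 = b - A x_0
    p = r.copy()                        # p_0 = r_0
    rtr = r @ r
    for _ in range(nmax):
        Ap = A @ p                      # SpMV: nearest-neighbor communication in parallel
        alpha = rtr / (p @ Ap)          # inner product: global reduction (MPI_Allreduce)
        x = x + alpha * p               # vector update: local, no communication
        r = r - alpha * Ap              # vector update: local, no communication
        rtr_new = r @ r                 # inner product: second global reduction
        if np.sqrt(rtr_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rtr_new / rtr) * p     # vector update: local
        rtr = rtr_new
    return x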

23 Cost Per Iteration
Sparse matrix-vector multiplication (SpMV):
- O(nnz) flops
- Must communicate vector entries with neighboring processors (nearest-neighbor MPI communication)

Inner products:
- O(N) flops
- Global synchronization (MPI_Allreduce): all processors must exchange data and wait for all communication to finish before proceeding

Low computation/communication ratio: performance is communication-bound.
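
As a concrete illustration of where the global synchronization comes from, a minimal mpi4py sketch (assuming mpi4py is available; not part of the original slides) of a distributed inner product might look like this:

import numpy as np
from mpi4py import MPI                        # assumes an MPI environment with mpi4py installed

def distributed_dot(x_local, y_local, comm=MPI.COMM_WORLD):
    """Inner product of vectors distributed across MPI ranks."""
    local = float(x_local @ y_local)          # O(N/p) local flops on each rank
    return comm.allreduce(local, op=MPI.SUM)  # global reduction: every rank blocks here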

26 Reducing Synchronization Cost
Communication cost has motivated many approaches to reducing synchronization cost in Krylov subspace methods:

Hiding communication: pipelined Krylov subspace methods
- Introduce auxiliary vectors to decouple the SpMV and the inner products
- Enables overlapping of communication and computation

Avoiding communication: s-step Krylov subspace methods
- Compute iterations in blocks of s (using a different Krylov subspace basis)
- Reduces the number of synchronizations per iteration by a factor of O(s)

* Both are equivalent to classical CG in exact arithmetic

27 Pipelined CG (Ghysels and Vanroose 2013)
r_0 = b - A x_0, p_0 = r_0, s_0 = A p_0, w_0 = A r_0, z_0 = A w_0, alpha_0 = (r_0^T r_0) / (p_0^T s_0)
for i = 1:nmax                                                     (iteration loop)
    x_i = x_{i-1} + alpha_{i-1} p_{i-1}                            (vector updates)
    r_i = r_{i-1} - alpha_{i-1} s_{i-1}
    w_i = w_{i-1} - alpha_{i-1} z_{i-1}
    q_i = A w_i                                                    (SpMV; in the preconditioned variant, also the preconditioner application)
    beta_i = (r_i^T r_i) / (r_{i-1}^T r_{i-1})                     (inner products, overlapped with the SpMV)
    alpha_i = (r_i^T r_i) / (w_i^T r_i - (beta_i / alpha_{i-1}) r_i^T r_i)
    p_i = r_i + beta_i p_{i-1}                                     (vector updates)
    s_i = w_i + beta_i s_{i-1}
    z_i = q_i + beta_i z_{i-1}
end                                                                (end loop)

Uses the auxiliary vectors s_i = A p_i, w_i = A r_i, z_i = A^2 r_i (in exact arithmetic). This removes the sequential dependency between the SpMV and the inner products, which allows the use of nonblocking (asynchronous) MPI communication to overlap them and hides the latency of the global communication. (See the talk by W. Gropp in Part II: MS40.)
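
A single-process NumPy sketch of these recurrences follows (an illustration only; the benefit comes from a distributed implementation, where the two dot products per iteration are fused into one nonblocking reduction such as MPI_Iallreduce, posted before the SpMV and completed after it):

import numpy as np

def pipelined_cg(A, b, x0, nmax):
    """Ghysels-Vanroose pipelined CG recurrences (algebra only, no MPI)."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    s = A @ p                  # s_i = A p_i
    w = A @ r                  # w_i = A r_i
    z = A @ w                  # z_i = A^2 r_i
    rtr = r @ r
    alpha = rtr / (p @ s)
    for _ in range(nmax):
        x = x + alpha * p
        r = r - alpha * s
        w = w - alpha * z
        # In parallel: post a nonblocking all-reduce for (r.r, w.r) here ...
        rtr_new, wtr = r @ r, w @ r
        q = A @ w              # ... and perform the SpMV while the reduction is in flight
        # ... then wait for the reduction before computing beta and alpha.
        beta = rtr_new / rtr
        alpha = rtr_new / (wtr - (beta / alpha) * rtr_new)
        p = r + beta * p
        s = w + beta * s
        z = q + beta * z
        rtr = rtr_new
    return x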

36 Overview of Pipelined KSMs
- Pipelined GMRES (Ghysels et al. 2013): deep pipelines, i.e., compute l new Krylov basis vectors during global communication and orthogonalize after l iterations (talk by W. Vanroose, IP7, Sat March 10)
- Pipelined CG (Ghysels et al. 2013); with deep pipelines (Cornelis et al. 2018)
- Pipelined BiCGSTAB (Cools et al. 2017)
- Probabilistic performance modeling of pipelined KSMs (talk by H. Morgan, Part II: MS40)

37 s-step CG
r_0 = b - A x_0, p_0 = r_0
for k = 0:nmax/s                                                   (outer loop)
    Compute Y_k and B_k such that A Y_k = Y_k B_k and
        span(Y_k) = K_{s+1}(A, p_{sk}) + K_s(A, r_{sk})            (compute basis: O(s) SpMVs)
    G_k = Y_k^T Y_k                                                (O(s^2) inner products, one synchronization)
    x'_0 = 0, r'_0 = e_{s+2}, p'_0 = e_1                           (coordinates of the iterates in the basis Y_k)
    for j = 1:s                                                    (inner loop, executed s times)
        alpha_{sk+j-1} = (r'_{j-1}^T G_k r'_{j-1}) / (p'_{j-1}^T G_k B_k p'_{j-1})
        x'_j = x'_{j-1} + alpha_{sk+j-1} p'_{j-1}                  (local vector updates, no communication)
        r'_j = r'_{j-1} - alpha_{sk+j-1} B_k p'_{j-1}
        beta_{sk+j} = (r'_j^T G_k r'_j) / (r'_{j-1}^T G_k r'_{j-1})
        p'_j = r'_j + beta_{sk+j} p'_{j-1}
    end                                                            (end inner loop)
    [x_{s(k+1)} - x_{sk}, r_{s(k+1)}, p_{s(k+1)}] = Y_k [x'_s, r'_s, p'_s]
end                                                                (end outer loop)

Iterations are blocked into groups of s: the basis matrix Y_k expands the Krylov subspace by s dimensions at once, with the same latency cost as 1 SpMV (under assumptions on the sparsity of A), and a single global synchronization computes all inner products between basis vectors (G_k). Updating the coordinates of the iteration vectors in the constructed basis requires no communication.
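
The following NumPy sketch (an illustration only, not the talk's code) implements these recurrences with the monomial basis Y_k = [p, A p, ..., A^s p, r, A r, ..., A^(s-1) r]; a practical implementation would use a matrix powers kernel and a better-conditioned basis:

import numpy as np

def s_step_cg(A, b, x0, s, n_outer):
    """s-step CG with a monomial basis (illustration; monomial bases grow ill-conditioned with s)."""
    N = b.size
    x = x0.copy()
    r = b - A @ x0
    p = r.copy()
    # Basis-change matrix B: shifts within the p-block and within the r-block of Y.
    B = np.zeros((2 * s + 1, 2 * s + 1))
    for i in range(s):
        B[i + 1, i] = 1.0                        # A * (A^i p) = A^(i+1) p
    for i in range(s - 1):
        B[s + 2 + i, s + 1 + i] = 1.0            # A * (A^i r) = A^(i+1) r
    for _ in range(n_outer):
        # Compute the basis: O(s) SpMVs (with a matrix powers kernel, communication
        # comparable to one SpMV under sparsity assumptions).
        Y = np.zeros((N, 2 * s + 1))
        Y[:, 0] = p
        for i in range(s):
            Y[:, i + 1] = A @ Y[:, i]
        Y[:, s + 1] = r
        for i in range(s - 1):
            Y[:, s + 2 + i] = A @ Y[:, s + 1 + i]
        G = Y.T @ Y                              # all inner products: one global synchronization
        # Coordinates of the iterates in the basis Y.
        xc = np.zeros(2 * s + 1)
        rc = np.zeros(2 * s + 1); rc[s + 1] = 1.0
        pc = np.zeros(2 * s + 1); pc[0] = 1.0
        for _ in range(s):                       # inner loop: local work only, no communication
            Bp = B @ pc
            alpha = (rc @ G @ rc) / (pc @ G @ Bp)
            xc = xc + alpha * pc
            rc_new = rc - alpha * Bp
            beta = (rc_new @ G @ rc_new) / (rc @ G @ rc)
            pc = rc_new + beta * pc
            rc = rc_new
        # Recover the length-N iterates from their coordinates.
        x = x + Y @ xc
        r = Y @ rc
        p = Y @ pc
    return x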

45 Overview of s-step KSMs
- s-step CG/Lanczos: (Van Rosendale, 1983), (Chronopoulos and Gear, 1989), (Leland, 1989), (Toledo, 1995), (Hoemmen et al., 2010)
- s-step GMRES/Arnoldi: (Walker, 1988), (Chronopoulos and Kim, 1990), (Bai, Hu, Reichel, 1991), (de Sturler, 1991), (Joubert and Carey, 1992), (Erhel, 1995), (Hoemmen et al., 2010)
- s-step BiCGSTAB (Carson et al., 2012)
- s-step QMR (Feuerriegel and Bücker, 2013)
- s-step LSQR (Carson, 2015)
- Many others...

Recent work:
- Hybrid pipelined s-step methods (Yamazaki et al., 2017): talk by P. Luszczek in Part II, MS40
- Improving convergence rate and scalability in preconditioned s-step GMRES methods: talk by J. Erhel in MS28 (this session)

46 The Effect of Finite Precision
It is well known that roundoff error has two effects:
1. Delay of convergence
   - No longer have an exact Krylov subspace
   - The computed basis can lose (numerical) rank
   - Residuals are no longer orthogonal, so the minimization is no longer exact
2. Loss of attainable accuracy
   - Rounding errors cause the true residual b - A x_i and the updated residual r_i to deviate

(Convergence figures not reproduced. Test problem: A = bcsstk03 from the UFSMC, N = 112, kappa(A) ~ 7e6; b has equal components in the eigenbasis of A and ||b|| = 1.)

There is much work on these results for CG; see Meurant and Strakoš (2006) for a thorough summary of early developments in the finite precision analysis of Lanczos and CG.

50 Optimizing High Performance Iterative Solvers
Synchronization-reducing variants are designed to reduce the time per iteration. But this is not the whole story! What we really want to minimize is the runtime, subject to some constraint on the accuracy.

Changes to how the recurrences are computed can exacerbate the finite precision effects of convergence delay and loss of accuracy. (Figure not reproduced; test problem: A = bcsstk03 from the UFSMC, N = 112, kappa(A) ~ 7e6, b with equal components in the eigenbasis of A and ||b|| = 1.)

It is crucial that we understand and take into account how algorithm modifications will affect the convergence rate and attainable accuracy!

55 Maximum Attainable Accuracy
Accuracy depends on the size of the true residual, ||b - A x_i||.

Rounding errors cause the true residual, b - A x_i, and the updated residual, r_i, to deviate. Writing b - A x_i = r_i + (b - A x_i - r_i), we have
    ||b - A x_i|| <= ||r_i|| + ||b - A x_i - r_i||,
so as ||r_i|| -> 0, the attainable value of ||b - A x_i|| is determined by ||b - A x_i - r_i||.

Many results on bounding attainable accuracy, e.g.: Greenbaum (1989, 1994, 1997), Sleijpen, van der Vorst and Fokkema (1994), Sleijpen, van der Vorst and Modersitzki (2001), Björck, Elfving and Strakoš (1998), and Gutknecht and Strakoš (2000).

59 Maximum Attainable Accuracy of HSCG
In finite precision HSCG, the iterates are updated by
    x_i = x_{i-1} + alpha_{i-1} p_{i-1} - delta_x_i    and    r_i = r_{i-1} - alpha_{i-1} A p_{i-1} - delta_r_i,
where delta_x_i and delta_r_i are the local rounding errors.

Let f_i = b - A x_i - r_i. Then
    f_i = b - A (x_{i-1} + alpha_{i-1} p_{i-1} - delta_x_i) - (r_{i-1} - alpha_{i-1} A p_{i-1} - delta_r_i)
        = f_{i-1} + A delta_x_i + delta_r_i
        = f_0 + sum_{m=1}^{i} (A delta_x_m + delta_r_m),
i.e., the residual gap f_i is an accumulation of local rounding errors. Bounds on its size include
    ||f_i|| <= O(eps) sum_{m=0}^{i} (N_A ||A|| ||x_m|| + ||r_m||)           (van der Vorst and Ye, 2000)
    ||f_i|| <= O(eps) ||A|| (||x|| + max_{m=0,...,i} ||x_m||)               (Greenbaum, 1997)
    ||f_i|| <= O(eps) N_A ||A|| ||A^{-1}|| sum_{m=0}^{i} ||r_m||            (Sleijpen and van der Vorst)
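
A small experiment makes the residual gap visible. The sketch below (not from the talk) runs plain CG on a synthetic ill-conditioned SPD matrix, standing in for bcsstk03, and tracks ||b - A x_i - r_i||, which accumulates to roughly the level at which the true residual later stagnates:

import numpy as np

def cg_residual_gap(A, b, nmax):
    """Plain CG, recording the gap ||b - A x_i - r_i|| at each iteration."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rtr = r @ r
    gaps = []
    for _ in range(nmax):
        Ap = A @ p
        alpha = rtr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rtr_new = r @ r
        p = r + (rtr_new / rtr) * p
        rtr = rtr_new
        gaps.append(np.linalg.norm(b - A @ x - r))   # the residual gap f_i
    return x, gaps

# Synthetic SPD test problem with kappa(A) ~ 1e7 and b with equal eigencomponents, ||b|| = 1.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((112, 112)))
A = Q @ np.diag(np.logspace(0, 7, 112)) @ Q.T
b = Q @ np.full(112, 1.0 / np.sqrt(112))
x, gaps = cg_residual_gap(A, b, 300)
print(max(gaps))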

66 Attainable Accuracy of Pipelined CG
Pipelined CG updates x_i and r_i via
    x_i = x_{i-1} + alpha_{i-1} p_{i-1},    r_i = r_{i-1} - alpha_{i-1} s_{i-1}.
In finite precision:
    x_i = x_{i-1} + alpha_{i-1} p_{i-1} + delta_x_i,    r_i = r_{i-1} - alpha_{i-1} s_{i-1} + delta_r_i.
Then the residual gap f_i = r_i - (b - A x_i) satisfies
    f_i = f_{i-1} - alpha_{i-1} (s_{i-1} - A p_{i-1}) + delta_r_i + A delta_x_i
        = f_0 + sum_{m=1}^{i} (delta_r_m + A delta_x_m) - G_i d_i,
where G_i = S_i - A P_i collects the gaps in the auxiliary recurrence s_m = A p_m (columns s_m - A p_m, m = 0, ..., i-1) and d_i = (alpha_0, ..., alpha_{i-1})^T.

Amplification of local rounding errors is possible, depending on the alpha_i's and beta_i's. See recent work: (Cools et al., 2017), (Carson et al., 2017).

72 Numerical Example
(Convergence figures not reproduced.)
A: bcsstk03 from the UFSMC, b: equal components in the eigenbasis of A, ||b|| = 1; N = 112, kappa(A) ~ 7e6.
A: nos4 from the UFSMC, b: equal components in the eigenbasis of A, ||b|| = 1; N = 100, kappa(A) ~ 2e3.

74 Attainable Accuracy of s-step CG
Let f_i = b - A x_i - r_i.

For classical CG:
    ||f_i|| <= ||f_0|| + eps sum_{m=1}^{i} (1 + N) (||A|| ||x_m|| + ||r_m||).

For s-step CG, with i = sk + j (see Carson, 2015):
    ||f_{sk+j}|| <= ||f_0|| + eps c Gamma sum_{m=1}^{sk+j} (1 + N) (||A|| ||x_m|| + ||r_m||),
where c is a low-degree polynomial in s and Gamma = max_{l <= k} ||Y_l^+|| ||Y_l||.

Amplification of the local rounding errors is possible, depending on the conditioning of the s-step basis.
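
The amplification factor Gamma is easy to observe numerically. The snippet below (an illustration, not from the talk) measures the condition number of the monomial Krylov basis [v, A v, ..., A^s v] for a synthetic SPD matrix with kappa(A) ~ 2e3, showing how quickly it grows with s:

import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((100, 100)))
A = Q @ np.diag(np.linspace(1.0, 2e3, 100)) @ Q.T       # SPD, kappa(A) ~ 2e3 (cf. nos4)
v = rng.standard_normal(100)

for s in (2, 4, 8, 16):
    # Monomial basis of K_{s+1}(A, v); its condition number feeds directly into Gamma.
    Y = np.column_stack([np.linalg.matrix_power(A, k) @ v for k in range(s + 1)])
    print(s, np.linalg.cond(Y))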

77 Numerical Example
s-step CG with the monomial basis Y = [p_i, A p_i, ..., A^s p_i, r_i, A r_i, ..., A^(s-1) r_i].
(Convergence figures not reproduced.)
A: bcsstk03 from the UFSMC, b: equal components in the eigenbasis of A, ||b|| = 1; N = 112, kappa(A) ~ 7e6.
A: nos4 from the UFSMC, b: equal components in the eigenbasis of A, ||b|| = 1; N = 100, kappa(A) ~ 2e3.

* Other, more well-conditioned bases can also be used to improve the convergence rate and accuracy (see, e.g., Philippe and Reichel, 2012); a sketch of one such basis follows below.
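
For instance, a Newton-type basis y_{k+1} = (A - theta_k I) y_k with suitable shifts theta_k (typically Leja-ordered Ritz values) is usually far better conditioned than the monomial basis for the same s. A minimal sketch, with the shifts supplied by the caller as an assumption:

import numpy as np

def newton_basis(A, v, shifts):
    """Newton-type Krylov basis [y_0, (A - theta_0 I) y_0, ...]; 'shifts' are assumed to be
    supplied externally (e.g., Ritz values from a few warm-up iterations)."""
    Y = [v / np.linalg.norm(v)]
    for theta in shifts:
        y = A @ Y[-1] - theta * Y[-1]
        Y.append(y / np.linalg.norm(y))     # scale each column, which further helps conditioning
    return np.column_stack(Y)

Comparing np.linalg.cond of such a basis with the monomial basis from the previous snippet illustrates the improvement.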

82 Residual Replacement
Idea: improve accuracy by replacing r_i with fl(b - A x_i) in certain iterations (van der Vorst and Ye, 2000).
- Choose when to replace based on an estimate of f_i = b - A x_i - r_i
- Replace often enough that ||f_i|| remains small
- But do not replace when the error in computing fl(b - A x_i) would perturb the recurrence and cause convergence delay; see (Strakoš and Tichý, 2002)

This strategy can be adapted for both pipelined KSMs (Cools and Vanroose, 2017) and s-step KSMs (Carson and Demmel, 2014). In both cases the estimate of f_i can be computed inexpensively, and accuracy is improved to a level comparable to the classical method in many cases. (A simplified sketch follows below.)
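
The sketch below (a heavily simplified heuristic, not the precise criterion of van der Vorst and Ye) shows where the replacement step sits inside CG: a running estimate of the gap is accumulated, and when it becomes non-negligible relative to ||r_i|| the updated residual is replaced by the explicitly computed one:

import numpy as np

def cg_with_residual_replacement(A, b, nmax):
    """CG with a simplified residual replacement heuristic (illustration only)."""
    eps = np.finfo(float).eps
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rtr = r @ r
    gap_est = 0.0                                  # crude running estimate of ||f_i||
    normA = np.linalg.norm(A, 1)                   # any cheap norm estimate would do
    for _ in range(nmax):
        Ap = A @ p
        alpha = rtr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        gap_est += eps * (normA * np.linalg.norm(x) + np.linalg.norm(r))
        if gap_est > np.sqrt(eps) * np.linalg.norm(r):
            r = b - A @ x                          # replacement: r_i <- fl(b - A x_i)
            gap_est = 0.0
        rtr_new = r @ r
        p = r + (rtr_new / rtr) * p
        rtr = rtr_new
    return x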

86 Scalability of Pipelined CG with Residual Replacement (RR)

87 Conclusions and Takeaways
- We need new approaches to reducing synchronization cost in Krylov subspace methods for large-scale problems: pipelined methods, s-step methods, hybrid approaches. But we must also consider finite precision behavior!
- Other communication-avoiding and communication-hiding approaches are possible, e.g., enlarged Krylov subspace methods (talk by O. Tissot, MS28, this session).
- The best approach is highly application-dependent; see the application of pipelined KSMs to graph partitioning (talk by P. Ghysels in Part II: MS40).
- Many interesting open problems and challenges remain as we push toward exascale-level computing!

91 Thank You! math.nyu.edu/~erinc


More information

Computing least squares condition numbers on hybrid multicore/gpu systems

Computing least squares condition numbers on hybrid multicore/gpu systems Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning

More information

Chapter 7 Iterative Techniques in Matrix Algebra

Chapter 7 Iterative Techniques in Matrix Algebra Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition

More information

Golub-Kahan iterative bidiagonalization and determining the noise level in the data

Golub-Kahan iterative bidiagonalization and determining the noise level in the data Golub-Kahan iterative bidiagonalization and determining the noise level in the data Iveta Hnětynková,, Martin Plešinger,, Zdeněk Strakoš, * Charles University, Prague ** Academy of Sciences of the Czech

More information

The Performance Evolution of the Parallel Ocean Program on the Cray X1

The Performance Evolution of the Parallel Ocean Program on the Cray X1 The Performance Evolution of the Parallel Ocean Program on the Cray X1 Patrick H. Worley Oak Ridge National Laboratory John Levesque Cray Inc. 46th Cray User Group Conference May 18, 2003 Knoxville Marriott

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

Implementation of a preconditioned eigensolver using Hypre

Implementation of a preconditioned eigensolver using Hypre Implementation of a preconditioned eigensolver using Hypre Andrew V. Knyazev 1, and Merico E. Argentati 1 1 Department of Mathematics, University of Colorado at Denver, USA SUMMARY This paper describes

More information

Parallel Iterative Methods for Sparse Linear Systems. H. Martin Bücker Lehrstuhl für Hochleistungsrechnen

Parallel Iterative Methods for Sparse Linear Systems. H. Martin Bücker Lehrstuhl für Hochleistungsrechnen Parallel Iterative Methods for Sparse Linear Systems Lehrstuhl für Hochleistungsrechnen www.sc.rwth-aachen.de RWTH Aachen Large and Sparse Small and Dense Outline Problem with Direct Methods Iterative

More information

arxiv: v1 [hep-lat] 19 Jul 2009

arxiv: v1 [hep-lat] 19 Jul 2009 arxiv:0907.3261v1 [hep-lat] 19 Jul 2009 Application of preconditioned block BiCGGR to the Wilson-Dirac equation with multiple right-hand sides in lattice QCD Abstract H. Tadano a,b, Y. Kuramashi c,b, T.

More information

IDR(s) A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations

IDR(s) A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations IDR(s) A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations Harrachov 2007 Peter Sonneveld en Martin van Gijzen August 24, 2007 1 Delft University of Technology

More information

Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES

Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra Department of Electrical Engineering and Computer Science University

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD

More information

Comparison of Fixed Point Methods and Krylov Subspace Methods Solving Convection-Diffusion Equations

Comparison of Fixed Point Methods and Krylov Subspace Methods Solving Convection-Diffusion Equations American Journal of Computational Mathematics, 5, 5, 3-6 Published Online June 5 in SciRes. http://www.scirp.org/journal/ajcm http://dx.doi.org/.436/ajcm.5.5 Comparison of Fixed Point Methods and Krylov

More information

arxiv: v1 [hep-lat] 2 May 2012

arxiv: v1 [hep-lat] 2 May 2012 A CG Method for Multiple Right Hand Sides and Multiple Shifts in Lattice QCD Calculations arxiv:1205.0359v1 [hep-lat] 2 May 2012 Fachbereich C, Mathematik und Naturwissenschaften, Bergische Universität

More information

Key words. linear equations, eigenvalues, polynomial preconditioning, GMRES, GMRES-DR, deflation, QCD

Key words. linear equations, eigenvalues, polynomial preconditioning, GMRES, GMRES-DR, deflation, QCD 1 POLYNOMIAL PRECONDITIONED GMRES AND GMRES-DR QUAN LIU, RONALD B. MORGAN, AND WALTER WILCOX Abstract. We look at solving large nonsymmetric systems of linear equations using polynomial preconditioned

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

C. Vuik 1 R. Nabben 2 J.M. Tang 1

C. Vuik 1 R. Nabben 2 J.M. Tang 1 Deflation acceleration of block ILU preconditioned Krylov methods C. Vuik 1 R. Nabben 2 J.M. Tang 1 1 Delft University of Technology J.M. Burgerscentrum 2 Technical University Berlin Institut für Mathematik

More information

1. Introduction. Krylov subspace methods are used extensively for the iterative solution of linear systems of equations Ax = b.

1. Introduction. Krylov subspace methods are used extensively for the iterative solution of linear systems of equations Ax = b. SIAM J. SCI. COMPUT. Vol. 31, No. 2, pp. 1035 1062 c 2008 Society for Industrial and Applied Mathematics IDR(s): A FAMILY OF SIMPLE AND FAST ALGORITHMS FOR SOLVING LARGE NONSYMMETRIC SYSTEMS OF LINEAR

More information

Iterative Methods for Sparse Linear Systems

Iterative Methods for Sparse Linear Systems Iterative Methods for Sparse Linear Systems Luca Bergamaschi e-mail: berga@dmsa.unipd.it - http://www.dmsa.unipd.it/ berga Department of Mathematical Methods and Models for Scientific Applications University

More information

EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems

EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems JAMES H. MONEY and QIANG YE UNIVERSITY OF KENTUCKY eigifp is a MATLAB program for computing a few extreme eigenvalues

More information

SCALABLE HYBRID SPARSE LINEAR SOLVERS

SCALABLE HYBRID SPARSE LINEAR SOLVERS The Pennsylvania State University The Graduate School Department of Computer Science and Engineering SCALABLE HYBRID SPARSE LINEAR SOLVERS A Thesis in Computer Science and Engineering by Keita Teranishi

More information

c 2008 Society for Industrial and Applied Mathematics

c 2008 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 30, No. 4, pp. 1483 1499 c 2008 Society for Industrial and Applied Mathematics HOW TO MAKE SIMPLER GMRES AND GCR MORE STABLE PAVEL JIRÁNEK, MIROSLAV ROZLOŽNÍK, AND MARTIN

More information

Error Bounds for Iterative Refinement in Three Precisions

Error Bounds for Iterative Refinement in Three Precisions Error Bounds for Iterative Refinement in Three Precisions Erin C. Carson, New York University Nicholas J. Higham, University of Manchester SIAM Annual Meeting Portland, Oregon July 13, 018 Hardware Support

More information

Conjugate Gradient algorithm. Storage: fixed, independent of number of steps.

Conjugate Gradient algorithm. Storage: fixed, independent of number of steps. Conjugate Gradient algorithm Need: A symmetric positive definite; Cost: 1 matrix-vector product per step; Storage: fixed, independent of number of steps. The CG method minimizes the A norm of the error,

More information

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems Topics The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems What about non-spd systems? Methods requiring small history Methods requiring large history Summary of solvers 1 / 52 Conjugate

More information