Communication Avoiding Strategies for the Numerical Kernels in Coupled Physics Simulations


1 ExaScience Lab, Intel Labs Europe: EXASCALE COMPUTING
Communication Avoiding Strategies for the Numerical Kernels in Coupled Physics Simulations
SIAM Conference on Parallel Processing for Scientific Computing, Savannah, Georgia, Feb 15-17, 2012
P. Ghysels (1,2), T. Ashby (2,3), K. Meerbergen (4), W. Vanroose (1)
(1) University of Antwerp, Dept. Math & CS, Belgium (pieter.ghysels@ua.ac.be); (2) Intel ExaScience Lab, Leuven, Belgium; (3) Imec vzw, Leuven, Belgium; (4) KU Leuven, Dept. CS, Leuven, Belgium

2 Outline
1. Introduction
2. GMRES overview
3. Avoiding the reduction for normalization; shifted matrix-vector product for stability
4. Pipelined GMRES: p_1-GMRES, deeper pipelining
5. Numerical results; changing the Krylov basis
6. Performance model; predicted scaling
7. Discussion and outlook

3 Introduction
- Increasing gap between computation and communication costs: floating point performance steadily increases, network latencies only go down marginally, and memory latencies decline slowly.
- Avoid communication by trading it for computation; hide the latency of communication.
- In the Generalized Minimal Residual (GMRES) method, dot-products are latency dominated and cannot be overlapped with other work in standard GMRES; all other operations (stencil/axpy) scale well.

4 Global reductions and variability
- Load imbalance, OS jitter, hardware variability, ...
- Global communication is expensive: it implies global synchronization.
[Figure: influence of OS noise on all-reduce latency (µs) versus the number of processes (1k to 1M) on the CHiC cluster (left) and Jaguar XT4 (right); with noise, latency grows markedly compared to the noiseless case.]
Thanks to T. Hoefler; see T. Hoefler, T. Schneider and A. Lumsdaine (2010).

5 GMRES, a short overview
- Problem: find x from Ax = b.
- Initial guess x^{(0)} and initial residual r^{(0)} = b - Ax^{(0)}.
- Find an approximate solution x_k ∈ x_0 + K_k(r^{(0)}, A), with K_k(r^{(0)}, A) = span{r^{(0)}, Ar^{(0)}, A^2 r^{(0)}, ..., A^{k-1} r^{(0)}}.
- Orthonormal Krylov basis V_k = [v_0, v_1, ..., v_{k-1}] with V_k^T V_k = I_k.
- Create a new basis vector z_k = A v_{k-1} using the matrix-vector product.
- Make z_k orthogonal to all previous v_0, ..., v_{k-1} (classical/modified Gram-Schmidt or Householder orthogonalization).
- Satisfy the Arnoldi recurrence relation AV_i = V_{i+1} H_{i+1,i}.
- Find x_k = x_0 + V_k y_k with y_k chosen to minimize the residual; transform H from upper Hessenberg to upper triangular form to solve the least squares problem.

6 GMRES, classical Gram-Schmidt
1: r_0 ← b - Ax_0
2: v_0 ← r_0 / ||r_0||_2
3: for i = 0, ..., m-1 do
4:   w ← A v_i                                 (sparse matrix-vector product)
5:   for j = 0, ..., i do
6:     h_{j,i} ← <w, v_j>                      (dot-product)
7:   end for
8:   ṽ_{i+1} ← w - Σ_{j=0}^{i} h_{j,i} v_j     (axpy)
9:   h_{i+1,i} ← ||ṽ_{i+1}||_2
10:  v_{i+1} ← ṽ_{i+1} / h_{i+1,i}
11:  {apply Givens rotations to h_{:,i}}
12: end for
13: y_m ← argmin_y || H_{m+1,m} y - ||r_0||_2 e_1 ||_2
14: x ← x_0 + V_m y_m
Sparse matrix-vector product: only communication with neighbors, good scaling.
Dot-product: global communication, scales as log(P).
Scalar-vector multiplication and vector-vector addition: no communication.
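To make the communication pattern concrete, here is a minimal serial sketch of this pseudocode in Python/NumPy (not from the talk; the small least squares solve stands in for the Givens rotation update of the slides):

```python
import numpy as np

def gmres_cgs(A, b, m=30, tol=1e-10):
    """GMRES(m) with classical Gram-Schmidt, following the slide's pseudocode.
    In a parallel run, 'h = V.T @ w' is the latency-bound all-reduce for the
    orthogonalization, and the norm of v_tilde costs a second reduction."""
    n = b.shape[0]
    x0 = np.zeros(n)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for i in range(m):
        w = A @ V[:, i]                       # sparse matrix-vector product
        h = V[:, :i+1].T @ w                  # i+1 dot-products -> one reduction
        v_tilde = w - V[:, :i+1] @ h          # axpy updates, no communication
        H[:i+1, i] = h
        H[i+1, i] = np.linalg.norm(v_tilde)   # norm -> a second reduction
        V[:, i+1] = v_tilde / H[i+1, i]
        # solve the small least squares problem (stands in for Givens updates)
        e1 = np.zeros(i + 2); e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:i+2, :i+1], e1, rcond=None)
        if np.linalg.norm(H[:i+2, :i+1] @ y - e1) < tol * beta:
            return x0 + V[:, :i+1] @ y
    return x0 + V[:, :m] @ y
```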

7 GMRES iteration on 4 nodes
[Figure: timeline over processors p1-p4 of one GMRES iteration: SpMV, local dot, reduction, broadcast, Gram-Schmidt axpy, local norm, reduction, broadcast and scale (normalization), H update.]
- Communication for the SpMV can be overlapped with the stencil calculations.
- The reduction is typically a binomial tree of height log_2(P).
- Processors idle during the all-reduce latency (figure not to scale).
- For modified Gram-Schmidt there are (i+1) global reductions per iteration.

8 Avoiding reduction for normalization
Can the extra reduction for the normalization be avoided?
[Figure: the GMRES timeline of the previous slide, highlighting the second reduction and broadcast of the normalization step.]

9-11 Avoiding reduction for normalization
Given an orthonormal basis V_{i+1} := [v_0, v_1, ..., v_{i-1}, v_i] for the Krylov space K_{i+1}(A, v_0), which satisfies the Arnoldi relation AV_i = V_{i+1} H_{i+1,i}, then the steps
1: z_{i+1} = A v_i
2: compute <z_{i+1}, v_j> for j = 0, ..., i, and ||z_{i+1}||
3: h_{j,i} = <z_{i+1}, v_j>, for j = 0, ..., i
4: h_{i+1,i} = sqrt( ||z_{i+1}||_2^2 - Σ_{j=0}^{i} <z_{i+1}, v_j>^2 )
5: v_{i+1} = ( z_{i+1} - Σ_{j=0}^{i} <z_{i+1}, v_j> v_j ) / h_{i+1,i}
expand the basis to V_{i+2} and the Hessenberg matrix to H_{i+2,i+1}.
- The dot-products in line 2 can be combined in a single reduction.
- Line 4 can lead to numerical instability.
- Line 4 can lead to √-breakdown (a negative argument under the square root).
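In exact arithmetic line 4 is just Pythagoras: ||z - V V^T z||^2 = ||z||^2 - Σ_j <z, v_j>^2, so the norm of the new basis vector comes for free from the same reduction as the dot-products. A small NumPy check (illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, i = 50, 5
V, _ = np.linalg.qr(rng.standard_normal((n, i + 1)))   # orthonormal v_0..v_i
z = rng.standard_normal(n)

dots = V.T @ z                            # all inner products <z, v_j>
h_next = np.sqrt(z @ z - dots @ dots)     # line 4: norm from the same reduction
v_next = (z - V @ dots) / h_next          # line 5

# agrees with the explicitly orthogonalized norm, up to rounding
assert np.isclose(h_next, np.linalg.norm(z - V @ dots))
```

The instability is also visible here: when z lies nearly in span(V), the subtraction under the square root cancels severely and can even turn negative in finite precision, which is the √-breakdown of line 4.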

12-13 Avoiding reduction for normalization: shifted matrix-vector product to improve stability
Given an orthonormal basis V_{i+1} := [v_0, v_1, ..., v_{i-1}, v_i] for the Krylov space K_{i+1}(A, v_0), with AV_i = V_{i+1} H_{i+1,i} and shifts σ_i ∈ C, then the steps
1: z_{i+1} = (A - σ_i I) v_i
2: compute <z_{i+1}, v_j> for j = 0, ..., i, and ||z_{i+1}||
3: h_{j,i} = <z_{i+1}, v_j>, for j = 0, ..., i-1
4: h_{i,i} = <z_{i+1}, v_i> + σ_i
5: h_{i+1,i} = sqrt( ||z_{i+1}||_2^2 - Σ_{j=0}^{i} <z_{i+1}, v_j>^2 )
6: v_{i+1} = ( z_{i+1} - Σ_{j=0}^{i} <z_{i+1}, v_j> v_j ) / h_{i+1,i}
expand the basis to V_{i+2} and the Hessenberg matrix to H_{i+2,i+1}.
If √-breakdown still occurs: re-orthogonalize or restart.

14 Avoiding reduction for normalization
- Only a single reduction per iteration: both reductions are combined into one.
- The combined reduction communicates one extra floating point number (the squared norm) on top of the dot-products.
[Figure: timeline over processors p1-p4 with a single reduction per iteration: SpMV, local dot, reduction, broadcast, axpy (Gram-Schmidt plus normalization), H update.]

15 Overlapping dot-products and SpMV
Given the bases related by z_0 = v_0 and
  V_i := [v_0, v_1, ..., v_{i-1}],  Z_{i+1} := [z_0, z_1, ..., z_{i-1}, z_i],  z_j = (A - σI) v_{j-1}  (σ ∈ C),
then the following steps
1: compute <z_i, v_j> for j = 0, ..., i-1, and ||z_i||
2: w = A z_i
3: h_{j,i-1} = <z_i, v_j>, for j = 0, ..., i-2
4: h_{i-1,i-1} = <z_i, v_{i-1}> + σ
5: h_{i,i-1} = sqrt( ||z_i||_2^2 - Σ_{j=0}^{i-1} <z_i, v_j>^2 )
6: v_i = ( z_i - Σ_{j=0}^{i-1} <z_i, v_j> v_j ) / h_{i,i-1}
7: z_{i+1} = ( w - Σ_{j=0}^{i-1} h_{j,i-1} z_{j+1} ) / h_{i,i-1}
expand the bases to V_{i+1} and Z_{i+2}, and H_{i,i-1} to H_{i+1,i}.

16 Overlapping dot-products and SpMV
- Dot-products are computed in line 1, but first used in line 3, i.e. after the SpMV in line 2.
- Sliding window: add the new vector z_{i+1} to Z at one end, using the SpMV; add the orthogonal vector v_i to V at the other end, using the dot-products.
Proof: starting from the Arnoldi relation,
  v_i = ( A v_{i-1} - Σ_{j=0}^{i-1} h_{j,i-1} v_j ) / h_{i,i-1}.
Multiplying with (A - σI) gives
  z_{i+1} = ( A z_i - Σ_{j=0}^{i-1} h_{j,i-1} z_{j+1} ) / h_{i,i-1},
with A z_i already calculated in line 2.
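The proof can be checked numerically; this small NumPy experiment (illustrative, with a random matrix and an arbitrary shift) builds V and H by standard Arnoldi, forms the auxiliary basis z_j = (A - σI) v_{j-1}, and confirms the z-recurrence:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, sigma = 40, 6, 0.3
A = rng.standard_normal((n, n))

# reference Arnoldi: A V_m = V_{m+1} H_{m+1,m}
V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
V[:, 0] = rng.standard_normal(n); V[:, 0] /= np.linalg.norm(V[:, 0])
for i in range(m):
    w = A @ V[:, i]
    H[:i+1, i] = V[:, :i+1].T @ w
    w -= V[:, :i+1] @ H[:i+1, i]
    H[i+1, i] = np.linalg.norm(w)
    V[:, i+1] = w / H[i+1, i]

# auxiliary basis z_0 = v_0, z_{j+1} = (A - sigma I) v_j
Z = np.column_stack([V[:, 0]] +
                    [(A - sigma * np.eye(n)) @ V[:, j] for j in range(m)])

# slide's recurrence: z_{i+1} = (A z_i - sum_j h_{j,i-1} z_{j+1}) / h_{i,i-1}
i = m - 1
z_next = (A @ Z[:, i] - Z[:, 1:i+1] @ H[:i, i-1]) / H[i, i-1]
assert np.allclose(z_next, Z[:, i+1])
```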

17 p(1)-GMRES
[Figure: timeline over processors p1-p4: local dot, reduction, broadcast and scale, axpy, H update and correction, overlapped with the SpMV.]
- Overlap the global communication latency of the dot-products with the SpMV.
- Some additional flops compared to standard GMRES ("correction" in the figure).
- Still waiting if the reduction takes longer than the SpMV.

18 Deeper pipelining
What if a global all-reduce takes longer than a matrix-vector product?
  V_{i-l+1} = [v_0, v_1, ..., v_{i-l}]
  Z_{i+1} = [z_0, z_1, ..., z_{i-l}, z_{i-l+1}, ..., z_i]   (the last l vectors are not yet orthogonalized)
  z_j = v_0 for j = 0;  z_j = P_j(A) v_0 for 0 < j < l;  z_j = P_l(A) v_{j-l} for j ≥ l,
with P_i(t) = Π_{j=0}^{i-1} (t - σ_j).
- The sliding window is now l iterations long.
- With all shifts zero, z_j = A^l v_{j-l}, called the monomial basis; like the power method, these vectors converge to the dominant eigenvector.
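A literal NumPy rendering of the pipeline fill phase (illustrative, not from the talk): for j ≤ l the basis vectors follow z_j = P_j(A) v_0, one shifted SpMV per vector.

```python
import numpy as np

def fill_pipeline(A, v0, sigma):
    """z_0 = v_0 and z_{j+1} = (A - sigma_j I) z_j, i.e. z_j = P_j(A) v_0
    with P_i(t) = prod_{j=0}^{i-1} (t - sigma_j); len(sigma) = l shifts."""
    Z = [v0]
    for s in sigma:
        Z.append(A @ Z[-1] - s * Z[-1])   # one (shifted) SpMV per basis vector
    return np.column_stack(Z)
```

With sigma all zero this is the monomial basis z_j = A^j v_0: successive vectors align with the dominant eigenvector, which is why the basis conditioning degrades for larger l and why the Newton and Chebyshev shifts of the later slides help.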

19 The change of basis matrix
The Z_i basis satisfies a similar Arnoldi relation, AZ_i = Z_{i+1} B_{i+1,i}, with the upper Hessenberg change-of-basis matrix B_{i+1,i}: its first l columns hold the shifts σ_0, ..., σ_{l-1} on the diagonal with ones on the subdiagonal, and the remaining columns hold the entries h_{0,0} ... h_{0,i-l}, h_{1,1} ..., down to h_{i+1-l,i-l} of the Hessenberg matrix H_{i+1-l,i-l}.
See also M. Hoemmen (2010); E. Carson, N. Knight and J. Demmel (2011).

20 Orthonormalizing the Z_k basis
The Z_k basis can be made orthonormal by a QR factorization Z_k = V_k G_k.
The last column of G_k, i.e. g_{j,k-1} for j = 0, ..., k-1, can be calculated using only the dot-products
  <z_{k-1}, v_j> for j ≤ k-l-1   and   <z_{k-1}, z_j> for k-l-1 < j ≤ k-1,
and the elements of G_{k-1}:
  g_{j,k-1} = <z_{k-1}, v_j> = ( <z_{k-1}, z_j> - Σ_{m=0}^{j-1} g_{m,j} g_{m,k-1} ) / g_{j,j},
  g_{j,j} = sqrt( <z_j, z_j> - Σ_{m=0}^{j-1} g_{m,j}^2 ).
Here V_{k-l} = [v_0, v_1, ..., v_{k-l-1}] is the part of the orthonormal basis already available, while the window z_{k-l}, ..., z_{k-1} of Z_k = [z_0, ..., z_{k-1}] is not yet orthogonalized.
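A sketch of this column update in NumPy (illustrative; `dots` is assumed to hold the result of the single all-reduce for z_{k-1}, with `dots[j]` = <z_{k-1}, v_j> for j ≤ k-l-1 and <z_{k-1}, z_j> for larger j):

```python
import numpy as np

def next_g_column(G, dots, k, l):
    """Last column of G_k in Z_k = V_k G_k from one round of dot-products."""
    g = np.zeros(k)
    for j in range(k - 1):
        if j <= k - l - 1:
            g[j] = dots[j]                               # <z_{k-1}, v_j> directly
        else:                                            # only <z_{k-1}, z_j> known
            g[j] = (dots[j] - G[:j, j] @ g[:j]) / G[j, j]
    # diagonal entry; a negative argument here is the square-root breakdown
    g[k - 1] = np.sqrt(dots[k - 1] - g[:k - 1] @ g[:k - 1])
    return g
```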

21 Extending the Arnoldi recurrence
Using the Arnoldi recurrence relation AV_k = V_{k+1} H_{k+1,k} and the QR factorization Z_k = V_k G_k, do an incremental update of Arnoldi's Hessenberg matrix:
  H_{k+1,k} = V_{k+1}^T A V_k = V_{k+1}^T A Z_k G_k^{-1} = V_{k+1}^T Z_{k+1} B_{k+1,k} G_k^{-1} = G_{k+1} B_{k+1,k} G_k^{-1} = ...
  = [ H_{k,k-1}   ( G_k b_{:,k} + g_{:,k+1} b_{k+1,k} - H_{k,k-1} g_{:,k} ) g_{k,k}^{-1} ]
    [ 0           g_{k+1,k+1} b_{k+1,k} g_{k,k}^{-1}                                     ]

22 p(l)-GMRES
1: r_0 ← b - Ax_0; v_0 ← r_0/||r_0||_2; z_0 ← v_0
2: for i = 0, ..., m+l do
3:   w ← (A - σ_i I) z_i if i < l, else A z_i
4:   a ← i - l
5:   if a ≥ 0 then
6:     g_{j,a+1} ← ( g_{j,a+1} - Σ_{k=0}^{j-1} g_{k,j} g_{k,a+1} ) / g_{j,j}, for j = a-l+2, ..., a
7:     g_{a+1,a+1} ← sqrt( g_{a+1,a+1} - Σ_{k=0}^{a} g_{k,a+1}^2 )
8:     # check for breakdown; restart or re-orthogonalize if necessary
9:     if a < l then
10:      h_{j,a} ← ( g_{j,a+1} + σ_a g_{j,a} - Σ_{k=0}^{a-1} h_{j,k} g_{k,a} ) / g_{a,a}, for j = 0, ..., a
11:      h_{a+1,a} ← g_{a+1,a+1} / g_{a,a}
12:    else
13:      h_{j,a} ← ( Σ_{k=0}^{a+1-l} g_{j,k+l} h_{k,a-l} - Σ_{k=j-1}^{a-1} h_{j,k} g_{k,a} ) / g_{a,a}, for j = 0, ..., a
14:      h_{a+1,a} ← g_{a+1,a+1} h_{a+1-l,a-l} / g_{a,a}
15:    end if
16:    v_a ← ( z_a - Σ_{j=0}^{a-1} g_{j,a} v_j ) / g_{a,a}
17:    z_{i+1} ← ( w - Σ_{j=0}^{a} h_{j,a} z_{j+l} ) / h_{a+1,a}
18:  end if
19:  g_{j,i+1} ← <z_{i+1}, v_j> for j = 0, ..., a;  <z_{i+1}, z_j> for j = a+1, ..., i+1
20: end for
21: y_m ← argmin_y || H_{m+1,m} y - ||r_0||_2 e_1 ||_2
22: x ← x_0 + V_m y_m
(While the pipeline is still filling, i.e. a < 0, the new basis vector is simply z_{i+1} ← w.)

23 Random lower bidiagonal matrix
A ∈ R^{n×n} with A_{i,i} = 1 + p_i and A_{i+1,i} = q_i, where p_i and q_i are uniformly distributed on [0, 1]; κ_2(A) ≈ 5.
The matrix is unsymmetric, nonnegative, not normal, positive definite and diagonally dominant.
[Figure: ||r_k|| versus iterations with the monomial basis (σ_i = 0), down to 1e-14, for gmres c-gs, gmres m-gs, l_1-gmres, p_1-gmres and p(1)- to p(4)-gmres.]

24 Changing the basis for stability
Newton basis: take l Ritz values as shifts for p(l)-GMRES.
Chebyshev basis: Chebyshev polynomial, minimal over an ellipse surrounding the spectrum, with three-term recurrence
  z_{i+1} = (σ_i / σ_{i+1}) [ (2/c)(d - A) z_i - (σ_{i-1}/σ_i) z_{i-1} ],
where d is the center and c the focal distance of the ellipse.
When all eigenvalues are real, the zeros of the Chebyshev polynomial are
  σ_i = (λ_min + λ_max)/2 + ((λ_max - λ_min)/2) cos( (2i+1)π / (2l) ), for i = 0, ..., l-1.
Complex shifts can be avoided for real matrices by applying conjugate pairs together:
  z_{j+1} = (A - ℜ(λ_j) I) z_j
  z_{j+2} = (A - ℜ(λ_j) I) z_{j+1} + ℑ(λ_j)^2 z_j   ( = (A - λ_j I)(A - λ̄_j I) z_j )
Put the shifts in (modified) Leja ordering to maximize their mutual distances.
M. Hoemmen, PhD thesis (2010)
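For a real spectrum enclosed in [λ_min, λ_max] the shifts are just the Chebyshev zeros mapped to that interval; a small NumPy helper (illustrative):

```python
import numpy as np

def chebyshev_shifts(lmin, lmax, l):
    """Zeros of the degree-l Chebyshev polynomial on [lmin, lmax]:
    sigma_i = (lmin+lmax)/2 + (lmax-lmin)/2 * cos((2i+1)pi/(2l))."""
    i = np.arange(l)
    return (0.5 * (lmin + lmax)
            + 0.5 * (lmax - lmin) * np.cos((2 * i + 1) * np.pi / (2 * l)))

# shifts for the [1, 2] test problem of the next slide, pipeline depth l = 4
print(chebyshev_shifts(1.0, 2.0, 4))
```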

25 Random bidiagonal matrix: Newton and Chebyshev bases
[Figure: ||r_k|| versus iterations with the Newton basis (left) and the Chebyshev basis (right), for gmres c-gs, gmres m-gs, l_1-gmres, p_1-gmres and p(1)- to p(4)-gmres.]
- Eigenvalues are randomly distributed between 1 and 2.
- Newton: compute l Ritz values (in [1, 2]); Chebyshev: compute the zeros of the degree-l Chebyshev polynomial minimal over [1, 2].
- The residual for p(l)-gmres is delayed by l steps.
- Restart after 30 iterations; re-filling the pipeline again costs l steps of delay.

26 Finite difference stencil, PDE900
Real, unsymmetric 900×900 matrix; finite difference point stencil for a linear elliptic equation with Dirichlet boundary conditions on a regular grid.
[Figure: ||r_k||_2 versus iterations with the monomial basis (left) and the Newton basis (right), for gmres c-gs, gmres m-gs, l_1-gmres, p_1-gmres and p(1)- to p(4)-gmres.]
Longer pipelining depth leads to more frequent restarts.

27 Oil reservoir simulation (ORSIRR1)
Real, unsymmetric matrix with 6858 non-zeros and κ(A) ≈ 100; from Matrix Market, generated from an oil reservoir simulation.
[Figure: ||r_k||_2 versus iterations with the monomial basis (left) and the Newton basis (right), for gmres c-gs, gmres m-gs, l_1-gmres, p_1-gmres and p(1)- to p(4)-gmres.]

28 Performance model
- In total N unknowns, P_c cores, P_n nodes.
- A single message of size m takes T(m) = t_s + m t_w, with latency t_s and inverse bandwidth t_w.
- SpMV communication cost: send the 6 faces of a 3D cubic grid, T_spmv = 6( t_s + (N/P_n)^{2/3} t_w ).
- The all-reduce uses a binomial tree (also for the broadcast): T_red = 2 log_2(P_n) ( t_s + m t_w ).
- Flops for a dot-product: 2N/P_c + log_2(P_n).
- The matrix has n_z non-zeros per row, so the SpMV costs 2 n_z N/P_c flops per core.
- A single flop takes t_c; take half of the peak flop rate.

29 Counting flops and messages
[Figure: the GMRES timeline over processors p1-p4 annotated with the cost of each step: T_spmv plus 2 n_z N/P_c t_c for the SpMV, 2iN/P_c t_c for the local dot-products, 2(i+1)N/P_c t_c for the axpy updates, N/P_c t_c for the scaling, and log_2(P_n)(t_s + i t_w + i t_c) for each of the two reduction/broadcast phases.]

30-33 Cost functions for iteration i
GMRES, classical Gram-Schmidt:
  T_calc = t_c( (2n_z + 4i + 3) N/P_c + (i+1) log_2(P_n) )
  T_red = 2 log_2(P_n)(t_s + i t_w)  [orthogonalization]  +  2 log_2(P_n)(t_s + t_w)  [normalization]
GMRES, modified Gram-Schmidt:
  T_calc = t_c( (2n_z + 4i + 3) N/P_c + (i+1) log_2(P_n) )
  T_red = 2 (i+1) log_2(P_n)(t_s + t_w)
l_1-GMRES:
  T_calc = t_c( (2n_z + 4i + 4) N/P_c + (i+1) log_2(P_n) )
  T_red = 2 log_2(P_n)(t_s + (i+1) t_w)
p(l)-GMRES:
  T_calc = t_c( (2n_z + 6i + 4l + 4) N/P_c + (i+1) log_2(P_n) )
  The reduction can be overlapped with other calculations and communications; only the exposed part counts:
  T_red = 2 log_2(P_n)(t_s + (i+1) t_w) - min( l ( t_c(2n_z + 4(i-l)) N/P_c + T_spmv ), 2 log_2(P_n)(t_s + (i+1) t_w) ),
  where the first argument of the min is the local calculation and communication of l iterations.
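These cost functions are simple enough to evaluate directly; a sketch in Python, with the machine parameters (t_s, t_w, t_c) and problem sizes as placeholder inputs rather than the measured values of the talk:

```python
from math import log2

def iteration_time(variant, i, N, Pc, Pn, nz, ts, tw, tc, l=1):
    """Per-iteration cost model from the slides; returns (T_calc, T_red, T_spmv)."""
    T_spmv = 6 * (ts + (N / Pn) ** (2.0 / 3.0) * tw)
    red = lambda m: 2 * log2(Pn) * (ts + m * tw)       # binomial-tree all-reduce
    if variant == "cgs":
        T_calc = tc * ((2 * nz + 4 * i + 3) * N / Pc + (i + 1) * log2(Pn))
        T_red = red(i) + red(1)          # orthogonalization + normalization
    elif variant == "mgs":
        T_calc = tc * ((2 * nz + 4 * i + 3) * N / Pc + (i + 1) * log2(Pn))
        T_red = (i + 1) * red(1)         # one reduction per dot-product
    elif variant == "l1":
        T_calc = tc * ((2 * nz + 4 * i + 4) * N / Pc + (i + 1) * log2(Pn))
        T_red = red(i + 1)               # single combined reduction
    elif variant == "pl":
        T_calc = tc * ((2 * nz + 6 * i + 4 * l + 4) * N / Pc + (i + 1) * log2(Pn))
        overlap = l * (tc * (2 * nz + 4 * (i - l)) * N / Pc + T_spmv)
        T_red = red(i + 1) - min(overlap, red(i + 1))  # only the exposed part
    return T_calc, T_red, T_spmv
```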

34 Counting flops and messages
[Figure: the p(l)-GMRES timeline over processors p1-p4 with the overlapped part of the reduction marked.]
For the reduction T_red, count only the part that is not overlapped by other work.

35 Machine parameters
Machine parameters t_s, t_w and t_c for several large-scale machines (LogGOPS model parameters measured with the netgauge tool):
Machine (peak)                          | Processor (P_c/P_n)
CHiC Cluster (11.2 TFlop/s)             | Opteron 2218 (2x2)
SGI Altix 4700 (13.1 TFlop/s)           | Itanium II (2x2)
Jugene, BG/P, CNK (825.5 TFlop/s)       | PowerPC 450 (4)
Intrepid, BG/P, ZeptoOS (458.6 TFlop/s) | PowerPC 450 (4)
Jaguar, Cray XT-4 (1.38 PFlop/s)        | Opteron 1354 (4)
Lynx, Intel Exa Leuven (4.3 TFlop/s)    | Xeon X5660 (2x6)
- The model does not consider network topology; BlueGene/P has a dedicated reduction network.
T. Hoefler, T. Schneider and A. Lumsdaine (2010); T. Hoefler, T. Mehlan, A. Lumsdaine and W. Rehm (2007).

36 Predicted scaling up to 200,000 nodes
Prediction of a strong scaling experiment on the XT4 part of Cray Jaguar for a fixed total number of unknowns N.
[Figure: predicted iterations per second (x1000) versus number of nodes (x1000) for GMRES cgs, GMRES mgs, l_1-GMRES, p_1-GMRES and p(1)- to p(3)-GMRES.]

37 Predicted timings on 200,000 nodes
[Figure: predicted time per iteration (s) on 200,000 nodes for cgs, l_1, p_1, p(1), p(2) and p(3) on CHiC, Altix, CNK, ZeptoOS, XT4 and Lynx, split into global all-reduce, matrix-vector communication and calculations.]
- The calculation part grows slightly, due to redundant computations.
- Global reductions can be overlapped.
- Larger benefits for long latencies and fast processors.

38-39 Discussion
Non-blocking collective operations:
- Proposed for the MPI-3 standard; MPICH2 from svn already has support: MPI_Iallreduce uses MPI_Isend/MPI_Irecv, call MPI_Test or MPI_Wait (see the sketch after this slide).
- libNBC, see T. Hoefler and A. Lumsdaine (2006).
- Alternatively, call a blocking reduce from a dedicated thread.
- Some network cards support reductions in hardware (f.i. BlueGene/P).
Preconditioning:
- Pipelining can be combined with preconditioning.
- A preconditioner makes the SpMV more costly, so shorter pipelines suffice.
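With MPI-3 style non-blocking collectives, overlapping the pipelined reduction with the SpMV looks schematically like this mpi4py sketch (illustrative; `local_spmv` and the buffer layout are placeholders, not from the talk):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def overlapped_dots(Z_local, z_local, local_spmv):
    """Start the global reduction for the dot-products of z, do the local
    SpMV while it is in flight, and only wait when the result is needed."""
    local = Z_local.T @ z_local                       # local parts of the dot-products
    glob = np.empty_like(local)
    req = comm.Iallreduce(local, glob, op=MPI.SUM)    # non-blocking all-reduce
    w_local = local_spmv(z_local)                     # useful work during the latency
    req.Wait()                                        # reduction result needed from here on
    return glob, w_local
```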

40-41 Discussion
What if classical Gram-Schmidt is not good enough?
- Check the orthogonality of the basis and reorthogonalize if necessary.
- Reorthogonalization can also be pipelined.
- Does this increase the pipeline depth?
Comparison with s-step GMRES:
- s-step GMRES improves arithmetic intensity via dense QR (TSQR) and the matrix-powers kernel.
- s-step GMRES is hard to combine with preconditioning; p(l)-GMRES is easy to combine with preconditioning.
- For p(l)-GMRES a small l will be enough, which is good for stability; in s-step GMRES, s should be as large as possible.
- Pipelining avoids bursts of communication.
M. Hoemmen, PhD thesis (2010)

42-43 Conclusions & Outlook
Conclusions:
- Presented GMRES variations in which the dot-product latency can be overlapped, at the expense of some redundant flops.
- Studied performance/scaling with a simple analytical model.
- Pipelining can already help to hide expensive global synchronization costs.
Outlook:
- Efficient implementations, benchmarking.
- Pipelining for short-recurrence methods (CG, BiCGStab, ...), with less redundant work.
- Fault tolerance?

44 References
- E. Carson, N. Knight and J. Demmel (2011)
- T. Hoefler, T. Schneider and A. Lumsdaine (2010)
- T. Hoefler, T. Mehlan, A. Lumsdaine and W. Rehm (2007)
- M. Hoemmen, PhD thesis (2010)
