B. Reps 1, P. Ghysels 2, O. Schenk 4, K. Meerbergen 3,5, W. Vanroose 1,3. IHP 2015, Paris FRx

Size: px
Start display at page:

Download "B. Reps 1, P. Ghysels 2, O. Schenk 4, K. Meerbergen 3,5, W. Vanroose 1,3. IHP 2015, Paris FRx"

Transcription

1 Communication Avoiding and Hiding in preconditioned Krylov solvers 1 Applied Mathematics, University of Antwerp, Belgium 2 Future Technology Group, Berkeley Lab, USA 3 Intel ExaScience Lab, Belgium 4 USI, Lugano, CH 5 KU Leuven, Belgium B. Reps 1, P. Ghysels 2, O. Schenk 4, K. Meerbergen 3,5, W. Vanroose 1,3 IHP 2015, Paris FRx

2 Overview Arithmetic Intensity in Krylov Arithmetic Intensity in Multigrid Pipelining communication and computation in Krylov

3 GMRES, classical Gram-Schmidt 1: r 0 b Ax 0 2: v 0 r 0 / r 0 2 3: for i = 0,..., m 1 do 4: w Av i 5: for j = 0,..., i do 6: h j,i w, v j 7: end for 8: ṽ i+1 w i j=1 h j,iv j 9: h i+1,i ṽ i : v i+1 ṽ i+1 /h i+1,i 11: {apply Givens rotations to h :,i } 12: end for 13: y m argmin H m+1,m y m r 0 2 e : x x 0 + V m y m Sparse Matrix-Vector product Only communication with neighbors Good scaling Dot-product Global communication Scales as log(p) Scalar vector multiplication, vector-vector addition No communication

4 Part I Arithmetic Intensity and Sparse Matrix vector product

5 Performance of Kernel of dependent SpMV SIMD: SSE AVX mm512 on Xeon Phi. Similar with NEON on ARM. Bandwidth is the bottleneck and will remain even with 3D stacked memory. attainable GFLOP/sec peak memory BW with vectorization peak floating-point 1 1/8 1/4 1/ Arithmetic Intensity FLOP/Byte Arithmetic intensity (Flop/byte) S. Williams, A. Waterman and D. Patterson (2008)

6 Arithmetic intensity of k dependent SpMVs 1 SpMV k SpMVs k SpMVs in place flops 2n nz 2k n nz 2k n nz words moved n nz + 2n kn nz + 2kn n nz + 2n q 2 2 2k J. Demmel, CS on Communication-Avoiding algorithms M. Hoemmen, Communication-avoiding Krylov subspace methods. (2010).

7 Increasing the arithmetic intensity Divide the domain in tiles which fit in the cache Ground surface is loaded in cache and reused k times Redundant work at the tile edges s+1 u s+1 i,j 3 s u s i 1,j u s i,j u s i,j+1 u s i+1,j s 2 1 u s i,j B 0 B P. Ghysels, P. Klosiewicz, and W. Vanroose. Improving the arithmetic intensity of multigrid with the help of polynomial smoothers. Num. Lin. Alg. Appl. 19 (2012):

8 Stencil Compilers. (polytope model) Pluto Automatic loop parallelization, Locality optimizations based on the polyhedral model, add OpenMP pragma s around the outer loop, ivdep and vector always. compilers (guided) auto-vectorization. U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model, Int. Conf. Compiler Construction (ETAPS CC), Apr 2008, Budapest, Hungary.

9 Increasing the arithmetic intensity with a stencil compiler (Pluto) naive naive, no-vec pluto pluto, no-vec peak roofline fitted roofline naive naive, no-vec pluto pluto, no-vec peak roofline fitted roofline GFlop/s 64 GFlop/s arithmetic intensity (flop/byte) Dual socket Intel Sandy Bridge, 16 2 threads arithmetic intensity (flop/byte) Intel Xeon Phi, 61 4 threads

10 Krylov methods Classical Krylov K k (A, v) = span{v, Av, A 2 v,..., A k 1 v} the residual and error can then be written as r (k) = P k (A)r (0) (1)

11 Krylov methods Classical Krylov K k (A, v) = span{v, Av, A 2 v,..., A k 1 v} the residual and error can then be written as r (k) = P k (A)r (0) (1) Polynomial Preconditioning P m (A)x = q(a)ax = q(a)b K k (P m (A), v) = span{v, P m (A)v, P m (A) 2 v,..., P m (A) k 1 v} where P m (A) = (A σ 1 )(A σ 2 )... (A σ m ) is a fixed low-order polynomial r (k) = Q k (P m (A))r (0) (2)

12 Incomplete list of the literature on polynomial preconditioning Y. Saad, Iterative methods for sparse linear systems Chapter 12, SIAM (2003). D. O Leary, Yet another polynomials preconditioner for the Conjugate Gradient Algorithm, Lin. Alg. Appl. (1991) p377 A. van der Ploeg, Preconditioning for sparse matrices with applications, (1994) M. Van Gijzen, A polynomial preconditioner for the GMRES algorithm J. Comp. Appl. Math 59 (1995): A. Basermann, B. Reichel, C. Schelthoff, Preconditioned CG methods for sparse matrices on massively parallel machines, 23 (1997),

13 Convergence of CG with different short Polynomials

14 Time to solution is reduced

15 Recursive calculation of w k+1 = p k+1 (A)v 1: σ 1 = θ/δ 2: ρ 0 = 1/σ 1 3: w 0 = 0, w 1 = 1 θ Av 4: w 1 = w 1 w 0 5: for k=1,... do 6: ρ k = 1/(2σ 1 ρ k 1 ) 7: w k+1 = ρ k [ 2 δ A(v w k) + ρ k 1 w k ] 8: w k+1 = w k + w k+1 9: end for Mangled with Pluto to raise arithmetic intensity.

16 Average cost for each matvec averaged time for each matvec total matvec dot-product axpy order of polynomial 2D finite difference poisson problem on i7-2860qm

17 attainable GFLOP/sec peak memory BW with vectorization peak floating-point 1 1/8 1/4 1/ Arithmetic Intensity FLOP/Byte Arithmetic intensity (Flop/byte) S. Williams, A. Waterman and D. Patterson (2008)

18 Relying on compiler autovectorization averaged time for each matvec total matvec dot-product axpy total SIMD matvec SIMD order of polynomial Intel i7-2860qm

19 Part II Arithmetic Intensity in Multigrid

20 Arithmetic Intensity Multigrid where MG = (I I2h h 2h (...)Ih A)S ν 1, (3) S is the smoother, for example ω-jacobi where S = I ωd 1 A, The interpolation I2h h and restriction I 2h h. These are all low arithmetic intensity.

21 Multigrid Cost Model Increasing the number of smoothing steps per level 1 60 convergence rate # V-cycles smoothing steps smoothing steps MG iterations = log(10 8 )/ log(ρ(v-cycle)) where ρ(v-cycle) is the spectral radius of the V-cycle, can be rounded to higher integer

22 Work Unit Cost model Work Unit cost model ignores memory bandwidth 1 WU = smoother cost = O(n) (9ν + 19)( ) log(tol) log(ρ) WU (9ν + 19) 4 log(tol) 3 log(ρ) WU # V-cycles work units smoothing steps smoothing steps

23 avg perf (Gflop/s) peak BW, peak Flops no SIMD stream BW # smoothing steps Attainable GFlop/sec avg cost (s/gflop) peak BW, peak Flops no SIMD stream BW # smoothing steps Average cost in sec/gflop ν 1 ω-jac = flops (ν 1 ω-jac) roof (q(ν 1 ω-jac)) ν 1flops (ω-jac) roof (ν 1 q 1 (ω-jac)) (4)

24 Roofline-based vs naive cost model roofline model naive model work units # smoothing steps

25 Timings for 2D (Sandy Bridge) time (s) naive pluto naive model roofline model smoothing steps P. Ghysels and W. Vanroose, Modelling the performance of geometric multigrid on many-core computer architectures, Exascience.com techreport Sept, 2013,

26 3d results (Sandy Bridge)

27 Current Krylov Solvers Krylov Solver User Matvec User provides a matvec routine and selects Krylov method at command line. No opportunities to increase arithmetic intensity.

28 Current Krylov Solvers Krylov Solver Future Krylov Solvers User provides a stencil routine. User Matvec Stencil compiler increases arithmetic intensity. Krylov Solver User provides a matvec routine and selects Krylov method at command line. No opportunities to increase arithmetic intensity. Stencil Compiler (Pochoir, Pluto,...) W.Vanroose, P. Ghysels, D. Roose and K. Meerbergen, Position paper at DOE Exascale Mathematics workshop User Stencil

29 Part III Pipelining communication and computations

30 GMRES, classical Gram-Schmidt 1: r 0 := b Ax 0 2: v 0 := r 0 / r 0 2 3: for i = 0,..., m 1 do 4: w := Av i 5: for j = 0,..., i do 6: h j,i := w, v j 7: end for 8: ṽ i+1 := w i j=1 h j,iv j 9: h i+1,i := ṽ i : v i+1 := ṽ i+1 /h i+1,i 11: {apply Givens rotations to h :,i } 12: end for 13: y m := argmin H m+1,m y m r 0 2 e : x := x 0 + V m y m Sparse Matrix-Vector product Only communication with neighbors Good scaling but BW limited. Dot-product Global communication Scales as log(p) Scalar vector multiplication, vector-vector addition No communication

31 GMRES vs Pipelined GMRES iteration on 4 nodes Classical GMRES p4 p3 + + p2 p SpMV local dot reduction b-cast Gram-Schmidt axpy norm reduction b-castscale H update Normalization Pipelined GMRES p4 p3 + p2 p1 + + reduction bcastscale axpy H update correction local dot P. Ghysels, T. Ashby, K. Meerbergen and W. Vanroose Hiding global communication latency in the GMRES algorithm on massively parallel machines. SIAM J. Scientific Computing, 35(1):C48C71, (2013).

32 Better Scaling x1000 iterations/s GMRES cgs GMRES mgs l 1 -GMRES p 1 -GMRES p(1)-gmres p(2)-gmres p(3)-gmres x1000 nodes Prediction of a strong scaling experiment for GMRES on XT4 part of Cray Jaguar.

33 Are there similar opportunities in preconditioned Conjugate Gradients? Preconditioned Conjugate Gradient 1: r 0 := b Ax 0 ; u 0 := M 1 r 0 ; p 0 := u 0 2: for i = 0,..., m 1 do 3: s := Ap i 4: α := r i, u i / s, p i 5: x i+1 := x i + αp i 6: r i+1 := r i αs 7: u i+1 := M 1 r i+1 8: β := r i+1, u i+1 / r i, u i 9: p i+1 := u i+1 + βp i 10: end for

34 Chronopoulos/Gear CG Only one global reduction each iteration. 1: r 0 := b Ax 0 ; u 0 := M 1 r 0 ; w 0 := Au 0 2: α 0 := r 0, u 0 / w 0, u 0 ; β := 0; γ 0 := r 0, u 0 3: for i = 0,..., m 1 do 4: p i := u i + β i p i 1 5: s i := w i + β i s i 1 6: x i+1 := x i + αp i 7: r i+1 := r i αs i 8: u i+1 := M 1 r i+1 9: w i+1 := Au i+1 10: γ i+1 := r i+1, u i+1 11: δ := w i+1, u i+1 12: β i+1 := γ i+1 /γ i 13: α i+1 := γ i+1 /(δ β i+1 γ i+1 /α i ) 14: end for

35 pipelined Chronopoulos/Gear CG Global reduction overlaps with matrix vector product. 1: r 0 := b Ax 0 ; w 0 := Au 0 2: for i = 0,..., m 1 do 3: γ i := r i, r i 4: δ := w i, r i 5: q i := Aw i 6: if i > 0 then 7: β i := γ i /γ i 1 ; α i := γ i /(δ β i γ i /α i 1 ) 8: else 9: β i := 0; α i := γ i /δ 10: end if 11: z i := q i + β i z i 1 12: s i := w i + β i s i 1 13: p i := r i + β i p i 1 14: x i+1 := x i + α i p i 15: r i+1 := r i α i s i 16: w i+1 := w i α i z i 17: end for

36 Preconditioned pipelined CG 1: r 0 := b Ax 0 ; u 0 := M 1 r 0 ; w 0 := Au 0 2: for i = 0,... do 3: γ i := r i, u i 4: δ := w i, u i 5: m i := M 1 w i 6: n i := Am i 7: if i > 0 then 8: β i := γ i /γ i 1 ; α i := γ i / (δ β i γ i /α i 1 ) 9: else 10: β i := 0; α i := γ i /δ 11: end if 12: z i := n i + β i z i 1 13: q i := m i + β i q i 1 14: s i := w i + β i s i 1 15: p i := u i + β i p i 1 16: x i+1 := x i + α i p i 17: r i+1 := r i α i s i 18: u i+1 := u i α i q i 19: w i+1 := w i α i z i 20: end for

37 Preconditioned pipelined CR 1: r 0 := b Ax 0 ; u 0 := M 1 r 0 ; w 0 := Au 0 2: for i = 0,... do 3: m i := M 1 w i 4: γ i := w i, u i 5: δ := m i, w i 6: n i := Am i 7: if i > 0 then 8: β i := γ i /γ i 1 ; α i := γ i / (δ β i γ i /α i 1 ) 9: else 10: β i := 0; α i := γ i /δ 11: end if 12: z i := n i + β i z i 1 13: q i := m i + β i q i 1 14: p i := u i + β i p i 1 15: x i+1 := x i + α i p i 16: u i+1 := u i α i q i 17: w i+1 := w i α i z i 18: end for

38 Cost Model G := time for a global reduction SpMV := time for a sparse-matrix vector product PC := time for preconditioner application local work such as AXPY is neglected flops time (excl, axpys, dots) #glob syncs memory CG 10 2G + SpMV + PC 2 4 Chro/Gea 12 G + SpMV + PC 1 5 CR 12 2G + SpMV + PC 2 5 pipe-cg 20 max(g, SpMV + PC) 1 9 pipe-cr 16 max(g, SpMV) + PC 1 7 Gropp-CG 14 max(g, SpMV) + max(g, PC) 2 6 P. Ghysels and W. Vanroose, Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm, Parallel Computing, 40, (2014), Pages

39 Better Scaling Hydrostatic ice sheet flow, Q1 finite elements line search Newton method (rtol = 10 8, atol = ) CG with block Jacobi ICC(0) precond (rtol = 10 5, atol = ) Measured speedup over standard CG for different variations of pipelined CG.

40 MPI trace

41 Pipelined methods in PETSc (from 3.4.2) Krylov methods: KSPPIPECR, KSPPIPECG, KSPGROPPCG, KSPPGMRES Uses MPI-3 non-blocking collectives export MPICH ASYNC PROGRES=1 1:... 2: KSP PCApply(ksp,W,M); /* m Bw */ 3: if (i > 0 && ksp normtype == KSP NORM PRECONDITIONED) 4: VecNormBegin(U,NORM 2,&dp); 5: VecDotBegin(W,U,&gamma); 6: VecDotBegin(M,W,&delta); 7: PetscCommSplitReductionBegin(PetscObjectComm((PetscObject)U)); 8: KSP MatMult(ksp,Amat,M,N); /* n Am */ 9: if (i > 0 && ksp normtype == KSP NORM PRECONDITIONED) 10: VecNormEnd(U,NORM 2,&dp); 11: VecDotEnd(W,U,&gamma); 12: VecDotEnd(M,W,&delta); 13:...

42 Krylov subspace methods with additional global reductions Deflation: Remove a few known annoying eigenvectors. Helmholtz. FETI methods.... Augmenting: adds a subspace to the Krylov subspace, e.g. recycling. Newton-Krylov methods. Numerical Continuation. Coarse Solver in multigrid.... Sn := Kn(A, v) + U (5) where U forms a basis for U. x n = x 0 + V n y n + Uu n

43 Deflation with eigenvectors with smallest eigenvalues Smallest eigenvalues and vectors: A [w 1, w 2,..., w m ] = [λ 1 w 1, λ 2 w 2,..., λ m w m ] + ɛ[θ] where λ 1 < λ 2 < λ 3 <... < λ m <... and Θ 1. W := [w 1, w 2,..., w m ] Correction step e = W (W T AW ) 1 W T r

44 Deflated CG 1: r 1 := b Ax 1 2: x 0 := x 1 + W (W T AW ) 1 W T r 1 3: r 0 := b Ax 0 4: p 0 := r 0 W (W T AW ) 1 W T Ar 0 5: for i = 0,... do 6: s := Ap i 7: α i := r i, r i / s, p i 8: x i+1 := x i + α i p i 9: r i+1 := r i α i s 10: β i := r i+1, r i+1 / r i, r i 11: w := Ar i+1 12: σ := W, w 13: p i+1 := r i+1 + β i p i W (W T AW ) 1 σ 14: end for x 0 such that W T r 0 = 0 with r 0 = b Ax 0 (cfr init-cg) σ = W, w = W, Ar i+1 = AW, r i+1 : store AW? CG on H T AH x = H T b with H = I W (W T AW ) 1 (AW ) T

45 Pipelined Deflation CG

46 Selective Deflation

47

48

49

50 Conclusions Two communication bottlenecks: limited BW in Av. No benefit from SIMD. Latencies in (u, v). Poor strong scaling.

51 Conclusions Two communication bottlenecks: limited BW in Av. No benefit from SIMD. Latencies in (u, v). Poor strong scaling. Replace Av with a P m (A)v with a fixed coefficients. Execute with a stencil compiler. Combine with Multigrid. Many open question for unstructured matrices.

52 Conclusions Two communication bottlenecks: limited BW in Av. No benefit from SIMD. Latencies in (u, v). Poor strong scaling. Replace Av with a P m (A)v with a fixed coefficients. Execute with a stencil compiler. Combine with Multigrid. Many open question for unstructured matrices. Pipelining of Krylov algorithms reduce the number of global reductions and hide their latencies. They can scale as the Sparse Matrix-Vector product. The pipelined GMRES and CG code is in PETSc-current. Extension to Deflated and Augmented Krylov methods. Algorithms with additional global reductions.

53 Conclusions Two communication bottlenecks: limited BW in Av. No benefit from SIMD. Latencies in (u, v). Poor strong scaling. Replace Av with a P m (A)v with a fixed coefficients. Execute with a stencil compiler. Combine with Multigrid. Many open question for unstructured matrices. Pipelining of Krylov algorithms reduce the number of global reductions and hide their latencies. They can scale as the Sparse Matrix-Vector product. The pipelined GMRES and CG code is in PETSc-current. Extension to Deflated and Augmented Krylov methods. Algorithms with additional global reductions. Acknowledgements: EU s Seventh Framework Programme (FP7/ ) under grant agreement n Intel and the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT).

EXASCALE COMPUTING. Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm. P. Ghysels W.

EXASCALE COMPUTING. Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm. P. Ghysels W. ExaScience Lab Intel Labs Europe EXASCALE COMPUTING Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm P. Ghysels W. Vanroose Report 12.2012.1, December 2012 This

More information

Communication Avoiding Strategies for the Numerical Kernels in Coupled Physics Simulations

Communication Avoiding Strategies for the Numerical Kernels in Coupled Physics Simulations ExaScience Lab Intel Labs Europe EXASCALE COMPUTING Communication Avoiding Strategies for the Numerical Kernels in Coupled Physics Simulations SIAM Conference on Parallel Processing for Scientific Computing

More information

Scalable Non-blocking Preconditioned Conjugate Gradient Methods

Scalable Non-blocking Preconditioned Conjugate Gradient Methods Scalable Non-blocking Preconditioned Conjugate Gradient Methods Paul Eller and William Gropp University of Illinois at Urbana-Champaign Department of Computer Science Supercomputing 16 Paul Eller and William

More information

ERLANGEN REGIONAL COMPUTING CENTER

ERLANGEN REGIONAL COMPUTING CENTER ERLANGEN REGIONAL COMPUTING CENTER Making Sense of Performance Numbers Georg Hager Erlangen Regional Computing Center (RRZE) Friedrich-Alexander-Universität Erlangen-Nürnberg OpenMPCon 2018 Barcelona,

More information

Efficient Deflation for Communication-Avoiding Krylov Subspace Methods

Efficient Deflation for Communication-Avoiding Krylov Subspace Methods Efficient Deflation for Communication-Avoiding Krylov Subspace Methods Erin Carson Nicholas Knight, James Demmel Univ. of California, Berkeley Monday, June 24, NASCA 2013, Calais, France Overview We derive

More information

MS 28: Scalable Communication-Avoiding and -Hiding Krylov Subspace Methods I

MS 28: Scalable Communication-Avoiding and -Hiding Krylov Subspace Methods I MS 28: Scalable Communication-Avoiding and -Hiding Krylov Subspace Methods I Organizers: Siegfried Cools, University of Antwerp, Belgium Erin C. Carson, New York University, USA 10:50-11:10 High Performance

More information

Finite-choice algorithm optimization in Conjugate Gradients

Finite-choice algorithm optimization in Conjugate Gradients Finite-choice algorithm optimization in Conjugate Gradients Jack Dongarra and Victor Eijkhout January 2003 Abstract We present computational aspects of mathematically equivalent implementations of the

More information

Chapter 7 Iterative Techniques in Matrix Algebra

Chapter 7 Iterative Techniques in Matrix Algebra Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition

More information

Lecture 17: Iterative Methods and Sparse Linear Algebra

Lecture 17: Iterative Methods and Sparse Linear Algebra Lecture 17: Iterative Methods and Sparse Linear Algebra David Bindel 25 Mar 2014 Logistics HW 3 extended to Wednesday after break HW 4 should come out Monday after break Still need project description

More information

Algebraic Multigrid as Solvers and as Preconditioner

Algebraic Multigrid as Solvers and as Preconditioner Ò Algebraic Multigrid as Solvers and as Preconditioner Domenico Lahaye domenico.lahaye@cs.kuleuven.ac.be http://www.cs.kuleuven.ac.be/ domenico/ Department of Computer Science Katholieke Universiteit Leuven

More information

Solving Ax = b, an overview. Program

Solving Ax = b, an overview. Program Numerical Linear Algebra Improving iterative solvers: preconditioning, deflation, numerical software and parallelisation Gerard Sleijpen and Martin van Gijzen November 29, 27 Solving Ax = b, an overview

More information

High performance Krylov subspace method variants

High performance Krylov subspace method variants High performance Krylov subspace method variants and their behavior in finite precision Erin Carson New York University HPCSE17, May 24, 2017 Collaborators Miroslav Rozložník Institute of Computer Science,

More information

- Part 4 - Multicore and Manycore Technology: Chances and Challenges. Vincent Heuveline

- Part 4 - Multicore and Manycore Technology: Chances and Challenges. Vincent Heuveline - Part 4 - Multicore and Manycore Technology: Chances and Challenges Vincent Heuveline 1 Numerical Simulation of Tropical Cyclones Goal oriented adaptivity for tropical cyclones ~10⁴km ~1500km ~100km 2

More information

Communication-avoiding Krylov subspace methods

Communication-avoiding Krylov subspace methods Motivation Communication-avoiding Krylov subspace methods Mark mhoemmen@cs.berkeley.edu University of California Berkeley EECS MS Numerical Libraries Group visit: 28 April 2008 Overview Motivation Current

More information

9.1 Preconditioned Krylov Subspace Methods

9.1 Preconditioned Krylov Subspace Methods Chapter 9 PRECONDITIONING 9.1 Preconditioned Krylov Subspace Methods 9.2 Preconditioned Conjugate Gradient 9.3 Preconditioned Generalized Minimal Residual 9.4 Relaxation Method Preconditioners 9.5 Incomplete

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Iterative Methods for Linear Systems of Equations

Iterative Methods for Linear Systems of Equations Iterative Methods for Linear Systems of Equations Projection methods (3) ITMAN PhD-course DTU 20-10-08 till 24-10-08 Martin van Gijzen 1 Delft University of Technology Overview day 4 Bi-Lanczos method

More information

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication. CME342 Parallel Methods in Numerical Analysis Matrix Computation: Iterative Methods II Outline: CG & its parallelization. Sparse Matrix-vector Multiplication. 1 Basic iterative methods: Ax = b r = b Ax

More information

Lecture 18 Classical Iterative Methods

Lecture 18 Classical Iterative Methods Lecture 18 Classical Iterative Methods MIT 18.335J / 6.337J Introduction to Numerical Methods Per-Olof Persson November 14, 2006 1 Iterative Methods for Linear Systems Direct methods for solving Ax = b,

More information

1 Conjugate gradients

1 Conjugate gradients Notes for 2016-11-18 1 Conjugate gradients We now turn to the method of conjugate gradients (CG), perhaps the best known of the Krylov subspace solvers. The CG iteration can be characterized as the iteration

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems

An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems P.-O. Persson and J. Peraire Massachusetts Institute of Technology 2006 AIAA Aerospace Sciences Meeting, Reno, Nevada January 9,

More information

Preface to the Second Edition. Preface to the First Edition

Preface to the Second Edition. Preface to the First Edition n page v Preface to the Second Edition Preface to the First Edition xiii xvii 1 Background in Linear Algebra 1 1.1 Matrices................................. 1 1.2 Square Matrices and Eigenvalues....................

More information

Some Geometric and Algebraic Aspects of Domain Decomposition Methods

Some Geometric and Algebraic Aspects of Domain Decomposition Methods Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 4: Iterative Methods PD

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning

Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology, USA SPPEXA Symposium TU München,

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

FAS and Solver Performance

FAS and Solver Performance FAS and Solver Performance Matthew Knepley Mathematics and Computer Science Division Argonne National Laboratory Fall AMS Central Section Meeting Chicago, IL Oct 05 06, 2007 M. Knepley (ANL) FAS AMS 07

More information

Deflation for inversion with multiple right-hand sides in QCD

Deflation for inversion with multiple right-hand sides in QCD Deflation for inversion with multiple right-hand sides in QCD A Stathopoulos 1, A M Abdel-Rehim 1 and K Orginos 2 1 Department of Computer Science, College of William and Mary, Williamsburg, VA 23187 2

More information

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems Topics The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems What about non-spd systems? Methods requiring small history Methods requiring large history Summary of solvers 1 / 52 Conjugate

More information

M.A. Botchev. September 5, 2014

M.A. Botchev. September 5, 2014 Rome-Moscow school of Matrix Methods and Applied Linear Algebra 2014 A short introduction to Krylov subspaces for linear systems, matrix functions and inexact Newton methods. Plan and exercises. M.A. Botchev

More information

Stabilization and Acceleration of Algebraic Multigrid Method

Stabilization and Acceleration of Algebraic Multigrid Method Stabilization and Acceleration of Algebraic Multigrid Method Recursive Projection Algorithm A. Jemcov J.P. Maruszewski Fluent Inc. October 24, 2006 Outline 1 Need for Algorithm Stabilization and Acceleration

More information

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294)

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294) Conjugate gradient method Descent method Hestenes, Stiefel 1952 For A N N SPD In exact arithmetic, solves in N steps In real arithmetic No guaranteed stopping Often converges in many fewer than N steps

More information

Scalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems

Scalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems Scalable Domain Decomposition Preconditioners For Heterogeneous Elliptic Problems Pierre Jolivet, F. Hecht, F. Nataf, C. Prud homme Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann INRIA Rocquencourt

More information

The Lanczos and conjugate gradient algorithms

The Lanczos and conjugate gradient algorithms The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008 1 The Lanczos algorithm 2 The Lanczos algorithm in finite precision 3 The nonsymmetric Lanczos algorithm 4 The Golub Kahan bidiagonalization

More information

Implementation of a preconditioned eigensolver using Hypre

Implementation of a preconditioned eigensolver using Hypre Implementation of a preconditioned eigensolver using Hypre Andrew V. Knyazev 1, and Merico E. Argentati 1 1 Department of Mathematics, University of Colorado at Denver, USA SUMMARY This paper describes

More information

Minisymposia 9 and 34: Avoiding Communication in Linear Algebra. Jim Demmel UC Berkeley bebop.cs.berkeley.edu

Minisymposia 9 and 34: Avoiding Communication in Linear Algebra. Jim Demmel UC Berkeley bebop.cs.berkeley.edu Minisymposia 9 and 34: Avoiding Communication in Linear Algebra Jim Demmel UC Berkeley bebop.cs.berkeley.edu Motivation (1) Increasing parallelism to exploit From Top500 to multicores in your laptop Exponentially

More information

Parallel Iterative Methods for Sparse Linear Systems. H. Martin Bücker Lehrstuhl für Hochleistungsrechnen

Parallel Iterative Methods for Sparse Linear Systems. H. Martin Bücker Lehrstuhl für Hochleistungsrechnen Parallel Iterative Methods for Sparse Linear Systems Lehrstuhl für Hochleistungsrechnen www.sc.rwth-aachen.de RWTH Aachen Large and Sparse Small and Dense Outline Problem with Direct Methods Iterative

More information

Some minimization problems

Some minimization problems Week 13: Wednesday, Nov 14 Some minimization problems Last time, we sketched the following two-step strategy for approximating the solution to linear systems via Krylov subspaces: 1. Build a sequence of

More information

Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners

Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners Eugene Vecharynski 1 Andrew Knyazev 2 1 Department of Computer Science and Engineering University of Minnesota 2 Department

More information

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Eftychios Sifakis CS758 Guest Lecture - 19 Sept 2012 Introduction Linear systems

More information

Efficient implementation of the overlap operator on multi-gpus

Efficient implementation of the overlap operator on multi-gpus Efficient implementation of the overlap operator on multi-gpus Andrei Alexandru Mike Lujan, Craig Pelissier, Ben Gamari, Frank Lee SAAHPC 2011 - University of Tennessee Outline Motivation Overlap operator

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

Algorithms and Methods for Fast Model Predictive Control

Algorithms and Methods for Fast Model Predictive Control Algorithms and Methods for Fast Model Predictive Control Technical University of Denmark Department of Applied Mathematics and Computer Science 13 April 2016 Background: Model Predictive Control Model

More information

EXASCALE COMPUTING. Implementation of a 2D Electrostatic Particle in Cell algorithm in UniÞed Parallel C with dynamic load-balancing

EXASCALE COMPUTING. Implementation of a 2D Electrostatic Particle in Cell algorithm in UniÞed Parallel C with dynamic load-balancing ExaScience Lab Intel Labs Europe EXASCALE COMPUTING Implementation of a 2D Electrostatic Particle in Cell algorithm in UniÞed Parallel C with dynamic load-balancing B. Verleye P. Henry R. Wuyts G. Lapenta

More information

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation Tao Zhao 1, Feng-Nan Hwang 2 and Xiao-Chuan Cai 3 Abstract In this paper, we develop an overlapping domain decomposition

More information

Bootstrap AMG. Kailai Xu. July 12, Stanford University

Bootstrap AMG. Kailai Xu. July 12, Stanford University Bootstrap AMG Kailai Xu Stanford University July 12, 2017 AMG Components A general AMG algorithm consists of the following components. A hierarchy of levels. A smoother. A prolongation. A restriction.

More information

A decade of fast and robust Helmholtz solvers

A decade of fast and robust Helmholtz solvers A decade of fast and robust Helmholtz solvers Werkgemeenschap Scientific Computing Spring meeting Kees Vuik May 11th, 212 1 Delft University of Technology Contents Introduction Preconditioning (22-28)

More information

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1 Parallel Numerics, WT 2016/2017 5 Iterative Methods for Sparse Linear Systems of Equations page 1 of 1 Contents 1 Introduction 1.1 Computer Science Aspects 1.2 Numerical Problems 1.3 Graphs 1.4 Loop Manipulations

More information

Spectral analysis of complex shifted-laplace preconditioners for the Helmholtz equation

Spectral analysis of complex shifted-laplace preconditioners for the Helmholtz equation Spectral analysis of complex shifted-laplace preconditioners for the Helmholtz equation C. Vuik, Y.A. Erlangga, M.B. van Gijzen, and C.W. Oosterlee Delft Institute of Applied Mathematics c.vuik@tudelft.nl

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES ITERATIVE METHODS BASED ON KRYLOV SUBSPACES LONG CHEN We shall present iterative methods for solving linear algebraic equation Au = b based on Krylov subspaces We derive conjugate gradient (CG) method

More information

Adaptive algebraic multigrid methods in lattice computations

Adaptive algebraic multigrid methods in lattice computations Adaptive algebraic multigrid methods in lattice computations Karsten Kahl Bergische Universität Wuppertal January 8, 2009 Acknowledgements Matthias Bolten, University of Wuppertal Achi Brandt, Weizmann

More information

Review: From problem to parallel algorithm

Review: From problem to parallel algorithm Review: From problem to parallel algorithm Mathematical formulations of interesting problems abound Poisson s equation Sources: Electrostatics, gravity, fluid flow, image processing (!) Numerical solution:

More information

Parallel sparse linear solvers and applications in CFD

Parallel sparse linear solvers and applications in CFD Parallel sparse linear solvers and applications in CFD Jocelyne Erhel Joint work with Désiré Nuentsa Wakam () and Baptiste Poirriez () SAGE team, Inria Rennes, France journée Calcul Intensif Distribué

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

HPMPC - A new software package with efficient solvers for Model Predictive Control

HPMPC - A new software package with efficient solvers for Model Predictive Control - A new software package with efficient solvers for Model Predictive Control Technical University of Denmark CITIES Second General Consortium Meeting, DTU, Lyngby Campus, 26-27 May 2015 Introduction Model

More information

Multigrid absolute value preconditioning

Multigrid absolute value preconditioning Multigrid absolute value preconditioning Eugene Vecharynski 1 Andrew Knyazev 2 (speaker) 1 Department of Computer Science and Engineering University of Minnesota 2 Department of Mathematical and Statistical

More information

Notes on PCG for Sparse Linear Systems

Notes on PCG for Sparse Linear Systems Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Efficient Serial and Parallel Coordinate Descent Methods for Huge-Scale Convex Optimization

Efficient Serial and Parallel Coordinate Descent Methods for Huge-Scale Convex Optimization Efficient Serial and Parallel Coordinate Descent Methods for Huge-Scale Convex Optimization Martin Takáč The University of Edinburgh Based on: P. Richtárik and M. Takáč. Iteration complexity of randomized

More information

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES 48 Arnoldi Iteration, Krylov Subspaces and GMRES We start with the problem of using a similarity transformation to convert an n n matrix A to upper Hessenberg form H, ie, A = QHQ, (30) with an appropriate

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

FPGA Implementation of a Predictive Controller

FPGA Implementation of a Predictive Controller FPGA Implementation of a Predictive Controller SIAM Conference on Optimization 2011, Darmstadt, Germany Minisymposium on embedded optimization Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan

More information

Recent advances in HPC with FreeFem++

Recent advances in HPC with FreeFem++ Recent advances in HPC with FreeFem++ Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann Fourth workshop on FreeFem++ December 6, 2012 With F. Hecht, F. Nataf, C. Prud homme. Outline

More information

Variants of BiCGSafe method using shadow three-term recurrence

Variants of BiCGSafe method using shadow three-term recurrence Variants of BiCGSafe method using shadow three-term recurrence Seiji Fujino, Takashi Sekimoto Research Institute for Information Technology, Kyushu University Ryoyu Systems Co. Ltd. (Kyushu University)

More information

Krylov Space Solvers

Krylov Space Solvers Seminar for Applied Mathematics ETH Zurich International Symposium on Frontiers of Computational Science Nagoya, 12/13 Dec. 2005 Sparse Matrices Large sparse linear systems of equations or large sparse

More information

IDR(s) A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations

IDR(s) A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations IDR(s) A family of simple and fast algorithms for solving large nonsymmetric systems of linear equations Harrachov 2007 Peter Sonneveld en Martin van Gijzen August 24, 2007 1 Delft University of Technology

More information

Parallel Preconditioning Methods for Ill-conditioned Problems

Parallel Preconditioning Methods for Ill-conditioned Problems Parallel Preconditioning Methods for Ill-conditioned Problems Kengo Nakajima Information Technology Center, The University of Tokyo 2014 Conference on Advanced Topics and Auto Tuning in High Performance

More information

Contribution of Wo¹niakowski, Strako²,... The conjugate gradient method in nite precision computa

Contribution of Wo¹niakowski, Strako²,... The conjugate gradient method in nite precision computa Contribution of Wo¹niakowski, Strako²,... The conjugate gradient method in nite precision computations ªaw University of Technology Institute of Mathematics and Computer Science Warsaw, October 7, 2006

More information

Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer and GPU-Clusters --

Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer and GPU-Clusters -- Parallel Processing for Energy Efficiency October 3, 2013 NTNU, Trondheim, Norway Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer

More information

Preliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools

Preliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools Preliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools Xiangjun Wu (1),Lilun Zhang (2),Junqiang Song (2) and Dehui Chen (1) (1) Center for Numerical Weather Prediction, CMA (2) School

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A. AMSC/CMSC 661 Scientific Computing II Spring 2005 Solution of Sparse Linear Systems Part 2: Iterative methods Dianne P. O Leary c 2005 Solving Sparse Linear Systems: Iterative methods The plan: Iterative

More information

Preconditioned Locally Minimal Residual Method for Computing Interior Eigenpairs of Symmetric Operators

Preconditioned Locally Minimal Residual Method for Computing Interior Eigenpairs of Symmetric Operators Preconditioned Locally Minimal Residual Method for Computing Interior Eigenpairs of Symmetric Operators Eugene Vecharynski 1 Andrew Knyazev 2 1 Department of Computer Science and Engineering University

More information

An Accelerated Block-Parallel Newton Method via Overlapped Partitioning

An Accelerated Block-Parallel Newton Method via Overlapped Partitioning An Accelerated Block-Parallel Newton Method via Overlapped Partitioning Yurong Chen Lab. of Parallel Computing, Institute of Software, CAS (http://www.rdcps.ac.cn/~ychen/english.htm) Summary. This paper

More information

Sparse linear solvers: iterative methods and preconditioning

Sparse linear solvers: iterative methods and preconditioning Sparse linear solvers: iterative methods and preconditioning L Grigori ALPINES INRIA and LJLL, UPMC March 2017 Plan Sparse linear solvers Sparse matrices and graphs Classes of linear solvers Krylov subspace

More information

EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems

EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems JAMES H. MONEY and QIANG YE UNIVERSITY OF KENTUCKY eigifp is a MATLAB program for computing a few extreme eigenvalues

More information

Error Bounds for Iterative Refinement in Three Precisions

Error Bounds for Iterative Refinement in Three Precisions Error Bounds for Iterative Refinement in Three Precisions Erin C. Carson, New York University Nicholas J. Higham, University of Manchester SIAM Annual Meeting Portland, Oregon July 13, 018 Hardware Support

More information

A parameter tuning technique of a weighted Jacobi-type preconditioner and its application to supernova simulations

A parameter tuning technique of a weighted Jacobi-type preconditioner and its application to supernova simulations A parameter tuning technique of a weighted Jacobi-type preconditioner and its application to supernova simulations Akira IMAKURA Center for Computational Sciences, University of Tsukuba Joint work with

More information

A communication-avoiding thick-restart Lanczos method on a distributed-memory system

A communication-avoiding thick-restart Lanczos method on a distributed-memory system A communication-avoiding thick-restart Lanczos method on a distributed-memory system Ichitaro Yamazaki and Kesheng Wu Lawrence Berkeley National Laboratory, Berkeley, CA, USA Abstract. The Thick-Restart

More information

A Parallel Scalable PETSc-Based Jacobi-Davidson Polynomial Eigensolver with Application in Quantum Dot Simulation

A Parallel Scalable PETSc-Based Jacobi-Davidson Polynomial Eigensolver with Application in Quantum Dot Simulation A Parallel Scalable PETSc-Based Jacobi-Davidson Polynomial Eigensolver with Application in Quantum Dot Simulation Zih-Hao Wei 1, Feng-Nan Hwang 1, Tsung-Ming Huang 2, and Weichung Wang 3 1 Department of

More information

Multigrid Methods and their application in CFD

Multigrid Methods and their application in CFD Multigrid Methods and their application in CFD Michael Wurst TU München 16.06.2009 1 Multigrid Methods Definition Multigrid (MG) methods in numerical analysis are a group of algorithms for solving differential

More information

Conjugate Gradient Method

Conjugate Gradient Method Conjugate Gradient Method direct and indirect methods positive definite linear systems Krylov sequence spectral analysis of Krylov sequence preconditioning Prof. S. Boyd, EE364b, Stanford University Three

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

A Balancing Algorithm for Mortar Methods

A Balancing Algorithm for Mortar Methods A Balancing Algorithm for Mortar Methods Dan Stefanica Baruch College, City University of New York, NY 11, USA Dan Stefanica@baruch.cuny.edu Summary. The balancing methods are hybrid nonoverlapping Schwarz

More information

IDR(s) Master s thesis Goushani Kisoensingh. Supervisor: Gerard L.G. Sleijpen Department of Mathematics Universiteit Utrecht

IDR(s) Master s thesis Goushani Kisoensingh. Supervisor: Gerard L.G. Sleijpen Department of Mathematics Universiteit Utrecht IDR(s) Master s thesis Goushani Kisoensingh Supervisor: Gerard L.G. Sleijpen Department of Mathematics Universiteit Utrecht Contents 1 Introduction 2 2 The background of Bi-CGSTAB 3 3 IDR(s) 4 3.1 IDR.............................................

More information

Lab 1: Iterative Methods for Solving Linear Systems

Lab 1: Iterative Methods for Solving Linear Systems Lab 1: Iterative Methods for Solving Linear Systems January 22, 2017 Introduction Many real world applications require the solution to very large and sparse linear systems where direct methods such as

More information

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013

More information

Lecture 8: Fast Linear Solvers (Part 7)

Lecture 8: Fast Linear Solvers (Part 7) Lecture 8: Fast Linear Solvers (Part 7) 1 Modified Gram-Schmidt Process with Reorthogonalization Test Reorthogonalization If Av k 2 + δ v k+1 2 = Av k 2 to working precision. δ = 10 3 2 Householder Arnoldi

More information

Case Study: Quantum Chromodynamics

Case Study: Quantum Chromodynamics Case Study: Quantum Chromodynamics Michael Clark Harvard University with R. Babich, K. Barros, R. Brower, J. Chen and C. Rebbi Outline Primer to QCD QCD on a GPU Mixed Precision Solvers Multigrid solver

More information

Kasetsart University Workshop. Multigrid methods: An introduction

Kasetsart University Workshop. Multigrid methods: An introduction Kasetsart University Workshop Multigrid methods: An introduction Dr. Anand Pardhanani Mathematics Department Earlham College Richmond, Indiana USA pardhan@earlham.edu A copy of these slides is available

More information

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

Computers and Mathematics with Applications

Computers and Mathematics with Applications Computers and Mathematics with Applications 68 (2014) 1151 1160 Contents lists available at ScienceDirect Computers and Mathematics with Applications journal homepage: www.elsevier.com/locate/camwa A GPU

More information

arxiv: v1 [hep-lat] 28 Aug 2014 Karthee Sivalingam

arxiv: v1 [hep-lat] 28 Aug 2014 Karthee Sivalingam Clover Action for Blue Gene-Q and Iterative solvers for DWF arxiv:18.687v1 [hep-lat] 8 Aug 1 Department of Meteorology, University of Reading, Reading, UK E-mail: K.Sivalingam@reading.ac.uk P.A. Boyle

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning

Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Erin C. Carson, Nicholas Knight, James Demmel, Ming Gu U.C. Berkeley SIAM PP 12, Savannah, Georgia, USA, February

More information