Solving PDEs on Supercomputers I: modern supercomputer architecture

1 Supercomputer architecture Solving PDEs on Supercomputers I: modern supercomputer architecture Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS I May 17, / 17

2 Supercomputer architecture Moore's Law Moore's Law: The number of transistors per unit area on integrated circuits doubles every two years. (1965) P. E. Farrell (Oxford) SPS I May 17, / 17

3 Supercomputer architecture Moore's Law The consequence: Individual computers aren't getting faster: we're getting more of them. P. E. Farrell (Oxford) SPS I May 17, / 17

4 Supercomputer architecture A modern supercomputer In this lecture we will give a brief overview of modern supercomputer architecture. ARCHER is composed of 4920 nodes, each with 24 cores, for a total of 118,080 cores. P. E. Farrell (Oxford) SPS I May 17, / 17

5 Supercomputer architecture A node P. E. Farrell (Oxford) SPS I May 17, / 17

6 Supercomputer architecture A node Algorithmic consequence Extreme pressure on memory and memory bandwidth. P. E. Farrell (Oxford) SPS I May 17, / 17

7 Supercomputer architecture A socket P. E. Farrell (Oxford) SPS I May 17, / 17

8 Supercomputer architecture A socket Algorithmic consequence Want to have multiple cores working on the same data. P. E. Farrell (Oxford) SPS I May 17, / 17

9 Supercomputer architecture A core P. E. Farrell (Oxford) SPS I May 17, / 17

10 Supercomputer architecture A core Algorithmic consequence Vectorisation essential for maximum floating point performance. P. E. Farrell (Oxford) SPS I May 17, / 17

11 Supercomputer architecture Hardware properties Some relative timings On a 3.0 GHz Intel Core 2 Duo E8400: One clock cycle: 1/3 of a nanosecond (≈ 10 light-cm!). Accessing L1 data cache (32 KB): 3 cycles. Accessing L2 cache (6 MB): 14 cycles. Accessing main memory: 250 cycles. Accessing disk: 40 million cycles. P. E. Farrell (Oxford) SPS I May 17, / 17

12 Supercomputer architecture Hardware properties Some relative timings On a 3.0 GHz Intel Core 2 Duo E8400: One clock cycle: 1/3 of a nanosecond (≈ 10 light-cm!). Accessing L1 data cache (32 KB): 3 cycles. Accessing L2 cache (6 MB): 14 cycles. Accessing main memory: 250 cycles. Accessing disk: 40 million cycles. Analogy Register: the data is on your working paper. L1 cache: the data is on your desk (3 seconds). L2 cache: the data is on your bookshelf (14 seconds). Main memory: the data is in the library (a 4 minute walk). P. E. Farrell (Oxford) SPS I May 17, / 17

13 Supercomputer architecture Hardware properties Some relative timings On a 3.0 GHz Intel Core 2 Duo E8400: One clock cycle: 1/3 of a nanosecond (≈ 10 light-cm!). Accessing L1 data cache (32 KB): 3 cycles. Accessing L2 cache (6 MB): 14 cycles. Accessing main memory: 250 cycles. Accessing disk: 40 million cycles. Analogy Register: the data is on your working paper. L1 cache: the data is on your desk (3 seconds). L2 cache: the data is on your bookshelf (14 seconds). Main memory: the data is in the library (a 4 minute walk). Disk: go backpacking for 1.2 years. P. E. Farrell (Oxford) SPS I May 17, / 17
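You can see the memory hierarchy even from Python. A minimal numpy sketch (not from the slides): summing the same data through a random permutation defeats the caches and the prefetcher, so the second sum is typically several times slower than the sequential one.

    import time
    import numpy as np

    x = np.random.rand(2**24)            # ~128 MB of doubles, far larger than any cache
    seq = np.arange(x.size)              # sequential (cache-friendly) index
    rnd = np.random.permutation(x.size)  # random (cache-hostile) index

    t0 = time.perf_counter(); s1 = x[seq].sum(); t1 = time.perf_counter()
    t2 = time.perf_counter(); s2 = x[rnd].sum(); t3 = time.perf_counter()
    print("sequential: %.3fs, random: %.3fs" % (t1 - t0, t3 - t2))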

14 Supercomputer architecture Hardware properties The interconnect P. E. Farrell (Oxford) SPS I May 17, / 17

15 Supercomputer architecture Hardware properties Some more timings On the Cray Aries interconnect, to send a message: Within a socket: 800 cycles Within a node: 1600 cycles Across the machine: 8000 cycles P. E. Farrell (Oxford) SPS I May 17, / 17

16 Supercomputer architecture Hardware properties Some more timings On the Cray Aries interconnect, to send a message: Within a socket: 800 cycles Within a node: 1600 cycles Across the machine: 8000 cycles Algorithmic consequence Interleave communication and computation. P. E. Farrell (Oxford) SPS I May 17, / 17
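One way to interleave communication and computation from Python is with mpi4py's nonblocking calls. A minimal sketch (not from the slides, assuming mpi4py and numpy are available): the messages are started, local work proceeds while they are in flight, and only then do we wait.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    sendbuf = np.full(100000, rank, dtype='d')
    recvbuf = np.empty(100000, dtype='d')

    # start the messages, then compute while they are in flight
    reqs = [comm.Isend(sendbuf, dest=(rank + 1) % size),
            comm.Irecv(recvbuf, source=(rank - 1) % size)]
    local_work = np.sin(sendbuf).sum()
    MPI.Request.Waitall(reqs)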

17 MPI and OpenMP Domain decomposition The coarsest level of parallelism used is domain decomposition over MPI.
    from dolfin import *
    mesh = UnitCubeMesh(32, 32, 32)
    partitioning = CellFunction("size_t", mesh)
    partitioning.set_all(MPI.rank(mpi_comm_world()))
    File("output/partitioning.xdmf") << partitioning
$ mpiexec -n 4 python partition.py
P. E. Farrell (Oxford) SPS I May 17, / 17

18 MPI and OpenMP MPI: basic model MPI Separate processes with separate memory spaces communicate via message passing. MPI concepts:
- communicator
- collective
- rank
- blocking and nonblocking communication
- reductions
Each subdomain is assigned to one MPI rank.
P. E. Farrell (Oxford) SPS I May 17, / 17
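A minimal mpi4py illustration of these concepts (not from the slides): every process holds its own data, and a collective reduction over the communicator combines the contributions of all ranks.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD          # the communicator
    rank = comm.Get_rank()         # this process's rank

    local = (rank + 1)**2          # data private to this process
    total = comm.allreduce(local, op=MPI.SUM)   # collective reduction
    print("rank %d: local %d, global sum %d" % (rank, local, total))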

19 MPI and OpenMP Main communication patterns in finite elements Assembly Assembly requires exchanging halo data with your neighbours. [Figure: the cells of processor 0 and processor 1, partitioned into core, owned, and exec/non-exec halo regions.] P. E. Farrell (Oxford) SPS I May 17, / 17

20 MPI and OpenMP Main communication patterns in finite elements Krylov solvers
- Neighbour communications for the sparse matrix-vector product.
- Global reductions (allreduce for dot products).
- Preconditioner application. Multigrid: extremely complicated.
P. E. Farrell (Oxford) SPS I May 17, / 17

21 MPI and OpenMP OpenMP: basic model OpenMP Separate threads operate on the same memory space.
- Less overhead in parallel execution
- Multiple cores can act on the same data
- Less pressure on memory and memory bandwidth
- Easier load balancing
- Extremely difficult to program correctly
- Subtle race conditions possible
- Colouring and locks required to synchronise
P. E. Farrell (Oxford) SPS I May 17, / 17

22 MPI and OpenMP DOLFIN can also run in OpenMP mode for assembly:
    from dolfin import *
    parameters["num_threads"] = 4
    # ...
    solve(f == 0, u)  # must use a threaded solver (e.g. pastix)!
You can't use MPI and OpenMP at the same time (yet).
P. E. Farrell (Oxford) SPS I May 17, / 17

23 Algorithmic consequences General algorithmic consequences Need algorithms with high arithmetical intensity. P. E. Farrell (Oxford) SPS I May 17, / 17

24 Algorithmic consequences General algorithmic consequences Need algorithms with high arithmetical intensity. Caches greatly dislike unstructured memory accesses. P. E. Farrell (Oxford) SPS I May 17, / 17

25 Algorithmic consequences General algorithmic consequences Need algorithms with high arithmetical intensity. Caches greatly dislike unstructured memory accesses. Flops are (approximately) free. P. E. Farrell (Oxford) SPS I May 17, / 17

26 Algorithmic consequences General algorithmic consequences Need algorithms with high arithmetical intensity. Caches greatly dislike unstructured memory accesses. Flops are (approximately) free. Large stencils induce extra communication. P. E. Farrell (Oxford) SPS I May 17, / 17

27 Algorithmic consequences General algorithmic consequences Need algorithms with high arithmetical intensity. Caches greatly dislike unstructured memory accesses. Flops are (approximately) free. Large stencils induce extra communication. Must overlap communication and computation. P. E. Farrell (Oxford) SPS I May 17, / 17

28 Algorithmic consequences General algorithmic consequences Need algorithms with high arithmetical intensity. Caches greatly dislike unstructured memory accesses. Flops are (approximately) free. Large stencils induce extra communication. Must overlap communication and computation. Solver algorithms must be O(n) or O(n log n). P. E. Farrell (Oxford) SPS I May 17, / 17

29 Algorithmic consequences General algorithmic consequences Need algorithms with high arithmetical intensity. Caches greatly dislike unstructured memory accesses. Flops are (approximately) free. Large stencils induce extra communication. Must overlap communication and computation. Solver algorithms must be O(n) or O(n log n). General algorithmic trends Domain-decomposed high-order FE on semi-structured meshes. Multigrid/multilevel solvers with Krylov accelerators. Hybrid parallelism strategies (MPI/OpenMP/AVX). P. E. Farrell (Oxford) SPS I May 17, / 17

30 Solving PDEs on Supercomputers II: practical matters of using supercomputers Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS 2 May 17, / 7

31 Logging on Supercomputers are accessed by sshing to the login nodes.
$ ssh mmschpcxx@arcus.oerc.ox.ac.uk
You configure your environment with modules:
$ module list
No Modulefiles Currently Loaded.
$ module avail
...
$ module use -a /data/math-farrellp/crichardson/modules
$ module load fenics/1.5.0
$ module list
Modules are generally awful, but nothing better exists yet.
P. E. Farrell (Oxford) SPS 2 May 17, / 7

32 Running jobs interactively The simplest way to run a job is interactively. This is mainly used for debugging.
$ qsub -I -l nodes=1:ppn=16 -l walltime=0:10:00 -q develq
qsub: waiting for job headnode1.arcus.osc.local to start
# wait until PBS allocates us the resources we asked for...
qsub: job headnode1.arcus.osc.local ready
$ cd $PBS_O_WORKDIR
$ module use -a /data/math-farrellp/crichardson/modules
$ module load fenics/1.5.0
$ mpirun $MPI_HOSTS python poisson.py
P. E. Farrell (Oxford) SPS 2 May 17, / 7

33 Running jobs in batch mode ARCUS-A and ARCHER are managed using PBS, the Portable Batch System. Users submit jobs to the batch system which decides when and where they get executed. The main PBS commands: qsub qdel qstat The argument to qsub is a PBS script. P. E. Farrell (Oxford) SPS 2 May 17, / 7

34 Running jobs in batch mode
#!/bin/bash
# set the number of nodes and processes per node
#PBS -l nodes=1:ppn=16
# set max wallclock time
#PBS -l walltime=1:00:00
# set name of job
#PBS -N poisson
# mail alert at start, end and abortion of execution
#PBS -m bea
# send mail to this address
#PBS -M patrick.farrell@maths.ox.ac.uk
# start job from the directory it was submitted
cd $PBS_O_WORKDIR
module use -a /data/math-farrellp/crichardson/modules
module load fenics/1.5.0
enable_arcus_mpi.sh
mpirun $MPI_HOSTS python poisson.py | tee poisson.log
P. E. Farrell (Oxford) SPS 2 May 17, / 7

35 HPC 02 Challenge! Investigate the weak scaling of the 2D Poisson solver with parallel LU that you developed last week: Have the code refine the mesh once each time the number of cores quadruples (see the sketch below). Hint:
    size = MPI.size(mpi_comm_world())
    ...
    for i in range(nrefine):
        mesh = refine(mesh, redistribute=False)
Run the code on 1, 4 and 16 cores. What happens to the runtime as the problem is scaled weakly? ...
P. E. Farrell (Oxford) SPS 2 May 17, / 7
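One plausible way to fill in the hint (a sketch, not the official solution; the base mesh size and the log-base-4 computation of nrefine are assumptions):

    from dolfin import *
    import math

    size = MPI.size(mpi_comm_world())
    nrefine = int(round(math.log(size, 4)))   # one refinement per 4x increase in cores

    mesh = UnitSquareMesh(64, 64)             # assumed base mesh
    for i in range(nrefine):
        mesh = refine(mesh, redistribute=False)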

36 HPC 02 Challenge! Which components of the solver are taking the longest? Profile the code with the DOLFIN timing system:
    list_timings()
and the PETSc timing system:
    import petsc4py
    petsc4py.init("-log_summary summary.log".split())
    from dolfin import *
Now switch to HYPRE algebraic multigrid and compare the timings again. Hint: to get more details about the AMG solve, call
    PETScOptions.set("pc_hypre_boomeramg_print_statistics", 1)
P. E. Farrell (Oxford) SPS 2 May 17, / 7

37 Solving PDEs on Supercomputers III: an introduction to PETSc Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS 3 May 17, / 5

38 PETSc PETSc is a library of linear and nonlinear solvers for sparse PDEs. It has won most awards going: SIAM/ACM Prize in Computational Science and Engineering, R&D Award Gordon Bell Prizes in 2009, 2004, 2003, PETSc makes it easy to express complex hierarchical composed solvers as compactly as possible. P. E. Farrell (Oxford) SPS 3 May 17, / 5

39 Fundamental objects [Vec, Mat, PC, KSP, SNES] Vec Vec represents a dense vector, decomposed in parallel. Example
    ierr = VecCreateMPI(PETSC_COMM_WORLD, local, global, &x);
    ierr = VecDuplicate(x, &y);
    ierr = VecDotBegin(x, y, &xty);
    /* other computations */
    ierr = VecDotEnd(x, y, &xty);
P. E. Farrell (Oxford) SPS 3 May 17, / 5

40 Fundamental objects [Vec, Mat, PC, KSP, SNES] Mat Mat represents a sparse matrix, decomposed in parallel. Example
    ierr = MatCreateAIJ(PETSC_COMM_WORLD, ..., &mat);
    for (i = 0; i < local_rows; i++)
      ierr = MatSetValues(mat, ...);
    ierr = MatAssemblyBegin(mat, MAT_FINAL_ASSEMBLY);
    ierr = MatAssemblyEnd(mat, MAT_FINAL_ASSEMBLY);
    ierr = MatMult(mat, x, y);
P. E. Farrell (Oxford) SPS 3 May 17, / 5

41 Fundamental objects [Vec, Mat, PC, KSP, SNES] PC PC represents a linear preconditioner (Jacobi, Gauss-Seidel, ILU, ICC, AMG, additive Schwarz,...) Example
    ierr = PCCreate(PETSC_COMM_WORLD, &pc);
    ierr = PCSetOperators(pc, A, P);
    ierr = PCSetType(pc, PCILU);
    ierr = PCSetUp(pc);
    ierr = PCApply(pc, x, y);
P. E. Farrell (Oxford) SPS 3 May 17, / 5

42 Fundamental objects [Vec, Mat, PC, KSP, SNES] KSP KSP represents a linear solver (CG, GMRES, TFQMR, BICGSTAB, MINRES, GCR, Richardson, Chebyshev,...) Example
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);
    ierr = KSPSetOperators(ksp, A, P);
    ierr = KSPSetType(ksp, KSPCG);
    ierr = KSPSetUp(ksp);
    ierr = KSPSolve(ksp, b, x);
P. E. Farrell (Oxford) SPS 3 May 17, / 5

43 Fundamental objects [Vec, Mat, PC, KSP, SNES] SNES SNES represents a nonlinear solver (Newton, reduced-space Newton, NGMRES, NCG, Anderson acceleration, FAS,...) Example
    ierr = SNESCreate(PETSC_COMM_WORLD, &snes);
    ierr = SNESSetFunction(snes, r, residual);
    ierr = SNESSetJacobian(snes, J, P, jacobian);
    ierr = SNESSetType(snes, SNESVINEWTONRSLS);
    ierr = SNESSetVariableBounds(snes, xl, xu);
    ierr = SNESSetUp(snes);
    ierr = SNESSolve(snes, b, x);
P. E. Farrell (Oxford) SPS 3 May 17, / 5
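The same objects are available from Python via petsc4py, which is how the later lectures drive them from FEniCS. A minimal serial sketch (not from the slides) that builds a 1D Laplacian and solves it with CG/ILU:

    from petsc4py import PETSc

    n = 100
    A = PETSc.Mat().createAIJ([n, n], nnz=3)   # tridiagonal: 3 nonzeros per row
    A.setUp()
    for i in range(n):
        A.setValue(i, i, 2.0)
        if i > 0:     A.setValue(i, i - 1, -1.0)
        if i < n - 1: A.setValue(i, i + 1, -1.0)
    A.assemble()

    b = A.createVecRight(); b.set(1.0)
    x = A.createVecLeft()

    ksp = PETSc.KSP().create()
    ksp.setOperators(A)
    ksp.setType("cg")
    ksp.getPC().setType("ilu")
    ksp.setFromOptions()
    ksp.solve(b, x)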

44 Hierarchical composition Principle All objects are composable. P. E. Farrell (Oxford) SPS 3 May 17, / 5

45 Hierarchical composition Principle All objects are composable. Principle All objects are configurable. P. E. Farrell (Oxford) SPS 3 May 17, / 5

46 Hierarchical composition Principle All objects are composable. Principle All objects are configurable. (example from variational fracture mechanics) P. E. Farrell (Oxford) SPS 3 May 17, / 5

47 Wiring PETSc and FEniCS We're going to need fine control to design our solvers. A simple interface between FEniCS and PETSc: $ git clone P. E. Farrell (Oxford) SPS 3 May 17, / 5

48 Solving PDEs on Supercomputers IV: algebraic multigrid Patrick Farrell MMSC: Python in Scientific Computing May 18, 2015 P. E. Farrell (Oxford) SPS 4 May 18, / 13

49 Multilevel solvers At the core of most PDE solvers is the solution of a linear system. Linear system: Ax = b. The most powerful solvers for PDEs exploit the fact that there exists an infinite hierarchy of discretisations, all approximating the same problem. Hierarchy of linear systems: A_h x_h = b_h, A_{2h} x_{2h} = b_{2h}, A_{4h} x_{4h} = b_{4h}, ... P. E. Farrell (Oxford) SPS 4 May 18, / 13

50 Geometric multigrid: review Geometric multigrid algorithm Begin with an initial guess. P. E. Farrell (Oxford) SPS 4 May 18, / 13

51 Geometric multigrid: review Geometric multigrid algorithm Begin with an initial guess. Apply a relaxation method to smooth the error. P. E. Farrell (Oxford) SPS 4 May 18, / 13

52 Geometric multigrid: review Geometric multigrid algorithm Begin with an initial guess. Apply a relaxation method to smooth the error. Solve for the smooth error on a coarse grid. P. E. Farrell (Oxford) SPS 4 May 18, / 13

53 Why did geometric multigrid work? Geometric multigrid worked on the Laplacian because: simple relaxation methods yielded geometrically smooth errors; those errors could be well-represented on coarse grids. What about problems where the error isn't smooth after relaxation? P. E. Farrell (Oxford) SPS 4 May 18, / 13

54 Why did geometric multigrid work? Geometric multigrid worked on the Laplacian because: simple relaxation methods yielded geometrically smooth errors; those errors could be well-represented on coarse grids. What about problems where the error isn't smooth after relaxation? Anisotropic Laplacian: −a u_xx − b u_yy = f in Ω = [0, 1]², u = g on ∂Ω, with a = b if x < 1/2 and a ≫ b if x ≥ 1/2. P. E. Farrell (Oxford) SPS 4 May 18, / 13

55 Why did geometric multigrid work? Geometric multigrid worked on the Laplacian because: simple relaxation methods yielded geometrically smooth errors; those errors could be well-represented on coarse grids. What about problems where the error isn't smooth after relaxation? P. E. Farrell (Oxford) SPS 4 May 18, / 13

56 Two responses GMG: design increasingly arcane relaxation methods that do smooth; semi-coarsening, multi-coarsening, etc. P. E. Farrell (Oxford) SPS 4 May 18, / 13

57 Two responses GMG: design increasingly arcane relaxation methods that do smooth; semi-coarsening, multi-coarsening, etc. AMG: fix a simple relaxation method; algebraically construct coarse grids and interpolation operators; demand that these can well represent the error after relaxation. P. E. Farrell (Oxford) SPS 4 May 18, / 13

58 Two responses GMG: design increasingly arcane relaxation methods that do smooth; semi-coarsening, multi-coarsening, etc. AMG: fix a simple relaxation method; algebraically construct coarse grids and interpolation operators; demand that these can well represent the error after relaxation. A nice side effect: AMG requires much less infrastructure:
- No need to supply coarse grids
- No need to supply interpolation operators
- Only applies to linear problems
- Requires global linearisation (memory)
- Requires near-nullspace of operator
P. E. Farrell (Oxford) SPS 4 May 18, / 13

59 Anisotropic Laplacian again P. E. Farrell (Oxford) SPS 4 May 18, / 13

60 Anisotropic Laplacian again P. E. Farrell (Oxford) SPS 4 May 18, / 13

61 Fundamental principles of AMG I: relaxation and error Recall Richardson iteration with a preconditioner P. Richardson iteration: x_{k+1} = x_k + P^{-1}(b − Ax_k). P. E. Farrell (Oxford) SPS 4 May 18, / 13

62 Fundamental principles of AMG I: relaxation and error Recall Richardson iteration with a preconditioner P. Richardson iteration: x_{k+1} = x_k + P^{-1}(b − Ax_k). A simple error analysis shows: e_{k+1} = (I − P^{-1}A) e_k. P. E. Farrell (Oxford) SPS 4 May 18, / 13

63 Fundamental principles of AMG I: relaxation and error Recall Richardson iteration with a preconditioner P. Richardson iteration: x_{k+1} = x_k + P^{-1}(b − Ax_k). A simple error analysis shows: e_{k+1} = (I − P^{-1}A) e_k. Now if e_{k+1} ≈ e_k then P^{-1}A e_k ≈ 0, i.e. A e_k ≈ 0: the surviving error lies in the near-nullspace of A. P. E. Farrell (Oxford) SPS 4 May 18, / 13

64 Fundamental principles of AMG I: relaxation and error Error after relaxation The error after relaxation is related to the near-nullspace of the operator. P. E. Farrell (Oxford) SPS 4 May 18, / 13
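A tiny numpy experiment (not from the slides) that illustrates this: after many damped Jacobi sweeps on a 1D Laplacian, the remaining error e satisfies ||Ae|| << ||e||, i.e. it lies close to the near-nullspace of A.

    import numpy as np

    n = 50
    A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian stencil
    P = np.diag(np.diag(A))                              # Jacobi preconditioner

    x_exact = np.random.rand(n)
    b = A @ x_exact
    x = np.zeros(n)
    for k in range(100):                                  # damped Richardson/Jacobi
        x = x + 0.8 * np.linalg.solve(P, b - A @ x)

    e = x_exact - x
    print(np.linalg.norm(A @ e) / np.linalg.norm(e))      # much smaller than ||A||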

65 Fundamental principles of AMG II: interpolation Recall that in one multigrid cycle we approximate the fine error as e_h ≈ P^h_H e_H. Thus, we want the near-nullspace to be in the range of P^h_H. P. E. Farrell (Oxford) SPS 4 May 18, / 13

66 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

67 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

68 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

69 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

70 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

71 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

72 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

73 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

74 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

75 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

76 Coarse grid generation: an example Classical AMG: coarse-grid generation 1. Select C-point with maximal measure 2. Select neighbours as F-points 3. Update measures of neighbours P. E. Farrell (Oxford) SPS 4 May 18, / 13

77 Coarse grid generation: an example Smoothed-aggregation AMG: coarse-grid generation Phase 1: 1. Pick a root point not adjacent to an aggregation 2. Aggregate root and neighbours Phase 2: Move points into nearby aggregations P. E. Farrell (Oxford) SPS 4 May 18, / 13

78 Coarse grid generation: an example Smoothed-aggregation AMG: coarse-grid generation Phase 1: 1. Pick a root point not adjacent to an aggregation 2. Aggregate root and neighbours Phase 2: Move points into nearby aggregations P. E. Farrell (Oxford) SPS 4 May 18, / 13

79 Coarse grid generation: an example Smoothed-aggregation AMG: coarse-grid generation Phase 1: 1. Pick a root point not adjacent to an aggregation 2. Aggregate root and neighbours Phase 2: Move points into nearby aggregations P. E. Farrell (Oxford) SPS 4 May 18, / 13

80 Coarse grid generation: an example Smoothed-aggregation AMG: coarse-grid generation Phase 1: 1. Pick a root point not adjacent to an aggregation 2. Aggregate root and neighbours Phase 2: Move points into nearby aggregations P. E. Farrell (Oxford) SPS 4 May 18, / 13

81 Coarse grid generation: an example Smoothed-aggregation AMG: coarse-grid generation Phase 1: 1. Pick a root point not adjacent to an aggregation 2. Aggregate root and neighbours Phase 2: Move points into nearby aggregations P. E. Farrell (Oxford) SPS 4 May 18, / 13

82 Coarse grid generation: an example Smoothed-aggregation AMG: coarse-grid generation Phase 1: 1. Pick a root point not adjacent to an aggregation 2. Aggregate root and neighbours Phase 2: Move points into nearby aggregations P. E. Farrell (Oxford) SPS 4 May 18, / 13

83 Coarse grid generation: an example Smoothed-aggregation AMG: coarse-grid generation Phase 1: 1. Pick a root point not adjacent to an aggregation 2. Aggregate root and neighbours Phase 2: Move points into nearby aggregations P. E. Farrell (Oxford) SPS 4 May 18, / 13

84 Coarse grid generation: an example Smoothed-aggregation AMG: coarse-grid generation Phase 1: 1. Pick a root point not adjacent to an aggregation 2. Aggregate root and neighbours Phase 2: Move points into nearby aggregations P. E. Farrell (Oxford) SPS 4 May 18, / 13

85 Coarse grid generation: an example Smoothed-aggregation AMG: coarse-grid generation Phase 1: 1. Pick a root point not adjacent to an aggregation 2. Aggregate root and neighbours Phase 2: Move points into nearby aggregations P. E. Farrell (Oxford) SPS 4 May 18, / 13

86 HPC 04 Challenge! Consider the linear elasticity equation
−∇·σ(u) = f in Ω,
u = 0 on ∂Ω_D,
σ·n = 0 on ∂Ω_N,
on the pulley mesh, where
ε(u) = (1/2)(∇u + ∇u^T),
σ(u) = 2µε(u) + λ tr(ε(u)) I,
f = (ρω²x, ρω²y, 0),
∂Ω_D = {(x, y, z) ∈ ∂Ω : x² + y² < ( z)²},
∂Ω_N = ∂Ω \ ∂Ω_D,
E = 10⁹, ν = 0.3, ρ = 10, ω = 300.
P. E. Farrell (Oxford) SPS 4 May 18, / 13

87 HPC 04 Challenge! Solve this problem using only smoothed aggregation algebraic multigrid (no Krylov accelerator: -ksp_type richardson -ksp_monitor_true_residual -pc_type gamg). How many iterations does it take to converge to atol (a) without the near-nullspace (b) with the near-nullspace? Here the near-nullspace is the rigid body translations and rotations. Now investigate the configuration of the smoothed aggregation AMG solver and the Krylov accelerator. (Hint: -help, -snes_view). By tuning the solver, can you achieve faster convergence? P. E. Farrell (Oxford) SPS 4 May 18, / 13
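One way to supply the rigid body modes to GAMG from DOLFIN is to interpolate them into the displacement space and attach them to the PETSc matrix. A hedged sketch (the vector function space V and the assembled matrix A are assumed to exist, and you may wish to orthonormalise the basis; the exact petsc4py calls can vary between versions):

    from dolfin import *
    from petsc4py import PETSc

    # rigid body modes in 3D: three translations and three rotations
    modes = [Constant((1, 0, 0)), Constant((0, 1, 0)), Constant((0, 0, 1)),
             Expression(("-x[1]", "x[0]", "0.0")),
             Expression(("-x[2]", "0.0", "x[0]")),
             Expression(("0.0", "-x[2]", "x[1]"))]
    basis = [as_backend_type(interpolate(m, V).vector()).vec() for m in modes]

    nsp = PETSc.NullSpace().create(vectors=basis)
    as_backend_type(A).mat().setNearNullSpace(nsp)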

88 Solving PDEs on Supercomputers V: algebraic multigrid on nonsymmetric problems Patrick Farrell MMSC: Python in Scientific Computing May 19, 2015 P. E. Farrell (Oxford) SPS 5 May 19, / 4

89 HPC 05 Challenge! (1/3) Implement a solver for the Yamabe equation 8 2 u + 1 r 3 u u = 0 on the doughnut mesh with boundary conditions u = 1. Initialise Newton with the initial guess u = 1. P. E. Farrell (Oxford) SPS 5 May 19, / 4

90 HPC 05 Challenge! (2/3) Next, develop an efficient linear solver:
1. First use Newton + LU.
2. Next, try GMRES + GAMG. Does it work well?
3. Try increasing the maximum size of the coarse grid (pc_gamg_coarse_eq_limit).
4. Ah! Now we're getting somewhere. Does changing the smoother help (mg_levels_ksp_monitor_true_residual)?
5. Increase the quality of the smoothed aggregation basis (pc_gamg_agg_nsmooths).
P. E. Farrell (Oxford) SPS 5 May 19, / 4

91 HPC 05 Challenge! (3/3) Profile the code. Where is it spending most of its time? How can the preconditioner construction cost be reduced? Once that is done, compare the memory usage of GMRES, FGMRES, GCR and CGS. P. E. Farrell (Oxford) SPS 5 May 19, / 4

92 Solving PDEs on Supercomputers VI: fieldsplit preconditioners Patrick Farrell MMSC: Python in Scientific Computing May 19, 2015 P. E. Farrell (Oxford) SPS 6 May 19, / 8

93 Block triangular factorisations A block matrix with nonsingular A has a block triangular factorisation:
J = [[A, B], [C, D]] = [[I, 0], [CA^{-1}, I]] [[A, 0], [0, S]] [[I, A^{-1}B], [0, I]],
where S = D − CA^{-1}B is the (dense!) Schur complement. This gives us an expression for its inverse:
[[A, B], [C, D]]^{-1} = [[I, −A^{-1}B], [0, I]] [[A^{-1}, 0], [0, S^{-1}]] [[I, 0], [−CA^{-1}, I]].
P. E. Farrell (Oxford) SPS 6 May 19, / 8
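A quick numpy check of the factorisation (a sketch, not from the slides):

    import numpy as np

    n = 4
    A = np.random.rand(n, n) + n*np.eye(n)          # make A comfortably nonsingular
    B, C, D = (np.random.rand(n, n) for _ in range(3))
    J = np.block([[A, B], [C, D]])

    Ainv = np.linalg.inv(A)
    S = D - C @ Ainv @ B                            # the Schur complement
    I, Z = np.eye(n), np.zeros((n, n))

    L = np.block([[I, Z], [C @ Ainv, I]])
    M = np.block([[A, Z], [Z, S]])
    U = np.block([[I, Ainv @ B], [Z, I]])
    print(np.allclose(J, L @ M @ U))                # True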

94 Fieldsplit preconditioners This gives rise to four related theorems. Theorem (full): the choice P = [[I, 0], [CA^{-1}, I]] [[A, 0], [0, S]] [[I, A^{-1}B], [0, I]] will induce Krylov convergence in 1 iteration. P. E. Farrell (Oxford) SPS 6 May 19, / 8

95 Fieldsplit preconditioners This gives rise to four related theorems. Theorem (lower): the choice P = [[I, 0], [CA^{-1}, I]] [[A, 0], [0, S]] will induce Krylov convergence in 2 iterations. P. E. Farrell (Oxford) SPS 6 May 19, / 8

96 Fieldsplit preconditioners This gives rise to four related theorems. Theorem (upper): the choice P = [[A, 0], [0, S]] [[I, A^{-1}B], [0, I]] will induce Krylov convergence in 2 iterations. P. E. Farrell (Oxford) SPS 6 May 19, / 8

97 Fieldsplit preconditioners This gives rise to four related theorems. Theorem (diag): the choice P = [[A, 0], [0, S]] will induce Krylov convergence in 3 iterations, if D = 0. P. E. Farrell (Oxford) SPS 6 May 19, / 8

98 Fieldsplit preconditioners This gives rise to four related theorems. Theorem (diag): the choice P = [[A, 0], [0, S]] will induce Krylov convergence in 3 iterations, if D = 0. How do you use this? Cheaply approximate A^{-1} and S^{-1} (problem specific)! P. E. Farrell (Oxford) SPS 6 May 19, / 8

99 Spectral equivalence Definition (spectral equivalence): A_h and B_h ∈ R^{n×n} are spectrally equivalent, A_h ∼ B_h, iff there exist constants c, C independent of h such that c ≤ λ(B_h^{-1} A_h) ≤ C. P. E. Farrell (Oxford) SPS 6 May 19, / 8

100 Spectral equivalence Definition (spectral equivalence): A_h and B_h ∈ R^{n×n} are spectrally equivalent, A_h ∼ B_h, iff there exist constants c, C independent of h such that c ≤ λ(B_h^{-1} A_h) ≤ C. Solving block-structured systems: find an approximation Ŝ ∼ S or Ŝ^{-1} ≈ S^{-1}. P. E. Farrell (Oxford) SPS 6 May 19, / 8

101 Stokes equations The Stokes equations are −ν∇²u + ∇p = 0, ∇·u = 0. P. E. Farrell (Oxford) SPS 6 May 19, / 8

102 Stokes equations The Stokes equations are −ν∇²u + ∇p = 0, ∇·u = 0. A stable discretisation yields J = [[A, B^T], [B, 0]], with S = −BA^{-1}B^T. P. E. Farrell (Oxford) SPS 6 May 19, / 8

103 Stokes equations The Stokes equations are −ν∇²u + ∇p = 0, ∇·u = 0. Spectral equivalence (e.g. Elman, Silvester and Wathen, 2005): let Q be the viscosity-weighted pressure mass matrix, Q_ij = ∫_Ω (1/ν) φ_i φ_j. Then S ∼ Q. P. E. Farrell (Oxford) SPS 6 May 19, / 8

104 Coding tools Creating PETSc index sets to extract dofs:
    u_dofs = SubSpace(Z, 0).dofmap().dofs()
    u_is = PETSc.IS().createGeneral(u_dofs)
P. E. Farrell (Oxford) SPS 6 May 19, / 8

105 Coding tools Creating PETSc index sets to extract dofs:
    u_dofs = SubSpace(Z, 0).dofmap().dofs()
    u_is = PETSc.IS().createGeneral(u_dofs)
Configuring the dofs to split:
    fields = [("0", u_is), ("1", p_is)]
    snes.ksp.pc.setFieldSplitIS(*fields)
P. E. Farrell (Oxford) SPS 6 May 19, / 8

106 Coding tools Creating PETSc index sets to extract dofs:
    u_dofs = SubSpace(Z, 0).dofmap().dofs()
    u_is = PETSc.IS().createGeneral(u_dofs)
Configuring the dofs to split:
    fields = [("0", u_is), ("1", p_is)]
    snes.ksp.pc.setFieldSplitIS(*fields)
Setting the matrix for building a preconditioner for the Schur complement:
    schur = (1.0/nu) * inner(p, q)*dx
    schur_full = assemble(schur)
    schur_fmat = as_backend_type(schur_full).mat()
    schur_mat = schur_fmat.getSubMatrix(p_is, p_is)
    snes.ksp.pc.setFieldSplitSchurPreType(PETSc.PC.SchurPreType.USER, schur_mat)
P. E. Farrell (Oxford) SPS 6 May 19, / 8

107 Configuring fieldsplit
--petsc.ksp_converged_reason
--petsc.ksp_type fgmres
--petsc.ksp_monitor_true_residual
--petsc.ksp_atol 1.0e-10
--petsc.ksp_rtol
--petsc.pc_type fieldsplit
--petsc.pc_fieldsplit_type schur
--petsc.pc_fieldsplit_schur_factorization_type full
--petsc.pc_fieldsplit_schur_precondition user
--petsc.fieldsplit_0_ksp_type richardson
--petsc.fieldsplit_0_ksp_max_it 1
--petsc.fieldsplit_0_pc_type lu
--petsc.fieldsplit_0_pc_factor_mat_solver_package mumps
--petsc.fieldsplit_1_ksp_type bcgs
--petsc.fieldsplit_1_ksp_rtol 1.0e-10
--petsc.fieldsplit_1_ksp_monitor_true_residual
--petsc.fieldsplit_1_pc_type lu
--petsc.fieldsplit_1_pc_factor_mat_solver_package mumps
P. E. Farrell (Oxford) SPS 6 May 19, / 8

108 HPC 06 Challenge! Solve the Stokes equations with ν = 1/100 on the dolphin.xml mesh, with boundary conditions
u = (0, 0) on ∂Ω_0,
u = (−sin πy, 0) on ∂Ω_1,
ν ∇u·n = pn on ∂Ω_2,
with colours taken from dolphin_subdomains.xml.
0. Discretise the equation with a stable finite element pair. Integrate both terms in the momentum equation by parts.
1. Solve the problem with LU (UMFPACK/MUMPS).
2. Implement the fieldsplit preconditioner with ideal inner solvers (LU).
3. Now replace the inner solvers with Krylov solvers (CG/ML/5 for A, BCGS/HYPRE/5 for S).
4. What configuration is fastest? full with strong inner solvers? diag with weak inner solvers?
P. E. Farrell (Oxford) SPS 6 May 19, / 8

109 Solving PDEs on Supercomputers VII: PDE-constrained optimisation Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS 7 May 17, / 9

110 The mother problem Consider again the mother problem of PDE-constrained optimisation: min_{y,u} (1/2)∫_Ω (y − y_d)² dx + (β/2)∫_Ω u² dx, subject to −∇²y = u in Ω, y = 0 on ∂Ω. P. E. Farrell (Oxford) SPS 7 May 17, / 9

111 The mother problem Consider again the mother problem of PDE-constrained optimisation: min_{y,u} (1/2)∫_Ω (y − y_d)² dx + (β/2)∫_Ω u² dx, subject to −∇²y = u in Ω, y = 0 on ∂Ω. We form the Lagrangian: L(y, u, λ) = (1/2)∫_Ω (y − y_d)² dx + (β/2)∫_Ω u² dx + ∫_Ω ∇λ·∇y dx − ∫_Ω λu dx. P. E. Farrell (Oxford) SPS 7 May 17, / 9

112 The optimality conditions Taking the optimality conditions yields the system: find (y, u, λ) ∈ H¹_0 × L² × H¹_0 such that
∫_Ω ȳ(y − y_d) dx + ∫_Ω ∇λ·∇ȳ dx = 0 for all ȳ,
β∫_Ω ūu dx − ∫_Ω λū dx = 0 for all ū,
∫_Ω ∇λ̄·∇y dx − ∫_Ω λ̄u dx = 0 for all λ̄.
P. E. Farrell (Oxford) SPS 7 May 17, / 9

113 The optimality conditions Taking the optimality conditions yields the system: find (y, u, λ) ∈ H¹_0 × L² × H¹_0 such that
∫_Ω ȳ(y − y_d) dx + ∫_Ω ∇λ·∇ȳ dx = 0 for all ȳ,
β∫_Ω ūu dx − ∫_Ω λū dx = 0 for all ū,
∫_Ω ∇λ̄·∇y dx − ∫_Ω λ̄u dx = 0 for all λ̄.
On discretisation, this yields the system
[[M, 0, K], [0, βM, −M], [K, −M, 0]] [y, u, λ]^T = [z, 0, 0]^T.
P. E. Farrell (Oxford) SPS 7 May 17, / 9

114 Ingredients of a fieldsplit Remember, to fieldsplit you need two things: 1. A diagonal block you can cheaply invert 2. A Schur complement you can cheaply approximate P. E. Farrell (Oxford) SPS 7 May 17, / 9

115 Ingredients of a fieldsplit Remember, to fieldsplit you need two things: 1. A diagonal block you can cheaply invert 2. A Schur complement you can cheaply approximate If we take A = [[M, 0], [0, βm]], the first is satisfied. P. E. Farrell (Oxford) SPS 7 May 17, / 9

116 Ingredients of a fieldsplit Remember, to fieldsplit you need two things: 1. A diagonal block you can cheaply invert 2. A Schur complement you can cheaply approximate If we take A = [[M, 0], [0, βM]], the first is satisfied. How about the Schur complement? Calculating, we find S = KM^{-1}K + (1/β)M. P. E. Farrell (Oxford) SPS 7 May 17, / 9

117 Ingredients of a fieldsplit Remember, to fieldsplit you need two things: 1. A diagonal block you can cheaply invert 2. A Schur complement you can cheaply approximate If we take A = [[M, 0], [0, βM]], the first is satisfied. How about the Schur complement? Calculating, we find S = KM^{-1}K + (1/β)M. Bad news: approximating the inverse of sums is hard. P. E. Farrell (Oxford) SPS 7 May 17, / 9

118 Two approaches Approach one: ignore one of the terms (Rees, Dollar, Wathen 2010). S = KM^{-1}K + (1/β)M ≈ KM^{-1}K, with inverse Ŝ^{-1} = K^{-1}MK^{-1}. P. E. Farrell (Oxford) SPS 7 May 17, / 9

119 Two approaches Approach one: ignore one of the terms (Rees, Dollar, Wathen 2010). S = KM^{-1}K + (1/β)M ≈ KM^{-1}K, with inverse Ŝ^{-1} = K^{-1}MK^{-1}. Approach two: approximate the sum with a product (Pearson and Wathen, 2012). S = (K + (1/√β)M) M^{-1} (K + (1/√β)M) − (2/√β)K ≈ (K + (1/√β)M) M^{-1} (K + (1/√β)M), with inverse Ŝ^{-1} = K̂^{-1}MK̂^{-1}, where K̂ = K + (1/√β)M. P. E. Farrell (Oxford) SPS 7 May 17, / 9
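To apply Pearson and Wathen's approximation you only need the mass matrix and the shifted stiffness matrix K̂. A hedged UFL sketch (the state space V and the regularisation parameter beta are assumed to be defined already):

    from dolfin import *

    y = TrialFunction(V)
    v = TestFunction(V)

    M = assemble(inner(y, v)*dx)                          # mass matrix
    Khat = assemble(inner(grad(y), grad(v))*dx
                    + (1.0/beta**0.5)*inner(y, v)*dx)     # K + (1/sqrt(beta)) M

    # Shat^{-1} is then applied as Khat^{-1} M Khat^{-1}, e.g. via two inner KSP solves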

120 Coding tools No need to pass index sets with scalar fields:
    """
    --petsc.pc_fieldsplit_0_fields 0,1
    --petsc.pc_fieldsplit_1_fields 2
    """
P. E. Farrell (Oxford) SPS 7 May 17, / 9

121 Coding tools No need to pass index sets with scalar fields:
    """
    --petsc.pc_fieldsplit_0_fields 0,1
    --petsc.pc_fieldsplit_1_fields 2
    """
You do need index sets to extract submatrices:
    trial = split(TrialFunction(Z))[0]
    test = split(TestFunction(Z))[0]
    bc = DirichletBC(Z.sub(0), 0.0, "on_boundary")
    mass_full = assemble(inner(trial, test)*dx)
    bc.apply(mass_full)
    ...
    mass_mat = mass_fmat.getSubMatrix(is_0, is_0)
P. E. Farrell (Oxford) SPS 7 May 17, / 9

122 Coding tools Creating a KSP to handle the solve:
    ksp_kbm = PETSc.KSP()
    ksp_kbm.create()
    ksp_kbm.setType("richardson")
    ksp_kbm.pc.setType("lu")
    ksp_kbm.setOperators(kbm)
    ksp_kbm.setOptionsPrefix("fieldsplit_1_kbm_")
    ksp_kbm.setFromOptions()
    ksp_kbm.setUp()
P. E. Farrell (Oxford) SPS 7 May 17, / 9

123 Coding tools Using an approximate inverse action with PCMAT: """ --petsc.fieldsplit_1_pc_type mat """ P. E. Farrell (Oxford) SPS 7 May 17, / 9

124 Coding tools Using an approximate inverse action with PCMAT:
    """
    --petsc.fieldsplit_1_pc_type mat
    """
Configuring a shell matrix:
    class SchurInv(object):
        def mult(self, mat, x, y):
            ksp_kbm.solve(x, tmp1)
            mass.mult(tmp1, tmp2)
            ksp_kbm.solve(tmp2, y)

    schur = PETSc.Mat()
    schur.createPython(mass.getSizes(), SchurInv())
    schur.setUp()
P. E. Farrell (Oxford) SPS 7 May 17, / 9

125 HPC 07 Challenge! Solve the mother problem on Ω = [0, 1]² with y_d(x, y) = 1 if (x, y) ∈ [0, 0.5]², and 0 otherwise, and homogeneous Dirichlet boundary conditions.
0. Discretise the equation with [P_1]³.
1. Solve the problem with LU.
2. Implement the two fieldsplit preconditioners with ideal inner solvers.
3. Which performs best as β → 0?
4. Now choose scalable inner solvers.
5. Which configuration is fastest on the machine?
P. E. Farrell (Oxford) SPS 7 May 17, / 9

126 Solving PDEs on Supercomputers VIII: advanced nonlinear solvers Patrick Farrell MMSC: Python in Scientific Computing May 18, 2015 P. E. Farrell (Oxford) SPS 8 May 18, / 13

127 Globalisation of Newton's method Consider again the p-Laplace equation −∇·(γ(u)∇u) = f in Ω, u = g on ∂Ω, where γ(u) = (ε + |∇u|²)^{(p−2)/2}. The configuration we considered (p = 5) took 121 iterations to converge. Why? P. E. Farrell (Oxford) SPS 8 May 18, / 13

128 Newton steps near singular Jacobians Recall that at our initial guess u = 0, our Jacobian is nearly singular. If J = UΣV^T, then J^{-1} = VΣ^{-1}U^T, and if σ_min ≈ 0, then the Newton step δu = −J^{-1}F is enormous. P. E. Farrell (Oxford) SPS 8 May 18, / 13

129 Newton steps near singular Jacobians Recall that at our initial guess u = 0, our Jacobian is nearly singular. If J = UΣV^T, then J^{-1} = VΣ^{-1}U^T, and if σ_min ≈ 0, then the Newton step δu = −J^{-1}F is enormous. This explains
    0 SNES Function norm  e-02
    1 SNES Function norm  e+56
    2 SNES Function norm  e+56
P. E. Farrell (Oxford) SPS 8 May 18, / 13

130 Responses A few possible responses: 1. Start with a better initial guess (continuation) P. E. Farrell (Oxford) SPS 8 May 18, / 13

131 Responses A few possible responses: 1. Start with a better initial guess (continuation) 2. Regularise further (undesirable) P. E. Farrell (Oxford) SPS 8 May 18, / 13

132 Responses A few possible responses: 1. Start with a better initial guess (continuation) 2. Regularise further (undesirable) 3. Take a smaller step (damping with α < 1)! P. E. Farrell (Oxford) SPS 8 May 18, / 13

133 Responses A few possible responses: 1. Start with a better initial guess (continuation) 2. Regularise further (undesirable) 3. Take a smaller step (damping with α < 1)! Newton fractal for z³ − 1 = 0 with α = 1. P. E. Farrell (Oxford) SPS 8 May 18, / 13

134 Responses A few possible responses: 1. Start with a better initial guess (continuation) 2. Regularise further (undesirable) 3. Take a smaller step (damping with α < 1)! Newton fractal for z³ − 1 = 0 with α = P. E. Farrell (Oxford) SPS 8 May 18, / 13

135 Responses A few possible responses: 1. Start with a better initial guess (continuation) 2. Regularise further (undesirable) 3. Take a smaller step (damping with α < 1)! Newton fractal for z³ − 1 = 0 with α = 0.5. P. E. Farrell (Oxford) SPS 8 May 18, / 13

136 Responses A few possible responses: 1. Start with a better initial guess (continuation) 2. Regularise further (undesirable) 3. Take a smaller step (damping with α < 1)! Newton fractal for z³ − 1 = 0 with α = P. E. Farrell (Oxford) SPS 8 May 18, / 13

137 Responses A few possible responses: 1. Start with a better initial guess (continuation) 2. Regularise further (undesirable) 3. Take a smaller step (damping with α < 1)! Newton fractal for z³ − 1 = 0 with α = 0.1. P. E. Farrell (Oxford) SPS 8 May 18, / 13

138 Linesearch schemes in PETSc Backtracking linesearch (bt) Finds the minimum of a polynomial fit to the ℓ2 norm in [0, 1]. Demands monotonic and sufficient decrease. If decrease is insufficient, the interval is reduced. P. E. Farrell (Oxford) SPS 8 May 18, / 13

139 Linesearch schemes in PETSc Backtracking linesearch (bt) Finds the minimum of a polynomial fit to the ℓ2 norm in [0, 1]. Demands monotonic and sufficient decrease. If decrease is insufficient, the interval is reduced. Good for: convex problems, occasional near-singular Jacobians. P. E. Farrell (Oxford) SPS 8 May 18, / 13

140 Linesearch schemes in PETSc Backtracking linesearch (bt) Finds the minimum of a polynomial fit to the ℓ2 norm in [0, 1]. Demands monotonic and sufficient decrease. If decrease is insufficient, the interval is reduced. Good for: convex problems, occasional near-singular Jacobians. Bad for: nonconvex problems where the residual must increase before convergence. P. E. Farrell (Oxford) SPS 8 May 18, / 13

141 Linesearch schemes in PETSc Critical point linesearch (cp) Many PDEs have an energy function to be minimised. Suppose F(u) is the gradient of some (unknown) E(u). E(u + αdu) can be minimised by looking for roots of du^T F(u + αdu) = 0 with a secant method. P. E. Farrell (Oxford) SPS 8 May 18, / 13

142 Linesearch schemes in PETSc Critical point linesearch (cp) Many PDEs have an energy function to be minimised. Suppose F(u) is the gradient of some (unknown) E(u). E(u + αdu) can be minimised by looking for roots of du^T F(u + αdu) = 0 with a secant method. Good for: problems with an energy functional. P. E. Farrell (Oxford) SPS 8 May 18, / 13

143 Linesearch schemes in PETSc Affine-covariant linesearch (nleqerr) Undamped Newton's method is affine covariant. This observation fundamentally changes convergence theorems for Newton (Deuflhard, 2011). Convergence criteria are expressed in terms of affine-covariant Lipschitz constants. This linesearch estimates these constants and uses them to decide step lengths. P. E. Farrell (Oxford) SPS 8 May 18, / 13

144 Linesearch schemes in PETSc Affine-covariant linesearch (nleqerr) Undamped Newton's method is affine covariant. This observation fundamentally changes convergence theorems for Newton (Deuflhard, 2011). Convergence criteria are expressed in terms of affine-covariant Lipschitz constants. This linesearch estimates these constants and uses them to decide step lengths. Good for: problems where you can start within singular manifolds; the hardest nonlinear problems. P. E. Farrell (Oxford) SPS 8 May 18, / 13
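These linesearches are selected from the options database. A sketch of the relevant options, in the same style as the earlier option listings (exact availability depends on your PETSc version):

    --petsc.snes_type newtonls
    --petsc.snes_linesearch_type bt        # or: cp, nleqerr, basic, l2
    --petsc.snes_linesearch_monitor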

145 Nonlinear preconditioning For a linear problem Ax = b we apply an approximate solver P^{-1} on the left: P^{-1}Ax = P^{-1}b. P. E. Farrell (Oxford) SPS 8 May 18, / 13

146 Nonlinear preconditioning For a linear problem Ax = b we apply an approximate solver P^{-1} on the left: P^{-1}Ax = P^{-1}b. Write one step of a nonlinear solver for F(x) = b as x_{i+1} = N(F, x_i, b). P. E. Farrell (Oxford) SPS 8 May 18, / 13

147 Nonlinear preconditioning In nonlinear left preconditioning, we define a new residual R(x) = x − N(F, x, b) and apply an outer nonlinear solver to R. P. E. Farrell (Oxford) SPS 8 May 18, / 13

148 Nonlinear preconditioning In nonlinear left preconditioning, we define a new residual R(x) = x − N(F, x, b) and apply an outer nonlinear solver to R. In the linear case this is equivalent, since R(x) = x − N(F, x, b) = x + P^{-1}(Ax − b) − x = P^{-1}(Ax − b). P. E. Farrell (Oxford) SPS 8 May 18, / 13

149 Nonlinear preconditioning In nonlinear left preconditioning, we define a new residual R(x) = x − N(F, x, b) and apply an outer nonlinear solver to R. In the linear case this is equivalent, since R(x) = x − N(F, x, b) = x + P^{-1}(Ax − b) − x = P^{-1}(Ax − b). Can accelerate an inner solver with an outer solver! P. E. Farrell (Oxford) SPS 8 May 18, / 13
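In PETSc this composition is driven from the options database through the npc_ prefix. A sketch (not from the slides) that wraps a single Newton step inside nonlinear GMRES:

    --petsc.snes_type ngmres          # outer accelerator
    --petsc.npc_snes_type newtonls    # inner nonlinear solver used as preconditioner
    --petsc.npc_snes_max_it 1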

150 Examples of nonlinear preconditioning Hyperelasticity (Brune et al, 2013) Inner solver: Newton. Outer solver: nonlinear conjugate gradients. P. E. Farrell (Oxford) SPS 8 May 18, / 13

151 Examples of nonlinear preconditioning Hyperelasticity (Brune et al, 2013) Inner solver: Newton. Outer solver: nonlinear conjugate gradients. High-Reynolds number Navier Stokes (Cai and Keyes, 2002) Inner solver: nonlinear additive Schwarz. Outer solver: Newton Krylov. P. E. Farrell (Oxford) SPS 8 May 18, / 13

152 Examples of nonlinear preconditioning Hyperelasticity (Brune et al, 2013) Inner solver: Newton. Outer solver: nonlinear conjugate gradients. High-Reynolds number Navier Stokes (Cai and Keyes, 2002) Inner solver: nonlinear additive Schwarz. Outer solver: Newton Krylov. High-Prandtl number Navier Stokes (Brune et al, 2013) Inner solver: nonlinear multigrid. Outer solver: nonlinear GMRES. P. E. Farrell (Oxford) SPS 8 May 18, / 13

153 Nonlinear preconditioning: a remark The design space for nonlinear solvers is vast. At the moment we have very little theory to guide us. There are very large potential gains, however. P. E. Farrell (Oxford) SPS 8 May 18, / 13

154 Nonlinear multigrid The main bottleneck for massive problems is the linear system. P. E. Farrell (Oxford) SPS 8 May 18, / 13

155 Nonlinear multigrid The main bottleneck for massive problems is the linear system. What if we didn't have to solve (large) linear systems? P. E. Farrell (Oxford) SPS 8 May 18, / 13

156 Nonlinear multigrid The main bottleneck for massive problems is the linear system. What if we didn't have to solve (large) linear systems? FAS uses fine-grid residuals to correct coarse-grid equations. P. E. Farrell (Oxford) SPS 8 May 18, / 13

157 Full Approximation Scheme (FAS) Given: a problem (F^h, x^h, b^h), a smoother S and coarse solver M, and restriction, prolongation and injection operators R, P and R̂.
while not converged:
    x^h_s = S(F^h, x^h_i, b^h)
    x^H = R̂ x^h_s
    b^H = R[b^h − F^h(x^h_s)] + F^H(x^H)
    x^H_c = M(F^H, x^H, b^H)
    x^h_c = x^h_s + P[x^H_c − x^H]
    x^h_{i+1} = S(F^h, x^h_c, b^h)
P. E. Farrell (Oxford) SPS 8 May 18, / 13
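The cycle above transcribes almost directly into Python. A generic sketch (not from the slides) in which the fine and coarse problem operators and the grid-transfer operators are supplied by the user as callables:

    def fas_cycle(F_h, F_H, smooth, coarse_solve, R, P, Rhat, x_h, b_h):
        """One FAS two-grid cycle, following the pseudocode above."""
        x_s = smooth(F_h, x_h, b_h)             # pre-smooth on the fine grid
        x_H = Rhat(x_s)                         # inject the smoothed iterate
        b_H = R(b_h - F_h(x_s)) + F_H(x_H)      # FAS coarse-grid right-hand side
        x_Hc = coarse_solve(F_H, x_H, b_H)      # solve the coarse problem
        x_c = x_s + P(x_Hc - x_H)               # prolong and apply the correction
        return smooth(F_h, x_c, b_h)            # post-smooth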

158 Nonlinear multigrid You can use a high-flop smoother on the fine grids, and Newton-LU on the coarse grids! P. E. Farrell (Oxford) SPS 8 May 18, / 13

159 Nonlinear multigrid You can use a high-flop smoother on the fine grids, and Newton-LU on the coarse grids! (see firedrake Yamabe demo) P. E. Farrell (Oxford) SPS 8 May 18, / 13

160 HPC 08 Challenge! Consider again the p-Laplace equation (FEniCS lecture III).
1. Investigate the performance of different linesearch schemes on the p-Laplace problem.
2. Using only the basic linesearch for the inner solver, accelerate the convergence of Newton's method with left-preconditioning with ncg/cp.
3. Now use the optimal inner linesearch to beat the unaccelerated solver.
4. Choose sensible Krylov solvers and scale the code on ARCUS.
P. E. Farrell (Oxford) SPS 8 May 18, / 13

161 Solving PDEs on Supercomputers IV: a final challenge Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS 8 May 17, / 3

162 HPC 09 Challenge! (1/2) Consider the Cahn–Hilliard equation
∂c/∂t − ∇·( M ∇(df/dc − λ∇²c) ) = 0 in Ω,
M ∇(df/dc − λ∇²c)·n = 0 on ∂Ω,
Mλ ∇c·n = 0 on ∂Ω,
where c is the unknown field, f(c) = 100c²(c − 1)², n is the unit normal, and M is a scalar parameter. To solve this with standard C⁰ elements, write it as two coupled second-order problems.
P. E. Farrell (Oxford) SPS 8 May 17, / 3

163 HPC 09 Challenge! (2/2) Discretise and solve the equation on Ω = [0, 1]² for M = 1, λ = 10⁻², and initial condition
    class InitialConditions(Expression):
        def __init__(self):
            random.seed(2 + MPI.rank(mpi_comm_world()))
        def eval(self, values, x):
            values[0] = 0.63 + 0.02*(0.5 - random.random())
            values[1] = 0.0
        def value_shape(self):
            return (2,)
Make sure your scheme is at least second-order. Sensible values are Δt = , θ = 0.5. An excellent preconditioner is discussed in doi: /
P. E. Farrell (Oxford) SPS 8 May 17, / 3
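A hedged sketch of the mixed (c, μ) weak form for this challenge, closely following the standard DOLFIN Cahn–Hilliard demo (the mixed space ME, the previous solution u0, and the constants dt, theta, lmbda and M are assumed to be set up already):

    from dolfin import *

    q, v = TestFunctions(ME)
    u = Function(ME)                   # current solution (c, mu)
    c, mu = split(u)
    c0, mu0 = split(u0)                # previous time step

    c = variable(c)
    f = 100*c**2*(1 - c)**2            # the double-well potential
    dfdc = diff(f, c)

    mu_mid = (1.0 - theta)*mu0 + theta*mu   # theta-weighted chemical potential

    F = ((c - c0)*q*dx + dt*M*dot(grad(mu_mid), grad(q))*dx      # mass balance
         + mu*v*dx - dfdc*v*dx - lmbda*dot(grad(c), grad(v))*dx)  # definition of mu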


A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation Tao Zhao 1, Feng-Nan Hwang 2 and Xiao-Chuan Cai 3 Abstract In this paper, we develop an overlapping domain decomposition

More information

High Performance Nonlinear Solvers

High Performance Nonlinear Solvers What is a nonlinear system? High Performance Nonlinear Solvers Michael McCourt Division Argonne National Laboratory IIT Meshfree Seminar September 19, 2011 Every nonlinear system of equations can be described

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Iteration basics Notes for 2016-11-07 An iterative solver for Ax = b is produces a sequence of approximations x (k) x. We always stop after finitely many steps, based on some convergence criterion, e.g.

More information

Parallel sparse linear solvers and applications in CFD

Parallel sparse linear solvers and applications in CFD Parallel sparse linear solvers and applications in CFD Jocelyne Erhel Joint work with Désiré Nuentsa Wakam () and Baptiste Poirriez () SAGE team, Inria Rennes, France journée Calcul Intensif Distribué

More information

Using an Auction Algorithm in AMG based on Maximum Weighted Matching in Matrix Graphs

Using an Auction Algorithm in AMG based on Maximum Weighted Matching in Matrix Graphs Using an Auction Algorithm in AMG based on Maximum Weighted Matching in Matrix Graphs Pasqua D Ambra Institute for Applied Computing (IAC) National Research Council of Italy (CNR) pasqua.dambra@cnr.it

More information

Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems

Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems Discretization of PDEs and Tools for the Parallel Solution of the Resulting Systems Stan Tomov Innovative Computing Laboratory Computer Science Department The University of Tennessee Wednesday April 4,

More information

Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses

Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses P. Boyanova 1, I. Georgiev 34, S. Margenov, L. Zikatanov 5 1 Uppsala University, Box 337, 751 05 Uppsala,

More information

Multipole-Based Preconditioners for Sparse Linear Systems.

Multipole-Based Preconditioners for Sparse Linear Systems. Multipole-Based Preconditioners for Sparse Linear Systems. Ananth Grama Purdue University. Supported by the National Science Foundation. Overview Summary of Contributions Generalized Stokes Problem Solenoidal

More information

Iterative Methods and Multigrid

Iterative Methods and Multigrid Iterative Methods and Multigrid Part 3: Preconditioning 2 Eric de Sturler Preconditioning The general idea behind preconditioning is that convergence of some method for the linear system Ax = b can be

More information

Scalable Non-blocking Preconditioned Conjugate Gradient Methods

Scalable Non-blocking Preconditioned Conjugate Gradient Methods Scalable Non-blocking Preconditioned Conjugate Gradient Methods Paul Eller and William Gropp University of Illinois at Urbana-Champaign Department of Computer Science Supercomputing 16 Paul Eller and William

More information

CLASSICAL ITERATIVE METHODS

CLASSICAL ITERATIVE METHODS CLASSICAL ITERATIVE METHODS LONG CHEN In this notes we discuss classic iterative methods on solving the linear operator equation (1) Au = f, posed on a finite dimensional Hilbert space V = R N equipped

More information

Lecture 9 Approximations of Laplace s Equation, Finite Element Method. Mathématiques appliquées (MATH0504-1) B. Dewals, C.

Lecture 9 Approximations of Laplace s Equation, Finite Element Method. Mathématiques appliquées (MATH0504-1) B. Dewals, C. Lecture 9 Approximations of Laplace s Equation, Finite Element Method Mathématiques appliquées (MATH54-1) B. Dewals, C. Geuzaine V1.2 23/11/218 1 Learning objectives of this lecture Apply the finite difference

More information

Nonlinear Preconditioning in PETSc

Nonlinear Preconditioning in PETSc Nonlinear Preconditioning in PETSc Matthew Knepley PETSc Team Computation Institute University of Chicago Challenges in 21st Century Experimental Mathematical Computation ICERM, Providence, RI July 22,

More information

Computers and Mathematics with Applications

Computers and Mathematics with Applications Computers and Mathematics with Applications 68 (2014) 1151 1160 Contents lists available at ScienceDirect Computers and Mathematics with Applications journal homepage: www.elsevier.com/locate/camwa A GPU

More information

K.S. Kang. The multigrid method for an elliptic problem on a rectangular domain with an internal conductiong structure and an inner empty space

K.S. Kang. The multigrid method for an elliptic problem on a rectangular domain with an internal conductiong structure and an inner empty space K.S. Kang The multigrid method for an elliptic problem on a rectangular domain with an internal conductiong structure and an inner empty space IPP 5/128 September, 2011 The multigrid method for an elliptic

More information

New Multigrid Solver Advances in TOPS

New Multigrid Solver Advances in TOPS New Multigrid Solver Advances in TOPS R D Falgout 1, J Brannick 2, M Brezina 2, T Manteuffel 2 and S McCormick 2 1 Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, P.O.

More information

Integration of PETSc for Nonlinear Solves

Integration of PETSc for Nonlinear Solves Integration of PETSc for Nonlinear Solves Ben Jamroz, Travis Austin, Srinath Vadlamani, Scott Kruger Tech-X Corporation jamroz@txcorp.com http://www.txcorp.com NIMROD Meeting: Aug 10, 2010 Boulder, CO

More information

Composing Nonlinear Solvers

Composing Nonlinear Solvers Composing Nonlinear Solvers Matthew Knepley Computational and Applied Mathematics Rice University MIT Aeronautics and Astronautics Boston, MA May 10, 2016 Matt (Rice) PETSc MIT 1 / 69 What is PETSc? PETSc

More information

PETSc for Python. Lisandro Dalcin

PETSc for Python.  Lisandro Dalcin PETSc for Python http://petsc4py.googlecode.com Lisandro Dalcin dalcinl@gmail.com Centro Internacional de Métodos Computacionales en Ingeniería Consejo Nacional de Investigaciones Científicas y Técnicas

More information

Preconditioners for the incompressible Navier Stokes equations

Preconditioners for the incompressible Navier Stokes equations Preconditioners for the incompressible Navier Stokes equations C. Vuik M. ur Rehman A. Segal Delft Institute of Applied Mathematics, TU Delft, The Netherlands SIAM Conference on Computational Science and

More information

A Numerical Study of Some Parallel Algebraic Preconditioners

A Numerical Study of Some Parallel Algebraic Preconditioners A Numerical Study of Some Parallel Algebraic Preconditioners Xing Cai Simula Research Laboratory & University of Oslo PO Box 1080, Blindern, 0316 Oslo, Norway xingca@simulano Masha Sosonkina University

More information

Chapter 7 Iterative Techniques in Matrix Algebra

Chapter 7 Iterative Techniques in Matrix Algebra Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition

More information

7.4 The Saddle Point Stokes Problem

7.4 The Saddle Point Stokes Problem 346 CHAPTER 7. APPLIED FOURIER ANALYSIS 7.4 The Saddle Point Stokes Problem So far the matrix C has been diagonal no trouble to invert. This section jumps to a fluid flow problem that is still linear (simpler

More information

Lecture 8: Fast Linear Solvers (Part 7)

Lecture 8: Fast Linear Solvers (Part 7) Lecture 8: Fast Linear Solvers (Part 7) 1 Modified Gram-Schmidt Process with Reorthogonalization Test Reorthogonalization If Av k 2 + δ v k+1 2 = Av k 2 to working precision. δ = 10 3 2 Householder Arnoldi

More information

University of Illinois at Urbana-Champaign. Multigrid (MG) methods are used to approximate solutions to elliptic partial differential

University of Illinois at Urbana-Champaign. Multigrid (MG) methods are used to approximate solutions to elliptic partial differential Title: Multigrid Methods Name: Luke Olson 1 Affil./Addr.: Department of Computer Science University of Illinois at Urbana-Champaign Urbana, IL 61801 email: lukeo@illinois.edu url: http://www.cs.uiuc.edu/homes/lukeo/

More information

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication. CME342 Parallel Methods in Numerical Analysis Matrix Computation: Iterative Methods II Outline: CG & its parallelization. Sparse Matrix-vector Multiplication. 1 Basic iterative methods: Ax = b r = b Ax

More information

Distributed Memory Parallelization in NGSolve

Distributed Memory Parallelization in NGSolve Distributed Memory Parallelization in NGSolve Lukas Kogler June, 2017 Inst. for Analysis and Scientific Computing, TU Wien From Shared to Distributed Memory Shared Memory Parallelization via threads (

More information

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1 Parallel Numerics, WT 2016/2017 5 Iterative Methods for Sparse Linear Systems of Equations page 1 of 1 Contents 1 Introduction 1.1 Computer Science Aspects 1.2 Numerical Problems 1.3 Graphs 1.4 Loop Manipulations

More information

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009 Parallel Preconditioning of Linear Systems based on ILUPACK for Multithreaded Architectures J.I. Aliaga M. Bollhöfer 2 A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ.

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Background. Background. C. T. Kelley NC State University tim C. T. Kelley Background NCSU, Spring / 58

Background. Background. C. T. Kelley NC State University tim C. T. Kelley Background NCSU, Spring / 58 Background C. T. Kelley NC State University tim kelley@ncsu.edu C. T. Kelley Background NCSU, Spring 2012 1 / 58 Notation vectors, matrices, norms l 1 : max col sum... spectral radius scaled integral norms

More information

Parallel Discontinuous Galerkin Method

Parallel Discontinuous Galerkin Method Parallel Discontinuous Galerkin Method Yin Ki, NG The Chinese University of Hong Kong Aug 5, 2015 Mentors: Dr. Ohannes Karakashian, Dr. Kwai Wong Overview Project Goal Implement parallelization on Discontinuous

More information

PDE Solvers for Fluid Flow

PDE Solvers for Fluid Flow PDE Solvers for Fluid Flow issues and algorithms for the Streaming Supercomputer Eran Guendelman February 5, 2002 Topics Equations for incompressible fluid flow 3 model PDEs: Hyperbolic, Elliptic, Parabolic

More information

ANALYSIS OF AUGMENTED LAGRANGIAN-BASED PRECONDITIONERS FOR THE STEADY INCOMPRESSIBLE NAVIER-STOKES EQUATIONS

ANALYSIS OF AUGMENTED LAGRANGIAN-BASED PRECONDITIONERS FOR THE STEADY INCOMPRESSIBLE NAVIER-STOKES EQUATIONS ANALYSIS OF AUGMENTED LAGRANGIAN-BASED PRECONDITIONERS FOR THE STEADY INCOMPRESSIBLE NAVIER-STOKES EQUATIONS MICHELE BENZI AND ZHEN WANG Abstract. We analyze a class of modified augmented Lagrangian-based

More information

Algebraic Multigrid Methods for the Oseen Problem

Algebraic Multigrid Methods for the Oseen Problem Algebraic Multigrid Methods for the Oseen Problem Markus Wabro Joint work with: Walter Zulehner, Linz www.numa.uni-linz.ac.at This work has been supported by the Austrian Science Foundation Fonds zur Förderung

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 4: Iterative Methods PD

More information

Universität Dortmund UCHPC. Performance. Computing for Finite Element Simulations

Universität Dortmund UCHPC. Performance. Computing for Finite Element Simulations technische universität dortmund Universität Dortmund fakultät für mathematik LS III (IAM) UCHPC UnConventional High Performance Computing for Finite Element Simulations S. Turek, Chr. Becker, S. Buijssen,

More information

An advanced ILU preconditioner for the incompressible Navier-Stokes equations

An advanced ILU preconditioner for the incompressible Navier-Stokes equations An advanced ILU preconditioner for the incompressible Navier-Stokes equations M. ur Rehman C. Vuik A. Segal Delft Institute of Applied Mathematics, TU delft The Netherlands Computational Methods with Applications,

More information

INTRODUCTION TO MULTIGRID METHODS

INTRODUCTION TO MULTIGRID METHODS INTRODUCTION TO MULTIGRID METHODS LONG CHEN 1. ALGEBRAIC EQUATION OF TWO POINT BOUNDARY VALUE PROBLEM We consider the discretization of Poisson equation in one dimension: (1) u = f, x (0, 1) u(0) = u(1)

More information

MULTIGRID METHODS FOR NONLINEAR PROBLEMS: AN OVERVIEW

MULTIGRID METHODS FOR NONLINEAR PROBLEMS: AN OVERVIEW MULTIGRID METHODS FOR NONLINEAR PROBLEMS: AN OVERVIEW VAN EMDEN HENSON CENTER FOR APPLIED SCIENTIFIC COMPUTING LAWRENCE LIVERMORE NATIONAL LABORATORY Abstract Since their early application to elliptic

More information

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 Introduction Almost all numerical methods for solving PDEs will at some point be reduced to solving A

More information

Modelling and implementation of algorithms in applied mathematics using MPI

Modelling and implementation of algorithms in applied mathematics using MPI Modelling and implementation of algorithms in applied mathematics using MPI Lecture 3: Linear Systems: Simple Iterative Methods and their parallelization, Programming MPI G. Rapin Brazil March 2011 Outline

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

Geometric Multigrid Methods

Geometric Multigrid Methods Geometric Multigrid Methods Susanne C. Brenner Department of Mathematics and Center for Computation & Technology Louisiana State University IMA Tutorial: Fast Solution Techniques November 28, 2010 Ideas

More information

A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems

A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems Outline A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems Azzam Haidar CERFACS, Toulouse joint work with Luc Giraud (N7-IRIT, France) and Layne Watson (Virginia Polytechnic Institute,

More information

On domain decomposition preconditioners for finite element approximations of the Helmholtz equation using absorption

On domain decomposition preconditioners for finite element approximations of the Helmholtz equation using absorption On domain decomposition preconditioners for finite element approximations of the Helmholtz equation using absorption Ivan Graham and Euan Spence (Bath, UK) Collaborations with: Paul Childs (Emerson Roxar,

More information

The Removal of Critical Slowing Down. Lattice College of William and Mary

The Removal of Critical Slowing Down. Lattice College of William and Mary The Removal of Critical Slowing Down Lattice 2008 College of William and Mary Michael Clark Boston University James Brannick, Rich Brower, Tom Manteuffel, Steve McCormick, James Osborn, Claudio Rebbi 1

More information

A User Friendly Toolbox for Parallel PDE-Solvers

A User Friendly Toolbox for Parallel PDE-Solvers A User Friendly Toolbox for Parallel PDE-Solvers Gundolf Haase Institut for Mathematics and Scientific Computing Karl-Franzens University of Graz Manfred Liebmann Mathematics in Sciences Max-Planck-Institute

More information

On nonlinear adaptivity with heterogeneity

On nonlinear adaptivity with heterogeneity On nonlinear adaptivity with heterogeneity Jed Brown jed@jedbrown.org (CU Boulder) Collaborators: Mark Adams (LBL), Matt Knepley (UChicago), Dave May (ETH), Laetitia Le Pourhiet (UPMC), Ravi Samtaney (KAUST)

More information

Domain decomposition on different levels of the Jacobi-Davidson method

Domain decomposition on different levels of the Jacobi-Davidson method hapter 5 Domain decomposition on different levels of the Jacobi-Davidson method Abstract Most computational work of Jacobi-Davidson [46], an iterative method suitable for computing solutions of large dimensional

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

A Review of Preconditioning Techniques for Steady Incompressible Flow

A Review of Preconditioning Techniques for Steady Incompressible Flow Zeist 2009 p. 1/43 A Review of Preconditioning Techniques for Steady Incompressible Flow David Silvester School of Mathematics University of Manchester Zeist 2009 p. 2/43 PDEs Review : 1984 2005 Update

More information

Aggregation-based algebraic multigrid

Aggregation-based algebraic multigrid Aggregation-based algebraic multigrid from theory to fast solvers Yvan Notay Université Libre de Bruxelles Service de Métrologie Nucléaire CEMRACS, Marseille, July 18, 2012 Supported by the Belgian FNRS

More information

Toward less synchronous composable multilevel methods for implicit multiphysics simulation

Toward less synchronous composable multilevel methods for implicit multiphysics simulation Toward less synchronous composable multilevel methods for implicit multiphysics simulation Jed Brown 1, Mark Adams 2, Peter Brune 1, Matt Knepley 3, Barry Smith 1 1 Mathematics and Computer Science Division,

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

M.A. Botchev. September 5, 2014

M.A. Botchev. September 5, 2014 Rome-Moscow school of Matrix Methods and Applied Linear Algebra 2014 A short introduction to Krylov subspaces for linear systems, matrix functions and inexact Newton methods. Plan and exercises. M.A. Botchev

More information

Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning

Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology, USA SPPEXA Symposium TU München,

More information

Multigrid finite element methods on semi-structured triangular grids

Multigrid finite element methods on semi-structured triangular grids XXI Congreso de Ecuaciones Diferenciales y Aplicaciones XI Congreso de Matemática Aplicada Ciudad Real, -5 septiembre 009 (pp. 8) Multigrid finite element methods on semi-structured triangular grids F.J.

More information

Review of matrices. Let m, n IN. A rectangle of numbers written like A =

Review of matrices. Let m, n IN. A rectangle of numbers written like A = Review of matrices Let m, n IN. A rectangle of numbers written like a 11 a 12... a 1n a 21 a 22... a 2n A =...... a m1 a m2... a mn where each a ij IR is called a matrix with m rows and n columns or an

More information