Solving PDEs on Supercomputers I: modern supercomputer architecture
1 Supercomputer architecture Solving PDEs on Supercomputers I: modern supercomputer architecture Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS I May 17, / 17
2 Supercomputer architecture Moore's Law
Moore's Law: the number of transistors per unit area on integrated circuits doubles every two years. (Moore, 1965; the two-year period comes from his 1975 revision.)
3 Supercomputer architecture Moore's Law
The consequence: individual computers aren't getting faster; we're getting more of them.
4 Supercomputer architecture A modern supercomputer
In this lecture we will give a brief overview of modern supercomputer architecture. ARCHER is composed of 4920 nodes, each with 24 cores, for a total of 118,080 cores.
6 Supercomputer architecture A node
[Figure: diagram of a node]
Algorithmic consequence: extreme pressure on memory and memory bandwidth.
8 Supercomputer architecture A socket
[Figure: diagram of a socket]
Algorithmic consequence: want to have multiple cores working on the same data.
10 Supercomputer architecture A core
[Figure: diagram of a core]
Algorithmic consequence: vectorisation is essential for maximum floating point performance.
13 Supercomputer architecture Hardware properties Some relative timings
On a 3.0 GHz Intel Core 2 Duo E8400:
  One clock cycle: 1/3 nanoseconds (about 10 light-cm!)
  Accessing L1 data cache (32 KB): 3 cycles
  Accessing L2 cache (6 MB): 14 cycles
  Accessing main memory: 250 cycles
  Accessing disk: 40 million cycles
Analogy (one clock cycle = one second):
  Register: the data is on your working paper.
  L1 cache: the data is on your desk (3 seconds).
  L2 cache: the data is on your bookshelf (14 seconds).
  Main memory: the data is in the library (a 4 minute walk).
  Disk: go backpacking for 1.2 years.
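The analogy is plain unit conversion, and a few lines of Python reproduce it. This is a minimal sketch: the cycle counts are the ones quoted on the slide, while the function names and the year length are assumptions made for illustration.

```python
# Cycle counts quoted on the slide for a 3.0 GHz Intel Core 2 Duo E8400.
CYCLES = {
    "L1 cache": 3,
    "L2 cache": 14,
    "main memory": 250,
    "disk": 40_000_000,
}
CLOCK_HZ = 3.0e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def real_time_ns(cycles):
    """Wall-clock latency in nanoseconds at the stated clock rate."""
    return cycles / CLOCK_HZ * 1e9


def human_time(cycles):
    """Analogy time: pretend that one clock cycle lasts one second."""
    if cycles < 60:
        return f"{cycles} s"
    if cycles < 3600:
        return f"{cycles / 60:.1f} min"
    return f"{cycles / SECONDS_PER_YEAR:.2f} years"


for name, cycles in CYCLES.items():
    print(f"{name:12s} {real_time_ns(cycles):14.1f} ns   ~ {human_time(cycles)}")
```

The disk figure comes out slightly above the rounded 1.2 years on the slide, which is consistent with 40 million seconds.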
16 Supercomputer architecture Hardware properties The interconnect
[Figure: the interconnect]
Some more timings. On the Cray Aries interconnect, to send a message:
  Within a socket: 800 cycles
  Within a node: 1600 cycles
  Across the machine: 8000 cycles
Algorithmic consequence: interleave communication and computation.
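To see why interleaving matters, consider an idealised cost model for a single step. This is a sketch only: it assumes perfect overlap and a hypothetical amount of local work; only the 8000-cycle message cost comes from the slide.

```python
def step_time(compute, communicate, overlap):
    """Time for one step of a simple two-phase cost model (in cycles)."""
    if overlap:
        # The message travels in the background while we compute on data
        # that does not depend on it (idealised: perfect overlap).
        return max(compute, communicate)
    # Blocking: wait for the message, then compute.
    return compute + communicate


cross_machine = 8000   # cycles to send a message across the machine (slide)
interior_work = 20000  # hypothetical cycles of message-independent local work

blocking = step_time(interior_work, cross_machine, overlap=False)
overlapped = step_time(interior_work, cross_machine, overlap=True)
print(blocking, overlapped)
```

In this model the message cost disappears entirely whenever there is at least as much independent work as communication time; real codes approach this with nonblocking sends and receives.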
17 MPI and OpenMP Domain decomposition
The coarsest level of parallelism used is domain decomposition over MPI.
    from dolfin import *
    mesh = UnitCubeMesh(32, 32, 32)
    # colour every cell with the rank of the MPI process that owns it
    partitioning = CellFunction("size_t", mesh)
    partitioning.set_all(MPI.rank(mpi_comm_world()))
    File("output/partitioning.xdmf") << partitioning

    $ mpiexec -n 4 python partition.py
18 MPI and OpenMP MPI: basic model
MPI: separate processes with separate memory spaces communicate via message passing.
MPI concepts:
  communicator
  collective
  rank
  blocking and nonblocking communication
  reductions
Each subdomain is assigned to one MPI rank.
19 MPI and OpenMP Main communication patterns in finite elements
Assembly: assembly requires exchanging halo data with your neighbours.
[Figure: processors 0 and 1, each with core, owned, exec halo and non-exec halo regions]
20 MPI and OpenMP Main communication patterns in finite elements
Krylov solvers:
  Neighbour communications for sparse matrix-vector product.
  Global reductions (allreduce for dot products).
  Preconditioner application.
  Multigrid: extremely complicated.
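The global reduction behind a parallel dot product can be illustrated without MPI. The sketch below simulates the ranks in a single process; `partition`, `local_dot` and `allreduce_sum` are hypothetical helpers, with `allreduce_sum` standing in for what an `MPI_Allreduce` with `MPI_SUM` would deliver.

```python
def partition(n, nranks):
    """Split range(n) into nranks contiguous, nearly equal slices."""
    base, extra = divmod(n, nranks)
    bounds, start = [], 0
    for r in range(nranks):
        stop = start + base + (1 if r < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def local_dot(x, y, lo, hi):
    """Dot product over the entries owned by one rank."""
    return sum(x[i] * y[i] for i in range(lo, hi))


def allreduce_sum(values):
    """Stand-in for an allreduce: every rank receives the global sum."""
    total = sum(values)
    return [total] * len(values)


n, nranks = 10, 4
x = [float(i) for i in range(n)]
y = [1.0] * n
partials = [local_dot(x, y, lo, hi) for lo, hi in partition(n, nranks)]
result = allreduce_sum(partials)
print(partials, result[0])
```

Each Krylov iteration needs a few such reductions, and every one of them synchronises all ranks, which is why they become a bottleneck at scale.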
21 MPI and OpenMP OpenMP: basic model
OpenMP: separate threads operate on the same memory space.
  + Less overhead in parallel execution
  + Multiple cores can act on the same data
  + Less pressure on memory and memory bandwidth
  + Easier load balancing
  - Extremely difficult to program correctly
  - Subtle race conditions possible
  - Colouring and locks required to synchronise
22 MPI and OpenMP
DOLFIN can also run in OpenMP mode for assembly:
    from dolfin import *
    parameters["num_threads"] = 4
    # ...
    solve(f == 0, u)  # must use a threaded solver (e.g. pastix)!
You can't use MPI and OpenMP at the same time (yet).
29 Algorithmic consequences
General algorithmic consequences:
  Need algorithms with high arithmetical intensity.
  Caches greatly dislike unstructured memory accesses.
  Flops are (approximately) free.
  Large stencils induce extra communication.
  Must overlap communication and computation.
  Solver algorithms must be O(n) or O(n log n).
General algorithmic trends:
  Domain-decomposed high-order FE on semi-structured meshes.
  Multigrid/multilevel solvers with Krylov accelerators.
  Hybrid parallelism strategies (MPI/OpenMP/AVX).
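Arithmetical intensity is the number of floating point operations per byte moved to or from memory. A rough sketch for two kernels follows; the function names are illustrative, and the matrix product assumes ideal caching, with each matrix moved exactly once.

```python
BYTES_PER_DOUBLE = 8


def intensity_axpy(n):
    """y <- a*x + y: 2n flops; read x and y, write y (3n doubles)."""
    return (2 * n) / (3 * n * BYTES_PER_DOUBLE)


def intensity_matmul(n):
    """C <- A @ B, dense n-by-n: 2n^3 flops; A, B, C each moved once (ideal)."""
    return (2 * n**3) / (3 * n**2 * BYTES_PER_DOUBLE)


print(intensity_axpy(1000))    # about 0.083 flops per byte, independent of n
print(intensity_matmul(1200))  # grows linearly with n
```

Vector updates are stuck at a fixed, low intensity and are therefore limited by memory bandwidth, whereas dense blocks of work (as in high-order finite elements) can keep the floating point units busy.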
30 Solving PDEs on Supercomputers II: practical matters of using supercomputers Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS 2 May 17, / 7
31 Logging on
Supercomputers are accessed by sshing to the login nodes.
    $ ssh mmschpcxx@arcus.oerc.ox.ac.uk
You configure your environment with modules:
    $ module list
    No Modulefiles Currently Loaded.
    $ module avail
    ...
    $ module use -a /data/math-farrellp/crichardson/modules
    $ module load fenics/1.5.0
    $ module list
Modules are generally awful, but nothing better exists yet.
32 Running jobs interactively
The simplest way to run a job is interactively. This is mainly used for debugging.
    $ qsub -I -l nodes=1:ppn=16 -l walltime=0:10:00 -q develq
    qsub: waiting for job headnode1.arcus.osc.local to start
    # wait until PBS allocates us the resources we asked for...
    qsub: job headnode1.arcus.osc.local ready
    $ cd $PBS_O_WORKDIR
    $ module use -a /data/math-farrellp/crichardson/modules
    $ module load fenics/1.5.0
    $ mpirun $MPI_HOSTS python poisson.py
33 Running jobs in batch mode
ARCUS-A and ARCHER are managed using PBS, the Portable Batch System. Users submit jobs to the batch system, which decides when and where they get executed.
The main PBS commands: qsub, qdel, qstat.
The argument to qsub is a PBS script.
34 Running jobs in batch mode
    #!/bin/bash
    # set the number of nodes and processes per node
    #PBS -l nodes=1:ppn=16
    # set max wallclock time
    #PBS -l walltime=1:00:00
    # set name of job
    #PBS -N poisson
    # mail alert at start, end and abortion of execution
    #PBS -m bea
    # send mail to this address
    #PBS -M patrick.farrell@maths.ox.ac.uk
    # start job from the directory it was submitted
    cd $PBS_O_WORKDIR
    module use -a /data/math-farrellp/crichardson/modules
    module load fenics/
    enable_arcus_mpi.sh
    # run in parallel and keep a copy of the output in poisson.log
    mpirun $MPI_HOSTS python poisson.py | tee poisson.log
35 HPC 02 Challenge!
Investigate the weak scaling of the 2D Poisson solver with parallel LU that you developed last week:
Have the code refine the mesh once each time the number of cores quadruples. Hint:
    size = MPI.size(mpi_comm_world())
    ...
    for i in range(nrefine):
        mesh = refine(mesh, redistribute=False)
Run the code on 1, 4 and 16 cores. What happens to the runtime as the problem is scaled weakly? ...
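One way to fill in the elided part of the hint is to derive the number of refinements from the core count. This is a sketch under the assumption that each uniform refinement of a 2D triangle mesh multiplies the cell count by 4; `nrefine_for` and `weak_efficiency` are hypothetical helper names, not DOLFIN functions.

```python
def nrefine_for(cores):
    """Uniform refinements that keep the cells per core fixed in 2D.

    Assumes each refinement multiplies the number of cells by 4, so that
    quadrupling the cores calls for exactly one more refinement.
    """
    n, c = 0, 1
    while c < cores:
        c *= 4
        n += 1
    if c != cores:
        raise ValueError("core count must be a power of 4")
    return n


def weak_efficiency(t_base, t_p):
    """Weak-scaling efficiency: 1.0 means the runtime did not grow."""
    return t_base / t_p


print([nrefine_for(c) for c in (1, 4, 16)])
```

With perfect weak scaling the runtime would stay constant across the three runs; the measured efficiency shows how far the solver is from that ideal.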
36 HPC 02 Challenge!
Which components of the solver are taking the longest? Profile the code with
  DOLFIN timing system:
    list_timings()
  PETSc timing system:
    import petsc4py
    petsc4py.init("-log_summary summary.log".split())
    from dolfin import *
Now switch to HYPRE algebraic multigrid and compare the timings again.
Hint: to get more details about the AMG solve, call
    PETScOptions.set("pc_hypre_boomeramg_print_statistics", 1)
37 Solving PDEs on Supercomputers III: an introduction to PETSc Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS 3 May 17, / 5
38 PETSc
PETSc is a library of linear and nonlinear solvers for sparse PDEs. It has won most awards going:
  SIAM/ACM Prize in Computational Science and Engineering
  R&D 100 Award
  Gordon Bell Prizes in 2009, 2004, 2003
PETSc makes it easy to express complex hierarchical composed solvers as compactly as possible.
39 Fundamental objects [Vec, Mat, PC, KSP, SNES] Vec
Vec represents a dense vector, decomposed in parallel.
Example:
    ierr = VecCreateMPI(PETSC_COMM_WORLD, local, global, &x);
    ierr = VecDuplicate(x, &y);
    ierr = VecDotBegin(x, y, &xty);
    /* other computations */
    ierr = VecDotEnd(x, y, &xty);
40 Fundamental objects [Vec, Mat, PC, KSP, SNES] Mat
Mat represents a sparse matrix, decomposed in parallel.
Example:
    ierr = MatCreateAIJ(PETSC_COMM_WORLD, ..., &mat);
    for (i = 0; i < local_rows; i++)
        ierr = MatSetValues(mat, ...);
    ierr = MatAssemblyBegin(mat, MAT_FINAL_ASSEMBLY);
    ierr = MatAssemblyEnd(mat, MAT_FINAL_ASSEMBLY);
    ierr = MatMult(mat, x, y);
41 Fundamental objects [Vec, Mat, PC, KSP, SNES] PC
PC represents a linear preconditioner (Jacobi, Gauss-Seidel, ILU, ICC, AMG, additive Schwarz, ...)
Example:
    ierr = PCCreate(PETSC_COMM_WORLD, &pc);
    ierr = PCSetOperators(pc, A, P);
    ierr = PCSetType(pc, PCILU);
    ierr = PCSetUp(pc);
    ierr = PCApply(pc, x, y);
42 Fundamental objects [Vec, Mat, PC, KSP, SNES] KSP
KSP represents a linear solver (CG, GMRES, TFQMR, BICGSTAB, MINRES, GCR, Richardson, Chebyshev, ...)
Example:
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);
    ierr = KSPSetOperators(ksp, A, P);
    ierr = KSPSetType(ksp, KSPCG);
    ierr = KSPSetUp(ksp);
    ierr = KSPSolve(ksp, b, x);
43 Fundamental objects [Vec, Mat, PC, KSP, SNES] SNES
SNES represents a nonlinear solver (Newton, reduced-space Newton, NGMRES, NCG, Anderson acceleration, FAS, ...)
Example:
    ierr = SNESCreate(PETSC_COMM_WORLD, &snes);
    ierr = SNESSetFunction(snes, r, residual);
    ierr = SNESSetJacobian(snes, J, P, jacobian);
    ierr = SNESSetType(snes, SNESVINEWTONRSLS);
    ierr = SNESSetVariableBounds(snes, xl, xu);
    ierr = SNESSetUp(snes);
    ierr = SNESSolve(snes, b, x);
46 Hierarchical composition
Principle: all objects are composable.
Principle: all objects are configurable.
[Figure: example from variational fracture mechanics]
47 Wiring PETSc and FEniCS
We're going to need fine control to design our solvers. A simple interface between FEniCS and PETSc:
    $ git clone
48 Solving PDEs on Supercomputers IV: algebraic multigrid Patrick Farrell MMSC: Python in Scientific Computing May 18, 2015 P. E. Farrell (Oxford) SPS 4 May 18, / 13
49 Multilevel solvers
At the core of most PDE solvers is the solution of a linear system:
  $Ax = b$
The most powerful solvers for PDEs exploit the fact that there exists an infinite hierarchy of discretisations, all approximating the same problem:
  $A_h x_h = b_h$
  $A_{2h} x_{2h} = b_{2h}$
  $A_{4h} x_{4h} = b_{4h}$
  ...
52 Geometric multigrid: review
Geometric multigrid algorithm:
  Begin with an initial guess.
  Apply a relaxation method to smooth the error.
  Solve for the smooth error on a coarse grid.
54 Why did geometric multigrid work?
Geometric multigrid worked on the Laplacian because:
  simple relaxation methods yielded geometrically smooth errors;
  those errors could be well-represented on coarse grids.
What about problems where the error isn't smooth after relaxation?
Anisotropic Laplacian:
  $-a u_{xx} - b u_{yy} = f$ in $\Omega = [0, 1]^2$,
  $u = g$ on $\partial\Omega$,
  with $a = b$ if $x < 1/2$ and $a \gg b$ if $x \geq 1/2$.
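The smoothing property for the isotropic case can be checked on the 1D model problem. The sketch below is not part of the lecture code: it applies weighted Jacobi (with the assumed damping parameter 2/3) to the error for the matrix tridiag(-1, 2, -1) and compares a smooth and an oscillatory sine mode.

```python
import math


def jacobi_sweep(e, omega=2.0 / 3.0):
    """One weighted-Jacobi sweep on the error for A = tridiag(-1, 2, -1).

    The error obeys e <- (I - omega * D^{-1} A) e with D = 2I; values
    outside the vector are zero (homogeneous Dirichlet conditions).
    """
    n = len(e)
    new = [0.0] * n
    for j in range(n):
        left = e[j - 1] if j > 0 else 0.0
        right = e[j + 1] if j < n - 1 else 0.0
        Ae_j = 2.0 * e[j] - left - right
        new[j] = e[j] - omega * Ae_j / 2.0
    return new


def mode(k, n):
    """k-th discrete sine mode on n interior points."""
    return [math.sin(k * math.pi * (j + 1) / (n + 1)) for j in range(n)]


def damping(k, n=63, sweeps=5):
    """Factor by which the max-norm of mode k shrinks after the sweeps."""
    e = mode(k, n)
    start = max(abs(v) for v in e)
    for _ in range(sweeps):
        e = jacobi_sweep(e)
    return max(abs(v) for v in e) / start


print(damping(1), damping(63))
```

The oscillatory mode is reduced by more than two orders of magnitude in five sweeps while the smooth mode is almost untouched, which is exactly the part of the error that the coarse grid must handle.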
58 Two responses
GMG: design increasingly arcane relaxation methods that do smooth; semi-coarsening, multi-coarsening, etc.
AMG: fix a simple relaxation method; algebraically construct coarse grids and interpolation operators; demand that these can well represent the error after relaxation.
A nice side effect: AMG requires much less infrastructure:
  + No need to supply coarse grids
  + No need to supply interpolation operators
  - Only applies to linear problems
  - Requires global linearisation (memory)
  - Requires near-nullspace of operator
60 Anisotropic Laplacian again
[Figures omitted]
63 Fundamental principles of AMG I: relaxation and error
Recall Richardson iteration with a preconditioner $P$:
  $x_{k+1} = x_k + P^{-1}(b - A x_k)$.
A simple error analysis shows
  $e_{k+1} = (I - P^{-1}A)\, e_k$.
Now if $e_{k+1} \approx e_k$, then the error lies in the near-nullspace of $A$:
  $P^{-1} A e_k \approx 0 \implies A e_k \approx 0$.
64 Fundamental principles of AMG I: relaxation and error
Error after relaxation: the error after relaxation is related to the near-nullspace of the operator.
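The error equation follows in one line from the iteration. As a worked step, assuming $x$ denotes the exact solution of $Ax = b$ and $e_k = x - x_k$ (the sign convention is an assumption; the opposite one gives the same result):

```latex
\begin{aligned}
e_{k+1} &= x - x_{k+1} \\
        &= x - x_k - P^{-1}(b - A x_k) \\
        &= e_k - P^{-1} A (x - x_k) \qquad \text{since } b = Ax \\
        &= (I - P^{-1} A)\, e_k .
\end{aligned}
```

Relaxation stalls precisely on those components for which $P^{-1} A e_k$ is small, which is what ties the remaining error to the near-nullspace of $A$.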
65 Fundamental principles of AMG II: interpolation
Recall that in one multigrid cycle we approximate the fine error as
  $e_h \approx P_H^h e_H$.
Thus, we want the near-nullspace to be in the range of $P_H^h$.
76 Coarse grid generation: an example
Classical AMG: coarse-grid generation
  1. Select C-point with maximal measure
  2. Select neighbours as F-points
  3. Update measures of neighbours
[Figure sequence omitted]
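The three steps can be sketched as a greedy loop over a graph of strong connections. This is an illustrative toy, not the algorithm of any particular AMG package: the initial measure (number of strong neighbours), the tie-breaking rule and the function name `cf_splitting` are assumptions.

```python
def cf_splitting(neighbours):
    """Greedy C/F splitting on a graph of strong connections.

    neighbours maps each point (an int) to the set of points it is
    strongly connected to; the graph is assumed symmetric.
    Returns a dict mapping each point to "C" or "F".
    """
    measure = {p: len(nb) for p, nb in neighbours.items()}
    label = {}
    while len(label) < len(neighbours):
        # 1. The unassigned point with maximal measure becomes a C-point
        #    (ties go to the smallest index, to keep the sketch deterministic).
        free = [p for p in neighbours if p not in label]
        c = max(free, key=lambda p: (measure[p], -p))
        label[c] = "C"
        # 2. Its unassigned neighbours become F-points.
        new_f = [q for q in neighbours[c] if q not in label]
        for q in new_f:
            label[q] = "F"
        # 3. Unassigned neighbours of the new F-points get a larger measure,
        #    making them more attractive as future C-points.
        for q in new_f:
            for r in neighbours[q]:
                if r not in label:
                    measure[r] += 1
    return label


# 1D chain 0 - 1 - 2 - 3 - 4 - 5 - 6
chain = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}
split = cf_splitting(chain)
print("".join(split[i] for i in range(7)))
```

On the chain this recovers standard coarsening by a factor of two: every F-point sits next to a C-point from which it can interpolate.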
85 Coarse grid generation: an example
Smoothed-aggregation AMG: coarse-grid generation
Phase 1:
  1. Pick a root point not adjacent to an aggregation
  2. Aggregate root and neighbours
Phase 2:
  Move points into nearby aggregations
[Figure sequence omitted]
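The two phases can likewise be sketched on a small graph. This toy version visits points in index order and lets a leftover point join the aggregate of its lowest-numbered aggregated neighbour; both choices, and the name `aggregate`, are assumptions for illustration.

```python
def aggregate(neighbours):
    """Two-phase greedy aggregation; returns a dict point -> aggregate index."""
    agg = {}
    count = 0
    # Phase 1: a root must be unaggregated and not adjacent to any aggregate.
    for p in sorted(neighbours):
        if p in agg or any(q in agg for q in neighbours[p]):
            continue
        agg[p] = count
        for q in neighbours[p]:
            agg[q] = count
        count += 1
    # Phase 2: move each leftover point into an adjacent aggregate.
    # Only phase-1 assignments are consulted, so a leftover point never
    # attaches itself to another leftover point.
    phase1 = dict(agg)
    for p in sorted(neighbours):
        if p not in agg:
            for q in sorted(neighbours[p]):
                if q in phase1:
                    agg[p] = phase1[q]
                    break
    return agg


# 1D chain 0 - 1 - 2 - 3 - 4 - 5
chain6 = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 5} for i in range(6)}
result6 = aggregate(chain6)
print([result6[i] for i in range(6)])
```

Points 0 and 3 act as roots in phase 1; point 5 is adjacent to an existing aggregate, so it cannot be a root and is swept up in phase 2.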
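86 HPC 04 Challenge!
Consider the linear elasticity equation
  $-\nabla \cdot \sigma(u) = f$ in $\Omega$,
  $u = 0$ on $\partial\Omega_D$,
  $\sigma \cdot n = 0$ on $\partial\Omega_N$
on the pulley mesh, where
  $\varepsilon(u) = \frac{1}{2}\left(\nabla u + \nabla u^T\right)$,
  $\sigma(u) = 2\mu\,\varepsilon(u) + \lambda\,\mathrm{tr}(\varepsilon(u))\, I$,
  $f = (\rho\omega^2 x, \rho\omega^2 y, 0)$,
  $\partial\Omega_D = \{(x, y, z) \in \partial\Omega : x^2 + y^2 < ( z)^2\}$ (the expression in parentheses is incomplete in the transcription),
  $\partial\Omega_N = \partial\Omega \setminus \partial\Omega_D$,
  $E = 10^9$, $\nu = 0.3$, $\rho = 10$, $\omega = 300$.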
87 HPC 04 Challenge!
Solve this problem using only smoothed aggregation algebraic multigrid (no Krylov accelerator: -ksp_type richardson -ksp_monitor_true_residual -pc_type gamg). How many iterations does it take to converge to atol (a) without the near-nullspace, (b) with the near-nullspace? Here the near-nullspace is the rigid body translations and rotations.
Now investigate the configuration of the smoothed aggregation AMG solver and the Krylov accelerator. (Hint: -help, -snes_view.) By tuning the solver, can you achieve faster convergence?
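For reference, the six rigid body modes in 3D are three translations and three infinitesimal rotations. The sketch below samples them at a few hypothetical points in plain Python; in an actual solve these vectors would be assembled as PETSc vectors and attached to the matrix (PETSc provides `MatSetNearNullSpace` for this), which is not shown here.

```python
def rigid_body_modes(points):
    """Six near-nullspace vectors of 3D elasticity, sampled at the points.

    Each mode is a list of per-point displacements (ux, uy, uz):
    three translations followed by three infinitesimal rotations.
    """
    translations = [
        [(1.0, 0.0, 0.0) for _ in points],
        [(0.0, 1.0, 0.0) for _ in points],
        [(0.0, 0.0, 1.0) for _ in points],
    ]
    rotations = [
        [(0.0, -z, y) for (x, y, z) in points],  # about the x-axis
        [(z, 0.0, -x) for (x, y, z) in points],  # about the y-axis
        [(-y, x, 0.0) for (x, y, z) in points],  # about the z-axis
    ]
    return translations + rotations


def stretch(mode, points, i, j):
    """First-order change in the squared distance between points i and j."""
    du = [a - b for a, b in zip(mode[i], mode[j])]
    dx = [a - b for a, b in zip(points[i], points[j])]
    return sum(a * b for a, b in zip(du, dx))


pts = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (-2.0, 0.5, 4.0)]  # hypothetical nodes
modes = rigid_body_modes(pts)
print(len(modes))
```

None of the modes stretches any segment to first order, so they produce zero strain and (away from the Dirichlet boundary) almost zero residual; this is why the smoother cannot remove them and the coarse space must represent them.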
88 Solving PDEs on Supercomputers V: algebraic multigrid on nonsymmetric problems Patrick Farrell MMSC: Python in Scientific Computing May 19, 2015 P. E. Farrell (Oxford) SPS 5 May 19, / 4
89 HPC 05 Challenge! (1/3)
Implement a solver for the Yamabe equation 8 2 u + 1 r 3 u u = 0 on the doughnut mesh with boundary conditions u = 1. Initialise Newton with the initial guess u = 1.
90 HPC 05 Challenge! (2/3)
Next, develop an efficient linear solver:
  1. First use Newton + LU.
  2. Next, try GMRES + GAMG. Does it work well?
  3. Try increasing the maximum size of the coarse grid (pc_gamg_coarse_eq_limit).
  4. Ah! Now we're getting somewhere. Does changing the smoother help (mg_levels_ksp_monitor_true_residual)?
  5. Increase the quality of the smoothed aggregation basis (pc_gamg_agg_nsmooths).
91 HPC 05 Challenge! (3/3)
Profile the code. Where is it spending most of its time? How can the preconditioner construction cost be reduced? Once that is done, compare the memory usage of GMRES, FGMRES, GCR and CGS.
92 Solving PDEs on Supercomputers VI: fieldsplit preconditioners Patrick Farrell MMSC: Python in Scientific Computing May 19, 2015 P. E. Farrell (Oxford) SPS 6 May 19, / 8
93 Block triangular factorisations
A block matrix with nonsingular $A$ has a block triangular factorisation:
  $J = \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} I & 0 \\ CA^{-1} & I \end{pmatrix} \begin{pmatrix} A & 0 \\ 0 & S \end{pmatrix} \begin{pmatrix} I & A^{-1}B \\ 0 & I \end{pmatrix}$,
where $S = D - CA^{-1}B$ is the (dense!) Schur complement. This gives us an expression for its inverse:
  $\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} I & -A^{-1}B \\ 0 & I \end{pmatrix} \begin{pmatrix} A^{-1} & 0 \\ 0 & S^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -CA^{-1} & I \end{pmatrix}$.
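Both identities are easy to sanity-check numerically. The sketch below uses scalar (1-by-1) blocks with arbitrarily chosen values, so each factor is an ordinary 2-by-2 matrix; `matmul2` is a hypothetical helper.

```python
def matmul2(X, Y):
    """Product of two 2-by-2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Scalar blocks (arbitrary values, chosen to be exactly representable).
A, B, C, D = 4.0, 2.0, 1.0, 3.0
S = D - C * B / A                      # Schur complement: 3 - 0.5 = 2.5

lower = [[1.0, 0.0], [C / A, 1.0]]
diag = [[A, 0.0], [0.0, S]]
upper = [[1.0, B / A], [0.0, 1.0]]
J = matmul2(matmul2(lower, diag), upper)

upper_inv = [[1.0, -B / A], [0.0, 1.0]]
diag_inv = [[1.0 / A, 0.0], [0.0, 1.0 / S]]
lower_inv = [[1.0, 0.0], [-C / A, 1.0]]
J_inv = matmul2(matmul2(upper_inv, diag_inv), lower_inv)

print(J)                  # recovers [[A, B], [C, D]]
print(matmul2(J, J_inv))  # the identity, up to rounding
```

The same algebra holds verbatim when the entries are matrices, provided the products are kept in the order written on the slide.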
98 Fieldsplit preconditioners
This gives rise to four related theorems.
Theorem (full). The choice
  $P = \begin{pmatrix} I & 0 \\ CA^{-1} & I \end{pmatrix} \begin{pmatrix} A & 0 \\ 0 & S \end{pmatrix} \begin{pmatrix} I & A^{-1}B \\ 0 & I \end{pmatrix}$
will induce Krylov convergence in 1 iteration.
Theorem (lower). The choice
  $P = \begin{pmatrix} I & 0 \\ CA^{-1} & I \end{pmatrix} \begin{pmatrix} A & 0 \\ 0 & S \end{pmatrix}$
will induce Krylov convergence in 2 iterations.
Theorem (upper). The choice
  $P = \begin{pmatrix} A & 0 \\ 0 & S \end{pmatrix} \begin{pmatrix} I & A^{-1}B \\ 0 & I \end{pmatrix}$
will induce Krylov convergence in 2 iterations.
Theorem (diag). The choice
  $P = \begin{pmatrix} A & 0 \\ 0 & S \end{pmatrix}$
will induce Krylov convergence in 3 iterations, if $D = 0$.
How do you use this? Cheaply approximate $A^{-1}$ and $S^{-1}$ (problem specific)!
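The iteration counts for the triangular variants follow from a short minimal-polynomial argument. As a sketch, write the factorisation as $J = \mathcal{L}\,\mathcal{D}\,\mathcal{U}$, where the script letters (notation introduced here, not on the slides) denote the unit lower triangular, block diagonal and unit upper triangular factors:

```latex
\begin{aligned}
\text{upper: } P = \mathcal{D}\,\mathcal{U}
  &\;\Longrightarrow\; J P^{-1} = \mathcal{L}, \qquad
  (\mathcal{L} - I)^2 =
  \begin{pmatrix} 0 & 0 \\ C A^{-1} & 0 \end{pmatrix}^{2} = 0, \\
\text{lower: } P = \mathcal{L}\,\mathcal{D}
  &\;\Longrightarrow\; P^{-1} J = \mathcal{U}, \qquad
  (\mathcal{U} - I)^2 =
  \begin{pmatrix} 0 & A^{-1} B \\ 0 & 0 \end{pmatrix}^{2} = 0 .
\end{aligned}
```

In both cases the preconditioned operator satisfies a polynomial of degree 2, so a Krylov method such as GMRES terminates in at most 2 iterations in exact arithmetic. This relies on applying $A^{-1}$ and $S^{-1}$ exactly; with approximate inner solves the counts are only a guide.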
100 Spectral equivalence
Definition (spectral equivalence). $A_h$ and $B_h \in \mathbb{R}^{n \times n}$ are spectrally equivalent, $A_h \sim B_h$, iff there exist constants $c, C$ independent of $h$ such that
  $c \leq \lambda(B_h^{-1} A_h) \leq C$.
Solving block-structured systems: find an approximation $\hat{S} \sim S$ or $\hat{S}^{-1} \sim S^{-1}$.
103 Stokes equations
The Stokes equations are
  $-\nu \nabla^2 u + \nabla p = 0$,
  $\nabla \cdot u = 0$.
A stable discretisation yields
  $J = \begin{pmatrix} A & B^T \\ B & 0 \end{pmatrix}$
with $S = -B A^{-1} B^T$.
Spectral equivalence (e.g. Elman, Silvester and Wathen, 2005): let $Q$ be the viscosity-weighted pressure mass matrix
  $Q_{ij} = \frac{1}{\nu} \int_\Omega \phi_i \phi_j$.
Then $S \sim Q$.
106 Coding tools
Creating PETSc index sets to extract dofs:
    from petsc4py import PETSc
    u_dofs = SubSpace(Z, 0).dofmap().dofs()
    u_is = PETSc.IS().createGeneral(u_dofs)
    # p_is is built the same way from the pressure subspace, SubSpace(Z, 1)
Configuring the dofs to split:
    fields = [("0", u_is), ("1", p_is)]
    snes.ksp.pc.setFieldSplitIS(*fields)
Setting the matrix for building a preconditioner for the Schur complement:
    schur = (1.0/nu) * inner(p, q)*dx
    schur_full = assemble(schur)
    schur_fmat = as_backend_type(schur_full).mat()
    schur_mat = schur_fmat.getSubMatrix(p_is, p_is)
    snes.ksp.pc.setFieldSplitSchurPreType(PETSc.PC.SchurPreType.USER, schur_mat)
107 Configuring fieldsplit
    --petsc.ksp_converged_reason
    --petsc.ksp_type fgmres
    --petsc.ksp_monitor_true_residual
    --petsc.ksp_atol 1.0e-10
    --petsc.ksp_rtol
    --petsc.pc_type fieldsplit
    --petsc.pc_fieldsplit_type schur
    --petsc.pc_fieldsplit_schur_factorization_type full
    --petsc.pc_fieldsplit_schur_precondition user
    --petsc.fieldsplit_0_ksp_type richardson
    --petsc.fieldsplit_0_ksp_max_it 1
    --petsc.fieldsplit_0_pc_type lu
    --petsc.fieldsplit_0_pc_factor_mat_solver_package mumps
    --petsc.fieldsplit_1_ksp_type bcgs
    --petsc.fieldsplit_1_ksp_rtol 1.0e-10
    --petsc.fieldsplit_1_ksp_monitor_true_residual
    --petsc.fieldsplit_1_pc_type lu
    --petsc.fieldsplit_1_pc_factor_mat_solver_package mumps
108 HPC 06 Challenge!
Solve the Stokes equations with $\nu = 1/100$ on the dolphin.xml mesh, with boundary conditions
  $u = (0, 0)$ on $\partial\Omega_0$,
  $u = (-\sin \pi y, 0)$ on $\partial\Omega_1$,
  $\nu \nabla u \cdot n = p n$ on $\partial\Omega_2$,
with colours taken from dolphin_subdomains.xml.
  0. Discretise the equation with a stable finite element pair. Integrate both terms in the momentum equation by parts.
  1. Solve the problem with LU (UMFPACK/MUMPS).
  2. Implement the fieldsplit preconditioner with ideal inner solvers (LU).
  3. Now replace the inner solvers with Krylov solvers (CG/ML/5 for A, BCGS/HYPRE/5 for S).
  4. What configuration is fastest? full with strong inner solvers? diag with weak inner solvers?
109 Solving PDEs on Supercomputers VII: PDE-constrained optimisation Patrick Farrell MMSC: Python in Scientific Computing May 17, 2015 P. E. Farrell (Oxford) SPS 7 May 17, / 9
111 The mother problem
Consider again the mother problem of PDE-constrained optimisation:
  $\min_{y,u} \; \frac{1}{2} \int_\Omega (y - y_d)^2 \, dx + \frac{\beta}{2} \int_\Omega u^2 \, dx$
subject to
  $-\nabla^2 y = u$ in $\Omega$,
  $y = 0$ on $\partial\Omega$.
We form the Lagrangian:
  $L(y, u, \lambda) = \frac{1}{2} \int_\Omega (y - y_d)^2 \, dx + \frac{\beta}{2} \int_\Omega u^2 \, dx + \int_\Omega \nabla\lambda \cdot \nabla y - \lambda u \, dx$.
113 The optimality conditions
Taking the optimality conditions yields the system: find $(y, u, \lambda) \in H_0^1 \times L^2 \times H_0^1$ such that
  $\int_\Omega \bar{y}(y - y_d) + \int_\Omega \nabla\lambda \cdot \nabla\bar{y} = 0$,
  $\beta \int_\Omega \bar{u} u - \int_\Omega \lambda \bar{u} = 0$,
  $\int_\Omega \nabla\bar{\lambda} \cdot \nabla y - \int_\Omega \bar{\lambda} u = 0$
for all test functions $(\bar{y}, \bar{u}, \bar{\lambda})$.
On discretisation, this yields the system
  $\begin{pmatrix} M & 0 & K \\ 0 & \beta M & -M \\ K & -M & 0 \end{pmatrix} \begin{pmatrix} y \\ u \\ \lambda \end{pmatrix} = \begin{pmatrix} z \\ 0 \\ 0 \end{pmatrix}$.
Ingredients of a fieldsplit

Remember, to fieldsplit you need two things:
1. A diagonal block you can cheaply invert
2. A Schur complement you can cheaply approximate

If we take $A = \begin{pmatrix} M & 0 \\ 0 & \beta M \end{pmatrix}$, the first is satisfied. How about the Schur complement? Calculating, we find

\[ S = K M^{-1} K + \frac{1}{\beta} M. \]

Bad news: approximating the inverse of sums is hard.

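The calculation behind $S$ is a single block elimination. As a sketch, assuming the constraint block is $\begin{pmatrix} K & -M \end{pmatrix}$ (the third block row of the discretised optimality system) and that $K$ and $M$ are symmetric:

```latex
\[
\begin{pmatrix} K & -M \end{pmatrix}
\begin{pmatrix} M & 0 \\ 0 & \beta M \end{pmatrix}^{-1}
\begin{pmatrix} K \\ -M \end{pmatrix}
= K M^{-1} K + M (\beta M)^{-1} M
= K M^{-1} K + \frac{1}{\beta} M .
\]
```

Strictly, the Schur complement of the saddle-point matrix is the negative of this expression; the sign is a matter of convention and does not affect how $S^{-1}$ is approximated.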
Two approaches

Approach one: ignore one of the terms (Rees, Dollar, Wathen 2010).

\[ S = K M^{-1} K + \frac{1}{\beta} M \approx K M^{-1} K \]

with inverse $\hat{S}^{-1} = K^{-1} M K^{-1}$.

Approach two: approximate the sum with a product (Pearson and Wathen, 2012).

\[ S = \left( K + \frac{1}{\sqrt{\beta}} M \right) M^{-1} \left( K + \frac{1}{\sqrt{\beta}} M \right) - \frac{2}{\sqrt{\beta}} K \approx \left( K + \frac{1}{\sqrt{\beta}} M \right) M^{-1} \left( K + \frac{1}{\sqrt{\beta}} M \right) \]

with inverse $\hat{S}^{-1} = \hat{K}^{-1} M \hat{K}^{-1}$, where $\hat{K} = K + \frac{1}{\sqrt{\beta}} M$.

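The difference between the two approximations shows up in the spectrum of $\hat{S}^{-1} S$. The sketch below compares them on a hypothetical 1D P1 discretisation with dense NumPy matrices; `p1_matrices` and `preconditioned_spectrum` are illustrative helpers, and the mesh size and values of $\beta$ are arbitrary.

```python
import numpy as np

def p1_matrices(n):
    """P1 mass and stiffness matrices on [0, 1] with n interior nodes."""
    h = 1.0 / (n + 1)
    off = np.eye(n, k=1) + np.eye(n, k=-1)
    M = h / 6.0 * (4.0 * np.eye(n) + off)
    K = (2.0 * np.eye(n) - off) / h
    return M, K

def preconditioned_spectrum(S_hat, S):
    """Sorted eigenvalues of S_hat^{-1} S (real, since both are SPD)."""
    return np.sort(np.linalg.eigvals(np.linalg.solve(S_hat, S)).real)

M, K = p1_matrices(15)
KMK = K @ np.linalg.solve(M, K)          # approach one: drop the M / beta term
results = {}
for beta in (1e-2, 1e-4, 1e-6, 1e-8):
    S = KMK + M / beta
    K_hat = K + M / np.sqrt(beta)
    S_two = K_hat @ np.linalg.solve(M, K_hat)   # approach two: product form
    one = preconditioned_spectrum(KMK, S)
    two = preconditioned_spectrum(S_two, S)
    results[beta] = (one, two)
    print(f"beta={beta:.0e}  approach one: [{one[0]:.2f}, {one[-1]:.2e}]  "
          f"approach two: [{two[0]:.3f}, {two[-1]:.3f}]")
```

For approach two the eigenvalues have the form $(\mu^2 + 1/\beta) / (\mu + 1/\sqrt{\beta})^2$ with $\mu > 0$, so they stay in $[1/2, 1)$ for every $\beta$, while the spectrum for approach one spreads out as $\beta \to 0$.
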
Coding tools

No need to pass index sets with scalar fields:

    """
    --petsc.pc_fieldsplit_0_fields 0,1
    --petsc.pc_fieldsplit_1_fields 2
    """

You do need index sets to extract submatrices:

    trial = split(TrialFunction(Z))[0]
    test = split(TestFunction(Z))[0]
    bc = DirichletBC(Z.sub(0), 0.0, "on_boundary")
    mass_full = assemble(inner(trial, test)*dx)
    bc.apply(mass_full)
    ...
    mass_mat = mass_fmat.getSubMatrix(is_0, is_0)

Coding tools

Creating a KSP to handle the solve:

    from petsc4py import PETSc

    ksp_kbm = PETSc.KSP()
    ksp_kbm.create()
    ksp_kbm.setType("richardson")
    ksp_kbm.pc.setType("lu")
    ksp_kbm.setOperators(kbm)
    ksp_kbm.setOptionsPrefix("fieldsplit_1_kbm_")
    ksp_kbm.setFromOptions()
    ksp_kbm.setUp()

Coding tools

Using an approximate inverse action with PCMAT:

    """
    --petsc.fieldsplit_1_pc_type mat
    """

Configuring a shell matrix:

    class SchurInv(object):
        def mult(self, mat, x, y):
            # Apply two solves with a mass-matrix multiplication in between
            ksp_kbm.solve(x, tmp1)
            mass.mult(tmp1, tmp2)
            ksp_kbm.solve(tmp2, y)

    schur = PETSc.Mat()
    schur.createPython(mass.getSizes(), SchurInv())
    schur.setUp()

HPC 07 Challenge!

Solve the mother problem on $\Omega = [0, 1]^2$ with

\[ y_d(x, y) = \begin{cases} 1 & \text{if } (x, y) \in [0, 0.5]^2, \\ 0 & \text{otherwise,} \end{cases} \]

and homogeneous Dirichlet boundary conditions.

0. Discretise the equation with $[P_1]^3$.
1. Solve the problem with LU.
2. Implement the two fieldsplit preconditioners with ideal inner solvers.
3. Which performs best as $\beta \to 0$?
4. Now choose scalable inner solvers.
5. Which configuration is fastest on the machine?

Solving PDEs on Supercomputers VIII: advanced nonlinear solvers
Patrick Farrell
MMSC: Python in Scientific Computing
May 18, 2015

Globalisation of Newton's method

Consider again the p-Laplace equation

\[ -\nabla \cdot (\gamma(u) \nabla u) = f \text{ in } \Omega, \qquad u = g \text{ on } \partial\Omega, \]

where $\gamma(u) = (\epsilon^2 + \tfrac{1}{2} |\nabla u|^2)^{(p-2)/2}$. The configuration we considered ($p = 5$) took 121 iterations to converge. Why?

Newton steps near singular Jacobians

Recall that at our initial guess $u = 0$, our Jacobian is nearly singular. If

\[ J = U \Sigma V^T, \]

then

\[ J^{-1} = V \Sigma^{-1} U^T, \]

and if $\sigma_{\min} \approx 0$, then the Newton update

\[ \delta u = -J^{-1} F \]

is enormous, because $\Sigma^{-1}$ contains $1/\sigma_{\min}$. This explains

    0 SNES Function norm e-02
    1 SNES Function norm e+56
    2 SNES Function norm e+56

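The effect of a small singular value on the step is easy to reproduce. This is a small sketch with a synthetic $2 \times 2$ Jacobian built from its SVD; the rotation angles and the residual `F` are arbitrary illustrative values, not taken from the p-Laplace problem.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix, used here as an orthogonal factor of the SVD."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

U, V = rotation(0.3), rotation(1.1)
F = np.array([1e-2, 1e-2])               # a residual of modest size

step_norms = []
for sigma_min in (1e-1, 1e-6, 1e-12):
    J = U @ np.diag([1.0, sigma_min]) @ V.T   # J = U Sigma V^T
    du = -np.linalg.solve(J, F)               # Newton update
    step_norms.append(np.linalg.norm(du))
    print(f"sigma_min={sigma_min:.0e}  |du|={step_norms[-1]:.3e}")
```

The step length scales like $1/\sigma_{\min}$ even though the residual itself is small, which is why an undamped step from a nearly singular initial guess can land far from the solution.
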
Responses

A few possible responses:
1. Start with a better initial guess (continuation)
2. Regularise further (undesirable)
3. Take a smaller step (damping with $\alpha \le 1$)!

[Figures: Newton fractals for $z^3 - 1 = 0$ for a sequence of damping parameters, including $\alpha = 1$, $\alpha = 0.5$ and $\alpha = 0.1$.]

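The iteration behind the fractal pictures is short enough to write out. A minimal sketch of damped Newton for $z^3 - 1 = 0$ follows; the function name, tolerance and starting point are illustrative assumptions.

```python
import cmath

# The three cube roots of unity.
ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

def damped_newton(z, alpha, tol=1e-10, max_it=10000):
    """Damped Newton for z^3 - 1 = 0.

    Returns (index of the root reached, iterations used), or (None, max_it)
    if the residual never drops below tol."""
    for it in range(max_it):
        f = z**3 - 1
        if abs(f) < tol:
            root = min(range(3), key=lambda k: abs(z - ROOTS[k]))
            return root, it
        z = z - alpha * f / (3 * z**2)    # step scaled by alpha
    return None, max_it

for alpha in (1.0, 0.5, 0.1):
    root, its = damped_newton(2 + 0j, alpha)
    print(f"alpha={alpha}: root index {root} after {its} iterations")
```

Colouring each starting point in the complex plane by the returned root index gives the fractal. Smaller $\alpha$ costs more iterations per starting point, since the damped iteration converges only linearly near a root.
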
Linesearch schemes in PETSc

Backtracking linesearch (bt)

Finds the minimum of a polynomial fit to the $l_2$ norm in $[0, 1]$. Demands monotonic and sufficient decrease. If decrease is insufficient, the interval is reduced.

Good for: convex problems, occasional near-singular Jacobians.
Bad for: nonconvex problems where the residual must increase before convergence.

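The idea of demanding sufficient decrease can be illustrated on a scalar problem. The sketch below is not PETSc's bt scheme: it replaces the polynomial fit with plain step halving, and the function names, constants and test problem ($\arctan x = 0$ from $x_0 = 3$) are illustrative assumptions.

```python
import math

def newton(F, dF, x, linesearch, tol=1e-12, max_it=50):
    """Scalar Newton iteration with a pluggable linesearch."""
    for it in range(max_it):
        r = F(x)
        if abs(r) < tol:
            return x, it
        dx = -r / dF(x)
        x = x + linesearch(F, x, dx) * dx
    return x, max_it

def basic(F, x, dx):
    """No linesearch: always take the full Newton step."""
    return 1.0

def backtrack(F, x, dx, c=1e-4, shrink=0.5, min_alpha=1e-8):
    """Halve alpha until |F| decreases sufficiently (simplified backtracking)."""
    alpha, r0 = 1.0, abs(F(x))
    while abs(F(x + alpha * dx)) > (1 - c * alpha) * r0 and alpha > min_alpha:
        alpha *= shrink
    return alpha

F = math.atan
dF = lambda x: 1.0 / (1.0 + x * x)

# Only a few undamped iterations: the iterates grow extremely fast.
x_basic, _ = newton(F, dF, 3.0, basic, max_it=6)
x_bt, its_bt = newton(F, dF, 3.0, backtrack)
print("undamped after 6 steps:", x_basic)
print("backtracking:", x_bt, "in", its_bt, "iterations")
```

From this starting point the full step overshoots and the undamped iteration diverges, while the backtracked iteration shortens only the first step and then converges with full steps.
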
Linesearch schemes in PETSc

Critical point linesearch (cp)

Many PDEs have an energy function to be minimised. Suppose $F(u)$ is the gradient of some (unknown) $E(u)$. $E(u + \alpha \, du)$ can be minimised by looking for roots of $du^T F(u + \alpha \, du) = 0$ with a secant method.

Good for: problems with an energy functional.

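The secant iteration on $g(\alpha) = du^T F(u + \alpha \, du)$ can be sketched in a few lines. The energy $E$, the search direction and the name `cp_linesearch` below are illustrative assumptions; $E$ appears only to check the result, since the linesearch itself never evaluates it.

```python
def F(u, b):
    """Gradient of E(u) = sum(u_i^4 / 4 + u_i^2 / 2 - b_i u_i)."""
    return [ui**3 + ui - bi for ui, bi in zip(u, b)]

def E(u, b):
    """Energy, used here only to verify that the step reduces it."""
    return sum(ui**4 / 4 + ui**2 / 2 - bi * ui for ui, bi in zip(u, b))

def cp_linesearch(F, u, du, b, tol=1e-12, max_it=50):
    """Secant iteration for a root of g(alpha) = du . F(u + alpha du)."""
    def g(alpha):
        trial = [ui + alpha * di for ui, di in zip(u, du)]
        return sum(di * fi for di, fi in zip(du, F(trial, b)))

    a0, a1 = 0.0, 1.0
    g0, g1 = g(a0), g(a1)
    for _ in range(max_it):
        if abs(g1) < tol or g1 == g0:
            break
        # Secant update, computed from the previous two iterates.
        a0, a1, g0 = a1, a1 - g1 * (a1 - a0) / (g1 - g0), g1
        g1 = g(a1)
    return a1

u, b = [0.0, 0.0], [2.0, 2.0]
du = [-fi for fi in F(u, b)]              # steepest-descent direction
alpha = cp_linesearch(F, u, du, b)
u_new = [ui + alpha * di for ui, di in zip(u, du)]
print("alpha =", alpha, " E before =", E(u, b), " E after =", E(u_new, b))
```

For this example $g(\alpha) = 32\alpha^3 + 8\alpha - 8$, whose root is $\alpha = 1/2$, so the secant iteration can be checked against the exact step length.
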
Linesearch schemes in PETSc

Affine-covariant linesearch (nleqerr)

Undamped Newton's method is affine covariant. This observation fundamentally changes convergence theorems for Newton (Deuflhard, 2011). Convergence criteria are expressed in terms of affine-covariant Lipschitz constants. This linesearch estimates these constants and uses them to decide step lengths.

Good for: problems where you can start within singular manifolds; the hardest nonlinear problems.

Nonlinear preconditioning

For a linear problem $Ax = b$ we apply an approximate solver $P^{-1}$ on the left:

\[ P^{-1} A x = P^{-1} b. \]

Write one step of a nonlinear solver for $F(x) = b$ as

\[ x_{i+1} = N(F, x_i, b). \]

Nonlinear preconditioning

In nonlinear left preconditioning, we define a new residual

\[ R(x) = x - N(F, x, b) \]

and apply an outer nonlinear solver to $R$.

In the linear case this is equivalent: taking the inner solver to be one preconditioned Richardson step, $N(F, x, b) = x - P^{-1}(Ax - b)$, we get

\[ R(x) = x - N(F, x, b) = x + P^{-1}(Ax - b) - x = P^{-1}(Ax - b). \]

Can accelerate an inner solver with an outer solver!

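The linear equivalence can be checked numerically. The sketch below assumes a small hypothetical $2 \times 2$ system and a Jacobi preconditioner for $P$; the names `N` and `R` mirror the notation of the slides.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
P_inv = np.diag(1.0 / np.diag(A))        # Jacobi preconditioner (illustrative)

def N(x):
    """One preconditioned Richardson step for Ax = b."""
    return x - P_inv @ (A @ x - b)

def R(x):
    """Left-preconditioned residual R(x) = x - N(F, x, b)."""
    return x - N(x)

x = np.array([0.3, -0.7])
x_star = np.linalg.solve(A, b)

print("R(x)             =", R(x))
print("P^{-1} (A x - b) =", P_inv @ (A @ x - b))
print("R at the solution =", R(x_star))
```

The roots of $R$ are exactly the fixed points of the inner solver, so any outer solver applied to $R$ targets the same solution as the original problem.
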
Examples of nonlinear preconditioning

Hyperelasticity (Brune et al, 2013). Inner solver: Newton. Outer solver: nonlinear conjugate gradients.

High-Reynolds number Navier–Stokes (Cai and Keyes, 2002). Inner solver: nonlinear additive Schwarz. Outer solver: Newton–Krylov.

High-Prandtl number Navier–Stokes (Brune et al, 2013). Inner solver: nonlinear multigrid. Outer solver: nonlinear GMRES.

Nonlinear preconditioning: a remark

The design space for nonlinear solvers is vast. At the moment we have very little theory to guide us. There are very large potential gains, however.

Nonlinear multigrid

The main bottleneck for massive problems is the linear system. What if we didn't have to solve (large) linear systems? FAS uses fine-grid residuals to correct coarse-grid equations.

Full Approximation Scheme (FAS)

Given:
    a problem $(F^h, x^h, b^h)$
    a smoother $S$ and coarse solver $M$
    restriction, prolongation and injection operators $R$, $P$ and $\hat{R}$.

while not converged:
    $x^h_s = S(F^h, x^h_i, b^h)$
    $x^H = \hat{R} x^h_s$
    $b^H = R[b^h - F^h(x^h_s)] + F^H(x^H)$
    $x^H_c = M(F^H, x^H, b^H)$
    $x^h_c = x^h_s + P[x^H_c - x^H]$
    $x^h_{i+1} = S(F^h, x^h_c, b^h)$

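The loop above can be sketched as a two-grid method in NumPy. Everything specific here is an assumption made for illustration: the model problem $-u'' + u^3 = f$ on $[0, 1]$ with finite differences, damped nonlinear Jacobi as the smoother $S$, dense Newton as the coarse solver $M$, full weighting for $R$, linear interpolation for $P$ and injection for $\hat{R}$.

```python
import numpy as np

def F(x):
    """Finite-difference operator for -u'' + u^3 with zero Dirichlet values."""
    h = 1.0 / (len(x) + 1)
    xp = np.pad(x, 1)
    return (2 * xp[1:-1] - xp[:-2] - xp[2:]) / h**2 + x**3

def smooth(x, b, sweeps=2, omega=2.0 / 3.0):
    """Damped nonlinear Jacobi: pointwise update with the diagonal of J."""
    h = 1.0 / (len(x) + 1)
    for _ in range(sweeps):
        x = x + omega * (b - F(x)) / (2.0 / h**2 + 3 * x**2)
    return x

def coarse_solve(x, b, tol=1e-10, max_it=20):
    """Newton with a dense direct solve on the coarse grid."""
    n, h = len(x), 1.0 / (len(x) + 1)
    L = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    for _ in range(max_it):
        r = b - F(x)
        if np.linalg.norm(r) < tol:
            break
        x = x + np.linalg.solve(L + np.diag(3 * x**2), r)
    return x

def restrict(r):
    """Full weighting; coarse node j sits at fine node 2j + 1."""
    return 0.25 * r[:-2:2] + 0.5 * r[1::2] + 0.25 * r[2::2]

def prolong(ec):
    """Linear interpolation from the coarse grid to the fine grid."""
    e, ecp = np.zeros(2 * len(ec) + 1), np.pad(ec, 1)
    e[1::2] = ec
    e[0::2] = 0.5 * (ecp[:-1] + ecp[1:])
    return e

def fas_cycle(x, b):
    xs = smooth(x, b)                    # x_s^h = S(F^h, x_i^h, b^h)
    xH = xs[1::2].copy()                 # x^H = R_hat x_s^h (injection)
    bH = restrict(b - F(xs)) + F(xH)     # b^H = R[b^h - F^h(x_s^h)] + F^H(x^H)
    xHc = coarse_solve(xH.copy(), bH)    # x_c^H = M(F^H, x^H, b^H)
    xc = xs + prolong(xHc - xH)          # x_c^h = x_s^h + P[x_c^H - x^H]
    return smooth(xc, b)                 # x_{i+1}^h = S(F^h, x_c^h, b^h)

n = 63
grid = np.linspace(0.0, 1.0, n + 2)[1:-1]
b = np.pi**2 * np.sin(np.pi * grid) + np.sin(np.pi * grid)**3
x = np.zeros(n)
norms = [np.linalg.norm(b - F(x))]
for _ in range(10):
    x = fas_cycle(x, b)
    norms.append(np.linalg.norm(b - F(x)))
print("residual norms:", ["%.2e" % r for r in norms])
```

No linear system is ever solved on the fine grid: the only direct solves happen on the coarse grid, where they are cheap. The right-hand side is chosen so that the continuous solution is $\sin(\pi x)$, which the discrete solution matches up to discretisation error.
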
Nonlinear multigrid

You can use a high-flop smoother on the fine grids, and Newton-LU on the coarse grids! (See the Firedrake Yamabe demo.)

HPC 08 Challenge!

Consider again the p-Laplace equation (FEniCS lecture III).
1. Investigate the performance of different linesearch schemes on the p-Laplace problem.
2. Using only the basic linesearch for the inner solver, accelerate the convergence of Newton's method by left-preconditioning with ncg/cp.
3. Now use the optimal inner linesearch to beat the unaccelerated solver.
4. Choose sensible Krylov solvers and scale the code on ARCUS.

Solving PDEs on Supercomputers IX: a final challenge
Patrick Farrell
MMSC: Python in Scientific Computing
May 17, 2015

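HPC 09 Challenge! (1/2)

Consider the Cahn–Hilliard equation

\[ \frac{\partial c}{\partial t} - \nabla \cdot \left( M \nabla \left( \frac{df}{dc} - \lambda \nabla^2 c \right) \right) = 0 \text{ in } \Omega, \]
\[ M \nabla \left( \frac{df}{dc} - \lambda \nabla^2 c \right) \cdot n = 0 \text{ on } \partial\Omega, \]
\[ M \lambda \nabla c \cdot n = 0 \text{ on } \partial\Omega, \]

where $c$ is the unknown field, $f(c) = 100 c^2 (c - 1)^2$, $n$ is the unit normal, and $M$ is a scalar parameter. To solve this with standard $C^0$ elements, write it as two coupled second-order problems.
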
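One common way to obtain two second-order problems is to introduce an auxiliary field for the bracketed term. As a sketch, with the symbol $\mu$ (often called the chemical potential) and the test functions $q$ and $v$ introduced here purely for illustration:

```latex
\[
\frac{\partial c}{\partial t} - \nabla \cdot (M \nabla \mu) = 0, \qquad
\mu - \frac{df}{dc} + \lambda \nabla^2 c = 0 \quad \text{in } \Omega .
\]
Multiplying by test functions $q$ and $v$ and integrating by parts, the two
boundary conditions make both boundary integrals vanish:
\[
\int_\Omega \frac{\partial c}{\partial t} \, q \, dx
  + \int_\Omega M \nabla \mu \cdot \nabla q \, dx = 0, \qquad
\int_\Omega \mu \, v \, dx - \int_\Omega \frac{df}{dc} \, v \, dx
  - \lambda \int_\Omega \nabla c \cdot \nabla v \, dx = 0,
\]
with $\dfrac{df}{dc} = 200 \, c \, (c - 1)(2c - 1)$ for $f(c) = 100 c^2 (c - 1)^2$.
```

Both equations now involve only first derivatives of $c$ and $\mu$ in weak form, so continuous piecewise-polynomial elements suffice for each field.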
HPC 09 Challenge! (2/2)

Discretise and solve the equation on $\Omega = [0, 1]^2$ for $M = 1$, $\lambda = 10^{-2}$, and initial condition

    class InitialConditions(Expression):
        def __init__(self):
            random.seed(2 + MPI.rank(mpi_comm_world()))
        def eval(self, values, x):
            values[0] = *(0.5 - random.random())
            values[1] = 0.0
        def value_shape(self):
            return (2,)

Make sure your scheme is at least second-order. Sensible values are $\Delta t = $ , $\theta = 0.5$. An excellent preconditioner is discussed in doi: /

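The role of $\theta = 0.5$ in reaching second order can be seen on a scalar model problem. The sketch below applies the $\theta$-method to $y' = -y$, which is only a stand-in for the Cahn–Hilliard time discretisation; the function names and step sizes are illustrative assumptions.

```python
import math

def theta_method(theta, dt, T=1.0):
    """Integrate y' = -y, y(0) = 1 up to time T with the theta-method:
    (y_new - y_old) / dt = -((1 - theta) * y_old + theta * y_new)."""
    y = 1.0
    for _ in range(round(T / dt)):
        y = y * (1 - (1 - theta) * dt) / (1 + theta * dt)
    return y

def observed_ratio(theta):
    """Error at T = 1 with dt = 0.1 divided by the error with dt = 0.05."""
    errors = [abs(theta_method(theta, dt) - math.exp(-1.0)) for dt in (0.1, 0.05)]
    return errors[0] / errors[1]

ratio_cn = observed_ratio(0.5)   # Crank-Nicolson
ratio_be = observed_ratio(1.0)   # backward Euler
print("theta = 0.5: error ratio", ratio_cn)
print("theta = 1.0: error ratio", ratio_be)
```

Halving the time step reduces the error by a factor of about four for $\theta = 0.5$ and about two for $\theta = 1$, which is the signature of second-order and first-order convergence respectively.
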
More informationBackground. Background. C. T. Kelley NC State University tim C. T. Kelley Background NCSU, Spring / 58
Background C. T. Kelley NC State University tim kelley@ncsu.edu C. T. Kelley Background NCSU, Spring 2012 1 / 58 Notation vectors, matrices, norms l 1 : max col sum... spectral radius scaled integral norms
More informationParallel Discontinuous Galerkin Method
Parallel Discontinuous Galerkin Method Yin Ki, NG The Chinese University of Hong Kong Aug 5, 2015 Mentors: Dr. Ohannes Karakashian, Dr. Kwai Wong Overview Project Goal Implement parallelization on Discontinuous
More informationPDE Solvers for Fluid Flow
PDE Solvers for Fluid Flow issues and algorithms for the Streaming Supercomputer Eran Guendelman February 5, 2002 Topics Equations for incompressible fluid flow 3 model PDEs: Hyperbolic, Elliptic, Parabolic
More informationANALYSIS OF AUGMENTED LAGRANGIAN-BASED PRECONDITIONERS FOR THE STEADY INCOMPRESSIBLE NAVIER-STOKES EQUATIONS
ANALYSIS OF AUGMENTED LAGRANGIAN-BASED PRECONDITIONERS FOR THE STEADY INCOMPRESSIBLE NAVIER-STOKES EQUATIONS MICHELE BENZI AND ZHEN WANG Abstract. We analyze a class of modified augmented Lagrangian-based
More informationAlgebraic Multigrid Methods for the Oseen Problem
Algebraic Multigrid Methods for the Oseen Problem Markus Wabro Joint work with: Walter Zulehner, Linz www.numa.uni-linz.ac.at This work has been supported by the Austrian Science Foundation Fonds zur Förderung
More informationComputational Linear Algebra
Computational Linear Algebra PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 4: Iterative Methods PD
More informationUniversität Dortmund UCHPC. Performance. Computing for Finite Element Simulations
technische universität dortmund Universität Dortmund fakultät für mathematik LS III (IAM) UCHPC UnConventional High Performance Computing for Finite Element Simulations S. Turek, Chr. Becker, S. Buijssen,
More informationAn advanced ILU preconditioner for the incompressible Navier-Stokes equations
An advanced ILU preconditioner for the incompressible Navier-Stokes equations M. ur Rehman C. Vuik A. Segal Delft Institute of Applied Mathematics, TU delft The Netherlands Computational Methods with Applications,
More informationINTRODUCTION TO MULTIGRID METHODS
INTRODUCTION TO MULTIGRID METHODS LONG CHEN 1. ALGEBRAIC EQUATION OF TWO POINT BOUNDARY VALUE PROBLEM We consider the discretization of Poisson equation in one dimension: (1) u = f, x (0, 1) u(0) = u(1)
More informationMULTIGRID METHODS FOR NONLINEAR PROBLEMS: AN OVERVIEW
MULTIGRID METHODS FOR NONLINEAR PROBLEMS: AN OVERVIEW VAN EMDEN HENSON CENTER FOR APPLIED SCIENTIFIC COMPUTING LAWRENCE LIVERMORE NATIONAL LABORATORY Abstract Since their early application to elliptic
More informationAn Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84
An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 Introduction Almost all numerical methods for solving PDEs will at some point be reduced to solving A
More informationModelling and implementation of algorithms in applied mathematics using MPI
Modelling and implementation of algorithms in applied mathematics using MPI Lecture 3: Linear Systems: Simple Iterative Methods and their parallelization, Programming MPI G. Rapin Brazil March 2011 Outline
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline
More informationGeometric Multigrid Methods
Geometric Multigrid Methods Susanne C. Brenner Department of Mathematics and Center for Computation & Technology Louisiana State University IMA Tutorial: Fast Solution Techniques November 28, 2010 Ideas
More informationA High-Performance Parallel Hybrid Method for Large Sparse Linear Systems
Outline A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems Azzam Haidar CERFACS, Toulouse joint work with Luc Giraud (N7-IRIT, France) and Layne Watson (Virginia Polytechnic Institute,
More informationOn domain decomposition preconditioners for finite element approximations of the Helmholtz equation using absorption
On domain decomposition preconditioners for finite element approximations of the Helmholtz equation using absorption Ivan Graham and Euan Spence (Bath, UK) Collaborations with: Paul Childs (Emerson Roxar,
More informationThe Removal of Critical Slowing Down. Lattice College of William and Mary
The Removal of Critical Slowing Down Lattice 2008 College of William and Mary Michael Clark Boston University James Brannick, Rich Brower, Tom Manteuffel, Steve McCormick, James Osborn, Claudio Rebbi 1
More informationA User Friendly Toolbox for Parallel PDE-Solvers
A User Friendly Toolbox for Parallel PDE-Solvers Gundolf Haase Institut for Mathematics and Scientific Computing Karl-Franzens University of Graz Manfred Liebmann Mathematics in Sciences Max-Planck-Institute
More informationOn nonlinear adaptivity with heterogeneity
On nonlinear adaptivity with heterogeneity Jed Brown jed@jedbrown.org (CU Boulder) Collaborators: Mark Adams (LBL), Matt Knepley (UChicago), Dave May (ETH), Laetitia Le Pourhiet (UPMC), Ravi Samtaney (KAUST)
More informationDomain decomposition on different levels of the Jacobi-Davidson method
hapter 5 Domain decomposition on different levels of the Jacobi-Davidson method Abstract Most computational work of Jacobi-Davidson [46], an iterative method suitable for computing solutions of large dimensional
More informationSOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA
1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization
More informationA Review of Preconditioning Techniques for Steady Incompressible Flow
Zeist 2009 p. 1/43 A Review of Preconditioning Techniques for Steady Incompressible Flow David Silvester School of Mathematics University of Manchester Zeist 2009 p. 2/43 PDEs Review : 1984 2005 Update
More informationAggregation-based algebraic multigrid
Aggregation-based algebraic multigrid from theory to fast solvers Yvan Notay Université Libre de Bruxelles Service de Métrologie Nucléaire CEMRACS, Marseille, July 18, 2012 Supported by the Belgian FNRS
More informationToward less synchronous composable multilevel methods for implicit multiphysics simulation
Toward less synchronous composable multilevel methods for implicit multiphysics simulation Jed Brown 1, Mark Adams 2, Peter Brune 1, Matt Knepley 3, Barry Smith 1 1 Mathematics and Computer Science Division,
More informationFINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION
FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros
More informationM.A. Botchev. September 5, 2014
Rome-Moscow school of Matrix Methods and Applied Linear Algebra 2014 A short introduction to Krylov subspaces for linear systems, matrix functions and inexact Newton methods. Plan and exercises. M.A. Botchev
More informationFine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning
Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology, USA SPPEXA Symposium TU München,
More informationMultigrid finite element methods on semi-structured triangular grids
XXI Congreso de Ecuaciones Diferenciales y Aplicaciones XI Congreso de Matemática Aplicada Ciudad Real, -5 septiembre 009 (pp. 8) Multigrid finite element methods on semi-structured triangular grids F.J.
More informationReview of matrices. Let m, n IN. A rectangle of numbers written like A =
Review of matrices Let m, n IN. A rectangle of numbers written like a 11 a 12... a 1n a 21 a 22... a 2n A =...... a m1 a m2... a mn where each a ij IR is called a matrix with m rows and n columns or an
More information