SPARSE SOLVERS FOR THE POISSON EQUATION. Margreet Nool, CWI, Multiscale Dynamics. November 9, 2015
1 SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015
2 OUTLINE OF THIS TALK
1 FISHPACK, LAPACK, PARDISO
2 SYSTEM OVERVIEW OF CARTESIUS
3 POISSON EQUATION
4 SOLVERS
5 EXAMPLE OF USE
6 CONCLUSIONS AND REMARKS
3 2D POISSON PROBLEM
[Figure: 2D Poisson problem solution at Cartesius, plotted against $n_x = n_y$. Curves: pardiso with 1, 12 and 24 threads; fishpack; lapack mkl.]
Results on 1 node of Cartesius with 24 cores.
LAPACK: fastest implementation on Cartesius.
PARDISO: shared-memory multiprocessing parallel direct sparse solver by Olaf Schenk [ 00-04], optimized for Intel®.
4 2D POISSON PROBLEM
[Figure: 2D Poisson problem accuracy, residu plotted against $n_x = n_y$. Curves: pardiso, fishpack, lapack.]
LAPACK: maximum problem size $n_x = n_y = 1300$.
FISHPACK: convergence up to problem size $n_x = n_y = 1400$.
PARDISO: maximum problem size $n_x = n_y = 5600$.
6 CLUSTER MACHINE
Cartesius, the Dutch supercomputer at SURFsara, is a cluster machine.
Node Type, Number, Cores, CPU, Clock, Memory:
thin E v3 2.6 GHz 64 GB
thin E v2 2.4 GHz 64 GB
fat E GHz 256 GB
gpu E v2 2.5 GHz 96 GB
40,960 cores GPUs: Pflop/s (peak performance)
117 TB memory (CPU + GPGPU)
Fat nodes have 4 times more memory than thin nodes, but are slower.
7 NODES AND CORES
A Cartesius node can have 24 or 32 cores.
Within a node: shared memory.
Over nodes: distributed memory.
Nodes can be configured in different ways.
[Figure: three configurations of 1 node with 8 cores: (a) 8 MPI processes, (b) 8 OpenMP threads, (c) 4 MPI processes.]
8 SOFTWARE: MKL LIBRARY
Intel® Math Kernel Library is a library of optimized math routines for science, engineering, and financial applications.
Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math.
The routines in MKL are hand-optimized specifically for Intel® processors.
Sparse solvers:
MKL PARDISO: Parallel Direct Sparse Solver interface
Parallel Direct Sparse Solver for Cluster Interface
Direct Sparse Solvers (DSS) (Interface Routines)
Iterative Sparse Solvers (based on Reverse Communication Interface)
9 SOFTWARE
Intel® Poisson solvers for a single node:
Two-dimensional Helmholtz problem on a Cartesian plane
Two-dimensional Poisson problem on a Cartesian plane
Two-dimensional Laplace problem on a Cartesian plane
Helmholtz problem on a sphere
Poisson problem on a sphere
Three-dimensional Helmholtz problem
Three-dimensional Poisson problem
Three-dimensional Laplace problem
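11 1D CELL CENTERED DIRICHLET BC
Hundsdorfer and Verwer: consider a cell-centered grid with nodes
\[ x_i = \left(i - \tfrac{1}{2}\right) h, \qquad i = 1, \ldots, M, \qquad h = 1/M. \]
For Dirichlet BC we need, in $x_0 = -\tfrac{1}{2}h$ and in $x_{M+1} = 1 + \tfrac{1}{2}h$, the virtual values $u_0$ and $u_{M+1}$, such that
\[ \tfrac{1}{2}(u_0 + u_1) = \gamma_0, \qquad \tfrac{1}{2}(u_M + u_{M+1}) = \gamma_M. \]
We obtain the following semi-discrete system
\[ u_1' = \frac{1}{h^2}\left(-3u_1 + u_2\right) + \frac{2}{h^2}\gamma_0, \]
\[ u_i' = \frac{1}{h^2}\left(u_{i-1} - 2u_i + u_{i+1}\right), \qquad 2 \le i \le M-1, \]
\[ u_M' = \frac{1}{h^2}\left(u_{M-1} - 3u_M\right) + \frac{2}{h^2}\gamma_M. \]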
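12 1D CELL CENTERED DIRICHLET BC
The 1D Poisson matrix $A$ of size $M \times M$ and the RHS vector $b$ are defined by
\[ A = \frac{1}{h^2}
\begin{pmatrix}
3 & -1 & & & \\
-1 & 2 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 2 & -1 \\
 & & & -1 & 3
\end{pmatrix},
\qquad
b = \begin{pmatrix}
b_1 + \frac{2}{h^2}\gamma_0 \\ b_2 \\ b_3 \\ \vdots \\ b_{M-1} \\ b_M + \frac{2}{h^2}\gamma_M
\end{pmatrix} \]
Here $A$ discretizes $-u_{xx}$, the sign convention that matches the plus signs in the correction of $b$.
Note: the Poisson matrix is symmetric positive definite.
Note: correction on the RHS vector $b$.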
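The 1D cell-centered system can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the talk: it assumes the sign convention $-u'' = f$ (positive diagonal), and the function names `assemble_1d`, `solve_tridiagonal` and `max_error`, as well as the test solution $u(x) = \sin(\pi x) + 1 + 2x$, are made up for this example.

```python
import math

def assemble_1d(M, f, gamma0, gammaM):
    """Cell-centered system for -u'' = f on [0, 1] with u(0) = gamma0, u(1) = gammaM.

    Returns (x, off, diag, rhs): cell centers, the common sub/super diagonal,
    the main diagonal and the corrected right-hand side.
    """
    h = 1.0 / M
    x = [(i - 0.5) * h for i in range(1, M + 1)]   # x_i = (i - 1/2) h
    diag = [2.0 / h**2] * M
    diag[0] = diag[-1] = 3.0 / h**2                # ghost values eliminated
    off = [-1.0 / h**2] * (M - 1)
    rhs = [f(xi) for xi in x]
    rhs[0] += 2.0 * gamma0 / h**2                  # boundary correction, first cell
    rhs[-1] += 2.0 * gammaM / h**2                 # boundary correction, last cell
    return x, off, diag, rhs

def solve_tridiagonal(off, diag, rhs):
    """Thomas algorithm for a symmetric tridiagonal matrix."""
    n = len(diag)
    d, b = diag[:], rhs[:]
    for i in range(1, n):                          # forward elimination
        w = off[i - 1] / d[i - 1]
        d[i] -= w * off[i - 1]
        b[i] -= w * b[i - 1]
    u = [0.0] * n
    u[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):                 # back substitution
        u[i] = (b[i] - off[i] * u[i + 1]) / d[i]
    return u

def max_error(M):
    """Max-norm error for the made-up test solution u = sin(pi x) + 1 + 2x."""
    exact = lambda t: math.sin(math.pi * t) + 1.0 + 2.0 * t
    f = lambda t: math.pi**2 * math.sin(math.pi * t)
    x, off, diag, rhs = assemble_1d(M, f, 1.0, 3.0)
    u = solve_tridiagonal(off, diag, rhs)
    return max(abs(ui - exact(xi)) for ui, xi in zip(u, x))

if __name__ == "__main__":
    for M in (16, 32, 64):
        print(M, max_error(M))
```

Doubling `M` should reduce the error by roughly a factor of four, as expected for a second-order scheme.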
13 2D AND 3D CELL CENTERED DIRICHLET BC
The 2D Poisson matrix $A$ of size $M^2 \times M^2$ for $M = 4$ is defined by
\[ A = \frac{1}{h^2} \left[\; \ldots \;\right] \]
For the 3D case we distinguish 3 diagonal parts
[ ] for cells on edges
[ ] for cells on surfaces
[ ] for inner cells
supplemented with 3 sub diagonals and 3 super diagonals.
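One way to picture the 2D matrix for $M = 4$ is as a Kronecker sum of two 1D matrices. The sketch below makes several assumptions that the slide does not confirm: the unknowns are ordered row by row, the factor $1/h^2$ is left out, and the 1D matrix has diagonal $(3, 2, \ldots, 2, 3)$ with off-diagonals $-1$ (positive-diagonal sign convention). The helper names are hypothetical.

```python
def poisson_1d(M):
    """1D cell-centered Dirichlet matrix, scaled by h^2 (assumed sign convention)."""
    T = [[0] * M for _ in range(M)]
    for i in range(M):
        T[i][i] = 2
        if i > 0:
            T[i][i - 1] = -1
        if i < M - 1:
            T[i][i + 1] = -1
    T[0][0] = T[M - 1][M - 1] = 3      # boundary cells
    return T

def identity(M):
    return [[1 if i == j else 0 for j in range(M)] for i in range(M)]

def kron(A, B):
    """Kronecker product of two square matrices stored as lists of lists."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def poisson_2d(M):
    """Kronecker sum I (x) T + T (x) I for a row-by-row ordering of the cells."""
    T, I = poisson_1d(M), identity(M)
    K1, K2 = kron(I, T), kron(T, I)
    n = M * M
    return [[K1[i][j] + K2[i][j] for j in range(n)] for i in range(n)]

A = poisson_2d(4)
diagonal = [A[k][k] for k in range(16)]
nonzeros = sum(1 for row in A for entry in row if entry != 0)

if __name__ == "__main__":
    for row in A:
        print(" ".join(f"{entry:2d}" for entry in row))
```

In this scaling the diagonal takes the value 6 in corner cells, 5 in other boundary cells and 4 in inner cells, which mirrors the distinction between cell types that the slide draws for the 3D case.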
15 POISSON SOLVER FOR LARGE 2D AND 3D SIMULATIONS
Poisson solvers:
PARDISO (MKL)
CLUSTER_SPARSE_SOLVER (MKL)
MUMPS Release 5.0.1
16 ANALYSIS, FACTORIZATION, SOLVE
To solve $A x = b$ we factorize $A$ into $A = L D L^T$.
For PARDISO, CLUSTER_SPARSE_SOLVER and MUMPS alike we can distinguish three main phases:
analysis and reordering
factorization
solution
Note 1: Each phase can be called independently (not for FISHPACK).
Note 2: Once the matrix has been factorized we may restrict ourselves to the solution phase.
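The benefit of Note 2 can be illustrated with a small dense $L D L^T$ sketch: the factorization is computed once and reused for every new right-hand side. This is only an illustration under simplifying assumptions (dense storage, no reordering, no pivoting); the sparse solvers named above do considerably more, and `ldlt_factorize` and `ldlt_solve` are hypothetical names.

```python
def ldlt_factorize(A):
    """Factorize a symmetric matrix as A = L D L^T (no pivoting, dense storage)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / D[j]
    return L, D

def ldlt_solve(L, D, b):
    """Solution phase only: reuse an existing factorization for a new b."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                         # forward substitution: L y = b
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    z = [y[i] / D[i] for i in range(n)]        # diagonal scaling: D z = y
    x = [0.0] * n
    for i in range(n - 1, -1, -1):             # backward substitution: L^T x = z
        x[i] = z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x

A = [[3.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 3.0]]
L, D = ldlt_factorize(A)                       # factorization phase, done once
x1 = ldlt_solve(L, D, [1.0, 0.0, 7.0])         # solution phase, first b
x2 = ldlt_solve(L, D, [2.0, 0.0, 2.0])         # solution phase, second b

if __name__ == "__main__":
    print(x1, x2)
```

In a time-stepping simulation where the matrix stays fixed and only the right-hand side changes, only the cheap `ldlt_solve` step has to be repeated.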
17 ANALYSIS, FACTORIZATION, SOLVE
Analysis phase
reordering of the matrix to reduce fill-in
choosing pivots using a selection criterion to preserve sparsity
matrix input distributions:
CRS for PARDISO and CLUSTER_SPARSE_SOLVER
centralized assembled matrix format for MUMPS
matrix only on host or distributed over processes
if desired an analysis report is made
18 ANALYSIS, FACTORIZATION, SOLVE
Factorization phase
most time consuming phase
most memory consuming phase
if desired a report about the factorization is made
pivot strategy
required only once?
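As an illustration of the CRS (compressed row storage) input, the sketch below stores the upper triangle of a small 1D Poisson matrix in three arrays. It rests on assumptions: the array names `values`, `columns` and `row_ptr` are hypothetical, indices are 0-based for readability (a Fortran-style interface would typically use 1-based indices), and storing only the upper triangle reflects the usual convention for symmetric matrices in direct solvers rather than anything stated on the slide.

```python
def upper_csr_poisson_1d(M):
    """Upper triangle of the 1D matrix with diagonal (3, 2, ..., 2, 3) and -1 off-diagonals."""
    values, columns, row_ptr = [], [], [0]
    for i in range(M):
        values.append(3.0 if i in (0, M - 1) else 2.0)   # diagonal entry
        columns.append(i)
        if i < M - 1:                                    # one entry right of the diagonal
            values.append(-1.0)
            columns.append(i + 1)
        row_ptr.append(len(values))                      # start of the next row
    return values, columns, row_ptr

def sym_matvec(values, columns, row_ptr, x):
    """y = A x when only the upper triangle of the symmetric A is stored."""
    y = [0.0] * len(x)
    for i in range(len(x)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = columns[k]
            y[i] += values[k] * x[j]
            if j != i:                                   # mirror the off-diagonal entry
                y[j] += values[k] * x[i]
    return y

values, columns, row_ptr = upper_csr_poisson_1d(4)
y = sym_matvec(values, columns, row_ptr, [1.0, 1.0, 1.0, 1.0])

if __name__ == "__main__":
    print(values, columns, row_ptr, y)
```

Row `i` occupies the slice `row_ptr[i]:row_ptr[i + 1]` of `values` and `columns`, so the storage grows with the number of nonzeros rather than with $M^2$.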
19 ANALYSIS, FACTORIZATION, SOLVE
Solution phase
Post-processing: iterative refinement
Error analysis
Compute $r = Ax - b$, then $\max_{i=1,\ldots,m} |r_i| < 1\mathrm{E}{-12}$.
Let $x_{cont}$ be the solution of the continuous problem, then
Residual: $\| x - x_{cont} \|_2$ or
Residual: $\max_{i=1,\ldots,m} |x(i) - x_{cont}(i)|$
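The checks on this slide can be written down directly. The sketch below is a plain-Python illustration with hypothetical function names and a made-up 2 by 2 example; only the tolerance 1E-12 is taken from the slide.

```python
import math

def residual_max_norm(A, x, b):
    """max_i |r_i| with r = A x - b, for a dense matrix stored as a list of rows."""
    return max(abs(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i)
               for row, b_i in zip(A, b))

def error_2norm(x, x_cont):
    """2-norm of the difference between discrete and continuous solution."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(x, x_cont)))

def error_max_norm(x, x_cont):
    """Max-norm of the difference between discrete and continuous solution."""
    return max(abs(a - c) for a, c in zip(x, x_cont))

# Made-up example data.
A = [[3.0, -1.0], [-1.0, 3.0]]
b = [2.0, 2.0]
x = [1.0, 1.0]                 # computed solution
x_cont = [1.0, 1.5]            # pretend values of the continuous solution

accepted = residual_max_norm(A, x, b) < 1e-12

if __name__ == "__main__":
    print(accepted, error_2norm(x, x_cont), error_max_norm(x, x_cont))
```

The residual measures how well the linear system is solved, while the two error norms also include the discretization error, so they can remain large even when the residual is tiny.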
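21 2D POISSON PROBLEM
Solve
\[ \Delta U(x, y) = \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) U(x, y) \]
using a 4-pt centered 2nd-order difference scheme.
2D POISSON PROBLEM WITH KNOWN SOLUTION
\[ U(x, y) = \exp\left( -C \left( (x - x_0)^2 + (y - y_0)^2 \right) \right) \]
\[ \Delta U(x, y) = \left( -4C + 4C^2 \left( (x - x_0)^2 + (y - y_0)^2 \right) \right) \exp\left( -C \left( (x - x_0)^2 + (y - y_0)^2 \right) \right) \]
on a uniform grid defined on $x \in [0, 1]$ and $y \in [0, 1]$, with $C \in \{\ldots, \ldots, 10^4, 10^6\}$ and $x_0 = y_0 = 0.5$.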
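The closed-form right-hand side can be sanity-checked against the centered difference stencil. In the sketch below the value $C = 10$, the step $h = 10^{-3}$ and the sample point $(0.6, 0.45)$ are arbitrary choices for illustration, and the function names are hypothetical.

```python
import math

def U(x, y, C, x0=0.5, y0=0.5):
    """Known solution: a Gaussian centered at (x0, y0)."""
    return math.exp(-C * ((x - x0) ** 2 + (y - y0) ** 2))

def laplace_U(x, y, C, x0=0.5, y0=0.5):
    """Analytic Laplacian of U, as given above."""
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return (-4.0 * C + 4.0 * C ** 2 * r2) * math.exp(-C * r2)

def stencil(x, y, C, h):
    """Centered second-order difference approximation using the four neighbours."""
    return (U(x - h, y, C) + U(x + h, y, C)
            + U(x, y - h, C) + U(x, y + h, C)
            - 4.0 * U(x, y, C)) / h ** 2

C = 10.0       # sample value chosen for this sketch only
h = 1.0e-3
exact = laplace_U(0.6, 0.45, C)
approx = stencil(0.6, 0.45, C, h)

if __name__ == "__main__":
    print(exact, approx, abs(exact - approx))
```

For large $C$ the Gaussian becomes very narrow, so a much finer grid is needed before the stencil resolves it.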
22 2D POISSON PROBLEM
[Figure: the known solution on the X-Y plane for the four values of C, panels (d) to (g); panels (f) and (g) show $C = 10^4$ and $C = 10^6$.]
23 2D POISSON PROBLEM
[Figure: four panels plotted against nx = ny, with one curve per value of C: (h) convergence: 2D Poisson problem accuracy, residu (2-norm); (i) reordering PARDISO: 2D reordering phase on 1 node; (j) factorization PARDISO: 2D factorize phase on 1 node; (k) solution PARDISO: 2D solution phase on 1 node.]
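24 3D POISSON PROBLEM
Solve
\[ \Delta U(x, y, z) = \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \right) U(x, y, z) \]
using a 6-pt centered 2nd-order difference scheme.
3D POISSON PROBLEM WITH KNOWN SOLUTION
\[ U(x, y, z) = \exp\left( -C \left( (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 \right) \right) \]
\[ \Delta U(x, y, z) = \left( -6C + 4C^2 \left( (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 \right) \right) \exp\left( -C \left( (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 \right) \right) \]
on a uniform grid defined on $x \in [0, 1]$, $y \in [0, 1]$ and $z \in [0, 1]$, with $C \in \{\ldots, \ldots, 10^4, 10^6\}$ and $x_0 = y_0 = z_0 = 0.5$.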
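Each second derivative of the Gaussian contributes $(-2C + 4C^2 (x - x_0)^2)\,U$, so the constant term of the Laplacian is $-2C$ times the dimension: $-4C$ in 2D and $-6C$ in 3D. The sketch below checks this numerically with the centered stencil over the six neighbours; $C = 10$, $h = 10^{-3}$ and the sample point are arbitrary choices, and all names are hypothetical.

```python
import math

X0 = Y0 = Z0 = 0.5

def r2(x, y, z):
    return (x - X0) ** 2 + (y - Y0) ** 2 + (z - Z0) ** 2

def U(x, y, z, C):
    """Known 3D solution: a Gaussian centered at (0.5, 0.5, 0.5)."""
    return math.exp(-C * r2(x, y, z))

def laplace_U(x, y, z, C, constant):
    """(constant * C + 4 C^2 r^2) * U; constant = -6 is the 3D value."""
    return (constant * C + 4.0 * C ** 2 * r2(x, y, z)) * U(x, y, z, C)

def stencil(x, y, z, C, h):
    """Centered second-order difference approximation using the six neighbours."""
    neighbours = (U(x - h, y, z, C) + U(x + h, y, z, C)
                  + U(x, y - h, z, C) + U(x, y + h, z, C)
                  + U(x, y, z - h, C) + U(x, y, z + h, C))
    return (neighbours - 6.0 * U(x, y, z, C)) / h ** 2

C, h = 10.0, 1.0e-3            # sample values chosen for this sketch only
point = (0.6, 0.45, 0.55)
approx = stencil(*point, C, h)
with_minus_6C = laplace_U(*point, C, -6.0)
with_minus_4C = laplace_U(*point, C, -4.0)   # the 2D constant, for comparison

if __name__ == "__main__":
    print(approx, with_minus_6C, with_minus_4C)
```

The stencil value agrees with the $-6C$ expression to within the discretization error and differs from the $-4C$ variant by about $2C\,U$.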
25 3D POISSON PROBLEM CLUSTER_SPARSE_SOLVER
[Figure: four panels for CLUSTER_SPARSE_SOLVER with 12 cores: (l) convergence: 3D Poisson problem accuracy, residu (max norm); (m) reordering: 3D reordering phase; (n) factorization: 3D factorize phase; (o) solution: 3D solution phase.]
26 3D POISSON PROBLEM CLUSTER_SPARSE_SOLVER
[Figure: 3D reordering, factorize and solution phases of CLUSTER_SPARSE_SOLVER with 12 cores, panels (p) reordering, (q) factorization, (r) solution, and with 24 cores, panels (s) reordering, (t) factorization, (u) solution.]
FIGURE: Number of cores per node 12 (upper) and 24 (lower) figures
27 3D POISSON PROBLEM MUMPS
[Figure: 3D reordering, factorize and solution phases of MUMPS, panels (a) to (c) and (d) to (f): reordering, factorization, solution.]
FIGURE: Number of cores per node 12 (upper) and 24 (lower) figures
28 3D POISSON PROBLEM CLUSTER_SPARSE_SOLVER VERSUS MUMPS
[Figure: 3D reordering, factorize and solution phases of CLUSTER_SPARSE_SOLVER with 24 cores, panels (a) to (c), and of MUMPS, panels (d) to (f).]
FIGURE: CLUSTER_SPARSE_SOLVER (upper) versus MUMPS (lower) figures; number of cores per node 24
29 3D POISSON PROBLEM MUMPS
[Figure: speedup of the 3D reordering, factorize and solution phases of MUMPS, panels (a) reordering, (b) factorization, (c) solution.]
FIGURE: Speedup compared with 1 node
30 3D POISSON PROBLEM
Analysis report for 3D MUMPS on 64 nodes
[Table: columns $n_x$, N, NZ, operations, and memory in MBYTES for host, avg and total.]
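Speedup compared with 1 node is commonly computed as the ratio of the 1-node wall-clock time to the time on p nodes, with parallel efficiency as speedup divided by p. The sketch below uses that standard definition; the timings are invented for illustration and are not measurements from the talk.

```python
def speedup(t_one_node, t_p_nodes):
    """Speedup relative to a single node: T(1) / T(p)."""
    return t_one_node / t_p_nodes

def efficiency(t_one_node, t_p_nodes, p):
    """Parallel efficiency: speedup divided by the number of nodes."""
    return speedup(t_one_node, t_p_nodes) / p

# Hypothetical wall-clock times in seconds, keyed by number of nodes.
timings = {1: 80.0, 2: 44.0, 4: 25.0, 8: 16.0}

table = [(p, speedup(timings[1], t), efficiency(timings[1], t, p))
         for p, t in sorted(timings.items())]

if __name__ == "__main__":
    for p, s, e in table:
        print(f"{p:2d} nodes: speedup {s:5.2f}, efficiency {e:4.2f}")
```

Computing these numbers per phase (reordering, factorization, solution) shows which phase limits the scaling.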
32 CONCLUSIONS, REMARKS AND QUESTIONS
2D Poisson problems up to $n_x = n_y = 5400$ on a single node
2D Poisson problems up to $n_x = n_y = $ on 32 nodes
3D Poisson problems up to $n_x = n_y = n_z = 128$ on a single node
3D Poisson problems up to $n_x = n_y = n_z = 256$ on 64 nodes
MUMPS is very suitable for cluster machines
CLUSTER_SPARSE_SOLVER can handle larger problems than MUMPS
the solution phase of CLUSTER_SPARSE_SOLVER is slower than that of MUMPS
use MKL software where possible, also for MUMPS
parallelization with MUMPS or CLUSTER_SPARSE_SOLVER is NOT difficult
forget about FISHPACK:
it is no longer the fastest solver
results obtained by FISHPACK are not reliable
33 CONCLUSIONS, REMARKS AND QUESTIONS
Is it possible to accelerate Anna's code?
Is the 3D approach suitable for Anna?
More questions
More informationRecent advances in HPC with FreeFem++
Recent advances in HPC with FreeFem++ Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann Fourth workshop on FreeFem++ December 6, 2012 With F. Hecht, F. Nataf, C. Prud homme. Outline
More informationThe Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors
Aachen Institute for Advanced Study in Computational Engineering Science Preprint: AICES-2010/09-4 23/September/2010 The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors
More informationOn the design of parallel linear solvers for large scale problems
On the design of parallel linear solvers for large scale problems Journée problème de Poisson, IHP, Paris M. Faverge, P. Ramet M. Faverge Assistant Professor Bordeaux INP LaBRI Inria Bordeaux - Sud-Ouest
More informationPosition Papers of the 2013 Federated Conference on Computer Science and Information Systems pp
Position Papers of the 2013 Federated Conference on Computer Science and Information Systems pp. 27 32 Performance Evaluation of MPI/OpenMP Algorithm for 3D Time Dependent Problems Ivan Lirkov Institute
More informationOpportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem
Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Peter Benner, Andreas Marek, Carolin Penke August 16, 2018 ELSI Workshop 2018 Partners: The Problem The Bethe-Salpeter
More informationMassively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling
2019 Intel extreme Performance Users Group (IXPUG) meeting Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr)
More informationIMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM
IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM Bogdan OANCEA PhD, Associate Professor, Artife University, Bucharest, Romania E-mail: oanceab@ie.ase.ro Abstract:
More informationSOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA
1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization
More informationBinding Performance and Power of Dense Linear Algebra Operations
10th IEEE International Symposium on Parallel and Distributed Processing with Applications Binding Performance and Power of Dense Linear Algebra Operations Maria Barreda, Manuel F. Dolz, Rafael Mayo, Enrique
More informationParallelism in FreeFem++.
Parallelism in FreeFem++. Guy Atenekeng 1 Frederic Hecht 2 Laura Grigori 1 Jacques Morice 2 Frederic Nataf 2 1 INRIA, Saclay 2 University of Paris 6 Workshop on FreeFem++, 2009 Outline 1 Introduction Motivation
More informationPreliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools
Preliminary Results of GRAPES Helmholtz solver using GCR and PETSc tools Xiangjun Wu (1),Lilun Zhang (2),Junqiang Song (2) and Dehui Chen (1) (1) Center for Numerical Weather Prediction, CMA (2) School
More informationA parallel solver for incompressible fluid flows
Available online at www.sciencedirect.com Procedia Computer Science 00 (2013) 000 000 International Conference on Computational Science, ICCS 2013 A parallel solver for incompressible fluid flows Yushan
More informationInformation Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and
Accelerating the Multifrontal Method Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes {rflucas,genew,ddavis}@isi.edu and grimes@lstc.com 3D Finite Element
More informationRobust Preconditioned Conjugate Gradient for the GPU and Parallel Implementations
Robust Preconditioned Conjugate Gradient for the GPU and Parallel Implementations Rohit Gupta, Martin van Gijzen, Kees Vuik GPU Technology Conference 2012, San Jose CA. GPU Technology Conference 2012,
More informationMassively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem
Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Katharina Kormann 1 Klaus Reuter 2 Markus Rampp 2 Eric Sonnendrücker 1 1 Max Planck Institut für Plasmaphysik 2 Max Planck Computing
More informationSparse Principal Component Analysis via Alternating Maximization and Efficient Parallel Implementations
Sparse Principal Component Analysis via Alternating Maximization and Efficient Parallel Implementations Martin Takáč The University of Edinburgh Joint work with Peter Richtárik (Edinburgh University) Selin
More informationMassively parallel electronic structure calculations with Python software. Jussi Enkovaara Software Engineering CSC the finnish IT center for science
Massively parallel electronic structure calculations with Python software Jussi Enkovaara Software Engineering CSC the finnish IT center for science GPAW Software package for electronic structure calculations
More information