Accelerating Model Reduction of Large Linear Systems with Graphics Processors
P. Benner (1), P. Ezzatti (2), D. Kressner (3), E.S. Quintana-Ortí (4), Alfredo Remón (4)
(1) Max Planck Institute for Dynamics of Complex Technical Systems (Magdeburg, Germany).
(2) Centro de Cálculo-Inst. de la Computación, Univ. de la República (Montevideo, Uruguay).
(3) Seminar für Angewandte Mathematik, ETHZ (Zürich, Switzerland).
(4) Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I (Castellón, Spain).
ModRed 10 - December 2010
Contact: A. Remón, remon@uji.es
Why GPUs?
[Figure 1-1: Floating-Point Operations per Second and Memory Bandwidth for the CPU and GPU. Extracted from: CUDA C Programming Guide 3.1, NVIDIA Corporation.]
GPUs for general purpose programming
- GPUs were not developed for general programming: code generation was difficult and slow.
- CUDA: CUDA (Compute Unified Device Architecture) appeared in 2006. Created by NVIDIA, it is a hardware-software platform that facilitates the use of GPUs in general purpose programming.
  - Software: compilers (C, Fortran), libraries (CUFFT, CUBLAS, ...).
  - Hardware: efficient thread management and memory access.
- Gap between single and double precision performance.
- Fermi: the Fermi architecture appeared in 2010. Its double precision computations are only two times slower than its single precision computations.
- For a list of scientific CUDA applications visit
Outline
- Model reduction
- Model reduction via the BT method
- Matrix sign function method
- Numerical results
Model Reduction: Purpose
Given
\[
\begin{aligned}
E\dot{x}(t) &= A x(t) + B u(t), & t > 0, \quad x(0) = x^0, \\
y(t) &= C x(t) + D u(t), & t \ge 0,
\end{aligned}
\]
find a reduced model
\[
\begin{aligned}
E_r \dot{x}_r(t) &= A_r x_r(t) + B_r u(t), & t > 0, \quad x_r(0) = x_r^0, \\
y_r(t) &= C_r x_r(t) + D_r u(t), & t \ge 0,
\end{aligned}
\]
of order $r \ll n$ and output error
\[
y - y_r = G u - G_r u = (G - G_r) u
\]
such that $\|y - y_r\|$ and $\|G - G_r\|$ are small.
Model Reduction: Example
Optimal cooling of steel profiles
- Arises in a manufacturing method for steel profiles.
- Objective: reduce the temperature as fast as possible.
- Method: spraying of cooling fluids on the surface.
- Goal: material properties (durability, porosity) have to satisfy quality standards.
- Problem dimensions: n = 5177, m = 7, and p = 6.
- Math. model: STEEL I from the Oberwolfach benchmark collection.
- Model details: [Tröltzsch/Unger 1999/2001], [Penzl 1999] and [Saak 2003].
Balanced Truncation (BT) method
Procedure composed of three steps:
1. Solve the coupled generalized Lyapunov matrix equations
\[
A W_c E^T + E W_c A^T + B B^T = 0, \qquad
A^T \hat{W}_o E + E^T \hat{W}_o A + C^T C = 0,
\]
with $W_o = E^T \hat{W}_o E$, for $S$, $R$ such that $W_c = S^T S$, $W_o = R^T R$.
2. Compute the singular value decomposition
\[
S R^T = U \Sigma V^T =
\begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} \Sigma_1 & \\ & \Sigma_2 \end{bmatrix}
\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix},
\]
with $\Sigma_1 \in \mathbb{R}^{r \times r}$, $\Sigma_2 \in \mathbb{R}^{(n-r) \times (n-r)}$.
Balanced Truncation method (Cont.)
3. In the last stage
\[
T_l = \Sigma_1^{-1/2} V_1^T R \quad \text{and} \quad T_r = S^T U_1 \Sigma_1^{-1/2},
\]
and
\[
(A_r, B_r, C_r, D_r, E_r) = (T_l A T_r,\; T_l B,\; C T_r,\; D,\; T_l E T_r).
\]
The state-space dimension $r$ of the reduced-order model can be chosen adaptively, as this method provides a realization $G_r$ satisfying
\[
\|G - G_r\|_\infty \le 2 \sum_{j=r+1}^{n} \sigma_j .
\]
The most expensive computation is the solution of the generalized Lyapunov equations.
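The three BT steps can be sketched in NumPy for a small dense test problem. This is an illustration under stated assumptions, not the authors' code: it takes E = I, leaves out D (it passes through unchanged and cancels in the error), solves the Lyapunov equations with a dense Kronecker system instead of the sign function method, and builds the Gramian factors from an eigendecomposition. The helper names `gramian`, `factor`, `balanced_truncation` and `transfer` are hypothetical.

```python
import numpy as np

def gramian(A, Q):
    """Solve A W + W A^T + Q = 0 via a dense Kronecker system (small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)           # acts on the row-major vec(W)
    W = np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (W + W.T)                      # symmetrize against round-off

def factor(W):
    """Return F with F^T F = W for a symmetric positive semidefinite W."""
    lam, V = np.linalg.eigh(W)
    return np.sqrt(np.clip(lam, 0.0, None))[:, None] * V.T

def balanced_truncation(A, B, C, r):
    """Square-root BT for the standard case E = I (assumption of this sketch)."""
    S = factor(gramian(A, B @ B.T))             # step 1: W_c = S^T S
    R = factor(gramian(A.T, C.T @ C))           #         W_o = R^T R
    U, sigma, Vt = np.linalg.svd(S @ R.T)       # step 2: S R^T = U Sigma V^T
    scale = 1.0 / np.sqrt(sigma[:r])
    T_l = scale[:, None] * (Vt[:r] @ R)         # step 3: Sigma_1^{-1/2} V_1^T R
    T_r = (S.T @ U[:, :r]) * scale              #         S^T U_1 Sigma_1^{-1/2}
    return T_l @ A @ T_r, T_l @ B, C @ T_r, sigma, T_l, T_r

def transfer(A, B, C, w):
    """Evaluate G(jw) = C (jw I - A)^{-1} B."""
    return C @ np.linalg.solve(1j * w * np.eye(A.shape[0]) - A, B)

# Random stable test system (made-up data, not one of the benchmarks).
rng = np.random.default_rng(0)
n, m, p, r = 6, 2, 2, 3
M = rng.standard_normal((n, n))
A = M - (np.linalg.norm(M, 2) + 1.0) * np.eye(n)   # the shift makes A stable
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

A_r, B_r, C_r, sigma, T_l, T_r = balanced_truncation(A, B, C, r)

# Sampled error on the imaginary axis versus the bound 2 * sum of the tail.
worst = max(
    np.linalg.norm(transfer(A, B, C, w) - transfer(A_r, B_r, C_r, w), 2)
    for w in np.logspace(-2, 3, 200)
)
bound = 2.0 * sigma[r:].sum()
print(f"sampled error {worst:.3e} <= bound {bound:.3e}")
```

With E = I the projection satisfies T_l T_r = I_r, so E_r is the identity and is not returned. The Kronecker solve costs O(n^6) and is only meant to keep the sketch self-contained.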
Matrix sign function method
Remarks
- It is an efficient tool to solve stable Lyapunov equations.
- There are different schemes to compute the matrix sign function, such as the Newton iteration.
The Newton iteration for the matrix sign function:
\[
A_0 = A, \qquad A_{k+1} = \tfrac{1}{2}\left(A_k + A_k^{-1}\right).
\]
Main features:
- Simple.
- Efficient in parallel implementations.
- Asymptotically quadratic convergence.
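The unscaled Newton iteration above can be sketched as follows. The function name `matrix_sign_newton` and the stopping rule (relative change between iterates) are assumptions made for this illustration. For a stable matrix, all eigenvalues lie in the left half-plane and the iteration converges to minus the identity.

```python
import numpy as np

def matrix_sign_newton(A, tol=1e-12, max_iter=100):
    """Newton iteration A_{k+1} = (A_k + A_k^{-1}) / 2 for sign(A)."""
    Ak = np.array(A, dtype=float)
    for k in range(1, max_iter + 1):
        A_next = 0.5 * (Ak + np.linalg.inv(Ak))
        # Stop once the relative change between iterates is negligible.
        if np.linalg.norm(A_next - Ak, 1) <= tol * np.linalg.norm(A_next, 1):
            return A_next, k
        Ak = A_next
    raise RuntimeError("sign iteration did not converge")

# Stable example: both eigenvalues (-2 and -3) are negative.
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
sign_A, iterations = matrix_sign_newton(A)
print(sign_A, iterations)
```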
Matrix sign function method
Low-rank factors version of the algorithm
- On convergence after $j$ iterations, $W_c \approx \hat{S}^T \hat{S}$ and $W_o \approx \hat{R}^T \hat{R}$.
- Convergence can be accelerated using a scaling factor, in our case
\[
c_k = \sqrt{\|A_k\|_F \,/\, \|E A_k^{-1} E\|_F}.
\]
- Even if $A$ is sparse, the iterates $\{A_k\}_{k=1,2,\ldots}$ are in general dense matrices.
- Requires $O(n^3)$ floating-point operations per iteration.
- The most computationally expensive step is the matrix inversion.
Matrix sign function method
Low-rank factors version of the algorithm
Algorithm 1 CGCLNC
1: $A_0 = A$, $\hat{S}_0 = B^T$, $\hat{R}_0 = C$.
2: $k = 0$.
3: repeat
4: $A_{k+1} = \tfrac{1}{2}\left(A_k / c_k + c_k (E A_k^{-1}) E\right)$.
5: Compute the rank-revealing QR (RRQR) decomposition
\[
\frac{1}{\sqrt{2 c_k}} \begin{bmatrix} \hat{S}_k \\ c_k \hat{S}_k (E A_k^{-1})^T \end{bmatrix}
= Q_s \begin{bmatrix} U_s \\ 0 \end{bmatrix} \Pi_s
\]
6: $\hat{S}_{k+1} \leftarrow U_s \Pi_s$
7: Compute the rank-revealing QR (RRQR) decomposition
\[
\frac{1}{\sqrt{2 c_k}} \begin{bmatrix} \hat{R}_k \\ c_k (\hat{R}_k A_k^{-1}) E \end{bmatrix}
= Q_r \begin{bmatrix} U_r \\ 0 \end{bmatrix} \Pi_r
\]
8: $\hat{R}_{k+1} \leftarrow U_r \Pi_r$
9: $k = k + 1$
10: until $\|A_k + E\|_1 < \tau \|A_k\|_1$
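A dense NumPy sketch of the factored iteration follows, under several assumptions that are not taken from the slides. The compression step uses a truncated SVD in place of the RRQR (it preserves the same Gram matrix up to the truncation tolerance). The scaling factor is the square root of the ratio of Frobenius norms, and the stopping test measures the distance of A_k to -E. The final scaling by 1/sqrt(2) and by the inverse of E is derived by reducing the pencil to the standard case, and the test below checks it against the residuals of both Lyapunov equations. All function names are hypothetical.

```python
import numpy as np

def compress(F, rel_tol=1e-12):
    """Return a factor with fewer rows and (nearly) the same Gram matrix F^T F.
    Truncated SVD is used here as a stand-in for the RRQR of the algorithm."""
    _, s, Vt = np.linalg.svd(F, full_matrices=False)
    keep = s > rel_tol * s[0]
    return s[keep, None] * Vt[keep]

def sign_lyapunov_factors(A, E, B, C, tau=1e-10, max_iter=100):
    """Factored sign-function iteration for the coupled generalized Lyapunov
    equations; returns S, R_hat with W_c ~ S^T S and W_o_hat ~ R_hat^T R_hat."""
    Ak = np.array(A, dtype=float)
    S = np.array(B.T, dtype=float)
    R = np.array(C, dtype=float)
    for _ in range(max_iter):
        A_inv = np.linalg.inv(Ak)
        EA_inv = E @ A_inv
        EA_inv_E = EA_inv @ E
        c = np.sqrt(np.linalg.norm(Ak, "fro") / np.linalg.norm(EA_inv_E, "fro"))
        # Stack the old factor on top of the new contribution, then compress.
        S = compress(np.vstack([S, c * S @ EA_inv.T]) / np.sqrt(2.0 * c))
        R = compress(np.vstack([R, c * (R @ A_inv) @ E]) / np.sqrt(2.0 * c))
        Ak = 0.5 * (Ak / c + c * EA_inv_E)
        # For a stable pencil the iterates approach -E.
        if np.linalg.norm(Ak + E, 1) < tau * np.linalg.norm(Ak, 1):
            break
    else:
        raise RuntimeError("sign iteration did not converge")
    E_inv = np.linalg.inv(E)
    return S @ E_inv.T / np.sqrt(2.0), R @ E_inv / np.sqrt(2.0)

# Made-up stable test pencil: A has a negative definite symmetric part and
# E is symmetric positive definite, so the eigenvalues of E^{-1} A are stable.
rng = np.random.default_rng(1)
n, m, p = 8, 2, 3
M = rng.standard_normal((n, n))
A = M - (np.linalg.norm(M, 2) + 1.0) * np.eye(n)
G = rng.standard_normal((n, n))
E = np.eye(n) + G @ G.T / n
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

S, R_hat = sign_lyapunov_factors(A, E, B, C)
W_c = S.T @ S
W_o_hat = R_hat.T @ R_hat
res_c = A @ W_c @ E.T + E @ W_c @ A.T + B @ B.T
res_o = A.T @ W_o_hat @ E + E.T @ W_o_hat @ A + C.T @ C
print(np.linalg.norm(res_c), np.linalg.norm(res_o))
```

The explicit inverse mirrors the slides' remark that inversion dominates the cost. The observability Gramian of the BT method is then W_o = E^T W_o_hat E.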
Hybrid implementation
Computations performed at iteration $j$, distributed between CPU and GPU:
1. $P A_j = L U$ (*)
2. $E A_j^{-1}$; $\hat{R}_j A_j^{-1}$
3. $(E A_j^{-1}) E$
4. Compute the rank-revealing QR (RRQR) decomposition
\[
\frac{1}{\sqrt{2 c_j}} \begin{bmatrix} \hat{S}_j \\ c_j \hat{S}_j (E A_j^{-1})^T \end{bmatrix}
= Q_s \begin{bmatrix} U_s \\ 0 \end{bmatrix} \Pi_s
\]
5. $\hat{S}_{j+1} \leftarrow U_s \Pi_s$
6. Compute the rank-revealing QR (RRQR) decomposition
\[
\frac{1}{\sqrt{2 c_j}} \begin{bmatrix} \hat{R}_j \\ c_j (\hat{R}_j A_j^{-1}) E \end{bmatrix}
= Q_r \begin{bmatrix} U_r \\ 0 \end{bmatrix} \Pi_r
\]
7. $\hat{R}_{j+1} \leftarrow U_r \Pi_r$
8. $A_{j+1} = \tfrac{1}{2}\left(A_j / c_j + c_j (E A_j^{-1}) E\right)$
(*) CPU and GPU cooperate during this operation.
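Steps 1 to 3 can be illustrated on the CPU with NumPy alone. This sketch does not reproduce the CPU-GPU split of the slides. It only shows how one pivoted LU factorization (performed inside `np.linalg.solve`) can serve both products of step 2 without forming the inverse explicitly. The sketch factorizes the transpose of A rather than A itself, and the function name `apply_inverse_from_right` is hypothetical.

```python
import numpy as np

def apply_inverse_from_right(A, E, R):
    """Return (E A^{-1}, R A^{-1}) using a single LU-based solve."""
    stacked = np.vstack([E, R])
    # (Y A^{-1})^T = A^{-T} Y^T, so solve A^T X = Y^T and transpose back.
    X = np.linalg.solve(A.T, stacked.T).T
    return X[: E.shape[0]], X[E.shape[0]:]

# Made-up data: the diagonal shift keeps A safely nonsingular.
rng = np.random.default_rng(2)
n, p = 5, 2
A = rng.standard_normal((n, n)) - 10.0 * np.eye(n)
E = np.eye(n) + 0.1 * rng.standard_normal((n, n))
R = rng.standard_normal((p, n))

EA_inv, RA_inv = apply_inverse_from_right(A, E, R)   # step 2
EA_inv_E = EA_inv @ E                                # step 3
```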
Model reduction: Hardware and software
Hardware
- Platform consisting of two Intel Xeon QuadCore E5410 processors at 2.33 GHz, connected to an NVIDIA Tesla C1060 via a PCI-e bus.
Software
- LAPACK (CPU): all the computations are performed on the CPU using LAPACK and BLAS kernels (MKL v.10.2).
- Hybrid (CPU+GPU): each computation is executed on the most convenient architecture, minimizing communication (MKL v.10.2 + CUBLAS v.2.1).
Model reduction: Problem definition
Optimal cooling of steel profiles
- Model STEEL I from the Oberwolfach benchmark collection.
- Arises in a manufacturing method for steel profiles.
- The objective is to design a control that yields moderate temperature gradients when the rail is cooled down.
- The model corresponds to a 2-D heat equation.
- Dimensions of the problem: n = 5177, m = 7, p = 6.
- Math. model: [Tröltzsch/Unger 1999/2001], [Penzl 1999] and [Saak 2003].
Model reduction: Problem definition
Convective Thermal Flow Problems
- Model FLOW METER from the Oberwolfach benchmark collection.
- It is a 2-D model of an anemometer-like structure that mainly consists of a tube and a small heat source.
- The model is given by a spatially semi-discretized instationary convection-diffusion equation.
- The reference temperature is set to 300 K; Dirichlet boundary conditions and initial conditions are set to 0 with respect to the reference.
- Dimensions of the problem: n = 9669, m = 1, p = 5.
- Math. model: [Harper 1997], [Ernst 2001] and [Mossmann 2004].
Model reduction: Results for benchmark STEEL I
[Table: for each iteration k, time (s) of the hybrid implementation and of LAPACK, and the convergence criterion $\|A_k + E\|_F$.]
Total: time is reduced by 45%.
Problem dimensions: n = 5177, m = 7, p = 6.
Model reduction: Results for benchmark STEEL I (hybrid implementation)
[Table: for each iteration k, time (s) spent in $P A_k = LU$; in $E A_k^{-1}$, $\hat{R}_k A_k^{-1}$; in $(E A_k^{-1}) E$; in $\hat{S}_k (E A_k^{-1})^T$, $(\hat{R}_k A_k^{-1}) E$; and in the compression step; followed by the time per iteration and the accumulated time (s).]
Problem dimensions: n = 5177, m = 7, p = 6.
Model reduction: Results for benchmark FLOW METER
[Table: for each iteration k, time (s) of the hybrid implementation and of LAPACK, and the convergence criterion $\|A_k + E\|_F$.]
Total: time is reduced by 46%.
Problem dimensions: n = 9669, m = 1, p = 5.
Model reduction: Results for benchmark FLOW METER (hybrid implementation)
[Table: for each iteration k, time (s) spent in $P A_k = LU$; in $E A_k^{-1}$, $\hat{R}_k A_k^{-1}$; in $(E A_k^{-1}) E$; in $\hat{S}_k (E A_k^{-1})^T$, $(\hat{R}_k A_k^{-1}) E$; and in the compression step; followed by the time per iteration and the accumulated time (s).]
Problem dimensions: n = 9669, m = 1, p = 5.
Thanks. Any questions?
More informationLow Rank Solution of Data-Sparse Sylvester Equations
Low Ran Solution of Data-Sparse Sylvester Equations U. Baur e-mail: baur@math.tu-berlin.de, Phone: +49(0)3031479177, Fax: +49(0)3031479706, Technische Universität Berlin, Institut für Mathemati, Straße
More informationResearch on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method
NUCLEAR SCIENCE AND TECHNIQUES 25, 0501 (14) Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method XU Qi ( 徐琪 ), 1, YU Gang-Lin ( 余纲林 ), 1 WANG Kan ( 王侃 ),
More informationDynamic Scheduling within MAGMA
Dynamic Scheduling within MAGMA Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Mathieu Faverge, Julien Langou, Hatem Ltaief, Samuel Thibault and Stanimire Tomov April 5, 2012 Innovative and Computing
More informationModel reduction of coupled systems
Model reduction of coupled systems Tatjana Stykel Technische Universität Berlin ( joint work with Timo Reis, TU Kaiserslautern ) Model order reduction, coupled problems and optimization Lorentz Center,
More informationMAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors
MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors J. Dongarra, M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov University of Tennessee, Knoxville 05 / 03 / 2013 MAGMA:
More informationUsing AmgX to accelerate a PETSc-based immersed-boundary method code
29th International Conference on Parallel Computational Fluid Dynamics May 15-17, 2017; Glasgow, Scotland Using AmgX to accelerate a PETSc-based immersed-boundary method code Olivier Mesnard, Pi-Yueh Chuang,
More informationThe geometric mean algorithm
The geometric mean algorithm Rui Ralha Centro de Matemática Universidade do Minho 4710-057 Braga, Portugal email: r ralha@math.uminho.pt Abstract Bisection (of a real interval) is a well known algorithm
More informationBinding Performance and Power of Dense Linear Algebra Operations
10th IEEE International Symposium on Parallel and Distributed Processing with Applications Binding Performance and Power of Dense Linear Algebra Operations Maria Barreda, Manuel F. Dolz, Rafael Mayo, Enrique
More informationDirect Self-Consistent Field Computations on GPU Clusters
Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 27, 2015 Outline Linear regression Ridge regression and Lasso Time complexity (closed form solution) Iterative Solvers Regression Input: training
More informationTECHNISCHE UNIVERSITÄT BERLIN
TECHNISCHE UNIVERSITÄT BERLIN On best rank one approximation of tensors S. Friedland V. Mehrmann R. Pajarola S.K. Suter Preprint 2012/07 Preprint-Reihe des Instituts für Mathematik Technische Universität
More informationMultilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota
Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work
More informationCOMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD
XVIII International Conference on Water Resources CMWR 2010 J. Carrera (Ed) c CIMNE, Barcelona, 2010 COMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD James.E. McClure, Jan F. Prins
More informationStructure preserving Krylov-subspace methods for Lyapunov equations
Structure preserving Krylov-subspace methods for Lyapunov equations Matthias Bollhöfer, André Eppler Institute Computational Mathematics TU Braunschweig MoRePas Workshop, Münster September 17, 2009 System
More informationPower-Aware Execution of Sparse and Dense Linear Algebra Libraries
Power-Aware Execution of Sparse and Dense Linear Algebra Libraries Enrique S. Quintana-Ortí quintana@icc.uji.es Power-aware execution of linear algebra libraries 1 CECAM Lausanne, Sept. 2011 Motivation
More informationACCELERATING SPARSE CHOLESKY FACTORIZATION ON THE GPU
ACCELERATING SPARSE CHOLESKY FACTORIZATION ON THE GPU STEVE RENNICH, SR. ENGINEER, NVIDIA DEVELOPER TECHNOLOGY DARKO STOSIC, PHD CANDIDATE, UNIV. FEDERAL DE PERNAMBUCO TIM DAVIS, PROFESSOR, CSE, TEXAS
More informationS0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA
S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA Date: 16th May 2012 Wed, 3pm to 3.25pm(Adv. Session) Sathyanarayana K., Manish Banga, and Ravi Kumar G. V. V. Engineering Services,
More informationThe new challenges to Krylov subspace methods Yousef Saad Department of Computer Science and Engineering University of Minnesota
The new challenges to Krylov subspace methods Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM Applied Linear Algebra Valencia, June 18-22, 2012 Introduction Krylov
More informationSP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay
SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain
More informationLattice Boltzmann simulations on heterogeneous CPU-GPU clusters
Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts
More information