Accelerating Model Reduction of Large Linear Systems with Graphics Processors

Size: px
Start display at page:

Download "Accelerating Model Reduction of Large Linear Systems with Graphics Processors"

Transcription

1 Accelerating Model Reduction of Large Linear Systems with Graphics Processors P. Benner 1, P. Ezzatti 2, D. Kressner 3, E.S. Quintana-Ortí 4, Alfredo Remón 4 1 Max-Plank-Institute for Dynamics of Complex Technical Systems (Magdeburg, Germany). 2 Centro de Cálculo-Inst. de la Computación,Univ. de la República (Montevideo, Uruguay). 3 Seminar für Angewandte Mathematik, ETHZ (Zürich, Switzerland). 4 Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I (Castellón, Spain). ModRed 10 - December 2010 remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 1

2 Why GPUs? Chapter 1. Introduction Figure 1-1. Floating-Point Operations per Second and Memory Bandwidth for the CPU and GPU 2 CUDA C Programming Guide Version 3.2 Extracted from: CUDA C Programming Guide 3.1, NVIDIA Corporation remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 2

3 GPUs for general purpose programming GPUs where not developed for general programming Difficult and slow code generation A. Remón Model Reduction of Large Linear Systems with GPUs 3

4 GPUs for general purpose programming GPUs where not developed for general programming Difficult and slow code generation CUDA In 2006 appears CUDA (Computed Unified Device Architecture) Created by NVIDIA, is a HW-SW platform to facilitate the use of GPUs in general purpose programming Software: compilers (c, fortran), libraries (cufft, cublas,...) Hardware: efficient thread management and memory access remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 3

5 GPUs for general purpose programming GPUs where not developed for general programming Difficult and slow code generation CUDA Gap between single and double precision performance A. Remón Model Reduction of Large Linear Systems with GPUs 3

6 GPUs for general purpose programming GPUs where not developed for general programming Difficult and slow code generation CUDA Gap between single and double precision performance Fermi In 2010 appears the Fermi architecture The double precision computations are only two times slower than single precision computations A. Remón Model Reduction of Large Linear Systems with GPUs 3

7 GPUs for general purpose programming GPUs where not developed for general programming Difficult and slow code generation CUDA Gap between single and double precision performance Fermi For a list of scientific CUDA applications visit remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 3

8 Outline Model reduction Model reduction via the BT method Matrix sign function method Numerical results A. Remón Model Reduction of Large Linear Systems with GPUs 4

9 Model Reduction: Purpose Given Eẋ(t) = Ax(t) + Bu(t), t > 0, x(0) = x 0, find a reduced model y(t) = Cx(t) + Du(t), t 0, E r x r (t) = A r x r (t) + B r u(t), t > 0, x r (0) = x 0 r, y r (t) = C r x r (t) + D r u(t), t 0, of order r n and output error such that y y r = Gu G r u = (G G r )u y y r and G G r are small! remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 5

10 Model Reduction: Example Optimal cooling of steel profiles Arises in a manufacturing method for steel profiles. Objective: reduce the temperature as fast as possible. Method: spraying of cooling fluids on the surface. Goal: Material properties (durability, porosity) have to satisfy quality standards. Problem dimensions: n = 5, 177, m = 7, and p = 6. Math. model: STEEL I from the Oberwolfach benchmark collection Oberwolfach benchmark collection: Model details: [Tröltzsch/Unger 1999/2001], [Penzl 1999] and [Saak 2003]. remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 6

11 Outline Model reduction Model reduction via the BT method Matrix sign function method Numerical results A. Remón Model Reduction of Large Linear Systems with GPUs 7

12 Balanced Truncation (BT) method Procedure composed of three steps: 1. Solve the coupled generalized Lyapunov matrix equations AW c E T + EW c A T + BB T = 0, A T W o E + E T W o A + C T C = 0, with W 0 = E T W o E for S, R such that W c = S T S, W o = R T R. 2. Compute [ SR T = UΣV T Σ1 = [ U 1 U 2 ] Σ2 with Σ 1 R rxr, Σ 2 R (n r)x(n r). ] [ V T 1 V T 2 ], remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 8

13 Balanced Truncation method (Cont.) 3. In the last stage T l = Σ 1/2 1 V T 1 R and T r = S T U 1 Σ 1/2 1, and (A r, B r, C r, D r, E r ) = (T l AT r, T l B, CT r, D, T l ET r ). The state-space dimension r of the reducer-order model can be chosen adaptatively as this method provides a realization Ĝ satisfying G G r 2 n j=r+1 σ j. remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 9

14 Balanced Truncation method (Cont.) 3. In the last stage T l = Σ 1/2 1 V T 1 R and T r = S T U 1 Σ 1/2 1, and (A r, B r, C r, D r, E r ) = (T l AT r, T l B, CT r, D, T l ET r ). The most expensive computation is the solution of the generalized Lyapunov equations. remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 9

15 Outline Model reduction Model reduction via the BT method Matrix sign function method Numerical results A. Remón Model Reduction of Large Linear Systems with GPUs 10

16 Matrix sign function method Remarks It is an efficient tool to solve stable Lyapunov equations. There are different schemes to solve the matrix sign function, like the Newton iteration method. The Newton iteration method for the matrix sign function. A 0 = A, A k+1 = 1 2 (A k + A 1 k ), Main features: Simple. Efficient on parallel implementation. Asymptotic quadratic convergence. remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 11

17 Matrix sign function method Low-rank factors version of the algorithm On convergence after j iterations, W c S T S and Wo R T R. Convergence can be accelerated using a scaling factor, in our case: c k = A F / EA 1 k E F. Even if A is sparse, {A k } k=1,2,... in general are full dense matrices. Requires O(n 3 ) floating-point operations per iteration. The most computationally expensive step is the matrix inversion. remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 12

18 Matrix sign function method Low-rank factors version of the algorithm Algorithm 1 CGCLNC 1: A 0 = A, Ŝ0 = B T, ˆR 0 = C. 2: k = 0. 3: repeat 4: A k+1 = 1 2 ( Ak /c k + c k (EA k 1 )E ). 5: Compute the rank-revealing QR (RRQR) decomposition [ ] [ ] 1 Sk 2ck, c k Sk (EA 1 Us k )T = Q s Π 0 s 6: S k+1 U s Π s 7: Compute the rank-revealing QR (RRQR) decomposition [ ] [ ] 1 2ck R k, c k ( R k A 1 k )E Ur = Q r Π 0 r 8: R k+1 U r Π r 9: k = k : until A k E 1 < τ A k 1 remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 13

19 Hybrid implementation Computations performed at iteration j 1. PA j = LU * CPU GPU 2. EA j 1 ; Rk A j 1 3. (EA j 1 )E 4. Compute the rank-revealing QR (RRQR) decomposition [ ] [ ] 1 S j, c j S j (EA 1 2cj j ) T Us = Q s Π 0 s 5. S j+1 U s Π s 6. Compute the rank-revealing QR (RRQR) decomposition [ ] [ ] 1 Rj, c j ( R j A 1 Ur 2cj j )E = Q r Π 0 r 7. Rj+1 U r Π r 8. A j+1 = 1 2 ( Aj /c j + c j (EA j 1 )E ) * CPU and GPU cooperate during this operation remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 14

20 Outline Model reduction Model reduction via the BT method Matrix sign function method Numerical results A. Remón Model Reduction of Large Linear Systems with GPUs 15

21 Model reduction Hardware and software Hardware Platform consisting of two Intel Xeon QuadCore E5410 processors at 2.33GHz, connected to an Nvidia Tesla C1060 via a PCI-e bus. Software LAPACK(CPU): all the computations are performed on the CPU using LAPACK and BLAS kernels (MKL v.10.2). Hybrid(CPU+GPU): computations are executed on the most convenient architecture minimizing the communications. (MKL(v.10.2)+CUBLAS(v.2.1)) remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 16

22 Model reduction Problem definition: Optimal cooling of steel profiles Model STEEL I from the Oberwolfach benchmark collection Arises in a manufacturing method for steel profiles. The objective is to design a control that yields moderate temperature gradients when the rail is cooled down. The model corresponds to a 2-D heat equation. Dimensions of the problem: n = 5, 177, m = 7, p = 6 Math. model: [Tröltzsch/Unger 1999/2001], [Penzl 1999] and [Saak 2003]. Oberwolfach benchmark collection: remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 17

23 Model reduction Problem definition: Convective Thermal Flow Problems Model FLOW METER from the Oberwolfach benchmark collection Is a 2-D model of an anemometer-like structure. Mainly consists of a tube and a small heat source. The model is given by a spatially semi-discretized instationary convection difussion equation. The reference temperature is set to 300K and Dirichlet boundary conditions as well as initial conditions are set to 0 with respect to the reference. Dimensions of the problem: n = 9, 669, m = 1, p = 5 Math. model: [Harper 1997], [Ernst 2001] and [Mossmann 2004] Oberwolfach benchmark collection: remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 18

24 Model reduction Results for benchmark STEEL I #Iter. Time (s) Conv. criterion k Hybrid Impl. LAPACK A k + E F e e e e e e e e-07 total: time is reduced on a 45% Problem dimensions: n = 5, 177, m = 7, p = 6. remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 19

25 Model reduction Results for benchmark STEEL I (hybrid implementation) #Iter. k PA k = LU EA k 1, R k A k 1 Time (s) (EA k 1 )E S k (EA k 1 ),( R k A k 1 )E, Iteration Compress Accumulated time (s) Problem dimensions: n = 5, 177, m = 7, p = 6. remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 20

26 Model reduction Results for benchmark FLOW METER #Iter. Time (s) Conv. criterion k Hybrid Impl. LAPACK A k + E F e e e e e e e e e e e-07 total: time is reduced on a 46% Problem dimensions: n = 9, 669, m = 1, p = 5. remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 21

27 Model reduction Results for benchmark Flow Meter (hybrid implementation) #Iter. k PA k = LU EA k 1, R k A k 1 Time (s) (EA k 1 )E S k (EA k 1 ),( R k A k 1 )E, Iteration Compress Accumulated time (s) Problem dimensions: n = 9, 669, m = 1, p = 5. remon@uji.es A. Remón Model Reduction of Large Linear Systems with GPUs 22

28 Thanks... Any question? A. Remón Model Reduction of Large Linear Systems with GPUs 23

Balanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems

Balanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems Balanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems Jos M. Badía 1, Peter Benner 2, Rafael Mayo 1, Enrique S. Quintana-Ortí 1, Gregorio Quintana-Ortí 1, A. Remón 1 1 Depto.

More information

Numerical Solution of Differential Riccati Equations on Hybrid CPU-GPU Platforms

Numerical Solution of Differential Riccati Equations on Hybrid CPU-GPU Platforms Numerical Solution of Differential Riccati Equations on Hybrid CPU-GPU Platforms Peter Benner 1, Pablo Ezzatti 2, Hermann Mena 3, Enrique S. Quintana-Ortí 4, Alfredo Remón 4 1 Fakultät für Mathematik,

More information

Parallel Model Reduction of Large Linear Descriptor Systems via Balanced Truncation

Parallel Model Reduction of Large Linear Descriptor Systems via Balanced Truncation Parallel Model Reduction of Large Linear Descriptor Systems via Balanced Truncation Peter Benner 1, Enrique S. Quintana-Ortí 2, Gregorio Quintana-Ortí 2 1 Fakultät für Mathematik Technische Universität

More information

Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction

Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction Peter Benner 1, Ernesto Dufrechou 2, Pablo Ezzatti 2, Pablo Igounet 2, Enrique S. Quintana-Ortí 3, and Alfredo Remón

More information

Level-3 BLAS on a GPU

Level-3 BLAS on a GPU Level-3 BLAS on a GPU Picking the Low Hanging Fruit Francisco Igual 1 Gregorio Quintana-Ortí 1 Robert A. van de Geijn 2 1 Departamento de Ingeniería y Ciencia de los Computadores. University Jaume I. Castellón

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

A Newton-Galerkin-ADI Method for Large-Scale Algebraic Riccati Equations

A Newton-Galerkin-ADI Method for Large-Scale Algebraic Riccati Equations A Newton-Galerkin-ADI Method for Large-Scale Algebraic Riccati Equations Peter Benner Max-Planck-Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory

More information

BALANCING-RELATED MODEL REDUCTION FOR DATA-SPARSE SYSTEMS

BALANCING-RELATED MODEL REDUCTION FOR DATA-SPARSE SYSTEMS BALANCING-RELATED Peter Benner Professur Mathematik in Industrie und Technik Fakultät für Mathematik Technische Universität Chemnitz Computational Methods with Applications Harrachov, 19 25 August 2007

More information

Efficient Implementation of Large Scale Lyapunov and Riccati Equation Solvers

Efficient Implementation of Large Scale Lyapunov and Riccati Equation Solvers Efficient Implementation of Large Scale Lyapunov and Riccati Equation Solvers Jens Saak joint work with Peter Benner (MiIT) Professur Mathematik in Industrie und Technik (MiIT) Fakultät für Mathematik

More information

Sonderforschungsbereich 393

Sonderforschungsbereich 393 Sonderforschungsbereich 393 Parallele Numerische Simulation für Physik und Kontinuumsmechanik Peter Benner Enrique S. Quintana-Ortí Gregorio Quintana-Ortí Solving Stable Sylvester Equations via Rational

More information

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic Jan Verschelde joint work with Xiangcheng Yu University of Illinois at Chicago

More information

MODEL REDUCTION BY A CROSS-GRAMIAN APPROACH FOR DATA-SPARSE SYSTEMS

MODEL REDUCTION BY A CROSS-GRAMIAN APPROACH FOR DATA-SPARSE SYSTEMS MODEL REDUCTION BY A CROSS-GRAMIAN APPROACH FOR DATA-SPARSE SYSTEMS Ulrike Baur joint work with Peter Benner Mathematics in Industry and Technology Faculty of Mathematics Chemnitz University of Technology

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

S N. hochdimensionaler Lyapunov- und Sylvestergleichungen. Peter Benner. Mathematik in Industrie und Technik Fakultät für Mathematik TU Chemnitz

S N. hochdimensionaler Lyapunov- und Sylvestergleichungen. Peter Benner. Mathematik in Industrie und Technik Fakultät für Mathematik TU Chemnitz Ansätze zur numerischen Lösung hochdimensionaler Lyapunov- und Sylvestergleichungen Peter Benner Mathematik in Industrie und Technik Fakultät für Mathematik TU Chemnitz S N SIMULATION www.tu-chemnitz.de/~benner

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

Computing least squares condition numbers on hybrid multicore/gpu systems

Computing least squares condition numbers on hybrid multicore/gpu systems Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning

More information

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,

More information

A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures

A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures A Massively Parallel Eigenvalue Solver for Small Matrices on Multicore and Manycore Architectures Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,

More information

Julian Merten. GPU Computing and Alternative Architecture

Julian Merten. GPU Computing and Alternative Architecture Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg

More information

On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code

On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy 7 th Workshop on UnConventional High Performance

More information

Multicore Parallelization of Determinant Quantum Monte Carlo Simulations

Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Andrés Tomás, Che-Rung Lee, Zhaojun Bai, Richard Scalettar UC Davis SIAM Conference on Computation Science & Engineering Reno, March

More information

A CUDA Solver for Helmholtz Equation

A CUDA Solver for Helmholtz Equation Journal of Computational Information Systems 11: 24 (2015) 7805 7812 Available at http://www.jofcis.com A CUDA Solver for Helmholtz Equation Mingming REN 1,2,, Xiaoguang LIU 1,2, Gang WANG 1,2 1 College

More information

On GPU Acceleration of Common Solvers for (Quasi-) Triangular Generalized Lyapunov Equations

On GPU Acceleration of Common Solvers for (Quasi-) Triangular Generalized Lyapunov Equations Max Planck Institute Magdeburg Preprints Martin Köhler Jens Saak On GPU Acceleration of Common Solvers for (Quasi-) Triangular Generalized Lyapunov Equations MAX PLANCK INSTITUT FÜR DYNAMIK KOMPLEXER TECHNISCHER

More information

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Ilya B. Labutin A.A. Trofimuk Institute of Petroleum Geology and Geophysics SB RAS, 3, acad. Koptyug Ave., Novosibirsk

More information

Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures

Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures José I. Aliaga Performance and Energy Analysis of the Iterative Solution of Sparse

More information

Parallel Solution of Large-Scale and Sparse Generalized algebraic Riccati Equations

Parallel Solution of Large-Scale and Sparse Generalized algebraic Riccati Equations Parallel Solution of Large-Scale and Sparse Generalized algebraic Riccati Equations José M. Badía 1, Peter Benner 2, Rafael Mayo 1, and Enrique S. Quintana-Ortí 1 1 Depto. de Ingeniería y Ciencia de Computadores,

More information

Passivity Preserving Model Reduction for Large-Scale Systems. Peter Benner.

Passivity Preserving Model Reduction for Large-Scale Systems. Peter Benner. Passivity Preserving Model Reduction for Large-Scale Systems Peter Benner Mathematik in Industrie und Technik Fakultät für Mathematik Sonderforschungsbereich 393 S N benner@mathematik.tu-chemnitz.de SIMULATION

More information

Controllability and observability gramians parallel computation using GPU

Controllability and observability gramians parallel computation using GPU Journal of Theoretical and Applied Computer Science Vol. 6 No. 202 pp. 47-66 ISSN 2299-2634 http://www.jtacs.org Controllability and observability gramians parallel computation using GPU Damian Raczyński

More information

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA Jacobi-Davidson Eigensolver in Cusolver Library Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline CuSolver library - cusolverdn: dense LAPACK - cusolversp: sparse LAPACK - cusolverrf: refactorization

More information

GPU accelerated Arnoldi solver for small batched matrix

GPU accelerated Arnoldi solver for small batched matrix 15. 09. 22 GPU accelerated Arnoldi solver for small batched matrix Samsung Advanced Institute of Technology Hyung-Jin Kim Contents - Eigen value problems - Solution - Arnoldi Algorithm - Target - CUDA

More information

Model Reduction for Dynamical Systems

Model Reduction for Dynamical Systems Otto-von-Guericke Universität Magdeburg Faculty of Mathematics Summer term 2015 Model Reduction for Dynamical Systems Lecture 8 Peter Benner Lihong Feng Max Planck Institute for Dynamics of Complex Technical

More information

Implementing QR Factorization Updating Algorithms on GPUs. Andrew, Robert and Dingle, Nicholas J. MIMS EPrint:

Implementing QR Factorization Updating Algorithms on GPUs. Andrew, Robert and Dingle, Nicholas J. MIMS EPrint: Implementing QR Factorization Updating Algorithms on GPUs Andrew, Robert and Dingle, Nicholas J. 214 MIMS EPrint: 212.114 Manchester Institute for Mathematical Sciences School of Mathematics The University

More information

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015 Tips Geared Towards R Departments of Statistics North Carolina State University Arpil 10, 2015 1 / 30 Advantages of R As an interpretive and interactive language, developing an algorithm in R can be done

More information

Introduction to numerical computations on the GPU

Introduction to numerical computations on the GPU Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming

More information

arxiv: v1 [hep-lat] 7 Oct 2010

arxiv: v1 [hep-lat] 7 Oct 2010 arxiv:.486v [hep-lat] 7 Oct 2 Nuno Cardoso CFTP, Instituto Superior Técnico E-mail: nunocardoso@cftp.ist.utl.pt Pedro Bicudo CFTP, Instituto Superior Técnico E-mail: bicudo@ist.utl.pt We discuss the CUDA

More information

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1 1 Deparment of Computer

More information

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal

More information

Practical Combustion Kinetics with CUDA

Practical Combustion Kinetics with CUDA Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton Practical Combustion Kinetics with CUDA GPU Technology Conference March 20, 2015 Russell Whitesides

More information

MARCH 24-27, 2014 SAN JOSE, CA

MARCH 24-27, 2014 SAN JOSE, CA MARCH 24-27, 2014 SAN JOSE, CA Sparse HPC on modern architectures Important scientific applications rely on sparse linear algebra HPCG a new benchmark proposal to complement Top500 (HPL) To solve A x =

More information

Some notes on efficient computing and setting up high performance computing environments

Some notes on efficient computing and setting up high performance computing environments Some notes on efficient computing and setting up high performance computing environments Andrew O. Finley Department of Forestry, Michigan State University, Lansing, Michigan. April 17, 2017 1 Efficient

More information

ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS

ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS ACCELERATED LEARNING OF GAUSSIAN PROCESS MODELS Bojan Musizza, Dejan Petelin, Juš Kocijan, Jožef Stefan Institute Jamova 39, Ljubljana, Slovenia University of Nova Gorica Vipavska 3, Nova Gorica, Slovenia

More information

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

Parametrische Modellreduktion mit dünnen Gittern

Parametrische Modellreduktion mit dünnen Gittern Parametrische Modellreduktion mit dünnen Gittern (Parametric model reduction with sparse grids) Ulrike Baur Peter Benner Mathematik in Industrie und Technik, Fakultät für Mathematik Technische Universität

More information

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark Block AIR Methods For Multicore and GPU Per Christian Hansen Hans Henrik B. Sørensen Technical University of Denmark Model Problem and Notation Parallel-beam 3D tomography exact solution exact data noise

More information

MAGMA. Matrix Algebra on GPU and Multicore Architectures. Mark Gates. February 2012

MAGMA. Matrix Algebra on GPU and Multicore Architectures. Mark Gates. February 2012 MAGMA Matrix Algebra on GPU and Multicore Architectures Mark Gates February 2012 1 Hardware trends Scale # cores instead of clock speed Hardware issue became software issue Multicore Hybrid 1.E+07 1e7

More information

A hybrid Hermitian general eigenvalue solver

A hybrid Hermitian general eigenvalue solver Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe A hybrid Hermitian general eigenvalue solver Raffaele Solcà *, Thomas C. Schulthess Institute fortheoretical Physics ETHZ,

More information

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and Accelerating the Multifrontal Method Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes {rflucas,genew,ddavis}@isi.edu and grimes@lstc.com 3D Finite Element

More information

Explore Computational Power of GPU in Electromagnetics and Micromagnetics

Explore Computational Power of GPU in Electromagnetics and Micromagnetics Explore Computational Power of GPU in Electromagnetics and Micromagnetics Presenter: Sidi Fu, PhD candidate, UC San Diego Advisor: Prof. Vitaliy Lomakin Center of Magnetic Recording Research, Department

More information

Saving Energy in Sparse and Dense Linear Algebra Computations

Saving Energy in Sparse and Dense Linear Algebra Computations Saving Energy in Sparse and Dense Linear Algebra Computations P. Alonso, M. F. Dolz, F. Igual, R. Mayo, E. S. Quintana-Ortí, V. Roca Univ. Politécnica Univ. Jaume I The Univ. of Texas de Valencia, Spain

More information

Enhancing Performance of Tall-Skinny QR Factorization using FPGAs

Enhancing Performance of Tall-Skinny QR Factorization using FPGAs Enhancing Performance of Tall-Skinny QR Factorization using FPGAs Abid Rafique Imperial College London August 31, 212 Enhancing Performance of Tall-Skinny QR Factorization using FPGAs 1/18 Our Claim Common

More information

Factorized Solution of Sylvester Equations with Applications in Control

Factorized Solution of Sylvester Equations with Applications in Control Factorized Solution of Sylvester Equations with Applications in Control Peter Benner Abstract Sylvester equations play a central role in many areas of applied mathematics and in particular in systems and

More information

Perm State University Research-Education Center Parallel and Distributed Computing

Perm State University Research-Education Center Parallel and Distributed Computing Perm State University Research-Education Center Parallel and Distributed Computing A 25-minute Talk (S4493) at the GPU Technology Conference (GTC) 2014 MARCH 24-27, 2014 SAN JOSE, CA GPU-accelerated modeling

More information

On the design of parallel linear solvers for large scale problems

On the design of parallel linear solvers for large scale problems On the design of parallel linear solvers for large scale problems ICIAM - August 2015 - Mini-Symposium on Recent advances in matrix computations for extreme-scale computers M. Faverge, X. Lacoste, G. Pichon,

More information

Alfredo Remón Gómez, born on May , in Valencia, SPAIN.

Alfredo Remón Gómez, born on May , in Valencia, SPAIN. Curriculum Vitae Alfredo Remón Gómez, born on May 9 1977, in Valencia, SPAIN. Address: Max Planck Institute for Phone: +49 3916110382 Dynamics of Complex Technical Systems Fax: +49 3916110500 Sandtorstr.

More information

Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs

Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs Christopher P. Stone, Ph.D. Computational Science and Engineering, LLC Kyle Niemeyer, Ph.D. Oregon State University 2 Outline

More information

MPI at MPI. Jens Saak. Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory

MPI at MPI. Jens Saak. Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory MAX PLANCK INSTITUTE November 5, 2010 MPI at MPI Jens Saak Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory FOR DYNAMICS OF COMPLEX TECHNICAL

More information

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA S7255: CUTT: A HIGH- PERFORMANCE TENSOR TRANSPOSE LIBRARY FOR GPUS Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA MOTIVATION Tensor contractions are the most computationally intensive part of quantum

More information

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

Improving Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra

Improving Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra Improving Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra FLAME Working Note #73 Pedro Alonso, Manuel F. Dolz 2, Francisco D. Igual 3, Rafael Mayo 4, and Enrique S. Quintana-Ortí

More information

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Eftychios Sifakis CS758 Guest Lecture - 19 Sept 2012 Introduction Linear systems

More information

Solving PDEs with CUDA Jonathan Cohen

Solving PDEs with CUDA Jonathan Cohen Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear

More information

Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX

Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX 26 Septembre 2018 - JCAD 2018 - Lyon Grégoire Pichon, Mathieu Faverge, Pierre Ramet, Jean Roman Outline 1. Context 2.

More information

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA Jacobi-Based Eigenvalue Solver on GPU Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline Symmetric eigenvalue solver Experiment Applications Conclusions Symmetric eigenvalue solver The standard form is

More information

Towards high performance IRKA on hybrid CPU-GPU systems

Towards high performance IRKA on hybrid CPU-GPU systems Towards high performance IRKA on hybrid CPU-GPU systems Jens Saak in collaboration with Georg Pauer (OVGU/MPI Magdeburg) Kapil Ahuja, Ruchir Garg (IIT Indore) Hartwig Anzt, Jack Dongarra (ICL Uni Tennessee

More information

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013

More information

The Lattice Boltzmann Method for Laminar and Turbulent Channel Flows

The Lattice Boltzmann Method for Laminar and Turbulent Channel Flows The Lattice Boltzmann Method for Laminar and Turbulent Channel Flows Vanja Zecevic, Michael Kirkpatrick and Steven Armfield Department of Aerospace Mechanical & Mechatronic Engineering The University of

More information

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Welcome to MCS 572. content and organization expectations of the course. definition and classification Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson

More information

Sonderforschungsbereich 393 Parallele Numerische Simulation für Physik und Kontinuumsmechanik

Sonderforschungsbereich 393 Parallele Numerische Simulation für Physik und Kontinuumsmechanik Sonderforschungsbereich 393 Parallele Numerische Simulation für Physik und Kontinuumsmechanik José M. Badía Peter Benner Rafael Mayo Enrique S. Quintana-Ortí Gregorio Quintana-Ortí Jens Saak Parallel Order

More information

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009 Parallel Preconditioning of Linear Systems based on ILUPACK for Multithreaded Architectures J.I. Aliaga M. Bollhöfer 2 A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ.

More information

MagmaDNN High-Performance Data Analytics for Manycore GPUs and CPUs

MagmaDNN High-Performance Data Analytics for Manycore GPUs and CPUs MagmaDNN High-Performance Data Analytics for Manycore GPUs and CPUs Lucien Ng The Chinese University of Hong Kong Kwai Wong The Joint Institute for Computational Sciences (JICS), UTK and ORNL Azzam Haidar,

More information

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!

More information

Saving Energy in the LU Factorization with Partial Pivoting on Multi-Core Processors

Saving Energy in the LU Factorization with Partial Pivoting on Multi-Core Processors 20th Euromicro International Conference on Parallel, Distributed and Network-Based Special Session on Energy-aware Systems Saving Energy in the on Multi-Core Processors Pedro Alonso 1, Manuel F. Dolz 2,

More information

Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners

Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners José I. Aliaga Leveraging task-parallelism in energy-efficient ILU preconditioners Universidad Jaime I (Castellón, Spain) José I. Aliaga

More information

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,

More information

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Khramtsov D.P., Nekrasov D.A., Pokusaev B.G. Department of Thermodynamics, Thermal Engineering and Energy Saving Technologies,

More information

Updating incomplete factorization preconditioners for model order reduction

Updating incomplete factorization preconditioners for model order reduction DOI 10.1007/s11075-016-0110-2 ORIGINAL PAPER Updating incomplete factorization preconditioners for model order reduction Hartwig Anzt 1 Edmond Chow 2 Jens Saak 3 Jack Dongarra 1,4,5 Received: 18 September

More information

Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry

Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry Heterogeneous programming for hybrid CPU-GPU systems: Lessons learned from computational chemistry and Eugene DePrince Argonne National Laboratory (LCF and CNM) (Eugene moved to Georgia Tech last week)

More information

Low Rank Solution of Data-Sparse Sylvester Equations

Low Rank Solution of Data-Sparse Sylvester Equations Low Ran Solution of Data-Sparse Sylvester Equations U. Baur e-mail: baur@math.tu-berlin.de, Phone: +49(0)3031479177, Fax: +49(0)3031479706, Technische Universität Berlin, Institut für Mathemati, Straße

More information

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method NUCLEAR SCIENCE AND TECHNIQUES 25, 0501 (14) Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method XU Qi ( 徐琪 ), 1, YU Gang-Lin ( 余纲林 ), 1 WANG Kan ( 王侃 ),

More information

Dynamic Scheduling within MAGMA

Dynamic Scheduling within MAGMA Dynamic Scheduling within MAGMA Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Mathieu Faverge, Julien Langou, Hatem Ltaief, Samuel Thibault and Stanimire Tomov April 5, 2012 Innovative and Computing

More information

Model reduction of coupled systems

Model reduction of coupled systems Model reduction of coupled systems Tatjana Stykel Technische Universität Berlin ( joint work with Timo Reis, TU Kaiserslautern ) Model order reduction, coupled problems and optimization Lorentz Center,

More information

MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors

MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors J. Dongarra, M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov University of Tennessee, Knoxville 05 / 03 / 2013 MAGMA:

More information

Using AmgX to accelerate a PETSc-based immersed-boundary method code

Using AmgX to accelerate a PETSc-based immersed-boundary method code 29th International Conference on Parallel Computational Fluid Dynamics May 15-17, 2017; Glasgow, Scotland Using AmgX to accelerate a PETSc-based immersed-boundary method code Olivier Mesnard, Pi-Yueh Chuang,

More information

The geometric mean algorithm

The geometric mean algorithm The geometric mean algorithm Rui Ralha Centro de Matemática Universidade do Minho 4710-057 Braga, Portugal email: r ralha@math.uminho.pt Abstract Bisection (of a real interval) is a well known algorithm

More information

Binding Performance and Power of Dense Linear Algebra Operations

Binding Performance and Power of Dense Linear Algebra Operations 10th IEEE International Symposium on Parallel and Distributed Processing with Applications Binding Performance and Power of Dense Linear Algebra Operations Maria Barreda, Manuel F. Dolz, Rafael Mayo, Enrique

More information

Direct Self-Consistent Field Computations on GPU Clusters

Direct Self-Consistent Field Computations on GPU Clusters Direct Self-Consistent Field Computations on GPU Clusters Guochun Shi, Volodymyr Kindratenko National Center for Supercomputing Applications University of Illinois at UrbanaChampaign Ivan Ufimtsev, Todd

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 27, 2015 Outline Linear regression Ridge regression and Lasso Time complexity (closed form solution) Iterative Solvers Regression Input: training

More information

TECHNISCHE UNIVERSITÄT BERLIN

TECHNISCHE UNIVERSITÄT BERLIN TECHNISCHE UNIVERSITÄT BERLIN On best rank one approximation of tensors S. Friedland V. Mehrmann R. Pajarola S.K. Suter Preprint 2012/07 Preprint-Reihe des Instituts für Mathematik Technische Universität

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

COMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD

COMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD XVIII International Conference on Water Resources CMWR 2010 J. Carrera (Ed) c CIMNE, Barcelona, 2010 COMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD James.E. McClure, Jan F. Prins

More information

Structure preserving Krylov-subspace methods for Lyapunov equations

Structure preserving Krylov-subspace methods for Lyapunov equations Structure preserving Krylov-subspace methods for Lyapunov equations Matthias Bollhöfer, André Eppler Institute Computational Mathematics TU Braunschweig MoRePas Workshop, Münster September 17, 2009 System

More information

Power-Aware Execution of Sparse and Dense Linear Algebra Libraries

Power-Aware Execution of Sparse and Dense Linear Algebra Libraries Power-Aware Execution of Sparse and Dense Linear Algebra Libraries Enrique S. Quintana-Ortí quintana@icc.uji.es Power-aware execution of linear algebra libraries 1 CECAM Lausanne, Sept. 2011 Motivation

More information

ACCELERATING SPARSE CHOLESKY FACTORIZATION ON THE GPU

ACCELERATING SPARSE CHOLESKY FACTORIZATION ON THE GPU ACCELERATING SPARSE CHOLESKY FACTORIZATION ON THE GPU STEVE RENNICH, SR. ENGINEER, NVIDIA DEVELOPER TECHNOLOGY DARKO STOSIC, PHD CANDIDATE, UNIV. FEDERAL DE PERNAMBUCO TIM DAVIS, PROFESSOR, CSE, TEXAS

More information

S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA

S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA Date: 16th May 2012 Wed, 3pm to 3.25pm(Adv. Session) Sathyanarayana K., Manish Banga, and Ravi Kumar G. V. V. Engineering Services,

More information

The new challenges to Krylov subspace methods Yousef Saad Department of Computer Science and Engineering University of Minnesota

The new challenges to Krylov subspace methods Yousef Saad Department of Computer Science and Engineering University of Minnesota The new challenges to Krylov subspace methods Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM Applied Linear Algebra Valencia, June 18-22, 2012 Introduction Krylov

More information

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain

More information

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts

More information