Everyday Multithreading

Size: px
Start display at page:

Download "Everyday Multithreading"

Transcription

1 Everyday Multithreading Parallel computing for genomic evaluations in R C. Heuer, D. Hinrichs, G. Thaller Institute of Animal Breeding and Husbandry, Kiel University August 27, 2014 C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

2 Introduction Introduction C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

3 Introduction Introduction High Dimensional livestock data sets New computational challenges Paradigm shift in breeding programs and computing From sparse to dense MME (or mixtures) High storage, memory and computing demands C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

4 Introduction Introduction High Dimensional livestock data sets New computational challenges Paradigm shift in breeding programs and computing From sparse to dense MME (or mixtures) High storage, memory and computing demands Solution: Making use of available hardware resources by parallel computing C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

5 What parallel computing is: Parallel Computing C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

6 What parallel computing is: Parallel Computing Splitting up a big problem into chunks that are simultaneously being solved by several processing units C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

7 What parallel computing is: Parallel Computing Splitting up a big problem into chunks that are simultaneously being solved by several processing units = C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

8 What parallel computing is: Parallel Computing Splitting up a big problem into chunks that are simultaneously being solved by several processing units = C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

9 Parallel Computing Paradigms Parallel Computing C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

10 Parallel Computing Paradigms Parallel Computing 1 Most importantly: Single Core parallelism Vectorization 2 Shared Memory multi-threading OpenMP 3 Distributed Memory multi-processing MPI 4 GPU-programming CUDA, OpenCL, Intel Xeon-Phi C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

11 Parallel Computing Paradigms Parallel Computing 1 Most importantly: Single Core parallelism Vectorization 2 Shared Memory multi-threading OpenMP 3 Distributed Memory multi-processing MPI 4 GPU-programming CUDA, OpenCL, Intel Xeon-Phi Efficiency/Scaling: Depends on size of the problem: Overhead The less efficient a single-threaded application, the better the scaling C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

12 Parallel Computing Paradigms Parallel Computing 1 Most importantly: Single Core parallelism Vectorization 2 Shared Memory multi-threading OpenMP 3 Distributed Memory multi-processing MPI 4 GPU-programming CUDA, OpenCL, Intel Xeon-Phi Efficiency/Scaling: Depends on size of the problem: Overhead The less efficient a single-threaded application, the better the scaling In general: First single-threaded optimization then parallelization C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

13 R-Package R-package cpgen C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

14 R-Package R-package cpgen Advantages of R: 1 Very flexible open source interpreter language 2 Easy to use, available on all platforms and widely spread Drawbacks of R: C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

15 R-Package R-package cpgen Advantages of R: 1 Very flexible open source interpreter language 2 Easy to use, available on all platforms and widely spread Drawbacks of R: 1 Not designed for big data problems 2 Needs a lot of effort to extend R 3 Strictly single-threaded C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

16 R-Package R-package cpgen Advantages of R: 1 Very flexible open source interpreter language 2 Easy to use, available on all platforms and widely spread Drawbacks of R: 1 Not designed for big data problems 2 Needs a lot of effort to extend R 3 Strictly single-threaded But: Can be extended and multi-threaded through C/C++/Fortran shared libraries C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

17 R-Package General Implementation C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

18 R-Package General Implementation 1 R as the basic environment for data preparation and supply 2 Linking C++ to R: Rcpp (Eddelbuettel and Francois, 2011) 3 Linear Algebra: Eigen Vectorization! (Guennebaud et al., 2010) 4 Eigen + Rcpp + Sparse-Matrix support: RcppEigen (Bates and Eddelbuettel, 2013) 5 Parallelization: OpenMP C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

19 R-Package Parallel Computing in cpgen C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

20 R-Package Parallel Computing in cpgen C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

21 R-Package Functionality C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

22 R-Package Functionality 1 Single-Site Gibbs Sampler for Mixed Models with arbitrary number of random effects (sparse or dense design matrices) 2 Genomic Prediction Module: Different Methods, Cross Validation 3 GWAS Module: EMMAX - highly efficient and very flexible single marker GWAS 4 Tools: Genomic additive and dominance relationships, Crossproducts, Covariance Matrices,... C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

23 R-Package Gibbs Sampler Runs models of the following form: y = Xb + Z 1 u 1 + Z 2 u 2 + Z 3 u Z n u n + e For all u k : MVN(0, Iσ 2 u k ) If u k is assumed to follow some MVN(0, G k σk 2): Design matrix must be constructed as: Z k = Z k G 1/2, yielding independent effect in u k (Waldmann et al., 2008). C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

24 R-Package Genomic BLUP GBLUP can be accomplished very efficiently (Kang et al., 2008): y = Xb + a + e with: a MVN(0, Gσ 2 a) By finding the decomposition: G = UDU and premultiplying the model equation by U we get: U y = U Xb + U a + U e with: Var(U y)=u GUσ 2 a + U Uσ 2 e = U UDU Uσ 2 a + Iσ 2 e = Dσ 2 a + Iσ 2 e C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

25 R-Package GWAS Single marker regression Controlling for polygenic background effect through [assumed] covariance structure V of y - EMMAX (Kang et al., 2010) Obtaining General Least Squares estimates for marker effects: ˆβ =(X V 1 X) 1 X V 1 y This is equivalent to: ˆβ =(X X ) 1 X y, with X = V 1/2 X, y = V 1/2 y C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

26 R-Package Parallel Computing in cpgen What is parallelized: 1 All crossproduct-like computations 2 Sampling of random effects in Gibbs Sampler (dot product, vector-vector subtraction) Fernando et al., Cross Validation for genomic prediction 4 GWAS C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

27 R-Package Parallel Computing in cpgen What is parallelized: 1 All crossproduct-like computations 2 Sampling of random effects in Gibbs Sampler (dot product, vector-vector subtraction) Fernando et al., Cross Validation for genomic prediction 4 GWAS Number of threads being used can be controlled during runtime C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

28 Parallel Scaling 2,000 Observations, 50,000 Markers C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

29 Parallel Scaling 2,000 Observations, 50,000 Markers C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

30 Parallel Scaling Gibbs Sampler (BRR) - 10,000 Markers C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

31 Parallel Scaling Gibbs Sampler (BRR) - 10,000 Markers C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

32 Conclusions Conclusions C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

33 Conclusions Conclusions Shared Memory multithreading can decrease computation time significantly Bridges the gap between single threaded applications and heavily parallelized standalone programs for HPC Clusters We can make use of the computational power present in workstation PCs Everyday Multithreading C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

34 Conclusions Conclusions Shared Memory multithreading can decrease computation time significantly Bridges the gap between single threaded applications and heavily parallelized standalone programs for HPC Clusters We can make use of the computational power present in workstation PCs Everyday Multithreading But: Size of solvable problem is limited by available memory With 1 TB of memory, the package could fit a BRR model with 3 million observations and 40k markers C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

35 Thank you for the attention Github R-Forge C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

36 Absolute Timings 10 Cores, 30,000 markers BRR: 1 million observations, 30k iterations 100 hours GWAS: 10k observations 7 minutes G-Matrix: 10k observations 2 minutes C. Heuer, D. Hinrichs, G. Thaller Parallel Computing in R August 27, / 17

Scalable and Power-Efficient Data Mining Kernels

Scalable and Power-Efficient Data Mining Kernels Scalable and Power-Efficient Data Mining Kernels Alok Choudhary, John G. Searle Professor Dept. of Electrical Engineering and Computer Science and Professor, Kellogg School of Management Director of the

More information

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Jorge González-Domínguez*, Bertil Schmidt*, Jan C. Kässens**, Lars Wienbrandt** *Parallel and Distributed Architectures

More information

Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors

Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors Large-scale Electronic Structure Simulations with MVAPICH2 on Intel Knights Landing Manycore Processors Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr) Principal Researcher / Korea Institute of Science and Technology

More information

FastGP: an R package for Gaussian processes

FastGP: an R package for Gaussian processes FastGP: an R package for Gaussian processes Giri Gopalan Harvard University Luke Bornn Harvard University Many methodologies involving a Gaussian process rely heavily on computationally expensive functions

More information

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,

More information

Lecture 8 Genomic Selection

Lecture 8 Genomic Selection Lecture 8 Genomic Selection Guilherme J. M. Rosa University of Wisconsin-Madison Mixed Models in Quantitative Genetics SISG, Seattle 18 0 Setember 018 OUTLINE Marker Assisted Selection Genomic Selection

More information

G-BLUP without inverting the genomic relationship matrix

G-BLUP without inverting the genomic relationship matrix G-BLUP without inverting the genomic relationship matrix Per Madsen 1 and Jørgen Ødegård 2 1 Center for Quantitative Genetics and Genomics Department of Molecular Biology and Genetics, Aarhus University

More information

Lecture 9 Multi-Trait Models, Binary and Count Traits

Lecture 9 Multi-Trait Models, Binary and Count Traits Lecture 9 Multi-Trait Models, Binary and Count Traits Guilherme J. M. Rosa University of Wisconsin-Madison Mixed Models in Quantitative Genetics SISG, Seattle 18 0 September 018 OUTLINE Multiple-trait

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

Introduction to Benchmark Test for Multi-scale Computational Materials Software

Introduction to Benchmark Test for Multi-scale Computational Materials Software Introduction to Benchmark Test for Multi-scale Computational Materials Software Shun Xu*, Jian Zhang, Zhong Jin xushun@sccas.cn Computer Network Information Center Chinese Academy of Sciences (IPCC member)

More information

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic

GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic Jan Verschelde joint work with Xiangcheng Yu University of Illinois at Chicago

More information

Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics

Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics Jorge González-Domínguez Parallel and Distributed Architectures Group Johannes Gutenberg University of Mainz, Germany j.gonzalez@uni-mainz.de

More information

Introduction to numerical computations on the GPU

Introduction to numerical computations on the GPU Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming

More information

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling 2019 Intel extreme Performance Users Group (IXPUG) meeting Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr)

More information

Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures

Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures José I. Aliaga Performance and Energy Analysis of the Iterative Solution of Sparse

More information

Welcome to MCS 572. content and organization expectations of the course. definition and classification

Welcome to MCS 572. content and organization expectations of the course. definition and classification Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson

More information

Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29

Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29 Parallelization of Molecular Dynamics (with focus on Gromacs) SeSE 2014 p.1/29 Outline A few words on MD applications and the GROMACS package The main work in an MD simulation Parallelization Stream computing

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

Perm State University Research-Education Center Parallel and Distributed Computing

Perm State University Research-Education Center Parallel and Distributed Computing Perm State University Research-Education Center Parallel and Distributed Computing A 25-minute Talk (S4493) at the GPU Technology Conference (GTC) 2014 MARCH 24-27, 2014 SAN JOSE, CA GPU-accelerated modeling

More information

Parallel Sparse Tensor Decompositions using HiCOO Format

Parallel Sparse Tensor Decompositions using HiCOO Format Figure sources: A brief survey of tensors by Berton Earnshaw and NVIDIA Tensor Cores Parallel Sparse Tensor Decompositions using HiCOO Format Jiajia Li, Jee Choi, Richard Vuduc May 8, 8 @ SIAM ALA 8 Outline

More information

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013

More information

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!

More information

On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code

On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy 7 th Workshop on UnConventional High Performance

More information

Practical Combustion Kinetics with CUDA

Practical Combustion Kinetics with CUDA Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton Practical Combustion Kinetics with CUDA GPU Technology Conference March 20, 2015 Russell Whitesides

More information

The Fast Multipole Method in molecular dynamics

The Fast Multipole Method in molecular dynamics The Fast Multipole Method in molecular dynamics Berk Hess KTH Royal Institute of Technology, Stockholm, Sweden ADAC6 workshop Zurich, 20-06-2018 Slide BioExcel Slide Molecular Dynamics of biomolecules

More information

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling

Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling 2019 Intel extreme Performance Users Group (IXPUG) meeting Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr)

More information

Mini-project in scientific computing

Mini-project in scientific computing Mini-project in scientific computing Eran Treister Computer Science Department, Ben-Gurion University of the Negev, Israel. March 7, 2018 1 / 30 Scientific computing Involves the solution of large computational

More information

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM

More information

Some notes on efficient computing and setting up high performance computing environments

Some notes on efficient computing and setting up high performance computing environments Some notes on efficient computing and setting up high performance computing environments Andrew O. Finley Department of Forestry, Michigan State University, Lansing, Michigan. April 17, 2017 1 Efficient

More information

Parallel Polynomial Evaluation

Parallel Polynomial Evaluation Parallel Polynomial Evaluation Jan Verschelde joint work with Genady Yoffe University of Illinois at Chicago Department of Mathematics, Statistics, and Computer Science http://www.math.uic.edu/ jan jan@math.uic.edu

More information

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA S7255: CUTT: A HIGH- PERFORMANCE TENSOR TRANSPOSE LIBRARY FOR GPUS Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA MOTIVATION Tensor contractions are the most computationally intensive part of quantum

More information

Package bwgr. October 5, 2018

Package bwgr. October 5, 2018 Type Package Title Bayesian Whole-Genome Regression Version 1.5.6 Date 2018-10-05 Package bwgr October 5, 2018 Author Alencar Xavier, William Muir, Shizhong Xu, Katy Rainey. Maintainer Alencar Xavier

More information

ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems

ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. X, M YYYY 1 ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems Sameh Abdulah, Hatem Ltaief, Ying Sun,

More information

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and Accelerating the Multifrontal Method Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes {rflucas,genew,ddavis}@isi.edu and grimes@lstc.com 3D Finite Element

More information

High-performance processing and development with Madagascar. July 24, 2010 Madagascar development team

High-performance processing and development with Madagascar. July 24, 2010 Madagascar development team High-performance processing and development with Madagascar July 24, 2010 Madagascar development team Outline 1 HPC terminology and frameworks 2 Utilizing data parallelism 3 HPC development with Madagascar

More information

Accelerating Model Reduction of Large Linear Systems with Graphics Processors

Accelerating Model Reduction of Large Linear Systems with Graphics Processors Accelerating Model Reduction of Large Linear Systems with Graphics Processors P. Benner 1, P. Ezzatti 2, D. Kressner 3, E.S. Quintana-Ortí 4, Alfredo Remón 4 1 Max-Plank-Institute for Dynamics of Complex

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

Julian Merten. GPU Computing and Alternative Architecture

Julian Merten. GPU Computing and Alternative Architecture Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg

More information

BLUP without (inverse) relationship matrix

BLUP without (inverse) relationship matrix Proceedings of the World Congress on Genetics Applied to Livestock Production, 11, 5 BLUP without (inverse relationship matrix E. Groeneveld (1 and A. Neumaier ( (1 Institute of Farm Animal Genetics, Friedrich-Loeffler-Institut,

More information

Computing least squares condition numbers on hybrid multicore/gpu systems

Computing least squares condition numbers on hybrid multicore/gpu systems Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning

More information

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts

More information

Dark Energy and Massive Neutrino Universe Covariances

Dark Energy and Massive Neutrino Universe Covariances Dark Energy and Massive Neutrino Universe Covariances (DEMNUniCov) Carmelita Carbone Physics Dept, Milan University & INAF-Brera Collaborators: M. Calabrese, M. Zennaro, G. Fabbian, J. Bel November 30

More information

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015

Tips Geared Towards R. Adam J. Suarez. Arpil 10, 2015 Tips Geared Towards R Departments of Statistics North Carolina State University Arpil 10, 2015 1 / 30 Advantages of R As an interpretive and interactive language, developing an algorithm in R can be done

More information

Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer and GPU-Clusters --

Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer and GPU-Clusters -- Parallel Processing for Energy Efficiency October 3, 2013 NTNU, Trondheim, Norway Open-Source Parallel FE Software : FrontISTR -- Performance Considerations about B/F (Byte per Flop) of SpMV on K-Supercomputer

More information

Panorama des modèles et outils de programmation parallèle

Panorama des modèles et outils de programmation parallèle Panorama des modèles et outils de programmation parallèle Sylvain HENRY sylvain.henry@inria.fr University of Bordeaux - LaBRI - Inria - ENSEIRB April 19th, 2013 1/45 Outline Introduction Accelerators &

More information

Targeting Extreme Scale Computational Challenges with Heterogeneous Systems

Targeting Extreme Scale Computational Challenges with Heterogeneous Systems Targeting Extreme Scale Computational Challenges with Heterogeneous Systems Oreste Villa, Antonino Tumeo Pacific Northwest Na/onal Laboratory (PNNL) 1 Introduction! PNNL Laboratory Directed Research &

More information

Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX

Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX 26 Septembre 2018 - JCAD 2018 - Lyon Grégoire Pichon, Mathieu Faverge, Pierre Ramet, Jean Roman Outline 1. Context 2.

More information

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics)

Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Parallel programming practices for the solution of Sparse Linear Systems (motivated by computational physics and graphics) Eftychios Sifakis CS758 Guest Lecture - 19 Sept 2012 Introduction Linear systems

More information

A Comparison of Solving the Poisson Equation Using Several Numerical Methods in Matlab and Octave on the Cluster maya

A Comparison of Solving the Poisson Equation Using Several Numerical Methods in Matlab and Octave on the Cluster maya A Comparison of Solving the Poisson Equation Using Several Numerical Methods in Matlab and Octave on the Cluster maya Sarah Swatski, Samuel Khuvis, and Matthias K. Gobbert (gobbert@umbc.edu) Department

More information

Data Exploration and Unsupervised Learning with Clustering

Data Exploration and Unsupervised Learning with Clustering Data Exploration and Unsupervised Learning with Clustering Paul F Rodriguez,PhD San Diego Supercomputer Center Predictive Analytic Center of Excellence Clustering Idea Given a set of data can we find a

More information

Presentation of XLIFE++

Presentation of XLIFE++ Presentation of XLIFE++ Eigenvalues Solver & OpenMP Manh-Ha NGUYEN Unité de Mathématiques Appliquées, ENSTA - Paristech 25 Juin 2014 Ha. NGUYEN Presentation of XLIFE++ 25 Juin 2014 1/19 EigenSolver 1 EigenSolver

More information

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017 HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher

More information

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Katharina Kormann 1 Klaus Reuter 2 Markus Rampp 2 Eric Sonnendrücker 1 1 Max Planck Institut für Plasmaphysik 2 Max Planck Computing

More information

Energy Consumption Evaluation for Krylov Methods on a Cluster of GPU Accelerators

Energy Consumption Evaluation for Krylov Methods on a Cluster of GPU Accelerators PARIS-SACLAY, FRANCE Energy Consumption Evaluation for Krylov Methods on a Cluster of GPU Accelerators Serge G. Petiton a and Langshi Chen b April the 6 th, 2016 a Université de Lille, Sciences et Technologies

More information

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Genet. Sel. Evol. 33 001) 443 45 443 INRA, EDP Sciences, 001 Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Louis Alberto GARCÍA-CORTÉS a, Daniel SORENSEN b, Note a

More information

Weather Research and Forecasting (WRF) Performance Benchmark and Profiling. July 2012

Weather Research and Forecasting (WRF) Performance Benchmark and Profiling. July 2012 Weather Research and Forecasting (WRF) Performance Benchmark and Profiling July 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell,

More information

Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster

Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Performance evaluation of scalable optoelectronics application on large-scale Knights Landing cluster Yuta Hirokawa Graduate School of Systems and Information Engineering, University of Tsukuba hirokawa@hpcs.cs.tsukuba.ac.jp

More information

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers Victor Yu and the ELSI team Department of Mechanical Engineering & Materials Science Duke University Kohn-Sham Density-Functional

More information

Calculate Sensitivity Function Using Parallel Algorithm

Calculate Sensitivity Function Using Parallel Algorithm Journal of Computer Science 4 (): 928-933, 28 ISSN 549-3636 28 Science Publications Calculate Sensitivity Function Using Parallel Algorithm Hamed Al Rjoub Irbid National University, Irbid, Jordan Abstract:

More information

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Peter Benner, Andreas Marek, Carolin Penke August 16, 2018 ELSI Workshop 2018 Partners: The Problem The Bethe-Salpeter

More information

1 Overview. 2 Adapting to computing system evolution. 11 th European LS-DYNA Conference 2017, Salzburg, Austria

1 Overview. 2 Adapting to computing system evolution. 11 th European LS-DYNA Conference 2017, Salzburg, Austria 1 Overview Improving LSTC s Multifrontal Linear Solver Roger Grimes 3, Robert Lucas 3, Nick Meng 2, Francois-Henry Rouet 3, Clement Weisbecker 3, and Ting-Ting Zhu 1 1 Cray Incorporated 2 Intel Corporation

More information

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA Jacobi-Davidson Eigensolver in Cusolver Library Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline CuSolver library - cusolverdn: dense LAPACK - cusolversp: sparse LAPACK - cusolverrf: refactorization

More information

Hydra: Generation and Tuning of parallel solutions for linear algebra equations. Alexandre X. Duchâteau University of Illinois at Urbana Champaign

Hydra: Generation and Tuning of parallel solutions for linear algebra equations. Alexandre X. Duchâteau University of Illinois at Urbana Champaign Hydra: Generation and Tuning of parallel solutions for linear algebra equations Alexandre X. Duchâteau University of Illinois at Urbana Champaign Collaborators Thesis Advisors Denis Barthou (Labri/INRIA

More information

INITIAL INTEGRATION AND EVALUATION

INITIAL INTEGRATION AND EVALUATION INITIAL INTEGRATION AND EVALUATION OF SLATE PARALLEL BLAS IN LATTE Marc Cawkwell, Danny Perez, Arthur Voter Asim YarKhan, Gerald Ragghianti, Jack Dongarra, Introduction The aim of the joint milestone STMS10-52

More information

Modification of negative eigenvalues to create positive definite matrices and approximation of standard errors of correlation estimates

Modification of negative eigenvalues to create positive definite matrices and approximation of standard errors of correlation estimates Modification of negative eigenvalues to create positive definite matrices and approximation of standard errors of correlation estimates L. R. Schaeffer Centre for Genetic Improvement of Livestock Department

More information

Nonparametric forecasting of the French load curve

Nonparametric forecasting of the French load curve An overview RTE & UPMC-ISUP ISNPS 214, Cadix June 15, 214 Table of contents 1 Introduction 2 MAVE modeling 3 IBR modeling 4 Sparse modeling Electrical context Generation must be strictly equal to consumption

More information

CS 542G: Conditioning, BLAS, LU Factorization

CS 542G: Conditioning, BLAS, LU Factorization CS 542G: Conditioning, BLAS, LU Factorization Robert Bridson September 22, 2008 1 Why some RBF Kernel Functions Fail We derived some sensible RBF kernel functions, like φ(r) = r 2 log r, from basic principles

More information

Behavioral Simulations in MapReduce

Behavioral Simulations in MapReduce Behavioral Simulations in MapReduce Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke, Walker White Cornell University 1 What are Behavioral Simulations?

More information

Optimization Techniques for Parallel Code 1. Parallel programming models

Optimization Techniques for Parallel Code 1. Parallel programming models Optimization Techniques for Parallel Code 1. Parallel programming models Sylvain Collange Inria Rennes Bretagne Atlantique http://www.irisa.fr/alf/collange/ sylvain.collange@inria.fr OPT - 2017 Goals of

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

RWTH Aachen University

RWTH Aachen University IPCC @ RWTH Aachen University Optimization of multibody and long-range solvers in LAMMPS Rodrigo Canales William McDoniel Markus Höhnerbach Ahmed E. Ismail Paolo Bientinesi IPCC Showcase November 2016

More information

Benchmarking program performance evaluation of Parallel programming language XcalableMP on Many core processor

Benchmarking program performance evaluation of Parallel programming language XcalableMP on Many core processor XcalableMP 1 2 2 2 Xeon Phi Xeon XcalableMP HIMENO L Phi XL 16 Xeon 1 16 Phi XcalableMP MPI XcalableMP OpenMP 16 2048 Benchmarking program performance evaluation of Parallel programming language XcalableMP

More information

UNMIXING 4-D PTYCHOGRAPHIC IMAGES

UNMIXING 4-D PTYCHOGRAPHIC IMAGES UNMIXING 4-D PTYCHOGRAPHIC IMAGES Mentors: Dr. Rick Archibald(ORNL), Dr. Azzam Haidar(UTK), Dr. Stanimire Tomov(UTK), and Dr. Kwai Wong(UTK) PROJECT BY: MICHAELA SHOFFNER(UTK) ZHEN ZHANG(CUHK) HUANLIN

More information

Breaking Computational Barriers: Multi-GPU High-Order RBF Kernel Problems with Millions of Points

Breaking Computational Barriers: Multi-GPU High-Order RBF Kernel Problems with Millions of Points Breaking Computational Barriers: Multi-GPU High-Order RBF Kernel Problems with Millions of Points Michael Griebel Christian Rieger Peter Zaspel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität

More information

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method NUCLEAR SCIENCE AND TECHNIQUES 25, 0501 (14) Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method XU Qi ( 徐琪 ), 1, YU Gang-Lin ( 余纲林 ), 1 WANG Kan ( 王侃 ),

More information

Implementing NNLO into MCFM

Implementing NNLO into MCFM Implementing NNLO into MCFM Downloadable from mcfm.fnal.gov A Multi-Threaded Version of MCFM, J.M. Campbell, R.K. Ellis, W. Giele, 2015 Higgs boson production in association with a jet at NNLO using jettiness

More information

Accelerated Neutrino Oscillation Probability Calculations and Reweighting on GPUs. Richard Calland

Accelerated Neutrino Oscillation Probability Calculations and Reweighting on GPUs. Richard Calland Accelerated Neutrino Oscillation Probability Calculations and Reweighting on GPUs Richard Calland University of Liverpool GPU Computing in High Energy Physics University of Pisa, 11th September 2014 Introduction

More information

MPI at MPI. Jens Saak. Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory

MPI at MPI. Jens Saak. Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory MAX PLANCK INSTITUTE November 5, 2010 MPI at MPI Jens Saak Max Planck Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory FOR DYNAMICS OF COMPLEX TECHNICAL

More information

Architecture Solutions for DDM Numerical Environment

Architecture Solutions for DDM Numerical Environment Architecture Solutions for DDM Numerical Environment Yana Gurieva 1 and Valery Ilin 1,2 1 Institute of Computational Mathematics and Mathematical Geophysics SB RAS, Novosibirsk, Russia 2 Novosibirsk State

More information

11 Parallel programming models

11 Parallel programming models 237 // Program Design 10.3 Assessing parallel programs 11 Parallel programming models Many different models for expressing parallelism in programming languages Actor model Erlang Scala Coordination languages

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Parallel Transposition of Sparse Data Structures

Parallel Transposition of Sparse Data Structures Parallel Transposition of Sparse Data Structures Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng Department of Computer Science, Virginia Tech Niels Bohr Institute, University of Copenhagen Scientific Computing

More information

Performance of the fusion code GYRO on three four generations of Crays. Mark Fahey University of Tennessee, Knoxville

Performance of the fusion code GYRO on three four generations of Crays. Mark Fahey University of Tennessee, Knoxville Performance of the fusion code GYRO on three four generations of Crays Mark Fahey mfahey@utk.edu University of Tennessee, Knoxville Contents Introduction GYRO Overview Benchmark Problem Test Platforms

More information

Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems

Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems Ichitaro Yamazaki University of Tennessee, Knoxville Xiaoye Sherry Li Lawrence Berkeley National Laboratory MS49: Sparse

More information

Scaling the Software and Advancing the Science of Global Modeling and Assimilation Systems at NASA. Bill Putman

Scaling the Software and Advancing the Science of Global Modeling and Assimilation Systems at NASA. Bill Putman Global Modeling and Assimilation Office Scaling the Software and Advancing the Science of Global Modeling and Assimilation Systems at NASA Bill Putman Max Suarez, Lawrence Takacs, Atanas Trayanov and Hamid

More information

Language GTC April 7, These slides at Data Science Applications of GPUs in the R.

Language GTC April 7, These slides at   Data Science Applications of GPUs in the R. Data Science April 7, 2016 These slides at http://heather.cs.ucdavis.edu/gtc.pdf Why R? Why R? The lingua franca for the data science community. (R-Python-Julia battle looming?) Statistically Correct:

More information

MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors

MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors J. Dongarra, M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov University of Tennessee, Knoxville 05 / 03 / 2013 MAGMA:

More information

QR Decomposition in a Multicore Environment

QR Decomposition in a Multicore Environment QR Decomposition in a Multicore Environment Omar Ahsan University of Maryland-College Park Advised by Professor Howard Elman College Park, MD oha@cs.umd.edu ABSTRACT In this study we examine performance

More information

Package BGGE. August 10, 2018

Package BGGE. August 10, 2018 Package BGGE August 10, 2018 Title Bayesian Genomic Linear Models Applied to GE Genome Selection Version 0.6.5 Date 2018-08-10 Description Application of genome prediction for a continuous variable, focused

More information

Optimizing Model Development and Validation Procedures of Partial Least Squares for Spectral Based Prediction of Soil Properties

Optimizing Model Development and Validation Procedures of Partial Least Squares for Spectral Based Prediction of Soil Properties Optimizing Model Development and Validation Procedures of Partial Least Squares for Spectral Based Prediction of Soil Properties Soil Spectroscopy Extracting chemical and physical attributes from spectral

More information

Application to Hyperspectral Imaging

Application to Hyperspectral Imaging Compressed Sensing of Low Complexity High Dimensional Data Application to Hyperspectral Imaging Kévin Degraux PhD Student, ICTEAM institute Université catholique de Louvain, Belgium 6 November, 2013 Hyperspectral

More information

Statistical Clustering of Vesicle Patterns Practical Aspects of the Analysis of Large Datasets with R

Statistical Clustering of Vesicle Patterns Practical Aspects of the Analysis of Large Datasets with R Statistical Clustering of Vesicle Patterns Mirko Birbaumer Rmetrics Workshop 3th July 2008 1 / 23 Statistical Clustering of Vesicle Patterns Practical Aspects of the Analysis of Large Datasets with R Mirko

More information

Dr. Andrea Bocci. Using GPUs to Accelerate Online Event Reconstruction. at the Large Hadron Collider. Applied Physicist

Dr. Andrea Bocci. Using GPUs to Accelerate Online Event Reconstruction. at the Large Hadron Collider. Applied Physicist Using GPUs to Accelerate Online Event Reconstruction at the Large Hadron Collider Dr. Andrea Bocci Applied Physicist On behalf of the CMS Collaboration Discover CERN Inside the Large Hadron Collider at

More information

Faster Machine Learning via Low-Precision Communication & Computation. Dan Alistarh (IST Austria & ETH Zurich), Hantian Zhang (ETH Zurich)

Faster Machine Learning via Low-Precision Communication & Computation. Dan Alistarh (IST Austria & ETH Zurich), Hantian Zhang (ETH Zurich) Faster Machine Learning via Low-Precision Communication & Computation Dan Alistarh (IST Austria & ETH Zurich), Hantian Zhang (ETH Zurich) 2 How many bits do you need to represent a single number in machine

More information

Accelerating incompressible fluid flow simulations on hybrid CPU/GPU systems

Accelerating incompressible fluid flow simulations on hybrid CPU/GPU systems Accelerating incompressible fluid flow simulations on hybrid CPU/GPU systems Yushan Wang 1, Marc Baboulin 1,2, Karl Rupp 3,4, Yann Fraigneau 1,5, Olivier Le Maître 1,5 1 Université Paris-Sud, France 2

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

GWAS with mixed models

GWAS with mixed models GWAS with mixed models (the trip from 10 0 to 10 8 ) Yurii Aulchenko yurii [dot] Aulchenko [at] gmail [dot] com Twitter: @YuriiAulchenko YuriiA consulting 1 Outline Methodology development Speeding things

More information

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers

Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric

More information

Enhancing Performance of Tall-Skinny QR Factorization using FPGAs

Enhancing Performance of Tall-Skinny QR Factorization using FPGAs Enhancing Performance of Tall-Skinny QR Factorization using FPGAs Abid Rafique Imperial College London August 31, 212 Enhancing Performance of Tall-Skinny QR Factorization using FPGAs 1/18 Our Claim Common

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information