Explore Computational Power of GPU in Electromagnetics and Micromagnetics
1 Explore Computational Power of GPU in Electromagnetics and Micromagnetics
Presenter: Sidi Fu, PhD candidate, UC San Diego
Advisor: Prof. Vitaliy Lomakin
Center of Magnetic Recording Research, Department of Electrical and Computer Engineering, University of California, San Diego
4/20/2014, Computational Electromagnetics and Micromagnetics Group, Electrical and Computer Engineering Department, UCSD
2 Outline
- Motivation
  - Micromagnetics: FastMag solver
  - Electromagnetics
- GPU Acceleration Projects
  - Non-uniform Fast Fourier Transform
  - Sparse Matrix Vector Multiplication
  - Finite Difference Method solver OOMMF
- Simulation examples
4 Motivation
Typical applications of micromagnetic simulations:
- Hard drives
- Magnetic materials
- Magnetic memory
Typical problem scale: 100K ~ 100M
- CPU: too slow.
- MPI: possible but expensive.
- GPU: (relatively) low cost, high performance.
5 Motivation
Landau-Lifshitz-Gilbert equation for magnetization dynamics:
$$\frac{\partial \hat{m}}{\partial t} = -\frac{\gamma}{1+\alpha^2}\left[\hat{m}\times\mathbf{H}_{\mathrm{eff}} + \alpha\,\hat{m}\times\left(\hat{m}\times\mathbf{H}_{\mathrm{eff}}\right)\right]$$
This nonlinear differential equation is solved by marching on in time, e.g.
$$\hat{m}(t+\Delta t) = \hat{m}(t) + \Delta t\,\left.\frac{\partial \hat{m}}{\partial t}\right|_{t}$$
with the right-hand side evaluated from $\hat{m}(t)$ and $\mathbf{H}_{\mathrm{eff}}(t)$.
Effective field:
- Long-range field (far field, integral operator): demagnetization field, a volume integral of $M_s\hat{m}$ against the kernel $1/|\mathbf{r}-\mathbf{r}'|$. Dense matrix -> bottleneck: O(N^2).
- Local field (near field, differential operator): exchange field, $\frac{2A}{M_s}\nabla^2\hat{m}$. Sparse matrix -> can become a bottleneck.
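A minimal sketch of one marching-on-in-time scheme for the Landau-Lifshitz-Gilbert equation named on this slide, for a single magnetic moment in a fixed field. It assumes the common explicit form with gyromagnetic ratio `gamma` and damping `alpha`, a forward-Euler step with renormalization, and unit-free illustrative values; the function names are hypothetical and this is not the FastMag time integrator.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def llg_rhs(m, h_eff, gamma=1.0, alpha=0.2):
    """dm/dt = -gamma/(1+alpha^2) * [m x H + alpha * m x (m x H)]."""
    mxh = cross(m, h_eff)
    mxmxh = cross(m, mxh)
    pref = -gamma / (1.0 + alpha ** 2)
    return tuple(pref * (a + alpha * b) for a, b in zip(mxh, mxmxh))

def euler_step(m, h_eff, dt, gamma=1.0, alpha=0.2):
    """One forward-Euler step, then renormalize so that |m| stays 1."""
    dm = llg_rhs(m, h_eff, gamma, alpha)
    new = tuple(mi + dt * di for mi, di in zip(m, dm))
    norm = math.sqrt(sum(c * c for c in new))
    return tuple(c / norm for c in new)

# Illustrative values (assumptions): moment along x, field along z.
m0 = (1.0, 0.0, 0.0)
h = (0.0, 0.0, 1.0)

m_final = m0
for _ in range(20000):          # integrate to t = 20 with dt = 1e-3
    m_final = euler_step(m_final, h, dt=1e-3)

print("final magnetization:", m_final)
```

With damping, the moment precesses about the field and relaxes toward it, so the z-component approaches 1. In a full solver the effective field would be recomputed from the magnetization at every step, which is where the dense and sparse operators on this slide enter.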
6 Motivation
FastMag: a versatile GPU micromagnetic simulator
Framework:
- Input interface
- FastMag: LLG simulators, temperature/optics, hybrid simulators
- Fast demag: NUFFT
- Fast exchange
- Fast SpMV
- Time integration
- Fast Jacobian
- Parallelization
- CPU-GPU hybrid
- Output interface
7 Motivation
Typical applications of electromagnetic simulations:
- Biomedical EM
- EM wave scattering from an airplane
- Radar cross section
Figure: radar cross section, RCS (dBsw) versus angle (degree), comparing Mie and MoM results.
Equations to solve: Maxwell's equations for the fields E, H, D and B with source J, and the wave equations for the potentials A and V (wave speed c).
8 Motivation
Electromagnetic problem example: field-based volume integral equation. The unknown electric flux $\mathbf{D}$ appears under volume integrals whose kernel is the free-space Green's function with wavenumber $k_0$, of the form $e^{jk_0|\mathbf{r}-\mathbf{r}'|}/(4\pi|\mathbf{r}-\mathbf{r}'|)$ as transcribed, and the incident flux $\mathbf{D}^i$ is the excitation.
Goal: solve for the electric flux.
- Step 1: quadrature points represent the integral: $Q = P D$, with $\mathbf{D}(\mathbf{r}) = \sum_{n=1}^{N} D_n \mathbf{f}_n(\mathbf{r})$. Sparse matrix $P$: maps basis functions to quadrature source points.
- Step 2: quadrature sources to potential: the potential equals $Z Q$. Dense matrix $Z$: summation of the products between sources and the Green's function.
- Step 3: quadrature observer function to testing function, using $P^T$. Sparse matrix $P^T$: maps quadrature potential points to testing functions.
10 NUFFT
Traditional Fast Fourier Transform
Advantages:
- Computational complexity: O(N^2) -> O(N log N)
- Well-known libraries: e.g. FFTW, Intel MKL, NVIDIA cuFFT
Disadvantages:
- Cannot solve non-uniform source distribution problems
- Non-periodic problems require zero padding
NUFFT: Non-uniform Fast Fourier Transform (or Adaptive Integral Method)*
- Uniform sampling of general structures: non-uniform problem -> uniform problem
Electromagnetic problems (Green's function sum, exponent as transcribed):
$$u(\mathbf{r}_j) = \sum_{i=1,\, i\neq j}^{N} \frac{e^{jk|\mathbf{r}_i-\mathbf{r}_j|}}{|\mathbf{r}_i-\mathbf{r}_j|}\, q(\mathbf{r}_i)$$
Micromagnetic problems: volume integral of $M_s\hat{m}$ against the kernel $1/|\mathbf{r}-\mathbf{r}'|$.
* Ref: Zhu, Zhenhai, Ben Song, and Jacob White. "pfft++ A general and extensible fast integral equation solver based on a pre-corrected FFT algorithm."
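The two FFT points on this slide (an O(N^2) sum replaced by an O(N log N) transform, and zero padding for non-periodic problems) can be illustrated on a uniform 1D grid. The sketch below assumes a static 1/|r| kernel instead of the oscillatory Green's function, a textbook radix-2 FFT written out in pure Python, and hypothetical function names; it covers only the uniform-grid part, not the projection or near-field correction of the NUFFT.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def direct_potential(q, h):
    """O(N^2) reference: u_j = sum over i != j of q_i / (|i - j| * h)."""
    n = len(q)
    return [sum(q[i] / (abs(i - j) * h) for i in range(n) if i != j)
            for j in range(n)]

def fft_potential(q, h):
    """Same sum as a zero-padded circular convolution."""
    n = len(q)
    size = 1
    while size < 2 * n:          # pad to at least 2N to avoid wrap-around
        size *= 2
    g = [0.0] * size             # kernel for offsets -(n-1)..(n-1), g[0] = 0
    for d in range(1, n):
        g[d] = 1.0 / (d * h)
        g[size - d] = 1.0 / (d * h)
    q_padded = list(q) + [0.0] * (size - n)
    spectrum = [a * b for a, b in zip(fft(g), fft(q_padded))]
    u = fft(spectrum, inverse=True)
    return [u[j].real / size for j in range(n)]

# Illustrative source amplitudes on a grid with spacing h (assumed values).
q = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0]
u_direct = direct_potential(q, h=0.1)
u_fft = fft_potential(q, h=0.1)
print(max(abs(a - b) for a, b in zip(u_direct, u_fft)))
```

The padding to at least twice the grid length is what makes the circular convolution reproduce the non-periodic sum; without it, contributions would wrap around the grid.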
11 NUFFT
Algorithm building blocks:
1. Projection
2. Fast Fourier Transform
3. Back-projection
4. Near-field correction
CUDA implementation:
- Coalesced memory access
- Shared memory
- Thread independence
- Workload balancing
- Heavy floating-point operations
15 NUFFT
Figure: flowcharts of the NUFFT on a single GPU and on multiple GPUs.
- Single GPU: the CPU supplies the source coordinates, domain structure and source amplitudes. The GPU performs projection (source amplitudes on grids), FFT (source amplitudes on grids in k-space), tensor multiplication (field in k-space), inverse FFT (field on grids) and near-field correction, and returns the observer field.
- Multiple GPUs: each GPU receives source coordinates, domain structure and source amplitudes and performs its own projection. The GPUs then run a parallel 3D FFT, wait, k-space multiplication, a parallel inverse 3D FFT and near-field correction, each producing its part of the observer field.
16 NUFFT
Single GPU results: Intel 3.2 GHz CPU vs. NVIDIA GeForce GTX 690 (1 card). 100x ~ 300x CPU-GPU speed-up.
Computational time in seconds (… marks digits missing from the source):
Problem size   Direct CPU   Direct GPU   NUFFT CPU (cubic)   NUFFT GPU (cubic)   NUFFT GPU (linear)
16K            7.02E…       …E…          …E…                 …E…                 …E-3
64K            4.47E1       7.98E…       …E0                 1.23E…              …E-3
256K           7.17E2       1.14E0       8.87E0              3.33E…              …E-2
1M             N/A          1.79E1       3.99E1              1.26E…              …E-2
4M             N/A          N/A          N/A                 4.76E…              …E-1
Multiple GPU results: 2 x NVIDIA GeForce GTX 690 (4 GPUs), problem size = 4M.
Parallel efficiency: $E_p = \frac{S_p}{P} = \frac{T_1}{P\,T_P}$; $E_p$ = 77% across 4 GPUs.
Figure: simulation time (ms) versus number of GPUs, with speed-ups of 1.8x, 2.6x and 3.1x on 2, 3 and 4 GPUs.
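As a worked check of the parallel-efficiency figure, the sketch below applies the definition E_p = S_p / P = T_1 / (P T_P) to the multi-GPU speed-ups quoted on this slide. The function names are hypothetical.

```python
def parallel_efficiency(speedup, num_gpus):
    """E_p = S_p / P."""
    return speedup / num_gpus

def efficiency_from_times(t_single, t_parallel, num_gpus):
    """E_p = T_1 / (P * T_P), the same quantity written with run times."""
    return t_single / (num_gpus * t_parallel)

# Speed-ups quoted on the slide for the 4M problem.
speedups = {2: 1.8, 3: 2.6, 4: 3.1}
for gpus, s in speedups.items():
    print(gpus, "GPUs:", f"{parallel_efficiency(s, gpus):.1%}")
```

For 4 GPUs this gives 3.1 / 4 = 77.5%, which agrees with the 77% stated on the slide.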
17 SPMV
Sparse Matrix-Vector Multiplication (SpMV)
- Application: differential operators, projections or interpolations, e.g. the exchange field $\frac{2A}{M_s}\nabla^2\hat{m}$
- Feature: number of non-zero elements << number of zero elements
- GPU memory: only non-zero elements are kept in memory
- Computational complexity: only non-zero elements are computed
Example: compressed sparse row (CSR) format, in which the matrix A is stored as the arrays RowOffset, Ptr and Data.
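A minimal CSR sketch of the points on this slide: only the non-zero entries are stored and only they enter the product. The 4 x 4 matrix is an illustrative assumption, since the slide's own example values are not available, and `col_index` is an assumed name for the column-index array. The loop is serial; on a GPU each row (or group of rows) would typically be handled by its own threads.

```python
def csr_spmv(row_offset, col_index, data, x):
    """y = A @ x for A in CSR form; touches only the stored non-zeros."""
    y = []
    for row in range(len(row_offset) - 1):
        acc = 0.0
        # Non-zeros of this row live in data[row_offset[row]:row_offset[row + 1]].
        for k in range(row_offset[row], row_offset[row + 1]):
            acc += data[k] * x[col_index[k]]
        y.append(acc)
    return y

# Illustrative matrix (assumption):
#     [4 0 0 1]
# A = [0 3 0 0]
#     [2 0 5 0]
#     [0 0 0 6]
row_offset = [0, 2, 3, 5, 6]
col_index = [0, 3, 1, 0, 2, 3]
data = [4.0, 1.0, 3.0, 2.0, 5.0, 6.0]

x = [1.0, 2.0, 3.0, 4.0]
y = csr_spmv(row_offset, col_index, data, x)
print(y)  # [8.0, 6.0, 17.0, 24.0]
```

Storage is 6 values, 6 column indices and 5 row offsets instead of 16 matrix entries, and the product costs 6 multiply-adds instead of 16.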
18 SPMV
Implementation: single GPU
- Bind the input vector to texture memory
- Parallel reduction with shuffle operations
Maximize the CPU-GPU memory transfer throughput:
- Important for mixed CPU-GPU solvers
- Pinned host memory -> increases memory transfer throughput by 100%
Ref: 1. CUDA_C_Best_Practice, Nvidia; 2. Optimizing Parallel Reduction in CUDA, M. Harris; 3. How to Optimize Data Transfers in CUDA C/C++, M. Harris
19 SPMV
Implementation: sorting
Figure: sparse matrix x input vector = output vector, shown before and after sorting.
Sorting methods compared: box-sorting vs. RCM (reverse Cuthill-McKee).
20 SPMV
Implementation: multiple GPUs
- Only part of the matrix and input vector is assigned to each GPU
- Workload balance: balance the number of non-zero elements across GPUs
- Problem: memory scalability across GPUs
Figure (before sorting): input-vector segments V1 to V8 assigned to GPU0 to GPU3, with most segments needed by more than one GPU.
21 SPMV
Implementation: multiple GPUs
- Only part of the matrix and input vector is assigned to each GPU
- Workload balance: balance the number of non-zero elements across GPUs
- Sorting helps to preserve the scalability of the multi-GPU implementation
Figure (after sorting): input-vector segments V1 to V8 assigned to GPU0 to GPU3 with almost no duplication.
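One simple way to balance non-zero elements across GPUs is to cut the CSR row range where the cumulative non-zero count crosses equal shares. The sketch below assumes contiguous row blocks and a CSR row-offset array; the function names and the example matrix are hypothetical, and the slides do not state that the multi-GPU code partitions rows in exactly this way.

```python
def partition_rows_by_nnz(row_offset, num_gpus):
    """Split rows into contiguous blocks holding roughly equal non-zero counts.

    row_offset is the CSR row-offset array, so row_offset[-1] is the total
    number of non-zeros. Returns one (first_row, end_row) pair per GPU.
    """
    total = row_offset[-1]
    n_rows = len(row_offset) - 1
    bounds = [0]
    row = 0
    for g in range(1, num_gpus):
        target = total * g / num_gpus
        # Advance until the non-zeros before `row` reach this GPU's share.
        while row < n_rows and row_offset[row] < target:
            row += 1
        bounds.append(row)
    bounds.append(n_rows)
    return [(bounds[g], bounds[g + 1]) for g in range(num_gpus)]

def nnz_per_part(row_offset, parts):
    """Number of non-zeros owned by each (first_row, end_row) block."""
    return [row_offset[end] - row_offset[start] for start, end in parts]

# Illustrative matrix (assumption): rows with 1, 1, 2, 4, 4, 4 non-zeros.
row_offset = [0, 1, 2, 4, 8, 12, 16]
parts = partition_rows_by_nnz(row_offset, num_gpus=4)
print(parts)                            # [(0, 3), (3, 4), (4, 5), (5, 6)]
print(nnz_per_part(row_offset, parts))  # [4, 4, 4, 4]
```

Splitting by row count alone would give the GPUs 2, 6 and 8 non-zeros in uneven blocks; splitting by non-zeros evens out the work. Which input-vector segments each GPU then needs depends on the column indices in its block, which is where sorting matters.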
22 SPMV
Speed results
- Two matrices generated from FEM meshes of a cube and a sphere, respectively
- Three matrices chosen from the Florida sparse matrix collection
- Intel 3.2 GHz CPU with 1 core running vs. 2 x NVIDIA GeForce GTX 690
- CPU-GPU memory transfer time is included
Table: computational time (ms) for Serial CPU, MKL CPU, cuSPARSE GPU and SpMV on 1, 2, 3 and 4 GPUs. Matrices and their non-zero counts (nnz):
- FEM Cube: 17.5M
- FEM Sphere: 31.8M
- dielfilterv3real: 89.3M
- gsm_…: …M
- Cube_Coup_dt6: 124M
23 SPMV
Speed results: multiple GPUs
Figure: simulation time (ms) versus number of GPUs, split into memcpy and kernel time, for FEM Sphere, Cube_Coup_dt6 and DielFilterV3Real. Speed-up labels recoverable from the plots: 1.6x, 2.2x, 3.1x; 1.9x, 2.6x; 1.0x, 1.8x, 2.7x, 2.8x.
24 OOMMF GPU
OOMMF (Object-oriented Micromagnetic Framework) by NIST
- Open-source, thousands of users worldwide
- Micromagnetic simulator
  - Landau-Lifshitz-Gilbert equation
  - Finite Difference method
- Object-oriented coding framework
  - Periodic and non-periodic boundary conditions
  - 6-point and 12-point exchange field
  - Uniaxial and cubic anisotropy field
  - Flexibility in changing material properties
Problem: CPU speed is too slow for large problems
Solution: GPU parallel computation
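25 OOMMF GPU
GPU parallelism (flowchart): initialize $\mathbf{m}$, compute the field terms, sum them into the effective field, then advance the LLG equation.
- Anisotropy field: $\mathbf{H}_{\mathrm{anisotropy}} = H_k\,(\hat{k}\cdot\mathbf{m})\,\hat{k}$
- Exchange field: $\mathbf{H}_{\mathrm{exchange}} = M_s\, l_{ex}^2\, \nabla^2 \mathbf{m}$
- Effective field: $\mathbf{H}_{\mathrm{eff}} = \mathbf{H}_{\mathrm{applied}} + \mathbf{H}_{\mathrm{exchange}} + \mathbf{H}_{\mathrm{anisotropy}} + \mathbf{H}_{\mathrm{stray}}$
- LLG equation: $\frac{\partial \mathbf{m}}{\partial t} = -\frac{\gamma}{1+\alpha^2}\left[\mathbf{m}\times\mathbf{H}_{\mathrm{eff}} + \alpha\,\mathbf{m}\times\left(\mathbf{m}\times\mathbf{H}_{\mathrm{eff}}\right)\right]$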
26 OOMMF GPU
Speed results
Test case: cubic geometry with various problem sizes
Hardware:
- CPU: Xeon with 1 core running
- GPU: NVIDIA GTX 690 @ 915 MHz with 1536 cores running
Computational time and speed-up (… marks values missing from the source):
Problem size    CPU/ms   GPU/ms   Speed-up
16^3 = 4K       …        …        …x
32^3 = 32K      …        …        …x
64^3 = 256K     …        …        …x
128^3 = 2M      …        …        x24.5
28 Micromagnetic Simulation: Magnetic head
Challenge and features:
- Complex geometry: 5-10 micron size, ~1000 aspect ratio, complex shapes and coupled parts
- Hundreds of millions of elements may be needed
- Parameters: $M_s$ = … emu/cc, α = 0.2
- 5 x 3 x 5 micron size, 50 x 80 nm tip
- Adams and BDF time stepping
- Hardware: Tesla S2070 GPU, i7 CPU
* Coils are surrounding the head
Tip resolution   Largest element   # of tetrahedral elements   Time per 1 ns
10 nm            130 nm            130K                        1.75 min
10 nm            57 nm             1.2M                        17 min
10 nm            33 nm             4.8M                        107 min
10 nm            10 nm             126M                        ~3 days
29 Micromagnetic Simulation: Granular media
Features:
- General Voronoi tessellation
- Distributions of particle size, shape, separation, material parameters, etc.
- Single and multiple layers with option for sub-layer discretization
- Surface and bulk exchange
30 Micromagnetic Simulation: Magnetic memories
Spin-transfer-torque based magnetic RAM
Figure: spin valve structure under applied voltage V with electron flow, shown for in-plane MRAM (free layer over fixed layer) and perpendicular MRAM (free layer over fixed layer).
31 Electromagnetic Simulation: Human body scattering
Human body simulation
- Method: Potential Integral Equation
- Key algorithm: Non-uniform Fast Fourier Transform
- Mesh: 8.4 million tetrahedrons, 2 mm resolution
- Total number of iterations: 109
- Simulation time: 48 min
- Incident wave: x polarization, λ = 1.25 m; relative permittivity ε_r with real part 41.4 and imaginary part of magnitude 18
Figure: current distribution along x, with x, y, z axes shown.
32 Summary
Completed work:
- A Finite Element Method based micromagnetic solver: FastMag
- Two GPU algorithms, Non-uniform Fast Fourier Transform and Sparse Matrix Vector multiplication, with 20x ~ 300x GPU-CPU speed-up
- Multi-GPU implementation of the two algorithms, reaching 65% - 85% parallel efficiency
- Electromagnetic and micromagnetic simulation examples
Future work:
- The entire FastMag solver is going to be implemented on GPU
- With the release of CUDA 6.0, implementation with multiple GPUs will be more efficient
More information is available on our group's website.
Acknowledgement: Shaojing Li, Ruinan Chang, Marko Lubarda, Marco Escobar, Majd Kuteifan, Marco Menarini, Simon Couture, Javier Espigares
Tips Geared Towards R Departments of Statistics North Carolina State University Arpil 10, 2015 1 / 30 Advantages of R As an interpretive and interactive language, developing an algorithm in R can be done
More informationUniversität Dortmund UCHPC. Performance. Computing for Finite Element Simulations
technische universität dortmund Universität Dortmund fakultät für mathematik LS III (IAM) UCHPC UnConventional High Performance Computing for Finite Element Simulations S. Turek, Chr. Becker, S. Buijssen,
More informationCOMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD
XVIII International Conference on Water Resources CMWR 2010 J. Carrera (Ed) c CIMNE, Barcelona, 2010 COMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD James.E. McClure, Jan F. Prins
More informationMassively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling
2019 Intel extreme Performance Users Group (IXPUG) meeting Massively scalable computing method to tackle large eigenvalue problems for nanoelectronics modeling Hoon Ryu, Ph.D. (E: elec1020@kisti.re.kr)
More informationFEAST eigenvalue algorithm and solver: review and perspectives
FEAST eigenvalue algorithm and solver: review and perspectives Eric Polizzi Department of Electrical and Computer Engineering University of Masachusetts, Amherst, USA Sparse Days, CERFACS, June 25, 2012
More informationA model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)
A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal
More informationLattice Boltzmann simulations on heterogeneous CPU-GPU clusters
Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts
More informationAcoustics Analysis of Speaker ANSYS, Inc. November 28, 2014
Acoustics Analysis of Speaker 1 Introduction ANSYS 14.0 offers many enhancements in the area of acoustics. In this presentation, an example speaker analysis will be shown to highlight some of the acoustics
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Underwater Acoustics Session 5aUW: Using Graphic Processing Units for
More informationElectromagnetic Field Analysis
Spectral Integral Method and Spectral Element Method Domain Decomposition Method for Electromagnetic Field Analysis by Yun Lin Department of Electrical and Computer Engineering Duke University Date: Approved:
More information上海超级计算中心 Shanghai Supercomputer Center. Lei Xu Shanghai Supercomputer Center San Jose
上海超级计算中心 Shanghai Supercomputer Center Lei Xu Shanghai Supercomputer Center 03/26/2014 @GTC, San Jose Overview Introduction Fundamentals of the FDTD method Implementation of 3D UPML-FDTD algorithm on GPU
More informationUtilisation de la compression low-rank pour réduire la complexité du solveur PaStiX
Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX 26 Septembre 2018 - JCAD 2018 - Lyon Grégoire Pichon, Mathieu Faverge, Pierre Ramet, Jean Roman Outline 1. Context 2.
More informationOn the Computational Complexity of the Discrete Pascal Transform
6 th International Conference Logic and Applications LAP 207, September 8-22, 207, Dubrovnik, Croatia On the Computational Complexity of the Discrete Pascal Transform Dušan B. Gajić, Radomir S. Stanković
More informationComputing least squares condition numbers on hybrid multicore/gpu systems
Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning
More informationFine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning
Fine-Grained Parallel Algorithms for Incomplete Factorization Preconditioning Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology, USA SPPEXA Symposium TU München,
More informationJacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA
Jacobi-Based Eigenvalue Solver on GPU Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline Symmetric eigenvalue solver Experiment Applications Conclusions Symmetric eigenvalue solver The standard form is
More informationImprovements for Implicit Linear Equation Solvers
Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often
More informationWelcome to MCS 572. content and organization expectations of the course. definition and classification
Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson
More informationIntroduction to Practical FFT and NFFT
Introduction to Practical FFT and NFFT Michael Pippig and Daniel Potts Department of Mathematics Chemnitz University of Technology September 14, 211 supported by BMBF grant 1IH81B Table of Contents 1 Serial
More informationGPU accelerated Arnoldi solver for small batched matrix
15. 09. 22 GPU accelerated Arnoldi solver for small batched matrix Samsung Advanced Institute of Technology Hyung-Jin Kim Contents - Eigen value problems - Solution - Arnoldi Algorithm - Target - CUDA
More informationGPU Computing Activities in KISTI
International Advanced Research Workshop on High Performance Computing, Grids and Clouds 2010 June 21~June 25 2010, Cetraro, Italy HPC Infrastructure and GPU Computing Activities in KISTI Hongsuk Yi hsyi@kisti.re.kr
More informationA robust multilevel approximate inverse preconditioner for symmetric positive definite matrices
DICEA DEPARTMENT OF CIVIL, ENVIRONMENTAL AND ARCHITECTURAL ENGINEERING PhD SCHOOL CIVIL AND ENVIRONMENTAL ENGINEERING SCIENCES XXX CYCLE A robust multilevel approximate inverse preconditioner for symmetric
More informationA particle-in-cell method with adaptive phase-space remapping for kinetic plasmas
A particle-in-cell method with adaptive phase-space remapping for kinetic plasmas Bei Wang 1 Greg Miller 2 Phil Colella 3 1 Princeton Institute of Computational Science and Engineering Princeton University
More informationCoupling atomistic and continuum modelling of magnetism
Coupling atomistic and continuum modelling of magnetism M. Poluektov 1,2 G. Kreiss 2 O. Eriksson 3 1 University of Warwick WMG International Institute for Nanocomposites Manufacturing 2 Uppsala University
More informationPerpendicular MTJ stack development for STT MRAM on Endura PVD platform
Perpendicular MTJ stack development for STT MRAM on Endura PVD platform Mahendra Pakala, Silicon Systems Group, AMAT Dec 16 th, 2014 AVS 2014 *All data in presentation is internal Applied generated data
More informationJulian Merten. GPU Computing and Alternative Architecture
Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg
More informationReduced Vlasov-Maxwell modeling
Reduced Vlasov-Maxwell modeling Philippe Helluy, Michel Massaro, Laurent Navoret, Nhung Pham, Thomas Strub To cite this version: Philippe Helluy, Michel Massaro, Laurent Navoret, Nhung Pham, Thomas Strub.
More informationHybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS
Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Jorge González-Domínguez*, Bertil Schmidt*, Jan C. Kässens**, Lars Wienbrandt** *Parallel and Distributed Architectures
More informationsri 2D Implicit Charge- and Energy- Conserving Particle-in-cell Application Using CUDA Christopher Leibs Karthik Murthy
2D Implicit Charge- and Energy- Conserving sri Particle-in-cell Application Using CUDA Christopher Leibs Karthik Murthy Mentors Dana Knoll and Allen McPherson IS&T CoDesign Summer School 2012, Los Alamos
More informationPerm State University Research-Education Center Parallel and Distributed Computing
Perm State University Research-Education Center Parallel and Distributed Computing A 25-minute Talk (S4493) at the GPU Technology Conference (GTC) 2014 MARCH 24-27, 2014 SAN JOSE, CA GPU-accelerated modeling
More information1 Overview. 2 Adapting to computing system evolution. 11 th European LS-DYNA Conference 2017, Salzburg, Austria
1 Overview Improving LSTC s Multifrontal Linear Solver Roger Grimes 3, Robert Lucas 3, Nick Meng 2, Francois-Henry Rouet 3, Clement Weisbecker 3, and Ting-Ting Zhu 1 1 Cray Incorporated 2 Intel Corporation
More informationA Two-Scale Adaptive Integral Method
A Two-Scale Adaptive Integral Method Ali Yilmaz Department of Electrical & Computer Engineering University of Texas at Austin IEEE APS International Symposium USC/URSI ational Radio Science Meeting San
More informationRWTH Aachen University
IPCC @ RWTH Aachen University Optimization of multibody and long-range solvers in LAMMPS Rodrigo Canales William McDoniel Markus Höhnerbach Ahmed E. Ismail Paolo Bientinesi IPCC Showcase November 2016
More information3D Cartesian Transport Sweep for Massively Parallel Architectures on top of PaRSEC
3D Cartesian Transport Sweep for Massively Parallel Architectures on top of PaRSEC 9th Scheduling for Large Scale Systems Workshop, Lyon S. Moustafa, M. Faverge, L. Plagne, and P. Ramet S. Moustafa, M.
More informationClaude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique
Claude Tadonki MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Monthly CRI Seminar MINES ParisTech - CRI June 06, 2016, Fontainebleau (France)
More information