Accelerating Proton Computed Tomography with GPUs
1 Accelerating Proton Computed Tomography with GPUs
Thomas D. Uram, Argonne Leadership Computing Facility
Michael E. Papka, Argonne Leadership Computing Facility, Northern Illinois University
Nicholas T. Karonis, Northern Illinois University, Argonne National Laboratory
2 Overview
- Proton computed tomography (pCT) is an alternative to X-ray-based CAT scans that promises several medical benefits at the cost of being significantly more computationally expensive.
- We designed a 60-node GPU cluster to meet the computational challenge.
- Outline:
  - Computed tomography
  - Benefits of proton computed tomography
  - Computational problem description
  - CPU/GPU performance comparison
3 What is Computed Tomography?
- CAT (or CT) scans are well known; CAT stands for "computerized axial tomography".
- CAT scans are used to reconstruct the density distribution within a volume, typically in medical imaging.
- CAT scans are conducted with photons (X-rays).
What is Proton Computed Tomography?
- A reconstruction technique similar to X-ray computed tomography, conducted with protons instead of photons.
4 Why Proton Computed Tomography?
- 13 million people are diagnosed with cancer each year worldwide.
- 2.6 million of them are candidates for proton therapy treatment.
- Proton therapy involves depositing protons at precise locations within a tumor site, where they irradiate the target tissue.
  - The protons emit lower radiation as they travel through the body until they reach the target, where they emit a burst of radiation (the Bragg peak).
  - Healthy tissue beyond the tumor site receives nominally no radiation.
- It is crucially important to precisely identify the tumor site:
  - to ensure that cancerous tissue is destroyed;
  - to avoid damaging healthy tissue surrounding the tumor, especially in sensitive areas.
- Proton therapy treatment planning is currently performed using X-ray imaging.
  - Photons and protons interact with intermediate material differently.
  - Conversion between photon/proton modalities involves a systematic range error of 3-5%.
Image source: Wikipedia
5 Proton computed tomography
- Our goal is to reconstruct the volume of an adult human head in under 10 minutes.
- Protons are directed through two frontal planes, the target volume, two backing planes, and finally a calorimeter.
- The system measures the position and angle of incidence of protons at entry and exit, and the energy loss.
Figure: detector layout.
- Final system (in black): 4 tracking planes with XY Si detectors; calorimeter with 64 end-on CsI crystals.
- Planned scaled prototype (in red): 4 planes of XY Si detectors (2 X-SSDs and 2 Y-SSDs per plane); 8 CsI crystal bars.
- Calorimeter: each bar corresponds to a 5 cm x 5 cm CsI crystal, read out by a photodiode.
- Tracking plane: each large square corresponds to one double-sided or two single-sided 9 cm x 9 cm SSDs.
6 Problem Description
- The proton source, detector planes, and calorimeter are mounted on a rotating gantry, as in familiar X-ray CT configurations.
- Data is collected over a full rotation of the gantry: 180 samples (one every 2 degrees).
- The initial detector is designed to image a human head (nominally a 25 cm cube).
- Based on the physics of the problem, and so that each voxel is sufficiently represented in the resulting system matrix, we estimate that we need a volume of 256 x 256 x 36 voxels (2,359,296, roughly 2.4 million) and 2 billion protons in total.
- For each proton, we track 11 values:
  - [x, y, z] at entry
  - [x, y, z] at exit
  - angle at entry and exit
  - input and output energy
  - gantry rotation angle
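The sizes quoted on this slide can be checked with a few lines of arithmetic. The sketch below is only illustrative: the 4-byte width per tracked value is an assumption (the slides do not state the storage type), and the names `voxel_count` and `raw_input_bytes` are hypothetical.

```python
# Sizing sketch for the reconstruction problem described above.
# Assumption (not from the slides): each tracked value is stored in 4 bytes.

GRID = (256, 256, 36)          # voxels along x, y, z
VALUES_PER_PROTON = 11         # entry xyz, exit xyz, 2 angles, 2 energies, gantry angle
BYTES_PER_VALUE = 4            # assumed single-precision storage
TOTAL_PROTONS = 2_000_000_000  # "2 billion protons total"


def voxel_count(grid):
    """Number of voxels in the reconstruction volume."""
    nx, ny, nz = grid
    return nx * ny * nz


def raw_input_bytes(protons, values=VALUES_PER_PROTON, width=BYTES_PER_VALUE):
    """Bytes needed to hold the per-proton measurements alone."""
    return protons * values * width


if __name__ == "__main__":
    n_vox = voxel_count(GRID)
    print(n_vox)                                  # 2359296
    print(TOTAL_PROTONS / n_vox)                  # average protons per voxel
    print(raw_input_bytes(TOTAL_PROTONS) / 1e9)   # 88.0 (GB, under the 4-byte assumption)
```

Under these assumptions the raw measurements alone are modest compared with the per-proton voxel lists produced later in the pipeline.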
7 Baseline execution times
- Began with serial code that took more than 7 hours to process 131 million protons.
- Parallelized with MPI to use multiple CPUs.
- Established baseline execution times.
Table (1 billion protons, 60 nodes, CPU only): execution time in seconds by phase, with rows Setup, Most Likely Path (MLP), Linear solver (CARP), and Overall execution time. The timing values are missing from this transcript.
8 MLP (Most Likely Path)
- In contrast with X-ray computed tomography, in which the particles traverse the volume in straight lines, in pCT the protons are scattered by the material as they travel through the volume.
- MLP computes the path integral of the protons through the material based on their known entry and exit locations and angles and the energy loss.
- The proton paths are discretized as the voxels touched while traversing the volume.
- Path integral calculations are independent and parallelize at the level of protons (but are inherently sequential within each path).
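To make the "discretized as the voxels touched" step concrete, here is a small 2D sketch. It does not implement the MLP formalism from the talk, which rests on a statistical scattering model; as a stand-in it bends a cubic Hermite curve between the measured entry and exit positions and directions and then lists the voxels the sampled curve passes through. The function names, the 2D simplification, and the sampling density are all assumptions of this sketch.

```python
import math


def hermite_path(p0, d0, p1, d1, samples=1024):
    """Sample a cubic Hermite curve from p0 (direction d0) to p1 (direction d1)."""
    length = math.dist(p0, p1)
    m0 = [length * c for c in d0]   # tangents scaled by the chord length
    m1 = [length * c for c in d1]
    points = []
    for k in range(samples + 1):
        t = k / samples
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        points.append(tuple(h00 * a + h10 * b + h01 * c + h11 * d
                            for a, b, c, d in zip(p0, m0, p1, m1)))
    return points


def touched_voxels(points, voxel_size, grid):
    """Ordered list of distinct voxel indices visited by the sampled path."""
    seen, order = set(), []
    for p in points:
        idx = tuple(int(math.floor(c / voxel_size)) for c in p)
        inside = all(0 <= i < n for i, n in zip(idx, grid))
        if inside and idx not in seen:
            seen.add(idx)
            order.append(idx)
    return order


if __name__ == "__main__":
    size = 25.0 / 256                      # 25 cm volume, 256 voxels per side
    path = hermite_path((0.0, 12.0), (1.0, 0.0), (25.0, 12.0), (1.0, 0.0))
    voxels = touched_voxels(path, size, (256, 256))
    print(len(voxels), voxels[0], voxels[-1])
```

Each proton's path is computed from its own measurements only, which is why the work parallelizes per proton. The sample spacing must be finer than the voxel size, or voxels along the path can be skipped.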
9 Linear solver (CARP)
- The result of MLP is a system of equations relating each proton's touched voxels to the relative stopping power (roughly, the energy loss).
- We began the project with a CPU implementation of the row-action based sparse iterative solver CARP (component averaged row projections).
- CARP decomposes the matrix into row blocks, one block per processor, and iterates to satisfactory convergence:
  - Performs a Jacobi-like iteration sequentially through the rows to produce a per-block solution vector.
  - Averages the per-block solution vectors (in component-wise fashion).
  - Redistributes the solution vector x to all processors.
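The three CARP steps listed above can be sketched serially as follows. This is a minimal illustration, not the project's solver: the per-block sweep is written as a Kaczmarz-style row projection (the "row projections" in CARP's name), the blocks are processed one after another rather than on separate processors, and the sparse-row representation and function names are assumptions.

```python
def kaczmarz_sweep(rows, rhs, x, relax=1.0):
    """One sequential pass of row projections over a block; returns a new vector."""
    x = list(x)
    for row, b in zip(rows, rhs):
        norm2 = sum(v * v for _, v in row)
        if norm2 == 0.0:
            continue
        residual = b - sum(v * x[j] for j, v in row)
        step = relax * residual / norm2
        for j, v in row:
            x[j] += step * v
    return x


def carp(blocks, n, iterations=100, relax=1.0):
    """blocks: list of (rows, rhs); each row is a sparse list of (column, value)."""
    # Columns referenced by each block, used for component-wise averaging.
    col_sets = [{j for row in rows for j, _ in row} for rows, _ in blocks]
    x = [0.0] * n
    for _ in range(iterations):
        # Step 1: every block sweeps its own rows starting from the shared x.
        partial = [kaczmarz_sweep(rows, rhs, x, relax) for rows, rhs in blocks]
        # Step 2: average each component over the blocks that touch it.
        averaged = list(x)
        for j in range(n):
            vals = [p[j] for p, cols in zip(partial, col_sets) if j in cols]
            if vals:
                averaged[j] = sum(vals) / len(vals)
        # Step 3: the averaged vector becomes the new shared iterate.
        x = averaged
    return x


if __name__ == "__main__":
    # Small consistent system whose solution is x = [1, 2, 3].
    block_a = ([[(0, 1.0), (1, 1.0)], [(1, 1.0), (2, 1.0)]], [3.0, 5.0])
    block_b = ([[(0, 1.0), (2, 1.0)], [(0, 1.0), (1, -1.0), (2, 1.0)]], [4.0, 2.0])
    print(carp([block_a, block_b], 3))
```

Within a block the sweep is sequential, because each row update depends on the previous one; the parallelism comes only from running blocks side by side.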
10 Hardware: Gaea GPU cluster at Northern Illinois University
- 60 compute nodes
- Node configuration:
  - 2x Intel X5650 CPUs (12 cores per node)
  - 2x NVIDIA M2070 GPUs
  - 72 GB RAM
  - QDR InfiniBand
11 Data decomposition
- 2.1B protons / 60 nodes =~ 35M protons per node
- 2 GPUs -> 17M protons per GPU
- The maximum number of voxels per proton is ~364.
- 17M protons x 364 voxels x 4 bytes/voxel = 25 GB of data per GPU
  - Larger than the available M2070 GPU memory of 6 GB
- High-watermark memory requirement on the cluster is 3 TB (aggregate).
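The decomposition figures above follow from straightforward arithmetic, reproduced in the sketch below. The function name and the returned field names are hypothetical, and the batch count is only a lower bound: it assumes the whole 6 GB of device memory is available for voxel data, which it would not be in practice.

```python
def decomposition(protons=2_100_000_000, nodes=60, gpus_per_node=2,
                  max_voxels=364, bytes_per_voxel=4, gpu_memory=6 * 10**9):
    """Per-node and per-GPU workload, and the memory it implies."""
    per_node = protons // nodes
    per_gpu = per_node // gpus_per_node
    bytes_per_gpu = per_gpu * max_voxels * bytes_per_voxel
    aggregate = bytes_per_gpu * nodes * gpus_per_node
    # Ceiling division: fewest batches if all GPU memory held voxel data.
    min_batches = -(-bytes_per_gpu // gpu_memory)
    return {
        "protons_per_node": per_node,
        "protons_per_gpu": per_gpu,
        "bytes_per_gpu": bytes_per_gpu,
        "aggregate_bytes": aggregate,
        "min_batches_per_gpu": min_batches,
    }


if __name__ == "__main__":
    for key, value in decomposition().items():
        print(key, value)
```

With the unrounded 17.5M protons per GPU the result is about 25.5 GB per GPU and about 3.06 TB in aggregate, consistent with the rounded 25 GB and 3 TB on the slide.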
12 MLP (Most Likely Path) CUDA implementation
- MLP involves calculating the path integral of the protons.
  - The initial implementation assigns a thread per proton.
- Per-GPU proton data is larger than GPU memory on the M2070.
  - Stage batches of protons to the GPU.
- MLP was ported to the GPU, with multiple variants:
  - gpu struct: direct port of the CPU-based code using structured proton/voxel data
  - gpu flat memory: flat memory space with per-proton padded voxel arrays
  - gpu flat memory + overlap: streaming computation to overlap compute and host-device transfers
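The "flat memory" and "overlap" variants can be illustrated with host-side bookkeeping alone. The sketch below packs per-proton voxel lists into one padded array with a fixed stride and generates batch ranges that alternate between two buffers, as a double-buffered transfer scheme would. The `PAD` sentinel, the function names, and the two-buffer scheme are assumptions of this sketch; on a GPU the alternating buffers would typically map to CUDA streams with asynchronous copies, which are not shown here.

```python
PAD = -1  # sentinel marking unused slots (an assumption of this sketch)


def flatten_padded(voxel_lists, max_len):
    """Pack per-proton voxel lists into one flat array with fixed stride max_len."""
    flat = [PAD] * (len(voxel_lists) * max_len)
    counts = []
    for i, voxels in enumerate(voxel_lists):
        if len(voxels) > max_len:
            raise ValueError("path longer than padded stride")
        flat[i * max_len:i * max_len + len(voxels)] = voxels
        counts.append(len(voxels))
    return flat, counts


def voxel_at(flat, max_len, proton, k):
    """Voxel k of a given proton: one multiply-add, no pointer chasing."""
    return flat[proton * max_len + k]


def batches(n_protons, batch_size):
    """Yield (buffer, start, stop); buffers alternate 0/1 as in double buffering."""
    for b, start in enumerate(range(0, n_protons, batch_size)):
        yield b % 2, start, min(start + batch_size, n_protons)


if __name__ == "__main__":
    flat, counts = flatten_padded([[5, 6, 7], [9]], max_len=4)
    print(flat, counts)
    print(list(batches(10, 4)))
```

The padding wastes memory on short paths but gives every thread a predictable offset, which is the trade-off the flat layout makes against the structured one.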
13 MLP (Most Likely Path) CUDA implementation (26M protons, 2 GPUs)
Table: execution time (seconds) and speedup for four implementations: cpu, gpu_struct, gpu_flat_memory, and gpu_flat_memory + overlap. The numeric values are missing from this transcript.
14 Linear solver (CARP) CUDA implementation (26M protons, 2 GPUs)
- CARP was ported directly from the CPU code.
- Per-node row-block data is larger than GPU memory, so it is processed in batches.
- Each per-node row block is further subdivided into row blocks per streaming multiprocessor.
Table: execution time (seconds) and speedup for the cpu and gpu implementations. The numeric values are missing from this transcript.
- Limited speedup in the GPU implementation, because:
  - the row-action based solver constrains parallel granularity;
  - scattered memory accesses constrain performance, as is typical of sparse matrix operations.
15 Performance at scale
2 billion protons, 60 nodes, 12 CPU cores/node, 2 GPUs/node
Table: execution time in seconds by phase. Setup: 22.3. The values for Most Likely Path (MLP), Linear solver (CARP), and Overall execution time are missing from this transcript.
Initial goal was to complete in under 600 s (10 minutes).
16 Further work: CARP Hybrid CPU/GPU
- Assign row blocks to CPU and GPU simultaneously.
- Weighted work distribution based on initial performance measurements.
2 billion protons, 60 nodes, 12 cores/node, 2 GPUs/node
Table: execution time (seconds) and speedup for the cpu, gpu, and hybrid implementations. The numeric values are missing from this transcript.
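One simple way to realize a weighted work distribution is to hand out row blocks in proportion to each device's measured throughput. The sketch below does this with largest-remainder rounding so the shares always sum to the total. The function name, the device labels, and the throughput figures in the example are hypothetical; the slides do not say how the weights were computed.

```python
def split_row_blocks(n_blocks, throughputs):
    """Split n_blocks among devices in proportion to measured throughput.

    throughputs maps a device name to work units per second from a trial run.
    Largest-remainder rounding keeps the shares summing to n_blocks.
    """
    total = sum(throughputs.values())
    exact = {d: n_blocks * t / total for d, t in throughputs.items()}
    shares = {d: int(e) for d, e in exact.items()}
    leftover = n_blocks - sum(shares.values())
    by_remainder = sorted(exact, key=lambda d: exact[d] - shares[d], reverse=True)
    for d in by_remainder[:leftover]:
        shares[d] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical measurements: each GPU is twice as fast as the CPU cores.
    print(split_row_blocks(100, {"cpu": 1.0, "gpu0": 2.0, "gpu1": 2.0}))
```

A static split like this is only as good as the initial measurements; if relative performance shifts during the solve, the slower device becomes the bottleneck at each averaging step.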
17 Future work
- Integrate alternative linear solvers to improve performance (AmgX, cuSPARSE, PETSc).
- Consider alternate data decompositions to improve cache locality:
  - volume slab per streaming multiprocessor
  - volume wedge per streaming multiprocessor
- Measure performance on next-generation GPUs:
  - K80 for greater performance
  - Jetson/TK1 for greater performance/watt
- Experiment with GPU cloud platforms (Amazon cloud).
18 Acknowledgements
- Nicholas T. Karonis, Northern Illinois University (NIU) and Argonne National Laboratory (ANL)
- Michael E. Papka, NIU and ANL
- Caesar Ordoñez, NIU
- Eric Olson, ANL
- Kirk Duffin, NIU
- Venkat Vishwanath, ANL
US Department of Defense contract number W81XWH-10-1-0170 sponsored this work.
Parallel programming using MPI Analysis and optimization Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Outline l Parallel programming: Basic definitions l Choosing right algorithms: Optimal serial and
More informationBeam Test for Proton Computed Tomography PCT
Beam Test for Proton Computed Tomography PCT (aka Mapping out The Banana ) Hartmut F.-W. Sadrozinski Santa Cruz Inst. for Particle Physics SCIPP The pct Project Most likely Path MLP Beam Test Set-up Comparison
More informationDigital Calorimetry for Future Linear Colliders. Tony Price University of Birmingham University of Birmingham PPE Seminar 13 th November 2013
Digital Calorimetry for Future Linear Colliders Tony Price University of Birmingham University of Birmingham PPE Seminar 13 th November 2013 Overview The ILC Digital Calorimetry The TPAC Sensor Electromagnetic
More informationMURCIA: Fast parallel solvent accessible surface area calculation on GPUs and application to drug discovery and molecular visualization
MURCIA: Fast parallel solvent accessible surface area calculation on GPUs and application to drug discovery and molecular visualization Eduardo J. Cepas Quiñonero Horacio Pérez-Sánchez Wolfgang Wenzel
More informationThe Silicon-Tungsten Tracker of the DAMPE Mission
The Silicon-Tungsten Tracker of the DAMPE Mission Philipp Azzarello, DPNC, University of Geneva for the DAMPE-STK collaboration 10th International Hiroshima Symposium on the Development and Application
More informationTR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems
TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a
More informationGPU accelerated Arnoldi solver for small batched matrix
15. 09. 22 GPU accelerated Arnoldi solver for small batched matrix Samsung Advanced Institute of Technology Hyung-Jin Kim Contents - Eigen value problems - Solution - Arnoldi Algorithm - Target - CUDA
More informationAlgebraic Reconstruction of Ultrafast Tomography Images at the Large Scale Data Facility
Algebraic Reconstruction of Ultrafast Tomography Images at the Xiaoli Yang, Thomas Jejkal, Rainer Stotzka, Halil Pasic, Tomy dos Santos Rolo, Thomas van de Kamp, Achim Streit Institute for Data Processing
More informationACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS
ACCELERATING WEATHER PREDICTION WITH NVIDIA GPUS Alan Gray, Developer Technology Engineer, NVIDIA ECMWF 18th Workshop on high performance computing in meteorology, 28 th September 2018 ESCAPE NVIDIA s
More informationTowards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters
Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM
More informationTau-neutrino production study at CERN SPS: Novel approach by the DsTau experiment
Tau-neutrino production study at CERN SPS: Novel approach by the DsTau experiment O. Sato (Nagoya University) for the DsTau collaboration 26/Aug/2016 NuFact2016, ICISE, Quy Nhon Current status on neutrino
More informationSwift: task-based hydrodynamics at Durham s IPCC. Bower
Swift: task-based hydrodynamics at Durham s IPCC Gonnet Schaller Chalk movie: Richard Bower (Durham) For the cosmological simulations of the formation of galaxies Bower Institute for Computational Cosmology
More informationUTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement
UTPlaceF 3.0: A Parallelization Framework for Modern FPGA Global Placement Wuxi Li, Meng Li, Jiajun Wang, and David Z. Pan University of Texas at Austin wuxili@utexas.edu November 14, 2017 UT DA Wuxi Li
More informationHadron structure with photon and antiproton induced reactions - QCD in the non-perturbative range -
Hadron structure with photon and antiproton induced reactions - QCD in the non-perturbative range - Introduction Present: Photoproduction of Mesons at ELSA and MAMI CB-ELSA/TAPS Experiment Crystal Ball/TAPS
More information