Solving RODEs on GPU clusters

Christoph Riesinger, Technische Universität München
HIGH SCIENCE, March 4, 2016

Motivation - Parallel Computing

Motivation - Multiple Levels of Parallelism

Building Blocks
Pseudo Random Number Generation → Ornstein-Uhlenbeck Process → Averaging → Numerical Solver
For the Monte Carlo simulation, this chain of building blocks runs on each of GPU 0, GPU 1, …, GPU N-1.
[Figure: data flow through the building blocks on GPU 0, GPU 1, …, GPU N-1]
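The chain of building blocks can be sketched on a CPU in Python. This is a minimal sketch under several assumptions that do not come from the slides: `random.gauss` stands in for the Ziggurat generator, the OU process is dO = -θ·O dt + s dW advanced with its exact one-step update, the averaging step takes the mean of the m fine-grid OU samples that start each solver step, and the RODE dx/dt = -x + O_t as well as all function names are hypothetical.

```python
import math
import random


def ou_samples(n, h, theta, s, o0, rng):
    """n steps of size h of dO = -theta*O dt + s dW, using the exact OU update."""
    mu = math.exp(-theta * h)                               # decay per step
    sigma = s * math.sqrt((1.0 - mu * mu) / (2.0 * theta))  # noise scale per step
    o = [o0]
    for _ in range(n):
        # rng.gauss stands in for the Ziggurat normal generator
        o.append(mu * o[-1] + sigma * rng.gauss(0.0, 1.0))
    return o


def block_averages(o, m):
    """Mean of the m fine-grid samples at the start of each coarse step."""
    steps = (len(o) - 1) // m
    return [sum(o[k * m:(k + 1) * m]) / m for k in range(steps)]


def averaged_euler(x0, H, averages, rhs):
    """Explicit Euler steps of size H, each using one averaged noise value."""
    x = x0
    for a in averages:
        x = x + H * rhs(x, a)
    return x


def rhs(x, a):
    """Hypothetical scalar RODE dx/dt = -x + O_t."""
    return -x + a


def solve_instance(seed, steps=100, m=8, H=0.01, theta=1.0, s=1.0):
    """One Monte Carlo instance: PRNG -> OU process -> averaging -> solver."""
    rng = random.Random(seed)
    o = ou_samples(steps * m, H / m, theta, s, 0.0, rng)
    return averaged_euler(1.0, H, block_averages(o, m), rhs)


# Monte Carlo: independent instances (e.g. one batch per GPU), then their mean
results = [solve_instance(seed) for seed in range(32)]
print(sum(results) / len(results))
```

On a GPU cluster, each of these stages would plausibly map to one or more kernels, and the independent instances (here: seeds) would be distributed over the GPUs.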

Pseudo Random Number Generation - Ziggurat
The area under the Gaussian function is approximated by strips R_i. These strips are further subdivided into central (green), tail (purple), and cap (red) regions and a base strip (blue).
[Figure: Ziggurat with strips R_0, …, R_7 bounded by x_1, …, x_7 = r and y_0, …, y_7; the bottom strip R_7 = R_B is the base strip]
To do the transformation, a strip is selected at random. A uniform random number u ∈ [0, 1[ is stretched by a lookup-table value that depends on the selected strip. If a central region is hit, the transformation is very cheap; otherwise, it is much more expensive.
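A scalar Python sketch of the strip selection and stretching described above. It assumes the 128-strip layout and the constants published by Marsaglia and Tsang (2000), R = 3.442619855899 and V = 9.91256303526217e-3, in a floating-point form (their original works on 32-bit integers); the slides do not state which layout the GPU kernels use, and the function names are hypothetical. Strip 0 is the base strip, treated as a rectangle of width V/f(R) whose part beyond R is mapped to the tail.

```python
import math
import random

N_STRIPS = 128
R = 3.442619855899        # right edge of the base strip (start of the tail)
V = 9.91256303526217e-3   # common area of every strip


def f(x):
    """Unnormalized Gaussian density exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x)


def build_tables(n=N_STRIPS, r=R, v=V):
    """Edges xs[0] = 0 < xs[1] < ... < xs[n-1] = r, and the base strip width q."""
    xs, x = [r], r
    for _ in range(n - 2):
        # next edge up: the strip between f(x) and f(x) + v/x has area v
        x = math.sqrt(-2.0 * math.log(v / x + f(x)))
        xs.append(x)
    xs.append(0.0)
    xs.reverse()
    q = v / f(r)  # base strip as a rectangle of area v, tail included
    return xs, q


def ziggurat_normal(rng, xs, q):
    """Draw one standard normal sample."""
    n, r = len(xs), xs[-1]
    while True:
        i = rng.randrange(n)                  # select a strip at random
        u = 2.0 * rng.random() - 1.0          # uniform in [-1, 1)
        x = u * (q if i == 0 else xs[i])      # stretch by the strip width
        inner = r if i == 0 else xs[i - 1]    # edge of the central region
        if abs(x) < inner:
            return x                          # cheap path: central region
        if i == 0:
            # tail beyond r: Marsaglia's exponential rejection method
            while True:
                a = -math.log(1.0 - rng.random()) / r
                b = -math.log(1.0 - rng.random())
                if 2.0 * b > a * a:
                    return r + a if u > 0 else -(r + a)
        # cap region: accept if a uniform height lies below the curve
        y = f(xs[i]) + rng.random() * (f(xs[i - 1]) - f(xs[i]))
        if y < f(x):
            return x


xs, q = build_tables()
rng = random.Random(1)
print([round(ziggurat_normal(rng, xs, q), 3) for _ in range(5)])
```

Only the central-region test and one multiplication are needed on the fast path; the tail and cap branches are the expensive ones the slide refers to.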

Pseudo Random Number Generation - Trade-off
The more strips the Ziggurat uses, the larger the ratio of the summed area of all central regions to the summed area of all strips becomes. The larger this ratio, the higher the probability of hitting a (cheap) central region. On GPUs, this additionally reduces the probability of warp divergence, because fewer threads of a warp take the expensive branch. Runtime can therefore be reduced by using more strips, at the cost of larger lookup tables: a runtime/memory trade-off.
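The trade-off can be quantified with a small Python sketch: for n equal-area strips under exp(-x²/2), it solves for the base edge r by bisection and reports the share of the total strip area that lies in central regions. It assumes the equal-area construction of Marsaglia and Tsang's Ziggurat (base strip including the tail; the central region of a strip extends to the edge of the strip above it), which may differ in detail from the GPU implementation; all names are hypothetical.

```python
import math


def f(x):
    return math.exp(-0.5 * x * x)


def tail_area(r):
    """Integral of exp(-x^2/2) from r to infinity."""
    return math.sqrt(math.pi / 2.0) * math.erfc(r / math.sqrt(2.0))


def edges(r, n):
    """Common strip area v and edges xs[0] = 0 < ... < xs[n-1] = r (None on overshoot)."""
    v = r * f(r) + tail_area(r)
    xs, x = [r], r
    for _ in range(n - 2):
        y = f(x) + v / x
        if y >= 1.0:          # strips reached the top of the curve too early
            return v, None
        x = math.sqrt(-2.0 * math.log(y))
        xs.append(x)
    xs.append(0.0)
    xs.reverse()
    return v, xs


def top_mismatch(r, n):
    """Area of the topmost strip minus v; zero for the correct r."""
    v, xs = edges(r, n)
    if xs is None:
        return -1.0
    return xs[1] * (1.0 - f(xs[1])) - v


def solve_r(n, lo=1.0, hi=10.0):
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if top_mismatch(mid, n) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def central_fraction(n):
    """Central-region area divided by the total area of all n strips."""
    r = solve_r(n)
    v, xs = edges(r, n)
    central = r * f(r)                      # base strip: rectangle [0, r] x [0, f(r)]
    for i in range(1, n):
        central += v * xs[i - 1] / xs[i]    # strip i: central width xs[i-1] of xs[i]
    return central / (n * v)


for n in (64, 128, 256, 512):
    print(n, round(central_fraction(n), 4))
```

The printed ratio should grow toward 1 as n increases, while each doubling of n doubles the size of the lookup tables.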

Pseudo Random Number Generation - Results 1/2
Performance of the Ziggurat Method
GPU architecture | Fermi | Kepler | Maxwell
Model | Tesla M2090 | Tesla K40m | GTX 750 Ti
[Table rows: #processing elements, peak performance SP (TFLOPS), peak performance DP (TFLOPS), peak memory bandwidth (GByte/s)]
[Figure: giga pseudo random numbers per second vs. number of strips on Tesla M2090 (Fermi), Tesla K40m (Kepler), and GTX 750 Ti (Maxwell); series: local and shared]

Pseudo Random Number Generation - Results 2/2
Comparison with other normal PRNGs
[Figure: giga pseudo random numbers per second vs. grid configuration on Tesla M2090 (Fermi), Tesla K40m (Kepler), and GTX 750 Ti (Maxwell); series: Ziggurat, Inverse CDF, Rational Polynomial, curand, Wallace, XORWOW, MKL on Xeon E… v2]

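Ornstein-Uhlenbeck process - Link to Prefix Sum 1/2
Unrolling the recurrence of the discretized OU process step by step gives

$$
\begin{aligned}
O_{t+h} &= \mu O_t + \sigma X_n^{(1)} \\
O_{t+2h} &= \mu O_{t+h} + \sigma X_n^{(2)} = \mu\left(\mu O_t + \sigma X_n^{(1)}\right) + \sigma X_n^{(2)} = \mu^2 O_t + \sigma\left(\mu X_n^{(1)} + X_n^{(2)}\right) \\
O_{t+3h} &= \mu O_{t+2h} + \sigma X_n^{(3)} = \mu\left(\mu\left(\mu O_t + \sigma X_n^{(1)}\right) + \sigma X_n^{(2)}\right) + \sigma X_n^{(3)} = \mu^3 O_t + \sigma\left(\mu^2 X_n^{(1)} + \mu X_n^{(2)} + X_n^{(3)}\right) \\
&\;\;\vdots \\
O_{t+ih} &= \mu^i O_t + \sigma \sum_{k=1}^{i} \mu^{i-k} X_n^{(k)}
\end{aligned}
$$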

Ornstein-Uhlenbeck process - Link to Prefix Sum 2/2
This looks very similar to the prefix sum or scan operation:

$$
O_{t+ih} = \mu^i O_t + \sigma \sum_{k=1}^{i} \mu^{i-k} X_n^{(k)}
\qquad\longleftrightarrow\qquad
\sum_{k=1}^{i} X_n^{(k)}
$$

For μ = 1, σ = 1, and O_t = 0, the OU recurrence is exactly the inclusive prefix sum of the X_n^{(k)}; for μ ≠ 1, every earlier element is additionally weighted by a power of μ.
[Figure: input elements x_1, x_2, … and their prefix sums]
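The link can be checked numerically with a Python sketch that evaluates the OU recurrence step by step, in the closed form derived above, and as a scan. The scan variant treats each step as the affine map o ↦ μ·o + σ·X^{(k)}; composing such maps is associative, which is the property a parallel prefix-sum algorithm needs. This affine-map formulation is a standard way to express linear recurrences as scans and not necessarily how the slides' kernels are organized; μ, σ, O_t, and the function names are illustrative.

```python
import math
import random
from itertools import accumulate


def ou_recurrence(o_t, mu, sigma, normals):
    """O_{t+ih} = mu * O_{t+(i-1)h} + sigma * X^(i), one step after the other."""
    out, o = [], o_t
    for x in normals:
        o = mu * o + sigma * x
        out.append(o)
    return out


def ou_closed_form(o_t, mu, sigma, normals):
    """O_{t+ih} = mu^i * O_t + sigma * sum_{k=1}^{i} mu^(i-k) * X^(k)."""
    out = []
    for i in range(1, len(normals) + 1):
        s = sum(mu ** (i - k) * normals[k - 1] for k in range(1, i + 1))
        out.append(mu ** i * o_t + sigma * s)
    return out


def compose(first, second):
    """Apply o -> a1*o + b1, then o -> a2*o + b2; composition is associative."""
    a1, b1 = first
    a2, b2 = second
    return (a1 * a2, a2 * b1 + b2)


def ou_scan(o_t, mu, sigma, normals):
    """Inclusive scan over the affine maps, then apply each prefix map to O_t."""
    prefixes = accumulate(((mu, sigma * x) for x in normals), compose)
    return [a * o_t + b for a, b in prefixes]


rng = random.Random(7)
X = [rng.gauss(0.0, 1.0) for _ in range(16)]
ref = ou_recurrence(0.5, 0.9, 0.3, X)
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
          for a, b in zip(ref, ou_scan(0.5, 0.9, 0.3, X))))
```

With μ = 1, σ = 1, and O_t = 0, all three functions return the plain inclusive prefix sum of the inputs.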

Ornstein-Uhlenbeck process - Parallel Prefix Sum
Algorithm 1: Up-sweep phase
for d = 1; d ≤ log2(n); d++ do
    for i = 0; i < n/2^d; i++ do
        x[(i+1)·2^d − 1] ← x[(i+1)·2^d − 1] + x[(i+1/2)·2^d − 1]
    end for
end for
[Figure: up-sweep on 16 elements x^(1), …, x^(16), levels d = 0 to d = 4; the partial sums are weighted by powers of μ]

Algorithm 2: Down-sweep phase
for d = log2(n) − 1; d > 0; d-- do
    for i = 0; i < n/2^d − 1; i++ do
        x[(i+3/2)·2^d − 1] ← x[(i+1)·2^d − 1] + x[(i+3/2)·2^d − 1]
    end for
end for
[Figure: down-sweep on 16 elements, levels d = 3 to d = 0, completing the μ-weighted prefix sums]
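A sequential Python sketch of the two phases, using the index arithmetic of the up-sweep and down-sweep loops above. To obtain OU realizations instead of a plain prefix sum, it assumes (as the μ-weighted partial sums in the figures suggest) that the left operand of every update is multiplied by μ^(2^(d−1)), the number of steps by which it is shifted; μ·O_t is folded into the first element. The function names are hypothetical, and a GPU version would execute each inner loop in parallel.

```python
import math
import random


def ou_scan_inplace(x, mu):
    """In place: x[p] <- sum_{j<=p} mu**(p-j) * x[j] (mu = 1 gives a prefix sum)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels = n.bit_length() - 1                  # log2(n)
    for d in range(1, levels + 1):               # up-sweep
        half = 1 << (d - 1)
        w = mu ** half                           # left block is shifted by 2^(d-1) steps
        for i in range(n >> d):                  # i < n / 2^d
            right = (i + 1) * (1 << d) - 1       # (i+1)*2^d - 1
            x[right] = w * x[right - half] + x[right]    # right - half = (i+1/2)*2^d - 1
    for d in range(levels - 1, 0, -1):           # down-sweep
        half = 1 << (d - 1)
        w = mu ** half
        for i in range((n >> d) - 1):            # i < n / 2^d - 1
            src = (i + 1) * (1 << d) - 1         # (i+1)*2^d - 1
            x[src + half] = w * x[src] + x[src + half]   # src + half = (i+3/2)*2^d - 1
    return x


def ou_sequential(o_t, mu, sigma, normals):
    """Reference: O_{t+kh} = mu * O_{t+(k-1)h} + sigma * X^(k), one step at a time."""
    out, o = [], o_t
    for z in normals:
        o = mu * o + sigma * z
        out.append(o)
    return out


def ou_realization(o_t, mu, sigma, normals):
    """Fold mu*O_t into the first element, then scan: O_{t+h}, ..., O_{t+nh}."""
    x = [sigma * z for z in normals]
    x[0] += mu * o_t
    return ou_scan_inplace(x, mu)


rng = random.Random(3)
z = [rng.gauss(0.0, 1.0) for _ in range(32)]
seq = ou_sequential(1.5, 0.95, 0.2, z)
par = ou_realization(1.5, 0.95, 0.2, z)
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(seq, par)))
```

The up-sweep leaves the full result only in the last element; the down-sweep then fills in the remaining positions level by level, so both phases need about n updates in total.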

Ornstein-Uhlenbeck process - Results
[Figure: giga realizations of the OU process vs. elements per thread on Tesla M2090 (Fermi), Tesla K40m (Kepler), and GTX 750 Ti (Maxwell); series: float and double with 2^7, 2^8, 2^9, and 2^10 threads/block]

Averaging
[Figure: averaging scheme applied to the OU realizations x_1, x_2, …, x_23]

Averaging - Results
[Figure: ratio of maximum bandwidth vs. threads per block on Tesla M2090 (Fermi), Tesla K40m (Kepler), and GTX 750 Ti (Maxwell); series: float and double for single averaging, double averaging, 3-tridiagonal, and 4-tridiagonal]
Architecture | Tesla M2090 | Tesla K40m | GTX 750 Ti
Ratio of peak memory bandwidth | 88.5% | 72.1% | 8…%
Configuration (threads/block, precision) | 28, double | 2^8, double | 2^6, double

Solving one instance of the RODE on a single GPU
[Figure: per-kernel breakdown for float and double on Tesla M2090 (Fermi), Tesla K40m (Kepler), and GTX 750 Ti (Maxwell)]
Kernels per building block:
pseudo random number generation: initstatesnormalkernel(), getrandomnumbersnormalkernel()
Ornstein-Uhlenbeck process: scanexclusiveoukernel(), scanoufixkernel(), realizeouprocesskernel()
averaging: singleaveragekernel()
numerical solver: averagedeulerkernel()

Solving several instances of the RODE on multiple GPUs
Cluster | JuDGE | Hydra | TSUBAME 2.5
Location | FZJ | RZG | GSIC
Interconnect | QDR InfiniBand | FDR InfiniBand | QDR InfiniBand
[Table rows: GPUs per node, total # of GPUs]
[Figure: efficiency vs. number of GPUs on JuDGE, Hydra, and TSUBAME; series: float and double for GPU computations, MPI_Reduce(), and total]

Final slide


More information

GPU accelerated Monte Carlo simulations of lattice spin models

GPU accelerated Monte Carlo simulations of lattice spin models Available online at www.sciencedirect.com Physics Procedia 15 (2011) 92 96 GPU accelerated Monte Carlo simulations of lattice spin models M. Weigel, T. Yavors kii Institut für Physik, Johannes Gutenberg-Universität

More information

Establishing a CUDA Research Center at Penn State: Perspectives on GPU-Enabled Teaching and Research

Establishing a CUDA Research Center at Penn State: Perspectives on GPU-Enabled Teaching and Research Establishing a CUDA Research Center at Penn State: Perspectives on GPU-Enabled Teaching and Research William J. Brouwer (wjb19@psu.edu) Pierre-Yves Taunay (py.taunay@psu.edu) Research Computing and Cyberinfrastructure

More information

The Memory Intensive System

The Memory Intensive System DiRAC@Durham The Memory Intensive System The DiRAC-2.5x Memory Intensive system at Durham in partnership with Dell Dr Lydia Heck, Technical Director ICC HPC and DiRAC Technical Manager 1 DiRAC Who we are:

More information

High-Performance Computing and Groundbreaking Applications

High-Performance Computing and Groundbreaking Applications INSTITUTE OF INFORMATION AND COMMUNICATION TECHNOLOGIES BULGARIAN ACADEMY OF SCIENCE High-Performance Computing and Groundbreaking Applications Svetozar Margenov Institute of Information and Communication

More information

Real-time signal detection for pulsars and radio transients using GPUs

Real-time signal detection for pulsars and radio transients using GPUs Real-time signal detection for pulsars and radio transients using GPUs W. Armour, M. Giles, A. Karastergiou and C. Williams. University of Oxford. 15 th July 2013 1 Background of GPUs Why use GPUs? Influence

More information

Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU

Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU Jacques M. Bahi, Raphaël Couturier, Christophe Guyeux, and Pierre-Cyrille Héam October 30, 2018 arxiv:1112.5239v1

More information

Data analysis of massive data sets a Planck example

Data analysis of massive data sets a Planck example Data analysis of massive data sets a Planck example Radek Stompor (APC) LOFAR workshop, Meudon, 29/03/06 Outline 1. Planck mission; 2. Planck data set; 3. Planck data analysis plan and challenges; 4. Planck

More information

Physics plans and ILDG usage

Physics plans and ILDG usage Physics plans and ILDG usage in Italy Francesco Di Renzo University of Parma & INFN Parma The MAIN ILDG USERS in Italy are the ROME groups A (by now) well long track of ILDG-based projects mainly within

More information

Cosmology with Galaxy Clusters: Observations meet High-Performance-Computing

Cosmology with Galaxy Clusters: Observations meet High-Performance-Computing Cosmology with Galaxy Clusters: Observations meet High-Performance-Computing Julian Merten (ITA/ZAH) Clusters of galaxies GPU lensing codes Abell 2744 CLASH: A HST/MCT programme Clusters of galaxies DM

More information

FEM-Level Set Techniques for Multiphase Flow --- Some recent results

FEM-Level Set Techniques for Multiphase Flow --- Some recent results FEM-Level Set Techniques for Multiphase Flow --- Some recent results ENUMATH09, Uppsala Stefan Turek, Otto Mierka, Dmitri Kuzmin, Shuren Hysing Institut für Angewandte Mathematik, TU Dortmund http://www.mathematik.tu-dortmund.de/ls3

More information

Statistical Methods for Data Analysis

Statistical Methods for Data Analysis Statistical Methods for Data Analysis Random number generators Luca Lista INFN Napoli Pseudo-random generators Requirement: Simulate random process with a computer E.g.: radiation interaction with matter,

More information

Plaquette Renormalized Tensor Network States: Application to Frustrated Systems

Plaquette Renormalized Tensor Network States: Application to Frustrated Systems Workshop on QIS and QMP, Dec 20, 2009 Plaquette Renormalized Tensor Network States: Application to Frustrated Systems Ying-Jer Kao and Center for Quantum Science and Engineering Hsin-Chih Hsiao, Ji-Feng

More information

arxiv: v1 [cs.ms] 7 Nov 2018

arxiv: v1 [cs.ms] 7 Nov 2018 Gravitational octree code performance evaluation on Volta GPU Yohei Miki 1 arxiv:1811.02761v1 [cs.ms] 7 Nov 2018 Abstract In this study, the gravitational octree code originally optimized for the Fermi,

More information

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Peter Benner, Andreas Marek, Carolin Penke August 16, 2018 ELSI Workshop 2018 Partners: The Problem The Bethe-Salpeter

More information

Performance of the fusion code GYRO on three four generations of Crays. Mark Fahey University of Tennessee, Knoxville

Performance of the fusion code GYRO on three four generations of Crays. Mark Fahey University of Tennessee, Knoxville Performance of the fusion code GYRO on three four generations of Crays Mark Fahey mfahey@utk.edu University of Tennessee, Knoxville Contents Introduction GYRO Overview Benchmark Problem Test Platforms

More information

GPU-accelerated Computing at Scale. Dirk Pleiter I GTC Europe 10 October 2018

GPU-accelerated Computing at Scale. Dirk Pleiter I GTC Europe 10 October 2018 GPU-accelerated Computing at Scale irk Pleiter I GTC Europe 10 October 2018 Outline Supercomputers at JSC Future science challenges Outlook and conclusions 2 3 Supercomputers at JSC JUQUEEN (until 2018)

More information

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA

Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA S7255: CUTT: A HIGH- PERFORMANCE TENSOR TRANSPOSE LIBRARY FOR GPUS Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA MOTIVATION Tensor contractions are the most computationally intensive part of quantum

More information

Light curve modeling of eclipsing binary stars

Light curve modeling of eclipsing binary stars Light curve modeling of eclipsing binary stars Gábor Marschalkó Baja Observatory of University of Szeged Wigner Research Centre for Physics Binary stars physical variables pulsating stars mass, radius,

More information

Parallel Simulations of Self-propelled Microorganisms

Parallel Simulations of Self-propelled Microorganisms Parallel Simulations of Self-propelled Microorganisms K. Pickl a,b M. Hofmann c T. Preclik a H. Köstler a A.-S. Smith b,d U. Rüde a,b ParCo 2013, Munich a Lehrstuhl für Informatik 10 (Systemsimulation),

More information

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique Claude Tadonki MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Monthly CRI Seminar MINES ParisTech - CRI June 06, 2016, Fontainebleau (France)

More information

A Stochastic-based Optimized Schwarz Method for the Gravimetry Equations on GPU Clusters

A Stochastic-based Optimized Schwarz Method for the Gravimetry Equations on GPU Clusters A Stochastic-based Optimized Schwarz Method for the Gravimetry Equations on GPU Clusters Abal-Kassim Cheik Ahamed and Frédéric Magoulès Introduction By giving another way to see beneath the Earth, gravimetry

More information

Hybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra

Hybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra Hybrid parallelization of a pseudo-spectral DNS code and its computational performance on RZG s idataplex system Hydra Markus Rampp 1, Liang Shi 2, Marc Avila 3,2, Björn Hof 2,4 1 Computing Center of the

More information

Level-3 BLAS on a GPU

Level-3 BLAS on a GPU Level-3 BLAS on a GPU Picking the Low Hanging Fruit Francisco Igual 1 Gregorio Quintana-Ortí 1 Robert A. van de Geijn 2 1 Departamento de Ingeniería y Ciencia de los Computadores. University Jaume I. Castellón

More information

Explore Computational Power of GPU in Electromagnetics and Micromagnetics

Explore Computational Power of GPU in Electromagnetics and Micromagnetics Explore Computational Power of GPU in Electromagnetics and Micromagnetics Presenter: Sidi Fu, PhD candidate, UC San Diego Advisor: Prof. Vitaliy Lomakin Center of Magnetic Recording Research, Department

More information

Calculation of ground states of few-body nuclei using NVIDIA CUDA technology

Calculation of ground states of few-body nuclei using NVIDIA CUDA technology Calculation of ground states of few-body nuclei using NVIDIA CUDA technology M. A. Naumenko 1,a, V. V. Samarin 1, 1 Flerov Laboratory of Nuclear Reactions, Joint Institute for Nuclear Research, 6 Joliot-Curie

More information

Chile / Dirección Meteorológica de Chile (Chilean Weather Service)

Chile / Dirección Meteorológica de Chile (Chilean Weather Service) JOINT WMO TECHNICAL PROGRESS REPORT ON THE GLOBAL DATA PROCESSING AND FORECASTING SYSTEM AND NUMERICAL WEATHER PREDICTION RESEARCH ACTIVITIES FOR 2015 Chile / Dirección Meteorológica de Chile (Chilean

More information

Multivariate Gaussian Random Number Generator Targeting Specific Resource Utilization in an FPGA

Multivariate Gaussian Random Number Generator Targeting Specific Resource Utilization in an FPGA Multivariate Gaussian Random Number Generator Targeting Specific Resource Utilization in an FPGA Chalermpol Saiprasert, Christos-Savvas Bouganis and George A. Constantinides Department of Electrical &

More information

Accelerating interior point methods with GPUs for smart grid systems

Accelerating interior point methods with GPUs for smart grid systems Downloaded from orbit.dtu.dk on: Dec 18, 2017 Accelerating interior point methods with GPUs for smart grid systems Gade-Nielsen, Nicolai Fog Publication date: 2011 Document Version Publisher's PDF, also

More information

Efficient algorithms for symmetric tensor contractions

Efficient algorithms for symmetric tensor contractions Efficient algorithms for symmetric tensor contractions Edgar Solomonik 1 Department of EECS, UC Berkeley Oct 22, 2013 1 / 42 Edgar Solomonik Symmetric tensor contractions 1/ 42 Motivation The goal is to

More information