Performance Prediction for Tensor Contractions


Slide 1 / 18: Performance Prediction for Tensor Contractions
Paolo Bientinesi, Edoardo Di Napoli, Diego Fabregat, Elmar Peise
AICES, RWTH Aachen
pauldj@aices.rwth-aachen.de
June 3rd, 2014
PASC Conference 14, Zürich, Switzerland

Slide 2 / 18: Tensors: crash course (1/2)

Mathematics, physics: multilinear map; multidimensional array + metric
Computer science: multidimensional array

t-dimensional tensors ($t$ indices each): $S_{\alpha\beta\ldots\gamma\delta}$, $S^{\beta\gamma}_{\alpha\ldots\delta}$, $S^{\ldots\gamma\delta}_{\alpha\beta}$, ...

Operations: low-dimensional approximations; contractions

Examples: $S_{\alpha ij} T_{ij}$, $S_{\alpha i\beta} M_{ik} T_{k\gamma}$, $S_{\alpha ij} M_{ik} T_{kh} M_{hj}$, ...

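Slide 3 / 18: Crash course (2/2)

Contraction: $S_{\alpha ij} T_{j\delta i}$
$\alpha, \delta$: free indices
$i, j$: contracted indices

$V_{\alpha\delta} = S_{\alpha ij} T_{j\delta i}$, that is, $\forall \alpha\ \forall \delta:\ v_{\alpha\delta} = \sum_i \sum_j s_{\alpha ij}\, t_{j\delta i}$

Storage of $S_{\alpha\beta\gamma\ldots}$: $\alpha$ has stride 1, $\beta$ has stride $\alpha$, $\gamma$ has stride $\alpha \cdot \beta$, and so on (in the strides, $\alpha$ and $\beta$ denote the extents of those indices).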
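
The contraction and the storage rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`strides`, `offset`, `contract`) and the example extents are hypothetical, and the tensors are kept in flat lists whose first index has stride 1, as on the slide.

```python
from itertools import product

def strides(dims):
    """Strides for storage in which the first index has stride 1."""
    out, step = [], 1
    for d in dims:
        out.append(step)
        step *= d
    return out

def offset(idx, st):
    """Position of the multi-index idx in the flat array."""
    return sum(i * s for i, s in zip(idx, st))

def contract(S, s_dims, T, t_dims):
    """V[a, d] = sum_i sum_j S[a, i, j] * T[j, d, i] on flat arrays."""
    na, ni, nj = s_dims
    nj2, nd, ni2 = t_dims
    assert (ni, nj) == (ni2, nj2), "contracted extents must match"
    s_st, t_st = strides(s_dims), strides(t_dims)
    v_st = strides((na, nd))
    V = [0.0] * (na * nd)
    for a, d in product(range(na), range(nd)):        # free indices
        acc = 0.0
        for i, j in product(range(ni), range(nj)):    # contracted indices
            acc += S[offset((a, i, j), s_st)] * T[offset((j, d, i), t_st)]
        V[offset((a, d), v_st)] = acc
    return V

# Hypothetical example extents: alpha = 2, i = 3, j = 4, delta = 5.
s_dims, t_dims = (2, 3, 4), (4, 5, 3)
S = [float(n) for n in range(2 * 3 * 4)]
T = [1.0] * (4 * 5 * 3)
V = contract(S, s_dims, T, t_dims)
```

For extents (2, 3, 4) the strides come out as 1, 2, and 6, matching the rule "stride 1, stride $\alpha$, stride $\alpha \cdot \beta$".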

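Slide 4 / 18: A well known contraction (1/2)

$C_{ij} = A_{ik} B_{kj}$

Direct call: C := GEMM(A, B)

1) A is sliced horizontally:
$C := AB = \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} B = \begin{pmatrix} a_1 B \\ \vdots \\ a_m B \end{pmatrix}$
for i = 1, ..., m: C_i := GEMV(A_i, B)

2) B is sliced vertically:
$C := AB = A\,[\,b_1\ b_2\ \ldots\ b_n\,] = [\,Ab_1\ Ab_2\ \ldots\ Ab_n\,]$
for i = 1, ..., n: C_i := GEMV(A, B_i)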

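Slide 5 / 18: A well known contraction (2/2)

$C_{ij} = A_{ik} B_{kj}$

3) A is sliced vertically and B horizontally:
$[\,a_1\ \ldots\ a_k\,] \begin{pmatrix} b_1 \\ \vdots \\ b_k \end{pmatrix} = a_1 b_1 + a_2 b_2 + \cdots + a_k b_k$
for i = 1, ..., k: C += GER(A_i, B_i)

4) A is sliced horizontally and B vertically:
$\begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} [\,b_1\ \ldots\ b_n\,] = \begin{pmatrix} a_1 b_1 & \cdots & a_1 b_n \\ \vdots & \ddots & \vdots \\ a_m b_1 & \cdots & a_m b_n \end{pmatrix}$
for i = 1, ..., m: for j = 1, ..., n: C_ij := DOT(A_i, B_j)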
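
The four slicings can be checked against each other with a small pure-Python sketch. The functions `dot`, `gemv`, `gemv_t`, `ger`, and `gemm` below are hypothetical stand-ins that only mimic what the BLAS kernels of the same name compute; they are not BLAS bindings and say nothing about performance.

```python
def dot(x, y):
    """Stand-in for DOT: inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

def gemv(A, x):
    """Stand-in for GEMV: matrix times column vector."""
    return [dot(row, x) for row in A]

def gemv_t(B, x):
    """Stand-in for GEMV on the transpose: row vector x times matrix B."""
    return [dot(x, col) for col in zip(*B)]

def ger(C, x, y):
    """Stand-in for GER: in-place rank-1 update C += x y^T."""
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            C[i][j] += xi * yj

def gemm(A, B):
    """Reference triple loop for C := A B."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

def variant1(A, B):
    """A sliced horizontally: one GEMV per row of A."""
    return [gemv_t(B, row) for row in A]

def variant2(A, B):
    """B sliced vertically: one GEMV per column of B."""
    cols = [gemv(A, list(col)) for col in zip(*B)]
    return [list(row) for row in zip(*cols)]

def variant3(A, B):
    """A sliced vertically, B horizontally: k rank-1 updates."""
    C = [[0.0] * len(B[0]) for _ in A]
    for a_col, b_row in zip(zip(*A), B):
        ger(C, a_col, b_row)
    return C

def variant4(A, B):
    """A sliced horizontally, B vertically: m * n inner products."""
    return [[dot(row, col) for col in zip(*B)] for row in A]

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
results = [f(A, B) for f in (gemm, variant1, variant2, variant3, variant4)]
```

All five results agree, which is the "mathematically equivalent" part; the performance differences shown on the next slide come from the kernels, not from the arithmetic.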

Slide 6 / 18: Mathematically equivalent, but...

All experiments: OpenBLAS 0.2.8, Intel Ivy Bridge-EP E5-2680 v2

[Figure: flops / cycle vs. k for the GEMM, GER, GEMV (two variants), and DOT (two variants) algorithms]

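Slide 7 / 18: How to use BLAS for contractions?

$R_{\alpha\beta\gamma} := T_{\alpha\beta\sigma} S_{\sigma\gamma}$

1) Slicing along $\beta$: the slices $R_{\alpha 1 \gamma}, R_{\alpha 2 \gamma}, R_{\alpha 3 \gamma}, \ldots, R_{\alpha n \gamma}$ are computed from $T_{\alpha 1 \sigma}, T_{\alpha 2 \sigma}, T_{\alpha 3 \sigma}, \ldots, T_{\alpha n \sigma}$ and $S_{\sigma\gamma}$. Kernel: GEMM.

2) Slicing along $\alpha$: the slices $R_{1\beta\gamma}, R_{2\beta\gamma}, R_{3\beta\gamma}, \ldots, R_{m\beta\gamma}$ are computed from $T_{1\beta\sigma}, T_{2\beta\sigma}, T_{3\beta\sigma}, \ldots, T_{m\beta\sigma}$ and $S_{\sigma\gamma}$. Kernel: GEMM; transposition + GEMM.

3) Slicing along $\alpha$ and $\beta$: each vector slice is combined with $S_{\sigma\gamma}$. Kernel: GEMV.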
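
A minimal sketch of the three slicings, assuming nested Python lists indexed as `T[a][b][s]` and `S[s][g]`. The helpers `gemm` and `gemv_t` are hypothetical stand-ins for the BLAS kernels. Nested lists ignore memory layout, so the transposition step that the slide lists for slicing along $\alpha$ is not modeled here.

```python
def gemm(A, B):
    """Stand-in for GEMM: matrix-matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def gemv_t(x, B):
    """Stand-in for GEMV: row vector x times matrix B."""
    return [sum(a * b for a, b in zip(x, col)) for col in zip(*B)]

def reference(T, S):
    """R[a][b][g] = sum_s T[a][b][s] * S[s][g], written as plain loops."""
    ns, ng = len(S), len(S[0])
    return [[[sum(T[a][b][s] * S[s][g] for s in range(ns))
              for g in range(ng)]
             for b in range(len(T[0]))]
            for a in range(len(T))]

def slice_beta(T, S):
    """One GEMM per value of beta: R[:, b, :] = T[:, b, :] S."""
    na, nb = len(T), len(T[0])
    R = [[None] * nb for _ in range(na)]
    for b in range(nb):
        Tb = [T[a][b] for a in range(na)]   # alpha x sigma matrix
        Rb = gemm(Tb, S)                    # alpha x gamma matrix
        for a in range(na):
            R[a][b] = Rb[a]
    return R

def slice_alpha(T, S):
    """One GEMM per value of alpha: R[a, :, :] = T[a, :, :] S."""
    return [gemm(Ta, S) for Ta in T]

def slice_alpha_beta(T, S):
    """One GEMV per (alpha, beta) pair: R[a, b, :] = T[a, b, :] S."""
    return [[gemv_t(row, S) for row in Ta] for Ta in T]

# Hypothetical example: alpha = 2, beta = 3, sigma = 2, gamma = 2.
T = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
S = [[1, 2], [3, 4]]
R = reference(T, S)
```

The three sliced versions return the same tensor as `reference`; they differ only in how many kernel calls they make and how large each call is.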

Slide 8 / 18: Taxonomy

$V_{h_1 h_2 \ldots} := S_{i_1 i_2 \ldots} T_{j_1 j_2 \ldots}$

Definition: (X) = number of free indices of X

Class 1: (S) = (T) = 0. Kernels listed: BLAS3, BLAS2, BLAS1.
Class 2: (S) ≥ 1 and (T) = 0, or (S) = 0 and (T) ≥ 1. Kernels listed: BLAS3, BLAS2 (+ transp), BLAS1.
Class 3: (S) ≥ 1 and (T) ≥ 1. Kernels listed: BLAS3 (+ transp), BLAS2, BLAS1.

Reference: Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions, Applied Mathematics and Computation.
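
The classification can be sketched as a small function that counts free indices. This assumes the class conditions read "0" and "at least 1", which is how the damaged slide text is interpreted here; `free_indices` and `classify` are hypothetical names, and index labels are single characters.

```python
def free_indices(x_idx, other_idx):
    """Indices of one operand that do not appear in the other operand."""
    return [i for i in x_idx if i not in other_idx]

def classify(s_idx, t_idx):
    """Return the class (1, 2, or 3) of the contraction of S with T.

    Assumption: class 1 means neither operand has free indices, class 2
    means exactly one of them has, class 3 means both have.
    """
    free_s = len(free_indices(s_idx, t_idx))
    free_t = len(free_indices(t_idx, s_idx))
    if free_s == 0 and free_t == 0:
        return 1
    if free_s == 0 or free_t == 0:
        return 2
    return 3

# Contractions that appear elsewhere in the slides:
class_v_a = classify("aij", "ij")      # V_a := S_aij T_ij
class_v_bcd = classify("ijb", "icjd")  # V_bcd := S_ijb T_icjd
```

Under this reading, $V_a := S_{aij} T_{ij}$ falls into class 2 and $V_{bcd} := S_{ijb} T_{icjd}$ into class 3.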

Slide 9 / 18: Nice and easy... variants

$V_{bcd} := S_{ijb} T_{icjd}$

[Figure: flops / cycle vs. b = c = d for the GEMM-, GER-, GEMV-, and DOT-based variants]

Slide 10 / 18: Small dimensions...

[Figures: flops / cycle for the GEMM, GER, GEMV (two variants), and DOT (two variants) algorithms, plotted against k and against m = n = k]

Slide 11 / 18

Goal: automatic selection of the best variants
Idea: performance prediction
Approach:
- Kernels execution
- Algorithms execution
Challenges:
- Fluctuations, uncertainties
- Cache influence
Solution:
- Performance models
- Context-aware timings

Slide 12 / 18: Fluctuations → performance models

[Figure: σ/x vs. m = n = k]
[Figure: map over m and n with a percentage scale]

Reference: Performance Modeling for Dense Linear Algebra, E. Peise, P.B., PMBS12 (SC12).

Slide 13 / 18: Models... Timings?

Observation:
- typical linear algebra algorithms: shrinking active region
- tensor contractions: identical-size slices

$V_a := S_{aij} T_{ij}$

[Figures: flops / cycle vs. a = i = j for two GEMV-based and four DOT-based variants]
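
Because every slice of a contraction has the same size, one way to read the observation above is that a variant can be estimated from a single kernel timing scaled by the number of invocations. The sketch below illustrates that reading only; it is not the authors' tool. The names `time_kernel`, `predict`, and `rank` are hypothetical, the kernels are pure-Python stand-ins rather than BLAS calls, and the two decompositions of $V_a := S_{aij} T_{ij}$ are chosen for illustration.

```python
import statistics
import time

def dot(x, y):
    """Stand-in for DOT."""
    return sum(a * b for a, b in zip(x, y))

def gemv(A, x):
    """Stand-in for GEMV."""
    return [dot(row, x) for row in A]

def time_kernel(kernel, args, repetitions=5):
    """Median wall-clock time of one kernel invocation, in seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        kernel(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def predict(per_call_seconds, invocations):
    """Estimated runtime of a variant built from identical kernel calls."""
    return per_call_seconds * invocations

def rank(variants):
    """variants maps name -> (per_call_seconds, invocations).

    Returns the names sorted from fastest to slowest prediction.
    """
    return sorted(variants, key=lambda name: predict(*variants[name]))

# Illustration for V_a := S_aij T_ij with a = i = j = n (assumed extents).
n = 16
x = [1.0] * n
M = [[1.0] * n for _ in range(n)]
variants = {
    "DOT per (a, j) pair": (time_kernel(dot, (x, x)), n * n),
    "GEMV per j": (time_kernel(gemv, (M, x)), n),
}
ranking = rank(variants)
```

The ranking printed by such a script reflects the Python stand-ins, not BLAS; the point is only the structure "time once, multiply by the call count, compare".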

Slide 14 / 18: Influence of caching (1/2)

for i = 1, ...: c_i := gemv(a_i, b)

Idea: cache setup

[Figure: #cycles per invocation of GEMV; series: measured, independent timing, cache-aware timing]

Slide 15 / 18: Influence of caching (2/2)

for i = 1, ...: for j = 1, ...: A_i := GER(a_i, b_j)

Idea: first iteration

[Figure: #cycles per invocation of GER; series: measured, cache-aware timing, loop-aware timing]
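
One possible reading of the "first iteration" idea is that the first kernel call of each inner loop is timed separately from the remaining calls, since it may find different data in cache. The sketch below expresses that reading as a formula; the function names and the example numbers are hypothetical, and the slides do not confirm that this is the model actually used.

```python
def independent_prediction(t_call, outer, inner):
    """Every invocation is assumed to cost the same isolated timing."""
    return outer * inner * t_call

def loop_aware_prediction(t_first, t_rest, outer, inner):
    """First call of each inner loop costs t_first, the others t_rest."""
    if outer < 0 or inner < 1:
        raise ValueError("outer must be >= 0 and inner must be >= 1")
    return outer * (t_first + (inner - 1) * t_rest)

# Placeholder numbers for illustration only, not measurements.
naive = independent_prediction(2.0, outer=3, inner=5)
aware = loop_aware_prediction(10.0, 2.0, outer=3, inner=5)
```

With these placeholder values the loop-aware estimate is 54.0 against 30.0 for the independent one; the gap is the cost attributed to the first iteration of each inner loop.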

Slide 16 / 18: (Nice) results

$V_a := S_{aij} T_{ij}$

[Figures: two panels of flops / cycle vs. a = i = j for two GEMV-based and four DOT-based variants]

Slide 17 / 18: (So so) results

$V_{bcd} := S_{ijb} T_{icjd}$

[Figures: two panels of flops / cycle vs. b = c = d]

Slide 18 / 18: (Awesome) results

$V_{bcd} := S_{ijb} T_{icjd}$

[Figures: two panels of flops / cycle vs. b = c = d]

Slide 18 / 18: Conclusions

Tensor contractions: [relation symbol illegible] matrix operations
Algorithmic space: LARGE!
Need for BLAS4? Maybe
Automation? Yes, please!

[Figures: two panels of flops / cycle vs. b = c = d]


More information

Modeling Performance through Memory-Stalls

Modeling Performance through Memory-Stalls Moeling Performance through Memory-Stalls Roman Iakymchuk an Paolo Bientinesi AICES, RWTH Aachen Schinkelstr. 5 Aachen, Germany {iakymchuk,paulj}@aices.rwth-aachen.e ABSTRACT We aim at moeling the performance

More information

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters

A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!

More information

7.2 Linear equation systems. 7.3 Linear least square fit

7.2 Linear equation systems. 7.3 Linear least square fit 72 Linear equation systems In the following sections, we will spend some time to solve linear systems of equations This is a tool that will come in handy in many di erent places during this course For

More information

Systematic Generation of Algorithms for Iterative Methods

Systematic Generation of Algorithms for Iterative Methods Rheinisch-Westfälische Technische Hochschule Aachen Aachen Institute for Advanced Study in Computational Engineering Science Systematic Generation of Algorithms for Iterative Methods Henrik Barthels, B.Sc.

More information

CSE 160 Lecture 13. Numerical Linear Algebra

CSE 160 Lecture 13. Numerical Linear Algebra CSE 16 Lecture 13 Numerical Linear Algebra Announcements Section will be held on Friday as announced on Moodle Midterm Return 213 Scott B Baden / CSE 16 / Fall 213 2 Today s lecture Gaussian Elimination

More information

Factorial Treatment Structure: Part I. Lukas Meier, Seminar für Statistik

Factorial Treatment Structure: Part I. Lukas Meier, Seminar für Statistik Factorial Treatment Structure: Part I Lukas Meier, Seminar für Statistik Factorial Treatment Structure So far (in CRD), the treatments had no structure. So called factorial treatment structure exists if

More information

A.1 Appendix on Cartesian tensors

A.1 Appendix on Cartesian tensors 1 Lecture Notes on Fluid Dynamics (1.63J/2.21J) by Chiang C. Mei, February 6, 2007 A.1 Appendix on Cartesian tensors [Ref 1] : H Jeffreys, Cartesian Tensors; [Ref 2] : Y. C. Fung, Foundations of Solid

More information

Maths Extension 2 - Polynomials. Polynomials

Maths Extension 2 - Polynomials. Polynomials Maths Extension - Polynomials Polynomials! Definitions and properties of polynomials! Factors & Roots! Fields ~ Q Rational ~ R Real ~ C Complex! Finding zeros over the complex field! Factorization & Division

More information

arxiv: v1 [cs.dc] 19 Nov 2016

arxiv: v1 [cs.dc] 19 Nov 2016 A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization with Partial Pivoting arxiv:1611.06365v1 [cs.dc] 19 Nov 2016 Sandra Catalán a, José R. Herrero b, Enrique S. Quintana-Ortí

More information

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

Improving the performance of applied science numerical simulations: an application to Density Functional Theory

Improving the performance of applied science numerical simulations: an application to Density Functional Theory Improving the performance of applied science numerical simulations: an application to Density Functional Theory Edoardo Di Napoli Jülich Supercomputing Center - Institute for Advanced Simulation Forschungszentrum

More information

Matrices: 2.1 Operations with Matrices

Matrices: 2.1 Operations with Matrices Goals In this chapter and section we study matrix operations: Define matrix addition Define multiplication of matrix by a scalar, to be called scalar multiplication. Define multiplication of two matrices,

More information

Level-3 BLAS on a GPU

Level-3 BLAS on a GPU Level-3 BLAS on a GPU Picking the Low Hanging Fruit Francisco Igual 1 Gregorio Quintana-Ortí 1 Robert A. van de Geijn 2 1 Departamento de Ingeniería y Ciencia de los Computadores. University Jaume I. Castellón

More information

High-Performance Small-Scale Solvers for Moving Horizon Estimation

High-Performance Small-Scale Solvers for Moving Horizon Estimation Downloaded from orbit.dtu.dk on: Oct 14, 218 High-Performance Small-Scale Solvers for Moving Horizon Estimation Frison, Gianluca; Vukov, Milan ; Poulsen, Niels Kjølstad; Diehl, Moritz ; Jørgensen, John

More information

Lecture 3 Linear Algebra Background

Lecture 3 Linear Algebra Background Lecture 3 Linear Algebra Background Dan Sheldon September 17, 2012 Motivation Preview of next class: y (1) w 0 + w 1 x (1) 1 + w 2 x (1) 2 +... + w d x (1) d y (2) w 0 + w 1 x (2) 1 + w 2 x (2) 2 +...

More information

arxiv: v1 [cs.ms] 11 Oct 2017

arxiv: v1 [cs.ms] 11 Oct 2017 Deriving Correct High-Performance Algorithms FLAME Working Note #86 arxiv:1710.04286v1 [cs.ms] Oct 2017 Devangi N. Parikh Margaret E. Myers Robert A. van de Geijn The University of Texas at Austin Austin,

More information

Families of Algorithms for Reducing a Matrix to Condensed Form

Families of Algorithms for Reducing a Matrix to Condensed Form Families of Algorithms for Reducing a Matrix to Condensed Form FIELD G. VAN ZEE, The University of Texas at Austin ROBERT A. VAN DE GEIJN, The University of Texas at Austin GREGORIO QUINTANA-ORTí, Universidad

More information

Strassen-like algorithms for symmetric tensor contractions

Strassen-like algorithms for symmetric tensor contractions Strassen-lie algorithms for symmetric tensor contractions Edgar Solomoni University of Illinois at Urbana-Champaign Scientfic and Statistical Computing Seminar University of Chicago April 13, 2017 1 /

More information

CS 542G: Conditioning, BLAS, LU Factorization

CS 542G: Conditioning, BLAS, LU Factorization CS 542G: Conditioning, BLAS, LU Factorization Robert Bridson September 22, 2008 1 Why some RBF Kernel Functions Fail We derived some sensible RBF kernel functions, like φ(r) = r 2 log r, from basic principles

More information

Algorithms as multilinear tensor equations

Algorithms as multilinear tensor equations Algorithms as multilinear tensor equations Edgar Solomonik Department of Computer Science ETH Zurich Technische Universität München 18.1.2016 Edgar Solomonik Algorithms as multilinear tensor equations

More information

REPRESENTATIONS FOR THE THEORY AND PRACTICE OF HIGH-PERFORMANCE DENSE LINEAR ALGEBRA ALGORITHMS. DRAFT October 23, 2007

REPRESENTATIONS FOR THE THEORY AND PRACTICE OF HIGH-PERFORMANCE DENSE LINEAR ALGEBRA ALGORITHMS. DRAFT October 23, 2007 REPRESENTATIONS FOR THE THEORY AND PRACTICE OF HIGH-PERFORMANCE DENSE LINEAR ALGEBRA ALGORITHMS ROBERT A VAN DE GEIJN DRAFT October 23, 2007 Abstract The Cholesky factorization operation is used to demonstrate

More information

Curved Spacetime I. Dr. Naylor

Curved Spacetime I. Dr. Naylor Curved Spacetime I Dr. Naylor Last Week Einstein's principle of equivalence We discussed how in the frame of reference of a freely falling object we can construct a locally inertial frame (LIF) Space tells

More information

UNIT-I CURVE FITTING AND THEORY OF EQUATIONS

UNIT-I CURVE FITTING AND THEORY OF EQUATIONS Part-A 1. Define linear law. The relation between the variables x & y is liner. Let y = ax + b (1) If the points (x i, y i ) are plotted in the graph sheet, they should lie on a straight line. a is the

More information

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Mitglied der Helmholtz-Gemeinschaft Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Birkbeck University, London, June the 29th 2012 Edoardo Di Napoli Motivation and Goals

More information