Performance Prediction for Tensor Contractions
Paolo Bientinesi, Edoardo Di Napoli, Diego Fabregat, Elmar Peise
AICES, RWTH Aachen
pauldj@aices.rwth-aachen.de
June 3rd, 2014
PASC Conference 14, Zürich, Switzerland

Tensors: crash course (1/2) (slide 2 / 18)
MATHEMATICS, PHYSICS: multilinear map; multidimensional array + metric
COMPUTER SCIENCE: multidimensional array
$t$-dimensional tensors ($t$ indices): $S_{\alpha\beta\ldots\gamma\delta}$, $S_{\alpha\ldots\delta}^{\beta\gamma}$, $S_{\alpha\beta}^{\ldots\gamma\delta}$, ...
Operations: low-dimensional approximations; contractions
Examples: $S_{\alpha ij} T_{ij}$, $S_{\alpha i\beta} M_{ik} T_{k\gamma}$, $S_{\alpha ij} M_{ik} T_{kh} M_{hj}$, ...

Crash course (2/2) (slide 3 / 18)
Contraction $S_{\alpha ij} T_{j\delta i}$: $\alpha, \delta$ are free indices; $i, j$ are contracted indices.
$V_{\alpha\delta} = S_{\alpha ij} T_{j\delta i}$ means $\forall\alpha\ \forall\delta:\ v_{\alpha\delta} = \sum_i \sum_j s_{\alpha ij}\, t_{j\delta i}$
Storage of $S_{\alpha\beta\gamma\ldots}$: $\alpha$ has stride 1, $\beta$ has stride $|\alpha|$, $\gamma$ has stride $|\alpha| \cdot |\beta|$, ... (here $|\alpha|$ denotes the extent of index $\alpha$)
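
The contraction and the storage rule on this slide can be sketched in plain Python. This is only an illustration under assumptions: the helper names `strides`, `offset`, and `contract` are hypothetical (they do not come from the talk), tensors are kept as flat lists, and indices are 0-based.

```python
def strides(extents):
    """Strides for the storage scheme above: the first index has stride 1,
    the second has stride |first|, the third |first|*|second|, and so on."""
    out, step = [], 1
    for n in extents:
        out.append(step)
        step *= n
    return out


def offset(index, extents):
    """Position of a multi-index in the flat list (hypothetical helper)."""
    return sum(i * s for i, s in zip(index, strides(extents)))


def contract(S, T, na, ni, nj, nd):
    """V[a][d] = sum_i sum_j S[a,i,j] * T[j,d,i], with S and T stored flat."""
    V = [[0.0] * nd for _ in range(na)]
    for a in range(na):
        for d in range(nd):
            acc = 0.0
            for i in range(ni):          # contracted index i
                for j in range(nj):      # contracted index j
                    s = S[offset((a, i, j), (na, ni, nj))]
                    t = T[offset((j, d, i), (nj, nd, ni))]
                    acc += s * t
            V[a][d] = acc
    return V
```

For extents (2, 3, 4) the sketch yields strides 1, 2, 6, which matches "stride 1, stride $|\alpha|$, stride $|\alpha| \cdot |\beta|$".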

A well-known contraction (1/2) (slide 4 / 18)
$C_{ij} = A_{ik} B_{kj}$
Direct call: C := GEMM(A, B)
1. A is sliced horizontally (into rows $a_1, \ldots, a_m$):
$C := AB = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} B = \begin{bmatrix} a_1 B \\ \vdots \\ a_m B \end{bmatrix}$
for i = 1, ..., m: C_i := GEMV(A_i, B)
2. B is sliced vertically (into columns $b_1, \ldots, b_n$):
$C := AB = A\,[\,b_1\ b_2\ \cdots\ b_n\,] = [\,A b_1\ A b_2\ \cdots\ A b_n\,]$
for i = 1, ..., n: C_i := GEMV(A, B_i)

A well-known contraction (2/2) (slide 5 / 18)
$C_{ij} = A_{ik} B_{kj}$
3. A is sliced vertically and B horizontally:
$[\,a_1\ \cdots\ a_k\,] \begin{bmatrix} b_1 \\ \vdots \\ b_k \end{bmatrix} = a_1 b_1 + a_2 b_2 + \cdots + a_k b_k$
for i = 1, ..., k: C += GER(A_i, B_i)
4. A is sliced horizontally and B vertically:
$\begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} [\,b_1\ \cdots\ b_n\,] = \begin{bmatrix} a_1 b_1 & \cdots & a_1 b_n \\ \vdots & \ddots & \vdots \\ a_m b_1 & \cdots & a_m b_n \end{bmatrix}$
for i = 1, ..., m: for j = 1, ..., n: C_ij := DOT(A_i, B_j)
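
The four slicings of the matrix product can be written out as a small sketch. It uses plain Python lists instead of a BLAS library, and the function names (`dot`, `gemv_row`, `gemv_col`, `ger`, `gemm_rows`, `gemm_cols`, `gemm_ger`, `gemm_dot`) are hypothetical stand-ins for the kernels named on the slides, not real BLAS bindings.

```python
def dot(x, y):
    """Inner product of two vectors (stand-in for DOT)."""
    return sum(a * b for a, b in zip(x, y))


def gemv_row(x, B):
    """Row vector times matrix, x B (stand-in for GEMV)."""
    return [dot(x, col) for col in zip(*B)]


def gemv_col(A, y):
    """Matrix times column vector, A y (stand-in for GEMV)."""
    return [dot(row, y) for row in A]


def ger(C, x, y):
    """Rank-1 update C += x y^T, in place (stand-in for GER)."""
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            C[i][j] += xi * yj


def gemm_rows(A, B):
    """Variant 1: A sliced horizontally, one GEMV per row of A."""
    return [gemv_row(a, B) for a in A]


def gemm_cols(A, B):
    """Variant 2: B sliced vertically, one GEMV per column of B."""
    cols = [gemv_col(A, b) for b in zip(*B)]
    return [list(row) for row in zip(*cols)]


def gemm_ger(A, B):
    """Variant 3: A sliced vertically, B horizontally, one GER per slice pair."""
    C = [[0] * len(B[0]) for _ in A]
    for a_col, b_row in zip(zip(*A), B):
        ger(C, a_col, b_row)
    return C


def gemm_dot(A, B):
    """Variant 4: A sliced horizontally, B vertically, one DOT per entry of C."""
    return [[dot(a, b) for b in zip(*B)] for a in A]
```

All four functions return the same matrix for the same inputs; they differ only in which kernel does the work and how often it is invoked.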

Mathematically equivalent, but... (slide 6 / 18)
All experiments: OpenBLAS 0.2.8, Intel Ivy Bridge-EP E5-2680 v2
[Plot: flops/cycle versus k for GEMM, GER, GEMV (two variants), and DOT (two variants)]
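
How to use BLAS for contractions? (slide 7 / 18)
$R_{\alpha\beta\gamma} := T_{\alpha\beta\sigma} S_{\sigma\gamma}$
1) Slicing along $\beta$: for each fixed $\beta = 1, \ldots, n$, the slice $R_{\alpha\beta\gamma} := T_{\alpha\beta\sigma} S_{\sigma\gamma}$ is a matrix-matrix product: GEMM
2) Slicing along $\alpha$: for each fixed $\alpha = 1, \ldots, m$, the slice $R_{\alpha\beta\gamma} := T_{\alpha\beta\sigma} S_{\sigma\gamma}$ requires transposition + GEMM (with $\alpha$ stored at stride 1, a fixed-$\alpha$ slice has no unit-stride dimension, so GEMM cannot be called on it directly)
3) Slicing along $\alpha$ and $\beta$: for each fixed pair $(\alpha, \beta)$, the slice $R_{\alpha\beta\gamma}$ is a vector-matrix product with $S_{\sigma\gamma}$: GEMV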
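
A minimal sketch of two of these slicings for $R_{\alpha\beta\gamma} := T_{\alpha\beta\sigma} S_{\sigma\gamma}$ follows. It assumes nested Python lists indexed as `T[a][b][s]` and `S[s][g]`, so it shows the loop structure only and not the stride issue that forces a transposition when slicing along $\alpha$. All function names are hypothetical.

```python
def gemm(A, B):
    """Matrix-matrix product on nested lists (stand-in for GEMM)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def gemv_row(x, B):
    """Row vector times matrix (stand-in for GEMV)."""
    return [sum(a * b for a, b in zip(x, col)) for col in zip(*B)]


def contract_slice_beta(T, S):
    """Slicing along beta: one GEMM per fixed beta."""
    na, nb, ng = len(T), len(T[0]), len(S[0])
    R = [[[0] * ng for _ in range(nb)] for _ in range(na)]
    for b in range(nb):
        T_b = [T[a][b] for a in range(na)]   # (alpha, sigma) matrix
        R_b = gemm(T_b, S)                   # (alpha, gamma) matrix
        for a in range(na):
            R[a][b] = R_b[a]
    return R


def contract_slice_alpha_beta(T, S):
    """Slicing along alpha and beta: one GEMV per fixed (alpha, beta)."""
    return [[gemv_row(T[a][b], S) for b in range(len(T[0]))]
            for a in range(len(T))]
```

Both functions compute the same tensor; the first issues one larger kernel call per value of $\beta$, the second one smaller call per pair $(\alpha, \beta)$.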

Taxonomy (slide 8 / 18)
$V_{h_1 h_2 \ldots} := S_{i_1 i_2 \ldots} T_{j_1 j_2 \ldots}$
Definition: a contraction is classified by the number of free indices of S and of T.
Class 1: S and T both have 0 free indices. Only BLAS1 kernels apply (no BLAS3, no BLAS2).
Class 2: S has at least 1 free index and T has 0, or S has 0 and T has at least 1. BLAS2 (+ transposition) and BLAS1 kernels apply (no BLAS3).
Class 3: S and T both have at least 1 free index. BLAS3 (+ transposition), BLAS2, and BLAS1 kernels apply.
Reference: Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions, Applied Mathematics and Computation.
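
The classification by free indices can be sketched as a small function. It assumes each tensor's indices are given as a string of single-letter labels, and that an index shared by both operands is contracted; `free_indices` and `contraction_class` are hypothetical names.

```python
def free_indices(own, other):
    """Indices of one operand that do not appear in the other operand."""
    return [i for i in own if i not in other]


def contraction_class(s_idx, t_idx):
    """Return 1, 2, or 3 following the taxonomy on this slide."""
    free_s = len(free_indices(s_idx, t_idx))
    free_t = len(free_indices(t_idx, s_idx))
    if free_s == 0 and free_t == 0:
        return 1          # neither operand has free indices
    if free_s == 0 or free_t == 0:
        return 2          # exactly one operand has free indices
    return 3              # both operands have free indices
```

For example, $V_a := S_{aij} T_{ij}$ falls into Class 2 and $V_{bcd} := S_{ijb} T_{icjd}$ into Class 3.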

Nice and easy... (slide 9 / 18)
$V_{bcd} := S_{ijb} T_{icjd}$
[Plot: flops/cycle of the variants, grouped by kernel (GEMM, GER, GEMV, DOT), versus $b = c = d$]

Small dimensions... (slide 10 / 18)
[Plots: flops/cycle for GEMM, GER, GEMV (two variants), and DOT (two variants), versus $k$ and versus $m = n = k$]

(slide 11 / 18)
Goal: automatic selection of the best variants
Idea: performance prediction
Approach: kernels execution; algorithms execution
Challenges: fluctuations, uncertainties; cache influence
Solution: performance models; context-aware timings

Fluctuations → performance models (slide 12 / 18)
[Plot: relative standard deviation $\sigma / \bar{x}$ versus $m = n = k$]
[Plot: percentage values over the $(m, n)$ plane, with $m$ and $n$ up to 1,024]
Reference: Performance Modeling for Dense Linear Algebra, E. Peise, P. B., PMBS12 (SC12).
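
The fluctuation measure $\sigma / \bar{x}$ can be computed from repeated timings as sketched below. This is an assumption-laden illustration: the talk does not say whether the population or sample standard deviation is meant (the population form is used here), and `relative_std` and `time_kernel` are hypothetical helpers, not part of the authors' tooling.

```python
import statistics
import time


def relative_std(samples):
    """sigma / mean of a list of timings (population standard deviation)."""
    return statistics.pstdev(samples) / statistics.mean(samples)


def time_kernel(kernel, repetitions=10):
    """Time a zero-argument callable several times; returns seconds per run."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        kernel()
        samples.append(time.perf_counter() - start)
    return samples
```

A typical use would be `relative_std(time_kernel(lambda: some_kernel(args)))`, where `some_kernel` is whichever routine is being modeled.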

Models... Timings? (slide 13 / 18)
Observation: typical linear algebra algorithms work on a shrinking active region; tensor contractions work on identical-size slices.
$V_a := S_{aij} T_{ij}$
[Plots: flops/cycle for GEMV (two variants) and DOT (four variants) versus $a = i = j$]

Influence of caching (1/2) (slide 14 / 18)
for i = 1, ...: c_i := GEMV(a_i, b)
Idea: cache setup
[Plot: #cycles per invocation of GEMV; series: measured, independent timing, cache-aware timing]

Influence of caching (2/2) (slide 15 / 18)
for i = 1, ...: for j = 1, ...: A_i := GER(a_i, b_j)
Idea: first iteration
[Plot: #cycles per invocation of GER; series: measured, cache-aware timing, loop-aware timing]
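
One possible reading of "identical-size slices" and "first iteration" is sketched below: because every kernel invocation in a variant has the same size, a variant's total time can be estimated from a handful of timed invocations, treating the first iteration of the inner loop separately from the rest. The cost model and the names `predict_uniform`, `predict_loop_aware`, and `rank_variants` are assumptions for illustration, not the authors' actual method.

```python
def predict_uniform(t_kernel, invocations):
    """All invocations cost the same: total = per-call time * call count."""
    return t_kernel * invocations


def predict_loop_aware(t_first, t_steady, outer, inner):
    """Assumed model: in each outer iteration, the first inner iteration
    costs t_first and the remaining inner - 1 iterations cost t_steady."""
    return outer * (t_first + (inner - 1) * t_steady)


def rank_variants(predictions):
    """Variant names sorted from fastest to slowest predicted time."""
    return sorted(predictions, key=predictions.get)
```

With predictions for every variant collected in a dictionary, `rank_variants` returns the candidate order for automatic selection.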

(Nice) results (slide 16 / 18)
$V_a := S_{aij} T_{ij}$
[Plots: two panels of flops/cycle for GEMV (two variants) and DOT (four variants) versus $a = i = j$]

(So so) results (slide 17 / 18)
$V_{bcd} := S_{ijb} T_{icjd}$
[Plots: two panels of flops/cycle versus $b = c = d$]

(Awesome) results (slide 18 / 18)
$V_{bcd} := S_{ijb} T_{icjd}$
[Plots: two panels of flops/cycle versus $b = c = d$]

Conclusions (slide 18 / 18)
Tensor contractions vs. matrix operations: [relation symbol lost in transcription]
Algorithmic space: LARGE!
Need for BLAS4? Maybe
Automation? Yes, please!