Cache Oblivious Stencil Computations


Cache Oblivious Stencil Computations
S. Hunold, J. L. Träff, F. Versaci
Lectures on High Performance Computing, 13 April 2015
F. Versaci (TU Wien)

References
Matteo Frigo and Volker Strumpen. Cache oblivious stencil computations. In: ICS. 2005, pp. 361–366.
Matteo Frigo and Volker Strumpen. The memory behavior of cache oblivious stencil computations. In: The Journal of Supercomputing 39.2 (2007), pp. 93–112.

Heat Diffusion: Problem definition
Let $\mathbf{x} = (x, y, z)$ be a point in space
Let $t$ be the time
Let $u(\mathbf{x}, t)$ be the temperature
Let $\alpha$ be the thermal diffusivity
Heat equation:
$$\frac{\partial u}{\partial t} = \alpha \nabla^2 u = \alpha \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \right)$$
Intuition: the Laplacian is a local averaging operator. If the temperature around $\mathbf{x}$ is higher than at $\mathbf{x}$, then $u(\mathbf{x})$ will increase accordingly.

Heat Diffusion: Discretization of the unidimensional case
We now want to discretize the 1D case:
$$\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}$$
Assuming time horizon $\tau$ and length $\lambda$, we define:
$$\Delta t := \frac{\tau}{T} \qquad \Delta x := \frac{\lambda}{N - 1}$$
We approximate the second derivative with the second-order finite difference, finally obtaining
$$\frac{u(t + \Delta t, x) - u(t, x)}{\Delta t} = \alpha \, \frac{u(t, x + \Delta x) - 2u(t, x) + u(t, x - \Delta x)}{(\Delta x)^2}$$
We also assume some boundary conditions (e.g., fixed values or cyclic space)

Heat Diffusion: Discretization of the unidimensional case
Naive algorithm
Consider a matrix $V$ such that $V(i, j) := u(i \Delta t, j \Delta x)$, then
$$V(i + 1, j) = V(i, j) + \frac{\alpha \Delta t}{(\Delta x)^2} \big( V(i, j + 1) - 2V(i, j) + V(i, j - 1) \big)$$
We know the initial temperature: $V(0, \cdot)$
We want to compute the temperature at time $T$: $V(T, \cdot)$
We do not need to keep the whole $V$ in memory (in-place update)
We can just keep the last two rows $t$ and $t + 1$ by accessing $V(i \bmod 2, j)$ instead of $V(i, j)$

    for (int i = 0; i < T; ++i)
        for (int j = 0; j < N; ++j)
            update(V, (i + 1) % 2, j);
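The two-row scheme above can be sketched in C as follows. This is a minimal illustration, not the lecture's own code: the names `heat_step` and `heat_naive`, the flat layout `V[row * n + j]`, and the fixed-value boundary handling are assumptions, and `c` stands for the factor $\alpha \Delta t / (\Delta x)^2$. The sketch assumes `n >= 2`; the explicit scheme is usually run with `c <= 0.5` for stability.

```c
#include <stddef.h>

/* One explicit time step of the 1D heat equation.
 * V holds two rows of n points each, stored as V[row * n + j].
 * src is the row (0 or 1) holding the current time level;
 * the result is written to the other row.
 * c is alpha * dt / (dx * dx) (assumed name).
 * The end points j = 0 and j = n - 1 are copied unchanged
 * (fixed-value boundary conditions, an assumption of this sketch). */
void heat_step(double *V, size_t n, int src, double c)
{
    const double *old_row = V + (size_t)src * n;
    double *new_row = V + (size_t)(1 - src) * n;

    new_row[0] = old_row[0];
    new_row[n - 1] = old_row[n - 1];
    for (size_t j = 1; j + 1 < n; ++j)
        new_row[j] = old_row[j]
                   + c * (old_row[j + 1] - 2.0 * old_row[j] + old_row[j - 1]);
}

/* Runs T time steps starting from row 0 and returns the index
 * (0 or 1) of the row that holds the solution at time T. */
int heat_naive(double *V, size_t n, int T, double c)
{
    for (int i = 0; i < T; ++i)
        heat_step(V, n, i % 2, c);
    return T % 2;
}
```

A caller would fill row 0 with the initial temperature, call `heat_naive`, and read the result from the returned row index.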

Iterative Methods for Solving Linear Systems
We want to solve a linear system: $Ax = b$
With $A$ being an $n \times n$ matrix
We may adopt a direct method (e.g., Gaussian elimination) and compute $x = A^{-1} b$
This takes $O(n^3)$ time and $O(n^2)$ space (even if $A$ is sparse)
Iterative splitting methods
We write $A = M + N$, with $M$ invertible
The equation can be rewritten as $x = M^{-1}(b - Nx)$
Consider the related iteration $x^{(t+1)} = M^{-1}(b - N x^{(t)})$
We are interested in cases in which inverting $M$ is easy (i.e., $O(n^2)$) and/or $M^{-1}$ and $N$ are sparse whenever $A$ is
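As one concrete instance of the splitting iteration, the sketch below takes $M$ to be the diagonal of $A$ and $N = A - M$, which is the Jacobi method (listed later among the stencil examples). The choice of splitting, the function name `jacobi_step`, and the dense row-major storage are assumptions made for illustration only.

```c
#include <stddef.h>

/* One step x_new = M^{-1} (b - N x) for the splitting
 * M = diag(A), N = A - M (the Jacobi choice, assumed here).
 * A is a dense n x n matrix in row-major order with a nonzero diagonal.
 * x and x_new must not overlap: every entry of x_new uses only old values. */
void jacobi_step(const double *A, const double *b,
                 const double *x, double *x_new, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double r = b[i];                 /* r accumulates (b - N x)_i */
        for (size_t j = 0; j < n; ++j)
            if (j != i)
                r -= A[i * n + j] * x[j];
        x_new[i] = r / A[i * n + i];     /* applying M^{-1} is a division */
    }
}
```

With a diagonal $M$, applying $M^{-1}$ costs $O(n)$, and the whole step costs $O(n^2)$ for a dense $A$, or time proportional to the number of nonzeros if a sparse format were used instead.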

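Gauss-Seidel Method: Decomposition
We decompose $A = (a_{ij})_{ij}$ as $A = L + U$
$$L = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \qquad U = \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ 0 & 0 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}$$
$L$ is lower triangular (diagonal included) and $U$ is strictly upper triangular
The iteration is thus $x^{(t+1)} = L^{-1}(b - U x^{(t)})$
Note that
$$U x^{(t)} = \begin{pmatrix} \sum_{j=2}^{n} a_{1j} x_j^{(t)} \\ \sum_{j=3}^{n} a_{2j} x_j^{(t)} \\ \vdots \\ a_{n-1,n} x_n^{(t)} \\ 0 \end{pmatrix} = \left( \sum_{j=i+1}^{n} a_{ij} x_j^{(t)} \right)_{i \in \{1, \ldots, n\}}$$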

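Gauss-Seidel Method: System to be solved
We want to solve (for $x^{(t+1)}$)
$$L x^{(t+1)} = b - U x^{(t)}$$
$$\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1^{(t+1)} \\ x_2^{(t+1)} \\ \vdots \\ x_n^{(t+1)} \end{pmatrix} = \begin{pmatrix} q_1^{(t)} \\ q_2^{(t)} \\ \vdots \\ q_n^{(t)} \end{pmatrix}$$
where
$$q_i^{(t)} := b_i - \sum_{j=i+1}^{n} a_{ij} x_j^{(t)}$$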

Gauss-Seidel Method: Update iteration
Since $L$ is triangular we can solve the system without inverting $L$, by means of forward substitution:
$$\sum_{j=1}^{i} a_{ij} x_j^{(t+1)} = q_i^{(t)} \quad \Longrightarrow \quad x_i^{(t+1)} = \frac{1}{a_{ii}} \left( q_i^{(t)} - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} \right)$$
Finally, the complete update iteration is
$$x_i^{(t+1)} = \frac{1}{a_{ii}} \left( b_i - \sum_{j=i+1}^{n} a_{ij} x_j^{(t)} - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} \right)$$
The method converges if and only if all the eigenvalues of the iteration matrix have absolute value less than 1: $\rho(-L^{-1} U) < 1$
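The complete update iteration translates almost line by line into code. The sketch below assumes a dense row-major matrix and overwrites `x` as the sweep proceeds, so entries with index below `i` already hold their $x^{(t+1)}$ values when row `i` is processed. The function name `gauss_seidel_sweep` is hypothetical.

```c
#include <stddef.h>

/* One Gauss-Seidel sweep for A x = b, overwriting x in place.
 * A is a dense n x n matrix in row-major order with a nonzero diagonal.
 * On entry x holds x^(t); on exit it holds x^(t+1). */
void gauss_seidel_sweep(const double *A, const double *b, double *x, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double r = b[i];
        for (size_t j = 0; j < i; ++j)        /* x[j] is already x_j^(t+1) */
            r -= A[i * n + j] * x[j];
        for (size_t j = i + 1; j < n; ++j)    /* x[j] is still x_j^(t) */
            r -= A[i * n + j] * x[j];
        x[i] = r / A[i * n + i];
    }
}
```

Calling the function repeatedly drives `x` towards the solution whenever the convergence condition on the spectral radius holds, for example when $A$ is strictly diagonally dominant.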

Gauss-Seidel Method: Band matrices
$$x_i^{(t+1)} = \frac{1}{a_{ii}} \left( b_i - \sum_{j=i+1}^{n} a_{ij} x_j^{(t)} - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} \right)$$
When updating $x_i^{(t)} \to x_i^{(t+1)}$ we use
$x_j^{(t)}$ for $j > i$
$x_j^{(t+1)}$ for $j < i$
This means that we can update the vector $x$ in place
If the matrix is banded (e.g., tridiagonal), then the value of $x_i^{(t+1)}$ depends only on some neighbouring values
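For a tridiagonal matrix the two sums shrink to one term each, so the sweep touches only the left and right neighbours of each entry. A minimal sketch, assuming the three diagonals are stored in separate arrays (`lo`, `di`, `up` are hypothetical names, as is `gs_tridiag_sweep`):

```c
#include <stddef.h>

/* In-place Gauss-Seidel sweep for a tridiagonal system.
 * lo[i] = a_{i,i-1}, di[i] = a_{i,i}, up[i] = a_{i,i+1};
 * lo[0] and up[n-1] are never read.
 * Each x[i] depends only on x[i-1] (already updated) and x[i+1] (old). */
void gs_tridiag_sweep(const double *lo, const double *di, const double *up,
                      const double *b, double *x, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double r = b[i];
        if (i > 0)
            r -= lo[i] * x[i - 1];   /* new value x_{i-1}^(t+1) */
        if (i + 1 < n)
            r -= up[i] * x[i + 1];   /* old value x_{i+1}^(t) */
        x[i] = r / di[i];
    }
}
```

Each sweep costs $O(n)$ time and needs no storage beyond the vector itself, and the access pattern is exactly a radius-1 neighbourhood update.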

Stencil Computations: Definition
Given a multidimensional array $u^{(0)}$
We construct a sequence of arrays $u^{(t)}$ by updating
$$u^{(t+1)}(x) = \mathrm{kernel}\left( \{ u^{(t)}(y) : y \in B(x, r) \} \right)$$
I.e., the new value of $u$ in $x$ is a function of the old values in some neighbourhood of $x$ (of radius $r$)
Typically, $u$ is updated in place and $r$ is small (e.g., $r = 1, \ldots, 3$)
Examples: PDE, Jacobi and Gauss-Seidel methods, cellular automata, image processing
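The definition can be made concrete with a small 1D sketch in which the kernel is passed as a function pointer. For clarity it writes to a separate output array rather than updating in place, and it leaves points closer than `r` to either end unchanged; both choices, along with the names `kernel_fn`, `stencil_step`, and `mean_kernel`, are assumptions of this example.

```c
#include <stddef.h>

/* A kernel receives a pointer to the 2r+1 old values u(x-r), ..., u(x+r). */
typedef double (*kernel_fn)(const double *window, size_t r);

/* One stencil step on a 1D array of n points:
 * dst[x] = kernel(src[x-r .. x+r]) for interior points.
 * Points closer than r to either end are copied unchanged. */
void stencil_step(const double *src, double *dst, size_t n, size_t r,
                  kernel_fn kernel)
{
    for (size_t x = 0; x < n; ++x) {
        if (x < r || x + r >= n)
            dst[x] = src[x];
        else
            dst[x] = kernel(src + (x - r), r);
    }
}

/* Example kernel: arithmetic mean of the neighbourhood. */
double mean_kernel(const double *window, size_t r)
{
    double sum = 0.0;
    for (size_t k = 0; k < 2 * r + 1; ++k)
        sum += window[k];
    return sum / (double)(2 * r + 1);
}
```

For instance, `stencil_step(src, dst, n, 1, mean_kernel)` applies a three-point moving average, which is a simple image-processing-style smoothing filter.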

Stencil Computations: Naive vs. cache-aware implementations

    for (t ∈ [0, T[)
        for (x ∈ X)
            update(u, t + dt, x);

Consider a two-level memory with cache size $Z$ and cache line $B$
Let $p$ be the size of the computed space
For large $p$ the naive algorithm incurs $\Theta\left(\frac{p}{B}\right)$ misses (i.e., a miss every time a block is accessed)
Optimal cache-aware algorithms incur $\Theta\left(\frac{p}{B Z^{1/n}}\right)$ misses, $n$ being the number of space dimensions
This can be done by time skewing, i.e., by cleverly exploring the $(n + 1)$-dimensional space-time

Cache Oblivious Algorithm: Unidimensional case
[Figure: a trapezoid in the $(x, t)$ plane, bounded by $t_0$ and $t_1$ in time and starting from $x_0$ and $x_1$ in space, with its width $w$ and height $\Delta t$ marked]
Recursive algorithm to traverse a trapezoid
Parameters: $t_0, t_1, x_0, x_1, \dot{x}_0, \dot{x}_1, ds$, with $\dot{x} = \frac{dx}{dt}$ and $ds$ = stencil slope
Width: average of the parallel sides, with $\Delta t = t_1 - t_0$:
$$w = x_1 - x_0 + \frac{\Delta t (\dot{x}_1 - \dot{x}_0)}{2}$$
Volume: number of points in the trapezoid

Cache Oblivious Algorithm: Unidimensional case
Space cut
[Figure: the trapezoid split by a slanted line starting at $x_m$ into a left part $T_1$ and a right part $T_2$]
If wide enough ($w \geq 4 \Delta t \, ds$), then cut the space
The cut is through the center (defined as average of the vertices)
The slope of the cut is $-ds$ and the mid point $x_m$ is
$$x_m := \frac{x_0 + x_1}{2} + \frac{\Delta t (2 \, ds + \dot{x}_0 + \dot{x}_1)}{4}$$

Cache Oblivious Algorithm: Unidimensional case
Time cut
[Figure: the trapezoid split horizontally at height $s$ above $t_0$ into a lower part $T_1$ and an upper part $T_2$]
If the trapezoid is not wide enough then it is cut horizontally
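The space cut and time cut combine into a short recursive procedure. The sketch below is modelled on the rules stated on these slides and applied to the two-row 1D heat update; it is an illustration, not the lecture's own code. The names `heat1d`, `kernel`, and `walk_trapezoid` are hypothetical, the stencil slope is fixed to `ds = 1`, the time cut is placed at half the height, and the integer division that rounds `xm` down is a choice of this sketch. The width test is the slides' $w \geq 4 \Delta t \, ds$ multiplied by 2 to stay in integer arithmetic.

```c
#include <stddef.h>

/* Hypothetical problem state: two rows of n points; row (t % 2) holds time t.
 * Points 0 and n - 1 are fixed boundaries and must be set in both rows. */
typedef struct {
    double *u;   /* 2 * n values, row-major */
    int n;       /* number of space points */
    double c;    /* alpha * dt / dx^2 */
} heat1d;

/* Computes u(t + 1, x) from u(t, x - 1), u(t, x), u(t, x + 1). */
static void kernel(heat1d *p, int t, int x)
{
    const double *old_row = p->u + (t % 2) * p->n;
    double *new_row = p->u + ((t + 1) % 2) * p->n;
    new_row[x] = old_row[x]
               + p->c * (old_row[x + 1] - 2.0 * old_row[x] + old_row[x - 1]);
}

/* Traverses the trapezoid with time range [t0, t1), bottom side [x0, x1),
 * and side slopes dx0 (left) and dx1 (right). */
void walk_trapezoid(heat1d *p, int t0, int t1,
                    int x0, int dx0, int x1, int dx1)
{
    const int ds = 1;            /* stencil slope */
    int dt = t1 - t0;

    if (dt == 1) {
        /* Base case: a single time level. */
        for (int x = x0; x < x1; ++x)
            kernel(p, t0, x);
    } else if (dt > 1) {
        /* 2 * w >= 8 * dt * ds, i.e. w >= 4 * dt * ds. */
        if (2 * (x1 - x0) + (dx1 - dx0) * dt >= 8 * ds * dt) {
            /* Space cut with slope -ds through the center; left part first. */
            int xm = (2 * (x0 + x1) + (2 * ds + dx0 + dx1) * dt) / 4;
            walk_trapezoid(p, t0, t1, x0, dx0, xm, -ds);
            walk_trapezoid(p, t0, t1, xm, -ds, x1, dx1);
        } else {
            /* Time cut: lower half first, then the upper half. */
            int s = dt / 2;
            walk_trapezoid(p, t0, t0 + s, x0, dx0, x1, dx1);
            walk_trapezoid(p, t0 + s, t1,
                           x0 + dx0 * s, dx0, x1 + dx1 * s, dx1);
        }
    }
}
```

A full run over `T` steps with fixed boundaries would be started as `walk_trapezoid(&p, 0, T, 1, 0, p.n - 1, 0)`, after which row `T % 2` holds the final values. The left part of a space cut is traversed first because its right edge recedes with slope `-ds`, so it never reads a point that belongs to the right part.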

Cache Oblivious Algorithm: Multidimensional case
$n$-dimensional trapezoid $T(t_0, t_1, x_0^{(i)}, x_1^{(i)}, \dot{x}_0^{(i)}, \dot{x}_1^{(i)})$
It is the set of points $(t, x^{(0)}, \ldots, x^{(n-1)})$ such that, for all $i$ with $0 \leq i < n$,
$$t_0 \leq t < t_1$$
$$x_0^{(i)} + \dot{x}_0^{(i)} (t - t_0) \leq x^{(i)} < x_1^{(i)} + \dot{x}_1^{(i)} (t - t_0)$$
The projection $(t, x^{(i)})$ looks like a unidimensional trapezoid
Multidimensional algorithm
1. If possible cut some space dimension
2. Otherwise cut the time dimension

Cache Oblivious Algorithm: I/O Complexity
Lemma
Let $T$ be a trapezoid and $\mathrm{Vol}(T)$ its ($(n + 1)$-dimensional) volume. Let $m := \min\{\Delta t, w_0, \ldots, w_{n-2}\}/2$. Then the ($n$-dimensional) surface of $T$ has measure $O\left(\frac{\mathrm{Vol}(T)}{m}\right)$.
Theorem
Let $T$ be a trapezoid and assume $\Delta t \in \Omega\left(Z^{1/n}\right)$ and $\forall i \; w_i \in \Omega\left(Z^{1/n}\right)$. Then the number of misses incurred by the cache oblivious algorithm is $O\left(\frac{\mathrm{Vol}(T)}{B Z^{1/n}}\right)$.

Cache Oblivious Algorithm: Simulation
Problem dimensions = 1
Stencil slope = 1
Block size = 8 elements
Vector size = 48 points = 6 blocks
Buffer size = 10 blocks
Miss time = 30 × hit time
