Parallel Programming in C with MPI and OpenMP

Size: px
Start display at page:

Download "Parallel Programming in C with MPI and OpenMP"

Transcription

1 Parallel Programming in C with MPI and OpenMP Michael J. Quinn

2 Chapter 13 Finite Difference Methods

3 Outline n Ordinary and partial differential equations n Finite difference methods n Vibrating string problem n Steady state heat distribution problem

4 Ordinary and Partial Differential Equations n Ordinary differential equation: equation containing derivatives of a function of one variable in one dimension n Partial differential equation: equation containing derivatives of a function in higher Dimensions where the differentiation occurs in some dimension(s).

5 Examples of Phenomena Modeled by PDEs n Air flow over an aircraft wing n Blood circulation in human body n Water circulation in an ocean n Bridge deformations as its carries traffic n Evolution of a thunderstorm n Oscillations of a skyscraper hit by earthquake n Heat transfer

6 Model of Sea Surface Temperature in Atlantic Ocean Courtesy MICOM group at the Rosenstiel School of Marine and Atmospheric Science, University of Miami

7 Difference Quotients f(x+h/2) f'(x) f(x-h/2) x-h x-h/2 x x+h/2 x+h

8 Formulas for 1st, 2d Derivatives h h x f h x f x f 2) / ( 2) / ( ) '( - - +» h h x f h x f x f 2) / ( 2) / ( ) '( - - +» 2 ) ( ) ( 2 ) ( ) ''( h h x f x f h x f x f »

9 f (x) and f (x) n f (x) often called df/dx ufirst derivative, the rate of change of f in the x direction. If f is location then f is speed. n f (x) often called d 2 f/dx 2 usecond derivative, the rate of change of f in x direction. If f is speed, then f is accelleration.

10 Vibrating String Problem t=0.0 and 1.0 t=0.1 and 0.9 t=0.2 and 0.8 t = 0.3 and 0.7 t=0.4 and 0.6 t= Vibrating string modeled by a hyperbolic PDE

11 Solution Stored in 2-D Matrix n Each row represents state of string at some point in time n Each column shows how position of string at a particular point changes with time n In some cases (jacobi 1D in PA4) we are just interested in the end state (heat transfer) and we keep only 2 vectors (prev and curr)

12 Discrete Space, Time Intervals Lead to 2-D Matrix T u = u(x, t ) i,j i j t 0 a x

13 Heart of Sequential C Program u[j+1][i] = 2.0*(1.0-L)*u[j][i] + L*(u[j][i+1] + u[j][i-1]) - u[j-1][i]; u(i,j+1) u(i-1,j) u(i,j) u(i+1,j) u(i,j-1)

14 Parallel Program Design n Associate primitive task with each element of matrix n Examine communication pattern n Agglomerate tasks in same column n Static number of identical tasks n Regular communication pattern n Strategy: agglomerate columns, assign one block of columns to each task

15 Result of Agglomeration and Mapping

16 Communication Still Needed n Initial values (in lowest row) are computed without communication n Values in black cells cannot be computed without access to values held by other tasks

17 Ghost Points n Ghost points: memory locations used to store redundant copies of data held by neighboring processes n Allocating ghost points as extra columns simplifies parallel algorithm by allowing same loop to update all cells

18 Matrices Augmented with Ghost Points Lilac cells are the ghost points.

19 Communication in an Iteration This iteration the process is responsible for computing the values of the yellow cells.

20 Computation in an Iteration This iteration the process is responsible for computing the values of the yellow cells. The striped cells are the ones accessed as the yellow cell values are computed.

21 Complexity Analysis n Computation time per element is constant, so sequential time complexity per iteration is Q(n) n Elements divided evenly among processes, so parallel computational complexity per iteration is Q(n / p) n During each iteration a process with an interior block sends two messages and receives two messages, so communication complexity per iteration is Q(1)

22 Replicating Computations n If only one value is transmitted, the execution time is dominated by communication (message latency) n We can reduce the number of communications by replicating computations n If we send k values instead of one, we can advance the simulation k time steps before another communication n We call these k values Ghost or Halo elements

23 Replicating Computations Without replication: Step 1 Exchange data With replication: Step 2 Step 1 Exchange data

24 Communication Time vs. Number of Ghost Points Time Ghost Points

25 Steady State Heat Distribution Problem Ice bath Steam Steam Steam

26 Heart of Sequential C Program w[i][j] = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]) / 4.0; u(i,j+1) u(i-1,j) w(i,j) u(i+1,j) u(i,j-1)

27 Parallel Algorithm 1 n Associate primitive task with each matrix element n Agglomerate tasks in contiguous rows (rowwise block striped decomposition) n Add rows of ghost points above and below rectangular region controlled by process

28 Example Decomposition grid divided among 4 processors

29 Example Decomposition grid divided among 4 processors Here we can also avoid communication by replicated computation (larger ghost buffers)

30 Example Decomposition grid divided among 16 processors

31 Implementation Details n Using ghost points around 2-D blocks requires extra copying steps n Ghost points for left and right sides are not in contiguous memory locations n An auxiliary buffer must be used when receiving these ghost point values n Similarly, buffer must be used when sending column of values to a neighboring process

32 Summary (1) n PDEs used to model behavior of a wide variety of physical systems n Realistic problems yield PDEs too difficult to solve analytically, so scientists solve them numerically n Two most common numerical techniques for solving PDEs u finite element method u finite difference method

33 Summary (2) n Finite different methods umatrix-based methods store matrix explicitly umatrix-free implementations store matrix implicitly n We have designed and analyzed parallel algorithms based on matrix-free implementations

34 Summary (3) n Ghost points store copies of values held by other processes n Explored increasing number of ghost points and replicating computation in order to reduce number of message exchanges n Optimal number of ghost points depends on characteristics of parallel system

Solution of Linear Systems

Solution of Linear Systems Solution of Linear Systems Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico May 12, 2016 CPD (DEI / IST) Parallel and Distributed Computing

More information

Finite difference methods. Finite difference methods p. 1

Finite difference methods. Finite difference methods p. 1 Finite difference methods Finite difference methods p. 1 Overview 1D heat equation u t = κu xx +f(x,t) as a motivating example Quick intro of the finite difference method Recapitulation of parallelization

More information

Parallel Programming. Parallel algorithms Linear systems solvers

Parallel Programming. Parallel algorithms Linear systems solvers Parallel Programming Parallel algorithms Linear systems solvers Terminology System of linear equations Solve Ax = b for x Special matrices Upper triangular Lower triangular Diagonally dominant Symmetric

More information

Numerically Solving Partial Differential Equations

Numerically Solving Partial Differential Equations Numerically Solving Partial Differential Equations Michael Lavell Department of Applied Mathematics and Statistics Abstract The physics describing the fundamental principles of fluid dynamics can be written

More information

Modelling and implementation of algorithms in applied mathematics using MPI

Modelling and implementation of algorithms in applied mathematics using MPI Modelling and implementation of algorithms in applied mathematics using MPI Lecture 3: Linear Systems: Simple Iterative Methods and their parallelization, Programming MPI G. Rapin Brazil March 2011 Outline

More information

Overview: Synchronous Computations

Overview: Synchronous Computations Overview: Synchronous Computations barriers: linear, tree-based and butterfly degrees of synchronization synchronous example 1: Jacobi Iterations serial and parallel code, performance analysis synchronous

More information

Numerical Solution Techniques in Mechanical and Aerospace Engineering

Numerical Solution Techniques in Mechanical and Aerospace Engineering Numerical Solution Techniques in Mechanical and Aerospace Engineering Chunlei Liang LECTURE 3 Solvers of linear algebraic equations 3.1. Outline of Lecture Finite-difference method for a 2D elliptic PDE

More information

Image Reconstruction And Poisson s equation

Image Reconstruction And Poisson s equation Chapter 1, p. 1/58 Image Reconstruction And Poisson s equation School of Engineering Sciences Parallel s for Large-Scale Problems I Chapter 1, p. 2/58 Outline 1 2 3 4 Chapter 1, p. 3/58 Question What have

More information

CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University

CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution

More information

Parallel Scientific Computing

Parallel Scientific Computing IV-1 Parallel Scientific Computing Matrix-vector multiplication. Matrix-matrix multiplication. Direct method for solving a linear equation. Gaussian Elimination. Iterative method for solving a linear equation.

More information

Parallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco

Parallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Parallel programming using MPI Analysis and optimization Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Outline l Parallel programming: Basic definitions l Choosing right algorithms: Optimal serial and

More information

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm Penporn Koanantakool and Katherine Yelick {penpornk, yelick}@cs.berkeley.edu Computer Science Division, University of California,

More information

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

6. Iterative Methods for Linear Systems. The stepwise approach to the solution... 6 Iterative Methods for Linear Systems The stepwise approach to the solution Miriam Mehl: 6 Iterative Methods for Linear Systems The stepwise approach to the solution, January 18, 2013 1 61 Large Sparse

More information

A Hybrid Method for the Wave Equation. beilina

A Hybrid Method for the Wave Equation.   beilina A Hybrid Method for the Wave Equation http://www.math.unibas.ch/ beilina 1 The mathematical model The model problem is the wave equation 2 u t 2 = (a 2 u) + f, x Ω R 3, t > 0, (1) u(x, 0) = 0, x Ω, (2)

More information

Barrier. Overview: Synchronous Computations. Barriers. Counter-based or Linear Barriers

Barrier. Overview: Synchronous Computations. Barriers. Counter-based or Linear Barriers Overview: Synchronous Computations Barrier barriers: linear, tree-based and butterfly degrees of synchronization synchronous example : Jacobi Iterations serial and parallel code, performance analysis synchronous

More information

Communication-avoiding parallel and sequential QR factorizations

Communication-avoiding parallel and sequential QR factorizations Communication-avoiding parallel and sequential QR factorizations James Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou May 30, 2008 Abstract We present parallel and sequential dense QR factorization

More information

CEE 618 Scientific Parallel Computing (Lecture 7): OpenMP (con td) and Matrix Multiplication

CEE 618 Scientific Parallel Computing (Lecture 7): OpenMP (con td) and Matrix Multiplication 1 / 26 CEE 618 Scientific Parallel Computing (Lecture 7): OpenMP (con td) and Matrix Multiplication Albert S. Kim Department of Civil and Environmental Engineering University of Hawai i at Manoa 2540 Dole

More information

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4 Linear Algebra Section. : LU Decomposition Section. : Permutations and transposes Wednesday, February 1th Math 01 Week # 1 The LU Decomposition We learned last time that we can factor a invertible matrix

More information

Earth System Modeling Domain decomposition

Earth System Modeling Domain decomposition Earth System Modeling Domain decomposition Graziano Giuliani International Centre for Theorethical Physics Earth System Physics Section Advanced School on Regional Climate Modeling over South America February

More information

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University Model Order Reduction via Matlab Parallel Computing Toolbox E. Fatih Yetkin & Hasan Dağ Istanbul Technical University Computational Science & Engineering Department September 21, 2009 E. Fatih Yetkin (Istanbul

More information

A Parallel Algorithm for Computing the Extremal Eigenvalues of Very Large Sparse Matrices*

A Parallel Algorithm for Computing the Extremal Eigenvalues of Very Large Sparse Matrices* A Parallel Algorithm for Computing the Extremal Eigenvalues of Very Large Sparse Matrices* Fredrik Manne Department of Informatics, University of Bergen, N-5020 Bergen, Norway Fredrik. Manne@ii. uib. no

More information

Parallelization of the QC-lib Quantum Computer Simulator Library

Parallelization of the QC-lib Quantum Computer Simulator Library Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer September 9, 23 PPAM 23 1 Ian Glendinning / September 9, 23 Outline Introduction Quantum Bits, Registers

More information

Solution to Laplace Equation using Preconditioned Conjugate Gradient Method with Compressed Row Storage using MPI

Solution to Laplace Equation using Preconditioned Conjugate Gradient Method with Compressed Row Storage using MPI Solution to Laplace Equation using Preconditioned Conjugate Gradient Method with Compressed Row Storage using MPI Sagar Bhatt Person Number: 50170651 Department of Mechanical and Aerospace Engineering,

More information

The Sieve of Erastothenes

The Sieve of Erastothenes The Sieve of Erastothenes Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico October 25, 2010 José Monteiro (DEI / IST) Parallel and Distributed

More information

Communication-avoiding parallel and sequential QR factorizations

Communication-avoiding parallel and sequential QR factorizations Communication-avoiding parallel and sequential QR factorizations James Demmel Laura Grigori Mark Frederick Hoemmen Julien Langou Electrical Engineering and Computer Sciences University of California at

More information

Efficiently solving large sparse linear systems on a distributed and heterogeneous grid by using the multisplitting-direct method

Efficiently solving large sparse linear systems on a distributed and heterogeneous grid by using the multisplitting-direct method Efficiently solving large sparse linear systems on a distributed and heterogeneous grid by using the multisplitting-direct method S. Contassot-Vivier, R. Couturier, C. Denis and F. Jézéquel Université

More information

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem

Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Massively parallel semi-lagrangian solution of the 6d Vlasov-Poisson problem Katharina Kormann 1 Klaus Reuter 2 Markus Rampp 2 Eric Sonnendrücker 1 1 Max Planck Institut für Plasmaphysik 2 Max Planck Computing

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 6 Structured and Low Rank Matrices Section 6.3 Numerical Optimization Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at

More information

Computational Seismology: An Introduction

Computational Seismology: An Introduction Computational Seismology: An Introduction Aim of lecture: Understand why we need numerical methods to understand our world Learn about various numerical methods (finite differences, pseudospectal methods,

More information

MATH 333: Partial Differential Equations

MATH 333: Partial Differential Equations MATH 333: Partial Differential Equations Problem Set 9, Final version Due Date: Tues., Nov. 29, 2011 Relevant sources: Farlow s book: Lessons 9, 37 39 MacCluer s book: Chapter 3 44 Show that the Poisson

More information

Multigrid Methods and their application in CFD

Multigrid Methods and their application in CFD Multigrid Methods and their application in CFD Michael Wurst TU München 16.06.2009 1 Multigrid Methods Definition Multigrid (MG) methods in numerical analysis are a group of algorithms for solving differential

More information

Parallel LU Decomposition (PSC 2.3) Lecture 2.3 Parallel LU

Parallel LU Decomposition (PSC 2.3) Lecture 2.3 Parallel LU Parallel LU Decomposition (PSC 2.3) 1 / 20 Designing a parallel algorithm Main question: how to distribute the data? What data? The matrix A and the permutation π. Data distribution + sequential algorithm

More information

SWE Anatomy of a Parallel Shallow Water Code

SWE Anatomy of a Parallel Shallow Water Code SWE Anatomy of a Parallel Shallow Water Code CSCS-FoMICS-USI Summer School on Computer Simulations in Science and Engineering Michael Bader July 8 19, 2013 Computer Simulations in Science and Engineering,

More information

Linear Algebra I Lecture 8

Linear Algebra I Lecture 8 Linear Algebra I Lecture 8 Xi Chen 1 1 University of Alberta January 25, 2019 Outline 1 2 Gauss-Jordan Elimination Given a system of linear equations f 1 (x 1, x 2,..., x n ) = 0 f 2 (x 1, x 2,..., x n

More information

Cache Oblivious Stencil Computations

Cache Oblivious Stencil Computations Cache Oblivious Stencil Computations S. HUNOLD J. L. TRÄFF F. VERSACI Lectures on High Performance Computing 13 April 2015 F. Versaci (TU Wien) Cache Oblivious Stencil Computations 13 April 2015 1 / 19

More information

Hybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC

Hybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC Hybrid static/dynamic scheduling for already optimized dense matrix factorization Simplice Donfack, Laura Grigori, INRIA, France Bill Gropp, Vivek Kale UIUC, USA Joint Laboratory for Petascale Computing,

More information

Parallelization of the QC-lib Quantum Computer Simulator Library

Parallelization of the QC-lib Quantum Computer Simulator Library Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer VCPC European Centre for Parallel Computing at Vienna Liechtensteinstraße 22, A-19 Vienna, Austria http://www.vcpc.univie.ac.at/qc/

More information

ECC for NAND Flash. Osso Vahabzadeh. TexasLDPC Inc. Flash Memory Summit 2017 Santa Clara, CA 1

ECC for NAND Flash. Osso Vahabzadeh. TexasLDPC Inc. Flash Memory Summit 2017 Santa Clara, CA 1 ECC for NAND Flash Osso Vahabzadeh TexasLDPC Inc. 1 Overview Why Is Error Correction Needed in Flash Memories? Error Correction Codes Fundamentals Low-Density Parity-Check (LDPC) Codes LDPC Encoding and

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Laboratory #3: Linear Algebra. Contents. Grace Yung

Laboratory #3: Linear Algebra. Contents. Grace Yung Laboratory #3: Linear Algebra Grace Yung Contents List of Problems. Introduction. Objectives.2 Prerequisites 2. Linear Systems 2. What is a Matrix 2.2 Quick Review 2.3 Gaussian Elimination 2.3. Decomposition

More information

5. Solving the Bellman Equation

5. Solving the Bellman Equation 5. Solving the Bellman Equation In the next two lectures, we will look at several methods to solve Bellman s Equation (BE) for the stochastic shortest path problem: Value Iteration, Policy Iteration and

More information

arxiv: v1 [hep-lat] 7 Oct 2010

arxiv: v1 [hep-lat] 7 Oct 2010 arxiv:.486v [hep-lat] 7 Oct 2 Nuno Cardoso CFTP, Instituto Superior Técnico E-mail: nunocardoso@cftp.ist.utl.pt Pedro Bicudo CFTP, Instituto Superior Técnico E-mail: bicudo@ist.utl.pt We discuss the CUDA

More information

Parallel Singular Value Decomposition. Jiaxing Tan

Parallel Singular Value Decomposition. Jiaxing Tan Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector

More information

where is the Laplace operator and is a scalar function.

where is the Laplace operator and is a scalar function. Elliptic PDEs A brief discussion of two important elliptic PDEs. In mathematics, Laplace's equation is a second-order partial differential equation named after Pierre-Simon Laplace who first studied its

More information

Lecture 38 Insulated Boundary Conditions

Lecture 38 Insulated Boundary Conditions Lecture 38 Insulated Boundary Conditions Insulation In many of the previous sections we have considered fixed boundary conditions, i.e. u(0) = a, u(l) = b. We implemented these simply by assigning u j

More information

Poisson Equation in 2D

Poisson Equation in 2D A Parallel Strategy Department of Mathematics and Statistics McMaster University March 31, 2010 Outline Introduction 1 Introduction Motivation Discretization Iterative Methods 2 Additive Schwarz Method

More information

SPECIAL PROJECT PROGRESS REPORT

SPECIAL PROJECT PROGRESS REPORT SPECIAL PROJECT PROGRESS REPORT Reporting year 2015 Project Title: Potential sea-ice predictability with a high resolution Arctic sea ice-ocean model Computer Project Account: Principal Investigator(s):

More information

The Design, Implementation, and Evaluation of a Symmetric Banded Linear Solver for Distributed-Memory Parallel Computers

The Design, Implementation, and Evaluation of a Symmetric Banded Linear Solver for Distributed-Memory Parallel Computers The Design, Implementation, and Evaluation of a Symmetric Banded Linear Solver for Distributed-Memory Parallel Computers ANSHUL GUPTA and FRED G. GUSTAVSON IBM T. J. Watson Research Center MAHESH JOSHI

More information

0-1 Knapsack Problem in parallel Progetto del corso di Calcolo Parallelo AA

0-1 Knapsack Problem in parallel Progetto del corso di Calcolo Parallelo AA 0-1 Knapsack Problem in parallel Progetto del corso di Calcolo Parallelo AA 2008-09 Salvatore Orlando 1 0-1 Knapsack problem N objects, j=1,..,n Each kind of item j has a value p j and a weight w j (single

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 13 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath Parallel Numerical Algorithms

More information

Tensor Network Computations in Quantum Chemistry. Charles F. Van Loan Department of Computer Science Cornell University

Tensor Network Computations in Quantum Chemistry. Charles F. Van Loan Department of Computer Science Cornell University Tensor Network Computations in Quantum Chemistry Charles F. Van Loan Department of Computer Science Cornell University Joint work with Garnet Chan, Department of Chemistry and Chemical Biology, Cornell

More information

c 1999 Society for Industrial and Applied Mathematics

c 1999 Society for Industrial and Applied Mathematics SIAM J SCI COMPUT Vol 20, No 6, pp 2261 2281 c 1999 Society for Industrial and Applied Mathematics NEW PARALLEL SOR METHOD BY DOMAIN PARTITIONING DEXUAN XIE AND LOYCE ADAMS Abstract In this paper we propose

More information

REQUEST FOR INFORMATION (RFI) HYCOM. Criteria #1: A strong partnership between the developers of the model and the CESM community

REQUEST FOR INFORMATION (RFI) HYCOM. Criteria #1: A strong partnership between the developers of the model and the CESM community REQUEST FOR INFORMATION (RFI) HYCOM Brief overview The HYbrid Coordinate Ocean Model (Bleck, 2002; Chassignet et al., 2003) is a global non- Boussinesq ocean circulation model that utilizes a generalized

More information

Background. Background. C. T. Kelley NC State University tim C. T. Kelley Background NCSU, Spring / 58

Background. Background. C. T. Kelley NC State University tim C. T. Kelley Background NCSU, Spring / 58 Background C. T. Kelley NC State University tim kelley@ncsu.edu C. T. Kelley Background NCSU, Spring 2012 1 / 58 Notation vectors, matrices, norms l 1 : max col sum... spectral radius scaled integral norms

More information

Crossing the Chasm. On the Paths to Exascale: Presented by Mike Rezny, Monash University, Australia

Crossing the Chasm. On the Paths to Exascale: Presented by Mike Rezny, Monash University, Australia On the Paths to Exascale: Crossing the Chasm Presented by Mike Rezny, Monash University, Australia michael.rezny@monash.edu Crossing the Chasm meeting Reading, 24 th October 2016 Version 0.1 In collaboration

More information

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant Kale University of Illinois Urbana-Champaign May 25, 2012 Work is overdecomposed

More information

A method to reduce load imbalances in simulations of phase change processes with FS3D

A method to reduce load imbalances in simulations of phase change processes with FS3D A method to reduce load imbalances in simulations of phase change processes with FS3D Johannes Müller Philipp Offenhäuser Martin Reitzle Workshop on Sustained Simulation Performance HLRS, Stuttgart, Germany

More information

Lecture 9 Approximations of Laplace s Equation, Finite Element Method. Mathématiques appliquées (MATH0504-1) B. Dewals, C.

Lecture 9 Approximations of Laplace s Equation, Finite Element Method. Mathématiques appliquées (MATH0504-1) B. Dewals, C. Lecture 9 Approximations of Laplace s Equation, Finite Element Method Mathématiques appliquées (MATH54-1) B. Dewals, C. Geuzaine V1.2 23/11/218 1 Learning objectives of this lecture Apply the finite difference

More information

Exam Spring Embedded Systems. Prof. L. Thiele

Exam Spring Embedded Systems. Prof. L. Thiele Exam Spring 20 Embedded Systems Prof. L. Thiele NOTE: The given solution is only a proposal. For correctness, completeness, or understandability no responsibility is taken. Sommer 20 Eingebettete Systeme

More information

Pivoting. Reading: GV96 Section 3.4, Stew98 Chapter 3: 1.3

Pivoting. Reading: GV96 Section 3.4, Stew98 Chapter 3: 1.3 Pivoting Reading: GV96 Section 3.4, Stew98 Chapter 3: 1.3 In the previous discussions we have assumed that the LU factorization of A existed and the various versions could compute it in a stable manner.

More information

Iterative Solvers. Lab 6. Iterative Methods

Iterative Solvers. Lab 6. Iterative Methods Lab 6 Iterative Solvers Lab Objective: Many real-world problems of the form Ax = b have tens of thousands of parameters Solving such systems with Gaussian elimination or matrix factorizations could require

More information

sbpom: A parallel implementation of Princenton Ocean

sbpom: A parallel implementation of Princenton Ocean Manuscript Click here to view linked References 0 0 0 0 0 sbpom: A parallel implementation of Princenton Ocean Model Antoni Jordi and Dong-Ping Wang Institut Mediterrani d Estudis Avançats, IMEDEA (UIB-CSIC),

More information

Convergence Models and Surprising Results for the Asynchronous Jacobi Method

Convergence Models and Surprising Results for the Asynchronous Jacobi Method Convergence Models and Surprising Results for the Asynchronous Jacobi Method Jordi Wolfson-Pou School of Computational Science and Engineering Georgia Institute of Technology Atlanta, Georgia, United States

More information

Kasetsart University Workshop. Multigrid methods: An introduction

Kasetsart University Workshop. Multigrid methods: An introduction Kasetsart University Workshop Multigrid methods: An introduction Dr. Anand Pardhanani Mathematics Department Earlham College Richmond, Indiana USA pardhan@earlham.edu A copy of these slides is available

More information

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations Sparse Linear Systems Iterative Methods for Sparse Linear Systems Matrix Computations and Applications, Lecture C11 Fredrik Bengzon, Robert Söderlund We consider the problem of solving the linear system

More information

LU Factorization a 11 a 1 a 1n A = a 1 a a n (b) a n1 a n a nn L = l l 1 l ln1 ln 1 75 U = u 11 u 1 u 1n 0 u u n 0 u n...

LU Factorization a 11 a 1 a 1n A = a 1 a a n (b) a n1 a n a nn L = l l 1 l ln1 ln 1 75 U = u 11 u 1 u 1n 0 u u n 0 u n... .. Factorizations Reading: Trefethen and Bau (1997), Lecture 0 Solve the n n linear system by Gaussian elimination Ax = b (1) { Gaussian elimination is a direct method The solution is found after a nite

More information

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters

Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters Towards a highly-parallel PDE-Solver using Adaptive Sparse Grids on Compute Clusters HIM - Workshop on Sparse Grids and Applications Alexander Heinecke Chair of Scientific Computing May 18 th 2011 HIM

More information

Eddy-resolving Simulation of the World Ocean Circulation by using MOM3-based OGCM Code (OFES) Optimized for the Earth Simulator

Eddy-resolving Simulation of the World Ocean Circulation by using MOM3-based OGCM Code (OFES) Optimized for the Earth Simulator Chapter 1 Atmospheric and Oceanic Simulation Eddy-resolving Simulation of the World Ocean Circulation by using MOM3-based OGCM Code (OFES) Optimized for the Earth Simulator Group Representative Hideharu

More information

4th year Project demo presentation

4th year Project demo presentation 4th year Project demo presentation Colm Ó héigeartaigh CASE4-99387212 coheig-case4@computing.dcu.ie 4th year Project demo presentation p. 1/23 Table of Contents An Introduction to Quantum Computing The

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Lecture 11 Linear programming : The Revised Simplex Method

Lecture 11 Linear programming : The Revised Simplex Method Lecture 11 Linear programming : The Revised Simplex Method 11.1 The Revised Simplex Method While solving linear programming problem on a digital computer by regular simplex method, it requires storing

More information

Solving a non-linear partial differential equation for the simulation of tumour oxygenation

Solving a non-linear partial differential equation for the simulation of tumour oxygenation Solving a non-linear partial differential equation for the simulation of tumour oxygenation Julian Köllermeier, Lisa Kusch, Thorsten Lajewski MathCCES, RWTH Aachen Talk at Karolinska Institute, 19.09.2011

More information

Transposition Mechanism for Sparse Matrices on Vector Processors

Transposition Mechanism for Sparse Matrices on Vector Processors Transposition Mechanism for Sparse Matrices on Vector Processors Pyrrhos Stathis Stamatis Vassiliadis Sorin Cotofana Electrical Engineering Department, Delft University of Technology, Delft, The Netherlands

More information

James Demmel. CS 267: Applications of Parallel Computers. Lecture 17 - Structured Grids. Motifs

James Demmel. CS 267: Applications of Parallel Computers. Lecture 17 - Structured Grids. Motifs Motifs CS 267: Applications of Parallel Computers Lecture 17 - Structured Grids Topic of this lecture The Motifs (formerly Dwarfs ) from The Berkeley View (Asanovic et al.) Motifs form key computational

More information

Numerical Solution of Laplace's Equation

Numerical Solution of Laplace's Equation Syracuse University SURFACE Electrical Engineering and Computer Science Technical Reports College of Engineering and Computer Science 9-1992 Numerical Solution of Laplace's Equation Per Brinch Hansen Syracuse

More information

Cyclops Tensor Framework

Cyclops Tensor Framework Cyclops Tensor Framework Edgar Solomonik Department of EECS, Computer Science Division, UC Berkeley March 17, 2014 1 / 29 Edgar Solomonik Cyclops Tensor Framework 1/ 29 Definition of a tensor A rank r

More information

Porting a sphere optimization program from LAPACK to ScaLAPACK

Porting a sphere optimization program from LAPACK to ScaLAPACK Porting a sphere optimization program from LAPACK to ScaLAPACK Mathematical Sciences Institute, Australian National University. For presentation at Computational Techniques and Applications Conference

More information

Lecture 4. Writing parallel programs with MPI Measuring performance

Lecture 4. Writing parallel programs with MPI Measuring performance Lecture 4 Writing parallel programs with MPI Measuring performance Announcements Wednesday s office hour moved to 1.30 A new version of Ring (Ring_new) that handles linear sequences of message lengths

More information

Efficient implementation of the overlap operator on multi-gpus

Efficient implementation of the overlap operator on multi-gpus Efficient implementation of the overlap operator on multi-gpus Andrei Alexandru Mike Lujan, Craig Pelissier, Ben Gamari, Frank Lee SAAHPC 2011 - University of Tennessee Outline Motivation Overlap operator

More information

High-resolution finite volume methods for hyperbolic PDEs on manifolds

High-resolution finite volume methods for hyperbolic PDEs on manifolds High-resolution finite volume methods for hyperbolic PDEs on manifolds Randall J. LeVeque Department of Applied Mathematics University of Washington Supported in part by NSF, DOE Overview High-resolution

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Introduction to communication avoiding algorithms for direct methods of factorization in Linear Algebra

Introduction to communication avoiding algorithms for direct methods of factorization in Linear Algebra Introduction to communication avoiding algorithms for direct methods of factorization in Linear Algebra Laura Grigori Abstract Modern, massively parallel computers play a fundamental role in a large and

More information

Module 3: BASICS OF CFD. Part A: Finite Difference Methods

Module 3: BASICS OF CFD. Part A: Finite Difference Methods Module 3: BASICS OF CFD Part A: Finite Difference Methods THE CFD APPROACH Assembling the governing equations Identifying flow domain and boundary conditions Geometrical discretization of flow domain Discretization

More information

The Performance Evolution of the Parallel Ocean Program on the Cray X1

The Performance Evolution of the Parallel Ocean Program on the Cray X1 The Performance Evolution of the Parallel Ocean Program on the Cray X1 Patrick H. Worley Oak Ridge National Laboratory John Levesque Cray Inc. 46th Cray User Group Conference May 18, 2003 Knoxville Marriott

More information

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product Level-1 BLAS: SAXPY BLAS-Notation: S single precision (D for double, C for complex) A α scalar X vector P plus operation Y vector SAXPY: y = αx + y Vectorization of SAXPY (αx + y) by pipelining: page 8

More information

5.7 Cramer's Rule 1. Using Determinants to Solve Systems Assumes the system of two equations in two unknowns

5.7 Cramer's Rule 1. Using Determinants to Solve Systems Assumes the system of two equations in two unknowns 5.7 Cramer's Rule 1. Using Determinants to Solve Systems Assumes the system of two equations in two unknowns (1) possesses the solution and provided that.. The numerators and denominators are recognized

More information

An introduction to parallel algorithms

An introduction to parallel algorithms An introduction to parallel algorithms Knut Mørken Department of Informatics Centre of Mathematics for Applications University of Oslo Winter School on Parallel Computing Geilo January 20 25, 2008 1/26

More information

Numerical Methods - Numerical Linear Algebra

Numerical Methods - Numerical Linear Algebra Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear

More information

5. Surface Temperatures

5. Surface Temperatures 5. Surface Temperatures For this case we are given a thin rectangular sheet of metal, whose temperature we can conveniently measure at any points along its four edges. If the edge temperatures are held

More information

Parallel Sparse Matrix Vector Multiplication (PSC 4.3)

Parallel Sparse Matrix Vector Multiplication (PSC 4.3) Parallel Sparse Matrix Vector Multiplication (PSC 4.) original slides by Rob Bisseling, Universiteit Utrecht, accompanying the textbook Parallel Scientific Computation adapted for the lecture HPC Algorithms

More information

Structural Dynamics Lecture 7. Outline of Lecture 7. Multi-Degree-of-Freedom Systems (cont.) System Reduction. Vibration due to Movable Supports.

Structural Dynamics Lecture 7. Outline of Lecture 7. Multi-Degree-of-Freedom Systems (cont.) System Reduction. Vibration due to Movable Supports. Outline of Multi-Degree-of-Freedom Systems (cont.) System Reduction. Truncated Modal Expansion with Quasi-Static Correction. Guyan Reduction. Vibration due to Movable Supports. Earthquake Excitations.

More information

Towards Highly Parallel and Compute-Bound Computation of Eigenvectors of Matrices in Schur Form

Towards Highly Parallel and Compute-Bound Computation of Eigenvectors of Matrices in Schur Form Towards Highly Parallel and Compute-Bound Computation of Eigenvectors of Matrices in Schur Form Björn Adlerborn, Carl Christian Kjelgaard Mikkelsen, Lars Karlsson, Bo Kågström May 30, 2017 Abstract In

More information

More Science per Joule: Bottleneck Computing

More Science per Joule: Bottleneck Computing More Science per Joule: Bottleneck Computing Georg Hager Erlangen Regional Computing Center (RRZE) University of Erlangen-Nuremberg Germany PPAM 2013 September 9, 2013 Warsaw, Poland Motivation (1): Scalability

More information

632 CHAP. 11 EIGENVALUES AND EIGENVECTORS. QR Method

632 CHAP. 11 EIGENVALUES AND EIGENVECTORS. QR Method 632 CHAP 11 EIGENVALUES AND EIGENVECTORS QR Method Suppose that A is a real symmetric matrix In the preceding section we saw how Householder s method is used to construct a similar tridiagonal matrix The

More information

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y)

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y) 5.1 Banded Storage u = temperature u= u h temperature at gridpoints u h = 1 u= Laplace s equation u= h u = u h = grid size u=1 The five-point difference operator 1 u h =1 uh (x + h, y) 2u h (x, y)+u h

More information

Finite difference method for elliptic problems: I

Finite difference method for elliptic problems: I Finite difference method for elliptic problems: I Praveen. C praveen@math.tifrbng.res.in Tata Institute of Fundamental Research Center for Applicable Mathematics Bangalore 560065 http://math.tifrbng.res.in/~praveen

More information

A simple FEM solver and its data parallelism

A simple FEM solver and its data parallelism A simple FEM solver and its data parallelism Gundolf Haase Institute for Mathematics and Scientific Computing University of Graz, Austria Chile, Jan. 2015 Partial differential equation Considered Problem

More information

REDISTRIBUTION OF TENSORS FOR DISTRIBUTED CONTRACTIONS

REDISTRIBUTION OF TENSORS FOR DISTRIBUTED CONTRACTIONS REDISTRIBUTION OF TENSORS FOR DISTRIBUTED CONTRACTIONS THESIS Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of the Ohio State University By

More information

Numerical Methods Process Systems Engineering ITERATIVE METHODS. Numerical methods in chemical engineering Edwin Zondervan

Numerical Methods Process Systems Engineering ITERATIVE METHODS. Numerical methods in chemical engineering Edwin Zondervan IERAIVE MEHODS Numerical methods in chemical engineering Edwin Zondervan 1 OVERVIEW Iterative methods for large systems of equations We will solve Laplace s equation for steady state heat conduction LAPLACE

More information

MATHEMATICAL ENGINEERING TECHNICAL REPORTS

MATHEMATICAL ENGINEERING TECHNICAL REPORTS MATHEMATICAL ENGINEERING TECHNICAL REPORTS Combinatorial Relaxation Algorithm for the Entire Sequence of the Maximum Degree of Minors in Mixed Polynomial Matrices Shun SATO (Communicated by Taayasu MATSUO)

More information