Loop Interchange. Loop Transformations. Taxonomy. do I = 1, N do J = 1, N S 1 A(I,J) = A(I-1,J) + 1 enddo enddo. Loop unrolling.

Size: px
Start display at page:

Download "Loop Interchange. Loop Transformations. Taxonomy. do I = 1, N do J = 1, N S 1 A(I,J) = A(I-1,J) + 1 enddo enddo. Loop unrolling."

Transcription

1 Advanced Topics Which Loops are Parallel? review Optimization for parallel machines and memory hierarchies Last Time Dependence analysis Today Loop transformations An example - McKinley, Carr, Tseng loop transformations to improve cache performance do = 1, N do J = 1, N S 1 A(,J) = A(,J-1) + 1 do = 1, N do J = 1, N S 2 A(,J) = A(-1,J-1) + 1 do = 1, N S 3 do J = 1, N B(,J) = B(-1,J+1) + 1 J A dependence D = (d 1,...,d k ) is carried at level i, if d i is the first nonzero element of the distance/direction vector. A loop l i is parallel, if a dependence D j carried at level i. Either distance vector direction vector D j d 1,...,d i 1 > 0 d 1,...,d i 1 = < OR d 1,...,d i = 0 d 1,...,d i = = CS 380C Lecture 24 1 Locality Analysis CS 380C Lecture 24 2 Locality Analysis

2 Loop Transformations Loop nterchange Taxonomy Loop unrolling Loop interchange Loop fusion do = 1, N do J = 1, N S 1 A(,J) = A(-1,J) + 1 enddo enddo Loop distribution (a.k.a. fission) Loop skewing Strip mine and interchange (a.k.a. tiling & blocking) Unroll-and-jam (a variety of tiling) do = 1, N do J = 1, N S 2 B(,J) = B(-1,J+1) + 1 enddo enddo J Loop reversal Loop interchange is safe iff it does not reverse the execution order of the source and sink of any dependence in the nest, i.e., if the distance vector would become negative. Enables parallelization of outer and/or inner loops Changes execution order of the statements Can improve reuse J CS 380C Lecture 24 3 Locality Analysis CS 380C Lecture 24 4 Locality Analysis

3 Loop Fusion Loop Distribution s 1 s 2 s 1 s 2 = loop fusion = do i = 2, n a(i) = b(i) do i = 2, n c(i) = b(i) a(i-1) do i = 2, n a(i) = b(i) c(i) = b(i) a(i-1) = loop distribution = s 1 s 2 s 2 s 1 do i = 2, n a(i) = b(i) c(i) = b(i) a(i+1) do i = 2, n c(i) = b(i) a(i+1) do i = 2, n a(i) = b(i) = loop distribution = Loop Fusion is safe iff no forward dependence between nests becomes a backward loop carried dependence. Would fusion be safe if s 2 referenced a(i+1)? Benefits Reuse Eliminates synchronization between parallel loops Reduced loop overhead Loop Distribution is safe iff statements involved in a cycle of dependences (recurrence) remain in the same loop, & if a dependence between two statements placed in different loops, it must be forward. Benefits Partial parallelization Reduces resource requirements Enables other transformations CS 380C Lecture 24 5 Locality Analysis CS 380C Lecture 24 6 Locality Analysis

4 Loop Skewing and nterchange Strip Mine and nterchange do = 1, 100 do J = 2, 100 A(,J) = A(-1,J) + A(,J-1) do = 1, 100 do J = +2, i+100 A(,J) = A(-1,J) + A(,J-1) = Strip Mine = do = 1, n do J = 1, n A(J,) = B(J) C() = nterchange = do = 1, n, tile do =, + tile -1 do J = 1, n A(J,) = B(J) C() do = 1, n, tile do J = 1, n do =, + tile -1 A(J,) = B(J) C() J J Loop skewing is always safe. The original dependences, (1,0) & (0,1) prevent interchange and parallelization. t changes the dependences to (1,1) & (0,1) and after interchange, (1,1) & (1,0), the inner loops is parallel. Enables inner loop parallelization J Strip Mining is always safe. With interchange it enables loop invariant reuse by changing the shape of the iteration space J CS 380C Lecture 24 7 Locality Analysis CS 380C Lecture 24 8 Locality Analysis

5 mproving Reuse with Loop Transformations Motivation: Enable portable programming without sacrificing performance optimization framework cache model compound loop transformation algorithm permutation distribution results transformation simulation performance fusion reversal Optimization Framework 1. improve order of memory accesses to exploit all levels of the memory hierarchy = cache line size 2. Tile to fit in cache, second level cache, TLB = size of cache(s), replacement policy, associativity, 3. register tiling via unroll-and-jam and scalar replacement = number and type of registers Assumptions cls - the cache line size in terms of data items Fortran column-major order interference occurs rarely for small numbers of inner loop iterations CS 380C Lecture 24 9 Locality Analysis CS 380C Lecture Locality Analysis

6 Reuse temporal locality spatial locality reuse of a specific location reuse of adjacent locations (cache lines) do j = 1, 100 do k = 1, 100 do i = 1, 100 c(i,j) = c(i,j) + a(i,k) To Determine Temporal and Spatial Reuse: b(k,j) Loop Cost of dmxpy From Linpackd Loop Cost in cache lines, cls = 4 do j = 1, n2 do i = 1, n1 y(i)= y(i) + x(j) m(i,j) reference group loop i loop j y(i) 1/4 n1 n2 1 n1 x(j) 1 n2 1/4 n2 n1 m(i,j) 1/4 n1 n2 n2 n1 total loop cost 1/2 n1 n2 + n2 5/4 n1 n2 + n1 for each loop l in a nest, consider l innermost group references with temporal locality reference groups compute the cost in cache lines accessed loop cost rank the loops based on their cost = memory order CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis

7 Loop permutation key insight: f loop l promotes more reuse than loop k at the innermost position, then it will also promote more reuse at any outer position. memory order (a) is it legal? (b) if not find a nearbypermutation O(n 2 ) = avoids combinatorial search present in other algorithms NearbyPermutation nput: O = {i 1,i 2,...,i n }, the original loop ordering DV = set of original legal direction vectors for l n L = {i σ1,i σ2,...,i σn }, a permutation ofo Output: P a nearby permutation ofo Algorithm: P = /0 ; k = 0 ; m = n whilel /0 for j = 1,m l = l j L if direction vectors for {p 1,..., p k,l} are legal P = {p 1,..., p k,l} L =L {l} ; k = k+1 ; m = m 1 break for endif endfor endwhile CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis

8 Matrix Multiply - execution times in seconds 60 Execution Times (in seconds) vs. Loop Organization Sun Sparc2 ntel i860 BM RS/6000 JK KJ JK JK KJ KJ - ORDER Loop Fusion Erlebacher Fortran 90 loops for AD ntegration DO = 2, N X(,1:N) = X(,1:N) - X(-1,1:N) A(,1:N)/B(-1,1:N) B(,1:N) = B(,1:N) - A(,1:N) A(,1:N)/B(-1,1:N) DO = 2, N simple translation to Fortran 77 DO K = 1, N X(,K) = X(,K) - X(-1,K) A(,K)/B(-1,K) DO K = 1, N B(,K) = B(,K) - A(,K) A(,K)/B(-1,K) DO K = 1, N loop fusion & interchange DO = 2, N X(,K) = X(,K) - X(-1,K) A(,K)/B(-1,K) B(,K) = B(,K) - A(,K) A(,K)/B(-1,K) Execution times in seconds. Memory Order Processor Original Distributed Fused Sun Sparc ntel i BM RS Sun Sparc2 ntel i860 BM RS/6000 JK KJ JK JK KJ KJ - ORDER CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis

9 Loop Distribution Cholesky Factorization Algorithm Summary DO K = 1,N DO K = 1,N A(K,K) = SQRT(A(K,K)) A(K,K) = SQRT(A(K,K)) DO = K+1,N DO = K+1,N A(,K) = A(,K)/A(K,K) A(,K) = A(,K)/A(K,K) DO J = K+1, DO J = K,N A(,J) = A(,K)*A(J,K) DO =J+1,N A(,J) = A(,K)*A(J,K) 12 = loop distribution & triangular interchange = 10 8 Execution times (in seconds) for each nestl in a set of adjacent nests compute reference groups for each l i compute loop cost for each l i and sort permutation with reversal? fuse inner loops and permute? distribute and permute? 6 fuse nestsl? Sun Sparc2 ntel i860 BM RS/6000 KJ JK KJ KJ JK JK - ORDER CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis

10 Number of Programs Results Achieving Memory Order for Loop Nests test suite Perfect Benchmarks SPEC Benchmarks NAS Benchmarks 4 additional programs 16 experiments on our ability to transform programs simulated hit rates for RS/6000 and i860 execution times on an RS/ Original Final <= 20 >=40 >= 60 >= 70 >=80 >= 90 Percentage of Loop Nests in Memory Order CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis

11 Number of Programs Achieving Memory Order for nner Loops Simulated Cache Hit Rates cache1: RS/ K cache, 4-way, 128 byte cache line cache2: i860 8K cache, 2-way, 32 byte cache line for RS/ Original Final <= 20 >=40 >= 60 >= 70 >=80 >= 90 Percent of nner Loops in Memory Order = in 12 of 27 programs, optimized procedures started with 100% hit rates = average hit rate for original programs Memory Order Opt. Proc. Hit Rates Perfect Org Prm Fail Cache 1 Cache 2 Club % org final org final adm arc2d bdna dyfesm flo mdg mg3d ocean qcd spec track trfd CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis

12 Performance Results in Seconds on RS6000 Program Original Transformed Speedup arc2d dyfesm flo dnasa7 (btrix) dnasa7 (emit) dnasa7 (gmtry) dnasa7(vpenta) applu appsp linpackd simple wave Summary Recap of Transformation Results 80 % of nests were permuted into memory order 85 % of inner loops were permuted into memory order loop permutation is the most effective optimization 229 candidates for fusion, resulting in 80 nests 23 nests were distributed, resulting in 52 nests Observations many programs started out with high hit ratios smaller cache sizes result in higher improvements in hit rates = regardless of the original target architecture, compiler optimizations improve locality for uniprocessors CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis

13 Next Time Compiling for an EDGE Arcuitecture Read: D. Burger et al., Compiling for TRPS: Scaling to the End of Silicon with EDGE Architectures, EEE Computer, pp , July CS 380C Lecture Locality Analysis

Dependence Analysis. Dependence Examples. Last Time: Brief introduction to interprocedural analysis. do I = 2, 100 A(I) = A(I-1) + 1 enddo

Dependence Analysis. Dependence Examples. Last Time: Brief introduction to interprocedural analysis. do I = 2, 100 A(I) = A(I-1) + 1 enddo Dependence Analysis Dependence Examples Last Time: Brief introduction to interprocedural analysis Today: Optimization for parallel machines and memory hierarchies Dependence analysis Loop transformations

More information

Announcements PA2 due Friday Midterm is Wednesday next week, in class, one week from today

Announcements PA2 due Friday Midterm is Wednesday next week, in class, one week from today Loop Transformations Announcements PA2 due Friday Midterm is Wednesday next week, in class, one week from today Today Recall stencil computations Intro to loop transformations Data dependencies between

More information

Advanced Restructuring Compilers. Advanced Topics Spring 2009 Prof. Robert van Engelen

Advanced Restructuring Compilers. Advanced Topics Spring 2009 Prof. Robert van Engelen Advanced Restructuring Compilers Advanced Topics Spring 2009 Prof. Robert van Engelen Overview Data and control dependences The theory and practice of data dependence analysis K-level loop-carried dependences

More information

Fortran program + Partial data layout specifications Data Layout Assistant.. regular problems. dynamic remapping allowed Invoked only a few times Not part of the compiler Can use expensive techniques HPF

More information

Improving Memory Hierarchy Performance Through Combined Loop. Interchange and Multi-Level Fusion

Improving Memory Hierarchy Performance Through Combined Loop. Interchange and Multi-Level Fusion Improving Memory Hierarchy Performance Through Combined Loop Interchange and Multi-Level Fusion Qing Yi Ken Kennedy Computer Science Department, Rice University MS-132 Houston, TX 77005 Abstract Because

More information

Loop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1

Loop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1 Loop Scheduling and Software Pipelining 2008-04-24 \course\cpeg421-08s\topic-7.ppt 1 Reading List Slides: Topic 7 and 7a Other papers as assigned in class or homework: 2008-04-24 \course\cpeg421-08s\topic-7.ppt

More information

Saturday, April 23, Dependence Analysis

Saturday, April 23, Dependence Analysis Dependence Analysis Motivating question Can the loops on the right be run in parallel? i.e., can different processors run different iterations in parallel? What needs to be true for a loop to be parallelizable?

More information

CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University

CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution

More information

Loop Parallelization Techniques and dependence analysis

Loop Parallelization Techniques and dependence analysis Loop Parallelization Techniques and dependence analysis Data-Dependence Analysis Dependence-Removing Techniques Parallelizing Transformations Performance-enchancing Techniques 1 When can we run code in

More information

COMP 515: Advanced Compilation for Vector and Parallel Processors. Vivek Sarkar Department of Computer Science Rice University

COMP 515: Advanced Compilation for Vector and Parallel Processors. Vivek Sarkar Department of Computer Science Rice University COMP 515: Advanced Compilation for Vector and Parallel Processors Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP 515 Lecture 3 13 January 2009 Acknowledgments Slides

More information

A Polynomial-Time Algorithm for Memory Space Reduction

A Polynomial-Time Algorithm for Memory Space Reduction A Polynomial-Time Algorithm for Memory Space Reduction Yonghong Song Cheng Wang Zhiyuan Li Sun Microsystems, Inc. Department of Computer Sciences 4150 Network Circle Purdue University Santa Clara, CA 95054

More information

DO i 1 = L 1, U 1 DO i 2 = L 2, U 2... DO i n = L n, U n. S 1 A(f 1 (i 1,...,i, n ),...,f, m (i 1,...,i, n )) =...

DO i 1 = L 1, U 1 DO i 2 = L 2, U 2... DO i n = L n, U n. S 1 A(f 1 (i 1,...,i, n ),...,f, m (i 1,...,i, n )) =... 15-745 Lecture 7 Data Dependence in Loops 2 Complex spaces Delta Test Merging vectors Copyright Seth Goldstein, 2008-9 Based on slides from Allen&Kennedy 1 The General Problem S 1 A(f 1 (i 1,,i n ),,f

More information

Vector Lane Threading

Vector Lane Threading Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program

More information

1 Problem 1 Solution. 1.1 Common Mistakes. In order to show that the matrix L k is the inverse of the matrix M k, we need to show that

1 Problem 1 Solution. 1.1 Common Mistakes. In order to show that the matrix L k is the inverse of the matrix M k, we need to show that 1 Problem 1 Solution In order to show that the matrix L k is the inverse of the matrix M k, we need to show that Since we need to show that Since L k M k = I (or M k L k = I). L k = I + m k e T k, M k

More information

COMP 515: Advanced Compilation for Vector and Parallel Processors

COMP 515: Advanced Compilation for Vector and Parallel Processors COMP 515: Advanced Compilation for Vector and Parallel Processors Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP 515 Lecture 4 1 September, 2011 1 Acknowledgments Slides

More information

' $ Dependence Analysis & % 1

' $ Dependence Analysis & % 1 Dependence Analysis 1 Goals - determine what operations can be done in parallel - determine whether the order of execution of operations can be altered Basic idea - determine a partial order on operations

More information

DO i 1 = L 1, U 1 DO i 2 = L 2, U 2... DO i n = L n, U n. S 1 A(f 1 (i 1,...,i, n ),...,f, m (i 1,...,i, n )) =...

DO i 1 = L 1, U 1 DO i 2 = L 2, U 2... DO i n = L n, U n. S 1 A(f 1 (i 1,...,i, n ),...,f, m (i 1,...,i, n )) =... 15-745 Lecture 7 Data Dependence in Loops 2 Delta Test Merging vectors Copyright Seth Goldstein, 2008 The General Problem S 1 A(f 1 (i 1,,i n ),,f m (i 1,,i n )) = S 2 = A(g 1 (i 1,,i n ),,g m (i 1,,i

More information

CS 598: Communication Cost Analysis of Algorithms Lecture 9: The Ideal Cache Model and the Discrete Fourier Transform

CS 598: Communication Cost Analysis of Algorithms Lecture 9: The Ideal Cache Model and the Discrete Fourier Transform CS 598: Communication Cost Analysis of Algorithms Lecture 9: The Ideal Cache Model and the Discrete Fourier Transform Edgar Solomonik University of Illinois at Urbana-Champaign September 21, 2016 Fast

More information

Homework #2: assignments/ps2/ps2.htm Due Thursday, March 7th.

Homework #2:   assignments/ps2/ps2.htm Due Thursday, March 7th. Homework #2: http://www.cs.cornell.edu/courses/cs612/2002sp/ assignments/ps2/ps2.htm Due Thursday, March 7th. 1 Transformations and Dependences 2 Recall: Polyhedral algebra tools for determining emptiness

More information

Parallel Scientific Computing

Parallel Scientific Computing IV-1 Parallel Scientific Computing Matrix-vector multiplication. Matrix-matrix multiplication. Direct method for solving a linear equation. Gaussian Elimination. Iterative method for solving a linear equation.

More information

Practical Dependence Testing *

Practical Dependence Testing * Practical Dependence Testing * Gina Goff Ken Kennedy Cllau-Wen Tseng Department of Computer Science Rice Houston, University TX 77251-1892 Abstract Precise and efficient dependence tests are essential

More information

CS 161 Summer 2009 Homework #2 Sample Solutions

CS 161 Summer 2009 Homework #2 Sample Solutions CS 161 Summer 2009 Homework #2 Sample Solutions Regrade Policy: If you believe an error has been made in the grading of your homework, you may resubmit it for a regrade. If the error consists of more than

More information

Compiler Optimisation

Compiler Optimisation Compiler Optimisation 8 Dependence Analysis Hugh Leather IF 1.18a hleather@inf.ed.ac.uk Institute for Computing Systems Architecture School of Informatics University of Edinburgh 2018 Introduction This

More information

Outline. policies for the first part. with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014

Outline. policies for the first part. with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014 Outline 1 midterm exam on Friday 11 July 2014 policies for the first part 2 questions with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014 Intro

More information

Automatic Loop Interchange

Automatic Loop Interchange RETROSPECTIVE: Automatic Loop Interchange Randy Allen Catalytic Compilers 1900 Embarcadero Rd, #206 Palo Alto, CA. 94043 jra@catacomp.com Ken Kennedy Department of Computer Science Rice University Houston,

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 2 Systems of Linear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

MATH 3511 Lecture 1. Solving Linear Systems 1

MATH 3511 Lecture 1. Solving Linear Systems 1 MATH 3511 Lecture 1 Solving Linear Systems 1 Dmitriy Leykekhman Spring 2012 Goals Review of basic linear algebra Solution of simple linear systems Gaussian elimination D Leykekhman - MATH 3511 Introduction

More information

Solving Dense Linear Systems I

Solving Dense Linear Systems I Solving Dense Linear Systems I Solving Ax = b is an important numerical method Triangular system: [ l11 l 21 if l 11, l 22 0, ] [ ] [ ] x1 b1 = l 22 x 2 b 2 x 1 = b 1 /l 11 x 2 = (b 2 l 21 x 1 )/l 22 Chih-Jen

More information

Engineering Computation

Engineering Computation Engineering Computation Systems of Linear Equations_1 1 Learning Objectives for Lecture 1. Motivate Study of Systems of Equations and particularly Systems of Linear Equations. Review steps of Gaussian

More information

Topic 17. Analysis of Algorithms

Topic 17. Analysis of Algorithms Topic 17 Analysis of Algorithms Analysis of Algorithms- Review Efficiency of an algorithm can be measured in terms of : Time complexity: a measure of the amount of time required to execute an algorithm

More information

2.1 Gaussian Elimination

2.1 Gaussian Elimination 2. Gaussian Elimination A common problem encountered in numerical models is the one in which there are n equations and n unknowns. The following is a description of the Gaussian elimination method for

More information

Pattern History Table. Global History Register. Pattern History Table. Branch History Pattern Pattern History Bits

Pattern History Table. Global History Register. Pattern History Table. Branch History Pattern Pattern History Bits An Enhanced Two-Level Adaptive Multiple Branch Prediction for Superscalar Processors Jong-bok Lee, Soo-Mook Moon and Wonyong Sung fjblee@mpeg,smoon@altair,wysung@dspg.snu.ac.kr School of Electrical Engineering,

More information

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i )

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i ) Direct Methods for Linear Systems Chapter Direct Methods for Solving Linear Systems Per-Olof Persson persson@berkeleyedu Department of Mathematics University of California, Berkeley Math 18A Numerical

More information

Section Summary. Sequences. Recurrence Relations. Summations Special Integer Sequences (optional)

Section Summary. Sequences. Recurrence Relations. Summations Special Integer Sequences (optional) Section 2.4 Section Summary Sequences. o Examples: Geometric Progression, Arithmetic Progression Recurrence Relations o Example: Fibonacci Sequence Summations Special Integer Sequences (optional) Sequences

More information

SOLVING LINEAR SYSTEMS

SOLVING LINEAR SYSTEMS SOLVING LINEAR SYSTEMS We want to solve the linear system a, x + + a,n x n = b a n, x + + a n,n x n = b n This will be done by the method used in beginning algebra, by successively eliminating unknowns

More information

Chapter 8 Gauss Elimination. Gab-Byung Chae

Chapter 8 Gauss Elimination. Gab-Byung Chae Chapter 8 Gauss Elimination Gab-Byung Chae 2008 5 19 2 Chapter Objectives How to solve small sets of linear equations with the graphical method and Cramer s rule Gauss Elimination Understanding how to

More information

Matrix-Matrix Multiplication

Matrix-Matrix Multiplication Chapter Matrix-Matrix Multiplication In this chapter, we discuss matrix-matrix multiplication We start by motivating its definition Next, we discuss why its implementation inherently allows high performance

More information

Ax = b. Systems of Linear Equations. Lecture Notes to Accompany. Given m n matrix A and m-vector b, find unknown n-vector x satisfying

Ax = b. Systems of Linear Equations. Lecture Notes to Accompany. Given m n matrix A and m-vector b, find unknown n-vector x satisfying Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T Heath Chapter Systems of Linear Equations Systems of Linear Equations Given m n matrix A and m-vector

More information

The purpose of computing is insight, not numbers. Richard Wesley Hamming

The purpose of computing is insight, not numbers. Richard Wesley Hamming Systems of Linear Equations The purpose of computing is insight, not numbers. Richard Wesley Hamming Fall 2010 1 Topics to Be Discussed This is a long unit and will include the following important topics:

More information

LU Factorization a 11 a 1 a 1n A = a 1 a a n (b) a n1 a n a nn L = l l 1 l ln1 ln 1 75 U = u 11 u 1 u 1n 0 u u n 0 u n...

LU Factorization a 11 a 1 a 1n A = a 1 a a n (b) a n1 a n a nn L = l l 1 l ln1 ln 1 75 U = u 11 u 1 u 1n 0 u u n 0 u n... .. Factorizations Reading: Trefethen and Bau (1997), Lecture 0 Solve the n n linear system by Gaussian elimination Ax = b (1) { Gaussian elimination is a direct method The solution is found after a nite

More information

CS 700: Quantitative Methods & Experimental Design in Computer Science

CS 700: Quantitative Methods & Experimental Design in Computer Science CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,

More information

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 2. Systems of Linear Equations

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 2. Systems of Linear Equations Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T. Heath Chapter 2 Systems of Linear Equations Copyright c 2001. Reproduction permitted only for noncommercial,

More information

Gaussian Elimination without/with Pivoting and Cholesky Decomposition

Gaussian Elimination without/with Pivoting and Cholesky Decomposition Gaussian Elimination without/with Pivoting and Cholesky Decomposition Gaussian Elimination WITHOUT pivoting Notation: For a matrix A R n n we define for k {,,n} the leading principal submatrix a a k A

More information

COMP 515: Advanced Compilation for Vector and Parallel Processors. Vivek Sarkar Department of Computer Science Rice University

COMP 515: Advanced Compilation for Vector and Parallel Processors. Vivek Sarkar Department of Computer Science Rice University COMP 515: Advanced Compilation for Vector and Parallel Processors Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP 515 Lecture 10 10 February 2009 Announcement Feb 17 th

More information

Enumerate all possible assignments and take the An algorithm is a well-defined computational

Enumerate all possible assignments and take the An algorithm is a well-defined computational EMIS 8374 [Algorithm Design and Analysis] 1 EMIS 8374 [Algorithm Design and Analysis] 2 Designing and Evaluating Algorithms A (Bad) Algorithm for the Assignment Problem Enumerate all possible assignments

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 2: Direct Methods PD Dr.

More information

CSE 160 Lecture 13. Numerical Linear Algebra

CSE 160 Lecture 13. Numerical Linear Algebra CSE 16 Lecture 13 Numerical Linear Algebra Announcements Section will be held on Friday as announced on Moodle Midterm Return 213 Scott B Baden / CSE 16 / Fall 213 2 Today s lecture Gaussian Elimination

More information

Optimized LU-decomposition with Full Pivot for Small Batched Matrices S3069

Optimized LU-decomposition with Full Pivot for Small Batched Matrices S3069 Optimized LU-decomposition with Full Pivot for Small Batched Matrices S369 Ian Wainwright High Performance Consulting Sweden ian.wainwright@hpcsweden.se Based on work for GTC 212: 1x speed-up vs multi-threaded

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra R. J. Renka Department of Computer Science & Engineering University of North Texas 02/03/2015 Notation and Terminology R n is the Euclidean n-dimensional linear space over the

More information

Data Dependences and Parallelization. Stanford University CS243 Winter 2006 Wei Li 1

Data Dependences and Parallelization. Stanford University CS243 Winter 2006 Wei Li 1 Data Dependences and Parallelization Wei Li 1 Agenda Introduction Single Loop Nested Loops Data Dependence Analysis 2 Motivation DOALL loops: loops whose iterations can execute in parallel for i = 11,

More information

COURSE Numerical methods for solving linear systems. Practical solving of many problems eventually leads to solving linear systems.

COURSE Numerical methods for solving linear systems. Practical solving of many problems eventually leads to solving linear systems. COURSE 9 4 Numerical methods for solving linear systems Practical solving of many problems eventually leads to solving linear systems Classification of the methods: - direct methods - with low number of

More information

Advanced Compiler Construction

Advanced Compiler Construction CS 526 Advanced Compiler Construction http://misailo.cs.illinois.edu/courses/cs526 DEPENDENCE ANALYSIS The slides adapted from Vikram Adve and David Padua Kinds of Data Dependence Direct Dependence X =

More information

CE 221 Data Structures and Algorithms. Chapter 7: Sorting (Insertion Sort, Shellsort)

CE 221 Data Structures and Algorithms. Chapter 7: Sorting (Insertion Sort, Shellsort) CE 221 Data Structures and Algorithms Chapter 7: Sorting (Insertion Sort, Shellsort) Text: Read Weiss, 7.1 7.4 1 Preliminaries Main memory sorting algorithms All algorithms are Interchangeable; an array

More information

Fundamentals of Engineering Analysis (650163)

Fundamentals of Engineering Analysis (650163) Philadelphia University Faculty of Engineering Communications and Electronics Engineering Fundamentals of Engineering Analysis (6563) Part Dr. Omar R Daoud Matrices: Introduction DEFINITION A matrix is

More information

Lecture: Pipelining Basics

Lecture: Pipelining Basics Lecture: Pipelining Basics Topics: Performance equations wrap-up, Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4:

More information

Matrix-Matrix Multiplication

Matrix-Matrix Multiplication Week5 Matrix-Matrix Multiplication 51 Opening Remarks 511 Composing Rotations Homework 5111 Which of the following statements are true: cosρ + σ + τ cosτ sinτ cosρ + σ sinρ + σ + τ sinτ cosτ sinρ + σ cosρ

More information

Solving PDEs with CUDA Jonathan Cohen

Solving PDEs with CUDA Jonathan Cohen Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear

More information

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product Level-1 BLAS: SAXPY BLAS-Notation: S single precision (D for double, C for complex) A α scalar X vector P plus operation Y vector SAXPY: y = αx + y Vectorization of SAXPY (αx + y) by pipelining: page 8

More information

Organization of a Modern Compiler. Middle1

Organization of a Modern Compiler. Middle1 Organization of a Modern Compiler Source Program Front-end syntax analysis + type-checking + symbol table High-level Intermediate Representation (loops,array references are preserved) Middle1 loop-level

More information

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization

More information

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark Block AIR Methods For Multicore and GPU Per Christian Hansen Hans Henrik B. Sørensen Technical University of Denmark Model Problem and Notation Parallel-beam 3D tomography exact solution exact data noise

More information

A Review of Matrix Analysis

A Review of Matrix Analysis Matrix Notation Part Matrix Operations Matrices are simply rectangular arrays of quantities Each quantity in the array is called an element of the matrix and an element can be either a numerical value

More information

Systems of Linear Equations

Systems of Linear Equations Systems of Linear Equations Last time, we found that solving equations such as Poisson s equation or Laplace s equation on a grid is equivalent to solving a system of linear equations. There are many other

More information

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same.

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same. Introduction Matrix Operations Matrix: An m n matrix A is an m-by-n array of scalars from a field (for example real numbers) of the form a a a n a a a n A a m a m a mn The order (or size) of A is m n (read

More information

Lecture 5: Loop Invariants and Insertion-sort

Lecture 5: Loop Invariants and Insertion-sort Lecture 5: Loop Invariants and Insertion-sort COMS10007 - Algorithms Dr. Christian Konrad 11.01.2019 Dr. Christian Konrad Lecture 5: Loop Invariants and Insertion-sort 1 / 12 Proofs by Induction Structure

More information

Lecture 12. Linear systems of equations II. a 13. a 12. a 14. a a 22. a 23. a 34 a 41. a 32. a 33. a 42. a 43. a 44)

Lecture 12. Linear systems of equations II. a 13. a 12. a 14. a a 22. a 23. a 34 a 41. a 32. a 33. a 42. a 43. a 44) 1 Introduction Lecture 12 Linear systems of equations II We have looked at Gauss-Jordan elimination and Gaussian elimination as ways to solve a linear system Ax=b. We now turn to the LU decomposition,

More information

CS 4407 Algorithms Lecture 3: Iterative and Divide and Conquer Algorithms

CS 4407 Algorithms Lecture 3: Iterative and Divide and Conquer Algorithms CS 4407 Algorithms Lecture 3: Iterative and Divide and Conquer Algorithms Prof. Gregory Provan Department of Computer Science University College Cork 1 Lecture Outline CS 4407, Algorithms Growth Functions

More information

CSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo

CSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a

More information

Lecture 12 (Tue, Mar 5) Gaussian elimination and LU factorization (II)

Lecture 12 (Tue, Mar 5) Gaussian elimination and LU factorization (II) Math 59 Lecture 2 (Tue Mar 5) Gaussian elimination and LU factorization (II) 2 Gaussian elimination - LU factorization For a general n n matrix A the Gaussian elimination produces an LU factorization if

More information

Algorithms in Braid Groups

Algorithms in Braid Groups Algorithms in Braid Groups Matthew J. Campagna matthew.campagna@pb.com Secure Systems Pitney Bowes, Inc. Abstract Braid Groups have recently been considered for use in Public-Key Cryptographic Systems.

More information

April 26, Applied mathematics PhD candidate, physics MA UC Berkeley. Lecture 4/26/2013. Jed Duersch. Spd matrices. Cholesky decomposition

April 26, Applied mathematics PhD candidate, physics MA UC Berkeley. Lecture 4/26/2013. Jed Duersch. Spd matrices. Cholesky decomposition Applied mathematics PhD candidate, physics MA UC Berkeley April 26, 2013 UCB 1/19 Symmetric positive-definite I Definition A symmetric matrix A R n n is positive definite iff x T Ax > 0 holds x 0 R n.

More information

The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework and Experimental Evaluation

The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework and Experimental Evaluation The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework and Experimental Evaluation Rezaul Alam Chowdhury Vijaya Ramachandran UTCS Technical Report TR-06-04 March 9, 006 Abstract The cache-oblivious

More information

IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM

IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM Bogdan OANCEA PhD, Associate Professor, Artife University, Bucharest, Romania E-mail: oanceab@ie.ase.ro Abstract:

More information

Jim Lambers MAT 610 Summer Session Lecture 2 Notes

Jim Lambers MAT 610 Summer Session Lecture 2 Notes Jim Lambers MAT 610 Summer Session 2009-10 Lecture 2 Notes These notes correspond to Sections 2.2-2.4 in the text. Vector Norms Given vectors x and y of length one, which are simply scalars x and y, the

More information

1 Closest Pair of Points on the Plane

1 Closest Pair of Points on the Plane CS 31: Algorithms (Spring 2019): Lecture 5 Date: 4th April, 2019 Topic: Divide and Conquer 3: Closest Pair of Points on a Plane Disclaimer: These notes have not gone through scrutiny and in all probability

More information

Linear Algebra: Lecture notes from Kolman and Hill 9th edition.

Linear Algebra: Lecture notes from Kolman and Hill 9th edition. Linear Algebra: Lecture notes from Kolman and Hill 9th edition Taylan Şengül March 20, 2019 Please let me know of any mistakes in these notes Contents Week 1 1 11 Systems of Linear Equations 1 12 Matrices

More information

Measurement & Performance

Measurement & Performance Measurement & Performance Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law Topics 2 Page The Nature of Time real (i.e. wall clock) time = User Time: time spent

More information

Measurement & Performance

Measurement & Performance Measurement & Performance Topics Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law 2 The Nature of Time real (i.e. wall clock) time = User Time: time spent executing

More information

Register Allocation. Maryam Siahbani CMPT 379 4/5/2016 1

Register Allocation. Maryam Siahbani CMPT 379 4/5/2016 1 Register Allocation Maryam Siahbani CMPT 379 4/5/2016 1 Register Allocation Intermediate code uses unlimited temporaries Simplifying code generation and optimization Complicates final translation to assembly

More information

A Divide-and-Conquer Algorithm for Functions of Triangular Matrices

A Divide-and-Conquer Algorithm for Functions of Triangular Matrices A Divide-and-Conquer Algorithm for Functions of Triangular Matrices Ç. K. Koç Electrical & Computer Engineering Oregon State University Corvallis, Oregon 97331 Technical Report, June 1996 Abstract We propose

More information

Today s class. Linear Algebraic Equations LU Decomposition. Numerical Methods, Fall 2011 Lecture 8. Prof. Jinbo Bi CSE, UConn

Today s class. Linear Algebraic Equations LU Decomposition. Numerical Methods, Fall 2011 Lecture 8. Prof. Jinbo Bi CSE, UConn Today s class Linear Algebraic Equations LU Decomposition 1 Linear Algebraic Equations Gaussian Elimination works well for solving linear systems of the form: AX = B What if you have to solve the linear

More information

LECTURE NOTES ON DESIGN AND ANALYSIS OF ALGORITHMS

LECTURE NOTES ON DESIGN AND ANALYSIS OF ALGORITHMS G.PULLAIAH COLLEGE OF ENGINEERING AND TECHNOLOGY LECTURE NOTES ON DESIGN AND ANALYSIS OF ALGORITHMS Department of Computer Science and Engineering 1 UNIT 1 Basic Concepts Algorithm An Algorithm is a finite

More information

CS473 - Algorithms I

CS473 - Algorithms I CS473 - Algorithms I Lecture 10 Dynamic Programming View in slide-show mode CS 473 Lecture 10 Cevdet Aykanat and Mustafa Ozdal, Bilkent University 1 Introduction An algorithm design paradigm like divide-and-conquer

More information

The Solution of Linear Systems AX = B

The Solution of Linear Systems AX = B Chapter 2 The Solution of Linear Systems AX = B 21 Upper-triangular Linear Systems We will now develop the back-substitution algorithm, which is useful for solving a linear system of equations that has

More information

Lecture 1: Asymptotics, Recurrences, Elementary Sorting

Lecture 1: Asymptotics, Recurrences, Elementary Sorting Lecture 1: Asymptotics, Recurrences, Elementary Sorting Instructor: Outline 1 Introduction to Asymptotic Analysis Rate of growth of functions Comparing and bounding functions: O, Θ, Ω Specifying running

More information

Review Questions REVIEW QUESTIONS 71

Review Questions REVIEW QUESTIONS 71 REVIEW QUESTIONS 71 MATLAB, is [42]. For a comprehensive treatment of error analysis and perturbation theory for linear systems and many other problems in linear algebra, see [126, 241]. An overview of

More information

Things we can already do with matrices. Unit II - Matrix arithmetic. Defining the matrix product. Things that fail in matrix arithmetic

Things we can already do with matrices. Unit II - Matrix arithmetic. Defining the matrix product. Things that fail in matrix arithmetic Unit II - Matrix arithmetic matrix multiplication matrix inverses elementary matrices finding the inverse of a matrix determinants Unit II - Matrix arithmetic 1 Things we can already do with matrices equality

More information

Lecture 4: Linear Algebra 1

Lecture 4: Linear Algebra 1 Lecture 4: Linear Algebra 1 Sourendu Gupta TIFR Graduate School Computational Physics 1 February 12, 2010 c : Sourendu Gupta (TIFR) Lecture 4: Linear Algebra 1 CP 1 1 / 26 Outline 1 Linear problems Motivation

More information

Lecture 12: Solving Systems of Linear Equations by Gaussian Elimination

Lecture 12: Solving Systems of Linear Equations by Gaussian Elimination Lecture 12: Solving Systems of Linear Equations by Gaussian Elimination Winfried Just, Ohio University September 22, 2017 Review: The coefficient matrix Consider a system of m linear equations in n variables.

More information

Lecture Note 2: The Gaussian Elimination and LU Decomposition

Lecture Note 2: The Gaussian Elimination and LU Decomposition MATH 5330: Computational Methods of Linear Algebra Lecture Note 2: The Gaussian Elimination and LU Decomposition The Gaussian elimination Xianyi Zeng Department of Mathematical Sciences, UTEP The method

More information

Review of Matrices and Block Structures

Review of Matrices and Block Structures CHAPTER 2 Review of Matrices and Block Structures Numerical linear algebra lies at the heart of modern scientific computing and computational science. Today it is not uncommon to perform numerical computations

More information

Algorithm Design CS 515 Fall 2015 Sample Final Exam Solutions

Algorithm Design CS 515 Fall 2015 Sample Final Exam Solutions Algorithm Design CS 515 Fall 2015 Sample Final Exam Solutions Copyright c 2015 Andrew Klapper. All rights reserved. 1. For the functions satisfying the following three recurrences, determine which is the

More information

Chapter 2. Solving Systems of Equations. 2.1 Gaussian elimination

Chapter 2. Solving Systems of Equations. 2.1 Gaussian elimination Chapter 2 Solving Systems of Equations A large number of real life applications which are resolved through mathematical modeling will end up taking the form of the following very simple looking matrix

More information

Linear Algebra Review

Linear Algebra Review Chapter 1 Linear Algebra Review It is assumed that you have had a beginning course in linear algebra, and are familiar with matrix multiplication, eigenvectors, etc I will review some of these terms here,

More information

9. Numerical linear algebra background

9. Numerical linear algebra background Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization

More information

Multiplication of polynomials

Multiplication of polynomials Multiplication of polynomials M. Gastineau IMCCE - Observatoire de Paris - CNRS UMR828 77, avenue Denfert Rochereau 7514 PARIS FRANCE gastineau@imcce.fr Advanced School on Specific Algebraic Manipulators,

More information

1 Quick Sort LECTURE 7. OHSU/OGI (Winter 2009) ANALYSIS AND DESIGN OF ALGORITHMS

1 Quick Sort LECTURE 7. OHSU/OGI (Winter 2009) ANALYSIS AND DESIGN OF ALGORITHMS OHSU/OGI (Winter 2009) CS532 ANALYSIS AND DESIGN OF ALGORITHMS LECTURE 7 1 Quick Sort QuickSort 1 is a classic example of divide and conquer. The hard work is to rearrange the elements of the array A[1..n]

More information

Memory Elements I. CS31 Pascal Van Hentenryck. CS031 Lecture 6 Page 1

Memory Elements I. CS31 Pascal Van Hentenryck. CS031 Lecture 6 Page 1 Memory Elements I CS31 Pascal Van Hentenryck CS031 Lecture 6 Page 1 Memory Elements (I) Combinational devices are good for computing Boolean functions pocket calculator Computers also need to remember

More information

Circuits. Lecture 11 Uniform Circuit Complexity

Circuits. Lecture 11 Uniform Circuit Complexity Circuits Lecture 11 Uniform Circuit Complexity 1 Recall 2 Recall Non-uniform complexity 2 Recall Non-uniform complexity P/1 Decidable 2 Recall Non-uniform complexity P/1 Decidable NP P/log NP = P 2 Recall

More information

Chapter 30 Design and Analysis of

Chapter 30 Design and Analysis of Chapter 30 Design and Analysis of 2 k DOEs Introduction This chapter describes design alternatives and analysis techniques for conducting a DOE. Tables M1 to M5 in Appendix E can be used to create test

More information