Loop Interchange. Loop Transformations. Taxonomy. do I = 1, N do J = 1, N S 1 A(I,J) = A(I-1,J) + 1 enddo enddo. Loop unrolling.
|
|
- Miranda Gilbert
- 6 years ago
- Views:
Transcription
1 Advanced Topics Which Loops are Parallel? review Optimization for parallel machines and memory hierarchies Last Time Dependence analysis Today Loop transformations An example - McKinley, Carr, Tseng loop transformations to improve cache performance do = 1, N do J = 1, N S 1 A(,J) = A(,J-1) + 1 do = 1, N do J = 1, N S 2 A(,J) = A(-1,J-1) + 1 do = 1, N S 3 do J = 1, N B(,J) = B(-1,J+1) + 1 J A dependence D = (d 1,...,d k ) is carried at level i, if d i is the first nonzero element of the distance/direction vector. A loop l i is parallel, if a dependence D j carried at level i. Either distance vector direction vector D j d 1,...,d i 1 > 0 d 1,...,d i 1 = < OR d 1,...,d i = 0 d 1,...,d i = = CS 380C Lecture 24 1 Locality Analysis CS 380C Lecture 24 2 Locality Analysis
2 Loop Transformations Loop nterchange Taxonomy Loop unrolling Loop interchange Loop fusion do = 1, N do J = 1, N S 1 A(,J) = A(-1,J) + 1 enddo enddo Loop distribution (a.k.a. fission) Loop skewing Strip mine and interchange (a.k.a. tiling & blocking) Unroll-and-jam (a variety of tiling) do = 1, N do J = 1, N S 2 B(,J) = B(-1,J+1) + 1 enddo enddo J Loop reversal Loop interchange is safe iff it does not reverse the execution order of the source and sink of any dependence in the nest, i.e., if the distance vector would become negative. Enables parallelization of outer and/or inner loops Changes execution order of the statements Can improve reuse J CS 380C Lecture 24 3 Locality Analysis CS 380C Lecture 24 4 Locality Analysis
3 Loop Fusion Loop Distribution s 1 s 2 s 1 s 2 = loop fusion = do i = 2, n a(i) = b(i) do i = 2, n c(i) = b(i) a(i-1) do i = 2, n a(i) = b(i) c(i) = b(i) a(i-1) = loop distribution = s 1 s 2 s 2 s 1 do i = 2, n a(i) = b(i) c(i) = b(i) a(i+1) do i = 2, n c(i) = b(i) a(i+1) do i = 2, n a(i) = b(i) = loop distribution = Loop Fusion is safe iff no forward dependence between nests becomes a backward loop carried dependence. Would fusion be safe if s 2 referenced a(i+1)? Benefits Reuse Eliminates synchronization between parallel loops Reduced loop overhead Loop Distribution is safe iff statements involved in a cycle of dependences (recurrence) remain in the same loop, & if a dependence between two statements placed in different loops, it must be forward. Benefits Partial parallelization Reduces resource requirements Enables other transformations CS 380C Lecture 24 5 Locality Analysis CS 380C Lecture 24 6 Locality Analysis
4 Loop Skewing and nterchange Strip Mine and nterchange do = 1, 100 do J = 2, 100 A(,J) = A(-1,J) + A(,J-1) do = 1, 100 do J = +2, i+100 A(,J) = A(-1,J) + A(,J-1) = Strip Mine = do = 1, n do J = 1, n A(J,) = B(J) C() = nterchange = do = 1, n, tile do =, + tile -1 do J = 1, n A(J,) = B(J) C() do = 1, n, tile do J = 1, n do =, + tile -1 A(J,) = B(J) C() J J Loop skewing is always safe. The original dependences, (1,0) & (0,1) prevent interchange and parallelization. t changes the dependences to (1,1) & (0,1) and after interchange, (1,1) & (1,0), the inner loops is parallel. Enables inner loop parallelization J Strip Mining is always safe. With interchange it enables loop invariant reuse by changing the shape of the iteration space J CS 380C Lecture 24 7 Locality Analysis CS 380C Lecture 24 8 Locality Analysis
5 mproving Reuse with Loop Transformations Motivation: Enable portable programming without sacrificing performance optimization framework cache model compound loop transformation algorithm permutation distribution results transformation simulation performance fusion reversal Optimization Framework 1. improve order of memory accesses to exploit all levels of the memory hierarchy = cache line size 2. Tile to fit in cache, second level cache, TLB = size of cache(s), replacement policy, associativity, 3. register tiling via unroll-and-jam and scalar replacement = number and type of registers Assumptions cls - the cache line size in terms of data items Fortran column-major order interference occurs rarely for small numbers of inner loop iterations CS 380C Lecture 24 9 Locality Analysis CS 380C Lecture Locality Analysis
6 Reuse temporal locality spatial locality reuse of a specific location reuse of adjacent locations (cache lines) do j = 1, 100 do k = 1, 100 do i = 1, 100 c(i,j) = c(i,j) + a(i,k) To Determine Temporal and Spatial Reuse: b(k,j) Loop Cost of dmxpy From Linpackd Loop Cost in cache lines, cls = 4 do j = 1, n2 do i = 1, n1 y(i)= y(i) + x(j) m(i,j) reference group loop i loop j y(i) 1/4 n1 n2 1 n1 x(j) 1 n2 1/4 n2 n1 m(i,j) 1/4 n1 n2 n2 n1 total loop cost 1/2 n1 n2 + n2 5/4 n1 n2 + n1 for each loop l in a nest, consider l innermost group references with temporal locality reference groups compute the cost in cache lines accessed loop cost rank the loops based on their cost = memory order CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis
7 Loop permutation key insight: f loop l promotes more reuse than loop k at the innermost position, then it will also promote more reuse at any outer position. memory order (a) is it legal? (b) if not find a nearbypermutation O(n 2 ) = avoids combinatorial search present in other algorithms NearbyPermutation nput: O = {i 1,i 2,...,i n }, the original loop ordering DV = set of original legal direction vectors for l n L = {i σ1,i σ2,...,i σn }, a permutation ofo Output: P a nearby permutation ofo Algorithm: P = /0 ; k = 0 ; m = n whilel /0 for j = 1,m l = l j L if direction vectors for {p 1,..., p k,l} are legal P = {p 1,..., p k,l} L =L {l} ; k = k+1 ; m = m 1 break for endif endfor endwhile CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis
8 Matrix Multiply - execution times in seconds 60 Execution Times (in seconds) vs. Loop Organization Sun Sparc2 ntel i860 BM RS/6000 JK KJ JK JK KJ KJ - ORDER Loop Fusion Erlebacher Fortran 90 loops for AD ntegration DO = 2, N X(,1:N) = X(,1:N) - X(-1,1:N) A(,1:N)/B(-1,1:N) B(,1:N) = B(,1:N) - A(,1:N) A(,1:N)/B(-1,1:N) DO = 2, N simple translation to Fortran 77 DO K = 1, N X(,K) = X(,K) - X(-1,K) A(,K)/B(-1,K) DO K = 1, N B(,K) = B(,K) - A(,K) A(,K)/B(-1,K) DO K = 1, N loop fusion & interchange DO = 2, N X(,K) = X(,K) - X(-1,K) A(,K)/B(-1,K) B(,K) = B(,K) - A(,K) A(,K)/B(-1,K) Execution times in seconds. Memory Order Processor Original Distributed Fused Sun Sparc ntel i BM RS Sun Sparc2 ntel i860 BM RS/6000 JK KJ JK JK KJ KJ - ORDER CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis
9 Loop Distribution Cholesky Factorization Algorithm Summary DO K = 1,N DO K = 1,N A(K,K) = SQRT(A(K,K)) A(K,K) = SQRT(A(K,K)) DO = K+1,N DO = K+1,N A(,K) = A(,K)/A(K,K) A(,K) = A(,K)/A(K,K) DO J = K+1, DO J = K,N A(,J) = A(,K)*A(J,K) DO =J+1,N A(,J) = A(,K)*A(J,K) 12 = loop distribution & triangular interchange = 10 8 Execution times (in seconds) for each nestl in a set of adjacent nests compute reference groups for each l i compute loop cost for each l i and sort permutation with reversal? fuse inner loops and permute? distribute and permute? 6 fuse nestsl? Sun Sparc2 ntel i860 BM RS/6000 KJ JK KJ KJ JK JK - ORDER CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis
10 Number of Programs Results Achieving Memory Order for Loop Nests test suite Perfect Benchmarks SPEC Benchmarks NAS Benchmarks 4 additional programs 16 experiments on our ability to transform programs simulated hit rates for RS/6000 and i860 execution times on an RS/ Original Final <= 20 >=40 >= 60 >= 70 >=80 >= 90 Percentage of Loop Nests in Memory Order CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis
11 Number of Programs Achieving Memory Order for nner Loops Simulated Cache Hit Rates cache1: RS/ K cache, 4-way, 128 byte cache line cache2: i860 8K cache, 2-way, 32 byte cache line for RS/ Original Final <= 20 >=40 >= 60 >= 70 >=80 >= 90 Percent of nner Loops in Memory Order = in 12 of 27 programs, optimized procedures started with 100% hit rates = average hit rate for original programs Memory Order Opt. Proc. Hit Rates Perfect Org Prm Fail Cache 1 Cache 2 Club % org final org final adm arc2d bdna dyfesm flo mdg mg3d ocean qcd spec track trfd CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis
12 Performance Results in Seconds on RS6000 Program Original Transformed Speedup arc2d dyfesm flo dnasa7 (btrix) dnasa7 (emit) dnasa7 (gmtry) dnasa7(vpenta) applu appsp linpackd simple wave Summary Recap of Transformation Results 80 % of nests were permuted into memory order 85 % of inner loops were permuted into memory order loop permutation is the most effective optimization 229 candidates for fusion, resulting in 80 nests 23 nests were distributed, resulting in 52 nests Observations many programs started out with high hit ratios smaller cache sizes result in higher improvements in hit rates = regardless of the original target architecture, compiler optimizations improve locality for uniprocessors CS 380C Lecture Locality Analysis CS 380C Lecture Locality Analysis
13 Next Time Compiling for an EDGE Arcuitecture Read: D. Burger et al., Compiling for TRPS: Scaling to the End of Silicon with EDGE Architectures, EEE Computer, pp , July CS 380C Lecture Locality Analysis
Dependence Analysis. Dependence Examples. Last Time: Brief introduction to interprocedural analysis. do I = 2, 100 A(I) = A(I-1) + 1 enddo
Dependence Analysis Dependence Examples Last Time: Brief introduction to interprocedural analysis Today: Optimization for parallel machines and memory hierarchies Dependence analysis Loop transformations
More informationAnnouncements PA2 due Friday Midterm is Wednesday next week, in class, one week from today
Loop Transformations Announcements PA2 due Friday Midterm is Wednesday next week, in class, one week from today Today Recall stencil computations Intro to loop transformations Data dependencies between
More informationAdvanced Restructuring Compilers. Advanced Topics Spring 2009 Prof. Robert van Engelen
Advanced Restructuring Compilers Advanced Topics Spring 2009 Prof. Robert van Engelen Overview Data and control dependences The theory and practice of data dependence analysis K-level loop-carried dependences
More informationFortran program + Partial data layout specifications Data Layout Assistant.. regular problems. dynamic remapping allowed Invoked only a few times Not part of the compiler Can use expensive techniques HPF
More informationImproving Memory Hierarchy Performance Through Combined Loop. Interchange and Multi-Level Fusion
Improving Memory Hierarchy Performance Through Combined Loop Interchange and Multi-Level Fusion Qing Yi Ken Kennedy Computer Science Department, Rice University MS-132 Houston, TX 77005 Abstract Because
More informationLoop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1
Loop Scheduling and Software Pipelining 2008-04-24 \course\cpeg421-08s\topic-7.ppt 1 Reading List Slides: Topic 7 and 7a Other papers as assigned in class or homework: 2008-04-24 \course\cpeg421-08s\topic-7.ppt
More informationSaturday, April 23, Dependence Analysis
Dependence Analysis Motivating question Can the loops on the right be run in parallel? i.e., can different processors run different iterations in parallel? What needs to be true for a loop to be parallelizable?
More informationCS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University
CS475: Linear Equations Gaussian Elimination LU Decomposition Wim Bohm Colorado State University Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution
More informationLoop Parallelization Techniques and dependence analysis
Loop Parallelization Techniques and dependence analysis Data-Dependence Analysis Dependence-Removing Techniques Parallelizing Transformations Performance-enchancing Techniques 1 When can we run code in
More informationCOMP 515: Advanced Compilation for Vector and Parallel Processors. Vivek Sarkar Department of Computer Science Rice University
COMP 515: Advanced Compilation for Vector and Parallel Processors Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP 515 Lecture 3 13 January 2009 Acknowledgments Slides
More informationA Polynomial-Time Algorithm for Memory Space Reduction
A Polynomial-Time Algorithm for Memory Space Reduction Yonghong Song Cheng Wang Zhiyuan Li Sun Microsystems, Inc. Department of Computer Sciences 4150 Network Circle Purdue University Santa Clara, CA 95054
More informationDO i 1 = L 1, U 1 DO i 2 = L 2, U 2... DO i n = L n, U n. S 1 A(f 1 (i 1,...,i, n ),...,f, m (i 1,...,i, n )) =...
15-745 Lecture 7 Data Dependence in Loops 2 Complex spaces Delta Test Merging vectors Copyright Seth Goldstein, 2008-9 Based on slides from Allen&Kennedy 1 The General Problem S 1 A(f 1 (i 1,,i n ),,f
More informationVector Lane Threading
Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program
More information1 Problem 1 Solution. 1.1 Common Mistakes. In order to show that the matrix L k is the inverse of the matrix M k, we need to show that
1 Problem 1 Solution In order to show that the matrix L k is the inverse of the matrix M k, we need to show that Since we need to show that Since L k M k = I (or M k L k = I). L k = I + m k e T k, M k
More informationCOMP 515: Advanced Compilation for Vector and Parallel Processors
COMP 515: Advanced Compilation for Vector and Parallel Processors Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP 515 Lecture 4 1 September, 2011 1 Acknowledgments Slides
More information' $ Dependence Analysis & % 1
Dependence Analysis 1 Goals - determine what operations can be done in parallel - determine whether the order of execution of operations can be altered Basic idea - determine a partial order on operations
More informationDO i 1 = L 1, U 1 DO i 2 = L 2, U 2... DO i n = L n, U n. S 1 A(f 1 (i 1,...,i, n ),...,f, m (i 1,...,i, n )) =...
15-745 Lecture 7 Data Dependence in Loops 2 Delta Test Merging vectors Copyright Seth Goldstein, 2008 The General Problem S 1 A(f 1 (i 1,,i n ),,f m (i 1,,i n )) = S 2 = A(g 1 (i 1,,i n ),,g m (i 1,,i
More informationCS 598: Communication Cost Analysis of Algorithms Lecture 9: The Ideal Cache Model and the Discrete Fourier Transform
CS 598: Communication Cost Analysis of Algorithms Lecture 9: The Ideal Cache Model and the Discrete Fourier Transform Edgar Solomonik University of Illinois at Urbana-Champaign September 21, 2016 Fast
More informationHomework #2: assignments/ps2/ps2.htm Due Thursday, March 7th.
Homework #2: http://www.cs.cornell.edu/courses/cs612/2002sp/ assignments/ps2/ps2.htm Due Thursday, March 7th. 1 Transformations and Dependences 2 Recall: Polyhedral algebra tools for determining emptiness
More informationParallel Scientific Computing
IV-1 Parallel Scientific Computing Matrix-vector multiplication. Matrix-matrix multiplication. Direct method for solving a linear equation. Gaussian Elimination. Iterative method for solving a linear equation.
More informationPractical Dependence Testing *
Practical Dependence Testing * Gina Goff Ken Kennedy Cllau-Wen Tseng Department of Computer Science Rice Houston, University TX 77251-1892 Abstract Precise and efficient dependence tests are essential
More informationCS 161 Summer 2009 Homework #2 Sample Solutions
CS 161 Summer 2009 Homework #2 Sample Solutions Regrade Policy: If you believe an error has been made in the grading of your homework, you may resubmit it for a regrade. If the error consists of more than
More informationCompiler Optimisation
Compiler Optimisation 8 Dependence Analysis Hugh Leather IF 1.18a hleather@inf.ed.ac.uk Institute for Computing Systems Architecture School of Informatics University of Edinburgh 2018 Introduction This
More informationOutline. policies for the first part. with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014
Outline 1 midterm exam on Friday 11 July 2014 policies for the first part 2 questions with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014 Intro
More informationAutomatic Loop Interchange
RETROSPECTIVE: Automatic Loop Interchange Randy Allen Catalytic Compilers 1900 Embarcadero Rd, #206 Palo Alto, CA. 94043 jra@catacomp.com Ken Kennedy Department of Computer Science Rice University Houston,
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 2 Systems of Linear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction
More informationMATH 3511 Lecture 1. Solving Linear Systems 1
MATH 3511 Lecture 1 Solving Linear Systems 1 Dmitriy Leykekhman Spring 2012 Goals Review of basic linear algebra Solution of simple linear systems Gaussian elimination D Leykekhman - MATH 3511 Introduction
More informationSolving Dense Linear Systems I
Solving Dense Linear Systems I Solving Ax = b is an important numerical method Triangular system: [ l11 l 21 if l 11, l 22 0, ] [ ] [ ] x1 b1 = l 22 x 2 b 2 x 1 = b 1 /l 11 x 2 = (b 2 l 21 x 1 )/l 22 Chih-Jen
More informationEngineering Computation
Engineering Computation Systems of Linear Equations_1 1 Learning Objectives for Lecture 1. Motivate Study of Systems of Equations and particularly Systems of Linear Equations. Review steps of Gaussian
More informationTopic 17. Analysis of Algorithms
Topic 17 Analysis of Algorithms Analysis of Algorithms- Review Efficiency of an algorithm can be measured in terms of : Time complexity: a measure of the amount of time required to execute an algorithm
More information2.1 Gaussian Elimination
2. Gaussian Elimination A common problem encountered in numerical models is the one in which there are n equations and n unknowns. The following is a description of the Gaussian elimination method for
More informationPattern History Table. Global History Register. Pattern History Table. Branch History Pattern Pattern History Bits
An Enhanced Two-Level Adaptive Multiple Branch Prediction for Superscalar Processors Jong-bok Lee, Soo-Mook Moon and Wonyong Sung fjblee@mpeg,smoon@altair,wysung@dspg.snu.ac.kr School of Electrical Engineering,
More information1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i )
Direct Methods for Linear Systems Chapter Direct Methods for Solving Linear Systems Per-Olof Persson persson@berkeleyedu Department of Mathematics University of California, Berkeley Math 18A Numerical
More informationSection Summary. Sequences. Recurrence Relations. Summations Special Integer Sequences (optional)
Section 2.4 Section Summary Sequences. o Examples: Geometric Progression, Arithmetic Progression Recurrence Relations o Example: Fibonacci Sequence Summations Special Integer Sequences (optional) Sequences
More informationSOLVING LINEAR SYSTEMS
SOLVING LINEAR SYSTEMS We want to solve the linear system a, x + + a,n x n = b a n, x + + a n,n x n = b n This will be done by the method used in beginning algebra, by successively eliminating unknowns
More informationChapter 8 Gauss Elimination. Gab-Byung Chae
Chapter 8 Gauss Elimination Gab-Byung Chae 2008 5 19 2 Chapter Objectives How to solve small sets of linear equations with the graphical method and Cramer s rule Gauss Elimination Understanding how to
More informationMatrix-Matrix Multiplication
Chapter Matrix-Matrix Multiplication In this chapter, we discuss matrix-matrix multiplication We start by motivating its definition Next, we discuss why its implementation inherently allows high performance
More informationAx = b. Systems of Linear Equations. Lecture Notes to Accompany. Given m n matrix A and m-vector b, find unknown n-vector x satisfying
Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T Heath Chapter Systems of Linear Equations Systems of Linear Equations Given m n matrix A and m-vector
More informationThe purpose of computing is insight, not numbers. Richard Wesley Hamming
Systems of Linear Equations The purpose of computing is insight, not numbers. Richard Wesley Hamming Fall 2010 1 Topics to Be Discussed This is a long unit and will include the following important topics:
More informationLU Factorization a 11 a 1 a 1n A = a 1 a a n (b) a n1 a n a nn L = l l 1 l ln1 ln 1 75 U = u 11 u 1 u 1n 0 u u n 0 u n...
.. Factorizations Reading: Trefethen and Bau (1997), Lecture 0 Solve the n n linear system by Gaussian elimination Ax = b (1) { Gaussian elimination is a direct method The solution is found after a nite
More informationCS 700: Quantitative Methods & Experimental Design in Computer Science
CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,
More informationLecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 2. Systems of Linear Equations
Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T. Heath Chapter 2 Systems of Linear Equations Copyright c 2001. Reproduction permitted only for noncommercial,
More informationGaussian Elimination without/with Pivoting and Cholesky Decomposition
Gaussian Elimination without/with Pivoting and Cholesky Decomposition Gaussian Elimination WITHOUT pivoting Notation: For a matrix A R n n we define for k {,,n} the leading principal submatrix a a k A
More informationCOMP 515: Advanced Compilation for Vector and Parallel Processors. Vivek Sarkar Department of Computer Science Rice University
COMP 515: Advanced Compilation for Vector and Parallel Processors Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP 515 Lecture 10 10 February 2009 Announcement Feb 17 th
More informationEnumerate all possible assignments and take the An algorithm is a well-defined computational
EMIS 8374 [Algorithm Design and Analysis] 1 EMIS 8374 [Algorithm Design and Analysis] 2 Designing and Evaluating Algorithms A (Bad) Algorithm for the Assignment Problem Enumerate all possible assignments
More informationComputational Linear Algebra
Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 2: Direct Methods PD Dr.
More informationCSE 160 Lecture 13. Numerical Linear Algebra
CSE 16 Lecture 13 Numerical Linear Algebra Announcements Section will be held on Friday as announced on Moodle Midterm Return 213 Scott B Baden / CSE 16 / Fall 213 2 Today s lecture Gaussian Elimination
More informationOptimized LU-decomposition with Full Pivot for Small Batched Matrices S3069
Optimized LU-decomposition with Full Pivot for Small Batched Matrices S369 Ian Wainwright High Performance Consulting Sweden ian.wainwright@hpcsweden.se Based on work for GTC 212: 1x speed-up vs multi-threaded
More informationNumerical Linear Algebra
Numerical Linear Algebra R. J. Renka Department of Computer Science & Engineering University of North Texas 02/03/2015 Notation and Terminology R n is the Euclidean n-dimensional linear space over the
More informationData Dependences and Parallelization. Stanford University CS243 Winter 2006 Wei Li 1
Data Dependences and Parallelization Wei Li 1 Agenda Introduction Single Loop Nested Loops Data Dependence Analysis 2 Motivation DOALL loops: loops whose iterations can execute in parallel for i = 11,
More informationCOURSE Numerical methods for solving linear systems. Practical solving of many problems eventually leads to solving linear systems.
COURSE 9 4 Numerical methods for solving linear systems Practical solving of many problems eventually leads to solving linear systems Classification of the methods: - direct methods - with low number of
More informationAdvanced Compiler Construction
CS 526 Advanced Compiler Construction http://misailo.cs.illinois.edu/courses/cs526 DEPENDENCE ANALYSIS The slides adapted from Vikram Adve and David Padua Kinds of Data Dependence Direct Dependence X =
More informationCE 221 Data Structures and Algorithms. Chapter 7: Sorting (Insertion Sort, Shellsort)
CE 221 Data Structures and Algorithms Chapter 7: Sorting (Insertion Sort, Shellsort) Text: Read Weiss, 7.1 7.4 1 Preliminaries Main memory sorting algorithms All algorithms are Interchangeable; an array
More informationFundamentals of Engineering Analysis (650163)
Philadelphia University Faculty of Engineering Communications and Electronics Engineering Fundamentals of Engineering Analysis (6563) Part Dr. Omar R Daoud Matrices: Introduction DEFINITION A matrix is
More informationLecture: Pipelining Basics
Lecture: Pipelining Basics Topics: Performance equations wrap-up, Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4:
More informationMatrix-Matrix Multiplication
Week5 Matrix-Matrix Multiplication 51 Opening Remarks 511 Composing Rotations Homework 5111 Which of the following statements are true: cosρ + σ + τ cosτ sinτ cosρ + σ sinρ + σ + τ sinτ cosτ sinρ + σ cosρ
More informationSolving PDEs with CUDA Jonathan Cohen
Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear
More informationBLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product
Level-1 BLAS: SAXPY BLAS-Notation: S single precision (D for double, C for complex) A α scalar X vector P plus operation Y vector SAXPY: y = αx + y Vectorization of SAXPY (αx + y) by pipelining: page 8
More informationOrganization of a Modern Compiler. Middle1
Organization of a Modern Compiler Source Program Front-end syntax analysis + type-checking + symbol table High-level Intermediate Representation (loops,array references are preserved) Middle1 loop-level
More informationLU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b
AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization
More informationBlock AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark
Block AIR Methods For Multicore and GPU Per Christian Hansen Hans Henrik B. Sørensen Technical University of Denmark Model Problem and Notation Parallel-beam 3D tomography exact solution exact data noise
More informationA Review of Matrix Analysis
Matrix Notation Part Matrix Operations Matrices are simply rectangular arrays of quantities Each quantity in the array is called an element of the matrix and an element can be either a numerical value
More informationSystems of Linear Equations
Systems of Linear Equations Last time, we found that solving equations such as Poisson s equation or Laplace s equation on a grid is equivalent to solving a system of linear equations. There are many other
More informationEquality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same.
Introduction Matrix Operations Matrix: An m n matrix A is an m-by-n array of scalars from a field (for example real numbers) of the form a a a n a a a n A a m a m a mn The order (or size) of A is m n (read
More informationLecture 5: Loop Invariants and Insertion-sort
Lecture 5: Loop Invariants and Insertion-sort COMS10007 - Algorithms Dr. Christian Konrad 11.01.2019 Dr. Christian Konrad Lecture 5: Loop Invariants and Insertion-sort 1 / 12 Proofs by Induction Structure
More informationLecture 12. Linear systems of equations II. a 13. a 12. a 14. a a 22. a 23. a 34 a 41. a 32. a 33. a 42. a 43. a 44)
1 Introduction Lecture 12 Linear systems of equations II We have looked at Gauss-Jordan elimination and Gaussian elimination as ways to solve a linear system Ax=b. We now turn to the LU decomposition,
More informationCS 4407 Algorithms Lecture 3: Iterative and Divide and Conquer Algorithms
CS 4407 Algorithms Lecture 3: Iterative and Divide and Conquer Algorithms Prof. Gregory Provan Department of Computer Science University College Cork 1 Lecture Outline CS 4407, Algorithms Growth Functions
More informationCSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo
CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a
More informationLecture 12 (Tue, Mar 5) Gaussian elimination and LU factorization (II)
Math 59 Lecture 2 (Tue Mar 5) Gaussian elimination and LU factorization (II) 2 Gaussian elimination - LU factorization For a general n n matrix A the Gaussian elimination produces an LU factorization if
More informationAlgorithms in Braid Groups
Algorithms in Braid Groups Matthew J. Campagna matthew.campagna@pb.com Secure Systems Pitney Bowes, Inc. Abstract Braid Groups have recently been considered for use in Public-Key Cryptographic Systems.
More informationApril 26, Applied mathematics PhD candidate, physics MA UC Berkeley. Lecture 4/26/2013. Jed Duersch. Spd matrices. Cholesky decomposition
Applied mathematics PhD candidate, physics MA UC Berkeley April 26, 2013 UCB 1/19 Symmetric positive-definite I Definition A symmetric matrix A R n n is positive definite iff x T Ax > 0 holds x 0 R n.
More informationThe Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework and Experimental Evaluation
The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework and Experimental Evaluation Rezaul Alam Chowdhury Vijaya Ramachandran UTCS Technical Report TR-06-04 March 9, 006 Abstract The cache-oblivious
More informationIMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM
IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM Bogdan OANCEA PhD, Associate Professor, Artife University, Bucharest, Romania E-mail: oanceab@ie.ase.ro Abstract:
More informationJim Lambers MAT 610 Summer Session Lecture 2 Notes
Jim Lambers MAT 610 Summer Session 2009-10 Lecture 2 Notes These notes correspond to Sections 2.2-2.4 in the text. Vector Norms Given vectors x and y of length one, which are simply scalars x and y, the
More information1 Closest Pair of Points on the Plane
CS 31: Algorithms (Spring 2019): Lecture 5 Date: 4th April, 2019 Topic: Divide and Conquer 3: Closest Pair of Points on a Plane Disclaimer: These notes have not gone through scrutiny and in all probability
More informationLinear Algebra: Lecture notes from Kolman and Hill 9th edition.
Linear Algebra: Lecture notes from Kolman and Hill 9th edition Taylan Şengül March 20, 2019 Please let me know of any mistakes in these notes Contents Week 1 1 11 Systems of Linear Equations 1 12 Matrices
More informationMeasurement & Performance
Measurement & Performance Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law Topics 2 Page The Nature of Time real (i.e. wall clock) time = User Time: time spent
More informationMeasurement & Performance
Measurement & Performance Topics Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law 2 The Nature of Time real (i.e. wall clock) time = User Time: time spent executing
More informationRegister Allocation. Maryam Siahbani CMPT 379 4/5/2016 1
Register Allocation Maryam Siahbani CMPT 379 4/5/2016 1 Register Allocation Intermediate code uses unlimited temporaries Simplifying code generation and optimization Complicates final translation to assembly
More informationA Divide-and-Conquer Algorithm for Functions of Triangular Matrices
A Divide-and-Conquer Algorithm for Functions of Triangular Matrices Ç. K. Koç Electrical & Computer Engineering Oregon State University Corvallis, Oregon 97331 Technical Report, June 1996 Abstract We propose
More informationToday s class. Linear Algebraic Equations LU Decomposition. Numerical Methods, Fall 2011 Lecture 8. Prof. Jinbo Bi CSE, UConn
Today s class Linear Algebraic Equations LU Decomposition 1 Linear Algebraic Equations Gaussian Elimination works well for solving linear systems of the form: AX = B What if you have to solve the linear
More informationLECTURE NOTES ON DESIGN AND ANALYSIS OF ALGORITHMS
G.PULLAIAH COLLEGE OF ENGINEERING AND TECHNOLOGY LECTURE NOTES ON DESIGN AND ANALYSIS OF ALGORITHMS Department of Computer Science and Engineering 1 UNIT 1 Basic Concepts Algorithm An Algorithm is a finite
More informationCS473 - Algorithms I
CS473 - Algorithms I Lecture 10 Dynamic Programming View in slide-show mode CS 473 Lecture 10 Cevdet Aykanat and Mustafa Ozdal, Bilkent University 1 Introduction An algorithm design paradigm like divide-and-conquer
More informationThe Solution of Linear Systems AX = B
Chapter 2 The Solution of Linear Systems AX = B 21 Upper-triangular Linear Systems We will now develop the back-substitution algorithm, which is useful for solving a linear system of equations that has
More informationLecture 1: Asymptotics, Recurrences, Elementary Sorting
Lecture 1: Asymptotics, Recurrences, Elementary Sorting Instructor: Outline 1 Introduction to Asymptotic Analysis Rate of growth of functions Comparing and bounding functions: O, Θ, Ω Specifying running
More informationReview Questions REVIEW QUESTIONS 71
REVIEW QUESTIONS 71 MATLAB, is [42]. For a comprehensive treatment of error analysis and perturbation theory for linear systems and many other problems in linear algebra, see [126, 241]. An overview of
More informationThings we can already do with matrices. Unit II - Matrix arithmetic. Defining the matrix product. Things that fail in matrix arithmetic
Unit II - Matrix arithmetic matrix multiplication matrix inverses elementary matrices finding the inverse of a matrix determinants Unit II - Matrix arithmetic 1 Things we can already do with matrices equality
More informationLecture 4: Linear Algebra 1
Lecture 4: Linear Algebra 1 Sourendu Gupta TIFR Graduate School Computational Physics 1 February 12, 2010 c : Sourendu Gupta (TIFR) Lecture 4: Linear Algebra 1 CP 1 1 / 26 Outline 1 Linear problems Motivation
More informationLecture 12: Solving Systems of Linear Equations by Gaussian Elimination
Lecture 12: Solving Systems of Linear Equations by Gaussian Elimination Winfried Just, Ohio University September 22, 2017 Review: The coefficient matrix Consider a system of m linear equations in n variables.
More informationLecture Note 2: The Gaussian Elimination and LU Decomposition
MATH 5330: Computational Methods of Linear Algebra Lecture Note 2: The Gaussian Elimination and LU Decomposition The Gaussian elimination Xianyi Zeng Department of Mathematical Sciences, UTEP The method
More informationReview of Matrices and Block Structures
CHAPTER 2 Review of Matrices and Block Structures Numerical linear algebra lies at the heart of modern scientific computing and computational science. Today it is not uncommon to perform numerical computations
More informationAlgorithm Design CS 515 Fall 2015 Sample Final Exam Solutions
Algorithm Design CS 515 Fall 2015 Sample Final Exam Solutions Copyright c 2015 Andrew Klapper. All rights reserved. 1. For the functions satisfying the following three recurrences, determine which is the
More informationChapter 2. Solving Systems of Equations. 2.1 Gaussian elimination
Chapter 2 Solving Systems of Equations A large number of real life applications which are resolved through mathematical modeling will end up taking the form of the following very simple looking matrix
More informationLinear Algebra Review
Chapter 1 Linear Algebra Review It is assumed that you have had a beginning course in linear algebra, and are familiar with matrix multiplication, eigenvectors, etc I will review some of these terms here,
More information9. Numerical linear algebra background
Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization
More informationMultiplication of polynomials
Multiplication of polynomials M. Gastineau IMCCE - Observatoire de Paris - CNRS UMR828 77, avenue Denfert Rochereau 7514 PARIS FRANCE gastineau@imcce.fr Advanced School on Specific Algebraic Manipulators,
More information1 Quick Sort LECTURE 7. OHSU/OGI (Winter 2009) ANALYSIS AND DESIGN OF ALGORITHMS
OHSU/OGI (Winter 2009) CS532 ANALYSIS AND DESIGN OF ALGORITHMS LECTURE 7 1 Quick Sort QuickSort 1 is a classic example of divide and conquer. The hard work is to rearrange the elements of the array A[1..n]
More informationMemory Elements I. CS31 Pascal Van Hentenryck. CS031 Lecture 6 Page 1
Memory Elements I CS31 Pascal Van Hentenryck CS031 Lecture 6 Page 1 Memory Elements (I) Combinational devices are good for computing Boolean functions pocket calculator Computers also need to remember
More informationCircuits. Lecture 11 Uniform Circuit Complexity
Circuits Lecture 11 Uniform Circuit Complexity 1 Recall 2 Recall Non-uniform complexity 2 Recall Non-uniform complexity P/1 Decidable 2 Recall Non-uniform complexity P/1 Decidable NP P/log NP = P 2 Recall
More informationChapter 30 Design and Analysis of
Chapter 30 Design and Analysis of 2 k DOEs Introduction This chapter describes design alternatives and analysis techniques for conducting a DOE. Tables M1 to M5 in Appendix E can be used to create test
More information