

Loop-independent vs. loop-carried dependences [§3.2]

Loop-carried dependence: dependence exists across iterations; i.e., if the loop is removed, the dependence no longer exists.

Loop-independent dependence: dependence exists within an iteration; i.e., if the loop is removed, the dependence still exists.

Example:

    for (i=1; i<n; i++) {
        S1: a[i] = a[i-1] + 1;
        S2: b[i] = a[i];
    }

S1[i] →T S1[i+1]: loop-carried
S1[i] →T S2[i]: loop-independent

    for (i=1; i<n; i++)
        for (j=1; j<n; j++)
            S3: a[i][j] = a[i][j-1] + 1;

S3[i,j] →T S3[i,j+1]: loop-carried on the for j loop; no loop-carried dependence in the for i loop

    for (i=1; i<n; i++)
        for (j=1; j<n; j++)
            S4: a[i][j] = a[i-1][j] + 1;

S4[i,j] →T S4[i+1,j]: no loop-carried dependence in the for j loop; loop-carried on the for i loop

Iteration-space Traversal Graph (ITG) [§3.2.1]

The ITG shows graphically the order of traversal in the iteration space. This is sometimes called the happens-before relationship. In an ITG:
- a node represents a point in the iteration space;
- a directed edge indicates the next point that will be encountered after the current point is traversed.

Example:

    for (i=1; i<4; i++)
        for (j=1; j<4; j++)
            S3: a[i][j] = a[i][j-1] + 1;

Lecture 5, Architecture of Parallel Computers
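The classification above tells us which loop of each nest can run its iterations concurrently. S3's dependence is carried only by the j loop, so the i iterations are independent. S4's dependence is carried only by the i loop, so the j iterations are independent. The sketch below assumes OpenMP `parallel for` as the parallel-loop construct. Without `-fopenmp` the pragmas are ignored and the code runs sequentially with the same result. `N`, `nest_s3` and `nest_s4` are hypothetical names.

```c
#define N 8  /* hypothetical problem size; valid indices are 0..N-1 */

/* S3: a[i][j] = a[i][j-1] + 1. The dependence runs along j inside one row,
   so different rows (i values) are independent and the i loop is parallel. */
void nest_s3(double a[N][N]) {
    #pragma omp parallel for
    for (int i = 1; i < N; i++)
        for (int j = 1; j < N; j++)   /* sequential: carries S3[i,j] -> S3[i,j+1] */
            a[i][j] = a[i][j - 1] + 1;
}

/* S4: a[i][j] = a[i-1][j] + 1. The dependence runs along i, so the i loop
   must stay sequential, but the j iterations of one row are independent. */
void nest_s4(double a[N][N]) {
    for (int i = 1; i < N; i++) {     /* sequential: carries S4[i,j] -> S4[i+1,j] */
        #pragma omp parallel for
        for (int j = 1; j < N; j++)
            a[i][j] = a[i - 1][j] + 1;
    }
}
```

On a zero-filled array, `nest_s3` leaves a[i][j] == j in every row i ≥ 1, because each row is a running count along j. `nest_s4` leaves a[i][j] == i in every column j ≥ 1.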

Loop-carried Dependence Graph (LDG)

The LDG shows the true/anti/output dependence relationships graphically. A node is a point in the iteration space. A directed edge represents a dependence.

Example:

    for (i=1; i<4; i++)
        for (j=1; j<4; j++)
            S3: a[i][j] = a[i][j-1] + 1;

Edward F. Gehringer, CSC/ECE 506 Lecture Notes, Spring

Another example:

    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
            S1: a[i][j] = a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j];

    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++) {
            S2: a[i][j] = b[i][j] + c[i][j];
            S3: b[i][j] = a[i][j-1] * d[i][j];
        }

Draw the ITG.
List all the dependence relationships.

Note that there are two loop nests in the code. The first involves S1. The other involves S2 and S3.

What do we know about the ITG for these nested loops?

Dependence relationships for Loop Nest 1

True dependences:
- S1[i,j] →T S1[i,j+1]
- S1[i,j] →T S1[i+1,j]

Output dependences:
- None

Anti-dependences:
- S1[i,j] →A S1[i+1,j]
- S1[i,j] →A S1[i,j+1]

Exercise: Suppose we dropped the first half of S1, so we had

    S1: a[i][j] = a[i-1][j] + a[i+1][j];

or the last half, so we had

    S1: a[i][j] = a[i][j-1] + a[i][j+1];

Which of the dependences would still exist?
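One way to check dependence lists like these, and to work the exercise, is brute force. For every pair of iterations p and q where p executes first, test two things: does p write an element that q reads (true dependence), and does p read an element that q writes (anti-dependence)? Record the distance vector q - p in each case. The sketch below handles a single statement that writes a[i][j] and reads a[i+di][j+dj] for a given list of offsets. `classify`, `offset`, `found` and the bound `N` are hypothetical names, and the approach only scales to tiny iteration spaces.

```c
#include <stdbool.h>
#include <string.h>

#define N 4            /* hypothetical bound: i and j both run 1..N */
#define D (2 * N + 1)  /* distance components lie in -N..N, stored shifted by N */

/* Hypothetical type: one read a[i+di][j+dj] made by the statement. */
typedef struct { int di, dj; } offset;

/* found[kind][di + N][dj + N] is set when a dependence of that kind exists
   with distance vector (di, dj); kind 0 = true, kind 1 = anti. */
static bool found[2][D][D];

/* Does iteration (i, j) read element a[ri][rj]? */
static bool reads_elem(const offset *rd, int nrd, int i, int j, int ri, int rj) {
    for (int k = 0; k < nrd; k++)
        if (i + rd[k].di == ri && j + rd[k].dj == rj)
            return true;
    return false;
}

/* Statement writes a[i][j] and reads a[i+rd[k].di][j+rd[k].dj]. Output
   dependences are not tracked: each iteration writes a different element. */
void classify(const offset *rd, int nrd) {
    memset(found, 0, sizeof found);
    for (int pi = 1; pi <= N; pi++)
        for (int pj = 1; pj <= N; pj++)
            for (int qi = 1; qi <= N; qi++)
                for (int qj = 1; qj <= N; qj++) {
                    /* keep only pairs where p runs before q sequentially */
                    if (!(pi < qi || (pi == qi && pj < qj)))
                        continue;
                    int di = qi - pi + N, dj = qj - pj + N;
                    if (reads_elem(rd, nrd, qi, qj, pi, pj))
                        found[0][di][dj] = true;  /* p writes, q reads */
                    if (reads_elem(rd, nrd, pi, pj, qi, qj))
                        found[1][di][dj] = true;  /* p reads, q writes */
                }
}
```

For Loop Nest 1, pass the offsets {0,-1}, {0,1}, {-1,0} and {1,0}. `found[0][N][N+1]` then reports S1[i,j] →T S1[i,j+1], and `found[0][N+1][N]` reports S1[i,j] →T S1[i+1,j]. To answer the exercise, pass only the offsets of the shortened statement.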

Draw the LDG for Loop Nest 1.

[Figure: LDG for Loop Nest 1 over the i-j iteration space]

Note: each edge represents both true and anti-dependences.

Dependence relationships for Loop Nest 2

True dependences:
- S2[i,j] →T S3[i,j+1]

Output dependences:
- None

Anti-dependences:
- S2[i,j] →A S3[i,j] (loop-independent dependence)

Draw the LDG for Loop Nest 2.

[Figure: LDG for Loop Nest 2 over the i-j iteration space]

Note: each edge represents only true dependences.

Why are there no vertical edges in this graph? Answer here.
Why is the anti-dependence not shown on the graph?

Finding parallel tasks across iterations [§3.2.2]

Analyze loop-carried dependences:
- Dependences must be enforced (especially true dependences; other dependences can be removed by privatization).
- There are opportunities for parallelism when some dependences are not present.

Example 1

    for (i=2; i<=n; i++)
        S1: a[i] = a[i-2];

[Figure: LDG for Example 1]
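Iteration i depends only on iteration i-2, so the iterations of Example 1 form two separate dependence chains: one through the even iterations and one through the odd ones. The sketch below runs the two chains as parallel tasks, assuming OpenMP `sections`. Without OpenMP, the sections simply run one after the other. `split_even_odd` is a hypothetical name.

```c
/* Example 1, a[i] = a[i-2], split into its two independent chains.
   Assumes a has n+1 elements with a[0] and a[1] already set. */
void split_even_odd(double *a, int n) {
    #pragma omp parallel sections
    {
        #pragma omp section
        for (int i = 2; i <= n; i += 2)   /* even chain: a[2], a[4], ... */
            a[i] = a[i - 2];
        #pragma omp section
        for (int i = 3; i <= n; i += 2)   /* odd chain: a[3], a[5], ... */
            a[i] = a[i - 2];
    }
}
```

Neither section touches an element the other one reads or writes, so no synchronization is needed between them.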

We can divide the loop into two parallel tasks (one with odd iterations and another with even iterations):

    for (i=2; i<=n; i+=2)
        S1: a[i] = a[i-2];

    for (i=3; i<=n; i+=2)
        S1: a[i] = a[i-2];

Example 2

    for (i=0; i<n; i++)
        for (j=0; j<n; j++)
            S3: a[i][j] = a[i][j-1] + 1;

[Figure: LDG for Example 2]

How many parallel tasks are there here?

Example 3

    for (i=1; i<=n; i++)
        for (j=1; j<=n; j++)
            S1: a[i][j] = a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j];

[Figure: LDG for Example 3]

Note: each edge represents both true and anti-dependences.

Identify which nodes are not dependent on each other.

[Figure: LDG for Example 3 with the anti-diagonals marked]

In each anti-diagonal, the nodes are independent of each other.

Note: each edge represents both true and anti-dependences.

We need to rewrite the code to iterate over anti-diagonals:

    Calculate number of anti-diagonals
    for each anti-diagonal do
        Calculate the number of points in the current anti-diagonal
        for each point in the current anti-diagonal do
            Compute the value of the current point in the matrix

Parallelize the loop over the points in the current anti-diagonal (it becomes the for_all below):

    for (i=1; i <= 2*n-1; i++) {    // 2n-1 anti-diagonals
        if (i <= n) {
            points = i;             // number of points in anti-diagonal
            row = i;                // first point (row,col) in anti-diagonal
            col = 1;                // note that row+col = i+1 always
        } else {
            points = 2*n - i;
            row = n;
            col = i-n+1;            // note that row+col = i+1 always
        }
        for_all (k=0; k < points; k++) {
            // index each point from k instead of doing row--/col++,
            // so the for_all iterations do not share loop state
            a[row-k][col+k] = ...;  // update a[row-k][col+k]
        }
    }
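Below is a runnable version of the anti-diagonal rewrite for Example 3's update. It is a sketch under several assumptions. OpenMP `parallel for` stands in for `for_all`. The matrix is padded with one boundary row and one boundary column on each side. `N`, `stencil_sequential` and `stencil_wavefront` are hypothetical names. Every point on an anti-diagonal reads only neighbours on the previous anti-diagonal (already updated) and the next one (not yet updated). That matches what it would see in row-by-row order, so both functions produce identical results.

```c
#define N 6  /* hypothetical: interior points 1..N; rows/cols 0 and N+1 are boundary */

/* Example 3's update in the original row-by-row order. */
void stencil_sequential(double a[N + 2][N + 2]) {
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++)
            a[i][j] = a[i][j - 1] + a[i][j + 1] + a[i - 1][j] + a[i + 1][j];
}

/* Same update, visiting the 2N-1 anti-diagonals in order; on anti-diagonal d
   every point satisfies row + col == d + 1, as in the slide's code. */
void stencil_wavefront(double a[N + 2][N + 2]) {
    for (int d = 1; d <= 2 * N - 1; d++) {
        int points = (d <= N) ? d : 2 * N - d;
        int row = (d <= N) ? d : N;          /* first point on this anti-diagonal */
        int col = (d <= N) ? 1 : d - N + 1;
        #pragma omp parallel for
        for (int k = 0; k < points; k++) {   /* the for_all: points are independent */
            int r = row - k, c = col + k;
            a[r][c] = a[r][c - 1] + a[r][c + 1] + a[r - 1][c] + a[r + 1][c];
        }
    }
}
```

The outer loop over anti-diagonals stays sequential, because anti-diagonal d+1 needs the new values from anti-diagonal d.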

DOACROSS Parallelism [§3.2.3]

Suppose we have this code:

    for (i=1; i<=N; i++) {
        S: a[i] = a[i-1] + b[i] * c[i];
    }

Can we execute anything in parallel?

Well, we can't run the iterations of the for loop in parallel, because S[i] →T S[i+1]. (There is a loop-carried dependence.)

But notice that the b[i]*c[i] part has no loop-carried dependence. This suggests breaking up the loop into two:

    for (i=1; i<=N; i++) {
        S1: temp[i] = b[i] * c[i];
    }
    for (i=1; i<=N; i++) {
        S2: a[i] = a[i-1] + temp[i];
    }

The first loop is parallelizable. The second is not.

Execution time: N × (T_S1 + T_S2)

What is a disadvantage of this approach?

Here is how to solve this problem:

    post(0);
    for (i=1; i<=N; i++) {
        S1: temp = b[i] * c[i];
        wait(i-1);
        S2: a[i] = a[i-1] + temp;
        post(i);
    }

What is the execution time now?

Parallelism across statements in a loop [§3.2.4]

Identify dependences in a loop body. If there are independent statements, we can split/distribute the loops.
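The post/wait version maps directly onto OpenMP's doacross loops (OpenMP 4.5 and later). Put `ordered(1)` on the loop, then use `ordered depend(sink: i - 1)` as wait(i-1) and `ordered depend(source)` as post(i). This swaps in OpenMP constructs for the slide's post/wait primitives. The function name is hypothetical, and OpenMP 5.2 renames the clause to `doacross`. A minimal sketch:

```c
/* DOACROSS sketch for S: a[i] = a[i-1] + b[i]*c[i], i = 1..n.
   Arrays hold n+1 elements; a[0] must be initialized by the caller.
   Without OpenMP the pragmas are ignored and the loop runs sequentially. */
void doacross(double *a, const double *b, const double *c, int n) {
    #pragma omp parallel for ordered(1)
    for (int i = 1; i <= n; i++) {
        double temp = b[i] * c[i];               /* S1: no cross-iteration dependence */
        #pragma omp ordered depend(sink: i - 1)  /* wait(i-1) */
        a[i] = a[i - 1] + temp;                  /* S2: needs a[i-1] from iteration i-1 */
        #pragma omp ordered depend(source)       /* post(i) */
    }
}
```

Only S2 waits on the previous iteration. The S1 products of different iterations can be computed concurrently, and a scalar temp replaces the temp[] array. For i = 1, the sink iteration 0 lies outside the iteration space, and OpenMP ignores such a dependence. This plays the role of the slide's post(0).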

Example:

    for (i=0; i<n; i++) {
        S1: a[i] = b[i+1] * a[i-1];
        S2: b[i] = b[i] * coef;
        S3: c[i] = 0.5 * (c[i] + a[i]);
        S4: d[i] = d[i-1] * d[i];
    }

Loop-carried dependences:

Loop-indep. dependences:

Note that S4 has no dependences with other statements.

S1[i] →A S2[i+1] implies that S2 at iteration i+1 must be executed after S1 at iteration i. Hence, the dependence is not violated if all S2s are executed after all S1s.

After loop distribution:

    for (i=0; i<n; i++) {
        S1: a[i] = b[i+1] * a[i-1];
        S2: b[i] = b[i] * coef;
        S3: c[i] = 0.5 * (c[i] + a[i]);
    }
    for (i=0; i<n; i++) {
        S4: d[i] = d[i-1] * d[i];
    }

Each loop is a parallel task. This is called function parallelism, and it can be distinguished from data parallelism, which we saw in DOALL and DOACROSS. Further transformations can be performed (see p. 44 of text).

Characteristics of function parallelism:
- Can use function parallelism along with data parallelism when data parallelism is limited.

DOPIPE Parallelism [§3.2.5]

Another strategy for loop-carried dependences is pipelining the statements in the loop.
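Here is loop distribution as code. The S1-S3 loop and the S4 loop share no arrays, so they can run as two concurrent tasks. The sketch uses OpenMP `sections` for the two tasks, and the function names are hypothetical. Both loops start at i = 1 here, because the slide's loops start at 0, which would read a[-1] and d[-1]. `fused` is the original loop, kept as a reference.

```c
/* Original loop. a, c, d need n elements; b needs n+1 (S1 reads b[i+1]). */
void fused(double *a, double *b, double *c, double *d, double coef, int n) {
    for (int i = 1; i < n; i++) {
        a[i] = b[i + 1] * a[i - 1];   /* S1 */
        b[i] = b[i] * coef;           /* S2 */
        c[i] = 0.5 * (c[i] + a[i]);   /* S3 */
        d[i] = d[i - 1] * d[i];       /* S4 */
    }
}

/* After distribution: two independent tasks (function parallelism). */
void distributed(double *a, double *b, double *c, double *d, double coef, int n) {
    #pragma omp parallel sections
    {
        #pragma omp section
        for (int i = 1; i < n; i++) {   /* task 1: S1-S3 keep their original order */
            a[i] = b[i + 1] * a[i - 1];
            b[i] = b[i] * coef;
            c[i] = 0.5 * (c[i] + a[i]);
        }
        #pragma omp section
        for (int i = 1; i < n; i++)     /* task 2: S4 touches only d */
            d[i] = d[i - 1] * d[i];
    }
}
```

Each task is still a sequential loop internally, because S1 and S4 carry their own true dependences. Only the two tasks run side by side.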

Consider this situation:

    for (i=1; i<=N; i++) {
        S1: a[i] = a[i-1] + b[i];
        S2: c[i] = c[i] + a[i];
    }

Loop-carried dependences:

Loop-indep. dependences:

To parallelize, we just need to make sure the two statements are executed in sync:

    for (i=1; i<=N; i++) {
        a[i] = a[i-1] + b[i];
        post(i);
    }
    for (i=1; i<=N; i++) {
        wait(i);
        c[i] = c[i] + a[i];
    }

Question: What is the difference between DOACROSS and DOPIPE?
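The post/wait pair above can be realized with a shared counter. The sketch below assumes POSIX threads and C11 atomics, and may need `-pthread` to link. `dopipe`, `stage1` and `pipe_state` are hypothetical names. A single, monotonically increasing counter stands in for the per-iteration post(i) flags, which is valid here because stage 1 posts in iteration order. The sketch busy-waits to stay short; a real implementation would likely block instead.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state for the two-stage pipeline over i = 1..n. */
typedef struct {
    double *a;
    const double *b;
    int n;
    atomic_int produced;  /* largest i whose a[i] is final; storing i is post(i) */
} pipe_state;

/* Stage 1 (S1), run on its own thread. */
static void *stage1(void *arg) {
    pipe_state *st = arg;
    for (int i = 1; i <= st->n; i++) {
        st->a[i] = st->a[i - 1] + st->b[i];
        atomic_store_explicit(&st->produced, i, memory_order_release); /* post(i) */
    }
    return NULL;
}

/* DOPIPE: stage 1 on a helper thread, stage 2 (S2) on the calling thread.
   Arrays hold n+1 elements and a[0] must be set. Returns 0 on success. */
int dopipe(double *a, const double *b, double *c, int n) {
    pipe_state st = { .a = a, .b = b, .n = n };
    atomic_init(&st.produced, 0);
    pthread_t t;
    if (pthread_create(&t, NULL, stage1, &st) != 0)
        return -1;
    for (int i = 1; i <= n; i++) {
        /* wait(i): spin until stage 1 has published a[i] */
        while (atomic_load_explicit(&st.produced, memory_order_acquire) < i)
            ;
        c[i] = c[i] + a[i];   /* S2 */
    }
    return pthread_join(t, NULL);
}
```

The release store in stage 1 and the acquire load in stage 2 make the write of a[i] visible before S2 reads it. Stage 1 can meanwhile run ahead on a[i+1], a[i+2], and so on. This is the pipelining that distinguishes DOPIPE from running S1 and S2 in lockstep.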