John Weatherwax. Analysis of Parallel Depth First Search Algorithms

Similar documents
Computer arithmetic. Intensive Computation. Annalisa Massini 2017/2018

4. Score normalization technical details We now discuss the technical details of the score normalization method.

MATH 2710: NOTES FOR ANALYSIS

Feedback-error control

Hotelling s Two- Sample T 2

Round-off Errors and Computer Arithmetic - (1.2)

Model checking, verification of CTL. One must verify or expel... doubts, and convert them into the certainty of YES [Thomas Carlyle]

Analysis of execution time for parallel algorithm to dertmine if it is worth the effort to code and debug in parallel

Statics and dynamics: some elementary concepts

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

Outline. EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) Simple Error Detection Coding

Principles of Computed Tomography (CT)

Elementary Analysis in Q p

Approximating min-max k-clustering

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

Machine Learning: Homework 4

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models

Asymptotically Optimal Simulation Allocation under Dependent Sampling

Convex Optimization methods for Computing Channel Capacity

Department of Mathematics

A Social Welfare Optimal Sequential Allocation Procedure

Understanding DPMFoam/MPPICFoam

8 STOCHASTIC PROCESSES

A generalization of Amdahl's law and relative conditions of parallelism

arxiv: v1 [physics.data-an] 26 Oct 2012

Numerical Linear Algebra

Chapter 7 Rational and Irrational Numbers

An Analysis of Reliable Classifiers through ROC Isometrics

Distributed Rule-Based Inference in the Presence of Redundant Information

Analysis of some entrance probabilities for killed birth-death processes

Improved Capacity Bounds for the Binary Energy Harvesting Channel

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek

Modeling and Estimation of Full-Chip Leakage Current Considering Within-Die Correlation

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test)

ANALYTIC NUMBER THEORY AND DIRICHLET S THEOREM

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

Economics 101. Lecture 7 - Monopoly and Oligopoly

Chapter 1 Fundamentals

2 Asymptotic density and Dirichlet density

Notes on pressure coordinates Robert Lindsay Korty October 1, 2002

Scaling Multiple Point Statistics for Non-Stationary Geostatistical Modeling

Supplementary Materials for Robust Estimation of the False Discovery Rate

Sets of Real Numbers

Fig. 21: Architecture of PeerSim [44]

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III

PARALLEL MATRIX MULTIPLICATION: A SYSTEMATIC JOURNEY

1 Gambler s Ruin Problem

Topic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar

CMSC 425: Lecture 4 Geometry and Geometric Programming

Periodic scheduling 05/06/

MAS 4203 Number Theory. M. Yotov

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

RESOLUTIONS OF THREE-ROWED SKEW- AND ALMOST SKEW-SHAPES IN CHARACTERISTIC ZERO

The non-stochastic multi-armed bandit problem

PARALLEL MATRIX MULTIPLICATION: A SYSTEMATIC JOURNEY

Plotting the Wilson distribution

2 Asymptotic density and Dirichlet density

PHYS 301 HOMEWORK #9-- SOLUTIONS

Fault Tolerant Quantum Computing Robert Rogers, Thomas Sylwester, Abe Pauls

CMSC 425: Lecture 7 Geometric Programming: Sample Solutions

On the Toppling of a Sand Pile

Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations

Published: 14 October 2013

End-to-End Delay Minimization in Thermally Constrained Distributed Systems

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit

Radial Basis Function Networks: Algorithms

CSC165H, Mathematical expression and reasoning for computer science week 12

The Arm Prime Factors Decomposition

Solutions to Problem Set 5

ECE 534 Information Theory - Midterm 2

START Selected Topics in Assurance

Positive decomposition of transfer functions with multiple poles

MA3H1 TOPICS IN NUMBER THEORY PART III

Adam Paweł Zaborski. 8 Plasticity. reloading. 1. Bauschinger s effect. 2. unchanged yielding limit. 3. isotropic hardening

HENSEL S LEMMA KEITH CONRAD

THE SET CHROMATIC NUMBER OF RANDOM GRAPHS

An Analysis of TCP over Random Access Satellite Links

Almost 4000 years ago, Babylonians had discovered the following approximation to. x 2 dy 2 =1, (5.0.2)

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder

Proof: We follow thearoach develoed in [4]. We adot a useful but non-intuitive notion of time; a bin with z balls at time t receives its next ball at

Analysis of Multi-Hop Emergency Message Propagation in Vehicular Ad Hoc Networks

An Investigation on the Numerical Ill-conditioning of Hybrid State Estimators

A randomized sorting algorithm on the BSP model

SAS for Bayesian Mediation Analysis

MATH 250: THE DISTRIBUTION OF PRIMES. ζ(s) = n s,

Multi-Operation Multi-Machine Scheduling

SIMULATED ANNEALING AND JOINT MANUFACTURING BATCH-SIZING. Ruhul SARKER. Xin YAO

Universal Finite Memory Coding of Binary Sequences

ECE 6960: Adv. Random Processes & Applications Lecture Notes, Fall 2010

Brownian Motion and Random Prime Factorization

Statistical Multiplexing Gain of Link Scheduling Algorithms in QoS Networks (Short Version)

A New Method of DDB Logical Structure Synthesis Using Distributed Tabu Search

Preconditioning techniques for Newton s method for the incompressible Navier Stokes equations

SUPER-GEOMETRIC CONVERGENCE OF A SPECTRAL ELEMENT METHOD FOR EIGENVALUE PROBLEMS WITH JUMP COEFFICIENTS *

Linear diophantine equations for discrete tomography

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data

Elliptic Curves and Cryptography

Transcription:

Sulementary Discussions and Solutions to Selected Problems in: Introduction to Parallel Comuting by Viin Kumar, Ananth Grama, Anshul Guta, & George Karyis John Weatherwax Chater 8 Analysis of Parallel Deth First Search Algorithms Following the discussion in the book on this toic I will clarify some oints that I had difficulty following. Hoefully, these comments will be of hel to others as they learn this material. In the resented analysis, there is a total of W work (searching) to be done and we initially have all of it assigned to a single rocessor. This is in fact the way most arallel rograms begin so our assumtions are in line with reality. Since we have not secifically secified the load balancing scheme we can be guaranteed that this W amount of work will be slit u and distributed to the remaining rocessors if the load balancing scheme is weakly fair. Intuitively, this means that after enough requests for work in the system (from any rocessor) have been issued eventually every rocessor will have work requested from it. In other words, no rocessor is excluded from being asked to rovide work. To quantify the notion of how many requests must take lace before we can guarantee that every rocessor has had work requested from it. We define V () to be the number of requests that must take lace in the system before we can guarantee that every rocessor has been olled for work. Obviously, V () deends on the secific load balancing scheme used be it global round

robin, randomized olling etc. We see that at least V () since if V () < we have not even generated enough olling to request from every existing rocessor. Secifically V () will deend on the load balancing algorithm and will be secified when we discuss the exlicit load balancing schemes below. In our initial situation, with all the work W at one rocessor, we must have V () requests in our system before we can be guaranteed to to have requested work from. With the first request that accets, the maximal work in the system dros to ( α)w. This is assuming an α-slitting as discussed in the book. After 2V () requests it dros to ( α) 2 W, etc. After nv () total requests it dros to ( α) n W. To determine the number n of V () requests before our tasks are so small they can no longer be slit we enforce the requirement that after these n work divisions, the maximal work left in the system be less than ɛ. This gives ( α) n W < ɛ Solving for n gives n > log /( α) ( W ɛ ) Thus after n chunks of V () requests we can guarantee that the maximal work at any given node in the system is less than ɛ. Thus the total number of work requests is O(nV ()) and since O(n) = O(log /( α) ( W ɛ )) = O(log W ) the total number of work requests simlifies to O(V () log W ) as given in the text. If we further assume that each request for work and the corresond work transfers occuy constant time t comm, an uer bound on the amount of communication overhead can be given by T o = t comm V () log W. For variations on communication that does not take lace in constant time this see the roblems from this chater. A Different Analysis of V() for Random Polling Here is a different method of attack for obtaining the same answer for this question. As stated in the text, assume that we have total boxes to mark and we continue to make random attemts to mark an unmarked box. Define 2

f(q, n) to be the robability that q boxes are marked after n trials. Then the average number of trials needed to mark all of the boxes is given by N = kf(, k) () k= A simle artial difference equation for f(q, n) is given by the following logic. The robability that q + boxes are marked after n + trials can be given by the sum of the two mutually exclusive events q + boxes are marked after n trials and the n + attemt to mark a box fails q boxes are marked after n trials and the n + attemt to mark a box succeeds Now at any stage, with q (of ) boxes marked, the chance we mark another box (a success) on the next trial is q and the chance we do not (a failure) is given by q. With this background we can now exress f(q +, n + ) as f(q +, n + ) = q + (q + ) f(q +, n) + f(q, n). (2) For q 0 and n. Note that this is a linear artial difference equation and has many amenable method for its solution. For initial and boundary conditions for f(q, n) we see from simle arguments that f(0, n) = 0 for n (3) f(q, 0) = 0 for q (4) f(q, n) = 0 for q > n (5) f(, ) = (6) f(q, q) = (q) ( )( 2)... ( q + ) = for q (7) q q We now desire the solution to the artial difference equation, Eq. 2. It will benefit us to write it decremented in q by one or as f(q, n + ) = q q f(q, n) + f(q, n) for n, q. (8) 3

Attemts to solve this artial difference equation are reorted on below using a variety of techniques. Some were more successful that others and may indicate why the book choose the aroach they took. Many reduce to the same calculations. In locations where I couldn t make more rogress, I simly stoed, oting to come back to these derivations when my skills had imroved more. Oerator Method As a first attemt, lets try to solve Eq. 8 using the Oerator Method. As such we define E f(q, n) = f(q +, n) (9) E 2 f(q, n) = f(q, n + ) (0) and our original equation becomes ( q E 2 f = ( q ) )E f () therefore recognizing the above as n iterations on f(q, n) we have (by induction) and then the binomial theorem that ( q f(q, n) = + ( q ) n )E g(q) (2) ( ) ( ) n k ( n q = q ) n k E (n k) g(q) (3) k k=0 ( ) ( ) n k ( n q = q n k g(q n + k) (4) k ) k=0 As an interesting ( ) observation, that may shed light into how to roceed, notice that ( q n k )k ( q )n k is the Binomial distribution. That is, the robability of k successes in n trials where the robability of a single success is q. The Z-transform with resect to n Taking the Z-transform with resect to n gives zf (q, z) zf(q, 0) = q q F (q, z) + F (q, z) for q (5) 4

Now f(q, 0) = 0 when q > 0 so the above can be solved for F (q, z) giving Solving by iteration we obtain F (q, z) = q z q F (q, z) for q (6) F (q, z) = q ( l ) q (z l (0, z) (7) )F Note that F (0, z) = Z(f(0, n)) = n 0 f(0, n)z n = f(0, 0)z 0 = f(0, 0) (8) At this oint we have not determined f(0, 0). Limiting values of Eqs. 3 and 4 above would indicate that its value should be 0. This cannot be true, or else we end u with the trivial solution for F (q, z). In addition, limiting values of the Eq. 7 require the value of f(0, 0) =, which is what we use in the analysis below. To comute the inverse Z-transform of Eq. 7 we note that this exression can be searated into a sum of individual terms by artial fractions. To enable such a transformation we are looking for a set of coefficients A l such that Multilying both sides by A l q q (z l ) = z l ( q z l ) in the standard way we can obtain the following for the coefficients A l (9) A l = q m=,m l ( l ) = m q q m=,m l (l m) for l =, 2,..., q (20) so that with this substitution F (q, z) becomes F (q, z) = ( q ( l ) ) q 5 ( z l l A l ) (2)

Note that the roduct exression in the above equation can be written in terms of the factorial function as q ( l q ) = l) ( (22) = q ( l) (23) q so that at this oint F (q, z) becomes = ( )( 2)... ( q) (24) q = q+ (q+) (25) F (q, z) = (q+) q q A l ( z l l ) (26) To further comute the inverse Z-transform of this exression recall that the Z-transform of the Heavyside unit ste (with jum occurring at ) is given by (see Page 2 in the book by Kelly and Peterson []: Examle 3.37) n 0 u n () = z and recalling the geometric roduct formula for the Z-transform (27) Z(a n y n ) = Y ( z a ) (28) we can invert the above function obtaining ( ) f(q, n) = (q+) q n A l l u q n () (29) l = (q+) q u n() A q+n l l n (30) The Method of Generating Functions In this method we multily both sides of the equation by x n and sum from n 0. This gives f(q, n + )x n = q n 0 n 0 f(q, n)xn + q f(q, n)x n (3) n 0 6

Defining F (q, x) = n 0 f(q, n)x n the above becomes or or n f(q, n)x n = q F (q, x) + ( q )F (q, x) (32) x F (q, x) = q F (q, x) + ( q )F (q, x) (33) F (q, x) = ( q ) x q (34) The Method of Searation of Variables To use Searation of Variables we define f(q, n) = F q G n and insert this into our artial difference above to get Now dividing by G n F q we obtain F q G n+ = q F qg n + ( q )F q G n (35) G n+ = q ( G n + q ) Fq. (36) F q Since the left hand side is a function only n and the right hand side is only a function of q they each must equal a searation constant we denote α. With this substitution the equation for G n then becomes G n+ = αg n (37) with a solution of Similarly F q must satisfy or G n = G 0 α n (38) q + ( q )F q F q = α (39) F q = q q αf q (40) 7

Derivation of a difference equation for N This may yet be another method that could yield manageable calculations. Chater 8 Solutions Problem To evaluate the erformance of this scheme we will follow the discussion given in the book and comute the overhead due to communication i.e. resulting from work requests and work transfers necessitated by the load balancing algorithm; be it asynchronous round robin, global round robin, or random oling. From the the discussions in the book, the communication overhead is given by T 0 = t comm V () log W. (4) Assuming a constant communication time (t comm ) indeendent of the distance between rocessors and the amount of data sent. Problem 4 Part (a): Assume as in the discussion in the text that W amount of work is initially laced at a single rocessor. In V () requests and assuming an α-slitting we reduce the maximum work at any given node to ( α)w. We also require distributing either αw or ( α)w to the requesting rocessor (in a real imlementation where communication bandwidth is limited one would want to transmit the smaller of those two exressions). This transmission requires a communication time roortional to either αw or ( α)w and is bounded above by the larger of these two. Since we assume that 0 < α < 0.5 this is ( α)w. This total negotiation for work results in at most V () requests each with t comm communication time and the additional transfer of work which is bounded above by t w ( α)w. Giving a total communication overhead of T () o = t comm V () + t w ( α)w. (42) A second alication of the same logic results in T (2) o = T () o + t comm V () + t w ( α) 2 W (43) 8

= 2t comm V () + t w ( ( α)w + ( α) 2 W ) (44) Induction to multile blocks of work requests on the above attern (until the work at all rocessors is less than the minimum slit amount ɛ) gives for the total overhead due to communication (transfer of data lus requests) of n T o = t comm V () log W + t w W ( α) i/2 (45) Where as in the notes for this section n > log /( α) ( W ɛ ). To determine the total communication overhead we must erform the summation in Eq. 45. The simlest way to evaluate this exression is to consider the limit when n tends to infinity. In that case, the summation above is a geometric series and converges nicely to a constant. Giving in total that the total overhead is given by i= T 0 = t comm V () log(w ) + O(t w W ) (46) Now Eq. 46 reresents the communication required by the load balancing strategy through its deendence on V (). For load balancing by Global Round Robin (GRR) the function V () = since we must have at least work requests in the system before we can be gaurrenteed that every rocessor has received at least one request and Eq. 46 becomes T 0 = t comm log(w ) + C t w W + C0 (47) For the hyercube architecture the average distance between rocessors is given by Θ(log()) so that the above becomes T 0 = O( log() log(w )) + O(t w W ). (48) To derive the iso-efficiency function we ballance the communication overhead T 0 with the total work to be done W giving W = O( log() log(w )) + O( W ) (49) and we desire to solve the above asymtotically for W = W (). Asymtotically, W W, and we dro the O( W ) term from the above giving W log() log(w ) when W 9

giving W = O( log() log( log( log(w )))) (50) = O( log((log() + log(log()) + log(w )))) (5) O( log() 2 ) (52) As discussed in the text for global round robin, the iso-efficiency due to contention for the shared resource counter rocessor is O( 2 log()) and this dominates the iso-efficiency above. Part (b): In this case, after the system has seen V () work requests the maximal work at any one rocessor is reduced to W ( α), after two blocks of V () work requests the maximal work in any one rocessor is given by W ( α) 2. As before we have, after n blocks of V () work requests the maximal work in any one rocessor is given by W ( α) n. The assumtions of this art of the roblem assume that the communication time for the transfer of work is uer bounded by log(w ( α) n ). (53) So in total, the total time required for the data transfered is bounded above by n t w log(w ( α) i ) = n t w log(w ) + i log( α) (54) i= i= = n t w n log(w ) + t w log( α) i (55) i= with t w the er word data transfer time (time required to transfer a word of data). Since n = O(log(W )) as discussed above we observe that the above is bounded above by t w (log(w )) 2 + t w log( α)n 2 = O(log(W ) 2 ). (56) Thus the overhead due to communication of work requests is T 0 = t comm V () log(w ) + (log(w )) 2 (57) since for Global Round Robin V () = and we obtain T 0 = t comm log(w ) + (log(w )) 2 (58) 0

For the hyercube architecture t comm = log() and we have T 0 = log() log(w ) + (log(w )) 2. (59) To derive the iso-efficency function set T 0 = W and solve for W as a function of. The assignment of T 0 = W gives W = log() log(w ) + (log(w )) 2 (60) Since (log(w )) 2 is subdominant to W (the left hand side) we obtain equating W with the log() term the exression W = log() log(w ) (6) which is the same result as in Part (a) and gives an iso-efficiency function of W = O((log()) 2 as before. Since for both of these situations the communication time due to data transfer did not affect the iso-efficiency function we see a motivation for not including it in the discussion resented in the book. References [] W. G. Kelley and A. C. Peterson. Difference Equations. An Introduction with Alications. Academic Press, New York, 99.