A generalization of Amdahl's law and relative conditions of parallelism

Author: Gianluca Argentini, New Technologies and Models, Riello Group, Legnago (VR), Italy. E-mail: gianluca.argentini@riellogroup.com

Abstract: In this work I present a generalization of Amdahl's law on the limits of a parallel implementation with many processors. In particular I establish some mathematical relations involving the number of processors and the dimension of the treated problem, and with these conditions I define, on the basis of the reachable speedup, some classes of parallelism for the implementations. I also derive a condition for obtaining superlinear speedup. The mathematical techniques used are those of differential calculus. I describe some examples from classical problems offered by the specialized literature on the subject.

Key words: dimension of a problem, high performance, parallel implementation, scalability analysis, speedup.

1. Introduction

In the world of parallel computing, or in general of high performance computing, one of the most useful metrics for evaluating the gain reachable by an implementation of a program on many processors, in comparison with its serial single-processor version, is the speedup S (see PACHECO, 1997), defined as the ratio between the time $T_{ser}$ taken by the execution of the serial program and the time $T_{par}$ taken by its parallel version:

$$S = \frac{T_{ser}}{T_{par}} \qquad (1)$$

In this work I consider these two times as computed by a scalability analysis (see GROPP, 2002) of a particular logical implementation of the problem to which they refer, and not as measured on a particular hardware system. From formula (1) one obtains Amdahl's law (see AMDAHL, 1967) by means of the concept of parallelizable fraction f of a particular parallel implementation, that is the percentage of statements that can be executed at the same time on many processors:

$$S = \frac{p}{f\,(1 - p) + p} \qquad (2)$$

where p is the number of processors used and $0 < f \le 1$. The optimal case, that is when f = 1, gives for S a value equal to p.
S is an increasing function of the variable p, and for p tending to infinity we obtain the limit

$$\lim_{p \to +\infty} S = \frac{1}{1 - f} \qquad (3)$$

which expresses Amdahl's law: the speedup admits an upper bound determined by the code used, even if the number of processors is very high. Fig. 1 shows the graph of the speedup for a code with f = 0.8; the asymptotic limit is 5.

[Fig. 1: speedup versus number of processors (up to 100) for f = 0.8.]

But the limitation imposed by this law can be overcome, even to a large extent, if one considers that formula (2) does not explicitly contain the parameter that expresses the dimension of the considered problem, that is the number n of data given as input. A first re-examination of Amdahl's law (see BENNER - GUSTAFSON - MONTRY, 1988) shows that, in several real situations, when the number of processors increases, a corresponding suitable increase of the number of treated data provides a speedup much bigger than that imposed by (2). (PACHECO, 1997) also contains a brief but illuminating discussion of this possibility. The purpose of this work is to explain from a mathematical point of view how it is possible to obtain speedup values much higher than those estimated from Amdahl's law, and to classify the parallel implementations on the basis of the speedup obtained by various combinations of the parameters n and p.

2. Generalization of Amdahl's law

I will refer to an implementation of a given problem, that is to a triad (Problem, Program, System) constituted by a problem, for example the multiplication of a matrix by a vector, by a program that accepts the problem's data as input, and by an operating environment, understood as both hardware and software, in which the program runs. The problem gives the dimension n of the implementation; the program, adapting itself to the hardware used, gives the number p of processors. In accordance with most of the literature on high performance computing, for simplicity I identify the number of processors with the number of independent processes used, which are equal copies of the considered program. For example, in codes that use the parallelization library MPI (see MPI FORUM), the start of a parallel program allows one to specify the number of processes, that is of independent copies of the program itself, that communicate with each other during the execution. I also assume that the processors are all of the same type and that the operating environment is homogeneous with respect to the software used (e.g. compilers, libraries).
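As a numerical illustration of formulas (2) and (3), the following minimal Python sketch evaluates the classical Amdahl speedup and its asymptotic limit. The function names `amdahl_speedup` and `amdahl_limit` are hypothetical, chosen only for this illustration; the value f = 0.8 is the one used for Fig. 1.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Speedup from formula (2): S = p / (f * (1 - p) + p)."""
    return p / (f * (1 - p) + p)


def amdahl_limit(f: float) -> float:
    """Asymptotic speedup from formula (3): 1 / (1 - f), for f < 1."""
    return 1.0 / (1.0 - f)


if __name__ == "__main__":
    f = 0.8  # parallelizable fraction used for Fig. 1
    for p in (1, 10, 100, 1000):
        print(p, amdahl_speedup(f, p))
    print("asymptotic limit:", amdahl_limit(f))
```

For f = 1 the function returns exactly p, while for f = 0.8 the values stay below the limit 5 however large p is chosen.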

The first consideration is that the parameter f of Amdahl's law can depend on the implementation used, that is on the code instructions of the program used for treating the problem, and hence in general it will be a function of p and n:

$$f = f(p, n) \qquad (4)$$

If we introduce (4) into (2) and differentiate partially with respect to p, we obtain

$$\frac{\partial S}{\partial p} = \frac{p^2 \frac{\partial f}{\partial p} - p \frac{\partial f}{\partial p} + f}{\left(f + (1 - f)\,p\right)^2} \qquad (5)$$

A realistic and desirable condition in parallel implementations for high performance is that the speedup grows when the number of processors used increases, hence we impose the following condition:

$$\frac{\partial S}{\partial p} \ge 0 \qquad (6)$$

and from the fact that the denominator in (5) is always positive it follows that

$$p^2 \frac{\partial f}{\partial p} - p \frac{\partial f}{\partial p} + f \ge 0 \qquad (7)$$

Since f > 0, (7) can be written in this way:

$$1 + (p^2 - p)\,\frac{1}{f}\frac{\partial f}{\partial p} \ge 0 \qquad (8)$$

Also, since p > 1 in a parallel implementation, it follows that $p^2 - p > 0$, and remembering that $\frac{1}{f}\frac{\partial f}{\partial p} = \frac{\partial (\mathrm{Log}\, f)}{\partial p}$, from (8) one obtains:

$$\frac{1}{p^2 - p} + \frac{\partial (\mathrm{Log}\, f)}{\partial p} \ge 0 \qquad (9)$$

Integrating the first term indefinitely with respect to the variable p, (9) is equivalent to

$$\frac{\partial}{\partial p}\left(\mathrm{Log}\,\frac{p - 1}{p} + g(n) + \mathrm{Log}\, f\right) \ge 0 \qquad (10)$$

where g(n) is an arbitrary function of n, that is the constant of integration with respect to p. Denoting by F(p, n) the sum of the two logarithmic expressions, one can write

$$\mathrm{Log}\, f = F(p, n) - \mathrm{Log}\,\frac{p - 1}{p} = F(p, n) + \mathrm{Log}\,\frac{p}{p - 1} \qquad (11)$$

from which it follows that

$$f = \frac{p}{p - 1}\,\mathrm{Exp}(F(p, n)) \qquad (12)$$

We call the function F the exponent of parallelism. We now see two properties of F that are useful for the subsequent discussion. First of all, from (10) it follows that

$$\frac{\partial F}{\partial p} \ge 0 \qquad (13)$$

hence F is an increasing function of p. Also, since by definition $f \le 1$, from (12) one obtains the following condition:

$$F(p, n) \le \mathrm{Log}\,\frac{p - 1}{p} < 0 \qquad (14)$$

Now we make some considerations about the relation between F and the parallelism of the corresponding implementation. In the first place, it is realistic to assume that the dimension n of a treated problem is greater than the number p of processors used in the corresponding parallel implementation; otherwise there is a waste of hardware resources. Also, as made explicit e.g. in (PACHECO, 1997) and (BENNER - GUSTAFSON - MONTRY, 1988), it is realistic to assume that as the number of processors used increases, the problem dimension can increase too; otherwise the concept of performance of parallelism and the search for a good implementation itself could have no meaning.
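The link between the parallelizable fraction and the exponent of parallelism can be checked numerically. The sketch below inverts the relation f = p/(p-1) Exp(F) to obtain F from f and p, and uses the fact that substituting this relation into Amdahl's formula S = p/(f(1-p)+p) gives S = 1/(1 - Exp(F)). The function names are hypothetical and the natural logarithm is assumed.

```python
import math


def exponent_of_parallelism(f: float, p: int) -> float:
    """Invert f = p/(p-1) * exp(F): F = log(f * (p - 1) / p), for p > 1."""
    if p <= 1 or not 0.0 < f <= 1.0:
        raise ValueError("need p > 1 and 0 < f <= 1")
    return math.log(f * (p - 1) / p)


def speedup_from_exponent(F: float) -> float:
    """Amdahl's formula rewritten in terms of F: S = 1 / (1 - exp(F))."""
    if F >= 0.0:
        raise ValueError("the exponent of parallelism must be negative")
    return 1.0 / (1.0 - math.exp(F))


if __name__ == "__main__":
    for p in (2, 10, 100, 1000):
        F = exponent_of_parallelism(0.8, p)
        print(p, F, speedup_from_exponent(F))
```

With a constant f = 0.8 the printed values of F increase with p and approach Log 0.8, a negative limit, while the speedup approaches 5.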
Under these hypotheses, from (12) there are two possibilities:

A) if p becomes very large, the ratio n/p tends to a finite limit greater than 0; this is the case for example when n = kp, where k is a constant, even a large one; in this situation the two possible meaningful alternatives are, remembering conditions (13) and (14):

a') F(p, n) tends to 0, and from this it follows that f tends to 1; hence the parallel implementation allows an unlimited speedup when p increases, under the condition that the growth of the problem's dimension n is asymptotic to the number of processors;

a'') F(p, n) tends to a finite limit smaller than 0, and from this it follows that f tends to a value $f_0$ where $0 < f_0 < 1$; in this case, when the number of processors increases, the parallel implementation shows a behaviour that conforms to Amdahl's law, and the speedup tends to $\frac{1}{1 - f_0}$;

B) if p becomes very large, the ratio n/p tends to $+\infty$; this is the case for example when n = p Log p; in this situation the two possible meaningful alternatives are:

b') F(p, n) tends to 0, and from this it follows that f tends to 1; hence the parallel implementation allows an unlimited speedup when p increases, and the problem dimension can now increase very quickly with respect to the number of processors; hence this situation is optimal, and a real case is given for example in (PACHECO, 1997) by a parallel implementation of numerical integration with the trapezoidal rule;

b'') F(p, n) tends to a finite limit smaller than 0, and from this it follows that f tends to a value $f_0$ where $0 < f_0 < 1$; hence in this case too, when the number of processors increases, the parallel implementation shows a behaviour that conforms to Amdahl's law, but the problem dimension can now increase in a way that is not asymptotic to the number of processors.

I consider not very realistic, or at least not meaningful for an analysis of their parallelism, the situations where the exponent of parallelism tends to $-\infty$, in which case f tends to 0, the speedup becomes 1 and there is a waste of resources, or those for which the ratio n/p tends to 0. From the previous considerations we can get the following theorem-definition, which generalizes Amdahl's law and establishes the type of parallelism of a given implementation:

In a parallel implementation, let n be the dimension of the corresponding problem, p the number of processors, and F(p, n) the exponent of parallelism.

If n = g(p) is an increasing function of p such that the ratio g(p)/p tends to $+\infty$ and F(p, g(p)) tends to 0 for p tending to $+\infty$, then the implementation is strongly parallel.

If F(p, g(p)) tends to 0 for p tending to $+\infty$ only for g(p) an increasing function of p such that the ratio g(p)/p tends to a finite limit, then the implementation is weakly parallel.

If for every increasing function g(p) of p the function F(p, g(p)) tends to a limit smaller than zero for p tending to $+\infty$, then the implementation is Amdahl-like parallel.

3. Considerations and examples

When we apply the preceding classification we meet two problems: the first is how to calculate the exponent of parallelism, and the second is how to demonstrate whether a suitable function g(p) exists. The former depends upon the implementation used and can be solved by means of a scalability analysis (see GROPP, 2002); in the following considerations I will try to explain how it is possible to obtain some useful information. The latter is a problem of a mathematical kind, which can be tackled by means of the methods of differential calculus.
First of all we can notice that in (12) we are interested in knowing when, for large enough values of p, the exponential values are near 1, and hence, if we write the power series of Exp(F) with respect to the argument F, we can assert that the following is a good approximation:

$$f = \frac{p}{p - 1}\left(1 + F + \frac{F^2}{2}\right) \qquad (15)$$

Using this expression of f in (2) one obtains $\frac{1}{S} = -\left(F + \frac{F^2}{2}\right)$, a quadratic equation in F whose solutions are

$$F = -1 \pm \sqrt{1 - \frac{2}{S}} \qquad (16)$$

where the interesting case is that with the + sign, because it is useful for examining the closeness of F to 0. In (1) we can consider $T_{par} = T_{par}(p, n)$. Also we can establish, as a good approximation in general and an optimal one when all the system's processors are of the same kind, that $T_{ser} = T_{par}(1, n)$, i.e. the time spent in a serial implementation of a program can be approximated by the time spent in the corresponding parallel implementation executed on a single processor. From (1) and (16) one obtains the following parallelism condition, which makes the argument of the square root non-negative:

$$\frac{T_{par}(p, n)}{T_{par}(1, n)} \le \frac{1}{2} \qquad (17)$$

which can be interpreted as a minimal condition of parallelism that guarantees an advantageous speedup. We now examine some examples in which, using the relation

$$F = -1 + \sqrt{1 - \frac{2\,T_{par}(p, n)}{T_{par}(1, n)}} \qquad (18)$$

we can obtain the information given in the preceding generalization of Amdahl's law.

In (PACHECO, 1997) a parallel implementation is presented of the numerical computation of a definite integral by means of the trapezoidal rule, and using a scalability analysis the following estimate is obtained:

$$T_{par}(p, n) = a\,\frac{n}{p} + b\,\mathrm{Log}\, p$$

where a and b are two positive constants depending on the operating environment used. Hence we have

$$\frac{T_{par}(p, n)}{T_{par}(1, n)} = \frac{a\,n + b\,p\,\mathrm{Log}\, p}{a\,n\,p}$$

from which we see, using the rules of calculus, that if one considers for example $n = p^2$, an increasing function of p, the limit of the ratio when p grows indefinitely is 0, and from (18) one obtains that F tends to 0. Hence, on the basis of the preceding classification, the implementation is strongly parallel, and this fact agrees with the scalability arguments developed by Pacheco, according to which the speedup remains high if the increase of n is suitably guided by that of p.
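The behaviour of the exponent of parallelism in the trapezoidal rule example can be tabulated with a short Python sketch. It evaluates relation (18) on the scalability estimate quoted above, with the growth law n = p². The constants a = b = 1, the use of the natural logarithm and the function names are assumptions made only for this illustration.

```python
import math


def trapezoid_t_par(p: int, n: float, a: float = 1.0, b: float = 1.0) -> float:
    """Estimate quoted in the text: T_par(p, n) = a*n/p + b*Log(p)."""
    return a * n / p + b * math.log(p)


def exponent_from_times(p: int, n: float, a: float = 1.0, b: float = 1.0) -> float:
    """Relation (18): F = -1 + sqrt(1 - 2*T_par(p, n)/T_par(1, n))."""
    ratio = trapezoid_t_par(p, n, a, b) / trapezoid_t_par(1, n, a, b)
    if ratio > 0.5:
        # Condition (17) fails, so (18) is not applicable.
        raise ValueError("minimal condition of parallelism (17) not satisfied")
    return -1.0 + math.sqrt(1.0 - 2.0 * ratio)


if __name__ == "__main__":
    for p in (10, 100, 1000, 10000):
        print(p, exponent_from_times(p, n=p ** 2))
```

The printed values are negative and move towards 0 as p grows, which is the behaviour required of a strongly parallel implementation. Calling `exponent_from_times(1000, 10)`, that is a small fixed n with many processors, raises the error because condition (17) is violated.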
One can see from the preceding ratio that, for applying the classification rule, it is sufficient to use the function n = p Log p, as reported by Pacheco. If one uses the function n = p, the limit of F when p increases is still 0, but since the ratio n/p tends to a finite limit, the implementation, already classified as strongly parallel, shows in such conditions a weakly parallel behaviour. This fact shows, in accordance with (BENNER - GUSTAFSON - MONTRY, 1988), the importance of a convenient growth function n = g(p) in a parallel implementation that aims at a high speedup. It can also be noticed that, keeping fixed the dimension n of the problem, when the number of processors increases the preceding ratio increases in an unbounded manner; hence the parallelism is no longer advantageous, (18) is not applicable, S tends to zero and hence Amdahl's law (3) turns out not to be correct. This fact suggests the hypothesis that, in the set of possible implementations, the strongly parallel implementations are those that turn out advantageous when the problem's dimension grows in a suitably considerable way with respect to the number of processors used. In contrast with Fig. 1, the following Fig. 2 presents the graphs of the speedup for the trapezoidal rule in the cases $n = p^2$, n = p Log p and n = p respectively:

[Fig. 2: speedup versus number of processors (both axes from 0 to 500) for the trapezoidal rule with $n = p^2$, n = p Log p and n = p.]

from which one can see that the speedup improves when the derivative of the problem dimension with respect to the number of processors grows.

In (CORMEN - LEISERSON - RIVEST, 1990) a parallel implementation is presented for the calculation of a Fast Fourier Transform, which aims at keeping an asymptotic execution time $T_{par} = A\,\mathrm{Log}(n)$, where A is a constant, n is the input dimension and the logarithm is in base 2. The corresponding serial implementation shows an asymptotic time $T_{ser} = B\,n\,\mathrm{Log}(n)$. The constants A and B depend on the operating environment used. The parallel procedure is obtained by means of a necessary configuration of n combinatorial elements, which are responsible for the operations of addition, multiplication and intercommunication of partial results. If one uses p processors, each of which assembles k combinatorial elements, then n = kp, and hence

$$\frac{T_{par}}{T_{ser}} = \frac{A}{B\,n} = \frac{A}{B\,k\,p}$$

The conclusion is that the function F tends to 0 for p tending to $+\infty$, and this under the condition that the problem dimension grows linearly with respect to the number of processors. Hence, on the basis of the preceding classification, the parallel implementation proposed for the FFT is weakly parallel. In this example, if one keeps the problem dimension n constant while increasing the processors, even indefinitely (i.e. identifying each of them with a single combinatorial element, hence imposing k = 1), the ratio of the execution times is constant too, and therefore in this case Amdahl's law can be applied. This consideration suggests that, in the set of possible implementations, the weakly parallel implementations are those for which, keeping constant the dimension of the corresponding problem, Amdahl's law is applicable.
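The FFT example can be checked in the same way. The sketch below takes the time ratio A/(Bn) with n = kp and inserts it in the formula for F with the + sign, F = -1 + sqrt(1 - 2 T_par/T_ser). The defaults A = B = 1 and the function names are hypothetical, introduced only for illustration.

```python
import math


def fft_time_ratio(p: int, k: int, A: float = 1.0, B: float = 1.0) -> float:
    """T_par / T_ser = A / (B * n), with n = k * p combinatorial elements."""
    return A / (B * k * p)


def fft_exponent(p: int, k: int, A: float = 1.0, B: float = 1.0) -> float:
    """Exponent of parallelism, + sign branch: F = -1 + sqrt(1 - 2*ratio)."""
    ratio = fft_time_ratio(p, k, A, B)
    if ratio > 0.5:
        raise ValueError("time ratio above 1/2: the formula is not applicable")
    return -1.0 + math.sqrt(1.0 - 2.0 * ratio)


if __name__ == "__main__":
    k = 4  # combinatorial elements assembled by each processor (assumed value)
    for p in (2, 8, 64, 1024):
        print(p, fft_exponent(p, k))
```

With k fixed, so that n grows linearly with p, the printed values of F increase towards 0 as p grows.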

In (GROPP, 2002) an implementation is presented for multiplying an n x n matrix by a vector of n components. The scalability analysis gives the following estimate:

$$T_{par}(p, n) = a\,\frac{2 n^2 - n}{p} + b\,(n^2 + n)$$

where a and b are two positive constants depending on the operating environment used. Hence one obtains

$$\frac{T_{par}(p, n)}{T_{par}(1, n)} = \frac{a\,(2 n^2 - n) + b\,p\,(n^2 + n)}{p\,\left[a\,(2 n^2 - n) + b\,(n^2 + n)\right]} = \frac{b\,p\,n^2 + 2 a\,n^2 + b\,p\,n - a\,n}{p\,\left[(2a + b)\,n^2 + (-a + b)\,n\right]}$$

from which we see, using the rules of calculus, that for any increasing function n = g(p), whether the ratio g(p)/p tends to $+\infty$ or not, the limit of the ratio when p grows is

$$\frac{b}{2a + b}$$

Supposing that this quantity satisfies the minimal condition of parallelism (17), the limit of the function F given by (18) is finite and negative; therefore the classification rule previously stated specifies that the considered implementation is Amdahl-like parallel. In (GROPP, 2002) it is shown that the parameter b is the number of microseconds spent by a specific hardware system for communicating a floating-point quantity between two processes, while a is the time spent for the execution of a floating-point operation. Condition (17) would imply $b \le 2a$, which is not very realistic with present hardware, but improvements in the technology of communication between processes could make it attainable in the future.
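The limit b/(2a + b) can be checked numerically with a short sketch that evaluates the time ratio of the matrix-vector estimate for three growth laws of n. The values of a and b, the growth laws passed as lambdas and the function names are assumptions made only for this illustration.

```python
import math


def matvec_t_par(p: int, n: float, a: float, b: float) -> float:
    """Estimate quoted in the text: T_par(p, n) = a*(2n^2 - n)/p + b*(n^2 + n)."""
    return a * (2 * n * n - n) / p + b * (n * n + n)


def matvec_time_ratio(p: int, n: float, a: float, b: float) -> float:
    """Ratio T_par(p, n) / T_par(1, n)."""
    return matvec_t_par(p, n, a, b) / matvec_t_par(1, n, a, b)


def matvec_ratio_limit(a: float, b: float) -> float:
    """Limit of the ratio for growing p and increasing n = g(p): b / (2a + b)."""
    return b / (2 * a + b)


if __name__ == "__main__":
    a, b = 1.0, 1.0  # hypothetical costs of one operation and one communication
    growth_laws = {
        "n = p": lambda p: p,
        "n = p Log p": lambda p: p * math.log(p),
        "n = p^2": lambda p: p ** 2,
    }
    p = 10 ** 5
    for name, g in growth_laws.items():
        print(name, matvec_time_ratio(p, g(p), a, b))
    print("limit:", matvec_ratio_limit(a, b))
```

For all three growth laws the ratio is close to the same limit, so the growth of n with respect to p has practically no effect on this implementation.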
Imposing b = 2a, from (18) we have F = -1, hence from (15) we obtain for f the limit 0.5; therefore the asymptotic value of the speedup is 2, which is near to the value 2.3 obtainable directly from (1). In the following Fig. 3 I present the graphs of the speedup in the cases $n = p^2$, n = p Log p and n = p, from which it is clear that the growth of the problem dimension with respect to the number of processors used has practically no influence, contrary to what happens in a strongly parallel implementation:

[Fig. 3: speedup versus number of processors for the matrix-vector multiplication with $n = p^2$, n = p Log p and n = p.]

4. Superlinear speedup

In some real situations a superlinear speedup has been registered, that is, for some values of p and n the experimental value of S turned out to be greater than p. Since S > p in (2) requires f(1 - p) + p < 1, a necessary condition for this situation is that in (2) the function f assumes values greater than 1:

$$f > 1 \qquad (19)$$

This does not agree fully with the initial definition of f as the parallelizable fraction of a code, which is hence smaller than or equal to 1. Now we see how the superlinear speedup can be explained, in the proposed mathematical model, in a fashion coherent with the original definition of f. From (2) and (12) the superlinear speedup imposes that

$$\frac{p}{p - 1}\,\mathrm{Exp}(F(p, n))\,(1 - p) + p < 1 \qquad (20)$$

and hence it must be

$$\mathrm{Exp}(F(p, n)) > 1 - \frac{1}{p} \qquad (21)$$

therefore one obtains the condition

$$F(p, n) > \mathrm{Log}\left(1 - \frac{1}{p}\right) \qquad (22)$$

First of all we can notice that the argument of the logarithm is smaller than 1; therefore (22) can be satisfied for values of F smaller than 0 too, and hence the proposed description is coherent with the original condition $f \le 1$. We will use the following approximations, which are good for small values of the argument x:

$$\mathrm{Log}(1 + x) \approx x - \frac{x^2}{2} \qquad (23)$$

$$\sqrt{1 + x} \approx 1 + \frac{x}{2} - \frac{x^2}{8} \qquad (24)$$

In the case of a superlinear implementation the quantity 1/p in the argument of the logarithm in (22) is in general small (mathematically, p = 10 would be sufficient); therefore, applying (23), from (22) and (16) one obtains

$$F(p, n) > -\frac{1}{2 p^2} - \frac{1}{p} \qquad (25)$$

which can be considered as a condition of superlinearity for an implementation characterized by an exponent of parallelism F(p, n). Fig. 4 presents the graph of the right-hand side of (25), which represents the lower bound that must be respected by the function F(p, n):

[Fig. 4: lower bound for F(p, n) versus number of processors.]

In such conditions the argument under the square root in (16) is near 1, hence from (24) one obtains

$$\frac{1}{2 S^2} + \frac{1}{S} < \frac{1}{2 p^2} + \frac{1}{p} \qquad (26)$$

a condition which is trivially satisfied by the classical condition of superlinearity S > p. Therefore (25) and (26) extend the notion of superlinearity, and this in a fashion coherent with the original condition $f \le 1$.

As an example of application of (25) or (26), we consider the parallel implementation of the Fast Fourier Transform already mentioned. In this case one has, as reported above,

$$\frac{1}{S} = \frac{A}{B\,n}$$

We suppose for simplicity that n, as happens in real applications, is sufficiently large so that in (26) one can disregard the quadratic term on the left-hand side. Denoting by C the quantity A/B, which depends on the hardware and software environment, using (26) and solving the inequality with respect to the variable p one obtains

$$p < \frac{1}{2C}\left(n + \sqrt{n^2 + 2 C n}\right) \qquad (27)$$

hence in this example the superlinear speedup is possible only if the number of processors is bounded from above by a relation that involves the problem dimension. In (CAVAZZONI - CHIAROTTI, 2001) the experimental observation is reported of superlinear speedup in an implementation that makes extensive and sophisticated use of many parallelized FFTs on a Cray T3E system with Fortran 90 as compiler: the range of processors that presents superlinearity is bounded from above, and the phenomenon is due to the effects of hardware and software caches, which in the preceding mathematical scheme are represented by the presence of the constant C.

5. Conclusions

In this work I proposed a mathematical interpretation of some experimental results and of some theoretical discussions reported in the literature on the possible limits and performance of parallel computing. I proposed a generalization of Amdahl's law on the speedup obtainable in a parallel implementation. In particular I have presented some sufficient conditions in order that, in a given implementation, the speedup can grow indefinitely when the dimension of the analyzed problem increases as a consequence of the growth of the number of processors used. By means of these conditions I have defined three classes into which the parallel implementations can be classified, and the discriminating factor is the relation between the growth of the problem's dimension and the number of processors used. I have also proposed a condition of superlinear speedup that is coherent with the original definition of parallelizable fraction of an implementation. Some concrete examples have been presented to illustrate the formulated mathematical description; in particular they show that the obtained conditions contain some constants depending on the hardware and software environment where the parallel implementation is executed. Possible further developments concern the formulation of an algorithm for calculating the exponent of parallelism of an implementation, and an extension of the mathematical model to hardware architectures with non-homogeneous processors.

6. Bibliography

Gene AMDAHL, Validity of the single processor approach to achieving large scale computing capabilities, AFIPS Conference Proceedings, 1967.
R. BENNER - J. GUSTAFSON - G. MONTRY, Development of parallel methods for a 1024-processor hypercube, SIAM Journal on Scientific and Statistical Computing, 9(4), 1988.
Carlo CAVAZZONI - Guido CHIAROTTI, Implementation of a Parallel and Modular Car-Parrinello Code, in Science and Supercomputing at CINECA, CINECA, Italy, 2001.
T.H. CORMEN - C.E. LEISERSON - R.L. RIVEST, Introduction to Algorithms, The MIT Press, Boston, 1990.
W. GROPP - E. LUSK, Parallel programming with MPI, in T. STERLING, Beowulf cluster computing with Linux, The MIT Press, Boston, 2002.
MPI FORUM, Web site: www.mpi-forum.org, 2002.
Peter PACHECO, Parallel programming with MPI, Morgan Kaufmann, San Francisco, 1997.