A generalization of Amdahl's law and relative conditions of parallelism

Author: Gianluca Argentini, New Technologies and Models, Riello Group, Legnago (VR), Italy.
E-mail: gianluca.argentini@riellogroup.com

Abstract: In this work I present a generalization of Amdahl's law on the limits of a parallel implementation with many processors. In particular I establish some mathematical relations involving the number of processors and the dimension of the treated problem, and with these conditions I define, on the basis of the reachable speedup, some classes of parallelism for the implementations. I also derive a condition for obtaining superlinear speedup. The mathematical techniques used are those of differential calculus. I describe some examples taken from classical problems in the specialized literature on the subject.

Key words: dimension of a problem, high performance, parallel implementation, scalability analysis, speedup.

1. Introduction

In parallel computing, and more generally in high performance computing, one of the most useful metrics for evaluating the gain reachable by implementing a program on many processors, in comparison with its serial single-processor version, is the speedup S (see PACHECO, 1997), defined as the ratio between the time $T_{ser}$ required for the execution of the serial program and the time $T_{par}$ required for its parallel version:

$$ S = \frac{T_{ser}}{T_{par}} \tag{1} $$

In this work I consider these two times as computed by a scalability analysis (see GROPP, 2002) of a particular logical implementation of the problem they refer to, and not as measured on a particular hardware system. From formula (1) one obtains Amdahl's law (see AMDAHL, 1967) by means of the concept of parallelizable fraction f of a particular parallel implementation, that is the fraction of statements that can be executed at the same time on many processors:

$$ S = \frac{p}{f\,(1 - p) + p} \tag{2} $$

where p is the number of processors used and $0 \le f \le 1$. The optimal case, f = 1, gives for S a value equal to p.
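Formula (2) can be evaluated directly. The following Python sketch is only an illustration; the function name `amdahl_speedup` and its argument checks are assumptions of this example, not part of the original work.

```python
def amdahl_speedup(p: int, f: float) -> float:
    """Speedup S from formula (2): S = p / (f * (1 - p) + p).

    p is the number of processors, f the parallelizable fraction.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return p / (f * (1 - p) + p)
```

With f = 1 the function returns p itself, the optimal case mentioned above; with p = 1 it returns 1 for any f.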
S is an increasing function of the variable p, and for p tending to infinity we obtain the limit

$$ \lim_{p \to \infty} S = \frac{1}{1 - f} \tag{3} $$

which expresses Amdahl's law: the speedup has an upper bound determined by the code used, even if the number of processors is very high. Figure 1 shows the graph of the speedup for a code with f = 0.8; the asymptotic limit is 5.

[Fig. 1: speedup versus number of processors (up to 100) for f = 0.8.]

But the limitation imposed by this law can be overcome, even to a large extent, if one observes that formula (2) does not explicitly contain the parameter that expresses the dimension of the considered problem, that is the number n of data given as input. A first reexamination of Amdahl's law (see BENNER - GUSTAFSON - MONTRY, 1988) shows that, in several real situations, when the number of processors increases, a corresponding suitable increase of the number of treated data gives a speedup much bigger than that imposed by (2). In (PACHECO, 1997) too there is a brief but illuminating discussion of this possibility. The purpose of this work is to explain from a mathematical point of view how it is possible to obtain speedup values much higher than those estimated from Amdahl's law, and to classify the parallel implementations on the basis of the speedup obtained by various combinations of the parameters n and p.

2. Generalization of Amdahl's law

I will refer to an implementation of a given problem, that is to a triad (Problem, Program, System) constituted by a problem, for example the multiplication of a matrix by a vector, by a program that accepts the problem's data as input, and by an operating environment, understood as both hardware and software, in which the program runs. The problem gives the dimension n of the implementation; the program, adapting itself to the hardware that is used, gives the number p of processors. In accordance with most of the literature on high performance computing, for simplicity I identify the number of processors with the number of independent processes used, which are identical copies of the considered program.
For example, in codes that use the MPI parallelization library (see MPI FORUM), the launch of a parallel program lets one specify the number of processes, that is of independent copies of the program itself, which communicate with each other during the execution. I also assume that the processors are all of the same type and that the operating environment is homogeneous with respect to the software used (e.g. compilers, libraries).

The first consideration is that the parameter f of Amdahl's law can depend on the implementation used, that is on the code instructions of the program used for treating the problem, and hence in general it will be a function of p and n:

$$ f = f(p, n) \tag{4} $$

If we substitute (4) into (2) and differentiate partially with respect to p, we obtain

$$ \frac{\partial S}{\partial p} = \frac{p^2 \dfrac{\partial f}{\partial p} - p \dfrac{\partial f}{\partial p} + f}{\left( f + p\,(1 - f) \right)^2} \tag{5} $$

One realistic and desirable condition in parallel implementations for high performance is that the speedup grows when the number of processors used increases, hence we impose the following condition:

$$ \frac{\partial S}{\partial p} \ge 0 \tag{6} $$

and since the denominator in (5) is always positive it follows that

$$ p^2 \frac{\partial f}{\partial p} - p \frac{\partial f}{\partial p} + f \ge 0 \tag{7} $$

Since f > 0, (7) can be written in this way:

$$ 1 + (p^2 - p)\, \frac{1}{f} \frac{\partial f}{\partial p} \ge 0 \tag{8} $$

Also, since p > 1 in a parallel implementation, it follows that $p^2 - p > 0$, and remembering that $\frac{1}{f} \frac{\partial f}{\partial p} = \frac{\partial (\log f)}{\partial p}$, from (8) one obtains:

$$ \frac{1}{p^2 - p} + \frac{\partial (\log f)}{\partial p} \ge 0 \tag{9} $$

Integrating the first term indefinitely with respect to the variable p, (9) is equivalent to

$$ \frac{\partial}{\partial p} \left( \log \frac{p - 1}{p} + g(n) + \log f \right) \ge 0 \tag{10} $$

where g(n) is an arbitrary function of n, that is the constant of integration with respect to p. Denoting by F(p, n) the sum of the two logarithmic expressions, one can write

$$ \log f = F(p, n) - \log \frac{p - 1}{p} = F(p, n) + \log \frac{p}{p - 1} \tag{11} $$

from which it follows that:

$$ f = \frac{p}{p - 1} \exp\left( F(p, n) \right) \tag{12} $$

We call the function F the exponent of parallelism. We now see two properties of F that are useful for the subsequent discussion. First of all, from (10) it follows that

$$ \frac{\partial F}{\partial p} \ge 0 \tag{13} $$

hence F is an increasing function of p. Also, since by definition $f \le 1$, from (12) one obtains the following condition:

$$ F(p, n) \le \log \frac{p - 1}{p} < 0 \tag{14} $$

We now make some remarks on the relation between F and the parallelism of the corresponding implementation. In the first place, it is realistic to assume that the dimension n of a treated problem is greater than the number p of processors used in the corresponding parallel implementation; otherwise hardware resources are wasted. Also, as made explicit e.g. in (PACHECO, 1997) and (BENNER - GUSTAFSON - MONTRY, 1988), it is realistic to assume that as the number of processors used increases, the problem dimension can increase too; otherwise the concept of parallel performance, and the search for a good implementation itself, would have no meaning.
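Relations (11), (12) and (14) can be checked numerically. The sketch below is a minimal illustration in Python; the function names are hypothetical and chosen only for this example.

```python
import math


def fraction_from_exponent(p: int, F: float) -> float:
    """Formula (12): f = p / (p - 1) * exp(F), valid for p > 1."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return p / (p - 1) * math.exp(F)


def exponent_from_fraction(p: int, f: float) -> float:
    """Formula (11) solved for F: F = log(f) + log((p - 1) / p)."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    if f <= 0.0:
        raise ValueError("f must be positive")
    return math.log(f) + math.log((p - 1) / p)


def exponent_upper_bound(p: int) -> float:
    """Right-hand side of (14): log((p - 1) / p), always negative."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return math.log((p - 1) / p)
```

For f = 1 the exponent returned by `exponent_from_fraction` coincides with the bound in (14), and the bound approaches 0 from below as p grows.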
Under these hypotheses, from (12) there are two possibilities:

A) as p becomes very large, the ratio $n/p$ tends to a finite limit greater than 0; this is the case for example when $n = k p$, where k is a constant, even a large one; in this situation the two possible meaningful alternatives are, remembering conditions (13) and (14):

a') F(p, n) tends to 0, and from this it follows that f tends to 1; hence the parallel implementation allows an unlimited speedup when p increases, under the condition that the growth of the problem's dimension n is asymptotic to the number of processors;

a'') F(p, n) tends to a finite limit smaller than 0, and from this it follows that f tends to a value $f_0$ where $0 < f_0 < 1$; in this case, when the number of processors increases, the parallel implementation behaves in conformity with Amdahl's law, and the speedup is equal to $\frac{1}{1 - f_0}$;

B) as p becomes very large, the ratio $n/p$ tends to $+\infty$; this is the case for example when $n = p \log p$; in this situation the two possible meaningful alternatives are:

b') F(p, n) tends to 0, and from this it follows that f tends to 1; hence the parallel implementation allows an unlimited speedup when p increases, and the problem dimension can now grow very quickly with respect to the number of processors; this situation is optimal, and a real case is given for example in (PACHECO, 1997) by a parallel implementation of numerical integration with the trapezoidal rule;

b'') F(p, n) tends to a finite limit smaller than 0, and from this it follows that f tends to a value $f_0$ where $0 < f_0 < 1$; hence in this case too, when the number of processors increases, the parallel implementation behaves in conformity with Amdahl's law, but the problem dimension can now grow in a way that is not asymptotic to the number of processors.

I consider not very realistic, or at least not meaningful for an analysis of their parallelism, the situations where the exponent of parallelism tends to $-\infty$, in which case f tends to 0, the speedup becomes 1 and resources are wasted, and those for which the ratio $n/p$ tends to 0.

From the previous considerations we can state the following theorem-definition, which generalizes Amdahl's law and establishes the type of parallelism of a given implementation:

In a parallel implementation, let n be the dimension of the corresponding problem, p the number of processors, and F(p, n) the exponent of parallelism.
If n = g(p) is an increasing function of p such that the ratio $g(p)/p$ tends to $+\infty$ and F(p, g(p)) tends to 0 for p tending to $+\infty$, then the implementation is strongly parallel.
If F(p, g(p)) tends to 0 for p tending to $+\infty$ only for increasing functions g(p) such that the ratio $g(p)/p$ tends to a finite limit, then the implementation is weakly parallel.
If for every increasing function g(p) the function F(p, g(p)) tends to a limit smaller than zero for p tending to $+\infty$, then the implementation is Amdahl-like parallel.

3. Considerations and examples

When we apply the preceding classification we meet two problems: the first is how to calculate the exponent of parallelism, and the second is how to establish whether a suitable function g(p) exists. The former depends on the implementation used and can be solved by means of a scalability analysis (see GROPP, 2002); in the following considerations I will try to explain how some useful information can be obtained. The latter is a mathematical problem which can be tackled with the methods of differential calculus.
First of all we can notice that in (12) we are interested in knowing when, for large enough values of p, the exponential takes values near 1; hence, if we write the power series of Exp(F) with respect to the argument F, we can assert that the following is a good approximation:

$$ f = \frac{p}{p - 1} \left( 1 + F + \frac{F^2}{2} \right) \tag{15} $$

Substituting this expression of f into (2) one obtains the following formula:

$$ F = -1 \pm \sqrt{1 - \frac{2}{S}} \tag{16} $$

where the interesting case is that with the + sign, because it is useful for examining the closeness of F to 0. In (1) we can consider $T_{par} = T_{par}(p, n)$. Also we can establish, as a good approximation in general and an optimal one when all the system's processors are of the same kind, that $T_{ser} = T_{par}(1, n)$, i.e. the time spent in a serial implementation of a program can be approximated by the time spent by the corresponding parallel implementation executed on a single processor. Requiring the argument of the square root in (16) to be non-negative, from (1) and (16) one obtains the following parallelism condition:

$$ \frac{T_{par}(p, n)}{T_{par}(1, n)} \le \frac{1}{2} \tag{17} $$

which can be interpreted as a minimal condition of parallelism that guarantees an advantageous speedup.

We now examine some examples in which, using the relation

$$ F = -1 + \sqrt{1 - \frac{2\, T_{par}(p, n)}{T_{par}(1, n)}} \tag{18} $$

we can obtain the information given by the preceding generalization of Amdahl's law.

In (PACHECO, 1997) a parallel implementation of the numerical computation of a definite integral by means of the trapezoidal rule is presented, and using a scalability analysis the following estimate is obtained:

$$ T_{par}(p, n) = a\, \frac{n}{p} + b \log p $$

where a and b are two positive constants depending on the operating environment used. Hence we have

$$ \frac{T_{par}(p, n)}{T_{par}(1, n)} = \frac{a n + b\, p \log p}{a n p} $$

from which we see, using the rules of calculus, that if one considers for example $n = p^2$, an increasing function of p, the limit of the ratio when p grows indefinitely is 0, and from (18) one obtains that F tends to 0. Hence, on the basis of the preceding classification, the implementation is strongly parallel, and this fact agrees with the scalability arguments developed by Pacheco, according to which the speedup remains high if the growth of n is suitably guided by that of p.
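The behaviour of F for the trapezoidal rule can be sketched numerically from (18) and the timing estimate quoted above. This Python fragment is an illustration only: the function names, the default values a = b = 1 and the use of the natural logarithm are assumptions of the example, since the text does not fix them.

```python
import math


def t_par_trapezoid(p: int, n: float, a: float = 1.0, b: float = 1.0) -> float:
    """Timing estimate quoted in the text: T_par(p, n) = a * n / p + b * log(p).

    The natural logarithm is an assumption of this sketch.
    """
    return a * n / p + b * math.log(p)


def exponent_estimate(ratio: float) -> float:
    """Formula (18): F = -1 + sqrt(1 - 2 * ratio), with ratio = T_par(p, n) / T_par(1, n).

    Condition (17) requires ratio <= 1/2.
    """
    if ratio > 0.5:
        raise ValueError("condition (17) is violated: ratio must be <= 1/2")
    return -1.0 + math.sqrt(1.0 - 2.0 * ratio)


def trapezoid_exponent(p: int, n: float, a: float = 1.0, b: float = 1.0) -> float:
    """Exponent of parallelism estimated for the trapezoidal rule."""
    ratio = t_par_trapezoid(p, n, a, b) / t_par_trapezoid(1, n, a, b)
    return exponent_estimate(ratio)
```

Calling `trapezoid_exponent(p, p**2)` for p = 10, 100, 1000 gives negative values that increase toward 0, in agreement with the strongly parallel classification.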
One can see from the preceding ratio that, for applying the classification rule, it is sufficient to use the function $n = p \log p$, as reported by Pacheco. If one uses the function $n = p$, the limit of F when p increases is still 0, but since the ratio $n/p$ tends to a finite limit the implementation, already classified as strongly parallel, shows in such conditions a weakly parallel behaviour. This fact shows, in accordance with (BENNER - GUSTAFSON - MONTRY, 1988), the importance of a convenient growth function n = g(p) in a parallel implementation that aims at a high speedup. It can also be noticed that, keeping the dimension n of the problem fixed, when the number of processors increases the preceding ratio increases without bound, hence the parallelism is no longer advantageous, (18) is not applicable, S tends to zero and hence Amdahl's law (3) turns out not to be correct. This fact suggests the hypothesis that, in the set of possible implementations, the strongly parallel implementations are those that turn out to be advantageous when the problem's dimension grows suitably faster than the number of processors used. In contrast with Fig. 1, the following Fig. 2 shows the graphs of the speedup for the trapezoidal rule in the cases $n = p^2$, $n = p \log p$ and $n = p$ respectively:

[Fig. 2: speedup versus number of processors (up to 500) for the trapezoidal rule, in the cases $n = p^2$, $n = p \log p$ and $n = p$.]

from which one can see that the speedup improves when the derivative of the problem dimension with respect to the number of processors grows.

In (CORMEN - LEISERSON - RIVEST, 1990) a parallel implementation for the calculation of a Fast Fourier Transform is presented, which aims at keeping an asymptotic execution time $T_{par} = A \log(n)$, where A is a constant, n is the input dimension and the logarithm is in base 2. The corresponding serial implementation has an asymptotic time $T_{ser} = B\, n \log(n)$. The constants A and B depend on the operating environment used. The parallel procedure is obtained by means of a suitable configuration of n combinatorial elements, which are responsible for the operations of addition, multiplication and intercommunication of partial results. If one uses p processors, each of which assembles k combinatorial elements, then $n = k p$, and hence

$$ \frac{T_{par}}{T_{ser}} = \frac{A}{B n} = \frac{A}{B k p} $$

The conclusion is that the function F tends to 0 for p tending to $+\infty$, and this under the condition that the problem dimension grows linearly with respect to the number of processors. Hence, on the basis of the preceding classification, the parallel implementation proposed for the FFT is weakly parallel. In this example, if one keeps the problem dimension n constant while increasing the processors even indefinitely (e.g. identifying each of them with a single combinatorial element, hence imposing k = 1), the ratio of the execution times is constant too, and therefore in this case Amdahl's law can be applied. This consideration suggests that, in the set of possible implementations, the weakly parallel implementations are those for which Amdahl's law is applicable when the dimension of the corresponding problem is kept constant.
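The FFT estimate can be explored in the same way. In this Python sketch the function names and the default values A = B = 1 are assumptions made for illustration; the timing formulas are those quoted above, with the base-2 logarithm.

```python
import math


def fft_time_ratio(n: int, A: float = 1.0, B: float = 1.0) -> float:
    """Ratio T_par / T_ser with T_par = A * log2(n) and T_ser = B * n * log2(n)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    t_par = A * math.log2(n)
    t_ser = B * n * math.log2(n)
    return t_par / t_ser


def fft_exponent(p: int, k: int, A: float = 1.0, B: float = 1.0) -> float:
    """Exponent of parallelism from (18) when n = k * p."""
    ratio = fft_time_ratio(k * p, A, B)
    if ratio > 0.5:
        raise ValueError("condition (17) is violated: ratio must be <= 1/2")
    return -1.0 + math.sqrt(1.0 - 2.0 * ratio)
```

With n = k p the ratio equals A / (B k p), so `fft_exponent` approaches 0 as p grows; for a fixed n, `fft_time_ratio(n)` does not depend on p at all.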

In (GROPP, 2002) an implementation for multiplying an n x n matrix by a vector of n components is presented. The scalability analysis gives the following estimate:

$$ T_{par}(p, n) = \frac{a\,(2 n^2 - n)}{p} + b\,(n^2 + n) $$

where a and b are two positive constants depending on the operating environment used. Hence one obtains

$$ \frac{T_{par}(p, n)}{T_{par}(1, n)} = \frac{a\,(2 n^2 - n) + b\, p\,(n^2 + n)}{p \left[ a\,(2 n^2 - n) + b\,(n^2 + n) \right]} = \frac{b\, p\, n^2 + 2 a n^2 + b\, p\, n - a n}{p \left[ (2a + b)\, n^2 + (-a + b)\, n \right]} $$

from which we see, using the rules of calculus, that if one considers any increasing function n = g(p), whether the ratio $g(p)/p$ tends to $+\infty$ or not, the limit of the ratio when p grows is

$$ \frac{b}{2a + b} $$

Supposing that this quantity satisfies the minimal condition of parallelism (17), the limit of the function F given by (18) is finite and negative, therefore the classification rule stated previously specifies that the considered implementation is Amdahl-like parallel. In (GROPP, 2002) it is shown that the parameter b is the number of microseconds spent by a specific hardware to communicate a floating-point quantity between two processes, while a is the time spent for the execution of a floating-point operation. Condition (17) would imply $b \le 2a$, which is not very realistic with present hardware, but improvements in the technology of communication between processes could make it attainable in the future.
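The limit of the ratio can be verified numerically for different growth functions. The following Python sketch uses hypothetical function names and treats a and b as free parameters; it only illustrates the estimate quoted above.

```python
def t_par_matvec(p: int, n: int, a: float, b: float) -> float:
    """Estimate quoted in the text: T_par(p, n) = a * (2 n^2 - n) / p + b * (n^2 + n)."""
    return a * (2 * n * n - n) / p + b * (n * n + n)


def matvec_ratio(p: int, n: int, a: float, b: float) -> float:
    """Ratio T_par(p, n) / T_par(1, n) used in conditions (17) and (18)."""
    return t_par_matvec(p, n, a, b) / t_par_matvec(1, n, a, b)


def matvec_ratio_limit(a: float, b: float) -> float:
    """Limit b / (2a + b) of the ratio as p grows, for any increasing n = g(p)."""
    return b / (2 * a + b)
```

For instance, with a = 1 and b = 2 the ratio evaluated at p = 10000 is close to 0.5 both for n = p and for n = p^2, which illustrates that the growth of n has practically no influence here.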
Imposing b = 2a, from (18) we have F = -1, hence from (15) we obtain for f the limit 0.5, therefore the asymptotic value of the speedup is 2, which is near to the value 2.3 obtainable directly from (1). In the following Fig. 3 I present the graphs of the speedup in the cases $n = p^2$, $n = p \log p$ and $n = p$, from which it is clear that the growth of the problem dimension with respect to the number of processors used has practically no influence, contrary to what happens in a strongly parallel implementation:

[Fig. 3: speedup versus number of processors (up to 40), in the cases $n = p^2$, $n = p \log p$ and $n = p$.]

4. Superlinear speedup

In some real situations a superlinear speedup has been observed, that is, for some values of p and n the experimental value of S turned out to be greater than p. A necessary condition for this situation is that in (2) the function f assumes values greater than 1:

$$ f > 1 \tag{19} $$

This does not fully agree with the initial definition of f as the parallelizable fraction of a code, and hence smaller than or equal to 1. We now see how the superlinear speedup can be explained, in the proposed mathematical model, in a fashion coherent with the original definition of f. From (2) and (12) the superlinear speedup requires that

$$ \frac{p}{p - 1} \exp\left( F(p, n) \right) (1 - p) + p < 1 \tag{20} $$

and hence it must be

$$ \exp\left( F(p, n) \right) > 1 - \frac{1}{p} \tag{21} $$

therefore one obtains the condition

$$ F(p, n) > \log \left( 1 - \frac{1}{p} \right) \tag{22} $$

First of all we can notice that the argument of the logarithm is smaller than 1, therefore (22) can be satisfied for values of F smaller than 0 too, and hence the proposed description is coherent with the original condition $f \le 1$. We will use the following approximations, which are good for small values of the argument x:

$$ \log(1 + x) \approx x - \frac{x^2}{2} \tag{23} $$

$$ \sqrt{1 + x} \approx 1 + \frac{x}{2} - \frac{x^2}{8} \tag{24} $$

In the case of a superlinear implementation the quantity $1/p$ under the logarithm in (22) is in general small (mathematically p = 10 would be sufficient), therefore applying (23) with $x = -1/p$, from (22) and (16) one obtains

$$ F(p, n) > -\frac{1}{2 p^2} - \frac{1}{p} \tag{25} $$

which can be considered as a superlinearity condition for an implementation characterized by an exponent of parallelism F(p, n). Figure 4 presents the graph of the right-hand side of (25), which represents the lower bound that must be respected by the function F(p, n).
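The quality of the approximation that leads from (22) to (25) can be checked with a few lines of Python. The function names below are hypothetical. The helper `speedup_from_exponent` uses the identity S = 1 / (1 - exp(F)), which is obtained in this sketch by substituting (12) into (2); it is not a formula stated in the text.

```python
import math


def superlinear_bound_exact(p: int) -> float:
    """Right-hand side of (22): log(1 - 1/p), defined for p > 1."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return math.log(1.0 - 1.0 / p)


def superlinear_bound_approx(p: int) -> float:
    """Right-hand side of (25): -1/p - 1/(2 p^2), from (23) with x = -1/p."""
    return -1.0 / p - 1.0 / (2.0 * p * p)


def speedup_from_exponent(F: float) -> float:
    """S = 1 / (1 - exp(F)) for F < 0, derived here by inserting (12) into (2)."""
    if F >= 0.0:
        raise ValueError("F must be negative")
    return 1.0 / (1.0 - math.exp(F))
```

For p = 10 the two bounds differ by less than 0.001, and an exponent above the bound (for example F = -0.05) yields a speedup greater than 10, while one below it (F = -0.2) does not.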

[Fig. 4: lower bound for F(p, n) versus number of processors.]

In such conditions the argument under the square root in (16) is near to 1, hence from (24) one obtains

$$ \frac{1}{2 S^2} + \frac{1}{S} < \frac{1}{2 p^2} + \frac{1}{p} \tag{26} $$

a condition which is trivially satisfied by the classical condition of superlinearity S > p. Therefore (25) and (26) extend the notion of superlinearity, and this in a fashion coherent with the original condition $f \le 1$. As an example of application of (25) or (26), we consider the parallel implementation of the Fast Fourier Transform already mentioned. In this case one has, as reported above,

$$ \frac{1}{S} = \frac{A}{B n} $$

We suppose for simplicity that n, as happens in real applications, is sufficiently large so that in (26) one can disregard the quadratic term on the left-hand side. Denoting by C the quantity $A/B$, which depends on the hardware and software environment, using (26) and solving the inequality with respect to the variable p one obtains

$$ p < \frac{1}{2C} \left( n + \sqrt{n^2 + 2 C n} \right) \tag{27} $$

hence in this example the superlinear speedup is possible only if the number of processors is bounded above by a relation that involves the problem dimension. In (CAVAZZONI - CHIAROTTI, 2001) the experimental observation of superlinear speedup is reported for an implementation that makes extensive and sophisticated use of many parallelized FFTs on a Cray T3E system with a Fortran 90 compiler: the range of processors that shows superlinearity is bounded above, and the phenomenon is due to the effects of hardware and software caches, which in the preceding mathematical scheme are represented by the presence of the constant C.

5. Conclusions

In this work I proposed a mathematical interpretation of some experimental results and of some theoretical discussions reported in the literature on the possible limits and performance of parallel computing. I proposed a generalization of Amdahl's law on the speedup obtainable in a parallel implementation. In particular I presented some sufficient conditions under which, in a given implementation, the speedup can grow indefinitely when the dimension of the analyzed problem increases as a consequence of the growth of the number of processors used. By means of these conditions I defined three classes into which parallel implementations can be classified, where the discriminating factor is the relation between the growth of the problem's dimension and the number of processors used. I also proposed a condition of superlinear speedup that is coherent with the original definition of the parallelizable fraction of an implementation. Some concrete examples have been presented to illustrate the formulated mathematical description; in particular they show that the conditions obtained contain constants depending on the hardware and software environment where the parallel implementation is executed. Possible further developments concern the formulation of an algorithm for calculating the exponent of parallelism of an implementation, and an extension of the mathematical model to hardware architectures with non-homogeneous processors.

6. Bibliography

Gene AMDAHL, Validity of the single processor approach to achieving large scale computing capabilities, AFIPS Conference Proceedings, 1967.

R. BENNER - J. GUSTAFSON - G. MONTRY, Development of parallel methods for a 1024-processor hypercube, SIAM Journal on Scientific and Statistical Computing, 9(4), 1988.

Carlo CAVAZZONI - Guido CHIAROTTI, Implementation of a Parallel and Modular Car-Parrinello Code, in Science and Supercomputing at CINECA, CINECA, Italy, 2001.

T.H. CORMEN - C.E. LEISERSON - R.L. RIVEST, Introduction to Algorithms, The MIT Press, Boston, 1990.

W. GROPP - E. LUSK, Parallel programming with MPI, in T. STERLING, Beowulf cluster computing with Linux, The MIT Press, Boston, 2002.

MPI FORUM, Web site: www.mpi-forum.org, 2002.

Peter PACHECO, Parallel programming with MPI, Morgan Kaufmann, San Francisco, 1997.