Unique Sets Oriented Parallelization of Loops with Non-uniform Dependences


Jialin Ju and Vipin Chaudhary
Parallel and Distributed Computing Laboratory, Wayne State University, Detroit, MI, USA

Although many methods exist for nested loop partitioning, most of them perform poorly when parallelizing loops with non-uniform dependences. This paper addresses the issue of automatic parallelization of loops with non-uniform dependences. Such loops normally are not parallelized by existing parallelizing compilers and transformations. Even when they are parallelized in rare instances, the performance is very poor. Our approach is based on the Convex Hull theory, which has adequate information to handle non-uniform dependences. We introduce the concepts of the Complete Dependence Convex Hull and of Unique Head and Tail Sets, and abstract the dependence information into these sets. These sets form the basis of the iteration space partitions. The properties of the unique head and tail sets are derived. Depending on the relative placement of these unique sets, partitioning schemes are suggested for implementation of our technique. Implementation results of our scheme on the Cray J9 and comparison with other schemes show the superiority of our technique.

Received November, 99; revised July, 1997

1. INTRODUCTION

Given a sequential program, a challenging problem for parallelizing compilers is to detect maximum parallelism. It is generally agreed, and shown in the study by Kuck et al., that most of the computation time is spent in loops. Current parallelizing compilers therefore concentrate on loop parallelization. A loop can be easily parallelized if there are no cross-iteration dependences. However, loops with cross-iteration dependences are very common. Parallelizing loops with cross-iteration dependences is a major concern facing parallelizing compilers today.

Loops with cross-iteration dependences can be roughly divided into two groups. One is loops with static regular dependences, which can be analyzed during compile time. Examples 1 and 2 in Figure 1 belong to this group.
The other group is loops with dynamic irregular dependences, which have indirect access patterns. Example 3 shows a typical irregular loop, which is used for edge-oriented representation of sparse matrices. These kinds of loops cannot be parallelized at compile time, for lack of sufficient information. To execute such loops efficiently in parallel, runtime support must be provided. The major job of parallelizing compilers is to parallelize loops with static regular dependences.

Static regular loops can be further divided into two sub-groups: one with uniform dependences and the other with non-uniform dependences. The dependences are uniform only when the patterns of dependence vectors are uniform. In other words, the dependence vectors can be expressed by constants, i.e., distance vectors. Example 1 illustrates a uniform dependence loop. Its dependence vectors are (, ) and (, -). Figure 2 shows the dependence patterns of Example 1 in the iteration space. In the same fashion, we call dependences non-uniform when the dependence vectors are in irregular patterns which cannot be expressed by distance vectors. Figure 2 also shows the dependence patterns of Example 2 in the iteration space.

FIGURE 1. Examples of loops with different kinds of dependences (Examples 1, 2 and 3)

FIGURE 2. Iteration spaces with (a) uniform dependences and (b) non-uniform dependences

A lot of research has been done on parallelizing loops with uniform dependences, from dependence analysis to loop transformations such as loop interchange, loop permutation, skewing, reversal, wavefront, and tiling. But little research has been done on loops with non-uniform dependences.

The existing commercial and research parallelizing compilers, such as Stanford's SUIF, CSRD's Parafrase-2, and the University of Maryland's Omega Project, can parallelize most loops with uniform dependences. But they do not satisfactorily handle loops with non-uniform dependences. Most of the time, the compiler treats such loops as unparallelizable and leaves them running sequentially. For instance, neither SUIF nor Parafrase-2 can parallelize the loop in Example 2. Unfortunately, loops with non-uniform dependences are not so uncommon in the real world. In an empirical study, Shen et al. observed that nearly % of two-dimensional array references are coupled, which means that the array subscripts are linear combinations of loop indices. These coupled subscripts lead to non-uniform dependences. Hence, it is imperative to give loops with non-uniform dependences serious consideration, even though they are more difficult to parallelize.

This paper focuses on parallelization of perfectly nested loops with non-uniform dependences. The rest of this paper is organized as follows. Section two surveys the research in parallelization of non-uniform dependence loops. Section three reviews the Dependence Convex Hull theory and introduces the Complete Dependence Convex Hull. Section four gives the definition of unique sets and the techniques to find them. Section five presents our unique set oriented partitioning approach. Section six extends our technique to a general program model with multiple nestings. Section seven confirms the superiority of our technique with an implementation on the Cray J9 and comparison with previously proposed techniques. Finally, we conclude in section eight.

The Computer Journal, Vol., No., 1997

2. SURVEY OF RELATED RESEARCH

The convex hull created by solving the linear Diophantine equations is required for detecting parallelism in non-uniform loops, since it is the least abstraction that has adequate information to accomplish the detection of parallelism in non-uniform loops 7. Thus, most of the techniques proposed for parallelizing loops with non-uniform dependences are based on dependence convex hull theory. These can be classified into four categories: uniformization, uniform partitioning, non-uniform partitioning, and integer programming based partitioning.

2.1. Uniformization

Tzen and Ni proposed the dependence uniformization technique. Based on solving a system of Diophantine equations and a system of inequalities, they compute the maximal and minimal dependence slopes of any uniform and non-uniform dependence pattern in a two-dimensional iteration space. Then, by applying the idea of vector decomposition, a set of basic dependences is chosen to replace all original dependence constraints in every iteration, so that the dependence pattern becomes uniform. They also proved that any doubly nested loop can always be uniformized to a uniform dependence loop with two dependence vectors. To reduce the synchronization, they proposed an index synchronization method in which synchronization can be inserted systematically. This uniformization helps in applying existing partitioning and scheduling techniques. But it imposes too many dependences on the iteration space, which otherwise has only a few of them.

Chen and Yew9 presented a scheme which computes a Basic Dependence Vector Set and schedules the iterations using Static Strip Scheduling. They extended the dependence uniformization technique of Tzen and Ni and presented algorithms to compute better basic dependence vector sets which extract more parallelism from the nested loops. The program model is more general, including non-perfectly nested loops. While this technique is definitely an improvement over Tzen and Ni's work, it also imposes too many dependences on the iteration space, thereby reducing the extractable parallelism. Moreover, this uniformization needs a lot of synchronization.

Chen and Shang proposed another uniformization technique. They form the set of basic dependence vectors and improve this set using certain objective functions. They select those basic dependence vectors which are time-optimal and cone-optimal. After uniformizing the iteration space, they use optimal linear schedules to order the execution of the iterations. This technique, like both of the previous uniformization techniques, imposes too many dependences.

2.2. Uniform Partitioning

Punyamurtula and Chaudhary extended the theory of the Convex Hull to the Integer Dependence Convex Hull (IDCH) and proposed a Minimum Dependence Distance Tiling technique. Every integer point in the IDCH corresponds to a dependence vector in the iteration space of the nested loops. They showed that the minimum and maximum values of the dependence distance function occur at the extreme points of the IDCH. Therefore, it is only necessary to calculate the dependence distance at the extreme points and compare these values to get the minimum dependence distance. These minimum dependence distances are used to partition the iteration space into tiles of uniform size and shape. The width of the tiles is less than or equal to the minimum dependence distance in at least one direction. This guarantees that, for any dependence vector, its head and tail fall into different tiles.
Iterations in a tile would be executed in parallel. Tiles in a group would be executed in sequence, and the dependence slope information of Tzen and Ni can be used to synchronize the execution of inter-group tiles. This technique works very well for cases when the minimum distance in one direction is large. It does not work as well when the dependence distances are small, as it would involve too much synchronization overhead.

2.3. Non-uniform Partitioning

Zaafrani and Ito proposed the three-region technique. This technique divides the iteration space into two parallel regions and one sequential region. The iterations in the parallel regions can be executed fully in parallel, while the iterations in the sequential region can only be executed sequentially. The two parallel regions are called Area 1 and Area 2, respectively, and the sequential region is called Area 3. Area 1 represents the part of the iteration space where the destination iteration comes lexically before the source iteration. The iterations in Area 1 can be fully executed in parallel provided that variable renaming is performed. Area 1 corresponds to the region where the direction vector is equal to (<, *) or equal to (=, <). Area 2 represents the part of the iteration space where the destination iteration comes lexically after the source iteration and the source iteration is in Area 1. If Area 1 is executed first, then the nodes in Area 2 can be executed in parallel. Area 3 represents the rest of the iteration space (iteration space - (Area 1 ∪ Area 2)). Once Area 1 and Area 2 are executed, the nodes in Area 3 should be executed sequentially.

Zaafrani and Ito apply their technique to the entire iteration space, though it would suffice to apply it only to the DCH or IDCH. The nodes that are not in the DCH can be executed in parallel because no dependences exist for these nodes. This is equivalent to dividing the iteration space into four regions (Area 1, Area 2, Area 3, and non-DCH). Again, this technique has its disadvantages. The sequential part of the iteration space is the bottleneck for the performance.
If the sequential part of the iteration space is small, this technique is fine. Otherwise the sequential part can be a serious drawback in performance.

2.4. Integer Programming Based Approach

Tseng et al. proposed a partitioning scheme using Integer Programming techniques. They start with an original dependence vector set and divide it into eight groups. They find the minimum dependence vector set by solving integer programming formulations. Then they use the minimum dependence vector set to represent the dependence vectors of nested loops and partition the iterations of loops into groups. All iterations in the same group can be executed at the same time. They also proposed a group synchronization method for arranging synchronization. But the method they used to compute the minimum dependence vector set may not always give minimum dependence distances. Besides, the integer programming approach is time-consuming.

Pugh and Wonnacott construct several sets of constraints that describe, for each statement, which iterations of that statement can be executed concurrently. By constructing constraints that correspond to different assumptions about which dependences might be eliminated through additional analysis, transformations, and user assertions, they determine whether they can expose parallelism by eliminating dependences. Then they look for conditional parallelism, and try to identify the kinds of iteration-reordering transformations that could be used to produce parallel loops. However, their method may produce false dependences.

3. DEPENDENCE ANALYSIS

Cross-iteration dependence is the major concern that may keep the program from running in parallel. Of the four types of data dependences (flow, anti, output, and input dependence), input dependence imposes no ordering constraints, so we only look at the other three types. We do not consider output dependences as real dependences either, since we can always use the storage replication technique to allow the statements which have output dependences to execute concurrently. This research looks at the cases of flow dependences and anti dependences.

Data dependence defines the execution order among iterations. The execution order can be expressed as lexicographic order. Lexicographic order can be shown as an arrow in the iteration space, which also represents the dependence vector. All the arrows in our iteration space figures are in lexicographic order. The iteration corresponding to the arrow head cannot be executed until the iteration corresponding to the tail has been executed. All the dependences discussed in this paper are put into lexicographic order. If there is a dependence from iteration i to iteration j, and i executes before j, we represent it by drawing an arrow i → j.

Figure 3 shows all four possible directions if all the dependence vectors are put in lexicographic order with two levels of loops, where i is the index for the outer loop and j is the index for the inner loop. The running order imposes that there cannot exist an arrow pointing to the left, or an arrow parallel to the j axis and pointing down. The arrows here are the dependence vectors.

FIGURE 3. Possible dependence directions in lexicographic order: (a) parallel to the i axis, (b) 0° < θ < 90°, (c) parallel to the j axis, (d) -90° < θ < 0°

3.1. Dependence and Convex Hull

Studies show that most of the loops with complex array subscripts are two-dimensional loops. We start with this typical case.
We simplify our general program model to a normalized, doubly nested loop with coupled subscripts (i.e., with subscripts being linear functions of loop indices) as shown in Figure 4. We wish to discover what cross-iteration dependences exist between the two references to array A in the program model.

    do i = L1, U1
      do j = L2, U2
        A(a11 i + b11 j + c11, a12 i + b12 j + c12) = ...
        ... = A(a21 i + b21 j + c21, a22 i + b22 j + c22)

FIGURE 4. Doubly Nested Loop Model

There is a large variety of tests that can prove independence in some cases. It is infeasible to solve the problem directly, even for linear subscript expressions, because finding dependences is equivalent to the NP-complete problem of finding integer solutions to systems of linear Diophantine equations7. Two general and approximate tests are the GCD test and Banerjee's inequalities9. Recently, Subhlok and Kennedy proposed a new search procedure that identifies an integer solution in a convex region, or proves that no integer solutions exist.

The most common method to compute data dependence is to solve a set of linear Diophantine equations subject to a set of constraints given by the iteration boundaries. A dependence exists only if the equations have a solution. We want to find the set of integer solutions $(i_1, j_1, i_2, j_2)$ that satisfy the system of Diophantine equations (1) and the system of linear inequalities (2).

\[
\begin{aligned}
a_{11} i_1 + b_{11} j_1 + c_{11} &= a_{21} i_2 + b_{21} j_2 + c_{21} \\
a_{12} i_1 + b_{12} j_1 + c_{12} &= a_{22} i_2 + b_{22} j_2 + c_{22}
\end{aligned}
\tag{1}
\]

\[
\begin{cases}
L_1 \le i_1 \le U_1 \\
L_2 \le j_1 \le U_2 \\
L_1 \le i_2 \le U_1 \\
L_2 \le j_2 \le U_2
\end{cases}
\tag{2}
\]

Once the general solutions are found, the dependence information can be represented by dependence vectors. The dependence is uniform when the dependence vectors are constants. Otherwise the dependence is non-uniform.
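To make the role of the Diophantine system and the loop bounds concrete, the following minimal sketch enumerates all bounded integer solutions by brute force and collects the raw distances $(i_2 - i_1, j_2 - j_1)$. It is only an illustration, not the analysis used in the paper: the two loops, the bounds 1..12, and all function names are assumptions chosen for the example, and the first index pair is taken to belong to the written reference.

```python
from itertools import product

def subscript(ref, i, j):
    """Evaluate a coupled 2-D subscript ((a1, b1, c1), (a2, b2, c2)) at (i, j)."""
    (a1, b1, c1), (a2, b2, c2) = ref
    return (a1 * i + b1 * j + c1, a2 * i + b2 * j + c2)

def solutions(write_ref, read_ref, lo, hi):
    """All (i1, j1, i2, j2) inside [lo, hi] where the write at (i1, j1)
    and the read at (i2, j2) touch the same array element."""
    points = list(product(range(lo, hi + 1), repeat=2))
    writers = {}
    for i1, j1 in points:
        writers.setdefault(subscript(write_ref, i1, j1), []).append((i1, j1))
    found = []
    for i2, j2 in points:
        for i1, j1 in writers.get(subscript(read_ref, i2, j2), []):
            if (i1, j1) != (i2, j2):  # ignore dependences within one iteration
                found.append((i1, j1, i2, j2))
    return found

def distance_vectors(sols):
    """Set of distinct distances (i2 - i1, j2 - j1) over all solutions."""
    return {(i2 - i1, j2 - j1) for i1, j1, i2, j2 in sols}

# Hypothetical loops, chosen only for illustration.
UNIFORM_W = ((1, 0, 1), (0, 1, 1))   # A(i+1, j+1) = ...
UNIFORM_R = ((1, 0, 0), (0, 1, 0))   # ... = A(i, j)
COUPLED_W = ((2, 0, 3), (0, 1, 1))   # A(2i+3, j+1) = ...
COUPLED_R = ((1, 2, 1), (1, 1, 3))   # ... = A(i+2j+1, i+j+3)

uniform = distance_vectors(solutions(UNIFORM_W, UNIFORM_R, 1, 12))
coupled = distance_vectors(solutions(COUPLED_W, COUPLED_R, 1, 12))
print(sorted(uniform))
print(len(coupled))
```

The first loop yields a single constant distance, so its dependence is uniform; the coupled subscripts of the second loop yield several different distances, which is exactly the non-uniform case discussed here. Exhaustive enumeration is only practical for small bounds, which is why the text turns to convex hulls instead.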

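The data dependence analysis techniques do well on loops with uniform dependences, since dependence distance vectors can be calculated precisely. A lot of research has been done on uniform dependence analysis and loop transformation techniques. However, for the case of non-uniform dependences, Yang, Ancourt and Irigoin7 showed that the direction vector alone does not have enough information for transforming non-uniform dependences. The Dependence Convex Hull (DCH) is the least requirement if we want to parallelize loops with non-uniform dependences. DCHs are convex polyhedra and are subspaces of the solution space. First of all, we show how to find the DCHs.

There are two approaches to solving the system of Diophantine equations (1). One way is to set $i_1$ to $x_1$ and $j_1$ to $y_1$ and solve for $i_2$ and $j_2$:

\[
\begin{aligned}
a_{21} i_2 + b_{21} j_2 + c_{21} &= a_{11} x_1 + b_{11} y_1 + c_{11} \\
a_{22} i_2 + b_{22} j_2 + c_{22} &= a_{12} x_1 + b_{12} y_1 + c_{12}
\end{aligned}
\]

The solution is

\[
\begin{aligned}
i_2 &= \alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11} \\
j_2 &= \alpha_{12} x_1 + \beta_{12} y_1 + \gamma_{12}
\end{aligned}
\]

where

\[
\begin{aligned}
\alpha_{11} &= \frac{a_{11} b_{22} - a_{12} b_{21}}{a_{21} b_{22} - a_{22} b_{21}}, &
\beta_{11} &= \frac{b_{11} b_{22} - b_{12} b_{21}}{a_{21} b_{22} - a_{22} b_{21}}, &
\gamma_{11} &= \frac{b_{22} c_{11} + b_{21} c_{22} - b_{22} c_{21} - b_{21} c_{12}}{a_{21} b_{22} - a_{22} b_{21}}, \\
\alpha_{12} &= \frac{a_{21} a_{12} - a_{11} a_{22}}{a_{21} b_{22} - a_{22} b_{21}}, &
\beta_{12} &= \frac{a_{21} b_{12} - a_{22} b_{11}}{a_{21} b_{22} - a_{22} b_{21}}, &
\gamma_{12} &= \frac{a_{21} c_{12} + a_{22} c_{21} - a_{21} c_{22} - a_{22} c_{11}}{a_{21} b_{22} - a_{22} b_{21}}.
\end{aligned}
\]

The solution space S is the set of points (x, y) satisfying the solution given above. The set of inequalities can now be written as

\[
\begin{cases}
L_1 \le x_1 \le U_1 \\
L_2 \le y_1 \le U_2 \\
L_1 \le \alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11} \le U_1 \\
L_2 \le \alpha_{12} x_1 + \beta_{12} y_1 + \gamma_{12} \le U_2
\end{cases}
\tag{3}
\]

where (3) defines a DCH denoted by DCH1.

Another approach is to set $i_2$ to $x_2$ and $j_2$ to $y_2$ and solve for $i_1$ and $j_1$:

\[
\begin{aligned}
a_{11} i_1 + b_{11} j_1 + c_{11} &= a_{21} x_2 + b_{21} y_2 + c_{21} \\
a_{12} i_1 + b_{12} j_1 + c_{12} &= a_{22} x_2 + b_{22} y_2 + c_{22}
\end{aligned}
\]

The solution is

\[
\begin{aligned}
i_1 &= \alpha_{21} x_2 + \beta_{21} y_2 + \gamma_{21} \\
j_1 &= \alpha_{22} x_2 + \beta_{22} y_2 + \gamma_{22}
\end{aligned}
\]

where

\[
\begin{aligned}
\alpha_{21} &= \frac{a_{21} b_{12} - a_{22} b_{11}}{a_{11} b_{12} - a_{12} b_{11}}, &
\beta_{21} &= \frac{b_{21} b_{12} - b_{22} b_{11}}{a_{11} b_{12} - a_{12} b_{11}}, &
\gamma_{21} &= \frac{b_{12} c_{21} + b_{11} c_{12} - b_{12} c_{11} - b_{11} c_{22}}{a_{11} b_{12} - a_{12} b_{11}}, \\
\alpha_{22} &= \frac{a_{11} a_{22} - a_{12} a_{21}}{a_{11} b_{12} - a_{12} b_{11}}, &
\beta_{22} &= \frac{a_{11} b_{22} - a_{12} b_{21}}{a_{11} b_{12} - a_{12} b_{11}}, &
\gamma_{22} &= \frac{a_{11} c_{22} + a_{12} c_{11} - a_{11} c_{12} - a_{12} c_{21}}{a_{11} b_{12} - a_{12} b_{11}}.
\end{aligned}
\]

The solution space S is the set of points (x, y) satisfying the solution given above. The set of inequalities can now be written as

\[
\begin{cases}
L_1 \le \alpha_{21} x_2 + \beta_{21} y_2 + \gamma_{21} \le U_1 \\
L_2 \le \alpha_{22} x_2 + \beta_{22} y_2 + \gamma_{22} \le U_2 \\
L_1 \le x_2 \le U_1 \\
L_2 \le y_2 \le U_2
\end{cases}
\tag{4}
\]

where (4) defines another DCH, denoted by DCH2.

Both sets of solutions are valid. Each of them has the dependence information on one extreme. For some simple cases, for instance when there is only one kind of dependence, either flow or anti dependence, one set of solutions (i.e., one DCH) should be enough.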
Punyamurtula and Chaudhary used constraints (3) for their technique, while Zaafrani and Ito used (4) for theirs. For the more complicated cases, where both flow and anti dependences are involved and the dependence patterns are irregular, we need to use both sets of solutions. We introduce a new term, Complete Dependence Convex Hull, to summarize these two DCHs, and we demonstrate that the Complete DCH contains complete information about dependences.

3.2. Complete Dependence Convex Hull (CDCH)

Definition 3.1 (Complete DCH (CDCH)). The Complete DCH is the union of two closed sets of integer points in the iteration space, which satisfy (3) or (4).

FIGURE 5. CDCH of Example 2

Figure 5 shows the CDCH of Example 2. We use an arrow to represent a dependence in the iteration space. We call the arrow's head the dependence head and the arrow's tail the dependence tail.
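The construction of the two DCHs and of their union can be sketched in a few lines. The sketch below solves the 2x2 subscript system by Cramer's rule in exact rational arithmetic and then keeps the integer points whose image stays inside the loop bounds. The subscript coefficients, the bounds 1..12 and the helper names `express` and `dch_points` are assumptions made for this illustration; a singular subscript matrix is simply rejected.

```python
from fractions import Fraction
from itertools import product

def express(known, unknown):
    """Solve the subscript equations for the indices of `unknown` in terms of
    the indices (x, y) of `known`. Returns two (alpha, beta, gamma) triples:
    u = alpha1*x + beta1*y + gamma1 and v = alpha2*x + beta2*y + gamma2."""
    (a1, b1, c1), (a2, b2, c2) = known
    (p1, q1, r1), (p2, q2, r2) = unknown
    det = Fraction(p1 * q2 - p2 * q1)
    if det == 0:
        raise ValueError("singular subscript matrix is not handled by this sketch")
    first = ((a1 * q2 - a2 * q1) / det,
             (b1 * q2 - b2 * q1) / det,
             ((c1 - r1) * q2 - (c2 - r2) * q1) / det)
    second = ((p1 * a2 - p2 * a1) / det,
              (p1 * b2 - p2 * b1) / det,
              (p1 * (c2 - r2) - p2 * (c1 - r1)) / det)
    return first, second

def dch_points(known, unknown, lo, hi):
    """Integer points (x, y) within the bounds whose image also lies within them."""
    (al1, be1, ga1), (al2, be2, ga2) = express(known, unknown)
    pts = set()
    for x, y in product(range(lo, hi + 1), repeat=2):
        u = al1 * x + be1 * y + ga1
        v = al2 * x + be2 * y + ga2
        if lo <= u <= hi and lo <= v <= hi:
            pts.add((x, y))
    return pts

# Hypothetical references, chosen only for illustration.
W = ((2, 0, 3), (0, 1, 1))   # written: A(2i+3, j+1)
R = ((1, 2, 1), (1, 1, 3))   # read:    A(i+2j+1, i+j+3)

dch1 = dch_points(W, R, 1, 12)   # points (x1, y1) = (i1, j1)
dch2 = dch_points(R, W, 1, 12)   # points (x2, y2) = (i2, j2)
cdch = dch1 | dch2
print(len(dch1), len(dch2), len(cdch))
```

Iterations outside `cdch` take part in no cross-iteration dependence for this loop. Note that the points are filtered by the inequalities only, so a point of a DCH need not correspond to an integer solution of the Diophantine system.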

Theorem 3.1. All the dependence heads and tails lie within the CDCH. The head and tail of any particular dependence lie in the two DCHs of the CDCH.

Proof. Let us assume that $(i_2, j_2)$ is dependent on $(i_1, j_1)$. In the iteration space graph we can have an arrow from $(i_1, j_1)$ to $(i_2, j_2)$. Here $(i_1, j_1)$ is the arrow tail and $(i_2, j_2)$ is the arrow head. Because of the existing dependence, $(i_1, j_1)$ and $(i_2, j_2)$ must satisfy the system of linear Diophantine equations (1) and the system of linear inequalities (2). There are four unknown variables. We can eliminate two of them by setting $i_1 = x_1$ and $j_1 = y_1$ and solving for $i_2$ and $j_2$. Then $x_1$ and $y_1$ must satisfy (3). Hence $(i_1, j_1)$ lies in the area defined by (3), which is one of the DCHs of the CDCH. In the same way, we set $i_2 = x_2$ and $j_2 = y_2$ and solve for $i_1$ and $j_1$. Here $(i_2, j_2)$ lies in the area defined by (4), which is the other DCH of the CDCH. Therefore, $(i_1, j_1)$ and $(i_2, j_2)$ fall into different DCHs of the CDCH.

If iteration $(i_2, j_2)$ is dependent on $(i_1, j_1)$, then the dependence vector D(x, y) is expressed as

\[
d_i(x, y) = i_2 - i_1, \qquad d_j(x, y) = j_2 - j_1 .
\]

So, for DCH1, we have

\[
\begin{aligned}
d_i(x_1, y_1) &= (\alpha_{11} - 1) x_1 + \beta_{11} y_1 + \gamma_{11} \\
d_j(x_1, y_1) &= \alpha_{12} x_1 + (\beta_{12} - 1) y_1 + \gamma_{12}
\end{aligned}
\tag{5}
\]

For DCH2, we have

\[
\begin{aligned}
d_i(x_2, y_2) &= (1 - \alpha_{21}) x_2 - \beta_{21} y_2 - \gamma_{21} \\
d_j(x_2, y_2) &= -\alpha_{22} x_2 + (1 - \beta_{22}) y_2 - \gamma_{22}
\end{aligned}
\tag{6}
\]

Clearly, if there is a solution $(x_1, y_1)$ in DCH1, there must be a solution $(x_2, y_2)$ in DCH2, because they have been solved from the same set of linear Diophantine equations (1).

Given the dependence vectors above, there must exist a minimum and a maximum value of D(x, y). It was shown by Punyamurtula and Chaudhary that the minimum and maximum values of the dependence distance D(x, y) occur at the extreme points of the DCH.

4. UNIQUE SETS IN THE ITERATION SPACE

If a loop has cross-iteration dependences, we can construct its CDCH (comprising DCH1 and DCH2). As we have proved earlier, all dependences lie within the CDCH. In other words, the iterations lying outside the CDCH can be executed in parallel.
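The two forms of the dependence distance, one written over the points of DCH1 and one over the points of DCH2, describe the same arrows. A small numeric check makes this visible. The sketch below assumes a hypothetical loop that writes A(2i+3, j+1) and reads A(i+2j+1, i+j+3); for it, solving the subscript equations gives i2 = -2x + 2y - 6, j2 = 2x - y + 4 on the DCH1 side and i1 = x/2 + y - 1, j1 = x + y + 2 on the DCH2 side. These coefficients and the function names are assumptions of the example.

```python
from fractions import Fraction

# (alpha, beta, gamma) triples of the hypothetical loop described above.
DCH1_COEFFS = ((-2, 2, -6), (2, -1, 4))                 # i2, j2 from (x1, y1)
DCH2_COEFFS = ((Fraction(1, 2), 1, -1), (1, 1, 2))      # i1, j1 from (x2, y2)

def distance_in_dch1(coeffs, x, y):
    """(d_i, d_j) = (i2 - i1, j2 - j1) evaluated at the point (i1, j1) = (x, y)."""
    (al11, be11, ga11), (al12, be12, ga12) = coeffs
    d_i = (al11 - 1) * x + be11 * y + ga11
    d_j = al12 * x + (be12 - 1) * y + ga12
    return d_i, d_j

def distance_in_dch2(coeffs, x, y):
    """(d_i, d_j) = (i2 - i1, j2 - j1) evaluated at the point (i2, j2) = (x, y)."""
    (al21, be21, ga21), (al22, be22, ga22) = coeffs
    d_i = (1 - al21) * x - be21 * y - ga21
    d_j = -al22 * x + (1 - be22) * y - ga22
    return d_i, d_j

# The write at (1, 5) and the read at (2, 1) touch the same element A(5, 6),
# so both expressions must report the same distance for this dependence.
at_tail = distance_in_dch1(DCH1_COEFFS, 1, 5)
at_head = distance_in_dch2(DCH2_COEFFS, 2, 1)
print(at_tail, at_head)
```

Because each distance component is a linear function of (x, y), its extreme values over a convex polygon are attained at vertices, which is the property of the DCH recalled in the text.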
Punyamurtula and Chaudhary proposed the concept of minimum dependence distance tiling, which gives an excellent partitioning of the iteration space for the case when $\vec{d}(x, y) = \vec{0}$ does not pass through any DCH. However, the minimum dependence distance cannot be calculated when $\vec{d}(x, y) = \vec{0}$ passes through the DCH. Our technique works well for both cases.

Suppose all dependence tails fall into DCH1 and all dependence heads fall into DCH2 (Figure 6), and the two DCHs do not overlap. The partition can be done by drawing a line between the two DCHs. The area containing the DCH of tails executes first, followed by the area containing the DCH of heads. Figure 6 illustrates this by first executing area 1, followed by area 2. The iterations within each of the two areas are fully parallelizable.

The idea behind the above example is to find separate sets that contain the dependence heads and tails. We want to minimize these sets and then partition the iteration space by drawing lines separating these sets in the iteration space. The execution order is determined by whether a set contains heads or tails. The next problem is how to find the unique sets. The problem is compounded if these sets overlap.

4.1. Unique Head and Unique Tail Sets

There are only two DCHs given the program model in Figure 4. All the dependence heads and tails lie within these two DCHs. These areas are our primitive sets. It is quite possible that one particular set contains both dependence heads and tails. Because of the complexity of the problem, we have to distinguish between the flow and anti dependences, and partition the iteration space in a non-uniform way, because the dependence itself is non-uniform.

Let us look at Figure 5, which shows the CDCH of Example 2. We note that DCH1 contains all anti dependence heads and all flow dependence tails. DCH2 contains all the flow dependence heads and anti dependence tails. Figure 7 separates the flow and anti dependences to give a clearer picture. It can be seen that DCH1 is the union of the flow dependence tail set and the anti dependence head set, and DCH2 is the union of the flow dependence head set and the anti dependence tail set.
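The way heads and tails of the two kinds of dependence mix on each side can be observed directly by enumeration. The following minimal sketch classifies every dependent pair of a hypothetical loop as flow (the write comes first in lexicographic order) or anti (the read comes first) and records the four head and tail sets. The loop, its bounds and the names used are assumptions for illustration only.

```python
from itertools import product

# Hypothetical references, chosen only for illustration.
W = ((2, 0, 3), (0, 1, 1))   # written: A(2i+3, j+1)
R = ((1, 2, 1), (1, 1, 3))   # read:    A(i+2j+1, i+j+3)

def elem(ref, i, j):
    """Array element touched by reference `ref` in iteration (i, j)."""
    (a1, b1, c1), (a2, b2, c2) = ref
    return (a1 * i + b1 * j + c1, a2 * i + b2 * j + c2)

def classify(write_ref, read_ref, lo, hi):
    """Collect flow/anti heads and tails over all dependent iteration pairs."""
    sets = {"flow_tail": set(), "flow_head": set(),
            "anti_tail": set(), "anti_head": set()}
    pts = list(product(range(lo, hi + 1), repeat=2))
    for w, r in product(pts, pts):
        if w == r or elem(write_ref, *w) != elem(read_ref, *r):
            continue
        if w < r:   # tuple comparison is lexicographic: write runs first
            sets["flow_tail"].add(w)
            sets["flow_head"].add(r)
        else:       # read runs first
            sets["anti_tail"].add(r)
            sets["anti_head"].add(w)
    return sets

s = classify(W, R, 1, 12)
writer_side = s["flow_tail"] | s["anti_head"]   # iterations on the DCH1 side
reader_side = s["flow_head"] | s["anti_tail"]   # iterations on the DCH2 side
print({name: len(points) for name, points in s.items()})
```

For this loop the writer side holds both flow tails and anti heads, so neither side as a whole is a set of heads only or tails only, which motivates splitting each DCH further.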
Hence, the following definition is derived to distinguish the sets.

Definition 4.1 (Unique Head (Tail) Set). A unique head (tail) set is a set of integer points in the iteration space that satisfies the following conditions:
1. it is a subset of one of the DCHs (or is the DCH itself);
2. it contains all the dependence arrows' heads (tails), but does not contain any other dependence arrows' tails (heads).

Obviously the DCHs in Figure 7 are not the unique sets we are trying to find, because each DCH contains the dependence heads of one kind and the dependence tails of the other kind. Therefore, these DCHs must be further partitioned into smaller unique sets.

FIGURE 6. Partitioning with two non-overlapping DCHs

FIGURE 7. Flow dependence, Anti dependence

4.2. Finding Unique Head and Unique Tail Sets

First, the properties of DCH1 and DCH2 must be examined.

Theorem 4.1. DCH1 contains all flow dependence tails and all anti dependence heads (if they exist), and DCH2 contains all anti dependence tails and all flow dependence heads (if they exist).

Proof. The system of inequalities in (3) defines DCH1, and

\[
i_1 = x_1, \quad j_1 = y_1, \quad
i_2 = \alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11}, \quad
j_2 = \alpha_{12} x_1 + \beta_{12} y_1 + \gamma_{12} .
\]

If there exists a flow dependence, we can assume that $(i_1, j_1, i_2, j_2)$ is a solution to the flow dependence. From the definition of flow dependence, $(i_1, j_1)$ should be written somewhere in the iteration space before $(i_2, j_2)$ is referenced. So we can draw an arrow from $(i_1, j_1)$ to $(i_2, j_2)$ in the iteration space to represent the dependence and execution order as $(i_1, j_1) \to (i_2, j_2)$, which is equivalent to $(x_1, y_1) \to (\alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11},\ \alpha_{12} x_1 + \beta_{12} y_1 + \gamma_{12})$. Here $(x_1, y_1)$ is the arrow tail. Since $(x_1, y_1)$ satisfies (3) and we have assumed that $(i_1, j_1, i_2, j_2)$ is a solution, DCH1 must contain all flow dependence tails.

If there exists an anti dependence, we can again assume that $(i_1, j_1, i_2, j_2)$ is a solution to the anti dependence. From the definition of anti dependence, we have an arrow from $(i_2, j_2)$ to $(i_1, j_1)$, i.e., $(\alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11},\ \alpha_{12} x_1 + \beta_{12} y_1 + \gamma_{12}) \to (x_1, y_1)$. Since $(x_1, y_1)$ is the arrow's head and $(x_1, y_1)$ satisfies (3), DCH1 contains all anti dependence heads.

The proof that DCH2 contains all anti dependence tails and flow dependence heads (if they exist) is similar to the proof for DCH1.

The above theorem tells us that DCH1 and DCH2 are not unique head or unique tail sets if there are both flow and anti dependences. If there exists only flow or only anti dependence, DCH1 contains either all the flow dependence tails or all the anti dependence heads, and DCH2 contains either all the flow dependence heads or all the anti dependence tails. Under these conditions, both DCH1 and DCH2 are unique sets. The following theorem states the condition for DCH1 and DCH2 to be unique sets.

Theorem 4.2. If $d_i(x, y) = 0$ does not pass through any DCH, then there is only one kind of dependence, either flow or anti dependence, and the DCH itself is the unique head set or the unique tail set.

Proof. [Part 1] $d_i(x_1, y_1)$ corresponds to DCH1 and $d_i(x_2, y_2)$ corresponds to DCH2. Suppose $d_i(x_1, y_1) = 0$ does not pass through DCH1. Since $d_i(x_1, y_1) = i_2 - i_1 = (\alpha_{11} - 1) x_1 + \beta_{11} y_1 + \gamma_{11}$, and an iteration $(x_1, y_1)$ that satisfies (3) must not satisfy $(\alpha_{11} - 1) x_1 + \beta_{11} y_1 + \gamma_{11} = 0$ ($d_i(x_1, y_1) = 0$ is a line in the iteration space), DCH1 must be on one side of $d_i(x_1, y_1) = 0$, i.e., either $d_i(x_1, y_1) < 0$ or $d_i(x_1, y_1) > 0$. First let us look at the case when $d_i(x_1, y_1) < 0$. Then $\alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11}$ is always less than $x_1$. Thus, $i_1 > i_2$ is always true. Also, the array element corresponding to index $i_1$ is written and the array element corresponding to index $i_2$ is read. Clearly, only anti dependence can satisfy this condition. Therefore, DCH1 contains only anti dependences. Next, let us look at the case when $d_i(x_1, y_1) > 0$. Here $i_1 < i_2$. Clearly, only flow dependence can satisfy this condition. Therefore, DCH1 contains only flow dependences. The proof for DCH2 follows similarly. Thus, if $d_i(x, y) = 0$ does not pass through any DCH, then there is only one kind of dependence.

[Part 2] We have already shown above that if $d_i(x, y) = 0$ does not pass through a DCH, then there is only one kind of dependence. It then follows from Theorem 4.1 that DCH1 contains only the flow dependence tails or only the anti dependence heads, making DCH1 a unique tail set or a unique head set. Similarly, DCH2 contains only the anti dependence tails or only the flow dependence heads, making DCH2 a unique tail set or a unique head set.

DCH1 and DCH2 are constructed from the same system of linear Diophantine equations and system of inequalities. The following two theorems highlight their common attributes.

Theorem 4.3. If $d_i(x_1, y_1) = 0$ does not pass through DCH1, then $d_i(x_2, y_2) = 0$ does not pass through DCH2.

Proof.
If $d_i(x_1, y_1) = 0$ does not pass through DCH1, then DCH1 lies either on the side where $d_i(x_1, y_1) < 0$ or on the side where $d_i(x_1, y_1) > 0$. First let us consider the case when DCH1 is on the side where $d_i(x_1, y_1) < 0$. Since $d_i(x_1, y_1)$ is $i_2 - i_1$, we have $i_2 < i_1$. We can find the same solution $(i_1, j_1, i_2, j_2)$ for DCH2, because both DCHs are solved from the same set of linear Diophantine equations. $d_i(x_2, y_2)$ is also defined as $i_2 - i_1$. Hence $d_i(x_2, y_2) < 0$, which means that $d_i(x_2, y_2) = 0$ does not pass through DCH2. The second case, when DCH1 is on the side where $d_i(x_1, y_1) > 0$, can be proved similarly.

Corollary 4.4. When $d_i(x_1, y_1) = 0$ does not pass through DCH1,
1. if $d_i(x_1, y_1) > 0$ in DCH1, then DCH1 is the flow dependence unique tail set and DCH2 is the flow dependence unique head set;
2. if $d_i(x_1, y_1) < 0$ in DCH1, then DCH1 is the anti dependence unique head set and DCH2 is the anti dependence unique tail set.

Proof. It follows from the preceding theorems.

Corollary 4.5. When $d_i(x_1, y_1) = 0$ does not pass through DCH1,
1. if $d_i(x_1, y_1) > 0$ in DCH1, then $d_i(x_2, y_2) > 0$ in DCH2;
2. if $d_i(x_1, y_1) < 0$ in DCH1, then $d_i(x_2, y_2) < 0$ in DCH2.

Proof. It is obvious from the above theorems and the proofs given.

We have now established that if $d_i(x_1, y_1) = 0$ does not pass through DCH1, then both DCH1 and DCH2 are unique sets. When $d_i(x, y) = 0$ passes through the CDCH, a DCH might contain both dependence heads and tails (even if DCH1 and DCH2 do not overlap). This makes it harder to find the unique head and tail sets. The next theorem looks at some common attributes when $d_i(x, y) = 0$ passes through the CDCH.

Theorem 4.6. If $d_i(x_1, y_1) = 0$ passes through DCH1, then $d_i(x_2, y_2) = 0$ must pass through DCH2.

Proof. Suppose $d_i(x_1, y_1) = 0$ passes through DCH1. Then we must be able to find $(x_1', y_1')$ such that $d_i(x_1', y_1') < 0$ and $(x_1'', y_1'')$ such that $d_i(x_1'', y_1'') > 0$ in DCH1. Correspondingly, we can find $(x_2', y_2')$ and $(x_2'', y_2'')$ in DCH2 such that $d_i(x_2', y_2') = i_2' - i_1' = d_i(x_1', y_1')$ and $d_i(x_2'', y_2'') = i_2'' - i_1'' = d_i(x_1'', y_1'')$. Therefore, we have $d_i(x_2', y_2') < 0$ and $d_i(x_2'', y_2'') > 0$. Hence, $d_i(x_2, y_2) = 0$ must pass through DCH2.

Using the above theorem, we can now deal with the case where a DCH contains all the dependence tails of one kind and all the dependence heads of another kind.

Theorem 4.7.
If $d_i(x, y) = 0$ passes through a DCH, then it divides that DCH into a unique tail set and a unique head set. Furthermore, $d_j(x, y) = 0$ decides the inclusion of $d_i(x, y) = 0$ in one of the sets.

Proof. The proofs for DCH1 and DCH2 are symmetric. Let us consider the case where $d_i(x_1, y_1) = 0$ passes through DCH1. First consider flow dependences. Without loss of generality, let $(i_1, j_1)$ and $(i_2, j_2)$ be the iterations which cause any flow dependence. Then $(i_1, j_1)$ and $(i_2, j_2)$ satisfy (1). Thus, from the definition of flow dependence, we have either $i_1 < i_2$, or $i_1 = i_2$ and $j_1 < j_2$. We can now solve (1) with

\[
\begin{aligned}
i_1 &= x_1 \\
j_1 &= y_1 \\
i_2 &= \alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11} \\
j_2 &= \alpha_{12} x_1 + \beta_{12} y_1 + \gamma_{12}
\end{aligned}
\tag{7}
\]

In the first case, since $x_1 < \alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11}$, we have $(\alpha_{11} - 1) x_1 + \beta_{11} y_1 + \gamma_{11} = d_i(x_1, y_1) > 0$. In the second case, we have $x_1 = \alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11}$ and $y_1 < \alpha_{12} x_1 + \beta_{12} y_1 + \gamma_{12}$, which gives us $d_i(x_1, y_1) = 0$ and $d_j(x_1, y_1) > 0$.

Now let us consider anti dependence. We have either $i_1 > i_2$, or $i_1 = i_2$ and $j_1 > j_2$. In the first case, since $x_1 > \alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11}$, we have $(\alpha_{11} - 1) x_1 + \beta_{11} y_1 + \gamma_{11} = d_i(x_1, y_1) < 0$. In the second case, from the set of equations (7) we have $x_1 = \alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11}$ and $y_1 > \alpha_{12} x_1 + \beta_{12} y_1 + \gamma_{12}$, which gives us $d_i(x_1, y_1) = 0$ and $d_j(x_1, y_1) < 0$.

$d_i(x_1, y_1) = 0$ divides DCH1 into two parts, $d_i(x_1, y_1) > 0$ and $d_i(x_1, y_1) < 0$. Flow dependences satisfy $d_i(x_1, y_1) > 0$. From Theorem 4.1 we know that these points are the flow dependence tails. Whether $d_i(x_1, y_1) = 0$ belongs to this set depends on whether $d_j(x_1, y_1) > 0$ or not. Therefore, $d_j(x_1, y_1)$ decides the flow dependence unique tail set. Similarly, $d_j(x_1, y_1)$ decides the anti dependence unique head set.

Note that if $d_j(x_1, y_1) > 0$, then the line segment corresponding to $d_i(x_1, y_1) = 0$ belongs to the flow dependence unique tail set, and if $d_j(x_1, y_1) < 0$, then the line segment corresponding to $d_i(x_1, y_1) = 0$ belongs to the anti dependence unique head set. The iteration corresponding to the intersection of $d_i(x_1, y_1) = 0$ and $d_j(x_1, y_1) = 0$ has no cross-iteration dependence. If the intersection point of $d_i(x_1, y_1) = 0$ and $d_j(x_1, y_1) = 0$ lies in DCH1, then one segment of the line $d_i(x_1, y_1) = 0$ inside DCH1 is a subset of the flow dependence unique tail set, and the other segment of the line $d_i(x_1, y_1) = 0$ inside DCH1 is a subset of the anti dependence unique head set. For DCH2, we have similar results. To summarize, the following corollary is derived.

Corollary 4.8.
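The flow dependence unique tail set is expressed by

\[
\begin{cases}
L_1 \le x_1 \le U_1 \\
L_2 \le y_1 \le U_2 \\
L_1 \le \alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11} \le U_1 \\
L_2 \le \alpha_{12} x_1 + \beta_{12} y_1 + \gamma_{12} \le U_2 \\
d_i(x_1, y_1) > 0, \ \text{or} \ d_i(x_1, y_1) = 0 \ \text{and} \ d_j(x_1, y_1) > 0
\end{cases}
\]

The anti dependence unique head set is expressed by

\[
\begin{cases}
L_1 \le x_1 \le U_1 \\
L_2 \le y_1 \le U_2 \\
L_1 \le \alpha_{11} x_1 + \beta_{11} y_1 + \gamma_{11} \le U_1 \\
L_2 \le \alpha_{12} x_1 + \beta_{12} y_1 + \gamma_{12} \le U_2 \\
d_i(x_1, y_1) < 0, \ \text{or} \ d_i(x_1, y_1) = 0 \ \text{and} \ d_j(x_1, y_1) < 0
\end{cases}
\]

The flow dependence unique head set is expressed by

\[
\begin{cases}
L_1 \le \alpha_{21} x_2 + \beta_{21} y_2 + \gamma_{21} \le U_1 \\
L_2 \le \alpha_{22} x_2 + \beta_{22} y_2 + \gamma_{22} \le U_2 \\
L_1 \le x_2 \le U_1 \\
L_2 \le y_2 \le U_2 \\
d_i(x_2, y_2) > 0, \ \text{or} \ d_i(x_2, y_2) = 0 \ \text{and} \ d_j(x_2, y_2) > 0
\end{cases}
\]

The anti dependence unique tail set is expressed by

\[
\begin{cases}
L_1 \le \alpha_{21} x_2 + \beta_{21} y_2 + \gamma_{21} \le U_1 \\
L_2 \le \alpha_{22} x_2 + \beta_{22} y_2 + \gamma_{22} \le U_2 \\
L_1 \le x_2 \le U_1 \\
L_2 \le y_2 \le U_2 \\
d_i(x_2, y_2) < 0, \ \text{or} \ d_i(x_2, y_2) = 0 \ \text{and} \ d_j(x_2, y_2) < 0
\end{cases}
\]

Proof. It follows directly from Theorem 4.7.

Corollary 4.9. When $d_i(x_1, y_1) = 0$ passes through DCH1, then
1. DCH1 is the union of the flow dependence unique tail set and the anti dependence unique head set;
2. DCH2 is the union of the flow dependence unique head set and the anti dependence unique tail set.

Proof. It follows from Corollary 4.8.

Figure 8 illustrates the application of our results to Example 2. Clearly $d_i(x_1, y_1) = 0$ divides DCH1 into two parts. The area on the left side of $d_i(x_1, y_1) = 0$ is the flow dependence unique tail set and the area on the right side of $d_i(x_1, y_1) = 0$ is the anti dependence unique head set. $d_i(x_1, y_1) = 0$ belongs to the anti dependence unique head set. $d_i(x_2, y_2) = 0$ divides DCH2 into two parts too. The area below $d_i(x_2, y_2) = 0$ is the flow dependence unique head set and the area above $d_i(x_2, y_2) = 0$ is the anti dependence unique tail set. $d_i(x_2, y_2) = 0$ belongs to the anti dependence unique tail set.

5. UNIQUE SETS ORIENTED PARTITIONING

In the previous sections we have grouped iterations based on their being in unique head or unique tail sets. Clearly the unique head set will execute after the unique tail set. For our program model, there are at most four sets, i.e., the flow dependence unique tail set, the flow dependence unique head set, the anti dependence unique tail set, and the anti dependence unique head set. The iterations outside these sets can be executed concurrently. Moreover, the iterations within each set can be executed concurrently.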
In order to maxmze the parallelsm, we want to partton the teraton space accordng to unque sets.
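The four unique sets can also be cross-checked by brute force on a small iteration space. The sketch below is only an illustration under stated assumptions: the subscript functions `write` and `read`, the bounds 1..12, and the helper name `unique_sets` are hypothetical choices made here, not values taken from the text. It enumerates every pair of iterations that touch the same array element and records the earlier iteration as the dependence tail and the later one as the dependence head.

```python
from itertools import product

def unique_sets(lo, hi, write, read):
    """Enumerate the dependences of the (hypothetical) loop
         do i = lo, hi
           do j = lo, hi
             A(write(i, j)) = ...
             ...            = A(read(i, j))
    and return the flow/anti dependence tail and head sets."""
    space = list(product(range(lo, hi + 1), repeat=2))
    writers = {}
    for it in space:
        writers.setdefault(write(*it), []).append(it)

    sets = {"flow_tail": set(), "flow_head": set(),
            "anti_tail": set(), "anti_head": set()}
    for r in space:
        for w in writers.get(read(*r), []):
            if w < r:    # tuples compare lexicographically: the write runs first
                sets["flow_tail"].add(w)
                sets["flow_head"].add(r)
            elif r < w:  # the read runs first: anti dependence from r to w
                sets["anti_tail"].add(r)
                sets["anti_head"].add(w)
            # w == r is a loop-independent dependence and is ignored
    return sets

# Assumed subscripts, chosen only for illustration.
write = lambda i, j: (2 * i + 3, j + 1)
read = lambda i, j: (i + 2 * j + 1, i + j + 3)
sets = unique_sets(1, 12, write, read)
for name in sorted(sets):
    print(name, len(sets[name]))
```

With these assumed subscripts every array element is written at most once and read at most once, so no iteration can be both a flow dependence tail and an anti dependence head, which is the disjointness that the corollary expresses through the sign of the distance functions.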

FIGURE 8. Unique head sets and unique tail sets of flow dependence and anti dependence.

FIGURE 9. One kind of dependence and DCH1 does not overlap with DCH2.

It is important, however, to note that the effectiveness of a partitioning scheme depends on the architecture of the parallel machine being used. In this paper we do not recommend partitions for particular architectures; rather, we explore the various partitions that can be generated from the available information. The suitability of a particular partition for a specific architecture is not studied. Depending on the unique head and tail sets that we can identify, there exist various combinations of overlaps (and/or disjointness) of these sets. We categorize these combinations as cases, starting from the simpler ones and leading up to the more complicated ones.

Case 1: There is only one kind of dependence and DCH1 does not overlap with DCH2.

Figure 9 illustrates this relatively easy case with an example. Any line drawn between DCH1 and DCH2 divides the iteration space into two areas. Inside each area, all iterations are independent. The DCHs in this case are unique head and unique tail sets. The iterations within each DCH can be executed concurrently. However, DCH2 needs to execute before DCH1, as shown by the partitioning in Figure 9. The execution order is given as 1 → 2. From the implementation point of view, it is advisable to partition the iteration space along the i or j axis so that the partitioned areas can be easily represented as a loop. It is also advisable to partition the iteration space as evenly as possible. However, the final decision on partitioning will depend on the underlying architecture.

Case 2: There is only one kind of dependence and DCH1 overlaps with DCH2.
Figure 10 illustrates this case. DCH1 and DCH2 overlap to produce three distinct areas denoted by Area1, Area2, and Area3, respectively. Area1 and Area3 are either unique tail or unique head sets and thus iterations within each set can execute concurrently. Area2 contains both dependence heads and tails. We can apply the Minimum Dependence Distance Tiling technique proposed by Punyamurtula and Chaudhary to Area2. Depending on the type of dependence there are two distinct execution orders possible. If DCH1 is a unique tail set, then the execution order is Area1 → Area2 → Area3. Otherwise the execution order is Area3 → Area2 → Area1. From the implementation point of view, we want to use a straight line to partition the iteration space, so that the generated code will be much simpler. An example partitioning is shown in Figure 10(b) for the problem in Figure 10(a). The execution order is given as 1 → 2 → 3 → 4. Another approach to parallelizing the iteration space in this case is to apply the Minimum Dependence Distance Tiling technique directly to the entire iteration space.

FIGURE 10. One kind of dependence and DCH1 overlaps with DCH2.

FIGURE 11. Two kinds of dependence and DCH1 does not overlap with DCH2.

Case 3: There are two kinds of dependence and DCH1 does not overlap with DCH2.

Figure 11 illustrates this case. Since DCH1 and DCH2 are disjoint, we can partition the iteration space into two, with DCH1 and DCH2 belonging to distinct partitions. From Theorem we know that $d_i(x, y) = 0$ will divide the DCHs into unique tail and unique head sets. Next, we partition the area within DCH1 by the line $d_i(x_1, y_1) = 0$, and the area within DCH2 by the line $d_i(x_2, y_2) = 0$. So, we have four partitions, each of which is totally parallelizable. Figure 11 gives one possible partition with execution order 1 → 2 → 3 → 4. Note that the unique head sets must execute after the unique tail sets.

Case 4: There are two kinds of dependence and DCH1 overlaps with DCH2, and there is at least one isolated unique set.

Figure 12(a) and (c) illustrate this case. What we want to do is to separate this isolated unique set from the others. The line $d_i(x, y) = 0$ is the best candidate to do this. If $d_i(x, y) = 0$ does not intersect any other unique set or the other DCH, then it will divide the iteration space into two parts as shown in Figure 12(b). If $d_i(x, y) = 0$ does intersect other unique sets or the other DCH, we can add one edge of the other DCH as the boundary to partition the iteration space into two, as shown in Figure 12(d). Let us denote the partition containing the isolated unique set by Area1. The other partition is denoted by Area2.
If Area1 contains a unique tail set, then Area1 must execute before Area2; otherwise Area1 must execute after Area2. The next step is to partition Area2. Since Area2 has only one kind of dependence (as long as we maintain the execution order defined above) and DCH1 overlaps with DCH2, it falls under the category of Case 2 and can be further partitioned.

FIGURE 12. Two kinds of dependence and one unique set isolated.

Case 5: There are two kinds of dependence and all unique sets overlap each other.

Figure illustrates this case. The CDCH can be partitioned into at most eight parts as shown in Figure. These partitions are areas that contain

- only flow dependence tails, and we denote it by Area1;
- only anti dependence tails, and we denote it by Area2;
- only anti dependence heads, and we denote it by Area3;
- only flow dependence heads, and we denote it by Area4;
- flow dependence tails and anti dependence tails, and we denote it by Area5;
- flow dependence heads and anti dependence heads, and we denote it by Area6;
- flow dependence tails and flow dependence heads, and we denote it by Area7;
- anti dependence tails and anti dependence heads, and we denote it by Area8.

Area1, Area2, and Area5 can be combined into a larger area, because they contain only dependence tails. Let us denote this combined area by AreaI. In the same way, Area3, Area4, and Area6 can also be combined, because they contain only dependence heads. Let us denote this combined area by AreaII. AreaI and AreaII are fully parallelizable. The execution order becomes AreaI → Area7 → Area8 → AreaII. Since Area7 and Area8 contain both dependence heads and tails, we can apply the Minimum Dependence Distance Tiling technique to parallelize these areas. We may not always have all eight areas in this case. For example, if $d_i(x_1, y_1) = 0$ does not intersect $d_i(x_2, y_2) = 0$ inside the CDCH, then either Area7 or Area8 exists, but not both. However, the proposed partitioning and execution order still hold.

Now let us go back to Example. From Figure, we know that it fits in the category of Case.

FIGURE. Partitioning scheme for Example.

The partitioning scheme is shown in Figure. There are five areas. All the iterations in each area are fully parallelizable. These areas should be run in the order 1 → 2 → 3 → 4 → 5. Area is the overlapping area.
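The ordering argument behind Case 5 (areas holding only tails first, areas holding both heads and tails next, areas holding only heads last) can be checked with a small simulation. The sketch below is a set-based simplification under stated assumptions, not the geometric partitioning described above: the subscripts `write` and `read`, the bounds 1..12, and the helper names are hypothetical, and the mixed area is simply run sequentially as a stand-in for Minimum Dependence Distance Tiling, which is not implemented here.

```python
from itertools import product

def dependence_sets(space, write, read):
    """Return (tails, heads): iterations that are the source or the sink
    of at least one cross-iteration flow or anti dependence."""
    writers = {}
    for it in space:
        writers.setdefault(write(*it), []).append(it)
    tails, heads = set(), set()
    for r in space:
        for w in writers.get(read(*r), []):
            if w != r:
                tails.add(min(w, r))   # the earlier iteration is the tail
                heads.add(max(w, r))   # the later iteration is the head
    return tails, heads

def run(order, write, read):
    """Execute S1 then S2 for each iteration in the given order."""
    A, seen = {}, {}
    for i, j in order:
        A[write(i, j)] = (i, j)            # S1: A(write) = ...
        seen[(i, j)] = A.get(read(i, j))   # S2: ...      = A(read)
    return A, seen

# Assumed subscripts and bounds, chosen only for illustration.
write = lambda i, j: (2 * i + 3, j + 1)
read = lambda i, j: (i + 2 * j + 1, i + j + 3)
space = list(product(range(1, 13), repeat=2))   # lexicographic order
tails, heads = dependence_sets(space, write, read)

mixed = tails & heads
phase1 = [it for it in space if it not in heads]                  # tails only, or independent
phase2 = [it for it in space if it in mixed]                      # kept in sequential order
phase3 = [it for it in space if it in heads and it not in tails]  # heads only

# Reversing phases 1 and 3 mimics an arbitrary parallel interleaving.
schedule = phase1[::-1] + phase2 + phase3[::-1]
assert sorted(schedule) == space
same = run(schedule, write, read) == run(space, write, read)
print(same)
```

Only two synchronization points separate the three phases here, regardless of the loop bounds; the reordered schedule is accepted only if it reproduces both the final array contents and every value read by the sequential loop.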
The Minimum Dependence Distance Tiling technique is adopted to partition it along the direction with minimum distance of . The parallelized code of Example is shown below.

FIGURE. Two kinds of dependence and all unique sets overlapped each other.

    /* area 1 */
    doparallel = ,
      doparallel = ceil( = + ), min(floor( + : ); )
        A( + ; + ) =
        = A( + + ; + + )
    /* area 2 */
    doparallel = ,
      doparallel = (floor(( ) = + ) + ),
        A( + ; + ) =
        = A( + + ; + + )
    /* area 3 */
    doparallel = floor(( ) = + ), ceil(x + 7= )
      doparallel = ,
        A( + ; + ) =
        = A( + + ; + + )
    /* area 4 */
    doparallel = floor(( ) = + ), ceil(x + 7= )
      doparallel = 9,
        A( + ; + ) =
        = A( + + ; + + )
    /* area 5 */
    doparallel = ,
      doparallel = , (ceil( = + ) - )
        A( + ; + ) =
        = A( + + ; + + )

This partitioning scheme seems to be worse than other techniques at first glance. This is because the loop upper bound is only . As the loop upper bounds increase, this scheme will show its advantage. No matter how large the loop is, it synchronizes only five times. Synchronization overhead is always the major factor that affects the performance.

6. EXTENSION TO GENERAL NESTED LOOPS

We discussed the parallelization of the two-dimensional program model in the former sections. We now look at loops with n levels of nesting whose indices are $i_1, i_2, \ldots, i_n$. The array subscripts are linear functions of the loop indices, as shown in Figure 15.

    do i_1 = L_1, U_1
      ...
        do i_n = L_n, U_n
          S1: A[f_1(i_1, ..., i_n), ..., f_m(i_1, ..., i_n)] = ...
          S2: ... = A[g_1(i_1, ..., i_n), ..., g_m(i_1, ..., i_n)]

FIGURE 15. General Program Model

We want to find a set of integer solutions $(i_1, \ldots, i_n, j_1, \ldots, j_n)$ that satisfy the system of Diophantine equations (8) and the system of linear inequalities (9).

$$
\begin{aligned}
f_1(i_1, \ldots, i_n) &= g_1(j_1, \ldots, j_n)\\
&\ \ \vdots\\
f_m(i_1, \ldots, i_n) &= g_m(j_1, \ldots, j_n)
\end{aligned}
\tag{8}
$$

$$
\begin{cases}
L_1 \le i_1 \le U_1\\
L_1 \le j_1 \le U_1\\
\quad\vdots\\
L_n \le i_n \le U_n\\
L_n \le j_n \le U_n
\end{cases}
\tag{9}
$$

To avoid lengthy repetition, we consider DCH2 as an example to illustrate how to get unique sets. From former sections, we know that DCH2 should contain the flow dependence unique head set and the anti dependence unique tail set. Using the second approach to solve the set of Diophantine equations, we have integer solutions $(i_1, \ldots, i_n, j_1, \ldots, j_n)$ which are functions of $x_1, \ldots, x_n$. They can be written as:

$$
(i_1, \ldots, i_n, j_1, \ldots, j_n) = \bigl(s_1(x_1, \ldots, x_n), \ldots, s_n(x_1, \ldots, x_n),\ s_{n+1}(x_1, \ldots, x_n), \ldots, s_{n+n}(x_1, \ldots, x_n)\bigr)
$$

From the general solution the dependence vector function $D(x_1, \ldots, x_n)$ can be written as

$$
D(x_1, \ldots, x_n) = \bigl\{\, s_{n+1}(x_1, \ldots, x_n) - s_1(x_1, \ldots, x_n),\ \ldots,\ s_{n+n}(x_1, \ldots, x_n) - s_n(x_1, \ldots, x_n) \,\bigr\}
$$

Hence the dependence vectors are:

$$
\begin{aligned}
d_1(x_1, \ldots, x_n) &= s_{n+1}(x_1, \ldots, x_n) - s_1(x_1, \ldots, x_n)\\
&\ \ \vdots\\
d_n(x_1, \ldots, x_n) &= s_{n+n}(x_1, \ldots, x_n) - s_n(x_1, \ldots, x_n)
\end{aligned}
$$

The dependence vector function $D(x_1, \ldots, x_n)$ divides DCH2 into two parts. One is the flow dependence unique head set and the other is the anti dependence unique tail set. The decision on the ownership of $D(x_1, \ldots, x_n)$ comes next. The theorems proposed in section are also valid for multi-dimensional loops. The region $d_1(x_1, \ldots, x_n) > 0$ belongs to the flow dependence unique head set and the region $d_1(x_1, \ldots, x_n) < 0$ belongs to the anti dependence unique tail set. When $d_1(x_1, \ldots, x_n) = 0$, $d_2(x_1, \ldots, x_n)$ has to be checked. If $d_2(x_1, \ldots, x_n) > 0$, then the flow dependence unique head set contains $d_1(x_1, \ldots, x_n) = 0$ and $d_2(x_1, \ldots, x_n) > 0$. If $d_2(x_1, \ldots, x_n) < 0$, then the anti dependence unique tail set contains $d_1(x_1, \ldots, x_n) = 0$ and $d_2(x_1, \ldots, x_n) < 0$. For $d_2(x_1, \ldots, x_n) = 0$, $d_3(x_1, \ldots, x_n)$ has to be checked. We continue in this fashion until $d_n(x_1, \ldots, x_n)$ is checked.

Using this method, we can get the unique sets for the given general program model. According to the positions of these sets, we can partition the iteration space. During the partitioning, the area containing the unique tail set must be run before the area containing the unique head set. The partitioning process is basically the same as for doubly nested loops, except that we now deal with a multi-dimensional iteration space. The shape of the unique set is also multi-dimensional.
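The component-by-component check described above amounts to taking the sign of the first non-zero entry of the dependence vector. The following minimal sketch illustrates that rule for a point of DCH2; the function name `classify_dch2` and the string labels are hypothetical, introduced here only for illustration.

```python
def classify_dch2(d):
    """Classify a point of DCH2 from its dependence vector d = (d_1, ..., d_n).

    The first non-zero component decides: a positive value places the point
    in the flow dependence unique head set, a negative value in the anti
    dependence unique tail set. An all-zero vector means that there is no
    cross-iteration dependence at this point.
    """
    for component in d:
        if component > 0:
            return "flow_head"
        if component < 0:
            return "anti_tail"
    return None

print(classify_dch2((0, 0, 3)))    # d_1 = d_2 = 0, d_3 > 0
print(classify_dch2((0, -1, 5)))   # d_1 = 0, d_2 < 0
print(classify_dch2((0, 0, 0)))    # no cross-iteration dependence
```

For DCH1 the same scan would apply with the roles adjusted (flow dependence tails and anti dependence heads), mirroring the doubly nested case.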
An alternative way to parallelize multi-dimensional loops is to parallelize only the two outermost loop nests, leaving the inner loops running sequentially. The advantages of one approach over the other are left for future work. However, we feel that multi-dimensional unique set partitioning will give us greater flexibility to transform the loops to adapt to specific architectures.

7. EXPERIMENTAL RESULTS

We present results for two programs. The first program is similar to Example as shown in Figure. We tested the performance for varying loop sizes. Four loop sizes (SIZE) are used in the experiments.

    do i = , SIZE
      do j = , SIZE
        A( + ; + ) =
        = A( + + ; + + )

FIGURE 16. Program 1

          SUBROUTINE CHOLSKY (IDA, NMAT, M, N, A, NRHS, IDB, B)
    C
    C   CHOLESKY DECOMPOSITION/SUBSTITUTION SUBROUTINE.
    C
    C     /  /     D H BAILEY    MODIFIED FOR NAS KERNEL TEST
    C
          REAL A(0:IDA, -M:0, 0:N), B(0:NRHS, 0:IDB, 0:N), EPSS(0:256)
          DATA EPS/1E-13/
    C
    C   CHOLESKY DECOMPOSITION
    C
          DO 1 J = 0, N
            I0 = MAX ( -M, -J )
    C
    C   OFF DIAGONAL ELEMENTS
    C
            DO 2 I = I0, -1
              DO 3 JJ = I0 - I, -1
                DO 3 L = 0, NMAT
        3       A(L,I,J) = A(L,I,J) - A(L,JJ,I+J) * A(L,I+JJ,J)
              DO 2 L = 0, NMAT
        2     A(L,I,J) = A(L,I,J) * A(L,0,I+J)
    C
    C   STORE INVERSE OF DIAGONAL ELEMENTS
    C
            DO 4 L = 0, NMAT
        4   EPSS(L) = EPS * A(L,0,J)
            DO 5 JJ = I0, -1
              DO 5 L = 0, NMAT
        5     A(L,0,J) = A(L,0,J) - A(L,JJ,J) ** 2
            DO 1 L = 0, NMAT
        1   A(L,0,J) = 1. / SQRT ( ABS (EPSS(L) + A(L,0,J)) )
    C
    C   SOLUTION
    C
          DO 6 I = 0, NRHS
            DO 7 K = 0, N
              DO 8 L = 0, NMAT
        8     B(I,L,K) = B(I,L,K) * A(L,0,K)
              DO 7 JJ = 1, MIN (M, N-K)
                DO 7 L = 0, NMAT
        7       B(I,L,K+JJ) = B(I,L,K+JJ) - A(L,-JJ,K+JJ) * B(I,L,K)
    C
            DO 6 K = N, 0, -1
              DO 9 L = 0, NMAT
        9     B(I,L,K) = B(I,L,K) * A(L,0,K)
              DO 6 JJ = 1, MIN (M, K)
                DO 6 L = 0, NMAT
        6       B(I,L,K-JJ) = B(I,L,K-JJ) - A(L,-JJ,K) * B(I,L,K)
    C
          RETURN
          END

FIGURE 17. Program 2

The second program is shown in Figure 17. This is a subroutine taken from a benchmark test program which has been developed for use by the NAS program at NASA Ames Research Center to aid in the evaluation of supercomputers. This subroutine deals with the problem of Cholesky Decomposition and Substitution. We are more interested in one part of this listing.
Non-uniform dependences can be found in this part of the program. To illustrate the impact of non-uniform dependences and to make our experiment more comprehensive, we use the entire subroutine to evaluate the performance of our technique. In fact, the variables N and NMAT decide the program size in this part of the program. When we say that the program size is a given value, both N and NMAT are set to that value. We present results for four program sizes.

FIGURE 18. Performance Results for Program 1 on Cray. Panels (a) to (d) plot speedup against the number of CPUs, one panel per value of SIZE, for the Unique Sets method, Chen and Yew's method, Cray's autotasking, the Omega project, and Zaafrani and Ito's method, together with the linear speedup line.

All the experiments are done on a multiprocessor Cray J9. The Autotasking Expert System (atexpert) is used to analyze the program. Atexpert is a tool developed by CRI (Cray Research, Inc.) for accurately measuring and graphically displaying tasking performance from a job run on an arbitrarily loaded CRI system. It can predict speedups on a dedicated system from data collected from a single run on a non-dedicated system. It shows where a program is spending most of its time and whether those areas are executed sequentially or in parallel.

User-Directed Tasking directives are used to construct parallelizable areas in the iteration space. Synchronizations are implemented with the help of guarded regions. The format is as below.

    #pragma CRI parallel defaults
    #pragma CRI taskloop
        loop
    #pragma CRI endparallel
    #pragma CRI guard
        loop or variable
    #pragma CRI endguard

Our results are compared with those of Chen and Yew's method, Cray's native Autotasking, the Omega project of the University of Maryland, and Zaafrani and Ito's method. Zaafrani and Ito's method is not implemented for Program 2, because it is unable to handle non-perfect nestings of loops.
To implement Chen and Yew's method, guarded regions were used to simulate the function of semaphores. For the method of the Omega project, the Omega Project software was used. We ran the source codes through Petit, a research tool developed by the University of Maryland. It calls both the Omega library and the Uniform library and generates parallelized C source code. We rewrote the parallelized source codes with Cray's Autotasking directives to do the experiments.

Figure 18 shows the speedup comparison of our technique, Chen and Yew's technique, Cray's autotasking, the Omega project, and Zaafrani and Ito's three-region technique. Cray's autotasking did not give any speedup at all, running the loops sequentially. The Omega project did not parallelize this program either. This is not so clear in Figure 18, because the speedups of the Omega project and those of Cray's autotasking overlap. Both are 1. Our method shows near-linear speedup for two of the loop sizes, which are the models closer to real-world programs. Our technique consistently and considerably outperforms the other techniques for all sizes. Chen and Yew's method gave some speedup, but not much, because of the synchronization overhead. Zaafrani and Ito's method showed very little speedup. The sequential region of their method is the bottleneck for good performance. The figure shows that the loop sizes have a tremendous impact on the performance even for the same loop using the same parallelization technique. In practice, we always want to parallelize the loops where programs spend most of their time.

FIGURE 19. Performance Results for Program 2 on Cray. Panels (a) to (d) plot speedup against the number of CPUs, one panel per program size, for the Unique Sets method, Chen and Yew's method, Cray's autotasking, and the Omega project, together with the linear speedup line.

Figure 19 shows the performance for the Cholesky Decomposition subroutine. From the plots, it is clear that our technique outperforms all the other techniques. As the program size increases, our technique shows better results. Cray's Autotasking got some speedup for this routine. It parallelized the innermost loop. This is more like vectorizing than parallelizing. The result of the Omega project is worse than that of Cray's autotasking at a small program size, as shown in Figure 19. As the program size increases, it outperforms Cray's autotasking. At a larger program size, the performance of the Omega project is nearly twice that of Cray's autotasking. The reason is that Cray's autotasking only parallelizes the innermost loops, while the Omega project does not. Overall, Chen and Yew's technique performed worst. Again, increased synchronization is responsible for this.
8. CONCLUSION

In this paper, we systematically analyzed the characteristics of the dependences in the iteration space. We proposed the concept of the Complete Dependence Convex Hull, which contains the entire dependence information of the program. We also proposed the concepts of unique head sets and unique tail sets, which isolate the dependence information and show the relationship among the dependences. The relationship of the unique head and tail sets forms the foundation for partitioning the iteration space. Depending on the relative placement of these unique sets, various cases were considered. Several partitioning schemes were also suggested for implementing our technique. The suggested scheme was implemented on a Cray J9 and compared with Chen and Yew's method, Cray's native Autotasking, the Omega project of the University of Maryland, and Zaafrani and Ito's method. The implementation results on real benchmark code show that our technique consistently and considerably outperformed all the other techniques.

ACKNOWLEDGMENTS

We would like to thank Sumit Roy for his help in the implementation of the techniques on the Cray J9 and his comments on a preliminary draft of the paper. We

The Computer Journal, Vol. , No. , 1997


Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor Taylor Enterprses, Inc. Control Lmts for P Charts Copyrght 2017 by Taylor Enterprses, Inc., All Rghts Reserved. Control Lmts for P Charts Dr. Wayne A. Taylor Abstract: P charts are used for count data

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Module 2. Random Processes. Version 2 ECE IIT, Kharagpur

Module 2. Random Processes. Version 2 ECE IIT, Kharagpur Module Random Processes Lesson 6 Functons of Random Varables After readng ths lesson, ou wll learn about cdf of functon of a random varable. Formula for determnng the pdf of a random varable. Let, X be

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

Lecture 4. Instructor: Haipeng Luo

Lecture 4. Instructor: Haipeng Luo Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would

More information

Complete subgraphs in multipartite graphs

Complete subgraphs in multipartite graphs Complete subgraphs n multpartte graphs FLORIAN PFENDER Unverstät Rostock, Insttut für Mathematk D-18057 Rostock, Germany Floran.Pfender@un-rostock.de Abstract Turán s Theorem states that every graph G

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Dynamic Programming. Preview. Dynamic Programming. Dynamic Programming. Dynamic Programming (Example: Fibonacci Sequence)

Dynamic Programming. Preview. Dynamic Programming. Dynamic Programming. Dynamic Programming (Example: Fibonacci Sequence) /24/27 Prevew Fbonacc Sequence Longest Common Subsequence Dynamc programmng s a method for solvng complex problems by breakng them down nto smpler sub-problems. It s applcable to problems exhbtng the propertes

More information

Foundations of Arithmetic

Foundations of Arithmetic Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an

More information

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

7. Products and matrix elements

7. Products and matrix elements 7. Products and matrx elements 1 7. Products and matrx elements Based on the propertes of group representatons, a number of useful results can be derved. Consder a vector space V wth an nner product ψ

More information

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique Outlne and Readng Dynamc Programmng The General Technque ( 5.3.2) -1 Knapsac Problem ( 5.3.3) Matrx Chan-Product ( 5.3.1) Dynamc Programmng verson 1.4 1 Dynamc Programmng verson 1.4 2 Dynamc Programmng

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Min Cut, Fast Cut, Polynomial Identities

Min Cut, Fast Cut, Polynomial Identities Randomzed Algorthms, Summer 016 Mn Cut, Fast Cut, Polynomal Identtes Instructor: Thomas Kesselhem and Kurt Mehlhorn 1 Mn Cuts n Graphs Lecture (5 pages) Throughout ths secton, G = (V, E) s a mult-graph.

More information

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k. THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM

ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM An elastc wave s a deformaton of the body that travels throughout the body n all drectons. We can examne the deformaton over a perod of tme by fxng our look

More information

Math 261 Exercise sheet 2

Math 261 Exercise sheet 2 Math 261 Exercse sheet 2 http://staff.aub.edu.lb/~nm116/teachng/2017/math261/ndex.html Verson: September 25, 2017 Answers are due for Monday 25 September, 11AM. The use of calculators s allowed. Exercse

More information

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1 P. Guterrez Physcs 5153 Classcal Mechancs D Alembert s Prncple and The Lagrangan 1 Introducton The prncple of vrtual work provdes a method of solvng problems of statc equlbrum wthout havng to consder the

More information

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for U Charts. Dr. Wayne A. Taylor

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for U Charts. Dr. Wayne A. Taylor Taylor Enterprses, Inc. Adjusted Control Lmts for U Charts Copyrght 207 by Taylor Enterprses, Inc., All Rghts Reserved. Adjusted Control Lmts for U Charts Dr. Wayne A. Taylor Abstract: U charts are used

More information

The Second Anti-Mathima on Game Theory

The Second Anti-Mathima on Game Theory The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering / Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons

More information

Fundamental loop-current method using virtual voltage sources technique for special cases

Fundamental loop-current method using virtual voltage sources technique for special cases Fundamental loop-current method usng vrtual voltage sources technque for specal cases George E. Chatzaraks, 1 Marna D. Tortorel 1 and Anastasos D. Tzolas 1 Electrcal and Electroncs Engneerng Departments,

More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 13

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 13 CME 30: NUMERICAL LINEAR ALGEBRA FALL 005/06 LECTURE 13 GENE H GOLUB 1 Iteratve Methods Very large problems (naturally sparse, from applcatons): teratve methods Structured matrces (even sometmes dense,

More information

DUE: WEDS FEB 21ST 2018

DUE: WEDS FEB 21ST 2018 HOMEWORK # 1: FINITE DIFFERENCES IN ONE DIMENSION DUE: WEDS FEB 21ST 2018 1. Theory Beam bendng s a classcal engneerng analyss. The tradtonal soluton technque makes smplfyng assumptons such as a constant

More information

Société de Calcul Mathématique SA

Société de Calcul Mathématique SA Socété de Calcul Mathématque SA Outls d'ade à la décson Tools for decson help Probablstc Studes: Normalzng the Hstograms Bernard Beauzamy December, 202 I. General constructon of the hstogram Any probablstc

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Analytical Chemistry Calibration Curve Handout

Analytical Chemistry Calibration Curve Handout I. Quck-and Drty Excel Tutoral Analytcal Chemstry Calbraton Curve Handout For those of you wth lttle experence wth Excel, I ve provded some key technques that should help you use the program both for problem

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Lecture Note 3. Eshelby s Inclusion II

Lecture Note 3. Eshelby s Inclusion II ME340B Elastcty of Mcroscopc Structures Stanford Unversty Wnter 004 Lecture Note 3. Eshelby s Incluson II Chrs Wenberger and We Ca c All rghts reserved January 6, 004 Contents 1 Incluson energy n an nfnte

More information

O-line Temporary Tasks Assignment. Abstract. In this paper we consider the temporary tasks assignment

O-line Temporary Tasks Assignment. Abstract. In this paper we consider the temporary tasks assignment O-lne Temporary Tasks Assgnment Yoss Azar and Oded Regev Dept. of Computer Scence, Tel-Avv Unversty, Tel-Avv, 69978, Israel. azar@math.tau.ac.l??? Dept. of Computer Scence, Tel-Avv Unversty, Tel-Avv, 69978,

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information

Linear Feature Engineering 11

Linear Feature Engineering 11 Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

5 The Rational Canonical Form

5 The Rational Canonical Form 5 The Ratonal Canoncal Form Here p s a monc rreducble factor of the mnmum polynomal m T and s not necessarly of degree one Let F p denote the feld constructed earler n the course, consstng of all matrces

More information

Line Drawing and Clipping Week 1, Lecture 2

Line Drawing and Clipping Week 1, Lecture 2 CS 43 Computer Graphcs I Lne Drawng and Clppng Week, Lecture 2 Davd Breen, Wllam Regl and Maxm Peysakhov Geometrc and Intellgent Computng Laboratory Department of Computer Scence Drexel Unversty http://gcl.mcs.drexel.edu

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

On the Multicriteria Integer Network Flow Problem

On the Multicriteria Integer Network Flow Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of

More information

VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES

VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES BÂRZĂ, Slvu Faculty of Mathematcs-Informatcs Spru Haret Unversty barza_slvu@yahoo.com Abstract Ths paper wants to contnue

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information

Solutions to exam in SF1811 Optimization, Jan 14, 2015

Solutions to exam in SF1811 Optimization, Jan 14, 2015 Solutons to exam n SF8 Optmzaton, Jan 4, 25 3 3 O------O -4 \ / \ / The network: \/ where all lnks go from left to rght. /\ / \ / \ 6 O------O -5 2 4.(a) Let x = ( x 3, x 4, x 23, x 24 ) T, where the varable

More information

Interactive Bi-Level Multi-Objective Integer. Non-linear Programming Problem

Interactive Bi-Level Multi-Objective Integer. Non-linear Programming Problem Appled Mathematcal Scences Vol 5 0 no 65 3 33 Interactve B-Level Mult-Objectve Integer Non-lnear Programmng Problem O E Emam Department of Informaton Systems aculty of Computer Scence and nformaton Helwan

More information

= z 20 z n. (k 20) + 4 z k = 4

= z 20 z n. (k 20) + 4 z k = 4 Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

Linear, affine, and convex sets and hulls In the sequel, unless otherwise specified, X will denote a real vector space.

Linear, affine, and convex sets and hulls In the sequel, unless otherwise specified, X will denote a real vector space. Lnear, affne, and convex sets and hulls In the sequel, unless otherwse specfed, X wll denote a real vector space. Lnes and segments. Gven two ponts x, y X, we defne xy = {x + t(y x) : t R} = {(1 t)x +

More information

PHYS 705: Classical Mechanics. Calculus of Variations II

PHYS 705: Classical Mechanics. Calculus of Variations II 1 PHYS 705: Classcal Mechancs Calculus of Varatons II 2 Calculus of Varatons: Generalzaton (no constrant yet) Suppose now that F depends on several dependent varables : We need to fnd such that has a statonary

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

The KMO Method for Solving Non-homogenous, m th Order Differential Equations

The KMO Method for Solving Non-homogenous, m th Order Differential Equations The KMO Method for Solvng Non-homogenous, m th Order Dfferental Equatons Davd Krohn Danel Marño-Johnson John Paul Ouyang March 14, 2013 Abstract Ths paper shows a smple tabular procedure for fndng the

More information

On a direct solver for linear least squares problems

On a direct solver for linear least squares problems ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear

More information

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens THE CHINESE REMAINDER THEOREM KEITH CONRAD We should thank the Chnese for ther wonderful remander theorem. Glenn Stevens 1. Introducton The Chnese remander theorem says we can unquely solve any par of

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information